Introduction
It has been well established that public attitudes toward stuttering are characterized by erroneous or negative stereotypes about stuttering that affect the development, maintenance, and symptoms of stuttering as well as the attitudes and quality of life of people who stutter (e.g., Arnold et al., 2015; Barnett et al., 2005; Blood et al., 2011; Boyle & Cheyne, 2024; Bricker-Katz et al., 2013; Craig et al., 2009; Ham, 1990; Hughes, 2015; Iimura & Miyamoto, 2022; Kumar & Varghese, 2018; Norman et al., 2023; Turnbull, 2006). Weidner and St. Louis (2023) offered guidelines for developing interventions designed to reduce negative attitudes toward stuttering in children or adults who do not stutter. They summarized 47 different intervention studies that had been intended to improve stuttering attitudes. Based on authors’ objective or subjective reports, 21% of the participants had “very positive” changes, 43% had “positive” changes, 30% had “little change,” 2% had “negative and positive” change, 2% had “little and positive” change, and 2% had “negative” change. For ten studies wherein control groups with no intervention were employed, 80% showed little change and 20% improved.
Weidner and St. Louis’s (2023) list of intervention studies was not exhaustive but contained most of the available intervention studies. Included were numerous unpublished as well as published studies that administered the Public Opinion Survey of Human Attributes–Stuttering (POSHA–S) (St. Louis, 2011, 2022), which measures explicit public attitudes toward stuttering. Researchers around the world have been offered the option to administer the POSHA–S to their desired samples of people at no cost, provided that they obtained human subject ethics approval at their respective institutions and shared copies of their respondent data with its author (this paper’s first author) to be included in a large and growing POSHA–S database. The purpose of the database has been to empirically define what can be called “average” stuttering attitudes.
The POSHA–S, described in detail by St. Louis (2011, 2012, 2015, 2025), was developed as a “standard” measure of public attitudes such that samples from different areas and populations could be compared. It is psychometrically satisfactory in terms of validity, reliability, translatability, readability, and administration mode. Although periodically updated, the database recently contained 25,739 respondents from 45 countries. These include translations of the instrument into 28 different languages (St. Louis, 2025). In addition to an extensive demographic section and a general section that compares stuttering to four other “anchor” attributes (i.e., mental illness, obesity, left-handedness, and intelligence) based on four items, stuttering attitudes are addressed in 39 items (35 items in the stuttering section and four from stuttering ratings in the general section). These are averaged into eight components (e.g., Cause), and these components into subscores. Two stuttering subscores, Beliefs and Self Reactions, are further averaged into an Overall Stuttering Score (OSS). Beliefs refer to what respondents know, surmise, or guess about stuttering but do not involve them specifically thinking about what they would do or where their beliefs come from. In that sense, Beliefs are external to the respondent. Self Reactions are internal in that, to rate an item, respondents must imagine themselves in a speaking situation with a stutterer or evaluate the extent or type of knowledge they bring to bear on their ratings.
An Obesity/Mental Illness subscore is also generated from general items. All ratings are converted to a standard scale from -100 to +100, and ratings for some items are inverted so that, consistently, higher scores reflect more positive (accurate, empathetic, and evidence-based) beliefs and reactions, whereas lower scores reflect more negative attitudes.
A series of studies, described below, was motivated by the need to explain why most efforts to improve stuttering attitudes in pre- versus post-test designs resulted in improved attitudes, but a substantial minority did not (Weidner & St. Louis, 2023). Three studies, in particular, that failed to improve public attitudes were especially puzzling. The first was a study in which Kuwaiti teachers did not improve in POSHA–S measured attitudes toward stuttering whereas a group of preservice education students did improve significantly (Abdalla & St. Louis, 2014). Another was an attempt to improve stuttering attitudes in American middle school students after viewing a well-known published film about children who stutter (Kuhn & St. Louis, 2015). The third was a large study of high school and university students in Poland who either viewed the Polish version of another well-known British film or viewed an illustrated presentation on stuttering (Węsierska et al., 2015). Mean post-intervention POSHA–S scores did not improve over pre-intervention scores in the respondents in these samples.
The source for the study series was a set of samples from the POSHA–S database collated in 2016 that included both pre- and post-intervention data. These comprised 29 different samples wherein an intervention occurred between the pre- and post-test and 12 samples with no interventions between the two POSHA–S administrations. These latter samples were carried out either as test-retest reliability samples or as control groups (C/R) in intervention studies. The four aggregate studies utilized either or both the intervention and the control/reliability (referred to herein as “Control”) samples.
The first aggregate study (St. Louis et al., 2020) classified the 29 intervention samples according to changes in three POSHA–S summary scores as “very successful” (VS), “successful” (S), “marginally successful” (MS), and “unsuccessful” (U). This was accomplished by determining whether the Beliefs subscore, Self Reactions subscore, and/or the Overall Stuttering Score (OSS) improved from pre- to post-test by more or less than 5 units. If all three improved by ≥ 5 units, the sample was categorized as VS; if two of three improved by that amount, it was deemed S; if only one of three improved, it was considered MS; and if none of the three improved, it was categorized as U. A total of 480 respondents (from 15 samples) were thereby assigned to the VS category, 109 respondents (from 3 samples) to the S category, 92 respondents (from 4 samples) to the MS category, and 253 (from 7 samples) to the U category. The percentages of the intervention respondents who were thereby assigned to each category were as follows: VS = 51%, S = 12%, MS = 10%, and U = 27%.
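The classification rule just described can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' analysis code: the function name `classify_sample` and its argument names are hypothetical, and the 5-unit threshold is taken from the description above. The respondent counts are those reported in the text.

```python
def classify_sample(d_beliefs, d_self_reactions, d_oss, threshold=5.0):
    """Return 'VS', 'S', 'MS', or 'U' from pre-to-post changes in the
    Beliefs subscore, Self Reactions subscore, and OSS (hypothetical helper)."""
    # Count how many of the three summary scores improved by >= threshold units.
    n_improved = sum(d >= threshold for d in (d_beliefs, d_self_reactions, d_oss))
    return {3: "VS", 2: "S", 1: "MS", 0: "U"}[n_improved]


# Respondent counts per success category, as reported in the text.
counts = {"VS": 480, "S": 109, "MS": 92, "U": 253}
total = sum(counts.values())

# Share of intervention respondents in each category, rounded to whole percent.
percentages = {name: round(100 * n / total) for name, n in counts.items()}
```

Under these assumptions, `total` is 934 and the rounded shares reproduce the percentages given above (51%, 12%, 10%, and 27%).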
Discriminant function analysis revealed that the four success categories were predicted partially by three characteristics of the interventions themselves but not at all by demographic characteristics of the 29 samples. The three intervention characteristics were: (a) content that was of interest to or involved the respondents (e.g., use of humor or personal experience or contact with people who stutter), (b) personal or emotional connections (e.g., feelings associated with stuttering), and (c) information about stuttering that is sufficient, but not overwhelming (e.g., showing videos of people who stutter rather than providing didactic descriptive information and explaining DOs and DON’Ts regarding interacting with people who stutter).
The second aggregate study (St. Louis, Aliveto, et al., 2024) first replicated early test-retest reliability studies of the POSHA–S by combining all 345 C/R respondents. Pre-test and post-test means were nearly identical, and the pre- versus post-test correlation for the overall attitude score was .79. Both metrics were indicative of satisfactory test-retest reliability, which was consistent with earlier reliability studies (Abdi et al., 2015; St. Louis et al., 2009; St. Louis, 2012). The second study’s primary aim, however, was to explore characteristics of individual respondents from the 12 non-intervention samples. The authors observed that individual post-OSS values were substantially negatively correlated with the post-minus-pre scores (or amount and direction the respondents changed), indicating that something unusual may have occurred. As a next step, respondents were sorted according to whether their OSSs (a) worsened from pre- to post-test by 5 or more units (≤ -5) (negative changers), (b) improved from pre- to post-test by 5 or more units (≥ +5) (positive changers), or (c) did not worsen or improve by 5 units (> -5 and < +5) (minimal changers). Two findings emerged, both of which were unexpected by all investigators of the 12 samples. First, rather than a large majority, only about one-third of the respondents were in the minimal change group, with the other two-thirds split quite evenly between the positive changers and negative changers. Second, confirming the correlations, the positive changers had the lowest scores on the pre-test but highest scores on the post-test, while the negative changers had the opposite pattern, highest scores at pre-test and lowest scores at post-test. This pattern was termed a “crossover” effect, which basically meant that the positive and negative changers canceled each other out in the overall pre- versus post-test means, which were virtually unchanged.
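The sorting into negative, positive, and minimal changers can be illustrated with a short Python sketch. The function names and the six (pre, post) OSS pairs below are hypothetical and invented purely for illustration; they are not data from any of the samples. Only the ±5-unit rule comes from the description above.

```python
from statistics import mean


def change_group(pre_oss, post_oss, threshold=5.0):
    """Label a respondent by the size and direction of the OSS change."""
    diff = post_oss - pre_oss
    if diff >= threshold:
        return "positive"
    if diff <= -threshold:
        return "negative"
    return "minimal"


def group_means(pairs, threshold=5.0):
    """Return {group: (mean pre, mean post)} for (pre, post) OSS pairs."""
    groups = {}
    for pre, post in pairs:
        groups.setdefault(change_group(pre, post, threshold), []).append((pre, post))
    return {
        name: (mean(p for p, _ in members), mean(q for _, q in members))
        for name, members in groups.items()
    }


# Hypothetical (pre, post) OSS pairs, invented purely for illustration.
toy = [(-10.0, 20.0), (0.0, 25.0), (30.0, 5.0), (35.0, 0.0), (12.0, 14.0), (15.0, 13.0)]
summary = group_means(toy)
```

In this toy example the positive changers start lowest and end highest, the negative changers do the opposite, and the minimal changers stay put, which is the “crossover” pattern in miniature.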
The “regression to the mean” phenomenon (Barnett et al., 2005) was carefully considered because it typically influences post-test scores if pre-test scores are sorted in terms of magnitude. Importantly, regression to the mean is related to the extent to which pre- and post-test scores are correlated. There is no regression to the mean if the pre- versus post-test correlation equals ±1.0 but complete regression to the mean if the correlation equals 0. The formula to calculate the percentage regression (movement) to the mean is 100 x (1 – pre versus post correlation) (Trochim, 2026). The authors concluded that regression to the mean certainly occurred, as it always does when pre-test data are sorted from higher to lower; however, given the relatively high correlation of r = 0.789 between pre and post responses, its effect was negligible.
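As a quick arithmetic check, the percentage formula quoted above can be written as a one-line Python function. The function name `percent_regression` is hypothetical; the formula is the one stated in the text, applied here only to correlations between 0 and 1.

```python
def percent_regression(r):
    """Percentage regression to the mean, 100 x (1 - r), for a
    pre- versus post-test correlation r (formula as quoted in the text)."""
    return 100.0 * (1.0 - r)


# Correlation reported for the combined non-intervention respondents.
prm = percent_regression(0.789)  # about 21.1 percent
```

With r = 0.789, the formula yields roughly 21% expected movement toward the mean; a correlation of 1.0 gives 0% and a correlation of 0 gives 100%.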
The third aggregate study (St. Louis, Abdalla, et al., 2024) utilized both the 29 intervention samples and the 12 non-intervention samples using the same success categories of the first study (VS, S, MS, and U) (St. Louis et al., 2020). In the same manner as in the second study (St. Louis, Aliveto, et al., 2024), the 934 individual respondents in the intervention categories were sorted as positive, negative, and minimal changers. The 345 non-intervention respondents were included in this third study for comparison to the intervention categories. Findings showed that all four intervention success categories, like the non-intervention category in St. Louis, Aliveto, et al. (2024), demonstrated quite similar “crossover” effects of the positive and negative changers, while minimal changers, by definition, stayed roughly the same from pre-test to post-test. Moreover, for all four success categories, the OSS values of the positive changers, like those for the non-intervention respondents in the second study, were quite dramatically lowest at pre-test and, also, dramatically highest at post-test. The opposite effect occurred for the negative changers. Importantly, the magnitude of the positive and negative changes was remarkably similar across the categories. What did change as a function of success was the percentage of respondents in each category. For example, in the VS category, 75% were in the positive change group, 18% in the minimal change group, and 7% in the negative change group. By contrast, in the U category, 41% were positive changers, 23% were minimal changers, and 35% were negative changers. Importantly, these U category percentages were similar to the percentages in the C/R category, that is, 36% positive changers, 35% minimal changers, and 30% negative changers.
As with the second study, regression to the mean was considered and found to have a measurable but minimal effect on the “crossover” effect in the VS group but virtually no effect in the other categories. Overall, the intervention sample’s OSSs, Beliefs, and Self Reactions improved by 10 units each.
The fourth aggregate study also considered both the 29 intervention samples and the 12 non-intervention C/R samples (St. Louis et al., 2025). Using the C/R sample as a baseline, the aim of this study was to estimate the percentages of respondents who shifted—either shifting to or shifting from—positive, minimal, or negative change groups for each of the four intervention categories. For example, if the percentage of positive changers increased from the C/R percentage of 36% to that of the very successful (VS) intervention percentage of 75%, where did the “added” approximately 40% come from? Did they come from the minimal changers, negative changers, or both? The answer is “both,” that is, 23% from the potential negative changers and 17% from the potential minimal changers. In other words, something in the VS interventions impacted both of these groups in the desired direction. Progressing in the direction from more to less successful interventions, the successful (S) interventions most likely impacted the potential negative changers by shifting 15% of them to positive changers and less than 1% to positive changers from the minimal change group. The marginally successful (MS) intervention category was most similar to the non-intervention (C/R) category. Five percent from potential negative changers shifted to positive changers and 1% shifted to minimal changers. Interventions in the unsuccessful (U) category, with an overall mean difference between pre-test and post-test OSSs near zero like the C/R category, actually reduced the potential minimal change category by 13%, with 9% shifting to the positive changers and 4% to negative changers.
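The arithmetic behind the VS example can be sketched by subtracting the C/R baseline percentages from the VS percentages, using the rounded values reported for the third aggregate study. This is only an illustrative sketch: the function name `shifts` is hypothetical, and simple differencing of rounded percentages is an assumption that reproduces the VS figures but does not necessarily match the estimation procedure the authors used for every category.

```python
def shifts(category, baseline):
    """Percentage-point difference of each change group from the baseline."""
    return {group: category[group] - baseline[group] for group in baseline}


# Rounded percentages reported for the third aggregate study.
cr_baseline = {"positive": 36, "minimal": 35, "negative": 30}
very_successful = {"positive": 75, "minimal": 18, "negative": 7}

vs_shifts = shifts(very_successful, cr_baseline)
```

Here the positive change group gains 39 percentage points (the “approximately 40%” above), while the minimal and negative change groups lose 17 and 23 points, respectively.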
To the extent that this strategy yields a valid picture of what actually occurred in the minds of the respondents subjected to interventions, the good news is that most of the intervention-induced changes were in the desired direction. Excepting a 4% shift from minimal to negative in the U category, all the shifts were toward more positive (or less negative) groups (St. Louis et al., 2025). Interventions shifted respondents from both the negative change and minimal change groups to the positive change group in the VS category. Most of the positive change gain in the S category resulted from shifts from the negative change group. All intervention-induced changes in the MS category came from the negative change group, a modest percentage to the positive change group and very small percentage to the minimal change group. The U category shifted significant percentages to both the positive and negative change groups, resulting mainly in a smaller minimal change category, compared to the non-intervention category.
One of several future lines of research recommended by authors of these aggregate studies was to explore the stability of stuttering attitudes. Two recent studies were designed to accomplish that. In the first, optometry students in the northern part of India filled out the POSHA–S four times in succession over three months with no intervention (Gupta, St. Louis, Rastogi, et al., 2025). Comparing the second to first administrations of positive, negative, and minimal changers, an apparent “crossover” effect was negated by regression to the mean because, unlike all but one of the 41 samples in the aggregate studies, there was virtually no correlation between pre and post scores (r = −0.013). Responses in subsequent third and fourth administrations were chaotic. The authors concluded that the respondents and the testing environments or constraints rendered the results anomalous such that the study should be repeated with more serious and consistent respondents. It is curious that the one intervention sample in the 2016 cohort with virtually no correlation between pre and post stuttering scores (OSS r = 0.005) consisted of teachers from Kuwait who, unlike preservice education students, did not improve in measured attitudes after viewing a custom video on problems children who stutter have in school (Abdalla & St. Louis, 2014).
The second study (Gupta, St. Louis, Dutt, et al., 2025) was similar to the first except that it introduced three brief, but slightly different, interventions between the first and second administrations of the POSHA–S to clinical psychology or food and nutrition students based on OSS values of their first administration. Sorted by OSSs in the first administration, the highest one-third of respondents received a short instruction that their first impressions about stutterers were probably correct, with the hypothesis that they would not then shift negatively so much. One-third of the respondents with the lowest OSSs at the first administration were briefly instructed to consider that stuttering is really not a serious problem, hypothesizing that they would therefore respond with more positive attitudes. The middle third was told that public opinion about stuttering can improve with accurate information, hypothesizing that they would rate more positively rather than stay roughly the same. The interventions also included a PowerPoint presentation on stuttering that, unfortunately in hindsight, was different for the three one-third groups. The interventions were followed in the second administration by improved OSSs in all three groups, progressively more for the lowest third, followed by the middle third, and then by the highest third from the first administration. Degrees of success of the three different interventions were not apparent; however, overall attitudes improved substantially. Using the categories in the aforementioned aggregate studies, the overall sample improvement would be categorized as very successful (VS). Next, the respondents were sorted as in the St. Louis, Abdalla, et al. (2024) and St. Louis, Aliveto, et al. (2024) studies.
When sorted this way, a notable “crossover” effect did occur but was less prominent than in the earlier aggregate studies because the regression to the mean effect was greater due to a low test-retest correlation (r = 0.132) from the first to second administration. In third and fourth administrations, post-intervention improvements progressively decreased essentially back to the level of the first administration.
Purpose
The purpose of the current study was two-fold: (a) to replicate selected findings from the aforementioned aggregate studies and (b) to extend findings from both earlier and later samples to explore which items in the POSHA–S are most responsible for intervention-induced changes in attitudes toward stuttering and the “crossover” effect.
Research Questions (RQs) and Hypotheses
1. To what extent would a combined intervention group improve attitudes toward stuttering compared to a combined control group? Based on earlier data, we hypothesized that the intervention group would be approximately 10 units more positive in the post-test versus the pre-test for Beliefs, Self Reactions, and Overall Stuttering Scores and that the control group would not change from pre-test to post-test.
2. To what extent would the extreme regression past the mean or “crossover” effect characterize subgroups of intervention or control respondents who are sorted by those who improved substantially, worsened substantially, or remained nearly the same from pre-test to post-test? We hypothesized that the “crossover” effect would characterize ratings of negative changers and positive changers and that the regression to the mean phenomenon would have a negligible effect.
3. To what extent are individual POSHA–S items more or less responsible for improvements in stuttering attitudes and/or the “crossover” effect? We hypothesized that those items with larger effect sizes in pair-wise comparison among the three change groups would be more likely to be associated with improved attitudes or the “crossover” effect than those with smaller effect sizes.
Discussion
Summary
This pre versus post study using the POSHA–S compared attitudes toward stuttering of large international samples of non-stuttering persons who either had been exposed to interventions designed to improve those attitudes or who were in non-intervention control groups. Data from a 2026 cohort consisting of 16 intervention samples and seven control samples were combined. Results from these were then compared with results from a 2016 cohort, 29 intervention samples and 12 control or test-retest reliability samples, reported in four previous aggregate studies (St. Louis et al., 2020; St. Louis, Abdalla, et al., 2024; St. Louis, Aliveto, et al., 2024; St. Louis et al., 2025). Those studies documented an extreme case of regression to the mean, termed a “crossover” effect, in which negative changers from pre to post began with the highest scores and ended with the lowest scores. Conversely, positive changers began with the lowest scores and ended with the highest scores. Minimal changers, by definition, scored similarly in pre- and post-administrations. The 2026 intervention and control respondents demonstrated the “crossover” effect, similar to that of the 2016 cohort; however, because their pre- versus post-test correlations were much lower, the “crossover” effect was greatly minimized after mathematically correcting for regression to the mean. Overall, the 2026 interventions were more effective than the 2016 interventions. Aside from POSHA–S stuttering items that were rated similarly or dissimilarly being least or most changeable, respectively, specific questionnaire items that would predict overall improvement or who would be negative, minimal, or positive changers could not be identified. It appeared that individual differences or preferences from one respondent to another were primarily responsible for the large differences among the change groups that occurred.
Comparison of Interventions and Controls in Two Cohorts
The 16 interventions applied in the 2026 cohort were clearly at least as effective as the 29 interventions in the 2016 cohort. Not only did attitudes improve by the benchmark identified in previous studies (St. Louis, 2012, 2015) of an average of 10 OSS units, which was the case in the previous aggregate study (St. Louis, Abdalla, et al., 2024), they improved on average by 12 units, thus confirming our RQ1 hypothesis. It should be noted that in both cohorts, Beliefs and Self Reactions improved approximately equally, indicating that the interventions affected beliefs about stuttering that are external to the respondent as well as reactions and awareness of knowledge that are internal to the respondent.
Regression to the Mean and the “Crossover” Effect
The “crossover” effect, which can be regarded as an extreme case of regression to the mean, clearly characterized the 2016 cohort (St. Louis, Abdalla, et al., 2024; St. Louis, Aliveto, et al., 2024). In other words, rather than high or low scorers on the pre-test scoring simply closer to the mean on the post-test, their post-test scores were far beyond the mean in the opposite direction. These findings were unexpected and potentially controversial, especially since the effect was observed in both intervention and control or test-retest reliability samples (Figure 3). Its effect was diminished slightly by applying the formula to correct for regression to the mean, which is a function of the pre versus post correlation.
Interestingly, upon reanalysis, the “crossover” effect occurred in the unsuccessful intervention samples that were included in the aggregate studies, mentioned in the introduction. Kuhn and St. Louis (2015) introduced a video intervention to 12 middle school respondents in a group setting without a teacher present and found that numerous students were not serious about the ratings. Several of the boys laughed when a youth in the film stuttered. A second group of 36 students watched the video with a teacher present with instructions to pay attention seriously. The first subgroup was categorized as U (unsuccessful); the second as MS (marginally successful). The Węsierska et al. (2015) high school students and university students who were either in two intervention groups or in two control groups also demonstrated the “crossover” effect. The high school student samples who were either shown a film or witnessed a presentation on stuttering were both classified as U, as were the university students who watched the film. The university students listening to the presentation were placed into the MS category. As in the second aggregate study (St. Louis, Aliveto, et al., 2024), the near equal OSS means of the pre-tests and post-tests occurred because the minimal changers did not change while the “crossover” scores of the negative changers and positive changers cancelled each other out. This also was true of the Kuwaiti teachers in the Abdalla and St. Louis (2014) study.
One of the primary purposes of the current study was to confirm or disconfirm the “crossover” effect in entirely different aggregate samples. The effect was observed quite similarly in the 2026 intervention and control samples as well, confirming its presence. However, a much larger proportion of its effect can be considered artifact due to lower pre versus post correlations in both the intervention and control groups. We conclude that the “crossover” effect, although present in the 2026 cohorts, was not as strong as in the earlier samples, only partly supporting our RQ2 hypothesis.
How can the pre versus post correlation discrepancy between the two cohorts be explained? The reason may, in part, lie in the fact that up to 10 years intervened between collecting data for samples in the two cohorts. During that period, greater international awareness of stuttering may have affected some respondents in various samples while the remaining respondents held to previously more uniform attitudes. Importantly, the correlation discrepancy was not the result of a few outliers in the contributing samples in the cohorts. Comparing pre versus post OSS correlations among the 16 intervention samples that made up the 2026 cohort, the mean was 0.377, the median was 0.383, the minimum was -0.051, and the maximum was 0.768. These compare to a mean, median, minimum, and maximum of 0.586, 0.625, 0.005, and 0.867, respectively, in the 29 samples in the 2016 cohort, which, except for the minimum, were uniformly higher. The same is true of the control groups. In the 2026 cohort, the correlations of the six individual samples were: mean = 0.469, median = 0.510, minimum = -0.013, and maximum = 0.741. The same correlations, respectively, for the twelve 2016 samples were 0.618, 0.646, 0.329, and 0.815, again uniformly higher than in the 2026 cohort.
Regression to the mean is a real, but often overlooked, phenomenon (Zhang & Tomblin, 2003). To our knowledge, what the few previous studies of attitudes toward stuttering labeled as the “crossover” effect has not been reported in any other social science context. Arguably, it is an epidemiological phenomenon in that it simply describes how a sample performs within a population in repeated measures. Speech-language pathologists do not encounter regression to the mean in any tangible way with their individual clients except, perhaps, in longitudinal tracking of progress or test results. Just as a baseball player with the worst batting average for one year will likely have a better batting average in the next year, excellent performance by a client on one isolated measurement will typically be followed by some diminished performance in the next measurement.
The POSHA–S is designed primarily as an epidemiological measure, that is, to measure attitudes of populations through sampling (St. Louis, 2015). To that extent, it might be argued that the “crossover” effect, as an expected but extreme case of regression to the mean, is not an issue in epidemiological sampling (Smith, 2016). However, virtually all the samples in either cohort were undertaken to improve attitudes toward stuttering or to compare such samples with no intervention. The ultimate stated or unstated goal of all of this research was that more accurate or sensitive beliefs and reactions would result in more informed and empathetic interactions with those who stutter and, thereby, improve their quality of life (e.g., Arnold et al., 2015; Barnett et al., 2005; Blood et al., 2011; Boyle & Cheyne, 2024; Bricker-Katz et al., 2013; Craig et al., 2009; Ham, 1990; Hughes, 2015; Iimura & Miyamoto, 2022; Kumar & Varghese, 2018; Norman et al., 2023; Turnbull, 2006; Weidner & St. Louis, 2023). As such, we submit emphatically that it does matter that a minority of respondents exposed to interventions do, in fact, begin with quite positive attitudes and then end up with quite dramatically worse attitudes in a subsequent administration of the POSHA–S, and vice versa. It is also important to be aware of the robust finding that only about one-third of people, or fewer, hold stable attitudes from test to retest after no intervention. In the 2016 controls, it was 34.5%; in the 2026 controls, it was 26.1%. The remainder split between those who change from very positive to very negative or from very negative to very positive. These low percentages indicate that the assumption that most respondents “follow the mean” in pre and post studies is simply not true for attitudes toward stuttering in POSHA–S studies. As St. Louis, Abdalla, et al. (2024) proposed, tailored interventions to target those negative, minimal, and positive changers should be undertaken. Gupta, St. Louis, Dutt, et al. (2025) reported the first attempt to tailor interventions to high, intermediate, or low scorers, but sorting pre scores into thirds did not affect the attitude changes differentially. Only sorting as was done in the two cohorts in this study revealed differences in respondent subgroups.
Is it possible that the “crossover” effect is a characteristic of only the POSHA–S? We do not believe that it is, but confirming research needs to be done. As a sample confirmation, we explored the pre and post results of 34 speech-language pathology students who filled out a 7-point bipolar adjective (or semantic differential) scale with 25 adjective pairs, e.g., “Friendly/Unfriendly” or “Talkative/Reticent” (Reichel & St Louis, 2004; St. Louis et al., 2014). First, each respondent’s pre-test mean score across all 25 ratings was calculated, and the respondents were then sorted from highest to lowest. In fact, their attitudes were less positive in the post-test than in the pre-test (4.06 versus 4.25, respectively). Next, the rank-ordered means were divided as closely as possible into thirds (11, 12, and 11) and then compared to post-test scores. A “crossover” effect occurred. The overall mean difference from pre to post was -0.19, but the third with the highest mean in the pre-test (4.73) had the lowest mean in the post-test (3.90) and the third with the lowest pre-test mean (3.75) had the highest post-test mean (4.23). The middle third’s pre- and post-test means were 4.30 and 4.06, respectively. The correlation between all the pre-test and post-test data was 0.506. Even so, “crossover” occurred even after correction for regression to the mean. While this post-hoc analysis was preliminary, it suggests that the “crossover” effect may be a common extreme version of regression to the mean in group attitude studies or related areas such as education.
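The semantic differential figures above can be checked against what regression to the mean alone would predict. The following Python sketch uses one common textbook approximation, in which a group's expected post-test mean equals the overall post-test mean plus r times the group's pre-test deviation from the overall pre-test mean. This approximation, and the names `expected_post` and `crossed_over`, are assumptions introduced here for illustration; the sketch is not necessarily the correction procedure the authors applied.

```python
def expected_post(pre, pre_mean, post_mean, r):
    """Post-test mean expected from regression to the mean alone
    (assumed approximation: deviation from the mean shrinks by factor r)."""
    return post_mean + r * (pre - pre_mean)


def crossed_over(pre, post, pre_mean, post_mean):
    """True if a group sits on opposite sides of the overall mean at pre and post."""
    return (pre - pre_mean) * (post - post_mean) < 0


# Means and correlation reported in the text for the 34 students.
PRE_MEAN, POST_MEAN, R = 4.25, 4.06, 0.506
thirds = {"highest": (4.73, 3.90), "middle": (4.30, 4.06), "lowest": (3.75, 4.23)}

report = {
    name: {
        "expected": round(expected_post(pre, PRE_MEAN, POST_MEAN, R), 2),
        "observed": post,
        "crossover": crossed_over(pre, post, PRE_MEAN, POST_MEAN),
    }
    for name, (pre, post) in thirds.items()
}
```

Under this approximation, regression alone would leave the highest third above the post-test mean (about 4.30) and the lowest third below it (about 3.81), whereas the observed means (3.90 and 4.23) fall on the opposite sides, consistent with a “crossover” beyond ordinary regression to the mean.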
Stuttering Item Contributions to Attitude Improvement and Differences in Change Groups
Despite exploring both the magnitude of item differences in both intervention and control samples from overall pre-test to post-test, and also among three pair-wise comparisons among the three change groups, strong likelihoods that specific items would be related to improvement or to change groups were not found. Nevertheless, the analyses revealed that items for which nearly all respondents agree on their positivity or negativity were the least likely to be different from pre to post overall or to differentially weight in the three change groups. For example, respondents in all the samples tended to be in agreement that stuttering is not caused by a fright or unseen spirits. They also agreed that they would try to ignore stuttering and not feel pity. They would believe that a person who stutters can do about any job, make friends, and live a normal life. Also, they would not think that other stutterers should help a stutterer or that they, themselves, would want to stutter.
Conversely, items that typically showed the least agreement among respondents were those items most likely to change overall in interventions or to vary among the three change groups. Respondents demonstrated considerable disagreement in the strength of their opinions about viral or disease causation or the extent to which they would be impatient or worry if their doctor stuttered. Uncertainty characterized their beliefs as to whether a stutterer is to blame for the stuttering, is nervous, or is shy or fearful. Not surprisingly, especially in the intervention sample, respondents changed ratings of their sources of knowledge about stuttering.
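The item comparisons discussed here are expressed as Cohen’s d effect sizes (see Figures 8 and 9 and Tables 3 and 4). As a minimal sketch of why high-consensus items contribute little, the Python example below computes a pooled-standard-deviation Cohen’s d for two invented items. The ratings are hypothetical, and the pooled-SD form and sign convention are assumptions rather than a description of the exact computation used in this study.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(group_a, group_b):
    """Cohen's d (mean of b minus mean of a) using the pooled sample SD."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * stdev(group_a) ** 2 + (n_b - 1) * stdev(group_b) ** 2
    ) / (n_a + n_b - 2)
    return (mean(group_b) - mean(group_a)) / sqrt(pooled_var)


# Hypothetical item ratings (not study data).
# High-consensus item: both change groups rate it almost identically.
consensus_negative = [90, 100, 100, 90]
consensus_positive = [100, 90, 100, 90]

# Low-consensus item: ratings are spread out and the groups differ.
split_negative = [-50, 0, 50, 0]
split_positive = [50, 100, 0, 50]

d_consensus = cohens_d(consensus_negative, consensus_positive)
d_split = cohens_d(split_negative, split_positive)
print(f"high-consensus item d = {d_consensus:.2f}")
print(f"low-consensus item d  = {d_split:.2f}")
```

With these invented ratings, the high-consensus item yields d = 0 and the low-consensus item yields d of about 1.22, illustrating how items on which respondents already agree have little room to separate the change groups.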
It should be emphasized that these item differences cannot be assumed to apply in every case. For example, rejection of the notions that people who stutter are nervous or excitable and that other stutterers should help a person who stutters was invariant from pre to post in the intervention sample (Figure 4), but not in the control group (Figure 6). This could be considered evidence that substantial differences in the attitudes of the two samples existed at the pre-test stage.
The POSHA–S was designed to tap into a wide variety of constructs relating to stuttering attitudes, which can include thoughts, feelings, and actions (St. Louis et al., 2008; St. Louis, 2025). In standard scoring and interpretation, the stuttering items are grouped according to topic within the two subscores of Beliefs and Self Reactions. The results of our analyses for RQ3 indicate that items within both the Beliefs and Self Reactions subscores were among both the least and most variable. Among the more variable Beliefs, it is striking that ratings were likely to change for several of the constructs traditionally referred to as the “stuttering stereotype” (Woods & Williams, 1976). These include the stereotype that stutterers are nervous, shy, reticent, weak, and psychologically involved. Apparently, these characteristics come to mind among the public, and even among professionals who treat people who stutter, when they think about stuttering. For example, Ruscello et al. (1989-90) found that when undergraduate and graduate students in speech-language pathology were asked simply to write characteristics of a “typical adult stutterer” and a “typical child stutterer,” the five most frequent traits for adults were frustrated, embarrassed, angry, nervous, and self-conscious. For the hypothetical child, they were frustrated, shy, embarrassed, nervous, and anxious.
Strengths, Limitations, and Future Research
Large aggregate studies such as this 2026 cohort and the earlier 2016 cohort rarely can analyze raw data from numerous unfunded international investigations. The typical alternative is to conduct a meta-analysis. This study is not a meta-analysis; all the raw data have been archived by the first author over more than 25 years. As such, it is best conceived of as an international, multi-site study reporting comparable results.
The main strength of our study is that it stands as a major replication and, to the best of our knowledge, only the second report of the unexpected finding of a “crossover” effect in respondents from a large, international dataset sorted for negative or positive change of attitudes toward stuttering in pre-post designs. On the other hand, the correction for regression to the mean resulted in a diminished effect due to unexpectedly low test-retest correlations. This is not a limitation of the study but a finding that calls for further investigation to explain why respondents were apparently less stable in their responding than those in samples taken a number of years earlier. Two lines of future research are thereby suggested: (a) to explore reasons for greater respondent variability and (b) to determine the extent to which the “crossover” effect occurs in other measures and for topics other than stuttering attitudes.
Inferences from our study of the 2026 cohort are limited by the fact that interventions were not at all uniform, as was the case for the 2016 cohort. Also, the 2026 intervention and control groups were unbalanced because some intervention studies lacked control groups, which may explain some of the variability in the control group versus the intervention group. Nevertheless, it can be regarded as a strength that the comparison between the 2026 and 2016 cohorts documented the overall success of a wide variety of interventions, with only a minority showing limited or no success.
As was called for in previous reports (e.g., St. Louis, Abdalla, et al., 2024), an especially productive line of future research would be to document the stability of changed attitudes over several successive administrations of a standard measure. One attempt with a non-intervention sample was greatly limited by lack of care in responding (Gupta, St. Louis, Dutt, et al., 2025). A related study with differential interventions showed an initial improvement and “crossover” effect that appeared to weaken over two successive administrations (Gupta, St. Louis, Rastogi, et al., 2025). These studies should be replicated.
Figure 1.
Summary of samples and respondents in the 2026 cohort showing sample sorts according to success in improving attitudes toward stuttering as well as individual respondents who changed negatively, minimally, or positively.
Figure 2.
Summary of samples and respondents in the 2016 cohort showing sample sorts according to success in improving attitudes toward stuttering as well as individual respondents who changed negatively, minimally, or positively.
Figure 3.
Pre- versus post-tests for the 2026 intervention group (upper left graph), 2016 intervention group (upper right graph), 2026 control group (lower left graph), and 2016 control group (lower right graph). On the left side of each graph are the actual OSSs for the three change groups (negative changers, minimal changers, and positive changers), and on the right side of each graph are the results after applying correction for regression to the mean for the negative and positive changers.
Figure 4.
Stuttering item mean scores for Beliefs in the 2026 intervention sample for negative, minimal, and positive changers shown within four components.
Figure 5.
Stuttering item mean scores for Self Reactions in the 2026 intervention sample for negative, minimal, and positive changers shown within four components.
Figure 6.
Stuttering item mean scores for Beliefs in the 2026 control sample for negative, minimal, and positive changers shown within four components.
Figure 7.
Stuttering item mean scores for Self Reactions in the 2026 control sample for negative, minimal, and positive sorts shown within four components.
Figure 8.
Pre-administration POSHA–S stuttering attitude items classified according to whether their Cohen’s d values for pairwise sorts of change groups (i.e., negative versus minimal changers, negative versus positive changers, and minimal versus positive changers) were within the lowest 51% versus highest 49% in low to high ranks for the intervention and control samples. Blue shading of the variables represents Beliefs items and tan shading represents Self Reactions items. Items are not rank ordered by Cohen’s d within each cluster of “All Lower,” “More Lower,” “More Higher,” or “All Higher.” Note: The word “reject” is not included in POSHA–S items. Scores for these items are inverted.
Figure 9.
Post-administration POSHA–S stuttering attitude items classified according to whether their Cohen’s d values for pairwise sorts of change groups (i.e., negative versus minimal changers, negative versus positive changers, and minimal versus positive changers) were within the lowest 51% versus highest 49% in low to high ranks for the intervention and control samples. The items are shown in the same order as for the pre-administration to better visualize changes from pre- to post-test. Blue shading of the variables represents Beliefs items and tan shading represents Self Reactions items. Items are not rank ordered by Cohen’s d within each cluster of “All Lower,” “More Lower,” “More Higher,” or “All Higher.” Note: The word “reject” is not included in POSHA–S items. Scores for these items are inverted.
Table 1.
Selected demographic summary of the intervention and control samples in the 2026 and 2016 cohorts.
| | 2026 | | 2016 | |
| --- | --- | --- | --- | --- |
| Variable | Intervention | Control | Intervention | Control |
| Age (Years) | 27.3 | 29.8 | 23.3 | 29.2 |
| Education (Years) | 14.3 | 14.6 | 13.3 | 13.9 |
| Relative Income Score (-100 to +100) | 3 | -7 | 6 | 6 |
| Male | 13% | 20% | 27% | 27% |
| Female | 87% | 80% | 73% | 73% |
| I am/have been married | 41% | 40% | 28% | 43% |
| Parent | 25% | 38% | 15% | 34% |
| Student | 65% | 55% | 88% | 60% |
| Working | 48% | 57% | 25% | 46% |
| Self-Identification | | | | |
| Multilingual | 56% | 75% | 48% | 46% |
| Intelligent | 33% | 33% | 35% | 34% |
| Left handed | 8% | 8% | 8% | 6% |
| Obese | 7% | 7% | 5% | 7% |
| Mentally Ill | 9% | 2% | 2% | 2% |
| Stuttering | 2% | 1% | 1% | 1% |
| No Persons Known | | | | |
| Intelligent | 2% | 4% | 1% | 3% |
| Left handed | 4% | 6% | 2% | 3% |
| Obese | 8% | 16% | 5% | 7% |
| Mentally Ill | 26% | 45% | 23% | 21% |
| Stuttering | 33% | 38% | 21% | 17% |
Table 2.
Pre-test, post-test, and difference (post minus pre) values for the three POSHA–S subscores and OSSs of the 2026 and 2016 cohorts.
| | Intervention | | | Control | | |
| --- | --- | --- | --- | --- | --- | --- |
| | Pre | Post | Difference | Pre | Post | Difference |
| 2026 Cohort | | | | | | |
| Obesity/Mental Illness | -23 | -22 | 1 | -27 | -27 | 0 |
| Beliefs | 38 | 52 | 14 a | 32 | 37 | 5 |
| Self Reactions | 12 | 24 | 12 b | 7 | 10 | 3 |
| Overall Stuttering Score | 25 | 37 | 12 c | 19 | 23 | 4 |
| 2016 Cohort | | | | | | |
| Obesity/Mental Illness | -33 | -30 | 3 | -36 | -31 | 5 |
| Beliefs | 34 | 44 | 10 d | 29 | 30 | 1 |
| Self Reactions | 1 | 11 | 10 e | -8 | -8 | 1 |
| Overall Stuttering Score | 18 | 27 | 10 f | 10 | 11 | 1 |
Table 3.
Number of identical or similar ranks of 39 stuttering attitude items to overall pre- versus post-test Cohen’s d’s after being sorted from lowest to highest for similarly sorted d’s for the three pair-wise comparisons of the three change groups (negative versus minimal, negative versus positive, and minimal versus positive). Shown are (a) the numbers of identical ranks, (b) the numbers and means of approximate quarters of the items, and (c) numbers and means of approximate halves of the items.
| | Frequency of Occurrence in Three Sorts | | | | | | | | |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Number of Occurrences | Same Rank | Within Lowest Quarter (10) | Within 2nd Lowest Quarter (10) | Within 2nd Highest Quarter (9) a | Within Highest Quarter (10) | Mean Quarter Occurrences | Within Lower Half (20) | Within Higher Half (19) a | Mean Half Occurrences |
| Intervention: Cohen’s d (Pre) | | | | | | | | | |
| 1 | 4 | 17 | 10 | 18 | 6 | 12.75 | 13 | 10 | 11.5 |
| 2 | 1 | 2 | 10 | 3 | 9 | 6.00 | 10 | 13 | 11.5 |
| 3 | 0 | 3 | 0 | 1 | 2 | 1.50 | 9 | 7 | 8.0 |
| Intervention: Cohen’s d (Post) | | | | | | | | | |
| 1 | 1 | 13 | 19 | 14 | 8 | 13.50 | 13 | 7 | 10.0 |
| 2 | 0 | 7 | 4 | 5 | 5 | 5.25 | 7 | 13 | 10.0 |
| 3 | 0 | 1 | 1 | 1 | 4 | 1.75 | 11 | 8 | 9.5 |
| Control: Cohen’s d (Pre) | | | | | | | | | |
| 1 | 3 | 20 | 11 | 13 | 8 | 13.00 | 21 | 3 | 12.0 |
| 2 | 0 | 2 | 5 | 7 | 11 | 6.25 | 3 | 21 | 12.0 |
| 3 | 0 | 2 | 3 | 0 | 0 | 1.25 | 11 | 4 | 7.5 |
| Control: Cohen’s d (Post) | | | | | | | | | |
| 1 | 2 | 12 | 15 | 15 | 9 | 12.75 | 16 | 4 | 10.0 |
| 2 | 0 | 3 | 6 | 6 | 9 | 6.00 | 4 | 16 | 10.0 |
| 3 | 0 | 4 | 1 | 0 | 1 | 1.50 | 12 | 7 | 9.5 |
Table 4.
Correlation coefficients of Cohen’s d effect size of pre-test scores (upper) and post-test scores (lower) for 39 stuttering attitude items in intervention (left) and control (right) groups for three comparisons: negative changers versus minimal changers, negative changers versus positive changers, and minimal versus positive changers. First, correlations are shown for sorts within each of the same three sorts; second, correlations are shown separately for negative, minimal, and positive changers within the three different sorts.
| | Cohen’s d Differences Pre: Intervention | | | Cohen’s d Differences Pre: Control | | |
| --- | --- | --- | --- | --- | --- | --- |
| | Negative vs Minimal | Negative vs Positive | Minimal vs Positive | Negative vs Minimal | Negative vs Positive | Minimal vs Positive |
| Within Same Negative, Minimal or Positive Sorts (Equal in All Three) | 0.376 | 0.006 | 0.538 a | 0.307 | 0.290 | 0.288 |
| | Negative Sort & Minimal Sort | Negative Sort & Positive Sort | Minimal Sort & Positive Sort | Negative Sort & Minimal Sort | Negative Sort & Positive Sort | Minimal Sort & Positive Sort |
| Negative Changers Within Different Sorts for Negative, Minimal and Positive Change | 0.290 | 0.000 | -0.367 | -0.135 | 0.152 | -0.079 |
| Minimal Changers Within Different Sorts for Negative, Minimal and Positive Change | 0.179 | -0.303 | -0.108 | -0.047 | 0.397 | 0.021 |
| Positive Changers Within Different Sorts for Negative, Minimal and Positive Change | 0.124 | -0.113 | -0.311 | 0.227 | -0.014 | 0.066 |
| | Cohen’s d Differences Post: Intervention | | | Cohen’s d Differences Post: Control | | |
| | Negative vs Minimal | Negative vs Positive | Minimal vs Positive | Negative vs Minimal | Negative vs Positive | Minimal vs Positive |
| Within Same Negative, Minimal or Positive Sorts (Equal in All Three) | 0.421 a | 0.081 | 0.767 a | 0.624 a | 0.353 | 0.293 |
| | Negative Sort & Minimal Sort | Negative Sort & Positive Sort | Minimal Sort & Positive Sort | Negative Sort & Minimal Sort | Negative Sort & Positive Sort | Minimal Sort & Positive Sort |
| Negative Changers Within Different Sorts for Negative, Minimal and Positive Change | 0.028 | 0.071 | -0.086 | -0.097 | -0.293 | 0.184 |
| Minimal Changers Within Different Sorts for Negative, Minimal and Positive Change | 0.167 | -0.023 | -0.319 | 0.214 | 0.041 | -0.092 |
| Positive Changers Within Different Sorts for Negative, Minimal and Positive Change | 0.346 | -0.147 | -0.127 | 0.305 | -0.012 | -0.190 |