A Learning Interaction Between Statistical Learning Experiments

When participants in a statistical learning paradigm are asked to learn from two incompatible or competing inputs, they often fail to learn from one or both inputs. This study presents the results of two experiments that were both completed by one group of typically developing four- and five-year-old children. One experiment targeted word-medial consonant patterns (phonotactics), whereas the other targeted strong-weak and weak-strong stress patterns (prosody). The order of the experiments was critical for learning outcomes in the phonotactics experiment: When children learned phonotactics first, their production accuracy increased following exposure to a high frequency input. When children learned phonotactics second, however, their production accuracy dropped when they were exposed to the high frequency input. Results from the prosody experiment were inconclusive, with limited evidence of any learning effect. Overall, the results suggest that children may conflate learning experiences, and patterns learned from an initial experimental input compete with patterns in a subsequent experiment. When considering natural language acquisition, the results suggest that an isolated episode of learning may lead to generalizations that are incompatible with later input, and possibly, with larger patterns in the language.


Introduction
Statistical learning has for several decades been a highly productive framework for studying the learning of a variety of linguistic structures, as well as structured visual input (for an overview see Saffran & Kirkham, 2018). An emerging area of statistical learning research investigates learning from an input containing multiple patterns. Multilingual learners represent one instance of this challenge: their input includes two separate languages that must be segregated such that each language can be learned. However, even when we consider a monolingual learner who is only exposed to one dialect of one language, there are myriad linguistic patterns present in that input. In the domain of phonology, an input contains prosodic and segmental cues, within- and between-word phonotactics, and morphologically conditioned phonological patterns such as the patterns for pluralization in English. Although it is clear that infants are able to solve many learning challenges, and even infants learning three or more languages are ultimately successful, much remains to be understood about the process of learning from complex, multidimensional inputs.
The focus of this study is on learning two phonological structures across separate experiments. More specifically, four- and five-year-old children were tasked with learning phonotactic patterns in one experiment and prosodic patterns in the other. Anticipating the results, learning depended on the order in which the experiments were completed. This effect of order is important because it signals that statistical learning can answer questions not only about what can be learned, but also about how episodes of learning interact with each other. Put slightly differently, our study suggests that statistical learning research that incorporates multiple patterns can be used to explore learning of a pattern even when it is absent from, irrelevant to, or in conflict with the current learning episode.
In our review of the literature, we cover several studies with multidimensional inputs.
Learners are typically presented with two incompatible or competing inputs. In general, this literature suggests that it is difficult for learners to simultaneously retain knowledge of both inputs. Furthermore, success is often driven by the inputs' phonological properties. Weiss, Gerfen, and Mitchel (2009) report four statistical word segmentation experiments exploring how adults interpret and learn from two inputs presented in sequence. In statistical word segmentation tasks, participants hear a continuous stream of syllables like bətigusɪtʃəvivʊbosætogʊtʃa. Some syllables always occur in a sequence, such as bə, ti, and then gu, meaning that bətigu functions like a word in the stream. However, Weiss et al. interleaved two inputs in the stream such that bətigu functioned as a word during some sections of the stream but not others. The participants were tested on their ability to discriminate words like bətigu from part-words like tigusɪ. Adults learned the patterns from both inputs when each input was spoken by a different talker, but not when the same talker produced both inputs. Weiss et al. suggest that learning fails when participants conflate the statistics of the two inputs, as may be expected when all stimuli come from a single talker. Gebhart, Aslin, and Newport (2009) conducted a similar statistical word segmentation study with adults. Participants heard just one talker produce both inputs, but the inputs were blocked such that participants heard one in its entirety and then the other. Participants typically only learned the pattern in the first input, although learning of the second pattern was observed when participants were explicitly told to listen for two distinctly patterned inputs, when they heard a pause between each input, as well as when participants heard the second input for three times longer than the first. Gebhart et al.
conclude that statistical learning from two inputs shows a primacy effect: the first of two structurally different streams is likely to be remembered, but not the second. This primacy effect may reflect a learning bias that favors what comes first (see Bulgarelli & Weiss, 2016 for additional discussion), but the authors also propose that, beyond retention, the learning of the first language interferes with learning of the second.

Interactions in Statistical Learning
Turning to statistical learning in infants, Benitez, Bulgarelli, Byers-Heinlein, Saffran, and Weiss (2020) observe that 8-month-olds struggle to learn the statistics of the second of two syllable streams. Across several experiments, infants failed to reliably extract words from the second input, even when each input was signaled by a different pitch quality and accent.
Although this study does not provide direct evidence for a primacy effect, the first experiment demonstrated that each input was learnable when presented in isolation. Thus, it was the presence of two competing inputs-each with its own phonological patterns-that impeded learning, consistent with Gebhart et al.'s (2009) proposal of a primacy effect.
In contrast to the work with adults, Benitez et al. (2020) observed that, for infants, indexical cues like pitch and accent were insufficient to allow learning of the second of two inputs. A relative weakness of indexical cues was also reported in an infant study by Potter and Lew-Williams (2019). Those authors explored how infants use different types of cues to attune to linguistic structure. They exposed infants to one structured input (either AAB as in le-le-di or ABA as in le-di-le); that input was embedded in the middle of an unstructured input (16 trisyllables without an internal pattern, such as foi-nah-vuh). The authors then varied the cues that signaled the structure-a unique talker, a unique phoneme inventory, or both. When the structured input comprised unique phonemes, infants learned it regardless of whether it was produced by the same talker that produced the unstructured input. Without unique sounds, however, infants were not able to use talker as a cue to learn the structured language. Potter and Lew-Williams conclude that infants can learn a pattern in the presence of a competing input, but it appears that phonological cues like the phoneme inventory are more informative of the target pattern than indexical cues like talker.
In a study involving similar phonological cues to those examined here, Thiessen and Saffran (2003) observe developmental changes to the phonological cues to which learners attend.
Those authors conducted a statistical word segmentation study in which infants were exposed to just one input, for example, where da was always followed by pu, and bu by go. However, some infants heard a syllable stream in which stress was consistent with English (strong-weak stress on DApu and BUgo) whereas other infants heard a stress pattern uncommon in English (weak-strong stress on daPU and buGO). Nine-month-old infants appeared to ignore statistical cues and instead segment words based entirely on the stress pattern. In contrast, seven-month-old infants used the statistical cues regardless of the stress pattern. Thiessen and Saffran argue that the statistics of syllable order may be an earlier developing phonological cue to word boundaries, but by nine months, word stress is the primary phonological cue for segmenting the speech stream.
Finally, phonology played a surprising role in what infants learned in a study by Gerken and Quam (2017). In this study, 11-month-olds were exposed to just one input. Infants heard novel CVCV words containing a target phonological pattern, either shared place of articulation (poba contains two labials) or shared voicing (dova contains two voiced consonants). Although only one pattern was present in the exposure words, some infants heard the words in an order that allowed for a local phonological generalization, for example, when two or three adjacent words started with the same consonant. When local generalizations were present, infants did not appear to learn the more global phonological patterns for place of articulation or voicing. When those local generalizations were removed, however, infants learned the more general patterns.
In sum, learning from two incompatible or competing inputs poses a challenge to learners across the lifespan. Although adults are sometimes able to learn patterns across multiple inputs, they often only learn the first pattern presented. Furthermore, both adults and infants rely on phonological cues that signal the pattern in the input. When two patterns are present, phonology may help learners track both, but some phonological cues appear to outweigh others or lead to unintended generalizations. This final point, that learners may apply generalizations unexpectedly, is especially relevant to the present study. Our focus was on the ability of preschool-aged children to learn and apply two distinct phonological patterns to their own speech (prosodic and segmental patterns, similar to Thiessen and Saffran, 2003). Although the patterns were distinct, children completed both experiments, allowing us to examine unintended generalizations across experiments.

Method
Two experiments-one targeting phonotactics and the other prosody-were originally designed to be interpreted separately, and they focus on different dependent measures to track learning.
However, individual participants completed both experiments, and the order of experiments was counterbalanced across participants. This counterbalancing allowed us to examine an interaction based on experiment order, which is most readily interpretable from the perspective of a single study. Thus, we present both experiments under a single Method section.

Participants
A total of 41 children between the ages of 4 and 5 years (see Table 1) were recruited for the study. Ten children were not included in the analyses because they did not participate in all five sessions, leaving one or both experiments incomplete. Two additional children were removed because standardized testing indicated that they had a speech sound disorder. The remaining 29 children (17 females and 12 males) were included in analyses.
All participants met the following criteria for typical development. All children passed a hearing screening of pure tones at 500, 1000, 2000, and 4000 Hz at 20 dB. All children received standardized test scores at or above one standard deviation below the mean (standard scores above 85). Additionally, parents were asked about the child's development, and for the 29 participants included, no concerns were raised.
Normative data were collected across a range of areas: speech production (Goldman-Fristoe Test of Articulation-2; GFTA-2; Goldman & Fristoe, 2000), nonverbal skill (Columbia Mental Maturity Scale; CMMS; Burgemeister, Blum, & Lorge, 1972), receptive vocabulary (Peabody Picture Vocabulary Test-4; PPVT-4; Dunn & Dunn, 2007), expressive vocabulary (Expressive Vocabulary Test; EVT; Williams, 2007), expressive syntax (Structured Photographic Expressive Language Test-3; SPELT-3; Dawson, Stout, & Eyer, 2003), and nonword repetition accuracy (Dollaghan & Campbell, 1998). Table 1 below provides these normative data, as well as the participants' mean age in months, the age range, and average accuracy in the two experiments. Because the critical variable in this study is the experiment order, the normative data are presented separately for the phonotactics first and prosody first groups, and a t-test comparison between the groups is presented in the rightmost column. No significant differences (p < .05) were observed. We also note that scores from the CMMS, PPVT-4, the EVT, and the SPELT-3 indicate that this group of participants possessed above-average cognitive and language skills.

Note. SPELT-3 = Structured Photographic Expressive Language Test-3. a The number in parentheses for ages in months is the range of ages rather than the standard deviation. b Accuracy in the phonotactics experiment is on a scale from 0-6. c Accuracy in the prosody experiment was averaged across 2-syllable words (scale 0-6) and 4-syllable words (scale 0-12), resulting in a derived scale of 0-9.
*No statistical comparisons of the phonotactics first and prosody first orders were significant.

Materials
This study relies on speech production to measure learning. Children under the age of six still produce speech errors and are developing their knowledge of phonology (McLeod & Crowe, 2018). Thus, we ask whether passive, perceptual learning from a familiarization input influences children's production accuracy for test items. The target patterns that children produced were word-medial consonant sequences in the phonotactics experiment, as well as strong-weak and weak-strong stress patterns in the prosody experiment. The familiarization and test materials for both the phonotactics and prosody experiments are presented in Table 2. In the phonotactics experiment, the learning targets were word-medial consonant sequences (Munson, 2001). The targets appeared in nonsense words (hereafter referred to as "items") with a CVCCVC shape and stress on the first syllable. All items started with a unique CV sequence and differed from other items by at least three phonemes. Phonotactics familiarization and test items were paired with colorful make-believe animals (Ohala, 1999), and children were told that the nonsense words were the names of the animals. The four target consonant sequences were chosen because consonant sequences are relatively difficult, making it likely that children would sometimes produce them in error, and learning could be measured. We note that data from the phonotactics experiment (when that experiment was completed first) are reported in Richtsmeier and Goffman (2017). All other data have not been reported elsewhere.
In the prosody experiment, the learning targets were prosodic contours, that is, one of two different stress patterns. The first pattern was strong-weak (SW), such as on the noun REC-ord; the second pattern was weak-strong (WS) as on the verb re-CORD. These patterns appeared in both 2-syllable and 4-syllable items composed of CV syllables, or four total targets. The prosody familiarization and test items were paired with colorful aliens (Gupta et al., 2004). Prosodic contours were chosen because developmental data show that children have not yet reached adult levels of mastery, as indicated by omissions of unstressed syllables as well as acoustic and motor analyses of the WS stress pattern (Ballard, Djaja, Arciuli, James, & van Doorn, 2012; Gladfelter & Goffman, 2013; Goffman, 1999; Goffman, Gerken, & Lucchesi, 2007; Goffman & Malin, 1999).

Recordings of all familiarization and test items were obtained from seven adult female speakers of a Midwestern dialect of American English. The recordings were made in a sound booth following model productions made by the first author. This process was implemented to ensure that acoustic cues for medial consonants and the prosodic contours were produced faithfully. Recordings were later scrubbed of acoustic artifacts and scaled for intensity using Praat software (Boersma & Weenink, 2021). Productions from five of the talkers were used for the familiarization items; productions from the other two talkers were used for the test items.
Experimental Frequency. In both experiments, participants were familiarized with the learning targets during a perceptual familiarization phase, and the experimental frequency of the targets varied as a within-subjects factor, with two targets in the low experimental frequency condition and two in the high experimental frequency condition. Children were familiarized with low experimental frequency targets in just one familiarization item (the items in italics in Table 2). Participants heard that item five times from a single talker. High experimental frequency targets appeared in three familiarization items. Participants heard each item five times, each from a different talker. Thus, high experimental frequency was a combination of high word-type frequency and talker variability (Richtsmeier, Gerken, & Ohala, 2011).
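The two familiarization conditions can be sketched as follows. The item and talker labels are hypothetical placeholders (the real items were CVCCVC nonwords paired with referents), and reading "each from a different talker" as one token per familiarization talker is our assumption.

```python
def familiarization_tokens(items, talkers):
    """Build the (item, talker) token list for one target pattern.

    Low experimental frequency: one item, five repetitions by a single talker.
    High experimental frequency: three items, five repetitions each, with each
    repetition produced by a different talker (read here as one token per talker).
    """
    tokens = []
    for item in items:
        if len(items) == 1:  # low frequency: five tokens from one talker
            tokens += [(item, talkers[0])] * 5
        else:                # high frequency: one token from each of five talkers
            tokens += [(item, talker) for talker in talkers]
    return tokens

# Hypothetical labels standing in for the actual stimuli.
low = familiarization_tokens(["item_a"], ["talker_1"])
high = familiarization_tokens(["item_b", "item_c", "item_d"],
                              ["talker_1", "talker_2", "talker_3",
                               "talker_4", "talker_5"])
print(len(low), len(high))  # 5 tokens in the low condition, 15 in the high
```

The sketch makes the asymmetry concrete: a low-frequency target is heard 5 times from one voice, whereas a high-frequency target is heard 15 times across three word types and five voices.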
Several previous studies suggest that high experimental frequency can lead to greater production accuracy in children (Edeal & Gildersleeve-Neumann, 2011;Plante, Bahl, Vance, & Gerken, 2011;Richtsmeier, Gerken, Goffman, & Hogan, 2009;Richtsmeier & Moore, 2020), a finding that is consistent with the augmentative effect that high natural language frequency has on production accuracy (Beckman & Edwards, 2000;Edwards, Beckman, & Munson, 2004;Masdottir & Stokes, 2016;Munson, 2001;Storkel, 2015). The assignment of items to the two experimental frequencies was counterbalanced across four lists. The four lists also allowed make-believe animals in the phonotactics experiment and aliens in the prosody experiment to be assigned to different items.

Procedure
The procedures, including informed consent, were approved by the Institutional Review Board at Purdue University. Children participated over five weeks, one visit per week. The first session included only testing; participants completed a hearing screening, the SPELT-3 language test, and the CMMS nonverbal skill test. Other normative data were collected following the experiment during the other four sessions. The first experiment was completed at the start of the second and third sessions, and the second experiment was completed at the start of the fourth and fifth sessions. Similar numbers of children completed the phonotactics experiment first (n = 16) or the prosody experiment first (n = 13). All sessions were held in a quiet room in a university building. Throughout each session, participants were seated in a Rifton chair with an attachable tabletop, approximately 10 feet from a monitor and speakers. Caregivers were seated nearby.
Before the start of the phonotactics experiment, the experimenter explained that the child would hear the names of "funny, make-believe animals", and that the child's task during familiarization was to watch the animals and listen to their names. Before the start of the prosody experiment, the experimenter explained that the child would hear the names of "aliens from another planet" and that they should watch and listen during the familiarization. Thus, the instructions were similar except for the referents to be learned. The second experiment was nevertheless described as a new experiment and different from what had come before.
The experiments were controlled by Paradigm software (Paradigm, 2015). Each experiment began with familiarization, during which Paradigm presented items in random order.
Familiarization was immediately followed by the first test block, and the second test block was completed in the subsequent session a week later. Test items were presented in a predetermined, pseudorandom order, and the same word was repeated no more than twice in a row.
During test blocks, participants were told that they would repeat the names of some new animals or aliens. Children heard and repeated each item immediately. Although children typically required one or two prompts for the first few productions of the first test block, they eventually learned the task and were able to proceed without prompts. Children had nine opportunities to produce each item during a test block. Cases where children did not produce an item were minimal (20 missing productions for the phonotactics experiment; 13 missing productions for the prosody experiment; less than 1% of all attempts), and for all participants, there were always five or more productions of each word in each test block.

Analysis
Children's productions were recorded digitally for transcription and acoustic analysis. The dependent measure of interest for the phonotactics experiment was production accuracy of the word-medial consonant sequence and was based on transcription. Transcriptions were made by the first author and were converted to points based on a system adapted from Edwards et al. (2004).

Dependent measures of interest for the prosody study included three ratios of different acoustic markers of stress: ratios of duration, pitch/fundamental frequency, and amplitude (for example, Kehoe, Stoel-Gammon, & Buder, 1995). Omitted or inaudible syllables, based on transcriptions by the first author, comprised a fourth dependent measure. The acoustic measures were analyzed using Praat software (Boersma & Weenink, 2021). The beginning and ending of vowels were first demarcated. Durations were equivalent to the lengths of the demarcated vowel regions. Pitch and intensity were operationalized as the averages across the vowel region. Ratios were then calculated by dividing the value of the first syllable by the value of the second syllable, or σ1/σ2. For four-syllable words, two ratios were collected: σ1/σ2 and σ3/σ4.
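The ratio computation can be sketched in a few lines. The duration values below are hypothetical and serve only to illustrate the σ1/σ2 calculation and, for four-syllable items, the averaging of σ1/σ2 and σ3/σ4:

```python
def syllable_ratios(values):
    """Pairwise stress ratios (σ1/σ2, σ3/σ4, ...) from per-syllable measures."""
    if len(values) % 2 != 0:
        raise ValueError("expected an even number of syllables")
    return [values[i] / values[i + 1] for i in range(0, len(values), 2)]

def mean_ratio(values):
    """Average the pairwise ratios, e.g. ([σ1/σ2 + σ3/σ4]) ÷ 2 for 4-syllable items."""
    ratios = syllable_ratios(values)
    return sum(ratios) / len(ratios)

# Hypothetical vowel durations (ms). A ratio below 1.0 marks a weak-strong pattern.
ws_two_syllable = [120.0, 260.0]                 # second syllable much longer
sw_four_syllable = [200.0, 180.0, 210.0, 190.0]  # stronger first syllable in each pair

print(round(mean_ratio(ws_two_syllable), 2))   # 0.46
print(round(mean_ratio(sw_four_syllable), 2))  # 1.11
```

The same computation applies unchanged to pitch and intensity, substituting vowel-region averages for durations.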
Ratios were not calculated for 555 missing productions (26.6%) including productions that children did not make, productions in which one syllable was omitted, or productions that were missing due to experimenter error. From the remaining 1,533 productions, pitch and intensity ratios were removed if the production was whispered, made in creaky voice, contained acoustic artifacts like foot tapping, or was more than 2 standard deviations from the participant's mean. There were 48 duration ratios (2.1%), 215 pitch ratios (9.4%), and 53 intensity ratios (2.3%) removed for these reasons. Due to missing data, mean pitch ratios were missing for 12 words across 8 participants, and mean intensity ratios were missing for 5 words across 4 participants; all mean duration ratios could be calculated. We also note that the recordings for six participants (all of whom completed the prosody experiment first) contained a high-frequency artifact resembling a square wave. It was created by a sound mixer for unknown reasons.
Because the noise started at 1000 Hz, it was not expected to interfere with the measures of pitch and intensity. Data from these six participants were therefore included in the acoustic analyses.
A summary of the inferential statistical analyses is presented in Table 3. The transcription-based points from the phonotactics experiment, as well as the three acoustic ratios and the omitted-syllable counts from the prosody experiment, were entered separately into linear mixed-effects models in R statistical software using the lmerTest package (Kuznetsova, Brockhoff, Christensen, & Jensen, 2020). Mixed effects models are ideal for evaluating incomplete data sets such as the prosody dataset here. We followed recommendations for mixed-model analyses described by Baayen, Davidson, and Bates (2008). In particular, we began with baseline models of main effects that were then compared with more specific models containing interactions. For the baseline models, the main effects of experiment order, experimental frequency, and session were included for both experiments; stress pattern was included as an additional main effect for the prosody experiment. Random effects for participant intercepts were included in all models. To evaluate the presence of interactions between experimental frequency and experiment order, a second model was assessed for each experiment in which experimental frequency and experiment order were allowed to interact. The two models were then compared using a likelihood ratio test that was implemented with the anova function in R (Kuznetsova, Brockhoff, & Christensen, 2017), and the optimal model was interpreted for significant effects.

Phonotactics Experiment

The baseline mixed effects model included experiment order, experimental frequency, and session as main effects. Using a log likelihood ratio test, the baseline was then compared with an alternative model in which experiment order and experimental frequency were allowed to interact. The results of the model comparison appear in Table 4 below. The alternative model had lower information criterion scores (AIC and BIC) and a higher log likelihood value. Furthermore, it explained the data significantly better (χ2 = 29.98, df = 1, p < .001) and is therefore summarized in Table 5 below. Although there was a main effect of experimental frequency (β = .29, SE = .05, t = 5.79, p < .001), the interaction of experimental frequency and experiment order was significant (β = -.37, SE = .07, t = -5.50, p < .001). To better understand that interaction, separate mixed-effects analyses were completed to examine experimental frequency and session in each experiment order condition. Accuracy was marginally lower in the low experimental frequency condition for the phonotactic learning first data (β = -.08, SE = .05, t = -1.66, p = .097); accuracy was significantly higher in the low experimental frequency condition for the phonotactic learning second data (β = .29, SE = .04, t = 6.51, p < .001). The results for the phonotactic learning second condition are surprising because high experimental frequency has typically been reported to increase children's production accuracy relative to low experimental frequency (for example, Plante et al., 2011; Richtsmeier et al., 2009; Richtsmeier & Goffman, 2017; Richtsmeier & Good, 2018).
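The likelihood ratio test used for these model comparisons can be reproduced with a short calculation. For nested models differing by one parameter (df = 1), the chi-square survival function reduces to erfc(√(x/2)), so only the standard library is needed; the χ2 value below is the one reported for the phonotactics comparison.

```python
import math

def likelihood_ratio(ll_baseline, ll_alternative):
    """LR statistic: twice the log likelihood difference of two nested models."""
    return 2.0 * (ll_alternative - ll_baseline)

def chi_sq_p(chi_sq, df=1):
    """p-value for a chi-square statistic. This sketch only handles df = 1,
    where the survival function equals erfc(sqrt(x / 2))."""
    if df != 1:
        raise NotImplementedError("sketch covers only the df = 1 comparisons used here")
    return math.erfc(math.sqrt(chi_sq / 2.0))

# Reported phonotactics comparison: chi-square = 29.98, df = 1.
print(chi_sq_p(29.98) < .001)  # True: the interaction model fits significantly better

# Reported duration-ratio comparison: chi-square = 0.25, df = 1 (p near .6).
print(round(chi_sq_p(0.25), 2))
```

In practice this is what R's anova function computes when comparing the baseline and interaction lmer models fit by maximum likelihood.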

Prosody Experiment
Figure 2 presents a summary of the three acoustic ratios. Mean ratios were calculated by averaging across productions within and across sessions; for the four-syllable words /mifəpoʊbə/ and /pəfɑməbeɪ/, ratios were also averaged across the two syllable groups ([σ1/σ2 + σ3/σ4] ÷ 2).
Visual analysis of the ratios suggests robust acoustic contrasts when comparing the SW stress pattern to the WS pattern, but no consistent differences related to experimental frequency.

The baseline mixed effects model included experiment order, experimental frequency, session, and stress pattern as main effects. As in the analysis of the phonotactics experiment, the baseline was then compared, using a log likelihood ratio test, with an alternative model in which experiment order and experimental frequency were allowed to interact. For duration and intensity ratios, the alternative models did not provide a better fit (χ2duration = 0.25, df = 1, p = .615; χ2intensity = 0.004, df = 1, p = .949). In the case of pitch ratios, the model with the interaction provided a significantly worse fit compared to the baseline model (χ2pitch = 7.55, df = 1, p = .006). Thus, in the analyses of the three acoustic correlates of stress, experimental frequency did not interact with experiment order. The ANOVAs comparing the baseline and alternative models, as well as the full baseline models of all three ratios, are presented in Appendix A. Here, we present a brief summary of the findings.
There were consistent differences between the SW and WS stress patterns for all three acoustic measures. Regarding durations, participants produced ratios near 1.0 for the SW stress pattern, but ratios less than .5, indicating second syllables more than twice as long as first syllables, for the WS pattern (βduration = -0.70, p < .001). Pitch was higher on the first syllable for all items (ratios > 1.0) but highest for the SW pattern (βpitch = -0.06, p < .001). Intensity ratios were greater than 1.0 for the SW pattern, indicating a louder first syllable. Intensity ratios were lower for the WS pattern and averaged below 1.0 for /pəfɑməbeɪ/ (βintensity = -0.10, p < .001). Most importantly, there was no significant effect of experimental frequency in any analysis (all ps > .100). This final result indicates that the acoustic ratio analyses lacked a learning effect attributable to the familiarization and the relative frequencies of the different stress patterns.
The final analysis of the prosody experiment considered inaudible or omitted syllables. Figure 3 presents the number of omitted syllables for each of the four test items. With just 14 omitted syllables observed, the dataset is quite small. Furthermore, one child provided 8 of the 9 omitted syllables for /pəfɑməbeɪ/ in the high experimental frequency, prosody learning second condition. The low number of omitted syllables is to be expected given that the children were all typically developing, and more than half were over the age of four. By that age, syllable omission is quite rare in typically developing children (Roberts, Burchinal, & Footo, 1990).
Here, we present a summary of the findings from the baseline model. To summarize the results of the prosody experiment, there were consistent differences related to the SW and WS stress patterns, but there was no significant interaction between experiment order and experimental frequency. More generally, with no observed effect of experimental frequency, there was limited evidence for learning of the stress patterns.

General Discussion
In this study, children completed two statistical learning experiments-one focused on learning phonotactics in the form of word-medial consonant sequences, the other focused on learning prosodic modulation in the form of SW and WS stress patterns. Learning was probed by a comparison of high and low experimental frequency conditions. In the high experimental frequency condition, learning targets appeared in multiple items produced by multiple talkers; in the low experimental frequency condition, targets appeared in a single item produced by one talker. A relative difference in the high and low experimental frequency conditions was taken to signal learning. The key finding is that learning of the phonotactic sequences was influenced by the order in which participants completed the two experiments. When participants completed phonotactic learning first, there was a trend towards greater accuracy following high experimental frequency exposure. In contrast, when participants completed phonotactic learning second, participants were significantly more accurate following the low experimental frequency exposure. Therefore, the effect of experimental frequency, and by extension learning, varied depending on the order in which the two experiments were completed. An experiment order by experimental frequency interaction was not observed in the prosody experiment, nor was there a main effect of experimental frequency. Below we describe several reasons why the effect may have been limited to the phonotactics experiment.
The interaction of experiment order and experimental input frequency is consistent with the literature reviewed in the Introduction. Benitez et al. (2020), Gebhart et al. (2009), Potter and Lew-Williams (2019), and Weiss et al. (2009) all found that conflicting language inputs were challenging for participants. In the adult studies, participants often only learned the first of two artificial languages presented in succession. The infant study by Benitez et al. was consistent with the same "primacy effect". The present study may also reflect a primacy effect. When participants completed the phonotactics experiment first, the effect of experimental frequency was consistent with previous findings. It was only when the experiment was completed second that a surprising finding arose.
The results are also consistent with various phonological cues being central to the learnability, or difficulty, of a multidimensional input. When searching for word boundaries, Thiessen and Saffran (2003) find that 7-month-old infants rely on statistical cues, but 9-month-olds ignore the statistics and instead rely on prosodic cues. Gerken and Quam (2017) report that infants are sometimes misled by narrow phonological generalizations, such as repeated word-initial consonants. Potter and Lew-Williams' (2019) infants demonstrate that learning of a structured input, surrounded by unstructured input, is possible when the structure is signaled by a unique inventory of phonemes. Here, the phonotactics and prosody experiments targeted different phonological structures (Kenstowicz & Kisseberth, 2014). Phonotactic generalizations often occur at a segmental level. Prosodic generalizations, in contrast, occur at a metrical or intonational level that spans multiple syllables. Additionally, a variety of cues were given to participants so that they might treat the experiments as separate. These cues included time (a week between experiments), visual referents (make-believe animals for phonotactics and aliens for prosody), and instructions (participants were told that the second experiment was, in fact, a different experiment). Given these factors, it would appear that the two experiments were relatively well distinguished, at least in terms of the phonological aspects to attend to.
The list of differences above notwithstanding, the interaction of experiment order and experimental frequency informs us that participants did not treat the experiments as separate.
Given the clear phonological distinction between targets, the interference is most readily attributable to the shared phonological learning environment. In particular, both experiments began with an exposure phase in which participants listened to a structured input of nonwords divided into high and low experimental frequency conditions. Future research is needed to establish a unified account of the various types of phonological interference or cue interaction observed by Gerken and Quam (2017), Potter and Lew-Williams (2019), and here. Furthermore, an account of phonological interference should also cover the type of segmental and prosodic cue integration studied by Thiessen and Saffran (2003), as well as the successful input segregation observed in adults by Gebhart, Weiss, and colleagues. Robust segregation of different statistical learning inputs may not occur until adolescence or adulthood. Of course, in the real world, infants and children are exposed to a vast array of inputs reflecting different rules and patterns, so future research is also needed to better understand the conditions under which even the youngest learners can learn from and segregate a multidimensional input.
Finally, our study is consistent with proposals by Gebhart et al. (2009) and Bulgarelli and Weiss (2016) that the primacy effect likely reflects a kind of interference across experiments.
This interference was signaled by a qualitative difference in the direction of the experimental frequency effect that depended on experiment order. Despite a surface connection to previous findings of interference, the accuracy advantage for low experimental frequency sequences in the phonotactic learning second condition is remarkable, and to our knowledge, it is unprecedented. Consider the logic put forth by Richtsmeier et al. (2011) to explain the benefits of a high experimental frequency: They argue that this benefit is consistent with the high frequency advantage seen across language development (Ambridge, Kidd, Rowland, & Theakston, 2015). In fact, Richtsmeier et al. interpret experimental frequency as a simulation of the production advantage for high English frequency consonant sequences, such as when children produce novel words with high-frequency sequences like /blɪk/ more accurately than words with low-frequency sequences like /sfɪk/. These effects have been reported by Edwards et al. (2004), Masdottir and Stokes (2016), Munson (2001), Richtsmeier et al. (2009), Zamuner, Gerken, and Hammond (2004), and many others. To have obtained the opposite of this well-established result is striking.
As such, we consider the experiment order by experimental frequency interaction to be most consistent with the kind of unintended generalization observed by Gerken and Quam (2016). In other words, a relative advantage for low experimental frequency sequences in the phonotactic learning second condition may be a case of the wrong generalization being applied. This is in part because participants in the phonotactic learning second condition probably did not better learn the low experimental frequency sequences. In some sense, such an explanation defies the basic notion of learning, which is closely tied to stimulus frequency (Ambridge et al., 2015).
Rather, we argue that high experimental frequency had a kind of damping effect because it was inconsistent with high experimental frequency from the previous experiment. That is, it was inconsistent with the high frequency, prosody-focused items from the prosody experiment.
Additional research is necessary to verify that participants were attempting to impose patterns from the first input onto the second input. Here and in previous studies, the authors have verified that participants did not exhibit an expected pattern, but they have not specifically tested whether a pattern from one input was imposed on another. Relatedly, children's early productions often include phonological processes, such as fronting, stopping, or gliding, that are not modeled in the ambient input. How do children learn these unobserved phonological patterns? One possibility is that they start out as unintended generalizations from statistical learning. For example, a child who hears and imitates several words in a row that begin with initial alveolar stops (toy, tooth, dad, and doll) may draw the generalization that word-initial stops are alveolars.
If they are later exposed to the word car, that generalization could result in a production like [tɑr]. This type of generalization is the focus of ongoing experiments in the first author's lab.
Furthermore, using the statistical learning paradigm to study phonological processes may shed light on why phonological processes like fronting are relatively common, including in typical development, whereas processes like backing (underlying /t, d/ are produced as [k, g]; [hæk] for hat) are rare.
A notable limitation of the present study is that no learning effects were observed in the prosody experiment. More specifically, there was not a significant main effect in any of the acoustic ratios or in the number of omitted syllables. As such, it may not have been possible to observe experiment order by experimental frequency interactions to either reinforce or limit the interpretation of the phonotactics study.
It may be that our participants were proficient enough when producing strong-weak and weak-strong contours that the results reflect an aspect of phonology that is less amenable to learning. There is some support for such a conclusion. For example, Pollock, Brammer, and Hageman (1993) found that 3- and 4-year-olds were able to consistently use duration, pitch, and amplitude to distinguish polysyllabic nonwords with strong-weak and weak-strong prosody. In a larger study with children up to seven years of age, Ballard et al. (2012) found that children acquiring English were adept at using duration, pitch, and amplitude to signal strong-weak patterns as young as age three. However, the strong-weak patterns produced by seven-year-olds did not reach adult levels of contrast. Our data are consistent with Ballard et al.'s protracted developmental trajectory. In statistical comparisons with the acoustic ratios of the adult model productions, children differed from the adult norms, particularly in the use of pitch for strong-weak patterns (see Appendix C). A similar delay for the strong-weak pattern was observed in kinematic analyses of articulatory stability made by Goffman and Malin (1999). Thus, there was room for learning to be observed in some of the acoustic parameters of prosody. However, relative to the impact of phonemic substitution errors common in the phonotactics experiment, there may have been fewer perceptual consequences to falling short of adult-like prosodic targets because the basic targets of SW and WS were being achieved. In this more nebulous learning space, children may have had fewer incentives to improve their production targets for the prosody items. Regardless of the adequacy of this explanation, further research is needed to better understand the learnability of various phonological targets within the statistical learning paradigm and as applied to child speech development.
In conclusion, the results of this study reflect an interesting case of multidimensional statistical learning. Compared to many previous studies in this area, our study did not include stimuli that were inherently in conflict. Phonotactics and prosody are different enough that it would be reasonable to expect participants to learn them separately. Nevertheless, learners did not treat the two experiments as separate. When participants completed the phonotactic experiment first, learning was consistent with previous findings, perhaps reflecting a primacy effect for initial learning. When participants completed the phonotactic experiment second, participant accuracy was unexpectedly low for the high experimental frequency condition, indicating interference from the previously completed prosody experiment. Exactly what this interference looks like, and whether it reflects overgeneralization of the patterns from the first experiment, remains to be determined by future studies.