Variable affix order: grammar and learning

While affix ordering often reflects general syntactic or semantic principles, it can also be arbitrary or variable. This article develops a theory of morpheme ordering based on local morphotactic restrictions encoded as weighted bigram constraints. I examine the formal properties of morphotactic systems, including arbitrariness, nontransitivity, context-sensitivity, analogy, and variation. Several variable systems are surveyed before turning to a detailed corpus study of a variable affix in Tagalog. Bigram morphotactics is shown to cover Tagalog and the typology, while other formalisms, such as alignment, precedence, and position classes, undergenerate. Moreover, learning simulations reveal that affix ordering under bigram morphotactics is subject to analogical pressures, providing a learning-theoretic motivation for the specific patterns of variation observed in Tagalog. I raise a different set of objections to rule-based approaches invoking affix movement. Finally, I demonstrate that bigram morphotactics is restrictive, being unable to generate unattested scenarios such as nonlocal contingency in ordering.

Though the order in which morphemes are realized within a word is sometimes predictable on general syntactic or semantic grounds, ordering can also be subject to language-specific constraints, or exhibit free variation. Without denying the roles of the interfaces in affix ordering, this article focuses on the grammar and learnability of its arbitrary aspects. The theory of morphotactics must cover phenomena such as counterscopal or otherwise arbitrary ordering restrictions, nontransitivity (e.g. X-Y [*Y-X] and Y-Z [*Z-Y], but Z-X [*X-Z]), context-sensitivity (e.g. X-Y-A [*Y-X-A] but Y-X-B [*X-Y-B]), gradient variation, learnability, and analogical effects (§6). For some of these phenomena, only a subset of logically possible patterns is found in human languages (e.g. context-sensitivity in ordering is always local). The theory should therefore also be restrictive, generating only possible patterns. I argue that morphotactics are encoded locally, as adjacency bigrams, while other proposals, such as morpheme alignment, precedence constraints, position classes, templates, and affix movement, critically undergenerate or overgenerate.
Gradient variation in affix ordering is a particularly stringent testing ground for morphotactic theories. I flesh out the typology of affix-order variation with cases from several languages before turning to the primary case study of this article, Tagalog's aspectual reduplicant (Schachter & Otanes 1972 (hereafter S&O), Carrier 1979, Condoravdi & Kiparsky 1998, Rackowski 1999, Mercado 2007, Skinner 2008). Building on previous accounts with a corpus study, I discuss the productive characteristics of the system, including its gradience (not all options are equally good) and the sorts of morphological contingencies the morphotactic grammar must address. A sample of Tagalog data follows in 1-4 as a brief illustration of such gradience and contingency (see §3 for details). First, individuals freely place aspectual 'RED' before or after the causative prefix pa, as in 1. The corpus incidence of the affix matrix (with RED in any position) and percentages of the variants appear to the right. As discussed in §3, the variants' proportions are generally consistent across roots.
(1) ~pa-RED-ROOT-an (64.5%)
    ~RED-pa-ROOT-an (35.5%)
    (N = 212,140)
When pag 'transitive' is added outside of pa as in 2, however, the position after pa is almost never employed. (Variants whose corpus proportions round to 0.0% are not shown.) Third, the position between pag and pa continues to be the preferred variant (though more weakly so) when two additional prefixes, ma 'ability' and ʔi 'object topic', are added to the prefix string in 2 to make 3. Finally, when ʔi, the second prefix in 3, is replaced with ka 'telic' in 4, a higher position in the verb, between ma and ka, is preferred.
One important finding of this article is that the set of marked (grammatical but not preferred) variants in Tagalog is predictable from the set of unmarked (preferred) variants. When the bigram learner is trained on the Tagalog corpus with all marked variants removed, the learner overgenerates variation in patterns that resemble the unseen actual corpus, both in terms of the positions that are overgenerated and the degrees to which they are overgenerated. I am therefore able to motivate the evolution and stability of such a complex morphotactic system. The variation, I suggest, is driven by the overgeneralization of local morphotactics, a kind of analogy in affix ordering that I refer to as morphotactic extension. In sum, bigram morphotactics is supported not only by its descriptive adequacy, covering the typology where other theories fall short, but also by its predictive power in modeling analogical effects during learning.
The article is organized as follows. I first focus primarily on description, establishing the typology of affix ordering systems and examining a corpus study of a variable affix in Tagalog. The theory of morphotactic constraints is then developed, including learning simulations and discussion of morphotactic extension as an explanation of free variation in ordering. Following this, I argue against other theories of affix ordering, including both rule-based accounts and other constraint formalisms. Finally, I discuss nontransitivity, and then address the restrictiveness of bigram morphotactics.
1. FOUR TYPES OF AFFIX ORDERING. Consider a verb with two derivational affixes. For simplicity, let us assume that the two affixes surface adjacently on the same side of the root; more complex situations are treated in the following sections. Four ordering scenarios are possible. First, the ordering of the affixes might be determined by semantic scope. For example, both Quechua verbs in 5 contain causative chi and reciprocal na (Muysken 1986).¹ In 5a the reciprocal scopes over the causative (i.e. the causation is reciprocal). In 5b the causative scopes over the reciprocal (i.e. the seeing is reciprocal). In both cases, the higher-scoping morpheme is more offset from the root, as predicted by ordering principles based on semantic scope (e.g. Muysken 1981a,b, Bybee 1985, Pesetsky 1985, Rice 1993, 2000) or on affix ordering reflecting the order of syntactic operations (Baker 1985, cf. Alsina 1999).
(5) a. riku-chi-na-n-ku
       see-CAUS-REC-3-PL
       'they make each other see (something)'
    b. riku-na-chi-n-ku
       see-REC-CAUS-3-PL
       'they make (them) see each other'
Second, the order might be fixed, even if the scope is reversible. In Luganda, for instance, a passivized causative and a causativized passive can both only be realized with causative es preceding passive ebw (e.g. the stem nyw-es-ebw 'drink-CAUS-PASS' is used for both '[be [made to drink]]' and '[make [be drunk]]'; McPherson & Paster 2009). Other fixed orderings (despite reversible scope) have been discussed in Bantu and elsewhere (e.g. Hyman 1994, 2003, Good 2003, 2005, 2007, McFarland 2005, Paster 2006a,b, Caballero 2008, 2011). A second example, from Mapuche (Smeets 1989:348), is given in 6, in which the simulative and negative are fixed in that order regardless of whether the negative scopes over or under the simulative.
(6) a. pe-w-faluw-la-e-y-u
       see-REFL-SIM-NEG-IND.OBJ-IND-AGR
       'I did not pretend to see you (sg.).'
    b. pe-w-faluw-la-e-y-u
       see-REFL-SIM-NEG-IND.OBJ-IND-AGR
       'I pretended not to see you (sg.).'
Third, the affixes might be freely ordered for one scope but fixed (according to scope) for the other. Hyman (2003:250) refers to this situation as 'asymmetric compositionality' and claims it is exhibited by the Chichewa causative its and reciprocal an suffixes. For a causativized reciprocal semantics (7a), either order is grammatical (Hyman 2003; full examples in 7 and 10 from Larry Hyman and Sam Mchombo, p.c.). For a reciprocalized causative (7b), only the scopal order CAUS-REC is available. For similar cases in Choguita Rarámuri, see Caballero 2011.
(8) a. o irt-ir-in-ii ~irt-in-ir-ii kam supu o kuddu
       3SG stir-APPL-CAUS-PAST stir-CAUS-APPL-PAST 1SG soup DET spoon
       'he made me stir the soup with a spoon' (I used a spoon)
    b. o irt-ir-in-ii ~irt-in-ir-ii kam supu o laɓi
       3SG stir-APPL-CAUS-PAST stir-CAUS-APPL-PAST 1SG soup DET knife
       'he made me stir the soup with a knife' (he used a knife)
This fourfold typology is summarized in Table 1, which shows the possible output(s) for both possible semantic scopes in each type of grammar. The grammars are modeled by rankings of three constraints, namely, two morphotactic constraints (e.g. 'X-Y' prefers an output with X-Y; for more details, see §4) and a SCOPE constraint pressuring for scopal ordering (Condoravdi & Kiparsky 1998, Paster 2006a,b, Aronoff & Xu 2010; similar violable constraint proposals include MIRROR in Hyman 2003, Wolf 2008, and McCarthy 2011; LINEAR CORRESPONDENCE in Ackema & Neeleman 2004, 2005; and LINEARITY in Horwood 2002). Tilde indicates indeterminate or free ranking; '>>' strict domination; '[[... X]Y]' an input in which Y scopes over X. Assume that other morphotactics guarantee that the root is always initial, so that X and Y are always suffixes.
Finally, note that the ordering scenario associated with a pair of morphemes is generally unpredictable from the semantics of those morphemes.For instance, causative and reciprocal are scopally ordered in Quechua (5), asymmetrically compositional in Chichewa (7), and fixed in Luganda (McPherson & Paster 2009).There is even variation across Quechua dialects on this point: 'The highland dialects … allow reciprocal to both precede and follow causative; this does not appear to be possible in the jungle dialects' (Muysken 1981a:470).Moreover, different pairs of affixes within a language might exhibit different ordering facts.In Fuuta Tooro Pulaar, for example, in and ir are freely ordered (8), id 'comprehensive' and in vary according to scope, and it 'repetitive' and ir are fixed in that order regardless of scope (Paster 2006b).

2. FREE VARIATION IN AFFIX ORDERING.
In §1 I mentioned the case of two Pulaar affixes being freely ordered regardless of scope. In this section I describe four additional scenarios of free variation in affix order. This is only an introductory survey; analysis and empirical details of free variation, particularly in Tagalog, are pursued in the following sections.
First, a morpheme might vary freely between nonadjacent positions. Examples from five languages are given in 9. In each case, the boxed morpheme surfaces freely (i.e. synonymously) before or after a two-morpheme sequence, but cannot interrupt it. The intervening bigrams are clearly compositional, bimorphemic strings in each language. Each of the morphemes can occur with the glossed semantics independently of its neighbor (sources cited in 9). Furthermore, in each case, the source explicitly rejects any other ordering options for the given meaning. Tagalog RED is an aspectual reduplicant, to be treated in detail in the following sections, where I argue that its placement is morphologically determined (§3) and suggest a motivation for this pattern (§6). On the semantics of Tagalog ka 'telic' and pag 'transitive', see Travis 1996, 2007, and Rackowski 1999.
(11) a. ma-RED-ka-tulong ~ma-ka-RED-tulong
        ABIL-ASP-TEL-help ABIL-TEL-ASP-help
        'will be able to help'
     b. ma-RED-ka-pag-trabaho, *ma-ka-RED-pag-trabaho
        ABIL-ASP-TEL-TRANS-work ABIL-TEL-ASP-TRANS-work
        'will be able to work'
Fourth, a group of more than two adjacent affixes might all be freely ordered with respect to each other. Bickel and colleagues (2007:44) describe this situation in Chintang. For example, the three prefixes in 12, namely, ma 'negative', u '3rd-person nonsingular agent', and kha '1st-person nonsingular primary object', are grammatical in all six permutations with no meaning differences. This type of variation is also exhibited by certain sets of suffixes in (e.g. Cochabamba) Quechua (van de Kerke 1996: §3).
(12) u-kha-ma-cop-yokt-e
     3NS.AGENT-1NS.P-NEG-see-NEG-PAST
     ~u-ma-kha-cop-yokt-e
     ~kha-u-ma-cop-yokt-e
     ~ma-u-kha-cop-yokt-e
     ~kha-ma-u-cop-yokt-e
     ~ma-kha-u-cop-yokt-e
     'they didn't see us'
3. THE TAGALOG CONTEMPLATED ASPECT MORPHEME. The contemplated (unrealized) aspect of Tagalog is marked by a long, weakly stressed CVː reduplicant prefix. I refer to this morpheme as RED (corresponding to RED a in S&O), though it should be understood throughout that I am only treating one of a number of reduplicative morphemes in Tagalog, which do not all behave in the same way. If a verb has no other prefixes, RED immediately precedes the root, for example, sàː-salitaʔ-ín 'RED-talk-OT' (OT being 'object topic'; topic markers indicate the role of the topic-marked argument in the sentence).
The phonological facts support the analysis of RED as a prefix as opposed to an infix (i.e. sàː-sa-litaʔ-ín). First, cluster simplification, if asymmetric, obtains in the first copy, not the second, for example, trabahúh-in 'work-LT' yields tàː-trabahúh-in with RED. Second, segmental nativization, if asymmetric, obtains in the first copy, not the second; for example, mag-θǽŋkyu 'thank' can yield mag-t ːθǽŋkyu but not *mag-θ ː-tǽŋkyu (Zuraw 1996:8;
In some verbs with three or more prefixes, RED can take three or more positions, as in 10 above in §2. But other verbs with multiple prefixes are more restrictive about where RED can be situated, for example, ma-ki-pag-pa-sayá 'ABIL-SOC-TRANS-CAUS-be.happy' (SOC being 'social', TRANS 'transitive'), in which RED is only used before ki. It is ungrammatical or degraded in all other positions. But when ki is replaced by ka 'telic', as in 14, the position between pag and pa, unavailable in ma-ki-pag-pa-sayá, becomes a grammatical option.
(14) a. *màː-ma-ka-pag-pa-sayá 'RED-ABIL-TEL-TRANS-CAUS-be.happy'
     b. ma-kàː-ka-pag-pa-sayá 'ABIL-RED-TEL-TRANS-CAUS-be.happy'
     c. *ma-ka-pàː-pag-pa-sayá 'ABIL-TEL-RED-TRANS-CAUS-be.happy'
     d. ma-ka-pag-pàː-pa-sayá 'ABIL-TEL-TRANS-RED-CAUS-be.happy'
     e. ?ma-ka-pag-pa-sàː-sayá 'ABIL-TEL-TRANS-CAUS-RED-be.happy'
     'will be able to make happy'
Although I have exemplified only particular roots in the Tagalog data thus far, the facts about the placement of RED are essentially the same regardless of which root is used with the given prefixes. For one, this is how the descriptive grammarians state the facts. According to the rules of S&O (p. 362) and the paradigms of Ramos & Bautista 1986, RED is grammatical immediately before ka or pa and illicit elsewhere in 14. Samuels (2006) likewise rejects 14a and 14c but implies that 14e is licit. In my corpus data (below), the order in 14e is encountered 0.04% of the time for this prefix string.
Corpus data support the judgments of the descriptive grammarians. The bar chart in Figure 1 shows the Google hits (www.google.com, retrieved May 2009) for the prefix string ma-ka-pag-pa with RED in each of the five positions listed in 14. Each bar represents the hits for the position indicated summed over 321 different Tagalog roots queried with this prefix string. For example, the 'RED-ka' bar in Fig. 1 represents the sum of the results for makakapagpabili (188 hits), makakapagpakita (1,310 hits), and so forth for the remaining 319 roots in the pilot lexicon, making for 32,488 total hits. Searches were automated by a Perl script with the LWP::UserAgent module (distributed by www.cpan.org). This chart reveals that for ma-ka-pag-pa, the position before ka is preferred almost three to one over the position before pa, while all other positions, including root reduplication, are rarely if ever encountered in the corpus. Furthermore, the aggregate preference for RED-ka depicted in Fig. 1 is relatively stable across roots (r = 0.84 for the 172 roots of the 321 queried that had nonzero results for ma-ka-pag-pa plus RED). This rough consistency across roots is illustrated by the jittered one-dimensional scatterplot in Figure 2. Each circle represents one of the 172 roots attested with ma-ka-pag-pa plus RED in any position. For example, the largest circle represents saya 'be happy', for which there were 13,495 tokens with ma-ka-pag-pa plus RED. The x-coordinate of the circle center represents the proportion of the time that RED is placed before ka when that root is prefixed with ma-ka-pag-pa and RED; for saya it is 0.748. The size of each circle is proportional to the sample size on which that proportion is based. Thus, x-coordinates of larger circles are expected to be closer to their underlying propensities than those of smaller circles, which reflect smaller and therefore less reliable samples. The weighted mean of the circles in Fig. 2 is 0.745, the same as that given in Fig. 1.
Table 2 shows the Web-based corpus results for RED in each prefixal position in twenty-nine prefix strings, organized by the number of prefixes. In this table, 'pos. 0' is always the position before the first prefix, 'pos. 1' is after the first prefix, and so forth. Incidences for each position are given as percentages rounded to the nearest tenth. For example, the first row indicates that in verbs of the form ka-ROOT-an, RED precedes ka 31.9% of the time and follows it 68.1% of the time. 'Token N' is the total incidence of the affix matrix plus RED in the corpus. 'Type N' is the number of roots attested in the corpus with the given affix matrix plus RED in any position. In total, 321 roots were queried with RED in each position in each prefix string. In some rows the type count exceeds 321 because these rows combine counts for affix matrices with different suffixes. For instance, the 'pa- + {-an, -in}' row has a type N of 337 because it sums the results for 'pa- + -an' (161 roots) and 'pa- + -in' (176 roots). As this practice implies, the choice of suffix has no effect on RED placement in the prefix string. In this case, for instance, there is no significant difference between the incidence of RED before pa- in the -an types vs. the -in types, Welch's t(615) = -0.58, p = 0.56. Finally, to increase the scope and reliability of the study, all m-initial forms were also queried with nasal-substituted n-initial variants (e.g. ma-ka- includes results for na-ka-). Nasal substitution (see Kroeger 1993 on semantics, Zuraw 2010 on phonology) and RED together indicate imperfective aspect.²
² Percentages in Table 2 are based on token frequencies. For example, ma-ka-pag-pa-ROOT has a token N of 43,623 (summing over all 172 roots found with this prefix string plus RED), so the 74.6% for 'pos. 1' corresponds to approximately 32,500 tokens. It would also be possible to obtain these percentages by averaging across types (i.e. roots), but the results would be essentially the same. For example, when the root with ma-ka-pag-pa is tawa, the pre-ka position is used 71.7% of the time. For tayo, it is 78.6% of the time. Averaging over the 172 percentages for roots attested in this frame, RED is found in the pre-ka position 73.5% of the time, a small difference from the 74.6% figure based on the overall token count for that position. Results would be similar for all percentages given (r = 0.964 for type- vs. token-derived percentages).
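The difference between token-based and type-averaged percentages can be made concrete with a minimal Python sketch. The root labels and counts below are hypothetical placeholders, not corpus figures; the function names are likewise only illustrative.

```python
# Hypothetical counts: root -> (tokens with RED in the target position,
#                               tokens with RED in any position).
# These numbers are invented for illustration and are NOT from Table 2.
counts = {
    "root_a": (90, 100),
    "root_b": (1, 2),
}

def token_percentage(counts):
    """Pool all tokens across roots, then take one percentage."""
    hits = sum(h for h, _ in counts.values())
    total = sum(t for _, t in counts.values())
    return 100.0 * hits / total

def type_percentage(counts):
    """Take a percentage per root, then average the percentages."""
    per_root = [100.0 * h / t for h, t in counts.values()]
    return sum(per_root) / len(per_root)

token_pct = token_percentage(counts)
type_pct = type_percentage(counts)
```

With such deliberately lopsided sample sizes the two measures diverge, because the type average gives a sparsely attested root the same say as a frequent one. The text reports that in the actual corpus the two are close (73.5% vs. 74.6% for the pre-ka position).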
Most sources agree that the facts about RED's placement cannot be determined by phonology (S&O, Carrier 1979, Condoravdi & Kiparsky 1998, Rackowski 1999, etc.). First, unlike Tagalog's infixes, RED never interrupts a morpheme. Second, the preferences in placing RED are apparently insensitive to the prosodic properties, for example, stress pattern, of the root and its affix matrix. For example, nearly phonologically identical pairs of prefix strings can exhibit markedly different patterns in RED placement, for example, ma-ki-pag, in which RED almost never follows pag, contrasting with ma-ka-pag, in which RED often follows pag. Third, the rules need to be stated over morphemes, as in S&O, rather than over segments or prosodic positions. For instance, RED cannot precede word-initial ma 'ability'. This cannot be because RED is avoided word-initially; for example, it can precede (otherwise) initial pa. Nor is there a dispreference for the reduplication of m or ma as opposed to other strings. RED can felicitously precede ma if it is part of the root. More complex morphological dependencies are treated in §4.
To briefly address one particular phonological proposal, Inkelas (2000) and Inkelas and Zoll (2005) (see also French 1988, Booij & Lieber 1993, Cole 1994, Downing 1998) suggest that RED variation in Tagalog can be explained as optionality between root and second-syllable reduplication. Although they consider a variety of reduplicants, some of their examples involve the aspectual reduplicant, implying that the analysis is meant to cover it. A prefix string such as ʔi-pag-pa demonstrates the untenability of this proposal for aspectual RED. In this string, the position between pag and pa is preferred for RED, accounting for 93% of some 50,000 corpus instances. This is neither the second syllable nor preroot position. Nor is it possible to interpret pa as part of the (prosodic) root, because in other prefix strings, RED can intervene between pa and the root.
4. MORPHOTACTIC CONSTRAINTS. I argue that the sorts of morphological variation described in §§2 and 3 are the result of tensions between different morphotactic constraints competing in the grammar. I propose that these constraints are local in the sense that each evaluates only a pair of adjacent morphemes. In this section, I sketch how bigram morphotactics can implement the four variable scenarios in §2. I argue against other possible formalisms in §§7 and 8.
A bigram constraint X-Y, in which X and Y are (classes of) morphemes, can be taken to penalize each instance of X not immediately followed by Y (cf. local selectional restrictions, e.g. Fabb 1988).³ The ranking of these constraints motivates ordering restrictions, as in 15, in which X-Y-Z is the only grammatical output for an input comprising X, Y, and Z. This OPTIMALITY THEORY (OT; Prince & Smolensky 2004 [1993]) tableau includes all six ordered pairs of morphemes as constraints and all six orders of the three morphemes as candidates. Dashed vertical lines indicate indeterminate or freely variable rankings.
(15) Tableau illustrating ranked bigram constraints

³ For the purposes of this article, I interpret X-Y as assigning a violation iff a candidate contains X not immediately followed by Y. Other interpretations favoring candidates containing X-Y over those not containing X-Y would work just as well for the data treated here. I remain neutral as to whether these constraints are part of a combined phonological/morphological constraint-based component (along the lines of Wolf 2008 and McCarthy 2011) or part of an autonomous morphological component (cf. distributed morphology), among other possibilities. The present proposal is compatible with either architecture, given some flexibility about the precise interpretation of the constraints (e.g. whether they refer to abstract morphemic indices or to phonological material licensed by morphemes).
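As a minimal sketch of this violation definition, the following Python snippet evaluates all six orders of X, Y, and Z against just the two bigram constraints X-Y and Y-Z. The function and variable names are illustrative assumptions, and the sketch simply looks for candidates with no violations rather than implementing full ranked OT evaluation.

```python
from itertools import permutations

def violates(candidate, x, y):
    """Return 1 iff the candidate contains x not immediately followed by y."""
    for i, morpheme in enumerate(candidate):
        if morpheme == x:
            is_last = (i + 1 == len(candidate))
            if is_last or candidate[i + 1] != y:
                return 1
    return 0

# Two bigram constraints only (an illustrative subset of the six ordered pairs).
constraints = [("X", "Y"), ("Y", "Z")]
candidates = list(permutations(["X", "Y", "Z"]))

# Violation profile of every candidate, one entry per constraint.
profiles = {
    cand: tuple(violates(cand, x, y) for x, y in constraints)
    for cand in candidates
}

# Candidates that satisfy both constraints.
winners = [cand for cand, profile in profiles.items() if sum(profile) == 0]
```

Under these assumptions X-Y-Z is the only order satisfying both constraints, since any other order leaves X or Y without its required right-hand neighbor.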
It is also possible that the learner posits constraints only for bigrams that it has actually encountered, in which case only the first two constraints in 15 would be present.
For simplicity, I omit candidates with morpheme duplication or deletion, as these can be ruled out by constraints that are unviolated in all the data considered in this article. On constraints requiring morpheme realization, see Noyer 1993, Samek-Lodovici 1993, Rose 1997, Walker 1998, Kurisu 2001, Ussishkin & Wedel 2002, MacBride 2004, Wolf 2008, and many others; on ruling out gratuitous duplication of morphemes, see Noyer 1993, Peterson 1994, Jaker 2006, Inkelas & Caballero 2008, Wolf 2008, among others. Phonological constraints such as MAX and DEP (McCarthy & Prince 1995) could also be used to ensure biunique mapping in the cases analyzed in this article. Furthermore, under bigram morphotactics, circumfixes can be treated as pairs of morphosyntactically or semantically interdependent affixes.
Candidates with morpheme duplication (etc.) need to be generated, however, since multiple exponence is known to occur (e.g. Inkelas & Caballero 2008, Caballero 2010, and numerous references therein). To give one relevant example, Pima compound pluralization (Munro & Riggle 2004, Riggle & Wilson 2004) superficially resembles Tagalog in that a reduplicant is free to occur in multiple morphological positions in the word, as in 16a-c (leaving RED abstract for clarity). Unlike Tagalog, however, RED can also be simultaneously realized in any combination of the positions (16d-g). One potential analysis of 16 is that RED duplication is motivated by local morphotactics: if bigram constraints such as RED-root₁, RED-root₂, and RED-root₃ are freely ranked with a constraint assigning a penalty for each copy of RED, all of the options in 16 are generated (this was confirmed using OT-Help, Becker et al. 2007; to rule out the candidate with no copy of RED realized, a constraint such as MAX(morph) 'assign a penalty if a morpheme is unrealized' can be undominated). This line of analysis would also permit giving the constraints different weights to account for gradient preferences among the options in 16. I leave exploring the typology of possible interactions between morphotactics and multiple exponence to future work.
(16) Plurals of the Pima compound ˈus-kàlit-váinom 'wagon knife (lit. tree-car-knife)'
     a. RED-ˈus-kàlit-váinom
     b. ~ˈus-RED-kàlit-váinom
     c. ~ˈus-kàlit-RED-váinom
     d. ~RED-ˈus-RED-kàlit-váinom
     e. ~RED-ˈus-kàlit-RED-váinom
     f. ~ˈus-RED-kàlit-RED-váinom
     g. ~RED-ˈus-RED-kàlit-RED-váinom
I first address the case of a morpheme varying freely between nonadjacent positions. Recall, for example, that Tagalog RED can optionally alight on either side of the ka-pag prefix bigram, but cannot interrupt it. The fact that the variable morpheme is a reduplicant here is apparently not synchronically crucial (regardless of the diachrony, cf. §6); after all, all of the scenarios involving RED discussed in §2 were also exemplified by fixed-melody morphemes in other languages. This case can be modeled by a highly ranked (or weighted) ka-pag bigram, representing the speaker's knowledge of the relative coherence (unsplittability) of these two affixes. At the same time, the variation of RED can be motivated by two or more variably ranked (or closely weighted) bigrams pressuring for both the pre-ka (and/or post-ma) and pre-pa (and/or post-pag) positions, as in 17. To save space, '√' is used for 'ROOT' in tableaux. The numbers are explained in the following paragraphs.
(17) Tableau illustrating variation between nonadjacent positions
     Columns: GEN'D, SCORE, CANDIDATE, followed by ten bigram constraints. Constraint weights in extracted order: 11.1, 10.9, 9.6, 7.9, 7.6, 7.2, 6.9, 6.7, 6.5, 2.…

In harmonic grammar (… 2010), constraints are assigned real-number weights and each candidate's score is the sum of its weighted constraint violations. I employ a type of harmonic grammar termed MAXIMUM ENTROPY (maxent) grammar (e.g. Johnson 2002, Goldwater & Johnson 2003, Wilson 2006, Hayes & Wilson 2008) to model gradient variable data. In tableau 17, the weight of each constraint is given under its label and the score of each candidate (lower being better) is given to its left. Scores are computed as in 18, in which $i$ ranges over constraints, $C_i(x_n)$ gives the violations incurred by candidate $x_n$ on constraint $C_i$, and $w_i$ is the weight of $C_i$.

(18) $\mathrm{score}(x_n) = \sum_i w_i \, C_i(x_n)$

Weights were computed using maxent learning software by Wilson and George (2008), which draws on the CONJUGATE GRADIENT method of Press et al. 1992 (no smoothing term is employed here). The learner is given the actual Tagalog output frequencies, from which it weights the constraints so as to maximize the fit between generated and observed frequencies. The generated ('gen'd') percentages in the first column in 17 were computed as follows. First, $H(x_n)$, the 'maxent value' of a candidate $x_n$, is given by raising the constant $e$ (approximately 2.718) to the negation of $x_n$'s score. Then, to get $P(x_n)$, the probability of $x_n$, $H(x_n)$ is divided by the sum of the maxent values of all candidates, as in 20. Note that like Goldwater & Johnson 2003 and Wilson 2006 but unlike Hayes & Wilson 2008, maxent values are translated into probabilities on a per-input basis.

(20) $P(x_n) = \dfrac{H(x_n)}{\sum_m H(x_m)}$, where $H(x_n) = e^{-\mathrm{score}(x_n)}$

For the purposes of the simulations in this article, the candidate space is always the factorial set of morpheme permutations. (Any additional candidates, even if theoretically present, would not be distinguishable by the constraints discussed here.) For example, tableau 17 shows only five selected candidates, but all 6! = 720 permutations of the six morphemes were used as candidates in training the weights. In tableau 19 and elsewhere, all unshown candidates have effectively zero (0.00%) generated proportions.
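The score and probability computations just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the weights below are hypothetical values chosen for the example (they are not the learned weights of tableau 17), the input is the smaller set {RED, ma, ka, ROOT}, and the function names are invented here.

```python
import math
from itertools import permutations

def violations(candidate, x, y):
    """1 iff x occurs in the candidate and is not immediately followed by y."""
    if x not in candidate:
        return 0
    i = candidate.index(x)
    return int(i + 1 == len(candidate) or candidate[i + 1] != y)

def maxent_distribution(morphemes, weights):
    """Map each permutation of the morphemes to its maxent probability."""
    candidates = list(permutations(morphemes))
    # Score: sum of weighted violations (lower is better).
    scores = {
        cand: sum(w * violations(cand, x, y) for (x, y), w in weights.items())
        for cand in candidates
    }
    # Maxent value: e raised to the negated score.
    maxent_values = {cand: math.exp(-s) for cand, s in scores.items()}
    # Probability: normalize over all candidates for this input.
    normalizer = sum(maxent_values.values())
    return {cand: h / normalizer for cand, h in maxent_values.items()}

# Hypothetical weights for illustration only.
weights = {
    ("ma", "RED"): 8.0, ("RED", "ka"): 6.0, ("ka", "ROOT"): 7.0,
    ("ma", "ka"): 4.0, ("ka", "RED"): 3.0, ("RED", "ROOT"): 3.0,
}
probs = maxent_distribution(["RED", "ma", "ka", "ROOT"], weights)
best = max(probs, key=probs.get)
```

Because normalization is done over the permutations of a single input, probabilities are assigned on a per-input basis, as in the text. With these made-up weights the most probable candidate is ma-RED-ka-ROOT; different weights would shift the distribution.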
In addition to the two most frequent outputs in 17, with RED immediately preceding or following ka-pag, selected other candidates shown there include (c) marginally acceptable root reduplication, (d) illicit placement of RED within ka-pag, and (e) illicit ordering of ka after pag rather than before it.The ten bigrams in 17 comprise the full set of bigrams encountered in the three attested outputs {ma-RED-ka-pag-pa-ROOT, ma-ka-pag-RED-pa-ROOT, ma-ka-pag-pa-RED-ROOT}, following the principle that only bigrams from attested outputs need to be posited as constraints.
A tableau for the second type of variation described in §2 is given in 21. In this Tagalog example, RED can be situated in any of the three positions between the first prefix and the root, while all other morphemes are fixed in order. As before, the nine constraints represent all bigrams seen in actual outputs for this input. (A more comprehensive learning simulation combining all data in the Tagalog corpus is pursued in §5; for now, I analyze each case independently for the purposes of illustration.)
In the third type of variation in §2, two morphemes are freely ordered except in the context of some third morpheme. In Tagalog, for instance, RED can precede or follow ka in the prefix string ma-ka, but it can only precede ka in ma-ka-pag. Two tableaux are given in 22, the first for the input containing {RED, ma, ka, ROOT}, the second for {RED, ma, ka, pag, ROOT}. RED is generated after ka 2.85% of the time for the first input but never for the second.
The final scenario described in §2, that of free ordering among three or more morphemes, can be achieved by giving a number of bigrams the same weight. Recall Chintang, in which the prefixes u, ma, and kha occur freely in all permutations, while the suffixes yokt and e are fixed in order. To model these facts, the bigrams ROOT-yokt and yokt-e can be given relatively large weights (say, 20.0), while the nine remaining bigrams (three prefix-root bigrams plus six prefix-prefix bigrams) can all be assigned to the same sufficiently lower weight (say, 5.0).
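The Chintang weighting can be checked with a short Python sketch. It uses the two illustrative weights mentioned in the text (20.0 and 5.0) and assumes, as in the simulations described earlier, that the candidate set is all permutations of the six morphemes and that a bigram X-Y is violated when X is not immediately followed by Y. Variable names are illustrative.

```python
import math
from itertools import permutations

PREFIXES = ["u", "ma", "kha"]
MORPHEMES = PREFIXES + ["ROOT", "yokt", "e"]

# Two heavily weighted suffix bigrams plus nine equally weighted prefix bigrams.
weights = {("ROOT", "yokt"): 20.0, ("yokt", "e"): 20.0}
for p in PREFIXES:
    weights[(p, "ROOT")] = 5.0          # three prefix-root bigrams
    for q in PREFIXES:
        if p != q:
            weights[(p, q)] = 5.0       # six prefix-prefix bigrams

def score(candidate):
    """Sum of weights of bigrams X-Y where X is not immediately followed by Y."""
    total = 0.0
    for (x, y), w in weights.items():
        i = candidate.index(x)
        if i + 1 == len(candidate) or candidate[i + 1] != y:
            total += w
    return total

candidates = list(permutations(MORPHEMES))
maxent_values = {c: math.exp(-score(c)) for c in candidates}
normalizer = sum(maxent_values.values())
probs = {c: h / normalizer for c, h in maxent_values.items()}

# The six attested orders: any prefix permutation followed by ROOT-yokt-e.
targets = [tuple(p) + ("ROOT", "yokt", "e") for p in permutations(PREFIXES)]
target_probs = [probs[t] for t in targets]
```

In this sketch the six prefix permutations receive identical scores and therefore identical probabilities, and together they take nearly all of the probability mass. The small remainder falls mostly on candidates in which a prefix surfaces word-finally, which this toy constraint set does not penalize strongly; widening the gap between the two weights, or adding further constraints, would presumably reduce it.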

5. LEARNING SIMULATIONS.
In §4, I presented analyses of fragments of Tagalog verbal morphology. I now turn to a larger set of outputs annotated with corpus frequencies in order to undertake a more unified and comprehensive analysis of the prefixal morphology. As my corpus, I use the Web-based frequencies summarized in Table 2, which covers twenty-nine combinations of prefixes (as explained in §3, I ignore suffixes). For scope and realism, I also include the same twenty-nine prefix sets without RED, even though the order is always fixed in those cases. Thus, the present corpus comprises fifty-eight prefix sets, half with RED, half without. These sets can be considered inputs. In cases of free variation, one input corresponds to multiple outputs.
Among all the outputs attested in the corpus, thirty-nine bigrams are observed; these are posited as constraints. The learner then weights the constraints so that its generated forms match its training data as closely as possible. As in §4, I use maxent learning software by Wilson and George (2008) (see Wilson 2006), but other learning algorithms for harmonic/maxent grammar could have been employed (see §6). The learned weights are given in Table 3. In training this grammar, I included all ordering permutations of each morpheme set as candidates. For example, the input set {RED, ma, ka, pag, pa, ROOT} had 6! = 720 candidates in the training tableau, even though only three were attested. A Perl script facilitated assessing violations for each of the 4,118 candidates (all permutations of the fifty-eight input sets) against each of the thirty-nine constraints, tabulating the results in a spreadsheet formatted for the maxent software.
Various criteria can be used to assess how well the learned grammar matches its training data. The mean percentage error for the ninety-seven RED positions given in Table 2 is 1.16%. The weighted mean error, in which observed/generated differences are given weight in proportion to the frequency of the input, is 0.13%. Pearson's correlation coefficient is 0.9954 for all forms observed and/or generated with greater than 0.0% incidence. Finally, the three maximum errors are 29.4%, 17.9%, and 11.5%, respectively, all for the relatively poorly attested input {RED, ʔi, pa, ki, ROOT}. While it is difficult to say what constitutes a 'good' or 'bad' fit in absolute terms, these values become more meaningful when theories are compared. To this end, I also trained a grammar of precedence constraints (X … Y: 'assess a violation iff X does not precede Y'; see §7) on the same data. Forty-three constraints were posited, one for each precedence pair observed in the training data. Despite its greater number of free parameters (i.e. constraints), the fit of the optimal precedence grammar is substantially worse on all criteria. The unweighted and weighted mean percentage errors for the ninety-seven RED positions are 13.55% and 7.09%, respectively, over ten times as great as for the bigram grammar. The correlation is 0.8195. The three maximum errors are 83.8%, 77.1%, and 65.8%.
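The candidate generation and evaluation described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the Perl script or the Wilson and George software: the function names are hypothetical, the weights are invented for the example (they are not the values in Table 3), and it assumes that a bigram constraint rewards a candidate containing that adjacent pair, so that a candidate's harmony is the sum of the weights of the bigrams it contains.

```python
import math
from itertools import permutations

def bigrams(candidate):
    """Return the adjacent morpheme pairs of an ordered candidate."""
    return list(zip(candidate, candidate[1:]))

def maxent_distribution(morphemes, weights):
    """Maxent probability of every ordering permutation of `morphemes`.

    Assumption: harmony = sum of the weights of the bigrams present,
    and P(candidate) is proportional to exp(harmony).
    """
    candidates = list(permutations(morphemes))
    harmony = {c: sum(weights.get(b, 0.0) for b in bigrams(c)) for c in candidates}
    z = sum(math.exp(h) for h in harmony.values())
    return {c: math.exp(h) / z for c, h in harmony.items()}

# Hypothetical weights, chosen only for illustration.
weights = {("ma", "RED"): 3.0, ("RED", "ka"): 2.0, ("ka", "ROOT"): 4.0,
           ("ma", "ka"): 1.0, ("RED", "ROOT"): 1.5}
dist = maxent_distribution(("RED", "ma", "ka", "ROOT"), weights)
best = max(dist, key=dist.get)
```

With four morphemes the tableau has 4! = 24 candidates; the six-morpheme input mentioned above would yield 720 in the same way.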
In this section, I trained the grammar on all available corpus data. It is also possible to hold back data from the learner and then test the resulting grammar on unseen data to see how well it differentiates actually grammatical suggestions from ungrammatical ones, as a form of cross-validation. I pursue one such simulation in §6.
6. MORPHOTACTIC EXTENSION. What are possible and impossible patterns of free variation? Why does variation exist at all? In this section, I intend to begin to answer these questions (see also §10) by describing an emergent property of morphotactic constraint systems I call morphotactic extension, which can be defined as the analogical extension of an ordering relationship among morphemes to a context in which it was not previously encountered. My claim, in other words, is that affix order is subject to analogy, and that this analogy can help explain the diachronic origin and synchronic stability of morphotactic variation.
To illustrate this principle with a simple Tagalog example, consider the prefix strings ma-ka, pag-pa, and ma-ka-pag-pa. In Modern Tagalog, RED is relatively stable in the first two (97.2% ma-RED-ka and 100.0% pag-RED-pa) but varies between two positions in the third (74.6% ma-RED-ka-pag-pa ~ 25.4% ma-ka-pag-RED-pa). But imagine, for the sake of illustration, that we are dealing with a hypothetical stage of pre-Tagalog in which RED's placement is always categorical, for example, a second-syllable reduplicant (cf. Inkelas & Zoll 2005). Thus, our hypothetical pre-Tagalog learner encounters only ma-RED-ka, pag-RED-pa, and ma-RED-ka-pag-pa. Furthermore, as in Modern Tagalog, we can assume that ma-ka (+RED) and pag-pa (+RED) are vastly more frequent than ma-ka-pag-pa (+RED) (in Table 2, they are 139 and twenty times as frequent, respectively).
Bigram morphotactics predicts that until a late stage of learning, the learner will substantially overgenerate RED between pag and pa in ma-ka-pag-pa, creating variation where there was none in its training data (in this hypothetical historical scenario). Figure 3 schematizes the extension of RED from between pag and pa in pag-RED-pa to that same position in ma-ka-pag-RED-pa. In this case, the learner posits and gives some weight to the constraints pag-RED and RED-pa when it sees RED in those contexts in the prefix string pag-RED-pa. In doing so, however, it initially also (over)generates RED between pag and pa in the longer prefix string ma-ka-pag-pa, even though it has not seen RED in that position in that string. The key to morphotactic extension is the locality of the constraints: the learner initially anticipates the position between pag and pa to be felicitous based on its knowledge, encoded in bigram constraints and their weights, that RED occurs after pag and before pa in other, more common prefix strings.
Of course, the learner would also encounter many other prefix strings containing these affixes, so in order to show more rigorously how extension exerts its effects during learning, a more comprehensive simulation is in order. As my training corpus, I use the Web-based corpus (Table 2), only now with all variants suppressed. Only the 'best' (most frequent) output is represented for each input. In order to model the timecourse of learning, I employ a gradual learning algorithm for harmonic grammar described by Boersma and Pater (2008) and Pater (2009:19), which is based on Rosenblatt's (1958) Perceptron and only slightly modified from the gradual learning algorithm for stochastic optimality theory (Boersma 1997, Boersma & Hayes 2001). See also Jäger 2007 on an online maxent learner.
A babbler, written in Perl, feeds Tagalog verb tokens to the learner one at a time, randomly selected from the training corpus in proportion to their frequencies. The learner, again implemented in Perl by the author, takes two steps for each token. First, if it sees any bigrams that are not yet part of its constraint set, it posits them as constraints, initially weighted zero. (I assume that the learner starts out already mature enough to parse affixes; on parsing, see Goldsmith 2001, Baroni 2003, Poon et al. 2009, and many others. Here I treat only the evolution of the productive grammar of ordering.) Second, the learner undergoes error-driven reweighting. Learning is error-driven in the sense that the learner takes the set of morphemes in the observed token and compares the observed token with its own generated output for that set of morphemes. If they are the same, nothing further happens. If there is a discrepancy, however, an error has occurred, to which the learner responds by adjusting its constraint weights, specifically, incrementing any constraints preferring the observed form and decrementing any preferring the error. Finally, variation between outputs is achieved by adding random (Gaussian) noise to each weight at each evaluation, causing closely weighted constraints to sometimes trade off dominance. Additional details about these processes are not important here and can be found in the cited references.
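The two-step procedure just described can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the author's Perl implementation: the learning rate, noise level, function names, and the three-form toy corpus are all hypothetical (the relative frequencies 139 : 20 : 1 merely echo the ratios mentioned in the text), and bigram constraints are again assumed to reward candidates that contain the bigram.

```python
import random
from itertools import permutations

def bigrams(candidate):
    """Adjacent morpheme pairs of an ordered candidate."""
    return list(zip(candidate, candidate[1:]))

def generate(morphemes, weights, noise_sd, rng):
    """Return the permutation with the highest harmony under noisy weights."""
    # Gaussian noise is added to every constraint weight at each evaluation.
    noisy = {b: w + rng.gauss(0.0, noise_sd) for b, w in weights.items()}
    return max(permutations(sorted(morphemes)),
               key=lambda cand: sum(noisy.get(b, 0.0) for b in bigrams(cand)))

def learn(corpus, n_tokens, rate=0.1, noise_sd=1.0, seed=0):
    """Error-driven reweighting over tokens sampled in proportion to frequency."""
    rng = random.Random(seed)
    forms = [form for form, _ in corpus]
    freqs = [freq for _, freq in corpus]
    weights = {}
    for _ in range(n_tokens):
        observed = rng.choices(forms, weights=freqs)[0]  # the "babbler"
        for b in bigrams(observed):                      # step 1: posit new constraints at zero
            weights.setdefault(b, 0.0)
        guess = generate(observed, weights, noise_sd, rng)
        if guess != observed:                            # step 2: update only on error
            for b in set(bigrams(observed)) - set(bigrams(guess)):
                weights[b] += rate                       # constraints preferring the observed form
            for b in set(bigrams(guess)) - set(bigrams(observed)):
                if b in weights:
                    weights[b] -= rate                   # constraints preferring the error
    return weights

# Hypothetical categorical ("best-only") toy corpus with relative frequencies.
corpus = [(("ma", "RED", "ka", "ROOT"), 139),
          (("pag", "RED", "pa", "ROOT"), 20),
          (("ma", "RED", "ka", "pag", "pa", "ROOT"), 1)]
weights = learn(corpus, n_tokens=2000)
```

Stopping `learn` at different values of `n_tokens` and then calling `generate` repeatedly on the long input is one way to probe how much the learner extends RED to positions it has not observed; the outcome depends on the assumed rate, noise, and corpus.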
Crucial for the present purposes is that if we stop this learner at a sufficiently early (but not, as is explained below, overly early) stage of learning and test it on all inputs, it generates significant variation for some inputs, despite having observed none. Moreover, this variation resembles the variation found in the actual Tagalog corpus, to which the learner did not have access. For example, after training on 30,000 tokens from the best-only corpus, the learner generates RED as in Table 4, which is laid out like Table 2. Percentages from the actual corpus (Table 2) are repeated here to the left of each slash; percentages generated by the learner are to the right. For each RED position seen by the learner, the number of tokens seen is given in parentheses (these do not add up to 30,000 because Table 4 shows only inputs containing RED and at least two other prefixes). Since the training corpus contains no variation, the learner never sees more than one RED position used for any particular input. For a few prefix strings (e.g. …), the learner did not see any tokens with RED.
Despite its categorical training data, the learner generates variation for most inputs in Table 4. For example, for the prefix string ma-ka-pag-pa, the learner only saw RED between the first two prefixes (twenty-seven tokens). But it generates RED in two positions (72.1% ma-RED-ka-pag-pa ~ 26.5% ma-ka-pag-RED-pa), poorly matching its training data, but uncannily approximating the real Tagalog corpus (i.e. 74.6% ma-RED-ka-pag-pa ~ 25.4% ma-ka-pag-RED-pa). As outlined above, this prediction is driven by extension from other prefix strings (e.g. 728 tokens of pag-RED-pa). The learner has not yet seen enough tokens of ma-ka-pag-pa (with RED always in the second position) to 'unlearn' this possibility, as it eventually would if learning were allowed to proceed. Because short prefix strings vastly outnumber longer ones, the learner's early treatment of longer strings is based more on extrapolation from short strings than on actual experience with longer ones (cf. Elman 1993 on how 'starting small' can facilitate learning under certain conditions).
The learner also correctly anticipates, for the most part, where extension is blocked. Compare, for instance, ma-ka-pag and ma-ki-pag (Table 5). For both inputs, the learner only saw RED immediately following ma. For ma-ka-pag, it additionally extends to the position between pag and the root. But it does not (more than slightly) extend to the same position in ma-ki-pag. This is because, based on its training data, it has given RED-ki more weight than RED-ka (28.3 and 21.8, respectively). (The learner erroneously extends RED to the preroot position in mag-si-pag because it has not yet seen enough tokens with si to know si is more like ki than ka when it comes to blocking extension.) Thus, on average, the learner's (over)generation of variation is localized to positions where variation is exhibited in the actual Tagalog corpus.
Comparing actual to generated percentages in Table 4, the overall weighted mean error (as in §5) is 2.9%. If the learner had matched its training data perfectly, this error would have been 4.6%, almost twice as great. Indeed, if trained for long enough on the best-only corpus, the learner eventually (after around a million tokens) converges on a virtually categorical grammar. Not surprisingly, the observed-to-generated fit increases monotonically as a function of training. More interesting, however, is that the best-only-trained learner's fit to the unobserved actual corpus peaks at ~30,000 tokens and declines from then on. The gener-
To conclude, in the previous section I showed that bigram morphotactics enables the learner to closely match its training data when trained on the whole corpus. In this section I showed that even when it is trained on a filtered corpus of invariable (most frequent output only) Tagalog data, the semi-mature learner still correctly predicts to a large extent which inputs should exhibit variation and how that variation should manifest. It follows that under bigram morphotactics, the marked variants are not arbitrary; rather, they emerge naturally from extension from unmarked outputs. In Tagalog, the invariable corpus is actually harder for the learner to master than the variable corpus, as evidenced by the learner's initial skewing toward the latter. Diachronically, morphotactic extension provides an analogical explanation for the innovation of variation in categorical systems with certain characteristics. Synchronically, it motivates the stability of marked variants in the productive morphology. Consider a rare but grammatical position for RED such as ma-RED-ʔi-pa-ROOT (0.3% incidence, but explicitly identified as grammatical in S&O:362). This form is rare enough that a young learner might never hear it. Under bigram morphotactics, however, this position, unlike certain other unseen positions, receives support from extension.
Finally, morphotactic extension constrains the typology by rendering certain types of systems unlearnable. Imagine, for instance, a hypothetical version of Tagalog in which RED is placed between pag and the root unless the suffix an is present, in which case RED must precede pag. The bigram morphotactic learner can never converge on this system; it can only learn variation for both inputs, unavoidably extending to both. Extension can be blocked only when the blocking morpheme is adjacent to the morpheme whose extension is being blocked (as with ki blocking the extension of RED in Table 5, in which the bigram RED-ki is able to 'attract' RED away from what would otherwise be an eligible position; there is no constraint RED-an or an-RED to play the role of attractor in the present case). I return to restrictiveness in §10.

7. AGAINST OTHER MORPHOTACTIC THEORIES. I showed in §4 that bigram constraints can model the types of variation described in §2. I now argue that other possible morphotactic formalisms, namely, morpheme alignment, precedence constraints, and a single big template, undergenerate these scenarios. I also make some preliminary remarks against the morphosyntactic accounts of RED variation proposed by Rackowski (1999), Mercado (2007), and Skinner (2008) before going on to make more general arguments against serial rule-based approaches to affix order in §8.
First, an approach to morphotactics sometimes used in OT is that of morpheme alignment (e.g. Hargus & Tuttle 1997, Trommer 2003, Jaker 2006). An alignment constraint conforming to the schema proposed by McCarthy and Prince (1994) can be used to enforce the proximity of a morpheme to the edge of a phonological constituent such as the prosodic word (PWd). For instance, ALIGN(its, left, PWd, left), which I abbreviate 'its-L' in 23, is violated by the number of morphemes (or some other unit) that intervene between the left edges of the PWd and its. This constraint therefore pressures its to be word-initial. Constraints specifying other morphemes can be formulated on this pattern and ranked against each other, as in 23. This ranking guarantees that its always precedes il in the output, regardless of scope.
(23) Tableau illustrating morpheme-alignment constraints
LANGUAGE, VOLUME 86, NUMBER 4 (2010)
Alignment cannot generate the first three scenarios in §2. It cannot model an affix varying freely between nonadjacent positions, since that would require the alignment constraint of the variable affix to float freely between nonadjacent positions in the hierarchy over a set of fixed constraints: {RED-L >> ka-L >> pag-L; ka-L >> pag-L >> RED-L} (for RED-ka-pag ~ ka-pag-RED). A partially ordered grammar (as in Anttila 1997a,b) could specify ka-L >> pag-L, leaving RED-L unranked with respect to ka-L and pag-L, but that grammar would also generate *ka-RED-pag. A similar problem obtains for statistical models such as stochastic OT (Boersma 1997, Boersma & Hayes 2001), maxent grammar (see §4), and noisy harmonic grammar (see §4). If only morpheme alignment to the word edge is employed, the pre-ka and post-pag positions cannot be generated to the exclusion of the position between ka and pag.
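The overgeneration of *ka-RED-pag under partial ordering can be checked mechanically. The following Python sketch is illustrative only: the function names are hypothetical, the root is omitted for simplicity, and an alignment violation is counted as the number of morphemes preceding the aligned morpheme, one of the units the schema allows. It enumerates every total ranking consistent with ka-L >> pag-L and collects the winners under strict domination.

```python
from itertools import permutations

def align_left(morpheme, candidate):
    """Violations of ALIGN(morpheme, left, PWd, left): morphemes preceding it."""
    return candidate.index(morpheme)

def ot_winner(morphemes, ranking):
    """Strict domination: compare violation vectors lexicographically."""
    return min(permutations(morphemes),
               key=lambda cand: tuple(align_left(m, cand) for m in ranking))

morphemes = ("RED", "ka", "pag")
# Total rankings consistent with ka-L >> pag-L, with RED-L left unranked.
rankings = [r for r in permutations(morphemes) if r.index("ka") < r.index("pag")]
outputs = {"-".join(ot_winner(morphemes, r)) for r in rankings}
```

The three consistent rankings yield RED-ka-pag, ka-pag-RED, and also the unwanted ka-RED-pag, which is the problem noted above.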
Alignment also cannot model the scenario in which an affix is freely situated among a fixed sequence of morphemes, as in 10. Under partial ordering, RED-L must occupy the same stratum as both ʔi-L and pa-L, since RED can precede or follow each of these prefixes. This entails that ʔi, pa, and RED all be freely ordered. A stochastic implementation would require the probability distribution of RED-L to have a greater standard deviation than that of ʔi-L and pa-L, so that RED-L might fall anywhere around these two constraints, which remain narrowly pinned down. But deviation is not a free parameter in the cited models (cf. Reynolds 1994, Nagy & Reynolds 1997). Finally, alignment cannot implement context-sensitive reorderability, as in 11. Because ka and RED are swappable in ma-ka-ROOT, the variable ranking ka-L ~ RED-L is necessary. This variable ranking incorrectly requires that ka and RED be equally swappable in all contexts. For additional arguments against ordering affixes by alignment, see Rackowski 1999 and §9 below.
Another morphotactic formalism is the precedence constraint, as found in Muysken 1981b:266ff., Paster 2006b:184, and Caballero 2011, for example, 'C > A: Causative precedes Applicative' (Caballero). A precedence constraint can be considered the same as a bigram except that a bigram evaluates only adjacent pairs of morphemes, while a precedence constraint evaluates all pairs, regardless of adjacency. Precedence constraints cannot implement variation between nonadjacent positions, for example, Tagalog RED-ka-pag ~ ka-pag-RED. To select the RED-ka-pag variant, the following ranking is needed: {RED > ka, RED > pag} >> {ka > RED, pag > RED}. For the ka-pag-RED variant, the two strata must be swapped wholesale: {ka > RED, pag > RED} >> {RED > ka, RED > pag}. Such contingency in ranking variation is not possible in the cited frameworks. Moreover, precedence cannot model context-sensitivity in affix ordering, for example, Chichewa an 'reciprocal' and its 'causative' being swappable except following il 'applicative'. an > its and its > an must be freely ranked to motivate the variation, but then there is no way to make the variation contingent on the presence of il earlier in the word.
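The difference in how the two constraint types are evaluated can be made concrete with a short Python sketch. The function names are hypothetical, and the sketch assumes that a precedence constraint is assessed only when both morphemes are present; it simply tabulates violation profiles for the four precedence constraints on RED, ka, and pag and does not attempt to reproduce the rankings discussed above.

```python
def violates_precedence(x, y, candidate):
    """X > Y: violated iff X does not precede Y (adjacency is irrelevant)."""
    return candidate.index(x) > candidate.index(y)

def contains_bigram(x, y, candidate):
    """X-Y: true iff X immediately precedes Y."""
    return (x, y) in zip(candidate, candidate[1:])

constraints = [("RED", "ka"), ("RED", "pag"), ("ka", "RED"), ("pag", "RED")]
candidates = [("RED", "ka", "pag"), ("ka", "pag", "RED"), ("ka", "RED", "pag")]

# One row per candidate; columns follow the order of `constraints`.
profiles = {"-".join(c): [int(violates_precedence(x, y, c)) for x, y in constraints]
            for c in candidates}
```

Here RED-ka-pag and ka-pag-RED have mirror-image profiles ([0, 0, 1, 1] and [1, 1, 0, 0]), so moving from one variant to the other requires both members of each stratum to flip together, while the intermediate ka-RED-pag ([1, 0, 0, 1]) splits the strata. Note also that RED precedes pag in RED-ka-pag even though the bigram RED-pag is absent from that candidate.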
Other theorists employ a monolithic template (see Stump 2006 and references therein) to enforce arbitrary ordering restrictions, for example, Hyman's (2003) violable 'CARP' constraint: 'causative > applicative > reciprocal > passive'. As Paster (2006b:184) points out, such templates cannot account for cases such as 8, in which two affixes are freely ordered for both scopes. Scenarios with multiple free variants (e.g. the third and fourth types of variation described in §2) make the need for multiple interacting morphotactics even more apparent. See §9 for additional arguments against templatic theories.
Finally, I briefly address two morphosyntactic accounts of affix order variation in Tagalog. Rackowski (1999) proposes a DISTRIBUTED MORPHOLOGY (Halle 1990, Halle & Marantz 1993) account of the variable placement of RED as optional morphological movement, which she likens to scrambling (see also Vaux 2002). RED is generated next to the root and optionally raises in the morphology to adjoin to any vP head within its phase (see Chomsky 2001 on phases). The phase is taken to be closed by the topic marker, for example, 'object topic' or m 'actor topic'. This phase-bounded movement ostensibly explains why RED is usually ungrammatical preceding a topic marker, but grammatical in various positions after it. Mercado (2007) endorses the phase-boundedness of Rackowski's analysis. Nevertheless, observing that Rackowski's low generation of RED under morphemes such as the causative is incompatible with the fact that RED invariably scopes over such morphemes (see also Travis 2007), Mercado eliminates affix movement, instead envisioning RED as a multipositional head. As such, RED is base-generated simultaneously in all intermorphemic (pre-vP) positions in its phase, including next to the root. In the morphology, a constraint of unique instantiation (Noyer 1993) ensures that only one position of this head is spelled out. Finally, Skinner (2008) argues that RED is generated high in the structure, just inside the topic marker, and optionally lowers to any preroot position.
These phase-bounded analyses greatly overgenerate positions for RED. As described in §3, in a verb such as ma-ka-pag-pa-ROOT, RED is virtually always (over 99.9% of the time) placed before ka or pa. Yet the phase-based analyses fail to differentiate among any of the four positions after ma, merely licensing them all as grammatical, even the positions (e.g. pre-pag here) that the grammarians (e.g. S&O) identify as ungrammatical. Moreover, these analyses fail to capture the crucial role context plays in determining whether a particular position is available. For instance, RED is sometimes acceptable between pag and the root and sometimes not, depending on which other morphemes are present. In mag-si-pag-ROOT, RED is never found between pag and ROOT, while in ma-ka-pag-ROOT, RED is common in that position (38.2% incidence). Finally, these analyses undergenerate outside the phase. For example, in ma-ʔi-ROOT, RED precedes ʔi 12.8% of the time in the corpus, which is also regarded as a grammatical position in S&O (362). But ʔi is a topic marker, so the phase-bounded analyses predict this position to be completely ungrammatical.
8. PARALLEL VS. SERIAL THEORIES OF AFFIX ORDERING. Language-specific rules of postsyntactic movement or dislocation are sometimes invoked to explain arbitrary ordering restrictions (see e.g. Embick & Noyer 2001, Embick 2007 on these operations in distributed morphology). Movement and dislocation are operations in the sense that they are structure-altering rules in a serial derivation, each rule being blind to any subsequent rule applications (Chomsky 1995, 2001, Embick 2007:329, 2008). A constraint-based theory of morphotactics, by contrast, selects the optimal output(s) from a set of candidates evaluated in parallel by a system of ranked or weighted constraints (on the general theory of constraint interaction, see Prince & Smolensky 2004[1993] and Smolensky & Legendre 2006).
In this section I briefly address some general differences between serial-operational and parallel theories of affix ordering. Consider the fixed ordering of the causative and applicative suffixes in Chichewa. When these are the only two derivational morphemes in the verb, the causative its must always precede the applicative il, regardless of which scopes over the other (Hyman 2003). For an applicativized causative, this order is scopal; but for a causativized applicative, it is counterscopal. In a parallel morphotactic theory, a constraint such as CAUS-APPL could be active, dominating any constraints preferring the other order (e.g. other morphotactics and/or, as Hyman proposes, a MIRROR constraint).
In an operational framework such as Embick and Noyer's (2001), the counterscopal ordering could be achieved by movement. Rackowski (1999) suggests that this case involves lowering, perhaps as sketched in Figure 5a, in which the causative head lowers to adjoin to the applicative head. Local dislocation could not be invoked here because the causative skips over intervening morphemes in more complex verbs such as the causativized reciprocalized applicative in Figure 5b, though it can optionally remain in its base position in such verbs (data from Hyman 2003:273). (It is possible that a more powerful version of lowering than what Embick and Noyer (2001) assume is required for this case, such as the version advocated by Skinner 2008.) These rules might be formalized somewhat differently by different authors, but here I intend only to sketch some general differences in the perspectives and predictions of constraint- vs. rule-based frameworks, since there undoubtedly are such differences that transcend certain details of implementation (see also Embick 2008: §1).
In the operational framework, it would be possible to add a second rule to Chichewa moving APPL to CAUS in applicativized causative verbs (not shown). Both of these rules together in the language would make for a hypothetical version of Chichewa (call it Chichewa′) in which both causativized applicatives and applicativized causatives surfaced in counterscopal order. I call this hypothetical phenomenon SCOPAL METATHESIS, in the sense that the two suffixes always surface in counterscopal order. No such language is attested. Constraint-based morphotactics properly rules out scopal metathesis. Affixes can be realized in counterscopal order only when that order is morphotactically motivated, that is, when the reordering (vis-à-vis scope) results in more regularity in order across all outputs. In a system of morphotactic and SCOPE constraints, the complete typology of possible interactions between two affixes was given in 10. This typology includes fixed ordering (morphotactics dominates), scopal ordering (SCOPE dominates), asymmetric compositionality (morphotactics and SCOPE are variably ranked), and free ordering (morphotactic constraints are variably ranked). No ranking can generate scopal metathesis. (Free ordering is not scopal metathesis, since there is no requirement that the affixes surface in counterscopal as opposed to scopal order.)
This fact about the typology extends to all constraint-based morphotactic theories, even if multiple morphotactics interact, as I maintain. It is guaranteed by Moreton's CHARACTERIZATION THEOREM of violable constraint evaluation (1999), according to which deviations from faithfulness (analogous to scope) are possible only if they reduce markedness (analogous to morphotactics). This theorem does not hold of operational approaches; for example, one could posit a rule of symmetrical metathesis, V₁V₂ → V₂V₁, that converts /oa/ into [ao] and /ao/ into [oa] (Moreton 1999:17). Symmetrical metathesis is analogous to scopal metathesis, in which the surface order of two morphemes is always the reverse of their base order. Just as constraint-based phonology with markedness and faithfulness constraints cannot generate symmetrical metathesis, constraint-based morphology with morphotactic and scope constraints cannot generate scopal metathesis.
A second unattested scenario that derivations can generate but morphotactic constraints cannot is the case of asymmetric compositionality (see 7 and 10) in which the fixed order is counterscopal rather than scopal.This could be achieved by having optional movement or dislocation for one scope (as required for the attested pattern) as well as obligatory movement or dislocation for the other scope.
In sum, constraint-based morphotactics predicts that ordering will be either scopally licensed (iff a SCOPE constraint is active) or else insensitive to syntax/semantics, being driven entirely by the active morphotactics. An operational framework, by contrast, predicts (without limiting stipulations) that orderings that do not directly reflect scope could nevertheless be determined by scopal information, as in the case of scopal metathesis, in which the surface order is never faithful to scope but always determined by it. This extra power afforded by derivations is apparently unnecessary.
9. NONTRANSITIVITY. By NONTRANSITIVITY in morpheme order, I refer to the situation in which the ordering restrictions cannot be expressed collectively as a single total or partial order over morphemes. Bigram morphotactics, unlike certain other theories, predicts nontransitivity (see also Anderson 1986). Consider such a case in the abstract: (a) morpheme X must precede Y, (b) Y must precede Z, but (c) X must follow (or optionally follows) Z. This scenario cannot be implemented in a framework in which affix ordering is required to be transitive, for example, alignment (see §7). Under alignment, (a) requires X-L [= Align (X,left,PWd,left)] >> Y-L and (b) requires Y-L >> Z-L. Because of ranking transitivity, (a) and (b) entail X-L >> Y-L >> Z-L, which contradicts (c), according to which X always or sometimes follows Z. Nontransitivity is also problematic for theories in which affix order falls out from the association of each affix with a coherent level, stratum, or position class (e.g. Kiparsky 1982, 2000, Mohanan 1986, Inkelas 1993).
In doubly derived Chumbivilcas Quechua verbs, for instance, ri 'inchoative' can only precede schi 'assistive' and schi can only precede na 'reciprocal' (Muysken 1988:263). If transitivity held, ri would have to precede na. But, in fact, na-ri is the only acceptable order, violating transitivity. As a second example, this time involving variation, paya 'frequentative' must precede ru and ru must precede schi. But paya and schi can occur in either order, again violating transitivity (this is true regardless of whether the variation of paya and schi is free or correlated with scope). Bigram analyses of these two cases are given in 24 and 25, respectively.
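The first Chumbivilcas case can be verified with a brute-force check that no single linear order over the three suffixes respects all three pairwise restrictions. The Python sketch below is illustrative (the function name is hypothetical); a single ranking of alignment constraints, or a single sequence of position classes, would correspond to one such linear order.

```python
from itertools import permutations

def consistent_total_orders(morphemes, must_precede):
    """All linear orders of `morphemes` respecting every (x, y) pair in must_precede."""
    return [order for order in permutations(morphemes)
            if all(order.index(x) < order.index(y) for x, y in must_precede)]

# Pairwise restrictions for doubly derived Chumbivilcas Quechua verbs:
# ri precedes schi, schi precedes na, but na precedes ri.
pairs = [("ri", "schi"), ("schi", "na"), ("na", "ri")]
orders = consistent_total_orders(("ri", "schi", "na"), pairs)
```

The list `orders` is empty, since the three restrictions form a cycle; dropping any one of them leaves exactly one consistent order. Because each restriction concerns only an adjacent pair in a doubly derived verb, three separate bigram constraints can encode them without contradiction.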
A different, context-sensitive type of nontransitivity is exhibited when the ordering of a pair of affixes depends on whether or not some third affix is present. In Huave, for instance, reflexive ay can usually precede only first-person Vs (27a) (Stairs & Hollenbach 1981, Embick & Noyer 2001:576). But in 27b, in the context of plural on, ay must follow Vs, contradicting the ay-Vs ordering implied by 27a.
What happens when all three suffixes in one of these pairwise nontransitive sets cooccur? Consider the Chumbivilcas nontransitive set: la 'just' > n '3', n > kuna 'PL', kuna > la. Muysken reports that for a triply affixed noun containing la, n, and kuna, free variation obtains between -la-n-kuna and -n-kuna-la (1981b:295; §2 above). These four tableaux (three pairs plus one triplet) are given in 26.
*t-e-kohč-ay-as-on PAST-THEME-cut-1-REFL-PL *PAST-THEME-cut-REFL-1-PL 'we cut (PAST) ourselves'
A similar example can be found in Lithuanian (Senn 1966, Nevis & Joseph 1993, Embick & Noyer 2001:578), in which si 'reflexive' is a suffix, unless one or more of a certain class of prefixes is present, in which case si occurs as the second prefix, as in 28.
(29) Nontransitivity in Huave
In short, bigram morphotactics can implement the sorts of nontransitive ordering restrictions instantiated in natural languages. Still, one might justifiably wonder why nontransitivity is as uncommon as it is, given how many rankings/weightings generate it on this proposal. I return to this question at the end of the next section, which addresses the restrictiveness of the theory.
10. ON THE RESTRICTIVENESS OF BIGRAM CONSTRAINTS. Bigram morphotactics is a restrictive theory of affix order variation. Unlike a brute-force approach such as representing possible orderings as a finite-state automaton (e.g. the directed graphs used for illustrative purposes in Jurafsky & Martin 2000: §3, Plag & Baayen 2009), constraint-based morphotactics cannot generate many unattested but logically possible types of affix order variation. Recall, for instance, the attested pattern of context-sensitive variation described in 11, which might be schematized /A,B,C/ → {A-B-C, B-A-C}; /A,B,D/ → {A-B-D} (*B-A-D). In this case, A and B are freely ordered, except when D follows them. This pattern can be modeled by a finite-state automaton.
A system of ranked or weighted bigrams can generate the first grammar (Fig. 7) but not the second (Fig. 8). To get variation between A-B-C-D and B-A-C-D, constraints A-B and B-A must be closely weighted (as in maxent grammar, e.g. Goldwater & Johnson 2003, or noisy harmonic grammar, e.g. Boersma & Pater 2008) or variably ranked (Anttila 1997a,b). There is no constraint that can force the A-B order in the context C-E but not C-D. Therefore, no grammar can rule out B-A-C-E while allowing A-B-C-D and B-A-C-D, as in Fig. 8.
This restrictiveness extends to output patterns for single inputs as well. For example, imagine a grammar in which input /A,B,C,D,E/ has exactly three outputs in free variation (A-B-C-D-E, B-A-D-C-E, B-D-A-E-C). Unlike the directed graph formalism in Figure 9, no bigram constraint grammar can generate those three candidates without substantially overgenerating other candidates (as confirmed by running in Wilson & George 2008 the tableau with all 120 candidate permutations and all twenty pairwise-distinct bigram constraints). These examples are sufficient to show that interacting bigrams are not a brute-force approach like directed graphs.
FIGURE 6. Finite-state automaton for morpheme order.
To model gradient data, a probability could be assigned to each edge. To restrict the machine to be sensitive to the input, a transducer could be employed, adding an input morpheme to each edge in Fig. 6, as in Figure 7.
Despite this formal restrictiveness of bigram morphotactics, there remain various typological tendencies in affix ordering that the theory does not address. To pick one, in a language with case and number affixes, number is almost always inside case (Greenberg 1963, Hawkins & Gilligan 1988). But under the present proposal, both CASE-NUMBER >> NUMBER-CASE and the opposite ranking are equally accessible to the learner. Even if we allowed that some faithfulness constraint was in the constraint set penalizing number outside of case (e.g. LINEARITY in Horwood 2002), as long as the learner is capable of positing a morphotactic constraint favoring the other order and candidates with the other order are generated, number outside of case is predicted to be synchronically accessible. This issue is a concern not only for my proposal, but also for any general theory addressing affix order variation and language-specific morphotactics, such as the theories mentioned in §7 and §8.
Nevertheless, it is often not clear to what extent such restrictions are imposed by the synchronic morphology as opposed to being diachronic artifacts with explanations in other domains (on piecing apart synchronic and diachronic explanation of typological generalizations, see e.g. Blevins 2004, Kiparsky 2006, Wilson 2006, Anderson 2008, and Moreton 2008). Is a language with number-outside-of-case morphology unlearnable? The existence of exceptions to the generalization (Konstanz Universals Archive, record 7; Plank & Filimonova 2000) demonstrates that this is not so. In that case, is there some kind of learning bias (Wilson 2006, Moreton 2008) disfavoring such a language? Or might its rarity be due entirely to the typical chronology of affixal fusion, whose causes might in turn be associated with other domains, such as the syntax (cf. Hawkins & Gilligan 1988, Bybee et al. 1990, Siewierska & Bakker 1996, Trommer 2003, etc.)? The Konstanz Universals Archive entry for the number/case generalization, for instance, includes the observation: 'Number will always be grammaticalized before Case, hence bound Number exponents will always end up closer to the stem and bound Case exponents will always be more marginal' (Frans Plank). Evaluating these possibilities is beyond the scope of this article, but if Plank's suggestion is on the right track, hardwiring the case/number generalization into the synchronic morphology might not only be unnecessary and redundant (since the scarcity of the pattern would already be accounted for), but should in principle be empirically falsifiable (e.g. if learners exhibit no difficulty in acquiring the less common order).
In a similar vein, my proposal does not synchronically distinguish between the markedness of free variation vs. nonvariation, or transitivity vs. nontransitivity. Yet nonvariation and transitivity are impressionistically much better represented than their counterparts typologically. In these cases, diachronic and/or functional (see e.g. Hay 2003, Hay & Plag 2004, and Plag & Baayen 2009) explanations of the asymmetries are especially plausible. I have argued in §6 that variation can evolve from a categorical system under special conditions (e.g. an infixing reduplicant reanalyzed as mobile due to morphotactic extension). But under more run-of-the-mill circumstances, the learner would typically have no motivation to innovate free variation (e.g. the Tagalog learners above did not learn variation for morphemes other than aspectual RED); the affixes would remain fixed in the order in which they were morphologized or the order determined by scope.
Likewise, if we assume that affixes are usually morphologized in a fixed chronological sequence (X, then Y, then Z), the combinations will be transitive by default (X-Y, X-Z, Y-Z, X-Y-Z), and the learner would typically have no reason to deviate from the transitive learning data. But under less frequent conditions, the learner might innovate nontransitivity (e.g. due to analogy, as in §6) or affixes might be morphologized in a nontransitive order. As an example of the latter, recall the case of nontransitivity in Lithuanian in 28 in which si 'reflexive' can either precede or follow the root, depending on whether a prefix (of a certain type) is present. According to Nevis & Joseph (1993), si was originally a second-position clitic, explaining its variable placement with respect to the root. If second-position clitics are uncommon, we would not expect this sort of situation to be morphologized often, even though it is entirely learnable when it is morphologized. In sum, any synchronic theory that cannot generate variation or nontransitivity is inadequate. But the explanation for the relative scarcity of variation and nontransitivity might reside in historical and functional considerations.
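The distinction between transitive and nontransitive systems drawn above can be made concrete with a small check. The following is a minimal sketch under a simplifying assumption: each observed two-affix order is treated as a precedence pair, and a system counts as transitive if some single linear order of the affixes is consistent with every pair. The function name `is_transitive` is hypothetical, and the brute-force search is meant only for the handful of affixes in schematic examples.

```python
from itertools import permutations


def is_transitive(pairs):
    """Return True if one linear order of the affixes respects every pair.

    pairs: iterable of (left, right) tuples, each meaning that `left`
    precedes `right` when the two affixes cooccur.
    """
    pairs = list(pairs)
    affixes = sorted({affix for pair in pairs for affix in pair})
    for order in permutations(affixes):
        position = {affix: i for i, affix in enumerate(order)}
        if all(position[a] < position[b] for a, b in pairs):
            return True
    return False


# Affixes morphologized as X, then Y, then Z: transitive by default.
default_system = [("X", "Y"), ("X", "Z"), ("Y", "Z")]
# Schematic nontransitive system: X-Y and Y-Z, but Z-X.
nontransitive_system = [("X", "Y"), ("Y", "Z"), ("Z", "X")]

print(is_transitive(default_system))
print(is_transitive(nontransitive_system))
```

A grammar restricted to a single global ordering of affixes can describe only systems for which this check succeeds, which is why a synchronic theory also needs a way to represent the second kind of system.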
CONCLUSION. Bigram morphotactic constraints provide a constrained, sufficiently powerful, and demonstrably learnable means of implementing local morphological restrictions on the placement of RED in Tagalog and similar semantically unpredictable affix ordering restrictions in other languages. In fact, the model does better than covering the facts. When trained on an impoverished 'core' corpus of invariable (least marked output only) Tagalog data, the bigram learner correctly anticipates to a large extent which variant positions for RED should and should not be allowed, and in roughly which proportions. These simulations show that when a categorical training corpus exhibits certain characteristics, variation can be easier to learn than categoricality, motivating its diachronic emergence and synchronic stability. The bigram schema is intended to be universal, but individual bigram constraints must be posited during learning (as in Pater 2007: 'morpheme-specific constraints are constructed from universal constraints in the course of learning'). Roughly speaking, this entails that as the learner parses morphemes, he or she also pays attention to the local sequences in which those morphemes occur (see especially Poon et al. 2009 on morpheme parsing in a log-linear framework closely related to the maxent formalism in this article). Bigram morphotactics is intended to supplement, not replace, semantic factors in affix ordering such as scope. As discussed in §§1-2, scope plays a role in some (sub)systems, but not others. Because free variation is by definition not correlated with semantic effects, another theory, such as the present one, is necessary to account for its grammar.
FIGURE 5. Sketches of causative lowering in Chichewa.

FIGURE 8. Automaton for an unattested pattern of variation.

FIGURE 9. Automaton for an unattested pattern of variation for a single input.

TABLE 1. Basic ordering typology for adjacent affixes.

TABLE 2. RED placement in twenty-nine prefix strings.

TABLE 3. Constraint weights after learning.

TABLE 4. Learner's predictions after training on filtered corpus.

TABLE 5. A closer look at some of the learner's predictions.

The generated-to-actual fit (by 'fit' I mean 100% minus the weighted mean error) is plotted against time (log tokens babbled) in Figure 4. The dotted line at 95.4% indicates the fit between the training and actual corpora. The generated-to-actual fit converges to this line at asymptote as the learner perfects its match to the training corpus.
FIGURE 4. Learner's improvement over its impoverished training corpus over time.

TABLE 6. Ten examples of nontransitivity from Chumbivilcas and Tarata Quechua.