Resumption Still Does Not Rescue Islands

Dustin Heestand (Harvard University), Ming Xiang (University of Chicago), Maria Polinsky (Harvard University)


Introduction.
Since Ross 1967, island constraints have been a major topic in syntactic research; however, the status of different types of islands and their psychological reality remain subjects of heated debate to this day. Environments where island constraints appear to be violated are particularly important to our understanding of the constraints. Resumptive pronouns (RPs) have traditionally been cited as an island-rescuing device in English and other languages (Ross 1967, Kroch 1981, Erteschik-Shir 1992). And indeed, in some languages, RPs show immunity from island constraints (Aoun et al. 2001, McCloskey 2006), as in the following Lebanese Arabic example:

(1) ħđrna l-masraħiyye yalli tʕarrafna ʕala l-muχriʒ yalli ʔaχraʒ-*(a)
    saw.1PL the-play that met.1PL on the-director that directed.3SM-*(it)
    'We saw the play that we met the director that directed it.' (Aoun and Choueiri 1996, ex. (12))

In English, RPs have been reported to appear in island and non-island contexts, as attested in spontaneous speech (Prince 1990) and in laboratory production studies (Zukowski and Larsen 2004, Ferreira and Swets 2005):

(2) a. I have this friend who she does all the platters. (Prince 1990)
    b. This is the donkey that I don't know where it lives. (Ferreira and Swets 2005)
    c. The man who the spider is falling on his head… (Zukowski and Larsen 2004)

This productive use of RPs might suggest that resumption is a strategy available to the grammar of English just as it is available to the grammar of languages like Irish or Lebanese Arabic. On this view, the main difference may lie in how acceptable speakers of each language find this strategy, and how often they use it. In English, for instance, resumption is extremely rare (Creswell 2002; Hermann 2005; Jaeger 2006; Manetta 2007; Bennett 2008), so it may have the same "last resort" character that Shlonsky (1992) proposes for Hebrew and Palestinian Arabic (both languages with fully grammatical resumption). When speakers have painted themselves into a syntactic corner, they use RPs to salvage what they can of their intended meaning. Nevertheless, this use of RPs forms part of the grammatical knowledge of native English speakers, and therefore a descriptively adequate syntactic theory should have a means of representing them. Assuming such an account, the relative ungrammaticality of resumption in English might result merely from frequency effects.
Turning now to experimental investigation of RPs, there is a deep discrepancy between production studies and comprehension studies. Zukowski and Larsen (2004) and Ferreira and Swets (2005) asked participants to judge the acceptability of sentences with the same resumptive structures that participants had readily produced (cf. (2b, c) above). Resumptive structures were consistently rated significantly below the grammatical controls. However, these studies did not compare the RPs with their illicitly gapped counterparts, and were therefore inconclusive as to whether resumption can rescue islands. Across different languages and conditions, the results of Alexopoulou and Keller (2007; henceforth A&K) consistently showed that when extracting from an island, strong or weak, a resumptive structure was never more acceptable than its gapped counterpart. On the other hand, the violation caused by the RP was ameliorated by increased syntactic distance. The latter finding is particularly surprising in light of corpus studies (Bennett 2008, Jaeger 2006), which show English resumption to be more common for highest subjects than for embedded subjects.
If we combine all these findings, a paradox emerges: first, resumption in English seems to be most common where it is least acceptable; and second, the amelioration effect is not reciprocal.
That is, more deeply embedded islands can improve the acceptability of RPs, but RPs can never improve the acceptability of island violations.
The current study takes this paradox seriously and attempts to fill two gaps in the previous studies. First, we wish to test the acceptability of more types of islands in both declarative and wh-question contexts. It is well known that syntactic islands are not a homogeneous group: they are amenable to gap creation to varying degrees (Phillips 2006, Sprouse et al. in press, Sprouse et al. 2010), and thus, presumably, to RPs. In testing resumption, A&K did not test declarative statements, where resumption is rampant in production (instead, they tested wh-questions, where no such effect has been observed). In order to strike at the heart of the production-comprehension mismatch, an investigation of RPs in declaratives is necessary.
Second, RP acceptability should be tested in an online task, not an offline one (A&K used the latter). One reason that RPs occur more often in production might be the time pressure that speakers face during production. In order to address this possibility, we need studies of RP acceptability in an online comprehension task, which puts people under a similar kind of time pressure. Setting a longer-term agenda, we need to obtain online comprehension results for both visual (reading) and auditory presentation. If we find no rescuing effect of RPs in any of these circumstances, we have stronger evidence that, at least for English, resumption has no capacity to ameliorate island constraint violations in comprehension. The visual presentation is necessary to better contextualize the results reported by A&K and to remove the possible confounds in their study. The auditory presentation is necessitated by the fact that RPs are very much a spoken-register phenomenon (see also Bennett 2008 and Jaeger 2006). Looking back over the many years of introspection into resumption, auditory presentation has always been the dominant method of establishing whether something is acceptable or not: linguists would say a sentence to themselves or their friends.
This squib takes the first step to this end, presenting three experiments on reading. The first tested RPs in complex NP islands, using an offline judgment task (these data are needed to fill the gap left by A&K's study). The second and third experiments used online judgment tasks, in which we tested relative clause islands and adjunct islands. In all these experiments, we also compared declarative sentences with wh-questions; the latter allowed us a more direct comparison with A&K's results.

EXPERIMENT 1: Resumptive pronouns within complex NP islands.
In this experiment we tested the rescuing function of RPs for complex NP islands, using two types of complex NP islands, one with a factive complement clause and one with a standard relative clause: Experiment 1a examines RPs in factive clauses; Experiment 1b, in relative clauses.

EXPERIMENT 1a. Factive Complement Clauses
Materials, subjects, procedure. In a 2x3 design, we crossed two factors: construction type and gap type. For the first factor, construction type, participants read either a declarative or a wh-question, each containing a complex NP with a factive complement clause; for the second factor, gap type, participants read a sentence with a gap within the island, a sentence with a resumptive pronoun within the island, or a grammatical control sentence with a gap in the main clause. Example sentences are given in (4):

(4) Factive clauses-Declaratives
    a. This is the man that the news that the police arrested __ shocked the public. (Gap in an island)
    b. This is the man that the news that the police arrested him shocked the public. (RP in an island)
    c. This is the man that Mary thought that the police arrested __ to protect the president. (Grammatical control)
    Factive clauses-Wh-questions
    d. Which man did the news that the police arrested __ shock the public? (Gap in an island)
    e. Which man did the news that the police arrested him shock the public? (RP in an island)
    f. Which man did Mary think that the police arrested __ to protect the president? (Grammatical control)

There were a total of 30 sets of experimental items, each with the six conditions described above. The experimental sentences were distributed across six lists in a Latin-square design. In addition, there were 108 fillers. Each participant was given a printed copy of one of the lists, with a total of 138 sentences in a randomized order. The participants were instructed to score the acceptability of each sentence on a 1 to 7 scale, where 7 indicated perfect acceptability and 1, unacceptability. They were instructed to judge based on their native-speaker intuition rather than any prescriptive rules, and to go with their first instincts rather than spending time pondering their answers. Eighteen native speakers of English from the Boston area participated in our study.
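The list construction described above can be illustrated with a minimal sketch. The text states only that 30 item sets in six conditions were distributed over six lists in a Latin-square design; the particular rotation scheme, the function name `latin_square_lists`, and the condition labels below are assumptions made for illustration, not the authors' actual procedure.

```python
from collections import Counter

# Six conditions from the 2x3 design (labels are illustrative assumptions).
CONDITIONS = [
    ("declarative", "gap"),
    ("declarative", "resumptive"),
    ("declarative", "control"),
    ("wh-question", "gap"),
    ("wh-question", "resumptive"),
    ("wh-question", "control"),
]


def latin_square_lists(n_items=30, conditions=CONDITIONS):
    """Build one presentation list per condition.

    Each list maps every item index to exactly one condition. The
    condition is rotated by one step from list to list (an assumed
    scheme), so each item appears in every condition across the lists.
    """
    n_lists = len(conditions)
    lists = []
    for list_index in range(n_lists):
        assignment = {
            item: conditions[(item + list_index) % n_lists]
            for item in range(n_items)
        }
        lists.append(assignment)
    return lists


lists = latin_square_lists()

# Number of items per condition within the first list.
per_condition = Counter(lists[0].values())

# Total sentences per participant: experimental items plus 108 fillers.
total_sentences = len(lists[0]) + 108
```

Under this scheme each participant sees every item exactly once and each condition equally often, which is the property a Latin-square distribution is meant to guarantee.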

Results.
Average ratings for Experiment 1a are shown in Figure 1. The 2x3 ANOVA reveals a main effect of gap type both by subjects (F(2, 34) = 68.9, p < .001) and by items (F(2, 58) = 122.5, p < .001). We did not find any main effect of construction type, nor any interactions.
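As a quick check on how the reported degrees of freedom relate to the design, the sketch below computes them for a within-unit factor in a repeated-measures ANOVA. It assumes that both factors were manipulated within subjects and within items, as is usual with a Latin-square design; the function name `within_factor_df` is hypothetical.

```python
def within_factor_df(levels, n_units):
    """Degrees of freedom for a within-unit factor.

    levels  -- number of levels of the factor (3 for gap type)
    n_units -- number of subjects (F1) or items (F2)
    Assumes a fully within-unit design, which the text does not
    state explicitly.
    """
    df_effect = levels - 1
    df_error = (levels - 1) * (n_units - 1)
    return df_effect, df_error


by_subjects = within_factor_df(levels=3, n_units=18)  # 18 participants
by_items = within_factor_df(levels=3, n_units=30)     # 30 item sets

print(by_subjects)  # (2, 34)
print(by_items)     # (2, 58)
```

The same formula yields (2, 46) for the 24 participants of the online experiments, and (1, 23) and (1, 29) for the two-level construction type factor.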
Planned comparisons found that the main effect of gap type was driven by the difference between the grammatical control and the ungrammatical conditions. In the group of the declarative constructions, the control condition was rated significantly higher than the gapped condition (

EXPERIMENT 1b: Relative Clauses
Materials, procedure and subjects. This experiment focuses on RPs within relative clause islands. The experimental design is the same as in Experiment 1a. An example of the stimuli is given in (5). There were 30 sets of experimental items and 114 fillers. The procedure was the same as in Experiment 1a. Eighteen native speakers of English from the Boston area participated in the study.

Results.
Average ratings for Experiment 1b are shown in Figure 2. The 2x3 ANOVA found a main effect of construction type, which is marginal by subject analysis (F(1, 17) illicit gaps. For the grammatical controls, at least for relative clauses, wh-constructions were rated lower than declarative constructions. This difference could be due to the fact that in our material, the wh-controls are slightly longer than the declaratives, and hence a bit more complex.
With respect to the absence of a rescuing effect of RPs, a number of studies have suggested that RPs are used as a "last-resort" strategy. If the acceptability of resumption is related to processing resources, people might consider RPs more acceptable than gaps when they have a limited amount of time to process them. To test whether this is the case, in Experiment 2 we repeated Experiment 1b as an online acceptability judgment task. An additional advantage of the online task is that it allows us to collect response time data in addition to the acceptability judgment data.

Experiment 2: Online acceptability judgments of relative clause islands
Materials, procedure, and subjects. The materials were the same as in Experiment 1b. The experiment was implemented using Linger (Rohde 2007). All sentences were automatically randomized. Each sentence was presented word by word, with each word displayed for 400 ms. After the last word of each sentence, participants used the mouse to select a rating on a 1 to 7 acceptability scale (7: perfectly grammatical, 1: ungrammatical). Twenty-four native speakers of English from the Boston area participated in the study.
Turning now to the reaction times, RTs longer than 4500 ms (2 standard deviations above the mean) were excluded from the data analysis. The mean RT results are presented in Figure 4. A 2x3 ANOVA found no main effect of construction type (F1(1, 23) = 0.5, p > .5; F2(1, 29) = 0.2, p > .5). There is a significant effect of gap type (F1(2, 46) = 3.5, p < .05; F2(2, 58) = 4.6, p < .05). No interaction was found (F1(2, 46) = 0.8, p > .1; F2(2, 58) = 1.4, p > .1). Planned comparisons found that the significant effect of gap type was mainly driven by the difference between the resumption conditions and the other conditions. For the group of wh-constructions, the RTs for the resumption condition were significantly shorter than for the control condition (t1(1, 23) = -2.4, p < .05; t2(1, 29) = -2.7, p < .05). There was also a numerical trend for the RTs in the resumption condition to be shorter than in the gap condition, but this difference did not reach significance. For the declarative group, the RTs for the resumption condition were marginally shorter than for the control condition by subject analysis (t1(1, 23) = -1.8, p = .08; t2(1, 29) = -1.5, p > .1); also in this group, the RTs for the resumption condition were significantly shorter than for the gap condition by item analysis, and marginally so by subject analysis (t1(1, 23) = -1.8, p = .08; t2(1, 29) = -2.4, p < .05). No other difference was observed.

<Figures 3, 4 here>

Summary of Experiment 2. The online rating results from Experiment 2 replicated the offline results of Experiment 1b. Once again, RPs did not show any rescuing effect, even when subjects were under time pressure. Interestingly, the RT data also showed that people judged the resumption condition fastest. In other words, rather than having any rescuing effect, RPs made it easier for people to detect the ungrammaticality of the sentences. In addition, as in Experiment 1b, grammatical wh-constructions were again rated lower than grammatical declarative constructions. The RT difference between the two grammatical conditions, however, did not reach significance.
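The RT trimming and the by-subject (F1) and by-item (F2) aggregation reported for Experiment 2 can be sketched as follows. Only the 4500 ms cutoff comes from the text; the trial records, field names, and the function `condition_means` are invented for illustration and do not reproduce the authors' analysis scripts or data.

```python
from collections import defaultdict
from statistics import mean

RT_CUTOFF_MS = 4500  # exclusion threshold reported in the text


def condition_means(trials, unit_key, cutoff=RT_CUTOFF_MS):
    """Mean RT per (unit, condition) cell after trimming.

    unit_key is "subject" for a by-subject (F1) analysis or "item"
    for a by-item (F2) analysis. Trials with an RT above the cutoff
    are dropped before averaging.
    """
    cells = defaultdict(list)
    for trial in trials:
        if trial["rt"] <= cutoff:
            cells[(trial[unit_key], trial["condition"])].append(trial["rt"])
    return {cell: mean(rts) for cell, rts in cells.items()}


# Invented toy data, not taken from the experiment.
trials = [
    {"subject": "s1", "item": 1, "condition": "resumptive", "rt": 900},
    {"subject": "s1", "item": 2, "condition": "gap", "rt": 5200},  # trimmed
    {"subject": "s2", "item": 1, "condition": "gap", "rt": 1500},
    {"subject": "s2", "item": 2, "condition": "resumptive", "rt": 1100},
]

by_subject = condition_means(trials, "subject")  # input to F1 / t1
by_item = condition_means(trials, "item")        # input to F2 / t2
```

The two dictionaries hold the cell means on which separate subject and item analyses would be run; the ANOVA and planned comparisons themselves are not implemented here.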
Overall, Experiments 1 and 2 showed that RPs have no rescuing effect for violations of complex NP islands. Complex NPs are generally considered strong islands (Postal 1998, Szabolcsi 2006). In the last experiment, we tested RPs in adjunct clauses, at least some of which have been considered weak(er) islands (Cinque 1990; Truswell 2007), thus establishing a contrast with the strong islands considered in previous work.

Experiment 3: Online acceptability judgments of adjunct islands
Materials, procedure, and subjects. The experimental design is the same as in the previous experiments, except that the target stimuli are adjunct islands. In theoretical work, such islands are considered strong; an experimental investigation by Hiramatsu (1999, 2000) shows that their status is particularly opaque. An example of the experimental stimuli is given in (6).
(6) Adjunct Clause-Declaratives
    a. This is the dish that, although the chef overcooked __, the guests were not upset. (Gap within an island)
    b. This is the dish that, although the chef overcooked it, the guests were not upset. (RP within an island)
    c. This is the dish that, although the chef overcooked the sauce, the guests enjoyed __. (Grammatical control)
    Adjunct Clause-Wh-questions
    d. Which dish did Gina think that, although the chef overcooked __, the guests were not upset? (Gap within an island)
    e. Which dish did Gina think that, although the chef overcooked it, the guests were not upset? (RP within an island)
    f. Which dish did Gina think that, although the chef overcooked the sauce, the guests enjoyed __? (Grammatical control)
There were a total of 30 sets of experimental sentences and 114 fillers, which were used in an online rating task. The procedure was the same as in Experiment 2. Twenty-four native speakers of English from the Boston area participated in our study.
Mean RTs (in ms) for Experiment 3 are presented in Figure 6. Again, RTs longer than 4500 ms were excluded from the data analysis. We did not observe any significant difference in the mean RT data for Experiment 3.

<Figures 5, 6 here>

Summary of Experiment 3. The rating results in Experiment 3 show that adjunct islands, at least the kind presented here, are very weak: sentences without any island violations had only a slight advantage in rating over those with island violations. This result adds to the growing body of evidence that adjunct islands are a heterogeneous group (see also Hiramatsu 2000, Truswell 2007, Sprouse et al. 2010). The particular adjunct islands we used (although-, while-islands) consistently show greater transparency. The question of which adjunct islands are weaker, and why, remains open, and we are not ready to offer any speculations. However, critically for our purposes, sentences with RPs again showed no advantage over sentences with gaps.

Conclusions.
This paper took as its starting point the conclusion, reached on the basis of experimental work, that English resumption does not ameliorate island effects (Alexopoulou & Keller 2007).
The reasons to question this result had to do with the methodology (offline reading-based judgments) and the nature of islands tested. Our own study included the contexts where resumption in English is more commonly found (declaratives; relative and adjunct clauses) and employed a different methodology. Nonetheless, resumption still failed to rescue the island violations and was judged ungrammatical. While not the final word in the long-standing debate on the role of resumption in English (the next logical step, which we are now planning, is to test the auditory presentation of the offending stimuli), this result suggests that the role English RPs play in production is not that of island rescuers.
If this is the case, what is the function of RPs in English? We have shown that resumption does not help the hearer, or more accurately, the reader (and it has been reasonably well established that it is ungrammatical in written discourse). One possible explanation is that English resumption is, unlike Irish resumption, not a strategy for establishing A'-binding relations, but rather something more similar to cross-sentential anaphora (see also Erteschik-Shir 1992). If this is the case, performance pressures in production could lead to speakers resorting to such anaphora as a way of adding more information without breaking the production chain. If this suggestion is on the right track, the difference between production and comprehension with respect to resumption falls outside the domain of grammar and pertains to the planning of an utterance. That in turn would account for the paradox we noted earlier: naturally occurring resumption is more common in the subject position of an embedded or relative clause, which is also the context where it is judged most ungrammatical. Subjects (or maybe topics) are privileged with respect to coreference across clauses and in discourse (Keenan and Comrie 1977, Comrie 1987, Lambrecht 1994, Erteschik-Shir 2007), and this privileged status with respect to coreference would favor them over other arguments in the use of RPs by the coreference-marking speaker. Thus, resumption in English may be still another instance of phenomena where, contrary to belief, speakers structure an utterance to meet their own needs, in addition to the needs of the hearer (for other instances of speakers following their needs rather than those of the listeners, see Brennan and Clark 1996, Engelhardt et al. 2006, a.o.).
Finally, if the use of RPs in English is nothing more than a speaker-centered device for maintaining coreference, we are in a position to better differentiate it from non-intrusive resumption in such languages as Irish (McCloskey 2006 and references therein).
With further experimentation, intrusive resumption in English could be differentiated from yet another type of licit resumption: that found in Italian. In Italian, left-dislocated elements may be doubled by clitic RPs, but the two elements may not be separated by an island boundary, indicating that movement is involved (Cinque 1990). Therefore, we would expect RPs in Italian to surpass the acceptability of gaps under deep embedding in the absence of an island, in contrast to their behavior in English, Greek, or German (Alexopoulou and Keller 2007). However, in the presence of an island, Italian judgments should mirror those in the present experiments.
Finally, we would like to remind the reader that ungrammatical sentences with RPs were judged in our experiments as bad very quickly. In cases where RP judgments were faster than

References.
Truswell, Robert. 2007. Extraction from adjuncts and the structure of events. Lingua 117, 1355-1377.
Zukowski, Andrea, and Jenny Larsen. 2004. The production of sentences that we fill their gaps. Poster presented at the CUNY Sentence Processing Conference, University of Maryland.