Learner corpora (large collections of spoken or written texts produced by second or foreign language learners) provide unique insights into language learning. Yet the great bulk of available learner corpora are based on writing, which has largely precluded their use in exploring the pragmatics of interaction, essential for the development of interactional competence (see Culpeper et al., 2018: 5-6). While some conversational spoken learner corpora have been produced, publicly available datasets have notable limitations. For example, the LINDSEI corpus (see https://uclouvain.be/en/research-institutes/ilc/cecl/lindsei.html) varies the setting in which the corpus data was gathered, which makes this variable difficult to control. In addition, the relatively small scale of the corpus (just over 1 million words) encourages data aggregation, which may lead to misleading results (see Friginal and Polat, 2015, for a critique; see McEnery et al., 2019, for a fuller discussion of the drawbacks of current learner corpora). In second language acquisition research, the investigation of pragmatics has been guided by small experimental studies (see Culpeper et al., 2018). While these are valuable, there is now a consensus that the study of language acquisition is so complex that experimental studies need to be supplemented by insights gained from suitable large learner corpora (Rebuschat et al., 2017), as demonstrated in Ellis et al. (2017).
So, while learner corpus research is growing significantly, work on learner speech is rare, contributing to what Culpeper et al. (2018: 194) rightly call ‘a dearth of studies of pragmatic development utilizing corpus techniques’. For example, the bibliography of learner corpus research (see https://uclouvain.be/en/research-institutes/ilc/cecl/learner-corpus-bibliography.html) currently lists over 1,100 publications in the past three decades; only 111 (9.7%) explore speech and a mere 16 principally focus on pragmatics through studies of discourse markers (14) and speech acts (2). Notable is the absence of any studies of macrostructures within which, for example, discourse markers and speech acts are organized. Yet mastery of such structures can have a direct impact on performance for second language users as they attain “over time the ability to carry out dialogic interaction with … [an L1 speaker] … with fewer communication gaps, which may be the result of their increased narrative and discursive abilities” (Collentine and Freed, 2004: 163). This talk will directly address this lack of work on pragmatic macrostructures by using newly available corpora, annotated with discourse structure, to investigate discourse in learner speech.