Corpus





The SpeechReporting corpus

The SpeechReporting corpus contains corpora of traditional folk stories, annotated for a number of discourse phenomena using the ELAN-CorpA software and tools (Chanard 2015; Nikitina et al. 2019). It is updated regularly with newly available data, including data from new languages. All texts are transcribed, glossed, translated, and annotated.

The corpus was developed as part of the project “Discourse Reporting in African Storytelling”, funded by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 758232, PI Tatiana Nikitina). In addition to standard glosses and part-of-speech tags, it is annotated for instances of reported discourse; see our Annotation Guide.

Languages and their corpora

  • Bandial (Jóola Eegimaa)
  • Bashkir
  • Chuvash
  • Gizey
  • Guro
  • Kafire
  • Macedonian (coming soon)
  • Mwan (coming soon)
  • Udihe
  • Ut-Ma’in (coming soon)
  • Wan (coming soon)




A narrative corpus of Bashkir

Language and storytelling tradition

Bashkir is a Turkic language spoken in Bashkortostan (Russia) by approximately 1.2 million people. The traditional way of storytelling no longer exists: people no longer gather to tell stories. However, some people still remember tales that older people told them in their childhood. In addition, many tales have been published, and people may retell these written versions.

Corpus composition

The current version of the corpus contains about two hours of texts (10,500 words) and corresponding video files. 1 hour 20 minutes of recordings were collected by Ekaterina Aplonova during fieldwork in 2019; the rest comes from the Spoken Corpus of Bashkir (without video). The texts were recorded in the following villages: Abzanovo, Aslayevo, Baimovo, Kyzgy, Mullakaevo, Rakhmetovo, Tuishevo, and Usakly.

Orthography

The texts are transcribed in the Latin script with some extensions, unlike standard Bashkir orthography, which is based on the Cyrillic script. The system employed here is mainly a transliteration of the standard Bashkir orthography, but it has some features of phonological transcription. Correspondences between the standard orthography and the transliteration system employed here are listed on the Spoken Corpus of Bashkir website.

List of parts of speech
  • adj – adjective
  • adv – adverb
  • conj – conjunction
  • cop – copula
  • intj – interjection
  • n – noun
  • n.prop – proper noun
  • nsuf – noun suffix
  • num – numeral
  • numsuf – suffix of numerals
  • onomat – onomatopoeia
  • post – postposition
  • pron – pronoun
  • part – particle
  • suf – suffix
  • v – verb
  • vsuf – suffix of verbs
  • word – unclassified


List of glosses
  • A.CV – converb (-a)
  • ABL – ablative
  • ACC – accusative
  • ADJ – adjectivizer
  • ADV – adverbializer
  • AFF – affective
  • AG – agent nominalization
  • B.CV – converb (-p)
  • CAUS – causative
  • COND – conditional
  • CV.ANT – anterior converb
  • CV.TERM – terminative converb
  • DAT – dative
  • FUT – future
  • GEN – genitive
  • HORT – hortative
  • IMP – imperative
  • IMP.EMPH – emphatic imperative
  • IPFV – imperfective
  • JUSS – jussive
  • LOC – locative
  • NEG – negative suffix
  • NEG.CV.ATT – negative form of converb
  • NEG.POT – negative form of the potential
  • NMLZ – nominalization
  • NUM.SUBST – substantivizer of numeral
  • ORD – ordinal numeral
  • P.1PL – possessive suffix of 1PL
  • P.1SG – possessive suffix of 1SG
  • P.2PL – possessive suffix of 2PL
  • P.2SG – possessive suffix of 2SG
  • P.3 – possessive suffix of 3rd person
  • PASS – passive
  • PC.PST – past participle
  • PL – plural
  • PLPF – pluperfect
  • POSS.SUBST – substantivizer of possessor
  • POT – potential
  • PST – past tense
  • PTCP.FUT – future participle
  • Q – question marker
  • RECP – reciprocal
  • REFL – reflexive
  • \RUS – Russian borrowing

Acknowledgements

I would like to thank my Bashkir language assistants, who told me their stories, and Fizaliya Makhanova for her help with transcription, as well as Lilya Buskunbaeva and Ramilya Karimova from the Ufa Institute of Linguistics for their invaluable help during data collection and analysis.

Citing the sub-corpus:
Aplonova, Ekaterina. 2021. A narrative corpus of Bashkir. In Nikitina, Tatiana, Ekaterina Aplonova, Izabela Jordanoska, Abbie Hantgan-Sonko, Guillaume Guitang, Olga Kuznetsova, Elena Perekhvalskaya & Lacina Silué (eds.) The SpeechReporting Corpus: Discourse Reporting in Storytelling. CNRS-LLACAN & LACITO, http://discoursereporting.huma-num.fr/index.html





A narrative corpus of Bandial (Jóola Eegimaa)

Language and storytelling tradition

Jóola Eegimaa is an Atlantic language spoken in Senegal by approximately 15,700 people. The practice of storytelling is largely extinct in Senegal; even in remote areas, the only people capable of telling stories are 'rememberers' of their grandparents' generation. Among the Jóola, many tales are likely borrowed from other linguistic-ethnic groups such as the Mandinka.

Corpus composition

The current version of the corpus contains about two hours of texts (about 12,000 words) and corresponding video files. The data was collected over a three-month period in 2018–2019 by Abbie Hantgan-Sonko in the area known as The Kingdom in Casamance.

Orthography

The transcription follows the official Jóola (also written Jola or Diola) orthography, which indicates long vowels by doubling the letter (aa ee ii oo uu) and advanced tongue root ([+ATR]) by an acute accent over the first vowel of a stem (á é í ó ú). Retracted, or [-ATR], roots are unmarked.
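
The two orthographic conventions described here can be checked mechanically. The following is a minimal Python sketch, not part of the corpus tooling: the function names are hypothetical, and it assumes the input has already been segmented into a stem, since the acute accent sits on the first vowel of the stem rather than of the whole word. Characters are decomposed (Unicode NFD) so that the accent can be detected separately from the vowel letter.

```python
import unicodedata

VOWELS = set("aeiou")
ACUTE = "\u0301"  # combining acute accent


def _decompose(word):
    """Split a word into (base letter, has_acute) pairs."""
    pairs = []
    for ch in unicodedata.normalize("NFD", word.lower()):
        if ch == ACUTE and pairs:
            # Attach the accent to the preceding base letter.
            pairs[-1] = (pairs[-1][0], True)
        elif not unicodedata.combining(ch):
            pairs.append((ch, False))
    return pairs


def is_marked_atr(stem):
    """True if the first vowel of the stem carries an acute accent."""
    for base, acute in _decompose(stem):
        if base in VOWELS:
            return acute
    return False


def long_vowels(word):
    """Return the doubled (long) vowels in order, ignoring accents."""
    letters = [base for base, _ in _decompose(word)]
    found = []
    i = 0
    while i < len(letters) - 1:
        if letters[i] in VOWELS and letters[i] == letters[i + 1]:
            found.append(letters[i] * 2)
            i += 2
        else:
            i += 1
    return found


# The language name itself serves as input; treating each word as a
# bare stem is an assumption made purely for illustration.
print(is_marked_atr("Jóola"), long_vowels("Jóola"))
print(is_marked_atr("Eegimaa"), long_vowels("Eegimaa"))
```

Working on decomposed characters means the check behaves the same whether the transcription stores á as one precomposed character or as a plus a combining accent.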

List of part-of-speech tags and glosses (adapted from Sagna 2008: 18)
  • adj – adjective
  • adp – adposition
  • adv – adverb
  • aux – auxiliary
  • comp – complementizer
  • coordconn – coordinative conjunction
  • cop – copula
  • def – definite article
  • dem – demonstrative
  • emph – emphatic
  • idph – ideophone
  • indf – indefinite article
  • intj – interjection
  • n – noun
  • n: – nominal morpheme
  • n>v – noun to verb conversion
  • nprop – proper noun
  • num – numeral
  • part – particle
  • post – postposition
  • pn – proper noun
  • pref – prefix
  • prep – preposition
  • pro – pronoun
  • q – question marker
  • rel – relative marker
  • subordconn – subordinating conjunction
  • v – verb
  • v: – verbal morpheme
  • v>n – verb to noun conversion


  • ABSTR – abstract
  • AGT – agentive
  • ASSOC – associative
  • CAUS – causative
  • CD(3, 6, 13) – concord/agreement marker
  • CENT – centrifugal
  • COMP – complementizer
  • DEF – definite
  • DEM – demonstrative
  • DEM.PROX – proximal demonstrative
  • DEP – dependent
  • DIR – directional
  • DO – direct object
  • EMPH – emphatic
  • EQUAT – equative
  • EXCL – exclusive
  • \FR – French borrowing
  • FUT – future
  • GEN – genitive
  • GER – gerund
  • HAB.NEG – habitual negative
  • IMP.NEG – imperative negative
  • INACT – inactualis
  • INCL – inclusive
  • INDF – indefinite
  • INF – infinitive
  • LOC – location marker
  • MED – medial
  • MID – middle voice
  • NC(1-14) – noun class marker
  • NEG – negation
  • NEG.COP – negative copula
  • NEG.FUT – negative future
  • NMLZ – nominalizer
  • PFV – perfective
  • PL – plural
  • PLUR – pluractional
  • POSS – possessive
  • PRO – pronoun
  • PROH – prohibitive
  • QUANT – quantifier
  • REFL – reflexive
  • REL – relative
  • REV – reversive
  • SG – singular
  • SBJ – subject

Citing the Jóola Eegimaa corpus:
Hantgan-Sonko, Abbie. 2021. A narrative corpus of Jóola Eegimaa. In Nikitina, Tatiana, Ekaterina Aplonova, Izabela Jordanoska, Abbie Hantgan-Sonko, Guillaume Guitang, Olga Kuznetsova, Elena Perekhvalskaya & Lacina Silué (eds.) The SpeechReporting Corpus: Discourse Reporting in Storytelling. CNRS-LLACAN & LACITO, http://discoursereporting.huma-num.fr/index.html





A narrative corpus of Chuvash

Language

Chuvash is a Turkic language spoken in Russia by approximately 1 million people.

Corpus composition

The current version of the corpus contains examples of traditional storytelling recorded by A.K. Salmin in the 1980s (the original recordings are available in the archival collections of the Chuvash State Institute of Humanities) and folktales recorded by Elena Fedotova during her fieldwork in the summer of 2019. The total length of the currently available portion of the corpus is about 5 hours; the total number of words is about 25,000. The corpus is under construction, with new texts due to become available soon.

List of parts of speech
  • adj – adjective
  • adv – adverb
  • cnj – conjunction
  • idph – ideophone
  • intj – interjection
  • n – noun
  • np – proper noun
  • num – numeral
  • onomat – onomatopoeia
  • pp – postposition
  • pro – pronoun
  • prt – particle
  • suff – suffix
  • v – verb
  • vn – non-verbal predicate


List of glosses
  • ABL - ablative
  • ACC/DAT - accusative/dative
  • ADVBZ - adverbializer
  • ANAPH - anaphoric marker
  • ANTR - anterior
  • APPR.ALL - approximative allative
  • CAR - caritive
  • CAUS - causative
  • CMPR - comparative
  • COLL - collective
  • COND - conditional
  • CSL - causal
  • CV - converb
  • CV_ANT - anterior converb
  • CV_COORD - coordinate converb
  • CV_POST - posterior converb
  • DEST - destinative
  • \DIAL - dialectal
  • DISTR - distributive
  • EMPH - emphatic
  • EX - existential
  • EX.NEG - negative existential
  • GEN - genitive
  • HORT - hortative
  • IDPH - ideophone
  • IMP - imperative
  • IMPF - imperfective
  • INF - infinitive
  • INF2 - second infinitive
  • INSTR - instrumental
  • INTJ - interjection
  • INTENS - intensifier
  • INTS - intensifying particle
  • ITER - iterative
  • JUSS - jussive
  • LOC - locative
  • NEG - negative
  • NEG.PRS - negative present
  • NMLZ - nominalizer
  • OBL - oblique
  • ONOMAT - onomatopoeia
  • ORD - ordinal
  • PC_DEBT - debitative participle
  • PC_FUT - future participle
  • PC_PRS - present participle
  • PC_PST - past participle
  • PL - plural
  • POSS - possessive
  • POT - potential
  • PROH - prohibitive
  • PROPR - property marker
  • PRS - present
  • PRT - particle
  • PST - past
  • Q - question marker
  • RECIPR - reciprocal
  • REFL - reflexive
  • REL.POSS - relational possessor
  • RETR - retrospective
  • \RUS - Russian borrowing or codeswitching with Russian
  • SG - singular
  • SUBST - substantivizer
  • TEMP_ATTR - marker of temporal attribute
  • VBLZ - verbalizer
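
When annotation exports are processed by script, the gloss labels above can be treated as a lookup table. The sketch below is a minimal Python illustration under stated assumptions: it presumes that a gloss string joins morpheme glosses with hyphens (a common interlinear convention, not confirmed here for this corpus), the function name `expand_gloss` is hypothetical, and the table holds only a handful of labels copied from the list.

```python
# A few labels copied from the Chuvash gloss list; extend as needed.
GLOSSES = {
    "ABL": "ablative",
    "ACC/DAT": "accusative/dative",
    "CAUS": "causative",
    "CV_ANT": "anterior converb",
    "PL": "plural",
    "PST": "past",
}


def expand_gloss(gloss, table=GLOSSES, sep="-"):
    """Expand known labels in a gloss string; keep other parts unchanged."""
    return [table.get(part, part) for part in gloss.split(sep)]


# Hypothetical gloss string, invented for illustration (not corpus data).
print(expand_gloss("horse-PL-ACC/DAT"))
```

Labels are looked up whole rather than split further, because several labels in the list contain internal punctuation (for example ACC/DAT, EX.NEG, CV_ANT).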

Citing the Chuvash corpus:
Nikitina, Tatiana. 2022. A narrative corpus of Chuvash. In Nikitina, Tatiana, Ekaterina Aplonova, Izabela Jordanoska, Abbie Hantgan-Sonko, Guillaume Guitang, Olga Kuznetsova, Elena Perekhvalskaya & Lacina Silué (eds.) The SpeechReporting Corpus: Discourse Reporting in Storytelling. CNRS-LLACAN & LACITO, http://discoursereporting.huma-num.fr/index.html





A narrative corpus of Gizey

Language and storytelling tradition

Gizey is a Masa languoid (Chadic < Afro-Asiatic) spoken by approximately 19,000 people across Cameroon and Chad. Stories are still occasionally told during night gatherings. People generally know a couple of tales, which they modify freely to suit the storytelling context.

Corpus composition

The data were collected in 2019 by Guillaume Guitang and Dieudonné Soupoursou in Djougoumta and Maïda (Mayo-Danay, Cameroon). The data were partly transcribed by Victor Marheyna. In total, the corpus contains around 210 minutes of recordings. Only part of this corpus (around 30 minutes) is currently available.

List of parts of speech
  • ADJ - adjective
  • ADV - adverb
  • ART - article
  • CNJ - conjunction
  • CONFADV - configurational adverb
  • COP - copula
  • DEM - demonstrative
  • EXIST - existential
  • IDPH - ideophone
  • INTERJ - interjection
  • LOCU - locution
  • N - noun
  • NUM - numeral
  • PREP - preposition
  • PRN - pronoun
  • PTCL - particle
  • Q - question marker
  • SUFF - suffix
  • V - verb
  • PM - parsing morphology


List of glosses
  • ʔ - epenthetic glottal
  • 1plexcl - 1st plural exclusive
  • 1plincl - 1st plural inclusive
  • 1s - 1st singular
  • 2pl - 2nd plural
  • 2sf - 2nd singular feminine
  • 2sm - 2nd singular masculine
  • 3pl - 3rd person plural
  • 3sf - 3rd person singular feminine
  • 3sm - 3rd person singular masculine
  • adv - adverb
  • alog.pl - plural antilogophor
  • alog.sf - singular feminine antilogophor
  • alog.sm - singular masculine antilogophor
  • art.sf - singular feminine definite article
  • art.sm - singular masculine definite article
  • aux - auxiliary
  • compl - completive
  • confadv - configurational adverb
  • cop - copula
  • dem - demonstrative
  • dest - destinative
  • dm - discourse marker
  • dyn - dynamic
  • dyn.dist - dynamic distal
  • emphadd - emphatic additive
  • ephellab - epenthetic labial
  • exist.neg - negative existential
  • exist.pos - positive existential
  • fut - future
  • idph - ideophone
  • indf.art.sf - singular feminine indefinite article
  • indf.art.sm - singular masculine indefinite article
  • interj - interjection
  • itv - itive
  • neg - negative
  • pl2 - second plural
  • PM - parsing morphology
  • ppv - prepausal vowel
  • prep - preposition
  • prog - progressive
  • q - question marker
  • quot1 - quotative 1
  • quot2 - quotative 2
  • quot3 - quotative 3
  • quot4 - quotative 4
  • rel.pl - plural relative pronoun
  • rel.sf - singular feminine relative pronoun
  • rel.sm - singular masculine relative pronoun
  • res - resultative
  • response - unanalysed response formula
  • rev - reversive
  • stat.dist - static distal
  • stat.lying.sg - static, lying, singular
  • stat.pl - static, plural
  • stat.prox - static, proximal
  • stat.seated - static, seated
  • stat.standing - static, standing
  • tale closer - unanalysed tale closing formula
  • tale opener - unanalysed tale opening formula

Citing the Gizey corpus:
Guitang, Guillaume. 2022. A narrative corpus of Gizey. In Nikitina, Tatiana, Ekaterina Aplonova, Izabela Jordanoska, Abbie Hantgan-Sonko, Guillaume Guitang, Olga Kuznetsova, Elena Perekhvalskaya & Lacina Silué (eds.) The SpeechReporting Corpus: Discourse Reporting in Storytelling. CNRS-LLACAN & LACITO, http://discoursereporting.huma-num.fr/index.html





A narrative corpus of Guro

Language

Guro is a South Mande language spoken in Ivory Coast by approximately 500,000 people. The practice of storytelling is endangered and undergoing some transformations. Even in remote areas, it has largely been abandoned as a family practice. Instead, recent decades have seen the emergence of semi-professionalised groups that perform traditional tales accompanied by instrumental music and singing. They are invited to weddings, funerals and similar occasions. In any kind of performance, the main narrator is usually supported by a second narrator, who animates the storytelling with short interventions.

Corpus composition

The data was collected in 2019 by Olga Kuznetsova during two expeditions to the Zuenoula and Oume regions (Ivory Coast). Recordings from the Zuenoula region (over 10 hours) mostly represent group performances, and recordings from the Oume region (about 2 hours) are performances in pairs (a main and a second narrator), sparsely accompanied by songs. Only part of this collection is currently available: the corpus contains two tales from the Zuenoula region, around 45 minutes (around 6,500 words) in total.

List of parts of speech
  • adj - adjective
  • adv - adverb
  • art - article
  • conj - conjunction
  • cop - copula
  • det - determiner
  • id - ideophone
  • intj - interjection
  • mrph - morpheme
  • n - noun
  • num - numeral
  • part - particle
  • pn - proper noun
  • pp - postposition
  • prep - preposition
  • prn - predicative marker
  • pron - pronoun
  • v - verb


List of glosses
  • -/ - heightening tonal morpheme
  • -\ - lowering tonal morpheme
  • \FR - in French
  • ADVZ - adverbializer
  • ART - article
  • COMP - complementizer
  • COMPL - completive
  • COP - copula
  • DEF - definite article
  • DIM - diminutive
  • DISTR - distributive numeral
  • EMPH - emphatic particle
  • EX - exclusive pronoun
  • FOC - focalized pronoun
  • GER - gerund
  • H - high tone (pronoun)
  • INC - inclusive pronoun
  • INDF - indefinite article
  • INTJ - interjection
  • IPFV - imperfective
  • JNT - joint pronoun
  • LOG - logophoric pronoun
  • NEG - negation
  • NMLZ - nominalization
  • NREF - non-referential
  • NSBJ - non-subject pronoun
  • OPT - optative
  • ORD - ordinal numeral
  • PCOP - presentative copula
  • PFV - perfective
  • PL - plural
  • PNEG - negative presentative copula
  • POSS - possessive
  • PREP - preposition
  • PROG - progressive
  • PROH - prohibitive
  • Q - question
  • QUOT - quotative
  • RECP - reciprocal pronoun
  • REFL - reflexive pronoun
  • REL - relativizer
  • RES - resultative
  • RESP - respective
  • RET - retrospective
  • SBJ - subject pronoun
  • SCOP - subordinated copula
  • SG - singular
  • SIM - simultaneity
  • SUP - supine
  • SURP - surprise marker
  • TOP - topic

Citing the Guro corpus:
Kuznetsova, Olga. 2022. A narrative corpus of Guro. In Nikitina, Tatiana, Ekaterina Aplonova, Izabela Jordanoska, Abbie Hantgan-Sonko, Guillaume Guitang, Olga Kuznetsova, Elena Perekhvalskaya & Lacina Silué (eds.) The SpeechReporting Corpus: Discourse Reporting in Storytelling. CNRS-LLACAN & LACITO, http://discoursereporting.huma-num.fr/index.html





A narrative corpus of Kafire

Language

Kafire is a Senufo language (Gur, Niger-Congo) spoken in northern Côte d’Ivoire. Its speakers live in an area called ‘Kafigue’, which comprises the subprefectures of Sirasso, Nafoun and Kanoroba in the department of Korhogo. Like other Senufo languages, Kafire has noun classes (genders) and S(ubject)-Aux(iliary)-O(bject)-V(erb)-X word order, where X stands for obliques and adverbials. Verbs are marked for aspect (perfective vs. imperfective), which is expressed both through preverbal auxiliaries and on the verb itself.

Traditional narratives

In the community of Kafire speakers, the Kafibele, storytelling is practiced by everyone (women, men, children…). It is learned during narration sessions, which are held only at night (during the farm-keeping activity and around the fire in the village). Traditional narration requires at least two participants: a narrator and a responder, who assists the narrator. These roles usually shift during a session, since people take turns narrating. Nowadays, the practice is endangered because people have changed their agricultural habits (they cultivate more crops that do not require farm-keeping) and children go to school (they no longer gather with their elders to learn narration).

Corpus composition

This corpus is a portion of the data collected by Silué Songfolo Lacina from 2019 to 2021 in the Kafibele community (Côte d’Ivoire, in the subprefectures of Sirasso, Nafoun and Kanoroba). The total corpus is more than 10 hours long, but the currently annotated portion is 1 hour 30 minutes. It contains what the Kafibele narrate during a storytelling session, namely riddles, dilemma tales and tales. More annotated data will be uploaded progressively. In the ELAN files, morphemes are represented in two forms: the surface form (on tier mb) and the underlying form (on tier cf).
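
Because ELAN files (.eaf) are XML, the two morpheme tiers can be read with a standard XML parser. The following is a minimal Python sketch, not the project's own tooling. It relies on the TIER, TIER_ID and ANNOTATION_VALUE names of the EAF format; the function name `tier_values`, the speaker suffix in `mb@SP1` and `cf@SP1`, and the miniature sample document are assumptions made for illustration, and the annotation values are placeholders rather than corpus data.

```python
import xml.etree.ElementTree as ET


def tier_values(root, tier_name):
    """Collect annotation values from tiers named `tier_name` or `tier_name@<suffix>`."""
    values = []
    for tier in root.iter("TIER"):
        tier_id = tier.get("TIER_ID", "")
        if tier_id == tier_name or tier_id.startswith(tier_name + "@"):
            for value in tier.iter("ANNOTATION_VALUE"):
                values.append(value.text or "")
    return values


# Invented, stripped-down document; a real .eaf file carries more attributes.
SAMPLE = """<ANNOTATION_DOCUMENT>
  <TIER TIER_ID="mb@SP1">
    <ANNOTATION>
      <REF_ANNOTATION ANNOTATION_ID="a1" ANNOTATION_REF="a0">
        <ANNOTATION_VALUE>surface-placeholder</ANNOTATION_VALUE>
      </REF_ANNOTATION>
    </ANNOTATION>
  </TIER>
  <TIER TIER_ID="cf@SP1">
    <ANNOTATION>
      <REF_ANNOTATION ANNOTATION_ID="a2" ANNOTATION_REF="a1">
        <ANNOTATION_VALUE>underlying-placeholder</ANNOTATION_VALUE>
      </REF_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""

root = ET.fromstring(SAMPLE)
# Pairing by position assumes both tiers list the same morphemes in the same order.
pairs = list(zip(tier_values(root, "mb"), tier_values(root, "cf")))
print(pairs)
```

For a file on disk, the root would instead come from `ET.parse(path).getroot()`, where `path` points to a downloaded .eaf file.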

List of parts of speech
  • adj – adjective
  • adp – adposition
  • adv – adverb
  • art – article
  • aux – auxiliary
  • conj – conjunction
  • ideo – ideophone
  • inf – infinitive marker
  • intj – interjection
  • n – noun
  • num – numeral
  • onom – onomatopoeia
  • pref – prefix
  • post – postposition
  • pn – proper noun
  • pro – pronoun
  • prt – particle
  • q – question marker
  • quantifier – quantifier
  • stat – stative verb
  • suf – suffix
  • v – verb


List of glosses
  • 1PL - 1st person plural
  • 1SG - 1st person singular
  • 2PL - 2nd person plural
  • 2SG - 2nd person singular
  • 3PL - 3rd person plural
  • 3SG - 3rd person singular
  • ABESS - abessive
  • ADVS - adversative
  • AFF - affirmative marker
  • ARCH - archaic form
  • APPEL - appellative
  • ASSOC - associative marker
  • CAUS - causative
  • CIPRT - clause initial particle
  • CFPRT - clause final particle
  • CON - consecutive connective
  • COND - conditional marker
  • COP - copula
  • DEF - definite
  • DEM - demonstrative
  • DIST - distant demonstrative
  • DISTR - distributive
  • DM - discourse marker
  • \DY - Dyula origin
  • EMPH - emphatic
  • EXCL - exclamation marker
  • EXP - locution
  • \FR - French origin
  • FUT - future marker
  • (G)1 - gender 1
  • (G)2 - gender 2
  • (G)3 - gender 3
  • (G)4 - gender 4
  • (G)5 - gender 5
  • IDEO - ideophone
  • IDEN - identificational marker
  • IMPO - impossible
  • INCIT - incitative
  • INDF - indefinite
  • IPFV - imperfective
  • -MID - middle voice suffix
  • NPRS - non-present
  • NEG - negation marker
  • OBLIG – obligation marker
  • ONOM - onomatopoeia
  • PFV – perfective
  • POSS – possessive
  • PRES – presentative
  • PRF – perfect
  • PROG – progressive
  • PROH – prohibitive
  • PRS – present
  • PST – past
  • -RECP – reciprocal suffix
  • -REFL – reflexive suffix
  • REL – relative clause marker
  • SBJV – subjunctive
  • SIM – similative marker
  • VOC – vocative

Citing the Kafire corpus:
Silué, Songfolo Lacina. 2022. A narrative corpus of Kafire. In Nikitina, Tatiana, Ekaterina Aplonova, Izabela Jordanoska, Abbie Hantgan-Sonko, Guillaume Guitang, Olga Kuznetsova, Elena Perekhvalskaya & Lacina Silué (eds.) The SpeechReporting Corpus: Discourse Reporting in Storytelling. CNRS-LLACAN & LACITO, http://discoursereporting.huma-num.fr/index.html





A narrative corpus of Udihe

Language

Udihe is a Tungusic language spoken in the Russian Far East. The language is critically endangered; currently there are no more than 40 native speakers.

Corpus composition

The corpus contains written data from the archives of the Peter the Great Museum of Anthropology and Ethnography (Kunstkamera, Russian Academy of Sciences), collected in 1936 by E.N. Baskakova. It also includes one text recorded in 2006 by the author of the corpus. The currently available corpus contains about 9,000 words.

List of parts of speech
  • AD - actant derivation
  • adj - adjective
  • adv - adverb
  • akz - Aktionsart (verb derivation)
  • case - case marker
  • cop - copula
  • dem - demonstrative
  • det - determinative
  • ideof - ideophone
  • intj - interjection
  • mood - suffix that encodes mood (verb derivation)
  • mrph - morpheme
  • n - noun
  • num - numeral
  • onomat - onomatopoeia
  • part - particle
  • pers - personal pronoun
  • pers.n - proper noun
  • postp - postposition
  • pp - past participle
  • pron - pronoun
  • q - question marker
  • quant - quantifier
  • suf - suffix
  • TAM - tense-aspect-mood-marker
  • v - verb
  • voice - suffix that encodes voice (verb derivation)


List of glosses
  • ABL – ablative
  • ACC – accusative
  • ADM – admirative
  • ALIEN – alienable possession
  • ANDAT - andative
  • ASS – associative plural
  • ATTR – attributive
  • AUG – augmentative
  • CAUS – causative
  • CC.DS - conditional converb different subjects
  • CC.SS - conditional converb same subject
  • COL – collective numeral
  • COMIT – comitative
  • COND – conditional
  • CONJ – conjunction
  • CONTR – contrastive
  • CP - perfective converb
  • CV.PST – past converb
  • CV.RES – resultative converb
  • DAT – dative
  • DEC - decausative
  • DEL - deliberative future
  • DEST – destinative case suffix
  • DIM – diminutive
  • DIR – directive case suffix
  • DISC - discursive particle
  • DIST - distributive
  • DIV - diversative
  • EMPH – emphatic
  • EVID – evidential
  • EXCL – exclusive first person pronoun
  • FOC – focus particle
  • FP - future participle
  • FUT – future
  • GER – gerund
  • HORT – hortative
  • IC - simultaneous converb
  • IM - imperfective
  • IMP – imperative
  • IMPRS – impersonal
  • INC - inchoative
  • INCL – inclusive first person pronoun
  • INDEF – indefinite
  • INEXP - unexpected modality
  • INS – instrumental
  • INTS – intensifier
  • LIMIT – limitative
  • LOC – locative
  • MOM - instant action converb
  • NECES - necessity modality
  • OPER - operation suffix
  • OPT - optative
  • ORN - ornative suffix
  • PASS - passive
  • PC.DS - perfective converb, different subjects
  • PC.SS - perfective converb, same subject
  • PP - past participle
  • PP.PASS - past participle passive
  • PRF – perfect
  • PRIV - privative
  • PRO - prospective
  • PROL - prolative case suffix
  • PRP - present participle
  • PRP.PASS - present participle passive
  • PST – past
  • PURP – purposive
  • Q – question marker
  • RECP – reciprocal
  • REFL – reflexive
  • REGR – regressive
  • REV - reversive
  • RUSSIAN - Russian borrowing
  • SEM - semelfactive
  • SING – singulative
  • SS.PL - same subject suffix plural
  • SS.SG - same subject suffix singular
  • TOP – topic
  • VOC – vocative
  • VQ - question verb

Citing the Udihe corpus:
Perekhvalskaya, Elena. 2021. A narrative corpus of Udihe. In Nikitina, Tatiana, Ekaterina Aplonova, Izabela Jordanoska, Abbie Hantgan-Sonko, Guillaume Guitang, Olga Kuznetsova, Elena Perekhvalskaya & Lacina Silué (eds.) The SpeechReporting Corpus: Discourse Reporting in Storytelling. CNRS-LLACAN & LACITO, http://discoursereporting.huma-num.fr/index.html





Citing the corpus and tools

Corpus:

Nikitina, Tatiana, Ekaterina Aplonova, Izabela Jordanoska, Abbie Hantgan-Sonko, Guillaume Guitang, Olga Kuznetsova, Elena Perekhvalskaya & Lacina Silué (eds.) 2022. The SpeechReporting Corpus: Discourse Reporting in Storytelling. CNRS-LLACAN & LACITO, http://discoursereporting.huma-num.fr/index.html

Template:

Nikitina, Tatiana, Abbie Hantgan-Sonko & Christian Chanard. 2019. Reported speech annotation template for ELAN (The SpeechReporting Corpus). Villejuif-Paris: LLACAN.

ELAN-CorpA:

Chanard, Christian. 2015. ELAN-CorpA: Lexicon-aided annotation in ELAN. In Amina Mettouchi, Martine Vanhove & Dominique Caubet (eds.), Corpus-based Studies of Lesser-described Languages: The CorpAfroAs corpus of spoken AfroAsiatic languages (Studies in Corpus Linguistics 68), 311–332. Amsterdam: John Benjamins Publishing Company. https://doi.org/10.1075/scl.68.10cha. https://benjamins.com/catalog/scl.68.10cha (2 October, 2020).