More about the corpus

 
  1. S001
    S001A: Male, Farmer, Primary school, 84
    S001B: Female, Researcher, PhD, N/A

  2. S002
    S002A: Male, Factory owner, Primary school, 62
    S002B: Female, Researcher, PhD, N/A

  3. S003
    S003A: Female, House keeper, College, 33

  4. S004
    S004A: Female, Service, BA, 45
    S004B: Female, Researcher, PhD, N/A
    S004C: Male, Service, N/A, N/A

  5. S005
    S005A: Male, Sales, MA, 32

  6. S006
    S006A: Male, Sales, MA, 32

  7. S007
    S007A: Female, University lecturer, MA, 38
    S007B: Female, Researcher, PhD, N/A

About the speakers


Seven native speakers of Taiwanese Southern Min (TSM) were interviewed by Ching Chu Sun, who is also a native speaker of TSM. The personal backgrounds of the speakers are listed above (gender, occupation, highest educational qualification, and age where known), along with the filenames and speaker IDs used in the A-C versions of the corpus.
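For anyone who wants to work with the speaker listing programmatically, the following is a minimal sketch of how one line of the listing could be read into a record. It assumes the layout seen in the listing (a speaker ID, a colon, then four comma-separated fields, with "N/A" for unknown values) and that the first four characters of the speaker ID are the filename. The `Speaker` class and `parse_speaker` function are hypothetical names used only for illustration; they are not part of the corpus.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Speaker:
    file_id: str               # e.g. "S001" (assumed: first four characters of the ID)
    speaker_id: str            # e.g. "S001A"
    gender: str
    occupation: str
    education: Optional[str]   # None when listed as "N/A"
    age: Optional[int]         # None when listed as "N/A"


def parse_speaker(line: str) -> Speaker:
    """Parse one listing line such as 'S001A: Male, Farmer, Primary school, 84'."""
    speaker_id, rest = line.strip().split(":", 1)
    fields = [field.strip() for field in rest.split(",")]
    if len(fields) != 4:
        raise ValueError(f"expected 4 comma-separated fields, got {len(fields)}: {line!r}")
    gender, occupation, education, age = fields
    return Speaker(
        file_id=speaker_id[:4],
        speaker_id=speaker_id,
        gender=gender,
        occupation=occupation,
        education=None if education == "N/A" else education,
        age=None if age == "N/A" else int(age),
    )


farmer = parse_speaker("S001A: Male, Farmer, Primary school, 84")
print(farmer.file_id, farmer.age)  # S001 84
```

A line that does not have exactly four fields raises a `ValueError`, which makes irregular entries easy to spot.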

About the spoken data in Praat


The spoken data has been saved as WAV files ready for listening and viewing in the sound-editing software Praat. Four tiers were created for each speaker in the same file: utterance, free translation, morpheme, morpheme gloss.


To use a WAV file in TSM Corpus D with the annotation tiers in Praat, co-select the WAV file and the TextGrid file and click on Edit.
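Before opening files in Praat, it can help to check that every WAV file has a TextGrid to co-select it with. The sketch below pairs the two by filename stem. It assumes that a recording and its annotation share a stem (for example `S001.wav` and `S001.TextGrid`); these filenames and the `pair_recordings` helper are hypothetical and not confirmed by the corpus documentation.

```python
from pathlib import Path


def pair_recordings(filenames):
    """Pair WAV files with TextGrid files that share the same stem.

    Returns (pairs, unpaired): a dict mapping stem -> (wav, textgrid),
    and a sorted list of stems that have only one of the two files.
    """
    wavs, grids = {}, {}
    for name in filenames:
        path = Path(name)
        suffix = path.suffix.lower()
        if suffix == ".wav":
            wavs[path.stem] = name
        elif suffix == ".textgrid":
            grids[path.stem] = name
    pairs = {stem: (wavs[stem], grids[stem]) for stem in sorted(wavs) if stem in grids}
    unpaired = sorted(set(wavs) ^ set(grids))
    return pairs, unpaired


# Hypothetical filenames, for illustration only.
pairs, unpaired = pair_recordings(["S001.wav", "S001.TextGrid", "S002.wav"])
print(pairs)     # {'S001': ('S001.wav', 'S001.TextGrid')}
print(unpaired)  # ['S002']
```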


About the transcription


The spoken data is transcribed in a romanization based on Embree (1984). Nasalization is indicated by a capitalized N. Following Embree, the TSM Corpus distinguishes seven tones, which are indicated with numbers. The tone numbers represent the tonal values of forms as pronounced out of context, unaffected by tone sandhi. Using the traditional terms of Chinese philology, they are: upper even (1), upper (2), upper going (3), upper entering (4), lower even (5), lower going (7), and lower entering (8). According to Embree, the historical lower tone (6) does not exist in Taiwanese Southern Min.
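The tone numbering can be written down as a small lookup table. The sketch below copies the numbers and names from the paragraph above. The syllable pattern is an assumption made for illustration: it treats a syllable as letters, an optional capital N for nasalization, and a single final tone digit. The example forms and the `describe_syllable` helper are hypothetical, so check the pattern against the actual transcriptions before relying on it.

```python
import re

# Tone numbers and traditional names as given in the text; tone 6 is absent.
TONE_NAMES = {
    1: "upper even",
    2: "upper",
    3: "upper going",
    4: "upper entering",
    5: "lower even",
    7: "lower going",
    8: "lower entering",
}

# ASSUMPTION: letters, then an optional capital N, then one tone digit.
SYLLABLE = re.compile(r"^([A-Za-z]+?)(N?)([1-8])$")


def describe_syllable(syllable: str) -> dict:
    """Split a romanized syllable into base, nasalization flag, and tone."""
    match = SYLLABLE.match(syllable)
    if match is None:
        raise ValueError(f"not in the assumed syllable format: {syllable!r}")
    tone = int(match.group(3))
    if tone not in TONE_NAMES:
        raise ValueError(f"tone {tone} is not used in the TSM Corpus")
    return {
        "base": match.group(1),
        "nasalized": match.group(2) == "N",
        "tone": tone,
        "tone_name": TONE_NAMES[tone],
    }


# Hypothetical example form, for illustration only.
print(describe_syllable("saN1"))
# {'base': 'sa', 'nasalized': True, 'tone': 1, 'tone_name': 'upper even'}
```

Because the numbers mark citation tones, a lookup like this says nothing about the sandhi tone a syllable takes in running speech.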


Embree, Bernard L.M. 1984. A Dictionary of Southern Min (Taiwanese-English Dictionary). Taipei: Taipei Language Institute.

Corpus size


Version 1.0 of the TSM Corpus is modest in size (7,849 words), though richly annotated in TSM Corpus D. Word counts per file:

S001 = 4,107 words
S002 = 580 words
S003 = 505 words
S004 = 1,327 words
S005 = 441 words
S006 = 497 words
S007 = 392 words
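As a quick check, the per-file counts above add up to the stated total. The short sketch below redoes the sum and computes each file's share of the corpus; the variable and function names are illustrative only.

```python
# Per-file word counts as listed above.
WORD_COUNTS = {
    "S001": 4107,
    "S002": 580,
    "S003": 505,
    "S004": 1327,
    "S005": 441,
    "S006": 497,
    "S007": 392,
}

total_words = sum(WORD_COUNTS.values())
print(total_words)  # 7849


def share(file_id: str) -> float:
    """Return a file's fraction of the total word count."""
    return WORD_COUNTS[file_id] / total_words


for file_id in WORD_COUNTS:
    print(f"{file_id}: {share(file_id):.1%}")
```

S001 alone accounts for just over half of the words, which is worth keeping in mind when treating the corpus as a sample of several speakers.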
