More about the corpus
•S001
S001A: Male, Farmer, Primary school, 84
S001B: Female, Researcher, PhD, N/A
•S002
S002A: Male, Factory owner, Primary school, 62
S002B: Female, Researcher, PhD, N/A
•S003
S003A: Female, Housekeeper, College, 33
•S004
S004A: Female, Service, BA, 45
S004B: Female, Researcher, PhD, N/A
S004C: Male, Service, N/A, N/A
•S005
S005A: Male, Sales, MA, 32
•S006
S006A: Male, Sales, MA, 32
•S007
S007A: Female, University lecturer, MA, 38
S007B: Female, Researcher, PhD, N/A
About the speakers
Seven native speakers of Taiwanese Southern Min (TSM) were interviewed by Ching Chu Sun, who is also a native speaker of TSM. The personal backgrounds of the speakers are listed on this page (gender, occupation, highest educational qualification, and age where known), along with the filenames and speaker IDs used in the A-C versions of the corpus.
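Each speaker entry on this page follows the pattern "ID: gender, occupation, education, age", with N/A marking unknown values. As a minimal sketch of how such an entry could be read into a structured record, the following Python may help; the names Speaker and parse_speaker are hypothetical and not part of the corpus distribution.

```python
from typing import NamedTuple, Optional


class Speaker(NamedTuple):
    """One speaker entry (hypothetical record type, not part of the corpus)."""
    speaker_id: str
    gender: str
    occupation: str
    education: Optional[str]  # None when listed as N/A
    age: Optional[int]        # None when listed as N/A


def parse_speaker(line: str) -> Speaker:
    """Parse a line such as 'S001A: Male, Farmer, Primary school, 84'."""
    speaker_id, _, rest = line.partition(":")
    fields = [field.strip() for field in rest.split(",")]
    if len(fields) != 4:
        raise ValueError(f"expected 4 comma-separated fields, got: {line!r}")
    gender, occupation, education, age = fields
    return Speaker(
        speaker_id.strip(),
        gender,
        occupation,
        None if education == "N/A" else education,
        None if age == "N/A" else int(age),
    )


print(parse_speaker("S001A: Male, Farmer, Primary school, 84"))
print(parse_speaker("S004C: Male, Service, N/A, N/A"))
```

Mapping N/A to None keeps unknown ages out of any numeric summary, such as an average age across speakers.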
About the spoken data in Praat
The spoken data has been saved as WAV files ready for listening and viewing in the sound-editing software Praat. Four tiers were created for each speaker in the same file: utterance, free translation, morpheme, morpheme gloss.
To use a WAV file in TSM Corpus D with the annotation tiers in Praat, select the WAV file and the TextGrid file together and click on Edit.
About the transcription
The spoken data is transcribed in a romanization based on Embree (1984). Nasalization is indicated by a capitalized N. Following Embree, the TSM Corpus distinguishes seven tones, which are indicated with numbers. The tone numbers represent the citation tones, that is, the tonal values of forms as pronounced out of context, unaffected by tone sandhi. Using the traditional terms of Chinese philology, they are: upper even (1), upper rising (2), upper going (3), upper entering (4), lower even (5), lower going (7), and lower entering (8). According to Embree, the historical lower rising tone (6) does not exist in Taiwanese Southern Min.
Embree, Bernard L.M. 1984. A Dictionary of Southern Min (Taiwanese-English Dictionary). Taipei: Taipei Language Institute.
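The tone numbering described above can be sketched as a small lookup in Python. This is an illustration only: it assumes that each romanized syllable ends in its tone digit and that a capital N appears only as the nasalization mark, and the sample strings are made up in that assumed format rather than taken from the corpus. The names TONE_NAMES and analyze are hypothetical.

```python
import re

# Citation tones used in the corpus; tone 6 is deliberately absent.
# Tone 2 is labeled with the traditional "upper rising" category name.
TONE_NAMES = {
    1: "upper even",
    2: "upper rising",
    3: "upper going",
    4: "upper entering",
    5: "lower even",
    7: "lower going",
    8: "lower entering",
}

# Assumption: a syllable is a run of letters followed by one tone digit.
SYLLABLE_RE = re.compile(r"([A-Za-z]+)([1-8])")


def analyze(text: str) -> list:
    """Return (base, tone number, tone name, nasalized) for each syllable."""
    result = []
    for base, digit in SYLLABLE_RE.findall(text):
        tone = int(digit)
        if tone not in TONE_NAMES:
            raise ValueError(f"tone {tone} is not used in the corpus: {base}{digit}")
        # Assumption: a capital N occurs only as the nasalization mark.
        result.append((base, tone, TONE_NAMES[tone], "N" in base))
    return result


# Made-up strings in the assumed format, not corpus data.
for row in analyze("saN1 lang5"):
    print(row)
```

Rejecting tone 6 outright makes transcription typos surface early, since that number should never occur in the data.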
Corpus size
Version 1.0 of the TSM Corpus is modest in size (7,849 words), though rich in annotation (TSM D):
S001 = 4,107 words
S002 = 580 words
S003 = 505 words
S004 = 1,327 words
S005 = 441 words
S006 = 497 words
S007 = 392 words
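As a quick check, the per-file counts above can be summed to confirm the stated total. The sketch below uses only the figures given on this page; the variable names are illustrative.

```python
# Word counts per file, copied from the list above.
WORD_COUNTS = {
    "S001": 4107,
    "S002": 580,
    "S003": 505,
    "S004": 1327,
    "S005": 441,
    "S006": 497,
    "S007": 392,
}

total = sum(WORD_COUNTS.values())
print(total)  # 7849, matching the stated corpus size

# Share of the corpus contributed by each file, as a percentage.
for name, count in WORD_COUNTS.items():
    print(f"{name}: {100 * count / total:.1f}%")
```

The counts add up to 7,849 words, and S001 alone accounts for a little over half of the corpus (about 52.3%).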