Version A
The following conventions apply to both versions of the corpus:
• Each speaker is represented by a number (S046 etc.). Information about the gender, age, and highest education level of each speaker can be found on the Metadata page.
• Each speaker number (S047 etc.) marks the beginning of a new turn by that speaker. Each turn consists of a sequence of one or more utterances, with each utterance occupying its own line. Each utterance ends in a punctuation mark.
• Annotations (original spellings of proper nouns, e.g. “Sherwood Park”, Mandarin, unclear stretches etc.) are shown in [...].
• Files are plain-text files encoded in UTF-8.
• Spoken forms that lack a conventional representation by characters have been transcribed in IPA.
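The conventions above suggest how a transcript file could be read programmatically. The following is only a minimal sketch in Python, not a tool shipped with the corpus. It assumes that a speaker number (the letter S followed by digits) appears at the very start of a line, either on its own or directly before the first utterance of the turn; the exact layout should be checked against the corpus sample. The names Turn, parse_transcript, and load_transcript are hypothetical.

```python
import re
from dataclasses import dataclass, field

# Assumption: a turn starts with a speaker number such as S046 at line start.
SPEAKER_RE = re.compile(r"^(S\d+)\s*(.*)$")
# Annotations are enclosed in square brackets, e.g. [Sherwood Park].
ANNOTATION_RE = re.compile(r"\[([^\]]*)\]")


@dataclass
class Turn:
    speaker: str
    utterances: list = field(default_factory=list)
    annotations: list = field(default_factory=list)


def parse_transcript(text):
    """Split transcript text into turns, one utterance per line."""
    turns = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = SPEAKER_RE.match(line)
        if match:
            # A speaker number opens a new turn.
            turns.append(Turn(speaker=match.group(1)))
            line = match.group(2).strip()
            if not line:
                continue
        if not turns:
            # Skip any text that precedes the first speaker number.
            continue
        turns[-1].utterances.append(line)
        turns[-1].annotations.extend(ANNOTATION_RE.findall(line))
    return turns


def load_transcript(path):
    """Read a corpus file as UTF-8 and parse it into turns."""
    with open(path, encoding="utf-8") as handle:
        return parse_transcript(handle.read())
```

Calling load_transcript on a corpus file would return a list of Turn objects, so the number of utterances per speaker or the set of bracketed annotations can be collected with ordinary list operations. Because annotations are kept inside the utterance text as well as listed separately, nothing from the original line is lost.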
Note the following conventions underlying the A version of the corpus:
• To display all the special characters needed to transcribe the Shanghainese dialect, the Unicode font Songti-Fangzheng was used. This font should be installed so that all characters display correctly. It can be downloaded from Download link 1 (http://www.xue5.com/Font/ZongHe/2413.html).
• Transcription was carried out in accordance with Xu and Tang (1988).
XU Baohua and TANG Zhenzhu (eds.). (1988). Downtown Shanghai Dialect Annals. Shanghai: Shanghai Education Press.
[许宝华 汤珍珠 主编,《上海市区方言志》,上海, 上海教育出版社,1988年11月]
Sample of the A version of the corpus: