Version A

 

The following conventions are followed in both versions of the corpus:


  1. Each speaker is represented by a number (S046 etc.). Information about the gender, age, and highest education level of each speaker can be found on the Metadata page.


  1. Each speaker number (S047 etc.) marks the beginning of a new turn by that speaker. Each turn consists of a sequence of one or more utterances, with each utterance occupying its own line. Each utterance ends in a punctuation mark.


  1. Annotations (original spellings of proper nouns, e.g. “Sherwood Park”; Mandarin; unclear stretches; etc.) are shown in square brackets [...].


  1. Files are formatted as UTF-8 text-only files.


  1. Spoken forms that lack a conventional representation in characters have been transcribed in IPA.
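The conventions above can be turned into a small reader. The following is a minimal sketch, not a tool shipped with the corpus. It assumes that a speaker number (`S` followed by digits) stands at the start of a line, either alone or followed by the first utterance of the turn; the exact file layout is not specified here, so adjust the pattern to the actual files. The names `parse_turns` and `find_annotations` and the sample text are hypothetical.

```python
import re
from typing import Dict, List

# Assumption: a speaker number is "S" plus digits at the start of a line.
SPEAKER_RE = re.compile(r"^(S\d+)\b\s*(.*)$")
# Annotations are shown in square brackets.
ANNOTATION_RE = re.compile(r"\[([^\]]*)\]")


def parse_turns(text: str) -> List[Dict]:
    """Group utterance lines into turns, one turn per speaker number."""
    turns: List[Dict] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        match = SPEAKER_RE.match(line)
        if match:
            # A speaker number opens a new turn.
            turns.append({"speaker": match.group(1), "utterances": []})
            line = match.group(2).strip()
            if not line:
                continue  # speaker number stood on its own line
        if turns:
            # Every other non-empty line is one utterance of the current turn.
            turns[-1]["utterances"].append(line)
    return turns


def find_annotations(utterance: str) -> List[str]:
    """Return the contents of every [...] annotation in an utterance."""
    return ANNOTATION_RE.findall(utterance)


# Invented sample text for illustration only; not taken from the corpus.
sample = "S046\nutterance one.\nwe went to [Sherwood Park].\nS047 utterance three?\n"
turns = parse_turns(sample)
```

Because the files are UTF-8 text, they would be read with an explicit encoding, for example `open(path, encoding="utf-8").read()`, before being passed to `parse_turns`.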


Note the following conventions underlying the A version of the corpus:


  1. To display all the special characters needed to transcribe the Shanghainese dialect, the Unicode font Songti-Fangzheng was used. This font should be installed for all characters to display correctly. It can be downloaded from Address 1 (at http://www.xue5.com/Font/ZongHe/2413.html).


  1. Transcription was carried out in accordance with Xu and Tang (1988).


XU Baohua and TANG Zhenzhu. (1988). Downtown Shanghai Dialect Annals. Shanghai: Shanghai Education Press.

[许宝华 汤珍珠 主编,《上海市区方言志》,上海, 上海教育出版社,1988年11月]

Sample of the A version of the corpus: