More about the corpus
SSC consists of samples of five genres of spoken Shanghainese:
• Monologue
• Conversation
• Interview with Jiali Mao
• Scripts (movies and cartoons dubbed into Shanghainese dialect, broadcast over PingJi Voice Forum: http://bbs.pingjivoice.com/)
• Songs (popular songs and nursery rhymes in Shanghainese)
Data was collected in downtown Shanghai, China (see Maps) and in Edmonton, Alberta, Canada, from 2008 to the present. SSC 1.0 contains 124,069 words (where a “word” may consist of more than one Chinese character).
Clock Tower at Shanghai Railway Station