More about the corpus


SSC consists of samples of five genres of spoken Shanghainese:


  1. Monologue

  2. Conversation

  3. Interview with Jiali Mao

  4. Scripts (movies and cartoons dubbed into Shanghainese and broadcast on the PingJi Voice Forum: http://bbs.pingjivoice.com/)

  5. Songs (popular songs and nursery rhymes in Shanghainese)


Data were collected in downtown Shanghai, China (see Maps), and in Edmonton, Alberta, Canada, from 2008 to the present. SSC 1.0 consists of 124,069 words (where a “word” may consist of more than one Chinese character).


Clock Tower at Shanghai Railway Station