Faculty of Humanities

The Corpus of Mid-20th Century Hong Kong Cantonese

In 2011, Dr Andy Chin of the Department of Linguistics and Modern Language Studies of The Education University of Hong Kong developed the first phase of The Linguistic Corpus of Mid-20th Century Hong Kong Cantonese. This phase is based on 21 Cantonese films produced in Hong Kong in the 1950s, whose dialogues were transcribed into Chinese characters. It contains about 200,000 words, and users can search the corpus data online.

In 2013, with funding from the Early Career Scheme (ECS) of the Research Grants Council (RGC), Dr Chin developed the second phase of the corpus, in which 60 more films, totalling 770,000 character tokens, were transcribed.


The two phases of the corpus can be accessed at https://hkcc.eduhk.hk/.


The transcription platform of the corpus

Search function of the corpus 

Search results of “送” ‘give’ 
