In 2011, Dr Andy Chin of the Department of Linguistics and Modern Language Studies at the Education University of Hong Kong developed the first phase of The Linguistic Corpus of Mid-20th Century Hong Kong Cantonese. It is based on 21 Cantonese films produced in Hong Kong in the 1950s, whose dialogues were transcribed into Chinese characters. The corpus contains about 200,000 words, and users can search the corpus data online.
In 2013, Dr Chin received funding from the Early Career Scheme (ECS) of the Research Grants Council (RGC) to develop the second phase of the corpus. With this support, a further 60 films, comprising 770,000 character tokens, were transcribed.
The two phases of the corpus can be accessed at