Corpus of Chinese Academic Written and Spoken English (CAWSE)

The Corpus of Chinese Academic Written and Spoken English (UNNC CAWSE) project aims to build a large collection of L2 English samples produced by L1 Chinese students at the University of Nottingham Ningbo China (UNNC). A variety of assessment tasks (both written and spoken) and speech events (spoken and multimodal) were collected from the preliminary-year programme during 2016-18.

Part of the corpus is now available for download, including coursework (approx. 1 million tokens), interviews (122 sessions, 10 mins each) and presentations (184 sessions, 10 mins each). All available data have been anonymised, with any personally identifiable information removed. Access to the data is currently subject to approval, but registration requires only an institutional (affiliation) email address. The multimodal subcorpus of student group discussions is still a work in progress and may be shared in the future.

The CAWSE corpus provides a rich resource for investigating the distinctive lexical, syntactic and discourse features of L2 English samples across different band scores, assessment tasks, genres and many other contexts.

An online transcription and annotation tool, Transcribear, was also developed for the project and is available to all registered CAWSE users. The browser-based tool offers both manual transcription (free) and automated transcription (for a small fee), and its all-in-one editor with built-in tag validation makes the tedious work of transcription and annotation much easier, faster and more reliable.

For any questions about the project, please contact us via the Contact Us link.

Funding

The project is funded by a Ningbo 3315 Plan award (1 million RMB) granted to Dr Yu-Hua Chen as part of the Ningbo government's innovation development scheme to raise the profile of the city. Corpus development is also supported by funding (200k RMB) from the University of Nottingham Ningbo China Matching Funding Scheme.

