Data Access

The CAWSE spoken and written subcorpora are currently available for download.

To access the data, register or sign in. Only institutional email addresses will be approved at this stage. We will review this policy to consider opening access to the wider public in the future.

For more information, see Overview of CAWSE Written and Spoken (PDF).

Overview of Data (Updated in August 2019)

| Context | Mode | Task | Source | No. of scripts | Average length | Size |
|---|---|---|---|---|---|---|
| Assessment | Written | Listening and summary writing | Hand-written exam script | 515 | 280 words | 144,000 words |
| Assessment | Written | Reading and writing (general) | Hand-written exam script | 365 | 450 words | 164,000 words |
| Assessment | Written | Reading and writing (arts and social science) | Hand-written exam script | 344 | 750 words | 258,000 words |
| Assessment | Written | Reading and writing (science and engineering) | Hand-written exam script | 268 | 746 words | 200,000 words |
| Assessment | Written | Written assignment (arts and social science) | Electronic essay | 367 | 1,500 words | 550,000 words |
| Assessment | Written | Written assignment (science and engineering) | Electronic essay | 290 | 1,500 words | 435,000 words |
| | | | Audio | 122 | 10 minutes | 122,000 words |
| | | | Video | 184 | 10 minutes | 184,000 words |
| Teaching & Learning | Multimodal | Chat-up session (academic group discussion) | Video | 7 sessions | 1 hour | 7 hours |
| Teaching & Learning | Multimodal | Pre-sessional classroom (Pre-Masters) | Video | 87 clips | - | 17 hours |
| Teaching & Learning | Multimodal | Group presentation (Electrical and Electronic Engineering Year 2) | Video | 26 | 10 minutes | 4 hours and 20 minutes |

Notes for Users

1. All the data comes from the Year One Programme in CELE at UNNC except for presentations in the School of Electrical and Electronic Engineering (Year 2) and pre-sessional classroom data (Pre-Masters level), which are indicated clearly in the table above.

2. The assessment data is available in plain text only. For multimodal data, only some trial data is currently available; please contact Dr Simon Harrison for further enquiries.

How to Cite CAWSE

Chen, Y. H., Harrison, S., Stevens, M. and Zhou, Q. (forthcoming). Developing A Multimodal Corpus of L2 English from an EMI University in China. Corpora.

Chen, Y. H., Harrison, S., Oakey, D., Stevens, M. P., Yang, S., Ioratim-Uba, G., Zhou, Q. Q. & Bruncak, R. (2018). UNNC Corpus of Academic Written and Spoken English (CAWSE) Version 1.0. Ningbo, China: University of Nottingham Ningbo China.


Creative Commons Licence

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
