Data Access

The CAWSE spoken and written subcorpora are currently available for download.

To access the data, register or sign in.

For more information, see Overview of CAWSE Written and Spoken (PDF).

Overview of Data (Updated in August 2019)

| Context | Mode | Task | Source | No. of scripts | Average length | Size |
|---|---|---|---|---|---|---|
| Assessment | Written | Listening and summary writing (*sample) | Hand-written exam script | 515 | 280 words | 144,000 words |
| Assessment | Written | Reading and writing (general) (*sample) | Hand-written exam script | 365 | 450 words | 164,000 words |
| Assessment | Written | Reading and writing (arts and social science) (*sample) | Hand-written exam script | 344 | 750 words | 258,000 words |
| Assessment | Written | Reading and writing (science and engineering) (*sample) | Hand-written exam script | 268 | 746 words | 200,000 words |
| Assessment | Written | Written assignment (arts and social science) (*sample) | Electronic essay | 367 | 1,500 words | 550,000 words |
| Assessment | Written | Written assignment (science and engineering) (*sample) | Electronic essay | 290 | 1,500 words | 435,000 words |
| Assessment | Spoken | Interview (*sample) | Audio | 122 | 10 minutes | 122,000 words |
| Assessment | Spoken | Presentation (*sample) | Video | 184 | 10 minutes | 184,000 words |
| Teaching & Learning | Multimodal | Chat-up session (academic group discussion) | Video | 7 sessions | 1 hour | 7 hours |
| Teaching & Learning | Multimodal | Pre-sessional classroom (Pre-Masters) | Video | 87 clips | - | 17 hours |
| Teaching & Learning | Multimodal | Group presentation (Electrical and Electronic Engineering Year 2) | Video | 26 | 10 minutes | 4 hours and 20 minutes |


Notes for Users

1. All the data comes from the Year One Programme in CELE at UNNC except for presentations in the School of Electrical and Electronic Engineering (Year 2) and pre-sessional classroom data (Pre-Masters level), which are indicated clearly in the table above.

2. The assessment data is available in plain text only. Only some trial multimodal data is currently available; for further enquiries, please contact Dr Simon Harrison (simon.harrison@cityu.edu.hk).

How to Cite UNNC CAWSE

Chen, Y. H., Harrison, S., Oakey, D., Stevens, M. P., Yang, S., Ioratim-Uba, G., Zhou, Q.Q. & Bruncak, R. (2018). UNNC Corpus of Academic Written and Spoken Corpus (UNNC CAWSE) Version 1.0. Ningbo, China: University of Nottingham Ningbo China.

Licence

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.


