TREMoLo-Tweets Corpus

TREMoLo-Tweets is a large corpus of French tweets annotated by language register: casual, neutral, or formal. The corpus contains 228,505 tweets, for a total of 6 million words. The annotations were generated automatically by a CamemBERT classifier fine-tuned on a manually annotated seed. The corpus also includes linguistic features that can help analyze the notion of register.

DOWNLOAD
