TREMoLo-Tweets Corpus

The TREMoLo-Tweets corpus is a large corpus of French tweets, each annotated with one of three language registers: casual, neutral, or formal.

The annotated corpus contains 228,505 tweets, for a total of 6 million words.

The annotations were generated automatically using a CamemBERT classifier fine-tuned on a manually annotated seed.
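
To make the annotation step concrete, here is a minimal sketch of how a fine-tuned three-class CamemBERT checkpoint could be used to label tweets with the Hugging Face transformers library. It is not the authors' pipeline. The label order in REGISTERS, the function names, and the model_dir checkpoint path are all assumptions made for illustration; the checkpoint stands for one you would fine-tune yourself on a manually labeled seed.

```python
from typing import List, Sequence

# Assumed label order; the corpus description does not say how labels are encoded.
REGISTERS = ["casual", "neutral", "formal"]


def pick_register(scores: Sequence[float]) -> str:
    """Return the register whose score is highest."""
    if len(scores) != len(REGISTERS):
        raise ValueError("expected one score per register")
    best = max(range(len(scores)), key=lambda i: scores[i])
    return REGISTERS[best]


def annotate(tweets: List[str], model_dir: str) -> List[str]:
    """Label tweets with a fine-tuned CamemBERT checkpoint stored in model_dir.

    model_dir is hypothetical: a local checkpoint with 3 output classes,
    fine-tuned on a manually annotated seed.
    """
    # Imported lazily so pick_register works without these packages installed.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForSequenceClassification.from_pretrained(model_dir)
    model.eval()

    batch = tokenizer(tweets, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**batch).logits  # one row of 3 scores per tweet
    return [pick_register(row.tolist()) for row in logits]
```

A call such as annotate(["Salut, ça va ?"], "path/to/your-checkpoint") would return one register name per tweet. For a corpus of this size the tweets would need to be processed in batches rather than in a single call.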

The corpus also contains linguistic features that can help analyze the notion of registers.

DOWNLOAD
