PhD defense of Jade Mekki on 8 September 2022

Title: Characterisation of language registers using emerging sequential pattern extraction

Abstract:

This PhD thesis aims to characterise language registers automatically. From a linguistic point of view, our contribution is to study the potential of natural language processing techniques for extracting new knowledge about the casual, neutral, and formal registers. On the computational side, we propose an unsupervised method generic enough to characterise any type of linguistic variation, with registers serving as a use case. The manuscript first surveys the many definitions found in the literature, against which we position our work. Second, we present the construction of a large, linguistically motivated corpus of French tweets annotated with registers. The annotations result from a semi-supervised process based on a manually annotated seed and a classifier that generalises the annotations to all the tweets. Based on this annotated corpus, we then show that emerging sequential pattern extraction techniques make it possible to extract linguistic features specific to the registers under study. Finally, we detail our approach for reducing the number of extracted patterns, which makes the resulting characterisations easier to interpret.
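
As a rough illustration of the idea of emerging patterns mentioned in the abstract, the sketch below keeps patterns that are frequent in one register and markedly more frequent there than in another. It is not the method developed in the thesis: it assumes contiguous word n-grams as a simplified stand-in for sequential patterns, a growth-rate criterion (ratio of supports), and a tiny invented toy corpus. All function names, thresholds, and example sentences are hypothetical.

```python
from collections import Counter


def candidate_patterns(tokens, max_len=2):
    """Return the set of contiguous n-grams (length 1..max_len) in one text.

    Assumption: contiguous n-grams are a simplified stand-in for the
    sequential patterns discussed in the abstract.
    """
    found = set()
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            found.add(tuple(tokens[i:i + n]))
    return found


def support(corpus, max_len=2):
    """Fraction of texts in the corpus that contain each pattern."""
    counts = Counter()
    for tokens in corpus:
        counts.update(candidate_patterns(tokens, max_len))
    return {pattern: count / len(corpus) for pattern, count in counts.items()}


def emerging_patterns(target, background, min_support=0.5, min_growth=3.0, max_len=2):
    """Keep patterns frequent in `target` whose support grows strongly
    compared with `background` (growth rate = support ratio)."""
    target_support = support(target, max_len)
    background_support = support(background, max_len)
    result = {}
    for pattern, sup in target_support.items():
        if sup < min_support:
            continue
        base = background_support.get(pattern, 0.0)
        # A pattern absent from the background has an infinite growth rate.
        growth = float("inf") if base == 0.0 else sup / base
        if growth >= min_growth:
            result[pattern] = growth
    return result


# Invented toy data, pre-tokenised; not taken from the TREMoLo corpus.
casual = [["j'", "sais", "pas"], ["j'", "veux", "pas"], ["tu", "sais", "pas"]]
formal = [["je", "ne", "sais", "pas"], ["je", "ne", "veux", "pas"],
          ["vous", "ne", "savez", "pas"]]

casual_patterns = emerging_patterns(casual, formal)
formal_patterns = emerging_patterns(formal, casual)
```

On this toy data, the negation particle "ne" emerges for the formal texts, while "pas", which appears in every text of both sets, is filtered out because its growth rate is 1. Real sequential pattern mining also allows gaps between items and typically works on richer annotations than raw words.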

Accepted paper at RANLP 2021

TREMoLo-Tweets: a Multi-Label Corpus of French Tweets for Language Register Characterization

Jade Mekki, Gwénolé Lecorvé, Delphine Battistelli, and Nicolas Béchet

Accepted paper at ICANN 2021

Style as Sentiment versus Style as Formality: the same or different?

Somayeh Jafaritazehjani, Gwénolé Lecorvé, Damien Lolive, and John D. Kelleher

Paper accepted at CICLing 2019 (La Rochelle, France)

The paper “Towards the Automatic Processing of Language Registers: Semi-supervisedly Built Corpus and Classifier for French” has been accepted at the International Conference on Computational Linguistics and Intelligent Text Processing (CICLing). The authors are Gwénolé Lecorvé, Hugo Ayats, Benoît Fournier, Jade Mekki, Jonathan Chevelu, Delphine Battistelli, and Nicolas Béchet. Come talk with us if you attend the conference!

Two papers accepted at CORIA-TALN-RJC 2018!

The first work on TREMoLo has just been accepted at CORIA-TALN-RJC 2018, the French NLP conference to be held in Rennes in May.

Paper titles (translated from French):

  • Feature identification for register characterization. Jade Mekki, Delphine Battistelli, Gwénolé Lecorvé, Nicolas Béchet.
  • Joint building of a corpus and a classifier for language registers in French. Gwénolé Lecorvé, Hugo Ayats, Benoît Fournier, Jade Mekki, Jonathan Chevelu, Delphine Battistelli, Nicolas Béchet.

What?

The main objectives of the project are to study linguistic registers per se and to develop methods for automatically transforming the register of a text, i.e., translating a text from one register to another. This work will rely on the extraction of register-specific linguistic patterns and their integration into an automatic paraphrase generation process. These objectives are made possible by the strong and complementary skills of the consortium members.

The project takes an exploratory research perspective, where the goal is to produce fundamental knowledge for style-specific pattern extraction and automatic natural language generation. Linguistic registers are a well-suited case study for achieving this long-term objective.