TREMoLo-Tweets: a Multi-Label Corpus of French Tweets for Language Register Characterization
Jade Mekki, Gwénolé Lecorvé, Delphine Battistelli, and Nicolas Béchet
Style as Sentiment versus Style as Formality: the same or different?
Somayeh Jafaritazehjani, Gwénolé Lecorvé, Damien Lolive, and John D. Kelleher
We are looking for a new collaborator to work on paraphrase generation / natural language generation. Details can be found here. Feel free to apply if you are interested!
The paper “Towards the Automatic Processing of Language Registers: Semi-supervisedly Built Corpus and Classifier for French” has been accepted at the International Conference on Computational Linguistics and Intelligent Text Processing (CICLing). The authors are Gwénolé Lecorvé, Hugo Ayats, Benoît Fournier, Jade Mekki, Jonathan Chevelu, Delphine Battistelli, and Nicolas Béchet. Come talk with us if you attend the conference!
The first work on TREMoLo has just been accepted at CORIA-TALN-RJC 2018, the French NLP conference to be held in Rennes in May.
Paper titles (translated from French):
- Feature identification for register characterization. Jade Mekki, Delphine Battistelli, Gwénolé Lecorvé, Nicolas Béchet.
- Joint building of a corpus and a classifier for language registers in French. Gwénolé Lecorvé, Hugo Ayats, Benoît Fournier, Jade Mekki, Jonathan Chevelu, Delphine Battistelli, Nicolas Béchet.
The kickoff meeting took place this Monday at IRISA Lannion, in the offices of the Expression team.
Let’s get to work now! 🙂
The main objectives of the project are to study linguistic registers per se, and to develop methods for automatically transforming the register of a text, i.e., translating a text from one register to another. This work will rely on the extraction of register-specific linguistic patterns and their integration into an automatic paraphrase generation process. These objectives are made possible by the strong and complementary skills of the consortium members.
The project is conducted as exploratory research, where the goal is to produce fundamental knowledge for style-specific pattern extraction and automatic natural language generation. Linguistic registers are a well-suited case study for pursuing this long-term objective.
Linguistic registers are known to have a strong influence on the expressivity conveyed by utterances. However, their study in natural language processing (NLP) is still marginal. To address this gap, the TREMoLo project focuses on their analysis and automatic manipulation, with particular attention to French. Besides its originality, this research will complement the widespread work on textual information extraction in NLP.
The project is part of the growing interest in stylistics in NLP, a domain in which the number of potential applications is increasing. For instance, stylistics can play a part in authorship authentication, access to information, human-machine dialogue or interaction, and language learning. The project's societal impact thus naturally lies in these domains, where it opens up possibilities for automatically modulating text according to a specific goal or audience. The scientific advances mainly lie in the joint use of data mining and statistical NLP approaches, along with new linguistic and sociolinguistic findings. Together, these aspects give the project strong potential for industrial application.