At ILLUIN Technology, we're delighted to announce the launch of CroissantLLM (1.3B), a large language model (LLM) designed specifically to meet the needs of French-speaking companies. Open-source, lightweight, production-ready, ethical and transparent, this model marks a significant step forward in the world of artificial intelligence.
A fruitful collaboration for innovation
CroissantLLM is the result of close collaboration between ILLUIN Technology teams and CentraleSupélec's MICS laboratory. This partnership has advanced the state of the art in open-source French-language models, at a time when companies in every sector need generative AI solutions that are open and easy to work with day to day. "This new language model not only meets industry expectations, but is also aligned with our values of openness, ethics, and transparency."
An environmentally-friendly model
At a time when the environmental impact of technology is becoming a major concern, CroissantLLM stands out for its small footprint. The model can be deployed without GPUs, which are generally very energy-hungry. This reflects our commitment to responsible technological innovation in industry, in line with the need to reduce energy consumption.
Sovereign and ethical innovation
CroissantLLM was trained on the Jean Zay supercomputer, using open, fully sourced data with total transparency, in line with the requirements of the AI Act. The model thus embodies sovereign, transparent, ethical and responsible innovation, a major asset for companies that want to adopt generative AI solutions with complete confidence.
CroissantLLM technical details
Here's what makes CroissantLLM particularly well-suited to the industrial context:
- 1.3 billion parameters: A "small" model ideal for industrial applications.
- Multilingual: Pre-trained on a mix of French, English, and code.
- Performance: The most powerful French-speaking model for its size, with performance equivalent to LLaMa-13B for French-English translation.
- Flexibility: Runs on CPU and phone, enabling low-cost use in production.
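To give a feel for why a 1.3-billion-parameter model can fit on a CPU machine or a phone, here is a rough back-of-the-envelope sketch. It is an illustration only: it counts the memory needed to store the weights and ignores activations, the attention cache and runtime overhead, so real usage will be higher. The precisions listed are common storage formats chosen for the example, not a statement about the formats in which CroissantLLM is actually distributed, and the helper name `weight_memory_gb` is hypothetical.

```python
# Rough weight-memory estimate for a model with a given parameter count.
# Assumption: only the weights are counted; activations, attention cache
# and runtime overhead are ignored, so real memory usage will be higher.

# Bytes needed to store one parameter at each (illustrative) precision.
BYTES_PER_PARAM = {
    "float32": 4.0,
    "float16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Return approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    num_params = 1.3e9  # parameter count quoted for CroissantLLM
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype:>8}: {weight_memory_gb(num_params, dtype):.2f} GB")
```

Under these assumptions, the weights take roughly 2.6 GB at 16-bit precision, which is within reach of an ordinary laptop's RAM, and lower-precision storage shrinks that further.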
Academic contributions and available resources
We are proud to share our advances with the academic and industrial community:
- CroissantLLM and its many variants are published under the MIT license, favoring reuse by the academic community.
- The largest French-language pre-training corpus to date, covering a wide range of data types, all under permissive licenses.
- FrenchBench: A high-quality LLM evaluation benchmark on industrial tasks of interest, including datasets made available by ILLUIN Technology.
Access resources
Find out more and access resources:
Thanks
This project would not have been possible without the hard work of the R&D teams, who contributed over many months. Many thanks to Manuel Faysse, Gautier Viaud, António Loison, Pierre Colombo, Celine Hudelot, Renaud Monnet, Paul-Henry Cournède, Robert Vesoul, Nuno Miguel Guerreiro, Patrick Fernandes and the Université Paris-Saclay.











