About Me

NLP engineering for large language model development

Data Collection & Curation for LLM Training

A significant part of my work involves building pipelines to collect and curate high-quality training data for large language models, from quality filtering through deduplication.
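As a minimal sketch of what one stage of such a curation pipeline might look like, the following combines a simple length-based quality filter with exact deduplication over normalized text. The function names (`normalize`, `curate`) and thresholds are illustrative, not the actual pipeline.

```python
import hashlib
import re

def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so near-identical copies hash alike."""
    return re.sub(r"\s+", " ", text.lower()).strip()

def curate(docs: list[str], min_words: int = 5) -> list[str]:
    """Drop too-short fragments, then drop exact (normalized) duplicates."""
    seen: set[str] = set()
    kept: list[str] = []
    for doc in docs:
        if len(doc.split()) < min_words:
            continue  # quality filter: discard fragments
        digest = hashlib.sha256(normalize(doc).encode()).hexdigest()
        if digest in seen:
            continue  # deduplication: skip documents already kept
        seen.add(digest)
        kept.append(doc)
    return kept
```

Real pipelines typically add fuzzy deduplication (e.g. MinHash) on top of exact hashing, but the filter-then-dedup structure is the same.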

Document Processing & Format Conversion

Converting documents between formats while preserving structure and mathematical notation is a core challenge in preparing LLM training data.
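One common pitfall is that naive tag stripping mangles LaTeX math embedded in HTML. A hedged sketch of the mask-strip-restore pattern that avoids this (the regexes here are deliberately simplified and would need hardening for real documents):

```python
import re

MATH = re.compile(r"\$[^$]+\$")          # inline LaTeX math spans
TAG = re.compile(r"</?[a-zA-Z][^>]*>")   # naive HTML tag matcher

def html_to_text(html: str) -> str:
    """Strip HTML tags while leaving $...$ math spans untouched."""
    spans: list[str] = []

    def mask(m: re.Match) -> str:
        # Replace each math span with an opaque placeholder before stripping.
        spans.append(m.group(0))
        return f"\x00{len(spans) - 1}\x00"

    masked = MATH.sub(mask, html)
    stripped = TAG.sub("", masked)
    # Restore the protected math spans verbatim.
    return re.sub(r"\x00(\d+)\x00", lambda m: spans[int(m.group(1))], stripped)
```

Masking first matters because math like `$a<b$` would otherwise be misread as containing a tag.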

LLM Evaluation

Systematic evaluation is essential for understanding model capabilities, and a substantial part of my work focuses on this area.
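At its core, most automatic evaluation reduces to comparing model outputs against references under a normalization scheme. A minimal sketch of SQuAD-style exact-match scoring (the helper names are illustrative):

```python
import string

def normalize_answer(s: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace before comparing."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in string.punctuation)
    return " ".join(s.split())

def exact_match(predictions: list[str], references: list[str]) -> float:
    """Fraction of predictions matching their reference after normalization."""
    hits = sum(normalize_answer(p) == normalize_answer(r)
               for p, r in zip(predictions, references))
    return hits / len(references)
```

The normalization step is what makes the metric robust to surface variation ("Paris." vs "paris") without crediting genuinely wrong answers.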

Model Fine-tuning

I have worked with advanced fine-tuning techniques for improving language model performance.
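As one illustration of such techniques (the source does not name a specific method, so LoRA is my example), low-rank adaptation trains only two small factor matrices A and B and later merges their product into the frozen weight. A toy sketch of the merge arithmetic, not a training loop:

```python
def matmul(X, Y):
    """Naive matrix multiply, sufficient for tiny illustrative matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def lora_merge(W, A, B, alpha: float = 4.0):
    """Merge a LoRA update into a frozen weight: W' = W + (alpha / r) * B @ A.

    W is the frozen base weight (d_out x d_in); A (r x d_in) and B (d_out x r)
    are the small trained factors, with r the LoRA rank.
    """
    r = len(A)  # rank = number of rows of A
    BA = matmul(B, A)
    scale = alpha / r
    return [[W[i][j] + scale * BA[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]
```

Because r is much smaller than the weight dimensions, the trainable parameter count drops dramatically while inference cost after merging is unchanged.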

Technical Stack

Python

Primary language for all NLP work, data processing, and model development

HuggingFace

Transformers, Datasets, TRL, and the broader ecosystem for modern NLP

PyTorch

Deep learning framework for model training and inference

Distributed Computing

DeepSpeed, Accelerate, vLLM for efficient large-scale training and inference

Current Work at CEA

As an NLP Engineer at the French Alternative Energies and Atomic Energy Commission (CEA), I apply computational linguistics to real-world challenges. My work involves developing and evaluating language technologies that support the organization's research mission.

The role combines hands-on engineering with research-oriented thinking, requiring both practical implementation skills and a solid theoretical foundation in linguistics and machine learning.