[SocBiN] open phd position in bioinformatics / machine learning in Montpellier
Daniele Raimondi
daniele.raimondi at igmm.cnrs.fr
Mon Aug 11 11:48:31 CEST 2025
_Please feel free to share this offer for a PhD position at IGMM in
Montpellier_
*Project:* DNA large language models for end-to-end genome interpretation
_*Motivation*_
Interpreting the genome means modeling the relationship between genotype
and phenotype, which is the fundamental goal of biology. Achieving this
would revolutionize genetics, medicine, and agro-tech. In clinical
genetics, it could lead to personalized treatments tailored to each
patient's genome, enabling precision medicine.
_*Objectives*_
This project fuses quantitative genetics with bioinformatics and
cutting-edge Artificial Intelligence, using the latest Deep Learning
Large Language Models for DNA to advance our ability to predict the
phenotypes deriving from the observed genotypes. This project is based
on the previous work of Dr. Raimondi on Genome Interpretation (GI) for
the prediction of clinically relevant phenotypes in humans.
Dr. Raimondi's previous work focused on encoding Whole-Exome/Whole-Genome
Sequencing (WES/WGS) data into compact, machine-readable features, while
reducing overfitting caused by high-dimensional genomic data. However,
to limit complexity, we had to rely on coarse gene-level summaries, such
as mutational burden per gene, which sacrificed fine-grained genetic detail.
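To make the gene-level summary concrete, here is a minimal sketch of a mutational-burden encoding in plain Python. The variant records, gene names and panel are hypothetical toy data, not taken from the actual GI pipeline, and counting variants per gene is only one simple way to define burden.

```python
from collections import Counter

# Hypothetical toy input: variants observed in one individual, each
# annotated with the gene it falls in (all names are illustrative only).
variants = [
    {"gene": "GENE_A", "pos": 101, "alt": "T"},
    {"gene": "GENE_A", "pos": 250, "alt": "G"},
    {"gene": "GENE_C", "pos": 77, "alt": "A"},
]

# Fixed gene ordering so every individual maps to the same feature layout.
gene_panel = ["GENE_A", "GENE_B", "GENE_C"]


def mutational_burden(variants, gene_panel):
    """Count variants per gene and return one value per gene in the panel."""
    counts = Counter(v["gene"] for v in variants)
    return [counts.get(gene, 0) for gene in gene_panel]


features = mutational_burden(variants, gene_panel)
print(features)
```

The position and alternate allele of each variant are discarded by the encoding, which is the kind of fine-grained detail that a gene-level summary gives up.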
The goal of this project is to move beyond these gene-centric encodings
by building neural network architectures that operate directly at the
nucleotide level. To achieve this, we propose integrating pre-trained
DNA LLMs as unsupervised feature extractors within GI models. These
models, trained using self-supervised learning on entire genomes,
capture rich patterns of DNA dependencies and can produce
information-dense latent representations.
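The general pattern of a frozen feature extractor can be sketched as follows: tokenize the sequence, embed each token with a fixed encoder, pool into one vector, and pass that vector to a small trainable head. This is only an illustration in plain Python. The `frozen_encoder` below is a hash-based placeholder standing in for a real pre-trained DNA LLM, and the k-mer tokenizer, `EMBED_DIM` and `linear_head` are assumptions made for the example, not the project's architecture.

```python
import hashlib

EMBED_DIM = 8  # hypothetical latent size; real DNA LLMs use far larger ones


def tokenize(seq, k=3):
    """Split a DNA sequence into overlapping k-mers."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def frozen_encoder(token):
    """Placeholder for a pre-trained DNA LLM: maps a token to a fixed vector.

    A real model would produce context-dependent embeddings; this hash-based
    stand-in only mimics the interface (token in, vector out, never trained).
    """
    digest = hashlib.sha256(token.encode()).digest()
    return [b / 255.0 for b in digest[:EMBED_DIM]]


def embed_sequence(seq, k=3):
    """Mean-pool per-token vectors into one fixed-length representation."""
    vectors = [frozen_encoder(t) for t in tokenize(seq, k)]
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def linear_head(features, weights, bias):
    """Trainable part of this sketch: a single linear unit on the features."""
    return sum(w * x for w, x in zip(weights, features)) + bias


z = embed_sequence("ACGTACGTTAGC")
score = linear_head(z, weights=[0.1] * EMBED_DIM, bias=0.0)
```

In this sketch only the head's weights and bias would be fitted to phenotype labels, while the encoder stays fixed.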
DNA LLMs have shown strong performance in various functional genomics
tasks, such as identifying regulatory elements and variant effects. This
project will evaluate whether their latent representations can improve
phenotype prediction. The new DNA LLM methods will be prototyped on
A. thaliana, a well-known model organism. Later developments
will be translated to disease risk prediction for Inflammatory Bowel
Disease (IBD). Unlike existing DNA LLM research, this work applies LLMs to the
interpretation of individual-level WES/WGS data for disease risk,
marking a novel use of these models in human genetic prediction.
_*Candidate profile*_
We are looking for a *motivated* and *curious* candidate, with a strong
passion for science and for scientific discovery through the use and
creation of new data science and Machine Learning methods.
Bioinformatics and Genome Interpretation are multi-disciplinary and
rapidly evolving fields. Therefore, the candidate is expected to 1) be
eager to continuously *learn* new skills, methods and concepts, and 2)
*enjoy* finding new solutions in the face of new and unforeseen difficulties.
The ideal candidate has *very good* 1) Python programming skills, 2)
understanding of *the mathematical foundations and principles* of Machine
Learning and Linear Algebra (vector and matrix operations,
optimization), with a particular focus on *Neural Networks*, 3) *problem
solving* skills, 4) familiarity with the GNU/Linux environment and 5) ability
to *multi-task* across different projects. A good understanding of the
basic concepts of Bioinformatics is not necessary but welcome. The
project will be based on the development of *unorthodox Neural
Network* models with *PyTorch*.
B2 level of English is *required*.
The offer provides an initial 6-month contract, with the possibility of
renewal for up to 2 years. The project can be extended to 3 years and offers
the opportunity to obtain a PhD.
_*Research environment*_
The recruited person will join the “AI for Genome Interpretation” team
led by Dr. Daniele Raimondi at IGMM. The work will be conducted in an
international (English-speaking) and interdisciplinary environment.
The Institute of Molecular Genetics of Montpellier (IGMM) is a joint
research unit affiliated with the CNRS and the University of
Montpellier. It comprises around 200 members, organized into 18 research
groups, 9 shared support services and 9 technological and scientific
platforms.
IGMM is a multidisciplinary institute whose research has both
fundamental and translational impact in molecular and cellular biology
at the international level.
_*Qualifications*_
Familiarity with scientific computing and libraries such as NumPy,
scikit-learn, SciPy and PyTorch.
*HOW TO APPLY:*
https://umontpellier.nous-recrutons.fr/poste/7sqwj9n4vg-ingenieur-en-calcul-scientifique-ou-ingenieur-en-ingenierie-logicielle-fh/
--
Daniele Raimondi, PhD
Chaire de Professeur Junior CNRS
AI for Genome Interpretation group
Institut de Génétique Moléculaire de Montpellier (IGMM)
1919 Route de Mende
34090 Montpellier, France
-obscurum per obscurius-