Tackling data integration challenges – Integrated transcriptomic and epigenetic differential analysis with idiffomix

(Invited keynote) From complexity to clarity – Tackling the challenges of multi-omic integration

The increasing availability and affordability of high-throughput sequencing technologies have enabled the generation of large-scale multi-omic data, greatly enhancing our understanding of complex biological systems across hierarchical molecular levels. A great deal of attention has been devoted to developing integrative methods that can fully leverage these multifaceted data while addressing numerous statistical challenges, such as small sample sizes, high dimensionality, heterogeneous measures, missing data, and complex interdependencies within and between omic layers. To date, many multi-omic integrative approaches have been proposed, reflecting the diversity of omic combinations, definitions of inter-omic anchors, and analysis objectives. In this talk, I will provide an overview of some commonly used methods for multi-omic integration and introduce one of our own recent contributions in this field – idiffomix, a joint mixture model for integrated differential analyses of paired transcriptomic and methylation data. I will conclude by discussing some future opportunities and challenges in integrative multi-omic research.

(Invited talk) Accounting for overlapping functional annotations as biological priors in genomic prediction models of complex traits

It is now common practice to build whole-genome regression models using genomic data to predict complex traits in a wide range of fields, including farm animal and plant breeding and human genetics. Functional genomic annotations, such as chromatin accessibility or methylation status in target tissues at relevant developmental stages, have the potential to provide valuable insight into the position and effect size of causal genetic variants underlying complex traits. In the H2020 GENE-SWitCH project, we aimed to develop and validate Bayesian models able to more fully leverage such complex functional annotations for improved accuracy and interpretability of genomic predictions in the pig and poultry breeding sectors. To this end, we defined and implemented a flexible framework for genomic prediction, called BayesRCO, that can simultaneously take advantage of multiple functional genomic annotations. In this talk, I’ll describe the intuition behind our proposed model and discuss some of our key take-away messages from early use cases.
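
To give some intuition for how functional annotations can act as biological priors, here is a minimal Python sketch of an annotation-dependent BayesR-style mixture prior on SNP effects. It assumes the usual BayesR grid of four effect-variance classes (0, 0.0001, 0.001 and 0.01 times the genetic variance) and one set of mixing proportions per annotation, in the spirit of BayesRC. It is not the BayesRCO implementation: the rule used for a SNP lying in several overlapping annotations (pick one of them uniformly at random) is a hypothetical choice for illustration, and the function names, annotations and proportions are all made up.

```python
import numpy as np

# BayesR-style variance classes: effect variance = GAMMA[k] * sigma2_g (class 0 = null effect).
GAMMA = np.array([0.0, 1e-4, 1e-3, 1e-2])


def sample_snp_effects(annotations, pi_by_annot, sigma2_g, rng):
    """Draw SNP effects from an annotation-dependent BayesR-style mixture prior.

    annotations : (m, C) 0/1 matrix; annotations[j, c] == 1 if SNP j lies in annotation c
    pi_by_annot : (C, 4) mixing proportions over the variance classes, one row per annotation
    Returns the effects and the annotation whose proportions each SNP used.
    """
    m = annotations.shape[0]
    beta = np.zeros(m)
    chosen = np.empty(m, dtype=int)
    for j in range(m):
        members = np.flatnonzero(annotations[j])
        if members.size == 0:
            raise ValueError(f"SNP {j} has no annotation; add a background category")
        # Hypothetical overlap rule (not BayesRCO's): pick one annotation uniformly at random.
        c = int(rng.choice(members))
        chosen[j] = c
        # Draw the variance class from that annotation's mixing proportions.
        k = rng.choice(len(GAMMA), p=pi_by_annot[c])
        if k > 0:
            beta[j] = rng.normal(0.0, np.sqrt(GAMMA[k] * sigma2_g))
    return beta, chosen


def simulate_phenotype(genotypes, beta, sigma2_e, rng):
    """Whole-genome regression model: y = X beta + e, with e ~ N(0, sigma2_e)."""
    g = genotypes @ beta
    return g + rng.normal(0.0, np.sqrt(sigma2_e), size=g.shape[0])


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    m, n = 5000, 300
    # Column 0: a hypothetical "open chromatin" annotation covering the first 1000 SNPs;
    # column 1: a background category containing every SNP (so the two overlap).
    annotations = np.zeros((m, 2), dtype=int)
    annotations[:1000, 0] = 1
    annotations[:, 1] = 1
    pi = np.array([[0.90, 0.05, 0.03, 0.02],     # enriched annotation: more non-null effects
                   [0.99, 0.007, 0.002, 0.001]])  # background
    beta, chosen = sample_snp_effects(annotations, pi, sigma2_g=1.0, rng=rng)
    X = rng.binomial(2, 0.3, size=(n, m)).astype(float)  # 0/1/2 allele counts
    y = simulate_phenotype(X, beta, sigma2_e=1.0, rng=rng)
    for c in range(2):
        print(f"annotation {c}: fraction of non-null effects = {np.mean(beta[chosen == c] != 0):.3f}")
```

Annotations with larger non-null proportions place more prior mass on SNPs inside them, which is one way prior biological information can steer a whole-genome regression towards putative causal regions; how BayesRCO itself handles overlaps is not reproduced here.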

Using biological priors in genomic prediction models – QQOQCCP (who, what, where, when, how, how much, why)?

(Invited talk) Accounting for overlapping functional annotations as biological priors in Bayesian genomic prediction models of complex traits

It is now common practice to build whole-genome regression models using genomic data to predict complex traits in a wide range of fields, including farm animal and plant breeding and human genetics. Functional genomic annotations, such as chromatin accessibility or methylation status in relevant tissues, have the potential to provide valuable insight into the position and effect size of causal genetic variants underlying complex traits. In the H2020 GENE-SWitCH project, we aimed to develop and validate Bayesian models able to fully leverage such complex functional annotations for improved accuracy and interpretability of genomic predictions in the pig and poultry breeding sectors. To this end, we defined and implemented a flexible framework for genomic prediction, called BayesRCO, that can simultaneously take advantage of multiple functional genomic annotations. In this talk, I’ll describe the intuition behind our proposed model and discuss some of our key take-away messages from early use cases.

Accounting for overlapping functional annotations as biological priors in genomic prediction models of complex traits

It is now common practice to build whole-genome regression models using genomic data to predict complex traits in a wide range of fields, including farm animal and plant breeding and human genetics. Functional genomic annotations, such as chromatin accessibility or methylation status in relevant tissues, have the potential to provide valuable insight into the position and effect size of causal genetic variants underlying complex traits. In the H2020 GENE-SWitCH project, we aimed to develop and validate Bayesian models able to fully leverage such complex functional annotations for improved accuracy and interpretability of genomic predictions in the pig and poultry breeding sectors. To this end, we defined and implemented a flexible framework for genomic prediction, called BayesRCO, that can simultaneously take advantage of multiple functional genomic annotations. In this talk, I’ll describe the intuition behind our proposed model and discuss some of our key take-away messages from early use cases.

Long-term effects of high-fiber maternal diets on the functional genome of pig offspring

Accounting for overlapping annotations as biological priors in genomic prediction models of complex traits

It is now common practice in farm animal and plant breeding to use genotyping data to predict phenotypes with genomic prediction models. Functional genomic annotations (e.g., chromatin accessibility or methylation status in relevant tissues) have the potential to provide valuable insight into the location and effect size of causal genetic variants underlying complex traits. Developing and validating genomic prediction models able to fully leverage such complex functional annotations for improved accuracy and interpretability was one of the aims of the H2020 GENE-SWitCH project. To this end, we defined and implemented a flexible framework for genomic prediction, called BayesRCO, that can simultaneously take advantage of multiple functional genomic annotations. In this talk, I’ll describe the intuition behind our proposed model and discuss some of our key take-away messages from early results.

Functional annotations to guide prediction of tissue-specific gene expression from cis-regulatory sequencing variants

Leveraging multi-omic data for integrative exploratory and predictive analyses

The increased availability and affordability of high-throughput sequencing technologies in recent years have facilitated multi-omic studies, expanding and enriching our understanding of complex systems across hierarchical biological levels. Integrative methods for these heterogeneous and multifaceted omic data have shown promise for enhancing the interpretability of exploratory analyses, improving predictive power, and contributing to a holistic understanding of systems biology. However, such integrative analyses face several major obstacles, including potentially ambiguous relationships among omic levels, high dimensionality coupled with small sample sizes, technical artefacts due to batch effects, incomplete or missing data… and the occasional difficulty of posing well-defined, answerable research questions of such data. In light of these challenges, in this talk I will discuss some of our recent methodological contributions to integrate multi-omic data for exploratory and predictive analyses.

Incorporating biological information into genomic prediction models

Incorporating multiple annotations in genomic prediction models

Leveraging multi-omic data for integrative exploratory, predictive, and network analyses

(Poster) Differential network analysis of mixed-type data with copulae

Leveraging multi-omic data for integrative exploratory, predictive, and network analyses

The increased availability and affordability of high-throughput sequencing technologies in recent years have facilitated multi-omic studies, expanding and enriching our understanding of complex systems across hierarchical biological levels. Integrative methods for these heterogeneous and multifaceted omic data have shown promise for enhancing the interpretability of exploratory analyses, improving predictive power, and contributing to a holistic understanding of systems biology. However, such integrative analyses face several major obstacles, including potentially ambiguous relationships among omic levels, high dimensionality coupled with small sample sizes, technical artefacts due to batch effects, incomplete or missing data… and the occasional difficulty of posing well-defined, answerable research questions of such data. In light of these challenges, in this talk I will discuss a few of our recent methodological contributions to integrate multi-omic data for (1) exploratory analyses, (2) genomic prediction, and (3) network inference, all with a focus on enhanced interpretability and user-friendly software implementations.

(Invited talk) A randomized pairwise likelihood method for complex statistical inferences

Pairwise likelihood methods are commonly used for inference in parametric statistical models when the full likelihood is too complex to be used, as with multivariate count data. Although pairwise likelihood methods offer a useful way to perform inference when the full likelihood is intractable, several computational challenges remain, particularly in higher dimensions. To alleviate these issues, we consider a randomized pairwise likelihood approach, in which only summands randomly sampled across observations and pairs are used for estimation. In addition to the usual trade-off between statistical and computational efficiency, we show that, under a condition on the sampling parameter, this two-way random sampling mechanism allows for the construction of less computationally expensive confidence intervals. The proposed approach, which is implemented in the rpl R package, is illustrated in tandem with copula-based models for multivariate count data, both in simulations and on a transcriptomic data set.
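
The sampling idea can be illustrated with a minimal Python sketch. For simplicity it replaces the copula model for counts with a Gaussian model that has a single exchangeable correlation rho (an assumption, not the setting of the talk), keeps each (observation, pair) summand independently with probability `prob` (one simple two-way sampling design, not necessarily the one used in the rpl package, whose API is not reproduced here), and maximises over a grid of rho values instead of using a numerical optimiser. All function names are hypothetical.

```python
import itertools

import numpy as np


def bivariate_normal_logpdf(u, v, rho):
    """Log-density of a standard bivariate normal with correlation rho."""
    one_minus = 1.0 - rho**2
    quad = (u**2 - 2.0 * rho * u * v + v**2) / one_minus
    return -np.log(2.0 * np.pi) - 0.5 * np.log(one_minus) - 0.5 * quad


def sample_summands(n, d, prob, rng):
    """Bernoulli mask over (observation, pair) summands: True means the summand is kept."""
    n_pairs = d * (d - 1) // 2
    return rng.random((n, n_pairs)) < prob


def randomized_pairwise_loglik(x, rho, keep, prob):
    """Sum of the sampled pairwise log-densities, rescaled by 1/prob.

    The inverse-probability rescaling makes this an unbiased estimate of the full
    pairwise log-likelihood; it does not change the maximiser.
    """
    total = 0.0
    pairs = itertools.combinations(range(x.shape[1]), 2)
    for p_idx, (j, k) in enumerate(pairs):
        rows = keep[:, p_idx]  # observations sampled for this pair
        total += bivariate_normal_logpdf(x[rows, j], x[rows, k], rho).sum()
    return total / prob


def fit_rho(x, prob, rng, grid=np.linspace(-0.95, 0.95, 381)):
    """Maximise the randomized pairwise log-likelihood over a grid of rho values."""
    keep = sample_summands(x.shape[0], x.shape[1], prob, rng)
    values = [randomized_pairwise_loglik(x, r, keep, prob) for r in grid]
    return float(grid[int(np.argmax(values))]), float(keep.mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, true_rho = 6, 0.5
    # Exchangeable correlation matrix: 1 on the diagonal, true_rho elsewhere.
    cov = np.full((d, d), true_rho) + (1.0 - true_rho) * np.eye(d)
    x = rng.multivariate_normal(np.zeros(d), cov, size=2000)
    rho_full, _ = fit_rho(x, 1.0, rng)     # all summands: ordinary pairwise likelihood
    rho_rand, frac = fit_rho(x, 0.1, rng)  # roughly 10% of summands
    print(f"full: {rho_full:.3f}  randomized: {rho_rand:.3f}  kept: {frac:.1%}")
```

With `prob = 1.0` every summand is kept and the sketch reduces to the ordinary pairwise likelihood; smaller values of `prob` cut the number of density evaluations roughly in proportion, at the cost of a noisier estimate. The confidence-interval construction mentioned above is not covered by this sketch.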

seminar