Data-Driven Discovery

Genevera’s group develops new statistical machine learning tools to help people make discoveries from large and complex data sets.
View Full List of Publications on Google Scholar

STATISTICAL MACHINE LEARNING

Graphical Models

We develop new types of probabilistic graphical models and graph learning strategies for representing, discovering, and visualizing relationships in large data sets. Our work includes new classes of graphical models for diverse data types as well as for mixed or multi-modal data. A recent focus of our work is graph learning strategies for large-scale neuroscience data.

Key Publications:

Dimension Reduction

Dimension reduction techniques are used for visualizing, exploring, and discovering patterns in large data sets. We have developed many dimension reduction techniques for complex and structured data; these include sparse tensor decompositions and generalizations of PCA for structured or multi-modal data.

Key Publications:

Clustering

Clustering seeks to find groups in large data sets. We have developed several convex clustering approaches that offer accurate, principled, and flexible strategies along with built-in visualizations for clustering.

Key Publications:

Feature Selection

We develop a variety of feature selection techniques to improve the interpretability of machine learning methods. Our specific goal is to select relevant features from highly correlated and high-dimensional data sets.

Key Publications:

Data Integration

Large data sets are often diverse, with multiple types of features measured on the same set of subjects or observations. We have developed a variety of interpretable machine learning techniques for discovering joint patterns in this so-called mixed multi-modal data.

Key Publications:

Ensemble Learning

Recently, we have begun developing new computationally efficient ensemble learning strategies that also lead to improved accuracy and interpretability.

Key Publications:

Interpretability & Fairness

We have recently begun work on machine learning ethics related to interpretability and algorithmic fairness. Our goal is to develop principled approaches for reliably interpreting or unraveling black-box machine learning approaches, as well as for mitigating biases in machine learning predictions.

Key Publications:

APPLICATIONS

Neuroscience

We develop new statistical machine learning approaches to analyze huge data sets from new technologies for neuroimaging and neural recordings. A key goal of our research is to understand brain connectomics, or how the brain is functionally and structurally connected.

Key Publications:

Bioinformatics

New biomedical technologies have led to an enormous proliferation of “omics” data measuring DNA, RNA, proteins, metabolites, and more. We develop new statistical machine learning techniques to make discoveries from high-dimensional omics data, as well as data integration techniques for analyzing multi-omics data.

Key Publications:

We gratefully acknowledge support from: