Welcome to GSoC 2019 in computational systems biology!
Our group works at the interface of computer science, biology, and mathematics, applying computational approaches to the seas of data in biomedical research. One of the main interests of our group is the development of technologies that make large-scale computational approaches more accessible and collaborative for a wider scientific audience, as well as for life sciences students who may or may not have been exposed to computational methods before. Our recent web-based technology, Cell Collective, enables scientists from across the globe to construct and simulate large-scale computational models of biological systems in a highly collaborative fashion. This software enables biomedical researchers to study the dynamics of biological systems (e.g., cells) under both healthy and diseased conditions. Cell Collective provides a unique environment for real-time, interactive simulations that let users analyze and visualize the multitude of effects a disease-related malfunction can have on the rest of the cell.
Feel free to suggest your own idea. Our interests are within the general space of easy-to-use, interactive data visualization of large-scale networks. Please be detailed about the specifics of your project, why it is important, and how you plan to achieve it.
As a group interested in understanding the dynamics of complex networks via collaborative efforts,
we have developed a new web-based, large-scale network visualization component. One of the main obstacles
in this project is visualizing very large networks on the web
(tens of thousands of nodes and edges) with interactive features (moving/adding/removing nodes,
zooming/panning, etc.) while keeping performance usable in most modern web browsers.
Specific components of this project include:
Hive Plots: Biological networks are typically visualized using traditional force-based or spectral layout algorithms. While these algorithms are useful for small to medium-sized networks, they produce largely uninformative and irreproducible "hairballs" when used for large-scale networks (left side of the figure below). To address these issues, hive plots were developed as a method to manage the visual complexity and generate informative, quantitative, and comparable network layouts. A hive plot is a linear network layout in which nodes are organized along axes based on specified data attributes. In addition, the arrangement of nodes on each axis can convey information about the type of data each node represents, making the hive plot an informative network visualization tool. An example of this layout can be seen in this D3.js component developed by Mike Bostock.
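To make the layout idea concrete, here is a minimal sketch of how hive plot coordinates could be computed. It is written in Python for brevity, and the resulting coordinates could be passed to any renderer. The function name `hive_layout`, the `axis_of` callback, the use of node degree as the ordering attribute, and the radius parameters are all assumptions made for illustration; they are not part of the existing visualization component.

```python
import math
from collections import defaultdict


def hive_layout(nodes, edges, axis_of, inner_radius=1.0, outer_radius=10.0):
    """Place nodes on radial axes; order along each axis follows node degree."""
    # Degree is used as the ordering attribute here (an illustrative assumption).
    degree = defaultdict(int)
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1

    # Group nodes by the axis label returned by the caller-supplied function.
    groups = defaultdict(list)
    for node in nodes:
        groups[axis_of(node)].append(node)

    axis_names = sorted(groups)
    positions = {}
    for axis_index, name in enumerate(axis_names):
        # Axes are spread evenly around the circle.
        angle = 2 * math.pi * axis_index / len(axis_names)
        # Ties in degree are broken by node id so the layout is reproducible.
        members = sorted(groups[name], key=lambda n: (degree[n], n))
        for rank, node in enumerate(members):
            # Spread nodes evenly between the inner and outer radius.
            fraction = rank / (len(members) - 1) if len(members) > 1 else 0.0
            radius = inner_radius + fraction * (outer_radius - inner_radius)
            positions[node] = (radius * math.cos(angle), radius * math.sin(angle))
    return positions
```

Because the position of every node depends only on its attributes and not on a random initial state, the same network always yields the same layout, which is the reproducibility property that force-based layouts lack.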
Flux Balance Analysis is a popular type of Constraint-Based Reconstruction and Analysis (COBRA) method for metabolic networks. While there are desktop-based tools for these analyses, web-based solutions are largely lacking. The goal of this project is to create a stand-alone web pipeline/service that will enable a user to submit an SBML model and select analysis methods (via a React-based GUI) to NodeJS and server-side microservices, which will in turn submit the simulation/analysis job to a backend COBRApy service. This service will subsequently return the simulation/analysis results through the API back to the React client for user viewing.
The following simulation/analysis methods should be available through this pipeline by the end of the project: defining objective functions, flux balance analysis, flux variability analysis, and gap-filling.
One of the main objectives for this project is to define and implement a specific data structure for a constraint-based model that can be tabulated within a PostgreSQL database. The data structure must be JSON-parsable on the client side through a RESTful API. The detailed data layer structure can be discussed during the Community Bonding phase of the project timeline.
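As a starting point for that discussion, the following sketch shows one possible shape for a JSON-serializable constraint-based model. Every class and field name here (`Metabolite`, `Reaction`, `ConstraintBasedModel`, `stoichiometry`, and so on) is hypothetical and chosen only for illustration; the actual schema is left open by the project description. Each dataclass could plausibly map to one database table, but that mapping is also an assumption.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Metabolite:
    id: str
    compartment: str


@dataclass
class Reaction:
    id: str
    # Maps metabolite id -> coefficient (negative = consumed, positive = produced).
    stoichiometry: dict
    lower_bound: float = 0.0
    upper_bound: float = 1000.0
    objective_coefficient: float = 0.0


@dataclass
class ConstraintBasedModel:
    id: str
    metabolites: list = field(default_factory=list)
    reactions: list = field(default_factory=list)

    def to_json(self):
        # asdict() recurses into nested dataclasses, lists, and dicts.
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text):
        raw = json.loads(text)
        return cls(
            id=raw["id"],
            metabolites=[Metabolite(**m) for m in raw["metabolites"]],
            reactions=[Reaction(**r) for r in raw["reactions"]],
        )
```

A round trip such as `ConstraintBasedModel.from_json(model.to_json())` should reproduce the original object, which is a simple check that nothing is lost between the server and the client.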
Potential Mentors: Robert Moore, Dr. Rauf Shah
Difficulty Rating: Easy
Flux Balance Analysis is a popular type of Constraint-Based Reconstruction and Analysis (COBRA) method for metabolic networks. While there are desktop-based tools for these analyses, web-based solutions are largely lacking. As a complement to Idea 3, the goal of this project is to build the major analytical methods available in COBRA and COBRApy as a stand-alone JS library for web-based analysis of metabolic networks. The library should be NodeJS compatible.
The methods to be implemented in the library include: gap-filling, flux variability analysis, and flux balance analysis.
For those new to constraint-based modeling, these papers might be useful: Orth et al., Bordbar et al.
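For orientation, flux balance analysis looks for a flux vector v that satisfies the steady-state condition S·v = 0 and per-reaction bounds while maximizing a linear objective. The sketch below covers only the constraint side: it checks whether a candidate flux vector is feasible and evaluates the objective for it. It is not a solver; a full implementation would hand the same constraints to a linear programming routine. It is written in Python for brevity even though the project targets JavaScript, and the function names and the dictionary-based model layout are assumptions made for this example.

```python
def steady_state_residuals(stoichiometry, fluxes):
    """Return the net production rate (S·v) of each metabolite."""
    residuals = {}
    for reaction_id, coefficients in stoichiometry.items():
        for metabolite_id, coefficient in coefficients.items():
            contribution = coefficient * fluxes[reaction_id]
            residuals[metabolite_id] = residuals.get(metabolite_id, 0.0) + contribution
    return residuals


def is_feasible(stoichiometry, bounds, fluxes, tolerance=1e-9):
    """Check flux bounds and the steady-state condition for a candidate flux vector."""
    within_bounds = all(
        bounds[r][0] - tolerance <= fluxes[r] <= bounds[r][1] + tolerance
        for r in stoichiometry
    )
    balanced = all(
        abs(value) <= tolerance
        for value in steady_state_residuals(stoichiometry, fluxes).values()
    )
    return within_bounds and balanced


def objective_value(objective, fluxes):
    """Evaluate the linear objective c·v."""
    return sum(weight * fluxes[r] for r, weight in objective.items())


# Toy pathway: uptake (-> A), convert (A -> B), biomass (B ->).
toy_stoichiometry = {
    "uptake": {"A": 1.0},
    "convert": {"A": -1.0, "B": 1.0},
    "biomass": {"B": -1.0},
}
toy_bounds = {"uptake": (0.0, 10.0), "convert": (0.0, 1000.0), "biomass": (0.0, 1000.0)}
```

In this toy pathway the uptake bound limits every downstream flux, so a flux of 10 through all three reactions is feasible, while any vector in which the conversion step runs slower than uptake violates the steady-state condition for A.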
Potential Mentors: Drs. Bhanwar Lal Puniya, Tomas Helikar
Difficulty Rating: Medium
Ordinary Differential Equations are widely used to model biological networks and kinetic processes. While there are many desktop-based tools for these analyses, web-based solutions are largely lacking. The goal of this project is to create a stand-alone web pipeline/service that will enable a user to submit an SBML model and select analysis methods (via a React-based GUI) to NodeJS, which will in turn submit the simulation/analysis job to a backend simulation service (e.g., COPASI, CellDesigner, Systems Biology Simulation Core Library, etc.). This service will subsequently return the simulation/analysis results through the API back to the React client for user viewing.
The following simulation/analysis methods should be available through this pipeline by the end of the project: parameter fitting, steady-state analysis, and simulation.
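To illustrate what simulation and steady-state analysis involve for a kinetic model, here is a small self-contained sketch. It integrates a reversible mass-action reaction A <-> B with the classical fourth-order Runge-Kutta method and steps forward until the derivatives vanish. The function names, the toy reaction, and the default step size and tolerance are assumptions for illustration only; in the proposed pipeline this work would be delegated to a dedicated backend simulator rather than hand-written code. Parameter fitting is not covered here.

```python
def rk4_step(derivative, state, dt):
    """Advance the state by one classical Runge-Kutta (RK4) step."""
    def shifted(base, slope, scale):
        return [b + scale * s for b, s in zip(base, slope)]

    k1 = derivative(state)
    k2 = derivative(shifted(state, k1, dt / 2))
    k3 = derivative(shifted(state, k2, dt / 2))
    k4 = derivative(shifted(state, k3, dt))
    return [
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    ]


def simulate(derivative, initial_state, dt, steps):
    """Return the list of states visited, starting with the initial state."""
    trajectory = [list(initial_state)]
    for _ in range(steps):
        trajectory.append(rk4_step(derivative, trajectory[-1], dt))
    return trajectory


def reversible_reaction(k_forward, k_reverse):
    """Mass-action kinetics for A <-> B; the state is [A, B]."""
    def derivative(state):
        a, b = state
        net_rate = k_forward * a - k_reverse * b
        return [-net_rate, net_rate]
    return derivative


def find_steady_state(derivative, initial_state, dt=0.01,
                      tolerance=1e-9, max_steps=100_000):
    """Integrate forward until every derivative is below the tolerance."""
    state = list(initial_state)
    for _ in range(max_steps):
        if max(abs(x) for x in derivative(state)) < tolerance:
            return state
        state = rk4_step(derivative, state, dt)
    raise RuntimeError("no steady state found within max_steps")
```

For this reaction the steady state can be checked by hand: with a forward rate constant of 2, a reverse rate constant of 1, and a total concentration of 3, the analytic steady state is A = 1 and B = 2.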
One of the main objectives for this project is to define and implement a specific data structure for a kinetic model that can be tabulated within a PostgreSQL database. The data structure must be JSON-parsable on the client side through a RESTful API. The detailed data layer structure can be discussed during the Community Bonding phase of the project timeline.
Potential Mentors: Rauf Shah, Achilles Rasquinha
Difficulty Rating: Medium
Potential Mentors: Robert Moore, Dr. Tomas Helikar
Difficulty Rating: Medium
Most statistical technologies require users to be familiar with the command line and/or some type of higher-level programming language, making statistics less accessible to those who are not familiar with these technologies. The technology already provides access to many data visualizations and analyses, including tabular data upload/visualization, descriptive statistics, t-tests, graphing, one-/multi-way ANOVA, clustering, classification, principal component analyses, and heatmaps. It also provides an interactive command-line snippet manager in the web browser that allows users to learn R programming in a scaffolded fashion. The goal of this project is to develop a production-ready version of a cross-platform web-based application that enables anyone to perform various statistical computations in an easy-to-use, interactive, and graphical manner. Specifically, the selected student will:
The animation should support customization by a developer. Customizable properties should include the animation color and speed, as well as some pre-set styles.
Potential Mentors: Ales Saska, Resa Helikar
Difficulty Rating: High
Machine learning tools for cancer class prediction and biomarker discovery can hasten cancer detection and significantly improve prognosis. However, current cancer prediction methods lack an interface that offers interactivity, flexibility, modularity, and data visualization for users. To address these issues, we developed an open-source, browser-based, database-driven GUI (Graphical User Interface) application, Candis, for cancer classification and biomarker identification using gene expression data (to distinguish cancer from normal samples, as well as different subtypes of cancer). Candis is built using a ReactJS frontend and a Flask-based backend. The easy-to-follow application interface enables researchers to access their local and remote gene expression data (search and download data from the NCBI database) and build cancer prediction machine learning models. In addition to handling large datasets, the application also provides an intuitive way to create experiments, add additional data fields (e.g., patient’s demographic data), and pre-process and normalize data, including the ability to conduct feature selection and classification analysis. Candis is platform independent and comes with easy-to-follow installation and operation instructions. Candis is useful for biomedical researchers who have no computer programming background and are interested in performing cancer biomarker identification and cancer class prediction analyses on their own computer. Check out a brief demo of the tool below!
The purpose of this project is to expand Candis’ machine learning services and tools to include deep learning, in particular by incorporating the Python libraries TensorFlow and PyTorch. In addition, the student will be responsible for maintaining the stability of the platform by addressing various bugs.
Networks are used to visualize a variety of biological systems. Signaling and gene regulatory networks are often described as a set of interactions or relationships between nodes, which represent the components of the signaling or gene regulatory system. However, the resulting node-and-edge graphs are visually distinct from the more familiar ‘metabolic map’ layout, where lines show the flow of reactants coming together before branching into products (see below). The goal of this project is to modify the CCNetViz network visualization library (currently focused on node-edge graphs) and develop a new “mode” that will enable visualization of metabolic and/or kinetic network models.
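One way to think about the difference between the two layouts is in terms of the underlying data. A metabolic map can be represented as a bipartite graph in which every reaction gets its own junction node, so that reactant lines converge on the junction and product lines branch out from it. The sketch below builds such a graph from a list of reactions. It is a plain data transformation written in Python and does not use the CCNetViz API; the function name `to_metabolic_map`, the `rxn:` prefix, and the node dictionary layout are assumptions made for illustration.

```python
def to_metabolic_map(reactions):
    """Convert reactions into a bipartite graph with one junction node per reaction.

    `reactions` maps a reaction id to a (reactants, products) pair of lists.
    Returns (nodes, edges), where edges are directed (source, target) tuples.
    """
    nodes = []
    edges = []
    seen = set()

    def add_node(node_id, kind):
        # Metabolites shared between reactions are added only once.
        if node_id not in seen:
            seen.add(node_id)
            nodes.append({"id": node_id, "kind": kind})

    for reaction_id, (reactants, products) in reactions.items():
        junction = f"rxn:{reaction_id}"
        add_node(junction, "reaction")
        for metabolite in reactants:
            add_node(metabolite, "metabolite")
            edges.append((metabolite, junction))  # reactants converge
        for metabolite in products:
            add_node(metabolite, "metabolite")
            edges.append((junction, metabolite))  # products branch out
    return nodes, edges
```

For a reaction with two reactants and two products, such as glucose + ATP -> G6P + ADP, this yields five nodes and four edges, compared with the four pairwise edges and no shared junction that a plain node-edge graph would draw.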