Google Summer of Code 2020 Idea Page

Welcome to GSoC 2020 in computational systems biology!

Our group works at the interface of computer science, biology, and mathematics, applying computational approaches to the seas of data in biomedical research. One of our main interests is developing technologies that make large-scale computational approaches more accessible and collaborative for a wider scientific audience, as well as for life sciences students who may not have been exposed to computational methods before. Our recent web-based technology, Cell Collective, enables scientists from across the globe to construct and simulate large-scale computational models of biological systems in a highly collaborative fashion. This software enables biomedical researchers to study the dynamics of biological systems (e.g., cells) under both healthy and diseased conditions. Cell Collective provides a unique environment for real-time, interactive simulations that lets users analyze and visualize the multitude of effects a disease-related malfunction can have on the rest of the cell.

This very same modeling software is being used by thousands of life sciences students (see our user growth above) to learn about biology by building, simulating, breaking, and re-simulating computational models of various biological processes. Tools that have been developed with the help of GSoC students over the last few years include ccNetViz (a network visualization tool used in Cell Collective), CancerDiscover (a high-throughput data analysis pipeline), and Candis (a graphical cancer biomarker discovery tool). Our group consists of computer scientists, biochemists, biologists, bioinformaticians, and mathematicians, creating a unique environment of diverse skills united by a single shared interest.

We have been fortunate to work with some great students over the last three years of GSoC (check out their testimonials here).

Please join our Google Group for additional project details, questions, and discussions.

Before applying, please review this application template, as well as the GSoC Student Guide.

Cell Collective can be found at https://cellcollective.org, and the more recent projects are on GitHub.

Below you can find our project ideas for GSoC 2020.

Idea 0: Your own idea!

Feel free to suggest your own idea. Our interests are within the general space of easy-to-use, interactive, data visualization in large-scale networks. Please be detailed about the specifics of your project, why it is important, and how you plan to achieve it.

Idea 1: Javascript/WebGL library for interactive visualization of large-scale network graphs: Optimization and Dynamic Generation and Navigation through Multi-Level Networks.

As a group interested in understanding the dynamics of complex networks via collaborative efforts, we have developed a new web-based, large-scale network visualization component. One of the main obstacles in this project is visualizing very large networks (tens of thousands of nodes and edges) on the web with interactive features (moving/adding/removing nodes, zooming/panning, etc.), while keeping performance usable in most modern web browsers.

To address this issue, we developed a new WebGL-based JavaScript network visualization library, Cell Collective Network Visualizer -- ccNetViz. And it's fast! Check out the example below (you can pan and zoom in these examples).

The purpose of this project is to extend ccNetViz’s support for multi-level networks and to further increase performance. The outcome of this project will be three-fold: 1) Enable users to seamlessly zoom through multi-level, nested networks. The current implementation requires all network levels to be passed in during the initialization phase; however, deeply nested levels are not utilized by the visualization engine. This feature would allow the user to define a generation function, called when switching levels, that computes the nested graphs “on the fly”. The current multi-level network features can be seen here. 2) Implement the force-directed layout on the GPU for increased performance. This would be done by integrating this library as a plug-in. 3) Build an integrated benchmark analyzer, with a (simple) interactive website that would enable benchmarking of user-specified generic networks.
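
The on-the-fly generation described in item 1 could look roughly like the sketch below. It is a minimal illustration and not the ccNetViz API: `createLevelLoader`, `ringGenerator`, and the node/edge object shapes are hypothetical names chosen for this example. The generation function runs only the first time a nested level is entered, and its result is cached for later visits.

```javascript
// Hypothetical sketch: lazily generate nested graph levels and cache them.
// None of these names come from ccNetViz itself.
function createLevelLoader(generate) {
  const cache = new Map(); // node id -> generated subgraph
  let calls = 0;           // how many times the generator actually ran

  return {
    // Called when the user zooms into `node`; builds the nested graph on demand.
    enter(node) {
      if (!cache.has(node.id)) {
        calls += 1;
        cache.set(node.id, generate(node));
      }
      return cache.get(node.id);
    },
    generatorCalls() {
      return calls;
    },
  };
}

// Example generator (an assumption for illustration): each node expands
// into a small ring of child nodes.
function ringGenerator(node, size = 3) {
  const nodes = [];
  for (let i = 0; i < size; i++) {
    nodes.push({ id: `${node.id}.${i}`, label: `${node.label}-${i}` });
  }
  const edges = nodes.map((n, i) => ({
    source: n,
    target: nodes[(i + 1) % size],
  }));
  return { nodes, edges };
}

const loader = createLevelLoader((node) => ringGenerator(node));
const root = { id: "A", label: "A" };
const level1 = loader.enter(root);            // generated now
const level2 = loader.enter(level1.nodes[0]); // one level deeper, generated now
```

Entering the same node again returns the cached subgraph, so arbitrarily deep hierarchies cost memory only for the levels a user actually visits.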

Required skills: HTML5, Javascript (ES6)
Potential Mentors: Ales Saska, Dr. Tomas Helikar
Difficulty Rating: Easy

Idea 2: Network Visualization: Support for Systems Biology Graphical Notation (SBGN).

Because of the complexity of biological processes, and the many ways of representing them graphically, the Systems Biology Graphical Notation (SBGN) was developed in a community effort to standardize how biological processes are visualized. The purpose of this project is to extend ccNetViz functionality by directly supporting the visualization of SBGN models, and in particular SBGN Process Description (PD) and Activity Flow (AF) elements (see example below). For more information about SBGN, see here. This project should also support the import/parsing and export of SBGN-encoded maps.

Required skills: Javascript (ES6), WebGL
Potential Mentors: Rahul Prajapati
Difficulty Rating: High

Idea 3: Javascript/WebGL library for interactive visualization of large-scale network graphs: Snappable grid with custom edges.

ccNetViz currently displays networks and graphs on a blank canvas, with the option to add a background image. We are looking to write a plugin that lets networks be built and visualized on a snappable grid background. This grid would enable users to position nodes and draw edges by following “dot/square anchors” on the grid (see example below).

The plugin will consist of visual elements featuring 1) grid lines and 2) custom edges drawn through user-defined waypoints. Edges will be defined as either lines or curves. This project would also involve updating the current search function (based on analytic geometry) to support edges with waypoints.
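
The two geometric pieces mentioned above, snapping to grid anchors and searching along edges with waypoints, can be sketched as follows. This is an illustrative sketch under assumptions, not existing ccNetViz code: the function names, the `{x, y}` point shape, and the uniform square grid are all hypothetical, and only straight-line (polyline) edges are handled.

```javascript
// Hypothetical helpers for a snappable grid; not part of ccNetViz.

// Snap a point to the nearest grid anchor, given the grid spacing.
function snapToGrid(point, spacing) {
  return {
    x: Math.round(point.x / spacing) * spacing,
    y: Math.round(point.y / spacing) * spacing,
  };
}

// Distance from point p to the line segment a-b.
function distanceToSegment(p, a, b) {
  const dx = b.x - a.x;
  const dy = b.y - a.y;
  const lengthSquared = dx * dx + dy * dy;
  // Degenerate segment: both endpoints coincide.
  if (lengthSquared === 0) return Math.hypot(p.x - a.x, p.y - a.y);
  // Project p onto the line through a and b, clamped to the segment.
  let t = ((p.x - a.x) * dx + (p.y - a.y) * dy) / lengthSquared;
  t = Math.max(0, Math.min(1, t));
  return Math.hypot(p.x - (a.x + t * dx), p.y - (a.y + t * dy));
}

// Distance from p to an edge drawn through waypoints (a polyline):
// the minimum over all consecutive waypoint pairs.
function distanceToPolyline(p, waypoints) {
  let best = Infinity;
  for (let i = 0; i + 1 < waypoints.length; i++) {
    best = Math.min(best, distanceToSegment(p, waypoints[i], waypoints[i + 1]));
  }
  return best;
}

const snapped = snapToGrid({ x: 23, y: 48 }, 10);
const edge = [{ x: 0, y: 0 }, { x: 10, y: 0 }, { x: 10, y: 10 }];
const d = distanceToPolyline({ x: 12, y: 5 }, edge);
```

A hit test for edge selection could then compare `d` against a pick radius; how that plugs into the existing spatial search is left open here.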

Required skills: Analytic Geometry, WebGL, Javascript (ES6)
Potential Mentors: Robert Moore, Ales Saska
Difficulty Rating: High

Idea 4: ccNetViz Multi-Platform Integration.

ccNetViz is currently implemented in WebGL and JavaScript. However, there has been increased interest in making it more compatible with other technologies. The output of this project will be two-fold: 1) a component wrapper for the most common JS frameworks (React, Angular, Vue); and 2) library support for Python (Jupyter), R, and Matlab.

Required skills: Python, R, Matlab, WebGL, Javascript (ES6)
Potential Mentors: Achilles Rasquinha, Shivani Tamkiya
Difficulty Rating: Medium

Idea 5: High resolution graph image export

ccNetViz currently supports downloading graphs only as a .png file, with limited resolution. We are looking to develop support for downloading high-resolution images of graphs in various file formats, including .png and .jpg, with a potential secondary implementation of the rendering engine in SVG.

Required skills: WebGL, Javascript (ES6)
Potential Mentors: Dr. Tomas Helikar
Difficulty Rating: Medium

Idea 6: ccNetViz Advanced Curves

Many biological networks can be visualized more comprehensively with the addition of advanced edges. For instance, reactions in constraint-based models often require edges with custom curvature to convey the salient relationships between metabolites (https://escher.github.io/#/app?map=iJO1366.Fatty%20acid%20beta-oxidation&tool=Builder&model=iJO1366). The output of this project is a method in ccNetViz that will enable the creation of customizable curves in WebGL. This would be done by creating a plugin that implements the quadratic Bézier formula in WebGL. Interested students can review this resource for background information. This project will also involve extending the ccNetViz search function to support this type of curve (https://github.com/HelikarLab/ccNetViz/blob/master/src/spatialSearch/spatialSearch.js), which will require extending the current plugin architecture to the ccNetViz spatial search file. Candidates are required to have strong curve parameterization and computer graphics programming skills.
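
For reference, the quadratic Bézier formula mentioned above is B(t) = (1-t)^2 P0 + 2(1-t)t P1 + t^2 P2 for t in [0, 1], where P0 and P2 are the endpoints and P1 is the control point. The sketch below evaluates it on the CPU in plain JavaScript. The function names are hypothetical, and an actual plugin would presumably evaluate the same expression in a shader; that part is an assumption and is not shown.

```javascript
// Evaluate B(t) = (1-t)^2 * P0 + 2(1-t)t * P1 + t^2 * P2 for t in [0, 1].
function quadraticBezier(p0, p1, p2, t) {
  const u = 1 - t;
  return {
    x: u * u * p0.x + 2 * u * t * p1.x + t * t * p2.x,
    y: u * u * p0.y + 2 * u * t * p1.y + t * t * p2.y,
  };
}

// Sample the curve at segments + 1 evenly spaced parameter values,
// e.g. to approximate it by a polyline for drawing or hit testing.
function sampleBezier(p0, p1, p2, segments) {
  const points = [];
  for (let i = 0; i <= segments; i++) {
    points.push(quadraticBezier(p0, p1, p2, i / segments));
  }
  return points;
}

const start = { x: 0, y: 0 };
const control = { x: 5, y: 10 };
const end = { x: 10, y: 0 };

const mid = quadraticBezier(start, control, end, 0.5);
const samples = sampleBezier(start, control, end, 16);
```

Sampling into a polyline is one simple way to reuse a segment-based spatial search for curved edges, at the cost of an approximation error that shrinks as `segments` grows.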

Required skills: JavaScript (ES6), WebGL, computer graphics, analytic geometry
Potential Mentors: Ales Saska, Robert Moore
Difficulty Rating: High

Idea 7: Javascript library for Flux Balance and Flux Variability Analysis

Flux Balance Analysis is a popular Constraint-Based Reconstruction and Analysis (COBRA) method for metabolic networks. While there are desktop-based tools for these analyses, web-based solutions are largely lacking. The goal of this project is to implement the major analytical methods available in COBRA and COBRApy as a stand-alone JS library for web-based analysis of metabolic networks. The library should be NodeJS compatible. The methods to be implemented include gap-filling, flux variability analysis, and flux balance analysis. This would require a JavaScript-based linear programming implementation and a port of symbolic mathematical notation to JavaScript. Existing libraries that may help include Algebrite.js, jsLPSolver, and Math.js. The main objective would be to build a complete in-browser port of optlang (a Python library).

For those new to constraint-based modeling, these papers might be useful: Orth et al., Bordbar et al.
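
As background, flux balance analysis is a linear program: maximize c·v subject to S·v = 0 and lower ≤ v ≤ upper, where S is the stoichiometric matrix (metabolites by reactions) and v is the flux vector. The sketch below does not solve that program; it only checks whether a candidate flux vector satisfies the constraints, on a made-up three-reaction toy network. All names and numbers are illustrative assumptions.

```javascript
// Compute S * v: the net production rate of each metabolite.
function netProduction(S, v) {
  return S.map((row) => row.reduce((sum, s, j) => sum + s * v[j], 0));
}

// A flux vector is feasible if every metabolite is balanced (S * v = 0)
// and every flux lies within its bounds.
function isFeasibleFlux(S, v, lower, upper, tol = 1e-9) {
  const balanced = netProduction(S, v).every((r) => Math.abs(r) <= tol);
  const bounded = v.every(
    (flux, j) => flux >= lower[j] - tol && flux <= upper[j] + tol
  );
  return balanced && bounded;
}

// Objective value c . v (e.g. flux through a biomass or export reaction).
function objectiveValue(c, v) {
  return c.reduce((sum, cj, j) => sum + cj * v[j], 0);
}

// Toy network (an assumption for illustration):
//   R1: -> A      R2: A -> B      R3: B ->
// Rows are metabolites A and B; columns are reactions R1..R3.
const S = [
  [1, -1, 0],
  [0, 1, -1],
];
const lower = [0, 0, 0];
const upper = [10, 10, 10];
const c = [0, 0, 1]; // maximize export of B through R3

const steady = isFeasibleFlux(S, [5, 5, 5], lower, upper);    // balanced, in bounds
const unbalanced = isFeasibleFlux(S, [5, 4, 5], lower, upper); // A accumulates
const exported = objectiveValue(c, [5, 5, 5]);
```

A full implementation would hand S, the bounds, and c to a linear programming solver to find the feasible v with the largest objective value.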

Required skills: Javascript (ES6), Linear programming
Potential Mentors: Resa H, Achilles Rasquinha
Difficulty Rating: Medium

Idea 8: Reaction-based model (differential equations) analysis library for JavaScript

Ordinary differential equations (ODEs) are widely used to model biological networks and processes. While there are many desktop-based tools for these analyses, web-based solutions are largely lacking. The objective of this project is to develop a stand-alone JavaScript library that will enable the evaluation and simulation of differential-equation-based models of biological processes. The following simulation/analysis methods should be available through this library by the end of the project: parameter fitting, steady-state analysis, and simulation. The library will be demonstrated through a simple web UI that enables a user to upload an SBML file with a computational model, set up a simulation/analysis, and visualize the results. The library can leverage other open-source JavaScript libraries, such as numericJS and libSBML (for parsing and processing SBML files). The library should be NodeJS compatible.
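
To give a sense of the simulation part, here is a minimal sketch of a fixed-step classical Runge-Kutta (RK4) integrator applied to a toy reaction A -> B with mass-action kinetics. The choice of RK4, the function names, and the rate constant are assumptions made for illustration; the project itself does not prescribe an integration method, and stiff biological models may need a different one.

```javascript
// One classical 4th-order Runge-Kutta step for dy/dt = f(t, y),
// where y is an array of state variables.
function rk4Step(f, t, y, h) {
  const add = (a, b, scale) => a.map((ai, i) => ai + scale * b[i]);
  const k1 = f(t, y);
  const k2 = f(t + h / 2, add(y, k1, h / 2));
  const k3 = f(t + h / 2, add(y, k2, h / 2));
  const k4 = f(t + h, add(y, k3, h));
  return y.map(
    (yi, i) => yi + (h / 6) * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i])
  );
}

// Integrate from t = 0 to tEnd in a fixed number of steps,
// recording the state after every step.
function simulate(f, y0, tEnd, steps) {
  const h = tEnd / steps;
  const trajectory = [{ t: 0, y: y0.slice() }];
  let y = y0.slice();
  for (let i = 1; i <= steps; i++) {
    y = rk4Step(f, (i - 1) * h, y, h);
    trajectory.push({ t: i * h, y });
  }
  return trajectory;
}

// Toy model: A -> B with rate k * [A], so d[A]/dt = -k[A] and d[B]/dt = k[A].
const k = 0.5;
const reaction = (t, [a, b]) => [-k * a, k * a];

const trajectory = simulate(reaction, [1, 0], 4, 400);
const finalState = trajectory[trajectory.length - 1].y;
```

For this model the exact solution is [A](t) = exp(-k t), which gives a convenient check on the integrator, and [A] + [B] stays constant throughout.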

Required skills: Javascript (ES6), NodeJS, Python
Potential Mentors: Robert Moore, Achilles Rasquinha
Difficulty Rating: Medium

Idea 9: Javascript Library for Problem Optimization

We are looking to develop a JavaScript library that will enable the solving of optimization problems in the browser. Optimization software supports the design and development of solutions to real-life problems, including problems in biology. In particular, optimization solvers are relied upon when solving genome-wide constraint-based models of metabolic networks. However, the popular and efficient solvers are currently available only for back-end use, not for client-side use in the browser. The objective of this project is to port some of the popular integer and linear programming solvers to JavaScript. Examples of such solvers include Gurobi, CPLEX, Xpress, and PuLP. This would require a JavaScript-based linear programming implementation and a port of symbolic mathematical notation to JavaScript. Existing libraries that may help include Algebrite.js, jsLPSolver, and Math.js. The main objective would be to build a complete in-browser port of optlang (a Python library).
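
To make the kind of problem concrete, the sketch below solves a tiny two-variable linear program (maximize c·x subject to A·x ≤ b) by enumerating the vertices of the feasible region. This is a teaching device only, not how the solvers named above work and not a proposed design: vertex enumeration scales poorly, production solvers use methods such as simplex or interior point, and this version does not detect unbounded problems. The function name and the example numbers are assumptions.

```javascript
// Maximize c[0]*x + c[1]*y subject to A[k][0]*x + A[k][1]*y <= b[k] for all k.
// Returns the best feasible vertex, or null if no feasible vertex exists.
// Assumes the problem is bounded; unboundedness is not detected.
function solveLp2d(c, A, b, tol = 1e-9) {
  let best = null;
  for (let i = 0; i < A.length; i++) {
    for (let j = i + 1; j < A.length; j++) {
      // Intersect the boundary lines of constraints i and j (Cramer's rule).
      const det = A[i][0] * A[j][1] - A[i][1] * A[j][0];
      if (Math.abs(det) < tol) continue; // parallel boundaries, no vertex
      const x = (b[i] * A[j][1] - A[i][1] * b[j]) / det;
      const y = (A[i][0] * b[j] - b[i] * A[j][0]) / det;

      // Keep the intersection only if it satisfies every constraint.
      const feasible = A.every(
        (row, k) => row[0] * x + row[1] * y <= b[k] + tol
      );
      if (!feasible) continue;

      const value = c[0] * x + c[1] * y;
      if (best === null || value > best.value) best = { x, y, value };
    }
  }
  return best;
}

// Example: maximize 3x + 5y subject to
//   x <= 4,  2y <= 12,  3x + 2y <= 18,  x >= 0,  y >= 0.
const A = [
  [1, 0],
  [0, 2],
  [3, 2],
  [-1, 0], // -x <= 0, i.e. x >= 0
  [0, -1], // -y <= 0, i.e. y >= 0
];
const b = [4, 12, 18, 0, 0];
const optimum = solveLp2d([3, 5], A, b);
```

For this example the feasible vertices are (0, 0), (4, 0), (4, 3), (2, 6), and (0, 6), and the objective is largest at (2, 6).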

Required skills: C++, Emscripten, Javascript (ES6)
Potential Mentors: Ales Saska, Achilles Rasquinha
Difficulty Rating: Easy

Idea 10: Candis: A Software Tool for Cancer Prediction And Biomarker Identification Using High-throughput Data

Machine learning tools for cancer class prediction and biomarker discovery can hasten cancer detection and significantly improve prognosis. However, current cancer prediction methods lack an interface that offers users interactivity, flexibility, modularity, and data visualization. To address these issues, we developed Candis, an open-source, browser-based, database-driven GUI (Graphical User Interface) application for cancer classification and biomarker identification using gene expression data (to distinguish cancer from normal samples, as well as different subtypes of cancer). Candis is built with a ReactJS frontend and a Flask-based backend. The easy-to-follow application interface enables researchers to access their local and remote gene expression data (search and download data from the NCBI database) and build machine learning models for cancer prediction. In addition to handling large datasets, the application provides an intuitive way to create experiments, add data fields (e.g., patients’ demographic data), and pre-process and normalize data, and it supports feature selection and classification analysis. Candis is platform independent and comes with easy-to-follow installation and operation instructions. Candis is useful for biomedical researchers who have no computer programming background and are interested in performing cancer biomarker identification and cancer class prediction analyses on their own computers. Check out a brief demo of the tool below!

The purpose of this project is first to expand Candis’ pipeline support for transcriptomics and single-cell transcriptomics data. The second part of the project would be to add deep learning, in particular by incorporating Python libraries such as TensorFlow and PyTorch. In addition, the student will be responsible for maintaining the stability of the platform by addressing various bugs.

Required skills: ReactJS, Flask, Python; high-throughput data analysis and visualization experience preferred
Potential Mentors: Kaustubh Gupta, Robert Moore
Difficulty Rating: Easy