Google Summer of Code 2018 Idea Page

Welcome to GSoC 2018 in computational systems biology!

Our group works at the interface of computer science, biology, and mathematics by applying computational approaches to the seas of data in biomedical research. One of the main interests of our group is the development of technologies to make large-scale computational approaches accessible and more collaborative to a wider scientific audience, as well as to life sciences students who may or may not have been exposed to computational methods before. Our recent web-based technology, Cell Collective, enables scientists from across the globe to construct and simulate large-scale computational models of biological systems in a highly collaborative fashion. This software enables biomedical researchers to study the dynamics of biological systems (e.g., cells) under both healthy and diseased conditions. Cell Collective provides a unique environment for real-time, interactive simulations to enable users to analyze and visualize the multitude of effects a disease-related malfunction can have on the rest of the cell.

This very same modeling software is being used by thousands of life sciences students (see our user growth above) to learn about biology by building, simulating, breaking, and re-simulating computational models of various biological processes. Tools that have been developed with the help of GSoC students over the last few years include ccNetViz (network visualization tool used in Cell Collective), CancerDiscover (high-throughput data analysis pipeline), Candis (graphical cancer biomarker discovery tool), etc. Our group consists of computer scientists, biochemists, biologists, bioinformaticians, as well as mathematicians, creating a unique environment of diverse skills united by a shared interest.

We have been fortunate to work with some great students over the last three years of GSoC (check out their testimonials here).

Please join our Google Group for additional project details, questions, and discussions.

Before applying, please review this application template, as well as the GSoC Student Guide.

Cell Collective can be found at https://cellcollective.org, and the more recent projects are on GitHub.

Below you can find our project ideas for GSoC 2018.

Idea 0: Your own idea!

Feel free to suggest your own idea. Our interests are within the general space of easy-to-use, interactive data visualization of large-scale networks. Please be detailed about the specifics of your project, why it is important, and how you plan to achieve it.

Idea 1: Javascript/WebGL library for interactive visualization of large-scale network graphs: Core Features.

As a group interested in understanding the dynamics of complex networks via collaborative efforts, we have developed a new web-based, large-scale network visualization component. One of the main challenges in this project is visualizing very large networks (tens of thousands of nodes and edges) on the web with interactive features (moving/adding/removing nodes, zooming/panning, etc.) while keeping performance usable in most modern web browsers.

To address this issue we developed a new WebGL-based javascript network visualization library, Cell Collective Network Visualizer -- ccNetViz. And it's fast! Check out the example below (you can pan and zoom in these examples).

This project will center around the development of additional core features and optimization techniques to enable interactive visualization of very large networks, as well as new interactive graph features, including structured text rendering, new network layouts, etc.

Specific components of this project include:

  • Support for formatting labels.
  • Improve the linear-algebra based spatial search to support edge labeling.
  • Structured text formatting options.
  • Advanced node styling: e.g., border, data-driven styles (e.g., color and size).
  • Support for custom node styles, e.g., pie chart.
  • Improve efficiency of SDF rendering.
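
To illustrate what "data-driven" node styling could mean in practice, here is a minimal sketch that maps a numeric node attribute onto a size and a color. The function names, the `style` field, and the default sizes and colors are assumptions made up for this example; they are not part of the existing ccNetViz API.

```javascript
// Hypothetical helpers (not part of the ccNetViz API): map a numeric node
// attribute onto a size and a color.
function linearScale([d0, d1], [r0, r1]) {
  return (value) => {
    if (d1 === d0) return r0; // degenerate domain: every value maps to r0
    const t = Math.min(1, Math.max(0, (value - d0) / (d1 - d0)));
    return r0 + t * (r1 - r0);
  };
}

function styleNodesByValue(nodes, key, options = {}) {
  const {
    minSize = 4,
    maxSize = 16,
    lowColor = [0, 0, 255],
    highColor = [255, 0, 0],
  } = options;
  const values = nodes.map((n) => n[key]);
  const domain = [Math.min(...values), Math.max(...values)];
  const toSize = linearScale(domain, [minSize, maxSize]);
  const toUnit = linearScale(domain, [0, 1]);
  return nodes.map((n) => {
    const t = toUnit(n[key]);
    // Interpolate each RGB channel between the low and high colors.
    const rgb = lowColor.map((c, i) => Math.round(c + t * (highColor[i] - c)));
    return { ...n, style: { size: toSize(n[key]), color: `rgb(${rgb.join(', ')})` } };
  });
}

// Toy data: the `expression` attribute is invented for the example.
const styledDemo = styleNodesByValue(
  [{ id: 'a', expression: 0 }, { id: 'b', expression: 5 }, { id: 'c', expression: 10 }],
  'expression'
);
console.log(styledDemo.map((n) => n.style));
```

In a WebGL renderer the resulting sizes and colors would most likely be uploaded as per-vertex attributes rather than stored on each node object; the sketch only shows the mapping itself.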

Required skills: Javascript (ES6), WebGL, Graph Theory
Potential Mentors: Ales Saska, Dr. Tomas Helikar
Difficulty Rating: 3

Idea 2: Network Visualization Layouts.

We are looking to extend the network layout options within the aforementioned ccNetViz javascript library. In particular, we aim to develop the cutting-edge Hive Plot layout (see below), as well as to implement support for layout calculations inside web workers. All currently implemented layouts can be found here.

Hive Plots: Biological networks are typically visualized using traditional force-based or spectral layout algorithms. While these algorithms are useful for small to medium-sized networks, they produce largely uninformative and irreproducible "hairballs" when used for large-scale networks (left side of the figure below). To address these issues, hive plots were developed as a method to manage the visual complexity and generate informative, quantitative, and comparable network layouts. A hive plot is a linear network layout whereby nodes are organized along axes based on specified data attributes. In addition, the arrangement of nodes on each axis can also provide information about the type of the data (node), providing an informative network visualization tool. An example of this layout can be seen in this D3.js component developed by Mike Bostock.
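
The core of the layout described above can be sketched in a few lines: each node is assigned to an axis by one attribute and placed along that axis by another. This is only an illustrative sketch; the function name, the option names (`axisOf`, `valueOf`, `innerRadius`, `outerRadius`), and the even spacing along each axis are assumptions for the example, not the ccNetViz layout interface.

```javascript
// Hypothetical hive-plot position calculation (not the ccNetViz layout API).
// axisOf(node) returns the axis index; valueOf(node) orders nodes on the axis.
function hivePlotLayout(nodes, options) {
  const { axisCount = 3, axisOf, valueOf, innerRadius = 0.1, outerRadius = 1 } = options;

  // Group nodes by the axis they belong to.
  const axes = Array.from({ length: axisCount }, () => []);
  for (const node of nodes) {
    const a = axisOf(node);
    if (!Number.isInteger(a) || a < 0 || a >= axisCount) {
      throw new RangeError(`Node ${node.id} maps to invalid axis ${a}`);
    }
    axes[a].push(node);
  }

  const positions = new Map();
  axes.forEach((members, a) => {
    // Axes are spread evenly around the center; axis 0 points along +x.
    const angle = (2 * Math.PI * a) / axisCount;
    const sorted = [...members].sort((p, q) => valueOf(p) - valueOf(q));
    sorted.forEach((node, i) => {
      // Evenly space nodes between the inner and outer radius.
      const t = sorted.length > 1 ? i / (sorted.length - 1) : 0.5;
      const r = innerRadius + t * (outerRadius - innerRadius);
      positions.set(node.id, { x: r * Math.cos(angle), y: r * Math.sin(angle), axis: a });
    });
  });
  return positions;
}

// Toy data: `kind` selects the axis and `degree` orders nodes along it.
const hiveDemoNodes = [
  { id: 'a', kind: 0, degree: 1 },
  { id: 'b', kind: 0, degree: 5 },
  { id: 'c', kind: 1, degree: 2 },
];
console.log(hivePlotLayout(hiveDemoNodes, { axisOf: (n) => n.kind, valueOf: (n) => n.degree }));
```

Because the positions depend only on node attributes, the same network always yields the same picture, which is what makes hive plots reproducible and comparable. Edges would then be drawn as curves between nodes on neighboring axes.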

Required skills: Javascript (ES6), WebGL
Potential Mentors: Ales Saska, Dr. Tomas Helikar
Difficulty Rating: 2

Idea 3: Javascript/WebGL library for interactive visualization of large-scale network graphs: Expanded Features.

The final project related to ccNetViz includes the implementation of the following features that will enhance the overall infrastructure of the Javascript library:

  • Ability to hide edges/nodes
  • Extend live demo page (png image export, support for context menu, etc.)
  • Develop an on-line benchmark and (performance) testing framework
  • Comprehensive unit tests
  • Extend user and developer documentation

Required skills: Javascript (ES6), Graph Theory
Potential Mentors: Varun Sharma
Difficulty Rating: 1

Idea 4: Web pipeline for Flux Balance Analysis

Flux Balance Analysis is a popular type of Constraint-Based Reconstruction and Analysis (COBRA) method for metabolic networks. While there are desktop-based tools for these analyses, web-based solutions are largely lacking. The goal of this project is to create a stand-alone web pipeline/service that will enable a user to submit an SBML model and select analysis methods (via a React-based GUI) to a Java-based server-side API, which will in turn submit the simulation/analysis job to a backend COBRApy service. This service will subsequently return the simulation/analysis results through the API back to the React client for user viewing.
The following simulation/analysis methods should be available through this pipeline by the end of the project: defining objective functions, flux balance analysis, flux variability analysis, and gap-filling.
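
For applicants new to these methods, the standard textbook formulations are given below. The symbols are notation introduced here for illustration: S is the stoichiometric matrix, v the vector of reaction fluxes, c the objective weights, v^lb and v^ub the flux bounds, Z* the optimal objective value found by flux balance analysis, and γ a user-chosen fraction of that optimum.

```latex
% Flux balance analysis: a linear program over the flux vector v
\begin{aligned}
Z^{*} = \max_{v}\quad & c^{\top} v \\
\text{s.t.}\quad & S v = 0, \\
& v^{\mathrm{lb}} \le v \le v^{\mathrm{ub}}
\end{aligned}

% Flux variability analysis: for each reaction i, solve two linear programs
\begin{aligned}
\min_{v} \;/\; \max_{v}\quad & v_i \\
\text{s.t.}\quad & S v = 0, \\
& v^{\mathrm{lb}} \le v \le v^{\mathrm{ub}}, \\
& c^{\top} v \ge \gamma Z^{*}, \qquad 0 \le \gamma \le 1
\end{aligned}
```

Defining an objective function corresponds to choosing c, so that method amounts to letting the user set those weights before a solve.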


For those new to constraint-based modeling, these papers might be useful: Orth et al., Bordbar et al.

Required skills: Java, Javascript (ES6), React, Data visualization packages such as D3.js

Potential Mentors: Dr. Bhanwar Lal Puniya, Dr. Akram Mohammed
Difficulty Rating: 2

Idea 5: Javascript library for Flux Balance and Variability Analysis

Flux Balance Analysis is a popular type of Constraint-Based Reconstruction and Analysis (COBRA) method for metabolic networks. While there are desktop-based tools for these analyses, web-based solutions are largely lacking. As a complement to Idea 4, the goal of this project is to build the major analytical methods available in COBRA and COBRApy as a stand-alone JS library for web-based analysis of metabolic networks. The library should be Node.js compatible.

The methods to be implemented in the library include: gap-filling, flux variability analysis, and flux balance analysis.
For those new to constraint-based modeling, these papers might be useful: Orth et al., Bordbar et al.
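
Whatever solver such a library ends up using, every flux distribution it returns has to satisfy the steady-state mass balance and the flux bounds. The sketch below shows only that feasibility check, not the optimization itself. The function name, the array-based data layout, and the three-reaction toy network are assumptions made up for the example.

```javascript
// Hypothetical helper: checks whether flux vector v is feasible for a model
// with stoichiometric matrix S (rows = metabolites, columns = reactions).
function isFeasibleFlux(S, v, lower, upper, tol = 1e-9) {
  // Every flux must lie within its lower/upper bound.
  const withinBounds = v.every(
    (flux, j) => flux >= lower[j] - tol && flux <= upper[j] + tol
  );
  // Steady state: production and consumption of each metabolite cancel out.
  const balanced = S.every(
    (row) => Math.abs(row.reduce((sum, s, j) => sum + s * v[j], 0)) <= tol
  );
  return withinBounds && balanced;
}

// Toy network: R1: -> A, R2: A -> B, R3: B ->
const toyS = [
  [1, -1, 0], // metabolite A
  [0, 1, -1], // metabolite B
];
const toyLower = [0, 0, 0];
const toyUpper = [10, 10, 10];

console.log(isFeasibleFlux(toyS, [5, 5, 5], toyLower, toyUpper)); // true
console.log(isFeasibleFlux(toyS, [5, 4, 4], toyLower, toyUpper)); // false: A accumulates
```

A check like this is also handy in unit tests, where results from the JS library can be validated independently of the solver that produced them.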

Required skills: Javascript (ES6), Node.js

Potential Mentors: Varun Sharma, Dr. Bhanwar Lal Puniya
Difficulty Rating: 3

Idea 6: Web pipeline for ODE-based models.

Ordinary Differential Equations are widely used to model biological networks and processes. While there are many desktop-based tools for these analyses, web-based solutions are largely lacking. The goal of this project is to create a stand-alone web pipeline/service that will enable a user to submit an SBML model and select analysis methods (via a React-based GUI) to a Java-based server-side API, which will in turn submit the simulation/analysis job to a backend simulation service (e.g., COPASI, CellDesigner, Systems Biology Simulation Core Library, etc.). This service will subsequently return the simulation/analysis results through the API back to the React client for user viewing.

The following simulation/analysis methods should be available through this pipeline by the end of the project: Parameter fitting, steady state analysis, simulation.

Required skills: Java, ReactJS, Javascript (data visualization packages such as D3), CSS3, HTML5

Potential Mentors: Dr. Jim Rogers, Dr. Bhanwar Lal Puniya, Dr. Tomas Helikar
Difficulty Rating: 2

Idea 7: Reaction-based model (differential equations) analysis library for JavaScript

Related to Idea 6, the objective of this project is to develop a stand-alone javascript library that will enable the evaluation and simulation of differential equation based models of biological processes. The following simulation/analysis methods should be available through this library by the end of the project: parameter fitting, steady state analysis, and simulation. This library will be demonstrated through a simple web UI which will enable a user to upload an SBML file with a computational model, set up a simulation/analysis, and visualize the results of the analysis. The library can leverage other open source javascript libraries, such as numericJS and libSBML (for parsing and processing SBML files).
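
To make the "simulation" part concrete, here is a minimal sketch of a fixed-step integrator using the classic fourth-order Runge-Kutta method. It is one possible starting point, not a prescribed design: the function names, the trajectory format, and the toy reaction A -> B with its rate constant are all assumptions for the example, and a real library would likely need adaptive step sizes and stiff solvers as well.

```javascript
// One classic fourth-order Runge-Kutta step for dy/dt = f(t, y), y an array.
function rk4Step(f, t, y, h) {
  const axpy = (a, b, s) => a.map((ai, i) => ai + s * b[i]); // a + s * b
  const k1 = f(t, y);
  const k2 = f(t + h / 2, axpy(y, k1, h / 2));
  const k3 = f(t + h / 2, axpy(y, k2, h / 2));
  const k4 = f(t + h, axpy(y, k3, h));
  return y.map((yi, i) => yi + (h / 6) * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i]));
}

// Integrates from t = 0 to tEnd in `steps` equal steps; returns all states.
function simulateOde(f, y0, tEnd, steps) {
  const h = tEnd / steps;
  const trajectory = [{ t: 0, y: y0.slice() }];
  let y = y0.slice();
  for (let i = 1; i <= steps; i++) {
    y = rk4Step(f, (i - 1) * h, y, h);
    trajectory.push({ t: i * h, y });
  }
  return trajectory;
}

// Toy mass-action model A -> B; the rate constant is made up for the example.
const demoRate = 0.5;
const demoModel = (t, [a]) => [-demoRate * a, demoRate * a];
const demoTrajectory = simulateOde(demoModel, [1, 0], 10, 100);
console.log(demoTrajectory[demoTrajectory.length - 1].y); // A is close to exp(-5)
```

In the full library, the right-hand side function would be generated from the reactions and kinetic laws in the uploaded SBML file rather than written by hand.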

Required skills: Javascript (ES6), Node.js

Potential Mentors: Varun Sharma, Dr. Bhanwar Lal Puniya
Difficulty Rating: 3

Idea 8: Javascript framework for Agent-based modeling.

The goal of this project is to develop a stand-alone javascript library to create and simulate agent-based models. Agent-based modeling (ABM) is a widely used technique to model systems of autonomous, mutually interacting agents. ABM has been used in many fields, including life sciences, with various objectives and applications, for example, to predict the spread of a pathogen, cancer development, cell differentiation, etc. Given its long history, ABM has strong theoretical and practical foundations with a wide variety of desktop-based software and methods for developing and simulating agent models. To facilitate a more collaborative environment for ABM in life sciences, we aim to also bring ABM to the web through this Javascript-based ABM framework. The framework will consist of agents that live and communicate in a 3-dimensional environment (utilizing three.js) in a distributed and independent fashion. Each agent will be able to: move, receive signals from the environment, die, and divide. Agents will also be able to assume a geometrical shape; however, in this version, they can all be of the same geometry (e.g., a sphere). You can review NetLogo (a widely-used desktop ABM software) to learn more about ABM.
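
The four agent capabilities listed above (move, receive signals, die, divide) can be sketched without any rendering code. Everything in this example is a hypothetical design: the class names, the `signalAt` environment method, and the energy-based rules for dying and dividing are assumptions chosen to keep the example small. Rendering with three.js is left out entirely.

```javascript
// Hypothetical agent with the four capabilities named in the project idea.
class Agent {
  constructor(position, energy = 1) {
    this.position = position; // [x, y, z]
    this.energy = energy;
    this.alive = true;
  }
  move(delta) {
    this.position = this.position.map((p, i) => p + delta[i]);
  }
  sense(environment) {
    return environment.signalAt(this.position);
  }
  die() {
    this.alive = false;
  }
  divide() {
    // The parent keeps half of its energy; the daughter gets the other half.
    this.energy /= 2;
    return new Agent(this.position.slice(), this.energy);
  }
}

class Simulation {
  constructor(agents, environment, rules = {}) {
    this.agents = agents;
    this.environment = environment;
    // Made-up rules: signals add energy, living costs energy.
    this.rules = { cost: 0.1, divideAt: 2, stepSize: 1, random: Math.random, ...rules };
  }
  step() {
    const { cost, divideAt, stepSize, random } = this.rules;
    const newborn = [];
    for (const agent of this.agents) {
      agent.energy += agent.sense(this.environment) - cost;
      if (agent.energy <= 0) {
        agent.die();
        continue;
      }
      // Random walk: each coordinate changes by at most stepSize.
      agent.move([0, 1, 2].map(() => (random() - 0.5) * 2 * stepSize));
      if (agent.energy >= divideAt) newborn.push(agent.divide());
    }
    this.agents = this.agents.filter((a) => a.alive).concat(newborn);
  }
}

// Toy run: a uniform signal of 0.5 everywhere lets the single agent grow.
const demoEnvironment = { signalAt: () => 0.5 };
const demoSim = new Simulation([new Agent([0, 0, 0])], demoEnvironment);
for (let i = 0; i < 3; i++) demoSim.step();
console.log(demoSim.agents.length); // 2: the agent divided on the third step
```

Passing the random number generator in through `rules` keeps simulations reproducible in tests, which matters once models are shared and compared between collaborators.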



Required skills: Javascript (ES6), three.js, agent-based modeling (preferred)

Potential Mentors: Dr. Akram Mohammed, Dr. Kenneth Wertheim
Difficulty Rating: 3

Idea 9: Interactive web platform for R-based data analysis

Most statistical technologies require users to be familiar with the command line and/or some type of higher-level programming language, making statistics less accessible to those who are not familiar with these technologies. The goal of this project is to develop the final version of a cross-platform web-based application that enables anyone to perform various statistical computations in an easy-to-use, interactive, and graphical manner.

In previous versions (code), the following statistical functions were implemented using the R programming language: tabular data upload/visualization, descriptive statistics, t-tests, graphing, one-/multi-way ANOVA, clustering, classification, principal component analyses, and heatmaps. Currently the application is compatible with Linux and macOS. The final version will be platform-independent and will include an interactive command-line snippet manager in the web browser that teaches users R programming. The application currently displays static R code in the browser; the final version will be interactive from both the GUI and CLI ends. User account management and data management components also still need to be developed.

Required skills: R, React, Javascript, D3.js, high-throughput data analysis and visualization experience preferred

Potential Mentors: Achilles Rasquinha, Dr. Akram Mohammed
Difficulty Rating: 1

Idea 10: Candis: A Software Tool for Cancer Prediction And Biomarker Identification Using High-throughput Data

The purpose of this project is to expand the existing Candis GUI application (GSoC 2017 code) that was developed to provide users the ability to perform cancer biomarker identification and cancer type and subtype predictions using microarray data. The existing version of the Candis application offers users options to preprocess data, perform feature selection, and run classification. The next version of Candis will have the ability to integrate other high-throughput datasets such as transcriptomics, microRNA, and DNA methylation data. The user-friendly Candis interface will also provide the ability to visualize a description of the loaded datasets and display the genes/biomarkers selected in the feature selection step in tabular format, with links to online reference databases. The final product will be compatible with Linux, Mac and Windows operating systems. Check out a brief demo of the tool below!



Examples of new features include:
  • Data integration from public databases (NCBI, UniProt, etc.)
  • User accounts and data management and persistence
  • Advanced customization of the data analysis pipeline
  • Advanced visualization options
  • Improve performance
  • Extend software infrastructure (unit testing, documentation)
  • More specifics can be found here.


Required skills: ReactJS, Flask, Python, high-throughput data analysis and visualization experience preferred

Potential Mentors: Achilles Rasquinha, Dr. Akram Mohammed
Difficulty Rating: 2

Idea 11: Javascript/WebGL library for interactive visualization of large-scale network graphs: Edge Animations.

We are looking to extend our WebGL-based javascript network visualization library, Cell Collective Network Visualizer -- ccNetViz, to support animation of edges. An example of such animation can be seen in the network visualized here. (Don't worry about any of the AI stuff on that page; the link is only meant as an example of what ccNetViz should support for animated edges.)

The animation should be customizable by a developer using the library. Customization options should include the animation color and speed, as well as some pre-set styles.
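
One possible shape for such customization is sketched below: a set of named presets that a developer can override, plus a helper that turns elapsed time and speed into a position along the edge. The preset names, the option names, and both functions are hypothetical and only meant to start the discussion; none of them exist in ccNetViz today.

```javascript
// Hypothetical presets; names and values are invented for this example.
const EDGE_ANIMATION_PRESETS = {
  none: { speed: 0, color: null },
  flow: { speed: 1, color: 'rgb(255, 165, 0)' },
  fast: { speed: 3, color: 'rgb(255, 0, 0)' },
};

// Merges user options over the chosen preset (default preset: 'flow').
function resolveEdgeAnimation(options = {}) {
  const presetName = options.preset ?? 'flow';
  const preset = EDGE_ANIMATION_PRESETS[presetName];
  if (!preset) throw new Error(`Unknown edge animation preset: ${presetName}`);
  return { ...preset, ...options, preset: presetName };
}

// Position of the animated highlight along an edge as a fraction in [0, 1),
// where speed is measured in full edge traversals per second.
function edgePhase(elapsedMs, speed) {
  const phase = (elapsedMs / 1000) * speed;
  return phase - Math.floor(phase);
}

const demoAnimation = resolveEdgeAnimation({ preset: 'flow', color: 'rgb(0, 128, 0)' });
console.log(demoAnimation, edgePhase(1500, demoAnimation.speed));
```

In a WebGL implementation the phase would typically be passed to the edge shader as a uniform on every frame, so the animation costs no extra geometry updates.
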
Required skills: Javascript (ES6), WebGL, Graph Theory
Potential Mentors: Ales Saska, Dr. Tomas Helikar
Difficulty Rating: 3

Idea 12: jsVirtualBlot: Javascript library to visualize Western-like biological assays.

We are looking to develop a JS library that will enable life sciences researchers to visualize digital data in the familiar Western blot-like fashion (if you're not familiar with these techniques and the expected visual output, you can take a look at these example explanations and tools: javaTool, wikiOverview).

The tool should have the following features (feel free to add others if you can justify them):

  • Customizable scale (the Y-axis) that allows the user to change the axis label as well as the range of values
  • Data import (likely as a csv file)
  • Each band would correspond to an average of a range of values, including a pre-set number of standard deviations. The average value would be used as the darkest spot, and each standard deviation would be visualized as lighter colors in each direction from the average value.
  • Save the image as png, jpg, svg
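
The band rule in the list above can be sketched as a small function: the mean is the darkest point and each further standard deviation away is drawn one step lighter. This is only one reading of the requirement. The function name, the use of the sample standard deviation, and the even darkness steps are assumptions for the example.

```javascript
// Hypothetical band model: darkness is 1 within one SD of the mean and
// drops by an equal step for each further SD, reaching 0 beyond maxSd.
function bandProfile(values, maxSd = 2) {
  if (values.length === 0) throw new Error('bandProfile needs at least one value');
  const n = values.length;
  const mean = values.reduce((sum, v) => sum + v, 0) / n;
  // Sample standard deviation (n - 1 denominator); an assumption of this sketch.
  const variance = n > 1 ? values.reduce((sum, v) => sum + (v - mean) ** 2, 0) / (n - 1) : 0;
  const sd = Math.sqrt(variance);

  const darknessAt = (y) => {
    if (sd === 0) return y === mean ? 1 : 0; // no spread: a single thin line
    const sdsAway = Math.floor(Math.abs(y - mean) / sd);
    return sdsAway >= maxSd ? 0 : 1 - sdsAway / maxSd;
  };
  return { mean, sd, darknessAt };
}

// Toy replicate measurements for one band.
const demoBand = bandProfile([2, 4, 6]);
console.log(demoBand.mean, demoBand.sd); // 4 2
console.log([4, 5, 6.5, 9].map(demoBand.darknessAt)); // [1, 1, 0.5, 0]
```

When drawing to a canvas or SVG, the darkness value could be used directly as the opacity of a black rectangle at each position on the Y-axis.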


Required skills: Javascript (ES6), Life sciences
Potential Mentors: Dr. Tomas Helikar
Difficulty Rating: 2