Google Summer of Code 2017 Idea Page

Welcome to GSoC 2017 in computational systems biology!

Our group works at the interface of computer science, biology, and mathematics by applying computational approaches to the seas of data in biomedical research. One of the main interests of our group is the development of technologies that make large-scale computational approaches more accessible and collaborative for a wider scientific audience, as well as for life sciences students who may or may not have been exposed to computational methods before. Our recent web-based technology, Cell Collective, enables scientists from across the globe to construct and simulate large-scale computational models of biological systems in a highly collaborative fashion. This software enables biomedical researchers to study the dynamics of biological systems (e.g., cells) under both healthy and diseased conditions. Cell Collective provides a unique environment for real-time, interactive simulations that let users analyze and visualize the multitude of effects a disease-related malfunction can have on the rest of the cell.

This very same modeling software is being used by thousands of life sciences students to learn about biology by building, simulating, breaking, and re-simulating computational models of various biological processes. Our group consists of computer scientists, biochemists, biologists, bioinformaticians, and mathematicians, creating a unique environment of diverse skills united by a single shared interest.

We have been fortunate to work with some great students over the last two years of GSoC (check out their testimonials here).

Please join our Google Group for additional project details, questions, and discussions.

Before applying, please review this application template, as well as the GSoC Student Guide.

Below you can find our project ideas for GSoC 2017.

Idea 0: Your own idea!

Feel free to suggest your own idea. Our interests are within the general space of easy-to-use, interactive, data visualization in large-scale networks. Please be detailed about the specifics of your project, why it is important, and how you plan to achieve it.

Idea 1: Javascript/WebGL library for interactive visualization of large-scale network graphs.

As a group interested in understanding the dynamics of complex networks via collaborative efforts, we have developed a new web-based, large-scale network visualization component. One of the main obstacles in this project is visualizing very large networks (tens of thousands of nodes and edges) on the web with interactive features (moving/adding/removing nodes, zooming/panning, etc.) while keeping performance usable in most modern web browsers.

To address this issue we developed a new WebGL-based javascript network visualization library, Cell Collective Network Visualizer -- ccNetViz. And it's fast! Check out the example below (you can pan and zoom in these examples).

This project will center around the development of additional optimization techniques to enable interactive visualization of very large networks, as well as new interactive graph features, including structured text rendering, new network layouts, etc.

Specific components of this project include:

  • Touch-screen support (e.g., zoom-in, pan, and other interactivity on tablets and phones)
  • XGMML export/import
  • PNG/JPG export
  • Structured text formatting options
  • Advanced node styling, e.g., borders and data-driven styles (e.g., color and size)
  • Improved efficiency of SDF rendering
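
As a rough illustration of the XGMML export item, the sketch below serializes a small graph into a minimal XGMML document with only nodes and edges. The `graph` object shape and the `toXgmml` and `escapeXml` helpers are assumptions made for this example; they are not part of the ccNetViz API, and a real exporter would also need to carry styling and layout attributes.

```javascript
// Hypothetical graph shape for illustration; not ccNetViz's actual data model.
const graph = {
  label: "toy network",
  nodes: [
    { id: "n1", label: "A & B" },
    { id: "n2", label: "C" },
  ],
  edges: [{ source: "n1", target: "n2" }],
};

// Escape the five characters that are special inside XML attribute values.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Serialize the graph into a minimal XGMML document (nodes and edges only).
function toXgmml(g) {
  const nodes = g.nodes.map(
    (n) => `  <node id="${escapeXml(n.id)}" label="${escapeXml(n.label)}"/>`
  );
  const edges = g.edges.map(
    (e) => `  <edge source="${escapeXml(e.source)}" target="${escapeXml(e.target)}"/>`
  );
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    `<graph label="${escapeXml(g.label)}" directed="1" xmlns="http://www.cs.rpi.edu/XGMML">`,
    ...nodes,
    ...edges,
    "</graph>",
  ].join("\n");
}

console.log(toXgmml(graph));
```

Import would run the other way: parse the XML, then rebuild the node and edge arrays from the `node` and `edge` elements.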

Required skills: Javascript (ES6), WebGL
Potential Mentors: Ales Saska, Dr. Tomas Helikar
Difficulty Rating: 2

Idea 2: Network Visualization Layouts.

We are looking to extend the network layout options within the aforementioned ccNetViz javascript library. These layouts include:

  • Hive Plots (see below)
  • Circular
  • Hierarchical
  • Grid
  • Others, depending on your interests as indicated in your proposal
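
To give a feel for the simpler layouts in this list, here is a minimal sketch of circular and grid layouts that returns one position per node. The function names and the choice to place nodes in the unit square are assumptions for this example; they do not describe how ccNetViz represents layouts internally.

```javascript
// Place n nodes evenly on a circle centered in the unit square [0, 1] x [0, 1].
function circularLayout(n, radius = 0.5) {
  return Array.from({ length: n }, (_, i) => {
    const angle = (2 * Math.PI * i) / n;
    return { x: 0.5 + radius * Math.cos(angle), y: 0.5 + radius * Math.sin(angle) };
  });
}

// Place n nodes row by row on the smallest square grid that can hold them.
function gridLayout(n) {
  const cols = Math.ceil(Math.sqrt(n));
  // With a single column there is nothing to space out, so the step is 0.
  const step = cols > 1 ? 1 / (cols - 1) : 0;
  return Array.from({ length: n }, (_, i) => ({
    x: (i % cols) * step,
    y: Math.floor(i / cols) * step,
  }));
}

console.log(circularLayout(4));
console.log(gridLayout(5));
```

Both functions ignore edges entirely, which is what makes them cheap for very large networks; a hierarchical layout would additionally need to rank nodes by their position in the edge structure.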

Hive Plots: Biological networks are typically visualized using traditional force-based or spectral layout algorithms. While these algorithms are useful for small to medium-sized networks, they produce largely uninformative and irreproducible "hairballs" when used for large-scale networks (left side of the figure below). To address these issues, hive plots were developed as a method to manage the visual complexity and generate informative, quantitative, and comparable network layouts. A hive plot is a linear network layout in which nodes are organized along axes based on specified data attributes. In addition, the arrangement of nodes on each axis can also provide information about the type of the data (node), making hive plots an informative network visualization tool. An example of this layout can be seen in this D3.js component developed by Mike Bostock.
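
The node placement described above can be sketched as follows. This is only an illustration under stated assumptions: each node is assumed to carry a categorical `type` that selects its axis and a numeric `degree` that sets its distance from the center, and the `hiveLayout` function, its parameters, and the example node types are hypothetical names chosen for this sketch.

```javascript
// Hypothetical input: each node has a categorical `type` (selects the axis)
// and a numeric `degree` (sets the distance from the center along that axis).
function hiveLayout(nodes, axisTypes, innerRadius = 0.1, outerRadius = 1) {
  const maxDegree = Math.max(1, ...nodes.map((n) => n.degree));
  return nodes.map((n) => {
    const axisIndex = axisTypes.indexOf(n.type);
    if (axisIndex < 0) throw new Error(`No axis for node type "${n.type}"`);
    // Axes are spread evenly around the center; the first axis points along
    // the negative y direction (up in screen coordinates).
    const angle = (2 * Math.PI * axisIndex) / axisTypes.length - Math.PI / 2;
    // Higher-degree nodes sit farther from the center.
    const r = innerRadius + (outerRadius - innerRadius) * (n.degree / maxDegree);
    return { id: n.id, axis: axisIndex, x: r * Math.cos(angle), y: r * Math.sin(angle) };
  });
}

const exampleNodes = [
  { id: "a", type: "receptor", degree: 2 },
  { id: "b", type: "kinase", degree: 4 },
  { id: "c", type: "receptor", degree: 0 },
];
const axes = ["receptor", "kinase", "transcription factor"];
const positions = hiveLayout(exampleNodes, axes);
console.log(positions);
```

Because every position is a deterministic function of node attributes, two networks laid out with the same axes can be compared directly, which is the reproducibility advantage over force-based layouts.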

Required skills: Javascript (ES6), WebGL
Potential Mentors: Ales Saska, Dr. Tomas Helikar
Difficulty Rating: 1

Idea 3: Web pipeline for Flux Balance Analysis

Flux Balance Analysis is a popular Constraint-Based Reconstruction and Analysis (COBRA) method for metabolic networks. While there are desktop-based tools for these analyses, web-based solutions are largely lacking. The goal of this project is to create a stand-alone web pipeline/service that will enable a user to submit an SBML model and select analysis methods (via a React-based GUI) to a Java-based server-side API, which will in turn submit the simulation/analysis job to a backend COBRApy service. This service will subsequently return the simulation/analysis results through the API back to the React client for user viewing.
The following simulation/analysis methods should be available through this pipeline by the end of the project: objective function definition, flux balance analysis, flux variability analysis, and gap-filling.
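
For orientation, the standard textbook formulations of flux balance analysis (FBA) and flux variability analysis (FVA) are given below. The symbols are the usual conventions rather than anything defined on this page: S is the stoichiometric matrix, v the vector of reaction fluxes, c the objective coefficients, l and u the lower and upper flux bounds, Z* the optimal objective value found by FBA, and γ (between 0 and 1) the fraction of that optimum required during FVA.

```latex
\begin{aligned}
\text{FBA:} \quad & Z^{*} = \max_{v} \; c^{\top} v
  && \text{s.t.} \quad S v = 0, \quad l \le v \le u \\
\text{FVA:} \quad & v_i^{\min} = \min_{v} \; v_i, \quad v_i^{\max} = \max_{v} \; v_i
  && \text{s.t.} \quad S v = 0, \quad l \le v \le u, \quad c^{\top} v \ge \gamma Z^{*}
\end{aligned}
```

Both are linear programs, so defining an objective function amounts to choosing c, and FVA solves two such programs per reaction of interest.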


For those new to constraint-based modeling, these papers might be useful: Orth et al., Bordbar et al.

Required skills: Java, Javascript (ES6), React, Data visualization packages such as D3.js

Potential Mentors: Dr. Bhanwar Lal Puniya, Dr. Akram Mohammed
Difficulty Rating: 2

Idea 4: Javascript library for Flux Balance and Variability Analysis

Flux Balance Analysis is a popular Constraint-Based Reconstruction and Analysis method for metabolic networks. While there are desktop-based tools for these analyses, web-based solutions are largely lacking. As a complement to Idea 3, the goal of this project is to implement the major analytical methods available in COBRA and COBRApy as a stand-alone JS library for web-based analysis of metabolic networks. The library should be Node.js compatible.

The methods to be implemented in the library include gap-filling, flux variability analysis, and flux balance analysis.
For those new to constraint-based modeling, these papers might be useful: Orth et al., Bordbar et al.
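
As a starting point for thinking about the data structures such a library needs, the sketch below stores a toy network as a stoichiometric matrix and checks whether a candidate flux vector satisfies the steady-state and bound constraints. It does not solve the optimization problem itself; a real implementation would need a linear programming solver. The toy network and all function names are assumptions made for this example.

```javascript
// Toy network (hypothetical): uptake -> A, A -> B, B -> export.
// Rows are metabolites (A, B); columns are reactions (uptake, convert, export).
const S = [
  [1, -1, 0],
  [0, 1, -1],
];
const lowerBounds = [0, 0, 0];
const upperBounds = [10, 8, 10];
const objective = [0, 0, 1]; // maximize flux through the export reaction

// Multiply the stoichiometric matrix by a flux vector: result[i] = sum_j S[i][j] * v[j].
function massBalance(matrix, v) {
  return matrix.map((row) => row.reduce((sum, coeff, j) => sum + coeff * v[j], 0));
}

// A flux vector is feasible when S v = 0 (within tolerance) and all bounds hold.
function isFeasible(matrix, v, lb, ub, tol = 1e-9) {
  const balanced = massBalance(matrix, v).every((x) => Math.abs(x) <= tol);
  const bounded = v.every((x, j) => x >= lb[j] - tol && x <= ub[j] + tol);
  return balanced && bounded;
}

// Objective value c^T v for a given flux vector.
function objectiveValue(c, v) {
  return c.reduce((sum, coeff, j) => sum + coeff * v[j], 0);
}

const candidate = [8, 8, 8];
console.log(isFeasible(S, candidate, lowerBounds, upperBounds)); // true
console.log(objectiveValue(objective, candidate)); // 8
```

In this linear chain every flux must be equal at steady state, so the tightest upper bound (8 on the conversion step) caps the objective, and the candidate above is also the optimum.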

Required skills: Javascript (ES6), Node.js

Potential Mentors: Varun Sharma, Dr. Bhanwar Lal Puniya
Difficulty Rating: 3

Idea 5: Web pipeline for ODE-based models.

Ordinary Differential Equations are widely used to model biological networks and processes. While there are many desktop-based tools for these analyses, web-based solutions are largely lacking. The goal of this project is to create a stand-alone web pipeline/service that will enable a user to submit an SBML model and select analysis methods (via a React-based GUI) to a Java-based server-side API, which will in turn submit the simulation/analysis job to a backend simulation service (e.g., COPASI, CellDesigner, or the Systems Biology Simulation Core Library). This service will subsequently return the simulation/analysis results through the API back to the React client for user viewing.

The following simulation/analysis methods should be available through this pipeline by the end of the project: Parameter fitting, steady state analysis, simulation.

Required skills: Java, ReactJS, Javascript (data visualization packages such as D3), CSS3, HTML5

Potential Mentors: Dr. Jim Rogers, Dr. Bhanwar Lal Puniya, Dr. Tomas Helikar
Difficulty Rating: 2

Idea 6: Reaction-based model (differential equations) analysis library for JavaScript

Related to Idea 5, the objective of this project is to develop a stand-alone javascript library that will enable the evaluation and simulation of differential-equation-based models of biological processes. The following simulation/analysis methods should be available through this library by the end of the project: parameter fitting, steady state analysis, and simulation. This library will be demonstrated through a simple web UI that will enable a user to upload an SBML file with a computational model, set up a simulation/analysis, and visualize the results of the analysis. The library can leverage other open source javascript libraries, such as numericJS and libSBML (for parsing and processing SBML files).
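
To make the simulation part concrete, here is a minimal sketch of a fixed-step, fourth-order Runge-Kutta integrator applied to a single mass-action reaction. The choice of integrator, the example reaction, the rate constant, and all names are assumptions for illustration; a production library would likely need adaptive step sizes and a solver suited to stiff systems.

```javascript
// One classical fourth-order Runge-Kutta step for dy/dt = f(t, y), with y an array.
function rk4Step(f, t, y, h) {
  const add = (a, b, scale) => a.map((ai, i) => ai + scale * b[i]);
  const k1 = f(t, y);
  const k2 = f(t + h / 2, add(y, k1, h / 2));
  const k3 = f(t + h / 2, add(y, k2, h / 2));
  const k4 = f(t + h, add(y, k3, h));
  return y.map((yi, i) => yi + (h / 6) * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i]));
}

// Integrate from t0 to t1 in `steps` equal steps and return the trajectory.
function simulate(f, y0, t0, t1, steps) {
  const h = (t1 - t0) / steps;
  const trajectory = [{ t: t0, y: y0 }];
  let y = y0;
  for (let i = 1; i <= steps; i++) {
    y = rk4Step(f, t0 + (i - 1) * h, y, h);
    trajectory.push({ t: t0 + i * h, y });
  }
  return trajectory;
}

// Hypothetical model: irreversible reaction A -> B with mass-action rate k * [A].
const k = 0.5;
const massAction = (t, [a, b]) => [-k * a, k * a];

const result = simulate(massAction, [1, 0], 0, 10, 100);
const last = result[result.length - 1];
console.log(last.t, last.y);
```

For this model the exact solution is [A](t) = exp(-k t), which gives a simple correctness check for any integrator the library ends up using.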

Required skills: Javascript (ES6), Node.js

Potential Mentors: Varun Sharma, Dr. Bhanwar Lal Puniya
Difficulty Rating: 3

Idea 7: Mobile-based (iOS) Blood-Sample Image Analysis.

The goal of this project is to develop a simple, portable, low-cost, and highly reliable screening test for detection of cancer biomarker(s) from a small drop of blood. The software component of this project aims to develop a mobile app to analyze the intensity of blood samples and quantify the presence or absence of various molecules. This project is a follow-up on a previous GSoC project, where this application was developed for Android. The core of this open source application will be modeled after and evaluated against the well-established, open source image analysis software ImageJ. While ImageJ is an application with a multitude of sophisticated, complex, and customizable features, the focus of the mobile application will be primarily on intuitiveness and simplicity while maintaining the rigor and precision of the implemented algorithms.

This application will take advantage of the high-quality cameras found in the majority of modern mobile devices and will support lossless image compression. The main use case for the application consists of four seamlessly integrated steps: image capture, preprocessing, detection, and measurement.
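
The measurement step can be illustrated with a small, language-agnostic sketch, shown here in JavaScript only for brevity (the app itself would be written for iOS). It converts RGBA pixels to grayscale and computes the mean gray value inside a rectangular region of interest. The unweighted (R + G + B) / 3 conversion, the rectangular region, and all names are assumptions for this example; the actual app would need to match whatever conversion and region shapes are used when evaluating against ImageJ.

```javascript
// Convert an RGBA pixel buffer to grayscale using an unweighted (R + G + B) / 3 average.
function toGrayscale(rgba) {
  const gray = new Float64Array(rgba.length / 4);
  for (let i = 0; i < gray.length; i++) {
    gray[i] = (rgba[4 * i] + rgba[4 * i + 1] + rgba[4 * i + 2]) / 3;
  }
  return gray;
}

// Mean gray value inside a rectangular region of interest (ROI).
function meanIntensity(gray, width, roi) {
  let sum = 0;
  for (let y = roi.y; y < roi.y + roi.height; y++) {
    for (let x = roi.x; x < roi.x + roi.width; x++) {
      sum += gray[y * width + x];
    }
  }
  return sum / (roi.width * roi.height);
}

// Hypothetical 2 x 2 image: top row dark, bottom row bright.
const pixels = new Uint8ClampedArray([
  30, 30, 30, 255,   60, 60, 60, 255,
  200, 200, 200, 255,   240, 240, 240, 255,
]);
const grayImage = toGrayscale(pixels);
console.log(meanIntensity(grayImage, 2, { x: 0, y: 1, width: 2, height: 1 })); // 220
```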



Required skills: iOS app development, image analysis experience preferred

Potential Mentors: Dr. Jiri Adamec, Dr. Tomas Helikar
Difficulty Rating: 2

Idea 8: Javascript framework for Agent-based modeling.

The goal of this project is to develop a stand-alone javascript library to create and simulate agent-based models. Agent-based modeling (ABM) is a widely used technique to model systems of autonomous and mutually interacting agents. ABM has been used in many fields, including the life sciences, with various objectives and applications, for example, to predict the spread of a pathogen, cancer development, cell differentiation, etc. Given its long history, ABM has strong theoretical and practical foundations with a wide variety of desktop-based software and methods for developing and simulating agent models. To facilitate a more collaborative environment for ABM in the life sciences, we aim to bring ABM to the web through this Javascript-based ABM framework. The framework will consist of agents that live and communicate in a 3-dimensional environment (utilizing threeJS) in a distributed and independent fashion. Each agent will be able to move, receive signals from the environment, die, and divide. Agents will also be able to assume a geometrical shape; however, in this version, they can all be of the same geometry (e.g., a sphere). You can review NetLogo (a widely used desktop ABM software) to learn more about ABM.
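
The four agent behaviors listed above (move, receive a signal, die, divide) can be sketched without any rendering as follows. Everything here is a hypothetical design for illustration: the `Agent` class, the energy bookkeeping that decides when an agent dies or divides, the `signalAt` environment method, and the `step` function are assumptions, and the three.js visualization is left out entirely.

```javascript
// Hypothetical agent: a point in 3D space with an energy level and four behaviors.
class Agent {
  constructor(position, energy = 1) {
    this.position = position; // { x, y, z }
    this.energy = energy;
    this.alive = true;
  }

  // Random walk: shift each coordinate by up to +/- stepSize.
  move(rng, stepSize) {
    for (const axis of ["x", "y", "z"]) {
      this.position[axis] += (rng() * 2 - 1) * stepSize;
    }
  }

  // Receive a signal from the environment and store it as energy.
  sense(environment) {
    this.energy += environment.signalAt(this.position);
  }

  die() {
    this.alive = false;
  }

  // Split energy evenly with a daughter agent created at the same position.
  divide() {
    this.energy /= 2;
    return new Agent({ ...this.position }, this.energy);
  }
}

// Advance every agent by one tick and return the surviving population.
function step(agents, environment, rng, { stepSize = 1, cost = 0.2, divideAt = 2 } = {}) {
  const next = [];
  for (const agent of agents) {
    agent.move(rng, stepSize);
    agent.sense(environment);
    agent.energy -= cost;
    if (agent.energy <= 0) {
      agent.die();
      continue;
    }
    next.push(agent);
    if (agent.energy >= divideAt) next.push(agent.divide());
  }
  return next;
}

// Hypothetical environment: a uniform signal of 0.5 everywhere.
const environment = { signalAt: () => 0.5 };
let population = [new Agent({ x: 0, y: 0, z: 0 })];
for (let tick = 0; tick < 5; tick++) {
  population = step(population, environment, Math.random);
}
console.log(population.length); // 2
```

Passing the random number generator in as an argument keeps simulations reproducible when a seeded generator is supplied, which matters for comparing model runs.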



Required skills: Javascript (ES6), three.js, agent-based modeling (preferred)

Potential Mentors: Dr. Akram Mohammed, Dr. Kenneth Wertheim
Difficulty Rating: 3

Idea 9: A database for tissue sample analysis

The goal of this project is to develop a web-based and database driven environment for scientists to upload and analyze data related to various tissue samples for cancer and other disease classification.

The webserver will have the following functionalities: user management; data upload/download; dataset statistics by various categories (cancer type, tissue type, disease type, platform used, etc.); data search capabilities; data filters based on platform used, dataset type (cell-line data, TCGA data, DNA-methylation data, cancer panel data, normal tissue panel data, etc.), dataset size, cancer type and subtype, tissue type, disease type, and sample demographics; and differential analysis (normal vs. cancer, cancer vs. cancer, and normal vs. normal paired data). The software will be designed to be scalable in order to enable effective development of additional features as part of a future project.

Required skills: React, Javascript, D3.js, high-throughput data analysis and visualization experience preferred

Potential Mentors: Dr. Akram Mohammed, Dr. Tomas Helikar
Difficulty Rating: 2

Idea 10: CancerDiscover: A graphical user interface for cancer prediction and biomarker identification using microarray data

The purpose of this project is to develop a cross-platform, stand-alone graphical user interface (GUI) application that lets users perform cancer type and subtype predictions using microarray data. The GUI will provide users with options to preprocess data, partition it into training and test datasets, and perform feature selection and classification. For data analysis and processing, the GUI will directly utilize the open source, command-line CancerDiscover pipeline developed by our group: https://github.com/HelikarLab/CancerDiscover

Required skills: Qt, high-throughput data analysis and visualization experience preferred

Potential Mentors: Dr. Akram Mohammed, Dr. Tomas Helikar
Difficulty Rating: 2