Curriculum Module: Synthetic Biology and Machine Learning

A collection of six discussions with researchers who work at the interface of synthetic biology and machine learning.

Overview

Introductory video (4:07) in which EBRC member Mary Dunlop, Associate Professor at Boston University, explains the series.

This series is for synthetic biologists who want to learn more about what machine learning is, how it is used, and what kinds of problems it can be applied to in the field. There are six discussions; each includes a research presentation, a conversation about background and advice, and links to related content.

Research Discussions & Advice

Learning protein sequence-function relationships from deep mutational scanning data.

With Phil Romero (University of Wisconsin-Madison)

Research Discussion (52:37)
Background and Advice (5:22)

Research discussion links
Gelman et al. – PNAS 2021
GitHub associated with manuscript
Romero Lab
Gitter Lab

Other resources
Protabank
Sebastian Raschka’s machine learning resources 

Examples of machine learning papers for protein engineering
Machine learning-guided channelrhodopsin engineering enables minimally invasive optogenetics
Machine learning-assisted directed protein evolution with combinatorial libraries
Low-N protein engineering with data-efficient deep learning
Machine learning-aided engineering of hydrolases for PET depolymerization

A deep learning model to predict antibiotic activity based on chemical structure.

With Jonathan Stokes (McMaster University)

Research Discussion (34:33)
Background and Advice (13:12)

Research discussion links
Stokes et al. – Cell 2020
Stokes Lab 

Optimus 5-Prime model for predicting ribosome loading given sequence data for the 5’ untranslated region upstream of a gene of interest.

With Georg Seelig (University of Washington)

Research Discussion (43:15)
Background and Advice (4:37)

Research discussion links
Sample et al. – Nature Biotechnology 2019
GitHub with code for Optimus 5-Prime model 
Seelig Lab

Automated Recommendation Tool that works with the synthetic biology design-build-test-learn cycle to provide recommendations that can guide which designs to build and test next.

With Tijana Radivojevic (Lawrence Berkeley National Laboratory)

Research Discussion (36:59)
Background and Advice (11:08)

Research discussion links
Radivojevic et al. – Nature Communications 2020
Zhang et al. – Nature Communications 2020
Machine Learning for Metabolic Engineering: A Review
Quantitative Metabolic Modeling Group at Berkeley Lab

Promoter calculator model that can be used to accurately predict transcription initiation rates for sigma70 promoter sequences.

With Howard Salis (Pennsylvania State University)

Research Discussion (51:29)
Background and Advice (5:27)

Research discussion links
La Fleur, Hossain, Salis – bioRxiv preprint
Promoter Calculator
Salis Lab GitHub
Salis Lab

Reinforcement learning approach to control the composition of co-cultures within bioreactors.

With Neythen Treloar (University College London), Brian Ingalls (University of Waterloo), Chris Barnes (University College London)

Research Discussion (35:42)
Background and Advice (10:35)

Research discussion links
Treloar et al. – PLOS Computational Biology 2020
GitHub associated with the manuscript
Treloar et al. – bioRxiv 2022
Ingalls Lab
Barnes Lab

Reinforcement learning references
Sutton and Barto “Reinforcement Learning: An Introduction”
Spinning Up in Reinforcement Learning
OpenAI Gym
Python libraries for Reinforcement Learning

Other Resources

Towards Data Science
Curated list of free machine learning courses and tutorials by Tivadar Danka
ML Protein Engineering Seminar Series

Acknowledgements

This series was organized, produced, and edited by Dr. Mary Dunlop.

Thanks to all individuals who generously participated in the discussions, Heidi Klumpe for early advice on interview design, and David Dunlop for training on video editing. This series was produced with support from the National Science Foundation grant MCB-2143289.