Kehang Han

LinkedIn | GitHub | Website | CV (PDF) | 617-694-7029 | Cambridge, MA


  • Languages: Python, C/C++, Java, R, MATLAB, PHP
  • Packages: Pandas, Flask, scikit-learn, PyTorch, Keras
  • ML/Stats: Random Forests, Graph Convolutional Neural Networks, Ordinal Classification
  • Databases: MySQL, PostgreSQL, MongoDB
  • Cloud: AWS, Azure, server admin


Staples, Inc., Data Scientist, Framingham, MA

July 2018-Present

  • Deep-learning-based Image Product Detection service: Faster R-CNN model, active learning, Azure Cloud
  • Product Matching service: ETL pipeline, ordinal classification, Azure Cloud
  • Intelligent Inventory Management system: IoT hardware, UI design, Azure Cloud, API hub
  • Open-source software (model2service): conveniently converts ML models into API services
  • Other in-house services: Named Entity Recognition, General Sequence Classification Platform, Warehouse Simulator, Robotics Optimization

Massachusetts Institute of Technology, PhD, Cambridge, MA

August 2012-May 2018

  • Designed and implemented a Graph Convolutional Neural Network for molecular property prediction (e.g., entropy and heat capacity)
  • Co-designed uncertainty estimation for the Graph Convolutional Neural Network, using dropout as an approximate Bayesian inference strategy
  • Built a self-evolving machine that actively seeks to improve itself using uncertainty estimation and automated quantum mechanics calculations
  • Lead developer of an open-source project team (~10 members from MIT and Northeastern University) for Reaction Mechanism Generator, where I introduced parallelism, pruning, and machine learning into the library

Shell Oil Company, Data Science Intern, Hamburg, Germany

June 2015-September 2015

  • Streamlined price data processing for DACH Supply Chain Optimization Tool
  • Created KPI analysis and visualization tools in R for scenario studies

Tsinghua University, BSc, Beijing, China

August 2008-June 2012

  • Developed a scalable heuristic-based algorithm for Mixed-Integer Linear Programming used in computational protein design, which achieved a 20-fold speed-up for large enzyme systems
  • Software developer of Protein Design Algorithm (PRODA) library (written in C), built by Tsinghua ChemE System Engineering Lab


PhD, Chemical Engineering (major) and Computer Science (minor), Massachusetts Institute of Technology, USA

August 2012-May 2018

BSc, Chemical Engineering, Tsinghua University, China

August 2008-June 2012


  • Self-Evolving Machine: A Continuously Improving Model for Molecular Thermochemistry. The Journal of Physical Chemistry A (2019).
  • Scalability strategies for automated reaction mechanism generation. Computers & Chemical Engineering (2018).
  • A Fragment-Based Mechanistic Kinetic Modeling Framework for Complex Systems. Industrial & Engineering Chemistry Research (2018).
  • An Extended Group Additivity Method for Polycyclic Thermochemistry Estimation. International Journal of Chemical Kinetics (2018).
  • On-the-fly pruning for rate-based reaction mechanism generation. Computers & Chemical Engineering (2017).
  • Reversible encapsulation of lysozyme within mPEG-b-PMAA: experimental observation and molecular dynamics simulation. Soft Matter (2013).
  • Systematic optimization model and algorithm for binding sequence selection in computational enzyme design. Protein Science (2013).
  • Comparison of catalytic combustion of carbon monoxide and formaldehyde over Au/ZrO2 catalysts. Catalysis Today (2010).