Dexter Antonio

1672 Spring Street · Davis, CA 95616 · (415) 971-3026 · dexter.d.antonio@gmail.com

I'm a data scientist and developer passionate about applying next-gen data science techniques to speed up research and draw novel insights from unusual data.


Relevant Experience (Please see my LinkedIn for an updated resume)

Graduate Student Researcher

UC Davis, Department of Chemical Engineering

  • Automated experimentalist’s workflow by using a trained ML algorithm to classify thousands of Raman spectra
  • Applied machine learning algorithms to identify cancerous and noncancerous blood samples
  • Coordinated the development of a Python software package with Git and GitHub
  • Screened materials for their CO2 capture ability with Grand Canonical Monte Carlo (GCMC) simulations
  • Designed and trained a convolutional neural network to identify chemical concentrations from simulated spectra

September 2018 - Present

Teaching Assistant

UC Davis, Department of Chemical Engineering

  • Led live Python labs on utilizing NumPy and Matplotlib to analyze and visualize energy transport phenomena
  • Independently led labs consisting of 30+ students and ensured a safe and positive environment
  • Taught complex chemical engineering concepts in an accessible and exciting way through the design of coffee

September 2019 - Present

Participant

UC Davis, DataLab’s Hack for California Research Cluster

  • Implemented Elasticsearch to scan tens of thousands of pages of converted PDF documents for key phrases
  • Created a California General Plan mapping website with R Shiny that visualizes general plan policies

January 2020 - Present

Data Science Intern

Allstate

  • Improved lead ranking by developing a model to predict clients’ likelihood to bind using billions of rows of data
  • Applied SQL to filter, clean, and build a dataset used to train and identify a high-performing ML model
  • Engineered a temporal feature which captured seasonal trends while avoiding bias, improving model accuracy
  • Communicated effectively with the team, ensuring ML models were well understood and incorporated into business processes

June 2020 - September 2020

Education

University of California, Davis

Master of Science
Chemical Engineering

GPA: 3.85/4.00

June 2018 - Present (expected June 2021)

Dual Bachelor of Science and Bachelor of Arts Degree Program

Columbia University

Bachelor of Science
Chemical Engineering

GPA: 3.55/4.00

University of Puget Sound

Bachelor of Arts
Chemistry

GPA: 3.84/4.00

September 2012 - May 2017

Projects

Surface-Enhanced Raman Spectroscopy (SERS) is an analytical technique used for detecting low-abundance biomolecules in biological fluids. One particularly promising application of this technique is the detection of cancer-cell-originating extracellular vesicles in human blood. These extracellular vesicles indicate the presence of cancerous cells in the body, thus providing an early warning of cancer. A major barrier to the widespread application of this technique is the difficulty of distinguishing cellular signals from background signals in the SERS spectra.

I addressed this problem through careful data engineering and the application of modern machine learning algorithms. First, I built a program that allowed experienced researchers to label spectra as either “good” (i.e., a real signal) or “bad” (i.e., background). Once this labeling was complete, I used a random forest algorithm to automatically classify unlabeled spectra as “good” or “bad”. Finally, I demonstrated that this algorithm could be integrated with LabVIEW. This demonstration laid the groundwork for automatic classification, allowing for more efficient data collection.
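
A minimal sketch of the classification step, assuming scikit-learn and NumPy are available. The spectra here are synthetic stand-ins (noise with or without a single peak), and the parameters are illustrative; this is not the code or data from the actual project:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_per_class, n_points = 200, 100
x = np.arange(n_points)

# Synthetic stand-ins: "bad" spectra are pure noise, "good" spectra add a peak.
spectra = rng.normal(0.0, 1.0, size=(2 * n_per_class, n_points))
peak = 5.0 * np.exp(-0.5 * ((x - 50) / 3.0) ** 2)
spectra[:n_per_class] += peak
labels = np.array([1] * n_per_class + [0] * n_per_class)  # 1 = good, 0 = bad

# Hold out a quarter of the labeled spectra to check the classifier.
X_train, X_test, y_train, y_test = train_test_split(
    spectra, labels, test_size=0.25, random_state=0, stratify=labels
)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

accuracy = clf.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

Once trained, `clf.predict` can be called on new, unlabeled spectra of the same length to assign each one a "good" or "bad" label.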

I am currently building the Multiscale Atomic Zeolite Simulation Environment (MAZE) Python package to streamline the calculations performed in my research group. The MAZE project extends the Atomic Simulation Environment (ASE) to naturally represent zeolites, facilitating the calculations required to determine their properties. The core functionality of this code comes from classes that represent zeolites and their derivatives. These zeolite classes inherit from ASE’s Atoms object and can be treated as one. MAZE also includes additional functionality for performing calculations and quickly generating zeolite derivatives.

Global climate change is one of the biggest challenges humanity faces today. One promising solution is retrofitting existing fossil fuel power plants with CO2 capture technology. Zeolites, which are nano-porous materials used industrially for gas separation, hold considerable promise for solid-state CO2 capture. The diversity of zeolite chemical structures makes identifying the best zeolite for this purpose challenging, so computational chemistry experiments are needed to narrow down the number of potential candidates.

To identify promising zeolites, I used Grand Canonical Monte Carlo (GCMC) simulations to predict the CO2 capture ability of a number of zeolites. To facilitate these calculations, I built a host of tools that simplify setting them up. For example, I built an extension of the Atomic Simulation Environment (ASE), which is commonly used in computational experiments, to better support zeolite calculations. I have also written a Python wrapper for the C-coded GCMC simulation package to rapidly perform calculations.

I am currently a participant in UC Davis DataLab’s Hack for California Research Cluster, where I lead the General Plan Mapping Project. This project aims to make the hundreds of general plans prepared by California cities and counties easily searchable.

A major challenge of this project was finding a way to search the tens of thousands of document pages efficiently. To meet this requirement, I developed a custom word indexer in C++, which integrated with the initial mapping dashboard and achieved high performance. The website was then reprogrammed in Python, which allowed me to use Elasticsearch to improve the search capabilities significantly. The first version of the R Shiny website can be found here. The new Elasticsearch-based website can be found here. The GitHub repo for the Python version of this project can be found here.
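
The idea behind a word indexer can be sketched as an inverted index, which maps each word to the pages that contain it. The Python sketch below is an illustration only, not the original C++ implementation; the page ids and text are hypothetical, and it returns pages containing every query word rather than exact phrase matches:

```python
import re
from collections import defaultdict


def build_index(pages):
    """Map each lowercase word to the set of page ids that contain it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(page_id)
    return index


def search(index, phrase):
    """Return sorted ids of pages containing every word in the phrase."""
    words = re.findall(r"[a-z0-9]+", phrase.lower())
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)


# Hypothetical page snippets standing in for converted general plan PDFs.
pages = {
    "plan_a_p1": "Affordable housing policy for the downtown area",
    "plan_a_p2": "Bicycle lanes and transit corridors",
    "plan_b_p1": "Housing element: affordable units near transit",
}
index = build_index(pages)
print(search(index, "affordable housing"))
```

Building the index once up front means each query only touches the sets for its own words instead of rescanning every page.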

Skills

Programming Languages & Tools
Workflow

Keywords

Data Science, Machine Learning, Agile Project Management, Industry Knowledge, Mathematics, Chemistry, Laboratory Skills, Chemical Engineering, Agile Methodologies, Strategic Planning, Project Management, Data Analysis, Requirements Gathering, Software Development, Quantitative Analysis, Big Data, Artificial Intelligence (AI), Cell Culture, Computer Vision, Python, R (Programming Language), SQL, Python (Programming Language), MATLAB, Git, Elasticsearch, C, C++, Cross-functional Teamwork, Communication, Presentations, Problem Solving, Leadership, ML/AI, Machine Learning Engineering, OpenCV, Data Pipelines, ML Models, CI/CD, Data Analytics, FinTech, NumPy, Pandas, Scikit-learn


Interests

When I am not doing research, I enjoy spending time outdoors. I currently live in Davis, California, which is close to the Bay Area and Tahoe. Both of these places have plenty of hikes to explore. I grew up in the Bay Area and want to move back there or move to Seattle. Technology of all kinds is really interesting to me, and I want to help bring the future into existence.