Dexter Antonio
I'm a data scientist and developer passionate about applying next-gen data science techniques to speed up research and draw novel insights from unusual data.
GPA: 3.85/4.00
GPA: 3.55/4.00
GPA: 3.84/4.00
Surface Enhanced Raman Spectroscopy (SERS) is an analytical technique used for detecting low-abundance biomolecules in biological fluids. One particularly promising application of this technique is the detection of cancer-cell-originating extracellular vesicles in human blood. The presence of these extracellular vesicles indicates the presence of cancerous cells in the body, thus providing an early warning of cancer. A major barrier to the widespread application of this technique is the challenge associated with distinguishing cellular signals from background signals in the SERS spectra.
I addressed this problem through careful data engineering and the application of modern machine learning algorithms. First, I built a program that allowed experienced researchers to label spectra as either “good” (i.e., a real signal) or “bad” (i.e., background). Once the labeling was complete, I trained a random forest classifier on the labeled spectra and used it to automatically classify unlabeled spectra as “good” or “bad”. Finally, I demonstrated that this classifier could be integrated with LabVIEW. This demonstration laid the groundwork for classifying spectra automatically during acquisition, allowing for more efficient data collection.
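The classification step can be sketched roughly as follows. This is a minimal illustration using scikit-learn's RandomForestClassifier on synthetic stand-in spectra, not the original code or data: the array shapes, the Gaussian peak used to mimic a "good" signal, and all variable names are assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_per_class, n_points = 200, 100
shift = np.linspace(0.0, 1.0, n_points)  # hypothetical normalized Raman-shift axis

# Synthetic stand-ins for labeled spectra (assumption, not real SERS data):
# "bad" spectra are pure noise, "good" spectra are noise plus one peak.
background = rng.normal(0.0, 1.0, size=(n_per_class, n_points))
peak = 5.0 * np.exp(-((shift - 0.5) ** 2) / (2 * 0.03 ** 2))
signal = rng.normal(0.0, 1.0, size=(n_per_class, n_points)) + peak

X = np.vstack([signal, background])
y = np.array([1] * n_per_class + [0] * n_per_class)  # 1 = good, 0 = bad

# Hold out a quarter of the labeled spectra to check the classifier.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

accuracy = clf.score(X_test, y_test)
labels = np.where(clf.predict(X_test) == 1, "good", "bad")
print(f"held-out accuracy: {accuracy:.2f}")
```

In a real workflow the rows of X would be the researcher-labeled spectra, and clf.predict would be applied to newly acquired, unlabeled spectra.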
I am currently building the Multiscale Atomic Zeolite Simulation Environment (MAZE) Python package to streamline the calculations performed in my research group. The MAZE project extends the Atomic Simulation Environment (ASE) to represent zeolites naturally, facilitating the calculations required to determine their properties. The core of the code is a set of classes that represent zeolites and their derivatives. These zeolite classes inherit from ASE’s Atoms object and can be used anywhere an Atoms object is expected. MAZE also provides additional functionality for performing calculations and quickly generating zeolite derivatives.
Global climate change is one of the biggest challenges humanity faces today. One promising solution is retrofitting existing fossil-fuel power plants with CO2 capture technology. Zeolites, which are nanoporous materials used industrially for gas separation, hold considerable promise for solid-state CO2 capture. The diversity of zeolite chemical structures makes identifying the best zeolite for this purpose challenging, so computational chemistry experiments are needed to narrow down the number of potential candidates.
To identify promising zeolites, I used Grand Canonical Monte Carlo (GCMC) simulations to predict the CO2 capture ability of a number of zeolites. I also built a host of tools to make these calculations simple to set up. For example, I built an extension of the Atomic Simulation Environment (ASE), a package commonly used in computational experiments, to better support zeolite calculations. I have also written a Python wrapper for the GCMC simulation package, which is written in C, so that calculations can be run rapidly.
I am currently a participant in UC Davis Lab’s Hack for California Research cluster where I lead the General Plan Mapping Project. This project aims to make the hundreds of general plans prepared by California cities and counties easily searchable.
A major challenge of this project was finding a way to search the tens of thousands of document pages efficiently. To meet this requirement, I developed a custom word indexer in C++, which integrated with the initial mapping dashboard and achieved high performance. The website was then rewritten in Python, which allowed me to use Elasticsearch to improve the search capabilities significantly. The first version of the R Shiny website can be found here. The new Elasticsearch-based website can be found here. The GitHub repo for the Python version of this project can be found here.
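The idea behind a word indexer can be sketched as an inverted index that maps each word to the pages containing it. The version below is a minimal Python illustration, not the original C++ indexer; the function names, the (plan, page number) keys, and the sample page text are all made up for the example.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Map each lowercase word to the set of page ids that contain it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(page_id)
    return index


def search(index, query):
    """Return the page ids that contain every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical sample pages keyed by (plan name, page number).
pages = {
    ("city_a", 1): "Housing element and land use policy",
    ("city_a", 2): "Transportation and circulation",
    ("county_b", 1): "Land use and open space",
}
index = build_index(pages)
print(sorted(search(index, "land use")))
```

Building the index once means each query only touches the sets for its own words instead of rescanning every page, which is the same trade-off a search engine such as Elasticsearch makes at much larger scale.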
Data Science, Machine Learning, Agile Project Management, Industry Knowledge, Mathematics, Chemistry, Laboratory Skills, Chemical Engineering, Agile Methodologies, Strategic Planning, Project Management, Data Analysis, Requirements Gathering, Software Development, Quantitative Analysis, Big Data, Artificial Intelligence (AI), Cell Culture, Computer Vision, Python (Programming Language), R (Programming Language), SQL, MATLAB, Git, Elasticsearch, C, C++, Cross-functional Teamwork, Communication, Presentations, Problem Solving, Leadership, ML/AI, Machine Learning Engineering, OpenCV, Data Pipelines, ML Models, CI/CD, Data Analytics, FinTech, NumPy, Pandas, Scikit-learn
When I am not doing research, I enjoy spending time outdoors. I currently live in Davis, California, which is close to both the Bay Area and Tahoe, and both have plenty of hikes to explore. I grew up in the Bay Area and would like to move back there or to Seattle. Technology of all kinds is really interesting to me, and I want to help bring the future into existence.