Scientific software developer in the Washington, D.C. area.

Portfolio of my projects

[Tautomer Generation Algorithms and InChI Representations]({% post_url 2024-05-01-Tautomer-Sources-Comparison %})

[Histogram of frequency against difference in number of tautomers from RDKit baseline algorithm minus other sources]({% post_url 2024-05-01-Tautomer-Sources-Comparison %})

Which cheminformatics algorithms produce the most tautomers? And how successful is InChI at representing with a single representation all tautomers of a given structure?

Molecular Isotopic Distributions: [Permutations]({% post_url 2023-12-26-Molecular-isotopes-1-permutations %}) and [Combinations]({% post_url 2024-01-20-Molecular-isotopes-2-combinations %})

[Abundance against mass for SCl2 molecular isotopes]({% post_url 2023-12-26-Molecular-isotopes-1-permutations %})

These posts use two different methods to calculate molecular isotopic mass distributions.
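
As a rough illustration of the combinations approach, the sketch below enumerates the isotope combinations for the two chlorine atoms in SCl2 and weights each one by its multinomial probability. The abundances are rounded values entered by hand, nominal mass numbers stand in for exact isotopic masses, and sulfur is left out for brevity. The function name is hypothetical, and this is not the code from the posts.

```python
from collections import defaultdict
from itertools import combinations_with_replacement
from math import factorial, prod

# Approximate natural abundances (rounded, assumed for illustration),
# keyed by nominal mass number.
CL_ISOTOPES = {35: 0.7576, 37: 0.2424}


def isotope_distribution(isotopes, n_atoms):
    """Return {total nominal mass: probability} for n_atoms of one element."""
    dist = defaultdict(float)
    for combo in combinations_with_replacement(sorted(isotopes), n_atoms):
        # Multinomial coefficient: number of distinct orderings of this combination.
        orderings = factorial(n_atoms)
        for mass in set(combo):
            orderings //= factorial(combo.count(mass))
        probability = orderings * prod(isotopes[m] for m in combo)
        dist[sum(combo)] += probability
    return dict(dist)


cl2 = isotope_distribution(CL_ISOTOPES, 2)
for mass, p in sorted(cl2.items()):
    print(mass, round(p, 4))
```

Enumerating combinations rather than permutations means each distinct isotope composition is visited once, with the multinomial coefficient accounting for its equivalent orderings.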

[RDKit Contribution MolsMatrixToGridImage()]({% post_url 2023-12-02-MolsMatrixToGridImage-simplifies-code %})

[Three reactions, each in a row. First column: Target molecule and whether it's accessible based on commercial availability of reactants. Subsequent columns: Each reactant and whether it's commercially available.]({% post_url 2023-12-02-MolsMatrixToGridImage-simplifies-code %})

I contributed MolsMatrixToGridImage to the RDKit 2023.09.1 release to draw row-and-column grids of molecules.
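
The idea behind a matrix-style input can be sketched without RDKit: pad each row of a nested list to the length of the longest row, then flatten it so a one-dimensional grid drawer can lay it out with a fixed number of items per row. The helper below is a hypothetical illustration that uses SMILES strings as placeholders. It is not RDKit's implementation.

```python
def flatten_matrix(matrix, fill=None):
    """Pad ragged rows with `fill` and flatten them row by row.

    Returns the flat list and the number of items per row.
    """
    per_row = max((len(row) for row in matrix), default=0)
    flat = []
    for row in matrix:
        flat.extend(row)
        # Pad short rows so every row occupies the same number of grid cells.
        flat.extend([fill] * (per_row - len(row)))
    return flat, per_row


# Each row is one reaction: target first, then its reactants (example SMILES).
reactions = [
    ["CC(=O)OCC", "CC(=O)O", "CCO"],
    ["CCO"],
]
flat, per_row = flatten_matrix(reactions)
```

In RDKit, a flat list and a per-row count of this kind are what `Draw.MolsToGridImage` accepts through its `molsPerRow` parameter.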

[Display Molecular Formulas]({% post_url 2023-10-28-Display-Molecular-Formulas %})

Uses Python, RDKit, seaborn, and matplotlib

[Two series of molecules with carbon chains 3, 2, and 1 atoms long. Top: Dialdehydes, with the one-carbon molecule, CO2, not shown. Bottom: Diols.]({% post_url 2023-10-28-Display-Molecular-Formulas %})

How to display molecular formulas such as C3H4O2 in molecular grids, tables, and graphs. Also works for other HTML-, Markdown-, or LaTeX-formatted text.
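
One simple way to get subscripts, shown here as a minimal sketch rather than the method used in the post, is to wrap each run of digits in the target markup. The function names are hypothetical, and the sketch assumes plain formulas without leading coefficients or charges.

```python
import re


def formula_to_html(formula):
    """Wrap each run of digits in <sub> tags for HTML or Markdown rendering."""
    return re.sub(r"(\d+)", r"<sub>\1</sub>", formula)


def formula_to_latex(formula):
    """Write each run of digits as a LaTeX subscript inside math mode."""
    return "$\\mathrm{" + re.sub(r"(\d+)", r"_{\1}", formula) + "}$"


print(formula_to_html("C3H4O2"))
print(formula_to_latex("C3H4O2"))
```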

[Molecular Formula Generation]({% post_url 2023-10-20-Molecular-Formula-Generation %})

Uses Python and RDKit

[Photosynthesis chemical equation: 6CO2 + 6H2O → C6H12O6 + 6O2]({% post_url 2023-10-20-Molecular-Formula-Generation %})

In cheminformatics, the typical way of representing a molecule is with a SMILES string such as CCO for ethanol. However, there are still cases where the molecular formula such as C2H6O is useful.
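
One case where a formula is handy is checking that an equation is balanced. The sketch below counts atoms in simple formulas and compares both sides of the photosynthesis equation. The parser and helper names are hypothetical, and it assumes formulas without parentheses or charges. It does not use RDKit and is not the code from the post.

```python
import re
from collections import Counter


def parse_formula(formula):
    """Count atoms in a simple formula such as 'C6H12O6'."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def total_atoms(side):
    """Sum atom counts over (coefficient, formula) pairs."""
    total = Counter()
    for coefficient, formula in side:
        for element, n in parse_formula(formula).items():
            total[element] += coefficient * n
    return total


reactants = [(6, "CO2"), (6, "H2O")]
products = [(1, "C6H12O6"), (6, "O2")]
balanced = total_atoms(reactants) == total_atoms(products)
print(balanced)
```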

[Refitting Data From Wiener’s Classic Cheminformatics Paper]({% post_url 2023-04-25-Refitting-Data-from-Wiener %})

Uses Python, SciPy, Polars, NumPy, seaborn, matplotlib, and mol_frame

[Graph of calculated against observed boiling point for alkanes]({% post_url 2023-04-25-Refitting-Data-from-Wiener %})

How well did cheminformatics pioneers Egloff and Wiener fit their models to boiling points of alkanes in the 1940s? This blog post revisits their fits using digital tools.

[Revisiting a Classic Cheminformatics Paper: The Wiener Index]({% post_url 2023-03-10-Revisiting-a-Classic-Cheminformatics-Paper-The-Wiener-Index %})

Uses Python, RDKit, Polars, matplotlib, seaborn, py2opsin, and mol_frame

[Graph of calculated against observed boiling point for alkanes]({% post_url 2023-03-10-Revisiting-a-Classic-Cheminformatics-Paper-The-Wiener-Index %})

This post revisits Harry Wiener's article "Structural Determination of Paraffin Boiling Points", extracts data for molecules from it, recalculates cheminformatics parameters and boiling points, and plots the data.
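
For readers unfamiliar with it, the Wiener index is the sum of the shortest-path distances, counted in bonds, between every pair of carbon atoms in the molecule. A minimal sketch using breadth-first search over a hand-written carbon skeleton follows. The adjacency dictionaries and function name are illustrative assumptions, not the post's RDKit-based code.

```python
from collections import deque


def wiener_index(adjacency):
    """Sum of shortest-path bond counts over all pairs of carbon atoms."""
    total = 0
    for start in adjacency:
        # Breadth-first search gives the bond distance from `start` to every atom.
        distances = {start: 0}
        queue = deque([start])
        while queue:
            atom = queue.popleft()
            for neighbor in adjacency[atom]:
                if neighbor not in distances:
                    distances[neighbor] = distances[atom] + 1
                    queue.append(neighbor)
        total += sum(distances.values())
    return total // 2  # each pair was counted once from each end


# Carbon skeletons as adjacency lists (hydrogens omitted).
n_butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
isobutane = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1]}

print(wiener_index(n_butane), wiener_index(isobutane))
```

The branched isomer has the smaller index, which reflects its more compact skeleton.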

[RDKit Utility to Check Whether Starting Materials for Synthesizing Your Target Molecules Are Commercially Available]({% post_url 2023-02-07-Are-the-Starting-Materials-for-Synthesizing-Your-Target-Molecules-Commercially-Available %})

Uses Python, RDKit, PubChem's API, asyncio, and Semaphore

[Three reactions, each in a row. First column: Target molecule and whether it's accessible based on commercial availability of reactants. Subsequent columns: Each reactant and whether it's commercially available.]({% post_url 2023-02-07-Are-the-Starting-Materials-for-Synthesizing-Your-Target-Molecules-Commercially-Available %})

Given target molecules and reactions to synthesize them, determine whether the starting materials are commercially available using PubChem's API, and thus whether the target is synthetically accessible.
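
The concurrency pattern can be sketched as follows: an `asyncio.Semaphore` caps how many lookups run at once, and a target counts as accessible only if every reactant is found. The catalog, function names, and delay are placeholders invented for this example. No real PubChem request is made, and this is not the post's code.

```python
import asyncio

# Stand-in catalog; a real check would query PubChem instead (hypothetical data).
FAKE_CATALOG = {"CCO", "CC(=O)O"}


async def is_available(smiles, semaphore):
    """Pretend lookup that holds a semaphore slot while 'waiting on the network'."""
    async with semaphore:
        await asyncio.sleep(0.01)  # placeholder for an HTTP request
        return smiles in FAKE_CATALOG


async def check_reaction(reactants, max_concurrent=2):
    """Return True only if every reactant is available."""
    semaphore = asyncio.Semaphore(max_concurrent)
    results = await asyncio.gather(*(is_available(s, semaphore) for s in reactants))
    return all(results)


accessible = asyncio.run(check_reaction(["CCO", "CC(=O)O"]))
blocked = asyncio.run(check_reaction(["CCO", "C1CC1N"]))
print(accessible, blocked)
```

Limiting concurrency this way keeps a batch of lookups from exceeding a service's request-rate limits while still overlapping the waiting time.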

[RDKit Utility to Create a Mass Spectrometry Fragmentation Tree]({% post_url 2023-01-02-Mass-Spectrometry-Fragmentation-Tree %})

Uses Python and RDKit

[Annotated mass spectrometry fragmentation tree using the function mass_spec_frag_tree in this blog post]({% post_url 2023-01-02-Mass-Spectrometry-Fragmentation-Tree %})

Given a mass spec fragmentation hierarchy, with species as SMILES strings, display the fragmentation tree in a grid, labeling each species with its name and either mass or mass to charge ratio m/z.

[RDKit Utility to Find the Maximum Common Substructure, and Groups Off It, Between a Set of Molecules]({% post_url 2022-12-25-RDKit-Find-Groups-Off-Common-Core %})

Uses Python and RDKit

[Annotated grid of maximum common substructure and core; molecules and groups off maximum common substructure]({% post_url 2022-12-25-RDKit-Find-Groups-Off-Common-Core %})

Given a collection of molecules as SMILES strings, find the maximum common substructure (MCS) match between them, and the groups off that common core for each molecule, displaying the results using a grid.

[Chemistry machine learning for drug discovery with DeepChem]({% post_url 2022-12-13-Chemistry-machine-learning-for-drug-discovery-with-DeepChem %})

Uses Python, DeepChem, seaborn, Matplotlib, and pandas

[Predicted against measured lipophilicity for test and train data]({% post_url 2022-12-13-Chemistry-machine-learning-for-drug-discovery-with-DeepChem %})

Use the DeepChem deep learning package to predict compounds' lipophilicity: how well they are absorbed into the lipids of biological membranes, which is important for oral delivery of drugs.

[RDKit Utility to Visualize Retrosynthetic Analysis Hierarchically]({% post_url 2022-11-11-RDKit-Recap-decomposition-tree %})

Uses Python and RDKit

[Annotated Recap retrosynthetic hierarchy tree]({% post_url 2022-11-11-RDKit-Recap-decomposition-tree %})

Given a target molecule, use the Recap algorithm to decompose it into a set of fragments that could be combined to make the parent molecule using common reactions. Display the fragmentation hierarchically.

[RDKit Utility to Find and Highlight the Maximum Common Substructure Amongst Molecules]({% post_url 2022-10-09-RDKit-find-and-highlight-the-maximum-common-substructure-between-molecules %})

Uses Python and RDKit

[Maximum substructure match, and the two molecules which are labeled by their functional groups]({% post_url 2022-10-09-RDKit-find-and-highlight-the-maximum-common-substructure-between-molecules %})

Given a collection of molecules as SMILES strings, find the maximum common substructure (MCS) match between them as a SMARTS string, display the match pattern as a molecule, and highlight the match pattern in each molecule using a grid.

Uses Python, NumPy, SymPy, ChemPy, Flask, JavaScript, and Bootstrap

Find a given number of points which satisfy constraints given in a constraints file for an n-dimensional space defined on the unit hypercube, then write them to an output file.

Optionally, identify the components (dimensions) in the constraints file using chemical formulas, and Sampler will use ChemPy to calculate their molar masses, then output the component weight fraction.
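
To make the task concrete, here is a minimal sketch that uses plain rejection sampling: draw uniform points in the unit hypercube and keep those that satisfy every constraint. This is a stand-in technique chosen for brevity, not necessarily how Sampler works. The function name, the convention that a constraint `g` is satisfied when `g(x) >= 0`, and the example constraint are all assumptions.

```python
import random


def sample_points(n_points, n_dims, constraints, seed=0, max_tries=100_000):
    """Collect n_points feasible points from the unit hypercube by rejection."""
    rng = random.Random(seed)
    points = []
    for _ in range(max_tries):
        candidate = [rng.random() for _ in range(n_dims)]
        # Keep the candidate only if every constraint function is non-negative.
        if all(g(candidate) >= 0 for g in constraints):
            points.append(candidate)
            if len(points) == n_points:
                return points
    raise RuntimeError("too few feasible points found; constraints may be too tight")


# Example constraint (hypothetical): x0 + x1 <= 1, written as 1 - x0 - x1 >= 0.
constraints = [lambda x: 1.0 - x[0] - x[1]]
points = sample_points(5, 2, constraints)
print(len(points))
```

Rejection sampling is easy to reason about but becomes slow when the feasible region is a small fraction of the hypercube, which is why the sketch caps the number of tries.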

Uses Ruby, Sinatra, PostgreSQL, and JavaScript

Understand how the elements are related to each other. Emphasizes electronic configuration of the elements.

My open-source contributions

RDKit cheminformatics package

  • Conceived, proposed, and coded the MolsMatrixToGridImage feature, which takes a two-dimensional (nested) data structure as input to create molecular grid images. The feature was merged into the main codebase by the project maintainer and scheduled for the 2023_09_1 release.
  • Improved documentation by illustrating drawing capability in the tutorial and adding SMILES (chemical notation) for R groups.

SymPy computer algebra system in pure Python

ChemPy package for chemistry in Python

Sphinx documentation generator