My open source contributions, and ML Qs and As

kddubey/blog

My blog

This is mostly a home for simulations for my questions and answers on stats.stackexchange.com and stackoverflow.com.

Here also lies a list of my contributions to open source software.

| dir/file link | q/a link |
| --- | --- |
| select_on_test.ipynb | Demonstrate that a model can simultaneously be selected and evaluated on a test set |
| train_on_test_features | For high rank data and a small test set, train a PCA on test set features to boost test set performance! |
| precision_drop.ipynb | A simple answer to: why did precision drop in production? |
| auprc.ipynb | Demonstrate that integral approximators are trying to hurt you |
| db_sampling_rate.ipynb | Calculate a sampling rate for a database query |
| negative_vs_downsampling.ipynb (not done) | What's the need to formulate negative sampling for contrastive training? |
| to_batch_or_not_to_batch (bad!) | Mathematically analyze and demo dynamic batching |
| var_pred_var_error | Does higher variance in predictions result in higher variance error estimation? |
| sample_via_gumbel | Demonstrate that one can sample directly in log-space |
| (external) cappr | Demonstrate that a more usable version of zero-shot text classification works |
| langchain_save_all.ipynb | Save all method calls. Inspired by this issue |
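
To give a flavor of the sample_via_gumbel entry, here is a minimal sketch of the Gumbel-max trick, which samples a category straight from unnormalized log-probabilities without exponentiating or normalizing them. This is not the code in that directory: the function name, the example logits, and the standard-library-only approach are assumptions made for illustration.

```python
import math
import random
from collections import Counter


def sample_via_gumbel(logits, rng):
    """Return one category index sampled with probability softmax(logits).

    Hypothetical helper (not from the repo): adds independent Gumbel(0, 1)
    noise to each logit and takes the argmax, so the logits are never
    exponentiated or normalized.
    """

    def gumbel():
        # Inverse-CDF draw: -log(-log(U)) with U ~ Uniform(0, 1).
        u = rng.random()
        while u == 0.0:  # rng.random() can return 0.0, and log(0) is undefined
            u = rng.random()
        return -math.log(-math.log(u))

    noisy = [logit + gumbel() for logit in logits]
    return max(range(len(logits)), key=noisy.__getitem__)


# Example: unnormalized log-probabilities (shifted by an arbitrary constant).
probs = (0.1, 0.2, 0.7)
logits = [math.log(p) + 5.0 for p in probs]

rng = random.Random(0)
n = 50_000
counts = Counter(sample_via_gumbel(logits, rng) for _ in range(n))
freqs = [counts[i] / n for i in range(len(probs))]
print(freqs)  # empirical frequencies should be close to probs
```

The constant shift of 5.0 is there to show that the logits do not need to be normalized: adding a constant to every logit leaves the argmax distribution unchanged.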

My dumber code dumps are in dumpy.

Setup

Requires Python 3.8+.

Create a virtual environment named blog using venv:

cd /your/venvs

python -m venv blog

source blog/bin/activate

python -m pip install -r /path/to/blog/requirements.txt

If the notebook says that it needs to run on a GPU machine, and you have a Google account, open the notebook in Google Colab.

Usage

Interact with the code via Jupyter. I like VS Code notebooks.
