nanogpt with labels

An implementation of nanoGPT with labels. For comparable hyperparameters, performance on the task of subtracting two single-digit numbers improves.

subtraction task results

ranking sum, lower is better

`no_labels`

39M parameter model, 512 context window

eval_type	sum
0shot_cot	373
0shot_direct	294
nshot_cot	453
nshot_direct	376

`with_labels`

39M parameter model, 512 (effective) context window

eval_type	sum
0shot_cot	173
0shot_direct	142
nshot_cot	178
nshot_direct	193

`label_embeddings`

39M parameter model, 512 context window

eval_type	sum
0shot_cot	421
0shot_direct	263
nshot_cot	382
nshot_direct	956
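To make the three tables easier to compare, here is a small sketch that totals the ranking sums and reports the percent change of each model against `no_labels`. The numbers are copied from the tables above; the helper name `relative_change` is only illustrative and is not part of the repository.

```python
# Ranking sums copied from the tables above (lower is better).
results = {
    "no_labels": {
        "0shot_cot": 373, "0shot_direct": 294,
        "nshot_cot": 453, "nshot_direct": 376,
    },
    "with_labels": {
        "0shot_cot": 173, "0shot_direct": 142,
        "nshot_cot": 178, "nshot_direct": 193,
    },
    "label_embeddings": {
        "0shot_cot": 421, "0shot_direct": 263,
        "nshot_cot": 382, "nshot_direct": 956,
    },
}

# Total ranking sum per model across the four eval types.
totals = {name: sum(scores.values()) for name, scores in results.items()}


def relative_change(model, baseline="no_labels"):
    """Percent change in ranking sum versus the baseline, per eval type.

    Negative values mean the model ranks better than the baseline.
    """
    return {
        eval_type: round(100 * (results[model][eval_type] - base) / base, 1)
        for eval_type, base in results[baseline].items()
    }


if __name__ == "__main__":
    for name, total in totals.items():
        print(name, total, relative_change(name))
```

Under this comparison `with_labels` is lower than `no_labels` on every eval type, while `label_embeddings` is lower on two and higher on two.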

training

6.5 hours compute budget on 1 H100 for each model
132 batch_size for 512 context window, 38 batch_size for 1048 context window

tokenizer & labeler

BPE-like tokenizer, vocab_size 1024
labels are based on word commonality plus character "type" (digit, typography, emoji, special, etc.)
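As a rough illustration of what such a labeler could look like, here is a minimal sketch. It is not the code in this repository: the label names `common_word`, `rare_word`, `letter`, `space` and `mixed`, the `word_counts` table, and the `common_threshold` cutoff are all assumptions made for the example. Only the digit, typography, emoji and special categories come from the description above.

```python
import unicodedata


def char_type(ch: str) -> str:
    """Coarse character class. Category names follow the README; the rules are assumed."""
    if ch.isdigit():
        return "digit"
    if ch.isalpha():
        return "letter"
    if ch.isspace():
        return "space"
    category = unicodedata.category(ch)
    if category.startswith("P"):  # Unicode punctuation classes
        return "typography"
    if category == "So":  # "Symbol, other", which covers most emoji
        return "emoji"
    return "special"


def label_token(token: str, word_counts: dict[str, int], common_threshold: int = 100) -> str:
    """Label a token by word commonality if it is a word, else by character type.

    `word_counts` is a hypothetical word -> corpus frequency table.
    """
    stripped = token.strip()
    if not stripped:
        return "space"
    if stripped.isalpha():
        count = word_counts.get(stripped.lower(), 0)
        return "common_word" if count >= common_threshold else "rare_word"
    types = {char_type(c) for c in stripped}
    return types.pop() if len(types) == 1 else "mixed"


if __name__ == "__main__":
    counts = {"the": 5000}
    for tok in [" the", "zyzzyva", "7", "-", "="]:
        print(repr(tok), label_token(tok, counts))
```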

original hypothesis & findings

The original hypothesis was that the label embedding model would improve eval performance over no labels. This turned out to be false: the label embedding model does better on some eval types (0shot_direct, nshot_cot) and worse on others (0shot_cot, nshot_direct), so the result is mixed overall. On the other hand, putting a label on every second token (the `with_labels` model) does improve eval performance for comparable hyperparameters. The intuition is that the extra token gives the model a chance to "think".
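The "label on every second token" layout can be sketched as follows. This is an assumption about the arrangement, not the repository's code: it places each label directly before its token, and the function names `interleave_labels` and `strip_labels` are hypothetical. In practice the label ids would also need to live in an id range that does not collide with token ids.

```python
def interleave_labels(token_ids, label_ids):
    """Return [label_0, token_0, label_1, token_1, ...].

    Assumes one label per token and that the label comes first.
    """
    if len(token_ids) != len(label_ids):
        raise ValueError("need exactly one label per token")
    sequence = []
    for label, token in zip(label_ids, token_ids):
        sequence.extend((label, token))
    return sequence


def strip_labels(sequence):
    """Recover the plain token ids from an interleaved sequence."""
    return sequence[1::2]


if __name__ == "__main__":
    tokens = [10, 11, 12]
    labels = [1, 2, 1]
    mixed = interleave_labels(tokens, labels)
    print(mixed)                # labels and tokens alternate
    print(strip_labels(mixed))  # original tokens
```

Because every token is paired with a label, an interleaved sequence is twice as long as the token sequence it carries, which is why the `with_labels` context window is described as "effective".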

caveats

The model may be memorizing subtraction results from the dataset, which would make this more of a recall test than a calculation test. However, there are instances where the model gets the right result for a problem that is not in the dataset.
No hyperparameter search was done.
main branch is for evals, training is in real_run
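One way to separate recall from calculation is to split the eval problems by whether they appear in the training text. The sketch below assumes problems are written in a form like `7 - 2 = 5`. That format, the regular expression, and the function names are assumptions for illustration; the actual dataset may look different.

```python
import re

# Matches single-digit subtraction such as "7 - 2 = 5" or "3-8=-5".
# The lookarounds keep multi-digit numbers like "12 - 3 = 9" from matching.
PROBLEM = re.compile(r"(?<!\d)(\d)\s*-\s*(\d)\s*=\s*(-?\d)(?!\d)")


def seen_problems(training_text):
    """Set of (a, b) operand pairs that occur in the training text."""
    return {(int(a), int(b)) for a, b, _ in PROBLEM.findall(training_text)}


def split_by_exposure(eval_pairs, training_text):
    """Split eval pairs into those seen in training (recall) and unseen (novel)."""
    seen = seen_problems(training_text)
    recall = [pair for pair in eval_pairs if pair in seen]
    novel = [pair for pair in eval_pairs if pair not in seen]
    return recall, novel


if __name__ == "__main__":
    text = "7 - 2 = 5\n3 - 8 = -5\n12 - 3 = 9"
    recall, novel = split_by_exposure([(7, 2), (9, 1)], text)
    print("seen in training:", recall)
    print("not seen in training:", novel)
```

Scoring the two groups separately would show how much of the eval performance depends on problems the model has already seen.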

weights

See here


noway/labels-nanogpt
