{"payload":{"feedbackUrl":"https://github.com/orgs/community/discussions/53140","repo":{"id":573203730,"defaultBranch":"main","name":"tiktoken","ownerLogin":"openai","currentUserCanPush":false,"isFork":false,"isEmpty":false,"createdAt":"2022-12-01T23:22:11.000Z","ownerAvatar":"https://avatars.githubusercontent.com/u/14957082?v=4","public":true,"private":false,"isOrgOwned":true},"refInfo":{"name":"","listCacheKey":"v0:1715623401.0","currentOid":""},"activityList":{"items":[{"before":"bfe00ad1bf59fac47513b45fe5173672dcbbcbb4","after":"c0ba74c238d18b4824c25f3c27fc8698055b9a76","ref":"refs/heads/main","pushedAt":"2024-05-13T19:24:59.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"hauntsaninja","name":"Shantanu","path":"/hauntsaninja","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/12621235?s=80&v=4"},"commit":{"message":"update README to mention gpt-4o","shortMessageHtmlLink":"update README to mention gpt-4o"}},{"before":"9d01e5670ff50eb74cdb96406c7f3d9add0ae2f8","after":"bfe00ad1bf59fac47513b45fe5173672dcbbcbb4","ref":"refs/heads/main","pushedAt":"2024-05-13T17:29:24.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"hauntsaninja","name":"Shantanu","path":"/hauntsaninja","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/12621235?s=80&v=4"},"commit":{"message":"Bump cibuildwheel","shortMessageHtmlLink":"Bump cibuildwheel"}},{"before":"1b9faf2779855124f05174adf1383e53689ed94b","after":"9d01e5670ff50eb74cdb96406c7f3d9add0ae2f8","ref":"refs/heads/main","pushedAt":"2024-05-13T17:09:26.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"hauntsaninja","name":"Shantanu","path":"/hauntsaninja","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/12621235?s=80&v=4"},"commit":{"message":"Sync codebase","shortMessageHtmlLink":"Sync codebase"}},{"before":"6defed51291184e3de4cb3ac8329994d0cc1d721","after":"1b9faf2779855124f05174adf1383e53689ed94b","ref":"refs/heads/main","pushedAt":"2024-02-11T08:20:22.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"hauntsaninja","name":"Shantanu","path":"/hauntsaninja","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/12621235?s=80&v=4"},"commit":{"message":"Simplify byte_pair_merge (#255)\n\nBased on suggestion in https://github.com/openai/tiktoken/pull/239\r\n(specifically 8f5dd7d)\r\n\r\nLike that commit, this:\r\n- Does the init in a single loop and saves a loop if there are no merges\r\n- Simplifies get_rank and no longer uses it in init (so you don't need\r\nmultiple skip values)\r\n\r\nUnlike that commit:\r\n- We drop optimisations enabled by ignoring single tokens. These didn't\r\nshow any benefit on benchmarks for me (this makes sense given typical\r\npiece sizes, but let me know if that's unexpected!). Given this, I opted\r\nfor the simpler version.\r\n- I preserve some of the comments from the original that I think are\r\nstill useful\r\n\r\nCo-authored-by: @paplorinc\r\n\r\n---------\r\n\r\nCo-authored-by: Lőrinc Pap <1841944+paplorinc@users.noreply.github.com>","shortMessageHtmlLink":"Simplify byte_pair_merge (#255)"}},{"before":"2cc09e0776964c30e51f5a6475d9cd6e1572c828","after":null,"ref":"refs/heads/byte-pair-merge","pushedAt":"2024-02-11T08:20:22.000Z","pushType":"branch_deletion","commitsCount":0,"pusher":{"login":"hauntsaninja","name":"Shantanu","path":"/hauntsaninja","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/12621235?s=80&v=4"}},{"before":"66a57bae4017d2b41a3e47654db8ad44c56ee66f","after":"2cc09e0776964c30e51f5a6475d9cd6e1572c828","ref":"refs/heads/byte-pair-merge","pushedAt":"2024-02-11T08:15:37.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"hauntsaninja","name":"Shantanu","path":"/hauntsaninja","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/12621235?s=80&v=4"},"commit":{"message":"Apply suggestions from code review\n\nCo-authored-by: Lőrinc Pap <1841944+paplorinc@users.noreply.github.com>","shortMessageHtmlLink":"Apply suggestions from code review"}},{"before":null,"after":"66a57bae4017d2b41a3e47654db8ad44c56ee66f","ref":"refs/heads/byte-pair-merge","pushedAt":"2024-02-09T22:36:24.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"hauntsaninja","name":"Shantanu","path":"/hauntsaninja","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/12621235?s=80&v=4"},"commit":{"message":"Simplify byte_pair_merge","shortMessageHtmlLink":"Simplify byte_pair_merge"}},{"before":"053c00fe120f02d084f9f9ad21ed6e9c8d787d64","after":null,"ref":"refs/heads/paplorinc-3","pushedAt":"2024-02-09T21:10:02.000Z","pushType":"branch_deletion","commitsCount":0,"pusher":{"login":"hauntsaninja","name":"Shantanu","path":"/hauntsaninja","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/12621235?s=80&v=4"}},{"before":"b4c687ef3625e1737fba4f6643d7bedb9d6d2b6d","after":"6defed51291184e3de4cb3ac8329994d0cc1d721","ref":"refs/heads/main","pushedAt":"2024-02-09T21:10:01.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"hauntsaninja","name":"Shantanu","path":"/hauntsaninja","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/12621235?s=80&v=4"},"commit":{"message":"Inline custom mapping function in _byte_pair_merge","shortMessageHtmlLink":"Inline custom mapping function in _byte_pair_merge"}},{"before":"b9ebf0fe68b0d23226d80cf4611192768bbaf92f","after":null,"ref":"refs/heads/paplorinc-2","pushedAt":"2024-02-09T21:09:49.000Z","pushType":"branch_deletion","commitsCount":0,"pusher":{"login":"hauntsaninja","name":"Shantanu","path":"/hauntsaninja","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/12621235?s=80&v=4"}},{"before":"6e4851a76be22a4f9cc428de3ea39d50ca767c60","after":"b4c687ef3625e1737fba4f6643d7bedb9d6d2b6d","ref":"refs/heads/main","pushedAt":"2024-02-09T21:09:48.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"hauntsaninja","name":"Shantanu","path":"/hauntsaninja","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/12621235?s=80&v=4"},"commit":{"message":"Avoid calling byte_pair_encode for existing tokens\n\nThis was byte_pair_encode can be optimized further, assuming we'll always have at least 2 tokens","shortMessageHtmlLink":"Avoid calling byte_pair_encode for existing tokens"}},{"before":"3774ca182b18e70217e31e14587a796caa44b4d0","after":"053c00fe120f02d084f9f9ad21ed6e9c8d787d64","ref":"refs/heads/paplorinc-3","pushedAt":"2024-02-09T07:42:12.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"hauntsaninja","name":"Shantanu","path":"/hauntsaninja","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/12621235?s=80&v=4"},"commit":{"message":"Inline custom mapping function in _byte_pair_merge","shortMessageHtmlLink":"Inline custom mapping function in _byte_pair_merge"}},{"before":null,"after":"3774ca182b18e70217e31e14587a796caa44b4d0","ref":"refs/heads/paplorinc-3","pushedAt":"2024-02-09T07:41:38.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"hauntsaninja","name":"Shantanu","path":"/hauntsaninja","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/12621235?s=80&v=4"},"commit":{"message":"Inline custom mapping function in _byte_pair_merge","shortMessageHtmlLink":"Inline custom mapping function in _byte_pair_merge"}},{"before":null,"after":"b9ebf0fe68b0d23226d80cf4611192768bbaf92f","ref":"refs/heads/paplorinc-2","pushedAt":"2024-02-09T07:40:15.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"hauntsaninja","name":"Shantanu","path":"/hauntsaninja","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/12621235?s=80&v=4"},"commit":{"message":"Avoid calling byte_pair_encode for existing tokens\n\nThis was byte_pair_encode can be optimized further, assuming we'll always have at least 2 tokens","shortMessageHtmlLink":"Avoid calling byte_pair_encode for existing tokens"}},{"before":"c2960c16c4a0b3a6b1a760eaac50b6a2c89b45fa","after":"6e4851a76be22a4f9cc428de3ea39d50ca767c60","ref":"refs/heads/main","pushedAt":"2024-02-09T07:36:37.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"hauntsaninja","name":"Shantanu","path":"/hauntsaninja","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/12621235?s=80&v=4"},"commit":{"message":"Add finer grained gratitude","shortMessageHtmlLink":"Add finer grained gratitude"}},{"before":"1b81474be4318895a2f1e27cfb4c237cd16ab735","after":null,"ref":"refs/heads/paplorinc-1","pushedAt":"2024-02-09T07:29:07.000Z","pushType":"branch_deletion","commitsCount":0,"pusher":{"login":"hauntsaninja","name":"Shantanu","path":"/hauntsaninja","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/12621235?s=80&v=4"}},{"before":"84d88dca52d591481b986ae386108e632b32cd61","after":"c2960c16c4a0b3a6b1a760eaac50b6a2c89b45fa","ref":"refs/heads/main","pushedAt":"2024-02-09T07:29:06.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"hauntsaninja","name":"Shantanu","path":"/hauntsaninja","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/12621235?s=80&v=4"},"commit":{"message":"Store tokens in u32 instead of usize\n\nAnd hide it behind a Rank type to make it easier to separate it from other numeric values","shortMessageHtmlLink":"Store tokens in u32 instead of usize"}},{"before":null,"after":"1b81474be4318895a2f1e27cfb4c237cd16ab735","ref":"refs/heads/paplorinc-1","pushedAt":"2024-02-09T07:25:52.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"hauntsaninja","name":"Shantanu","path":"/hauntsaninja","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/12621235?s=80&v=4"},"commit":{"message":"Store tokens in u32 instead of usize\n\nAnd hide it behind a Rank type to make it easier to separate it from other numeric values","shortMessageHtmlLink":"Store tokens in u32 instead of usize"}},{"before":"01df4360a1fa4581550b6f144e1541c952824569","after":"84d88dca52d591481b986ae386108e632b32cd61","ref":"refs/heads/main","pushedAt":"2024-02-09T05:52:56.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"hauntsaninja","name":"Shantanu","path":"/hauntsaninja","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/12621235?s=80&v=4"},"commit":{"message":"Allow use of gpt-2 and gpt-3.5 in encoding_for_model (#185)","shortMessageHtmlLink":"Allow use of gpt-2 and gpt-3.5 in encoding_for_model (#185)"}},{"before":"89153d70db062e35c585c06568f6ace6f43079bd","after":"01df4360a1fa4581550b6f144e1541c952824569","ref":"refs/heads/main","pushedAt":"2024-02-09T05:50:48.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"hauntsaninja","name":"Shantanu","path":"/hauntsaninja","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/12621235?s=80&v=4"},"commit":{"message":"Update cibuildwheel","shortMessageHtmlLink":"Update cibuildwheel"}},{"before":"55c8d83da33a2986b5cb259997d6b2c20f078d79","after":"89153d70db062e35c585c06568f6ace6f43079bd","ref":"refs/heads/main","pushedAt":"2024-02-09T05:46:28.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"hauntsaninja","name":"Shantanu","path":"/hauntsaninja","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/12621235?s=80&v=4"},"commit":{"message":"Sync codebase","shortMessageHtmlLink":"Sync codebase"}},{"before":"6cc3a46c8dd43a913612b9c839a51936a2c6bd41","after":"55c8d83da33a2986b5cb259997d6b2c20f078d79","ref":"refs/heads/main","pushedAt":"2024-02-09T02:33:08.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"hauntsaninja","name":"Shantanu","path":"/hauntsaninja","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/12621235?s=80&v=4"},"commit":{"message":"added two new embedding model's encoding (#247)\n\nLibrary doesn't support two new embedding model's encoding mapper\r\n- `text-embedding-3-small`\r\n- `text-embedding-3-large`\r\n\r\nAdded Encoding mapper for 2 new embedding models. The source of mapping\r\nis taken from https://github.com/openai/openai-cookbook/blob/main/examples/How_to_count_tokens_with_tiktoken.ipynb","shortMessageHtmlLink":"added two new embedding model's encoding (#247)"}},{"before":"db5bda9fc93b3171db6c4afea329394e6b6d31ca","after":"6cc3a46c8dd43a913612b9c839a51936a2c6bd41","ref":"refs/heads/main","pushedAt":"2024-02-09T02:17:22.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"hauntsaninja","name":"Shantanu","path":"/hauntsaninja","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/12621235?s=80&v=4"},"commit":{"message":"Optimize regular expressions used for splitting by ~20% (#234)\n\nBy combining the contractions to a single non-capturing group prefixed\r\nby `'`, we can speed up matches by roughly 20%.\r\n\r\nBy using possessive quantifiers for the `cl100k_base` in the word and\r\npunctuation groups we're avoiding some backtracking.\r\n\r\nThe last whitespace groups can also be simplified to have a single\r\nnewline matched explicitly, since the previous whitespace would already\r\nmatch it.\r\n\r\nOverall the regex matches the exact same sequence of characters as\r\nbefore for any case and for unicode sequences.\r\n\r\nCo-authored-by: Lőrinc ","shortMessageHtmlLink":"Optimize regular expressions used for splitting by ~20% (#234)"}},{"before":"6f261deef63b49a7da9000b57a7cf938d7315ab3","after":null,"ref":"refs/heads/paplorinc/optimize-regex","pushedAt":"2024-02-09T02:09:09.000Z","pushType":"branch_deletion","commitsCount":0,"pusher":{"login":"hauntsaninja","name":"Shantanu","path":"/hauntsaninja","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/12621235?s=80&v=4"}},{"before":null,"after":"6f261deef63b49a7da9000b57a7cf938d7315ab3","ref":"refs/heads/paplorinc/optimize-regex","pushedAt":"2024-02-09T02:08:49.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"hauntsaninja","name":"Shantanu","path":"/hauntsaninja","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/12621235?s=80&v=4"},"commit":{"message":"gpt-2 docs","shortMessageHtmlLink":"gpt-2 docs"}},{"before":"3ee6c3517d3465775dc22f06a034cfcf8d06eba7","after":"db5bda9fc93b3171db6c4afea329394e6b6d31ca","ref":"refs/heads/main","pushedAt":"2024-01-30T00:55:58.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"hauntsaninja","name":"Shantanu","path":"/hauntsaninja","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/12621235?s=80&v=4"},"commit":{"message":"Clarify language models in README (#203)","shortMessageHtmlLink":"Clarify language models in README (#203)"}},{"before":"9e79899bc248d5313c7dd73562b5e211d728723d","after":"3ee6c3517d3465775dc22f06a034cfcf8d06eba7","ref":"refs/heads/main","pushedAt":"2024-01-30T00:51:58.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"hauntsaninja","name":"Shantanu","path":"/hauntsaninja","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/12621235?s=80&v=4"},"commit":{"message":"Add support for checking hash of downloaded files before use. (#230)\n\nWe are using tiktoken in various production scenarios and sometimes have\r\nthe problem that the download of `.tiktoken` files (e.g.,\r\n`cl100k_base.tiktoken`) will get interrupted or fail, causing the cached\r\nfile to be corrupted in some way. In those cases, the results returned\r\nfrom the encoder will be incorrect and could be damaging to our\r\nproduction instances.\r\n\r\nMore often, when this happens, `Encoder.encode()` will throw an\r\nexception such as\r\n```\r\npyo3_runtime.PanicException: no entry found for key\r\n```\r\nwhich turns out to be quite hard to track down.\r\n\r\nIn an effort to make tiktoken more robust for production use, this PR\r\nadds the `sha256` hash of each of the downloaded files to\r\n`openai_public.py` and augments `read_file` to check for the hash, if\r\nprovided, when the file is accessed from the cache or downloaded\r\ndirectly. This causes errors to be flagged at file load time, rather\r\nthan when the files are used, and provides a more meaningful error\r\nmessage indicating what might have gone wrong.\r\n\r\nThis also protects users of tiktoken from scenarios where a network\r\nissue or MITM attack could have corrupted these files in transit.","shortMessageHtmlLink":"Add support for checking hash of downloaded files before use. (#230)"}},{"before":"6267f91608dfe813986264f2f5c113317dc91762","after":"9e79899bc248d5313c7dd73562b5e211d728723d","ref":"refs/heads/main","pushedAt":"2023-12-03T08:15:28.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"hauntsaninja","name":"Shantanu","path":"/hauntsaninja","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/12621235?s=80&v=4"},"commit":{"message":"Sync codebase","shortMessageHtmlLink":"Sync codebase"}},{"before":"39f29cecdb6fc38d9a3434e5dd15e4de58cf3c80","after":"6267f91608dfe813986264f2f5c113317dc91762","ref":"refs/heads/main","pushedAt":"2023-12-03T08:06:35.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"hauntsaninja","name":"Shantanu","path":"/hauntsaninja","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/12621235?s=80&v=4"},"commit":{"message":"Sync codebase","shortMessageHtmlLink":"Sync codebase"}},{"before":"52fceb8fa1d287680e81c84bd300dbd1a1acd0cc","after":"39f29cecdb6fc38d9a3434e5dd15e4de58cf3c80","ref":"refs/heads/main","pushedAt":"2023-09-13T00:40:05.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"hauntsaninja","name":"Shantanu","path":"/hauntsaninja","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/12621235?s=80&v=4"},"commit":{"message":"Sync codebase","shortMessageHtmlLink":"Sync codebase"}}],"hasNextPage":true,"hasPreviousPage":false,"activityType":"all","actor":null,"timePeriod":"all","sort":"DESC","perPage":30,"cursor":"djE6ks8AAAAESMG0mQA","startCursor":null,"endCursor":null}},"title":"Activity · openai/tiktoken"}