# Activity · IBM/text-generation-inference

Recent activity in the public IBM/text-generation-inference repository (default branch: `main`, created 2023-06-01).

- **2024-05-16** @joerunde pushed 1 commit to `counter_totals`: 👷 turn off builds on this branch
- **2024-05-16** @joerunde pushed 1 commit to `counter_totals`: 🐛 use bookworm for correct libssl version
- **2024-05-16** @maxdebayser created branch `health_check_investigation`: add logging statements
- **2024-05-16** @joerunde pushed 1 commit to `counter_totals`: ⚗️ push image off this branch for sanity checks
- **2024-05-15** @joerunde pushed 1 commit to `counter_totals`: ⚗️ try updating a lot, specifying listener for metrics exporter
- **2024-05-15** @joerunde pushed 1 commit to `counter_totals`: ♻️ simplify metrics logic
- **2024-05-15** @joerunde pushed 1 commit to `counter_totals`: 🐛 fix counter labels
- **2024-05-15** @joerunde created branch `counter_totals`: ♻️ move metrics into one file
- **2024-05-14** @njhill deleted branch `tpa-cleanup-blocks`
- **2024-05-14** @njhill merged "Free blocks in KVCacheManager upon error" (#96) into `main` (authored by Thomas Parnell)
  - Motivation: pods using speculative decoding were being restarted in BAM because health checks were failing; the logs showed the server running out of KV cache blocks and never recovering.
  - Modifications: if something goes wrong while generating a token, the blocks associated with that batch are now freed; the child sequences created during speculation are also freed when speculation fails.
  - Result: verified that this allows the server to recover from failures caused by running out of blocks, so the inference server should no longer need to be restarted.
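The recovery pattern in #96 amounts to releasing a batch's cache blocks whenever a generation step fails. The sketch below is illustrative only; `BlockAllocator` and `generate_step` are hypothetical stand-ins, not the server's actual KVCacheManager API.

```python
# Hypothetical sketch of the "free blocks on error" pattern; names are
# illustrative, not the classes used in the TGIS server.

class BlockAllocator:
    """Toy stand-in for a paged KV-cache block manager."""

    def __init__(self, num_blocks: int):
        self.free_blocks = list(range(num_blocks))
        self.allocated: dict[int, list[int]] = {}  # seq_id -> block ids

    def allocate(self, seq_id: int, n: int) -> None:
        if len(self.free_blocks) < n:
            raise RuntimeError("out of KV cache blocks")
        self.allocated[seq_id] = [self.free_blocks.pop() for _ in range(n)]

    def free_sequence(self, seq_id: int) -> None:
        self.free_blocks.extend(self.allocated.pop(seq_id, []))


def generate_step(allocator: BlockAllocator, batch_seq_ids: list[int]) -> None:
    """Run one generation step; on any failure, release the batch's blocks
    (including any speculative child sequences) so the server can recover."""
    try:
        for seq_id in batch_seq_ids:
            allocator.allocate(seq_id, n=4)
        # ... forward pass / token generation would happen here ...
    except Exception:
        for seq_id in batch_seq_ids:
            allocator.free_sequence(seq_id)
        raise
```

The key point is the `except` path: without it, blocks leak on every failed step and the pool is eventually exhausted.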
- **2024-05-14** @tdoublep created branch `tpa-cleanup-blocks`: Free blocks in KVCacheManager upon error
- **2024-05-13** @maxdebayser pushed 4 commits to `lm-eval`: Merge branch 'main' into lm-eval
- **2024-05-10** @tjohnson31415 deleted branch `fix-modelinfo-eos`
- **2024-05-10** @tjohnson31415 merged "fix: check for tokenizer eos_token in ModelInfo response" (#93) into `main`
  - Uses the same logic to determine the `eos_token_id` in ModelInfo as in other functions, by falling back to the tokenizer's `eos_token_id` attribute if the model config does not define one. Fixes the behavior for models whose config has no `eos_token_id`. Resolves https://github.com/IBM/text-generation-inference/issues/91.
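The fallback described in #93 can be summarized in a few lines. This is a hedged sketch; `resolve_eos_token_id` and its arguments are illustrative names, not the actual server code.

```python
# Illustrative eos_token_id resolution: prefer the model config, fall back to
# the tokenizer. Works with transformers AutoConfig/AutoTokenizer objects, but
# accepts anything that exposes an eos_token_id attribute.
from typing import Optional


def resolve_eos_token_id(model_config, tokenizer) -> Optional[int]:
    eos = getattr(model_config, "eos_token_id", None)
    if eos is None:
        eos = getattr(tokenizer, "eos_token_id", None)
    return eos
```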
- **2024-05-10** @tjohnson31415 force-pushed `fix-modelinfo-eos`: fix: check for tokenizer eos_token in ModelInfo response
- **2024-05-10** @tjohnson31415 deleted branch `cache-env-vars`
- **2024-05-10** @njhill merged "feat: deprecate TRANSFORMERS_CACHE, use HF_HUB_CACHE everywhere" (#89) into `main` (authored by Travis Johnson)
  - Motivation: `TRANSFORMERS_CACHE` is deprecated (slated for removal in Transformers v5) and `HUGGINGFACE_HUB_CACHE` is legacy, so this standardizes on `HF_HUB_CACHE` to configure the cache. Not all operations and CLI commands were pulling from `TRANSFORMERS_CACHE` correctly, so both env vars had to be set anyway; after this change everything works with only `HF_HUB_CACHE`.
  - Modifications: the launcher inspects `HF_HUB_CACHE` to determine the model cache path; `TRANSFORMERS_CACHE` and `HUGGINGFACE_HUB_CACHE` are still checked, with a deprecation warning, and an error is raised if multiple values are present and do not match. The launcher can resolve the default `HF_HUB_CACHE` so it does not need to be set (`HF_HOME` or its default is used instead). The server CLI warns if `TRANSFORMERS_CACHE` is set and returns an error if both `TRANSFORMERS_CACHE` and `HF_HUB_CACHE` are set with different values.
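The precedence rules in #89 can be sketched as a small resolver. This is only an approximation of the behavior the PR describes; the function name, warning text, and exact error handling are assumptions, not the launcher's implementation.

```python
# Hedged sketch of cache-path resolution: prefer HF_HUB_CACHE, accept the
# deprecated/legacy variables with a warning, error on conflicting values,
# and fall back to the standard HF_HOME default otherwise.
import os
import warnings
from pathlib import Path


def resolve_hub_cache() -> Path:
    values = {}
    for var in ("HF_HUB_CACHE", "HUGGINGFACE_HUB_CACHE", "TRANSFORMERS_CACHE"):
        val = os.environ.get(var)
        if val:
            values[var] = str(Path(val).expanduser())

    for var in ("HUGGINGFACE_HUB_CACHE", "TRANSFORMERS_CACHE"):
        if var in values:
            warnings.warn(f"{var} is deprecated here; set HF_HUB_CACHE instead")

    if len(set(values.values())) > 1:
        raise ValueError(f"conflicting cache env vars: {values}")
    if values:
        return Path(next(iter(values.values())))

    # Default mirrors the Hugging Face convention: $HF_HOME/hub,
    # with HF_HOME defaulting to ~/.cache/huggingface.
    hf_home = os.environ.get(
        "HF_HOME", os.path.join(os.path.expanduser("~"), ".cache", "huggingface")
    )
    return Path(hf_home) / "hub"
```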
- **2024-05-10** @tjohnson31415 force-pushed `cache-env-vars`: fix: more proper path handling for HF_HOME / HOME
- **2024-05-10** @tdoublep deleted branch `tpa-tp-cache-fix`
- **2024-05-10** @tdoublep merged "Set TP argument correctly when instantiating PagedKVCacheManager" (#94) into `main`
  - Motivation: users were seeing runtime errors when trying to use TP>1 with speculative decoding. Modification: set the tensor parallel argument correctly when instantiating the PagedKVCacheManager. Result: verified that this change resolves the reported issue. Related: https://huggingface.co/ibm-fms/llama3-8b-accelerator/discussions/1
- **2024-05-10** @tdoublep created branch `tpa-tp-cache-fix`: Set TP argument correctly when instantiating PagedKVCacheManager
- **2024-05-09** @tjohnson31415 pushed 1 commit to `cache-env-vars`: fix: more proper path handling for HF_HOME / HOME
- **2024-05-09** @tjohnson31415 created branch `fix-modelinfo-eos`: fix: check for tokenizer eos_token in ModelInfo response
- **2024-05-09** @maxdebayser created branch `lm-eval`: TGIS gRPC adapter for lm-eval
  - This PoC adds a backend to lm-eval so that it can call a running TGIS or tgis-vllm server over gRPC. It can run benchmarks based on the generate function for decoder and encoder-decoder models. For the logprobs function, only decoder models are supported because TGIS does not return the input logprobs for encoder-decoder models.
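As context for the lm-eval entry above: a backend in lm-eval-harness is a subclass of its `LM` interface registered under a model name. The sketch below assumes the v0.4 interface (`register_model`, `LM`, `generate_until`); the class name, endpoint, and placeholder client are assumptions, and the real adapter would call the TGIS generation service over gRPC instead of returning stub responses.

```python
# Minimal shape of an lm-eval backend, as a sketch only. The gRPC client is a
# placeholder; the actual adapter would use the TGIS generation stubs.
from lm_eval.api.model import LM
from lm_eval.api.registry import register_model


@register_model("tgis_grpc_sketch")  # hypothetical backend name
class TGISGRPCSketch(LM):
    def __init__(self, endpoint: str = "localhost:8033", **kwargs):
        super().__init__()
        self.endpoint = endpoint  # a real backend would open a gRPC channel here

    def generate_until(self, requests):
        # Each request carries (context, gen_kwargs); a real backend would send
        # one generation RPC per context and return the generated text.
        return ["" for _ in requests]  # placeholder responses

    def loglikelihood(self, requests):
        raise NotImplementedError("decoder-only logprob support would go here")

    def loglikelihood_rolling(self, requests):
        raise NotImplementedError
```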
- **2024-05-08** @tjohnson31415 force-pushed `cache-env-vars`: review: apply suggested refactors (co-authored by Nick Hill)
- **2024-05-08** @tjohnson31415 pushed 1 commit to `cache-env-vars`: review: apply suggested refactors
- **2024-05-08** @tjohnson31415 pushed 2 commits to `cache-env-vars`: test: update integration tests for HF_HUB_CACHE
- **2024-05-08** @tjohnson31415 created branch `cache-env-vars`: feat: deprecate TRANSFORMERS_CACHE, use HF_HUB_CACHE everywhere
- **2024-05-08** @njhill deleted branch `fix-attn-bias`
- **2024-05-08** @njhill merged "Fix llama gqa attention bias" (#88) into `main`
  - To support IBM Granite Code 8B models.
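For context on #88: the PR title points at the combination of the Llama architecture, grouped-query attention, and bias terms on the attention projections used by the Granite Code models. The sketch below is a generic illustration of that combination, not the actual change in the PR; the class and parameter names are assumptions.

```python
# Generic grouped-query attention projections with an optional bias flag,
# the combination used by Granite Code style Llama configs. Illustrative only.
import torch
import torch.nn as nn


class GQAProjections(nn.Module):
    def __init__(self, hidden_size: int, num_heads: int, num_kv_heads: int,
                 attention_bias: bool = True):
        super().__init__()
        head_dim = hidden_size // num_heads
        # Queries use all heads; keys/values use the smaller KV-head count and
        # are shared across groups of query heads by the attention kernel.
        self.q_proj = nn.Linear(hidden_size, num_heads * head_dim, bias=attention_bias)
        self.k_proj = nn.Linear(hidden_size, num_kv_heads * head_dim, bias=attention_bias)
        self.v_proj = nn.Linear(hidden_size, num_kv_heads * head_dim, bias=attention_bias)

    def forward(self, x: torch.Tensor):
        return self.q_proj(x), self.k_proj(x), self.v_proj(x)
```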
This page shows the 30 most recent events; older activity continues on further pages.