{"payload":{"feedbackUrl":"https://github.com/orgs/community/discussions/53140","repo":{"id":642608852,"defaultBranch":"main","name":"gorilla","ownerLogin":"ShishirPatil","currentUserCanPush":false,"isFork":false,"isEmpty":false,"createdAt":"2023-05-19T00:46:45.000Z","ownerAvatar":"https://avatars.githubusercontent.com/u/30296397?v=4","public":true,"private":false,"isOrgOwned":false},"refInfo":{"name":"","listCacheKey":"v0:1713398042.0","currentOid":""},"activityList":{"items":[{"before":"5da8835a840bdfa9faad09aa7236cf3b51c7d659","after":"ba6463ddc397e4808982299964769496387913ce","ref":"refs/heads/main","pushedAt":"2024-05-23T07:24:19.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"ShishirPatil","name":"Shishir Patil","path":"/ShishirPatil","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/30296397?s=80&v=4"},"commit":{"message":"[BFCL] update tree_sitter version in requirements.txt (#433)\n\nFollowing the instructions in the `BFCL` README and running the\r\n`eval_data_compilation.py` script results in this error:\r\n\r\n```\r\nAttributeError: type object 'tree_sitter.Language' has no attribute 'build_library'\r\n```\r\n\r\nbecause the `build_library` method was removed in a [recent\r\nrelease](https://github.com/tree-sitter/py-tree-sitter/releases/tag/v0.22.0)\r\nof py-tree-sitter. This PR updates the `requirements.txt` file so that\r\nthe instructions in the README will work for anyone following them now.","shortMessageHtmlLink":"[BFCL] update tree_sitter version in requirements.txt (#433)"}},{"before":"d1d0385bb677f5ede3b8de255d048c64ed6a2517","after":"7491d720fb86cd20c6f84226a3487767c60c4523","ref":"refs/heads/gh-pages","pushedAt":"2024-05-16T19:59:55.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"ShishirPatil","name":"Shishir Patil","path":"/ShishirPatil","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/30296397?s=80&v=4"},"commit":{"message":"[Landing Page] Update Author Info (#429)\n\nThis PR adds Xin Wang and Joseph E. Gonzalez to the author section on\r\nthe Gorilla Project landing page.\r\nThis PR **DOES NOT** change the leaderboard value.","shortMessageHtmlLink":"[Landing Page] Update Author Info (#429)"}},{"before":"e04e9ed335b75e323194de1dcbc3db465d4065ff","after":"d1d0385bb677f5ede3b8de255d048c64ed6a2517","ref":"refs/heads/gh-pages","pushedAt":"2024-05-15T20:54:53.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"ShishirPatil","name":"Shishir Patil","path":"/ShishirPatil","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/30296397?s=80&v=4"},"commit":{"message":"Leaderboard Update, in sync with BFCL May 14th Release (GPT-4o and Gemini) (#428)\n\nAs mentioned in #426, this PR addes 4 new models to the leaderboard. The\r\nmodel costs are also updated accordingly.\r\n\r\nThis PR **DOES** change the leaderboard ranking.\r\n\r\nThis PR **DOES NOT** change the leaderboard score other than the added\r\nmodels.","shortMessageHtmlLink":"Leaderboard Update, in sync with BFCL May 14th Release (GPT-4o and Ge…"}},{"before":"45c7d65938cb011a48ef3af252f63165e8631036","after":"e04e9ed335b75e323194de1dcbc3db465d4065ff","ref":"refs/heads/gh-pages","pushedAt":"2024-05-15T18:33:57.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"ShishirPatil","name":"Shishir Patil","path":"/ShishirPatil","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/30296397?s=80&v=4"},"commit":{"message":"Leaderboard Update, in sync with PR#406 (handle parallel function calls from gemini) (#427)\n\nThis PR updates the leaderboard to reflect the change in score due to\r\nthe added support for parallel function calls for gemini series in #406.","shortMessageHtmlLink":"Leaderboard Update, in sync with PR#406 (handle parallel function cal…"}},{"before":"40b07a5de16c28b49f2fa9cd0692197c4b1ed3f2","after":"5da8835a840bdfa9faad09aa7236cf3b51c7d659","ref":"refs/heads/main","pushedAt":"2024-05-15T18:33:43.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"ShishirPatil","name":"Shishir Patil","path":"/ShishirPatil","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/30296397?s=80&v=4"},"commit":{"message":"BFCL May 14th Release (GPT-4o and Gemini) (#426)\n\nThis PR makes 3 models(4 entries) available for inference on BFCL: \r\n- gpt-4o-2024-05-13 (Function Calling Mode and Prompting Mode)\r\n- gemini-1.5-pro-preview-0514 (Function Calling Mode)\r\n- gemini-1.5-flash-preview-0514(Function Calling Mode)\r\n\r\nYou can start the evaluation by running `python\r\nopenfunctions_evaluation.py --model MODEL_NAME` and get score by running\r\n`python ./eval_runner.py --model MODEL_NAME`. For more detail, refer to\r\nReadme under the BFCL page.\r\n\r\nScore changes are reflected in #428 . \r\n\r\nThis PR also updated different models' pricing: \r\n- For Gemini, when prompts are less than 128K tokens, the new Gemini\r\nseries' prices are lowered by around\r\nhalf(https://ai.google.dev/pricing). All the BFCL test cases are less\r\nthan 128K tokens.\r\n- For Anthropic Models, the prices have decreased for claude-2.1 and\r\nClaude-instant-1.2 which have updated accordingly\r\n- For Mistral Models, the prices have been halved for Mistral-large and\r\nMistral-Small\r\n- For OpenAI Models, we have corrected GPT-3.5-turbo-0125 to the price\r\nit should have\r\n\r\n---------\r\n\r\nCo-authored-by: Huanzhi Mao ","shortMessageHtmlLink":"BFCL May 14th Release (GPT-4o and Gemini) (#426)"}},{"before":"2fc82a9bd1b667b594504556a58d0e8ee908c592","after":"40b07a5de16c28b49f2fa9cd0692197c4b1ed3f2","ref":"refs/heads/main","pushedAt":"2024-05-15T07:48:48.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"ShishirPatil","name":"Shishir Patil","path":"/ShishirPatil","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/30296397?s=80&v=4"},"commit":{"message":"[BFCL] Patch Gemini Handler (#421)\n\nBuild on the work of @vandyxiaowei #406, this PR is a bug fix for the\r\ngeneration pipeline for Gemini models ([Gemini-1.5-Pro\r\n(FC)](https://deepmind.google/technologies/gemini/#introduction) and\r\n[Gemini-1.0-Pro\r\n(FC)](https://deepmind.google/technologies/gemini/#introduction)) to\r\nmake it more robust. The Gemini model output `result[\"candidates\"][0]`\r\ndoes not always have the key `\"content\"`, which would cause the current\r\ngeneration pipeline to error out.\r\n\r\nThis PR **DOES NOT** change the leaderboard score.","shortMessageHtmlLink":"[BFCL] Patch Gemini Handler (#421)"}},{"before":"ae5f0a2daeed5a7c53ff8efda143dee55ecb17e9","after":"2fc82a9bd1b667b594504556a58d0e8ee908c592","ref":"refs/heads/main","pushedAt":"2024-05-11T06:03:48.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"ShishirPatil","name":"Shishir Patil","path":"/ShishirPatil","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/30296397?s=80&v=4"},"commit":{"message":"[RAFT] Edit encode prompt to include `:` tag in label (#422)\n\nAdded line `You MUST begin your final answer with the tag \":\".`\r\nin `encode_question_gen`'s prompt to ensure the `:` tag is\r\nincluded in the label.\r\n\r\n---------\r\n\r\nCo-authored-by: Kai Wen ","shortMessageHtmlLink":"[RAFT] Edit encode prompt to include <ANSWER>: tag in label (#422)"}},{"before":"42f9d28b991d942235858ec51d1637bdc7096f97","after":"ae5f0a2daeed5a7c53ff8efda143dee55ecb17e9","ref":"refs/heads/main","pushedAt":"2024-05-10T02:59:41.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"ShishirPatil","name":"Shishir Patil","path":"/ShishirPatil","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/30296397?s=80&v=4"},"commit":{"message":"RAFT Support for chat and completion model formats (#417)\n\nAdds support to convert the dataset to formats expected to fine tune\r\n`completion` and `chat` models as specified there:\r\n\r\nhttps://platform.openai.com/docs/guides/fine-tuning/preparing-your-dataset\r\n\r\n`chat` format:\r\n```\r\n{\"messages\": [{\"role\": \"system\", \"content\": \"Marv is a factual chatbot that is also sarcastic.\"}, {\"role\": \"user\", \"content\": \"What's the capital of France?\"}, {\"role\": \"assistant\", \"content\": \"Paris, as if everyone doesn't know that already.\"}]}\r\n{\"messages\": [{\"role\": \"system\", \"content\": \"Marv is a factual chatbot that is also sarcastic.\"}, {\"role\": \"user\", \"content\": \"Who wrote 'Romeo and Juliet'?\"}, {\"role\": \"assistant\", \"content\": \"Oh, just some guy named William Shakespeare. Ever heard of him?\"}]}\r\n{\"messages\": [{\"role\": \"system\", \"content\": \"Marv is a factual chatbot that is also sarcastic.\"}, {\"role\": \"user\", \"content\": \"How far is the Moon from Earth?\"}, {\"role\": \"assistant\", \"content\": \"Around 384,400 kilometers. Give or take a few, like that really matters.\"}]}\r\n```\r\n\r\n`completion` format:\r\n```\r\n{\"prompt\": \"\", \"completion\": \"\"}\r\n{\"prompt\": \"\", \"completion\": \"\"}\r\n{\"prompt\": \"\", \"completion\": \"\"}\r\n```\r\n\r\n`raft.py`:\r\n- Supports `jsonl` and `parquet` output types\r\n- Supports `hf`, `chat` and `completion` formats\r\n- `chat` format also accepts a `--output-chat-system-prompt` param to\r\nconfigure the system prompt\r\n- Ignore venv folders\r\n- Added usage to --help\r\n\r\n```\r\n --output-format {hf,completion,chat}\r\n Format to convert the dataset to. Defaults to hf.\r\n --output-type {parquet,jsonl}\r\n Type to export the dataset to. Defaults to jsonl.\r\n --output-chat-system-prompt OUTPUT_CHAT_SYSTEM_PROMPT\r\n The system prompt to use when the output format is chat\r\n```\r\n\r\n\r\nNew `format.py` script, to convert dataset previously generated by\r\n`raft.py`:\r\n\r\n```\r\n$ python format.py --help\r\nusage: format.py [-h] --input INPUT [--input-type {arrow,jsonl}] --output OUTPUT --output-format {hf,completion,chat}\r\n [--output-type {parquet,jsonl}] [--output-chat-system-prompt OUTPUT_CHAT_SYSTEM_PROMPT]\r\n\r\noptions:\r\n -h, --help show this help message and exit\r\n --input INPUT Input HuggingFace dataset file\r\n --input-type {arrow,jsonl}\r\n Format of the input dataset. Defaults to arrow.\r\n --output OUTPUT Output file\r\n --output-format {hf,completion,chat}\r\n Format to convert the dataset to\r\n --output-type {parquet,jsonl}\r\n Type to export the dataset to. Defaults to jsonl.\r\n --output-chat-system-prompt OUTPUT_CHAT_SYSTEM_PROMPT\r\n The system prompt to use when the output format is chat\r\n```\r\n\r\nHow to test `format.py` with the `chat` format:\r\n\r\n```\r\npython format.py --input output/data-00000-of-00001.arrow \\\r\n --output output/ucb-short.chat.jsonl \\\r\n --output-format chat \\\r\n --output-chat-system-prompt 'You are an AI expert on UC Berkeley'\r\n```\r\n\r\nHow to test `format.py` with the `completion` format:\r\n\r\n```\r\npython format.py --input output/data-00000-of-00001.arrow \\\r\n --output output/ucb-short.completion.jsonl \\\r\n --output-format completion\r\n```\r\n\r\nHow to test `raft.py` with the `chat` format:\r\n\r\n```\r\npython3 raft.py \\\r\n --datapath $PWD/sample_data/UC_Berkeley_short.pdf \\\r\n --output $PWD/output \\\r\n --distractors 3 \\\r\n --doctype pdf \\\r\n --chunk_size 512 \\\r\n --questions 2 \\\r\n --completion_model gpt-4-turbo \\\r\n --embedding_model text-embedding-ada-002 \\\r\n --output-format chat \\\r\n --output-chat-system-prompt \"You're a RAG AI\"\r\n```\r\n\r\nCo-authored-by: Shishir Patil <30296397+ShishirPatil@users.noreply.github.com>","shortMessageHtmlLink":"RAFT Support for chat and completion model formats (#417)"}},{"before":"0eb02bb932e37e75def5cb0e26153dbc6ac1841a","after":"42f9d28b991d942235858ec51d1637bdc7096f97","ref":"refs/heads/main","pushedAt":"2024-05-09T07:39:34.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"ShishirPatil","name":"Shishir Patil","path":"/ShishirPatil","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/30296397?s=80&v=4"},"commit":{"message":"handle parallel function calls from gemini (#406)\n\nHandle parallel function calls for Gemini handler for the Berkeley\r\nFunction Calling Leaderboard.\r\n\r\nThis PR does NOT change values in BFCL.\r\n\r\nCo-authored-by: Xiaowei Li ","shortMessageHtmlLink":"handle parallel function calls from gemini (#406)"}},{"before":"792a99ac9a7be0f7cb41a3278d21d5d0d2ebecf3","after":"45c7d65938cb011a48ef3af252f63165e8631036","ref":"refs/heads/gh-pages","pushedAt":"2024-05-09T07:22:23.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"ShishirPatil","name":"Shishir Patil","path":"/ShishirPatil","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/30296397?s=80&v=4"},"commit":{"message":"Leaderboard Update, in sync with BFCL May 6th Release (#418)\n\nAs mentioned in #412, this PR updates the leaderboard to reflect the\r\nchange in score due to the bug fix in the evaluation dataset.","shortMessageHtmlLink":"Leaderboard Update, in sync with BFCL May 6th Release (#418)"}},{"before":"c2a376b48bde71d299273fd19ee8fdf0cc67cddf","after":"0eb02bb932e37e75def5cb0e26153dbc6ac1841a","ref":"refs/heads/main","pushedAt":"2024-05-08T17:12:15.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"ShishirPatil","name":"Shishir Patil","path":"/ShishirPatil","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/30296397?s=80&v=4"},"commit":{"message":"RAFT Fix arbitrary code execution vulnerability in checkpoint feature (#415)\n\nReplaced `eval()` by `int()` when loading the checkpoint file to remove\r\nthe possibility of executing arbitrary code.\r\n\r\nCo-authored-by: Shishir Patil <30296397+ShishirPatil@users.noreply.github.com>","shortMessageHtmlLink":"RAFT Fix arbitrary code execution vulnerability in checkpoint feature ("}},{"before":"7f28b2096a4ffc70a97cc580dff3dad88dbef691","after":"c2a376b48bde71d299273fd19ee8fdf0cc67cddf","ref":"refs/heads/main","pushedAt":"2024-05-08T07:19:21.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"ShishirPatil","name":"Shishir Patil","path":"/ShishirPatil","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/30296397?s=80&v=4"},"commit":{"message":"RAFT Add support for configuring separate completion and embedding endpoints + pytest (#396)\n\nFixes #393 \r\n\r\n**Changes**\r\n- OpenAI and AzureOpenAI clients env vars can now be overriden with\r\n`COMPLETION_` prefixed env vars\r\n- OpenAI and AzureOpenAI Langchain embedding clients env vars can now be\r\noverriden with `EMBEDDING_` prefixed env vars\r\n- Added pytest dependency\r\n- Unit tests for `env_config.read_env_config`\r\n- Unit tests for `env_config.set_env`\r\n- Updated README\r\n\r\n**Run unit tests**\r\ncd raft\r\npytest\r\n\r\n**Sample configuration to use a Llama 2 70b Instruct model for\r\ncompletion and ADA 002 for embeddings**\r\n\r\nIn `.env`\r\n\r\n```\r\n# Llama 3 70b Instruct completion model\r\n# Uses an OpenAI v1 compatible endpoint on Azure MaaS\r\n\r\nCOMPLETION_OPENAI_BASE_URL=https://Meta-Llama-3-70B-Instruct--serverless.eastus2.inference.ai.azure.com/v1\r\nCOMPLETION_OPENAI_API_KEY=\r\n\r\n# Ada 2 embedding model\r\n# Uses an Azure OpenAI endpoint\r\n\r\nEMBEDDING_AZURE_OPENAI_ENDPOINT=https://.openai.azure.com/\r\nEMBEDDING_AZURE_OPENAI_API_KEY=\r\nEMBEDDING_OPENAI_API_VERSION=\r\n```\r\n\r\n**Running raft.py**\r\n```\r\ncd raft\r\npython3 raft.py \\\r\n --datapath $PWD/sample_data/UC_Berkeley.pdf \\\r\n --output $PWD/output \\\r\n --distractors 3 \\\r\n --doctype pdf \\\r\n --chunk_size 512 \\\r\n --questions 5 \\\r\n --completion-model Meta-Llama-3-70B-Instruct- \\\r\n --embedding-model text-embedding-ada-002\r\n```","shortMessageHtmlLink":"RAFT Add support for configuring separate completion and embedding en…"}},{"before":"1418b602b78d0d2e0a2637fe2f2c8e96a7b53eb3","after":"7f28b2096a4ffc70a97cc580dff3dad88dbef691","ref":"refs/heads/main","pushedAt":"2024-05-08T06:02:09.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"ShishirPatil","name":"Shishir Patil","path":"/ShishirPatil","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/30296397?s=80&v=4"},"commit":{"message":"RAFT DevContainer for GitHub Codespaces (#379)\n\nThis PR adds support for a GitHub Codespaces Dev Container for the\r\n`raft` sub-project:\r\n - Dev Container config `RAFT` with direnv with shell hooks\r\n- Adds `Open in GitHub Codespaces` badge to README.md pre-configured\r\nwith `RAFT` config\r\n- post-create script creates env using the python venv module and points\r\ndirenv at it\r\n - Upgrades pip and installs requirements\r\n\r\nNote: The Dev Container configuration can also be used locally in VS\r\nCode in addition to remotely in GitHub Codespaces.\r\n\r\nThis PR is similar to #351 but targets the `raft` sub project.\r\n\r\nCo-authored-by: Shishir Patil <30296397+ShishirPatil@users.noreply.github.com>","shortMessageHtmlLink":"RAFT DevContainer for GitHub Codespaces (#379)"}},{"before":"0438e38f9253818723162d6c9ff803a0e7b4605e","after":"792a99ac9a7be0f7cb41a3278d21d5d0d2ebecf3","ref":"refs/heads/gh-pages","pushedAt":"2024-05-07T07:52:56.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"ShishirPatil","name":"Shishir Patil","path":"/ShishirPatil","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/30296397?s=80&v=4"},"commit":{"message":"Update to LangChain integration example (#400)\n\nThere has been a refactoring of the LangChain codebase earlier this\r\nyear. I've updated the LangChain integration example to reflect this.\r\n\r\nOn running the example, this is the output -\r\n```\r\n<<>>: Natural Language Processing Translation\r\n<<>>: translation = pipeline('translation_en_to_zh', model='Helsinki-NLP/opus-mt-en-zh')\r\n<<>>: Hugging Face Transformers\r\n<<>>: 1. Import the pipeline function from the transformers library provided by Hugging Face.\r\n2. Create a translation pipeline using the 'Helsinki-NLP/opus-mt-en-zh' model, which is specifically designed for translating English text to Chinese.\r\n3. Provide the input text to the translation pipeline and get the translated Chinese text.\r\n<<>>:\r\nfrom transformers import pipeline\r\n\r\ndef load_model():\r\n translation = pipeline('translation_en_to_zh', model='Helsinki-NLP/opus-mt-en-zh')\r\n return translation\r\n\r\ndef process_data(input_text, translation):\r\n response = translation(input_text, max_length=512)\r\n translated_text = response[0]['translation_text']\r\n return translated_text\r\n\r\ninput_text = \"Hello, how are you?\"\r\n\r\n# Load the translation model and process the data\r\ntranslation = load_model()\r\nresponse = process_data(input_text, translation)\r\nprint(response)\r\n\r\n```","shortMessageHtmlLink":"Update to LangChain integration example (#400)"}},{"before":"077b3b3d4361e81156998c4f7ec52a4522caa0ac","after":"1418b602b78d0d2e0a2637fe2f2c8e96a7b53eb3","ref":"refs/heads/main","pushedAt":"2024-05-07T07:46:13.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"ShishirPatil","name":"Shishir Patil","path":"/ShishirPatil","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/30296397?s=80&v=4"},"commit":{"message":"BFCL May 6th Release (Dataset Bug Fix) (#412)\n\nThis PR is for the BFCL May 6th release.\r\n\r\nThanks to Brian Zhang at OpenAI for helping raise some of the issues.\r\n\r\n1. Bug fix in the evaluation dataset. This involves modifying both\r\nprompts and function docs.\r\n2. Bug fix for possible answers.\r\n\r\nThe detailed breakdown is attached below. If you spot any issue with our\r\nevaluation dataset and/or possible answers, please feel free to raise an\r\nissue/PR!\r\n\r\n| Test Category | Prompt/Func Doc Correction Count | Possible Answer\r\nCorrection Count |\r\n\r\n|---------------------|-----------------------------|-----------------------------|\r\n| Simple | 6 | 7 |\r\n| Parallel | 0 | 0|\r\n| Multiple | 0 | 1 |\r\n| Parallel Multiple | 0 | 1 |\r\n\r\nThis PR **DOES** change the leaderboard score. We will update the\r\nleaderboard website shortly, in a different PR.\r\n\r\n---------\r\n\r\nCo-authored-by: Charlie Cheng-Jie Ji ","shortMessageHtmlLink":"BFCL May 6th Release (Dataset Bug Fix) (#412)"}},{"before":"624d371e085eefd23ae3f818107490ee8f59e8d9","after":"077b3b3d4361e81156998c4f7ec52a4522caa0ac","ref":"refs/heads/main","pushedAt":"2024-05-07T06:27:10.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"ShishirPatil","name":"Shishir Patil","path":"/ShishirPatil","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/30296397?s=80&v=4"},"commit":{"message":"Small corrections to possible_answers for simple test category (#405)\n\nI’m redoing the leaderboard results and scores for some models in the\r\n`simple` test category. I’ve noticed the model is marked down for\r\nanswers that seem correct to me. Here are my suggestions for the\r\ncorrections.\r\n\r\nThis PR DOES change the leaderboard scores, which will be updated later.","shortMessageHtmlLink":"Small corrections to possible_answers for simple test category (#405)"}},{"before":"4e8dac12515d847694bbfd4d1c15020ccf5bf263","after":"624d371e085eefd23ae3f818107490ee8f59e8d9","ref":"refs/heads/main","pushedAt":"2024-05-04T05:59:46.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"ShishirPatil","name":"Shishir Patil","path":"/ShishirPatil","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/30296397?s=80&v=4"},"commit":{"message":"RAFT Recovery Mode for interruptions (#410)\n\nImplemented a \"safe\"/recovery mode that periodically saves chunks into\r\n\"checkpoint\" datasets while also keeping track of the current chunk\r\nnumber. In the case of an interruption or crash, the script resumes at\r\nthe saved chunk number. After all chunks have been processed, all\r\ncheckpoint datasets are concatenated and saved as one final dataset.\r\n\r\nAdded an argument allowing user to choose whether to run RAFT in safe or\r\nfast mode (defaults to safe).\r\n\r\nClose #394\r\n\r\n---------\r\n\r\nCo-authored-by: Kaihao Wen ","shortMessageHtmlLink":"RAFT Recovery Mode for interruptions (#410)"}},{"before":"8e00c855673a6f4ae9a7e300acfaac2ed57010b1","after":"0438e38f9253818723162d6c9ff803a0e7b4605e","ref":"refs/heads/gh-pages","pushedAt":"2024-04-28T18:20:04.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"ShishirPatil","name":"Shishir Patil","path":"/ShishirPatil","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/30296397?s=80&v=4"},"commit":{"message":"Improve User Experience for Wagon Wheel (#399)\n\nThis PR aims to improve the user experience when interacting with the\r\nwagon wheel on the BFCL leaderboard. Due to the huge number of entries,\r\nthe wagon wheel is having a hard time displaying them all. This PR\r\naddresses the issue for laptop users. This issue still exists on mobile,\r\nand we will tackle it later.\r\n\r\nThis PR **DOES NOT** change the leaderboard score.","shortMessageHtmlLink":"Improve User Experience for Wagon Wheel (#399)"}},{"before":"7ae329d17c69241a2f8b3a824ab17fde25e56ae0","after":"8e00c855673a6f4ae9a7e300acfaac2ed57010b1","ref":"refs/heads/gh-pages","pushedAt":"2024-04-28T07:37:21.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"ShishirPatil","name":"Shishir Patil","path":"/ShishirPatil","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/30296397?s=80&v=4"},"commit":{"message":"Leaderboard Update, in sync with BFCL April 28th (New Model: `snowflake/arctic`) (#398)\n\nThis PR updates website leaderboard according with newly added model\r\n`snowflake/arctic` from #397\r\n\r\nThis PR **DOES** change the leaderboard ranking.\r\n\r\nThis PR **DOES NOT** change the leaderboard score other than the added\r\nmodel.\r\n\r\n---------\r\n\r\nCo-authored-by: Huanzhi (Hans) Mao ","shortMessageHtmlLink":"Leaderboard Update, in sync with BFCL April 28th (New Model: `snowfla…"}},{"before":"cff48afcef8932076b1a5efe075242c595150a4c","after":"4e8dac12515d847694bbfd4d1c15020ccf5bf263","ref":"refs/heads/main","pushedAt":"2024-04-28T07:37:05.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"ShishirPatil","name":"Shishir Patil","path":"/ShishirPatil","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/30296397?s=80&v=4"},"commit":{"message":"BFCL April 28th Release (New Model: snowflake/arctic) (#397)\n\nIn this PR, we have added `snowflake/arctic` to leaderboard:\r\n\r\n- The inference was done through [Nvidia API\r\ncatalog](https://build.nvidia.com/explore/discover#arctic) endpoint.\r\nLatencies are recorded and the cost is marked as \"N/A\" as the inference\r\nconsumed Nvidia credits which were provided free upon registration.\r\n- The leaderboard website will be updated shortly to reflect these new\r\nentries, in a different PR.\r\n\r\nNotes on Nvidia API catalog: We didn't have enough compute resources to\r\nhost full precision Arctic inference locally and therefore looked for\r\nthird-party hosting. Out of Snowflake Cortex, Together.ai, and Nvidia\r\nAPI catalog, we decided to use Nvidia's service as it does not have\r\ncontext length limit and has detailed documentation of Python usage.\r\n\r\n---------\r\n\r\nCo-authored-by: Huanzhi Mao ","shortMessageHtmlLink":"BFCL April 28th Release (New Model: snowflake/arctic) (#397)"}},{"before":"2c87d43ca163d426f503dfe198845e70c491b969","after":"7ae329d17c69241a2f8b3a824ab17fde25e56ae0","ref":"refs/heads/gh-pages","pushedAt":"2024-04-28T04:29:49.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"ShishirPatil","name":"Shishir Patil","path":"/ShishirPatil","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/30296397?s=80&v=4"},"commit":{"message":"Gorilla Website Styling Update (#389)\n\nThis PR updates the appearance of the Gorilla website landing page. \r\n\r\n1. Add button links to the papers\r\n2. Change the wording for the scrolling bar. \r\n3. Change nav bar ordering for the leaderboard to be consistant.\r\n\r\nIt **DOES NOT** change the leaderboard value","shortMessageHtmlLink":"Gorilla Website Styling Update (#389)"}},{"before":"46e959b73be6a40c233e36c71c268ce3a9eabe36","after":"2c87d43ca163d426f503dfe198845e70c491b969","ref":"refs/heads/gh-pages","pushedAt":"2024-04-27T08:47:44.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"ShishirPatil","name":"Shishir Patil","path":"/ShishirPatil","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/30296397?s=80&v=4"},"commit":{"message":"Leaderboard Update, in sync with BFCL April 27th Release (#391)\n\nAs mentioned in #390, in this PR, we fix some inconsistency issues in\r\nthe cost and latency calculation for open-source models, which are now\r\nall calculated when serving the model with\r\n[vLLM](https://github.com/vllm-project/vllm) using 8 V100 GPUs.\r\n$$\\text{Cost} = \\text{Latency per 1000 function call} * (\\text{8xV100\r\nazure-pay-as-you-go-price per hour / 3600})$$\r\n\r\nWe want to thank the community for pointing out this oversight. Thanks\r\n[@abacaj](https://twitter.com/abacaj) and\r\n[@Teknium1](https://twitter.com/Teknium1) for initially raising the\r\nissue, and thanks [@natikgadzhi](https://twitter.com/natikgadzhi)\r\n[@HamelHusain](https://twitter.com/HamelHusain)\r\n[@nicoritschel](https://twitter.com/nicoritschel)\r\n[@winglian](https://twitter.com/winglian)\r\n[@olafgeibig](https://twitter.com/olafgeibig) and many others for\r\njoining the conversation. We are listening to community feedback and\r\ncontinuously improving our Berkeley Function Calling Leaderboard.\r\nDiscussions like\r\n[this](https://twitter.com/abacaj/status/1784003306508980250) serve as\r\ngreat examples. Let us know what you want us to include next!\r\n\r\nThis PR DOES change the leaderboard scores for `costs` and `latency`,\r\nbut not `accuracy`.\r\n\r\n---------\r\n\r\nCo-authored-by: Charlie Cheng-Jie Ji\r\n[charliechengjieji@berkeley.edu](mailto:charliechengjieji@berkeley.edu)\r\nCo-authored-by: Fanjia Yan\r\n[fanjiayan@berkeley.edu](mailto:fanjiayan@berkeley.edu)","shortMessageHtmlLink":"Leaderboard Update, in sync with BFCL April 27th Release (#391)"}},{"before":"5a850479540e872f7e38e4309d7b33214707a7d2","after":"cff48afcef8932076b1a5efe075242c595150a4c","ref":"refs/heads/main","pushedAt":"2024-04-27T08:47:27.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"ShishirPatil","name":"Shishir Patil","path":"/ShishirPatil","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/30296397?s=80&v=4"},"commit":{"message":"BFCL April 27th Release (Bug Fix in Cost/Latency Calculation) (#390)\n\nIn this PR, we fix some inconsistency issues in the cost and latency\r\ncalculation for open-source models, which are now all calculated when\r\nserving the model with [vLLM](https://github.com/vllm-project/vllm)\r\nusing 8 V100 GPUs. $$\\text{Cost} = \\text{Latency per 1000 function call}\r\n* (\\text{8xV100 azure-pay-as-you-go-price per hour / 3600})$$\r\n\r\nThis PR **DOES** change the leaderboard value in the `cost` and\r\n`latency` columns; but it **DOES NOT** change the accuracy score. We\r\nwill update the leaderboard in a different PR #391.\r\n\r\nWe want to thank the community for pointing out this oversight. Thanks\r\n[@abacaj](https://twitter.com/abacaj) and\r\n[@Teknium1](https://twitter.com/Teknium1) for initially raising the\r\nissue, and thanks [@natikgadzhi](https://twitter.com/natikgadzhi)\r\n[@HamelHusain](https://twitter.com/HamelHusain)\r\n[@nicoritschel](https://twitter.com/nicoritschel)\r\n[@winglian](https://twitter.com/winglian)\r\n[@olafgeibig](https://twitter.com/olafgeibig) and many others for\r\njoining the conversation. We are listening to community feedback and\r\ncontinuously improving our Berkeley Function Calling Leaderboard.\r\nDiscussions like\r\n[this](https://twitter.com/abacaj/status/1784003306508980250) serve as\r\ngreat examples. Let us know what you want us to include next!\r\n\r\n---------\r\n\r\nCo-authored-by: Charlie Cheng-Jie Ji \r\nCo-authored-by: Fanjia Yan ","shortMessageHtmlLink":"BFCL April 27th Release (Bug Fix in Cost/Latency Calculation) (#390)"}},{"before":"e6cd5129935896fac4bb12c5762927d5f125575f","after":"5a850479540e872f7e38e4309d7b33214707a7d2","ref":"refs/heads/main","pushedAt":"2024-04-26T20:06:25.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"ShishirPatil","name":"Shishir Patil","path":"/ShishirPatil","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/30296397?s=80&v=4"},"commit":{"message":"Colored logging configuration + displaying progress in logs (#384)\n\n**Logging config**\r\n- Colored logging using `coloredlogs` package\r\n- Logging configuration loaded from YAML config file, by default\r\n`logging.yaml`\r\n- Logging configuration YAML file location overridable with\r\n`LOGGING_CONFIG` env var\r\n\r\n**Displaying progress in logs**\r\n- Added mdc dependency\r\n- Progress attached to MDC and included in logging format message\r\n- A default progress empty value is provided to avoid a KeyError when\r\nthe progress field is not set, such as when logging from a different\r\nthread\r\n\r\nHere is what it looks like when we're hitting quota limits and getting\r\nsome retries, it displays progress and colored logs:\r\n![Screenshot 2024-04-23 at 8 33\r\n07 PM](https://github.com/ShishirPatil/gorilla/assets/33618/591a8093-0b02-451a-96bb-9b639d0a8fc5)\r\n\r\nHere is what it looks like when it runs smoothly:\r\n![Screenshot 2024-04-25 at 8 53\r\n02 PM](https://github.com/ShishirPatil/gorilla/assets/33618/d022c38d-a2c5-4b02-8d0f-06b8d8275c12)\r\n\r\n**Tests**\r\n\r\n- Tested non regression with OpenAI API\r\n- Tested with Azure AI Resource Real Time `gpt-3.5-turbo` and\r\n`text-embedding-ada-002` deployment\r\n\r\n**Note**: this PR depends on #381","shortMessageHtmlLink":"Colored logging configuration + displaying progress in logs (#384)"}},{"before":"8130885904a07702da41581c6e3b07a6b0413cde","after":"46e959b73be6a40c233e36c71c268ce3a9eabe36","ref":"refs/heads/gh-pages","pushedAt":"2024-04-26T18:37:13.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"ShishirPatil","name":"Shishir Patil","path":"/ShishirPatil","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/30296397?s=80&v=4"},"commit":{"message":"Leaderboard Update, in sync with BFCL April 19th and April 25th Release (#387)\n\n- As mentioned in #377, this PR updates the leaderboard to reflect the\r\nscore changes resulting from the updates in the executable test category\r\nevaluation pipeline.\r\n- As mentioned in #386, this PR also adds five new models to the\r\nleaderboard.\r\n- It also adds a `last_updated` field to the leaderboard. \r\n\r\nThis PR **DOES** change the leaderboard score.\r\n\r\n---------\r\n\r\nCo-authored-by: Charlie Cheng-Jie Ji ","shortMessageHtmlLink":"Leaderboard Update, in sync with BFCL April 19th and April 25th Relea…"}},{"before":"5514251be2e8219a5cca625b5c92a2279860fdb1","after":"e6cd5129935896fac4bb12c5762927d5f125575f","ref":"refs/heads/main","pushedAt":"2024-04-26T18:36:49.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"ShishirPatil","name":"Shishir Patil","path":"/ShishirPatil","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/30296397?s=80&v=4"},"commit":{"message":"BFCL April 25th Release (New Models) (#386)\n\nIn this PR, 5 new models are added to the leaderboard: \r\n\r\n- `meta-llama/Meta-Llama-3-8B-Instruct`\r\n- `meta-llama/Meta-Llama-3-70B-Instruct`\r\n- `gemini-1.5-pro-preview-0409`\r\n- `command-r-plus`\r\n- `command-r-plus-FC`\r\n\r\nThe leaderboard website will be updated shortly to reflect these new\r\nentries, in a different PR.\r\n\r\n---------\r\n\r\nCo-authored-by: Charlie Cheng-Jie Ji \r\nCo-authored-by: Fanjia Yan ","shortMessageHtmlLink":"BFCL April 25th Release (New Models) (#386)"}},{"before":"28a0f425ab25a1a40a175077395f67256926b9f9","after":"5514251be2e8219a5cca625b5c92a2279860fdb1","ref":"refs/heads/main","pushedAt":"2024-04-26T17:05:45.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"ShishirPatil","name":"Shishir Patil","path":"/ShishirPatil","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/30296397?s=80&v=4"},"commit":{"message":"Azure OpenAI support in raft.py (#381)\n\n- Added `--embedding-model` CLI arg defaults to text-embedding-ada-002.\r\nUsed to generate the chunks embeddings\r\n- Added `--completion-model` CLI arg defaults to gpt-4. Used to generate\r\nthe Q/A dataset\r\n- New client_utils.py with two helpers: `build_openai_client` and\r\n`build_langchain_embeddings`.\r\n- `build_openai_client` builds an `OpenAI` or `AzureOpenAI` client based\r\non detecting `AZURE_*` specific env vars or not\r\n- `build_langchain_embeddings` builds a `AzureOpenAIEmbeddings` or\r\n`OpenAIEmbeddings` based on detecting `AZURE_*` specific env vars or not\r\n- Loads environment variables from .env file using python-dotenv package\r\n- Added dependency on python-dotenv\r\n- Added doc to README\r\n\r\nFixes #382","shortMessageHtmlLink":"Azure OpenAI support in raft.py (#381)"}},{"before":"12ff417c25d5025545d4f714e0335801f5a5c201","after":"28a0f425ab25a1a40a175077395f67256926b9f9","ref":"refs/heads/main","pushedAt":"2024-04-25T18:07:14.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"ShishirPatil","name":"Shishir Patil","path":"/ShishirPatil","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/30296397?s=80&v=4"},"commit":{"message":"BFCL April 19th Release (Dataset & Pipeline) (#377)\n\nThis PR is for the BFCL April 19th Release. In this release:\r\n\r\n\r\n- Bug fix for the evaluation dataset in the executable test categories.\r\nThis includes updates to both prompts and function docs.\r\n- The `evaluation_result` field has been removed to accommodate the\r\nvariability in API execution results across different evaluation runs.\r\nInstead, a human-verified `ground_truth` is now included for the\r\nexecutable test categories. During each evaluation run,\r\n`evaluation_result` is generated anew using the `ground_truth`, and then\r\ncompared against the model output.\r\n- A stricter metric has been adopted when using the `structural_match`\r\n(aka. type match) evaluation criteria ---- For `list` results, the\r\nlengths are compared; for `dict` results, the keys are matched. This is\r\nto account for the fast-changing nature of some of the real-time API\r\nresults while ensuring the evaluation remains meaningful.\r\n- Added another evaluation criterion `real_time_match` for the\r\nexecutable category, which is a looser form of `exact_match`\r\nspecifically for numerical execution results. The execution result must\r\nbe within a certain percentage threshold (20%) from the expected result\r\nto accommodate the live updates of API responses. Users can change this\r\nthreshold value in `eval_checker_constant.py`.\r\n- Added support to distinguish Cohere's optimized score vs. original\r\nscore.\r\n- Resolved #363 \r\n\r\nThis PR **DOES** change the leaderboard score. We will update the\r\nleaderboard shortly, in a different PR.\r\nWe will also update our HuggingFace dataset accordingly.\r\n\r\n---------\r\n\r\nCo-authored-by: Charlie Cheng-Jie Ji \r\nCo-authored-by: Fanjia Yan ","shortMessageHtmlLink":"BFCL April 19th Release (Dataset & Pipeline) (#377)"}},{"before":"c22e5e6f596e8371e6f7f8db1737e81c2093848a","after":"8130885904a07702da41581c6e3b07a6b0413cde","ref":"refs/heads/gh-pages","pushedAt":"2024-04-25T06:58:05.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"ShishirPatil","name":"Shishir Patil","path":"/ShishirPatil","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/30296397?s=80&v=4"},"commit":{"message":"Gorilla Website Update April 22nd (#380)\n\nIn this PR: \r\n1. Update the evaluation metric for BFCL, in sync with #377.\r\n2. Change the button layout on the landing page. \r\n\r\nThis PR **does not** change the leaderboard value.\r\n\r\n---------\r\n\r\nCo-authored-by: Charlie Cheng-Jie Ji ","shortMessageHtmlLink":"Gorilla Website Update April 22nd (#380)"}},{"before":"1d8d51d0d091c33e730d38745745005c9bc7dfc0","after":"12ff417c25d5025545d4f714e0335801f5a5c201","ref":"refs/heads/main","pushedAt":"2024-04-25T06:57:14.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"ShishirPatil","name":"Shishir Patil","path":"/ShishirPatil","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/30296397?s=80&v=4"},"commit":{"message":"Add FC + Prompt for Cohere command-r-plus (#350)\n\nHello from Cohere 👋 \r\n\r\nThanks a lot for creating and maintaining this evaluation - it's been a\r\nlot of fun to work with, and the recent improvements are great to see.\r\n\r\nThis PR adds an initial implementation of Cohere's native tool use API\r\n(FC) and non-native prompt. It builds upon the work of @Fanjia-Yan and\r\n@HuanzhiMao in https://github.com/ShishirPatil/gorilla/pull/315\r\n\r\nI look forward to hearing what you think and seeing us on the\r\nleaderboard!\r\n\r\n---------\r\n\r\nCo-authored-by: Fanjia Yan \r\nCo-authored-by: Huanzhi Mao ","shortMessageHtmlLink":"Add FC + Prompt for Cohere command-r-plus (#350)"}}],"hasNextPage":true,"hasPreviousPage":false,"activityType":"all","actor":null,"timePeriod":"all","sort":"DESC","perPage":30,"cursor":"djE6ks8AAAAEUbS3VgA","startCursor":null,"endCursor":null}},"title":"Activity · ShishirPatil/gorilla"}