
Vertex AI pipeline - IndexError: Invalid key: 0 is out of bounds for size 0 #2813

kk2491 opened this issue Mar 27, 2024 · 17 comments

kk2491 commented Mar 27, 2024

Expected Behavior

The fine-tuning of the foundation model should complete without any issues.

Actual Behavior

The fine-tuning step gets terminated. The details are provided below:

Training framework - Google Colab
Model used - Llama2-7B
Fine-tuning method - PEFT
Number of samples in Training Set - 100
Number of samples in Eval Set - 20
Format of the training data - JSONL
An example sample is given below:

{"text": "### Human: What is arithmatic mean? ### Assistant: The arithmetic mean, or simply the mean, is the average of a set of numbers obtained by adding them up and dividing by the total count of numbers."}
{"text": "### Human: What is geometric mean? ### Assistant: The geometric mean is a measure of central tendency calculated by multiplying all values in a dataset and then taking the nth root of the product, where n is the total number of values."}

Vertex pipeline parameters:

pipeline_parameters = {
    "base_model": base_model,
    "dataset_name": dataset_name,
    "prediction_accelerator_type": prediction_accelerator_type,
    "training_accelerator_type": training_accelerator_type,
    "training_precision_mode": training_precision_mode,
    "training_lora_rank": 16,
    "training_lora_alpha": 32,
    "training_lora_dropout": 0.05,
    "training_steps": 20,
    "training_warmup_steps": 10,
    "training_learning_rate": 2e-4,
    "evaluation_steps": 10,
    "evaluation_limit": 1,
}

When I execute the training process, I get the following error:

raise IndexError(f"Invalid key: {key} is out of bounds for size {size}")  
IndexError: Invalid key: 0 is out of bounds for size 0
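
For context, this is the error the Hugging Face datasets library raises when something indexes into a dataset of size 0, for example because every sample was dropped during preprocessing. A minimal sketch that reproduces the same message (assuming only that the datasets package is installed):

from datasets import Dataset

# Build a tiny dataset, then filter out every row so it ends up with size 0.
ds = Dataset.from_dict({"text": ["### Human: hi ### Assistant: hello"]})
empty = ds.filter(lambda example: len(example["text"].split()) > 512)

print(len(empty))  # 0 -- nothing survived the filter

# Indexing into the empty dataset raises the same error seen in the pipeline:
# IndexError: Invalid key: 0 is out of bounds for size 0
empty[0]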

Can you please help me understand the questions below?

  1. Is the format of the training data correct? I used the format given as the default example in the Colab notebook; you can find the dataset here.
  2. Is the number of samples too small?
  3. Is there anything I am missing here?

Steps to Reproduce the Problem

Specifications

  • Version:
  • Platform:
gericdong (Contributor) commented:

@kk2491 can you please let me know which notebook you ran?

kk2491 (Author) commented Mar 28, 2024

Hi @gericdong, I am using the notebook below:

model_garden_pytorch_llama2_peft_finetuning.ipynb

Thank you,
KK

gericdong (Contributor) commented:

@genquan9: can you please assist with this? Thank you.

genquan9 (Contributor) commented:

If you train from HF datasets, you can pass something like timdettmers/openassistant-guanaco directly.

But if you use a JSON dataset stored in GCS, you should use a format such as:
{"input_text":"TRANSCRIPT: \nREASON FOR EVALUATION:,\n\n LABEL:","output_text":"Chiropractic"}

genquan9 (Contributor) commented:

The team is verifying the notebook with pipelines again.

kk2491 (Author) commented Mar 29, 2024

@genquan9 Thanks for the response.

I am not using a dataset from a GCS bucket.
I have created my own dataset on Hugging Face following the format of timdettmers/openassistant-guanaco; you can find the dataset here.

Thank you,
KK

kk2491 (Author) commented Apr 2, 2024

@genquan9 @gericdong Sorry to bother you. Did you get a chance to look into the above issue?

Thank you,
KK

jismailyan-google (Contributor) commented:

Hi @kk2491, I was able to reproduce the issue. Please try again, but set the evaluation_limit to 100.

kk2491 (Author) commented Apr 2, 2024

@jismailyan-google Thanks for the suggestion.
Just out of curiosity, did you also try with my dataset (from here)?

Thank you,
KK

kk2491 (Author) commented Apr 3, 2024

@jismailyan-google It looks like the notebook for the Vertex AI pipeline has been removed.
However, I did try the fine-tuning with evaluation_limit set to 100, and the error remains the same.

[Screenshot: pipeline run failing with the same error]

kk2491 (Author) commented Apr 5, 2024

@genquan9 @gericdong Did you get a chance to look into the above issue?

Thank you,
KK

jismailyan-google (Contributor) commented Apr 5, 2024

Hi @kk2491,

I was able to get the tuning completed with your dataset.
You can try this out; just replace PIPELINE_ROOT_BUCKET with your GCS bucket and SERVICE_ACCOUNT with your own.

Also, please note the updated COMPILED_PIPELINE_PATH.

from google.cloud import aiplatform

# Replace these two placeholders with your own values.
PIPELINE_ROOT_BUCKET = "gs://your-pipeline-root-bucket"
SERVICE_ACCOUNT = "your-service-account@your-project.iam.gserviceaccount.com"

COMPILED_PIPELINE_PATH = "https://us-kfp.pkg.dev/ml-pipeline/google-cloud-registry/oss-peft-llm-tuner/sha256:2e723d2eccb84d28652dd73324e0bf5dc7179f2ddb4230853cb95b0428438eb0"

pipeline_parameters = {
    "base_model_name": "Llama-2-7b",
    "dataset_name": "kk2491/test",
}

# Define and launch the Pipeline Job.
job = aiplatform.PipelineJob(
    display_name='llama2-tuner-04042024',
    template_path=COMPILED_PIPELINE_PATH,
    pipeline_root=PIPELINE_ROOT_BUCKET,
    parameter_values=pipeline_parameters,
)

job.submit(service_account=SERVICE_ACCOUNT)

Let me know if this works.

kk2491 (Author) commented Apr 5, 2024

@jismailyan-google I tried again, this time with the Vertex GUI (it looks like the notebook for fine-tuning with Vertex AI has been removed).
As per your comments, I don't have to change any parameters except BUCKET and SERVICE_ACCOUNT, so I tried with all default values; however, the results remain the same.

Now I am 100% sure that I am making some silly mistake here!

Joshwani-broadcom commented Apr 23, 2024

I am running into the same error when trying to specify a custom dataset:

# Hugging Face dataset name or gs:// URI to a custom JSONL dataset.
dataset_name = "gs://llama-fine-tuning/training_data.jsonl"  # @param {type:"string"}

# Name of the dataset column containing training text input.
instruct_column_in_dataset = "text"  # @param {type:"string"}

# Optional. Template name or gs:// URI to a custom template.
template = ""  # @param {type:"string"}
[Screenshot: 2024-04-22 at 9:37:23 PM]

I haven't looked, but I suspect that the image running the instruct-lora task is trying to load the gs:// URI as a Hugging Face dataset? Something like this: https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/community-content/vertex_model_garden/model_oss/peft/instruct_lora.py#L27.

I saw the following comment by @genquan9:

If you do training from HF datasets, you can input sth like: timdettmers/openassistant-guanaco directly.

but, if you use dataset json stored in gcs, you should use the format as: {"input_text":"TRANSCRIPT: \nREASON FOR EVALUATION:,\n\n LABEL:","output_text":"Chiropractic"}

I haven't tried this yet, but it seems that the instruct lora task needs to account for gs:// URI somehow. Does it?
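
For illustration only (this is not the actual instruct_lora.py code), a loader that supports both inputs might branch on the gs:// prefix roughly as sketched below; the helper name and the local download path are made up for the example:

from datasets import load_dataset
from google.cloud import storage


def load_training_dataset(dataset_name: str):
    """Illustrative sketch: accept either an HF dataset name or a gs:// JSONL URI."""
    if dataset_name.startswith("gs://"):
        # Download the JSONL from GCS to a local file, then load it as a "json" dataset.
        bucket_name, _, blob_path = dataset_name[len("gs://"):].partition("/")
        local_path = "/tmp/training_data.jsonl"
        storage.Client().bucket(bucket_name).blob(blob_path).download_to_filename(local_path)
        return load_dataset("json", data_files=local_path, split="train")
    # Otherwise treat it as a Hugging Face Hub dataset name,
    # e.g. "timdettmers/openassistant-guanaco".
    return load_dataset(dataset_name, split="train")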

kk2491 (Author) commented Apr 23, 2024

@Joshwani-broadcom Here is how I was able to fix the error (worth giving a try, if you haven't yet):

  1. Each JSONL sample should contain at least 2 Human/Assistant conversation turns.
  2. Each JSONL sample should contain at least 512 words.

It looks like all of your samples are getting dropped for one of the above reasons.

You can also find more details here.
By following this I was able to fix the error and fine-tune the Llama2 model successfully.
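
A rough pre-flight check along these lines (the file name is hypothetical, and the two thresholds are simply the conditions listed above) can flag samples that would likely be dropped before you launch the pipeline:

import json

# Hypothetical local file; point this at your own JSONL before uploading it.
DATASET_PATH = "training_data.jsonl"

kept, dropped = 0, 0
with open(DATASET_PATH) as f:
    for i, line in enumerate(f, start=1):
        text = json.loads(line)["text"]
        # Condition 1: at least two Human/Assistant exchanges per sample.
        enough_turns = text.count("### Human:") >= 2 and text.count("### Assistant:") >= 2
        # Condition 2: at least 512 words per sample.
        word_count = len(text.split())
        if enough_turns and word_count >= 512:
            kept += 1
        else:
            dropped += 1
            print(f"Sample {i} may be dropped (turns ok: {enough_turns}, words: {word_count})")

print(f"{kept} samples kept, {dropped} samples at risk of being dropped")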

Kindly let me know if you face any other issues.

Thank you,
KK

Joshwani-broadcom commented Apr 23, 2024

Thank you @kk2491. Is it true that you are using a Hugging Face dataset? Did you ever find success using a gs:// URI in the notebook, like this?

dataset_name = "gs://llama-fine-tuning/training_data.jsonl"

kk2491 (Author) commented Apr 23, 2024

Yes, initially I tried with the Hugging Face dataset and got it working. Later I migrated the same dataset to a Google Cloud Storage bucket, and it worked as expected.
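
For anyone following along, moving the same JSONL into a bucket is just a copy; a minimal sketch with hypothetical bucket and file names:

from google.cloud import storage

# Hypothetical bucket and file names; substitute your own.
bucket = storage.Client().bucket("your-tuning-bucket")
bucket.blob("training_data.jsonl").upload_from_filename("training_data.jsonl")

# The dataset can then be referenced in the pipeline parameters as:
# dataset_name = "gs://your-tuning-bucket/training_data.jsonl"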

Thank you,
Kk
