
Vertex AI pipeline - IndexError: Invalid key: 0 is out of bounds for size 0 #2813

kk2491 opened this issue Mar 27, 2024 · 17 comments

kk2491 commented Mar 27, 2024

Expected Behavior

The fine-tuning of the foundation model should complete without any issues.

Actual Behavior

The fine-tuning step gets terminated. The details are provided below:

Training framework - Google Colab
Model used - Llama2-7B
Fine-tuning method - PEFT
Number of samples in Training Set - 100
Number of samples in Eval Set - 20
Format of the training data - JSONL
An example sample is given below:

{"text": "### Human: What is arithmatic mean? ### Assistant: The arithmetic mean, or simply the mean, is the average of a set of numbers obtained by adding them up and dividing by the total count of numbers."}
{"text": "### Human: What is geometric mean? ### Assistant: The geometric mean is a measure of central tendency calculated by multiplying all values in a dataset and then taking the nth root of the product, where n is the total number of values."}

Vertex pipeline parameters:

pipeline_parameters = {
    "base_model": base_model,
    "dataset_name": dataset_name,
    "prediction_accelerator_type": prediction_accelerator_type,
    "training_accelerator_type": training_accelerator_type,
    "training_precision_mode": training_precision_mode,
    "training_lora_rank": 16,
    "training_lora_alpha": 32,
    "training_lora_dropout": 0.05,
    "training_steps": 20,
    "training_warmup_steps": 10,
    "training_learning_rate": 2e-4,
    "evaluation_steps": 10,
    "evaluation_limit": 1,
}

When I execute the training process, I get the following error:

raise IndexError(f"Invalid key: {key} is out of bounds for size {size}")  
IndexError: Invalid key: 0 is out of bounds for size 0
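
For context, this is the error the Hugging Face datasets library raises when something indexes into a dataset of size 0, for example because every sample was dropped during preprocessing. A minimal sketch that reproduces the same message (assuming only that the datasets package is installed):

from datasets import Dataset

# Build a tiny dataset, then filter out every row so it ends up with size 0.
ds = Dataset.from_dict({"text": ["### Human: hi ### Assistant: hello"]})
empty = ds.filter(lambda example: len(example["text"].split()) > 512)

print(len(empty))  # 0 -- nothing survived the filter

# Indexing into the empty dataset raises the same error seen in the pipeline:
# IndexError: Invalid key: 0 is out of bounds for size 0
empty[0]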

Can you please help me understand the questions below?

  1. Is the format of the training data correct? I used the format given as the default example in the Colab notebook; you can find the dataset here.
  2. Is the number of samples too small?
  3. Is there anything I am missing here?

Steps to Reproduce the Problem

Specifications

  • Version:
  • Platform:
gericdong (Contributor) commented:

@kk2491 can you please let me know which notebook you ran?

kk2491 (Author) commented Mar 28, 2024

Hi @gericdong, I am using the notebook below:

model_garden_pytorch_llama2_peft_finetuning.ipynb

Thank you,
KK

gericdong (Contributor) commented:

@genquan9: can you please assist with this? Thank you.

genquan9 (Contributor) commented:

If you train from HF datasets, you can pass something like timdettmers/openassistant-guanaco directly.

But if you use a JSON dataset stored in GCS, you should use a format such as:
{"input_text":"TRANSCRIPT: \nREASON FOR EVALUATION:,\n\n LABEL:","output_text":"Chiropractic"}

genquan9 (Contributor) commented:

The team is verifying the notebook with pipelines again.

kk2491 (Author) commented Mar 29, 2024

@genquan9 Thanks for the response.

I am not using a dataset from a GCS bucket.
I have created my own dataset on Hugging Face following the format of timdettmers/openassistant-guanaco; you can find the dataset here.

Thank you,
KK

kk2491 (Author) commented Apr 2, 2024

@genquan9 @gericdong Sorry to bother you. Did you get a chance to look into the above issue?

Thank you,
KK

jismailyan-google (Contributor) commented:

Hi @kk2491, I was able to reproduce the issue. Please try again, but set the evaluation_limit to 100.

kk2491 (Author) commented Apr 2, 2024

@jismailyan-google Thanks for the suggestion.
Just out of curiosity, did you also try with my dataset (from here)?

Thank you,
KK

kk2491 (Author) commented Apr 3, 2024

@jismailyan-google It looks like the notebook for the Vertex AI pipeline has been removed.
However, I did try the fine-tuning with evaluation_limit set to 100, and the error remains the same.

[Screenshot: pipeline run failing with the same error]

kk2491 (Author) commented Apr 5, 2024

@genquan9 @gericdong Did you get a chance to look into the above issue?

Thank you,
KK

jismailyan-google (Contributor) commented Apr 5, 2024

Hi @kk2491,

I was able to get the tuning completed with your dataset.
You can try this out; just replace PIPELINE_ROOT_BUCKET with your GCS bucket and SERVICE_ACCOUNT with your own.

Also, please note the updated COMPILED_PIPELINE_PATH.

from google.cloud import aiplatform

# Replace these two placeholders with your own values.
PIPELINE_ROOT_BUCKET = "gs://your-pipeline-root-bucket"
SERVICE_ACCOUNT = "your-service-account@your-project.iam.gserviceaccount.com"

COMPILED_PIPELINE_PATH = "https://us-kfp.pkg.dev/ml-pipeline/google-cloud-registry/oss-peft-llm-tuner/sha256:2e723d2eccb84d28652dd73324e0bf5dc7179f2ddb4230853cb95b0428438eb0"

pipeline_parameters = {
    "base_model_name": "Llama-2-7b",
    "dataset_name": "kk2491/test",
}

# Define and launch the Pipeline Job.
job = aiplatform.PipelineJob(
    display_name='llama2-tuner-04042024',
    template_path=COMPILED_PIPELINE_PATH,
    pipeline_root=PIPELINE_ROOT_BUCKET,
    parameter_values=pipeline_parameters,
)

job.submit(service_account=SERVICE_ACCOUNT)

Let me know if this works.

kk2491 (Author) commented Apr 5, 2024

@jismailyan-google I tried again, this time with the Vertex GUI (it looks like the notebook for fine-tuning with Vertex AI has been removed).
As per your comments, I don't have to change any parameters except BUCKET and SERVICE_ACCOUNT, so I tried with all default values; however, the results remain the same.

Now I am 100% sure that I am making some silly mistake here!

Joshwani-broadcom commented Apr 23, 2024

I am running into the same error when trying to specify a custom dataset:

# Hugging Face dataset name or gs:// URI to a custom JSONL dataset.
dataset_name = "gs://llama-fine-tuning/training_data.jsonl"  # @param {type:"string"}

# Name of the dataset column containing training text input.
instruct_column_in_dataset = "text"  # @param {type:"string"}

# Optional. Template name or gs:// URI to a custom template.
template = ""  # @param {type:"string"}
[Screenshot: 2024-04-22 at 9:37:23 PM]

I haven't looked, but I suspect that the image running the instruct-lora task is trying to load the gs:// URI as a Hugging Face dataset? Something like this: https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/community-content/vertex_model_garden/model_oss/peft/instruct_lora.py#L27.

I saw the following comment by @genquan9:

If you do training from HF datasets, you can input sth like: timdettmers/openassistant-guanaco directly.

but, if you use dataset json stored in gcs, you should use the format as: {"input_text":"TRANSCRIPT: \nREASON FOR EVALUATION:,\n\n LABEL:","output_text":"Chiropractic"}

I haven't tried this yet, but it seems that the instruct lora task needs to account for gs:// URI somehow. Does it?
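
For illustration only (this is not the actual instruct_lora.py code), a loader that supports both inputs might branch on the gs:// prefix roughly as sketched below; the helper name and the local download path are made up for the example:

from datasets import load_dataset
from google.cloud import storage


def load_training_dataset(dataset_name: str):
    """Illustrative sketch: accept either an HF dataset name or a gs:// JSONL URI."""
    if dataset_name.startswith("gs://"):
        # Download the JSONL from GCS to a local file, then load it as a "json" dataset.
        bucket_name, _, blob_path = dataset_name[len("gs://"):].partition("/")
        local_path = "/tmp/training_data.jsonl"
        storage.Client().bucket(bucket_name).blob(blob_path).download_to_filename(local_path)
        return load_dataset("json", data_files=local_path, split="train")
    # Otherwise treat it as a Hugging Face Hub dataset name,
    # e.g. "timdettmers/openassistant-guanaco".
    return load_dataset(dataset_name, split="train")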

kk2491 (Author) commented Apr 23, 2024

@Joshwani-broadcom Here is how I was able to fix the error (worth giving a try, if you haven't yet):

  1. Each JSONL sample should contain at least 2 Human/Assistant conversation turns.
  2. Each JSONL sample should contain at least 512 words.

It looks like all of your samples are getting dropped for one of the above reasons.

You can also find more details here.
By following this I was able to fix the error and fine-tune the Llama2 model successfully.
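
A rough pre-flight check along these lines (the file name is hypothetical, and the two thresholds are simply the conditions listed above) can flag samples that would likely be dropped before you launch the pipeline:

import json

# Hypothetical local file; point this at your own JSONL before uploading it.
DATASET_PATH = "training_data.jsonl"

kept, dropped = 0, 0
with open(DATASET_PATH) as f:
    for i, line in enumerate(f, start=1):
        text = json.loads(line)["text"]
        # Condition 1: at least two Human/Assistant exchanges per sample.
        enough_turns = text.count("### Human:") >= 2 and text.count("### Assistant:") >= 2
        # Condition 2: at least 512 words per sample.
        word_count = len(text.split())
        if enough_turns and word_count >= 512:
            kept += 1
        else:
            dropped += 1
            print(f"Sample {i} may be dropped (turns ok: {enough_turns}, words: {word_count})")

print(f"{kept} samples kept, {dropped} samples at risk of being dropped")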

Kindly let me know if you face any other issues.

Thank you,
KK

Joshwani-broadcom commented Apr 23, 2024

Thank you @kk2491. Is it true that you are using a Hugging Face dataset? Did you ever find success using a gs:// URI in the notebook, like this?

dataset_name = "gs://llama-fine-tuning/training_data.jsonl"

kk2491 (Author) commented Apr 23, 2024

Yes, initially I tried with the Hugging Face dataset and got it working. Later I migrated the same dataset to a Google Cloud Storage bucket, and it worked as expected.
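
For anyone following along, moving the same JSONL into a bucket is just a copy; a minimal sketch with hypothetical bucket and file names:

from google.cloud import storage

# Hypothetical bucket and file names; substitute your own.
bucket = storage.Client().bucket("your-tuning-bucket")
bucket.blob("training_data.jsonl").upload_from_filename("training_data.jsonl")

# The dataset can then be referenced in the pipeline parameters as:
# dataset_name = "gs://your-tuning-bucket/training_data.jsonl"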

Thank you,
Kk
