OSError: [Errno 30] Cannot create directory '/efs'. Detail: [errno 30] Read-only file system #246

Open
bhavya-giri opened this issue Oct 26, 2023 · 5 comments

Comments

@bhavya-giri

@GokuMohandas, can you help me figure this out?

@bhavya-giri
Author

[Three screenshots attached: 2023-10-30 at 7:58:46 AM, 7:59:08 AM, and 7:59:18 AM]

@Meryl-Fang

Same error here. Have you managed to resolve it?

@taaha

taaha commented Nov 12, 2023

I am having the same issue and have no idea why. Basically, the notebook is unable to load the function from madewithml/data. A hack that worked for me is to create and run the following code cell above the failing code cell:

import re
from typing import Dict, List, Tuple

import numpy as np
import pandas as pd
import ray
from ray.data import Dataset
from sklearn.model_selection import train_test_split
from transformers import BertTokenizer

def stratify_split(
    ds: Dataset,
    stratify: str,
    test_size: float,
    shuffle: bool = True,
    seed: int = 1234,
) -> Tuple[Dataset, Dataset]:
    """Split a dataset into train and test splits with equal
    amounts of data points from each class in the column we
    want to stratify on.

    Args:
        ds (Dataset): Input dataset to split.
        stratify (str): Name of column to split on.
        test_size (float): Proportion of dataset to split for test set.
        shuffle (bool, optional): whether to shuffle the dataset. Defaults to True.
        seed (int, optional): seed for shuffling. Defaults to 1234.

    Returns:
        Tuple[Dataset, Dataset]: the stratified train and test datasets.
    """

    def _add_split(df: pd.DataFrame) -> pd.DataFrame:  # pragma: no cover, used in parent function
        """Naively split a dataframe into train and test splits.
        Add a column specifying whether it's the train or test split."""
        train, test = train_test_split(df, test_size=test_size, shuffle=shuffle, random_state=seed)
        train["_split"] = "train"
        test["_split"] = "test"
        return pd.concat([train, test])

    def _filter_split(df: pd.DataFrame, split: str) -> pd.DataFrame:  # pragma: no cover, used in parent function
        """Filter by data points that match the split column's value
        and return the dataframe with the _split column dropped."""
        return df[df["_split"] == split].drop("_split", axis=1)

    # Train, test split with stratify
    grouped = ds.groupby(stratify).map_groups(_add_split, batch_format="pandas")  # group by each unique value in the column we want to stratify on
    train_ds = grouped.map_batches(_filter_split, fn_kwargs={"split": "train"}, batch_format="pandas")  # combine
    test_ds = grouped.map_batches(_filter_split, fn_kwargs={"split": "test"}, batch_format="pandas")  # combine

    # Shuffle each split (required)
    train_ds = train_ds.random_shuffle(seed=seed)
    test_ds = test_ds.random_shuffle(seed=seed)

    return train_ds, test_ds

Basically, instead of importing the function, which fails (no idea why), we define it directly in the notebook and use it from there.

@bhavya-giri
Author

But the same error would come up again in training. Check this repo: https://github.com/GokuMohandas/mlops-course

@gOsuzu

gOsuzu commented Dec 3, 2023

As the error message indicates, this error is caused by permissions on the /efs folder that the code tries to create.
I assume you are running on your own local machine. I edited the files as shown below, and it worked in my local environment: macOS 14.1.2 and Python 3.10.11. The path will differ depending on where your directory is located. I hope this helps.

  1. config.py
    Change line 13:
    EFS_DIR = Path(f"/Users/<your_user_name>/efs/shared_storage/madewithml/{os.environ.get('GITHUB_USERNAME', '')}")

  2. madewithml.ipynb
    Change the code in the Setup section:
    EFS_DIR = f"/Users/<your_user_name>/efs/shared_storage/madewithml/{os.environ['GITHUB_USERNAME']}"
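
The two edits above hard-code a user-specific path. As a rough sketch of the same idea without hard-coding, the helper below tries a preferred directory first and falls back to a second one if the first cannot be created or written to. The function name `resolve_storage_dir` and the home-directory fallback are assumptions for illustration only and are not part of the madewithml repository.

```python
import os
from pathlib import Path


def resolve_storage_dir(preferred: Path, fallback: Path) -> Path:
    """Return the first candidate directory that can be created and written to.

    Hypothetical helper, not part of madewithml.
    """
    for candidate in (preferred, fallback):
        try:
            # Raises OSError (e.g. errno 30, read-only file system) if it cannot be created.
            candidate.mkdir(parents=True, exist_ok=True)
        except OSError:
            continue
        if os.access(candidate, os.W_OK):
            return candidate
    raise OSError(f"Neither {preferred} nor {fallback} is writable")


# Possible usage in config.py or the notebook's Setup section (assumed layout):
# username = os.environ.get("GITHUB_USERNAME", "")
# EFS_DIR = resolve_storage_dir(
#     preferred=Path("/efs/shared_storage/madewithml") / username,
#     fallback=Path.home() / "efs" / "shared_storage" / "madewithml" / username,
# )
```

On a machine where /efs cannot be created, this would land on the home-directory path, which matches the manual change described above.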
