
FileNotFound when reading himawari data using fsspec #2775

Open
tervo opened this issue Apr 3, 2024 · 2 comments
Labels
enhancement code enhancements, features, improvements help wanted

Comments

@tervo

tervo commented Apr 3, 2024

Describe the bug
I'm getting a FileNotFoundError when trying to create a Scene from Himawari data stored on S3 using fsspec.

The same data works if I first download it to local disk, and I can also read GOES data (from AWS) as shown in the documentation. fsspec is also able to open the files, which makes them appear in the cache (but under a different name than the one shown in the stack trace).

To Reproduce

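import os

import fsspec
import s3fs
from satpy import Scene
from satpy.readers import FSFile

S3_ENDPOINT_URL = "https://s3.waw3-1.cloudferro.com"
access_key_id = 'xx'  # credentials redacted
secret_access_key = 'xx'
bucket = 'rep.ahi_jma_nrt.ahixxx15.himawari8.level15.himawaristd'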

# this works
s3 = s3fs.S3FileSystem(anon=False, key=access_key_id, secret=secret_access_key, client_kwargs={'endpoint_url': S3_ENDPOINT_URL})
files = s3.ls(f's3://{bucket}/hsfd/201810/04/201810040200/00/B01/')
print(files)

# this works
def local_wrapper(bucket, directory, access_key_id, secret_access_key, S3_ENDPOINT_URL, local_directory='/tmp/files'):
    s3 = s3fs.S3FileSystem(anon=False, key=access_key_id, secret=secret_access_key, client_kwargs={'endpoint_url': S3_ENDPOINT_URL})
    
    files = s3.glob(f's3://{bucket}/{directory}/*')
    local_files = []
    for file in files:
        local_path = os.path.join(local_directory, os.path.basename(file))
        if not os.path.exists(local_path):
            s3.get(file, local_path)
        local_files.append(local_path)
    return local_files

files = local_wrapper(bucket, 'hsfd/201810/04/201810040200/00/B01', access_key_id, secret_access_key, S3_ENDPOINT_URL)
scn = Scene(reader='ahi_hsd', filenames=files)
scn.load(['B01'])

# this does not work
reader_kwargs = {
    'storage_options': {
        's3': {'anon': False,
               'key': access_key_id,
               'secret': secret_access_key,               
               'client_kwargs': {'endpoint_url': S3_ENDPOINT_URL}
        }
    }
}
filenames = [f'simplecache::s3://{bucket}/hsfd/201810/04/201810040200/00/B01/*.*']
scn = Scene(reader='ahi_hsd', filenames=filenames, reader_kwargs=reader_kwargs)
scn.load(['B01'])

# this does not work either
filename = f'{bucket}/hsfd/201810/04/201810040200/00/B01/*.*'
the_files = fsspec.open_files("simplecache::s3://" + filename, 
                              simplecache={'cache_storage': '/tmp/files'},
                              s3={'anon':False, 'key': access_key_id, 'secret':secret_access_key, 'endpoint_url': S3_ENDPOINT_URL, 'client_kwargs': {'verify': False}})
fs_files = [FSFile(open_file) for open_file in the_files]
scn = Scene(reader='ahi_hsd', filenames=fs_files)
scn.load(['B01'])

Expected behavior
Being able to read Himawari data from an S3 bucket using fsspec.

Actual results
I get the following stack trace:

FileNotFoundError                         Traceback (most recent call last)
Cell In[92], line 11
      1 reader_kwargs = {
      2     'storage_options': {
      3         's3': {'anon': False,
   (...)
      8     }
      9 }
     10 filenames = [f'simplecache::s3://{bucket2}/hsfd/201810/04/201810040200/00/B01/*.*']
---> 11 scn = Scene(reader='ahi_hsd', filenames=filenames, reader_kwargs=reader_kwargs)
     12 scn.load(['B01'])
     13 scn['B01']

File ~/miniconda/lib/python3.8/site-packages/satpy/scene.py:133, in Scene.__init__(self, filenames, reader, filter_parameters, reader_kwargs)
    130 if filenames:
    131     filenames = convert_remote_files_to_fsspec(filenames, storage_options)
--> 133 self._readers = self._create_reader_instances(filenames=filenames,
    134                                               reader=reader,
    135                                               reader_kwargs=cleaned_reader_kwargs)
    136 self._datasets = DatasetDict()
    137 self._wishlist = set()

File ~/miniconda/lib/python3.8/site-packages/satpy/scene.py:154, in Scene._create_reader_instances(self, filenames, reader, reader_kwargs)
    149 def _create_reader_instances(self,
    150                              filenames=None,
    151                              reader=None,
    152                              reader_kwargs=None):
    153     """Find readers and return their instances."""
--> 154     return load_readers(filenames=filenames,
    155                         reader=reader,
    156                         reader_kwargs=reader_kwargs)

File ~/miniconda/lib/python3.8/site-packages/satpy/readers/__init__.py:575, in load_readers(filenames, reader, reader_kwargs)
    573 loadables = reader_instance.select_files_from_pathnames(readers_files)
    574 if loadables:
--> 575     reader_instance.create_filehandlers(
    576         loadables,
    577         fh_kwargs=reader_kwargs_without_filter[None if reader is None else reader[idx]])
    578     reader_instances[reader_instance.name] = reader_instance
    579     remaining_filenames -= set(loadables)

File ~/miniconda/lib/python3.8/site-packages/satpy/readers/yaml_reader.py:1161, in GEOSegmentYAMLReader.create_filehandlers(self, filenames, fh_kwargs)
   1159 def create_filehandlers(self, filenames, fh_kwargs=None):
   1160     """Create file handler objects and determine expected segments for each."""
-> 1161     created_fhs = super(GEOSegmentYAMLReader, self).create_filehandlers(
   1162         filenames, fh_kwargs=fh_kwargs)
   1164     # add "expected_segments" information
   1165     for fhs in created_fhs.values():

File ~/miniconda/lib/python3.8/site-packages/satpy/readers/yaml_reader.py:616, in FileYAMLReader.create_filehandlers(self, filenames, fh_kwargs)
    614 # load files that we know about by creating the file handlers
    615 for filetype, filetype_info in self.sorted_filetype_items():
--> 616     filehandlers = self._new_filehandlers_for_filetype(filetype_info,
    617                                                        filename_set,
    618                                                        fh_kwargs=fh_kwargs)
    620     if filehandlers:
    621         created_fhs[filetype] = filehandlers

File ~/miniconda/lib/python3.8/site-packages/satpy/readers/yaml_reader.py:604, in FileYAMLReader._new_filehandlers_for_filetype(self, filetype_info, filenames, fh_kwargs)
    600 filehandler_iter = self._new_filehandler_instances(filetype_info,
    601                                                    filename_iter,
    602                                                    fh_kwargs=fh_kwargs)
    603 filtered_iter = self.filter_fh_by_metadata(filehandler_iter)
--> 604 return list(filtered_iter)

File ~/miniconda/lib/python3.8/site-packages/satpy/readers/yaml_reader.py:572, in FileYAMLReader.filter_fh_by_metadata(self, filehandlers)
    570 def filter_fh_by_metadata(self, filehandlers):
    571     """Filter out filehandlers using provide filter parameters."""
--> 572     for filehandler in filehandlers:
    573         filehandler.metadata['start_time'] = filehandler.start_time
    574         filehandler.metadata['end_time'] = filehandler.end_time

File ~/miniconda/lib/python3.8/site-packages/satpy/readers/yaml_reader.py:513, in FileYAMLReader._new_filehandler_instances(self, filetype_info, filename_items, fh_kwargs)
    510         warnings.warn(str(err) + ' for {}'.format(filename), stacklevel=4)
    511         continue
--> 513     yield filetype_cls(filename, filename_info, filetype_info, *req_fh, **fh_kwargs)

File ~/miniconda/lib/python3.8/site-packages/satpy/readers/ahi_hsd.py:360, in AHIHSDFileHandler.__init__(self, filename, filename_info, filetype_info, mask_space, calib_mode, user_calibration, round_actual_position)
    356 super(AHIHSDFileHandler, self).__init__(filename, filename_info,
    357                                         filetype_info)
    359 self.is_zipped = False
--> 360 self._unzipped = unzip_file(self.filename, prefix=str(filename_info['segment']).zfill(2))
    361 # Assume file is not zipped
    362 if self._unzipped:
    363     # But if it is, set the filename to point to unzipped temp file

File ~/miniconda/lib/python3.8/site-packages/satpy/readers/utils.py:251, in unzip_file(filename, prefix)
    248     return tmpfilepath
    250 # Otherwise, fall back to the original method
--> 251 bz2file = bz2.BZ2File(filename)
    252 with closing(os.fdopen(fdn, 'wb')) as ofpt:
    253     try:

File ~/miniconda/lib/python3.8/bz2.py:96, in BZ2File.__init__(self, filename, mode, buffering, compresslevel)
     93     raise ValueError("Invalid mode: %r" % (mode,))
     95 if isinstance(filename, (str, bytes, os.PathLike)):
---> 96     self._fp = _builtin_open(filename, mode)
     97     self._closefp = True
     98     self._mode = mode_code

FileNotFoundError: [Errno 2] No such file or directory: 'rep.ahi_jma_nrt.ahixxx15.himawari8.level15.himawaristd/hsfd/201810/04/201810040200/00/B01/HS_H08_20181004_0200_B01_FLDK_R10_S0410.DAT.bz2'
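The last frame suggests what goes wrong. When `bz2.BZ2File` receives a `str`, `bytes` or `os.PathLike`, it opens that path itself with the builtin `open()`. When it receives a file-like object, it reads from that object directly. The error message looks like the bucket-relative key with the `s3://` protocol stripped, which suggests the reader passed on a path string rather than an open file object, although that is an assumption. The following stdlib-only sketch shows the two behaviours. The path it uses is hypothetical.

```python
import bz2
import io

raw = b"HSD segment bytes"
compressed = bz2.compress(raw)

# An open, file-like object: BZ2File decompresses from it directly.
with bz2.BZ2File(io.BytesIO(compressed)) as bz:
    decompressed = bz.read()

# A str: BZ2File calls the builtin open() on it, so it is looked up on the
# local filesystem. An S3 key without "s3://" is just a relative local path.
remote_like_path = "some-bucket/hsfd/B01/HS_H08_example.DAT.bz2"  # hypothetical
error = None
try:
    bz2.BZ2File(remote_like_path)
except FileNotFoundError as err:
    error = err

print(decompressed == raw, type(error).__name__)
```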

Environment Info:

  • OS: Linux, Ubuntu 22.04
  • Satpy Version: 0.41.1
  • PyResample Version: 1.26.1

Additional context
If this turns out to be a real bug and someone wants to investigate it, I can provide keys to access the data.

@mraspaud
Member

mraspaud commented Apr 3, 2024

@tervo Thanks for reporting this issue!

The short answer is that the Himawari readers do not support fsspec at the moment, as you can see here: https://satpy.readthedocs.io/en/stable/#id1 (the last column shows fsspec compatibility).

To elaborate a bit more, the roadblock to making the Himawari readers fsspec-compatible is the use of functions such as np.fromfile, e.g. here: https://github.com/pytroll/satpy/blob/main/satpy/readers/ahi_hsd.py#L378-L389

Such functions, which include np.fromfile and np.memmap, rely on access to the underlying C file pointer or file descriptor (fileno). That access is obviously not available with fsspec file objects, because they are Python file objects in a more abstract sense.
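The following sketch illustrates the limitation. It uses an in-memory `io.BytesIO` object as a stand-in for an fsspec file object, on the assumption that both are pure-Python file-like objects without a usable OS-level file descriptor. `np.fromfile` works on a real file on disk but fails on the in-memory object, because that object has no `fileno()`.

```python
import io
import os
import tempfile

import numpy as np

data = np.arange(5, dtype="<u2")

# A real file on disk has an OS-level file descriptor, so np.fromfile works.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(data.tobytes())
    path = tmp.name
try:
    with open(path, "rb") as fobj:
        from_disk = np.fromfile(fobj, dtype="<u2", count=5)
finally:
    os.remove(path)

# A pure-Python, in-memory file object (stand-in for an fsspec file) has no
# fileno(), so np.fromfile cannot use it.
mem_fobj = io.BytesIO(data.tobytes())
try:
    np.fromfile(mem_fobj, dtype="<u2", count=5)
    fromfile_failed = False
except (OSError, ValueError):
    # io.UnsupportedOperation subclasses both OSError and ValueError
    fromfile_failed = True

print(from_disk.tolist(), fromfile_failed)
```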

To make the readers compatible, these function calls thus need to be replaced with calls that work on pure Python file objects.

For example, np.fromfile(file_object, ...) calls can be replaced with np.frombuffer(file_object.read(...), ...)
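The following is a minimal sketch of that replacement. It assumes the file object only needs a `read()` method. `read_array` and the header layout below are hypothetical illustrations, not Satpy's actual `ahi_hsd` code, and `io.BytesIO` again stands in for an fsspec file object. Like `np.fromfile`, the helper advances the file position, so consecutive blocks can be read one after another.

```python
import io

import numpy as np


def read_array(fobj, dtype, count=1):
    """Read `count` items of `dtype` from any object with a read() method.

    Hypothetical helper: unlike np.fromfile it needs no OS file descriptor.
    """
    dtype = np.dtype(dtype)
    nbytes = dtype.itemsize * count
    buf = fobj.read(nbytes)
    if len(buf) != nbytes:
        raise EOFError(f"expected {nbytes} bytes, got {len(buf)}")
    # frombuffer returns a read-only view of `buf`; call .copy() if you need to write.
    return np.frombuffer(buf, dtype=dtype, count=count)


# Hypothetical header layout for illustration only (not the real AHI HSD format).
header_dtype = np.dtype([("block_number", "u1"),
                         ("block_length", "<u2"),
                         ("n_lines", "<u2")])

payload = (np.array([(1, 282, 550)], dtype=header_dtype).tobytes()
           + np.arange(4, dtype="<u2").tobytes())
fobj = io.BytesIO(payload)

header = read_array(fobj, header_dtype, count=1)  # first block
pixels = read_array(fobj, "<u2", count=4)         # continues where the header ended
print(int(header["n_lines"][0]), pixels.tolist())
```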

Contributions are always welcome :)

@mraspaud mraspaud added enhancement code enhancements, features, improvements help wanted labels Apr 3, 2024
@simonrp84
Member

Looking into this, it appears that ahi_hsd does support fsspec, but the docs weren't updated. See this PR: #2423

I have tested on my machine against files on AWS (using the code given in that PR) and it works OK for me.

Projects
None yet
Development

No branches or pull requests

3 participants