Spatial LibriSpeech

Spatial LibriSpeech is a spatial audio dataset with over 650 hours of first-order ambisonics and optional distractor noise (raw 19-channel audio is coming soon). It is designed for training machine learning models and includes labels for source position, speaking direction, room acoustics, and geometry. Spatial LibriSpeech was generated by augmenting LibriSpeech samples with 200k+ simulated acoustic conditions across 8k+ synthetic rooms.

For more information, refer to our paper: https://doi.org/10.21437/Interspeech.2023-2117.

If you use Spatial LibriSpeech in a publication, please cite our paper:

@inproceedings{spatial_librispeech2023,
  author={Miguel Sarabia and Elena Menyaylenko and Alessandro Toso and Skyler Seto
          and Zakaria Aldeneh and Shadi Pirhosseinloo and Luca Zappella
          and Barry-John Theobald and Nicholas Apostoloff and Jonathan Sheaffer},
  title={{Spatial LibriSpeech: An Augmented Dataset for Spatial Audio Learning}},
  year={2023},
  booktitle={Proc. Interspeech},
  pages={3724--3728},
  doi={10.21437/Interspeech.2023-2117}
}

📜 License

By downloading and using Spatial LibriSpeech, you are agreeing to comply with the terms of its LICENSE.

💾 Download

Our downloader script and PyTorch dataloader will be uploaded soon.

Manual download

In the meantime, all our files are hosted here:

SLS_URI = "https://docs-assets.developer.apple.com/ml-research/datasets/spatial-librispeech/v1"

You can manually download the metadata from the URL below. Refer to the dataset schema (DATASET_SCHEMA.md) for more information about how the data is structured.

f"{SLS_URI}/metadata.parquet"

Using the metadata, you can manually download samples with:

# speech first order ambisonics samples
f"{SLS_URI}/ambisonics/{sample_id:06}.flac"
# distractor noise first order ambisonics samples
f"{SLS_URI}/noise_ambisonics/{sample_id:06}.flac"
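The URL templates above can be wrapped in a small helper. This is a minimal Python sketch, not part of any released tooling: the function name `sample_url` and its `kind` argument are hypothetical, and it assumes `sample_id` is a non-negative integer zero-padded to six digits, as the `:06` format spec implies.

```python
SLS_URI = "https://docs-assets.developer.apple.com/ml-research/datasets/spatial-librispeech/v1"

# Sub-directories taken from the two URL templates in this README.
_SUBDIRS = {"speech": "ambisonics", "noise": "noise_ambisonics"}


def sample_url(sample_id: int, kind: str = "speech") -> str:
    """Build the URL of a first-order ambisonics FLAC file (hypothetical helper)."""
    if kind not in _SUBDIRS:
        raise ValueError(f"kind must be one of {sorted(_SUBDIRS)}, got {kind!r}")
    if sample_id < 0:
        raise ValueError("sample_id must be non-negative")
    # Same formatting as the README templates: six-digit, zero-padded id.
    return f"{SLS_URI}/{_SUBDIRS[kind]}/{sample_id:06}.flac"
```

With this helper, `sample_url(0)` gives the address of the first speech sample, and `sample_url(0, "noise")` gives the matching distractor noise file.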

So, for instance, you may download the metadata with this command:

curl -O https://docs-assets.developer.apple.com/ml-research/datasets/spatial-librispeech/v1/metadata.parquet

And the first speech sample with:

curl -O https://docs-assets.developer.apple.com/ml-research/datasets/spatial-librispeech/v1/ambisonics/000000.flac
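The same downloads can be scripted from Python with only the standard library. The sketch below is an illustration, not the forthcoming downloader script: `local_path` and `download` are hypothetical names, and the choice to skip files that already exist locally is an assumption made here for convenience.

```python
import shutil
import urllib.request
from pathlib import Path

SLS_URI = "https://docs-assets.developer.apple.com/ml-research/datasets/spatial-librispeech/v1"


def local_path(url: str, dest_dir: str = ".") -> Path:
    """Return where a URL would be saved: dest_dir plus the URL's file name."""
    return Path(dest_dir) / url.rsplit("/", 1)[-1]


def download(url: str, dest_dir: str = ".", overwrite: bool = False) -> Path:
    """Fetch url into dest_dir, mirroring `curl -O` (hypothetical helper)."""
    target = local_path(url, dest_dir)
    if target.exists() and not overwrite:
        # Assumption of this sketch: an existing file is treated as complete.
        return target
    target.parent.mkdir(parents=True, exist_ok=True)
    # Stream the response to disk so large files are not held in memory.
    with urllib.request.urlopen(url) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    return target
```

Calling `download(f"{SLS_URI}/metadata.parquet")` and then `download(f"{SLS_URI}/ambisonics/{0:06}.flac")` would reproduce the two `curl` commands above. Note that an interrupted transfer leaves a partial file behind, which this sketch would then skip unless `overwrite=True` is passed.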

⚠️ 19-channel speech and distractor noise samples are very large and we are evaluating how to best host them. If you need them in the meantime, please contact us.

✉️ Contact

spatial-librispeech-dataset@group.apple.com
