PyTorch Datasets

This page lists the supported datasets and their corresponding PyTorch Dataset classes. If you're more interested in the datasets than in the code, see this page.
LibriMix

class asteroid.data.LibriMix(csv_dir, task='sep_clean', sample_rate=16000, n_src=2, segment=3, return_id=False)

Bases: torch.utils.data.Dataset
Dataset class for LibriMix source separation tasks.
- Parameters
csv_dir (str) – The path to the metadata file.
task (str) – One of 'enh_single', 'enh_both', 'sep_clean' or 'sep_noisy':
'enh_single' for single-speaker speech enhancement.
'enh_both' for multi-speaker speech enhancement.
'sep_clean' for two-speaker clean source separation.
'sep_noisy' for two-speaker noisy source separation.
sample_rate (int) – The sample rate of the sources and mixtures.
n_src (int) – The number of sources in the mixture.
segment (int, optional) – The desired length of the sources and mixtures, in seconds.
- References
[1] “LibriMix: An Open-Source Dataset for Generalizable Speech Separation”, Cosentino et al. 2020.
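As a minimal sketch, assuming csv_dir points at metadata produced by the LibriMix recipe (the path below is a placeholder), the dataset can be wrapped in a regular DataLoader; the commented shapes are indicative:
>>> from torch.utils.data import DataLoader
>>> from asteroid.data import LibriMix
>>> # 'metadata/train-100' is a hypothetical path to the recipe's metadata.
>>> train_set = LibriMix(
>>>     csv_dir='metadata/train-100', task='sep_clean',
>>>     sample_rate=16000, n_src=2, segment=3,
>>> )
>>> train_loader = DataLoader(train_set, batch_size=4, shuffle=True)
>>> mixture, sources = next(iter(train_loader))
>>> # mixture: (batch, time), sources: (batch, n_src, time)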
classmethod loaders_from_mini(batch_size=4, **kwargs)

Downloads MiniLibriMix and returns train and validation DataLoader.
- Parameters
batch_size (int) – Batch size of the DataLoader. This is the only DataLoader parameter exposed; for more control over the DataLoader, call mini_from_download and instantiate the DataLoader yourself.
**kwargs – Keyword arguments to pass to LibriMix, see __init__. The kwargs will be fed to both the training set and the validation set.
- Returns
train_loader, val_loader – training and validation DataLoader out of LibriMix Dataset.
- Examples
>>> from asteroid.data import LibriMix
>>> train_loader, val_loader = LibriMix.loaders_from_mini(
>>>     task='sep_clean', batch_size=4
>>> )
classmethod mini_from_download(**kwargs)

Downloads MiniLibriMix and returns train and validation Dataset. If you want to instantiate the Dataset yourself, call mini_download, which returns the path to the metadata files.
- Parameters
**kwargs – Keyword arguments to pass to LibriMix, see __init__. The kwargs will be fed to both the training set and the validation set.
- Returns
train_set, val_set – training and validation instances of LibriMix (data.Dataset).
- Examples
>>> from asteroid.data import LibriMix
>>> train_set, val_set = LibriMix.mini_from_download(task='sep_clean')
Wsj0mix

class asteroid.data.Wsj0mixDataset(json_dir, n_src=2, sample_rate=8000, segment=4.0)

Bases: torch.utils.data.Dataset
Dataset class for the wsj0-mix source separation dataset.
- Parameters
json_dir (str) – The path to the directory containing the json files.
sample_rate (int, optional) – The sampling rate of the wav files.
segment (float, optional) – Length of the segments used for training, in seconds. If None, use full utterances (e.g. for test).
n_src (int, optional) – Number of sources in the training targets.
- References
“Deep clustering: Discriminative embeddings for segmentation and separation”, Hershey et al. 2015.
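As a sketch, assuming json files generated by the wsj0-mix recipe (the directory below is a placeholder), passing segment=None loads full test utterances:
>>> from asteroid.data import Wsj0mixDataset
>>> test_set = Wsj0mixDataset('data/wav8k/min/tt', n_src=2, sample_rate=8000, segment=None)
>>> mixture, sources = test_set[0]  # full-length mixture and stacked sources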
WHAM!

class asteroid.data.WhamDataset(json_dir, task, sample_rate=8000, segment=4.0, nondefault_nsrc=None, normalize_audio=False)

Bases: torch.utils.data.Dataset
Dataset class for WHAM source separation and speech enhancement tasks.
- Parameters
json_dir (str) – The path to the directory containing the json files.
task (str) – One of 'enh_single', 'enh_both', 'sep_clean' or 'sep_noisy':
'enh_single' for single-speaker speech enhancement.
'enh_both' for multi-speaker speech enhancement.
'sep_clean' for two-speaker clean source separation.
'sep_noisy' for two-speaker noisy source separation.
sample_rate (int, optional) – The sampling rate of the wav files.
segment (float, optional) – Length of the segments used for training, in seconds. If None, use full utterances (e.g. for test).
nondefault_nsrc (int, optional) – Number of sources in the training targets. If None, defaults to one for enhancement tasks and two for separation tasks.
normalize_audio (bool) – If True then both sources and the mixture are normalized with the standard deviation of the mixture.
- References
“WHAM!: Extending Speech Separation to Noisy Environments”, Wichern et al. 2019.
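For instance, a single-speaker enhancement set might be instantiated as follows (the json directory is a placeholder; nondefault_nsrc is left at its task-dependent default):
>>> from asteroid.data import WhamDataset
>>> train_set = WhamDataset(
>>>     'data/wav8k/min/tr', task='enh_single',
>>>     sample_rate=8000, segment=4.0, normalize_audio=True,
>>> )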
WHAMR!

class asteroid.data.WhamRDataset(json_dir, task, sample_rate=8000, segment=4.0, nondefault_nsrc=None)

Bases: torch.utils.data.Dataset
Dataset class for WHAMR source separation and speech enhancement tasks.
- Parameters
json_dir (str) – The path to the directory containing the json files.
task (str) – One of 'sep_clean', 'sep_noisy', 'sep_reverb' or 'sep_reverb_noisy':
'sep_clean' for two-speaker clean (anechoic) source separation.
'sep_noisy' for two-speaker noisy (anechoic) source separation.
'sep_reverb' for two-speaker clean reverberant source separation.
'sep_reverb_noisy' for two-speaker noisy reverberant source separation.
sample_rate (int, optional) – The sampling rate of the wav files.
segment (float, optional) – Length of the segments used for training, in seconds. If None, use full utterances (e.g. for test).
nondefault_nsrc (int, optional) – Number of sources in the training targets. If None, defaults to one for enhancement tasks and two for separation tasks.
- References
“WHAMR!: Noisy and Reverberant Single-Channel Speech Separation”, Maciejewski et al. 2020.
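A sketch for the noisy reverberant condition, with a placeholder json directory:
>>> from asteroid.data import WhamRDataset
>>> train_set = WhamRDataset('data/wav8k/min/tr', task='sep_reverb_noisy', segment=4.0)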
SMS-WSJ

class asteroid.data.SmsWsjDataset(json_path, target, dset, sample_rate=8000, single_channel=True, segment=4.0, nondefault_nsrc=None, normalize_audio=False)

Bases: torch.utils.data.Dataset

Dataset class for SMS-WSJ source separation.
- Parameters
json_path (str) – The path to the sms_wsj json file.
target (str) – One of 'source', 'early' or 'image':
'source' for non-reverberant clean target signals.
'early' for early-reverberation target signals.
'image' for reverberant target signals.
dset (str) – One of 'train_si284' (train), 'cv_dev93' (validation) or 'test_eval92' (test).
sample_rate (int, optional) – The sampling rate of the wav files.
segment (float, optional) – Length of the segments used for training, in seconds. If None, use full utterances (e.g. for test).
single_channel (bool) – If False, all channels are used. If True, only a random channel is used during training and the first channel during test.
nondefault_nsrc (int, optional) – Number of sources in the training targets.
normalize_audio (bool) – If True then both sources and the mixture are normalized with the standard deviation of the mixture.
- References
“SMS-WSJ: Database, performance measures, and baseline recipe for multi-channel source separation and recognition”, Drude et al. 2019.
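A sketch, assuming a json file produced by the SMS-WSJ recipe (the path is a placeholder); 'early' targets are one common choice of training reference:
>>> from asteroid.data import SmsWsjDataset
>>> train_set = SmsWsjDataset(
>>>     json_path='sms_wsj.json', target='early', dset='train_si284',
>>>     sample_rate=8000, single_channel=True, segment=4.0,
>>> )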
KinectWSJMix

class asteroid.data.KinectWsjMixDataset(json_dir, n_src=2, sample_rate=16000, segment=4.0)

Bases: asteroid.data.wsj0_mix.Wsj0mixDataset
Dataset class for the KinectWSJ-mix source separation dataset.
- Parameters
json_dir (str) – The path to the directory containing the json files.
sample_rate (int, optional) – The sampling rate of the wav files.
segment (float, optional) – Length of the segments used for training, in seconds. If None, use full utterances (e.g. for test).
n_src (int, optional) – Number of sources in the training targets.
- References
“Analyzing the impact of speaker localization errors on speech separation for automatic speech recognition”, Sivasankaran et al. 2020.
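Since the class inherits from Wsj0mixDataset, instantiation follows the same pattern, here at the 16 kHz default (placeholder json directory):
>>> from asteroid.data import KinectWsjMixDataset
>>> train_set = KinectWsjMixDataset('data/tr', n_src=2, sample_rate=16000, segment=4.0)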
DNSDataset

class asteroid.data.DNSDataset(json_dir)

Bases: torch.utils.data.Dataset
Deep Noise Suppression (DNS) Challenge’s dataset.
- Parameters
json_dir (str) – The path to the JSON directory (from the recipe).
- References
“The INTERSPEECH 2020 Deep Noise Suppression Challenge: Datasets, Subjective Testing Framework, and Challenge Results”, Reddy et al. 2020.
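Instantiation only needs the recipe's JSON directory (the path below is a placeholder):
>>> from asteroid.data import DNSDataset
>>> dns_set = DNSDataset('data/dns_json')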
MUSDB18

class asteroid.data.MUSDB18Dataset(root, sources=['vocals', 'bass', 'drums', 'other'], targets=None, suffix='.wav', split='train', subset=None, segment=None, samples_per_track=1, random_segments=False, random_track_mix=False, source_augmentations=<function MUSDB18Dataset.<lambda>>, sample_rate=44100)

Bases: torch.utils.data.Dataset
MUSDB18 music separation dataset.

The dataset consists of 150 full-length music tracks (about 10 hours in total) of different genres along with their isolated stems: drums, bass, vocals and other.

Out of the box, Asteroid only supports MUSDB18-HQ, which comes as uncompressed WAV files. To use MUSDB18, please convert it to WAV first:

MUSDB18-HQ: https://zenodo.org/record/3338373

Note

The datasets are hosted on Zenodo and require that users request access, since the tracks can only be used for academic purposes. These requests are checked manually.

This dataset assumes music tracks in (sub)folders, where each folder has a fixed number of sources (defaults to 4). For each track, a list of sources and a common suffix can be specified. A linear mix is performed on the fly by summing up the sources.

Because all tracks comprise the exact same set of sources, random track mixing can be used, where sources from different tracks are mixed together.
- Folder Structure:
>>> #train/1/vocals.wav ---------|
>>> #train/1/drums.wav ----------+--> input (mix), output[target]
>>> #train/1/bass.wav -----------|
>>> #train/1/other.wav ---------/
- Parameters
root (str) – Root path of dataset.
sources (list of str, optional) – List of source names that compose the mixture. Defaults to the MUSDB18 4-stem scenario: vocals, drums, bass, other.
targets (list or None, optional) – List of source names to be used as targets. If None, a dict with the 4 stems is returned. If e.g. [vocals, drums], a tensor with stacked vocals and drums is returned instead of a dict. Defaults to None.
suffix (str, optional) – Filename suffix, defaults to .wav.
split (str, optional) – Dataset subfolder, defaults to train.
subset (list of str, optional) – Selects a specific list of tracks to be loaded, defaults to None (loads all tracks).
segment (float, optional) – Duration of segments in seconds, defaults to None, which loads the full-length audio tracks.
samples_per_track (int, optional) – Number of samples yielded from each track; can be used to increase dataset size. Defaults to 1.
random_segments (boolean, optional) – Enables random offset for track segments.
random_track_mix (boolean, optional) – Enables mixing of random sources from different tracks to assemble the mix.
source_augmentations (list of callable) – List of augmentation functions, defaults to a no-op augmentation (input = output).
sample_rate (int, optional) – Sample rate of files in the dataset.
- Variables
root (str) – Root path of dataset.
sources (list of str, optional) – List of source names. Defaults to the MUSDB18 4-stem scenario: vocals, drums, bass, other.
suffix (str, optional) – Filename suffix, defaults to .wav.
split (str, optional) – Dataset subfolder, defaults to train.
subset (list of str, optional) – Selects a specific list of tracks to be loaded, defaults to None (loads all tracks).
segment (float, optional) – Duration of segments in seconds, defaults to None, which loads the full-length audio tracks.
samples_per_track (int, optional) – Number of samples yielded from each track; can be used to increase dataset size. Defaults to 1.
random_segments (boolean, optional) – Enables random offset for track segments.
random_track_mix (boolean, optional) – Enables mixing of random sources from different tracks to assemble the mix.
source_augmentations (list of callable) – List of augmentation functions, defaults to a no-op augmentation (input = output).
sample_rate (int, optional) – Sample rate of files in the dataset.
tracks (list of dict) – List of track metadata.
- References
“The 2018 Signal Separation Evaluation Campaign”, Stöter et al. 2018.
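A sketch of a training setup (root is a placeholder for the MUSDB18-HQ root folder): passing targets=['vocals', 'drums'] returns a stacked target tensor instead of a dict, while random_segments and random_track_mix act as on-the-fly augmentation.
>>> from asteroid.data import MUSDB18Dataset
>>> train_set = MUSDB18Dataset(
>>>     root='MUSDB18-HQ', targets=['vocals', 'drums'], split='train',
>>>     segment=5.0, random_segments=True, random_track_mix=True,
>>>     samples_per_track=16, sample_rate=44100,
>>> )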
DAMP-VSEP

class asteroid.data.DAMPVSEPSinglesDataset(root_path, task, split='train_singles', ex_per_track=1, random_segments=False, sample_rate=16000, segment=None, norm=None, source_augmentations=None, mixture='original')

Bases: torch.utils.data.Dataset
DAMP-VSEP vocal separation dataset
This dataset uses one of the two preprocessed versions of DAMP-VSEP from https://github.com/groadabike/DAMP-VSEP-Singles, aimed at single-singer separation.

The DAMP-VSEP dataset is hosted on Zenodo: https://zenodo.org/record/3553059
- Parameters
root_path (str) – Root path to the DAMP-VSEP dataset.
task (str) – One of 'enh_vocal' or 'separation':
'enh_vocal' for vocal enhancement.
'separation' for vocal and background separation.
split (str) – One of 'train_english', 'train_singles', 'valid' and 'test'. Defaults to 'train_singles'.
ex_per_track (int, optional) – Number of samples yielded from each track; can be used to increase dataset size. Defaults to 1.
random_segments (boolean, optional) – Enables random offset for track segments.
sample_rate (int, optional) – Sample rate of files in the dataset. Defaults to 16000 Hz.
segment (float, optional) – Duration of segments in seconds. Defaults to None, which loads the full-length audio tracks.
norm (str, optional) – Type of normalisation to use: 'song_level' to use the mixture's mean and std, or None for no normalisation. Defaults to None.
source_augmentations (callable, optional) – Augmentations applied to the sources (only). Defaults to None.
mixture (str, optional) – Whether to use the original mixture with non-linear effects or to remix the sources: 'remix' remixes the sources by addition, 'original' uses the original mixture. Defaults to 'original'.
Note
There are two train sets available:

train_english: uses all English-spoken songs, with duets converted into two singles. In total, 9243 performances and 77 hours.

train_singles: uses all singles performances, discarding all duets. In total, 20660 performances and 149 hours.
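A sketch for vocal/background separation on the singles split (root_path is a placeholder):
>>> from asteroid.data import DAMPVSEPSinglesDataset
>>> train_set = DAMPVSEPSinglesDataset(
>>>     root_path='DAMP-VSEP', task='separation', split='train_singles',
>>>     sample_rate=16000, segment=3.0, random_segments=True,
>>>     norm='song_level', mixture='original',
>>> )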
FUSS

class asteroid.data.FUSSDataset(file_list_path, return_bg=False)

Bases: torch.utils.data.Dataset
Dataset class for FUSS [1] tasks.
- Parameters
file_list_path (str) – Path to the file list created by the recipe.
return_bg (bool) – Whether to return the background along with the mixture and sources. Defaults to False.
- References
[1] “What’s All the FUSS About Free Universal Sound Separation Data?”, Wisdom et al. 2020, in preparation.
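A sketch, assuming a file list created by the FUSS recipe (the filename is a placeholder); set return_bg=True to also get the background:
>>> from asteroid.data import FUSSDataset
>>> train_set = FUSSDataset('train_list.txt', return_bg=False)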
AVSpeech

class asteroid.data.AVSpeechDataset(input_df_path: Union[str, pathlib.Path], embed_dir: Union[str, pathlib.Path], n_src=2)

Bases: torch.utils.data.Dataset
Audio Visual Speech Separation dataset as described in [1].
- Parameters
input_df_path (str or Path) – Path to the input dataframe.
embed_dir (str or Path) – Path to the directory where embeddings are stored.
n_src (int) – Number of sources. Defaults to 2.
- References
[1] “Looking to Listen at the Cocktail Party: A Speaker-Independent Audio-Visual Model for Speech Separation”, Ephrat et al. https://arxiv.org/abs/1804.03619
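A sketch with placeholder paths for the input dataframe and the directory of precomputed embeddings:
>>> from asteroid.data import AVSpeechDataset
>>> train_set = AVSpeechDataset('train.csv', embed_dir='embeddings/', n_src=2)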