Filterbank API¶

Filterbank, Encoder and Decoder¶

class asteroid_filterbanks.Filterbank(n_filters, kernel_size, stride=None, sample_rate=8000.0)[source]¶

Bases: torch.nn.Module

Base Filterbank class. Each subclass has to implement a filters method.

Parameters

n_filters (int) – Number of filters.
kernel_size (int) – Length of the filters.
stride (int, optional) – Stride of the convolution or transposed convolution (hop size). If None (default), set to kernel_size // 2.
sample_rate (float) – Sample rate of the expected audio. Defaults to 8000.

Variables

n_feats_out (int) – Number of output filters.

filters()[source]¶: Abstract method for filters.

pre_analysis(wav: torch.Tensor)[source]¶: Apply transform before encoder convolution.

post_analysis(spec: torch.Tensor)[source]¶: Apply transform after encoder convolution.

pre_synthesis(spec: torch.Tensor)[source]¶: Apply transform before decoder transposed convolution.

post_synthesis(wav: torch.Tensor)[source]¶: Apply transform after decoder transposed convolution.

get_config()[source]¶: Returns a dictionary of arguments to re-instantiate the class. Needs to be overridden if the filterbank takes arguments other than n_filters, kernel_size, stride and sample_rate.

class asteroid_filterbanks.Encoder(filterbank, is_pinv=False, as_conv1d=True, padding=0)[source]¶

Bases: asteroid_filterbanks.enc_dec._EncDec

Encoder class.

Add encoding methods to Filterbank classes. Not intended to be subclassed.

Parameters

filterbank (Filterbank) – The filterbank to use as an encoder.
is_pinv (bool) – Whether the encoder is the pseudo-inverse of filterbank.
as_conv1d (bool) – Whether to behave like nn.Conv1d. If True (default), forwarding input with shape \((batch, 1, time)\) will output a tensor of shape \((batch, freq, conv\_time)\). If False, will output a tensor of shape \((batch, 1, freq, conv\_time)\).
padding (int) – Zero-padding added to both sides of the input.

classmethod pinv_of(filterbank, **kwargs)[source]¶: Returns an Encoder, pseudo inverse of a Filterbank or Decoder.

forward(waveform)[source]¶

Convolve input waveform with the filters from a filterbank.

Parameters: waveform (torch.Tensor) – Any tensor with samples along the last dimension, i.e. the waveform with any leading batch/channel dimensions.
Returns: torch.Tensor – The corresponding TF domain signal.

Shapes

>>> (time, ) -> (freq, conv_time)
>>> (batch, time) -> (batch, freq, conv_time)  # Avoid
>>> if as_conv1d:
>>>     (batch, 1, time) -> (batch, freq, conv_time)
>>>     (batch, chan, time) -> (batch, chan, freq, conv_time)
>>> else:
>>>     (batch, chan, time) -> (batch, chan, freq, conv_time)
>>> (batch, any, dim, time) -> (batch, any, dim, freq, conv_time)
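The conv_time dimension in the shapes above can be estimated with the usual output-length formula of a strided 1-D convolution (dilation 1). The sketch below is plain Python; the helper name n_frames is hypothetical and not part of asteroid_filterbanks, and it assumes the stride falls back to kernel_size // 2 as documented for Filterbank.

```python
def n_frames(time, kernel_size, stride=None, padding=0):
    """Number of frames (conv_time) produced by a strided 1-D convolution."""
    # Hypothetical helper for illustration, not part of asteroid_filterbanks.
    if stride is None:
        stride = kernel_size // 2  # documented default hop size
    padded = time + 2 * padding  # zero-padding is added to both sides
    if padded < kernel_size:
        raise ValueError("input is shorter than one filter")
    return (padded - kernel_size) // stride + 1


print(n_frames(time=8000, kernel_size=16))  # 999
```

Samples that do not fill a complete last frame are dropped by the integer division, which is why the decoded signal can be shorter than the input.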

class asteroid_filterbanks.Decoder(filterbank, is_pinv=False, padding=0, output_padding=0)[source]¶

Bases: asteroid_filterbanks.enc_dec._EncDec

Decoder class.

Add decoding methods to Filterbank classes. Not intended to be subclassed.

Parameters

filterbank (Filterbank) – The filterbank to use as a decoder.
is_pinv (bool) – Whether the decoder is the pseudo-inverse of filterbank.
padding (int) – Zero-padding added to both sides of the input.
output_padding (int) – Additional size added to one side of the output shape.

Note

padding and output_padding arguments are directly passed to F.conv_transpose1d.
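Since these arguments go to F.conv_transpose1d, the length of the decoded signal should follow PyTorch's transposed-convolution output formula (dilation 1). A minimal plain-Python sketch, with a hypothetical helper name that is not part of the library:

```python
def decoded_length(n_frames, kernel_size, stride=None, padding=0, output_padding=0):
    """Output length of a 1-D transposed convolution with dilation 1."""
    # Hypothetical helper for illustration, not part of asteroid_filterbanks.
    if stride is None:
        stride = kernel_size // 2  # documented default hop size
    return (n_frames - 1) * stride - 2 * padding + kernel_size + output_padding


print(decoded_length(999, kernel_size=16))  # 8000
```

output_padding only extends one side of the output, so it can recover a few samples lost to framing but does not change the reconstructed content.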

classmethod pinv_of(filterbank)[source]¶: Returns a Decoder, pseudo inverse of a filterbank or Encoder.

forward(spec, length: Optional[int] = None) → torch.Tensor[source]¶

Applies transposed convolution to a TF representation.

This is equivalent to overlap-add.

Parameters

spec (torch.Tensor) – 3D or 4D Tensor. The TF representation. (Output of Encoder.forward()).
length (int, optional) – Desired output length.

Returns

torch.Tensor – The corresponding time domain signal.

class asteroid_filterbanks.make_enc_dec[source]¶

Creates congruent encoder and decoder from the same filterbank family.

Parameters

fb_name (str, className) – Filterbank family from which to make encoder and decoder. One of ['free', 'analytic_free', 'param_sinc', 'stft']. Can also be a class defined in a submodule of this subpackage (e.g. FreeFB).
n_filters (int) – Number of filters.
kernel_size (int) – Length of the filters.
stride (int, optional) – Stride of the convolution. If None (default), set to kernel_size // 2.
sample_rate (float) – Sample rate of the expected audio. Defaults to 8000.0.
who_is_pinv (str, optional) – If None, no pseudo-inverse filters will be used. If string (among ['encoder', 'decoder']), decides which of Encoder or Decoder will be the pseudo inverse of the other one.
padding (int) – Zero-padding added to both sides of the input. Passed to Encoder and Decoder.
output_padding (int) – Additional size added to one side of the output shape. Passed to Decoder.
**kwargs – Arguments passed to the filterbank class in addition to the usual n_filters, kernel_size and stride. Depends on the filterbank family.

Returns

Encoder, Decoder

class asteroid_filterbanks.get[source]¶

Returns a filterbank class from a string. Returns its input if it is callable (already a Filterbank for example).

Parameters: identifier (str or Callable or None) – the filterbank identifier.
Returns: Filterbank or None

Learnable filterbanks¶

Free¶

class asteroid_filterbanks.free_fb.FreeFB(n_filters, kernel_size, stride=None, sample_rate=8000.0, **kwargs)[source]¶

Bases: asteroid_filterbanks.enc_dec.Filterbank

Free filterbank without any constraints. Equivalent to nn.Conv1d.

Parameters

n_filters (int) – Number of filters.
kernel_size (int) – Length of the filters.
stride (int, optional) – Stride of the convolution. If None (default), set to kernel_size // 2.
sample_rate (float) – Sample rate of the expected audio. Defaults to 8000.

Variables

n_feats_out (int) – Number of output filters.

References: [1] : “Filterbank design for end-to-end speech separation”. ICASSP 2020. Manuel Pariente, Samuele Cornell, Antoine Deleforge, Emmanuel Vincent.

filters()[source]¶: Abstract method for filters.

Analytic Free¶

class asteroid_filterbanks.analytic_free_fb.AnalyticFreeFB(n_filters, kernel_size, stride=None, sample_rate=8000.0, **kwargs)[source]¶

Bases: asteroid_filterbanks.enc_dec.Filterbank

Free analytic (fully learned with analyticity constraints) filterbank. For more details, see [1].

Parameters

n_filters (int) – Number of filters. Half of n_filters will have parameters, the other half will be their Hilbert transforms. n_filters should be even.
kernel_size (int) – Length of the filters.
stride (int, optional) – Stride of the convolution. If None (default), set to kernel_size // 2.
sample_rate (float) – Sample rate of the expected audio. Defaults to 8000.

Variables

n_feats_out (int) – Number of output filters.

References: [1] : “Filterbank design for end-to-end speech separation”. ICASSP 2020. Manuel Pariente, Samuele Cornell, Antoine Deleforge, Emmanuel Vincent.

filters()[source]¶: Abstract method for filters.

Parameterized Sinc¶

class asteroid_filterbanks.param_sinc_fb.ParamSincFB(n_filters, kernel_size, stride=None, sample_rate=16000.0, min_low_hz=50, min_band_hz=50, **kwargs)[source]¶

Bases: asteroid_filterbanks.enc_dec.Filterbank

Extension of the parameterized filterbank from [1] proposed in [2]. Modified and extended from https://github.com/mravanelli/SincNet

Parameters

n_filters (int) – Number of filters. Half of n_filters (the real parts) will have parameters, the other half will correspond to the imaginary parts. n_filters should be even.
kernel_size (int) – Length of the filters.
stride (int, optional) – Stride of the convolution. If None (default), set to kernel_size // 2.
sample_rate (float, optional) – The sample rate (used for initialization).
min_low_hz (int, optional) – Lowest low frequency allowed (Hz).
min_band_hz (int, optional) – Lowest band frequency allowed (Hz).

Variables

n_feats_out (int) – Number of output filters.

References

[1] : “Speaker Recognition from raw waveform with SincNet”. SLT 2018. Mirco Ravanelli, Yoshua Bengio. https://arxiv.org/abs/1808.00158

[2] : “Filterbank design for end-to-end speech separation”. ICASSP 2020. Manuel Pariente, Samuele Cornell, Antoine Deleforge, Emmanuel Vincent. https://arxiv.org/abs/1910.10400

filters()[source]¶: Compute filters from parameters

get_config()[source]¶: Returns dictionary of arguments to re-instantiate the class.

Fixed filterbanks¶

STFT¶

class asteroid_filterbanks.stft_fb.STFTFB(n_filters, kernel_size, stride=None, window=None, sample_rate=8000.0, **kwargs)[source]¶

Bases: asteroid_filterbanks.enc_dec.Filterbank

STFT filterbank.

Parameters

n_filters (int) – Number of filters. Determines the length of the STFT filters before windowing.
kernel_size (int) – Length of the filters (i.e. the window).
stride (int, optional) – Stride of the convolution (hop size). If None (default), set to kernel_size // 2.
window (numpy.ndarray, optional) – If None, defaults to np.sqrt(np.hanning()).
sample_rate (float) – Sample rate of the expected audio. Defaults to 8000.

Variables

n_feats_out (int) – Number of output filters.

filters()[source]¶: Abstract method for filters.

asteroid_filterbanks.stft_fb.perfect_synthesis_window(analysis_window, hop_size)[source]¶

Computes a window for perfect synthesis given an analysis window and a hop size.

Parameters

analysis_window (np.array) – Analysis window of the transform.
hop_size (int) – Hop size in number of samples.

Returns

np.array – the synthesis window to use for perfectly inverting the STFT.
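One standard way to obtain such a window is to divide each analysis-window sample by the summed energy of all analysis-window samples that overlap it at multiples of the hop size. The sketch below shows that construction in plain Python. It is an assumption about the general technique, not necessarily the exact computation or scaling used by perfect_synthesis_window, and the function name is hypothetical.

```python
import math


def synthesis_window_sketch(analysis_window, hop_size):
    """Synthesis window giving unit overlap-add gain for the given hop."""
    # Hypothetical helper for illustration, not part of asteroid_filterbanks.
    n = len(analysis_window)
    if hop_size <= 0 or n % hop_size != 0:
        raise ValueError("window length must be a multiple of hop_size")
    # Window samples whose indices share the same remainder modulo the hop
    # land on the same output sample during overlap-add: sum their energy.
    energy = [0.0] * hop_size
    for i, w in enumerate(analysis_window):
        energy[i % hop_size] += w * w
    return [w / energy[i % hop_size] for i, w in enumerate(analysis_window)]


# Hamming-like analysis window of length 8 with a hop of 2 samples.
analysis = [0.54 - 0.46 * math.cos(2 * math.pi * i / 8) for i in range(8)]
synthesis = synthesis_window_sketch(analysis, hop_size=2)

# Overlap-add gain at each position within a hop; each should be close to 1.
gains = [sum(analysis[i] * synthesis[i] for i in range(r, 8, 2)) for r in range(2)]
```

With a gain of one at every position, windowing with the analysis window, then the synthesis window, and overlap-adding returns the original samples away from the signal edges.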

MelGram¶

class asteroid_filterbanks.melgram_fb.MelGramFB(n_filters, kernel_size, stride=None, window=None, sample_rate=8000.0, n_mels=40, fmin=0.0, fmax=None, norm='slaney', **kwargs)[source]¶

Bases: asteroid_filterbanks.stft_fb.STFTFB

Mel magnitude spectrogram filterbank.

Parameters

n_filters (int) – Number of filters. Determines the length of the STFT filters before windowing.
kernel_size (int) – Length of the filters (i.e. the window).
stride (int, optional) – Stride of the convolution (hop size). If None (default), set to kernel_size // 2.
window (numpy.ndarray, optional) – If None, defaults to np.sqrt(np.hanning()).
sample_rate (float) – Sample rate of the expected audio. Defaults to 8000.
n_mels (int) – Number of mel bands.
fmin (float) – Minimum frequency of the mel filters.
fmax (float) – Maximum frequency of the mel filters. Defaults to sample_rate//2.
norm (str) – Mel normalization {None, ‘slaney’, or number}. See librosa.filters.mel
**kwargs –

post_analysis(spec: torch.Tensor)[source]¶: Apply transform after encoder convolution.

get_config()[source]¶: Returns a dictionary of arguments to re-instantiate the class. Needs to be overridden if the filterbank takes arguments other than n_filters, kernel_size, stride and sample_rate.

class asteroid_filterbanks.melgram_fb.MelScale(n_filters, sample_rate=8000.0, n_mels=40, fmin=0.0, fmax=None, norm='slaney')[source]¶

Bases: torch.nn.Module

Mel-scale filterbank matrix.

Parameters

n_filters (int) – Number of filters. Determines the length of the STFT filters before windowing.
sample_rate (float) – Sample rate of the expected audio. Defaults to 8000.
n_mels (int) – Number of mel bands.
fmin (float) – Minimum frequency of the mel filters.
fmax (float) – Maximum frequency of the mel filters. Defaults to sample_rate//2.
norm (str) – Mel normalization {None, ‘slaney’, or number}. See librosa.filters.mel

MPGT¶

class asteroid_filterbanks.multiphase_gammatone_fb.MultiphaseGammatoneFB(n_filters=128, kernel_size=16, sample_rate=8000.0, stride=None, **kwargs)[source]¶

Bases: asteroid_filterbanks.enc_dec.Filterbank

Multi-Phase Gammatone Filterbank as described in [1].

Please cite [1] whenever using this.

Original code repository:

Parameters

n_filters (int) – Number of filters.
kernel_size (int) – Length of the filters.
sample_rate (float, optional) – The sample rate (used for initialization).
stride (int, optional) – Stride of the convolution. If None (default), set to kernel_size // 2.

References: [1] David Ditter, Timo Gerkmann, “A Multi-Phase Gammatone Filterbank for Speech Separation via TasNet”, ICASSP 2020 Available: https://ieeexplore.ieee.org/document/9053602/

filters()[source]¶: Abstract method for filters.

Transforms¶

Griffin-Lim and MISI¶

asteroid_filterbanks.griffin_lim.griffin_lim(mag_specgram, stft_enc, angles=None, istft_dec=None, n_iter=6, momentum=0.9)[source]¶

Estimates matching phase from magnitude spectrogram using the ‘fast’ Griffin-Lim algorithm [1].

Parameters

mag_specgram (torch.Tensor) – (any, dim, ension, freq, frames) as returned by Encoder(STFTFB), the magnitude spectrogram to be inverted.
stft_enc (Encoder[STFTFB]) – The Encoder(STFTFB()) object that was used to compute the input mag_specgram.
angles (None or Tensor) – Angles to use to initialize the algorithm. If None (default), angles are initialized with a uniform distribution.
istft_dec (None or Decoder[STFTFB]) – Optional Decoder to use to get back to the time domain. If None (default), a perfect reconstruction Decoder is built from stft_enc.
n_iter (int) – Number of griffin-lim iterations to run.
momentum (float) – The momentum of fast Griffin-Lim. Original Griffin-Lim is obtained for momentum=0.

Returns

torch.Tensor – estimated waveforms of shape (any, dim, ension, time).

Examples

>>> stft = Encoder(STFTFB(n_filters=256, kernel_size=256, stride=128))
>>> wav = torch.randn(2, 1, 8000)
>>> spec = stft(wav)
>>> masked_spec = spec * torch.sigmoid(torch.randn_like(spec))
>>> mag = transforms.mag(masked_spec, -2)
>>> est_wav = griffin_lim(mag, stft, n_iter=32)

References

[1] Perraudin et al. “A fast Griffin-Lim algorithm,” WASPAA 2013.

[2] D. W. Griffin and J. S. Lim: “Signal estimation from modified short-time Fourier transform,” ASSP 1984.
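For orientation, the fast Griffin-Lim iteration as formulated in [1] can be written as below. The notation is chosen here for illustration: \(s\) is the target magnitude (mag_specgram), \(\alpha\) is the momentum, and \(c_n\), \(t_n\) are complex spectrograms. The implementation in this module may apply the momentum term differently, so treat this as a summary of the paper rather than of the code.

```latex
\begin{aligned}
c_n &= P_{C_1}\big(P_{C_2}(t_{n-1})\big), \\
t_n &= c_n + \alpha\,(c_n - c_{n-1}), \\
P_{C_1}(c) &= \mathrm{STFT}\big(\mathrm{iSTFT}(c)\big), \qquad
P_{C_2}(c) = s \odot \frac{c}{|c|}.
\end{aligned}
```

Setting \(\alpha = 0\) gives \(t_n = c_n\), which is the original Griffin-Lim iteration of [2].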

asteroid_filterbanks.griffin_lim.misi(mixture_wav, mag_specgrams, stft_enc, angles=None, istft_dec=None, n_iter=6, momentum=0.0, src_weights=None, dim=1)[source]¶

Jointly estimates matching phase from magnitude spectrograms using the Multiple Input Spectrogram Inversion (MISI) algorithm [1].

Parameters

mixture_wav (torch.Tensor) – (batch, time)
mag_specgrams (torch.Tensor) – (batch, n_src, freq, frames) as returned by Encoder(STFTFB), the magnitude spectrograms to be jointly inverted using MISI (modified or not).
stft_enc (Encoder[STFTFB]) – The Encoder(STFTFB()) object that was used to compute the input mag_specgrams.
angles (None or Tensor) – Angles to use to initialize the algorithm. If None (default), angles are initialized with a uniform distribution.
istft_dec (None or Decoder[STFTFB]) – Optional Decoder to use to get back to the time domain. If None (default), a perfect reconstruction Decoder is built from stft_enc.
n_iter (int) – Number of MISI iterations to run.
momentum (float) – Momentum on updates (this argument comes from GriffinLim). Defaults to 0 as it was never proposed anywhere.
src_weights (None or torch.Tensor) – Consistency weight for each source. Shape needs to be broadcastable to istft_dec(mag_specgrams). We make sure that the weights sum up to 1 along dim dim. If src_weights is None, compute them based on relative power.
dim (int) – Axis which contains the sources in mag_specgrams. Used for consistency constraint.

Returns

torch.Tensor – estimated waveforms of shape (batch, n_src, time).

Examples

>>> stft = Encoder(STFTFB(n_filters=256, kernel_size=256, stride=128))
>>> wav = torch.randn(2, 3, 8000)
>>> specs = stft(wav)
>>> masked_specs = specs * torch.sigmoid(torch.randn_like(specs))
>>> mag = transforms.mag(masked_specs, -2)
>>> est_wav = misi(wav.sum(1), mag, stft, n_iter=32)

References

[1] Gunawan and Sen, “Iterative Phase Estimation for the Synthesis of Separated Sources From Single-Channel Mixtures,” in IEEE Signal Processing Letters, 2010.

[2] Wang, LeRoux et al. “End-to-End Speech Separation with Unfolded Iterative Phase Reconstruction.” Interspeech 2018 (2018)

Complex transforms¶

asteroid_filterbanks.transforms.mul_c(inp, other, dim: int = -2)[source]¶

Entrywise product for complex valued tensors.

Operands are assumed to have the real parts of each entry followed by the imaginary parts of each entry along dimension dim, e.g. for dim = 1, the matrix

[[1, 2, 3, 4],
 [5, 6, 7, 8]]

is interpreted as

[[1 + 3j, 2 + 4j],
 [5 + 7j, 6 + 8j]]

where j is such that j * j = -1.

Parameters

inp (torch.Tensor) – The first operand with real and imaginary parts concatenated on the dim axis.
other (torch.Tensor) – The second operand.
dim (int, optional) – frequency (or equivalent) dimension along which real and imaginary values are concatenated.

Returns

torch.Tensor – The complex multiplication between inp and other

For now, it assumes that other has the same shape as inp along dim.
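To make the layout concrete, here is a plain-Python sketch of the same entrywise product on a single row stored as [re..., im...]. The function name is hypothetical and it works on lists rather than tensors, so it only illustrates the convention described above.

```python
def mul_c_sketch(inp, other):
    """Entrywise complex product of two rows stored as [re..., im...]."""
    # Hypothetical helper for illustration, not part of asteroid_filterbanks.
    if len(inp) != len(other) or len(inp) % 2 != 0:
        raise ValueError("rows must have the same, even length")
    half = len(inp) // 2
    re_a, im_a = inp[:half], inp[half:]
    re_b, im_b = other[:half], other[half:]
    # (a + bj) * (c + dj) = (ac - bd) + (ad + bc)j
    re = [a * c - b * d for a, b, c, d in zip(re_a, im_a, re_b, im_b)]
    im = [a * d + b * c for a, b, c, d in zip(re_a, im_a, re_b, im_b)]
    return re + im


row = [1, 2, 3, 4]  # read as [1 + 3j, 2 + 4j]
print(mul_c_sketch(row, row))  # [-8, -12, 6, 16]
```

The result reads as [-8 + 6j, -12 + 16j], which matches squaring 1 + 3j and 2 + 4j directly.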

asteroid_filterbanks.transforms.reim(x, dim: int = -2) → Tuple[torch.Tensor, torch.Tensor][source]¶

Returns a tuple (re, im).

Parameters

x (torch.Tensor) – Complex valued tensor.
dim (int) – frequency (or equivalent) dimension along which real and imaginary values are concatenated.

asteroid_filterbanks.transforms.mag(x, dim: int = -2, EPS: float = 1e-08)[source]¶

Takes the magnitude of a complex tensor.

The operand is assumed to have the real parts of each entry followed by the imaginary parts of each entry along dimension dim, e.g. for dim = 1, the matrix

[[1, 2, 3, 4],
 [5, 6, 7, 8]]

is interpreted as

[[1 + 3j, 2 + 4j],
 [5 + 7j, 6 + 8j]]

where j is such that j * j = -1.

Parameters

x (torch.Tensor) – Complex valued tensor.
dim (int) – frequency (or equivalent) dimension along which real and imaginary values are concatenated.

Returns

torch.Tensor – The magnitude of x.
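A plain-Python sketch of the magnitude on one row stored as [re..., im...] follows. The function name is hypothetical, and adding eps to the squared magnitude before the square root is an assumption about how EPS is used.

```python
import math


def mag_sketch(x, eps=1e-8):
    """Magnitude of a row stored as [re..., im...]."""
    # Hypothetical helper for illustration, not part of asteroid_filterbanks.
    if len(x) % 2 != 0:
        raise ValueError("row must have an even length")
    half = len(x) // 2
    # Assumption: eps is added to the power to keep the square root stable.
    return [math.sqrt(re * re + im * im + eps) for re, im in zip(x[:half], x[half:])]


# [3, 5, 4, 12] reads as [3 + 4j, 5 + 12j].
print([round(m, 6) for m in mag_sketch([3, 5, 4, 12])])  # [5.0, 13.0]
```

The output has half as many entries as the input along the complex dimension.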

asteroid_filterbanks.transforms.magreim(x, dim: int = -2)[source]¶

Returns a concatenation of (mag, re, im).

Parameters

x (torch.Tensor) – Complex valued tensor.
dim (int) – frequency (or equivalent) dimension along which real and imaginary values are concatenated.

asteroid_filterbanks.transforms.apply_real_mask(tf_rep, mask, dim: int = -2)[source]¶

Applies a real-valued mask to a real-valued representation.

It corresponds to ReIm mask in [1].

Parameters

tf_rep (torch.Tensor) – The time frequency representation to apply the mask to.
mask (torch.Tensor) – The real-valued mask to be applied.
dim (int) – Kept to have the same interface with the other ones.

Returns

torch.Tensor – tf_rep multiplied by the mask.

asteroid_filterbanks.transforms.apply_mag_mask(tf_rep, mask, dim: int = -2)[source]¶

Applies a real-valued mask to a complex-valued representation.

If tf_rep has 2N elements along dim and mask has N elements, mask is duplicated along dim so that the same mask is applied to both the real and imaginary parts.

tf_rep is assumed to have the real parts of each entry followed by the imaginary parts of each entry along dimension dim, e.g. for dim = 1, the matrix

[[1, 2, 3, 4],
 [5, 6, 7, 8]]

is interpreted as

[[1 + 3j, 2 + 4j],
 [5 + 7j, 6 + 8j]]

where j is such that j * j = -1.

Parameters

tf_rep (torch.Tensor) – The time frequency representation to apply the mask to. Re and Im are concatenated along dim.
mask (torch.Tensor) – The real-valued mask to be applied.
dim (int) – The frequency (or equivalent) dimension of both tf_rep and mask along which real and imaginary values are concatenated.

Returns

torch.Tensor – tf_rep multiplied by the mask.
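The mask duplication can be sketched in plain Python on one row stored as [re..., im...]. The function name is hypothetical, and lists stand in for tensors.

```python
def apply_mag_mask_sketch(tf_rep, mask):
    """Apply an N-element real mask to a 2N-element [re..., im...] row."""
    # Hypothetical helper for illustration, not part of asteroid_filterbanks.
    if len(tf_rep) != 2 * len(mask):
        raise ValueError("tf_rep must be twice as long as mask")
    doubled = list(mask) + list(mask)  # same gain for real and imaginary parts
    return [value * gain for value, gain in zip(tf_rep, doubled)]


# [1, 2, 3, 4] reads as [1 + 3j, 2 + 4j]; the mask halves the first bin
# and zeroes the second.
print(apply_mag_mask_sketch([1, 2, 3, 4], [0.5, 0.0]))  # [0.5, 0.0, 1.5, 0.0]
```

Scaling both parts by the same real gain changes the magnitude of each bin and leaves its phase untouched.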

asteroid_filterbanks.transforms.apply_complex_mask(tf_rep, mask, dim: int = -2)[source]¶

Applies a complex-valued mask to a complex-valued representation.

Operands are assumed to have the real parts of each entry followed by the imaginary parts of each entry along dimension dim, e.g. for dim = 1, the matrix

[[1, 2, 3, 4],
 [5, 6, 7, 8]]

is interpreted as

[[1 + 3j, 2 + 4j],
 [5 + 7j, 6 + 8j]]

where j is such that j * j = -1.

Parameters

tf_rep (torch.Tensor) – The time frequency representation to apply the mask to.
mask (torch.Tensor) – The complex-valued mask to be applied.
dim (int) – The frequency (or equivalent) dimension of both tf_rep and mask along which real and imaginary values are concatenated.

Returns

torch.Tensor – tf_rep multiplied by the mask in the complex sense.

asteroid_filterbanks.transforms.is_asteroid_complex(tensor, dim: int = -2)[source]¶

Check if tensor is complex-like in a given dimension.

Parameters

tensor (torch.Tensor) – tensor to be checked.
dim (int) – the frequency (or equivalent) dimension along which real and imaginary values are concatenated.

Returns

True if the size of the specified dimension is even, otherwise False.

asteroid_filterbanks.transforms.check_complex(tensor, dim: int = -2)[source]¶

Assert that tensor is an Asteroid-style complex in a given dimension.

Parameters

tensor (torch.Tensor) – tensor to be checked.
dim (int) – the frequency (or equivalent) dimension along which real and imaginary values are concatenated.

Raises

AssertionError – If the size of the specified dimension is not even.

asteroid_filterbanks.transforms.to_numpy(tensor, dim: int = -2)[source]¶

Convert complex-like torch tensor to numpy complex array

Parameters

tensor (torch.Tensor) – Complex tensor to convert to numpy.
dim (int, optional) – the frequency (or equivalent) dimension along which real and imaginary values are concatenated.

Returns

numpy.array – Corresponding complex array.

asteroid_filterbanks.transforms.from_numpy(array, dim: int = -2)[source]¶

Convert complex numpy array to complex-like torch tensor.

Parameters

array (np.array) – array to be converted.
dim (int, optional) – the frequency (or equivalent) dimension along which real and imaginary values are concatenated.

Returns

torch.Tensor – Corresponding torch.Tensor (complex axis in dimension dim).

asteroid_filterbanks.transforms.is_torchaudio_complex(x)[source]¶

Check if tensor is Torchaudio-style complex-like (last dimension is 2).

Parameters: x (torch.Tensor) – tensor to be checked.
Returns: True if last dimension is 2, else False.

asteroid_filterbanks.transforms.check_torchaudio_complex(tensor)[source]¶

Assert that tensor is Torchaudio-style complex-like (last dimension is 2).

Parameters: tensor (torch.Tensor) – tensor to be checked.
Raises: AssertionError – If the last dimension is not 2.

asteroid_filterbanks.transforms.to_torchaudio(tensor, dim: int = -2)[source]¶

Converts complex-like torch tensor to torchaudio style complex tensor.

Parameters

tensor (torch.tensor) – asteroid-style complex-like torch tensor.
dim (int, optional) – the frequency (or equivalent) dimension along which real and imaginary values are concatenated.

Returns

torch.Tensor – torchaudio-style complex-like torch tensor.

asteroid_filterbanks.transforms.from_torchaudio(tensor, dim: int = -2)[source]¶

Converts torchaudio style complex tensor to complex-like torch tensor.

Parameters

tensor (torch.tensor) – torchaudio-style complex-like torch tensor.
dim (int, optional) – the frequency (or equivalent) dimension along which real and imaginary values are concatenated.

Returns

torch.Tensor – asteroid-style complex-like torch tensor.
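The two layouts differ only in where the real and imaginary parts sit: concatenated along dim in the Asteroid style, paired in a trailing dimension of size 2 in the torchaudio style. The plain-Python sketch below converts one row in both directions. The function names are hypothetical, and it assumes index 0 of each pair holds the real part.

```python
def to_pairs_sketch(row):
    """[re..., im...] -> [[re, im], ...] (torchaudio-style trailing pair)."""
    # Hypothetical helper for illustration, not part of asteroid_filterbanks.
    if len(row) % 2 != 0:
        raise ValueError("row must have an even length")
    half = len(row) // 2
    # Assumption: index 0 of each pair is the real part, index 1 the imaginary.
    return [[re, im] for re, im in zip(row[:half], row[half:])]


def from_pairs_sketch(pairs):
    """[[re, im], ...] -> [re..., im...] (Asteroid-style concatenation)."""
    return [pair[0] for pair in pairs] + [pair[1] for pair in pairs]


print(to_pairs_sketch([1, 2, 3, 4]))  # [[1, 3], [2, 4]]
```

Going one way and back returns the original row, so the conversion changes the layout only and not the values.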

asteroid_filterbanks.transforms.to_torch_complex(tensor, dim: int = -2)[source]¶

Converts complex-like torch tensor to native PyTorch complex tensor.

Parameters

tensor (torch.tensor) – asteroid-style complex-like torch tensor.
dim (int, optional) – the frequency (or equivalent) dimension along which real and imaginary values are concatenated.

Returns

torch.Tensor – PyTorch native complex torch tensor.

asteroid_filterbanks.transforms.from_torch_complex(tensor, dim: int = -2)[source]¶

Converts PyTorch native complex tensor to complex-like torch tensor.

Parameters

tensor (torch.tensor) – PyTorch native complex-like torch tensor.
dim (int, optional) – the frequency (or equivalent) dimension along which real and imaginary values are concatenated.

Returns

torch.Tensor – asteroid-style complex-like torch tensor.

asteroid_filterbanks.transforms.angle(tensor, dim: int = -2)[source]¶

Return the angle of the complex-like torch tensor.

Parameters

tensor (torch.Tensor) – the complex tensor from which to extract the phase.
dim (int, optional) – the frequency (or equivalent) dimension along which real and imaginary values are concatenated.

Returns

torch.Tensor – The counterclockwise angle from the positive real axis on the complex plane in radians.

asteroid_filterbanks.transforms.from_magphase(mag_spec, phase, dim: int = -2)[source]¶

Return a complex-like torch tensor from magnitude and phase components.

Parameters

mag_spec (torch.tensor) – magnitude of the tensor.
phase (torch.tensor) – angle of the tensor
dim (int, optional) – the frequency (or equivalent) dimension along which real and imaginary values are concatenated.

Returns

torch.Tensor – The corresponding complex-like torch tensor.

asteroid_filterbanks.transforms.magphase(spec: torch.Tensor, dim: int = -2) → Tuple[torch.Tensor, torch.Tensor][source]¶: Splits Asteroid complex-like tensor into magnitude and phase.
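The relation between the magnitude/phase view and the [re..., im...] layout can be sketched in plain Python on one row. The function names are hypothetical and lists stand in for tensors; no stabilising epsilon is used here.

```python
import math


def from_magphase_sketch(mag, phase):
    """Build a [re..., im...] row from magnitudes and phases (radians)."""
    # Hypothetical helper for illustration, not part of asteroid_filterbanks.
    re = [m * math.cos(p) for m, p in zip(mag, phase)]
    im = [m * math.sin(p) for m, p in zip(mag, phase)]
    return re + im


def magphase_sketch(row):
    """Split a [re..., im...] row into magnitudes and phases (radians)."""
    half = len(row) // 2
    pairs = list(zip(row[:half], row[half:]))
    mag = [math.hypot(re, im) for re, im in pairs]
    phase = [math.atan2(im, re) for re, im in pairs]
    return mag, phase


# Two bins: 2 at angle 0 and 1 at angle pi / 2, i.e. roughly [2 + 0j, 0 + 1j].
spec = from_magphase_sketch([2.0, 1.0], [0.0, math.pi / 2])
mag, phase = magphase_sketch(spec)
```

Up to floating-point error, magphase_sketch recovers the magnitudes and phases that were passed to from_magphase_sketch.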

asteroid_filterbanks.transforms.centerfreq_correction(spec: torch.Tensor, kernel_size: int, stride: int = None, dim: int = -2) → torch.Tensor[source]¶

Corrects phase from the input spectrogram so that a sinusoid in the middle of a bin keeps the same phase from one frame to the next.

Parameters

spec – Spectrogram tensor of shape (batch, n_freq + 2, frames).
kernel_size (int) – Kernel size of the STFT.
stride (int) – Stride of the STFT.
dim (int) – Only works for dim=-2.

Returns

Tensor – the input spec with corrected phase.

asteroid_filterbanks.transforms.phase_centerfreq_correction(phase: torch.Tensor, kernel_size: int, stride: int = None) → torch.Tensor[source]¶

Corrects phase so that a sinusoid in the middle of a bin keeps the same phase from one frame to the next.

Parameters

phase – tensor of shape (batch, n_freq//2 + 1, frames)
kernel_size (int) – Kernel size of the STFT.
stride (int) – Stride of the STFT.

Returns

Tensor – corrected phase.