museval

A Python package to evaluate source separation estimates.

API documentation

museval.eval_dir(reference_dir, estimates_dir, output_dir=None, mode='v4', win=1.0, hop=1.0)[source]

Compute bss_eval metrics for two directories, one holding the reference sources and one holding the estimates, assuming that file names are identical in both.

Parameters
reference_dirstr

path to reference sources directory.

estimates_dirstr

path to estimates directory.

output_dirstr

path to output directory used to save evaluation results. Defaults to None, meaning no evaluation files will be saved.

modestr

bsseval version number. Defaults to ‘v4’.

winfloat

window size in seconds. Defaults to 1.0.

hopfloat

hop size in seconds. Defaults to 1.0.

Returns
scoresTrackStore

scores object that holds the framewise and global evaluation scores.
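The name-matching assumption can be illustrated with a small sketch. This is not museval's implementation: `pair_by_name` is a hypothetical helper that pairs files sharing a name in both directories, and the `.wav` filter is an assumption made for the example.

```python
from pathlib import Path
import tempfile


def pair_by_name(reference_dir, estimates_dir, ext=".wav"):
    """Return (reference, estimate) path pairs for files present in both dirs.

    Hypothetical helper for illustration; not part of museval.
    """
    reference_dir, estimates_dir = Path(reference_dir), Path(estimates_dir)
    pairs = []
    for ref in sorted(reference_dir.glob(f"*{ext}")):
        est = estimates_dir / ref.name
        if est.exists():  # skip references that have no estimate of the same name
            pairs.append((ref, est))
    return pairs


# Build two throwaway directories to show the matching rule.
with tempfile.TemporaryDirectory() as tmp:
    ref_dir, est_dir = Path(tmp, "references"), Path(tmp, "estimates")
    ref_dir.mkdir()
    est_dir.mkdir()
    for name in ("vocals.wav", "drums.wav"):
        (ref_dir / name).touch()
    (est_dir / "vocals.wav").touch()
    matched = [ref.name for ref, _ in pair_by_name(ref_dir, est_dir)]
```

Here only `vocals.wav` exists in both directories, so it is the only pair that would be evaluated.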

museval.eval_mus_dir(dataset, estimates_dir, output_dir=None, ext='wav')[source]

Run the evaluation on a directory of musdb estimates.

Parameters
datasetDB(object)

MUSDB18 Database object.

estimates_dirstr

Path to estimates folder.

output_dirstr

Output folder where evaluation json files are stored.

extstr

estimate file extension. Defaults to ‘wav’.

museval.eval_mus_track(track, user_estimates, output_dir=None, mode='v4', win=1.0, hop=1.0)[source]

Compute all bss_eval metrics for the musdb track and estimated signals, given by a user_estimates dict.

Parameters
trackTrack

musdb track object loaded using musdb

user_estimatesDict

dictionary, containing the user estimates as np.arrays.

output_dirstr

path to output directory used to save evaluation results. Defaults to None, meaning no evaluation files will be saved.

modestr

bsseval version number. Defaults to ‘v4’.

winfloat

window size in seconds. Defaults to 1.0.

hopfloat

hop size in seconds. Defaults to 1.0.

Returns
scoresTrackStore

scores object that holds the framewise and global evaluation scores.

museval.pad_or_truncate(audio_reference, audio_estimates)[source]

Pad or truncate estimates to the duration of the references:

  • If references > estimates: add zeros at the end of the estimated signal.
  • If estimates > references: truncate estimates to the duration of the references.

Parameters
audio_referencenp.ndarray, shape=(nsrc, nsampl, nchan)

array containing true reference sources

audio_estimatesnp.ndarray, shape=(nsrc, nsampl, nchan)

array containing estimated sources

Returns
referencesnp.ndarray, shape=(nsrc, nsampl, nchan)

array containing true reference sources

estimatesnp.ndarray, shape=(nsrc, nsampl, nchan)

array containing estimated sources
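The padding and truncation rule can be sketched with NumPy as follows. This is a minimal illustration assuming the documented `(nsrc, nsampl, nchan)` layout; `pad_or_truncate_sketch` is a hypothetical name, not the library function.

```python
import numpy as np


def pad_or_truncate_sketch(references, estimates):
    """Match the sample length of `estimates` to that of `references`.

    Both arrays have shape (nsrc, nsampl, nchan). Illustrative only.
    """
    n_ref, n_est = references.shape[1], estimates.shape[1]
    if n_est > n_ref:
        # Estimates are longer: drop the trailing samples.
        estimates = estimates[:, :n_ref, :]
    elif n_est < n_ref:
        # Estimates are shorter: append zeros along the sample axis only.
        pad_width = [(0, 0), (0, n_ref - n_est), (0, 0)]
        estimates = np.pad(estimates, pad_width, mode="constant")
    return references, estimates


references = np.ones((2, 10, 2))
_, padded = pad_or_truncate_sketch(references, np.ones((2, 7, 2)))
_, truncated = pad_or_truncate_sketch(references, np.ones((2, 12, 2)))
```

In both cases the returned estimates have the same number of samples as the references, and the references are returned unchanged.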

museval.evaluate(references, estimates, win=44100, hop=44100, mode='v4', padding=True)[source]

BSS_EVAL images evaluation using metrics module

Parameters
referencesnp.ndarray, shape=(nsrc, nsampl, nchan)

array containing true reference sources

estimatesnp.ndarray, shape=(nsrc, nsampl, nchan)

array containing estimated sources

winint, defaults to 44100

window size in samples

hopint

hop size in samples, defaults to 44100 (no overlap)

modestr

BSSEval version, defaults to ‘v4’

Returns
SDRnp.ndarray, shape=(nsrc,)

vector of Signal to Distortion Ratios (SDR)

ISRnp.ndarray, shape=(nsrc,)

vector of source Image to Spatial distortion Ratios (ISR)

SIRnp.ndarray, shape=(nsrc,)

vector of Source to Interference Ratios (SIR)

SARnp.ndarray, shape=(nsrc,)

vector of Sources to Artifacts Ratios (SAR)

BSS Eval toolbox, version 4 (Based on mir_eval.separation)

Source separation algorithms attempt to extract recordings of individual sources from a recording of a mixture of sources. Evaluation methods for source separation compare the extracted sources with reference sources and attempt to measure the perceptual quality of the separation.

See also the bss_eval MATLAB toolbox: http://bass-db.gforge.inria.fr/bss_eval/

Conventions

An audio signal is expected to be in the format of a 2-dimensional array where the first dimension goes over the samples of the audio signal and the second dimension goes over the channels (as in stereo left and right). When providing a group of estimated or reference sources, they should be provided in a 3-dimensional array, where the first dimension corresponds to the source number, the second corresponds to the samples and the third to the channels.
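As a minimal sketch of these conventions, the arrays can be built with NumPy as shown here. The sample rate, durations and source names are assumptions chosen for the example.

```python
import numpy as np

rate = 44100                 # assumed sample rate
nsampl, nchan = 2 * rate, 2  # two seconds of stereo audio

# Each source is a 2-D array: (samples, channels).
vocals = np.zeros((nsampl, nchan))
accompaniment = np.zeros((nsampl, nchan))

# A group of sources is a 3-D array: (sources, samples, channels).
references = np.stack([vocals, accompaniment])

# A mono signal stored as a 1-D vector needs an explicit channel axis.
mono = np.zeros(nsampl)
mono_2d = mono[:, np.newaxis]
```

Estimates are expected in the same layout, with the same source ordering as the references unless permutations are computed.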

Metrics

  • museval.metrics.bss_eval(): Computes the bss_eval metrics: source to distortion (SDR), source to artifacts (SAR), and source to interference (SIR) ratios, plus the image to spatial distortion ratio (ISR). These are computed on a frame-by-frame basis (an infinite window size means the whole signal is evaluated at once).

    Optionally, the distortion filters are time-varying, corresponding to the behavior of BSS Eval version 3. Furthermore, metrics may optionally correspond to the bss_eval_sources version, as defined in BSS Eval version 2.


museval.metrics.validate(reference_sources, estimated_sources)[source]

Checks that the input data to a metric are valid, and raises helpful errors if not.

Parameters
reference_sourcesnp.ndarray, shape=(nsrc, nsampl, nchan)

matrix containing true sources

estimated_sourcesnp.ndarray, shape=(nsrc, nsampl, nchan)

matrix containing estimated sources

museval.metrics.bss_eval(reference_sources, estimated_sources, window=88200, hop=66150.0, compute_permutation=False, filters_len=512, framewise_filters=False, bsseval_sources_version=False)[source]

BSS_EVAL version 4.

Measurement of the separation quality for estimated source signals in terms of source to distortion, interference and artifacts ratios (SDR, SIR, SAR), as well as the image to spatial distortion ratio (ISR), as defined in [2].
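For orientation, the four ratios named above are commonly written in terms of a decomposition of each estimate into the true source image plus spatial distortion, interference and artifact error terms. The notation below follows the usual BSS Eval presentation for source images; it is an assumption about notation, not a transcription of this module's code.

```latex
\begin{align}
\hat{s} &= s_{\mathrm{img}} + e_{\mathrm{spat}} + e_{\mathrm{interf}} + e_{\mathrm{artif}} \\
\mathrm{SDR} &= 10 \log_{10} \frac{\lVert s_{\mathrm{img}} \rVert^2}{\lVert e_{\mathrm{spat}} + e_{\mathrm{interf}} + e_{\mathrm{artif}} \rVert^2} \\
\mathrm{ISR} &= 10 \log_{10} \frac{\lVert s_{\mathrm{img}} \rVert^2}{\lVert e_{\mathrm{spat}} \rVert^2} \\
\mathrm{SIR} &= 10 \log_{10} \frac{\lVert s_{\mathrm{img}} + e_{\mathrm{spat}} \rVert^2}{\lVert e_{\mathrm{interf}} \rVert^2} \\
\mathrm{SAR} &= 10 \log_{10} \frac{\lVert s_{\mathrm{img}} + e_{\mathrm{spat}} + e_{\mathrm{interf}} \rVert^2}{\lVert e_{\mathrm{artif}} \rVert^2}
\end{align}
```

All four values are in decibels, and higher is better in each case.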

The metrics are computed on a framewise basis, with overlap allowed between the windows.

The key difference between this version 4 and BSS Eval version 3 is the possibility of using the same distortion filters for all windows when matching the sources to their estimates, instead of estimating the filters anew at every frame, as done in BSS Eval v3.

This implementation is fully compatible with BSS Eval v2 and v3 written in MATLAB.

Parameters
reference_sourcesnp.ndarray, shape=(nsrc, nsampl, nchan)

matrix containing true sources

estimated_sourcesnp.ndarray, shape=(nsrc, nsampl, nchan)

matrix containing estimated sources

windowint, optional

size of each window for time-varying evaluation. Picking np.inf or any integer greater than nsampl will compute metrics on the whole signal.

hopint, optional

hop size between windows

compute_permutationbool, optional

compute all permutations of estimate/source combinations to compute the best scores (False by default). Note that picking True will lead to a significant computation overhead.

filters_lenint, optional

maximum time lag for the computation of the distortion filters. Default is filters_len = 512.

framewise_filtersbool, optional

Compute a new distortion filter for each frame (False by default). Note that picking True as in BSS Eval v2 and v3 leads to a significant computation overhead.

bsseval_sources_versionbool, optional

if True, results correspond to the bss_eval_sources version from the BSS Eval v2 and v3. Note however that this is not recommended because this evaluation method modifies the references according to the estimated sources, leading to potential problems for the estimation of SDR. For instance, zeroing some frequencies in the estimates will lead those to also be zeroed in the references, and hence not evaluated, artificially boosting results. For this reason, SiSEC always uses the bss_eval_images version, corresponding to False.

Returns
sdrnp.ndarray, shape=(nsrc, nwin)

matrix of Signal to Distortion Ratios (SDR). One for each source and window

isrnp.ndarray, shape=(nsrc, nwin)

matrix of source Image to Spatial distortion Ratios (ISR)

sirnp.ndarray, shape=(nsrc, nwin)

matrix of Source to Interference Ratios (SIR)

sarnp.ndarray, shape=(nsrc, nwin)

matrix of Sources to Artifacts Ratios (SAR)

permnp.ndarray, shape=(nsrc, nwin)

array containing the best ordering of estimated sources in the mean SIR sense (estimated source number perm[j,t] corresponds to true source number j at window t). Note: perm will be (0,1,...,nsrc-1) if compute_permutation is False.
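The "best ordering in the mean SIR sense" can be sketched for a single window as an exhaustive search over permutations. This is an illustration under stated assumptions, not the library's code: `best_permutation` is a hypothetical helper, and the SIR values are made up.

```python
from itertools import permutations


def best_permutation(sir):
    """Pick the estimate ordering with the highest mean SIR.

    sir[i][j] is the SIR (in dB) obtained when estimate i is scored
    against true source j. Returns perm, where perm[j] is the index of
    the estimate assigned to true source j.
    """
    nsrc = len(sir)

    def mean_sir(perm):
        return sum(sir[perm[j]][j] for j in range(nsrc)) / nsrc

    # Exhaustive search: nsrc! candidate orderings.
    return max(permutations(range(nsrc)), key=mean_sir)


# Estimates 0 and 1 are swapped relative to the references.
sir = [
    [1.0, 12.0, 0.5],
    [11.0, 2.0, 0.0],
    [0.0, 1.0, 9.0],
]
perm = best_permutation(sir)
```

The factorial number of candidate orderings is why enabling the permutation search adds a significant computation overhead.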

References

[1] Antoine Liutkus, Fabian-Robert Stöter and Nobutaka Ito, “The 2018 Signal Separation Evaluation Campaign,” in Proceedings of LVA/ICA 2018.

[2] Emmanuel Vincent, Rémi Gribonval, and Cédric Févotte, “Performance measurement in blind audio source separation,” IEEE Trans. on Audio, Speech and Language Processing, 2006.

Examples

>>> # reference_sources[n] should be a 2D ndarray, with first dimension the
>>> # samples and second dimension the channels of the n'th reference
>>> # source; estimated_sources[n] should be the same for the n'th estimated
>>> # source
>>> import museval
>>> (sdr, isr, sir, sar, perm) = museval.metrics.bss_eval(
...     reference_sources,
...     estimated_sources)

museval.metrics.bss_eval_sources(reference_sources, estimated_sources, compute_permutation=True)[source]

BSS Eval v3 bss_eval_sources

Wrapper to bss_eval with the right parameters. The call to this function is not recommended. See the description for the bsseval_sources_version parameter of bss_eval.

museval.metrics.bss_eval_sources_framewise(reference_sources, estimated_sources, window=1323000, hop=661500, compute_permutation=False)[source]

BSS Eval v3 bss_eval_sources_framewise

Wrapper to bss_eval with the right parameters. The call to this function is not recommended. See the description for the bsseval_sources_version parameter of bss_eval.

museval.metrics.bss_eval_images(reference_sources, estimated_sources, compute_permutation=True)[source]

BSS Eval v3 bss_eval_images

Wrapper to bss_eval with the right parameters.

museval.metrics.bss_eval_images_framewise(reference_sources, estimated_sources, window=1323000, hop=661500, compute_permutation=False)[source]

BSS Eval v3 bss_eval_images_framewise

Framewise computation of bss_eval_images. Wrapper to bss_eval with the right parameters.

class museval.metrics.Framing(window, hop, length)[source]

helper iterator class to do overlapped windowing
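Overlapped windowing of this kind can be sketched with a small generator. This is a hypothetical stand-in that reuses the documented argument names; how the real class treats the last, partial window is not stated here, so the shortening of the final window below is an assumption.

```python
def frame_slices(window, hop, length):
    """Yield slice objects covering `length` samples with overlapped windows.

    Hypothetical stand-in for illustration; the final windows are shortened
    so that they never run past the end of the signal.
    """
    start = 0
    while start < length:
        yield slice(start, min(start + window, length))
        start += hop


# 10 samples, windows of 4 samples, hop of 2 samples (50% overlap).
frames = [(s.start, s.stop) for s in frame_slices(window=4, hop=2, length=10)]
```

Each yielded slice can be applied to the sample axis of a signal array to extract one evaluation frame.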

class museval.aggregate.TrackStore(track_name, win=1, hop=1, frames_agg='median')[source]

Holds the metric scores for several frames of one track.

This is the fundamental building block of the higher-level stores such as MethodStore and EvalStore. Whereas the latter use pandas DataFrames, this store uses a simple dict that can easily be exported to JSON using the built-in tools.

Attributes
track_namestr

name of track.

winfloat, optional

evaluation window duration in seconds, defaults to 1 second

hopfloat, optional

hop length in seconds, defaults to 1 second

scoresDict

Nested Dictionary of all scores

frames_aggcallable or str

aggregation function for frames, defaults to ‘median’ (np.nanmedian)
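A NaN-aware median of framewise scores can be sketched in plain Python as follows. The helper `nan_median` and the example values are assumptions for illustration; the library itself refers to `np.nanmedian`.

```python
import math
import statistics


def nan_median(values):
    """Median of `values`, ignoring NaN entries (NaN if nothing remains)."""
    valid = [v for v in values if not math.isnan(v)]
    return statistics.median(valid) if valid else math.nan


# Framewise SDR values for one target; NaN marks a frame without a valid
# score (an assumption for this example, such as a silent frame).
sdr_frames = [4.0, math.nan, 6.5, 5.0]
track_sdr = nan_median(sdr_frames)
```

Ignoring NaN frames keeps a single undefined frame from turning the whole track score into NaN.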

add_target(target_name, values)[source]

add scores of target to the data store

Parameters
target_namestr

name of target to be added to list of targets

valuesList(Dict)

List of framewise data entries, see musdb.schema.json

property json

formats the track scores as json string

Returns
json_stringstr

json dump of the scores dictionary
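Because the scores live in a plain nested dictionary, a JSON dump needs nothing beyond the standard library. The dictionary below is hypothetical: its field names are assumptions made for this sketch and are not taken from musdb.schema.json.

```python
import json

# Hypothetical nested scores dictionary (field names are assumptions).
scores = {
    "targets": [
        {
            "name": "vocals",
            "frames": [
                {"time": 0.0, "duration": 1.0, "metrics": {"SDR": 5.2, "SIR": 11.0}},
                {"time": 1.0, "duration": 1.0, "metrics": {"SDR": 4.8, "SIR": 10.1}},
            ],
        }
    ]
}

json_string = json.dumps(scores, indent=2)  # serialize to a JSON string
restored = json.loads(json_string)          # parse it back into a dict
```

The round trip returns an equal dictionary, which is what makes a dict-based store convenient for saving results per track.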

property df

return track scores as pandas dataframe

Returns
dfDataFrame

pandas dataframe object of track scores

validate()[source]

Validate scores against musdb.schema

save(path)[source]

Save the track scores in JSON format

class museval.aggregate.EvalStore(frames_agg='median', tracks_agg='median')[source]

Evaluation Storage that holds the scores for multiple tracks.

This is based on a Pandas DataFrame.

Attributes
dfDataFrame

Pandas DataFrame

frames_aggstr

aggregation function for frames; supports mean and median, defaults to ‘median’

tracks_aggstr

aggregation function for tracks; supports mean and median, defaults to ‘median’

add_track(track)[source]

add track score object to dataframe

Parameters
trackTrackStore or DataFrame

track store object

add_eval_dir(path)[source]

add precomputed json folder to dataframe

Parameters
pathstr

path to evaluation results

agg_frames_scores()[source]

aggregates frames scores

Returns
df_aggregated_framesGroupBy

data frame with frames aggregated by mean or median

agg_frames_tracks_scores()[source]

aggregates frames and track scores

Returns
df_aggregated_framesGroupBy

data frame with frames and tracks aggregated by mean or median
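The two aggregation stages can be sketched without pandas: frames are first reduced within each track, and the per-track values are then reduced across tracks. The `aggregate` helper and the numbers are made up for illustration and do not reproduce the DataFrame-based implementation.

```python
import statistics

# Framewise SDR per track for one target (made-up numbers).
frames_by_track = {
    "track_a": [4.0, 6.0, 5.0],
    "track_b": [1.0, 9.0, 2.0],
    "track_c": [7.0, 8.0, 3.0],
}


def aggregate(frames_by_track, frames_agg=statistics.median,
              tracks_agg=statistics.median):
    """Aggregate frames within each track, then aggregate across tracks."""
    per_track = {name: frames_agg(vals) for name, vals in frames_by_track.items()}
    return per_track, tracks_agg(per_track.values())


per_track, overall = aggregate(frames_by_track)
```

Passing `statistics.mean` for either argument switches that stage from median to mean, mirroring the two supported settings.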

load(path)[source]

loads pickled dataframe

Parameters
pathstr
save(path)[source]

saves pickled dataframe

Parameters
pathstr
class museval.aggregate.MethodStore(frames_agg='median', tracks_agg='median')[source]

Holds a pandas DataFrame that stores data for several methods.

Attributes
dfDataFrame

Pandas DataFrame

frames_aggstr

aggregation function for frames; supports mean and median, defaults to ‘median’

tracks_aggstr

aggregation function for tracks; supports mean and median, defaults to ‘median’

add_sisec18()[source]

adds sisec18 participants results to DataFrame.

Scores will be downloaded on demand.

add_eval_dir(path)[source]

add precomputed json folder to dataframe.

The method name will be defined by the basename of provided path

Parameters
pathstr

path to evaluation results

add_evalstore(method, name)[source]

add the DataFrame of an EvalStore

The method name is given by the name argument

Parameters
methodEvalStore

EvalStore object

namestr

name of method

agg_frames_scores()[source]

aggregates frames scores

Returns
df_aggregated_framesGroupBy

data frame with frames aggregated by mean or median

agg_frames_tracks_scores()[source]

aggregates frames and track scores

Returns
df_aggregated_framesGroupBy

data frame with frames and tracks aggregated by mean or median

load(path)[source]

loads pickled dataframe

Parameters
pathstr
save(path)[source]

saves pickled dataframe

Parameters
pathstr
museval.aggregate.json2df(json_string, track_name)[source]

converts json scores into pandas dataframe

Parameters
json_stringstr
track_namestr
museval.cli.bsseval(inargs=None)[source]

Generic CLI app for bsseval results. Expects two folders with

museval.cli.museval(inargs=None)[source]

Command-line interface for the museval evaluation tools