Cleaners

The cleaners combine several filters into higher-level cleaning strategies.

CNCNOSCleaner

Bases: BaseEstimator

CNC-NOS cleaner using the filters available in this project.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| base_filters | sequence or None | Optional sequence of estimators or (name, estimator) pairs used as the base ensemble. When omitted, a heterogeneous default ensemble is built from the filters shipped with this repository. | None |
| base_neighbors | int | Number of neighbors used by the base neighbor-driven filters. | 3 |
| score_neighbors | int | Number of neighbors used to compute the weighted noise score. | 5 |
| cv | int | Number of folds used by the default cross-validated base filters. | 10 |
| metric | str | Distance metric used by the neighbor-based components. | "minkowski" |
| p | int | Minkowski power parameter, only used when metric="minkowski". | 2 |
| max_iter | int | Maximum number of cleaning iterations. | 10 |
| stagnation_patience | int | Number of consecutive low-improvement iterations tolerated before stopping. | 2 |
| wns_tol | float | Minimum improvement in mean weighted noise score to reset the stagnation counter. | 1e-4 |
| final_filtering | bool | Reserved flag for an optional final conservative pass. The current implementation keeps this pass disabled. | False |
| final_filtering_min_fraction | float | Minimum candidate fraction required to activate the optional final pass. | 0.2 |
| min_class_count | int | Minimum number of samples per class that must remain after cleaning. | 2 |
| random_state | int | Seed used to initialise the stochastic components. | 33 |
| n_jobs | int or None | Parallelism forwarded to the distance-based base filters. | None |

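
The way max_iter, stagnation_patience and wns_tol interact can be illustrated with a small stand-alone sketch. It is not the cleaner's actual loop: it assumes that "improvement" means a decrease in the mean weighted noise score between consecutive iterations, and that the run stops as soon as the stagnation counter reaches stagnation_patience. The function name run_until_stagnation and the input score sequence are hypothetical.

```python
def run_until_stagnation(mean_wns_per_iter, max_iter=10,
                         stagnation_patience=2, wns_tol=1e-4):
    """Return how many iterations run under the assumed stopping rule.

    mean_wns_per_iter is an illustrative sequence of mean weighted noise
    scores, one per iteration (lower is assumed to be better).
    """
    stagnant = 0      # consecutive low-improvement iterations seen so far
    previous = None   # mean score of the previous iteration
    executed = 0

    for score in mean_wns_per_iter[:max_iter]:  # never exceed max_iter
        executed += 1
        if previous is not None:
            improvement = previous - score
            if improvement >= wns_tol:
                stagnant = 0       # enough progress: reset the counter
            else:
                stagnant += 1      # low improvement: count it
            if stagnant >= stagnation_patience:
                break              # assumed stopping condition
        previous = score
    return executed
```

With the defaults, a score trace that flattens out for two consecutive iterations stops the loop early, while a steadily improving trace runs until max_iter is reached.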
Notes

The cleaner combines three stages: candidate selection, a weighted noise score, and a consensus/majority relabelling step. Candidates that remain after relabelling are then optionally removed.

fit(X, y)

Fit the cleaner and cache the cleaning history and masks.

fit_resample(X, y)

Fit the cleaner and return the cleaned subset.

get_filter_report()

Return a dictionary with the main cleaning diagnostics.
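
A typical call sequence might look like the sketch below. It relies only on the method names documented above. The helper clean_and_report and the class _StubCleaner are hypothetical: the stub exists only so the snippet runs on its own and does not reproduce the real cleaning logic. The sketch also assumes that fit_resample returns an (X_clean, y_clean) pair.

```python
def clean_and_report(cleaner, X, y):
    """Clean a dataset and fetch the diagnostics in one call.

    Works with any object exposing fit_resample(X, y) and
    get_filter_report(); the (X_clean, y_clean) return shape is an assumption.
    """
    X_clean, y_clean = cleaner.fit_resample(X, y)
    report = cleaner.get_filter_report()  # read after fitting
    return X_clean, y_clean, report


class _StubCleaner:
    """Hypothetical stand-in, NOT the real cleaner: drops samples labelled None."""

    def fit_resample(self, X, y):
        self._keep = [label is not None for label in y]
        X_clean = [x for x, kept in zip(X, self._keep) if kept]
        y_clean = [label for label, kept in zip(y, self._keep) if kept]
        return X_clean, y_clean

    def get_filter_report(self):
        return {"keep_mask": self._keep}


X = [[0.0], [1.0], [2.0]]
y = [0, None, 1]
X_clean, y_clean, report = clean_and_report(_StubCleaner(), X, y)
```

In real use, the stub would be replaced by a configured CNCNOSCleaner instance.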

Summary of a CNC-NOS cleaning run.

Per-iteration diagnostics for CNC-NOS.

What to look at in the report

  • keep_mask: samples preserved after cleaning.
  • relabel_mask: samples whose label was changed.
  • remove_mask: samples removed from the final cleaned set.
  • history: per-iteration trace of the cleaning process.
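
The masks can be condensed into a few counts for a quick sanity check. The sketch below assumes the report is a dictionary whose keep_mask, relabel_mask and remove_mask entries are boolean sequences aligned with the original labels. The helper summarize_masks and the sample data are illustrative, not part of the library.

```python
from collections import Counter


def summarize_masks(y, report):
    """Count kept, relabelled and removed samples from a report dictionary."""
    keep = [bool(m) for m in report["keep_mask"]]
    relabel = [bool(m) for m in report["relabel_mask"]]
    remove = [bool(m) for m in report["remove_mask"]]

    # How many samples of each original class were removed.
    removed_per_class = Counter(
        label for label, gone in zip(y, remove) if gone
    )
    return {
        "n_samples": len(keep),
        "n_kept": sum(keep),
        "n_relabelled": sum(relabel),
        "n_removed": sum(remove),
        "removed_per_class": dict(removed_per_class),
    }


# Illustrative data only: six samples, one relabelled, one removed.
y = [0, 0, 0, 1, 1, 1]
report = {
    "keep_mask": [True, True, False, True, True, True],
    "relabel_mask": [False, True, False, False, False, False],
    "remove_mask": [False, False, True, False, False, False],
}
summary = summarize_masks(y, report)
```

A large n_removed for a single class in removed_per_class is a hint to revisit min_class_count or the base filters.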