Cleaners
The cleaners combine several filters into higher-level cleaning strategies.
CNCNOSCleaner
Bases: BaseEstimator
CNC-NOS cleaner using the filters available in this project.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `base_filters` | `sequence or None` | Optional sequence of estimators or | `None` |
| `base_neighbors` | `int` | Number of neighbors used by the base neighbor-driven filters. | `3` |
| `score_neighbors` | `int` | Number of neighbors used to compute the weighted noise score. | `5` |
| `cv` | `int` | Number of folds used by the default cross-validated base filters. | `10` |
| `metric` | `str` | Distance metric used by the neighbor-based components. | `"minkowski"` |
| `p` | `int` | Minkowski power parameter, only used when | `2` |
| `max_iter` | `int` | Maximum number of cleaning iterations. | `10` |
| `stagnation_patience` | `int` | Number of consecutive low-improvement iterations tolerated before stopping. | `2` |
| `wns_tol` | `float` | Minimum improvement in mean weighted noise score to reset the stagnation counter. | `1e-4` |
| `final_filtering` | `bool` | Reserved flag for an optional final conservative pass. The current implementation keeps this pass disabled. | `False` |
| `final_filtering_min_fraction` | `float` | Minimum candidate fraction required to activate the optional final pass. | `0.2` |
| `min_class_count` | `int` | Minimum number of samples per class that must remain after cleaning. | `2` |
| `random_state` | `int` | Seed used to initialise the stochastic components. | `33` |
| `n_jobs` | `int or None` | Parallelism forwarded to the distance-based base filters. | `None` |
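The interaction between `max_iter`, `stagnation_patience` and `wns_tol` can be illustrated with a small stand-alone sketch. It is not the cleaner's actual loop: the helper `stopping_iteration` is hypothetical, it assumes that a lower mean weighted noise score means a cleaner dataset, and it assumes the loop stops as soon as the stagnation counter reaches `stagnation_patience`.

```python
def stopping_iteration(mean_wns, max_iter=10, stagnation_patience=2, wns_tol=1e-4):
    """Return the 1-based iteration at which the sketched loop would stop.

    mean_wns[i] is the mean weighted noise score observed after iteration i + 1.
    Hypothetical helper: illustrates the parameters, not the library's code.
    """
    stagnant = 0      # consecutive low-improvement iterations seen so far
    previous = None   # score of the previous iteration
    last = 0          # last iteration that was actually executed
    for iteration, score in enumerate(mean_wns[:max_iter], start=1):
        last = iteration
        if previous is not None:
            # Assumption: improvement means the mean score went down.
            improvement = previous - score
            if improvement >= wns_tol:
                stagnant = 0          # enough progress resets the counter
            else:
                stagnant += 1
                if stagnant >= stagnation_patience:
                    return iteration  # too many stagnant iterations in a row
        previous = score
    return last  # reached max_iter or ran out of scores


# Two consecutive improvements below wns_tol trigger the stop.
scores = [0.5, 0.4, 0.39999, 0.39998, 0.3]
stopped_at = stopping_iteration(scores)
```

With the default settings, a run whose score keeps improving by at least `wns_tol` is only bounded by `max_iter`.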
Notes
The cleaner combines a candidate selection stage, a weighted noise score and a consensus/majority relabelling step before optionally removing the remaining candidates.
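To make the relabel-or-remove idea concrete, here is a generic NumPy sketch of a neighbor majority vote over a set of candidates. It is an illustration only, not the CNC-NOS algorithm: the function `majority_relabel`, the Euclidean distance, the strict-majority rule and the decision to flag candidates without a majority for removal are all assumptions, and the weighted noise score is left out entirely.

```python
import numpy as np


def majority_relabel(X, y, candidate_mask, n_neighbors=3):
    """Relabel candidates by neighbor majority vote (hypothetical sketch)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    new_y = y.copy()
    relabel_mask = np.zeros(len(y), dtype=bool)
    remove_mask = np.zeros(len(y), dtype=bool)

    for i in np.flatnonzero(candidate_mask):
        dist = np.linalg.norm(X - X[i], axis=1)  # Euclidean distances
        dist[i] = np.inf                         # exclude the sample itself
        neighbors = np.argsort(dist)[:n_neighbors]
        labels, counts = np.unique(y[neighbors], return_counts=True)
        top = counts.argmax()
        if counts[top] * 2 > n_neighbors:        # strict majority among neighbors
            if labels[top] != y[i]:
                new_y[i] = labels[top]
                relabel_mask[i] = True
            # A majority that agrees with the current label leaves it unchanged.
        else:
            remove_mask[i] = True                # assumption: no majority, flag for removal
    return new_y, relabel_mask, remove_mask


X = np.array([[0.0], [0.1], [0.2], [0.3], [5.0], [5.1], [5.2]])
y = np.array([0, 0, 1, 0, 1, 1, 1])
candidates = np.zeros(len(y), dtype=bool)
candidates[2] = True  # the sample at 0.2 looks mislabelled

new_y, relabel_mask, remove_mask = majority_relabel(X, y, candidates)
```

In this toy dataset the three nearest neighbors of the candidate all carry label 0, so the candidate is relabelled rather than removed.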
What to look at in the report
- `keep_mask`: samples preserved after cleaning.
- `relabel_mask`: samples whose label was changed.
- `remove_mask`: samples removed from the final cleaned set.
- `history`: per-iteration trace of the cleaning process.
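The masks can be inspected with plain NumPy indexing. The sketch below does not call the cleaner; it uses hand-written arrays and assumes that each mask is a boolean array aligned with the input samples and that `remove_mask` is the complement of `keep_mask`. The structure of `history` is not shown because it is not specified here.

```python
import numpy as np

# Toy inputs standing in for the data passed to the cleaner.
X = np.array([[0.0], [0.1], [0.2], [5.0], [5.1]])
y = np.array([0, 0, 1, 1, 1])

# Hypothetical report contents, written by hand for illustration.
keep_mask = np.array([True, True, True, False, True])
relabel_mask = np.array([False, False, True, False, False])
remove_mask = ~keep_mask  # assumption: removed samples are exactly those not kept

# Select the samples that survive cleaning.
X_kept = X[keep_mask]

# Summarise what happened to the dataset.
n_kept = int(keep_mask.sum())
n_relabelled = int(relabel_mask.sum())
n_removed = int(remove_mask.sum())
print(f"kept={n_kept} relabelled={n_relabelled} removed={n_removed}")
```

Checking that no sample is both kept and removed, as in `(keep_mask & remove_mask).any()`, is a cheap sanity test before using the cleaned set downstream.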