TabPFN-based filters

These filters use `TabPFNClassifier` as the base learner and expose a local explanation report for the samples they flag as noisy.

TabPFN_CF

Bases: ClassificationFilter

Cross-validated TabPFN label-noise filter with fold-aware explanations.

Parameters:

Name	Type	Description	Default
`cv`	`int`	Number of stratified folds used to generate out-of-fold predictions.	`10`
`random_state`	`int`	Seed used by the stratified splitter and forwarded to TabPFN.	`33`
`action`	`('remove', 'detect')`	Whether noisy samples are dropped or only detected. Relabel is not implemented yet.	`"remove"`
`tabpfn_params`	`dict or None`	Keyword arguments forwarded to `tabpfn.TabPFNClassifier`.	`None`
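The `cv` and `random_state` parameters control how out-of-fold (OOF) predictions are produced. The sketch below illustrates the idea under stated assumptions: folds are stratified by dealing each class's shuffled indices round-robin, and a toy nearest-centroid model stands in for `TabPFNClassifier`. The names `stratified_folds`, `nearest_centroid` and `oof_predict` are hypothetical and are not part of the filter's API.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

Vector = List[float]


def stratified_folds(y: Sequence[Hashable], cv: int, seed: int) -> List[int]:
    """Assign a fold id to every sample so each class is spread across the folds."""
    rng = random.Random(seed)
    by_class: Dict[Hashable, List[int]] = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    fold_of = [0] * len(y)
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal this class's shuffled indices round-robin over the folds.
        for pos, i in enumerate(indices):
            fold_of[i] = pos % cv
    return fold_of


def nearest_centroid(X_train: Sequence[Vector], y_train: Sequence[Hashable],
                     X_test: Sequence[Vector]) -> List[Hashable]:
    """Toy stand-in for TabPFNClassifier: predict the class with the closest mean."""
    sums: Dict[Hashable, Vector] = {}
    counts: Dict[Hashable, int] = defaultdict(int)
    for x, label in zip(X_train, y_train):
        acc = sums.setdefault(label, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        counts[label] += 1
    centroids = {c: [s / counts[c] for s in acc] for c, acc in sums.items()}

    def sq_dist(a: Vector, b: Vector) -> float:
        return sum((p - q) ** 2 for p, q in zip(a, b))

    return [min(centroids, key=lambda c: sq_dist(x, centroids[c])) for x in X_test]


FitPredict = Callable[[Sequence[Vector], Sequence[Hashable], Sequence[Vector]], List[Hashable]]


def oof_predict(X: Sequence[Vector], y: Sequence[Hashable], cv: int = 10, seed: int = 33,
                fit_predict: FitPredict = nearest_centroid) -> Tuple[List[Hashable], List[int]]:
    """Predict each sample with a model trained only on the other folds."""
    fold_of = stratified_folds(y, cv, seed)
    preds: List[Hashable] = [None] * len(y)
    for k in range(cv):
        train = [i for i in range(len(y)) if fold_of[i] != k]
        test = [i for i in range(len(y)) if fold_of[i] == k]
        if not test:
            continue  # possible when cv exceeds the size of the largest class
        out = fit_predict([X[i] for i in train], [y[i] for i in train], [X[i] for i in test])
        for i, p in zip(test, out):
            preds[i] = p
    return preds, fold_of
```

Because no sample is ever predicted by a model that saw its own label, a mislabeled point such as `[0.15]` tagged as class `1` among class-`0` neighbours tends to receive the "correct" OOF prediction, which is what makes the disagreement informative.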

Notes

Explanations are computed with SHAP-based tooling from `tabpfn_extensions`. Using `fit_mode="fit_with_cache"` (passed through `tabpfn_params`) is recommended for faster and more stable explanations.

`fit(X, y)`

Fit the filter and cache fold-wise predictions and diagnostics.

`fit_resample(X, y)`

Fit the filter and return the filtered data.
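The effect of `action` on the returned data can be sketched as follows, assuming the classic classification-filter rule (a sample is noisy when its OOF prediction disagrees with its label). `classification_filter` is a hypothetical helper, and raising `NotImplementedError` for `"relabel"` is an assumption that mirrors the "not implemented yet" note, not confirmed behaviour of `TabPFN_CF`.

```python
from typing import Hashable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def classification_filter(X: Sequence[T], y: Sequence[Hashable], oof_pred: Sequence[Hashable],
                          action: str = "remove") -> Tuple[List[T], List[Hashable], List[int]]:
    """Flag samples whose out-of-fold prediction disagrees with the given label."""
    if action == "relabel":
        raise NotImplementedError("relabel is not implemented")
    if action not in ("remove", "detect"):
        raise ValueError(f"unknown action: {action!r}")
    noisy = [i for i, (label, pred) in enumerate(zip(y, oof_pred)) if label != pred]
    if action == "detect":
        # Keep every sample; the caller only receives the noisy indices.
        return list(X), list(y), noisy
    noisy_set = set(noisy)
    X_kept = [x for i, x in enumerate(X) if i not in noisy_set]
    y_kept = [label for i, label in enumerate(y) if i not in noisy_set]
    return X_kept, y_kept, noisy
```

With `"detect"` the data passes through unchanged, which is convenient when you want to inspect flagged samples before deciding to drop them.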

`get_filter_report()`

Return a dictionary with the main fit diagnostics.

`get_detection_report()`

Return the stored detection report.

`get_fold_history()`

Return the stored per-fold diagnostics.

`explain_noisy_instances(sample_indices=None, *, noisy_only=True, class_index='predicted', index='SV', max_order=1, imputer='baseline', budget=128, top_k=5, sort_by='confidence', ascending=True, feature_names=None, return_interaction_values=False, return_figures=False, max_display=10)`

Explain the out-of-fold (OOF) decisions made by the filter and aggregate the results.

Parameters:

Name	Type	Description	Default
`sample_indices`	`Sequence[int] \| ndarray \| None`	Optional subset of sample indices to explain.	`None`
`noisy_only`	`bool`	If True, ignore samples that were not flagged as noisy by the filter.	`True`
`class_index`	`Any`	"predicted" (default) explains the OOF predicted class for each sample, "true" explains the true class, or pass an int/label to use a fixed class.	`'predicted'`
`index`	`str`	Forwarded to `get_tabpfn_imputation_explainer` and `explain`.	`'SV'`
`max_order`	`int`	Forwarded to `get_tabpfn_imputation_explainer` and `explain`.	`1`
`imputer`	`str`	Forwarded to `get_tabpfn_imputation_explainer` and `explain`.	`'baseline'`
`budget`	`int`	Forwarded to `get_tabpfn_imputation_explainer` and `explain`.	`128`
`top_k`	`int \| None`	Number of strongest contributions to report per sample. Set to `None` or `0` to skip.	`5`
`sort_by`	`str`	`"confidence"` (default), `"fold_idx"` or `"sample_idx"`.	`'confidence'`
`ascending`	`bool`	Sort direction. For confidence, ascending means less confident first.	`True`
`feature_names`	`Sequence[Any] \| None`	Optional feature names used in the returned contributions and plots.	`None`
`return_interaction_values`	`bool`	If True, store the raw `InteractionValues` object per explanation.	`False`
`return_figures`	`bool`	If True, attach a waterfall figure for each explanation.	`False`
`max_display`	`int`	Max number of interactions shown in the waterfall plot.	`10`
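The `sort_by`, `ascending` and `top_k` options can be illustrated on plain records. This is a sketch under assumptions: `NoisyExplanation`, `top_contributions` and `order_explanations` are hypothetical names, and "strongest" is taken to mean largest absolute contribution.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Sequence, Tuple


@dataclass
class NoisyExplanation:
    """Hypothetical minimal record; the real report objects carry more fields."""
    sample_idx: int
    fold_idx: int
    confidence: float
    contributions: Dict[str, float] = field(default_factory=dict)


def top_contributions(contributions: Dict[str, float],
                      top_k: Optional[int]) -> List[Tuple[str, float]]:
    """Keep the top_k features with the largest absolute contribution (None or 0 skips)."""
    if not top_k:
        return []
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return ranked[:top_k]


def order_explanations(explanations: Sequence[NoisyExplanation], sort_by: str = "confidence",
                       ascending: bool = True) -> List[NoisyExplanation]:
    """Sort by confidence, fold_idx or sample_idx; ascending confidence lists the least confident first."""
    if sort_by not in ("confidence", "fold_idx", "sample_idx"):
        raise ValueError(f"unsupported sort_by: {sort_by!r}")
    return sorted(explanations, key=lambda e: getattr(e, sort_by), reverse=not ascending)
```

Listing the least confident samples first surfaces the cases where the model is least sure of its own decision, which are usually the first ones worth inspecting.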

Per-fold diagnostics collected during a TabPFN cross-validation run.

Local explanation for a single sample flagged by TabPFN.

Container for TabPFN filter explanations and fold diagnostics.

`by_fold()`

Group the stored explanations by fold index.
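Grouping by fold amounts to bucketing records on their fold index. A minimal sketch, assuming each explanation exposes a `fold_idx` attribute; `FoldExplanation` and `group_by_fold` are hypothetical stand-ins, not the container's actual types.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class FoldExplanation:
    """Hypothetical record with only the fields needed for grouping."""
    sample_idx: int
    fold_idx: int


def group_by_fold(explanations: Sequence[FoldExplanation]) -> Dict[int, List[FoldExplanation]]:
    groups: Dict[int, List[FoldExplanation]] = defaultdict(list)
    for e in explanations:
        groups[e.fold_idx].append(e)  # original order is kept inside each fold
    return dict(sorted(groups.items()))
```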

TabPFN_CVCF

Bases: CVCFFilter

Cross-validated TabPFN committee filter with fold-aware explanations.

Parameters:

Name	Type	Description	Default
`cv`	`int`	Number of stratified folds used to build the committee.	`10`
`vote_rule`	`('consensus', 'threshold')`	Rule used to flag samples as noisy from the fold disagreements.	`"consensus"`
`threshold`	`float`	Minimum fraction of disagreeing folds required when `vote_rule="threshold"`.	`0.5`
`random_state`	`int`	Seed used by the stratified splitter and forwarded to TabPFN.	`33`
`action`	`('remove', 'detect')`	Whether noisy samples are dropped or only detected. Relabel is not implemented yet.	`"remove"`
`tabpfn_params`	`dict or None`	Keyword arguments forwarded to `tabpfn.TabPFNClassifier`.	`None`
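The two vote rules can be sketched as follows, assuming the classic cross-validated committee setup in which each of the `cv` fold models predicts every training sample. `committee_flags` is a hypothetical helper; reading `threshold` as an inclusive minimum (`>=`) follows the "minimum fraction" wording above.

```python
from typing import Hashable, List, Sequence


def committee_flags(fold_preds: Sequence[Sequence[Hashable]], y: Sequence[Hashable],
                    vote_rule: str = "consensus", threshold: float = 0.5) -> List[int]:
    """Return indices of samples flagged as noisy by the committee.

    fold_preds[k][i] is the prediction for sample i of the model trained without fold k.
    """
    if vote_rule not in ("consensus", "threshold"):
        raise ValueError(f"unknown vote_rule: {vote_rule!r}")
    n_models = len(fold_preds)
    noisy = []
    for i, label in enumerate(y):
        disagree = sum(1 for preds in fold_preds if preds[i] != label)
        if vote_rule == "consensus":
            flagged = disagree == n_models  # every committee member rejects the label
        else:
            flagged = disagree / n_models >= threshold
        if flagged:
            noisy.append(i)
    return noisy
```

Consensus is the more conservative rule: it removes fewer samples, at the cost of missing noise that only some fold models detect.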

Notes

Explanations are computed fold by fold and then aggregated into all-fold and majority-fold views. Using `fit_mode="fit_with_cache"` (passed through `tabpfn_params`) is recommended for faster and more stable explanations.

`fit(X, y)`

Fit the filter and cache committee predictions and diagnostics.

`fit_resample(X, y)`

Fit the filter and return the filtered data.

`get_filter_report()`

Return a dictionary with the main fit diagnostics.

`get_detection_report()`

Return the stored detection report.

`get_fold_history()`

Return the stored per-fold diagnostics.

`explain_noisy_instances(sample_indices=None, *, noisy_only=True, class_index='predicted', index='SV', max_order=1, imputer='baseline', budget=128, top_k=5, sort_by='confidence', ascending=True, feature_names=None, return_interaction_values=False, return_figures=False, max_display=10)`

Explain committee predictions fold by fold and aggregate them.

The all-view (`all_view`) aggregates every fold explanation. The majority-view (`majority_view`) only aggregates the folds that vote for the sample's committee-majority class. By default, each fold is explained with its own predicted class.
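The difference between the two views can be sketched for a single sample. This assumes each fold contributes a pair of (predicted class, feature contributions) and that aggregation is a feature-wise mean; `mean_contributions` and `aggregate_views` are hypothetical, and the filter may aggregate differently.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence, Tuple

FoldEntry = Tuple[Hashable, Dict[str, float]]  # (fold's predicted class, feature -> contribution)


def mean_contributions(entries: Sequence[FoldEntry]) -> Dict[str, float]:
    """Average contributions feature-wise; a feature missing from a fold counts as 0."""
    totals: Dict[str, float] = {}
    for _, contrib in entries:
        for feature, value in contrib.items():
            totals[feature] = totals.get(feature, 0.0) + value
    return {feature: total / len(entries) for feature, total in totals.items()}


def aggregate_views(entries: Sequence[FoldEntry]) -> Tuple[Hashable, Dict[str, float], Dict[str, float]]:
    # The committee-majority class is the most frequent per-fold prediction.
    majority_class = Counter(cls for cls, _ in entries).most_common(1)[0][0]
    all_view = mean_contributions(entries)
    majority_view = mean_contributions([e for e in entries if e[0] == majority_class])
    return majority_class, all_view, majority_view
```

When folds disagree, their contributions can cancel in the all-view; the majority-view avoids mixing explanations of different classes.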

Per-fold diagnostics collected during a TabPFN committee run.

Local explanation computed for one fold and one sample.

Aggregated feature contribution view for a sample.

Complete explanation summary for one sample under the TabPFN committee filter.

Container for TabPFN committee explanations and fold diagnostics.

`by_fold()`

Group the fold explanations by fold index.

`by_sample()`

Return the sample explanations indexed by sample id.

Reading the explanation report

- `class_index="predicted"` explains the class chosen by the model.
- `top_k` sets how many of the strongest local SHAP contributions are stored per sample.
- `confidence` is the probability the model assigns to the explained class.
- `all_view` aggregates every fold, while `majority_view` only keeps the folds aligned with the committee vote.
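How `class_index` and `confidence` relate can be sketched for one row of predicted probabilities. `explained_class` is a hypothetical helper; treating a fixed `class_index` as a class label (rather than a column position) is an assumption.

```python
from typing import Any, Hashable, Sequence, Tuple


def explained_class(proba: Sequence[float], classes: Sequence[Hashable], y_true: Hashable,
                    class_index: Any = "predicted") -> Tuple[Hashable, float]:
    """Return (explained class, confidence) for one sample's probability row."""
    if class_index == "predicted":
        pos = max(range(len(proba)), key=lambda j: proba[j])
    elif class_index == "true":
        pos = list(classes).index(y_true)
    else:
        # Assumption: a fixed value is treated as a class label, not a column position.
        pos = list(classes).index(class_index)
    return classes[pos], proba[pos]
```

For a flagged sample, `"predicted"` typically yields a high confidence for a class that differs from the label, while `"true"` shows how little probability the model gives to the recorded label.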
