TabPFN-based filters

These filters use TabPFNClassifier as the base learner and expose a local explanation report for noisy samples.

TabPFN_CF

Bases: ClassificationFilter

Cross-validated TabPFN label-noise filter with fold-aware explanations.

Parameters:

Name Type Description Default
cv int

Number of stratified folds used to generate out-of-fold predictions.

10
random_state int

Seed used by the stratified splitter and forwarded to TabPFN.

33
action ('remove', 'detect')

Whether noisy samples are dropped ("remove") or only flagged ("detect"). Relabeling is not implemented yet.

"remove"
tabpfn_params dict or None

Keyword arguments forwarded to tabpfn.TabPFNClassifier.

None

Notes

Explanations are computed with SHAP-based tooling from tabpfn_extensions. Passing fit_mode="fit_with_cache" through tabpfn_params is recommended for faster and more stable explanations.
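The cross-validated filtering idea can be illustrated with a small, dependency-free sketch. It is not the library's implementation: the nearest-centroid learner is a stand-in for TabPFNClassifier, the helper names (stratified_folds, oof_noise_mask) are hypothetical, and the rule "flag a sample when its out-of-fold prediction differs from its label" is an assumption about how the filter decides.

```python
import random
from collections import defaultdict


def stratified_folds(y, n_splits, seed):
    """Assign each sample to a fold while keeping class proportions similar."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(y):
        by_class[label].append(idx)
    fold_of = [0] * len(y)
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, idx in enumerate(indices):
            fold_of[idx] = position % n_splits
    return fold_of


class NearestCentroid:
    """Stand-in base learner (NOT TabPFN); keeps the sketch runnable."""

    def fit(self, X, y):
        sums, counts = {}, defaultdict(int)
        for row, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(row))
            for j, value in enumerate(row):
                acc[j] += value
            counts[label] += 1
        self.centroids_ = {c: [v / counts[c] for v in s] for c, s in sums.items()}
        return self

    def predict(self, X):
        def dist(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))

        return [
            min(self.centroids_, key=lambda c: dist(row, self.centroids_[c]))
            for row in X
        ]


def oof_noise_mask(X, y, n_splits=10, seed=33):
    """Flag samples whose out-of-fold prediction disagrees with the given label."""
    fold_of = stratified_folds(y, n_splits, seed)
    noisy = [False] * len(y)
    for fold in range(n_splits):
        train = [i for i in range(len(y)) if fold_of[i] != fold]
        test = [i for i in range(len(y)) if fold_of[i] == fold]
        if not test:
            continue
        model = NearestCentroid().fit([X[i] for i in train], [y[i] for i in train])
        for i, pred in zip(test, model.predict([X[i] for i in test])):
            noisy[i] = pred != y[i]  # assumed decision rule
    return noisy
```

With action="remove" the flagged rows would then be dropped, and with action="detect" the mask would only be reported.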

fit(X, y)

Fit the filter and cache fold-wise predictions and diagnostics.

fit_resample(X, y)

Fit the filter and return the filtered data.

get_filter_report()

Return a dictionary with the main fit diagnostics.

get_detection_report()

Return the stored detection report.

get_fold_history()

Return the stored per-fold diagnostics.

explain_noisy_instances(sample_indices=None, *, noisy_only=True, class_index='predicted', index='SV', max_order=1, imputer='baseline', budget=128, top_k=5, sort_by='confidence', ascending=True, feature_names=None, return_interaction_values=False, return_figures=False, max_display=10)

Explain the out-of-fold (OOF) decisions made by the filter and aggregate the results.

Parameters:

Name Type Description Default
sample_indices Sequence[int] | ndarray | None

Optional subset of sample indices to explain.

None
noisy_only bool

If True, ignore samples that were not flagged as noisy by the filter.

True
class_index Any

"predicted" (default) explains the OOF predicted class for each sample, "true" explains the true class, or pass an int/label to use a fixed class.

'predicted'
index str

Forwarded to get_tabpfn_imputation_explainer and explain.

'SV'
max_order int

Forwarded to get_tabpfn_imputation_explainer and explain.

1
imputer str

Forwarded to get_tabpfn_imputation_explainer and explain.

'baseline'
budget int

Forwarded to get_tabpfn_imputation_explainer and explain.

128
top_k int | None

Number of strongest contributions to report per sample. Set to None or 0 to skip.

5
sort_by str

"confidence" (default), "fold_idx" or "sample_idx".

'confidence'
ascending bool

Sort direction. With sort_by="confidence", ascending order lists the least confident samples first.

True
feature_names Sequence[Any] | None

Optional feature names used in the returned contributions and plots.

None
return_interaction_values bool

If True, store the raw InteractionValues object per explanation.

False
return_figures bool

If True, attach a waterfall figure for each explanation.

False
max_display int

Max number of interactions shown in the waterfall plot.

10
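The top_k, sort_by and ascending parameters can be pictured with the sketch below. It assumes that "strongest" means largest absolute contribution and that each explanation is a plain dictionary carrying the three sortable keys. Both are illustrative assumptions, not the library's data model.

```python
def top_contributions(contributions, top_k=5):
    """Keep the top_k features with the largest absolute contribution."""
    if not top_k:  # None or 0 skips the per-sample summary
        return []
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return ranked[:top_k]


def sort_explanations(explanations, sort_by="confidence", ascending=True):
    """Order explanations by one of the documented sort keys."""
    if sort_by not in {"confidence", "fold_idx", "sample_idx"}:
        raise ValueError(f"unsupported sort_by: {sort_by!r}")
    return sorted(explanations, key=lambda e: e[sort_by], reverse=not ascending)


explanations = [
    {"sample_idx": 7, "fold_idx": 2, "confidence": 0.91},
    {"sample_idx": 3, "fold_idx": 0, "confidence": 0.55},
]
least_confident_first = sort_explanations(explanations)
strongest = top_contributions({"age": 0.1, "income": -0.5, "height": 0.3}, top_k=2)
```

Sorting by ascending confidence puts borderline decisions first, which is usually where manual review is most useful.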

Per-fold diagnostics collected during a TabPFN cross-validation run.

Local explanation for a single sample flagged by TabPFN.

Container for TabPFN filter explanations and fold diagnostics.

by_fold()

Group the stored explanations by fold index.

TabPFN_CVCF

Bases: CVCFFilter

Cross-validated TabPFN committee filter with fold-aware explanations.

Parameters:

Name Type Description Default
cv int

Number of stratified folds used to build the committee.

10
vote_rule ('consensus', 'threshold')

Rule used to flag samples as noisy from the fold disagreements.

"consensus"
threshold float

Minimum fraction of disagreeing folds required when vote_rule="threshold".

0.5
random_state int

Seed used by the stratified splitter and forwarded to TabPFN.

33
action ('remove', 'detect')

Whether noisy samples are dropped ("remove") or only flagged ("detect"). Relabeling is not implemented yet.

"remove"
tabpfn_params dict or None

Keyword arguments forwarded to tabpfn.TabPFNClassifier.

None

Notes

Explanations are computed fold by fold and then aggregated into all-fold and majority-fold views. Passing fit_mode="fit_with_cache" through tabpfn_params is recommended for faster and more stable explanations.
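The vote_rule and threshold parameters can be read as the sketch below. It assumes that "consensus" flags a sample only when every fold model disagrees with its label, and that "threshold" flags it when the disagreeing fraction is at least threshold. The function name flag_noisy and the input layout are hypothetical.

```python
def flag_noisy(disagreements, vote_rule="consensus", threshold=0.5):
    """Turn per-fold disagreement votes into one noisy/clean flag per sample.

    disagreements: one list of booleans per sample, one entry per fold model,
    True when that model's prediction differs from the sample's label.
    """
    flags = []
    for votes in disagreements:
        if vote_rule == "consensus":
            flags.append(all(votes))  # assumed: every fold must disagree
        elif vote_rule == "threshold":
            flags.append(sum(votes) / len(votes) >= threshold)
        else:
            raise ValueError(f"unknown vote_rule: {vote_rule!r}")
    return flags


votes = [
    [True, True, True, True],      # all four folds disagree with the label
    [True, True, False, False],    # half of the folds disagree
    [False, False, False, True],   # a single fold disagrees
]
consensus_flags = flag_noisy(votes, vote_rule="consensus")
threshold_flags = flag_noisy(votes, vote_rule="threshold", threshold=0.5)
```

Under these assumptions, consensus is the more conservative rule: it removes fewer samples, at the cost of keeping some that most folds consider mislabeled.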

fit(X, y)

Fit the filter and cache committee predictions and diagnostics.

fit_resample(X, y)

Fit the filter and return the filtered data.

get_filter_report()

Return a dictionary with the main fit diagnostics.

get_detection_report()

Return the stored detection report.

get_fold_history()

Return the stored per-fold diagnostics.

explain_noisy_instances(sample_indices=None, *, noisy_only=True, class_index='predicted', index='SV', max_order=1, imputer='baseline', budget=128, top_k=5, sort_by='confidence', ascending=True, feature_names=None, return_interaction_values=False, return_figures=False, max_display=10)

Explain committee predictions fold by fold and aggregate them.

The all view (all_view) aggregates every fold explanation. The majority view (majority_view) aggregates only the folds whose prediction matches the committee's majority class for the sample. By default, each fold is explained with its own predicted class.
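The difference between the two views can be sketched as follows. The aggregation function (a plain mean over folds), the dictionary layout and the tie-breaking rule (first class encountered wins) are assumptions made for illustration. The page does not state how the library combines fold contributions.

```python
from collections import Counter


def aggregate_views(fold_explanations):
    """Build all-fold and majority-fold views for one sample.

    fold_explanations: list of dicts with keys 'predicted' (class voted by
    that fold) and 'contributions' (feature name -> contribution).
    """
    votes = Counter(e["predicted"] for e in fold_explanations)
    majority_class = votes.most_common(1)[0][0]

    def mean_contributions(items):
        totals = {}
        for e in items:
            for feature, value in e["contributions"].items():
                totals[feature] = totals.get(feature, 0.0) + value
        return {feature: total / len(items) for feature, total in totals.items()}

    aligned = [e for e in fold_explanations if e["predicted"] == majority_class]
    return {
        "majority_class": majority_class,
        "all_view": mean_contributions(fold_explanations),
        "majority_view": mean_contributions(aligned),
    }


views = aggregate_views([
    {"predicted": 1, "contributions": {"x": 1.0, "y": 0.0}},
    {"predicted": 1, "contributions": {"x": 3.0, "y": 2.0}},
    {"predicted": 0, "contributions": {"x": -1.0, "y": 4.0}},
])
```

Because each fold explains its own predicted class by default, the all view can mix contributions toward different classes. The majority view avoids that by keeping only the aligned folds.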

Per-fold diagnostics collected during a TabPFN committee run.

Local explanation computed for one fold and one sample.

Aggregated feature contribution view for a sample.

Complete explanation summary for one sample under the TabPFN committee filter.

Container for TabPFN committee explanations and fold diagnostics.

by_fold()

Group the fold explanations by fold index.

by_sample()

Return the sample explanations indexed by sample id.
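The two accessors can be pictured with plain dictionaries. The record layout below is hypothetical; only the key names fold_idx and sample_idx are borrowed from the sort_by options documented above.

```python
from collections import defaultdict


def by_fold(fold_explanations):
    """Group fold-level explanations by their fold index."""
    grouped = defaultdict(list)
    for explanation in fold_explanations:
        grouped[explanation["fold_idx"]].append(explanation)
    return dict(grouped)


def by_sample(sample_explanations):
    """Index per-sample summaries by sample id (one summary per sample)."""
    return {explanation["sample_idx"]: explanation for explanation in sample_explanations}


fold_level = [
    {"fold_idx": 0, "sample_idx": 4},
    {"fold_idx": 1, "sample_idx": 4},
    {"fold_idx": 0, "sample_idx": 9},
]
sample_level = [{"sample_idx": 4}, {"sample_idx": 9}]

per_fold = by_fold(fold_level)
per_sample = by_sample(sample_level)
```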

Reading the explanation report

  • class_index="predicted" explains the class chosen by the model.
  • top_k sets how many of the strongest local SHAP contributions are stored per sample.
  • confidence is the probability of the explained class.
  • all_view aggregates every fold, while majority_view only keeps the folds aligned with the committee vote.
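How class_index and confidence relate can be sketched as below. The resolution order for a fixed class (try it as a label first, then as a position in the class list) is an assumption, since the page only says that an int or label may be passed. Both function names are hypothetical.

```python
def resolve_explained_class(class_index, predicted, true, classes):
    """Pick the class to explain for one sample."""
    if class_index == "predicted":
        return predicted
    if class_index == "true":
        return true
    if class_index in classes:  # fixed class given as a label
        return class_index
    if isinstance(class_index, int) and 0 <= class_index < len(classes):
        return classes[class_index]  # fixed class given as a position
    raise ValueError(f"cannot resolve class_index={class_index!r}")


def confidence(proba_row, classes, explained_class):
    """Probability the model assigns to the explained class."""
    return proba_row[classes.index(explained_class)]


classes = ["cat", "dog"]
proba_row = [0.2, 0.8]  # out-of-fold probabilities for one sample
explained = resolve_explained_class("predicted", predicted="dog", true="cat", classes=classes)
score = confidence(proba_row, classes, explained)
```

With class_index="true", the same sample would be explained for "cat" and its confidence would be the much lower probability of that class, which is why noisy samples often show low confidence in the true-class view.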