TabPFN-based filters
These filters use TabPFNClassifier as the base learner and expose a local explanation report for noisy samples.
TabPFN_CF
Bases: ClassificationFilter
Cross-validated TabPFN label-noise filter with fold-aware explanations.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| cv | int | Number of stratified folds used to generate out-of-fold predictions. | 10 |
| random_state | int | Seed used by the stratified splitter and forwarded to TabPFN. | 33 |
| action | ('remove', 'detect') | Whether noisy samples are dropped or only detected. Relabeling is not implemented yet. | "remove" |
| tabpfn_params | dict or None | Keyword arguments forwarded to TabPFNClassifier. | None |
Notes
Explanations are computed with SHAP-based tooling from tabpfn_extensions.
Using fit_mode="fit_with_cache" is recommended for faster and more stable explanations.
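The out-of-fold filtering idea behind this class can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the `NearestCentroid` class is a toy stand-in for TabPFNClassifier, and `stratified_folds` and `oof_noise_mask` are hypothetical helper names. It assumes a sample is flagged as noisy when its out-of-fold prediction disagrees with its given label.

```python
from collections import defaultdict
import random


def stratified_folds(y, n_folds, seed):
    """Assign each sample index to a fold, spreading every class across folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    fold_of = [0] * len(y)
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, i in enumerate(indices):
            fold_of[i] = position % n_folds
    return fold_of


class NearestCentroid:
    """Toy stand-in for the base learner (this is NOT TabPFN)."""

    def fit(self, X, y):
        sums, counts = {}, defaultdict(int)
        for row, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(row))
            for j, value in enumerate(row):
                acc[j] += value
            counts[label] += 1
        self.centroids_ = {c: [s / counts[c] for s in acc] for c, acc in sums.items()}
        return self

    def predict(self, X):
        def closest(row):
            return min(
                self.centroids_,
                key=lambda c: sum((a - b) ** 2 for a, b in zip(row, self.centroids_[c])),
            )
        return [closest(row) for row in X]


def oof_noise_mask(X, y, n_folds=10, seed=33):
    """Flag samples whose out-of-fold prediction differs from the given label."""
    fold_of = stratified_folds(y, n_folds, seed)
    noisy = [False] * len(y)
    for fold in range(n_folds):
        train = [i for i in range(len(y)) if fold_of[i] != fold]
        test = [i for i in range(len(y)) if fold_of[i] == fold]
        if not test:
            continue
        model = NearestCentroid().fit([X[i] for i in train], [y[i] for i in train])
        for i, pred in zip(test, model.predict([X[i] for i in test])):
            noisy[i] = pred != y[i]
    return noisy


# Two well-separated clusters; the last sample sits in cluster 0 but is labelled 1.
X = [[0.0], [0.1], [0.2], [0.3], [5.0], [5.1], [5.2], [5.3], [0.15]]
y = [0, 0, 0, 0, 1, 1, 1, 1, 1]

mask = oof_noise_mask(X, y, n_folds=2, seed=33)
# action="detect" would stop at the mask; action="remove" drops the flagged rows.
X_clean = [row for row, flagged in zip(X, mask) if not flagged]
y_clean = [label for label, flagged in zip(y, mask) if not flagged]
```

Only the mislabelled sample is flagged here, because every held-out prediction for the clean samples matches their labels.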
fit(X, y)
Fit the filter and cache fold-wise predictions and diagnostics.
fit_resample(X, y)
Fit the filter and return the filtered data.
get_filter_report()
Return a dictionary with the main fit diagnostics.
get_detection_report()
Return the stored detection report.
get_fold_history()
Return the stored per-fold diagnostics.
explain_noisy_instances(sample_indices=None, *, noisy_only=True, class_index='predicted', index='SV', max_order=1, imputer='baseline', budget=128, top_k=5, sort_by='confidence', ascending=True, feature_names=None, return_interaction_values=False, return_figures=False, max_display=10)
Explain the out-of-fold (OOF) decisions made by the filter and aggregate the results.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| sample_indices | Sequence[int] \| ndarray \| None | Optional subset of sample indices to explain. | None |
| noisy_only | bool | If True, ignore samples that were not flagged as noisy by the filter. | True |
| class_index | Any | "predicted" (default) explains the OOF predicted class for each sample and "true" explains the true class; pass an int or label to use a fixed class. | 'predicted' |
| index | str | Forwarded to the explainer. | 'SV' |
| max_order | int | Forwarded to the explainer. | 1 |
| imputer | str | Forwarded to the explainer. | 'baseline' |
| budget | int | Forwarded to the explainer. | 128 |
| top_k | int \| None | Number of strongest contributions to report per sample. Set to None to keep all contributions. | 5 |
| sort_by | str | Key used to order the explanations. | 'confidence' |
| ascending | bool | Sort direction. For confidence, ascending means least confident first. | True |
| feature_names | Sequence[Any] \| None | Optional feature names used in the returned contributions and plots. | None |
| return_interaction_values | bool | If True, store the raw interaction values. | False |
| return_figures | bool | If True, attach a waterfall figure for each explanation. | False |
| max_display | int | Maximum number of interactions shown in the waterfall plot. | 10 |
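To make the top_k, sort_by, and ascending options concrete, here is a small sketch of how such post-processing could work. It is an illustration only: the record layout, the helper names `top_contributions` and `order_explanations`, the reading of "strongest" as largest absolute value, and the treatment of `top_k=None` as "keep everything" are all assumptions rather than the library's documented behaviour.

```python
def top_contributions(values, feature_names, top_k=5):
    """Return (feature, value) pairs ranked by absolute contribution."""
    ranked = sorted(zip(feature_names, values), key=lambda pair: abs(pair[1]), reverse=True)
    # Assumption: top_k=None means "do not truncate".
    return ranked if top_k is None else ranked[:top_k]


def order_explanations(explanations, sort_by="confidence", ascending=True):
    """Order explanation records by the chosen key."""
    return sorted(explanations, key=lambda e: e[sort_by], reverse=not ascending)


# Hypothetical explanation records for two flagged samples.
names = ["age", "income", "score"]
explanations = [
    {"sample": 4, "confidence": 0.91, "values": [0.30, -0.05, 0.12]},
    {"sample": 7, "confidence": 0.55, "values": [-0.02, 0.40, -0.25]},
]

# ascending=True puts the least confident explanation first.
ordered = order_explanations(explanations, sort_by="confidence", ascending=True)
strongest = top_contributions(ordered[0]["values"], names, top_k=2)
```

With these inputs, sample 7 comes first and its two strongest features are income and score.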
Container for TabPFN filter explanations and fold diagnostics.
by_fold()
Group the stored explanations by fold index.
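Grouping by fold amounts to bucketing explanation records on their fold index. The sketch below shows the idea with plain dictionaries. The record fields (`fold`, `sample`) and the function name `group_by_fold` are hypothetical, and the real container may store richer objects.

```python
from collections import defaultdict


def group_by_fold(explanations):
    """Bucket explanation records by their fold index, preserving input order."""
    grouped = defaultdict(list)
    for record in explanations:
        grouped[record["fold"]].append(record)
    return dict(grouped)


# Hypothetical records: each explained sample remembers the fold that held it out.
records = [
    {"fold": 0, "sample": 12},
    {"fold": 2, "sample": 31},
    {"fold": 0, "sample": 45},
]
per_fold = group_by_fold(records)
```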
TabPFN_CVCF
Bases: CVCFFilter
Cross-validated TabPFN committee filter with fold-aware explanations.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| cv | int | Number of stratified folds used to build the committee. | 10 |
| vote_rule | ('consensus', 'threshold') | Rule used to flag samples as noisy from the fold disagreements. | "consensus" |
| threshold | float | Minimum fraction of disagreeing folds required when vote_rule="threshold". | 0.5 |
| random_state | int | Seed used by the stratified splitter and forwarded to TabPFN. | 33 |
| action | ('remove', 'detect') | Whether noisy samples are dropped or only detected. Relabeling is not implemented yet. | "remove" |
| tabpfn_params | dict or None | Keyword arguments forwarded to TabPFNClassifier. | None |
Notes
Explanations are computed fold by fold and then aggregated into all-fold and majority-fold views.
Using fit_mode="fit_with_cache" is recommended for faster and more stable explanations.
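The two vote rules listed in the parameters can be illustrated with a short sketch. It assumes that "consensus" flags a sample only when every fold model disagrees with its label, and that "threshold" flags it when the fraction of disagreeing folds is at least `threshold`. The function `flag_noisy` and the input layout are hypothetical, so the exact comparison used by the library may differ.

```python
def flag_noisy(disagreements, vote_rule="consensus", threshold=0.5):
    """Turn per-fold disagreement votes into one noisy flag per sample.

    disagreements[i][k] is True when fold model k disagrees with the label of sample i.
    """
    flags = []
    for votes in disagreements:
        if vote_rule == "consensus":
            # Assumption: every committee member must disagree.
            flags.append(all(votes))
        elif vote_rule == "threshold":
            # Assumption: inclusive comparison against the minimum fraction.
            flags.append(sum(votes) / len(votes) >= threshold)
        else:
            raise ValueError(f"unknown vote_rule: {vote_rule!r}")
    return flags


# Three samples judged by a committee of four fold models.
votes = [
    [True, True, True, True],      # all folds disagree with the label
    [True, True, False, False],    # half of the folds disagree
    [False, False, False, True],   # a single fold disagrees
]
consensus_flags = flag_noisy(votes, vote_rule="consensus")
threshold_flags = flag_noisy(votes, vote_rule="threshold", threshold=0.5)
```

Under these assumptions the consensus rule is the stricter of the two: it flags only the first sample, while the 0.5 threshold also flags the second.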
fit(X, y)
Fit the filter and cache committee predictions and diagnostics.
fit_resample(X, y)
Fit the filter and return the filtered data.
get_filter_report()
Return a dictionary with the main fit diagnostics.
get_detection_report()
Return the stored detection report.
get_fold_history()
Return the stored per-fold diagnostics.
explain_noisy_instances(sample_indices=None, *, noisy_only=True, class_index='predicted', index='SV', max_order=1, imputer='baseline', budget=128, top_k=5, sort_by='confidence', ascending=True, feature_names=None, return_interaction_values=False, return_figures=False, max_display=10)
Explain committee predictions fold by fold and aggregate them.
The all-view aggregates every fold explanation. The majority-view only aggregates folds that vote for the committee majority class of the sample. By default, each fold is explained with its own predicted class.
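The difference between the two views can be sketched for a single sample as follows. This is an illustration under stated assumptions: aggregation is taken to be a plain mean of per-fold contribution vectors, the record fields and the helper `aggregate_views` are hypothetical, and ties for the majority class are resolved in favour of the class seen first.

```python
from collections import Counter


def mean_values(records):
    """Average the contribution vectors of the given fold explanations."""
    count = len(records)
    return [sum(column) / count for column in zip(*(r["values"] for r in records))]


def aggregate_views(fold_explanations):
    """Build the all-fold and majority-fold aggregates for one sample."""
    majority_class = Counter(r["predicted"] for r in fold_explanations).most_common(1)[0][0]
    aligned = [r for r in fold_explanations if r["predicted"] == majority_class]
    return {
        "majority_class": majority_class,
        "all_view": mean_values(fold_explanations),   # every fold contributes
        "majority_view": mean_values(aligned),        # only folds voting with the majority
    }


# Hypothetical per-fold explanations of one sample over two features.
fold_explanations = [
    {"fold": 0, "predicted": "a", "values": [0.4, -0.1]},
    {"fold": 1, "predicted": "a", "values": [0.2, -0.3]},
    {"fold": 2, "predicted": "b", "values": [-0.6, 0.4]},
]
views = aggregate_views(fold_explanations)
```

In this example the dissenting fold cancels most of the signal in the all-fold view, while the majority-fold view keeps the contributions that support class "a".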
Reading the explanation report
- class_index="predicted" explains the class chosen by the model.
- top_k stores the strongest local SHAP contributions.
- confidence is the probability of the explained class.
- all_view aggregates every fold, while majority_view only keeps the folds aligned with the committee vote.