The Mallory Group

Audit and robustness checks

Validation tests for the loyalty-discount result, separate from Findings and Methods.

Intro

This page reports checks that test whether the loyalty-discount finding is real and stable. The Findings page states what the result is, the Methods page states how it was produced, and this Audit page shows how hard the result was pushed.

Falsification test

The test keeps the same model but randomizes contract grouping 2,000 times. If a random split could reproduce the same-team effect, the model structure might be creating the signal.

Observed same-team coefficient: +0.703. Null mean from random relabeling: +0.001. Empirical p-value: <0.0005 (0 of 2,000 random draws were as extreme as the real estimate; 2,000 draws cannot resolve a p-value below 1/2,000).

Takeaway: a random grouping does not reproduce the same-team advantage, so the effect is not an artifact of the model form.

Source: output/tables/audit_falsification.csv
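The falsification logic can be sketched as a label-permutation test. This is a minimal illustration under stated assumptions, not the production audit: it stands the full model in with a single 0/1 same-team indicator (for which the OLS slope equals the difference in group means), treats "as extreme" as two-sided (|null| ≥ |observed|), and uses synthetic data. The function names same_team_coef and permutation_test are hypothetical. With the add-one correction used here, 2,000 draws cannot produce a p-value smaller than 1/2,001 ≈ 0.0005.

```python
import random
from statistics import fmean


def same_team_coef(y, same_team):
    """Slope from regressing y on one 0/1 indicator: the difference in group means."""
    treated = [v for v, s in zip(y, same_team) if s]
    control = [v for v, s in zip(y, same_team) if not s]
    return fmean(treated) - fmean(control)


def permutation_test(y, same_team, n_draws=2000, seed=0):
    """Shuffle the same-team labels n_draws times and compare with the observed slope."""
    rng = random.Random(seed)
    observed = same_team_coef(y, same_team)
    labels = list(same_team)  # copy so the caller's labels are never shuffled
    null = []
    for _ in range(n_draws):
        rng.shuffle(labels)  # relabeling keeps the number of same-team rows fixed
        null.append(same_team_coef(y, labels))
    # Two-sided (assumption): count draws at least as far from zero as the observed slope.
    extreme = sum(abs(b) >= abs(observed) for b in null)
    # Add-one correction: the smallest reportable p-value is 1 / (n_draws + 1).
    p_value = (extreme + 1) / (n_draws + 1)
    return observed, fmean(null), extreme, p_value


# Synthetic example: a built-in +1.5 same-team shift plus unit-variance noise.
gen = random.Random(42)
same_team = [1] * 100 + [0] * 100
y = [1.5 * s + gen.gauss(0, 1) for s in same_team]
observed, null_mean, extreme, p_value = permutation_test(y, same_team)
print(f"observed={observed:+.3f} null_mean={null_mean:+.3f} extreme={extreme} p={p_value:.5f}")
```

Because every draw only reshuffles which rows carry the label, a real effect shows up as an observed slope far outside the null draws, while the null mean sits near zero, the same pattern as the +0.703 versus +0.001 reported above.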

Subsample robustness

The model was re-estimated separately within each position, trajectory, tier, and era slice. Holds is TRUE when the subgroup coefficient keeps the full-sample sign and has p<0.05.

Subgroup     Value                   n      Coefficient   Std. error   p-value   Holds
position     defensemen              602    +0.977        0.215        <0.0001   TRUE
position     forwards                1,105  +0.526        0.138        0.0001    TRUE
trajectory   declining               389    +0.392        0.239        0.1018    FALSE
trajectory   insufficient_history    566    +0.561        0.280        0.0458    TRUE
trajectory   rising                  543    +0.712        0.192        0.0002    TRUE
trajectory   stable                  209    +0.531        0.294        0.0724    FALSE
tier         fringe                  416    +0.528        0.252        0.0368    TRUE
tier         middle                  539    +0.829        0.187        <0.0001   TRUE
tier         top                     434    +0.916        0.241        0.0002    TRUE
tier         unknown*                318    +0.486        0.278        0.0814    FALSE
era          early_2012_2021         971    +0.641        0.178        0.0003    TRUE
era          late_2022_2024          736    +0.527        0.161        0.0011    TRUE

Selection-relevant read: the same-team coefficient is positive in every slice, and it is significant at p<0.05 in all three classified tiers (fringe, middle, top), in both positions, in both eras, and in the rising and insufficient-history trajectory groups. That pattern is evidence against a pure selection story, but it does not amount to full causal identification.

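Non-holding cells are shown directly: declining trajectory, stable trajectory, and the residual unknown tier (see note). All three keep the positive sign; they fail only the p<0.05 condition (p = 0.1018, 0.0724, and 0.0814).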

* Unknown is a residual category, not a player type. A contract lands here when its walk-year usage could not be placed in a tier, for example when the walk-year position group or walk-year time on ice per game is missing, so the contract cannot be ranked against same-season peers at its position. The non-holding result for the unknown row reflects that mixed, unclassified group; it is not evidence of a real tier where the effect fails.

Source: output/tables/audit_subsample_robustness.csv
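The Holds rule is mechanical enough to re-check from the table itself. The sketch below re-applies it to the listed coefficients and standard errors, assuming a normal approximation for the two-sided p-value; this page does not say which reference distribution was used, so the approximate p-values can differ slightly from the table's (about 0.045 versus 0.0458 for insufficient_history). It also checks that each slicing dimension partitions the 1,707-contract untrimmed sample. The names holds and two_sided_p are hypothetical.

```python
import math

FULL_SAMPLE_COEF = 0.703  # full-sample same-team coefficient (untrimmed fit)
FULL_SAMPLE_N = 1707      # untrimmed sample size

# (dimension, value, n, coefficient, std_error, Holds as reported) from the table above
ROWS = [
    ("position", "defensemen", 602, 0.977, 0.215, True),
    ("position", "forwards", 1105, 0.526, 0.138, True),
    ("trajectory", "declining", 389, 0.392, 0.239, False),
    ("trajectory", "insufficient_history", 566, 0.561, 0.280, True),
    ("trajectory", "rising", 543, 0.712, 0.192, True),
    ("trajectory", "stable", 209, 0.531, 0.294, False),
    ("tier", "fringe", 416, 0.528, 0.252, True),
    ("tier", "middle", 539, 0.829, 0.187, True),
    ("tier", "top", 434, 0.916, 0.241, True),
    ("tier", "unknown", 318, 0.486, 0.278, False),
    ("era", "early_2012_2021", 971, 0.641, 0.178, True),
    ("era", "late_2022_2024", 736, 0.527, 0.161, True),
]


def two_sided_p(coef, se):
    """Normal-approximation two-sided p-value for the ratio coef / se (assumption)."""
    z = abs(coef / se)
    return math.erfc(z / math.sqrt(2))


def holds(coef, se, full_coef=FULL_SAMPLE_COEF, alpha=0.05):
    """Holds rule: same sign as the full-sample estimate and p below alpha."""
    same_sign = (coef > 0) == (full_coef > 0)
    return same_sign and two_sided_p(coef, se) < alpha


# Each slicing dimension should add back up to the full sample.
totals = {}
for dim, _, n, _, _, _ in ROWS:
    totals[dim] = totals.get(dim, 0) + n

for dim, value, n, coef, se, reported in ROWS:
    print(f"{dim:<11}{value:<22}p~{two_sided_p(coef, se):.4f} "
          f"holds={holds(coef, se)} reported={reported}")
```

Under this approximation every recomputed Holds value agrees with the reported column, which suggests the non-holding rows are driven by wide standard errors rather than a sign change.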

Outlier sensitivity

The model was re-fit after trimming outcome tails to test whether a small number of extremes drives the result.

Trim fraction   n       Coefficient   p-value
0.00            1,707   +0.703        <0.0001
0.01            1,671   +0.593        <0.0001
0.05            1,535   +0.503        <0.0001

Takeaway: the same-team effect survives removal of the tails. The estimate shrinks from +0.703 untrimmed to +0.503 at the 0.05 trim, roughly 28% smaller, but stays positive with p<0.0001 at every trim level.

Source: output/tables/audit_outlier_sensitivity.csv
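Tail trimming can be sketched in a few lines. This is a minimal illustration, not the audit code: it assumes the trim fraction is removed from each tail of the pooled outcome, and it stands the refit in with a difference in means on toy data. The per-tail reading is an inference from the row counts (1,707 × 0.98 ≈ 1,673 and 1,707 × 0.90 ≈ 1,536, close to the 1,671 and 1,535 shown), since the exact trimming rule is not stated here. The names trim_tails and same_team_coef are hypothetical.

```python
from statistics import fmean


def trim_tails(y, same_team, frac):
    """Drop the lowest and highest `frac` share of rows, ranked by outcome y."""
    k = int(frac * len(y))  # rows removed from EACH tail (per-tail assumption)
    order = sorted(range(len(y)), key=lambda i: y[i])  # stable sort keeps tie order
    keep = order[k:len(order) - k]
    return [y[i] for i in keep], [same_team[i] for i in keep]


def same_team_coef(y, same_team):
    """Difference in mean outcome, same-team minus other (slope on a 0/1 indicator)."""
    treated = [v for v, s in zip(y, same_team) if s]
    control = [v for v, s in zip(y, same_team) if not s]
    return fmean(treated) - fmean(control)


# Toy data: one extreme same-team outcome inflates the untrimmed estimate.
same_team = [0] * 50 + [1] * 50
y = [0.0] * 50 + [1.0] * 49 + [100.0]

results = {}
for frac in (0.00, 0.01, 0.05):
    y_kept, s_kept = trim_tails(y, same_team, frac)
    results[frac] = (len(y_kept), same_team_coef(y_kept, s_kept))
    print(f"trim={frac:.2f} n={len(y_kept)} coef={results[frac][1]:+.3f}")
```

In the toy data a single outlier nearly triples the estimate and the first trim removes it; in the real table the estimate shrinks gradually instead, which is the pattern expected when no handful of extremes carries the result.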

Data integrity

Coverage tables from Phase 5 confirm the sample inputs used here: eligible walk-year n=1,624 and eligible overpay n=1,760.

Retention distribution in the panel: same_team 1,507, new_team 857, entry 728, unknown 83 (3,175 contracts in total). As with tier, this unknown count is a residual, not a retention category. Signing-year counts are listed in the coverage table; the early years 2012 to 2014 are sparse and 2025 is partial.

Sources: output/tables/coverage_eligible_sample_sizes.csv, output/tables/coverage_retention_distribution.csv, output/tables/coverage_contract_count_by_signing_year.csv
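The retention counts can be turned into shares with simple arithmetic. The sketch below uses only the four counts quoted above (it does not read the CSV, whose column layout is not shown here); because unknown is a residual rather than a retention category, it also computes shares over the classified contracts alone.

```python
# Retention counts as quoted on this page.
RETENTION = {"same_team": 1507, "new_team": 857, "entry": 728, "unknown": 83}

total = sum(RETENTION.values())
shares = {k: v / total for k, v in RETENTION.items()}

# Exclude the residual "unknown" bucket for shares among classified contracts.
classified = {k: v for k, v in RETENTION.items() if k != "unknown"}
classified_total = sum(classified.values())
classified_shares = {k: v / classified_total for k, v in classified.items()}

for k, n in RETENTION.items():
    within = classified_shares.get(k)
    within_txt = f"{within:6.1%}" if within is not None else "     -"
    print(f"{k:<10}{n:>6,}  all={shares[k]:6.1%}  classified={within_txt}")
print(f"{'total':<10}{total:>6,}  classified total={classified_total:,}")
```

Dropping the residual bucket moves each share by less than two percentage points, so the retention mix reads the same either way.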

Interpretation limit

For the selection caveat and interpretation boundaries, see the "What the model cannot see" discussion on the Methods page.