Snow White¶
Snow White is the white dwarf analysis pipeline in Astra. It classifies white dwarf spectra into spectral subtypes and fits effective temperature (Teff) and surface gravity (log g) for DA-type white dwarfs using an emulator-based Balmer line fitting method.
Snow White runs on BOSS combined spectra for sources in the mwm_wd carton program.
Overview¶
The pipeline has two stages:
Classification – A random forest classifier assigns a white dwarf spectral type (DA, DB, DC, DQ, DZ, and many subtypes) based on spectral line features.
DA fitting – If the source is classified as DA (or DA:), the pipeline fits Teff and log g by fitting Balmer absorption lines with a PCA-based spectral emulator.
Classification¶
Snow White extracts a set of spectral line features from the input spectrum and feeds
them into a pre-trained random forest classifier. The feature extraction
(get_line_info_v3.line_info) works as follows:
The spectrum is scaled so that the total flux sums to the number of pixels (a simple normalization).
The flux is binned by pairs of pixels and interpolated onto a standard wavelength grid (3850–8300 Angstroms, 1 Angstrom spacing).
A pseudo-continuum is estimated using a spline fit through predefined anchor points.
For each white dwarf subtype (DA, DB, DQ, DZ, WD+MS, peculiar, hot DQ), the ratio of the observed flux to the pseudo-continuum (or to a blackbody fit for WD+MS and peculiar types) is computed in diagnostic wavelength windows defined by feature files.
H-alpha emission features are measured separately.
All features are concatenated into a single feature vector.
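The first two preprocessing steps above can be sketched with NumPy. This is a minimal illustration, not the code in get_line_info_v3: the function name `preprocess_spectrum` is hypothetical, and it assumes that "binned by pairs" means averaging adjacent pixels, that the interpolation is linear, and that the grid includes both endpoints.

```python
import numpy as np


def preprocess_spectrum(wavelength, flux, grid_start=3850.0, grid_stop=8300.0, step=1.0):
    """Normalize, bin by pixel pairs, and resample a spectrum (illustrative only)."""
    wavelength = np.asarray(wavelength, dtype=float)
    flux = np.asarray(flux, dtype=float)

    # Step 1: scale so the flux sums to the number of pixels (mean flux of 1).
    flux = flux * flux.size / np.sum(flux)

    # Step 2a: average adjacent pixel pairs; a trailing unpaired pixel is dropped.
    n_pairs = flux.size // 2
    binned_wave = wavelength[: 2 * n_pairs].reshape(n_pairs, 2).mean(axis=1)
    binned_flux = flux[: 2 * n_pairs].reshape(n_pairs, 2).mean(axis=1)

    # Step 2b: linear interpolation onto the standard 1 Angstrom grid (assumed inclusive).
    grid = np.arange(grid_start, grid_stop + step, step)
    return grid, np.interp(grid, binned_wave, binned_flux)
```

Because the scaling makes the mean flux equal to 1, spectra of different brightness end up on a comparable scale before the line features are measured.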
The random forest outputs class probabilities for 24 white dwarf subtypes (see Output fields below). The classification is assigned as:
The most probable class, if its probability is at least 0.5.
A dual classification (e.g., DA/DB) if the second-most-probable class has a probability ratio greater than 0.6 relative to the top class.
An uncertain classification (e.g., DA:) otherwise.
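The assignment rule can be written as a short function. This is a sketch under stated assumptions: the name `assign_classification` and the dictionary input are hypothetical, and the dual-classification test is assumed to apply only when the top probability is below 0.5, which is how the three rules read in order.

```python
def assign_classification(probabilities, confident=0.5, dual_ratio=0.6):
    """Turn a {class_name: probability} mapping into a spectral type string."""
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    (top, p_top), (second, p_second) = ranked[0], ranked[1]

    # Rule 1: a confident top class is reported on its own.
    if p_top >= confident:
        return top
    # Rule 2: a close runner-up produces a dual classification such as "DA/DB".
    if p_top > 0 and p_second / p_top > dual_ratio:
        return f"{top}/{second}"
    # Rule 3: otherwise the top class is marked uncertain with a colon.
    return f"{top}:"
```

For example, probabilities of 0.45 for DA and 0.40 for DB give a ratio of about 0.89, so the result is a dual DA/DB type rather than DA:.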
DA fitting¶
For spectra classified as DA or DA:, Snow White fits Teff and log g by fitting the profiles of the Balmer hydrogen absorption lines.
Spectral emulator¶
The fitting uses a PCA-based spectral emulator (Emulator_DA) trained on a grid of
DA white dwarf model atmospheres. The emulator performs PCA decomposition on the model
grid and uses linear interpolation in (log10(Teff), log g) space to predict PCA weights
at arbitrary parameter values. This allows rapid generation of model spectra without
repeatedly reading from the full grid.
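The idea behind such an emulator can be illustrated with a toy version. The class below is not Emulator_DA: `ToyPCAEmulator` is a hypothetical name, it assumes the model grid is regular in (log10(Teff), log g), it computes the PCA with a plain SVD, and it extrapolates linearly outside the grid instead of enforcing bounds.

```python
import numpy as np


class ToyPCAEmulator:
    """PCA plus bilinear interpolation over a regular (log10 Teff, log g) grid."""

    def __init__(self, log_teff, logg, fluxes, n_components=5):
        # fluxes has shape (len(log_teff), len(logg), n_pixels).
        self.log_teff = np.asarray(log_teff, dtype=float)
        self.logg = np.asarray(logg, dtype=float)
        n_t, n_g, n_pix = fluxes.shape
        flat = fluxes.reshape(n_t * n_g, n_pix)
        self.mean = flat.mean(axis=0)
        # PCA via SVD of the mean-subtracted grid; rows of vt are the components.
        _, _, vt = np.linalg.svd(flat - self.mean, full_matrices=False)
        self.components = vt[:n_components]
        weights = (flat - self.mean) @ self.components.T
        self.weights = weights.reshape(n_t, n_g, -1)

    @staticmethod
    def _bracket(axis, value):
        # Index of the lower grid node and fractional distance to the next one.
        i = int(np.clip(np.searchsorted(axis, value) - 1, 0, axis.size - 2))
        return i, (value - axis[i]) / (axis[i + 1] - axis[i])

    def predict(self, teff, logg):
        i, u = self._bracket(self.log_teff, np.log10(teff))
        j, v = self._bracket(self.logg, logg)
        w = self.weights
        # Bilinear interpolation of the PCA weights between the four nearest nodes.
        interp = ((1 - u) * (1 - v) * w[i, j] + u * (1 - v) * w[i + 1, j]
                  + (1 - u) * v * w[i, j + 1] + u * v * w[i + 1, j + 1])
        # Rebuild the spectrum from the mean and the interpolated weights.
        return self.mean + interp @ self.components
```

Interpolating a handful of PCA weights instead of full spectra is what keeps each model evaluation cheap inside a fitting loop.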
Fitting procedure¶
Grid search – The normalized spectrum is compared against a pre-computed grid of DA models (da_flux_cube.npy) to find a coarse best-fit Teff. If a photometric Teff and log g solution exists in a reference table (matched by Gaia DR3 source ID), that is used as the starting point instead.
Line selection – Different sets of Balmer lines are used depending on the temperature regime:
Hot (Teff > 40,000 K): a dedicated hot line list
Warm (16,000–40,000 K): the standard line list
Cool (8,000–16,000 K): a cool line list
Very cool (Teff < 8,000 K): a very-cool line list
Balmer line fitting – Each Balmer line is individually normalized by fitting a linear pseudo-continuum to the line wings. The emulator generates a model spectrum at trial (Teff, log g, wavelength shift) values, the model lines are normalized the same way, and residuals are computed. The fit uses lmfit.minimize with least_squares (soft L1 loss) to find the best parameters. The parameter bounds are:
Teff: 4,000–120,000 K
log g: 6.01–9.49 dex (internally stored as 601–949, divided by 100)
Wavelength shift: -80 to +80 Angstroms
Uncertainty estimation – After convergence, the fit is repeated using leastsq (Levenberg-Marquardt) starting from the best-fit values to estimate formal parameter uncertainties. A noise model is applied to the formal uncertainties to produce the reported values:
e_teff = 1.5 * raw_e_teff + 100 K
e_logg = 2 * raw_e_logg + 0.05 dex
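Two of the steps above reduce to simple arithmetic and can be sketched directly. The function names and regime labels below are hypothetical, and the handling of the exact boundary values (8,000 K, 16,000 K, 40,000 K) is an assumption, since the ranges above do not say which regime owns each edge.

```python
def select_line_regime(teff):
    """Return a label for the Balmer line list used at a given Teff (K)."""
    if teff > 40_000:
        return "hot"
    if teff >= 16_000:  # assumed: 16,000 K itself counts as warm
        return "warm"
    if teff >= 8_000:   # assumed: 8,000 K itself counts as cool
        return "cool"
    return "very_cool"


def apply_noise_model(raw_e_teff, raw_e_logg):
    """Inflate formal uncertainties into the reported e_teff and e_logg."""
    e_teff = 1.5 * raw_e_teff + 100.0  # K
    e_logg = 2.0 * raw_e_logg + 0.05   # dex
    return e_teff, e_logg
```

Note that the additive terms set a floor on the reported uncertainties: even a formal error of zero yields 100 K in Teff and 0.05 dex in log g.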
Hot/cold solution disambiguation¶
Because the strength of hydrogen Balmer lines peaks near Teff ~ 13,000 K, a given set of line profiles can be consistent with both a hotter and a cooler temperature solution. Snow White handles this ambiguity as follows:
If a photometric starting point was available (from the Gaia-matched reference table), only the single fit from that starting point is used.
Otherwise, the pipeline fits the spectrum twice: once starting from the best grid temperature, and once starting from the opposite side of the 13,000 K Balmer maximum. The correct solution is chosen by comparing the predicted Gaia BP-RP color from each model against the observed BP-RP color. The solution whose synthetic color is closest to the observed value is adopted.
If the observed Gaia BP and RP magnitudes are not available, the pipeline cannot disambiguate and sets flag_missing_bp_rp_mag.
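The color-based choice between the two fits amounts to a nearest-value comparison. The sketch below assumes each fit is summarized as a dictionary with hypothetical keys "teff" and "bp_rp" (the synthetic color of the best-fit model); the function name and data layout are illustrative, not the pipeline's own.

```python
def choose_solution(hot, cold, observed_bp_rp):
    """Pick the fit whose synthetic BP-RP color is closest to the observed one.

    Returns None when no observed color is available, which corresponds to the
    case where flag_missing_bp_rp_mag would be set.
    """
    if observed_bp_rp is None:
        return None
    return min((hot, cold), key=lambda fit: abs(fit["bp_rp"] - observed_bp_rp))
```

The comparison works because the two candidate temperatures differ enough to predict distinguishable broadband colors even when their Balmer line profiles look alike.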
Output fields¶
Classification probabilities¶
| Field | Description |
|---|---|
| | Assigned spectral type string (e.g., |
| | Probability of CV (cataclysmic variable) |
| | Probability of DA |
| | Probability of DAB |
| | Probability of DABZ |
| | Probability of DAH (magnetic DA) |
| | Probability of DAHe |
| | Probability of DAO |
| | Probability of DAZ |
| | Probability of DA+MS (DA with main-sequence companion) |
| | Probability of DB |
| | Probability of DBA |
| | Probability of DBAZ |
| | Probability of DBH |
| | Probability of DBZ |
| | Probability of DB+MS (DB with main-sequence companion) |
| | Probability of DC (featureless) |
| | Probability of DC+MS |
| | Probability of DO |
| | Probability of DQ |
| | Probability of DQZ |
| | Probability of DQpec (peculiar DQ) |
| | Probability of DZ |
| | Probability of DZA |
| | Probability of DZB |
| | Probability of DZBA |
| | Probability of magnetic white dwarf |
| | Probability of hot DQ |
Stellar parameters (DA only)¶
| Field | Description |
|---|---|
| | Effective temperature (K) |
| | Uncertainty in Teff (K), after noise model |
| | Surface gravity (log g, dex) |
| | Uncertainty in log g (dex), after noise model |
| | Formal (pre-noise-model) uncertainty in Teff |
| | Formal (pre-noise-model) uncertainty in log g |
| | Relative velocity used in the fit (km/s) |
Spectral data¶
| Field | Description |
|---|---|
| | Wavelength array (log-lambda spaced, 4648 pixels starting at log10(lambda) = 3.5523) |
| | Best-fit DA model flux, resampled to the wavelength grid (DA fits only) |
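The wavelength array described above can be regenerated from its starting value and pixel count. The pixel spacing is not stated here; the sketch assumes a step of 1e-4 in log10(lambda), the usual spacing for BOSS spectra, so treat the resulting values as illustrative until checked against the actual output.

```python
import numpy as np

N_PIXELS = 4648
LOG_LAMBDA_START = 3.5523
LOG_LAMBDA_STEP = 1e-4  # assumed spacing; not given in this document

# Evenly spaced in log10(wavelength), so pixel widths grow with wavelength.
wavelength = 10 ** (LOG_LAMBDA_START + LOG_LAMBDA_STEP * np.arange(N_PIXELS))
```

With log-lambda spacing every pixel spans the same velocity width, which is why a shift in pixels corresponds to a fixed velocity offset across the spectrum.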
Flags¶
Snow White uses a result_flags bit field. The individual flags are:
| Flag | Bit | Description |
|---|---|---|
| | 2^0 | Signal-to-noise ratio is 8 or below. Results may be unreliable. |
| | 2^1 | The fit did not converge. |
| | 2^2 | Teff is at the edge of the model grid. |
| | 2^3 | log g is at the edge of the model grid. |
| | 2^4 | The spectrum has no flux (all zeros or invalid). No results are produced. |
| | 2^5 | The source is not in the |
| | 2^6 | Gaia BP or RP magnitude is missing, so the hot/cold solution could not be disambiguated. No stellar parameters are reported. |
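A result_flags value can be unpacked by testing each bit in turn. In the sketch below, flag_low_snr, flag_not_mwm_wd, and flag_missing_bp_rp_mag are names used elsewhere on this page, matched to bits by their descriptions; the labels for bits 1 to 4 are placeholders invented for illustration, not the pipeline's field names.

```python
FLAG_BITS = {
    0: "flag_low_snr",
    1: "bit_1_fit_not_converged",   # placeholder label
    2: "bit_2_teff_at_grid_edge",   # placeholder label
    3: "bit_3_logg_at_grid_edge",   # placeholder label
    4: "bit_4_no_flux",             # placeholder label
    5: "flag_not_mwm_wd",
    6: "flag_missing_bp_rp_mag",
}


def decode_result_flags(result_flags):
    """List the labels of all bits set in a result_flags integer."""
    return [name for bit, name in FLAG_BITS.items() if result_flags & (1 << bit)]
```

For instance, a value of 65 (2^0 + 2^6) decodes to a low-S/N spectrum that also lacks the Gaia photometry needed for disambiguation.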
Things to know¶
Only DA white dwarfs get stellar parameters. Other subtypes (DB, DC, DQ, DZ, etc.) receive a classification and class probabilities, but no Teff or log g.
The noise model inflates formal uncertainties. The reported e_teff and e_logg include an empirical correction applied to the formal fitting uncertainties. If you need the raw formal errors, use raw_e_teff and raw_e_logg.
Hot/cold ambiguity requires Gaia photometry. Without BP and RP magnitudes, the pipeline cannot choose between the hot and cold solutions and will set flag_missing_bp_rp_mag without reporting stellar parameters.
Low S/N results are flagged but still reported. When the spectrum has S/N <= 8, flag_low_snr is set. The fitted parameters are still present in the output, but should be treated with caution.
Classification notation:
A plain type like DA means the classifier is confident (probability >= 0.5).
A type with a colon like DA: means the top class had probability < 0.5, and the second class was not close enough to warrant a dual classification.
A dual type like DA/DB means the top two classes had similar probabilities (ratio > 0.6).
Input spectra. Snow White operates on BOSS combined spectra (BossCombinedSpectrum). Only sources assigned to the mwm_wd carton program are analyzed; all others are immediately flagged with flag_not_mwm_wd.
Grid boundaries. The emulator covers Teff from 4,000 to 120,000 K and log g from 6.01 to 9.49. Fits that land on these boundaries should be interpreted carefully.