# Snow White

Snow White is the white dwarf analysis pipeline in Astra. It classifies white dwarf spectra into spectral subtypes and, for DA-type white dwarfs, fits effective temperature (Teff) and surface gravity (log g) using an emulator-based Balmer line fitting method.

Snow White runs on BOSS combined spectra for sources in the `mwm_wd` carton program.

## Overview

The pipeline has two stages:

1. **Classification** -- A random forest classifier assigns a white dwarf spectral type (DA, DB, DC, DQ, DZ, and many subtypes) based on spectral line features.
2. **DA fitting** -- If the source is classified as DA (or DA:), the pipeline derives Teff and log g by fitting Balmer absorption lines with a PCA-based spectral emulator.

## Classification

Snow White extracts a set of spectral line features from the input spectrum and feeds them into a pre-trained random forest classifier. The feature extraction (`get_line_info_v3.line_info`) works as follows:

1. The spectrum is scaled so that the flux sums to the number of pixels, i.e. the mean flux is 1 (a simple normalization).
2. The flux is binned by pairs of pixels and interpolated onto a standard wavelength grid (3850--8300 Angstroms, 1 Angstrom spacing).
3. A pseudo-continuum is estimated using a spline fit through predefined anchor points.
4. For each white dwarf subtype (DA, DB, DQ, DZ, WD+MS, peculiar, hot DQ), the ratio of the observed flux to the pseudo-continuum (or to a blackbody fit for WD+MS and peculiar types) is computed in diagnostic wavelength windows defined by feature files.
5. H-alpha emission features are measured separately.
6. All features are concatenated into a single feature vector.

The random forest outputs one probability per class (see the `p_*` columns under Output fields below). The classification is assigned as:

- The most probable class, if its probability is at least 0.5.
- Failing that, a dual classification (e.g., `DA/DB`) if the ratio of the second-highest probability to the highest exceeds 0.6.
- An uncertain classification (e.g., `DA:`) otherwise.

## DA fitting

For spectra classified as DA or DA:, Snow White fits Teff and log g by fitting the profiles of the Balmer hydrogen absorption lines.

### Spectral emulator

The fitting uses a PCA-based spectral emulator (`Emulator_DA`) trained on a grid of DA white dwarf model atmospheres. The emulator performs PCA decomposition on the model grid and uses linear interpolation in (log10(Teff), log g) space to predict PCA weights at arbitrary parameter values. This allows rapid generation of model spectra without repeatedly reading from the full grid.

### Fitting procedure

1. **Grid search** -- The normalized spectrum is compared against a pre-computed grid of DA models (`da_flux_cube.npy`) to find a coarse best-fit Teff. If a photometric Teff and log g solution exists in a reference table (matched by Gaia DR3 source ID), that is used as the starting point instead.
2. **Line selection** -- Different sets of Balmer lines are used depending on the temperature regime:
   - Hot (Teff > 40,000 K): a dedicated hot line list
   - Warm (16,000--40,000 K): the standard line list
   - Cool (8,000--16,000 K): a cool line list
   - Very cool (Teff < 8,000 K): a very-cool line list
3. **Balmer line fitting** -- Each Balmer line is individually normalized by fitting a linear pseudo-continuum to the line wings. The emulator generates a model spectrum at trial (Teff, log g, wavelength shift) values, the model lines are normalized the same way, and residuals are computed. The fit uses `lmfit.minimize` with `least_squares` (soft L1 loss) to find the best parameters. The parameter bounds are:
   - Teff: 4,000--120,000 K
   - log g: 6.01--9.49 dex (internally stored as 601--949, divided by 100)
   - Wavelength shift: -80 to +80 Angstroms
4. **Uncertainty estimation** -- After convergence, the fit is repeated using `leastsq` (Levenberg-Marquardt) starting from the best-fit values to estimate formal parameter uncertainties.
   A noise model is applied to the formal uncertainties to produce the reported values:
   - `e_teff = 1.5 * raw_e_teff + 100` K
   - `e_logg = 2 * raw_e_logg + 0.05` dex

### Hot/cold solution disambiguation

Because the strength of hydrogen Balmer lines peaks near Teff ~ 13,000 K, a given set of line profiles can be consistent with both a hotter and a cooler temperature solution. Snow White handles this ambiguity as follows:

- If a photometric starting point was available (from the Gaia-matched reference table), only the single fit from that starting point is used.
- Otherwise, the pipeline fits the spectrum twice: once starting from the best grid temperature, and once starting from the opposite side of the 13,000 K Balmer maximum. The correct solution is chosen by comparing the predicted Gaia BP-RP color from each model against the observed BP-RP color. The solution whose synthetic color is closest to the observed value is adopted.
- If the observed Gaia BP and RP magnitudes are not available, the pipeline cannot disambiguate and sets `flag_missing_bp_rp_mag`.
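
As a rough illustration of the noise model and the color-based choice described above, the sketch below inflates the formal uncertainties and then picks whichever candidate fit has the synthetic BP-RP color closest to the observed one. The `Fit` container and the functions `apply_noise_model` and `choose_solution` are hypothetical names made up for this example, not Astra code, and the numbers in the usage lines are invented. It also assumes each candidate fit already carries its predicted color.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Fit:
    """One candidate DA fit (hypothetical container, not an Astra class)."""
    teff: float         # K
    logg: float         # dex
    raw_e_teff: float   # formal Teff uncertainty, K
    raw_e_logg: float   # formal log g uncertainty, dex
    model_bp_rp: float  # synthetic Gaia BP-RP color of the model (assumed given)


def apply_noise_model(raw_e_teff: float, raw_e_logg: float) -> Tuple[float, float]:
    """Return (e_teff, e_logg) using the coefficients quoted in the text."""
    e_teff = 1.5 * raw_e_teff + 100.0
    e_logg = 2.0 * raw_e_logg + 0.05
    return e_teff, e_logg


def choose_solution(hot: Fit, cold: Fit,
                    bp_mag: Optional[float],
                    rp_mag: Optional[float]) -> Optional[Fit]:
    """Pick the fit whose synthetic color is closest to the observed BP-RP.

    Returns None when either magnitude is missing, which corresponds to the
    case where the pipeline sets `flag_missing_bp_rp_mag`.
    """
    if bp_mag is None or rp_mag is None:
        return None
    observed_bp_rp = bp_mag - rp_mag
    return min((hot, cold), key=lambda fit: abs(fit.model_bp_rp - observed_bp_rp))


# Invented example values, for illustration only.
hot = Fit(teff=18000.0, logg=8.0, raw_e_teff=200.0, raw_e_logg=0.025, model_bp_rp=-0.30)
cold = Fit(teff=9500.0, logg=8.1, raw_e_teff=150.0, raw_e_logg=0.030, model_bp_rp=0.10)

best = choose_solution(hot, cold, bp_mag=17.00, rp_mag=17.25)  # observed BP-RP = -0.25
e_teff, e_logg = apply_noise_model(best.raw_e_teff, best.raw_e_logg)
```

With these made-up inputs the hot candidate is selected, because its synthetic color is much closer to the observed value than the cold candidate's.
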
## Output fields

### Classification probabilities

| Field | Description |
| --- | --- |
| `classification` | Assigned spectral type string (e.g., `DA`, `DB`, `DA/DB`, `DA:`) |
| `p_cv` | Probability of CV (cataclysmic variable) |
| `p_da` | Probability of DA |
| `p_dab` | Probability of DAB |
| `p_dabz` | Probability of DABZ |
| `p_dah` | Probability of DAH (magnetic DA) |
| `p_dahe` | Probability of DAHe |
| `p_dao` | Probability of DAO |
| `p_daz` | Probability of DAZ |
| `p_da_ms` | Probability of DA+MS (DA with main-sequence companion) |
| `p_db` | Probability of DB |
| `p_dba` | Probability of DBA |
| `p_dbaz` | Probability of DBAZ |
| `p_dbh` | Probability of DBH |
| `p_dbz` | Probability of DBZ |
| `p_db_ms` | Probability of DB+MS (DB with main-sequence companion) |
| `p_dc` | Probability of DC (featureless) |
| `p_dc_ms` | Probability of DC+MS |
| `p_do` | Probability of DO |
| `p_dq` | Probability of DQ |
| `p_dqz` | Probability of DQZ |
| `p_dqpec` | Probability of DQpec (peculiar DQ) |
| `p_dz` | Probability of DZ |
| `p_dza` | Probability of DZA |
| `p_dzb` | Probability of DZB |
| `p_dzba` | Probability of DZBA |
| `p_mwd` | Probability of magnetic white dwarf |
| `p_hotdq` | Probability of hot DQ |

### Stellar parameters (DA only)

| Field | Description |
| --- | --- |
| `teff` | Effective temperature (K) |
| `e_teff` | Uncertainty in Teff (K), after noise model |
| `logg` | Surface gravity (log g, dex) |
| `e_logg` | Uncertainty in log g (dex), after noise model |
| `raw_e_teff` | Formal (pre-noise-model) uncertainty in Teff (K) |
| `raw_e_logg` | Formal (pre-noise-model) uncertainty in log g (dex) |
| `v_rel` | Relative velocity used in the fit (km/s) |

### Spectral data

| Field | Description |
| --- | --- |
| `wavelength` | Wavelength array (log-lambda spaced, 4648 pixels starting at log10(lambda) = 3.5523) |
| `model_flux` | Best-fit DA model flux, resampled to the wavelength grid (DA fits only) |

## Flags

Snow White uses a `result_flags` bit field.
The individual flags are:

| Flag | Bit | Description |
| --- | --- | --- |
| `flag_low_snr` | 2^0 | Signal-to-noise ratio is 8 or below. Results may be unreliable. |
| `flag_unconverged` | 2^1 | The fit did not converge. |
| `flag_teff_grid_edge_bad` | 2^2 | Teff is at the edge of the model grid. |
| `flag_logg_grid_edge_bad` | 2^3 | log g is at the edge of the model grid. |
| `flag_no_flux` | 2^4 | The spectrum has no flux (all zeros or invalid). No results are produced. |
| `flag_not_mwm_wd` | 2^5 | The source is not in the `mwm_wd` program. No analysis is performed. |
| `flag_missing_bp_rp_mag` | 2^6 | Gaia BP or RP magnitude is missing, so the hot/cold solution could not be disambiguated. No stellar parameters are reported. |

## Things to know

- **Only DA white dwarfs get stellar parameters.** Other subtypes (DB, DC, DQ, DZ, etc.) receive a classification and class probabilities, but no Teff or log g.
- **The noise model inflates formal uncertainties.** The reported `e_teff` and `e_logg` include an empirical correction applied to the formal fitting uncertainties. If you need the raw formal errors, use `raw_e_teff` and `raw_e_logg`.
- **Hot/cold ambiguity requires Gaia photometry.** Without BP and RP magnitudes, the pipeline cannot choose between the hot and cold solutions and will set `flag_missing_bp_rp_mag` without reporting stellar parameters.
- **Low S/N results are flagged but still reported.** When the spectrum has S/N <= 8, `flag_low_snr` is set. The fitted parameters are still present in the output, but should be treated with caution.
- **Classification notation:**
  - A plain type like `DA` means the classifier is confident (probability >= 0.5).
  - A type with a colon like `DA:` means the top class had probability < 0.5, and the second class was not close enough to warrant a dual classification.
  - A dual type like `DA/DB` means the top two classes had similar probabilities (ratio > 0.6).
- **Input spectra.** Snow White operates on BOSS combined spectra (`BossCombinedSpectrum`). Only sources assigned to the `mwm_wd` carton program are analyzed; all others are immediately flagged with `flag_not_mwm_wd`.
- **Grid boundaries.** The emulator covers Teff from 4,000 to 120,000 K and log g from 6.01 to 9.49. Fits that land on these boundaries should be interpreted carefully.
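
When reading Snow White output, two small helpers can be handy: one that expands `result_flags` into flag names, and one that reproduces the classification notation from a set of class probabilities. The sketch below is only an illustration built from the flag table and the rules described in this document. The names `FLAG_BITS`, `decode_result_flags`, and `assign_label` are hypothetical and not part of Astra, and the order in which the 0.5 and 0.6 thresholds are checked is an assumption based on the notation rules above.

```python
from typing import Dict, List

# Bit positions copied from the Flags table (flag at bit n has value 2^n).
FLAG_BITS: Dict[str, int] = {
    "flag_low_snr": 0,
    "flag_unconverged": 1,
    "flag_teff_grid_edge_bad": 2,
    "flag_logg_grid_edge_bad": 3,
    "flag_no_flux": 4,
    "flag_not_mwm_wd": 5,
    "flag_missing_bp_rp_mag": 6,
}


def decode_result_flags(result_flags: int) -> List[str]:
    """Return the names of all flags set in a `result_flags` integer."""
    return [name for name, bit in FLAG_BITS.items() if result_flags & (1 << bit)]


def assign_label(probabilities: Dict[str, float]) -> str:
    """Build a label such as 'DA', 'DA/DB', or 'DA:' from class probabilities.

    Assumed order of checks: confident class first (>= 0.5), then a dual
    label if second/top > 0.6, otherwise an uncertain label with a colon.
    Expects at least two classes.
    """
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    (top, p_top), (second, p_second) = ranked[0], ranked[1]
    if p_top >= 0.5:
        return top
    if p_top > 0 and p_second / p_top > 0.6:
        return f"{top}/{second}"
    return f"{top}:"


# Example usage with invented values.
flags = decode_result_flags(5)  # bits 0 and 2 are set
label = assign_label({"DA": 0.45, "DB": 0.40, "DC": 0.15})
```

Here `flags` contains `flag_low_snr` and `flag_teff_grid_edge_bad`, and `label` is `DA/DB` because no class reaches 0.5 and the second-highest probability is more than 0.6 times the highest.
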