Developer Guide

This guide is for developers who want to contribute to Astra, add a new analysis pipeline, or understand how the codebase works. It assumes familiarity with Python.

What is Astra?

Astra is the analysis framework for the SDSS-V Milky Way Mapper. It manages spectroscopic analysis pipelines, stores results in a database, and produces data products (FITS files) for data releases. Each pipeline analyzes spectra (APOGEE, BOSS, or combined MWM products) and writes per-spectrum results to a shared database. Apache Airflow orchestrates the pipelines in production.

Setting up a development environment

Astra uses uv for dependency management. To get started:

# Clone the repository
git clone https://github.com/sdss/astra.git
cd astra

# Create a virtual environment and install in editable mode with dev dependencies
uv venv
source .venv/bin/activate
uv pip install -e ".[dev]"

Database configuration

Astra needs a database connection. For local development, the simplest option is to use a SQLite database by setting an environment variable:

export ASTRA_DATABASE_PATH="/path/to/astra.db"

Alternatively, create a configuration file at ~/.config/sdss/astra/astra.yml:

# SQLite (simplest for local development)
database:
  path: /path/to/astra.db

# PostgreSQL (used in production)
# database:
#   dbname: astra
#   user: username
#   host: localhost
#   port: 5432
#   schema: astra

For tests, the database is automatically set to an in-memory SQLite instance (:memory:), so no configuration is needed to run the test suite.

Running the CLI

Astra provides a typer-based CLI. After installation:

astra --help
astra config show

Repository layout

astra/
  src/astra/           # Main source code
    models/            # Peewee ORM models (database schema)
    pipelines/         # Analysis pipeline code
    products/          # FITS data product generation
    specutils/         # Spectral utilities (continuum, resampling, LSF)
    cli/               # Command-line interface (typer)
    migrations/        # Database migration scripts
    operators/         # Airflow operators
    etc/               # Default configuration files
    fields.py          # Custom Peewee field types (BitField, ArrayField, PixelArray)
    glossary.py        # Standardised field descriptions for data models
    utils/             # General utilities, logging, Slurm helpers
  tests/               # Test suite
  dags/                # Airflow DAG definitions
  docs/                # Sphinx documentation
  pyproject.toml       # Project metadata and dependencies

See Project Structure for a deeper look at each directory.

Contents