# Database

Astra uses [Peewee](http://docs.peewee-orm.com/) as its ORM. All models inherit from `BaseModel` (defined in `src/astra/models/base.py`), which binds them to a shared database connection.

## Connection setup

The database connection is established at import time in `src/astra/models/base.py` by the `get_database_and_schema` function. The resolution order is:

1. **`ASTRA_DATABASE_PATH` environment variable** -- If set, uses a SQLite database at that path.
2. **`TESTING` flag in config** -- If `True`, uses an in-memory SQLite database (`:memory:`).
3. **Config file** -- Reads `database` from the Astra config file (loaded by `sdsstools.get_config`). Supports both PostgreSQL (`dbname` key) and SQLite (`path` key).
4. **Fallback** -- If nothing is configured, defaults to an in-memory SQLite database with a warning.

Astra looks for configuration files in the standard `sdsstools` locations:

- `~/.config/sdss/astra/astra.yml` (user override)
- `src/astra/etc/astra.yml` (package default)

### PostgreSQL in production

In production (at Utah/CHPC), Astra uses PostgreSQL with a schema (e.g., `astra_043`). The `BaseModel.Meta.schema` attribute is set from the config. Astra wraps the PostgreSQL driver with retry logic (`ResilientDatabase`) to handle transient errors like deadlocks and connection drops.

### SQLite for development

For local development and testing, SQLite is simpler. WAL journaling mode, a 64 MB cache, and `synchronous=0` are set by default for performance.

## Core tables

### `Source`

Defined in `src/astra/models/source.py`. One row per astronomical source. Key columns:

- `pk` -- Auto-incrementing primary key.
- `sdss_id` -- The SDSS-V identifier (unique).
- `gaia_dr3_source_id`, `tic_v8_id`, etc. -- Cross-match identifiers.
- Astrometric, photometric, and targeting columns.

### `Spectrum`

Defined in `src/astra/models/spectrum.py`. A base table for all spectrum types. The `SpectrumMixin` adds computed properties like `.e_flux` and `.plot()`.

Concrete spectrum types (`ApogeeCoaddedSpectrumInApStar`, `BossVisitSpectrum`, `BossCombinedSpectrum`, etc.) extend `Spectrum` and add instrument-specific columns and pixel-data accessors.

### `PipelineOutputMixin`

Defined in `src/astra/models/pipeline.py`. The base class for every pipeline result table. Provides:

| Column | Type | Purpose |
|---|---|---|
| `task_pk` | AutoField | Primary key, auto-assigned on insert. |
| `source_pk` | ForeignKey(Source) | Link to the source. |
| `spectrum_pk` | ForeignKey(Spectrum) | Link to the input spectrum. |
| `v_astra` | Integer | Astra version encoded as an integer (e.g., `0.8.1` becomes `8001`). |
| `created` | DateTime | When the row was first created. |
| `modified` | DateTime | When the row was last updated. |
| `t_elapsed` | Float | Seconds spent on this spectrum (analysis time). |
| `t_overhead` | Float | Seconds of overhead (model loading, I/O). |
| `tag` | Text | Free-form label for organising runs. |

A generated column `v_astra_major_minor` (= `v_astra / 1000`) is used in the unique constraint `(spectrum_pk, v_astra_major_minor)`. This means only one result per spectrum per major.minor version is kept; re-running with the same version updates the existing row.

## Upsert behaviour

The `bulk_insert_or_replace_pipeline_results` function in `src/astra/__init__.py` performs batched upserts using PostgreSQL's `ON CONFLICT ... DO UPDATE` (or SQLite equivalent). When a conflict occurs on `(spectrum_pk, v_astra_major_minor)`, all fields except `task_pk` and `created` are overwritten. This means:

- The first run for a given version inserts new rows.
- Subsequent runs update existing rows in place.
- The original `created` timestamp is preserved; `modified` is updated.

## Custom fields

Astra defines several custom Peewee field types in `src/astra/fields.py`:

- **`BitField`** -- Stores multiple boolean flags in a single integer column. Individual flags are accessed as properties (e.g., `result.flag_low_snr`). Each flag is defined with `result_flags.flag(2**N, help_text="...")`.
- **`PixelArray`** -- A virtual field (not stored in the database) that lazily loads per-pixel arrays (flux, ivar, wavelength) from FITS files. Uses accessor classes to control how data is loaded.
- **`LogLambdaArrayAccessor`** -- Generates a wavelength array from `crval`, `cdelt`, and `naxis` parameters instead of reading from disk.

## Creating tables

In tests or local development, you can create tables manually:

```python
from astra.models.source import Source
from astra.models.spectrum import Spectrum
from astra.models.rocket import Rocket

for model in (Source, Spectrum, Rocket):
    model.create_table()
```

In production, table creation is managed by database migrations.

## Querying results

Peewee's query API applies directly:

```python
from astra.models.corv import Corv

# Get all results with teff > 10000
hot = Corv.select().where(Corv.teff > 10000)

# Join with Source to filter by sdss_id
from astra.models.source import Source
result = (
    Corv
    .select()
    .join(Source, on=(Corv.source_pk == Source.pk))
    .where(Source.sdss_id == 12345678)
    .first()
)
```