Batching tasks
There is usually some computational overhead in analysing a stellar spectrum. For example, large grids of models may need to be loaded into memory every time a spectrum is analysed. To minimise this overhead, Astra allows you to execute any task in batch mode.
Executing in batch mode
Batch mode means that you can supply information about more than one observation (e.g.,
many ApStarFile objects or many ApVisitFile objects) and Astra will analyse those
observations together. To execute a task in batch mode, supply each of the
ApStarFile parameters (or ApVisitFile, or similar) as a tuple instead of a string.
For example:
# Analyse a single star.
single_kwds = dict(
    release="dr16",
    apred="r12",
    telescope="apo25m",
    field="218-04",
    prefix="ap",
    obj="2M06440890-0610126",
    apstar="stars"
)

# Analyse two stars in batch mode.
multiple_kwds = dict(
    release=("dr16", "dr16"),
    apred=("r12", "r12"),
    telescope=("apo25m", "apo25m"),
    field=("218-04", "000+14"),
    prefix=("ap", "ap"),
    obj=("2M06440890-0610126", "2M16505794-2118004"),
    apstar=("stars", "stars")
)
Note that even if all of your observations share the same value for a keyword (e.g., release or apstar),
every ApStarFile parameter must be given as a tuple of the same length.
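Because every ApStarFile parameter must be a tuple of the same length, it can be convenient to describe each star separately and then transpose the result. The helper below is a minimal sketch of that idea; `to_batch_kwds` is a hypothetical name and is not part of Astra, and it assumes every star supplies the same set of keywords.

```python
def to_batch_kwds(per_star_kwds):
    """Transpose a list of per-star keyword dicts into a dict of equal-length tuples.

    Hypothetical helper for illustration; not part of Astra.
    """
    if not per_star_kwds:
        raise ValueError("at least one set of keywords is required")
    keys = list(per_star_kwds[0])
    for kwds in per_star_kwds:
        if set(kwds) != set(keys):
            raise ValueError("all stars must supply the same keywords")
    # One tuple per keyword, with one entry per star (in the order given).
    return {key: tuple(kwds[key] for kwds in per_star_kwds) for key in keys}


# Keywords shared by all stars are repeated for each star before transposing.
common = dict(release="dr16", apred="r12", telescope="apo25m", prefix="ap", apstar="stars")
stars = [
    dict(common, field="218-04", obj="2M06440890-0610126"),
    dict(common, field="000+14", obj="2M16505794-2118004"),
]
batch_kwds = to_batch_kwds(stars)
```

The resulting `batch_kwds` has the same shape as `multiple_kwds` above: shared values such as `release` are repeated so that every tuple has one entry per star.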
You should keep other (e.g., non-ApStarFile) parameters as they are.
For example, if you were to execute APOGEENet, the other parameters would remain the same
whether you were analysing 1 spectrum or 100 spectra:
from astra.contrib.apogeenet.tasks import EstimateStellarParametersGivenApStarFile

model_path = "APOGEENet.pt"

# Estimate stellar parameters for single star.
task = EstimateStellarParametersGivenApStarFile(
    model_path=model_path,
    **single_kwds
)

# Estimate stellar parameters for two stars.
task = EstimateStellarParametersGivenApStarFile(
    model_path=model_path,
    **multiple_kwds
)
Writing tasks for batch mode
Writing tasks to make use of batch mode is easy. The main things you have to do are to
make sure your task inherits from astra.tasks.BaseTask, and to use the
astra.tasks.BaseTask.get_batch_tasks() method to get a single task’s worth
of work.
This method returns an iterator that yields individual tasks, regardless of whether
the task is being run in batch mode or not.
If the task is not being executed in batch mode, then only one task (self) is
yielded.
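The behaviour described above can be illustrated with a toy class. This is only a sketch of the idea, not Astra's implementation: `ToyTask`, its `batch_param_names` attribute, and the rule used to detect batch mode (any batch parameter being a tuple) are assumptions made for illustration.

```python
class ToyTask:
    """Toy stand-in for a batchable task (hypothetical; not astra.tasks.BaseTask)."""

    # Names of the parameters that may be given as tuples in batch mode.
    batch_param_names = ("field", "name")

    def __init__(self, **params):
        self.params = params

    @property
    def is_batch_mode(self):
        # Assumption: the task is in batch mode if any batch parameter is a tuple.
        return any(isinstance(self.params[k], tuple) for k in self.batch_param_names)

    def get_batch_tasks(self):
        """Yield one single-mode task per observation."""
        if not self.is_batch_mode:
            # Not batched: the only unit of work is this task itself.
            yield self
            return
        size = len(self.params[self.batch_param_names[0]])
        for i in range(size):
            single = dict(self.params)
            for k in self.batch_param_names:
                single[k] = self.params[k][i]
            yield ToyTask(**single)


single = ToyTask(order=5, field="250+00", name="2M123456+123456")
batch = ToyTask(
    order=5,
    field=("250+00", "omegaCen"),
    name=("2M123456+123456", "2M003341+289732"),
)
single_tasks = list(single.get_batch_tasks())
batch_tasks = list(batch.get_batch_tasks())
```

Here `single_tasks` contains only `single` itself, while `batch_tasks` contains two new single-mode tasks, one per observation. Code that loops over `get_batch_tasks()` therefore does not need to know which mode it is running in.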
Below is an example where the task can be executed in single-mode or batch-mode, without any extra information being supplied by the user:
from astra.tasks.base import BaseTask
from astra.tasks.io import LocalTargetTask

# Note: ObservedSpectrum (used in requires() below) is assumed to be a task
# defined elsewhere that provides the observation; it is not imported here.

class MyTask(BaseTask):

    def run(self):
        # Do some expensive operation here that we otherwise would have
        # to do for many tasks (for example, load a model).
        model = self.read_model()

        for task in self.get_batch_tasks():
            # Read the observation for this individual single task.
            spectrum = task.read_observation()

            # Analyse the star using the model we loaded for all stars.
            result = task.estimate_stellar_parameters(model, spectrum)

            # Write the result of this task.
            task.write_output(result)

        return None

    def requires(self):
        # The model is always required.
        requirements = dict(model=LocalTargetTask(self.model_path))
        # The observation is only declared as a requirement in single mode.
        if not self.is_batch_mode:
            requirements.update(
                observation=ObservedSpectrum(**self.get_common_param_kwargs(ObservedSpectrum))
            )
        return requirements
You still have to write the methods that do the expensive work (e.g., read_model),
but otherwise little extra effort is needed to make a task run in batch mode.
The only potential _gotcha_ is what you need to do in requires().
Here you have to return different dependencies depending on whether the task is running in batch mode or not.
The reasons for this are deep and complex.
Scheduling batch tasks
Even if you do not explicitly batch a bunch of stars to be analysed together, Astra may schedule tasks together to run in batch mode to minimise overhead. Let’s go through an example to see how this works in practice.
Let’s assume that Observation represents an observed spectrum, and you need to supply a field and name to uniquely identify a single observation:

spectrum = Observation(field="250+00", name="2M000000+000000")

Let’s assume you have a task called MyAnalysisTask that runs on Observation objects, and you need to supply the parameters order and a to the MyAnalysisTask, as well as the parameters for the Observation to analyse:

task = MyAnalysisTask(a=3, order=5, field="250+00", name="2M000000+000000")

You need to analyse some stars, but you want to try different values of order to see the impact on the results. You create the following tasks and give them to the Astra scheduler:

individual_tasks = [
    MyAnalysisTask(a=3, order=5, field="250+00", name="2M123456+123456"),
    MyAnalysisTask(a=3, order=10, field="250+00", name="2M123456+123456"),
    MyAnalysisTask(a=3, order=5, field="omegaCen", name="2M003341+289732"),
    MyAnalysisTask(a=3, order=10, field="omegaCen", name="2M003341+289732"),
    MyAnalysisTask(a=-1, order=5, field="250+00", name="2M004562-1234872"),
]
You have submitted these as individual tasks, but Astra can see that MyAnalysisTask is batchable,
and that there are tasks where the non-Observation parameters are the same (i.e., these should be batched
together to minimise overhead).
In practice these tasks would be grouped together into just three batch tasks:
MyAnalysisTask(
    a=3,
    order=5,
    field=("250+00", "omegaCen"),
    name=("2M123456+123456", "2M003341+289732")
)

MyAnalysisTask(
    a=3,
    order=10,
    field=("250+00", "omegaCen"),
    name=("2M123456+123456", "2M003341+289732"),
)

MyAnalysisTask(a=-1, order=5, field="250+00", name="2M004562-1234872")
Even though changing a from 3 to -1 might not change how the analysis is performed,
Astra doesn’t know that.
It cannot tell that changing a is unimportant while changing something like model_path is important,
so whenever a non-Observation parameter differs, the tasks are run in separate batches.
All parameters are assumed to have an effect on the output, unless you specify them to be insignificant
when writing the task class.
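The grouping rule can be sketched in a few lines of plain Python. This is an illustration of the idea only, not Astra's scheduler: `group_into_batches` is a hypothetical function, tasks are represented as plain dictionaries of parameters, and parameter values are assumed to be hashable.

```python
from collections import defaultdict


def group_into_batches(tasks, observation_params, insignificant=()):
    """Group parameter dicts that differ only in their observation parameters.

    Hypothetical sketch for illustration; not Astra's scheduler.
    """
    groups = defaultdict(list)
    for params in tasks:
        # The batch key is every significant, non-observation parameter.
        key = tuple(sorted(
            (name, value) for name, value in params.items()
            if name not in observation_params and name not in insignificant
        ))
        groups[key].append(params)

    batches = []
    for key, members in groups.items():
        batch = dict(key)
        # Insignificant parameters are copied from the first member of the group.
        for name in insignificant:
            if name in members[0]:
                batch[name] = members[0][name]
        # Observation parameters become tuples, one entry per grouped task.
        for name in observation_params:
            batch[name] = tuple(member[name] for member in members)
        batches.append(batch)
    return batches


individual = [
    dict(a=3, order=5, field="250+00", name="2M123456+123456"),
    dict(a=3, order=10, field="250+00", name="2M123456+123456"),
    dict(a=3, order=5, field="omegaCen", name="2M003341+289732"),
    dict(a=3, order=10, field="omegaCen", name="2M003341+289732"),
    dict(a=-1, order=5, field="250+00", name="2M004562-1234872"),
]
batches = group_into_batches(individual, observation_params=("field", "name"))
merged = group_into_batches(
    individual, observation_params=("field", "name"), insignificant=("a",)
)
```

With every parameter treated as significant, the five tasks collapse into three batches, and the a=-1 task stays on its own. If a is declared insignificant, as in `merged`, that task joins the order=5 batch and only two batches remain. In this sketch a batch with a single member still carries tuples of length one.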