astra.contrib.thepayne.tasks

Module Contents

Classes

ThePayneMixin A base task class for Astra.
TrainThePayne Train a single-layer neural network given a pre-computed grid of synthetic spectra.
EstimateStellarLabels Use a pre-trained neural network to estimate stellar labels. This should be sub-classed to inherit properties from the type of spectra to be analysed.
ContinuumNormalizeGivenApStarFile Pseudo-continuum normalise ApStar spectra using a sum of sines and cosines.
ContinuumNormalizeGivenSDSS4ApStarFile Pseudo-continuum normalise SDSS-IV ApStar spectra using a sum of sines and cosines.
EstimateStellarLabelsGivenApStarFile Estimate stellar labels given a single-layer neural network and an ApStar file.
EstimateStellarLabelsGivenSDSS4ApStarFile Estimate stellar labels given a single-layer neural network and a SDSS-IV ApStar file.
astra.contrib.thepayne.tasks.SlurmMixin
class astra.contrib.thepayne.tasks.ThePayneMixin(*args, **kwargs)

A base task class for Astra.

task_namespace = ThePayne
n_steps
n_neurons
weight_decay
learning_rate
training_set_path
use_slurm
slurm_nodes
slurm_ppn
slurm_walltime
slurm_alloc
slurm_partition
slurm_mem
slurm_gres
astra_version_major
astra_version_minor
astra_version_micro
astra_version_dev
strict_output_checking
is_batch_mode

A boolean property indicating whether the task is in batch mode or not.

output_base_dir

Base directory for storing task outputs.

_event_callbacks
priority = 0
disabled = False
resources
worker_timeout
max_batch_size
batchable

True if this instance can be run as part of a batch. By default, True if it has any batched parameters

retry_count

Override this positive integer to set a different retry_count at the task level. Check scheduler-config.

disable_hard_timeout

Override this positive integer to have different disable_hard_timeout at task level. Check scheduler-config

disable_window

Override this positive integer to have different disable_window at task level. Check scheduler-config

disable_window_seconds
owner_email

Override this to send additional error emails to the task owner, in addition to the one defined in the global configuration. This should return a string or a list of strings, e.g. ‘test@example.com’ or [‘test1@example.com’, ‘test2@example.com’].

use_cmdline_section

Property used by core config such as --workers etc. These will be exposed without the class as prefix.

accepts_messages

For configuring which scheduler messages can be received. When falsy, this task does not accept any messages. When True, all messages are accepted.

task_module

Returns what Python module to import to get access to this class.

_visible_in_registry = True
__not_user_specified = __not_user_specified
_namespace_at_class_time
task_family

DEPRECATED since after 2.4.0. See get_task_family() instead. Hopefully there will be less meta magic in Luigi.

Convenience method since a property on the metaclass isn’t directly accessible through the class instances.

param_args
_warn_on_wrong_param_types(self, strict=False)
__repr__(self)

Build a task representation like MyTask(hash: param1=1.5, param2='5')

get_common_param_kwargs(self, klass, include_significant=True)
get_common_param_names(self, klass, include_significant=True)
get_hashed_params(self, only_significant=True, only_public=False)
to_str_params(self, only_significant=True, only_public=False)

Convert all parameters to a str->str hash.

classmethod from_str_params(cls, params_str)

Creates an instance from a str->str hash.

Parameters:params_str – dict of param name -> value as string.

get_batch_task_kwds(self, include_non_batch_keywords=True)
get_batch_tasks(self)

A generator that yields task(s) that are to be run. Works in single or batch mode.

get_batch_size(self)

Get the number of batched tasks.

get_input(self, key)

Return a single input from the task, assuming the inputs are a dictionary. This can be performed by using task.input()[key], but when there are many inputs (e.g., in batch mode), this can be unnecessarily slow.

Parameters:key – The key of the requirements dictionary to return.
requires(self)

The requirements of this task.

output(self)

The outputs of this task.

query_state(self, full_output=False)

Query the database for this task and return the SQLAlchemy ORM Query.

Parameters:full_output – [optional] Optionally return a three-length tuple containing the ORM query, database model, and keywords to filter by.
get_or_create_state(self, defaults=None)

Get (or create) an entry in the database for this task.

Note that this will only create an entry for the task, and not for the parameters of the task. This is useful when creating many task entries, with the intent you will create the parameter entries later, and you want to minimise overhead. If you want to create an entry for this task and the parameters, use create_state().

This function returns a two-length tuple containing the SQLAlchemy instance, and a boolean flag indicating whether the entry was created (True) or just retrieved (False).

Parameters:defaults – [optional] A dictionary of default key, value pairs to provide if the entry needs to be created in the database.
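
The get-or-create contract described above (a two-length tuple of the entry and a created flag) can be sketched without a database. This is only an illustration: the in-memory dictionary, the `task_id` key and the function name are assumptions made for the example, not Astra's SQLAlchemy implementation.

```python
from typing import Any, Dict, Optional, Tuple

# Hypothetical in-memory stand-in for the task state table.
_STATE_TABLE: Dict[str, Dict[str, Any]] = {}


def get_or_create_state_sketch(
    task_id: str, defaults: Optional[Dict[str, Any]] = None
) -> Tuple[Dict[str, Any], bool]:
    """Return (entry, created): created is True only when a new entry was made."""
    if task_id in _STATE_TABLE:
        # Existing entry: defaults are ignored and the entry is just retrieved.
        return _STATE_TABLE[task_id], False
    entry = {"task_id": task_id, **(defaults or {})}
    _STATE_TABLE[task_id] = entry
    return entry, True


first, created_first = get_or_create_state_sketch("task-1", {"status": "pending"})
second, created_second = get_or_create_state_sketch("task-1", {"status": "ignored"})
```

The second call returns the entry stored by the first call, and the defaults passed to it are not applied.
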
create_state(self)

Create an entry in the database for this task, and its parameters.

delete_state(self, cascade=False)

Delete this task entry in the database.

Parameters:cascade – [optional] Cascade this to any tasks in this batch.
update_state(self, state, cascade=False)

Update the task entry in the database with the given state dictionary.

Parameters:cascade – [optional] Cascade this to any tasks in this batch.
trigger_event_start(self)

Trigger an event signalling that the task has started.

trigger_event_succeeded(self)

Trigger an event signalling that the task has succeeded.

trigger_event_failed(self)

Trigger an event signalling that the task has failed.

trigger_event_processing_time(self, duration, cascade=False)

Trigger the event that signals the processing time of the event.

Parameters:
  • duration – The time taken for this event.
  • cascade – [optional] Also trigger the task succeeded event (default: False).
_owner_list(self)

Turns the owner_email property into a list. This should not be overridden.
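
As a rough sketch of that normalisation, the function below accepts the forms that owner_email may return (nothing, a string, or a list of strings) and always produces a list. Splitting a string on commas is an assumption of this sketch, and the function name is hypothetical.

```python
from typing import Iterable, List, Optional, Union


def owner_list_sketch(owner_email: Optional[Union[str, Iterable[str]]]) -> List[str]:
    """Normalise an owner_email value into a list of addresses."""
    if owner_email is None:
        return []
    if isinstance(owner_email, str):
        # Assumption: several addresses may be given in one comma-separated string.
        return owner_email.split(",")
    return list(owner_email)
```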

classmethod event_handler(cls, event)

Decorator for adding event handlers.

trigger_event(self, event, *args, **kwargs)

Trigger that calls all of the specified events associated with this class.

classmethod get_task_namespace(cls)

The task namespace for the given class.

Note: You normally don’t want to override this.

classmethod get_task_family(cls)

The task family for the given class.

If task_namespace is not set, then it’s simply the name of the class. Otherwise, <task_namespace>. is prefixed to the class name.

Note: You normally don’t want to override this.
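
The naming rule above can be written as a small stand-alone function. The function name and its arguments are illustrative only; on a real task the value comes from calling get_task_family() on the class.

```python
def task_family_sketch(class_name: str, task_namespace: str = "") -> str:
    """Prefix the class name with '<task_namespace>.' when a namespace is set."""
    if not task_namespace:
        return class_name
    return f"{task_namespace}.{class_name}"


family = task_family_sketch("TrainThePayne", "ThePayne")
```

With the task_namespace shown for the classes in this module, `family` is `"ThePayne.TrainThePayne"`.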

classmethod get_params(cls)

Returns all of the Parameters for this Task.

classmethod batch_param_names(cls)
classmethod get_param_names(cls, include_significant=False)
classmethod get_param_values(cls, params, args, kwargs)

Get the values of the parameters from the args and kwargs.

Parameters:
  • params – list of (param_name, Parameter).
  • args – positional arguments
  • kwargs – keyword arguments.
Returns:

list of (name, value) tuples, one for each parameter.
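
A simplified sketch of this resolution is shown below. It replaces Parameter objects with plain default values and raises TypeError where the real method would raise its own exceptions; both simplifications, and all names in the block, are assumptions made for illustration.

```python
from typing import Any, Dict, List, Sequence, Tuple

REQUIRED = object()  # hypothetical marker for a parameter without a default


def resolve_param_values(
    params: Sequence[Tuple[str, Any]], args: Sequence[Any], kwargs: Dict[str, Any]
) -> List[Tuple[str, Any]]:
    """Map positional and keyword arguments onto (name, default) pairs."""
    names = [name for name, _ in params]
    if len(args) > len(params):
        raise TypeError("too many positional arguments")
    # Positional arguments fill parameters in declaration order.
    values = {name: value for (name, _), value in zip(params, args)}
    for name, value in kwargs.items():
        if name not in names:
            raise TypeError(f"unknown parameter: {name}")
        if name in values:
            raise TypeError(f"duplicate parameter: {name}")
        values[name] = value
    # Anything still unset falls back to its default, if it has one.
    for name, default in params:
        if name not in values:
            if default is REQUIRED:
                raise TypeError(f"missing parameter: {name}")
            values[name] = default
    return [(name, values[name]) for name in names]


params = [("training_set_path", REQUIRED), ("n_steps", 100000), ("n_neurons", 300)]
resolved = resolve_param_values(params, ("grid.pkl",), {"n_neurons": 50})
```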

initialized(self)

Returns True if the Task is initialized and False otherwise.

_get_param_visibilities(self)
clone(self, cls=None, **kwargs)

Creates a new instance from an existing instance where some of the args have changed.

There are at least two scenarios where this is useful (see test/clone_test.py):

  • remove a lot of boilerplate when you have recursive dependencies and lots of args
  • there’s task inheritance and some logic is on the base class
Parameters:
  • cls
  • kwargs
Returns:

__hash__(self)

Return hash(self).

__eq__(self, other)

Return self==value.

complete(self)

If the task has any outputs, return True if all outputs exist. Otherwise, return False.

However, you may freely override this method with custom logic.
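
The default rule can be sketched as follows. `FakeTarget` is a hypothetical stand-in for a target object with an exists() method, and the sketch returns False when there are no outputs at all, which is how the description above is read here.

```python
from typing import Iterable


class FakeTarget:
    """Hypothetical stand-in for an output target."""

    def __init__(self, exists: bool) -> None:
        self._exists = exists

    def exists(self) -> bool:
        return self._exists


def is_complete(outputs: Iterable[FakeTarget]) -> bool:
    """True only if there is at least one output and every output exists."""
    outputs = list(outputs)
    if not outputs:
        return False
    return all(output.exists() for output in outputs)
```

A task with one missing output is therefore reported as incomplete even if its other outputs exist.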

classmethod bulk_complete(cls, parameter_tuples)

Returns those of parameter_tuples for which this Task is complete.

Override (with an efficient implementation) for efficient scheduling with range tools. Keep the logic consistent with that of complete().

_requires(self)

Override in “template” tasks which themselves are supposed to be subclassed and thus have their requires() overridden (name preserved to provide consistent end-user experience), yet need to introduce (non-input) dependencies.

Must return an iterable which among others contains the _requires() of the superclass.

process_resources(self)

Override in “template” tasks which provide common resource functionality but allow subclasses to specify additional resources while preserving the name for consistent end-user experience.

input(self)

Returns the outputs of the Tasks returned by requires()

See Task.input

Returns:a list of Target objects which are specified as outputs of all required Tasks.
deps(self)

Internal method used by the scheduler.

Returns the flattened list of requires.

run(self)

The task run method, to be overridden in a subclass.

See Task.run

on_failure(self, exception)

Override for custom error handling.

This method gets called if an exception is raised in run(). The returned value of this method is json encoded and sent to the scheduler as the expl argument. Its string representation will be used as the body of the error email sent out if any.

Default behavior is to return a string representation of the stack trace.
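
The sketch below illustrates the default described above using only the standard library: an exception raised by a run callable is turned into a stack-trace string. The function names are hypothetical, and the exact text of the string returned by the real method may differ.

```python
import traceback
from typing import Callable, Optional


def on_failure_sketch(exception: BaseException) -> str:
    """Return the stack trace of the exception as a string."""
    return "".join(
        traceback.format_exception(type(exception), exception, exception.__traceback__)
    )


def run_with_handler(run: Callable[[], None]) -> Optional[str]:
    """Call run() and return the failure explanation, or None on success."""
    try:
        run()
    except Exception as exc:
        return on_failure_sketch(exc)
    return None


explanation = run_with_handler(lambda: 1 / 0)
```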

on_success(self)

Override for doing custom completion handling for a larger class of tasks.

This method gets called when run() completes without raising any exceptions.

The returned value is json encoded and sent to the scheduler as the expl argument.

The default behavior is to send a None value.

no_unpicklable_properties(self)

Remove unpicklable properties before dumping the task and restore them afterwards.

This method could be called in subtask’s dump method, to ensure unpicklable properties won’t break dump.

This method is a context manager; call it in a with statement around the code that pickles the task.
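
The pattern can be sketched with a stand-in class. Everything in the block is hypothetical (the class, the `unpicklable` tuple, the `callback` attribute), and the sketch pickles the instance state rather than the task object itself; it only shows how a context manager can set attributes aside during a dump and put them back afterwards.

```python
import pickle
from contextlib import contextmanager


class SketchTask:
    # Hypothetical names of attributes that cannot be pickled.
    unpicklable = ("callback",)

    def __init__(self, n_steps):
        self.n_steps = n_steps
        self.callback = lambda: None  # a lambda cannot be pickled

    @contextmanager
    def no_unpicklable_properties(self):
        # Set the unpicklable attributes aside for the duration of the block.
        saved = {
            name: self.__dict__.pop(name)
            for name in self.unpicklable
            if name in self.__dict__
        }
        try:
            yield
        finally:
            # Restore them, even if pickling raised an exception.
            self.__dict__.update(saved)

    def dump(self):
        with self.no_unpicklable_properties():
            return pickle.dumps(vars(self))


task = SketchTask(n_steps=10)
payload = task.dump()
```

After dump() returns, `task.callback` is available again and `payload` contains only the picklable state.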

class astra.contrib.thepayne.tasks.TrainThePayne(*args, **kwargs)

Train a single-layer neural network given a pre-computed grid of synthetic spectra.

Parameters:
  • training_set_path

    The path where the training set spectra and labels are stored. This should be a binary pickle file that contains a dictionary with the following keys:

    • wavelength: an array of shape (P, ) where P is the number of pixels
    • spectra: an array of shape (N, P) where N is the number of spectra and P is the number of pixels
    • labels: an array of shape (L, P) where L is the number of labels and P is the number of pixels
    • label_names: a tuple of length L that contains the names of the labels
  • n_steps – (optional) The number of steps to train the network for (default: 100000).
  • n_neurons – (optional) The number of neurons to use in the hidden layer (default: 300).
  • weight_decay – (optional) The weight decay to use during training (default: 0).
  • learning_rate – (optional) The learning rate to use during training (default: 0.001).
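
As a minimal sketch of the training set file described above, the block below writes a toy dictionary with the four documented keys to a pickle file, reads it back, and checks the keys and dimensions. Nested lists stand in for arrays so the sketch needs no third-party packages, the shapes follow the list above as written, and the helper names and label names are made up for the example.

```python
import os
import pickle
import tempfile


def shape(nested):
    """Return the shape of a rectangular nested list as a tuple."""
    dims = []
    while isinstance(nested, (list, tuple)):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


def check_training_set(contents):
    """Check the documented keys and return (N, P, L)."""
    required = ("wavelength", "spectra", "labels", "label_names")
    missing = [key for key in required if key not in contents]
    if missing:
        raise KeyError(f"missing keys: {missing}")
    (n_pixels,) = shape(contents["wavelength"])
    n_spectra, n_pixels_spectra = shape(contents["spectra"])
    n_labels = len(contents["label_names"])
    if n_pixels_spectra != n_pixels:
        raise ValueError("spectra and wavelength disagree on the number of pixels")
    if shape(contents["labels"])[0] != n_labels:
        raise ValueError("labels and label_names disagree on the number of labels")
    return n_spectra, n_pixels, n_labels


# Toy values only: N = 3 spectra, P = 4 pixels, L = 2 labels.
training_set = {
    "wavelength": [15100.0, 15100.5, 15101.0, 15101.5],  # shape (P, )
    "spectra": [[1.0] * 4 for _ in range(3)],             # shape (N, P)
    "labels": [[0.0] * 4 for _ in range(2)],              # shape (L, P), as documented
    "label_names": ("teff", "logg"),                      # hypothetical label names
}

with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, "training_set.pkl")
    with open(path, "wb") as fp:
        pickle.dump(training_set, fp)
    with open(path, "rb") as fp:
        loaded = pickle.load(fp)

summary = check_training_set(loaded)
```
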
task_namespace = ThePayne
n_steps
n_neurons
weight_decay
learning_rate
training_set_path
use_slurm
slurm_nodes
slurm_ppn
slurm_walltime
slurm_alloc
slurm_partition
slurm_mem
slurm_gres
astra_version_major
astra_version_minor
astra_version_micro
astra_version_dev
strict_output_checking
is_batch_mode

A boolean property indicating whether the task is in batch mode or not.

output_base_dir

Base directory for storing task outputs.

_event_callbacks
priority = 0
disabled = False
resources
worker_timeout
max_batch_size
batchable

True if this instance can be run as part of a batch. By default, True if it has any batched parameters

retry_count

Override this positive integer to set a different retry_count at the task level. Check scheduler-config.

disable_hard_timeout

Override this positive integer to have different disable_hard_timeout at task level. Check scheduler-config

disable_window

Override this positive integer to have different disable_window at task level. Check scheduler-config

disable_window_seconds
owner_email

Override this to send additional error emails to the task owner, in addition to the one defined in the global configuration. This should return a string or a list of strings, e.g. ‘test@example.com’ or [‘test1@example.com’, ‘test2@example.com’].

use_cmdline_section

Property used by core config such as --workers etc. These will be exposed without the class as prefix.

accepts_messages

For configuring which scheduler messages can be received. When falsy, this task does not accept any messages. When True, all messages are accepted.

task_module

Returns what Python module to import to get access to this class.

_visible_in_registry = True
__not_user_specified = __not_user_specified
_namespace_at_class_time
task_family

DEPRECATED since after 2.4.0. See get_task_family() instead. Hopefully there will be less meta magic in Luigi.

Convenience method since a property on the metaclass isn’t directly accessible through the class instances.

param_args
requires(self)

The requirements of this task.

run(self)

Execute this task.

output(self)

The output of this task.

_warn_on_wrong_param_types(self, strict=False)
__repr__(self)

Build a task representation like MyTask(hash: param1=1.5, param2='5')

get_common_param_kwargs(self, klass, include_significant=True)
get_common_param_names(self, klass, include_significant=True)
get_hashed_params(self, only_significant=True, only_public=False)
to_str_params(self, only_significant=True, only_public=False)

Convert all parameters to a str->str hash.

classmethod from_str_params(cls, params_str)

Creates an instance from a str->str hash.

Parameters:params_str – dict of param name -> value as string.

get_batch_task_kwds(self, include_non_batch_keywords=True)
get_batch_tasks(self)

A generator that yields task(s) that are to be run. Works in single or batch mode.

get_batch_size(self)

Get the number of batched tasks.

get_input(self, key)

Return a single input from the task, assuming the inputs are a dictionary. This can be performed by using task.input()[key], but when there are many inputs (e.g., in batch mode), this can be unnecessarily slow.

Parameters:key – The key of the requirements dictionary to return.
query_state(self, full_output=False)

Query the database for this task and return the SQLAlchemy ORM Query.

Parameters:full_output – [optional] Optionally return a three-length tuple containing the ORM query, database model, and keywords to filter by.
get_or_create_state(self, defaults=None)

Get (or create) an entry in the database for this task.

Note that this will only create an entry for the task, and not for the parameters of the task. This is useful when creating many task entries, with the intent you will create the parameter entries later, and you want to minimise overhead. If you want to create an entry for this task and the parameters, use create_state().

This function returns a two-length tuple containing the SQLAlchemy instance, and a boolean flag indicating whether the entry was created (True) or just retrieved (False).

Parameters:defaults – [optional] A dictionary of default key, value pairs to provide if the entry needs to be created in the database.
create_state(self)

Create an entry in the database for this task, and its parameters.

delete_state(self, cascade=False)

Delete this task entry in the database.

Parameters:cascade – [optional] Cascade this to any tasks in this batch.
update_state(self, state, cascade=False)

Update the task entry in the database with the given state dictionary.

Parameters:cascade – [optional] Cascade this to any tasks in this batch.
trigger_event_start(self)

Trigger an event signalling that the task has started.

trigger_event_succeeded(self)

Trigger an event signalling that the task has succeeded.

trigger_event_failed(self)

Trigger an event signalling that the task has failed.

trigger_event_processing_time(self, duration, cascade=False)

Trigger the event that signals the processing time of the event.

Parameters:
  • duration – The time taken for this event.
  • cascade – [optional] Also trigger the task succeeded event (default: False).
_owner_list(self)

Turns the owner_email property into a list. This should not be overridden.

classmethod event_handler(cls, event)

Decorator for adding event handlers.

trigger_event(self, event, *args, **kwargs)

Trigger that calls all of the specified events associated with this class.

classmethod get_task_namespace(cls)

The task namespace for the given class.

Note: You normally don’t want to override this.

classmethod get_task_family(cls)

The task family for the given class.

If task_namespace is not set, then it’s simply the name of the class. Otherwise, <task_namespace>. is prefixed to the class name.

Note: You normally don’t want to override this.

classmethod get_params(cls)

Returns all of the Parameters for this Task.

classmethod batch_param_names(cls)
classmethod get_param_names(cls, include_significant=False)
classmethod get_param_values(cls, params, args, kwargs)

Get the values of the parameters from the args and kwargs.

Parameters:
  • params – list of (param_name, Parameter).
  • args – positional arguments
  • kwargs – keyword arguments.
Returns:

list of (name, value) tuples, one for each parameter.

initialized(self)

Returns True if the Task is initialized and False otherwise.

_get_param_visibilities(self)
clone(self, cls=None, **kwargs)

Creates a new instance from an existing instance where some of the args have changed.

There are at least two scenarios where this is useful (see test/clone_test.py):

  • remove a lot of boilerplate when you have recursive dependencies and lots of args
  • there’s task inheritance and some logic is on the base class
Parameters:
  • cls
  • kwargs
Returns:

__hash__(self)

Return hash(self).

__eq__(self, other)

Return self==value.

complete(self)

If the task has any outputs, return True if all outputs exist. Otherwise, return False.

However, you may freely override this method with custom logic.

classmethod bulk_complete(cls, parameter_tuples)

Returns those of parameter_tuples for which this Task is complete.

Override (with an efficient implementation) for efficient scheduling with range tools. Keep the logic consistent with that of complete().

_requires(self)

Override in “template” tasks which themselves are supposed to be subclassed and thus have their requires() overridden (name preserved to provide consistent end-user experience), yet need to introduce (non-input) dependencies.

Must return an iterable which among others contains the _requires() of the superclass.

process_resources(self)

Override in “template” tasks which provide common resource functionality but allow subclasses to specify additional resources while preserving the name for consistent end-user experience.

input(self)

Returns the outputs of the Tasks returned by requires()

See Task.input

Returns:a list of Target objects which are specified as outputs of all required Tasks.
deps(self)

Internal method used by the scheduler.

Returns the flattened list of requires.

on_failure(self, exception)

Override for custom error handling.

This method gets called if an exception is raised in run(). The returned value of this method is json encoded and sent to the scheduler as the expl argument. Its string representation will be used as the body of the error email sent out if any.

Default behavior is to return a string representation of the stack trace.

on_success(self)

Override for doing custom completion handling for a larger class of tasks.

This method gets called when run() completes without raising any exceptions.

The returned value is json encoded and sent to the scheduler as the expl argument.

The default behavior is to send a None value.

no_unpicklable_properties(self)

Remove unpicklable properties before dumping the task and restore them afterwards.

This method could be called in subtask’s dump method, to ensure unpicklable properties won’t break dump.

This method is a context manager; call it in a with statement around the code that pickles the task.

class astra.contrib.thepayne.tasks.EstimateStellarLabels(*args, **kwargs)

Use a pre-trained neural network to estimate stellar labels. This should be sub-classed to inherit properties from the type of spectra to be analysed.

Parameters:
  • training_set_path

    The path where the training set spectra and labels are stored. This should be a binary pickle file that contains a dictionary with the following keys:

    • wavelength: an array of shape (P, ) where P is the number of pixels
    • spectra: an array of shape (N, P) where N is the number of spectra and P is the number of pixels
    • labels: an array of shape (L, P) where L is the number of labels and P is the number of pixels
    • label_names: a tuple of length L that contains the names of the labels
  • n_steps – (optional) The number of steps to train the network for (default: 100000).
  • n_neurons – (optional) The number of neurons to use in the hidden layer (default: 300).
  • weight_decay – (optional) The weight decay to use during training (default: 0).
  • learning_rate – (optional) The learning rate to use during training (default: 0.001).
max_batch_size = 10000
task_namespace = ThePayne
n_steps
n_neurons
weight_decay
learning_rate
training_set_path
use_slurm
slurm_nodes
slurm_ppn
slurm_walltime
slurm_alloc
slurm_partition
slurm_mem
slurm_gres
astra_version_major
astra_version_minor
astra_version_micro
astra_version_dev
strict_output_checking
is_batch_mode

A boolean property indicating whether the task is in batch mode or not.

output_base_dir

Base directory for storing task outputs.

_event_callbacks
priority = 0
disabled = False
resources
worker_timeout
batchable

True if this instance can be run as part of a batch. By default, True if it has any batched parameters

retry_count

Override this positive integer to set a different retry_count at the task level. Check scheduler-config.

disable_hard_timeout

Override this positive integer to have different disable_hard_timeout at task level. Check scheduler-config

disable_window

Override this positive integer to have different disable_window at task level. Check scheduler-config

disable_window_seconds
owner_email

Override this to send additional error emails to the task owner, in addition to the one defined in the global configuration. This should return a string or a list of strings, e.g. ‘test@example.com’ or [‘test1@example.com’, ‘test2@example.com’].

use_cmdline_section

Property used by core config such as --workers etc. These will be exposed without the class as prefix.

accepts_messages

For configuring which scheduler messages can be received. When falsy, this task does not accept any messages. When True, all messages are accepted.

task_module

Returns what Python module to import to get access to this class.

_visible_in_registry = True
__not_user_specified = __not_user_specified
_namespace_at_class_time
task_family

DEPRECATED since after 2.4.0. See get_task_family() instead. Hopefully there will be less meta magic in Luigi.

Convenience method since a property on the metaclass isn’t directly accessible through the class instances.

param_args
prepare_observation(self)

Prepare the observations for analysis.

run(self)

Execute this task.

output(self)

The output of this task.

_warn_on_wrong_param_types(self, strict=False)
__repr__(self)

Build a task representation like MyTask(hash: param1=1.5, param2='5')

get_common_param_kwargs(self, klass, include_significant=True)
get_common_param_names(self, klass, include_significant=True)
get_hashed_params(self, only_significant=True, only_public=False)
to_str_params(self, only_significant=True, only_public=False)

Convert all parameters to a str->str hash.

classmethod from_str_params(cls, params_str)

Creates an instance from a str->str hash.

Parameters:params_str – dict of param name -> value as string.

get_batch_task_kwds(self, include_non_batch_keywords=True)
get_batch_tasks(self)

A generator that yields task(s) that are to be run. Works in single or batch mode.

get_batch_size(self)

Get the number of batched tasks.

get_input(self, key)

Return a single input from the task, assuming the inputs are a dictionary. This can be performed by using task.input()[key], but when there are many inputs (e.g., in batch mode), this can be unnecessarily slow.

Parameters:key – The key of the requirements dictionary to return.
requires(self)

The requirements of this task.

query_state(self, full_output=False)

Query the database for this task and return the SQLAlchemy ORM Query.

Parameters:full_output – [optional] Optionally return a three-length tuple containing the ORM query, database model, and keywords to filter by.
get_or_create_state(self, defaults=None)

Get (or create) an entry in the database for this task.

Note that this will only create an entry for the task, and not for the parameters of the task. This is useful when creating many task entries, with the intent you will create the parameter entries later, and you want to minimise overhead. If you want to create an entry for this task and the parameters, use create_state().

This function returns a two-length tuple containing the SQLAlchemy instance, and a boolean flag indicating whether the entry was created (True) or just retrieved (False).

Parameters:defaults – [optional] A dictionary of default key, value pairs to provide if the entry needs to be created in the database.
create_state(self)

Create an entry in the database for this task, and its parameters.

delete_state(self, cascade=False)

Delete this task entry in the database.

Parameters:cascade – [optional] Cascade this to any tasks in this batch.
update_state(self, state, cascade=False)

Update the task entry in the database with the given state dictionary.

Parameters:cascade – [optional] Cascade this to any tasks in this batch.
trigger_event_start(self)

Trigger an event signalling that the task has started.

trigger_event_succeeded(self)

Trigger an event signalling that the task has succeeded.

trigger_event_failed(self)

Trigger an event signalling that the task has failed.

trigger_event_processing_time(self, duration, cascade=False)

Trigger the event that signals the processing time of the event.

Parameters:
  • duration – The time taken for this event.
  • cascade – [optional] Also trigger the task succeeded event (default: False).
_owner_list(self)

Turns the owner_email property into a list. This should not be overridden.

classmethod event_handler(cls, event)

Decorator for adding event handlers.

trigger_event(self, event, *args, **kwargs)

Trigger that calls all of the specified events associated with this class.

classmethod get_task_namespace(cls)

The task namespace for the given class.

Note: You normally don’t want to override this.

classmethod get_task_family(cls)

The task family for the given class.

If task_namespace is not set, then it’s simply the name of the class. Otherwise, <task_namespace>. is prefixed to the class name.

Note: You normally don’t want to override this.

classmethod get_params(cls)

Returns all of the Parameters for this Task.

classmethod batch_param_names(cls)
classmethod get_param_names(cls, include_significant=False)
classmethod get_param_values(cls, params, args, kwargs)

Get the values of the parameters from the args and kwargs.

Parameters:
  • params – list of (param_name, Parameter).
  • args – positional arguments
  • kwargs – keyword arguments.
Returns:

list of (name, value) tuples, one for each parameter.

initialized(self)

Returns True if the Task is initialized and False otherwise.

_get_param_visibilities(self)
clone(self, cls=None, **kwargs)

Creates a new instance from an existing instance where some of the args have changed.

There are at least two scenarios where this is useful (see test/clone_test.py):

  • remove a lot of boilerplate when you have recursive dependencies and lots of args
  • there’s task inheritance and some logic is on the base class
Parameters:
  • cls
  • kwargs
Returns:

__hash__(self)

Return hash(self).

__eq__(self, other)

Return self==value.

complete(self)

If the task has any outputs, return True if all outputs exist. Otherwise, return False.

However, you may freely override this method with custom logic.

classmethod bulk_complete(cls, parameter_tuples)

Returns those of parameter_tuples for which this Task is complete.

Override (with an efficient implementation) for efficient scheduling with range tools. Keep the logic consistent with that of complete().

_requires(self)

Override in “template” tasks which themselves are supposed to be subclassed and thus have their requires() overridden (name preserved to provide consistent end-user experience), yet need to introduce (non-input) dependencies.

Must return an iterable which among others contains the _requires() of the superclass.

process_resources(self)

Override in “template” tasks which provide common resource functionality but allow subclasses to specify additional resources while preserving the name for consistent end-user experience.

input(self)

Returns the outputs of the Tasks returned by requires()

See Task.input

Returns:a list of Target objects which are specified as outputs of all required Tasks.
deps(self)

Internal method used by the scheduler.

Returns the flattened list of requires.

on_failure(self, exception)

Override for custom error handling.

This method gets called if an exception is raised in run(). The returned value of this method is json encoded and sent to the scheduler as the expl argument. Its string representation will be used as the body of the error email sent out if any.

Default behavior is to return a string representation of the stack trace.

on_success(self)

Override for doing custom completion handling for a larger class of tasks.

This method gets called when run() completes without raising any exceptions.

The returned value is json encoded and sent to the scheduler as the expl argument.

The default behavior is to send a None value.

no_unpicklable_properties(self)

Remove unpicklable properties before dumping the task and restore them afterwards.

This method could be called in subtask’s dump method, to ensure unpicklable properties won’t break dump.

This method is a context manager; call it in a with statement around the code that pickles the task.

class astra.contrib.thepayne.tasks.ContinuumNormalizeGivenApStarFile(*args, **kwargs)

Pseudo-continuum normalise ApStar spectra using a sum of sines and cosines.
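
To illustrate what a "sum of sines and cosines" continuum model can look like, the sketch below builds a design matrix with a constant term plus one sine and one cosine column per order, and evaluates a continuum from a set of coefficients. The exact basis, the meaning given to L (a length scale) and to the order, and all function names are assumptions of this sketch rather than the task's confirmed implementation; in practice the coefficients would presumably be fitted to continuum pixels before dividing the flux by the model.

```python
import math
from typing import List, Sequence


def continuum_design_matrix(
    wavelengths: Sequence[float], L: float, order: int
) -> List[List[float]]:
    """One row per pixel: [1, sin(kx), cos(kx), ...] for k = 1..order."""
    scale = 2.0 * math.pi / L
    rows = []
    for x in wavelengths:
        row = [1.0]
        for k in range(1, order + 1):
            row.append(math.sin(k * scale * x))
            row.append(math.cos(k * scale * x))
        rows.append(row)
    return rows


def evaluate_continuum(
    wavelengths: Sequence[float], coefficients: Sequence[float], L: float, order: int
) -> List[float]:
    """Evaluate the continuum model at each wavelength."""
    matrix = continuum_design_matrix(wavelengths, L, order)
    return [sum(c * b for c, b in zip(coefficients, row)) for row in matrix]


def normalise(flux: Sequence[float], continuum: Sequence[float]) -> List[float]:
    """Pseudo-continuum normalised flux is the flux divided by the continuum."""
    return [f / c for f, c in zip(flux, continuum)]


matrix = continuum_design_matrix([0.0, 350.0], L=1400.0, order=2)
flat = evaluate_continuum([0.0, 350.0], [2.0, 0.0, 0.0, 0.0, 0.0], L=1400.0, order=2)
normalised = normalise([1.0, 3.0], flat)
```

Each row has 2 × order + 1 entries, so a model of order 2 has five coefficients per fitted region.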

task_namespace = ContinuumNormalize
L
continuum_order
continuum_regions_path
spectrum_kwds
astra_version_major
astra_version_minor
astra_version_micro
astra_version_dev
strict_output_checking
is_batch_mode

A boolean property indicating whether the task is in batch mode or not.

output_base_dir

Base directory for storing task outputs.

_event_callbacks
priority = 0
disabled = False
resources
worker_timeout
max_batch_size
batchable

True if this instance can be run as part of a batch. By default, True if it has any batched parameters

retry_count

Override this positive integer to set a different retry_count at the task level. Check scheduler-config.

disable_hard_timeout

Override this positive integer to have different disable_hard_timeout at task level. Check scheduler-config

disable_window

Override this positive integer to have different disable_window at task level. Check scheduler-config

disable_window_seconds
owner_email

Override this to send additional error emails to the task owner, in addition to the one defined in the global configuration. This should return a string or a list of strings, e.g. ‘test@example.com’ or [‘test1@example.com’, ‘test2@example.com’].

use_cmdline_section

Property used by core config such as --workers etc. These will be exposed without the class as prefix.

accepts_messages

For configuring which scheduler messages can be received. When falsy, this task does not accept any messages. When True, all messages are accepted.

task_module

Returns what Python module to import to get access to this class.

_visible_in_registry = True
__not_user_specified = __not_user_specified
_namespace_at_class_time
task_family

DEPRECATED since after 2.4.0. See get_task_family() instead. Hopefully there will be less meta magic in Luigi.

Convenience method since a property on the metaclass isn’t directly accessible through the class instances.

param_args
sdss_data_model_name = apStar
obj
healpix
apstar
apred
telescope
release
public
use_remote
remote_access_method
mirror
verbose
tree
local_path

The local path of the file.

remote_path

The remote path of the file. Useful for debugging path problems.

This is relatively expensive to return, so don’t use this to download sources. Instead use one instance of sdss_access.HttpAccess to get the remote paths of many sources.

requires(self)

The requirements of this task.

output(self)

The outputs of this task.

run(self)

The task run method, to be overridden in a subclass.

See Task.run

_warn_on_wrong_param_types(self, strict=False)
__repr__(self)

Build a task representation like MyTask(hash: param1=1.5, param2='5')

get_common_param_kwargs(self, klass, include_significant=True)
get_common_param_names(self, klass, include_significant=True)
get_hashed_params(self, only_significant=True, only_public=False)
to_str_params(self, only_significant=True, only_public=False)

Convert all parameters to a str->str hash.

classmethod from_str_params(cls, params_str)

Creates an instance from a str->str hash. :param params_str: dict of param name -> value as string.

get_batch_task_kwds(self, include_non_batch_keywords=True)
get_batch_tasks(self)

A generator that yields task(s) that are to be run. Works in single or batch mode.

get_batch_size(self)

Get the number of batched tasks.

get_input(self, key)

Return a single input from the task, assuming the inputs are a dictionary. This can be performed by using task.input()[key], but when there are many inputs (e.g., in batch mode), this can be unnecessarily slow.

Parameters: key – The key of the requirements dictionary to return.
query_state(self, full_output=False)

Query the database for this task and return the SQLAlchemy ORM Query.

Parameters: full_output – [optional] If True, return a three-length tuple containing the ORM query, database model, and keywords to filter by.
get_or_create_state(self, defaults=None)

Get (or create) an entry in the database for this task.

Note that this will only create an entry for the task, and not for the parameters of the task. This is useful when creating many task entries, with the intent you will create the parameter entries later, and you want to minimise overhead. If you want to create an entry for this task and the parameters, use create_state().

This function returns a two-length tuple containing the SQLAlchemy instance, and a boolean flag indicating whether the entry was created (True) or just retrieved (False).

Parameters: defaults – [optional] A dictionary of default key, value pairs to provide if the entry needs to be created in the database.
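
The two-length return value described above follows the usual get-or-create pattern. The sketch below illustrates that pattern only: a plain dictionary stands in for the database, and the function name get_or_create, the key "task-1", and the "status" field are hypothetical.

```python
def get_or_create(table, key, defaults=None):
    """Return (entry, created) for `key`, creating the entry if it is missing."""
    if key in table:
        # Entry already exists: return it unchanged and ignore the defaults.
        return table[key], False
    # Entry is missing: create it from the defaults and flag it as new.
    entry = dict(defaults or {})
    table[key] = entry
    return entry, True


table = {}
entry, created = get_or_create(table, "task-1", defaults={"status": "pending"})
again, created_again = get_or_create(table, "task-1", defaults={"status": "ignored"})
```

Note that the defaults only take effect when the entry is created; a second call with different defaults returns the existing entry.
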
create_state(self)

Create an entry in the database for this task, and its parameters.

delete_state(self, cascade=False)

Delete this task entry in the database.

Parameters: cascade – [optional] Cascade this to any tasks in this batch.
update_state(self, state, cascade=False)

Update the task entry in the database with the given state dictionary.

Parameters: cascade – [optional] Cascade this to any tasks in this batch.
trigger_event_start(self)

Trigger an event signalling that the task has started.

trigger_event_succeeded(self)

Trigger an event signalling that the task has succeeded.

trigger_event_failed(self)

Trigger an event signalling that the task has failed.

trigger_event_processing_time(self, duration, cascade=False)

Trigger the event that signals the processing time of the event.

Parameters:
  • duration – The time taken for this event.
  • cascade – [optional] Also trigger the task succeeded event (default: False).
_owner_list(self)

Turns the owner_email property into a list. This should not be overridden.
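
As an illustration of this normalisation, the sketch below turns None, a single string, or a list of strings into a list. The function name owner_list is hypothetical, and details such as whether the real method splits comma-separated strings are not covered by this reference and are not modelled here.

```python
def owner_list(owner_email):
    """Normalise None, a string, or an iterable of strings to a list of strings."""
    if owner_email is None:
        return []
    if isinstance(owner_email, str):
        return [owner_email]
    return list(owner_email)


single = owner_list("test@example.com")
several = owner_list(["test1@example.com", "test2@example.com"])
nobody = owner_list(None)
```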

classmethod event_handler(cls, event)

Decorator for adding event handlers.

trigger_event(self, event, *args, **kwargs)

Trigger that calls all of the specified events associated with this class.

classmethod get_task_namespace(cls)

The task namespace for the given class.

Note: You normally don’t want to override this.

classmethod get_task_family(cls)

The task family for the given class.

If task_namespace is not set, then it’s simply the name of the class. Otherwise, <task_namespace>. is prefixed to the class name.

Note: You normally don’t want to override this.
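
The naming rule above can be sketched as a small function. The function name task_family is hypothetical and this is not the library's implementation, only a restatement of the rule in code.

```python
def task_family(class_name, task_namespace=""):
    """Prefix the class name with '<task_namespace>.' when a namespace is set."""
    if not task_namespace:
        return class_name
    return f"{task_namespace}.{class_name}"


with_namespace = task_family("EstimateStellarLabelsGivenApStarFile", "ThePayne")
without_namespace = task_family("MyTask")
```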

classmethod get_params(cls)

Returns all of the Parameters for this Task.

classmethod batch_param_names(cls)
classmethod get_param_names(cls, include_significant=False)
classmethod get_param_values(cls, params, args, kwargs)

Get the values of the parameters from the args and kwargs.

Parameters:
  • params – list of (param_name, Parameter).
  • args – positional arguments
  • kwargs – keyword arguments.
Returns:

list of (name, value) tuples, one for each parameter.

initialized(self)

Returns True if the Task is initialized and False otherwise.

_get_param_visibilities(self)
clone(self, cls=None, **kwargs)

Creates a new instance from an existing instance where some of the args have changed.

There are at least two scenarios where this is useful (see test/clone_test.py):

  • remove a lot of boilerplate when you have recursive dependencies and lots of args
  • there’s task inheritance and some logic is on the base class
Parameters:
  • cls
  • kwargs
Returns:

__hash__(self)

Return hash(self).

__eq__(self, other)

Return self==value.

complete(self)

If the task has any outputs, return True if all outputs exist. Otherwise, return False.

However, you may freely override this method with custom logic.
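
The default rule described above can be sketched with file paths standing in for task outputs. The function name is_complete and the file names are hypothetical; real outputs are Target objects rather than plain paths.

```python
import os
import tempfile


def is_complete(output_paths):
    """True only if there is at least one output and every output exists."""
    paths = list(output_paths)
    if not paths:
        # A task without outputs is never considered complete by default.
        return False
    return all(os.path.exists(path) for path in paths)


with tempfile.TemporaryDirectory() as tmp:
    existing = os.path.join(tmp, "done.txt")
    open(existing, "w").close()
    missing = os.path.join(tmp, "missing.txt")
    results = (
        is_complete([existing]),
        is_complete([existing, missing]),
        is_complete([]),
    )
```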

classmethod bulk_complete(cls, parameter_tuples)

Returns those of parameter_tuples for which this Task is complete.

Override (with an efficient implementation) for efficient scheduling with range tools. Keep the logic consistent with that of complete().

_requires(self)

Override in “template” tasks which themselves are supposed to be subclassed and thus have their requires() overridden (name preserved to provide consistent end-user experience), yet need to introduce (non-input) dependencies.

Must return an iterable which among others contains the _requires() of the superclass.

process_resources(self)

Override in “template” tasks which provide common resource functionality but allow subclasses to specify additional resources while preserving the name for consistent end-user experience.

input(self)

Returns the outputs of the Tasks returned by requires()

See Task.input

Returns: a list of Target objects that are specified as outputs of all required Tasks.
deps(self)

Internal method used by the scheduler.

Returns the flattened list of requires.

on_failure(self, exception)

Override for custom error handling.

This method gets called if an exception is raised in run(). The returned value of this method is json encoded and sent to the scheduler as the expl argument. Its string representation will be used as the body of the error email sent out if any.

Default behavior is to return a string representation of the stack trace.
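
A rough sketch of this default behaviour, using only the standard library, is given below. The standalone function on_failure and the example exception are illustrative; the exact text that the real method returns is not specified in this reference.

```python
import json
import traceback


def on_failure(exception):
    """Return a string representation of the exception's stack trace."""
    lines = traceback.format_exception(
        type(exception), exception, exception.__traceback__
    )
    return "".join(lines)


try:
    raise ValueError("bad spectrum")
except ValueError as error:
    explanation = on_failure(error)

# The returned value must be JSON-encodable, which a plain string is.
encoded = json.dumps(explanation)
```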

on_success(self)

Override for doing custom completion handling for a larger class of tasks

This method gets called when run() completes without raising any exceptions.

The returned value is json encoded and sent to the scheduler as the expl argument.

Default behavior is to send a None value.

no_unpicklable_properties(self)

Remove unpicklable properties before dumping the task, and restore them afterwards.

This method can be called in a subtask’s dump method to ensure that unpicklable properties do not break the dump.

This method is a context manager, intended for use in a with statement.

get_or_create_data_model_relationships(self)

Return the keywords that reference the input data model for this task.

writer(self, spectrum, path, **kwargs)
classmethod get_local_path(cls, release, public=True, mirror=False, verbose=True, **kwargs)
get_remote_http(self)

Download the remote file using HTTP.

get_remote_rsync(self)

Download the remote file using rsync.

get_remote(self)

Download the remote file.

class astra.contrib.thepayne.tasks.ContinuumNormalizeGivenSDSS4ApStarFile(*args, **kwargs)

Pseudo-continuum normalise SDSS-IV ApStar spectra using a sum of sines and cosines.

task_namespace = ContinuumNormalize
L
continuum_order
continuum_regions_path
spectrum_kwds
astra_version_major
astra_version_minor
astra_version_micro
astra_version_dev
strict_output_checking
is_batch_mode

A boolean property indicating whether the task is in batch mode or not.

output_base_dir

Base directory for storing task outputs.

_event_callbacks
priority = 0
disabled = False
resources
worker_timeout
max_batch_size
batchable

True if this instance can be run as part of a batch. By default, True if it has any batched parameters

retry_count

Override this positive integer to have a different retry_count at task level. Check scheduler-config.

disable_hard_timeout

Override this positive integer to have different disable_hard_timeout at task level. Check scheduler-config

disable_window

Override this positive integer to have different disable_window at task level. Check scheduler-config

disable_window_seconds
owner_email

Override this to send out additional error emails to the task owner, in addition to the one defined in the global configuration. This should return a string or a list of strings, e.g. ‘test@example.com’ or [‘test1@example.com’, ‘test2@example.com’].

use_cmdline_section

Property used by core config such as --workers etc. These will be exposed without the class as prefix.

accepts_messages

For configuring which scheduler messages can be received. When falsy, this task does not accept any message. When True, all messages are accepted.

task_module

Returns what Python module to import to get access to this class.

_visible_in_registry = True
__not_user_specified = __not_user_specified
_namespace_at_class_time
task_family

DEPRECATED since after 2.4.0. See get_task_family() instead. Hopefully there will be less meta magic in Luigi.

Convenience method since a property on the metaclass isn’t directly accessible through the class instances.

param_args
sdss_data_model_name = apStar
release
use_remote
remote_access_method
public
mirror
verbose
tree
local_path

The local path of the file.

remote_path

The remote path of the file. Useful for debugging path problems.

This is relatively expensive to return, so don’t use this to download sources. Instead use one instance of sdss_access.HttpAccess to get the remote paths of many sources.

requires(self)

The requirements of this task.

output(self)

The outputs of this task.

run(self)

The task run method, to be overridden in a subclass.

See Task.run

_warn_on_wrong_param_types(self, strict=False)
__repr__(self)

Build a task representation like MyTask(hash: param1=1.5, param2='5')

get_common_param_kwargs(self, klass, include_significant=True)
get_common_param_names(self, klass, include_significant=True)
get_hashed_params(self, only_significant=True, only_public=False)
to_str_params(self, only_significant=True, only_public=False)

Convert all parameters to a str->str hash.

classmethod from_str_params(cls, params_str)

Creates an instance from a str->str hash. :param params_str: dict of param name -> value as string.

get_batch_task_kwds(self, include_non_batch_keywords=True)
get_batch_tasks(self)

A generator that yields task(s) that are to be run. Works in single or batch mode.

get_batch_size(self)

Get the number of batched tasks.

get_input(self, key)

Return a single input from the task, assuming the inputs are a dictionary. This can be performed by using task.input()[key], but when there are many inputs (e.g., in batch mode), this can be unnecessarily slow.

Parameters: key – The key of the requirements dictionary to return.
query_state(self, full_output=False)

Query the database for this task and return the SQLAlchemy ORM Query.

Parameters: full_output – [optional] If True, return a three-length tuple containing the ORM query, database model, and keywords to filter by.
get_or_create_state(self, defaults=None)

Get (or create) an entry in the database for this task.

Note that this will only create an entry for the task, and not for the parameters of the task. This is useful when creating many task entries, with the intent you will create the parameter entries later, and you want to minimise overhead. If you want to create an entry for this task and the parameters, use create_state().

This function returns a two-length tuple containing the SQLAlchemy instance, and a boolean flag indicating whether the entry was created (True) or just retrieved (False).

Parameters: defaults – [optional] A dictionary of default key, value pairs to provide if the entry needs to be created in the database.
create_state(self)

Create an entry in the database for this task, and its parameters.

delete_state(self, cascade=False)

Delete this task entry in the database.

Parameters: cascade – [optional] Cascade this to any tasks in this batch.
update_state(self, state, cascade=False)

Update the task entry in the database with the given state dictionary.

Parameters: cascade – [optional] Cascade this to any tasks in this batch.
trigger_event_start(self)

Trigger an event signalling that the task has started.

trigger_event_succeeded(self)

Trigger an event signalling that the task has succeeded.

trigger_event_failed(self)

Trigger an event signalling that the task has failed.

trigger_event_processing_time(self, duration, cascade=False)

Trigger the event that signals the processing time of the event.

Parameters:
  • duration – The time taken for this event.
  • cascade – [optional] Also trigger the task succeeded event (default: False).
_owner_list(self)

Turns the owner_email property into a list. This should not be overridden.

classmethod event_handler(cls, event)

Decorator for adding event handlers.

trigger_event(self, event, *args, **kwargs)

Trigger that calls all of the specified events associated with this class.

classmethod get_task_namespace(cls)

The task namespace for the given class.

Note: You normally don’t want to override this.

classmethod get_task_family(cls)

The task family for the given class.

If task_namespace is not set, then it’s simply the name of the class. Otherwise, <task_namespace>. is prefixed to the class name.

Note: You normally don’t want to override this.

classmethod get_params(cls)

Returns all of the Parameters for this Task.

classmethod batch_param_names(cls)
classmethod get_param_names(cls, include_significant=False)
classmethod get_param_values(cls, params, args, kwargs)

Get the values of the parameters from the args and kwargs.

Parameters:
  • params – list of (param_name, Parameter).
  • args – positional arguments
  • kwargs – keyword arguments.
Returns:

list of (name, value) tuples, one for each parameter.

initialized(self)

Returns True if the Task is initialized and False otherwise.

_get_param_visibilities(self)
clone(self, cls=None, **kwargs)

Creates a new instance from an existing instance where some of the args have changed.

There are at least two scenarios where this is useful (see test/clone_test.py):

  • remove a lot of boilerplate when you have recursive dependencies and lots of args
  • there’s task inheritance and some logic is on the base class
Parameters:
  • cls
  • kwargs
Returns:

__hash__(self)

Return hash(self).

__eq__(self, other)

Return self==value.

complete(self)

If the task has any outputs, return True if all outputs exist. Otherwise, return False.

However, you may freely override this method with custom logic.

classmethod bulk_complete(cls, parameter_tuples)

Returns those of parameter_tuples for which this Task is complete.

Override (with an efficient implementation) for efficient scheduling with range tools. Keep the logic consistent with that of complete().

_requires(self)

Override in “template” tasks which themselves are supposed to be subclassed and thus have their requires() overridden (name preserved to provide consistent end-user experience), yet need to introduce (non-input) dependencies.

Must return an iterable which among others contains the _requires() of the superclass.

process_resources(self)

Override in “template” tasks which provide common resource functionality but allow subclasses to specify additional resources while preserving the name for consistent end-user experience.

input(self)

Returns the outputs of the Tasks returned by requires()

See Task.input

Returns: a list of Target objects that are specified as outputs of all required Tasks.
deps(self)

Internal method used by the scheduler.

Returns the flattened list of requires.

on_failure(self, exception)

Override for custom error handling.

This method gets called if an exception is raised in run(). The returned value of this method is json encoded and sent to the scheduler as the expl argument. Its string representation will be used as the body of the error email sent out if any.

Default behavior is to return a string representation of the stack trace.

on_success(self)

Override for doing custom completion handling for a larger class of tasks

This method gets called when run() completes without raising any exceptions.

The returned value is json encoded and sent to the scheduler as the expl argument.

Default behavior is to send a None value.

no_unpicklable_properties(self)

Remove unpicklable properties before dumping the task, and restore them afterwards.

This method can be called in a subtask’s dump method to ensure that unpicklable properties do not break the dump.

This method is a context manager, intended for use in a with statement.

get_or_create_data_model_relationships(self)

Return the keywords that reference the input data model for this task.

classmethod get_data_model_keywords(self, task_state)
writer(self, spectrum, path, **kwargs)
classmethod get_local_path(cls, release, public=True, mirror=False, verbose=True, **kwargs)
get_remote_http(self)

Download the remote file using HTTP.

get_remote_rsync(self)

Download the remote file using rsync.

get_remote(self)

Download the remote file.

class astra.contrib.thepayne.tasks.EstimateStellarLabelsGivenApStarFile(*args, **kwargs)

Estimate stellar labels given a single-layer neural network and an ApStar file.

This task also requires all parameters that astra.tasks.io.sdss5.ApStarFile requires, and that the astra.tasks.continuum.Sinusoidal task requires.

Parameters:
  • training_set_path

    The path where the training set spectra and labels are stored. This should be a binary pickle file that contains a dictionary with the following keys:

    • wavelength: an array of shape (P, ) where P is the number of pixels
    • spectra: an array of shape (N, P) where N is the number of spectra and P is the number of pixels
    • labels: an array of shape (L, N) where L is the number of labels and N is the number of spectra
    • label_names: a tuple of length L that contains the names of the labels
  • n_steps – (optional) The number of steps to train the network for (default: 100000).
  • n_neurons – (optional) The number of neurons to use in the hidden layer (default: 300).
  • weight_decay – (optional) The weight decay to use during training (default: 0)
  • learning_rate – (optional) The learning rate to use during training (default: 0.001).
  • continuum_regions_path – A path containing a list of (start, end) wavelength values that represent the regions to fit as continuum.
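
As a sketch of the training set file described above, the snippet below writes and re-reads a pickle file with the four keys. Several things here are assumptions: plain nested lists stand in for arrays so that only the standard library is needed (in practice these would presumably be NumPy arrays), the labels are assumed to hold one row per label and one column per training spectrum, and the label names, wavelength values, and file name are hypothetical toy values.

```python
import os
import pickle
import tempfile

N, P = 3, 5  # toy numbers of spectra and pixels
label_names = ("teff", "logg")  # hypothetical label names
L = len(label_names)

training_set = {
    # shape (P, ): one wavelength per pixel (toy values)
    "wavelength": [15100.0 + 0.2 * i for i in range(P)],
    # shape (N, P): one row of fluxes per spectrum
    "spectra": [[1.0] * P for _ in range(N)],
    # assumed layout: one row per label, one column per spectrum
    "labels": [[0.0] * N for _ in range(L)],
    # tuple of length L
    "label_names": label_names,
}

path = os.path.join(tempfile.mkdtemp(), "training_set.pkl")
with open(path, "wb") as fp:
    pickle.dump(training_set, fp)

# Read the file back to confirm the structure survives the round trip.
with open(path, "rb") as fp:
    loaded = pickle.load(fp)
```

The resulting path is the kind of value that would be passed as training_set_path.
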
max_batch_size = 10000
task_namespace = ThePayne
n_steps
n_neurons
weight_decay
learning_rate
training_set_path
use_slurm
slurm_nodes
slurm_ppn
slurm_walltime
slurm_alloc
slurm_partition
slurm_mem
slurm_gres
astra_version_major
astra_version_minor
astra_version_micro
astra_version_dev
strict_output_checking
is_batch_mode

A boolean property indicating whether the task is in batch mode or not.

output_base_dir

Base directory for storing task outputs.

_event_callbacks
priority = 0
disabled = False
resources
worker_timeout
batchable

True if this instance can be run as part of a batch. By default, True if it has any batched parameters

retry_count

Override this positive integer to have a different retry_count at task level. Check scheduler-config.

disable_hard_timeout

Override this positive integer to have different disable_hard_timeout at task level. Check scheduler-config

disable_window

Override this positive integer to have different disable_window at task level. Check scheduler-config

disable_window_seconds
owner_email

Override this to send out additional error emails to the task owner, in addition to the one defined in the global configuration. This should return a string or a list of strings, e.g. ‘test@example.com’ or [‘test1@example.com’, ‘test2@example.com’].

use_cmdline_section

Property used by core config such as --workers etc. These will be exposed without the class as prefix.

accepts_messages

For configuring which scheduler messages can be received. When falsy, this task does not accept any message. When True, all messages are accepted.

task_module

Returns what Python module to import to get access to this class.

_visible_in_registry = True
__not_user_specified = __not_user_specified
_namespace_at_class_time
task_family

DEPRECATED since after 2.4.0. See get_task_family() instead. Hopefully there will be less meta magic in Luigi.

Convenience method since a property on the metaclass isn’t directly accessible through the class instances.

param_args
L
continuum_order
continuum_regions_path
spectrum_kwds
sdss_data_model_name = apStar
obj
healpix
apstar
apred
telescope
release
public
use_remote
remote_access_method
mirror
verbose
tree
local_path

The local path of the file.

remote_path

The remote path of the file. Useful for debugging path problems.

This is relatively expensive to return, so don’t use this to download sources. Instead use one instance of sdss_access.HttpAccess to get the remote paths of many sources.

requires(self)

The requirements of this task.

prepare_observation(self)

Prepare the observations for analysis.

run(self)

Execute this task.

output(self)

The output of this task.

_warn_on_wrong_param_types(self, strict=False)
__repr__(self)

Build a task representation like MyTask(hash: param1=1.5, param2='5')

get_common_param_kwargs(self, klass, include_significant=True)
get_common_param_names(self, klass, include_significant=True)
get_hashed_params(self, only_significant=True, only_public=False)
to_str_params(self, only_significant=True, only_public=False)

Convert all parameters to a str->str hash.

classmethod from_str_params(cls, params_str)

Creates an instance from a str->str hash. :param params_str: dict of param name -> value as string.

get_batch_task_kwds(self, include_non_batch_keywords=True)
get_batch_tasks(self)

A generator that yields task(s) that are to be run. Works in single or batch mode.
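
The idea of a generator that works in both modes can be sketched as follows. This is not Astra's implementation: it assumes, purely for illustration, that batch mode is signalled by the batch parameters holding tuples of values, and the function name iter_batch_kwargs and the parameter values are hypothetical.

```python
def iter_batch_kwargs(params, batch_param_names):
    """Yield one keyword dictionary per task, in single or batch mode."""
    batch_values = [params[name] for name in batch_param_names]
    in_batch_mode = bool(batch_values) and all(
        isinstance(value, tuple) for value in batch_values
    )
    if not in_batch_mode:
        # Single mode: the parameters already describe exactly one task.
        yield dict(params)
        return
    # Batch mode: take the i-th value of every batch parameter together.
    for values in zip(*batch_values):
        kwargs = dict(params)
        kwargs.update(zip(batch_param_names, values))
        yield kwargs


single = list(iter_batch_kwargs({"obj": "A", "release": "dr16"}, ["obj"]))
batch = list(iter_batch_kwargs({"obj": ("A", "B"), "release": "dr16"}, ["obj"]))
```

Non-batch parameters such as release are shared by every yielded dictionary, so callers can treat both modes with the same loop.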

get_batch_size(self)

Get the number of batched tasks.

get_input(self, key)

Return a single input from the task, assuming the inputs are a dictionary. This can be performed by using task.input()[key], but when there are many inputs (e.g., in batch mode), this can be unnecessarily slow.

Parameters: key – The key of the requirements dictionary to return.
query_state(self, full_output=False)

Query the database for this task and return the SQLAlchemy ORM Query.

Parameters: full_output – [optional] If True, return a three-length tuple containing the ORM query, database model, and keywords to filter by.
get_or_create_state(self, defaults=None)

Get (or create) an entry in the database for this task.

Note that this will only create an entry for the task, and not for the parameters of the task. This is useful when creating many task entries, with the intent you will create the parameter entries later, and you want to minimise overhead. If you want to create an entry for this task and the parameters, use create_state().

This function returns a two-length tuple containing the SQLAlchemy instance, and a boolean flag indicating whether the entry was created (True) or just retrieved (False).

Parameters: defaults – [optional] A dictionary of default key, value pairs to provide if the entry needs to be created in the database.
create_state(self)

Create an entry in the database for this task, and its parameters.

delete_state(self, cascade=False)

Delete this task entry in the database.

Parameters: cascade – [optional] Cascade this to any tasks in this batch.
update_state(self, state, cascade=False)

Update the task entry in the database with the given state dictionary.

Parameters: cascade – [optional] Cascade this to any tasks in this batch.
trigger_event_start(self)

Trigger an event signalling that the task has started.

trigger_event_succeeded(self)

Trigger an event signalling that the task has succeeded.

trigger_event_failed(self)

Trigger an event signalling that the task has failed.

trigger_event_processing_time(self, duration, cascade=False)

Trigger the event that signals the processing time of the event.

Parameters:
  • duration – The time taken for this event.
  • cascade – [optional] Also trigger the task succeeded event (default: False).
_owner_list(self)

Turns the owner_email property into a list. This should not be overridden.

classmethod event_handler(cls, event)

Decorator for adding event handlers.

trigger_event(self, event, *args, **kwargs)

Trigger that calls all of the specified events associated with this class.

classmethod get_task_namespace(cls)

The task namespace for the given class.

Note: You normally don’t want to override this.

classmethod get_task_family(cls)

The task family for the given class.

If task_namespace is not set, then it’s simply the name of the class. Otherwise, <task_namespace>. is prefixed to the class name.

Note: You normally don’t want to override this.

classmethod get_params(cls)

Returns all of the Parameters for this Task.

classmethod batch_param_names(cls)
classmethod get_param_names(cls, include_significant=False)
classmethod get_param_values(cls, params, args, kwargs)

Get the values of the parameters from the args and kwargs.

Parameters:
  • params – list of (param_name, Parameter).
  • args – positional arguments
  • kwargs – keyword arguments.
Returns:

list of (name, value) tuples, one for each parameter.

initialized(self)

Returns True if the Task is initialized and False otherwise.

_get_param_visibilities(self)
clone(self, cls=None, **kwargs)

Creates a new instance from an existing instance where some of the args have changed.

There are at least two scenarios where this is useful (see test/clone_test.py):

  • remove a lot of boilerplate when you have recursive dependencies and lots of args
  • there’s task inheritance and some logic is on the base class
Parameters:
  • cls
  • kwargs
Returns:

__hash__(self)

Return hash(self).

__eq__(self, other)

Return self==value.

complete(self)

If the task has any outputs, return True if all outputs exist. Otherwise, return False.

However, you may freely override this method with custom logic.

classmethod bulk_complete(cls, parameter_tuples)

Returns those of parameter_tuples for which this Task is complete.

Override (with an efficient implementation) for efficient scheduling with range tools. Keep the logic consistent with that of complete().

_requires(self)

Override in “template” tasks which themselves are supposed to be subclassed and thus have their requires() overridden (name preserved to provide consistent end-user experience), yet need to introduce (non-input) dependencies.

Must return an iterable which among others contains the _requires() of the superclass.

process_resources(self)

Override in “template” tasks which provide common resource functionality but allow subclasses to specify additional resources while preserving the name for consistent end-user experience.

input(self)

Returns the outputs of the Tasks returned by requires()

See Task.input

Returns: a list of Target objects that are specified as outputs of all required Tasks.
deps(self)

Internal method used by the scheduler.

Returns the flattened list of requires.

on_failure(self, exception)

Override for custom error handling.

This method gets called if an exception is raised in run(). The returned value of this method is json encoded and sent to the scheduler as the expl argument. Its string representation will be used as the body of the error email sent out if any.

Default behavior is to return a string representation of the stack trace.

on_success(self)

Override for doing custom completion handling for a larger class of tasks

This method gets called when run() completes without raising any exceptions.

The returned value is json encoded and sent to the scheduler as the expl argument.

Default behavior is to send a None value.

no_unpicklable_properties(self)

Remove unpicklable properties before dumping the task, and restore them afterwards.

This method can be called in a subtask’s dump method to ensure that unpicklable properties do not break the dump.

This method is a context manager, intended for use in a with statement.

get_or_create_data_model_relationships(self)

Return the keywords that reference the input data model for this task.

writer(self, spectrum, path, **kwargs)
classmethod get_local_path(cls, release, public=True, mirror=False, verbose=True, **kwargs)
get_remote_http(self)

Download the remote file using HTTP.

get_remote_rsync(self)

Download the remote file using rsync.

get_remote(self)

Download the remote file.

class astra.contrib.thepayne.tasks.EstimateStellarLabelsGivenSDSS4ApStarFile(*args, **kwargs)

Estimate stellar labels given a single-layer neural network and a SDSS-IV ApStar file.

This task also requires all parameters that astra.tasks.io.sdss4.ApStarFile requires, and that the astra.tasks.continuum.Sinusoidal task requires.

Parameters:
  • training_set_path

    The path where the training set spectra and labels are stored. This should be a binary pickle file that contains a dictionary with the following keys:

    • wavelength: an array of shape (P, ) where P is the number of pixels
    • spectra: an array of shape (N, P) where N is the number of spectra and P is the number of pixels
    • labels: an array of shape (L, N) where L is the number of labels and N is the number of spectra
    • label_names: a tuple of length L that contains the names of the labels
  • n_steps – (optional) The number of steps to train the network for (default: 100000).
  • n_neurons – (optional) The number of neurons to use in the hidden layer (default: 300).
  • weight_decay – (optional) The weight decay to use during training (default: 0)
  • learning_rate – (optional) The learning rate to use during training (default: 0.001).
  • continuum_regions_path – A path containing a list of (start, end) wavelength values that represent the regions to fit as continuum.
max_batch_size = 10000
task_namespace = ThePayne
n_steps
n_neurons
weight_decay
learning_rate
training_set_path
use_slurm
slurm_nodes
slurm_ppn
slurm_walltime
slurm_alloc
slurm_partition
slurm_mem
slurm_gres
astra_version_major
astra_version_minor
astra_version_micro
astra_version_dev
strict_output_checking
is_batch_mode

A boolean property indicating whether the task is in batch mode or not.

output_base_dir

Base directory for storing task outputs.

_event_callbacks
priority = 0
disabled = False
resources
worker_timeout
batchable

True if this instance can be run as part of a batch. By default, True if it has any batched parameters

retry_count

Override this positive integer to have a different retry_count at task level. Check scheduler-config.

disable_hard_timeout

Override this positive integer to have different disable_hard_timeout at task level. Check scheduler-config

disable_window

Override this positive integer to have different disable_window at task level. Check scheduler-config

disable_window_seconds
owner_email

Override this to send out additional error emails to the task owner, in addition to the one defined in the global configuration. This should return a string or a list of strings, e.g. ‘test@example.com’ or [‘test1@example.com’, ‘test2@example.com’]

use_cmdline_section

Property used by core config such as --workers etc. These will be exposed without the class as prefix.

accepts_messages

For configuring which scheduler messages can be received. When falsy, this task does not accept any message. When True, all messages are accepted.

task_module

Returns what Python module to import to get access to this class.

_visible_in_registry = True
__not_user_specified = __not_user_specified
_namespace_at_class_time
task_family

DEPRECATED since after 2.4.0. See get_task_family() instead. Hopefully there will be less meta magic in Luigi.

Convenience method since a property on the metaclass isn’t directly accessible through the class instances.

param_args
L
continuum_order
continuum_regions_path
spectrum_kwds
sdss_data_model_name = apStar
release
use_remote
remote_access_method
public
mirror
verbose
tree
local_path

The local path of the file.

remote_path

The remote path of the file. Useful for debugging path problems.

This is relatively expensive to return, so don’t use this to download sources. Instead use one instance of sdss_access.HttpAccess to get the remote paths of many sources.

requires(self)

The requirements of this task.

prepare_observation(self)

Prepare the observations for analysis.

run(self)

Execute this task.

output(self)

The output of this task.

_warn_on_wrong_param_types(self, strict=False)
__repr__(self)

Build a task representation like MyTask(hash: param1=1.5, param2='5')

get_common_param_kwargs(self, klass, include_significant=True)
get_common_param_names(self, klass, include_significant=True)
get_hashed_params(self, only_significant=True, only_public=False)
to_str_params(self, only_significant=True, only_public=False)

Convert all parameters to a str->str hash.

classmethod from_str_params(cls, params_str)

Creates an instance from a str->str hash.

Parameters:params_str – dict of param name -> value as string.

get_batch_task_kwds(self, include_non_batch_keywords=True)
get_batch_tasks(self)

A generator that yields task(s) that are to be run. Works in single or batch mode.

get_batch_size(self)

Get the number of batched tasks.

get_input(self, key)

Return a single input from the task, assuming the inputs are a dictionary. This can be performed by using task.input()[key], but when there are many inputs (e.g., in batch mode), this can be unnecessarily slow.

Parameters:key – The key of the requirements dictionary to return.
query_state(self, full_output=False)

Query the database for this task and return the SQLAlchemy ORM Query.

Parameters:full_output – [optional] Optionally return a three-length tuple containing the ORM query, database model, and keywords to filter by.
get_or_create_state(self, defaults=None)

Get (or create) an entry in the database for this task.

Note that this will only create an entry for the task, and not for the parameters of the task. This is useful when creating many task entries, with the intent you will create the parameter entries later, and you want to minimise overhead. If you want to create an entry for this task and the parameters, use create_state().

This function returns a two-length tuple containing the SQLAlchemy instance, and a boolean flag indicating whether the entry was created (True) or just retrieved (False).

Parameters:defaults – [optional] A dictionary of default key, value pairs to provide if the entry needs to be created in the database.
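
The return convention of get_or_create_state() can be sketched as follows. This is not Astra's implementation: a plain dictionary stands in for the database, and the helper name, the key and the default values are hypothetical.

```python
def get_or_create(table, key, defaults=None):
    """Return (entry, created) for `key`, using the dict `table` as a stand-in database."""
    if key in table:
        # Entry already exists: retrieve it and ignore the defaults.
        return table[key], False
    # Entry is missing: create it from the defaults.
    entry = dict(defaults or {})
    table[key] = entry
    return entry, True


table = {}
first, created_first = get_or_create(table, "task-1", defaults={"status": "pending"})
second, created_second = get_or_create(table, "task-1", defaults={"status": "ignored"})

print(created_first, created_second)
```

The second call retrieves the existing entry, so its defaults are not applied and the flag is False.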
create_state(self)

Create an entry in the database for this task, and its parameters.

delete_state(self, cascade=False)

Delete this task entry in the database.

Parameters:cascade – [optional] Cascade this to any tasks in this batch.
update_state(self, state, cascade=False)

Update the task entry in the database with the given state dictionary.

Parameters:cascade – [optional] Cascade this to any tasks in this batch.
trigger_event_start(self)

Trigger an event signalling that the task has started.

trigger_event_succeeded(self)

Trigger an event signalling that the task has succeeded.

trigger_event_failed(self)

Trigger an event signalling that the task has failed.

trigger_event_processing_time(self, duration, cascade=False)

Trigger the event that signals the processing time of the event.

Parameters:
  • duration – The time taken for this event.
  • cascade – [optional] Also trigger the task succeeded event (default: False).
_owner_list(self)

Turns the owner_email property into a list. This should not be overridden.

classmethod event_handler(cls, event)

Decorator for adding event handlers.

trigger_event(self, event, *args, **kwargs)

Trigger that calls all of the specified events associated with this class.

classmethod get_task_namespace(cls)

The task namespace for the given class.

Note: You normally don’t want to override this.

classmethod get_task_family(cls)

The task family for the given class.

If task_namespace is not set, then it’s simply the name of the class. Otherwise, <task_namespace>. is prefixed to the class name.

Note: You normally don’t want to override this.
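
The naming rule above can be written out as a short sketch. The helper below is illustrative only and is not Luigi's code; the class name is one of the tasks listed for this module, combined with the task_namespace value shown in the attribute list.

```python
def task_family(class_name, task_namespace=""):
    """Apply the rule described above (illustrative helper, not Luigi's code)."""
    if not task_namespace:
        # No namespace set: the family is simply the class name.
        return class_name
    # Otherwise "<task_namespace>." is prefixed to the class name.
    return f"{task_namespace}.{class_name}"


with_namespace = task_family("EstimateStellarLabelsGivenSDSS4ApStarFile", "ThePayne")
without_namespace = task_family("EstimateStellarLabelsGivenSDSS4ApStarFile")

print(with_namespace)
print(without_namespace)
```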

classmethod get_params(cls)

Returns all of the Parameters for this Task.

classmethod batch_param_names(cls)
classmethod get_param_names(cls, include_significant=False)
classmethod get_param_values(cls, params, args, kwargs)

Get the values of the parameters from the args and kwargs.

Parameters:
  • params – list of (param_name, Parameter).
  • args – positional arguments
  • kwargs – keyword arguments.
Returns:

list of (name, value) tuples, one for each parameter.

initialized(self)

Returns True if the Task is initialized and False otherwise.

_get_param_visibilities(self)
clone(self, cls=None, **kwargs)

Creates a new instance from an existing instance where some of the args have changed.

There are at least two scenarios where this is useful (see test/clone_test.py):

  • remove a lot of boiler plate when you have recursive dependencies and lots of args
  • there’s task inheritance and some logic is on the base class
Parameters:
  • cls
  • kwargs
Returns:

__hash__(self)

Return hash(self).

__eq__(self, other)

Return self==value.

complete(self)

If the task has any outputs, return True if all outputs exist. Otherwise, return False.

However, you may freely override this method with custom logic.
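
The default rule for complete() can be sketched with plain file paths. This is a simplified stand-in, not the library's implementation: real tasks return target objects from output(), whereas here the outputs are assumed to be local paths and the file names are placeholders.

```python
import os
import tempfile


def is_complete(output_paths):
    """Return True only if there is at least one output and every output exists."""
    if not output_paths:
        # A task without outputs is never considered complete.
        return False
    return all(os.path.exists(path) for path in output_paths)


with tempfile.TemporaryDirectory() as tmp:
    existing = os.path.join(tmp, "result.pkl")      # placeholder file name
    missing = os.path.join(tmp, "not_written.pkl")  # never created
    with open(existing, "wb"):
        pass  # create an empty file

    no_outputs = is_complete([])
    all_exist = is_complete([existing])
    one_missing = is_complete([existing, missing])

print(no_outputs, all_exist, one_missing)
```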

classmethod bulk_complete(cls, parameter_tuples)

Returns those of parameter_tuples for which this Task is complete.

Override (with an efficient implementation) for efficient scheduling with range tools. Keep the logic consistent with that of complete().

_requires(self)

Override in “template” tasks which themselves are supposed to be subclassed and thus have their requires() overridden (name preserved to provide consistent end-user experience), yet need to introduce (non-input) dependencies.

Must return an iterable which among others contains the _requires() of the superclass.

process_resources(self)

Override in “template” tasks which provide common resource functionality but allow subclasses to specify additional resources while preserving the name for consistent end-user experience.

input(self)

Returns the outputs of the Tasks returned by requires()

See Task.input

Returns:a list of Target objects which are specified as outputs of all required Tasks.
deps(self)

Internal method used by the scheduler.

Returns the flattened list of requires.

on_failure(self, exception)

Override for custom error handling.

This method gets called if an exception is raised in run(). The returned value of this method is json encoded and sent to the scheduler as the expl argument. Its string representation will be used as the body of the error email sent out if any.

Default behavior is to return a string representation of the stack trace.

on_success(self)

Override to do custom completion handling for a larger class of tasks.

This method gets called when run() completes without raising any exceptions.

The returned value is json encoded and sent to the scheduler as the expl argument.

Default behavior is to send a None value.

no_unpicklable_properties(self)

Remove unpicklable properties before dumping the task and restore them afterwards.

This method could be called in subtask’s dump method, to ensure unpicklable properties won’t break dump.

This method is a context-manager which can be called as below:
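
A minimal, self-contained sketch of the pattern follows. It does not use the real task base class: DummyTask, its _unpicklable tuple, the placeholder string and the dump() method are all stand-ins chosen for illustration, and only the instance attributes are pickled to keep the example independent of where the class is defined.

```python
import pickle
import threading
from contextlib import contextmanager


class DummyTask:
    """Stand-in task used only for this illustration."""

    # Names of attributes that cannot be pickled (hypothetical).
    _unpicklable = ("lock",)

    def __init__(self):
        self.name = "dummy"
        self.lock = threading.Lock()  # lock objects cannot be pickled

    @contextmanager
    def no_unpicklable_properties(self):
        # Swap each unpicklable attribute for a placeholder...
        reserved = {}
        for attr in self._unpicklable:
            if hasattr(self, attr):
                reserved[attr] = getattr(self, attr)
                setattr(self, attr, "placeholder_during_pickling")
        try:
            yield
        finally:
            # ...and put the original values back afterwards.
            for attr, value in reserved.items():
                setattr(self, attr, value)

    def dump(self):
        with self.no_unpicklable_properties():
            return pickle.dumps(vars(self))


task = DummyTask()
payload = task.dump()
restored = pickle.loads(payload)
```

Without the context manager, pickling the attributes would fail on the lock; with it, the dump succeeds and the original lock is back on the task once the with block exits.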

get_or_create_data_model_relationships(self)

Return the keywords that reference the input data model for this task.

classmethod get_data_model_keywords(self, task_state)
writer(self, spectrum, path, **kwargs)
classmethod get_local_path(cls, release, public=True, mirror=False, verbose=True, **kwargs)
get_remote_http(self)

Download the remote file using HTTP.

get_remote_rsync(self)

Download the remote file using rsync.

get_remote(self)

Download the remote file.