astra.contrib.classifier.tasks.test

Module Contents

Classes

ClassifySource Classify a source.
ClassifySourceGivenApVisitFileBase Classify the type of stellar source, given an ApVisitFile.
ClassifySourceGivenApVisitFile Classify the type of stellar source, given an ApVisitFile.
ClassifySourceGivenApStarFile Classify an ApStar source from the joint classifications of individual visits (the ApVisit files that went into that ApStar).
ClassifySourceGivenSDSS4ApVisitFile Classify the type of stellar source, given an ApVisitFile.
class astra.contrib.classifier.tasks.test.ClassifySource(*args, **kwargs)

Classify a source.

This should be sub-classed and mixed with data model class to inherit parameters.

task_namespace = Classifier
training_spectra_path
training_labels_path
validation_spectra_path
validation_labels_path
test_spectra_path
test_labels_path
n_epochs
batch_size
weight_decay
learning_rate
max_batch_size = 10000
class_names

Names of classes used in this classifier.

use_slurm
slurm_nodes
slurm_ppn
slurm_walltime
slurm_alloc
slurm_partition
slurm_mem
slurm_gres
astra_version_major
astra_version_minor
astra_version_micro
astra_version_dev
strict_output_checking
is_batch_mode

A boolean property indicating whether the task is in batch mode or not.

output_base_dir

Base directory for storing task outputs.

_event_callbacks
priority = 0
disabled = False
resources
worker_timeout
batchable

True if this instance can be run as part of a batch. By default, True if it has any batched parameters

retry_count

Override this positive integer to have different retry_count at task level Check scheduler-config

disable_hard_timeout

Override this positive integer to have different disable_hard_timeout at task level. Check scheduler-config

disable_window

Override this positive integer to have different disable_window at task level. Check scheduler-config

disable_window_seconds
owner_email

Override this to send out additional error emails to task owner, in addition to the one defined in the global configuration. This should return a string or a list of strings. e.g. ‘test@exmaple.com’ or [‘test1@example.com’, ‘test2@example.com’]

use_cmdline_section

Property used by core config such as --workers etc. These will be exposed without the class as prefix.

accepts_messages

For configuring which scheduler messages can be received. When falsy, this tasks does not accept any message. When True, all messages are accepted.

task_module

Returns what Python module to import to get access to this class.

_visible_in_registry = True
__not_user_specified = __not_user_specified
_namespace_at_class_time
task_family

DEPRECATED since after 2.4.0. See get_task_family() instead. Hopefully there will be less meta magic in Luigi.

Convenience method since a property on the metaclass isn’t directly accessible through the class instances.

param_args
read_network(self)

Read the network from the input path.

run(self)

The task run method, to be overridden in a subclass.

See Task.run

prepare_result(self, log_prob)
get_tqdm_kwds(self, desc=None)
_warn_on_wrong_param_types(self, strict=False)
__repr__(self)

Build a task representation like MyTask(hash: param1=1.5, param2='5')

get_common_param_kwargs(self, klass, include_significant=True)
get_common_param_names(self, klass, include_significant=True)
get_hashed_params(self, only_significant=True, only_public=False)
to_str_params(self, only_significant=True, only_public=False)

Convert all parameters to a str->str hash.

classmethod from_str_params(cls, params_str)

Creates an instance from a str->str hash. :param params_str: dict of param name -> value as string.

get_batch_task_kwds(self, include_non_batch_keywords=True)
get_batch_tasks(self)

A generator that yields task(s) that are to be run. Works in single or batch mode.

get_batch_size(self)

Get the number of batched tasks.

get_input(self, key)

Return a single input from the task, assuming the inputs are a dictionary. This can be performed by using task.input()[key], but when there are many inputs (e.g., in batch mode), this can be unnecessarily slow.

Parameters:key – The key of the requirements dictionary to return.
requires(self)

The requirements of this task.

output(self)

The outputs of this task.

query_state(self, full_output=False)

Query the database for this task and return the SQLAlchemy ORM Query.

Parameters:full_output – [optional] Optionally return a three-length tuple containing the ORM query, database model, and keywords to filter by.
get_or_create_state(self, defaults=None)

Get (or create) an entry in the database for this task.

Note that this will only create an entry for the task, and not for the parameters of the task. This is useful when creating many task entries, with the intent you will create the parameter entries later, and you want to minimise overhead. If you want to create an entry for this task and the parameters, use create_state().

This function returns a two-length tuple containing the SQLAlchemy instance, and a boolean flag indicating whether the entry was created (True) or just retrieved (False).

Parameters:defaults – [optional] A dictionary of default key, value pairs to provide if the entry needs to be created in the database.
create_state(self)

Create an entry in the database for this task, and its parameters.

delete_state(self, cascade=False)

Delete this task entry in the database.

Parameters:cascade – [optional] Cascade this to any tasks in this batch.
update_state(self, state, cascade=False)

Update the task entry in the database with the given state dictionary.

Parameters:cascade – [optional] Cascade this to any tasks in this batch.
trigger_event_start(self)

Trigger an event signalling that the task has started.

trigger_event_succeeded(self)

Trigger an event signalling that the task has succeeded.

trigger_event_failed(self)

Trigger an event signalling that the task has failed.

trigger_event_processing_time(self, duration, cascade=False)

Trigger the event that signals the processing time of the event.

Parameters:
  • duration – The time taken for this event.
  • cascade – [optional] Also trigger the task succeeded event (default: False).
_owner_list(self)

Turns the owner_email property into a list. This should not be overridden.

classmethod event_handler(cls, event)

Decorator for adding event handlers.

trigger_event(self, event, *args, **kwargs)

Trigger that calls all of the specified events associated with this class.

classmethod get_task_namespace(cls)

The task family for the given class.

Note: You normally don’t want to override this.

classmethod get_task_family(cls)

The task family for the given class.

If task_namespace is not set, then it’s simply the name of the class. Otherwise, <task_namespace>. is prefixed to the class name.

Note: You normally don’t want to override this.

classmethod get_params(cls)

Returns all of the Parameters for this Task.

classmethod batch_param_names(cls)
classmethod get_param_names(cls, include_significant=False)
classmethod get_param_values(cls, params, args, kwargs)

Get the values of the parameters from the args and kwargs.

Parameters:
  • params – list of (param_name, Parameter).
  • args – positional arguments
  • kwargs – keyword arguments.
Returns:

list of (name, value) tuples, one for each parameter.

initialized(self)

Returns True if the Task is initialized and False otherwise.

_get_param_visibilities(self)
clone(self, cls=None, **kwargs)

Creates a new instance from an existing instance where some of the args have changed.

There’s at least two scenarios where this is useful (see test/clone_test.py):

  • remove a lot of boiler plate when you have recursive dependencies and lots of args
  • there’s task inheritance and some logic is on the base class
Parameters:
  • cls
  • kwargs
Returns:

__hash__(self)

Return hash(self).

__eq__(self, other)

Return self==value.

complete(self)

If the task has any outputs, return True if all outputs exist. Otherwise, return False.

However, you may freely override this method with custom logic.

classmethod bulk_complete(cls, parameter_tuples)

Returns those of parameter_tuples for which this Task is complete.

Override (with an efficient implementation) for efficient scheduling with range tools. Keep the logic consistent with that of complete().

_requires(self)

Override in “template” tasks which themselves are supposed to be subclassed and thus have their requires() overridden (name preserved to provide consistent end-user experience), yet need to introduce (non-input) dependencies.

Must return an iterable which among others contains the _requires() of the superclass.

process_resources(self)

Override in “template” tasks which provide common resource functionality but allow subclasses to specify additional resources while preserving the name for consistent end-user experience.

input(self)

Returns the outputs of the Tasks returned by requires()

See Task.input

Returns:a list of Target objects which are specified as outputs of all required Tasks.
deps(self)

Internal method used by the scheduler.

Returns the flattened list of requires.

on_failure(self, exception)

Override for custom error handling.

This method gets called if an exception is raised in run(). The returned value of this method is json encoded and sent to the scheduler as the expl argument. Its string representation will be used as the body of the error email sent out if any.

Default behavior is to return a string representation of the stack trace.

on_success(self)

Override for doing custom completion handling for a larger class of tasks

This method gets called when run() completes without raising any exceptions.

The returned value is json encoded and sent to the scheduler as the expl argument.

Default behavior is to send an None value

no_unpicklable_properties(self)

Remove unpicklable properties before dump task and resume them after.

This method could be called in subtask’s dump method, to ensure unpicklable properties won’t break dump.

This method is a context-manager which can be called as below:

class astra.contrib.classifier.tasks.test.ClassifySourceGivenApVisitFileBase(*args, **kwargs)

Classify the type of stellar source, given an ApVisitFile.

This task requires the same parameters required by astra.contrib.classifer.train.TrainNIRSpectrumClassifier, and those required by astra.tasks.io.ApVisitFile.

network_factory
task_namespace = Classifier
training_spectra_path
training_labels_path
validation_spectra_path
validation_labels_path
test_spectra_path
test_labels_path
n_epochs
batch_size
weight_decay
learning_rate
max_batch_size = 10000
class_names

Names of classes used in this classifier.

use_slurm
slurm_nodes
slurm_ppn
slurm_walltime
slurm_alloc
slurm_partition
slurm_mem
slurm_gres
astra_version_major
astra_version_minor
astra_version_micro
astra_version_dev
strict_output_checking
is_batch_mode

A boolean property indicating whether the task is in batch mode or not.

output_base_dir

Base directory for storing task outputs.

_event_callbacks
priority = 0
disabled = False
resources
worker_timeout
batchable

True if this instance can be run as part of a batch. By default, True if it has any batched parameters

retry_count

Override this positive integer to have different retry_count at task level Check scheduler-config

disable_hard_timeout

Override this positive integer to have different disable_hard_timeout at task level. Check scheduler-config

disable_window

Override this positive integer to have different disable_window at task level. Check scheduler-config

disable_window_seconds
owner_email

Override this to send out additional error emails to task owner, in addition to the one defined in the global configuration. This should return a string or a list of strings. e.g. ‘test@exmaple.com’ or [‘test1@example.com’, ‘test2@example.com’]

use_cmdline_section

Property used by core config such as --workers etc. These will be exposed without the class as prefix.

accepts_messages

For configuring which scheduler messages can be received. When falsy, this tasks does not accept any message. When True, all messages are accepted.

task_module

Returns what Python module to import to get access to this class.

_visible_in_registry = True
__not_user_specified = __not_user_specified
_namespace_at_class_time
task_family

DEPRECATED since after 2.4.0. See get_task_family() instead. Hopefully there will be less meta magic in Luigi.

Convenience method since a property on the metaclass isn’t directly accessible through the class instances.

param_args
read_observation(self)

Read the input observation from disk.

prepare_batch(self)

Prepare the input observation for being batched through the network.

run(self)

Execute the task.

output(self)

The output of the task.

read_network(self)

Read the network from the input path.

prepare_result(self, log_prob)
get_tqdm_kwds(self, desc=None)
_warn_on_wrong_param_types(self, strict=False)
__repr__(self)

Build a task representation like MyTask(hash: param1=1.5, param2='5')

get_common_param_kwargs(self, klass, include_significant=True)
get_common_param_names(self, klass, include_significant=True)
get_hashed_params(self, only_significant=True, only_public=False)
to_str_params(self, only_significant=True, only_public=False)

Convert all parameters to a str->str hash.

classmethod from_str_params(cls, params_str)

Creates an instance from a str->str hash. :param params_str: dict of param name -> value as string.

get_batch_task_kwds(self, include_non_batch_keywords=True)
get_batch_tasks(self)

A generator that yields task(s) that are to be run. Works in single or batch mode.

get_batch_size(self)

Get the number of batched tasks.

get_input(self, key)

Return a single input from the task, assuming the inputs are a dictionary. This can be performed by using task.input()[key], but when there are many inputs (e.g., in batch mode), this can be unnecessarily slow.

Parameters:key – The key of the requirements dictionary to return.
requires(self)

The requirements of this task.

query_state(self, full_output=False)

Query the database for this task and return the SQLAlchemy ORM Query.

Parameters:full_output – [optional] Optionally return a three-length tuple containing the ORM query, database model, and keywords to filter by.
get_or_create_state(self, defaults=None)

Get (or create) an entry in the database for this task.

Note that this will only create an entry for the task, and not for the parameters of the task. This is useful when creating many task entries, with the intent you will create the parameter entries later, and you want to minimise overhead. If you want to create an entry for this task and the parameters, use create_state().

This function returns a two-length tuple containing the SQLAlchemy instance, and a boolean flag indicating whether the entry was created (True) or just retrieved (False).

Parameters:defaults – [optional] A dictionary of default key, value pairs to provide if the entry needs to be created in the database.
create_state(self)

Create an entry in the database for this task, and its parameters.

delete_state(self, cascade=False)

Delete this task entry in the database.

Parameters:cascade – [optional] Cascade this to any tasks in this batch.
update_state(self, state, cascade=False)

Update the task entry in the database with the given state dictionary.

Parameters:cascade – [optional] Cascade this to any tasks in this batch.
trigger_event_start(self)

Trigger an event signalling that the task has started.

trigger_event_succeeded(self)

Trigger an event signalling that the task has succeeded.

trigger_event_failed(self)

Trigger an event signalling that the task has failed.

trigger_event_processing_time(self, duration, cascade=False)

Trigger the event that signals the processing time of the event.

Parameters:
  • duration – The time taken for this event.
  • cascade – [optional] Also trigger the task succeeded event (default: False).
_owner_list(self)

Turns the owner_email property into a list. This should not be overridden.

classmethod event_handler(cls, event)

Decorator for adding event handlers.

trigger_event(self, event, *args, **kwargs)

Trigger that calls all of the specified events associated with this class.

classmethod get_task_namespace(cls)

The task family for the given class.

Note: You normally don’t want to override this.

classmethod get_task_family(cls)

The task family for the given class.

If task_namespace is not set, then it’s simply the name of the class. Otherwise, <task_namespace>. is prefixed to the class name.

Note: You normally don’t want to override this.

classmethod get_params(cls)

Returns all of the Parameters for this Task.

classmethod batch_param_names(cls)
classmethod get_param_names(cls, include_significant=False)
classmethod get_param_values(cls, params, args, kwargs)

Get the values of the parameters from the args and kwargs.

Parameters:
  • params – list of (param_name, Parameter).
  • args – positional arguments
  • kwargs – keyword arguments.
Returns:

list of (name, value) tuples, one for each parameter.

initialized(self)

Returns True if the Task is initialized and False otherwise.

_get_param_visibilities(self)
clone(self, cls=None, **kwargs)

Creates a new instance from an existing instance where some of the args have changed.

There’s at least two scenarios where this is useful (see test/clone_test.py):

  • remove a lot of boiler plate when you have recursive dependencies and lots of args
  • there’s task inheritance and some logic is on the base class
Parameters:
  • cls
  • kwargs
Returns:

__hash__(self)

Return hash(self).

__eq__(self, other)

Return self==value.

complete(self)

If the task has any outputs, return True if all outputs exist. Otherwise, return False.

However, you may freely override this method with custom logic.

classmethod bulk_complete(cls, parameter_tuples)

Returns those of parameter_tuples for which this Task is complete.

Override (with an efficient implementation) for efficient scheduling with range tools. Keep the logic consistent with that of complete().

_requires(self)

Override in “template” tasks which themselves are supposed to be subclassed and thus have their requires() overridden (name preserved to provide consistent end-user experience), yet need to introduce (non-input) dependencies.

Must return an iterable which among others contains the _requires() of the superclass.

process_resources(self)

Override in “template” tasks which provide common resource functionality but allow subclasses to specify additional resources while preserving the name for consistent end-user experience.

input(self)

Returns the outputs of the Tasks returned by requires()

See Task.input

Returns:a list of Target objects which are specified as outputs of all required Tasks.
deps(self)

Internal method used by the scheduler.

Returns the flattened list of requires.

on_failure(self, exception)

Override for custom error handling.

This method gets called if an exception is raised in run(). The returned value of this method is json encoded and sent to the scheduler as the expl argument. Its string representation will be used as the body of the error email sent out if any.

Default behavior is to return a string representation of the stack trace.

on_success(self)

Override for doing custom completion handling for a larger class of tasks

This method gets called when run() completes without raising any exceptions.

The returned value is json encoded and sent to the scheduler as the expl argument.

Default behavior is to send an None value

no_unpicklable_properties(self)

Remove unpicklable properties before dump task and resume them after.

This method could be called in subtask’s dump method, to ensure unpicklable properties won’t break dump.

This method is a context-manager which can be called as below:

class astra.contrib.classifier.tasks.test.ClassifySourceGivenApVisitFile(*args, **kwargs)

Classify the type of stellar source, given an ApVisitFile.

This task requires the same parameters required by astra.contrib.classifer.train.TrainNIRSpectrumClassifier, and those required by astra.tasks.io.ApVisitFile.

network_factory
task_namespace = Classifier
training_spectra_path
training_labels_path
validation_spectra_path
validation_labels_path
test_spectra_path
test_labels_path
n_epochs
batch_size
weight_decay
learning_rate
max_batch_size = 10000
class_names

Names of classes used in this classifier.

use_slurm
slurm_nodes
slurm_ppn
slurm_walltime
slurm_alloc
slurm_partition
slurm_mem
slurm_gres
astra_version_major
astra_version_minor
astra_version_micro
astra_version_dev
strict_output_checking
is_batch_mode

A boolean property indicating whether the task is in batch mode or not.

output_base_dir

Base directory for storing task outputs.

_event_callbacks
priority = 0
disabled = False
resources
worker_timeout
batchable

True if this instance can be run as part of a batch. By default, True if it has any batched parameters

retry_count

Override this positive integer to have different retry_count at task level Check scheduler-config

disable_hard_timeout

Override this positive integer to have different disable_hard_timeout at task level. Check scheduler-config

disable_window

Override this positive integer to have different disable_window at task level. Check scheduler-config

disable_window_seconds
owner_email

Override this to send out additional error emails to task owner, in addition to the one defined in the global configuration. This should return a string or a list of strings. e.g. ‘test@exmaple.com’ or [‘test1@example.com’, ‘test2@example.com’]

use_cmdline_section

Property used by core config such as --workers etc. These will be exposed without the class as prefix.

accepts_messages

For configuring which scheduler messages can be received. When falsy, this tasks does not accept any message. When True, all messages are accepted.

task_module

Returns what Python module to import to get access to this class.

_visible_in_registry = True
__not_user_specified = __not_user_specified
_namespace_at_class_time
task_family

DEPRECATED since after 2.4.0. See get_task_family() instead. Hopefully there will be less meta magic in Luigi.

Convenience method since a property on the metaclass isn’t directly accessible through the class instances.

param_args
read_observation(self)

Read the input observation from disk.

prepare_batch(self)

Prepare the input observation for being batched through the network.

run(self)

Execute the task.

output(self)

The output of the task.

read_network(self)

Read the network from the input path.

prepare_result(self, log_prob)
get_tqdm_kwds(self, desc=None)
_warn_on_wrong_param_types(self, strict=False)
__repr__(self)

Build a task representation like MyTask(hash: param1=1.5, param2='5')

get_common_param_kwargs(self, klass, include_significant=True)
get_common_param_names(self, klass, include_significant=True)
get_hashed_params(self, only_significant=True, only_public=False)
to_str_params(self, only_significant=True, only_public=False)

Convert all parameters to a str->str hash.

classmethod from_str_params(cls, params_str)

Creates an instance from a str->str hash. :param params_str: dict of param name -> value as string.

get_batch_task_kwds(self, include_non_batch_keywords=True)
get_batch_tasks(self)

A generator that yields task(s) that are to be run. Works in single or batch mode.

get_batch_size(self)

Get the number of batched tasks.

get_input(self, key)

Return a single input from the task, assuming the inputs are a dictionary. This can be performed by using task.input()[key], but when there are many inputs (e.g., in batch mode), this can be unnecessarily slow.

Parameters:key – The key of the requirements dictionary to return.
requires(self)

The requirements of this task.

query_state(self, full_output=False)

Query the database for this task and return the SQLAlchemy ORM Query.

Parameters:full_output – [optional] Optionally return a three-length tuple containing the ORM query, database model, and keywords to filter by.
get_or_create_state(self, defaults=None)

Get (or create) an entry in the database for this task.

Note that this will only create an entry for the task, and not for the parameters of the task. This is useful when creating many task entries, with the intent you will create the parameter entries later, and you want to minimise overhead. If you want to create an entry for this task and the parameters, use create_state().

This function returns a two-length tuple containing the SQLAlchemy instance, and a boolean flag indicating whether the entry was created (True) or just retrieved (False).

Parameters:defaults – [optional] A dictionary of default key, value pairs to provide if the entry needs to be created in the database.
create_state(self)

Create an entry in the database for this task, and its parameters.

delete_state(self, cascade=False)

Delete this task entry in the database.

Parameters:cascade – [optional] Cascade this to any tasks in this batch.
update_state(self, state, cascade=False)

Update the task entry in the database with the given state dictionary.

Parameters:cascade – [optional] Cascade this to any tasks in this batch.
trigger_event_start(self)

Trigger an event signalling that the task has started.

trigger_event_succeeded(self)

Trigger an event signalling that the task has succeeded.

trigger_event_failed(self)

Trigger an event signalling that the task has failed.

trigger_event_processing_time(self, duration, cascade=False)

Trigger the event that signals the processing time of the event.

Parameters:
  • duration – The time taken for this event.
  • cascade – [optional] Also trigger the task succeeded event (default: False).
_owner_list(self)

Turns the owner_email property into a list. This should not be overridden.

classmethod event_handler(cls, event)

Decorator for adding event handlers.

trigger_event(self, event, *args, **kwargs)

Trigger that calls all of the specified events associated with this class.

classmethod get_task_namespace(cls)

The task family for the given class.

Note: You normally don’t want to override this.

classmethod get_task_family(cls)

The task family for the given class.

If task_namespace is not set, then it’s simply the name of the class. Otherwise, <task_namespace>. is prefixed to the class name.

Note: You normally don’t want to override this.

classmethod get_params(cls)

Returns all of the Parameters for this Task.

classmethod batch_param_names(cls)
classmethod get_param_names(cls, include_significant=False)
classmethod get_param_values(cls, params, args, kwargs)

Get the values of the parameters from the args and kwargs.

Parameters:
  • params – list of (param_name, Parameter).
  • args – positional arguments
  • kwargs – keyword arguments.
Returns:

list of (name, value) tuples, one for each parameter.

initialized(self)

Returns True if the Task is initialized and False otherwise.

_get_param_visibilities(self)
clone(self, cls=None, **kwargs)

Creates a new instance from an existing instance where some of the args have changed.

There’s at least two scenarios where this is useful (see test/clone_test.py):

  • remove a lot of boiler plate when you have recursive dependencies and lots of args
  • there’s task inheritance and some logic is on the base class
Parameters:
  • cls
  • kwargs
Returns:

__hash__(self)

Return hash(self).

__eq__(self, other)

Return self==value.

complete(self)

If the task has any outputs, return True if all outputs exist. Otherwise, return False.

However, you may freely override this method with custom logic.

classmethod bulk_complete(cls, parameter_tuples)

Returns those of parameter_tuples for which this Task is complete.

Override (with an efficient implementation) for efficient scheduling with range tools. Keep the logic consistent with that of complete().

_requires(self)

Override in “template” tasks which themselves are supposed to be subclassed and thus have their requires() overridden (name preserved to provide consistent end-user experience), yet need to introduce (non-input) dependencies.

Must return an iterable which among others contains the _requires() of the superclass.

process_resources(self)

Override in “template” tasks which provide common resource functionality but allow subclasses to specify additional resources while preserving the name for consistent end-user experience.

input(self)

Returns the outputs of the Tasks returned by requires()

See Task.input

Returns:a list of Target objects which are specified as outputs of all required Tasks.
deps(self)

Internal method used by the scheduler.

Returns the flattened list of requires.

on_failure(self, exception)

Override for custom error handling.

This method gets called if an exception is raised in run(). The returned value of this method is json encoded and sent to the scheduler as the expl argument. Its string representation will be used as the body of the error email sent out if any.

Default behavior is to return a string representation of the stack trace.

on_success(self)

Override for doing custom completion handling for a larger class of tasks

This method gets called when run() completes without raising any exceptions.

The returned value is json encoded and sent to the scheduler as the expl argument.

Default behavior is to send an None value

no_unpicklable_properties(self)

Remove unpicklable properties before dump task and resume them after.

This method could be called in subtask’s dump method, to ensure unpicklable properties won’t break dump.

This method is a context-manager which can be called as below:

class astra.contrib.classifier.tasks.test.ClassifySourceGivenApStarFile(*args, **kwargs)

Classify an ApStar source from the joint classifications of individual visits (the ApVisit files that went into that ApStar).

This task requires the same parameters required by astra.contrib.classifer.train.TrainNIRSpectrumClassifier, and those required by astra.tasks.io.ApStarFile.

task_namespace = Classifier
training_spectra_path
training_labels_path
validation_spectra_path
validation_labels_path
test_spectra_path
test_labels_path
n_epochs
batch_size
weight_decay
learning_rate
max_batch_size = 10000
class_names

Names of classes used in this classifier.

use_slurm
slurm_nodes
slurm_ppn
slurm_walltime
slurm_alloc
slurm_partition
slurm_mem
slurm_gres
astra_version_major
astra_version_minor
astra_version_micro
astra_version_dev
strict_output_checking
is_batch_mode

A boolean property indicating whether the task is in batch mode or not.

output_base_dir

Base directory for storing task outputs.

_event_callbacks
priority = 0
disabled = False
resources
worker_timeout
batchable

True if this instance can be run as part of a batch. By default, True if it has any batched parameters

retry_count

Override this positive integer to have different retry_count at task level Check scheduler-config

disable_hard_timeout

Override this positive integer to have different disable_hard_timeout at task level. Check scheduler-config

disable_window

Override this positive integer to have different disable_window at task level. Check scheduler-config

disable_window_seconds
owner_email

Override this to send out additional error emails to task owner, in addition to the one defined in the global configuration. This should return a string or a list of strings. e.g. ‘test@exmaple.com’ or [‘test1@example.com’, ‘test2@example.com’]

use_cmdline_section

Property used by core config such as --workers etc. These will be exposed without the class as prefix.

accepts_messages

For configuring which scheduler messages can be received. When falsy, this tasks does not accept any message. When True, all messages are accepted.

task_module

Returns what Python module to import to get access to this class.

_visible_in_registry = True
__not_user_specified = __not_user_specified
_namespace_at_class_time
task_family

DEPRECATED since after 2.4.0. See get_task_family() instead. Hopefully there will be less meta magic in Luigi.

Convenience method since a property on the metaclass isn’t directly accessible through the class instances.

param_args
sdss_data_model_name = apStar
obj
healpix
apstar
apred
telescope
release
public
use_remote
remote_access_method
mirror
verbose
tree
local_path

The local path of the file.

remote_path

The remote path of the file. Useful for debugging path problems.

This is relatively expensive to return, so don’t use this to download sources. Instead use one instance of sdss_access.HttpAccess to get the remote paths of many sources.

requires(self)

We require the classifications from individual visits of this source.

run(self)

Execute the task.

output(self)

The output for these results.

read_network(self)

Read the network from the input path.

prepare_result(self, log_prob)
get_tqdm_kwds(self, desc=None)
_warn_on_wrong_param_types(self, strict=False)
__repr__(self)

Build a task representation like MyTask(hash: param1=1.5, param2='5')

get_common_param_kwargs(self, klass, include_significant=True)
get_common_param_names(self, klass, include_significant=True)
get_hashed_params(self, only_significant=True, only_public=False)
to_str_params(self, only_significant=True, only_public=False)

Convert all parameters to a str->str hash.

classmethod from_str_params(cls, params_str)

Creates an instance from a str->str hash. :param params_str: dict of param name -> value as string.

get_batch_task_kwds(self, include_non_batch_keywords=True)
get_batch_tasks(self)

A generator that yields task(s) that are to be run. Works in single or batch mode.

get_batch_size(self)

Get the number of batched tasks.

get_input(self, key)

Return a single input from the task, assuming the inputs are a dictionary. This can be performed by using task.input()[key], but when there are many inputs (e.g., in batch mode), this can be unnecessarily slow.

Parameters:key – The key of the requirements dictionary to return.
query_state(self, full_output=False)

Query the database for this task and return the SQLAlchemy ORM Query.

Parameters:full_output – [optional] Optionally return a three-length tuple containing the ORM query, database model, and keywords to filter by.
get_or_create_state(self, defaults=None)

Get (or create) an entry in the database for this task.

Note that this will only create an entry for the task, and not for the parameters of the task. This is useful when creating many task entries, with the intent you will create the parameter entries later, and you want to minimise overhead. If you want to create an entry for this task and the parameters, use create_state().

This function returns a two-length tuple containing the SQLAlchemy instance, and a boolean flag indicating whether the entry was created (True) or just retrieved (False).

Parameters:defaults – [optional] A dictionary of default key, value pairs to provide if the entry needs to be created in the database.
create_state(self)

Create an entry in the database for this task, and its parameters.

delete_state(self, cascade=False)

Delete this task entry in the database.

Parameters:cascade – [optional] Cascade this to any tasks in this batch.
update_state(self, state, cascade=False)

Update the task entry in the database with the given state dictionary.

Parameters:cascade – [optional] Cascade this to any tasks in this batch.
trigger_event_start(self)

Trigger an event signalling that the task has started.

trigger_event_succeeded(self)

Trigger an event signalling that the task has succeeded.

trigger_event_failed(self)

Trigger an event signalling that the task has failed.

trigger_event_processing_time(self, duration, cascade=False)

Trigger the event that signals the processing time of the event.

Parameters:
  • duration – The time taken for this event.
  • cascade – [optional] Also trigger the task succeeded event (default: False).
_owner_list(self)

Turns the owner_email property into a list. This should not be overridden.

classmethod event_handler(cls, event)

Decorator for adding event handlers.

trigger_event(self, event, *args, **kwargs)

Trigger that calls all of the specified events associated with this class.

classmethod get_task_namespace(cls)

The task family for the given class.

Note: You normally don’t want to override this.

classmethod get_task_family(cls)

The task family for the given class.

If task_namespace is not set, then it’s simply the name of the class. Otherwise, <task_namespace>. is prefixed to the class name.

Note: You normally don’t want to override this.

classmethod get_params(cls)

Returns all of the Parameters for this Task.

classmethod batch_param_names(cls)
classmethod get_param_names(cls, include_significant=False)
classmethod get_param_values(cls, params, args, kwargs)

Get the values of the parameters from the args and kwargs.

Parameters:
  • params – list of (param_name, Parameter).
  • args – positional arguments
  • kwargs – keyword arguments.
Returns:

list of (name, value) tuples, one for each parameter.

initialized(self)

Returns True if the Task is initialized and False otherwise.

_get_param_visibilities(self)
clone(self, cls=None, **kwargs)

Creates a new instance from an existing instance where some of the args have changed.

There’s at least two scenarios where this is useful (see test/clone_test.py):

  • remove a lot of boiler plate when you have recursive dependencies and lots of args
  • there’s task inheritance and some logic is on the base class
Parameters:
  • cls
  • kwargs
Returns:

__hash__(self)

Return hash(self).

__eq__(self, other)

Return self==value.

complete(self)

If the task has any outputs, return True if all outputs exist. Otherwise, return False.

However, you may freely override this method with custom logic.

classmethod bulk_complete(cls, parameter_tuples)

Returns those of parameter_tuples for which this Task is complete.

Override (with an efficient implementation) for efficient scheduling with range tools. Keep the logic consistent with that of complete().

_requires(self)

Override in “template” tasks which themselves are supposed to be subclassed and thus have their requires() overridden (name preserved to provide consistent end-user experience), yet need to introduce (non-input) dependencies.

Must return an iterable which among others contains the _requires() of the superclass.

process_resources(self)

Override in “template” tasks which provide common resource functionality but allow subclasses to specify additional resources while preserving the name for consistent end-user experience.

input(self)

Returns the outputs of the Tasks returned by requires()

See Task.input

Returns:a list of Target objects which are specified as outputs of all required Tasks.
deps(self)

Internal method used by the scheduler.

Returns the flattened list of requires.

on_failure(self, exception)

Override for custom error handling.

This method gets called if an exception is raised in run(). The returned value of this method is json encoded and sent to the scheduler as the expl argument. Its string representation will be used as the body of the error email sent out if any.

Default behavior is to return a string representation of the stack trace.

on_success(self)

Override for doing custom completion handling for a larger class of tasks

This method gets called when run() completes without raising any exceptions.

The returned value is json encoded and sent to the scheduler as the expl argument.

Default behavior is to send an None value

no_unpicklable_properties(self)

Remove unpicklable properties before dump task and resume them after.

This method could be called in subtask’s dump method, to ensure unpicklable properties won’t break dump.

This method is a context-manager which can be called as below:

get_or_create_data_model_relationships(self)

Return the keywords that reference the input data model for this task.

writer(self, spectrum, path, **kwargs)
classmethod get_local_path(cls, release, public=True, mirror=False, verbose=True, **kwargs)
get_remote_http(self)

Download the remote file using HTTP.

get_remote_rsync(self)

Download the remote file using rsync.

get_remote(self)

Download the remote file.

class astra.contrib.classifier.tasks.test.ClassifySourceGivenSDSS4ApVisitFile(*args, **kwargs)

Classify the type of stellar source, given an ApVisitFile.

This task requires the same parameters required by astra.contrib.classifer.train.TrainNIRSpectrumClassifier, and those required by astra.tasks.io.sdss4.ApVisitFile.

network_factory
task_namespace = Classifier
training_spectra_path
training_labels_path
validation_spectra_path
validation_labels_path
test_spectra_path
test_labels_path
n_epochs
batch_size
weight_decay
learning_rate
max_batch_size = 10000
class_names

Names of classes used in this classifier.

use_slurm
slurm_nodes
slurm_ppn
slurm_walltime
slurm_alloc
slurm_partition
slurm_mem
slurm_gres
astra_version_major
astra_version_minor
astra_version_micro
astra_version_dev
strict_output_checking
is_batch_mode

A boolean property indicating whether the task is in batch mode or not.

output_base_dir

Base directory for storing task outputs.

_event_callbacks
priority = 0
disabled = False
resources
worker_timeout
batchable

True if this instance can be run as part of a batch. By default, True if it has any batched parameters

retry_count

Override this positive integer to have different retry_count at task level Check scheduler-config

disable_hard_timeout

Override this positive integer to have different disable_hard_timeout at task level. Check scheduler-config

disable_window

Override this positive integer to have different disable_window at task level. Check scheduler-config

disable_window_seconds
owner_email

Override this to send out additional error emails to task owner, in addition to the one defined in the global configuration. This should return a string or a list of strings. e.g. ‘test@exmaple.com’ or [‘test1@example.com’, ‘test2@example.com’]

use_cmdline_section

Property used by core config such as --workers etc. These will be exposed without the class as prefix.

accepts_messages

For configuring which scheduler messages can be received. When falsy, this tasks does not accept any message. When True, all messages are accepted.

task_module

Returns what Python module to import to get access to this class.

_visible_in_registry = True
__not_user_specified = __not_user_specified
_namespace_at_class_time
task_family

DEPRECATED since after 2.4.0. See get_task_family() instead. Hopefully there will be less meta magic in Luigi.

Convenience method since a property on the metaclass isn’t directly accessible through the class instances.

param_args
read_observation(self)

Read the input observation from disk.

prepare_batch(self)

Prepare the input observation for being batched through the network.

run(self)

Execute the task.

output(self)

The output of the task.

read_network(self)

Read the network from the input path.

prepare_result(self, log_prob)
get_tqdm_kwds(self, desc=None)
_warn_on_wrong_param_types(self, strict=False)
__repr__(self)

Build a task representation like MyTask(hash: param1=1.5, param2='5')

get_common_param_kwargs(self, klass, include_significant=True)
get_common_param_names(self, klass, include_significant=True)
get_hashed_params(self, only_significant=True, only_public=False)
to_str_params(self, only_significant=True, only_public=False)

Convert all parameters to a str->str hash.

classmethod from_str_params(cls, params_str)

Creates an instance from a str->str hash. :param params_str: dict of param name -> value as string.

get_batch_task_kwds(self, include_non_batch_keywords=True)
get_batch_tasks(self)

A generator that yields task(s) that are to be run. Works in single or batch mode.

get_batch_size(self)

Get the number of batched tasks.

get_input(self, key)

Return a single input from the task, assuming the inputs are a dictionary. This can be performed by using task.input()[key], but when there are many inputs (e.g., in batch mode), this can be unnecessarily slow.

Parameters:key – The key of the requirements dictionary to return.
requires(self)

The requirements of this task.

query_state(self, full_output=False)

Query the database for this task and return the SQLAlchemy ORM Query.

Parameters:full_output – [optional] Optionally return a three-length tuple containing the ORM query, database model, and keywords to filter by.
get_or_create_state(self, defaults=None)

Get (or create) an entry in the database for this task.

Note that this will only create an entry for the task, and not for the parameters of the task. This is useful when creating many task entries, with the intent you will create the parameter entries later, and you want to minimise overhead. If you want to create an entry for this task and the parameters, use create_state().

This function returns a two-length tuple containing the SQLAlchemy instance, and a boolean flag indicating whether the entry was created (True) or just retrieved (False).

Parameters:defaults – [optional] A dictionary of default key, value pairs to provide if the entry needs to be created in the database.
create_state(self)

Create an entry in the database for this task, and its parameters.

delete_state(self, cascade=False)

Delete this task entry in the database.

Parameters:cascade – [optional] Cascade this to any tasks in this batch.
update_state(self, state, cascade=False)

Update the task entry in the database with the given state dictionary.

Parameters:cascade – [optional] Cascade this to any tasks in this batch.
trigger_event_start(self)

Trigger an event signalling that the task has started.

trigger_event_succeeded(self)

Trigger an event signalling that the task has succeeded.

trigger_event_failed(self)

Trigger an event signalling that the task has failed.

trigger_event_processing_time(self, duration, cascade=False)

Trigger the event that signals the processing time of the event.

Parameters:
  • duration – The time taken for this event.
  • cascade – [optional] Also trigger the task succeeded event (default: False).
_owner_list(self)

Turns the owner_email property into a list. This should not be overridden.

classmethod event_handler(cls, event)

Decorator for adding event handlers.

trigger_event(self, event, *args, **kwargs)

Trigger that calls all of the specified events associated with this class.

classmethod get_task_namespace(cls)

The task family for the given class.

Note: You normally don’t want to override this.

classmethod get_task_family(cls)

The task family for the given class.

If task_namespace is not set, then it’s simply the name of the class. Otherwise, <task_namespace>. is prefixed to the class name.

Note: You normally don’t want to override this.

classmethod get_params(cls)

Returns all of the Parameters for this Task.

classmethod batch_param_names(cls)
classmethod get_param_names(cls, include_significant=False)
classmethod get_param_values(cls, params, args, kwargs)

Get the values of the parameters from the args and kwargs.

Parameters:
  • params – list of (param_name, Parameter).
  • args – positional arguments
  • kwargs – keyword arguments.
Returns:

list of (name, value) tuples, one for each parameter.

initialized(self)

Returns True if the Task is initialized and False otherwise.

_get_param_visibilities(self)
clone(self, cls=None, **kwargs)

Creates a new instance from an existing instance where some of the args have changed.

There’s at least two scenarios where this is useful (see test/clone_test.py):

  • remove a lot of boiler plate when you have recursive dependencies and lots of args
  • there’s task inheritance and some logic is on the base class
Parameters:
  • cls
  • kwargs
Returns:

__hash__(self)

Return hash(self).

__eq__(self, other)

Return self==value.

complete(self)

If the task has any outputs, return True if all outputs exist. Otherwise, return False.

However, you may freely override this method with custom logic.

classmethod bulk_complete(cls, parameter_tuples)

Returns those of parameter_tuples for which this Task is complete.

Override (with an efficient implementation) for efficient scheduling with range tools. Keep the logic consistent with that of complete().

_requires(self)

Override in “template” tasks which themselves are supposed to be subclassed and thus have their requires() overridden (name preserved to provide consistent end-user experience), yet need to introduce (non-input) dependencies.

Must return an iterable which among others contains the _requires() of the superclass.

process_resources(self)

Override in “template” tasks which provide common resource functionality but allow subclasses to specify additional resources while preserving the name for consistent end-user experience.

input(self)

Returns the outputs of the Tasks returned by requires()

See Task.input

Returns:a list of Target objects which are specified as outputs of all required Tasks.
deps(self)

Internal method used by the scheduler.

Returns the flattened list of requires.

on_failure(self, exception)

Override for custom error handling.

This method gets called if an exception is raised in run(). The returned value of this method is json encoded and sent to the scheduler as the expl argument. Its string representation will be used as the body of the error email sent out if any.

Default behavior is to return a string representation of the stack trace.

on_success(self)

Override for doing custom completion handling for a larger class of tasks

This method gets called when run() completes without raising any exceptions.

The returned value is json encoded and sent to the scheduler as the expl argument.

Default behavior is to send an None value

no_unpicklable_properties(self)

Remove unpicklable properties before dump task and resume them after.

This method could be called in subtask’s dump method, to ensure unpicklable properties won’t break dump.

This method is a context-manager which can be called as below: