.. _learnloop:

Active Learning loop
====================

Details on running a single loop
--------------------------------

Once the data have been pre-processed, analysis steps 2-4 can be performed directly using the ``DataBase`` object.
To start, load the feature information:

.. code-block:: python
   :linenos:

   >>> from resspect import DataBase
   >>> path_to_features_file = 'results/Bazin.csv'
   >>> data = DataBase()
   >>> data.load_features(path_to_features_file, feature_extractor='Bazin', screen=True)
   Loaded 21284 samples!


Notice that this data set has a predetermined separation between training and test samples:

.. code-block:: python
   :linenos:

   >>> data.metadata['orig_sample'].unique()
   array(['test', 'train'], dtype=object)


You can choose to start the first iteration of the active learning loop either from the original training sample
flagged in the file or from scratch. As this is our first example, let's keep it simple and start from the original
training sample. The code below builds the respective samples and performs the classification:

.. code-block:: python
   :linenos:

   >>> data.build_samples(initial_training='original', nclass=2, screen=True)
   ** Inside build_orig_samples: **
      Training set size: 1093
      Test set size: 20191
      Validation set size: 20191
      Pool set size: 20191
         From which queryable: 20191
   >>> data.classify(method='RandomForest')
   >>> data.classprob  # check classification probabilities
   array([[0.461, 0.539],
          [0.346, 0.654],
          ...,
          [0.398, 0.602],
          [0.396, 0.604]])


.. hint:: If you wish to start from scratch, set ``initial_training=N``, where ``N`` is the number of objects you want in the initial training sample. The code will then randomly select ``N`` objects from the entire sample as the initial training sample, imposing that at least half of them are SNe Ia.

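
The random initial selection described in the hint can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not RESSPECT's implementation: the helper name ``random_initial_training`` and the ``'Ia'`` label string are hypothetical, and the sketch enforces the "at least half" constraint by simply drawing exactly ``ceil(N/2)`` SNe Ia.

.. code-block:: python

   import random


   def random_initial_training(labels, n, seed=None):
       """Hypothetical helper: pick ``n`` random indices, at least half of them SNe Ia.

       ``labels`` is a list of class names; ``'Ia'`` marks a type Ia supernova
       (an assumed label convention, not necessarily RESSPECT's).
       """
       rng = random.Random(seed)
       ia_idx = [i for i, lab in enumerate(labels) if lab == 'Ia']
       other_idx = [i for i, lab in enumerate(labels) if lab != 'Ia']

       # Draw ceil(n / 2) Ia objects so that at least half of the sample is Ia.
       n_ia = (n + 1) // 2
       if n_ia > len(ia_idx) or n - n_ia > len(other_idx):
           raise ValueError('not enough objects to build the requested sample')

       chosen = rng.sample(ia_idx, n_ia) + rng.sample(other_idx, n - n_ia)
       rng.shuffle(chosen)  # avoid having all Ia objects first
       return chosen


   labels = ['Ia', 'II', 'Ibc', 'Ia', 'II', 'Ia', 'II', 'II']
   train_idx = random_initial_training(labels, n=4, seed=42)
   print([labels[i] for i in train_idx])

Fixing the Ia quota keeps the sketch simple; a real implementation could instead allow more than half of the objects to be Ia, as long as the lower bound holds.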
For a binary classification, the output from the classifier for each object (row) is presented as a pair of floats: the first column
corresponds to the probability of the given object being a SN Ia and the second column to its complement.
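
To turn these probability pairs into hard labels, one common choice is to compare the Ia probability with a threshold of 0.5, which for two classes is equivalent to picking the more probable class. The sketch below assumes the column order described above (Ia first) and uses the four rows shown in the ``data.classprob`` output; the helper name ``predict_labels`` and the 0.5 threshold are assumptions, not necessarily what RESSPECT uses internally.

.. code-block:: python

   def predict_labels(classprob, threshold=0.5):
       """Hypothetical helper: map [P(Ia), P(non-Ia)] pairs to 'Ia' / 'non-Ia'."""
       labels = []
       for p_ia, p_other in classprob:
           # The two columns are complementary, so each row should sum to 1.
           if abs(p_ia + p_other - 1.0) > 1e-6:
               raise ValueError('probabilities in a row do not sum to 1')
           labels.append('Ia' if p_ia >= threshold else 'non-Ia')
       return labels


   # The four rows displayed in the data.classprob output above.
   classprob = [[0.461, 0.539],
                [0.346, 0.654],
                [0.398, 0.602],
                [0.396, 0.604]]
   print(predict_labels(classprob))  # ['non-Ia', 'non-Ia', 'non-Ia', 'non-Ia']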

Given the output from the classifier, we can calculate the metric(s) of choice:

.. code-block:: python
   :linenos:

   >>> data.evaluate_classification(metric_label='snpcc')
   >>> print(data.metrics_list_names)  # check metric header
   ['acc', 'eff', 'pur', 'fom']
   >>> print(data.metrics_list_values)  # check metric values
   [0.5975434599574068, 0.9024767801857585, 0.34684684684684686, 0.13572404702012383]

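
The four ``snpcc`` quantities can be sketched from confusion-matrix counts, assuming the Supernova Photometric Classification Challenge (SNPCC) definitions: accuracy is the fraction of correctly classified objects, efficiency the fraction of true SNe Ia classified as Ia, purity the fraction of Ia-classified objects that really are Ia, and the figure of merit is efficiency times a pseudo-purity in which each false positive is penalized with a weight ``W = 3``. The helper names ``snpcc_metrics`` and ``fom_from_eff_pur`` are hypothetical, and RESSPECT's own implementation may differ in details such as the handling of empty classes.

.. code-block:: python

   def _ratio(num, den):
       """Return num / den, or 0.0 when the denominator is zero (assumed convention)."""
       return num / den if den else 0.0


   def snpcc_metrics(n_true_pos, n_false_pos, n_false_neg, n_true_neg, penalty=3.0):
       """Hypothetical helper: accuracy, efficiency, purity and figure of merit."""
       total = n_true_pos + n_false_pos + n_false_neg + n_true_neg
       n_ia_true = n_true_pos + n_false_neg    # all real SNe Ia
       n_ia_pred = n_true_pos + n_false_pos    # all objects classified as Ia

       acc = _ratio(n_true_pos + n_true_neg, total)
       eff = _ratio(n_true_pos, n_ia_true)
       pur = _ratio(n_true_pos, n_ia_pred)
       # Pseudo-purity: every false positive counts ``penalty`` times.
       pseudo_pur = _ratio(n_true_pos, n_true_pos + penalty * n_false_pos)
       return {'acc': acc, 'eff': eff, 'pur': pur, 'fom': eff * pseudo_pur}


   def fom_from_eff_pur(eff, pur, penalty=3.0):
       """Same figure of merit, rewritten using FP / TP = (1 - pur) / pur."""
       return eff * pur / (pur + penalty * (1.0 - pur))


   print(snpcc_metrics(n_true_pos=8, n_false_pos=2, n_false_neg=2, n_true_neg=8))
   # Efficiency and purity values printed by evaluate_classification above.
   print(round(fom_from_eff_pur(0.9024767801857585, 0.34684684684684686), 4))  # 0.1357

Plugging the ``eff`` and ``pur`` values reported above into ``fom_from_eff_pur`` reproduces the reported ``fom`` of about 0.1357, which is consistent with a false-positive penalty of ``W = 3``.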

Running a number of iterations in sequence
------------------------------------------

We provide a function that performs all the above steps in sequence for a given number of iterations.
In interactive mode, define the required variables and call the ``learn_loop`` function from the :py:mod:`resspect.learn_loop` module:

.. code-block:: python
   :linenos:

   >>> from resspect.learn_loop import learn_loop
   >>> from resspect import LoopConfiguration
   >>> nloops = 1000                         # number of iterations
   >>> method = 'Bazin'                      # only option in v1.0
   >>> ml = 'RandomForest'                   # classifier
   >>> strategy = 'RandomSampling'           # learning strategy
   >>> input_file = 'results/Bazin.csv'      # input features file
   >>> metric = 'results/metrics.csv'        # output metrics file
   >>> queried = 'results/queried.csv'       # output query file
   >>> train = 'original'                    # initial training
   >>> batch = 1                             # size of batch
   >>> learn_loop(LoopConfiguration(nloops=nloops, features_method=method, classifier=ml,
   ...                              strategy=strategy, path_to_features=input_file,
   ...                              output_metrics_file=metric, output_queried_file=queried,
   ...                              training=train, batch=batch))

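
Conceptually, each iteration retrains the classifier, evaluates it, and then moves ``batch`` objects from the query pool into the training sample according to the chosen strategy. The plain-Python sketch below illustrates only this bookkeeping for a random-sampling strategy. The names ``random_sampling_query`` and ``toy_loop`` are hypothetical, the classifier and metric steps are left as comments, and RESSPECT's actual ``learn_loop`` may order or implement these steps differently.

.. code-block:: python

   import random


   def random_sampling_query(pool_ids, batch, rng):
       """Hypothetical random-sampling strategy: pick ``batch`` ids from the pool."""
       return rng.sample(pool_ids, min(batch, len(pool_ids)))


   def toy_loop(train_ids, pool_ids, nloops, batch, seed=0):
       """Move ``batch`` objects per iteration from the pool into the training set."""
       rng = random.Random(seed)
       train_ids, pool_ids = list(train_ids), list(pool_ids)
       queried_log = []
       for loop in range(nloops):
           if not pool_ids:
               break  # nothing left to query
           # 1. (re)train the classifier on train_ids and evaluate it (omitted here)
           # 2. ask the strategy which objects should be labelled next
           queried = random_sampling_query(pool_ids, batch, rng)
           # 3. "label" them: move them from the pool into the training sample
           for obj in queried:
               pool_ids.remove(obj)
               train_ids.append(obj)
           queried_log.append((loop, queried))
       return train_ids, pool_ids, queried_log


   train, pool, log = toy_loop(train_ids=range(5), pool_ids=range(5, 25), nloops=3, batch=2)
   print(len(train), len(pool))  # 11 14

With ``batch = 1`` and ``nloops = 1000`` as in the example above, the training sample would grow by one object per iteration, which is why the queried objects are written to a separate output file for later inspection.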

Alternatively, you can run everything from the command line:

.. code-block:: bash

   run_loop -i <input features file> -b <batch size> -n <number of loops> \
            -m <output metrics file>