Saving, loading and checkpointing models¶
This example covers AutoTransformers’ save/load features to make trained models persistent, load already-trained models again, and continue training from a saved model.
[ ]:
from autotransformers import AutoTransformer, DatasetLoader
We set up the dataset and model as in the previous example:
[ ]:
# The text snippets in this dataset are from "googleplay", a public dataset of app reviews on Google's Play Store.
dataset = {
"meta": {
"name": "example_singlelabel",
"version": "1.0.0",
"created_with": "wizard"
},
"config": [
{
"domain": "text",
"type": "IText"
},
{
"task_id": "task1",
"classes": ["positive", "neutral", "negative"],
"type": "TSingleClassification"
}
],
"train": [
[
{"value": "None of the notifications work. other people in the forums teport similar problems bht no fix. the app is nice but it isnt nearly as functional without notifications"},
{"value": "negative"},
],
[
{"value": "It's great"},
{"value": "positive"},
],
[
{"value": "Not allowing me to delete my account"},
{"value": "negative"},
],
[
{"value": "So impressed that I bought premium on very first day"},
{"value": "positive"},
],
],
"test": [
[
{"value": "Can't set more than 7 tasks without paying an absurdly expensive weekly subscription"},
{"value": "negative"},
]
],
}
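Before wiring the dataset into the loader, a few lines of plain Python (not part of the AutoTransformers API; the helper name is made up for this example) can verify that every sample pairs a non-empty text snippet with a label from the declared classes:

```python
def validate_samples(samples, classes):
    """Check that each sample is a (text, label) pair with a known label."""
    for text, label in samples:
        assert isinstance(text["value"], str) and text["value"], "empty text"
        assert label["value"] in classes, f"unknown label: {label['value']!r}"
```

For the dataset above, `validate_samples(dataset["train"], {"positive", "neutral", "negative"})` passes silently and raises an `AssertionError` for any malformed sample.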
[ ]:
dl = DatasetLoader(dataset)
config = [
("engine/stop_condition/type", "MaxEpochs"),
("engine/stop_condition/value", 1),
]
at = AutoTransformer(config)
path
- this is the checkpointing location where intermediate steps are stored during training. Checkpoints enable you to stop training and resume later, and prevent a complete loss of progress if something goes wrong.
[ ]:
at.init(dataset_loader=dl, path=".models/example07")
at.train(dl)
In case the training run failed (e.g. the system crashed, it ran out of memory, …), it is possible to continue training from the latest checkpoint. Simply load the model from this path and call train again.
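Conceptually, this resume-from-checkpoint pattern is independent of AutoTransformers. The toy sketch below (plain Python, with a pickled step counter standing in for real model and optimizer state) illustrates the mechanism:

```python
import pickle
from pathlib import Path

def toy_train(ckpt_path, total_steps, crash_at=None):
    """Toy training loop: resume from the checkpoint if one exists.

    A pickled step counter stands in for real model/optimizer state.
    """
    ckpt = Path(ckpt_path)
    # Resume from the last persisted step, or start from scratch.
    step = pickle.loads(ckpt.read_bytes()) if ckpt.exists() else 0
    while step < total_steps:
        if crash_at is not None and step == crash_at:
            raise RuntimeError("simulated crash")
        step += 1  # one "training step"
        ckpt.write_bytes(pickle.dumps(step))  # checkpoint after each step
    return step
```

If the first call crashes mid-run, a second call with the same path picks up at the last checkpointed step instead of starting over, which is exactly what reloading an AutoTransformer from its path and calling train again does.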
[ ]:
del at # remove the old AutoTransformer (to simulate a restart)
at = AutoTransformer(config)
at.load(path=".models/example07")
at.train(dl) # continue training
When training is finished, the checkpoint is automatically converted into a saved model. A saved model is intended only for inference, whereas a checkpoint is intended for further training. A saved model therefore does not persist internal state that is only needed for training, such as the optimizer or subsampler states, which drastically reduces its size on disk.
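To see this size reduction for yourself, a small helper (plain Python, independent of AutoTransformers; the function name is made up for this example) can measure the on-disk footprint of a model directory:

```python
from pathlib import Path

def dir_size_mb(path):
    """Total size of all regular files below `path`, in megabytes."""
    return sum(f.stat().st_size for f in Path(path).rglob("*") if f.is_file()) / 1e6
```

Calling `dir_size_mb(".models/example07")` before and after training finishes should show the drop from the full checkpoint to the inference-only saved model.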
Additionally, it's possible to explicitly save a model using the save() method:
[ ]:
at.save(".models/example07_new")
# Clean up memory. Usually not required if independent train and predict scripts are executed.
at.finish()
del at
Later, we can load the weights into a new AutoTransformer (e.g. in another script or on another node) and predict new data using the knowledge gained from training:
[ ]:
new_at = AutoTransformer().load(".models/example07")
res = new_at(["This app is amazing!", "While I do like the ease of use, I think it's too expensive."])
[(input.value, output.value) for input, output in res]
Attempting to continue training from a saved model won’t train it again, since the training stage is already marked as “finished”.
[ ]:
new_at.train(dl) # this call will return immediately
Nevertheless, sometimes we want to load the weights of an already trained model and start training again. For example, if a generic “Banking” model has been trained, it should be possible to fine-tune it on a given task. In this case, we specify the path to this model in the init function as shown below:
[ ]:
at_finetune = AutoTransformer(config)
# In this case, we train a new model, but start with weights from the model stored at `.models/example07`.
at_finetune.init(
dataset_loader=dl,
model_name_or_path=".models/example07",
path=".models/example07_finetuned",
)
at_finetune.train(dl)