Configuration Options

Here, we will go over the configuration options you can use to customize your AutoTransformer for your use case, data, and hardware.

[ ]:
import autotransformers
from autotransformers import AutoTransformer, DatasetLoader

We will reuse the example dataset from the first tutorial. See the Getting Started tutorial for an explanation of the dataset’s structure.

[ ]:
# The text snippets in this dataset are from "googleplay", a public dataset of app reviews on Google's Play Store.
dataset = {
    "meta": {
        "name": "example_singlelabel",
        "version": "1.0.0",
        "created_with": "wizard"
    },
    "config": [
        {
            "domain": "text",
            "type": "IText"
        },
        {
            "task_id": "task1",
            "classes": ["positive", "neutral", "negative"],
            "type": "TSingleClassification"
        }
    ],
    "train": [
        [
            {"value": "None of the notifications work. other people in the forums teport similar problems bht no fix. the app is nice but it isnt nearly as functional without notifications"},
            {"value": "negative"},
        ],
        [
            {"value": "It's great"},
            {"value": "positive"},
        ],
        [
            {"value": "Not allowing me to delete my account"},
            {"value": "negative"},
        ],
        [
            {"value": "So impressed that I bought premium on very first day"},
            {"value": "positive"},
        ],
    ],
    "test": [
        [
            {"value": "Can't set more than 7 tasks without paying an absurdly expensive weekly subscription"},
            {"value": "negative"},
        ]
    ],
}
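Before handing the dataset to the `DatasetLoader`, it can be useful to sanity-check that every label belongs to the declared classes. The helper below is plain Python over the dict above, written for illustration only; it is not part of the AutoTransformer API.

```python
def check_labels(dataset):
    """Verify every label in 'train' and 'test' is one of the declared classes."""
    # The second config entry declares the classification task and its classes.
    classes = set(dataset["config"][1]["classes"])
    for split in ("train", "test"):
        for _text, label in dataset[split]:
            if label["value"] not in classes:
                raise ValueError(f"unknown label: {label['value']}")
    # Return the split sizes as a quick summary.
    return {split: len(dataset[split]) for split in ("train", "test")}

# Applied to the dataset above, this returns {"train": 4, "test": 1}.
```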
[ ]:
dl = DatasetLoader(dataset)

The help function lists all supported configuration options for a given domain; in our example, the “text” domain. Other domains (e.g. “document”) may support different configuration options as well as different default values for them.

[ ]:
autotransformers.help("text")

The most important options are usually the number of epochs and the learning rate. Training for more epochs may improve results but takes longer. Likewise, the learning rate can be adjusted slightly if the model learns slowly or training is unstable.
If you see out-of-memory errors, try reducing the batch size. This shrinks the memory footprint at the expense of training speed.

A config can be written in list style or nested-dict style, as shown below. You can also supply a .json file directly.

[ ]:
# list-style config
config = [
    ("engine/stop_condition/type", "MaxEpochs"),
    ("engine/stop_condition/value", 2),
    ("engine/optimizer/lr", 3e-5),
    ("engine/train_batch_size", 32),
]
[ ]:
# equivalent dict-style config
config = {
    "engine": {
        "stop_condition": {
            "type": "MaxEpochs",
            "value": 2,
        },
        "train_batch_size": 32,
        "optimizer": {
            "lr": 3e-5
        }
    }
}
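The two styles carry the same information: each slash-separated path in the list form names one leaf of the nested dict. The helper below makes that equivalence explicit; it is plain Python for illustration, not part of the AutoTransformer API. And since the nested form is an ordinary dict, it can be serialized with the standard `json` module to produce the .json file mentioned above.

```python
import json

def list_to_dict(pairs):
    """Expand ("a/b/c", v) pairs into a nested dict {"a": {"b": {"c": v}}}."""
    nested = {}
    for path, value in pairs:
        node = nested
        *parents, leaf = path.split("/")
        for key in parents:
            # Descend, creating intermediate dicts as needed.
            node = node.setdefault(key, {})
        node[leaf] = value
    return nested

list_config = [
    ("engine/stop_condition/type", "MaxEpochs"),
    ("engine/stop_condition/value", 2),
    ("engine/optimizer/lr", 3e-5),
    ("engine/train_batch_size", 32),
]

# The list-style config expands to exactly the dict-style config above.
assert list_to_dict(list_config) == {
    "engine": {
        "stop_condition": {"type": "MaxEpochs", "value": 2},
        "optimizer": {"lr": 3e-5},
        "train_batch_size": 32,
    }
}

# The nested dict serializes to JSON, so it can also be saved as a
# .json file and that file supplied directly, as the tutorial notes.
config_json = json.dumps(list_to_dict(list_config), indent=2)
```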
[ ]:
at = AutoTransformer(config)

at.init(dataset_loader=dl, path=".models/example06")
at.train(dl)