# Getting Started - Manual

```{note}
We recommend using the {term}`AutoTransformers wizard` to get started. The wizard helps you create your dataset and automatically generates the `train.py` and `predict.py` scripts.
```

In the previous tutorial, you saw how the {term}`AutoTransformers wizard` can generate a complete project without writing a single line of code. In this example, we show how to train an {term}`AutoTransformer` on documents without using the {term}`AutoTransformers wizard`. More precisely, we will train a "MovieAT" model that predicts whether a movie review is positive or negative.

In this specific case, the input format is text, and the model should solve the {term}`task` of classifying whether the input is positive (pos) or negative (neg).

An overview of the system that we develop next is shown below:

In the following subsections, you will learn how each component can be implemented from scratch using the {term}`AutoTransformers library`.

## Step 1: Define and think about typing

It is very important to understand that AutoTransformers is a library that transforms your data into ML models. Therefore, you need to define what input data you have and what output you expect. AutoTransformers implements different input types (e.g. text, documents, images) and different output types that can be used simultaneously (classification, information extraction, etc.).

So first we need to define the input and output types that we expect. In this example, we want to classify *text input* into a *positive* or *negative* sentiment. In AutoTransformers, types are always named with a leading `I...`, e.g. `IText` for text input and `ISingleClassification` for a single-class output. So we want a model with `(IText, ISingleClassification)`. There is no need to explicitly define this in the AT model class; it is enough to train the model with the correct dataset, and the model will automatically learn the correct input and output types.
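To make the `(IText, ISingleClassification)` pairing concrete, here is a plain-Python mock-up. These dataclasses are illustrative stand-ins that only mirror the AT naming convention; they are not the library's actual classes:

```python
from dataclasses import dataclass

# Illustrative stand-ins only: the real IText / ISingleClassification types
# are provided by the AutoTransformers library; these dataclasses merely
# mirror the naming convention to make the (input, output) pairing concrete.

@dataclass
class IText:
    """Input type: a single piece of raw text."""
    value: str

@dataclass
class ISingleClassification:
    """Output type: exactly one class per input."""
    label: str  # one of the classes defined by the dataset, e.g. "pos" / "neg"

# Conceptually, the MovieAT model maps IText -> ISingleClassification:
review = IText("A wonderful film with a gripping story.")
expected = ISingleClassification(label="pos")
```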
Therefore, it is very important that we define the dataset correctly. This is shown in the next step.

```{note}
To find all types, you can use the console and type `at man --domain=text` to print all supported types for the text domain. The same can be done for the other domains (text, document, computer vision).
```

## Step 2: Implement Dataset and DatasetLoader

The DatasetLoader defines your data and, therefore, which types a trained AT model will expect in the future; it also defines which data an AT model returns. The `config` property of the DatasetLoader defines the inputs as well as the outputs. Outputs such as `ISingleClassification` need additional configuration: in our example, we need to define which classes (`pos`, `neg`) exist. Additionally, the meta section defines further information that is quite useful later on when we want to check which data a model was trained on. Finally, the DatasetLoader provides the `train_ds`, `test_ds` and `eval_ds` properties that return the data in the correct format used for training and evaluation.

```{eval-rst}
.. literalinclude:: imdb.py
   :pyobject: ImdbDatasetLoader
```

It is not necessary to create your own DatasetLoader each time. Instead, it is also possible to use a custom JSON format, as shown in [this tutorial](/getting_started/data).

## Step 3: Train

Next, we can use our custom `ImdbDatasetLoader` to train and create a MovieAT model. The model will be stored at `.models/MovieAT`, from where we can load it later on.

```{eval-rst}
.. literalinclude:: imdb.py
   :pyobject: train
```

Note that we usually train the model only once, but predict often afterwards.

## Step 4: Predict

Finally, we can implement the `predict.py` script that loads our MovieAT model and predicts new data based on patterns that the model learned during training:

```{eval-rst}
.. literalinclude:: imdb.py
   :pyobject: predict
```

Please note that the input format must match your dataset definition.
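For intuition, the sketch below stubs the model with plain Python to show how prediction inputs and outputs mirror the dataset definition. The dict shapes and the stub function are assumptions for illustration, not the library's actual API; the real format is determined by the `ImdbDatasetLoader` config and the `predict` function in `imdb.py`.

```python
# Hypothetical shapes for illustration only; the real I/O format is defined
# by the ImdbDatasetLoader config and the predict() function in imdb.py.

def fake_movie_at(texts):
    """Stand-in for the trained MovieAT model (keyword-based toy heuristic)."""
    return [{"class": "pos" if "great" in t.lower() else "neg"} for t in texts]

# Input is plain text, matching the IText input type of the dataset definition.
inputs = ["A great, heartwarming movie.", "Two hours of my life wasted."]

# Each output carries exactly one of the classes declared in the DatasetLoader
# config ("pos" or "neg"), matching ISingleClassification.
outputs = fake_movie_at(inputs)
print(outputs)  # → [{'class': 'pos'}, {'class': 'neg'}]
```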
The output that is returned also depends on the same definition, i.e. on how your model was trained. This allows AutoTransformers to be generic and usable for any input type and output type(s), as multiple outputs can be generated at the same time.

## Full Script

The full script is shown below:

```{eval-rst}
.. collapse:: Source Code

   .. literalinclude:: imdb.py
```

## How to continue?

AutoTransformers provides even more features, such as:

- Checkpointing
- ClearML or Wandb logging
- Early stopping
- Active learning
- Document or computer vision tasks
- ...

We think that you can best learn how to use the {term}`AutoTransformers library` through different real-world examples. We therefore provide many different tutorials in the next section, and details in the [API documentation](/api/api) section.