========
Glossary
========

.. glossary::

    AutoTransformers library
        The library that DeepOpinion developed to automatically transform arbitrary data into ML models
        that can be used to automate processes or solve different :term:`tasks<Task>` in different :term:`domains<Domain>`.

    AutoTransformer
        A :term:`model<Model>` that was automatically created with the :term:`AutoTransformers library` from a
        :term:`dataset<Dataset>`. An AutoTransformer can make predictions on new, unseen data in order to solve
        a :term:`task<Task>` such as :term:`InformationExtraction` in documents or text.

    AutoTransformers wizard
        A command-line tool, invoked with :code:`at wizard`, that helps users create the dataset as well as
        the training and prediction scripts.

    Model
        A model is the result of training on a :term:`dataset<Dataset>`, stored as a file. During training it learns
        patterns from the data so that it can later make predictions on new, unseen data. Models can, for example,
        be used to automate processes such as text classification or document information extraction. Note that we
        call a model that was created with the :term:`AutoTransformers library` an :term:`AutoTransformer`.

    Hyperparameter
        A parameter, such as the learning rate, that is (usually) tuned by humans to reach high
        :term:`performance<Performance>`. The :term:`AutoTransformers library` is designed so that hyperparameters
        generally do not need to be tuned manually.

    Domain
        A type of data source, such as text, documents, images or speech. The :term:`AutoTransformers library`
        supports several domains so that a wide range of problems can be solved with an :term:`AutoTransformer`.

    Task
        A specific problem that an :term:`AutoTransformer` should solve. For example, information should be extracted
        from a document and the document should be classified; classification and information extraction are then
        each called a task. Note that a single :term:`AutoTransformer` can solve several tasks from one
        :term:`domain<Domain>` in parallel.

    Skill
        A model trained in a given domain on some task(s) is called a skill. For example, a model for document
        information extraction is one skill, and a model for text classification is another.

    Classification
        An umbrella term for tasks that assign classes to each sample. The currently supported classification tasks
        are single-label, multi-label, and multi-single-label classification.

    Single-label classification
        A task in which the model assigns exactly one of C classes to each sample.

    Multi-label classification
        A task in which the model assigns any number of the C classes (zero, one, or several) to each sample.

    Multi-single-label classification
        Like multi-label classification, but the model additionally selects exactly one of C2 inner classes for each
        outer class.
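
        The three classification tasks differ mainly in the shape of a sample's label. The sketch below illustrates
        this with plain Python values. The class names and the dictionary layout are hypothetical and chosen only
        for this example; they are not the dataset format of the :term:`AutoTransformers library`.

        .. code-block:: python

            # Hypothetical class sets, chosen only for this illustration.
            CLASSES = {"invoice", "contract", "letter"}
            INNER_CLASSES = {
                "sentiment": {"positive", "negative"},
                "urgency": {"low", "high"},
            }

            def is_valid_single_label(label):
                """Single-label: exactly one class out of CLASSES."""
                return label in CLASSES

            def is_valid_multi_label(labels):
                """Multi-label: zero, one, or several distinct classes out of CLASSES."""
                return len(labels) == len(set(labels)) and set(labels) <= CLASSES

            def is_valid_multi_single_label(labels):
                """Multi-single-label: each outer class maps to one of its inner classes."""
                return all(
                    outer in INNER_CLASSES and inner in INNER_CLASSES[outer]
                    for outer, inner in labels.items()
                )

            single = "invoice"
            multi = ["invoice", "letter"]
            multi_single = {"sentiment": "positive", "urgency": "low"}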

    InformationExtraction
        A task in which the model extracts information, such as a name or an address, from unstructured text or
        documents.

    Dataset
        All :term:`samples<Sample>` (text, documents, images etc.) that are used to train and evaluate the :term:`AutoTransformer`.

    Sample
        We call a single entry in the dataset a sample. Note that each sample of the :term:`dataset<Dataset>` is
        :term:`labeled<Label>` by the user.

    Label
        The target output that the user assigns to a :term:`sample<Sample>`. The samples of the
        :term:`dataset<Dataset>` must be labeled in order to train the :term:`AutoTransformer` correctly.

    DatasetLoader
        The Python (iterator) implementation that loads your dataset from disk into memory in order
        to train an :term:`AutoTransformer` on a certain :term:`node<Node>` and :term:`device<Device>`.

    Data pipeline
        The dataset is not only loaded with multiple threads through the :term:`DatasetLoader`, but also preprocessed by
        pipeline components. We call this the data pipeline.

    Device
        A device is a hardware unit that executes code at runtime, e.g. a GPU, CPU or TPU.

    Node
        A server that has multiple devices. Nodes can also be clustered in order to train models faster through data,
        model or pipeline parallelism.

    Active learning
        A feature that selects, from unlabeled data, the :term:`samples<Sample>` from which a model would benefit
        most and that should therefore be labeled next.
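
        One common way to decide which samples a model would benefit from most is least-confidence sampling: pick
        the samples whose highest predicted class probability is lowest. The sketch below is a generic illustration
        of that idea. It is not necessarily the strategy used by the :term:`AutoTransformers library`, and the
        function name ``select_for_labeling`` is hypothetical.

        .. code-block:: python

            def select_for_labeling(probabilities, k):
                """Return the indices of the k samples the model is least confident about.

                ``probabilities`` holds one list of class probabilities per unlabeled sample.
                """
                # Confidence of a prediction = probability of its most likely class.
                confidence = [max(p) for p in probabilities]
                # Rank sample indices from least to most confident.
                ranked = sorted(range(len(probabilities)), key=lambda i: confidence[i])
                return ranked[:k]

            # Hypothetical model outputs for three unlabeled samples and two classes.
            probs = [[0.9, 0.1], [0.55, 0.45], [0.6, 0.4]]
            selected = select_for_labeling(probs, k=2)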

    LLM
        Large language models (LLMs) are (usually) generative models with billions of parameters that can solve a
        huge range of tasks when given prompts as input.

    Performance
        How well a :term:`model<Model>` performs on new, unseen data.

    Checkpointing
        Checkpointing stores the current training state on disk so that training can be resumed after it has been
        stopped, e.g. because of a hardware failure.
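
        The idea can be sketched with a generic save-and-resume loop. This is a minimal illustration using only the
        Python standard library, not the checkpoint format or API of the :term:`AutoTransformers library`. All
        names (``save_checkpoint``, ``load_checkpoint``, ``train``) are hypothetical.

        .. code-block:: python

            import json
            import os
            import tempfile

            def save_checkpoint(path, state):
                # Write to a temporary file first, then rename it, so that an
                # interruption during writing cannot leave a half-written checkpoint.
                tmp_path = path + ".tmp"
                with open(tmp_path, "w") as f:
                    json.dump(state, f)
                os.replace(tmp_path, path)

            def load_checkpoint(path):
                # Start from scratch if no checkpoint exists yet.
                if not os.path.exists(path):
                    return {"epoch": 0}
                with open(path) as f:
                    return json.load(f)

            def train(path, total_epochs, stop_after=None):
                state = load_checkpoint(path)
                for epoch in range(state["epoch"], total_epochs):
                    # ... one epoch of training would happen here ...
                    state["epoch"] = epoch + 1
                    save_checkpoint(path, state)
                    if stop_after is not None and state["epoch"] == stop_after:
                        break  # simulate an interruption such as a hardware failure
                return state

            checkpoint_path = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
            interrupted = train(checkpoint_path, total_epochs=5, stop_after=2)
            resumed = train(checkpoint_path, total_epochs=5)  # continues at epoch 3

        A real training state would also include model weights and optimizer state; the epoch counter stands in for
        them here.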