========
Glossary
========

.. glossary::

    AutoTransformers library
        The library that DeepOpinion developed to automatically transform arbitrary data into ML models that can be used to automate processes or solve different :term:`tasks <Task>` in different :term:`domains <Domain>`.

    AutoTransformer
        A :term:`model` that was automatically created with the :term:`AutoTransformers library` using some :term:`dataset`. An AutoTransformer can make predictions on new, unseen data in order to solve a :term:`task` such as :term:`InformationExtraction` in documents or text.

    AutoTransformers wizard
        A command line tool, called with :code:`at wizard`, that helps users create the dataset as well as the training and prediction scripts.

    Model
        A model is the result of training on a :term:`dataset`, stored as a file. It has learned patterns from the data so that it can make predictions on new, unseen data later on. Models can, for example, be used to automate processes such as text classification or document information extraction. Note that we call a model that was created with the :term:`AutoTransformers library` an :term:`AutoTransformer`.

    Hyperparameter
        A parameter, such as the learning rate, that is (usually) tuned by humans to reach high :term:`performance`. The :term:`AutoTransformers library` is developed such that manual hyperparameter tuning is rarely necessary.

    Domain
        A kind of data source, such as text, documents, images, or speech. The :term:`AutoTransformers library` supports several domains such that a large range of problems can be solved with an :term:`AutoTransformer`.

    Task
        A specific problem that an :term:`AutoTransformer` should solve. For example, information should be extracted from a document and the document should be classified. Classification, information extraction, etc. are each called a task. Note that a single :term:`AutoTransformer` can solve several tasks of one domain in parallel.

    Skill
        A model trained in a given domain on some task(s) is called a skill. For example, a model for document information extraction is a skill, and a model for text classification is another skill.

    Classification
        An umbrella term for tasks that assign classes to each sample. Currently supported classification tasks are single-label, multi-label, and multi-single-label classification.

    Single-label classification
        This task enables a model to find exactly 1 class out of C classes for each sample.

    Multi-label classification
        This task enables a model to find multiple (or only one, or even zero) classes out of C classes for each sample.

    Multi-single-label classification
        Like multi-label classification, but additionally finds 1 out of C2 inner classes for each outer class.

    InformationExtraction
        This task enables a model to extract information such as a name or an address from unstructured text, documents, etc.

    Dataset
        All :term:`samples <Sample>` (text, documents, images, etc.) that are used to train and evaluate the :term:`AutoTransformer`.

    Sample
        A single entry in the dataset. Note that each sample of the :term:`dataset` is :term:`labeled <Label>` by the user.

    Label
        The annotation that the user assigns to a :term:`sample`. The samples of the dataset must be labeled so that the :term:`AutoTransformer` can be trained correctly.

    DatasetLoader
        The Python (iterator) implementation that is used to load your dataset from disk into memory in order to train an :term:`AutoTransformer` on a certain :term:`node` and :term:`device`.

    Data pipeline
        The dataset is not only loaded with multiple threads through the :term:`DatasetLoader`, but also preprocessed by pipeline components. We call this the data pipeline.

    Device
        Hardware that executes code at runtime, e.g. a GPU, CPU, or TPU.

    Node
        A server that has multiple devices. Nodes can also be clustered in order to train models faster through data, model, or pipeline parallelism.

    Active learning
        A feature that selects, from unlabeled data, the :term:`samples <Sample>` that should be labeled next because the model would benefit most from them.

    LLM
        Large language models (LLMs) are (usually) generative models with billions of parameters that can solve a huge range of tasks through prompt inputs.

    Performance
        How well a :term:`model` performs on new, unseen data.

    Checkpointing
        Checkpointing stores your current training state on disk so that training can be resumed after it has stopped (e.g. because of a hardware failure).
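
The three classification tasks defined in this glossary differ only in the shape of the label attached to each sample. The following is a minimal sketch of those shapes using plain Python data structures. It is an illustration only and does not show the dataset format of the AutoTransformers library; the class names and the ``is_valid_*`` helper functions are hypothetical.

```python
# Hypothetical class names chosen for illustration only.
# This is NOT the AutoTransformers dataset format.
OUTER_CLASSES = ["billing", "shipping", "product"]   # the C classes
INNER_CLASSES = ["positive", "neutral", "negative"]  # the C2 inner classes

# Single-label: exactly one of the C classes per sample.
single_label = "billing"

# Multi-label: any subset of the C classes, including the empty set.
multi_label = {"billing", "shipping"}

# Multi-single-label: a subset of the outer classes, where each selected
# outer class carries exactly one of the C2 inner classes.
multi_single_label = {"billing": "negative", "shipping": "positive"}


def is_valid_single_label(label):
    """Return True if the label is one of the C classes."""
    return label in OUTER_CLASSES


def is_valid_multi_label(labels):
    """Return True if every label is one of the C classes (zero labels is fine)."""
    return set(labels) <= set(OUTER_CLASSES)


def is_valid_multi_single_label(labels):
    """Return True if each outer class maps to exactly one valid inner class."""
    return all(
        outer in OUTER_CLASSES and inner in INNER_CLASSES
        for outer, inner in labels.items()
    )
```

In this sketch, a dictionary is used for the multi-single-label case because it guarantees that each outer class is paired with only one inner class.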