{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Multilabel text classification\n",
    "\n",
    "*Note: This notebook assumes that you're familiar with the basic usage of AutoTransformers. See the \"Getting started\" notebook to learn the basics.*\n",
    "\n",
    "In this example, we train an AutoTransformer for multi-label classification. Each sample can have several labels (or none at all), in contrast to single-label classification, where each sample has exactly one label."
   ]
  },
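  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the difference concrete, the next cell sketches how a list of labels can be turned into a multi-hot vector with one 0/1 entry per class. This is plain Python for illustration only: the helper `to_multi_hot` and the variable `example_classes` are assumptions made for this sketch and are not part of the AutoTransformers API, and nothing here claims to mirror how the library encodes labels internally."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Illustrative sketch only: to_multi_hot is a hypothetical helper, not part of AutoTransformers.\n",
    "def to_multi_hot(labels, classes):\n",
    "    # Return one 0/1 entry per class, in the order given by `classes`.\n",
    "    unknown = set(labels) - set(classes)\n",
    "    if unknown:\n",
    "        raise ValueError(f\"Unknown labels: {sorted(unknown)}\")\n",
    "    return [1 if c in labels else 0 for c in classes]\n",
    "\n",
    "\n",
    "example_classes = [\"c\", \"java\", \".net\", \"php\"]\n",
    "\n",
    "print(to_multi_hot([\"c\", \"java\"], example_classes))  # two labels: [1, 1, 0, 0]\n",
    "print(to_multi_hot([\"php\"], example_classes))        # one label:  [0, 0, 0, 1]\n",
    "print(to_multi_hot([], example_classes))             # no label:   [0, 0, 0, 0]"
   ]
  },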
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from autotransformers import AutoTransformer, DatasetLoader"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As before, we prepare a minimal dataset for training the AutoTransformer. Note the label values: each one is now a list rather than a single item as in the single-label example. Samples with only one label are still possible (and common); they are represented by a single-item list."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# The text snippets in this dataset are from \"stackoverflow\", a dataset of StackOverflow posts and their tags.\n",
    "dataset = {\n",
    "    \"meta\": {\n",
    "        \"name\": \"example_multilabel\",\n",
    "        \"version\": \"1.0.0\",\n",
    "        \"created_with\": \"wizard\"\n",
    "    },\n",
    "    \"config\": [\n",
    "        {\n",
    "            \"domain\": \"text\",\n",
    "            \"type\": \"IText\"\n",
    "        },\n",
    "        {\n",
    "            \"task_id\": \"task1\",\n",
    "            \"classes\": [\"c\", \"java\", \".net\", \"php\"],\n",
    "            \"type\": \"TMultiClassification\"\n",
    "        }\n",
    "    ],\n",
    "    \"train\": [\n",
    "        [\n",
    "            {\"value\": \"is it possible to compile net il code to machine code i would like to distribute my net programs without the net framework is it possible to compile a net program to machine code\"},\n",
    "            {\"value\": [\".net\"]}\n",
    "        ],\n",
    "        [\n",
    "            {\"value\": \"how do i show what fields a struct has in gdb i came upon a struct called ngx http variable value t in my gdb session and i would like to print what fields it has in the console is that possible\"},\n",
    "            {\"value\": [\"c\"]}\n",
    "        ],\n",
    "        [\n",
    "            {\"value\": \"character size in java vs c why does a character in java take twice as much space to store as a character in c\"},\n",
    "            {\"value\": [\"c\", \"java\"]}\n",
    "        ],\n",
    "        [\n",
    "            {\"value\": \"net equivalent of javas listsublist is there a net equivalent of javas listsublist that works on ilistt\"},\n",
    "            {\"value\": [\"java\", \".net\"]}\n",
    "        ],\n",
    "    ],\n",
    "    \"test\": [\n",
    "        [\n",
    "            {\"value\": \"is there any general rule to decide if a type casting is done by javac compiler or by the vm\"},\n",
    "            {\"value\": [\"java\"]}\n",
    "        ],\n",
    "    ]\n",
    "}"
   ]
  },
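  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Before training, it can help to check that every label used in the samples also appears in the `classes` list of the task config, since a typo in a label is easy to miss. The next cell is a minimal sketch in plain Python. `find_unknown_labels` and the `toy` dataset are hypothetical names written for this notebook, not AutoTransformers functions, and the sketch assumes the layout shown above: each sample is an `[input, label]` pair and the task entry in `config` carries the `classes` list. Calling `find_unknown_labels(dataset)` on the dataset defined above should return an empty list."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Hypothetical helper for illustration; not part of the AutoTransformers API.\n",
    "def find_unknown_labels(dataset, splits=(\"train\", \"test\")):\n",
    "    # Collect the classes declared by every config entry that has a \"classes\" list.\n",
    "    classes = set()\n",
    "    for entry in dataset[\"config\"]:\n",
    "        classes.update(entry.get(\"classes\", []))\n",
    "    # Record (split, sample index, label) for each label missing from the declared classes.\n",
    "    problems = []\n",
    "    for split in splits:\n",
    "        for i, (_, target) in enumerate(dataset.get(split, [])):\n",
    "            for label in target[\"value\"]:\n",
    "                if label not in classes:\n",
    "                    problems.append((split, i, label))\n",
    "    return problems\n",
    "\n",
    "\n",
    "# Tiny stand-in dataset with the same layout as above; \"jave\" is a deliberate typo.\n",
    "toy = {\n",
    "    \"config\": [\n",
    "        {\"domain\": \"text\", \"type\": \"IText\"},\n",
    "        {\"task_id\": \"task1\", \"classes\": [\"c\", \"java\"], \"type\": \"TMultiClassification\"},\n",
    "    ],\n",
    "    \"train\": [\n",
    "        [{\"value\": \"some c question\"}, {\"value\": [\"c\"]}],\n",
    "        [{\"value\": \"some java question\"}, {\"value\": [\"jave\"]}],\n",
    "    ],\n",
    "    \"test\": [],\n",
    "}\n",
    "\n",
    "print(find_unknown_labels(toy))  # [('train', 1, 'jave')]"
   ]
  },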
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Even though the task is different this time, we can reuse the training code from the previous example: the DatasetLoader automatically recognizes the type of data and task, and a suitable AutoTransformer is trained."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "dl = DatasetLoader(dataset)\n",
    "\n",
    "# Or create a DatasetLoader from a file\n",
    "# dl = DatasetLoader(\"path/to/my-dataset.json\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# In this example, we only train for one epoch to finish fast.\n",
    "# In practice, set this to a higher value for better results.\n",
    "config = [\n",
    "    (\"engine/stop_condition/type\", \"MaxEpochs\"),\n",
    "    (\"engine/stop_condition/value\", 1),\n",
    "]\n",
    "at = AutoTransformer(config)\n",
    "\n",
    "at.init(dataset_loader=dl, path=\".models/example02\")\n",
    "at.train(dl)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Again, we check our resulting model:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "res = at([\"what size does an integer have in c\", \"aaaaa\"])\n",
    "\n",
    "# Formatting the result for nicer output\n",
    "[{\"input\": inp.value, \"prediction\": out.value} for inp, out in res]"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "env",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.10"
  },
  "orig_nbformat": 4,
  "vscode": {
   "interpreter": {
    "hash": "83ee97f1e4ad98a710574577955c6720418d3d8f987616cd4f238f891737d017"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}