{ "cells": [ { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "# Multilabel text classification\n", "\n", "*Note: This notebook assumes that you're familiar with the basic usage of AutoTransformers. Go to the \"Getting started\" notebook to learn the basics.*\n", "\n", "In this example, we will train an AutoTransformer for multi-label classification. This means that each sample can have multiple labels (or none at all), in contrast to single-label classification, where the labels are mutually exclusive." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from autotransformers import AutoTransformer, DatasetLoader" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "As before, we prepare a minimal dataset for training the AutoTransformer. Pay attention to the label values: they are now lists rather than single items as in the single-label example. However, it is still possible (and common) to have samples with only a single label, represented by a single-item list." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# The text snippets in this dataset are from \"stackoverflow\", a dataset of StackOverflow posts and their tags.\n", "dataset = {\n", " \"meta\": {\n", " \"name\": \"example_multilabel\",\n", " \"version\": \"1.0.0\",\n", " \"created_with\": \"wizard\"\n", " },\n", " \"config\": [\n", " {\n", " \"domain\": \"text\",\n", " \"type\": \"IText\"\n", " },\n", " {\n", " \"task_id\": \"task1\",\n", " \"classes\": [\"c\", \"java\", \".net\", \"php\"],\n", " \"type\": \"TMultiClassification\"\n", " }\n", " ],\n", " \"train\": [\n", " [\n", " {\"value\": \"is it possible to compile net il code to machine code i would like to thistribute my net programs without the net framework is it possible to compile a net program to machine code\"},\n", " {\"value\": [\".net\"]}\n", " ],\n", " [\n", " {\"value\": \"how do i show what fields a struct has in gdb i came upon a struct called ngx http variable value t in my gdb session and i would like to print what fields it has in the console is that possible\"},\n", " {\"value\": [\"c\"]}\n", " ],\n", " [\n", " {\"value\": \"character size in java vs c why does a character in java take twice as much space to store as a character in c\"},\n", " {\"value\": [\"c\", \"java\"]}\n", " ],\n", "\n", " [\n", " {\"value\": \"net equivalent of javas listsublist is there a net equivalent of javas listsublist that works on ilistt\"},\n", " {\"value\": [\"java\", \".net\"]}\n", " ],\n", " ],\n", " \"test\": [\n", " [\n", " {\"value\": \"is there any general rule to decide if a type casting is done by javac compiler or by the vm\"},\n", " {\"value\": [\"java\"]}\n", " ],\n", "\n", " ]\n", "}" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Even though the type of data is different this time, we can use the same code as in the example before for training - the type of data and task is automatically recognized by the 
DatasetLoader and a suitable AutoTransformer is trained." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dl = DatasetLoader(dataset)\n", "\n", "# Or create a DatasetLoader from a file\n", "# dl = DatasetLoader(\"path/to/my-dataset.json\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# In this example, we only train for one epoch to finish fast. \n", "# In reality, you want to set this to a higher value for better results.\n", "config = [\n", " (\"engine/stop_condition/type\", \"MaxEpochs\"),\n", " (\"engine/stop_condition/value\", 1),\n", "]\n", "at = AutoTransformer(config)\n", "\n", "at.init(dataset_loader=dl, path=\".models/example02\")\n", "at.train(dl)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Again, we check our resulting model:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "res = at([\"what size does an integer have in c\", \"aaaaa\"])\n", "\n", "# Formatting the result for nicer output\n", "[{\"input\": input.value, \"prediction\": output.value} for input, output in res]" ] } ], "metadata": { "kernelspec": { "display_name": "env", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.10" }, "orig_nbformat": 4, "vscode": { "interpreter": { "hash": "83ee97f1e4ad98a710574577955c6720418d3d8f987616cd4f238f891737d017" } } }, "nbformat": 4, "nbformat_minor": 2 }