Multi-Single label text classification

Note: This example builds upon multi-label classification. If you are unfamiliar with it, refer to the previous examples.

Multi-single-label classification is a type of hierarchical classification: first, a sample is classified with multi-label classification, yielding a number of classes; then, a single prediction (in a second dimension) is made for each of those classes.
We showcase this with an ABSA (aspect-based sentiment analysis) example, using reviews of hotels. Just like in a multi-label dataset, each sample has multiple aspects, but each aspect additionally carries a sentiment. For example, “I liked the breakfast, although the staff doesn’t even bother to greet you when you enter” has the aspect “food” with a positive sentiment and the aspect “staff” with a negative sentiment. The label would thus be `[["Food", "positive"], ["Staff", "negative"]]`.
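To make that label format concrete, here is a minimal, framework-independent sketch in plain Python; the aspect and sentiment names are purely illustrative and not tied to the autotransformers API.

[ ]:
# A multi-single label is a list of [outer_class, inner_class] pairs.
# Illustrative values only; "Food"/"Staff" are example aspect names.
label = [["Food", "positive"], ["Staff", "negative"]]

for aspect, sentiment in label:
    print(f"aspect={aspect!r}, sentiment={sentiment!r}")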
[ ]:
from autotransformers import AutoTransformer, DatasetLoader
Below is an example dataset for multi-single-label tasks. In addition to the “classes”, we now have to configure a list of “inner_classes” for the inner classification task. Note the None inner class, which is used when a sample is not labeled with the corresponding outer class.
In multi-single classification, the samples’ values are lists of pairs: each pair specifies an outer class and its corresponding inner class.
[ ]:
dataset = {
    "meta": {
        "name": "example_multisingle_label",
        "version": "1.0.0",
        "created_with": "wizard"
    },
    "config": [
        {
            "domain": "text",
            "type": "IText"
        },
        {
            "task_id": "task1",
            "classes": ["Room", "Staff", "Cleanliness"],
            "inner_classes": ["None", "positive", "neutral", "negative"],
            "none_inner_class": "None",
            "type": "TMultiSingleClassification"
        }
    ],
    "train": [
        [
            {"value": "the room was very spacious."},
            {"value": [["Room", "positive"]]},
        ],
        [
            {"value": "Everything was spotless, bed lines were indeed heavenly and the bed was super comfortable with wonderful pillows."},
            {"value": [["Cleanliness", "positive"]]},
        ],
        [
            {"value": "In the room there is a tiny bathroom with shower and seperate toilet."},
            {"value": [["Room", "neutral"]]},
        ],
        [
            {"value": "Our agent had booked the wrong room type and the hotel room we were given was poor."},
            {"value": [["Room", "negative"]]},
        ],
    ],
    "eval": [
        [
            {"value": "Really clean right by a canal staff really friendly and helpful."},
            {"value": [["Staff", "positive"], ["Cleanliness", "positive"]]},
        ],
    ],
    "test": [
        [
            {"value": "The concierge was also very helpful in organising some train tickets"},
            {"value": [["Staff", "positive"]]},
        ]
    ]
}
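Before training, it can be useful to check that every label pair only uses configured classes. The following is a minimal sketch in plain Python over the dict above; it is not part of the autotransformers API.

[ ]:
# Minimal sanity check (plain Python, not part of the autotransformers API):
# every [outer, inner] pair should use classes from the task config.
task_config = dataset["config"][1]
outer_classes = set(task_config["classes"])
inner_classes = set(task_config["inner_classes"])

for split in ("train", "eval", "test"):
    for text, label in dataset[split]:
        for outer_class, inner_class in label["value"]:
            assert outer_class in outer_classes, f"unknown class: {outer_class}"
            assert inner_class in inner_classes, f"unknown inner class: {inner_class}"
print("all label pairs use configured classes")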

As before, we simply create a DatasetLoader from the dataset and start training.

[ ]:
dl = DatasetLoader(dataset)

# Or create a DatasetLoader from a file
# dl = DatasetLoader("path/to/my-dataset.json")
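If you prefer loading from a file, the in-memory dict can be written out first. This sketch assumes the on-disk format is simply the same JSON structure as the dict above; adjust the path as needed.

[ ]:
# Optionally persist the dataset so it can be loaded by path later.
# Assumption: the file format is the same JSON structure as the dict above.
import json

with open("my-dataset.json", "w", encoding="utf-8") as f:
    json.dump(dataset, f, ensure_ascii=False, indent=2)

# dl = DatasetLoader("my-dataset.json")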
[ ]:
# In this example, we only train for one epoch to finish fast.
# In reality, you want to set this to a higher value for better results.
config = [
    ("engine/stop_condition/type", "MaxEpochs"),
    ("engine/stop_condition/value", 1),
]
at = AutoTransformer(config)

at.init(dataset_loader=dl, path=".models/example03")
at.train(dl)

And check our result:

[ ]:
at("I liked the breakfast, althought the staff doesn't even bother to greet you when you go anywhere.")