On the configuration tab, you can define your own threshold values for classification. A threshold sets the minimum probability that the classification of a text must reach. If this probability is not reached, the text is not classified.
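The idea of a threshold can be illustrated with a minimal Python sketch. It is not the tool's actual implementation: the function name `classify_with_threshold`, the dictionary input, and the single threshold shared by all categories are assumptions made for illustration. Treating a probability exactly equal to the threshold as "reached" is also an assumption.

```python
from typing import Optional


def classify_with_threshold(
    probabilities: dict[str, float], threshold: float = 0.75
) -> Optional[str]:
    """Return the most probable category, or None if it misses the threshold.

    `probabilities` maps each category name to its predicted probability.
    The 0.75 default mirrors the 75% default mentioned in this guide.
    """
    if not probabilities:
        return None
    # Pick the category with the highest predicted probability.
    best_category = max(probabilities, key=probabilities.get)
    # Assumption: a probability equal to the threshold counts as reached.
    if probabilities[best_category] >= threshold:
        return best_category
    return None  # threshold not reached, so no classification is performed


print(classify_with_threshold({"billing": 0.82, "support": 0.18}))
print(classify_with_threshold({"billing": 0.60, "support": 0.40}))
```

With these hypothetical inputs, the first text is assigned to a category and the second is left unclassified because its best probability stays below 75%.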
Here, we explain step by step how the configuration function is structured and how you can adjust the threshold values yourself.
1️⃣ Default settings
If you have not yet defined any thresholds of your own, the overview looks as follows:
2️⃣ Activate threshold
Clicking the toggle switch activates the thresholds and opens the related information and settings. As soon as thresholds are activated, the numbers in the panel above change. These numbers always refer to the examples that could be classified with the set threshold. This means: if a text reaches the minimum probability defined by the threshold, it is included in these statistics.
The statistics consist of the following components: «Accuracy» indicates the percentage of all uploaded examples that were classified correctly. To its right, the number of «correct and incorrect classifications» is displayed. The panel also shows the number of examples that could not be classified with the defined threshold, as well as the total number of available text examples.
3️⃣ Select threshold
After you activate the threshold, a histogram of the existing data opens below. The probabilities shown in the visualization are based on the cross-validation results, which are also used in the report.
Each bar is five percentage points wide. You can set any threshold with the slider. By default, a threshold of 75% is set for all categories.
The X-axis of the histogram thus corresponds to the percentage values that can be set for the threshold. The Y-axis shows the number of examples. Correctly classified examples are shown in blue, incorrectly classified ones in orange. Hovering over a bar displays the absolute number.
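How such a histogram could be assembled from cross-validation results is sketched below. This is an illustration under assumptions, not the tool's code: the function name `histogram_bins`, the input format of `(probability, is_correct)` pairs, and the decision to place a probability of exactly 100% in the last bar are all hypothetical.

```python
from collections import Counter

BIN_WIDTH = 5  # percentage points per bar, as described above


def histogram_bins(results):
    """Group (probability, is_correct) pairs into bars of five percentage points.

    Returns a dict mapping the lower edge of each bar (0, 5, ..., 95) to a
    Counter holding the number of 'correct' and 'incorrect' examples.
    """
    bins = {start: Counter() for start in range(0, 100, BIN_WIDTH)}
    for probability, is_correct in results:
        # Round to suppress floating-point noise such as 0.55 * 100 == 55.00000000000001.
        percent = round(probability * 100, 9)
        # Assumption: a probability of exactly 100% belongs to the last bar (95 to 100).
        start = min(int(percent // BIN_WIDTH) * BIN_WIDTH, 100 - BIN_WIDTH)
        bins[start]["correct" if is_correct else "incorrect"] += 1
    return bins


# Hypothetical cross-validation results: (predicted probability, was it correct?)
example_results = [(0.52, True), (0.97, True), (0.96, False), (1.0, True), (0.55, False)]
for start, counts in histogram_bins(example_results).items():
    if counts:
        print(f"{start}-{start + BIN_WIDTH}%: {dict(counts)}")
```

The 'correct' counts correspond to the blue part of each bar and the 'incorrect' counts to the orange part.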
«Classified» describes the percentage of examples that are assigned a higher probability than the selected threshold and are therefore still classified. «Precision» describes the percentage of these classified examples that are assigned to the correct category.
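The two figures can be reproduced with a short sketch. The function name `threshold_metrics` and the input format are assumptions for illustration, and an example whose probability equals the threshold is counted as classified here, which may differ from how the tool treats that boundary case.

```python
def threshold_metrics(results, threshold=0.75):
    """Compute the «Classified» share and «Precision» for a given threshold.

    `results` is a list of (probability, is_correct) pairs, for example
    taken from cross-validation. Both return values are fractions in [0, 1].
    """
    total = len(results)
    # Keep the correctness flag of every example that reaches the threshold.
    classified = [is_correct for probability, is_correct in results
                  if probability >= threshold]
    classified_share = len(classified) / total if total else 0.0
    # Precision: share of the classified examples that are correct.
    precision = sum(classified) / len(classified) if classified else 0.0
    return classified_share, precision


# Hypothetical results: (predicted probability, was it correct?)
example_results = [(0.9, True), (0.8, False), (0.8, True), (0.6, True)]
share, precision = threshold_metrics(example_results, threshold=0.75)
print(f"Classified: {share:.0%}, Precision: {precision:.0%}")
```

Raising the threshold typically lowers «Classified» and tends to raise «Precision», which is the trade-off the slider lets you explore.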