# Confusion Matrix

The metrics used to compare classification performance are typically expressed in terms of the elements of the confusion matrix, which is computed from the predictions a machine learning model makes on a test sample. Figure 1 shows the template of a confusion matrix for a two-class classification problem, where the class of an instance is either positive or negative (i.e., binary classification).

In the confusion matrix, columns represent the actual classes, while rows represent the predicted classes. The number of instances in the test sample is shown at the top of the confusion matrix, where P is the total number of positive instances and N is the total number of negative instances. The number of instances predicted by the model in each class is shown on the left of the confusion matrix, where p is the total number of instances predicted to be positive and n is the total number of instances predicted to be negative.
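The layout above can be sketched in code. The following is a minimal illustration, not taken from the text: the label vectors and the `confusion_matrix` helper are hypothetical, and only the layout convention (columns as actual classes, rows as predicted classes) comes from the description.

```python
def confusion_matrix(actual, predicted, positive=1):
    """Return (TP, FN, FP, TN) counts for a binary classification problem."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == positive and p == positive)
    fn = sum(1 for a, p in zip(actual, predicted) if a == positive and p != positive)
    fp = sum(1 for a, p in zip(actual, predicted) if a != positive and p == positive)
    tn = sum(1 for a, p in zip(actual, predicted) if a != positive and p != positive)
    return tp, fn, fp, tn

# Illustrative test sample (not from the text)
actual    = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 1, 0, 1, 0]
tp, fn, fp, tn = confusion_matrix(actual, predicted)

# Column totals give the sample composition: P = TP + FN, N = FP + TN.
# Row totals give the predicted composition: p = TP + FP, n = FN + TN.
print(tp, fn, fp, tn)  # → 3 1 1 3
```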

### Elementary performance metrics

True Positives (TP) denotes the number of instances correctly predicted to be positive. False Negatives (FN) denotes the number of positive instances predicted to be negative. Similarly, True Negatives (TN) is the number of correctly predicted negative instances, and False Positives (FP) denotes the number of negative instances predicted to be positive. The True Positive rate (TPrate), represented as TPrate = {$\frac{TP}{TP+FN}$}, depicts the rate at which the positive class is recognised. This is also known as recall or sensitivity. The corresponding metric for the negative class is the True Negative rate (TNrate), measured as TNrate = {$\frac{TN}{TN+FP}$}. This is also known as specificity and indicates the proportion of negative instances that are correctly detected. The purpose of Positive Predictive Value (PPV) and Negative Predictive Value (NPV) is to quantify how many instances detected as belonging to a given class actually belong to that class. PPV, also known as precision, measures the proportion of instances predicted to be positive that are actually positive (i.e., PPV = {$\frac{TP}{TP+FP}$}). NPV denotes the proportion of instances predicted to be negative that are actually negative (i.e., NPV = {$\frac{TN}{TN+FN}$}).
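The four elementary rates can be computed directly from the confusion-matrix counts. The counts below are hypothetical, chosen only to illustrate the formulas from the text:

```python
# Hypothetical confusion-matrix counts (illustrative only)
TP, FN, TN, FP = 40, 10, 45, 5

tp_rate = TP / (TP + FN)   # recall / sensitivity: positives correctly recognised
tn_rate = TN / (TN + FP)   # specificity: negatives correctly recognised
ppv     = TP / (TP + FP)   # precision: predicted positives that are truly positive
npv     = TN / (TN + FN)   # predicted negatives that are truly negative

print(tp_rate, tn_rate, ppv, npv)  # → 0.8 0.9 0.888... 0.818...
```

Note that TPrate and TNrate are normalised by the actual class totals (the columns of the matrix), while PPV and NPV are normalised by the predicted class totals (the rows).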

### Composite measures

From the elementary performance metrics discussed above, several composite measures have been constructed, such as the F-measure and ROC curves. The F-measure (more specifically, F1) is the harmonic mean of precision and recall, denoted as {$2\times\frac{precision \times recall}{precision+recall}$}. The ROC (Receiver Operating Characteristic) curve plots the true positive rate (or sensitivity, denoted as {$\frac{TP}{TP+FN}$}) against the false positive rate (or 1 − specificity, denoted as {$\frac{FP}{FP+TN}$}) at different classification thresholds. Typically, a good classification model should reside in the upper left region of the plot (Figure 2). The point (0,0) indicates a model that predicts all instances as negative, the point (1,1) a model that predicts all instances as positive, while a random classifier corresponds to the diagonal y = x. The ideal classification model generates the point (0,1), indicating that its false positive rate is zero (i.e., none of the negative instances are predicted to be positive) and its true positive rate is equal to 1 (i.e., every positive instance is identified). The AUC (Area Under the ROC Curve) is an aggregate measure of the ROC curve that summarises performance across all possible thresholds. More specifically, the AUC denotes the entire two-dimensional area under the ROC curve from point (0,0) to (1,1). Simply put, it indicates the probability that the classifier will rank a randomly chosen positive instance higher than a randomly chosen negative instance.
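Both composite measures can be sketched in a few lines. The counts and scores below are illustrative assumptions, not values from the text; F1 follows the harmonic-mean formula above, and the AUC is computed via its ranking interpretation, i.e., the fraction of positive–negative pairs in which the positive instance receives the higher score (ties counted as 0.5):

```python
def f1_score(tp, fp, fn):
    """F1 as the harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def auc_by_ranking(pos_scores, neg_scores):
    """AUC as the probability that a random positive outscores a random negative."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical counts and classifier scores (illustrative only)
print(f1_score(tp=40, fp=5, fn=10))                       # → 0.842...
print(auc_by_ranking([0.9, 0.8, 0.4], [0.7, 0.3, 0.2]))   # → 0.888...
```

The pairwise-ranking computation above is exact only for small samples; in practice, the AUC is usually obtained by numerically integrating the ROC curve over sorted score thresholds, which gives the same value.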