Title: | Exploratory Data Analysis and Manipulation of Multi-Label Data Sets |
---|---|
Description: | Exploratory data analysis and manipulation functions for multi-label data sets, along with an interactive Shiny application to ease their use. |
Authors: | David Charte [cre] |
Maintainer: | David Charte <[email protected]> |
License: | LGPL (>= 3) | file LICENSE |
Version: | 0.4.2 |
Built: | 2025-02-01 03:42:14 UTC |
Source: | https://github.com/fcharte/mldr |
mldr
Filter rows, returning a new mldr
Generates a new mldr object containing the selected rows from an existing mldr
## S3 method for class 'mldr'
mldrObject[rowFilter = T]
mldrObject |
Original mldr object |
rowFilter |
Expression to filter the rows |
A new mldr
object with the selected rows
mldr_from_dataframe, ==.mldr, +.mldr
library(mldr)
highlycoupled <- genbase[.SCUMBLE > 0.05]  # Select instances with highly imbalanced, coupled labels
summary(highlycoupled)  # Compare the selected instances
summary(genbase)        # with the traits of the original MLD
Generates a new mldr object joining the rows of the two mldr objects given as input
## S3 method for class 'mldr'
mldr1 + mldr2
mldr1 |
First mldr object |
mldr2 |
Second mldr object |
a new mldr
object with all rows in the two parameters
Checks if two mldr objects have the same structure
## S3 method for class 'mldr'
mldr1 == mldr2
mldr1 |
First mldr object |
mldr2 |
Second mldr object |
TRUE
if the two mldr objects have the same structure, FALSE
otherwise
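A minimal sketch of both operators, assuming the bundled emotions dataset (any two mldr objects with the same structure would work):

```r
library(mldr)

# Identical structure: comparing a dataset with itself returns TRUE
emotions == emotions

# Join two compatible mldr objects; the result holds the rows of both,
# so the instance count doubles when a dataset is joined with itself
joined <- emotions + emotions
joined$measures$num.instances
```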
Evaluation metrics based on simple measures derived from the confusion matrix, averaged according to several criteria.
accuracy(true_labels, predicted_labels, undefined_value = "diagnose")
precision(true_labels, predicted_labels, undefined_value = "diagnose")
micro_precision(true_labels, predicted_labels, ...)
macro_precision(true_labels, predicted_labels, undefined_value = "diagnose")
recall(true_labels, predicted_labels, undefined_value = "diagnose")
micro_recall(true_labels, predicted_labels, ...)
macro_recall(true_labels, predicted_labels, undefined_value = "diagnose")
fmeasure(true_labels, predicted_labels, undefined_value = "diagnose")
micro_fmeasure(true_labels, predicted_labels, ...)
macro_fmeasure(true_labels, predicted_labels, undefined_value = "diagnose")
true_labels |
Matrix of true labels, columns corresponding to labels and rows to instances. |
predicted_labels |
Matrix of predicted labels, columns corresponding to labels and rows to instances. |
undefined_value |
The value to be returned when a computation results in an undefined value due to a division by zero. See details. |
... |
Additional parameters for precision, recall and Fmeasure. |
Available metrics in this category
accuracy
: Bipartition based accuracy
fmeasure
: Example and bipartition based F_1 measure (harmonic mean between precision and recall, averaged by instance)
macro_fmeasure
: Label and bipartition based F_1 measure (harmonic mean between precision and recall, macro-averaged by label)
macro_precision
: Label and bipartition based precision (macro-averaged by label)
macro_recall
: Label and bipartition based recall (macro-averaged by label)
micro_fmeasure
: Label and bipartition based F_1 measure (micro-averaged)
micro_precision
: Label and bipartition based precision (micro-averaged)
micro_recall
: Label and bipartition based recall (micro-averaged)
precision
: Example and bipartition based precision (averaged by instance)
recall
: Example and bipartition based recall (averaged by instance)
Deciding a value when denominators are zero
Parameter undefined_value
: The value to be returned when a computation
results in an undefined value due to a division by zero. Can be a single
value (e.g. NA, 0), a function with the following signature:
function(tp, fp, tn, fn)
or a string corresponding to one of the predefined strategies. These are:
"diagnose"
: This strategy performs the following decision:
Returns 1 if there are no true labels and none were predicted
Returns 0 otherwise
This is the default strategy, and the one followed by MULAN.
"ignore"
: Occurrences of undefined values will be ignored when
averaging (averages will be computed with potentially less values than
instances/labels). Undefined values in micro-averaged metrics cannot be
ignored (will return NA
).
"na"
: Will return NA
(with class numeric
) and it
will be propagated when averaging (averaged metrics will potentially return
NA
).
Atomic numeric vector containing the resulting value in the range [0, 1].
Other evaluation metrics: Basic metrics, Ranking-based metrics
true_labels <- matrix(c(
  1,1,1,
  0,0,0,
  1,0,0,
  1,1,1,
  0,0,0,
  1,0,0
), ncol = 3, byrow = TRUE)
predicted_labels <- matrix(c(
  1,1,1,
  0,0,0,
  1,0,0,
  1,1,0,
  1,0,0,
  0,1,0
), ncol = 3, byrow = TRUE)
precision(true_labels, predicted_labels, undefined_value = "diagnose")
macro_recall(true_labels, predicted_labels, undefined_value = 0)
macro_fmeasure(
  true_labels, predicted_labels,
  undefined_value = function(tp, fp, tn, fn) as.numeric(fp == 0 && fn == 0)
)
Several evaluation metrics designed for multi-label problems.
hamming_loss(true_labels, predicted_labels)
subset_accuracy(true_labels, predicted_labels)
true_labels |
Matrix of true labels, columns corresponding to labels and rows to instances. |
predicted_labels |
Matrix of predicted labels, columns corresponding to labels and rows to instances. |
Available metrics in this category
hamming_loss
: describes
the average absolute distance between a predicted label and its true value.
subset_accuracy
: the ratio of correctly predicted labelsets.
Resulting value in the range [0, 1]
Other evaluation metrics: Averaged metrics, Ranking-based metrics
true_labels <- matrix(c(
  1,1,1,
  0,0,0,
  1,0,0,
  1,1,1,
  0,0,0,
  1,0,0
), ncol = 3, byrow = TRUE)
predicted_labels <- matrix(c(
  1,1,1,
  0,0,0,
  1,0,0,
  1,1,0,
  1,0,0,
  0,1,0
), ncol = 3, byrow = TRUE)
hamming_loss(true_labels, predicted_labels)
subset_accuracy(true_labels, predicted_labels)
birds dataset.
birds
An mldr object with 645 instances, 279 attributes and 19 labels
F. Briggs, Yonghong Huang, R. Raich, K. Eftaxias, Zhong Lei, W. Cukierski, S. Hadley, A. Hadley, M. Betts, X. Fern, J. Irvine, L. Neal, A. Thomas, G. Fodor, G. Tsoumakas, Hong Wei Ng, Thi Ngoc Tho Nguyen, H. Huttunen, P. Ruusuvuori, T. Manninen, A. Diment, T. Virtanen, J. Marzat, J. Defretin, D. Callender, C. Hurlburt, K. Larrey, M. Milakov. "The 9th annual MLSP competition: New methods for acoustic classification of multiple simultaneous bird species in a noisy environment", in proc. 2013 IEEE International Workshop on Machine Learning for Signal Processing (MLSP)
summary(birds)
birds$labels
This function produces a label concurrence report, providing the average SCUMBLE, SCUMBLE by label, a list with the minority labels most affected by this problem indicating which majority labels they appear with, and a concurrence plot. The report is written to the standard output by default, but it can be redirected to a PDF file.
concurrenceReport(mld, pdfOutput = FALSE, file = "Rconcurrence.pdf")
mld |
|
pdfOutput |
Boolean value indicating whether the output has to be sent to a PDF file. Defaults to FALSE, so the report is shown in the console. |
file |
If the |
None
library(mldr)
## Not run: 
concurrenceReport(birds)
## End(Not run)
emotions dataset.
emotions
An mldr object with 593 instances, 78 attributes and 6 labels
K. Trohidis, G. Tsoumakas, G. Kalliris, I. Vlahavas. "Multilabel Classification of Music into Emotions". Proc. 2008 International Conference on Music Information Retrieval (ISMIR 2008), pp. 325-330, Philadelphia, PA, USA, 2008
summary(emotions)
emotions$labels
genbase dataset.
genbase
An mldr object with 662 instances, 1213 attributes and 27 labels
S. Diplaris, G. Tsoumakas, P. Mitkas and I. Vlahavas. Protein Classification with Multiple Algorithms, Proc. 10th Panhellenic Conference on Informatics (PCI 2005), pp. 448-456, Volos, Greece, November 2005
summary(genbase)
genbase$labels
This function returns a list with the minority labels most affected by concurrence with majority labels, providing the indexes of the majority labels each minority label interacts with, along with the number of instances in which they appear together.
labelInteractions(mld, labelProportion)
mld |
|
labelProportion |
A value in the (0,1] range establishing the proportion of minority labels to be included in the result. By default, at least 3 minority labels or 10% of them (whichever is larger) are included, or all of them if there are fewer than 3. |
A list with two slots, indexes and interactions. The former contains the indexes of the minority labels, sorted from highest to lowest SCUMBLE. The latter provides, for each of these labels, the indexes of the majority labels it interacts with and the number of samples in which they appear together.
library(mldr)
labelInteractions(birds)
Reads a multilabel dataset from a file and returns an mldr
object
containing the data and additional measures. The file has to be in ARFF format.
The label information can be in a separate XML file (MULAN style) or in the
ARFF header itself (MEKA style)
mldr(filename, use_xml = TRUE, auto_extension = TRUE, xml_file,
  label_indices, label_names, label_amount,
  force_read_from_file = !all(c(missing(xml_file), missing(label_indices),
    missing(label_names), missing(label_amount), use_xml, auto_extension)),
  ...)
filename |
Name of the dataset |
use_xml |
Specifies whether to use an associated XML file to identify the labels. Defaults to TRUE |
auto_extension |
Specifies whether to add the '.arff' and '.xml' extensions to the filename where appropriate. Defaults to TRUE |
xml_file |
Path to the XML file. If not provided, the filename ending in ".xml" will be assumed |
label_indices |
Optional vector containing the indices of the attributes that should be read as labels |
label_names |
Optional vector containing the names of the attributes that should be read as labels |
label_amount |
Optional parameter indicating the number of labels in the dataset, which will be taken from the last attributes of the dataset |
force_read_from_file |
Set this parameter to TRUE to always read from a local file, or set it to FALSE to look for the dataset within the 'mldr.datasets' package |
... |
Extra parameters to be passed to 'read_arff' |
An mldr object containing the multilabel dataset
mldr_from_dataframe, read.arff, summary.mldr
library(mldr)
## Not run: 
# Read "yeast.arff" and labels from "yeast.xml"
mymld <- mldr("yeast")

# Read "yeast.arff" and labels from "yeast.xml", converting categorical
# attributes to factors
mymld <- mldr("yeast", stringsAsFactors = TRUE)

# Read "yeast-tra.arff" and labels from "yeast.xml"
mymld <- mldr("yeast-tra", xml_file = "yeast.xml")

# Read "yeast.arff" specifying the amount of attributes to be used as labels
mymld <- mldr("yeast", label_amount = 14)

# Read MEKA style dataset, without XML file and giving extension
mymld <- mldr("IMDB.arff", use_xml = FALSE, auto_extension = FALSE)
## End(Not run)
Taking as input an mldr
object and a matrix with the predictions
given by a classifier, this function evaluates the classifier performance through
several multilabel metrics.
mldr_evaluate(mldr, predictions, threshold = 0.5)
mldr |
Object of |
predictions |
Matrix with the labels predicted for each instance in the |
threshold |
Threshold to use to generate bipartition of labels. By default the value 0.5 is used |
A list with multilabel predictive performance measures. The items in the list will be
accuracy
example_auc
average_precision
coverage
fmeasure
hamming_loss
macro_auc
macro_fmeasure
macro_precision
macro_recall
micro_auc
micro_fmeasure
micro_precision
micro_recall
one_error
precision
ranking_loss
recall
subset_accuracy
roc
The roc element corresponds to a roc object associated with the MicroAUC value. This object can be given as input to plot for plotting the ROC curve.
The example_auc
, macro_auc
, micro_auc
and roc
members will be NULL
if the pROC
package is not installed.
mldr, Basic metrics, Averaged metrics, Ranking-based metrics, roc.mldr
## Not run: 
library(mldr)

# Get the true labels in emotions
predictions <- as.matrix(emotions$dataset[, emotions$labels$index])

# and introduce some noise (alternatively get the predictions from some classifier)
noised_labels <- cbind(sample(1:593, 200, replace = TRUE),
                       sample(1:6, 200, replace = TRUE))
predictions[noised_labels] <- sample(0:1, 100, replace = TRUE)

# then evaluate predictive performance
res <- mldr_evaluate(emotions, predictions)
str(res)
plot(res$roc, main = "ROC curve for emotions")
## End(Not run)
This function creates a new mldr
object from the data
stored in a data.frame
, taking as labels the columns pointed by the
indexes given in a vector.
mldr_from_dataframe(dataframe, labelIndices, attributes, name)
dataframe |
The |
labelIndices |
Vector containing the indices of attributes acting as labels. Usually the
labels will be at the end (right-most columns) or the beginning (left-most columns) of the |
attributes |
Vector with the attributes type, as returned by the |
name |
Name of the dataset. The name of the dataset given as first parameter will be used by default |
An mldr object containing the multilabel dataset
library(mldr)
df <- data.frame(matrix(rnorm(1000), ncol = 10))
df$Label1 <- c(sample(c(0,1), 100, replace = TRUE))
df$Label2 <- c(sample(c(0,1), 100, replace = TRUE))
mymldr <- mldr_from_dataframe(df, labelIndices = c(11, 12), name = "testMLDR")
summary(mymldr)
Extracts a matrix with the true 0-1 assignment of labels of an
"mldr"
object.
mldr_to_labels(mldr)
mldr |
|
Numeric matrix of labels.
Basic metrics, Averaged metrics, Ranking-based metrics.
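A minimal sketch of its use, building a tiny mldr from a data.frame (the column names are illustrative only):

```r
library(mldr)

# Toy dataset: two attributes and two labels (columns 3 and 4)
df <- data.frame(att1 = c(0.2, 0.8, 0.5),
                 att2 = c(1.0, 0.3, 0.7),
                 lab1 = c(1, 0, 1),
                 lab2 = c(0, 1, 1))
toy <- mldr_from_dataframe(df, labelIndices = c(3, 4), name = "toy")

# 0-1 matrix: one row per instance, one column per label
mldr_to_labels(toy)
```

The resulting matrix can be passed as the true_labels argument of the evaluation metric functions.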
Transforms an mldr
object into one or several binary or multiclass datasets, returning them as data.frame
objects
mldr_transform(mldr, type = "BR", labels)
mldr |
The mldr object to transform |
type |
Indicates the type of transformation to apply. Possible types are:
|
labels |
Vector with the label indexes to include in the transformation. All labels will be used if not specified |
A list of data.frames containing the resulting datasets (for BR) or a data.frame with the dataset (for LP).
The result is no longer an mldr
object, but a plain data.frame
library(mldr)
emotionsbr <- mldr_transform(emotions, type = "BR")
emotionslp <- mldr_transform(emotions, type = "LP")
Loads an interactive user interface in the web browser, built using R shiny.
mldrGUI()
The mldr package provides a basic, Shiny-based GUI to work with multilabel datasets. You have to install the shiny package to be able to use this GUI.
The user interface allows working with any of the previously loaded datasets, as well as loading new ones. The GUI is structured into the following pages:
Main: This page is divided into two sections. The one at the left can be used to choose a previously loaded dataset, as well as to load datasets from files. The right part shows some basic statistics about the selected multilabel dataset.
Labels: This page shows a table containing for each label its name, index, count, relative frequency and imbalance ratio (IRLbl). The page also includes a bar plot of the label frequency. The range of labels in the plot can be customized.
Labelsets: This page shows a table containing for each labelset its representation and a counter.
Attributes: This page shows a table containing for each attribute its name, type and a summary of its values.
Concurrence: This page shows for each label the number of instances in which it appears and its mean SCUMBLE measure, along with a plot that shows the level of concurrence among the selected labels. Clicking the labels in the table makes it possible to add/remove them from the plot.
The tables shown in these pages can be sorted by any of their fields, as well as filtered. The content of the tables can be copied to the clipboard, printed, and saved in CSV or Microsoft Excel format.
Nothing
## Not run: 
library(mldr)
mldrGUI()
## End(Not run)
Generates graphic representations of an mldr
object
## S3 method for class 'mldr'
plot(x, type = "LC", labelCount, labelIndices, title,
  ask = length(type) > prod(par("mfcol")), ...)
x |
The mldr object whose features are to be drawn |
type |
Indicates the type(s) of plot to be produced. Possible types are:
|
labelCount |
Samples the labels in the dataset to show information of only |
labelIndices |
Establishes the labels to be shown in the plot |
title |
A title to be shown above the plot. Defaults to the name of the dataset passed as first argument |
ask |
Specifies whether to pause the generation of plots after each one |
... |
Additional parameters to be given to barplot, hist, etc. |
library(mldr)
## Not run: 
# Label concurrence plot
plot(genbase, type = "LC")  # Plots all labels
plot(genbase)               # Same as above
plot(genbase, title = "genbase dataset", color.function = heat.colors)  # Changes the title and color
plot(genbase, labelCount = 10)  # Randomly selects 10 labels to plot
plot(genbase, labelIndices = genbase$labels$index[1:10])  # Plots info of first 10 labels

# Label bar plot
plot(emotions, type = "LB", col = terrain.colors(emotions$measures$num.labels))

# Label histogram plot
plot(emotions, type = "LH")

# Cardinality histogram plot
plot(emotions, type = "CH")

# Attributes by type
plot(emotions, type = "AT", cex = 0.85)

# Labelset histogram
plot(emotions, type = "LSH")
## End(Not run)
Prints the mldr object data, including input attributes and output labels
## S3 method for class 'mldr'
print(x, ...)
x |
Object whose data are to be printed |
... |
Additional parameters to be given to print |
library(mldr)
emotions
print(emotions)  # Same as above
Functions that compute ranking-based metrics, given a matrix of true labels and a matrix of predicted probabilities.
average_precision(true_labels, predictions, ...)
one_error(true_labels, predictions)
coverage(true_labels, predictions, ...)
ranking_loss(true_labels, predictions)
macro_auc(true_labels, predictions, undefined_value = 0.5, na.rm = FALSE)
micro_auc(true_labels, predictions)
example_auc(true_labels, predictions, undefined_value = 0.5, na.rm = FALSE)
true_labels |
Matrix of true labels, columns corresponding to labels and rows to instances. |
predictions |
Matrix of probabilities predicted by a classifier. |
... |
Additional parameters to be passed to the ranking function. |
undefined_value |
A default value for the cases when macro-averaged
and example-averaged AUC encounter undefined (not computable) values, e.g.
|
na.rm |
Logical specifying whether to ignore undefined values when
|
Available metrics in this category
average_precision
: Example and ranking based average precision (how many steps have to be made in the ranking to reach a certain relevant label, averaged by instance)
coverage
: Example and ranking based coverage (how many steps have to be made in the ranking to cover all the relevant labels, averaged by instance)
example_auc
: Example based Area Under the Curve ROC (averaged by instance)
macro_auc
: Label and ranking based Area Under the Curve ROC (macro-averaged by label)
micro_auc
: Label and ranking based Area Under the Curve ROC (micro-averaged)
one_error
: Example and ranking based one-error (how many times the top-ranked label is not a relevant label, averaged by instance)
ranking_loss
: Example and ranking based ranking-loss (how many times a non-relevant label is ranked above a relevant one, evaluated for all label pairs and averaged by instance)
Breaking ties in rankings
The additional ties_method
parameter for the ranking
function is passed to R's own rank
. It accepts the following values:
"average"
"first"
"last"
"random"
"max"
"min"
See rank
for information on the effect of each
parameter.
The default behavior in mldr corresponds to value "last"
, since this
is the behavior of the ranking method in MULAN, in order to facilitate fair
comparisons among classifiers over both platforms.
Atomic numeric vector specifying the resulting performance metric value.
Other evaluation metrics: Averaged metrics, Basic metrics
true_labels <- matrix(c(
  1,1,1,
  0,0,0,
  1,0,0,
  1,1,1,
  0,0,0,
  1,0,0
), ncol = 3, byrow = TRUE)
predicted_probs <- matrix(c(
  .6,.5,.9,
  .0,.1,.2,
  .8,.3,.2,
  .7,.9,.1,
  .7,.3,.2,
  .1,.8,.3
), ncol = 3, byrow = TRUE)

# by default, labels with same ranking are assigned ascending rankings
# in the order they are encountered
coverage(true_labels, predicted_probs)

# in the following, labels with same ranking will receive the same,
# averaged ranking
average_precision(true_labels, predicted_probs, ties_method = "average")

# the following will treat all undefined values as 0 (counting them
# for the average)
example_auc(true_labels, predicted_probs, undefined_value = 0)

# the following will ignore undefined values (not counting them for
# the average)
example_auc(true_labels, predicted_probs, undefined_value = NA, na.rm = TRUE)
Reads a multilabel dataset from an ARFF file in MULAN or MEKA format, retrieving the instances and distinguishing the attributes that correspond to labels
read.arff(filename, use_xml = TRUE, auto_extension = TRUE, xml_file,
  label_indices, label_names, label_amount, ...)
filename |
Name of the dataset |
use_xml |
Specifies whether to use an associated XML file to identify the labels. Defaults to TRUE |
auto_extension |
Specifies whether to add the '.arff' and '.xml' extensions to the filename where appropriate. Defaults to TRUE |
xml_file |
Path to the XML file. If not provided, the filename ending in ".xml" will be assumed |
label_indices |
Optional vector containing the indices of the attributes that should be read as labels |
label_names |
Optional vector containing the names of the attributes that should be read as labels |
label_amount |
Optional parameter indicating the number of labels in the dataset, which will be taken from the last attributes of the dataset |
... |
Extra parameters that will be passed to the parsers. Currently
only the option |
A list containing four members: dataframe, containing the dataset; labelIndices, specifying the indices of the attributes that correspond to labels; attributes, containing the name and type of each attribute; and name, the name of the dataset.
library(mldr)
## Not run: 
# Read "yeast.arff" and labels from "yeast.xml"
mymld <- read.arff("yeast")
## End(Not run)
This function implements the REMEDIAL algorithm. It is a preprocessing algorithm for imbalanced multilabel datasets, whose aim is to decouple frequent and rare classes appearing in the same instance. For doing so, it adds new instances to the dataset and edits the labels present in them.
remedial(mld)
mld |
|
An mldr object containing the preprocessed multilabel dataset
F. Charte, A. J. Rivera, M. J. del Jesus, F. Herrera. "Resampling Multilabel Datasets by Decoupling Highly Imbalanced Labels". Proc. 2015 International Conference on Hybrid Artificial Intelligent Systems (HAIS 2015), pp. 489-501, Bilbao, Spain, 2015
concurrenceReport, labelInteractions
library(mldr)
## Not run: 
summary(birds)
summary(remedial(birds))
## End(Not run)
Calculates the ROC (Receiver Operating Characteristic) curve for given true labels and predicted ones. The pROC package is needed for this functionality.
roc(...)
## S3 method for class 'mldr'
roc(mldr, predictions, ...)
... |
Additional parameters to be passed to the |
mldr |
An |
predictions |
Matrix of predicted labels or probabilities, columns corresponding to labels and rows to instances. |
ROC object from pROC package.
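A hedged sketch of its use (requires the pROC package; for illustration it reuses the true label matrix of the bundled emotions dataset as mock predictions):

```r
library(mldr)

if (requireNamespace("pROC", quietly = TRUE)) {
  # Mock "predictions": the true 0-1 label matrix of emotions
  predictions <- as.matrix(emotions$dataset[, emotions$labels$index])

  # Compute and plot the ROC curve associated with the MicroAUC value
  curve <- roc(emotions, predictions)
  plot(curve)
}
```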
Prints a summary of the measures obtained from the mldr
object
## S3 method for class 'mldr'
summary(object, ...)
object |
Object whose measures are to be printed |
... |
Additional parameters to be given to print |
library(mldr)
summary(emotions)
Write an "mldr" object to a file
Save the mldr content to an ARFF file and the label data to an XML file.
If you need faster writes, more options and support for other formats, please
refer to the write.mldr function in package mldr.datasets.
write_arff(obj, filename, write.xml = FALSE)
obj |
The |
filename |
Base name for the files (without extension) |
write.xml |
|
In mldr.datasets: write.mldr
write_arff(emotions, "myemotions")