Datasets
In the ch.ethz.dalab.dissolve.examples.binaryclassification package of dissolve-struct, you’ll find three binary SVM examples, each using a different dataset.
Each of these is intended to showcase a different aspect of dissolve-struct’s awesomeness. COV is a relatively large corpus of 581,012 data points, each with 54 features. RCV1 contains only 20,242 data points, but each one is a sparse vector with 47,236 features.
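To make the contrast between the two datasets concrete, here is a back-of-the-envelope sketch of how much memory each would need if stored densely. It assumes 8 bytes per feature value (a Double) and ignores labels and object overhead; the helper name denseBytes is made up for this illustration and is not part of dissolve-struct.

```scala
// Rough dense-storage estimate: one 8-byte Double per feature value.
// Long arithmetic is used because the RCV1 byte count overflows an Int.
def denseBytes(points: Long, features: Long): Long = points * features * 8L

val covBytes  = denseBytes(581012L, 54L)    // COV: many points, few features
val rcv1Bytes = denseBytes(20242L, 47236L)  // RCV1: fewer points, many features

println(f"COV dense size:  ${covBytes / 1e6}%.0f MB")
println(f"RCV1 dense size: ${rcv1Bytes / 1e9}%.2f GB")
```

Under these assumptions COV fits in roughly 251 MB, while a dense RCV1 would need about 7.6 GB, which is why RCV1 is handled as sparse vectors.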
Running the examples
Training a binary SVM locally from the command line is done as follows, shown here for the Forest Cover (COV) dataset. Within the dissolve-struct-examples directory, run:
spark-1.X/bin/spark-submit \
--class "ch.ethz.dalab.dissolve.examples.binaryclassification.COVBinary" \
--master local \
--driver-memory 2G \
<examples-jar-path>
Running your own Binary classifier
A binary classifier is bundled with dissolve-struct. To use it, you only need to provide the data and the solver parameters. As with other Spark MLlib classifiers, the data can be loaded from a LIBSVM-formatted file using MLUtils.loadLibSVMFile:
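import org.apache.spark.mllib.util.MLUtils
// sc is the SparkContext; covPath points to a LIBSVM-formatted file
val training = MLUtils.loadLibSVMFile(sc, covPath)
// Start from the default solver options
val solverOptions: SolverOptions[Vector[Double], Double] = new SolverOptions()
val model = BinarySVMWithDBCFW.train(training, solverOptions)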
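For readers unfamiliar with the LIBSVM text format, the following is a minimal sketch of how a single line is laid out: a label followed by index:value pairs for the non-zero features. The parser parseLibSVMLine and the sample line are hypothetical and only for illustration; in practice MLUtils.loadLibSVMFile does this work, including the conversion from the file's one-based indices to zero-based ones.

```scala
// Parse one LIBSVM-formatted line: "<label> <index>:<value> <index>:<value> ..."
// Indices in the file are one-based; they are shifted to zero-based here.
def parseLibSVMLine(line: String): (Double, Seq[(Int, Double)]) = {
  val tokens = line.trim.split("\\s+")
  val label = tokens.head.toDouble
  val features = tokens.tail.toSeq.map { tok =>
    val Array(idx, value) = tok.split(":")
    (idx.toInt - 1, value.toDouble)
  }
  (label, features)
}

// Made-up sample line with three non-zero features
val (label, features) = parseLibSVMLine("+1 3:0.5 10:1.0 54:2.0")
```

Only the non-zero entries are listed, which is what keeps a dataset such as RCV1 compact on disk.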
Label format: the labels must be +1.0 or -1.0. This can usually be handled in a preprocessing step.
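One possible preprocessing step is sketched below, assuming the raw data marks one class with a known value (1.0 here) and everything else should become the negative class. The helper toSignedLabel and the sample labels are hypothetical, not part of dissolve-struct.

```scala
// Map an arbitrary binary labelling onto +1.0 / -1.0.
// positiveLabel is the raw value that should become +1.0 (assumed 1.0 by default).
def toSignedLabel(label: Double, positiveLabel: Double = 1.0): Double =
  if (label == positiveLabel) 1.0 else -1.0

// Made-up raw labels in {0.0, 1.0}
val rawLabels = Seq(0.0, 1.0, 1.0, 0.0)
val signedLabels = rawLabels.map(toSignedLabel(_))
```

On an RDD of LabeledPoint, the same function could be applied with something like training.map(lp => LabeledPoint(toSignedLabel(lp.label), lp.features)) before handing the data to the classifier.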