Binary SVM

Datasets

The ch.ethz.dalab.dissolve.examples.binaryclassification package of dissolve-struct contains three Binary SVM examples, one for each of the following datasets:

  1. Adult (A1A)
  2. Forest Cover (COV)
  3. Reuters Corpus Volume 1 (RCV1)

Each of these is intended to showcase a different strength of dissolve-struct. COV is a relatively large dataset containing 581,012 data points, each with 54 features. RCV1 contains only 20,242 data points, but each one is a sparse vector with 47,236 features.

Running the examples

Training a binary SVM locally from the command line is done as follows, shown here for the Forest Cover (COV) dataset. Within the dissolve-struct-examples directory, run

spark-1.X/bin/spark-submit \
	--class "ch.ethz.dalab.dissolve.examples.binaryclassification.COVBinary" \
	--master local \
	--driver-memory 2G \
	<examples-jar-path>

Running your own Binary classifier

A binary classifier is bundled with dissolve-struct. To use it, you only need to provide the data and the solver parameters. As with other Spark MLlib classifiers, the data can be provided in LIBSVM format and loaded with MLUtils.loadLibSVMFile.

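import org.apache.spark.mllib.util.MLUtils

// Load the training data (LIBSVM format) as an RDD of labeled points
val training = MLUtils.loadLibSVMFile(sc, covPath)
// Start from the default solver parameters
val solverOptions: SolverOptions[Vector[Double], Double] = new SolverOptions()

val model = BinarySVMWithDBCFW.train(training, solverOptions)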
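
For readers unfamiliar with the LIBSVM text format that loadLibSVMFile reads, each line has the form "<label> <index>:<value> <index>:<value> ...", with 1-based feature indices in the file. The following is only an illustrative sketch of that layout: parseLibSVMLine is a hypothetical helper, not part of dissolve-struct or Spark, and the sample line is made up.

```scala
// Minimal sketch of the LIBSVM line layout: "<label> <index>:<value> ..."
// `parseLibSVMLine` is a hypothetical helper for illustration only;
// in practice MLUtils.loadLibSVMFile does the parsing.
object LibSVMLineDemo {
  def parseLibSVMLine(line: String): (Double, Seq[(Int, Double)]) = {
    val tokens = line.trim.split("\\s+")
    val label = tokens.head.toDouble
    // Each remaining token is "index:value"; indices are kept as written (1-based)
    val features = tokens.tail.toSeq.map { tok =>
      val Array(idx, value) = tok.split(":")
      (idx.toInt, value.toDouble)
    }
    (label, features)
  }

  def main(args: Array[String]): Unit = {
    // Made-up sample line with three non-zero features
    val (label, features) = parseLibSVMLine("+1 3:1 11:0.5 14:1")
    println(s"label=$label, non-zero features=${features.size}")
  }
}
```

Only the non-zero features are listed on each line, which is why the format suits sparse datasets such as RCV1.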

Label format: The labels need to be +1.0 or -1.0. This can usually be taken care of in the preprocessing stage.
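
As a rough sketch of such a preprocessing step, the snippet below remaps raw labels to +1.0/-1.0. It assumes, purely for illustration, that the raw data encodes its two classes as 1.0 and 2.0; toSignedLabel and positiveLabel are hypothetical names, not part of dissolve-struct.

```scala
object LabelRemapDemo {
  // Hypothetical helper: the label equal to `positiveLabel` becomes +1.0,
  // every other label becomes -1.0.
  def toSignedLabel(raw: Double, positiveLabel: Double): Double =
    if (raw == positiveLabel) 1.0 else -1.0

  def main(args: Array[String]): Unit = {
    // Assumed raw encoding: classes stored as 1.0 and 2.0
    val rawLabels = Seq(1.0, 2.0, 2.0, 1.0)
    val signed = rawLabels.map(toSignedLabel(_, positiveLabel = 1.0))
    println(signed) // List(1.0, -1.0, -1.0, 1.0)
  }
}
```

With an RDD[LabeledPoint] obtained from MLUtils.loadLibSVMFile, the same function could be applied in a map, for example training.map(lp => LabeledPoint(toSignedLabel(lp.label, 1.0), lp.features)), before passing the data to the trainer.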

