Here, we’ll show you how to write your own Distributed Structured Prediction application with the help of the packaged starter kit found in `dissolve-struct-application`.
To begin, you’ll need:
- Basic experience with SVMs or Structured SVMs
- Linux or Mac OS X
- Scala Eclipse IDE
- sbt (`sudo apt-get install sbt` on Ubuntu or `brew install sbt` on OS X)
Obtain project repository
First, you’ll need to obtain the project repository.
$ git clone https://github.com/dalab/dissolve-struct.git
Set up development environment
Now, we’ll need to build `dissolve-struct` and generate some project files, so that the application can be imported into Eclipse.
$ cd dissolve-struct-lib
$ sbt publish-local
$ cd ../dissolve-struct-application
$ sbt eclipse
The application can now be imported into Eclipse via File | Import | General | Existing Projects into Workspace.
If Eclipse then displays build errors, such as the library being cross-compiled with Scala version 2.11, right-click the project, go to Project Properties | Scala Compiler, and switch to Fixed Scala Installation: 2.10.x (built-in).
Implement Dissolve Functions
Now that the development environment is set up, you’ll need to implement a few functions of an interface named `DissolveFunctions`.
To get you started, you’ll find a skeleton in `src/main/scala/ch/ethz/dalab/dissolve/app/DSApp.scala`.
This file contains all the instructions necessary to implement your application.
The main idea is to implement:
- The joint feature map \( \phi \)
- A loss function \( \Delta \)
- The maximization oracle \( H(w) \)
and provide the training data in the `main()` function.
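To make this concrete, here is a minimal sketch of these three pieces for the simplest case, binary classification, where \( y \in \{-1, +1\} \) and \( \phi(x, y) = y \cdot x \). In solvers of this family, the maximization oracle performs loss-augmented decoding, typically maximizing \( \Delta(y_i, y) + \langle w, \phi(x_i, y) \rangle \) over the candidate labels \( y \). Note that the method names and signatures below are illustrative assumptions; take the authoritative trait definition from the `DSApp.scala` skeleton.

```scala
import breeze.linalg.DenseVector

// A sketch of the functions DSApp.scala asks you to fill in, shown here for
// binary classification with labels in {-1, +1}. NOTE: the method names,
// signatures, and type parameters are illustrative assumptions; copy the exact
// DissolveFunctions trait (e.g., whether the oracle receives a raw weight
// vector or a model object) from the skeleton before implementing.
object MyDissolveFunctions /* extends DissolveFunctions[DenseVector[Double], Double] */ {

  // Joint feature map: phi(x, y) = y * x
  def featureFn(x: DenseVector[Double], y: Double): DenseVector[Double] =
    x * y

  // 0/1 loss: Delta(yTruth, yPredicted)
  def lossFn(yTruth: Double, yPredicted: Double): Double =
    if (yTruth == yPredicted) 0.0 else 1.0

  // Maximization oracle: loss-augmented decoding over the label space {-1, +1}
  def oracleFn(weights: DenseVector[Double], x: DenseVector[Double], yTruth: Double): Double =
    Seq(-1.0, 1.0).maxBy(y => lossFn(yTruth, y) + (weights dot featureFn(x, y)))

  // Prediction: the same maximization, without the loss augmentation
  def predictFn(weights: DenseVector[Double], x: DenseVector[Double]): Double =
    Seq(-1.0, 1.0).maxBy(y => weights dot featureFn(x, y))
}
```

For a real structured task, the label type becomes your structured object (a sequence, a segmentation, a parse tree), and the oracle is where the problem-specific work happens, e.g. loss-augmented Viterbi decoding for sequence labeling.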
Building
While you can test your application locally within Eclipse, the purpose of the library is to run a scalable application (whose data may not fit on a single machine) on a cluster. For this, you’ll need to build your application into a binary jar and hand it to `spark-submit`.
Packaging into a jar
First, you’ll need to set the metadata and any additional libraries used by your application in the `dissolve-struct-application/build.sbt` file. This file also contains instructions to help you get started. Don’t worry: this is very straightforward, and you’ll merely need to change a few lines.
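As a rough illustration, the handful of lines you would typically touch look like the following; all names and versions here are placeholder examples, and the shipped file already carries the assembly plumbing.

```scala
// build.sbt: typical lines to adjust (all values are placeholder examples)
name := "DSApp"
version := "0.1"
scalaVersion := "2.10.4"  // stay on the Scala version the library is built against

// extra libraries your application depends on, e.g. (hypothetical):
// libraryDependencies += "com.github.scopt" %% "scopt" % "3.3.0"
```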
After you’ve got `build.sbt` configured, you can obtain the fat jar using
$ cd dissolve-struct-application
$ sbt assembly
Running on EC2
It’s extremely easy to execute your application on a cluster!
First, you’ll need to launch an EC2 cluster configured with Spark. Luckily, Spark ships with a script which sets this up for you completely. If you’ve downloaded Spark on your machine, you’ll find this script at `$SPARK_ROOT/ec2/spark-ec2`.
The documentation for this can be found here.
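As a sketch, launching a small cluster with this script follows the pattern below (the angle-bracketed placeholders are your EC2 keypair, identity file, slave count, and a cluster name of your choosing):
$ ./spark-ec2 -k <keypair-name> -i <path-to-key-file> -s <num-slaves> launch <cluster-name>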
Once the cluster is set up, you’ll merely need to move your jar and data to the master node and start the application via `spark-submit`. Also, make sure you’re not running in local mode when submitting the application (i.e., you don’t have `setMaster("local")` enabled in your driver code).
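A submission could then look roughly as follows; the class name here matches the skeleton’s package path, the jar is the one `sbt assembly` produced (by default under `target/scala-2.10/`), and the master URL is the one reported for your cluster:
$ spark-submit --class ch.ethz.dalab.dissolve.app.DSApp \
    --master spark://<master-hostname>:7077 \
    <your-application>-assembly-<version>.jar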
Java Implementation
If you’re a hardcore Java programmer, worry not: dissolve-struct applications can be written in Java too. Some samples can be found here. Unfortunately, we are not focusing much on this, but feel free to write to us or raise an issue on GitHub if you need help.