Writing Your Own Application

Here, we’ll show you how to write your own distributed structured prediction application with the help of the packaged starter kit found in dissolve-struct-application.

Obtain project repository

First, you’ll need to obtain the project repository.

$ git clone https://github.com/dalab/dissolve-struct.git

Setup development environment

Next, we’ll need to build the dissolve-struct library and generate the Eclipse project files, so that the application project can be imported into Eclipse.

$ cd dissolve-struct-lib
$ sbt publish-local
$ cd ../dissolve-struct-application
$ sbt eclipse

The application can now be imported into Eclipse via File | Import | General | Existing Projects into Workspace. If Eclipse then displays build errors, such as the library having been cross-compiled with Scala version 2.11, right-click on the project, open Project Properties | Scala Compiler, and switch to Fixed Scala Installation: 2.10.x (built-in).

Implement Dissolve Functions

Now that the development environment is set up, you’ll need to implement a few functions of an interface named DissolveFunctions. To get you started, you’ll find a skeleton in src/main/scala/ch/ethz/dalab/dissolve/app/DSApp.scala. This file contains all the instructions necessary for implementing your application.

The main idea is to implement:

  1. The joint feature map \( \phi(x, y) \)
  2. The loss function \( \Delta(y, y') \)
  3. The maximization oracle \( H(w) \)

and provide the training data in the main() function.
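To make this concrete, here is a minimal sketch for a toy binary-classification setup, where x is a breeze feature vector and y is a label in {-1, +1}. The trait and method names (DissolveFunctions, featureFn, lossFn, oracleFn, predictFn), the import paths, and the StructSVMModel.getWeights() accessor are assumptions based on the DSApp.scala skeleton; defer to the skeleton in your checkout for the authoritative signatures.

import breeze.linalg.Vector
import ch.ethz.dalab.dissolve.classification.StructSVMModel
import ch.ethz.dalab.dissolve.optimization.DissolveFunctions

object MyFunctions extends DissolveFunctions[Vector[Double], Double] {

  // Joint feature map: phi(x, y) = y * x for binary labels
  def featureFn(x: Vector[Double], y: Double): Vector[Double] =
    x * y

  // Loss Delta(y, y'): plain 0/1 loss in this toy example
  def lossFn(yTruth: Double, yPredict: Double): Double =
    if (yTruth == yPredict) 0.0 else 1.0

  // Maximization oracle: argmax over y of Delta(y_i, y) + <w, phi(x_i, y)>
  // (loss-augmented decoding; trivial here, since y takes only two values)
  def oracleFn(model: StructSVMModel[Vector[Double], Double],
               x: Vector[Double],
               y: Double): Double = {
    val w = model.getWeights() // assumed accessor, per the skeleton
    Seq(-1.0, 1.0).maxBy(yCand => lossFn(y, yCand) + (w dot featureFn(x, yCand)))
  }

  // Prediction: the same maximization, without the loss term
  def predictFn(model: StructSVMModel[Vector[Double], Double],
                x: Vector[Double]): Double = {
    val w = model.getWeights()
    Seq(-1.0, 1.0).maxBy(yCand => w dot featureFn(x, yCand))
  }
}

In a real structured-prediction application, the oracle is where most of the work goes: instead of enumerating two labels, you'd run a problem-specific decoder (e.g., Viterbi or graph cuts) over the output space.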

Building

While you can test your application locally within Eclipse, the purpose of the library is to run a scalable application (whose data may not fit on a single machine) on a cluster. For this, you’ll need to package your application into a binary jar and hand it to spark-submit.

Packaging into a jar

First, you’ll need to set the metadata and the additional libraries used in the dissolve-struct-application/build.sbt file. This file, too, contains instructions to help you get started. Don’t worry: this is very straightforward, and you’ll merely need to change a few lines.
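For illustration, the kind of change involved looks roughly like the excerpt below; the project name, version, and dependency shown are placeholders, so follow the comments in the actual build.sbt for the authoritative fields.

// build.sbt (excerpt) -- hypothetical values, for illustration only
name := "my-dissolve-app"

version := "0.1"

scalaVersion := "2.10.4"

// Add any extra libraries your feature map or oracle needs, e.g.:
libraryDependencies += "org.scalanlp" %% "breeze" % "0.11.2"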

Once you’ve got build.sbt configured, you can obtain the fat jar using:

$ cd dissolve-struct-application
$ sbt assembly

Running on EC2

It’s extremely easy to execute your application on a cluster! You’ll first need to launch an EC2 cluster configured with Spark. Luckily, Spark ships with a script which completely sets this up for you: if you’ve downloaded Spark on your machine, you’ll find it at $SPARK_ROOT/ec2/spark-ec2. Its usage is covered in the official Spark documentation on the EC2 scripts.
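For example, launching a small cluster looks roughly like this; the key pair, identity file, slave count, and cluster name are placeholders for your own values.

$ ./spark-ec2 --key-pair=<your-keypair> --identity-file=<your-key>.pem \
    --slaves=4 launch my-dissolve-cluster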

Once the cluster is set up, you’ll merely need to move your jar and data to the master node and start the application via spark-submit. Also, make sure you’re not running in local mode when submitting the application (i.e., ensure you don’t have a setMaster("local") enabled in your driver code).
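A typical invocation on the master node looks something like the following; the jar name, master hostname, and data path are placeholders, and the class name assumes you kept the package and object name from the DSApp skeleton.

$ spark-submit \
    --class ch.ethz.dalab.dissolve.app.DSApp \
    --master spark://<master-hostname>:7077 \
    dissolve-struct-application-assembly-0.1.jar \
    <path-to-training-data>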

Java Implementation

If you’re a hardcore Java programmer, worry not: dissolve-struct applications can be written in Java too, and some samples can be found in the project repository. Unfortunately, we are not focusing much on Java support at the moment, but feel free to write to us or raise an issue on GitHub if you need help.

