[hpc-announce] Spark on the compute cluster

Boehme, Christian Christian.Boehme at gwdg.de
Thu Feb 23 14:09:17 CET 2017


Dear all,

we have made Spark available on the compute cluster:

Spark on the compute cluster
======================

The data analytics framework *Spark*[1] is now available on the compute cluster. Please note that, so far, we have only done very limited testing. Please report issues to *support at gwdg.de*.


Preparation
---------------

Spark requires Java 8, so besides the Spark module itself, you also have to load the Java 8 module:
```
module load JAVA/jdk1.8.0_31 spark
```
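To make sure that the Java 8 module is actually the one in effect, you can check the version printed by `java -version`. Below is a minimal bash sketch. `java_major` and `check_java8` are hypothetical helper names and are not part of the Spark or JAVA modules. The sketch also assumes the usual `java version "1.8.0_31"` output format:

```bash
#!/bin/bash
# Hypothetical helpers, not provided by the spark or JAVA modules.

# Print the major version for a Java version string:
# "1.8.0_31" (old "1.x" numbering) -> 8, "11.0.2" -> 11.
java_major() {
  local v="$1"
  v="${v#1.}"          # drop the leading "1." used up to Java 8
  echo "${v%%[._]*}"   # keep the part before the first "." or "_"
}

# Return 0 if the java found on PATH is Java 8, otherwise print a hint
# to stderr and return 1.
check_java8() {
  local ver
  # The first line of `java -version` (written to stderr) looks like:
  #   java version "1.8.0_31"
  ver=$(java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}')
  if [ "$(java_major "$ver")" = "8" ]; then
    echo "OK: Java $ver"
  else
    echo "Expected Java 8, found '${ver:-none}'; try: module load JAVA/jdk1.8.0_31" >&2
    return 1
  fi
}
```

After `module load JAVA/jdk1.8.0_31 spark`, calling `check_java8` in the same shell should report the loaded version. If a different Java is found first on the `PATH`, it prints a hint and returns a non-zero status instead.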


Using the Spark shell
----------------------------

To use the interactive Spark shell, submit the `lsf-spark-shell.sh` command to the interactive queue:
```
bsub -I -q int -R "span[ptile=16]" -n 32 lsf-spark-shell.sh
```
This example will create 32 Spark workers on two hosts (16 workers each).
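The same pattern scales to other sizes. `-n` sets the total number of workers and `ptile` sets the number per host, so the job occupies `-n / ptile` hosts, rounded up. The helpers in the sketch below are hypothetical and are not installed on the cluster. They only print the corresponding `bsub` line and host count and do not submit anything:

```bash
#!/bin/bash
# Hypothetical helpers: they only print values, nothing is submitted.

# Print the bsub command line for an interactive Spark shell with
# TOTAL workers in total and PER_HOST workers on each host (default 16,
# as in the example above).
spark_shell_cmd() {
  local total="$1" per_host="${2:-16}"
  echo "bsub -I -q int -R \"span[ptile=${per_host}]\" -n ${total} lsf-spark-shell.sh"
}

# Print the number of hosts the job spans: ceil(TOTAL / PER_HOST).
spark_hosts() {
  local total="$1" per_host="${2:-16}"
  echo $(( (total + per_host - 1) / per_host ))
}

spark_shell_cmd 32 16   # bsub -I -q int -R "span[ptile=16]" -n 32 lsf-spark-shell.sh
spark_hosts 32 16       # 2
spark_hosts 48 16       # 3
```

To submit the printed line, you can copy and paste it or run `eval "$(spark_shell_cmd 64 16)"`.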


Submitting a Spark application as a batch job
------------------------------------------------------------

To submit a Spark application as a batch job, use the `lsf-spark-submit.sh` command. The general format is:
```
bsub <bsub_options> lsf-spark-submit.sh <spark-submit_options>
```
For example, to execute the *SimpleApp* application from the Spark Quick Start guide [2], you can use the following submission script:
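```
#!/bin/bash
# Request between 32 and 48 job slots (min,max)
#BSUB -n 32,48
# Place as many slots on each host as it has processors
#BSUB -R span[ptile='!']
# All allocated hosts must be of the same model
#BSUB -R same[model]
#BSUB -q mpi
# Write the job output to simple.<job ID> (%J is replaced by the job ID)
#BSUB -o simple.%J
lsf-spark-submit.sh --class "SimpleApp" simple-project_2.11-1.0.jar largetextfile.txt
```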
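This will start 32 to 48 Spark workers (depending on availability) on two hosts. Save the script to a file, e.g. `simple.lsf`, and submit it with `bsub < simple.lsf`. LSF reads the `#BSUB` options from a script that is passed on standard input.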
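If you run several applications this way, a small generator can write the submission script for you, with the class, JAR and input file filled in. The minimal sketch below uses a hypothetical function name, `write_spark_job`. The `#BSUB` lines are copied unchanged from the example, except that the output file is named after the class:

```bash
#!/bin/bash
# Hypothetical helper: write an LSF job script that runs CLASS from JAR on
# INPUT via lsf-spark-submit.sh, with the #BSUB options of the example.
# Usage: write_spark_job CLASS JAR INPUT SCRIPTFILE
write_spark_job() {
  local class="$1" jar="$2" input="$3" script="$4"
  # Unquoted EOF: ${class}, ${jar} and ${input} are expanded; %J is left for LSF.
  cat > "$script" <<EOF
#!/bin/bash
#BSUB -n 32,48
#BSUB -R span[ptile='!']
#BSUB -R same[model]
#BSUB -q mpi
#BSUB -o ${class}.%J
lsf-spark-submit.sh --class "${class}" "${jar}" "${input}"
EOF
}
```

For example, `write_spark_job SimpleApp simple-project_2.11-1.0.jar largetextfile.txt simple.lsf` writes a script that is equivalent to the example except that the output goes to `SimpleApp.<job ID>`. You can then submit it with `bsub < simple.lsf`.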


[1] http://spark.apache.org/
[2] http://spark.apache.org/docs/latest/quick-start.html


Best regards

Christian Boehme


Dr. Christian Boehme
High Performance Computing
Working group "eScience"
Tel.: +49 551 201-1839, E-Mail: christian.boehme at gwdg.de
--------------------------------------------------------------------------
Gesellschaft für wissenschaftliche Datenverarbeitung mbH Göttingen (GWDG)
Am Faßberg 11, 37077 Göttingen, URL: http://www.gwdg.de
Tel.: +49 551 201-1510, Fax: +49 551 201-2150, E-Mail: gwdg at gwdg.de
Service-Hotline: Tel.: +49 551 201-1523, E-Mail: support at gwdg.de

Managing Director: Prof. Dr. Ramin Yahyapour
Chairman of the Supervisory Board: Prof. Dr. Norbert Lossau
Registered office: Göttingen
Registry court: Göttingen, Commercial register no. B 598
--------------------------------------------------------------------------
Certified according to ISO 9001
--------------------------------------------------------------------------
