Install VirtualBox guest additions

To use “Shared Folders” in VirtualBox, the user should install VirtualBox guest additions:

(1) Select Devices -> Insert Guest Additions CD image. If the VirtualBox prompts:

Unable to insert the virtual optical disk C:\Program Files\Oracle\VirtualBox\VBoxGuestAdditions.iso into the machine CentOS.

Would you like to try to force insertion of this disk?

Count not mount the media/drive 'C:\Program Files\Oracle\VirtualBox\VBoxGuestAdditions.iso' (VERR_PDM_MEDIA_LOCKED).

It means the Devices -> CD/DVD DEvices already has ISO file. Please inject it, and try Insert Guest Additions CD image again.

(2) Mount the ISO file:

mount /dev/cdrom /mnt

(3) Install the VirtualBox guest additions (Take Linux as an example):

cd /mnt
./VBoxLinuxAdditions.run

You may also need to install bzip2, gcc and kernel files to install guest additions successfully. When meeting errors, please refer /var/log/vboxadd-install.log for detail info.

Build Apache Spark Application in IntelliJ IDEA 14.1

My Operating System is Windows 7, so this tutorial may be little difference for your environment.

Firstly, you should install Scala 2.10.x version on Windows to run Spark, else you would get errors like this:

Exception in thread "main" java.lang.NoSuchMethodError: scala.collection.immutable.HashSet$.empty()Lscala/collection/immutable/HashSet;
        at akka.actor.ActorCell$.<init>(ActorCell.scala:305)
        at akka.actor.ActorCell$.<clinit>(ActorCell.scala)
        at akka.actor.RootActorPath.$div(ActorPath.scala:152)
        ......

Please refer this post.

Secondly, you should install Scala plugin and create a Scala project, you can refer this document: Getting Started with Scala in IntelliJ IDEA 14.1.  

After all the above steps are done, the project view should like this:

21

Then follow the next steps:

(1) Select “File” -> “Project Structure“:

22

(2) Select “Modules” -> “Dependencies” -> “+” -> “Library” -> “Java“:

23

(3) Select spark-assembly-x.x.x-hadoopx.x.x.jar, press OK:

24

(4) Configure Library, press OK:

25

(5) The final configuration likes this:

26

(6) Write a simple CountWord application:

import org.apache.spark.SparkContext
import org.apache.spark.SparkConf

object CountWord{
  def main(args: Array[String]) {
    System.setProperty("hadoop.home.dir", "c:\\winutil\\")

    val logFile = "C:\\spark-1.3.1-bin-hadoop2.4\\README.md" // Should be some file on your system
    val conf = new SparkConf().setAppName("Simple Application").setMaster("local")
    val sc = new SparkContext(conf)
    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
  }
}

Please notice “System.setProperty("hadoop.home.dir", "c:\\winutil\\")” , You should downloadwinutils.exe and put it in the folder: C:\winutil\bin. For detail information, you should refer the following posts:
a) Apache Spark checkpoint issue on windows;
b) Run Spark Unit Test On Windows 7.

(7) The final execution likes this:

27

 

The following part introduces creating SBT project:

(1) Select “New project” -> “Scala” -> “SBT“, then click “Next:

sbt1

(2) Fill the “project name” and “project location“, then click “Finish“:

sbt2

(3) In Windows, modify the scala version to 2.10.4 in build.sbt:

sbt4

(4) Add spark package and create an scala object in “src -> main -> scala-2.10” folder, the final file layout likes this:sbt5(5) Run it!

You can also build a jar file:
File” -> “Project Structure” -> “Artifacts“, then select options like this:

sbt6

Refer this post in stackoverflow.

Then using spark-submit command execute jar package:

C:\spark-1.3.1-bin-hadoop2.4\bin>spark-submit --class "CountWord" --master local
[4] C:\Work\Intellij_scala\CountWord\out\artifacts\CountWord_jar\CountWord.jar
15/06/17 17:05:51 WARN NativeCodeLoader: Unable to load native-hadoop library fo
r your platform... using builtin-java classes where applicable
[Stage 0:>                                                          (0 + 0) / 2]
[Stage 0:>                                                          (0 + 1) / 2]
[Stage 0:>                                                          (0 + 2) / 2]

Lines with a: 60, Lines with b: 29