My operating system is Windows 7, so this tutorial may differ slightly for your environment. Firstly, you should install a Scala 2.10.x version on Windows to run Spark; otherwise you will get errors like this:
Exception in thread "main" java.lang.NoSuchMethodError: scala.collection.immutable.HashSet$.empty()Lscala/collection/immutable/HashSet;
at akka.actor.ActorCell$.<init>(ActorCell.scala:305)
at akka.actor.ActorCell$.<clinit>(ActorCell.scala)
at akka.actor.RootActorPath.$div(ActorPath.scala:152)
......
Please refer to this post.
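If you are not sure which Scala version is on your PATH, you can check it from the command line; the output below is only an example and will vary with your installation:
C:\>scala -version
Scala code runner version 2.10.4 -- Copyright 2002-2013, LAMP/EPFL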
Secondly, you should install the Scala plugin and create a Scala project; you can refer to this document: Getting Started with Scala in IntelliJ IDEA 14.1.
After all the above steps are done, the project view should look like this:

Then follow the next steps:
(1) Select “File” -> “Project Structure”:

(2) Select “Modules” -> “Dependencies” -> “+” -> “Library” -> “Java”:

(3) Select spark-assembly-x.x.x-hadoopx.x.x.jar, press OK:
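For reference, with the pre-built spark-1.3.1-bin-hadoop2.4 distribution used later in this post, the assembly jar typically sits in the lib folder; the exact file name depends on the Spark and Hadoop versions you downloaded, so treat this path only as an example:
C:\spark-1.3.1-bin-hadoop2.4\lib\spark-assembly-1.3.1-hadoop2.4.0.jar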

(4) Configure the Library, press OK:

(5) The final configuration looks like this:

(6) Write a simple CountWord application:
import org.apache.spark.SparkContext
import org.apache.spark.SparkConf

object CountWord {
  def main(args: Array[String]) {
    // Tell Hadoop where to find bin\winutils.exe (see the note below)
    System.setProperty("hadoop.home.dir", "c:\\winutil\\")
    val logFile = "C:\\spark-1.3.1-bin-hadoop2.4\\README.md"
    // Run Spark locally inside the IDE
    val conf = new SparkConf().setAppName("Simple Application").setMaster("local")
    val sc = new SparkContext(conf)
    val logData = sc.textFile(logFile, 2).cache()
    // Count the lines containing "a" and "b" respectively
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
  }
}
Please notice “System.setProperty("hadoop.home.dir", "c:\\winutil\\")”. You should download winutils.exe and put it in the folder C:\winutil\bin. For detailed information, you can refer to the following posts:
a) Apache Spark checkpoint issue on windows;
b) Run Spark Unit Test On Windows 7.
(7) The final execution looks like this:

The following part introduces how to create an SBT project:
(1) Select “New project” -> “Scala” -> “SBT”, then click “Next”:

(2) Fill in the “project name” and “project location”, then click “Finish”:

(3) In Windows, modify the Scala version to 2.10.4 in build.sbt:
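As a rough sketch, the edited build.sbt might look like the following (the project name and version here are just placeholders from this example project):
name := "CountWord"

version := "1.0"

scalaVersion := "2.10.4"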

(4) Add the Spark package (see the sketch below) and create a Scala object in the “src -> main -> scala-2.10” folder; the final file layout looks like this:
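To add the Spark package, one common approach is a spark-core dependency in build.sbt; the version below matches the Spark 1.3.1 distribution used in this post and is only an assumption, so adjust it to your setup:
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.3.1"
After editing build.sbt, let IntelliJ refresh the SBT project so the dependency is downloaded.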
(5) Run it!
You can also build a jar file: “File” -> “Project Structure” -> “Artifacts”, then select options like this:

Refer to this post on Stack Overflow.
Then execute the jar package with the spark-submit command:
C:\spark-1.3.1-bin-hadoop2.4\bin>spark-submit --class "CountWord" --master local[4] C:\Work\Intellij_scala\CountWord\out\artifacts\CountWord_jar\CountWord.jar
15/06/17 17:05:51 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[Stage 0:> (0 + 0) / 2]
[Stage 0:> (0 + 1) / 2]
[Stage 0:> (0 + 2) / 2]
Lines with a: 60, Lines with b: 29