This article follows the Self-Contained Applications section of the Spark website and develops a small standalone program in Scala.
(1) First, install sbt as described in the official documentation. I used the RPM package format:
curl https://bintray.com/sbt/rpm/rpm | sudo tee /etc/yum.repos.d/bintray-sbt-rpm.repo
sudo yum install sbt
(2) Next, create a SparkApp directory under /home with the following layout:
bash-4.1# find /home/SparkApp/
/home/SparkApp/
/home/SparkApp/simple.sbt
/home/SparkApp/src
/home/SparkApp/src/main
/home/SparkApp/src/main/scala
/home/SparkApp/src/main/scala/SimpleApp.scala
The simple.sbt file contains the following:
name := "Simple Project"
version := "1.0"
scalaVersion := "2.10.4"
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.3.0"
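A note on the dependency line: the double %% tells sbt to append the Scala binary version (2.10 here) to the artifact name, so it resolves to spark-core_2.10. Spelled out with a single %, the same dependency would look like this (shown only for illustration):
libraryDependencies += "org.apache.spark" % "spark-core_2.10" % "1.3.0"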
The SimpleApp.scala program, which counts how many lines of Spark's README.md contain the letter "a" and how many contain "b", is as follows:
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
object SimpleApp {
  def main(args: Array[String]) {
    val logFile = "file:///usr/local/spark/README.md" // Should be some file on your system
    val conf = new SparkConf().setAppName("Simple Application")
    val sc = new SparkContext(conf)
    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
  }
}
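The program loads Spark's README.md, caches it, and counts the lines that contain the letters "a" and "b". To sanity-check the two counts before packaging, the same logic can be typed into spark-shell, where the SparkContext is already available as sc (a quick sketch, assuming the same README.md path as above):
val logData = sc.textFile("file:///usr/local/spark/README.md", 2).cache()
logData.filter(line => line.contains("a")).count()
logData.filter(line => line.contains("b")).count()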
(3) Run the sbt package command to build the jar file:
bash-4.1# sbt package
......
[success] Total time: 89 s, completed May 25, 2015 10:16:51 PM
(4) Run the program with the spark-submit script:
bash-4.1# /usr/local/spark/bin/spark-submit --class "SimpleApp" --master local[4] target/scala-2.10/simple-project_2.10-1.0.jar
......
Lines with a: 60, Lines with b: 29
As shown above, the program prints the expected result.
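For reference, --master local[4] asks Spark to run the application locally with four worker threads. The master could also be hard-coded in the application itself via SparkConf.setMaster, as sketched below, although passing it on the spark-submit command line, as above, keeps the code portable:
val conf = new SparkConf()
  .setAppName("Simple Application")
  .setMaster("local[4]")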