While searching online for official documentation on setting up a Scala development environment with IntelliJ, I found that what exists is hopelessly out of date. So I wrote a guide myself: Getting Started with Scala in IntelliJ IDEA 14.1, as a reference for anyone who needs it.
Setting Up a Scala Development Environment
This post uses CentOS 7 as an example to show how to set up a Scala development environment.
(1) Install Scala. Run the "yum install scala" command:
[root@localhost ~]# yum install scala
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
* base: centos.mirrors.tds.net
* extras: bay.uchicago.edu
* updates: dallas.tx.mirror.xygenhosting.com
Nothing to do
Yum cannot find a Scala package ("Nothing to do"), so we take a different route and download the RPM directly with wget:
[root@localhost ~]# wget http://downloads.typesafe.com/scala/2.11.6/scala-2.11.6.rpm
--2015-05-27 22:07:32-- http://downloads.typesafe.com/scala/2.11.6/scala-2.11.6.rpm
......
Length: 111919675 (107M) [application/octet-stream]
Saving to: ‘scala-2.11.6.rpm’
100%[=========================================================================>] 111,919,675 298KB/s in 6m 15s
2015-05-27 22:13:48 (291 KB/s) - ‘scala-2.11.6.rpm’ saved [111919675/111919675]
Next, install Scala:
[root@localhost ~]# rpm -ivh scala-2.11.6.rpm
Preparing... ################################# [100%]
Updating / installing...
1:scala-2.11.6-0 ################################# [100%]
The installation succeeds; now run scala:
[root@localhost ~]# scala
/usr/bin/scala: line 23: java: command not found
Running Scala requires a JRE, so the next step is to install a Java environment.
(2) Run yum install java:
[root@localhost ~]# yum install java
......
Complete!
(3) Run scala and print "Hello world!":
[root@localhost ~]# scala
Welcome to Scala version 2.11.6 (OpenJDK 64-Bit Server VM, Java 1.8.0_45).
Type in expressions to have them evaluated.
Type :help for more information.
scala> print("Hello world!")
Hello world!
scala> :quit
The installation is complete!
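As an optional extra check (my own addition, not part of the original steps; the file name is just an example), you can also exercise the compiler by building a standalone program:

object HelloWorld {
  // Prints the same greeting as the REPL test above, but compiled ahead of time.
  def main(args: Array[String]): Unit = {
    println("Hello world!")
  }
}

Save it as HelloWorld.scala, compile with "scalac HelloWorld.scala", and run with "scala HelloWorld"; it should print the same greeting.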
Writing a Self-Contained Program with the Spark API
Following the Self-Contained Applications section on the Spark website, this post develops a small standalone program in Scala.
(1) First install sbt, following the official documentation. I used the RPM package:
curl https://bintray.com/sbt/rpm/rpm | sudo tee /etc/yum.repos.d/bintray-sbt-rpm.repo
sudo yum install sbt
(2) Next, create a SparkApp directory under /home with the following layout:
bash-4.1# find /home/SparkApp/
/home/SparkApp/
/home/SparkApp/simple.sbt
/home/SparkApp/src
/home/SparkApp/src/main
/home/SparkApp/src/main/scala
/home/SparkApp/src/main/scala/SimpleApp.scala
The simple.sbt file contains:
name := "Simple Project"
version := "1.0"
scalaVersion := "2.10.4"
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.3.0"
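A note on the %% operator in the dependency line above: it tells sbt to append the project's Scala binary version to the artifact name, which is why scalaVersion is pinned to 2.10.4 here (Spark 1.3.0 is published for Scala 2.10). Spelled out, the dependency is equivalent to the following (a sketch of the expansion, not an extra line to add):

libraryDependencies += "org.apache.spark" % "spark-core_2.10" % "1.3.0"  // single % with an explicit suffix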
The SimpleApp.scala program is as follows:
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
object SimpleApp {
  def main(args: Array[String]) {
    val logFile = "file:///usr/local/spark/README.md" // Should be some file on your system
    val conf = new SparkConf().setAppName("Simple Application")
    val sc = new SparkContext(conf)
    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
  }
}
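One small note on the program: it never shuts its SparkContext down. The example works as-is, but it is good practice to add one final call at the end of main():

sc.stop()  // release the context's resources before the JVM exits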
(3) Run the sbt package command to build the jar:
bash-4.1# sbt package
......
[success] Total time: 89 s, completed May 25, 2015 10:16:51 PM
(4) Run the program with the spark-submit script; --master local[4] executes it locally with four worker threads:
bash-4.1# /usr/local/spark/bin/spark-submit --class "SimpleApp" --master local[4] target/scala-2.10/simple-project_2.10-1.0.jar
......
Lines with a: 60, Lines with b: 29
As shown, the program produces the correct result.
Setting Up a Spark Development Environment
This post uses docker to set up a Spark environment, based on the 1.3.0 image provided by sequenceiq.
First, pull the Spark image:
docker pull sequenceiq/spark:1.3.0
Once the pull succeeds, run Spark:
docker run -i -t -h sandbox sequenceiq/spark:1.3.0 bash
Test whether Spark works correctly:
bash-4.1# spark-shell --master yarn-client --driver-memory 1g --executor-memory 1g --executor-cores 1
......
scala> sc.parallelize(1 to 1000).count()
......
res0: Long = 1000
It prints 1000. OK!
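For a slightly richer smoke test (my own example, not from the image's documentation), a short transformation chain should also work in the same shell; exactly half of the numbers from 1 to 1000 are even, so the expected result is 500:

scala> sc.parallelize(1 to 1000).filter(_ % 2 == 0).count()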
(1) spark-shell produces a great deal of log output at startup. The fix:
a) Copy the log4j.properties.template file in /usr/local/spark/conf to a new log4j.properties file:
bash-4.1# cd /usr/local/spark/conf
bash-4.1# cp log4j.properties.template log4j.properties
b) In log4j.properties, change "log4j.rootCategory=INFO, console" to "log4j.rootCategory=WARN, console".
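Alternatively (an approach I am adding here, not part of the original fix), you can lower the level for just the current session from inside spark-shell, using the log4j 1.x API that Spark bundles:

scala> import org.apache.log4j.{Level, Logger}
scala> Logger.getLogger("org").setLevel(Level.WARN)  // silence INFO output from the org.* loggers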
(2) spark-shell also emits the following warning at startup:
15/05/25 04:49:28 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
It cannot find the native hadoop libraries. The fix:
export LD_LIBRARY_PATH=/usr/local/hadoop/lib/native/:$LD_LIBRARY_PATH
See the related stackoverflow discussions:
a) Hadoop “Unable to load native-hadoop library for your platform” error on CentOS;
b) Hadoop “Unable to load native-hadoop library for your platform” error on docker-spark?
(3) The Quick Start guide gives the following example:
scala> val textFile = sc.textFile("README.md")
......
scala> textFile.count() // Number of items in this RDD
Running it produces an error:
scala> textFile.count()
org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://sandbox:9000/user/root/README.md
at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:285)
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:304)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:203)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
As the trace shows, Spark tries to find the file on HDFS, hence the error.
There are two fixes:
a) Specify the local filesystem explicitly:
scala> val textFile = sc.textFile("file:///usr/local/spark/README.md")
textFile: org.apache.spark.rdd.RDD[String] = file:///usr/local/spark/README.md MapPartitionsRDD[3] at textFile at <console>:21
scala> textFile.count()
res1: Long = 98
b) Upload the file to HDFS:
bash-4.1# hadoop fs -put /usr/local/spark/README.md README.md
Then run spark-shell again:
bash-4.1# spark-shell --master yarn-client --driver-memory 1g --executor-memory 1g --executor-cores 1
Spark assembly has been built with Hive, including Datanucleus jars on classpath
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.3.0
      /_/
Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_51)
Type in expressions to have them evaluated.
Type :help for more information.
15/05/25 05:22:15 WARN Client: SPARK_JAR detected in the system environment. This variable has been deprecated in favor of the spark.yarn.jar configuration variable.
15/05/25 05:22:15 WARN Client: SPARK_JAR detected in the system environment. This variable has been deprecated in favor of the spark.yarn.jar configuration variable.
Spark context available as sc.
SQL context available as sqlContext.
scala> val textFile = sc.textFile("README.md")
textFile: org.apache.spark.rdd.RDD[String] = README.md MapPartitionsRDD[1] at textFile at <console>:21
scala> textFile.count()
res0: Long = 98
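Equivalently, instead of relying on the default filesystem and home directory, you can spell out the full HDFS URI that appeared in the earlier error message:

scala> val textFile = sc.textFile("hdfs://sandbox:9000/user/root/README.md")
scala> textFile.count()  // should likewise return 98 once the file has been uploaded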
See this mailing-list thread: Spark Quick Start – call to open README.md needs explicit fs prefix.
P.S. When you download Spark (https://spark.apache.org/downloads.html) and run it on the host (outside docker), the following warnings appear:
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
The fix is the same as above: copy the log4j.properties.template file in /path/to/spark/conf to a new log4j.properties file.
See the stackoverflow discussion: log4j:WARN No appenders could be found for logger (running jar file, not web app).
Don't Hire "Cost-Effective" Engineers
A while back I saw a forum post whose author complained that he couldn't hire "cost-effective" engineers. I didn't think much of it at the time, but turning it over later, something felt odd. Why would a company insist on hiring "cost-effective" engineers?
"Cost-effectiveness," as the name suggests, is the ratio of performance to price: the more someone delivers and the less they cost, the higher the ratio. If a person earns 5,000 yuan a month but creates as much value for the company as someone paid 8,000 or 10,000 yuan, that person is highly "cost-effective." But wait: isn't that unfair to them? Surely people should be paid wages that match their abilities. Why must a company hire "cost-effective" engineers? A ratio of 1, where performance matches price, is good enough. Nobody is a fool; everyone knows roughly what they are worth. When employees discover that their salary falls far short of their ability and the value they create, they either work less efficiently or get ready to walk out the door, and what good does that do the company? Hiring "cost-effective" engineers looks like a bargain at first, but in the long run the company loses. I've even heard that some companies use salary lowballing as a performance metric for their recruiters, which is simply laughable.
Stop trying to hire "cost-effective" engineers; hire engineers whose cost and effectiveness actually match.
docker notes (4): how to enter a running docker container?
The "docker attach [container-id]" command sometimes hangs, and pressing Ctrl+C has no effect:
[root@localhost ~]# docker attach a972e69ab444
^C^C^C^C^C^C^C^C^C^C
Use the pstack command to inspect the call stacks:
[root@localhost ~]# pstack 29744
Thread 5 (Thread 0x7f9079bd8700 (LWP 29745)):
#0 runtime.futex () at /usr/lib/golang/src/pkg/runtime/sys_linux_amd64.s:269
#1 0x0000000000417717 in runtime.futexsleep () at /usr/lib/golang/src/pkg/runtime/os_linux.c:49
#2 0x0000000001161c58 in runtime.sched ()
#3 0x0000000000000000 in ?? ()
Thread 4 (Thread 0x7f90792d7700 (LWP 29746)):
#0 runtime.futex () at /usr/lib/golang/src/pkg/runtime/sys_linux_amd64.s:269
#1 0x0000000000417782 in runtime.futexsleep () at /usr/lib/golang/src/pkg/runtime/os_linux.c:55
#2 0x00007f907b830f60 in ?? ()
#3 0x0000000000000000 in ?? ()
Thread 3 (Thread 0x7f9078ad6700 (LWP 29747)):
#0 runtime.futex () at /usr/lib/golang/src/pkg/runtime/sys_linux_amd64.s:269
#1 0x0000000000417717 in runtime.futexsleep () at /usr/lib/golang/src/pkg/runtime/os_linux.c:49
#2 0x00000000011618a0 in text/template.zero ()
#3 0x0000000000000000 in ?? ()
Thread 2 (Thread 0x7f9073fff700 (LWP 29748)):
#0 runtime.futex () at /usr/lib/golang/src/pkg/runtime/sys_linux_amd64.s:269
#1 0x0000000000417717 in runtime.futexsleep () at /usr/lib/golang/src/pkg/runtime/os_linux.c:49
#2 0x000000c2080952f0 in ?? ()
#3 0x0000000000000000 in ?? ()
Thread 1 (Thread 0x7f907b9e1800 (LWP 29744)):
#0 runtime.epollwait () at /usr/lib/golang/src/pkg/runtime/sys_linux_amd64.s:385
#1 0x00000000004175dd in runtime.netpoll () at /usr/lib/golang/src/pkg/runtime/netpoll_epoll.c:78
#2 0x00007fff00000004 in ?? ()
#3 0x00007fff58720fd0 in ?? ()
#4 0xffffffff00000080 in ?? ()
#5 0x0000000000000000 in ?? ()
Instead, you can enter the running container with "docker exec -it [container-id] bash":
[root@localhost ~]# docker exec -it a972e69ab444 bash
bash-4.1# ls
bin dev home lib64 mnt pam-1.1.1-17.el6.src.rpm root sbin srv tmp var
boot etc lib media opt proc rpmbuild selinux sys usr
bash-4.1#
docker attach reuses the tty the container is already using, so running exit inside a docker attach session terminates the running container. docker exec, by contrast, allocates a new tty, so running exit inside a docker exec session does not terminate the container.
How to kill a detached screen session?
To kill a detached screen session, use the following command:
screen -X -S [session # you want to kill] quit
For example:
[root@localhost ~]# screen -ls
There are screens on:
9975.pts-0.localhost (Detached)
4588.pts-3.localhost (Detached)
2 Sockets in /var/run/screen/S-root.
[root@localhost ~]# screen -X -S 4588 quit
[root@localhost ~]# screen -ls
There is a screen on:
9975.pts-0.localhost (Detached)
1 Socket in /var/run/screen/S-root.
As shown, session 4588 is gone.
References:
(1) Kill detached screen session.
Reflections on 《若为自由故》
It took me an entire month of weekends to finish 《若为自由故》, the Chinese edition of RMS's biography, which I bought at an event during RMS's last visit to China. It recounts the life of RMS and the growth of the GNU movement he leads, spanning from the 1950s to roughly the early 2000s. I deeply respect RMS; his uncompromising, almost obsessive spirit may look pedantic to modern eyes, but it is precisely what has earned him recognition and respect. I also somewhat envy his life: he is constantly traveling the world giving talks, and "working while traveling" is exactly the way of life I aspire to.
The "free software" movement RMS champions goes far beyond the computing field itself; it is a political movement about human freedom. Supporting RMS is actually easy: the simplest way is to say GNU/Linux (pronounced "GNU plus Linux") instead of Linux. If you want to understand RMS, I recommend reading this book, or finding its English original; it will show you the real RMS.
docker notes (3): selinux can make docker misbehave
For the past few days I have been looking into backing up docker data volumes (the OS is RHEL7, the docker version 1.5.0). Following the docker documentation, I ran these commands:
[root@localhost data]# docker create -v /dbdata --name dbdata training/postgres /bin/true
[root@localhost data]# docker run -d --volumes-from dbdata --name db1 training/postgres
[root@localhost data]# docker run --volumes-from dbdata -v $(pwd):/backup ubuntu tar cvf /backup/backup.tar /dbdata
tar: /backup/backup.tar: Cannot open: Permission denied
tar: Error is not recoverable: exiting now
Seeing the "Permission denied" message, the natural first suspicion is that the user lacks write permission. Check the permissions of the current directory:
[root@localhost data]# ls -alt
total 4
drwxrwxrwx. 2 root root 6 May 7 21:33 .
drwxrwx-w-. 15 root root 4096 May 7 21:33 ..
Those look fine. After some discussion on stackoverflow, the suggestion I received was that selinux might be the culprit. Check its status:
[root@localhost root]# sestatus
SELinux status: enabled
SELinuxfs mount: /sys/fs/selinux
SELinux root directory: /etc/selinux
Loaded policy name: targeted
Current mode: enforcing
Mode from config file: enforcing
Policy MLS status: enabled
Policy deny_unknown status: allowed
Max kernel policy version: 28
So I switched the mode to permissive:
[root@localhost data]# setenforce 0
[root@localhost data]# sestatus
SELinux status: enabled
SELinuxfs mount: /sys/fs/selinux
SELinux root directory: /etc/selinux
Loaded policy name: targeted
Current mode: permissive
Mode from config file: enforcing
Policy MLS status: enabled
Policy deny_unknown status: allowed
Max kernel policy version: 28
And everything immediately worked:
[root@localhost data]# docker run --volumes-from dbdata -v $(pwd):/backup ubuntu tar cvf /backup/backup.tar /dbdata
tar: Removing leading `/' from member names
/dbdata/
For lack of time I didn't dig any deeper. In short, when working with docker, keep an eye on selinux; it can cause very strange problems.
Update: I ran into this problem again recently; see this write-up for a summary.
References:
(1) Why does docker prompt "Permission denied" when backing up the data volume?
(2) How to disable SELinux without restart?
(3) Quick-Tip: Turning off or disabling SELinux.
Notes from a May Day Market Stroll
I didn't leave Beijing over the May Day holiday. For one thing, the break is short and train tickets are hard to get; for another, everywhere you go is crowded, which is off-putting just to think about.
Yesterday I went to a clothing wholesale market near Zhaogongkou; counting back, this is the third short holiday in a row I've spent there. Even though e-commerce has captured a large share of consumer spending, the market was still bustling with shoppers and traffic, unlike some malls that have grown deserted. Frankly, some of the factory outlet stalls there are a real bargain, and you can buy clothes of excellent value. Two sports T-shirts I bought there the year before last are of such good quality that they still haven't worn out.
I suspect that while online shopping brings convenience, it may also have taken away the pleasure of hunting for bargains in a real market. Spending a few hours browsing the stalls is both exercise and a treasure hunt. What's not to like?