Notes on the git clone command

The git clone command copies an existing repository. Cloning automatically creates a remote connection named origin that points back to the original repository, which makes working with a central repository convenient. Git's collaboration model is repository-to-repository: unlike SVN, where a working copy commits to a central repository, Git's push and pull operations always move changes from one repository to another.

Usage:

a) git clone <repo>
Copies the repository at repo to the local machine.

b) git clone <repo> <directory>
Copies the repository at repo into the local directory directory.

Example:

[root@CentOS ~]# git clone https://github.com/sharklinux/shark
Cloning into 'shark'...
remote: Counting objects: 1003, done.
remote: Total 1003 (delta 0), reused 0 (delta 0), pack-reused 1003
Receiving objects: 100% (1003/1003), 21.43 MiB | 304.00 KiB/s, done.
Resolving deltas: 100% (245/245), done.
[root@CentOS ~]# ls
anaconda-ks.cfg  shark

Running git clone https://github.com/sharklinux/shark creates a shark directory on the local machine (note the lack of a .git suffix, marking this as a non-bare local copy) containing the entire contents of the shark repository.
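For form (b), a minimal sketch (the target directory name my-shark is hypothetical):

git clone https://github.com/sharklinux/shark my-shark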

References:
git clone

Fixing "Warning: Cannot modify header information – headers already sent by …"

At noon I switched my blog theme, and then I could no longer log in. The errors were:

Warning: Cannot modify header information - headers already sent by (output started at /home/to/public_html/en/wp-content/themes/nordby/functions.php:78) in /home/to/public_html/en/wp-login.php on line 418

Warning: Cannot modify header information - headers already sent by (output started at /home/to/public_html/en/wp-content/themes/nordby/functions.php:78) in /home/to/public_html/en/wp-login.php on line 431

The problem is in /home/to/public_html/en/wp-content/themes/nordby/functions.php. In the broken file there are blank lines at the very end, after the PHP code; they get sent to the browser as output before WordPress can set its headers:

[screenshot: functions.php ending in trailing blank lines]

Removing the trailing blank lines fixes it:

[screenshot: functions.php with the trailing blank lines removed]
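A quick way to spot and strip such trailing whitespace from the shell (a sketch; the path comes from the error message above, and GNU sed is assumed):

# Show the file's last bytes; any whitespace after the closing ?> tag
# is output that triggers the "headers already sent" warning.
tail -c 16 /home/to/public_html/en/wp-content/themes/nordby/functions.php | od -c

# Delete all trailing blank lines at the end of the file (classic sed one-liner).
sed -i -e :a -e '/^\n*$/{$d;N;ba' -e '}' /home/to/public_html/en/wp-content/themes/nordby/functions.php

Omitting the closing ?> tag entirely in pure-PHP files avoids the problem altogether.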

References:
Cannot modify header information – headers already sent by …

Notes on the git init command

The git init command creates a new Git repository. It can either initialize a completely new, empty repository or convert an existing, unversioned project into a Git repository. Running git init creates a .git subdirectory in the project's root directory; apart from that, none of the project's other files are changed.

Usage:

a) git init
Turns the current directory into a Git repository.

b) git init <directory>
Creates a Git repository in the given directory. This makes a new folder named directory containing nothing but a .git subdirectory.

c) git init --bare <directory>
Initializes an empty Git repository with no working directory. Repositories intended for sharing should always be created with --bare. By convention, repositories initialized with --bare carry a .git suffix; for example, a bare repository for project should be named project.git.

Let's compare git init <directory> with git init --bare <directory>.
First, run git init linux:

[root@CentOS ~]# git init linux
Initialized empty Git repository in /root/linux/.git/
[root@CentOS ~]# ls -alt linux/
total 8
dr-xr-x---. 5 root root 4096 Jun  2 12:53 ..
drwxr-xr-x. 7 root root 4096 Jun  2 12:42 .git
drwxr-xr-x. 3 root root   17 Jun  2 12:42 .
[root@CentOS ~]# ls -alt linux/.git
total 20
drwxr-xr-x. 7 root root 4096 Jun  2 12:42 .
drwxr-xr-x. 4 root root   28 Jun  2 12:42 objects
-rw-r--r--. 1 root root   92 Jun  2 12:42 config
-rw-r--r--. 1 root root   23 Jun  2 12:42 HEAD
drwxr-xr-x. 2 root root   20 Jun  2 12:42 info
drwxr-xr-x. 2 root root 4096 Jun  2 12:42 hooks
-rw-r--r--. 1 root root   73 Jun  2 12:42 description
drwxr-xr-x. 2 root root    6 Jun  2 12:42 branches
drwxr-xr-x. 3 root root   17 Jun  2 12:42 ..
drwxr-xr-x. 4 root root   29 Jun  2 12:42 refs

Then run git init --bare bsd:

[root@CentOS ~]# git init --bare bsd
Initialized empty Git repository in /root/bsd/
[root@CentOS ~]# ls -lt bsd
total 16
drwxr-xr-x. 4 root root   28 Jun  2 13:01 objects
-rw-r--r--. 1 root root   66 Jun  2 13:01 config
drwxr-xr-x. 2 root root    6 Jun  2 13:01 branches
-rw-r--r--. 1 root root   73 Jun  2 13:01 description
-rw-r--r--. 1 root root   23 Jun  2 13:01 HEAD
drwxr-xr-x. 2 root root 4096 Jun  2 13:01 hooks
drwxr-xr-x. 2 root root   20 Jun  2 13:01 info
drwxr-xr-x. 4 root root   29 Jun  2 13:01 refs

Notice that all the repository files are created directly under the bsd directory, with no .git subdirectory. A sketch of the shared bare-repository workflow follows.
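As a sketch of why shared repositories are created bare: a bare repository has no working tree for anyone to edit directly, so it serves purely as a push/pull target (paths hypothetical):

git init --bare /srv/git/project.git    # shared repository, conventionally suffixed .git
git clone /srv/git/project.git          # collaborators clone from it and push back to it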

References:
git init

Fixing flickering in GNU screen sessions

When working across several GNU screen sessions, you may find the screen flashing, for example after backspacing away every character on the command line, or when paging past the last page of a man page. The visual bell is the culprit. The fix:

(1) Edit the ~/.screenrc file;
(2) Add the following lines:

vbell_msg "bell: window ~%"     # Message for visual bell
vbellwait 2                     # Seconds to pause the screen for visual bell
vbell off                       # Turns visual bell off
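(In a running session, Ctrl-a Ctrl-g also toggles the visual bell for the current window, without editing ~/.screenrc.)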

References:
http://stackoverflow.com/questions/897358/gnu-screen-refresh-problem

A brief guide to installing LTTng

This post follows the official LTTng documentation.

(1) Install LTTng
I'm on CentOS, so per the RHEL documentation I installed via yum:
a) Set up the package repository and import its signing key:

wget -P /etc/yum.repos.d/ http://packages.efficios.com/repo.files/EfficiOS-RHEL7-x86-64.repo
rpmkeys --import http://packages.efficios.com/rhel/repo.key
yum updateinfo

b) Then install the LTTng packages:

yum install lttng-ust-devel # installing lttng-ust also pulls in liburcu0
yum install kmod-lttng-modules
yum install lttng-tools-devel
yum install babeltrace-devel

(2) Run a quick test:

lttng create my-session
lttng enable-event --kernel --all
lttng start
lttng stop
lttng destroy

Then run the ls command:

[root@CentOS ~]# ls
anaconda-ks.cfg  lttng-traces

You can see that the captured traces all land in the lttng-traces directory.
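To read a captured trace as text, point babeltrace (pulled in above) at the session directory; the exact directory name includes a timestamp, so the glob here is an assumption:

babeltrace ~/lttng-traces/my-session-*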

Setting up a Scala development environment

This post uses CentOS 7 as the example to show how to set up a Scala development environment:

(1) Install Scala:
Run the "yum install scala" command:

[root@localhost ~]# yum install scala
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
 * base: centos.mirrors.tds.net
 * extras: bay.uchicago.edu
 * updates: dallas.tx.mirror.xygenhosting.com
Nothing to do

yum cannot find a Scala package, so take a different route and download the RPM directly with wget:

[root@localhost ~]# wget http://downloads.typesafe.com/scala/2.11.6/scala-2.11.6.rpm
--2015-05-27 22:07:32--  http://downloads.typesafe.com/scala/2.11.6/scala-2.11.6.rpm
......
Length: 111919675 (107M) [application/octet-stream]
Saving to: ‘scala-2.11.6.rpm’

100%[=========================================================================>] 111,919,675  298KB/s   in 6m 15s

2015-05-27 22:13:48 (291 KB/s) - ‘scala-2.11.6.rpm’ saved [111919675/111919675]

Next, install Scala:

[root@localhost ~]# rpm -ivh scala-2.11.6.rpm
Preparing...                          ################################# [100%]
Updating / installing...
   1:scala-2.11.6-0                   ################################# [100%]

The installation succeeds; run scala:

[root@localhost ~]# scala
/usr/bin/scala: line 23: java: command not found

Running Scala requires a JRE, so the next step is to install a Java environment.

(2) Run yum install java:

[root@localhost ~]# yum install java
......
Complete!

(3) Run scala and print "Hello world!":

[root@localhost ~]# scala
Welcome to Scala version 2.11.6 (OpenJDK 64-Bit Server VM, Java 1.8.0_45).
Type in expressions to have them evaluated.
Type :help for more information.

scala> print("Hello world!")
Hello world!
scala> :quit

Installation successful!
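Beyond the REPL, the toolchain can be exercised end to end by compiling and running a standalone file (a minimal sketch; the file and object names are arbitrary):

cat > Hello.scala <<'EOF'
object Hello extends App {
  println("Hello world!")
}
EOF
scalac Hello.scala   # compiles to Hello.class and friends
scala Hello          # runs the compiled object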

Writing a standalone program with the Spark API

This post follows the Self-Contained Applications section of the Spark website and uses Scala to develop a small standalone program.

(1) First, install sbt by following the official documentation. I used the RPM package route:

curl https://bintray.com/sbt/rpm/rpm | sudo tee /etc/yum.repos.d/bintray-sbt-rpm.repo
sudo yum install sbt
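A quick way to confirm the install (the first run downloads sbt's own dependencies, so it may take a while):

sbt about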

(2) Next, create a SparkApp folder under /home, laid out as follows:

bash-4.1# find /home/SparkApp/
/home/SparkApp/
/home/SparkApp/simple.sbt
/home/SparkApp/src
/home/SparkApp/src/main
/home/SparkApp/src/main/scala
/home/SparkApp/src/main/scala/SimpleApp.scala

The simple.sbt file reads as follows:

name := "Simple Project"

version := "1.0"

scalaVersion := "2.10.4"

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.3.0"

The SimpleApp.scala program, which counts the lines of a local file that contain the letters "a" and "b", is as follows:

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

object SimpleApp {
  def main(args: Array[String]) {
    val logFile = "file:///usr/local/spark/README.md" // Should be some file on your system
    val conf = new SparkConf().setAppName("Simple Application")
    val sc = new SparkContext(conf)
    val logData = sc.textFile(logFile, 2).cache() // cache the RDD, since it is scanned twice below
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
  }
}

(3) Run the sbt package command to build the jar:

bash-4.1# sbt package
......
[success] Total time: 89 s, completed May 25, 2015 10:16:51 PM

(4) Invoke the spark-submit script to run the program (local[4] runs Spark locally with four worker threads):

bash-4.1# /usr/local/spark/bin/spark-submit --class "SimpleApp" --master local[4] target/scala-2.10/simple-project_2.10-1.0.jar
......
Lines with a: 60, Lines with b: 29

As you can see, it prints the correct result.

Setting up a Spark development environment

This post uses docker to set up a Spark environment, based on the 1.3.0 image provided by sequenceiq.

First, pull the Spark image:

docker pull sequenceiq/spark:1.3.0

Once the pull succeeds, run Spark:

docker run -i -t -h sandbox sequenceiq/spark:1.3.0 bash

Check that Spark works:

bash-4.1# spark-shell --master yarn-client --driver-memory 1g --executor-memory 1g --executor-cores 1
......
scala> sc.parallelize(1 to 1000).count()
......
res0: Long = 1000

It prints 1000. OK!

(1) spark-shell prints a lot of log output at startup. To quiet it:
a) In the /usr/local/spark/conf folder, copy log4j.properties.template to a new log4j.properties file:

bash-4.1# cd /usr/local/spark/conf
bash-4.1# cp log4j.properties.template log4j.properties

b) In log4j.properties, change "log4j.rootCategory=INFO, console" to "log4j.rootCategory=WARN, console"; see the sketch below.
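Step b) can also be done non-interactively (a sketch, assuming the stock template contents and GNU sed):

sed -i 's/^log4j.rootCategory=INFO, console/log4j.rootCategory=WARN, console/' /usr/local/spark/conf/log4j.properties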

(2) spark-shell startup also produces this warning:

15/05/25 04:49:28 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

It cannot find Hadoop's native library. The fix:

export LD_LIBRARY_PATH=/usr/local/hadoop/lib/native/:$LD_LIBRARY_PATH
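To make this persist across shells, the export can be appended to the shell profile (a sketch, assuming bash; the single quotes keep $LD_LIBRARY_PATH unexpanded until login):

echo 'export LD_LIBRARY_PATH=/usr/local/hadoop/lib/native/:$LD_LIBRARY_PATH' >> ~/.bashrc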

See the related stackoverflow discussions:
a)Hadoop “Unable to load native-hadoop library for your platform” error on CentOS
b)Hadoop “Unable to load native-hadoop library for your platform” error on docker-spark?

(3) The Quick Start gives this example:

scala> val textFile = sc.textFile("README.md")
......
scala> textFile.count() // Number of items in this RDD

Running it produces an error:

scala> textFile.count()
org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://sandbox:9000/user/root/README.md
        at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:285)
        at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228)
        at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:304)
        at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:203)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
        at scala.Option.getOrElse(Option.scala:120)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)

You can see that the program tries to find the file in HDFS, hence the error.

There are two fixes:
a) Specify the local filesystem explicitly:

scala> val textFile = sc.textFile("file:///usr/local/spark/README.md")
textFile: org.apache.spark.rdd.RDD[String] = file:///usr/local/spark/README.md MapPartitionsRDD[3] at textFile at <console>:21

scala> textFile.count()
res1: Long = 98

b) Upload the file to HDFS:

bash-4.1# hadoop fs -put /usr/local/spark/README.md README.md
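The upload can be verified before restarting spark-shell (a quick check):

hadoop fs -ls README.md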

Then run spark-shell:

bash-4.1# spark-shell --master yarn-client --driver-memory 1g --executor-memory 1g --executor-cores 1
Spark assembly has been built with Hive, including Datanucleus jars on classpath
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.3.0
      /_/

Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_51)
Type in expressions to have them evaluated.
Type :help for more information.
15/05/25 05:22:15 WARN Client: SPARK_JAR detected in the system environment. This variable has been deprecated in favor of the spark.yarn.jar configuration variable.
15/05/25 05:22:15 WARN Client: SPARK_JAR detected in the system environment. This variable has been deprecated in favor of the spark.yarn.jar configuration variable.
Spark context available as sc.
SQL context available as sqlContext.

scala> val textFile = sc.textFile("README.md")
textFile: org.apache.spark.rdd.RDD[String] = README.md MapPartitionsRDD[1] at textFile at <console>:21

scala> textFile.count()
res0: Long = 98

Reference mail thread:
Spark Quick Start – call to open README.md needs explicit fs prefix

P.S. When running a Spark release downloaded on the host (outside docker) from https://spark.apache.org/downloads.html, you get the following warning:

log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.

The fix is again to copy the log4j.properties.template file in the /path/to/spark/conf folder to a log4j.properties file.

See the stackoverflow discussion:
log4j:WARN No appenders could be found for logger (running jar file, not web app)

Don't hire "cost-effective" engineers

A while back I saw a forum post whose author complained about not being able to hire "cost-effective" engineers. I thought nothing of it at the time, but turning it over later, something felt off. Why would a company insist on hiring "cost-effective" engineers in the first place?

"Cost-effectiveness," as the name suggests, is the ratio of "performance" to "price": the higher the performance and the lower the price, the better the ratio. If someone earns 5,000 yuan a month but creates as much value for the company as people paid 8,000 to 10,000 yuan, that person is very "cost-effective." But hold on: isn't that unfair to them? People should be paid in line with their ability. Why must a company hire "cost-effective" engineers at all? A ratio of 1, where performance and price match, is perfectly fine. Nobody is a fool; everyone knows roughly what they are worth. When people realize their salary falls far short of their ability and the value they create, they either work less efficiently or get ready to walk away, and what good does that do the company? Hiring a "cost-effective" engineer looks like a bargain at first, but in the long run the company loses. I've even heard that some companies make driving down offers a performance metric for their recruiters, which is simply laughable.

Stop trying to hire engineers with high "cost-effectiveness"; hire engineers whose "cost-effectiveness" is right.