Install VirtualBox Guest Additions

To use “Shared Folders” in VirtualBox, you need to install the VirtualBox Guest Additions in the guest OS:

(1) Select Devices -> Insert Guest Additions CD image. If VirtualBox reports:

Unable to insert the virtual optical disk C:\Program Files\Oracle\VirtualBox\VBoxGuestAdditions.iso into the machine CentOS.

Would you like to try to force insertion of this disk?

Could not mount the media/drive 'C:\Program Files\Oracle\VirtualBox\VBoxGuestAdditions.iso' (VERR_PDM_MEDIA_LOCKED).

It means an ISO file is already attached under Devices -> CD/DVD Devices. Eject it, and then try Insert Guest Additions CD image again.

(2) Mount the ISO file:

mount /dev/cdrom /mnt

(3) Install the VirtualBox Guest Additions (taking Linux as an example):

cd /mnt
./VBoxLinuxAdditions.run

You may also need to install bzip2, gcc, and the kernel header (development) files for the installation to succeed. If you meet errors, refer to /var/log/vboxadd-install.log for details.

Build Apache Spark Application in IntelliJ IDEA 14.1

My operating system is Windows 7, so some details of this tutorial may differ in your environment.

First, install Scala 2.10.x on Windows to run Spark. The prebuilt Spark 1.3.1 package is compiled against Scala 2.10, so using another Scala version leads to binary-incompatibility errors like this:

Exception in thread "main" java.lang.NoSuchMethodError: scala.collection.immutable.HashSet$.empty()Lscala/collection/immutable/HashSet;
        at akka.actor.ActorCell$.<init>(ActorCell.scala:305)
        at akka.actor.ActorCell$.<clinit>(ActorCell.scala)
        at akka.actor.RootActorPath.$div(ActorPath.scala:152)
        ......

Please refer to this post.

Second, install the Scala plugin and create a Scala project; see Getting Started with Scala in IntelliJ IDEA 14.1.

After the above steps are done, the project view should look like this:

[Screenshot: project view]

Then follow these steps:

(1) Select “File” -> “Project Structure”:

[Screenshot: Project Structure]

(2) Select “Modules” -> “Dependencies” -> “+” -> “Library” -> “Java”:

[Screenshot: adding a Java library dependency]

(3) Select spark-assembly-x.x.x-hadoopx.x.x.jar and press OK:

[Screenshot: selecting the Spark assembly jar]

(4) Configure the library and press OK:

[Screenshot: Configure Library]

(5) The final configuration looks like this:

[Screenshot: final configuration]

(6) Write a simple CountWord application (it counts the lines of a text file that contain “a” and “b”):

import org.apache.spark.SparkContext
import org.apache.spark.SparkConf

object CountWord {
  def main(args: Array[String]) {
    // On Windows, Hadoop looks for winutils.exe under <hadoop.home.dir>\bin
    System.setProperty("hadoop.home.dir", "c:\\winutil\\")

    val logFile = "C:\\spark-1.3.1-bin-hadoop2.4\\README.md" // Should be some file on your system
    // Run Spark locally with a single worker thread
    val conf = new SparkConf().setAppName("Simple Application").setMaster("local")
    val sc = new SparkContext(conf)
    // Read the file with at least 2 partitions and cache it, because it is scanned twice below
    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
    // Release the resources held by the SparkContext
    sc.stop()
  }
}

Note the line System.setProperty("hadoop.home.dir", "c:\\winutil\\"): you should download winutils.exe and put it in the folder C:\winutil\bin. For details, refer to the following posts:
a) Apache Spark checkpoint issue on windows;
b) Run Spark Unit Test On Windows 7.

(7) The final execution looks like this:

[Screenshot: execution result]

The following part describes how to create an SBT project:

(1) Select “New Project” -> “Scala” -> “SBT”, then click “Next”:

[Screenshot: New Project -> Scala -> SBT]

(2) Fill in the “project name” and “project location”, then click “Finish”:

[Screenshot: project name and location]

(3) On Windows, change the Scala version to 2.10.4 in build.sbt:

[Screenshot: build.sbt]

(4) Add the Spark package and create a Scala object in the “src -> main -> scala-2.10” folder. The final file layout looks like this:

[Screenshot: file layout]

(5) Run it!

You can also build a jar file: select “File” -> “Project Structure” -> “Artifacts”, then set the options like this:

[Screenshot: Artifacts settings]

Refer to this post on Stack Overflow.

Then use the spark-submit command to run the jar package:

C:\spark-1.3.1-bin-hadoop2.4\bin>spark-submit --class "CountWord" --master local[4] C:\Work\Intellij_scala\CountWord\out\artifacts\CountWord_jar\CountWord.jar
15/06/17 17:05:51 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[Stage 0:>                                                          (0 + 0) / 2]
[Stage 0:>                                                          (0 + 1) / 2]
[Stage 0:>                                                          (0 + 2) / 2]

Lines with a: 60, Lines with b: 29

Getting Started with Scala in IntelliJ IDEA 14.1

This tutorial uses IntelliJ IDEA 14.1.3.

Prerequisites:

You should install Java and Scala first.

(1) Install the Scala plugin:

a) After installing IntelliJ IDEA, install the Scala plugin first. In the welcome window, select Configure -> Plugins:

[Screenshot: Configure -> Plugins]

b) Select “Install JetBrains Plugin...”:

[Screenshot: Install JetBrains Plugin]

c) If your computer needs a proxy, click “HTTP Proxy Settings” to configure it; otherwise, skip this step:

[Screenshot: HTTP Proxy Settings]

d) Select the Scala plugin and click Install plugin:

[Screenshot: installing the Scala plugin]

The installation progress looks like this:

[Screenshot: installation progress]

e) After installation, restart IntelliJ IDEA:

[Screenshot: restarting IntelliJ IDEA]

(2) Create a Scala project:

a) Select “Create New Project”:

[Screenshot: Create New Project]

b) Select “Scala” -> “Scala”, then click Next:

[Screenshot: Scala -> Scala]

c) Enter a valid project name and choose a folder to store the project files:

[Screenshot: project name and location]

d) Set Project SDK to the JDK directory:

[Screenshot: Project SDK]

After selecting it, click “OK”:

[Screenshot: selecting the JDK directory]

e) For Scala SDK, click “Create”. The installed Scala version is displayed; click “OK”:

[Screenshot: Scala SDK]

f) Click “Finish”:

[Screenshot: Finish]

(3) Create a Scala application:

a) Select src -> New -> Scala Class:

[Screenshot: src -> New -> Scala Class]

b) Select Object as the Kind value:

[Screenshot: Kind set to Object]

c) Write a simple “Hello World” program:

[Screenshot: Hello World program]

d) Select Run -> Run:

[Screenshot: Run -> Run]

e) Select HelloWorld:

[Screenshot: selecting HelloWorld]

f) The application outputs “Hello World!”:

[Screenshot: “Hello World!” output]

All is OK now!

A Trick for Building Multithreaded Applications on Solaris

First, let’s look at a simple multithreaded application:

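#include <stdio.h>
#include <pthread.h>
#include <errno.h>
#include <unistd.h>   /* sleep() */

/* Waits until thread 2 has set errno to 0, then sets errno to 1. */
void *thread1_func(void *p_arg)
{
    errno = 0;
    sleep(3);
    errno = 1;
    printf("%s exit, errno is %d\n", (char *)p_arg, errno);
    return NULL;
}

/* Sets errno to 0, then reads it back after thread 1 has changed its errno. */
void *thread2_func(void *p_arg)
{
    errno = 0;
    sleep(5);
    printf("%s exit, errno is %d\n", (char *)p_arg, errno);
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;

    pthread_create(&t1, NULL, thread1_func, "Thread 1");
    pthread_create(&t2, NULL, thread2_func, "Thread 2");

    /* Wait for both threads instead of sleeping for a fixed time. */
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}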

What output do you expect from this program? As I understand it, errno should be thread-safe: each thread has its own errno. Although thread1_func changes errno, that should not affect errno in thread2_func.

Let’s check it on Solaris 10:

bash-3.2# gcc -g -o a a.c -lpthread
bash-3.2# ./a
Thread 1 exit, errno is 1
Thread 2 exit, errno is 1

Oh! errno in thread2_func has also been changed to 1. Why does this happen? Let’s look for the root cause in the errno.h header:

/*
 * Error codes
 */

#include <sys/errno.h>

#ifdef  __cplusplus
extern "C" {
#endif

#if defined(_LP64)
/*
 * The symbols _sys_errlist and _sys_nerr are not visible in the
 * LP64 libc.  Use strerror(3C) instead.
 */
#endif /* _LP64 */

#if defined(_REENTRANT) || defined(_TS_ERRNO) || _POSIX_C_SOURCE - 0 >= 199506L
extern int *___errno();
#define errno (*(___errno()))
#else
extern int errno;
/* ANSI C++ requires that errno be a macro */
#if __cplusplus >= 199711L
#define errno errno
#endif
#endif  /* defined(_REENTRANT) || defined(_TS_ERRNO) */

#ifdef  __cplusplus
}
#endif

#endif  /* _ERRNO_H */

We can see that errno is a per-thread (thread-safe) variable, defined as #define errno (*(___errno())), only when at least one of the following conditions is true:

defined(_REENTRANT) || defined(_TS_ERRNO) || _POSIX_C_SOURCE - 0 >= 199506L

Let’s try it:

bash-3.2# gcc -D_POSIX_C_SOURCE=199506L -g -o a a.c -lpthread
bash-3.2# ./a
Thread 1 exit, errno is 1
Thread 2 exit, errno is 0

Yes, the output is right!
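To see why the two branches of errno.h behave so differently, here is a minimal sketch that mimics them outside libc. It assumes GCC or Clang (for the __thread storage class) and POSIX threads; demo_errno, demo_errno_addr(), shared_errno and the helper functions are hypothetical names for illustration, not the real Solaris ___errno() implementation.

```c
#include <assert.h>
#include <pthread.h>

/* --- Per-thread variant: mimics "#define errno (*(___errno()))" --- */

/* Hypothetical storage; __thread (GCC/Clang) gives each thread its own copy. */
static __thread int demo_errno_storage;

/* Hypothetical stand-in for ___errno(): returns the address of the
 * calling thread's private copy. */
static int *demo_errno_addr(void)
{
    return &demo_errno_storage;
}

#define demo_errno (*(demo_errno_addr()))

/* --- Shared variant: mimics "extern int errno;" --- */
static int shared_errno;

static void *set_per_thread(void *arg)
{
    (void)arg;
    demo_errno = 1;            /* writes only this thread's copy */
    return NULL;
}

static void *set_shared(void *arg)
{
    (void)arg;
    shared_errno = 1;          /* writes the single copy every thread sees */
    return NULL;
}

/* Runs fn in a new thread and waits for it to finish. */
static void run_in_thread(void *(*fn)(void *))
{
    pthread_t t;
    int rc = pthread_create(&t, NULL, fn, NULL);
    assert(rc == 0);
    rc = pthread_join(t, NULL);
    assert(rc == 0);
    (void)rc;                  /* avoid an unused-variable warning under NDEBUG */
}

/* The caller sets its value to 0, another thread sets "its" value to 1,
 * and the caller's value is returned afterwards. */
int errno_seen_after_other_thread(void)
{
    demo_errno = 0;
    run_in_thread(set_per_thread);
    return demo_errno;         /* 0: the other thread changed its own copy */
}

int shared_errno_seen_after_other_thread(void)
{
    shared_errno = 0;
    run_in_thread(set_shared);
    return shared_errno;       /* 1: same effect as the first Solaris 10 run */
}
```

errno_seen_after_other_thread() returns 0 and shared_errno_seen_after_other_thread() returns 1, the same split as the two Solaris runs above. Compile with -pthread.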

As Compiling a Multithreaded Application puts it:

For POSIX behavior, compile applications with the -D_POSIX_C_SOURCE flag set >= 199506L. For Solaris behavior, compile multithreaded programs with the -D_REENTRANT flag.

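So pay attention to these flags when building multithreaded applications on Solaris: compile with -D_REENTRANT (Solaris behavior) or with -D_POSIX_C_SOURCE set to 199506L or higher (POSIX behavior); otherwise errno is shared by all threads.

P.S. The full code is here.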

Reference:
(1) Compiling a Multithreaded Application;
(2) What is the correct way to build a thread-safe, multiplatform C library?

An Awful Project

This is a simple server program: it just receives protocol packets from clients, parses them, and inserts the parsed results into a database. So why is the implementation so complicated and awful?

(1) For every protocol, the server needs to listen on a dedicated port. There are only 5 protocols now, but what if we need to support 100 or 10000 protocols? Do we listen on 100 or 10000 ports? Well, we can hardly wait for that day, because if we ever need to support 10000 protocols, we will have become Bill Gates.

(2) Every protocol message has a different header.

(3) There is a lot of copy-and-paste code, so when there is a bug, you have to find every copy of it and fix each one.

(4) A lot of dead code.

(5) Many functions are so long (more than 1000 lines) that I doubt they can still be maintained 5 years from now.

(6) The code depends on the 32-bit/64-bit architecture.

(7) The client and server are so tightly coupled that upgrading one program means taking the other into account.

I think we should use a unified header for every protocol and have the server listen on only one port. Every message arrives at the server with the same header, which avoids a lot of duplicated code. The server then only needs a “switch … case” on the header to parse each message. This also decouples the client and server programs.
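The proposal above can be sketched in a few dozen lines of C. This is a minimal sketch, assuming a hypothetical 8-byte header (magic, version, message type, payload length) that is encoded byte by byte in big-endian order with fixed-width types, so its layout does not depend on struct padding, sizeof(long), or host endianness, which also addresses point (6). HDR_MAGIC, the message types, and the handler functions are invented for illustration; the original project's protocols are not described here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical unified header shared by every protocol message:
 *   uint16 magic | uint8 version | uint8 type | uint32 payload length */
#define HDR_SIZE  8
#define HDR_MAGIC 0xCAFEu

enum msg_type { MSG_LOGIN = 1, MSG_REPORT = 2, MSG_LOGOUT = 3 };  /* hypothetical */

struct msg_header {
    uint16_t magic;
    uint8_t  version;
    uint8_t  type;
    uint32_t length;   /* number of payload bytes after the header */
};

/* Writes the header in big-endian byte order. */
void encode_header(const struct msg_header *h, uint8_t out[HDR_SIZE])
{
    assert(h != NULL && out != NULL);
    out[0] = (uint8_t)(h->magic >> 8);
    out[1] = (uint8_t)(h->magic);
    out[2] = h->version;
    out[3] = h->type;
    out[4] = (uint8_t)(h->length >> 24);
    out[5] = (uint8_t)(h->length >> 16);
    out[6] = (uint8_t)(h->length >> 8);
    out[7] = (uint8_t)(h->length);
}

/* Returns 0 on success, -1 if the buffer is too short or the magic is wrong. */
int decode_header(const uint8_t *buf, size_t len, struct msg_header *h)
{
    if (len < HDR_SIZE)
        return -1;
    h->magic   = (uint16_t)((buf[0] << 8) | buf[1]);
    h->version = buf[2];
    h->type    = buf[3];
    h->length  = ((uint32_t)buf[4] << 24) | ((uint32_t)buf[5] << 16) |
                 ((uint32_t)buf[6] << 8)  |  (uint32_t)buf[7];
    return h->magic == HDR_MAGIC ? 0 : -1;
}

/* Hypothetical per-protocol handlers. A real server would parse the payload
 * and insert the result into the database; these only return the type they
 * handled so that the routing is observable. */
static int handle_login(const uint8_t *p, uint32_t n)  { (void)p; (void)n; return MSG_LOGIN; }
static int handle_report(const uint8_t *p, uint32_t n) { (void)p; (void)n; return MSG_REPORT; }
static int handle_logout(const uint8_t *p, uint32_t n) { (void)p; (void)n; return MSG_LOGOUT; }

/* Single entry point for every message arriving on the one listening port.
 * Returns the handler's result, or -1 for a malformed or unknown message. */
int dispatch(const uint8_t *buf, size_t len)
{
    struct msg_header h;

    if (decode_header(buf, len, &h) != 0 || len - HDR_SIZE < h.length)
        return -1;
    const uint8_t *payload = buf + HDR_SIZE;

    switch (h.type) {
    case MSG_LOGIN:  return handle_login(payload, h.length);
    case MSG_REPORT: return handle_report(payload, h.length);
    case MSG_LOGOUT: return handle_logout(payload, h.length);
    default:         return -1;   /* unknown protocol: reject it */
    }
}
```

Adding a protocol then means adding one enum value, one handler, and one case label, rather than opening another port.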

A good architecture is the foundation of a house. If the foundation has a serious defect, I doubt the house can stand firm.