Two practical software engineering rules

There are plenty of thick books that introduce software engineering; in this article, I want to share two practical rules based on my own experience.

(1) No fear of refactoring

As time goes on, refactoring code is inevitable: the original design can’t handle the current situation seamlessly, new features of the programming language can be used to polish existing code, and so on. Because refactoring is time-consuming, risky, and costly, many companies are reluctant to do it. Yet refactoring genuinely benefits both the company and its engineers.

For companies: after refactoring, the code should be more reasonable and easier to maintain, so adding new features will cost much less time and money. For engineers: refactoring makes you more familiar with the code logic, lets you try new language features and practice module design skills, and is a precious opportunity to grow. So in the long run, refactoring is a win-win. (If it makes the software quality worse, oh boy! Don’t refactor it!)

(2) “Real” peer-to-peer code review

I haven’t experienced pair programming, but I have taken part in many “fake” peer-to-peer code reviews: the reviewer hadn’t read the code beforehand, so during the review the author had to spell out the intention of the code while the reviewer analyzed it on the spot. Both the reviewer and the author seemed very busy in the review meeting, but in fact it was inefficient and a total waste of time!

From my viewpoint, every software module should have two maintainers who are equally familiar with the code. Whether adding a big new feature or just fixing a small bug, the two maintainers should work through the whole design together in advance. If the task is small, one maintainer can take on the whole job; otherwise they can share it. Since both have taken part in the discussion, each can review the partner’s code alone. This method avoids the reviewer being misled by the code author, saves time, and finds bugs efficiently. A further benefit for the company is that if one engineer resigns, nothing is lost, because there is always another engineer who is a genuine backup.

Do these two rules seem feasible? Why not give them a shot?

Process large data in external memory

This week, I implemented a small program which handles a data set. The data set is so big that it can’t be stored in main memory.

I first tried stxxl, an awesome library which mimics the STL and processes data in external memory. But it has many limitations; for example, the data type must be a plain old data type. Since my type doesn’t provide a default constructor, stxxl can’t satisfy my need (please refer to this discussion). I also attempted other workarounds, but all failed.

Finally, I used a simple method: open a file, serialize the data set into it, and treat the file like main memory. Although it is not the most efficient approach, the program is very clear and not prone to bugs. So I decided to use it as a demo and improve it gradually.
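To make the idea concrete, here is a minimal C++ sketch of treating a file like an array. It is not my actual program: `Record` and `FileBackedArray` are hypothetical names, and the sketch assumes fixed-size, trivially copyable records (a type with pointers or strings inside would need real serialization instead of a raw byte copy).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <stdexcept>
#include <string>
#include <type_traits>

// Hypothetical record type: no default constructor, but trivially copyable.
struct Record {
    std::uint64_t id;
    double value;
    Record(std::uint64_t i, double v) : id(i), value(v) {}
};

// Hypothetical helper: keeps fixed-size records in a file and accesses them by index.
template <typename T>
class FileBackedArray {
    static_assert(std::is_trivially_copyable<T>::value,
                  "this sketch copies raw bytes, so T must be trivially copyable");

public:
    // Creates (or truncates) the backing file.
    explicit FileBackedArray(const std::string& path)
        : file_(path, std::ios::in | std::ios::out | std::ios::binary | std::ios::trunc) {
        if (!file_) throw std::runtime_error("cannot open " + path);
    }

    std::size_t size() const { return size_; }

    // Appends a record at the end of the file.
    void push_back(const T& item) {
        write_at(size_, item);
        ++size_;
    }

    // Overwrites an existing record.
    void store(std::size_t index, const T& item) {
        assert(index < size_);
        write_at(index, item);
    }

    // Reads a record into an existing object, so T needs no default constructor.
    void load(std::size_t index, T& out) {
        assert(index < size_);
        file_.seekg(offset_of(index));
        file_.read(reinterpret_cast<char*>(&out), sizeof(T));
        if (!file_) throw std::runtime_error("read failed");
    }

private:
    static std::streamoff offset_of(std::size_t index) {
        return static_cast<std::streamoff>(index * sizeof(T));
    }

    void write_at(std::size_t index, const T& item) {
        file_.seekp(offset_of(index));
        file_.write(reinterpret_cast<const char*>(&item), sizeof(T));
        if (!file_) throw std::runtime_error("write failed");
    }

    std::fstream file_;
    std::size_t size_ = 0;
};
```

Every access costs a seek plus a read or write, which is why this is clear rather than fast; caching a block of records in memory would be an obvious next step.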

Update: Splitting the large file into smaller ones and using multiple threads to handle them is a good idea.


Some tips on using “pool” in programming

Over the past few weeks, I have been diving into using CUDA APIs to operate on multiple GPUs. One piece of that work is a “memory pool” module which manages allocating and freeing memory on different devices. In this post, I want to share some thoughts about “pool”, an interesting technique in programming.

The “pool” works like a “cache”: it allocates some resources beforehand, and when the application wants one, the “pool” picks an available one for it. One classic example is the “database connection pool”: the “pool” preallocates some TCP connections and keeps them alive, which saves the client from handshaking with the server every time. Another instance is the “memory pool” which I implemented recently. The “pool” keeps the freed memory instead of really releasing it to the device. Based on my benchmark test, the application’s performance can improve by 100% in the extreme case. (Caveat: in a multithreaded application, if the locking mechanism of the “pool” becomes the performance bottleneck, try giving every thread its own private “pool”.)

The other use of a “pool” is debugging. Take my “memory pool” as a demonstration again: every memory pointer is accompanied by a device ID (which GPU the memory was allocated from) and the memory size. So you can clearly see the whole life of a memory block by analyzing the trace log. This has saved me from the notorious “an illegal memory access was encountered” error many times in multi-GPU programming.

Last but not least, although this “memory pool” is merely ~100 lines of code, it already uses the following data structures: queue, vector, pair and map. Not to mention the mutex and lock, which are essential to avoid nasty data-race issues. So writing this module is good practice for honing my programming craft.
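The caching behaviour can be sketched in a few lines of C++. This is only an illustration, not my CUDA module: `MemoryPool` is a hypothetical name, and it assumes plain host memory from `std::malloc` and exact-size reuse, whereas a real GPU pool would call the device allocator and keep one free list per device.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <map>
#include <mutex>
#include <vector>

// Hypothetical host-memory pool: released blocks are cached and reused by size.
class MemoryPool {
public:
    MemoryPool() = default;
    MemoryPool(const MemoryPool&) = delete;
    MemoryPool& operator=(const MemoryPool&) = delete;

    // Only cached blocks are freed here; blocks still in use stay with the caller.
    ~MemoryPool() {
        for (auto& entry : free_blocks_) {
            for (void* p : entry.second) std::free(p);
        }
    }

    void* allocate(std::size_t size) {
        assert(size > 0);
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = free_blocks_.find(size);
        if (it != free_blocks_.end() && !it->second.empty()) {
            // Cache hit: hand back a previously released block of the same size.
            void* p = it->second.back();
            it->second.pop_back();
            in_use_[p] = size;
            ++reuse_count_;
            return p;
        }
        // Cache miss: fall back to the real allocator.
        void* p = std::malloc(size);
        if (p != nullptr) in_use_[p] = size;
        return p;
    }

    // Puts the block back into the cache instead of really freeing it.
    void release(void* p) {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = in_use_.find(p);
        if (it == in_use_.end()) return;  // not from this pool: ignore
        free_blocks_[it->second].push_back(p);
        in_use_.erase(it);
    }

    std::size_t reuse_count() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return reuse_count_;
    }

private:
    mutable std::mutex mutex_;
    std::map<std::size_t, std::vector<void*>> free_blocks_;  // size -> cached blocks
    std::map<void*, std::size_t> in_use_;                    // pointer -> size
    std::size_t reuse_count_ = 0;
};
```

The single mutex is exactly the lock the caveat above refers to; a `thread_local` pool per thread would be one way to remove it.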
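A rough C++ sketch of this bookkeeping is below. The names (`AllocationTracker`, `BlockInfo`, `on_allocate`, `on_free`) and the log format are made up for illustration and are not taken from my module; the sketch only records metadata and never touches the memory itself.

```cpp
#include <cstddef>
#include <map>
#include <mutex>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical metadata kept for every live pointer.
struct BlockInfo {
    int device_id;     // which GPU the block was allocated from
    std::size_t size;  // size of the block in bytes
};

// Hypothetical tracker: records each block's life in an in-memory trace log.
class AllocationTracker {
public:
    void on_allocate(const void* p, int device_id, std::size_t size) {
        std::lock_guard<std::mutex> lock(mutex_);
        blocks_[p] = BlockInfo{device_id, size};
        log("alloc", p, device_id, size);
    }

    // Returns false (and logs why) if the free looks suspicious.
    bool on_free(const void* p, int device_id) {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = blocks_.find(p);
        if (it == blocks_.end()) {
            log("free-unknown", p, device_id, 0);
            return false;
        }
        if (it->second.device_id != device_id) {
            log("free-wrong-device", p, device_id, it->second.size);
            return false;
        }
        log("free", p, device_id, it->second.size);
        blocks_.erase(it);
        return true;
    }

    std::vector<std::string> trace() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return trace_;
    }

private:
    // Caller must hold mutex_.
    void log(const char* event, const void* p, int device_id, std::size_t size) {
        std::ostringstream os;
        os << event << " ptr=" << p << " device=" << device_id << " size=" << size;
        trace_.push_back(os.str());
    }

    mutable std::mutex mutex_;
    std::map<const void*, BlockInfo> blocks_;
    std::vector<std::string> trace_;
};
```

With something like this in place, a double free or a free issued against the wrong device shows up as a line in the trace instead of a crash much later.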

Enjoy coding!

Porting software is fun and rewarding

When it comes to porting software, I think there are several kinds:

a) In the simplest case, a tool was created for Linux and you want to use it on FreeBSD. Because there is no out-of-the-box package for this operating system, you grab the code and compile it yourself, with no complaints from the compiler. You run it and it seems to work, bingo! This is the perfect experience!

b) Life would be pleasant if everything were like the case above, but in reality it definitely is not. Sometimes things don’t go so smoothly. Take socket programming as an example: Solaris has some specific requirements that may surprise you if you are only familiar with the Linux environment (please check this post). In this scenario you may need to tweak the compiler options and even customize the code to fit your requirements.

c) In the third case, you need to read the whole source code and modify it, and this is what I am currently doing. This Monday, I received a task to verify a concept. I remembered an open-source framework which implements a similar function, so I downloaded it and went through the code carefully. Fortunately, this project does satisfy our requirements, but since our computation environment is Nvidia GPUs, I need to replace the related code with CUDA APIs, besides integrating the framework into our code repository. Barring accidents, I think I can finish the whole work next week.

From my personal experience, porting software is really rewarding! Take this week’s work as an example: I learnt a new C++ library and refreshed my knowledge of graph data structures. Furthermore, porting software is also fun: after several hours or even days of hard work, a bespoke tool meets your requirements, and that feels very fulfilling!

Finally, I must state that I am not encouraging you to be lazy and stop thinking about problems yourself; rather, you should make sensible use of the resources available. Moreover, please comply with the software license and don’t violate it.

Enjoy porting!

How to maintain a software project?

For a software engineer, at least from my own experience, maintaining an existing software project takes up a considerable amount of time: adding new features, fixing tricky bugs, and so on. In this post, I will share some tips on how to go from novice to veteran quickly when facing a new project.

(1) Get familiar with the background knowledge of the project.

Every piece of software has its own purpose and users: a device driver serves specific hardware, whilst an SMS gateway helps route messages all over the world. So before delving into the code, you should get an overview of this background information. You need not be an expert yet, but you should at least have a sketch in your mind. Then when you meet a problem later in your work, you will know which area of knowledge to strengthen.

(2) Study the architecture of the project.

The right way to study a piece of software is to learn its architecture first: Is it single-process or multi-process? How many modules is it divided into? Does it provide fundamental components, such as thread creation and memory allocation? This step not only keeps you from missing the forest for the trees, but also gives you more confidence, since it keeps you from getting trapped in messy code at the beginning.

(3) Master the module which has the highest priority.

Now that you have enough knowledge of the project, it’s time to dig into the details of the required modules. You should begin with the highest-priority one, for example, the module with the most frequently reported bugs, or the one suggested for refactoring. During this stage, you should use every resource that can lend a hand: the previous maintainer, the QA engineers who test the software, the project manager, etc. As you become more versed in the code, you will also get a better understanding of the related business.

Good luck!