How to process a large file?

In "Process large data in external memory", I mentioned:

Update: Splitting a large file into smaller ones and using multiple threads to handle them is a good idea.

I want to elaborate on how to process a large file here:

(1) Split the large file into smaller files that are independent of each other, e.g., partitioned by user, so that all records for one user end up in the same small file. Then you can spawn multiple threads, each processing one small file.

(2) For the output: if all threads write to the same file, the write operations must be atomic, and that serialization becomes the bottleneck of the program. So every thread should have its own output file.

(3) After all threads exit, the main thread can use cat or another method to consolidate all the output files into one.
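
The three steps above can be sketched in Python roughly as follows. This is only an illustration under some assumptions that are not part of the original note: the input is a text file with one record per line in the form `user,value`, every line ends with a newline, and the "processing" is just counting records per user. The function names (`split_by_user`, `process_part`, `merge_outputs`, `process_large_file`) are made up for this example.

```python
import os
import shutil
import tempfile
import zlib
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_by_user(src_path, work_dir, num_parts):
    """Step 1: send every record of a given user to the same part file."""
    part_paths = [os.path.join(work_dir, f"part-{i}.in") for i in range(num_parts)]
    parts = [open(p, "w") for p in part_paths]
    try:
        with open(src_path) as src:
            for line in src:
                user = line.split(",", 1)[0]
                # A stable hash keeps each user's records in one part,
                # so the parts can be processed independently.
                parts[zlib.crc32(user.encode()) % num_parts].write(line)
    finally:
        for f in parts:
            f.close()
    return part_paths


def process_part(in_path, out_path):
    """Step 2: a worker reads one part and writes only to its own output file."""
    counts = Counter()  # placeholder processing: count records per user
    with open(in_path) as f:
        for line in f:
            counts[line.split(",", 1)[0]] += 1
    with open(out_path, "w") as out:
        for user, n in sorted(counts.items()):
            out.write(f"{user},{n}\n")


def merge_outputs(out_paths, dest_path):
    """Step 3: concatenate the per-worker outputs, like `cat` would."""
    with open(dest_path, "wb") as dest:
        for p in out_paths:
            with open(p, "rb") as part:
                shutil.copyfileobj(part, dest)


def process_large_file(src_path, dest_path, num_parts=4):
    with tempfile.TemporaryDirectory() as work_dir:
        in_paths = split_by_user(src_path, work_dir, num_parts)
        out_paths = [p + ".out" for p in in_paths]
        with ThreadPoolExecutor(max_workers=num_parts) as pool:
            # list() waits for all workers and re-raises any worker exception.
            list(pool.map(process_part, in_paths, out_paths))
        # All workers have finished here, so merging is safe.
        merge_outputs(out_paths, dest_path)
```

Calling `process_large_file("input.csv", "output.csv")` would run all three steps. Note that in CPython, threads mainly help when the work is I/O-bound; for CPU-heavy processing, swapping `ThreadPoolExecutor` for `ProcessPoolExecutor` is one option, and the one-output-file-per-worker idea stays the same.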
