Process large data in external memory

This week, I implemented a small program which handles a data set. The volume of data set is so big that it can’t be stored in main memory.

I first tried to use stxxlstxxl is an awesome library which mimics STL and processes data in external memory. But it has many limitations, such as data type should be plain old data type. Since my type doesn’t provide default constructor, stxxl can’t satisfy my need (please refer this discussion). I also make attempts on other workaruonds, but all failed.

Finally, I used a simple method: Open a file, serialize the data set into file, and treat the file like the main memory. Although it is not the most efficient approach, the program is vey clear, and not prone to bugs. So I decide to use it as a demo, and improve it gradually.

Update: Split large file into smaller ones, and use multiple threads to handle them is a good idea.

 

Be careful of file sequence in linking process

Check following A.h:

# cat A.h
#pragma once

#include <iostream>
#include <vector>

class A
{
public:
        std::vector<int> v;
        A()
        {
                v.push_back(1);
                std::cout << "Enter A's constructor...\n";
        }
        int getFirstElem()
        {
                v.push_back(2);
                std::cout << "Enter A's getFirstElem...\n";
                return v[0];
        }
        ~A()
        {
                std::cout << "Enter A's destructor...\n";
        }
};

int func();

And A.cpp:

# cat A.cpp
#include "A.h"

static A a;

int func()
{
        return a.getFirstElem();
}

The A.cpp just define a A‘s static instance, and a func() returns first element in a‘s internal vector.

Check another file which utilizes A.h and A.cpp:

# cat hello.cpp
#include <iostream>
#include "A.h"

static int gP = func();

int main()
{
    std::cout << gP << std::endl;
    return 0;
}

Compile them:

# clang++ -c hello.cpp
# clang++ -c A.cpp

Link hello.o first and execute the program:

# clang++ hello.o A.o
# ./a.out
Enter A's getFirstElem...
Enter A's constructor...
2
Enter A's destructor...

Then link A.o first and execute the program:

# clang++ A.o hello.o
# ./a.out
Enter A's constructor...
Enter A's getFirstElem...
1
Enter A's destructor...

The results are different. In first case, when call a‘s getFirstElem() function, its constructor is not even called. Please pay attention to it!

Clang seems to be generating more user-friendly error message than gcc

Check following simple C++ program:

#include <string>
#include <utility>

class A 
{
public:
    A (int a) {};
};

int main()
{

    std::pair<std::string, A> p;
    return 0;
}

Compile it with newest gcc 7.3.0, following errors are generated:

$ g++ test.cpp
test.cpp: In function ‘int main()’:
test.cpp:13:31: error: no matching function for call to ‘std::pair<std::__cxx11::basic_string<char>, A>::pair()’
     std::pair<std::string, A> p;
                               ^
In file included from /usr/include/c++/7.3.0/bits/stl_algobase.h:64:0,
                 from /usr/include/c++/7.3.0/bits/char_traits.h:39,
                 from /usr/include/c++/7.3.0/string:40,
                 from test.cpp:1:
/usr/include/c++/7.3.0/bits/stl_pair.h:431:9: note: candidate: template<class ... _Args1, long unsigned int ..._Indexes1, class ... _Args2, long unsigned int ..._Indexes2> std::pair<_T1, _T2>::pair(std::tuple<_Args1 ...>&, std::tuple<_Args2 ...>&, std::_Index_tuple<_Indexes1 ...>, std::_Index_tuple<_Indexes2 ...>)
         pair(tuple<_Args1...>&, tuple<_Args2...>&,
         ^~~~
/usr/include/c++/7.3.0/bits/stl_pair.h:431:9: note:   template argument deduction/substitution failed:
test.cpp:13:31: note:   candidate expects 4 arguments, 0 provided
     std::pair<std::string, A> p;
                               ^
In file included from /usr/include/c++/7.3.0/bits/stl_algobase.h:64:0,
                 from /usr/include/c++/7.3.0/bits/char_traits.h:39,
                 from /usr/include/c++/7.3.0/string:40,
                 from test.cpp:1:
/usr/include/c++/7.3.0/bits/stl_pair.h:364:9: note: candidate: template<class ... _Args1, class ... _Args2> std::pair<_T1, _T2>::pair(std::piecewise_construct_t, std::tuple<_Args1 ...>, std::tuple<_Args2 ...>)
         pair(piecewise_construct_t, tuple<_Args1...>, tuple<_Args2...>);
         ^~~~
/usr/include/c++/7.3.0/bits/stl_pair.h:364:9: note:   template argument deduction/substitution failed:
test.cpp:13:31: note:   candidate expects 3 arguments, 0 provided
     std::pair<std::string, A> p;
                               ^
In file included from /usr/include/c++/7.3.0/bits/stl_algobase.h:64:0,
                 from /usr/include/c++/7.3.0/bits/char_traits.h:39,
                 from /usr/include/c++/7.3.0/string:40,
                 from test.cpp:1:
/usr/include/c++/7.3.0/bits/stl_pair.h:359:21: note: candidate: template<class _U1, class _U2, typename std::enable_if<(std::_PCC<((! std::is_same<std::__cxx11::basic_string<char>, _U1>::value) || (! std::is_same<A, _U2>::value)), std::__cxx11::basic_string<char>, A>::_MoveConstructiblePair<_U1, _U2>() && (! std::_PCC<((! std::is_same<std::__cxx11::basic_string<char>, _U1>::value) || (! std::is_same<A, _U2>::value)), std::__cxx11::basic_string<char>, A>::_ImplicitlyMoveConvertiblePair<_U1, _U2>())), bool>::type <anonymous> > constexpr std::pair<_T1, _T2>::pair(std::pair<_U1, _U2>&&)
  explicit constexpr pair(pair<_U1, _U2>&& __p)
                     ^~~~
/usr/include/c++/7.3.0/bits/stl_pair.h:359:21: note:   template argument deduction/substitution failed:
test.cpp:13:31: note:   candidate expects 1 argument, 0 provided
     std::pair<std::string, A> p;
                               ^
In file included from /usr/include/c++/7.3.0/bits/stl_algobase.h:64:0,
                 from /usr/include/c++/7.3.0/bits/char_traits.h:39,
                 from /usr/include/c++/7.3.0/string:40,
                 from test.cpp:1:
/usr/include/c++/7.3.0/bits/stl_pair.h:349:12: note: candidate: template<class _U1, class _U2, typename std::enable_if<(std::_PCC<((! std::is_same<std::__cxx11::basic_string<char>, _U1>::value) || (! std::is_same<A, _U2>::value)), std::__cxx11::basic_string<char>, A>::_MoveConstructiblePair<_U1, _U2>() && std::_PCC<((! std::is_same<std::__cxx11::basic_string<char>, _U1>::value) || (! std::is_same<A, _U2>::value)), std::__cxx11::basic_string<char>, A>::_ImplicitlyMoveConvertiblePair<_U1, _U2>()), bool>::type <anonymous> > constexpr std::pair<_T1, _T2>::pair(std::pair<_U1, _U2>&&)
  constexpr pair(pair<_U1, _U2>&& __p)
            ^~~~
/usr/include/c++/7.3.0/bits/stl_pair.h:349:12: note:   template argument deduction/substitution failed:
test.cpp:13:31: note:   candidate expects 1 argument, 0 provided
     std::pair<std::string, A> p;
                               ^
In file included from /usr/include/c++/7.3.0/bits/stl_algobase.h:64:0,
                 from /usr/include/c++/7.3.0/bits/char_traits.h:39,
                 from /usr/include/c++/7.3.0/string:40,
                 from test.cpp:1:
/usr/include/c++/7.3.0/bits/stl_pair.h:339:21: note: candidate: template<class _U1, class _U2, typename std::enable_if<(_MoveConstructiblePair<_U1, _U2>() && (! _ImplicitlyMoveConvertiblePair<_U1, _U2>())), bool>::type <anonymous> > constexpr std::pair<_T1, _T2>::pair(_U1&&, _U2&&)
  explicit constexpr pair(_U1&& __x, _U2&& __y)
                     ^~~~
/usr/include/c++/7.3.0/bits/stl_pair.h:339:21: note:   template argument deduction/substitution failed:
test.cpp:13:31: note:   candidate expects 2 arguments, 0 provided
     std::pair<std::string, A> p;
                               ^
In file included from /usr/include/c++/7.3.0/bits/stl_algobase.h:64:0,
                 from /usr/include/c++/7.3.0/bits/char_traits.h:39,
                 from /usr/include/c++/7.3.0/string:40,
                 from test.cpp:1:
/usr/include/c++/7.3.0/bits/stl_pair.h:330:12: note: candidate: template<class _U1, class _U2, typename std::enable_if<(_MoveConstructiblePair<_U1, _U2>() && _ImplicitlyMoveConvertiblePair<_U1, _U2>()), bool>::type <anonymous> > constexpr std::pair<_T1, _T2>::pair(_U1&&, _U2&&)
  constexpr pair(_U1&& __x, _U2&& __y)
            ^~~~
/usr/include/c++/7.3.0/bits/stl_pair.h:330:12: note:   template argument deduction/substitution failed:
test.cpp:13:31: note:   candidate expects 2 arguments, 0 provided
     std::pair<std::string, A> p;
                               ^
In file included from /usr/include/c++/7.3.0/bits/stl_algobase.h:64:0,
                 from /usr/include/c++/7.3.0/bits/char_traits.h:39,
                 from /usr/include/c++/7.3.0/string:40,
                 from test.cpp:1:
/usr/include/c++/7.3.0/bits/stl_pair.h:321:17: note: candidate: template<class _U2, typename std::enable_if<_CopyMovePair<false, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, _U2>(), bool>::type <anonymous> > std::pair<_T1, _T2>::pair(const _T1&, _U2&&)
        explicit pair(const _T1& __x, _U2&& __y)
                 ^~~~
/usr/include/c++/7.3.0/bits/stl_pair.h:321:17: note:   template argument deduction/substitution failed:
test.cpp:13:31: note:   candidate expects 2 arguments, 0 provided
     std::pair<std::string, A> p;
                               ^
In file included from /usr/include/c++/7.3.0/bits/stl_algobase.h:64:0,
                 from /usr/include/c++/7.3.0/bits/char_traits.h:39,
                 from /usr/include/c++/7.3.0/string:40,
                 from test.cpp:1:
/usr/include/c++/7.3.0/bits/stl_pair.h:314:18: note: candidate: template<class _U2, typename std::enable_if<_CopyMovePair<true, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, _U2>(), bool>::type <anonymous> > constexpr std::pair<_T1, _T2>::pair(const _T1&, _U2&&)
        constexpr pair(const _T1& __x, _U2&& __y)
                  ^~~~
/usr/include/c++/7.3.0/bits/stl_pair.h:314:18: note:   template argument deduction/substitution failed:
test.cpp:13:31: note:   candidate expects 2 arguments, 0 provided
     std::pair<std::string, A> p;
                               ^
In file included from /usr/include/c++/7.3.0/bits/stl_algobase.h:64:0,
                 from /usr/include/c++/7.3.0/bits/char_traits.h:39,
                 from /usr/include/c++/7.3.0/string:40,
                 from test.cpp:1:
/usr/include/c++/7.3.0/bits/stl_pair.h:307:27: note: candidate: template<class _U1, typename std::enable_if<_MoveCopyPair<false, _U1, A>(), bool>::type <anonymous> > constexpr std::pair<_T1, _T2>::pair(_U1&&, const _T2&)
        explicit constexpr pair(_U1&& __x, const _T2& __y)
                           ^~~~
/usr/include/c++/7.3.0/bits/stl_pair.h:307:27: note:   template argument deduction/substitution failed:
test.cpp:13:31: note:   candidate expects 2 arguments, 0 provided
     std::pair<std::string, A> p;
                               ^
In file included from /usr/include/c++/7.3.0/bits/stl_algobase.h:64:0,
                 from /usr/include/c++/7.3.0/bits/char_traits.h:39,
                 from /usr/include/c++/7.3.0/string:40,
                 from test.cpp:1:
/usr/include/c++/7.3.0/bits/stl_pair.h:300:18: note: candidate: template<class _U1, typename std::enable_if<_MoveCopyPair<true, _U1, A>(), bool>::type <anonymous> > constexpr std::pair<_T1, _T2>::pair(_U1&&, const _T2&)
        constexpr pair(_U1&& __x, const _T2& __y)
                  ^~~~
/usr/include/c++/7.3.0/bits/stl_pair.h:300:18: note:   template argument deduction/substitution failed:
test.cpp:13:31: note:   candidate expects 2 arguments, 0 provided
     std::pair<std::string, A> p;
                               ^
In file included from /usr/include/c++/7.3.0/bits/stl_algobase.h:64:0,
                 from /usr/include/c++/7.3.0/bits/char_traits.h:39,
                 from /usr/include/c++/7.3.0/string:40,
                 from test.cpp:1:
/usr/include/c++/7.3.0/bits/stl_pair.h:293:17: note: candidate: std::pair<_T1, _T2>::pair(std::pair<_T1, _T2>&&) [with _T1 = std::__cxx11::basic_string<char>; _T2 = A]
       constexpr pair(pair&&) = default;
                 ^~~~
/usr/include/c++/7.3.0/bits/stl_pair.h:293:17: note:   candidate expects 1 argument, 0 provided
/usr/include/c++/7.3.0/bits/stl_pair.h:292:17: note: candidate: std::pair<_T1, _T2>::pair(const std::pair<_T1, _T2>&) [with _T1 = std::__cxx11::basic_string<char>; _T2 = A]
       constexpr pair(const pair&) = default;
                 ^~~~
/usr/include/c++/7.3.0/bits/stl_pair.h:292:17: note:   candidate expects 1 argument, 0 provided
/usr/include/c++/7.3.0/bits/stl_pair.h:289:21: note: candidate: template<class _U1, class _U2, typename std::enable_if<(std::_PCC<((! std::is_same<std::__cxx11::basic_string<char>, _U1>::value) || (! std::is_same<A, _U2>::value)), std::__cxx11::basic_string<char>, A>::_ConstructiblePair<_U1, _U2>() && (! std::_PCC<((! std::is_same<std::__cxx11::basic_string<char>, _U1>::value) || (! std::is_same<A, _U2>::value)), std::__cxx11::basic_string<char>, A>::_ImplicitlyConvertiblePair<_U1, _U2>())), bool>::type <anonymous> > constexpr std::pair<_T1, _T2>::pair(const std::pair<_U1, _U2>&)
  explicit constexpr pair(const pair<_U1, _U2>& __p)
                     ^~~~
/usr/include/c++/7.3.0/bits/stl_pair.h:289:21: note:   template argument deduction/substitution failed:
test.cpp:13:31: note:   candidate expects 1 argument, 0 provided
     std::pair<std::string, A> p;
                               ^
In file included from /usr/include/c++/7.3.0/bits/stl_algobase.h:64:0,
                 from /usr/include/c++/7.3.0/bits/char_traits.h:39,
                 from /usr/include/c++/7.3.0/string:40,
                 from test.cpp:1:
/usr/include/c++/7.3.0/bits/stl_pair.h:280:19: note: candidate: template<class _U1, class _U2, typename std::enable_if<(std::_PCC<((! std::is_same<std::__cxx11::basic_string<char>, _U1>::value) || (! std::is_same<A, _U2>::value)), std::__cxx11::basic_string<char>, A>::_ConstructiblePair<_U1, _U2>() && std::_PCC<((! std::is_same<std::__cxx11::basic_string<char>, _U1>::value) || (! std::is_same<A, _U2>::value)), std::__cxx11::basic_string<char>, A>::_ImplicitlyConvertiblePair<_U1, _U2>()), bool>::type <anonymous> > constexpr std::pair<_T1, _T2>::pair(const std::pair<_U1, _U2>&)
         constexpr pair(const pair<_U1, _U2>& __p)
                   ^~~~
/usr/include/c++/7.3.0/bits/stl_pair.h:280:19: note:   template argument deduction/substitution failed:
test.cpp:13:31: note:   candidate expects 1 argument, 0 provided
     std::pair<std::string, A> p;
                               ^
In file included from /usr/include/c++/7.3.0/bits/stl_algobase.h:64:0,
                 from /usr/include/c++/7.3.0/bits/char_traits.h:39,
                 from /usr/include/c++/7.3.0/string:40,
                 from test.cpp:1:
/usr/include/c++/7.3.0/bits/stl_pair.h:258:26: note: candidate: template<class _U1, class _U2, typename std::enable_if<(_ConstructiblePair<_U1, _U2>() && (! _ImplicitlyConvertiblePair<_U1, _U2>())), bool>::type <anonymous> > constexpr std::pair<_T1, _T2>::pair(const _T1&, const _T2&)
       explicit constexpr pair(const _T1& __a, const _T2& __b)
                          ^~~~
/usr/include/c++/7.3.0/bits/stl_pair.h:258:26: note:   template argument deduction/substitution failed:
test.cpp:13:31: note:   candidate expects 2 arguments, 0 provided
     std::pair<std::string, A> p;
                               ^
In file included from /usr/include/c++/7.3.0/bits/stl_algobase.h:64:0,
                 from /usr/include/c++/7.3.0/bits/char_traits.h:39,
                 from /usr/include/c++/7.3.0/string:40,
                 from test.cpp:1:
/usr/include/c++/7.3.0/bits/stl_pair.h:249:17: note: candidate: template<class _U1, class _U2, typename std::enable_if<(_ConstructiblePair<_U1, _U2>() && _ImplicitlyConvertiblePair<_U1, _U2>()), bool>::type <anonymous> > constexpr std::pair<_T1, _T2>::pair(const _T1&, const _T2&)
       constexpr pair(const _T1& __a, const _T2& __b)
                 ^~~~
/usr/include/c++/7.3.0/bits/stl_pair.h:249:17: note:   template argument deduction/substitution failed:
test.cpp:13:31: note:   candidate expects 2 arguments, 0 provided
     std::pair<std::string, A> p;
                               ^
In file included from /usr/include/c++/7.3.0/bits/stl_algobase.h:64:0,
                 from /usr/include/c++/7.3.0/bits/char_traits.h:39,
                 from /usr/include/c++/7.3.0/string:40,
                 from test.cpp:1:
/usr/include/c++/7.3.0/bits/stl_pair.h:231:26: note: candidate: template<class _U1, class _U2, typename std::enable_if<std::__and_<std::is_default_constructible<_Tp>, std::is_default_constructible<_U2>, std::__not_<std::__and_<std::__is_implicitly_default_constructible<_U1>, std::__is_implicitly_default_constructible<_U2> > > >::value, bool>::type <anonymous> > constexpr std::pair<_T1, _T2>::pair()
       explicit constexpr pair()
                          ^~~~
/usr/include/c++/7.3.0/bits/stl_pair.h:231:26: note:   template argument deduction/substitution failed:
/usr/include/c++/7.3.0/bits/stl_pair.h:230:59: error: no type named ‘type’ in ‘struct std::enable_if<false, bool>’
                                    ::value, bool>::type = false>
                                                           ^~~~~
/usr/include/c++/7.3.0/bits/stl_pair.h:230:59: note: invalid template non-type parameter
/usr/include/c++/7.3.0/bits/stl_pair.h:218:26: note: candidate: template<class _U1, class _U2, typename std::enable_if<std::__and_<std::__is_implicitly_default_constructible<_U1>, std::__is_implicitly_default_constructible<_U2> >::value, bool>::type <anonymous> > constexpr std::pair<_T1, _T2>::pair()
       _GLIBCXX_CONSTEXPR pair()
                          ^~~~
/usr/include/c++/7.3.0/bits/stl_pair.h:218:26: note:   template argument deduction/substitution failed:
/usr/include/c++/7.3.0/bits/stl_pair.h:216:59: error: no type named ‘type’ in ‘struct std::enable_if<false, bool>’
                                    ::value, bool>::type = true>
                                                           ^~~~
/usr/include/c++/7.3.0/bits/stl_pair.h:216:59: note: invalid template non-type parameter

Honestly, I’m totally lost in the error messages and can’t find the root cause easily. Whereas using clang 5.0.1 to build it:

$ clang++ test.cpp
In file included from test.cpp:1:
In file included from /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/7.3.0/../../../../include/c++/7.3.0/string:40:
In file included from /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/7.3.0/../../../../include/c++/7.3.0/bits/char_traits.h:39:
In file included from /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/7.3.0/../../../../include/c++/7.3.0/bits/stl_algobase.h:64:
/usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/7.3.0/../../../../include/c++/7.3.0/bits/stl_pair.h:219:18: error: no matching
      constructor for initialization of 'A'
      : first(), second() { }
                 ^
test.cpp:13:31: note: in instantiation of member function 'std::pair<std::__cxx11::basic_string<char>, A>::pair'
      requested here
    std::pair<std::string, A> p;
                              ^
test.cpp:7:5: note: candidate constructor not viable: requires single argument 'a', but no arguments were provided
    A (int a) {};
    ^
test.cpp:4:7: note: candidate constructor (the implicit copy constructor) not viable: requires 1 argument, but 0 were
      provided
class A
      ^
1 error generated.

It is very clear and I can figure out what is the problem soon. Through this simple test, clang seems generating more user-friendly error message than gcc.

Import existing CUDA project into Nsight

The steps to import an existing CUDA project (who uses CMake) into Nsight are as following:

(1) Select File -> New -> CUDA C/C++ Project:

1

Untick “Use default location“, and select the root directory of your project.

(2) Change Build location in Properties to points to the Makefile position.

2

(3) After building successfully, right click project: Run As -> Local C/C++ Application, then select which binary you want to execute.

References:
Setting Nsight to run with existing Makefile project;
How to create Eclipse project from CMake project;
How to change make location in Eclipse.

 

First taste of stxxl

stxxl is an interesting project:

STXXL is an implementation of the C++ standard template library STL for external memory (out-of-core) computations, i. e. STXXL implements containers and algorithms that can process huge volumes of data that only fit on disks. While the closeness to the STL supports ease of use and compatibility with existing applications, another design priority is high performance.

Recently, I have a task to do operation on data which can’t be stored in memory, so I want to give a shot of stxxl.

My OS is OpenBSD and the memory is less than 4G and I try to modify the vector1.cpp:

for (int i = 0; i < 1024 * 1024; i++)

to

for (int i = 0; i < 1024 * 1024 * 1024; i++)

Since every integer occupies 4 bytes, so it will require at least 4G storage to save the vector. Build and run this program:

# ./vector1
[STXXL-MSG] STXXL v1.4.99 (prerelease/Debug) (git 263df0c54dc168212d1c7620e3c10c93791c9c29)
[STXXL-ERRMSG] Warning: no config file found.
[STXXL-ERRMSG] Using default disk configuration.
[STXXL-MSG] Warning: open()ing /var/tmp/stxxl without DIRECT mode, as the system does not support it.
[STXXL-MSG] Disk '/var/tmp/stxxl' is allocated, space: 1000 MiB, I/O implementation: syscall delete_on_exit queue=0 devid=0
[STXXL-ERRMSG] External memory block allocation error: 2097152 bytes requested, 0 bytes free. Trying to extend the external memory space...
[STXXL-ERRMSG] External memory block allocation error: 2097152 bytes requested, 0 bytes free. Trying to extend the external memory space...
[STXXL-ERRMSG] External memory block allocation error: 2097152 bytes requested, 0 bytes free. Trying to extend the external memory space...
[STXXL-ERRMSG] External memory block allocation error: 2097152 bytes requested, 0 bytes free. Trying to extend the external memory space...
......
[STXXL-ERRMSG] External memory block allocation error: 2097152 bytes requested, 0 bytes free. Trying to extend the external memory space...
101
[STXXL-ERRMSG] Removing disk file: /var/tmp/stxxl

The program outputs 101 which is correct result. Check /var/tmp/stxxl before it is deleted:

# ls -alt /var/tmp/stxxl
-rw-r-----  1 root  wheel  4294967296 Mar  9 15:58 /var/tmp/stxxl

It is indeed the data file and size is 4G.

Based on this simple test, stxxl gives me a good impression, and is worthy for further exploring.

P.S., use cmake -DBUILD_TESTS=ON .. to enable building the examples.