Moving iterators in C++

This is a short post about a standard library feature that is not as well known as it should be.

Imagine we want to write a small function that collects the files in the subdirectories of the current directory, that is, the list that ls */* would print.

We’ll start by creating a function that returns the directory entries of all files in the specified directory.

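#include <filesystem>
#include <vector>

namespace fs = std::filesystem; // assumed alias; requires C++17

// Returns every entry found directly inside dir (not recursive).
std::vector<fs::directory_entry>
files_in_dir(const fs::directory_entry& dir)
{
    return std::vector<fs::directory_entry>(
            fs::directory_iterator{dir.path()},
            fs::directory_iterator{});
}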

This function builds a vector of all entries in the specified directory using the standard library’s filesystem support. The vector is constructed directly from a pair of directory iterators.

What we need to do now is create a function that calls files_in_dir for all directories in the current directory and collects all the results in a single vector.

std::vector<fs::directory_entry>
files_in_subdirs()
{
    std::vector<fs::directory_entry> results;

    auto item = fs::directory_iterator{"."};
    const auto last = fs::directory_iterator{};

    for (; item != last; ++item) {
        if (!item->is_directory()) {
            continue;
        }

        auto dir_items = files_in_dir(*item);

        // TODO: add everything from dir_items into results
    }

    return results;
}

The function should be easy to understand: it iterates through the subdirectories of the current directory and calls files_in_dir to get the list of files in each of them.

The question that remains is how to add those collected items into the resulting vector.

We can use .insert to insert all items from dir_items into the results vector:

results.insert(results.end(),
               dir_items.cbegin(),
               dir_items.cend());

The problem is that this copies every directory_entry value from dir_items into results. The copies are unnecessary because dir_items is destroyed at the end of the loop iteration.

We could move everything into the results vector instead.

If we didn’t know any better, we could replace this insert with a for loop that moves the elements from dir_items to results one by one (calling reserve beforehand to avoid repeated vector reallocations).
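For comparison, the hand-written loop might look roughly like the following. This is only a sketch: append_by_moving is a hypothetical helper name that does not appear in the post, and it is written as a template over std::vector<T> so that it can be tried with any movable type, not only fs::directory_entry.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not from the post): appends the elements of
// `source` to `target` by moving them one at a time.
template <typename T>
void append_by_moving(std::vector<T>& target, std::vector<T>&& source)
{
    // Reserve up front so the push_back calls below cannot trigger
    // a reallocation of `target` in the middle of the loop.
    target.reserve(target.size() + source.size());

    for (auto& item : source) {
        target.push_back(std::move(item));
    }

    // The elements left behind are in a moved-from state; drop them.
    source.clear();
}
```

Inside files_in_subdirs this would be called as append_by_moving(results, std::move(dir_items)). Reserving an exact size on every loop iteration may also work against the vector’s usual growth strategy, which is one more reason to prefer a single call to .insert.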

But we do know better: there is an iterator adaptor, aptly named std::move_iterator, that returns an rvalue reference when it is dereferenced.
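A small sketch can make the difference in the dereferenced type visible. The alias string_iterator and the helper take_first are made up for this illustration, and std::string stands in for directory_entry.

```cpp
#include <cassert>
#include <iterator>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// Illustrative alias, not part of the post.
using string_iterator = std::vector<std::string>::iterator;

// A plain iterator dereferences to an lvalue reference ...
static_assert(
    std::is_same<decltype(*std::declval<string_iterator>()),
                 std::string&>::value,
    "plain iterators yield lvalue references");

// ... while the adapted iterator dereferences to an rvalue reference.
static_assert(
    std::is_same<decltype(*std::declval<std::move_iterator<string_iterator>>()),
                 std::string&&>::value,
    "move iterators yield rvalue references");

// Hypothetical helper: moves the first element out of `items`.
// Assumes `items` is not empty.
std::string take_first(std::vector<std::string>& items)
{
    auto first = std::make_move_iterator(items.begin());
    // *first is an rvalue, so the returned string is move-constructed.
    return *first;
}
```

The vector keeps its size after take_first; its first element is simply left in a valid but unspecified moved-from state.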

This means that we can keep using higher-level functions like .insert and still be efficient. We just need to pass move iterators to .insert instead of the normal ones:

results.insert(results.end(),
               std::make_move_iterator(dir_items.begin()),
               std::make_move_iterator(dir_items.end()));
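One way to convince ourselves that nothing is copied is to try the same call with a move-only element type. The sketch below uses std::unique_ptr<int> purely for illustration, and append_moved is a hypothetical name. With plain begin() and end() this insert would not compile, because unique_ptr cannot be copied.

```cpp
#include <cassert>
#include <iterator>
#include <memory>
#include <vector>

// Hypothetical helper: appends every element of `source` to `target`
// by moving. The element type is move-only, so a copy is impossible.
void append_moved(std::vector<std::unique_ptr<int>>& target,
                  std::vector<std::unique_ptr<int>>& source)
{
    target.insert(target.end(),
                  std::make_move_iterator(source.begin()),
                  std::make_move_iterator(source.end()));
}
```

After the call, source still holds the same number of elements, but each of them is a null pointer, since ownership now lives in target.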

That’s it for now.

Bwmat
Wouldn't using the const versions of begin/end prevent a move in that last example?
Ivan Čukić
You're right of course.

I was thinking "remove the c", "remove the c", "remove the c" all the time while writing, and then I forgot. :)

Thanks!
Milos
Correct me if I'm wrong, but this returns subfolders, not files in subdirectories?
Ivan Čukić
directory_entry returns both -- in this case directories and files of subdirectories of the current directory.
Alfredo Correa
T& has a corresponding T*
T const& has a corresponding T const*

Should there be a “move pointer” in the language that corresponds to T&&?

So far I had to use std::move_iterator<T*> or implement my own for this.
Ivan Čukić
Interesting idea. Though I'm not sure how much should be invested in extending raw pointers.