I was considering making the title of this post “STL algorithms considered harmful” to test my click-bait creation skills, but I decided it was better to aim for the audience interested in the topic of the post instead of attracting readers who just want to argue about my outrageous claims.

This way, I can assume that you care about algorithms, algorithm complexity and writing the best software that you can.

Algorithms

One of the pieces of advice commonly heard in the modern C++ community is to use the algorithms provided by the standard library if you want to make your programs safer, faster, more expressive, etc. This is the advice I also try to popularize – in my book, talks, workshops… everywhere I have an audience to do so.

While it is true that, whenever we want to write a for loop to solve a problem, we should first consider whether that problem can be solved with the standard (or Boost) algorithms, we should not do so blindly.

We still need to know how those algorithms are implemented, what their requirements and guarantees are, and what their space and time complexity is.

Usually, if we have a problem that perfectly matches the requirements of an STL algorithm so that we can apply it directly, that algorithm will be the most effective solution.

A problem can arise when we have to prepare the data in some way before applying the algorithm.

Set intersection

Imagine we’re writing a tool for C++ developers that advises replacing default captures ([=] and [&]) in lambdas with an explicit list of captured variables.

std::partition(begin(elements), end(elements),
    [=] (auto element) {
    ^~~ - being implicit is uncool, replace with [threshold]
        return element > threshold;
    });

While parsing the file, we would need to keep a collection containing the variables from the current and surrounding scopes, and when we encounter a lambda with a default capture, we need to see which variables it actually uses.

This gives us two sets – one set containing the variables from the surrounding scopes, and one set containing the variables used in the lambda body.

The capture list we should propose as a replacement should be the intersection of those two sets (a lambda can use global variables, which do not need to be captured, and not all variables from the surrounding scopes are used in the lambda).

And if we need an intersection, we can use the std::set_intersection algorithm.

This algorithm is quite beautiful in its simplicity. It takes two sorted collections, and goes through them in parallel from start to end:

  • If the current item in the first collection is equal to the current item in the second collection, it is added to the result, and the algorithm moves on to the next item in both collections;
  • If the current item in the first collection is less than the current item in the second collection, the algorithm just skips the current item in the first collection;
  • If the current item in the first collection is greater than the current item in the second collection, the algorithm just skips the current item in the second collection.

With each step, the algorithm moves past at least one element (from the first or the second input collection) – therefore, its complexity is linear – O(m + n), where m is the number of elements in the first collection, and n is the number of elements in the second.

Simple and efficient. As long as the input collections are sorted.
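
To illustrate the usage, here is a minimal sketch of intersecting two already-sorted vectors (the data is made up for this example):

#include <algorithm>
#include <iterator>
#include <vector>

int main()
{
    std::vector<int> first  = {1, 2, 3, 5, 8};   // already sorted
    std::vector<int> second = {2, 4, 5, 6, 8};   // already sorted

    std::vector<int> result;
    std::set_intersection(first.begin(), first.end(),
                          second.begin(), second.end(),
                          std::back_inserter(result));
    // result now contains 2, 5 and 8
}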

Sorting

The problem is what to do when the collections are not sorted in advance.

In the previous example, it would be prudent to store the variables from the surrounding scopes in a stack-like structure so that the parser can efficiently add new variables when it enters a new scope, and remove them when it leaves that scope.

This means that the variables will not be sorted by name and that we cannot use std::set_intersection directly to operate on them. Similarly, tracking the variables in the lambda body is very likely not going to keep them sorted.

Since std::set_intersection works only on sorted collections, a common pattern I’ve seen in many projects is to sort the collections first and then call the std::set_intersection algorithm.

Setting aside the fact that sorting the variable stack in our example would defeat the purpose of the stack we defined, the intersection algorithm for unsorted collections would look like this:

template <typename InputIt1, typename InputIt2, typename OutputIt>
auto unordered_intersection_1(InputIt1 first1, InputIt1 last1,
                              InputIt2 first2, InputIt2 last2,
                              OutputIt dest)
{
    // Sort both collections first, then defer to the standard algorithm
    std::sort(first1, last1);
    std::sort(first2, last2);
    return std::set_intersection(first1, last1, first2, last2, dest);
}

What is the complexity of the whole algorithm?

Sorting takes quasilinear time, which makes the total complexity of this approach O(n log n + m log m + n + m).

Sort less

Can we skip sorting?

If both collections are unordered, we would need to traverse the second collection for each element in the first one, to check whether that element should be put into the resulting set. Although this approach is not that uncommon in real-world projects, it is even worse than the previous one – its complexity is O(n * m).
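
A minimal sketch of this quadratic approach, in the same style as the other implementations (the function name is invented for comparison):

#include <algorithm>

// Quadratic intersection: linearly search the second range
// for every element of the first one
template <typename InputIt1, typename InputIt2, typename OutputIt>
OutputIt unordered_intersection_0(InputIt1 first1, InputIt1 last1,
                                  InputIt2 first2, InputIt2 last2,
                                  OutputIt dest)
{
    return std::copy_if(first1, last1, dest,
        [&] (const auto& value) {
            return std::find(first2, last2, value) != last2;
        });
}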

Instead of sorting everything, or sorting nothing, we can be Zen and take the Middle Path – sort only one collection.

If only one collection is sorted, we will have to iterate through the unsorted one value by value and check whether each value exists in the sorted collection, which we can do with binary search.

The time complexity for this will be O(n log n) for sorting the first collection, and O(m log n) for iteration and checking. In total, we have O((n + m) log n) complexity.

If we decided to sort the second collection instead of the first, the complexity would be O((n + m) log m).

In order to be as efficient as possible, we will always sort the collection that has fewer elements, thus making the final complexity of our algorithm O((m + n) log(min(m, n))).

The implementation would look like this:

template <typename InputIt1, typename InputIt2, typename OutputIt>
auto unordered_intersection_2(InputIt1 first1, InputIt1 last1,
                              InputIt2 first2, InputIt2 last2,
                              OutputIt dest)
{
    const auto size1 = std::distance(first1, last1);
    const auto size2 = std::distance(first2, last2);

    // Always sort the smaller collection; swap the argument ranges if needed
    if (size1 > size2) {
        return unordered_intersection_2(first2, last2, first1, last1, dest);
    }

    std::sort(first1, last1);

    // Copy each element of the larger collection that can be found
    // (with binary search) in the now-sorted smaller one
    return std::copy_if(first2, last2, dest,
        [&] (auto&& value) {
            return std::binary_search(first1, last1, FWD(value));
        });
}
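
A short usage sketch, assuming the definition above is in scope. Note that the function sorts the smaller input range in place, so the caller must not rely on its original order (the variable names are made up for this example):

#include <iterator>
#include <string>
#include <vector>

int main()
{
    // Variables visible in the surrounding scopes (in stack order, unsorted)
    std::vector<std::string> scope_vars  = {"elements", "threshold", "count", "name"};
    // Variables referenced inside the lambda body (unsorted)
    std::vector<std::string> lambda_vars = {"threshold", "sum"};

    std::vector<std::string> captures;
    unordered_intersection_2(scope_vars.begin(), scope_vars.end(),
                             lambda_vars.begin(), lambda_vars.end(),
                             std::back_inserter(captures));
    // captures == {"threshold"}; lambda_vars (the smaller range)
    // has been sorted in place by the call
}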

In our example of lambda captures, the collection that we’d want to sort is usually going to be the collection of variables used in the lambda body, as it is most likely going to be smaller than the list of all variables from all surrounding scopes.

Hashing

The last option is to build a std::unordered_set (an unordered, hash-based set implementation) from the smaller collection instead of sorting it. This will make lookups O(1) on average, but constructing the std::unordered_set will take time. Construction can take anywhere between O(n) and O(n*n), which can be a problem.

template <typename InputIt1, typename InputIt2, typename OutputIt>
void unordered_intersection_3(InputIt1 first1, InputIt1 last1,
                              InputIt2 first2, InputIt2 last2,
                              OutputIt dest)
{
    const auto size1 = std::distance(first1, last1);
    const auto size2 = std::distance(first2, last2);

    // Always hash the smaller collection; swap the argument ranges if needed
    if (size1 > size2) {
        unordered_intersection_3(first2, last2, first1, last1, dest);
        return;
    }

    // Build a hash-based set from the smaller collection
    std::unordered_set<typename std::iterator_traits<InputIt1>::value_type>
        test_set(first1, last1);

    // Copy each element of the larger collection that is present in the set
    std::copy_if(first2, last2, dest,
        [&] (auto&& value) {
            return test_set.count(FWD(value)) > 0;
        });
}

The hashing approach would be the absolute winner in the case where we want to calculate intersections of several sets with a single, predefined small set. That is, we have a set A and sets B₁, B₂…, and we want to calculate A ∩ B₁, A ∩ B₂, and so on.

In this case, the std::unordered_set for A is constructed only once, so we can ignore its construction cost, and the complexity of calculating each intersection would be linear – O(m), where m is the number of elements in the second collection.
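
A minimal sketch of this reuse pattern, with names invented for the example:

#include <algorithm>
#include <iterator>
#include <unordered_set>
#include <vector>

// The hash set for A is built only once, up front...
std::vector<int> intersect_with(const std::unordered_set<int>& a_set,
                                const std::vector<int>& b)
{
    std::vector<int> result;
    // ...so each intersection is a single linear pass over B
    std::copy_if(b.begin(), b.end(), std::back_inserter(result),
        [&] (int value) {
            return a_set.count(value) > 0;
        });
    return result;
}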

Benchmark

While it is always useful to check the algorithm complexity, it is also prudent to benchmark the different approaches in cases like these, especially when choosing between the last two, where we pit binary search against hash-based sets.

In my basic benchmarks, the first option where both collections have to be sorted was always the slowest.

Sorting the shorter collection was slightly better than using std::unordered_set, but not much.

Both the second and the third approach were slightly faster than the first when both collections had the same number of elements, and were significantly faster (6x) when one collection had 1000x fewer elements than the other.