I was considering making the title of this post “STL algorithms considered harmful” to test my click-bait creation skills, but I decided it was better to aim for the audience interested in the topic of the post instead of attracting readers who just want to argue about my outrageous claims.

This way, I can assume that you care about algorithms, algorithm complexity and writing the best software that you can.

Algorithms

One of the pieces of advice commonly heard in the modern C++ community is to use the algorithms provided by the standard library if you want to make your programs safer, faster, more expressive, etc. This is the advice I also try to popularize – in my book, talks, workshops… everywhere I have an audience to do so.

While it is true that, whenever we want to write a for loop to solve a problem, we should first consider whether that problem can be solved with the standard (or Boost) algorithms, we should not do so blindly.

We still need to know how those algorithms are implemented, what their requirements and guarantees are, and what their space and time complexity is.

Usually, if we have a problem that perfectly matches the requirements of an STL algorithm so that we can apply it directly, that algorithm will be the most effective solution.

A problem can arise when we have to prepare the data in some way before applying the algorithm.

Set intersection

Imagine we’re writing a tool for C++ developers that advises replacing default captures ([=] and [&]) in lambdas with an explicit list of captured variables.

std::partition(begin(elements), end(elements),
    [=] (auto element) {
    ^~~ - being implicit is uncool, replace with [threshold]
        return element > threshold;
    });

While parsing the file, we would need to keep a collection containing the variables from the current and surrounding scopes, and when we encounter a lambda with a default capture, we need to see which variables it actually uses.

This gives us two sets – one set containing the variables from the surrounding scopes, and one set containing the variables used in the lambda body.

The capture list we should propose as a replacement should be the intersection of those two sets (a lambda can use global variables, which do not need to be captured, and not all variables from the surrounding scopes are used in the lambda).

And if we need an intersection, we can use the std::set_intersection algorithm.

This algorithm is quite beautiful in its simplicity. It takes two sorted collections, and goes through them in parallel from start to end:

  • If the current item in the first collection is equal to the current item in the second collection, it is added to the result, and the algorithm moves on to the next item in both collections;
  • If the current item in the first collection is less than the current item in the second collection, the algorithm just skips the current item in the first collection;
  • If the current item in the first collection is greater than the current item in the second collection, the algorithm just skips the current item in the second collection.

With each step, the algorithm moves past at least one element (from the first or the second input collection) – therefore, its complexity is linear – O(m + n), where m is the number of elements in the first collection, and n is the number of elements in the second.

Simple and efficient. As long as the input collections are sorted.
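
To illustrate the usage, here is a minimal sketch of intersecting two already-sorted vectors (the data is made up for this example):

#include <algorithm>
#include <iterator>
#include <vector>

int main()
{
    std::vector<int> first  = {1, 2, 3, 5, 8};   // already sorted
    std::vector<int> second = {2, 4, 5, 6, 8};   // already sorted

    std::vector<int> result;
    std::set_intersection(first.begin(), first.end(),
                          second.begin(), second.end(),
                          std::back_inserter(result));
    // result now contains 2, 5 and 8
}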

Sorting

The problem is what to do when the collections are not sorted in advance.

In the previous example, it would be prudent to store the variables from the surrounding scopes in a stack-like structure so that the parser can efficiently add new variables when it enters a new scope, and remove them when it leaves that scope.

This means that the variables will not be sorted by name and that we cannot use std::set_intersection directly to operate on them. Similarly, tracking the variables in the lambda body is very likely not going to keep them sorted.

Since std::set_intersection works only on sorted collections, a common pattern I’ve seen in many projects is to sort the collections first and then call the std::set_intersection algorithm.

Setting aside the fact that sorting the variable stack in our example would defeat the purpose of the stack we defined, the intersection algorithm for unsorted collections would look like this:

template <typename InputIt1, typename InputIt2, typename OutputIt>
auto unordered_intersection_1(InputIt1 first1, InputIt1 last1,
                              InputIt2 first2, InputIt2 last2,
                              OutputIt dest)
{
    // Sort both collections first, then defer to the standard algorithm
    std::sort(first1, last1);
    std::sort(first2, last2);
    return std::set_intersection(first1, last1, first2, last2, dest);
}

What is the complexity of the whole algorithm?

Sorting takes quasilinear time, which makes the total complexity of this approach O(n log n + m log m + n + m).

Sort less

Can we skip sorting?

If both collections are unordered, we would need to traverse the second collection for each element in the first one, to check whether that element should be put into the resulting set. Although this approach is not that uncommon in real-world projects, it is even worse than the previous one – its complexity is O(n * m).
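
A minimal sketch of this quadratic approach, in the same style as the other implementations (the function name is invented for comparison):

#include <algorithm>

// Quadratic intersection: linearly search the second range
// for every element of the first one
template <typename InputIt1, typename InputIt2, typename OutputIt>
OutputIt unordered_intersection_0(InputIt1 first1, InputIt1 last1,
                                  InputIt2 first2, InputIt2 last2,
                                  OutputIt dest)
{
    return std::copy_if(first1, last1, dest,
        [&] (const auto& value) {
            return std::find(first2, last2, value) != last2;
        });
}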

Instead of sorting everything, or sorting nothing, we can be Zen and take the Middle Path – sort only one collection.

If only one collection is sorted, we will have to iterate through the unsorted one value by value and check whether each value exists in the sorted collection, which we can do with binary search.

The time complexity for this will be O(n log n) for sorting the first collection, and O(m log n) for iteration and checking. In total, we have O((n + m) log n) complexity.

If we decided to sort the second collection instead of the first, the complexity would be O((n + m) log m).

In order to be as efficient as possible, we will always sort the collection that has fewer elements, thus making the final complexity of our algorithm O((m + n) log(min(m, n))).

The implementation would look like this:

template <typename InputIt1, typename InputIt2, typename OutputIt>
auto unordered_intersection_2(InputIt1 first1, InputIt1 last1,
                              InputIt2 first2, InputIt2 last2,
                              OutputIt dest)
{
    const auto size1 = std::distance(first1, last1);
    const auto size2 = std::distance(first2, last2);

    // Always sort the smaller collection; swap the argument ranges if needed
    if (size1 > size2) {
        return unordered_intersection_2(first2, last2, first1, last1, dest);
    }

    std::sort(first1, last1);

    // Copy each element of the larger collection that can be found
    // (with binary search) in the now-sorted smaller one
    return std::copy_if(first2, last2, dest,
        [&] (auto&& value) {
            return std::binary_search(first1, last1, FWD(value));
        });
}
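
A short usage sketch, assuming the definition above is in scope. Note that the function sorts the smaller input range in place, so the caller must not rely on its original order (the variable names are made up for this example):

#include <iterator>
#include <string>
#include <vector>

int main()
{
    // Variables visible in the surrounding scopes (in stack order, unsorted)
    std::vector<std::string> scope_vars  = {"elements", "threshold", "count", "name"};
    // Variables referenced inside the lambda body (unsorted)
    std::vector<std::string> lambda_vars = {"threshold", "sum"};

    std::vector<std::string> captures;
    unordered_intersection_2(scope_vars.begin(), scope_vars.end(),
                             lambda_vars.begin(), lambda_vars.end(),
                             std::back_inserter(captures));
    // captures == {"threshold"}; lambda_vars (the smaller range)
    // has been sorted in place by the call
}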

In our example of lambda captures, the collection that we’d want to sort is usually going to be the collection of variables used in the lambda body, as it is most likely going to be smaller than the list of all variables from all surrounding scopes.

Hashing

The last option is to build a std::unordered_set (an unordered, hash-based set implementation) from the smaller collection instead of sorting it. This will make lookups O(1) on average, but constructing the std::unordered_set will take time. Construction can take anywhere between O(n) and O(n*n), which can be a problem.

template <typename InputIt1, typename InputIt2, typename OutputIt>
void unordered_intersection_3(InputIt1 first1, InputIt1 last1,
                              InputIt2 first2, InputIt2 last2,
                              OutputIt dest)
{
    const auto size1 = std::distance(first1, last1);
    const auto size2 = std::distance(first2, last2);

    // Always hash the smaller collection; swap the argument ranges if needed
    if (size1 > size2) {
        unordered_intersection_3(first2, last2, first1, last1, dest);
        return;
    }

    // Build a hash-based set from the smaller collection
    std::unordered_set<typename std::iterator_traits<InputIt1>::value_type>
        test_set(first1, last1);

    // Copy each element of the larger collection that is present in the set
    std::copy_if(first2, last2, dest,
        [&] (auto&& value) {
            return test_set.count(FWD(value)) > 0;
        });
}

The hashing approach would be the absolute winner in the case where we want to calculate intersections of several sets with a single, predefined small set. That is, we have a set A and sets B₁, B₂…, and we want to calculate A ∩ B₁, A ∩ B₂, and so on.

In this case, the std::unordered_set for A is constructed only once, so we can ignore its construction cost, and the complexity of calculating each intersection would be linear – O(m), where m is the number of elements in the second collection.
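
A minimal sketch of this reuse pattern, with names invented for the example:

#include <algorithm>
#include <iterator>
#include <unordered_set>
#include <vector>

// The hash set for A is built only once, up front...
std::vector<int> intersect_with(const std::unordered_set<int>& a_set,
                                const std::vector<int>& b)
{
    std::vector<int> result;
    // ...so each intersection is a single linear pass over B
    std::copy_if(b.begin(), b.end(), std::back_inserter(result),
        [&] (int value) {
            return a_set.count(value) > 0;
        });
    return result;
}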

Benchmark

While it is always useful to check the algorithm complexity, it is also prudent to benchmark the different approaches in cases like these, especially when choosing between the last two, where we pit binary search against hash-based sets.

In my basic benchmarks, the first option where both collections have to be sorted was always the slowest.

Sorting the shorter collection was slightly better than using std::unordered_set, but not much.

Both the second and the third approach were slightly faster than the first when both collections had the same number of elements, and were significantly faster (6x) when one collection had 1000x fewer elements than the other.