I’ll try to provide a more general look at what projections are – not
only in the context of ranges. I recommend reading Ryou’s post before or
after this one for completeness.
Tables have turned
Imagine the following – you have a collection of records where each
record has several fields. One example of this can be a list of files
where each file has a name, size and maybe something else.
Collections like these are often presented to the user as a table
which can be sorted on an arbitrary column.
Tables are everywhere
By default std::sort uses operator< to
compare items in the collection while sorting which is not ideal for our
current problem – we want to be able to sort on an arbitrary column.
The usual solution is to use a custom comparator function that
compares just the field we want to sort on (I’m going to omit namespaces
for the sake of readability, and I’m not going to write pairs of
iterators – just the collection name as is customary with ranges):
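A sketch of what that usually looks like. The exact record type does not matter; here I'll assume a simple file_t with name and size members, as used throughout this post:

struct file_t {
    std::string name;
    std::size_t size;
};

// sort on the name field with a hand-written comparator
sort(files,
     [] (const file_t& left, const file_t& right) {
         return left.name < right.name;
     });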
There is a lot of boilerplate in this snippet for a really simple
operation of sorting on a specific field.
Projections allow us to specify this in a much simpler way:
sort(files, less, &file_t::name);
What this does is quite simple. It uses less-than for comparison
(less), just as we do with normal std::sort,
but it does not call the operator< of the
file_t objects in order to compare the files while sorting.
Instead, it only compares the strings stored in the name
member variable of those files.
That is the point of a projection – instead of an algorithm testing
the value itself, it tests some representation of that value – a
projection. This representation can be anything – from a member
variable, to some more complex transformation.
One step back
Let’s take one step back and investigate what a projection
is.
Instead of sorting, we’re going to take a look at
transform. The transform algorithm with an added projection
would look like this:
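A sketch of that call, following the same convention (sizes_in_kb is a hypothetical output vector we are filling):

// the lambda is invoked with f.size for each file f, not with file_t itself
transform(files, back_inserter(sizes_in_kb),
          [] (auto size) { return size / 1024; },
          &file_t::size);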
We’re transforming a collection of files with a
&file_t::size projection. This means that the
transform algorithm will invoke the provided lambda with
the file sizes instead of file_t objects themselves.
The transformation on each file_t value in the
files collection is performed in two steps:
First, for a file f, extract f.size
Second, divide the result of the previous step by 1024
This is nothing more than a function composition. One function gets a
file and returns its size, and the second function gets a number and
divides it by 1024.
If C++ had a mechanism for function composition, we
would be able to achieve the same effect like so:
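Something along these lines. compose is a hypothetical helper (C++ has no built-in function composition), sketched here with std::invoke so that member pointers also work:

// compose(g, f) returns a callable that applies f first, then g to its result
auto compose = [] (auto g, auto f) {
    return [=] (auto&&... args) {
        return std::invoke(g,
                   std::invoke(f, std::forward<decltype(args)>(args)...));
    };
};

// the same transformation, expressed without a projection
transform(files, back_inserter(sizes_in_kb),
          compose([] (auto size) { return size / 1024; },
                  &file_t::size));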
Just like in the previous example, the all_of algorithm,
when given a &file_t::size projection and a predicate, is
not going to work directly on files, but only on their sizes. For
each file, the size will be read and the provided predicate called on
it.
And just like in the previous example, this can be achieved by simple
function composition without any projections:
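A sketch of both variants, using a predicate that checks whether the size is greater than zero and the compose helper from above:

// with a projection
all_of(files,
       [] (auto size) { return size > 0; },
       &file_t::size);

// with plain function composition, no projection involved
all_of(files,
       compose([] (auto size) { return size > 0; },
               &file_t::size));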
The question that arises is: can all projections be replaced
by ordinary function composition?
From the previous two examples, it looks like the answer to this
question is “yes” – the function composition allows the algorithm to
“look” at an arbitrary representation of a value just like the
projections do. It doesn’t matter whether this representation function
is passed in as a projection and then the algorithm passes on the
projected value to the lambda, or if we compose that lambda with the
representation function.
Unary function composition
Things stop being this simple when an algorithm requires a
function with more than one argument, as sort or
accumulate do.
With normal function composition, the first function we apply can
have as many arguments as we want, but since it returns only a single
value, the second function needs to be unary.
For example, we might want to have a function size_in
which returns a file size in some unit like kB, KiB,
MB, MiB, etc. This function would have two arguments –
the file we want to get the size of and the unit. We could compose this
function with the previously defined lambda which checks whether the
file size is greater than zero.
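With ordinary composition, that could look something like this (size_in and the unit values are hypothetical, and compose is the helper sketched earlier):

// size_in(file, unit) -> numeric size; the outer predicate stays unary
auto is_nonempty_in =
    compose([] (auto size) { return size > 0; },
            size_in);

// usable as a binary function: is_nonempty_in(file, some_unit)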
N-ary function composition
sort needs this to be the other way round. It needs a
function of two arguments where both arguments have to be projected. The
representation (projection) function needs to be unary, and the
resulting function needs to be binary.
Composition needed for sorting
Universal projections
So, we need to compose the functions a bit differently. The
representation function should be applied to all arguments of an n-ary
function before it is called.
As usual, we’re going to create a function object that stores both
the projection and the main function.
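A sketch of that function object (the original implementation may differ in details; std::invoke makes member pointers usable as projections):

template <typename Function, typename Projection>
struct projected_fn {
    Function m_function;
    Projection m_projection;

    template <typename... Args>
    decltype(auto) operator() (Args&&... args) const
    {
        // project every argument, then call the main function on the results
        return std::invoke(m_function,
                   std::invoke(m_projection, std::forward<Args>(args))...);
    }
};

// C++17 deduction guide so that projected_fn{std::less{}, &file_t::name} works
template <typename Function, typename Projection>
projected_fn(Function, Projection) -> projected_fn<Function, Projection>;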
This is quite trivial – it calls m_projection for each
of the arguments (variadic template parameter pack expansion) and then
calls m_function with the results. And it is not only
trivial to implement, but it is also trivial for the compiler to
optimize.
Now, we can use projections even with old-school algorithms from STL
and in all other places where we can pass in arbitrary function
objects.
To continue with the files sorting example, the following code sorts
a vector of files, and then prints the files to the standard output all
uppercase like we’re in 90s DOS:
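A sketch of how that might look (to_upper is a hypothetical helper that upper-cases a string):

// sort the files by name using a projected comparator
std::sort(begin(files), end(files),
          projected_fn{std::less{}, &file_t::name});

// print each name in all caps, again through a projection
std::for_each(begin(files), end(files),
              projected_fn{[] (const std::string& name) {
                               std::cout << to_upper(name) << '\n';
                           },
                           &file_t::name});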
So, we have created the projected_fn function object
that we can use in the situations where all function arguments need to
be projected.
This works for most STL algorithms, but it fails for the coolest (and
most powerful) algorithm – std::accumulate. The
std::accumulate algorithm expects a function of two
arguments, just like std::sort does, but only the second
argument to that function comes from the input collection. The first
argument is the previously calculated accumulator.
Composition for accumulation
This means that, for accumulate, we must not project all
arguments – but only the last one. While this seems easier to do than
projecting all arguments, the implementation is a tad more involved.
Let’s first write a helper function that checks whether we are
processing the last argument or not, and if we are, it applies
m_projection to it:
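One possible sketch of that helper, written as a member function of a second function object (I'll call it projected_last_fn here; the name is mine, not from the original code):

template <std::size_t Total, std::size_t Current, typename Arg>
decltype(auto) project_impl(Arg&& arg) const
{
    // apply m_projection only to the last argument, pass the rest through
    if constexpr (Total == Current + 1) {
        return std::invoke(m_projection, std::forward<Arg>(arg));
    } else {
        return std::forward<Arg>(arg);
    }
}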
Note the two important template parameters: Total, the total
number of arguments; and Current, the index of the current
argument. We perform the projection only on the last argument (when
Total == Current + 1).
Now we can abuse std::tuple and
std::index_sequence to provide us with argument indices so
that we can call the project_impl function.
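A sketch of call_operator_impl for the same function object:

template <typename Tuple, std::size_t... Idx>
decltype(auto) call_operator_impl(Tuple&& args,
                                  std::index_sequence<Idx...>) const
{
    // project_impl gets the argument count, the argument index and the value
    return std::invoke(m_function,
               project_impl<sizeof...(Idx), Idx>(
                   std::get<Idx>(std::forward<Tuple>(args)))...);
}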
The call_operator_impl function gets all arguments as a
tuple and an index sequence to be used to access the items in that
tuple. It calls the previously defined project_impl and
passes it the total number of arguments (sizeof...(Idx)),
the index of the current argument (Idx) and the value of
the current argument.
The call operator just needs to call this function and nothing
more:
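Something like this:

template <typename... Args>
decltype(auto) operator() (Args&&... args) const
{
    // pack the arguments into a tuple and generate indices for them
    return call_operator_impl(
        std::forward_as_tuple(std::forward<Args>(args)...),
        std::index_sequence_for<Args...>{});
}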
We will have projections in C++20 for ranges, but projections can be
useful outside of the ranges library, and outside of the standard
library. For those situations, we can roll our own efficient
implementations.
These even have some advantages compared to the built-in projections
of the ranges library. The main benefit is that they are reusable.
Also (at least for me) the increased verbosity they bring actually
serves a purpose – it better communicates what the code does.
A bit more than a year ago, the KDE community decided to focus on a
few goals. One of those goals (the most important one as far as I’m
concerned) is to increase the users’ control over their private
data.
KDE developers and users have always been a privacy-minded bunch. But
due to all the fun things that have happened in recent
years, we had to shift into a higher gear.
We have seen new projects like KDE
Itinerary (by Volker), Plasma Vault (by yours truly), Plasma Mycroft (by
Yuri and Aditya), etc. There has also been a lot of work to improve our
existing projects like KMail.
Now, this post is not about any of these.
It is about a KDE Privacy developer sprint organized by Sandro
Knauß.
The sprint will be held in Leipzig (Germany) from 22. 3. to 26. 3.
and all privacy-minded contributors are invited to join.
At the previous Akademy, one of the informal discussions we had was
about Plasma mods.
One thing I always liked about mobile platforms like MeeGo (Nokia
N9) and Sailfish, which were/are based on Qt/QML, is that there are many
mods for them created by the community.
With QML, you basically have a lot of source files for an application
(or shell) UI that get compiled when the application is run. This means
that changing the look and behaviour of an application on your system is
often as easy as editing a file with your favourite text editor like
Kate or Vim.
Sometimes modding gets so popular that some brave community member
decides to create an application that allows automatic application of
these mods. This was one of my favourite things about Sailfish OS.
Mods for KDE Plasma
It was always strange to me that Plasma does not have a modders
community.
Maybe it is because Plasma tends to be quite configurable by default,
and mods are not as needed as on other platforms. Maybe it is that the
Plasma API is too overwhelming for people who just want to create
quick-and-dirty modifications.
Whatever the reason, I would like to see people start writing Plasma
mods.
For that reason, I decided to create a repository with a few example
mods – meant mostly to tackle my pet peeves in Plasma. Some of these are
authored by me, some I got from the KDE forums and other places.
Folder View with the list view
I have files with quite long names. The icon view, which is the
default for the Folder View applet, does not suit this setup well.
This mod hacks the applet to always think it is placed in a panel –
to force it to show a vertical list of files.
List mode
Default desktop layout
I’m not a fan of keeping files on the desktop.
The main reason is that I don’t want all the files in my home
directory to get shown to the audience when I connect my computer to a
projector at a conference, which is now the default behaviour of
Plasma.
One of the mods reverts the recently introduced change which makes
Folder View the default desktop layout.
Smaller volume OSD
Another thing that I’ve seen people get annoyed by is the big volume
on-screen display that is the default in the Breeze look and feel.
While it looks really pretty, it hides a large portion of the screen
when it is shown.
Fortunately, this is easy to fix with yet another mod.
Different volume OSD
Repository
The repository where I’ve collected these and a few other mods is
located on
GitHub.
The reason this is on GitHub (instead of putting it on the KDE
infrastructure) is to make it clear that this is not an official KDE
project, nor one endorsed by the Plasma team.
Nothing in this repository comes with a warranty. Changing your system
files can make your system unusable. Make sure you back up everything
before trying these out.
I was considering making the title of this post “STL algorithms
considered harmful” to test my click-bait creation skills, but I decided
it is better to aim for the audience interested in the topic of the post
instead of attracting readers who just want to argue about my outrageous
claims.
This way, I can assume that you care about algorithms, algorithm
complexity and writing the best software that you can.
Algorithms
One piece of advice commonly heard in the modern C++ community is to
use the algorithms provided by the standard library if you want to make
your program safer, faster, more expressive, etc. This is advice I
also try to popularize – in my book, talks, workshops… everywhere I have
the audience to do so.
While it is true that whenever we want to write a for
loop to solve a problem we have, we should first consider whether there
is a way to solve that problem with the standard (or boost) algorithms,
we should not do it blindly.
We still need to know how those algorithms are implemented, what their
requirements and guarantees are, and what their space and time
complexity is.
Usually, if we have a problem that perfectly matches the requirements
of an STL algorithm so that we can apply it directly, that algorithm
will be the most effective solution.
The problem can arise when we have to prepare the data in some way
prior to applying the algorithm.
Set intersection
Imagine we’re writing a tool for C++ developers that gives advisories
about replacing default captures ([=] and [&])
in lambdas with an explicit list of captured variables.
std::partition(begin(elements), end(elements),
    [=] (auto element) {
    ^~~ - being implicit is uncool, replace with [threshold]
        return element > threshold;
    });
While parsing the file, we would need to keep a collection containing
the variables from current and surrounding scopes, and when we encounter
a lambda with a default-capture we need to see which variables it
uses.
This gives us two sets – one set containing the variables from the
surrounding scopes, and one set containing the variables used in the
lambda body.
The capture list we should propose as a replacement should be the
intersection of those two sets (a lambda can use global variables, which
do not need to be captured, and not all variables from the surrounding
scopes are used in the lambda).
And if we need an intersection, we can use the
std::set_intersection algorithm.
This algorithm is quite beautiful in its simplicity. It takes two
sorted collections, and goes through them in parallel from start to
end:
If the current item in the first collection is the same as the
current item in the second collection, it gets added to the result and
the algorithm moves on to the next item in both collections;
If the current item in the first collection is less than the current
item in the second collection, the algorithm just skips the current item
in the first collection;
If the current item in the first collection is greater than the
current item in the second collection, the algorithm just skips the
current item in the second collection.
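A minimal sketch of this walk, simplified to sorted vectors of strings (the standard algorithm is generic over iterators, comparators and output iterators; the function name is just for illustration):

std::vector<std::string> intersection(const std::vector<std::string>& first,
                                      const std::vector<std::string>& second)
{
    std::vector<std::string> result;
    auto left = begin(first);
    auto right = begin(second);

    while (left != end(first) && right != end(second)) {
        if (*left == *right) {          // in both collections - keep it
            result.push_back(*left);
            ++left;
            ++right;
        } else if (*left < *right) {    // skip the smaller current item
            ++left;
        } else {
            ++right;
        }
    }

    return result;
}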
With each iteration at least one element (from the first or the
second input collection) is skipped – therefore, the algorithm
complexity will be linear – O(m + n), where m
is the number of elements in the first collection, and n
the number of elements in the second collection.
Simple and efficient. As long as the input collections are
sorted.
Sorting
The problem is what to do when collections are not sorted in
advance.
In the previous example, it would be prudent to store the variables
from the surrounding scopes in a stack-like structure so that the parser
can efficiently add new ones when it enters a new scope, and remove the
variables from the current scope when it leaves the current scope.
This means that the variables will not be sorted by name and that we
can not use std::set_intersection directly to operate on
them. Similarly, tracking the variables in the lambda body is very
likely not going to keep them sorted.
Since std::set_intersection works only on sorted
collections, it is a common pattern I’ve seen in many projects to first
sort the collections, and then call the
std::set_intersection algorithm.
Setting aside the fact that sorting the variables stack in our example
would defeat the purpose of the stack we defined, the intersection
algorithm for unordered collections would look like this:
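A sketch of that pattern (the collections are taken by value so that the caller's data stays untouched):

std::vector<std::string> unordered_intersection(std::vector<std::string> first,
                                                std::vector<std::string> second)
{
    std::sort(begin(first), end(first));
    std::sort(begin(second), end(second));

    std::vector<std::string> result;
    std::set_intersection(begin(first), end(first),
                          begin(second), end(second),
                          std::back_inserter(result));
    return result;
}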
Sorting takes quasilinear time, which makes the total complexity of
this approach O(n log n + m log m + n + m).
Sort less
Can we skip sorting?
If both collections are unordered, we would need to traverse the
second collection for each element in the first one – to check whether
it needs to be put into the resulting set. Although this approach is not
that uncommon in real-world projects, it is even worse than the previous
one – its complexity is O(n * m).
Instead of sorting everything, or sorting nothing, we can be Zen and
take the Middle Path – sort only one collection.
If only one collection is sorted, we will have to iterate through the
unsorted one value by value and check whether each value exists in the
sorted collection, for which we can use binary search.
The time complexity for this will be O(n log n) for
sorting the first collection, and O(m log n) for the iteration
and checking. In total, we have O((n + m) log n)
complexity.
If we decided to sort the second collection instead of the first, the
complexity would be O((n + m) log m).
In order to be as efficient as possible, we will always sort the
collection that has fewer elements, thus making the final complexity of
our algorithm O((m + n) log(min(m, n))).
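A sketch of this approach; the caller is expected to pass the smaller collection as the first argument (the function name is mine):

std::vector<std::string> intersection_sort_one(std::vector<std::string> smaller,
                                               const std::vector<std::string>& larger)
{
    std::sort(begin(smaller), end(smaller));

    std::vector<std::string> result;
    for (const auto& value: larger) {
        // binary search in the sorted (smaller) collection
        if (std::binary_search(begin(smaller), end(smaller), value)) {
            result.push_back(value);
        }
    }
    return result;
}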
In our example of lambda captures, the collection that we’d want to
sort is usually going to be the collection of variables used in the
lambda as it is most likely going to be smaller than a list of all
variables from all surrounding scopes.
Hashing
The last option is to build a std::unordered_set
(unordered, hash-based set implementation) from the smaller collection
instead of sorting it. This will make lookups O(1) on
average, but the construction of std::unordered_set will
take time. Construction can be between O(n) and
O(n*n) which can be a problem.
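A sketch of the hashing approach (again, the smaller collection goes first; duplicates are ignored for simplicity):

std::vector<std::string> intersection_hashed(const std::vector<std::string>& smaller,
                                             const std::vector<std::string>& larger)
{
    // O(n) on average, but construction can degrade towards O(n*n)
    std::unordered_set<std::string> lookup(begin(smaller), end(smaller));

    std::vector<std::string> result;
    for (const auto& value: larger) {
        if (lookup.count(value) > 0) {   // O(1) lookup on average
            result.push_back(value);
        }
    }
    return result;
}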
The hashing approach would be the absolute winner for the case where
we want to calculate intersections of several sets with a single
predefined small set. That is, we have a set A, sets
B₁, B₂… and we want to calculate
A ∩ B₁, A ∩ B₂…
In this case, we could ignore the complexity of the
std::unordered_set construction, and the complexity of
calculating each intersection would be linear – O(m), where
m is the number of elements in the second collection.
Benchmark
While it is always useful to check the algorithm complexity, it is
prudent also to benchmark different approaches in cases like these.
Especially when choosing between the last two approaches where we pit
binary search against hash-based sets.
In my basic benchmarks, the first option where both collections have
to be sorted was always the slowest.
Sorting the shorter collection was slightly better than using
std::unordered_set, but not much.
Both the second and the third approach were slightly faster than the
first in the case when both collections had the same number of elements,
and were significantly faster (6×) when one collection had 1000× fewer
elements than the other.
I have no idea which category this post belongs to – C++ or
KDE development – because it is about one component of Plasma Blade. But
the post is mostly about the approach to writing a distributed system I
went for while implementing it, and a C++ framework that will come out
of it.
Blade is my long-term R&D project to implement a multi-process
KRunner alternative.
Ranges
One of the most powerful parts of the C++ standard library is the
<algorithm> header. It provides quite a few useful
algorithms for processing collections. The main problem is that
algorithms take iterator pairs to denote the start and end of a
collection which should be processed instead of taking the collection
directly.
While this has some useful applications, most of the time it just
makes algorithms more difficult to use and difficult to compose.
There were multiple attempts at fixing this by introducing an
abstraction over sequence collections called ranges. There are
several third-party libraries that implement ranges, and one of them (Eric
Niebler’s range-v3 library) is on its way to becoming a part of the C++
standard library.
There are numerous articles written about ranges available online
(and I even dedicated a whole chapter to them in my book), so I’m not
going to cover them in more detail here. I’m just going to say that
ranges allow us to easily create sequences of transformations that
should be applied to a collection.
Imagine the following scenario – we have the output of the
ping command, and we want to convert the whole output to
uppercase, then extract the number of milliseconds each response took,
and then filter out all the responses which took longer than some
predefined time limit.
64 bytes from localhost (::1): icmp_seq=1 ttl=64 time=0.015 ms
64 bytes from localhost (::1): icmp_seq=2 ttl=64 time=0.041 ms
64 bytes from localhost (::1): icmp_seq=3 ttl=64 time=0.041 ms
In order to demonstrate the range transformation chaining, we are
going to make this a bit more complicated than it needs to be:
we will convert each line to uppercase;
find the last = character in each line;
chop off everything before the = sign (including the
sign);
keep only results that are less than 0.045.
Written in the pipe notation supported by the range-v3
library, it would look something like this:
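Something along these lines, where to_uppercase and after_last_equals are hypothetical helpers (the exact range-v3 spelling is not shown, following the namespace-free convention of this post):

auto results = ping_output
             | transform(to_uppercase)
             | transform(after_last_equals)   // "0.015 MS", "0.041 MS", ...
             | filter([] (const auto& time) { return std::stod(time) < 0.045; });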
In one of my previous
posts, I wrote that using the for_each algorithm
instead of the range-based for loop provides us with an abstraction high
enough to be used for processing asynchronous data streams instead of
being able to work only with regular data collections.
Some people argued that for_each was not meant to be a
customization point in C++, which might be true, but it works very well
as one. Now, it suffers from the same problems as other STL algorithms,
and if we want to reach new heights of abstraction, it truly is not the
best customization point out there. So, we might find something else to
customize instead.
If you look at the previous code snippet, you’ll see several
transformations performed on ping_output. But it does not
say what ping_output is. It can be a vector of strings, it
can be an input stream tokenized on newlines, it can be a
QFuture, etc. It can be anything that
transform and filter can exist for.
Reactive streams
We can create an abstraction similar to ranges, but instead of trying
to create an abstraction over collections, we will create abstractions
over series of events.
Imagine if ping_output was a range that reads the data
from an input stream like std::cin. Whenever we request a
result from the results range, it would block the execution
of our whole program until a whole line is read from the input
stream.
This is a huge problem. The ping command will send our
program one line each second, which means that our program will be
suspended for most of its lifetime waiting for those seconds to
pass.
Instead, it would be better if it could continue working on other
tasks until the ping command sends it new data to
process.
This is where event processing comes into play. Instead of requesting
a value from the results, and then blocking the execution
until that value appears, we want to react to new values
(events) that are received from the ping command, and
process them when they arrive – without ever blocking our program.
This is what reactive streams are meant to model – an asynchronous
stream of values (events). If we continue with the ping
example, the transformations it defines should also be able to work on
reactive streams. When a new value arrives, it goes through the first
transformation which converts it to uppercase, and sends it to the
second transformation. The second transformation processes the value and
sends it to the third. And so on.
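With a reactive stream as the source, the pipeline could look almost identical; system_cmd_stream is a purely illustrative name, not a real API, and the helpers are the same hypothetical ones as before:

// an asynchronous stream of lines coming from the ping process
auto results = system_cmd_stream("ping", "localhost")
             | transform(to_uppercase)
             | transform(after_last_equals)
             | filter([] (const auto& time) { return std::stod(time) < 0.045; });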
The only thing that changed is the source of the values. Instead of
using a range (a vector, or another sequence collection), this uses a
reactive stream which emits a line every time the ping
command outputs it.
This is the power of abstraction – using the code we have written for
one thing, for something completely different. In this case, using the
code that was written to process a collection synchronously, to process
asynchronous event streams.
Distributed stream processing
While it is nice to be able to write asynchronous software systems in
the same way we write synchronous systems, that is not enough for this
post.
One of the great things about defining programs as series of pure
transformations to be performed on the data is that those
transformations are independent from one another – they can be isolated
from each other and even moved into different processes (even processes
on separate computers in a network).
For this case, I’ve implemented a special type of transformation in the
Voy library, called a bridge, which is used to transparently
send the data from one process to the other.
Let’s split the data processing in the ping example into
three parts (similar to what Plasma Blade will need) – executing
the first and the last part in the main program, and the
middle part in the backend. It will look like this (added the
voy namespace for completeness):
This data pipeline is defined once for both the main program and the
backend. For the main program, the middle part of the pipeline will be
disabled (no code generated for it), while for the backend only the
middle part will be compiled.
A note on the implementation
At the last Akademy, I talked to Tomaz Canabrava about making KDE
software more appealing to students to join our development efforts.
Now, this project is probably going to be overly complex for an
average student to join in, but I’m trying to make it as readable and as
clean as possible for more adventurous students. It uses all the new and
cool features of C++17, along with void_t and the detection
idiom to simulate concepts, etc.
If you would like a chance to work on a real-world C++17 project,
just send me an e-mail, and I’ll try to get you up to speed. The only
prerequisite is for you to do some investigation and find out which
repository the code resides in. ;)