Implementing Qt data models is anything but fun. For that reason, I
don’t blame anyone for writing a beginResetModel / endResetModel combo
whenever a more complex change happens.
It is something we all do, something even Qt people do in their own
classes (see QSqlTableModel::select, for example).
Resetting a model is an expensive operation that can trigger an
excessive number of transition animations in QML views. And it is often
not really necessary.
Updates are rarely complex enough to warrant invalidating all the data
in the model. Usually, they consist of just a few additions or
removals.
Levenshtein distance
I had been planning to fix this in the KActivities data models for some
time, but only now have I found the time to do it.
Recently I started teaching algorithms to my students, which
coincidentally covered some modifications of the usual Levenshtein
distance algorithm. (If you do not recall your algorithms courses:
the Levenshtein distance counts how many insertions, deletions and
replacements are needed to convert one string into
another.)
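As a refresher, here is a minimal sketch of the textbook dynamic-programming formulation of the Levenshtein distance (this is the classic algorithm, not the KActivities code):

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Classic dynamic-programming Levenshtein distance: the minimum number
// of single-character insertions, deletions and replacements needed to
// turn `from` into `to`. Uses two rolling rows instead of a full table.
int levenshtein(const std::string &from, const std::string &to)
{
    const std::size_t n = from.size(), m = to.size();

    // prev[j] = distance between the first i-1 characters of `from`
    // and the first j characters of `to`.
    std::vector<int> prev(m + 1), curr(m + 1);
    for (std::size_t j = 0; j <= m; ++j) prev[j] = static_cast<int>(j);

    for (std::size_t i = 1; i <= n; ++i) {
        curr[0] = static_cast<int>(i);
        for (std::size_t j = 1; j <= m; ++j) {
            const int replace =
                prev[j - 1] + (from[i - 1] == to[j - 1] ? 0 : 1);
            curr[j] = std::min({ prev[j] + 1,     // deletion
                                 curr[j - 1] + 1, // insertion
                                 replace });      // replacement or match
        }
        std::swap(prev, curr);
    }
    return prev[m];
}
```

For instance, "kitten" to "sitting" takes three edits: two replacements and one insertion.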
Now, the Levenshtein distance is not really a fitting algorithm for
this task. At least in the KActivities use-case, most edits are not
per-item; they are better represented as sequences of block
operations. Block operations also fit the Qt data model API
perfectly.
After some research, I found a paper by Walter F. Tichy, a paper as
old as I am, on the string-to-string correction problem with block
moves. The algorithm described there was almost a perfect
fit.
The algorithm
It is a really simple algorithm that can be described in only a few
lines:
- Take the first character of the destination string
- Find all occurrences of that character in the source string, and take
the one that starts the longest matching block
- Repeat the algorithm for the first character of the destination
string that lies outside of the previously detected block.
For example:
Source: abcabcdefg
| /----/
|/
Destination: abcdefbcg
In this case, for the character ‘a’, it will find two blocks in the
source string: ‘abc’ and ‘abcdef’. It will choose the second one, and
register a block operation that moves it to the beginning of the string.
The next one will detect ‘bc’, and so on.
In the case of KActivities, the situation is a bit simpler. A model
cannot contain repeated items, so there is no need to search for the
longest block - there is only one possible block that starts with a
given element.
The algorithm has O(n²) complexity in the worst case, but in the
common case it tends to be linear, O(n).
Testing
In order to test the updates, the model testing application now has
both a QWidget-based UI and a QML-based one. All transitions in QML are
set to last one second, which makes every event blatantly
obvious.
Why the post title?
So, why is the title of this post ‘there and back again’?
While working on the above algorithm, I wanted to decouple the model
cache implementation from the model itself.
The idea was for the cache class to have methods that modify nothing
themselves, but instead return a sequence of edit operations to be
performed.
Something like this:
QList<EditOperation> setItems(...) const;
QList<EditOperation> trimData(...) const;
and a single non-const method that actually performs a specific
operation:
void applyOperation(EditOperation eo);
The model class that uses this cache would get a list of operations
to be made, and execute them one by one while invoking the
corresponding begin(Insert/Remove/Move)Rows / end____Rows methods.
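The consuming loop could look roughly like this. The `EditOperation` type and the function names are hypothetical, just to illustrate the shape of the design; in a real QAbstractItemModel subclass the commented lines would be the actual begin/end notification calls:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical edit-operation type; the names do not match the actual
// KActivities classes.
struct EditOperation {
    enum Type { Insert, Remove } type;
    std::size_t row;   // row the operation applies to
    std::string item;  // payload, used by Insert
};

// Replay the operations produced by a const cache method against the
// model's data, bracketing each mutation with the matching Qt
// notification pair (shown here as comments).
void applyOperations(std::vector<std::string> &items,
                     const std::vector<EditOperation> &operations)
{
    for (const EditOperation &op : operations) {
        switch (op.type) {
        case EditOperation::Insert:
            // beginInsertRows(QModelIndex(), op.row, op.row);
            items.insert(items.begin() + op.row, op.item);
            // endInsertRows();
            break;
        case EditOperation::Remove:
            // beginRemoveRows(QModelIndex(), op.row, op.row);
            items.erase(items.begin() + op.row);
            // endRemoveRows();
            break;
        }
    }
}
```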
The idea was clean and beautiful, but overly complex, and it incurred
some unnecessary performance penalties. The main problem is the
algorithm above: it needs to be able to change the cache data while
generating the list of block edits.
For the decoupling to work properly, the cache would need to create a
temporary copy of its data just to be able to modify it while
calculating the edit operations. It would then destroy the copy and
return the list of operations it performed on it, to be replayed on
the actual cache.
So, in the end, I decided that efficiency is more important, and
allowed the cache to directly call
begin_Rows/end_Rows methods.
Conclusion
Is the moral of the story that knowing algorithms is more important
than object-oriented programming and encapsulation?
Yes… No… Maybe… :)