Implementing Qt data models is anything but fun. For that reason, I don’t blame anyone for writing a beginResetModel / endResetModel combo any time a more complex change has happened.

That is something that we all do, that is something that even Qt people do in their classes (see QSqlTableModel::select for example).

Resetting a model is an expensive process that can induce overly many transition animations in QML views. And it is often not really necessary.

Updates are rarely that complex to warrant invalidation of all data in the model. Usually, they only consist of a few additions or removals.

Levenshtein distance

I was planning to fix this in the KActivities data models for some time now, but haven’t found the time until now.

Recently I started teaching algorithms to my students, coincidentally some modifications of the usual Levenshtein distance calculation algorithm. (if you don’t recall the algorithm courses you had, L.D. calculates how many insertions, deletions and replacements you have to make in order to convert one string to another)

Now, the Levenshtein distance is not really a fitting algorithm for this task. Here, at least in KActivities use-case, most edits are not per-item. They are more suitable to be represented as sequences of block operations. Block operations also fit the Qt data model API perfectly.

After some research, I found a paper by Walter F. Tichy. A paper that is as old as I am, talking about the string-to-string correction problem with block moves. The algorithm described there was almost a perfect fit.

The algorithm

It is a really simple algorithm, that can be described in only a few lines:

  • Take the first character in the destination string
  • Find all occurences of said character in the source string, and take the one that defines the longest matching block
  • Now, repeat the algorithm for the first character in the destination string that was outside of the previously detected block.

For example:

Source:      abcabcdefg
             | /----/
             |/
Destination: abcdefbcg

In this case, for the character ‘a’, it will find two blocks in the source string: ‘abc’ and ‘abcdef’. It will choose the second one, and register a block operation that moves it to the beginning of the string. The next one will detect ‘bc’, and so on.

In the case of KActivities, the situation is a bit simpler. A model can not contain repeated items, so there is no need to find the biggest block - there is only one possible block that starts with a specific element.

The algorithm has O(n²) complexity in the worst case, but in the common case it tends to be linear - O(n).

Testing

In order to test the updates, the model testing application now has both QWidget-based UI and QML-based one. All transitions in QML are set to last for one second, which makes all the events blatantly obvious.

Why the post title?

So, why is the title of this post ‘there and back again’?

While working on the above algorithm, I wanted to decouple the model cache implementation from the model itself.

The idea was for the cache class to have methods that do not modify anything, but to return a sequence of edit operations that need to be performed.

Something like this:

QList<EditOperation> setItems(...) const;
QList<EditOperation> trimData(...) const;

and a single non-const method that actually performs a specific operation:

void applyOperation(EditOperation eo);

The model class that uses this cache would get a list of operations that should be made, and execute one by one while invoking the corresponding begin(Insert/Remove/Move)Rows / end____Rows methods.

The idea was clean and beautiful, but overly complex and induced some unnecessary performance penalties. The main problem was in the above algorithm, because it needs to actually be able to change the cache data while generating the list of block edits.

For decoupling to work properly, it would need to create a temporary copy of the cached data just to be able to modify it while calculating the edit operations. Then, it would destroy it and return a list of operations it did on the temporary data to be performed on the actual cache.

So, in the end, I decided that efficiency is more important, and allowed the cache to directly call begin_Rows/end_Rows methods.

Conclusion

Is the moral of the story that algorithms are more important to know then object-oriented programming and encapsulation?

Yes… No… Maybe… :)