Post-GSoC Project Document:
portage backend for PackageKit

by Mounir Lamouri (volkmar)

Goal

This document summarizes the main results of the 'portage backend for PackageKit' Google Summer of Code project.

Introduction

PackageKit [1] abstracts package managers behind a single API so that cross-distro applications can be written against that API and used easily. How does it work? Every distribution writes a backend for its package manager and implements functions, signals and features [2]. API clients (like a GUI) only know the PackageKit API and can interact transparently with the backends. For example, you can manage your packages with gnome-packagekit or KPackageKit; you can install the application required to open a specific file in Nautilus; you can install the codecs required to play something in Totem.

Gentoo and user-friendly things

It goes without saying that PackageKit has not been designed for Gentoo developers, nor for typical Gentoo users. It's much more for normal people (i.e. non-geeks) [3]. Do you think they can be Gentoo users? At the moment, probably not, because Gentoo is too focused on the command line. If you think Gentoo isn't, doesn't need to be or doesn't want to be user-friendly, this project is surely useless for you. However, even if I admit Gentoo can't be used by normal people at the moment, we can't say it will never happen. In my opinion, if we set aside manpower, time and money (they are linked), compilation is probably the real blocker and everything else can be improved.
Even some Gentoo users would appreciate a user-friendly desktop. Indeed, some portage frontends have already been written, but they were designed for the typical Gentoo user (i.e. power users) and they don't have the advantage of being built on a cross-distro API.

Summary of the PackageKit work

What has been done?

You can do a lot of things with PackageKit at the moment: search for files, packages and details, get the dependency tree and the reverse dependency tree, manage repositories, and install, update or remove packages.
In fact, every part of the API that we were able to implement has an implementation.

Why haven't some parts of the API been implemented?

- DownloadPackages: it's for service packs [4] and we don't use service packs in the portage backend.
- GetDistroUpgrades: Gentoo doesn't have distro upgrades, even if we could use this function to change profiles.
- InstallFiles: installs a package from files (we could technically support it but it's not safe to allow that).
- InstallSignatures: Gentoo packages aren't signed.
(The RepoEnable entry on the website is a typo; we have it working.)
- RepoSetData: changes some settings in repositories; we don't need that.
- Rollback: no backend is using that; it's for reverting an entire action like Install(foo, bar).
- WhatProvides: (the website entry is also a typo; it's not working) I will take some time to talk about it later in this doc.

Is it working correctly?

Speaking of bugs, it's hard to say because it hasn't been tested a lot, so I will say that, as far as I've tested it, the backend is working pretty well. As for usefulness, I'm pretty pessimistic, in part because it's slow as hell on my very old laptop. I haven't tested it on my desktop since I haven't sat in front of it for two months ;)

What needs to be improved or changed in the backend?

We can always improve things, so I can't say there is nothing to do, but there is certainly no big work waiting here. Maybe fixing things, optimizing code, adding some features. For example, get-updates is not perfect (it misses updates caused by new USE flags) and get-update-details is nearly empty. Some features can be explored too, like EULA acceptance directly via PackageKit.
However, the main features need a change in PackageKit, portage or ebuilds.

What do we need from PackageKit?

Some portage features can't be added to the PackageKit backend without at least some discussion with the PackageKit team. Take configuration file updates: at the moment, I've added a message telling the user there is an update, but it's very likely that configuration file updates will be added to PackageKit, even if the exact specifications still have to be written.
News and preserved-libs features should be thought about too. I'm not saying we are going to change PackageKit, but we need to think about how to integrate these features in the backend as well as possible. For example, news can be implemented via a simple message with a new dedicated type.
It would also be cool to have better status management in PackageKit, like a distinction between 'configure', 'compile' and 'install'. The work on the PackageKit side will be easy; the real work will be getting this information from portage.

What should be improved in portage?

What has been done?

During the GSoC, I've worked with portage, mostly as an API client, but I also made a few commits: small features, bug fixes and some more important things like a default ACCEPT_LICENSE voted by the council (actually, Zac wrote the code), and I've added ACCEPT_PROPERTIES handling.
ACCEPT_LICENSE="* -@EULA" isn't needed directly by the backend, but it was needed to finish the development of ACCEPT_LICENSE so that packages with EULA licenses no longer have to be interactive. That leads us to the second variable, ACCEPT_PROPERTIES. The backend needs it to set ACCEPT_PROPERTIES="-interactive" and thereby filter out every interactive package. It goes without saying that interactive packages aren't appreciated in PackageKit.
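To make the filtering idea concrete, here is a minimal sketch in Python of how such an accept list can be applied to a package's PROPERTIES. It is not portage's implementation: the function name is_accepted and the sample packages are hypothetical, the setting is assumed to be written as the full token list "* -interactive", and license groups like @EULA are not handled.

```python
def is_accepted(properties, accept_tokens):
    """Return True if every property of a package is accepted.

    `properties` is a list such as ["interactive"]; `accept_tokens` is a
    whitespace-separated string such as "* -interactive". Tokens are applied
    left to right, later tokens overriding earlier ones.
    """
    accept_all = False
    accepted = set()
    rejected = set()
    for token in accept_tokens.split():
        if token == "*":
            # Accept everything seen so far and afterwards.
            accept_all = True
            rejected.clear()
        elif token == "-*":
            # Reset: nothing is accepted unless listed later.
            accept_all = False
            accepted.clear()
        elif token.startswith("-"):
            rejected.add(token[1:])
            accepted.discard(token[1:])
        else:
            accepted.add(token)
            rejected.discard(token)
    for prop in properties:
        if prop in rejected:
            return False
        if not accept_all and prop not in accepted:
            return False
    return True


# Hypothetical sample data: package name -> PROPERTIES values.
packages = {
    "app-misc/foo": [],
    "games-fps/bar": ["interactive"],
}
installable = sorted(
    name for name, props in packages.items()
    if is_accepted(props, "* -interactive")
)
```

With this sample data, only app-misc/foo remains in installable, which is the behaviour the backend wants for interactive packages.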

Less output, more return values and exceptions

First of all, as everyone knows, we call the main (at least historical) Gentoo package manager 'portage' but the tool is 'emerge'. Actually, portage is an API used by emerge and other clients. A lot of things, like all the code related to installation, have been written directly in emerge rather than in portage. That makes the backend use emerge code and, in my opinion, it shouldn't. PackageKit is probably (AFAIK, surely) the only portage API client trying to install packages. That's maybe not a key issue but it's linked to what follows.
Indeed, this link between portage and emerge explains why a lot of things, nearly everything and clearly too many things, are printed by portage/emerge. For example, if there is an error while creating the depgraph (when installing a package), the error is printed on stdout. Working out what the error is takes a lot of code, and that code shouldn't be duplicated in the backend. So the portage code has to be changed for the backend to be able to show an error message other than "Something went wrong, sorry bro".
In addition, the PackageKit daemon (packagekitd) and the backends communicate via stdout and stderr, so if the backend prints things, packagekitd will try to interpret them.
Anyway, except for this example, we can often filter the output with a hack and live with it.
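The kind of output-filtering hack mentioned here can be sketched as follows. This only illustrates the idea and is not the backend's code: noisy_depgraph is a hypothetical stand-in for emerge code that prints its errors, its message is made up, and DepgraphError is an invented exception name. Note that contextlib.redirect_stdout only catches writes going through Python's sys.stdout, not output from child processes.

```python
import contextlib
import io


class DepgraphError(Exception):
    """Hypothetical exception carrying the captured output."""

    def __init__(self, output):
        super().__init__("dependency calculation failed")
        self.output = output


def noisy_depgraph(package):
    """Stand-in for emerge code that prints its errors and returns a flag."""
    if package == "broken/pkg":
        # Made-up message, only here to have something to capture.
        print("error: no ebuild found for broken/pkg")
        return False
    return True


def quiet_depgraph(package):
    """Run the noisy call while keeping its output off the real stdout."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        success = noisy_depgraph(package)
    if not success:
        # Turn the printed text into an exception the caller can inspect.
        raise DepgraphError(buffer.getvalue().strip())
    return True
```

The caller gets an exception instead of text on stdout, so nothing unexpected reaches packagekitd. The limitation described above remains: the captured text is still free-form, so the backend cannot tell which error it is without parsing it.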

Improving speed with a cache/db

The real issue is speed. The backend is slow (I'm only speaking of search functions here). No doubt it's partly due to how I've written some of the code, but it's also due to what I can use from portage. It's like using 'emerge -s': everyone knows it's slow. To fix that, we will have to build a db/cache at the end of 'emerge --sync', like eix and esearch do. It will be faster and I/O will surely no longer be the bottleneck.
Unlike the previous issue, that's a big piece of work.
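As a rough illustration of the cache idea, the sketch below builds a small SQLite index once and answers searches from it. It says nothing about how eix or esearch store their data: the table layout, the function names and the sample descriptions are assumptions, and a real index would be written to disk by a post-sync step rather than kept in memory.

```python
import sqlite3


def build_index(packages):
    """Create an in-memory SQLite index from (name, description) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE packages (name TEXT PRIMARY KEY, description TEXT)"
    )
    conn.executemany("INSERT INTO packages VALUES (?, ?)", packages)
    conn.commit()
    return conn


def search(conn, term):
    """Return names of packages whose name or description contains `term`.

    LIKE treats '%' and '_' in `term` as wildcards; a real implementation
    would escape them.
    """
    pattern = "%" + term + "%"
    rows = conn.execute(
        "SELECT name FROM packages "
        "WHERE name LIKE ? OR description LIKE ? ORDER BY name",
        (pattern, pattern),
    )
    return [name for (name,) in rows]


# Illustrative sample data standing in for what a post-sync step would collect.
index = build_index([
    ("app-portage/eix", "Search tool for ebuilds"),
    ("app-portage/esearch", "Indexed replacement for emerge -s"),
    ("media-plugins/gst-plugins-speex", "Plugin for the speex codec"),
])
```

A call like search(index, "speex") then only touches the index instead of reading ebuilds, which is where the speed-up would come from.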

Improve LICENSE version handling

That's clearly not linked to the backend, but during my work I realized some packages were not considered free by the backend (there is a 'free' filter) even though they look free. These are, in fact, LGPL-2 licensed packages. They are probably LGPL-2+ licensed and, as LGPL-2.1 is considered free by the FSF and the OSI but LGPL-2 is not, they should be free because of the +. It's also an issue if someone sets "ACCEPT_LICENSE=GPL-3": it will refuse a lot of GPL-2+ licensed packages. I don't think it's a _big_ piece of work and it will surely be appreciated.
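Here is a small sketch of what "or later" handling could look like, assuming ebuilds were allowed to write a trailing + in LICENSE (for example LGPL-2+). The function names are hypothetical and this is not existing portage behaviour; it only shows the comparison that would make GPL-2+ acceptable under ACCEPT_LICENSE=GPL-3.

```python
def parse_license(token):
    """Split 'LGPL-2+' into ('LGPL', (2,), True).

    The version is None when the token has no numeric suffix (e.g. 'BSD').
    """
    or_later = token.endswith("+")
    if or_later:
        token = token[:-1]
    name, sep, version = token.rpartition("-")
    parts = version.split(".")
    if sep and all(part.isdigit() for part in parts):
        return name, tuple(int(part) for part in parts), or_later
    return token, None, or_later


def is_license_accepted(package_license, accepted_license):
    """True if `package_license` is covered by `accepted_license`."""
    name, version, or_later = parse_license(package_license)
    accepted_name, accepted_version, _ = parse_license(accepted_license)
    if name != accepted_name:
        return False
    if version == accepted_version:
        return True
    if or_later and version is not None and accepted_version is not None:
        # 'GPL-2+' covers any accepted version that is at least 2.
        return accepted_version >= version
    return False
```

With this, is_license_accepted("GPL-2+", "GPL-3") holds while is_license_accepted("GPL-2", "GPL-3") does not, which is the distinction described above.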
By the way, I think ACCEPT_LICENSE will not work properly with complex licenses like LICENSE="foo? ( || ( lic1 lic2 ) ) lic3".
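To show what handling such a string involves, here is a minimal recursive evaluator for this syntax. It is a sketch under assumptions, not portage code: the function names are hypothetical, the input is assumed to be well formed (every "||" and "flag?" is followed by a parenthesized group), and accepted is a plain set of license names.

```python
def _eval_group(tokens, pos, use_flags, accepted, any_of):
    """Evaluate items until the matching ')' or the end of input.

    Returns (result, position just after the group).
    """
    results = []
    while pos < len(tokens) and tokens[pos] != ")":
        token = tokens[pos]
        if token == "||":
            # tokens[pos + 1] is "(": one of the inner items must hold.
            value, pos = _eval_group(tokens, pos + 2, use_flags, accepted, True)
        elif token.endswith("?"):
            # USE-conditional group, e.g. "foo? ( ... )" or "!foo? ( ... )".
            flag = token[:-1]
            value, pos = _eval_group(tokens, pos + 2, use_flags, accepted, False)
            if flag.startswith("!"):
                enabled = flag[1:] not in use_flags
            else:
                enabled = flag in use_flags
            if not enabled:
                value = True  # a disabled conditional imposes nothing
        elif token == "(":
            value, pos = _eval_group(tokens, pos + 1, use_flags, accepted, False)
        else:
            value = token in accepted
            pos += 1
        results.append(value)
    result = any(results) if any_of else all(results)
    return result, pos + 1  # skip the closing ")"


def license_accepted(license_string, use_flags, accepted):
    """True if `license_string` is satisfied by the `accepted` licenses."""
    tokens = license_string.split()
    result, _ = _eval_group(tokens, 0, use_flags, accepted, False)
    return result
```

For the example string, a user with USE flag foo enabled needs lic3 plus either lic1 or lic2, while a user without foo only needs lic3.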

What needs to be changed in ebuilds?

The big issue: we are source-based

First of all, source-based is cool ;)
But source-based is not predictable, and a lot of cool stuff in PackageKit relies on predictability. For example, there is a 'plugin' that checks for unknown commands on the command line and suggests the available packages providing that command. You can also search for files that are not installed, and similar things. With Gentoo, just forget it, we can't do that.

Using metadata

But we can add some metadata to make everything work.
For example, we can add PROPERTIES="application" and PROPERTIES="gui". The first one would be for packages considered applications, meaning used by the end user. Technically, that means having a desktop file. The second one would be for GUI-related packages. I don't really see the need for 'gui' but I'm not the one proposing them so... Actually, PROPERTIES is probably not the best solution because it can be filtered, and we don't want someone to set ACCEPT_PROPERTIES="-* application"; we just want to filter a search.
That's why I think we should use metadata for that.
It would also be very useful to add provided "things" to the metadata. For example, media-plugins/gst-plugins-speex probably provides a codec named speex, so we could imagine something like speex. It could be used for fonts and for other things like drivers or modaliases.
This would let us implement the WhatProvides function of the PackageKit API.
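Once such "provides" metadata exists, a WhatProvides lookup is essentially a reverse index. The sketch below assumes the metadata has already been read into (kind, name) pairs per package; that format, the function names and the second sample package are hypothetical, since no such metadata exists yet.

```python
from collections import defaultdict


def build_provides_index(metadata):
    """Map each (kind, name) pair to the set of packages providing it."""
    index = defaultdict(set)
    for package, provides in metadata.items():
        for kind, name in provides:
            index[(kind, name)].add(package)
    return index


def what_provides(index, kind, name):
    """Return the sorted list of packages providing `name` of type `kind`."""
    return sorted(index.get((kind, name), ()))


# Hypothetical metadata, as it might be declared by each package.
metadata = {
    "media-plugins/gst-plugins-speex": [("codec", "speex")],
    "media-fonts/example-font": [("font", "Example Sans")],
}
provides_index = build_provides_index(metadata)
```

A request such as what_provides(provides_index, "codec", "speex") then returns media-plugins/gst-plugins-speex without having to build or inspect the package.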

eerror, ewarn, elog don't tell us anything about the content

These functions are used to show messages to the user. They were designed to be used in an English-only CLI. PackageKit wants to be multilingual because you can't be user-friendly if you don't speak the user's language. That said, error messages are rarely translated, so an error message or an informative message has an untranslated body and a type. This type has to be really specific, like 'ERROR_MISSING_DISK_SPACE', so that the user can read the associated translated string and understand the error. The message body will probably detail the exact error, like 'can't copy foo file because of missing space'.
With Gentoo ebuilds, we will get 'eerror my_message' but nothing helps us know what the error is. Errors thrown by portage are mostly system errors, so we can parse them, but ebuild errors are not parsable. We should emit errors, warnings and messages with a specified type. That will probably be good for the end user. We should also minimize these messages because they are sometimes useless for most users.
For example, we can add 'esecurity', 'econflict' or 'ewarn security message'. In my opinion, it's the biggest issue because it's going to be hard to make this happen. First of all, the work itself is pretty difficult (rationalizing error messages), but it's also going to be hard to convince Gentoo developers.
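To illustrate what a typed message buys us, here is a small sketch following the 'ewarn security message' form: the first word of the message is read as its type, and the type selects a translated title while the body stays as written. The type names, the TITLES table and the function names are all hypothetical; nothing like this exists in portage or in ebuilds today.

```python
from enum import Enum


class MessageType(Enum):
    SECURITY = "security"
    CONFLICT = "conflict"
    UNKNOWN = "unknown"


# Hypothetical translated titles keyed by (type, language).
TITLES = {
    (MessageType.SECURITY, "en"): "Security notice",
    (MessageType.CONFLICT, "en"): "Package conflict",
    (MessageType.CONFLICT, "fr"): "Conflit de paquets",
}


def classify(message):
    """Split 'security some text' into (MessageType.SECURITY, 'some text').

    Messages without a known leading type keep their full text and are
    tagged UNKNOWN.
    """
    head, _, body = message.partition(" ")
    try:
        return MessageType(head), body
    except ValueError:
        return MessageType.UNKNOWN, message


def render(message, language):
    """Prefix the untranslated body with a title in the user's language."""
    kind, body = classify(message)
    title = TITLES.get((kind, language), "Message")
    return "%s: %s" % (title, body)
```

An untyped message still goes through, but only with a generic title, which is the situation the backend is in today with eerror, ewarn and elog.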

Binary packages

I've never used them and they are not officially supported by Gentoo, so I'm not sure we should accept them in the backend. But if someone has good reasons and wants that to be done, I'm not against discussing it.

Conclusion

This Google Summer of Code is finished. We have a (I hope) fine backend, ready to be used, even if it can be improved in numerous ways. We now know the paths we have to follow to get a better product, and the very next work will focus on that.
I'm probably missing some things in this doc, but I will send specifications / GLEPs / patches for the issues that need fixing, so I will explain them more precisely then.

Notes

[1] http://www.packagekit.org
[2] http://www.packagekit.org/pk-matrix.html
[3] http://www.packagekit.org/pk-profiles.html
[4] Service packs are more or less tarballs containing a few packages ready to be installed.