Post-GSoC Project Document:
portage backend for PackageKit
by Mounir Lamouri (volkmar)
Goal
This document outlines the main results of the 'portage backend for
PackageKit' Google Summer of Code project.
Introduction
PackageKit [1] abstracts package managers behind a single API, so that
cross-distro applications using this API can be written and used easily. How
does it work? Each distribution writes a backend for its package manager that
implements the functions, signals and features of the API [2]. API clients
(like a GUI) only know the PackageKit API and interact transparently with the
backends.
For example, you can manage your packages with gnome-packagekit or KPackageKit;
you can install the application required to open a specific file in Nautilus;
you can install the codecs required to play something in Totem.
Gentoo and user-friendly things
It goes without saying that PackageKit has been designed neither for Gentoo
developers nor for typical Gentoo users. It is aimed much more at ordinary
people (i.e. non-geeks) [3]. Can they be Gentoo users? At the moment, probably
not, because Gentoo is too focused on the command line. If you think Gentoo
doesn't have, doesn't need or doesn't want to be user-friendly, this project is
surely useless for you. However, even if I admit Gentoo can't be used by
ordinary people at the moment, we can't say it will never happen. In my
opinion, if we set aside manpower, time and money (they are linked),
compilation is probably the real blocker and everything else can be improved.
Even some Gentoo users would appreciate a user-friendly desktop. Some portage
frontends have already been written, but they were designed for the typical
Gentoo user (i.e. power users) and they lack the advantage of a cross-distro
API.
Summary of the PackageKit work
What has been done?
You can already do a lot with PackageKit: search for files and packages, get
package details, get the dependency tree and the reverse dependency tree,
manage repositories, and install, update or remove packages.
In fact, every part of the API that we were able to support has an
implementation.
Why haven't some parts of the API been implemented?
- DownloadPackages: it is for service packs [4] and we don't use service packs
in the portage backend.
- GetDistroUpgrades: Gentoo doesn't have distro upgrades, even though we could
change profiles with this function.
- InstallFiles: installs a package from files (we could technically support it,
but it is not safe to allow it).
- InstallSignatures: Gentoo packages aren't signed.
(RepoEnable is a typo on the website: we have it working.)
- RepoSetData: changes some repository settings; we don't need that.
- Rollback: no backend uses it; it reverts an entire action like
Install(foo, bar).
- WhatProvides: (also a typo on the website: it is not working) I will discuss
it later in this document.
Does it work correctly?
Regarding bugs, it is hard to say because it hasn't been tested much, so I will
say that, as far as I've tested it, the backend works pretty well.
Regarding usefulness, I'm rather pessimistic, partly because it's slow as hell
on my (very old) laptop. I haven't tested it on my desktop because I haven't
sat in front of it for two months ;)
What needs to be improved / changed in the backend?
Things can always be improved, so I can't say there is nothing left to do, but
there is certainly no big work waiting here: mostly fixing bugs, optimizing
code and adding some features. For example, get-updates is not perfect (it
misses the rebuilds required by USE flag changes) and get-update-details is
nearly empty. Some features could also be explored, like accepting EULAs
directly via PackageKit.
However, the main missing features need changes in PackageKit, portage or the
ebuilds.
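To give an idea of what get-updates is missing, here is a minimal Python sketch of how a rebuild caused by USE flag changes could be detected, roughly in the spirit of 'emerge --newuse'. It assumes the installed package's IUSE and USE, and the IUSE of the available ebuild, are already known as plain flag names (no '+'/'-' defaults); needs_rebuild is a hypothetical helper, not part of the backend or of portage.

```python
def needs_rebuild(installed_iuse, installed_use, new_iuse, current_use):
    """Return True if USE changes mean the installed package should be rebuilt.

    All arguments are iterables of plain flag names (hypothetical inputs).
    """
    # Only flags listed in IUSE have any effect on a package, so compare the
    # flags enabled at build time with those that would be enabled now.
    old_enabled = set(installed_iuse) & set(installed_use)
    new_enabled = set(new_iuse) & set(current_use)
    if old_enabled != new_enabled:
        return True  # a relevant flag was turned on or off
    # A flag added to or removed from IUSE also changes the possible build.
    return set(installed_iuse) != set(new_iuse)


# Enabling a flag the package does not know about ("alsa") changes nothing.
print(needs_rebuild(["gtk", "qt4"], ["gtk", "alsa"], ["gtk", "qt4"], ["gtk"]))
```

Restricting USE to IUSE first avoids reporting spurious updates whenever the user toggles a flag that the package does not use.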
What do we need from PackageKit?
Some portage features can't be added to the PackageKit backend without at
least some discussion with the PackageKit team. Configuration file updates are
an example: at the moment, I've added a message saying that there is a
configuration file update, but support for configuration file updates will very
likely be added to PackageKit, even though the exact specification still has to
be written.
The news and preserved-libs features should be thought through too. I'm not
saying we are going to change PackageKit, but we need to think about how to
integrate these features into the backend as well as possible. For example,
news could be implemented as a simple message with a new dedicated type.
It would also be nice to have better status management in PackageKit, such as
distinguishing between 'configure', 'compile' and 'install'. The work on the
PackageKit side would be easy; the real work would be getting this information
from portage.
What should be improved in portage?
What has been done?
During the GSoC, I worked with portage, mostly as a client of its API, but I
also made a few commits: small features, bug fixes, and some more important
things like a default ACCEPT_LICENSE voted by the council (actually, Zac wrote
the code) and support for ACCEPT_PROPERTIES, which I added.
The default ACCEPT_LICENSE="* -@EULA" isn't needed directly by the backend, but
finishing ACCEPT_LICENSE was necessary so that packages with EULA licenses no
longer need to be interactive. That leads us to the second variable,
ACCEPT_PROPERTIES: the backend needs it to set ACCEPT_PROPERTIES="-interactive"
and thus filter out every interactive package. It goes without saying that
interactive packages are not welcome in PackageKit.
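As an illustration of that filtering, here is a simplified Python model of how an incremental ACCEPT_PROPERTIES value such as "* -interactive" could be matched against a package's PROPERTIES. It is a sketch under stated assumptions, not portage's code: it assumes the default value is "*", ignores USE-conditional PROPERTIES, and the helper names and package names are hypothetical.

```python
def property_accepted(prop, accept_tokens):
    """Apply incremental ACCEPT_PROPERTIES tokens left to right; the last match wins."""
    accepted = False
    for token in accept_tokens:
        negated = token.startswith("-")
        name = token[1:] if negated else token
        if name in ("*", prop):
            accepted = not negated
    return accepted


def package_visible(properties, accept_properties):
    """A package is visible only if every one of its PROPERTIES is accepted."""
    tokens = accept_properties.split()
    return all(property_accepted(p, tokens) for p in properties.split())


# Assumed default "*" followed by the backend's "-interactive".
ACCEPT = "* -interactive"
candidates = {"app-misc/foo": "", "app-misc/bar": "interactive", "app-misc/baz": "live"}
visible = sorted(cp for cp, props in candidates.items() if package_visible(props, ACCEPT))
print(visible)  # ['app-misc/baz', 'app-misc/foo']
```

A package without PROPERTIES is always visible, and only the package marked "interactive" is dropped from the results the backend would report.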
Less outputs, more return values and exceptions
First of all, as everyone knows, we call the main (at least the historic)
Gentoo package manager 'portage', but the tool is 'emerge'. Actually, portage
is an API used by emerge and other clients. A lot of things, like all the code
related to installation, have been written directly in emerge rather than in
portage. As a result, the backend has to use emerge code, and in my opinion it
shouldn't. PackageKit is probably (as far as I know, certainly) the only
portage API client trying to install packages. That may not be a key issue, but
it is linked to what follows.
Indeed, this link between portage and emerge explains why a lot of things
(nearly everything, and clearly too much) are printed by portage/emerge.
For example, if an error occurs while creating the depgraph (when installing a
package), the error is printed on stdout. Working out what the error is takes
a lot of code, and that work shouldn't be duplicated in the backend. So the
portage code has to be changed (to return values or raise exceptions) before
the backend can show an error message other than "Something went wrong, sorry
bro".
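The kind of API asked for here could look like the following Python sketch: a toy resolver that raises typed exceptions instead of printing, and a backend wrapper that turns them into PackageKit error codes. Everything in it (the exception classes, resolve and backend_install) is hypothetical; only the error code strings are meant to mirror PackageKit's error enum, and the toy resolver assumes an acyclic dependency graph.

```python
class DepgraphError(Exception):
    """Base class for hypothetical dependency-resolution errors."""


class PackageNotFoundError(DepgraphError):
    def __init__(self, atom):
        super().__init__(f"no ebuild matches {atom}")
        self.atom = atom


class UnsatisfiedDependencyError(DepgraphError):
    def __init__(self, atom, missing):
        super().__init__(f"{atom} needs {missing}, which is not available")
        self.atom = atom
        self.missing = missing


def resolve(atom, available, deps, order=None):
    """Toy resolver: return an install order, raise instead of printing.

    Assumes the dependency graph in `deps` has no cycles.
    """
    if order is None:
        order = []
    if atom not in available:
        raise PackageNotFoundError(atom)
    for dep in deps.get(atom, ()):
        if dep not in available:
            raise UnsatisfiedDependencyError(atom, dep)
        if dep not in order:
            resolve(dep, available, deps, order)
    if atom not in order:
        order.append(atom)
    return order


# The backend only has to map exception types to error codes.
PK_ERROR_CODES = {
    PackageNotFoundError: "package-not-found",
    UnsatisfiedDependencyError: "dep-resolution-failed",
}


def backend_install(atom, available, deps):
    try:
        return ("ok", resolve(atom, available, deps))
    except DepgraphError as err:
        return ("error", PK_ERROR_CODES.get(type(err), "internal-error"), str(err))
```

The exception carries both a machine-readable type (used to pick the error code) and a human-readable message (sent as the details), which is exactly what parsing stdout cannot provide reliably.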
In addition, the PackageKit daemon (packagekitd) and the backends communicate
via stdout and stderr, so if the backend prints anything, packagekitd will try
to interpret it.
Anyway, apart from this example, we can often filter the output with a hack
and live with it.
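One such hack, sketched in Python: run the noisy call with sys.stdout and sys.stderr redirected into a buffer, so nothing reaches packagekitd and the text can still be logged or parsed. noisy_portage_call is a hypothetical stand-in for emerge code. Note that contextlib.redirect_stdout only replaces the Python-level sys.stdout; anything written directly to file descriptor 1 (by a child process such as an ebuild phase, or by C code) is not caught.

```python
import contextlib
import io


def noisy_portage_call():
    # Hypothetical stand-in for emerge code that prints instead of returning.
    print(">>> Calculating dependencies... done!")
    return 42


def quiet_call(func, *args, **kwargs):
    """Call func with sys.stdout/sys.stderr redirected; return (result, captured text)."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf), contextlib.redirect_stderr(buf):
        result = func(*args, **kwargs)
    return result, buf.getvalue()


result, log = quiet_call(noisy_portage_call)
```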
Improving speed with a cache/db
The real issue is speed. The backend is slow (I'm only speaking of the search
functions here). No doubt this is partly due to how I've written some of the
code, but it's also due to what I can use from portage. It's like using
'emerge -s': everyone knows it's slow. To fix this, we will have to build a
database/cache at the end of 'emerge --sync', as eix and esearch do. Searching
will be faster and I/O will surely no longer be the bottleneck.
Unlike the previous issue, that's a big piece of work.
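A minimal sketch of the cache idea in Python, using the standard sqlite3 module: the index would be (re)built once after 'emerge --sync', and a search then becomes a single query instead of a walk over the tree. The table layout, the function names and the sample rows are assumptions for illustration; eix and esearch use their own formats, and in practice the rows would be filled from the portage tree.

```python
import sqlite3


def build_cache(conn, packages):
    """(Re)build the search index; packages is an iterable of (cp, description)."""
    conn.execute("DROP TABLE IF EXISTS pkg")
    conn.execute("CREATE TABLE pkg (cp TEXT PRIMARY KEY, description TEXT)")
    conn.executemany("INSERT INTO pkg VALUES (?, ?)", packages)
    conn.commit()


def search(conn, term, in_description=False):
    """Substring search; SQLite's LIKE is case-insensitive for ASCII.

    '%' and '_' inside `term` act as wildcards (not escaped in this sketch).
    """
    pattern = f"%{term}%"
    if in_description:
        rows = conn.execute(
            "SELECT cp FROM pkg WHERE cp LIKE ? OR description LIKE ? ORDER BY cp",
            (pattern, pattern),
        )
    else:
        rows = conn.execute("SELECT cp FROM pkg WHERE cp LIKE ? ORDER BY cp", (pattern,))
    return [row[0] for row in rows]


# Illustrative sample rows, not real tree data.
conn = sqlite3.connect(":memory:")
build_cache(conn, [
    ("media-video/totem", "Media player for GNOME"),
    ("app-portage/eix", "Search and query ebuilds"),
    ("media-plugins/gst-plugins-speex", "GStreamer plugin for Speex"),
])
```

Usage: search(conn, "eix") only touches the index, so its cost no longer depends on reading thousands of ebuild or metadata cache files.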
Improving LICENSE version handling
This is not directly linked to the backend, but during my work I realized that
some packages were not considered free by the backend (there is a 'free'
filter) even though they look free. These are LGPL-2 licensed packages. They
are probably actually LGPL-2+ licensed, and since LGPL-2.1 is considered free
by the FSF and the OSI but LGPL-2 is not, they should be considered free
because of the '+'. This is also an issue if someone sets
ACCEPT_LICENSE="GPL-3": it will refuse a lot of GPL-2+ licensed packages. I
don't think it is a _big_ piece of work and it would surely be appreciated.
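The '+' handling could be as simple as expanding a license name into the versions it covers before checking it against ACCEPT_LICENSE or the 'free' list. This Python sketch assumes a hand-written, hypothetical table of known versions per license family; real support would need that information to be maintained in the tree.

```python
# Hypothetical table of known versions per license family, oldest first.
LICENSE_VERSIONS = {
    "GPL": ["GPL-1", "GPL-2", "GPL-3"],
    "LGPL": ["LGPL-2", "LGPL-2.1", "LGPL-3"],
}


def expand(license_name):
    """'GPL-2+' -> ['GPL-2', 'GPL-3']; a name without '+' stands for itself."""
    if not license_name.endswith("+"):
        return [license_name]
    base = license_name[:-1]
    family = base.rsplit("-", 1)[0]
    versions = LICENSE_VERSIONS.get(family, [])
    if base not in versions:
        return [base]  # unknown family or version: no "or later" information
    return versions[versions.index(base):]


def license_accepted(license_name, accepted):
    """True if at least one version covered by license_name is accepted."""
    return any(v in accepted for v in expand(license_name))
```

With a 'free' set containing LGPL-2.1 but not LGPL-2, an LGPL-2+ package would pass the filter while a plain LGPL-2 one would not, and ACCEPT_LICENSE="GPL-3" would accept GPL-2+ packages.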
By the way, I think ACCEPT_LICENSE does not work properly with complex license
expressions like LICENSE="foo? ( || ( lic1 lic2 ) ) lic3".
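To make the problem concrete, here is a small Python evaluator for such expressions, handling USE conditionals ('flag?' and '!flag?'), '||' any-of groups and plain groups. It is a sketch, not portage's implementation: ACCEPT_LICENSE is simplified to a plain set of accepted license names (no '*', '-' or @GROUP tokens), and all function names are hypothetical.

```python
def parse(tokens, pos=0, nested=False):
    """Parse tokens into (kind, value) nodes until ')' (if nested) or the end."""
    nodes = []
    while pos < len(tokens):
        tok = tokens[pos]
        if tok == ")":
            if not nested:
                raise ValueError("unbalanced ')'")
            return nodes, pos + 1
        if tok == "||" or tok.endswith("?"):
            # Any-of groups and USE conditionals must be followed by a group.
            if pos + 1 >= len(tokens) or tokens[pos + 1] != "(":
                raise ValueError(f"expected '(' after {tok!r}")
            children, pos = parse(tokens, pos + 2, nested=True)
            nodes.append((tok, children))
        elif tok == "(":
            children, pos = parse(tokens, pos + 1, nested=True)
            nodes.append(("all", children))
        else:
            nodes.append(("license", tok))
            pos += 1
    if nested:
        raise ValueError("missing ')'")
    return nodes, pos


def satisfied(node, use, accepted):
    kind, value = node
    if kind == "license":
        return value in accepted
    if kind == "all":
        return all(satisfied(n, use, accepted) for n in value)
    if kind == "||":
        return any(satisfied(n, use, accepted) for n in value)
    # USE conditional: "flag?" applies when flag is set, "!flag?" when it is not.
    flag = kind[:-1]
    active = (flag[1:] not in use) if flag.startswith("!") else (flag in use)
    return not active or all(satisfied(n, use, accepted) for n in value)


def license_ok(license_str, use, accepted):
    tokens = license_str.replace("(", " ( ").replace(")", " ) ").split()
    nodes, _ = parse(tokens)
    return all(satisfied(n, use, accepted) for n in nodes)


LICENSE = "foo? ( || ( lic1 lic2 ) ) lic3"
```

With USE="foo", accepting lic2 and lic3 is enough; without 'foo', only lic3 matters. A simple "is every license name in the list accepted?" check gets both cases wrong.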
What needs to be changed in ebuilds
The big issue: we are source-based
First of all, source-based is cool ;)
But being source-based is not predictable, and a lot of cool PackageKit
features rely on predictability. For example, there is a 'plugin' that catches
unknown commands typed on the command line and offers the available packages
providing that command. You can also search for files in packages that are not
installed, and so on. With Gentoo, just forget it: we can't do that, because
the files a package installs depend on how it is built (USE flags, etc.).
Using metadata
But we can add some metadata to make all of this work.
For example, we could add PROPERTIES="application" and PROPERTIES="gui". The
first would mark packages considered as applications, i.e. used directly by
the end user; technically, that means shipping a desktop file. The second would
mark GUI-related packages. I don't really see the need for 'gui', but I'm not
the one proposing these... Actually, PROPERTIES is probably not the best
solution, because it can be used to filter packages out: we don't want someone
to set ACCEPT_PROPERTIES="-* application", we only want to filter a search.
That's why I think we should use metadata for that.
It would also be very useful to declare provided "things" in the metadata. For
example, media-plugins/gst-plugins-speex probably provides a codec named
speex, so we could imagine a metadata entry declaring that this package
provides the codec speex. The same could be used for fonts and other things
like drivers (modaliases).
This would allow us to implement the WhatProvides function of the PackageKit
API.
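As an illustration only, here is how a WhatProvides query could be answered from such metadata, in Python with xml.etree.ElementTree. The <provides type="..."> element is hypothetical (it is not part of the current metadata.xml format), and so is the sample data.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata.xml contents: <provides> is NOT an existing element.
METADATA = {
    "media-plugins/gst-plugins-speex": (
        '<pkgmetadata><provides type="codec">speex</provides></pkgmetadata>'
    ),
    "media-fonts/dejavu": (
        '<pkgmetadata><provides type="font">DejaVu Sans</provides></pkgmetadata>'
    ),
    "app-misc/foo": "<pkgmetadata/>",
}


def provided(xml_text):
    """Return the set of (type, value) pairs declared by one metadata document."""
    root = ET.fromstring(xml_text)
    return {(el.get("type"), (el.text or "").strip()) for el in root.iter("provides")}


def what_provides(metadata, kind, value):
    """List packages whose metadata declares that they provide (kind, value)."""
    return sorted(cp for cp, xml_text in metadata.items() if (kind, value) in provided(xml_text))
```

Because the information is declared statically, it is known before the package is built, which is what the source-based model otherwise prevents.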
eerror, ewarn, elog don't tell us anything about the content
These functions are used to show messages to the user. They were designed for
an English-only CLI. PackageKit aims to be multilingual, because you can't be
user-friendly if you don't speak the user's language. That said, error
messages themselves are rarely translated, so an error or informative message
has a non-translated body and a type. The type has to be very specific, like
'ERROR_MISSING_DISK_SPACE', so that the user can read the translated string
associated with it and understand the error. The body will probably detail
the exact error, like 'can't copy foo file because of missing space'.
With Gentoo ebuilds, we get 'eerror my_message' but nothing tells us what the
error is. Errors thrown by portage are mostly system errors, so we can parse
them, but ebuild errors are not parsable. Ebuilds should emit errors, warnings
and messages with a specified type. That would probably be good for the end
user too. We should also reduce the number of these messages, because they are
sometimes useless for most users.
For example, we could add 'esecurity' or 'econflict', or pass the type as the
first argument, as in 'ewarn security message'.
In my opinion, this is the biggest issue, because it is going to be hard to
make it happen: first, the work itself (rationalizing error messages) is pretty
difficult, and second, it is going to be hard to convince Gentoo developers.
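A sketch of what typed messages could give the backend, in Python: both proposed styles (a dedicated helper like 'esecurity', or a type as the first argument of 'ewarn') are turned into a (type, body) pair, and untyped messages fall back to an unknown type. The helper names and the MessageType values are hypothetical; a client could then show a translated string per type and the untranslated body as details.

```python
from enum import Enum


class MessageType(Enum):
    SECURITY = "security"
    CONFLICT = "conflict"
    UNKNOWN = "unknown"


# Proposed dedicated helpers (hypothetical, not existing ebuild functions).
HELPER_TYPES = {"esecurity": MessageType.SECURITY, "econflict": MessageType.CONFLICT}
# Types that may be given as the first argument of the existing helpers.
TYPED_ARGS = {t.value: t for t in MessageType if t is not MessageType.UNKNOWN}
GENERIC_HELPERS = {"eerror", "ewarn", "elog", "einfo"}


def classify(helper, args):
    """Turn one ebuild message into a (MessageType, untranslated body) pair."""
    if helper in HELPER_TYPES:
        return HELPER_TYPES[helper], " ".join(args)
    if helper in GENERIC_HELPERS and args and args[0] in TYPED_ARGS:
        return TYPED_ARGS[args[0]], " ".join(args[1:])
    # Plain 'eerror my_message': the backend cannot tell what the error is.
    return MessageType.UNKNOWN, " ".join(args)
```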
Binary packages
I've never used them and they are not officially supported by Gentoo, so I'm
not sure we should support them in the backend, but if someone has good
reasons and wants this done, I'm open to discussing it.
Conclusion
This Google Summer of Code is finished. We have a (I hope) good backend ready
to be used, even if it can be improved in numerous ways. We now know which
directions to follow to get a better product, and the next work will focus on
that.
I'm probably missing some things in this document, but I will send
specifications / GLEPs / patches for the issues that need fixing, and I will
explain them more precisely then.
Notes
[1] http://www.packagekit.org
[2] http://www.packagekit.org/pk-matrix.html
[3] http://www.packagekit.org/pk-profiles.html
[4] Service packs are, more or less, tarballs containing a few packages ready to be installed.