GLEP: 99993
Title: Package manager controlled installations in an offset
Version: 1.16
Last-Modified: 2009/03/23 19:23:54
Author: Fabian Groffen <grobian at gentoo.org>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 31 May 2007
Post-History:

Credits

The text of this GLEP is, to a certain extent, based on ideas of many people over a period of roughly 4 years. The functional idea of offset installations is not solely an idea of the author.

Abstract

The intention of this GLEP is twofold. First, it aims to inform about the problems (possible challenges) of offset installations, and how these can be dealt with on a conceptual level. Second, it describes implementation-specific issues that concern ebuild and eclass developers in offset installations. These implementation specifics stem from a prototype developed as a proof of concept.

Motivation

While Gentoo is mainly about Linux, Linux is not the only platform Gentoo can serve. A prime example of non-Linux Gentoo is the successful Gentoo/FreeBSD variant. It nicely demonstrates that the Gentoo portage tree is flexible enough to cater for some diversity. But the road does not end at a FreeBSD kernel and userland where Portage is still the primary package manager in control of maintaining the system. Many other operating systems are based on some kind of UNIX, but are better off without Portage as their primary package manager. Those operating systems are capable of installing software from source, but often lack the infrastructure to do so in a flexible and maintainable manner. Gentoo, as a source-based distribution, offers by definition the means to build and maintain packages from source. This GLEP describes the aim to open up the Gentoo portage tree to the aforementioned operating systems. Not only does this allow those operating systems to have an up-to-date toolchain and userland, it also increases the value of the ebuilds in the Gentoo portage tree. The tedious efforts made on ebuilds to make them work in various settings are reused and opened up to a wider public.

Rationale

Having a GNU/Gentoo userland and toolchain makes most UNIX-based operating systems more accessible by replacing the mostly Spartan tools with up-to-date (GNU) variants or versions. For (software) developers this means that software build failures caused by a lack of proper tools are reduced. For users, it enables easy access to up-to-date tools that are otherwise simply not available or installed.

Most software development efforts are aimed at developers with fairly recent development tools. Having access to the right versions of autoconf, automake and libtool often just allows one to start off with development. Having correct libraries is the next step to keep on building. When multiple platforms are involved for the same piece of software, it may even become desirable to have the exact same tools available on every platform.

While the host operating system may not come with these tools, or only with insufficient ones, the desire to have said tools remains. By keeping software (versions) under explicit control, availability can be maintained as a developer or user wishes, independent of the underlying host operating system. To gain this independence, installation in an offset devoted solely to this purpose is required.

Backwards Compatibility

Current ebuilds cannot be used in an offset installation. Ebuilds compatible with an offset installation are easily made backwards compatible by setting two variables to trivial values. A special eclass, prefix.eclass, caters for this, such that Prefix-enabled ebuilds can be used in gentoo-x86.

Offset Installations

The package manager within Gentoo in general installs packages with a prefix of / or /usr. Exceptions to this usually cater for availability of the functionality in some way or another. An offset installation uses consistently shifted prefixes by prepending a fixed path to the prefix that normally would be used. This fixed path is hereafter referred to as the offset. As such, an offset installation uses a package manager that installs its packages into a given offset. Assume an offset defined as /home/user/gentoo: the offset package manager would then typically install with a prefix /home/user/gentoo where the normal package manager would install with a prefix /, and similarly the offset package manager would install in /home/user/gentoo/usr where the normal package manager would install in /usr.
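
The shifting of prefixes amounts to plain string prepending. A minimal sketch in shell, where the helper name offset_path is hypothetical and not part of Portage:

```shell
# Hypothetical helper (not part of Portage): map a path of a normal
# installation to the corresponding path inside an offset installation.
# $1 = offset, $2 = absolute path as used by a normal installation.
offset_path() (
    # Drop trailing slashes so that "/" maps onto the offset itself.
    shifted="${1%/}${2%/}"
    # An empty offset combined with "/" yields "/" again.
    printf '%s\n' "${shifted:-/}"
)

offset_path /home/user/gentoo /      # prints /home/user/gentoo
offset_path /home/user/gentoo /usr   # prints /home/user/gentoo/usr
```

With an empty offset the helper returns the path unchanged, which corresponds to a normal installation.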

Offset Considerations

Packages often have some form of a "configuration" phase in which the locations of the various components of the package during installation are defined. Typically, so-called autotooled packages have a configure script that has the aforementioned property. The package manager within Gentoo has a function which specifies all locations to match a desired file system layout for this configure script.

Using an offset, the file system layout chosen by the package manager is shifted by this offset. A small side note applies here, dealing with the usefulness of this approach. In principle, in an offset installation, the full file system layout as found in a normal Gentoo Linux installation is found under the offset. Strictly speaking, it serves no purpose to have programs installed in bin, usr/bin, sbin, usr/sbin, etc. under the offset, as they could all be installed in one place. However, because keeping this layout preserves backwards compatibility with scripts that assume it, the same layout is used inside the offset.

Offset Awareness

After a package has been installed in the right location by the package manager, running it should utilise the offset as well. The rationale behind this is that everything installed in the offset was installed on purpose, so other installed packages should make use of that. An example is a package XX that is written in the Python language. Installing said package succeeds by placing the appropriate files in the right location under the offset. However, running the program for XX should use the Python interpreter from the offset installation, not the one typically found in /usr/bin/python. This requires the package XX to be patched to look for python in the offset, if it does not already. Another example is package XY. It has a configure script and installs correctly into the offset. However, at runtime, it looks for its configuration in /etc/XY.conf. In this case too, the package should look into the offset to retrieve the correct configuration file.

Dynamic Libraries in the Offset

A special aspect of Offset Awareness concerns shared libraries. Because the intricacies of shared libraries are beyond the scope of this GLEP, no in-depth explanation is given here. The reader is assumed to have some knowledge of dynamic linking in general.

By default, the GCC compilers and linkers have search paths for includes and libraries for /usr/local, /usr and / with include and lib respectively (not considering platform specifics for 64-bit). This introduces some problems for offset installations:

libraries from the offset will not be found by default
Because the compiler search path for includes does not look into the offset include path, the header files remain invisible to the compiler. This can be solved using the -I flag of the compiler, but this is suboptimal. The same holds for the dynamic linker when it has to find the actual libraries; here the -L flag can be used. GCC can be configured to have an extended search path for includes, which allows the offset paths to be added to it. For the linker no such possibility exists other than hacking the code.
produced objects that need dynamic libraries fail at runtime

Assuming an object was successfully compiled and linked using the flags described in the previous point, a kernel trap will occur at runtime, since the referenced dynamic library cannot be found. The runtime linker has the same default search paths as the dynamic linker. Since the -L instructions are only used by the dynamic linker to find the libraries it needs to resolve symbols, the runtime linker still will not find the libraries from the offset. The runtime linker can be given a search path via the LD_LIBRARY_PATH variable; however, this is far from optimal and is considered harmful for many reasons (see the Internet). Note also that LD_LIBRARY_PATH is not portable (consider DYLD_LIBRARY_PATH on Darwin, for instance).

Finally, for some systems this issue does not arise. For instance, the Darwin dynamic linker uses a scheme that allows the produced objects to find the libraries they were linked against at compile time. On Linux, xBSD, Solaris and any other ELF-based system, however, the problem described above does exist. Runtime path instructions can be used on these platforms to set the search path the runtime linker uses when it looks for a library on behalf of the object.

In an offset installation, these problems have to be dealt with properly. Solutions range from setting global LDFLAGS, CPPFLAGS and LD_LIBRARY_PATH variables, to wrapping the compiler and/or linker to add the required flags on invocation, or doing post-linking corrections.
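
The first of these solutions, setting global variables, could look roughly as follows. This is only a sketch: the variable name OFFSET and the example offset are assumptions, and the -Wl,-rpath spelling assumes GCC driving the GNU linker.

```shell
# Assumed example offset; OFFSET is a hypothetical variable name.
OFFSET="/home/user/gentoo"

# Headers: add the offset include directory to the preprocessor search path.
CPPFLAGS="-I${OFFSET}/usr/include"

# Libraries: -L lets the linker find them at build time, while the rpath
# entries record the same directories in the object for use at runtime.
LDFLAGS="-L${OFFSET}/usr/lib -L${OFFSET}/lib"
LDFLAGS="${LDFLAGS} -Wl,-rpath,${OFFSET}/usr/lib -Wl,-rpath,${OFFSET}/lib"

export CPPFLAGS LDFLAGS
printf 'CPPFLAGS=%s\nLDFLAGS=%s\n' "$CPPFLAGS" "$LDFLAGS"
```

This only works for packages that actually honour CPPFLAGS and LDFLAGS in every compiler and linker invocation.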

Offset Installation Package Manager

The ideas described in this document have been prototyped in a branch of the Gentoo Portage package manager. This allowed piloting the ideas and adjusting them where necessary. This prototype is codenamed "Prefix". In the rest of this document "Prefix" is used to refer to the offset installation, as well as to the offset-aware package manager prototype. Where necessary, a clear distinction between the two is made.

In Prefix, config files that store paths can store them including the offset. The alternative of not storing the offset would keep the config files compatible with non-Prefix ones, but would require all tools reading them to be patched to add the offset internally. The prototype implements the first option, where the offset is stored in paths. This means that e.g. the GCC configuration files in etc/conf.d/gcc contain the full paths, which gcc-config reads and uses. This option was chosen because it is more obvious in general. The offset is always in the paths, hence it is easy to check whether a path is correct. The only exception to this in the prototype is the CONFIG_PROTECT variable. The paths stored for this variable do not contain the offset, and are hence relative to the offset. The (only) reason for this is that CONFIG_PROTECT is partly defined in the profiles, which are generic and not adapted for the user (and offset).
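
Because CONFIG_PROTECT entries are relative to the offset, a tool that needs the real locations has to add the offset itself. A minimal sketch of that step, in which the helper name expand_config_protect and the example entries are hypothetical:

```shell
# Hypothetical helper: turn offset-relative CONFIG_PROTECT entries into
# full paths. $1 = offset, remaining arguments = entries.
expand_config_protect() (
    offset="${1%/}"
    shift
    for entry in "$@"; do
        printf '%s%s\n' "$offset" "$entry"
    done
)

# Example entries as a profile might define them, without the offset.
CONFIG_PROTECT="/etc /usr/share/config"

# Left unquoted on purpose: the variable is split into one argument per entry.
expand_config_protect /home/user/gentoo $CONFIG_PROTECT
```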

Ebuild Changes in Prefix

To deal with the extra offset used in Prefix, ebuilds need to be made aware of that offset. The Prefix Portage prototype differs from the mainstream Portage version on the following points:

EAPI contains prefix
Each ebuild has EAPI defined to be a string containing at least prefix. The Prefix functionality is defined to be an extension to any officially existing EAPI. Any EAPI that is not 0 (which is the default when undefined) is appended to the Prefix ebuild's EAPI string. An example is EAPI="prefix 1" for an ebuild that has EAPI=1 in the gentoo-x86 tree. The prototype masks any ebuild that does not contain prefix in its EAPI. The string is likely to disappear, but functions in the prototype as a marker that an ebuild conforms to Prefix requirements. What the final EAPI version, name or feature will become is left to other discussions, and considered out of the scope of this GLEP. In Prefix the EAPI is inserted as the first statement in the ebuild after the (CVS) header. Note that every ebuild in Prefix has EAPI defined.
EPREFIX variable
While most ebuilds do not need to deal with the offset explicitly, in some cases it is necessary to handle this offset directly. Some configure scripts, for example, require extra or modified paths to be given. Other packages cannot use econf at all. For those occasions the variable ${EPREFIX} is available in ebuilds and eclasses, pointing to the root of the offset. For an offset /Library/Gentoo, for example, the variable ${EPREFIX} would contain /Library/Gentoo. This makes it easy to use, for example as in econf --with-some-app="${EPREFIX}"/usr/bin/some-app.
convenience variables ED and EROOT

In normal ebuilds, ${D} refers to the destination directory in a temporary location before all files are actually merged into the live filesystem. The Prefix Portage prototype runs configure with --prefix="${EPREFIX}"/usr (via econf), which in principle is sufficient for a correct final merge from ${D}. However, for modifications to the build image, as are common practice in many ebuilds, ${D} no longer suffices for most operations, such as rm -f "${D}"/usr/bin/nuke-me. While this is an obvious result of using an offset, the solution of using ${D}${EPREFIX} in those places requires some typing. To ease this, the variable ${ED} is available for convenience, and is defined as ED=${D}${EPREFIX}/. For the same purpose, the variable ${ROOT} has a corresponding ${EROOT}, which is defined as EROOT=${ROOT}${EPREFIX}.

Note that ${ED} and ${EROOT} do not obsolete ${D} and ${ROOT}. Recall that configure is given the --prefix argument, so the destination directory itself needs no offset added. This is best illustrated by an obvious case: emake DESTDIR="${D}" install. Using ${ED} here would result in a double offset. Retaining the original meaning of the variables ${D} and ${ROOT} in Prefix is considered desirable, as it reduces confusion.

calls to do* functions
The various do* functions hide the image directory (${D}) where a package is installed before it is merged to the live file system. Because these functions already handle ${D} transparently, for ease of use (and ebuild conversion), all of these functions work on ${ED} instead of ${D} in the Prefix prototype. The call dodir /usr/bin hence needs no change for Prefix and results in a directory usr/bin under the offset.
USE-expanded prefix
The Prefix base profile sets prefix via use.force, such that every ebuild can test whether it is in Prefix or not. This includes constructs like use prefix && ... in ebuild functions or prefix? ( ... ) in e.g. DEPEND or SRC_URI.
non-privileged environment

The current prototype is implemented so as not to require any special privileges. In contrast to mainstream Portage, which requires root privileges, this allows Prefix Portage to be run by normal users. While this strictly is unrelated to installing in an offset, in practice it often is related. Because normal users cannot install into e.g. /usr/local, they need to use another (non-standard) location that is writable to them. It is unrealistic to assume that said users have administrative privileges (or are willing to use them) for those non-standard locations. If they did, they could just have used the native package manager, or any of the other available alternative package managers that do require administrative privileges.

Because the prototype runs unprivileged, the user that is allowed to run various config utilities is unlikely to have UID=0. Further, chown calls are likely to fail. The prototype currently has these checks either changed (to check for the Prefix owner UID) or disabled. How to deal with administrative privileges independently of the Prefix case/environment is an open issue.

ebuild inter-revisions
Inter-revisions are a non-offset related feature that helps to keep ebuild versions aligned with the main tree. See also the section Tree Syncing. They allow a higher version to be specified for an ebuild, while maintaining the connection with the main tree. Inter-revisions are simple sub-revisions of main revisions, meaning that for every revision, a numbered sub-revision (inter-revision) version can exist. To make Portage distinguish normal and inter-revisions, the latter start with a 0 followed by the normal revision number. The inter-revision is added as a dot and the number. For example, inter-revision 2 of revision 3 would be -r03.2. Note that the inter-revision of a revisionless ebuild is the inter-revision of revision 0. For example: -r00.1.
prefix.eclass

The prefix eclass in the Prefix prototype provides the function eprefixify. This function replaces occurrences of @GENTOO_PORTAGE_EPREFIX@ with the currently used prefix offset in the files given as arguments to it. This functionality allows the offset prefix to be hardcoded in files that need it at runtime. Typical uses are in a shebang, or in a path to find e.g. etc/init.d/functions.sh.

Next to the eprefixify function, this eclass also sets EPREFIX, ED and EROOT when not set by Portage already. This allows a non-Prefix Portage to use the Prefix variables as if it were a Prefix Portage configured for the empty prefix string, by inheriting the prefix eclass.
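
The behaviour of the eclass described above can be sketched in a few lines of shell. This is not the actual prefix.eclass: the helper name prefix_defaults, the example values and the use of a temporary file are assumptions, and the sed expression assumes an offset without |, & or backslash characters.

```shell
# Fallbacks as a non-Prefix Portage would need them: variables that are
# already set are left alone, unset ones get the empty-offset meaning.
prefix_defaults() {
    : "${EPREFIX=}"
    : "${ED=${D}${EPREFIX}/}"
    : "${EROOT=${ROOT}${EPREFIX}}"
}

# Replace the placeholder with the offset in every file given as argument.
# The files are rewritten via cat so that their permissions are kept.
eprefixify() {
    for file in "$@"; do
        sed "s|@GENTOO_PORTAGE_EPREFIX@|${EPREFIX}|g" "$file" > "$file.tmp" &&
            cat "$file.tmp" > "$file" &&
            rm -f "$file.tmp"
    done
}

# Example values (assumptions): an offset and an image directory.
EPREFIX="/Library/Gentoo"
D="/var/tmp/image"
ROOT=""
unset ED EROOT
prefix_defaults
printf 'ED=%s\nEROOT=%s\n' "$ED" "$EROOT"

# A shebang is a typical place for the placeholder.
script="$(mktemp)"
printf '#!@GENTOO_PORTAGE_EPREFIX@/bin/sh\n' > "$script"
eprefixify "$script"
cat "$script"    # prints #!/Library/Gentoo/bin/sh
rm -f "$script"
```

With EPREFIX unset, prefix_defaults leaves it empty and eprefixify simply removes the placeholder, which matches the empty prefix string case.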

In Prefix, references to places in the file system need to be reviewed. Often such absolute paths need to be prefixed with the offset, but not always. An example is /dev/null, which is not available in the offset of the Prefix prototype. Each gentoo-x86 ebuild needs to be converted into a Prefix enabled one, possibly automated, but with a human review.

Dynamic Libraries Approach

Since in a Prefix install a toolchain is built as part of the bootstrapping process, Prefix can compile the compiler and linker with offset-specific changes if necessary. The GCC compiler is hence configured to have the offset paths in its include search path. This avoids the need to pass -I arguments for offset locations, as the compiler uses them by default. Because dynamic linking differs from platform to platform, and not all platforms use the same linker, the linker in a Prefix installation is wrapped by a script that simply adds -L flags for offset paths, and -R (Solaris), --rpath= (GNU) or -rpath (IRIX) flags where appropriate. This approach gives full control over the linker and has proven to be more reliable than setting e.g. LDFLAGS. Some applications call the linker manually, and forget, or refuse, to pass LDFLAGS during this invocation. By adding the extra arguments at the end of the argument list, originally given arguments precede those added by the wrapper.

With the GCC compilation and wrappers for the linker, the Prefix prototype is able to find headers and libraries at compile time, and to produce objects that can find the right libraries at runtime.
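
The linker wrapper can be pictured as follows. This is a sketch and not the script used by the prototype: the names OFFSET, REAL_LD, LD_FLAVOUR, wrapped_ld and show_args are hypothetical, and only the flag spellings are taken from the description above.

```shell
# Assumed example offset.
OFFSET="/home/user/gentoo"

# Append search path flags for the offset after the original arguments,
# then hand everything to the real linker.
wrapped_ld() {
    for dir in "${OFFSET}/usr/lib" "${OFFSET}/lib"; do
        case "${LD_FLAVOUR:-gnu}" in
            solaris) set -- "$@" "-L${dir}" "-R${dir}" ;;
            irix)    set -- "$@" "-L${dir}" -rpath "${dir}" ;;
            *)       set -- "$@" "-L${dir}" "--rpath=${dir}" ;;
        esac
    done
    "${REAL_LD:-ld}" "$@"
}

# Stand-in for the real linker that only prints the arguments it receives,
# so the sketch can be tried without linking anything.
show_args() { printf '%s\n' "$*"; }
REAL_LD=show_args

wrapped_ld -o prog main.o -lfoo
```

Because the added flags come last, a -L or rpath argument given by the package itself keeps precedence over the offset paths.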

Prefix Tree

Since the Prefix prototype needs modified ebuilds, it is impossible to use the main gentoo-x86 tree. For this reason, a separate tree is in use, devoted to the Prefix prototype [1]. The tree is self-sufficient and basically aims to be a shadow copy of the gentoo-x86 tree containing the Prefix specific changes. These changes are mostly apparent in the profiles directory, where keywords for arches working in the Prefix prototype and their default profiles are added in arch.list and profiles.desc. A default-prefix directory was added which contains the Prefix specific configuration and arches, that eventually inherit from the base profile.

In general the Prefix tree only contains ebuilds that are also in the main tree. However, for some archs or Prefix features special packages are required which do not exist in the main tree, and would not be of any use there. Per-ebuild modifications exist, but only those that were strictly necessary to make the ebuild work in the Prefix environment. Under- and over-quoting, syntax errors, spelling errors, etc. are not fixed, to keep the difference with the main tree to the absolute minimum.

Scripts

Since the conversion of a main tree ebuild to a Prefix tree ebuild is in many cases a trivial task, a few scripts to help with this are part of the prototype. The three most important scripts are briefly discussed here:

eapify
The only script that actually does conversion from main tree to Prefix tree is eapify. It inserts an EAPI= line right after the ebuild header, with the value "prefix", or a combination if EAPI was already defined in the main tree ebuild. Next to this, it uses a simple heuristic to replace occurrences of ${D} by ${ED}: it replaces all occurrences except those on a line that contains both make and DESTDIR. Lastly, the script replaces all occurrences of ${ROOT} by ${EROOT}. It must be noted that eapify does not guarantee anything, and its output should be considered a rough start only.
ecleankw
Since most (Linux) arches are unused and untested in Prefix, their keywords are not relevant in Prefix. ecleankw has a hardcoded list of keywords that are used in Prefix, and matches every keyword in KEYWORDS= of an ebuild against that list. It removes all that do not match, and makes sure the resulting keywords are all dropped back to ~arch. The latter is done because, for simplicity, no stable keywords are used in the Prefix prototype.
eupdate
Once an ebuild is imported from the main tree, updates to the original are not pushed to the imported copy. The eupdate script takes care of this, including importing new versions.
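
The variable replacement heuristic of eapify can be approximated with sed. The sketch below is not the eapify script itself: the function name eapify_vars is hypothetical, the EAPI insertion is left out, and a line is only skipped when make appears before DESTDIR on it, which is a simplification.

```shell
# Replace ${D} by ${ED} except on make ... DESTDIR lines, and replace
# ${ROOT} by ${EROOT} everywhere. Reads the named files, or standard
# input when no file is given, and writes the result to standard output.
eapify_vars() {
    sed -e '/make.*DESTDIR/!s/\${D}/${ED}/g' \
        -e 's/\${ROOT}/${EROOT}/g' "$@"
}

# Example input, quoted so that the shell does not expand the variables.
eapify_vars <<'EOF'
emake DESTDIR="${D}" install
rm -f "${D}"/usr/bin/nuke-me
ls "${ROOT}"/etc
EOF
```

The first line is left untouched, while the other two come out with ${ED} and ${EROOT}. As with eapify itself, the result still needs a human review.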

Tree Syncing

To keep the value of

(re)location of the binaries

using $ORIGIN, @executable_path in rpaths/install_names

References

[1] http://overlays.gentoo.org/proj/alt/browser/trunk/prefix-overlay/