Distfile Patching Support
1.
Project Description
This is a Gentoo GSoC 2011 project.
The intention of this project is to develop the software and implement the
necessary infrastructure to generate binary patches for all the distfiles
available on Gentoo Linux mirrors using a generic tool for create binary
patches: diffball. These patches would be fetched and applied by the package
managers, whenever possible. The package manager will be able to detect if a
patch is ok or not, and discard it if needed.
The patches would be created by a server owned by Gentoo Linux
Infrastructure and mirrored as usual, or as desired by the infrastructure team.
Ideally each mirror would have all the distfiles needed by all the packages
currently on the tree and sequences of binary patches between all these
distfiles. When a distfile is removed from the mirror, all the patches against
this file should be removed as well.
This project also wants to address a big issue found on GLEP 25, that is the
difference between the md5sum's of distfiles reconstructed by different
versions of compressors like bzip2. The distfiles with wrong md5sums will be
saved in a separated directory like ${DISTDIR}/delta-reconstructed/, and the
package manager will verify the checksums of each piece used for reconstruct
that distfile, because if we can trust in each piece used, we can trust in the
distfile itself. With this approach the package managers that doesn't have
patching support will not complain about wrong distfiles, and will just
download the full sources as usual. The package managers with patching
support will handle the files from ${DISTDIR}/delta-reconstructed/ and use
them properly.
The software produced during the development of this project would allow
people to save a lot of bandwidth and disk space when upgrading their Gentoo
Linux installations. As a result of this, the Gentoo Linux mirrors will also
save a lot of bandwidth and will be able to serve a greater number of users with
the same amount of resources. Usually just a few parts of the distfiles will
change on a upgrade, then isn't worth to download the full sources again.
Basically people is fetching repeated files over and over again when upgrading
their systems.
The server-side part will be integrated by Gentoo Linux infra and produce the
binary patches transparently. The client-side part will be totally
integrated with the package managers (Portage for now) and will work
transparently for the users. The distfile patching support will not break the
compatibility of the mirrors or the gentoo-x86 tree with package managers that
don't support it yet. Any package manager that doesn't know about binary patches
for distfiles will work as works now, then this project can be integrated and
tested without break anything that is currently working fine.
There's a lot of binary patching implementations around done by other Linux
distributions or OSS projects, like the Fedora's Presto project, but they are
usually tied to a package manager of choice and designed to work with binary
packages, not source packages like Gentoo does. There's also a Gentoo-related
implementation called Deltup, but it's poorly implemented, lack some features
we need and is hard to integrate nicely with Gentoo Infrastructure. One of the
main issues with Deltup is that the remote server generates the binary patches
on demand, and the user needs to wait on queues while the server download the
sources and create the binary patches, and this can take a lot of time and
makes the package manager time out and download the full sources instead.
2.
Developers
| Developer |
Nickname |
Role |
|
rafaelmartins |
Member |
All developers can be reached by e-mail using nickname@gentoo.org.
3.
Project tasks
The tasks of the
distpatch
project are:
XZ support for diffball -
Implement XZ support for diffball (finished)
Write patches for diffball to add XZ support. Currently just bzip and
gzip are supported.
| Starting date: |
23 May 2011 |
A program to generate binary patches for several versions of one package -
distdiffer script (finished)
Write a program that is able to create sequential patches between several
version of the sources of a package. This program will be called by the
program that will run as batch job in the future to produce sequences of
binary patches for all the available distfiles. This program will create
a file with checksum information for all the binary patches of the package
and for the uncompressed sources, to be used by the reconstructor. This
program will also be responsible to verify if the distfile can be splited
in patches or not.
| Starting date: |
30 May 2011 |
A program to reconstruct a distfile from a sequence of binary patches -
distpatcher script
Write a program that will get a sequence of binary patches, verify the
checksums of each one, using the checksum file created by the program
produced as the previous deliverable and reconstruct the distfile. If the
distfile is ok, will be saved at ${DISTDIR}, if not, will be saved at
${DISTDIR}/delta-reconstructed/. This program will also create a local
database of checksums, for the files inside ${DISTDIR}/delta-reconstructed/.
| Starting date: |
13 Jun 2011 |
A program to generate binary patches for all available distfiles, as needed -
distdiffall
Write a program that will create sequential patches for several distfiles
at once as a batch job. This program will check which packages are needing
to be patched and create the patches on demand. This program will also be
responsible by remove old binary patches that aren't needed anymore and
will use the program from the second deliverable to create binary patches
for each package.
| Starting date: |
27 Jun 2011 |
Patch the Portage to use binary patch sequences -
Portage support to binary deltas
Adapt Portage, to be able to use reconstructed tarballs, incorporating
the program from the third deliverable, and being able to look at the
directory ${DISTDIR}/delta-reconstructed/ and use the reconstructed files
without bother about the wrong distfiles, using the local checksum database
created by the reconstructor script to verify the checksums for the patches
and for the uncompressed source. The time reserved for this deliverable is
a bit bigger, because I don't know how much complex it will be.
| Starting date: |
18 Jul 2011 |
Software integration and tests -
Infrastructure test
Make all this thing work on a simulated environment, to see how it behave
and fix possible bugs.
| Starting date: |
1 Aug 2011 |
Gather and polish documentation written during each of the previous phases -
Release documentation
| Starting date: |
8 aug 2011 |
4.
Resources
Resources offered by the
distpatch
project are:
|