Gentoo unified maintainer tool proposal (Project Grumpy)

(This is a copy of http://piratepad.net/0QHhsjMau6 - previous history is there, but piratepad has too much downtime,
so future editing is happening here on titanpad until we move relevant stuff to future project wiki pages)

Interested folks are hanging out in #gentoo-grumpy on FreeNode.

== Rationale ==

Currently there are many different places for maintainers to look at information, including,
but not limited to:
* Various QA reports
* euscan
* Gentoo bugzilla
* Upstream (release) announcements, upstream issue trackers

None of these gives a good overview of what there is to do, apart from looking at saved
Bugzilla searches.


== Idea ==

The idea of a unified maintainer tool would be to gather some of this information together,
to give more of a one-stop place for a maintainer to see what there is to be done.
The tool would know which projects the developer is part of, and in turn which packages
he/she maintains directly or via a project, and could give a list of what could be done
by the maintainer today, now that he/she has half an hour to spare for Gentoo work from
busy life.
This is essentially a re-hash or revival of Grumpy
https://gitweb.gentoo.org/proj/gsoc2010-grumpy.git/tree/docs/gsoc/proposal.txt
which didn't get deployed for various reasons, and then rotted away.


== Random implementation thoughts ==

It should probably be a web service with a main web frontend. The backend can gather
all the data and do the processing, then expose it in the web interface.
This web interface could be built on top of an API ("micro-services" if you want to be
fancy); that web API then could also be used to build a separate CLI tool that exposes
some of the functionality.
While many of the things could be personalized based on what the authenticated user
maintains, it of course should be possible to view things for other projects too, when
desired. Authentication could be done in various ways: OpenID, Gentoo LDAP, whatever we can
do. If we get a trusted binding between a Gentoo developer and the user in this tool, some
extra things might be possible to do on the developer's behalf.


== Functionality Ideas ==

* Show all relevant QA reports against your packages
* Show all available upstream bumps to your packages
* Show all of your package revisions that have been in ~arch for 30+ days and haven't
  been stabilized
** Allow automatically filing STABLEREQ bugs for individual packages or a selected
   collection of packages (or eventually move this process completely into this tool)
* Show a priority list of available STABLEREQ for arch team members (security, request age); also KEYWORDREQ
* List of revisions that can be pruned
* Retrieving GitHub PRs and other things via their API
* Whatever we can do to help out maintainers thanks to a centralized data store, git tracking, etc

All of these can see future improvements and features, e.g.:
* the available bump checker can be improved to find bumps for more packages than euscan finds today;
  allow various rules like version number remapping, a notion of upstream development cycles, etc.
* add a "fixed in" field to Bugzilla, so we can consider closed bugs when suggesting stabilization candidates
* allow tweaking the stabilization finder with extra rules, e.g. this core system package shouldn't
  notify before 60 days, or mark a revision bump for a fast-stable reminder in 10 days because it
  fixes an important bug users face, etc.
* Handle stabilization completely on the Grumpy site. Instead of filing bugs, a revision could be marked for
  stabilization and then Grumpy can figure out the rest (which arch teams to list it for based on earlier keywords,
  etc). It could support marking a set of packages to be stabilized together, for tightly coupled sets of packages.
  Then it could give a sorted priority list of packages to stabilize to arch team members (or other interested parties),
  letting them choose how many to take care of at a time, and then output a package.accept_keywords file to
  use for this. It could keep the sets that maintainers marked as belonging together intact, even if the cut-off
  number asked for is smaller. It could let community arch testers test packages and mark those that they
  believe are fine for the arch, so an official member can take that into account and do lighter testing.
  It could also show other members of the same arch team which packages have already been "checked out" by another
  team member, and then skip those in the list generated on request for package.accept_keywords (with
  a timeout on that checkout, in case the first person neither finishes the work nor releases the checkout).
  Once a stabilization is pushed, Grumpy can notice this from git tracking and automatically remove the package
  from that architecture's list. Independently, there are also ideas at https://archives.gentoo.org/gentoo-project/message/ed983fa3e0940f276a692e8dd9161c15
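
The batch selection described in the stabilization bullet above could be sketched roughly as follows.
This is only an illustration under assumptions: the function names, the shape of the queue (a list of
groups already sorted by priority) and the set of checked-out atoms are all hypothetical, and the
checkout timeout is left out for brevity.

```python
def pick_batch(queue, limit, checked_out=frozenset()):
    """Pick package atoms for one arch tester.

    queue: list of groups (lists of "category/name-version" atoms), already
           sorted by priority; a group is a set the maintainer wants
           stabilized together.
    limit: how many packages the tester asked for. Groups are never split,
           so the result may be larger than the limit.
    checked_out: atoms another team member is already working on.
    """
    chosen = []
    for group in queue:
        if len(chosen) >= limit:
            break
        if any(atom in checked_out for atom in group):
            continue  # skip groups somebody else has checked out
        chosen.extend(group)
    return chosen


def accept_keywords_lines(atoms, arch):
    """Format atoms as package.accept_keywords entries for one arch."""
    return [f"={atom} ~{arch}" for atom in atoms]


queue = [
    ["dev-libs/a-1.0"],
    ["kde-frameworks/b-5.1", "kde-frameworks/c-5.1"],  # must stay together
    ["app-misc/d-2"],
]
batch = pick_batch(queue, limit=2)
print("\n".join(accept_keywords_lines(batch, "amd64")))
```

Asking for two packages yields three here, because the second group is kept whole instead of being cut
at the limit.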


== Notes ==

* Mapping of CPAN versions to Gentoo versions:
https://gist.github.com/kentfredric/cfd5a593d3d90d87ae11 # kent\n's rough implementation of a stand-alone python equivalent of both Perl's version.pm and our normalization logic (misses out on some details as discussed on IRC)
Or just use dev-perl/Gentoo-PerlMod-Version via some little perl helper utility that can convert the versions over (gentoo-perlmod-version.pl from the package itself might be suitable)


== Prior art ==

* AutoTua - "Automate It All" - GSoC 2008  - https://gitweb.gentoo.org/proj/autotua.git
* Grumpy - GSoC 2010 - https://gitweb.gentoo.org/proj/gsoc2010-grumpy.git/
* euscan - http://euscan.gentooexperimental.org/ | https://github.com/iksaif/euscan
* gentoo-bumpchecker - https://gitweb.gentoo.org/proj/gentoo-bumpchecker.git
* Launchpad
* Anitya - release-monitoring.org | https://release-monitoring.org/api | fedmsg notifications
* libraries.io
* https://alternativeto.net/software/libraries-io/ listing VersionEye, Gemnasium, David, Touchpine
* requires.io
* http://beta.repology.org/
* Debian:
** https://udd.debian.org/dmd.cgi?email1=pkg-multimedia-maintainers%40lists.alioth.debian.org
** https://qa.debian.org/developer.php?login=pkg-multimedia-maintainers@lists.alioth.debian.org
** https://packages.qa.debian.org/f/ffmpeg.html


== Prototype plans ==

Grab and sync all existing main tree packages, with versions/revisions and keywords, into a database;
work has currently started on just grabbing this from the packages.gentoo.org JSON API to avoid dealing
with a full tree for now. We can store a first_seen timestamp for each revision (at least for those
that have an ~arch marking) and later use that for 30 day stabilization suggestions. The data includes
maintainers; these should be synced and saved too, combined with projects.xml syncing. This can then be
used for the main purpose of grumpy: showing personalized data, i.e. automatically showing information
about packages the logged-in user maintains directly or via a project. But this is really a frontend
detail; the data sync bits do not need to care about it, it only matters for the presentation later.
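
As a rough sketch of how the first_seen timestamp could drive the 30 day suggestions: the Revision
record and its field names below are assumptions made for illustration, not an existing schema, and
the check deliberately ignores refinements such as open bugs or newer revisions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Revision:
    # Hypothetical record shape for a synced package revision.
    cpv: str              # e.g. "app-misc/foo-1.0"
    keywords: frozenset   # e.g. {"~amd64", "x86"}
    first_seen: datetime  # when our sync first saw this revision


def stabilization_candidates(revisions, now, min_days=30):
    """Return (cpv, arches) pairs for revisions still ~arch after min_days."""
    cutoff = now - timedelta(days=min_days)
    result = []
    for rev in revisions:
        # Arches where the revision is only keyworded as testing (~arch).
        testing = sorted(k[1:] for k in rev.keywords if k.startswith("~"))
        if testing and rev.first_seen <= cutoff:
            result.append((rev.cpv, testing))
    return result


utc = timezone.utc
revs = [
    Revision("app-misc/foo-1.0", frozenset({"~amd64", "x86"}), datetime(2016, 12, 1, tzinfo=utc)),
    Revision("app-misc/bar-2.0", frozenset({"~amd64"}), datetime(2017, 1, 20, tzinfo=utc)),
    Revision("app-misc/baz-3.0", frozenset({"amd64"}), datetime(2016, 1, 1, tzinfo=utc)),
]
print(stabilization_candidates(revs, now=datetime(2017, 1, 31, tzinfo=utc)))
```

Only foo is reported: bar has not been in ~arch long enough, and baz is already stable.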

kensington provided a machine parseable imlate report; this is currently located at
https://astralcloak.net/~kensington/qa-reports/imlate-grumpy-daily
We can sync that in as well and then present this data in a personalized way in the frontend as
potential stabilization candidates for the maintainer.

anitya (release-monitoring.org) provides upstream version bump notifications via fedmsg.
We should connect to the fedmsg bus and catch these notifications live, but to cope with
downtime we'll need code to parse the history anyway. As such, the prototype can use only the
history for now, asking for the messages between the previous sync and now. Fedora provides
datagrepper, an HTTP interface to its datanommer message archive, for this: https://apps.fedoraproject.org/datagrepper/
We can use its HTTP API to get data from anitya. The API gives almost the same JSON as what flies
over the fedmsg bus live, so the parsing and handling code can be shared in the future.
Our package naming is stored in the anitya production instance at release-monitoring.org. When
a mapping is added or removed, a fedmsg is sent. Because right now anitya doesn't provide an
API to query these mappings (other than requesting each package's details and then checking whether
they contain a Gentoo mapping), we can instead ask datagrepper for all mapping messages since
3-4th September or so (this is when the first mapping was added, and the Gentoo distro as a whole
with it). Of course, after the first sync to a clean database, it should only update from the
latest timestamp we already have data for, to not spam datagrepper too hard. With this we
should have release-monitoring.org project ID associations for our package database entries
thanks to the mapping.
The API is now actually there: https://github.com/fedora-infra/anitya/issues/344 (though it doesn't include the mapping itself, but we can get that with an extra query - request to provide that info directly in the big list is now filed as #419 there).
After we have the initial mappings, we can seed the versions via anitya web API documented at
https://release-monitoring.org/api
After that, we can start grabbing update data from datagrepper (fedmsg later), knowing which
to discard (due to no mapping) and which to consider.
From this we can display missing version bump reports to the user in the frontend.
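
A minimal sketch of the datagrepper side, under stated assumptions: the /raw endpoint with its topic,
start, page and rows_per_page parameters, the topic name, and the message fields read below
(msg.project.id and msg.message.upstream_version) are written from memory and need to be checked
against real datagrepper output before relying on them. The function names and the mapping dict are
hypothetical.

```python
from urllib.parse import urlencode

DATAGREPPER_RAW = "https://apps.fedoraproject.org/datagrepper/raw"
# Assumed topic name for anitya version updates; verify before use.
VERSION_UPDATE_TOPIC = "org.release-monitoring.prod.anitya.project.version.update"


def history_url(topic, since_ts, page=1, rows_per_page=100):
    """Build a query URL for all messages on a topic since a UNIX timestamp."""
    query = urlencode({
        "topic": topic,
        "start": int(since_ts),
        "page": page,
        "rows_per_page": rows_per_page,
    })
    return f"{DATAGREPPER_RAW}?{query}"


def relevant_bumps(messages, mapped):
    """Keep only version updates for projects that have a Gentoo mapping.

    messages: decoded message dicts (assumed layout, see above).
    mapped: dict of anitya project id -> Gentoo package name, built from
            the mapping sync.
    """
    bumps = []
    for message in messages:
        body = message.get("msg", {})
        project_id = body.get("project", {}).get("id")
        version = body.get("message", {}).get("upstream_version")
        if project_id in mapped and version:
            bumps.append((mapped[project_id], version))
    return bumps


print(history_url(VERSION_UPDATE_TOPIC, since_ts=1473000000))
```

Because the filter works on already-decoded dicts, the same function could later be fed from the live
fedmsg bus instead of the HTTP history.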

To avoid handling login stuff for the prototype, we can just let the user tell us what projects
he/she is part of and save that in a cookie. When the cookie is unset, we can redirect to the page
where this choice can be made. Or we could just "log in" with an e-mail address with no authentication
and then just act as that user (we can find out what projects that e-mail is part of from projects.xml).
This should be fine for the initial prototype, because everything is read-only for now. We can
think about authorization stuff once there are features that involve writing to the database
(e.g. saving in the database that a given package should wait 60 days in ~arch before a stabilization
suggestion, and other future features like that).
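
The e-mail to projects lookup could look roughly like this. The sample XML is made up and only assumes
that projects.xml lists each project with an email element and member/email children; the function name
is hypothetical, and inherited subproject membership is not handled here.

```python
import xml.etree.ElementTree as ET

# Made-up sample data in the assumed projects.xml layout.
SAMPLE = """<projects>
  <project>
    <email>foo@example.org</email>
    <name>Foo</name>
    <member><email>alice@example.org</email></member>
    <member><email>bob@example.org</email></member>
  </project>
  <project>
    <email>bar@example.org</email>
    <name>Bar</name>
    <member><email>alice@example.org</email></member>
  </project>
</projects>"""


def projects_for(email, xml_text):
    """Return the e-mail addresses of projects that list `email` as a member."""
    wanted = email.strip().lower()
    root = ET.fromstring(xml_text)
    found = []
    for project in root.iter("project"):
        members = {
            node.text.strip().lower()
            for node in project.findall("member/email")
            if node.text
        }
        if wanted in members:
            found.append(project.findtext("email"))
    return found


print(projects_for("alice@example.org", SAMPLE))
```

The resulting list is what the prototype would store in the cookie, or recompute from the e-mail on
each request.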

mgorny consolidated the machine-parseable XML output of his gentoo-ci into one file for us, so we can
also grab that data to display to maintainers. It can be found here:
https://gitweb.gentoo.org/report/gentoo-ci.git/tree/
The output.xml is continuously pushed together with the existing HTML report, so for the
prototype we can just sync it via gitweb's plain text file download to avoid handling git for
the time being.