Discussion:
Auto reject if autopkgtest of reverse dependencies fail or cause FTBFS
Pirate Praveen
2017-01-13 07:46:38 UTC
Hi,

Similar to the piuparts auto-rejects, I think we should add an auto-reject
when the autopkgtest of a reverse dependency or reverse build dependency
starts failing (i.e. was not failing earlier), or when the upload causes
reverse dependencies to FTBFS. This will help us prevent library updates
done without a proper transition from breaking other packages. One recent
example is the update of python-html5lib, which broke python-bleach: its
build started failing [1].

[1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=844943

Thanks
Praveen
Paul Gevers
2017-01-13 08:03:51 UTC
Hi Pirate,
Post by Pirate Praveen
Similar to piuparts auto rejects, I think we should add auto reject when
autopkgtest of a reverse dependency or build dependency fails (which was
not failing earlier) or cause FTBFS to reverse dependencies. This will
help us prevent library updates without proper transitions breaking
other packages. One recent example is update on python-html5lib which
broke python-bleach even though build was failing [1].
[1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=844943
I'm working on that¹ and hope we can enable it soon after Stretch release.

Paul
¹ https://lists.debian.org/debian-release/2016/12/msg00310.html
Pirate Praveen
2017-01-13 08:20:04 UTC
Post by Paul Gevers
I'm working on that¹ and hope we can enable it soon after Stretch release.
Thanks! I think it will help us in a great way in handling library
transitions.
Scott Kitterman
2017-01-13 13:30:51 UTC
Post by Paul Gevers
Hi Pirate,
Post by Pirate Praveen
Similar to piuparts auto rejects, I think we should add auto reject when
autopkgtest of a reverse dependency or build dependency fails (which was
not failing earlier) or cause FTBFS to reverse dependencies. This will
help us prevent library updates without proper transitions breaking
other packages. One recent example is update on python-html5lib which
broke python-bleach even though build was failing [1].
[1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=844943
I'm working on that¹ and hope we can enable it soon after Stretch release.
Paul
¹ https://lists.debian.org/debian-release/2016/12/msg00310.html
For clarity, you're discussing this being a testing migration blocker, not a
package accept auto-reject, right?

Scott K
Paul Gevers
2017-01-13 14:48:47 UTC
Hi Scott,
Post by Scott Kitterman
Post by Paul Gevers
Post by Pirate Praveen
Similar to piuparts auto rejects, I think we should add auto reject when
autopkgtest of a reverse dependency or build dependency fails (which was
not failing earlier) or cause FTBFS to reverse dependencies. This will
help us prevent library updates without proper transitions breaking
other packages. One recent example is update on python-html5lib which
broke python-bleach even though build was failing [1].
[1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=844943
I'm working on that¹ and hope we can enable it soon after Stretch release.
Paul
¹ https://lists.debian.org/debian-release/2016/12/msg00310.html
For clarity, you're discussing this being a testing migration blocker, not a
package accept auto-reject, right?
I am not sure if you are addressing me or Pirate, but indeed I am
working on an implementation similar to what Ubuntu does (see the link
above for details), which will be used as an unstable-to-testing
migration blocker. debci is the worker, but all the policy logic will
live in britney, where it belongs. And of course I aim to have a full
release cycle to tune it.

Paul
Ole Streicher
2017-01-13 14:57:09 UTC
Post by Paul Gevers
I am not sure if you are addressing me or Pirate, but indeed I am
working on an implementation similar to what Ubuntu does (see the link
above about the details) which will be used as unstable to testing
migration blocker. debci is the worker, but all the policy logic will be
with britney where it belongs. And of course I try to have a full
release cycle to tune it.
Will there be a way for the maintainer to override this? Otherwise I
see the danger that a buggy CI test in a reverse dependency could block
an important update, for example if the reverse dependency uses a
long-deprecated function that has now been removed.

Best regards

Ole
Ghislain Vaillant
2017-01-13 16:05:02 UTC
Post by Ole Streicher
Post by Paul Gevers
I am not sure if you are addressing me or Pirate, but indeed I am
working on an implementation similar to what Ubuntu does (see the link
above about the details) which will be used as unstable to testing
migration blocker. debci is the worker, but all the policy logic will be
with britney where it belongs. And of course I try to have a full
release cycle to tune it.
Will there be a way to override this for the maintainer? Otherwise I
would see the danger that a buggy reverse dependency CI test can prevent
an important update, for example if the reverse dependency uses a long
deprecated function that is now removed.
Best regards
Ole
I second Ole's concerns here. Strict autorejection would be assuming
that all autopkgtest testsuites are solid, which has not always been
the case in my experience.

Ghis
Antonio Terceiro
2017-01-13 16:27:33 UTC
Post by Ole Streicher
Post by Paul Gevers
I am not sure if you are addressing me or Pirate, but indeed I am
working on an implementation similar to what Ubuntu does (see the link
above about the details) which will be used as unstable to testing
migration blocker. debci is the worker, but all the policy logic will be
with britney where it belongs. And of course I try to have a full
release cycle to tune it.
Will there be a way to override this for the maintainer? Otherwise I
would see the danger that a buggy reverse dependency CI test can prevent
an important update, for example if the reverse dependency uses a long
deprecated function that is now removed.
You can either fix the reverse dependency, or get it removed.
Ole Streicher
2017-01-13 16:46:41 UTC
Post by Antonio Terceiro
Post by Ole Streicher
Post by Paul Gevers
I am not sure if you are addressing me or Pirate, but indeed I am
working on an implementation similar to what Ubuntu does (see the link
above about the details) which will be used as unstable to testing
migration blocker. debci is the worker, but all the policy logic will be
with britney where it belongs. And of course I try to have a full
release cycle to tune it.
Will there be a way to override this for the maintainer? Otherwise I
would see the danger that a buggy reverse dependency CI test can prevent
an important update, for example if the reverse dependency uses a long
deprecated function that is now removed.
You can either fix the reverse dependency, or get it removed.
Sorry, I don't understand this. How can I get a reverse dependency
removed (from unstable)? And why should I become responsible for poorly
maintained reverse dependencies?

Also, at least up to now, CI test failures are not necessarily
critical. It is up to the maintainer to judge the severity of the
problem that popped up: CI tests are often quite picky, precisely so
that they can serve as an early indicator of problems.

For example, a new package could emit a deprecation warning which
causes the CI test of a reverse dependency to fail. The failure is in no
way critical (since the package works). But I would also not like to
ignore stderr -- I *want* to see these kinds of warnings so that I can
react before the real change happens, yet I also see no reason to hurry
here (usually, I contact upstream and wait until they have a
solution).

If you now make the first package's migration depend on the reverse
dependency's test, it will not migrate because of the CI failure, but I
would also (as maintainer of the reverse dependency) not be willing to
ignore stderr.

Problems like these will create additional work for all parties and are
likely to make people angry. IMO it would be much better either to
auto-create bug reports (which can be reassigned), or to have an
"ignore" button somewhere.

The idea of getting informed that a certain upload causes problems in
other packages is, however, great.

BTW, there were some discussions at DebConf about getting an e-mail on
CI test status changes; this would also be a nice thing.

Best regards

Ole
Ian Jackson
2017-01-13 18:22:53 UTC
Post by Ole Streicher
Sorry, I don't understand this. How can I get a reverse dependency
removed (from unstable)?
You wouldn't. You would need to get it removed from testing.
Post by Ole Streicher
And why should I get responsible for poorly
maintained reverse dependencies?
This is more of a sticking point. I don't know what proportion of CI
failures are going to be due to poorly maintained reverse
dependencies.

But the real answer to this is that "Debian testing should be kept
releaseable" and that means that if your rdepends are busted such that
your changes cause lossage, something has to give.
Post by Ole Streicher
The idea of getting informed that a certain upload causes problems in
other packages is however great.
Maybe an intermediate position (sketched below) would be to respond to a
CI failure by:
* Increasing the migration delay for the affecting package
* Notifying the affected package maintainers
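
To make that concrete, something along the lines of the sketch below on
the britney side (purely hypothetical; none of these names correspond to
real britney internals):

# Hypothetical sketch of the "soft" response: a regression in a reverse
# dependency's autopkgtest adds migration delay and triggers notifications
# instead of a hard block. All names here are invented for illustration.
from dataclasses import dataclass

@dataclass
class Regression:
    trigger: str        # source package whose upload triggered the tests
    broken_rdep: str    # reverse dependency whose autopkgtest regressed
    maintainer: str     # address to notify about the regression

def soft_policy(base_delay_days, regressions):
    """Return (migration delay in days, addresses to notify)."""
    if not regressions:
        return base_delay_days, set()
    extra_days = 5  # arbitrary illustrative penalty
    return base_delay_days + extra_days, {r.maintainer for r in regressions}

# Example with made-up data: one regressed rdep extends a 5-day delay to 10.
delay, notify = soft_policy(5, [
    Regression("python-html5lib", "python-bleach", "maintainer@example.org")])
print(delay, notify)

The exact penalty and the notification channel would of course be up to
the release team.
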
Post by Ole Streicher
BTW, there were some discussions at debconf about getting an E-mail on
CI test status changes; this would also be a nice thing.
Yes.

Ian.
--
Ian Jackson <***@chiark.greenend.org.uk> These opinions are my own.

If I emailed you from an address @fyvzl.net or @evade.org.uk, that is
a private address which bypasses my fierce spamfilter.
Scott Kitterman
2017-01-13 18:54:26 UTC
Post by Ole Streicher
Post by Antonio Terceiro
Post by Ole Streicher
Post by Paul Gevers
I am not sure if you are addressing me or Pirate, but indeed I am
working on an implementation similar to what Ubuntu does (see the link
above about the details) which will be used as unstable to testing
migration blocker. debci is the worker, but all the policy logic will be
with britney where it belongs. And of course I try to have a full
release cycle to tune it.
Will there be a way to override this for the maintainer? Otherwise I
would see the danger that a buggy reverse dependency CI test can prevent
an important update, for example if the reverse dependency uses a long
deprecated function that is now removed.
You can either fix the reverse dependency, or get it removed.
Sorry, I don't understand this. How can I get a reverse dependency
removed (from unstable)? And why should I get responsible for poorly
maintained reverse dependencies?
Also, at least up to now, CI test failures are not necessarily
critical. It depends on the evaluation of the maintainer which severity
the problem that popped up has: often CI tests are quite picky to serve
as an early indicator for problems.
For example, a new package could write a deprecation warning which
brings the CI test of a reverse dependency to fail. The failure is in no
way critical (since the package works). But I would also not like to
ignore stderr -- I *want* to have these kinds of warnings so that I can
react before the real change happens, but I also see no reason to hurry
up here (usually, I contact upstream and wait until they have a
solution).
If you now make the first package dependent on the reverse dependency,
it will not migrate because of the CI failure, but I would also (as
maintainer of the reverse dependency) not accept to ignore stderr.
Problems like these will create additional work for all parties and are
likely to make people angry. IMO it would be much better if you would
either auto-create bug reports (which may be re-assigned), or to have an
"ignore" button somewhere.
The idea of getting informed that a certain upload causes problems in
other packages is however great.
BTW, there were some discussions at debconf about getting an E-mail on
CI test status changes; this would also be a nice thing.
Probably the simplest way to avoid problems with systems like this is to
remove any autopkgtests your packages are shipping.

Scott K

P.S. Perverse incentives FTW.
Ondrej Novy
2017-01-13 13:38:28 UTC
Hi,
Post by Pirate Praveen
Similar to piuparts auto rejects, I think we should add auto reject when
autopkgtest of a reverse dependency or build dependency fails (which was
not failing earlier) or cause FTBFS to reverse dependencies.
just be careful, because there are some packages which FTBFS in debci
(example:
https://tests.reproducible-builds.org/debian/rb-pkg/unstable/amd64/swift.html
) and it's a bug in debci. The build works fine on the buildds and in my
local sbuild.

Maybe we should fix this first?
--
Best regards
Ondřej Nový

Email: ***@ondrej.org
PGP: 3D98 3C52 EB85 980C 46A5 6090 3573 1255 9D1E 064B
Holger Levsen
2017-01-13 14:14:09 UTC
Post by Ondrej Novy
just be carefull, because there are some packages which FTBFS in debci
https://tests.reproducible-builds.org/debian/rb-pkg/unstable/amd64/swift.html
)
and it's bug in debci. Build works fine in buildd and in my local sbuild.
While this is not related to debci, it brings up an interesting
question:

how should /dev/shm be mounted? And how should /run/shm be?

I'm interested in this question for Debian stable with 3.16 and 4.6-4.9
kernels, and Ubuntu 16.04 running 4.4 kernels.

--
cheers,
Holger
Simon McVittie
2017-01-13 15:54:30 UTC
Post by Holger Levsen
how should /dev/shm be mounted? and how /run/shm?
I believe the "API" is that /dev/shm is either a tmpfs with
/tmp-like permissions (01777), or a symlink to such a tmpfs.
My understanding is that /run/shm is considered to be an
implementation detail, rather than something that software should
hard-code anywhere.

Reference: glibc sysdeps/unix/sysv/linux/shm-directory.c (the original
user of /dev/shm).

systemd mounts a tmpfs on /dev/shm (it's hard-coded in as one of
the "API filesystems"), and Debian's systemd packaging puts a symlink
at /run/shm in case anything is relying on it
(/usr/lib/tmpfiles.d/debian.conf).

If I'm reading the initscripts code correctly, sysvinit does the reverse
by default, for some reason (/run/shm is the mount point and /dev/shm the
symlink). I think the motivation might have been to be able to use the
same tmpfs for /run and /run/shm, but that's a bad idea if you want to
prevent unprivileged users from performing a DoS attack on privileged system
components by filling up /run (which is why systemd gives each user their
own tmpfs at /run/user/$uid by default).

The default schroot configuration mounts a tmpfs on /dev/shm and does not
do anything special about /run/shm.

Generalizing from those, I think it's reasonable to say that in a
bare-metal system, init is responsible for arranging for /dev/shm to be
as required, and in a container or chroot, the container manager is
responsible.
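
For what it's worth, a quick way to check which layout a given system or
chroot actually ended up with (a minimal sketch using only the Python
standard library; it assumes a Linux /proc):

#!/usr/bin/env python3
# Print whether /dev/shm and /run/shm are symlinks or real mount points,
# matching the layouts described above.
import os

def describe(path):
    if os.path.islink(path):
        return "symlink -> " + os.readlink(path)
    with open("/proc/self/mounts") as mounts:
        for line in mounts:
            fields = line.split()
            if len(fields) >= 3 and fields[1] == path:
                return fields[2] + " mounted here"   # e.g. "tmpfs mounted here"
    if os.path.isdir(path):
        return "plain directory (no mount, no symlink)"
    return "missing"

for path in ("/dev/shm", "/run/shm"):
    print(path + ": " + describe(path))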

S
Steve Langasek
2017-01-14 19:00:51 UTC
Post by Simon McVittie
If I'm reading the initscripts code correctly, sysvinit does the reverse
by default, for some reason (/run/shm is the mount point and /dev/shm the
symlink). I think the motivation might have been to be able to use the
same tmpfs for /run and /run/shm,
I recall this being a misguided attempt to move it out of /dev "because it's
not a device". The migration did not go well, especially in the face of
chroots that need to have it mounted, and since systemd did not handle this
the same way sysvinit had, we effectively now have a mess in the other
direction.

We should fix it so that everything again treats /dev/shm as the mountpoint.
--
Steve Langasek Give me a lever long enough and a Free OS
Debian Developer to set it on, and I can move the world.
Ubuntu Developer http://www.debian.org/
***@ubuntu.com ***@debian.org
Simon McVittie
2017-01-14 21:15:42 UTC
Package: initscripts
Version: 2.88dsf-59.8
Severity: normal
Post by Steve Langasek
Post by Simon McVittie
If I'm reading the initscripts code correctly, sysvinit does the reverse
by default, for some reason (/run/shm is the mount point and /dev/shm the
symlink). I think the motivation might have been to be able to use the
same tmpfs for /run and /run/shm,
I recall this being a misguided attempt to move it out of /dev "because it's
not a device". The migration did not go well, especially in the face of
chroots that need to have it mounted, and since systemd did not handle this
the same way sysvinit had, we effectively now have a mess in the other
direction.
We should fix it so that everything again treats /dev/shm as the mountpoint.
Let's have a bug number for that, then. Please escalate its severity if you
think that's correct.

Steps to reproduce:

* install Debian (I used vmdebootstrap according to autopkgtest-virt-qemu(1))
* apt install sysvinit-core
* reboot
* mount
* ls -al /dev/shm /run/shm

Expected result:

* /dev/shm is a tmpfs
* /run/shm is a symlink with target /dev/shm

Actual result:

* /dev/shm is a symlink with target /run/shm
* /run/shm is a tmpfs

----

This might also be related to #697003, #818442.
Michael Biebl
2017-01-15 00:18:00 UTC
Post by Steve Langasek
Post by Simon McVittie
If I'm reading the initscripts code correctly, sysvinit does the reverse
by default, for some reason (/run/shm is the mount point and /dev/shm the
symlink). I think the motivation might have been to be able to use the
same tmpfs for /run and /run/shm,
I recall this being a misguided attempt to move it out of /dev "because it's
not a device". The migration did not go well, especially in the face of
chroots that need to have it mounted, and since systemd did not handle this
the same way sysvinit had, we effectively now have a mess in the other
direction.
The /run/shm symlink in systemd was added to minimize breakage when
doing the switch from sysvinit to systemd.
Post by Steve Langasek
We should fix it so that everything again treats /dev/shm as the mountpoint.
Nod, I'd be more than happy to drop the /run/shm symlink again from systemd.
--
Why is it that all of the instruments seeking intelligent life in the
universe are pointed away from Earth?
Simon McVittie
2017-01-15 00:51:51 UTC
Post by Michael Biebl
Post by Steve Langasek
I recall this being a misguided attempt to move it out of /dev "because it's
not a device". The migration did not go well, especially in the face of
chroots that need to have it mounted, and since systemd did not handle this
the same way sysvinit had, we effectively now have a mess in the other
direction.
The /run/shm symlink in systemd was added to minimize breakage when
doing the switch from sysvinit to systemd
If I understand correctly, the objection was to how sysvinit behaves
(for which I have now opened #851427) - it puts the symlink at /dev/shm and
the real mount at /run/shm.

I don't think systemd is doing anything wrong here. Upstream systemd is
correct to mount the actual filesystem on /dev/shm, and IMO it's also
valid for Debian systemd to make the symlink.
Post by Michael Biebl
Post by Steve Langasek
We should fix it so that everything again treats /dev/shm as the mountpoint.
Nod, I'd be more then happy to drop the /run/shm symlink again from systemd.
This sounds like a job for post-stretch. Let's not remove low-cost
compatibility symlinks right now :-)

S
Simon Richter
2017-01-18 15:54:58 UTC
Hi,
Post by Simon McVittie
If I understand correctly, the objection was to how sysvinit behaves
(for which I have now opened #851427) - it puts the symlink at /dev/shm and
the real mount at /run/shm.
That is the correct approach, and IIRC this is how it was implemented in
sysvinit before jessie (/dev/shm is way older than /run), so I'm wondering
what triggered the change.

Simon
Antonio Terceiro
2017-01-13 16:47:41 UTC
Post by Ondrej Novy
Hi,
Post by Pirate Praveen
Similar to piuparts auto rejects, I think we should add auto reject when
autopkgtest of a reverse dependency or build dependency fails (which was
not failing earlier) or cause FTBFS to reverse dependencies.
just be carefull, because there are some packages which FTBFS in debci
https://tests.reproducible-builds.org/debian/rb-pkg/unstable/amd64/swift.html
)
and it's bug in debci. Build works fine in buildd and in my local sbuild.
I think you are a little confused. That links to reproducible builds,
which has nothing to do with debci.
Ondrej Novy
2017-01-13 17:49:03 UTC
Hi,
Post by Antonio Terceiro
I think you are a little confused. That links to reproducible builds,
which has nothing to do with debci.
Yep, sorry for the confusion. I assumed that the FTBFS migration check
would use data from reproducible builds, or would use the same system
for running builds (Jenkins?).
--
Best regards
Ondřej Nový

Email: ***@ondrej.org
PGP: 3D98 3C52 EB85 980C 46A5 6090 3573 1255 9D1E 064B
Simon McVittie
2017-01-13 19:35:10 UTC
Post by Ian Jackson
* Increasing the migration delay for the affecting package
* Notifying the affected package maintainers
I think this makes sense: it gives the maintainer and other interested
developers some time to assess whether the failure is a showstopper
(=> upload a fix/revert/workaround, or at worst file an RC bug) or not.

Or, conversely, blocking migrations but letting the relevant maintainers
remove that block might work.
Post by Scott Kitterman
Probably the simplest way to avoid problems with systems like this is to
remove any autopkg tests your packages are shipping.
[...]
Post by Scott Kitterman
P.S. Perverse incentives FTW.
This is my concern too. If you maintain a useful package with tests that
are unstable but do not imply a release-critical issue, running those
tests and recording-but-ignoring the failures seems considerably better
for Debian than either disabling the tests or removing the software
(more information is better than less information, and if the package
is useful despite the test failure then it's better to have it than not).

Possible autopkgtest extension: "Restrictions: unreliable"?
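
To make that concrete, roughly how a gating tool might treat it (a
sketch only: Tests/Depends/Restrictions are real debian/tests/control
fields, but "unreliable" is just the proposed value, and real deb822
parsing is more involved than this):

# Sketch: decide whether a failed test gates migration, honouring a
# hypothetical "unreliable" restriction. Parsing is deliberately naive.
SAMPLE_STANZA = """\
Tests: upstream-testsuite
Depends: @, python3-pytest
Restrictions: allow-stderr, unreliable
"""

def parse_stanza(text):
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields

def blocks_migration(stanza, test_failed):
    """A failure gates migration unless the test opted out via 'unreliable'."""
    restrictions = set(stanza.get("Restrictions", "").replace(",", " ").split())
    return test_failed and "unreliable" not in restrictions

stanza = parse_stanza(SAMPLE_STANZA)
print(blocks_migration(stanza, test_failed=True))   # False: recorded, not gating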

S
Ole Streicher
2017-01-13 20:05:30 UTC
Post by Simon McVittie
Post by Ian Jackson
* Increasing the migration delay for the affecting package
* Notifying the affected package maintainers
I think this makes sense: it gives the maintainer and other interested
developers some time to assess whether the failure is a showstopper
(=> upload a fix/revert/workaround, or at worst file a RC bug) or not.
Or, conversely, blocking migrations but letting the relevant maintainers
remove that block might work.
What is the reason not to use automated bug reports here? This would
allow us to use all the tools the bug system has: severities,
reassigning, closing, etc.
Post by Simon McVittie
Possible autopkgtest extension: "Restrictions: unreliable"?
This is not specific enough. Often you have some tests that are
unreliable, and others that are important. Since one usually takes the
upstream test suite (which may be huge), one has to look manually first
to decide about the further processing.

Best regards

Ole
Paul Gevers
2017-01-13 22:08:11 UTC
Hi Ole,
Post by Ole Streicher
Post by Simon McVittie
Post by Ian Jackson
* Increasing the migration delay for the affecting package
I like this and will suggest it to the release team, especially for the
start-up period.
Post by Ole Streicher
Post by Simon McVittie
Post by Ian Jackson
* Notifying the affected package maintainers
I think this makes sense: it gives the maintainer and other interested
developers some time to assess whether the failure is a showstopper
(=> upload a fix/revert/workaround, or at worst file a RC bug) or not.
Or, conversely, blocking migrations but letting the relevant maintainers
remove that block might work.
One can always file bug reports against the release.debian.org pseudo
package to ask for britney to ignore the autopkgtest result. One other
thing that I can envision (but it is maybe too early to agree on or set
in stone) is that we lower the NMU criteria for fixing (or temporarily
disabling) autopkgtests in one's reverse dependencies. In the end,
personally I don't think this is up to the "relevant maintainers" but up
to the release team. And I assume that badly maintained autopkgtests
will just be a good reason to kick a package out of testing.

On the topic of notifications, I would expect that as soon as we have
this mechanism in place we would implement notifications in all the
actively maintained notification places, such as tracker.debian.org and
udd. This isn't much different from other migration criteria or
auto-removals in that respect.
Post by Ole Streicher
What is the reason not to use automated bug reports here? This would
allow to use all the tools the bug system has: severities, reassigning
closing etc.
The main reason is that it hadn't crossed my mind yet, and nobody else
except you has raised the idea so far. One caveat that I see, though, is
which system should hold the logic. The current idea is that britney
determines which combinations need to be tested and can thus use the
results straight away for the migration decision. As Martin Pitt
described in the thread I referenced in my first reply, Ubuntu already
experimented with this, and they came to the conclusion that it didn't
really work when two entities have to try to keep the logic in sync.
Post by Ole Streicher
Post by Simon McVittie
Possible autopkgtest extension: "Restrictions: unreliable"?
This is not specific enough. Often you have some tests that are
unreliable, and others that are important. Since one usually takes the
upstream test suite (which may be huge), one has to look manually first
to decide about the further processing.
Then maybe change the wording: not-blocking, for-info, too-sensitive,
ignore or .... If you know your test suite needs investigation, you can
have it not block automatically. But depending on the outcome of the
investigation, you can still file (RC) bugs.

The reason I am so motivated to do this is that I really believe it is
going to improve the quality of the release and the release process. I
really hope that more packages will be creating autopkgtests, as one now
has an incentive: it helps guarantee that your package will not
suddenly be kicked out of the release due to a change in your
dependencies. Personally, one of my autopkgtests has triggered a change
in MySQL (upstream, even) to not suddenly drop a command line option,
but to implement a deprecation period instead. Software was relying on
the behaviour, and in this case there wasn't a grace period. I am not
sure that without this "stick" it would have happened (and definitely
not in time for the specific Ubuntu release).

Paul
Ole Streicher
2017-01-14 10:05:48 UTC
Post by Paul Gevers
One can always file bug reports against the release.debian.org pseudo
package to ask for britney to ignore the autopkgtest result.
This would again concentrate work on a relatively small team.
Post by Paul Gevers
One other thing that I can envision (but maybe to early to agree on or
set in stone) is that we lower the NMU criteria for fixing (or
temporarily disabling) autopkgtest in ones reverse dependencies. In
the end, personally I don't think this is up to the "relevant
maintainers" but up to the release team. And I assume that badly
maintained autopkgtest will just be a good reason to kick a package
out of testing.
I already brought up an example where the autopkgtest is well maintained
but keeps failing.

And I think that it is the package maintainers who have the experience
to judge whether a CI test failure is critical or not.

BTW, at the moment the CI tests are run in unstable -- if you want to
kick a package out of *testing*, you need to test the new unstable
package against testing, which would be some change in the logic of the
autopkgtest setup.
Post by Paul Gevers
Post by Ole Streicher
What is the reason not to use automated bug reports here? This would
allow to use all the tools the bug system has: severities, reassigning
closing etc.
The largest reason is that it didn't cross my mind yet and nobody else
except you has raised the idea so far.
I already didn't understand this for the piuparts blocker: we have an
established workflow for problems with packages that need some
intervention, and that is bugs.d.o. It has a lot of very nice
features, like:

* discussion of the problem attached to the problem itself and stored
for reference
* formal documentation of problem solving in the changelog (Closes: #)
* severities, tags, re-assignments, affects etc.
* maintainer notifications, migration blocks, autoremovals etc.
* documented manual intervention possible

I don't see a feature that one would need for piuparts complaints or for
CI test failures that is not in our bug system. And (I am not sure)
aren't package conflict bugs already auto-generated?

I would really prefer to use the bug system instead of something else.
Post by Paul Gevers
One cravat that I see though is which system should hold the
logic. The current idea is that it is britney that determines which
combinations need to be tested and thus can use the result straight
away for the migration decision.
As Martin Pitt described in the thread I referenced in my first reply,
Ubuntu already experimented with this and they came to the conclusion
that it didn't really work if two entities have to try and keep the
logic in sync.
I don't see the need to keep things in sync: If a new failure is
detected, it creates an RC bug against the migration candidate, with an
"affects" to the package that failed the test. The maintainer then has
these possibilities:

* solve the problem in his own package, upload a new revision, and close
the bug there

* reassign the problem to the package that failed the test, if the
problem lies there. In this case, that maintainer can decide whether the
problem is RC and, if not, lower the severity.

In any case, the maintainers can follow the established workflow, and if
one needs to look up the problems a year later, one can just search for
the bug.

What else would you need to keep in sync?
Post by Paul Gevers
Post by Ole Streicher
Post by Simon McVittie
Possible autopkgtest extension: "Restrictions: unreliable"?
This is not specific enough. Often you have some tests that are
unreliable, and others that are important. Since one usually takes the
upstream test suite (which may be huge), one has to look manually first
to decide about the further processing.
Than maybe change the wording: not-blocking, for-info, too-sensitive,
ignore or ....
The problem is that a test suite is not that homogeneous, and often one
doesn't know that in advance. For example, the test suite of one of my
packages (python-astropy) has almost 9000 individual tests. Some of them
are critical and affect the behaviour of the whole package, but others
cover a small subsystem and/or a very special case. I have no
documentation of the importance of each individual test; I decide that
when I see a failure (in cooperation with upstream). What's more, these
9000 tests are combined into *one* autopkgtest result. What should I put
there?
Post by Paul Gevers
If you know your test suite needs investigation, you can have it not
automatically block. But depending on the outcome of the
investigation, you can still file (RC) bugs.
But then we are where we already are today: almost all test suites of my
packages are "a bit complex", so I would just mark them all as
non-blocking. But then I would need to file the bugs myself, and
especially then there would be no formal link between the test failure
and the bug.
Post by Paul Gevers
Why I am so motivated on doing this is because I really believe this is
going to improve the quality of the release and the release process.
As I already wrote: I really appreciate autopkgtest, and I would like to
have a way to automatically record and track CI test failures. I just
think that it should allow the maintainer a final override, and that
bugs.d.o is the superior system for the workflow because of its
flexibility.

Best regards

Ole
Adam D. Barratt
2017-01-14 12:01:00 UTC
Post by Ole Streicher
I don't see the need to keep things in sync: If a new failure is
detected, it creates an RC bug against the migration candidate, with an
"affects" to the package that failed the test. The maintainer then has
* solve the problem in his own package, upload a new revision, and close
the bug there
* re-assign the problem to the package that failed the test is the
problem lies there. In this case, that maintainer can decide if the
problem is RC, and if not, then lower the severity.
In any case, the maintainers can follow the established workflow, and if
one needs to look up the problems a year later, one can just search for
the bug.
You missed the (not at all hypothetical) case:

* downgrades the bug, regardless of the practical impact of the failure,
just so her package can migrate.

Regards,

Adam
Ian Jackson
2017-01-14 13:11:19 UTC
Post by Ole Streicher
I don't see the need to keep things in sync: If a new failure is
detected, it creates an RC bug against the migration candidate, with an
"affects" to the package that failed the test.
I prefer my other suggestion, that humans should write bugs if
necessary to unblock migration. Because:

* It eliminates a timing problem, where the testing migration
infrastructure[1] needs to somehow decide whether the tests have
been run. (This is needed because in the future we may want to
accelerate migration, perhaps dramatically when there are lots of
tests; and then, the testing queue may be longer than the minimum
migration delay.)

* See my other mail about the problems I anticipate with
automatically opened bug reports.
Post by Ole Streicher
Post by Paul Gevers
Than maybe change the wording: not-blocking, for-info, too-sensitive,
ignore or ....
The problem is that a test suite is not that homogenious, and often one
doesn't knows that ahead. For example, the summary of one of my packages
(python-astropy) has almost 9000 individual tests. Some of them are
critical and influence the behaviour of the whole package, but others
are for a small subsystem an/or a very special case. I have no
documentation of the importance of each individual test; this I decide
on when I see a failure (in cooperation with upstream). But more: these
9000 tests are combined into *one* autopkgtest result. What should I put
there?
You should help enhance autopkgtest so that a single test script can
report the results of multiple tests. This will involve some new protocol
for those test scripts.

Ian.
--
Ian Jackson <***@chiark.greenend.org.uk> These opinions are my own.

If I emailed you from an address @fyvzl.net or @evade.org.uk, that is
a private address which bypasses my fierce spamfilter.
Ole Streicher
2017-01-14 13:45:38 UTC
Post by Ian Jackson
Post by Ole Streicher
I don't see the need to keep things in sync: If a new failure is
detected, it creates an RC bug against the migration candidate, with an
"affects" to the package that failed the test.
Post by Ian Jackson
I prefer my other suggestion, that humans should write bugs if
necessary to unblock migration. Because:
* It eliminates a timing problem, where the testing migration
infrastructure[1] needs to somehow decide whether the test have
^^^ reference/footnote not found
been run. (This is needed because in the future we may want to
accelerate migration, perhaps dramatically when there are lots of
tests; and then, the testing queue may be longer than the minimum
migration delay.)
I would not see this as a big problem: the bug can also be filed against
an already-migrated package, as with any other bug. Also, humans
sometimes fail to write bug reports during the sid quarantine, resulting
in autoremovals. I see no difference here.
Post by Ian Jackson
* See my other mail about the problems I anticipate with
automatically opened bug reports.
The difficulty with automated bug reports is this: how do you tell
whether something is the same bug or not ?
If you're not careful, a test which fails 50% of the time will result
in an endless stream of new bugs from the CI system which then get
auto-closed...
Just allow only one bug report per version pair, report only changes,
and don't report if another bug for the package pair is still
open. Either have a local database with the required information, or
store this as metadata in the bug reports and query the BTS before
sending.

Basically the same procedure one would follow manually.
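
Roughly like this (purely illustrative; the two helper functions are
placeholders for whatever the CI machinery would really use, e.g. a
local database or a BTS usertag query):

# Sketch of "one report per trigger/version/rdep, only on a state change".
def already_reported():
    """Placeholder: return the set of (trigger, version, rdep) already filed."""
    return set()

def file_bug(trigger, version, rdep, log_url):
    """Placeholder for actually submitting a report to the BTS."""
    print("would file: %s %s breaks autopkgtest of %s (%s)"
          % (trigger, version, rdep, log_url))

def maybe_report(trigger, version, rdep, regressed, filed=None):
    filed = already_reported() if filed is None else filed
    key = (trigger, version, rdep)
    if regressed and key not in filed:     # report only new regressions, once
        file_bug(trigger, version, rdep, "https://ci.debian.net/...")
        filed.add(key)
    return filed

filed = maybe_report("some-src", "1.2-3", "some-rdep", regressed=True)
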
Post by Ian Jackson
(If there are bugs, we want them to auto-close because no matter how
hard we try, test failures due to "weather" will always occur some of
the time. Closing such bugs by hand would be annoying.)
I just had a lengthy (and unresolved) discussion with Santiago Vila
about "weather-dependent" build-time failures: https://bugs.debian.org/848859
While I disagree that those are RC, IMO occasional failures are useful
to report to the maintainer, without auto-closing.
Post by Ole Streicher
Post by Paul Gevers
Than maybe change the wording: not-blocking, for-info, too-sensitive,
ignore or ....
The problem is that a test suite is not that homogenious, and often one
doesn't knows that ahead. For example, the summary of one of my packages
(python-astropy) has almost 9000 individual tests. Some of them are
critical and influence the behaviour of the whole package, but others
are for a small subsystem an/or a very special case. I have no
documentation of the importance of each individual test; this I decide
on when I see a failure (in cooperation with upstream). But more: these
9000 tests are combined into *one* autopkgtest result. What should I put
there?
Post by Ian Jackson
You should help enhance autopkgtest so that a single test script can
report results of multiple test. This will involve some new protocol
for those test scripts.
Sorry, but I can't evaluate all 9000 tests and categorize which are
RC and which are not -- this will not work. It is also not realistic to
force upstream to do so. The only thing I can do is reactively tag a
certain failure as RC or not.

Often the test infrastructure doesn't even have a way to mark a test as
xfail (cmake, for example), and upstream often can't say definitively
which tests are expected to fail, so I already have enough to do just
keeping the tests in good shape.

Best regards

Ole
Ian Jackson
2017-01-16 14:29:25 UTC
Post by Ole Streicher
Post by Ian Jackson
* It eliminates a timing problem, where the testing migration
infrastructure[1] needs to somehow decide whether the test have
^^^ reference/footnote not found
Post by Ian Jackson
been run. (This is needed because in the future we may want to
accelerate migration, perhaps dramatically when there are lots of
tests; and then, the testing queue may be longer than the minimum
migration delay.)
I would not see this a big problem: the bug can also be filed against a
migrated package.
That should not be the default. I think you have missed my point. If
the migration delay for a particular upload is less than the waiting
time to get the tests run, then somehow we will need to delay the
migration on the grounds that the tests have not been run.

Obviously this could be done but it involves a new data exchange (and
new protocol) between the testing migration decision tools[1] and the
CI system, which are supposed to be at arm's length.

There are other intertwinings: typically batches of packages need to
be tested together, and it's the testing migration system that knows
which packages to test.

So it would be simpler to do the CI as part of the testing migration.

Ultimately this is a decision for Paul, I think.

[1] is the same missing footnote as before, which was:
[1] The name "britney" is IMO not cool. I wish it would be renamed.
Post by Ole Streicher
Post by Ian Jackson
The difficulty with automated bug reports is this: how do you tell
whether something is the same bug or not ?
If you're not careful, a test which fails 50% of the time will result
in an endless stream of new bugs from the CI system which then get
auto-closed...
Just allow only one bug report per version pair, report only changes,
and don't report is another bug for the package pair is still
open. Either have a local database with the required information, of
store this as metadata in the bug reports. and query the BTS before
sending.
Basically the same procedure as one would do manually.
No, it isn't. What you propose produces one bug report per uploaded
version of each dependency. What one would do manually is have one
report that describes the scope of the problem.

Also a manual bug report can have a better introduction.
Post by Ole Streicher
Post by Ian Jackson
You should help enhance autopkgtest so that a single test script can
report results of multiple test. This will involve some new protocol
for those test scripts.
Sorry, but I can't evaluate all 9000 tests and categorize them which are
RC and which are not -- this will not work. It is also not realistic to
force upstream to do so. The only thing I can do is reactively tag a
certain failure being RC or not.
You have misunderstood my proposal, I think.

I am suggesting that you should arrange that your 9000 tests each
show up as one test case as far as autopkgtest is concerned. That can
probably be done wholesale: these kinds of systems already produce
systematic (or nearly-systematic) output. So you don't need to
categorise them up-front.
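
For instance, the package's single test script could be something like
the sketch below (not an existing autopkgtest protocol: the per-test
output file is invented; only pytest's --junit-xml option and
autopkgtest's AUTOPKGTEST_ARTIFACTS directory are real):

# Sketch: run the upstream pytest suite once, then emit one PASS/FAIL line
# per individual test case from pytest's junit XML report. The output
# format is made up; a real per-test protocol would need to be agreed on.
import os
import subprocess
import xml.etree.ElementTree as ET

artifacts = os.environ.get("AUTOPKGTEST_ARTIFACTS", ".")
report = os.path.join(artifacts, "pytest-junit.xml")
subprocess.run(["python3", "-m", "pytest", "--junit-xml", report])

tree = ET.parse(report)
with open(os.path.join(artifacts, "per-test-results"), "w") as out:
    for case in tree.iter("testcase"):
        failed = (case.find("failure") is not None
                  or case.find("error") is not None)
        name = (case.get("classname") or "") + "." + (case.get("name") or "")
        out.write(("FAIL " if failed else "PASS ") + name + "\n")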

Then, when you get a test failure, you would look at (only) the failing
tests, and perhaps file a bug along these lines:

To: ***@bugs.debian.org
Subject: gnomovision monochrome gnomes are pink and blue

Package: gnomovision
Version: 1.2-3
Severity: minor
Control: user ci.debian.net
Control: usertags -1 + nonblock-G32-PINK nonblock-G33-BLUE

The gnomes in these tests should be black and white. This is caught
by the autopkgtests which check the colour configuration.

The bug is cosmetic - even on a monochrome display, you can tell the
gnomes apart.

Doing things this way also means that you close the bug via the
changelog in the usual way. If you're wrong, the CI will nag you until
you reopen the bug.

FWIW: in my day job I maintain osstest, the Xen Project's CI system,
so I have a lot of experience of how CI blocking workflows should be
managed.

Ian.
--
Ian Jackson <***@chiark.greenend.org.uk> These opinions are my own.

If I emailed you from an address @fyvzl.net or @evade.org.uk, that is
a private address which bypasses my fierce spamfilter.
Steve Langasek
2017-01-14 18:15:15 UTC
Hi Ole,
Post by Ole Streicher
Post by Paul Gevers
One other thing that I can envision (but maybe to early to agree on or
set in stone) is that we lower the NMU criteria for fixing (or
temporarily disabling) autopkgtest in ones reverse dependencies. In
the end, personally I don't think this is up to the "relevant
maintainers" but up to the release team. And I assume that badly
maintained autopkgtest will just be a good reason to kick a package
out of testing.
I already brought an example where autopkgtest ist well maintained but
keeps failing.
And I think that it is the package maintainers who have the experience
of whether a CI test failure is critical or not.
If the failure of the test is not critical, then it should not be used as a
gate for CI. Which means you, as the package maintainer who knows that this
test failure is not critical, should fix your autopkgtest to not fail when
the non-critical test case fails.

Quite to the contrary of the claims in this thread that gating on
autopkgtests will create a bottleneck in the release team for overriding
test failures, this will have the effect of holding maintainers accountable
for the state of their autopkgtest results. CI tests are only useful if you
have a known good baseline. If your tests are flaky, or otherwise produce
failures that you think don't matter, then those test results are not
useful to anyone but yourself. Please help us make the autopkgtests useful for
the whole project.

And the incentive for maintainers to keep their autopkgtests in place
instead of removing them altogether is that packages with succeeding
autopkgtests can have their testing transition time decreased from the
default. (The release team agreed to this policy once upon a time, but I'm
not sure if this is wired up or if that will happen as part of Paul's work?)
Post by Ole Streicher
BTW, in the moment the CI tests are done in unstable -- if you want to
kick out a package from *testing*, you need to test the new unstable
package against this, which would be some change in the logic of
autopkgtest.
The autopkgtest policy in Ubuntu's britney deployment includes all the logic
to do this. Hopefully Paul can make good use of this when integrating into
Debian.
Post by Ole Streicher
Post by Paul Gevers
Post by Ole Streicher
Post by Simon McVittie
Possible autopkgtest extension: "Restrictions: unreliable"?
This is not specific enough. Often you have some tests that are
unreliable, and others that are important. Since one usually takes the
upstream test suite (which may be huge), one has to look manually first
to decide about the further processing.
Than maybe change the wording: not-blocking, for-info, too-sensitive,
ignore or ....
The problem is that a test suite is not that homogenious, and often one
doesn't knows that ahead. For example, the summary of one of my packages
(python-astropy) has almost 9000 individual tests. Some of them are
critical and influence the behaviour of the whole package, but others
are for a small subsystem an/or a very special case. I have no
documentation of the importance of each individual test; this I decide
on when I see a failure (in cooperation with upstream). But more: these
9000 tests are combined into *one* autopkgtest result. What should I put
there?
The result of the autopkgtest should be whatever you as the maintainer think
is the appropriate level for gating. Frankly, I think it's sophistry to
argue both that you care about seeing the results of the tests, and that you
don't want a failure of those tests to gate because they only apply to
"special cases". We should all strive to continually raise the quality of
Debian releases, and using automated CI tests is an effective tool for this.
Bear in mind that this is as much about preventing someone else's package
from silently breaking yours in the release, as it is about your package
being blocked in unstable. This is a bidirectional contract, which works
precisely if your autopkgtest is constructed to be a meaningful gate.
Having a clear gate is the only way to meaningfully scale out CI for the
number of components in Debian and have it actually drive quality of the
distribution.

I will say that, looking at Ubuntu autopkgtest results for packages you
are the maintainer of, I see quite a few recent autopkgtest failures of packages
that are reverse-dependencies of python-astropy. From my POV, that's a good
thing, and I'm happy that there were autopkgtests there that gated
python-astropy rather than letting it into the Ubuntu release in a state
that broke many of its reverse-dependencies (or at least, broke the tests).
Post by Ole Streicher
Post by Paul Gevers
If you know your test suite needs investigation, you can have it not
automatically block. But depending on the outcome of the
investigation, you can still file (RC) bugs.
But then we are where we are already today: Almost all tests of my
packages are "a bit complex", so I would just all mark them as
non-blocking. But then I would need to file the bugs myself, and
especially then there is no formal sync between the test failure and the
bug.
Why would you mark them non-blocking /before/ you know that the tests are
flaky enough for this to matter? Upstream put them in the test suite for a
reason. I'd suggest that it's much better to block by default, and if you
find that a particular test is becoming a problem (for you or for another
maintainer), you can upload to make that test non-blocking. But in the
meantime, deeper testing provides a lot of goodness - *if* we gate on the
results of that testing.
--
Steve Langasek Give me a lever long enough and a Free OS
Debian Developer to set it on, and I can move the world.
Ubuntu Developer http://www.debian.org/
***@ubuntu.com ***@debian.org
Ole Streicher
2017-01-15 14:24:12 UTC
Post by Steve Langasek
Post by Ole Streicher
Post by Paul Gevers
One other thing that I can envision (but maybe to early to agree on or
set in stone) is that we lower the NMU criteria for fixing (or
temporarily disabling) autopkgtest in ones reverse dependencies. In
the end, personally I don't think this is up to the "relevant
maintainers" but up to the release team. And I assume that badly
maintained autopkgtest will just be a good reason to kick a package
out of testing.
I already brought an example where autopkgtest ist well maintained but
keeps failing.
And I think that it is the package maintainers who have the experience
of whether a CI test failure is critical or not.
If the failure of the test is not critical, then it should not be used as a
gate for CI. Which means you, as the package maintainer who knows that this
test failure is not critical, should fix your autopkgtest to not fail when
the non-critical test case fails.
Again: astropy (as an example) has 9000 tests, designed by a quite large
upstream team. There is no realistic way to proactively decide which of
these are critical and which are not. In the end I would have to decide
whether the failure of any of those tests fails the autopkgtest, or
whether none of them does.
Post by Steve Langasek
Quite to the contrary of the claims in this thread that gating on
autopkgtests will create a bottleneck in the release team for overriding
test failures, this will have the effect of holding maintainers accountable
for the state of their autopkgtest results. CI tests are only useful if you
have a known good baseline. If your tests are flaky, or otherwise produce
failures that you think don't matter, then those test results are not useful
than anyone but yourself. Please help us make the autopkgtests useful for
the whole project.
I use autopkgtests for the majority of my packages, and they are very
useful.

Just to share my experience from the last few days: I uploaded a new
upstream version of astropy and also removed an embedded copy of pytest
(2.x) that had accidentally been overlooked before. This caused a number
of failures in reverse dependencies:

1. The new astropy version ignores the "remote_data" option in its test
infrastructure, which causes rdeps to fail if they have (optional)
remote tests which were previously disabled by this option.

This bug is clearly a regression in astropy. However, since it does not
affect the normal operation of the package but only the test
infrastructure, and there is a workaround for the affected packages, it
is not really RC.

2. Since the Debian version of pytest (which is 3.0.5) is now used,
deprecation warnings appear for the packages that use the astropy test
framework, causing their tests to fail. This is not a regression in
astropy, but it is also not RC for the affected rdeps: I am not even
sure whether one should allow-stderr here, since that would just
hide the problem. Instead, I would prefer filing bugs upstream. Since in
this specific case they do not expect the switch to 3.0.5 yet, this may
take some time. In the meantime, the packages are perfectly usable,
until pytest upstream decides to remove the deprecated functionality.
In my experience, a large share of CI test failures are not due to
bugs in the package itself, but in the test code. Also, especially if
one runs a large test suite, many of the tests check corner cases
which rarely appear in the ordinary use of the package. Sure, both
need to be fixed, but it is not RC: the package will work fine for most
people -- at least, in Debian we don't have a policy that *any*
identified bug is RC.

I prefer to have CI tests as an indicator of *possible* problems with
the package, so they should start to ring a bell *before* something
serious happens.
Post by Steve Langasek
And the incentive for maintainers to keep their autopkgtests in place
instead of removing them altogether is that packages with succeeding
autopkgtests can have their testing transition time decreased from the
default. (The release team agreed to this policy once upon a time, but I'm
not sure if this is wired up or if that will happen as part of Paul's work?)
Hmm. Often the CI tests are more or less identical to the build-time
tests, so passing CI tests are not really a surprise. And we are
discussing a different case here: that reverse dependencies need to have
passing CI tests.
Post by Steve Langasek
The result of the autopkgtest should be whatever you as the maintainer think
is the appropriate level for gating. Frankly, I think it's sophistry to
argue both that you care about seeing the results of the tests, and that you
don't want a failure of those tests to gate because they only apply to
"special cases".
I just want to keep the possibilities we have for other reported
problems: reassigning to the correct package, changing the severity,
forwarding upstream, keeping the discussion, etc. I just want them to be
ordinary bugs.

Again: we have a powerful system that shows what needs to be done with a
package, and that is bugs.d.o. Why re-invent the wheel and use something
less powerful here? Why should we handle a CI test failure
differently from any other bug report?
Post by Steve Langasek
Bear in mind that this is as much about preventing someone else's
package from silently breaking yours in the release, as it is about
your package being blocked in unstable.
I would propose (again) that a failing CI test would just create an RC
bug for the updated package, affecting the other one. Its maintainer can
then decide whether to solve this themselves, reassign it to the rdep,
or lower the severity. Since the other package's maintainer is
involved via "affects", this will not be done in isolation. If the
decision is questionable, the problem can be escalated like any other
bug.

Best regards

Ole
Sean Whitton
2017-01-15 22:00:29 UTC
Hello Steve,
Post by Steve Langasek
If the failure of the test is not critical, then it should not be used
as a gate for CI. Which means you, as the package maintainer who
knows that this test failure is not critical, should fix your
autopkgtest to not fail when the non-critical test case fails.
If we make it so that the only way to mark a test failure as
non-critical is to hack the test suite to exit zero anyway, we would
make it much less convenient to run non-critical tests on
ci.debian.net. Maintainers could no longer look for 'fail' to see
whether their non-critical tests have failed: they would have to open up
the test output.

I agree with the principle that test failures should be RC by default.
I think we need an additional field in d/tests/control to mark
individual tests as non-critical (this wouldn't really help Ole's 9000
tests package though).
--
Sean Whitton
Ole Streicher
2017-01-16 07:50:57 UTC
Post by Sean Whitton
I agree with the principle that test failures should be RC by default.
This is something which seems to have no disagreement here. My concern
is just that I want to have a simple way to override this, to assign
this to a different package etc. I want to have the same flexibility
here as for bugs.
Post by Sean Whitton
I think we need an additional field in d/tests/control to mark
individual tests as non-critical (this wouldn't really help Ole's 9000
tests package though).
While this is a really large test suite, it also wouldn't help for
others, since there are many packages with > 100 tests: aplpy has ~250
tests, astroml has ~210 tests, gnudatalanguage has ~170 tests, etc. Even
when the package itself is rather "small", the test count is quite
high -- the GitHub ecosystem with test coverage tools (and badges) helps
here, as well as powerful Python test packages.

Especially modern packages with a highly motivated upstream come with a
large number of tests (and they care about failures). This makes
autopkgtest a very valuable tool, but it shouldn't become a
maintainer's nightmare. And if I had to maintain test-specific
severities, that would be a nightmare, not only for me but also for
upstream, who would need to be involved as the knowledgeable party
here.

Best regards

Ole
Lars Wirzenius
2017-01-16 08:38:42 UTC
Post by Ole Streicher
Post by Sean Whitton
I agree with the principle that test failures should be RC by default.
This is something which seems to have no disagreement here. My concern
is just that I want to have a simple way to override this, to assign
this to a different package etc. I want to have the same flexibility
here as for bugs.
A failing test means there's a bug. It might be in the test itself, or
in the code being tested. It might be a bug in the test environment.

Personally, I'd really rather have unreliable tests fixed. Unreliable
tests are like playing Russian roulette: mostly OK, but sometimes you
get a really loud noise that makes your parents and loved ones ashamed
of you.

Picture this: a cocktail party. Many people mingling around, dressed
up and engaging in smalltalk, sipping colourful drinks. A new couple
arrives and is immediately surrounded by old friends. "Hi, Jack and
Joan, how are you? How is that lovely offspring of yours?" The couple
look down, and their faces get a careful, blank expression. "It's not
good. We don't know what we did wrong. We're so ashamed. We don't know
how such a thing could happen. We thought we were such good parents."
A shocked silence falls on the group, in the middle of the hubbub of
the greater party. "You see, our child, our child..." Jack sobs and
can't get the words out, so Joan takes a deep breath and speaks. "Our
child wrote a test that fails randomly, and released it." One by one
their friends leave the group, quietly, and without speaking a single
harsh syllable. But for months, they had to wait for an invitation to
a new party.

Apart from social exclusion, unreliable tests waste a lot of time,
effort, and mental energy. When a test fails, you have to find out
why. What caused the failure? Is it because the test is bad, or because
the code it tests is broken? If you let a test fail randomly, you have
to debug that test many times. It also kills confidence in the test
suite: if all tests pass, is that, too, just a random fluke? Can
you actually make a release, or should you do some tedious manual
testing just to make sure flaky test success didn't cover up a bug
somewhere?

Until an unreliable test is fixed, in my opinion it'd be better if the
test suite didn't fail because of it. Run the test by all means, to
gather more information for debugging, but don't fail the whole test
suite.
--
I want to build worthwhile things that might last. --joeyh
Simon McVittie
2017-01-16 09:35:25 UTC
Permalink
Post by Lars Wirzenius
A failing test means there's a bug. It might be in the test itself, or
in the code being tested. It might be a bug in the test environment.
Nobody is disputing this, but we have bug severities for a reason:
not every bug is release-critical. If we gated packages on "has no
known bugs" we'd never release anything.
Post by Lars Wirzenius
Personally, I'd really rather have unreliable tests fixed.
Of course, but it isn't always feasible to drop everything and fix an
unreliable test, or the bug that the test illustrates - the cause of an
intermittent bug is often hard to determine. Until that can happen, I'd
rather have the test sometimes or always fail, ideally reported as
XFAIL or TODO or something (distinguishing it from "significant"
failures), so I can use the information that it produces.

For example, several of the ostree tests intermittently failed for a
long time, which turned out to be (we think) a libsoup thread-safety
bug. If I had disabled those tests on ci.debian.net altogether, then
I wouldn't have been able to tell upstream "those tests have stopped
failing since fixing libsoup, so that fix is probably all we need".
Post by Lars Wirzenius
Apart from social exclusion, unreliable tests waste a lot of time,
effort, and mental energy.
Yes, and in an ideal world they wouldn't exist. This world is demonstrably
not ideal, and the code we release is not perfect (if it was, we wouldn't
need tests). Would you prefer it if packages whose tests are not fully
reliable just stopped running them altogether, or even deleted them?

I would very much prefer that we run tests, even the imperfect ones,
because CPU time is cheap and more information is better than less
information.

I've opened:

autopkgtest: define Restrictions for tests that aren't suitable for gating CI
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=851556

and sent a patch to:

autopkgtest: should be possible to ignore test restrictions by request
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=850494

in the hope that we can use those as a way to mark certain tests as
"failure is non-critical".

S
Ole Streicher
2017-01-16 09:24:59 UTC
Permalink
Hi Lars,
Post by Lars Wirzenius
Post by Ole Streicher
Post by Sean Whitton
I agree with the principle that test failures should be RC by default.
This is something which seems to have no disagreement here. My concern
is just that I want to have a simple way to override this, to assign
this to a different package etc. I want to have the same flexibility
here as for bugs.
A failing test means there's a bug. It might be in the test itself, or
in the code being tested. It might be a bug in the test environment.
Personally, I'd really rather have unreliable tests fixed. Unreliable
tests are like playing Russian roulette: mostly OK but sometimes you
get a really loud noise that makes your parents and loved ones be
ashamed of you.
I fully agree with you. I just think that an unfixed CI test failure is
not necessarily RC.

The point here is: the proposed plan is to make CI test failures in
reverse dependencies a direct migration blocker, which can only be
overridden by the release team.

I find this too inflexible, and propose that instead the failing CI test
should create an RC bug assigned to the updated package, affecting the
failing package. This would

* document the bug in the place where all bugs are documented, and keep
it in our eternal bug database, allowing us to search for it, etc.

* enable all the possibilities an open bug has, like discussion,
re-assignment, severity change etc.

* better link the d/changelog entry to the problem ("Increasing
tolerance in picky test. Closes: #123456" instead of "... to fix a CI
test failure with updated grampf package")
Post by Lars Wirzenius
Picture this: a cocktail party. Many people mingling around, dressed
up and engaging in smalltalk, sipping colourful drinks.
Nice picture :-)
Post by Lars Wirzenius
But for months, they had to wait for an invitation to a new party.
At least, I would not like to go to a cocktail party where the host
announces that he kicks people out for that reason. This should be the
decision of the parents, err, maintainers.

IMO, we should trust the maintainers and their decisions until
experience shows that it doesn't work. Which means: keep the maintainer
fully responsible for the package, including the ability to lower the
severity of a CI test failure or any other bug. Only if we find that
this doesn't work do we need other measures.

Best regards

Ole
Santiago Vila
2017-01-16 10:43:05 UTC
Permalink
Post by Ole Streicher
IMO, we should trust the maintainers and their decisions until
experience shows that it doesn't work. Which means: keep the maintainer
fully responsible for the package, including the ability to lower the
severity of a CI test failure or any other bug. Only if we find that
this doesn't work do we need other measures.
Well, it does not work:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=843038#10
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=841098#78

Thanks.
Ole Streicher
2017-01-16 11:17:48 UTC
Permalink
Post by Santiago Vila
Post by Ole Streicher
IMO, we should trust the maintainers and their decisions until
experience shows that it doesn't work. Which means: keep the maintainer
fully responsible for the package, including the ability to lower the
severity of a CI test failure or any other bug. Only if we find that
this doesn't work do we need other measures.
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=843038#10
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=841098#78
This comes out of different interpretations of whether builds that
sometimes fail (but usually don't) are RC buggy. You know that I have a
different opinion here.

So, if you really want your interpretation to be the commonly accepted
one, you should discuss it here and see whether it actually reaches
common acceptance.

Otherwise it is IMO OK if a package maintainer has her own idea of
whether an occasional build failure is RC.

Best regards

Ole
Santiago Vila
2017-01-16 12:23:43 UTC
Permalink
Post by Ole Streicher
Post by Santiago Vila
Post by Ole Streicher
IMO, we should trust the maintainers and their decisions until
experience shows that it doesn't work. Which means: keep the maintainer
fully responsible for the package, including the ability to lower the
severity of a CI test failure or any other bug. Only if we find that
this doesn't work do we need other measures.
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=843038#10
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=841098#78
This comes out of different interpretations of whether builds that
sometimes fail (but usually don't) are RC buggy. You know that I have a
different opinion here.
So, if you really want your interpretation to be the commonly accepted
one, you should discuss it here and see whether it actually reaches
common acceptance.
I think it should be the other way around, because Release Policy
already says "Packages must autobuild from source" and it does
not say anything about failure thresholds.

In fact, I've seen maintainers downgrading FTBFS bugs that happen more
than 50% of the time.

With the lax interpretation, we could have policy reversed, as in
"packages must not build from source", and the package
would still be policy compliant!

Schrödinger paradox! Packages are simultaneosly compliant with Release
Policy and with the reverse of Relese Policy!

This is why I can't trust (all) maintainers to do the right thing
regarding random FTBFS failures.

So, if you people are considering to put a piuparts-like blocking to
testing migration, please consider what will happen when the failure
happens randomly.

BTW: Your idea of an automatic RC bug would be a good start indeed,
and it's probably the least we should do.

Thanks.
Ian Jackson
2017-01-16 13:40:59 UTC
Permalink
Post by Santiago Vila
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=843038#10
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=841098#78
I agree that there are sometimes problems with people wanting to
"defend" their packages by unjustifiably downgrading bugs.

Better testing amounts to better discovery of bugs. Better automatic
handling of test results does indeed depend on getting the human
judgements right.

I don't think Paul's proposal is going to help solve this, but that just
means that Paul's proposal does not solve all problems.

The alternative (e.g. preventing maintainers from overriding test
failures) is worse: the tests will simply be removed.

Thanks,
Ian.
--
Ian Jackson <***@chiark.greenend.org.uk> These opinions are my own.

If I emailed you from an address @fyvzl.net or @evade.org.uk, that is
a private address which bypasses my fierce spamfilter.
Santiago Vila
2017-01-16 10:07:11 UTC
Permalink
Post by Lars Wirzenius
Picture this: a cocktail party. Many people mingling around, dressed
up and engaging in smalltalk, sipping colourful drinks. A new couple
arrives and is immediately surrounded by old friends. "Hi, Jack and
Joan, how are you? How is that lovely offspring of yours?" The couple
look down, and their faces get a careful, blank expression. "It's not
good. We don't know what we did wrong. We're so ashamed. We don't know
how such a thing could happen. We thought we were such good parents."
A shocked silence falls on the group, in the middle of the hubbub of
the greater party. "You see, our child, our child..." Jack sobs and
can't get the words out, so Joan takes a deep breath and speaks. "Our
child wrote a test that fails randomly, and released it." One by one
their friends leave the group, quietly, and without speaking a single
harsh syllable. But for months, they had to wait for an invitation to
a new party.
LOL, but I don't see a lot of social exclusion here:

https://bugs.debian.org/cgi-bin/pkgreport.cgi?users=***@debian.org;tag=ftbfs-randomly

Sometimes I've seen maintainers downgrade FTBFS bugs to "wishlist"!

Surely I will not invite those maintainers to a party, but they are
still maintaining Debian packages.

Should I ask the Technical Committee to rule that FTBFS bugs are RC,
even if they have not happened on buildd.debian.org yet?

Thanks.
Jonas Smedegaard
2017-01-16 11:30:12 UTC
Permalink
Quoting Santiago Vila (2017-01-16 11:07:11)
Post by Santiago Vila
Sometimes I've seen maintainers downgrade FTBFS bugs to "wishlist"!
Surely I will not invite those maintainers to a party, but they are
still maintaining Debian packages.
Should I ask the Technical Committee to rule that FTBFS bugs are RC,
even if they have not happened on buildd.debian.org yet?
Yes, please do!

I believe, however, that Ole was talking about non-critical bugs being
part of a test suite - not critical bugs wrongly downgraded.


- Jonas
--
* Jonas Smedegaard - idealist & Internet-arkitekt
* Tlf.: +45 40843136 Website: http://dr.jones.dk/

[x] quote me freely [ ] ask before reusing [ ] keep private
Adam Borowski
2017-01-16 12:22:12 UTC
Permalink
Post by Santiago Vila
Sometimes I've seen maintainers downgrade FTBFS bugs to "wishlist"!
Surely I will not invite those maintainers to a party, but they are
still maintaining Debian packages.
Should I ask the Technical Committee to rule that FTBFS bugs are RC,
even if they have not happened on buildd.debian.org yet?
I'd say that all FTBFS bugs should be RC -- the maintainer requested (even
if unintentionally) test failures to be fatal. If that's not your intent,
write ||: after "make check".
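
In a dh-style package that amounts to roughly the following in
debian/rules (a sketch only, with the recipe line indented by a tab;
whether hiding test failures like this is wise is exactly what is being
debated here):

   # debian/rules excerpt: run the tests, but never let them fail the build
   override_dh_auto_test:
           dh_auto_test || :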
--
Autotools hint: to do a zx-spectrum build on a pdp11 host, type:
./configure --host=zx-spectrum --build=pdp11
Russ Allbery
2017-01-16 20:02:32 UTC
Permalink
Post by Santiago Vila
Should I ask the Technical Committee to rule that FTBFS bugs are RC,
even if they have not happened on buildd.debian.org yet?
This seems excessively aggressive. I've had FTBFS bugs in my packages
that were due to specific configurations for archive mass rebuilds that
were not reproducible on buildds, and while those are certainly bugs that
I wanted to fix, I think making them RC is questionable.

See, for instance:

https://bugs.debian.org/830452 (which I shouldn't have closed)
https://bugs.debian.org/835677

I understand the frustration -- for instance, I closed that first bug when
I absolutely should have left it open, since it represented a fragile
test. (It's now fixed properly.) But I think making them RC instead is
an overreaction.

Remember, making a bug RC says that we're going to remove the package from
the archive if the bug isn't fixed. Suppose either of those had been
reported near the release freeze and I was, say, in the hospital or
something and simply couldn't look at them. Would the appropriate
reaction to either of the above bugs be to remove the software from the
release?

Note that I'm not arguing that these aren't bugs, or that they shouldn't
be a priority, just that FTBFS bugs that aren't reproducible on buildds
don't interfere with the release or with security support and therefore
I'm not sure the RC severity is justified. (Now, that said, flaky
failures that sometimes do fail on buildds *may* interfere with security
support, and therefore are, to my mind, much more serious.)
--
Russ Allbery (***@debian.org) <http://www.eyrie.org/~eagle/>
Santiago Vila
2017-01-16 21:00:42 UTC
Permalink
Post by Russ Allbery
Post by Santiago Vila
Should I ask the Technical Committee to rule that FTBFS bugs are RC,
even if they have not happened on buildd.debian.org yet?
This seems excessively aggressive.
No, really it's not. It's already current practice:

https://bugs.debian.org/cgi-bin/pkgreport.cgi?include=subject%3AFTBFS;submitter=lamby%40debian.org
https://bugs.debian.org/cgi-bin/pkgreport.cgi?include=subject%3AFTBFS;submitter=lucas%40debian.org
https://bugs.debian.org/cgi-bin/pkgreport.cgi?include=subject%3AFTBFS;submitter=sanvila%40debian.org

Are you suggesting that we should refrain from reporting FTBFS bugs as
serious unless we have a build log from buildd.debian.org in our hands?

I'm sure you are not, but I've seen people downgrade bugs "because
they do not happen in buildd.debian.org", and at the same time none of
them realize what would happen if we followed such a silly (and wrong)
rule consistently.
Post by Russ Allbery
I've had FTBFS bugs in my packages
that were due to specific configurations for archive mass rebuilds that
were not reproducible on buildds, and while those are certainly bugs that
I wanted to fix, I think making them RC is questionable.
Well, maybe what is excessively aggressive or questionable is running
the tests at build time and making the package build as a whole fail
when any test fails.

I have the feeling that this autopkgtest thing should be used (among
other things) to de-couple package builds from package testing.

Then people who test that packages build ok would have one less thing
to worry about.
Post by Russ Allbery
[...]
Remember, making a bug RC says that we're going to remove the package from
the archive if the bug isn't fixed. Suppose either of those had been
reported near the release freeze and I was, say, in the hospital or
something and simply couldn't look at them. Would the appropriate
reaction to either of the above bugs be to remove the software from the
release?
No, the appropriate reaction would be to disable the failing tests via
NMU until the maintainer exits the hospital and can investigate.

Thanks.
Markus Koschany
2017-01-16 22:45:42 UTC
Permalink
Post by Santiago Vila
Post by Russ Allbery
Post by Santiago Vila
Should I ask the Technical Committee to rule that FTBFS bugs are RC,
even if they have not happened on buildd.debian.org yet?
This seems excessively aggressive.
https://bugs.debian.org/cgi-bin/pkgreport.cgi?include=subject%3AFTBFS;submitter=lamby%40debian.org
https://bugs.debian.org/cgi-bin/pkgreport.cgi?include=subject%3AFTBFS;submitter=lucas%40debian.org
https://bugs.debian.org/cgi-bin/pkgreport.cgi?include=subject%3AFTBFS;submitter=sanvila%40debian.org
Are you suggesting that we should refrain from reporting FTBFS bugs as
serious unless we have a build log from buildd.debian.org in our hands?
I'm sure you are not, but I've seen people downgrade bugs "because
they do not happen in buildd.debian.org", and at the same time none of
them realize what would happen if we followed such a silly (and wrong)
rule consistently.
[...]

No, this is not current practice. But you are obviously trying to force
it this way by all means necessary. Nobody is asking you to refrain from
reporting those kinds of bugs, but what I and other people are seriously
questioning is your handling of severity levels. You always assume RC
severity even when it is proven that the package works and builds fine
for the majority of people. You don't care what maintainers think about
the issue. Many people, me included, get annoyed and then resolve this
"issue" by disabling the responsible test and focusing on more pressing
matters. There is nothing wrong per se with tests which try to catch
_real life_ issues, though.

How can this be in the best interest of users and developers? First of
all I think your test environment is fundamentally flawed. It is
possible to make every package in the archive fail to build from source
by choosing extremely unusual parameters. Tests and packages require a
certain amount of memory and a certain amount of disk space. Tests make
assumptions about what is to be expected in a real life environment.
Nobody in his right mind would agree with me that a build failure due to
low memory on a user's machine is RC when the buildds and 99,9 % of all
standard computers are able to compile the package.

Should this become the standard in Debian, then I would at least expect
that we define some sort of reference system (in terms of hardware
specs) against which these rebuilds are run. In my opinion the buildd
network is a reasonable candidate. A randomly emulated environment is not.
Michael Biebl
2017-01-16 23:31:45 UTC
Permalink
Post by Markus Koschany
No, this is not current practice. But you are obviously trying to force
it this way by all means necessary. Nobody asks you from refraining to
report those kind of bugs but what I and other people are seriously
questioning is your handling of severity levels.
Yup, I'm definitely annoyed by how aggressively Santiago is handling
this. He's not helping anyone with this even if he has well-meaning
intentions.
--
Why is it that all of the instruments seeking intelligent life in the
universe are pointed away from Earth?
Santiago Vila
2017-01-16 23:29:29 UTC
Permalink
Post by Markus Koschany
No, this is not current practice. But you are obviously trying to force
it this way by all means necessary. Nobody asks you from refraining to
report those kind of bugs but what I and other people are seriously
questioning is your handling of severity levels.
Sorry, no. You downgraded "missing build-depends"-type bugs several
times, and somebody else had to tell you that they were RC.

Example: gnupg. You did not believe that gnupg was not build-essential and
argued and argued and argued until a Release Manager told you clearly
that missing build-depends are RC.

There was also a missing build-conflicts bug that you downgraded and
somebody else had to tell you that it was wrong as well.

So it's not me who is handling severities wrong.
Post by Markus Koschany
You always assume RC
severity even when it is proven that the package works and builds fine
for the majority of people.
No. I assume RC when it is an FTBFS bug and I can reproduce it on
several different computers.

There is no such thing as a "majority of people" when your single and
only source for "buildability" is buildd.debian.org.

A successful build in buildd.debian.org means *nothing*.

Buildds may have packages installed which are not build-essential.

Buildds may be running jessie while I am already running stretch.

Etc.
Post by Markus Koschany
You don't care what maintainers think about
the issue. Many people, me included, get annoyed and then resolve this
"issue" by disabling the responsible test and focus on more pressing
matters. There is nothing wrong with tests per se which try to catch
_real life_ issues though.
Sorry, it is not responsible at all to have a flaky test make the
whole build fail.

If you get annoyed by flaky tests making the build fail, do not let
the test make the build fail, but don't blame me for the annoyance
when the test fails.

Thanks.
Russ Allbery
2017-01-17 01:45:19 UTC
Permalink
Post by Santiago Vila
https://bugs.debian.org/cgi-bin/pkgreport.cgi?include=subject%3AFTBFS;submitter=lamby%40debian.org
https://bugs.debian.org/cgi-bin/pkgreport.cgi?include=subject%3AFTBFS;submitter=lucas%40debian.org
https://bugs.debian.org/cgi-bin/pkgreport.cgi?include=subject%3AFTBFS;submitter=sanvila%40debian.org
Are you suggesting that we should refrain from reporting FTBFS bugs as
serious unless we have a build log from buildd.debian.org in our hands?
No, I'm suggesting that you should continue to report FTBFS as serious,
but if the maintainer downgrades the bug to important because it's not
reproducible on the buildds and seems to be an artifact of the test
rebuild environment and they don't have time to fix it immediately, you
should at least consider whether that's possibly a legitimate response
depending on the specific situation. And that one should at least take a
look at such bugs, ideally, before letting them auto-remove packages from
testing (although I understand that no one really has much time to do
that).

I cannot over-stress how demoralizing it is to have your packages removed
from the archive right before a release because you didn't have time to
fix a bug like this due to life reasons. I am all in favor of continually
ratcheting up the quality expectations that we have for Debian packages,
but please be sensitive to whether the specific bug you've discovered is
*really* release-critical, in the sense that the package is going to cause
some problem or not be maintainable in a stable release.

For many, many FTBFS bugs, the answer is yes, it's release-critical. But
I don't think that's true for every instance of someone attempting to
build a Debian source package and having it fail.
Post by Santiago Vila
I'm sure you are not, but I've seen people downgrade bugs "because they
do not happen in buildd.debian.org", and at the same time none of them
realize what would happen if we followed such a silly (and wrong) rule
consistently.
I have sometimes downgraded such bugs because, as it turns out, the person
who reported the FTBFS bug was building in an unclean environment (stray
bad configuration files, stray partly-removed conflicting packages, etc.).
I want my packages to build everywhere, and I don't think there's been a
case of this where I've not managed to fix it, but I don't consider
ensuring that the package builds in absolutely any environment to be
release-critical.
Post by Santiago Vila
Well, maybe what is excessively aggressive or questionable is running
the tests at build time and making the package build as a whole fail
when any test fails.
*blink*.

I'm quite surprised that you would advocate not failing a build if tests
fail during the package build? I think that would be an awful way to
proceed. My packages have test suites for a reason. I do not want
packages to appear to successfully build if their tests are failing. That
may mean that the resulting binaries are nonfunctional or even dangerous.
Post by Santiago Vila
I have the feeling that this autopkgtest thing should be used (among
other things) to de-couple package builds from package testing.
autopkgtest is useful for adding additional tests of the built binaries,
but I don't believe it's intended as a replacement for build-time testing.
Maybe I've missed something?
Post by Santiago Vila
No, the appropriate reaction would be to disable the failing tests via
NMU until the maintainer exits the hospital and can investigate.
That would certainly be fine, and I'm signed up for every "please NMU my
packages" list I can find, but we both know that time to do this for all
packages is pretty short in the run-up to the release.
--
Russ Allbery (***@debian.org) <http://www.eyrie.org/~eagle/>
Santiago Vila
2017-01-17 08:02:28 UTC
Permalink
Post by Russ Allbery
Post by Santiago Vila
Well, maybe what is excessively aggressive or questionable is running
the tests at build time and making the package build as a whole fail
when any test fails.
*blink*.
I'm quite surprised that you would advocate not failing a build if tests
fail during the package build? I think that would be an awful way to
proceed. My packages have test suites for a reason. I do not want
packages to appear to successfully build if their tests are failing. That
may mean that the resulting binaries are nonfunctional or even dangerous.
Not exactly. I'm not advocating, as a general rule, not failing a build
if tests fail.

In this context, I refer specifically to flaky tests. What I call
questionable is keeping a flaky test that makes the build fail when the
test fails so often that it's clearly a badly designed test.

Or, alternatively, if the test fails a lot and is correctly designed,
it is questionable not to consider the bug RC.

Thanks.
Russ Allbery
2017-01-17 17:08:40 UTC
Permalink
Post by Santiago Vila
Not exactly. I'm not advocating not failing a build if tests fail
as a general rule.
In this context, I refer specifically to flaky tests. What I call
questionable is keeping a flaky test that makes the build fail when the
test fails so often that it's clearly a badly designed test.
Oh, sure, I'm in favor of disabling flaky tests if we can't fix them. My
experience is usually more that I'm leaving them on *because* I'm trying
to fix them and can't reproduce locally, or I think I've fixed it (but
actually haven't).

Some upstream test suites also make it a little difficult to disable a
single test without carrying a patch. (Hm, including mine....)
--
Russ Allbery (***@debian.org) <http://www.eyrie.org/~eagle/>
Paul Wise
2017-01-18 01:42:36 UTC
Permalink
Post by Russ Allbery
Post by Santiago Vila
In this context, I refer specifically to flaky tests. What I call
questionable is keeping a flaky test that makes the build fail when the
test fails so often that it's clearly a badly designed test.
Oh, sure, I'm in favor of disabling flaky tests if we can't fix them. My
experience is usually more that I'm leaving them on *because* I'm trying
to fix them and can't reproduce locally, or I think I've fixed it (but
actually haven't).
Some upstream test suites also make it a little difficult to disable a
single test without carrying a patch. (Hm, including mine....)
I would expect most upstream test frameworks to support marking tests as
flaky, which usually means they always get run and their results printed,
but their outcome never causes a build failure.
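
For example, with pytest a known-intermittent test can be marked as an
expected failure so that it still runs and is reported but never fails
the build; a minimal sketch with a made-up test:

   import random

   import pytest

   # With strict=False (the default), a failure is reported as "xfail" and
   # an unexpected pass as "xpass"; neither fails the test run.
   @pytest.mark.xfail(reason="intermittent, upstream fix pending", strict=False)
   def test_sometimes_races():
       # stand-in for a racy assertion
       assert random.random() < 0.9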
--
bye,
pabs

https://wiki.debian.org/PaulWise
Ian Jackson
2017-01-18 11:35:32 UTC
Permalink
Post by Paul Wise
Post by Russ Allbery
Oh, sure, I'm in favor of disabling flaky tests if we can't fix them. My
experience is usually more that I'm leaving them on *because* I'm trying
to fix them and can't reproduce locally, or I think I've fixed it (but
actually haven't).
...
Post by Paul Wise
I would expect most upstream test frameworks support marking tests as
flaky, which usually means they always get run and results printed but
their outcome never causes a build failure.
It might also be that the flakiness affects all tests.

For example, there is a race in gnupg2's gpg-agent which makes the
dgit test suite fail sometimes. (#841143, reported in October; now at
last being worked on.)

As it happens, I have chosen (for other reasons[1]) not to run the
test suite during package build, so this is never a FTBFS. But it
could easily have been a flaky FTBFS.

Ian.

[1] Mainly, that the test suite is computationally intensive and has
an inconveniently large dependency set; this makes it
disproportionate, particularly given that the actual package build is
almost trivial.
--
Ian Jackson <***@chiark.greenend.org.uk> These opinions are my own.

If I emailed you from an address @fyvzl.net or @evade.org.uk, that is
a private address which bypasses my fierce spamfilter.
Barry Warsaw
2017-01-18 20:45:52 UTC
Permalink
Post by Russ Allbery
autopkgtest is useful for adding additional tests of the built binaries,
but I don't believe it's intended as a replacement for build-time testing.
Maybe I've missed something?
No, I think you're exactly right. If an upstream provides unit tests, those
are totally appropriate to run at build time (and to fail the build if they
fail), but they may not be appropriate to run in autopkgtest. autopkgtests
should be reserved for larger suitability tests on the built and installed
package.

An example might be a Python library's test suite. It makes sense to run
these at build time because that's usually when upstream will run them
(i.e. during development of the package). But since the test suite usually
isn't run on a built package, it shouldn't be autopkgtested. The environments
for build tests and autopkgtests are importantly different, e.g. the former
does not/should not allow access to the internet while the latter can and
sometimes must. A good example of an autopkgtest would be an import test for
a Python module, i.e. once the package is built and installed, does it import?
In fact, autodep8 will automatically add import tests for Python modules in a
safe way (by cd'ing to a temporary directory first).
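
Written by hand, such an import test for a hypothetical python3-foo
package would be roughly the following in debian/tests/control (autodep8
generates something along these lines automatically, covering all
supported Python versions):

   Test-Command: cd "$AUTOPKGTEST_TMP" && python3 -c "import foo; print(foo)"
   Depends: python3-foo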

There are occasionally good reasons why an upstream's test suite can't be run
at build-time, and in those few cases I will run them in an autopkgtest. But
generally, I think the two are there to test different aspects or lifecycles
of the package.

Cheers,
-Barry
Ian Jackson
2017-01-18 22:49:15 UTC
Permalink
Post by Barry Warsaw
There are occasionally good reasons why an upstream's test suite can't be run
at build-time, and in those few cases I will run them in an autopkgtest. But
generally, I think the two are there to test different aspects or lifecycles
of the package.
This depends very much on the nature of the program.

If it's a self-contained library or tool, full of algorithms and with
few entanglements with dependencies, and the upstream test is mostly
algorithmic "right answer" tests, then what you say is true.

But much software that we have is not like that at all. dgit is a
very good example here.

dgit has many dependencies and is easily broken by "unexpected
behaviours" in those dependencies. Most tests uin the test suite
exercise a mixture of code in dgit, and the dependencies, and the
interaction between those.

There isn't really a distinction between things that are useful to
test during development, and things that are useful as autopkgtests.


Even in software that looks very algorithmic it can be useful to use
the upstream test suite as the autopkgtest. I recently packaged Simon
Tatham's exact real calculator, `spigot'.

spigot's build system as supplied by Simon runs a moderately extensive
test suite.

spigot depends on gmp. If there were some bug in certain gmp
functions, that gmp bug might cause a regression in spigot. How would
I detect this? Well, I can publish the spigot test suite as an
autopkgtest test. I haven't checked, but I imagine that Simon's test
suite tests most of the interesting codepaths in spigot, so it
probably also tests most of the interesting gmp calls that spigot
makes.


My view is that for most packages, if the upstream test suite _can_ be
made to run against the installed version, and isn't annoying in some
way, it should be advertised as an autopkgtest.

One may want to add other autopkgtests. For example, I added another
test for spigot, to check that it is actually using the gmp
integration, rather than its fallback internal bignum library. AFAICT
this is actually a check that could be done at build time, with
spigot's current fallback behaviour, but autopkgtest is a convenient
place to add Debian-specific tests, and it will spot if spigot gets
dynamic fallback.

Ian.
--
Ian Jackson <***@chiark.greenend.org.uk> These opinions are my own.

If I emailed you from an address @fyvzl.net or @evade.org.uk, that is
a private address which bypasses my fierce spamfilter.
Ian Jackson
2017-01-16 13:42:27 UTC
Permalink
Post by Lars Wirzenius
Until an unreliable test is fixed, in my opinion it'd be better if the
test suite didn't fail because of it. Run the test by all means, to
gather more information for debugging, but don't fail the whole test
suite.
autopkgtest can report individual test failures without "failing the
whole test suite".

New functionality is needed to be able to do this in cases where
many test results are produced by one upstream script.

An override mechanism ought to operate on individual autopkgtest
tests, at least by default. (Maybe it ought to operate on glob
patterns.)
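
The existing building block for this is splitting the suite into
separately named stanzas in debian/tests/control, so that each gets its
own pass/fail result (a sketch, with invented test script names):

   Tests: unit-tests
   Depends: @

   Tests: integration
   Depends: @
   Restrictions: allow-stderr

An override mechanism could then refer to those names (or to glob
patterns over them).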

Ian.
--
Ian Jackson <***@chiark.greenend.org.uk> These opinions are my own.

If I emailed you from an address @fyvzl.net or @evade.org.uk, that is
a private address which bypasses my fierce spamfilter.
Ian Jackson
2017-01-16 13:38:05 UTC
Permalink
Post by Steve Langasek
If the failure of the test is not critical, then it should not be used as a
gate for CI. Which means you, as the package maintainer who knows that this
test failure is not critical, should fix your autopkgtest to not fail when
the non-critical test case fails.
You seem to be suggesting that in the case of
* tests which expose non-RC bugs in the main code or its dependencies
* broken tests, but not the most important test cases
the test should be suppressed: ie, that it should not be run (or, it
should be nobbled to exit "successfully" despite actually failing).

I disagree. Information from such tests is useful and should be
properly recorded and handled.

I also disagree with the proposition that the information that a failure
of test X in package Y is caused by a non-RC bug (and should not impede
migration) should be recorded inside the source package Y.

We need to be able to adjust whether a test blocks migration without
uploading new versions of packages. That information should live in the
bug system.
Post by Steve Langasek
Quite to the contrary of the claims in this thread that gating on
autopkgtests will create a bottleneck in the release team for overriding
test failures, this will have the effect of holding maintainers accountable
for the state of their autopkgtest results. CI tests are only useful if you
have a known good baseline. If your tests are flaky, or otherwise produce
failures that you think don't matter, then those test results are not useful
to anyone but yourself. Please help us make the autopkgtests useful for
the whole project.
CI tests are useful for purposes other than controlling testing
migration.
Post by Steve Langasek
The result of the autopkgtest should be whatever you as the maintainer think
is the appropriate level for gating. Frankly, I think it's sophistry to
argue both that you care about seeing the results of the tests, and that you
don't want a failure of those tests to gate because they only apply to
"special cases".
This is IMO a silly argument. We always release Debian with bugs.

CI failures that represent non-RC bugs are useful information. Such
failures should be brought to the attention of a human so that the
human can decide whether the failure is RC (or take other appropriate
action).

You are getting dangerously close to the notion that in a
well-functioning organisation the test suite will nearly always
completely pass.
Post by Steve Langasek
Why would you mark them non-blocking /before/ you know that the tests are
flaky enough for this to matter? Upstream put them in the test suite for a
reason. I'd suggest that it's much better to block by default, and if you
find that a particular test is becoming a problem (for you or for another
maintainer), you can upload to make that test non-blocking.
I don't think anyone is arguing the reverse.

The question is whether marking a test non-blocking should involve the
release team. I think it should not. It should involve the package
maintainer (unless there is disagreement).

We want to incentivise people to provide tests. If they cannot
control what action is taken (by automation) in response to the tests,
they will remove or disable (or not provide) tests.

Ian.
--
Ian Jackson <***@chiark.greenend.org.uk> These opinions are my own.

If I emailed you from an address @fyvzl.net or @evade.org.uk, that is
a private address which bypasses my fierce spamfilter.
Niels Thykier
2017-01-16 17:30:00 UTC
Permalink
Post by Ian Jackson
[...]
The question is whether marking a test non-blocking should involve the
release team. I think it should not. It should involve the package
maintainer (unless there is disagreement).
We want to incentivise people to provide tests. If they cannot
control what action is taken (by automation) in response to the tests,
they will remove or disable (or not provide) tests.
Ian.
Having autopkgtests gate testing migration comes with a promise from the
release team's side of reducing the age delay for migrations[1], i.e.
packages with passing tests that do not cause regressions[2] in reverse
dependencies would be entitled to a shorter migration delay.

Personally, I think autopkgtests gating should eventually replace the
age delay in general. Notably, I remember Bdale saying at DC13 that the
age delay is basically only really useful for finding brown-paper-bag
bugs[3] and I am inclined to agree with that.
Mind you, it will probably be several releases before we are at a
stage where we are ready for completely eliminating age delays for
autopkgtest-enabled packages.

I would prefer setting it up, as we decided 3-4 years ago, in a
non-enforcing mode to see how it all works out. Once we have ironed out
the early implementation bugs and have seen how it works in practice, we
can look at enabling the "blocking" feature of this proposal.

In summary:

* We will introduce it in a non-enforcing mode to see how it works
(and weed out any "early-implementation bugs")
* Passing tests will be grounds for reduced age requirements (once it
has been tested)
* Only regressions will be blockers; if the tests also fail in testing
the migration will not be stalled (but it will be subject to full
age delay)

Thanks,
~Niels

[1] https://lists.debian.org/debian-devel-announce/2013/08/msg00006.html

[2] The original mail says "failures" would be blockers, but in practice
Britney has always blocked on "regressions" rather than plain failures,
i.e. a test that also fails in testing does not block.

[3] It is in one of the video talks from DC13 - I /think/ it was the
release team talk/bits, where we were debating reducing the default age
requirement from 10 to 5 days.
Ian Jackson
2017-01-16 17:58:12 UTC
Permalink
Post by Niels Thykier
* We will introduce it in a non-enforcing mode to see how it works
(and weed out any "early-implementation bugs")
* Passing tests will be grounds for reduced age requirements (once it
has been tested)
* Only regressions will be blockers; if the tests also fail in testing
the migration will not be stalled (but it will be subject to full
age delay)
This would be good.

But I do think maintainer control, by way of filing bugs (RC or
otherwise) which they explicitly declare to be the cause of individual
test failures, would be a good addition.

Ian.
--
Ian Jackson <***@chiark.greenend.org.uk> These opinions are my own.

If I emailed you from an address @fyvzl.net or @evade.org.uk, that is
a private address which bypasses my fierce spamfilter.
Niels Thykier
2017-01-16 17:52:00 UTC
Permalink
Post by Ole Streicher
Post by Ole Streicher
What is the reason not to use automated bug reports here? This would
allow to use all the tools the bug system has: severities, reassigning
closing etc.
[...]
I already don't understand this with the piuparts blocker: we have an
established workflow for problems with packages that need some
intervention, and this is bugs.d.o. This has a lot of very nice features:
* discussion of the problem attached to the problem itself and stored
for reference
* formal documentation of problem solving in the changelog (Closes: #)
* severities, tags, re-assignments, affects etc.
* maintainer notifications, migration blocks, autoremovals etc.
* documented manual intervention possible
I don't see a feature that one would need for piuparts complaints or for
CI test failures that is not in our bug system. And (I am not sure)
aren't package conflict bugs already autogenerated?
I would really prefer to use the bug system instead of something else.
There exists no "auto-bug-filing" tool that people approve of for this
kind of purpose. You are very welcome to introduce such a tool - I
would be happy to see it for FTBFS regressions on buildds.
In the absence of such a tool, existing and future QA checks for
gating will be implemented directly in Britney. Mind you, even if such
an auto-bug-filing tool were created, there will always be control checks
in Britney that will only be overridable by the release team.

Personally, I do not have the capacity to create such a tool. Instead,
I have been working on making Britney's migration policy decisions
available in a machine-parsable format (available from [1]).
It is still WIP[2], but it does include piuparts, aging and RC bug
blockers. It would also include autopkgtest information once we
add that.
Ideally, I would finish that up and have it integrated into
tracker.d.o and/or UDD (related dep: DEP-2?).
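
As a rough sketch of what an importer might do with it (the key names
used here are assumptions about the current draft format, which, as
noted below, is still being tweaked):

   import urllib.request

   import yaml  # python3-yaml

   URL = "https://release.debian.org/britney/excuses.yaml"

   with urllib.request.urlopen(URL) as f:
       excuses = yaml.safe_load(f)

   # One entry per source package britney considered; "sources",
   # "is-candidate" and "reason" are assumed key names.
   for item in excuses.get("sources", []):
       if not item.get("is-candidate", False):
           print(item.get("source"), "is blocked:", item.get("reason"))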

Thanks,
~Niels

[1] https://release.debian.org/britney/excuses.yaml

[2] If you are considering importing this data set, please let me know.
We still tweak the format/values now and then, so it is better if I can
notify you in advance, rather than break your importer without knowing it.

Related note: Feedback from prospective importers welcome (please follow
up in a separate thread).
Ian Jackson
2017-01-14 12:58:14 UTC
Permalink
Post by Paul Gevers
Post by Ole Streicher
Post by Ian Jackson
* Increasing the migration delay for the affecting package
I like this and will suggest it to the release team. Especially for the
start up time.
I definitely think we should start with this. It provides a good
incentive to add tests to one's package: namely, advance notice of
problems which occur with newer dependencies.

But there are a lot of things that I think we are going to have to
work out. Some of them have been mentioned in this thread.

At the moment we (Debian) (1) have very little experience of how
autopkgtests will work in practice, and (2) haven't really tackled any of
the social questions. The Ubuntu experience is valuable for (1), but
Ubuntu has a very different social structure, so it doesn't tell us much
about (2).

Questions which will come to the fore include: if a new version of a
core package A breaks an "unimportant" leaf package B, such that B
becomes RC-buggy, is that an RC bug in A? The only coherent answer
is "yes" but if B is just "too wrong" or unfixable, at some point
something will have to give. I think our social structures will come
under some additional strain.
Post by Paul Gevers
One can always file bug reports against the release.debian.org pseudo
package to ask for britney to ignore the autopkgtest result.
I think that if autopkgtests are a success, there will be far too much
of this for the release team to be involved in first-line response.

Since the autopkgtests are controlled by the depending package, I
suggest that there should be a way for the depending package
maintainer to provide this information and control the way the tests
affect migrations.

The information would need to be kept outside the depending package's
source tree, in some kind of management system, because
uploads are disruptive in this context. We could use the BTS: one way
would be for the autopkgtest analyser to look for a bug with a new
kind of tag "this bug causes broken tests". Ideally there would be a
way to specify the specific failing tests.
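
With today's BTS that could be approximated with usertags; a sketch of a
control@bugs.debian.org message, in which the user address, the usertag
name, the bug number and the affected source package are all invented
for illustration:

   user debian-ci@lists.debian.org
   usertags 123456 + breaks-autopkgtest
   affects 123456 + src:bar
   thanks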

If the bug is actually in the dep package, but the maintainer of the
rdep with the failing tests wants it not to block migration of the
dep, they would still file a bug against the rdep and mark it blocked
in the bts by the bug in the dep.

This way our existing rule that the maintainer of a package is (at
least in the first instance) in charge of the bugs against their
package extends naturally to giving the rdep first instance control
over migration of deps which cause test failures.

That is consistent with the principle of providing an incentive for
adding tests. It also provides a way to work around broken tests other
than throwing the package out of the release. That is very
important because otherwise adding tests is a risky move: your package
might be removed from testing as a result of your excess of zeal.

The release team would become involved if the dep maintainer and the
rdep maintainer disagree. I.e., if the dep maintainer wants such a
"broken test" bug to exist, and the rdep maintainer does not, then
the rdep maintainer would ask ***@. The existing principle that
the release team are the first escalation point for disagreements
about testing migration (currently, RC bug severity) extends naturally
to this case.
Post by Paul Gevers
Post by Ole Streicher
What is the reason not to use automated bug reports here? This would
allow to use all the tools the bug system has: severities, reassigning
closing etc.
The difficulty with automated bug reports is this: how do you tell
whether something is the same bug or not?

If you're not careful, a test which fails 50% of the time will result
in an endless stream of new bugs from the CI system which then get
auto-closed...

(If there are bugs, we want them to auto-close because no matter how
hard we try, test failures due to "weather" will always occur some of
the time. Closing such bugs by hand would be annoying.)

Thanks,
Ian.
Colin Watson
2017-01-14 00:50:40 UTC
Permalink
Post by Simon McVittie
Post by Ian Jackson
* Increasing the migration delay for the affecting package
* Notifying the affected package maintainers
I think this makes sense: it gives the maintainer and other interested
developers some time to assess whether the failure is a showstopper
(=> upload a fix/revert/workaround, or at worst file a RC bug) or not.
Or, conversely, blocking migrations but letting the relevant maintainers
remove that block might work.
Agreed (if any of this is practical in britney).
Post by Simon McVittie
Post by Ian Jackson
Probably the simplest way to avoid problems with systems like this is to
remove any autopkg tests your packages are shipping.
[...]
Post by Ian Jackson
P.S. Perverse incentives FTW.
This is my concern too. If you maintain a useful package with tests that
are unstable but do not imply a release-critical issue, running those
tests and recording-but-ignoring the failures seems considerably better
for Debian than either disabling the tests or removing the software
(more information is better than less information, and if the package
is useful despite the test failure then it's better to have it than not).
Possible autopkgtest extension: "Restrictions: unreliable"?
May as well just use "Restrictions: allow-stderr" and "... || true".
That's easier to do on a more fine-grained level, too.
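
Concretely, for a hypothetical python3-foo whose upstream suite is too
fussy to gate on, that might look like this in debian/tests/control (a
sketch; the module name is invented):

   Test-Command: python3 -m pytest --pyargs foo || true
   Depends: python3-foo, python3-pytest
   Restrictions: allow-stderr

The result is still recorded in the log, but a failure never marks the
test as failed.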
--
Colin Watson [***@debian.org]
Ole Streicher
2017-01-14 10:13:15 UTC
Permalink
Post by Colin Watson
Post by Simon McVittie
Possible autopkgtest extension: "Restrictions: unreliable"?
May as well just use "Restrictions: allow-stderr" and "... || true".
That's easier to do on a more fine-grained level, too.
As in my deprecation example: I *want* autopkgtest to fail when
a deprecation warning appears. I also want to keep the failure visible
until it is solved, so I would not like to just override it.

It is just a non-critical CI test failure.

BTW, this was just the simplest example. Others (in python-astropy, for
example) are internal checks that no warnings were emitted during a
certain test. These will fail if a deprecation warning pops up (even if
it is not written to stderr), but they are still non-critical.

If I had to limit the CI tests to critical ones, I would probably
switch them off completely: most of the failures I have experienced so
far are not critical at all. But this would be counterproductive.

Best regards

Ole
Martin Pitt
2017-01-16 12:01:44 UTC
Permalink
Hello all,

(I'm not subscribed, thus hand-crafting In-Reply-To:; please keep CC'ing me on
replies).

Ole Streicher [Fri, 13 Jan 2017 15:57:09 +0100]
Post by Ole Streicher
Will there be a way to override this for the maintainer? Otherwise I
would see the danger that a buggy reverse dependency CI test can prevent
an important update, for example if the reverse dependency uses a long
deprecated function that is now removed.
If you upload a new version of a library that removes a symbol, then all
reverse dependencies must get fixed in or removed from testing anyway. In this
scenario the new lib would already not propagate as the rdepends would FTBFS in
the binNMU against the new library SONAME (assuming that you did bump it).
OTOH, if you did not bump the SONAME, then this is an RC bug anyway which then
gets caught by the test.

For other scenarios which aren't already caught by britney's installability
checks (a change in behaviour which is not reflected in a changed ABI) we actually
do want the same: If we can catch a regression through a test, then it makes
zero sense to automatically land that regression in testing anyway -- the whole
point of this exercise is to allow us to land transitions with confidence and
sort out transitions in unstable *before* landing regressions in testing.

Martin
--
Martin Pitt | http://www.piware.de
Ubuntu Developer (www.ubuntu.com) | Debian Developer (www.debian.org)
Martin Pitt
2017-01-16 12:06:19 UTC
Permalink
Hello all,

Scott Kitterman [Fri, 13 Jan 2017 13:54:26 -0500]
Post by Scott Kitterman
Probably the simplest way to avoid problems with systems like this is to
remove any autopkg tests your packages are shipping.
P.S. Perverse incentives FTW.
No, that won't work at all. If you upload libfoo which regresses a reverse
dependency bar and bar's tests now fail, then removing libfoo's autopkgtests
won't help you *at all* in landing the new libfoo in testing. You'd need to
convince bar's maintainer to change/drop the test.

The carrot for adding tests is that the better they are, the harder you make it
for *other people* (i. e. your dependencies) to break your software. The stick
is that you then of course need to make/keep your own tests running so that you
can upload new versions of libfoo yourself.

So IMHO the incentives are quite right here.

Martin
--
Martin Pitt | http://www.piware.de
Ubuntu Developer (www.ubuntu.com) | Debian Developer (www.debian.org)
Scott Kitterman
2017-01-16 15:52:03 UTC
Permalink
Post by Martin Pitt
Hello all,
Scott Kitterman [Fri, 13 Jan 2017 13:54:26 -0500]
Post by Scott Kitterman
Probably the simplest way to avoid problems with systems like this is to
remove any autopkg tests your packages are shipping.
P.S. Perverse incentives FTW.
No, that won't work at all. If you upload libfoo which regresses a reverse
dependency bar and bar's tests now fail, then removing libfoo's autopkgtests
won't help you *at all* in landing the new libfoo in testing. You'd need to
convince bar's maintainer to change/drop the test.
The carrot for adding tests is that the better they are, the harder you make
it for *other people* (i. e. your dependencies) to break your software. The
stick is that you then of course need to make/keep your own tests running
so that you can upload new versions of libfoo yourself.
So IMHO the incentives are quite right here.
Of course if we just never allow anything into Testing, there's no risk of bad
software migrating.

This is going to take a lot of work. I see random failures routinely block
migrations in Ubuntu (postfix is a current example - there are two alleged
regressions that, to the extent they are valid, are completely unrelated to
anything that changed in the package).

The question is who's going to do the work? I don't see the release team
having tons of spare cycles to dive into the details of individual test
results and package failures and decide what's RC and what's not. The only
thing that scales to something the size of the Debian archive is to let the
maintainers decide.

Yes, some of them will abuse that authority, but that's true of anything.

Scott K
Barry Warsaw
2017-01-16 17:09:02 UTC
Permalink
Post by Scott Kitterman
This is going to take a lot of work. I see random failures routinely block
migrations in Ubuntu (postfix is a current example - there's two alleged
regressions that to the extent they are valid are completely unrelated to
anything that changed in the package).
The question is who's going to do the work? I don't see the release team
having tons of spare cycles to dive into the details of individual test
results and package failures and decide what's RC and what's not. The only
thing that scales to something the size of the Debian archive is to let the
maintainers decide.
Speaking only about our experiences in Ubuntu, I can anecdotally but
emphatically claim that gating promotions on passing autopkgtests has
dramatically improved the quality of the running end user systems.

It used to be that you had to be very careful about running and especially
dist-upgrading the current devel series. You never knew if something major
like the kernel or X would break, or even when minor breakages would be highly
inconvenient. It just wasn't safe without a lot of precaution.

But now I don't hesitate to run devel almost as soon as the new series opens.
That's not to say that serious breakage never happens; not everything is
tested of course, and stuff happens. But it's rare, maybe once or twice a
cycle for boot-to-desktop to break, or a package regression sneaks through. I
just have way way more confidence in the distro now that these tests block
promotion.

Yes, it can be more work at times, and it's not always easy to diagnose or
reproduce promotion problems. (I'm currently flummoxed by a systemd
regression triggered by a network-manager fix.) But I'd much rather have the
luxury of debugging these problems on a still-functioning system and without
also-frustrated users hammering me on IRC, with deadlines looming.

I think it does mean that maintainers will have to step up and take more
responsibility for nursing their packages through to promotion, but I also
think they are in a much better position to do so than J Random User who runs
an upgrade only to be left with a broken system or application.

One other point. I don't know how many folks run unstable (or in Ubuntu's
case, devel), but for most software I work on, few users really test
pre-release versions. As much as you plead with them, "hey, beta 3 is out,
please test!" they just won't for totally understandable reasons. So problems
arise *after* the final release because that's when people start to really
hammer on it, and integrate it with their own software, environments, and
workflows. That means day-to-day user testing just can't be all that reliable
because there are so few data points, and it's another reason why I think
automated testing/CI is so important. (It's also an investment over time; you
don't have to have 100% coverage from day one, but every new test can improve
overall quality just a little bit.) It's also why I feel it's important for
*me* to run unstable/devel. True, it's my day job, but I also feel a
responsibility to help ensure the little corner of stuff I use, care about,
and know about is in as good a shape as possible before it gets into the hands
of our users. I *want* to feel the pain before they do.

Cheers,
-Barry
Scott Kitterman
2017-01-16 17:24:08 UTC
Permalink
Post by Barry Warsaw
Post by Scott Kitterman
This is going to take a lot of work. I see random failures routinely block
migrations in Ubuntu (postfix is a current example - there's two alleged
regressions that to the extent they are valid are completely unrelated to
anything that changed in the package).
The question is who's going to do the work? I don't see the release team
having tons of spare cycles to dive into the details of individual test
results and package failures and decide what's RC and what's not. The only
thing that scales to something the size of the Debian archive is to let the
maintainers decide.
Speaking only about our experiences in Ubuntu, I can anecdotally but
emphatically claim that gating promotions on passing autopkgtests has
dramatically improved the quality of the running end user systems.
It used to be that you had to be very careful about running and especially
dist-upgrading the current devel series. You never knew if something major
like the kernel or X would break, or even when minor breakages would be
highly inconvenient. It just wasn't safe without a lot of precaution.
But now I don't hesitate to run devel almost as soon as the new series
opens. That's not to say that serious breakage never happens; not
everything is tested of course, and stuff happens. But it's rare, maybe
once or twice a cycle for boot-to-desktop to break, or a package regression
sneaks through. I just have way way more confidence in the distro now that
these tests block promotion.
Yes, it can be more work at times, and it's not always easy to diagnose or
reproduce promotion problems. (I'm currently flummoxed by a systemd
regression triggered by a network-manager fix.) But I'd much rather have
the luxury of debugging these problems on a still-functioning system and
without also-frustrated users hammering me on IRC, with deadlines looming.
I think it does mean that maintainers will have to step up and take more
responsibility for nursing their packages through to promotion, but I also
think they are in a much better position to do so than J Random User who
runs an upgrade only to be left with a broken system or application.
One other point. I don't know how many folks run unstable (or in Ubuntu's
case, devel), but for most software I work on, few users really test
pre-release versions. As much as you plead with them, "hey, beta 3 is out,
please test!" they just won't for totally understandable reasons. So
problems arise *after* the final release because that's when people start
to really hammer on it, and integrate it with their own software,
environments, and workflows. That means day-to-day user testing just can't
be all that reliable because there are so few data points, and it's another
reason why I think automated testing/CI is so important. (It's also an
investment over time; you don't have to have 100% coverage from day one,
but every new test can improve overall quality just a little bit.) It's
also why I feel it's important for *me* to run unstable/devel. True, it's
my day job, but I also feel a responsibility to help ensure the little
corner of stuff I use, care about, and know about is in as good a shape as
possible before it gets into the hands of our users. I *want* to feel the
pain before they do.
The before/after comparison for Debian and Ubuntu is apples and oranges.
Before Ubuntu had the auto package test migration, there was nothing other than
installability blocking migration; it didn't have (and still doesn't, AFAIK)
any notion of blocking due to RC bugs.

Back to my experience with postfix: I don't recall the auto package test
catching anything. When I upload it broken to unstable (and via autosync to
the Ubuntu devel release) people notice pretty much right away.

I'm sure it's generally helped, but so far, I've found it mostly a nuisance.
If Debian started enforcing auto package test passes for Testing migration, the
first thing I'd do is remove the postfix tests (they've never worked on Debian as
far as I've noticed, despite working intermittently on Ubuntu, and I've no
idea why). Postfix doesn't have rdepends, so at least for that package I can
sidestep the problem.

Scott K
Ian Jackson
2017-01-16 18:01:13 UTC
Permalink
Post by Scott Kitterman
I'm sure it's generally helped, but so far, I've found it mostly a
nuisance. If Debian started enforcing auto package test pass for
Testing migration,
Right now the plan is to have _passing tests_ (well, regressionless
ones) _reduce_ the migration delay. Failing tests would be the same
as no tests.
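
As a minimal sketch of that policy only - the day counts are made up here
and britney's real code and configuration differ - it amounts to:

    # Sketch: regression-free tests shorten the migration delay,
    # failing tests are treated the same as having no tests at all.
    DEFAULT_DELAY_DAYS = 5   # assumed default delay, illustrative only
    REDUCED_DELAY_DAYS = 2   # assumed reduced delay, illustrative only

    def migration_delay(test_results):
        """test_results maps an autopkgtest name to 'pass' or 'fail'."""
        if not test_results:
            return DEFAULT_DELAY_DAYS    # no tests: the normal delay
        if any(r == 'fail' for r in test_results.values()):
            return DEFAULT_DELAY_DAYS    # failing tests: same as no tests
        return REDUCED_DELAY_DAYS        # regression-free: reduced delay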

I do agree that there would be a temptation to remove troublesome
tests, rather than fix or debug them. That's why I'm suggesting that
fairly soon the maintainer should get to override the tests, so that
the test is not considered a blocker.

Ian.
--
Ian Jackson <***@chiark.greenend.org.uk> These opinions are my own.

If I emailed you from an address @fyvzl.net or @evade.org.uk, that is
a private address which bypasses my fierce spamfilter.
Barry Warsaw
2017-01-16 19:04:56 UTC
Permalink
Post by Ian Jackson
Right now the plan is to have _passing tests_ (well, regressionless
ones) _reduce_ the migration delay. Failing tests would be the same
as no tests.
One other important point for the Ubuntu infrastructure is that the
autopkgtests are a ratchet. IOW, if a test has *never* passed, its continued
failure won't block promotion. Only once a test starts passing and then
regresses will it block.
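
As a rough sketch of that ratchet (the names here are invented, not taken
from the proposed-migration code):

    # A failure only counts as a regression, and only then blocks
    # promotion, if that test has passed at least once before.
    def is_regression(history, current):
        """history: earlier results for one test; current: 'pass' or 'fail'."""
        return current == 'fail' and 'pass' in history

    def blocks_promotion(tests):
        """tests maps a test name to (history, current_result)."""
        return any(is_regression(history, current)
                   for history, current in tests.values())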

We have an "excuses" page that shows you what things look like. It could be
prettied-up, but it provides lots of useful information. It also includes a
retry button (the little three-arrow triangle) for people with the proper
permissions.

http://people.canonical.com/~ubuntu-archive/proposed-migration/update_excuses.html

Cheers,
-Barry
Colin Watson
2017-01-16 17:51:22 UTC
Permalink
Post by Scott Kitterman
The before/after comparison for Debian and Ubuntu is apples and oranges.
Before Ubuntu had the auto package test migration there we nothing other than
installability blocking migration, it had (and still doesn't AFAIK) any notion
of blocking due to RC bugs.
That's not quite true: I added support for considering bugs with a
"block-proposed" tag in October 2013. I agree that that's after
autopkgtest handling was added (June 2013), and that the block-proposed
tag is not as widely used as RC bugs.
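
Purely as an illustration of setting such a tag (the bug number and
application name below are made up, and this is just launchpadlib usage,
not part of the migration machinery itself):

    from launchpadlib.launchpad import Launchpad

    # Hypothetical example: log in and add the block-proposed tag to a bug.
    lp = Launchpad.login_with('block-proposed-example', 'production')
    bug = lp.bugs[1234567]
    if 'block-proposed' not in bug.tags:
        bug.tags = bug.tags + ['block-proposed']  # reassign so the change is saved
        bug.lp_save()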
--
Colin Watson [***@debian.org]