Post by Steve Langasek
Post by Ole Streicher
Post by Paul Gevers
One other thing that I can envision (but maybe too early to agree on or
set in stone) is that we lower the NMU criteria for fixing (or
temporarily disabling) autopkgtests in one's reverse dependencies. In
the end, personally I don't think this is up to the "relevant
maintainers" but up to the release team. And I assume that a badly
maintained autopkgtest will just be a good reason to kick a package
out of testing.
I already brought an example where the autopkgtest is well maintained but
keeps failing.
And I think that it is the package maintainers who have the experience
to judge whether a CI test failure is critical or not.
If the failure of the test is not critical, then it should not be used as a
gate for CI. Which means you, as the package maintainer who knows that this
test failure is not critical, should fix your autopkgtest to not fail when
the non-critical test case fails.
Again: astropy (as an example) has 9000 tests, designed by a quite large
upstream team. There is no realistic way to proactively decide which of
these are critical and which are not. In the end I would have to decide
whether to let any of those tests fail the autopkgtest, or none of them.
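For concreteness, the kind of per-test override that suggestion implies
would look roughly like the following sketch with pytest (the test names
are made-up placeholders, not real astropy tests):

  # conftest.py (sketch): keep known non-critical failures from gating
  # the test run; the names below are hypothetical.
  import pytest

  KNOWN_NONCRITICAL = {
      "test_some_corner_case",   # hypothetical name
      "test_optional_feature",   # hypothetical name
  }

  def pytest_collection_modifyitems(config, items):
      for item in items:
          if item.name in KNOWN_NONCRITICAL:
              # xfail records the failure without failing the whole run
              item.add_marker(pytest.mark.xfail(
                  reason="known non-critical failure", strict=False))

Maintaining such a list by hand for 9000 upstream tests is exactly what
I consider unrealistic.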
Post by Steve Langasek
Quite to the contrary of the claims in this thread that gating on
autopkgtests will create a bottleneck in the release team for overriding
test failures, this will have the effect of holding maintainers accountable
for the state of their autopkgtest results. CI tests are only useful if you
have a known good baseline. If your tests are flaky, or otherwise produce
failures that you think don't matter, then those test results are not
useful to anyone but yourself. Please help us make the autopkgtests useful
for the whole project.
I use autopkgtests for the majority of my packages, and they are very
useful.
Just to share my experience from the last few days: I uploaded a new
upstream version of astropy and also removed a bundled copy of pytest
(2.x) that had accidentally been overlooked before. This caused a
number of failures in reverse dependencies:
1. The new astropy version ignores the "remote_data" option in its test
infrastructure, which causes rdeps to fail if they have (optional)
remote tests that previously were disabled by this option.
This bug is clearly a regression in astropy. However, since it affects
only the test infrastructure and not the normal operation of the
package, and there is a workaround for the affected packages (sketched
below), it is not really RC.
2. Since the Debian version of pytest (3.0.5) is now used, deprecation
warnings appear for the packages that use the astropy test framework,
causing their tests to fail. This is not a regression in astropy, but it
is also not RC for the affected rdeps: I am not even sure whether one
should allow-stderr here, since this would just hide the problem.
Instead, I would prefer to file bugs upstream. Since in this specific
case upstream did not expect the switch to 3.0.5, this may take some
time. In the meantime, the packages are perfectly usable, until pytest
upstream decides to remove the deprecated code (the second sketch below
shows what merely hiding the warnings would look like).
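For the first point, the workaround I have in mind for the affected
rdeps is roughly the following (a sketch: it assumes their remote tests
carry the usual astropy "remote_data" pytest marker, and the package
name is a placeholder):

  # Run the installed test suite while deselecting tests that need
  # network access (assuming they are marked "remote_data").
  import sys
  import pytest

  sys.exit(pytest.main(["--pyargs", "some_rdep_package",  # placeholder
                        "-m", "not remote_data"]))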
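For the second point, merely hiding the warnings on the rdep side would
amount to something like this (a sketch; the module pattern is a
made-up placeholder), which is exactly the papering-over I would rather
avoid in favour of upstream bugs:

  # conftest.py (sketch): silence the deprecation warnings triggered by
  # the switch to pytest 3.0.5 so they do not end up on stderr.
  import warnings

  warnings.filterwarnings("ignore", category=DeprecationWarning,
                          module=r"astropy\.tests\..*")  # placeholder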
Post by Steve Langasek
From my experience, a large amount of CI test failures are not due to
bugs in the package itself, but in the test code. Then, especially if
one runs a large test suite, many of the tests check corner cases
which rarely occur in the ordinary use of the package. Sure, both
need to be fixed, but it is not RC: the package will work fine for most
people -- at least, in Debian we don't have a policy that *any*
identified bug is RC.
I prefer to have CI tests as an indicator of *possible* problems with
the package, so they should start to ring a bell *before* something
serious happens.
Post by Steve Langasek
And the incentive for maintainers to keep their autopkgtests in place
instead of removing them altogether is that packages with succeeding
autopkgtests can have their testing transition time decreased from the
default. (The release team agreed to this policy once upon a time, but I'm
not sure if this is wired up or if that will happen as part of Paul's work?)
Hmm. Often the CI tests are more or less identical to the build-time
tests, so that passing CI tests is not really a surprise. And we are
discussing a different case here: that reverse dependencies need to
have passing CI tests.
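For instance, a typical test script of this kind does little more than
re-run, against the installed package, the same suite that dh_auto_test
already ran at build time (a sketch; the exact entry point and its
return value are package-specific assumptions):

  # Exercise the installed copy with the package's own test suite.
  import sys
  import astropy

  # astropy.test() runs the test suite; treating its return value as a
  # pytest-style exit status is an assumption here.
  sys.exit(astropy.test())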
Post by Steve Langasek
The result of the autopkgtest should be whatever you as the maintainer think
is the appropriate level for gating. Frankly, I think it's sophistry to
argue both that you care about seeing the results of the tests, and that you
don't want a failure of those tests to gate because they only apply to
"special cases".
I just want to keep the possibilities we have for any other reported
problem: reassigning to the correct package, changing the severity,
forwarding upstream, keeping the discussion, etc. I just want them
treated as ordinary bugs.
Again: we already have a powerful system that shows what needs to be
done with a package, and that is bugs.d.o. Why re-invent the wheel and
use something less powerful here? Why should we handle a CI test
failure differently from any other bug report?
Post by Steve Langasek
Bear in mind that this is as much about preventing someone else's
package from silently breaking yours in the release, as it is about
your package being blocked in unstable.
I would propose (again) that a failing CI test should just create an RC
bug against the updated package, affecting the other one. Its
maintainer can then decide whether to solve it himself, reassign it to
the rdep, or lower the severity. Since the maintainer of the other
package is involved via "affects", this will not be done in isolation.
If the decision is questionable, the problem can be escalated like any
other bug.
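As a sketch, such an automatically filed bug could look like this (the
package names and version are made up):

  To: submit@bugs.debian.org
  Subject: package-a breaks the autopkgtest of package-b (example)

  Package: package-a
  Version: 1.2-1
  Severity: serious
  Control: affects -1 src:package-b

  The CI test of package-b started to fail after package-a 1.2-1 was
  uploaded; see the log on ci.debian.net.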
Best regards
Ole