Dave Mitchell
2024-02-25 14:30:47 UTC
For years we've had intermittent timeout smoke failures on two of Tie::File's
test files, 29a_upcopy.t and 29_downcopy.t.
Currently it would appear that they only fail (via test timeouts) on Arm
platform smokers. The failing test(s) vary.
The alarm timeout is per test, initially at 5 seconds; then increased by
me to 10s in Nov 2022, then to 20s for parallel builds by Yves in Feb 2023.
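For context, the timeout is just a SIGALRM guard wrapped around each test;
roughly along these lines (a simplified sketch of the mechanism, not the
literal code in those test files):

    # Simplified per-test alarm guard (illustrative sketch only).
    use strict;
    use warnings;

    my $TIMEOUT = 20;   # seconds per test; the current parallel-build value

    sub run_with_timeout {
        my ($name, $code) = @_;
        local $SIG{ALRM} = sub { die "test '$name' timed out after ${TIMEOUT}s\n" };
        alarm($TIMEOUT);
        my $ok = eval { $code->(); 1 };
        alarm(0);       # cancel the pending alarm whether the test passed or not
        warn $@ unless $ok;
        return $ok;
    }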
For comparison, on my fast Intel laptop on a debugging build,
29_downcopy.t takes 0.08s to run - that's about 300 microseconds per
test - yet a single test is taking more than 20s on a smoker and timing
out.
In terms of memory usage (again on my debugging build), just loading the
test script but running no tests has a memory footprint of 12.5MB;
running all the tests too increases that to 13.1MB.
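For anyone wanting to reproduce that sort of figure on Linux, reading
VmRSS out of /proc is one quick way (an assumed approach, shown only for
convenience; not necessarily how the numbers above were obtained):

    # Rough RSS check on Linux (assumes /proc; hypothetical helper).
    sub rss_kb {
        open my $fh, '<', '/proc/self/status' or return;
        while (<$fh>) {
            return $1 if /^VmRSS:\s+(\d+)\s+kB/;
        }
        return;
    }

    printf "RSS: %d kB\n", rss_kb();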
So it's really not using much memory or CPU. So why does the smoke fail?
I'm guessing at least one of the smokers is a Raspberry Pi based on the OS
("Raspbian GNU/Linux 10.13"). Now I'd expect an rPi to be slower, but that
much slower?
So the question really is:
is it reasonable for the tests to run that slowly, in which case we should
just disable the test files on slow platforms, or is something else afoot
here?
Does anyone have an rPi they could run those test scripts on and see how
much memory and CPU those two files use?
Each failing test does something like: create a test file consisting of a
few 8k blocks, then do some line inserts, which behind the scenes triggers
a few OS calls along the lines of lseek(i); read(8k block); lseek(j);
write(8k block). But overall, not really a lot of IO activity, nor on
large files. Even with tests running in parallel I wouldn't expect the CPU
to be over-burdened to that extent.
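To make that pattern concrete, each test boils down to something like this
(a simplified sketch of the workload, not a copy of any particular test;
the file name is made up):

    use strict;
    use warnings;
    use Tie::File;

    my $file = 'tf_demo.txt';

    # Build a test file spanning a few 8k blocks: 16 records of ~2k each.
    open my $fh, '>', $file or die "open: $!";
    print $fh ('x' x 2000) . "\n" for 1 .. 16;
    close $fh;

    # Tie the file to an array and splice new lines into the middle.
    # Behind the scenes Tie::File shifts the trailing blocks down with
    # lseek/read/lseek/write to make room for the inserted records.
    tie my @lines, 'Tie::File', $file or die "tie: $!";
    splice @lines, 8, 0, ('inserted line') x 4;
    untie @lines;
    unlink $file;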
--
"Strange women lying in ponds distributing swords is no basis for a system
of government. Supreme executive power derives from a mandate from the
masses, not from some farcical aquatic ceremony."
-- Dennis, "Monty Python and the Holy Grail"
"Strange women lying in ponds distributing swords is no basis for a system
of government. Supreme executive power derives from a mandate from the
masses, not from some farcical aquatic ceremony."
-- Dennis, "Monty Python and the Holy Grail"