Discussion:
[Bug 195458] New: Hang on shutdown/root unmount after FreeBSD 10.1R
b***@freebsd.org
2014-11-27 22:44:46 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=195458

Bug ID: 195458
Summary: Hang on shutdown/root unmount after FreeBSD 10.1R upgrade
Product: Base System
Version: 10.1-RELEASE
Hardware: Any
OS: Any
Status: New
Severity: Affects Many People
Priority: ---
Component: kern
Assignee: freebsd-***@FreeBSD.org
Reporter: ***@lifeforms.nl

Created attachment 149945
--> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=149945&action=edit
Screenshot of the hang

On three out of six FreeBSD installs upgraded from 10.0R to 10.1R, the system
freezes the first time the upgraded 10.1 system shuts down. After "All buffers
synced." the system remains at 100% CPU and makes no progress for a long time.
After a forced reset, the file system is dirty.

On another upgrade, the first reboot in 10.1 seemed to go fine. However, a
subsequent shutdown flashed the following error:

All buffers synced.
softdep_waitidle: Failed to flush worklist for 0xfffff800027b4330
unmount of / failed (BUSY)

also leaving the file system dirty. This seems related to the first problem
(see below).

--

Throughout the 10.1-RC cycle, various others have described the hang after
upgrading to 10.1 on the freebsd-stable mailing list:
https://lists.freebsd.org/pipermail/freebsd-stable/2014-October/080595.html

I spoke to another user on IRC who confirmed the hang with 10.1-RELEASE on two
physical servers.

Bug #195183 reports another hang at reboot after updating to 10.1 (although I
am not sure about the role of ipfw there; in any case, enabling or disabling
ipfw has no effect for me).

--

I reproduced the problem on a clean 10.0-RELEASE install in VMware after a
freebsd-update to 10.1-RELEASE. I snapshotted this VM after freebsd-update but
before rebooting it, so I can do experiments on it if needed.

It appears to me that the problem happens during unmounting of the UFS root
filesystem. If, after upgrading, I drop to single-user mode ("shutdown now") and
run the command "/sbin/mount -o ro /", it should normally succeed.
However, on a failed 10.1 machine, the CPU goes to 100% and the command never
finishes. Just like during a shutdown, the kernel is alive (e.g. the host
still answers pings), but it's not possible to recover from the situation.

I have not seen this problem on subsequent reboots of 10.1 systems, nor on
clean (non-upgraded) 10.1 installs, but I see it very consistently after a 10.0
to 10.1 upgrade.

It's a pretty big showstopper for me at this point, so please let me know if I
can provide more info from the test box or help in other ways.

--

Reproduce:
- download 10.0-RELEASE amd64 ISO
- create a VM in VMware
- install 10.0-RELEASE, UFS2, all defaults # problem happens on SU and SU+J
- freebsd-update upgrade -r 10.1-RELEASE
- freebsd-update install
- shutdown -r now
- freebsd-update install
- freebsd-update install # make a VM snapshot before continuing
- shutdown -r now # CPU goes to 100% after "All buffers synced."
--
You are receiving this mail because:
You are the assignee for the bug.
b***@freebsd.org
2014-11-29 22:38:48 UTC

--- Comment #1 from Walter Hop <***@lifeforms.nl> ---
I noticed today that the same problem happens when upgrading from 10.1-RC3 to
10.1-RELEASE, and also on a clean install of 10.1-RELEASE.

I did some more digging. As a refresher, the hang didn't occur merely from
booting the new kernel; it happened only after "freebsd-update install" was
executed to replace the userland. What is so special about "freebsd-update install"
that would trigger the problem?

I think the interesting bit might be that it replaced /sbin/init.

I can completely reliably trigger a hang on a default 10.1-RELEASE install on
UFS2 in VMware Fusion with the following procedure:

# chflags noschg /sbin/init
# cp -Rp /sbin/init /sbin/init2
# rm -f /sbin/init
# mv /sbin/init2 /sbin/init
# chflags schg /sbin/init
# reboot
=> Hang after "All buffers synced."
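At the filesystem level, the copy/rm/mv sequence above gives /sbin/init a brand-new inode, while the running init (PID 1) keeps executing the old inode, which is now unlinked but still in use. A minimal sketch that demonstrates the inode change in a scratch directory, assuming a POSIX sh with ls(1) and awk(1); the helper names inode_of and replace_by_copy are hypothetical, and the FreeBSD-only chflags steps are left out. Whether this unlinked-but-busy inode is what trips up soft updates at unmount is only a guess, not something established in this report.

```sh
#!/bin/sh
# Sketch: show that the copy/rm/mv replacement gives the path a new inode.
# Hypothetical helpers; run in a scratch directory, never on /sbin/init.

# Print the inode number of a file.
inode_of() {
    ls -i "$1" | awk '{ print $1 }'
}

# Replace a file with a copy of itself, as in the procedure above
# (minus the FreeBSD-specific chflags noschg/schg steps).
replace_by_copy() {
    cp -Rp "$1" "$1.new"   # the copy gets its own, different inode
    rm -f "$1"             # drop the original directory entry
    mv "$1.new" "$1"       # rename keeps the copy's inode
}

scratch=$(mktemp -d)
printf 'stand-in for /sbin/init\n' > "$scratch/init"

before=$(inode_of "$scratch/init")
replace_by_copy "$scratch/init"
after=$(inode_of "$scratch/init")

echo "inode before: $before, after: $after"
rm -rf "$scratch"
```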

I created two clean 10.1 UFS2 installs, and both exhibit the problem on every
attempt.

I tried doing the same on two clean 10.1 ZFS (auto) installs. ZFS does NOT
exhibit the problem so far.

I tried disabling softupdates (tunefs -n disable /dev/da0p2) before doing the
procedure. In this case, on multiple machines, it also does NOT hang.

On 10.0 there's also no problem.

So: on FreeBSD 10.1, when using UFS2 with journaled softupdates, replacing init
leads to a hang when rebooting/unmounting root afterwards.

And a workaround might be to disable softupdates before upgrading to 10.1.
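Since the reports point at soft updates (with or without journaling) on the root file system, it can help to check which mode / is using before upgrading. On FreeBSD, mount(8) lists "soft-updates" or "journaled soft-updates" among a UFS mount's options. A minimal sketch, assuming that output format; su_mode is a hypothetical helper name, and the sample line in the comment is illustrative rather than taken from this report.

```sh
#!/bin/sh
# Sketch: classify the soft-updates mode of a UFS mount from one line of
# FreeBSD mount(8) output, e.g.
#   /dev/da0p2 on / (ufs, local, journaled soft-updates)

su_mode() {
    case "$1" in
        # Check the journaled form first: it also contains "soft-updates".
        *"journaled soft-updates"*) echo "su+j" ;;
        *"soft-updates"*)           echo "su" ;;
        *)                          echo "none" ;;
    esac
}

# On a FreeBSD host one might call it on the root entry:
#   su_mode "$(mount | grep ' on / ')"
```

In this thread, "tunefs -n disable" was used to turn soft updates off entirely, while "tunefs -j disable" turns off only the journal.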
--
b***@freebsd.org
2014-12-13 10:34:29 UTC

--- Comment #5 from ***@abv.bg ---
I also experienced it. I freebsd-update upgraded two physical 10.0-RELEASE-p13
installs to 10.1-RELEASE-p1.

For the first, I didn't disable soft updates and it froze at "All buffers
synced." during the last reboot (as per [1]).

For the second, I disabled soft updates and everything went fine.

[1] https://www.freebsd.org/releases/10.1R/installation.html
--
b***@freebsd.org
2014-12-14 19:40:10 UTC

--- Comment #6 from Walter Hop <***@lifeforms.nl> ---
According to Jilles Tjoelker, the LORs (lock order reversals) are false
positives, so please disregard them.
--
b***@freebsd.org
2014-12-19 12:47:27 UTC

--- Comment #7 from ***@gmx.us ---
I have the same problem; however, I noticed the issue after installing patch 1
on a 10.1-RELEASE box that had been upgraded from 10.0-RELEASE weeks before.
I'm using UFS, not ZFS. More confounding, it doesn't always happen, and I
haven't been able to pin down the trigger. As with others above, this issue
only presents on one of two very similar installs (the installs are very
similar but the hardware differs; both are amd64).
--
b***@freebsd.org
2014-12-29 15:16:22 UTC

Davide Davini <***@gmail.com> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |***@gmail.com

--- Comment #8 from Davide Davini <***@gmail.com> ---
Same issue here on three 10.1R-p1 VMs during today's upgrade to p3. They were
recently upgraded to 10.1R from 10.0R. All three VMs run on ESXi 5.5, and all
three are clones of the first one I installed. We are talking about amd64
machines using UFS with soft updates on.
--
b***@freebsd.org
2015-01-05 09:55:50 UTC

***@blodan.se changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |***@blodan.se

--- Comment #9 from ***@blodan.se ---
I just went from 10.1-p0 to 10.1-p3 on a completely new server and got stuck
after "All buffers synced", so I started looking around and found this PR :)

Last week I upgraded 16 Supermicro boxes from 10.0 to 10.1, and every single one
of them got stuck at "All buffers synced" after freebsd-update.
--
b***@freebsd.org
2015-01-14 09:44:07 UTC

Andrew Smith <***@gmail.com> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |***@gmail.com

--- Comment #10 from Andrew Smith <***@gmail.com> ---
It seems that this affects at least all Kernels from 10.1-RELEASE through p3.

If you have one of these Kernels with Softupdates active on the root filesystem
and you replace /sbin/init then you get this behaviour.

If you either disable file system softupdates on the filesystem or you disable
the softupdates option in a new Kernel build then the issue does not exist.

I suspect people that haven't had this issue have some other environmental
difference that nobody has highlighted yet.

The issue of course hits 10.0-RELEASE to 10.1-RELEASE upgrades: when
freebsd-update install is run, the first thing it does is replace the Kernel
and ask for a reboot before freebsd-update is run again, so the Kernel
exhibiting the problem is already in place.

Could this be related to the changes made to ufs for the per FFS-Filesystem
threading around August? (r269457, r269533, r269583).
--
b***@freebsd.org
2015-01-14 11:11:50 UTC

--- Comment #12 from Walter Hop <***@lifeforms.nl> ---
I have spent some hours bisecting -STABLE between 10.0 and 10.1, but I couldn't
pinpoint the revision, as the bug seems to depend on the clang build! So, the
bisecting procedure should probably involve building a clean toolchain at every
revision.
--
b***@freebsd.org
2015-01-14 14:45:04 UTC

--- Comment #13 from Andrew Smith <***@gmail.com> ---
I have rebuilt the 10.1 Kernel from the current 10.1-RELEASE-p3 source tree
delivered via freebsd-update, using CLANG 3.3 on a FreeBSD 10.0 system and on a
10.1-RELEASE-p3 system.

The problem is apparent with both builds, which suggests this isn't directly
tied to the updated toolchain.
--
b***@freebsd.org
2015-01-14 23:56:44 UTC

--- Comment #14 from ***@gmx.us ---
I was (concurrently) having many problems with the transition to
xorg-server-1.14. Long story short, I disabled hald(8) and have not had any
hangs since. I will note I have no real evidence of any connection here, only
the end result on this particular install.
--
b***@freebsd.org
2015-01-15 10:41:42 UTC

--- Comment #15 from Andrew Smith <***@gmail.com> ---
To confirm: p4 appeared as available today, and the problem persists.
--
b***@freebsd.org
2015-01-16 15:06:55 UTC

Matt Kollross <***@illinois.edu> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |***@illinois.edu

--- Comment #16 from Matt Kollross <***@illinois.edu> ---
I've also noticed this issue. I upgraded 4 machines last night, and all had the
same problem.

Specs: 10.0-RELEASE p4 -> 10.1-RELEASE

Two were VMs on VMware ESXi, and the other two were bare-metal Dell R310s. I
believe all are running UFS.

During the freebsd-update process, the kernel installs with no issues and I am
prompted to reboot. This appeared to work fine on all 4 machines. However,
after running freebsd-update install two more times (once to install the
userland and again to remove older libraries), I reboot again, and this is where
it hangs.
--
b***@freebsd.org
2015-02-04 09:07:34 UTC

Charley Sheets <***@nvidia.com> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |***@nvidia.com

--- Comment #17 from Charley Sheets <***@nvidia.com> ---
I've also just seen this issue. I installed 10.1-RELEASE on an HP ProLiant
DL360 G5. After initial configuration, I did a freebsd-update fetch,
freebsd-update install, and reboot.

Assuming I'm experiencing the same issue as others, it seems to be 10.1 itself
and not anything particular to the 10.0 -> 10.1 upgrade, as I did not perform
that upgrade.
--
b***@freebsd.org
2015-02-04 18:11:30 UTC

***@tnpi.net changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |***@tnpi.net

--- Comment #18 from ***@tnpi.net ---
I just experienced this as well. It was on a system I just upgraded from 9.3 ->
10.1-RELEASE via the "make world" dance (because of bug #195484 in
freebsd-update). Installing the 10.1 kernel and reboot worked fine. Then I
installed world, mergemaster, deleted old libs, and finally ran freebsd-update
to install the latest security patches. Then it hung on reboot. Ouch.
--
b***@freebsd.org
2015-02-04 18:21:32 UTC

--- Comment #19 from ***@tnpi.net ---
This seems related. On a server I upgraded to 10.1-RELEASE (via buildworld),
and then updated to the latest patch level, I get this in dmesg after boot:

Trying to mount root from ufs:/dev/mirror/root [rw]...
WARNING: / was not properly dismounted
WARNING: / was not properly dismounted
--
b***@freebsd.org
2015-02-05 22:51:56 UTC

***@gmail.com changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |***@gmail.com

--- Comment #20 from ***@gmail.com ---
I installed Release 10.1 on an ancient desktop computer from a DVD image:

10.1-RELEASE FreeBSD 10.1-RELEASE #0 r274401: Tue Nov 11 22:51:51 UTC 2014

I can't remember what happened with the first two or three boots, but the
shutdown problem became apparent rather quickly. I have many packages
installed, including xdm and the xfce desktop. I've set up the necessary .pkla
file for polkit and put all users in the operator group, who consequently have a
non-greyed Shutdown option in the GUI menu. Clicking on this and answering
"yes" to "do you want to shut down?" results in the computer becoming
unresponsive, in the sense that it can no longer be reached by ssh from another
machine, but it does not power down.

An alternative scenario is that the command "shutdown -p now" is issued by root
in a ssh terminal window. In this case the "System going down immediately"
message is received and the connection is lost as expected, but again the
computer does not power down.

On reboot into single-user mode, one always sees the message "/ was not
properly dismounted". Typing "shutdown -p now" at the console prompt does
effect a true shutdown with power off.

This is a troublesome fault. If one takes no remedial action, the filesystem
corruption eventually reaches a stage where the computer reboots spontaneously
and unpredictably while in normal use. It's tiresome to have to run fsck at
every boot. A workaround would be useful if a cure is not immediately
available.

--Mike
--
b***@freebsd.org
2015-02-10 21:39:10 UTC

--- Comment #22 from ***@gmail.com ---
Disabling soft-updates journaling on just the root partition before the upgrade
alleviates the hanging issue for me as well. However, the filesystem still comes
up dirty on the subsequent reboot, and fsck reports an incorrect block count.
--
b***@freebsd.org
2015-02-10 22:05:59 UTC

--- Comment #23 from Andrew Smith <***@gmail.com> ---
It's a bit of an issue for me since I need to plan a series of unattended
upgrades. Thankfully those systems are on an unaffected 10.0 Kernel level, but I
don't have the option of disabling softupdates by booting from alternative
media.

I'll try to prioritise some time to look at the problem, but I'm midway through
another large piece of work and can't stop :(
--
b***@freebsd.org
2015-02-10 22:14:18 UTC

--- Comment #24 from ***@gmail.com ---
I am in the same situation of having to do remote unattended upgrades of a few
hundred boxes... For a while now I've been using a custom rc.d script to run
tunefs before the filesystems are mounted. I am trying to use this to disable
soft-updates journaling on the root partition before the upgrade with something
like this...

cat /etc/rc.d/tunefs
#!/bin/sh

# PROVIDE: tunefs
# REQUIRE: root
# BEFORE: fsck FILESYSTEMS
# KEYWORD: nojail

. /etc/rc.subr

name="tunefs"
start_cmd="tunefs_start"
stop_cmd=":"

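tunefs_start()
{
# Announce the step on the console (no trailing newline).
echo -n "Tuning devices..."
# Disable soft-updates journaling (SU+J) on the root file system;
# plain soft updates stay enabled. Note: tunefs(8) is not expected to
# modify a file system mounted read-write, so this can only take effect
# while / is still mounted read-only.
tunefs -j disable /
}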

load_rc_config $name
run_rc_command "$1"


Perhaps this will help someone. The problem is that at least for me the root
filesystem comes back dirty even though it does not hang. Maybe I am not
disabling journaling correctly.
--
b***@freebsd.org
2015-02-27 19:16:57 UTC

***@gmail.com changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |***@gmail.com

--- Comment #25 from ***@gmail.com ---
This just happened to a brand new install of FreeBSD 10.1 from disk1 AMD64 ISO
inside ESXi 5.5.

1. Installed under ESXi 5.5 and ran through the installer, choosing the default
UFS filesystem layout with default options (SSH, etc.).
2. Rebooted into new system, performed freebsd-update fetch, freebsd-update
install.
3. ran "reboot"
4. console was stuck after "All buffers synced."

Had to reset the VM, and then upon bootup:

Feb 27 10:52:30 proxy-01 kernel: Trying to mount root from ufs:/dev/da0p2
[rw]...
Feb 27 10:52:30 proxy-01 kernel: WARNING: / was not properly dismounted

So it affects fresh installs of 10.1R when updating to 10.1-RELEASE-p6.
--
b***@freebsd.org
2015-03-02 14:48:21 UTC

***@hotmail.com changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |***@hotmail.com

--- Comment #26 from ***@hotmail.com ---
I have no idea if this is related or not, but...

When I install FreeBSD 10.0 via an ILO/IPMI/KVM-mounted .iso image, the machine
usually hangs in very much the same manner as described in this thread, just
when it is supposed to reboot.

The problem seems to be related to the USB system, which simply won't die, so
the machine never reboots itself.
(I guess the USB system is busy with my virtual CD/keyboard/mouse.)

Workarounds:
- press the reset button, and the machine will reboot just fine...
- or set the sysctl hw.usb.no_shutdown_wait=1 before the installer reboots
the machine.

So if someone has a machine where the problem can *always* be reproduced, test
the sysctl setting above just to see if the USB system is the real reason for
the hang and the subsequent dirty filesystem.

/Elof
--
b***@freebsd.org
2015-03-02 21:46:58 UTC

--- Comment #27 from ***@tnpi.net ---
Re: ***@hotmail.com

I needed a KVM to be put on two servers exactly *because* of this issue. So
maybe that's part of it, but it's certainly not all of it.
--
b***@freebsd.org
2015-03-03 13:44:36 UTC

--- Comment #28 from ***@hotmail.com ---
That's what I was thinking...

Perhaps the more USB devices are connected (such as a KVM keyboard, etc.), the
more prone to freezing the machine gets?

Does anyone have an old snapshot of a non-upgraded VM that *always* freezes when
upgraded?
Please run 'sysctl hw.usb.no_shutdown_wait=1' and
'echo "hw.usb.no_shutdown_wait=1" >> /etc/sysctl.conf', and then upgrade the VM
as usual.
Could you reproduce the problem, or did it pass?
--
b***@freebsd.org
2015-03-03 17:02:37 UTC

--- Comment #29 from ***@gmail.com ---
I just tried the following suggestion in a new stock 10.1R VM with UFS fs -
same hang on "All buffers synced." when trying to reboot after update:

1. 'sysctl hw.usb.no_shutdown_wait=1'
and 'echo "hw.usb.no_shutdown_wait=1" >> /etc/sysctl.conf'
2. freebsd-update fetch
3. freebsd-update install
4. reboot -> hang after "All buffers synced."

The CPU usage jumps from fairly low to a steady 65-75% on a dual-CPU VM after it
reaches "All buffers synced."
--
b***@freebsd.org
2015-03-03 22:40:04 UTC

--- Comment #30 from ***@hotmail.com ---
Thanks Jamie.
It was a long shot, but worth testing.
--
b***@freebsd.org
2015-03-09 16:28:35 UTC

--- Comment #31 from ***@gmail.com ---
+1: this also breaks upgrading from 10.1-RELEASE-p5 to p6. Not being able to
update to the latest patch level without having to physically reboot a
production server when it hangs during reboot really sucks...

In my opinion, based on testing, the problem seems to have more to do with UFS
soft updates journaling being enabled, possibly coupled with replacing /sbin/init.
Temporarily disabling soft updates journaling somewhat alleviates the problem
for me when going from 9.x to 10.1 (the hang stops, but the file system comes
back dirty). However it is not a tractable solution when going from one 10.1
patch level to the next (e.g., 10.1-RELEASE-p5 to p6). It would be really nice
to get some sort of resolution to this bug.
--
b***@freebsd.org
2015-03-09 16:58:53 UTC

--- Comment #32 from ***@gmx.us ---
I can refine my comments above. Apparently, while I can shut down cleanly at
all times, I cannot reboot at any time without hanging. I am presently on
10.1-RELEASE-p6 on both a "generally Intel" desktop and a "generally
AMD/NVIDIA" desktop; both act identically, i.e., they can shut down cleanly but
not reboot. (I know those general descriptions are mostly useless, so if anyone
needs dmesgs, etc., I'll gladly provide them.)

In response to some of the above comments, I do not have soft updates enabled,
and none of the kernel tunables had any positive effect.
--
b***@freebsd.org
2015-03-09 17:02:39 UTC

--- Comment #33 from Glen Barber <***@FreeBSD.org> ---
For clarification, when you say "reboot", do you mean reboot(8), or shutdown(8)
with the '-r' flag?
--
b***@freebsd.org
2015-03-09 17:07:23 UTC

--- Comment #34 from ***@gmail.com ---
In my cases shutdown(8) ("shutdown -r now") and reboot(8) exhibit the same
behavior.
--
b***@freebsd.org
2015-03-09 17:09:11 UTC

--- Comment #35 from Glen Barber <***@FreeBSD.org> ---
(In reply to ncrogers from comment #34)
Post by b***@freebsd.org
In my cases shutdown(8) ("shutdown -r now") and reboot(8) exhibit the same
behavior.
Ok, thank you.
--
b***@freebsd.org
2015-03-09 17:18:31 UTC

--- Comment #36 from ***@gmx.us ---
I'm using "shutdown -r now" to reboot and "shutdown -p now" to shutdown.
--
b***@freebsd.org
2015-03-09 17:49:47 UTC

--- Comment #37 from Glen Barber <***@FreeBSD.org> ---
(In reply to stoa from comment #36)
Post by b***@freebsd.org
I'm using "shutdown -r now" to reboot and "shutdown -p now" to shutdown.
Thank you.
--
b***@freebsd.org
2015-03-09 19:07:06 UTC

--- Comment #38 from Walter Hop <***@lifeforms.nl> ---
Please note that in my tests it's not the actual reboot that's the problem, but
rather unmounting the root FS. Replacing the /sbin/init binary on UFS+S and doing
a "mount -o ro /" afterwards also hangs the system at 100% CPU.
--
b***@freebsd.org
2015-03-09 19:55:05 UTC

--- Comment #39 from ***@gmail.com ---
(In reply to Walter Hop from comment #38)
I concur. The problem is with unmounting the root filesystem, and not so much
the reboot itself. In my tests simply executing a "mount -r /" (which should
fail) after the last freebsd-update stage results in a hang. This happens
before mount(8) is able to return a "Device busy" error.
--
b***@freebsd.org
2015-03-09 20:35:17 UTC

--- Comment #40 from Glen Barber <***@FreeBSD.org> ---
Can a few people please paste output from 'service -e | grep ^/etc' and 'sysctl
kern.shutdown' ?
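For anyone answering this request, a minimal sketch that gathers the two requested outputs (plus uname -v, which identifies the patch level, as in the first reply below) into one file for pasting. The function name collect_diag and the default output path /tmp/bug195458-diag.txt are hypothetical; a command that fails is recorded in the file rather than aborting the script.

```sh
#!/bin/sh
# Sketch: collect the diagnostics requested above into one file.
# Usage (hypothetical): collect_diag [output-file]

collect_diag() {
    out=${1:-/tmp/bug195458-diag.txt}
    {
        echo "# uname -v"
        uname -v
        echo "# service -e | grep ^/etc"
        service -e | grep '^/etc'
        echo "# sysctl kern.shutdown"
        sysctl kern.shutdown
    } > "$out" 2>&1    # stdout and any error messages go to the file
    echo "wrote $out"
}
```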
--
b***@freebsd.org
2015-03-09 20:38:24 UTC

--- Comment #41 from ***@gmail.com ---
(In reply to Glen Barber from comment #40)
# uname -v
FreeBSD 10.1-RELEASE-p6 #8 r279296M: Wed Feb 25 16:15:37 EST 2015
***@fbsd_101_amd64_builder.rgnets.com:/usr/obj/usr/src/sys/RGNETS
# service -e | grep ^/etc
/etc/rc.d/hostid
/etc/rc.d/hostid_save
/etc/rc.d/cleanvar
/etc/rc.d/ip6addrctl
/etc/rc.d/devd
/etc/rc.d/newsyslog
/etc/rc.d/syslogd
/etc/rc.d/dmesg
/etc/rc.d/virecover
/etc/rc.d/motd
/etc/rc.d/sshd
/etc/rc.d/sendmail
/etc/rc.d/cron
/etc/rc.d/mixer
/etc/rc.d/gptboot
# sysctl kern.shutdown
kern.shutdown.show_busybufs: 0
kern.shutdown.poweroff_delay: 5000
kern.shutdown.kproc_shutdown_wait: 60
kern.shutdown.dumpdevname:
rxg#
--
b***@freebsd.org
2015-03-09 20:42:51 UTC

--- Comment #42 from ***@gmx.us ---
/usr/home/dutch $ service -e | grep ^/etc
/etc/rc.d/hostid
/etc/rc.d/hostid_save
/etc/rc.d/cleanvar
/etc/rc.d/ip6addrctl
/etc/rc.d/devd
/etc/rc.d/pflog
/etc/rc.d/pf
/etc/rc.d/newsyslog
/etc/rc.d/syslogd
/etc/rc.d/dmesg
/etc/rc.d/virecover
/etc/rc.d/lpd
/etc/rc.d/motd
/etc/rc.d/ntpd
/etc/rc.d/moused
/etc/rc.d/sendmail
/etc/rc.d/cron
/etc/rc.d/mixer
/etc/rc.d/gptboot
/etc/rc.d/bgfsck

/usr/home/dutch $ sysctl kern.shutdown
kern.shutdown.show_busybufs: 0
kern.shutdown.poweroff_delay: 5000
kern.shutdown.kproc_shutdown_wait: 60
kern.shutdown.dumpdevname: ada0s2b

/usr/home/dutch $
--
b***@freebsd.org
2015-03-09 20:45:53 UTC

--- Comment #43 from ***@blodan.se ---
[***@www-04-portlane ~]# uname -a
FreeBSD www-04-portlane.p203.se 10.1-RELEASE FreeBSD 10.1-RELEASE #0 r274401:
Tue Nov 11 21:02:49 UTC 2014
***@releng1.nyi.freebsd.org:/usr/obj/usr/src/sys/GENERIC amd64
[***@www-04-portlane ~]# service -e | grep ^/etc
/etc/rc.d/hostid
/etc/rc.d/hostid_save
/etc/rc.d/cleanvar
/etc/rc.d/ip6addrctl
/etc/rc.d/devd
/etc/rc.d/newsyslog
/etc/rc.d/syslogd
/etc/rc.d/dmesg
/etc/rc.d/virecover
/etc/rc.d/motd
/etc/rc.d/ntpd
/etc/rc.d/sshd
/etc/rc.d/cron
/etc/rc.d/mixer
/etc/rc.d/gptboot
/etc/rc.d/bgfsck
[***@www-04-portlane ~]# sysctl kern.shutdown
kern.shutdown.show_busybufs: 0
kern.shutdown.poweroff_delay: 5000
kern.shutdown.kproc_shutdown_wait: 60
kern.shutdown.dumpdevname: da0p3
[***@www-04-portlane ~]#
--
b***@freebsd.org
2015-03-09 20:51:44 UTC

--- Comment #44 from Glen Barber <***@FreeBSD.org> ---
Thanks.
--
b***@freebsd.org
2015-03-09 23:05:09 UTC

***@gmail.com changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |***@gmail.com

--- Comment #45 from ***@gmail.com ---
I have succeeded in working around the issue by:
# shutdown now
# sync
wait a minute
# reboot

Since I started doing this I have not had my system hang. Note that "shutdown
now" followed by a reboot is functionally identical to "shutdown -r now" except
for the very long delay between termination of multi-user services and the actual
reboot. I am guessing that something is flushing. I really doubt the "sync" is
required, but I do it just to be sure.

I have no explanation as to why this seems to work, but it has for the three
updates I have used it on. I can only hope that this gives someone a clue as to
what the actual issue is.
--
b***@freebsd.org
2015-03-09 23:15:56 UTC

--- Comment #46 from Glen Barber <***@FreeBSD.org> ---
(In reply to rkoberman from comment #45)
Post by b***@freebsd.org
# shutdown now
# sync
wait a minute
# reboot
Since I started doing this I have not had my system hang. Note that
"shutdown now" followed by a reboot is functionally identical to "shutdown
-r now" except for the very long delay between termination of multi-user
services and the actual reboot. I am guessing that something is flushing. I
really doubt the "sync" is required, but I do it just to be sure.
I have no explanation as to why this seems to work, but it has for the three
updates I have used it on. I can only hope that this gives someone a clue as
to what the actual issue is.
This is somewhat the direction I was going when asking for the service(8) and
sysctl(8) output. I've finally reproduced the issue in VirtualBox; now that I
can reproduce it reliably, and with your findings, I hope to be able to
identify the underlying cause soon.

Thank you for providing this information.
--
b***@freebsd.org
2015-03-10 17:15:30 UTC

Guy Helmer <***@FreeBSD.org> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |***@FreeBSD.org

--- Comment #47 from Guy Helmer <***@FreeBSD.org> ---
(In reply to rkoberman from comment #45)

It seems the complaints here all involve FreeBSD 10, but I am seeing similar
issues on FreeBSD 9.3 on VMware ESXi servers.
The systems in question have a / filesystem without soft updates and a /usr
filesystem with soft updates enabled. File contents have been disappearing from
the /usr partition after reboot.
A "sync" before "shutdown -r now" seems to have significantly reduced the loss
of file contents.
--
b***@freebsd.org
2015-03-10 17:41:33 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=195458

--- Comment #48 from Glen Barber <***@FreeBSD.org> ---
While continuing to look into this, I think I may have found a workaround.

Can someone test running 'freebsd-update install' twice *without* the
intermediate reboot between the kernel and userland updates?

The specific command sequence I'm interested in is:

# freebsd-update -r 10.1-RELEASE upgrade
# freebsd-update install
# freebsd-update install
# shutdown -r now
--
b***@freebsd.org
2015-03-10 19:34:32 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=195458

--- Comment #49 from ***@gmail.com ---
(In reply to Guy Helmer from comment #47)
I suspect that this is a different, though possibly related, issue. In all
cases reported, while all filesystems were unclean on reboot, fsck never found
any errors on my system and ended up simply marking the volume "clean" (after a
long time had passed).

OTOH, the problem started on a system on which I had installed 10.0-R right
after I received it and figured out how to turn off boot signature checking so
I could boot the memstick install media. (That option was hidden several menus
deep, in a menu that could only be brought up after another, seemingly
unrelated, BIOS option was modified.)

I can say that, before I retired and still had many FreeBSD systems to
maintain, I never saw this with freebsd-update. Those were all version 9
systems, and all were physical systems; no virtualization was involved.
--
b***@freebsd.org
2015-03-10 21:12:01 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=195458

--- Comment #51 from Glen Barber <***@FreeBSD.org> ---
After editing sys/kern/kern_shutdown.c to be a bit more verbose, it appears
kern_reboot() is getting stuck on line 429:

421         if (nbusy) {
422                 /*
423                  * Failed to sync all blocks. Indicate this and don't
424                  * unmount filesystems (thus forcing an fsck on reboot).
425                  */
426                 printf("Giving up on %d buffers\n", nbusy);
427                 DELAY(5000000); /* 5 seconds */
428         } else {
429                 if (!first_buf_printf)
430                         printf("Final sync complete\n");
431                 /*
432                  * Unmount filesystems
433                  */
434                 if (panicstr == 0)
435                         vfs_unmountall();
436         }
437         swapoff_all();
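To make the control flow of lines 421-437 easier to follow, here is a small userspace model of that branch. It is a sketch for illustration only. The struct reboot_plan type and the plan_reboot() function are hypothetical names, not part of kern_shutdown.c. The model only records which of the quoted calls would be made; it says nothing about whether those calls return.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record of which calls the quoted branch would make. */
struct reboot_plan {
	bool unmount;	/* vfs_unmountall() is called */
	bool dirty;	/* filesystems stay mounted, so fsck runs next boot */
	bool swapoff;	/* swapoff_all() runs unconditionally after if/else */
};

/*
 * Userspace model (not kernel code) of the quoted branch: nbusy is the
 * number of buffers that could not be synced, and panicstr is NULL
 * unless the system is panicking.
 */
static struct reboot_plan
plan_reboot(int nbusy, const char *panicstr)
{
	struct reboot_plan p = { .unmount = false, .dirty = false,
	    .swapoff = true };

	assert(nbusy >= 0);
	if (nbusy) {
		/* "Giving up on %d buffers": skip the unmount on purpose. */
		p.unmount = false;
	} else if (panicstr == NULL) {
		/* All buffers synced and no panic: unmount everything. */
		p.unmount = true;
	}
	/* Anything left mounted is still dirty and will need fsck. */
	p.dirty = !p.unmount;
	return (p);
}
```

Per the commit log later in this thread, the actual hang turned out to be inside the root filesystem unmount itself. This decision-only model cannot show that.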
--
b***@freebsd.org
2015-03-11 03:15:01 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=195458

--- Comment #52 from Glen Barber <***@FreeBSD.org> ---
(In reply to Glen Barber from comment #51)
Post by b***@freebsd.org
After editing sys/kern/kern_shutdown.c to be a bit more verbose, it appears
421 if (nbusy) {
422 /*
423 * Failed to sync all blocks. Indicate this and don't
424 * unmount filesystems (thus forcing an fsck on reboot).
425 */
426 printf("Giving up on %d buffers\n", nbusy);
427 DELAY(5000000); /* 5 seconds */
428 } else {
429 if (!first_buf_printf)
430 printf("Final sync complete\n");
431 /*
432 * Unmount filesystems
433 */
434 if (panicstr == 0)
435 vfs_unmountall();
436 }
437 swapoff_all();
After looking further, it appears to make it through the if/else to at least
line 436, and swapoff_all() is triggered. So, still looking...
--
b***@freebsd.org
2015-03-11 09:28:03 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=195458

Konstantin Belousov <***@FreeBSD.org> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |***@FreeBSD.org

--- Comment #53 from Konstantin Belousov <***@FreeBSD.org> ---
(In reply to Glen Barber from comment #52)
It is physically impossible to hang on a line that is not a loop.

I suspect that either the buffer flush code or the softdep worker thread
loops and causes the shutdown thread to wait. It is good that you have a
reproducible case and are willing to move it further; previous reporters only
bothered to whine.

See
https://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html
for the instructions on how to configure your kernel and what information to
get.
--
b***@freebsd.org
2015-03-11 16:12:23 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=195458

--- Comment #54 from ***@hotmail.com ---
Install a VirtualBox VM via 10.1-RELEASE CD.
Reboot after installer: OK

Boot the installed system.
Reboot after doing nothing at all but logging in: OK

Boot again.
Run 'freebsd-update fetch install' (to 10.1-RELEASE-p6, 705 patches, 462 files)
Now I take a snapshot in VirtualBox - "Updates installed, not rebooted yet".
I now run 'reboot': Fail
Syncing disks, vnodes remaining...3 1 0 done
All buffers synced.

Poweroff in VirtualBox.
Boot the machine again.
Reboot: OK.



I now Restore Snapshot "Updates installed, not rebooted yet" and start the VM
again.
'reboot' again fails:
Syncing disks, vnodes remaining...3 1 0 0 done
All buffers synced.

Restore again and run 'sync && reboot': Fail
Syncing disks, vnodes remaining...3 1 0 0 done
All buffers synced.

Restore again.
sync
sync
sleep 30
reboot: Fail
Syncing disks, vnodes remaining...3 1 0 0 done
All buffers synced.

Restore again.
sync ; sleep 5 ; sync ; sleep 5 ; shutdown -r now
Syncing disks, vnodes remaining...2 0 0 done
All buffers synced.


So... it fails with both 'reboot' and 'shutdown -r now'.


Restore again.
shutdown now
stopping cron
stopping sshd
stopping devd
Writing entropy file:.
.
syslogd exiting on signal 15
Enter full pathname of shell or RETURN for /bin/sh:
<return>
sync; sleep 5; reboot
Syncing disks, vnodes remaining...0 done
All buffers synced.

Now it hangs for 20 seconds, so it looks like it once again failed, BUT...
Suddenly the machine reboots!!!

(Normally the machine waits less than 1 second after the "All buffers synced"
message when I've run a 'reboot' command, so this must be a 20 second timeout
somewhere)

Also, I see no root (/) fs warnings upon booting. Yay!




I went back and re-ran the 'sync ; sleep 5 ; sync ; sleep 5 ; shutdown -r now'
command and waited several minutes. No reboot. Fail.


Restore again, ran 'shutdown now', entered the single-user shell and ran 'reboot':
Syncing disks, vnodes remaining...1 0 done
All buffers synced.
After 20 seconds, the machine reboots.



Restore again, stopped devd and killed cron, syslogd, adjkerntz, dhclient,
sendmail and ran 'reboot'
Syncing disks, vnodes remaining...1 0 done
All buffers synced.
Nope, it fails. Waited several minutes.


Restore again, ran 'shutdown -ro now' (execute reboot(8) instead of signalling
init(8)):
Syncing disks, vnodes remaining...2 2 0 0 done
All buffers synced.
Fail.


Restore again, ran 'shutdown -ron now' (prevent the filesystem cache from being
flushed):
Syncing disks, vnodes remaining...2 2 0 0 done
All buffers synced.
Now the machine instantly reboots! Yay!
/ was not properly dismounted
/: mount pending error: blocks 8512 files 5
...Rebuilding fs from journal...





Findings:
'reboot' and 'shutdown -r' give the same results.
A manual 'sync' beforehand does nothing.
Running 'shutdown now', and hence entering single-user mode, apparently does
something good.
Some buffers seem to be tied to a 20-second timeout.
Not flushing the buffers at all on shutdown removes the 20-second timeout (but
leaves a corrupt fs).

You can easily reproduce this yourselves in VirtualBox to debug further. See
above.

The main problem is still a CRITICAL one. Even if you use the 'shutdown
now + single-user + 20 sec timeout' approach to finally get the machine to
_reboot_ OK, you still need KVM access for single-user mode.
And if you use the 'shutdown -ron now' approach, you do get the much-needed
reboot, but you also get a corrupt fs... :-(
So remote FreeBSD machines without any iLO/IPMI still suffer badly from this. I
hope someone will find a fix soon.

/Elof
--
b***@freebsd.org
2015-03-11 16:21:10 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=195458

--- Comment #55 from Glen Barber <***@FreeBSD.org> ---
(In reply to Konstantin Belousov from comment #53)
Post by b***@freebsd.org
(In reply to Glen Barber from comment #52)
It is physically impossible to hang on a line that is not a loop.
Yes, understood.
Post by b***@freebsd.org
I suspect that either the buffer flush code or the softdep worker thread
loops and causes the shutdown thread to wait. It is good that you have a
reproducible case and are willing to move it further; previous reporters only
bothered to whine.
See
https://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html
for the instructions on how to configure your kernel and what information to
get.
The test machine now has INVARIANTS, INVARIANT_SUPPORT, WITNESS, DEBUG_LOCKS,
DEBUG_VFS_LOCKS, DIAGNOSTIC, and ALT_BREAK_TO_DEBUGGER. Unfortunately, it
panics on boot now, so I cannot proceed to the 'freebsd-update install; reboot'
phase.

Just prior to this, I left out DIAGNOSTIC and saw a lock order reversal after
the "All buffers synced." message. (I will provide screenshots in a separate
update.)

It looks like I will need to remove DIAGNOSTIC to get the system to boot.
--
b***@freebsd.org
2015-03-11 16:22:26 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=195458

--- Comment #56 from Glen Barber <***@FreeBSD.org> ---
Created attachment 154206
--> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=154206&action=edit
DIAGNOSTIC panic (1/2)
--
b***@freebsd.org
2015-03-11 16:22:53 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=195458

--- Comment #57 from Glen Barber <***@FreeBSD.org> ---
Created attachment 154207
--> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=154207&action=edit
DIAGNOSTIC panic (2/2)
--
b***@freebsd.org
2015-03-11 16:24:23 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=195458

--- Comment #58 from Glen Barber <***@FreeBSD.org> ---
Created attachment 154208
--> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=154208&action=edit
lock order reversal after kern_reboot()
--
b***@freebsd.org
2015-03-11 17:53:35 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=195458

--- Comment #59 from Glen Barber <***@FreeBSD.org> ---
(In reply to Glen Barber from comment #55)
Post by b***@freebsd.org
The test machine now has INVARIANTS, INVARIANT_SUPPORT, WITNESS,
DEBUG_LOCKS, DEBUG_VFS_LOCKS, DIAGNOSTIC, and ALT_BREAK_TO_DEBUGGER.
Unfortunately, it panics on boot now, so I cannot proceed to the
'freebsd-update install; reboot' phase.
Just prior to this, I left out DIAGNOSTIC and saw a lock order reversal
after the "All buffers synced." message. (I will provide screenshots in a
separate update.)
It looks like I will need to remove DIAGNOSTIC to get the system to boot.
It seems removing DIAGNOSTIC alone was not enough, since now the test machine
panics on boot. Although unrelated to the original problem in this PR, the ddb
session will be included in a followup.
--
b***@freebsd.org
2015-03-11 17:54:53 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=195458

--- Comment #60 from Glen Barber <***@FreeBSD.org> ---
Created attachment 154211
--> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=154211&action=edit
ddb transcript of panic-on-boot with debugging options enabled
--
b***@freebsd.org
2015-03-11 20:26:28 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=195458

--- Comment #61 from Glen Barber <***@FreeBSD.org> ---
I've finally gotten the machine into a state where I can access the debugger
after it hangs. script(1) output of the debugging session will be attached.
--
b***@freebsd.org
2015-03-11 20:27:30 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=195458

--- Comment #62 from Glen Barber <***@FreeBSD.org> ---
Created attachment 154224
--> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=154224&action=edit
debugging session after freebsd-update and reboot
--
b***@freebsd.org
2015-03-11 21:35:34 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=195458

--- Comment #63 from Konstantin Belousov <***@FreeBSD.org> ---
(In reply to Glen Barber from comment #62)
You should add the WITNESS_SKIPSPIN kernel option; it is known that the console
spinlocks are not in order.

So for attachment id=154224, is it possible to run 'show mount' and 'show mount
<addr>' for the root mp?

You can 'set $lines 0' to disable pager in ddb.
--
b***@freebsd.org
2015-03-11 21:45:28 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=195458

--- Comment #64 from Glen Barber <***@FreeBSD.org> ---
(In reply to Konstantin Belousov from comment #63)
Post by b***@freebsd.org
(In reply to Glen Barber from comment #62)
You should add the WITNESS_SKIPSPIN kernel option; it is known that the console
spinlocks are not in order.
Okay, I wasn't sure if we wanted to see spinlocks.
Post by b***@freebsd.org
So for attachment id=154224, is it possible to run 'show mount' and 'show
mount <addr>' for the root mp?
Sure. One thing to note (though it shouldn't matter) is that each iteration
requires a rollback of the VM. I only mention this in case there are
inconsistencies between ddb sessions.
Post by b***@freebsd.org
You can 'set $lines 0' to disable pager in ddb.
Thank you, I wasn't aware of this.
--
b***@freebsd.org
2015-03-12 01:52:38 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=195458

--- Comment #65 from Glen Barber <***@FreeBSD.org> ---
(In reply to Glen Barber from comment #59)
Post by b***@freebsd.org
(In reply to Glen Barber from comment #55)
Post by b***@freebsd.org
The test machine now has INVARIANTS, INVARIANT_SUPPORT, WITNESS,
DEBUG_LOCKS, DEBUG_VFS_LOCKS, DIAGNOSTIC, and ALT_BREAK_TO_DEBUGGER.
Unfortunately, it panics on boot now, so I cannot proceed to the
'freebsd-update install; reboot' phase.
[...]
It looks like I will need to remove DIAGNOSTIC to get the system to boot.
It seems removing DIAGNOSTIC alone was not enough, since now the test
machine panics on boot.
Just a note:

This particular issue (panic-on-boot with DIAGNOSTIC) will need to be
reinvestigated after the original issue discussed in this PR is identified and
resolved, as right now it is difficult to tell if this is an effect of a larger
issue.
--
b***@freebsd.org
2015-03-13 10:20:26 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=195458

Julien Cigar <***@ulb.ac.be> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |***@ulb.ac.be

--- Comment #66 from Julien Cigar <***@ulb.ac.be> ---
FYI, I had an issue on an HP ProLiant 10.0-RELEASE-p7 box which may be related:
the machine had been installed and had worked perfectly for ~30 days, and one
day it suddenly "froze". I had to physically power the box off and on, and the
FS was corrupted afterwards. SU+J was unable to recover it (it segfaulted
everywhere), but fortunately a manual fsck was able to repair it. As I had a
*lot* of issues with SU+J in the past, I turned it off (the +J part) on all my
filesystems, and since then it has been rock solid (the machine has ~200 days
of uptime). I was told that SU+J had been fixed in 10, but apparently there are
still problems.
--
b***@freebsd.org
2015-03-20 15:09:42 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=195458

--- Comment #67 from ***@hotmail.com ---
Has anyone had any luck debugging this critical issue?

1)
A normal upgrade+reboot always freezes the machine permanently. :-(

2)
A normal upgrade+reboot, followed by 'shutdown now', entering single-user mode
and finally rebooting from there, always reboots my machine after a 20 second
timeout.
Better, but still not a solution, since I have to do this remotely with only
ssh access (no iLO/IPMI/KVM exists).

3)
A normal upgrade+reboot, followed by 'shutdown -ron now' (to prevent the
filesystem cache from being flushed), always makes my machine reboot
immediately, as it should.
However, this too is not a perfect workaround, since the filesystem gets
corrupted.

Given these three scenarios, and because they are reproducible every time, I
hope a solution will be found soon.
FreeBSD 10.0 is now unsupported, so we FreeBSD users need to upgrade all our
machines. In my case this is >100 machines located all over the world.
--
b***@freebsd.org
2015-03-21 03:03:25 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=195458

--- Comment #69 from Glen Barber <***@FreeBSD.org> ---
Just an update to note that this issue is not forgotten, and is being actively
(and heavily) investigated.

The underlying causes are not yet fully understood, and are quite complex by
nature.
--
b***@freebsd.org
2015-03-21 11:18:36 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=195458

--- Comment #70 from ***@hotmail.com ---
Glen, that sounds good.

Just an update from me too:
I counted the seconds once more today when I installed and upgraded a 10.1
machine, and I think that all my "20 seconds" above should actually read
"30 seconds".
--
b***@freebsd.org
2015-03-27 13:56:45 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=195458

--- Comment #71 from commit-***@freebsd.org ---
A commit references this bug:

Author: kib
Date: Fri Mar 27 13:55:57 UTC 2015
New revision: 280760
URL: https://svnweb.freebsd.org/changeset/base/280760

Log:
Fix the hang after the immediate reboot when the following command
sequence is performed on UFS SU+J rootfs:
cp -Rp /sbin/init /sbin/init.old
mv -f /sbin/init.old /sbin/init

Hang occurs on the rootfs unmount. There are two issues:

1. Removed init binary, which is still mapped, creates a reference to
the removed vnode. The inodeblock for such vnode must have active
inodedep, which is (eventually) linked through the unlinked list. This
means that ffs_sync(MNT_SUSPEND) cannot succeed, because number of
softdep workitems for the mp is always > 0. FFS is suspended during
unmount, so unmount just hangs.

2. As noted above, the inodedep is linked eventually. It is not
linked until the superblock is written. But at the vfs_unmountall()
time, when the rootfs is unmounted, the call is made to
ffs_unmount()->ffs_sync() before vflush(), and ffs_sync() only calls
ffs_sbupdate() after all workitems are flushed. It is masked for
normal system operations, because syncer works in parallel and
eventually flushes superblock. Syncer is stopped when rootfs
unmounted, so ffs_sync() must do sb update on its own.

Correct the issues listed above. For MNT_SUSPEND, count the number of
linked unlinked inodedeps (this is not a typo) and subtract the count
of such workitems from the total. For the second issue,
ffs_sbupdate() is called right after the device sync in the ffs_sync() loop.

There is a third problem, occurring with both SU and SU+J. The
softdep_waitidle() loop, which waits for the softdep_flush() thread to
clear the worklist, only waits 20ms max. It seems that the 1 tick
specified for msleep(9) was a typo.

Add fsync(devvp, MNT_WAIT) call to softdep_waitidle(), which seems to
significantly help the softdep thread, and change the MNT_LAZY update
at the reboot time to MNT_WAIT for similar reasons. Note that
userspace cannot create more work while devvp is flushed, since the
mount point is always suspended before the call to softdep_waitidle()
in unmount or remount path.

PR: 195458
In collaboration with: gjb, pho
Reviewed by: mckusick
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks

Changes:
head/sys/ufs/ffs/ffs_softdep.c
head/sys/ufs/ffs/ffs_vfsops.c
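The first and third problems in this commit message are both about bookkeeping, so they can be illustrated outside the kernel. The sketch below is a userspace model, not code from ffs_softdep.c. The names struct sd_counts, suspend_check_old(), suspend_check_fixed() and max_wait_ms() are hypothetical. The iteration count and hz used in the checks are assumptions chosen only to show the arithmetic; the log states just the 20ms total.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical per-mount counters; not the real struct ufsmount fields. */
struct sd_counts {
	int workitems;		/* all softdep work items pending for the mount */
	int linked_unlinked;	/* inodedeps already linked on the unlinked list,
				   e.g. for a removed init binary still mapped */
};

/*
 * Before the fix: any pending work item makes the MNT_SUSPEND sync fail.
 * A removed-but-still-mapped binary keeps the count above zero, so the
 * suspend done during unmount can never succeed.
 */
static int
suspend_check_old(const struct sd_counts *c)
{
	return (c->workitems > 0 ? EBUSY : 0);
}

/* After the fix: linked unlinked inodedeps are subtracted from the total. */
static int
suspend_check_fixed(const struct sd_counts *c)
{
	int remaining = c->workitems - c->linked_unlinked;

	assert(remaining >= 0);
	return (remaining > 0 ? EBUSY : 0);
}

/* Upper bound, in milliseconds, of a retry loop sleeping `ticks` per try. */
static long
max_wait_ms(int tries, long ticks, int hz)
{
	assert(tries >= 0 && ticks >= 0 && hz > 0);
	return ((long)tries * ticks * 1000L / hz);
}
```

With an assumed hz of 1000, twenty 1-tick sleeps give max_wait_ms(20, 1, 1000), a 20ms bound. A sleep of hz ticks per try would instead allow a full second per iteration. That gap is why a 1-tick timeout gives the flush thread very little time to drain its worklist.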
--
b***@freebsd.org
2015-04-09 21:53:27 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=195458

--- Comment #72 from ***@gmail.com ---
(In reply to commit-hook from comment #71)

Great work tracking this one down. I am guessing this is a no, but are there
any plans for this to make it into the 10.1-RELEASE branch? Another one of my
systems was hit by this today going from 10.1-p8 to 10.1-p9. Or is patching a
custom kernel the recommended solution?
--
b***@freebsd.org
2015-04-14 23:32:42 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=195458

Xin LI <***@FreeBSD.org> changed:

What |Removed |Added
----------------------------------------------------------------------------
Status|New |In Progress
Assignee|freebsd-***@FreeBSD.org |***@FreeBSD.org

--- Comment #73 from Xin LI <***@FreeBSD.org> ---
Take.
--