localed stuck in recent 3.18 git in copy_net_ns?
Kevin Fenzi
2014-10-20 20:40:01 UTC
Greetings.

I'm seeing suspend/resume failures with recent 3.18 git kernels.

Full dmesg at: http://paste.fedoraproject.org/143615/83287914/

The possibly interesting parts:

[ 78.373144] PM: Syncing filesystems ... done.
[ 78.411180] PM: Preparing system for mem sleep
[ 78.411995] Freezing user space processes ...
[ 98.429955] Freezing of tasks failed after 20.001 seconds (1 tasks refusing to freeze, wq_busy=0):
[ 98.429971] (-localed) D ffff88025f214c80 0 1866 1 0x00000084
[ 98.429975] ffff88024e777df8 0000000000000086 ffff88009b4444b0 0000000000014c80
[ 98.429978] ffff88024e777fd8 0000000000014c80 ffff880250ffb110 ffff88009b4444b0
[ 98.429981] 0000000000000000 ffffffff81cec1a0 ffffffff81cec1a4 ffff88009b4444b0
[ 98.429983] Call Trace:
[ 98.429991] [<ffffffff8175d619>] schedule_preempt_disabled+0x29/0x70
[ 98.429994] [<ffffffff8175f433>] __mutex_lock_slowpath+0xb3/0x120
[ 98.429997] [<ffffffff8175f4c3>] mutex_lock+0x23/0x40
[ 98.430001] [<ffffffff8163e325>] copy_net_ns+0x75/0x140
[ 98.430005] [<ffffffff810b8c2d>] create_new_namespaces+0xfd/0x1a0
[ 98.430008] [<ffffffff810b8e5a>] unshare_nsproxy_namespaces+0x5a/0xc0
[ 98.430012] [<ffffffff81098813>] SyS_unshare+0x193/0x340
[ 98.430015] [<ffffffff817617a9>] system_call_fastpath+0x12/0x17

[ 98.430032] Restarting tasks ... done.
[ 98.480361] PM: Syncing filesystems ... done.
[ 98.571645] PM: Preparing system for freeze sleep
[ 98.571779] Freezing user space processes ...
[ 118.592086] Freezing of tasks failed after 20.003 seconds (1 tasks refusing to freeze, wq_busy=0):
[ 118.592102] (-localed) D ffff88025f214c80 0 1866 1 0x00000084
[ 118.592106] ffff88024e777df8 0000000000000086 ffff88009b4444b0 0000000000014c80
[ 118.592109] ffff88024e777fd8 0000000000014c80 ffff880250ffb110 ffff88009b4444b0
[ 118.592111] 0000000000000000 ffffffff81cec1a0 ffffffff81cec1a4 ffff88009b4444b0
[ 118.592114] Call Trace:
[ 118.592121] [<ffffffff8175d619>] schedule_preempt_disabled+0x29/0x70
[ 118.592125] [<ffffffff8175f433>] __mutex_lock_slowpath+0xb3/0x120
[ 118.592127] [<ffffffff8175f4c3>] mutex_lock+0x23/0x40
[ 118.592132] [<ffffffff8163e325>] copy_net_ns+0x75/0x140
[ 118.592136] [<ffffffff810b8c2d>] create_new_namespaces+0xfd/0x1a0
[ 118.592139] [<ffffffff810b8e5a>] unshare_nsproxy_namespaces+0x5a/0xc0
[ 118.592143] [<ffffffff81098813>] SyS_unshare+0x193/0x340
[ 118.592146] [<ffffffff817617a9>] system_call_fastpath+0x12/0x17

[ 118.592163] Restarting tasks ... done.

root 6 0.0 0.0 0 0 ? D 13:49 0:00 [kworker/u16:0]
root 1876 0.0 0.0 41460 5784 ? Ds 13:49 0:00 (-localed)

I'll try and bisect this, but perhaps it rings bells already for folks.

kevin
Dave Jones
2014-10-20 20:50:03 UTC
Post by Kevin Fenzi
I'm seeing suspend/resume failures with recent 3.18 git kernels.
Full dmesg at: http://paste.fedoraproject.org/143615/83287914/
[ 78.373144] PM: Syncing filesystems ... done.
[ 78.411180] PM: Preparing system for mem sleep
[ 78.411995] Freezing user space processes ...
[ 98.429971] (-localed) D ffff88025f214c80 0 1866 1 0x00000084
[ 98.429975] ffff88024e777df8 0000000000000086 ffff88009b4444b0 0000000000014c80
[ 98.429978] ffff88024e777fd8 0000000000014c80 ffff880250ffb110 ffff88009b4444b0
[ 98.429981] 0000000000000000 ffffffff81cec1a0 ffffffff81cec1a4 ffff88009b4444b0
[ 98.429991] [<ffffffff8175d619>] schedule_preempt_disabled+0x29/0x70
[ 98.429994] [<ffffffff8175f433>] __mutex_lock_slowpath+0xb3/0x120
[ 98.429997] [<ffffffff8175f4c3>] mutex_lock+0x23/0x40
[ 98.430001] [<ffffffff8163e325>] copy_net_ns+0x75/0x140
[ 98.430005] [<ffffffff810b8c2d>] create_new_namespaces+0xfd/0x1a0
[ 98.430008] [<ffffffff810b8e5a>] unshare_nsproxy_namespaces+0x5a/0xc0
[ 98.430012] [<ffffffff81098813>] SyS_unshare+0x193/0x340
[ 98.430015] [<ffffffff817617a9>] system_call_fastpath+0x12/0x17
I've seen similar soft lockup traces from the sys_unshare path when running my
fuzz tester. It seems that if you create enough network namespaces,
it can take a huge amount of time for them to be iterated.
(Running trinity with '-c unshare' you can see the slowdown happen. In
some cases, it takes so long that the watchdog process kills it --
though the SIGKILL won't get delivered until the unshare() completes)
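
For reference, a minimal userspace sketch (not from the original mail) of the
kind of namespace churn being described: each unshare(CLONE_NEWNET) gives the
caller a fresh network namespace and drops the reference to the previous one,
so the discarded namespaces pile up for the netns cleanup worker. Needs
CAP_SYS_ADMIN; the loop count is arbitrary.

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void)
{
	int i;

	for (i = 0; i < 1000; i++) {
		/* new network namespace; the previous one becomes unreferenced */
		if (unshare(CLONE_NEWNET) != 0) {
			perror("unshare(CLONE_NEWNET)");
			return 1;
		}
	}
	return 0;	/* the remaining namespace is torn down on exit */
}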

Any idea what this machine had been doing prior to this that may have
involved creating lots of namespaces ?

Dave

Kevin Fenzi
2014-10-20 21:00:03 UTC
On Mon, 20 Oct 2014 16:43:26 -0400
Post by Dave Jones
I've seen similar soft lockup traces from the sys_unshare path when
running my fuzz tester. It seems that if you create enough network
namespaces, it can take a huge amount of time for them to be iterated.
(Running trinity with '-c unshare' you can see the slow down happen.
In some cases, it takes so long that the watchdog process kills it --
though the SIGKILL won't get delivered until the unshare() completes)
Any idea what this machine had been doing prior to this that may have
involved creating lots of namespaces ?
That was right after boot. ;)

This is my main rawhide running laptop.

A 'ip netns list' shows nothing.

kevin
Kevin Fenzi
2014-10-21 21:20:03 UTC
On Mon, 20 Oct 2014 14:53:59 -0600
Post by Kevin Fenzi
On Mon, 20 Oct 2014 16:43:26 -0400
Post by Dave Jones
I've seen similar soft lockup traces from the sys_unshare path when
running my fuzz tester. It seems that if you create enough network
namespaces, it can take a huge amount of time for them to be
iterated. (Running trinity with '-c unshare' you can see the slow
down happen. In some cases, it takes so long that the watchdog
process kills it -- though the SIGKILL won't get delivered until
the unshare() completes)
Any idea what this machine had been doing prior to this that may
have involved creating lots of namespaces ?
That was right after boot. ;)
This is my main rawhide running laptop.
A 'ip netns list' shows nothing.
Some more information:

The problem started between:

v3.17-7872-g5ff0b9e1a1da and v3.17-8307-gf1d0d14120a8

(I can try and do a bisect, but have to head out on a trip tomorrow)

In all the kernels with the problem, there is a kworker process in D.

sysrq-t says:
Showing all locks held in the system:
Oct 21 15:06:31 voldemort.scrye.com kernel: 4 locks held by kworker/u16:0/6:
Oct 21 15:06:31 voldemort.scrye.com kernel: #0: ("%s""netns"){.+.+.+}, at: [<ffffffff810ccbff>] process_one_work+0x17f/0x850
Oct 21 15:06:31 voldemort.scrye.com kernel: #1: (net_cleanup_work){+.+.+.}, at: [<ffffffff810ccbff>] process_one_work+0x17f/0x850
Oct 21 15:06:31 voldemort.scrye.com kernel: #2: (net_mutex){+.+.+.}, at: [<ffffffff817069fc>] cleanup_net+0x8c/0x1f0
Oct 21 15:06:31 voldemort.scrye.com kernel: #3:
(rcu_sched_state.barrier_mutex){+.+...}, at: [<ffffffff8112a395>]
_rcu_barrier+0x35/0x200

On first run, any of the systemd units that use PrivateNetwork run ok,
but they are also set to time out after a minute. On successive runs
they hang in D also.
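
(For context, and not part of the original report: PrivateNetwork=yes tells
systemd to run the service in its own network namespace, so each start of
such a unit creates a namespace that later has to be torn down by the netns
cleanup worker. A hypothetical minimal unit using it would look like:)

[Unit]
Description=example service with a private network namespace

[Service]
ExecStart=/usr/bin/true
PrivateNetwork=yes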

kevin
Josh Boyer
2014-10-22 17:20:03 UTC
Post by Kevin Fenzi
On Mon, 20 Oct 2014 14:53:59 -0600
Post by Kevin Fenzi
On Mon, 20 Oct 2014 16:43:26 -0400
Post by Dave Jones
I've seen similar soft lockup traces from the sys_unshare path when
running my fuzz tester. It seems that if you create enough network
namespaces, it can take a huge amount of time for them to be
iterated. (Running trinity with '-c unshare' you can see the slow
down happen. In some cases, it takes so long that the watchdog
process kills it -- though the SIGKILL won't get delivered until
the unshare() completes)
Any idea what this machine had been doing prior to this that may
have involved creating lots of namespaces ?
That was right after boot. ;)
This is my main rawhide running laptop.
A 'ip netns list' shows nothing.
v3.17-7872-g5ff0b9e1a1da and v3.17-8307-gf1d0d14120a8
(I can try and do a bisect, but have to head out on a trip tomorrow)
In all the kernels with the problem, there is a kworker process in D.
Oct 21 15:06:31 voldemort.scrye.com kernel: #0: ("%s""netns"){.+.+.+}, at: [<ffffffff810ccbff>] process_one_work+0x17f/0x850
Oct 21 15:06:31 voldemort.scrye.com kernel: #1: (net_cleanup_work){+.+.+.}, at: [<ffffffff810ccbff>] process_one_work+0x17f/0x850
Oct 21 15:06:31 voldemort.scrye.com kernel: #2: (net_mutex){+.+.+.}, at: [<ffffffff817069fc>] cleanup_net+0x8c/0x1f0
(rcu_sched_state.barrier_mutex){+.+...}, at: [<ffffffff8112a395>]
_rcu_barrier+0x35/0x200
On first run, any of the systemd units that use PrivateNetwork run ok,
but they are also set to time out after a minute. On successive runs
they hang in D also.
Someone else is seeing this when they try and modprobe ppp_generic:

[ 240.599195] INFO: task kworker/u16:5:100 blocked for more than 120 seconds.
[ 240.599338] Not tainted 3.18.0-0.rc1.git2.1.fc22.x86_64 #1
[ 240.599446] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 240.599583] kworker/u16:5 D ffff8802202db480 12400 100 2 0x00000000
[ 240.599744] Workqueue: netns cleanup_net
[ 240.599823] ffff8802202eb9e8 0000000000000096 ffff8802202db480
00000000001d5f00
[ 240.600066] ffff8802202ebfd8 00000000001d5f00 ffff8800368c3480
ffff8802202db480
[ 240.600228] ffffffff81ee2690 7fffffffffffffff ffffffff81ee2698
ffffffff81ee2690
[ 240.600386] Call Trace:
[ 240.600445] [<ffffffff8185e239>] schedule+0x29/0x70
[ 240.600541] [<ffffffff8186345c>] schedule_timeout+0x26c/0x410
[ 240.600651] [<ffffffff81865ef7>] ? retint_restore_args+0x13/0x13
[ 240.600765] [<ffffffff818644e4>] ? _raw_spin_unlock_irq+0x34/0x50
[ 240.600879] [<ffffffff8185fc6c>] wait_for_completion+0x10c/0x150
[ 240.601025] [<ffffffff810e53e0>] ? wake_up_state+0x20/0x20
[ 240.601133] [<ffffffff8112a749>] _rcu_barrier+0x159/0x200
[ 240.601237] [<ffffffff8112a845>] rcu_barrier+0x15/0x20
[ 240.601335] [<ffffffff81718ebf>] netdev_run_todo+0x6f/0x310
[ 240.601442] [<ffffffff8170da85>] ? rollback_registered_many+0x265/0x2e0
[ 240.601564] [<ffffffff81725f2e>] rtnl_unlock+0xe/0x10
[ 240.601660] [<ffffffff8170f8e6>] default_device_exit_batch+0x156/0x180
[ 240.601781] [<ffffffff810fd8a0>] ? abort_exclusive_wait+0xb0/0xb0
[ 240.601895] [<ffffffff81707993>] ops_exit_list.isra.1+0x53/0x60
[ 240.602028] [<ffffffff81708540>] cleanup_net+0x100/0x1f0
[ 240.602131] [<ffffffff810ccfa8>] process_one_work+0x218/0x850
[ 240.602241] [<ffffffff810ccf0f>] ? process_one_work+0x17f/0x850
[ 240.602350] [<ffffffff810cd6c7>] ? worker_thread+0xe7/0x4a0
[ 240.602454] [<ffffffff810cd64b>] worker_thread+0x6b/0x4a0
[ 240.602555] [<ffffffff810cd5e0>] ? process_one_work+0x850/0x850
[ 240.602665] [<ffffffff810d399b>] kthread+0x10b/0x130
[ 240.602762] [<ffffffff81028cc9>] ? sched_clock+0x9/0x10
[ 240.602862] [<ffffffff810d3890>] ? kthread_create_on_node+0x250/0x250
[ 240.603004] [<ffffffff818651fc>] ret_from_fork+0x7c/0xb0
[ 240.603106] [<ffffffff810d3890>] ? kthread_create_on_node+0x250/0x250
[ 240.603224] 4 locks held by kworker/u16:5/100:
[ 240.603304] #0: ("%s""netns"){.+.+.+}, at: [<ffffffff810ccf0f>]
process_one_work+0x17f/0x850
[ 240.603495] #1: (net_cleanup_work){+.+.+.}, at:
[<ffffffff810ccf0f>] process_one_work+0x17f/0x850
[ 240.603691] #2: (net_mutex){+.+.+.}, at: [<ffffffff817084cc>]
cleanup_net+0x8c/0x1f0
[ 240.603869] #3: (rcu_sched_state.barrier_mutex){+.+...}, at:
[<ffffffff8112a625>] _rcu_barrier+0x35/0x200
[ 240.604211] INFO: task modprobe:1387 blocked for more than 120 seconds.
[ 240.604329] Not tainted 3.18.0-0.rc1.git2.1.fc22.x86_64 #1
[ 240.604434] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 240.604570] modprobe D ffff8800cb4f1a40 13112 1387 1386 0x00000080
[ 240.604719] ffff8800cafbbbe8 0000000000000096 ffff8800cb4f1a40
00000000001d5f00
[ 240.604878] ffff8800cafbbfd8 00000000001d5f00 ffff880223280000
ffff8800cb4f1a40
[ 240.605068] ffff8800cb4f1a40 ffffffff81f8fb48 0000000000000246
ffff8800cb4f1a40
[ 240.605228] Call Trace:
[ 240.605283] [<ffffffff8185e7e1>] schedule_preempt_disabled+0x31/0x80
[ 240.605400] [<ffffffff81860033>] mutex_lock_nested+0x183/0x440
[ 240.605510] [<ffffffff8170835f>] ? register_pernet_subsys+0x1f/0x50
[ 240.605626] [<ffffffff8170835f>] ? register_pernet_subsys+0x1f/0x50
[ 240.605757] [<ffffffffa0701000>] ? 0xffffffffa0701000
[ 240.605854] [<ffffffff8170835f>] register_pernet_subsys+0x1f/0x50
[ 240.606005] [<ffffffffa0701048>] br_init+0x48/0xd3 [bridge]
[ 240.606112] [<ffffffff81002148>] do_one_initcall+0xd8/0x210
[ 240.606224] [<ffffffff81153c02>] load_module+0x20c2/0x2870
[ 240.606327] [<ffffffff8114ebe0>] ? store_uevent+0x70/0x70
[ 240.606433] [<ffffffff8110ac26>] ? lock_release_non_nested+0x3c6/0x3d0
[ 240.606557] [<ffffffff81154497>] SyS_init_module+0xe7/0x140
[ 240.606664] [<ffffffff818652a9>] system_call_fastpath+0x12/0x17
[ 240.606773] 1 lock held by modprobe/1387:
[ 240.606845] #0: (net_mutex){+.+.+.}, at: [<ffffffff8170835f>]
register_pernet_subsys+0x1f/0x50
[ 240.607114] INFO: task modprobe:1466 blocked for more than 120 seconds.
[ 240.607231] Not tainted 3.18.0-0.rc1.git2.1.fc22.x86_64 #1
[ 240.607337] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 240.607473] modprobe D ffff88020fbab480 13096 1466 1399 0x00000084
[ 240.607622] ffff88020d1bbbe8 0000000000000096 ffff88020fbab480
00000000001d5f00
[ 240.607791] ffff88020d1bbfd8 00000000001d5f00 ffffffff81e1b580
ffff88020fbab480
[ 240.607949] ffff88020fbab480 ffffffff81f8fb48 0000000000000246
ffff88020fbab480
[ 240.608138] Call Trace:
[ 240.608193] [<ffffffff8185e7e1>] schedule_preempt_disabled+0x31/0x80
[ 240.608316] [<ffffffff81860033>] mutex_lock_nested+0x183/0x440
[ 240.608425] [<ffffffff817083ad>] ? register_pernet_device+0x1d/0x70
[ 240.608542] [<ffffffff817083ad>] ? register_pernet_device+0x1d/0x70
[ 240.608662] [<ffffffffa071d000>] ? 0xffffffffa071d000
[ 240.608759] [<ffffffff817083ad>] register_pernet_device+0x1d/0x70
[ 240.608881] [<ffffffffa071d020>] ppp_init+0x20/0x1000 [ppp_generic]
[ 240.609021] [<ffffffff81002148>] do_one_initcall+0xd8/0x210
[ 240.609131] [<ffffffff81153c02>] load_module+0x20c2/0x2870
[ 240.609235] [<ffffffff8114ebe0>] ? store_uevent+0x70/0x70
[ 240.609339] [<ffffffff8110ac26>] ? lock_release_non_nested+0x3c6/0x3d0
[ 240.609462] [<ffffffff81154497>] SyS_init_module+0xe7/0x140
[ 240.609568] [<ffffffff818652a9>] system_call_fastpath+0x12/0x17
[ 240.609677] 1 lock held by modprobe/1466:
[ 240.609749] #0: (net_mutex){+.+.+.}, at: [<ffffffff817083ad>]
register_pernet_device+0x1d/0x70

Looks like contention on net_mutex or something, but I honestly have
no idea yet. I can't recreate it myself at the moment or I would
bisect.

Has nobody else run into this with the pre-3.18 kernels? Fedora isn't
carrying any patches in this area.

josh
Cong Wang
2014-10-22 17:40:02 UTC
(Adding Paul and Eric in Cc)

I am not aware of any change in net/core/dev.c related here,
so I guess it's a bug in rcu_barrier().

Thanks.
Post by Josh Boyer
[ 240.599195] INFO: task kworker/u16:5:100 blocked for more than 120 seconds.
[ 240.599338] Not tainted 3.18.0-0.rc1.git2.1.fc22.x86_64 #1
[ 240.599446] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 240.599583] kworker/u16:5 D ffff8802202db480 12400 100 2 0x00000000
[ 240.599744] Workqueue: netns cleanup_net
[ 240.599823] ffff8802202eb9e8 0000000000000096 ffff8802202db480
00000000001d5f00
[ 240.600066] ffff8802202ebfd8 00000000001d5f00 ffff8800368c3480
ffff8802202db480
[ 240.600228] ffffffff81ee2690 7fffffffffffffff ffffffff81ee2698
ffffffff81ee2690
[ 240.600445] [<ffffffff8185e239>] schedule+0x29/0x70
[ 240.600541] [<ffffffff8186345c>] schedule_timeout+0x26c/0x410
[ 240.600651] [<ffffffff81865ef7>] ? retint_restore_args+0x13/0x13
[ 240.600765] [<ffffffff818644e4>] ? _raw_spin_unlock_irq+0x34/0x50
[ 240.600879] [<ffffffff8185fc6c>] wait_for_completion+0x10c/0x150
[ 240.601025] [<ffffffff810e53e0>] ? wake_up_state+0x20/0x20
[ 240.601133] [<ffffffff8112a749>] _rcu_barrier+0x159/0x200
[ 240.601237] [<ffffffff8112a845>] rcu_barrier+0x15/0x20
[ 240.601335] [<ffffffff81718ebf>] netdev_run_todo+0x6f/0x310
[ 240.601442] [<ffffffff8170da85>] ? rollback_registered_many+0x265/0x2e0
[ 240.601564] [<ffffffff81725f2e>] rtnl_unlock+0xe/0x10
[ 240.601660] [<ffffffff8170f8e6>] default_device_exit_batch+0x156/0x180
[ 240.601781] [<ffffffff810fd8a0>] ? abort_exclusive_wait+0xb0/0xb0
[ 240.601895] [<ffffffff81707993>] ops_exit_list.isra.1+0x53/0x60
[ 240.602028] [<ffffffff81708540>] cleanup_net+0x100/0x1f0
[ 240.602131] [<ffffffff810ccfa8>] process_one_work+0x218/0x850
[ 240.602241] [<ffffffff810ccf0f>] ? process_one_work+0x17f/0x850
[ 240.602350] [<ffffffff810cd6c7>] ? worker_thread+0xe7/0x4a0
[ 240.602454] [<ffffffff810cd64b>] worker_thread+0x6b/0x4a0
[ 240.602555] [<ffffffff810cd5e0>] ? process_one_work+0x850/0x850
[ 240.602665] [<ffffffff810d399b>] kthread+0x10b/0x130
[ 240.602762] [<ffffffff81028cc9>] ? sched_clock+0x9/0x10
[ 240.602862] [<ffffffff810d3890>] ? kthread_create_on_node+0x250/0x250
[ 240.603004] [<ffffffff818651fc>] ret_from_fork+0x7c/0xb0
[ 240.603106] [<ffffffff810d3890>] ? kthread_create_on_node+0x250/0x250
[ 240.603304] #0: ("%s""netns"){.+.+.+}, at: [<ffffffff810ccf0f>]
process_one_work+0x17f/0x850
[<ffffffff810ccf0f>] process_one_work+0x17f/0x850
[ 240.603691] #2: (net_mutex){+.+.+.}, at: [<ffffffff817084cc>]
cleanup_net+0x8c/0x1f0
[<ffffffff8112a625>] _rcu_barrier+0x35/0x200
[ 240.604211] INFO: task modprobe:1387 blocked for more than 120 seconds.
[ 240.604329] Not tainted 3.18.0-0.rc1.git2.1.fc22.x86_64 #1
[ 240.604434] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 240.604570] modprobe D ffff8800cb4f1a40 13112 1387 1386 0x00000080
[ 240.604719] ffff8800cafbbbe8 0000000000000096 ffff8800cb4f1a40
00000000001d5f00
[ 240.604878] ffff8800cafbbfd8 00000000001d5f00 ffff880223280000
ffff8800cb4f1a40
[ 240.605068] ffff8800cb4f1a40 ffffffff81f8fb48 0000000000000246
ffff8800cb4f1a40
[ 240.605283] [<ffffffff8185e7e1>] schedule_preempt_disabled+0x31/0x80
[ 240.605400] [<ffffffff81860033>] mutex_lock_nested+0x183/0x440
[ 240.605510] [<ffffffff8170835f>] ? register_pernet_subsys+0x1f/0x50
[ 240.605626] [<ffffffff8170835f>] ? register_pernet_subsys+0x1f/0x50
[ 240.605757] [<ffffffffa0701000>] ? 0xffffffffa0701000
[ 240.605854] [<ffffffff8170835f>] register_pernet_subsys+0x1f/0x50
[ 240.606005] [<ffffffffa0701048>] br_init+0x48/0xd3 [bridge]
[ 240.606112] [<ffffffff81002148>] do_one_initcall+0xd8/0x210
[ 240.606224] [<ffffffff81153c02>] load_module+0x20c2/0x2870
[ 240.606327] [<ffffffff8114ebe0>] ? store_uevent+0x70/0x70
[ 240.606433] [<ffffffff8110ac26>] ? lock_release_non_nested+0x3c6/0x3d0
[ 240.606557] [<ffffffff81154497>] SyS_init_module+0xe7/0x140
[ 240.606664] [<ffffffff818652a9>] system_call_fastpath+0x12/0x17
[ 240.606845] #0: (net_mutex){+.+.+.}, at: [<ffffffff8170835f>]
register_pernet_subsys+0x1f/0x50
[ 240.607114] INFO: task modprobe:1466 blocked for more than 120 seconds.
[ 240.607231] Not tainted 3.18.0-0.rc1.git2.1.fc22.x86_64 #1
[ 240.607337] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 240.607473] modprobe D ffff88020fbab480 13096 1466 1399 0x00000084
[ 240.607622] ffff88020d1bbbe8 0000000000000096 ffff88020fbab480
00000000001d5f00
[ 240.607791] ffff88020d1bbfd8 00000000001d5f00 ffffffff81e1b580
ffff88020fbab480
[ 240.607949] ffff88020fbab480 ffffffff81f8fb48 0000000000000246
ffff88020fbab480
[ 240.608193] [<ffffffff8185e7e1>] schedule_preempt_disabled+0x31/0x80
[ 240.608316] [<ffffffff81860033>] mutex_lock_nested+0x183/0x440
[ 240.608425] [<ffffffff817083ad>] ? register_pernet_device+0x1d/0x70
[ 240.608542] [<ffffffff817083ad>] ? register_pernet_device+0x1d/0x70
[ 240.608662] [<ffffffffa071d000>] ? 0xffffffffa071d000
[ 240.608759] [<ffffffff817083ad>] register_pernet_device+0x1d/0x70
[ 240.608881] [<ffffffffa071d020>] ppp_init+0x20/0x1000 [ppp_generic]
[ 240.609021] [<ffffffff81002148>] do_one_initcall+0xd8/0x210
[ 240.609131] [<ffffffff81153c02>] load_module+0x20c2/0x2870
[ 240.609235] [<ffffffff8114ebe0>] ? store_uevent+0x70/0x70
[ 240.609339] [<ffffffff8110ac26>] ? lock_release_non_nested+0x3c6/0x3d0
[ 240.609462] [<ffffffff81154497>] SyS_init_module+0xe7/0x140
[ 240.609568] [<ffffffff818652a9>] system_call_fastpath+0x12/0x17
[ 240.609749] #0: (net_mutex){+.+.+.}, at: [<ffffffff817083ad>]
register_pernet_device+0x1d/0x70
Looks like contention on net_mutex or something, but I honestly have
no idea yet. I can't recreate it myself at the moment or I would
bisect.
Has nobody else run into this with the pre-3.18 kernels? Fedora isn't
carrying any patches in this area.
josh
Josh Boyer
2014-10-22 17:50:02 UTC
Post by Cong Wang
(Adding Paul and Eric in Cc)
I am not aware of any change in net/core/dev.c related here,
so I guess it's a bug in rcu_barrier().
Possibly. The person that reported the issue below said it showed up
between Linux v3.17-7872-g5ff0b9e1a1da and Linux
v3.17-8307-gf1d0d14120a8 for them, which is a slightly older window
than the one that Kevin reported. I haven't had a chance to dig
through the commits yet.

josh
Post by Cong Wang
Post by Josh Boyer
[ 240.599195] INFO: task kworker/u16:5:100 blocked for more than 120 seconds.
[ 240.599338] Not tainted 3.18.0-0.rc1.git2.1.fc22.x86_64 #1
[ 240.599446] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 240.599583] kworker/u16:5 D ffff8802202db480 12400 100 2 0x00000000
[ 240.599744] Workqueue: netns cleanup_net
[ 240.599823] ffff8802202eb9e8 0000000000000096 ffff8802202db480
00000000001d5f00
[ 240.600066] ffff8802202ebfd8 00000000001d5f00 ffff8800368c3480
ffff8802202db480
[ 240.600228] ffffffff81ee2690 7fffffffffffffff ffffffff81ee2698
ffffffff81ee2690
[ 240.600445] [<ffffffff8185e239>] schedule+0x29/0x70
[ 240.600541] [<ffffffff8186345c>] schedule_timeout+0x26c/0x410
[ 240.600651] [<ffffffff81865ef7>] ? retint_restore_args+0x13/0x13
[ 240.600765] [<ffffffff818644e4>] ? _raw_spin_unlock_irq+0x34/0x50
[ 240.600879] [<ffffffff8185fc6c>] wait_for_completion+0x10c/0x150
[ 240.601025] [<ffffffff810e53e0>] ? wake_up_state+0x20/0x20
[ 240.601133] [<ffffffff8112a749>] _rcu_barrier+0x159/0x200
[ 240.601237] [<ffffffff8112a845>] rcu_barrier+0x15/0x20
[ 240.601335] [<ffffffff81718ebf>] netdev_run_todo+0x6f/0x310
[ 240.601442] [<ffffffff8170da85>] ? rollback_registered_many+0x265/0x2e0
[ 240.601564] [<ffffffff81725f2e>] rtnl_unlock+0xe/0x10
[ 240.601660] [<ffffffff8170f8e6>] default_device_exit_batch+0x156/0x180
[ 240.601781] [<ffffffff810fd8a0>] ? abort_exclusive_wait+0xb0/0xb0
[ 240.601895] [<ffffffff81707993>] ops_exit_list.isra.1+0x53/0x60
[ 240.602028] [<ffffffff81708540>] cleanup_net+0x100/0x1f0
[ 240.602131] [<ffffffff810ccfa8>] process_one_work+0x218/0x850
[ 240.602241] [<ffffffff810ccf0f>] ? process_one_work+0x17f/0x850
[ 240.602350] [<ffffffff810cd6c7>] ? worker_thread+0xe7/0x4a0
[ 240.602454] [<ffffffff810cd64b>] worker_thread+0x6b/0x4a0
[ 240.602555] [<ffffffff810cd5e0>] ? process_one_work+0x850/0x850
[ 240.602665] [<ffffffff810d399b>] kthread+0x10b/0x130
[ 240.602762] [<ffffffff81028cc9>] ? sched_clock+0x9/0x10
[ 240.602862] [<ffffffff810d3890>] ? kthread_create_on_node+0x250/0x250
[ 240.603004] [<ffffffff818651fc>] ret_from_fork+0x7c/0xb0
[ 240.603106] [<ffffffff810d3890>] ? kthread_create_on_node+0x250/0x250
[ 240.603304] #0: ("%s""netns"){.+.+.+}, at: [<ffffffff810ccf0f>]
process_one_work+0x17f/0x850
[<ffffffff810ccf0f>] process_one_work+0x17f/0x850
[ 240.603691] #2: (net_mutex){+.+.+.}, at: [<ffffffff817084cc>]
cleanup_net+0x8c/0x1f0
[<ffffffff8112a625>] _rcu_barrier+0x35/0x200
[ 240.604211] INFO: task modprobe:1387 blocked for more than 120 seconds.
[ 240.604329] Not tainted 3.18.0-0.rc1.git2.1.fc22.x86_64 #1
[ 240.604434] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 240.604570] modprobe D ffff8800cb4f1a40 13112 1387 1386 0x00000080
[ 240.604719] ffff8800cafbbbe8 0000000000000096 ffff8800cb4f1a40
00000000001d5f00
[ 240.604878] ffff8800cafbbfd8 00000000001d5f00 ffff880223280000
ffff8800cb4f1a40
[ 240.605068] ffff8800cb4f1a40 ffffffff81f8fb48 0000000000000246
ffff8800cb4f1a40
[ 240.605283] [<ffffffff8185e7e1>] schedule_preempt_disabled+0x31/0x80
[ 240.605400] [<ffffffff81860033>] mutex_lock_nested+0x183/0x440
[ 240.605510] [<ffffffff8170835f>] ? register_pernet_subsys+0x1f/0x50
[ 240.605626] [<ffffffff8170835f>] ? register_pernet_subsys+0x1f/0x50
[ 240.605757] [<ffffffffa0701000>] ? 0xffffffffa0701000
[ 240.605854] [<ffffffff8170835f>] register_pernet_subsys+0x1f/0x50
[ 240.606005] [<ffffffffa0701048>] br_init+0x48/0xd3 [bridge]
[ 240.606112] [<ffffffff81002148>] do_one_initcall+0xd8/0x210
[ 240.606224] [<ffffffff81153c02>] load_module+0x20c2/0x2870
[ 240.606327] [<ffffffff8114ebe0>] ? store_uevent+0x70/0x70
[ 240.606433] [<ffffffff8110ac26>] ? lock_release_non_nested+0x3c6/0x3d0
[ 240.606557] [<ffffffff81154497>] SyS_init_module+0xe7/0x140
[ 240.606664] [<ffffffff818652a9>] system_call_fastpath+0x12/0x17
[ 240.606845] #0: (net_mutex){+.+.+.}, at: [<ffffffff8170835f>]
register_pernet_subsys+0x1f/0x50
[ 240.607114] INFO: task modprobe:1466 blocked for more than 120 seconds.
[ 240.607231] Not tainted 3.18.0-0.rc1.git2.1.fc22.x86_64 #1
[ 240.607337] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 240.607473] modprobe D ffff88020fbab480 13096 1466 1399 0x00000084
[ 240.607622] ffff88020d1bbbe8 0000000000000096 ffff88020fbab480
00000000001d5f00
[ 240.607791] ffff88020d1bbfd8 00000000001d5f00 ffffffff81e1b580
ffff88020fbab480
[ 240.607949] ffff88020fbab480 ffffffff81f8fb48 0000000000000246
ffff88020fbab480
[ 240.608193] [<ffffffff8185e7e1>] schedule_preempt_disabled+0x31/0x80
[ 240.608316] [<ffffffff81860033>] mutex_lock_nested+0x183/0x440
[ 240.608425] [<ffffffff817083ad>] ? register_pernet_device+0x1d/0x70
[ 240.608542] [<ffffffff817083ad>] ? register_pernet_device+0x1d/0x70
[ 240.608662] [<ffffffffa071d000>] ? 0xffffffffa071d000
[ 240.608759] [<ffffffff817083ad>] register_pernet_device+0x1d/0x70
[ 240.608881] [<ffffffffa071d020>] ppp_init+0x20/0x1000 [ppp_generic]
[ 240.609021] [<ffffffff81002148>] do_one_initcall+0xd8/0x210
[ 240.609131] [<ffffffff81153c02>] load_module+0x20c2/0x2870
[ 240.609235] [<ffffffff8114ebe0>] ? store_uevent+0x70/0x70
[ 240.609339] [<ffffffff8110ac26>] ? lock_release_non_nested+0x3c6/0x3d0
[ 240.609462] [<ffffffff81154497>] SyS_init_module+0xe7/0x140
[ 240.609568] [<ffffffff818652a9>] system_call_fastpath+0x12/0x17
[ 240.609749] #0: (net_mutex){+.+.+.}, at: [<ffffffff817083ad>]
register_pernet_device+0x1d/0x70
Looks like contention on net_mutex or something, but I honestly have
no idea yet. I can't recreate it myself at the moment or I would
bisect.
Has nobody else run into this with the pre-3.18 kernels? Fedora isn't
carrying any patches in this area.
josh
Eric W. Biederman
2014-10-22 18:00:02 UTC
Post by Cong Wang
(Adding Paul and Eric in Cc)
Post by Josh Boyer
[ 240.599195] INFO: task kworker/u16:5:100 blocked for more than 120 seconds.
[ 240.599338] Not tainted 3.18.0-0.rc1.git2.1.fc22.x86_64 #1
[ 240.599446] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 240.599583] kworker/u16:5 D ffff8802202db480 12400 100 2 0x00000000
[ 240.599744] Workqueue: netns cleanup_net
[ 240.599823] ffff8802202eb9e8 0000000000000096 ffff8802202db480
00000000001d5f00
[ 240.600066] ffff8802202ebfd8 00000000001d5f00 ffff8800368c3480
ffff8802202db480
[ 240.600228] ffffffff81ee2690 7fffffffffffffff ffffffff81ee2698
ffffffff81ee2690
[ 240.600445] [<ffffffff8185e239>] schedule+0x29/0x70
[ 240.600541] [<ffffffff8186345c>] schedule_timeout+0x26c/0x410
[ 240.600651] [<ffffffff81865ef7>] ? retint_restore_args+0x13/0x13
[ 240.600765] [<ffffffff818644e4>] ? _raw_spin_unlock_irq+0x34/0x50
[ 240.600879] [<ffffffff8185fc6c>] wait_for_completion+0x10c/0x150
[ 240.601025] [<ffffffff810e53e0>] ? wake_up_state+0x20/0x20
[ 240.601133] [<ffffffff8112a749>] _rcu_barrier+0x159/0x200
[ 240.601237] [<ffffffff8112a845>] rcu_barrier+0x15/0x20
[ 240.601335] [<ffffffff81718ebf>] netdev_run_todo+0x6f/0x310
[ 240.601442] [<ffffffff8170da85>] ? rollback_registered_many+0x265/0x2e0
[ 240.601564] [<ffffffff81725f2e>] rtnl_unlock+0xe/0x10
[ 240.601660] [<ffffffff8170f8e6>] default_device_exit_batch+0x156/0x180
[ 240.601781] [<ffffffff810fd8a0>] ? abort_exclusive_wait+0xb0/0xb0
[ 240.601895] [<ffffffff81707993>] ops_exit_list.isra.1+0x53/0x60
[ 240.602028] [<ffffffff81708540>] cleanup_net+0x100/0x1f0
[ 240.602131] [<ffffffff810ccfa8>] process_one_work+0x218/0x850
[ 240.602241] [<ffffffff810ccf0f>] ? process_one_work+0x17f/0x850
[ 240.602350] [<ffffffff810cd6c7>] ? worker_thread+0xe7/0x4a0
[ 240.602454] [<ffffffff810cd64b>] worker_thread+0x6b/0x4a0
[ 240.602555] [<ffffffff810cd5e0>] ? process_one_work+0x850/0x850
[ 240.602665] [<ffffffff810d399b>] kthread+0x10b/0x130
[ 240.602762] [<ffffffff81028cc9>] ? sched_clock+0x9/0x10
[ 240.602862] [<ffffffff810d3890>] ? kthread_create_on_node+0x250/0x250
[ 240.603004] [<ffffffff818651fc>] ret_from_fork+0x7c/0xb0
[ 240.603106] [<ffffffff810d3890>] ? kthread_create_on_node+0x250/0x250
[ 240.603304] #0: ("%s""netns"){.+.+.+}, at: [<ffffffff810ccf0f>]
process_one_work+0x17f/0x850
[<ffffffff810ccf0f>] process_one_work+0x17f/0x850
[ 240.603691] #2: (net_mutex){+.+.+.}, at: [<ffffffff817084cc>]
cleanup_net+0x8c/0x1f0
[<ffffffff8112a625>] _rcu_barrier+0x35/0x200
[ 240.604211] INFO: task modprobe:1387 blocked for more than 120 seconds.
[ 240.604329] Not tainted 3.18.0-0.rc1.git2.1.fc22.x86_64 #1
[ 240.604434] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 240.604570] modprobe D ffff8800cb4f1a40 13112 1387 1386 0x00000080
[ 240.604719] ffff8800cafbbbe8 0000000000000096 ffff8800cb4f1a40
00000000001d5f00
[ 240.604878] ffff8800cafbbfd8 00000000001d5f00 ffff880223280000
ffff8800cb4f1a40
[ 240.605068] ffff8800cb4f1a40 ffffffff81f8fb48 0000000000000246
ffff8800cb4f1a40
[ 240.605283] [<ffffffff8185e7e1>] schedule_preempt_disabled+0x31/0x80
[ 240.605400] [<ffffffff81860033>] mutex_lock_nested+0x183/0x440
[ 240.605510] [<ffffffff8170835f>] ? register_pernet_subsys+0x1f/0x50
[ 240.605626] [<ffffffff8170835f>] ? register_pernet_subsys+0x1f/0x50
[ 240.605757] [<ffffffffa0701000>] ? 0xffffffffa0701000
[ 240.605854] [<ffffffff8170835f>] register_pernet_subsys+0x1f/0x50
[ 240.606005] [<ffffffffa0701048>] br_init+0x48/0xd3 [bridge]
[ 240.606112] [<ffffffff81002148>] do_one_initcall+0xd8/0x210
[ 240.606224] [<ffffffff81153c02>] load_module+0x20c2/0x2870
[ 240.606327] [<ffffffff8114ebe0>] ? store_uevent+0x70/0x70
[ 240.606433] [<ffffffff8110ac26>] ? lock_release_non_nested+0x3c6/0x3d0
[ 240.606557] [<ffffffff81154497>] SyS_init_module+0xe7/0x140
[ 240.606664] [<ffffffff818652a9>] system_call_fastpath+0x12/0x17
[ 240.606845] #0: (net_mutex){+.+.+.}, at: [<ffffffff8170835f>]
register_pernet_subsys+0x1f/0x50
[ 240.607114] INFO: task modprobe:1466 blocked for more than 120 seconds.
[ 240.607231] Not tainted 3.18.0-0.rc1.git2.1.fc22.x86_64 #1
[ 240.607337] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 240.607473] modprobe D ffff88020fbab480 13096 1466 1399 0x00000084
[ 240.607622] ffff88020d1bbbe8 0000000000000096 ffff88020fbab480
00000000001d5f00
[ 240.607791] ffff88020d1bbfd8 00000000001d5f00 ffffffff81e1b580
ffff88020fbab480
[ 240.607949] ffff88020fbab480 ffffffff81f8fb48 0000000000000246
ffff88020fbab480
[ 240.608193] [<ffffffff8185e7e1>] schedule_preempt_disabled+0x31/0x80
[ 240.608316] [<ffffffff81860033>] mutex_lock_nested+0x183/0x440
[ 240.608425] [<ffffffff817083ad>] ? register_pernet_device+0x1d/0x70
[ 240.608542] [<ffffffff817083ad>] ? register_pernet_device+0x1d/0x70
[ 240.608662] [<ffffffffa071d000>] ? 0xffffffffa071d000
[ 240.608759] [<ffffffff817083ad>] register_pernet_device+0x1d/0x70
[ 240.608881] [<ffffffffa071d020>] ppp_init+0x20/0x1000 [ppp_generic]
[ 240.609021] [<ffffffff81002148>] do_one_initcall+0xd8/0x210
[ 240.609131] [<ffffffff81153c02>] load_module+0x20c2/0x2870
[ 240.609235] [<ffffffff8114ebe0>] ? store_uevent+0x70/0x70
[ 240.609339] [<ffffffff8110ac26>] ? lock_release_non_nested+0x3c6/0x3d0
[ 240.609462] [<ffffffff81154497>] SyS_init_module+0xe7/0x140
[ 240.609568] [<ffffffff818652a9>] system_call_fastpath+0x12/0x17
[ 240.609749] #0: (net_mutex){+.+.+.}, at: [<ffffffff817083ad>]
register_pernet_device+0x1d/0x70
Looks like contention on net_mutex or something, but I honestly have
no idea yet. I can't recreate it myself at the moment or I would
bisect.
Has nobody else run into this with the pre-3.18 kernels? Fedora isn't
carrying any patches in this area.
I am not aware of any change in net/core/dev.c related here,
so I guess it's a bug in rcu_barrier().
From the limited trace data I see in this email I have to agree.

It looks like for some reason rcu_barrier is taking forever
while the rtnl_lock is held in cleanup_net. Because the
rtnl_lock is held, modprobe of the ppp driver is getting stuck.

Is it possible we have an AB-BA deadlock between the rtnl_lock
and rcu, with something the module loading code assumes?

Eric
Paul E. McKenney
2014-10-22 19:00:02 UTC
Post by Eric W. Biederman
Post by Cong Wang
(Adding Paul and Eric in Cc)
Post by Josh Boyer
[ 240.599195] INFO: task kworker/u16:5:100 blocked for more than 120 seconds.
[ 240.599338] Not tainted 3.18.0-0.rc1.git2.1.fc22.x86_64 #1
[ 240.599446] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 240.599583] kworker/u16:5 D ffff8802202db480 12400 100 2 0x00000000
[ 240.599744] Workqueue: netns cleanup_net
[ 240.599823] ffff8802202eb9e8 0000000000000096 ffff8802202db480
00000000001d5f00
[ 240.600066] ffff8802202ebfd8 00000000001d5f00 ffff8800368c3480
ffff8802202db480
[ 240.600228] ffffffff81ee2690 7fffffffffffffff ffffffff81ee2698
ffffffff81ee2690
[ 240.600445] [<ffffffff8185e239>] schedule+0x29/0x70
[ 240.600541] [<ffffffff8186345c>] schedule_timeout+0x26c/0x410
[ 240.600651] [<ffffffff81865ef7>] ? retint_restore_args+0x13/0x13
[ 240.600765] [<ffffffff818644e4>] ? _raw_spin_unlock_irq+0x34/0x50
[ 240.600879] [<ffffffff8185fc6c>] wait_for_completion+0x10c/0x150
[ 240.601025] [<ffffffff810e53e0>] ? wake_up_state+0x20/0x20
[ 240.601133] [<ffffffff8112a749>] _rcu_barrier+0x159/0x200
[ 240.601237] [<ffffffff8112a845>] rcu_barrier+0x15/0x20
[ 240.601335] [<ffffffff81718ebf>] netdev_run_todo+0x6f/0x310
[ 240.601442] [<ffffffff8170da85>] ? rollback_registered_many+0x265/0x2e0
[ 240.601564] [<ffffffff81725f2e>] rtnl_unlock+0xe/0x10
[ 240.601660] [<ffffffff8170f8e6>] default_device_exit_batch+0x156/0x180
[ 240.601781] [<ffffffff810fd8a0>] ? abort_exclusive_wait+0xb0/0xb0
[ 240.601895] [<ffffffff81707993>] ops_exit_list.isra.1+0x53/0x60
[ 240.602028] [<ffffffff81708540>] cleanup_net+0x100/0x1f0
[ 240.602131] [<ffffffff810ccfa8>] process_one_work+0x218/0x850
[ 240.602241] [<ffffffff810ccf0f>] ? process_one_work+0x17f/0x850
[ 240.602350] [<ffffffff810cd6c7>] ? worker_thread+0xe7/0x4a0
[ 240.602454] [<ffffffff810cd64b>] worker_thread+0x6b/0x4a0
[ 240.602555] [<ffffffff810cd5e0>] ? process_one_work+0x850/0x850
[ 240.602665] [<ffffffff810d399b>] kthread+0x10b/0x130
[ 240.602762] [<ffffffff81028cc9>] ? sched_clock+0x9/0x10
[ 240.602862] [<ffffffff810d3890>] ? kthread_create_on_node+0x250/0x250
[ 240.603004] [<ffffffff818651fc>] ret_from_fork+0x7c/0xb0
[ 240.603106] [<ffffffff810d3890>] ? kthread_create_on_node+0x250/0x250
[ 240.603304] #0: ("%s""netns"){.+.+.+}, at: [<ffffffff810ccf0f>]
process_one_work+0x17f/0x850
[<ffffffff810ccf0f>] process_one_work+0x17f/0x850
[ 240.603691] #2: (net_mutex){+.+.+.}, at: [<ffffffff817084cc>]
cleanup_net+0x8c/0x1f0
[<ffffffff8112a625>] _rcu_barrier+0x35/0x200
[ 240.604211] INFO: task modprobe:1387 blocked for more than 120 seconds.
[ 240.604329] Not tainted 3.18.0-0.rc1.git2.1.fc22.x86_64 #1
[ 240.604434] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 240.604570] modprobe D ffff8800cb4f1a40 13112 1387 1386 0x00000080
[ 240.604719] ffff8800cafbbbe8 0000000000000096 ffff8800cb4f1a40
00000000001d5f00
[ 240.604878] ffff8800cafbbfd8 00000000001d5f00 ffff880223280000
ffff8800cb4f1a40
[ 240.605068] ffff8800cb4f1a40 ffffffff81f8fb48 0000000000000246
ffff8800cb4f1a40
[ 240.605283] [<ffffffff8185e7e1>] schedule_preempt_disabled+0x31/0x80
[ 240.605400] [<ffffffff81860033>] mutex_lock_nested+0x183/0x440
[ 240.605510] [<ffffffff8170835f>] ? register_pernet_subsys+0x1f/0x50
[ 240.605626] [<ffffffff8170835f>] ? register_pernet_subsys+0x1f/0x50
[ 240.605757] [<ffffffffa0701000>] ? 0xffffffffa0701000
[ 240.605854] [<ffffffff8170835f>] register_pernet_subsys+0x1f/0x50
[ 240.606005] [<ffffffffa0701048>] br_init+0x48/0xd3 [bridge]
[ 240.606112] [<ffffffff81002148>] do_one_initcall+0xd8/0x210
[ 240.606224] [<ffffffff81153c02>] load_module+0x20c2/0x2870
[ 240.606327] [<ffffffff8114ebe0>] ? store_uevent+0x70/0x70
[ 240.606433] [<ffffffff8110ac26>] ? lock_release_non_nested+0x3c6/0x3d0
[ 240.606557] [<ffffffff81154497>] SyS_init_module+0xe7/0x140
[ 240.606664] [<ffffffff818652a9>] system_call_fastpath+0x12/0x17
[ 240.606845] #0: (net_mutex){+.+.+.}, at: [<ffffffff8170835f>]
register_pernet_subsys+0x1f/0x50
[ 240.607114] INFO: task modprobe:1466 blocked for more than 120 seconds.
[ 240.607231] Not tainted 3.18.0-0.rc1.git2.1.fc22.x86_64 #1
[ 240.607337] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 240.607473] modprobe D ffff88020fbab480 13096 1466 1399 0x00000084
[ 240.607622] ffff88020d1bbbe8 0000000000000096 ffff88020fbab480
00000000001d5f00
[ 240.607791] ffff88020d1bbfd8 00000000001d5f00 ffffffff81e1b580
ffff88020fbab480
[ 240.607949] ffff88020fbab480 ffffffff81f8fb48 0000000000000246
ffff88020fbab480
[ 240.608193] [<ffffffff8185e7e1>] schedule_preempt_disabled+0x31/0x80
[ 240.608316] [<ffffffff81860033>] mutex_lock_nested+0x183/0x440
[ 240.608425] [<ffffffff817083ad>] ? register_pernet_device+0x1d/0x70
[ 240.608542] [<ffffffff817083ad>] ? register_pernet_device+0x1d/0x70
[ 240.608662] [<ffffffffa071d000>] ? 0xffffffffa071d000
[ 240.608759] [<ffffffff817083ad>] register_pernet_device+0x1d/0x70
[ 240.608881] [<ffffffffa071d020>] ppp_init+0x20/0x1000 [ppp_generic]
[ 240.609021] [<ffffffff81002148>] do_one_initcall+0xd8/0x210
[ 240.609131] [<ffffffff81153c02>] load_module+0x20c2/0x2870
[ 240.609235] [<ffffffff8114ebe0>] ? store_uevent+0x70/0x70
[ 240.609339] [<ffffffff8110ac26>] ? lock_release_non_nested+0x3c6/0x3d0
[ 240.609462] [<ffffffff81154497>] SyS_init_module+0xe7/0x140
[ 240.609568] [<ffffffff818652a9>] system_call_fastpath+0x12/0x17
[ 240.609749] #0: (net_mutex){+.+.+.}, at: [<ffffffff817083ad>]
register_pernet_device+0x1d/0x70
Looks like contention on net_mutex or something, but I honestly have
no idea yet. I can't recreate it myself at the moment or I would
bisect.
Has nobody else run into this with the pre-3.18 kernels? Fedora isn't
carrying any patches in this area.
I am not aware of any change in net/core/dev.c related here,
so I guess it's a bug in rcu_barrier().
From the limited trace data I see in this email I have to agree.
It looks like for some reason rcu_barrier is taking forever
while the rtnl_lock is held in cleanup_net. Because the
rtnl_lock is held, modprobe of the ppp driver is getting stuck.
Is it possible we have an AB-BA deadlock between the rtnl_lock
and rcu, with something the module loading code assumes?
I am not aware of RCU ever acquiring rtnl_lock, not directly, anyway.
Does the module loading code do something strange with rcu? Perhaps
blocking an rcu grace period until the module loading completes?
If the module loading somehow blocks an rcu grace period that would
create an AB deadlock because loading the ppp module grabs the
rtnl_lock. And elsewhere we have the rtnl_lock waiting for an rcu grace
period.
I would think trying and failing to get the rtnl_lock would sleep and
thus let any rcu grace period happen but shrug.
It looks like something is holding up the rcu grace period, and causing
this. Although it is possible that something is causing cleanup_net
to run slowly and we are just seeing that slowness show up in
rcu_barrier as that is one of the slower bits. With a single trace I
can't definitively say that the rcu barrier is getting stuck but it
certainly looks that way.
Don't get me wrong -- the fact that this kthread appears to have
blocked within rcu_barrier() for 120 seconds means that something is
most definitely wrong here. I am surprised that there are no RCU CPU
stall warnings, but perhaps the blockage is in the callback execution
rather than grace-period completion. Or something is preventing this
kthread from starting up after the wake-up callback executes. Or...

Is this thing reproducible?

Thanx, Paul

Josh Boyer
2014-10-22 19:40:02 UTC
On Wed, Oct 22, 2014 at 2:55 PM, Paul E. McKenney
Post by Paul E. McKenney
Post by Eric W. Biederman
Post by Cong Wang
(Adding Paul and Eric in Cc)
Post by Josh Boyer
[ 240.599195] INFO: task kworker/u16:5:100 blocked for more than 120 seconds.
[ 240.599338] Not tainted 3.18.0-0.rc1.git2.1.fc22.x86_64 #1
[ 240.599446] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 240.599583] kworker/u16:5 D ffff8802202db480 12400 100 2 0x00000000
[ 240.599744] Workqueue: netns cleanup_net
[ 240.599823] ffff8802202eb9e8 0000000000000096 ffff8802202db480
00000000001d5f00
[ 240.600066] ffff8802202ebfd8 00000000001d5f00 ffff8800368c3480
ffff8802202db480
[ 240.600228] ffffffff81ee2690 7fffffffffffffff ffffffff81ee2698
ffffffff81ee2690
[ 240.600445] [<ffffffff8185e239>] schedule+0x29/0x70
[ 240.600541] [<ffffffff8186345c>] schedule_timeout+0x26c/0x410
[ 240.600651] [<ffffffff81865ef7>] ? retint_restore_args+0x13/0x13
[ 240.600765] [<ffffffff818644e4>] ? _raw_spin_unlock_irq+0x34/0x50
[ 240.600879] [<ffffffff8185fc6c>] wait_for_completion+0x10c/0x150
[ 240.601025] [<ffffffff810e53e0>] ? wake_up_state+0x20/0x20
[ 240.601133] [<ffffffff8112a749>] _rcu_barrier+0x159/0x200
[ 240.601237] [<ffffffff8112a845>] rcu_barrier+0x15/0x20
[ 240.601335] [<ffffffff81718ebf>] netdev_run_todo+0x6f/0x310
[ 240.601442] [<ffffffff8170da85>] ? rollback_registered_many+0x265/0x2e0
[ 240.601564] [<ffffffff81725f2e>] rtnl_unlock+0xe/0x10
[ 240.601660] [<ffffffff8170f8e6>] default_device_exit_batch+0x156/0x180
[ 240.601781] [<ffffffff810fd8a0>] ? abort_exclusive_wait+0xb0/0xb0
[ 240.601895] [<ffffffff81707993>] ops_exit_list.isra.1+0x53/0x60
[ 240.602028] [<ffffffff81708540>] cleanup_net+0x100/0x1f0
[ 240.602131] [<ffffffff810ccfa8>] process_one_work+0x218/0x850
[ 240.602241] [<ffffffff810ccf0f>] ? process_one_work+0x17f/0x850
[ 240.602350] [<ffffffff810cd6c7>] ? worker_thread+0xe7/0x4a0
[ 240.602454] [<ffffffff810cd64b>] worker_thread+0x6b/0x4a0
[ 240.602555] [<ffffffff810cd5e0>] ? process_one_work+0x850/0x850
[ 240.602665] [<ffffffff810d399b>] kthread+0x10b/0x130
[ 240.602762] [<ffffffff81028cc9>] ? sched_clock+0x9/0x10
[ 240.602862] [<ffffffff810d3890>] ? kthread_create_on_node+0x250/0x250
[ 240.603004] [<ffffffff818651fc>] ret_from_fork+0x7c/0xb0
[ 240.603106] [<ffffffff810d3890>] ? kthread_create_on_node+0x250/0x250
[ 240.603304] #0: ("%s""netns"){.+.+.+}, at: [<ffffffff810ccf0f>]
process_one_work+0x17f/0x850
[<ffffffff810ccf0f>] process_one_work+0x17f/0x850
[ 240.603691] #2: (net_mutex){+.+.+.}, at: [<ffffffff817084cc>]
cleanup_net+0x8c/0x1f0
[<ffffffff8112a625>] _rcu_barrier+0x35/0x200
[ 240.604211] INFO: task modprobe:1387 blocked for more than 120 seconds.
[ 240.604329] Not tainted 3.18.0-0.rc1.git2.1.fc22.x86_64 #1
[ 240.604434] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 240.604570] modprobe D ffff8800cb4f1a40 13112 1387 1386 0x00000080
[ 240.604719] ffff8800cafbbbe8 0000000000000096 ffff8800cb4f1a40
00000000001d5f00
[ 240.604878] ffff8800cafbbfd8 00000000001d5f00 ffff880223280000
ffff8800cb4f1a40
[ 240.605068] ffff8800cb4f1a40 ffffffff81f8fb48 0000000000000246
ffff8800cb4f1a40
[ 240.605283] [<ffffffff8185e7e1>] schedule_preempt_disabled+0x31/0x80
[ 240.605400] [<ffffffff81860033>] mutex_lock_nested+0x183/0x440
[ 240.605510] [<ffffffff8170835f>] ? register_pernet_subsys+0x1f/0x50
[ 240.605626] [<ffffffff8170835f>] ? register_pernet_subsys+0x1f/0x50
[ 240.605757] [<ffffffffa0701000>] ? 0xffffffffa0701000
[ 240.605854] [<ffffffff8170835f>] register_pernet_subsys+0x1f/0x50
[ 240.606005] [<ffffffffa0701048>] br_init+0x48/0xd3 [bridge]
[ 240.606112] [<ffffffff81002148>] do_one_initcall+0xd8/0x210
[ 240.606224] [<ffffffff81153c02>] load_module+0x20c2/0x2870
[ 240.606327] [<ffffffff8114ebe0>] ? store_uevent+0x70/0x70
[ 240.606433] [<ffffffff8110ac26>] ? lock_release_non_nested+0x3c6/0x3d0
[ 240.606557] [<ffffffff81154497>] SyS_init_module+0xe7/0x140
[ 240.606664] [<ffffffff818652a9>] system_call_fastpath+0x12/0x17
[ 240.606845] #0: (net_mutex){+.+.+.}, at: [<ffffffff8170835f>]
register_pernet_subsys+0x1f/0x50
[ 240.607114] INFO: task modprobe:1466 blocked for more than 120 seconds.
[ 240.607231] Not tainted 3.18.0-0.rc1.git2.1.fc22.x86_64 #1
[ 240.607337] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 240.607473] modprobe D ffff88020fbab480 13096 1466 1399 0x00000084
[ 240.607622] ffff88020d1bbbe8 0000000000000096 ffff88020fbab480
00000000001d5f00
[ 240.607791] ffff88020d1bbfd8 00000000001d5f00 ffffffff81e1b580
ffff88020fbab480
[ 240.607949] ffff88020fbab480 ffffffff81f8fb48 0000000000000246
ffff88020fbab480
[ 240.608193] [<ffffffff8185e7e1>] schedule_preempt_disabled+0x31/0x80
[ 240.608316] [<ffffffff81860033>] mutex_lock_nested+0x183/0x440
[ 240.608425] [<ffffffff817083ad>] ? register_pernet_device+0x1d/0x70
[ 240.608542] [<ffffffff817083ad>] ? register_pernet_device+0x1d/0x70
[ 240.608662] [<ffffffffa071d000>] ? 0xffffffffa071d000
[ 240.608759] [<ffffffff817083ad>] register_pernet_device+0x1d/0x70
[ 240.608881] [<ffffffffa071d020>] ppp_init+0x20/0x1000 [ppp_generic]
[ 240.609021] [<ffffffff81002148>] do_one_initcall+0xd8/0x210
[ 240.609131] [<ffffffff81153c02>] load_module+0x20c2/0x2870
[ 240.609235] [<ffffffff8114ebe0>] ? store_uevent+0x70/0x70
[ 240.609339] [<ffffffff8110ac26>] ? lock_release_non_nested+0x3c6/0x3d0
[ 240.609462] [<ffffffff81154497>] SyS_init_module+0xe7/0x140
[ 240.609568] [<ffffffff818652a9>] system_call_fastpath+0x12/0x17
[ 240.609749] #0: (net_mutex){+.+.+.}, at: [<ffffffff817083ad>]
register_pernet_device+0x1d/0x70
Looks like contention on net_mutex or something, but I honestly have
no idea yet. I can't recreate it myself at the moment or I would
bisect.
Has nobody else run into this with the pre-3.18 kernels? Fedora isn't
carrying any patches in this area.
I am not aware of any change in net/core/dev.c related here,
so I guess it's a bug in rcu_barrier().
From the limited trace data I see in this email I have to agree.
It looks like for some reason rcu_barrier is taking forever
while the rtnl_lock is held in cleanup_net. Because the
rtnl_lock is held, modprobe of the ppp driver is getting stuck.
Is it possible we have an AB-BA deadlock between the rtnl_lock
and rcu, with something the module loading code assumes?
I am not aware of RCU ever acquiring rtnl_lock, not directly, anyway.
Does the module loading code do something strange with rcu? Perhaps
blocking an rcu grace period until the module loading completes?
If the module loading somehow blocks an rcu grace period that would
create an AB deadlock because loading the ppp module grabs the
rtnl_lock. And elsewhere we have the rtnl_lock waiting for an rcu grace
period.
I would think trying and failing to get the rtnl_lock would sleep and
thus let any rcu grace period happen but shrug.
It looks like something is holding up the rcu grace period, and causing
this. Although it is possible that something is causing cleanup_net
to run slowly and we are just seeing that slowness show up in
rcu_barrier as that is one of the slower bits. With a single trace I
can't definitively say that the rcu barrier is getting stuck but it
certainly looks that way.
Don't get me wrong -- the fact that this kthread appears to have
blocked within rcu_barrier() for 120 seconds means that something is
most definitely wrong here. I am surprised that there are no RCU CPU
stall warnings, but perhaps the blockage is in the callback execution
rather than grace-period completion. Or something is preventing this
kthread from starting up after the wake-up callback executes. Or...
Is this thing reproducible?
I've added Yanko on CC, who reported the backtrace above and can
recreate it reliably. Apparently reverting the RCU merge commit
(d6dd50e) and rebuilding the latest after that does not show the
issue. I'll let Yanko explain more and answer any questions you have.

josh
Paul E. McKenney
2014-10-22 23:30:02 UTC
Post by Josh Boyer
On Wed, Oct 22, 2014 at 2:55 PM, Paul E. McKenney
[ . . . ]
Post by Josh Boyer
Post by Paul E. McKenney
Don't get me wrong -- the fact that this kthread appears to have
blocked within rcu_barrier() for 120 seconds means that something is
most definitely wrong here. I am surprised that there are no RCU CPU
stall warnings, but perhaps the blockage is in the callback execution
rather than grace-period completion. Or something is preventing this
kthread from starting up after the wake-up callback executes. Or...
Is this thing reproducible?
I've added Yanko on CC, who reported the backtrace above and can
recreate it reliably. Apparently reverting the RCU merge commit
(d6dd50e) and rebuilding the latest after that does not show the
issue. I'll let Yanko explain more and answer any questions you have.
- It is reproducible
- I've done another build here to double check and it's definitely the rcu merge
that's causing it.
Don't think I'll be able to dig deeper, but I can do testing if needed.
Please! Does the following patch help?

Thanx, Paul

------------------------------------------------------------------------

rcu: More on deadlock between CPU hotplug and expedited grace periods

Commit dd56af42bd82 (rcu: Eliminate deadlock between CPU hotplug and
expedited grace periods) was incomplete. Although it did eliminate
deadlocks involving synchronize_sched_expedited()'s acquisition of
cpu_hotplug.lock via get_online_cpus(), it did nothing about the similar
deadlock involving acquisition of this same lock via put_online_cpus().
This deadlock became apparent with testing involving hibernation.

This commit therefore changes put_online_cpus() acquisition of this lock
to be conditional, and increments a new cpu_hotplug.puts_pending field
in case of acquisition failure. Then cpu_hotplug_begin() checks for this
new field being non-zero, and applies any changes to cpu_hotplug.refcount.

Reported-by: Jiri Kosina <***@suse.cz>
Signed-off-by: Paul E. McKenney <***@linux.vnet.ibm.com>
Tested-by: Jiri Kosina <***@suse.cz>

diff --git a/kernel/cpu.c b/kernel/cpu.c
index 356450f09c1f..90a3d017b90c 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -64,6 +64,8 @@ static struct {
* an ongoing cpu hotplug operation.
*/
int refcount;
+ /* And allows lockless put_online_cpus(). */
+ atomic_t puts_pending;

#ifdef CONFIG_DEBUG_LOCK_ALLOC
struct lockdep_map dep_map;
@@ -113,7 +115,11 @@ void put_online_cpus(void)
{
if (cpu_hotplug.active_writer == current)
return;
- mutex_lock(&cpu_hotplug.lock);
+ if (!mutex_trylock(&cpu_hotplug.lock)) {
+ atomic_inc(&cpu_hotplug.puts_pending);
+ cpuhp_lock_release();
+ return;
+ }

if (WARN_ON(!cpu_hotplug.refcount))
cpu_hotplug.refcount++; /* try to fix things up */
@@ -155,6 +161,12 @@ void cpu_hotplug_begin(void)
cpuhp_lock_acquire();
for (;;) {
mutex_lock(&cpu_hotplug.lock);
+ if (atomic_read(&cpu_hotplug.puts_pending)) {
+ int delta;
+
+ delta = atomic_xchg(&cpu_hotplug.puts_pending, 0);
+ cpu_hotplug.refcount -= delta;
+ }
if (likely(!cpu_hotplug.refcount))
break;
__set_current_state(TASK_UNINTERRUPTIBLE);

Yanko Kaneti
2014-10-23 06:20:02 UTC
Post by Paul E. McKenney
Post by Josh Boyer
On Wed, Oct 22, 2014 at 2:55 PM, Paul E. McKenney
[ . . . ]
Post by Josh Boyer
Post by Paul E. McKenney
Don't get me wrong -- the fact that this kthread appears to have
blocked within rcu_barrier() for 120 seconds means that
something is
most definitely wrong here. I am surprised that there are no RCU CPU
stall warnings, but perhaps the blockage is in the callback execution
rather than grace-period completion. Or something is
preventing this
kthread from starting up after the wake-up callback executes.
Or...
Is this thing reproducible?
I've added Yanko on CC, who reported the backtrace above and can
recreate it reliably. Apparently reverting the RCU merge commit
(d6dd50e) and rebuilding the latest after that does not show the
issue. I'll let Yanko explain more and answer any questions you have.
- It is reproducible
- I've done another build here to double check and it's definitely the rcu merge
that's causing it.
Don't think I'll be able to dig deeper, but I can do testing if needed.
Please! Does the following patch help?
Nope, doesn't seem to make a difference to the modprobe ppp_generic
test


INFO: task kworker/u16:6:101 blocked for more than 120 seconds.
Not tainted 3.18.0-0.rc1.git2.3.fc22.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
kworker/u16:6 D ffff88022067cec0 11680 101 2 0x00000000
Workqueue: netns cleanup_net
ffff8802206939e8 0000000000000096 ffff88022067cec0 00000000001d5f00
ffff880220693fd8 00000000001d5f00 ffff880223263480 ffff88022067cec0
ffffffff82c51d60 7fffffffffffffff ffffffff81ee2698 ffffffff81ee2690
Call Trace:
[<ffffffff8185e289>] schedule+0x29/0x70
[<ffffffff818634ac>] schedule_timeout+0x26c/0x410
[<ffffffff81028c4a>] ? native_sched_clock+0x2a/0xa0
[<ffffffff81107afc>] ? mark_held_locks+0x7c/0xb0
[<ffffffff81864530>] ? _raw_spin_unlock_irq+0x30/0x50
[<ffffffff81107c8d>] ? trace_hardirqs_on_caller+0x15d/0x200
[<ffffffff8185fcbc>] wait_for_completion+0x10c/0x150
[<ffffffff810e5430>] ? wake_up_state+0x20/0x20
[<ffffffff8112a799>] _rcu_barrier+0x159/0x200
[<ffffffff8112a895>] rcu_barrier+0x15/0x20
[<ffffffff81718f0f>] netdev_run_todo+0x6f/0x310
[<ffffffff8170dad5>] ? rollback_registered_many+0x265/0x2e0
[<ffffffff81725f7e>] rtnl_unlock+0xe/0x10
[<ffffffff8170f936>] default_device_exit_batch+0x156/0x180
[<ffffffff810fd8f0>] ? abort_exclusive_wait+0xb0/0xb0
[<ffffffff817079e3>] ops_exit_list.isra.1+0x53/0x60
[<ffffffff81708590>] cleanup_net+0x100/0x1f0
[<ffffffff810ccff8>] process_one_work+0x218/0x850
[<ffffffff810ccf5f>] ? process_one_work+0x17f/0x850
[<ffffffff810cd717>] ? worker_thread+0xe7/0x4a0
[<ffffffff810cd69b>] worker_thread+0x6b/0x4a0
[<ffffffff810cd630>] ? process_one_work+0x850/0x850
[<ffffffff810d39eb>] kthread+0x10b/0x130
[<ffffffff81028cc9>] ? sched_clock+0x9/0x10
[<ffffffff810d38e0>] ? kthread_create_on_node+0x250/0x250
[<ffffffff8186527c>] ret_from_fork+0x7c/0xb0
[<ffffffff810d38e0>] ? kthread_create_on_node+0x250/0x250
4 locks held by kworker/u16:6/101:
#0: ("%s""netns"){.+.+.+}, at: [<ffffffff810ccf5f>] process_one_work+0x17f/0x850
#1: (net_cleanup_work){+.+.+.}, at: [<ffffffff810ccf5f>] process_one_work+0x17f/0x850
#2: (net_mutex){+.+.+.}, at: [<ffffffff8170851c>] cleanup_net+0x8c/0x1f0
#3: (rcu_sched_state.barrier_mutex){+.+...}, at: [<ffffffff8112a675>] _rcu_barrier+0x35/0x200
INFO: task modprobe:1139 blocked for more than 120 seconds.
Not tainted 3.18.0-0.rc1.git2.3.fc22.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
modprobe D ffff880213ac1a40 13112 1139 1138 0x00000080
ffff880036ab3be8 0000000000000096 ffff880213ac1a40 00000000001d5f00
ffff880036ab3fd8 00000000001d5f00 ffff880223264ec0 ffff880213ac1a40
ffff880213ac1a40 ffffffff81f8fb48 0000000000000246 ffff880213ac1a40
Call Trace:
[<ffffffff8185e831>] schedule_preempt_disabled+0x31/0x80
[<ffffffff81860083>] mutex_lock_nested+0x183/0x440
[<ffffffff817083af>] ? register_pernet_subsys+0x1f/0x50
[<ffffffff817083af>] ? register_pernet_subsys+0x1f/0x50
[<ffffffffa06f3000>] ? 0xffffffffa06f3000
[<ffffffff817083af>] register_pernet_subsys+0x1f/0x50
[<ffffffffa06f3048>] br_init+0x48/0xd3 [bridge]
[<ffffffff81002148>] do_one_initcall+0xd8/0x210
[<ffffffff81153c52>] load_module+0x20c2/0x2870
[<ffffffff8114ec30>] ? store_uevent+0x70/0x70
[<ffffffff8110ac76>] ? lock_release_non_nested+0x3c6/0x3d0
[<ffffffff811544e7>] SyS_init_module+0xe7/0x140
[<ffffffff81865329>] system_call_fastpath+0x12/0x17
1 lock held by modprobe/1139:
#0: (net_mutex){+.+.+.}, at: [<ffffffff817083af>]
register_pernet_subsys+0x1f/0x50
INFO: task modprobe:1209 blocked for more than 120 seconds.
Not tainted 3.18.0-0.rc1.git2.3.fc22.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
modprobe D ffff8800c5324ec0 13368 1209 1151 0x00000080
ffff88020d14bbe8 0000000000000096 ffff8800c5324ec0 00000000001d5f00
ffff88020d14bfd8 00000000001d5f00 ffff880223280000 ffff8800c5324ec0
ffff8800c5324ec0 ffffffff81f8fb48 0000000000000246 ffff8800c5324ec0
Call Trace:
[<ffffffff8185e831>] schedule_preempt_disabled+0x31/0x80
[<ffffffff81860083>] mutex_lock_nested+0x183/0x440
[<ffffffff817083fd>] ? register_pernet_device+0x1d/0x70
[<ffffffff817083fd>] ? register_pernet_device+0x1d/0x70
[<ffffffffa070f000>] ? 0xffffffffa070f000
[<ffffffff817083fd>] register_pernet_device+0x1d/0x70
[<ffffffffa070f020>] ppp_init+0x20/0x1000 [ppp_generic]
[<ffffffff81002148>] do_one_initcall+0xd8/0x210
[<ffffffff81153c52>] load_module+0x20c2/0x2870
[<ffffffff8114ec30>] ? store_uevent+0x70/0x70
[<ffffffff8110ac76>] ? lock_release_non_nested+0x3c6/0x3d0
[<ffffffff811544e7>] SyS_init_module+0xe7/0x140
[<ffffffff81865329>] system_call_fastpath+0x12/0x17
1 lock held by modprobe/1209:
#0: (net_mutex){+.+.+.}, at: [<ffffffff817083fd>] register_pernet_device+0x1d/0x70
Post by Paul E. McKenney
Thanx, Paul
------------------------------------------------------------------------
rcu: More on deadlock between CPU hotplug and expedited grace periods
Commit dd56af42bd82 (rcu: Eliminate deadlock between CPU hotplug and
expedited grace periods) was incomplete. Although it did eliminate
deadlocks involving synchronize_sched_expedited()'s acquisition of
cpu_hotplug.lock via get_online_cpus(), it did nothing about the similar
deadlock involving acquisition of this same lock via
put_online_cpus().
This deadlock became apparent with testing involving hibernation.
This commit therefore changes put_online_cpus() acquisition of this lock
to be conditional, and increments a new cpu_hotplug.puts_pending field
in case of acquisition failure. Then cpu_hotplug_begin() checks for this
new field being non-zero, and applies any changes to
cpu_hotplug.refcount.
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 356450f09c1f..90a3d017b90c 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -64,6 +64,8 @@ static struct {
* an ongoing cpu hotplug operation.
*/
int refcount;
+ /* And allows lockless put_online_cpus(). */
+ atomic_t puts_pending;
#ifdef CONFIG_DEBUG_LOCK_ALLOC
struct lockdep_map dep_map;
@@ -113,7 +115,11 @@ void put_online_cpus(void)
{
if (cpu_hotplug.active_writer == current)
return;
- mutex_lock(&cpu_hotplug.lock);
+ if (!mutex_trylock(&cpu_hotplug.lock)) {
+ atomic_inc(&cpu_hotplug.puts_pending);
+ cpuhp_lock_release();
+ return;
+ }
if (WARN_ON(!cpu_hotplug.refcount))
cpu_hotplug.refcount++; /* try to fix things up */
@@ -155,6 +161,12 @@ void cpu_hotplug_begin(void)
cpuhp_lock_acquire();
for (;;) {
mutex_lock(&cpu_hotplug.lock);
+ if (atomic_read(&cpu_hotplug.puts_pending)) {
+ int delta;
+
+ delta = atomic_xchg(&cpu_hotplug.puts_pending, 0);
+ cpu_hotplug.refcount -= delta;
+ }
if (likely(!cpu_hotplug.refcount))
break;
__set_current_state(TASK_UNINTERRUPTIBLE);
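A minimal userspace sketch of the same pattern may make the intent easier to see. The names below are hypothetical, and pthreads plus C11 atomics stand in for the kernel's mutex and atomic_t, so this is an illustration of the idea rather than the kernel code: a put that cannot take the mutex records itself in an atomic counter, and the writer folds that counter back into the refcount under the mutex before testing it.

/*
 * Minimal userspace sketch of the puts_pending idea -- NOT the kernel
 * implementation; all names here are hypothetical stand-ins.
 */
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

static pthread_mutex_t hp_lock = PTHREAD_MUTEX_INITIALIZER;
static int refcount;              /* readers currently "online"          */
static atomic_int puts_pending;   /* puts that could not take the mutex  */

static void get_online(void)
{
        pthread_mutex_lock(&hp_lock);
        refcount++;
        pthread_mutex_unlock(&hp_lock);
}

static void put_online(void)
{
        /* Never block here: if the mutex is contended (for example, held
         * by the hotplug writer), record the put and return. */
        if (pthread_mutex_trylock(&hp_lock) != 0) {
                atomic_fetch_add(&puts_pending, 1);
                return;
        }
        refcount--;
        pthread_mutex_unlock(&hp_lock);
}

static void hotplug_begin(void)
{
        for (;;) {
                pthread_mutex_lock(&hp_lock);
                /* Fold in the puts that bypassed the mutex. */
                refcount -= atomic_exchange(&puts_pending, 0);
                if (refcount == 0)
                        return;         /* exclusive access, mutex stays held */
                pthread_mutex_unlock(&hp_lock);
                sched_yield();          /* the kernel sleeps here instead */
        }
}

static void hotplug_done(void)
{
        pthread_mutex_unlock(&hp_lock);
}

int main(void)
{
        get_online();
        put_online();      /* uncontended here, so the pending path is not needed */
        hotplug_begin();   /* returns with the mutex held and refcount == 0       */
        hotplug_done();
        return 0;
}

The trade-off is the same as in the patch: the put side never blocks on the hotplug mutex, but a skipped decrement only takes effect the next time the writer path takes the mutex and drains the pending counter.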
Paul E. McKenney
2014-10-23 16:40:02 UTC
Permalink
Post by Yanko Kaneti
Post by Paul E. McKenney
Post by Josh Boyer
On Wed, Oct 22, 2014 at 2:55 PM, Paul E. McKenney
[ . . . ]
Post by Josh Boyer
Post by Paul E. McKenney
Don't get me wrong -- the fact that this kthread appears to
have
blocked within rcu_barrier() for 120 seconds means that
something is
most definitely wrong here. I am surprised that there are no
RCU CPU
stall warnings, but perhaps the blockage is in the callback
execution
rather than grace-period completion. Or something is
preventing this
kthread from starting up after the wake-up callback executes.
Or...
Is this thing reproducible?
I've added Yanko on CC, who reported the backtrace above and can
recreate it reliably. Apparently reverting the RCU merge commit
(d6dd50e) and rebuilding the latest after that does not show the
issue. I'll let Yanko explain more and answer any questions you
have.
- It is reproducible
- I've done another build here to double check and its definitely
the rcu merge
that's causing it.
Don't think I'll be able to dig deeper, but I can do testing if needed.
Please! Does the following patch help?
Nope, doesn't seem to make a difference to the modprobe ppp_generic
test
Well, I was hoping. I will take a closer look at the RCU merge commit
and see what suggests itself. I am likely to ask you to revert specific
commits, if that works for you.
Well, rather than reverting commits, could you please try testing the
following commits?
11ed7f934cb8 (rcu: Make nocb leader kthreads process pending callbacks
after spawning)
73a860cd58a1 (rcu: Replace flush_signals() with WARN_ON(signal_pending()))
c847f14217d5 (rcu: Avoid misordering in nocb_leader_wait())
For whatever it is worth, I am guessing this one.
a53dd6a65668 (rcutorture: Add RCU-tasks tests to default rcutorture list)
If any of the above fail, this one should also fail.
Also, could you please send along your .config?
Which tree are those in?
They are all in Linus's tree. They are topic branches of the RCU merge
commit (d6dd50e), and the test results will hopefully give me more of a
clue where to look. As would the .config file. ;-)

Thanx, Paul

Yanko Kaneti
2014-10-23 20:00:01 UTC
Permalink
Post by Yanko Kaneti
Post by Paul E. McKenney
Post by Josh Boyer
On Wed, Oct 22, 2014 at 2:55 PM, Paul E. McKenney
[ . . . ]
Post by Josh Boyer
Post by Paul E. McKenney
Don't get me wrong -- the fact that this kthread appears to have
blocked within rcu_barrier() for 120 seconds means that
something is
most definitely wrong here. I am surprised that there are no
RCU CPU
stall warnings, but perhaps the blockage is in the callback
execution
rather than grace-period completion. Or something is
preventing this
kthread from starting up after the wake-up callback executes.
Or...
Is this thing reproducible?
I've added Yanko on CC, who reported the backtrace above and can
recreate it reliably. Apparently reverting the RCU merge commit
(d6dd50e) and rebuilding the latest after that does not show the
issue. I'll let Yanko explain more and answer any questions you have.
- It is reproducible
- I've done another build here to double check and its definitely
the rcu merge
that's causing it.
Don't think I'll be able to dig deeper, but I can do testing if needed.
Please! Does the following patch help?
Nope, doesn't seem to make a difference to the modprobe ppp_generic
test
Well, I was hoping. I will take a closer look at the RCU merge commit
and see what suggests itself. I am likely to ask you to revert specific
commits, if that works for you.
Well, rather than reverting commits, could you please try testing the
following commits?
11ed7f934cb8 (rcu: Make nocb leader kthreads process pending callbacks after spawning)
73a860cd58a1 (rcu: Replace flush_signals() with WARN_ON(signal_pending()))
c847f14217d5 (rcu: Avoid misordering in nocb_leader_wait())
For whatever it is worth, I am guessing this one.
Indeed, c847f14217d5 it is.

Much to my embarrassment I just noticed that in addition to the
rcu merge, triggering the bug "requires" my specific Fedora Rawhide network
setup. Booting in single mode and doing modprobe ppp_generic is fine. The bug
appears when starting with my regular Fedora network setup, which in my case
includes 3 ethernet adapters and a libvirt bridge+nat setup.

Hope that helps.

I am attaching the config.
Paul E. McKenney
2014-10-23 20:10:01 UTC
Permalink
Post by Yanko Kaneti
Post by Yanko Kaneti
Post by Paul E. McKenney
Post by Josh Boyer
On Wed, Oct 22, 2014 at 2:55 PM, Paul E. McKenney
[ . . . ]
Post by Josh Boyer
Post by Paul E. McKenney
Don't get me wrong -- the fact that this kthread appears to
have
blocked within rcu_barrier() for 120 seconds means that
something is
most definitely wrong here. I am surprised that there are no
RCU CPU
stall warnings, but perhaps the blockage is in the callback
execution
rather than grace-period completion. Or something is
preventing this
kthread from starting up after the wake-up callback executes.
Or...
Is this thing reproducible?
I've added Yanko on CC, who reported the backtrace above and can
recreate it reliably. Apparently reverting the RCU merge commit
(d6dd50e) and rebuilding the latest after that does not show the
issue. I'll let Yanko explain more and answer any questions you
have.
- It is reproducible
- I've done another build here to double check and its definitely
the rcu merge
that's causing it.
Don't think I'll be able to dig deeper, but I can do testing if needed.
Please! Does the following patch help?
Nope, doesn't seem to make a difference to the modprobe ppp_generic
test
Well, I was hoping. I will take a closer look at the RCU merge commit
and see what suggests itself. I am likely to ask you to revert specific
commits, if that works for you.
Well, rather than reverting commits, could you please try testing the
following commits?
11ed7f934cb8 (rcu: Make nocb leader kthreads process pending callbacks after spawning)
73a860cd58a1 (rcu: Replace flush_signals() with WARN_ON(signal_pending()))
c847f14217d5 (rcu: Avoid misordering in nocb_leader_wait())
For whatever it is worth, I am guessing this one.
Indeed, c847f14217d5 it is.
Much to my embarrasment I just noticed that in addition to the
rcu merge, triggering the bug "requires" my specific Fedora rawhide network
setup. Booting in single mode and modprobe ppp_generic is fine. The bug
appears when starting with my regular fedora network setup, which in my case
includes 3 ethernet adapters and a libvirt birdge+nat setup.
Hope that helps.
I am attaching the config.
It does help a lot, thank you!!!

The following patch is a bit of a shot in the dark, and assumes that
commit 1772947bd012 (rcu: Handle NOCB callbacks from irq-disabled idle
code) introduced the problem. Does this patch fix things up?

Thanx, Paul

------------------------------------------------------------------------

rcu: Kick rcuo kthreads after their CPU goes offline

If a no-CBs CPU were to post an RCU callback with interrupts disabled
after it entered the idle loop for the last time, there might be no
deferred wakeup for the corresponding rcuo kthreads. This commit
therefore adds a set of calls to do_nocb_deferred_wakeup() after the
CPU has gone completely offline.

Signed-off-by: Paul E. McKenney <***@linux.vnet.ibm.com>

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 84b41b3c6ebd..4f3d25a58786 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3493,8 +3493,10 @@ static int rcu_cpu_notify(struct notifier_block *self,
case CPU_DEAD_FROZEN:
case CPU_UP_CANCELED:
case CPU_UP_CANCELED_FROZEN:
- for_each_rcu_flavor(rsp)
+ for_each_rcu_flavor(rsp) {
rcu_cleanup_dead_cpu(cpu, rsp);
+ do_nocb_deferred_wakeup(this_cpu_ptr(rsp->rda));
+ }
break;
default:
break;
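To see why a missing deferred wakeup can turn into the rcu_barrier() hangs quoted earlier in this thread, here is a stripped-down userspace sketch of the general pattern -- hypothetical names only, not the kernel's rcuo/no-CBs code: an enqueue that is not allowed to wake the consumer immediately merely records that a wakeup is owed, and if no later hook pays off that debt, the consumer sleeps forever and so does anything waiting on its work.

/*
 * Userspace sketch of the deferred-wakeup idea -- NOT the kernel code;
 * every name below is a hypothetical stand-in.
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

static pthread_mutex_t cb_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cb_cond = PTHREAD_COND_INITIALIZER;
static int cb_queued;
static atomic_bool wakeup_deferred;   /* a wakeup is owed to the kthread */

/* The "rcuo kthread": runs queued callbacks once it is woken up. */
static void *cb_kthread(void *arg)
{
        (void)arg;
        pthread_mutex_lock(&cb_lock);
        while (cb_queued == 0)
                pthread_cond_wait(&cb_cond, &cb_lock);
        printf("ran %d callback(s)\n", cb_queued);
        cb_queued = 0;
        pthread_mutex_unlock(&cb_lock);
        return NULL;
}

/* Enqueue a callback.  When an immediate wakeup is not allowed (the
 * kernel case is "interrupts disabled on the idle path"), only note
 * that a wakeup is owed. */
static void enqueue_cb(bool can_wake_now)
{
        pthread_mutex_lock(&cb_lock);
        cb_queued++;
        pthread_mutex_unlock(&cb_lock);

        if (can_wake_now)
                pthread_cond_signal(&cb_cond);
        else
                atomic_store(&wakeup_deferred, true);
}

/* Hook run from "safe" points; the patch above adds one more such
 * point, after the CPU has gone completely offline. */
static void do_deferred_wakeup(void)
{
        if (atomic_exchange(&wakeup_deferred, false))
                pthread_cond_signal(&cb_cond);
}

int main(void)
{
        pthread_t tid;

        pthread_create(&tid, NULL, cb_kthread, NULL);
        enqueue_cb(false);        /* wakeup deferred...                     */
        do_deferred_wakeup();     /* ...without this call, cb_kthread --    */
        pthread_join(tid, NULL);  /* and anyone waiting on it -- never runs */
        return 0;
}

The suspicion behind this patch is that the CPU-offline path was one place where the owed wakeup was never paid, leaving the callback queued, the rcuo kthread asleep, and the rcu_barrier() in netdev_run_todo()/cleanup_net() waiting on a completion that never arrives.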

Yanko Kaneti
2014-10-23 21:50:04 UTC
Permalink
Post by Paul E. McKenney
Post by Yanko Kaneti
Post by Yanko Kaneti
Post by Paul E. McKenney
Post by Josh Boyer
On Wed, Oct 22, 2014 at 2:55 PM, Paul E. McKenney
[ . . . ]
Post by Josh Boyer
Post by Paul E. McKenney
Don't get me wrong -- the fact that this kthread appears to
have
blocked within rcu_barrier() for 120 seconds means that
something is
most definitely wrong here. I am surprised that
there are no
RCU CPU
stall warnings, but perhaps the blockage is in the callback
execution
rather than grace-period completion. Or something is
preventing this
kthread from starting up after the wake-up callback
executes.
Or...
Is this thing reproducible?
I've added Yanko on CC, who reported the backtrace
above and can
recreate it reliably. Apparently reverting the RCU
merge commit
(d6dd50e) and rebuilding the latest after that does
not show the
issue. I'll let Yanko explain more and answer any
questions you
have.
- It is reproducible
- I've done another build here to double check and its definitely
the rcu merge
that's causing it.
Don't think I'll be able to dig deeper, but I can do
testing if
needed.
Please! Does the following patch help?
Nope, doesn't seem to make a difference to the modprobe ppp_generic
test
Well, I was hoping. I will take a closer look at the RCU merge commit
and see what suggests itself. I am likely to ask you to
revert specific
commits, if that works for you.
Well, rather than reverting commits, could you please try
testing the
following commits?
11ed7f934cb8 (rcu: Make nocb leader kthreads process pending
callbacks after spawning)
73a860cd58a1 (rcu: Replace flush_signals() with
WARN_ON(signal_pending()))
c847f14217d5 (rcu: Avoid misordering in nocb_leader_wait())
For whatever it is worth, I am guessing this one.
Indeed, c847f14217d5 it is.
Much to my embarrasment I just noticed that in addition to the
rcu merge, triggering the bug "requires" my specific Fedora
rawhide network
setup. Booting in single mode and modprobe ppp_generic is fine. The bug
appears when starting with my regular fedora network setup, which in my case
includes 3 ethernet adapters and a libvirt birdge+nat setup.
Hope that helps.
I am attaching the config.
It does help a lot, thank you!!!
The following patch is a bit of a shot in the dark, and assumes that
commit 1772947bd012 (rcu: Handle NOCB callbacks from irq-disabled idle
code) introduced the problem. Does this patch fix things up?
Unfortunately not. This is the tip of Linus's tree plus the patch:


INFO: task kworker/u16:6:96 blocked for more than 120 seconds.
Not tainted 3.18.0-rc1+ #4
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/u16:6 D ffff8800ca84cec0 11168 96 2 0x00000000
Workqueue: netns cleanup_net
ffff8802218339e8 0000000000000096 ffff8800ca84cec0 00000000001d5f00
ffff880221833fd8 00000000001d5f00 ffff880223264ec0 ffff8800ca84cec0
ffffffff82c52040 7fffffffffffffff ffffffff81ee2658 ffffffff81ee2650
Call Trace:
[<ffffffff8185b8e9>] schedule+0x29/0x70
[<ffffffff81860b0c>] schedule_timeout+0x26c/0x410
[<ffffffff81028bea>] ? native_sched_clock+0x2a/0xa0
[<ffffffff8110759c>] ? mark_held_locks+0x7c/0xb0
[<ffffffff81861b90>] ? _raw_spin_unlock_irq+0x30/0x50
[<ffffffff8110772d>] ? trace_hardirqs_on_caller+0x15d/0x200
[<ffffffff8185d31c>] wait_for_completion+0x10c/0x150
[<ffffffff810e4ed0>] ? wake_up_state+0x20/0x20
[<ffffffff8112a219>] _rcu_barrier+0x159/0x200
[<ffffffff8112a315>] rcu_barrier+0x15/0x20
[<ffffffff8171657f>] netdev_run_todo+0x6f/0x310
[<ffffffff8170b145>] ? rollback_registered_many+0x265/0x2e0
[<ffffffff817235ee>] rtnl_unlock+0xe/0x10
[<ffffffff8170cfa6>] default_device_exit_batch+0x156/0x180
[<ffffffff810fd390>] ? abort_exclusive_wait+0xb0/0xb0
[<ffffffff81705053>] ops_exit_list.isra.1+0x53/0x60
[<ffffffff81705c00>] cleanup_net+0x100/0x1f0
[<ffffffff810cca98>] process_one_work+0x218/0x850
[<ffffffff810cc9ff>] ? process_one_work+0x17f/0x850
[<ffffffff810cd1b7>] ? worker_thread+0xe7/0x4a0
[<ffffffff810cd13b>] worker_thread+0x6b/0x4a0
[<ffffffff810cd0d0>] ? process_one_work+0x850/0x850
[<ffffffff810d348b>] kthread+0x10b/0x130
[<ffffffff81028c69>] ? sched_clock+0x9/0x10
[<ffffffff810d3380>] ? kthread_create_on_node+0x250/0x250
[<ffffffff818628bc>] ret_from_fork+0x7c/0xb0
[<ffffffff810d3380>] ? kthread_create_on_node+0x250/0x250
4 locks held by kworker/u16:6/96:
#0: ("%s""netns"){.+.+.+}, at: [<ffffffff810cc9ff>] process_one_work+0x17f/0x850
#1: (net_cleanup_work){+.+.+.}, at: [<ffffffff810cc9ff>] process_one_work+0x17f/0x850
#2: (net_mutex){+.+.+.}, at: [<ffffffff81705b8c>] cleanup_net+0x8c/0x1f0
#3: (rcu_sched_state.barrier_mutex){+.+...}, at: [<ffffffff8112a0f5>] _rcu_barrier+0x35/0x200
INFO: task modprobe:1045 blocked for more than 120 seconds.
Not tainted 3.18.0-rc1+ #4
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
modprobe D ffff880218343480 12920 1045 1044 0x00000080
ffff880218353bf8 0000000000000096 ffff880218343480 00000000001d5f00
ffff880218353fd8 00000000001d5f00 ffffffff81e1b580 ffff880218343480
ffff880218343480 ffffffff81f8f748 0000000000000246 ffff880218343480
Call Trace:
[<ffffffff8185be91>] schedule_preempt_disabled+0x31/0x80
[<ffffffff8185d6e3>] mutex_lock_nested+0x183/0x440
[<ffffffff81705a1f>] ? register_pernet_subsys+0x1f/0x50
[<ffffffff81705a1f>] ? register_pernet_subsys+0x1f/0x50
[<ffffffffa0673000>] ? 0xffffffffa0673000
[<ffffffff81705a1f>] register_pernet_subsys+0x1f/0x50
[<ffffffffa0673048>] br_init+0x48/0xd3 [bridge]
[<ffffffff81002148>] do_one_initcall+0xd8/0x210
[<ffffffff81153052>] load_module+0x20c2/0x2870
[<ffffffff8114e030>] ? store_uevent+0x70/0x70
[<ffffffff81278717>] ? kernel_read+0x57/0x90
[<ffffffff811539e6>] SyS_finit_module+0xa6/0xe0
[<ffffffff81862969>] system_call_fastpath+0x12/0x17
1 lock held by modprobe/1045:
#0: (net_mutex){+.+.+.}, at: [<ffffffff81705a1f>] register_pernet_subsys+0x1f/0x50
Post by Paul E. McKenney
Thanx, Paul
------------------------------------------------------------------------
rcu: Kick rcuo kthreads after their CPU goes offline
If a no-CBs CPU were to post an RCU callback with interrupts disabled
after it entered the idle loop for the last time, there might be no
deferred wakeup for the corresponding rcuo kthreads. This commit
therefore adds a set of calls to do_nocb_deferred_wakeup() after the
CPU has gone completely offline.
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 84b41b3c6ebd..4f3d25a58786 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3493,8 +3493,10 @@ static int rcu_cpu_notify(struct
notifier_block *self,
- for_each_rcu_flavor(rsp)
+ for_each_rcu_flavor(rsp) {
rcu_cleanup_dead_cpu(cpu, rsp);
+ do_nocb_deferred_wakeup(this_cpu_ptr(rsp->rda));
+ }
break;
break;
Paul E. McKenney
2014-10-23 22:10:02 UTC
Permalink
Post by Yanko Kaneti
Post by Paul E. McKenney
Post by Yanko Kaneti
Post by Yanko Kaneti
Post by Paul E. McKenney
Post by Josh Boyer
On Wed, Oct 22, 2014 at 2:55 PM, Paul E. McKenney
[ . . . ]
Post by Josh Boyer
Post by Paul E. McKenney
Don't get me wrong -- the fact that this kthread
appears to
have
blocked within rcu_barrier() for 120 seconds means that
something is
most definitely wrong here. I am surprised that
there are no
RCU CPU
stall warnings, but perhaps the blockage is in the
callback
execution
rather than grace-period completion. Or something is
preventing this
kthread from starting up after the wake-up callback
executes.
Or...
Is this thing reproducible?
I've added Yanko on CC, who reported the backtrace
above and can
recreate it reliably. Apparently reverting the RCU
merge commit
(d6dd50e) and rebuilding the latest after that does
not show the
issue. I'll let Yanko explain more and answer any
questions you
have.
- It is reproducible
- I've done another build here to double check and its
definitely
the rcu merge
that's causing it.
Don't think I'll be able to dig deeper, but I can do
testing if
needed.
Please! Does the following patch help?
Nope, doesn't seem to make a difference to the modprobe ppp_generic
test
Well, I was hoping. I will take a closer look at the RCU merge commit
and see what suggests itself. I am likely to ask you to revert specific
commits, if that works for you.
Well, rather than reverting commits, could you please try testing the
following commits?
11ed7f934cb8 (rcu: Make nocb leader kthreads process pending
callbacks after spawning)
73a860cd58a1 (rcu: Replace flush_signals() with
WARN_ON(signal_pending()))
c847f14217d5 (rcu: Avoid misordering in nocb_leader_wait())
For whatever it is worth, I am guessing this one.
Indeed, c847f14217d5 it is.
Much to my embarrasment I just noticed that in addition to the
rcu merge, triggering the bug "requires" my specific Fedora rawhide network
setup. Booting in single mode and modprobe ppp_generic is fine. The bug
appears when starting with my regular fedora network setup, which in my case
includes 3 ethernet adapters and a libvirt birdge+nat setup.
Hope that helps.
I am attaching the config.
It does help a lot, thank you!!!
The following patch is a bit of a shot in the dark, and assumes that
commit 1772947bd012 (rcu: Handle NOCB callbacks from irq-disabled idle
code) introduced the problem. Does this patch fix things up?
Unfortunately not, This is linus-tip + patch
OK. Can't have everything, I guess.
Post by Yanko Kaneti
INFO: task kworker/u16:6:96 blocked for more than 120 seconds.
Not tainted 3.18.0-rc1+ #4
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/u16:6 D ffff8800ca84cec0 11168 96 2 0x00000000
Workqueue: netns cleanup_net
ffff8802218339e8 0000000000000096 ffff8800ca84cec0 00000000001d5f00
ffff880221833fd8 00000000001d5f00 ffff880223264ec0 ffff8800ca84cec0
ffffffff82c52040 7fffffffffffffff ffffffff81ee2658 ffffffff81ee2650
[<ffffffff8185b8e9>] schedule+0x29/0x70
[<ffffffff81860b0c>] schedule_timeout+0x26c/0x410
[<ffffffff81028bea>] ? native_sched_clock+0x2a/0xa0
[<ffffffff8110759c>] ? mark_held_locks+0x7c/0xb0
[<ffffffff81861b90>] ? _raw_spin_unlock_irq+0x30/0x50
[<ffffffff8110772d>] ? trace_hardirqs_on_caller+0x15d/0x200
[<ffffffff8185d31c>] wait_for_completion+0x10c/0x150
[<ffffffff810e4ed0>] ? wake_up_state+0x20/0x20
[<ffffffff8112a219>] _rcu_barrier+0x159/0x200
[<ffffffff8112a315>] rcu_barrier+0x15/0x20
[<ffffffff8171657f>] netdev_run_todo+0x6f/0x310
[<ffffffff8170b145>] ? rollback_registered_many+0x265/0x2e0
[<ffffffff817235ee>] rtnl_unlock+0xe/0x10
[<ffffffff8170cfa6>] default_device_exit_batch+0x156/0x180
[<ffffffff810fd390>] ? abort_exclusive_wait+0xb0/0xb0
[<ffffffff81705053>] ops_exit_list.isra.1+0x53/0x60
[<ffffffff81705c00>] cleanup_net+0x100/0x1f0
[<ffffffff810cca98>] process_one_work+0x218/0x850
[<ffffffff810cc9ff>] ? process_one_work+0x17f/0x850
[<ffffffff810cd1b7>] ? worker_thread+0xe7/0x4a0
[<ffffffff810cd13b>] worker_thread+0x6b/0x4a0
[<ffffffff810cd0d0>] ? process_one_work+0x850/0x850
[<ffffffff810d348b>] kthread+0x10b/0x130
[<ffffffff81028c69>] ? sched_clock+0x9/0x10
[<ffffffff810d3380>] ? kthread_create_on_node+0x250/0x250
[<ffffffff818628bc>] ret_from_fork+0x7c/0xb0
[<ffffffff810d3380>] ? kthread_create_on_node+0x250/0x250
#0: ("%s""netns"){.+.+.+}, at: [<ffffffff810cc9ff>] process_one_work+0x17f/0x850
#1: (net_cleanup_work){+.+.+.}, at: [<ffffffff810cc9ff>] process_one_work+0x17f/0x850
#2: (net_mutex){+.+.+.}, at: [<ffffffff81705b8c>] cleanup_net+0x8c/0x1f0
#3: (rcu_sched_state.barrier_mutex){+.+...}, at: [<ffffffff8112a0f5>] _rcu_barrier+0x35/0x200
INFO: task modprobe:1045 blocked for more than 120 seconds.
Not tainted 3.18.0-rc1+ #4
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
modprobe D ffff880218343480 12920 1045 1044 0x00000080
ffff880218353bf8 0000000000000096 ffff880218343480 00000000001d5f00
ffff880218353fd8 00000000001d5f00 ffffffff81e1b580 ffff880218343480
ffff880218343480 ffffffff81f8f748 0000000000000246 ffff880218343480
[<ffffffff8185be91>] schedule_preempt_disabled+0x31/0x80
[<ffffffff8185d6e3>] mutex_lock_nested+0x183/0x440
[<ffffffff81705a1f>] ? register_pernet_subsys+0x1f/0x50
[<ffffffff81705a1f>] ? register_pernet_subsys+0x1f/0x50
[<ffffffffa0673000>] ? 0xffffffffa0673000
[<ffffffff81705a1f>] register_pernet_subsys+0x1f/0x50
[<ffffffffa0673048>] br_init+0x48/0xd3 [bridge]
[<ffffffff81002148>] do_one_initcall+0xd8/0x210
[<ffffffff81153052>] load_module+0x20c2/0x2870
[<ffffffff8114e030>] ? store_uevent+0x70/0x70
[<ffffffff81278717>] ? kernel_read+0x57/0x90
[<ffffffff811539e6>] SyS_finit_module+0xa6/0xe0
[<ffffffff81862969>] system_call_fastpath+0x12/0x17
#0: (net_mutex){+.+.+.}, at: [<ffffffff81705a1f>] register_pernet_subsys+0x1f/0x50
Presumably the kworker/u16:6 completed, then modprobe hung?

If not, I have some very hard questions about why net_mutex can be
held by two tasks concurrently, given that it does not appear to be a
reader-writer lock...

Either way, my patch assumed that 39953dfd4007 (rcu: Avoid misordering in
__call_rcu_nocb_enqueue()) would work and that 1772947bd012 (rcu: Handle
NOCB callbacks from irq-disabled idle code) would fail. Is that the case?
If not, could you please bisect the commits between 11ed7f934cb8 (rcu:
Make nocb leader kthreads process pending callbacks after spawning)
and c847f14217d5 (rcu: Avoid misordering in nocb_leader_wait())?

Thanx, Paul

Jay Vosburgh
2014-10-24 04:50:03 UTC
Permalink
Post by Paul E. McKenney
Post by Yanko Kaneti
Post by Paul E. McKenney
Post by Yanko Kaneti
Post by Yanko Kaneti
Post by Paul E. McKenney
Post by Josh Boyer
On Wed, Oct 22, 2014 at 2:55 PM, Paul E. McKenney
[ . . . ]
Post by Josh Boyer
Post by Paul E. McKenney
Don't get me wrong -- the fact that this kthread
appears to
have
blocked within rcu_barrier() for 120 seconds means
that
something is
most definitely wrong here. I am surprised that
there are no
RCU CPU
stall warnings, but perhaps the blockage is in the
callback
execution
rather than grace-period completion. Or something is
preventing this
kthread from starting up after the wake-up callback
executes.
Or...
Is this thing reproducible?
I've added Yanko on CC, who reported the backtrace
above and can
recreate it reliably. Apparently reverting the RCU
merge commit
(d6dd50e) and rebuilding the latest after that does
not show the
issue. I'll let Yanko explain more and answer any
questions you
have.
- It is reproducible
- I've done another build here to double check and its
definitely
the rcu merge
that's causing it.
Don't think I'll be able to dig deeper, but I can do
testing if
needed.
Please! Does the following patch help?
Nope, doesn't seem to make a difference to the modprobe ppp_generic
test
Well, I was hoping. I will take a closer look at the RCU merge commit
and see what suggests itself. I am likely to ask you to revert specific
commits, if that works for you.
Well, rather than reverting commits, could you please try testing the
following commits?
11ed7f934cb8 (rcu: Make nocb leader kthreads process pending
callbacks after spawning)
73a860cd58a1 (rcu: Replace flush_signals() with
WARN_ON(signal_pending()))
c847f14217d5 (rcu: Avoid misordering in nocb_leader_wait())
For whatever it is worth, I am guessing this one.
Indeed, c847f14217d5 it is.
Much to my embarrasment I just noticed that in addition to the
rcu merge, triggering the bug "requires" my specific Fedora rawhide network
setup. Booting in single mode and modprobe ppp_generic is fine. The bug
appears when starting with my regular fedora network setup, which in my case
includes 3 ethernet adapters and a libvirt birdge+nat setup.
Hope that helps.
I am attaching the config.
It does help a lot, thank you!!!
The following patch is a bit of a shot in the dark, and assumes that
commit 1772947bd012 (rcu: Handle NOCB callbacks from irq-disabled idle
code) introduced the problem. Does this patch fix things up?
Unfortunately not, This is linus-tip + patch
OK. Can't have everything, I guess.
Post by Yanko Kaneti
INFO: task kworker/u16:6:96 blocked for more than 120 seconds.
Not tainted 3.18.0-rc1+ #4
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/u16:6 D ffff8800ca84cec0 11168 96 2 0x00000000
Workqueue: netns cleanup_net
ffff8802218339e8 0000000000000096 ffff8800ca84cec0 00000000001d5f00
ffff880221833fd8 00000000001d5f00 ffff880223264ec0 ffff8800ca84cec0
ffffffff82c52040 7fffffffffffffff ffffffff81ee2658 ffffffff81ee2650
[<ffffffff8185b8e9>] schedule+0x29/0x70
[<ffffffff81860b0c>] schedule_timeout+0x26c/0x410
[<ffffffff81028bea>] ? native_sched_clock+0x2a/0xa0
[<ffffffff8110759c>] ? mark_held_locks+0x7c/0xb0
[<ffffffff81861b90>] ? _raw_spin_unlock_irq+0x30/0x50
[<ffffffff8110772d>] ? trace_hardirqs_on_caller+0x15d/0x200
[<ffffffff8185d31c>] wait_for_completion+0x10c/0x150
[<ffffffff810e4ed0>] ? wake_up_state+0x20/0x20
[<ffffffff8112a219>] _rcu_barrier+0x159/0x200
[<ffffffff8112a315>] rcu_barrier+0x15/0x20
[<ffffffff8171657f>] netdev_run_todo+0x6f/0x310
[<ffffffff8170b145>] ? rollback_registered_many+0x265/0x2e0
[<ffffffff817235ee>] rtnl_unlock+0xe/0x10
[<ffffffff8170cfa6>] default_device_exit_batch+0x156/0x180
[<ffffffff810fd390>] ? abort_exclusive_wait+0xb0/0xb0
[<ffffffff81705053>] ops_exit_list.isra.1+0x53/0x60
[<ffffffff81705c00>] cleanup_net+0x100/0x1f0
[<ffffffff810cca98>] process_one_work+0x218/0x850
[<ffffffff810cc9ff>] ? process_one_work+0x17f/0x850
[<ffffffff810cd1b7>] ? worker_thread+0xe7/0x4a0
[<ffffffff810cd13b>] worker_thread+0x6b/0x4a0
[<ffffffff810cd0d0>] ? process_one_work+0x850/0x850
[<ffffffff810d348b>] kthread+0x10b/0x130
[<ffffffff81028c69>] ? sched_clock+0x9/0x10
[<ffffffff810d3380>] ? kthread_create_on_node+0x250/0x250
[<ffffffff818628bc>] ret_from_fork+0x7c/0xb0
[<ffffffff810d3380>] ? kthread_create_on_node+0x250/0x250
#0: ("%s""netns"){.+.+.+}, at: [<ffffffff810cc9ff>] process_one_work+0x17f/0x850
#1: (net_cleanup_work){+.+.+.}, at: [<ffffffff810cc9ff>] process_one_work+0x17f/0x850
#2: (net_mutex){+.+.+.}, at: [<ffffffff81705b8c>] cleanup_net+0x8c/0x1f0
#3: (rcu_sched_state.barrier_mutex){+.+...}, at: [<ffffffff8112a0f5>] _rcu_barrier+0x35/0x200
INFO: task modprobe:1045 blocked for more than 120 seconds.
Not tainted 3.18.0-rc1+ #4
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
modprobe D ffff880218343480 12920 1045 1044 0x00000080
ffff880218353bf8 0000000000000096 ffff880218343480 00000000001d5f00
ffff880218353fd8 00000000001d5f00 ffffffff81e1b580 ffff880218343480
ffff880218343480 ffffffff81f8f748 0000000000000246 ffff880218343480
[<ffffffff8185be91>] schedule_preempt_disabled+0x31/0x80
[<ffffffff8185d6e3>] mutex_lock_nested+0x183/0x440
[<ffffffff81705a1f>] ? register_pernet_subsys+0x1f/0x50
[<ffffffff81705a1f>] ? register_pernet_subsys+0x1f/0x50
[<ffffffffa0673000>] ? 0xffffffffa0673000
[<ffffffff81705a1f>] register_pernet_subsys+0x1f/0x50
[<ffffffffa0673048>] br_init+0x48/0xd3 [bridge]
[<ffffffff81002148>] do_one_initcall+0xd8/0x210
[<ffffffff81153052>] load_module+0x20c2/0x2870
[<ffffffff8114e030>] ? store_uevent+0x70/0x70
[<ffffffff81278717>] ? kernel_read+0x57/0x90
[<ffffffff811539e6>] SyS_finit_module+0xa6/0xe0
[<ffffffff81862969>] system_call_fastpath+0x12/0x17
#0: (net_mutex){+.+.+.}, at: [<ffffffff81705a1f>] register_pernet_subsys+0x1f/0x50
Presumably the kworker/u16:6 completed, then modprobe hung?
If not, I have some very hard questions about why net_mutex can be
held by two tasks concurrently, given that it does not appear to be a
reader-writer lock...
Either way, my patch assumed that 39953dfd4007 (rcu: Avoid misordering in
__call_rcu_nocb_enqueue()) would work and that 1772947bd012 (rcu: Handle
NOCB callbacks from irq-disabled idle code) would fail. Is that the case?
Make nocb leader kthreads process pending callbacks after spawning)
and c847f14217d5 (rcu: Avoid misordering in nocb_leader_wait())?
Just a note to add that I am also reliably inducing what appears
to be this issue on a current -net tree, when configuring openvswitch
via script. I am available to test patches or bisect tomorrow (Friday)
US time if needed.

The stack is as follows:

[ 1320.492020] INFO: task ovs-vswitchd:1303 blocked for more than 120 seconds.
[ 1320.498965] Not tainted 3.17.0-testola+ #1
[ 1320.503570] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1320.511374] ovs-vswitchd D ffff88013fc14600 0 1303 1302 0x00000004
[ 1320.511378] ffff8801388d77d8 0000000000000002 ffff880031144b00 ffff8801388d7fd8
[ 1320.511382] 0000000000014600 0000000000014600 ffff8800b092e400 ffff880031144b00
[ 1320.511385] ffff8800b1126000 ffffffff81c58ad0 ffffffff81c58ad8 7fffffffffffffff
[ 1320.511389] Call Trace:
[ 1320.511396] [<ffffffff81739db9>] schedule+0x29/0x70
[ 1320.511399] [<ffffffff8173cd8c>] schedule_timeout+0x1dc/0x260
[ 1320.511404] [<ffffffff8109698d>] ? check_preempt_curr+0x8d/0xa0
[ 1320.511407] [<ffffffff810969bd>] ? ttwu_do_wakeup+0x1d/0xd0
[ 1320.511410] [<ffffffff8173aab6>] wait_for_completion+0xa6/0x160
[ 1320.511413] [<ffffffff81099980>] ? wake_up_state+0x20/0x20
[ 1320.511417] [<ffffffff810cdb57>] _rcu_barrier+0x157/0x200
[ 1320.511419] [<ffffffff810cdc55>] rcu_barrier+0x15/0x20
[ 1320.511423] [<ffffffff8163a780>] netdev_run_todo+0x60/0x300
[ 1320.511427] [<ffffffff8164515e>] rtnl_unlock+0xe/0x10
[ 1320.511435] [<ffffffffa01aecc5>] internal_dev_destroy+0x55/0x80 [openvswitch]
[ 1320.511440] [<ffffffffa01ae622>] ovs_vport_del+0x32/0x40 [openvswitch]
[ 1320.511444] [<ffffffffa01a7dd0>] ovs_dp_detach_port+0x30/0x40 [openvswitch]
[ 1320.511448] [<ffffffffa01a7ea5>] ovs_vport_cmd_del+0xc5/0x110 [openvswitch]
[ 1320.511452] [<ffffffff816675b5>] genl_family_rcv_msg+0x1a5/0x3c0
[ 1320.511455] [<ffffffff816677d0>] ? genl_family_rcv_msg+0x3c0/0x3c0
[ 1320.511458] [<ffffffff81667861>] genl_rcv_msg+0x91/0xd0
[ 1320.511461] [<ffffffff816658d1>] netlink_rcv_skb+0xc1/0xe0
[ 1320.511463] [<ffffffff81665dfc>] genl_rcv+0x2c/0x40
[ 1320.511466] [<ffffffff81664e66>] netlink_unicast+0xf6/0x200
[ 1320.511468] [<ffffffff8166528d>] netlink_sendmsg+0x31d/0x780
[ 1320.511472] [<ffffffff81662274>] ? netlink_rcv_wake+0x44/0x60
[ 1320.511475] [<ffffffff816632e3>] ? netlink_recvmsg+0x1d3/0x3e0
[ 1320.511479] [<ffffffff8161c463>] sock_sendmsg+0x93/0xd0
[ 1320.511484] [<ffffffff81332d00>] ? apparmor_file_alloc_security+0x20/0x40
[ 1320.511487] [<ffffffff8162a697>] ? verify_iovec+0x47/0xd0
[ 1320.511491] [<ffffffff8161cc79>] ___sys_sendmsg+0x399/0x3b0
[ 1320.511495] [<ffffffff81254e02>] ? kernfs_seq_stop_active+0x32/0x40
[ 1320.511499] [<ffffffff8101c385>] ? native_sched_clock+0x35/0x90
[ 1320.511502] [<ffffffff8101c385>] ? native_sched_clock+0x35/0x90
[ 1320.511505] [<ffffffff8101c3e9>] ? sched_clock+0x9/0x10
[ 1320.511509] [<ffffffff81122d5c>] ? acct_account_cputime+0x1c/0x20
[ 1320.511512] [<ffffffff8109ce6b>] ? account_user_time+0x8b/0xa0
[ 1320.511516] [<ffffffff811fc135>] ? __fget_light+0x25/0x70
[ 1320.511519] [<ffffffff8161d372>] __sys_sendmsg+0x42/0x80
[ 1320.511521] [<ffffffff8161d3c2>] SyS_sendmsg+0x12/0x20
[ 1320.511525] [<ffffffff8173e6a4>] tracesys_phase2+0xd8/0xdd

-J

---
-Jay Vosburgh, ***@canonical.com
Paul E. McKenney
2014-10-24 15:00:01 UTC
Permalink
Post by Jay Vosburgh
Post by Paul E. McKenney
Post by Yanko Kaneti
Post by Paul E. McKenney
Post by Yanko Kaneti
Post by Yanko Kaneti
Post by Paul E. McKenney
Post by Josh Boyer
On Wed, Oct 22, 2014 at 2:55 PM, Paul E. McKenney
[ . . . ]
Post by Josh Boyer
Post by Paul E. McKenney
Don't get me wrong -- the fact that this kthread
appears to
have
blocked within rcu_barrier() for 120 seconds means
that
something is
most definitely wrong here. I am surprised that
there are no
RCU CPU
stall warnings, but perhaps the blockage is in the
callback
execution
rather than grace-period completion. Or something is
preventing this
kthread from starting up after the wake-up callback
executes.
Or...
Is this thing reproducible?
I've added Yanko on CC, who reported the backtrace
above and can
recreate it reliably. Apparently reverting the RCU
merge commit
(d6dd50e) and rebuilding the latest after that does
not show the
issue. I'll let Yanko explain more and answer any
questions you
have.
- It is reproducible
- I've done another build here to double check and its
definitely
the rcu merge
that's causing it.
Don't think I'll be able to dig deeper, but I can do
testing if
needed.
Please! Does the following patch help?
Nope, doesn't seem to make a difference to the modprobe
ppp_generic
test
Well, I was hoping. I will take a closer look at the RCU
merge commit
and see what suggests itself. I am likely to ask you to
revert specific
commits, if that works for you.
Well, rather than reverting commits, could you please try testing the
following commits?
11ed7f934cb8 (rcu: Make nocb leader kthreads process pending
callbacks after spawning)
73a860cd58a1 (rcu: Replace flush_signals() with
WARN_ON(signal_pending()))
c847f14217d5 (rcu: Avoid misordering in nocb_leader_wait())
For whatever it is worth, I am guessing this one.
Indeed, c847f14217d5 it is.
Much to my embarrasment I just noticed that in addition to the
rcu merge, triggering the bug "requires" my specific Fedora rawhide network
setup. Booting in single mode and modprobe ppp_generic is fine. The bug
appears when starting with my regular fedora network setup, which in my case
includes 3 ethernet adapters and a libvirt birdge+nat setup.
Hope that helps.
I am attaching the config.
It does help a lot, thank you!!!
The following patch is a bit of a shot in the dark, and assumes that
commit 1772947bd012 (rcu: Handle NOCB callbacks from irq-disabled idle
code) introduced the problem. Does this patch fix things up?
Unfortunately not, This is linus-tip + patch
OK. Can't have everything, I guess.
Post by Yanko Kaneti
INFO: task kworker/u16:6:96 blocked for more than 120 seconds.
Not tainted 3.18.0-rc1+ #4
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/u16:6 D ffff8800ca84cec0 11168 96 2 0x00000000
Workqueue: netns cleanup_net
ffff8802218339e8 0000000000000096 ffff8800ca84cec0 00000000001d5f00
ffff880221833fd8 00000000001d5f00 ffff880223264ec0 ffff8800ca84cec0
ffffffff82c52040 7fffffffffffffff ffffffff81ee2658 ffffffff81ee2650
[<ffffffff8185b8e9>] schedule+0x29/0x70
[<ffffffff81860b0c>] schedule_timeout+0x26c/0x410
[<ffffffff81028bea>] ? native_sched_clock+0x2a/0xa0
[<ffffffff8110759c>] ? mark_held_locks+0x7c/0xb0
[<ffffffff81861b90>] ? _raw_spin_unlock_irq+0x30/0x50
[<ffffffff8110772d>] ? trace_hardirqs_on_caller+0x15d/0x200
[<ffffffff8185d31c>] wait_for_completion+0x10c/0x150
[<ffffffff810e4ed0>] ? wake_up_state+0x20/0x20
[<ffffffff8112a219>] _rcu_barrier+0x159/0x200
[<ffffffff8112a315>] rcu_barrier+0x15/0x20
[<ffffffff8171657f>] netdev_run_todo+0x6f/0x310
[<ffffffff8170b145>] ? rollback_registered_many+0x265/0x2e0
[<ffffffff817235ee>] rtnl_unlock+0xe/0x10
[<ffffffff8170cfa6>] default_device_exit_batch+0x156/0x180
[<ffffffff810fd390>] ? abort_exclusive_wait+0xb0/0xb0
[<ffffffff81705053>] ops_exit_list.isra.1+0x53/0x60
[<ffffffff81705c00>] cleanup_net+0x100/0x1f0
[<ffffffff810cca98>] process_one_work+0x218/0x850
[<ffffffff810cc9ff>] ? process_one_work+0x17f/0x850
[<ffffffff810cd1b7>] ? worker_thread+0xe7/0x4a0
[<ffffffff810cd13b>] worker_thread+0x6b/0x4a0
[<ffffffff810cd0d0>] ? process_one_work+0x850/0x850
[<ffffffff810d348b>] kthread+0x10b/0x130
[<ffffffff81028c69>] ? sched_clock+0x9/0x10
[<ffffffff810d3380>] ? kthread_create_on_node+0x250/0x250
[<ffffffff818628bc>] ret_from_fork+0x7c/0xb0
[<ffffffff810d3380>] ? kthread_create_on_node+0x250/0x250
#0: ("%s""netns"){.+.+.+}, at: [<ffffffff810cc9ff>] process_one_work+0x17f/0x850
#1: (net_cleanup_work){+.+.+.}, at: [<ffffffff810cc9ff>] process_one_work+0x17f/0x850
#2: (net_mutex){+.+.+.}, at: [<ffffffff81705b8c>] cleanup_net+0x8c/0x1f0
#3: (rcu_sched_state.barrier_mutex){+.+...}, at: [<ffffffff8112a0f5>] _rcu_barrier+0x35/0x200
INFO: task modprobe:1045 blocked for more than 120 seconds.
Not tainted 3.18.0-rc1+ #4
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
modprobe D ffff880218343480 12920 1045 1044 0x00000080
ffff880218353bf8 0000000000000096 ffff880218343480 00000000001d5f00
ffff880218353fd8 00000000001d5f00 ffffffff81e1b580 ffff880218343480
ffff880218343480 ffffffff81f8f748 0000000000000246 ffff880218343480
[<ffffffff8185be91>] schedule_preempt_disabled+0x31/0x80
[<ffffffff8185d6e3>] mutex_lock_nested+0x183/0x440
[<ffffffff81705a1f>] ? register_pernet_subsys+0x1f/0x50
[<ffffffff81705a1f>] ? register_pernet_subsys+0x1f/0x50
[<ffffffffa0673000>] ? 0xffffffffa0673000
[<ffffffff81705a1f>] register_pernet_subsys+0x1f/0x50
[<ffffffffa0673048>] br_init+0x48/0xd3 [bridge]
[<ffffffff81002148>] do_one_initcall+0xd8/0x210
[<ffffffff81153052>] load_module+0x20c2/0x2870
[<ffffffff8114e030>] ? store_uevent+0x70/0x70
[<ffffffff81278717>] ? kernel_read+0x57/0x90
[<ffffffff811539e6>] SyS_finit_module+0xa6/0xe0
[<ffffffff81862969>] system_call_fastpath+0x12/0x17
#0: (net_mutex){+.+.+.}, at: [<ffffffff81705a1f>] register_pernet_subsys+0x1f/0x50
Presumably the kworker/u16:6 completed, then modprobe hung?
If not, I have some very hard questions about why net_mutex can be
held by two tasks concurrently, given that it does not appear to be a
reader-writer lock...
Either way, my patch assumed that 39953dfd4007 (rcu: Avoid misordering in
__call_rcu_nocb_enqueue()) would work and that 1772947bd012 (rcu: Handle
NOCB callbacks from irq-disabled idle code) would fail. Is that the case?
Make nocb leader kthreads process pending callbacks after spawning)
and c847f14217d5 (rcu: Avoid misordering in nocb_leader_wait())?
Just a note to add that I am also reliably inducing what appears
to be this issue on a current -net tree, when configuring openvswitch
via script. I am available to test patches or bisect tomorrow (Friday)
US time if needed.
Thank you, Jay! Could you please check to see if reverting this commit
fixes things for you?

35ce7f29a44a rcu: Create rcuo kthreads only for onlined CPUs

Reverting is not a long-term fix, as this commit is itself a bug fix,
but it would be good to check whether you are seeing the same thing that
Yanko is. ;-)

Thanx, Paul
Post by Jay Vosburgh
[ 1320.492020] INFO: task ovs-vswitchd:1303 blocked for more than 120 seconds.
[ 1320.498965] Not tainted 3.17.0-testola+ #1
[ 1320.503570] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1320.511374] ovs-vswitchd D ffff88013fc14600 0 1303 1302 0x00000004
[ 1320.511378] ffff8801388d77d8 0000000000000002 ffff880031144b00 ffff8801388d7fd8
[ 1320.511382] 0000000000014600 0000000000014600 ffff8800b092e400 ffff880031144b00
[ 1320.511385] ffff8800b1126000 ffffffff81c58ad0 ffffffff81c58ad8 7fffffffffffffff
[ 1320.511396] [<ffffffff81739db9>] schedule+0x29/0x70
[ 1320.511399] [<ffffffff8173cd8c>] schedule_timeout+0x1dc/0x260
[ 1320.511404] [<ffffffff8109698d>] ? check_preempt_curr+0x8d/0xa0
[ 1320.511407] [<ffffffff810969bd>] ? ttwu_do_wakeup+0x1d/0xd0
[ 1320.511410] [<ffffffff8173aab6>] wait_for_completion+0xa6/0x160
[ 1320.511413] [<ffffffff81099980>] ? wake_up_state+0x20/0x20
[ 1320.511417] [<ffffffff810cdb57>] _rcu_barrier+0x157/0x200
[ 1320.511419] [<ffffffff810cdc55>] rcu_barrier+0x15/0x20
[ 1320.511423] [<ffffffff8163a780>] netdev_run_todo+0x60/0x300
[ 1320.511427] [<ffffffff8164515e>] rtnl_unlock+0xe/0x10
[ 1320.511435] [<ffffffffa01aecc5>] internal_dev_destroy+0x55/0x80 [openvswitch]
[ 1320.511440] [<ffffffffa01ae622>] ovs_vport_del+0x32/0x40 [openvswitch]
[ 1320.511444] [<ffffffffa01a7dd0>] ovs_dp_detach_port+0x30/0x40 [openvswitch]
[ 1320.511448] [<ffffffffa01a7ea5>] ovs_vport_cmd_del+0xc5/0x110 [openvswitch]
[ 1320.511452] [<ffffffff816675b5>] genl_family_rcv_msg+0x1a5/0x3c0
[ 1320.511455] [<ffffffff816677d0>] ? genl_family_rcv_msg+0x3c0/0x3c0
[ 1320.511458] [<ffffffff81667861>] genl_rcv_msg+0x91/0xd0
[ 1320.511461] [<ffffffff816658d1>] netlink_rcv_skb+0xc1/0xe0
[ 1320.511463] [<ffffffff81665dfc>] genl_rcv+0x2c/0x40
[ 1320.511466] [<ffffffff81664e66>] netlink_unicast+0xf6/0x200
[ 1320.511468] [<ffffffff8166528d>] netlink_sendmsg+0x31d/0x780
[ 1320.511472] [<ffffffff81662274>] ? netlink_rcv_wake+0x44/0x60
[ 1320.511475] [<ffffffff816632e3>] ? netlink_recvmsg+0x1d3/0x3e0
[ 1320.511479] [<ffffffff8161c463>] sock_sendmsg+0x93/0xd0
[ 1320.511484] [<ffffffff81332d00>] ? apparmor_file_alloc_security+0x20/0x40
[ 1320.511487] [<ffffffff8162a697>] ? verify_iovec+0x47/0xd0
[ 1320.511491] [<ffffffff8161cc79>] ___sys_sendmsg+0x399/0x3b0
[ 1320.511495] [<ffffffff81254e02>] ? kernfs_seq_stop_active+0x32/0x40
[ 1320.511499] [<ffffffff8101c385>] ? native_sched_clock+0x35/0x90
[ 1320.511502] [<ffffffff8101c385>] ? native_sched_clock+0x35/0x90
[ 1320.511505] [<ffffffff8101c3e9>] ? sched_clock+0x9/0x10
[ 1320.511509] [<ffffffff81122d5c>] ? acct_account_cputime+0x1c/0x20
[ 1320.511512] [<ffffffff8109ce6b>] ? account_user_time+0x8b/0xa0
[ 1320.511516] [<ffffffff811fc135>] ? __fget_light+0x25/0x70
[ 1320.511519] [<ffffffff8161d372>] __sys_sendmsg+0x42/0x80
[ 1320.511521] [<ffffffff8161d3c2>] SyS_sendmsg+0x12/0x20
[ 1320.511525] [<ffffffff8173e6a4>] tracesys_phase2+0xd8/0xdd
-J
---
Jay Vosburgh
2014-10-24 18:30:02 UTC
Permalink
[...]
Post by Paul E. McKenney
Post by Jay Vosburgh
Post by Paul E. McKenney
Either way, my patch assumed that 39953dfd4007 (rcu: Avoid misordering in
__call_rcu_nocb_enqueue()) would work and that 1772947bd012 (rcu: Handle
NOCB callbacks from irq-disabled idle code) would fail. Is that the case?
Make nocb leader kthreads process pending callbacks after spawning)
and c847f14217d5 (rcu: Avoid misordering in nocb_leader_wait())?
Just a note to add that I am also reliably inducing what appears
to be this issue on a current -net tree, when configuring openvswitch
via script. I am available to test patches or bisect tomorrow (Friday)
US time if needed.
Thank you, Jay! Could you please check to see if reverting this commit
fixes things for you?
35ce7f29a44a rcu: Create rcuo kthreads only for onlined CPUs
Reverting is not a long-term fix, as this commit is itself a bug fix,
but would be good to check to see if you are seeing the same thing that
Yanko is. ;-)
Just to confirm what Yanko found, reverting this commit makes
the problem go away for me.

-J

---
-Jay Vosburgh, ***@canonical.com
Paul E. McKenney
2014-10-24 18:40:01 UTC
Permalink
Post by Jay Vosburgh
[...]
Post by Paul E. McKenney
Post by Jay Vosburgh
Post by Paul E. McKenney
Either way, my patch assumed that 39953dfd4007 (rcu: Avoid misordering in
__call_rcu_nocb_enqueue()) would work and that 1772947bd012 (rcu: Handle
NOCB callbacks from irq-disabled idle code) would fail. Is that the case?
Make nocb leader kthreads process pending callbacks after spawning)
and c847f14217d5 (rcu: Avoid misordering in nocb_leader_wait())?
Just a note to add that I am also reliably inducing what appears
to be this issue on a current -net tree, when configuring openvswitch
via script. I am available to test patches or bisect tomorrow (Friday)
US time if needed.
Thank you, Jay! Could you please check to see if reverting this commit
fixes things for you?
35ce7f29a44a rcu: Create rcuo kthreads only for onlined CPUs
Reverting is not a long-term fix, as this commit is itself a bug fix,
but would be good to check to see if you are seeing the same thing that
Yanko is. ;-)
Just to confirm what Yanko found, reverting this commit makes
the problem go away for me.
Thank you!

I take it that the patches that don't help Yanko also don't help you?

Thanx, Paul

Yanko Kaneti
2014-10-24 09:10:02 UTC
Permalink
Post by Paul E. McKenney
Post by Yanko Kaneti
Post by Paul E. McKenney
Post by Yanko Kaneti
Post by Yanko Kaneti
Post by Paul E. McKenney
Post by Josh Boyer
On Wed, Oct 22, 2014 at 2:55 PM, Paul E. McKenney
[ . . . ]
Post by Josh Boyer
Post by Paul E. McKenney
Don't get me wrong -- the fact that this kthread
appears to
have
blocked within rcu_barrier() for 120 seconds means
that
something is
most definitely wrong here. I am surprised that
there are no
RCU CPU
stall warnings, but perhaps the blockage is in the
callback
execution
rather than grace-period completion. Or something is
preventing this
kthread from starting up after the wake-up callback
executes.
Or...
Is this thing reproducible?
I've added Yanko on CC, who reported the backtrace
above and can
recreate it reliably. Apparently reverting the RCU
merge commit
(d6dd50e) and rebuilding the latest after that does
not show the
issue. I'll let Yanko explain more and answer any
questions you
have.
- It is reproducible
- I've done another build here to double check and its
definitely
the rcu merge
that's causing it.
Don't think I'll be able to dig deeper, but I can do
testing if
needed.
Please! Does the following patch help?
Nope, doesn't seem to make a difference to the modprobe ppp_generic
test
Well, I was hoping. I will take a closer look at the RCU merge commit
and see what suggests itself. I am likely to ask you to
revert specific
commits, if that works for you.
Well, rather than reverting commits, could you please try testing the
following commits?
11ed7f934cb8 (rcu: Make nocb leader kthreads process pending
callbacks after spawning)
73a860cd58a1 (rcu: Replace flush_signals() with
WARN_ON(signal_pending()))
c847f14217d5 (rcu: Avoid misordering in nocb_leader_wait())
For whatever it is worth, I am guessing this one.
Indeed, c847f14217d5 it is.
Much to my embarrasment I just noticed that in addition to the
rcu merge, triggering the bug "requires" my specific Fedora rawhide network
setup. Booting in single mode and modprobe ppp_generic is fine. The bug
appears when starting with my regular fedora network setup, which in my case
includes 3 ethernet adapters and a libvirt birdge+nat setup.
Hope that helps.
I am attaching the config.
It does help a lot, thank you!!!
The following patch is a bit of a shot in the dark, and assumes that
commit 1772947bd012 (rcu: Handle NOCB callbacks from irq-disabled idle
code) introduced the problem. Does this patch fix things up?
Unfortunately not, This is linus-tip + patch
OK. Can't have everything, I guess.
Post by Yanko Kaneti
INFO: task kworker/u16:6:96 blocked for more than 120 seconds.
Not tainted 3.18.0-rc1+ #4
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/u16:6 D ffff8800ca84cec0 11168 96 2 0x00000000
Workqueue: netns cleanup_net
ffff8802218339e8 0000000000000096 ffff8800ca84cec0 00000000001d5f00
ffff880221833fd8 00000000001d5f00 ffff880223264ec0 ffff8800ca84cec0
ffffffff82c52040 7fffffffffffffff ffffffff81ee2658 ffffffff81ee2650
[<ffffffff8185b8e9>] schedule+0x29/0x70
[<ffffffff81860b0c>] schedule_timeout+0x26c/0x410
[<ffffffff81028bea>] ? native_sched_clock+0x2a/0xa0
[<ffffffff8110759c>] ? mark_held_locks+0x7c/0xb0
[<ffffffff81861b90>] ? _raw_spin_unlock_irq+0x30/0x50
[<ffffffff8110772d>] ? trace_hardirqs_on_caller+0x15d/0x200
[<ffffffff8185d31c>] wait_for_completion+0x10c/0x150
[<ffffffff810e4ed0>] ? wake_up_state+0x20/0x20
[<ffffffff8112a219>] _rcu_barrier+0x159/0x200
[<ffffffff8112a315>] rcu_barrier+0x15/0x20
[<ffffffff8171657f>] netdev_run_todo+0x6f/0x310
[<ffffffff8170b145>] ? rollback_registered_many+0x265/0x2e0
[<ffffffff817235ee>] rtnl_unlock+0xe/0x10
[<ffffffff8170cfa6>] default_device_exit_batch+0x156/0x180
[<ffffffff810fd390>] ? abort_exclusive_wait+0xb0/0xb0
[<ffffffff81705053>] ops_exit_list.isra.1+0x53/0x60
[<ffffffff81705c00>] cleanup_net+0x100/0x1f0
[<ffffffff810cca98>] process_one_work+0x218/0x850
[<ffffffff810cc9ff>] ? process_one_work+0x17f/0x850
[<ffffffff810cd1b7>] ? worker_thread+0xe7/0x4a0
[<ffffffff810cd13b>] worker_thread+0x6b/0x4a0
[<ffffffff810cd0d0>] ? process_one_work+0x850/0x850
[<ffffffff810d348b>] kthread+0x10b/0x130
[<ffffffff81028c69>] ? sched_clock+0x9/0x10
[<ffffffff810d3380>] ? kthread_create_on_node+0x250/0x250
[<ffffffff818628bc>] ret_from_fork+0x7c/0xb0
[<ffffffff810d3380>] ? kthread_create_on_node+0x250/0x250
#0: ("%s""netns"){.+.+.+}, at: [<ffffffff810cc9ff>] process_one_work+0x17f/0x850
#1: (net_cleanup_work){+.+.+.}, at: [<ffffffff810cc9ff>] process_one_work+0x17f/0x850
#2: (net_mutex){+.+.+.}, at: [<ffffffff81705b8c>] cleanup_net+0x8c/0x1f0
#3: (rcu_sched_state.barrier_mutex){+.+...}, at: [<ffffffff8112a0f5>] _rcu_barrier+0x35/0x200
INFO: task modprobe:1045 blocked for more than 120 seconds.
Not tainted 3.18.0-rc1+ #4
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
modprobe D ffff880218343480 12920 1045 1044 0x00000080
ffff880218353bf8 0000000000000096 ffff880218343480 00000000001d5f00
ffff880218353fd8 00000000001d5f00 ffffffff81e1b580 ffff880218343480
ffff880218343480 ffffffff81f8f748 0000000000000246 ffff880218343480
[<ffffffff8185be91>] schedule_preempt_disabled+0x31/0x80
[<ffffffff8185d6e3>] mutex_lock_nested+0x183/0x440
[<ffffffff81705a1f>] ? register_pernet_subsys+0x1f/0x50
[<ffffffff81705a1f>] ? register_pernet_subsys+0x1f/0x50
[<ffffffffa0673000>] ? 0xffffffffa0673000
[<ffffffff81705a1f>] register_pernet_subsys+0x1f/0x50
[<ffffffffa0673048>] br_init+0x48/0xd3 [bridge]
[<ffffffff81002148>] do_one_initcall+0xd8/0x210
[<ffffffff81153052>] load_module+0x20c2/0x2870
[<ffffffff8114e030>] ? store_uevent+0x70/0x70
[<ffffffff81278717>] ? kernel_read+0x57/0x90
[<ffffffff811539e6>] SyS_finit_module+0xa6/0xe0
[<ffffffff81862969>] system_call_fastpath+0x12/0x17
#0: (net_mutex){+.+.+.}, at: [<ffffffff81705a1f>] register_pernet_subsys+0x1f/0x50
Presumably the kworker/u16:6 completed, then modprobe hung?
If not, I have some very hard questions about why net_mutex can be
held by two tasks concurrently, given that it does not appear to be a
reader-writer lock...
Either way, my patch assumed that 39953dfd4007 (rcu: Avoid misordering in
__call_rcu_nocb_enqueue()) would work and that 1772947bd012 (rcu: Handle
NOCB callbacks from irq-disabled idle code) would fail. Is that the case?
Make nocb leader kthreads process pending callbacks after spawning)
and c847f14217d5 (rcu: Avoid misordering in nocb_leader_wait())?
OK, unless I've messed up something major, bisecting points to:

35ce7f29a44a rcu: Create rcuo kthreads only for onlined CPUs

Does that make any sense?


Another thing I noticed is that in failure mode the libvirtd bridge never
actually shows up. So maybe ppp is just the first thing that bumps into
whatever setup libvirtd is failing to complete.

I truly hope this is not something with a random timing dependency...

--Yanko
Paul E. McKenney
2014-10-24 15:50:01 UTC
Permalink
[ . . . ]
Post by Yanko Kaneti
Post by Paul E. McKenney
Post by Yanko Kaneti
Post by Paul E. McKenney
Post by Yanko Kaneti
Indeed, c847f14217d5 it is.
Much to my embarrassment I just noticed that in addition to the
rcu merge, triggering the bug "requires" my specific Fedora rawhide network
setup. Booting in single mode and modprobing ppp_generic is fine. The bug
appears when starting with my regular Fedora network setup, which in my case
includes 3 ethernet adapters and a libvirt bridge+NAT setup.
Hope that helps.
I am attaching the config.
It does help a lot, thank you!!!
The following patch is a bit of a shot in the dark, and assumes that
commit 1772947bd012 (rcu: Handle NOCB callbacks from irq-disabled idle
code) introduced the problem. Does this patch fix things up?
Unfortunately not. This is linus-tip + patch.
OK. Can't have everything, I guess.
Post by Yanko Kaneti
INFO: task kworker/u16:6:96 blocked for more than 120 seconds.
Not tainted 3.18.0-rc1+ #4
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/u16:6 D ffff8800ca84cec0 11168 96 2 0x00000000
Workqueue: netns cleanup_net
ffff8802218339e8 0000000000000096 ffff8800ca84cec0 00000000001d5f00
ffff880221833fd8 00000000001d5f00 ffff880223264ec0 ffff8800ca84cec0
ffffffff82c52040 7fffffffffffffff ffffffff81ee2658 ffffffff81ee2650
[<ffffffff8185b8e9>] schedule+0x29/0x70
[<ffffffff81860b0c>] schedule_timeout+0x26c/0x410
[<ffffffff81028bea>] ? native_sched_clock+0x2a/0xa0
[<ffffffff8110759c>] ? mark_held_locks+0x7c/0xb0
[<ffffffff81861b90>] ? _raw_spin_unlock_irq+0x30/0x50
[<ffffffff8110772d>] ? trace_hardirqs_on_caller+0x15d/0x200
[<ffffffff8185d31c>] wait_for_completion+0x10c/0x150
[<ffffffff810e4ed0>] ? wake_up_state+0x20/0x20
[<ffffffff8112a219>] _rcu_barrier+0x159/0x200
[<ffffffff8112a315>] rcu_barrier+0x15/0x20
[<ffffffff8171657f>] netdev_run_todo+0x6f/0x310
[<ffffffff8170b145>] ? rollback_registered_many+0x265/0x2e0
[<ffffffff817235ee>] rtnl_unlock+0xe/0x10
[<ffffffff8170cfa6>] default_device_exit_batch+0x156/0x180
[<ffffffff810fd390>] ? abort_exclusive_wait+0xb0/0xb0
[<ffffffff81705053>] ops_exit_list.isra.1+0x53/0x60
[<ffffffff81705c00>] cleanup_net+0x100/0x1f0
[<ffffffff810cca98>] process_one_work+0x218/0x850
[<ffffffff810cc9ff>] ? process_one_work+0x17f/0x850
[<ffffffff810cd1b7>] ? worker_thread+0xe7/0x4a0
[<ffffffff810cd13b>] worker_thread+0x6b/0x4a0
[<ffffffff810cd0d0>] ? process_one_work+0x850/0x850
[<ffffffff810d348b>] kthread+0x10b/0x130
[<ffffffff81028c69>] ? sched_clock+0x9/0x10
[<ffffffff810d3380>] ? kthread_create_on_node+0x250/0x250
[<ffffffff818628bc>] ret_from_fork+0x7c/0xb0
[<ffffffff810d3380>] ? kthread_create_on_node+0x250/0x250
#0: ("%s""netns"){.+.+.+}, at: [<ffffffff810cc9ff>] process_one_work+0x17f/0x850
#1: (net_cleanup_work){+.+.+.}, at: [<ffffffff810cc9ff>] process_one_work+0x17f/0x850
#2: (net_mutex){+.+.+.}, at: [<ffffffff81705b8c>] cleanup_net+0x8c/0x1f0
#3: (rcu_sched_state.barrier_mutex){+.+...}, at: [<ffffffff8112a0f5>] _rcu_barrier+0x35/0x200
INFO: task modprobe:1045 blocked for more than 120 seconds.
Not tainted 3.18.0-rc1+ #4
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
modprobe D ffff880218343480 12920 1045 1044 0x00000080
ffff880218353bf8 0000000000000096 ffff880218343480 00000000001d5f00
ffff880218353fd8 00000000001d5f00 ffffffff81e1b580 ffff880218343480
ffff880218343480 ffffffff81f8f748 0000000000000246 ffff880218343480
[<ffffffff8185be91>] schedule_preempt_disabled+0x31/0x80
[<ffffffff8185d6e3>] mutex_lock_nested+0x183/0x440
[<ffffffff81705a1f>] ? register_pernet_subsys+0x1f/0x50
[<ffffffff81705a1f>] ? register_pernet_subsys+0x1f/0x50
[<ffffffffa0673000>] ? 0xffffffffa0673000
[<ffffffff81705a1f>] register_pernet_subsys+0x1f/0x50
[<ffffffffa0673048>] br_init+0x48/0xd3 [bridge]
[<ffffffff81002148>] do_one_initcall+0xd8/0x210
[<ffffffff81153052>] load_module+0x20c2/0x2870
[<ffffffff8114e030>] ? store_uevent+0x70/0x70
[<ffffffff81278717>] ? kernel_read+0x57/0x90
[<ffffffff811539e6>] SyS_finit_module+0xa6/0xe0
[<ffffffff81862969>] system_call_fastpath+0x12/0x17
#0: (net_mutex){+.+.+.}, at: [<ffffffff81705a1f>] register_pernet_subsys+0x1f/0x50
Presumably the kworker/u16:6 completed, then modprobe hung?
If not, I have some very hard questions about why net_mutex can be
held by two tasks concurrently, given that it does not appear to be a
reader-writer lock...
Either way, my patch assumed that 39953dfd4007 (rcu: Avoid misordering in
__call_rcu_nocb_enqueue()) would work and that 1772947bd012 (rcu: Handle
NOCB callbacks from irq-disabled idle code) would fail. Is that the case?
Or does the failure lie somewhere between the commit (rcu: Make nocb leader
kthreads process pending callbacks after spawning) and c847f14217d5 (rcu:
Avoid misordering in nocb_leader_wait())?
35ce7f29a44a rcu: Create rcuo kthreads only for onlined CPUs
Makes any sense ?
Good question. ;-)

Are any of your online CPUs missing rcuo kthreads? There should be
kthreads named rcuos/0, rcuos/1, rcuos/2, and so on for each online CPU.
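A quick way to check is something like this (just a sketch; the sysfs path
assumes a reasonably standard setup):

ps -e -o pid,comm | grep rcuos          # the offload kthreads, rcuos/N
cat /sys/devices/system/cpu/online      # the CPUs they should cover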
Post by Yanko Kaneti
Another thing I noticed is that in failure mode the libvirtd bridge actually
doesn't show up. So maybe ppp is just the first thing to try that bumps up
into whatever libvirtd is failing to do to setup those.
Truly hope this is not something with random timing dependency....
Me too. ;-)

Thanx, Paul
Post by Yanko Kaneti
--Yanko
Paul E. McKenney
2014-10-24 17:00:02 UTC
Permalink
[ . . . ]
Post by Paul E. McKenney
Post by Yanko Kaneti
35ce7f29a44a rcu: Create rcuo kthreads only for onlined CPUs
Makes any sense ?
Good question. ;-)
Are any of your online CPUs missing rcuo kthreads? There should be
kthreads named rcuos/0, rcuos/1, rcuos/2, and so on for each online CPU.
It's a Phenom II X6. With 3.17, and with linux-tip with 35ce7f29a44a reverted, there are
8 rcuos kthreads, the modprobe ppp_generic testcase reliably works, and libvirt also
manages to set up its bridge.
With plain linux-tip, the rcuos are 6 but the failure is as reliable as
before.
Thank you, very interesting. Which 6 of the rcuos are present?
Awating instructions: :)
Well, I thought I understood the problem until you found that only 6 of
the expected 8 rcuos are present with linux-tip without the revert. ;-)

I am putting together a patch for the part of the problem that I think
I understand, of course, but it would help a lot to know which two of
the rcuos are missing. ;-)

Thanx, Paul

Yanko Kaneti
2014-10-24 17:10:02 UTC
Permalink
Post by Paul E. McKenney
[ . . . ]
Post by Paul E. McKenney
Post by Yanko Kaneti
35ce7f29a44a rcu: Create rcuo kthreads only for onlined CPUs
Makes any sense ?
Good question. ;-)
Are any of your online CPUs missing rcuo kthreads? There should be
kthreads named rcuos/0, rcuos/1, rcuos/2, and so on for each online CPU.
Its a Phenom II X6. With 3.17 and linux-tip with 35ce7f29a44a reverted, the rcuos are 8
and the modprobe ppp_generic testcase reliably works, libvirt also manages
to setup its bridge.
Just with linux-tip , the rcuos are 6 but the failure is as reliable as
before.
Thank you, very interesting. Which 6 of the rcuos are present?
Well, the rcuos are 0 to 5. Which sounds right for a 6 core CPU like this
Phenom II.
Post by Paul E. McKenney
Awating instructions: :)
Well, I thought I understood the problem until you found that only 6 of
the expected 8 rcuos are present with linux-tip without the revert. ;-)
I am putting together a patch for the part of the problem that I think
I understand, of course, but it would help a lot to know which two of
the rcuos are missing. ;-)
Ready to test

--Yanko
Paul E. McKenney
2014-10-24 17:30:02 UTC
Permalink
Post by Yanko Kaneti
Post by Paul E. McKenney
[ . . . ]
Post by Paul E. McKenney
Post by Yanko Kaneti
35ce7f29a44a rcu: Create rcuo kthreads only for onlined CPUs
Makes any sense ?
Good question. ;-)
Are any of your online CPUs missing rcuo kthreads? There should be
kthreads named rcuos/0, rcuos/1, rcuos/2, and so on for each online CPU.
Its a Phenom II X6. With 3.17 and linux-tip with 35ce7f29a44a reverted, the rcuos are 8
and the modprobe ppp_generic testcase reliably works, libvirt also manages
to setup its bridge.
Just with linux-tip , the rcuos are 6 but the failure is as reliable as
before.
Thank you, very interesting. Which 6 of the rcuos are present?
Well, the rcuos are 0 to 5. Which sounds right for a 6 core CPU like this
Phenom II.
Ah, you get 8 without the patch because it creates them for potential
CPUs as well as real ones. OK, got it.
Post by Yanko Kaneti
Post by Paul E. McKenney
Awating instructions: :)
Well, I thought I understood the problem until you found that only 6 of
the expected 8 rcuos are present with linux-tip without the revert. ;-)
I am putting together a patch for the part of the problem that I think
I understand, of course, but it would help a lot to know which two of
the rcuos are missing. ;-)
Ready to test
Well, if you are feeling aggressive, give the following patch a spin.
I am doing sanity tests on it in the meantime.

Thanx, Paul

------------------------------------------------------------------------

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 29fb23f33c18..927c17b081c7 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -2546,9 +2546,13 @@ static void rcu_spawn_one_nocb_kthread(struct rcu_state *rsp, int cpu)
rdp->nocb_leader = rdp_spawn;
if (rdp_last && rdp != rdp_spawn)
rdp_last->nocb_next_follower = rdp;
- rdp_last = rdp;
- rdp = rdp->nocb_next_follower;
- rdp_last->nocb_next_follower = NULL;
+ if (rdp == rdp_spawn) {
+ rdp = rdp->nocb_next_follower;
+ } else {
+ rdp_last = rdp;
+ rdp = rdp->nocb_next_follower;
+ rdp_last->nocb_next_follower = NULL;
+ }
} while (rdp);
rdp_spawn->nocb_next_follower = rdp_old_leader;
}

Yanko Kaneti
2014-10-24 17:40:02 UTC
Permalink
Post by Paul E. McKenney
Post by Yanko Kaneti
Post by Paul E. McKenney
[ . . . ]
Post by Paul E. McKenney
Post by Yanko Kaneti
35ce7f29a44a rcu: Create rcuo kthreads only for onlined CPUs
Makes any sense ?
Good question. ;-)
Are any of your online CPUs missing rcuo kthreads? There should be
kthreads named rcuos/0, rcuos/1, rcuos/2, and so on for each online CPU.
Its a Phenom II X6. With 3.17 and linux-tip with 35ce7f29a44a reverted, the rcuos are 8
and the modprobe ppp_generic testcase reliably works, libvirt also manages
to setup its bridge.
Just with linux-tip , the rcuos are 6 but the failure is as reliable as
before.
Thank you, very interesting. Which 6 of the rcuos are present?
Well, the rcuos are 0 to 5. Which sounds right for a 6 core CPU like this
Phenom II.
Ah, you get 8 without the patch because it creates them for potential
CPUs as well as real ones. OK, got it.
Post by Yanko Kaneti
Post by Paul E. McKenney
Awating instructions: :)
Well, I thought I understood the problem until you found that only 6 of
the expected 8 rcuos are present with linux-tip without the revert. ;-)
I am putting together a patch for the part of the problem that I think
I understand, of course, but it would help a lot to know which two of
the rcuos are missing. ;-)
Ready to test
Well, if you are feeling aggressive, give the following patch a spin.
I am doing sanity tests on it in the meantime.
Doesn't seem to make a difference here
Post by Paul E. McKenney
Thanx, Paul
------------------------------------------------------------------------
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 29fb23f33c18..927c17b081c7 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -2546,9 +2546,13 @@ static void rcu_spawn_one_nocb_kthread(struct rcu_state *rsp, int cpu)
rdp->nocb_leader = rdp_spawn;
if (rdp_last && rdp != rdp_spawn)
rdp_last->nocb_next_follower = rdp;
- rdp_last = rdp;
- rdp = rdp->nocb_next_follower;
- rdp_last->nocb_next_follower = NULL;
+ if (rdp == rdp_spawn) {
+ rdp = rdp->nocb_next_follower;
+ } else {
+ rdp_last = rdp;
+ rdp = rdp->nocb_next_follower;
+ rdp_last->nocb_next_follower = NULL;
+ }
} while (rdp);
rdp_spawn->nocb_next_follower = rdp_old_leader;
}
Paul E. McKenney
2014-10-24 18:40:01 UTC
Permalink
Post by Yanko Kaneti
Post by Paul E. McKenney
Post by Yanko Kaneti
Post by Paul E. McKenney
[ . . . ]
Post by Paul E. McKenney
Post by Yanko Kaneti
35ce7f29a44a rcu: Create rcuo kthreads only for onlined CPUs
Makes any sense ?
Good question. ;-)
Are any of your online CPUs missing rcuo kthreads? There should be
kthreads named rcuos/0, rcuos/1, rcuos/2, and so on for each online CPU.
Its a Phenom II X6. With 3.17 and linux-tip with 35ce7f29a44a reverted, the rcuos are 8
and the modprobe ppp_generic testcase reliably works, libvirt also manages
to setup its bridge.
Just with linux-tip , the rcuos are 6 but the failure is as reliable as
before.
Thank you, very interesting. Which 6 of the rcuos are present?
Well, the rcuos are 0 to 5. Which sounds right for a 6 core CPU like this
Phenom II.
Ah, you get 8 without the patch because it creates them for potential
CPUs as well as real ones. OK, got it.
Post by Yanko Kaneti
Post by Paul E. McKenney
Awating instructions: :)
Well, I thought I understood the problem until you found that only 6 of
the expected 8 rcuos are present with linux-tip without the revert. ;-)
I am putting together a patch for the part of the problem that I think
I understand, of course, but it would help a lot to know which two of
the rcuos are missing. ;-)
Ready to test
Well, if you are feeling aggressive, give the following patch a spin.
I am doing sanity tests on it in the meantime.
Doesn't seem to make a difference here
OK, inspection isn't cutting it, so time for tracing. Does the system
respond to user input? If so, please enable rcu:rcu_barrier ftrace before
the problem occurs, then dump the trace buffer after the problem occurs.
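Assuming debugfs/tracefs is mounted in the usual place, that amounts to
roughly the following:

cd /sys/kernel/debug/tracing
echo 1 > events/rcu/rcu_barrier/enable   # just the rcu:rcu_barrier event
echo 1 > tracing_on
# ...reproduce the hang (start libvirtd / modprobe ppp_generic)...
cat trace > /tmp/rcu_barrier.trace       # dump the buffer afterwards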

Thanx, Paul
Post by Yanko Kaneti
Post by Paul E. McKenney
------------------------------------------------------------------------
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 29fb23f33c18..927c17b081c7 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -2546,9 +2546,13 @@ static void rcu_spawn_one_nocb_kthread(struct rcu_state *rsp, int cpu)
rdp->nocb_leader = rdp_spawn;
if (rdp_last && rdp != rdp_spawn)
rdp_last->nocb_next_follower = rdp;
- rdp_last = rdp;
- rdp = rdp->nocb_next_follower;
- rdp_last->nocb_next_follower = NULL;
+ if (rdp == rdp_spawn) {
+ rdp = rdp->nocb_next_follower;
+ } else {
+ rdp_last = rdp;
+ rdp = rdp->nocb_next_follower;
+ rdp_last->nocb_next_follower = NULL;
+ }
} while (rdp);
rdp_spawn->nocb_next_follower = rdp_old_leader;
}
Jay Vosburgh
2014-10-24 19:00:02 UTC
Permalink
Post by Paul E. McKenney
Post by Yanko Kaneti
Post by Paul E. McKenney
Post by Yanko Kaneti
Post by Paul E. McKenney
[ . . . ]
Post by Paul E. McKenney
Post by Yanko Kaneti
35ce7f29a44a rcu: Create rcuo kthreads only for onlined CPUs
Makes any sense ?
Good question. ;-)
Are any of your online CPUs missing rcuo kthreads? There should be
kthreads named rcuos/0, rcuos/1, rcuos/2, and so on for each online CPU.
Its a Phenom II X6. With 3.17 and linux-tip with 35ce7f29a44a reverted, the rcuos are 8
and the modprobe ppp_generic testcase reliably works, libvirt also manages
to setup its bridge.
Just with linux-tip , the rcuos are 6 but the failure is as reliable as
before.
Thank you, very interesting. Which 6 of the rcuos are present?
Well, the rcuos are 0 to 5. Which sounds right for a 6 core CPU like this
Phenom II.
Ah, you get 8 without the patch because it creates them for potential
CPUs as well as real ones. OK, got it.
Post by Yanko Kaneti
Post by Paul E. McKenney
Awating instructions: :)
Well, I thought I understood the problem until you found that only 6 of
the expected 8 rcuos are present with linux-tip without the revert. ;-)
I am putting together a patch for the part of the problem that I think
I understand, of course, but it would help a lot to know which two of
the rcuos are missing. ;-)
Ready to test
Well, if you are feeling aggressive, give the following patch a spin.
I am doing sanity tests on it in the meantime.
Doesn't seem to make a difference here
OK, inspection isn't cutting it, so time for tracing. Does the system
respond to user input? If so, please enable rcu:rcu_barrier ftrace before
the problem occurs, then dump the trace buffer after the problem occurs.
My system is up and responsive when the problem occurs, so this
shouldn't be a problem.

Do you want the ftrace with your patch below, or unmodified tip
of tree?

-J
Post by Paul E. McKenney
Thanx, Paul
Post by Yanko Kaneti
Post by Paul E. McKenney
------------------------------------------------------------------------
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 29fb23f33c18..927c17b081c7 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -2546,9 +2546,13 @@ static void rcu_spawn_one_nocb_kthread(struct rcu_state *rsp, int cpu)
rdp->nocb_leader = rdp_spawn;
if (rdp_last && rdp != rdp_spawn)
rdp_last->nocb_next_follower = rdp;
- rdp_last = rdp;
- rdp = rdp->nocb_next_follower;
- rdp_last->nocb_next_follower = NULL;
+ if (rdp == rdp_spawn) {
+ rdp = rdp->nocb_next_follower;
+ } else {
+ rdp_last = rdp;
+ rdp = rdp->nocb_next_follower;
+ rdp_last->nocb_next_follower = NULL;
+ }
} while (rdp);
rdp_spawn->nocb_next_follower = rdp_old_leader;
}
---
-Jay Vosburgh, ***@canonical.com
Paul E. McKenney
2014-10-24 19:10:02 UTC
Permalink
Post by Jay Vosburgh
Post by Paul E. McKenney
Post by Yanko Kaneti
Post by Paul E. McKenney
Post by Yanko Kaneti
Post by Paul E. McKenney
[ . . . ]
Post by Paul E. McKenney
Post by Yanko Kaneti
35ce7f29a44a rcu: Create rcuo kthreads only for onlined CPUs
Makes any sense ?
Good question. ;-)
Are any of your online CPUs missing rcuo kthreads? There should be
kthreads named rcuos/0, rcuos/1, rcuos/2, and so on for each online CPU.
Its a Phenom II X6. With 3.17 and linux-tip with 35ce7f29a44a reverted, the rcuos are 8
and the modprobe ppp_generic testcase reliably works, libvirt also manages
to setup its bridge.
Just with linux-tip , the rcuos are 6 but the failure is as reliable as
before.
Thank you, very interesting. Which 6 of the rcuos are present?
Well, the rcuos are 0 to 5. Which sounds right for a 6 core CPU like this
Phenom II.
Ah, you get 8 without the patch because it creates them for potential
CPUs as well as real ones. OK, got it.
Post by Yanko Kaneti
Post by Paul E. McKenney
Awating instructions: :)
Well, I thought I understood the problem until you found that only 6 of
the expected 8 rcuos are present with linux-tip without the revert. ;-)
I am putting together a patch for the part of the problem that I think
I understand, of course, but it would help a lot to know which two of
the rcuos are missing. ;-)
Ready to test
Well, if you are feeling aggressive, give the following patch a spin.
I am doing sanity tests on it in the meantime.
Doesn't seem to make a difference here
OK, inspection isn't cutting it, so time for tracing. Does the system
respond to user input? If so, please enable rcu:rcu_barrier ftrace before
the problem occurs, then dump the trace buffer after the problem occurs.
My system is up and responsive when the problem occurs, so this
shouldn't be a problem.
Nice! ;-)
Post by Jay Vosburgh
Do you want the ftrace with your patch below, or unmodified tip
of tree?
Let's please start with the patch.

Thanx, Paul
Post by Jay Vosburgh
-J
Post by Paul E. McKenney
Thanx, Paul
Post by Yanko Kaneti
Post by Paul E. McKenney
------------------------------------------------------------------------
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 29fb23f33c18..927c17b081c7 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -2546,9 +2546,13 @@ static void rcu_spawn_one_nocb_kthread(struct rcu_state *rsp, int cpu)
rdp->nocb_leader = rdp_spawn;
if (rdp_last && rdp != rdp_spawn)
rdp_last->nocb_next_follower = rdp;
- rdp_last = rdp;
- rdp = rdp->nocb_next_follower;
- rdp_last->nocb_next_follower = NULL;
+ if (rdp == rdp_spawn) {
+ rdp = rdp->nocb_next_follower;
+ } else {
+ rdp_last = rdp;
+ rdp = rdp->nocb_next_follower;
+ rdp_last->nocb_next_follower = NULL;
+ }
} while (rdp);
rdp_spawn->nocb_next_follower = rdp_old_leader;
}
---
Paul E. McKenney
2014-10-24 20:20:02 UTC
Permalink
Post by Paul E. McKenney
Post by Jay Vosburgh
Post by Paul E. McKenney
Post by Yanko Kaneti
Post by Paul E. McKenney
Post by Yanko Kaneti
Post by Paul E. McKenney
[ . . . ]
Post by Paul E. McKenney
Post by Yanko Kaneti
35ce7f29a44a rcu: Create rcuo kthreads only for onlined CPUs
Makes any sense ?
Good question. ;-)
Are any of your online CPUs missing rcuo kthreads? There should be
kthreads named rcuos/0, rcuos/1, rcuos/2, and so on for each online CPU.
Its a Phenom II X6. With 3.17 and linux-tip with 35ce7f29a44a reverted, the rcuos are 8
and the modprobe ppp_generic testcase reliably works, libvirt also manages
to setup its bridge.
Just with linux-tip , the rcuos are 6 but the failure is as reliable as
before.
Thank you, very interesting. Which 6 of the rcuos are present?
Well, the rcuos are 0 to 5. Which sounds right for a 6 core CPU like this
Phenom II.
Ah, you get 8 without the patch because it creates them for potential
CPUs as well as real ones. OK, got it.
Post by Yanko Kaneti
Post by Paul E. McKenney
Awating instructions: :)
Well, I thought I understood the problem until you found that only 6 of
the expected 8 rcuos are present with linux-tip without the revert. ;-)
I am putting together a patch for the part of the problem that I think
I understand, of course, but it would help a lot to know which two of
the rcuos are missing. ;-)
Ready to test
Well, if you are feeling aggressive, give the following patch a spin.
I am doing sanity tests on it in the meantime.
Doesn't seem to make a difference here
OK, inspection isn't cutting it, so time for tracing. Does the system
respond to user input? If so, please enable rcu:rcu_barrier ftrace before
the problem occurs, then dump the trace buffer after the problem occurs.
My system is up and responsive when the problem occurs, so this
shouldn't be a problem.
Nice! ;-)
Post by Jay Vosburgh
Do you want the ftrace with your patch below, or unmodified tip
of tree?
Let's please start with the patch.
And I should hasten to add that you need to set CONFIG_RCU_TRACE=y
for these tracepoints to be enabled.
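A quick sanity check on the kernel under test, for instance:

grep CONFIG_RCU_TRACE /boot/config-$(uname -r)   # or the build tree's .config
ls /sys/kernel/debug/tracing/events/rcu | grep rcu_barrier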

Thanx, Paul
Post by Paul E. McKenney
Post by Jay Vosburgh
-J
Post by Paul E. McKenney
Thanx, Paul
Post by Yanko Kaneti
Post by Paul E. McKenney
------------------------------------------------------------------------
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 29fb23f33c18..927c17b081c7 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -2546,9 +2546,13 @@ static void rcu_spawn_one_nocb_kthread(struct rcu_state *rsp, int cpu)
rdp->nocb_leader = rdp_spawn;
if (rdp_last && rdp != rdp_spawn)
rdp_last->nocb_next_follower = rdp;
- rdp_last = rdp;
- rdp = rdp->nocb_next_follower;
- rdp_last->nocb_next_follower = NULL;
+ if (rdp == rdp_spawn) {
+ rdp = rdp->nocb_next_follower;
+ } else {
+ rdp_last = rdp;
+ rdp = rdp->nocb_next_follower;
+ rdp_last->nocb_next_follower = NULL;
+ }
} while (rdp);
rdp_spawn->nocb_next_follower = rdp_old_leader;
}
---
Yanko Kaneti
2014-10-24 21:30:01 UTC
Permalink
Post by Paul E. McKenney
Post by Yanko Kaneti
Post by Paul E. McKenney
Post by Yanko Kaneti
Post by Paul E. McKenney
[ . . . ]
Post by Paul E. McKenney
Post by Yanko Kaneti
35ce7f29a44a rcu: Create rcuo kthreads only for onlined CPUs
Makes any sense ?
Good question. ;-)
Are any of your online CPUs missing rcuo kthreads? There should be
kthreads named rcuos/0, rcuos/1, rcuos/2, and so on for each online CPU.
Its a Phenom II X6. With 3.17 and linux-tip with 35ce7f29a44a reverted, the rcuos are 8
and the modprobe ppp_generic testcase reliably works, libvirt also manages
to setup its bridge.
Just with linux-tip , the rcuos are 6 but the failure is as reliable as
before.
Thank you, very interesting. Which 6 of the rcuos are present?
Well, the rcuos are 0 to 5. Which sounds right for a 6 core CPU like this
Phenom II.
Ah, you get 8 without the patch because it creates them for potential
CPUs as well as real ones. OK, got it.
Post by Yanko Kaneti
Post by Paul E. McKenney
Awating instructions: :)
Well, I thought I understood the problem until you found that only 6 of
the expected 8 rcuos are present with linux-tip without the revert. ;-)
I am putting together a patch for the part of the problem that I think
I understand, of course, but it would help a lot to know which two of
the rcuos are missing. ;-)
Ready to test
Well, if you are feeling aggressive, give the following patch a spin.
I am doing sanity tests on it in the meantime.
Doesn't seem to make a difference here
OK, inspection isn't cutting it, so time for tracing. Does the system
respond to user input? If so, please enable rcu:rcu_barrier ftrace before
the problem occurs, then dump the trace buffer after the problem occurs.
Sorry for being unresponsive here, but I know next to nothing about tracing
or most things about the kernel, so I have some catching up to do.

In the meantime some layman observations while I tried to find what exactly
triggers the problem.
- Even in runlevel 1 I can reliably trigger the problem by starting libvirtd
- libvirtd seems to be very active in using all sorts of kernel facilities
that are modules on fedora so it seems to cause many simultaneous kworker
calls to modprobe
- there are 8 kworker/u16 from 0 to 7
- one of these kworkers always deadlocks, while there appear to be two
kworker/u16:6 - the seventh

6 vs 8 as in 6 rcuos where before they were always 8

Just observations from someone who still doesn't know what the u16
kworkers are..

-- Yanko
Post by Paul E. McKenney
Thanx, Paul
Post by Yanko Kaneti
Post by Paul E. McKenney
------------------------------------------------------------------------
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 29fb23f33c18..927c17b081c7 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -2546,9 +2546,13 @@ static void rcu_spawn_one_nocb_kthread(struct rcu_state *rsp, int cpu)
rdp->nocb_leader = rdp_spawn;
if (rdp_last && rdp != rdp_spawn)
rdp_last->nocb_next_follower = rdp;
- rdp_last = rdp;
- rdp = rdp->nocb_next_follower;
- rdp_last->nocb_next_follower = NULL;
+ if (rdp == rdp_spawn) {
+ rdp = rdp->nocb_next_follower;
+ } else {
+ rdp_last = rdp;
+ rdp = rdp->nocb_next_follower;
+ rdp_last->nocb_next_follower = NULL;
+ }
} while (rdp);
rdp_spawn->nocb_next_follower = rdp_old_leader;
}
Paul E. McKenney
2014-10-24 22:00:03 UTC
Permalink
[ . . . ]
Post by Yanko Kaneti
Post by Paul E. McKenney
Post by Yanko Kaneti
Post by Paul E. McKenney
Well, if you are feeling aggressive, give the following patch a spin.
I am doing sanity tests on it in the meantime.
Doesn't seem to make a difference here
OK, inspection isn't cutting it, so time for tracing. Does the system
respond to user input? If so, please enable rcu:rcu_barrier ftrace before
the problem occurs, then dump the trace buffer after the problem occurs.
Sorry for being unresposive here, but I know next to nothing about tracing
or most things about the kernel, so I have some cathing up to do.
In the meantime some layman observations while I tried to find what exactly
triggers the problem.
- Even in runlevel 1 I can reliably trigger the problem by starting libvirtd
- libvirtd seems to be very active in using all sorts of kernel facilities
that are modules on fedora so it seems to cause many simultaneous kworker
calls to modprobe
- there are 8 kworker/u16 from 0 to 7
- one of these kworkers always deadlocks, while there appear to be two
kworker/u16:6 - the seventh
Adding Tejun on CC in case this duplication of kworker/u16:6 is important.
Post by Yanko Kaneti
6 vs 8 as in 6 rcuos where before they were always 8
Just observations from someone who still doesn't know what the u16
kworkers are..
Could you please run the following diagnostic patch? This will help
me see if I have managed to miswire the rcuo kthreads. It should
print some information at task-hang time.
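Side note: since the printout is hooked into the hung-task detector, the wait
per iteration can be shortened by lowering the existing timeout knob, e.g.:

echo 30 > /proc/sys/kernel/hung_task_timeout_secs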

Thanx, Paul

------------------------------------------------------------------------

rcu: Dump no-CBs CPU state at task-hung time

Strictly diagnostic commit for rcu_barrier() hang. Not for inclusion.

Signed-off-by: Paul E. McKenney <***@linux.vnet.ibm.com>

diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h
index 0e5366200154..34048140577b 100644
--- a/include/linux/rcutiny.h
+++ b/include/linux/rcutiny.h
@@ -157,4 +157,8 @@ static inline bool rcu_is_watching(void)

#endif /* #else defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_RCU_TRACE) */

+static inline void rcu_show_nocb_setup(void)
+{
+}
+
#endif /* __LINUX_RCUTINY_H */
diff --git a/include/linux/rcutree.h b/include/linux/rcutree.h
index 52953790dcca..0b813bdb971b 100644
--- a/include/linux/rcutree.h
+++ b/include/linux/rcutree.h
@@ -97,4 +97,6 @@ extern int rcu_scheduler_active __read_mostly;

bool rcu_is_watching(void);

+void rcu_show_nocb_setup(void);
+
#endif /* __LINUX_RCUTREE_H */
diff --git a/kernel/hung_task.c b/kernel/hung_task.c
index 06db12434d72..e6e4d0f6b063 100644
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -118,6 +118,7 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout)
" disables this message.\n");
sched_show_task(t);
debug_show_held_locks(t);
+ rcu_show_nocb_setup();

touch_nmi_watchdog();

diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
index 240fa9094f83..6b373e79ce0e 100644
--- a/kernel/rcu/rcutorture.c
+++ b/kernel/rcu/rcutorture.c
@@ -1513,6 +1513,7 @@ rcu_torture_cleanup(void)
{
int i;

+ rcu_show_nocb_setup();
rcutorture_record_test_transition();
if (torture_cleanup_begin()) {
if (cur_ops->cb_barrier != NULL)
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 927c17b081c7..285b3f6fb229 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -2699,6 +2699,31 @@ static bool init_nocb_callback_list(struct rcu_data *rdp)

#endif /* #else #ifdef CONFIG_RCU_NOCB_CPU */

+void rcu_show_nocb_setup(void)
+{
+#ifdef CONFIG_RCU_NOCB_CPU
+ int cpu;
+ struct rcu_data *rdp;
+ struct rcu_state *rsp;
+
+ for_each_rcu_flavor(rsp) {
+ pr_alert("rcu_show_nocb_setup(): %s nocb state:\n", rsp->name);
+ for_each_possible_cpu(cpu) {
+ if (!rcu_is_nocb_cpu(cpu))
+ continue;
+ rdp = per_cpu_ptr(rsp->rda, cpu);
+ pr_alert("%3d: %p l:%p n:%p %c%c%c\n",
+ cpu,
+ rdp, rdp->nocb_leader, rdp->nocb_next_follower,
+ ".N"[!!rdp->nocb_head],
+ ".G"[!!rdp->nocb_gp_head],
+ ".F"[!!rdp->nocb_follower_head]);
+ }
+ }
+#endif /* #ifdef CONFIG_RCU_NOCB_CPU */
+}
+EXPORT_SYMBOL_GPL(rcu_show_nocb_setup);
+
/*
* An adaptive-ticks CPU can potentially execute in kernel mode for an
* arbitrarily long period of time with the scheduling-clock tick turned

Jay Vosburgh
2014-10-24 22:10:02 UTC
Permalink
Post by Paul E. McKenney
[ . . . ]
Post by Yanko Kaneti
Post by Paul E. McKenney
Post by Yanko Kaneti
Post by Paul E. McKenney
Well, if you are feeling aggressive, give the following patch a spin.
I am doing sanity tests on it in the meantime.
Doesn't seem to make a difference here
OK, inspection isn't cutting it, so time for tracing. Does the system
respond to user input? If so, please enable rcu:rcu_barrier ftrace before
the problem occurs, then dump the trace buffer after the problem occurs.
Sorry for being unresposive here, but I know next to nothing about tracing
or most things about the kernel, so I have some cathing up to do.
In the meantime some layman observations while I tried to find what exactly
triggers the problem.
- Even in runlevel 1 I can reliably trigger the problem by starting libvirtd
- libvirtd seems to be very active in using all sorts of kernel facilities
that are modules on fedora so it seems to cause many simultaneous kworker
calls to modprobe
- there are 8 kworker/u16 from 0 to 7
- one of these kworkers always deadlocks, while there appear to be two
kworker/u16:6 - the seventh
Adding Tejun on CC in case this duplication of kworker/u16:6 is important.
Post by Yanko Kaneti
6 vs 8 as in 6 rcuos where before they were always 8
Just observations from someone who still doesn't know what the u16
kworkers are..
Could you please run the following diagnostic patch? This will help
me see if I have managed to miswire the rcuo kthreads. It should
print some information at task-hang time.
I can give this a spin after the ftrace (now that I've got
CONFIG_RCU_TRACE turned on).

I've got an ftrace capture from unmodified -net, it looks like
this:

ovs-vswitchd-902 [000] .... 471.778441: rcu_barrier: rcu_sched Begin cpu -1 remaining 0 # 0
ovs-vswitchd-902 [000] .... 471.778452: rcu_barrier: rcu_sched Check cpu -1 remaining 0 # 0
ovs-vswitchd-902 [000] .... 471.778452: rcu_barrier: rcu_sched Inc1 cpu -1 remaining 0 # 1
ovs-vswitchd-902 [000] .... 471.778453: rcu_barrier: rcu_sched OnlineNoCB cpu 0 remaining 1 # 1
ovs-vswitchd-902 [000] .... 471.778453: rcu_barrier: rcu_sched OnlineNoCB cpu 1 remaining 2 # 1
ovs-vswitchd-902 [000] .... 471.778453: rcu_barrier: rcu_sched OnlineNoCB cpu 2 remaining 3 # 1
ovs-vswitchd-902 [000] .... 471.778454: rcu_barrier: rcu_sched OnlineNoCB cpu 3 remaining 4 # 1
ovs-vswitchd-902 [000] .... 471.778454: rcu_barrier: rcu_sched Inc2 cpu -1 remaining 4 # 2
rcuos/0-9 [000] ..s. 471.793150: rcu_barrier: rcu_sched CB cpu -1 remaining 3 # 2
rcuos/1-18 [001] ..s. 471.793308: rcu_barrier: rcu_sched CB cpu -1 remaining 2 # 2

I let it sit through several "hung task" cycles but that was all
there was for rcu:rcu_barrier.

I should have ftrace with the patch as soon as the kernel is
done building, then I can try the below patch (I'll start it building
now).

-J
Post by Paul E. McKenney
Thanx, Paul
------------------------------------------------------------------------
rcu: Dump no-CBs CPU state at task-hung time
Strictly diagnostic commit for rcu_barrier() hang. Not for inclusion.
diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h
index 0e5366200154..34048140577b 100644
--- a/include/linux/rcutiny.h
+++ b/include/linux/rcutiny.h
@@ -157,4 +157,8 @@ static inline bool rcu_is_watching(void)
#endif /* #else defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_RCU_TRACE) */
+static inline void rcu_show_nocb_setup(void)
+{
+}
+
#endif /* __LINUX_RCUTINY_H */
diff --git a/include/linux/rcutree.h b/include/linux/rcutree.h
index 52953790dcca..0b813bdb971b 100644
--- a/include/linux/rcutree.h
+++ b/include/linux/rcutree.h
@@ -97,4 +97,6 @@ extern int rcu_scheduler_active __read_mostly;
bool rcu_is_watching(void);
+void rcu_show_nocb_setup(void);
+
#endif /* __LINUX_RCUTREE_H */
diff --git a/kernel/hung_task.c b/kernel/hung_task.c
index 06db12434d72..e6e4d0f6b063 100644
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -118,6 +118,7 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout)
" disables this message.\n");
sched_show_task(t);
debug_show_held_locks(t);
+ rcu_show_nocb_setup();
touch_nmi_watchdog();
diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
index 240fa9094f83..6b373e79ce0e 100644
--- a/kernel/rcu/rcutorture.c
+++ b/kernel/rcu/rcutorture.c
@@ -1513,6 +1513,7 @@ rcu_torture_cleanup(void)
{
int i;
+ rcu_show_nocb_setup();
rcutorture_record_test_transition();
if (torture_cleanup_begin()) {
if (cur_ops->cb_barrier != NULL)
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 927c17b081c7..285b3f6fb229 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -2699,6 +2699,31 @@ static bool init_nocb_callback_list(struct rcu_data *rdp)
#endif /* #else #ifdef CONFIG_RCU_NOCB_CPU */
+void rcu_show_nocb_setup(void)
+{
+#ifdef CONFIG_RCU_NOCB_CPU
+ int cpu;
+ struct rcu_data *rdp;
+ struct rcu_state *rsp;
+
+ for_each_rcu_flavor(rsp) {
+ pr_alert("rcu_show_nocb_setup(): %s nocb state:\n", rsp->name);
+ for_each_possible_cpu(cpu) {
+ if (!rcu_is_nocb_cpu(cpu))
+ continue;
+ rdp = per_cpu_ptr(rsp->rda, cpu);
+ pr_alert("%3d: %p l:%p n:%p %c%c%c\n",
+ cpu,
+ rdp, rdp->nocb_leader, rdp->nocb_next_follower,
+ ".N"[!!rdp->nocb_head],
+ ".G"[!!rdp->nocb_gp_head],
+ ".F"[!!rdp->nocb_follower_head]);
+ }
+ }
+#endif /* #ifdef CONFIG_RCU_NOCB_CPU */
+}
+EXPORT_SYMBOL_GPL(rcu_show_nocb_setup);
+
/*
* An adaptive-ticks CPU can potentially execute in kernel mode for an
* arbitrarily long period of time with the scheduling-clock tick turned
---
-Jay Vosburgh, ***@canonical.com
Paul E. McKenney
2014-10-24 22:30:01 UTC
Permalink
Post by Jay Vosburgh
Post by Paul E. McKenney
[ . . . ]
Post by Yanko Kaneti
Post by Paul E. McKenney
Post by Yanko Kaneti
Post by Paul E. McKenney
Well, if you are feeling aggressive, give the following patch a spin.
I am doing sanity tests on it in the meantime.
Doesn't seem to make a difference here
OK, inspection isn't cutting it, so time for tracing. Does the system
respond to user input? If so, please enable rcu:rcu_barrier ftrace before
the problem occurs, then dump the trace buffer after the problem occurs.
Sorry for being unresposive here, but I know next to nothing about tracing
or most things about the kernel, so I have some cathing up to do.
In the meantime some layman observations while I tried to find what exactly
triggers the problem.
- Even in runlevel 1 I can reliably trigger the problem by starting libvirtd
- libvirtd seems to be very active in using all sorts of kernel facilities
that are modules on fedora so it seems to cause many simultaneous kworker
calls to modprobe
- there are 8 kworker/u16 from 0 to 7
- one of these kworkers always deadlocks, while there appear to be two
kworker/u16:6 - the seventh
Adding Tejun on CC in case this duplication of kworker/u16:6 is important.
Post by Yanko Kaneti
6 vs 8 as in 6 rcuos where before they were always 8
Just observations from someone who still doesn't know what the u16
kworkers are..
Could you please run the following diagnostic patch? This will help
me see if I have managed to miswire the rcuo kthreads. It should
print some information at task-hang time.
I can give this a spin after the ftrace (now that I've got
CONFIG_RCU_TRACE turned on).
I've got an ftrace capture from unmodified -net, it looks like
ovs-vswitchd-902 [000] .... 471.778441: rcu_barrier: rcu_sched Begin cpu -1 remaining 0 # 0
ovs-vswitchd-902 [000] .... 471.778452: rcu_barrier: rcu_sched Check cpu -1 remaining 0 # 0
ovs-vswitchd-902 [000] .... 471.778452: rcu_barrier: rcu_sched Inc1 cpu -1 remaining 0 # 1
ovs-vswitchd-902 [000] .... 471.778453: rcu_barrier: rcu_sched OnlineNoCB cpu 0 remaining 1 # 1
ovs-vswitchd-902 [000] .... 471.778453: rcu_barrier: rcu_sched OnlineNoCB cpu 1 remaining 2 # 1
ovs-vswitchd-902 [000] .... 471.778453: rcu_barrier: rcu_sched OnlineNoCB cpu 2 remaining 3 # 1
ovs-vswitchd-902 [000] .... 471.778454: rcu_barrier: rcu_sched OnlineNoCB cpu 3 remaining 4 # 1
OK, so it looks like your system has four CPUs, and rcu_barrier() placed
callbacks on them all.
Post by Jay Vosburgh
ovs-vswitchd-902 [000] .... 471.778454: rcu_barrier: rcu_sched Inc2 cpu -1 remaining 4 # 2
The above removes the extra count used to avoid races between posting new
callbacks and completion of previously posted callbacks.
Post by Jay Vosburgh
rcuos/0-9 [000] ..s. 471.793150: rcu_barrier: rcu_sched CB cpu -1 remaining 3 # 2
rcuos/1-18 [001] ..s. 471.793308: rcu_barrier: rcu_sched CB cpu -1 remaining 2 # 2
Two of the four callbacks fired, but the other two appear to be AWOL.
And rcu_barrier() won't return until they all fire.
Post by Jay Vosburgh
I let it sit through several "hung task" cycles but that was all
there was for rcu:rcu_barrier.
I should have ftrace with the patch as soon as the kernel is
done building, then I can try the below patch (I'll start it building
now).
Sounds very good, looking forward to hearing of the results.

Thanx, Paul

Jay Vosburgh
2014-10-24 22:50:02 UTC
Permalink
[...]
Post by Paul E. McKenney
Post by Jay Vosburgh
I've got an ftrace capture from unmodified -net, it looks like
ovs-vswitchd-902 [000] .... 471.778441: rcu_barrier: rcu_sched Begin cpu -1 remaining 0 # 0
ovs-vswitchd-902 [000] .... 471.778452: rcu_barrier: rcu_sched Check cpu -1 remaining 0 # 0
ovs-vswitchd-902 [000] .... 471.778452: rcu_barrier: rcu_sched Inc1 cpu -1 remaining 0 # 1
ovs-vswitchd-902 [000] .... 471.778453: rcu_barrier: rcu_sched OnlineNoCB cpu 0 remaining 1 # 1
ovs-vswitchd-902 [000] .... 471.778453: rcu_barrier: rcu_sched OnlineNoCB cpu 1 remaining 2 # 1
ovs-vswitchd-902 [000] .... 471.778453: rcu_barrier: rcu_sched OnlineNoCB cpu 2 remaining 3 # 1
ovs-vswitchd-902 [000] .... 471.778454: rcu_barrier: rcu_sched OnlineNoCB cpu 3 remaining 4 # 1
OK, so it looks like your system has four CPUs, and rcu_barrier() placed
callbacks on them all.
No, the system has only two CPUs. It's an Intel Core 2 Duo
E8400, and /proc/cpuinfo agrees that there are only 2. There is a
potentially relevant-sounding message early in dmesg that says:

[ 0.000000] smpboot: Allowing 4 CPUs, 2 hotplug CPUs
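That possible-vs-online split should also be visible directly in sysfs, for
example:

cat /sys/devices/system/cpu/possible    # would be 0-3 on this box
cat /sys/devices/system/cpu/online      # would be 0-1 (the real CPUs)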
Post by Paul E. McKenney
Post by Jay Vosburgh
ovs-vswitchd-902 [000] .... 471.778454: rcu_barrier: rcu_sched Inc2 cpu -1 remaining 4 # 2
The above removes the extra count used to avoid races between posting new
callbacks and completion of previously posted callbacks.
Post by Jay Vosburgh
rcuos/0-9 [000] ..s. 471.793150: rcu_barrier: rcu_sched CB cpu -1 remaining 3 # 2
rcuos/1-18 [001] ..s. 471.793308: rcu_barrier: rcu_sched CB cpu -1 remaining 2 # 2
Two of the four callbacks fired, but the other two appear to be AWOL.
And rcu_barrier() won't return until they all fire.
Post by Jay Vosburgh
I let it sit through several "hung task" cycles but that was all
there was for rcu:rcu_barrier.
I should have ftrace with the patch as soon as the kernel is
done building, then I can try the below patch (I'll start it building
now).
Sounds very good, looking forward to hearing of the results.
Going to bounce it for ftrace now, but the cpu count mismatch
seemed important enough to mention separately.

-J

---
-Jay Vosburgh, ***@canonical.com
Jay Vosburgh
2014-10-24 22:40:02 UTC
Permalink
Post by Paul E. McKenney
[ . . . ]
Post by Yanko Kaneti
Post by Paul E. McKenney
Post by Yanko Kaneti
Post by Paul E. McKenney
Well, if you are feeling aggressive, give the following patch a spin.
I am doing sanity tests on it in the meantime.
Doesn't seem to make a difference here
OK, inspection isn't cutting it, so time for tracing. Does the system
respond to user input? If so, please enable rcu:rcu_barrier ftrace before
the problem occurs, then dump the trace buffer after the problem occurs.
Sorry for being unresposive here, but I know next to nothing about tracing
or most things about the kernel, so I have some cathing up to do.
In the meantime some layman observations while I tried to find what exactly
triggers the problem.
- Even in runlevel 1 I can reliably trigger the problem by starting libvirtd
- libvirtd seems to be very active in using all sorts of kernel facilities
that are modules on fedora so it seems to cause many simultaneous kworker
calls to modprobe
- there are 8 kworker/u16 from 0 to 7
- one of these kworkers always deadlocks, while there appear to be two
kworker/u16:6 - the seventh
Adding Tejun on CC in case this duplication of kworker/u16:6 is important.
Post by Yanko Kaneti
6 vs 8 as in 6 rcuos where before they were always 8
Just observations from someone who still doesn't know what the u16
kworkers are..
Could you please run the following diagnostic patch? This will help
me see if I have managed to miswire the rcuo kthreads. It should
print some information at task-hang time.
Here's the output of the patch; I let it sit through two hang
cycles.

-J


[ 240.348020] INFO: task ovs-vswitchd:902 blocked for more than 120 seconds.
[ 240.354878] Not tainted 3.17.0-testola+ #4
[ 240.359481] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 240.367285] ovs-vswitchd D ffff88013fc94600 0 902 901 0x00000004
[ 240.367290] ffff8800ab20f7b8 0000000000000002 ffff8800b3304b00 ffff8800ab20ffd8
[ 240.367293] 0000000000014600 0000000000014600 ffff8800b0810000 ffff8800b3304b00
[ 240.367296] ffff8800b3304b00 ffffffff81c59850 ffffffff81c59858 7fffffffffffffff
[ 240.367300] Call Trace:
[ 240.367307] [<ffffffff81722b99>] schedule+0x29/0x70
[ 240.367310] [<ffffffff81725b6c>] schedule_timeout+0x1dc/0x260
[ 240.367313] [<ffffffff81722f69>] ? _cond_resched+0x29/0x40
[ 240.367316] [<ffffffff81723818>] ? wait_for_completion+0x28/0x160
[ 240.367321] [<ffffffff811081a7>] ? queue_stop_cpus_work+0xc7/0xe0
[ 240.367324] [<ffffffff81723896>] wait_for_completion+0xa6/0x160
[ 240.367328] [<ffffffff81099980>] ? wake_up_state+0x20/0x20
[ 240.367331] [<ffffffff810d0ecc>] _rcu_barrier+0x20c/0x480
[ 240.367334] [<ffffffff810d1195>] rcu_barrier+0x15/0x20
[ 240.367338] [<ffffffff81625010>] netdev_run_todo+0x60/0x300
[ 240.367341] [<ffffffff8162f9ee>] rtnl_unlock+0xe/0x10
[ 240.367349] [<ffffffffa01ffcc5>] internal_dev_destroy+0x55/0x80 [openvswitch]
[ 240.367354] [<ffffffffa01ff622>] ovs_vport_del+0x32/0x40 [openvswitch]
[ 240.367358] [<ffffffffa01f8dd0>] ovs_dp_detach_port+0x30/0x40 [openvswitch]
[ 240.367363] [<ffffffffa01f8ea5>] ovs_vport_cmd_del+0xc5/0x110 [openvswitch]
[ 240.367367] [<ffffffff81651d75>] genl_family_rcv_msg+0x1a5/0x3c0
[ 240.367370] [<ffffffff81651f90>] ? genl_family_rcv_msg+0x3c0/0x3c0
[ 240.367372] [<ffffffff81652021>] genl_rcv_msg+0x91/0xd0
[ 240.367376] [<ffffffff81650091>] netlink_rcv_skb+0xc1/0xe0
[ 240.367378] [<ffffffff816505bc>] genl_rcv+0x2c/0x40
[ 240.367381] [<ffffffff8164f626>] netlink_unicast+0xf6/0x200
[ 240.367383] [<ffffffff8164fa4d>] netlink_sendmsg+0x31d/0x780
[ 240.367387] [<ffffffff8164ca74>] ? netlink_rcv_wake+0x44/0x60
[ 240.367391] [<ffffffff81606a53>] sock_sendmsg+0x93/0xd0
[ 240.367395] [<ffffffff81337700>] ? apparmor_capable+0x60/0x60
[ 240.367399] [<ffffffff81614f27>] ? verify_iovec+0x47/0xd0
[ 240.367402] [<ffffffff81606e79>] ___sys_sendmsg+0x399/0x3b0
[ 240.367406] [<ffffffff812598a2>] ? kernfs_seq_stop_active+0x32/0x40
[ 240.367410] [<ffffffff8101c385>] ? native_sched_clock+0x35/0x90
[ 240.367413] [<ffffffff8101c385>] ? native_sched_clock+0x35/0x90
[ 240.367416] [<ffffffff8101c3e9>] ? sched_clock+0x9/0x10
[ 240.367420] [<ffffffff811277fc>] ? acct_account_cputime+0x1c/0x20
[ 240.367424] [<ffffffff8109ce6b>] ? account_user_time+0x8b/0xa0
[ 240.367428] [<ffffffff81200bd5>] ? __fget_light+0x25/0x70
[ 240.367431] [<ffffffff81607c02>] __sys_sendmsg+0x42/0x80
[ 240.367433] [<ffffffff81607c52>] SyS_sendmsg+0x12/0x20
[ 240.367436] [<ffffffff81727464>] tracesys_phase2+0xd8/0xdd
[ 240.367439] rcu_show_nocb_setup(): rcu_sched nocb state:
[ 240.372734] 0: ffff88013fc0e600 l:ffff88013fc0e600 n:ffff88013fc8e600 .G.
[ 240.379673] 1: ffff88013fc8e600 l:ffff88013fc0e600 n: (null) .G.
[ 240.386611] 2: ffff88013fd0e600 l:ffff88013fd0e600 n:ffff88013fd8e600 N..
[ 240.393550] 3: ffff88013fd8e600 l:ffff88013fd0e600 n: (null) N..
[ 240.400489] rcu_show_nocb_setup(): rcu_bh nocb state:
[ 240.405525] 0: ffff88013fc0e3c0 l:ffff88013fc0e3c0 n:ffff88013fc8e3c0 ...
[ 240.412463] 1: ffff88013fc8e3c0 l:ffff88013fc0e3c0 n: (null) ...
[ 240.419401] 2: ffff88013fd0e3c0 l:ffff88013fd0e3c0 n:ffff88013fd8e3c0 ...
[ 240.426339] 3: ffff88013fd8e3c0 l:ffff88013fd0e3c0 n: (null) ...
[ 360.432020] INFO: task ovs-vswitchd:902 blocked for more than 120 seconds.
[ 360.438881] Not tainted 3.17.0-testola+ #4
[ 360.443484] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 360.451289] ovs-vswitchd D ffff88013fc94600 0 902 901 0x00000004
[ 360.451293] ffff8800ab20f7b8 0000000000000002 ffff8800b3304b00 ffff8800ab20ffd8
[ 360.451297] 0000000000014600 0000000000014600 ffff8800b0810000 ffff8800b3304b00
[ 360.451300] ffff8800b3304b00 ffffffff81c59850 ffffffff81c59858 7fffffffffffffff
[ 360.451303] Call Trace:
[ 360.451311] [<ffffffff81722b99>] schedule+0x29/0x70
[ 360.451314] [<ffffffff81725b6c>] schedule_timeout+0x1dc/0x260
[ 360.451317] [<ffffffff81722f69>] ? _cond_resched+0x29/0x40
[ 360.451320] [<ffffffff81723818>] ? wait_for_completion+0x28/0x160
[ 360.451325] [<ffffffff811081a7>] ? queue_stop_cpus_work+0xc7/0xe0
[ 360.451327] [<ffffffff81723896>] wait_for_completion+0xa6/0x160
[ 360.451331] [<ffffffff81099980>] ? wake_up_state+0x20/0x20
[ 360.451335] [<ffffffff810d0ecc>] _rcu_barrier+0x20c/0x480
[ 360.451338] [<ffffffff810d1195>] rcu_barrier+0x15/0x20
[ 360.451342] [<ffffffff81625010>] netdev_run_todo+0x60/0x300
[ 360.451345] [<ffffffff8162f9ee>] rtnl_unlock+0xe/0x10
[ 360.451353] [<ffffffffa01ffcc5>] internal_dev_destroy+0x55/0x80 [openvswitch]
[ 360.451358] [<ffffffffa01ff622>] ovs_vport_del+0x32/0x40 [openvswitch]
[ 360.451362] [<ffffffffa01f8dd0>] ovs_dp_detach_port+0x30/0x40 [openvswitch]
[ 360.451366] [<ffffffffa01f8ea5>] ovs_vport_cmd_del+0xc5/0x110 [openvswitch]
[ 360.451370] [<ffffffff81651d75>] genl_family_rcv_msg+0x1a5/0x3c0
[ 360.451373] [<ffffffff81651f90>] ? genl_family_rcv_msg+0x3c0/0x3c0
[ 360.451376] [<ffffffff81652021>] genl_rcv_msg+0x91/0xd0
[ 360.451379] [<ffffffff81650091>] netlink_rcv_skb+0xc1/0xe0
[ 360.451381] [<ffffffff816505bc>] genl_rcv+0x2c/0x40
[ 360.451384] [<ffffffff8164f626>] netlink_unicast+0xf6/0x200
[ 360.451387] [<ffffffff8164fa4d>] netlink_sendmsg+0x31d/0x780
[ 360.451390] [<ffffffff8164ca74>] ? netlink_rcv_wake+0x44/0x60
[ 360.451394] [<ffffffff81606a53>] sock_sendmsg+0x93/0xd0
[ 360.451399] [<ffffffff81337700>] ? apparmor_capable+0x60/0x60
[ 360.451402] [<ffffffff81614f27>] ? verify_iovec+0x47/0xd0
[ 360.451406] [<ffffffff81606e79>] ___sys_sendmsg+0x399/0x3b0
[ 360.451410] [<ffffffff812598a2>] ? kernfs_seq_stop_active+0x32/0x40
[ 360.451414] [<ffffffff8101c385>] ? native_sched_clock+0x35/0x90
[ 360.451417] [<ffffffff8101c385>] ? native_sched_clock+0x35/0x90
[ 360.451419] [<ffffffff8101c3e9>] ? sched_clock+0x9/0x10
[ 360.451424] [<ffffffff811277fc>] ? acct_account_cputime+0x1c/0x20
[ 360.451427] [<ffffffff8109ce6b>] ? account_user_time+0x8b/0xa0
[ 360.451431] [<ffffffff81200bd5>] ? __fget_light+0x25/0x70
[ 360.451434] [<ffffffff81607c02>] __sys_sendmsg+0x42/0x80
[ 360.451437] [<ffffffff81607c52>] SyS_sendmsg+0x12/0x20
[ 360.451440] [<ffffffff81727464>] tracesys_phase2+0xd8/0xdd
[ 360.451442] rcu_show_nocb_setup(): rcu_sched nocb state:
[ 360.456737] 0: ffff88013fc0e600 l:ffff88013fc0e600 n:ffff88013fc8e600 ...
[ 360.463676] 1: ffff88013fc8e600 l:ffff88013fc0e600 n: (null) ...
[ 360.470614] 2: ffff88013fd0e600 l:ffff88013fd0e600 n:ffff88013fd8e600 N..
[ 360.477554] 3: ffff88013fd8e600 l:ffff88013fd0e600 n: (null) N..
[ 360.484494] rcu_show_nocb_setup(): rcu_bh nocb state:
[ 360.489529] 0: ffff88013fc0e3c0 l:ffff88013fc0e3c0 n:ffff88013fc8e3c0 ...
[ 360.496469] 1: ffff88013fc8e3c0 l:ffff88013fc0e3c0 n: (null) .G.
[ 360.503407] 2: ffff88013fd0e3c0 l:ffff88013fd0e3c0 n:ffff88013fd8e3c0 ...
[ 360.510346] 3: ffff88013fd8e3c0 l:ffff88013fd0e3c0 n: (null) ...

---
-Jay Vosburgh, ***@canonical.com
Paul E. McKenney
2014-10-24 23:10:02 UTC
Permalink
Post by Jay Vosburgh
Post by Paul E. McKenney
[ . . . ]
Post by Yanko Kaneti
Post by Paul E. McKenney
Post by Yanko Kaneti
Post by Paul E. McKenney
Well, if you are feeling aggressive, give the following patch a spin.
I am doing sanity tests on it in the meantime.
Doesn't seem to make a difference here
OK, inspection isn't cutting it, so time for tracing. Does the system
respond to user input? If so, please enable rcu:rcu_barrier ftrace before
the problem occurs, then dump the trace buffer after the problem occurs.
Sorry for being unresposive here, but I know next to nothing about tracing
or most things about the kernel, so I have some cathing up to do.
In the meantime some layman observations while I tried to find what exactly
triggers the problem.
- Even in runlevel 1 I can reliably trigger the problem by starting libvirtd
- libvirtd seems to be very active in using all sorts of kernel facilities
that are modules on fedora so it seems to cause many simultaneous kworker
calls to modprobe
- there are 8 kworker/u16 from 0 to 7
- one of these kworkers always deadlocks, while there appear to be two
kworker/u16:6 - the seventh
Adding Tejun on CC in case this duplication of kworker/u16:6 is important.
Post by Yanko Kaneti
6 vs 8 as in 6 rcuos where before they were always 8
Just observations from someone who still doesn't know what the u16
kworkers are..
Could you please run the following diagnostic patch? This will help
me see if I have managed to miswire the rcuo kthreads. It should
print some information at task-hang time.
Here's the output of the patch; I let it sit through two hang
cycles.
-J
[ 240.348020] INFO: task ovs-vswitchd:902 blocked for more than 120 seconds.
[ 240.354878] Not tainted 3.17.0-testola+ #4
[ 240.359481] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 240.367285] ovs-vswitchd D ffff88013fc94600 0 902 901 0x00000004
[ 240.367290] ffff8800ab20f7b8 0000000000000002 ffff8800b3304b00 ffff8800ab20ffd8
[ 240.367293] 0000000000014600 0000000000014600 ffff8800b0810000 ffff8800b3304b00
[ 240.367296] ffff8800b3304b00 ffffffff81c59850 ffffffff81c59858 7fffffffffffffff
[ 240.367307] [<ffffffff81722b99>] schedule+0x29/0x70
[ 240.367310] [<ffffffff81725b6c>] schedule_timeout+0x1dc/0x260
[ 240.367313] [<ffffffff81722f69>] ? _cond_resched+0x29/0x40
[ 240.367316] [<ffffffff81723818>] ? wait_for_completion+0x28/0x160
[ 240.367321] [<ffffffff811081a7>] ? queue_stop_cpus_work+0xc7/0xe0
[ 240.367324] [<ffffffff81723896>] wait_for_completion+0xa6/0x160
[ 240.367328] [<ffffffff81099980>] ? wake_up_state+0x20/0x20
[ 240.367331] [<ffffffff810d0ecc>] _rcu_barrier+0x20c/0x480
[ 240.367334] [<ffffffff810d1195>] rcu_barrier+0x15/0x20
[ 240.367338] [<ffffffff81625010>] netdev_run_todo+0x60/0x300
[ 240.367341] [<ffffffff8162f9ee>] rtnl_unlock+0xe/0x10
[ 240.367349] [<ffffffffa01ffcc5>] internal_dev_destroy+0x55/0x80 [openvswitch]
[ 240.367354] [<ffffffffa01ff622>] ovs_vport_del+0x32/0x40 [openvswitch]
[ 240.367358] [<ffffffffa01f8dd0>] ovs_dp_detach_port+0x30/0x40 [openvswitch]
[ 240.367363] [<ffffffffa01f8ea5>] ovs_vport_cmd_del+0xc5/0x110 [openvswitch]
[ 240.367367] [<ffffffff81651d75>] genl_family_rcv_msg+0x1a5/0x3c0
[ 240.367370] [<ffffffff81651f90>] ? genl_family_rcv_msg+0x3c0/0x3c0
[ 240.367372] [<ffffffff81652021>] genl_rcv_msg+0x91/0xd0
[ 240.367376] [<ffffffff81650091>] netlink_rcv_skb+0xc1/0xe0
[ 240.367378] [<ffffffff816505bc>] genl_rcv+0x2c/0x40
[ 240.367381] [<ffffffff8164f626>] netlink_unicast+0xf6/0x200
[ 240.367383] [<ffffffff8164fa4d>] netlink_sendmsg+0x31d/0x780
[ 240.367387] [<ffffffff8164ca74>] ? netlink_rcv_wake+0x44/0x60
[ 240.367391] [<ffffffff81606a53>] sock_sendmsg+0x93/0xd0
[ 240.367395] [<ffffffff81337700>] ? apparmor_capable+0x60/0x60
[ 240.367399] [<ffffffff81614f27>] ? verify_iovec+0x47/0xd0
[ 240.367402] [<ffffffff81606e79>] ___sys_sendmsg+0x399/0x3b0
[ 240.367406] [<ffffffff812598a2>] ? kernfs_seq_stop_active+0x32/0x40
[ 240.367410] [<ffffffff8101c385>] ? native_sched_clock+0x35/0x90
[ 240.367413] [<ffffffff8101c385>] ? native_sched_clock+0x35/0x90
[ 240.367416] [<ffffffff8101c3e9>] ? sched_clock+0x9/0x10
[ 240.367420] [<ffffffff811277fc>] ? acct_account_cputime+0x1c/0x20
[ 240.367424] [<ffffffff8109ce6b>] ? account_user_time+0x8b/0xa0
[ 240.367428] [<ffffffff81200bd5>] ? __fget_light+0x25/0x70
[ 240.367431] [<ffffffff81607c02>] __sys_sendmsg+0x42/0x80
[ 240.367433] [<ffffffff81607c52>] SyS_sendmsg+0x12/0x20
[ 240.367436] [<ffffffff81727464>] tracesys_phase2+0xd8/0xdd
[ 240.372734] 0: ffff88013fc0e600 l:ffff88013fc0e600 n:ffff88013fc8e600 .G.
[ 240.379673] 1: ffff88013fc8e600 l:ffff88013fc0e600 n: (null) .G.
[ 240.386611] 2: ffff88013fd0e600 l:ffff88013fd0e600 n:ffff88013fd8e600 N..
[ 240.393550] 3: ffff88013fd8e600 l:ffff88013fd0e600 n: (null) N..
[ 240.405525] 0: ffff88013fc0e3c0 l:ffff88013fc0e3c0 n:ffff88013fc8e3c0 ...
[ 240.412463] 1: ffff88013fc8e3c0 l:ffff88013fc0e3c0 n: (null) ...
[ 240.419401] 2: ffff88013fd0e3c0 l:ffff88013fd0e3c0 n:ffff88013fd8e3c0 ...
[ 240.426339] 3: ffff88013fd8e3c0 l:ffff88013fd0e3c0 n: (null) ...
[ 360.432020] INFO: task ovs-vswitchd:902 blocked for more than 120 seconds.
[ 360.438881] Not tainted 3.17.0-testola+ #4
[ 360.443484] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 360.451289] ovs-vswitchd D ffff88013fc94600 0 902 901 0x00000004
[ 360.451293] ffff8800ab20f7b8 0000000000000002 ffff8800b3304b00 ffff8800ab20ffd8
[ 360.451297] 0000000000014600 0000000000014600 ffff8800b0810000 ffff8800b3304b00
[ 360.451300] ffff8800b3304b00 ffffffff81c59850 ffffffff81c59858 7fffffffffffffff
[ 360.451311] [<ffffffff81722b99>] schedule+0x29/0x70
[ 360.451314] [<ffffffff81725b6c>] schedule_timeout+0x1dc/0x260
[ 360.451317] [<ffffffff81722f69>] ? _cond_resched+0x29/0x40
[ 360.451320] [<ffffffff81723818>] ? wait_for_completion+0x28/0x160
[ 360.451325] [<ffffffff811081a7>] ? queue_stop_cpus_work+0xc7/0xe0
[ 360.451327] [<ffffffff81723896>] wait_for_completion+0xa6/0x160
[ 360.451331] [<ffffffff81099980>] ? wake_up_state+0x20/0x20
[ 360.451335] [<ffffffff810d0ecc>] _rcu_barrier+0x20c/0x480
[ 360.451338] [<ffffffff810d1195>] rcu_barrier+0x15/0x20
[ 360.451342] [<ffffffff81625010>] netdev_run_todo+0x60/0x300
[ 360.451345] [<ffffffff8162f9ee>] rtnl_unlock+0xe/0x10
[ 360.451353] [<ffffffffa01ffcc5>] internal_dev_destroy+0x55/0x80 [openvswitch]
[ 360.451358] [<ffffffffa01ff622>] ovs_vport_del+0x32/0x40 [openvswitch]
[ 360.451362] [<ffffffffa01f8dd0>] ovs_dp_detach_port+0x30/0x40 [openvswitch]
[ 360.451366] [<ffffffffa01f8ea5>] ovs_vport_cmd_del+0xc5/0x110 [openvswitch]
[ 360.451370] [<ffffffff81651d75>] genl_family_rcv_msg+0x1a5/0x3c0
[ 360.451373] [<ffffffff81651f90>] ? genl_family_rcv_msg+0x3c0/0x3c0
[ 360.451376] [<ffffffff81652021>] genl_rcv_msg+0x91/0xd0
[ 360.451379] [<ffffffff81650091>] netlink_rcv_skb+0xc1/0xe0
[ 360.451381] [<ffffffff816505bc>] genl_rcv+0x2c/0x40
[ 360.451384] [<ffffffff8164f626>] netlink_unicast+0xf6/0x200
[ 360.451387] [<ffffffff8164fa4d>] netlink_sendmsg+0x31d/0x780
[ 360.451390] [<ffffffff8164ca74>] ? netlink_rcv_wake+0x44/0x60
[ 360.451394] [<ffffffff81606a53>] sock_sendmsg+0x93/0xd0
[ 360.451399] [<ffffffff81337700>] ? apparmor_capable+0x60/0x60
[ 360.451402] [<ffffffff81614f27>] ? verify_iovec+0x47/0xd0
[ 360.451406] [<ffffffff81606e79>] ___sys_sendmsg+0x399/0x3b0
[ 360.451410] [<ffffffff812598a2>] ? kernfs_seq_stop_active+0x32/0x40
[ 360.451414] [<ffffffff8101c385>] ? native_sched_clock+0x35/0x90
[ 360.451417] [<ffffffff8101c385>] ? native_sched_clock+0x35/0x90
[ 360.451419] [<ffffffff8101c3e9>] ? sched_clock+0x9/0x10
[ 360.451424] [<ffffffff811277fc>] ? acct_account_cputime+0x1c/0x20
[ 360.451427] [<ffffffff8109ce6b>] ? account_user_time+0x8b/0xa0
[ 360.451431] [<ffffffff81200bd5>] ? __fget_light+0x25/0x70
[ 360.451434] [<ffffffff81607c02>] __sys_sendmsg+0x42/0x80
[ 360.451437] [<ffffffff81607c52>] SyS_sendmsg+0x12/0x20
[ 360.451440] [<ffffffff81727464>] tracesys_phase2+0xd8/0xdd
[ 360.456737] 0: ffff88013fc0e600 l:ffff88013fc0e600 n:ffff88013fc8e600 ...
[ 360.463676] 1: ffff88013fc8e600 l:ffff88013fc0e600 n: (null) ...
[ 360.470614] 2: ffff88013fd0e600 l:ffff88013fd0e600 n:ffff88013fd8e600 N..
[ 360.477554] 3: ffff88013fd8e600 l:ffff88013fd0e600 n: (null) N..
Hmmm... It sure looks like we have some callbacks stuck here. I clearly
need to take a hard look at the sleep/wakeup code.
Thank you for running this!!!
Could you please try the following patch? If no joy, could you please
add rcu:rcu_nocb_wake to the list of ftrace events?

Thanx, Paul

------------------------------------------------------------------------

rcu: Kick rcuo kthreads after their CPU goes offline

If a no-CBs CPU were to post an RCU callback with interrupts disabled
after it entered the idle loop for the last time, there might be no
deferred wakeup for the corresponding rcuo kthreads. This commit
therefore adds a set of calls to do_nocb_deferred_wakeup() after the
CPU has gone completely offline.

Signed-off-by: Paul E. McKenney <***@linux.vnet.ibm.com>

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 84b41b3c6ebd..f6880052b917 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3493,8 +3493,10 @@ static int rcu_cpu_notify(struct notifier_block *self,
case CPU_DEAD_FROZEN:
case CPU_UP_CANCELED:
case CPU_UP_CANCELED_FROZEN:
- for_each_rcu_flavor(rsp)
+ for_each_rcu_flavor(rsp) {
rcu_cleanup_dead_cpu(cpu, rsp);
+ do_nocb_deferred_wakeup(per_cpu_ptr(rsp->rda, cpu));
+ }
break;
default:
break;
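
A minimal sketch of enabling those tracepoints from userspace and dumping the
trace buffer afterwards (this is illustrative, not code from the thread; the
/sys/kernel/debug/tracing paths assume the usual debugfs mount point, and the
reproduction step is only a placeholder comment):

/*
 * Enable rcu:rcu_barrier and rcu:rcu_nocb_wake, then dump the trace
 * buffer.  Run as root; reproduce the hang where indicated.
 */
#include <stdio.h>
#include <stdlib.h>

#define TRACING "/sys/kernel/debug/tracing"

static void write_str(const char *path, const char *val)
{
	FILE *f = fopen(path, "w");

	if (!f) {
		perror(path);
		exit(EXIT_FAILURE);
	}
	fputs(val, f);
	fclose(f);
}

int main(void)
{
	char line[1024];
	FILE *trace;

	/* Enable the two RCU tracepoints of interest. */
	write_str(TRACING "/events/rcu/rcu_barrier/enable", "1");
	write_str(TRACING "/events/rcu/rcu_nocb_wake/enable", "1");

	/* ... reproduce the rcu_barrier() hang here (e.g. start libvirtd) ... */

	/* Dump whatever the trace buffer captured. */
	trace = fopen(TRACING "/trace", "r");
	if (!trace) {
		perror(TRACING "/trace");
		exit(EXIT_FAILURE);
	}
	while (fgets(line, sizeof(line), trace))
		fputs(line, stdout);
	fclose(trace);

	return 0;
}

The same effect can of course be had by writing 1 to the two enable files and
reading the trace file by hand.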

Jay Vosburgh
2014-10-25 00:30:02 UTC
Permalink
[...]
Post by Paul E. McKenney
Hmmm... It sure looks like we have some callbacks stuck here. I clearly
need to take a hard look at the sleep/wakeup code.
Thank you for running this!!!
Could you please try the following patch? If no joy, could you please
add rcu:rcu_nocb_wake to the list of ftrace events?
I tried the patch; it did not change the behavior.

I enabled the rcu:rcu_barrier and rcu:rcu_nocb_wake tracepoints
and ran it again (with this patch and the first patch from earlier
today); the trace output is a bit on the large side so I put it and the
dmesg log at:

http://people.canonical.com/~jvosburgh/nocb-wake-dmesg.txt

http://people.canonical.com/~jvosburgh/nocb-wake-trace.txt

-J
Post by Paul E. McKenney
Thanx, Paul
------------------------------------------------------------------------
rcu: Kick rcuo kthreads after their CPU goes offline
If a no-CBs CPU were to post an RCU callback with interrupts disabled
after it entered the idle loop for the last time, there might be no
deferred wakeup for the corresponding rcuo kthreads. This commit
therefore adds a set of calls to do_nocb_deferred_wakeup() after the
CPU has gone completely offline.
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 84b41b3c6ebd..f6880052b917 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3493,8 +3493,10 @@ static int rcu_cpu_notify(struct notifier_block *self,
- for_each_rcu_flavor(rsp)
+ for_each_rcu_flavor(rsp) {
rcu_cleanup_dead_cpu(cpu, rsp);
+ do_nocb_deferred_wakeup(per_cpu_ptr(rsp->rda, cpu));
+ }
break;
break;
---
-Jay Vosburgh, ***@canonical.com
Paul E. McKenney
2014-10-25 18:30:03 UTC
Permalink
Looking at the dmesg, the early boot messages seem to be
confused as to how many CPUs there are, e.g.,
[ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
[ 0.000000] Hierarchical RCU implementation.
[ 0.000000] RCU debugfs-based tracing is enabled.
[ 0.000000] RCU dyntick-idle grace-period acceleration is enabled.
[ 0.000000] RCU restricting CPUs from NR_CPUS=256 to nr_cpu_ids=4.
[ 0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=4
[ 0.000000] NR_IRQS:16640 nr_irqs:456 0
[ 0.000000] Offload RCU callbacks from all CPUs
[ 0.000000] Offload RCU callbacks from CPUs: 0-3.
[ 0.236003] .... node #0, CPUs: #1
[ 0.255528] x86: Booted up 1 node, 2 CPUs
In any event, the E8400 is a 2 core CPU with no hyperthreading.
Well, this might explain some of the difficulties. If RCU decides to wait
on CPUs that don't exist, we will of course get a hang. And rcu_barrier()
was definitely expecting four CPUs.
So what happens if you boot with maxcpus=2? (Or build with
CONFIG_NR_CPUS=2.) I suspect that this might avoid the hang. If so,
I might have some ideas for a real fix.
Booting with maxcpus=2 makes no difference (the dmesg output is
the same).
Rebuilding with CONFIG_NR_CPUS=2 makes the problem go away, and
[ 0.000000] smpboot: 4 Processors exceeds NR_CPUS limit of 2
[ 0.000000] smpboot: Allowing 2 CPUs, 0 hotplug CPUs
[...]
[ 0.000000] setup_percpu: NR_CPUS:2 nr_cpumask_bits:2 nr_cpu_ids:2 nr_node_ids:1
[...]
[ 0.000000] Hierarchical RCU implementation.
[ 0.000000] RCU debugfs-based tracing is enabled.
[ 0.000000] RCU dyntick-idle grace-period acceleration is enabled.
[ 0.000000] NR_IRQS:4352 nr_irqs:440 0
[ 0.000000] Offload RCU callbacks from all CPUs
[ 0.000000] Offload RCU callbacks from CPUs: 0-1.
Thank you -- this confirms my suspicions on the fix, though I must admit
to being surprised that maxcpus made no difference.

Thanx, Paul

Paul E. McKenney
2014-10-27 17:50:02 UTC
Permalink
Post by Paul E. McKenney
Looking at the dmesg, the early boot messages seem to be
confused as to how many CPUs there are, e.g.,
[ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
[ 0.000000] Hierarchical RCU implementation.
[ 0.000000] RCU debugfs-based tracing is enabled.
[ 0.000000] RCU dyntick-idle grace-period acceleration is enabled.
[ 0.000000] RCU restricting CPUs from NR_CPUS=256 to nr_cpu_ids=4.
[ 0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=4
[ 0.000000] NR_IRQS:16640 nr_irqs:456 0
[ 0.000000] Offload RCU callbacks from all CPUs
[ 0.000000] Offload RCU callbacks from CPUs: 0-3.
[ 0.236003] .... node #0, CPUs: #1
[ 0.255528] x86: Booted up 1 node, 2 CPUs
In any event, the E8400 is a 2 core CPU with no hyperthreading.
Well, this might explain some of the difficulties. If RCU decides to wait
on CPUs that don't exist, we will of course get a hang. And rcu_barrier()
was definitely expecting four CPUs.
So what happens if you boot with maxcpus=2? (Or build with
CONFIG_NR_CPUS=2.) I suspect that this might avoid the hang. If so,
I might have some ideas for a real fix.
Booting with maxcpus=2 makes no difference (the dmesg output is
the same).
Rebuilding with CONFIG_NR_CPUS=2 makes the problem go away, and
[ 0.000000] smpboot: 4 Processors exceeds NR_CPUS limit of 2
[ 0.000000] smpboot: Allowing 2 CPUs, 0 hotplug CPUs
[...]
[ 0.000000] setup_percpu: NR_CPUS:2 nr_cpumask_bits:2 nr_cpu_ids:2 nr_node_ids:1
[...]
[ 0.000000] Hierarchical RCU implementation.
[ 0.000000] RCU debugfs-based tracing is enabled.
[ 0.000000] RCU dyntick-idle grace-period acceleration is enabled.
[ 0.000000] NR_IRQS:4352 nr_irqs:440 0
[ 0.000000] Offload RCU callbacks from all CPUs
[ 0.000000] Offload RCU callbacks from CPUs: 0-1.
Thank you -- this confirms my suspicions on the fix, though I must admit
to being surprised that maxcpus made no difference.
And here is an alleged fix, lightly tested at this end. Does this patch
help?

Thanx, Paul

------------------------------------------------------------------------

rcu: Make rcu_barrier() understand about missing rcuo kthreads

Commit 35ce7f29a44a (rcu: Create rcuo kthreads only for onlined CPUs)
avoids creating rcuo kthreads for CPUs that never come online. This
fixes a bug in many instances of firmware: Instead of lying about their
age, these systems instead lie about the number of CPUs that they have.
Before commit 35ce7f29a44a, this could result in huge numbers of useless
rcuo kthreads being created.

It appears that experience indicates that I should have told the
people suffering from this problem to fix their broken firmware, but
I instead produced what turned out to be a partial fix. The missing
piece supplied by this commit makes sure that rcu_barrier() knows not to
post callbacks for no-CBs CPUs that have not yet come online, because
otherwise rcu_barrier() will hang on systems having firmware that lies
about the number of CPUs.

It is tempting to simply have rcu_barrier() refuse to post a callback on
any no-CBs CPU that does not have an rcuo kthread. This unfortunately
does not work because rcu_barrier() is required to wait for all pending
callbacks. It is therefore required to wait even for those callbacks
that cannot possibly be invoked. Even if doing so hangs the system.

Given that posting a callback to a no-CBs CPU that does not yet have an
rcuo kthread can hang rcu_barrier(), It is tempting to report an error
in this case. Unfortunately, this will result in false positives at
boot time, when it is perfectly legal to post callbacks to the boot CPU
before the scheduler has started, in other words, before it is legal
to invoke rcu_barrier().

So this commit instead has rcu_barrier() avoid posting callbacks to
CPUs having neither rcuo kthread nor pending callbacks, and has it
complain bitterly if it finds CPUs having no rcuo kthread but some
pending callbacks. And when rcu_barrier() does find CPUs having no rcuo
kthread but pending callbacks, as noted earlier, it has no choice but
to hang indefinitely.

Reported-by: Yanko Kaneti <***@declera.com>
Reported-by: Jay Vosburgh <***@canonical.com>
Reported-by: Meelis Roos <***@linux.ee>
Reported-by: Eric B Munson <***@akamai.com>
Signed-off-by: Paul E. McKenney <***@linux.vnet.ibm.com>

diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
index aa8e5eea3ab4..c78e88ce5ea3 100644
--- a/include/trace/events/rcu.h
+++ b/include/trace/events/rcu.h
@@ -660,18 +660,18 @@ TRACE_EVENT(rcu_torture_read,
/*
* Tracepoint for _rcu_barrier() execution. The string "s" describes
* the _rcu_barrier phase:
- * "Begin": rcu_barrier_callback() started.
- * "Check": rcu_barrier_callback() checking for piggybacking.
- * "EarlyExit": rcu_barrier_callback() piggybacked, thus early exit.
- * "Inc1": rcu_barrier_callback() piggyback check counter incremented.
- * "Offline": rcu_barrier_callback() found offline CPU
- * "OnlineNoCB": rcu_barrier_callback() found online no-CBs CPU.
- * "OnlineQ": rcu_barrier_callback() found online CPU with callbacks.
- * "OnlineNQ": rcu_barrier_callback() found online CPU, no callbacks.
+ * "Begin": _rcu_barrier() started.
+ * "Check": _rcu_barrier() checking for piggybacking.
+ * "EarlyExit": _rcu_barrier() piggybacked, thus early exit.
+ * "Inc1": _rcu_barrier() piggyback check counter incremented.
+ * "OfflineNoCB": _rcu_barrier() found callback on never-online CPU
+ * "OnlineNoCB": _rcu_barrier() found online no-CBs CPU.
+ * "OnlineQ": _rcu_barrier() found online CPU with callbacks.
+ * "OnlineNQ": _rcu_barrier() found online CPU, no callbacks.
* "IRQ": An rcu_barrier_callback() callback posted on remote CPU.
* "CB": An rcu_barrier_callback() invoked a callback, not the last.
* "LastCB": An rcu_barrier_callback() invoked the last callback.
- * "Inc2": rcu_barrier_callback() piggyback check counter incremented.
+ * "Inc2": _rcu_barrier() piggyback check counter incremented.
* The "cpu" argument is the CPU or -1 if meaningless, the "cnt" argument
* is the count of remaining callbacks, and "done" is the piggybacking count.
*/
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index f6880052b917..7680fc275036 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3312,11 +3312,16 @@ static void _rcu_barrier(struct rcu_state *rsp)
continue;
rdp = per_cpu_ptr(rsp->rda, cpu);
if (rcu_is_nocb_cpu(cpu)) {
- _rcu_barrier_trace(rsp, "OnlineNoCB", cpu,
- rsp->n_barrier_done);
- atomic_inc(&rsp->barrier_cpu_count);
- __call_rcu(&rdp->barrier_head, rcu_barrier_callback,
- rsp, cpu, 0);
+ if (!rcu_nocb_cpu_needs_barrier(rsp, cpu)) {
+ _rcu_barrier_trace(rsp, "OfflineNoCB", cpu,
+ rsp->n_barrier_done);
+ } else {
+ _rcu_barrier_trace(rsp, "OnlineNoCB", cpu,
+ rsp->n_barrier_done);
+ atomic_inc(&rsp->barrier_cpu_count);
+ __call_rcu(&rdp->barrier_head,
+ rcu_barrier_callback, rsp, cpu, 0);
+ }
} else if (ACCESS_ONCE(rdp->qlen)) {
_rcu_barrier_trace(rsp, "OnlineQ", cpu,
rsp->n_barrier_done);
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 4beab3d2328c..8e7b1843896e 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -587,6 +587,7 @@ static void print_cpu_stall_info(struct rcu_state *rsp, int cpu);
static void print_cpu_stall_info_end(void);
static void zero_cpu_stall_ticks(struct rcu_data *rdp);
static void increment_cpu_stall_ticks(void);
+static bool rcu_nocb_cpu_needs_barrier(struct rcu_state *rsp, int cpu);
static void rcu_nocb_gp_set(struct rcu_node *rnp, int nrq);
static void rcu_nocb_gp_cleanup(struct rcu_state *rsp, struct rcu_node *rnp);
static void rcu_init_one_nocb(struct rcu_node *rnp);
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 927c17b081c7..68c5b23b7173 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -2050,6 +2050,33 @@ static void wake_nocb_leader(struct rcu_data *rdp, bool force)
}

/*
+ * Does the specified CPU need an RCU callback for the specified flavor
+ * of rcu_barrier()?
+ */
+static bool rcu_nocb_cpu_needs_barrier(struct rcu_state *rsp, int cpu)
+{
+ struct rcu_data *rdp = per_cpu_ptr(rsp->rda, cpu);
+ struct rcu_head *rhp;
+
+ /* No-CBs CPUs might have callbacks on any of three lists. */
+ rhp = ACCESS_ONCE(rdp->nocb_head);
+ if (!rhp)
+ rhp = ACCESS_ONCE(rdp->nocb_gp_head);
+ if (!rhp)
+ rhp = ACCESS_ONCE(rdp->nocb_follower_head);
+
+ /* Having no rcuo kthread but CBs after scheduler starts is bad! */
+ if (!ACCESS_ONCE(rdp->nocb_kthread) && rhp) {
+ /* RCU callback enqueued before CPU first came online??? */
+ pr_err("RCU: Never-onlined no-CBs CPU %d has CB %p\n",
+ cpu, rhp->func);
+ WARN_ON_ONCE(1);
+ }
+
+ return !!rhp;
+}
+
+/*
* Enqueue the specified string of rcu_head structures onto the specified
* CPU's no-CBs lists. The CPU is specified by rdp, the head of the
* string by rhp, and the tail of the string by rhtp. The non-lazy/lazy
@@ -2646,6 +2673,10 @@ static bool init_nocb_callback_list(struct rcu_data *rdp)

#else /* #ifdef CONFIG_RCU_NOCB_CPU */

+static bool rcu_nocb_cpu_needs_barrier(struct rcu_state *rsp, int cpu)
+{
+}
+
static void rcu_nocb_gp_cleanup(struct rcu_state *rsp, struct rcu_node *rnp)
{
}
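
One caveat about the hunk just above: the !CONFIG_RCU_NOCB_CPU stub of
rcu_nocb_cpu_needs_barrier() is declared bool but falls off the end without
returning a value, which the compiler will warn about. A minimal well-formed
sketch, assuming the stub is meant to be dead code when callback offloading is
compiled out (rcu_is_nocb_cpu() always returns false in that configuration, so
_rcu_barrier() never reaches it):

static bool rcu_nocb_cpu_needs_barrier(struct rcu_state *rsp, int cpu)
{
	WARN_ON_ONCE(1);	/* Should be unreachable without CONFIG_RCU_NOCB_CPU. */
	return false;
}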

Jay Vosburgh
2014-10-27 20:50:02 UTC
Permalink
Post by Paul E. McKenney
Post by Paul E. McKenney
Looking at the dmesg, the early boot messages seem to be
confused as to how many CPUs there are, e.g.,
[ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
[ 0.000000] Hierarchical RCU implementation.
[ 0.000000] RCU debugfs-based tracing is enabled.
[ 0.000000] RCU dyntick-idle grace-period acceleration is enabled.
[ 0.000000] RCU restricting CPUs from NR_CPUS=256 to nr_cpu_ids=4.
[ 0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=4
[ 0.000000] NR_IRQS:16640 nr_irqs:456 0
[ 0.000000] Offload RCU callbacks from all CPUs
[ 0.000000] Offload RCU callbacks from CPUs: 0-3.
[ 0.236003] .... node #0, CPUs: #1
[ 0.255528] x86: Booted up 1 node, 2 CPUs
In any event, the E8400 is a 2 core CPU with no hyperthreading.
Well, this might explain some of the difficulties. If RCU decides to wait
on CPUs that don't exist, we will of course get a hang. And rcu_barrier()
was definitely expecting four CPUs.
So what happens if you boot with maxcpus=2? (Or build with
CONFIG_NR_CPUS=2.) I suspect that this might avoid the hang. If so,
I might have some ideas for a real fix.
Booting with maxcpus=2 makes no difference (the dmesg output is
the same).
Rebuilding with CONFIG_NR_CPUS=2 makes the problem go away, and
[ 0.000000] smpboot: 4 Processors exceeds NR_CPUS limit of 2
[ 0.000000] smpboot: Allowing 2 CPUs, 0 hotplug CPUs
[...]
[ 0.000000] setup_percpu: NR_CPUS:2 nr_cpumask_bits:2 nr_cpu_ids:2 nr_node_ids:1
[...]
[ 0.000000] Hierarchical RCU implementation.
[ 0.000000] RCU debugfs-based tracing is enabled.
[ 0.000000] RCU dyntick-idle grace-period acceleration is enabled.
[ 0.000000] NR_IRQS:4352 nr_irqs:440 0
[ 0.000000] Offload RCU callbacks from all CPUs
[ 0.000000] Offload RCU callbacks from CPUs: 0-1.
Thank you -- this confirms my suspicions on the fix, though I must admit
to being surprised that maxcpus made no difference.
And here is an alleged fix, lightly tested at this end. Does this patch
help?
This patch appears to make the problem go away; I've run about
10 iterations. I applied this patch to the same -net tree I was using
previously (-net as of Oct 22), with all other test patches removed.

FWIW, dmesg is unchanged, and still shows messages like:

[ 0.000000] Offload RCU callbacks from CPUs: 0-3.

Tested-by: Jay Vosburgh <***@canonical.com>

-J
Post by Paul E. McKenney
Thanx, Paul
------------------------------------------------------------------------
rcu: Make rcu_barrier() understand about missing rcuo kthreads
Commit 35ce7f29a44a (rcu: Create rcuo kthreads only for onlined CPUs)
avoids creating rcuo kthreads for CPUs that never come online. This
fixes a bug in many instances of firmware: Instead of lying about their
age, these systems instead lie about the number of CPUs that they have.
Before commit 35ce7f29a44a, this could result in huge numbers of useless
rcuo kthreads being created.
It appears that experience indicates that I should have told the
people suffering from this problem to fix their broken firmware, but
I instead produced what turned out to be a partial fix. The missing
piece supplied by this commit makes sure that rcu_barrier() knows not to
post callbacks for no-CBs CPUs that have not yet come online, because
otherwise rcu_barrier() will hang on systems having firmware that lies
about the number of CPUs.
It is tempting to simply have rcu_barrier() refuse to post a callback on
any no-CBs CPU that does not have an rcuo kthread. This unfortunately
does not work because rcu_barrier() is required to wait for all pending
callbacks. It is therefore required to wait even for those callbacks
that cannot possibly be invoked. Even if doing so hangs the system.
Given that posting a callback to a no-CBs CPU that does not yet have an
rcuo kthread can hang rcu_barrier(), It is tempting to report an error
in this case. Unfortunately, this will result in false positives at
boot time, when it is perfectly legal to post callbacks to the boot CPU
before the scheduler has started, in other words, before it is legal
to invoke rcu_barrier().
So this commit instead has rcu_barrier() avoid posting callbacks to
CPUs having neither rcuo kthread nor pending callbacks, and has it
complain bitterly if it finds CPUs having no rcuo kthread but some
pending callbacks. And when rcu_barrier() does find CPUs having no rcuo
kthread but pending callbacks, as noted earlier, it has no choice but
to hang indefinitely.
diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
index aa8e5eea3ab4..c78e88ce5ea3 100644
--- a/include/trace/events/rcu.h
+++ b/include/trace/events/rcu.h
@@ -660,18 +660,18 @@ TRACE_EVENT(rcu_torture_read,
/*
* Tracepoint for _rcu_barrier() execution. The string "s" describes
- * "Begin": rcu_barrier_callback() started.
- * "Check": rcu_barrier_callback() checking for piggybacking.
- * "EarlyExit": rcu_barrier_callback() piggybacked, thus early exit.
- * "Inc1": rcu_barrier_callback() piggyback check counter incremented.
- * "Offline": rcu_barrier_callback() found offline CPU
- * "OnlineNoCB": rcu_barrier_callback() found online no-CBs CPU.
- * "OnlineQ": rcu_barrier_callback() found online CPU with callbacks.
- * "OnlineNQ": rcu_barrier_callback() found online CPU, no callbacks.
+ * "Begin": _rcu_barrier() started.
+ * "Check": _rcu_barrier() checking for piggybacking.
+ * "EarlyExit": _rcu_barrier() piggybacked, thus early exit.
+ * "Inc1": _rcu_barrier() piggyback check counter incremented.
+ * "OfflineNoCB": _rcu_barrier() found callback on never-online CPU
+ * "OnlineNoCB": _rcu_barrier() found online no-CBs CPU.
+ * "OnlineQ": _rcu_barrier() found online CPU with callbacks.
+ * "OnlineNQ": _rcu_barrier() found online CPU, no callbacks.
* "IRQ": An rcu_barrier_callback() callback posted on remote CPU.
* "CB": An rcu_barrier_callback() invoked a callback, not the last.
* "LastCB": An rcu_barrier_callback() invoked the last callback.
- * "Inc2": rcu_barrier_callback() piggyback check counter incremented.
+ * "Inc2": _rcu_barrier() piggyback check counter incremented.
* The "cpu" argument is the CPU or -1 if meaningless, the "cnt" argument
* is the count of remaining callbacks, and "done" is the piggybacking count.
*/
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index f6880052b917..7680fc275036 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3312,11 +3312,16 @@ static void _rcu_barrier(struct rcu_state *rsp)
continue;
rdp = per_cpu_ptr(rsp->rda, cpu);
if (rcu_is_nocb_cpu(cpu)) {
- _rcu_barrier_trace(rsp, "OnlineNoCB", cpu,
- rsp->n_barrier_done);
- atomic_inc(&rsp->barrier_cpu_count);
- __call_rcu(&rdp->barrier_head, rcu_barrier_callback,
- rsp, cpu, 0);
+ if (!rcu_nocb_cpu_needs_barrier(rsp, cpu)) {
+ _rcu_barrier_trace(rsp, "OfflineNoCB", cpu,
+ rsp->n_barrier_done);
+ } else {
+ _rcu_barrier_trace(rsp, "OnlineNoCB", cpu,
+ rsp->n_barrier_done);
+ atomic_inc(&rsp->barrier_cpu_count);
+ __call_rcu(&rdp->barrier_head,
+ rcu_barrier_callback, rsp, cpu, 0);
+ }
} else if (ACCESS_ONCE(rdp->qlen)) {
_rcu_barrier_trace(rsp, "OnlineQ", cpu,
rsp->n_barrier_done);
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 4beab3d2328c..8e7b1843896e 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -587,6 +587,7 @@ static void print_cpu_stall_info(struct rcu_state *rsp, int cpu);
static void print_cpu_stall_info_end(void);
static void zero_cpu_stall_ticks(struct rcu_data *rdp);
static void increment_cpu_stall_ticks(void);
+static bool rcu_nocb_cpu_needs_barrier(struct rcu_state *rsp, int cpu);
static void rcu_nocb_gp_set(struct rcu_node *rnp, int nrq);
static void rcu_nocb_gp_cleanup(struct rcu_state *rsp, struct rcu_node *rnp);
static void rcu_init_one_nocb(struct rcu_node *rnp);
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 927c17b081c7..68c5b23b7173 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -2050,6 +2050,33 @@ static void wake_nocb_leader(struct rcu_data *rdp, bool force)
}
/*
+ * Does the specified CPU need an RCU callback for the specified flavor
+ * of rcu_barrier()?
+ */
+static bool rcu_nocb_cpu_needs_barrier(struct rcu_state *rsp, int cpu)
+{
+ struct rcu_data *rdp = per_cpu_ptr(rsp->rda, cpu);
+ struct rcu_head *rhp;
+
+ /* No-CBs CPUs might have callbacks on any of three lists. */
+ rhp = ACCESS_ONCE(rdp->nocb_head);
+ if (!rhp)
+ rhp = ACCESS_ONCE(rdp->nocb_gp_head);
+ if (!rhp)
+ rhp = ACCESS_ONCE(rdp->nocb_follower_head);
+
+ /* Having no rcuo kthread but CBs after scheduler starts is bad! */
+ if (!ACCESS_ONCE(rdp->nocb_kthread) && rhp) {
+ /* RCU callback enqueued before CPU first came online??? */
+ pr_err("RCU: Never-onlined no-CBs CPU %d has CB %p\n",
+ cpu, rhp->func);
+ WARN_ON_ONCE(1);
+ }
+
+ return !!rhp;
+}
+
+/*
* Enqueue the specified string of rcu_head structures onto the specified
* CPU's no-CBs lists. The CPU is specified by rdp, the head of the
* string by rhp, and the tail of the string by rhtp. The non-lazy/lazy
@@ -2646,6 +2673,10 @@ static bool init_nocb_callback_list(struct rcu_data *rdp)
#else /* #ifdef CONFIG_RCU_NOCB_CPU */
+static bool rcu_nocb_cpu_needs_barrier(struct rcu_state *rsp, int cpu)
+{
+}
+
static void rcu_nocb_gp_cleanup(struct rcu_state *rsp, struct rcu_node *rnp)
{
}
---
-Jay Vosburgh, ***@canonical.com
Paul E. McKenney
2014-10-27 21:20:02 UTC
Permalink
Post by Jay Vosburgh
Post by Paul E. McKenney
Post by Paul E. McKenney
Looking at the dmesg, the early boot messages seem to be
confused as to how many CPUs there are, e.g.,
[ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
[ 0.000000] Hierarchical RCU implementation.
[ 0.000000] RCU debugfs-based tracing is enabled.
[ 0.000000] RCU dyntick-idle grace-period acceleration is enabled.
[ 0.000000] RCU restricting CPUs from NR_CPUS=256 to nr_cpu_ids=4.
[ 0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=4
[ 0.000000] NR_IRQS:16640 nr_irqs:456 0
[ 0.000000] Offload RCU callbacks from all CPUs
[ 0.000000] Offload RCU callbacks from CPUs: 0-3.
[ 0.236003] .... node #0, CPUs: #1
[ 0.255528] x86: Booted up 1 node, 2 CPUs
In any event, the E8400 is a 2 core CPU with no hyperthreading.
Well, this might explain some of the difficulties. If RCU decides to wait
on CPUs that don't exist, we will of course get a hang. And rcu_barrier()
was definitely expecting four CPUs.
So what happens if you boot with maxcpus=2? (Or build with
CONFIG_NR_CPUS=2.) I suspect that this might avoid the hang. If so,
I might have some ideas for a real fix.
Booting with maxcpus=2 makes no difference (the dmesg output is
the same).
Rebuilding with CONFIG_NR_CPUS=2 makes the problem go away, and
[ 0.000000] smpboot: 4 Processors exceeds NR_CPUS limit of 2
[ 0.000000] smpboot: Allowing 2 CPUs, 0 hotplug CPUs
[...]
[ 0.000000] setup_percpu: NR_CPUS:2 nr_cpumask_bits:2 nr_cpu_ids:2 nr_node_ids:1
[...]
[ 0.000000] Hierarchical RCU implementation.
[ 0.000000] RCU debugfs-based tracing is enabled.
[ 0.000000] RCU dyntick-idle grace-period acceleration is enabled.
[ 0.000000] NR_IRQS:4352 nr_irqs:440 0
[ 0.000000] Offload RCU callbacks from all CPUs
[ 0.000000] Offload RCU callbacks from CPUs: 0-1.
Thank you -- this confirms my suspicions on the fix, though I must admit
to being surprised that maxcpus made no difference.
And here is an alleged fix, lightly tested at this end. Does this patch
help?
This patch appears to make the problem go away; I've run about
10 iterations. I applied this patch to the same -net tree I was using
previously (-net as of Oct 22), with all other test patches removed.
So I finally produced a patch that helps! It was bound to happen sooner
or later, I guess. ;-)
Post by Jay Vosburgh
[ 0.000000] Offload RCU callbacks from CPUs: 0-3.
Yep, at that point in boot, RCU has no way of knowing that the firmware
is lying to it about the number of CPUs. ;-)
Thank you for your testing efforts!!!

Thanx, Paul
Post by Jay Vosburgh
-J
Post by Paul E. McKenney
------------------------------------------------------------------------
rcu: Make rcu_barrier() understand about missing rcuo kthreads
Commit 35ce7f29a44a (rcu: Create rcuo kthreads only for onlined CPUs)
avoids creating rcuo kthreads for CPUs that never come online. This
fixes a bug in many instances of firmware: Instead of lying about their
age, these systems instead lie about the number of CPUs that they have.
Before commit 35ce7f29a44a, this could result in huge numbers of useless
rcuo kthreads being created.
It appears that experience indicates that I should have told the
people suffering from this problem to fix their broken firmware, but
I instead produced what turned out to be a partial fix. The missing
piece supplied by this commit makes sure that rcu_barrier() knows not to
post callbacks for no-CBs CPUs that have not yet come online, because
otherwise rcu_barrier() will hang on systems having firmware that lies
about the number of CPUs.
It is tempting to simply have rcu_barrier() refuse to post a callback on
any no-CBs CPU that does not have an rcuo kthread. This unfortunately
does not work because rcu_barrier() is required to wait for all pending
callbacks. It is therefore required to wait even for those callbacks
that cannot possibly be invoked. Even if doing so hangs the system.
Given that posting a callback to a no-CBs CPU that does not yet have an
rcuo kthread can hang rcu_barrier(), It is tempting to report an error
in this case. Unfortunately, this will result in false positives at
boot time, when it is perfectly legal to post callbacks to the boot CPU
before the scheduler has started, in other words, before it is legal
to invoke rcu_barrier().
So this commit instead has rcu_barrier() avoid posting callbacks to
CPUs having neither rcuo kthread nor pending callbacks, and has it
complain bitterly if it finds CPUs having no rcuo kthread but some
pending callbacks. And when rcu_barrier() does find CPUs having no rcuo
kthread but pending callbacks, as noted earlier, it has no choice but
to hang indefinitely.
diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
index aa8e5eea3ab4..c78e88ce5ea3 100644
--- a/include/trace/events/rcu.h
+++ b/include/trace/events/rcu.h
@@ -660,18 +660,18 @@ TRACE_EVENT(rcu_torture_read,
/*
* Tracepoint for _rcu_barrier() execution. The string "s" describes
- * "Begin": rcu_barrier_callback() started.
- * "Check": rcu_barrier_callback() checking for piggybacking.
- * "EarlyExit": rcu_barrier_callback() piggybacked, thus early exit.
- * "Inc1": rcu_barrier_callback() piggyback check counter incremented.
- * "Offline": rcu_barrier_callback() found offline CPU
- * "OnlineNoCB": rcu_barrier_callback() found online no-CBs CPU.
- * "OnlineQ": rcu_barrier_callback() found online CPU with callbacks.
- * "OnlineNQ": rcu_barrier_callback() found online CPU, no callbacks.
+ * "Begin": _rcu_barrier() started.
+ * "Check": _rcu_barrier() checking for piggybacking.
+ * "EarlyExit": _rcu_barrier() piggybacked, thus early exit.
+ * "Inc1": _rcu_barrier() piggyback check counter incremented.
+ * "OfflineNoCB": _rcu_barrier() found callback on never-online CPU
+ * "OnlineNoCB": _rcu_barrier() found online no-CBs CPU.
+ * "OnlineQ": _rcu_barrier() found online CPU with callbacks.
+ * "OnlineNQ": _rcu_barrier() found online CPU, no callbacks.
* "IRQ": An rcu_barrier_callback() callback posted on remote CPU.
* "CB": An rcu_barrier_callback() invoked a callback, not the last.
* "LastCB": An rcu_barrier_callback() invoked the last callback.
- * "Inc2": rcu_barrier_callback() piggyback check counter incremented.
+ * "Inc2": _rcu_barrier() piggyback check counter incremented.
* The "cpu" argument is the CPU or -1 if meaningless, the "cnt" argument
* is the count of remaining callbacks, and "done" is the piggybacking count.
*/
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index f6880052b917..7680fc275036 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3312,11 +3312,16 @@ static void _rcu_barrier(struct rcu_state *rsp)
continue;
rdp = per_cpu_ptr(rsp->rda, cpu);
if (rcu_is_nocb_cpu(cpu)) {
- _rcu_barrier_trace(rsp, "OnlineNoCB", cpu,
- rsp->n_barrier_done);
- atomic_inc(&rsp->barrier_cpu_count);
- __call_rcu(&rdp->barrier_head, rcu_barrier_callback,
- rsp, cpu, 0);
+ if (!rcu_nocb_cpu_needs_barrier(rsp, cpu)) {
+ _rcu_barrier_trace(rsp, "OfflineNoCB", cpu,
+ rsp->n_barrier_done);
+ } else {
+ _rcu_barrier_trace(rsp, "OnlineNoCB", cpu,
+ rsp->n_barrier_done);
+ atomic_inc(&rsp->barrier_cpu_count);
+ __call_rcu(&rdp->barrier_head,
+ rcu_barrier_callback, rsp, cpu, 0);
+ }
} else if (ACCESS_ONCE(rdp->qlen)) {
_rcu_barrier_trace(rsp, "OnlineQ", cpu,
rsp->n_barrier_done);
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 4beab3d2328c..8e7b1843896e 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -587,6 +587,7 @@ static void print_cpu_stall_info(struct rcu_state *rsp, int cpu);
static void print_cpu_stall_info_end(void);
static void zero_cpu_stall_ticks(struct rcu_data *rdp);
static void increment_cpu_stall_ticks(void);
+static bool rcu_nocb_cpu_needs_barrier(struct rcu_state *rsp, int cpu);
static void rcu_nocb_gp_set(struct rcu_node *rnp, int nrq);
static void rcu_nocb_gp_cleanup(struct rcu_state *rsp, struct rcu_node *rnp);
static void rcu_init_one_nocb(struct rcu_node *rnp);
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 927c17b081c7..68c5b23b7173 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -2050,6 +2050,33 @@ static void wake_nocb_leader(struct rcu_data *rdp, bool force)
}
/*
+ * Does the specified CPU need an RCU callback for the specified flavor
+ * of rcu_barrier()?
+ */
+static bool rcu_nocb_cpu_needs_barrier(struct rcu_state *rsp, int cpu)
+{
+ struct rcu_data *rdp = per_cpu_ptr(rsp->rda, cpu);
+ struct rcu_head *rhp;
+
+ /* No-CBs CPUs might have callbacks on any of three lists. */
+ rhp = ACCESS_ONCE(rdp->nocb_head);
+ if (!rhp)
+ rhp = ACCESS_ONCE(rdp->nocb_gp_head);
+ if (!rhp)
+ rhp = ACCESS_ONCE(rdp->nocb_follower_head);
+
+ /* Having no rcuo kthread but CBs after scheduler starts is bad! */
+ if (!ACCESS_ONCE(rdp->nocb_kthread) && rhp) {
+ /* RCU callback enqueued before CPU first came online??? */
+ pr_err("RCU: Never-onlined no-CBs CPU %d has CB %p\n",
+ cpu, rhp->func);
+ WARN_ON_ONCE(1);
+ }
+
+ return !!rhp;
+}
+
+/*
* Enqueue the specified string of rcu_head structures onto the specified
* CPU's no-CBs lists. The CPU is specified by rdp, the head of the
* string by rhp, and the tail of the string by rhtp. The non-lazy/lazy
@@ -2646,6 +2673,10 @@ static bool init_nocb_callback_list(struct rcu_data *rdp)
#else /* #ifdef CONFIG_RCU_NOCB_CPU */
+static bool rcu_nocb_cpu_needs_barrier(struct rcu_state *rsp, int cpu)
+{
+}
+
static void rcu_nocb_gp_cleanup(struct rcu_state *rsp, struct rcu_node *rnp)
{
}
---
Yanko Kaneti
2014-10-28 08:20:02 UTC
Permalink
Post by Paul E. McKenney
Post by Paul E. McKenney
Looking at the dmesg, the early boot messages seem to be
confused as to how many CPUs there are, e.g.,
[ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
[ 0.000000] Hierarchical RCU implementation.
[ 0.000000] RCU debugfs-based tracing is enabled.
[ 0.000000] RCU dyntick-idle grace-period acceleration is enabled.
[ 0.000000] RCU restricting CPUs from NR_CPUS=256 to nr_cpu_ids=4.
[ 0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=4
[ 0.000000] NR_IRQS:16640 nr_irqs:456 0
[ 0.000000] Offload RCU callbacks from all CPUs
[ 0.000000] Offload RCU callbacks from CPUs: 0-3.
[ 0.236003] .... node #0, CPUs: #1
[ 0.255528] x86: Booted up 1 node, 2 CPUs
In any event, the E8400 is a 2 core CPU with no hyperthreading.
Well, this might explain some of the difficulties. If RCU decides to wait
on CPUs that don't exist, we will of course get a hang. And rcu_barrier()
was definitely expecting four CPUs.
So what happens if you boot with maxcpus=2? (Or build with
CONFIG_NR_CPUS=2.) I suspect that this might avoid the hang. If so,
I might have some ideas for a real fix.
Booting with maxcpus=2 makes no difference (the dmesg output is
the same).
Rebuilding with CONFIG_NR_CPUS=2 makes the problem go away, and
[ 0.000000] smpboot: 4 Processors exceeds NR_CPUS limit of 2
[ 0.000000] smpboot: Allowing 2 CPUs, 0 hotplug CPUs
[...]
[ 0.000000] setup_percpu: NR_CPUS:2 nr_cpumask_bits:2 nr_cpu_ids:2 nr_node_ids:1
[...]
[ 0.000000] Hierarchical RCU implementation.
[ 0.000000] RCU debugfs-based tracing is enabled.
[ 0.000000] RCU dyntick-idle grace-period acceleration is enabled.
[ 0.000000] NR_IRQS:4352 nr_irqs:440 0
[ 0.000000] Offload RCU callbacks from all CPUs
[ 0.000000] Offload RCU callbacks from CPUs: 0-1.
Thank you -- this confirms my suspicions on the fix, though I must admit
to being surprised that maxcpus made no difference.
And here is an alleged fix, lightly tested at this end. Does this patch
help?
Tested this on top of rc2 (as found in Fedora, and failing without the patch)
with all my modprobe scenarios and it seems to have fixed it.

Thanks
-Yanko
Post by Paul E. McKenney
Thanx, Paul
------------------------------------------------------------------------
rcu: Make rcu_barrier() understand about missing rcuo kthreads
Commit 35ce7f29a44a (rcu: Create rcuo kthreads only for onlined CPUs)
avoids creating rcuo kthreads for CPUs that never come online. This
fixes a bug in many instances of firmware: Instead of lying about their
age, these systems instead lie about the number of CPUs that they have.
Before commit 35ce7f29a44a, this could result in huge numbers of useless
rcuo kthreads being created.
It appears that experience indicates that I should have told the
people suffering from this problem to fix their broken firmware, but
I instead produced what turned out to be a partial fix. The missing
piece supplied by this commit makes sure that rcu_barrier() knows not to
post callbacks for no-CBs CPUs that have not yet come online, because
otherwise rcu_barrier() will hang on systems having firmware that lies
about the number of CPUs.
It is tempting to simply have rcu_barrier() refuse to post a callback on
any no-CBs CPU that does not have an rcuo kthread. This unfortunately
does not work because rcu_barrier() is required to wait for all pending
callbacks. It is therefore required to wait even for those callbacks
that cannot possibly be invoked. Even if doing so hangs the system.
Given that posting a callback to a no-CBs CPU that does not yet have an
rcuo kthread can hang rcu_barrier(), It is tempting to report an error
in this case. Unfortunately, this will result in false positives at
boot time, when it is perfectly legal to post callbacks to the boot CPU
before the scheduler has started, in other words, before it is legal
to invoke rcu_barrier().
So this commit instead has rcu_barrier() avoid posting callbacks to
CPUs having neither rcuo kthread nor pending callbacks, and has it
complain bitterly if it finds CPUs having no rcuo kthread but some
pending callbacks. And when rcu_barrier() does find CPUs having no rcuo
kthread but pending callbacks, as noted earlier, it has no choice but
to hang indefinitely.
diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
index aa8e5eea3ab4..c78e88ce5ea3 100644
--- a/include/trace/events/rcu.h
+++ b/include/trace/events/rcu.h
@@ -660,18 +660,18 @@ TRACE_EVENT(rcu_torture_read,
/*
* Tracepoint for _rcu_barrier() execution. The string "s" describes
- * "Begin": rcu_barrier_callback() started.
- * "Check": rcu_barrier_callback() checking for piggybacking.
- * "EarlyExit": rcu_barrier_callback() piggybacked, thus early exit.
- * "Inc1": rcu_barrier_callback() piggyback check counter incremented.
- * "Offline": rcu_barrier_callback() found offline CPU
- * "OnlineNoCB": rcu_barrier_callback() found online no-CBs CPU.
- * "OnlineQ": rcu_barrier_callback() found online CPU with callbacks.
- * "OnlineNQ": rcu_barrier_callback() found online CPU, no callbacks.
+ * "Begin": _rcu_barrier() started.
+ * "Check": _rcu_barrier() checking for piggybacking.
+ * "EarlyExit": _rcu_barrier() piggybacked, thus early exit.
+ * "Inc1": _rcu_barrier() piggyback check counter incremented.
+ * "OfflineNoCB": _rcu_barrier() found callback on never-online CPU
+ * "OnlineNoCB": _rcu_barrier() found online no-CBs CPU.
+ * "OnlineQ": _rcu_barrier() found online CPU with callbacks.
+ * "OnlineNQ": _rcu_barrier() found online CPU, no callbacks.
* "IRQ": An rcu_barrier_callback() callback posted on remote CPU.
* "CB": An rcu_barrier_callback() invoked a callback, not the last.
* "LastCB": An rcu_barrier_callback() invoked the last callback.
- * "Inc2": rcu_barrier_callback() piggyback check counter incremented.
+ * "Inc2": _rcu_barrier() piggyback check counter incremented.
* The "cpu" argument is the CPU or -1 if meaningless, the "cnt" argument
* is the count of remaining callbacks, and "done" is the piggybacking count.
*/
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index f6880052b917..7680fc275036 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3312,11 +3312,16 @@ static void _rcu_barrier(struct rcu_state *rsp)
continue;
rdp = per_cpu_ptr(rsp->rda, cpu);
if (rcu_is_nocb_cpu(cpu)) {
- _rcu_barrier_trace(rsp, "OnlineNoCB", cpu,
- rsp->n_barrier_done);
- atomic_inc(&rsp->barrier_cpu_count);
- __call_rcu(&rdp->barrier_head, rcu_barrier_callback,
- rsp, cpu, 0);
+ if (!rcu_nocb_cpu_needs_barrier(rsp, cpu)) {
+ _rcu_barrier_trace(rsp, "OfflineNoCB", cpu,
+ rsp->n_barrier_done);
+ } else {
+ _rcu_barrier_trace(rsp, "OnlineNoCB", cpu,
+ rsp->n_barrier_done);
+ atomic_inc(&rsp->barrier_cpu_count);
+ __call_rcu(&rdp->barrier_head,
+ rcu_barrier_callback, rsp, cpu, 0);
+ }
} else if (ACCESS_ONCE(rdp->qlen)) {
_rcu_barrier_trace(rsp, "OnlineQ", cpu,
rsp->n_barrier_done);
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 4beab3d2328c..8e7b1843896e 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -587,6 +587,7 @@ static void print_cpu_stall_info(struct rcu_state *rsp, int cpu);
static void print_cpu_stall_info_end(void);
static void zero_cpu_stall_ticks(struct rcu_data *rdp);
static void increment_cpu_stall_ticks(void);
+static bool rcu_nocb_cpu_needs_barrier(struct rcu_state *rsp, int cpu);
static void rcu_nocb_gp_set(struct rcu_node *rnp, int nrq);
static void rcu_nocb_gp_cleanup(struct rcu_state *rsp, struct rcu_node *rnp);
static void rcu_init_one_nocb(struct rcu_node *rnp);
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 927c17b081c7..68c5b23b7173 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -2050,6 +2050,33 @@ static void wake_nocb_leader(struct rcu_data *rdp, bool force)
}
/*
+ * Does the specified CPU need an RCU callback for the specified flavor
+ * of rcu_barrier()?
+ */
+static bool rcu_nocb_cpu_needs_barrier(struct rcu_state *rsp, int cpu)
+{
+ struct rcu_data *rdp = per_cpu_ptr(rsp->rda, cpu);
+ struct rcu_head *rhp;
+
+ /* No-CBs CPUs might have callbacks on any of three lists. */
+ rhp = ACCESS_ONCE(rdp->nocb_head);
+ if (!rhp)
+ rhp = ACCESS_ONCE(rdp->nocb_gp_head);
+ if (!rhp)
+ rhp = ACCESS_ONCE(rdp->nocb_follower_head);
+
+ /* Having no rcuo kthread but CBs after scheduler starts is bad! */
+ if (!ACCESS_ONCE(rdp->nocb_kthread) && rhp) {
+ /* RCU callback enqueued before CPU first came online??? */
+ pr_err("RCU: Never-onlined no-CBs CPU %d has CB %p\n",
+ cpu, rhp->func);
+ WARN_ON_ONCE(1);
+ }
+
+ return !!rhp;
+}
+
+/*
* Enqueue the specified string of rcu_head structures onto the specified
* CPU's no-CBs lists. The CPU is specified by rdp, the head of the
* string by rhp, and the tail of the string by rhtp. The non-lazy/lazy
@@ -2646,6 +2673,10 @@ static bool init_nocb_callback_list(struct rcu_data *rdp)
#else /* #ifdef CONFIG_RCU_NOCB_CPU */
+static bool rcu_nocb_cpu_needs_barrier(struct rcu_state *rsp, int cpu)
+{
+}
+
static void rcu_nocb_gp_cleanup(struct rcu_state *rsp, struct rcu_node *rnp)
{
}
Paul E. McKenney
2014-10-28 13:00:02 UTC
Permalink
Post by Yanko Kaneti
Post by Paul E. McKenney
Post by Paul E. McKenney
Looking at the dmesg, the early boot messages seem to be
confused as to how many CPUs there are, e.g.,
[ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
[ 0.000000] Hierarchical RCU implementation.
[ 0.000000] RCU debugfs-based tracing is enabled.
[ 0.000000] RCU dyntick-idle grace-period acceleration is enabled.
[ 0.000000] RCU restricting CPUs from NR_CPUS=256 to nr_cpu_ids=4.
[ 0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=4
[ 0.000000] NR_IRQS:16640 nr_irqs:456 0
[ 0.000000] Offload RCU callbacks from all CPUs
[ 0.000000] Offload RCU callbacks from CPUs: 0-3.
[ 0.236003] .... node #0, CPUs: #1
[ 0.255528] x86: Booted up 1 node, 2 CPUs
In any event, the E8400 is a 2 core CPU with no hyperthreading.
Well, this might explain some of the difficulties. If RCU decides to wait
on CPUs that don't exist, we will of course get a hang. And rcu_barrier()
was definitely expecting four CPUs.
So what happens if you boot with maxcpus=2? (Or build with
CONFIG_NR_CPUS=2.) I suspect that this might avoid the hang. If so,
I might have some ideas for a real fix.
Booting with maxcpus=2 makes no difference (the dmesg output is
the same).
Rebuilding with CONFIG_NR_CPUS=2 makes the problem go away, and
[ 0.000000] smpboot: 4 Processors exceeds NR_CPUS limit of 2
[ 0.000000] smpboot: Allowing 2 CPUs, 0 hotplug CPUs
[...]
[ 0.000000] setup_percpu: NR_CPUS:2 nr_cpumask_bits:2 nr_cpu_ids:2 nr_node_ids:1
[...]
[ 0.000000] Hierarchical RCU implementation.
[ 0.000000] RCU debugfs-based tracing is enabled.
[ 0.000000] RCU dyntick-idle grace-period acceleration is enabled.
[ 0.000000] NR_IRQS:4352 nr_irqs:440 0
[ 0.000000] Offload RCU callbacks from all CPUs
[ 0.000000] Offload RCU callbacks from CPUs: 0-1.
Thank you -- this confirms my suspicions on the fix, though I must admit
to being surprised that maxcpus made no difference.
And here is an alleged fix, lightly tested at this end. Does this patch
help?
Tested this on top of rc2 (as found in Fedora, and failing without the patch)
with all my modprobe scenarios and it seems to have fixed it.
Very good! May I apply your Tested-by?

Thanx, Paul
Post by Yanko Kaneti
Thanks
-Yanko
Post by Paul E. McKenney
Thanx, Paul
------------------------------------------------------------------------
rcu: Make rcu_barrier() understand about missing rcuo kthreads
Commit 35ce7f29a44a (rcu: Create rcuo kthreads only for onlined CPUs)
avoids creating rcuo kthreads for CPUs that never come online. This
fixes a bug in many instances of firmware: Instead of lying about their
age, these systems instead lie about the number of CPUs that they have.
Before commit 35ce7f29a44a, this could result in huge numbers of useless
rcuo kthreads being created.
It appears that experience indicates that I should have told the
people suffering from this problem to fix their broken firmware, but
I instead produced what turned out to be a partial fix. The missing
piece supplied by this commit makes sure that rcu_barrier() knows not to
post callbacks for no-CBs CPUs that have not yet come online, because
otherwise rcu_barrier() will hang on systems having firmware that lies
about the number of CPUs.
It is tempting to simply have rcu_barrier() refuse to post a callback on
any no-CBs CPU that does not have an rcuo kthread. This unfortunately
does not work because rcu_barrier() is required to wait for all pending
callbacks. It is therefore required to wait even for those callbacks
that cannot possibly be invoked. Even if doing so hangs the system.
Given that posting a callback to a no-CBs CPU that does not yet have an
rcuo kthread can hang rcu_barrier(), It is tempting to report an error
in this case. Unfortunately, this will result in false positives at
boot time, when it is perfectly legal to post callbacks to the boot CPU
before the scheduler has started, in other words, before it is legal
to invoke rcu_barrier().
So this commit instead has rcu_barrier() avoid posting callbacks to
CPUs having neither rcuo kthread nor pending callbacks, and has it
complain bitterly if it finds CPUs having no rcuo kthread but some
pending callbacks. And when rcu_barrier() does find CPUs having no rcuo
kthread but pending callbacks, as noted earlier, it has no choice but
to hang indefinitely.
diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
index aa8e5eea3ab4..c78e88ce5ea3 100644
--- a/include/trace/events/rcu.h
+++ b/include/trace/events/rcu.h
@@ -660,18 +660,18 @@ TRACE_EVENT(rcu_torture_read,
/*
* Tracepoint for _rcu_barrier() execution. The string "s" describes
- * "Begin": rcu_barrier_callback() started.
- * "Check": rcu_barrier_callback() checking for piggybacking.
- * "EarlyExit": rcu_barrier_callback() piggybacked, thus early exit.
- * "Inc1": rcu_barrier_callback() piggyback check counter incremented.
- * "Offline": rcu_barrier_callback() found offline CPU
- * "OnlineNoCB": rcu_barrier_callback() found online no-CBs CPU.
- * "OnlineQ": rcu_barrier_callback() found online CPU with callbacks.
- * "OnlineNQ": rcu_barrier_callback() found online CPU, no callbacks.
+ * "Begin": _rcu_barrier() started.
+ * "Check": _rcu_barrier() checking for piggybacking.
+ * "EarlyExit": _rcu_barrier() piggybacked, thus early exit.
+ * "Inc1": _rcu_barrier() piggyback check counter incremented.
+ * "OfflineNoCB": _rcu_barrier() found callback on never-online CPU
+ * "OnlineNoCB": _rcu_barrier() found online no-CBs CPU.
+ * "OnlineQ": _rcu_barrier() found online CPU with callbacks.
+ * "OnlineNQ": _rcu_barrier() found online CPU, no callbacks.
* "IRQ": An rcu_barrier_callback() callback posted on remote CPU.
* "CB": An rcu_barrier_callback() invoked a callback, not the last.
* "LastCB": An rcu_barrier_callback() invoked the last callback.
- * "Inc2": rcu_barrier_callback() piggyback check counter incremented.
+ * "Inc2": _rcu_barrier() piggyback check counter incremented.
* The "cpu" argument is the CPU or -1 if meaningless, the "cnt" argument
* is the count of remaining callbacks, and "done" is the piggybacking count.
*/
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index f6880052b917..7680fc275036 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3312,11 +3312,16 @@ static void _rcu_barrier(struct rcu_state *rsp)
continue;
rdp = per_cpu_ptr(rsp->rda, cpu);
if (rcu_is_nocb_cpu(cpu)) {
- _rcu_barrier_trace(rsp, "OnlineNoCB", cpu,
- rsp->n_barrier_done);
- atomic_inc(&rsp->barrier_cpu_count);
- __call_rcu(&rdp->barrier_head, rcu_barrier_callback,
- rsp, cpu, 0);
+ if (!rcu_nocb_cpu_needs_barrier(rsp, cpu)) {
+ _rcu_barrier_trace(rsp, "OfflineNoCB", cpu,
+ rsp->n_barrier_done);
+ } else {
+ _rcu_barrier_trace(rsp, "OnlineNoCB", cpu,
+ rsp->n_barrier_done);
+ atomic_inc(&rsp->barrier_cpu_count);
+ __call_rcu(&rdp->barrier_head,
+ rcu_barrier_callback, rsp, cpu, 0);
+ }
} else if (ACCESS_ONCE(rdp->qlen)) {
_rcu_barrier_trace(rsp, "OnlineQ", cpu,
rsp->n_barrier_done);
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 4beab3d2328c..8e7b1843896e 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -587,6 +587,7 @@ static void print_cpu_stall_info(struct rcu_state *rsp, int cpu);
static void print_cpu_stall_info_end(void);
static void zero_cpu_stall_ticks(struct rcu_data *rdp);
static void increment_cpu_stall_ticks(void);
+static bool rcu_nocb_cpu_needs_barrier(struct rcu_state *rsp, int cpu);
static void rcu_nocb_gp_set(struct rcu_node *rnp, int nrq);
static void rcu_nocb_gp_cleanup(struct rcu_state *rsp, struct rcu_node *rnp);
static void rcu_init_one_nocb(struct rcu_node *rnp);
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 927c17b081c7..68c5b23b7173 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -2050,6 +2050,33 @@ static void wake_nocb_leader(struct rcu_data *rdp, bool force)
}
/*
+ * Does the specified CPU need an RCU callback for the specified flavor
+ * of rcu_barrier()?
+ */
+static bool rcu_nocb_cpu_needs_barrier(struct rcu_state *rsp, int cpu)
+{
+ struct rcu_data *rdp = per_cpu_ptr(rsp->rda, cpu);
+ struct rcu_head *rhp;
+
+ /* No-CBs CPUs might have callbacks on any of three lists. */
+ rhp = ACCESS_ONCE(rdp->nocb_head);
+ if (!rhp)
+ rhp = ACCESS_ONCE(rdp->nocb_gp_head);
+ if (!rhp)
+ rhp = ACCESS_ONCE(rdp->nocb_follower_head);
+
+ /* Having no rcuo kthread but CBs after scheduler starts is bad! */
+ if (!ACCESS_ONCE(rdp->nocb_kthread) && rhp) {
+ /* RCU callback enqueued before CPU first came online??? */
+ pr_err("RCU: Never-onlined no-CBs CPU %d has CB %p\n",
+ cpu, rhp->func);
+ WARN_ON_ONCE(1);
+ }
+
+ return !!rhp;
+}
+
+/*
* Enqueue the specified string of rcu_head structures onto the specified
* CPU's no-CBs lists. The CPU is specified by rdp, the head of the
* string by rhp, and the tail of the string by rhtp. The non-lazy/lazy
@@ -2646,6 +2673,10 @@ static bool init_nocb_callback_list(struct rcu_data *rdp)
#else /* #ifdef CONFIG_RCU_NOCB_CPU */
+static bool rcu_nocb_cpu_needs_barrier(struct rcu_state *rsp, int cpu)
+{
+ return false; /* No no-CBs CPUs when CONFIG_RCU_NOCB_CPU is not set. */
+}
+
static void rcu_nocb_gp_cleanup(struct rcu_state *rsp, struct rcu_node *rnp)
{
}
Yanko Kaneti
2014-10-28 13:10:01 UTC
Permalink
Post by Paul E. McKenney
Post by Yanko Kaneti
Post by Paul E. McKenney
Post by Paul E. McKenney
Looking at the dmesg, the early boot messages seem to be
confused as to how many CPUs there are, e.g.,
[ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
[ 0.000000] Hierarchical RCU implementation.
[ 0.000000] RCU debugfs-based tracing is enabled.
[ 0.000000] RCU dyntick-idle grace-period acceleration is enabled.
[ 0.000000] RCU restricting CPUs from NR_CPUS=256 to nr_cpu_ids=4.
[ 0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=4
[ 0.000000] NR_IRQS:16640 nr_irqs:456 0
[ 0.000000] Offload RCU callbacks from all CPUs
[ 0.000000] Offload RCU callbacks from CPUs: 0-3.
[ 0.236003] .... node #0, CPUs: #1
[ 0.255528] x86: Booted up 1 node, 2 CPUs
In any event, the E8400 is a 2 core CPU with no hyperthreading.
Well, this might explain some of the difficulties. If RCU decides to wait
on CPUs that don't exist, we will of course get a hang. And rcu_barrier()
was definitely expecting four CPUs.
So what happens if you boot with maxcpus=2? (Or build with
CONFIG_NR_CPUS=2.) I suspect that this might avoid the hang. If so,
I might have some ideas for a real fix.
Booting with maxcpus=2 makes no difference (the dmesg output is
the same).
Rebuilding with CONFIG_NR_CPUS=2 makes the problem go away, and
[ 0.000000] smpboot: 4 Processors exceeds NR_CPUS limit of 2
[ 0.000000] smpboot: Allowing 2 CPUs, 0 hotplug CPUs
[...]
[ 0.000000] setup_percpu: NR_CPUS:2 nr_cpumask_bits:2 nr_cpu_ids:2 nr_node_ids:1
[...]
[ 0.000000] Hierarchical RCU implementation.
[ 0.000000] RCU debugfs-based tracing is enabled.
[ 0.000000] RCU dyntick-idle grace-period acceleration is enabled.
[ 0.000000] NR_IRQS:4352 nr_irqs:440 0
[ 0.000000] Offload RCU callbacks from all CPUs
[ 0.000000] Offload RCU callbacks from CPUs: 0-1.
Thank you -- this confirms my suspicions on the fix, though I must admit
to being surprised that maxcpus made no difference.
And here is an alleged fix, lightly tested at this end. Does this patch
help?
Tested this on top of rc2 (as found in Fedora, and failing without the patch)
with all my modprobe scenarios and it seems to have fixed it.
Very good! May I apply your Tested-by?
Sure. Sorry didn't include this earlier
Post by Paul E. McKenney
Thanx, Paul
Post by Yanko Kaneti
Thanks
-Yanko
Post by Paul E. McKenney
Thanx, Paul
Kevin Fenzi
2014-10-28 16:00:02 UTC
Permalink
Just FYI, this solves the orig issue for me as well. ;)

Thanks for all the work in tracking it down...

Tested-by: Kevin Fenzi <***@scrye.com>

kevin
Paul E. McKenney
2014-10-28 16:20:02 UTC
Permalink
Post by Kevin Fenzi
Just FYI, this solves the orig issue for me as well. ;)
Thanks for all the work in tracking it down...
And thank you for testing as well!

Thanx, Paul


Jay Vosburgh
2014-10-25 18:50:01 UTC
Permalink
Post by Jay Vosburgh
[...]
Post by Paul E. McKenney
Hmmm... It sure looks like we have some callbacks stuck here. I clearly
need to take a hard look at the sleep/wakeup code.
Thank you for running this!!!
Could you please try the following patch? If no joy, could you please
add rcu:rcu_nocb_wake to the list of ftrace events?
I tried the patch, it did not change the behavior.
I enabled the rcu:rcu_barrier and rcu:rcu_nocb_wake tracepoints
and ran it again (with this patch and the first patch from earlier
today); the trace output is a bit on the large side so I put it and the
dmesg at:
http://people.canonical.com/~jvosburgh/nocb-wake-dmesg.txt
http://people.canonical.com/~jvosburgh/nocb-wake-trace.txt
Thank you again!
ovs-vswitchd-902 [000] .... 109.896840: rcu_barrier: rcu_sched Begin cpu -1 remaining 0 # 0
ovs-vswitchd-902 [000] .... 109.896840: rcu_barrier: rcu_sched Check cpu -1 remaining 0 # 0
ovs-vswitchd-902 [000] .... 109.896841: rcu_barrier: rcu_sched Inc1 cpu -1 remaining 0 # 1
ovs-vswitchd-902 [000] .... 109.896841: rcu_barrier: rcu_sched OnlineNoCB cpu 0 remaining 1 # 1
ovs-vswitchd-902 [000] d... 109.896841: rcu_nocb_wake: rcu_sched 0 WakeNot
ovs-vswitchd-902 [000] .... 109.896841: rcu_barrier: rcu_sched OnlineNoCB cpu 1 remaining 2 # 1
ovs-vswitchd-902 [000] d... 109.896841: rcu_nocb_wake: rcu_sched 1 WakeNot
ovs-vswitchd-902 [000] .... 109.896842: rcu_barrier: rcu_sched OnlineNoCB cpu 2 remaining 3 # 1
ovs-vswitchd-902 [000] d... 109.896842: rcu_nocb_wake: rcu_sched 2 WakeNotPoll
ovs-vswitchd-902 [000] .... 109.896842: rcu_barrier: rcu_sched OnlineNoCB cpu 3 remaining 4 # 1
ovs-vswitchd-902 [000] d... 109.896842: rcu_nocb_wake: rcu_sched 3 WakeNotPoll
ovs-vswitchd-902 [000] .... 109.896843: rcu_barrier: rcu_sched Inc2 cpu -1 remaining 4 # 2
The pair of WakeNotPoll trace entries says that at that point, RCU believed
that CPU 2's and CPU 3's rcuo kthreads did not exist. :-/
On the test system I'm using, CPUs 2 and 3 really do not exist;
it is a 2 CPU system (Intel Core 2 Duo E8400). I mentioned this in an
earlier message, but perhaps you missed it in the flurry.

Looking at the dmesg, the early boot messages seem to be
confused as to how many CPUs there are, e.g.,

[ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
[ 0.000000] Hierarchical RCU implementation.
[ 0.000000] RCU debugfs-based tracing is enabled.
[ 0.000000] RCU dyntick-idle grace-period acceleration is enabled.
[ 0.000000] RCU restricting CPUs from NR_CPUS=256 to nr_cpu_ids=4.
[ 0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=4
[ 0.000000] NR_IRQS:16640 nr_irqs:456 0
[ 0.000000] Offload RCU callbacks from all CPUs
[ 0.000000] Offload RCU callbacks from CPUs: 0-3.

but later shows 2:

[ 0.233703] x86: Booting SMP configuration:
[ 0.236003] .... node #0, CPUs: #1
[ 0.255528] x86: Booted up 1 node, 2 CPUs

In any event, the E8400 is a 2 core CPU with no hyperthreading.
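One quick way to see the firmware-induced mismatch from userspace is to
compare the kernel's possible and online CPU masks. The sysfs files below
are standard; the rest is just an illustrative sketch, and the expected
output is an assumption based on the dmesg above.

#include <stdio.h>

static void show(const char *path)
{
	char buf[64];
	FILE *f = fopen(path, "r");

	if (f && fgets(buf, sizeof(buf), f))
		printf("%-40s %s", path, buf);
	if (f)
		fclose(f);
}

int main(void)
{
	/* On a box like the one above, possible should read 0-3, online 0-1. */
	show("/sys/devices/system/cpu/possible");
	show("/sys/devices/system/cpu/online");
	return 0;
}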

-J

---
-Jay Vosburgh, ***@canonical.com
Jay Vosburgh
2014-10-25 18:50:02 UTC
Permalink
Looking at the dmesg, the early boot messages seem to be
confused as to how many CPUs there are, e.g.,
[ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
[ 0.000000] Hierarchical RCU implementation.
[ 0.000000] RCU debugfs-based tracing is enabled.
[ 0.000000] RCU dyntick-idle grace-period acceleration is enabled.
[ 0.000000] RCU restricting CPUs from NR_CPUS=256 to nr_cpu_ids=4.
[ 0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=4
[ 0.000000] NR_IRQS:16640 nr_irqs:456 0
[ 0.000000] Offload RCU callbacks from all CPUs
[ 0.000000] Offload RCU callbacks from CPUs: 0-3.
[ 0.236003] .... node #0, CPUs: #1
[ 0.255528] x86: Booted up 1 node, 2 CPUs
In any event, the E8400 is a 2 core CPU with no hyperthreading.
Well, this might explain some of the difficulties. If RCU decides to wait
on CPUs that don't exist, we will of course get a hang. And rcu_barrier()
was definitely expecting four CPUs.
So what happens if you boot with maxcpus=2? (Or build with
CONFIG_NR_CPUS=2.) I suspect that this might avoid the hang. If so,
I might have some ideas for a real fix.
Booting with maxcpus=2 makes no difference (the dmesg output is
the same).

Rebuilding with CONFIG_NR_CPUS=2 makes the problem go away, and
dmesg has different CPU information at boot:

[ 0.000000] smpboot: 4 Processors exceeds NR_CPUS limit of 2
[ 0.000000] smpboot: Allowing 2 CPUs, 0 hotplug CPUs
[...]
[ 0.000000] setup_percpu: NR_CPUS:2 nr_cpumask_bits:2 nr_cpu_ids:2 nr_node_ids:1
[...]
[ 0.000000] Hierarchical RCU implementation.
[ 0.000000] RCU debugfs-based tracing is enabled.
[ 0.000000] RCU dyntick-idle grace-period acceleration is enabled.
[ 0.000000] NR_IRQS:4352 nr_irqs:440 0
[ 0.000000] Offload RCU callbacks from all CPUs
[ 0.000000] Offload RCU callbacks from CPUs: 0-1.

-J

---
-Jay Vosburgh, ***@canonical.com
Paul E. McKenney
2014-10-25 21:20:03 UTC
Permalink
Post by Jay Vosburgh
Post by Jay Vosburgh
[...]
Post by Paul E. McKenney
Hmmm... It sure looks like we have some callbacks stuck here. I clearly
need to take a hard look at the sleep/wakeup code.
Thank you for running this!!!
Could you please try the following patch? If no joy, could you please
add rcu:rcu_nocb_wake to the list of ftrace events?
I tried the patch, it did not change the behavior.
I enabled the rcu:rcu_barrier and rcu:rcu_nocb_wake tracepoints
and ran it again (with this patch and the first patch from earlier
today); the trace output is a bit on the large side so I put it and the
dmesg at:
http://people.canonical.com/~jvosburgh/nocb-wake-dmesg.txt
http://people.canonical.com/~jvosburgh/nocb-wake-trace.txt
Thank you again!
ovs-vswitchd-902 [000] .... 109.896840: rcu_barrier: rcu_sched Begin cpu -1 remaining 0 # 0
ovs-vswitchd-902 [000] .... 109.896840: rcu_barrier: rcu_sched Check cpu -1 remaining 0 # 0
ovs-vswitchd-902 [000] .... 109.896841: rcu_barrier: rcu_sched Inc1 cpu -1 remaining 0 # 1
ovs-vswitchd-902 [000] .... 109.896841: rcu_barrier: rcu_sched OnlineNoCB cpu 0 remaining 1 # 1
ovs-vswitchd-902 [000] d... 109.896841: rcu_nocb_wake: rcu_sched 0 WakeNot
ovs-vswitchd-902 [000] .... 109.896841: rcu_barrier: rcu_sched OnlineNoCB cpu 1 remaining 2 # 1
ovs-vswitchd-902 [000] d... 109.896841: rcu_nocb_wake: rcu_sched 1 WakeNot
ovs-vswitchd-902 [000] .... 109.896842: rcu_barrier: rcu_sched OnlineNoCB cpu 2 remaining 3 # 1
ovs-vswitchd-902 [000] d... 109.896842: rcu_nocb_wake: rcu_sched 2 WakeNotPoll
ovs-vswitchd-902 [000] .... 109.896842: rcu_barrier: rcu_sched OnlineNoCB cpu 3 remaining 4 # 1
ovs-vswitchd-902 [000] d... 109.896842: rcu_nocb_wake: rcu_sched 3 WakeNotPoll
ovs-vswitchd-902 [000] .... 109.896843: rcu_barrier: rcu_sched Inc2 cpu -1 remaining 4 # 2
The pair of WakeNotPoll trace entries says that at that point, RCU believed
that CPU 2's and CPU 3's rcuo kthreads did not exist. :-/
On the test system I'm using, CPUs 2 and 3 really do not exist;
it is a 2 CPU system (Intel Core 2 Duo E8400). I mentioned this in an
earlier message, but perhaps you missed it in the flurry.
Or forgot it. Either way, thank you for reminding me.
Post by Jay Vosburgh
Looking at the dmesg, the early boot messages seem to be
confused as to how many CPUs there are, e.g.,
[ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
[ 0.000000] Hierarchical RCU implementation.
[ 0.000000] RCU debugfs-based tracing is enabled.
[ 0.000000] RCU dyntick-idle grace-period acceleration is enabled.
[ 0.000000] RCU restricting CPUs from NR_CPUS=256 to nr_cpu_ids=4.
[ 0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=4
[ 0.000000] NR_IRQS:16640 nr_irqs:456 0
[ 0.000000] Offload RCU callbacks from all CPUs
[ 0.000000] Offload RCU callbacks from CPUs: 0-3.
[ 0.236003] .... node #0, CPUs: #1
[ 0.255528] x86: Booted up 1 node, 2 CPUs
In any event, the E8400 is a 2 core CPU with no hyperthreading.
Well, this might explain some of the difficulties. If RCU decides to wait
on CPUs that don't exist, we will of course get a hang. And rcu_barrier()
was definitely expecting four CPUs.

So what happens if you boot with maxcpus=2? (Or build with
CONFIG_NR_CPUS=2.) I suspect that this might avoid the hang. If so,
I might have some ideas for a real fix.

Thanx, Paul

Paul E. McKenney
2014-10-24 23:10:02 UTC
Permalink
Post by Jay Vosburgh
Post by Paul E. McKenney
[ . . . ]
Post by Yanko Kaneti
Post by Paul E. McKenney
Post by Yanko Kaneti
Post by Paul E. McKenney
Well, if you are feeling aggressive, give the following patch a spin.
I am doing sanity tests on it in the meantime.
Doesn't seem to make a difference here
OK, inspection isn't cutting it, so time for tracing. Does the system
respond to user input? If so, please enable rcu:rcu_barrier ftrace before
the problem occurs, then dump the trace buffer after the problem occurs.
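For reference, enabling the rcu:rcu_barrier event (and rcu:rcu_nocb_wake,
which also comes up in this thread) and dumping the buffer only involves a
few tracefs files; a minimal C sketch follows, assuming tracefs is mounted
at the usual /sys/kernel/debug/tracing location (the equivalent echo
commands work just as well).

#include <stdio.h>

#define TRACEFS "/sys/kernel/debug/tracing"

static void write_str(const char *path, const char *val)
{
	FILE *f = fopen(path, "w");

	if (f) {
		fputs(val, f);
		fclose(f);
	}
}

int main(void)
{
	/* Enable the RCU events of interest (run as root, before the hang). */
	write_str(TRACEFS "/events/rcu/rcu_barrier/enable", "1");
	write_str(TRACEFS "/events/rcu/rcu_nocb_wake/enable", "1");

	/* ...reproduce the problem, then dump the trace buffer afterwards. */
	char line[512];
	FILE *t = fopen(TRACEFS "/trace", "r");

	if (t) {
		while (fgets(line, sizeof(line), t))
			fputs(line, stdout);
		fclose(t);
	}
	return 0;
}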
Sorry for being unresponsive here, but I know next to nothing about tracing
or most things about the kernel, so I have some catching up to do.
In the meantime some layman observations while I tried to find what exactly
triggers the problem.
- Even in runlevel 1 I can reliably trigger the problem by starting libvirtd
- libvirtd seems to be very active in using all sorts of kernel facilities
that are modules on fedora so it seems to cause many simultaneous kworker
calls to modprobe
- there are 8 kworker/u16 from 0 to 7
- one of these kworkers always deadlocks, while there appear to be two
kworker/u16:6 - the seventh
Adding Tejun on CC in case this duplication of kworker/u16:6 is important.
Post by Yanko Kaneti
6 vs 8 as in 6 rcuos where before they were always 8
Just observations from someone who still doesn't know what the u16
kworkers are..
Could you please run the following diagnostic patch? This will help
me see if I have managed to miswire the rcuo kthreads. It should
print some information at task-hang time.
Here's the output of the patch; I let it sit through two hang
cycles.
-J
[ 240.348020] INFO: task ovs-vswitchd:902 blocked for more than 120 seconds.
[ 240.354878] Not tainted 3.17.0-testola+ #4
[ 240.359481] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 240.367285] ovs-vswitchd D ffff88013fc94600 0 902 901 0x00000004
[ 240.367290] ffff8800ab20f7b8 0000000000000002 ffff8800b3304b00 ffff8800ab20ffd8
[ 240.367293] 0000000000014600 0000000000014600 ffff8800b0810000 ffff8800b3304b00
[ 240.367296] ffff8800b3304b00 ffffffff81c59850 ffffffff81c59858 7fffffffffffffff
[ 240.367307] [<ffffffff81722b99>] schedule+0x29/0x70
[ 240.367310] [<ffffffff81725b6c>] schedule_timeout+0x1dc/0x260
[ 240.367313] [<ffffffff81722f69>] ? _cond_resched+0x29/0x40
[ 240.367316] [<ffffffff81723818>] ? wait_for_completion+0x28/0x160
[ 240.367321] [<ffffffff811081a7>] ? queue_stop_cpus_work+0xc7/0xe0
[ 240.367324] [<ffffffff81723896>] wait_for_completion+0xa6/0x160
[ 240.367328] [<ffffffff81099980>] ? wake_up_state+0x20/0x20
[ 240.367331] [<ffffffff810d0ecc>] _rcu_barrier+0x20c/0x480
[ 240.367334] [<ffffffff810d1195>] rcu_barrier+0x15/0x20
[ 240.367338] [<ffffffff81625010>] netdev_run_todo+0x60/0x300
[ 240.367341] [<ffffffff8162f9ee>] rtnl_unlock+0xe/0x10
[ 240.367349] [<ffffffffa01ffcc5>] internal_dev_destroy+0x55/0x80 [openvswitch]
[ 240.367354] [<ffffffffa01ff622>] ovs_vport_del+0x32/0x40 [openvswitch]
[ 240.367358] [<ffffffffa01f8dd0>] ovs_dp_detach_port+0x30/0x40 [openvswitch]
[ 240.367363] [<ffffffffa01f8ea5>] ovs_vport_cmd_del+0xc5/0x110 [openvswitch]
[ 240.367367] [<ffffffff81651d75>] genl_family_rcv_msg+0x1a5/0x3c0
[ 240.367370] [<ffffffff81651f90>] ? genl_family_rcv_msg+0x3c0/0x3c0
[ 240.367372] [<ffffffff81652021>] genl_rcv_msg+0x91/0xd0
[ 240.367376] [<ffffffff81650091>] netlink_rcv_skb+0xc1/0xe0
[ 240.367378] [<ffffffff816505bc>] genl_rcv+0x2c/0x40
[ 240.367381] [<ffffffff8164f626>] netlink_unicast+0xf6/0x200
[ 240.367383] [<ffffffff8164fa4d>] netlink_sendmsg+0x31d/0x780
[ 240.367387] [<ffffffff8164ca74>] ? netlink_rcv_wake+0x44/0x60
[ 240.367391] [<ffffffff81606a53>] sock_sendmsg+0x93/0xd0
[ 240.367395] [<ffffffff81337700>] ? apparmor_capable+0x60/0x60
[ 240.367399] [<ffffffff81614f27>] ? verify_iovec+0x47/0xd0
[ 240.367402] [<ffffffff81606e79>] ___sys_sendmsg+0x399/0x3b0
[ 240.367406] [<ffffffff812598a2>] ? kernfs_seq_stop_active+0x32/0x40
[ 240.367410] [<ffffffff8101c385>] ? native_sched_clock+0x35/0x90
[ 240.367413] [<ffffffff8101c385>] ? native_sched_clock+0x35/0x90
[ 240.367416] [<ffffffff8101c3e9>] ? sched_clock+0x9/0x10
[ 240.367420] [<ffffffff811277fc>] ? acct_account_cputime+0x1c/0x20
[ 240.367424] [<ffffffff8109ce6b>] ? account_user_time+0x8b/0xa0
[ 240.367428] [<ffffffff81200bd5>] ? __fget_light+0x25/0x70
[ 240.367431] [<ffffffff81607c02>] __sys_sendmsg+0x42/0x80
[ 240.367433] [<ffffffff81607c52>] SyS_sendmsg+0x12/0x20
[ 240.367436] [<ffffffff81727464>] tracesys_phase2+0xd8/0xdd
[ 240.372734] 0: ffff88013fc0e600 l:ffff88013fc0e600 n:ffff88013fc8e600 .G.
[ 240.379673] 1: ffff88013fc8e600 l:ffff88013fc0e600 n: (null) .G.
[ 240.386611] 2: ffff88013fd0e600 l:ffff88013fd0e600 n:ffff88013fd8e600 N..
[ 240.393550] 3: ffff88013fd8e600 l:ffff88013fd0e600 n: (null) N..
[ 240.405525] 0: ffff88013fc0e3c0 l:ffff88013fc0e3c0 n:ffff88013fc8e3c0 ...
[ 240.412463] 1: ffff88013fc8e3c0 l:ffff88013fc0e3c0 n: (null) ...
[ 240.419401] 2: ffff88013fd0e3c0 l:ffff88013fd0e3c0 n:ffff88013fd8e3c0 ...
[ 240.426339] 3: ffff88013fd8e3c0 l:ffff88013fd0e3c0 n: (null) ...
[ 360.432020] INFO: task ovs-vswitchd:902 blocked for more than 120 seconds.
[ 360.438881] Not tainted 3.17.0-testola+ #4
[ 360.443484] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 360.451289] ovs-vswitchd D ffff88013fc94600 0 902 901 0x00000004
[ 360.451293] ffff8800ab20f7b8 0000000000000002 ffff8800b3304b00 ffff8800ab20ffd8
[ 360.451297] 0000000000014600 0000000000014600 ffff8800b0810000 ffff8800b3304b00
[ 360.451300] ffff8800b3304b00 ffffffff81c59850 ffffffff81c59858 7fffffffffffffff
[ 360.451311] [<ffffffff81722b99>] schedule+0x29/0x70
[ 360.451314] [<ffffffff81725b6c>] schedule_timeout+0x1dc/0x260
[ 360.451317] [<ffffffff81722f69>] ? _cond_resched+0x29/0x40
[ 360.451320] [<ffffffff81723818>] ? wait_for_completion+0x28/0x160
[ 360.451325] [<ffffffff811081a7>] ? queue_stop_cpus_work+0xc7/0xe0
[ 360.451327] [<ffffffff81723896>] wait_for_completion+0xa6/0x160
[ 360.451331] [<ffffffff81099980>] ? wake_up_state+0x20/0x20
[ 360.451335] [<ffffffff810d0ecc>] _rcu_barrier+0x20c/0x480
[ 360.451338] [<ffffffff810d1195>] rcu_barrier+0x15/0x20
[ 360.451342] [<ffffffff81625010>] netdev_run_todo+0x60/0x300
[ 360.451345] [<ffffffff8162f9ee>] rtnl_unlock+0xe/0x10
[ 360.451353] [<ffffffffa01ffcc5>] internal_dev_destroy+0x55/0x80 [openvswitch]
[ 360.451358] [<ffffffffa01ff622>] ovs_vport_del+0x32/0x40 [openvswitch]
[ 360.451362] [<ffffffffa01f8dd0>] ovs_dp_detach_port+0x30/0x40 [openvswitch]
[ 360.451366] [<ffffffffa01f8ea5>] ovs_vport_cmd_del+0xc5/0x110 [openvswitch]
[ 360.451370] [<ffffffff81651d75>] genl_family_rcv_msg+0x1a5/0x3c0
[ 360.451373] [<ffffffff81651f90>] ? genl_family_rcv_msg+0x3c0/0x3c0
[ 360.451376] [<ffffffff81652021>] genl_rcv_msg+0x91/0xd0
[ 360.451379] [<ffffffff81650091>] netlink_rcv_skb+0xc1/0xe0
[ 360.451381] [<ffffffff816505bc>] genl_rcv+0x2c/0x40
[ 360.451384] [<ffffffff8164f626>] netlink_unicast+0xf6/0x200
[ 360.451387] [<ffffffff8164fa4d>] netlink_sendmsg+0x31d/0x780
[ 360.451390] [<ffffffff8164ca74>] ? netlink_rcv_wake+0x44/0x60
[ 360.451394] [<ffffffff81606a53>] sock_sendmsg+0x93/0xd0
[ 360.451399] [<ffffffff81337700>] ? apparmor_capable+0x60/0x60
[ 360.451402] [<ffffffff81614f27>] ? verify_iovec+0x47/0xd0
[ 360.451406] [<ffffffff81606e79>] ___sys_sendmsg+0x399/0x3b0
[ 360.451410] [<ffffffff812598a2>] ? kernfs_seq_stop_active+0x32/0x40
[ 360.451414] [<ffffffff8101c385>] ? native_sched_clock+0x35/0x90
[ 360.451417] [<ffffffff8101c385>] ? native_sched_clock+0x35/0x90
[ 360.451419] [<ffffffff8101c3e9>] ? sched_clock+0x9/0x10
[ 360.451424] [<ffffffff811277fc>] ? acct_account_cputime+0x1c/0x20
[ 360.451427] [<ffffffff8109ce6b>] ? account_user_time+0x8b/0xa0
[ 360.451431] [<ffffffff81200bd5>] ? __fget_light+0x25/0x70
[ 360.451434] [<ffffffff81607c02>] __sys_sendmsg+0x42/0x80
[ 360.451437] [<ffffffff81607c52>] SyS_sendmsg+0x12/0x20
[ 360.451440] [<ffffffff81727464>] tracesys_phase2+0xd8/0xdd
[ 360.456737] 0: ffff88013fc0e600 l:ffff88013fc0e600 n:ffff88013fc8e600 ...
[ 360.463676] 1: ffff88013fc8e600 l:ffff88013fc0e600 n: (null) ...
[ 360.470614] 2: ffff88013fd0e600 l:ffff88013fd0e600 n:ffff88013fd8e600 N..
[ 360.477554] 3: ffff88013fd8e600 l:ffff88013fd0e600 n: (null) N..
Hmmm... It sure looks like we have some callbacks stuck here. I clearly
need to take a hard look at the sleep/wakeup code.

Thank you for running this!!!

Thanx, Paul
Post by Jay Vosburgh
[ 360.489529] 0: ffff88013fc0e3c0 l:ffff88013fc0e3c0 n:ffff88013fc8e3c0 ...
[ 360.496469] 1: ffff88013fc8e3c0 l:ffff88013fc0e3c0 n: (null) .G.
[ 360.503407] 2: ffff88013fd0e3c0 l:ffff88013fd0e3c0 n:ffff88013fd8e3c0 ...
[ 360.510346] 3: ffff88013fd8e3c0 l:ffff88013fd0e3c0 n: (null) ...
---
Yanko Kaneti
2014-10-25 19:20:03 UTC
Permalink
Post by Paul E. McKenney
[ . . . ]
Post by Yanko Kaneti
Post by Paul E. McKenney
Post by Yanko Kaneti
Post by Paul E. McKenney
Well, if you are feeling aggressive, give the following patch a spin.
I am doing sanity tests on it in the meantime.
Doesn't seem to make a difference here
OK, inspection isn't cutting it, so time for tracing. Does the system
respond to user input? If so, please enable rcu:rcu_barrier ftrace before
the problem occurs, then dump the trace buffer after the problem occurs.
Sorry for being unresponsive here, but I know next to nothing about tracing
or most things about the kernel, so I have some catching up to do.
In the meantime some layman observations while I tried to find what exactly
triggers the problem.
- Even in runlevel 1 I can reliably trigger the problem by starting libvirtd
- libvirtd seems to be very active in using all sorts of kernel facilities
that are modules on fedora so it seems to cause many simultaneous kworker
calls to modprobe
- there are 8 kworker/u16 from 0 to 7
- one of these kworkers always deadlocks, while there appear to be two
kworker/u16:6 - the seventh
Adding Tejun on CC in case this duplication of kworker/u16:6 is important.
Post by Yanko Kaneti
6 vs 8 as in 6 rcuos where before they were always 8
Just observations from someone who still doesn't know what the u16
kworkers are..
Could you please run the following diagnostic patch? This will help
me see if I have managed to miswire the rcuo kthreads. It should
print some information at task-hang time.
So here is the output with today's Linux tip and the diagnostic patch.
This is the case with just starting libvirtd in runlevel 1.
Also a snapshot of the kworker/u16 threads:

6 ? S 0:00 \_ [kworker/u16:0]
553 ? S 0:00 | \_ [kworker/u16:0]
554 ? D 0:00 | \_ /sbin/modprobe -q -- bridge
78 ? S 0:00 \_ [kworker/u16:1]
92 ? S 0:00 \_ [kworker/u16:2]
93 ? S 0:00 \_ [kworker/u16:3]
94 ? S 0:00 \_ [kworker/u16:4]
95 ? S 0:00 \_ [kworker/u16:5]
96 ? D 0:00 \_ [kworker/u16:6]
105 ? S 0:00 \_ [kworker/u16:7]
108 ? S 0:00 \_ [kworker/u16:8]


INFO: task kworker/u16:6:96 blocked for more than 120 seconds.
Not tainted 3.18.0-rc1+ #16
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/u16:6 D ffff8800ca9ecec0 11552 96 2 0x00000000
Workqueue: netns cleanup_net
ffff880221fff9c8 0000000000000096 ffff8800ca9ecec0 00000000001d5f00
ffff880221ffffd8 00000000001d5f00 ffff880223260000 ffff8800ca9ecec0
ffffffff82c44010 7fffffffffffffff ffffffff81ee3798 ffffffff81ee3790
Call Trace:
[<ffffffff81866219>] schedule+0x29/0x70
[<ffffffff8186b43c>] schedule_timeout+0x26c/0x410
[<ffffffff81028bea>] ? native_sched_clock+0x2a/0xa0
[<ffffffff8110748c>] ? mark_held_locks+0x7c/0xb0
[<ffffffff8186c4c0>] ? _raw_spin_unlock_irq+0x30/0x50
[<ffffffff8110761d>] ? trace_hardirqs_on_caller+0x15d/0x200
[<ffffffff81867c4c>] wait_for_completion+0x10c/0x150
[<ffffffff810e4dc0>] ? wake_up_state+0x20/0x20
[<ffffffff81133627>] _rcu_barrier+0x677/0xcd0
[<ffffffff81133cd5>] rcu_barrier+0x15/0x20
[<ffffffff81720edf>] netdev_run_todo+0x6f/0x310
[<ffffffff81715aa5>] ? rollback_registered_many+0x265/0x2e0
[<ffffffff8172df4e>] rtnl_unlock+0xe/0x10
[<ffffffff81717906>] default_device_exit_batch+0x156/0x180
[<ffffffff810fd280>] ? abort_exclusive_wait+0xb0/0xb0
[<ffffffff8170f9b3>] ops_exit_list.isra.1+0x53/0x60
[<ffffffff81710560>] cleanup_net+0x100/0x1f0
[<ffffffff810cc988>] process_one_work+0x218/0x850
[<ffffffff810cc8ef>] ? process_one_work+0x17f/0x850
[<ffffffff810cd0a7>] ? worker_thread+0xe7/0x4a0
[<ffffffff810cd02b>] worker_thread+0x6b/0x4a0
[<ffffffff810ccfc0>] ? process_one_work+0x850/0x850
[<ffffffff810d337b>] kthread+0x10b/0x130
[<ffffffff81028c69>] ? sched_clock+0x9/0x10
[<ffffffff810d3270>] ? kthread_create_on_node+0x250/0x250
[<ffffffff8186d1fc>] ret_from_fork+0x7c/0xb0
[<ffffffff810d3270>] ? kthread_create_on_node+0x250/0x250
4 locks held by kworker/u16:6/96:
#0: ("%s""netns"){.+.+.+}, at: [<ffffffff810cc8ef>]
#process_one_work+0x17f/0x850
#1: (net_cleanup_work){+.+.+.}, at: [<ffffffff810cc8ef>]
#process_one_work+0x17f/0x850
#2: (net_mutex){+.+.+.}, at: [<ffffffff817104ec>] cleanup_net+0x8c/0x1f0
#3: (rcu_sched_state.barrier_mutex){+.+...}, at: [<ffffffff81133025>]
#_rcu_barrier+0x75/0xcd0
rcu_show_nocb_setup(): rcu_sched nocb state:
0: ffff8802267ced40 l:ffff8802267ced40 n:ffff8802269ced40 .G.
1: ffff8802269ced40 l:ffff8802267ced40 n: (null) ...
2: ffff880226bced40 l:ffff880226bced40 n:ffff880226dced40 .G.
3: ffff880226dced40 l:ffff880226bced40 n: (null) N..
4: ffff880226fced40 l:ffff880226fced40 n:ffff8802271ced40 .G.
5: ffff8802271ced40 l:ffff880226fced40 n: (null) ...
6: ffff8802273ced40 l:ffff8802273ced40 n:ffff8802275ced40 N..
7: ffff8802275ced40 l:ffff8802273ced40 n: (null) N..
rcu_show_nocb_setup(): rcu_bh nocb state:
0: ffff8802267ceac0 l:ffff8802267ceac0 n:ffff8802269ceac0 ...
1: ffff8802269ceac0 l:ffff8802267ceac0 n: (null) ...
2: ffff880226bceac0 l:ffff880226bceac0 n:ffff880226dceac0 ...
3: ffff880226dceac0 l:ffff880226bceac0 n: (null) ...
4: ffff880226fceac0 l:ffff880226fceac0 n:ffff8802271ceac0 ...
5: ffff8802271ceac0 l:ffff880226fceac0 n: (null) ...
6: ffff8802273ceac0 l:ffff8802273ceac0 n:ffff8802275ceac0 ...
7: ffff8802275ceac0 l:ffff8802273ceac0 n: (null) ...
INFO: task modprobe:554 blocked for more than 120 seconds.
Not tainted 3.18.0-rc1+ #16
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
modprobe D ffff8800c85dcec0 12456 554 553 0x00000000
ffff8802178afbf8 0000000000000096 ffff8800c85dcec0 00000000001d5f00
ffff8802178affd8 00000000001d5f00 ffffffff81e1b580 ffff8800c85dcec0
ffff8800c85dcec0 ffffffff81f90c08 0000000000000246 ffff8800c85dcec0
Call Trace:
[<ffffffff818667c1>] schedule_preempt_disabled+0x31/0x80
[<ffffffff81868013>] mutex_lock_nested+0x183/0x440
[<ffffffff8171037f>] ? register_pernet_subsys+0x1f/0x50
[<ffffffff8171037f>] ? register_pernet_subsys+0x1f/0x50
[<ffffffffa0619000>] ? 0xffffffffa0619000
[<ffffffff8171037f>] register_pernet_subsys+0x1f/0x50
[<ffffffffa0619048>] br_init+0x48/0xd3 [bridge]
[<ffffffff81002148>] do_one_initcall+0xd8/0x210
[<ffffffff8115bc22>] load_module+0x20c2/0x2870
[<ffffffff81156c00>] ? store_uevent+0x70/0x70
[<ffffffff81281327>] ? kernel_read+0x57/0x90
[<ffffffff8115c5b6>] SyS_finit_module+0xa6/0xe0
[<ffffffff8186d2d5>] ? sysret_check+0x22/0x5d
[<ffffffff8186d2a9>] system_call_fastpath+0x12/0x17
1 lock held by modprobe/554:
#0: (net_mutex){+.+.+.}, at: [<ffffffff8171037f>]
#register_pernet_subsys+0x1f/0x50
rcu_show_nocb_setup(): rcu_sched nocb state:
0: ffff8802267ced40 l:ffff8802267ced40 n:ffff8802269ced40 .G.
1: ffff8802269ced40 l:ffff8802267ced40 n: (null) ...
2: ffff880226bced40 l:ffff880226bced40 n:ffff880226dced40 .G.
3: ffff880226dced40 l:ffff880226bced40 n: (null) N..
4: ffff880226fced40 l:ffff880226fced40 n:ffff8802271ced40 .G.
5: ffff8802271ced40 l:ffff880226fced40 n: (null) ...
6: ffff8802273ced40 l:ffff8802273ced40 n:ffff8802275ced40 N..
7: ffff8802275ced40 l:ffff8802273ced40 n: (null) N..
rcu_show_nocb_setup(): rcu_bh nocb state:
0: ffff8802267ceac0 l:ffff8802267ceac0 n:ffff8802269ceac0 ...
1: ffff8802269ceac0 l:ffff8802267ceac0 n: (null) ...
2: ffff880226bceac0 l:ffff880226bceac0 n:ffff880226dceac0 ...
3: ffff880226dceac0 l:ffff880226bceac0 n: (null) ...
4: ffff880226fceac0 l:ffff880226fceac0 n:ffff8802271ceac0 ...
5: ffff8802271ceac0 l:ffff880226fceac0 n: (null) ...
6: ffff8802273ceac0 l:ffff8802273ceac0 n:ffff8802275ceac0 ...
7: ffff8802275ceac0 l:ffff8802273ceac0 n: (null) ...
Post by Paul E. McKenney
Thanx, Paul
------------------------------------------------------------------------
rcu: Dump no-CBs CPU state at task-hung time
Strictly diagnostic commit for rcu_barrier() hang. Not for inclusion.
diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h
index 0e5366200154..34048140577b 100644
--- a/include/linux/rcutiny.h
+++ b/include/linux/rcutiny.h
@@ -157,4 +157,8 @@ static inline bool rcu_is_watching(void)
#endif /* #else defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_RCU_TRACE) */
+static inline void rcu_show_nocb_setup(void)
+{
+}
+
#endif /* __LINUX_RCUTINY_H */
diff --git a/include/linux/rcutree.h b/include/linux/rcutree.h
index 52953790dcca..0b813bdb971b 100644
--- a/include/linux/rcutree.h
+++ b/include/linux/rcutree.h
@@ -97,4 +97,6 @@ extern int rcu_scheduler_active __read_mostly;
bool rcu_is_watching(void);
+void rcu_show_nocb_setup(void);
+
#endif /* __LINUX_RCUTREE_H */
diff --git a/kernel/hung_task.c b/kernel/hung_task.c
index 06db12434d72..e6e4d0f6b063 100644
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -118,6 +118,7 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout)
" disables this message.\n");
sched_show_task(t);
debug_show_held_locks(t);
+ rcu_show_nocb_setup();
touch_nmi_watchdog();
diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
index 240fa9094f83..6b373e79ce0e 100644
--- a/kernel/rcu/rcutorture.c
+++ b/kernel/rcu/rcutorture.c
@@ -1513,6 +1513,7 @@ rcu_torture_cleanup(void)
{
int i;
+ rcu_show_nocb_setup();
rcutorture_record_test_transition();
if (torture_cleanup_begin()) {
if (cur_ops->cb_barrier != NULL)
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 927c17b081c7..285b3f6fb229 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -2699,6 +2699,31 @@ static bool init_nocb_callback_list(struct rcu_data *rdp)
#endif /* #else #ifdef CONFIG_RCU_NOCB_CPU */
+void rcu_show_nocb_setup(void)
+{
+#ifdef CONFIG_RCU_NOCB_CPU
+ int cpu;
+ struct rcu_data *rdp;
+ struct rcu_state *rsp;
+
+ for_each_rcu_flavor(rsp) {
+ pr_alert("rcu_show_nocb_setup(): %s nocb state:\n", rsp->name);
+ for_each_possible_cpu(cpu) {
+ if (!rcu_is_nocb_cpu(cpu))
+ continue;
+ rdp = per_cpu_ptr(rsp->rda, cpu);
+ pr_alert("%3d: %p l:%p n:%p %c%c%c\n",
+ cpu,
+ rdp, rdp->nocb_leader, rdp->nocb_next_follower,
+ ".N"[!!rdp->nocb_head],
+ ".G"[!!rdp->nocb_gp_head],
+ ".F"[!!rdp->nocb_follower_head]);
+ }
+ }
+#endif /* #ifdef CONFIG_RCU_NOCB_CPU */
+}
+EXPORT_SYMBOL_GPL(rcu_show_nocb_setup);
+
/*
* An adaptive-ticks CPU can potentially execute in kernel mode for an
* arbitrarily long period of time with the scheduling-clock tick turned
Josh Boyer
2014-10-22 18:10:02 UTC
Permalink
On Wed, Oct 22, 2014 at 1:59 PM, Paul E. McKenney
Post by Cong Wang
(Adding Paul and Eric in Cc)
I am not aware of any change in net/core/dev.c related here,
so I guess it's a bug in rcu_barrier().
Thanks.
Does commit 789cbbeca4e (workqueue: Add quiescent state between work items)
and 3e28e3772 (workqueue: Use cond_resched_rcu_qs macro) help this?
I don't believe so. The output below is from a post 3.18-rc1 kernel
(Linux v3.18-rc1-221-gc3351dfabf5c to be exact), and both of those
commits are included in that if I'm reading the git output correctly.

josh
Post by Cong Wang
Post by Josh Boyer
[ 240.599195] INFO: task kworker/u16:5:100 blocked for more than 120 seconds.
[ 240.599338] Not tainted 3.18.0-0.rc1.git2.1.fc22.x86_64 #1
[ 240.599446] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 240.599583] kworker/u16:5 D ffff8802202db480 12400 100 2 0x00000000
[ 240.599744] Workqueue: netns cleanup_net
[ 240.599823] ffff8802202eb9e8 0000000000000096 ffff8802202db480
00000000001d5f00
[ 240.600066] ffff8802202ebfd8 00000000001d5f00 ffff8800368c3480
ffff8802202db480
[ 240.600228] ffffffff81ee2690 7fffffffffffffff ffffffff81ee2698
ffffffff81ee2690
[ 240.600445] [<ffffffff8185e239>] schedule+0x29/0x70
[ 240.600541] [<ffffffff8186345c>] schedule_timeout+0x26c/0x410
[ 240.600651] [<ffffffff81865ef7>] ? retint_restore_args+0x13/0x13
[ 240.600765] [<ffffffff818644e4>] ? _raw_spin_unlock_irq+0x34/0x50
[ 240.600879] [<ffffffff8185fc6c>] wait_for_completion+0x10c/0x150
[ 240.601025] [<ffffffff810e53e0>] ? wake_up_state+0x20/0x20
[ 240.601133] [<ffffffff8112a749>] _rcu_barrier+0x159/0x200
[ 240.601237] [<ffffffff8112a845>] rcu_barrier+0x15/0x20
[ 240.601335] [<ffffffff81718ebf>] netdev_run_todo+0x6f/0x310
[ 240.601442] [<ffffffff8170da85>] ? rollback_registered_many+0x265/0x2e0
[ 240.601564] [<ffffffff81725f2e>] rtnl_unlock+0xe/0x10
[ 240.601660] [<ffffffff8170f8e6>] default_device_exit_batch+0x156/0x180
[ 240.601781] [<ffffffff810fd8a0>] ? abort_exclusive_wait+0xb0/0xb0
[ 240.601895] [<ffffffff81707993>] ops_exit_list.isra.1+0x53/0x60
[ 240.602028] [<ffffffff81708540>] cleanup_net+0x100/0x1f0
[ 240.602131] [<ffffffff810ccfa8>] process_one_work+0x218/0x850
[ 240.602241] [<ffffffff810ccf0f>] ? process_one_work+0x17f/0x850
[ 240.602350] [<ffffffff810cd6c7>] ? worker_thread+0xe7/0x4a0
[ 240.602454] [<ffffffff810cd64b>] worker_thread+0x6b/0x4a0
[ 240.602555] [<ffffffff810cd5e0>] ? process_one_work+0x850/0x850
[ 240.602665] [<ffffffff810d399b>] kthread+0x10b/0x130
[ 240.602762] [<ffffffff81028cc9>] ? sched_clock+0x9/0x10
[ 240.602862] [<ffffffff810d3890>] ? kthread_create_on_node+0x250/0x250
[ 240.603004] [<ffffffff818651fc>] ret_from_fork+0x7c/0xb0
[ 240.603106] [<ffffffff810d3890>] ? kthread_create_on_node+0x250/0x250
[ 240.603304] #0: ("%s""netns"){.+.+.+}, at: [<ffffffff810ccf0f>]
process_one_work+0x17f/0x850
[<ffffffff810ccf0f>] process_one_work+0x17f/0x850
[ 240.603691] #2: (net_mutex){+.+.+.}, at: [<ffffffff817084cc>]
cleanup_net+0x8c/0x1f0
[<ffffffff8112a625>] _rcu_barrier+0x35/0x200
[ 240.604211] INFO: task modprobe:1387 blocked for more than 120 seconds.
[ 240.604329] Not tainted 3.18.0-0.rc1.git2.1.fc22.x86_64 #1
[ 240.604434] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 240.604570] modprobe D ffff8800cb4f1a40 13112 1387 1386 0x00000080
[ 240.604719] ffff8800cafbbbe8 0000000000000096 ffff8800cb4f1a40
00000000001d5f00
[ 240.604878] ffff8800cafbbfd8 00000000001d5f00 ffff880223280000
ffff8800cb4f1a40
[ 240.605068] ffff8800cb4f1a40 ffffffff81f8fb48 0000000000000246
ffff8800cb4f1a40
[ 240.605283] [<ffffffff8185e7e1>] schedule_preempt_disabled+0x31/0x80
[ 240.605400] [<ffffffff81860033>] mutex_lock_nested+0x183/0x440
[ 240.605510] [<ffffffff8170835f>] ? register_pernet_subsys+0x1f/0x50
[ 240.605626] [<ffffffff8170835f>] ? register_pernet_subsys+0x1f/0x50
[ 240.605757] [<ffffffffa0701000>] ? 0xffffffffa0701000
[ 240.605854] [<ffffffff8170835f>] register_pernet_subsys+0x1f/0x50
[ 240.606005] [<ffffffffa0701048>] br_init+0x48/0xd3 [bridge]
[ 240.606112] [<ffffffff81002148>] do_one_initcall+0xd8/0x210
[ 240.606224] [<ffffffff81153c02>] load_module+0x20c2/0x2870
[ 240.606327] [<ffffffff8114ebe0>] ? store_uevent+0x70/0x70
[ 240.606433] [<ffffffff8110ac26>] ? lock_release_non_nested+0x3c6/0x3d0
[ 240.606557] [<ffffffff81154497>] SyS_init_module+0xe7/0x140
[ 240.606664] [<ffffffff818652a9>] system_call_fastpath+0x12/0x17
[ 240.606845] #0: (net_mutex){+.+.+.}, at: [<ffffffff8170835f>]
register_pernet_subsys+0x1f/0x50
[ 240.607114] INFO: task modprobe:1466 blocked for more than 120 seconds.
[ 240.607231] Not tainted 3.18.0-0.rc1.git2.1.fc22.x86_64 #1
[ 240.607337] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 240.607473] modprobe D ffff88020fbab480 13096 1466 1399 0x00000084
[ 240.607622] ffff88020d1bbbe8 0000000000000096 ffff88020fbab480
00000000001d5f00
[ 240.607791] ffff88020d1bbfd8 00000000001d5f00 ffffffff81e1b580
ffff88020fbab480
[ 240.607949] ffff88020fbab480 ffffffff81f8fb48 0000000000000246
ffff88020fbab480
[ 240.608193] [<ffffffff8185e7e1>] schedule_preempt_disabled+0x31/0x80
[ 240.608316] [<ffffffff81860033>] mutex_lock_nested+0x183/0x440
[ 240.608425] [<ffffffff817083ad>] ? register_pernet_device+0x1d/0x70
[ 240.608542] [<ffffffff817083ad>] ? register_pernet_device+0x1d/0x70
[ 240.608662] [<ffffffffa071d000>] ? 0xffffffffa071d000
[ 240.608759] [<ffffffff817083ad>] register_pernet_device+0x1d/0x70
[ 240.608881] [<ffffffffa071d020>] ppp_init+0x20/0x1000 [ppp_generic]
[ 240.609021] [<ffffffff81002148>] do_one_initcall+0xd8/0x210
[ 240.609131] [<ffffffff81153c02>] load_module+0x20c2/0x2870
[ 240.609235] [<ffffffff8114ebe0>] ? store_uevent+0x70/0x70
[ 240.609339] [<ffffffff8110ac26>] ? lock_release_non_nested+0x3c6/0x3d0
[ 240.609462] [<ffffffff81154497>] SyS_init_module+0xe7/0x140
[ 240.609568] [<ffffffff818652a9>] system_call_fastpath+0x12/0x17
[ 240.609749] #0: (net_mutex){+.+.+.}, at: [<ffffffff817083ad>]
register_pernet_device+0x1d/0x70
Looks like contention on net_mutex or something, but I honestly have
no idea yet. I can't recreate it myself at the moment or I would
bisect.
Has nobody else run into this with the pre-3.18 kernels? Fedora isn't
carrying any patches in this area.
josh
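Most of the hangs in this thread end up in netdev_run_todo() ->
rcu_barrier(), reached either from device unregistration or from
network-namespace teardown in cleanup_net(). A loop that creates and
destroys network namespaces exercises that same teardown path; the sketch
below is only an illustration under that assumption, not a confirmed
reproducer for this particular bug, and it needs root.

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Each child unshares into a fresh network namespace and exits; the
 * namespace is then torn down asynchronously by the netns workqueue,
 * which unregisters its loopback device and goes through
 * netdev_run_todo() -> rcu_barrier().
 */
int main(void)
{
	for (int i = 0; i < 100; i++) {
		pid_t pid = fork();

		if (pid == 0) {
			if (unshare(CLONE_NEWNET) != 0) {
				perror("unshare(CLONE_NEWNET)");
				_exit(1);
			}
			_exit(0);
		}
		if (pid > 0)
			waitpid(pid, NULL, 0);
	}
	return 0;
}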