[Bug 230498] Fatal trap 12: page fault while in kernel mode in sysctl_dumpentry from sysctl NET_RT_DUMP

Discussion:

b***@freebsd.org

2018-11-07 20:25:25 UTC

--
You are receiving this mail because:
You are the assignee for the bug.

b***@freebsd.org

2018-11-07 21:01:07 UTC

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=230498

Eugene Grosbein <***@freebsd.org> changed:

           What|Removed                     |Added
----------------------------------------------------------------------------
             CC|                            |***@FreeBSD.org,
               |                            |***@FreeBSD.org,
               |                            |***@FreeBSD.org

--- Comment #2 from Eugene Grosbein <***@freebsd.org> ---
Added some people who have recently worked on rtsock locking to the CC list.

b***@freebsd.org

2018-11-07 22:01:32 UTC

--- Comment #3 from Eugene Grosbein <***@freebsd.org> ---
Created attachment 199064
--> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=199064&action=edit
add some checks

Please try this patch and see if it eliminates panics. Apply it:

cd /usr/src
patch < /path/to/patch

Then rebuild and reinstall the kernel.

b***@freebsd.org

2018-11-10 10:58:04 UTC

--- Comment #4 from Andrey V. Elsukov <***@FreeBSD.org> ---
(In reply to Eugene Grosbein from comment #3)

> Created attachment 199064
> add some checks
>
> Please try this patch and see if it eliminates panics. Apply it:
>
> cd /usr/src
> patch < /path/to/patch
>
> Then rebuild and reinstall the kernel.

This patch is not the correct way to fix the problem. I think you have no
guarantee that all the data is still valid at the time you acquire the lock.

b***@freebsd.org

2018-11-19 15:18:25 UTC

Franck Rousseau <***@imag.fr> changed:

           What|Removed                     |Added
----------------------------------------------------------------------------
             CC|                            |***@imag.fr

--- Comment #5 from Franck Rousseau <***@imag.fr> ---
Indeed, this patch does not work. I have given more information in bug #227720,
which is linked to this one.

b***@freebsd.org

2018-11-19 15:25:47 UTC

b***@freebsd.org

2018-11-19 16:03:56 UTC

--- Comment #6 from Andrey V. Elsukov <***@FreeBSD.org> ---
Created attachment 199344
--> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=199344&action=edit
Proposed patch (for stable/12+)

I think this problem can be fixed by this patch, but it is only applicable to
FreeBSD 12.0 and later. If you are able to test stable/12 with and without the
patch, feedback would be appreciated.

b***@freebsd.org

2018-11-19 16:10:36 UTC

b***@freebsd.org

2018-11-20 09:46:35 UTC

Andrey V. Elsukov <***@FreeBSD.org> changed:

                          What|Removed                     |Added
----------------------------------------------------------------------------
Attachment #199345 is obsolete|0                           |1

--- Comment #8 from Andrey V. Elsukov <***@FreeBSD.org> ---
Created attachment 199372
--> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=199372&action=edit
Proposed patch (for stable/12+)

Sorry, but I think the panic is still possible. The kernel sets the IFF_DYING
flag too late; instead we can check for the presence of IFF_UP. Also, do not
reset the ifp->if_addr pointer to NULL in if_detach_internal(). Resetting it
does not look very useful, and keeping it protects us from a NULL pointer
dereference when another thread detaches the interface after we check the
IFF_UP flag. Accessing if_addr is safe in this case because ifa_free() uses
epoch_call().

b***@freebsd.org

2018-11-20 16:36:41 UTC

--- Comment #9 from Franck Rousseau <***@imag.fr> ---
Thanks for the tentative fix. I have just tested it on 11.2 and 12-RC1 kernels;
I adapted it to 11.2 by removing the NET_EPOCH_* macros. The behavior changes:
there is no more crash, but it looks like something is not cleared as it
should be.

Setting up ppp + proxy arp, everything works. Interrupting and restarting ppp
used to cause the crash consistently, but with this patch ppp instead fails
with the following error:

PPp ON localhost> Warning: iface add: ioctl(SIOCAIFADDR, 192.168.0.2 ->
192.168.0.1): File exists
Error: ipcp_InterfaceUp: unable to set ip address

Sorry, I don't have much time to dig into the route and interface handling code
right now.

b***@freebsd.org

2018-11-20 16:40:56 UTC

--- Comment #10 from Andrey V. Elsukov <***@FreeBSD.org> ---
(In reply to Franck Rousseau from comment #9)

> Thanks for the tentative fix, I have just tested on 11.2 and 12-RC1 kernels.
> I have adapted to 11.2 by removing the NET_EPOCH_* macros. The behavior
> changes, there is no more crash, but it looks like something is not cleared
> as it should.
>
> Setting up ppp + proxy arp, everything works. Then, interrupting and
> restarting ppp used to cause the crash consistently, but with this patch,
> ppp fails with the following error:
>
> PPp ON localhost> Warning: iface add: ioctl(SIOCAIFADDR, 192.168.0.2 ->
> 192.168.0.1): File exists
> Error: ipcp_InterfaceUp: unable to set ip address
>
> Sorry, I don't have much time to dig into the route and interface handling
> code right now.

No, without NET_EPOCH the patch won't work. It is the main feature that makes
this fix possible, and 11.x does not have it.

b***@freebsd.org

2018-11-20 16:53:40 UTC

--- Comment #11 from Franck Rousseau <***@imag.fr> ---
Sure, I ran the test on 12 as I just wrote; the 11.2 test was just for
comparison, since it did not work.

b***@freebsd.org

2018-11-22 00:15:22 UTC

--- Comment #12 from Franck Rousseau <***@imag.fr> ---
(In reply to Andrey V. Elsukov from comment #10)

Just to clear things up:
- the crash happens both in 11.2 and 12
- the proposed fix breaks ppp

I did more tests with ppp as explained in bug #227720 this morning and noticed
the following:
- if the ppp server has two different addresses on the ethernet and ppp tun
interfaces, everything works fine, I can stop and start ppp without a problem
- if I configure the same address on the ethernet interface as the one set up
on the tun interface, then the next ppp connection works fine, but if I stop
the server, restart and re-open from the client I consistently get a crash

b***@freebsd.org

2018-11-22 00:22:50 UTC

--- Comment #13 from ***@niw.com.au ---
I did make a pretty naive fix for this shortly after reporting it, as the system
in question was crashing several times a day. Since applying this I have had no
further issues with it. It does mean the querying application gets back some
null pointers, but it's likely better that the application exits (if it does not
check for NULL pointers) than that the entire system crashes?

Index: rtsock.c
===================================================================
--- rtsock.c (revision 339318)
+++ rtsock.c (working copy)
@@ -1556,8 +1556,10 @@
             rt_mask(rt), &ss);
         info.rti_info[RTAX_GENMASK] = 0;
         if (rt->rt_ifp) {
-                info.rti_info[RTAX_IFP] = rt->rt_ifp->if_addr->ifa_addr;
-                info.rti_info[RTAX_IFA] = rt->rt_ifa->ifa_addr;
+                if (rt->rt_ifp->if_addr)
+                        info.rti_info[RTAX_IFP] = rt->rt_ifp->if_addr->ifa_addr;
+                if (rt->rt_ifa)
+                        info.rti_info[RTAX_IFA] = rt->rt_ifa->ifa_addr;
                 if (rt->rt_ifp->if_flags & IFF_POINTOPOINT)
                         info.rti_info[RTAX_BRD] = rt->rt_ifa->ifa_dstaddr;
         }
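
To illustrate the application side of this trade-off, here is a minimal C
sketch of checking whether a routing message carries the interface sockaddrs
before using them. It assumes that an entry skipped by the patch shows up as a
cleared bit in rtm_addrs, which is not confirmed in this thread. The SKETCH_
names and the helper are hypothetical; the bit values are copied from the
RTA_IFP and RTA_IFA definitions in FreeBSD's <net/route.h>.

```c
#include <assert.h>

/*
 * Assumption: these mirror RTA_IFP and RTA_IFA from FreeBSD's
 * <net/route.h>. They are redefined here so the sketch builds anywhere.
 */
#define SKETCH_RTA_IFP 0x10u /* interface name sockaddr present */
#define SKETCH_RTA_IFA 0x20u /* interface address sockaddr present */

static_assert((SKETCH_RTA_IFP & SKETCH_RTA_IFA) == 0,
    "the two presence bits must not overlap");

/*
 * Hypothetical helper: return the subset of {IFP, IFA} bits that the
 * message does NOT carry. Zero means both sockaddrs are present.
 */
unsigned int
sketch_missing_if_sockaddrs(unsigned int rtm_addrs)
{
    const unsigned int need = SKETCH_RTA_IFP | SKETCH_RTA_IFA;

    return (need & ~rtm_addrs);
}
```

A consumer of NET_RT_DUMP could call this on each message's rtm_addrs field
and skip or report any entry for which the result is non-zero, instead of
assuming both sockaddrs are always there.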

b***@freebsd.org

2018-11-22 06:36:06 UTC

Andrey V. Elsukov <***@FreeBSD.org> changed:

                          What|Removed                     |Added
----------------------------------------------------------------------------
Attachment #199372 is obsolete|0                           |1

--- Comment #15 from Andrey V. Elsukov <***@FreeBSD.org> ---
Created attachment 199444
--> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=199444&action=edit
Proposed patch (for stable/12+)

Simplify the patch: remove the check for IFF_UP.
The ifnet pointer should be safe to dereference while we are in a NET_EPOCH
section. Also, since if_addr is now kept unchanged, it is safe to dereference
it too.

b***@freebsd.org

2018-11-22 06:50:11 UTC

--- Comment #16 from Andrey V. Elsukov <***@FreeBSD.org> ---
(In reply to ian from comment #13)

> I did make a pretty naive fix for this shortly after reporting it as the
> system in question was crashing several times a day. Since applying this I
> have has no further issues with it. It does mean the application querying
> gets back some null pointers, but its likely better the application exits
> (if it does not check for NULL pointers) than the entire system crashing ?
>
> Index: rtsock.c
> ===================================================================
> --- rtsock.c (revision 339318)
> +++ rtsock.c (working copy)
> @@ -1556,8 +1556,10 @@
>              rt_mask(rt), &ss);
>          info.rti_info[RTAX_GENMASK] = 0;
>          if (rt->rt_ifp) {
> -                info.rti_info[RTAX_IFP] = rt->rt_ifp->if_addr->ifa_addr;
> -                info.rti_info[RTAX_IFA] = rt->rt_ifa->ifa_addr;
> +                if (rt->rt_ifp->if_addr)
> +                        info.rti_info[RTAX_IFP] = rt->rt_ifp->if_addr->ifa_addr;
> +                if (rt->rt_ifa)
> +                        info.rti_info[RTAX_IFA] = rt->rt_ifa->ifa_addr;
>                  if (rt->rt_ifp->if_flags & IFF_POINTOPOINT)
>                          info.rti_info[RTAX_BRD] = rt->rt_ifa->ifa_dstaddr;
>          }

rt->rt_ifa should be safe to dereference, since the rtentry holds a reference
to the ifa and it won't be freed. But access to rt_ifp->if_addr is not easy to
protect in stable/11. The problem happens because the interface is being
destroyed while we are iterating through the routes. Even if you add a NULL
check here, there is no guarantee that you won't access already freed memory
a bit later in rtsock_msg_buffer(), when it reads info.rti_info[]. Also, I
think an application may expect both the RTAX_IFP and RTAX_IFA pointers to be
present.

b***@freebsd.org

2018-11-22 06:32:50 UTC

--- Comment #14 from Andrey V. Elsukov <***@FreeBSD.org> ---
(In reply to Franck Rousseau from comment #12)

> Just to clear things up:
> - the crash happens both in 11.2 and 12
> - the proposed fix breaks ppp
>
> I did more tests with ppp as explained in bug #227720 this morning and
> noticed the following:
> - if the ppp server has two different addresses on the ethernet and ppp tun
> interfaces, everything works fine, I can stop and start ppp without a problem
> - if I configure the same address on the ethernet interface as the one set
> up on the tun interface, then the next ppp connection works fine, but if I
> stop the server, restart and re-open from the client I consistently get a
> crash

OK, I think the problem with ppp is that we don't return the needed info when
the interface isn't UP.

b***@freebsd.org

2018-11-22 14:20:32 UTC

Andrey V. Elsukov <***@FreeBSD.org> changed:

                          What|Removed                     |Added
----------------------------------------------------------------------------
Attachment #199444 is obsolete|0                           |1

--- Comment #17 from Andrey V. Elsukov <***@FreeBSD.org> ---
Created attachment 199449
--> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=199449&action=edit
Proposed patch

I think this patch can be used for both FreeBSD 12 and 11. Use
IFNET_RLOCK_NOSLEEP() to protect against interface destruction during route
iteration. In if_detach_internal(), mark the interface as dying just after we
remove it from the ifnet list. In sysctl_dumpentry(), add a check that the
interface has not been destroyed before accessing it.

b***@freebsd.org

2018-11-22 14:22:20 UTC

b***@freebsd.org

2018-11-27 09:04:48 UTC

--- Comment #19 from commit-***@freebsd.org ---
A commit references this bug:

Author: ae
Date: Tue Nov 27 09:04:07 UTC 2018
New revision: 341008
URL: https://svnweb.freebsd.org/changeset/base/341008

Log:
Fix possible panic during ifnet detach in rtsock.

The panic can happen, when some application does dump of routing table
using sysctl interface. To prevent this, set IFF_DYING flag in
if_detach_internal() function, when ifnet under lock is removed from
the chain. In sysctl_rtsock() take IFNET_RLOCK_NOSLEEP() to prevent
ifnet detach during routes enumeration. In case, if some interface was
detached in the time before we take the lock, add the check, that ifnet
is not DYING. This prevents access to memory that could be freed after
ifnet is unlinked.

PR: 227720, 230498, 233306
Reviewed by: bz, eugen
MFC after: 1 week
Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D18338

Changes:
head/sys/net/if.c
head/sys/net/rtsock.c

b***@freebsd.org

2018-11-28 08:53:58 UTC

--- Comment #20 from Franck Rousseau <***@imag.fr> ---
(In reply to commit-hook from comment #19)

As mentioned in comment #9 above, this patch breaks ppp. I get the following
when trying to open a second connection, which is the stage at which the crash
occurred before:
PPp ON localhost> Warning: iface add: ioctl(SIOCAIFADDR, 192.168.0.2 ->
192.168.0.1): File exists
Error: ipcp_InterfaceUp: unable to set ip address

Also, the patch in attachment #199450 does not fix this specific problem: we
still crash the kernel with the procedure described earlier in comment #12. As
I said, I could narrow down the cause and find a fix for our use case: when the
Ethernet and PPP tun interfaces use two different IPv4 addresses, the kernel
no longer crashes.

About the fix, I suspect that internal structures are corrupted, so any kind of
fix at this point will fail. For example, with this patch on 11.2-p4, I keep
getting these values after the crash:

(kgdb) print rt->rt_ifp->if_flags
$3 = 3
(kgdb) print rt->rt_ifp->if_index
$4 = 63488

I will try to set up online debugging to watch the internal structures and see
if I can get an idea of what is breaking things.

b***@freebsd.org

2018-11-28 09:10:41 UTC

--- Comment #21 from Andrey V. Elsukov <***@FreeBSD.org> ---
(In reply to Franck Rousseau from comment #20)

> (In reply to commit-hook from comment #19)
>
> As mentioned in comment #9 above, this patch breaks ppp, I get this when
> trying to re-open a second connection, this is the stage at which the crash
> occurred before:
> PPp ON localhost> Warning: iface add: ioctl(SIOCAIFADDR, 192.168.0.2 ->
> 192.168.0.1): File exists
> Error: ipcp_InterfaceUp: unable to set ip address
>
> Also, the patch in attachment #199450 does not fix this specific
> problem, we still crash the kernel with the procedure described earlier in
> comment #12. As I said, I could narrow down the cause and find a fix for our
> use case, by using two different IPv4 addresses for Ethernet and PPP tun
> interfaces the kernel does not crash anymore.
>
> About the fix, I suspect that internal structures are corrupted, so any kind
> of fix at this point will fail, for example with this patch on 11.2-p4 it
> looks like I keep getting these values after the crash:
>
> (kgdb) print rt->rt_ifp->if_flags
> $3 = 3
> (kgdb) print rt->rt_ifp->if_index
> $4 = 63488
>
> I will try to setup on-line debugging to watch internal structures and see
> if I can get an idea of what is breaking things up.

According to if_flags, this patch doesn't affect your case, since if_flags =
(IFF_UP | IFF_BROADCAST). There is no IFF_DYING flag. Also, rtsock has several
places where it can panic due to a similar issue but with a different stack
trace (for example https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=205678).
Are you sure that your panic is the same? Also, if_index has an unusually large
value. Please show your backtrace and, in the context of the noted frame, the
output of the "p *rt->rt_ifp" command.
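
The flag reading above (if_flags = 3) can be checked with a small decoder. This
is a sketch only: the table and the function name are hypothetical, and the
values are assumed to match IFF_UP, IFF_BROADCAST, IFF_POINTOPOINT and
IFF_DYING in FreeBSD's <net/if.h>. Other flags are deliberately left out.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct sketch_flag_name {
    unsigned int bit;
    const char  *name;
};

/* Assumption: values copied from FreeBSD's <net/if.h>; subset only. */
static const struct sketch_flag_name sketch_if_flags[] = {
    { 0x1,      "IFF_UP" },
    { 0x2,      "IFF_BROADCAST" },
    { 0x10,     "IFF_POINTOPOINT" },
    { 0x200000, "IFF_DYING" },
};

/*
 * Write a '|'-separated list of the known flags set in `flags` into buf.
 * Returns the number of characters written, excluding the terminator.
 */
size_t
sketch_decode_if_flags(unsigned int flags, char *buf, size_t len)
{
    const size_t count = sizeof(sketch_if_flags) / sizeof(sketch_if_flags[0]);
    size_t used = 0;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    for (size_t i = 0; i < count; i++) {
        if ((flags & sketch_if_flags[i].bit) == 0)
            continue;
        int n = snprintf(buf + used, len - used, "%s%s",
            used > 0 ? "|" : "", sketch_if_flags[i].name);
        if (n < 0 || (size_t)n >= len - used) {
            buf[used] = '\0'; /* out of room: drop the partial name */
            break;
        }
        used += (size_t)n;
    }
    return (used);
}
```

Decoding the value 3 from the kgdb output with this table yields IFF_UP and
IFF_BROADCAST only, so the IFF_DYING check added by the commit would not
trigger for that interface.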

b***@freebsd.org

2018-11-28 09:23:31 UTC

--- Comment #22 from Franck Rousseau <***@imag.fr> ---
(In reply to Andrey V. Elsukov from comment #21)

The panic is at sys/net/rtsock.c:1559:

1559 info.rti_info[RTAX_IFP] = rt->rt_ifp->if_addr->ifa_addr;

The stack trace is always pretty much the same as in bug 227720, comments 35
and 37; the latter also includes the output of p *rt->rt_ifp.
