Discussion:
[PATCH 5.5 010/151] ipvlan: add cond_resched_rcu() while processing muticast backlog
(too old to reply)
Greg Kroah-Hartman
2020-03-17 10:53:40 UTC
Permalink
From: Mahesh Bandewar <***@google.com>

[ Upstream commit e18b353f102e371580f3f01dd47567a25acc3c1d ]

If there are substantial number of slaves created as simulated by
Syzbot, the backlog processing could take much longer and result
into the issue found in the Syzbot report.

INFO: rcu_sched detected stalls on CPUs/tasks:
(detected by 1, t=10502 jiffies, g=5049, c=5048, q=752)
All QSes seen, last rcu_sched kthread activity 10502 (4294965563-4294955061), jiffies_till_next_fqs=1, root ->qsmask 0x0
syz-executor.1 R running task on cpu 1 10984 11210 3866 0x30020008 179034491270
Call Trace:
<IRQ>
[<ffffffff81497163>] _sched_show_task kernel/sched/core.c:8063 [inline]
[<ffffffff81497163>] _sched_show_task.cold+0x2fd/0x392 kernel/sched/core.c:8030
[<ffffffff8146a91b>] sched_show_task+0xb/0x10 kernel/sched/core.c:8073
[<ffffffff815c931b>] print_other_cpu_stall kernel/rcu/tree.c:1577 [inline]
[<ffffffff815c931b>] check_cpu_stall kernel/rcu/tree.c:1695 [inline]
[<ffffffff815c931b>] __rcu_pending kernel/rcu/tree.c:3478 [inline]
[<ffffffff815c931b>] rcu_pending kernel/rcu/tree.c:3540 [inline]
[<ffffffff815c931b>] rcu_check_callbacks.cold+0xbb4/0xc29 kernel/rcu/tree.c:2876
[<ffffffff815e3962>] update_process_times+0x32/0x80 kernel/time/timer.c:1635
[<ffffffff816164f0>] tick_sched_handle+0xa0/0x180 kernel/time/tick-sched.c:161
[<ffffffff81616ae4>] tick_sched_timer+0x44/0x130 kernel/time/tick-sched.c:1193
[<ffffffff815e75f7>] __run_hrtimer kernel/time/hrtimer.c:1393 [inline]
[<ffffffff815e75f7>] __hrtimer_run_queues+0x307/0xd90 kernel/time/hrtimer.c:1455
[<ffffffff815e90ea>] hrtimer_interrupt+0x2ea/0x730 kernel/time/hrtimer.c:1513
[<ffffffff844050f4>] local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1031 [inline]
[<ffffffff844050f4>] smp_apic_timer_interrupt+0x144/0x5e0 arch/x86/kernel/apic/apic.c:1056
[<ffffffff84401cbe>] apic_timer_interrupt+0x8e/0xa0 arch/x86/entry/entry_64.S:778
RIP: 0010:do_raw_read_lock+0x22/0x80 kernel/locking/spinlock_debug.c:153
RSP: 0018:ffff8801dad07ab8 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff12
RAX: 0000000000000000 RBX: ffff8801c4135680 RCX: 0000000000000000
RDX: 1ffff10038826afe RSI: ffff88019d816bb8 RDI: ffff8801c41357f0
RBP: ffff8801dad07ac0 R08: 0000000000004b15 R09: 0000000000310273
R10: ffff88019d816bb8 R11: 0000000000000001 R12: ffff8801c41357e8
R13: 0000000000000000 R14: ffff8801cfb19850 R15: ffff8801cfb198b0
[<ffffffff8101460e>] __raw_read_lock_bh include/linux/rwlock_api_smp.h:177 [inline]
[<ffffffff8101460e>] _raw_read_lock_bh+0x3e/0x50 kernel/locking/spinlock.c:240
[<ffffffff840d78ca>] ipv6_chk_mcast_addr+0x11a/0x6f0 net/ipv6/mcast.c:1006
[<ffffffff84023439>] ip6_mc_input+0x319/0x8e0 net/ipv6/ip6_input.c:482
[<ffffffff840211c8>] dst_input include/net/dst.h:449 [inline]
[<ffffffff840211c8>] ip6_rcv_finish+0x408/0x610 net/ipv6/ip6_input.c:78
[<ffffffff840214de>] NF_HOOK include/linux/netfilter.h:292 [inline]
[<ffffffff840214de>] NF_HOOK include/linux/netfilter.h:286 [inline]
[<ffffffff840214de>] ipv6_rcv+0x10e/0x420 net/ipv6/ip6_input.c:278
[<ffffffff83a29efa>] __netif_receive_skb_one_core+0x12a/0x1f0 net/core/dev.c:5303
[<ffffffff83a2a15c>] __netif_receive_skb+0x2c/0x1b0 net/core/dev.c:5417
[<ffffffff83a2f536>] process_backlog+0x216/0x6c0 net/core/dev.c:6243
[<ffffffff83a30d1b>] napi_poll net/core/dev.c:6680 [inline]
[<ffffffff83a30d1b>] net_rx_action+0x47b/0xfb0 net/core/dev.c:6748
[<ffffffff846002c8>] __do_softirq+0x2c8/0x99a kernel/softirq.c:317
[<ffffffff813e656a>] invoke_softirq kernel/softirq.c:399 [inline]
[<ffffffff813e656a>] irq_exit+0x16a/0x1a0 kernel/softirq.c:439
[<ffffffff84405115>] exiting_irq arch/x86/include/asm/apic.h:561 [inline]
[<ffffffff84405115>] smp_apic_timer_interrupt+0x165/0x5e0 arch/x86/kernel/apic/apic.c:1058
[<ffffffff84401cbe>] apic_timer_interrupt+0x8e/0xa0 arch/x86/entry/entry_64.S:778
</IRQ>
RIP: 0010:__sanitizer_cov_trace_pc+0x26/0x50 kernel/kcov.c:102
RSP: 0018:ffff880196033bd8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff12
RAX: ffff88019d8161c0 RBX: 00000000ffffffff RCX: ffffc90003501000
RDX: 0000000000000002 RSI: ffffffff816236d1 RDI: 0000000000000005
RBP: ffff880196033bd8 R08: ffff88019d8161c0 R09: 0000000000000000
R10: 1ffff10032c067f0 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000080 R14: 0000000000000000 R15: 0000000000000000
[<ffffffff816236d1>] do_futex+0x151/0x1d50 kernel/futex.c:3548
[<ffffffff816260f0>] C_SYSC_futex kernel/futex_compat.c:201 [inline]
[<ffffffff816260f0>] compat_SyS_futex+0x270/0x3b0 kernel/futex_compat.c:175
[<ffffffff8101da17>] do_syscall_32_irqs_on arch/x86/entry/common.c:353 [inline]
[<ffffffff8101da17>] do_fast_syscall_32+0x357/0xe1c arch/x86/entry/common.c:415
[<ffffffff84401a9b>] entry_SYSENTER_compat+0x8b/0x9d arch/x86/entry/entry_64_compat.S:139
RIP: 0023:0xf7f23c69
RSP: 002b:00000000f5d1f12c EFLAGS: 00000282 ORIG_RAX: 00000000000000f0
RAX: ffffffffffffffda RBX: 000000000816af88 RCX: 0000000000000080
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000000816af8c
RBP: 00000000f5d1f228 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
rcu_sched kthread starved for 10502 jiffies! g5049 c5048 f0x2 RCU_GP_WAIT_FQS(3) ->state=0x0 ->cpu=1
rcu_sched R running task on cpu 1 13048 8 2 0x90000000 179099587640
Call Trace:
[<ffffffff8147321f>] context_switch+0x60f/0xa60 kernel/sched/core.c:3209
[<ffffffff8100095a>] __schedule+0x5aa/0x1da0 kernel/sched/core.c:3934
[<ffffffff810021df>] schedule+0x8f/0x1b0 kernel/sched/core.c:4011
[<ffffffff8101116d>] schedule_timeout+0x50d/0xee0 kernel/time/timer.c:1803
[<ffffffff815c13f1>] rcu_gp_kthread+0xda1/0x3b50 kernel/rcu/tree.c:2327
[<ffffffff8144b318>] kthread+0x348/0x420 kernel/kthread.c:246
[<ffffffff84400266>] ret_from_fork+0x56/0x70 arch/x86/entry/entry_64.S:393

Fixes: ba35f8588f47 (“ipvlan: Defer multicast / broadcast processing to a work-queue”)
Signed-off-by: Mahesh Bandewar <***@google.com>
Reported-by: syzbot <***@googlegroups.com>
Signed-off-by: David S. Miller <***@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>
---
drivers/net/ipvlan/ipvlan_core.c | 1 +
1 file changed, 1 insertion(+)

--- a/drivers/net/ipvlan/ipvlan_core.c
+++ b/drivers/net/ipvlan/ipvlan_core.c
@@ -277,6 +277,7 @@ void ipvlan_process_multicast(struct wor
}
ipvlan_count_rx(ipvlan, len, ret == NET_RX_SUCCESS, true);
local_bh_enable();
+ cond_resched_rcu();
}
rcu_read_unlock();
Greg Kroah-Hartman
2020-03-17 10:53:41 UTC
Permalink
From: Jiri Wiesner <***@suse.com>

[ Upstream commit 63aae7b17344d4b08a7d05cb07044de4c0f9dcc6 ]

There is a problem when ipvlan slaves are created on a master device that
is a vmxnet3 device (ipvlan in VMware guests). The vmxnet3 driver does not
support unicast address filtering. When an ipvlan device is brought up in
ipvlan_open(), the ipvlan driver calls dev_uc_add() to add the hardware
address of the vmxnet3 master device to the unicast address list of the
master device, phy_dev->uc. This inevitably leads to the vmxnet3 master
device being forced into promiscuous mode by __dev_set_rx_mode().

Promiscuous mode is switched on the master despite the fact that there is
still only one hardware address that the master device should use for
filtering in order for the ipvlan device to be able to receive packets.
The comment above struct net_device describes the uc_promisc member as a
"counter, that indicates, that promiscuous mode has been enabled due to
the need to listen to additional unicast addresses in a device that does
not implement ndo_set_rx_mode()". Moreover, the design of ipvlan
guarantees that only the hardware address of a master device,
phy_dev->dev_addr, will be used to transmit and receive all packets from
its ipvlan slaves. Thus, the unicast address list of the master device
should not be modified by ipvlan_open() and ipvlan_stop() in order to make
ipvlan a workable option on masters that do not support unicast address
filtering.

Fixes: 2ad7bf3638411 ("ipvlan: Initial check-in of the IPVLAN driver")
Reported-by: Per Sundstrom <***@redqube.se>
Signed-off-by: Jiri Wiesner <***@suse.com>
Reviewed-by: Eric Dumazet <***@google.com>
Acked-by: Mahesh Bandewar <***@google.com>
Signed-off-by: David S. Miller <***@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>
---
drivers/net/ipvlan/ipvlan_main.c | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)

--- a/drivers/net/ipvlan/ipvlan_main.c
+++ b/drivers/net/ipvlan/ipvlan_main.c
@@ -164,7 +164,6 @@ static void ipvlan_uninit(struct net_dev
static int ipvlan_open(struct net_device *dev)
{
struct ipvl_dev *ipvlan = netdev_priv(dev);
- struct net_device *phy_dev = ipvlan->phy_dev;
struct ipvl_addr *addr;

if (ipvlan->port->mode == IPVLAN_MODE_L3 ||
@@ -178,7 +177,7 @@ static int ipvlan_open(struct net_device
ipvlan_ht_addr_add(ipvlan, addr);
rcu_read_unlock();

- return dev_uc_add(phy_dev, phy_dev->dev_addr);
+ return 0;
}

static int ipvlan_stop(struct net_device *dev)
@@ -190,8 +189,6 @@ static int ipvlan_stop(struct net_device
dev_uc_unsync(phy_dev, dev);
dev_mc_unsync(phy_dev, dev);

- dev_uc_del(phy_dev, phy_dev->dev_addr);
-
rcu_read_lock();
list_for_each_entry_rcu(addr, &ipvlan->addrs, anode)
ipvlan_ht_addr_del(addr);
Greg Kroah-Hartman
2020-03-17 10:53:35 UTC
Permalink
From: Dmitry Yakunin <***@yandex-team.ru>

[ Upstream commit 018d26fcd12a75fb9b5fe233762aa3f2f0854b88 ]

In our production environment we have faced with problem that updating
classid in cgroup with heavy tasks cause long freeze of the file tables
in this tasks. By heavy tasks we understand tasks with many threads and
opened sockets (e.g. balancers). This freeze leads to an increase number
of client timeouts.

This patch implements following logic to fix this issue:
аfter iterating 1000 file descriptors file table lock will be released
thus providing a time gap for socket creation/deletion.

Now update is non atomic and socket may be skipped using calls:

dup2(oldfd, newfd);
close(oldfd);

But this case is not typical. Moreover before this patch skip is possible
too by hiding socket fd in unix socket buffer.

New sockets will be allocated with updated classid because cgroup state
is updated before start of the file descriptors iteration.

So in common cases this patch has no side effects.

Signed-off-by: Dmitry Yakunin <***@yandex-team.ru>
Reviewed-by: Konstantin Khlebnikov <***@yandex-team.ru>
Signed-off-by: David S. Miller <***@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>
---
net/core/netclassid_cgroup.c | 47 +++++++++++++++++++++++++++++++++----------
1 file changed, 37 insertions(+), 10 deletions(-)

--- a/net/core/netclassid_cgroup.c
+++ b/net/core/netclassid_cgroup.c
@@ -53,30 +53,60 @@ static void cgrp_css_free(struct cgroup_
kfree(css_cls_state(css));
}

+/*
+ * To avoid freezing of sockets creation for tasks with big number of threads
+ * and opened sockets lets release file_lock every 1000 iterated descriptors.
+ * New sockets will already have been created with new classid.
+ */
+
+struct update_classid_context {
+ u32 classid;
+ unsigned int batch;
+};
+
+#define UPDATE_CLASSID_BATCH 1000
+
static int update_classid_sock(const void *v, struct file *file, unsigned n)
{
int err;
+ struct update_classid_context *ctx = (void *)v;
struct socket *sock = sock_from_file(file, &err);

if (sock) {
spin_lock(&cgroup_sk_update_lock);
- sock_cgroup_set_classid(&sock->sk->sk_cgrp_data,
- (unsigned long)v);
+ sock_cgroup_set_classid(&sock->sk->sk_cgrp_data, ctx->classid);
spin_unlock(&cgroup_sk_update_lock);
}
+ if (--ctx->batch == 0) {
+ ctx->batch = UPDATE_CLASSID_BATCH;
+ return n + 1;
+ }
return 0;
}

+static void update_classid_task(struct task_struct *p, u32 classid)
+{
+ struct update_classid_context ctx = {
+ .classid = classid,
+ .batch = UPDATE_CLASSID_BATCH
+ };
+ unsigned int fd = 0;
+
+ do {
+ task_lock(p);
+ fd = iterate_fd(p->files, fd, update_classid_sock, &ctx);
+ task_unlock(p);
+ cond_resched();
+ } while (fd);
+}
+
static void cgrp_attach(struct cgroup_taskset *tset)
{
struct cgroup_subsys_state *css;
struct task_struct *p;

cgroup_taskset_for_each(p, css, tset) {
- task_lock(p);
- iterate_fd(p->files, 0, update_classid_sock,
- (void *)(unsigned long)css_cls_state(css)->classid);
- task_unlock(p);
+ update_classid_task(p, css_cls_state(css)->classid);
}
}

@@ -98,10 +128,7 @@ static int write_classid(struct cgroup_s

css_task_iter_start(css, 0, &it);
while ((p = css_task_iter_next(&it))) {
- task_lock(p);
- iterate_fd(p->files, 0, update_classid_sock,
- (void *)(unsigned long)cs->classid);
- task_unlock(p);
+ update_classid_task(p, cs->classid);
cond_resched();
}
css_task_iter_end(&it);
Greg Kroah-Hartman
2020-03-17 10:53:36 UTC
Permalink
From: Vishal Kulkarni <***@chelsio.com>

[ Upstream commit 116ca924aea664141afa86a1425edc3fcda0d06f ]

Hardware can support more than 8 queues currently limited by
netif_get_num_default_rss_queues(). So, rework and fix checks for max
number of queues to allocate. The checks should be based on how many are
actually supported by hardware, OR the number of online cpus; whichever
is lower.

Fixes: 5952dde72307 ("cxgb4: set maximal number of default RSS queues")
Signed-off-by: Vishal Kulkarni <***@chelsio.com>"
Signed-off-by: David S. Miller <***@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>
---
drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 49 +++++++++++++-----------
1 file changed, 27 insertions(+), 22 deletions(-)

--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -5351,12 +5351,11 @@ static inline bool is_x_10g_port(const s
static int cfg_queues(struct adapter *adap)
{
u32 avail_qsets, avail_eth_qsets, avail_uld_qsets;
+ u32 i, n10g = 0, qidx = 0, n1g = 0;
+ u32 ncpus = num_online_cpus();
u32 niqflint, neq, num_ulds;
struct sge *s = &adap->sge;
- u32 i, n10g = 0, qidx = 0;
-#ifndef CONFIG_CHELSIO_T4_DCB
- int q10g = 0;
-#endif
+ u32 q10g = 0, q1g;

/* Reduce memory usage in kdump environment, disable all offload. */
if (is_kdump_kernel() || (is_uld(adap) && t4_uld_mem_alloc(adap))) {
@@ -5394,44 +5393,50 @@ static int cfg_queues(struct adapter *ad
n10g += is_x_10g_port(&adap2pinfo(adap, i)->link_cfg);

avail_eth_qsets = min_t(u32, avail_qsets, MAX_ETH_QSETS);
+
+ /* We default to 1 queue per non-10G port and up to # of cores queues
+ * per 10G port.
+ */
+ if (n10g)
+ q10g = (avail_eth_qsets - (adap->params.nports - n10g)) / n10g;
+
+ n1g = adap->params.nports - n10g;
#ifdef CONFIG_CHELSIO_T4_DCB
/* For Data Center Bridging support we need to be able to support up
* to 8 Traffic Priorities; each of which will be assigned to its
* own TX Queue in order to prevent Head-Of-Line Blocking.
*/
+ q1g = 8;
if (adap->params.nports * 8 > avail_eth_qsets) {
dev_err(adap->pdev_dev, "DCB avail_eth_qsets=%d < %d!\n",
avail_eth_qsets, adap->params.nports * 8);
return -ENOMEM;
}

- for_each_port(adap, i) {
- struct port_info *pi = adap2pinfo(adap, i);
+ if (adap->params.nports * ncpus < avail_eth_qsets)
+ q10g = max(8U, ncpus);
+ else
+ q10g = max(8U, q10g);

- pi->first_qset = qidx;
- pi->nqsets = is_kdump_kernel() ? 1 : 8;
- qidx += pi->nqsets;
- }
-#else /* !CONFIG_CHELSIO_T4_DCB */
- /* We default to 1 queue per non-10G port and up to # of cores queues
- * per 10G port.
- */
- if (n10g)
- q10g = (avail_eth_qsets - (adap->params.nports - n10g)) / n10g;
- if (q10g > netif_get_num_default_rss_queues())
- q10g = netif_get_num_default_rss_queues();
+ while ((q10g * n10g) > (avail_eth_qsets - n1g * q1g))
+ q10g--;

- if (is_kdump_kernel())
+#else /* !CONFIG_CHELSIO_T4_DCB */
+ q1g = 1;
+ q10g = min(q10g, ncpus);
+#endif /* !CONFIG_CHELSIO_T4_DCB */
+ if (is_kdump_kernel()) {
q10g = 1;
+ q1g = 1;
+ }

for_each_port(adap, i) {
struct port_info *pi = adap2pinfo(adap, i);

pi->first_qset = qidx;
- pi->nqsets = is_x_10g_port(&pi->link_cfg) ? q10g : 1;
+ pi->nqsets = is_x_10g_port(&pi->link_cfg) ? q10g : q1g;
qidx += pi->nqsets;
}
-#endif /* !CONFIG_CHELSIO_T4_DCB */

s->ethqsets = qidx;
s->max_ethqsets = qidx; /* MSI-X may lower it later */
@@ -5443,7 +5448,7 @@ static int cfg_queues(struct adapter *ad
* capped by the number of available cores.
*/
num_ulds = adap->num_uld + adap->num_ofld_uld;
- i = min_t(u32, MAX_OFLD_QSETS, num_online_cpus());
+ i = min_t(u32, MAX_OFLD_QSETS, ncpus);
avail_uld_qsets = roundup(i, adap->params.nports);
if (avail_qsets < num_ulds * adap->params.nports) {
adap->params.offload = 0;
Greg Kroah-Hartman
2020-03-17 10:53:37 UTC
Permalink
From: Eric Dumazet <***@google.com>

[ Upstream commit 17c25cafd4d3e74c83dce56b158843b19c40b414 ]

syzbot found an interesting case of the kernel reading
an uninit-value [1]

Problem is in the handling of ETH_P_WCCP in gre_parse_header()

We look at the byte following GRE options to eventually decide
if the options are four bytes longer.

Use skb_header_pointer() to not pull bytes if we found
that no more bytes were needed.

All callers of gre_parse_header() are properly using pskb_may_pull()
anyway before proceeding to next header.

[1]
BUG: KMSAN: uninit-value in pskb_may_pull include/linux/skbuff.h:2303 [inline]
BUG: KMSAN: uninit-value in __iptunnel_pull_header+0x30c/0xbd0 net/ipv4/ip_tunnel_core.c:94
CPU: 1 PID: 11784 Comm: syz-executor940 Not tainted 5.6.0-rc2-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x1c9/0x220 lib/dump_stack.c:118
kmsan_report+0xf7/0x1e0 mm/kmsan/kmsan_report.c:118
__msan_warning+0x58/0xa0 mm/kmsan/kmsan_instr.c:215
pskb_may_pull include/linux/skbuff.h:2303 [inline]
__iptunnel_pull_header+0x30c/0xbd0 net/ipv4/ip_tunnel_core.c:94
iptunnel_pull_header include/net/ip_tunnels.h:411 [inline]
gre_rcv+0x15e/0x19c0 net/ipv6/ip6_gre.c:606
ip6_protocol_deliver_rcu+0x181b/0x22c0 net/ipv6/ip6_input.c:432
ip6_input_finish net/ipv6/ip6_input.c:473 [inline]
NF_HOOK include/linux/netfilter.h:307 [inline]
ip6_input net/ipv6/ip6_input.c:482 [inline]
ip6_mc_input+0xdf2/0x1460 net/ipv6/ip6_input.c:576
dst_input include/net/dst.h:442 [inline]
ip6_rcv_finish net/ipv6/ip6_input.c:76 [inline]
NF_HOOK include/linux/netfilter.h:307 [inline]
ipv6_rcv+0x683/0x710 net/ipv6/ip6_input.c:306
__netif_receive_skb_one_core net/core/dev.c:5198 [inline]
__netif_receive_skb net/core/dev.c:5312 [inline]
netif_receive_skb_internal net/core/dev.c:5402 [inline]
netif_receive_skb+0x66b/0xf20 net/core/dev.c:5461
tun_rx_batched include/linux/skbuff.h:4321 [inline]
tun_get_user+0x6aef/0x6f60 drivers/net/tun.c:1997
tun_chr_write_iter+0x1f2/0x360 drivers/net/tun.c:2026
call_write_iter include/linux/fs.h:1901 [inline]
new_sync_write fs/read_write.c:483 [inline]
__vfs_write+0xa5a/0xca0 fs/read_write.c:496
vfs_write+0x44a/0x8f0 fs/read_write.c:558
ksys_write+0x267/0x450 fs/read_write.c:611
__do_sys_write fs/read_write.c:623 [inline]
__se_sys_write fs/read_write.c:620 [inline]
__ia32_sys_write+0xdb/0x120 fs/read_write.c:620
do_syscall_32_irqs_on arch/x86/entry/common.c:339 [inline]
do_fast_syscall_32+0x3c7/0x6e0 arch/x86/entry/common.c:410
entry_SYSENTER_compat+0x68/0x77 arch/x86/entry/entry_64_compat.S:139
RIP: 0023:0xf7f62d99
Code: 90 e8 0b 00 00 00 f3 90 0f ae e8 eb f9 8d 74 26 00 89 3c 24 c3 90 90 90 90 90 90 90 90 90 90 90 90 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 eb 0d 90 90 90 90 90 90 90 90 90 90 90 90
RSP: 002b:00000000fffedb2c EFLAGS: 00000217 ORIG_RAX: 0000000000000004
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000020002580
RDX: 0000000000000fca RSI: 0000000000000036 RDI: 0000000000000004
RBP: 0000000000008914 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000

Uninit was created at:
kmsan_save_stack_with_flags mm/kmsan/kmsan.c:144 [inline]
kmsan_internal_poison_shadow+0x66/0xd0 mm/kmsan/kmsan.c:127
kmsan_slab_alloc+0x8a/0xe0 mm/kmsan/kmsan_hooks.c:82
slab_alloc_node mm/slub.c:2793 [inline]
__kmalloc_node_track_caller+0xb40/0x1200 mm/slub.c:4401
__kmalloc_reserve net/core/skbuff.c:142 [inline]
__alloc_skb+0x2fd/0xac0 net/core/skbuff.c:210
alloc_skb include/linux/skbuff.h:1051 [inline]
alloc_skb_with_frags+0x18c/0xa70 net/core/skbuff.c:5766
sock_alloc_send_pskb+0xada/0xc60 net/core/sock.c:2242
tun_alloc_skb drivers/net/tun.c:1529 [inline]
tun_get_user+0x10ae/0x6f60 drivers/net/tun.c:1843
tun_chr_write_iter+0x1f2/0x360 drivers/net/tun.c:2026
call_write_iter include/linux/fs.h:1901 [inline]
new_sync_write fs/read_write.c:483 [inline]
__vfs_write+0xa5a/0xca0 fs/read_write.c:496
vfs_write+0x44a/0x8f0 fs/read_write.c:558
ksys_write+0x267/0x450 fs/read_write.c:611
__do_sys_write fs/read_write.c:623 [inline]
__se_sys_write fs/read_write.c:620 [inline]
__ia32_sys_write+0xdb/0x120 fs/read_write.c:620
do_syscall_32_irqs_on arch/x86/entry/common.c:339 [inline]
do_fast_syscall_32+0x3c7/0x6e0 arch/x86/entry/common.c:410
entry_SYSENTER_compat+0x68/0x77 arch/x86/entry/entry_64_compat.S:139

Fixes: 95f5c64c3c13 ("gre: Move utility functions to common headers")
Fixes: c54419321455 ("GRE: Refactor GRE tunneling code.")
Signed-off-by: Eric Dumazet <***@google.com>
Reported-by: syzbot <***@googlegroups.com>
Signed-off-by: David S. Miller <***@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>
---
net/ipv4/gre_demux.c | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)

--- a/net/ipv4/gre_demux.c
+++ b/net/ipv4/gre_demux.c
@@ -56,7 +56,9 @@ int gre_del_protocol(const struct gre_pr
}
EXPORT_SYMBOL_GPL(gre_del_protocol);

-/* Fills in tpi and returns header length to be pulled. */
+/* Fills in tpi and returns header length to be pulled.
+ * Note that caller must use pskb_may_pull() before pulling GRE header.
+ */
int gre_parse_header(struct sk_buff *skb, struct tnl_ptk_info *tpi,
bool *csum_err, __be16 proto, int nhs)
{
@@ -110,8 +112,14 @@ int gre_parse_header(struct sk_buff *skb
* - When dealing with WCCPv2, Skip extra 4 bytes in GRE header
*/
if (greh->flags == 0 && tpi->proto == htons(ETH_P_WCCP)) {
+ u8 _val, *val;
+
+ val = skb_header_pointer(skb, nhs + hdr_len,
+ sizeof(_val), &_val);
+ if (!val)
+ return -EINVAL;
tpi->proto = proto;
- if ((*(u8 *)options & 0xF0) != 0x40)
+ if ((*val & 0xF0) != 0x40)
hdr_len += 4;
}
tpi->hdr_len = hdr_len;
Greg Kroah-Hartman
2020-03-17 10:53:38 UTC
Permalink
From: Dmitry Yakunin <***@yandex-team.ru>

[ Upstream commit 83f73c5bb7b9a9135173f0ba2b1aa00c06664ff9 ]

In commit 1ec17dbd90f8 ("inet_diag: fix reporting cgroup classid and
fallback to priority") croup classid reporting was fixed. But this works
only for TCP sockets because for other socket types icsk parameter can
be NULL and classid code path is skipped. This change moves classid
handling to inet_diag_msg_attrs_fill() function.

Also inet_diag_msg_attrs_size() helper was added and addends in
nlmsg_new() were reordered to save order from inet_sk_diag_fill().

Fixes: 1ec17dbd90f8 ("inet_diag: fix reporting cgroup classid and fallback to priority")
Signed-off-by: Dmitry Yakunin <***@yandex-team.ru>
Reviewed-by: Konstantin Khlebnikov <***@yandex-team.ru>
Signed-off-by: David S. Miller <***@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>
---
include/linux/inet_diag.h | 18 ++++++++++++------
net/ipv4/inet_diag.c | 44 ++++++++++++++++++++------------------------
net/ipv4/raw_diag.c | 5 +++--
net/ipv4/udp_diag.c | 5 +++--
net/sctp/diag.c | 8 ++------
5 files changed, 40 insertions(+), 40 deletions(-)

--- a/include/linux/inet_diag.h
+++ b/include/linux/inet_diag.h
@@ -2,15 +2,10 @@
#ifndef _INET_DIAG_H_
#define _INET_DIAG_H_ 1

+#include <net/netlink.h>
#include <uapi/linux/inet_diag.h>

-struct net;
-struct sock;
struct inet_hashinfo;
-struct nlattr;
-struct nlmsghdr;
-struct sk_buff;
-struct netlink_callback;

struct inet_diag_handler {
void (*dump)(struct sk_buff *skb,
@@ -62,6 +57,17 @@ int inet_diag_bc_sk(const struct nlattr

void inet_diag_msg_common_fill(struct inet_diag_msg *r, struct sock *sk);

+static inline size_t inet_diag_msg_attrs_size(void)
+{
+ return nla_total_size(1) /* INET_DIAG_SHUTDOWN */
+ + nla_total_size(1) /* INET_DIAG_TOS */
+#if IS_ENABLED(CONFIG_IPV6)
+ + nla_total_size(1) /* INET_DIAG_TCLASS */
+ + nla_total_size(1) /* INET_DIAG_SKV6ONLY */
+#endif
+ + nla_total_size(4) /* INET_DIAG_MARK */
+ + nla_total_size(4); /* INET_DIAG_CLASS_ID */
+}
int inet_diag_msg_attrs_fill(struct sock *sk, struct sk_buff *skb,
struct inet_diag_msg *r, int ext,
struct user_namespace *user_ns, bool net_admin);
--- a/net/ipv4/inet_diag.c
+++ b/net/ipv4/inet_diag.c
@@ -100,13 +100,9 @@ static size_t inet_sk_attr_size(struct s
aux = handler->idiag_get_aux_size(sk, net_admin);

return nla_total_size(sizeof(struct tcp_info))
- + nla_total_size(1) /* INET_DIAG_SHUTDOWN */
- + nla_total_size(1) /* INET_DIAG_TOS */
- + nla_total_size(1) /* INET_DIAG_TCLASS */
- + nla_total_size(4) /* INET_DIAG_MARK */
- + nla_total_size(4) /* INET_DIAG_CLASS_ID */
- + nla_total_size(sizeof(struct inet_diag_meminfo))
+ nla_total_size(sizeof(struct inet_diag_msg))
+ + inet_diag_msg_attrs_size()
+ + nla_total_size(sizeof(struct inet_diag_meminfo))
+ nla_total_size(SK_MEMINFO_VARS * sizeof(u32))
+ nla_total_size(TCP_CA_NAME_MAX)
+ nla_total_size(sizeof(struct tcpvegas_info))
@@ -147,6 +143,24 @@ int inet_diag_msg_attrs_fill(struct sock
if (net_admin && nla_put_u32(skb, INET_DIAG_MARK, sk->sk_mark))
goto errout;

+ if (ext & (1 << (INET_DIAG_CLASS_ID - 1)) ||
+ ext & (1 << (INET_DIAG_TCLASS - 1))) {
+ u32 classid = 0;
+
+#ifdef CONFIG_SOCK_CGROUP_DATA
+ classid = sock_cgroup_classid(&sk->sk_cgrp_data);
+#endif
+ /* Fallback to socket priority if class id isn't set.
+ * Classful qdiscs use it as direct reference to class.
+ * For cgroup2 classid is always zero.
+ */
+ if (!classid)
+ classid = sk->sk_priority;
+
+ if (nla_put_u32(skb, INET_DIAG_CLASS_ID, classid))
+ goto errout;
+ }
+
r->idiag_uid = from_kuid_munged(user_ns, sock_i_uid(sk));
r->idiag_inode = sock_i_ino(sk);

@@ -284,24 +298,6 @@ int inet_sk_diag_fill(struct sock *sk, s
goto errout;
}

- if (ext & (1 << (INET_DIAG_CLASS_ID - 1)) ||
- ext & (1 << (INET_DIAG_TCLASS - 1))) {
- u32 classid = 0;
-
-#ifdef CONFIG_SOCK_CGROUP_DATA
- classid = sock_cgroup_classid(&sk->sk_cgrp_data);
-#endif
- /* Fallback to socket priority if class id isn't set.
- * Classful qdiscs use it as direct reference to class.
- * For cgroup2 classid is always zero.
- */
- if (!classid)
- classid = sk->sk_priority;
-
- if (nla_put_u32(skb, INET_DIAG_CLASS_ID, classid))
- goto errout;
- }
-
out:
nlmsg_end(skb, nlh);
return 0;
--- a/net/ipv4/raw_diag.c
+++ b/net/ipv4/raw_diag.c
@@ -100,8 +100,9 @@ static int raw_diag_dump_one(struct sk_b
if (IS_ERR(sk))
return PTR_ERR(sk);

- rep = nlmsg_new(sizeof(struct inet_diag_msg) +
- sizeof(struct inet_diag_meminfo) + 64,
+ rep = nlmsg_new(nla_total_size(sizeof(struct inet_diag_msg)) +
+ inet_diag_msg_attrs_size() +
+ nla_total_size(sizeof(struct inet_diag_meminfo)) + 64,
GFP_KERNEL);
if (!rep) {
sock_put(sk);
--- a/net/ipv4/udp_diag.c
+++ b/net/ipv4/udp_diag.c
@@ -64,8 +64,9 @@ static int udp_dump_one(struct udp_table
goto out;

err = -ENOMEM;
- rep = nlmsg_new(sizeof(struct inet_diag_msg) +
- sizeof(struct inet_diag_meminfo) + 64,
+ rep = nlmsg_new(nla_total_size(sizeof(struct inet_diag_msg)) +
+ inet_diag_msg_attrs_size() +
+ nla_total_size(sizeof(struct inet_diag_meminfo)) + 64,
GFP_KERNEL);
if (!rep)
goto out;
--- a/net/sctp/diag.c
+++ b/net/sctp/diag.c
@@ -237,15 +237,11 @@ static size_t inet_assoc_attr_size(struc
addrcnt++;

return nla_total_size(sizeof(struct sctp_info))
- + nla_total_size(1) /* INET_DIAG_SHUTDOWN */
- + nla_total_size(1) /* INET_DIAG_TOS */
- + nla_total_size(1) /* INET_DIAG_TCLASS */
- + nla_total_size(4) /* INET_DIAG_MARK */
- + nla_total_size(4) /* INET_DIAG_CLASS_ID */
+ nla_total_size(addrlen * asoc->peer.transport_count)
+ nla_total_size(addrlen * addrcnt)
- + nla_total_size(sizeof(struct inet_diag_meminfo))
+ nla_total_size(sizeof(struct inet_diag_msg))
+ + inet_diag_msg_attrs_size()
+ + nla_total_size(sizeof(struct inet_diag_meminfo))
+ 64;
}
Greg Kroah-Hartman
2020-03-17 10:53:39 UTC
Permalink
From: Hangbin Liu <***@gmail.com>

[ Upstream commit 60380488e4e0b95e9e82aa68aa9705baa86de84c ]

Rafał found an issue that for non-Ethernet interface, if we down and up
frequently, the memory will be consumed slowly.

The reason is we add allnodes/allrouters addressed in multicast list in
ipv6_add_dev(). When link down, we call ipv6_mc_down(), store all multicast
addresses via mld_add_delrec(). But when link up, we don't call ipv6_mc_up()
for non-Ethernet interface to remove the addresses. This makes idev->mc_tomb
getting bigger and bigger. The call stack looks like:

addrconf_notify(NETDEV_REGISTER)
ipv6_add_dev
ipv6_dev_mc_inc(ff01::1)
ipv6_dev_mc_inc(ff02::1)
ipv6_dev_mc_inc(ff02::2)

addrconf_notify(NETDEV_UP)
addrconf_dev_config
/* Alas, we support only Ethernet autoconfiguration. */
return;

addrconf_notify(NETDEV_DOWN)
addrconf_ifdown
ipv6_mc_down
igmp6_group_dropped(ff02::2)
mld_add_delrec(ff02::2)
igmp6_group_dropped(ff02::1)
igmp6_group_dropped(ff01::1)

After investigating, I can't found a rule to disable multicast on
non-Ethernet interface. In RFC2460, the link could be Ethernet, PPP, ATM,
tunnels, etc. In IPv4, it doesn't check the dev type when calls ip_mc_up()
in inetdev_event(). Even for IPv6, we don't check the dev type and call
ipv6_add_dev(), ipv6_dev_mc_inc() after register device.

So I think it's OK to fix this memory consumer by calling ipv6_mc_up() for
non-Ethernet interface.

v2: Also check IFF_MULTICAST flag to make sure the interface supports
multicast

Reported-by: Rafał Miłecki <***@gmail.com>
Tested-by: Rafał Miłecki <***@gmail.com>
Fixes: 74235a25c673 ("[IPV6] addrconf: Fix IPv6 on tuntap tunnels")
Fixes: 1666d49e1d41 ("mld: do not remove mld souce list info when set link down")
Signed-off-by: Hangbin Liu <***@gmail.com>
Signed-off-by: David S. Miller <***@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>
---
net/ipv6/addrconf.c | 4 ++++
1 file changed, 4 insertions(+)

--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -3345,6 +3345,10 @@ static void addrconf_dev_config(struct n
(dev->type != ARPHRD_NONE) &&
(dev->type != ARPHRD_RAWIP)) {
/* Alas, we support only Ethernet autoconfiguration. */
+ idev = __in6_dev_get(dev);
+ if (!IS_ERR_OR_NULL(idev) && dev->flags & IFF_UP &&
+ dev->flags & IFF_MULTICAST)
+ ipv6_mc_up(idev);
return;
}
Greg Kroah-Hartman
2020-03-17 10:53:44 UTC
Permalink
From: Mahesh Bandewar <***@google.com>

[ Upstream commit ce9a4186f9ac475c415ffd20348176a4ea366670 ]

The Rx bound multicast packets are deferred to a workqueue and
macvlan can also suffer from the same attack that was discovered
by Syzbot for IPvlan. This solution is not as effective as in
IPvlan. IPvlan defers all (Tx and Rx) multicast packet processing
to a workqueue while macvlan does this way only for the Rx. This
fix should address the Rx codition to certain extent.

Tx is still suseptible. Tx multicast processing happens when
.ndo_start_xmit is called, hence we cannot add cond_resched().
However, it's not that severe since the user which is generating
/ flooding will be affected the most.

Fixes: 412ca1550cbe ("macvlan: Move broadcasts into a work queue")
Signed-off-by: Mahesh Bandewar <***@google.com>
Signed-off-by: David S. Miller <***@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>
---
drivers/net/macvlan.c | 2 ++
1 file changed, 2 insertions(+)

--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -334,6 +334,8 @@ static void macvlan_process_broadcast(st
if (src)
dev_put(src->dev);
consume_skb(skb);
+
+ cond_resched();
}
}
Greg Kroah-Hartman
2020-03-17 10:53:53 UTC
Permalink
From: Dan Carpenter <***@oracle.com>

[ Upstream commit a3aefbfe45751bf7b338c181b97608e276b5bb73 ]

This is similar to commit 674d9de02aa7 ("NFC: Fix possible memory
corruption when handling SHDLC I-Frame commands") and commit d7ee81ad09f0
("NFC: nci: Add some bounds checking in nci_hci_cmd_received()") which
added range checks on "pipe".

The "pipe" variable comes skb->data[0] in nfc_hci_msg_rx_work().
It's in the 0-255 range. We're using it as the array index into the
hdev->pipes[] array which has NFC_HCI_MAX_PIPES (128) members.

Fixes: 118278f20aa8 ("NFC: hci: Add pipes table to reference them with a tuple {gate, host}")
Signed-off-by: Dan Carpenter <***@oracle.com>
Signed-off-by: David S. Miller <***@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>
---
net/nfc/hci/core.c | 19 ++++++++++++++++---
1 file changed, 16 insertions(+), 3 deletions(-)

--- a/net/nfc/hci/core.c
+++ b/net/nfc/hci/core.c
@@ -181,13 +181,20 @@ exit:
void nfc_hci_cmd_received(struct nfc_hci_dev *hdev, u8 pipe, u8 cmd,
struct sk_buff *skb)
{
- u8 gate = hdev->pipes[pipe].gate;
u8 status = NFC_HCI_ANY_OK;
struct hci_create_pipe_resp *create_info;
struct hci_delete_pipe_noti *delete_info;
struct hci_all_pipe_cleared_noti *cleared_info;
+ u8 gate;

- pr_debug("from gate %x pipe %x cmd %x\n", gate, pipe, cmd);
+ pr_debug("from pipe %x cmd %x\n", pipe, cmd);
+
+ if (pipe >= NFC_HCI_MAX_PIPES) {
+ status = NFC_HCI_ANY_E_NOK;
+ goto exit;
+ }
+
+ gate = hdev->pipes[pipe].gate;

switch (cmd) {
case NFC_HCI_ADM_NOTIFY_PIPE_CREATED:
@@ -375,8 +382,14 @@ void nfc_hci_event_received(struct nfc_h
struct sk_buff *skb)
{
int r = 0;
- u8 gate = hdev->pipes[pipe].gate;
+ u8 gate;
+
+ if (pipe >= NFC_HCI_MAX_PIPES) {
+ pr_err("Discarded event %x to invalid pipe %x\n", event, pipe);
+ goto exit;
+ }

+ gate = hdev->pipes[pipe].gate;
if (gate == NFC_HCI_INVALID_GATE) {
pr_err("Discarded event %x to unopened pipe %x\n", event, pipe);
goto exit;
Greg Kroah-Hartman
2020-03-17 10:53:54 UTC
Permalink
From: Willem de Bruijn <***@google.com>

[ Upstream commit 46e4c421a053c36bf7a33dda2272481bcaf3eed3 ]

In one error case, tpacket_rcv drops packets after incrementing the
ring producer index.

If this happens, it does not update tp_status to TP_STATUS_USER and
thus the reader is stalled for an iteration of the ring, causing out
of order arrival.

The only such error path is when virtio_net_hdr_from_skb fails due
to encountering an unknown GSO type.

Signed-off-by: Willem de Bruijn <***@google.com>
Signed-off-by: David S. Miller <***@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>
---
net/packet/af_packet.c | 13 +++++++------
1 file changed, 7 insertions(+), 6 deletions(-)

--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -2273,6 +2273,13 @@ static int tpacket_rcv(struct sk_buff *s
TP_STATUS_KERNEL, (macoff+snaplen));
if (!h.raw)
goto drop_n_account;
+
+ if (do_vnet &&
+ virtio_net_hdr_from_skb(skb, h.raw + macoff -
+ sizeof(struct virtio_net_hdr),
+ vio_le(), true, 0))
+ goto drop_n_account;
+
if (po->tp_version <= TPACKET_V2) {
packet_increment_rx_head(po, &po->rx_ring);
/*
@@ -2285,12 +2292,6 @@ static int tpacket_rcv(struct sk_buff *s
status |= TP_STATUS_LOSING;
}

- if (do_vnet &&
- virtio_net_hdr_from_skb(skb, h.raw + macoff -
- sizeof(struct virtio_net_hdr),
- vio_le(), true, 0))
- goto drop_n_account;
-
po->stats.stats1.tp_packets++;
if (copy_skb) {
status |= TP_STATUS_COPY;
Greg Kroah-Hartman
2020-03-17 10:53:55 UTC
Permalink
From: Jonas Gorski <***@gmail.com>

[ Upstream commit 43de81b0601df7d7988d3f5617ee0987df65c883 ]

719655a14971 ("net: phy: Replace phy driver features u32 with link_mode
bitmap") was a bit over-eager and also removed the second phy driver's
name, resulting in a nasty OOPS on registration:

[ 1.319854] CPU 0 Unable to handle kernel paging request at virtual address 00000000, epc == 804dd50c, ra == 804dd4f0
[ 1.330859] Oops[#1]:
[ 1.333138] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.4.22 #0
[ 1.339217] $ 0 : 00000000 00000001 87ca7f00 805c1874
[ 1.344590] $ 4 : 00000000 00000047 00585000 8701f800
[ 1.349965] $ 8 : 8701f800 804f4a5c 00000003 64726976
[ 1.355341] $12 : 00000001 00000000 00000000 00000114
[ 1.360718] $16 : 87ca7f80 00000000 00000000 80639fe4
[ 1.366093] $20 : 00000002 00000000 806441d0 80b90000
[ 1.371470] $24 : 00000000 00000000
[ 1.376847] $28 : 87c1e000 87c1fda0 80b90000 804dd4f0
[ 1.382224] Hi : d1c8f8da
[ 1.385180] Lo : 5518a480
[ 1.388182] epc : 804dd50c kset_find_obj+0x3c/0x114
[ 1.393345] ra : 804dd4f0 kset_find_obj+0x20/0x114
[ 1.398530] Status: 10008703 KERNEL EXL IE
[ 1.402833] Cause : 00800008 (ExcCode 02)
[ 1.406952] BadVA : 00000000
[ 1.409913] PrId : 0002a075 (Broadcom BMIPS4350)
[ 1.414745] Modules linked in:
[ 1.417895] Process swapper/0 (pid: 1, threadinfo=(ptrval), task=(ptrval), tls=00000000)
[ 1.426214] Stack : 87cec000 80630000 80639370 80640658 80640000 80049af4 80639fe4 8063a0d8
[ 1.434816] 8063a0d8 802ef078 00000002 00000000 806441d0 80b90000 8063a0d8 802ef114
[ 1.443417] 87cea0de 87c1fde0 00000000 804de488 87cea000 8063a0d8 8063a0d8 80334e48
[ 1.452018] 80640000 8063984c 80639bf4 00000000 8065de48 00000001 8063a0d8 80334ed0
[ 1.460620] 806441d0 80b90000 80b90000 802ef164 8065dd70 80620000 80b90000 8065de58
[ 1.469222] ...
[ 1.471734] Call Trace:
[ 1.474255] [<804dd50c>] kset_find_obj+0x3c/0x114
[ 1.479141] [<802ef078>] driver_find+0x1c/0x44
[ 1.483665] [<802ef114>] driver_register+0x74/0x148
[ 1.488719] [<80334e48>] phy_driver_register+0x9c/0xd0
[ 1.493968] [<80334ed0>] phy_drivers_register+0x54/0xe8
[ 1.499345] [<8001061c>] do_one_initcall+0x7c/0x1f4
[ 1.504374] [<80644ed8>] kernel_init_freeable+0x1d4/0x2b4
[ 1.509940] [<804f4e24>] kernel_init+0x10/0xf8
[ 1.514502] [<80018e68>] ret_from_kernel_thread+0x14/0x1c
[ 1.520040] Code: 1060000c 02202025 90650000 <90810000> 24630001 14250004 24840001 14a0fffb 90650000
[ 1.530061]
[ 1.531698] ---[ end trace d52f1717cd29bdc8 ]---

Fix it by readding the name.

Fixes: 719655a14971 ("net: phy: Replace phy driver features u32 with link_mode bitmap")
Signed-off-by: Jonas Gorski <***@gmail.com>
Acked-by: Florian Fainelli <***@gmail.com>
Signed-off-by: David S. Miller <***@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>
---
drivers/net/phy/bcm63xx.c | 1 +
1 file changed, 1 insertion(+)

--- a/drivers/net/phy/bcm63xx.c
+++ b/drivers/net/phy/bcm63xx.c
@@ -73,6 +73,7 @@ static struct phy_driver bcm63xx_driver[
/* same phy as above, with just a different OUI */
.phy_id = 0x002bdc00,
.phy_id_mask = 0xfffffc00,
+ .name = "Broadcom BCM63XX (2)",
/* PHY_BASIC_FEATURES */
.flags = PHY_IS_INTERNAL,
.config_init = bcm63xx_config_init,
Greg Kroah-Hartman
2020-03-17 10:53:56 UTC
Permalink
From: Karsten Graul <***@linux.ibm.com>

[ Upstream commit ece0d7bd74615773268475b6b64d6f1ebbd4b4c6 ]

During IB device removal, cancel the event worker before the device
structure is freed.

Fixes: a4cf0443c414 ("smc: introduce SMC as an IB-client")
Reported-by: syzbot+***@syzkaller.appspotmail.com
Signed-off-by: Karsten Graul <***@linux.ibm.com>
Reviewed-by: Ursula Braun <***@linux.ibm.com>
Reviewed-by: Leon Romanovsky <***@mellanox.com>
Signed-off-by: David S. Miller <***@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>
---
net/smc/smc_ib.c | 1 +
1 file changed, 1 insertion(+)

--- a/net/smc/smc_ib.c
+++ b/net/smc/smc_ib.c
@@ -580,6 +580,7 @@ static void smc_ib_remove_dev(struct ib_
smc_smcr_terminate_all(smcibdev);
smc_ib_cleanup_per_ibdev(smcibdev);
ib_unregister_event_handler(&smcibdev->event_handler);
+ cancel_work_sync(&smcibdev->port_event_work);
kfree(smcibdev);
}
Greg Kroah-Hartman
2020-03-17 10:53:57 UTC
Permalink
From: Remi Pommarel <***@triplefau.lt>

[ Upstream commit b723bd933980f4956dabc8a8d84b3e83be8d094c ]

ACS (auto PAD/FCS stripping) removes FCS off 802.3 packets (LLC) so that
there is no need to manually strip it for such packets. The enhanced DMA
descriptors allow to flag LLC packets so that the receiving callback can
use that to strip FCS manually or not. On the other hand, normal
descriptors do not support that.

Thus in order to not truncate LLC packet ACS should be disabled when
using normal DMA descriptors.

Fixes: 47dd7a540b8a0 ("net: add support for STMicroelectronics Ethernet controllers.")
Signed-off-by: Remi Pommarel <***@triplefau.lt>
Signed-off-by: David S. Miller <***@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>
---
drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

--- a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
@@ -24,6 +24,7 @@
static void dwmac1000_core_init(struct mac_device_info *hw,
struct net_device *dev)
{
+ struct stmmac_priv *priv = netdev_priv(dev);
void __iomem *ioaddr = hw->pcsr;
u32 value = readl(ioaddr + GMAC_CONTROL);
int mtu = dev->mtu;
@@ -35,7 +36,7 @@ static void dwmac1000_core_init(struct m
* Broadcom tags can look like invalid LLC/SNAP packets and cause the
* hardware to truncate packets on reception.
*/
- if (netdev_uses_dsa(dev))
+ if (netdev_uses_dsa(dev) || !priv->plat->enh_desc)
value &= ~GMAC_CONTROL_ACS;

if (mtu > 1500)
Greg Kroah-Hartman
2020-03-17 10:53:58 UTC
Permalink
From: Colin Ian King <***@canonical.com>

[ Upstream commit c0368595c1639947839c0db8294ee96aca0b3b86 ]

Currently the bounds check on index is off by one and can lead to
an out of bounds access on array priv->filters_loc when index is
RXCHK_BRCM_TAG_MAX.

Fixes: bb9051a2b230 ("net: systemport: Add support for WAKE_FILTER")
Signed-off-by: Colin Ian King <***@canonical.com>
Signed-off-by: David S. Miller <***@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>
---
drivers/net/ethernet/broadcom/bcmsysport.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/net/ethernet/broadcom/bcmsysport.c
+++ b/drivers/net/ethernet/broadcom/bcmsysport.c
@@ -2135,7 +2135,7 @@ static int bcm_sysport_rule_set(struct b
return -ENOSPC;

index = find_first_zero_bit(priv->filters, RXCHK_BRCM_TAG_MAX);
- if (index > RXCHK_BRCM_TAG_MAX)
+ if (index >= RXCHK_BRCM_TAG_MAX)
return -ENOSPC;

/* Location is the classification ID, and index is the position
Greg Kroah-Hartman
2020-03-17 10:53:59 UTC
Permalink
From: You-Sheng Yang <***@canonical.com>

[ Upstream commit d64c7a08034b32c285e576208ae44fc3ba3fa7df ]

Dell USB Type C docking WD19/WD19DC attaches additional peripherals as:

/: Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/6p, 5000M
|__ Port 1: Dev 11, If 0, Class=Hub, Driver=hub/4p, 5000M
|__ Port 3: Dev 12, If 0, Class=Hub, Driver=hub/4p, 5000M
|__ Port 4: Dev 13, If 0, Class=Vendor Specific Class,
Driver=r8152, 5000M

where usb 2-1-3 is a hub connecting all USB Type-A/C ports on the dock.

When hotplugging such dock with additional usb devices already attached on
it, the probing process may reset usb 2.1 port, therefore r8152 ethernet
device is also reset. However, during r8152 device init there are several
for-loops that, when it's unable to retrieve hardware registers due to
being disconnected from USB, may take up to 14 seconds each in practice,
and that has to be completed before USB may re-enumerate devices on the
bus. As a result, devices attached to the dock will only be available
after nearly 1 minute after the dock was plugged in:

[ 216.388290] [250] r8152 2-1.4:1.0: usb_probe_interface
[ 216.388292] [250] r8152 2-1.4:1.0: usb_probe_interface - got id
[ 258.830410] r8152 2-1.4:1.0 (unnamed net_device) (uninitialized): PHY not ready
[ 258.830460] r8152 2-1.4:1.0 (unnamed net_device) (uninitialized): Invalid header when reading pass-thru MAC addr
[ 258.830464] r8152 2-1.4:1.0 (unnamed net_device) (uninitialized): Get ether addr fail

This happens in, for example, r8153_init:

static int generic_ocp_read(struct r8152 *tp, u16 index, u16 size,
void *data, u16 type)
{
if (test_bit(RTL8152_UNPLUG, &tp->flags))
return -ENODEV;
...
}

static u16 ocp_read_word(struct r8152 *tp, u16 type, u16 index)
{
u32 data;
...
generic_ocp_read(tp, index, sizeof(tmp), &tmp, type | byen);

data = __le32_to_cpu(tmp);
...
return (u16)data;
}

static void r8153_init(struct r8152 *tp)
{
...
if (test_bit(RTL8152_UNPLUG, &tp->flags))
return;

for (i = 0; i < 500; i++) {
if (ocp_read_word(tp, MCU_TYPE_PLA, PLA_BOOT_CTRL) &
AUTOLOAD_DONE)
break;
msleep(20);
}
...
}

Since ocp_read_word() doesn't check the return status of
generic_ocp_read(), and the only exit condition for the loop is to have
a match in the returned value, such loops will only ends after exceeding
its maximum runs when the device has been marked as disconnected, which
takes 500 * 20ms = 10 seconds in theory, 14 in practice.

To solve this long latency another test to RTL8152_UNPLUG flag should be
added after those 20ms sleep to skip unnecessary loops, so that the device
probe can complete early and proceed to parent port reset/reprobe process.

This can be reproduced on all kernel versions up to latest v5.6-rc2, but
after v5.5-rc7 the reproduce rate is dramatically lowered to 1/30 or less
while it was around 1/2.

Signed-off-by: You-Sheng Yang <***@canonical.com>
Signed-off-by: David S. Miller <***@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>
---
drivers/net/usb/r8152.c | 8 ++++++++
1 file changed, 8 insertions(+)

--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -3220,6 +3220,8 @@ static u16 r8153_phy_status(struct r8152
}

msleep(20);
+ if (test_bit(RTL8152_UNPLUG, &tp->flags))
+ break;
}

return data;
@@ -5401,7 +5403,10 @@ static void r8153_init(struct r8152 *tp)
if (ocp_read_word(tp, MCU_TYPE_PLA, PLA_BOOT_CTRL) &
AUTOLOAD_DONE)
break;
+
msleep(20);
+ if (test_bit(RTL8152_UNPLUG, &tp->flags))
+ break;
}

data = r8153_phy_status(tp, 0);
@@ -5538,7 +5543,10 @@ static void r8153b_init(struct r8152 *tp
if (ocp_read_word(tp, MCU_TYPE_PLA, PLA_BOOT_CTRL) &
AUTOLOAD_DONE)
break;
+
msleep(20);
+ if (test_bit(RTL8152_UNPLUG, &tp->flags))
+ break;
}

data = r8153_phy_status(tp, 0);
Greg Kroah-Hartman
2020-03-17 10:54:00 UTC
Permalink
From: Edward Cree <***@solarflare.com>

[ Upstream commit 4b1bd9db078f7d5332c8601a2f5bd43cf0458fd4 ]

It's a resource, not a parameter, so we can't copy it into the new
channel's TX queues, otherwise aliasing will lead to resource-
management bugs if the channel is subsequently torn down without
being initialised.

Before the Fixes:-tagged commit there was a similar bug with
tsoh_page, but I'm not sure it's worth doing another fix for such
old kernels.

Fixes: e9117e5099ea ("sfc: Firmware-Assisted TSO version 2")
Suggested-by: Derek Shute <***@stratus.com>
Signed-off-by: Edward Cree <***@solarflare.com>
Signed-off-by: David S. Miller <***@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>
---
drivers/net/ethernet/sfc/efx.c | 1 +
1 file changed, 1 insertion(+)

--- a/drivers/net/ethernet/sfc/efx.c
+++ b/drivers/net/ethernet/sfc/efx.c
@@ -525,6 +525,7 @@ efx_copy_channel(const struct efx_channe
if (tx_queue->channel)
tx_queue->channel = channel;
tx_queue->buffer = NULL;
+ tx_queue->cb_page = NULL;
memset(&tx_queue->txd, 0, sizeof(tx_queue->txd));
}
Greg Kroah-Hartman
2020-03-17 10:54:01 UTC
Permalink
From: Eric Dumazet <***@google.com>

[ Upstream commit 110a40dfb708fe940a3f3704d470e431c368d256 ]

Before accessing various fields in IPV4 network header
and TCP header, make sure the packet :

- Has IP version 4 (ip->version == 4)
- Has not a silly network length (ip->ihl >= 5)
- Is big enough to hold network and transport headers
- Has not a silly TCP header size (th->doff >= sizeof(struct tcphdr) / 4)

syzbot reported :

BUG: KMSAN: uninit-value in slhc_compress+0x5b9/0x2e60 drivers/net/slip/slhc.c:270
CPU: 0 PID: 11728 Comm: syz-executor231 Not tainted 5.6.0-rc2-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x1c9/0x220 lib/dump_stack.c:118
kmsan_report+0xf7/0x1e0 mm/kmsan/kmsan_report.c:118
__msan_warning+0x58/0xa0 mm/kmsan/kmsan_instr.c:215
slhc_compress+0x5b9/0x2e60 drivers/net/slip/slhc.c:270
ppp_send_frame drivers/net/ppp/ppp_generic.c:1637 [inline]
__ppp_xmit_process+0x1902/0x2970 drivers/net/ppp/ppp_generic.c:1495
ppp_xmit_process+0x147/0x2f0 drivers/net/ppp/ppp_generic.c:1516
ppp_write+0x6bb/0x790 drivers/net/ppp/ppp_generic.c:512
do_loop_readv_writev fs/read_write.c:717 [inline]
do_iter_write+0x812/0xdc0 fs/read_write.c:1000
compat_writev+0x2df/0x5a0 fs/read_write.c:1351
do_compat_pwritev64 fs/read_write.c:1400 [inline]
__do_compat_sys_pwritev fs/read_write.c:1420 [inline]
__se_compat_sys_pwritev fs/read_write.c:1414 [inline]
__ia32_compat_sys_pwritev+0x349/0x3f0 fs/read_write.c:1414
do_syscall_32_irqs_on arch/x86/entry/common.c:339 [inline]
do_fast_syscall_32+0x3c7/0x6e0 arch/x86/entry/common.c:410
entry_SYSENTER_compat+0x68/0x77 arch/x86/entry/entry_64_compat.S:139
RIP: 0023:0xf7f7cd99
Code: 90 e8 0b 00 00 00 f3 90 0f ae e8 eb f9 8d 74 26 00 89 3c 24 c3 90 90 90 90 90 90 90 90 90 90 90 90 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 eb 0d 90 90 90 90 90 90 90 90 90 90 90 90
RSP: 002b:00000000ffdb84ac EFLAGS: 00000217 ORIG_RAX: 000000000000014e
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000200001c0
RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000003
RBP: 0000000040047459 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000

Uninit was created at:
kmsan_save_stack_with_flags mm/kmsan/kmsan.c:144 [inline]
kmsan_internal_poison_shadow+0x66/0xd0 mm/kmsan/kmsan.c:127
kmsan_slab_alloc+0x8a/0xe0 mm/kmsan/kmsan_hooks.c:82
slab_alloc_node mm/slub.c:2793 [inline]
__kmalloc_node_track_caller+0xb40/0x1200 mm/slub.c:4401
__kmalloc_reserve net/core/skbuff.c:142 [inline]
__alloc_skb+0x2fd/0xac0 net/core/skbuff.c:210
alloc_skb include/linux/skbuff.h:1051 [inline]
ppp_write+0x115/0x790 drivers/net/ppp/ppp_generic.c:500
do_loop_readv_writev fs/read_write.c:717 [inline]
do_iter_write+0x812/0xdc0 fs/read_write.c:1000
compat_writev+0x2df/0x5a0 fs/read_write.c:1351
do_compat_pwritev64 fs/read_write.c:1400 [inline]
__do_compat_sys_pwritev fs/read_write.c:1420 [inline]
__se_compat_sys_pwritev fs/read_write.c:1414 [inline]
__ia32_compat_sys_pwritev+0x349/0x3f0 fs/read_write.c:1414
do_syscall_32_irqs_on arch/x86/entry/common.c:339 [inline]
do_fast_syscall_32+0x3c7/0x6e0 arch/x86/entry/common.c:410
entry_SYSENTER_compat+0x68/0x77 arch/x86/entry/entry_64_compat.S:139

Fixes: b5451d783ade ("slip: Move the SLIP drivers")
Signed-off-by: Eric Dumazet <***@google.com>
Reported-by: syzbot <***@googlegroups.com>
Signed-off-by: David S. Miller <***@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>
---
drivers/net/slip/slhc.c | 14 ++++++++++----
1 file changed, 10 insertions(+), 4 deletions(-)

--- a/drivers/net/slip/slhc.c
+++ b/drivers/net/slip/slhc.c
@@ -232,7 +232,7 @@ slhc_compress(struct slcompress *comp, u
struct cstate *cs = lcs->next;
unsigned long deltaS, deltaA;
short changes = 0;
- int hlen;
+ int nlen, hlen;
unsigned char new_seq[16];
unsigned char *cp = new_seq;
struct iphdr *ip;
@@ -248,6 +248,8 @@ slhc_compress(struct slcompress *comp, u
return isize;

ip = (struct iphdr *) icp;
+ if (ip->version != 4 || ip->ihl < 5)
+ return isize;

/* Bail if this packet isn't TCP, or is an IP fragment */
if (ip->protocol != IPPROTO_TCP || (ntohs(ip->frag_off) & 0x3fff)) {
@@ -258,10 +260,14 @@ slhc_compress(struct slcompress *comp, u
comp->sls_o_tcp++;
return isize;
}
- /* Extract TCP header */
+ nlen = ip->ihl * 4;
+ if (isize < nlen + sizeof(*th))
+ return isize;

- th = (struct tcphdr *)(((unsigned char *)ip) + ip->ihl*4);
- hlen = ip->ihl*4 + th->doff*4;
+ th = (struct tcphdr *)(icp + nlen);
+ if (th->doff < sizeof(struct tcphdr) / 4)
+ return isize;
+ hlen = nlen + th->doff * 4;

/* Bail if the TCP packet isn't `compressible' (i.e., ACK isn't set or
* some other control bit is set). Also uncompressible if
Greg Kroah-Hartman
2020-03-17 10:54:02 UTC
Permalink
From: Vinicius Costa Gomes <***@intel.com>

[ Upstream commit b09fe70ef520e011ba4a64f4b93f948a8f14717b ]

There was a bug that was causing packets to be sent to the driver
without first calling dequeue() on the "child" qdisc. And the KASAN
report below shows that sending a packet without calling dequeue()
leads to bad results.

The problem is that when checking the last qdisc "child" we do not set
the returned skb to NULL, which can cause it to be sent to the driver,
and so after the skb is sent, it may be freed, and in some situations a
reference to it may still be in the child qdisc, because it was never
dequeued.

The crash log looks like this:

[ 19.937538] ==================================================================
[ 19.938300] BUG: KASAN: use-after-free in taprio_dequeue_soft+0x620/0x780
[ 19.938968] Read of size 4 at addr ffff8881128628cc by task swapper/1/0
[ 19.939612]
[ 19.939772] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.6.0-rc3+ #97
[ 19.940397] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qe4
[ 19.941523] Call Trace:
[ 19.941774] <IRQ>
[ 19.941985] dump_stack+0x97/0xe0
[ 19.942323] print_address_description.constprop.0+0x3b/0x60
[ 19.942884] ? taprio_dequeue_soft+0x620/0x780
[ 19.943325] ? taprio_dequeue_soft+0x620/0x780
[ 19.943767] __kasan_report.cold+0x1a/0x32
[ 19.944173] ? taprio_dequeue_soft+0x620/0x780
[ 19.944612] kasan_report+0xe/0x20
[ 19.944954] taprio_dequeue_soft+0x620/0x780
[ 19.945380] __qdisc_run+0x164/0x18d0
[ 19.945749] net_tx_action+0x2c4/0x730
[ 19.946124] __do_softirq+0x268/0x7bc
[ 19.946491] irq_exit+0x17d/0x1b0
[ 19.946824] smp_apic_timer_interrupt+0xeb/0x380
[ 19.947280] apic_timer_interrupt+0xf/0x20
[ 19.947687] </IRQ>
[ 19.947912] RIP: 0010:default_idle+0x2d/0x2d0
[ 19.948345] Code: 00 00 41 56 41 55 65 44 8b 2d 3f 8d 7c 7c 41 54 55 53 0f 1f 44 00 00 e8 b1 b2 c5 fd e9 07 00 3
[ 19.950166] RSP: 0018:ffff88811a3efda0 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff13
[ 19.950909] RAX: 0000000080000000 RBX: ffff88811a3a9600 RCX: ffffffff8385327e
[ 19.951608] RDX: 1ffff110234752c0 RSI: 0000000000000000 RDI: ffffffff8385262f
[ 19.952309] RBP: ffffed10234752c0 R08: 0000000000000001 R09: ffffed10234752c1
[ 19.953009] R10: ffffed10234752c0 R11: ffff88811a3a9607 R12: 0000000000000001
[ 19.953709] R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000
[ 19.954408] ? default_idle_call+0x2e/0x70
[ 19.954816] ? default_idle+0x1f/0x2d0
[ 19.955192] default_idle_call+0x5e/0x70
[ 19.955584] do_idle+0x3d4/0x500
[ 19.955909] ? arch_cpu_idle_exit+0x40/0x40
[ 19.956325] ? _raw_spin_unlock_irqrestore+0x23/0x30
[ 19.956829] ? trace_hardirqs_on+0x30/0x160
[ 19.957242] cpu_startup_entry+0x19/0x20
[ 19.957633] start_secondary+0x2a6/0x380
[ 19.958026] ? set_cpu_sibling_map+0x18b0/0x18b0
[ 19.958486] secondary_startup_64+0xa4/0xb0
[ 19.958921]
[ 19.959078] Allocated by task 33:
[ 19.959412] save_stack+0x1b/0x80
[ 19.959747] __kasan_kmalloc.constprop.0+0xc2/0xd0
[ 19.960222] kmem_cache_alloc+0xe4/0x230
[ 19.960617] __alloc_skb+0x91/0x510
[ 19.960967] ndisc_alloc_skb+0x133/0x330
[ 19.961358] ndisc_send_ns+0x134/0x810
[ 19.961735] addrconf_dad_work+0xad5/0xf80
[ 19.962144] process_one_work+0x78e/0x13a0
[ 19.962551] worker_thread+0x8f/0xfa0
[ 19.962919] kthread+0x2ba/0x3b0
[ 19.963242] ret_from_fork+0x3a/0x50
[ 19.963596]
[ 19.963753] Freed by task 33:
[ 19.964055] save_stack+0x1b/0x80
[ 19.964386] __kasan_slab_free+0x12f/0x180
[ 19.964830] kmem_cache_free+0x80/0x290
[ 19.965231] ip6_mc_input+0x38a/0x4d0
[ 19.965617] ipv6_rcv+0x1a4/0x1d0
[ 19.965948] __netif_receive_skb_one_core+0xf2/0x180
[ 19.966437] netif_receive_skb+0x8c/0x3c0
[ 19.966846] br_handle_frame_finish+0x779/0x1310
[ 19.967302] br_handle_frame+0x42a/0x830
[ 19.967694] __netif_receive_skb_core+0xf0e/0x2a90
[ 19.968167] __netif_receive_skb_one_core+0x96/0x180
[ 19.968658] process_backlog+0x198/0x650
[ 19.969047] net_rx_action+0x2fa/0xaa0
[ 19.969420] __do_softirq+0x268/0x7bc
[ 19.969785]
[ 19.969940] The buggy address belongs to the object at ffff888112862840
[ 19.969940] which belongs to the cache skbuff_head_cache of size 224
[ 19.971202] The buggy address is located 140 bytes inside of
[ 19.971202] 224-byte region [ffff888112862840, ffff888112862920)
[ 19.972344] The buggy address belongs to the page:
[ 19.972820] page:ffffea00044a1800 refcount:1 mapcount:0 mapping:ffff88811a2bd1c0 index:0xffff8881128625c0 compo0
[ 19.973930] flags: 0x8000000000010200(slab|head)
[ 19.974388] raw: 8000000000010200 ffff88811a2ed650 ffff88811a2ed650 ffff88811a2bd1c0
[ 19.975151] raw: ffff8881128625c0 0000000000190013 00000001ffffffff 0000000000000000
[ 19.975915] page dumped because: kasan: bad access detected
[ 19.976461] page_owner tracks the page as allocated
[ 19.976946] page last allocated via order 2, migratetype Unmovable, gfp_mask 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NO)
[ 19.978332] prep_new_page+0x24b/0x330
[ 19.978707] get_page_from_freelist+0x2057/0x2c90
[ 19.979170] __alloc_pages_nodemask+0x218/0x590
[ 19.979619] new_slab+0x9d/0x300
[ 19.979948] ___slab_alloc.constprop.0+0x2f9/0x6f0
[ 19.980421] __slab_alloc.constprop.0+0x30/0x60
[ 19.980870] kmem_cache_alloc+0x201/0x230
[ 19.981269] __alloc_skb+0x91/0x510
[ 19.981620] alloc_skb_with_frags+0x78/0x4a0
[ 19.982043] sock_alloc_send_pskb+0x5eb/0x750
[ 19.982476] unix_stream_sendmsg+0x399/0x7f0
[ 19.982904] sock_sendmsg+0xe2/0x110
[ 19.983262] ____sys_sendmsg+0x4de/0x6d0
[ 19.983660] ___sys_sendmsg+0xe4/0x160
[ 19.984032] __sys_sendmsg+0xab/0x130
[ 19.984396] do_syscall_64+0xe7/0xae0
[ 19.984761] page last free stack trace:
[ 19.985142] __free_pages_ok+0x432/0xbc0
[ 19.985533] qlist_free_all+0x56/0xc0
[ 19.985907] quarantine_reduce+0x149/0x170
[ 19.986315] __kasan_kmalloc.constprop.0+0x9e/0xd0
[ 19.986791] kmem_cache_alloc+0xe4/0x230
[ 19.987182] prepare_creds+0x24/0x440
[ 19.987548] do_faccessat+0x80/0x590
[ 19.987906] do_syscall_64+0xe7/0xae0
[ 19.988276] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 19.988775]
[ 19.988930] Memory state around the buggy address:
[ 19.989402] ffff888112862780: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 19.990111] ffff888112862800: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
[ 19.990822] >ffff888112862880: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 19.991529] ^
[ 19.992081] ffff888112862900: fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc
[ 19.992796] ffff888112862980: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc

Fixes: 5a781ccbd19e ("tc: Add support for configuring the taprio scheduler")
Reported-by: Michael Schmidt <***@eti.uni-siegen.de>
Signed-off-by: Vinicius Costa Gomes <***@intel.com>
Acked-by: Andre Guedes <***@intel.com>
Signed-off-by: David S. Miller <***@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>
---
net/sched/sch_taprio.c | 12 +++++++++---
1 file changed, 9 insertions(+), 3 deletions(-)

--- a/net/sched/sch_taprio.c
+++ b/net/sched/sch_taprio.c
@@ -564,8 +564,10 @@ static struct sk_buff *taprio_dequeue_so
prio = skb->priority;
tc = netdev_get_prio_tc_map(dev, prio);

- if (!(gate_mask & BIT(tc)))
+ if (!(gate_mask & BIT(tc))) {
+ skb = NULL;
continue;
+ }

len = qdisc_pkt_len(skb);
guard = ktime_add_ns(taprio_get_time(q),
@@ -575,13 +577,17 @@ static struct sk_buff *taprio_dequeue_so
* guard band ...
*/
if (gate_mask != TAPRIO_ALL_GATES_OPEN &&
- ktime_after(guard, entry->close_time))
+ ktime_after(guard, entry->close_time)) {
+ skb = NULL;
continue;
+ }

/* ... and no budget. */
if (gate_mask != TAPRIO_ALL_GATES_OPEN &&
- atomic_sub_return(len, &entry->budget) < 0)
+ atomic_sub_return(len, &entry->budget) < 0) {
+ skb = NULL;
continue;
+ }

skb = child->ops->dequeue(child);
if (unlikely(!skb))
Greg Kroah-Hartman
2020-03-17 10:54:03 UTC
Permalink
From: Eric Dumazet <***@google.com>

commit b7469e83d2add567e4e0b063963db185f3167cea upstream.

Similar to commit 38f88c454042 ("bonding/alb: properly access headers
in bond_alb_xmit()"), we need to make sure arp header was pulled
in skb->head before blindly accessing it in rlb_arp_xmit().

Remove arp_pkt() private helper, since it is more readable/obvious
to have the following construct back to back :

if (!pskb_network_may_pull(skb, sizeof(*arp)))
return NULL;
arp = (struct arp_pkt *)skb_network_header(skb);

syzbot reported :

BUG: KMSAN: uninit-value in bond_slave_has_mac_rx include/net/bonding.h:704 [inline]
BUG: KMSAN: uninit-value in rlb_arp_xmit drivers/net/bonding/bond_alb.c:662 [inline]
BUG: KMSAN: uninit-value in bond_alb_xmit+0x575/0x25e0 drivers/net/bonding/bond_alb.c:1477
CPU: 0 PID: 12743 Comm: syz-executor.4 Not tainted 5.6.0-rc2-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x1c9/0x220 lib/dump_stack.c:118
kmsan_report+0xf7/0x1e0 mm/kmsan/kmsan_report.c:118
__msan_warning+0x58/0xa0 mm/kmsan/kmsan_instr.c:215
bond_slave_has_mac_rx include/net/bonding.h:704 [inline]
rlb_arp_xmit drivers/net/bonding/bond_alb.c:662 [inline]
bond_alb_xmit+0x575/0x25e0 drivers/net/bonding/bond_alb.c:1477
__bond_start_xmit drivers/net/bonding/bond_main.c:4257 [inline]
bond_start_xmit+0x85d/0x2f70 drivers/net/bonding/bond_main.c:4282
__netdev_start_xmit include/linux/netdevice.h:4524 [inline]
netdev_start_xmit include/linux/netdevice.h:4538 [inline]
xmit_one net/core/dev.c:3470 [inline]
dev_hard_start_xmit+0x531/0xab0 net/core/dev.c:3486
__dev_queue_xmit+0x37de/0x4220 net/core/dev.c:4063
dev_queue_xmit+0x4b/0x60 net/core/dev.c:4096
packet_snd net/packet/af_packet.c:2967 [inline]
packet_sendmsg+0x8347/0x93b0 net/packet/af_packet.c:2992
sock_sendmsg_nosec net/socket.c:652 [inline]
sock_sendmsg net/socket.c:672 [inline]
__sys_sendto+0xc1b/0xc50 net/socket.c:1998
__do_sys_sendto net/socket.c:2010 [inline]
__se_sys_sendto+0x107/0x130 net/socket.c:2006
__x64_sys_sendto+0x6e/0x90 net/socket.c:2006
do_syscall_64+0xb8/0x160 arch/x86/entry/common.c:296
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x45c479
Code: ad b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 7b b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007fc77ffbbc78 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00007fc77ffbc6d4 RCX: 000000000045c479
RDX: 000000000000000e RSI: 00000000200004c0 RDI: 0000000000000003
RBP: 000000000076bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
R13: 0000000000000a04 R14: 00000000004cc7b0 R15: 000000000076bf2c

Uninit was created at:
kmsan_save_stack_with_flags mm/kmsan/kmsan.c:144 [inline]
kmsan_internal_poison_shadow+0x66/0xd0 mm/kmsan/kmsan.c:127
kmsan_slab_alloc+0x8a/0xe0 mm/kmsan/kmsan_hooks.c:82
slab_alloc_node mm/slub.c:2793 [inline]
__kmalloc_node_track_caller+0xb40/0x1200 mm/slub.c:4401
__kmalloc_reserve net/core/skbuff.c:142 [inline]
__alloc_skb+0x2fd/0xac0 net/core/skbuff.c:210
alloc_skb include/linux/skbuff.h:1051 [inline]
alloc_skb_with_frags+0x18c/0xa70 net/core/skbuff.c:5766
sock_alloc_send_pskb+0xada/0xc60 net/core/sock.c:2242
packet_alloc_skb net/packet/af_packet.c:2815 [inline]
packet_snd net/packet/af_packet.c:2910 [inline]
packet_sendmsg+0x66a0/0x93b0 net/packet/af_packet.c:2992
sock_sendmsg_nosec net/socket.c:652 [inline]
sock_sendmsg net/socket.c:672 [inline]
__sys_sendto+0xc1b/0xc50 net/socket.c:1998
__do_sys_sendto net/socket.c:2010 [inline]
__se_sys_sendto+0x107/0x130 net/socket.c:2006
__x64_sys_sendto+0x6e/0x90 net/socket.c:2006
do_syscall_64+0xb8/0x160 arch/x86/entry/common.c:296
entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <***@google.com>
Reported-by: syzbot <***@googlegroups.com>
Cc: Jay Vosburgh <***@gmail.com>
Cc: Veaceslav Falico <***@gmail.com>
Cc: Andy Gospodarek <***@greyhouse.net>
Signed-off-by: David S. Miller <***@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>
---
drivers/net/bonding/bond_alb.c | 20 ++++++++++----------
1 file changed, 10 insertions(+), 10 deletions(-)

--- a/drivers/net/bonding/bond_alb.c
+++ b/drivers/net/bonding/bond_alb.c
@@ -50,11 +50,6 @@ struct arp_pkt {
};
#pragma pack()

-static inline struct arp_pkt *arp_pkt(const struct sk_buff *skb)
-{
- return (struct arp_pkt *)skb_network_header(skb);
-}
-
/* Forward declaration */
static void alb_send_learning_packets(struct slave *slave, u8 mac_addr[],
bool strict_match);
@@ -553,10 +548,11 @@ static void rlb_req_update_subnet_client
spin_unlock(&bond->mode_lock);
}

-static struct slave *rlb_choose_channel(struct sk_buff *skb, struct bonding *bond)
+static struct slave *rlb_choose_channel(struct sk_buff *skb,
+ struct bonding *bond,
+ const struct arp_pkt *arp)
{
struct alb_bond_info *bond_info = &(BOND_ALB_INFO(bond));
- struct arp_pkt *arp = arp_pkt(skb);
struct slave *assigned_slave, *curr_active_slave;
struct rlb_client_info *client_info;
u32 hash_index = 0;
@@ -653,8 +649,12 @@ static struct slave *rlb_choose_channel(
*/
static struct slave *rlb_arp_xmit(struct sk_buff *skb, struct bonding *bond)
{
- struct arp_pkt *arp = arp_pkt(skb);
struct slave *tx_slave = NULL;
+ struct arp_pkt *arp;
+
+ if (!pskb_network_may_pull(skb, sizeof(*arp)))
+ return NULL;
+ arp = (struct arp_pkt *)skb_network_header(skb);

/* Don't modify or load balance ARPs that do not originate locally
* (e.g.,arrive via a bridge).
@@ -664,7 +664,7 @@ static struct slave *rlb_arp_xmit(struct

if (arp->op_code == htons(ARPOP_REPLY)) {
/* the arp must be sent on the selected rx channel */
- tx_slave = rlb_choose_channel(skb, bond);
+ tx_slave = rlb_choose_channel(skb, bond, arp);
if (tx_slave)
bond_hw_addr_copy(arp->mac_src, tx_slave->dev->dev_addr,
tx_slave->dev->addr_len);
@@ -676,7 +676,7 @@ static struct slave *rlb_arp_xmit(struct
* When the arp reply is received the entry will be updated
* with the correct unicast address of the client.
*/
- tx_slave = rlb_choose_channel(skb, bond);
+ tx_slave = rlb_choose_channel(skb, bond, arp);

/* The ARP reply packets must be delayed so that
* they can cancel out the influence of the ARP request.
Greg Kroah-Hartman
2020-03-17 10:54:04 UTC
Permalink
From: Vasundhara Volam <vasundhara-***@broadcom.com>

[ Upstream commit a9b952d267e59a3b405e644930f46d252cea7122 ]

MTU changes may affect the number of IRQs so we must call
bnxt_close_nic()/bnxt_open_nic() with the irq_re_init parameter
set to true. The reason is that a larger MTU may require
aggregation rings not needed with smaller MTU. We may not be
able to allocate the required number of aggregation rings and
so we reduce the number of channels which will change the number
of IRQs. Without this patch, it may crash eventually in
pci_disable_msix() when the IRQs are not properly unwound.

Fixes: c0c050c58d84 ("bnxt_en: New Broadcom ethernet driver.")
Signed-off-by: Vasundhara Volam <vasundhara-***@broadcom.com>
Signed-off-by: Michael Chan <***@broadcom.com>
Signed-off-by: David S. Miller <***@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>
---
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -10973,13 +10973,13 @@ static int bnxt_change_mtu(struct net_de
struct bnxt *bp = netdev_priv(dev);

if (netif_running(dev))
- bnxt_close_nic(bp, false, false);
+ bnxt_close_nic(bp, true, false);

dev->mtu = new_mtu;
bnxt_set_ring_params(bp);

if (netif_running(dev))
- return bnxt_open_nic(bp, false, false);
+ return bnxt_open_nic(bp, true, false);

return 0;
}
Greg Kroah-Hartman
2020-03-17 10:54:05 UTC
Permalink
From: Edwin Peer <***@broadcom.com>

[ Upstream commit 22630e28f9c2b55abd217869cc0696def89f2284 ]

After bnxt_hwrm_do_send_message() was updated to return standard error
codes in a recent commit, a regression in bnxt_flash_package_from_file()
was introduced. The return value does not properly reflect all
possible firmware errors when calling firmware to flash the package.

Fix it by consolidating all errors in one local variable rc instead
of having 2 variables for different errors.

Fixes: d4f1420d3656 ("bnxt_en: Convert error code in firmware message response to standard code.")
Signed-off-by: Edwin Peer <***@broadcom.com>
Signed-off-by: Michael Chan <***@broadcom.com>
Signed-off-by: David S. Miller <***@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>
---
drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c | 24 ++++++++++------------
1 file changed, 11 insertions(+), 13 deletions(-)

--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
@@ -2007,8 +2007,8 @@ int bnxt_flash_package_from_file(struct
struct hwrm_nvm_install_update_output *resp = bp->hwrm_cmd_resp_addr;
struct hwrm_nvm_install_update_input install = {0};
const struct firmware *fw;
- int rc, hwrm_err = 0;
u32 item_len;
+ int rc = 0;
u16 index;

bnxt_hwrm_fw_set_time(bp);
@@ -2052,15 +2052,14 @@ int bnxt_flash_package_from_file(struct
memcpy(kmem, fw->data, fw->size);
modify.host_src_addr = cpu_to_le64(dma_handle);

- hwrm_err = hwrm_send_message(bp, &modify,
- sizeof(modify),
- FLASH_PACKAGE_TIMEOUT);
+ rc = hwrm_send_message(bp, &modify, sizeof(modify),
+ FLASH_PACKAGE_TIMEOUT);
dma_free_coherent(&bp->pdev->dev, fw->size, kmem,
dma_handle);
}
}
release_firmware(fw);
- if (rc || hwrm_err)
+ if (rc)
goto err_exit;

if ((install_type & 0xffff) == 0)
@@ -2069,20 +2068,19 @@ int bnxt_flash_package_from_file(struct
install.install_type = cpu_to_le32(install_type);

mutex_lock(&bp->hwrm_cmd_lock);
- hwrm_err = _hwrm_send_message(bp, &install, sizeof(install),
- INSTALL_PACKAGE_TIMEOUT);
- if (hwrm_err) {
+ rc = _hwrm_send_message(bp, &install, sizeof(install),
+ INSTALL_PACKAGE_TIMEOUT);
+ if (rc) {
u8 error_code = ((struct hwrm_err_output *)resp)->cmd_err;

if (resp->error_code && error_code ==
NVM_INSTALL_UPDATE_CMD_ERR_CODE_FRAG_ERR) {
install.flags |= cpu_to_le16(
NVM_INSTALL_UPDATE_REQ_FLAGS_ALLOWED_TO_DEFRAG);
- hwrm_err = _hwrm_send_message(bp, &install,
- sizeof(install),
- INSTALL_PACKAGE_TIMEOUT);
+ rc = _hwrm_send_message(bp, &install, sizeof(install),
+ INSTALL_PACKAGE_TIMEOUT);
}
- if (hwrm_err)
+ if (rc)
goto flash_pkg_exit;
}

@@ -2094,7 +2092,7 @@ int bnxt_flash_package_from_file(struct
flash_pkg_exit:
mutex_unlock(&bp->hwrm_cmd_lock);
err_exit:
- if (hwrm_err == -EACCES)
+ if (rc == -EACCES)
bnxt_print_admin_err(bp);
return rc;
}
Greg Kroah-Hartman
2020-03-17 10:54:06 UTC
Permalink
From: Shakeel Butt <***@google.com>

[ Upstream commit e876ecc67db80dfdb8e237f71e5b43bb88ae549c ]

We are testing network memory accounting in our setup and noticed
inconsistent network memory usage and often unrelated cgroups network
usage correlates with testing workload. On further inspection, it
seems like mem_cgroup_sk_alloc() and cgroup_sk_alloc() are broken in
irq context specially for cgroup v1.

mem_cgroup_sk_alloc() and cgroup_sk_alloc() can be called in irq context
and kind of assumes that this can only happen from sk_clone_lock()
and the source sock object has already associated cgroup. However in
cgroup v1, where network memory accounting is opt-in, the source sock
can be unassociated with any cgroup and the new cloned sock can get
associated with unrelated interrupted cgroup.

Cgroup v2 can also suffer if the source sock object was created by
process in the root cgroup or if sk_alloc() is called in irq context.
The fix is to just do nothing in interrupt.

WARNING: Please note that about half of the TCP sockets are allocated
from the IRQ context, so, memory used by such sockets will not be
accouted by the memcg.

The stack trace of mem_cgroup_sk_alloc() from IRQ-context:

CPU: 70 PID: 12720 Comm: ssh Tainted: 5.6.0-smp-DEV #1
Hardware name: ...
Call Trace:
<IRQ>
dump_stack+0x57/0x75
mem_cgroup_sk_alloc+0xe9/0xf0
sk_clone_lock+0x2a7/0x420
inet_csk_clone_lock+0x1b/0x110
tcp_create_openreq_child+0x23/0x3b0
tcp_v6_syn_recv_sock+0x88/0x730
tcp_check_req+0x429/0x560
tcp_v6_rcv+0x72d/0xa40
ip6_protocol_deliver_rcu+0xc9/0x400
ip6_input+0x44/0xd0
? ip6_protocol_deliver_rcu+0x400/0x400
ip6_rcv_finish+0x71/0x80
ipv6_rcv+0x5b/0xe0
? ip6_sublist_rcv+0x2e0/0x2e0
process_backlog+0x108/0x1e0
net_rx_action+0x26b/0x460
__do_softirq+0x104/0x2a6
do_softirq_own_stack+0x2a/0x40
</IRQ>
do_softirq.part.19+0x40/0x50
__local_bh_enable_ip+0x51/0x60
ip6_finish_output2+0x23d/0x520
? ip6table_mangle_hook+0x55/0x160
__ip6_finish_output+0xa1/0x100
ip6_finish_output+0x30/0xd0
ip6_output+0x73/0x120
? __ip6_finish_output+0x100/0x100
ip6_xmit+0x2e3/0x600
? ipv6_anycast_cleanup+0x50/0x50
? inet6_csk_route_socket+0x136/0x1e0
? skb_free_head+0x1e/0x30
inet6_csk_xmit+0x95/0xf0
__tcp_transmit_skb+0x5b4/0xb20
__tcp_send_ack.part.60+0xa3/0x110
tcp_send_ack+0x1d/0x20
tcp_rcv_state_process+0xe64/0xe80
? tcp_v6_connect+0x5d1/0x5f0
tcp_v6_do_rcv+0x1b1/0x3f0
? tcp_v6_do_rcv+0x1b1/0x3f0
__release_sock+0x7f/0xd0
release_sock+0x30/0xa0
__inet_stream_connect+0x1c3/0x3b0
? prepare_to_wait+0xb0/0xb0
inet_stream_connect+0x3b/0x60
__sys_connect+0x101/0x120
? __sys_getsockopt+0x11b/0x140
__x64_sys_connect+0x1a/0x20
do_syscall_64+0x51/0x200
entry_SYSCALL_64_after_hwframe+0x44/0xa9

The stack trace of mem_cgroup_sk_alloc() from IRQ-context:
Fixes: 2d7580738345 ("mm: memcontrol: consolidate cgroup socket tracking")
Fixes: d979a39d7242 ("cgroup: duplicate cgroup reference when cloning sockets")
Signed-off-by: Shakeel Butt <***@google.com>
Reviewed-by: Roman Gushchin <***@fb.com>
Signed-off-by: David S. Miller <***@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>
---
kernel/cgroup/cgroup.c | 4 ++++
mm/memcontrol.c | 4 ++++
2 files changed, 8 insertions(+)

--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -6263,6 +6263,10 @@ void cgroup_sk_alloc(struct sock_cgroup_
return;
}

+ /* Don't associate the sock with unrelated interrupted task's cgroup. */
+ if (in_interrupt())
+ return;
+
rcu_read_lock();

while (true) {
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6697,6 +6697,10 @@ void mem_cgroup_sk_alloc(struct sock *sk
return;
}

+ /* Do not associate the sock with unrelated interrupted task's memcg. */
+ if (in_interrupt())
+ return;
+
rcu_read_lock();
memcg = mem_cgroup_from_task(current);
if (memcg == root_mem_cgroup)
Greg Kroah-Hartman
2020-03-17 10:54:08 UTC
Permalink
From: Eric Dumazet <***@google.com>

commit 06669ea346e476a5339033d77ef175566a40efbb upstream.

Locking newsk while still holding the listener lock triggered
a lockdep splat [1]

We can simply move the memcg code after we release the listener lock,
as this can also help if multiple threads are sharing a common listener.

Also fix a typo while reading socket sk_rmem_alloc.

[1]
WARNING: possible recursive locking detected
5.6.0-rc3-syzkaller #0 Not tainted
--------------------------------------------
syz-executor598/9524 is trying to acquire lock:
ffff88808b5b8b90 (sk_lock-AF_INET6){+.+.}, at: lock_sock include/net/sock.h:1541 [inline]
ffff88808b5b8b90 (sk_lock-AF_INET6){+.+.}, at: inet_csk_accept+0x69f/0xd30 net/ipv4/inet_connection_sock.c:492

but task is already holding lock:
ffff88808b5b9590 (sk_lock-AF_INET6){+.+.}, at: lock_sock include/net/sock.h:1541 [inline]
ffff88808b5b9590 (sk_lock-AF_INET6){+.+.}, at: inet_csk_accept+0x8d/0xd30 net/ipv4/inet_connection_sock.c:445

other info that might help us debug this:
Possible unsafe locking scenario:

CPU0
----
lock(sk_lock-AF_INET6);
lock(sk_lock-AF_INET6);

*** DEADLOCK ***

May be due to missing lock nesting notation

1 lock held by syz-executor598/9524:
#0: ffff88808b5b9590 (sk_lock-AF_INET6){+.+.}, at: lock_sock include/net/sock.h:1541 [inline]
#0: ffff88808b5b9590 (sk_lock-AF_INET6){+.+.}, at: inet_csk_accept+0x8d/0xd30 net/ipv4/inet_connection_sock.c:445

stack backtrace:
CPU: 0 PID: 9524 Comm: syz-executor598 Not tainted 5.6.0-rc3-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x188/0x20d lib/dump_stack.c:118
print_deadlock_bug kernel/locking/lockdep.c:2370 [inline]
check_deadlock kernel/locking/lockdep.c:2411 [inline]
validate_chain kernel/locking/lockdep.c:2954 [inline]
__lock_acquire.cold+0x114/0x288 kernel/locking/lockdep.c:3954
lock_acquire+0x197/0x420 kernel/locking/lockdep.c:4484
lock_sock_nested+0xc5/0x110 net/core/sock.c:2947
lock_sock include/net/sock.h:1541 [inline]
inet_csk_accept+0x69f/0xd30 net/ipv4/inet_connection_sock.c:492
inet_accept+0xe9/0x7c0 net/ipv4/af_inet.c:734
__sys_accept4_file+0x3ac/0x5b0 net/socket.c:1758
__sys_accept4+0x53/0x90 net/socket.c:1809
__do_sys_accept4 net/socket.c:1821 [inline]
__se_sys_accept4 net/socket.c:1818 [inline]
__x64_sys_accept4+0x93/0xf0 net/socket.c:1818
do_syscall_64+0xf6/0x790 arch/x86/entry/common.c:294
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4445c9
Code: e8 0c 0d 03 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 eb 08 fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007ffc35b37608 EFLAGS: 00000246 ORIG_RAX: 0000000000000120
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004445c9
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000003
RBP: 0000000000000000 R08: 0000000000306777 R09: 0000000000306777
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00000000004053d0 R14: 0000000000000000 R15: 0000000000000000

Fixes: d752a4986532 ("net: memcg: late association of sock to memcg")
Signed-off-by: Eric Dumazet <***@google.com>
Cc: Shakeel Butt <***@google.com>
Reported-by: syzbot <***@googlegroups.com>
Signed-off-by: David S. Miller <***@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>
---
net/ipv4/inet_connection_sock.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)

--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -483,27 +483,27 @@ struct sock *inet_csk_accept(struct sock
spin_unlock_bh(&queue->fastopenq.lock);
}

- if (mem_cgroup_sockets_enabled) {
+out:
+ release_sock(sk);
+ if (newsk && mem_cgroup_sockets_enabled) {
int amt;

/* atomically get the memory usage, set and charge the
- * sk->sk_memcg.
+ * newsk->sk_memcg.
*/
lock_sock(newsk);

- /* The sk has not been accepted yet, no need to look at
- * sk->sk_wmem_queued.
+ /* The socket has not been accepted yet, no need to look at
+ * newsk->sk_wmem_queued.
*/
amt = sk_mem_pages(newsk->sk_forward_alloc +
- atomic_read(&sk->sk_rmem_alloc));
+ atomic_read(&newsk->sk_rmem_alloc));
mem_cgroup_sk_alloc(newsk);
if (newsk->sk_memcg && amt)
mem_cgroup_charge_skmem(newsk->sk_memcg, amt);

release_sock(newsk);
}
-out:
- release_sock(sk);
if (req)
reqsk_put(req);
return newsk;
Greg Kroah-Hartman
2020-03-17 10:54:09 UTC
Permalink
From: Madalin Bucur <***@nxp.com>

commit 26d5bb9e4c4b541c475751e015072eb2cbf70d15 upstream.

FMAN DMA read or writes under heavy traffic load may cause FMAN
internal resource leak; thus stopping further packet processing.

The FMAN internal queue can overflow when FMAN splits single
read or write transactions into multiple smaller transactions
such that more than 17 AXI transactions are in flight from FMAN
to interconnect. When the FMAN internal queue overflows, it can
stall further packet processing. The issue can occur with any one
of the following three conditions:

1. FMAN AXI transaction crosses 4K address boundary (Errata
A010022)
2. FMAN DMA address for an AXI transaction is not 16 byte
aligned, i.e. the last 4 bits of an address are non-zero
3. Scatter Gather (SG) frames have more than one SG buffer in
the SG list and any one of the buffers, except the last
buffer in the SG list has data size that is not a multiple
of 16 bytes, i.e., other than 16, 32, 48, 64, etc.

With any one of the above three conditions present, there is
likelihood of stalled FMAN packet processing, especially under
stress with multiple ports injecting line-rate traffic.

To avoid situations that stall FMAN packet processing, all of the
above three conditions must be avoided; therefore, configure the
system with the following rules:

1. Frame buffers must not span a 4KB address boundary, unless
the frame start address is 256 byte aligned
2. All FMAN DMA start addresses (for example, BMAN buffer
address, FD[address] + FD[offset]) are 16B aligned
3. SG table and buffer addresses are 16B aligned and the size
of SG buffers are multiple of 16 bytes, except for the last
SG buffer that can be of any size.

Additional workaround notes:
- Address alignment of 64 bytes is recommended for maximally
efficient system bus transactions (although 16 byte alignment is
sufficient to avoid the stall condition)
- To support frame sizes that are larger than 4K bytes, there are
two options:
1. Large single buffer frames that span a 4KB page boundary can
be converted into SG frames to avoid transaction splits at
the 4KB boundary,
2. Align the large single buffer to 256B address boundaries,
ensure that the frame address plus offset is 256B aligned.
- If software generated SG frames have buffers that are unaligned
and with random non-multiple of 16 byte lengths, before
transmitting such frames via FMAN, frames will need to be copied
into a new single buffer or multiple buffer SG frame that is
compliant with the three rules listed above.

Signed-off-by: Madalin Bucur <***@nxp.com>
Signed-off-by: David S. Miller <***@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>
---
Documentation/devicetree/bindings/net/fsl-fman.txt | 7 +++++++
1 file changed, 7 insertions(+)

--- a/Documentation/devicetree/bindings/net/fsl-fman.txt
+++ b/Documentation/devicetree/bindings/net/fsl-fman.txt
@@ -110,6 +110,13 @@ PROPERTIES
Usage: required
Definition: See soc/fsl/qman.txt and soc/fsl/bman.txt

+- fsl,erratum-a050385
+ Usage: optional
+ Value type: boolean
+ Definition: A boolean property. Indicates the presence of the
+ erratum A050385 which indicates that DMA transactions that are
+ split can result in a FMan lock.
+
=============================================================================
FMan MURAM Node
Greg Kroah-Hartman
2020-03-17 10:54:11 UTC
Permalink
From: Madalin Bucur <***@nxp.com>

commit b281f7b93b258ce1419043bbd898a29254d5c9c7 upstream.

Detect the presence of the A050385 erratum.

Signed-off-by: Madalin Bucur <***@nxp.com>
Signed-off-by: David S. Miller <***@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>
---
drivers/net/ethernet/freescale/fman/Kconfig | 28 ++++++++++++++++++++++++++++
drivers/net/ethernet/freescale/fman/fman.c | 18 ++++++++++++++++++
drivers/net/ethernet/freescale/fman/fman.h | 5 +++++
3 files changed, 51 insertions(+)

--- a/drivers/net/ethernet/freescale/fman/Kconfig
+++ b/drivers/net/ethernet/freescale/fman/Kconfig
@@ -8,3 +8,31 @@ config FSL_FMAN
help
Freescale Data-Path Acceleration Architecture Frame Manager
(FMan) support
+
+config DPAA_ERRATUM_A050385
+ bool
+ depends on ARM64 && FSL_DPAA
+ default y
+ help
+ DPAA FMan erratum A050385 software workaround implementation:
+ align buffers, data start, SG fragment length to avoid FMan DMA
+ splits.
+ FMAN DMA read or writes under heavy traffic load may cause FMAN
+ internal resource leak thus stopping further packet processing.
+ The FMAN internal queue can overflow when FMAN splits single
+ read or write transactions into multiple smaller transactions
+ such that more than 17 AXI transactions are in flight from FMAN
+ to interconnect. When the FMAN internal queue overflows, it can
+ stall further packet processing. The issue can occur with any
+ one of the following three conditions:
+ 1. FMAN AXI transaction crosses 4K address boundary (Errata
+ A010022)
+ 2. FMAN DMA address for an AXI transaction is not 16 byte
+ aligned, i.e. the last 4 bits of an address are non-zero
+ 3. Scatter Gather (SG) frames have more than one SG buffer in
+ the SG list and any one of the buffers, except the last
+ buffer in the SG list has data size that is not a multiple
+ of 16 bytes, i.e., other than 16, 32, 48, 64, etc.
+ With any one of the above three conditions present, there is
+ likelihood of stalled FMAN packet processing, especially under
+ stress with multiple ports injecting line-rate traffic.
--- a/drivers/net/ethernet/freescale/fman/fman.c
+++ b/drivers/net/ethernet/freescale/fman/fman.c
@@ -1,5 +1,6 @@
/*
* Copyright 2008-2015 Freescale Semiconductor Inc.
+ * Copyright 2020 NXP
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are met:
@@ -566,6 +567,10 @@ struct fman_cfg {
u32 qmi_def_tnums_thresh;
};

+#ifdef CONFIG_DPAA_ERRATUM_A050385
+static bool fman_has_err_a050385;
+#endif
+
static irqreturn_t fman_exceptions(struct fman *fman,
enum fman_exceptions exception)
{
@@ -2518,6 +2523,14 @@ struct fman *fman_bind(struct device *fm
}
EXPORT_SYMBOL(fman_bind);

+#ifdef CONFIG_DPAA_ERRATUM_A050385
+bool fman_has_errata_a050385(void)
+{
+ return fman_has_err_a050385;
+}
+EXPORT_SYMBOL(fman_has_errata_a050385);
+#endif
+
static irqreturn_t fman_err_irq(int irq, void *handle)
{
struct fman *fman = (struct fman *)handle;
@@ -2845,6 +2858,11 @@ static struct fman *read_dts_node(struct
goto fman_free;
}

+#ifdef CONFIG_DPAA_ERRATUM_A050385
+ fman_has_err_a050385 =
+ of_property_read_bool(fm_node, "fsl,erratum-a050385");
+#endif
+
return fman;

fman_node_put:
--- a/drivers/net/ethernet/freescale/fman/fman.h
+++ b/drivers/net/ethernet/freescale/fman/fman.h
@@ -1,5 +1,6 @@
/*
* Copyright 2008-2015 Freescale Semiconductor Inc.
+ * Copyright 2020 NXP
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are met:
@@ -398,6 +399,10 @@ u16 fman_get_max_frm(void);

int fman_get_rx_extra_headroom(void);

+#ifdef CONFIG_DPAA_ERRATUM_A050385
+bool fman_has_errata_a050385(void);
+#endif
+
struct fman *fman_bind(struct device *dev);

#endif /* __FM_H */
Greg Kroah-Hartman
2020-03-17 10:53:46 UTC
Permalink
From: Russell King <rmk+***@armlinux.org.uk>

[ Upstream commit 0395823b8d9a4d87bd1bf74359123461c2ae801b ]

If the switch is not hardware reset on a warm boot, interrupts can be
left enabled, and possibly pending. This will cause us to enter an
infinite loop trying to service an interrupt we are unable to handle,
thereby preventing the kernel from booting.

Ensure that the global 2 interrupt sources are disabled before we claim
the parent interrupt.

Observed on the ZII development revision B and C platforms with
reworked serdes support, and using reboot -f to reboot the platform.

Fixes: dc30c35be720 ("net: dsa: mv88e6xxx: Implement interrupt support.")
Signed-off-by: Russell King <rmk+***@armlinux.org.uk>
Reviewed-by: Andrew Lunn <***@lunn.ch>
Signed-off-by: David S. Miller <***@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>
---
drivers/net/dsa/mv88e6xxx/global2.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)

--- a/drivers/net/dsa/mv88e6xxx/global2.c
+++ b/drivers/net/dsa/mv88e6xxx/global2.c
@@ -1096,6 +1096,13 @@ int mv88e6xxx_g2_irq_setup(struct mv88e6
{
int err, irq, virq;

+ chip->g2_irq.masked = ~0;
+ mv88e6xxx_reg_lock(chip);
+ err = mv88e6xxx_g2_int_mask(chip, ~chip->g2_irq.masked);
+ mv88e6xxx_reg_unlock(chip);
+ if (err)
+ return err;
+
chip->g2_irq.domain = irq_domain_add_simple(
chip->dev->of_node, 16, 0, &mv88e6xxx_g2_irq_domain_ops, chip);
if (!chip->g2_irq.domain)
@@ -1105,7 +1112,6 @@ int mv88e6xxx_g2_irq_setup(struct mv88e6
irq_create_mapping(chip->g2_irq.domain, irq);

chip->g2_irq.chip = mv88e6xxx_g2_irq_chip;
- chip->g2_irq.masked = ~0;

chip->device_irq = irq_find_mapping(chip->g1_irq.domain,
MV88E6XXX_G1_STS_IRQ_DEVICE);
Greg Kroah-Hartman
2020-03-17 10:54:13 UTC
Permalink
From: Yonglong Liu <***@huawei.com>

[ Upstream commit 5eb01ddfcfb25e6ebc404a41deae946bde776731 ]

The HNS3 driver supports to configure TC numbers and TC to priority
map via "tc" tool. But when delete the rule, will fail, because
the HNS3 driver needs at least one TC, but the "tc" tool sets TC
number to zero when delete.

This patch makes sure that the TC number is at least one.

Fixes: 30d240dfa2e8 ("net: hns3: Add mqprio hardware offload support in hns3 driver")
Signed-off-by: Yonglong Liu <***@huawei.com>
Signed-off-by: Huazhong Tan <***@huawei.com>
Signed-off-by: David S. Miller <***@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>
---
drivers/net/ethernet/hisilicon/hns3/hns3_enet.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
@@ -1675,7 +1675,7 @@ static int hns3_setup_tc(struct net_devi
netif_dbg(h, drv, netdev, "setup tc: num_tc=%u\n", tc);

return (kinfo->dcb_ops && kinfo->dcb_ops->setup_tc) ?
- kinfo->dcb_ops->setup_tc(h, tc, prio_tc) : -EOPNOTSUPP;
+ kinfo->dcb_ops->setup_tc(h, tc ? tc : 1, prio_tc) : -EOPNOTSUPP;
}

static int hns3_nic_setup_tc(struct net_device *dev, enum tc_setup_type type,
Greg Kroah-Hartman
2020-03-17 10:54:14 UTC
Permalink
From: Jian Shen <***@huawei.com>

[ Upstream commit 903b85d3adce99a5301d5959c4d3c9d14a7974d4 ]

According to the user manual, the ingress and egress VLAN filter
are configured at the same time. Currently, hclge_init_vlan_config()
and hclge_set_vlan_spoofchk() will both change the VLAN filter
switch. So it's necessary to read the old configuration before
modifying it.

Fixes: 22044f95faa0 ("net: hns3: add support for spoof check setting")
Signed-off-by: Jian Shen <***@huawei.com>
Signed-off-by: Huazhong Tan <***@huawei.com>
Signed-off-by: David S. Miller <***@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>
---
drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c | 19 ++++++++++++----
1 file changed, 15 insertions(+), 4 deletions(-)

--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -7727,16 +7727,27 @@ static int hclge_set_vlan_filter_ctrl(st
struct hclge_desc desc;
int ret;

- hclge_cmd_setup_basic_desc(&desc, HCLGE_OPC_VLAN_FILTER_CTRL, false);
-
+ /* read current vlan filter parameter */
+ hclge_cmd_setup_basic_desc(&desc, HCLGE_OPC_VLAN_FILTER_CTRL, true);
req = (struct hclge_vlan_filter_ctrl_cmd *)desc.data;
req->vlan_type = vlan_type;
- req->vlan_fe = filter_en ? fe_type : 0;
req->vf_id = vf_id;

ret = hclge_cmd_send(&hdev->hw, &desc, 1);
+ if (ret) {
+ dev_err(&hdev->pdev->dev,
+ "failed to get vlan filter config, ret = %d.\n", ret);
+ return ret;
+ }
+
+ /* modify and write new config parameter */
+ hclge_cmd_reuse_desc(&desc, false);
+ req->vlan_fe = filter_en ?
+ (req->vlan_fe | fe_type) : (req->vlan_fe & ~fe_type);
+
+ ret = hclge_cmd_send(&hdev->hw, &desc, 1);
if (ret)
- dev_err(&hdev->pdev->dev, "set vlan filter fail, ret =%d.\n",
+ dev_err(&hdev->pdev->dev, "failed to set vlan filter, ret = %d.\n",
ret);

return ret;
Greg Kroah-Hartman
2020-03-17 10:54:16 UTC
Permalink
From: Jakub Kicinski <***@kernel.org>

[ Upstream commit 8750939b6ad86abc3f53ec8a9683a1cded4a5654 ]

DEVLINK_ATTR_PARAM_VALUE_DATA may have different types
so it's not checked by the normal netlink policy. Make
sure the attribute length is what we expect.

Fixes: e3b7ca18ad7b ("devlink: Add param set command")
Signed-off-by: Jakub Kicinski <***@kernel.org>
Reviewed-by: Jiri Pirko <***@mellanox.com>
Signed-off-by: David S. Miller <***@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>
---
net/core/devlink.c | 31 +++++++++++++++++++------------
1 file changed, 19 insertions(+), 12 deletions(-)

--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -3352,34 +3352,41 @@ devlink_param_value_get_from_info(const
struct genl_info *info,
union devlink_param_value *value)
{
+ struct nlattr *param_data;
int len;

- if (param->type != DEVLINK_PARAM_TYPE_BOOL &&
- !info->attrs[DEVLINK_ATTR_PARAM_VALUE_DATA])
+ param_data = info->attrs[DEVLINK_ATTR_PARAM_VALUE_DATA];
+
+ if (param->type != DEVLINK_PARAM_TYPE_BOOL && !param_data)
return -EINVAL;

switch (param->type) {
case DEVLINK_PARAM_TYPE_U8:
- value->vu8 = nla_get_u8(info->attrs[DEVLINK_ATTR_PARAM_VALUE_DATA]);
+ if (nla_len(param_data) != sizeof(u8))
+ return -EINVAL;
+ value->vu8 = nla_get_u8(param_data);
break;
case DEVLINK_PARAM_TYPE_U16:
- value->vu16 = nla_get_u16(info->attrs[DEVLINK_ATTR_PARAM_VALUE_DATA]);
+ if (nla_len(param_data) != sizeof(u16))
+ return -EINVAL;
+ value->vu16 = nla_get_u16(param_data);
break;
case DEVLINK_PARAM_TYPE_U32:
- value->vu32 = nla_get_u32(info->attrs[DEVLINK_ATTR_PARAM_VALUE_DATA]);
+ if (nla_len(param_data) != sizeof(u32))
+ return -EINVAL;
+ value->vu32 = nla_get_u32(param_data);
break;
case DEVLINK_PARAM_TYPE_STRING:
- len = strnlen(nla_data(info->attrs[DEVLINK_ATTR_PARAM_VALUE_DATA]),
- nla_len(info->attrs[DEVLINK_ATTR_PARAM_VALUE_DATA]));
- if (len == nla_len(info->attrs[DEVLINK_ATTR_PARAM_VALUE_DATA]) ||
+ len = strnlen(nla_data(param_data), nla_len(param_data));
+ if (len == nla_len(param_data) ||
len >= __DEVLINK_PARAM_MAX_STRING_VALUE)
return -EINVAL;
- strcpy(value->vstr,
- nla_data(info->attrs[DEVLINK_ATTR_PARAM_VALUE_DATA]));
+ strcpy(value->vstr, nla_data(param_data));
break;
case DEVLINK_PARAM_TYPE_BOOL:
- value->vbool = info->attrs[DEVLINK_ATTR_PARAM_VALUE_DATA] ?
- true : false;
+ if (param_data && nla_len(param_data))
+ return -EINVAL;
+ value->vbool = nla_get_flag(param_data);
break;
}
return 0;
Greg Kroah-Hartman
2020-03-17 10:54:17 UTC
Permalink
From: Jakub Kicinski <***@kernel.org>

[ Upstream commit ff3b63b8c299b73ac599b120653b47e275407656 ]

DEVLINK_ATTR_REGION_CHUNK_ADDR and DEVLINK_ATTR_REGION_CHUNK_LEN
lack entries in the netlink policy. Corresponding nla_get_u64()s
may read beyond the end of the message.

Fixes: 4e54795a27f5 ("devlink: Add support for region snapshot read command")
Signed-off-by: Jakub Kicinski <***@kernel.org>
Reviewed-by: Jiri Pirko <***@mellanox.com>
Signed-off-by: David S. Miller <***@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>
---
net/core/devlink.c | 2 ++
1 file changed, 2 insertions(+)

--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -5924,6 +5924,8 @@ static const struct nla_policy devlink_n
[DEVLINK_ATTR_PARAM_VALUE_CMODE] = { .type = NLA_U8 },
[DEVLINK_ATTR_REGION_NAME] = { .type = NLA_NUL_STRING },
[DEVLINK_ATTR_REGION_SNAPSHOT_ID] = { .type = NLA_U32 },
+ [DEVLINK_ATTR_REGION_CHUNK_ADDR] = { .type = NLA_U64 },
+ [DEVLINK_ATTR_REGION_CHUNK_LEN] = { .type = NLA_U64 },
[DEVLINK_ATTR_HEALTH_REPORTER_NAME] = { .type = NLA_NUL_STRING },
[DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD] = { .type = NLA_U64 },
[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER] = { .type = NLA_U8 },
Greg Kroah-Hartman
2020-03-17 10:54:18 UTC
Permalink
From: Jakub Kicinski <***@kernel.org>

[ Upstream commit 4c16d64ea04056f1b1b324ab6916019f6a064114 ]

Add missing netlink policy entry for FRA_TUN_ID.

Fixes: e7030878fc84 ("fib: Add fib rule match on tunnel id")
Signed-off-by: Jakub Kicinski <***@kernel.org>
Reviewed-by: David Ahern <***@gmail.com>
Signed-off-by: David S. Miller <***@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>
---
include/net/fib_rules.h | 1 +
1 file changed, 1 insertion(+)

--- a/include/net/fib_rules.h
+++ b/include/net/fib_rules.h
@@ -108,6 +108,7 @@ struct fib_rule_notifier_info {
[FRA_OIFNAME] = { .type = NLA_STRING, .len = IFNAMSIZ - 1 }, \
[FRA_PRIORITY] = { .type = NLA_U32 }, \
[FRA_FWMARK] = { .type = NLA_U32 }, \
+ [FRA_TUN_ID] = { .type = NLA_U64 }, \
[FRA_FWMASK] = { .type = NLA_U32 }, \
[FRA_TABLE] = { .type = NLA_U32 }, \
[FRA_SUPPRESS_PREFIXLEN] = { .type = NLA_U32 }, \
Greg Kroah-Hartman
2020-03-17 10:54:15 UTC
Permalink
From: Jian Shen <***@huawei.com>

[ Upstream commit 59359fc8a2f7af062777692e6a7aae73483729ec ]

Currently, PF missed to clear the port base VLAN for VF when
unload. In this case, the VLAN id will remain in the VLAN
table. This patch fixes it.

Fixes: 92f11ea177cd ("net: hns3: fix set port based VLAN issue for VF")
Signed-off-by: Jian Shen <***@huawei.com>
Signed-off-by: Huazhong Tan <***@huawei.com>
Signed-off-by: David S. Miller <***@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>
---
drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c | 23 ++++++++++++++++
1 file changed, 23 insertions(+)

--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -8486,6 +8486,28 @@ static int hclge_set_vf_vlan_filter(stru
}
}

+static void hclge_clear_vf_vlan(struct hclge_dev *hdev)
+{
+ struct hclge_vlan_info *vlan_info;
+ struct hclge_vport *vport;
+ int ret;
+ int vf;
+
+ /* clear port base vlan for all vf */
+ for (vf = HCLGE_VF_VPORT_START_NUM; vf < hdev->num_alloc_vport; vf++) {
+ vport = &hdev->vport[vf];
+ vlan_info = &vport->port_base_vlan_cfg.vlan_info;
+
+ ret = hclge_set_vlan_filter_hw(hdev, htons(ETH_P_8021Q),
+ vport->vport_id,
+ vlan_info->vlan_tag, true);
+ if (ret)
+ dev_err(&hdev->pdev->dev,
+ "failed to clear vf vlan for vf%d, ret = %d\n",
+ vf - HCLGE_VF_VPORT_START_NUM, ret);
+ }
+}
+
int hclge_set_vlan_filter(struct hnae3_handle *handle, __be16 proto,
u16 vlan_id, bool is_kill)
{
@@ -9895,6 +9917,7 @@ static void hclge_uninit_ae_dev(struct h
struct hclge_mac *mac = &hdev->hw.mac;

hclge_reset_vf_rate(hdev);
+ hclge_clear_vf_vlan(hdev);
hclge_misc_affinity_teardown(hdev);
hclge_state_uninit(hdev);
Greg Kroah-Hartman
2020-03-17 10:54:19 UTC
Permalink
From: Jakub Kicinski <***@kernel.org>

[ Upstream commit 9322cd7c4af2ccc7fe7c5f01adb53f4f77949e92 ]

Add missing attribute validation for several u8 types.

Fixes: 2c21d11518b6 ("net: add NL802154 interface for configuration of 802.15.4 devices")
Signed-off-by: Jakub Kicinski <***@kernel.org>
Acked-by: Stefan Schmidt <***@datenfreihafen.org>
Signed-off-by: David S. Miller <***@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>
---
net/ieee802154/nl_policy.c | 5 +++++
1 file changed, 5 insertions(+)

--- a/net/ieee802154/nl_policy.c
+++ b/net/ieee802154/nl_policy.c
@@ -21,6 +21,11 @@ const struct nla_policy ieee802154_polic
[IEEE802154_ATTR_HW_ADDR] = { .type = NLA_HW_ADDR, },
[IEEE802154_ATTR_PAN_ID] = { .type = NLA_U16, },
[IEEE802154_ATTR_CHANNEL] = { .type = NLA_U8, },
+ [IEEE802154_ATTR_BCN_ORD] = { .type = NLA_U8, },
+ [IEEE802154_ATTR_SF_ORD] = { .type = NLA_U8, },
+ [IEEE802154_ATTR_PAN_COORD] = { .type = NLA_U8, },
+ [IEEE802154_ATTR_BAT_EXT] = { .type = NLA_U8, },
+ [IEEE802154_ATTR_COORD_REALIGN] = { .type = NLA_U8, },
[IEEE802154_ATTR_PAGE] = { .type = NLA_U8, },
[IEEE802154_ATTR_COORD_SHORT_ADDR] = { .type = NLA_U16, },
[IEEE802154_ATTR_COORD_HW_ADDR] = { .type = NLA_HW_ADDR, },
Greg Kroah-Hartman
2020-03-17 10:54:20 UTC
Permalink
From: Jakub Kicinski <***@kernel.org>

[ Upstream commit b60673c4c418bef7550d02faf53c34fbfeb366bf ]

Add missing attribute type validation for IEEE802154_ATTR_DEV_TYPE
to the netlink policy.

Fixes: 90c049b2c6ae ("ieee802154: interface type to be added")
Signed-off-by: Jakub Kicinski <***@kernel.org>
Acked-by: Stefan Schmidt <***@datenfreihafen.org>
Signed-off-by: David S. Miller <***@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>
---
net/ieee802154/nl_policy.c | 1 +
1 file changed, 1 insertion(+)

--- a/net/ieee802154/nl_policy.c
+++ b/net/ieee802154/nl_policy.c
@@ -27,6 +27,7 @@ const struct nla_policy ieee802154_polic
[IEEE802154_ATTR_BAT_EXT] = { .type = NLA_U8, },
[IEEE802154_ATTR_COORD_REALIGN] = { .type = NLA_U8, },
[IEEE802154_ATTR_PAGE] = { .type = NLA_U8, },
+ [IEEE802154_ATTR_DEV_TYPE] = { .type = NLA_U8, },
[IEEE802154_ATTR_COORD_SHORT_ADDR] = { .type = NLA_U16, },
[IEEE802154_ATTR_COORD_HW_ADDR] = { .type = NLA_HW_ADDR, },
[IEEE802154_ATTR_COORD_PAN_ID] = { .type = NLA_U16, },
Greg Kroah-Hartman
2020-03-17 10:54:21 UTC
Permalink
From: Jakub Kicinski <***@kernel.org>

[ Upstream commit ab02ad660586b94f5d08912a3952b939cf4c4430 ]

Add missing attribute validation for IFLA_CAN_TERMINATION
to the netlink policy.

Fixes: 12a6075cabc0 ("can: dev: add CAN interface termination API")
Signed-off-by: Jakub Kicinski <***@kernel.org>
Acked-by: Oliver Hartkopp <***@hartkopp.net>
Signed-off-by: David S. Miller <***@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>
---
drivers/net/can/dev.c | 1 +
1 file changed, 1 insertion(+)

--- a/drivers/net/can/dev.c
+++ b/drivers/net/can/dev.c
@@ -883,6 +883,7 @@ static const struct nla_policy can_polic
= { .len = sizeof(struct can_bittiming) },
[IFLA_CAN_DATA_BITTIMING_CONST]
= { .len = sizeof(struct can_bittiming_const) },
+ [IFLA_CAN_TERMINATION] = { .type = NLA_U16 },
};

static int can_validate(struct nlattr *tb[], struct nlattr *data[],
Greg Kroah-Hartman
2020-03-17 10:54:23 UTC
Permalink
From: Jakub Kicinski <***@kernel.org>

[ Upstream commit b5ab1f1be6180a2e975eede18731804b5164a05d ]

Add missing attribute validation for OVS_PACKET_ATTR_HASH
to the netlink policy.

Fixes: bd1903b7c459 ("net: openvswitch: add hash info to upcall")
Signed-off-by: Jakub Kicinski <***@kernel.org>
Reviewed-by: Greg Rose <***@gmail.com>
Signed-off-by: David S. Miller <***@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>
---
net/openvswitch/datapath.c | 1 +
1 file changed, 1 insertion(+)

--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -647,6 +647,7 @@ static const struct nla_policy packet_po
[OVS_PACKET_ATTR_ACTIONS] = { .type = NLA_NESTED },
[OVS_PACKET_ATTR_PROBE] = { .type = NLA_FLAG },
[OVS_PACKET_ATTR_MRU] = { .type = NLA_U16 },
+ [OVS_PACKET_ATTR_HASH] = { .type = NLA_U64 },
};

static const struct genl_ops dp_packet_genl_ops[] = {
Greg Kroah-Hartman
2020-03-17 10:53:48 UTC
Permalink
From: Jian Shen <***@huawei.com>

[ Upstream commit 68e1006f618e509fc7869259fe83ceec4a95dac3 ]

When fibre port supports auto-negotiation, the IMP(Intelligent
Management Process) processes the speed of auto-negotiation
and the user's speed separately.
For below case, the port will get a not link up problem.
step 1: disables auto-negotiation and sets speed to A, then
the driver's MAC speed will be updated to A.
step 2: enables auto-negotiation and MAC gets negotiated
speed B, then the driver's MAC speed will be updated to B
through querying in periodical task.
step 3: MAC gets new negotiated speed A.
step 4: disables auto-negotiation and sets speed to B before
periodical task query new MAC speed A, the driver will ignore
the speed configuration.

This patch fixes it by skipping speed and duplex checking when
fibre port supports auto-negotiation.

Fixes: 22f48e24a23d ("net: hns3: add autoneg and change speed support for fibre port")
Signed-off-by: Jian Shen <***@huawei.com>
Signed-off-by: Huazhong Tan <***@huawei.com>
Signed-off-by: David S. Miller <***@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>
---
drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -2450,10 +2450,12 @@ static int hclge_cfg_mac_speed_dup_hw(st

int hclge_cfg_mac_speed_dup(struct hclge_dev *hdev, int speed, u8 duplex)
{
+ struct hclge_mac *mac = &hdev->hw.mac;
int ret;

duplex = hclge_check_speed_dup(duplex, speed);
- if (hdev->hw.mac.speed == speed && hdev->hw.mac.duplex == duplex)
+ if (!mac->support_autoneg && mac->speed == speed &&
+ mac->duplex == duplex)
return 0;

ret = hclge_cfg_mac_speed_dup_hw(hdev, speed, duplex);
Greg Kroah-Hartman
2020-03-17 10:53:49 UTC
Permalink
From: Hangbin Liu <***@gmail.com>

[ Upstream commit 07758eb9ff52794fba15d03aa88d92dbd1b7d125 ]

When we add peer address with metric configured, IPv4 could set the dest
metric correctly, but IPv6 do not. e.g.

]# ip addr add 192.0.2.1 peer 192.0.2.2/32 dev eth1 metric 20
]# ip route show dev eth1
192.0.2.2 proto kernel scope link src 192.0.2.1 metric 20
]# ip addr add 2001:db8::1 peer 2001:db8::2/128 dev eth1 metric 20
]# ip -6 route show dev eth1
2001:db8::1 proto kernel metric 20 pref medium
2001:db8::2 proto kernel metric 256 pref medium

Fix this by using configured metric instead of default one.

Reported-by: Jianlin Shi <***@redhat.com>
Fixes: 8308f3ff1753 ("net/ipv6: Add support for specifying metric of connected routes")
Reviewed-by: David Ahern <***@gmail.com>
Signed-off-by: Hangbin Liu <***@gmail.com>
Signed-off-by: David S. Miller <***@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>
---
net/ipv6/addrconf.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -5987,9 +5987,9 @@ static void __ipv6_ifa_notify(int event,
if (ifp->idev->cnf.forwarding)
addrconf_join_anycast(ifp);
if (!ipv6_addr_any(&ifp->peer_addr))
- addrconf_prefix_route(&ifp->peer_addr, 128, 0,
- ifp->idev->dev, 0, 0,
- GFP_ATOMIC);
+ addrconf_prefix_route(&ifp->peer_addr, 128,
+ ifp->rt_priority, ifp->idev->dev,
+ 0, 0, GFP_ATOMIC);
break;
case RTM_DELADDR:
if (ifp->idev->cnf.forwarding)
Greg Kroah-Hartman
2020-03-17 10:53:50 UTC
Permalink
From: Pablo Neira Ayuso <***@netfilter.org>

[ Upstream commit 84b3268027641401bb8ad4427a90a3cce2eb86f5 ]

Userspace might send a batch that is composed of several netlink
messages. The netlink_ack() function must use the pointer to the netlink
header as base to calculate the bad attribute offset.

Fixes: 2d4bc93368f5 ("netlink: extended ACK reporting")
Signed-off-by: Pablo Neira Ayuso <***@netfilter.org>
Signed-off-by: David S. Miller <***@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>
---
net/netlink/af_netlink.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -2434,7 +2434,7 @@ void netlink_ack(struct sk_buff *in_skb,
in_skb->len))
WARN_ON(nla_put_u32(skb, NLMSGERR_ATTR_OFFS,
(u8 *)extack->bad_attr -
- in_skb->data));
+ (u8 *)nlh));
} else {
if (extack->cookie_len)
WARN_ON(nla_put(skb, NLMSGERR_ATTR_COOKIE,
Greg Kroah-Hartman
2020-03-17 10:53:51 UTC
Permalink
From: Dmitry Bogdanov <***@marvell.com>

[ Upstream commit 6fc498bc82929ee23aa2f35a828c6178dfd3f823 ]

SCI should be updated, because it contains MAC in its first 6 octets.

Fixes: c09440f7dcb3 ("macsec: introduce IEEE 802.1AE driver")
Signed-off-by: Dmitry Bogdanov <***@marvell.com>
Signed-off-by: Mark Starovoytov <***@marvell.com>
Signed-off-by: Igor Russkikh <***@marvell.com>
Signed-off-by: David S. Miller <***@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>
---
drivers/net/macsec.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)

--- a/drivers/net/macsec.c
+++ b/drivers/net/macsec.c
@@ -532,6 +532,11 @@ static struct macsec_eth_header *macsec_
return (struct macsec_eth_header *)skb_mac_header(skb);
}

+static sci_t dev_to_sci(struct net_device *dev, __be16 port)
+{
+ return make_sci(dev->dev_addr, port);
+}
+
static u32 tx_sa_update_pn(struct macsec_tx_sa *tx_sa, struct macsec_secy *secy)
{
u32 pn;
@@ -2903,6 +2908,7 @@ static int macsec_set_mac_address(struct

out:
ether_addr_copy(dev->dev_addr, addr->sa_data);
+ macsec->secy.sci = dev_to_sci(dev, MACSEC_PORT_ES);
return 0;
}

@@ -3176,11 +3182,6 @@ static bool sci_exists(struct net_device
return false;
}

-static sci_t dev_to_sci(struct net_device *dev, __be16 port)
-{
- return make_sci(dev->dev_addr, port);
-}
-
static int macsec_add_dev(struct net_device *dev, sci_t sci, u8 icv_len)
{
struct macsec_dev *macsec = macsec_priv(dev);
Greg Kroah-Hartman
2020-03-17 10:53:52 UTC
Permalink
From: Vladimir Oltean <***@nxp.com>

[ Upstream commit a8015ded89ad740d21355470d41879c5bd82aab7 ]

What the driver writes into MAC_MAXLEN_CFG does not actually represent
VLAN_ETH_FRAME_LEN but instead ETH_FRAME_LEN + ETH_FCS_LEN. Yes they are
numerically equal, but the difference is important, as the switch treats
VLAN-tagged traffic specially and knows to increase the maximum accepted
frame size automatically. So it is always wrong to account for VLAN in
the MAC_MAXLEN_CFG register.

Unconditionally increase the maximum allowed frame size for
double-tagged traffic. Accounting for the additional length does not
mean that the other VLAN membership checks aren't performed, so there's
no harm done.

Also, stop abusing the MTU name for configuring the MRU. There is no
support for configuring the MRU on an interface at the moment.

Fixes: a556c76adc05 ("net: mscc: Add initial Ocelot switch support")
Fixes: fa914e9c4d94 ("net: mscc: ocelot: create a helper for changing the port MTU")
Signed-off-by: Vladimir Oltean <***@nxp.com>
Signed-off-by: David S. Miller <***@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>
---
drivers/net/ethernet/mscc/ocelot.c | 28 +++++++++++++++++-----------
drivers/net/ethernet/mscc/ocelot_dev.h | 2 +-
2 files changed, 18 insertions(+), 12 deletions(-)

--- a/drivers/net/ethernet/mscc/ocelot.c
+++ b/drivers/net/ethernet/mscc/ocelot.c
@@ -2175,24 +2175,29 @@ static int ocelot_init_timestamp(struct
return 0;
}

-static void ocelot_port_set_mtu(struct ocelot *ocelot, int port, size_t mtu)
+/* Configure the maximum SDU (L2 payload) on RX to the value specified in @sdu.
+ * The length of VLAN tags is accounted for automatically via DEV_MAC_TAGS_CFG.
+ */
+static void ocelot_port_set_maxlen(struct ocelot *ocelot, int port, size_t sdu)
{
struct ocelot_port *ocelot_port = ocelot->ports[port];
+ int maxlen = sdu + ETH_HLEN + ETH_FCS_LEN;
int atop_wm;

- ocelot_port_writel(ocelot_port, mtu, DEV_MAC_MAXLEN_CFG);
+ ocelot_port_writel(ocelot_port, maxlen, DEV_MAC_MAXLEN_CFG);

/* Set Pause WM hysteresis
- * 152 = 6 * mtu / OCELOT_BUFFER_CELL_SZ
- * 101 = 4 * mtu / OCELOT_BUFFER_CELL_SZ
+ * 152 = 6 * maxlen / OCELOT_BUFFER_CELL_SZ
+ * 101 = 4 * maxlen / OCELOT_BUFFER_CELL_SZ
*/
ocelot_write_rix(ocelot, SYS_PAUSE_CFG_PAUSE_ENA |
SYS_PAUSE_CFG_PAUSE_STOP(101) |
SYS_PAUSE_CFG_PAUSE_START(152), SYS_PAUSE_CFG, port);

/* Tail dropping watermark */
- atop_wm = (ocelot->shared_queue_sz - 9 * mtu) / OCELOT_BUFFER_CELL_SZ;
- ocelot_write_rix(ocelot, ocelot_wm_enc(9 * mtu),
+ atop_wm = (ocelot->shared_queue_sz - 9 * maxlen) /
+ OCELOT_BUFFER_CELL_SZ;
+ ocelot_write_rix(ocelot, ocelot_wm_enc(9 * maxlen),
SYS_ATOP, port);
ocelot_write(ocelot, ocelot_wm_enc(atop_wm), SYS_ATOP_TOT_CFG);
}
@@ -2221,9 +2226,10 @@ void ocelot_init_port(struct ocelot *oce
DEV_MAC_HDX_CFG);

/* Set Max Length and maximum tags allowed */
- ocelot_port_set_mtu(ocelot, port, VLAN_ETH_FRAME_LEN);
+ ocelot_port_set_maxlen(ocelot, port, ETH_DATA_LEN);
ocelot_port_writel(ocelot_port, DEV_MAC_TAGS_CFG_TAG_ID(ETH_P_8021AD) |
DEV_MAC_TAGS_CFG_VLAN_AWR_ENA |
+ DEV_MAC_TAGS_CFG_VLAN_DBL_AWR_ENA |
DEV_MAC_TAGS_CFG_VLAN_LEN_AWR_ENA,
DEV_MAC_TAGS_CFG);

@@ -2309,18 +2315,18 @@ void ocelot_set_cpu_port(struct ocelot *
* Only one port can be an NPI at the same time.
*/
if (cpu < ocelot->num_phys_ports) {
- int mtu = VLAN_ETH_FRAME_LEN + OCELOT_TAG_LEN;
+ int sdu = ETH_DATA_LEN + OCELOT_TAG_LEN;

ocelot_write(ocelot, QSYS_EXT_CPU_CFG_EXT_CPUQ_MSK_M |
QSYS_EXT_CPU_CFG_EXT_CPU_PORT(cpu),
QSYS_EXT_CPU_CFG);

if (injection == OCELOT_TAG_PREFIX_SHORT)
- mtu += OCELOT_SHORT_PREFIX_LEN;
+ sdu += OCELOT_SHORT_PREFIX_LEN;
else if (injection == OCELOT_TAG_PREFIX_LONG)
- mtu += OCELOT_LONG_PREFIX_LEN;
+ sdu += OCELOT_LONG_PREFIX_LEN;

- ocelot_port_set_mtu(ocelot, cpu, mtu);
+ ocelot_port_set_maxlen(ocelot, cpu, sdu);
}

/* CPU port Injection/Extraction configuration */
--- a/drivers/net/ethernet/mscc/ocelot_dev.h
+++ b/drivers/net/ethernet/mscc/ocelot_dev.h
@@ -74,7 +74,7 @@
#define DEV_MAC_TAGS_CFG_TAG_ID_M GENMASK(31, 16)
#define DEV_MAC_TAGS_CFG_TAG_ID_X(x) (((x) & GENMASK(31, 16)) >> 16)
#define DEV_MAC_TAGS_CFG_VLAN_LEN_AWR_ENA BIT(2)
-#define DEV_MAC_TAGS_CFG_PB_ENA BIT(1)
+#define DEV_MAC_TAGS_CFG_VLAN_DBL_AWR_ENA BIT(1)
#define DEV_MAC_TAGS_CFG_VLAN_AWR_ENA BIT(0)

#define DEV_MAC_ADV_CHK_CFG 0x2c
Greg Kroah-Hartman
2020-03-17 10:53:43 UTC
Permalink
From: Mahesh Bandewar <***@google.com>

[ Upstream commit ad8192767c9f9cf97da57b9ffcea70fb100febef ]

IPvlan in L3 mode discards outbound multicast packets but performs
the check before ensuring the ether-header is set or not. This is
an error that Eric found through code browsing.

Fixes: 2ad7bf363841 (“ipvlan: Initial check-in of the IPVLAN driver.”)
Signed-off-by: Mahesh Bandewar <***@google.com>
Reported-by: Eric Dumazet <***@google.com>
Signed-off-by: David S. Miller <***@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>
---
drivers/net/ipvlan/ipvlan_core.c | 18 ++++++++++--------
1 file changed, 10 insertions(+), 8 deletions(-)

--- a/drivers/net/ipvlan/ipvlan_core.c
+++ b/drivers/net/ipvlan/ipvlan_core.c
@@ -499,19 +499,21 @@ static int ipvlan_process_outbound(struc
struct ethhdr *ethh = eth_hdr(skb);
int ret = NET_XMIT_DROP;

- /* In this mode we dont care about multicast and broadcast traffic */
- if (is_multicast_ether_addr(ethh->h_dest)) {
- pr_debug_ratelimited("Dropped {multi|broad}cast of type=[%x]\n",
- ntohs(skb->protocol));
- kfree_skb(skb);
- goto out;
- }
-
/* The ipvlan is a pseudo-L2 device, so the packets that we receive
* will have L2; which need to discarded and processed further
* in the net-ns of the main-device.
*/
if (skb_mac_header_was_set(skb)) {
+ /* In this mode we dont care about
+ * multicast and broadcast traffic */
+ if (is_multicast_ether_addr(ethh->h_dest)) {
+ pr_debug_ratelimited(
+ "Dropped {multi|broad}cast of type=[%x]\n",
+ ntohs(skb->protocol));
+ kfree_skb(skb);
+ goto out;
+ }
+
skb_pull(skb, sizeof(*ethh));
skb->mac_header = (typeof(skb->mac_header))~0U;
skb_reset_network_header(skb);
Greg Kroah-Hartman
2020-03-17 10:54:25 UTC
Permalink
From: Jakub Kicinski <***@kernel.org>

[ Upstream commit e13aaa0643da10006ec35715954e7f92a62899a5 ]

Add missing attribute validation for TCA_TAPRIO_ATTR_TXTIME_DELAY
to the netlink policy.

Fixes: 4cfd5779bd6e ("taprio: Add support for txtime-assist mode")
Signed-off-by: Jakub Kicinski <***@kernel.org>
Reviewed-by: Vinicius Costa Gomes <***@intel.com>
Signed-off-by: David S. Miller <***@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>
---
net/sched/sch_taprio.c | 1 +
1 file changed, 1 insertion(+)

--- a/net/sched/sch_taprio.c
+++ b/net/sched/sch_taprio.c
@@ -774,6 +774,7 @@ static const struct nla_policy taprio_po
[TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME] = { .type = NLA_S64 },
[TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME_EXTENSION] = { .type = NLA_S64 },
[TCA_TAPRIO_ATTR_FLAGS] = { .type = NLA_U32 },
+ [TCA_TAPRIO_ATTR_TXTIME_DELAY] = { .type = NLA_U32 },
};

static int fill_sched_entry(struct nlattr **tb, struct sched_entry *entry,
Greg Kroah-Hartman
2020-03-17 10:54:35 UTC
Permalink
From: Hangbin Liu <***@gmail.com>

[ Upstream commit d0098e4c6b83e502cc1cd96d67ca86bc79a6c559 ]

When we modify the peer route and changed it to a new one, we should
remove the old route first. Before the fix:

+ ip addr add dev dummy1 2001:db8::1 peer 2001:db8::2
+ ip -6 route show dev dummy1
2001:db8::1 proto kernel metric 256 pref medium
2001:db8::2 proto kernel metric 256 pref medium
+ ip addr change dev dummy1 2001:db8::1 peer 2001:db8::3
+ ip -6 route show dev dummy1
2001:db8::1 proto kernel metric 256 pref medium
2001:db8::2 proto kernel metric 256 pref medium

After the fix:
+ ip addr change dev dummy1 2001:db8::1 peer 2001:db8::3
+ ip -6 route show dev dummy1
2001:db8::1 proto kernel metric 256 pref medium
2001:db8::3 proto kernel metric 256 pref medium

This patch depend on the previous patch "net/ipv6: need update peer route
when modify metric" to update new peer route after delete old one.

Signed-off-by: Hangbin Liu <***@gmail.com>
Reviewed-by: David Ahern <***@gmail.com>
Signed-off-by: David S. Miller <***@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>
---
net/ipv6/addrconf.c | 21 +++++++++++++++++----
1 file changed, 17 insertions(+), 4 deletions(-)

--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -1226,11 +1226,13 @@ check_cleanup_prefix_route(struct inet6_
}

static void
-cleanup_prefix_route(struct inet6_ifaddr *ifp, unsigned long expires, bool del_rt)
+cleanup_prefix_route(struct inet6_ifaddr *ifp, unsigned long expires,
+ bool del_rt, bool del_peer)
{
struct fib6_info *f6i;

- f6i = addrconf_get_prefix_route(&ifp->addr, ifp->prefix_len,
+ f6i = addrconf_get_prefix_route(del_peer ? &ifp->peer_addr : &ifp->addr,
+ ifp->prefix_len,
ifp->idev->dev, 0, RTF_DEFAULT, true);
if (f6i) {
if (del_rt)
@@ -1293,7 +1295,7 @@ static void ipv6_del_addr(struct inet6_i

if (action != CLEANUP_PREFIX_RT_NOP) {
cleanup_prefix_route(ifp, expires,
- action == CLEANUP_PREFIX_RT_DEL);
+ action == CLEANUP_PREFIX_RT_DEL, false);
}

/* clean up prefsrc entries */
@@ -4631,6 +4633,7 @@ static int inet6_addr_modify(struct inet
unsigned long timeout;
bool was_managetempaddr;
bool had_prefixroute;
+ bool new_peer = false;

ASSERT_RTNL();

@@ -4662,6 +4665,13 @@ static int inet6_addr_modify(struct inet
cfg->preferred_lft = timeout;
}

+ if (cfg->peer_pfx &&
+ memcmp(&ifp->peer_addr, cfg->peer_pfx, sizeof(struct in6_addr))) {
+ if (!ipv6_addr_any(&ifp->peer_addr))
+ cleanup_prefix_route(ifp, expires, true, true);
+ new_peer = true;
+ }
+
spin_lock_bh(&ifp->lock);
was_managetempaddr = ifp->flags & IFA_F_MANAGETEMPADDR;
had_prefixroute = ifp->flags & IFA_F_PERMANENT &&
@@ -4677,6 +4687,9 @@ static int inet6_addr_modify(struct inet
if (cfg->rt_priority && cfg->rt_priority != ifp->rt_priority)
ifp->rt_priority = cfg->rt_priority;

+ if (new_peer)
+ ifp->peer_addr = *cfg->peer_pfx;
+
spin_unlock_bh(&ifp->lock);
if (!(ifp->flags&IFA_F_TENTATIVE))
ipv6_ifa_notify(0, ifp);
@@ -4712,7 +4725,7 @@ static int inet6_addr_modify(struct inet

if (action != CLEANUP_PREFIX_RT_NOP) {
cleanup_prefix_route(ifp, rt_expires,
- action == CLEANUP_PREFIX_RT_DEL);
+ action == CLEANUP_PREFIX_RT_DEL, false);
}
}
Greg Kroah-Hartman
2020-03-17 10:54:37 UTC
Permalink
From: Julian Wiedmann <***@linux.ibm.com>

[ Upstream commit 240c1948491b81cfe40f84ea040a8f2a4966f101 ]

When an OSA device in prio-queue setup is reduced to 1 TX queue due to
HW restrictions, we reset its the default_out_queue to 0.

In the old code this was needed so that qeth_get_priority_queue() gets
the queue selection right. But with proper multiqueue support we already
reduced dev->real_num_tx_queues to 1, and so the stack puts all traffic
on txq 0 without even calling .ndo_select_queue.

Thus we can preserve the user's configuration, and apply it if the OSA
device later re-gains support for multiple TX queues.

Fixes: 73dc2daf110f ("s390/qeth: add TX multiqueue support for OSA devices")
Signed-off-by: Julian Wiedmann <***@linux.ibm.com>
Signed-off-by: David S. Miller <***@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>
---
drivers/s390/net/qeth_core_main.c | 1 -
1 file changed, 1 deletion(-)

--- a/drivers/s390/net/qeth_core_main.c
+++ b/drivers/s390/net/qeth_core_main.c
@@ -1244,7 +1244,6 @@ static int qeth_osa_set_output_queues(st
if (count == 1)
dev_info(&card->gdev->dev, "Priority Queueing not supported\n");

- card->qdio.default_out_queue = single ? 0 : QETH_DEFAULT_QUEUE;
card->qdio.no_out_queues = count;
return 0;
}
Greg Kroah-Hartman
2020-03-17 10:54:36 UTC
Permalink
From: Hangbin Liu <***@gmail.com>

[ Upstream commit 0d29169a708bf730ede287248e429d579f432d1d ]

This patch update {ipv4, ipv6}_addr_metric_test with
1. Set metric of address with peer route and see if the route added
correctly.
2. Modify metric and peer address for peer route and see if the route
changed correctly.

Signed-off-by: Hangbin Liu <***@gmail.com>
Reviewed-by: David Ahern <***@gmail.com>
Signed-off-by: David S. Miller <***@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>
---
tools/testing/selftests/net/fib_tests.sh | 34 ++++++++++++++++++++++++++++---
1 file changed, 31 insertions(+), 3 deletions(-)

--- a/tools/testing/selftests/net/fib_tests.sh
+++ b/tools/testing/selftests/net/fib_tests.sh
@@ -1041,6 +1041,27 @@ ipv6_addr_metric_test()
fi
log_test $rc 0 "Prefix route with metric on link up"

+ # verify peer metric added correctly
+ set -e
+ run_cmd "$IP -6 addr flush dev dummy2"
+ run_cmd "$IP -6 addr add dev dummy2 2001:db8:104::1 peer 2001:db8:104::2 metric 260"
+ set +e
+
+ check_route6 "2001:db8:104::1 dev dummy2 proto kernel metric 260"
+ log_test $? 0 "Set metric with peer route on local side"
+ log_test $? 0 "User specified metric on local address"
+ check_route6 "2001:db8:104::2 dev dummy2 proto kernel metric 260"
+ log_test $? 0 "Set metric with peer route on peer side"
+
+ set -e
+ run_cmd "$IP -6 addr change dev dummy2 2001:db8:104::1 peer 2001:db8:104::3 metric 261"
+ set +e
+
+ check_route6 "2001:db8:104::1 dev dummy2 proto kernel metric 261"
+ log_test $? 0 "Modify metric and peer address on local side"
+ check_route6 "2001:db8:104::3 dev dummy2 proto kernel metric 261"
+ log_test $? 0 "Modify metric and peer address on peer side"
+
$IP li del dummy1
$IP li del dummy2
cleanup
@@ -1457,13 +1478,20 @@ ipv4_addr_metric_test()

run_cmd "$IP addr flush dev dummy2"
run_cmd "$IP addr add dev dummy2 172.16.104.1/32 peer 172.16.104.2 metric 260"
- run_cmd "$IP addr change dev dummy2 172.16.104.1/32 peer 172.16.104.2 metric 261"
rc=$?
if [ $rc -eq 0 ]; then
- check_route "172.16.104.2 dev dummy2 proto kernel scope link src 172.16.104.1 metric 261"
+ check_route "172.16.104.2 dev dummy2 proto kernel scope link src 172.16.104.1 metric 260"
+ rc=$?
+ fi
+ log_test $rc 0 "Set metric of address with peer route"
+
+ run_cmd "$IP addr change dev dummy2 172.16.104.1/32 peer 172.16.104.3 metric 261"
+ rc=$?
+ if [ $rc -eq 0 ]; then
+ check_route "172.16.104.3 dev dummy2 proto kernel scope link src 172.16.104.1 metric 261"
rc=$?
fi
- log_test $rc 0 "Modify metric of address with peer route"
+ log_test $rc 0 "Modify metric and peer address for peer route"

$IP li del dummy1
$IP li del dummy2
Greg Kroah-Hartman
2020-03-17 10:54:39 UTC
Permalink
From: Andrew Lunn <***@lunn.ch>

[ Upstream commit a20f997010c4ec76eaa55b8cc047d76dcac69f70 ]

By default, DSA drivers should configure CPU and DSA ports to their
maximum speed. In many configurations this is sufficient to make the
link work.

In some cases it is necessary to configure the link to run slower,
e.g. because of limitations of the SoC it is connected to. Or back to
back PHYs are used and the PHY needs to be driven in order to
establish link. In this case, phylink is used.

Only instantiate phylink if it is required. If there is no PHY, or no
fixed link properties, phylink can upset a link which works in the
default configuration.

Fixes: 0e27921816ad ("net: dsa: Use PHYLINK for the CPU/DSA ports")
Signed-off-by: Andrew Lunn <***@lunn.ch>
Signed-off-by: David S. Miller <***@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>
---
net/dsa/port.c | 12 +++++++++---
1 file changed, 9 insertions(+), 3 deletions(-)

--- a/net/dsa/port.c
+++ b/net/dsa/port.c
@@ -653,9 +653,14 @@ err_phy_connect:
int dsa_port_link_register_of(struct dsa_port *dp)
{
struct dsa_switch *ds = dp->ds;
+ struct device_node *phy_np;

- if (!ds->ops->adjust_link)
- return dsa_port_phylink_register(dp);
+ if (!ds->ops->adjust_link) {
+ phy_np = of_parse_phandle(dp->dn, "phy-handle", 0);
+ if (of_phy_is_fixed_link(dp->dn) || phy_np)
+ return dsa_port_phylink_register(dp);
+ return 0;
+ }

dev_warn(ds->dev,
"Using legacy PHYLIB callbacks. Please migrate to PHYLINK!\n");
@@ -670,11 +675,12 @@ void dsa_port_link_unregister_of(struct
{
struct dsa_switch *ds = dp->ds;

- if (!ds->ops->adjust_link) {
+ if (!ds->ops->adjust_link && dp->pl) {
rtnl_lock();
phylink_disconnect_phy(dp->pl);
rtnl_unlock();
phylink_destroy(dp->pl);
+ dp->pl = NULL;
return;
}
Greg Kroah-Hartman
2020-03-17 10:54:40 UTC
Permalink
From: Andrew Lunn <***@lunn.ch>

[ Upstream commit 012fc74517b25177dfede2ed45cd108258564e4a ]

Only the bottom 12 bits contain the ATU bin occupancy statistics. The
upper bits need masking off.

Fixes: e0c69ca7dfbb ("net: dsa: mv88e6xxx: Add ATU occupancy via devlink resources")
Signed-off-by: Andrew Lunn <***@lunn.ch>
Signed-off-by: David S. Miller <***@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>
---
drivers/net/dsa/mv88e6xxx/chip.c | 2 ++
1 file changed, 2 insertions(+)

--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -2762,6 +2762,8 @@ static u64 mv88e6xxx_devlink_atu_bin_get
goto unlock;
}

+ occupancy &= MV88E6XXX_G2_ATU_STATS_MASK;
+
unlock:
mv88e6xxx_reg_unlock(chip);
Greg Kroah-Hartman
2020-03-17 10:54:41 UTC
Permalink
From: Florian Fainelli <***@gmail.com>

commit 503ba7c6961034ff0047707685644cad9287c226 upstream.

It is currently possible for a PHY device to be suspended as part of a
network device driver's suspend call while it is still being attached to
that net_device, either via phy_suspend() or implicitly via phy_stop().

Later on, when the MDIO bus controller get suspended, we would attempt
to suspend again the PHY because it is still attached to a network
device.

This is both a waste of time and creates an opportunity for improper
clock/power management bugs to creep in.

Fixes: 803dd9c77ac3 ("net: phy: avoid suspending twice a PHY")
Signed-off-by: Florian Fainelli <***@gmail.com>
Signed-off-by: David S. Miller <***@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>

---
drivers/net/phy/phy_device.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -247,7 +247,7 @@ static bool mdio_bus_phy_may_suspend(str
* MDIO bus driver and clock gated at this point.
*/
if (!netdev)
- return !phydev->suspended;
+ goto out;

if (netdev->wol_enabled)
return false;
@@ -267,7 +267,8 @@ static bool mdio_bus_phy_may_suspend(str
if (device_may_wakeup(&netdev->dev))
return false;

- return true;
+out:
+ return !phydev->suspended;
}

static int mdio_bus_phy_suspend(struct device *dev)
Greg Kroah-Hartman
2020-03-17 10:54:42 UTC
Permalink
From: Qian Cai <***@lca.pw>

commit 190ecb190a9cd8c0599d8499b901e3c32e87966a upstream.

Similar to the commit d7495343228f ("cgroup: fix incorrect
WARN_ON_ONCE() in cgroup_setup_root()"), cgroup_id(root_cgrp) does not
equal to 1 on 32bit ino archs which triggers all sorts of issues with
psi_show() on s390x. For example,

BUG: KASAN: slab-out-of-bounds in collect_percpu_times+0x2d0/
Read of size 4 at addr 000000001e0ce000 by task read_all/3667
collect_percpu_times+0x2d0/0x798
psi_show+0x7c/0x2a8
seq_read+0x2ac/0x830
vfs_read+0x92/0x150
ksys_read+0xe2/0x188
system_call+0xd8/0x2b4

Fix it by using cgroup_ino().

Fixes: 743210386c03 ("cgroup: use cgrp->kn->id as the cgroup ID")
Signed-off-by: Qian Cai <***@lca.pw>
Acked-by: Johannes Weiner <***@cmpxchg.org>
Signed-off-by: Tejun Heo <***@kernel.org>
Cc: ***@vger.kernel.org # v5.5
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>

---
kernel/cgroup/cgroup.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -3547,21 +3547,21 @@ static int cpu_stat_show(struct seq_file
static int cgroup_io_pressure_show(struct seq_file *seq, void *v)
{
struct cgroup *cgrp = seq_css(seq)->cgroup;
- struct psi_group *psi = cgroup_id(cgrp) == 1 ? &psi_system : &cgrp->psi;
+ struct psi_group *psi = cgroup_ino(cgrp) == 1 ? &psi_system : &cgrp->psi;

return psi_show(seq, psi, PSI_IO);
}
static int cgroup_memory_pressure_show(struct seq_file *seq, void *v)
{
struct cgroup *cgrp = seq_css(seq)->cgroup;
- struct psi_group *psi = cgroup_id(cgrp) == 1 ? &psi_system : &cgrp->psi;
+ struct psi_group *psi = cgroup_ino(cgrp) == 1 ? &psi_system : &cgrp->psi;

return psi_show(seq, psi, PSI_MEM);
}
static int cgroup_cpu_pressure_show(struct seq_file *seq, void *v)
{
struct cgroup *cgrp = seq_css(seq)->cgroup;
- struct psi_group *psi = cgroup_id(cgrp) == 1 ? &psi_system : &cgrp->psi;
+ struct psi_group *psi = cgroup_ino(cgrp) == 1 ? &psi_system : &cgrp->psi;

return psi_show(seq, psi, PSI_CPU);
}
Greg Kroah-Hartman
2020-03-17 10:54:26 UTC
Permalink
From: Jakub Kicinski <***@kernel.org>

[ Upstream commit dd25cb272ccce4db67dc8509278229099e4f5e99 ]

Add missing attribute validation for TEAM_ATTR_OPTION_PORT_IFINDEX
to the netlink policy.

Fixes: 80f7c6683fe0 ("team: add support for per-port options")
Signed-off-by: Jakub Kicinski <***@kernel.org>
Reviewed-by: Jiri Pirko <***@mellanox.com>
Signed-off-by: David S. Miller <***@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>
---
drivers/net/team/team.c | 1 +
1 file changed, 1 insertion(+)

--- a/drivers/net/team/team.c
+++ b/drivers/net/team/team.c
@@ -2240,6 +2240,7 @@ team_nl_option_policy[TEAM_ATTR_OPTION_M
[TEAM_ATTR_OPTION_CHANGED] = { .type = NLA_FLAG },
[TEAM_ATTR_OPTION_TYPE] = { .type = NLA_U8 },
[TEAM_ATTR_OPTION_DATA] = { .type = NLA_BINARY },
+ [TEAM_ATTR_OPTION_PORT_IFINDEX] = { .type = NLA_U32 },
};

static int team_nl_cmd_noop(struct sk_buff *skb, struct genl_info *info)
Greg Kroah-Hartman
2020-03-17 10:54:43 UTC
Permalink
From: Vasily Averin <***@virtuozzo.com>

commit 2d4ecb030dcc90fb725ecbfc82ce5d6c37906e0e upstream.

If seq_file .next fuction does not change position index,
read after some lseek can generate unexpected output:

1) dd bs=1 skip output of each 2nd elements
$ dd if=/sys/fs/cgroup/cgroup.procs bs=8 count=1
2
3
4
5
1+0 records in
1+0 records out
8 bytes copied, 0,000267297 s, 29,9 kB/s
[***@localhost ~]$ dd if=/sys/fs/cgroup/cgroup.procs bs=1 count=8
2
4 <<< NB! 3 was skipped
6 <<< ... and 5 too
8 <<< ... and 7
8+0 records in
8+0 records out
8 bytes copied, 5,2123e-05 s, 153 kB/s

This happen because __cgroup_procs_start() makes an extra
extra cgroup_procs_next() call

2) read after lseek beyond end of file generates whole last line.
3) read after lseek into middle of last line generates
expected rest of last line and unexpected whole line once again.

Additionally patch removes an extra position index changes in
__cgroup_procs_start()

Cc: ***@vger.kernel.org
https://bugzilla.kernel.org/show_bug.cgi?id=206283
Signed-off-by: Vasily Averin <***@virtuozzo.com>
Signed-off-by: Tejun Heo <***@kernel.org>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>

---
kernel/cgroup/cgroup.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)

--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -4600,6 +4600,9 @@ static void *cgroup_procs_next(struct se
struct kernfs_open_file *of = s->private;
struct css_task_iter *it = of->priv;

+ if (pos)
+ (*pos)++;
+
return css_task_iter_next(it);
}

@@ -4615,7 +4618,7 @@ static void *__cgroup_procs_start(struct
* from position 0, so we can simply keep iterating on !0 *pos.
*/
if (!it) {
- if (WARN_ON_ONCE((*pos)++))
+ if (WARN_ON_ONCE((*pos)))
return ERR_PTR(-EINVAL);

it = kzalloc(sizeof(*it), GFP_KERNEL);
@@ -4623,10 +4626,11 @@ static void *__cgroup_procs_start(struct
return ERR_PTR(-ENOMEM);
of->priv = it;
css_task_iter_start(&cgrp->self, iter_flags, it);
- } else if (!(*pos)++) {
+ } else if (!(*pos)) {
css_task_iter_end(it);
css_task_iter_start(&cgrp->self, iter_flags, it);
- }
+ } else
+ return it->cur_task;

return cgroup_procs_next(s, NULL, NULL);
}
Greg Kroah-Hartman
2020-03-17 10:54:44 UTC
Permalink
From: Michal Koutný <***@suse.com>

commit 9c974c77246460fa6a92c18554c3311c8c83c160 upstream.

PF_EXITING is set earlier than actual removal from css_set when a task
is exitting. This can confuse cgroup.procs readers who see no PF_EXITING
tasks, however, rmdir is checking against css_set membership so it can
transitionally fail with EBUSY.

Fix this by listing tasks that weren't unlinked from css_set active
lists.
It may happen that other users of the task iterator (without
CSS_TASK_ITER_PROCS) spot a PF_EXITING task before cgroup_exit(). This
is equal to the state before commit c03cd7738a83 ("cgroup: Include dying
leaders with live threads in PROCS iterations") but it may be reviewed
later.

Reported-by: Suren Baghdasaryan <***@google.com>
Fixes: c03cd7738a83 ("cgroup: Include dying leaders with live threads in PROCS iterations")
Signed-off-by: Michal Koutný <***@suse.com>
Signed-off-by: Tejun Heo <***@kernel.org>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>

---
include/linux/cgroup.h | 1 +
kernel/cgroup/cgroup.c | 23 ++++++++++++++++-------
2 files changed, 17 insertions(+), 7 deletions(-)

--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -62,6 +62,7 @@ struct css_task_iter {
struct list_head *mg_tasks_head;
struct list_head *dying_tasks_head;

+ struct list_head *cur_tasks_head;
struct css_set *cur_cset;
struct css_set *cur_dcset;
struct task_struct *cur_task;
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -4405,12 +4405,16 @@ static void css_task_iter_advance_css_se
}
} while (!css_set_populated(cset) && list_empty(&cset->dying_tasks));

- if (!list_empty(&cset->tasks))
+ if (!list_empty(&cset->tasks)) {
it->task_pos = cset->tasks.next;
- else if (!list_empty(&cset->mg_tasks))
+ it->cur_tasks_head = &cset->tasks;
+ } else if (!list_empty(&cset->mg_tasks)) {
it->task_pos = cset->mg_tasks.next;
- else
+ it->cur_tasks_head = &cset->mg_tasks;
+ } else {
it->task_pos = cset->dying_tasks.next;
+ it->cur_tasks_head = &cset->dying_tasks;
+ }

it->tasks_head = &cset->tasks;
it->mg_tasks_head = &cset->mg_tasks;
@@ -4468,10 +4472,14 @@ repeat:
else
it->task_pos = it->task_pos->next;

- if (it->task_pos == it->tasks_head)
+ if (it->task_pos == it->tasks_head) {
it->task_pos = it->mg_tasks_head->next;
- if (it->task_pos == it->mg_tasks_head)
+ it->cur_tasks_head = it->mg_tasks_head;
+ }
+ if (it->task_pos == it->mg_tasks_head) {
it->task_pos = it->dying_tasks_head->next;
+ it->cur_tasks_head = it->dying_tasks_head;
+ }
if (it->task_pos == it->dying_tasks_head)
css_task_iter_advance_css_set(it);
} else {
@@ -4490,11 +4498,12 @@ repeat:
goto repeat;

/* and dying leaders w/o live member threads */
- if (!atomic_read(&task->signal->live))
+ if (it->cur_tasks_head == it->dying_tasks_head &&
+ !atomic_read(&task->signal->live))
goto repeat;
} else {
/* skip all dying ones */
- if (task->flags & PF_EXITING)
+ if (it->cur_tasks_head == it->dying_tasks_head)
goto repeat;
}
}
Greg Kroah-Hartman
2020-03-17 10:54:45 UTC
Permalink
From: Florian Westphal <***@strlen.de>

commit 1d305ba40eb8081ff21eeb8ca6ba5c70fd920934 upstream.

nft will loop forever if the kernel doesn't support an expression:

1. nft_expr_type_get() appends the family specific name to the module list.
2. -EAGAIN is returned to nfnetlink, nfnetlink calls abort path.
3. abort path sets ->done to true and calls request_module for the
expression.
4. nfnetlink replays the batch, we end up in nft_expr_type_get() again.
5. nft_expr_type_get attempts to append family-specific name. This
one already exists on the list, so we continue
6. nft_expr_type_get adds the generic expression name to the module
list. -EAGAIN is returned, nfnetlink calls abort path.
7. abort path encounters the family-specific expression which
has 'done' set, so it gets removed.
8. abort path requests the generic expression name, sets done to true.
9. batch is replayed.

If the expression could not be loaded, then we will end up back at 1),
because the family-specific name got removed and the cycle starts again.

Note that userspace can SIGKILL the nft process to stop the cycle, but
the desired behaviour is to return an error after the generic expr name
fails to load the expression.

Fixes: eb014de4fd418 ("netfilter: nf_tables: autoload modules from the abort path")
Signed-off-by: Florian Westphal <***@strlen.de>
Signed-off-by: Pablo Neira Ayuso <***@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>

---
net/netfilter/nf_tables_api.c | 10 +++-------
1 file changed, 3 insertions(+), 7 deletions(-)

--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -7250,13 +7250,8 @@ static void nf_tables_module_autoload(st
list_splice_init(&net->nft.module_list, &module_list);
mutex_unlock(&net->nft.commit_mutex);
list_for_each_entry_safe(req, next, &module_list, list) {
- if (req->done) {
- list_del(&req->list);
- kfree(req);
- } else {
- request_module("%s", req->module);
- req->done = true;
- }
+ request_module("%s", req->module);
+ req->done = true;
}
mutex_lock(&net->nft.commit_mutex);
list_splice(&module_list, &net->nft.module_list);
@@ -8039,6 +8034,7 @@ static void __net_exit nf_tables_exit_ne
__nft_release_tables(net);
mutex_unlock(&net->nft.commit_mutex);
WARN_ON_ONCE(!list_empty(&net->nft.tables));
+ WARN_ON_ONCE(!list_empty(&net->nft.module_list));
}

static struct pernet_operations nf_tables_net_ops = {
Greg Kroah-Hartman
2020-03-17 10:54:46 UTC
Permalink
From: Dan Moulding <***@me.com>

commit a9149d243f259ad8f02b1e23dfe8ba06128f15e1 upstream.

The logic for checking required NVM sections was recently fixed in
commit b3f20e098293 ("iwlwifi: mvm: fix NVM check for 3168
devices"). However, with that fixed the else is now taken for 3168
devices and within the else clause there is a mandatory check for the
PHY_SKU section. This causes the parsing to fail for 3168 devices.

The PHY_SKU section is really only mandatory for the IWL_NVM_EXT
layout (the phy_sku parameter of iwl_parse_nvm_data is only used when
the NVM type is IWL_NVM_EXT). So this changes the PHY_SKU section
check so that it's only mandatory for IWL_NVM_EXT.

Fixes: b3f20e098293 ("iwlwifi: mvm: fix NVM check for 3168 devices")
Signed-off-by: Dan Moulding <***@me.com>
Signed-off-by: Kalle Valo <***@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>

---
drivers/net/wireless/intel/iwlwifi/mvm/nvm.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

--- a/drivers/net/wireless/intel/iwlwifi/mvm/nvm.c
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/nvm.c
@@ -308,7 +308,8 @@ iwl_parse_nvm_sections(struct iwl_mvm *m
}

/* PHY_SKU section is mandatory in B0 */
- if (!mvm->nvm_sections[NVM_SECTION_TYPE_PHY_SKU].data) {
+ if (mvm->trans->cfg->nvm_type == IWL_NVM_EXT &&
+ !mvm->nvm_sections[NVM_SECTION_TYPE_PHY_SKU].data) {
IWL_ERR(mvm,
"Can't parse phy_sku in B0, empty sections\n");
return NULL;
Greg Kroah-Hartman
2020-03-17 10:54:47 UTC
Permalink
From: Halil Pasic <***@linux.ibm.com>

commit f5f6b95c72f7f8bb46eace8c5306c752d0133daa upstream.

Since nobody else is going to restart our hw_queue for us, the
blk_mq_start_stopped_hw_queues() is in virtblk_done() is not sufficient
necessarily sufficient to ensure that the queue will get started again.
In case of global resource outage (-ENOMEM because mapping failure,
because of swiotlb full) our virtqueue may be empty and we can get
stuck with a stopped hw_queue.

Let us not stop the queue on arbitrary errors, but only on -EONSPC which
indicates a full virtqueue, where the hw_queue is guaranteed to get
started by virtblk_done() before when it makes sense to carry on
submitting requests. Let us also remove a stale comment.

Signed-off-by: Halil Pasic <***@linux.ibm.com>
Cc: Jens Axboe <***@kernel.dk>
Fixes: f7728002c1c7 ("virtio_ring: fix return code on DMA mapping fails")
Link: https://lore.kernel.org/r/20200213123728.61216-2-***@linux.ibm.com
Signed-off-by: Michael S. Tsirkin <***@redhat.com>
Reviewed-by: Stefan Hajnoczi <***@redhat.com>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>

---
drivers/block/virtio_blk.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)

--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -339,10 +339,12 @@ static blk_status_t virtio_queue_rq(stru
err = virtblk_add_req(vblk->vqs[qid].vq, vbr, vbr->sg, num);
if (err) {
virtqueue_kick(vblk->vqs[qid].vq);
- blk_mq_stop_hw_queue(hctx);
+ /* Don't stop the queue if -ENOMEM: we may have failed to
+ * bounce the buffer due to global resource outage.
+ */
+ if (err == -ENOSPC)
+ blk_mq_stop_hw_queue(hctx);
spin_unlock_irqrestore(&vblk->vqs[qid].lock, flags);
- /* Out of mem doesn't actually happen, since we fall back
- * to direct descriptors */
if (err == -ENOMEM || err == -ENOSPC)
return BLK_STS_DEV_RESOURCE;
return BLK_STS_IOERR;
Greg Kroah-Hartman
2020-03-17 10:54:48 UTC
Permalink
From: Hans de Goede <***@redhat.com>

commit 81ee85d0462410de8eeeec1b9761941fd6ed8c7b upstream.

Quoting from the comment describing the WARN functions in
include/asm-generic/bug.h:

* WARN(), WARN_ON(), WARN_ON_ONCE, and so on can be used to report
* significant kernel issues that need prompt attention if they should ever
* appear at runtime.
*
* Do not use these macros when checking for invalid external inputs

The (buggy) firmware tables which the dmar code was calling WARN_TAINT
for really are invalid external inputs. They are not under the kernel's
control and the issues in them cannot be fixed by a kernel update.
So logging a backtrace, which invites bug reports to be filed about this,
is not helpful.

Fixes: 556ab45f9a77 ("ioat2: catch and recover from broken vtd configurations v6")
Signed-off-by: Hans de Goede <***@redhat.com>
Acked-by: Lu Baolu <***@linux.intel.com>
Link: https://lore.kernel.org/r/20200309182510.373875-1-***@redhat.com
BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=701847
Signed-off-by: Joerg Roedel <***@suse.de>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>

---
drivers/iommu/intel-iommu.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)

--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -4141,10 +4141,11 @@ static void quirk_ioat_snb_local_iommu(s

/* we know that the this iommu should be at offset 0xa000 from vtbar */
drhd = dmar_find_matched_drhd_unit(pdev);
- if (WARN_TAINT_ONCE(!drhd || drhd->reg_base_addr - vtbar != 0xa000,
- TAINT_FIRMWARE_WORKAROUND,
- "BIOS assigned incorrect VT-d unit for Intel(R) QuickData Technology device\n"))
+ if (!drhd || drhd->reg_base_addr - vtbar != 0xa000) {
+ pr_warn_once(FW_BUG "BIOS assigned incorrect VT-d unit for Intel(R) QuickData Technology device\n");
+ add_taint(TAINT_FIRMWARE_WORKAROUND, LOCKDEP_STILL_OK);
pdev->dev.archdata.iommu = DUMMY_DEVICE_DOMAIN_INFO;
+ }
}
DECLARE_PCI_FIXUP_ENABLE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_IOAT_SNB, quirk_ioat_snb_local_iommu);
Greg Kroah-Hartman
2020-03-17 10:54:49 UTC
Permalink
From: Vasily Averin <***@virtuozzo.com>

commit dc15af8e9dbd039ebb06336597d2c491ef46ab74 upstream.

If .next function does not change position index,
following .show function will repeat output related
to current position index.

Cc: ***@vger.kernel.org
Fixes: 1f4aace60b0e ("fs/seq_file.c: simplify seq_file iteration code ...")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206283
Signed-off-by: Vasily Averin <***@virtuozzo.com>
Signed-off-by: Pablo Neira Ayuso <***@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>

---
net/netfilter/nf_conntrack_standalone.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/net/netfilter/nf_conntrack_standalone.c
+++ b/net/netfilter/nf_conntrack_standalone.c
@@ -411,7 +411,7 @@ static void *ct_cpu_seq_next(struct seq_
*pos = cpu + 1;
return per_cpu_ptr(net->ct.stat, cpu);
}
-
+ (*pos)++;
return NULL;
}
Greg Kroah-Hartman
2020-03-17 10:54:51 UTC
Permalink
From: Vasily Averin <***@virtuozzo.com>

commit db25517a550926f609c63054b12ea9ad515e1a10 upstream.

If .next function does not change position index,
following .show function will repeat output related
to current position index.

Without the patch:
# dd if=/proc/net/xt_recent/SSH # original file outpt
src=127.0.0.4 ttl: 0 last_seen: 6275444819 oldest_pkt: 1 6275444819
src=127.0.0.2 ttl: 0 last_seen: 6275438906 oldest_pkt: 1 6275438906
src=127.0.0.3 ttl: 0 last_seen: 6275441953 oldest_pkt: 1 6275441953
0+1 records in
0+1 records out
204 bytes copied, 6.1332e-05 s, 3.3 MB/s

Read after lseek into middle of last line (offset 140 in example below)
generates expected end of last line and then unexpected whole last line
once again

# dd if=/proc/net/xt_recent/SSH bs=140 skip=1
dd: /proc/net/xt_recent/SSH: cannot skip to specified offset
127.0.0.3 ttl: 0 last_seen: 6275441953 oldest_pkt: 1 6275441953
src=127.0.0.3 ttl: 0 last_seen: 6275441953 oldest_pkt: 1 6275441953
0+1 records in
0+1 records out
132 bytes copied, 6.2487e-05 s, 2.1 MB/s

Cc: ***@vger.kernel.org
Fixes: 1f4aace60b0e ("fs/seq_file.c: simplify seq_file iteration code ...")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206283
Signed-off-by: Vasily Averin <***@virtuozzo.com>
Signed-off-by: Pablo Neira Ayuso <***@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>

---
net/netfilter/xt_recent.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/net/netfilter/xt_recent.c
+++ b/net/netfilter/xt_recent.c
@@ -492,12 +492,12 @@ static void *recent_seq_next(struct seq_
const struct recent_entry *e = v;
const struct list_head *head = e->list.next;

+ (*pos)++;
while (head == &t->iphash[st->bucket]) {
if (++st->bucket >= ip_list_hash_size)
return NULL;
head = t->iphash[st->bucket].next;
}
- (*pos)++;
return list_entry(head, struct recent_entry, list);
}
Greg Kroah-Hartman
2020-03-17 10:54:50 UTC
Permalink
From: Vasily Averin <***@virtuozzo.com>

commit bb71f846a0002239f7058c84f1496648ff4a5c20 upstream.

If .next function does not change position index,
following .show function will repeat output related
to current position index.

Cc: ***@vger.kernel.org
Fixes: 1f4aace60b0e ("fs/seq_file.c: simplify seq_file iteration code ...")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206283
Signed-off-by: Vasily Averin <***@virtuozzo.com>
Signed-off-by: Pablo Neira Ayuso <***@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>

---
net/netfilter/nf_synproxy_core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/net/netfilter/nf_synproxy_core.c
+++ b/net/netfilter/nf_synproxy_core.c
@@ -267,7 +267,7 @@ static void *synproxy_cpu_seq_next(struc
*pos = cpu + 1;
return per_cpu_ptr(snet->stats, cpu);
}
-
+ (*pos)++;
return NULL;
}
Greg Kroah-Hartman
2020-03-17 10:54:52 UTC
Permalink
From: Vasily Averin <***@virtuozzo.com>

commit ee84f19cbbe9cf7cba2958acb03163fed3ecbb0f upstream.

If .next function does not change position index,
following .show function will repeat output related
to current position index.

Without patch:
# dd if=/proc/net/ip_tables_matches # original file output
conntrack
conntrack
conntrack
recent
recent
icmp
udplite
udp
tcp
0+1 records in
0+1 records out
65 bytes copied, 5.4074e-05 s, 1.2 MB/s

# dd if=/proc/net/ip_tables_matches bs=62 skip=1
dd: /proc/net/ip_tables_matches: cannot skip to specified offset
cp <<< end of last line
tcp <<< and then unexpected whole last line once again
0+1 records in
0+1 records out
7 bytes copied, 0.000102447 s, 68.3 kB/s

Cc: ***@vger.kernel.org
Fixes: 1f4aace60b0e ("fs/seq_file.c: simplify seq_file iteration code ...")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206283
Signed-off-by: Vasily Averin <***@virtuozzo.com>
Signed-off-by: Pablo Neira Ayuso <***@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>

---
net/netfilter/x_tables.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

--- a/net/netfilter/x_tables.c
+++ b/net/netfilter/x_tables.c
@@ -1551,6 +1551,9 @@ static void *xt_mttg_seq_next(struct seq
uint8_t nfproto = (unsigned long)PDE_DATA(file_inode(seq->file));
struct nf_mttg_trav *trav = seq->private;

+ if (ppos != NULL)
+ ++(*ppos);
+
switch (trav->class) {
case MTTG_TRAV_INIT:
trav->class = MTTG_TRAV_NFP_UNSPEC;
@@ -1576,9 +1579,6 @@ static void *xt_mttg_seq_next(struct seq
default:
return NULL;
}
-
- if (ppos != NULL)
- ++*ppos;
return trav;
}
Greg Kroah-Hartman
2020-03-17 10:54:27 UTC
Permalink
From: Jakub Kicinski <***@kernel.org>

[ Upstream commit 669fcd7795900cd1880237cbbb57a7db66cb9ac8 ]

Add missing attribute validation for TEAM_ATTR_OPTION_ARRAY_INDEX
to the netlink policy.

Fixes: b13033262d24 ("team: introduce array options")
Signed-off-by: Jakub Kicinski <***@kernel.org>
Reviewed-by: Jiri Pirko <***@mellanox.com>
Signed-off-by: David S. Miller <***@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>
---
drivers/net/team/team.c | 1 +
1 file changed, 1 insertion(+)

--- a/drivers/net/team/team.c
+++ b/drivers/net/team/team.c
@@ -2241,6 +2241,7 @@ team_nl_option_policy[TEAM_ATTR_OPTION_M
[TEAM_ATTR_OPTION_TYPE] = { .type = NLA_U8 },
[TEAM_ATTR_OPTION_DATA] = { .type = NLA_BINARY },
[TEAM_ATTR_OPTION_PORT_IFINDEX] = { .type = NLA_U32 },
+ [TEAM_ATTR_OPTION_ARRAY_INDEX] = { .type = NLA_U32 },
};

static int team_nl_cmd_noop(struct sk_buff *skb, struct genl_info *info)
Greg Kroah-Hartman
2020-03-17 10:54:53 UTC
Permalink
From: Hillf Danton <***@sina.com>

commit aa202f1f56960c60e7befaa0f49c72b8fa11b0a8 upstream.

wq_select_unbound_cpu() is designed for unbound workqueues only, but
it's wrongly called when using a bound workqueue too.

Fixing this ensures work queued to a bound workqueue with
cpu=WORK_CPU_UNBOUND always runs on the local CPU.

Before, that would happen only if wq_unbound_cpumask happened to include
it (likely almost always the case), or was empty, or we got lucky with
forced round-robin placement. So restricting
/sys/devices/virtual/workqueue/cpumask to a small subset of a machine's
CPUs would cause some bound work items to run unexpectedly there.

Fixes: ef557180447f ("workqueue: schedule WORK_CPU_UNBOUND work on wq_unbound_cpumask CPUs")
Cc: ***@vger.kernel.org # v4.5+
Signed-off-by: Hillf Danton <***@sina.com>
[dj: massage changelog]
Signed-off-by: Daniel Jordan <***@oracle.com>
Cc: Tejun Heo <***@kernel.org>
Cc: Lai Jiangshan <***@gmail.com>
Cc: linux-***@vger.kernel.org
Signed-off-by: Tejun Heo <***@kernel.org>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>

---
kernel/workqueue.c | 14 ++++++++------
1 file changed, 8 insertions(+), 6 deletions(-)

--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1411,14 +1411,16 @@ static void __queue_work(int cpu, struct
return;
rcu_read_lock();
retry:
- if (req_cpu == WORK_CPU_UNBOUND)
- cpu = wq_select_unbound_cpu(raw_smp_processor_id());
-
/* pwq which will be used unless @work is executing elsewhere */
- if (!(wq->flags & WQ_UNBOUND))
- pwq = per_cpu_ptr(wq->cpu_pwqs, cpu);
- else
+ if (wq->flags & WQ_UNBOUND) {
+ if (req_cpu == WORK_CPU_UNBOUND)
+ cpu = wq_select_unbound_cpu(raw_smp_processor_id());
pwq = unbound_pwq_by_node(wq, cpu_to_node(cpu));
+ } else {
+ if (req_cpu == WORK_CPU_UNBOUND)
+ cpu = raw_smp_processor_id();
+ pwq = per_cpu_ptr(wq->cpu_pwqs, cpu);
+ }

/*
* If @work was previously on a different pool, it might still be
Greg Kroah-Hartman
2020-03-17 10:54:54 UTC
Permalink
From: Colin Ian King <***@canonical.com>

commit d785476c608c621b345dd9396e8b21e90375cb0e upstream.

Variable grph_obj_type is being assigned twice, one of these is
redundant so remove it.

Addresses-Coverity: ("Evaluation order violation")
Signed-off-by: Colin Ian King <***@canonical.com>
Signed-off-by: Alex Deucher <***@amd.com>
Cc: <***@toshiba.co.jp>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>

---
drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c
@@ -365,8 +365,7 @@ bool amdgpu_atombios_get_connector_info_
router.ddc_valid = false;
router.cd_valid = false;
for (j = 0; j < ((le16_to_cpu(path->usSize) - 8) / 2); j++) {
- uint8_t grph_obj_type=
- grph_obj_type =
+ uint8_t grph_obj_type =
(le16_to_cpu(path->usGraphicObjIds[j]) &
OBJECT_TYPE_MASK) >> OBJECT_TYPE_SHIFT;
Greg Kroah-Hartman
2020-03-17 10:54:55 UTC
Permalink
From: Chris Wilson <***@chris-wilson.co.uk>

commit c67b35d970ed3391069c21f3071a26f687399ab2 upstream.

Fix the inverted test to emit the wait on the end of the previous
request if we /haven't/ already.

Fixes: 6a79d848403d ("drm/i915: Lock signaler timeline while navigating")
Signed-off-by: Chris Wilson <***@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <***@linux.intel.com>
Cc: <***@vger.kernel.org> # v5.5+
Reviewed-by: Tvrtko Ursulin <***@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200305104210.2619967-1-***@chris-wilson.co.uk
(cherry picked from commit 07e9c59d63df6a1c44c1975c01827ba18b69270a)
Signed-off-by: Jani Nikula <***@intel.com>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>

---
drivers/gpu/drm/i915/i915_request.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -785,7 +785,7 @@ i915_request_await_start(struct i915_req
return PTR_ERR_OR_ZERO(fence);

err = 0;
- if (intel_timeline_sync_is_later(i915_request_timeline(rq), fence))
+ if (!intel_timeline_sync_is_later(i915_request_timeline(rq), fence))
err = i915_sw_fence_await_dma_fence(&rq->submit,
fence, 0,
I915_FENCE_GFP);
Greg Kroah-Hartman
2020-03-17 10:54:57 UTC
Permalink
From: Matthew Auld <***@intel.com>

commit 1d61c5d711a2dc0b978ae905535edee9601f9449 upstream.

The alignment is u64, and yet is_power_of_2() assumes unsigned long,
which might give different results between 32b and 64b kernel.

Signed-off-by: Matthew Auld <***@intel.com>
Cc: Chris Wilson <***@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <***@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <***@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200305203534.210466-1-***@intel.com
Cc: ***@vger.kernel.org
(cherry picked from commit 2920516b2f719546f55079bc39a7fe409d9e80ab)
Signed-off-by: Jani Nikula <***@intel.com>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>

---
drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 3 ++-
drivers/gpu/drm/i915/i915_utils.h | 5 +++++
2 files changed, 7 insertions(+), 1 deletion(-)

--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -441,7 +441,8 @@ eb_validate_vma(struct i915_execbuffer *
if (unlikely(entry->flags & eb->invalid_flags))
return -EINVAL;

- if (unlikely(entry->alignment && !is_power_of_2(entry->alignment)))
+ if (unlikely(entry->alignment &&
+ !is_power_of_2_u64(entry->alignment)))
return -EINVAL;

/*
--- a/drivers/gpu/drm/i915/i915_utils.h
+++ b/drivers/gpu/drm/i915/i915_utils.h
@@ -234,6 +234,11 @@ static inline u64 ptr_to_u64(const void
__idx; \
})

+static inline bool is_power_of_2_u64(u64 n)
+{
+ return (n != 0 && ((n & (n - 1)) == 0));
+}
+
static inline void __list_del_many(struct list_head *head,
struct list_head *first)
{
Greg Kroah-Hartman
2020-03-17 10:54:58 UTC
Permalink
From: Chris Wilson <***@chris-wilson.co.uk>

commit 14a0d527a479eb2cb6067f9e5e163e1bf35db2a9 upstream.

Since the semaphore fence may be signaled from inside an interrupt
handler from inside a request holding its request->lock, we cannot then
enter into the engine->active.lock for processing the semaphore priority
bump as we may traverse our call tree and end up on another held
request.

CPU 0:
[ 2243.218864] _raw_spin_lock_irqsave+0x9a/0xb0
[ 2243.218867] i915_schedule_bump_priority+0x49/0x80 [i915]
[ 2243.218869] semaphore_notify+0x6d/0x98 [i915]
[ 2243.218871] __i915_sw_fence_complete+0x61/0x420 [i915]
[ 2243.218874] ? kmem_cache_free+0x211/0x290
[ 2243.218876] i915_sw_fence_complete+0x58/0x80 [i915]
[ 2243.218879] dma_i915_sw_fence_wake+0x3e/0x80 [i915]
[ 2243.218881] signal_irq_work+0x571/0x690 [i915]
[ 2243.218883] irq_work_run_list+0xd7/0x120
[ 2243.218885] irq_work_run+0x1d/0x50
[ 2243.218887] smp_irq_work_interrupt+0x21/0x30
[ 2243.218889] irq_work_interrupt+0xf/0x20

CPU 1:
[ 2242.173107] _raw_spin_lock+0x8f/0xa0
[ 2242.173110] __i915_request_submit+0x64/0x4a0 [i915]
[ 2242.173112] __execlists_submission_tasklet+0x8ee/0x2120 [i915]
[ 2242.173114] ? i915_sched_lookup_priolist+0x1e3/0x2b0 [i915]
[ 2242.173117] execlists_submit_request+0x2e8/0x2f0 [i915]
[ 2242.173119] submit_notify+0x8f/0xc0 [i915]
[ 2242.173121] __i915_sw_fence_complete+0x61/0x420 [i915]
[ 2242.173124] ? _raw_spin_unlock_irqrestore+0x39/0x40
[ 2242.173137] i915_sw_fence_complete+0x58/0x80 [i915]
[ 2242.173140] i915_sw_fence_commit+0x16/0x20 [i915]

Closes: https://gitlab.freedesktop.org/drm/intel/issues/1318
Fixes: b7404c7ecb38 ("drm/i915: Bump ready tasks ahead of busywaits")
Signed-off-by: Chris Wilson <***@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <***@intel.com>
Cc: <***@vger.kernel.org> # v5.2+
Reviewed-by: Tvrtko Ursulin <***@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200310101720.9944-1-***@chris-wilson.co.uk
(cherry picked from commit 209df10bb4536c81c2540df96c02cd079435357f)
Signed-off-by: Jani Nikula <***@intel.com>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>

---
drivers/gpu/drm/i915/i915_request.c | 22 +++++++++++++++++-----
drivers/gpu/drm/i915/i915_request.h | 2 ++
2 files changed, 19 insertions(+), 5 deletions(-)

--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -529,19 +529,31 @@ submit_notify(struct i915_sw_fence *fenc
return NOTIFY_DONE;
}

+static void irq_semaphore_cb(struct irq_work *wrk)
+{
+ struct i915_request *rq =
+ container_of(wrk, typeof(*rq), semaphore_work);
+
+ i915_schedule_bump_priority(rq, I915_PRIORITY_NOSEMAPHORE);
+ i915_request_put(rq);
+}
+
static int __i915_sw_fence_call
semaphore_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
{
- struct i915_request *request =
- container_of(fence, typeof(*request), semaphore);
+ struct i915_request *rq = container_of(fence, typeof(*rq), semaphore);

switch (state) {
case FENCE_COMPLETE:
- i915_schedule_bump_priority(request, I915_PRIORITY_NOSEMAPHORE);
+ if (!(READ_ONCE(rq->sched.attr.priority) & I915_PRIORITY_NOSEMAPHORE)) {
+ i915_request_get(rq);
+ init_irq_work(&rq->semaphore_work, irq_semaphore_cb);
+ irq_work_queue(&rq->semaphore_work);
+ }
break;

case FENCE_FREE:
- i915_request_put(request);
+ i915_request_put(rq);
break;
}

@@ -1283,9 +1295,9 @@ void __i915_request_queue(struct i915_re
* decide whether to preempt the entire chain so that it is ready to
* run at the earliest possible convenience.
*/
- i915_sw_fence_commit(&rq->semaphore);
if (attr && rq->engine->schedule)
rq->engine->schedule(rq, attr);
+ i915_sw_fence_commit(&rq->semaphore);
i915_sw_fence_commit(&rq->submit);
}

--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -26,6 +26,7 @@
#define I915_REQUEST_H

#include <linux/dma-fence.h>
+#include <linux/irq_work.h>
#include <linux/lockdep.h>

#include "gt/intel_context_types.h"
@@ -147,6 +148,7 @@ struct i915_request {
};
struct list_head execute_cb;
struct i915_sw_fence semaphore;
+ struct irq_work semaphore_work;

/*
* A list of everyone we wait upon, and everyone who waits upon us.
Greg Kroah-Hartman
2020-03-17 10:54:59 UTC
Permalink
From: Chris Wilson <***@chris-wilson.co.uk>

commit 8ea6bb8e4d47e07518e5dba4f5cb77e210f0df82 upstream.

If the cacheline may still be busy, atomically mark it for future
release, and only if we can determine that it will never be used again,
immediately free it.

Closes: https://gitlab.freedesktop.org/drm/intel/issues/1392
Fixes: ebece7539242 ("drm/i915: Keep timeline HWSP allocated until idle across the system")
Signed-off-by: Chris Wilson <***@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <***@intel.com>
Cc: Mika Kuoppala <***@linux.intel.com>
Cc: Matthew Auld <***@intel.com>
Reviewed-by: Mika Kuoppala <***@linux.intel.com>
Cc: <***@vger.kernel.org> # v5.2+
Link: https://patchwork.freedesktop.org/patch/msgid/20200306154647.3528345-1-***@chris-wilson.co.uk
(cherry picked from commit 2d4bd971f5baa51418625f379a69f5d58b5a0450)
Signed-off-by: Jani Nikula <***@intel.com>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>

---
drivers/gpu/drm/i915/gt/intel_timeline.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)

--- a/drivers/gpu/drm/i915/gt/intel_timeline.c
+++ b/drivers/gpu/drm/i915/gt/intel_timeline.c
@@ -197,11 +197,15 @@ static void cacheline_release(struct int

static void cacheline_free(struct intel_timeline_cacheline *cl)
{
+ if (!i915_active_acquire_if_busy(&cl->active)) {
+ __idle_cacheline_free(cl);
+ return;
+ }
+
GEM_BUG_ON(ptr_test_bit(cl->vaddr, CACHELINE_FREE));
cl->vaddr = ptr_set_bit(cl->vaddr, CACHELINE_FREE);

- if (i915_active_is_idle(&cl->active))
- __idle_cacheline_free(cl);
+ i915_active_release(&cl->active);
}

int intel_timeline_init(struct intel_timeline *timeline,
Greg Kroah-Hartman
2020-03-17 10:55:00 UTC
Permalink
From: Chris Wilson <***@chris-wilson.co.uk>

commit eafc2aa20fba319b6e791a1b0c45a91511eccb6b upstream.

If we stop filling the ELSP due to an incompatible virtual engine
request, check if we should enable the timeslice on behalf of the queue.

This fixes the case where we are inspecting the last->next element when
we know that the last element is the last request in the execution queue,
and so decided we did not need to enable timeslicing despite the intent
to do so!

Fixes: 8ee36e048c98 ("drm/i915/execlists: Minimalistic timeslicing")
Signed-off-by: Chris Wilson <***@chris-wilson.co.uk>
Cc: Mika Kuoppala <***@linux.intel.com>
Cc: Tvrtko Ursulin <***@intel.com>
Cc: <***@vger.kernel.org> # v5.4+
Reviewed-by: Mika Kuoppala <***@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200306113012.3184606-1-***@chris-wilson.co.uk
(cherry picked from commit 3df2deed411e0f1b7312baf0139aab8bba4c0410)
Signed-off-by: Jani Nikula <***@intel.com>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>

---
drivers/gpu/drm/i915/gt/intel_lrc.c | 29 ++++++++++++++++++-----------
1 file changed, 18 insertions(+), 11 deletions(-)

--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -1501,11 +1501,9 @@ need_timeslice(struct intel_engine_cs *e
if (!intel_engine_has_timeslices(engine))
return false;

- if (list_is_last(&rq->sched.link, &engine->active.requests))
- return false;
-
- hint = max(rq_prio(list_next_entry(rq, sched.link)),
- engine->execlists.queue_priority_hint);
+ hint = engine->execlists.queue_priority_hint;
+ if (!list_is_last(&rq->sched.link, &engine->active.requests))
+ hint = max(hint, rq_prio(list_next_entry(rq, sched.link)));

return hint >= effective_prio(rq);
}
@@ -1547,6 +1545,18 @@ static void set_timeslice(struct intel_e
set_timer_ms(&engine->execlists.timer, active_timeslice(engine));
}

+static void start_timeslice(struct intel_engine_cs *engine)
+{
+ struct intel_engine_execlists *execlists = &engine->execlists;
+
+ execlists->switch_priority_hint = execlists->queue_priority_hint;
+
+ if (timer_pending(&execlists->timer))
+ return;
+
+ set_timer_ms(&execlists->timer, timeslice(engine));
+}
+
static void record_preemption(struct intel_engine_execlists *execlists)
{
(void)I915_SELFTEST_ONLY(execlists->preempt_hang.count++);
@@ -1705,11 +1715,7 @@ static void execlists_dequeue(struct int
* Even if ELSP[1] is occupied and not worthy
* of timeslices, our queue might be.
*/
- if (!execlists->timer.expires &&
- need_timeslice(engine, last))
- set_timer_ms(&execlists->timer,
- timeslice(engine));
-
+ start_timeslice(engine);
return;
}
}
@@ -1744,7 +1750,8 @@ static void execlists_dequeue(struct int

if (last && !can_merge_rq(last, rq)) {
spin_unlock(&ve->base.active.lock);
- return; /* leave this for another */
+ start_timeslice(engine);
+ return; /* leave this for another sibling */
}

GEM_TRACE("%s: virtual rq=%llx:%lld%s, new engine? %s\n",
Greg Kroah-Hartman
2020-03-17 10:55:01 UTC
Permalink
From: Ben Chuang <***@genesyslogic.com.tw>

commit 31e43f31890ca6e909b27dcb539252b46aa465da upstream.

Enable MSI interrupt for GL9750/GL9755. Some platforms
do not support PCI INTx and devices can not work without
interrupt. Like messages below:

[ 4.487132] sdhci-pci 0000:01:00.0: SDHCI controller found [17a0:9755] (rev 0)
[ 4.487198] ACPI BIOS Error (bug): Could not resolve symbol [\_SB.PCI0.PBR2._PRT.APS2], AE_NOT_FOUND (20190816/psargs-330)
[ 4.487397] ACPI Error: Aborting method \_SB.PCI0.PBR2._PRT due to previous error (AE_NOT_FOUND) (20190816/psparse-529)
[ 4.487707] pcieport 0000:00:01.3: can't derive routing for PCI INT A
[ 4.487709] sdhci-pci 0000:01:00.0: PCI INT A: no GSI

Signed-off-by: Ben Chuang <***@genesyslogic.com.tw>
Tested-by: Raul E Rangel <***@chromium.org>
Fixes: e51df6ce668a ("mmc: host: sdhci-pci: Add Genesys Logic GL975x support")
Cc: ***@vger.kernel.org
Link: https://lore.kernel.org/r/20200219092900.9151-1-***@gmail.com
Signed-off-by: Ulf Hansson <***@linaro.org>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>

---
drivers/mmc/host/sdhci-pci-gli.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)

--- a/drivers/mmc/host/sdhci-pci-gli.c
+++ b/drivers/mmc/host/sdhci-pci-gli.c
@@ -262,10 +262,26 @@ static int gl9750_execute_tuning(struct
return 0;
}

+static void gli_pcie_enable_msi(struct sdhci_pci_slot *slot)
+{
+ int ret;
+
+ ret = pci_alloc_irq_vectors(slot->chip->pdev, 1, 1,
+ PCI_IRQ_MSI | PCI_IRQ_MSIX);
+ if (ret < 0) {
+ pr_warn("%s: enable PCI MSI failed, error=%d\n",
+ mmc_hostname(slot->host->mmc), ret);
+ return;
+ }
+
+ slot->host->irq = pci_irq_vector(slot->chip->pdev, 0);
+}
+
static int gli_probe_slot_gl9750(struct sdhci_pci_slot *slot)
{
struct sdhci_host *host = slot->host;

+ gli_pcie_enable_msi(slot);
slot->host->mmc->caps2 |= MMC_CAP2_NO_SDIO;
sdhci_enable_v4_mode(host);

@@ -276,6 +292,7 @@ static int gli_probe_slot_gl9755(struct
{
struct sdhci_host *host = slot->host;

+ gli_pcie_enable_msi(slot);
slot->host->mmc->caps2 |= MMC_CAP2_NO_SDIO;
sdhci_enable_v4_mode(host);
Greg Kroah-Hartman
2020-03-17 10:54:28 UTC
Permalink
From: Jakub Kicinski <***@kernel.org>

[ Upstream commit 213320a67962ff6e7b83b704d55cbebc341426db ]

Add missing attribute validation for TIPC_NLA_PROP_MTU
to the netlink policy.

Fixes: 901271e0403a ("tipc: implement configuration of UDP media MTU")
Signed-off-by: Jakub Kicinski <***@kernel.org>
Signed-off-by: Jakub Kicinski <***@kernel.org>
Signed-off-by: David S. Miller <***@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>
---
net/tipc/netlink.c | 1 +
1 file changed, 1 insertion(+)

--- a/net/tipc/netlink.c
+++ b/net/tipc/netlink.c
@@ -115,6 +115,7 @@ const struct nla_policy tipc_nl_prop_pol
[TIPC_NLA_PROP_PRIO] = { .type = NLA_U32 },
[TIPC_NLA_PROP_TOL] = { .type = NLA_U32 },
[TIPC_NLA_PROP_WIN] = { .type = NLA_U32 },
+ [TIPC_NLA_PROP_MTU] = { .type = NLA_U32 },
[TIPC_NLA_PROP_BROADCAST] = { .type = NLA_U32 },
[TIPC_NLA_PROP_BROADCAST_RATIO] = { .type = NLA_U32 }
};
Greg Kroah-Hartman
2020-03-17 10:55:02 UTC
Permalink
From: Mathias Kresin <***@kresin.me>

commit d62e7fbea4951c124a24176da0c7bf3003ec53d4 upstream.

Add the missing semicolon after of_node_put to get the file compiled.

Fixes: f17d2f54d36d ("pinctrl: falcon: Add of_node_put() before return")
Cc: ***@vger.kernel.org # v5.4+
Signed-off-by: Mathias Kresin <***@kresin.me>
Link: https://lore.kernel.org/r/20200305182245.9636-1-***@kresin.me
Acked-by: Thomas Langer <***@intel.com>
Signed-off-by: Linus Walleij <***@linaro.org>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>

---
drivers/pinctrl/pinctrl-falcon.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/pinctrl/pinctrl-falcon.c
+++ b/drivers/pinctrl/pinctrl-falcon.c
@@ -451,7 +451,7 @@ static int pinctrl_falcon_probe(struct p
falcon_info.clk[*bank] = clk_get(&ppdev->dev, NULL);
if (IS_ERR(falcon_info.clk[*bank])) {
dev_err(&ppdev->dev, "failed to get clock\n");
- of_node_put(np)
+ of_node_put(np);
return PTR_ERR(falcon_info.clk[*bank]);
}
falcon_info.membase[*bank] = devm_ioremap_resource(&pdev->dev,
Greg Kroah-Hartman
2020-03-17 10:54:29 UTC
Permalink
From: Jakub Kicinski <***@kernel.org>

[ Upstream commit 361d23e41ca6e504033f7e66a03b95788377caae ]

Add missing attribute validation for NFC_ATTR_SE_INDEX
to the netlink policy.

Fixes: 5ce3f32b5264 ("NFC: netlink: SE API implementation")
Signed-off-by: Jakub Kicinski <***@kernel.org>
Signed-off-by: David S. Miller <***@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>
---
net/nfc/netlink.c | 1 +
1 file changed, 1 insertion(+)

--- a/net/nfc/netlink.c
+++ b/net/nfc/netlink.c
@@ -43,6 +43,7 @@ static const struct nla_policy nfc_genl_
[NFC_ATTR_LLC_SDP] = { .type = NLA_NESTED },
[NFC_ATTR_FIRMWARE_NAME] = { .type = NLA_STRING,
.len = NFC_FIRMWARE_NAME_MAXSIZE },
+ [NFC_ATTR_SE_INDEX] = { .type = NLA_U32 },
[NFC_ATTR_SE_APDU] = { .type = NLA_BINARY },
[NFC_ATTR_VENDOR_DATA] = { .type = NLA_BINARY },
Greg Kroah-Hartman
2020-03-17 10:54:31 UTC
Permalink
From: Jakub Kicinski <***@kernel.org>

[ Upstream commit 6ba3da446551f2150fadbf8c7788edcb977683d3 ]

Add missing attribute validation for vendor subcommand attributes
to the netlink policy.

Fixes: 9e58095f9660 ("NFC: netlink: Implement vendor command support")
Signed-off-by: Jakub Kicinski <***@kernel.org>
Signed-off-by: David S. Miller <***@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>
---
net/nfc/netlink.c | 2 ++
1 file changed, 2 insertions(+)

--- a/net/nfc/netlink.c
+++ b/net/nfc/netlink.c
@@ -46,6 +46,8 @@ static const struct nla_policy nfc_genl_
.len = NFC_FIRMWARE_NAME_MAXSIZE },
[NFC_ATTR_SE_INDEX] = { .type = NLA_U32 },
[NFC_ATTR_SE_APDU] = { .type = NLA_BINARY },
+ [NFC_ATTR_VENDOR_ID] = { .type = NLA_U32 },
+ [NFC_ATTR_VENDOR_SUBCMD] = { .type = NLA_U32 },
[NFC_ATTR_VENDOR_DATA] = { .type = NLA_BINARY },

};
Greg Kroah-Hartman
2020-03-17 10:54:30 UTC
Permalink
From: Jakub Kicinski <***@kernel.org>

[ Upstream commit 88e706d5168b07df4792dbc3d1bc37b83e4bd74d ]

Add missing attribute validation for NFC_ATTR_TARGET_INDEX
to the netlink policy.

Fixes: 4d63adfe12dd ("NFC: Add NFC_CMD_DEACTIVATE_TARGET support")
Signed-off-by: Jakub Kicinski <***@kernel.org>
Signed-off-by: David S. Miller <***@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>
---
net/nfc/netlink.c | 1 +
1 file changed, 1 insertion(+)

--- a/net/nfc/netlink.c
+++ b/net/nfc/netlink.c
@@ -32,6 +32,7 @@ static const struct nla_policy nfc_genl_
[NFC_ATTR_DEVICE_NAME] = { .type = NLA_STRING,
.len = NFC_DEVICE_NAME_MAXSIZE },
[NFC_ATTR_PROTOCOLS] = { .type = NLA_U32 },
+ [NFC_ATTR_TARGET_INDEX] = { .type = NLA_U32 },
[NFC_ATTR_COMM_MODE] = { .type = NLA_U8 },
[NFC_ATTR_RF_MODE] = { .type = NLA_U8 },
[NFC_ATTR_DEVICE_POWERED] = { .type = NLA_U8 },
Greg Kroah-Hartman
2020-03-17 10:54:24 UTC
Permalink
From: Jakub Kicinski <***@kernel.org>

[ Upstream commit 7e6dc03eeb023e18427a373522f1d247b916a641 ]

Add missing attribute validation for TCA_FQ_ORPHAN_MASK
to the netlink policy.

Fixes: 06eb395fa985 ("pkt_sched: fq: better control of DDOS traffic")
Signed-off-by: Jakub Kicinski <***@kernel.org>
Signed-off-by: David S. Miller <***@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>
---
net/sched/sch_fq.c | 1 +
1 file changed, 1 insertion(+)

--- a/net/sched/sch_fq.c
+++ b/net/sched/sch_fq.c
@@ -744,6 +744,7 @@ static const struct nla_policy fq_policy
[TCA_FQ_FLOW_MAX_RATE] = { .type = NLA_U32 },
[TCA_FQ_BUCKETS_LOG] = { .type = NLA_U32 },
[TCA_FQ_FLOW_REFILL_DELAY] = { .type = NLA_U32 },
+ [TCA_FQ_ORPHAN_MASK] = { .type = NLA_U32 },
[TCA_FQ_LOW_RATE_THRESHOLD] = { .type = NLA_U32 },
[TCA_FQ_CE_THRESHOLD] = { .type = NLA_U32 },
};
Greg Kroah-Hartman
2020-03-17 10:55:04 UTC
Permalink
From: Steven Rostedt (VMware) <***@goodmis.org>

commit 4d00fc477a2ce8b6d2b09fb34ef9fe9918e7d434 upstream.

Before rebooting the box, a "ssh sync" is called to the test machine to see
if it is alive or not. But if the test machine is in a partial state, that
ssh may never actually finish, and the ktest test hangs.

Add a 10 second timeout to the sync test, which will fail after 10 seconds
and then cause the test to reboot the test machine.

Cc: ***@vger.kernel.org
Fixes: 6474ace999edd ("ktest.pl: Powercycle the box on reboot if no connection can be made")
Signed-off-by: Steven Rostedt (VMware) <***@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>

---
tools/testing/ktest/ktest.pl | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/tools/testing/ktest/ktest.pl
+++ b/tools/testing/ktest/ktest.pl
@@ -1383,7 +1383,7 @@ sub reboot {

} else {
# Make sure everything has been written to disk
- run_ssh("sync");
+ run_ssh("sync", 10);

if (defined($time)) {
start_monitor;
Greg Kroah-Hartman
2020-03-17 10:55:13 UTC
Permalink
From: H. Nikolaus Schaller <***@goldelico.com>

commit 130ab8819d81bd96f1a71e8461a8f73edf1fbe82 upstream.

Interrupts should not be specified by interrupt line but by
gpio parent and reference.

Fixes: 73f2b940474d ("MIPS: CI20: DTS: Add I2C nodes")
Cc: ***@vger.kernel.org
Signed-off-by: H. Nikolaus Schaller <***@goldelico.com>
Reviewed-by: Paul Cercueil <***@crapouillou.net>
Signed-off-by: Thomas Bogendoerfer <***@alpha.franken.de>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>

---
arch/mips/boot/dts/ingenic/ci20.dts | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

--- a/arch/mips/boot/dts/ingenic/ci20.dts
+++ b/arch/mips/boot/dts/ingenic/ci20.dts
@@ -4,6 +4,7 @@
#include "jz4780.dtsi"
#include <dt-bindings/clock/ingenic,tcu.h>
#include <dt-bindings/gpio/gpio.h>
+#include <dt-bindings/interrupt-controller/irq.h>
#include <dt-bindings/regulator/active-semi,8865-regulator.h>

/ {
@@ -270,7 +271,9 @@
***@51 {
compatible = "nxp,pcf8563";
reg = <0x51>;
- interrupts = <110>;
+
+ interrupt-parent = <&gpf>;
+ interrupts = <30 IRQ_TYPE_LEVEL_LOW>;
};
};
Greg Kroah-Hartman
2020-03-17 10:55:14 UTC
Permalink
From: Paul Cercueil <***@crapouillou.net>

commit 8e029eb0bcd6a7fab6dc9191152c085784c31ee6 upstream.

The CONFIG_MIPS_CMDLINE_DTB_EXTEND option is used so that the kernel
arguments provided in the 'bootargs' property in devicetree are extended
with the kernel arguments provided by the bootloader.

The code was broken, as it didn't actually take any of the kernel
arguments provided in devicetree when that option was set.

Fixes: 7784cac69735 ("MIPS: cmdline: Clean up boot_command_line initialization")
Cc: ***@vger.kernel.org
Signed-off-by: Paul Cercueil <***@crapouillou.net>
Signed-off-by: Thomas Bogendoerfer <***@alpha.franken.de>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>

---
arch/mips/kernel/setup.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

--- a/arch/mips/kernel/setup.c
+++ b/arch/mips/kernel/setup.c
@@ -606,7 +606,8 @@ static void __init bootcmdline_init(char
* If we're configured to take boot arguments from DT, look for those
* now.
*/
- if (IS_ENABLED(CONFIG_MIPS_CMDLINE_FROM_DTB))
+ if (IS_ENABLED(CONFIG_MIPS_CMDLINE_FROM_DTB) ||
+ IS_ENABLED(CONFIG_MIPS_CMDLINE_DTB_EXTEND))
of_scan_flat_dt(bootcmdline_scan_chosen, &dt_bootargs);
#endif
Greg Kroah-Hartman
2020-03-17 10:55:15 UTC
Permalink
From: Stefan Haberland <***@linux.ibm.com>

commit 5e6bdd37c5526ef01326df5dabb93011ee89237e upstream.

Devices are formatted in multiple of tracks.
For an Extent Space Efficient (ESE) volume we get errors when accessing
unformatted tracks. In this case the driver either formats the track on
the flight for write requests or returns zero data for read requests.

In case a request spans multiple tracks, the indication of an unformatted
track presented for the first track is incorrectly applied to all tracks
covered by the request. As a result, tracks containing data will be handled
as empty, resulting in zero data being returned on read, or overwriting
existing data with zero on write.

Fix by determining the track that gets the NRF error.
For write requests only format the track that is surely not formatted.
For Read requests all tracks before have returned valid data and should not
be touched.
All tracks after the unformatted track might be formatted or not. Those are
returned to the blocklayer to build a new request.

When using alias devices there is a chance that multiple write requests
trigger a format of the same track which might lead to data loss. Ensure
that a track is formatted only once by maintaining a list of currently
processed tracks.

Fixes: 5e2b17e712cf ("s390/dasd: Add dynamic formatting support for ESE volumes")
Cc: ***@vger.kernel.org # 5.3+
Signed-off-by: Stefan Haberland <***@linux.ibm.com>
Reviewed-by: Jan Hoeppner <***@linux.ibm.com>
Reviewed-by: Peter Oberparleiter <***@linux.ibm.com>
Signed-off-by: Jens Axboe <***@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>

---
drivers/s390/block/dasd.c | 27 ++++++
drivers/s390/block/dasd_eckd.c | 163 +++++++++++++++++++++++++++++++++++++++--
drivers/s390/block/dasd_int.h | 15 +++
3 files changed, 193 insertions(+), 12 deletions(-)

--- a/drivers/s390/block/dasd.c
+++ b/drivers/s390/block/dasd.c
@@ -178,6 +178,8 @@ struct dasd_block *dasd_alloc_block(void
(unsigned long) block);
INIT_LIST_HEAD(&block->ccw_queue);
spin_lock_init(&block->queue_lock);
+ INIT_LIST_HEAD(&block->format_list);
+ spin_lock_init(&block->format_lock);
timer_setup(&block->timer, dasd_block_timeout, 0);
spin_lock_init(&block->profile.lock);

@@ -1779,20 +1781,26 @@ void dasd_int_handler(struct ccw_device

if (dasd_ese_needs_format(cqr->block, irb)) {
if (rq_data_dir((struct request *)cqr->callback_data) == READ) {
- device->discipline->ese_read(cqr);
+ device->discipline->ese_read(cqr, irb);
cqr->status = DASD_CQR_SUCCESS;
cqr->stopclk = now;
dasd_device_clear_timer(device);
dasd_schedule_device_bh(device);
return;
}
- fcqr = device->discipline->ese_format(device, cqr);
+ fcqr = device->discipline->ese_format(device, cqr, irb);
if (IS_ERR(fcqr)) {
+ if (PTR_ERR(fcqr) == -EINVAL) {
+ cqr->status = DASD_CQR_ERROR;
+ return;
+ }
/*
* If we can't format now, let the request go
* one extra round. Maybe we can format later.
*/
cqr->status = DASD_CQR_QUEUED;
+ dasd_schedule_device_bh(device);
+ return;
} else {
fcqr->status = DASD_CQR_QUEUED;
cqr->status = DASD_CQR_QUEUED;
@@ -2748,11 +2756,13 @@ static void __dasd_cleanup_cqr(struct da
{
struct request *req;
blk_status_t error = BLK_STS_OK;
+ unsigned int proc_bytes;
int status;

req = (struct request *) cqr->callback_data;
dasd_profile_end(cqr->block, cqr, req);

+ proc_bytes = cqr->proc_bytes;
status = cqr->block->base->discipline->free_cp(cqr, req);
if (status < 0)
error = errno_to_blk_status(status);
@@ -2783,7 +2793,18 @@ static void __dasd_cleanup_cqr(struct da
blk_mq_end_request(req, error);
blk_mq_run_hw_queues(req->q, true);
} else {
- blk_mq_complete_request(req);
+ /*
+ * Partial completed requests can happen with ESE devices.
+ * During read we might have gotten a NRF error and have to
+ * complete a request partially.
+ */
+ if (proc_bytes) {
+ blk_update_request(req, BLK_STS_OK,
+ blk_rq_bytes(req) - proc_bytes);
+ blk_mq_requeue_request(req, true);
+ } else {
+ blk_mq_complete_request(req);
+ }
}
}

--- a/drivers/s390/block/dasd_eckd.c
+++ b/drivers/s390/block/dasd_eckd.c
@@ -207,6 +207,45 @@ static void set_ch_t(struct ch_t *geo, _
geo->head |= head;
}

+/*
+ * calculate failing track from sense data depending if
+ * it is an EAV device or not
+ */
+static int dasd_eckd_track_from_irb(struct irb *irb, struct dasd_device *device,
+ sector_t *track)
+{
+ struct dasd_eckd_private *private = device->private;
+ u8 *sense = NULL;
+ u32 cyl;
+ u8 head;
+
+ sense = dasd_get_sense(irb);
+ if (!sense) {
+ DBF_DEV_EVENT(DBF_WARNING, device, "%s",
+ "ESE error no sense data\n");
+ return -EINVAL;
+ }
+ if (!(sense[27] & DASD_SENSE_BIT_2)) {
+ DBF_DEV_EVENT(DBF_WARNING, device, "%s",
+ "ESE error no valid track data\n");
+ return -EINVAL;
+ }
+
+ if (sense[27] & DASD_SENSE_BIT_3) {
+ /* enhanced addressing */
+ cyl = sense[30] << 20;
+ cyl |= (sense[31] & 0xF0) << 12;
+ cyl |= sense[28] << 8;
+ cyl |= sense[29];
+ } else {
+ cyl = sense[29] << 8;
+ cyl |= sense[30];
+ }
+ head = sense[31] & 0x0F;
+ *track = cyl * private->rdc_data.trk_per_cyl + head;
+ return 0;
+}
+
static int set_timestamp(struct ccw1 *ccw, struct DE_eckd_data *data,
struct dasd_device *device)
{
@@ -2986,6 +3025,37 @@ static int dasd_eckd_format_device(struc
0, NULL);
}

+static bool test_and_set_format_track(struct dasd_format_entry *to_format,
+ struct dasd_block *block)
+{
+ struct dasd_format_entry *format;
+ unsigned long flags;
+ bool rc = false;
+
+ spin_lock_irqsave(&block->format_lock, flags);
+ list_for_each_entry(format, &block->format_list, list) {
+ if (format->track == to_format->track) {
+ rc = true;
+ goto out;
+ }
+ }
+ list_add_tail(&to_format->list, &block->format_list);
+
+out:
+ spin_unlock_irqrestore(&block->format_lock, flags);
+ return rc;
+}
+
+static void clear_format_track(struct dasd_format_entry *format,
+ struct dasd_block *block)
+{
+ unsigned long flags;
+
+ spin_lock_irqsave(&block->format_lock, flags);
+ list_del_init(&format->list);
+ spin_unlock_irqrestore(&block->format_lock, flags);
+}
+
/*
* Callback function to free ESE format requests.
*/
@@ -2993,15 +3063,19 @@ static void dasd_eckd_ese_format_cb(stru
{
struct dasd_device *device = cqr->startdev;
struct dasd_eckd_private *private = device->private;
+ struct dasd_format_entry *format = data;

+ clear_format_track(format, cqr->basedev->block);
private->count--;
dasd_ffree_request(cqr, device);
}

static struct dasd_ccw_req *
-dasd_eckd_ese_format(struct dasd_device *startdev, struct dasd_ccw_req *cqr)
+dasd_eckd_ese_format(struct dasd_device *startdev, struct dasd_ccw_req *cqr,
+ struct irb *irb)
{
struct dasd_eckd_private *private;
+ struct dasd_format_entry *format;
struct format_data_t fdata;
unsigned int recs_per_trk;
struct dasd_ccw_req *fcqr;
@@ -3011,23 +3085,39 @@ dasd_eckd_ese_format(struct dasd_device
struct request *req;
sector_t first_trk;
sector_t last_trk;
+ sector_t curr_trk;
int rc;

req = cqr->callback_data;
- base = cqr->block->base;
+ block = cqr->block;
+ base = block->base;
private = base->private;
- block = base->block;
blksize = block->bp_block;
recs_per_trk = recs_per_track(&private->rdc_data, 0, blksize);
+ format = &startdev->format_entry;

first_trk = blk_rq_pos(req) >> block->s2b_shift;
sector_div(first_trk, recs_per_trk);
last_trk =
(blk_rq_pos(req) + blk_rq_sectors(req) - 1) >> block->s2b_shift;
sector_div(last_trk, recs_per_trk);
+ rc = dasd_eckd_track_from_irb(irb, base, &curr_trk);
+ if (rc)
+ return ERR_PTR(rc);
+
+ if (curr_trk < first_trk || curr_trk > last_trk) {
+ DBF_DEV_EVENT(DBF_WARNING, startdev,
+ "ESE error track %llu not within range %llu - %llu\n",
+ curr_trk, first_trk, last_trk);
+ return ERR_PTR(-EINVAL);
+ }
+ format->track = curr_trk;
+ /* test if track is already in formatting by another thread */
+ if (test_and_set_format_track(format, block))
+ return ERR_PTR(-EEXIST);

- fdata.start_unit = first_trk;
- fdata.stop_unit = last_trk;
+ fdata.start_unit = curr_trk;
+ fdata.stop_unit = curr_trk;
fdata.blksize = blksize;
fdata.intensity = private->uses_cdl ? DASD_FMT_INT_COMPAT : 0;

@@ -3044,6 +3134,7 @@ dasd_eckd_ese_format(struct dasd_device
return fcqr;

fcqr->callback = dasd_eckd_ese_format_cb;
+ fcqr->callback_data = (void *) format;

return fcqr;
}
@@ -3051,29 +3142,87 @@ dasd_eckd_ese_format(struct dasd_device
/*
* When data is read from an unformatted area of an ESE volume, this function
* returns zeroed data and thereby mimics a read of zero data.
+ *
+ * The first unformatted track is the one that got the NRF error, the address is
+ * encoded in the sense data.
+ *
+ * All tracks before have returned valid data and should not be touched.
+ * All tracks after the unformatted track might be formatted or not. This is
+ * currently not known, remember the processed data and return the remainder of
+ * the request to the blocklayer in __dasd_cleanup_cqr().
*/
-static void dasd_eckd_ese_read(struct dasd_ccw_req *cqr)
+static int dasd_eckd_ese_read(struct dasd_ccw_req *cqr, struct irb *irb)
{
+ struct dasd_eckd_private *private;
+ sector_t first_trk, last_trk;
+ sector_t first_blk, last_blk;
unsigned int blksize, off;
+ unsigned int recs_per_trk;
struct dasd_device *base;
struct req_iterator iter;
+ struct dasd_block *block;
+ unsigned int skip_block;
+ unsigned int blk_count;
struct request *req;
struct bio_vec bv;
+ sector_t curr_trk;
+ sector_t end_blk;
char *dst;
+ int rc;

req = (struct request *) cqr->callback_data;
base = cqr->block->base;
blksize = base->block->bp_block;
+ block = cqr->block;
+ private = base->private;
+ skip_block = 0;
+ blk_count = 0;
+
+ recs_per_trk = recs_per_track(&private->rdc_data, 0, blksize);
+ first_trk = first_blk = blk_rq_pos(req) >> block->s2b_shift;
+ sector_div(first_trk, recs_per_trk);
+ last_trk = last_blk =
+ (blk_rq_pos(req) + blk_rq_sectors(req) - 1) >> block->s2b_shift;
+ sector_div(last_trk, recs_per_trk);
+ rc = dasd_eckd_track_from_irb(irb, base, &curr_trk);
+ if (rc)
+ return rc;
+
+ /* sanity check if the current track from sense data is valid */
+ if (curr_trk < first_trk || curr_trk > last_trk) {
+ DBF_DEV_EVENT(DBF_WARNING, base,
+ "ESE error track %llu not within range %llu - %llu\n",
+ curr_trk, first_trk, last_trk);
+ return -EINVAL;
+ }
+
+ /*
+ * if not the first track got the NRF error we have to skip over valid
+ * blocks
+ */
+ if (curr_trk != first_trk)
+ skip_block = curr_trk * recs_per_trk - first_blk;
+
+ /* we have no information beyond the current track */
+ end_blk = (curr_trk + 1) * recs_per_trk;

rq_for_each_segment(bv, req, iter) {
dst = page_address(bv.bv_page) + bv.bv_offset;
for (off = 0; off < bv.bv_len; off += blksize) {
- if (dst && rq_data_dir(req) == READ) {
+ if (first_blk + blk_count >= end_blk) {
+ cqr->proc_bytes = blk_count * blksize;
+ return 0;
+ }
+ if (dst && !skip_block) {
dst += off;
memset(dst, 0, blksize);
+ } else {
+ skip_block--;
}
+ blk_count++;
}
}
+ return 0;
}

/*
--- a/drivers/s390/block/dasd_int.h
+++ b/drivers/s390/block/dasd_int.h
@@ -187,6 +187,7 @@ struct dasd_ccw_req {

void (*callback)(struct dasd_ccw_req *, void *data);
void *callback_data;
+ unsigned int proc_bytes; /* bytes for partial completion */
};

/*
@@ -387,8 +388,9 @@ struct dasd_discipline {
int (*ext_pool_warn_thrshld)(struct dasd_device *);
int (*ext_pool_oos)(struct dasd_device *);
int (*ext_pool_exhaust)(struct dasd_device *, struct dasd_ccw_req *);
- struct dasd_ccw_req *(*ese_format)(struct dasd_device *, struct dasd_ccw_req *);
- void (*ese_read)(struct dasd_ccw_req *);
+ struct dasd_ccw_req *(*ese_format)(struct dasd_device *,
+ struct dasd_ccw_req *, struct irb *);
+ int (*ese_read)(struct dasd_ccw_req *, struct irb *);
};

extern struct dasd_discipline *dasd_diag_discipline_pointer;
@@ -474,6 +476,11 @@ struct dasd_profile {
spinlock_t lock;
};

+struct dasd_format_entry {
+ struct list_head list;
+ sector_t track;
+};
+
struct dasd_device {
/* Block device stuff. */
struct dasd_block *block;
@@ -539,6 +546,7 @@ struct dasd_device {
struct dentry *debugfs_dentry;
struct dentry *hosts_dentry;
struct dasd_profile profile;
+ struct dasd_format_entry format_entry;
};

struct dasd_block {
@@ -564,6 +572,9 @@ struct dasd_block {

struct dentry *debugfs_dentry;
struct dasd_profile profile;
+
+ struct list_head format_list;
+ spinlock_t format_lock;
};

struct dasd_attention_data {
Greg Kroah-Hartman
2020-03-17 10:55:17 UTC
Permalink
From: Artem Savkov <***@redhat.com>

commit d9815bff6b379ff46981bea9dfeb146081eab314 upstream.

It appears that ip ranges can overlap so. In that case lookup_rec()
returns whatever results it got last even if it found nothing in last
searched page.

This breaks an obscure livepatch late module patching usecase:
- load livepatch
- load the patched module
- unload livepatch
- try to load livepatch again

To fix this return from lookup_rec() as soon as it found the record
containing searched-for ip. This used to be this way prior lookup_rec()
introduction.

Link: http://lkml.kernel.org/r/20200306174317.21699-1-***@redhat.com

Cc: ***@vger.kernel.org
Fixes: 7e16f581a817 ("ftrace: Separate out functionality from ftrace_location_range()")
Signed-off-by: Artem Savkov <***@redhat.com>
Signed-off-by: Steven Rostedt (VMware) <***@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>

---
kernel/trace/ftrace.c | 2 ++
1 file changed, 2 insertions(+)

--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -1552,6 +1552,8 @@ static struct dyn_ftrace *lookup_rec(uns
rec = bsearch(&key, pg->records, pg->index,
sizeof(struct dyn_ftrace),
ftrace_cmp_recs);
+ if (rec)
+ break;
}
return rec;
}
Greg Kroah-Hartman
2020-03-17 10:55:16 UTC
Permalink
From: Takashi Iwai <***@suse.de>

commit 443d372d6a96cd94ad119e5c14bb4d63a536a7f6 upstream.

Although the IRQ assignment in ipmi_si driver is optional,
platform_get_irq() spews error messages unnecessarily:
ipmi_si dmi-ipmi-si.0: IRQ index 0 not found

Fix this by switching to platform_get_irq_optional().

Cc: ***@vger.kernel.org # 5.4.x
Cc: John Donnelly <***@oracle.com>
Fixes: 7723f4c5ecdb ("driver core: platform: Add an error message to platform_get_irq*()")
Reported-and-tested-by: Patrick Vo <***@hpe.com>
Signed-off-by: Takashi Iwai <***@suse.de>
Message-Id: <20200205093146.1352-1-***@suse.de>
Signed-off-by: Corey Minyard <***@mvista.com>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>

---
drivers/char/ipmi/ipmi_si_platform.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

--- a/drivers/char/ipmi/ipmi_si_platform.c
+++ b/drivers/char/ipmi/ipmi_si_platform.c
@@ -194,7 +194,7 @@ static int platform_ipmi_probe(struct pl
else
io.slave_addr = slave_addr;

- io.irq = platform_get_irq(pdev, 0);
+ io.irq = platform_get_irq_optional(pdev, 0);
if (io.irq > 0)
io.irq_setup = ipmi_std_irq_setup;
else
@@ -378,7 +378,7 @@ static int acpi_ipmi_probe(struct platfo
io.irq = tmp;
io.irq_setup = acpi_gpe_irq_setup;
} else {
- int irq = platform_get_irq(pdev, 0);
+ int irq = platform_get_irq_optional(pdev, 0);

if (irq > 0) {
io.irq = irq;
Greg Kroah-Hartman
2020-03-17 10:55:19 UTC
Permalink
From: Eric Biggers <***@google.com>

commit 2b4eae95c7361e0a147b838715c8baa1380a428f upstream.

After FS_IOC_REMOVE_ENCRYPTION_KEY removes a key, it syncs the
filesystem and tries to get and put all inodes that were unlocked by the
key so that unused inodes get evicted via fscrypt_drop_inode().
Normally, the inodes are all clean due to the sync.

However, after the filesystem is sync'ed, userspace can modify and close
one of the files. (Userspace is *supposed* to close the files before
removing the key. But it doesn't always happen, and the kernel can't
assume it.) This causes the inode to be dirtied and have i_count == 0.
Then, fscrypt_drop_inode() failed to consider this case and indicated
that the inode can be dropped, causing the write to be lost.

On f2fs, other problems such as a filesystem freeze could occur due to
the inode being freed while still on f2fs's dirty inode list.

Fix this bug by making fscrypt_drop_inode() only drop clean inodes.

I've written an xfstest which detects this bug on ext4, f2fs, and ubifs.

Fixes: b1c0ec3599f4 ("fscrypt: add FS_IOC_REMOVE_ENCRYPTION_KEY ioctl")
Cc: <***@vger.kernel.org> # v5.4+
Link: https://lore.kernel.org/r/20200305084138.653498-1-***@kernel.org
Signed-off-by: Eric Biggers <***@google.com>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>

---
fs/crypto/keysetup.c | 9 +++++++++
1 file changed, 9 insertions(+)

--- a/fs/crypto/keysetup.c
+++ b/fs/crypto/keysetup.c
@@ -515,6 +515,15 @@ int fscrypt_drop_inode(struct inode *ino
mk = ci->ci_master_key->payload.data[0];

/*
+ * With proper, non-racy use of FS_IOC_REMOVE_ENCRYPTION_KEY, all inodes
+ * protected by the key were cleaned by sync_filesystem(). But if
+ * userspace is still using the files, inodes can be dirtied between
+ * then and now. We mustn't lose any writes, so skip dirty inodes here.
+ */
+ if (inode->i_state & I_DIRTY_ALL)
+ return 0;
+
+ /*
* Note: since we aren't holding ->mk_secret_sem, the result here can
* immediately become outdated. But there's no correctness problem with
* unnecessarily evicting. Nor is there a correctness problem with not
Greg Kroah-Hartman
2020-03-17 10:55:20 UTC
Permalink
From: Corey Minyard <***@mvista.com>

commit b26ebfe12f34f372cf041c6f801fa49c3fb382c5 upstream.

Recent changes to alloc_pid() allow the pid number to be specified on
the command line. If set_tid_size is set, then the code scanning the
levels will hard-set retval to -EPERM, overriding it's previous -ENOMEM
value.

After the code scanning the levels, there are error returns that do not
set retval, assuming it is still set to -ENOMEM.

So set retval back to -ENOMEM after scanning the levels.

Fixes: 49cb2fc42ce4 ("fork: extend clone3() to support setting a PID")
Signed-off-by: Corey Minyard <***@mvista.com>
Acked-by: Christian Brauner <***@ubuntu.com>
Cc: Andrei Vagin <***@gmail.com>
Cc: Dmitry Safonov <***@gmail.com>
Cc: Oleg Nesterov <***@redhat.com>
Cc: Adrian Reber <***@redhat.com>
Cc: <***@vger.kernel.org> # 5.5
Link: https://lore.kernel.org/r/20200306172314.12232-1-***@acm.org
[***@ubuntu.com: fixup commit message]
Signed-off-by: Christian Brauner <***@ubuntu.com>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>

---
kernel/pid.c | 2 ++
1 file changed, 2 insertions(+)

--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -247,6 +247,8 @@ struct pid *alloc_pid(struct pid_namespa
tmp = tmp->parent;
}

+ retval = -ENOMEM;
+
if (unlikely(is_child_reaper(pid))) {
if (pid_ns_prepare_proc(ns))
goto out_free;
Greg Kroah-Hartman
2020-03-17 10:55:05 UTC
Permalink
From: Shin'ichiro Kawasaki <***@wdc.com>

commit b53df2e7442c73a932fb74228147fb946e531585 upstream.

Commit b72053072c0b ("block: allow partitions on host aware zone
devices") introduced the helper function disk_has_partitions() to check
if a given disk has valid partitions. However, since this function result
directly depends on the disk partition table length rather than the
actual existence of valid partitions in the table, it returns true even
after all partitions are removed from the disk. For host aware zoned
block devices, this results in zone management support to be kept
disabled even after removing all partitions.

Fix this by changing disk_has_partitions() to walk through the partition
table entries and return true if and only if a valid non-zero size
partition is found.

Fixes: b72053072c0b ("block: allow partitions on host aware zone devices")
Cc: ***@vger.kernel.org # 5.5
Reviewed-by: Damien Le Moal <***@wdc.com>
Reviewed-by: Johannes Thumshirn <***@wdc.com>
Reviewed-by: Christoph Hellwig <***@lst.de>
Signed-off-by: Shin'ichiro Kawasaki <***@wdc.com>
Signed-off-by: Jens Axboe <***@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>

diff --git a/block/genhd.c b/block/genhd.c
index ff6268970ddc..9c2e13ce0d19 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -301,6 +301,42 @@ struct hd_struct *disk_map_sector_rcu(struct gendisk *disk, sector_t sector)
}
EXPORT_SYMBOL_GPL(disk_map_sector_rcu);

+/**
+ * disk_has_partitions
+ * @disk: gendisk of interest
+ *
+ * Walk through the partition table and check if valid partition exists.
+ *
+ * CONTEXT:
+ * Don't care.
+ *
+ * RETURNS:
+ * True if the gendisk has at least one valid non-zero size partition.
+ * Otherwise false.
+ */
+bool disk_has_partitions(struct gendisk *disk)
+{
+ struct disk_part_tbl *ptbl;
+ int i;
+ bool ret = false;
+
+ rcu_read_lock();
+ ptbl = rcu_dereference(disk->part_tbl);
+
+ /* Iterate partitions skipping the whole device at index 0 */
+ for (i = 1; i < ptbl->len; i++) {
+ if (rcu_dereference(ptbl->part[i])) {
+ ret = true;
+ break;
+ }
+ }
+
+ rcu_read_unlock();
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(disk_has_partitions);
+
/*
* Can be deleted altogether. Later.
*
diff --git a/include/linux/genhd.h b/include/linux/genhd.h
index 6fbe58538ad6..07dc91835b98 100644
--- a/include/linux/genhd.h
+++ b/include/linux/genhd.h
@@ -245,18 +245,6 @@ static inline bool disk_part_scan_enabled(struct gendisk *disk)
!(disk->flags & GENHD_FL_NO_PART_SCAN);
}

-static inline bool disk_has_partitions(struct gendisk *disk)
-{
- bool ret = false;
-
- rcu_read_lock();
- if (rcu_dereference(disk->part_tbl)->len > 1)
- ret = true;
- rcu_read_unlock();
-
- return ret;
-}
-
static inline dev_t disk_devt(struct gendisk *disk)
{
return MKDEV(disk->major, disk->first_minor);
@@ -298,6 +286,7 @@ extern void disk_part_iter_exit(struct disk_part_iter *piter);

extern struct hd_struct *disk_map_sector_rcu(struct gendisk *disk,
sector_t sector);
+bool disk_has_partitions(struct gendisk *disk);

/*
* Macros to operate on percpu disk statistics:
Greg Kroah-Hartman
2020-03-17 10:55:23 UTC
Permalink
From: Vladis Dronov <***@redhat.com>

commit 286d3250c9d6437340203fb64938bea344729a0e upstream.

There is a race and a buffer overflow corrupting a kernel memory while
reading an EFI variable with a size more than 1024 bytes via the older
sysfs method. This happens because accessing struct efi_variable in
efivar_{attr,size,data}_read() and friends is not protected from
a concurrent access leading to a kernel memory corruption and, at best,
to a crash. The race scenario is the following:

CPU0: CPU1:
efivar_attr_read()
var->DataSize = 1024;
efivar_entry_get(... &var->DataSize)
down_interruptible(&efivars_lock)
efivar_attr_read() // same EFI var
var->DataSize = 1024;
efivar_entry_get(... &var->DataSize)
down_interruptible(&efivars_lock)
virt_efi_get_variable()
// returns EFI_BUFFER_TOO_SMALL but
// var->DataSize is set to a real
// var size more than 1024 bytes
up(&efivars_lock)
virt_efi_get_variable()
// called with var->DataSize set
// to a real var size, returns
// successfully and overwrites
// a 1024-bytes kernel buffer
up(&efivars_lock)

This can be reproduced by concurrent reading of an EFI variable which size
is more than 1024 bytes:

ts# for cpu in $(seq 0 $(nproc --ignore=1)); do ( taskset -c $cpu \
cat /sys/firmware/efi/vars/KEKDefault*/size & ) ; done

Fix this by using a local variable for a var's data buffer size so it
does not get overwritten.

Fixes: e14ab23dde12b80d ("efivars: efivar_entry API")
Reported-by: Bob Sanders <***@hpe.com> and the LTP testsuite
Signed-off-by: Vladis Dronov <***@redhat.com>
Signed-off-by: Ard Biesheuvel <***@kernel.org>
Signed-off-by: Ingo Molnar <***@kernel.org>
Cc: <***@vger.kernel.org>
Link: https://lore.kernel.org/r/20200305084041.24053-2-***@redhat.com
Link: https://lore.kernel.org/r/20200308080859.21568-24-***@kernel.org
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>

---
drivers/firmware/efi/efivars.c | 29 ++++++++++++++++++++---------
1 file changed, 20 insertions(+), 9 deletions(-)

--- a/drivers/firmware/efi/efivars.c
+++ b/drivers/firmware/efi/efivars.c
@@ -83,13 +83,16 @@ static ssize_t
efivar_attr_read(struct efivar_entry *entry, char *buf)
{
struct efi_variable *var = &entry->var;
+ unsigned long size = sizeof(var->Data);
char *str = buf;
+ int ret;

if (!entry || !buf)
return -EINVAL;

- var->DataSize = 1024;
- if (efivar_entry_get(entry, &var->Attributes, &var->DataSize, var->Data))
+ ret = efivar_entry_get(entry, &var->Attributes, &size, var->Data);
+ var->DataSize = size;
+ if (ret)
return -EIO;

if (var->Attributes & EFI_VARIABLE_NON_VOLATILE)
@@ -116,13 +119,16 @@ static ssize_t
efivar_size_read(struct efivar_entry *entry, char *buf)
{
struct efi_variable *var = &entry->var;
+ unsigned long size = sizeof(var->Data);
char *str = buf;
+ int ret;

if (!entry || !buf)
return -EINVAL;

- var->DataSize = 1024;
- if (efivar_entry_get(entry, &var->Attributes, &var->DataSize, var->Data))
+ ret = efivar_entry_get(entry, &var->Attributes, &size, var->Data);
+ var->DataSize = size;
+ if (ret)
return -EIO;

str += sprintf(str, "0x%lx\n", var->DataSize);
@@ -133,12 +139,15 @@ static ssize_t
efivar_data_read(struct efivar_entry *entry, char *buf)
{
struct efi_variable *var = &entry->var;
+ unsigned long size = sizeof(var->Data);
+ int ret;

if (!entry || !buf)
return -EINVAL;

- var->DataSize = 1024;
- if (efivar_entry_get(entry, &var->Attributes, &var->DataSize, var->Data))
+ ret = efivar_entry_get(entry, &var->Attributes, &size, var->Data);
+ var->DataSize = size;
+ if (ret)
return -EIO;

memcpy(buf, var->Data, var->DataSize);
@@ -250,14 +259,16 @@ efivar_show_raw(struct efivar_entry *ent
{
struct efi_variable *var = &entry->var;
struct compat_efi_variable *compat;
+ unsigned long datasize = sizeof(var->Data);
size_t size;
+ int ret;

if (!entry || !buf)
return 0;

- var->DataSize = 1024;
- if (efivar_entry_get(entry, &entry->var.Attributes,
- &entry->var.DataSize, entry->var.Data))
+ ret = efivar_entry_get(entry, &var->Attributes, &datasize, var->Data);
+ var->DataSize = datasize;
+ if (ret)
return -EIO;

if (in_compat_syscall()) {
Greg Kroah-Hartman
2020-03-17 10:55:24 UTC
Permalink
From: Vladis Dronov <***@redhat.com>

commit d6c066fda90d578aacdf19771a027ed484a79825 upstream.

Add a sanity check to efivar_store_raw() the same way
efivar_{attr,size,data}_read() and efivar_show_raw() have it.

Signed-off-by: Vladis Dronov <***@redhat.com>
Signed-off-by: Ard Biesheuvel <***@kernel.org>
Signed-off-by: Ingo Molnar <***@kernel.org>
Cc: <***@vger.kernel.org>
Link: https://lore.kernel.org/r/20200305084041.24053-3-***@redhat.com
Link: https://lore.kernel.org/r/20200308080859.21568-25-***@kernel.org
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>

---
drivers/firmware/efi/efivars.c | 3 +++
1 file changed, 3 insertions(+)

--- a/drivers/firmware/efi/efivars.c
+++ b/drivers/firmware/efi/efivars.c
@@ -208,6 +208,9 @@ efivar_store_raw(struct efivar_entry *en
u8 *data;
int err;

+ if (!entry || !buf)
+ return -EINVAL;
+
if (in_compat_syscall()) {
struct compat_efi_variable *compat;
Greg Kroah-Hartman
2020-03-17 10:55:26 UTC
Permalink
From: Felix Fietkau <***@nbd.name>

commit b102f0c522cf668c8382c56a4f771b37d011cda2 upstream.

If the hardware receives an oversized packet with too many rx fragments,
skb_shinfo(skb)->frags can overflow and corrupt memory of adjacent pages.
This becomes especially visible if it corrupts the freelist pointer of
a slab page.

Cc: ***@vger.kernel.org
Signed-off-by: Felix Fietkau <***@nbd.name>
Signed-off-by: Kalle Valo <***@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>

---
drivers/net/wireless/mediatek/mt76/dma.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)

--- a/drivers/net/wireless/mediatek/mt76/dma.c
+++ b/drivers/net/wireless/mediatek/mt76/dma.c
@@ -447,10 +447,13 @@ mt76_add_fragment(struct mt76_dev *dev,
struct page *page = virt_to_head_page(data);
int offset = data - page_address(page);
struct sk_buff *skb = q->rx_head;
+ struct skb_shared_info *shinfo = skb_shinfo(skb);

- offset += q->buf_offset;
- skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags, page, offset, len,
- q->buf_size);
+ if (shinfo->nr_frags < ARRAY_SIZE(shinfo->frags)) {
+ offset += q->buf_offset;
+ skb_add_rx_frag(skb, shinfo->nr_frags, page, offset, len,
+ q->buf_size);
+ }

if (more)
return;
Greg Kroah-Hartman
2020-03-17 10:55:28 UTC
Permalink
From: Tony Luck <***@intel.com>

commit 59b5809655bdafb0767d3fd00a3e41711aab07e6 upstream.

There are two implemented bits in the PPIN_CTL MSR:

Bit 0: LockOut (R/WO)
Set 1 to prevent further writes to MSR_PPIN_CTL.

Bit 1: Enable_PPIN (R/W)
If 1, enables MSR_PPIN to be accessible using RDMSR.
If 0, an attempt to read MSR_PPIN will cause #GP.

So there are four defined values:
0: PPIN is disabled, PPIN_CTL may be updated
1: PPIN is disabled. PPIN_CTL is locked against updates
2: PPIN is enabled. PPIN_CTL may be updated
3: PPIN is enabled. PPIN_CTL is locked against updates

Code would only enable the X86_FEATURE_INTEL_PPIN feature for case "2".
When it should have done so for both case "2" and case "3".

Fix the final test to just check for the enable bit. Also fix some of
the other comments in this function.

Fixes: 3f5a7896a509 ("x86/mce: Include the PPIN in MCE records when available")
Signed-off-by: Tony Luck <***@intel.com>
Signed-off-by: Borislav Petkov <***@suse.de>
Cc: <***@vger.kernel.org>
Link: https://lkml.kernel.org/r/20200226011737.9958-1-***@intel.com
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>

---
arch/x86/kernel/cpu/mce/intel.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)

--- a/arch/x86/kernel/cpu/mce/intel.c
+++ b/arch/x86/kernel/cpu/mce/intel.c
@@ -492,17 +492,18 @@ static void intel_ppin_init(struct cpuin
return;

if ((val & 3UL) == 1UL) {
- /* PPIN available but disabled: */
+ /* PPIN locked in disabled mode */
return;
}

- /* If PPIN is disabled, but not locked, try to enable: */
- if (!(val & 3UL)) {
+ /* If PPIN is disabled, try to enable */
+ if (!(val & 2UL)) {
wrmsrl_safe(MSR_PPIN_CTL, val | 2UL);
rdmsrl_safe(MSR_PPIN_CTL, &val);
}

- if ((val & 3UL) == 2UL)
+ /* Is the enable bit set? */
+ if (val & 2UL)
set_cpu_cap(c, X86_FEATURE_INTEL_PPIN);
}
}
Greg Kroah-Hartman
2020-03-17 10:55:27 UTC
Permalink
From: Kim Phillips <***@amd.com>

commit f967140dfb7442e2db0868b03b961f9c59418a1b upstream.

Enable the sampling check in kernel/events/core.c::perf_event_open(),
which returns the more appropriate -EOPNOTSUPP.

BEFORE:

$ sudo perf record -a -e instructions,l3_request_g1.caching_l3_cache_accesses true
Error:
The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (l3_request_g1.caching_l3_cache_accesses).
/bin/dmesg | grep -i perf may provide additional information.

With nothing relevant in dmesg.

AFTER:

$ sudo perf record -a -e instructions,l3_request_g1.caching_l3_cache_accesses true
Error:
l3_request_g1.caching_l3_cache_accesses: PMU Hardware doesn't support sampling/overflow-interrupts. Try 'perf stat'

Fixes: c43ca5091a37 ("perf/x86/amd: Add support for AMD NB and L2I "uncore" counters")
Signed-off-by: Kim Phillips <***@amd.com>
Signed-off-by: Borislav Petkov <***@suse.de>
Acked-by: Peter Zijlstra <***@infradead.org>
Cc: ***@vger.kernel.org
Link: https://lkml.kernel.org/r/20200311191323.13124-1-***@amd.com
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>

---
arch/x86/events/amd/uncore.c | 17 +++++++----------
1 file changed, 7 insertions(+), 10 deletions(-)

--- a/arch/x86/events/amd/uncore.c
+++ b/arch/x86/events/amd/uncore.c
@@ -190,15 +190,12 @@ static int amd_uncore_event_init(struct

/*
* NB and Last level cache counters (MSRs) are shared across all cores
- * that share the same NB / Last level cache. Interrupts can be directed
- * to a single target core, however, event counts generated by processes
- * running on other cores cannot be masked out. So we do not support
- * sampling and per-thread events.
+ * that share the same NB / Last level cache. On family 16h and below,
+ * Interrupts can be directed to a single target core, however, event
+ * counts generated by processes running on other cores cannot be masked
+ * out. So we do not support sampling and per-thread events via
+ * CAP_NO_INTERRUPT, and we do not enable counter overflow interrupts:
*/
- if (is_sampling_event(event) || event->attach_state & PERF_ATTACH_TASK)
- return -EINVAL;
-
- /* and we do not enable counter overflow interrupts */
hwc->config = event->attr.config & AMD64_RAW_EVENT_MASK_NB;
hwc->idx = -1;

@@ -306,7 +303,7 @@ static struct pmu amd_nb_pmu = {
.start = amd_uncore_start,
.stop = amd_uncore_stop,
.read = amd_uncore_read,
- .capabilities = PERF_PMU_CAP_NO_EXCLUDE,
+ .capabilities = PERF_PMU_CAP_NO_EXCLUDE | PERF_PMU_CAP_NO_INTERRUPT,
};

static struct pmu amd_llc_pmu = {
@@ -317,7 +314,7 @@ static struct pmu amd_llc_pmu = {
.start = amd_uncore_start,
.stop = amd_uncore_stop,
.read = amd_uncore_read,
- .capabilities = PERF_PMU_CAP_NO_EXCLUDE,
+ .capabilities = PERF_PMU_CAP_NO_EXCLUDE | PERF_PMU_CAP_NO_INTERRUPT,
};

static struct amd_uncore *amd_uncore_alloc(unsigned int cpu)
Greg Kroah-Hartman
2020-03-17 10:55:29 UTC
Permalink
From: Marc Zyngier <***@kernel.org>

commit 65ac74f1de3334852fb7d9b1b430fa5a06524276 upstream.

The way cookie_init_hw_msi_region() allocates the iommu_dma_msi_page
structures doesn't match the way iommu_put_dma_cookie() frees them.

The former performs a single allocation of all the required structures,
while the latter tries to free them one at a time. It doesn't quite
work for the main use case (the GICv3 ITS where the range is 64kB)
when the base granule size is 4kB.

This leads to a nice slab corruption on teardown, which is easily
observable by simply creating a VF on a SRIOV-capable device, and
tearing it down immediately (no need to even make use of it).
Fortunately, this only affects systems where the ITS isn't translated
by the SMMU, which are both rare and non-standard.

Fix it by allocating iommu_dma_msi_page structures one at a time.

Fixes: 7c1b058c8b5a3 ("iommu/dma: Handle IOMMU API reserved regions")
Signed-off-by: Marc Zyngier <***@kernel.org>
Reviewed-by: Eric Auger <***@redhat.com>
Cc: Robin Murphy <***@arm.com>
Cc: Joerg Roedel <***@suse.de>
Cc: Will Deacon <***@kernel.org>
Cc: ***@vger.kernel.org
Reviewed-by: Robin Murphy <***@arm.com>
Signed-off-by: Joerg Roedel <***@suse.de>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>

---
drivers/iommu/dma-iommu.c | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)

--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -177,15 +177,15 @@ static int cookie_init_hw_msi_region(str
start -= iova_offset(iovad, start);
num_pages = iova_align(iovad, end - start) >> iova_shift(iovad);

- msi_page = kcalloc(num_pages, sizeof(*msi_page), GFP_KERNEL);
- if (!msi_page)
- return -ENOMEM;
-
for (i = 0; i < num_pages; i++) {
- msi_page[i].phys = start;
- msi_page[i].iova = start;
- INIT_LIST_HEAD(&msi_page[i].list);
- list_add(&msi_page[i].list, &cookie->msi_page_list);
+ msi_page = kmalloc(sizeof(*msi_page), GFP_KERNEL);
+ if (!msi_page)
+ return -ENOMEM;
+
+ msi_page->phys = start;
+ msi_page->iova = start;
+ INIT_LIST_HEAD(&msi_page->list);
+ list_add(&msi_page->list, &cookie->msi_page_list);
start += iovad->granule;
}
Greg Kroah-Hartman
2020-03-17 10:55:30 UTC
Permalink
From: Hans de Goede <***@redhat.com>

commit 59833696442c674acbbd297772ba89e7ad8c753d upstream.

Quoting from the comment describing the WARN functions in
include/asm-generic/bug.h:

* WARN(), WARN_ON(), WARN_ON_ONCE, and so on can be used to report
* significant kernel issues that need prompt attention if they should ever
* appear at runtime.
*
* Do not use these macros when checking for invalid external inputs

The (buggy) firmware tables which the dmar code was calling WARN_TAINT
for really are invalid external inputs. They are not under the kernel's
control and the issues in them cannot be fixed by a kernel update.
So logging a backtrace, which invites bug reports to be filed about this,
is not helpful.

Some distros, e.g. Fedora, have tools watching for the kernel backtraces
logged by the WARN macros and offer the user an option to file a bug for
this when these are encountered. The WARN_TAINT in warn_invalid_dmar()
+ another iommu WARN_TAINT, addressed in another patch, have lead to over
a 100 bugs being filed this way.

This commit replaces the WARN_TAINT("...") calls, with
pr_warn(FW_BUG "...") + add_taint(TAINT_FIRMWARE_WORKAROUND, ...) calls
avoiding the backtrace and thus also avoiding bug-reports being filed
about this against the kernel.

Fixes: fd0c8894893c ("intel-iommu: Set a more specific taint flag for invalid BIOS DMAR tables")
Fixes: e625b4a95d50 ("iommu/vt-d: Parse ANDD records")
Signed-off-by: Hans de Goede <***@redhat.com>
Signed-off-by: Joerg Roedel <***@suse.de>
Acked-by: Lu Baolu <***@linux.intel.com>
Cc: ***@vger.kernel.org
Link: https://lore.kernel.org/r/20200309140138.3753-2-***@redhat.com
BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1564895
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>

---
drivers/iommu/dmar.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)

--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -440,12 +440,13 @@ static int __init dmar_parse_one_andd(st

/* Check for NUL termination within the designated length */
if (strnlen(andd->device_name, header->length - 8) == header->length - 8) {
- WARN_TAINT(1, TAINT_FIRMWARE_WORKAROUND,
+ pr_warn(FW_BUG
"Your BIOS is broken; ANDD object name is not NUL-terminated\n"
"BIOS vendor: %s; Ver: %s; Product Version: %s\n",
dmi_get_system_info(DMI_BIOS_VENDOR),
dmi_get_system_info(DMI_BIOS_VERSION),
dmi_get_system_info(DMI_PRODUCT_VERSION));
+ add_taint(TAINT_FIRMWARE_WORKAROUND, LOCKDEP_STILL_OK);
return -EINVAL;
}
pr_info("ANDD device: %x name: %s\n", andd->device_number,
@@ -471,14 +472,14 @@ static int dmar_parse_one_rhsa(struct ac
return 0;
}
}
- WARN_TAINT(
- 1, TAINT_FIRMWARE_WORKAROUND,
+ pr_warn(FW_BUG
"Your BIOS is broken; RHSA refers to non-existent DMAR unit at %llx\n"
"BIOS vendor: %s; Ver: %s; Product Version: %s\n",
drhd->reg_base_addr,
dmi_get_system_info(DMI_BIOS_VENDOR),
dmi_get_system_info(DMI_BIOS_VERSION),
dmi_get_system_info(DMI_PRODUCT_VERSION));
+ add_taint(TAINT_FIRMWARE_WORKAROUND, LOCKDEP_STILL_OK);

return 0;
}
@@ -827,14 +828,14 @@ int __init dmar_table_init(void)

static void warn_invalid_dmar(u64 addr, const char *message)
{
- WARN_TAINT_ONCE(
- 1, TAINT_FIRMWARE_WORKAROUND,
+ pr_warn_once(FW_BUG
"Your BIOS is broken; DMAR reported at address %llx%s!\n"
"BIOS vendor: %s; Ver: %s; Product Version: %s\n",
addr, message,
dmi_get_system_info(DMI_BIOS_VENDOR),
dmi_get_system_info(DMI_BIOS_VERSION),
dmi_get_system_info(DMI_PRODUCT_VERSION));
+ add_taint(TAINT_FIRMWARE_WORKAROUND, LOCKDEP_STILL_OK);
}

static int __ref
Greg Kroah-Hartman
2020-03-17 10:55:06 UTC
Permalink
From: Al Viro <***@zeniv.linux.org.uk>

commit d9a9f4849fe0c9d560851ab22a85a666cddfdd24 upstream.

several iterations of ->atomic_open() calling conventions ago, we
used to need fput() if ->atomic_open() failed at some point after
successful finish_open(). Now (since 2016) it's not needed -
struct file carries enough state to make fput() work regardless
of the point in struct file lifecycle and discarding it on
failure exits in open() got unified. Unfortunately, I'd missed
the fact that we had an instance of ->atomic_open() (cifs one)
that used to need that fput(), as well as the stale comment in
finish_open() demanding such late failure handling. Trivially
fixed...

Fixes: fe9ec8291fca "do_last(): take fput() on error after opening to out:"
Cc: ***@kernel.org # v4.7+
Signed-off-by: Al Viro <***@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>

---
Documentation/filesystems/porting.rst | 8 ++++++++
fs/cifs/dir.c | 1 -
fs/open.c | 3 ---
3 files changed, 8 insertions(+), 4 deletions(-)

--- a/Documentation/filesystems/porting.rst
+++ b/Documentation/filesystems/porting.rst
@@ -850,3 +850,11 @@ business doing so.
d_alloc_pseudo() is internal-only; uses outside of alloc_file_pseudo() are
very suspect (and won't work in modules). Such uses are very likely to
be misspelled d_alloc_anon().
+
+---
+
+**mandatory**
+
+[should've been added in 2016] stale comment in finish_open() nonwithstanding,
+failure exits in ->atomic_open() instances should *NOT* fput() the file,
+no matter what. Everything is handled by the caller.
--- a/fs/cifs/dir.c
+++ b/fs/cifs/dir.c
@@ -558,7 +558,6 @@ cifs_atomic_open(struct inode *inode, st
if (server->ops->close)
server->ops->close(xid, tcon, &fid);
cifs_del_pending_open(&open);
- fput(file);
rc = -ENOMEM;
}

--- a/fs/open.c
+++ b/fs/open.c
@@ -860,9 +860,6 @@ cleanup_file:
* the return value of d_splice_alias(), then the caller needs to perform dput()
* on it after finish_open().
*
- * On successful return @file is a fully instantiated open file. After this, if
- * an error occurs in ->atomic_open(), it needs to clean up with fput().
- *
* Returns zero on success or -errno if the open failed.
*/
int finish_open(struct file *file, struct dentry *dentry,
Greg Kroah-Hartman
2020-03-17 10:55:33 UTC
Permalink
From: Yonghyun Hwang <***@google.com>

commit 77a1bce84bba01f3f143d77127b72e872b573795 upstream.

intel_iommu_iova_to_phys() has a bug when it translates an IOVA for a huge
page onto its corresponding physical address. This commit fixes the bug by
accomodating the level of page entry for the IOVA and adds IOVA's lower
address to the physical address.

Cc: <***@vger.kernel.org>
Acked-by: Lu Baolu <***@linux.intel.com>
Reviewed-by: Moritz Fischer <***@kernel.org>
Signed-off-by: Yonghyun Hwang <***@google.com>
Fixes: 3871794642579 ("VT-d: Changes to support KVM")
Signed-off-by: Joerg Roedel <***@suse.de>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>

---
drivers/iommu/intel-iommu.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -5568,8 +5568,10 @@ static phys_addr_t intel_iommu_iova_to_p
u64 phys = 0;

pte = pfn_to_dma_pte(dmar_domain, iova >> VTD_PAGE_SHIFT, &level);
- if (pte)
- phys = dma_pte_addr(pte);
+ if (pte && dma_pte_present(pte))
+ phys = dma_pte_addr(pte) +
+ (iova & (BIT_MASK(level_to_offset_bits(level) +
+ VTD_PAGE_SHIFT) - 1));

return phys;
}
Greg Kroah-Hartman
2020-03-17 10:55:35 UTC
Permalink
From: Anson Huang <***@nxp.com>

commit 5eb40257047fb11085d582b7b9ccd0bffe900726 upstream.

IMX8MN_CLK_I2C4 and IMX8MN_CLK_UART1's index definitions are incorrect,
fix them.

Fixes: 1e80936a42e1 ("dt-bindings: imx: Add clock binding doc for i.MX8MN")
Signed-off-by: Anson Huang <***@nxp.com>
Signed-off-by: Shawn Guo <***@kernel.org>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>

---
include/dt-bindings/clock/imx8mn-clock.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

--- a/include/dt-bindings/clock/imx8mn-clock.h
+++ b/include/dt-bindings/clock/imx8mn-clock.h
@@ -122,8 +122,8 @@
#define IMX8MN_CLK_I2C1 105
#define IMX8MN_CLK_I2C2 106
#define IMX8MN_CLK_I2C3 107
-#define IMX8MN_CLK_I2C4 118
-#define IMX8MN_CLK_UART1 119
+#define IMX8MN_CLK_I2C4 108
+#define IMX8MN_CLK_UART1 109
#define IMX8MN_CLK_UART2 110
#define IMX8MN_CLK_UART3 111
#define IMX8MN_CLK_UART4 112
Greg Kroah-Hartman
2020-03-17 10:55:37 UTC
Permalink
From: Leonard Crestez <***@nxp.com>

commit 4c48e549f39f8ed10cf8a0b6cb96f5eddf0391ce upstream.

The imx SC api strongly assumes that messages are composed out of
4-bytes words but some of our message structs have odd sizeofs.

This produces many oopses with CONFIG_KASAN=y.

Fix by marking with __aligned(4).

Fixes: b96eea718bf6 ("pinctrl: fsl: add scu based pinctrl support")
Signed-off-by: Leonard Crestez <***@nxp.com>
Link: https://lore.kernel.org/r/***@nxp.com
Signed-off-by: Linus Walleij <***@linaro.org>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>

---
drivers/pinctrl/freescale/pinctrl-scu.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

--- a/drivers/pinctrl/freescale/pinctrl-scu.c
+++ b/drivers/pinctrl/freescale/pinctrl-scu.c
@@ -23,12 +23,12 @@ struct imx_sc_msg_req_pad_set {
struct imx_sc_rpc_msg hdr;
u32 val;
u16 pad;
-} __packed;
+} __packed __aligned(4);

struct imx_sc_msg_req_pad_get {
struct imx_sc_rpc_msg hdr;
u16 pad;
-} __packed;
+} __packed __aligned(4);

struct imx_sc_msg_resp_pad_get {
struct imx_sc_rpc_msg hdr;
Greg Kroah-Hartman
2020-03-17 10:55:40 UTC
Permalink
From: Tina Zhang <***@intel.com>

commit 259170cb4c84f4165a36c0b05811eb74c495412c upstream.

Commit c3b5a8430daad ("drm/i915/gvt: Enable gfx virtualiztion for CFL")
added the support on CFL. The vgpu emulation hotplug support on CFL was
supposed to be included in that patch. Without the vgpu emulation
hotplug support, the dma-buf based display gives us a blur face.

So fix this issue by adding the vgpu emulation hotplug support on CFL.

Fixes: c3b5a8430daad ("drm/i915/gvt: Enable gfx virtualiztion for CFL")
Signed-off-by: Tina Zhang <***@intel.com>
Acked-by: Zhenyu Wang <***@linux.intel.com>
Signed-off-by: Zhenyu Wang <***@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20200227010041.32248-1-***@intel.com
(cherry picked from commit 135dde8853c7e00f6002e710f7e4787ed8585c0e)
Signed-off-by: Jani Nikula <***@intel.com>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>

---
drivers/gpu/drm/i915/gvt/display.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

--- a/drivers/gpu/drm/i915/gvt/display.c
+++ b/drivers/gpu/drm/i915/gvt/display.c
@@ -457,7 +457,8 @@ void intel_vgpu_emulate_hotplug(struct i
struct drm_i915_private *dev_priv = vgpu->gvt->dev_priv;

/* TODO: add more platforms support */
- if (IS_SKYLAKE(dev_priv) || IS_KABYLAKE(dev_priv)) {
+ if (IS_SKYLAKE(dev_priv) || IS_KABYLAKE(dev_priv) ||
+ IS_COFFEELAKE(dev_priv)) {
if (connected) {
vgpu_vreg_t(vgpu, SFUSE_STRAP) |=
SFUSE_STRAP_DDID_DETECTED;
Greg Kroah-Hartman
2020-03-17 10:55:41 UTC
Permalink
From: Charles Keepax <***@opensource.cirrus.com>

commit aafd56fc79041bf36f97712d4b35208cbe07db90 upstream.

kref_init starts with the reference count at 1, which will be balanced
by the pinctrl_put in pinctrl_unregister. The additional kref_get in
pinctrl_claim_hogs will increase this count to 2 and cause the hogs to
not get freed when pinctrl_unregister is called.

Fixes: 6118714275f0 ("pinctrl: core: Fix pinctrl_register_and_init() with pinctrl_enable()")
Signed-off-by: Charles Keepax <***@opensource.cirrus.com>
Link: https://lore.kernel.org/r/20200228154142.13860-1-***@opensource.cirrus.com
Signed-off-by: Linus Walleij <***@linaro.org>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>

---
drivers/pinctrl/core.c | 1 -
1 file changed, 1 deletion(-)

--- a/drivers/pinctrl/core.c
+++ b/drivers/pinctrl/core.c
@@ -2025,7 +2025,6 @@ static int pinctrl_claim_hogs(struct pin
return PTR_ERR(pctldev->p);
}

- kref_get(&pctldev->p->users);
pctldev->hog_default =
pinctrl_lookup_state(pctldev->p, PINCTRL_STATE_DEFAULT);
if (IS_ERR(pctldev->hog_default)) {
Greg Kroah-Hartman
2020-03-17 10:55:42 UTC
Permalink
From: Zhenyu Wang <***@linux.intel.com>

commit 04d6067f1f19e70a418f92fa3170cf7fe53b7fdf upstream.
From commit f25a49ab8ab9 ("drm/i915/gvt: Use vgpu_lock to protect per
vgpu access") the vgpu idr destroy is moved later than vgpu resource
destroy, then it would fail to stop timer for schedule policy clean
which to check vgpu idr for any left vGPU. So this trys to destroy
vgpu idr earlier.

Cc: Colin Xu <***@intel.com>
Fixes: f25a49ab8ab9 ("drm/i915/gvt: Use vgpu_lock to protect per vgpu access")
Acked-by: Colin Xu <***@intel.com>
Signed-off-by: Zhenyu Wang <***@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20200229055445.31481-1-***@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>

---
drivers/gpu/drm/i915/gvt/vgpu.c | 12 +++++++++---
1 file changed, 9 insertions(+), 3 deletions(-)

--- a/drivers/gpu/drm/i915/gvt/vgpu.c
+++ b/drivers/gpu/drm/i915/gvt/vgpu.c
@@ -272,10 +272,17 @@ void intel_gvt_destroy_vgpu(struct intel
{
struct intel_gvt *gvt = vgpu->gvt;

- mutex_lock(&vgpu->vgpu_lock);
-
WARN(vgpu->active, "vGPU is still active!\n");

+ /*
+ * remove idr first so later clean can judge if need to stop
+ * service if no active vgpu.
+ */
+ mutex_lock(&gvt->lock);
+ idr_remove(&gvt->vgpu_idr, vgpu->id);
+ mutex_unlock(&gvt->lock);
+
+ mutex_lock(&vgpu->vgpu_lock);
intel_gvt_debugfs_remove_vgpu(vgpu);
intel_vgpu_clean_sched_policy(vgpu);
intel_vgpu_clean_submission(vgpu);
@@ -290,7 +297,6 @@ void intel_gvt_destroy_vgpu(struct intel
mutex_unlock(&vgpu->vgpu_lock);

mutex_lock(&gvt->lock);
- idr_remove(&gvt->vgpu_idr, vgpu->id);
if (idr_is_empty(&gvt->vgpu_idr))
intel_gvt_clean_irq(gvt);
intel_gvt_update_vgpu_types(gvt);
Greg Kroah-Hartman
2020-03-17 10:55:07 UTC
Permalink
From: Al Viro <***@zeniv.linux.org.uk>

commit 21039132650281de06a169cbe8a0f7e5c578fd8b upstream.

with the way fs/namei.c:do_last() had been done, ->atomic_open()
instances needed to recognize the case when existing file got
found with O_EXCL|O_CREAT, either by falling back to finish_no_open()
or failing themselves. gfs2 one didn't.

Fixes: 6d4ade986f9c (GFS2: Add atomic_open support)
Cc: ***@kernel.org # v3.11
Signed-off-by: Al Viro <***@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>

---
fs/gfs2/inode.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/fs/gfs2/inode.c
+++ b/fs/gfs2/inode.c
@@ -1248,7 +1248,7 @@ static int gfs2_atomic_open(struct inode
if (!(file->f_mode & FMODE_OPENED))
return finish_no_open(file, d);
dput(d);
- return 0;
+ return excl && (flags & O_CREAT) ? -EEXIST : 0;
}

BUG_ON(d != NULL);

Loading...