Discussion:
[PATCH 3.12 02/94] can: c_can: Remove EOB exit
Jiri Slaby
2014-07-30 12:13:51 UTC
Permalink
From: Thomas Gleixner <***@linutronix.de>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 710c56105dfd10e32a89086cf78cc1c8433f6a7a upstream.

The rx_poll code has the following gem:

if (msg_ctrl_save & IF_MCONT_EOB)
return num_rx_pkts;

The EOB bit is the indicator for the hardware that this is the last
configured FIFO object. But this object can contain valid data, if we
manage to free up objects before the overrun case hits.

Now if the code exits due to the EOB bit set, then this buffer is
stale and the interrupt bit and NewDat bit of the buffer are still
set. Results in a nice interrupt storm unless we come into an overrun
situation where the MSGLST bit gets set.

ksoftirqd/0-3 [000] ..s. 79.124101: c_can_poll: rx_poll: val: 00008001 pend 00008001
ksoftirqd/0-3 [000] ..s. 79.124176: c_can_poll: rx_poll: val: 00008000 pend 00008000
ksoftirqd/0-3 [000] ..s. 79.124187: c_can_poll: rx_poll: val: 00008002 pend 00008002
ksoftirqd/0-3 [000] ..s. 79.124256: c_can_poll: rx_poll: val: 00008000 pend 00008000
ksoftirqd/0-3 [000] ..s. 79.124267: c_can_poll: rx_poll: val: 00008000 pend 00008000

The amazing thing is that the check of the MSGLST (aka overrun bit)
used to be after the check of the EOB bit. That was "fixed" in commit
5d0f801a2c(can: c_can: Fix RX message handling, handle lost message
before EOB). But the author of this "fix" did not even understand that
the EOB check is broken as well.

Again a simple solution: Remove

Signed-off-by: Thomas Gleixner <***@linutronix.de>
[mkl: adjusted subject and commit message]
Signed-off-by: Marc Kleine-Budde <***@pengutronix.de>

Signed-off-by: Jiri Slaby <***@suse.cz>
---
drivers/net/can/c_can/c_can.c | 3 ---
1 file changed, 3 deletions(-)

diff --git a/drivers/net/can/c_can/c_can.c b/drivers/net/can/c_can/c_can.c
index e59c42b446a9..ae148055baa2 100644
--- a/drivers/net/can/c_can/c_can.c
+++ b/drivers/net/can/c_can/c_can.c
@@ -830,9 +830,6 @@ static int c_can_do_rx_poll(struct net_device *dev, int quota)
continue;
}

- if (msg_ctrl_save & IF_MCONT_EOB)
- return num_rx_pkts;
-
if (!(msg_ctrl_save & IF_MCONT_NEWDAT))
continue;
--
2.0.1
Jiri Slaby
2014-07-30 12:15:07 UTC
Permalink
From: Antti Palosaari <***@iki.fi>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit db4175ae2095634dbecd4c847da439f9c83e1b3b upstream.

Only supported modulation for DVB-S is QPSK. Modulation parameter
contains invalid value for DVB-S on some cases, which leads driver
refusing tuning attempt. Due to that, hard code modulation to QPSK
in case of DVB-S.

Signed-off-by: Antti Palosaari <***@iki.fi>
Signed-off-by: Mauro Carvalho Chehab <***@samsung.com>
Signed-off-by: Jiri Slaby <***@suse.cz>
---
drivers/media/dvb-frontends/tda10071.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/media/dvb-frontends/tda10071.c b/drivers/media/dvb-frontends/tda10071.c
index 8ad3a57cf640..287b977862e2 100644
--- a/drivers/media/dvb-frontends/tda10071.c
+++ b/drivers/media/dvb-frontends/tda10071.c
@@ -667,6 +667,7 @@ static int tda10071_set_frontend(struct dvb_frontend *fe)
struct dtv_frontend_properties *c = &fe->dtv_property_cache;
int ret, i;
u8 mode, rolloff, pilot, inversion, div;
+ fe_modulation_t modulation;

dev_dbg(&priv->i2c->dev, "%s: delivery_system=%d modulation=%d " \
"frequency=%d symbol_rate=%d inversion=%d pilot=%d " \
@@ -701,10 +702,13 @@ static int tda10071_set_frontend(struct dvb_frontend *fe)

switch (c->delivery_system) {
case SYS_DVBS:
+ modulation = QPSK;
rolloff = 0;
pilot = 2;
break;
case SYS_DVBS2:
+ modulation = c->modulation;
+
switch (c->rolloff) {
case ROLLOFF_20:
rolloff = 2;
@@ -749,7 +753,7 @@ static int tda10071_set_frontend(struct dvb_frontend *fe)

for (i = 0, mode = 0xff; i < ARRAY_SIZE(TDA10071_MODCOD); i++) {
if (c->delivery_system == TDA10071_MODCOD[i].delivery_system &&
- c->modulation == TDA10071_MODCOD[i].modulation &&
+ modulation == TDA10071_MODCOD[i].modulation &&
c->fec_inner == TDA10071_MODCOD[i].fec) {
mode = TDA10071_MODCOD[i].val;
dev_dbg(&priv->i2c->dev, "%s: mode found=%02x\n",
--
2.0.1
Jiri Slaby
2014-07-30 12:15:16 UTC
Permalink
From: Silesh C V <***@mvista.com>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit aed8adb7688d5744cb484226820163af31d2499a upstream.

Commit 079148b919d0 ("coredump: factor out the setting of PF_DUMPCORE")
cleaned up the setting of PF_DUMPCORE by removing it from all the
linux_binfmt->core_dump() and moving it to zap_threads().But this ended
up clearing all the previously set flags. This causes issues during
core generation when tsk->flags is checked again (eg. for PF_USED_MATH
to dump floating point registers). Fix this.

Signed-off-by: Silesh C V <***@mvista.com>
Acked-by: Oleg Nesterov <***@redhat.com>
Cc: Mandeep Singh Baines <***@chromium.org>
Signed-off-by: Andrew Morton <***@linux-foundation.org>
Signed-off-by: Linus Torvalds <***@linux-foundation.org>
Signed-off-by: Jiri Slaby <***@suse.cz>
---
fs/coredump.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/coredump.c b/fs/coredump.c
index 02db009d1531..88adbdd15193 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -307,7 +307,7 @@ static int zap_threads(struct task_struct *tsk, struct mm_struct *mm,
if (unlikely(nr < 0))
return nr;

- tsk->flags = PF_DUMPCORE;
+ tsk->flags |= PF_DUMPCORE;
if (atomic_read(&mm->mm_users) == nr + 1)
goto done;
/*
--
2.0.1
Jiri Slaby
2014-07-30 12:15:21 UTC
Permalink
=46rom: Christian K=C3=B6nig <***@amd.com>

3.12-stable review patch. If anyone has any objections, please let me =
know.

=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D

commit e8c214d22e76dd0ead38f97f8d2dc09aac70d651 upstream.

We must mask out the overflow bit as well, otherwise
the wptr will never match the rptr again and the interrupt
handler will loop forever.

Signed-off-by: Christian K=C3=B6nig <***@amd.com>
Signed-off-by: Alex Deucher <***@amd.com>
Reviewed-by: Michel D=C3=A4nzer <***@amd.com>
Signed-off-by: Jiri Slaby <***@suse.cz>
---
drivers/gpu/drm/radeon/cik.c | 1 +
drivers/gpu/drm/radeon/evergreen.c | 1 +
drivers/gpu/drm/radeon/r600.c | 1 +
drivers/gpu/drm/radeon/si.c | 1 +
4 files changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/radeon/cik.c b/drivers/gpu/drm/radeon/cik.=
c
index bb7f2ae7683d..14836dfd04e7 100644
--- a/drivers/gpu/drm/radeon/cik.c
+++ b/drivers/gpu/drm/radeon/cik.c
@@ -6554,6 +6554,7 @@ static inline u32 cik_get_ih_wptr(struct radeon_d=
evice *rdev)
tmp =3D RREG32(IH_RB_CNTL);
tmp |=3D IH_WPTR_OVERFLOW_CLEAR;
WREG32(IH_RB_CNTL, tmp);
+ wptr &=3D ~RB_OVERFLOW;
}
return (wptr & rdev->ih.ptr_mask);
}
diff --git a/drivers/gpu/drm/radeon/evergreen.c b/drivers/gpu/drm/radeo=
n/evergreen.c
index 4564bb1ab837..7ca58fc7a1c6 100644
--- a/drivers/gpu/drm/radeon/evergreen.c
+++ b/drivers/gpu/drm/radeon/evergreen.c
@@ -4664,6 +4664,7 @@ static u32 evergreen_get_ih_wptr(struct radeon_de=
vice *rdev)
tmp =3D RREG32(IH_RB_CNTL);
tmp |=3D IH_WPTR_OVERFLOW_CLEAR;
WREG32(IH_RB_CNTL, tmp);
+ wptr &=3D ~RB_OVERFLOW;
}
return (wptr & rdev->ih.ptr_mask);
}
diff --git a/drivers/gpu/drm/radeon/r600.c b/drivers/gpu/drm/radeon/r60=
0.c
index 2c2b91f16ecf..88eb936fbc2f 100644
--- a/drivers/gpu/drm/radeon/r600.c
+++ b/drivers/gpu/drm/radeon/r600.c
@@ -3657,6 +3657,7 @@ static u32 r600_get_ih_wptr(struct radeon_device =
*rdev)
tmp =3D RREG32(IH_RB_CNTL);
tmp |=3D IH_WPTR_OVERFLOW_CLEAR;
WREG32(IH_RB_CNTL, tmp);
+ wptr &=3D ~RB_OVERFLOW;
}
return (wptr & rdev->ih.ptr_mask);
}
diff --git a/drivers/gpu/drm/radeon/si.c b/drivers/gpu/drm/radeon/si.c
index c9f9c07f888d..4d41a0dc1796 100644
--- a/drivers/gpu/drm/radeon/si.c
+++ b/drivers/gpu/drm/radeon/si.c
@@ -6041,6 +6041,7 @@ static inline u32 si_get_ih_wptr(struct radeon_de=
vice *rdev)
tmp =3D RREG32(IH_RB_CNTL);
tmp |=3D IH_WPTR_OVERFLOW_CLEAR;
WREG32(IH_RB_CNTL, tmp);
+ wptr &=3D ~RB_OVERFLOW;
}
return (wptr & rdev->ih.ptr_mask);
}
--=20
2.0.1
Jiri Slaby
2014-07-30 12:15:00 UTC
Permalink
From: Peter Zijlstra <***@infradead.org>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 4badad352a6bb202ec68afa7a574c0bb961e5ebc upstream.

The optimistic spin code assumes regular stores and cmpxchg() play nice;
this is found to not be true for at least: parisc, sparc32, tile32,
metag-lock1, arc-!llsc and hexagon.

There is further wreckage, but this in particular seemed easy to
trigger, so blacklist this.

Opt in for known good archs.

Signed-off-by: Peter Zijlstra <***@infradead.org>
Reported-by: Mikulas Patocka <***@redhat.com>
Cc: David Miller <***@davemloft.net>
Cc: Chris Metcalf <***@tilera.com>
Cc: James Bottomley <***@hansenpartnership.com>
Cc: Vineet Gupta <***@synopsys.com>
Cc: Jason Low <***@hp.com>
Cc: Waiman Long <***@hp.com>
Cc: "James E.J. Bottomley" <***@parisc-linux.org>
Cc: Paul McKenney <***@linux.vnet.ibm.com>
Cc: John David Anglin <***@bell.net>
Cc: James Hogan <***@imgtec.com>
Cc: Linus Torvalds <***@linux-foundation.org>
Cc: Davidlohr Bueso <***@hp.com>
Cc: Benjamin Herrenschmidt <***@kernel.crashing.org>
Cc: Catalin Marinas <***@arm.com>
Cc: Russell King <***@arm.linux.org.uk>
Cc: Will Deacon <***@arm.com>
Cc: linux-arm-***@lists.infradead.org
Cc: linux-***@vger.kernel.org
Cc: linuxppc-***@lists.ozlabs.org
Cc: ***@vger.kernel.org
Link: http://lkml.kernel.org/r/***@laptop.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <***@kernel.org>
Signed-off-by: Jiri Slaby <***@suse.cz>
---
arch/arm/Kconfig | 1 +
arch/arm64/Kconfig | 1 +
arch/powerpc/Kconfig | 1 +
arch/sparc/Kconfig | 1 +
arch/x86/Kconfig | 1 +
kernel/Kconfig.locks | 5 ++++-
6 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index e47fcd1e9645..99e1ce978cf9 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -5,6 +5,7 @@ config ARM
select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE
select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
select ARCH_HAVE_CUSTOM_GPIO_H
+ select ARCH_SUPPORTS_ATOMIC_RMW
select ARCH_WANT_IPC_PARSE_VERSION
select BUILDTIME_EXTABLE_SORT if MMU
select CLONE_BACKWARDS
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index c04454876bcb..fe70eaea0e28 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1,6 +1,7 @@
config ARM64
def_bool y
select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE
+ select ARCH_SUPPORTS_ATOMIC_RMW
select ARCH_WANT_OPTIONAL_GPIOLIB
select ARCH_WANT_COMPAT_IPC_PARSE_VERSION
select ARCH_WANT_FRAME_POINTERS
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index d5d026b6d237..2e0ddfadc0b9 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -138,6 +138,7 @@ config PPC
select OLD_SIGSUSPEND
select OLD_SIGACTION if PPC32
select HAVE_DEBUG_STACKOVERFLOW
+ select ARCH_SUPPORTS_ATOMIC_RMW

config EARLY_PRINTK
bool
diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig
index 4e5683877b93..d60f34dbae89 100644
--- a/arch/sparc/Kconfig
+++ b/arch/sparc/Kconfig
@@ -75,6 +75,7 @@ config SPARC64
select ARCH_HAVE_NMI_SAFE_CMPXCHG
select HAVE_C_RECORDMCOUNT
select NO_BOOTMEM
+ select ARCH_SUPPORTS_ATOMIC_RMW

config ARCH_DEFCONFIG
string
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index eb2dfa61eabe..9dc1a24d41b8 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -123,6 +123,7 @@ config X86
select COMPAT_OLD_SIGACTION if IA32_EMULATION
select RTC_LIB
select HAVE_DEBUG_STACKOVERFLOW
+ select ARCH_SUPPORTS_ATOMIC_RMW

config INSTRUCTION_DECODER
def_bool y
diff --git a/kernel/Kconfig.locks b/kernel/Kconfig.locks
index d2b32ac27a39..ecee67a00f5f 100644
--- a/kernel/Kconfig.locks
+++ b/kernel/Kconfig.locks
@@ -220,6 +220,9 @@ config INLINE_WRITE_UNLOCK_IRQRESTORE

endif

+config ARCH_SUPPORTS_ATOMIC_RMW
+ bool
+
config MUTEX_SPIN_ON_OWNER
def_bool y
- depends on SMP && !DEBUG_MUTEXES
+ depends on SMP && !DEBUG_MUTEXES && ARCH_SUPPORTS_ATOMIC_RMW
--
2.0.1
Jiri Slaby
2014-07-30 12:15:17 UTC
Permalink
From: John David Anglin <***@bell.net>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 20dbea494543aefaace874cc3ec93a39b94b1ec4 upstream.

The sa_restorer field in struct sigaction is obsolete and no longer in
the parisc implementation. However, the core code assumes the field is
present if SA_RESTORER is defined. So, the define needs to be removed.

Signed-off-by: John David Anglin <***@bell.net>
Signed-off-by: Helge Deller <***@gmx.de>
Signed-off-by: Jiri Slaby <***@suse.cz>
---
arch/parisc/include/uapi/asm/signal.h | 2 --
1 file changed, 2 deletions(-)

diff --git a/arch/parisc/include/uapi/asm/signal.h b/arch/parisc/include/uapi/asm/signal.h
index a2fa297196bc..f5645d6a89f2 100644
--- a/arch/parisc/include/uapi/asm/signal.h
+++ b/arch/parisc/include/uapi/asm/signal.h
@@ -69,8 +69,6 @@
#define SA_NOMASK SA_NODEFER
#define SA_ONESHOT SA_RESETHAND

-#define SA_RESTORER 0x04000000 /* obsolete -- ignored */
-
#define MINSIGSTKSZ 2048
#define SIGSTKSZ 8192
--
2.0.1
Jiri Slaby
2014-07-30 12:15:23 UTC
Permalink
=46rom: Michael Brown <***@fensystems.co.uk>

3.12-stable review patch. If anyone has any objections, please let me =
know.

=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D

commit c7fb93ec51d462ec3540a729ba446663c26a0505 upstream.

The PE/COFF headers currently describe only the initialised-data
portions of the image, and result in no space being allocated for the
uninitialised-data portions. Consequently, the EFI boot stub will end
up overwriting unexpected areas of memory, with unpredictable results.

=46ix by including a .bss section in the PE/COFF headers (functionally
equivalent to the init_size field in the bzImage header).

Signed-off-by: Michael Brown <***@fensystems.co.uk>
Cc: Thomas B=C3=A4chler <***@archlinux.org>
Cc: Josh Boyer <***@fedoraproject.org>
Signed-off-by: Matt Fleming <***@intel.com>
Signed-off-by: Jiri Slaby <***@suse.cz>
---
arch/x86/boot/header.S | 26 ++++++++++++++++++++++----
arch/x86/boot/tools/build.c | 37 ++++++++++++++++++++++++++++++-------
2 files changed, 52 insertions(+), 11 deletions(-)

diff --git a/arch/x86/boot/header.S b/arch/x86/boot/header.S
index 9ec06a1f6d61..425712462178 100644
--- a/arch/x86/boot/header.S
+++ b/arch/x86/boot/header.S
@@ -91,10 +91,9 @@ bs_die:
=20
.section ".bsdata", "a"
bugger_off_msg:
- .ascii "Direct floppy boot is not supported. "
- .ascii "Use a boot loader program instead.\r\n"
+ .ascii "Use a boot loader.\r\n"
.ascii "\n"
- .ascii "Remove disk and press any key to reboot ...\r\n"
+ .ascii "Remove disk and press any key to reboot...\r\n"
.byte 0
=20
#ifdef CONFIG_EFI_STUB
@@ -108,7 +107,7 @@ coff_header:
#else
.word 0x8664 # x86-64
#endif
- .word 3 # nr_sections
+ .word 4 # nr_sections
.long 0 # TimeDateStamp
.long 0 # PointerToSymbolTable
.long 1 # NumberOfSymbols
@@ -250,6 +249,25 @@ section_table:
.word 0 # NumberOfLineNumbers
.long 0x60500020 # Characteristics (section flags)
=20
+ #
+ # The offset & size fields are filled in by build.c.
+ #
+ .ascii ".bss"
+ .byte 0
+ .byte 0
+ .byte 0
+ .byte 0
+ .long 0
+ .long 0x0
+ .long 0 # Size of initialized data
+ # on disk
+ .long 0x0
+ .long 0 # PointerToRelocations
+ .long 0 # PointerToLineNumbers
+ .word 0 # NumberOfRelocations
+ .word 0 # NumberOfLineNumbers
+ .long 0xc8000080 # Characteristics (section flags)
+
#endif /* CONFIG_EFI_STUB */
=20
# Kernel attributes; used by setup. This is part 1 of the
diff --git a/arch/x86/boot/tools/build.c b/arch/x86/boot/tools/build.c
index c941d6a8887f..687dd281c23e 100644
--- a/arch/x86/boot/tools/build.c
+++ b/arch/x86/boot/tools/build.c
@@ -141,7 +141,7 @@ static void usage(void)
=20
#ifdef CONFIG_EFI_STUB
=20
-static void update_pecoff_section_header(char *section_name, u32 offse=
t, u32 size)
+static void update_pecoff_section_header_fields(char *section_name, u3=
2 vma, u32 size, u32 datasz, u32 offset)
{
unsigned int pe_header;
unsigned short num_sections;
@@ -162,10 +162,10 @@ static void update_pecoff_section_header(char *se=
ction_name, u32 offset, u32 siz
put_unaligned_le32(size, section + 0x8);
=20
/* section header vma field */
- put_unaligned_le32(offset, section + 0xc);
+ put_unaligned_le32(vma, section + 0xc);
=20
/* section header 'size of initialised data' field */
- put_unaligned_le32(size, section + 0x10);
+ put_unaligned_le32(datasz, section + 0x10);
=20
/* section header 'file offset' field */
put_unaligned_le32(offset, section + 0x14);
@@ -177,6 +177,11 @@ static void update_pecoff_section_header(char *sec=
tion_name, u32 offset, u32 siz
}
}
=20
+static void update_pecoff_section_header(char *section_name, u32 offse=
t, u32 size)
+{
+ update_pecoff_section_header_fields(section_name, offset, size, size,=
offset);
+}
+
static void update_pecoff_setup_and_reloc(unsigned int size)
{
u32 setup_offset =3D 0x200;
@@ -201,9 +206,6 @@ static void update_pecoff_text(unsigned int text_st=
art, unsigned int file_sz)
=20
pe_header =3D get_unaligned_le32(&buf[0x3c]);
=20
- /* Size of image */
- put_unaligned_le32(file_sz, &buf[pe_header + 0x50]);
-
/*
* Size of code: Subtract the size of the first sector (512 bytes)
* which includes the header.
@@ -218,6 +220,22 @@ static void update_pecoff_text(unsigned int text_s=
tart, unsigned int file_sz)
update_pecoff_section_header(".text", text_start, text_sz);
}
=20
+static void update_pecoff_bss(unsigned int file_sz, unsigned int init_=
sz)
+{
+ unsigned int pe_header;
+ unsigned int bss_sz =3D init_sz - file_sz;
+
+ pe_header =3D get_unaligned_le32(&buf[0x3c]);
+
+ /* Size of uninitialized data */
+ put_unaligned_le32(bss_sz, &buf[pe_header + 0x24]);
+
+ /* Size of image */
+ put_unaligned_le32(init_sz, &buf[pe_header + 0x50]);
+
+ update_pecoff_section_header_fields(".bss", file_sz, bss_sz, 0, 0);
+}
+
#endif /* CONFIG_EFI_STUB */
=20
=20
@@ -269,6 +287,9 @@ int main(int argc, char ** argv)
int fd;
void *kernel;
u32 crc =3D 0xffffffffUL;
+#ifdef CONFIG_EFI_STUB
+ unsigned int init_sz;
+#endif
=20
/* Defaults for old kernel */
#ifdef CONFIG_X86_32
@@ -339,7 +360,9 @@ int main(int argc, char ** argv)
put_unaligned_le32(sys_size, &buf[0x1f4]);
=20
#ifdef CONFIG_EFI_STUB
- update_pecoff_text(setup_sectors * 512, sz + i + ((sys_size * 16) - s=
z));
+ update_pecoff_text(setup_sectors * 512, i + (sys_size * 16));
+ init_sz =3D get_unaligned_le32(&buf[0x260]);
+ update_pecoff_bss(i + (sys_size * 16), init_sz);
=20
#ifdef CONFIG_X86_64 /* Yes, this is really how we defined it :( */
efi_stub_entry -=3D 0x200;
--=20
2.0.1
Jiri Slaby
2014-07-30 12:14:30 UTC
Permalink
From: Christoph Paasch <***@uclouvain.be>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

[ Upstream commit 5924f17a8a30c2ae18d034a86ee7581b34accef6 ]

When in repair-mode and TCP_RECV_QUEUE is set, we end up calling
tcp_push with mss_now being 0. If data is in the send-queue and
tcp_set_skb_tso_segs gets called, we crash because it will divide by
mss_now:

[ 347.151939] divide error: 0000 [#1] SMP
[ 347.152907] Modules linked in:
[ 347.152907] CPU: 1 PID: 1123 Comm: packetdrill Not tainted 3.16.0-rc2 #4
[ 347.152907] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
[ 347.152907] task: f5b88540 ti: f3c82000 task.ti: f3c82000
[ 347.152907] EIP: 0060:[<c1601359>] EFLAGS: 00210246 CPU: 1
[ 347.152907] EIP is at tcp_set_skb_tso_segs+0x49/0xa0
[ 347.152907] EAX: 00000b67 EBX: f5acd080 ECX: 00000000 EDX: 00000000
[ 347.152907] ESI: f5a28f40 EDI: f3c88f00 EBP: f3c83d10 ESP: f3c83d00
[ 347.152907] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[ 347.152907] CR0: 80050033 CR2: 083158b0 CR3: 35146000 CR4: 000006b0
[ 347.152907] Stack:
[ 347.152907] c167f9d9 f5acd080 000005b4 00000002 f3c83d20 c16013e6 f3c88f00 f5acd080
[ 347.152907] f3c83da0 c1603b5a f3c83d38 c10a0188 00000000 00000000 f3c83d84 c10acc85
[ 347.152907] c1ad5ec0 00000000 00000000 c1ad679c 010003e0 00000000 00000000 f3c88fc8
[ 347.152907] Call Trace:
[ 347.152907] [<c167f9d9>] ? apic_timer_interrupt+0x2d/0x34
[ 347.152907] [<c16013e6>] tcp_init_tso_segs+0x36/0x50
[ 347.152907] [<c1603b5a>] tcp_write_xmit+0x7a/0xbf0
[ 347.152907] [<c10a0188>] ? up+0x28/0x40
[ 347.152907] [<c10acc85>] ? console_unlock+0x295/0x480
[ 347.152907] [<c10ad24f>] ? vprintk_emit+0x1ef/0x4b0
[ 347.152907] [<c1605716>] __tcp_push_pending_frames+0x36/0xd0
[ 347.152907] [<c15f4860>] tcp_push+0xf0/0x120
[ 347.152907] [<c15f7641>] tcp_sendmsg+0xf1/0xbf0
[ 347.152907] [<c116d920>] ? kmem_cache_free+0xf0/0x120
[ 347.152907] [<c106a682>] ? __sigqueue_free+0x32/0x40
[ 347.152907] [<c106a682>] ? __sigqueue_free+0x32/0x40
[ 347.152907] [<c114f0f0>] ? do_wp_page+0x3e0/0x850
[ 347.152907] [<c161c36a>] inet_sendmsg+0x4a/0xb0
[ 347.152907] [<c1150269>] ? handle_mm_fault+0x709/0xfb0
[ 347.152907] [<c15a006b>] sock_aio_write+0xbb/0xd0
[ 347.152907] [<c1180b79>] do_sync_write+0x69/0xa0
[ 347.152907] [<c1181023>] vfs_write+0x123/0x160
[ 347.152907] [<c1181d55>] SyS_write+0x55/0xb0
[ 347.152907] [<c167f0d8>] sysenter_do_call+0x12/0x28

This can easily be reproduced with the following packetdrill-script (the
"magic" with netem, sk_pacing and limit_output_bytes is done to prevent
the kernel from pushing all segments, because hitting the limit without
doing this is not so easy with packetdrill):

0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0

+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0

+0 < S 0:0(0) win 32792 <mss 1460>
+0 > S. 0:0(0) ack 1 <mss 1460>
+0.1 < . 1:1(0) ack 1 win 65000

+0 accept(3, ..., ...) = 4

// This forces that not all segments of the snd-queue will be pushed
+0 `tc qdisc add dev tun0 root netem delay 10ms`
+0 `sysctl -w net.ipv4.tcp_limit_output_bytes=2`
+0 setsockopt(4, SOL_SOCKET, 47, [2], 4) = 0

+0 write(4,...,10000) = 10000
+0 write(4,...,10000) = 10000

// Set tcp-repair stuff, particularly TCP_RECV_QUEUE
+0 setsockopt(4, SOL_TCP, 19, [1], 4) = 0
+0 setsockopt(4, SOL_TCP, 20, [1], 4) = 0

// This now will make the write push the remaining segments
+0 setsockopt(4, SOL_SOCKET, 47, [20000], 4) = 0
+0 `sysctl -w net.ipv4.tcp_limit_output_bytes=130000`

// Now we will crash
+0 write(4,...,1000) = 1000

This happens since ec3423257508 (tcp: fix retransmission in repair
mode). Prior to that, the call to tcp_push was prevented by a check for
tp->repair.

The patch fixes it, by adding the new goto-label out_nopush. When exiting
tcp_sendmsg and a push is not required, which is the case for tp->repair,
we go to this label.

When repairing and calling send() with TCP_RECV_QUEUE, the data is
actually put in the receive-queue. So, no push is required because no
data has been added to the send-queue.

Cc: Andrew Vagin <***@openvz.org>
Cc: Pavel Emelyanov <***@parallels.com>
Fixes: ec3423257508 (tcp: fix retransmission in repair mode)
Signed-off-by: Christoph Paasch <***@uclouvain.be>
Acked-by: Andrew Vagin <***@openvz.org>
Acked-by: Pavel Emelyanov <***@parallels.com>
Signed-off-by: David S. Miller <***@davemloft.net>
Signed-off-by: Jiri Slaby <***@suse.cz>
---
net/ipv4/tcp.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 531ab5721d79..cbe5adaad338 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1064,7 +1064,7 @@ int tcp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
if (unlikely(tp->repair)) {
if (tp->repair_queue == TCP_RECV_QUEUE) {
copied = tcp_send_rcvq(sk, msg, size);
- goto out;
+ goto out_nopush;
}

err = -EINVAL;
@@ -1237,6 +1237,7 @@ wait_for_memory:
out:
if (copied)
tcp_push(sk, flags, mss_now, tp->nonagle);
+out_nopush:
release_sock(sk);
return copied + copied_syn;
--
2.0.1
Jiri Slaby
2014-07-30 12:15:01 UTC
Permalink
From: Mateusz Guzik <***@redhat.com>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit b0ab99e7736af88b8ac1b7ae50ea287fffa2badc upstream.

proc_sched_show_task() does:

if (nr_switches)
do_div(avg_atom, nr_switches);

nr_switches is unsigned long and do_div truncates it to 32 bits, which
means it can test non-zero on e.g. x86-64 and be truncated to zero for
division.

Fix the problem by using div64_ul() instead.

As a side effect calculations of avg_atom for big nr_switches are now correct.

Signed-off-by: Mateusz Guzik <***@redhat.com>
Signed-off-by: Peter Zijlstra <***@infradead.org>
Cc: Linus Torvalds <***@linux-foundation.org>
Link: http://lkml.kernel.org/r/1402750809-31991-1-git-send-email-***@redhat.com
Signed-off-by: Ingo Molnar <***@kernel.org>
Signed-off-by: Jiri Slaby <***@suse.cz>
---
kernel/sched/debug.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index fd9ca1de7559..0efe4a27540b 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -554,7 +554,7 @@ void proc_sched_show_task(struct task_struct *p, struct seq_file *m)

avg_atom = p->se.sum_exec_runtime;
if (nr_switches)
- do_div(avg_atom, nr_switches);
+ avg_atom = div64_ul(avg_atom, nr_switches);
else
avg_atom = -1LL;
--
2.0.1
Jiri Slaby
2014-07-30 12:14:47 UTC
Permalink
=46rom: HATAYAMA Daisuke <***@jp.fujitsu.com>

3.12-stable review patch. If anyone has any objections, please let me =
know.

=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D

commit b292d7a10487aee6e74b1c18b8d95b92f40d4a4f upstream.

Currently, any NMI is falsely handled by a NMI handler of NMI watchdog
if CondChgd bit in MSR_CORE_PERF_GLOBAL_STATUS MSR is set.

=46or example, we use external NMI to make system panic to get crash
dump, but in this case, the external NMI is falsely handled do to the
issue.

This commit deals with the issue simply by ignoring CondChgd bit.

Here is explanation in detail.

On x86 NMI watchdog uses performance monitoring feature to
periodically signal NMI each time performance counter gets overflowed.

intel_pmu_handle_irq() is called as a NMI_LOCAL handler from a NMI
handler of NMI watchdog, perf_event_nmi_handler(). It identifies an
owner of a given NMI by looking at overflow status bits in
MSR_CORE_PERF_GLOBAL_STATUS MSR. If some of the bits are set, then it
handles the given NMI as its own NMI.

The problem is that the intel_pmu_handle_irq() doesn't distinguish
CondChgd bit from other bits. Unlike the other status bits, CondChgd
bit doesn't represent overflow status for performance counters. Thus,
CondChgd bit cannot be thought of as a mark indicating a given NMI is
NMI watchdog's.

As a result, if CondChgd bit is set, any NMI is falsely handled by the
NMI handler of NMI watchdog. Also, if type of the falsely handled NMI
is either NMI_UNKNOWN, NMI_SERR or NMI_IO_CHECK, the corresponding
action is never performed until CondChgd bit is cleared.

I noticed this behavior on systems with Ivy Bridge processors: Intel
Xeon CPU E5-2630 v2 and Intel Xeon CPU E7-8890 v2. On both systems,
CondChgd bit in MSR_CORE_PERF_GLOBAL_STATUS MSR has already been set
in the beginning at boot. Then the CondChgd bit is immediately cleared
by next wrmsr to MSR_CORE_PERF_GLOBAL_CTRL MSR and appears to remain
0.

On the other hand, on older processors such as Nehalem, Xeon E7540,
CondChgd bit is not set in the beginning at boot.

I'm not sure about exact behavior of CondChgd bit, in particular when
this bit is set. Although I read Intel System Programmer's Manual to
figure out that, the descriptions I found are:

In 18.9.1:

"The MSR_PERF_GLOBAL_STATUS MSR also provides a =C2=A1sticky bit=C2=A2=
to
indicate changes to the state of performancmonitoring hardware"

In Table 35-2 IA-32 Architectural MSRs

63 CondChg: status bits of this register has changed.

These are different from the bahviour I see on the actual system as I
explained above.

At least, I think ignoring CondChgd bit should be enough for NMI
watchdog perspective.

Signed-off-by: HATAYAMA Daisuke <***@jp.fujitsu.com>
Acked-by: Don Zickus <***@redhat.com>
Signed-off-by: Peter Zijlstra <***@infradead.org>
Cc: Arnaldo Carvalho de Melo <***@kernel.org>
Cc: Linus Torvalds <***@linux-foundation.org>
Cc: linux-***@vger.kernel.org
Link: http://lkml.kernel.org/r/***@jp.=
fujitsu.com
Signed-off-by: Ingo Molnar <***@kernel.org>
Signed-off-by: Jiri Slaby <***@suse.cz>
---
arch/x86/kernel/cpu/perf_event_intel.c | 9 +++++++++
1 file changed, 9 insertions(+)

diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/c=
pu/perf_event_intel.c
index f31a1655d1ff..aa4b5c132c66 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -1365,6 +1365,15 @@ again:
intel_pmu_lbr_read();
=20
/*
+ * CondChgd bit 63 doesn't mean any overflow status. Ignore
+ * and clear the bit.
+ */
+ if (__test_and_clear_bit(63, (unsigned long *)&status)) {
+ if (!status)
+ goto done;
+ }
+
+ /*
* PEBS overflow sets bit 62 in the global status register
*/
if (__test_and_clear_bit(62, (unsigned long *)&status)) {
--=20
2.0.1
Jiri Slaby
2014-07-30 12:15:03 UTC
Permalink
From: Marek Vasut <***@denx.de>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 22970070e027cbbb9b2878f8f7c31d0d7f29e94d upstream.

Add alias for FEC ethernet on i.MX to allow bootloaders (like U-Boot)
patch-in the MAC address for FEC using this alias.

Signed-off-by: Marek Vasut <***@denx.de>
Signed-off-by: Shawn Guo <***@linaro.org>
Signed-off-by: Jiri Slaby <***@suse.cz>
---
arch/arm/boot/dts/imx25.dtsi | 1 +
arch/arm/boot/dts/imx27.dtsi | 1 +
arch/arm/boot/dts/imx51.dtsi | 1 +
arch/arm/boot/dts/imx53.dtsi | 1 +
4 files changed, 4 insertions(+)

diff --git a/arch/arm/boot/dts/imx25.dtsi b/arch/arm/boot/dts/imx25.dtsi
index 737ed5da8f71..de1611966d8b 100644
--- a/arch/arm/boot/dts/imx25.dtsi
+++ b/arch/arm/boot/dts/imx25.dtsi
@@ -30,6 +30,7 @@
spi2 = &spi3;
usb0 = &usbotg;
usb1 = &usbhost1;
+ ethernet0 = &fec;
};

cpus {
diff --git a/arch/arm/boot/dts/imx27.dtsi b/arch/arm/boot/dts/imx27.dtsi
index b7a1c6d950b9..c07aea4f66cb 100644
--- a/arch/arm/boot/dts/imx27.dtsi
+++ b/arch/arm/boot/dts/imx27.dtsi
@@ -30,6 +30,7 @@
spi0 = &cspi1;
spi1 = &cspi2;
spi2 = &cspi3;
+ ethernet0 = &fec;
};

aitc: aitc-interrupt-***@e0000000 {
diff --git a/arch/arm/boot/dts/imx51.dtsi b/arch/arm/boot/dts/imx51.dtsi
index 54cee6517902..6d2a5343691f 100644
--- a/arch/arm/boot/dts/imx51.dtsi
+++ b/arch/arm/boot/dts/imx51.dtsi
@@ -27,6 +27,7 @@
spi0 = &ecspi1;
spi1 = &ecspi2;
spi2 = &cspi;
+ ethernet0 = &fec;
};

tzic: tz-interrupt-***@e0000000 {
diff --git a/arch/arm/boot/dts/imx53.dtsi b/arch/arm/boot/dts/imx53.dtsi
index dc72353de0b3..50eda500f39a 100644
--- a/arch/arm/boot/dts/imx53.dtsi
+++ b/arch/arm/boot/dts/imx53.dtsi
@@ -33,6 +33,7 @@
spi0 = &ecspi1;
spi1 = &ecspi2;
spi2 = &cspi;
+ ethernet0 = &fec;
};

cpus {
--
2.0.1
Jiri Slaby
2014-07-30 12:14:48 UTC
Permalink
From: Amitkumar Karwar <***@marvell.com>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit d76744a93246eccdca1106037e8ee29debf48277 upstream.

https://bugzilla.kernel.org/show_bug.cgi?id=70191
https://bugzilla.kernel.org/show_bug.cgi?id=77581

It is observed that sometimes Tx packet is downloaded without
adding driver's txpd header. This results in firmware parsing
garbage data as packet length. Sometimes firmware is unable
to read the packet if length comes out as invalid. This stops
further traffic and timeout occurs.

The root cause is uninitialized fields in tx_info(skb->cb) of
packet used to get garbage values. In this case if
MWIFIEX_BUF_FLAG_REQUEUED_PKT flag is mistakenly set, txpd
header was skipped. This patch makes sure that tx_info is
correctly initialized to fix the problem.

Reported-by: Andrew Wiley <***@gmail.com>
Reported-by: Linus Gasser <***@markas-al-nour.org>
Reported-by: Michael Hirsch <***@teufel.de>
Tested-by: Xinming Hu <***@marvell.com>
Signed-off-by: Amitkumar Karwar <***@marvell.com>
Signed-off-by: Maithili Hinge <***@marvell.com>
Signed-off-by: Avinash Patil <***@marvell.com>
Signed-off-by: Bing Zhao <***@marvell.com>
Signed-off-by: John W. Linville <***@tuxdriver.com>
Signed-off-by: Jiri Slaby <***@suse.cz>
---
drivers/net/wireless/mwifiex/main.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/drivers/net/wireless/mwifiex/main.c b/drivers/net/wireless/mwifiex/main.c
index c2b91f566e05..edf5239d93df 100644
--- a/drivers/net/wireless/mwifiex/main.c
+++ b/drivers/net/wireless/mwifiex/main.c
@@ -654,6 +654,7 @@ mwifiex_hard_start_xmit(struct sk_buff *skb, struct net_device *dev)
}

tx_info = MWIFIEX_SKB_TXCB(skb);
+ memset(tx_info, 0, sizeof(*tx_info));
tx_info->bss_num = priv->bss_num;
tx_info->bss_type = priv->bss_type;
--
2.0.1
Jiri Slaby
2014-07-30 12:15:19 UTC
Permalink
From: Vasily Averin <***@parallels.com>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 295dc39d941dc2ae53d5c170365af4c9d5c16212 upstream.

Currently umount on symlink blocks following umount:

/vz is separate mount

# ls /vz/ -al | grep test
drwxr-xr-x. 2 root root 4096 Jul 19 01:14 testdir
lrwxrwxrwx. 1 root root 11 Jul 19 01:16 testlink -> /vz/testdir
# umount -l /vz/testlink
umount: /vz/testlink: not mounted (expected)

# lsof /vz
# umount /vz
umount: /vz: device is busy. (unexpected)

In this case mountpoint_last() gets an extra refcount on path->mnt

Signed-off-by: Vasily Averin <***@openvz.org>
Acked-by: Ian Kent <***@themaw.net>
Acked-by: Jeff Layton <***@primarydata.com>
Signed-off-by: Christoph Hellwig <***@lst.de>
Signed-off-by: Jiri Slaby <***@suse.cz>
---
fs/namei.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/namei.c b/fs/namei.c
index 338d08b7eae2..e3249d565c95 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2281,9 +2281,10 @@ done:
goto out;
}
path->dentry = dentry;
- path->mnt = mntget(nd->path.mnt);
+ path->mnt = nd->path.mnt;
if (should_follow_link(dentry->d_inode, nd->flags & LOOKUP_FOLLOW))
return 1;
+ mntget(path->mnt);
follow_mount(path);
error = 0;
out:
--
2.0.1
Jiri Slaby
2014-07-30 12:15:20 UTC
Permalink
From: Sven Wegener <***@stealer.net>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 8142b215501f8b291a108a202b3a053a265b03dd upstream.

Commit 554086d ("x86_32, entry: Do syscall exit work on badsys
(CVE-2014-4508)") introduced a regression in the x86_32 syscall entry
code, resulting in syscall() not returning proper errors for undefined
syscalls on CPUs supporting the sysenter feature.
int result = syscall(666);
printf("result=%d errno=%d error=%s\n", result, errno, strerror(errno));
result=666 errno=0 error=Success
Obviously, the syscall return value is the called syscall number, but it
should have been an ENOSYS error. When run under ptrace it behaves
result=-1 errno=38 error=Function not implemented
The %eax register is the return value register. For debugging via ptrace
the syscall entry code stores the complete register context on the
stack. The badsys handlers only store the ENOSYS error code in the
ptrace register set and do not set %eax like a regular syscall handler
would. The old resume_userspace call chain contains code that clobbers
%eax and it restores %eax from the ptrace registers afterwards. The same
goes for the ptrace-enabled call chain. When ptrace is not used, the
syscall return value is the passed-in syscall number from the untouched
%eax register.

Use %eax as the return value register in syscall_badsys and
sysenter_badsys, like a real syscall handler does, and have the caller
push the value onto the stack for ptrace access.

Signed-off-by: Sven Wegener <***@stealer.net>
Link: http://lkml.kernel.org/r/***@titan.int.lan.stealer.net
Reviewed-and-tested-by: Andy Lutomirski <***@amacapital.net>
Signed-off-by: H. Peter Anvin <***@zytor.com>
Signed-off-by: Jiri Slaby <***@suse.cz>
---
arch/x86/kernel/entry_32.S | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/entry_32.S b/arch/x86/kernel/entry_32.S
index 3308125c90aa..1fc2a347c47c 100644
--- a/arch/x86/kernel/entry_32.S
+++ b/arch/x86/kernel/entry_32.S
@@ -436,8 +436,8 @@ sysenter_do_call:
cmpl $(NR_syscalls), %eax
jae sysenter_badsys
call *sys_call_table(,%eax,4)
- movl %eax,PT_EAX(%esp)
sysenter_after_call:
+ movl %eax,PT_EAX(%esp)
LOCKDEP_SYS_EXIT
DISABLE_INTERRUPTS(CLBR_ANY)
TRACE_IRQS_OFF
@@ -517,6 +517,7 @@ ENTRY(system_call)
jae syscall_badsys
syscall_call:
call *sys_call_table(,%eax,4)
+syscall_after_call:
movl %eax,PT_EAX(%esp) # store the return value
syscall_exit:
LOCKDEP_SYS_EXIT
@@ -686,12 +687,12 @@ syscall_fault:
END(syscall_fault)

syscall_badsys:
- movl $-ENOSYS,PT_EAX(%esp)
- jmp syscall_exit
+ movl $-ENOSYS,%eax
+ jmp syscall_after_call
END(syscall_badsys)

sysenter_badsys:
- movl $-ENOSYS,PT_EAX(%esp)
+ movl $-ENOSYS,%eax
jmp sysenter_after_call
END(syscall_badsys)
CFI_ENDPROC
--
2.0.1
Jiri Slaby
2014-07-30 12:15:14 UTC
Permalink
From: Mikulas Patocka <***@redhat.com>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 694617474e33b8603fc76e090ed7d09376514b1a upstream.

The patch 3e374919b314f20e2a04f641ebc1093d758f66a4 is supposed to fix the
problem where kmem_cache_create incorrectly reports duplicate cache name
and fails. The problem is described in the header of that patch.

However, the patch doesn't really fix the problem because of these
reasons:

* the logic to test for debugging is reversed. It was intended to perform
the check only if slub debugging is enabled (which implies that caches
with the same parameters are not merged). Therefore, there should be
#if !defined(CONFIG_SLUB) || defined(CONFIG_SLUB_DEBUG_ON)
The current code has the condition reversed and performs the test if
debugging is disabled.

* slub debugging may be enabled or disabled based on kernel command line,
CONFIG_SLUB_DEBUG_ON is just the default settings. Therefore the test
based on definition of CONFIG_SLUB_DEBUG_ON is unreliable.

This patch fixes the problem by removing the test
"!defined(CONFIG_SLUB_DEBUG_ON)". Therefore, duplicate names are never
checked if the SLUB allocator is used.

Note to stable kernel maintainers: when backporint this patch, please
backport also the patch 3e374919b314f20e2a04f641ebc1093d758f66a4.

Acked-by: David Rientjes <***@google.com>
Acked-by: Christoph Lameter <***@linux.com>
Signed-off-by: Mikulas Patocka <***@redhat.com>
Signed-off-by: Pekka Enberg <***@kernel.org>
Signed-off-by: Jiri Slaby <***@suse.cz>
---
mm/slab_common.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/slab_common.c b/mm/slab_common.c
index e2e98af703ea..97e5f5eeca12 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -56,7 +56,7 @@ static int kmem_cache_sanity_check(struct mem_cgroup *memcg, const char *name,
continue;
}

-#if !defined(CONFIG_SLUB) || !defined(CONFIG_SLUB_DEBUG_ON)
+#if !defined(CONFIG_SLUB)
/*
* For simplicity, we won't check this in the list of memcg
* caches. We have control over memcg naming, and if there
--
2.0.1
Jiri Slaby
2014-07-30 12:15:09 UTC
Permalink
From: Christoph Hellwig <***@lst.de>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit d45b3279a5a2252cafcd665bbf2db8c9b31ef783 upstream.

There is no inherent reason why the last put of a tag structure must be
the one for the Scsi_Host, as device model objects can be held for
arbitrary periods. Merge blk_free_tags and __blk_free_tags into a single
funtion that just release a references and get rid of the BUG() when the
host reference wasn't the last.

Signed-off-by: Christoph Hellwig <***@lst.de>
Signed-off-by: Jens Axboe <***@fb.com>
Signed-off-by: Jiri Slaby <***@suse.cz>
---
block/blk-tag.c | 33 +++++++--------------------------
1 file changed, 7 insertions(+), 26 deletions(-)

diff --git a/block/blk-tag.c b/block/blk-tag.c
index 3f33d8672268..a185b86741e5 100644
--- a/block/blk-tag.c
+++ b/block/blk-tag.c
@@ -27,18 +27,15 @@ struct request *blk_queue_find_tag(struct request_queue *q, int tag)
EXPORT_SYMBOL(blk_queue_find_tag);

/**
- * __blk_free_tags - release a given set of tag maintenance info
+ * blk_free_tags - release a given set of tag maintenance info
* @bqt: the tag map to free
*
- * Tries to free the specified @bqt. Returns true if it was
- * actually freed and false if there are still references using it
+ * Drop the reference count on @bqt and frees it when the last reference
+ * is dropped.
*/
-static int __blk_free_tags(struct blk_queue_tag *bqt)
+void blk_free_tags(struct blk_queue_tag *bqt)
{
- int retval;
-
- retval = atomic_dec_and_test(&bqt->refcnt);
- if (retval) {
+ if (atomic_dec_and_test(&bqt->refcnt)) {
BUG_ON(find_first_bit(bqt->tag_map, bqt->max_depth) <
bqt->max_depth);

@@ -50,9 +47,8 @@ static int __blk_free_tags(struct blk_queue_tag *bqt)

kfree(bqt);
}
-
- return retval;
}
+EXPORT_SYMBOL(blk_free_tags);

/**
* __blk_queue_free_tags - release tag maintenance info
@@ -69,28 +65,13 @@ void __blk_queue_free_tags(struct request_queue *q)
if (!bqt)
return;

- __blk_free_tags(bqt);
+ blk_free_tags(bqt);

q->queue_tags = NULL;
queue_flag_clear_unlocked(QUEUE_FLAG_QUEUED, q);
}

/**
- * blk_free_tags - release a given set of tag maintenance info
- * @bqt: the tag map to free
- *
- * For externally managed @bqt frees the map. Callers of this
- * function must guarantee to have released all the queues that
- * might have been using this tag map.
- */
-void blk_free_tags(struct blk_queue_tag *bqt)
-{
- if (unlikely(!__blk_free_tags(bqt)))
- BUG();
-}
-EXPORT_SYMBOL(blk_free_tags);
-
-/**
* blk_queue_free_tags - release tag maintenance info
* @q: the request queue for the device
*
--
2.0.1
Jiri Slaby
2014-07-30 12:15:02 UTC
Permalink
From: Benjamin LaHaise <***@kvack.org>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 263782c1c95bbddbb022dc092fd89a36bb8d5577 upstream.

As of commit f8567a3845ac05bb28f3c1b478ef752762bd39ef it is now possible to
have put_reqs_available() called from irq context. While put_reqs_available()
is per cpu, it did not protect itself from interrupts on the same CPU. This
lead to aio_complete() corrupting the available io requests count when run
under a heavy O_DIRECT workloads as reported by Robert Elliott. Fix this by
disabling irq updates around the per cpu batch updates of reqs_available.

Many thanks to Robert and folks for testing and tracking this down.

Reported-by: Robert Elliot <***@hp.com>
Tested-by: Robert Elliot <***@hp.com>
Signed-off-by: Benjamin LaHaise <***@kvack.org>
Cc: Jens Axboe <***@kernel.dk>, Christoph Hellwig <***@infradead.org>
Signed-off-by: Jiri Slaby <***@suse.cz>
---
fs/aio.c | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/fs/aio.c b/fs/aio.c
index e609e15f36b9..6d68e01dc7ca 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -830,16 +830,20 @@ void exit_aio(struct mm_struct *mm)
static void put_reqs_available(struct kioctx *ctx, unsigned nr)
{
struct kioctx_cpu *kcpu;
+ unsigned long flags;

preempt_disable();
kcpu = this_cpu_ptr(ctx->cpu);

+ local_irq_save(flags);
kcpu->reqs_available += nr;
+
while (kcpu->reqs_available >= ctx->req_batch * 2) {
kcpu->reqs_available -= ctx->req_batch;
atomic_add(ctx->req_batch, &ctx->reqs_available);
}

+ local_irq_restore(flags);
preempt_enable();
}

@@ -847,10 +851,12 @@ static bool get_reqs_available(struct kioctx *ctx)
{
struct kioctx_cpu *kcpu;
bool ret = false;
+ unsigned long flags;

preempt_disable();
kcpu = this_cpu_ptr(ctx->cpu);

+ local_irq_save(flags);
if (!kcpu->reqs_available) {
int old, avail = atomic_read(&ctx->reqs_available);

@@ -869,6 +875,7 @@ static bool get_reqs_available(struct kioctx *ctx)
ret = true;
kcpu->reqs_available--;
out:
+ local_irq_restore(flags);
preempt_enable();
return ret;
}
--
2.0.1
Jiri Slaby
2014-07-30 12:15:15 UTC
Permalink
From: Dmitry Torokhov <***@chromium.org>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 50c5d36dab930b1f1b1e3348b8608aa8b9ee7610 upstream.

We attempt to remove noise from coordinates reported by devices in
input_handle_abs_event(), unfortunately, unless we were dropping the
event altogether, we were ignoring the adjusted value and were passing
on the original value instead.

Reviewed-by: Andrew de los Reyes <***@chromium.org>
Reviewed-by: Benson Leung <***@chromium.org>
Reviewed-by: David Herrmann <***@gmail.com>
Reviewed-by: Henrik Rydberg <***@euromail.se>
Signed-off-by: Dmitry Torokhov <***@gmail.com>
Signed-off-by: Jiri Slaby <***@suse.cz>
---
drivers/input/input.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/input/input.c b/drivers/input/input.c
index 74f47980117b..fcf77af28866 100644
--- a/drivers/input/input.c
+++ b/drivers/input/input.c
@@ -257,9 +257,10 @@ static int input_handle_abs_event(struct input_dev *dev,
}

static int input_get_disposition(struct input_dev *dev,
- unsigned int type, unsigned int code, int value)
+ unsigned int type, unsigned int code, int *pval)
{
int disposition = INPUT_IGNORE_EVENT;
+ int value = *pval;

switch (type) {

@@ -357,6 +358,7 @@ static int input_get_disposition(struct input_dev *dev,
break;
}

+ *pval = value;
return disposition;
}

@@ -365,7 +367,7 @@ static void input_handle_event(struct input_dev *dev,
{
int disposition;

- disposition = input_get_disposition(dev, type, code, value);
+ disposition = input_get_disposition(dev, type, code, &value);

if ((disposition & INPUT_PASS_TO_DEVICE) && dev->event)
dev->event(dev, type, code, value);
--
2.0.1
Jiri Slaby
2014-07-30 12:15:04 UTC
Permalink
From: Anton Kolesov <***@synopsys.com>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit a4b6cb735b25aa84a462a1985e3e43bebaf5beb4 upstream.

This patch adds implementation of GET_THREAD_AREA ptrace request type. This
is required by GDB to debug NPTL applications.

Signed-off-by: Anton Kolesov <***@synopsys.com>
Signed-off-by: Vineet Gupta <***@synopsys.com>
Signed-off-by: Jiri Slaby <***@suse.cz>
---
arch/arc/include/uapi/asm/ptrace.h | 1 +
arch/arc/kernel/ptrace.c | 4 ++++
2 files changed, 5 insertions(+)

diff --git a/arch/arc/include/uapi/asm/ptrace.h b/arch/arc/include/uapi/asm/ptrace.h
index 2618cc13ba75..76a7739aab1c 100644
--- a/arch/arc/include/uapi/asm/ptrace.h
+++ b/arch/arc/include/uapi/asm/ptrace.h
@@ -11,6 +11,7 @@
#ifndef _UAPI__ASM_ARC_PTRACE_H
#define _UAPI__ASM_ARC_PTRACE_H

+#define PTRACE_GET_THREAD_AREA 25

#ifndef __ASSEMBLY__
/*
diff --git a/arch/arc/kernel/ptrace.c b/arch/arc/kernel/ptrace.c
index 5d76706139dd..13b3ffb27a38 100644
--- a/arch/arc/kernel/ptrace.c
+++ b/arch/arc/kernel/ptrace.c
@@ -146,6 +146,10 @@ long arch_ptrace(struct task_struct *child, long request,
pr_debug("REQ=%ld: ADDR =0x%lx, DATA=0x%lx)\n", request, addr, data);

switch (request) {
+ case PTRACE_GET_THREAD_AREA:
+ ret = put_user(task_thread_info(child)->thr_ptr,
+ (unsigned long __user *)data);
+ break;
default:
ret = ptrace_request(child, request, addr, data);
break;
--
2.0.1
Jiri Slaby
2014-07-30 12:15:13 UTC
Permalink
From: Tony Luck <***@intel.com>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 58d4e21e50ff3cc57910a8abc20d7e14375d2f61 upstream.

The "uptime" trace clock added in:

commit 8aacf017b065a805d27467843490c976835eb4a5
tracing: Add "uptime" trace clock that uses jiffies

has wraparound problems when the system has been up more
than 1 hour 11 minutes and 34 seconds. It converts jiffies
to nanoseconds using:
(u64)jiffies_to_usecs(jiffy) * 1000ULL
but since jiffies_to_usecs() only returns a 32-bit value, it
truncates at 2^32 microseconds. An additional problem on 32-bit
systems is that the argument is "unsigned long", so fixing the
return value only helps until 2^32 jiffies (49.7 days on a HZ=1000
system).

Avoid these problems by using jiffies_64 as our basis, and
not converting to nanoseconds (we do convert to clock_t because
user facing API must not be dependent on internal kernel
HZ values).

Link: http://lkml.kernel.org/p/***@intel.com

Fixes: 8aacf017b065 "tracing: Add "uptime" trace clock that uses jiffies"
Signed-off-by: Tony Luck <***@intel.com>
Signed-off-by: Steven Rostedt <***@goodmis.org>
Signed-off-by: Jiri Slaby <***@suse.cz>
---
kernel/trace/trace.c | 2 +-
kernel/trace/trace_clock.c | 9 +++++----
2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index b7566fe4d607..dcdf4e682dd4 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -764,7 +764,7 @@ static struct {
{ trace_clock_local, "local", 1 },
{ trace_clock_global, "global", 1 },
{ trace_clock_counter, "counter", 0 },
- { trace_clock_jiffies, "uptime", 1 },
+ { trace_clock_jiffies, "uptime", 0 },
{ trace_clock, "perf", 1 },
ARCH_TRACE_CLOCKS
};
diff --git a/kernel/trace/trace_clock.c b/kernel/trace/trace_clock.c
index 26dc348332b7..57b67b1f24d1 100644
--- a/kernel/trace/trace_clock.c
+++ b/kernel/trace/trace_clock.c
@@ -59,13 +59,14 @@ u64 notrace trace_clock(void)

/*
* trace_jiffy_clock(): Simply use jiffies as a clock counter.
+ * Note that this use of jiffies_64 is not completely safe on
+ * 32-bit systems. But the window is tiny, and the effect if
+ * we are affected is that we will have an obviously bogus
+ * timestamp on a trace event - i.e. not life threatening.
*/
u64 notrace trace_clock_jiffies(void)
{
- u64 jiffy = jiffies - INITIAL_JIFFIES;
-
- /* Return nsecs */
- return (u64)jiffies_to_usecs(jiffy) * 1000ULL;
+ return jiffies_64_to_clock_t(jiffies_64 - INITIAL_JIFFIES);
}

/*
--
2.0.1
Jiri Slaby
2014-07-30 12:15:18 UTC
Permalink
From: Guenter Roeck <***@roeck-us.net>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 043572d5444116b9d9ad8ae763cf069e7accbc30 upstream.

Temperature limit clamps are applied after converting the temperature
from milli-degrees C to degrees C, so either the clamp limit needs
to be specified in degrees C, not milli-degrees C, or clamping must
happen before converting to degrees C. Use the latter method to avoid
overflows.

vrm is an u8, so the written value needs to be limited to [0, 255].

Cc: Axel Lin <***@ingics.com>
Signed-off-by: Guenter Roeck <***@roeck-us.net>
Reviewed-by: Jean Delvare <***@suse.de>
Signed-off-by: Jiri Slaby <***@suse.cz>
---
drivers/hwmon/smsc47m192.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/hwmon/smsc47m192.c b/drivers/hwmon/smsc47m192.c
index efee4c59239f..34b9a601ad07 100644
--- a/drivers/hwmon/smsc47m192.c
+++ b/drivers/hwmon/smsc47m192.c
@@ -86,7 +86,7 @@ static inline u8 IN_TO_REG(unsigned long val, int n)
*/
static inline s8 TEMP_TO_REG(int val)
{
- return clamp_val(SCALE(val, 1, 1000), -128000, 127000);
+ return SCALE(clamp_val(val, -128000, 127000), 1, 1000);
}

static inline int TEMP_FROM_REG(s8 val)
@@ -384,6 +384,8 @@ static ssize_t set_vrm(struct device *dev, struct device_attribute *attr,
err = kstrtoul(buf, 10, &val);
if (err)
return err;
+ if (val > 255)
+ return -EINVAL;

data->vrm = val;
return count;
--
2.0.1
Jiri Slaby
2014-07-30 12:15:22 UTC
Permalink
=46rom: Linus Torvalds <***@linux-foundation.org>

3.12-stable review patch. If anyone has any objections, please let me =
know.

=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D

commit 2062afb4f804afef61cbe62a30cac9a46e58e067 upstream.

Michel D=C3=A4nzer and a couple of other people reported inexplicable r=
andom
oopses in the scheduler, and the cause turns out to be gcc mis-compilin=
g
the load_balance() function when debugging is enabled. The gcc bug
apparently goes back to gcc-4.5, but slight optimization changes means
that it now showed up as a problem in 4.9.0 and 4.9.1.

The instruction scheduling problem causes gcc to schedule a spill
operation to before the stack frame has been created, which in turn can
corrupt the spilled value if an interrupt comes in. There may be other
effects of this bug too, but that's the code generation problem seen in
Michel's case.

This is fixed in current gcc HEAD, but the workaround as suggested by
Markus Trippelsdorf is pretty simple: use -fno-var-tracking-assignments
when compiling the kernel, which disables the gcc code that causes the
problem. This can result in slightly worse debug information for
variable accesses, but that is infinitely preferable to actual code
generation problems.

Doing this unconditionally (not just for CONFIG_DEBUG_INFO) also allows
non-debug builds to verify that the debug build would be identical: we
can do

export GCC_COMPARE_DEBUG=3D1

to make gcc internally verify that the result of the build is
independent of the "-g" flag (it will make the compiler build everythin=
g
twice, toggling the debug flag, and compare the results).

Without the "-fno-var-tracking-assignments" option, the build would fai=
l
(even with 4.8.3 that didn't show the actual stack frame bug) with a gc=
c
compare failure.

See also gcc bugzilla:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=3D61801

Reported-by: Michel D=C3=A4nzer <***@daenzer.net>
Suggested-by: Markus Trippelsdorf <***@trippelsdorf.de>
Cc: Jakub Jelinek <***@redhat.com>
Signed-off-by: Linus Torvalds <***@linux-foundation.org>
Signed-off-by: Jiri Slaby <***@suse.cz>
---
Makefile | 2 ++
1 file changed, 2 insertions(+)

diff --git a/Makefile b/Makefile
index 4d25b56bf81c..c1a2ffa1e010 100644
--- a/Makefile
+++ b/Makefile
@@ -614,6 +614,8 @@ KBUILD_CFLAGS +=3D -fomit-frame-pointer
endif
endif
=20
+KBUILD_CFLAGS +=3D $(call cc-option, -fno-var-tracking-assignments)
+
ifdef CONFIG_DEBUG_INFO
KBUILD_CFLAGS +=3D -g
KBUILD_AFLAGS +=3D -gdwarf-2
--=20
2.0.1
Jiri Slaby
2014-07-30 12:14:41 UTC
Permalink
From: Jon Paul Maloy <***@ericsson.com>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

[ Upstream commit 999417549c16dd0e3a382aa9f6ae61688db03181 ]

If the 'next' pointer of the last fragment buffer in a message is not
zeroed before reassembly, we risk ending up with a corrupt message,
since the reassembly function itself isn't doing this.

Currently, when a buffer is retrieved from the deferred queue of the
broadcast link, the next pointer is not cleared, with the result as
described above.

This commit corrects this, and thereby fixes a bug that may occur when
long broadcast messages are transmitted across dual interfaces. The bug
has been present since 40ba3cdf542a469aaa9083fa041656e59b109b90 ("tipc:
message reassembly using fragment chain")

This commit should be applied to both net and net-next.

Signed-off-by: Jon Maloy <***@ericsson.com>
Signed-off-by: David S. Miller <***@davemloft.net>
Signed-off-by: Jiri Slaby <***@suse.cz>
---
net/tipc/bcast.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/net/tipc/bcast.c b/net/tipc/bcast.c
index 716de1ac6cb5..6ef89256b2fb 100644
--- a/net/tipc/bcast.c
+++ b/net/tipc/bcast.c
@@ -531,6 +531,7 @@ receive:

buf = node->bclink.deferred_head;
node->bclink.deferred_head = buf->next;
+ buf->next = NULL;
node->bclink.deferred_size--;
goto receive;
}
--
2.0.1
Jiri Slaby
2014-07-30 12:15:12 UTC
Permalink
From: Tejun Heo <***@kernel.org>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 0b462c89e31f7eb6789713437eb551833ee16ff3 upstream.

While a queue is being destroyed, all the blkgs are destroyed and its
->root_blkg pointer is set to NULL. If someone else starts to drain
while the queue is in this state, the following oops happens.

NULL pointer dereference at 0000000000000028
IP: [<ffffffff8144e944>] blk_throtl_drain+0x84/0x230
PGD e4a1067 PUD b773067 PMD 0
Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
Modules linked in: cfq_iosched(-) [last unloaded: cfq_iosched]
CPU: 1 PID: 537 Comm: bash Not tainted 3.16.0-rc3-work+ #2
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
task: ffff88000e222250 ti: ffff88000efd4000 task.ti: ffff88000efd4000
RIP: 0010:[<ffffffff8144e944>] [<ffffffff8144e944>] blk_throtl_drain+0x84/0x230
RSP: 0018:ffff88000efd7bf0 EFLAGS: 00010046
RAX: 0000000000000000 RBX: ffff880015091450 RCX: 0000000000000001
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffff88000efd7c10 R08: 0000000000000000 R09: 0000000000000001
R10: ffff88000e222250 R11: 0000000000000000 R12: ffff880015091450
R13: ffff880015092e00 R14: ffff880015091d70 R15: ffff88001508fc28
FS: 00007f1332650740(0000) GS:ffff88001fa80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000028 CR3: 0000000009446000 CR4: 00000000000006e0
Stack:
ffffffff8144e8f6 ffff880015091450 0000000000000000 ffff880015091d80
ffff88000efd7c28 ffffffff8144ae2f ffff880015091450 ffff88000efd7c58
ffffffff81427641 ffff880015091450 ffffffff82401f00 ffff880015091450
Call Trace:
[<ffffffff8144ae2f>] blkcg_drain_queue+0x1f/0x60
[<ffffffff81427641>] __blk_drain_queue+0x71/0x180
[<ffffffff81429b3e>] blk_queue_bypass_start+0x6e/0xb0
[<ffffffff814498b8>] blkcg_deactivate_policy+0x38/0x120
[<ffffffff8144ec44>] blk_throtl_exit+0x34/0x50
[<ffffffff8144aea5>] blkcg_exit_queue+0x35/0x40
[<ffffffff8142d476>] blk_release_queue+0x26/0xd0
[<ffffffff81454968>] kobject_cleanup+0x38/0x70
[<ffffffff81454848>] kobject_put+0x28/0x60
[<ffffffff81427505>] blk_put_queue+0x15/0x20
[<ffffffff817d07bb>] scsi_device_dev_release_usercontext+0x16b/0x1c0
[<ffffffff810bc339>] execute_in_process_context+0x89/0xa0
[<ffffffff817d064c>] scsi_device_dev_release+0x1c/0x20
[<ffffffff817930e2>] device_release+0x32/0xa0
[<ffffffff81454968>] kobject_cleanup+0x38/0x70
[<ffffffff81454848>] kobject_put+0x28/0x60
[<ffffffff817934d7>] put_device+0x17/0x20
[<ffffffff817d11b9>] __scsi_remove_device+0xa9/0xe0
[<ffffffff817d121b>] scsi_remove_device+0x2b/0x40
[<ffffffff817d1257>] sdev_store_delete+0x27/0x30
[<ffffffff81792ca8>] dev_attr_store+0x18/0x30
[<ffffffff8126f75e>] sysfs_kf_write+0x3e/0x50
[<ffffffff8126ea87>] kernfs_fop_write+0xe7/0x170
[<ffffffff811f5e9f>] vfs_write+0xaf/0x1d0
[<ffffffff811f69bd>] SyS_write+0x4d/0xc0
[<ffffffff81d24692>] system_call_fastpath+0x16/0x1b

776687bce42b ("block, blk-mq: draining can't be skipped even if
bypass_depth was non-zero") made it easier to trigger this bug by
making blk_queue_bypass_start() drain even when it loses the first
bypass test to blk_cleanup_queue(); however, the bug has always been
there even before the commit as blk_queue_bypass_start() could race
against queue destruction, win the initial bypass test but perform the
actual draining after blk_cleanup_queue() already destroyed all blkgs.

Fix it by skippping calling into policy draining if all the blkgs are
already gone.

Signed-off-by: Tejun Heo <***@kernel.org>
Reported-by: Shirish Pargaonkar <***@suse.com>
Reported-by: Sasha Levin <***@oracle.com>
Reported-by: Jet Chen <***@intel.com>
Tested-by: Shirish Pargaonkar <***@suse.com>
Signed-off-by: Jens Axboe <***@fb.com>
Signed-off-by: Jiri Slaby <***@suse.cz>
---
block/blk-cgroup.c | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index dd0dd2d4ceca..d8f80e733cf8 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -859,6 +859,13 @@ void blkcg_drain_queue(struct request_queue *q)
{
lockdep_assert_held(q->queue_lock);

+ /*
+ * @q could be exiting and already have destroyed all blkgs as
+ * indicated by NULL root_blkg. If so, don't confuse policies.
+ */
+ if (!q->root_blkg)
+ return;
+
blk_throtl_drain(q);
}
--
2.0.1
Jiri Slaby
2014-07-30 12:14:43 UTC
Permalink
From: Christoph Schulz <***@kristov.de>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

[ Upstream commit a8a3e41c67d24eb12f9ab9680cbb85e24fcd9711 ]

The PPP channel MTU is used with Multilink PPP when ppp_mp_explode() (see
ppp_generic module) tries to determine how big a fragment might be. According
to RFC 1661, the MTU excludes the 2-byte PPP protocol field, see the
corresponding comment and code in ppp_mp_explode():

/*
* hdrlen includes the 2-byte PPP protocol field, but the
* MTU counts only the payload excluding the protocol field.
* (RFC1661 Section 2)
*/
mtu = pch->chan->mtu - (hdrlen - 2);

However, the pppoe module *does* include the PPP protocol field in the channel
MTU, which is wrong as it causes the PPP payload to be 1-2 bytes too big under
certain circumstances (one byte if PPP protocol compression is used, two
otherwise), causing the generated Ethernet packets to be dropped. So the pppoe
module has to subtract two bytes from the channel MTU. This error only
manifests itself when using Multilink PPP, as otherwise the channel MTU is not
used anywhere.

In the following, I will describe how to reproduce this bug. We configure two
pppd instances for multilink PPP over two PPPoE links, say eth2 and eth3, with
a MTU of 1492 bytes for each link and a MRRU of 2976 bytes. (This MRRU is
computed by adding the two link MTUs and subtracting the MP header twice, which
is 4 bytes long.) The necessary pppd statements on both sides are "multilink
mtu 1492 mru 1492 mrru 2976". On the client side, we additionally need "plugin
rp-pppoe.so eth2" and "plugin rp-pppoe.so eth3", respectively; on the server
side, we additionally need to start two pppoe-server instances to be able to
establish two PPPoE sessions, one over eth2 and one over eth3. We set the MTU
of the PPP network interface to the MRRU (2976) on both sides of the connection
in order to make use of the higher bandwidth. (If we didn't do that, IP
fragmentation would kick in, which we want to avoid.)

Now we send a ICMPv4 echo request with a payload of 2948 bytes from client to
server over the PPP link. This results in the following network packet:

2948 (echo payload)
+ 8 (ICMPv4 header)
+ 20 (IPv4 header)
---------------------
2976 (PPP payload)

These 2976 bytes do not exceed the MTU of the PPP network interface, so the
IP packet is not fragmented. Now the multilink PPP code in ppp_mp_explode()
prepends one protocol byte (0x21 for IPv4), making the packet one byte bigger
than the negotiated MRRU. So this packet would have to be divided in three
fragments. But this does not happen as each link MTU is assumed to be two bytes
larger. So this packet is diveded into two fragments only, one of size 1489 and
one of size 1488. Now we have for that bigger fragment:

1489 (PPP payload)
+ 4 (MP header)
+ 2 (PPP protocol field for the MP payload (0x3d))
+ 6 (PPPoE header)
--------------------------
1501 (Ethernet payload)

This packet exceeds the link MTU and is discarded.

If one configures the link MTU on the client side to 1501, one can see the
discarded Ethernet frames with tcpdump running on the client. A

ping -s 2948 -c 1 192.168.15.254

leads to the smaller fragment that is correctly received on the server side:

(tcpdump -vvvne -i eth3 pppoes and ppp proto 0x3d)
52:54:00:ad:87:fd > 52:54:00:79:5c:d0, ethertype PPPoE S (0x8864),
length 1514: PPPoE [ses 0x3] MLPPP (0x003d), length 1494: seq 0x000,
Flags [end], length 1492

and to the bigger fragment that is not received on the server side:

(tcpdump -vvvne -i eth2 pppoes and ppp proto 0x3d)
52:54:00:70:9e:89 > 52:54:00:5d:6f:b0, ethertype PPPoE S (0x8864),
length 1515: PPPoE [ses 0x5] MLPPP (0x003d), length 1495: seq 0x000,
Flags [begin], length 1493

With the patch below, we correctly obtain three fragments:

52:54:00:ad:87:fd > 52:54:00:79:5c:d0, ethertype PPPoE S (0x8864),
length 1514: PPPoE [ses 0x1] MLPPP (0x003d), length 1494: seq 0x000,
Flags [begin], length 1492
52:54:00:70:9e:89 > 52:54:00:5d:6f:b0, ethertype PPPoE S (0x8864),
length 1514: PPPoE [ses 0x1] MLPPP (0x003d), length 1494: seq 0x000,
Flags [none], length 1492
52:54:00:ad:87:fd > 52:54:00:79:5c:d0, ethertype PPPoE S (0x8864),
length 27: PPPoE [ses 0x1] MLPPP (0x003d), length 7: seq 0x000,
Flags [end], length 5

And the ICMPv4 echo request is successfully received at the server side:

IP (tos 0x0, ttl 64, id 21925, offset 0, flags [DF], proto ICMP (1),
length 2976)
192.168.222.2 > 192.168.15.254: ICMP echo request, id 30530, seq 0,
length 2956

The bug was introduced in commit c9aa6895371b2a257401f59d3393c9f7ac5a8698
("[PPPOE]: Advertise PPPoE MTU") from the very beginning. This patch applies
to 3.10 upwards but the fix can be applied (with minor modifications) to
kernels as old as 2.6.32.

Signed-off-by: Christoph Schulz <***@kristov.de>
Signed-off-by: David S. Miller <***@davemloft.net>
Signed-off-by: Jiri Slaby <***@suse.cz>
---
drivers/net/ppp/pppoe.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ppp/pppoe.c b/drivers/net/ppp/pppoe.c
index 82ee6ed954cb..addd23246eb6 100644
--- a/drivers/net/ppp/pppoe.c
+++ b/drivers/net/ppp/pppoe.c
@@ -675,7 +675,7 @@ static int pppoe_connect(struct socket *sock, struct sockaddr *uservaddr,
po->chan.hdrlen = (sizeof(struct pppoe_hdr) +
dev->hard_header_len);

- po->chan.mtu = dev->mtu - sizeof(struct pppoe_hdr);
+ po->chan.mtu = dev->mtu - sizeof(struct pppoe_hdr) - 2;
po->chan.private = sk;
po->chan.ops = &pppoe_chan_ops;
--
2.0.1
Jiri Slaby
2014-07-30 12:15:06 UTC
Permalink
From: Hans Verkuil <***@xs4all.nl>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 3445857b22eafb70a6ac258979e955b116bfd2c6 upstream.

When the audio encoding is changed the driver calls hdpvr_set_audio
with the current opt->audio_input value. However, that should have
been opt->audio_input + 1. So changing the audio encoding inadvertently
changes the input as well. This bug has always been there.

The second bug was introduced in kernel 3.10 and that broke the
default_audio_input module option handling: the audio encoding was
never switched to AC3 if default_audio_input was set to 2 (SPDIF input).

In addition, since starting with 3.10 the audio encoding is always set
at the start the first bug now always happens when the driver is loaded.
In the past this bug would only surface if the user would change the
audio encoding after the driver was loaded.

Also fixes a small trivial typo (bufffer -> buffer).

Signed-off-by: Hans Verkuil <***@cisco.com>
Reported-by: Scott Doty <***@corp.sonic.net>
Signed-off-by: Mauro Carvalho Chehab <***@samsung.com>
Signed-off-by: Jiri Slaby <***@suse.cz>
---
drivers/media/usb/hdpvr/hdpvr-video.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/media/usb/hdpvr/hdpvr-video.c b/drivers/media/usb/hdpvr/hdpvr-video.c
index 0500c4175d5f..6bce01a674f9 100644
--- a/drivers/media/usb/hdpvr/hdpvr-video.c
+++ b/drivers/media/usb/hdpvr/hdpvr-video.c
@@ -82,7 +82,7 @@ static void hdpvr_read_bulk_callback(struct urb *urb)
}

/*=========================================================================*/
-/* bufffer bits */
+/* buffer bits */

/* function expects dev->io_mutex to be hold by caller */
int hdpvr_cancel_queue(struct hdpvr_device *dev)
@@ -926,7 +926,7 @@ static int hdpvr_s_ctrl(struct v4l2_ctrl *ctrl)
case V4L2_CID_MPEG_AUDIO_ENCODING:
if (dev->flags & HDPVR_FLAG_AC3_CAP) {
opt->audio_codec = ctrl->val;
- return hdpvr_set_audio(dev, opt->audio_input,
+ return hdpvr_set_audio(dev, opt->audio_input + 1,
opt->audio_codec);
}
return 0;
@@ -1198,7 +1198,7 @@ int hdpvr_register_videodev(struct hdpvr_device *dev, struct device *parent,
v4l2_ctrl_new_std_menu(hdl, &hdpvr_ctrl_ops,
V4L2_CID_MPEG_AUDIO_ENCODING,
ac3 ? V4L2_MPEG_AUDIO_ENCODING_AC3 : V4L2_MPEG_AUDIO_ENCODING_AAC,
- 0x7, V4L2_MPEG_AUDIO_ENCODING_AAC);
+ 0x7, ac3 ? dev->options.audio_codec : V4L2_MPEG_AUDIO_ENCODING_AAC);
v4l2_ctrl_new_std_menu(hdl, &hdpvr_ctrl_ops,
V4L2_CID_MPEG_VIDEO_ENCODING,
V4L2_MPEG_VIDEO_ENCODING_MPEG_4_AVC, 0x3,
--
2.0.1
Jiri Slaby
2014-07-30 12:15:05 UTC
Permalink
From: Rickard Strandqvist <***@spectrumdigital.se>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit f71920efb1066d71d74811e1dbed658173adf9bf upstream.

Wrong value used in same cases for the aspect ratio.

Signed-off-by: Rickard Strandqvist <***@spectrumdigital.se>
Acked-by: Lad, Prabhakar <***@gmail.com>
Signed-off-by: Hans Verkuil <***@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <***@samsung.com>
Signed-off-by: Jiri Slaby <***@suse.cz>
---
drivers/media/v4l2-core/v4l2-dv-timings.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/media/v4l2-core/v4l2-dv-timings.c b/drivers/media/v4l2-core/v4l2-dv-timings.c
index c0895f88ce9c..9f2ac588661b 100644
--- a/drivers/media/v4l2-core/v4l2-dv-timings.c
+++ b/drivers/media/v4l2-core/v4l2-dv-timings.c
@@ -594,10 +594,10 @@ struct v4l2_fract v4l2_calc_aspect_ratio(u8 hor_landscape, u8 vert_portrait)
aspect.denominator = 9;
} else if (ratio == 34) {
aspect.numerator = 4;
- aspect.numerator = 3;
+ aspect.denominator = 3;
} else if (ratio == 68) {
aspect.numerator = 15;
- aspect.numerator = 9;
+ aspect.denominator = 9;
} else {
aspect.numerator = hor_landscape + 99;
aspect.denominator = 100;
--
2.0.1
Jiri Slaby
2014-07-30 12:15:11 UTC
Permalink
From: Romain Degez <***@gmail.com>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit b32bfc06aefab61acc872dec3222624e6cd867ed upstream.

Add support of the Promise FastTrak TX8660 SATA HBA in ahci mode by
registering the board in the ahci_pci_tbl[].

Note: this HBA also provide a hardware RAID mode when activated in
BIOS but specific drivers from the manufacturer are required in this
case.

Signed-off-by: Romain Degez <***@gmail.com>
Tested-by: Romain Degez <***@gmail.com>
Signed-off-by: Tejun Heo <***@kernel.org>
Signed-off-by: Jiri Slaby <***@suse.cz>
---
drivers/ata/ahci.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c
index 5421a820ec7d..efa328bf6724 100644
--- a/drivers/ata/ahci.c
+++ b/drivers/ata/ahci.c
@@ -455,6 +455,7 @@ static const struct pci_device_id ahci_pci_tbl[] = {

/* Promise */
{ PCI_VDEVICE(PROMISE, 0x3f20), board_ahci }, /* PDC42819 */
+ { PCI_VDEVICE(PROMISE, 0x3781), board_ahci }, /* FastTrak TX8660 ahci-mode */

/* Asmedia */
{ PCI_VDEVICE(ASMEDIA, 0x0601), board_ahci }, /* ASM1060 */
--
2.0.1
Jiri Slaby
2014-07-30 12:14:40 UTC
Permalink
From: Suresh Reddy <***@emulex.com>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

[ Upstream commit 4cad9f3b61c7268fa89ab8096e23202300399b5d ]

On BE3, if the clear-interrupt bit of the EQ doorbell is not set the first
time it is armed, ocassionally we have observed that the EQ doesn't raise
anymore interrupts even if it is in armed state.
This patch fixes this by setting the clear-interrupt bit when EQs are
armed for the first time in be_open().

Signed-off-by: Suresh Reddy <***@emulex.com>
Signed-off-by: Sathya Perla <***@emulex.com>
Signed-off-by: David S. Miller <***@davemloft.net>
Signed-off-by: Jiri Slaby <***@suse.cz>
---
drivers/net/ethernet/emulex/benet/be_main.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/emulex/benet/be_main.c b/drivers/net/ethernet/emulex/benet/be_main.c
index 2c38cc402119..5226c99813c7 100644
--- a/drivers/net/ethernet/emulex/benet/be_main.c
+++ b/drivers/net/ethernet/emulex/benet/be_main.c
@@ -2632,7 +2632,7 @@ static int be_open(struct net_device *netdev)

for_all_evt_queues(adapter, eqo, i) {
napi_enable(&eqo->napi);
- be_eq_notify(adapter, eqo->q.id, true, false, 0);
+ be_eq_notify(adapter, eqo->q.id, true, true, 0);
}
adapter->flags |= BE_FLAGS_NAPI_ENABLED;
--
2.0.1
Jiri Slaby
2014-07-30 12:15:10 UTC
Permalink
From: Kevin Hao <***@gmail.com>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 1871ee134b73fb4cadab75752a7152ed2813c751 upstream.

The sata on fsl mpc8315e is broken after the commit 8a4aeec8d2d6
("libata/ahci: accommodate tag ordered controllers"). The reason is
that the ata controller on this SoC only implement a queue depth of
16. When issuing the commands in tag order, all the commands in tag
16 ~ 31 are mapped to tag 0 unconditionally and then causes the sata
malfunction. It makes no senses to use a 32 queue in software while
the hardware has less queue depth. So consider the queue depth
implemented by the hardware when requesting a command tag.

Fixes: 8a4aeec8d2d6 ("libata/ahci: accommodate tag ordered controllers")
Signed-off-by: Kevin Hao <***@gmail.com>
Acked-by: Dan Williams <***@intel.com>
Signed-off-by: Tejun Heo <***@kernel.org>
Signed-off-by: Jiri Slaby <***@suse.cz>
---
drivers/ata/libata-core.c | 22 +++++++++++++++++++---
1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index d2eb9df3da3d..97b5e01a6814 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -4787,6 +4787,10 @@ void swap_buf_le16(u16 *buf, unsigned int buf_words)
* ata_qc_new - Request an available ATA command, for queueing
* @ap: target port
*
+ * Some ATA host controllers may implement a queue depth which is less
+ * than ATA_MAX_QUEUE. So we shouldn't allocate a tag which is beyond
+ * the hardware limitation.
+ *
* LOCKING:
* None.
*/
@@ -4794,14 +4798,16 @@ void swap_buf_le16(u16 *buf, unsigned int buf_words)
static struct ata_queued_cmd *ata_qc_new(struct ata_port *ap)
{
struct ata_queued_cmd *qc = NULL;
- unsigned int i, tag;
+ unsigned int i, tag, max_queue;
+
+ max_queue = ap->scsi_host->can_queue;

/* no command while frozen */
if (unlikely(ap->pflags & ATA_PFLAG_FROZEN))
return NULL;

- for (i = 0; i < ATA_MAX_QUEUE; i++) {
- tag = (i + ap->last_tag + 1) % ATA_MAX_QUEUE;
+ for (i = 0, tag = ap->last_tag + 1; i < max_queue; i++, tag++) {
+ tag = tag < max_queue ? tag : 0;

/* the last tag is reserved for internal command. */
if (tag == ATA_TAG_INTERNAL)
@@ -6184,6 +6190,16 @@ int ata_host_register(struct ata_host *host, struct scsi_host_template *sht)
{
int i, rc;

+ /*
+ * The max queue supported by hardware must not be greater than
+ * ATA_MAX_QUEUE.
+ */
+ if (sht->can_queue > ATA_MAX_QUEUE) {
+ dev_err(host->dev, "BUG: the hardware max queue is too large\n");
+ WARN_ON(1);
+ return -EINVAL;
+ }
+
/* host must have been started */
if (!(host->flags & ATA_HOST_STARTED)) {
dev_err(host->dev, "BUG: trying to register unstarted host\n");
--
2.0.1
Thomas Backlund
2014-07-30 14:37:06 UTC
Permalink
Post by Jiri Slaby
3.12-stable review patch. If anyone has any objections, please let me know.
===============
commit 1871ee134b73fb4cadab75752a7152ed2813c751 upstream.
The sata on fsl mpc8315e is broken after the commit 8a4aeec8d2d6
("libata/ahci: accommodate tag ordered controllers"). The reason is
that the ata controller on this SoC only implement a queue depth of
16. When issuing the commands in tag order, all the commands in tag
16 ~ 31 are mapped to tag 0 unconditionally and then causes the sata
malfunction. It makes no senses to use a 32 queue in software while
the hardware has less queue depth. So consider the queue depth
implemented by the hardware when requesting a command tag.
Fixes: 8a4aeec8d2d6 ("libata/ahci: accommodate tag ordered controllers")
As you have added this to 3.12 branch, you also need to add this to
avoid SAS breakage:

commit 1a112d10f03e83fb3a2fdc4c9165865dec8a3ca6
Author: Tejun Heo <***@kernel.org>
Date: Wed Jul 23 09:05:27 2014 -0400

libata: introduce ata_host->n_tags to avoid oops on SAS controllers

1871ee134b73 ("libata: support the ata host which implements a queue
depth less than 32") directly used ata_port->scsi_host->can_queue from
ata_qc_new() to determine the number of tags supported by the host;
unfortunately, SAS controllers doing SATA don't initialize ->scsi_host
leading to the following oops.

--

Thomas

Jiri Slaby
2014-07-30 12:14:59 UTC
Permalink
From: Takashi Iwai <***@suse.de>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 4320f6b1d9db4ca912c5eb6ecb328b2e090e1586 upstream.

The commit [247bc037: PM / Sleep: Mitigate race between the freezer
and request_firmware()] introduced the finer state control, but it
also leads to a new bug; for example, a bug report regarding the
firmware loading of intel BT device at suspend/resume:
https://bugzilla.novell.com/show_bug.cgi?id=873790

The root cause seems to be a small window between the process resume
and the clear of usermodehelper lock. The request_firmware() function
checks the UMH lock and gives up when it's in UMH_DISABLE state. This
is for avoiding the invalid f/w loading during suspend/resume phase.
The problem is, however, that usermodehelper_enable() is called at the
end of thaw_processes(). Thus, a thawed process in between can kick
off the f/w loader code path (in this case, via btusb_setup_intel())
even before the call of usermodehelper_enable(). Then
usermodehelper_read_trylock() returns an error and request_firmware()
spews WARN_ON() in the end.

This oneliner patch fixes the issue just by setting to UMH_FREEZING
state again before restarting tasks, so that the call of
request_firmware() will be blocked until the end of this function
instead of returning an error.

Fixes: 247bc0374254 (PM / Sleep: Mitigate race between the freezer and request_firmware())
Link: https://bugzilla.novell.com/show_bug.cgi?id=873790
Signed-off-by: Takashi Iwai <***@suse.de>
Signed-off-by: Rafael J. Wysocki <***@intel.com>
Signed-off-by: Jiri Slaby <***@suse.cz>
---
kernel/power/process.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/kernel/power/process.c b/kernel/power/process.c
index 06ec8869dbf1..14f9a8d4725d 100644
--- a/kernel/power/process.c
+++ b/kernel/power/process.c
@@ -184,6 +184,7 @@ void thaw_processes(void)

printk("Restarting tasks ... ");

+ __usermodehelper_set_disable_depth(UMH_FREEZING);
thaw_workqueues();

read_lock(&tasklist_lock);
--
2.0.1
Jiri Slaby
2014-07-30 12:14:38 UTC
Permalink
From: Thomas Petazzoni <***@free-electrons.com>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

[ Upstream commit 4d12bc63ab5e48c1d78fa13883cf6fefcea3afb1 ]

As reported by Maggie Mae Roxas, the mvneta driver doesn't behave
properly in 10 Mbit/s mode. This is due to a misconfiguration of the
MVNETA_GMAC_AUTONEG_CONFIG register: bit MVNETA_GMAC_CONFIG_MII_SPEED
must be set for a 100 Mbit/s speed, but cleared for a 10 Mbit/s speed,
which the driver was not properly doing. This commit adjusts that by
setting the MVNETA_GMAC_CONFIG_MII_SPEED bit only in 100 Mbit/s mode,
and relying on the fact that all the speed related bits of this
register are cleared at the beginning of the mvneta_adjust_link()
function.

This problem exists since c5aff18204da0 ("net: mvneta: driver for
Marvell Armada 370/XP network unit") which is the commit that
introduced the mvneta driver in the kernel.

Cc: <***@vger.kernel.org> # v3.8+
Fixes: c5aff18204da0 ("net: mvneta: driver for Marvell Armada 370/XP network unit")
Reported-by: Maggie Mae Roxas <***@gmail.com>
Cc: Maggie Mae Roxas <***@gmail.com>
Signed-off-by: Thomas Petazzoni <***@free-electrons.com>
Signed-off-by: David S. Miller <***@davemloft.net>
Signed-off-by: Jiri Slaby <***@suse.cz>
---
drivers/net/ethernet/marvell/mvneta.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index 5cdd2b2f18c5..fabdda91fd0e 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -2358,7 +2358,7 @@ static void mvneta_adjust_link(struct net_device *ndev)

if (phydev->speed == SPEED_1000)
val |= MVNETA_GMAC_CONFIG_GMII_SPEED;
- else
+ else if (phydev->speed == SPEED_100)
val |= MVNETA_GMAC_CONFIG_MII_SPEED;

mvreg_write(pp, MVNETA_GMAC_AUTONEG_CONFIG, val);
--
2.0.1
Jiri Slaby
2014-07-30 12:14:57 UTC
Permalink
From: Mike Snitzer <***@redhat.com>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 9aec8629ec829fc9403788cd959e05dd87988bd1 upstream.

The block size for the thin-pool's data device must remained fixed for
the life of the thin-pool. Disallow any attempt to change the
thin-pool's data block size.

It should be noted that attempting to change the data block size via
thin-pool table reload will be ignored as a side-effect of the thin-pool
handover that the thin-pool target does during thin-pool table reload.

Here is an example outcome of attempting to load a thin-pool table that
reduced the thin-pool's data block size from 1024K to 512K.

Before:
kernel: device-mapper: thin: 253:4: growing the data device from 204800 to 409600 blocks

After:
kernel: device-mapper: thin metadata: changing the data block size (from 2048 to 1024) is not supported
kernel: device-mapper: table: 253:4: thin-pool: Error creating metadata object
kernel: device-mapper: ioctl: error adding target to table

Signed-off-by: Mike Snitzer <***@redhat.com>
Acked-by: Joe Thornber <***@redhat.com>
Signed-off-by: Jiri Slaby <***@suse.cz>
---
drivers/md/dm-thin-metadata.c | 9 +++++++++
1 file changed, 9 insertions(+)

diff --git a/drivers/md/dm-thin-metadata.c b/drivers/md/dm-thin-metadata.c
index 07a6ea3a9820..b63095c73b5f 100644
--- a/drivers/md/dm-thin-metadata.c
+++ b/drivers/md/dm-thin-metadata.c
@@ -613,6 +613,15 @@ static int __open_metadata(struct dm_pool_metadata *pmd)

disk_super = dm_block_data(sblock);

+ /* Verify the data block size hasn't changed */
+ if (le32_to_cpu(disk_super->data_block_size) != pmd->data_block_size) {
+ DMERR("changing the data block size (from %u to %llu) is not supported",
+ le32_to_cpu(disk_super->data_block_size),
+ (unsigned long long)pmd->data_block_size);
+ r = -EINVAL;
+ goto bad_unlock_sblock;
+ }
+
r = __check_incompat_features(disk_super, pmd);
if (r < 0)
goto bad_unlock_sblock;
--
2.0.1
Jiri Slaby
2014-07-30 12:14:44 UTC
Permalink
From: Sowmini Varadhan <***@oracle.com>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

[ Upstream commit a4b70a07ed12a71131cab7adce2ce91c71b37060 ]

Nothing cleans up the objects created by
vnet_new(), they are completely leaked.

vnet_exit(), after doing the vio_unregister_driver() to clean
up ports, should call a helper function that iterates over vnet_list
and cleans up those objects. This includes unregister_netdevice()
as well as free_netdev().

Signed-off-by: Sowmini Varadhan <***@oracle.com>
Acked-by: Dave Kleikamp <***@oracle.com>
Reviewed-by: Karl Volz <***@oracle.com>
Signed-off-by: David S. Miller <***@davemloft.net>
Signed-off-by: Jiri Slaby <***@suse.cz>
---
drivers/net/ethernet/sun/sunvnet.c | 20 +++++++++++++++++++-
1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/sun/sunvnet.c b/drivers/net/ethernet/sun/sunvnet.c
index 3df56840a3b9..398faff8be7a 100644
--- a/drivers/net/ethernet/sun/sunvnet.c
+++ b/drivers/net/ethernet/sun/sunvnet.c
@@ -1083,6 +1083,24 @@ static struct vnet *vnet_find_or_create(const u64 *local_mac)
return vp;
}

+static void vnet_cleanup(void)
+{
+ struct vnet *vp;
+ struct net_device *dev;
+
+ mutex_lock(&vnet_list_mutex);
+ while (!list_empty(&vnet_list)) {
+ vp = list_first_entry(&vnet_list, struct vnet, list);
+ list_del(&vp->list);
+ dev = vp->dev;
+ /* vio_unregister_driver() should have cleaned up port_list */
+ BUG_ON(!list_empty(&vp->port_list));
+ unregister_netdev(dev);
+ free_netdev(dev);
+ }
+ mutex_unlock(&vnet_list_mutex);
+}
+
static const char *local_mac_prop = "local-mac-address";

static struct vnet *vnet_find_parent(struct mdesc_handle *hp,
@@ -1240,7 +1258,6 @@ static int vnet_port_remove(struct vio_dev *vdev)

kfree(port);

- unregister_netdev(vp->dev);
}
return 0;
}
@@ -1268,6 +1285,7 @@ static int __init vnet_init(void)
static void __exit vnet_exit(void)
{
vio_unregister_driver(&vnet_port_driver);
+ vnet_cleanup();
}

module_init(vnet_init);
--
2.0.1
Jiri Slaby
2014-07-30 12:14:33 UTC
Permalink
=46rom: Bj=C3=B8rn Mork <***@mork.no>

3.12-stable review patch. If anyone has any objections, please let me =
know.

=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D

[ Upstream commit 5343330010a892b76a97fd93ad3c455a4a32a7fb ]

Add two device IDs found in an out-of-tree driver downloadable
from Netgear.

Signed-off-by: Bj=C3=B8rn Mork <***@mork.no>
Signed-off-by: David S. Miller <***@davemloft.net>
Signed-off-by: Jiri Slaby <***@suse.cz>
---
drivers/net/usb/qmi_wwan.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c
index 7b0bbc395e83..2d8bf4232502 100644
--- a/drivers/net/usb/qmi_wwan.c
+++ b/drivers/net/usb/qmi_wwan.c
@@ -647,6 +647,7 @@ static const struct usb_device_id products[] =3D {
{QMI_FIXED_INTF(0x05c6, 0x9084, 4)},
{QMI_FIXED_INTF(0x05c6, 0x920d, 0)},
{QMI_FIXED_INTF(0x05c6, 0x920d, 5)},
+ {QMI_FIXED_INTF(0x0846, 0x68a2, 8)},
{QMI_FIXED_INTF(0x12d1, 0x140c, 1)}, /* Huawei E173 */
{QMI_FIXED_INTF(0x12d1, 0x14ac, 1)}, /* Huawei E1820 */
{QMI_FIXED_INTF(0x16d8, 0x6003, 0)}, /* CMOTech 6003 */
@@ -734,6 +735,7 @@ static const struct usb_device_id products[] =3D {
{QMI_FIXED_INTF(0x1199, 0x901f, 8)}, /* Sierra Wireless EM7355 */
{QMI_FIXED_INTF(0x1199, 0x9041, 8)}, /* Sierra Wireless MC7305/MC7355=
*/
{QMI_FIXED_INTF(0x1199, 0x9051, 8)}, /* Netgear AirCard 340U */
+ {QMI_FIXED_INTF(0x1199, 0x9057, 8)},
{QMI_FIXED_INTF(0x1bbb, 0x011e, 4)}, /* Telekom Speedstick LTE II (Al=
catel One Touch L100V LTE) */
{QMI_FIXED_INTF(0x1bbb, 0x0203, 2)}, /* Alcatel L800MA */
{QMI_FIXED_INTF(0x2357, 0x0201, 4)}, /* TP-LINK HSUPA Modem MA180 */
--=20
2.0.1
Jiri Slaby
2014-07-30 12:14:56 UTC
Permalink
From: Ted Juan <***@gmail.com>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 6938ad40cb97a52d88a763008935340729a4acc7 upstream.

These two function's switch case lack the 'break' that make them always
return error.

Signed-off-by: Ted Juan <***@gmail.com>
Acked-by: Pekon Gupta <***@ti.com>
Signed-off-by: Brian Norris <***@gmail.com>
Signed-off-by: Jiri Slaby <***@suse.cz>
---
drivers/mtd/devices/elm.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/drivers/mtd/devices/elm.c b/drivers/mtd/devices/elm.c
index d1dd6a33a050..3059a7a53bff 100644
--- a/drivers/mtd/devices/elm.c
+++ b/drivers/mtd/devices/elm.c
@@ -428,6 +428,7 @@ static int elm_context_save(struct elm_info *info)
ELM_SYNDROME_FRAGMENT_1 + offset);
regs->elm_syndrome_fragment_0[i] = elm_read_reg(info,
ELM_SYNDROME_FRAGMENT_0 + offset);
+ break;
default:
return -EINVAL;
}
@@ -466,6 +467,7 @@ static int elm_context_restore(struct elm_info *info)
regs->elm_syndrome_fragment_1[i]);
elm_write_reg(info, ELM_SYNDROME_FRAGMENT_0 + offset,
regs->elm_syndrome_fragment_0[i]);
+ break;
default:
return -EINVAL;
}
--
2.0.1
Jiri Slaby
2014-07-30 12:15:08 UTC
Permalink
From: Mikulas Patocka <***@redhat.com>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 3b3a1814d1703027f9867d0f5cbbfaf6c7482474 upstream.

This patch provides the compat BLKZEROOUT ioctl. The argument is a pointer
to two uint64_t values, so there is no need to translate it.

Signed-off-by: Mikulas Patocka <***@redhat.com>
Acked-by: Martin K. Petersen <***@oracle.com>
Signed-off-by: Jens Axboe <***@fb.com>
Signed-off-by: Jiri Slaby <***@suse.cz>
---
block/compat_ioctl.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/block/compat_ioctl.c b/block/compat_ioctl.c
index fbd5a67cb773..a0926a6094b2 100644
--- a/block/compat_ioctl.c
+++ b/block/compat_ioctl.c
@@ -690,6 +690,7 @@ long compat_blkdev_ioctl(struct file *file, unsigned cmd, unsigned long arg)
case BLKROSET:
case BLKDISCARD:
case BLKSECDISCARD:
+ case BLKZEROOUT:
/*
* the ones below are implemented in blkdev_locked_ioctl,
* but we call blkdev_ioctl, which gets the lock for us
--
2.0.1
Jiri Slaby
2014-07-30 12:14:29 UTC
Permalink
From: Eric Dumazet <***@google.com>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

[ Upstream commit 07b0f00964def8af9321cfd6c4a7e84f6362f728 ]

While it is legal to kfree(NULL), it is not wise to use :
put_page(virt_to_head_page(NULL))

BUG: unable to handle kernel paging request at ffffeba400000000
IP: [<ffffffffc01f5928>] virt_to_head_page+0x36/0x44 [bnx2x]

Reported-by: Michel Lespinasse <***@google.com>
Signed-off-by: Eric Dumazet <***@google.com>
Cc: Ariel Elior <***@qlogic.com>
Fixes: d46d132cc021 ("bnx2x: use netdev_alloc_frag()")
Signed-off-by: David S. Miller <***@davemloft.net>
Signed-off-by: Jiri Slaby <***@suse.cz>
---
drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
index 0399458e6d44..9846d3e712a1 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
@@ -755,7 +755,8 @@ static void bnx2x_tpa_stop(struct bnx2x *bp, struct bnx2x_fastpath *fp,

return;
}
- bnx2x_frag_free(fp, new_data);
+ if (new_data)
+ bnx2x_frag_free(fp, new_data);
drop:
/* drop the packet and keep the buffer in the bin */
DP(NETIF_MSG_RX_STATUS,
--
2.0.1
Jiri Slaby
2014-07-30 12:14:50 UTC
Permalink
From: Matthias Brugger <***@gmail.com>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit a97e8027b1d28eafe6bafe062556c1ec926a49c6 upstream.

Patch 0a68214b "ARM: DT: Add binding for GIC virtualization extentions (VGIC)" added
the "arm,cortex-a7-gic" compatible string, but the corresponding IRQCHIP_DECLARE
was never added to the gic driver.

To let real Cortex-A7 SoCs use it, add the necessary declaration to the device driver.

Signed-off-by: Matthias Brugger <***@gmail.com>
Link: https://lkml.kernel.org/r/1404388732-28890-1-git-send-email-***@gmail.com
Fixes: 0a68214b76ca ("ARM: DT: Add binding for GIC virtualization extentions (VGIC)")
Signed-off-by: Jason Cooper <***@lakedaemon.net>
Signed-off-by: Jiri Slaby <***@suse.cz>
---
drivers/irqchip/irq-gic.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/drivers/irqchip/irq-gic.c b/drivers/irqchip/irq-gic.c
index 86df97f6fd27..36fcc1b6264e 100644
--- a/drivers/irqchip/irq-gic.c
+++ b/drivers/irqchip/irq-gic.c
@@ -864,6 +864,7 @@ int __init gic_of_init(struct device_node *node, struct device_node *parent)
}
IRQCHIP_DECLARE(cortex_a15_gic, "arm,cortex-a15-gic", gic_of_init);
IRQCHIP_DECLARE(cortex_a9_gic, "arm,cortex-a9-gic", gic_of_init);
+IRQCHIP_DECLARE(cortex_a7_gic, "arm,cortex-a7-gic", gic_of_init);
IRQCHIP_DECLARE(msm_8660_qgic, "qcom,msm-8660-qgic", gic_of_init);
IRQCHIP_DECLARE(msm_qgic2, "qcom,msm-qgic2", gic_of_init);
--
2.0.1
Jiri Slaby
2014-07-30 12:14:54 UTC
Permalink
From: Alex Deucher <***@amd.com>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 0ac66effe7fcdee55bda6d5d10d3372c95a41920 upstream.

In some cases we fetch the edid in the detect() callback
in order to determine what sort of monitor is connected.
If that happens, don't fetch the edid again in the get_modes()
callback or we will leak the edid.

Signed-off-by: Alex Deucher <***@amd.com>
Signed-off-by: Jiri Slaby <***@suse.cz>
---
drivers/gpu/drm/radeon/radeon_display.c | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/radeon/radeon_display.c b/drivers/gpu/drm/radeon/radeon_display.c
index 0254a7596a55..9a19a0432f0f 100644
--- a/drivers/gpu/drm/radeon/radeon_display.c
+++ b/drivers/gpu/drm/radeon/radeon_display.c
@@ -708,6 +708,10 @@ int radeon_ddc_get_modes(struct radeon_connector *radeon_connector)
struct radeon_device *rdev = dev->dev_private;
int ret = 0;

+ /* don't leak the edid if we already fetched it in detect() */
+ if (radeon_connector->edid)
+ goto got_edid;
+
/* on hw with routers, select right port */
if (radeon_connector->router.ddc_valid)
radeon_router_select_ddc_port(radeon_connector);
@@ -747,6 +751,7 @@ int radeon_ddc_get_modes(struct radeon_connector *radeon_connector)
radeon_connector->edid = radeon_bios_get_hardcoded_edid(rdev);
}
if (radeon_connector->edid) {
+got_edid:
drm_mode_connector_update_edid_property(&radeon_connector->base, radeon_connector->edid);
ret = drm_add_edid_modes(&radeon_connector->base, radeon_connector->edid);
drm_edid_to_eld(&radeon_connector->base, radeon_connector->edid);
--
2.0.1
Jiri Slaby
2014-07-30 12:14:58 UTC
Permalink
From: Mike Snitzer <***@redhat.com>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 048e5a07f282c57815b3901d4a68a77fa131ce0a upstream.

The block size for the dm-cache's data device must remained fixed for
the life of the cache. Disallow any attempt to change the cache's data
block size.

Signed-off-by: Mike Snitzer <***@redhat.com>
Acked-by: Joe Thornber <***@redhat.com>
Signed-off-by: Jiri Slaby <***@suse.cz>
---
drivers/md/dm-cache-metadata.c | 9 +++++++++
1 file changed, 9 insertions(+)

diff --git a/drivers/md/dm-cache-metadata.c b/drivers/md/dm-cache-metadata.c
index 1d38019bb022..b564c0610259 100644
--- a/drivers/md/dm-cache-metadata.c
+++ b/drivers/md/dm-cache-metadata.c
@@ -407,6 +407,15 @@ static int __open_metadata(struct dm_cache_metadata *cmd)

disk_super = dm_block_data(sblock);

+ /* Verify the data block size hasn't changed */
+ if (le32_to_cpu(disk_super->data_block_size) != cmd->data_block_size) {
+ DMERR("changing the data block size (from %u to %llu) is not supported",
+ le32_to_cpu(disk_super->data_block_size),
+ (unsigned long long)cmd->data_block_size);
+ r = -EINVAL;
+ goto bad;
+ }
+
r = __check_incompat_features(disk_super, cmd);
if (r < 0)
goto bad;
--
2.0.1
Jiri Slaby
2014-07-30 12:14:39 UTC
Permalink
From: Ben Pfaff <***@nicira.com>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

[ Upstream commit ac30ef832e6af0505b6f0251a6659adcfa74975e ]

netlink_dump() returns a negative errno value on error. Until now,
netlink_recvmsg() directly recorded that negative value in sk->sk_err, but
that's wrong since sk_err takes positive errno values. (This manifests as
userspace receiving a positive return value from the recv() system call,
falsely indicating success.) This bug was introduced in the commit that
started checking the netlink_dump() return value, commit b44d211 (netlink:
handle errors from netlink_dump()).

Multithreaded Netlink dumps are one way to trigger this behavior in
practice, as described in the commit message for the userspace workaround
posted here:
http://openvswitch.org/pipermail/dev/2014-June/042339.html

This commit also fixes the same bug in netlink_poll(), introduced in commit
cd1df525d (netlink: add flow control for memory mapped I/O).

Signed-off-by: Ben Pfaff <***@nicira.com>
Signed-off-by: David S. Miller <***@davemloft.net>
Signed-off-by: Jiri Slaby <***@suse.cz>
---
net/netlink/af_netlink.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index e6d457c4a4e4..d9a2598a5190 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -628,7 +628,7 @@ static unsigned int netlink_poll(struct file *file, struct socket *sock,
while (nlk->cb_running && netlink_dump_space(nlk)) {
err = netlink_dump(sk);
if (err < 0) {
- sk->sk_err = err;
+ sk->sk_err = -err;
sk->sk_error_report(sk);
break;
}
@@ -2440,7 +2440,7 @@ static int netlink_recvmsg(struct kiocb *kiocb, struct socket *sock,
atomic_read(&sk->sk_rmem_alloc) <= sk->sk_rcvbuf / 2) {
ret = netlink_dump(sk);
if (ret) {
- sk->sk_err = ret;
+ sk->sk_err = -ret;
sk->sk_error_report(sk);
}
}
--
2.0.1
Jiri Slaby
2014-07-30 12:14:55 UTC
Permalink
From: John Stultz <***@linaro.org>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 16927776ae757d0d132bdbfabbfe2c498342bd59 upstream.

Sharvil noticed with the posix timer_settime interface, using the
CLOCK_REALTIME_ALARM or CLOCK_BOOTTIME_ALARM clockid, if the users
tried to specify a relative time timer, it would incorrectly be
treated as absolute regardless of the state of the flags argument.

This patch corrects this, properly checking the absolute/relative flag,
as well as adds further error checking that no invalid flag bits are set.

Reported-by: Sharvil Nanavati <***@google.com>
Signed-off-by: John Stultz <***@linaro.org>
Cc: Thomas Gleixner <***@linutronix.de>
Cc: Ingo Molnar <***@kernel.org>
Cc: Prarit Bhargava <***@redhat.com>
Cc: Sharvil Nanavati <***@google.com>
Link: http://lkml.kernel.org/r/1404767171-6902-1-git-send-email-***@linaro.org
Signed-off-by: Thomas Gleixner <***@linutronix.de>
Signed-off-by: Jiri Slaby <***@suse.cz>
---
kernel/time/alarmtimer.c | 20 ++++++++++++++++++--
1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/kernel/time/alarmtimer.c b/kernel/time/alarmtimer.c
index 88c9c65a430d..fe75444ae7ec 100644
--- a/kernel/time/alarmtimer.c
+++ b/kernel/time/alarmtimer.c
@@ -585,9 +585,14 @@ static int alarm_timer_set(struct k_itimer *timr, int flags,
struct itimerspec *new_setting,
struct itimerspec *old_setting)
{
+ ktime_t exp;
+
if (!rtcdev)
return -ENOTSUPP;

+ if (flags & ~TIMER_ABSTIME)
+ return -EINVAL;
+
if (old_setting)
alarm_timer_get(timr, old_setting);

@@ -597,8 +602,16 @@ static int alarm_timer_set(struct k_itimer *timr, int flags,

/* start the timer */
timr->it.alarm.interval = timespec_to_ktime(new_setting->it_interval);
- alarm_start(&timr->it.alarm.alarmtimer,
- timespec_to_ktime(new_setting->it_value));
+ exp = timespec_to_ktime(new_setting->it_value);
+ /* Convert (if necessary) to absolute time */
+ if (flags != TIMER_ABSTIME) {
+ ktime_t now;
+
+ now = alarm_bases[timr->it.alarm.alarmtimer.type].gettime();
+ exp = ktime_add(now, exp);
+ }
+
+ alarm_start(&timr->it.alarm.alarmtimer, exp);
return 0;
}

@@ -730,6 +743,9 @@ static int alarm_timer_nsleep(const clockid_t which_clock, int flags,
if (!alarmtimer_get_rtcdev())
return -ENOTSUPP;

+ if (flags & ~TIMER_ABSTIME)
+ return -EINVAL;
+
if (!capable(CAP_WAKE_ALARM))
return -EPERM;
--
2.0.1
Jiri Slaby
2014-07-30 12:14:31 UTC
Permalink
From: Edward Allcutt <***@openmarket.com>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

[ Upstream commit 68b7107b62983f2cff0948292429d5f5999df096 ]

Some older router implementations still send Fragmentation Needed
errors with the Next-Hop MTU field set to zero. This is explicitly
described as an eventuality that hosts must deal with by the
standard (RFC 1191) since older standards specified that those
bits must be zero.

Linux had a generic (for all of IPv4) implementation of the algorithm
described in the RFC for searching a list of MTU plateaus for a good
value. Commit 46517008e116 ("ipv4: Kill ip_rt_frag_needed().")
removed this as part of the changes to remove the routing cache.
Subsequently any Fragmentation Needed packet with a zero Next-Hop
MTU has been discarded without being passed to the per-protocol
handlers or notifying userspace for raw sockets.

When there is a router which does not implement RFC 1191 on an
MTU limited path then this results in stalled connections since
large packets are discarded and the local protocols are not
notified so they never attempt to lower the pMTU.

One example I have seen is an OpenBSD router terminating IPSec
tunnels. It's worth pointing out that this case is distinct from
the BSD 4.2 bug which incorrectly calculated the Next-Hop MTU
since the commit in question dismissed that as a valid concern.

All of the per-protocols handlers implement the simple approach from
RFC 1191 of immediately falling back to the minimum value. Although
this is sub-optimal it is vastly preferable to connections hanging
indefinitely.

Remove the Next-Hop MTU != 0 check and allow such packets
to follow the normal path.

Fixes: 46517008e116 ("ipv4: Kill ip_rt_frag_needed().")
Signed-off-by: Edward Allcutt <***@openmarket.com>
Signed-off-by: David S. Miller <***@davemloft.net>
Signed-off-by: Jiri Slaby <***@suse.cz>
---
net/ipv4/icmp.c | 2 --
1 file changed, 2 deletions(-)

diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index 5f7d11a45871..ff670cab5af5 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -705,8 +705,6 @@ static void icmp_unreach(struct sk_buff *skb)
&iph->daddr);
} else {
info = ntohs(icmph->un.frag.mtu);
- if (!info)
- goto out;
}
break;
case ICMP_SR_FAILED:
--
2.0.1
Jiri Slaby
2014-07-30 12:14:42 UTC
Permalink
From: Daniel Borkmann <***@redhat.com>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

[ Upstream commit 8f2e5ae40ec193bc0a0ed99e95315c3eebca84ea ]

While working on some other SCTP code, I noticed that some
structures shared with user space are leaking uninitialized
stack or heap buffer. In particular, struct sctp_sndrcvinfo
has a 2 bytes hole between .sinfo_flags and .sinfo_ppid that
remains unfilled by us in sctp_ulpevent_read_sndrcvinfo() when
putting this into cmsg. But also struct sctp_remote_error
contains a 2 bytes hole that we don't fill but place into a skb
through skb_copy_expand() via sctp_ulpevent_make_remote_error().

Both structures are defined by the IETF in RFC6458:

* Section 5.3.2. SCTP Header Information Structure:

The sctp_sndrcvinfo structure is defined below:

struct sctp_sndrcvinfo {
uint16_t sinfo_stream;
uint16_t sinfo_ssn;
uint16_t sinfo_flags;
<-- 2 bytes hole -->
uint32_t sinfo_ppid;
uint32_t sinfo_context;
uint32_t sinfo_timetolive;
uint32_t sinfo_tsn;
uint32_t sinfo_cumtsn;
sctp_assoc_t sinfo_assoc_id;
};

* 6.1.3. SCTP_REMOTE_ERROR:

A remote peer may send an Operation Error message to its peer.
This message indicates a variety of error conditions on an
association. The entire ERROR chunk as it appears on the wire
is included in an SCTP_REMOTE_ERROR event. Please refer to the
SCTP specification [RFC4960] and any extensions for a list of
possible error formats. An SCTP error notification has the
following format:

struct sctp_remote_error {
uint16_t sre_type;
uint16_t sre_flags;
uint32_t sre_length;
uint16_t sre_error;
<-- 2 bytes hole -->
sctp_assoc_t sre_assoc_id;
uint8_t sre_data[];
};

Fix this by setting both to 0 before filling them out. We also
have other structures shared between user and kernel space in
SCTP that contains holes (e.g. struct sctp_paddrthlds), but we
copy that buffer over from user space first and thus don't need
to care about it in that cases.

While at it, we can also remove lengthy comments copied from
the draft, instead, we update the comment with the correct RFC
number where one can look it up.

Signed-off-by: Daniel Borkmann <***@redhat.com>
Signed-off-by: David S. Miller <***@davemloft.net>
Signed-off-by: Jiri Slaby <***@suse.cz>
---
net/sctp/ulpevent.c | 122 +++++++---------------------------------------------
1 file changed, 15 insertions(+), 107 deletions(-)

diff --git a/net/sctp/ulpevent.c b/net/sctp/ulpevent.c
index 81089ed65456..12c37cee80e5 100644
--- a/net/sctp/ulpevent.c
+++ b/net/sctp/ulpevent.c
@@ -367,9 +367,10 @@ fail:
* specification [SCTP] and any extensions for a list of possible
* error formats.
*/
-struct sctp_ulpevent *sctp_ulpevent_make_remote_error(
- const struct sctp_association *asoc, struct sctp_chunk *chunk,
- __u16 flags, gfp_t gfp)
+struct sctp_ulpevent *
+sctp_ulpevent_make_remote_error(const struct sctp_association *asoc,
+ struct sctp_chunk *chunk, __u16 flags,
+ gfp_t gfp)
{
struct sctp_ulpevent *event;
struct sctp_remote_error *sre;
@@ -388,8 +389,7 @@ struct sctp_ulpevent *sctp_ulpevent_make_remote_error(
/* Copy the skb to a new skb with room for us to prepend
* notification with.
*/
- skb = skb_copy_expand(chunk->skb, sizeof(struct sctp_remote_error),
- 0, gfp);
+ skb = skb_copy_expand(chunk->skb, sizeof(*sre), 0, gfp);

/* Pull off the rest of the cause TLV from the chunk. */
skb_pull(chunk->skb, elen);
@@ -400,62 +400,21 @@ struct sctp_ulpevent *sctp_ulpevent_make_remote_error(
event = sctp_skb2event(skb);
sctp_ulpevent_init(event, MSG_NOTIFICATION, skb->truesize);

- sre = (struct sctp_remote_error *)
- skb_push(skb, sizeof(struct sctp_remote_error));
+ sre = (struct sctp_remote_error *) skb_push(skb, sizeof(*sre));

/* Trim the buffer to the right length. */
- skb_trim(skb, sizeof(struct sctp_remote_error) + elen);
+ skb_trim(skb, sizeof(*sre) + elen);

- /* Socket Extensions for SCTP
- * 5.3.1.3 SCTP_REMOTE_ERROR
- *
- * sre_type:
- * It should be SCTP_REMOTE_ERROR.
- */
+ /* RFC6458, Section 6.1.3. SCTP_REMOTE_ERROR */
+ memset(sre, 0, sizeof(*sre));
sre->sre_type = SCTP_REMOTE_ERROR;
-
- /*
- * Socket Extensions for SCTP
- * 5.3.1.3 SCTP_REMOTE_ERROR
- *
- * sre_flags: 16 bits (unsigned integer)
- * Currently unused.
- */
sre->sre_flags = 0;
-
- /* Socket Extensions for SCTP
- * 5.3.1.3 SCTP_REMOTE_ERROR
- *
- * sre_length: sizeof (__u32)
- *
- * This field is the total length of the notification data,
- * including the notification header.
- */
sre->sre_length = skb->len;
-
- /* Socket Extensions for SCTP
- * 5.3.1.3 SCTP_REMOTE_ERROR
- *
- * sre_error: 16 bits (unsigned integer)
- * This value represents one of the Operational Error causes defined in
- * the SCTP specification, in network byte order.
- */
sre->sre_error = cause;
-
- /* Socket Extensions for SCTP
- * 5.3.1.3 SCTP_REMOTE_ERROR
- *
- * sre_assoc_id: sizeof (sctp_assoc_t)
- *
- * The association id field, holds the identifier for the association.
- * All notifications for a given association have the same association
- * identifier. For TCP style socket, this field is ignored.
- */
sctp_ulpevent_set_owner(event, asoc);
sre->sre_assoc_id = sctp_assoc2id(asoc);

return event;
-
fail:
return NULL;
}
@@ -900,7 +859,9 @@ __u16 sctp_ulpevent_get_notification_type(const struct sctp_ulpevent *event)
return notification->sn_header.sn_type;
}

-/* Copy out the sndrcvinfo into a msghdr. */
+/* RFC6458, Section 5.3.2. SCTP Header Information Structure
+ * (SCTP_SNDRCV, DEPRECATED)
+ */
void sctp_ulpevent_read_sndrcvinfo(const struct sctp_ulpevent *event,
struct msghdr *msghdr)
{
@@ -909,74 +870,21 @@ void sctp_ulpevent_read_sndrcvinfo(const struct sctp_ulpevent *event,
if (sctp_ulpevent_is_notification(event))
return;

- /* Sockets API Extensions for SCTP
- * Section 5.2.2 SCTP Header Information Structure (SCTP_SNDRCV)
- *
- * sinfo_stream: 16 bits (unsigned integer)
- *
- * For recvmsg() the SCTP stack places the message's stream number in
- * this value.
- */
+ memset(&sinfo, 0, sizeof(sinfo));
sinfo.sinfo_stream = event->stream;
- /* sinfo_ssn: 16 bits (unsigned integer)
- *
- * For recvmsg() this value contains the stream sequence number that
- * the remote endpoint placed in the DATA chunk. For fragmented
- * messages this is the same number for all deliveries of the message
- * (if more than one recvmsg() is needed to read the message).
- */
sinfo.sinfo_ssn = event->ssn;
- /* sinfo_ppid: 32 bits (unsigned integer)
- *
- * In recvmsg() this value is
- * the same information that was passed by the upper layer in the peer
- * application. Please note that byte order issues are NOT accounted
- * for and this information is passed opaquely by the SCTP stack from
- * one end to the other.
- */
sinfo.sinfo_ppid = event->ppid;
- /* sinfo_flags: 16 bits (unsigned integer)
- *
- * This field may contain any of the following flags and is composed of
- * a bitwise OR of these values.
- *
- * recvmsg() flags:
- *
- * SCTP_UNORDERED - This flag is present when the message was sent
- * non-ordered.
- */
sinfo.sinfo_flags = event->flags;
- /* sinfo_tsn: 32 bit (unsigned integer)
- *
- * For the receiving side, this field holds a TSN that was
- * assigned to one of the SCTP Data Chunks.
- */
sinfo.sinfo_tsn = event->tsn;
- /* sinfo_cumtsn: 32 bit (unsigned integer)
- *
- * This field will hold the current cumulative TSN as
- * known by the underlying SCTP layer. Note this field is
- * ignored when sending and only valid for a receive
- * operation when sinfo_flags are set to SCTP_UNORDERED.
- */
sinfo.sinfo_cumtsn = event->cumtsn;
- /* sinfo_assoc_id: sizeof (sctp_assoc_t)
- *
- * The association handle field, sinfo_assoc_id, holds the identifier
- * for the association announced in the COMMUNICATION_UP notification.
- * All notifications for a given association have the same identifier.
- * Ignored for one-to-one style sockets.
- */
sinfo.sinfo_assoc_id = sctp_assoc2id(event->asoc);
-
- /* context value that is set via SCTP_CONTEXT socket option. */
+ /* Context value that is set via SCTP_CONTEXT socket option. */
sinfo.sinfo_context = event->asoc->default_rcv_context;
-
/* These fields are not used while receiving. */
sinfo.sinfo_timetolive = 0;

put_cmsg(msghdr, IPPROTO_SCTP, SCTP_SNDRCV,
- sizeof(struct sctp_sndrcvinfo), (void *)&sinfo);
+ sizeof(sinfo), &sinfo);
}

/* Do accounting for bytes received and hold a reference to the association
--
2.0.1
Jiri Slaby
2014-07-30 12:14:49 UTC
Permalink
From: Martin Lau <***@fb.com>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 97b8ee845393701edc06e27ccec2876ff9596019 upstream.

ring_buffer_poll_wait() should always put the poll_table to its wait_queue
even there is immediate data available. Otherwise, the following epoll and
read sequence will eventually hang forever:

1. Put some data to make the trace_pipe ring_buffer read ready first
2. epoll_ctl(efd, EPOLL_CTL_ADD, trace_pipe_fd, ee)
3. epoll_wait()
4. read(trace_pipe_fd) till EAGAIN
5. Add some more data to the trace_pipe ring_buffer
6. epoll_wait() -> this epoll_wait() will block forever

~ During the epoll_ctl(efd, EPOLL_CTL_ADD,...) call in step 2,
ring_buffer_poll_wait() returns immediately without adding poll_table,
which has poll_table->_qproc pointing to ep_poll_callback(), to its
wait_queue.
~ During the epoll_wait() call in step 3 and step 6,
ring_buffer_poll_wait() cannot add ep_poll_callback() to its wait_queue
because the poll_table->_qproc is NULL and it is how epoll works.
~ When there is new data available in step 6, ring_buffer does not know
it has to call ep_poll_callback() because it is not in its wait queue.
Hence, block forever.

Other poll implementation seems to call poll_wait() unconditionally as the very
first thing to do. For example, tcp_poll() in tcp.c.

Link: http://lkml.kernel.org/p/***@devbig242.prn2.facebook.com

Fixes: 2a2cc8f7c4d0 "ftrace: allow the event pipe to be polled"
Reviewed-by: Chris Mason <***@fb.com>
Signed-off-by: Martin Lau <***@fb.com>
Signed-off-by: Steven Rostedt <***@goodmis.org>
Signed-off-by: Jiri Slaby <***@suse.cz>
---
kernel/trace/ring_buffer.c | 4 ----
1 file changed, 4 deletions(-)

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 15c4ae203885..a758ec217bc0 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -616,10 +616,6 @@ int ring_buffer_poll_wait(struct ring_buffer *buffer, int cpu,
struct ring_buffer_per_cpu *cpu_buffer;
struct rb_irq_work *work;

- if ((cpu == RING_BUFFER_ALL_CPUS && !ring_buffer_empty(buffer)) ||
- (cpu != RING_BUFFER_ALL_CPUS && !ring_buffer_empty_cpu(buffer, cpu)))
- return POLLIN | POLLRDNORM;
-
if (cpu == RING_BUFFER_ALL_CPUS)
work = &buffer->irq_work;
else {
--
2.0.1
Jiri Slaby
2014-07-30 12:14:45 UTC
Permalink
=46rom: Manuel Sch=C3=B6lling <***@gmx.de>

3.12-stable review patch. If anyone has any objections, please let me =
know.

=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D

[ Upstream commit 84a7c0b1db1c17d5ded8d3800228a608e1070b40 ]

dns_query() credulously assumes that keys are null-terminated and
returns a copy of a memory block that is off by one.

Signed-off-by: Manuel Sch=C3=B6lling <***@gmx.de>
Signed-off-by: David S. Miller <***@davemloft.net>
Signed-off-by: Jiri Slaby <***@suse.cz>
---
net/dns_resolver/dns_query.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/dns_resolver/dns_query.c b/net/dns_resolver/dns_query.=
c
index c32be292c7e3..ede0e2d7412e 100644
--- a/net/dns_resolver/dns_query.c
+++ b/net/dns_resolver/dns_query.c
@@ -150,7 +150,9 @@ int dns_query(const char *type, const char *name, s=
ize_t namelen,
if (!*_result)
goto put;
=20
- memcpy(*_result, upayload->data, len + 1);
+ memcpy(*_result, upayload->data, len);
+ *_result[len] =3D '\0';
+
if (_expiry)
*_expiry =3D rkey->expiry;
=20
--=20
2.0.1
Jiri Slaby
2014-07-30 12:14:52 UTC
Permalink
From: Alex Deucher <***@amd.com>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 201bb62402e0227375c655446ea04fcd0acf7287 upstream.

If the value in the scratch register is 0, set it to the
max level. This fixes an issue where the console fb blanking
code calls back into the backlight driver on unblank and then
sets the backlight level to 0 after the driver has already
set the mode and enabled the backlight.

bugs:
https://bugs.freedesktop.org/show_bug.cgi?id=81382
https://bugs.freedesktop.org/show_bug.cgi?id=70207

Signed-off-by: Alex Deucher <***@amd.com>
Tested-by: David Heidelberger <***@ixit.cz>
Signed-off-by: Jiri Slaby <***@suse.cz>
---
drivers/gpu/drm/radeon/atombios_encoders.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/radeon/atombios_encoders.c b/drivers/gpu/drm/radeon/atombios_encoders.c
index 583345636d4b..6a965172d8dd 100644
--- a/drivers/gpu/drm/radeon/atombios_encoders.c
+++ b/drivers/gpu/drm/radeon/atombios_encoders.c
@@ -183,7 +183,6 @@ void radeon_atom_backlight_init(struct radeon_encoder *radeon_encoder,
struct backlight_properties props;
struct radeon_backlight_privdata *pdata;
struct radeon_encoder_atom_dig *dig;
- u8 backlight_level;
char bl_name[16];

/* Mac laptops with multiple GPUs use the gmux driver for backlight
@@ -222,12 +221,17 @@ void radeon_atom_backlight_init(struct radeon_encoder *radeon_encoder,

pdata->encoder = radeon_encoder;

- backlight_level = radeon_atom_get_backlight_level_from_reg(rdev);
-
dig = radeon_encoder->enc_priv;
dig->bl_dev = bd;

bd->props.brightness = radeon_atom_backlight_get_brightness(bd);
+ /* Set a reasonable default here if the level is 0 otherwise
+ * fbdev will attempt to turn the backlight on after console
+ * unblanking and it will try and restore 0 which turns the backlight
+ * off again.
+ */
+ if (bd->props.brightness == 0)
+ bd->props.brightness = RADEON_MAX_BL_LEVEL;
bd->props.power = FB_BLANK_UNBLANK;
backlight_update_status(bd);
--
2.0.1
Jiri Slaby
2014-07-30 12:14:37 UTC
Permalink
From: Andrey Utkin <***@gmail.com>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

[ Upstream commit 36beddc272c111689f3042bf3d10a64d8a805f93 ]

Setting just skb->sk without taking its reference and setting a
destructor is invalid. However, in the places where this was done, skb
is used in a way not requiring skb->sk setting. So dropping the setting
of skb->sk.
Thanks to Eric Dumazet <***@gmail.com> for correct solution.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=79441
Reported-by: Ed Martin <***@edman007.com>
Signed-off-by: Andrey Utkin <***@gmail.com>
Signed-off-by: Eric Dumazet <***@google.com>
Signed-off-by: David S. Miller <***@davemloft.net>
Signed-off-by: Jiri Slaby <***@suse.cz>
---
net/appletalk/ddp.c | 3 ---
1 file changed, 3 deletions(-)

diff --git a/net/appletalk/ddp.c b/net/appletalk/ddp.c
index 7d424ac6e760..43e875c84429 100644
--- a/net/appletalk/ddp.c
+++ b/net/appletalk/ddp.c
@@ -1489,8 +1489,6 @@ static int atalk_rcv(struct sk_buff *skb, struct net_device *dev,
goto drop;

/* Queue packet (standard) */
- skb->sk = sock;
-
if (sock_queue_rcv_skb(sock, skb) < 0)
goto drop;

@@ -1644,7 +1642,6 @@ static int atalk_sendmsg(struct kiocb *iocb, struct socket *sock, struct msghdr
if (!skb)
goto out;

- skb->sk = sk;
skb_reserve(skb, ddp_dl->header_length);
skb_reserve(skb, dev->hard_header_len);
skb->dev = dev;
--
2.0.1
Jiri Slaby
2014-07-30 12:14:36 UTC
Permalink
From: Yuchung Cheng <***@google.com>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

[ Upstream commit 6e08d5e3c8236e7484229e46fdf92006e1dd4c49 ]

The undo code assumes that, upon entering loss recovery, TCP
1) always retransmit something
2) the retransmission never fails locally (e.g., qdisc drop)

so undo_marker is set in tcp_enter_recovery() and undo_retrans is
incremented only when tcp_retransmit_skb() is successful.

When the assumption is broken because TCP's cwnd is too small to
retransmit or the retransmit fails locally. The next (DUP)ACK
would incorrectly revert the cwnd and the congestion state in
tcp_try_undo_dsack() or tcp_may_undo(). Subsequent (DUP)ACKs
may enter the recovery state. The sender repeatedly enter and
(incorrectly) exit recovery states if the retransmits continue to
fail locally while receiving (DUP)ACKs.

The fix is to initialize undo_retrans to -1 and start counting on
the first retransmission. Always increment undo_retrans even if the
retransmissions fail locally because they couldn't cause DSACKs to
undo the cwnd reduction.

Signed-off-by: Yuchung Cheng <***@google.com>
Signed-off-by: Neal Cardwell <***@google.com>
Signed-off-by: David S. Miller <***@davemloft.net>
Signed-off-by: Jiri Slaby <***@suse.cz>
---
net/ipv4/tcp_input.c | 8 ++++----
net/ipv4/tcp_output.c | 6 ++++--
2 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 3c6ee41ba419..95f67671f56e 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1064,7 +1064,7 @@ static bool tcp_check_dsack(struct sock *sk, const struct sk_buff *ack_skb,
}

/* D-SACK for already forgotten data... Do dumb counting. */
- if (dup_sack && tp->undo_marker && tp->undo_retrans &&
+ if (dup_sack && tp->undo_marker && tp->undo_retrans > 0 &&
!after(end_seq_0, prior_snd_una) &&
after(end_seq_0, tp->undo_marker))
tp->undo_retrans--;
@@ -1144,7 +1144,7 @@ static u8 tcp_sacktag_one(struct sock *sk,

/* Account D-SACK for retransmitted packet. */
if (dup_sack && (sacked & TCPCB_RETRANS)) {
- if (tp->undo_marker && tp->undo_retrans &&
+ if (tp->undo_marker && tp->undo_retrans > 0 &&
after(end_seq, tp->undo_marker))
tp->undo_retrans--;
if (sacked & TCPCB_SACKED_ACKED)
@@ -1845,7 +1845,7 @@ static void tcp_clear_retrans_partial(struct tcp_sock *tp)
tp->lost_out = 0;

tp->undo_marker = 0;
- tp->undo_retrans = 0;
+ tp->undo_retrans = -1;
}

void tcp_clear_retrans(struct tcp_sock *tp)
@@ -2613,7 +2613,7 @@ static void tcp_enter_recovery(struct sock *sk, bool ece_ack)

tp->prior_ssthresh = 0;
tp->undo_marker = tp->snd_una;
- tp->undo_retrans = tp->retrans_out;
+ tp->undo_retrans = tp->retrans_out ? : -1;

if (inet_csk(sk)->icsk_ca_state < TCP_CA_CWR) {
if (!ece_ack)
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 826fc6fab576..0cce660cf7dd 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2437,8 +2437,6 @@ int tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb)
if (!tp->retrans_stamp)
tp->retrans_stamp = TCP_SKB_CB(skb)->when;

- tp->undo_retrans += tcp_skb_pcount(skb);
-
/* snd_nxt is stored to detect loss of retransmitted segment,
* see tcp_input.c tcp_sacktag_write_queue().
*/
@@ -2446,6 +2444,10 @@ int tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb)
} else {
NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_TCPRETRANSFAIL);
}
+
+ if (tp->undo_retrans < 0)
+ tp->undo_retrans = 0;
+ tp->undo_retrans += tcp_skb_pcount(skb);
return err;
}
--
2.0.1
Jiri Slaby
2014-07-30 12:14:34 UTC
Permalink
From: Loic Prylli <***@google.com>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

[ Upstream commit 54951194656e4853e441266fd095f880bc0398f3 ]

A bug was introduced in NETDEV_CHANGE notifier sequence causing the
arp table to be sometimes spuriously cleared (including manual arp
entries marked permanent), upon network link carrier changes.

The changed argument for the notifier was applied only to a single
caller of NETDEV_CHANGE, missing among others netdev_state_change().
So upon net_carrier events induced by the network, which are
triggering a call to netdev_state_change(), arp_netdev_event() would
decide whether to clear or not arp cache based on random/junk stack
values (a kind of read buffer overflow).

Fixes: be9efd365328 ("net: pass changed flags along with NETDEV_CHANGE event")
Fixes: 6c8b4e3ff81b ("arp: flush arp cache on IFF_NOARP change")
Signed-off-by: Loic Prylli <***@google.com>
Signed-off-by: David S. Miller <***@davemloft.net>
Signed-off-by: Jiri Slaby <***@suse.cz>
---
net/core/dev.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 704c0c5bed1f..ef2f239cc322 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1202,7 +1202,11 @@ EXPORT_SYMBOL(netdev_features_change);
void netdev_state_change(struct net_device *dev)
{
if (dev->flags & IFF_UP) {
- call_netdevice_notifiers(NETDEV_CHANGE, dev);
+ struct netdev_notifier_change_info change_info;
+
+ change_info.flags_changed = 0;
+ call_netdevice_notifiers_info(NETDEV_CHANGE, dev,
+ &change_info.info);
rtmsg_ifinfo(RTM_NEWLINK, dev, 0);
}
}
--
2.0.1
Jiri Slaby
2014-07-30 12:14:35 UTC
Permalink
From: dingtianhong <***@huawei.com>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

[ Upstream commit 52ad353a5344f1f700c5b777175bdfa41d3cd65a ]

The problem was triggered by these steps:

1) create socket, bind and then setsockopt for add mc group.
mreq.imr_multiaddr.s_addr = inet_addr("255.0.0.37");
mreq.imr_interface.s_addr = inet_addr("192.168.1.2");
setsockopt(sockfd, IPPROTO_IP, IP_ADD_MEMBERSHIP, &mreq, sizeof(mreq));

2) drop the mc group for this socket.
mreq.imr_multiaddr.s_addr = inet_addr("255.0.0.37");
mreq.imr_interface.s_addr = inet_addr("0.0.0.0");
setsockopt(sockfd, IPPROTO_IP, IP_DROP_MEMBERSHIP, &mreq, sizeof(mreq));

3) and then drop the socket, I found the mc group was still used by the dev:

netstat -g

Interface RefCnt Group
--------------- ------ ---------------------
eth2 1 255.0.0.37

Normally even though the IP_DROP_MEMBERSHIP return error, the mc group still need
to be released for the netdev when drop the socket, but this process was broken when
route default is NULL, the reason is that:

The ip_mc_leave_group() will choose the in_dev by the imr_interface.s_addr, if input addr
is NULL, the default route dev will be chosen, then the ifindex is got from the dev,
then polling the inet->mc_list and return -ENODEV, but if the default route dev is NULL,
the in_dev and ifIndex is both NULL, when polling the inet->mc_list, the mc group will be
released from the mc_list, but the dev didn't dec the refcnt for this mc group, so
when dropping the socket, the mc_list is NULL and the dev still keep this group.

v1->v2: According Hideaki's suggestion, we should align with IPv6 (RFC3493) and BSDs,
so I add the checking for the in_dev before polling the mc_list, make sure when
we remove the mc group, dec the refcnt to the real dev which was using the mc address.
The problem would never happened again.

Signed-off-by: Ding Tianhong <***@huawei.com>
Signed-off-by: David S. Miller <***@davemloft.net>
Signed-off-by: Jiri Slaby <***@suse.cz>
---
net/ipv4/igmp.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/igmp.c b/net/ipv4/igmp.c
index 7defdc9ba167..9fa5c0908ce3 100644
--- a/net/ipv4/igmp.c
+++ b/net/ipv4/igmp.c
@@ -1952,6 +1952,10 @@ int ip_mc_leave_group(struct sock *sk, struct ip_mreqn *imr)

rtnl_lock();
in_dev = ip_mc_find_dev(net, imr);
+ if (!in_dev) {
+ ret = -ENODEV;
+ goto out;
+ }
ifindex = imr->imr_ifindex;
for (imlp = &inet->mc_list;
(iml = rtnl_dereference(*imlp)) != NULL;
@@ -1969,16 +1973,14 @@ int ip_mc_leave_group(struct sock *sk, struct ip_mreqn *imr)

*imlp = iml->next_rcu;

- if (in_dev)
- ip_mc_dec_group(in_dev, group);
+ ip_mc_dec_group(in_dev, group);
rtnl_unlock();
/* decrease mem now to avoid the memleak warning */
atomic_sub(sizeof(*iml), &sk->sk_omem_alloc);
kfree_rcu(iml, rcu);
return 0;
}
- if (!in_dev)
- ret = -ENODEV;
+out:
rtnl_unlock();
return ret;
}
--
2.0.1
Jiri Slaby
2014-07-30 12:14:51 UTC
Permalink
From: Tomasz Figa <***@samsung.com>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 29e697b11853d3f83b1864ae385abdad4aa2c361 upstream.

Certain GIC implementation, namely those found on earlier, single
cluster, Exynos SoCs, have registers mapped without per-CPU banking,
which means that the driver needs to use different offset for each CPU.

Currently the driver calculates the offset by multiplying value returned
by cpu_logical_map() by CPU offset parsed from DT. This is correct when
CPU topology is not specified in DT and aforementioned function returns
core ID alone. However when DT contains CPU topology, the function
changes to return cluster ID as well, which is non-zero on mentioned
SoCs and so breaks the calculation in GIC driver.

This patch fixes this by masking out cluster ID in CPU offset
calculation so that only core ID is considered. Multi-cluster Exynos
SoCs already have banked GIC implementations, so this simple fix should
be enough.

Reported-by: Lorenzo Pieralisi <***@arm.com>
Reported-by: Bartlomiej Zolnierkiewicz <***@samsung.com>
Signed-off-by: Tomasz Figa <***@samsung.com>
Fixes: db0d4db22a78d ("ARM: gic: allow GIC to support non-banked setups")
Link: https://lkml.kernel.org/r/1405610624-18722-1-git-send-email-***@samsung.com
Signed-off-by: Jason Cooper <***@lakedaemon.net>
Signed-off-by: Jiri Slaby <***@suse.cz>
---
drivers/irqchip/irq-gic.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/irqchip/irq-gic.c b/drivers/irqchip/irq-gic.c
index 36fcc1b6264e..0fcbf921fff3 100644
--- a/drivers/irqchip/irq-gic.c
+++ b/drivers/irqchip/irq-gic.c
@@ -42,6 +42,7 @@
#include <linux/irqchip/chained_irq.h>
#include <linux/irqchip/arm-gic.h>

+#include <asm/cputype.h>
#include <asm/irq.h>
#include <asm/exception.h>
#include <asm/smp_plat.h>
@@ -760,7 +761,9 @@ void __init gic_init_bases(unsigned int gic_nr, int irq_start,
}

for_each_possible_cpu(cpu) {
- unsigned long offset = percpu_offset * cpu_logical_map(cpu);
+ u32 mpidr = cpu_logical_map(cpu);
+ u32 core_id = MPIDR_AFFINITY_LEVEL(mpidr, 0);
+ unsigned long offset = percpu_offset * core_id;
*per_cpu_ptr(gic->dist_base.percpu_base, cpu) = dist_base + offset;
*per_cpu_ptr(gic->cpu_base.percpu_base, cpu) = cpu_base + offset;
}
--
2.0.1
Jiri Slaby
2014-07-30 12:14:28 UTC
Permalink
From: Eric Dumazet <***@google.com>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

[ Upstream commit 7f502361531e9eecb396cf99bdc9e9a59f7ebd7f ]

We have two different ways to handle changes to sk->sk_dst

First way (used by TCP) assumes socket lock is owned by caller, and use
no extra lock : __sk_dst_set() & __sk_dst_reset()

Another way (used by UDP) uses sk_dst_lock because socket lock is not
always taken. Note that sk_dst_lock is not softirq safe.

These ways are not inter changeable for a given socket type.

ipv4_sk_update_pmtu(), added in linux-3.8, added a race, as it used
the socket lock as synchronization, but users might be UDP sockets.

Instead of converting sk_dst_lock to a softirq safe version, use xchg()
as we did for sk_rx_dst in commit e47eb5dfb296b ("udp: ipv4: do not use
sk_dst_lock from softirq context")

In a follow up patch, we probably can remove sk_dst_lock, as it is
only used in IPv6.

Signed-off-by: Eric Dumazet <***@google.com>
Cc: Steffen Klassert <***@secunet.com>
Fixes: 9cb3a50c5f63e ("ipv4: Invalidate the socket cached route on pmtu events if possible")
Signed-off-by: David S. Miller <***@davemloft.net>
Signed-off-by: Jiri Slaby <***@suse.cz>
---
include/net/sock.h | 12 ++++++------
net/ipv4/route.c | 15 ++++++++-------
2 files changed, 14 insertions(+), 13 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index dc84ec518ff5..def541a583de 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1789,9 +1789,11 @@ __sk_dst_set(struct sock *sk, struct dst_entry *dst)
static inline void
sk_dst_set(struct sock *sk, struct dst_entry *dst)
{
- spin_lock(&sk->sk_dst_lock);
- __sk_dst_set(sk, dst);
- spin_unlock(&sk->sk_dst_lock);
+ struct dst_entry *old_dst;
+
+ sk_tx_queue_clear(sk);
+ old_dst = xchg(&sk->sk_dst_cache, dst);
+ dst_release(old_dst);
}

static inline void
@@ -1803,9 +1805,7 @@ __sk_dst_reset(struct sock *sk)
static inline void
sk_dst_reset(struct sock *sk)
{
- spin_lock(&sk->sk_dst_lock);
- __sk_dst_reset(sk);
- spin_unlock(&sk->sk_dst_lock);
+ sk_dst_set(sk, NULL);
}

extern struct dst_entry *__sk_dst_check(struct sock *sk, u32 cookie);
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 2b681867164d..310963d7c028 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1032,20 +1032,21 @@ void ipv4_sk_update_pmtu(struct sk_buff *skb, struct sock *sk, u32 mtu)
const struct iphdr *iph = (const struct iphdr *) skb->data;
struct flowi4 fl4;
struct rtable *rt;
- struct dst_entry *dst;
+ struct dst_entry *odst = NULL;
bool new = false;

bh_lock_sock(sk);
- rt = (struct rtable *) __sk_dst_get(sk);
+ odst = sk_dst_get(sk);

- if (sock_owned_by_user(sk) || !rt) {
+ if (sock_owned_by_user(sk) || !odst) {
__ipv4_sk_update_pmtu(skb, sk, mtu);
goto out;
}

__build_flow_key(&fl4, sk, iph, 0, 0, 0, 0, 0);

- if (!__sk_dst_check(sk, 0)) {
+ rt = (struct rtable *)odst;
+ if (odst->obsolete && odst->ops->check(odst, 0) == NULL) {
rt = ip_route_output_flow(sock_net(sk), &fl4, sk);
if (IS_ERR(rt))
goto out;
@@ -1055,8 +1056,7 @@ void ipv4_sk_update_pmtu(struct sk_buff *skb, struct sock *sk, u32 mtu)

__ip_rt_update_pmtu((struct rtable *) rt->dst.path, &fl4, mtu);

- dst = dst_check(&rt->dst, 0);
- if (!dst) {
+ if (!dst_check(&rt->dst, 0)) {
if (new)
dst_release(&rt->dst);

@@ -1068,10 +1068,11 @@ void ipv4_sk_update_pmtu(struct sk_buff *skb, struct sock *sk, u32 mtu)
}

if (new)
- __sk_dst_set(sk, &rt->dst);
+ sk_dst_set(sk, &rt->dst);

out:
bh_unlock_sock(sk);
+ dst_release(odst);
}
EXPORT_SYMBOL_GPL(ipv4_sk_update_pmtu);
--
2.0.1
Jiri Slaby
2014-07-30 12:14:46 UTC
Permalink
From: Eric Dumazet <***@google.com>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

[ Upstream commit 10ec9472f05b45c94db3c854d22581a20b97db41 ]

There is a benign buffer overflow in ip_options_compile spotted by
AddressSanitizer[1] :

Its benign because we always can access one extra byte in skb->head
(because header is followed by struct skb_shared_info), and in this case
this byte is not even used.

[28504.910798] ==================================================================
[28504.912046] AddressSanitizer: heap-buffer-overflow in ip_options_compile
[28504.913170] Read of size 1 by thread T15843:
[28504.914026] [<ffffffff81802f91>] ip_options_compile+0x121/0x9c0
[28504.915394] [<ffffffff81804a0d>] ip_options_get_from_user+0xad/0x120
[28504.916843] [<ffffffff8180dedf>] do_ip_setsockopt.isra.15+0x8df/0x1630
[28504.918175] [<ffffffff8180ec60>] ip_setsockopt+0x30/0xa0
[28504.919490] [<ffffffff8181e59b>] tcp_setsockopt+0x5b/0x90
[28504.920835] [<ffffffff8177462f>] sock_common_setsockopt+0x5f/0x70
[28504.922208] [<ffffffff817729c2>] SyS_setsockopt+0xa2/0x140
[28504.923459] [<ffffffff818cfb69>] system_call_fastpath+0x16/0x1b
[28504.924722]
[28504.925106] Allocated by thread T15843:
[28504.925815] [<ffffffff81804995>] ip_options_get_from_user+0x35/0x120
[28504.926884] [<ffffffff8180dedf>] do_ip_setsockopt.isra.15+0x8df/0x1630
[28504.927975] [<ffffffff8180ec60>] ip_setsockopt+0x30/0xa0
[28504.929175] [<ffffffff8181e59b>] tcp_setsockopt+0x5b/0x90
[28504.930400] [<ffffffff8177462f>] sock_common_setsockopt+0x5f/0x70
[28504.931677] [<ffffffff817729c2>] SyS_setsockopt+0xa2/0x140
[28504.932851] [<ffffffff818cfb69>] system_call_fastpath+0x16/0x1b
[28504.934018]
[28504.934377] The buggy address ffff880026382828 is located 0 bytes to the right
[28504.934377] of 40-byte region [ffff880026382800, ffff880026382828)
[28504.937144]
[28504.937474] Memory state around the buggy address:
[28504.938430] ffff880026382300: ........ rrrrrrrr rrrrrrrr rrrrrrrr
[28504.939884] ffff880026382400: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr
[28504.941294] ffff880026382500: .....rrr rrrrrrrr rrrrrrrr rrrrrrrr
[28504.942504] ffff880026382600: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr
[28504.943483] ffff880026382700: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr
[28504.944511] >ffff880026382800: .....rrr rrrrrrrr rrrrrrrr rrrrrrrr
[28504.945573] ^
[28504.946277] ffff880026382900: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr
[28505.094949] ffff880026382a00: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr
[28505.096114] ffff880026382b00: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr
[28505.097116] ffff880026382c00: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr
[28505.098472] ffff880026382d00: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr
[28505.099804] Legend:
[28505.100269] f - 8 freed bytes
[28505.100884] r - 8 redzone bytes
[28505.101649] . - 8 allocated bytes
[28505.102406] x=1..7 - x allocated bytes + (8-x) redzone bytes
[28505.103637] ==================================================================

[1] https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerForKernel

Signed-off-by: Eric Dumazet <***@google.com>
Signed-off-by: David S. Miller <***@davemloft.net>
Signed-off-by: Jiri Slaby <***@suse.cz>
---
net/ipv4/ip_options.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/net/ipv4/ip_options.c b/net/ipv4/ip_options.c
index ec7264514a82..089ed81d1878 100644
--- a/net/ipv4/ip_options.c
+++ b/net/ipv4/ip_options.c
@@ -288,6 +288,10 @@ int ip_options_compile(struct net *net,
optptr++;
continue;
}
+ if (unlikely(l < 2)) {
+ pp_ptr = optptr;
+ goto error;
+ }
optlen = optptr[1];
if (optlen<2 || optlen>l) {
pp_ptr = optptr;
--
2.0.1
Jiri Slaby
2014-07-30 12:14:53 UTC
Permalink
From: Jason Wang <***@redhat.com>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit fbb60fe35ad579b511de8604b06a30b43846473b upstream.

Return IRQ_NONE if it was not our irq. This is necessary for the case
when qxl is sharing irq line with a device A in a crash kernel. If qxl
is initialized before A and A's irq was raised during this gap,
returning IRQ_HANDLED in this case will cause this irq to be raised
again after EOI since kernel think it was handled but in fact it was
not.

Cc: Gerd Hoffmann <***@redhat.com>
Signed-off-by: Jason Wang <***@redhat.com>
Signed-off-by: Dave Airlie <***@redhat.com>
Signed-off-by: Jiri Slaby <***@suse.cz>
---
drivers/gpu/drm/qxl/qxl_irq.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/qxl/qxl_irq.c b/drivers/gpu/drm/qxl/qxl_irq.c
index 21393dc4700a..f4b6b89b98f3 100644
--- a/drivers/gpu/drm/qxl/qxl_irq.c
+++ b/drivers/gpu/drm/qxl/qxl_irq.c
@@ -33,6 +33,9 @@ irqreturn_t qxl_irq_handler(DRM_IRQ_ARGS)

pending = xchg(&qdev->ram_header->int_pending, 0);

+ if (!pending)
+ return IRQ_NONE;
+
atomic_inc(&qdev->irq_received);

if (pending & QXL_INTERRUPT_DISPLAY) {
--
2.0.1
Jiri Slaby
2014-07-30 12:14:32 UTC
Permalink
=46rom: Bernd Wachter <***@jolla.com>

3.12-stable review patch. If anyone has any objections, please let me =
know.

=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D

[ Upstream commit 8dcb4b1526747d8431f9895e153dd478c9d16186 ]

There's a new version of the Telewell 4G modem working with, but not
recognized by this driver.

Signed-off-by: Bernd Wachter <***@jolla.com>
Acked-by: Bj=C3=B8rn Mork <***@mork.no>
Signed-off-by: David S. Miller <***@davemloft.net>
Signed-off-by: Jiri Slaby <***@suse.cz>
---
drivers/net/usb/qmi_wwan.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c
index 135fb3ac330f..7b0bbc395e83 100644
--- a/drivers/net/usb/qmi_wwan.c
+++ b/drivers/net/usb/qmi_wwan.c
@@ -721,6 +721,7 @@ static const struct usb_device_id products[] =3D {
{QMI_FIXED_INTF(0x19d2, 0x1424, 2)},
{QMI_FIXED_INTF(0x19d2, 0x1425, 2)},
{QMI_FIXED_INTF(0x19d2, 0x1426, 2)}, /* ZTE MF91 */
+ {QMI_FIXED_INTF(0x19d2, 0x1428, 2)}, /* Telewell TW-LTE 4G v2 */
{QMI_FIXED_INTF(0x19d2, 0x2002, 4)}, /* ZTE (Vodafone) K3765-Z */
{QMI_FIXED_INTF(0x0f3d, 0x68a2, 8)}, /* Sierra Wireless MC7700 */
{QMI_FIXED_INTF(0x114f, 0x68a2, 8)}, /* Sierra Wireless MC7750 */
--=20
2.0.1
Jiri Slaby
2014-07-30 12:14:25 UTC
Permalink
From: Li RongQing <***@gmail.com>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

[ Upstream commit 916c1689a09bc1ca81f2d7a34876f8d35aadd11b ]

skb_cow called in vlan_reorder_header does not free the skb when it failed,
and vlan_reorder_header returns NULL to reset original skb when it is called
in vlan_untag, lead to a memory leak.

Signed-off-by: Li RongQing <***@gmail.com>
Acked-by: Eric Dumazet <***@google.com>
Signed-off-by: David S. Miller <***@davemloft.net>
Signed-off-by: Jiri Slaby <***@suse.cz>
---
net/8021q/vlan_core.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/net/8021q/vlan_core.c b/net/8021q/vlan_core.c
index 6ee48aac776f..7e57135c7cc4 100644
--- a/net/8021q/vlan_core.c
+++ b/net/8021q/vlan_core.c
@@ -108,8 +108,11 @@ EXPORT_SYMBOL(vlan_dev_vlan_id);

static struct sk_buff *vlan_reorder_header(struct sk_buff *skb)
{
- if (skb_cow(skb, skb_headroom(skb)) < 0)
+ if (skb_cow(skb, skb_headroom(skb)) < 0) {
+ kfree_skb(skb);
return NULL;
+ }
+
memmove(skb->data - ETH_HLEN, skb->data - VLAN_ETH_HLEN, 2 * ETH_ALEN);
skb->mac_header += VLAN_HLEN;
return skb;
--
2.0.1
Jiri Slaby
2014-07-30 12:14:27 UTC
Permalink
From: Eric Dumazet <***@google.com>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

[ Upstream commit f88649721268999bdff09777847080a52004f691 ]

When IP route cache had been removed in linux-3.6, we broke assumption
that dst entries were all freed after rcu grace period. DST_NOCACHE
dst were supposed to be freed from dst_release(). But it appears
we want to keep such dst around, either in UDP sockets or tunnels.

In sk_dst_get() we need to make sure dst refcount is not 0
before incrementing it, or else we might end up freeing a dst
twice.

DST_NOCACHE set on a dst does not mean this dst can not be attached
to a socket or a tunnel.

Then, before actual freeing, we need to observe a rcu grace period
to make sure all other cpus can catch the fact the dst is no longer
usable.

Signed-off-by: Eric Dumazet <***@google.com>
Reported-by: Dormando <***@rydia.net>
Signed-off-by: David S. Miller <***@davemloft.net>
Signed-off-by: Jiri Slaby <***@suse.cz>
---
include/net/sock.h | 4 ++--
net/core/dst.c | 16 +++++++++++-----
2 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 4aa873a6267f..dc84ec518ff5 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1749,8 +1749,8 @@ sk_dst_get(struct sock *sk)

rcu_read_lock();
dst = rcu_dereference(sk->sk_dst_cache);
- if (dst)
- dst_hold(dst);
+ if (dst && !atomic_inc_not_zero(&dst->__refcnt))
+ dst = NULL;
rcu_read_unlock();
return dst;
}
diff --git a/net/core/dst.c b/net/core/dst.c
index ca4231ec7347..15b6792e6ebb 100644
--- a/net/core/dst.c
+++ b/net/core/dst.c
@@ -267,6 +267,15 @@ again:
}
EXPORT_SYMBOL(dst_destroy);

+static void dst_destroy_rcu(struct rcu_head *head)
+{
+ struct dst_entry *dst = container_of(head, struct dst_entry, rcu_head);
+
+ dst = dst_destroy(dst);
+ if (dst)
+ __dst_free(dst);
+}
+
void dst_release(struct dst_entry *dst)
{
if (dst) {
@@ -274,11 +283,8 @@ void dst_release(struct dst_entry *dst)

newrefcnt = atomic_dec_return(&dst->__refcnt);
WARN_ON(newrefcnt < 0);
- if (unlikely(dst->flags & DST_NOCACHE) && !newrefcnt) {
- dst = dst_destroy(dst);
- if (dst)
- __dst_free(dst);
- }
+ if (unlikely(dst->flags & DST_NOCACHE) && !newrefcnt)
+ call_rcu(&dst->rcu_head, dst_destroy_rcu);
}
}
EXPORT_SYMBOL(dst_release);
--
2.0.1
Jiri Slaby
2014-07-30 12:14:26 UTC
Permalink
From: Wei-Chun Chao <***@plumgrid.com>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

[ Upstream commit 5882a07c72093dc3a18e2d2b129fb200686bb6ee ]

This patch fixes a kernel BUG_ON in skb_segment. It is hit when
testing two VMs on openvswitch with one VM acting as VXLAN gateway.

During VXLAN packet GSO, skb_segment is called with skb->data
pointing to inner TCP payload. skb_segment calls skb_network_protocol
to retrieve the inner protocol. skb_network_protocol actually expects
skb->data to point to MAC and it calls pskb_may_pull with ETH_HLEN.
This ends up pulling in ETH_HLEN data from header tail. As a result,
pskb_trim logic is skipped and BUG_ON is hit later.

Move skb_push in front of skb_network_protocol so that skb->data
lines up properly.

kernel BUG at net/core/skbuff.c:2999!
Call Trace:
[<ffffffff816ac412>] tcp_gso_segment+0x122/0x410
[<ffffffff816bc74c>] inet_gso_segment+0x13c/0x390
[<ffffffff8164b39b>] skb_mac_gso_segment+0x9b/0x170
[<ffffffff816b3658>] skb_udp_tunnel_segment+0xd8/0x390
[<ffffffff816b3c00>] udp4_ufo_fragment+0x120/0x140
[<ffffffff816bc74c>] inet_gso_segment+0x13c/0x390
[<ffffffff8109d742>] ? default_wake_function+0x12/0x20
[<ffffffff8164b39b>] skb_mac_gso_segment+0x9b/0x170
[<ffffffff8164b4d0>] __skb_gso_segment+0x60/0xc0
[<ffffffff8164b6b3>] dev_hard_start_xmit+0x183/0x550
[<ffffffff8166c91e>] sch_direct_xmit+0xfe/0x1d0
[<ffffffff8164bc94>] __dev_queue_xmit+0x214/0x4f0
[<ffffffff8164bf90>] dev_queue_xmit+0x10/0x20
[<ffffffff81687edb>] ip_finish_output+0x66b/0x890
[<ffffffff81688a58>] ip_output+0x58/0x90
[<ffffffff816c628f>] ? fib_table_lookup+0x29f/0x350
[<ffffffff816881c9>] ip_local_out_sk+0x39/0x50
[<ffffffff816cbfad>] iptunnel_xmit+0x10d/0x130
[<ffffffffa0212200>] vxlan_xmit_skb+0x1d0/0x330 [vxlan]
[<ffffffffa02a3919>] vxlan_tnl_send+0x129/0x1a0 [openvswitch]
[<ffffffffa02a2cd6>] ovs_vport_send+0x26/0xa0 [openvswitch]
[<ffffffffa029931e>] do_output+0x2e/0x50 [openvswitch]

Signed-off-by: Wei-Chun Chao <***@plumgrid.com>
Signed-off-by: David S. Miller <***@davemloft.net>
Signed-off-by: Jiri Slaby <***@suse.cz>
---
net/core/skbuff.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 5a60953e6f39..aeb870c5c134 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -2744,12 +2744,13 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
int i = 0;
int pos;

+ __skb_push(head_skb, doffset);
proto = skb_network_protocol(head_skb);
if (unlikely(!proto))
return ERR_PTR(-EINVAL);

csum = !!can_checksum_protocol(features, proto);
- __skb_push(head_skb, doffset);
+
headroom = skb_headroom(head_skb);
pos = skb_headlen(head_skb);
--
2.0.1
Jiri Slaby
2014-07-30 12:14:23 UTC
Permalink
From: Neal Cardwell <***@google.com>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

[ Upstream commit 2cd0d743b05e87445c54ca124a9916f22f16742e ]

If there is an MSS change (or misbehaving receiver) that causes a SACK
to arrive that covers the end of an skb but is less than one MSS, then
tcp_match_skb_to_sack() was rounding up pkt_len to the full length of
the skb ("Round if necessary..."), then chopping all bytes off the skb
and creating a zero-byte skb in the write queue.

This was visible now because the recently simplified TLP logic in
bef1909ee3ed1c ("tcp: fixing TLP's FIN recovery") could find that 0-byte
skb at the end of the write queue, and now that we do not check that
skb's length we could send it as a TLP probe.

Consider the following example scenario:

mss: 1000
skb: seq: 0 end_seq: 4000 len: 4000
SACK: start_seq: 3999 end_seq: 4000

The tcp_match_skb_to_sack() code will compute:

in_sack = false
pkt_len = start_seq - TCP_SKB_CB(skb)->seq = 3999 - 0 = 3999
new_len = (pkt_len / mss) * mss = (3999/1000)*1000 = 3000
new_len += mss = 4000

Previously we would find the new_len > skb->len check failing, so we
would fall through and set pkt_len = new_len = 4000 and chop off
pkt_len of 4000 from the 4000-byte skb, leaving a 0-byte segment
afterward in the write queue.

With this new commit, we notice that the new new_len >= skb->len check
succeeds, so that we return without trying to fragment.

Fixes: adb92db857ee ("tcp: Make SACK code to split only at mss boundaries")
Reported-by: Eric Dumazet <***@google.com>
Signed-off-by: Neal Cardwell <***@google.com>
Cc: Eric Dumazet <***@google.com>
Cc: Yuchung Cheng <***@google.com>
Cc: Ilpo Jarvinen <***@helsinki.fi>
Acked-by: Eric Dumazet <***@google.com>
Signed-off-by: David S. Miller <***@davemloft.net>
Signed-off-by: Jiri Slaby <***@suse.cz>
---
net/ipv4/tcp_input.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 0e8af08a98fc..3c6ee41ba419 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1120,7 +1120,7 @@ static int tcp_match_skb_to_sack(struct sock *sk, struct sk_buff *skb,
unsigned int new_len = (pkt_len / mss) * mss;
if (!in_sack && new_len < pkt_len) {
new_len += mss;
- if (new_len > skb->len)
+ if (new_len >= skb->len)
return 0;
}
pkt_len = new_len;
--
2.0.1
Jiri Slaby
2014-07-30 12:14:00 UTC
Permalink
From: Gavin Guo <***@canonical.com>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit bb86cf569bbd7ad4dce581a37c7fbd748057e9dc upstream.

When using USB 3.0 pen drive with the [AMD] FCH USB XHCI Controller
[1022:7814], the second hotplugging will experience the USB 3.0 pen
drive is recognized as high-speed device. After bisecting the kernel,
I found the commit number 41e7e056cdc662f704fa9262e5c6e213b4ab45dd
(USB: Allow USB 3.0 ports to be disabled.) causes the bug. After doing
some experiments, the bug can be fixed by avoiding executing the function
hub_usb3_port_disable(). Because the port status with [AMD] FCH USB
XHCI Controlleris [1022:7814] is already in RxDetect
(I tried printing out the port status before setting to Disabled state),
it's reasonable to check the port status before really executing
hub_usb3_port_disable().

Fixes: 41e7e056cdc6 (USB: Allow USB 3.0 ports to be disabled.)
Signed-off-by: Gavin Guo <***@canonical.com>
Acked-by: Alan Stern <***@rowland.harvard.edu>
Signed-off-by: Jiri Slaby <***@suse.cz>
---
drivers/usb/core/hub.c | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)

diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
index 60a1f13db296..9c63a76cfedd 100644
--- a/drivers/usb/core/hub.c
+++ b/drivers/usb/core/hub.c
@@ -891,6 +891,25 @@ static int hub_usb3_port_disable(struct usb_hub *hub, int port1)
if (!hub_is_superspeed(hub->hdev))
return -EINVAL;

+ ret = hub_port_status(hub, port1, &portstatus, &portchange);
+ if (ret < 0)
+ return ret;
+
+ /*
+ * USB controller Advanced Micro Devices, Inc. [AMD] FCH USB XHCI
+ * Controller [1022:7814] will have spurious result making the following
+ * usb 3.0 device hotplugging route to the 2.0 root hub and recognized
+ * as high-speed device if we set the usb 3.0 port link state to
+ * Disabled. Since it's already in USB_SS_PORT_LS_RX_DETECT state, we
+ * check the state here to avoid the bug.
+ */
+ if ((portstatus & USB_PORT_STAT_LINK_STATE) ==
+ USB_SS_PORT_LS_RX_DETECT) {
+ dev_dbg(&hub->ports[port1 - 1]->dev,
+ "Not disabling port; link state is RxDetect\n");
+ return ret;
+ }
+
ret = hub_set_port_link_state(hub, port1, USB_SS_PORT_LS_SS_DISABLED);
if (ret)
return ret;
--
2.0.1
Jiri Slaby
2014-07-30 12:14:24 UTC
Permalink
From: Daniel Borkmann <***@redhat.com>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

[ Upstream commit 24599e61b7552673dd85971cf5a35369cd8c119e ]

When writing to the sysctl field net.sctp.auth_enable, it can well
be that the user buffer we handed over to proc_dointvec() via
proc_sctp_do_auth() handler contains something other than integers.

In that case, we would set an uninitialized 4-byte value from the
stack to net->sctp.auth_enable that can be leaked back when reading
the sysctl variable, and it can unintentionally turn auth_enable
on/off based on the stack content since auth_enable is interpreted
as a boolean.

Fix it up by making sure proc_dointvec() returned sucessfully.

Fixes: b14878ccb7fa ("net: sctp: cache auth_enable per endpoint")
Reported-by: Florian Westphal <***@redhat.com>
Signed-off-by: Daniel Borkmann <***@redhat.com>
Acked-by: Neil Horman <***@tuxdriver.com>
Acked-by: Vlad Yasevich <***@gmail.com>
Signed-off-by: David S. Miller <***@davemloft.net>
Signed-off-by: Jiri Slaby <***@suse.cz>
---
net/sctp/sysctl.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/sctp/sysctl.c b/net/sctp/sysctl.c
index 976c89d5295b..968355f0de60 100644
--- a/net/sctp/sysctl.c
+++ b/net/sctp/sysctl.c
@@ -361,8 +361,7 @@ static int proc_sctp_do_auth(struct ctl_table *ctl, int write,
tbl.data = &net->sctp.auth_enable;

ret = proc_dointvec(&tbl, write, buffer, lenp, ppos);
-
- if (write) {
+ if (write && ret == 0) {
struct sock *sk = net->sctp.ctl_sock;

net->sctp.auth_enable = new_value;
--
2.0.1
Jiri Slaby
2014-07-30 12:14:02 UTC
Permalink
From: Hans de Goede <***@redhat.com>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 242841d3d71191348f98310e2d2001e1001d8630 upstream.

Tested-and-reported-by: yullaw <***@mageia.cz>

Signed-off-by: Hans de Goede <***@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <***@samsung.com>
Signed-off-by: Jiri Slaby <***@suse.cz>
---
drivers/media/usb/gspca/pac7302.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/drivers/media/usb/gspca/pac7302.c b/drivers/media/usb/gspca/pac7302.c
index a91509643563..0d4be1d840ab 100644
--- a/drivers/media/usb/gspca/pac7302.c
+++ b/drivers/media/usb/gspca/pac7302.c
@@ -928,6 +928,7 @@ static const struct usb_device_id device_table[] = {
{USB_DEVICE(0x093a, 0x2620)},
{USB_DEVICE(0x093a, 0x2621)},
{USB_DEVICE(0x093a, 0x2622), .driver_info = FL_VFLIP},
+ {USB_DEVICE(0x093a, 0x2623), .driver_info = FL_VFLIP},
{USB_DEVICE(0x093a, 0x2624), .driver_info = FL_VFLIP},
{USB_DEVICE(0x093a, 0x2625)},
{USB_DEVICE(0x093a, 0x2626)},
--
2.0.1
Jiri Slaby
2014-07-30 12:14:11 UTC
Permalink
From: "zhangwei(Jovi)" <***@huawei.com>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit f0160a5a2912267c02cfe692eac955c360de5fdf upstream.

The TRACE_ITER_PRINTK check in __trace_puts/__trace_bputs is missing,
so add it, to be consistent with __trace_printk/__trace_bprintk.
Those functions are all called by the same function: trace_printk().

Link: http://lkml.kernel.org/p/***@huawei.com

Signed-off-by: zhangwei(Jovi) <***@huawei.com>
Signed-off-by: Steven Rostedt <***@goodmis.org>
Signed-off-by: Jiri Slaby <***@suse.cz>
---
kernel/trace/trace.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 952cde4d4e3c..b7566fe4d607 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -436,6 +436,9 @@ int __trace_puts(unsigned long ip, const char *str, int size)
int alloc;
int pc;

+ if (!(trace_flags & TRACE_ITER_PRINTK))
+ return 0;
+
pc = preempt_count();

if (unlikely(tracing_selftest_running || tracing_disabled))
@@ -483,6 +486,9 @@ int __trace_bputs(unsigned long ip, const char *str)
int size = sizeof(struct bputs_entry);
int pc;

+ if (!(trace_flags & TRACE_ITER_PRINTK))
+ return 0;
+
pc = preempt_count();

if (unlikely(tracing_selftest_running || tracing_disabled))
--
2.0.1
Jiri Slaby
2014-07-30 12:14:16 UTC
Permalink
From: Stefan Assmann <***@kpanic.de>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 76252723e88681628a3dbb9c09c963e095476f73 upstream.

To properly re-initialize SR-IOV it is necessary to reset the device
even if it is already down. Not doing this may result in Tx unit hangs.

Signed-off-by: Stefan Assmann <***@kpanic.de>
Tested-by: Aaron Brown <***@intel.com>
Signed-off-by: Jeff Kirsher <***@intel.com>
Signed-off-by: David S. Miller <***@davemloft.net>
Signed-off-by: Jiri Slaby <***@suse.cz>
---
drivers/net/ethernet/intel/igb/igb_main.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index d366df533726..76e43c417a31 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -7295,6 +7295,8 @@ static int igb_sriov_reinit(struct pci_dev *dev)

if (netif_running(netdev))
igb_close(netdev);
+ else
+ igb_reset(adapter);

igb_clear_interrupt_scheme(adapter);
--
2.0.1
Jiri Slaby
2014-07-30 12:14:10 UTC
Permalink
From: "zhangwei(Jovi)" <***@huawei.com>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 8abfb8727f4a724d31f9ccfd8013fbd16d539445 upstream.

Currently trace option stacktrace is not applicable for
trace_printk with constant string argument, the reason is
in __trace_puts/__trace_bputs ftrace_trace_stack is missing.

In contrast, when using trace_printk with non constant string
argument(will call into __trace_printk/__trace_bprintk), then
trace option stacktrace is workable, this inconstant result
will confuses users a lot.

Link: http://lkml.kernel.org/p/***@huawei.com

Signed-off-by: zhangwei(Jovi) <***@huawei.com>
Signed-off-by: Steven Rostedt <***@goodmis.org>
Signed-off-by: Jiri Slaby <***@suse.cz>
---
kernel/trace/trace.c | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 5e9cb157d31e..952cde4d4e3c 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -434,6 +434,9 @@ int __trace_puts(unsigned long ip, const char *str, int size)
struct print_entry *entry;
unsigned long irq_flags;
int alloc;
+ int pc;
+
+ pc = preempt_count();

if (unlikely(tracing_selftest_running || tracing_disabled))
return 0;
@@ -443,7 +446,7 @@ int __trace_puts(unsigned long ip, const char *str, int size)
local_save_flags(irq_flags);
buffer = global_trace.trace_buffer.buffer;
event = trace_buffer_lock_reserve(buffer, TRACE_PRINT, alloc,
- irq_flags, preempt_count());
+ irq_flags, pc);
if (!event)
return 0;

@@ -460,6 +463,7 @@ int __trace_puts(unsigned long ip, const char *str, int size)
entry->buf[size] = '\0';

__buffer_unlock_commit(buffer, event);
+ ftrace_trace_stack(buffer, irq_flags, 4, pc);

return size;
}
@@ -477,6 +481,9 @@ int __trace_bputs(unsigned long ip, const char *str)
struct bputs_entry *entry;
unsigned long irq_flags;
int size = sizeof(struct bputs_entry);
+ int pc;
+
+ pc = preempt_count();

if (unlikely(tracing_selftest_running || tracing_disabled))
return 0;
@@ -484,7 +491,7 @@ int __trace_bputs(unsigned long ip, const char *str)
local_save_flags(irq_flags);
buffer = global_trace.trace_buffer.buffer;
event = trace_buffer_lock_reserve(buffer, TRACE_BPUTS, size,
- irq_flags, preempt_count());
+ irq_flags, pc);
if (!event)
return 0;

@@ -493,6 +500,7 @@ int __trace_bputs(unsigned long ip, const char *str)
entry->str = str;

__buffer_unlock_commit(buffer, event);
+ ftrace_trace_stack(buffer, irq_flags, 4, pc);

return 1;
}
--
2.0.1
Jiri Slaby
2014-07-30 12:14:17 UTC
Permalink
From: Niu Yawei <***@gmail.com>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit d68aab6b8f572406aa93b45ef6483934dd3b54a6 upstream.

Commit 1ab6c4997e04 (fs: convert fs shrinkers to new scan/count API)
accidentally removed locking from quota shrinker. Fix it -
dqcache_shrink_scan() should use dq_list_lock to protect the
scan on free_dquots list.

Fixes: 1ab6c4997e04a00c50c6d786c2f046adc0d1f5de
Signed-off-by: Niu Yawei <***@intel.com>
Signed-off-by: Jan Kara <***@suse.cz>
Signed-off-by: Jiri Slaby <***@suse.cz>
---
fs/quota/dquot.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c
index 9cd5f63715c0..7f30bdc57d13 100644
--- a/fs/quota/dquot.c
+++ b/fs/quota/dquot.c
@@ -702,6 +702,7 @@ dqcache_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
struct dquot *dquot;
unsigned long freed = 0;

+ spin_lock(&dq_list_lock);
head = free_dquots.prev;
while (head != &free_dquots && sc->nr_to_scan) {
dquot = list_entry(head, struct dquot, dq_free);
@@ -713,6 +714,7 @@ dqcache_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
freed++;
head = free_dquots.prev;
}
+ spin_unlock(&dq_list_lock);
return freed;
}
--
2.0.1
Jiri Slaby
2014-07-30 12:14:21 UTC
Permalink
From: Tyler Hall <***@gmail.com>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

[ Upstream commit a8e83b17536aad603fbeae4c460f2da0ee9fe6ed ]

The commit "slip: Fix deadlock in write_wakeup" fixes a deadlock caused
by a change made in both slcan and slip. This is a direct port of that
fix.

Signed-off-by: Tyler Hall <***@gmail.com>
Cc: Oliver Hartkopp <***@hartkopp.net>
Cc: Andre Naujoks <***@gmail.com>
Cc: David S. Miller <***@davemloft.net>
Cc: linux-***@vger.kernel.org
Signed-off-by: David S. Miller <***@davemloft.net>
Signed-off-by: Jiri Slaby <***@suse.cz>
---
drivers/net/can/slcan.c | 41 +++++++++++++++++++++++++++++------------
1 file changed, 29 insertions(+), 12 deletions(-)

diff --git a/drivers/net/can/slcan.c b/drivers/net/can/slcan.c
index 25377e547f9b..3c28d1f187c0 100644
--- a/drivers/net/can/slcan.c
+++ b/drivers/net/can/slcan.c
@@ -54,6 +54,7 @@
#include <linux/delay.h>
#include <linux/init.h>
#include <linux/kernel.h>
+#include <linux/workqueue.h>
#include <linux/can.h>
#include <linux/can/skb.h>

@@ -87,6 +88,7 @@ struct slcan {
struct tty_struct *tty; /* ptr to TTY structure */
struct net_device *dev; /* easy for intr handling */
spinlock_t lock;
+ struct work_struct tx_work; /* Flushes transmit buffer */

/* These are pointers to the malloc()ed frame buffers. */
unsigned char rbuff[SLC_MTU]; /* receiver buffer */
@@ -311,34 +313,44 @@ static void slc_encaps(struct slcan *sl, struct can_frame *cf)
sl->dev->stats.tx_bytes += cf->can_dlc;
}

-/*
- * Called by the driver when there's room for more data. If we have
- * more packets to send, we send them here.
- */
-static void slcan_write_wakeup(struct tty_struct *tty)
+/* Write out any remaining transmit buffer. Scheduled when tty is writable */
+static void slcan_transmit(struct work_struct *work)
{
+ struct slcan *sl = container_of(work, struct slcan, tx_work);
int actual;
- struct slcan *sl = (struct slcan *) tty->disc_data;

+ spin_lock_bh(&sl->lock);
/* First make sure we're connected. */
- if (!sl || sl->magic != SLCAN_MAGIC || !netif_running(sl->dev))
+ if (!sl->tty || sl->magic != SLCAN_MAGIC || !netif_running(sl->dev)) {
+ spin_unlock_bh(&sl->lock);
return;
+ }

- spin_lock(&sl->lock);
if (sl->xleft <= 0) {
/* Now serial buffer is almost free & we can start
* transmission of another packet */
sl->dev->stats.tx_packets++;
- clear_bit(TTY_DO_WRITE_WAKEUP, &tty->flags);
- spin_unlock(&sl->lock);
+ clear_bit(TTY_DO_WRITE_WAKEUP, &sl->tty->flags);
+ spin_unlock_bh(&sl->lock);
netif_wake_queue(sl->dev);
return;
}

- actual = tty->ops->write(tty, sl->xhead, sl->xleft);
+ actual = sl->tty->ops->write(sl->tty, sl->xhead, sl->xleft);
sl->xleft -= actual;
sl->xhead += actual;
- spin_unlock(&sl->lock);
+ spin_unlock_bh(&sl->lock);
+}
+
+/*
+ * Called by the driver when there's room for more data.
+ * Schedule the transmit.
+ */
+static void slcan_write_wakeup(struct tty_struct *tty)
+{
+ struct slcan *sl = tty->disc_data;
+
+ schedule_work(&sl->tx_work);
}

/* Send a can_frame to a TTY queue. */
@@ -524,6 +536,7 @@ static struct slcan *slc_alloc(dev_t line)
sl->magic = SLCAN_MAGIC;
sl->dev = dev;
spin_lock_init(&sl->lock);
+ INIT_WORK(&sl->tx_work, slcan_transmit);
slcan_devs[i] = dev;

return sl;
@@ -622,8 +635,12 @@ static void slcan_close(struct tty_struct *tty)
if (!sl || sl->magic != SLCAN_MAGIC || sl->tty != tty)
return;

+ spin_lock_bh(&sl->lock);
tty->disc_data = NULL;
sl->tty = NULL;
+ spin_unlock_bh(&sl->lock);
+
+ flush_work(&sl->tx_work);

/* Flush network side */
unregister_netdev(sl->dev);
--
2.0.1
Jiri Slaby
2014-07-30 12:14:15 UTC
Permalink
From: Todd Fujinaka <***@intel.com>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 948264879b6894dc389a44b99fae4f0b72932619 upstream.

On some devices, the internal PLL circuit occasionally provides the
wrong clock frequency after power up. The probability of failure is less
than one failure per 1000 power cycles. When the failure occurs, the
internal clock frequency is around 1/20 of the correct frequency.

Signed-off-by: Todd Fujinaka <***@intel.com>
Tested-by: Aaron Brown <***@intel.com>
Signed-off-by: Jeff Kirsher <***@intel.com>
Signed-off-by: David S. Miller <***@davemloft.net>
Signed-off-by: Jiri Slaby <***@suse.cz>
---
drivers/net/ethernet/intel/igb/e1000_82575.c | 7 +++
drivers/net/ethernet/intel/igb/e1000_defines.h | 18 +++----
drivers/net/ethernet/intel/igb/e1000_hw.h | 3 ++
drivers/net/ethernet/intel/igb/e1000_i210.c | 66 ++++++++++++++++++++++++++
drivers/net/ethernet/intel/igb/e1000_i210.h | 12 +++++
drivers/net/ethernet/intel/igb/e1000_regs.h | 1 +
drivers/net/ethernet/intel/igb/igb_main.c | 14 ++++++
7 files changed, 113 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/e1000_82575.c b/drivers/net/ethernet/intel/igb/e1000_82575.c
index 47c2d10df826..974558e36588 100644
--- a/drivers/net/ethernet/intel/igb/e1000_82575.c
+++ b/drivers/net/ethernet/intel/igb/e1000_82575.c
@@ -1403,6 +1403,13 @@ static s32 igb_init_hw_82575(struct e1000_hw *hw)
s32 ret_val;
u16 i, rar_count = mac->rar_entry_count;

+ if ((hw->mac.type >= e1000_i210) &&
+ !(igb_get_flash_presence_i210(hw))) {
+ ret_val = igb_pll_workaround_i210(hw);
+ if (ret_val)
+ return ret_val;
+ }
+
/* Initialize identification LED */
ret_val = igb_id_led_init(hw);
if (ret_val) {
diff --git a/drivers/net/ethernet/intel/igb/e1000_defines.h b/drivers/net/ethernet/intel/igb/e1000_defines.h
index 978eca31ceda..956c4c3ae70b 100644
--- a/drivers/net/ethernet/intel/igb/e1000_defines.h
+++ b/drivers/net/ethernet/intel/igb/e1000_defines.h
@@ -46,14 +46,15 @@
/* Extended Device Control */
#define E1000_CTRL_EXT_SDP3_DATA 0x00000080 /* Value of SW Defineable Pin 3 */
/* Physical Func Reset Done Indication */
-#define E1000_CTRL_EXT_PFRSTD 0x00004000
-#define E1000_CTRL_EXT_LINK_MODE_MASK 0x00C00000
-#define E1000_CTRL_EXT_LINK_MODE_PCIE_SERDES 0x00C00000
-#define E1000_CTRL_EXT_LINK_MODE_1000BASE_KX 0x00400000
-#define E1000_CTRL_EXT_LINK_MODE_SGMII 0x00800000
-#define E1000_CTRL_EXT_LINK_MODE_GMII 0x00000000
-#define E1000_CTRL_EXT_EIAME 0x01000000
-#define E1000_CTRL_EXT_IRCA 0x00000001
+#define E1000_CTRL_EXT_PFRSTD 0x00004000
+#define E1000_CTRL_EXT_SDLPE 0X00040000 /* SerDes Low Power Enable */
+#define E1000_CTRL_EXT_LINK_MODE_MASK 0x00C00000
+#define E1000_CTRL_EXT_LINK_MODE_PCIE_SERDES 0x00C00000
+#define E1000_CTRL_EXT_LINK_MODE_1000BASE_KX 0x00400000
+#define E1000_CTRL_EXT_LINK_MODE_SGMII 0x00800000
+#define E1000_CTRL_EXT_LINK_MODE_GMII 0x00000000
+#define E1000_CTRL_EXT_EIAME 0x01000000
+#define E1000_CTRL_EXT_IRCA 0x00000001
/* Interrupt delay cancellation */
/* Driver loaded bit for FW */
#define E1000_CTRL_EXT_DRV_LOAD 0x10000000
@@ -62,6 +63,7 @@
/* packet buffer parity error detection enabled */
/* descriptor FIFO parity error detection enable */
#define E1000_CTRL_EXT_PBA_CLR 0x80000000 /* PBA Clear */
+#define E1000_CTRL_EXT_PHYPDEN 0x00100000
#define E1000_I2CCMD_REG_ADDR_SHIFT 16
#define E1000_I2CCMD_PHY_ADDR_SHIFT 24
#define E1000_I2CCMD_OPCODE_READ 0x08000000
diff --git a/drivers/net/ethernet/intel/igb/e1000_hw.h b/drivers/net/ethernet/intel/igb/e1000_hw.h
index 37a9c06a6c68..80f20d1f1cfe 100644
--- a/drivers/net/ethernet/intel/igb/e1000_hw.h
+++ b/drivers/net/ethernet/intel/igb/e1000_hw.h
@@ -569,4 +569,7 @@ extern struct net_device *igb_get_hw_dev(struct e1000_hw *hw);
/* These functions must be implemented by drivers */
s32 igb_read_pcie_cap_reg(struct e1000_hw *hw, u32 reg, u16 *value);
s32 igb_write_pcie_cap_reg(struct e1000_hw *hw, u32 reg, u16 *value);
+
+void igb_read_pci_cfg(struct e1000_hw *hw, u32 reg, u16 *value);
+void igb_write_pci_cfg(struct e1000_hw *hw, u32 reg, u16 *value);
#endif /* _E1000_HW_H_ */
diff --git a/drivers/net/ethernet/intel/igb/e1000_i210.c b/drivers/net/ethernet/intel/igb/e1000_i210.c
index 0c0393316a3a..0217d4e229a0 100644
--- a/drivers/net/ethernet/intel/igb/e1000_i210.c
+++ b/drivers/net/ethernet/intel/igb/e1000_i210.c
@@ -835,3 +835,69 @@ s32 igb_init_nvm_params_i210(struct e1000_hw *hw)
}
return ret_val;
}
+
+/**
+ * igb_pll_workaround_i210
+ * @hw: pointer to the HW structure
+ *
+ * Works around an errata in the PLL circuit where it occasionally
+ * provides the wrong clock frequency after power up.
+ **/
+s32 igb_pll_workaround_i210(struct e1000_hw *hw)
+{
+ s32 ret_val;
+ u32 wuc, mdicnfg, ctrl, ctrl_ext, reg_val;
+ u16 nvm_word, phy_word, pci_word, tmp_nvm;
+ int i;
+
+ /* Get and set needed register values */
+ wuc = rd32(E1000_WUC);
+ mdicnfg = rd32(E1000_MDICNFG);
+ reg_val = mdicnfg & ~E1000_MDICNFG_EXT_MDIO;
+ wr32(E1000_MDICNFG, reg_val);
+
+ /* Get data from NVM, or set default */
+ ret_val = igb_read_invm_word_i210(hw, E1000_INVM_AUTOLOAD,
+ &nvm_word);
+ if (ret_val)
+ nvm_word = E1000_INVM_DEFAULT_AL;
+ tmp_nvm = nvm_word | E1000_INVM_PLL_WO_VAL;
+ for (i = 0; i < E1000_MAX_PLL_TRIES; i++) {
+ /* check current state directly from internal PHY */
+ igb_read_phy_reg_gs40g(hw, (E1000_PHY_PLL_FREQ_PAGE |
+ E1000_PHY_PLL_FREQ_REG), &phy_word);
+ if ((phy_word & E1000_PHY_PLL_UNCONF)
+ != E1000_PHY_PLL_UNCONF) {
+ ret_val = 0;
+ break;
+ } else {
+ ret_val = -E1000_ERR_PHY;
+ }
+ /* directly reset the internal PHY */
+ ctrl = rd32(E1000_CTRL);
+ wr32(E1000_CTRL, ctrl|E1000_CTRL_PHY_RST);
+
+ ctrl_ext = rd32(E1000_CTRL_EXT);
+ ctrl_ext |= (E1000_CTRL_EXT_PHYPDEN | E1000_CTRL_EXT_SDLPE);
+ wr32(E1000_CTRL_EXT, ctrl_ext);
+
+ wr32(E1000_WUC, 0);
+ reg_val = (E1000_INVM_AUTOLOAD << 4) | (tmp_nvm << 16);
+ wr32(E1000_EEARBC_I210, reg_val);
+
+ igb_read_pci_cfg(hw, E1000_PCI_PMCSR, &pci_word);
+ pci_word |= E1000_PCI_PMCSR_D3;
+ igb_write_pci_cfg(hw, E1000_PCI_PMCSR, &pci_word);
+ usleep_range(1000, 2000);
+ pci_word &= ~E1000_PCI_PMCSR_D3;
+ igb_write_pci_cfg(hw, E1000_PCI_PMCSR, &pci_word);
+ reg_val = (E1000_INVM_AUTOLOAD << 4) | (nvm_word << 16);
+ wr32(E1000_EEARBC_I210, reg_val);
+
+ /* restore WUC register */
+ wr32(E1000_WUC, wuc);
+ }
+ /* restore MDICNFG setting */
+ wr32(E1000_MDICNFG, mdicnfg);
+ return ret_val;
+}
diff --git a/drivers/net/ethernet/intel/igb/e1000_i210.h b/drivers/net/ethernet/intel/igb/e1000_i210.h
index dde3c4b7ea99..99f4611d6f48 100644
--- a/drivers/net/ethernet/intel/igb/e1000_i210.h
+++ b/drivers/net/ethernet/intel/igb/e1000_i210.h
@@ -48,6 +48,7 @@ extern s32 igb_write_xmdio_reg(struct e1000_hw *hw, u16 addr, u8 dev_addr,
u16 data);
extern s32 igb_init_nvm_params_i210(struct e1000_hw *hw);
extern bool igb_get_flash_presence_i210(struct e1000_hw *hw);
+s32 igb_pll_workaround_i210(struct e1000_hw *hw);

#define E1000_STM_OPCODE 0xDB00
#define E1000_EEPROM_FLASH_SIZE_WORD 0x11
@@ -93,4 +94,15 @@ enum E1000_INVM_STRUCTURE_TYPE {
#define NVM_LED_1_CFG_DEFAULT_I211 0x0184
#define NVM_LED_0_2_CFG_DEFAULT_I211 0x200C

+/* PLL Defines */
+#define E1000_PCI_PMCSR 0x44
+#define E1000_PCI_PMCSR_D3 0x03
+#define E1000_MAX_PLL_TRIES 5
+#define E1000_PHY_PLL_UNCONF 0xFF
+#define E1000_PHY_PLL_FREQ_PAGE 0xFC0000
+#define E1000_PHY_PLL_FREQ_REG 0x000E
+#define E1000_INVM_DEFAULT_AL 0x202F
+#define E1000_INVM_AUTOLOAD 0x0A
+#define E1000_INVM_PLL_WO_VAL 0x0010
+
#endif
diff --git a/drivers/net/ethernet/intel/igb/e1000_regs.h b/drivers/net/ethernet/intel/igb/e1000_regs.h
index 82632c6c53af..7156981ec813 100644
--- a/drivers/net/ethernet/intel/igb/e1000_regs.h
+++ b/drivers/net/ethernet/intel/igb/e1000_regs.h
@@ -69,6 +69,7 @@
#define E1000_PBA 0x01000 /* Packet Buffer Allocation - RW */
#define E1000_PBS 0x01008 /* Packet Buffer Size */
#define E1000_EEMNGCTL 0x01010 /* MNG EEprom Control */
+#define E1000_EEARBC_I210 0x12024 /* EEPROM Auto Read Bus Control */
#define E1000_EEWR 0x0102C /* EEPROM Write Register - RW */
#define E1000_I2CCMD 0x01028 /* SFPI2C Command Register - RW */
#define E1000_FRTIMER 0x01048 /* Free Running Timer - RW */
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 8cf44f2a8ccd..d366df533726 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -6918,6 +6918,20 @@ static int igb_ioctl(struct net_device *netdev, struct ifreq *ifr, int cmd)
}
}

+void igb_read_pci_cfg(struct e1000_hw *hw, u32 reg, u16 *value)
+{
+ struct igb_adapter *adapter = hw->back;
+
+ pci_read_config_word(adapter->pdev, reg, value);
+}
+
+void igb_write_pci_cfg(struct e1000_hw *hw, u32 reg, u16 *value)
+{
+ struct igb_adapter *adapter = hw->back;
+
+ pci_write_config_word(adapter->pdev, reg, *value);
+}
+
s32 igb_read_pcie_cap_reg(struct e1000_hw *hw, u32 reg, u16 *value)
{
struct igb_adapter *adapter = hw->back;
--
2.0.1
Jiri Slaby
2014-07-30 12:14:09 UTC
Permalink
From: "Steven Rostedt (Red Hat)" <***@goodmis.org>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 5f8bf2d263a20b986225ae1ed7d6759dc4b93af9 upstream.

Running my ftrace tests on PowerPC, it failed the test that checks
if function_graph tracer is affected by the stack tracer. It was.
Looking into this, I found that the update_function_graph_func()
must be called even if the trampoline function is not changed.
This is because archs like PowerPC do not support ftrace_ops being
passed by assembly and instead uses a helper function (what the
trampoline function points to). Since this function is not changed
even when multiple ftrace_ops are added to the code, the test that
falls out before calling update_function_graph_func() will miss that
the update must still be done.

Call update_function_graph_function() for all calls to
update_ftrace_function()

Signed-off-by: Steven Rostedt <***@goodmis.org>
Signed-off-by: Jiri Slaby <***@suse.cz>
---
kernel/trace/ftrace.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index a8642bac843e..d2ab10b3a30e 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -331,12 +331,12 @@ static void update_ftrace_function(void)
func = ftrace_ops_list_func;
}

+ update_function_graph_func();
+
/* If there's no change, then do nothing more here */
if (ftrace_trace_function == func)
return;

- update_function_graph_func();
-
/*
* If we are using the list function, it doesn't care
* about the function_trace_ops.
--
2.0.1
Jiri Slaby
2014-07-30 12:14:20 UTC
Permalink
From: Dmitry Popov <***@qrator.net>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

[ Upstream commit e0056593b61253f1a8a9941dacda22e73b963cdc ]

This patch fixes 3 similar bugs where incoming packets might be routed into
wrong non-wildcard tunnels:

1) Consider the following setup:
ip address add 1.1.1.1/24 dev eth0
ip address add 1.1.1.2/24 dev eth0
ip tunnel add ipip1 remote 2.2.2.2 local 1.1.1.1 mode ipip dev eth0
ip link set ipip1 up

Incoming ipip packets from 2.2.2.2 were routed into ipip1 even if it has dst =
1.1.1.2. Moreover even if there was wildcard tunnel like
ip tunnel add ipip0 remote 2.2.2.2 local any mode ipip dev eth0
but it was created before explicit one (with local 1.1.1.1), incoming ipip
packets with src = 2.2.2.2 and dst = 1.1.1.2 were still routed into ipip1.

Same issue existed with all tunnels that use ip_tunnel_lookup (gre, vti)

2) ip address add 1.1.1.1/24 dev eth0
ip tunnel add ipip1 remote 2.2.146.85 local 1.1.1.1 mode ipip dev eth0
ip link set ipip1 up

Incoming ipip packets with dst = 1.1.1.1 were routed into ipip1, no matter what
src address is. Any remote ip address which has ip_tunnel_hash = 0 raised this
issue, 2.2.146.85 is just an example, there are more than 4 million of them.
And again, wildcard tunnel like
ip tunnel add ipip0 remote any local 1.1.1.1 mode ipip dev eth0
wouldn't be ever matched if it was created before explicit tunnel like above.

Gre & vti tunnels had the same issue.

3) ip address add 1.1.1.1/24 dev eth0
ip tunnel add gre1 remote 2.2.146.84 local 1.1.1.1 key 1 mode gre dev eth0
ip link set gre1 up

Any incoming gre packet with key = 1 were routed into gre1, no matter what
src/dst addresses are. Any remote ip address which has ip_tunnel_hash = 0 raised
the issue, 2.2.146.84 is just an example, there are more than 4 million of them.
Wildcard tunnel like
ip tunnel add gre2 remote any local any key 1 mode gre dev eth0
wouldn't be ever matched if it was created before explicit tunnel like above.

All this stuff happened because while looking for a wildcard tunnel we didn't
check that matched tunnel is a wildcard one. Fixed.

Signed-off-by: Dmitry Popov <***@qrator.net>
Signed-off-by: David S. Miller <***@davemloft.net>
Signed-off-by: Jiri Slaby <***@suse.cz>
---
net/ipv4/ip_tunnel.c | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
index d9dbe0f78612..edd5a8171357 100644
--- a/net/ipv4/ip_tunnel.c
+++ b/net/ipv4/ip_tunnel.c
@@ -166,6 +166,7 @@ struct ip_tunnel *ip_tunnel_lookup(struct ip_tunnel_net *itn,

hlist_for_each_entry_rcu(t, head, hash_node) {
if (remote != t->parms.iph.daddr ||
+ t->parms.iph.saddr != 0 ||
!(t->dev->flags & IFF_UP))
continue;

@@ -182,10 +183,11 @@ struct ip_tunnel *ip_tunnel_lookup(struct ip_tunnel_net *itn,
head = &itn->tunnels[hash];

hlist_for_each_entry_rcu(t, head, hash_node) {
- if ((local != t->parms.iph.saddr &&
- (local != t->parms.iph.daddr ||
- !ipv4_is_multicast(local))) ||
- !(t->dev->flags & IFF_UP))
+ if ((local != t->parms.iph.saddr || t->parms.iph.daddr != 0) &&
+ (local != t->parms.iph.daddr || !ipv4_is_multicast(local)))
+ continue;
+
+ if (!(t->dev->flags & IFF_UP))
continue;

if (!ip_tunnel_key_match(&t->parms, flags, key))
@@ -202,6 +204,8 @@ struct ip_tunnel *ip_tunnel_lookup(struct ip_tunnel_net *itn,

hlist_for_each_entry_rcu(t, head, hash_node) {
if (t->parms.i_key != key ||
+ t->parms.iph.saddr != 0 ||
+ t->parms.iph.daddr != 0 ||
!(t->dev->flags & IFF_UP))
continue;
--
2.0.1
Jiri Slaby
2014-07-30 12:14:22 UTC
Permalink
From: Daniel Borkmann <***@redhat.com>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

[ Upstream commit ff5e92c1affe7166b3f6e7073e648ed65a6e2e59 ]

sysctl handler proc_sctp_do_hmac_alg(), proc_sctp_do_rto_min() and
proc_sctp_do_rto_max() do not properly reflect some error cases
when writing values via sysctl from internal proc functions such
as proc_dointvec() and proc_dostring().

In all these cases we pass the test for write != 0 and partially
do additional work just to notice that additional sanity checks
fail and we return with hard-coded -EINVAL while proc_do*
functions might also return different errors. So fix this up by
simply testing a successful return of proc_do* right after
calling it.

This also allows to propagate its return value onwards to the user.
While touching this, also fix up some minor style issues.

Fixes: 4f3fdf3bc59c ("sctp: add check rto_min and rto_max in sysctl")
Fixes: 3c68198e7511 ("sctp: Make hmac algorithm selection for cookie generation dynamic")
Signed-off-by: Daniel Borkmann <***@redhat.com>
Signed-off-by: David S. Miller <***@davemloft.net>
Signed-off-by: Jiri Slaby <***@suse.cz>
---
net/sctp/sysctl.c | 19 +++++++++----------
1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/net/sctp/sysctl.c b/net/sctp/sysctl.c
index 3e5ac1948607..976c89d5295b 100644
--- a/net/sctp/sysctl.c
+++ b/net/sctp/sysctl.c
@@ -303,41 +303,40 @@ static int proc_sctp_do_hmac_alg(struct ctl_table *ctl,
loff_t *ppos)
{
struct net *net = current->nsproxy->net_ns;
- char tmp[8];
struct ctl_table tbl;
- int ret;
- int changed = 0;
+ bool changed = false;
char *none = "none";
+ char tmp[8];
+ int ret;

memset(&tbl, 0, sizeof(struct ctl_table));

if (write) {
tbl.data = tmp;
- tbl.maxlen = 8;
+ tbl.maxlen = sizeof(tmp);
} else {
tbl.data = net->sctp.sctp_hmac_alg ? : none;
tbl.maxlen = strlen(tbl.data);
}
- ret = proc_dostring(&tbl, write, buffer, lenp, ppos);

- if (write) {
+ ret = proc_dostring(&tbl, write, buffer, lenp, ppos);
+ if (write && ret == 0) {
#ifdef CONFIG_CRYPTO_MD5
if (!strncmp(tmp, "md5", 3)) {
net->sctp.sctp_hmac_alg = "md5";
- changed = 1;
+ changed = true;
}
#endif
#ifdef CONFIG_CRYPTO_SHA1
if (!strncmp(tmp, "sha1", 4)) {
net->sctp.sctp_hmac_alg = "sha1";
- changed = 1;
+ changed = true;
}
#endif
if (!strncmp(tmp, "none", 4)) {
net->sctp.sctp_hmac_alg = NULL;
- changed = 1;
+ changed = true;
}
-
if (!changed)
ret = -EINVAL;
}
--
2.0.1
Jiri Slaby
2014-07-30 12:14:05 UTC
Permalink
From: Loic Poulain <***@intel.com>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 48439d501e3d9e8634bdc0c418e066870039599d upstream.

When detecting a non-link packet, h5_reset_rx() frees the Rx skb.
Not returning after that will cause the upcoming h5_rx_payload()
call to dereference a now NULL Rx skb and trigger a kernel oops.

Signed-off-by: Loic Poulain <***@intel.com>
Signed-off-by: Marcel Holtmann <***@holtmann.org>
Signed-off-by: Jiri Slaby <***@suse.cz>
---
drivers/bluetooth/hci_h5.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/drivers/bluetooth/hci_h5.c b/drivers/bluetooth/hci_h5.c
index b6154d5a07a5..db0be2fb05fe 100644
--- a/drivers/bluetooth/hci_h5.c
+++ b/drivers/bluetooth/hci_h5.c
@@ -406,6 +406,7 @@ static int h5_rx_3wire_hdr(struct hci_uart *hu, unsigned char c)
H5_HDR_PKT_TYPE(hdr) != HCI_3WIRE_LINK_PKT) {
BT_ERR("Non-link packet received in non-active state");
h5_reset_rx(h5);
+ return 0;
}

h5->rx_func = h5_rx_payload;
--
2.0.1
Jiri Slaby
2014-07-30 12:14:19 UTC
Permalink
From: David Ertman <***@intel.com>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 96dee024ca4799d6d21588951240035c21ba1c67 upstream.

Previous commit c3a0dce35af0 fixed an overrun for the RAR on i218 devices.
This commit also attempted to homogenize the RAR/SHRA access for all parts
accessed by the e1000e driver. This change introduced an error for
assigning MAC addresses to guest OS's for 82579 devices.

Only RAR[0] is accessible to the driver for 82579 parts, and additional
addresses must be placed into the SHRA[L|H] registers. The rar_entry_count
was changed in the previous commit to an inaccurate value that accounted
for all RAR and SHRA registers, not just the ones usable by the driver.

This patch fixes the count to the correct value and adjusts the
e1000_rar_set_pch2lan() function to user the correct index.

Cc: John Greene <***@redhat.com>
Signed-off-by: Dave Ertman <***@intel.com>
Tested-by: Aaron Brown <***@intel.com>
Signed-off-by: Jeff Kirsher <***@intel.com>
Cc: "Alexander Y. Fomichev" <***@x5.ru>
Signed-off-by: Jiri Slaby <***@suse.cz>
---
drivers/net/ethernet/intel/e1000e/ich8lan.c | 2 +-
drivers/net/ethernet/intel/e1000e/ich8lan.h | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000e/ich8lan.c b/drivers/net/ethernet/intel/e1000e/ich8lan.c
index 42f0f6717511..70e16f71f574 100644
--- a/drivers/net/ethernet/intel/e1000e/ich8lan.c
+++ b/drivers/net/ethernet/intel/e1000e/ich8lan.c
@@ -1374,7 +1374,7 @@ static void e1000_rar_set_pch2lan(struct e1000_hw *hw, u8 *addr, u32 index)
/* RAR[1-6] are owned by manageability. Skip those and program the
* next address into the SHRA register array.
*/
- if (index < (u32)(hw->mac.rar_entry_count - 6)) {
+ if (index < (u32)(hw->mac.rar_entry_count)) {
s32 ret_val;

ret_val = e1000_acquire_swflag_ich8lan(hw);
diff --git a/drivers/net/ethernet/intel/e1000e/ich8lan.h b/drivers/net/ethernet/intel/e1000e/ich8lan.h
index 217090df33e7..59865695b282 100644
--- a/drivers/net/ethernet/intel/e1000e/ich8lan.h
+++ b/drivers/net/ethernet/intel/e1000e/ich8lan.h
@@ -98,7 +98,7 @@
#define PCIE_ICH8_SNOOP_ALL PCIE_NO_SNOOP_ALL

#define E1000_ICH_RAR_ENTRIES 7
-#define E1000_PCH2_RAR_ENTRIES 11 /* RAR[0-6], SHRA[0-3] */
+#define E1000_PCH2_RAR_ENTRIES 5 /* RAR[0], SHRA[0-3] */
#define E1000_PCH_LPT_RAR_ENTRIES 12 /* RAR[0], SHRA[0-10] */

#define PHY_PAGE_SHIFT 5
--
2.0.1
Jiri Slaby
2014-07-30 12:14:18 UTC
Permalink
From: Emmanuel Grumbach <***@intel.com>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 43d826ca5979927131685cc2092c7ce862cb91cd upstream.

We should always prefer to use full RTS protection. Using
CTS to self gives a meaningless improvement, but this flow
is much harder for the firmware which is likely to have
issues with it.

Signed-off-by: Emmanuel Grumbach <***@intel.com>
Signed-off-by: Jiri Slaby <***@suse.cz>
---
drivers/net/wireless/iwlwifi/dvm/rxon.c | 12 ------------
1 file changed, 12 deletions(-)

diff --git a/drivers/net/wireless/iwlwifi/dvm/rxon.c b/drivers/net/wireless/iwlwifi/dvm/rxon.c
index d7ce2f12a907..6a5b7593ea42 100644
--- a/drivers/net/wireless/iwlwifi/dvm/rxon.c
+++ b/drivers/net/wireless/iwlwifi/dvm/rxon.c
@@ -1068,13 +1068,6 @@ int iwlagn_commit_rxon(struct iwl_priv *priv, struct iwl_rxon_context *ctx)
/* recalculate basic rates */
iwl_calc_basic_rates(priv, ctx);

- /*
- * force CTS-to-self frames protection if RTS-CTS is not preferred
- * one aggregation protection method
- */
- if (!priv->hw_params.use_rts_for_aggregation)
- ctx->staging.flags |= RXON_FLG_SELF_CTS_EN;
-
if ((ctx->vif && ctx->vif->bss_conf.use_short_slot) ||
!(ctx->staging.flags & RXON_FLG_BAND_24G_MSK))
ctx->staging.flags |= RXON_FLG_SHORT_SLOT_MSK;
@@ -1480,11 +1473,6 @@ void iwlagn_bss_info_changed(struct ieee80211_hw *hw,
else
ctx->staging.flags &= ~RXON_FLG_TGG_PROTECT_MSK;

- if (bss_conf->use_cts_prot)
- ctx->staging.flags |= RXON_FLG_SELF_CTS_EN;
- else
- ctx->staging.flags &= ~RXON_FLG_SELF_CTS_EN;
-
memcpy(ctx->staging.bssid_addr, bss_conf->bssid, ETH_ALEN);

if (vif->type == NL80211_IFTYPE_AP ||
--
2.0.1
Jiri Slaby
2014-07-30 12:14:07 UTC
Permalink
From: Miklos Szeredi <***@suse.cz>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 233a01fa9c4c7c41238537e8db8434667ff28a2f upstream.

If the number in "user_id=N" or "group_id=N" mount options was larger than
INT_MAX then fuse returned EINVAL.

Fix this to handle all valid uid/gid values.

Signed-off-by: Miklos Szeredi <***@suse.cz>
Signed-off-by: Jiri Slaby <***@suse.cz>
---
fs/fuse/inode.c | 20 ++++++++++++++++----
1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index a8ce6dab60a0..4937d4b51253 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -461,6 +461,17 @@ static const match_table_t tokens = {
{OPT_ERR, NULL}
};

+static int fuse_match_uint(substring_t *s, unsigned int *res)
+{
+ int err = -ENOMEM;
+ char *buf = match_strdup(s);
+ if (buf) {
+ err = kstrtouint(buf, 10, res);
+ kfree(buf);
+ }
+ return err;
+}
+
static int parse_fuse_opt(char *opt, struct fuse_mount_data *d, int is_bdev)
{
char *p;
@@ -471,6 +482,7 @@ static int parse_fuse_opt(char *opt, struct fuse_mount_data *d, int is_bdev)
while ((p = strsep(&opt, ",")) != NULL) {
int token;
int value;
+ unsigned uv;
substring_t args[MAX_OPT_ARGS];
if (!*p)
continue;
@@ -494,18 +506,18 @@ static int parse_fuse_opt(char *opt, struct fuse_mount_data *d, int is_bdev)
break;

case OPT_USER_ID:
- if (match_int(&args[0], &value))
+ if (fuse_match_uint(&args[0], &uv))
return 0;
- d->user_id = make_kuid(current_user_ns(), value);
+ d->user_id = make_kuid(current_user_ns(), uv);
if (!uid_valid(d->user_id))
return 0;
d->user_id_present = 1;
break;

case OPT_GROUP_ID:
- if (match_int(&args[0], &value))
+ if (fuse_match_uint(&args[0], &uv))
return 0;
- d->group_id = make_kgid(current_user_ns(), value);
+ d->group_id = make_kgid(current_user_ns(), uv);
if (!gid_valid(d->group_id))
return 0;
d->group_id_present = 1;
--
2.0.1
Jiri Slaby
2014-07-30 12:13:54 UTC
Permalink
From: Andy Whitcroft <***@canonical.com>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 867f9d463b82462793ea4610e748be0b04b37fc7 upstream.

The recently merged change (in v3.14-rc6) to ACPI resource detection
(below) causes all zero length ACPI resources to be elided from the
table:

commit b355cee88e3b1a193f0e9a81db810f6f83ad728b
Author: Zhang Rui <***@intel.com>
Date: Thu Feb 27 11:37:15 2014 +0800

ACPI / resources: ignore invalid ACPI device resources

This change has caused a regression in (at least) serial port detection
for a number of machines (see LP#1313981 [1]). These seem to represent
their IO regions (presumably incorrectly) as a zero length region.
Reverting the above commit restores these serial devices.

Only elide zero length resources which lie at address 0.

Fixes: b355cee88e3b (ACPI / resources: ignore invalid ACPI device resources)
Signed-off-by: Andy Whitcroft <***@canonical.com>
Acked-by: Zhang Rui <***@intel.com>
Cc: 3.14+ <***@vger.kernel.org> # 3.14+
Signed-off-by: Rafael J. Wysocki <***@intel.com>
Signed-off-by: Jiri Slaby <***@suse.cz>
---
drivers/acpi/resource.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/acpi/resource.c b/drivers/acpi/resource.c
index 0bdacc5e26a3..2ba8f02ced36 100644
--- a/drivers/acpi/resource.c
+++ b/drivers/acpi/resource.c
@@ -77,7 +77,7 @@ bool acpi_dev_resource_memory(struct acpi_resource *ares, struct resource *res)
switch (ares->type) {
case ACPI_RESOURCE_TYPE_MEMORY24:
memory24 = &ares->data.memory24;
- if (!memory24->address_length)
+ if (!memory24->minimum && !memory24->address_length)
return false;
acpi_dev_get_memresource(res, memory24->minimum,
memory24->address_length,
@@ -85,7 +85,7 @@ bool acpi_dev_resource_memory(struct acpi_resource *ares, struct resource *res)
break;
case ACPI_RESOURCE_TYPE_MEMORY32:
memory32 = &ares->data.memory32;
- if (!memory32->address_length)
+ if (!memory32->minimum && !memory32->address_length)
return false;
acpi_dev_get_memresource(res, memory32->minimum,
memory32->address_length,
@@ -93,7 +93,7 @@ bool acpi_dev_resource_memory(struct acpi_resource *ares, struct resource *res)
break;
case ACPI_RESOURCE_TYPE_FIXED_MEMORY32:
fixed_memory32 = &ares->data.fixed_memory32;
- if (!fixed_memory32->address_length)
+ if (!fixed_memory32->address && !fixed_memory32->address_length)
return false;
acpi_dev_get_memresource(res, fixed_memory32->address,
fixed_memory32->address_length,
@@ -150,7 +150,7 @@ bool acpi_dev_resource_io(struct acpi_resource *ares, struct resource *res)
switch (ares->type) {
case ACPI_RESOURCE_TYPE_IO:
io = &ares->data.io;
- if (!io->address_length)
+ if (!io->minimum && !io->address_length)
return false;
acpi_dev_get_ioresource(res, io->minimum,
io->address_length,
@@ -158,7 +158,7 @@ bool acpi_dev_resource_io(struct acpi_resource *ares, struct resource *res)
break;
case ACPI_RESOURCE_TYPE_FIXED_IO:
fixed_io = &ares->data.fixed_io;
- if (!fixed_io->address_length)
+ if (!fixed_io->address && !fixed_io->address_length)
return false;
acpi_dev_get_ioresource(res, fixed_io->address,
fixed_io->address_length,
--
2.0.1
Jiri Slaby
2014-07-30 12:14:08 UTC
Permalink
From: Anand Avati <***@redhat.com>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 154210ccb3a871e631bf39fdeb7a8731d98af87b upstream.

The following test case demonstrates the bug:

sh# mount -t glusterfs localhost:meta-test /mnt/one

sh# mount -t glusterfs localhost:meta-test /mnt/two

sh# echo stuff > /mnt/one/file; rm -f /mnt/two/file; echo stuff > /mnt/one/file
bash: /mnt/one/file: Stale file handle

sh# echo stuff > /mnt/one/file; rm -f /mnt/two/file; sleep 1; echo stuff > /mnt/one/file

On the second open() on /mnt/one, FUSE would have used the old
nodeid (file handle) trying to re-open it. Gluster is returning
-ESTALE. The ESTALE propagates back to namei.c:filename_lookup()
where lookup is re-attempted with LOOKUP_REVAL. The right
behavior now, would be for FUSE to ignore the entry-timeout and
and do the up-call revalidation. Instead FUSE is ignoring
LOOKUP_REVAL, succeeding the revalidation (because entry-timeout
has not passed), and open() is again retried on the old file
handle and finally the ESTALE is going back to the application.

Fix: if revalidation is happening with LOOKUP_REVAL, then ignore
entry-timeout and always do the up-call.

Signed-off-by: Anand Avati <***@redhat.com>
Reviewed-by: Niels de Vos <***@redhat.com>
Signed-off-by: Miklos Szeredi <***@suse.cz>
Signed-off-by: Jiri Slaby <***@suse.cz>
---
fs/fuse/dir.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index 0afbf93f5935..936d40400c56 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -188,7 +188,8 @@ static int fuse_dentry_revalidate(struct dentry *entry, unsigned int flags)
inode = ACCESS_ONCE(entry->d_inode);
if (inode && is_bad_inode(inode))
goto invalid;
- else if (time_before64(fuse_dentry_time(entry), get_jiffies_64())) {
+ else if (time_before64(fuse_dentry_time(entry), get_jiffies_64()) ||
+ (flags & LOOKUP_REVAL)) {
int err;
struct fuse_entry_out outarg;
struct fuse_req *req;
--
2.0.1
Jiri Slaby
2014-07-30 12:14:12 UTC
Permalink
From: Axel Lin <***@ingics.com>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 6b00f440dd678d786389a7100a2e03fe44478431 upstream.

Dashes are not allowed in hwmon name attributes.
Use "da9055" instead of "da9055-hwmon".

Signed-off-by: Axel Lin <***@ingics.com>
Signed-off-by: Guenter Roeck <***@roeck-us.net>
Signed-off-by: Jiri Slaby <***@suse.cz>
---
drivers/hwmon/da9055-hwmon.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/hwmon/da9055-hwmon.c b/drivers/hwmon/da9055-hwmon.c
index 029ecabc4380..1b275a2881d6 100644
--- a/drivers/hwmon/da9055-hwmon.c
+++ b/drivers/hwmon/da9055-hwmon.c
@@ -204,7 +204,7 @@ static ssize_t da9055_hwmon_show_name(struct device *dev,
struct device_attribute *devattr,
char *buf)
{
- return sprintf(buf, "da9055-hwmon\n");
+ return sprintf(buf, "da9055\n");
}

static ssize_t show_label(struct device *dev,
--
2.0.1
Jiri Slaby
2014-07-30 12:14:13 UTC
Permalink
From: Axel Lin <***@ingics.com>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit ee14b644daaa58afe1e91bb9ebd9cf1b18d1f5fa upstream.

Dashes are not allowed in hwmon name attributes.
Use "da9052" instead of "da9052-hwmon".

Signed-off-by: Axel Lin <***@ingics.com>
Signed-off-by: Guenter Roeck <***@roeck-us.net>
Signed-off-by: Jiri Slaby <***@suse.cz>
---
drivers/hwmon/da9052-hwmon.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/hwmon/da9052-hwmon.c b/drivers/hwmon/da9052-hwmon.c
index 960fac3fb166..48044b044b7a 100644
--- a/drivers/hwmon/da9052-hwmon.c
+++ b/drivers/hwmon/da9052-hwmon.c
@@ -194,7 +194,7 @@ static ssize_t da9052_hwmon_show_name(struct device *dev,
struct device_attribute *devattr,
char *buf)
{
- return sprintf(buf, "da9052-hwmon\n");
+ return sprintf(buf, "da9052\n");
}

static ssize_t show_label(struct device *dev,
--
2.0.1
Jiri Slaby
2014-07-30 12:14:14 UTC
Permalink
From: Guenter Roeck <***@roeck-us.net>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit de12d6f4b10b21854441f5242dcb29ea96181e58 upstream.

Temperature limit registers are signed. Limits therefore need
to be clamped to (-128, 127) degrees C and not to (0, 255)
degrees C.

Without this fix, writing a limit of 128 degrees C sets the
actual limit to -128 degrees C.

Signed-off-by: Guenter Roeck <***@roeck-us.net>
Reviewed-by: Axel Lin <***@ingics.com>
Signed-off-by: Jiri Slaby <***@suse.cz>
---
drivers/hwmon/adt7470.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/hwmon/adt7470.c b/drivers/hwmon/adt7470.c
index 0f4dea5ccf17..9ee3913850d6 100644
--- a/drivers/hwmon/adt7470.c
+++ b/drivers/hwmon/adt7470.c
@@ -515,7 +515,7 @@ static ssize_t set_temp_min(struct device *dev,
return -EINVAL;

temp = DIV_ROUND_CLOSEST(temp, 1000);
- temp = clamp_val(temp, 0, 255);
+ temp = clamp_val(temp, -128, 127);

mutex_lock(&data->lock);
data->temp_min[attr->index] = temp;
@@ -549,7 +549,7 @@ static ssize_t set_temp_max(struct device *dev,
return -EINVAL;

temp = DIV_ROUND_CLOSEST(temp, 1000);
- temp = clamp_val(temp, 0, 255);
+ temp = clamp_val(temp, -128, 127);

mutex_lock(&data->lock);
data->temp_max[attr->index] = temp;
@@ -826,7 +826,7 @@ static ssize_t set_pwm_tmin(struct device *dev,
return -EINVAL;

temp = DIV_ROUND_CLOSEST(temp, 1000);
- temp = clamp_val(temp, 0, 255);
+ temp = clamp_val(temp, -128, 127);

mutex_lock(&data->lock);
data->pwm_tmin[attr->index] = temp;
--
2.0.1
Jiri Slaby
2014-07-30 12:14:04 UTC
Permalink
From: "K. Y. Srinivasan" <***@microsoft.com>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 9bd2d0dfe4714dd5d7c09a93a5c9ea9e14ceb3fc upstream.

Add code to poll the channel since we process only one message
at a time and the host may not interrupt us. Also increase the
receive buffer size since some KVP messages are close to 8K bytes in size.

Signed-off-by: K. Y. Srinivasan <***@microsoft.com>
Signed-off-by: Jiri Slaby <***@suse.cz>
---
drivers/hv/hv_kvp.c | 14 ++++++++++++--
drivers/hv/hv_util.c | 2 +-
2 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/drivers/hv/hv_kvp.c b/drivers/hv/hv_kvp.c
index 09988b289622..816782a65488 100644
--- a/drivers/hv/hv_kvp.c
+++ b/drivers/hv/hv_kvp.c
@@ -127,6 +127,15 @@ kvp_work_func(struct work_struct *dummy)
kvp_respond_to_host(NULL, HV_E_FAIL);
}

+static void poll_channel(struct vmbus_channel *channel)
+{
+ unsigned long flags;
+
+ spin_lock_irqsave(&channel->inbound_lock, flags);
+ hv_kvp_onchannelcallback(channel);
+ spin_unlock_irqrestore(&channel->inbound_lock, flags);
+}
+
static int kvp_handle_handshake(struct hv_kvp_msg *msg)
{
int ret = 1;
@@ -155,7 +164,7 @@ static int kvp_handle_handshake(struct hv_kvp_msg *msg)
kvp_register(dm_reg_value);
kvp_transaction.active = false;
if (kvp_transaction.kvp_context)
- hv_kvp_onchannelcallback(kvp_transaction.kvp_context);
+ poll_channel(kvp_transaction.kvp_context);
}
return ret;
}
@@ -568,6 +577,7 @@ response_done:

vmbus_sendpacket(channel, recv_buffer, buf_len, req_id,
VM_PKT_DATA_INBAND, 0);
+ poll_channel(channel);

}

@@ -603,7 +613,7 @@ void hv_kvp_onchannelcallback(void *context)
return;
}

- vmbus_recvpacket(channel, recv_buffer, PAGE_SIZE * 2, &recvlen,
+ vmbus_recvpacket(channel, recv_buffer, PAGE_SIZE * 4, &recvlen,
&requestid);

if (recvlen > 0) {
diff --git a/drivers/hv/hv_util.c b/drivers/hv/hv_util.c
index 273e3ddb3a20..665b7dac6b7d 100644
--- a/drivers/hv/hv_util.c
+++ b/drivers/hv/hv_util.c
@@ -312,7 +312,7 @@ static int util_probe(struct hv_device *dev,
(struct hv_util_service *)dev_id->driver_data;
int ret;

- srv->recv_buffer = kmalloc(PAGE_SIZE * 2, GFP_KERNEL);
+ srv->recv_buffer = kmalloc(PAGE_SIZE * 4, GFP_KERNEL);
if (!srv->recv_buffer)
return -ENOMEM;
if (srv->util_init) {
--
2.0.1
Jiri Slaby
2014-07-30 12:13:52 UTC
Permalink
From: Qipan Li <***@csr.com>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 07d410e06463f3c1c106e2bb2a7ff23eff1e71c9 upstream.

commit fb78b811422cd2d8c8605949cc4cc13618347ad5 provide a workaround for
kernel panic, but bring potential deadlock risk. that is in
sirfsoc_rx_tmo_process_tl while enter into sirfsoc_uart_pio_rx_chars
cpu hold uart_port->lock, if uart interrupt comes cpu enter into
sirfsoc_uart_isr and deadlock occurs in getting uart_port->lock.

the patch replace spin_lock version to spin_lock_irq* version to avoid
spinlock dead lock issue. let function tty_flip_buffer_push in tasklet
outof spin_lock_irq* protect area to avoid add the pair of spin_lock and
spin_unlock for tty_flip_buffer_push.
BTW drop self defined unused spinlock protect of tx_lock/rx_lock.

56274.220464] BUG: spinlock lockup suspected on CPU#0, swapper/0/0
[56274.223648] lock: 0xc05d9db0, .magic: dead4ead, .owner: swapper/0/0,
.owner_cpu: 0
[56274.231278] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G
O 3.10.35 #1
[56274.238241] [<c0015530>] (unwind_backtrace+0x0/0xf4) from
[<c00120d8>] (show_stack+0x10/0x14)
[56274.246742] [<c00120d8>] (show_stack+0x10/0x14) from
[<c01b11b0>] (do_raw_spin_lock+0x110/0x184)
[56274.255501] [<c01b11b0>] (do_raw_spin_lock+0x110/0x184) from
[<c02124c8>] (sirfsoc_uart_isr+0x20/0x42c)
[56274.264874] [<c02124c8>] (sirfsoc_uart_isr+0x20/0x42c) from
[<c0075790>] (handle_irq_event_percpu+0x54/0x17c)
[56274.274758] [<c0075790>] (handle_irq_event_percpu+0x54/0x17c)
from [<c00758f4>] (handle_irq_event+0x3c/0x5c)
[56274.284561] [<c00758f4>] (handle_irq_event+0x3c/0x5c) from
[<c0077fa0>] (handle_level_irq+0x98/0xfc)
[56274.293670] [<c0077fa0>] (handle_level_irq+0x98/0xfc) from
[<c0074f44>] (generic_handle_irq+0x2c/0x3c)
[56274.302952] [<c0074f44>] (generic_handle_irq+0x2c/0x3c) from
[<c000ef80>] (handle_IRQ+0x40/0x90)
[56274.311706] [<c000ef80>] (handle_IRQ+0x40/0x90) from
[<c000dc80>] (__irq_svc+0x40/0x70)
[56274.319697] [<c000dc80>] (__irq_svc+0x40/0x70) from
[<c038113c>] (_raw_spin_unlock_irqrestore+0x10/0x48)
[56274.329158] [<c038113c>]
(_raw_spin_unlock_irqrestore+0x10/0x48) from [<c0200034>]
(tty_port_tty_get+0x58/0x90)
[56274.339213] [<c0200034>] (tty_port_tty_get+0x58/0x90) from
[<c0212008>] (sirfsoc_uart_pio_rx_chars+0x1c/0xc8)
[56274.349097] [<c0212008>]
(sirfsoc_uart_pio_rx_chars+0x1c/0xc8) from [<c0212ef8>]
(sirfsoc_rx_tmo_process_tl+0xe4/0x1fc)
[56274.359853] [<c0212ef8>]
(sirfsoc_rx_tmo_process_tl+0xe4/0x1fc) from [<c0027c04>]
(tasklet_action+0x84/0x114)
[56274.369739] [<c0027c04>] (tasklet_action+0x84/0x114) from
[<c0027db4>] (__do_softirq+0x120/0x200)
[56274.378585] [<c0027db4>] (__do_softirq+0x120/0x200) from
[<c0027f44>] (do_softirq+0x54/0x5c)
[56274.386998] [<c0027f44>] (do_softirq+0x54/0x5c) from
[<c00281ec>] (irq_exit+0x9c/0xd0)
[56274.394899] [<c00281ec>] (irq_exit+0x9c/0xd0) from
[<c000ef84>] (handle_IRQ+0x44/0x90)
[56274.402790] [<c000ef84>] (handle_IRQ+0x44/0x90) from
[<c000dc80>] (__irq_svc+0x40/0x70)
[56274.410774] [<c000dc80>] (__irq_svc+0x40/0x70) from
[<c0288af4>] (cpuidle_enter_state+0x50/0xe0)
[56274.419532] [<c0288af4>] (cpuidle_enter_state+0x50/0xe0) from
[<c0288c34>] (cpuidle_idle_call+0xb0/0x148)
[56274.429080] [<c0288c34>] (cpuidle_idle_call+0xb0/0x148) from
[<c000f3ac>] (arch_cpu_idle+0x8/0x38)
[56274.438016] [<c000f3ac>] (arch_cpu_idle+0x8/0x38) from
[<c0059344>] (cpu_startup_entry+0xfc/0x140)
[56274.446956] [<c0059344>] (cpu_startup_entry+0xfc/0x140) from
[<c04a3a54>] (start_kernel+0x2d8/0x2e4)

Signed-off-by: Qipan Li <***@csr.com>
Signed-off-by: Barry Song <***@csr.com>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>
Signed-off-by: Jiri Slaby <***@suse.cz>
---
drivers/tty/serial/sirfsoc_uart.c | 49 ++++++++++++++-------------------------
drivers/tty/serial/sirfsoc_uart.h | 2 --
2 files changed, 18 insertions(+), 33 deletions(-)

diff --git a/drivers/tty/serial/sirfsoc_uart.c b/drivers/tty/serial/sirfsoc_uart.c
index 6904818d3424..ef61908cf9c3 100644
--- a/drivers/tty/serial/sirfsoc_uart.c
+++ b/drivers/tty/serial/sirfsoc_uart.c
@@ -359,9 +359,11 @@ static irqreturn_t sirfsoc_uart_usp_cts_handler(int irq, void *dev_id)
{
struct sirfsoc_uart_port *sirfport = (struct sirfsoc_uart_port *)dev_id;
struct uart_port *port = &sirfport->port;
+ spin_lock(&port->lock);
if (gpio_is_valid(sirfport->cts_gpio) && sirfport->ms_enabled)
uart_handle_cts_change(port,
!gpio_get_value(sirfport->cts_gpio));
+ spin_unlock(&port->lock);
return IRQ_HANDLED;
}

@@ -429,10 +431,6 @@ sirfsoc_uart_pio_rx_chars(struct uart_port *port, unsigned int max_rx_count)
sirfport->rx_io_count += rx_count;
port->icount.rx += rx_count;

- spin_unlock(&port->lock);
- tty_flip_buffer_push(&port->state->port);
- spin_lock(&port->lock);
-
return rx_count;
}

@@ -466,6 +464,7 @@ static void sirfsoc_uart_tx_dma_complete_callback(void *param)
struct circ_buf *xmit = &port->state->xmit;
unsigned long flags;

+ spin_lock_irqsave(&port->lock, flags);
xmit->tail = (xmit->tail + sirfport->transfer_size) &
(UART_XMIT_SIZE - 1);
port->icount.tx += sirfport->transfer_size;
@@ -474,10 +473,9 @@ static void sirfsoc_uart_tx_dma_complete_callback(void *param)
if (sirfport->tx_dma_addr)
dma_unmap_single(port->dev, sirfport->tx_dma_addr,
sirfport->transfer_size, DMA_TO_DEVICE);
- spin_lock_irqsave(&sirfport->tx_lock, flags);
sirfport->tx_dma_state = TX_DMA_IDLE;
sirfsoc_uart_tx_with_dma(sirfport);
- spin_unlock_irqrestore(&sirfport->tx_lock, flags);
+ spin_unlock_irqrestore(&port->lock, flags);
}

static void sirfsoc_uart_insert_rx_buf_to_tty(
@@ -490,7 +488,6 @@ static void sirfsoc_uart_insert_rx_buf_to_tty(
inserted = tty_insert_flip_string(tport,
sirfport->rx_dma_items[sirfport->rx_completed].xmit.buf, count);
port->icount.rx += inserted;
- tty_flip_buffer_push(tport);
}

static void sirfsoc_rx_submit_one_dma_desc(struct uart_port *port, int index)
@@ -525,7 +522,7 @@ static void sirfsoc_rx_tmo_process_tl(unsigned long param)
unsigned int count;
unsigned long flags;

- spin_lock_irqsave(&sirfport->rx_lock, flags);
+ spin_lock_irqsave(&port->lock, flags);
while (sirfport->rx_completed != sirfport->rx_issued) {
sirfsoc_uart_insert_rx_buf_to_tty(sirfport,
SIRFSOC_RX_DMA_BUF_SIZE);
@@ -540,12 +537,8 @@ static void sirfsoc_rx_tmo_process_tl(unsigned long param)
wr_regl(port, ureg->sirfsoc_rx_dma_io_ctrl,
rd_regl(port, ureg->sirfsoc_rx_dma_io_ctrl) |
SIRFUART_IO_MODE);
- spin_unlock_irqrestore(&sirfport->rx_lock, flags);
- spin_lock(&port->lock);
sirfsoc_uart_pio_rx_chars(port, 4 - sirfport->rx_io_count);
- spin_unlock(&port->lock);
if (sirfport->rx_io_count == 4) {
- spin_lock_irqsave(&sirfport->rx_lock, flags);
sirfport->rx_io_count = 0;
wr_regl(port, ureg->sirfsoc_int_st_reg,
uint_st->sirfsoc_rx_done);
@@ -556,11 +549,8 @@ static void sirfsoc_rx_tmo_process_tl(unsigned long param)
else
wr_regl(port, SIRFUART_INT_EN_CLR,
uint_en->sirfsoc_rx_done_en);
- spin_unlock_irqrestore(&sirfport->rx_lock, flags);
-
sirfsoc_uart_start_next_rx_dma(port);
} else {
- spin_lock_irqsave(&sirfport->rx_lock, flags);
wr_regl(port, ureg->sirfsoc_int_st_reg,
uint_st->sirfsoc_rx_done);
if (!sirfport->is_marco)
@@ -570,8 +560,9 @@ static void sirfsoc_rx_tmo_process_tl(unsigned long param)
else
wr_regl(port, ureg->sirfsoc_int_en_reg,
uint_en->sirfsoc_rx_done_en);
- spin_unlock_irqrestore(&sirfport->rx_lock, flags);
}
+ spin_unlock_irqrestore(&port->lock, flags);
+ tty_flip_buffer_push(&port->state->port);
}

static void sirfsoc_uart_handle_rx_tmo(struct sirfsoc_uart_port *sirfport)
@@ -580,8 +571,6 @@ static void sirfsoc_uart_handle_rx_tmo(struct sirfsoc_uart_port *sirfport)
struct sirfsoc_register *ureg = &sirfport->uart_reg->uart_reg;
struct sirfsoc_int_en *uint_en = &sirfport->uart_reg->uart_int_en;
struct dma_tx_state tx_state;
- spin_lock(&sirfport->rx_lock);
-
dmaengine_tx_status(sirfport->rx_dma_chan,
sirfport->rx_dma_items[sirfport->rx_issued].cookie, &tx_state);
dmaengine_terminate_all(sirfport->rx_dma_chan);
@@ -594,7 +583,6 @@ static void sirfsoc_uart_handle_rx_tmo(struct sirfsoc_uart_port *sirfport)
else
wr_regl(port, SIRFUART_INT_EN_CLR,
uint_en->sirfsoc_rx_timeout_en);
- spin_unlock(&sirfport->rx_lock);
tasklet_schedule(&sirfport->rx_tmo_process_tasklet);
}

@@ -658,7 +646,6 @@ static irqreturn_t sirfsoc_uart_isr(int irq, void *dev_id)
intr_status &= port->read_status_mask;
uart_insert_char(port, intr_status,
uint_en->sirfsoc_rx_oflow_en, 0, flag);
- tty_flip_buffer_push(&state->port);
}
recv_char:
if ((sirfport->uart_reg->uart_type == SIRF_REAL_UART) &&
@@ -683,6 +670,9 @@ recv_char:
sirfsoc_uart_pio_rx_chars(port,
SIRFSOC_UART_IO_RX_MAX_CNT);
}
+ spin_unlock(&port->lock);
+ tty_flip_buffer_push(&state->port);
+ spin_lock(&port->lock);
if (intr_status & uint_st->sirfsoc_txfifo_empty) {
if (IS_DMA_CHAN_VALID(sirfport->tx_dma_no))
sirfsoc_uart_tx_with_dma(sirfport);
@@ -701,6 +691,7 @@ recv_char:
}
}
spin_unlock(&port->lock);
+
return IRQ_HANDLED;
}

@@ -709,24 +700,27 @@ static void sirfsoc_uart_rx_dma_complete_tl(unsigned long param)
struct sirfsoc_uart_port *sirfport = (struct sirfsoc_uart_port *)param;
struct uart_port *port = &sirfport->port;
unsigned long flags;
- spin_lock_irqsave(&sirfport->rx_lock, flags);
+ spin_lock_irqsave(&port->rx_lock, flags);
while (sirfport->rx_completed != sirfport->rx_issued) {
sirfsoc_uart_insert_rx_buf_to_tty(sirfport,
SIRFSOC_RX_DMA_BUF_SIZE);
sirfsoc_rx_submit_one_dma_desc(port, sirfport->rx_completed++);
sirfport->rx_completed %= SIRFSOC_RX_LOOP_BUF_CNT;
}
- spin_unlock_irqrestore(&sirfport->rx_lock, flags);
+ spin_unlock_irqrestore(&port->lock, flags);
+ tty_flip_buffer_push(&port->state->port);
}

static void sirfsoc_uart_rx_dma_complete_callback(void *param)
{
struct sirfsoc_uart_port *sirfport = (struct sirfsoc_uart_port *)param;
- spin_lock(&sirfport->rx_lock);
+ unsigned long flags;
+
+ spin_lock_irqsave(&sirfport->port.lock, flags);
sirfport->rx_issued++;
sirfport->rx_issued %= SIRFSOC_RX_LOOP_BUF_CNT;
- spin_unlock(&sirfport->rx_lock);
tasklet_schedule(&sirfport->rx_dma_complete_tasklet);
+ spin_unlock_irqrestore(&sirfport->port.lock, flags);
}

/* submit rx dma task into dmaengine */
@@ -735,18 +729,14 @@ static void sirfsoc_uart_start_next_rx_dma(struct uart_port *port)
struct sirfsoc_uart_port *sirfport = to_sirfport(port);
struct sirfsoc_register *ureg = &sirfport->uart_reg->uart_reg;
struct sirfsoc_int_en *uint_en = &sirfport->uart_reg->uart_int_en;
- unsigned long flags;
int i;
- spin_lock_irqsave(&sirfport->rx_lock, flags);
sirfport->rx_io_count = 0;
wr_regl(port, ureg->sirfsoc_rx_dma_io_ctrl,
rd_regl(port, ureg->sirfsoc_rx_dma_io_ctrl) &
~SIRFUART_IO_MODE);
- spin_unlock_irqrestore(&sirfport->rx_lock, flags);
for (i = 0; i < SIRFSOC_RX_LOOP_BUF_CNT; i++)
sirfsoc_rx_submit_one_dma_desc(port, i);
sirfport->rx_completed = sirfport->rx_issued = 0;
- spin_lock_irqsave(&sirfport->rx_lock, flags);
if (!sirfport->is_marco)
wr_regl(port, ureg->sirfsoc_int_en_reg,
rd_regl(port, ureg->sirfsoc_int_en_reg) |
@@ -754,7 +744,6 @@ static void sirfsoc_uart_start_next_rx_dma(struct uart_port *port)
else
wr_regl(port, ureg->sirfsoc_int_en_reg,
SIRFUART_RX_DMA_INT_EN(port, uint_en));
- spin_unlock_irqrestore(&sirfport->rx_lock, flags);
}

static void sirfsoc_uart_start_rx(struct uart_port *port)
@@ -1455,8 +1444,6 @@ usp_no_flow_control:
ret = -EFAULT;
goto err;
}
- spin_lock_init(&sirfport->rx_lock);
- spin_lock_init(&sirfport->tx_lock);
tasklet_init(&sirfport->rx_dma_complete_tasklet,
sirfsoc_uart_rx_dma_complete_tl, (unsigned long)sirfport);
tasklet_init(&sirfport->rx_tmo_process_tasklet,
diff --git a/drivers/tty/serial/sirfsoc_uart.h b/drivers/tty/serial/sirfsoc_uart.h
index fb8d0a002607..38cb159138f1 100644
--- a/drivers/tty/serial/sirfsoc_uart.h
+++ b/drivers/tty/serial/sirfsoc_uart.h
@@ -438,8 +438,6 @@ struct sirfsoc_uart_port {
struct dma_chan *tx_dma_chan;
dma_addr_t tx_dma_addr;
struct dma_async_tx_descriptor *tx_dma_desc;
- spinlock_t rx_lock;
- spinlock_t tx_lock;
struct tasklet_struct rx_dma_complete_tasklet;
struct tasklet_struct rx_tmo_process_tasklet;
unsigned int rx_io_count;
--
2.0.1
Jiri Slaby
2014-07-30 12:14:06 UTC
Permalink
From: Miklos Szeredi <***@suse.cz>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 126b9d4365b110c157bc4cbc32540dfa66c9c85a upstream.

As suggested by checkpatch.pl, use time_before64() instead of direct
comparison of jiffies64 values.

Signed-off-by: Miklos Szeredi <***@suse.cz>
Signed-off-by: Jiri Slaby <***@suse.cz>
---
fs/fuse/dir.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index b7989f2ab4c4..0afbf93f5935 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -188,7 +188,7 @@ static int fuse_dentry_revalidate(struct dentry *entry, unsigned int flags)
inode = ACCESS_ONCE(entry->d_inode);
if (inode && is_bad_inode(inode))
goto invalid;
- else if (fuse_dentry_time(entry) < get_jiffies_64()) {
+ else if (time_before64(fuse_dentry_time(entry), get_jiffies_64())) {
int err;
struct fuse_entry_out outarg;
struct fuse_req *req;
@@ -945,7 +945,7 @@ int fuse_update_attributes(struct inode *inode, struct kstat *stat,
int err;
bool r;

- if (fi->i_time < get_jiffies_64()) {
+ if (time_before64(fi->i_time, get_jiffies_64())) {
r = true;
err = fuse_do_getattr(inode, stat, file);
} else {
@@ -1131,7 +1131,7 @@ static int fuse_permission(struct inode *inode, int mask)
((mask & MAY_EXEC) && S_ISREG(inode->i_mode))) {
struct fuse_inode *fi = get_fuse_inode(inode);

- if (fi->i_time < get_jiffies_64()) {
+ if (time_before64(fi->i_time, get_jiffies_64())) {
refreshed = true;

err = fuse_perm_getattr(inode, mask);
--
2.0.1
Jiri Slaby
2014-07-30 12:14:01 UTC
Permalink
From: Abbas Raza <***@mentor.com>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 953c66469735aed8d2ada639a72b150f01dae605 upstream.

There are 2 methods for ZLP (zero-length packet) generation:
1) In software
2) Automatic generation by device controller

1) is implemented in UDC driver and it attaches ZLP to IN packet if
descriptor->size < wLength
2) can be enabled/disabled by setting ZLT bit in the QH

When gadget ffs is connected to ubuntu host, the host sends
get descriptor request and wLength in setup packet is 255 while the
size of descriptor which will be sent by gadget in IN packet is
64 byte. So the composite driver sets req->zero = 1.
In UDC driver following code will be executed then

if (hwreq->req.zero && hwreq->req.length
&& (hwreq->req.length % hwep->ep.maxpacket == 0))
add_td_to_list(hwep, hwreq, 0);

Case-A:
So in case of ubuntu host, UDC driver will attach a ZLP to the IN packet.
ubuntu host will request 255 byte in IN request, gadget will send 64 byte
with ZLP and host will come to know that there is no more data.
But hold on, by default ZLT=0 for endpoint 0 so hardware also tries to
automatically generate the ZLP which blocks enumeration for ~6 seconds due
to endpoint 0 STALL, NAKs are sent to host for any requests (OUT/PING)

Case-B:
In case when gadget ffs is connected to Apple device, Apple device sends
setup packet with wLength=64. So descriptor->size = 64 and wLength=64
therefore req->zero = 0 and UDC driver will not attach any ZLP to the
IN packet. Apple device requests 64 bytes, gets 64 bytes and doesn't
further request for IN data. But ZLT=0 by default for endpoint 0 so
hardware tries to automatically generate the ZLP which blocks enumeration
for ~6 seconds due to endpoint 0 STALL, NAKs are sent to host for any
requests (OUT/PING)

According to USB2.0 specs:

8.5.3.2 Variable-length Data Stage
A control pipe may have a variable-length data phase in which the
host requests more data than is contained in the specified data
structure. When all of the data structure is returned to the host,
the function should indicate that the Data stage is ended by
returning a packet that is shorter than the MaxPacketSize for the
pipe. If the data structure is an exact multiple of wMaxPacketSize
for the pipe, the function will return a zero-length packet to indicate
the end of the Data stage.

In Case-A mentioned above:
If we disable software ZLP generation & ZLT=0 for endpoint 0 OR if software
ZLP generation is not disabled but we set ZLT=1 for endpoint 0 then
enumeration doesn't block for 6 seconds.

In Case-B mentioned above:
If we disable software ZLP generation & ZLT=0 for endpoint then enumeration
still blocks due to ZLP automatically generated by hardware and host not needing
it. But if we keep software ZLP generation enabled but we set ZLT=1 for
endpoint 0 then enumeration doesn't block for 6 seconds.

So the proper solution for this issue seems to disable automatic ZLP generation
by hardware (i.e by setting ZLT=1 for endpoint 0) and let software (UDC driver)
handle the ZLP generation based on req->zero field.

Signed-off-by: Abbas Raza <***@mentor.com>
Signed-off-by: Peter Chen <***@freescale.com>
Signed-off-by: Jiri Slaby <***@suse.cz>
---
drivers/usb/chipidea/udc.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/usb/chipidea/udc.c b/drivers/usb/chipidea/udc.c
index a18c2cfafe6d..455e4e6b9926 100644
--- a/drivers/usb/chipidea/udc.c
+++ b/drivers/usb/chipidea/udc.c
@@ -1178,8 +1178,8 @@ static int ep_enable(struct usb_ep *ep,

if (hwep->type == USB_ENDPOINT_XFER_CONTROL)
cap |= QH_IOS;
- if (hwep->num)
- cap |= QH_ZLT;
+
+ cap |= QH_ZLT;
cap |= (hwep->ep.maxpacket << __ffs(QH_MAX_PKT)) & QH_MAX_PKT;
/*
* For ISO-TX, we set mult at QH as the largest value, and use
--
2.0.1
Jiri Slaby
2014-07-30 12:13:59 UTC
Permalink
From: Catalin Marinas <***@arm.com>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 7f88f88f83ed609650a01b18572e605ea50cd163 upstream.

Commit 248ac0e1943a ("mm/vmalloc: remove guard page from between vmap
blocks") had the side effect of making vmap_area.va_end member point to
the next vmap_area.va_start. This was creating an artificial reference
to vmalloc'ed objects and kmemleak was rarely reporting vmalloc() leaks.

This patch marks the vmap_area containing pointers explicitly and
reduces the min ref_count to 2 as vm_struct still contains a reference
to the vmalloc'ed object. The kmemleak add_scan_area() function has
been improved to allow a SIZE_MAX argument covering the rest of the
object (for simpler calling sites).

Signed-off-by: Catalin Marinas <***@arm.com>
Signed-off-by: Andrew Morton <***@linux-foundation.org>
Signed-off-by: Linus Torvalds <***@linux-foundation.org>
Signed-off-by: Jiri Slaby <***@suse.cz>
---
mm/kmemleak.c | 4 +++-
mm/vmalloc.c | 14 ++++++++++----
2 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/mm/kmemleak.c b/mm/kmemleak.c
index e126b0ef9ad2..31f01c5011e5 100644
--- a/mm/kmemleak.c
+++ b/mm/kmemleak.c
@@ -753,7 +753,9 @@ static void add_scan_area(unsigned long ptr, size_t size, gfp_t gfp)
}

spin_lock_irqsave(&object->lock, flags);
- if (ptr + size > object->pointer + object->size) {
+ if (size == SIZE_MAX) {
+ size = object->pointer + object->size - ptr;
+ } else if (ptr + size > object->pointer + object->size) {
kmemleak_warn("Scan area larger than object 0x%08lx\n", ptr);
dump_object_info(object);
kmem_cache_free(scan_area_cache, area);
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 107454312d5e..e2be0f802ccf 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -359,6 +359,12 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
if (unlikely(!va))
return ERR_PTR(-ENOMEM);

+ /*
+ * Only scan the relevant parts containing pointers to other objects
+ * to avoid false negatives.
+ */
+ kmemleak_scan_area(&va->rb_node, SIZE_MAX, gfp_mask & GFP_RECLAIM_MASK);
+
retry:
spin_lock(&vmap_area_lock);
/*
@@ -1646,11 +1652,11 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align,
clear_vm_uninitialized_flag(area);

/*
- * A ref_count = 3 is needed because the vm_struct and vmap_area
- * structures allocated in the __get_vm_area_node() function contain
- * references to the virtual address of the vmalloc'ed block.
+ * A ref_count = 2 is needed because vm_struct allocated in
+ * __get_vm_area_node() contains a reference to the virtual address of
+ * the vmalloc'ed block.
*/
- kmemleak_alloc(addr, real_size, 3, gfp_mask);
+ kmemleak_alloc(addr, real_size, 2, gfp_mask);

return addr;
--
2.0.1
Jiri Slaby
2014-07-30 12:14:03 UTC
Permalink
From: Takashi Iwai <***@suse.de>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 4da63c6fc426023d1a20e45508c47d7d68c6a53d upstream.

When the initialization of Intel HDMI controller fails due to missing
i915 kernel symbols (e.g. HD-audio is built in while i915 is module),
the driver discontinues the probe. However, since the probe was done
asynchronously, the driver object still remains, thus the relevant PM
ops are still called at suspend/resume. This results in the bad access
to the incomplete audio card object, eventually leads to Oops or stall
at PM.

This patch adds the missing checks of chip->init_failed flag at each
PM callback in order to fix the problem above.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=79561
Signed-off-by: Takashi Iwai <***@suse.de>
Signed-off-by: Jiri Slaby <***@suse.cz>
---
sound/pci/hda/hda_intel.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c
index ee1a6ff120a2..37806a97c878 100644
--- a/sound/pci/hda/hda_intel.c
+++ b/sound/pci/hda/hda_intel.c
@@ -2917,7 +2917,7 @@ static int azx_suspend(struct device *dev)
struct azx *chip = card->private_data;
struct azx_pcm *p;

- if (chip->disabled)
+ if (chip->disabled || chip->init_failed)
return 0;

snd_power_change_state(card, SNDRV_CTL_POWER_D3hot);
@@ -2948,7 +2948,7 @@ static int azx_resume(struct device *dev)
struct snd_card *card = dev_get_drvdata(dev);
struct azx *chip = card->private_data;

- if (chip->disabled)
+ if (chip->disabled || chip->init_failed)
return 0;

if (chip->driver_caps & AZX_DCAPS_I915_POWERWELL)
@@ -2983,7 +2983,7 @@ static int azx_runtime_suspend(struct device *dev)
struct snd_card *card = dev_get_drvdata(dev);
struct azx *chip = card->private_data;

- if (chip->disabled)
+ if (chip->disabled || chip->init_failed)
return 0;

if (!(chip->driver_caps & AZX_DCAPS_PM_RUNTIME))
@@ -3009,7 +3009,7 @@ static int azx_runtime_resume(struct device *dev)
struct hda_codec *codec;
int status;

- if (chip->disabled)
+ if (chip->disabled || chip->init_failed)
return 0;

if (!(chip->driver_caps & AZX_DCAPS_PM_RUNTIME))
@@ -3044,7 +3044,7 @@ static int azx_runtime_idle(struct device *dev)
struct snd_card *card = dev_get_drvdata(dev);
struct azx *chip = card->private_data;

- if (chip->disabled)
+ if (chip->disabled || chip->init_failed)
return 0;

if (!power_save_controller ||
--
2.0.1
Jiri Slaby
2014-07-30 12:13:53 UTC
Permalink
From: Daniel Thompson <***@linaro.org>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 58eb97c99da6a82c556ddec70683eb3863d4f617 upstream.

After 07d410e0) serial: sirf: fix spinlock deadlock issue it is no longer
possiblet to compile this driver. The rename of one of the spinlocks is
faulty. After looking at the original patch I believe this is the correct
fix.

Compile tested using ARM's multi_v7_defconfig

Reported-by: Stephen Rothwell <***@canb.auug.org.au>
Cc: Jiri Slaby <***@suse.cz>
Cc: Qipan Li <***@csr.com>
Signed-off-by: Daniel Thompson <***@linaro.org>
Acked-by: Barry Song <***@kernel.org>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>
Signed-off-by: Jiri Slaby <***@suse.cz>
---
drivers/tty/serial/sirfsoc_uart.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/tty/serial/sirfsoc_uart.c b/drivers/tty/serial/sirfsoc_uart.c
index ef61908cf9c3..25aecf0fa339 100644
--- a/drivers/tty/serial/sirfsoc_uart.c
+++ b/drivers/tty/serial/sirfsoc_uart.c
@@ -700,7 +700,7 @@ static void sirfsoc_uart_rx_dma_complete_tl(unsigned long param)
struct sirfsoc_uart_port *sirfport = (struct sirfsoc_uart_port *)param;
struct uart_port *port = &sirfport->port;
unsigned long flags;
- spin_lock_irqsave(&port->rx_lock, flags);
+ spin_lock_irqsave(&port->lock, flags);
while (sirfport->rx_completed != sirfport->rx_issued) {
sirfsoc_uart_insert_rx_buf_to_tty(sirfport,
SIRFSOC_RX_DMA_BUF_SIZE);
--
2.0.1
Jiri Slaby
2014-07-30 12:13:55 UTC
Permalink
From: Tyler Hall <***@gmail.com>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 661f7fda21b15ec52f57fcd397c03370acc28688 upstream.

Use schedule_work() to avoid potentially taking the spinlock in
interrupt context.

Commit cc9fa74e2a ("slip/slcan: added locking in wakeup function") added
necessary locking to the wakeup function and 367525c8c2/ddcde142be ("can:
slcan: Fix spinlock variant") converted it to spin_lock_bh() because the lock
is also taken in timers.

Disabling softirqs is not sufficient, however, as tty drivers may call
write_wakeup from interrupt context. This driver calls tty->ops->write() with
its spinlock held, which may immediately cause an interrupt on the same CPU and
subsequent spin_bug().

Simply converting to spin_lock_irq/irqsave() prevents this deadlock, but
causes lockdep to point out a possible circular locking dependency
between these locks:

(&(&sl->lock)->rlock){-.....}, at: slip_write_wakeup
(&port_lock_key){-.....}, at: serial8250_handle_irq.part.13

The slip transmit is holding the slip spinlock when calling the tty write.
This grabs the port lock. On an interrupt, the handler grabs the port
lock and calls write_wakeup which grabs the slip lock. This could be a
problem if a serial interrupt occurs on another CPU during the slip
transmit.

To deal with these issues, don't grab the lock in the wakeup function by
deferring the writeout to a workqueue. Also hold the lock during close
when de-assigning the tty pointer to safely disarm the worker and
timers.

This bug is easily reproducible on the first transmit when slip is
used with the standard 8250 serial driver.

[<c0410b7c>] (spin_bug+0x0/0x38) from [<c006109c>] (do_raw_spin_lock+0x60/0x1d0)
r5:eab27000 r4:ec02754c
[<c006103c>] (do_raw_spin_lock+0x0/0x1d0) from [<c04185c0>] (_raw_spin_lock+0x28/0x2c)
r10:0000001f r9:eabb814c r8:eabb8140 r7:40070193 r6:ec02754c r5:eab27000
r4:ec02754c r3:00000000
[<c0418598>] (_raw_spin_lock+0x0/0x2c) from [<bf3a0220>] (slip_write_wakeup+0x50/0xe0 [slip])
r4:ec027540 r3:00000003
[<bf3a01d0>] (slip_write_wakeup+0x0/0xe0 [slip]) from [<c026e420>] (tty_wakeup+0x48/0x68)
r6:00000000 r5:ea80c480 r4:eab27000 r3:bf3a01d0
[<c026e3d8>] (tty_wakeup+0x0/0x68) from [<c028a8ec>] (uart_write_wakeup+0x2c/0x30)
r5:ed68ea90 r4:c06790d8
[<c028a8c0>] (uart_write_wakeup+0x0/0x30) from [<c028dc44>] (serial8250_tx_chars+0x114/0x170)
[<c028db30>] (serial8250_tx_chars+0x0/0x170) from [<c028dffc>] (serial8250_handle_irq+0xa0/0xbc)
r6:000000c2 r5:00000060 r4:c06790d8 r3:00000000
[<c028df5c>] (serial8250_handle_irq+0x0/0xbc) from [<c02933a4>] (dw8250_handle_irq+0x38/0x64)
r7:00000000 r6:edd2f390 r5:000000c2 r4:c06790d8
[<c029336c>] (dw8250_handle_irq+0x0/0x64) from [<c028d2f4>] (serial8250_interrupt+0x44/0xc4)
r6:00000000 r5:00000000 r4:c06791c4 r3:c029336c
[<c028d2b0>] (serial8250_interrupt+0x0/0xc4) from [<c0067fe4>] (handle_irq_event_percpu+0xb4/0x2b0)
r10:c06790d8 r9:eab27000 r8:00000000 r7:00000000 r6:0000001f r5:edd52980
r4:ec53b6c0 r3:c028d2b0
[<c0067f30>] (handle_irq_event_percpu+0x0/0x2b0) from [<c006822c>] (handle_irq_event+0x4c/0x6c)
r10:c06790d8 r9:eab27000 r8:c0673ae0 r7:c05c2020 r6:ec53b6c0 r5:edd529d4
r4:edd52980
[<c00681e0>] (handle_irq_event+0x0/0x6c) from [<c006b140>] (handle_level_irq+0xe8/0x100)
r6:00000000 r5:edd529d4 r4:edd52980 r3:00022000
[<c006b058>] (handle_level_irq+0x0/0x100) from [<c00676f8>] (generic_handle_irq+0x30/0x40)
r5:0000001f r4:0000001f
[<c00676c8>] (generic_handle_irq+0x0/0x40) from [<c000f57c>] (handle_IRQ+0xd0/0x13c)
r4:ea997b18 r3:000000e0
[<c000f4ac>] (handle_IRQ+0x0/0x13c) from [<c00086c4>] (armada_370_xp_handle_irq+0x4c/0x118)
r8:000003ff r7:ea997b18 r6:ffffffff r5:60070013 r4:c0674dc0
[<c0008678>] (armada_370_xp_handle_irq+0x0/0x118) from [<c0013840>] (__irq_svc+0x40/0x70)
Exception stack(0xea997b18 to 0xea997b60)
7b00: 00000001 20070013
7b20: 00000000 0000000b 20070013 eab27000 20070013 00000000 ed10103e eab27000
7b40: c06790d8 ea997b74 ea997b60 ea997b60 c04186c0 c04186c8 60070013 ffffffff
r9:eab27000 r8:ed10103e r7:ea997b4c r6:ffffffff r5:60070013 r4:c04186c8
[<c04186a4>] (_raw_spin_unlock_irqrestore+0x0/0x54) from [<c0288fc0>] (uart_start+0x40/0x44)
r4:c06790d8 r3:c028ddd8
[<c0288f80>] (uart_start+0x0/0x44) from [<c028982c>] (uart_write+0xe4/0xf4)
r6:0000003e r5:00000000 r4:ed68ea90 r3:0000003e
[<c0289748>] (uart_write+0x0/0xf4) from [<bf3a0d20>] (sl_xmit+0x1c4/0x228 [slip])
r10:ed388e60 r9:0000003c r8:ffffffdd r7:0000003e r6:ec02754c r5:ea717eb8
r4:ec027000
[<bf3a0b5c>] (sl_xmit+0x0/0x228 [slip]) from [<c0368d74>] (dev_hard_start_xmit+0x39c/0x6d0)
r8:eaf163c0 r7:ec027000 r6:ea717eb8 r5:00000000 r4:00000000

Signed-off-by: Tyler Hall <***@gmail.com>
Cc: Oliver Hartkopp <***@hartkopp.net>
Cc: Andre Naujoks <***@gmail.com>
Cc: David S. Miller <***@davemloft.net>
Cc: linux-***@vger.kernel.org
Signed-off-by: David S. Miller <***@davemloft.net>
Signed-off-by: Jiri Slaby <***@suse.cz>
---
drivers/net/slip/slip.c | 36 ++++++++++++++++++++++++++----------
drivers/net/slip/slip.h | 1 +
2 files changed, 27 insertions(+), 10 deletions(-)

diff --git a/drivers/net/slip/slip.c b/drivers/net/slip/slip.c
index ad4a94e9ff57..87526443841f 100644
--- a/drivers/net/slip/slip.c
+++ b/drivers/net/slip/slip.c
@@ -83,6 +83,7 @@
#include <linux/delay.h>
#include <linux/init.h>
#include <linux/slab.h>
+#include <linux/workqueue.h>
#include "slip.h"
#ifdef CONFIG_INET
#include <linux/ip.h>
@@ -416,36 +417,46 @@ static void sl_encaps(struct slip *sl, unsigned char *icp, int len)
#endif
}

-/*
- * Called by the driver when there's room for more data. If we have
- * more packets to send, we send them here.
- */
-static void slip_write_wakeup(struct tty_struct *tty)
+/* Write out any remaining transmit buffer. Scheduled when tty is writable */
+static void slip_transmit(struct work_struct *work)
{
+ struct slip *sl = container_of(work, struct slip, tx_work);
int actual;
- struct slip *sl = tty->disc_data;

+ spin_lock_bh(&sl->lock);
/* First make sure we're connected. */
- if (!sl || sl->magic != SLIP_MAGIC || !netif_running(sl->dev))
+ if (!sl->tty || sl->magic != SLIP_MAGIC || !netif_running(sl->dev)) {
+ spin_unlock_bh(&sl->lock);
return;
+ }

- spin_lock_bh(&sl->lock);
if (sl->xleft <= 0) {
/* Now serial buffer is almost free & we can start
* transmission of another packet */
sl->dev->stats.tx_packets++;
- clear_bit(TTY_DO_WRITE_WAKEUP, &tty->flags);
+ clear_bit(TTY_DO_WRITE_WAKEUP, &sl->tty->flags);
spin_unlock_bh(&sl->lock);
sl_unlock(sl);
return;
}

- actual = tty->ops->write(tty, sl->xhead, sl->xleft);
+ actual = sl->tty->ops->write(sl->tty, sl->xhead, sl->xleft);
sl->xleft -= actual;
sl->xhead += actual;
spin_unlock_bh(&sl->lock);
}

+/*
+ * Called by the driver when there's room for more data.
+ * Schedule the transmit.
+ */
+static void slip_write_wakeup(struct tty_struct *tty)
+{
+ struct slip *sl = tty->disc_data;
+
+ schedule_work(&sl->tx_work);
+}
+
static void sl_tx_timeout(struct net_device *dev)
{
struct slip *sl = netdev_priv(dev);
@@ -749,6 +760,7 @@ static struct slip *sl_alloc(dev_t line)
sl->magic = SLIP_MAGIC;
sl->dev = dev;
spin_lock_init(&sl->lock);
+ INIT_WORK(&sl->tx_work, slip_transmit);
sl->mode = SL_MODE_DEFAULT;
#ifdef CONFIG_SLIP_SMART
/* initialize timer_list struct */
@@ -872,8 +884,12 @@ static void slip_close(struct tty_struct *tty)
if (!sl || sl->magic != SLIP_MAGIC || sl->tty != tty)
return;

+ spin_lock_bh(&sl->lock);
tty->disc_data = NULL;
sl->tty = NULL;
+ spin_unlock_bh(&sl->lock);
+
+ flush_work(&sl->tx_work);

/* VSV = very important to remove timers */
#ifdef CONFIG_SLIP_SMART
diff --git a/drivers/net/slip/slip.h b/drivers/net/slip/slip.h
index 67673cf1266b..cf32aadf508f 100644
--- a/drivers/net/slip/slip.h
+++ b/drivers/net/slip/slip.h
@@ -53,6 +53,7 @@ struct slip {
struct tty_struct *tty; /* ptr to TTY structure */
struct net_device *dev; /* easy for intr handling */
spinlock_t lock;
+ struct work_struct tx_work; /* Flushes transmit buffer */

#ifdef SL_INCLUDE_CSLIP
struct slcompress *slcomp; /* for header compression */
--
2.0.1
Jiri Slaby
2014-07-30 12:13:58 UTC
Permalink
From: Hugh Dickins <***@google.com>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit b1a366500bd537b50c3aad26dc7df083ec03a448 upstream.

shmem_fault() is the actual culprit in trinity's hole-punch starvation,
and the most significant cause of such problems: since a page faulted is
one that then appears page_mapped(), needing unmap_mapping_range() and
i_mmap_mutex to be unmapped again.

But it is not the only way in which a page can be brought into a hole in
the radix_tree while that hole is being punched; and Vlastimil's testing
implies that if enough other processors are busy filling in the hole,
then shmem_undo_range() can be kept from completing indefinitely.

shmem_file_splice_read() is the main other user of SGP_CACHE, which can
instantiate shmem pagecache pages in the read-only case (without holding
i_mutex, so perhaps concurrently with a hole-punch). Probably it's
silly not to use SGP_READ already (using the ZERO_PAGE for holes): which
ought to be safe, but might bring surprises - not a change to be rushed.

shmem_read_mapping_page_gfp() is an internal interface used by
drivers/gpu/drm GEM (and next by uprobes): it should be okay. And
shmem_file_read_iter() uses the SGP_DIRTY variant of SGP_CACHE, when
called internally by the kernel (perhaps for a stacking filesystem,
which might rely on holes to be reserved): it's unclear whether it could
be provoked to keep hole-punch busy or not.

We could apply the same umbrella as now used in shmem_fault() to
shmem_file_splice_read() and the others; but it looks ugly, and use over
a range raises questions - should it actually be per page? can these get
starved themselves?

The origin of this part of the problem is my v3.1 commit d0823576bf4b
("mm: pincer in truncate_inode_pages_range"), once it was duplicated
into shmem.c. It seemed like a nice idea at the time, to ensure
(barring RCU lookup fuzziness) that there's an instant when the entire
hole is empty; but the indefinitely repeated scans to ensure that make
it vulnerable.

Revert that "enhancement" to hole-punch from shmem_undo_range(), but
retain the unproblematic rescanning when it's truncating; add a couple
of comments there.

Remove the "indices[0] >= end" test: that is now handled satisfactorily
by the inner loop, and mem_cgroup_uncharge_start()/end() are too light
to be worth avoiding here.

But if we do not always loop indefinitely, we do need to handle the case
of swap swizzled back to page before shmem_free_swap() gets it: add a
retry for that case, as suggested by Konstantin Khlebnikov; and for the
case of page swizzled back to swap, as suggested by Johannes Weiner.

Signed-off-by: Hugh Dickins <***@google.com>
Reported-by: Sasha Levin <***@oracle.com>
Suggested-by: Vlastimil Babka <***@suse.cz>
Cc: Konstantin Khlebnikov <***@gmail.com>
Cc: Johannes Weiner <***@cmpxchg.org>
Cc: Lukas Czerner <***@redhat.com>
Cc: Dave Jones <***@redhat.com>
Cc: <***@vger.kernel.org> [3.1+]
Signed-off-by: Andrew Morton <***@linux-foundation.org>
Signed-off-by: Linus Torvalds <***@linux-foundation.org>
Signed-off-by: Jiri Slaby <***@suse.cz>
---
mm/shmem.c | 24 +++++++++++++++---------
1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 6f5626fca3cc..0da81aaeb4cc 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -534,22 +534,19 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
return;

index = start;
- for ( ; ; ) {
+ while (index < end) {
cond_resched();
pvec.nr = shmem_find_get_pages_and_swap(mapping, index,
min(end - index, (pgoff_t)PAGEVEC_SIZE),
pvec.pages, indices);
if (!pvec.nr) {
- if (index == start || unfalloc)
+ /* If all gone or hole-punch or unfalloc, we're done */
+ if (index == start || end != -1)
break;
+ /* But if truncating, restart to make sure all gone */
index = start;
continue;
}
- if ((index == start || unfalloc) && indices[0] >= end) {
- shmem_deswap_pagevec(&pvec);
- pagevec_release(&pvec);
- break;
- }
mem_cgroup_uncharge_start();
for (i = 0; i < pagevec_count(&pvec); i++) {
struct page *page = pvec.pages[i];
@@ -561,8 +558,12 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
if (radix_tree_exceptional_entry(page)) {
if (unfalloc)
continue;
- nr_swaps_freed += !shmem_free_swap(mapping,
- index, page);
+ if (shmem_free_swap(mapping, index, page)) {
+ /* Swap was replaced by page: retry */
+ index--;
+ break;
+ }
+ nr_swaps_freed++;
continue;
}

@@ -571,6 +572,11 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
if (page->mapping == mapping) {
VM_BUG_ON(PageWriteback(page));
truncate_inode_page(mapping, page);
+ } else {
+ /* Page was replaced by swap: retry */
+ unlock_page(page);
+ index--;
+ break;
}
}
unlock_page(page);
--
2.0.1
Jiri Slaby
2014-07-30 12:13:56 UTC
Permalink
From: Hugh Dickins <***@google.com>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit f00cdc6df7d7cfcabb5b740911e6788cb0802bdb upstream.

Trinity finds that mmap access to a hole while it's punched from shmem
can prevent the madvise(MADV_REMOVE) or fallocate(FALLOC_FL_PUNCH_HOLE)
from completing, until the reader chooses to stop; with the puncher's
hold on i_mutex locking out all other writers until it can complete.

It appears that the tmpfs fault path is too light in comparison with its
hole-punching path, lacking an i_data_sem to obstruct it; but we don't
want to slow down the common case.

Extend shmem_fallocate()'s existing range notification mechanism, so
shmem_fault() can refrain from faulting pages into the hole while it's
punched, waiting instead on i_mutex (when safe to sleep; or repeatedly
faulting when not).

[***@linux-foundation.org: coding-style fixes]
Signed-off-by: Hugh Dickins <***@google.com>
Reported-by: Sasha Levin <***@oracle.com>
Tested-by: Sasha Levin <***@oracle.com>
Cc: Dave Jones <***@redhat.com>
Signed-off-by: Andrew Morton <***@linux-foundation.org>
Signed-off-by: Linus Torvalds <***@linux-foundation.org>

Signed-off-by: Jiri Slaby <***@suse.cz>
---
mm/shmem.c | 56 ++++++++++++++++++++++++++++++++++++++++++++++++++++----
1 file changed, 52 insertions(+), 4 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 8297623fcaed..00d412fd2254 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -80,11 +80,12 @@ static struct vfsmount *shm_mnt;
#define SHORT_SYMLINK_LEN 128

/*
- * shmem_fallocate and shmem_writepage communicate via inode->i_private
- * (with i_mutex making sure that it has only one user at a time):
- * we would prefer not to enlarge the shmem inode just for that.
+ * shmem_fallocate communicates with shmem_fault or shmem_writepage via
+ * inode->i_private (with i_mutex making sure that it has only one user at
+ * a time): we would prefer not to enlarge the shmem inode just for that.
*/
struct shmem_falloc {
+ int mode; /* FALLOC_FL mode currently operating */
pgoff_t start; /* start of range currently being fallocated */
pgoff_t next; /* the next page offset to be fallocated */
pgoff_t nr_falloced; /* how many new pages have been fallocated */
@@ -826,6 +827,7 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
spin_lock(&inode->i_lock);
shmem_falloc = inode->i_private;
if (shmem_falloc &&
+ !shmem_falloc->mode &&
index >= shmem_falloc->start &&
index < shmem_falloc->next)
shmem_falloc->nr_unswapped++;
@@ -1300,6 +1302,44 @@ static int shmem_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
int error;
int ret = VM_FAULT_LOCKED;

+ /*
+ * Trinity finds that probing a hole which tmpfs is punching can
+ * prevent the hole-punch from ever completing: which in turn
+ * locks writers out with its hold on i_mutex. So refrain from
+ * faulting pages into the hole while it's being punched, and
+ * wait on i_mutex to be released if vmf->flags permits.
+ */
+ if (unlikely(inode->i_private)) {
+ struct shmem_falloc *shmem_falloc;
+
+ spin_lock(&inode->i_lock);
+ shmem_falloc = inode->i_private;
+ if (!shmem_falloc ||
+ shmem_falloc->mode != FALLOC_FL_PUNCH_HOLE ||
+ vmf->pgoff < shmem_falloc->start ||
+ vmf->pgoff >= shmem_falloc->next)
+ shmem_falloc = NULL;
+ spin_unlock(&inode->i_lock);
+ /*
+ * i_lock has protected us from taking shmem_falloc seriously
+ * once return from shmem_fallocate() went back up that stack.
+ * i_lock does not serialize with i_mutex at all, but it does
+ * not matter if sometimes we wait unnecessarily, or sometimes
+ * miss out on waiting: we just need to make those cases rare.
+ */
+ if (shmem_falloc) {
+ if ((vmf->flags & FAULT_FLAG_ALLOW_RETRY) &&
+ !(vmf->flags & FAULT_FLAG_RETRY_NOWAIT)) {
+ up_read(&vma->vm_mm->mmap_sem);
+ mutex_lock(&inode->i_mutex);
+ mutex_unlock(&inode->i_mutex);
+ return VM_FAULT_RETRY;
+ }
+ /* cond_resched? Leave that to GUP or return to user */
+ return VM_FAULT_NOPAGE;
+ }
+ }
+
error = shmem_getpage(inode, vmf->pgoff, &vmf->page, SGP_CACHE, &ret);
if (error)
return ((error == -ENOMEM) ? VM_FAULT_OOM : VM_FAULT_SIGBUS);
@@ -1815,18 +1855,26 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,

mutex_lock(&inode->i_mutex);

+ shmem_falloc.mode = mode & ~FALLOC_FL_KEEP_SIZE;
+
if (mode & FALLOC_FL_PUNCH_HOLE) {
struct address_space *mapping = file->f_mapping;
loff_t unmap_start = round_up(offset, PAGE_SIZE);
loff_t unmap_end = round_down(offset + len, PAGE_SIZE) - 1;

+ shmem_falloc.start = unmap_start >> PAGE_SHIFT;
+ shmem_falloc.next = (unmap_end + 1) >> PAGE_SHIFT;
+ spin_lock(&inode->i_lock);
+ inode->i_private = &shmem_falloc;
+ spin_unlock(&inode->i_lock);
+
if ((u64)unmap_end > (u64)unmap_start)
unmap_mapping_range(mapping, unmap_start,
1 + unmap_end - unmap_start, 0);
shmem_truncate_range(inode, offset, offset + len - 1);
/* No need to unmap again: hole-punching leaves COWed pages */
error = 0;
- goto out;
+ goto undone;
}

/* We need to check rlimit even when FALLOC_FL_KEEP_SIZE */
--
2.0.1
Jiri Slaby
2014-07-30 12:13:57 UTC
Permalink
From: Hugh Dickins <***@google.com>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 8e205f779d1443a94b5ae81aa359cb535dd3021e upstream.

Commit f00cdc6df7d7 ("shmem: fix faulting into a hole while it's
punched") was buggy: Sasha sent a lockdep report to remind us that
grabbing i_mutex in the fault path is a no-no (write syscall may already
hold i_mutex while faulting user buffer).

We tried a completely different approach (see following patch) but that
proved inadequate: good enough for a rational workload, but not good
enough against trinity - which forks off so many mappings of the object
that contention on i_mmap_mutex while hole-puncher holds i_mutex builds
into serious starvation when concurrent faults force the puncher to fall
back to single-page unmap_mapping_range() searches of the i_mmap tree.

So return to the original umbrella approach, but keep away from i_mutex
this time. We really don't want to bloat every shmem inode with a new
mutex or completion, just to protect this unlikely case from trinity.
So extend the original with wait_queue_head on stack at the hole-punch
end, and wait_queue item on the stack at the fault end.

This involves further use of i_lock to guard against the races: lockdep
has been happy so far, and I see fs/inode.c:unlock_new_inode() holds
i_lock around wake_up_bit(), which is comparable to what we do here.
i_lock is more convenient, but we could switch to shmem's info->lock.

This issue has been tagged with CVE-2014-4171, which will require commit
f00cdc6df7d7 and this and the following patch to be backported: we
suggest to 3.1+, though in fact the trinity forkbomb effect might go
back as far as 2.6.16, when madvise(,,MADV_REMOVE) came in - or might
not, since much has changed, with i_mmap_mutex a spinlock before 3.0.
Anyone running trinity on 3.0 and earlier? I don't think we need care.

Signed-off-by: Hugh Dickins <***@google.com>
Reported-by: Sasha Levin <***@oracle.com>
Tested-by: Sasha Levin <***@oracle.com>
Cc: Vlastimil Babka <***@suse.cz>
Cc: Konstantin Khlebnikov <***@gmail.com>
Cc: Johannes Weiner <***@cmpxchg.org>
Cc: Lukas Czerner <***@redhat.com>
Cc: Dave Jones <***@redhat.com>
Cc: <***@vger.kernel.org> [3.1+]
Signed-off-by: Andrew Morton <***@linux-foundation.org>
Signed-off-by: Linus Torvalds <***@linux-foundation.org>
Signed-off-by: Jiri Slaby <***@suse.cz>
---
mm/shmem.c | 78 +++++++++++++++++++++++++++++++++++++++++---------------------
1 file changed, 52 insertions(+), 26 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 00d412fd2254..6f5626fca3cc 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -85,7 +85,7 @@ static struct vfsmount *shm_mnt;
* a time): we would prefer not to enlarge the shmem inode just for that.
*/
struct shmem_falloc {
- int mode; /* FALLOC_FL mode currently operating */
+ wait_queue_head_t *waitq; /* faults into hole wait for punch to end */
pgoff_t start; /* start of range currently being fallocated */
pgoff_t next; /* the next page offset to be fallocated */
pgoff_t nr_falloced; /* how many new pages have been fallocated */
@@ -827,7 +827,7 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
spin_lock(&inode->i_lock);
shmem_falloc = inode->i_private;
if (shmem_falloc &&
- !shmem_falloc->mode &&
+ !shmem_falloc->waitq &&
index >= shmem_falloc->start &&
index < shmem_falloc->next)
shmem_falloc->nr_unswapped++;
@@ -1306,38 +1306,58 @@ static int shmem_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
* Trinity finds that probing a hole which tmpfs is punching can
* prevent the hole-punch from ever completing: which in turn
* locks writers out with its hold on i_mutex. So refrain from
- * faulting pages into the hole while it's being punched, and
- * wait on i_mutex to be released if vmf->flags permits.
+ * faulting pages into the hole while it's being punched. Although
+ * shmem_undo_range() does remove the additions, it may be unable to
+ * keep up, as each new page needs its own unmap_mapping_range() call,
+ * and the i_mmap tree grows ever slower to scan if new vmas are added.
+ *
+ * It does not matter if we sometimes reach this check just before the
+ * hole-punch begins, so that one fault then races with the punch:
+ * we just need to make racing faults a rare case.
+ *
+ * The implementation below would be much simpler if we just used a
+ * standard mutex or completion: but we cannot take i_mutex in fault,
+ * and bloating every shmem inode for this unlikely case would be sad.
*/
if (unlikely(inode->i_private)) {
struct shmem_falloc *shmem_falloc;

spin_lock(&inode->i_lock);
shmem_falloc = inode->i_private;
- if (!shmem_falloc ||
- shmem_falloc->mode != FALLOC_FL_PUNCH_HOLE ||
- vmf->pgoff < shmem_falloc->start ||
- vmf->pgoff >= shmem_falloc->next)
- shmem_falloc = NULL;
- spin_unlock(&inode->i_lock);
- /*
- * i_lock has protected us from taking shmem_falloc seriously
- * once return from shmem_fallocate() went back up that stack.
- * i_lock does not serialize with i_mutex at all, but it does
- * not matter if sometimes we wait unnecessarily, or sometimes
- * miss out on waiting: we just need to make those cases rare.
- */
- if (shmem_falloc) {
+ if (shmem_falloc &&
+ shmem_falloc->waitq &&
+ vmf->pgoff >= shmem_falloc->start &&
+ vmf->pgoff < shmem_falloc->next) {
+ wait_queue_head_t *shmem_falloc_waitq;
+ DEFINE_WAIT(shmem_fault_wait);
+
+ ret = VM_FAULT_NOPAGE;
if ((vmf->flags & FAULT_FLAG_ALLOW_RETRY) &&
!(vmf->flags & FAULT_FLAG_RETRY_NOWAIT)) {
+ /* It's polite to up mmap_sem if we can */
up_read(&vma->vm_mm->mmap_sem);
- mutex_lock(&inode->i_mutex);
- mutex_unlock(&inode->i_mutex);
- return VM_FAULT_RETRY;
+ ret = VM_FAULT_RETRY;
}
- /* cond_resched? Leave that to GUP or return to user */
- return VM_FAULT_NOPAGE;
+
+ shmem_falloc_waitq = shmem_falloc->waitq;
+ prepare_to_wait(shmem_falloc_waitq, &shmem_fault_wait,
+ TASK_UNINTERRUPTIBLE);
+ spin_unlock(&inode->i_lock);
+ schedule();
+
+ /*
+ * shmem_falloc_waitq points into the shmem_fallocate()
+ * stack of the hole-punching task: shmem_falloc_waitq
+ * is usually invalid by the time we reach here, but
+ * finish_wait() does not dereference it in that case;
+ * though i_lock needed lest racing with wake_up_all().
+ */
+ spin_lock(&inode->i_lock);
+ finish_wait(shmem_falloc_waitq, &shmem_fault_wait);
+ spin_unlock(&inode->i_lock);
+ return ret;
}
+ spin_unlock(&inode->i_lock);
}

error = shmem_getpage(inode, vmf->pgoff, &vmf->page, SGP_CACHE, &ret);
@@ -1855,13 +1875,13 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,

mutex_lock(&inode->i_mutex);

- shmem_falloc.mode = mode & ~FALLOC_FL_KEEP_SIZE;
-
if (mode & FALLOC_FL_PUNCH_HOLE) {
struct address_space *mapping = file->f_mapping;
loff_t unmap_start = round_up(offset, PAGE_SIZE);
loff_t unmap_end = round_down(offset + len, PAGE_SIZE) - 1;
+ DECLARE_WAIT_QUEUE_HEAD_ONSTACK(shmem_falloc_waitq);

+ shmem_falloc.waitq = &shmem_falloc_waitq;
shmem_falloc.start = unmap_start >> PAGE_SHIFT;
shmem_falloc.next = (unmap_end + 1) >> PAGE_SHIFT;
spin_lock(&inode->i_lock);
@@ -1873,8 +1893,13 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
1 + unmap_end - unmap_start, 0);
shmem_truncate_range(inode, offset, offset + len - 1);
/* No need to unmap again: hole-punching leaves COWed pages */
+
+ spin_lock(&inode->i_lock);
+ inode->i_private = NULL;
+ wake_up_all(&shmem_falloc_waitq);
+ spin_unlock(&inode->i_lock);
error = 0;
- goto undone;
+ goto out;
}

/* We need to check rlimit even when FALLOC_FL_KEEP_SIZE */
@@ -1890,6 +1915,7 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
goto out;
}

+ shmem_falloc.waitq = NULL;
shmem_falloc.start = start;
shmem_falloc.next = start;
shmem_falloc.nr_falloced = 0;
--
2.0.1
Loading...