linux-kernel.vger.kernel.org archive mirror
* [PATCH 4.9 00/71] 4.9.134-stable review
@ 2018-10-16 17:08 Greg Kroah-Hartman
  2018-10-16 17:08 ` [PATCH 4.9 01/71] ASoC: wm8804: Add ACPI support Greg Kroah-Hartman
                   ` (75 more replies)
  0 siblings, 76 replies; 82+ messages in thread
From: Greg Kroah-Hartman @ 2018-10-16 17:08 UTC
  To: linux-kernel
  Cc: Greg Kroah-Hartman, torvalds, akpm, linux, shuah, patches,
	ben.hutchings, lkft-triage, stable

This is the start of the stable review cycle for the 4.9.134 release.
There are 71 patches in this series; all will be posted as responses
to this one.  If anyone has any issues with these being applied, please
let me know.

Responses should be made by Thu Oct 18 17:05:18 UTC 2018.
Anything received after that time might be too late.

The whole patch series can be found in one patch at:
	https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.134-rc1.gz
or in the git tree and branch at:
	git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.9.y
and the diffstat can be found below.

thanks,

greg k-h

-------------
Pseudo-Shortlog of commits:

Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Linux 4.9.134-rc1

Dan Carpenter <dan.carpenter@oracle.com>
    ipv4: frags: precedence bug in ip_expire()

Taehee Yoo <ap420073@gmail.com>
    ip: frags: fix crash in ip_do_fragment()

Peter Oskolkov <posk@google.com>
    ip: process in-order fragments efficiently

Peter Oskolkov <posk@google.com>
    ip: add helpers to process in-order fragments faster.

Peter Oskolkov <posk@google.com>
    ip: use rb trees for IP frag queue.

Eric Dumazet <edumazet@google.com>
    net: add rb_to_skb() and other rb tree helpers

Eric Dumazet <edumazet@google.com>
    net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends

Florian Westphal <fw@strlen.de>
    ipv6: defrag: drop non-last frags smaller than min mtu

Peter Oskolkov <posk@google.com>
    net: modify skb_rbtree_purge to return the truesize of all purged skbs.

Eric Dumazet <edumazet@google.com>
    net: speed up skb_rbtree_purge()

Peter Oskolkov <posk@google.com>
    ip: discard IPv4 datagrams with overlapping segments.

Eric Dumazet <edumazet@google.com>
    inet: frags: fix ip6frag_low_thresh boundary

Eric Dumazet <edumazet@google.com>
    inet: frags: get rid of ipfrag_skb_cb/FRAG_CB

Eric Dumazet <edumazet@google.com>
    inet: frags: reorganize struct netns_frags

Eric Dumazet <edumazet@google.com>
    rhashtable: reorganize struct rhashtable layout

Eric Dumazet <edumazet@google.com>
    ipv6: frags: rewrite ip6_expire_frag_queue()

Eric Dumazet <edumazet@google.com>
    inet: frags: do not clone skb in ip_expire()

Eric Dumazet <edumazet@google.com>
    inet: frags: break the 2GB limit for frags storage

Eric Dumazet <edumazet@google.com>
    inet: frags: remove inet_frag_maybe_warn_overflow()

Eric Dumazet <edumazet@google.com>
    inet: frags: get rif of inet_frag_evicting()

Eric Dumazet <edumazet@google.com>
    inet: frags: remove some helpers

Eric Dumazet <edumazet@google.com>
    inet: frags: use rhashtables for reassembly units

Eric Dumazet <edumazet@google.com>
    rhashtable: add schedule points

Eric Dumazet <edumazet@google.com>
    ipv6: export ip6 fragments sysctl to unprivileged users

Eric Dumazet <edumazet@google.com>
    inet: frags: refactor lowpan_net_frag_init()

Eric Dumazet <edumazet@google.com>
    inet: frags: refactor ipv6_frag_init()

Eric Dumazet <edumazet@google.com>
    inet: frags: refactor ipfrag_init()

Eric Dumazet <edumazet@google.com>
    inet: frags: add a pointer to struct netns_frags

Eric Dumazet <edumazet@google.com>
    inet: frags: change inet_frags_init_net() return value

Eric Dumazet <edumazet@google.com>
    inet: make sure to grab rcu_read_lock before using ireq->ireq_opt

Eric Dumazet <edumazet@google.com>
    tcp/dccp: fix lockdep issue when SYN is backlogged

Eric Dumazet <edumazet@google.com>
    rtnl: limit IFLA_NUM_TX_QUEUES and IFLA_NUM_RX_QUEUES to 4096

Florian Fainelli <f.fainelli@gmail.com>
    net: systemport: Fix wake-up interrupt race during resume

Maxime Chevallier <maxime.chevallier@bootlin.com>
    net: mvpp2: Extract the correct ethtype from the skb for tx csum offload

Florian Fainelli <f.fainelli@gmail.com>
    net: dsa: bcm_sf2: Fix unbind ordering

Ido Schimmel <idosch@mellanox.com>
    team: Forbid enslaving team device to itself

Giacinto Cifelli <gciofono@gmail.com>
    qmi_wwan: Added support for Gemalto's Cinterion ALASxx WWAN interface

Shahed Shaikh <shahed.shaikh@cavium.com>
    qlcnic: fix Tx descriptor corruption on 82xx devices

Yu Zhao <yuzhao@google.com>
    net/usb: cancel pending work when unbinding smsc75xx

Sean Tranchetti <stranche@codeaurora.org>
    netlabel: check for IPV4MASK in addrinfo_get

Jeff Barnhill <0xeffeff@gmail.com>
    net/ipv6: Display all addresses in output of /proc/net/if_inet6

Sabrina Dubroca <sd@queasysnail.net>
    net: ipv4: update fnhe_pmtu when first hop's MTU changes

Yunsheng Lin <linyunsheng@huawei.com>
    net: hns: fix for unmapping problem when SMMU is on

Florian Fainelli <f.fainelli@gmail.com>
    net: dsa: bcm_sf2: Call setup during switch resume

Wei Wang <weiwan@google.com>
    ipv6: take rcu lock in rawv6_send_hdrinc()

Eric Dumazet <edumazet@google.com>
    ipv4: fix use-after-free in ip_cmsg_recv_dstaddr()

Paolo Abeni <pabeni@redhat.com>
    ip_tunnel: be careful when accessing the inner header

Paolo Abeni <pabeni@redhat.com>
    ip6_tunnel: be careful when accessing the inner header

Mahesh Bandewar <maheshb@google.com>
    bonding: avoid possible dead-lock

Michael Chan <michael.chan@broadcom.com>
    bnxt_en: Fix TX timeout during netpoll.

Mathias Nyman <mathias.nyman@linux.intel.com>
    xhci: Don't print a warning when setting link state for disabled ports

Edgar Cherkasov <echerkasov@dev.rtsoft.ru>
    i2c: i2c-scmi: fix for i2c_smbus_write_block_data

Jan Kara <jack@suse.cz>
    mm: Preserve _PAGE_DEVMAP across mprotect() calls

Adrian Hunter <adrian.hunter@intel.com>
    perf script python: Fix export-to-postgresql.py occasional failure

Mikulas Patocka <mpatocka@redhat.com>
    mach64: detect the dot clock divider correctly on sparc

Paul Burton <paul.burton@mips.com>
    MIPS: VDSO: Always map near top of user memory

Jann Horn <jannh@google.com>
    mm/vmstat.c: fix outdated vmstat_text

Daniel Rosenberg <drosen@google.com>
    ext4: Fix error code in ext4_xattr_set_entry()

Amber Lin <Amber.Lin@amd.com>
    drm/amdgpu: Fix SDMA HQD destroy error on gfx_v7

Vitaly Kuznetsov <vkuznets@redhat.com>
    x86/kvm/lapic: always disable MMIO interface in x2APIC mode

Nicolas Ferre <nicolas.ferre@microchip.com>
    ARM: dts: at91: add new compatibility string for macb on sama5d3

Nicolas Ferre <nicolas.ferre@microchip.com>
    net: macb: disable scatter-gather for macb on sama5d3

Jongsung Kim <neidhard.kim@lge.com>
    stmmac: fix valid numbers of unicast filter entries

Yu Zhao <yuzhao@google.com>
    sound: enable interrupt after dma buffer initialization

Dan Carpenter <dan.carpenter@oracle.com>
    scsi: qla2xxx: Fix an endian bug in fcpcmd_is_corrupted()

Laura Abbott <labbott@redhat.com>
    scsi: iscsi: target: Don't use stack buffer for scatterlist

Tony Lindgren <tony@atomide.com>
    mfd: omap-usb-host: Fix dts probe of children

Lei Yang <Lei.Yang@windriver.com>
    selftests: memory-hotplug: add required configs

Lei Yang <Lei.Yang@windriver.com>
    selftests/efivarfs: add required kernel configs

Danny Smith <danny.smith@axis.com>
    ASoC: sigmadsp: safeload should not have lower byte limit

Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
    ASoC: wm8804: Add ACPI support


-------------

Diffstat:

 Documentation/devicetree/bindings/net/macb.txt     |   1 +
 Documentation/networking/ip-sysctl.txt             |  13 +-
 Makefile                                           |   4 +-
 arch/arm/boot/dts/sama5d3_emac.dtsi                |   2 +-
 arch/mips/include/asm/processor.h                  |  10 +-
 arch/mips/kernel/process.c                         |  25 +
 arch/mips/kernel/vdso.c                            |  18 +-
 arch/powerpc/include/asm/book3s/64/pgtable.h       |   4 +-
 arch/x86/include/asm/pgtable_types.h               |   2 +-
 arch/x86/include/uapi/asm/kvm.h                    |   1 +
 arch/x86/kvm/lapic.c                               |  22 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c  |   2 +-
 drivers/i2c/busses/i2c-scmi.c                      |   1 +
 drivers/mfd/omap-usb-host.c                        |  11 +-
 drivers/net/bonding/bond_main.c                    |  43 +-
 drivers/net/dsa/bcm_sf2.c                          |  12 +-
 drivers/net/ethernet/broadcom/bcmsysport.c         |  22 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.c          |  13 +-
 drivers/net/ethernet/cadence/macb.c                |   8 +
 drivers/net/ethernet/hisilicon/hns/hnae.c          |   2 +-
 drivers/net/ethernet/hisilicon/hns/hns_enet.c      |  30 +-
 drivers/net/ethernet/marvell/mvpp2.c               |  10 +-
 drivers/net/ethernet/qlogic/qlcnic/qlcnic.h        |   8 +-
 .../net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c    |   3 +-
 .../net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.h    |   3 +-
 drivers/net/ethernet/qlogic/qlcnic/qlcnic_hw.h     |   3 +-
 drivers/net/ethernet/qlogic/qlcnic/qlcnic_io.c     |  12 +-
 .../net/ethernet/stmicro/stmmac/stmmac_platform.c  |   5 +-
 drivers/net/team/team.c                            |   5 +
 drivers/net/usb/qmi_wwan.c                         |   1 +
 drivers/net/usb/smsc75xx.c                         |   1 +
 drivers/scsi/qla2xxx/qla_target.h                  |   4 +-
 drivers/target/iscsi/iscsi_target.c                |  22 +-
 drivers/usb/host/xhci-hub.c                        |  18 +-
 drivers/video/fbdev/aty/atyfb.h                    |   3 +-
 drivers/video/fbdev/aty/atyfb_base.c               |   7 +-
 drivers/video/fbdev/aty/mach64_ct.c                |  10 +-
 fs/ext4/xattr.c                                    |   2 +-
 include/linux/netdevice.h                          |   7 +
 include/linux/rhashtable.h                         |   4 +-
 include/linux/skbuff.h                             |  34 +-
 include/net/bonding.h                              |   7 +-
 include/net/inet_frag.h                            | 133 +++--
 include/net/inet_sock.h                            |   6 -
 include/net/ip.h                                   |   1 -
 include/net/ip_fib.h                               |   1 +
 include/net/ipv6.h                                 |  26 +-
 include/uapi/linux/snmp.h                          |   1 +
 lib/rhashtable.c                                   |   5 +-
 mm/vmstat.c                                        |   1 -
 net/core/dev.c                                     |  28 +-
 net/core/rtnetlink.c                               |   6 +
 net/core/skbuff.c                                  |  31 +-
 net/dccp/input.c                                   |   4 +-
 net/dccp/ipv4.c                                    |   4 +-
 net/ieee802154/6lowpan/6lowpan_i.h                 |  26 +-
 net/ieee802154/6lowpan/reassembly.c                | 148 +++---
 net/ipv4/fib_frontend.c                            |  12 +-
 net/ipv4/fib_semantics.c                           |  50 ++
 net/ipv4/inet_connection_sock.c                    |   5 +-
 net/ipv4/inet_fragment.c                           | 379 +++-----------
 net/ipv4/ip_fragment.c                             | 573 ++++++++++++---------
 net/ipv4/ip_sockglue.c                             |   3 +-
 net/ipv4/ip_tunnel.c                               |   9 +
 net/ipv4/proc.c                                    |   7 +-
 net/ipv4/tcp_input.c                               |  37 +-
 net/ipv4/tcp_ipv4.c                                |   4 +-
 net/ipv6/addrconf.c                                |   4 +-
 net/ipv6/ip6_tunnel.c                              |  13 +-
 net/ipv6/netfilter/nf_conntrack_reasm.c            | 100 ++--
 net/ipv6/proc.c                                    |   5 +-
 net/ipv6/raw.c                                     |  29 +-
 net/ipv6/reassembly.c                              | 212 ++++----
 net/netlabel/netlabel_unlabeled.c                  |   3 +-
 sound/hda/hdac_controller.c                        |   8 +-
 sound/soc/codecs/sigmadsp.c                        |   3 +-
 sound/soc/codecs/wm8804-i2c.c                      |  15 +-
 tools/perf/scripts/python/export-to-postgresql.py  |   9 +
 tools/testing/selftests/efivarfs/config            |   1 +
 tools/testing/selftests/memory-hotplug/config      |   1 +
 80 files changed, 1185 insertions(+), 1133 deletions(-)




* [PATCH 4.9 01/71] ASoC: wm8804: Add ACPI support
  2018-10-16 17:08 [PATCH 4.9 00/71] 4.9.134-stable review Greg Kroah-Hartman
@ 2018-10-16 17:08 ` Greg Kroah-Hartman
  2018-10-16 17:08 ` [PATCH 4.9 02/71] ASoC: sigmadsp: safeload should not have lower byte limit Greg Kroah-Hartman
                   ` (74 subsequent siblings)
  75 siblings, 0 replies; 82+ messages in thread
From: Greg Kroah-Hartman @ 2018-10-16 17:08 UTC
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Pierre-Louis Bossart, Charles Keepax,
	Mark Brown, Sasha Levin

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>

[ Upstream commit 960cdd50ca9fdfeb82c2757107bcb7f93c8d7d41 ]

The HID is made of either the Wolfson or the Cirrus Logic PCI ID,
followed by the 8804 part identifier.

This helps enumerate the HifiBerry Digi+ HAT boards on the Up2 platform.

The scripts at https://github.com/thesofproject/acpi-scripts can be
used to add the ACPI initrd overlays.

Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Acked-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 sound/soc/codecs/wm8804-i2c.c |   15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

--- a/sound/soc/codecs/wm8804-i2c.c
+++ b/sound/soc/codecs/wm8804-i2c.c
@@ -13,6 +13,7 @@
 #include <linux/init.h>
 #include <linux/module.h>
 #include <linux/i2c.h>
+#include <linux/acpi.h>
 
 #include "wm8804.h"
 
@@ -40,17 +41,29 @@ static const struct i2c_device_id wm8804
 };
 MODULE_DEVICE_TABLE(i2c, wm8804_i2c_id);
 
+#if defined(CONFIG_OF)
 static const struct of_device_id wm8804_of_match[] = {
 	{ .compatible = "wlf,wm8804", },
 	{ }
 };
 MODULE_DEVICE_TABLE(of, wm8804_of_match);
+#endif
+
+#ifdef CONFIG_ACPI
+static const struct acpi_device_id wm8804_acpi_match[] = {
+	{ "1AEC8804", 0 }, /* Wolfson PCI ID + part ID */
+	{ "10138804", 0 }, /* Cirrus Logic PCI ID + part ID */
+	{ },
+};
+MODULE_DEVICE_TABLE(acpi, wm8804_acpi_match);
+#endif
 
 static struct i2c_driver wm8804_i2c_driver = {
 	.driver = {
 		.name = "wm8804",
 		.pm = &wm8804_pm,
-		.of_match_table = wm8804_of_match,
+		.of_match_table = of_match_ptr(wm8804_of_match),
+		.acpi_match_table = ACPI_PTR(wm8804_acpi_match),
 	},
 	.probe = wm8804_i2c_probe,
 	.remove = wm8804_i2c_remove,
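
Note on the HID scheme described in the commit message: the following is
a minimal userspace sketch, not kernel code, of how the two strings in
wm8804_acpi_match[] are composed from a 16-bit PCI vendor ID and the
16-bit part ID. The helper names make_hid() and hid_is_known() are made
up for illustration; only the two ID strings come from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Same two strings as wm8804_acpi_match[] in the patch above. */
const char *const known_hids[] = { "1AEC8804", "10138804" };

/* Build "<vendor><part>" as eight uppercase hex digits (plus NUL). */
void make_hid(char out[9], uint16_t vendor, uint16_t part)
{
	int n = snprintf(out, 9, "%04X%04X", (unsigned)vendor, (unsigned)part);

	assert(n == 8);
	(void)n; /* keeps the build quiet when NDEBUG disables assert() */
}

/* Return 1 if hid is one of the IDs listed in known_hids[]. */
int hid_is_known(const char *hid)
{
	size_t i;

	for (i = 0; i < sizeof(known_hids) / sizeof(known_hids[0]); i++)
		if (strcmp(hid, known_hids[i]) == 0)
			return 1;
	return 0;
}
```

Calling make_hid(buf, 0x1AEC, 0x8804) fills buf with the Wolfson-based
ID from the table, and make_hid(buf, 0x1013, 0x8804) with the Cirrus
Logic one.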




* [PATCH 4.9 02/71] ASoC: sigmadsp: safeload should not have lower byte limit
  2018-10-16 17:08 [PATCH 4.9 00/71] 4.9.134-stable review Greg Kroah-Hartman
  2018-10-16 17:08 ` [PATCH 4.9 01/71] ASoC: wm8804: Add ACPI support Greg Kroah-Hartman
@ 2018-10-16 17:08 ` Greg Kroah-Hartman
  2018-10-16 17:09 ` [PATCH 4.9 03/71] selftests/efivarfs: add required kernel configs Greg Kroah-Hartman
                   ` (73 subsequent siblings)
  75 siblings, 0 replies; 82+ messages in thread
From: Greg Kroah-Hartman @ 2018-10-16 17:08 UTC
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Danny Smith, Lars-Peter Clausen,
	Mark Brown, Sasha Levin

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Danny Smith <danny.smith@axis.com>

[ Upstream commit 5ea752c6efdf5aa8a57aed816d453a8f479f1b0a ]

Fix the range in the safeload conditional to allow safeload of up to
20 bytes, without a lower limit.

Signed-off-by: Danny Smith <dannys@axis.com>
Acked-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 sound/soc/codecs/sigmadsp.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

--- a/sound/soc/codecs/sigmadsp.c
+++ b/sound/soc/codecs/sigmadsp.c
@@ -117,8 +117,7 @@ static int sigmadsp_ctrl_write(struct si
 	struct sigmadsp_control *ctrl, void *data)
 {
 	/* safeload loads up to 20 bytes in a atomic operation */
-	if (ctrl->num_bytes > 4 && ctrl->num_bytes <= 20 && sigmadsp->ops &&
-	    sigmadsp->ops->safeload)
+	if (ctrl->num_bytes <= 20 && sigmadsp->ops && sigmadsp->ops->safeload)
 		return sigmadsp->ops->safeload(sigmadsp, ctrl->addr, data,
 			ctrl->num_bytes);
 	else
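
Note on the changed condition: a minimal userspace sketch, not the
driver code, comparing the size check before and after this patch. The
function names and the SAFELOAD_MAX_BYTES macro are made up for
illustration, and the two pointer checks from the driver are folded into
a single have_safeload_op flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SAFELOAD_MAX_BYTES 20 /* upper bound from the patch; name is made up */

/* Condition before the patch: only 5..20 bytes use safeload. */
bool safeload_eligible_old(size_t num_bytes, bool have_safeload_op)
{
	return num_bytes > 4 && num_bytes <= SAFELOAD_MAX_BYTES &&
	       have_safeload_op;
}

/* Condition after the patch: anything up to 20 bytes uses safeload. */
bool safeload_eligible_new(size_t num_bytes, bool have_safeload_op)
{
	return num_bytes <= SAFELOAD_MAX_BYTES && have_safeload_op;
}

/* Small controls are the only inputs where the two checks disagree. */
void safeload_examples(void)
{
	assert(!safeload_eligible_old(4, true));
	assert(safeload_eligible_new(4, true));
	assert(safeload_eligible_old(20, true) == safeload_eligible_new(20, true));
	assert(safeload_eligible_old(21, true) == safeload_eligible_new(21, true));
}
```

In this sketch a 4-byte control is the smallest case that changes
behaviour: the old check sends it down the regular write path, the new
one uses safeload.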




* [PATCH 4.9 03/71] selftests/efivarfs: add required kernel configs
  2018-10-16 17:08 [PATCH 4.9 00/71] 4.9.134-stable review Greg Kroah-Hartman
  2018-10-16 17:08 ` [PATCH 4.9 01/71] ASoC: wm8804: Add ACPI support Greg Kroah-Hartman
  2018-10-16 17:08 ` [PATCH 4.9 02/71] ASoC: sigmadsp: safeload should not have lower byte limit Greg Kroah-Hartman
@ 2018-10-16 17:09 ` Greg Kroah-Hartman
  2018-10-16 17:09 ` [PATCH 4.9 04/71] selftests: memory-hotplug: add required configs Greg Kroah-Hartman
                   ` (72 subsequent siblings)
  75 siblings, 0 replies; 82+ messages in thread
From: Greg Kroah-Hartman @ 2018-10-16 17:09 UTC
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Lei Yang, Shuah Khan (Samsung OSG),
	Sasha Levin

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Lei Yang <Lei.Yang@windriver.com>

[ Upstream commit 53cf59d6c0ad3edc4f4449098706a8f8986258b6 ]

add config file

Signed-off-by: Lei Yang <Lei.Yang@windriver.com>
Signed-off-by: Shuah Khan (Samsung OSG) <shuah@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 tools/testing/selftests/efivarfs/config |    1 +
 1 file changed, 1 insertion(+)
 create mode 100644 tools/testing/selftests/efivarfs/config

--- /dev/null
+++ b/tools/testing/selftests/efivarfs/config
@@ -0,0 +1 @@
+CONFIG_EFIVAR_FS=y




* [PATCH 4.9 04/71] selftests: memory-hotplug: add required configs
  2018-10-16 17:08 [PATCH 4.9 00/71] 4.9.134-stable review Greg Kroah-Hartman
                   ` (2 preceding siblings ...)
  2018-10-16 17:09 ` [PATCH 4.9 03/71] selftests/efivarfs: add required kernel configs Greg Kroah-Hartman
@ 2018-10-16 17:09 ` Greg Kroah-Hartman
  2018-10-16 17:09 ` [PATCH 4.9 05/71] mfd: omap-usb-host: Fix dts probe of children Greg Kroah-Hartman
                   ` (71 subsequent siblings)
  75 siblings, 0 replies; 82+ messages in thread
From: Greg Kroah-Hartman @ 2018-10-16 17:09 UTC
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Lei Yang, Shuah Khan (Samsung OSG),
	Sasha Levin

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Lei Yang <Lei.Yang@windriver.com>

[ Upstream commit 4d85af102a66ee6aeefa596f273169e77fb2b48e ]

Add CONFIG_MEMORY_HOTREMOVE=y to the config. Without this option,
/sys/devices/system/memory/memory*/removable always returns 0, and I
end up getting an early skip during the test.

Signed-off-by: Lei Yang <Lei.Yang@windriver.com>
Signed-off-by: Shuah Khan (Samsung OSG) <shuah@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 tools/testing/selftests/memory-hotplug/config |    1 +
 1 file changed, 1 insertion(+)

--- a/tools/testing/selftests/memory-hotplug/config
+++ b/tools/testing/selftests/memory-hotplug/config
@@ -2,3 +2,4 @@ CONFIG_MEMORY_HOTPLUG=y
 CONFIG_MEMORY_HOTPLUG_SPARSE=y
 CONFIG_NOTIFIER_ERROR_INJECTION=y
 CONFIG_MEMORY_NOTIFIER_ERROR_INJECT=m
+CONFIG_MEMORY_HOTREMOVE=y




* [PATCH 4.9 05/71] mfd: omap-usb-host: Fix dts probe of children
  2018-10-16 17:08 [PATCH 4.9 00/71] 4.9.134-stable review Greg Kroah-Hartman
                   ` (3 preceding siblings ...)
  2018-10-16 17:09 ` [PATCH 4.9 04/71] selftests: memory-hotplug: add required configs Greg Kroah-Hartman
@ 2018-10-16 17:09 ` Greg Kroah-Hartman
  2018-10-16 17:09 ` [PATCH 4.9 06/71] scsi: iscsi: target: Dont use stack buffer for scatterlist Greg Kroah-Hartman
                   ` (70 subsequent siblings)
  75 siblings, 0 replies; 82+ messages in thread
From: Greg Kroah-Hartman @ 2018-10-16 17:09 UTC
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Tony Lindgren, Roger Quadros,
	Lee Jones, Sasha Levin

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Tony Lindgren <tony@atomide.com>

[ Upstream commit 10492ee8ed9188d6d420e1f79b2b9bdbc0624e65 ]

It currently only works if the parent bus uses "simple-bus". We
currently try to probe children with nonexistent compatible values,
and the driver is missing .probe.

I noticed this while testing devices configured to probe using ti-sysc
interconnect target module driver. For that we also may want to rebind
the driver, so let's remove __init and __exit.

Signed-off-by: Tony Lindgren <tony@atomide.com>
Acked-by: Roger Quadros <rogerq@ti.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/mfd/omap-usb-host.c |   11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

--- a/drivers/mfd/omap-usb-host.c
+++ b/drivers/mfd/omap-usb-host.c
@@ -548,8 +548,8 @@ static int usbhs_omap_get_dt_pdata(struc
 }
 
 static const struct of_device_id usbhs_child_match_table[] = {
-	{ .compatible = "ti,omap-ehci", },
-	{ .compatible = "ti,omap-ohci", },
+	{ .compatible = "ti,ehci-omap", },
+	{ .compatible = "ti,ohci-omap3", },
 	{ }
 };
 
@@ -875,6 +875,7 @@ static struct platform_driver usbhs_omap
 		.pm		= &usbhsomap_dev_pm_ops,
 		.of_match_table = usbhs_omap_dt_ids,
 	},
+	.probe		= usbhs_omap_probe,
 	.remove		= usbhs_omap_remove,
 };
 
@@ -884,9 +885,9 @@ MODULE_ALIAS("platform:" USBHS_DRIVER_NA
 MODULE_LICENSE("GPL v2");
 MODULE_DESCRIPTION("usb host common core driver for omap EHCI and OHCI");
 
-static int __init omap_usbhs_drvinit(void)
+static int omap_usbhs_drvinit(void)
 {
-	return platform_driver_probe(&usbhs_omap_driver, usbhs_omap_probe);
+	return platform_driver_register(&usbhs_omap_driver);
 }
 
 /*
@@ -898,7 +899,7 @@ static int __init omap_usbhs_drvinit(voi
  */
 fs_initcall_sync(omap_usbhs_drvinit);
 
-static void __exit omap_usbhs_drvexit(void)
+static void omap_usbhs_drvexit(void)
 {
 	platform_driver_unregister(&usbhs_omap_driver);
 }




* [PATCH 4.9 06/71] scsi: iscsi: target: Dont use stack buffer for scatterlist
  2018-10-16 17:08 [PATCH 4.9 00/71] 4.9.134-stable review Greg Kroah-Hartman
                   ` (4 preceding siblings ...)
  2018-10-16 17:09 ` [PATCH 4.9 05/71] mfd: omap-usb-host: Fix dts probe of children Greg Kroah-Hartman
@ 2018-10-16 17:09 ` Greg Kroah-Hartman
  2018-10-16 17:09 ` [PATCH 4.9 07/71] scsi: qla2xxx: Fix an endian bug in fcpcmd_is_corrupted() Greg Kroah-Hartman
                   ` (69 subsequent siblings)
  75 siblings, 0 replies; 82+ messages in thread
From: Greg Kroah-Hartman @ 2018-10-16 17:09 UTC
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Laura Abbott, Mike Christie,
	Martin K. Petersen, Sasha Levin

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Laura Abbott <labbott@redhat.com>

[ Upstream commit 679fcae46c8b2352bba3485d521da070cfbe68e6 ]

Fedora got a bug report of a crash with iSCSI:

kernel BUG at include/linux/scatterlist.h:143!
...
RIP: 0010:iscsit_do_crypto_hash_buf+0x154/0x180 [iscsi_target_mod]
...
 Call Trace:
  ? iscsi_target_tx_thread+0x200/0x200 [iscsi_target_mod]
  iscsit_get_rx_pdu+0x4cd/0xa90 [iscsi_target_mod]
  ? native_sched_clock+0x3e/0xa0
  ? iscsi_target_tx_thread+0x200/0x200 [iscsi_target_mod]
  iscsi_target_rx_thread+0x81/0xf0 [iscsi_target_mod]
  kthread+0x120/0x140
  ? kthread_create_worker_on_cpu+0x70/0x70
  ret_from_fork+0x3a/0x50

This is a BUG_ON for using a stack buffer with a scatterlist.  There
are two cases that trigger this bug. Switch to using a dynamically
allocated buffer for one case and do not assign a NULL buffer in
another case.

Signed-off-by: Laura Abbott <labbott@redhat.com>
Reviewed-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/target/iscsi/iscsi_target.c |   22 ++++++++++++++--------
 1 file changed, 14 insertions(+), 8 deletions(-)

--- a/drivers/target/iscsi/iscsi_target.c
+++ b/drivers/target/iscsi/iscsi_target.c
@@ -1435,7 +1435,8 @@ static void iscsit_do_crypto_hash_buf(
 
 	sg_init_table(sg, ARRAY_SIZE(sg));
 	sg_set_buf(sg, buf, payload_length);
-	sg_set_buf(sg + 1, pad_bytes, padding);
+	if (padding)
+		sg_set_buf(sg + 1, pad_bytes, padding);
 
 	ahash_request_set_crypt(hash, sg, data_crc, payload_length + padding);
 
@@ -3949,10 +3950,14 @@ static bool iscsi_target_check_conn_stat
 static void iscsit_get_rx_pdu(struct iscsi_conn *conn)
 {
 	int ret;
-	u8 buffer[ISCSI_HDR_LEN], opcode;
+	u8 *buffer, opcode;
 	u32 checksum = 0, digest = 0;
 	struct kvec iov;
 
+	buffer = kcalloc(ISCSI_HDR_LEN, sizeof(*buffer), GFP_KERNEL);
+	if (!buffer)
+		return;
+
 	while (!kthread_should_stop()) {
 		/*
 		 * Ensure that both TX and RX per connection kthreads
@@ -3960,7 +3965,6 @@ static void iscsit_get_rx_pdu(struct isc
 		 */
 		iscsit_thread_check_cpumask(conn, current, 0);
 
-		memset(buffer, 0, ISCSI_HDR_LEN);
 		memset(&iov, 0, sizeof(struct kvec));
 
 		iov.iov_base	= buffer;
@@ -3969,7 +3973,7 @@ static void iscsit_get_rx_pdu(struct isc
 		ret = rx_data(conn, &iov, 1, ISCSI_HDR_LEN);
 		if (ret != ISCSI_HDR_LEN) {
 			iscsit_rx_thread_wait_for_tcp(conn);
-			return;
+			break;
 		}
 
 		if (conn->conn_ops->HeaderDigest) {
@@ -3979,7 +3983,7 @@ static void iscsit_get_rx_pdu(struct isc
 			ret = rx_data(conn, &iov, 1, ISCSI_CRC_LEN);
 			if (ret != ISCSI_CRC_LEN) {
 				iscsit_rx_thread_wait_for_tcp(conn);
-				return;
+				break;
 			}
 
 			iscsit_do_crypto_hash_buf(conn->conn_rx_hash,
@@ -4003,7 +4007,7 @@ static void iscsit_get_rx_pdu(struct isc
 		}
 
 		if (conn->conn_state == TARG_CONN_STATE_IN_LOGOUT)
-			return;
+			break;
 
 		opcode = buffer[0] & ISCSI_OPCODE_MASK;
 
@@ -4014,13 +4018,15 @@ static void iscsit_get_rx_pdu(struct isc
 			" while in Discovery Session, rejecting.\n", opcode);
 			iscsit_add_reject(conn, ISCSI_REASON_PROTOCOL_ERROR,
 					  buffer);
-			return;
+			break;
 		}
 
 		ret = iscsi_target_rx_opcode(conn, buffer);
 		if (ret < 0)
-			return;
+			break;
 	}
+
+	kfree(buffer);
 }
 
 int iscsi_target_rx_thread(void *arg)
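
Note on the shape of the fix: a minimal userspace sketch, not the target
code, of the same pattern. The header buffer is heap-allocated once,
every early exit from the loop becomes "break", and the buffer is freed
at a single point. calloc()/free() stand in for kcalloc()/kfree(), and
read_hdr_fn, read_n_headers() and rx_loop() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define HDR_LEN 48 /* stands in for ISCSI_HDR_LEN (iSCSI basic header size) */

/* Hypothetical reader: fills hdr and returns the number of bytes read. */
typedef size_t (*read_hdr_fn)(unsigned char *hdr, size_t len, void *ctx);

/* Demo reader: succeeds *ctx times, then reports a short read. */
size_t read_n_headers(unsigned char *hdr, size_t len, void *ctx)
{
	int *remaining = ctx;

	if (*remaining <= 0)
		return 0;
	(*remaining)--;
	hdr[0] = 0x01;
	return len;
}

/* Returns the number of headers handled, or -1 if allocation fails. */
int rx_loop(read_hdr_fn read_hdr, void *ctx)
{
	int handled = 0;
	unsigned char *hdr;

	assert(read_hdr != NULL);

	hdr = calloc(HDR_LEN, sizeof(*hdr));
	if (!hdr)
		return -1;

	for (;;) {
		if (read_hdr(hdr, HDR_LEN, ctx) != HDR_LEN)
			break;	/* a "return" here would leak hdr */
		handled++;
	}

	free(hdr);		/* single cleanup point for every exit */
	return handled;
}
```

With a counter set to 3, rx_loop(read_n_headers, &counter) handles three
headers, stops on the short read, and frees the buffer on the way out.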




* [PATCH 4.9 07/71] scsi: qla2xxx: Fix an endian bug in fcpcmd_is_corrupted()
  2018-10-16 17:08 [PATCH 4.9 00/71] 4.9.134-stable review Greg Kroah-Hartman
                   ` (5 preceding siblings ...)
  2018-10-16 17:09 ` [PATCH 4.9 06/71] scsi: iscsi: target: Dont use stack buffer for scatterlist Greg Kroah-Hartman
@ 2018-10-16 17:09 ` Greg Kroah-Hartman
  2018-10-16 17:09 ` [PATCH 4.9 08/71] sound: enable interrupt after dma buffer initialization Greg Kroah-Hartman
                   ` (68 subsequent siblings)
  75 siblings, 0 replies; 82+ messages in thread
From: Greg Kroah-Hartman @ 2018-10-16 17:09 UTC
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Dan Carpenter, Quinn Tran,
	Himanshu Madhani, Martin K. Petersen, Sasha Levin

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Dan Carpenter <dan.carpenter@oracle.com>

[ Upstream commit cbe3fd39d223f14b1c60c80fe9347a3dd08c2edb ]

We should first do the le16_to_cpu endian conversion and then apply the
FCP_CMD_LENGTH_MASK mask.

Fixes: 5f35509db179 ("qla2xxx: Terminate exchange if corrupted")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Quinn Tran <Quinn.Tran@cavium.com>
Acked-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/scsi/qla2xxx/qla_target.h |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/drivers/scsi/qla2xxx/qla_target.h
+++ b/drivers/scsi/qla2xxx/qla_target.h
@@ -440,8 +440,8 @@ struct atio_from_isp {
 static inline int fcpcmd_is_corrupted(struct atio *atio)
 {
 	if (atio->entry_type == ATIO_TYPE7 &&
-	    (le16_to_cpu(atio->attr_n_length & FCP_CMD_LENGTH_MASK) <
-	    FCP_CMD_LENGTH_MIN))
+	    ((le16_to_cpu(atio->attr_n_length) & FCP_CMD_LENGTH_MASK) <
+	     FCP_CMD_LENGTH_MIN))
 		return 1;
 	else
 		return 0;
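
Note on why the order matters: a minimal userspace sketch, not the
driver code, that models a big-endian CPU, where le16_to_cpu() is a byte
swap. LEN_MASK and LEN_MIN are stand-ins for FCP_CMD_LENGTH_MASK and
FCP_CMD_LENGTH_MIN with assumed values; all function names are made up
for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define LEN_MASK 0x0fff /* stand-in for FCP_CMD_LENGTH_MASK; value assumed */
#define LEN_MIN  0x38   /* stand-in for FCP_CMD_LENGTH_MIN; value assumed */

/* What le16_to_cpu() amounts to on a big-endian CPU: a byte swap. */
uint16_t swap16(uint16_t v)
{
	return (uint16_t)((v << 8) | (v >> 8));
}

/* The value a big-endian CPU sees when it loads a little-endian field. */
uint16_t raw_on_big_endian(uint16_t wire_value)
{
	return swap16(wire_value);
}

/* Old order: mask the raw field, then convert. */
int is_corrupted_old(uint16_t raw)
{
	return swap16((uint16_t)(raw & LEN_MASK)) < LEN_MIN;
}

/* Fixed order: convert, then mask. */
int is_corrupted_new(uint16_t raw)
{
	return (swap16(raw) & LEN_MASK) < LEN_MIN;
}

/* Two fields where the old order gives the wrong answer. */
void endian_examples(void)
{
	/* Length 0x010 is below the minimum, but the old order misses it. */
	assert(is_corrupted_new(raw_on_big_endian(0x3010)));
	assert(!is_corrupted_old(raw_on_big_endian(0x3010)));

	/* Length 0x040 is fine, but the old order flags it. */
	assert(!is_corrupted_new(raw_on_big_endian(0x0040)));
	assert(is_corrupted_old(raw_on_big_endian(0x0040)));
}
```

On a little-endian CPU le16_to_cpu() is a no-op, so both orders give the
same result there; the difference only shows up on big-endian machines,
which is what the sketch models.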




* [PATCH 4.9 08/71] sound: enable interrupt after dma buffer initialization
  2018-10-16 17:08 [PATCH 4.9 00/71] 4.9.134-stable review Greg Kroah-Hartman
                   ` (6 preceding siblings ...)
  2018-10-16 17:09 ` [PATCH 4.9 07/71] scsi: qla2xxx: Fix an endian bug in fcpcmd_is_corrupted() Greg Kroah-Hartman
@ 2018-10-16 17:09 ` Greg Kroah-Hartman
  2018-10-16 17:09 ` [PATCH 4.9 09/71] stmmac: fix valid numbers of unicast filter entries Greg Kroah-Hartman
                   ` (67 subsequent siblings)
  75 siblings, 0 replies; 82+ messages in thread
From: Greg Kroah-Hartman @ 2018-10-16 17:09 UTC
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Takashi Iwai, Yu Zhao, Mark Brown,
	Sasha Levin

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Yu Zhao <yuzhao@google.com>

[ Upstream commit b61749a89f826eb61fc59794d9e4697bd246eb61 ]

In snd_hdac_bus_init_chip(), we enable interrupts before
snd_hdac_bus_init_cmd_io() initializes the DMA buffers. If the IRQ has
already been acquired and the IRQ handler uses the DMA buffer, the
kernel may crash when an interrupt comes in.

Fix the problem by postponing enabling the IRQ until after DMA buffer
initialization, and warn once on a NULL DMA buffer pointer during
initialization.

Reviewed-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Yu Zhao <yuzhao@google.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 sound/hda/hdac_controller.c |    8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

--- a/sound/hda/hdac_controller.c
+++ b/sound/hda/hdac_controller.c
@@ -40,6 +40,8 @@ static void azx_clear_corbrp(struct hdac
  */
 void snd_hdac_bus_init_cmd_io(struct hdac_bus *bus)
 {
+	WARN_ON_ONCE(!bus->rb.area);
+
 	spin_lock_irq(&bus->reg_lock);
 	/* CORB set up */
 	bus->corb.addr = bus->rb.addr;
@@ -478,13 +480,15 @@ bool snd_hdac_bus_init_chip(struct hdac_
 	/* reset controller */
 	azx_reset(bus, full_reset);
 
-	/* initialize interrupts */
+	/* clear interrupts */
 	azx_int_clear(bus);
-	azx_int_enable(bus);
 
 	/* initialize the codec command I/O */
 	snd_hdac_bus_init_cmd_io(bus);
 
+	/* enable interrupts after CORB/RIRB buffers are initialized above */
+	azx_int_enable(bus);
+
 	/* program the position buffer */
 	if (bus->use_posbuf && bus->posbuf.addr) {
 		snd_hdac_chip_writel(bus, DPLBASE, (u32)bus->posbuf.addr);
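
The ordering problem this patch fixes can be shown with a small
user-space sketch. It is not kernel code: struct mock_bus and the mock_*
functions are hypothetical stand-ins for struct hdac_bus,
snd_hdac_bus_init_cmd_io() and azx_int_enable(), and the "interrupt" is
modelled as a flag recorded at the moment interrupts are enabled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct hdac_bus; only what the sketch needs. */
struct mock_bus {
	void *rb_area;		/* CORB/RIRB DMA buffer, NULL until allocated */
	bool cmd_io_ready;	/* set once the command rings are programmed */
	bool irq_enabled;	/* models the controller's interrupt enable */
	bool irq_saw_ready;	/* what a handler firing right away would see */
};

/* Models snd_hdac_bus_init_cmd_io(): program the rings from the buffer. */
static void mock_init_cmd_io(struct mock_bus *bus)
{
	bus->cmd_io_ready = (bus->rb_area != NULL);
}

/* Models azx_int_enable(); an interrupt may arrive as soon as this runs. */
static void mock_int_enable(struct mock_bus *bus)
{
	bus->irq_enabled = true;
	bus->irq_saw_ready = bus->cmd_io_ready;
}

/* Patched order: set up the buffers first, enable interrupts last. */
static void mock_init_chip(struct mock_bus *bus)
{
	mock_init_cmd_io(bus);
	mock_int_enable(bus);
}

/* Pre-patch order, kept only for comparison. */
static void mock_init_chip_old(struct mock_bus *bus)
{
	mock_int_enable(bus);
	mock_init_cmd_io(bus);
}
```

With the patched order, a handler that runs immediately after the enable
sees initialized command I/O state. With the old order it can run before
the rings exist.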




* [PATCH 4.9 09/71] stmmac: fix valid numbers of unicast filter entries
  2018-10-16 17:08 [PATCH 4.9 00/71] 4.9.134-stable review Greg Kroah-Hartman
                   ` (7 preceding siblings ...)
  2018-10-16 17:09 ` [PATCH 4.9 08/71] sound: enable interrupt after dma buffer initialization Greg Kroah-Hartman
@ 2018-10-16 17:09 ` Greg Kroah-Hartman
  2018-10-16 17:09 ` [PATCH 4.9 10/71] net: macb: disable scatter-gather for macb on sama5d3 Greg Kroah-Hartman
                   ` (66 subsequent siblings)
  75 siblings, 0 replies; 82+ messages in thread
From: Greg Kroah-Hartman @ 2018-10-16 17:09 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Jongsung Kim, David S. Miller, Sasha Levin

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Jongsung Kim <neidhard.kim@lge.com>

[ Upstream commit edf2ef7242805e53ec2e0841db26e06d8bc7da70 ]

Synopsys DWC Ethernet MAC can be configured to have 1..32, 64, or
128 unicast filter entries (Table 7-8, MAC Address Registers, in the
databook). Fix dwmac1000_validate_ucast_entries() to also accept any
value between 1 and 32.

Signed-off-by: Jongsung Kim <neidhard.kim@lge.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c |    5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
@@ -71,7 +71,7 @@ static int dwmac1000_validate_mcast_bins
  * Description:
  * This function validates the number of Unicast address entries supported
  * by a particular Synopsys 10/100/1000 controller. The Synopsys controller
- * supports 1, 32, 64, or 128 Unicast filter entries for it's Unicast filter
+ * supports 1..32, 64, or 128 Unicast filter entries for it's Unicast filter
  * logic. This function validates a valid, supported configuration is
  * selected, and defaults to 1 Unicast address if an unsupported
  * configuration is selected.
@@ -81,8 +81,7 @@ static int dwmac1000_validate_ucast_entr
 	int x = ucast_entries;
 
 	switch (x) {
-	case 1:
-	case 32:
+	case 1 ... 32:
 	case 64:
 	case 128:
 		break;
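
For readers unfamiliar with the "case 1 ... 32:" syntax, the following
stand-alone sketch shows the validation rule described above. The
function name validate_ucast_entries() is made up for illustration. The
case range is a GNU C extension (accepted by GCC and Clang), and the
fallback to 1 follows the comment block in the driver.

```c
/* Returns the entry count to use; falls back to 1 for unsupported values. */
static int validate_ucast_entries(int ucast_entries)
{
	switch (ucast_entries) {
	case 1 ... 32:	/* GNU C case range, as used in the patch */
	case 64:
	case 128:
		return ucast_entries;
	default:
		return 1;
	}
}
```

Under this rule a value such as 16 is kept, where the pre-patch switch
would have replaced it with 1. A value such as 33 still falls back to 1.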




* [PATCH 4.9 10/71] net: macb: disable scatter-gather for macb on sama5d3
  2018-10-16 17:08 [PATCH 4.9 00/71] 4.9.134-stable review Greg Kroah-Hartman
                   ` (8 preceding siblings ...)
  2018-10-16 17:09 ` [PATCH 4.9 09/71] stmmac: fix valid numbers of unicast filter entries Greg Kroah-Hartman
@ 2018-10-16 17:09 ` Greg Kroah-Hartman
  2018-10-16 17:09 ` [PATCH 4.9 11/71] ARM: dts: at91: add new compatibility string " Greg Kroah-Hartman
                   ` (65 subsequent siblings)
  75 siblings, 0 replies; 82+ messages in thread
From: Greg Kroah-Hartman @ 2018-10-16 17:09 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Nicolas Ferre, David S. Miller, Sasha Levin

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Nicolas Ferre <nicolas.ferre@microchip.com>

[ Upstream commit eb4ed8e2d7fecb5f40db38e4498b9ee23cddf196 ]

Create a new configuration for the new sama5d3-macb compatibility string.
This configuration disables scatter-gather because we experienced lockups
of the macb interface of this particular SoC under very high load.

Signed-off-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/ethernet/cadence/macb.c |    8 ++++++++
 1 file changed, 8 insertions(+)

--- a/drivers/net/ethernet/cadence/macb.c
+++ b/drivers/net/ethernet/cadence/macb.c
@@ -2861,6 +2861,13 @@ static const struct macb_config at91sam9
 	.init = macb_init,
 };
 
+static const struct macb_config sama5d3macb_config = {
+	.caps = MACB_CAPS_SG_DISABLED
+	      | MACB_CAPS_USRIO_HAS_CLKEN | MACB_CAPS_USRIO_DEFAULT_IS_MII_GMII,
+	.clk_init = macb_clk_init,
+	.init = macb_init,
+};
+
 static const struct macb_config pc302gem_config = {
 	.caps = MACB_CAPS_SG_DISABLED | MACB_CAPS_GIGABIT_MODE_AVAILABLE,
 	.dma_burst_length = 16,
@@ -2925,6 +2932,7 @@ static const struct of_device_id macb_dt
 	{ .compatible = "cdns,gem", .data = &pc302gem_config },
 	{ .compatible = "atmel,sama5d2-gem", .data = &sama5d2_config },
 	{ .compatible = "atmel,sama5d3-gem", .data = &sama5d3_config },
+	{ .compatible = "atmel,sama5d3-macb", .data = &sama5d3macb_config },
 	{ .compatible = "atmel,sama5d4-gem", .data = &sama5d4_config },
 	{ .compatible = "cdns,at91rm9200-emac", .data = &emac_config },
 	{ .compatible = "cdns,emac", .data = &emac_config },




* [PATCH 4.9 11/71] ARM: dts: at91: add new compatibility string for macb on sama5d3
  2018-10-16 17:08 [PATCH 4.9 00/71] 4.9.134-stable review Greg Kroah-Hartman
                   ` (9 preceding siblings ...)
  2018-10-16 17:09 ` [PATCH 4.9 10/71] net: macb: disable scatter-gather for macb on sama5d3 Greg Kroah-Hartman
@ 2018-10-16 17:09 ` Greg Kroah-Hartman
  2018-10-16 17:09 ` [PATCH 4.9 12/71] x86/kvm/lapic: always disable MMIO interface in x2APIC mode Greg Kroah-Hartman
                   ` (64 subsequent siblings)
  75 siblings, 0 replies; 82+ messages in thread
From: Greg Kroah-Hartman @ 2018-10-16 17:09 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Nicolas Ferre, David S. Miller, Sasha Levin

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Nicolas Ferre <nicolas.ferre@microchip.com>

[ Upstream commit 321cc359d899a8e988f3725d87c18a628e1cc624 ]

We need this new compatibility string as we experienced different behavior
for this 10/100Mbits/s macb interface on this particular SoC.
Backward compatibility is preserved as we keep the alternative strings.

Signed-off-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 Documentation/devicetree/bindings/net/macb.txt |    1 +
 arch/arm/boot/dts/sama5d3_emac.dtsi            |    2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

--- a/Documentation/devicetree/bindings/net/macb.txt
+++ b/Documentation/devicetree/bindings/net/macb.txt
@@ -10,6 +10,7 @@ Required properties:
   Use "cdns,pc302-gem" for Picochip picoXcell pc302 and later devices based on
   the Cadence GEM, or the generic form: "cdns,gem".
   Use "atmel,sama5d2-gem" for the GEM IP (10/100) available on Atmel sama5d2 SoCs.
+  Use "atmel,sama5d3-macb" for the 10/100Mbit IP available on Atmel sama5d3 SoCs.
   Use "atmel,sama5d3-gem" for the Gigabit IP available on Atmel sama5d3 SoCs.
   Use "atmel,sama5d4-gem" for the GEM IP (10/100) available on Atmel sama5d4 SoCs.
   Use "cdns,zynq-gem" Xilinx Zynq-7xxx SoC.
--- a/arch/arm/boot/dts/sama5d3_emac.dtsi
+++ b/arch/arm/boot/dts/sama5d3_emac.dtsi
@@ -41,7 +41,7 @@
 			};
 
 			macb1: ethernet@f802c000 {
-				compatible = "cdns,at91sam9260-macb", "cdns,macb";
+				compatible = "atmel,sama5d3-macb", "cdns,at91sam9260-macb", "cdns,macb";
 				reg = <0xf802c000 0x100>;
 				interrupts = <35 IRQ_TYPE_LEVEL_HIGH 3>;
 				pinctrl-names = "default";
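
The reason the new string is placed first in the "compatible" list can
be sketched as follows. This is a simplified model, not the kernel's
of_match_node() logic: it walks the device's strings from most to least
specific and returns the first one the driver table knows. The table
contents and the config labels are plain strings chosen for illustration.

```c
#include <stddef.h>
#include <string.h>

struct match_entry {
	const char *compatible;
	const char *config;	/* hypothetical label, not a macb_config pointer */
};

/* Illustrative subset of a driver match table. */
static const struct match_entry match_table[] = {
	{ "cdns,at91sam9260-macb", "at91sam9260_config" },
	{ "cdns,macb",             "default_config" },
	{ "atmel,sama5d3-macb",    "sama5d3macb_config" },
};

/* Returns the config label for the first device string the table knows. */
static const char *pick_config(const char *const *compat, size_t n)
{
	size_t entries = sizeof(match_table) / sizeof(match_table[0]);

	for (size_t i = 0; i < n; i++)
		for (size_t j = 0; j < entries; j++)
			if (strcmp(compat[i], match_table[j].compatible) == 0)
				return match_table[j].config;
	return NULL;
}
```

In this model a device tree carrying the new string first selects the
scatter-gather-disabled configuration, while a kernel whose table lacks
that entry falls through to the older strings. That is how backward
compatibility is preserved.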




* [PATCH 4.9 12/71] x86/kvm/lapic: always disable MMIO interface in x2APIC mode
  2018-10-16 17:08 [PATCH 4.9 00/71] 4.9.134-stable review Greg Kroah-Hartman
                   ` (10 preceding siblings ...)
  2018-10-16 17:09 ` [PATCH 4.9 11/71] ARM: dts: at91: add new compatibility string " Greg Kroah-Hartman
@ 2018-10-16 17:09 ` Greg Kroah-Hartman
  2018-10-16 17:09 ` [PATCH 4.9 13/71] drm/amdgpu: Fix SDMA HQD destroy error on gfx_v7 Greg Kroah-Hartman
                   ` (63 subsequent siblings)
  75 siblings, 0 replies; 82+ messages in thread
From: Greg Kroah-Hartman @ 2018-10-16 17:09 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Vitaly Kuznetsov, Paolo Bonzini, Sasha Levin

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Vitaly Kuznetsov <vkuznets@redhat.com>

[ Upstream commit d1766202779e81d0f2a94c4650a6ba31497d369d ]

When VMX is used with flexpriority disabled (because of no support or
if disabled with module parameter) MMIO interface to lAPIC is still
available in x2APIC mode while it shouldn't be (kvm-unit-tests):

PASS: apic_disable: Local apic enabled in x2APIC mode
PASS: apic_disable: CPUID.1H:EDX.APIC[bit 9] is set
FAIL: apic_disable: *0xfee00030: 50014

The issue appears because we basically do nothing while switching to
x2APIC mode when APIC access page is not used. apic_mmio_{read,write}
only check if lAPIC is disabled before proceeding to actual write.

When APIC access is virtualized we correctly manipulate VMX controls
in vmx_set_virtual_apic_mode() and we don't get vmexits from memory writes
in x2APIC mode so there's no issue.

Disabling MMIO interface seems to be easy. The question is: what do we
do with these reads and writes? If we add apic_x2apic_mode() check to
apic_mmio_in_range() and return -EOPNOTSUPP these reads and writes will
go to userspace. When lAPIC is in kernel, Qemu uses this interface to
inject MSIs only (see kvm_apic_mem_write() in hw/i386/kvm/apic.c). This
somehow works with disabled lAPIC but when we're in xAPIC mode we will
get a real injected MSI from every write to lAPIC. Not good.

The simplest solution seems to be to just ignore writes to the region
and return ~0 for all reads when we're in x2APIC mode. This is what this
patch does. However, this approach is inconsistent with what currently
happens when flexpriority is enabled: we allocate APIC access page and
create KVM memory region so in x2APIC mode all reads and writes go to
this pre-allocated page which is, btw, the same for all vCPUs.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/include/uapi/asm/kvm.h |    1 +
 arch/x86/kvm/lapic.c            |   22 +++++++++++++++++++---
 2 files changed, 20 insertions(+), 3 deletions(-)

--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -356,5 +356,6 @@ struct kvm_sync_regs {
 
 #define KVM_X86_QUIRK_LINT0_REENABLED	(1 << 0)
 #define KVM_X86_QUIRK_CD_NW_CLEARED	(1 << 1)
+#define KVM_X86_QUIRK_LAPIC_MMIO_HOLE	(1 << 2)
 
 #endif /* _ASM_X86_KVM_H */
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1220,9 +1220,8 @@ EXPORT_SYMBOL_GPL(kvm_lapic_reg_read);
 
 static int apic_mmio_in_range(struct kvm_lapic *apic, gpa_t addr)
 {
-	return kvm_apic_hw_enabled(apic) &&
-	    addr >= apic->base_address &&
-	    addr < apic->base_address + LAPIC_MMIO_LENGTH;
+	return addr >= apic->base_address &&
+		addr < apic->base_address + LAPIC_MMIO_LENGTH;
 }
 
 static int apic_mmio_read(struct kvm_vcpu *vcpu, struct kvm_io_device *this,
@@ -1234,6 +1233,15 @@ static int apic_mmio_read(struct kvm_vcp
 	if (!apic_mmio_in_range(apic, address))
 		return -EOPNOTSUPP;
 
+	if (!kvm_apic_hw_enabled(apic) || apic_x2apic_mode(apic)) {
+		if (!kvm_check_has_quirk(vcpu->kvm,
+					 KVM_X86_QUIRK_LAPIC_MMIO_HOLE))
+			return -EOPNOTSUPP;
+
+		memset(data, 0xff, len);
+		return 0;
+	}
+
 	kvm_lapic_reg_read(apic, offset, len, data);
 
 	return 0;
@@ -1646,6 +1654,14 @@ static int apic_mmio_write(struct kvm_vc
 	if (!apic_mmio_in_range(apic, address))
 		return -EOPNOTSUPP;
 
+	if (!kvm_apic_hw_enabled(apic) || apic_x2apic_mode(apic)) {
+		if (!kvm_check_has_quirk(vcpu->kvm,
+					 KVM_X86_QUIRK_LAPIC_MMIO_HOLE))
+			return -EOPNOTSUPP;
+
+		return 0;
+	}
+
 	/*
 	 * APIC register must be aligned on 128-bits boundary.
 	 * 32/64/128 bits registers must be accessed thru 32 bits.
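
The read path after this patch can be summarized with a user-space
sketch. struct mock_apic and mock_apic_mmio_read() are hypothetical. The
three booleans stand in for kvm_apic_hw_enabled(), apic_x2apic_mode()
and kvm_check_has_quirk(..., KVM_X86_QUIRK_LAPIC_MMIO_HOLE), and the
real register read is replaced by a placeholder.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flattened view of the state the MMIO read path consults. */
struct mock_apic {
	bool hw_enabled;	/* lAPIC enabled in hardware terms */
	bool x2apic_mode;	/* lAPIC switched to x2APIC mode */
	bool mmio_hole_quirk;	/* true unless userspace disabled the quirk */
};

/* Returns 0 when handled in the kernel, -EOPNOTSUPP to punt elsewhere. */
static int mock_apic_mmio_read(const struct mock_apic *apic, bool in_range,
			       void *data, size_t len)
{
	if (!in_range)
		return -EOPNOTSUPP;

	if (!apic->hw_enabled || apic->x2apic_mode) {
		if (!apic->mmio_hole_quirk)
			return -EOPNOTSUPP;
		memset(data, 0xff, len);	/* the hole reads as all ones */
		return 0;
	}

	memset(data, 0, len);	/* placeholder for the real register read */
	return 0;
}
```

The write path in the patch has the same shape, except that a write into
the hole is silently dropped instead of filling a buffer.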




* [PATCH 4.9 13/71] drm/amdgpu: Fix SDMA HQD destroy error on gfx_v7
  2018-10-16 17:08 [PATCH 4.9 00/71] 4.9.134-stable review Greg Kroah-Hartman
                   ` (11 preceding siblings ...)
  2018-10-16 17:09 ` [PATCH 4.9 12/71] x86/kvm/lapic: always disable MMIO interface in x2APIC mode Greg Kroah-Hartman
@ 2018-10-16 17:09 ` Greg Kroah-Hartman
  2018-10-16 17:09 ` [PATCH 4.9 14/71] ext4: Fix error code in ext4_xattr_set_entry() Greg Kroah-Hartman
                   ` (62 subsequent siblings)
  75 siblings, 0 replies; 82+ messages in thread
From: Greg Kroah-Hartman @ 2018-10-16 17:09 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Alex Deucher, Amber Lin,
	Felix Kuehling, Sasha Levin

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Amber Lin <Amber.Lin@amd.com>

[ Upstream commit caaa4c8a6be2a275bd14f2369ee364978ff74704 ]

The wrong register bit was examined when checking SDMA status, so the
check reported false failures. This typo only appears on gfx_v7; gfx_v8
checks the correct bit.

Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
@@ -505,7 +505,7 @@ static int kgd_hqd_sdma_destroy(struct k
 
 	while (true) {
 		temp = RREG32(sdma_base_addr + mmSDMA0_RLC0_CONTEXT_STATUS);
-		if (temp & SDMA0_STATUS_REG__RB_CMD_IDLE__SHIFT)
+		if (temp & SDMA0_RLC0_CONTEXT_STATUS__IDLE_MASK)
 			break;
 		if (timeout <= 0)
 			return -ETIME;
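
The class of bug fixed here, ANDing a register value with a __SHIFT
constant instead of a _MASK constant, can be illustrated in isolation.
The field layout below is hypothetical and is not the real SDMA register
layout; it only assumes an IDLE flag at bit 2.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical field layout: an IDLE flag at bit 2 of a status register. */
#define STATUS__IDLE__SHIFT	2u
#define STATUS__IDLE_MASK	(1u << STATUS__IDLE__SHIFT)	/* 0x4 */

/* Buggy form: ANDs with the bit position (2 == 0b10), so it tests bit 1. */
static bool idle_by_shift(uint32_t status)
{
	return (status & STATUS__IDLE__SHIFT) != 0;
}

/* Fixed form: ANDs with the mask, so it tests the IDLE bit itself. */
static bool idle_by_mask(uint32_t status)
{
	return (status & STATUS__IDLE_MASK) != 0;
}
```

With only the IDLE bit set (0x4), the shift-based test reports "not
idle", which in a polling loop like the one above would end in a
spurious timeout.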




* [PATCH 4.9 14/71] ext4: Fix error code in ext4_xattr_set_entry()
  2018-10-16 17:08 [PATCH 4.9 00/71] 4.9.134-stable review Greg Kroah-Hartman
                   ` (12 preceding siblings ...)
  2018-10-16 17:09 ` [PATCH 4.9 13/71] drm/amdgpu: Fix SDMA HQD destroy error on gfx_v7 Greg Kroah-Hartman
@ 2018-10-16 17:09 ` Greg Kroah-Hartman
  2018-10-16 17:09 ` [PATCH 4.9 15/71] mm/vmstat.c: fix outdated vmstat_text Greg Kroah-Hartman
                   ` (61 subsequent siblings)
  75 siblings, 0 replies; 82+ messages in thread
From: Greg Kroah-Hartman @ 2018-10-16 17:09 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Greg Kroah-Hartman, Daniel Rosenberg

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Daniel Rosenberg <drosen@google.com>

ext4_xattr_set_entry should return EFSCORRUPTED instead of EIO
for corrupted xattr entries.

Fixes: b469713e0c0c ("ext4: add corruption check in ext4_xattr_set_entry()")

Signed-off-by: Daniel Rosenberg <drosen@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
Apply to 4.9

 fs/ext4/xattr.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/fs/ext4/xattr.c
+++ b/fs/ext4/xattr.c
@@ -657,7 +657,7 @@ ext4_xattr_set_entry(struct ext4_xattr_i
 		next = EXT4_XATTR_NEXT(last);
 		if ((void *)next >= s->end) {
 			EXT4_ERROR_INODE(inode, "corrupted xattr entries");
-			return -EIO;
+			return -EFSCORRUPTED;
 		}
 		if (last->e_value_size) {
 			size_t offs = le16_to_cpu(last->e_value_offs);




* [PATCH 4.9 15/71] mm/vmstat.c: fix outdated vmstat_text
  2018-10-16 17:08 [PATCH 4.9 00/71] 4.9.134-stable review Greg Kroah-Hartman
                   ` (13 preceding siblings ...)
  2018-10-16 17:09 ` [PATCH 4.9 14/71] ext4: Fix error code in ext4_xattr_set_entry() Greg Kroah-Hartman
@ 2018-10-16 17:09 ` Greg Kroah-Hartman
  2018-10-16 17:09 ` [PATCH 4.9 16/71] MIPS: VDSO: Always map near top of user memory Greg Kroah-Hartman
                   ` (60 subsequent siblings)
  75 siblings, 0 replies; 82+ messages in thread
From: Greg Kroah-Hartman @ 2018-10-16 17:09 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Jann Horn, Kees Cook, Andrew Morton,
	Michal Hocko, Roman Gushchin, Davidlohr Bueso, Oleg Nesterov,
	Christoph Lameter, Kemi Wang, Andy Lutomirski, Ingo Molnar

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Jann Horn <jannh@google.com>

commit 28e2c4bb99aa40f9d5f07ac130cbc4da0ea93079 upstream.

7a9cdebdcc17 ("mm: get rid of vmacache_flush_all() entirely") removed the
VMACACHE_FULL_FLUSHES statistics, but didn't remove the corresponding
entry in vmstat_text.  This causes an out-of-bounds access in
vmstat_show().

Luckily this only affects kernels with CONFIG_DEBUG_VM_VMACACHE=y, which
is probably very rare.

Link: http://lkml.kernel.org/r/20181001143138.95119-1-jannh@google.com
Fixes: 7a9cdebdcc17 ("mm: get rid of vmacache_flush_all() entirely")
Signed-off-by: Jann Horn <jannh@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Roman Gushchin <guro@fb.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Kemi Wang <kemi.wang@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 mm/vmstat.c |    1 -
 1 file changed, 1 deletion(-)

--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1089,7 +1089,6 @@ const char * const vmstat_text[] = {
 #ifdef CONFIG_DEBUG_VM_VMACACHE
 	"vmacache_find_calls",
 	"vmacache_find_hits",
-	"vmacache_full_flushes",
 #endif
 #endif /* CONFIG_VM_EVENTS_COUNTERS */
 };
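
A name table that drifts out of sync with its counter enum is easy to
catch at build time. The sketch below uses made-up identifiers
(demo_event, demo_text, demo_event_name) and is not how mm/vmstat.c is
written; it only shows one way to make such a mismatch a compile error
instead of an out-of-bounds access.

```c
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical counter list standing in for the kernel's event enum. */
enum demo_event { DEMO_FIND_CALLS, DEMO_FIND_HITS, NR_DEMO_EVENTS };

static const char *const demo_text[] = {
	"vmacache_find_calls",
	"vmacache_find_hits",
	/* a stale third entry here would trip the assertion below */
};

/* Compile-time check that every counter has exactly one name. */
_Static_assert(ARRAY_SIZE(demo_text) == NR_DEMO_EVENTS,
	       "name table and counter enum are out of sync");

/* Bounds-checked lookup; returns NULL instead of reading past the table. */
static const char *demo_event_name(size_t idx)
{
	return idx < ARRAY_SIZE(demo_text) ? demo_text[idx] : NULL;
}
```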




* [PATCH 4.9 16/71] MIPS: VDSO: Always map near top of user memory
  2018-10-16 17:08 [PATCH 4.9 00/71] 4.9.134-stable review Greg Kroah-Hartman
                   ` (14 preceding siblings ...)
  2018-10-16 17:09 ` [PATCH 4.9 15/71] mm/vmstat.c: fix outdated vmstat_text Greg Kroah-Hartman
@ 2018-10-16 17:09 ` Greg Kroah-Hartman
  2018-10-16 17:09 ` [PATCH 4.9 17/71] mach64: detect the dot clock divider correctly on sparc Greg Kroah-Hartman
                   ` (59 subsequent siblings)
  75 siblings, 0 replies; 82+ messages in thread
From: Greg Kroah-Hartman @ 2018-10-16 17:09 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Paul Burton, Huacai Chen, linux-mips

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Paul Burton <paul.burton@mips.com>

commit ea7e0480a4b695d0aa6b3fa99bd658a003122113 upstream.

When using the legacy mmap layout, for example triggered using ulimit -s
unlimited, get_unmapped_area() fills memory from bottom to top starting
from a fairly low address near TASK_UNMAPPED_BASE.

This placement is suboptimal if the user application wishes to allocate
large amounts of heap memory using the brk syscall. With the VDSO being
located low in the user's virtual address space, the amount of space
available for access using brk is limited much more than it was prior to
the introduction of the VDSO.

For example:

  # ulimit -s unlimited; cat /proc/self/maps
  00400000-004ec000 r-xp 00000000 08:00 71436      /usr/bin/coreutils
  004fc000-004fd000 rwxp 000ec000 08:00 71436      /usr/bin/coreutils
  004fd000-0050f000 rwxp 00000000 00:00 0
  00cc3000-00ce4000 rwxp 00000000 00:00 0          [heap]
  2ab96000-2ab98000 r--p 00000000 00:00 0          [vvar]
  2ab98000-2ab99000 r-xp 00000000 00:00 0          [vdso]
  2ab99000-2ab9d000 rwxp 00000000 00:00 0
  ...

Resolve this by adjusting STACK_TOP to reserve space for the VDSO &
providing an address hint to get_unmapped_area() causing it to use this
space even when using the legacy mmap layout.

We reserve enough space for the VDSO, plus 1MB or 256MB for 32 bit & 64
bit systems respectively within which we randomize the VDSO base
address. Previously this randomization was taken care of by the mmap
base address randomization performed by arch_mmap_rnd(). The 1MB & 256MB
sizes are somewhat arbitrary but chosen such that we have some
randomization without taking up too much of the user's virtual address
space, which is often in short supply for 32 bit systems.

With this the VDSO is always mapped at a high address, leaving lots of
space for statically linked programs to make use of brk:

  # ulimit -s unlimited; cat /proc/self/maps
  00400000-004ec000 r-xp 00000000 08:00 71436      /usr/bin/coreutils
  004fc000-004fd000 rwxp 000ec000 08:00 71436      /usr/bin/coreutils
  004fd000-0050f000 rwxp 00000000 00:00 0
  00c28000-00c49000 rwxp 00000000 00:00 0          [heap]
  ...
  7f67c000-7f69d000 rwxp 00000000 00:00 0          [stack]
  7f7fc000-7f7fd000 rwxp 00000000 00:00 0
  7fcf1000-7fcf3000 r--p 00000000 00:00 0          [vvar]
  7fcf3000-7fcf4000 r-xp 00000000 00:00 0          [vdso]

Signed-off-by: Paul Burton <paul.burton@mips.com>
Reported-by: Huacai Chen <chenhc@lemote.com>
Fixes: ebb5e78cc634 ("MIPS: Initial implementation of a VDSO")
Cc: Huacai Chen <chenhc@lemote.com>
Cc: linux-mips@linux-mips.org
Cc: stable@vger.kernel.org # v4.4+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 arch/mips/include/asm/processor.h |   10 +++++-----
 arch/mips/kernel/process.c        |   25 +++++++++++++++++++++++++
 arch/mips/kernel/vdso.c           |   18 +++++++++++++++++-
 3 files changed, 47 insertions(+), 6 deletions(-)

--- a/arch/mips/include/asm/processor.h
+++ b/arch/mips/include/asm/processor.h
@@ -13,6 +13,7 @@
 
 #include <linux/atomic.h>
 #include <linux/cpumask.h>
+#include <linux/sizes.h>
 #include <linux/threads.h>
 
 #include <asm/cachectl.h>
@@ -80,11 +81,10 @@ extern unsigned int vced_count, vcei_cou
 
 #endif
 
-/*
- * One page above the stack is used for branch delay slot "emulation".
- * See dsemul.c for details.
- */
-#define STACK_TOP	((TASK_SIZE & PAGE_MASK) - PAGE_SIZE)
+#define VDSO_RANDOMIZE_SIZE	(TASK_IS_32BIT_ADDR ? SZ_1M : SZ_256M)
+
+extern unsigned long mips_stack_top(void);
+#define STACK_TOP		mips_stack_top()
 
 /*
  * This decides where the kernel will search for a free chunk of vm
--- a/arch/mips/kernel/process.c
+++ b/arch/mips/kernel/process.c
@@ -28,6 +28,7 @@
 #include <linux/prctl.h>
 #include <linux/nmi.h>
 
+#include <asm/abi.h>
 #include <asm/asm.h>
 #include <asm/bootinfo.h>
 #include <asm/cpu.h>
@@ -35,6 +36,7 @@
 #include <asm/dsp.h>
 #include <asm/fpu.h>
 #include <asm/irq.h>
+#include <asm/mips-cps.h>
 #include <asm/msa.h>
 #include <asm/pgtable.h>
 #include <asm/mipsregs.h>
@@ -621,6 +623,29 @@ out:
 	return pc;
 }
 
+unsigned long mips_stack_top(void)
+{
+	unsigned long top = TASK_SIZE & PAGE_MASK;
+
+	/* One page for branch delay slot "emulation" */
+	top -= PAGE_SIZE;
+
+	/* Space for the VDSO, data page & GIC user page */
+	top -= PAGE_ALIGN(current->thread.abi->vdso->size);
+	top -= PAGE_SIZE;
+	top -= mips_gic_present() ? PAGE_SIZE : 0;
+
+	/* Space for cache colour alignment */
+	if (cpu_has_dc_aliases)
+		top -= shm_align_mask + 1;
+
+	/* Space to randomize the VDSO base */
+	if (current->flags & PF_RANDOMIZE)
+		top -= VDSO_RANDOMIZE_SIZE;
+
+	return top;
+}
+
 /*
  * Don't forget that the stack pointer must be aligned on a 8 bytes
  * boundary for 32-bits ABI and 16 bytes for 64-bits ABI.
--- a/arch/mips/kernel/vdso.c
+++ b/arch/mips/kernel/vdso.c
@@ -16,6 +16,7 @@
 #include <linux/irqchip/mips-gic.h>
 #include <linux/kernel.h>
 #include <linux/mm.h>
+#include <linux/random.h>
 #include <linux/sched.h>
 #include <linux/slab.h>
 #include <linux/timekeeper_internal.h>
@@ -97,6 +98,21 @@ void update_vsyscall_tz(void)
 	}
 }
 
+static unsigned long vdso_base(void)
+{
+	unsigned long base;
+
+	/* Skip the delay slot emulation page */
+	base = STACK_TOP + PAGE_SIZE;
+
+	if (current->flags & PF_RANDOMIZE) {
+		base += get_random_int() & (VDSO_RANDOMIZE_SIZE - 1);
+		base = PAGE_ALIGN(base);
+	}
+
+	return base;
+}
+
 int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
 {
 	struct mips_vdso_image *image = current->thread.abi->vdso;
@@ -138,7 +154,7 @@ int arch_setup_additional_pages(struct l
 	if (cpu_has_dc_aliases)
 		size += shm_align_mask + 1;
 
-	base = get_unmapped_area(NULL, 0, size, 0, 0);
+	base = get_unmapped_area(NULL, vdso_base(), size, 0, 0);
 	if (IS_ERR_VALUE(base)) {
 		ret = base;
 		goto out;
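
The arithmetic in mips_stack_top() can be tried out in user space with
the sketch below. All inputs are hypothetical parameters instead of
kernel state, 4 KiB pages are assumed, and only the 32-bit (1MB)
randomization window is modelled.

```c
#include <stdbool.h>

#define DEMO_PAGE_SIZE	0x1000UL	/* assumes 4 KiB pages */
#define DEMO_PAGE_MASK	(~(DEMO_PAGE_SIZE - 1))
#define DEMO_SZ_1M	0x100000UL

/* Hypothetical inputs standing in for per-task and per-CPU kernel state. */
struct stack_top_params {
	unsigned long task_size;
	unsigned long vdso_size;	/* image size, page aligned below */
	bool gic_present;
	unsigned long colour_span;	/* shm_align_mask + 1, or 0 if unused */
	bool randomize;			/* PF_RANDOMIZE set */
};

static unsigned long demo_stack_top(const struct stack_top_params *p)
{
	unsigned long top = p->task_size & DEMO_PAGE_MASK;

	top -= DEMO_PAGE_SIZE;		/* delay slot emulation page */
	top -= (p->vdso_size + DEMO_PAGE_SIZE - 1) & DEMO_PAGE_MASK;	/* VDSO */
	top -= DEMO_PAGE_SIZE;		/* data page */
	top -= p->gic_present ? DEMO_PAGE_SIZE : 0;	/* GIC user page */
	top -= p->colour_span;		/* cache colour alignment */
	if (p->randomize)
		top -= DEMO_SZ_1M;	/* window for the randomized VDSO base */
	return top;
}
```

For example, with a made-up task size of 0x80000000, a one-page VDSO
image, no GIC page, no colour alignment and randomization enabled, the
sketch yields 0x7fefd000: three pages plus the 1MB window below the top
of the address space.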




* [PATCH 4.9 17/71] mach64: detect the dot clock divider correctly on sparc
  2018-10-16 17:08 [PATCH 4.9 00/71] 4.9.134-stable review Greg Kroah-Hartman
                   ` (15 preceding siblings ...)
  2018-10-16 17:09 ` [PATCH 4.9 16/71] MIPS: VDSO: Always map near top of user memory Greg Kroah-Hartman
@ 2018-10-16 17:09 ` Greg Kroah-Hartman
  2018-10-16 17:09 ` [PATCH 4.9 18/71] perf script python: Fix export-to-postgresql.py occasional failure Greg Kroah-Hartman
                   ` (58 subsequent siblings)
  75 siblings, 0 replies; 82+ messages in thread
From: Greg Kroah-Hartman @ 2018-10-16 17:09 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Mikulas Patocka, David S. Miller,
	Ville Syrjälä

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Mikulas Patocka <mpatocka@redhat.com>

commit 76ebebd2464c5c8a4453c98b6dbf9c95a599e810 upstream.

On Sun Ultra 5, it happens that the dot clock is not set up properly for
some videomodes. For example, if we set the videomode "r1024x768x60" in
the firmware, Linux would incorrectly set a videomode with refresh rate
180Hz when booting (surprisingly, my LCD monitor can display it, although
display quality is very low).

The reason is this: Older mach64 cards set the divider in the register
VCLK_POST_DIV. The register has four 2-bit fields (the field that is
actually used is specified in the lowest two bits of the register
CLOCK_CNTL). The 2 bits select divider "1, 2, 4, 8". On newer mach64 cards,
there's another bit added - the top four bits of PLL_EXT_CNTL extend the
divider selection, so we have possible dividers "1, 2, 4, 8, 3, 5, 6, 12".
The Linux driver clears the top four bits of PLL_EXT_CNTL and never sets
them, so it can work regardless if the card supports them. However, the
sparc64 firmware may set these extended dividers during boot - and the
mach64 driver detects incorrect dot clock in this case.

This patch makes the driver read the additional divider bit from
PLL_EXT_CNTL and calculate the initial refresh rate properly.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Acked-by: David S. Miller <davem@davemloft.net>
Reviewed-by: Ville Syrjälä <syrjala@sci.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/video/fbdev/aty/atyfb.h      |    3 ++-
 drivers/video/fbdev/aty/atyfb_base.c |    7 ++++---
 drivers/video/fbdev/aty/mach64_ct.c  |   10 +++++-----
 3 files changed, 11 insertions(+), 9 deletions(-)

--- a/drivers/video/fbdev/aty/atyfb.h
+++ b/drivers/video/fbdev/aty/atyfb.h
@@ -332,6 +332,8 @@ extern const struct aty_pll_ops aty_pll_
 extern void aty_set_pll_ct(const struct fb_info *info, const union aty_pll *pll);
 extern u8 aty_ld_pll_ct(int offset, const struct atyfb_par *par);
 
+extern const u8 aty_postdividers[8];
+
 
     /*
      *  Hardware cursor support
@@ -358,7 +360,6 @@ static inline void wait_for_idle(struct
 
 extern void aty_reset_engine(const struct atyfb_par *par);
 extern void aty_init_engine(struct atyfb_par *par, struct fb_info *info);
-extern u8   aty_ld_pll_ct(int offset, const struct atyfb_par *par);
 
 void atyfb_copyarea(struct fb_info *info, const struct fb_copyarea *area);
 void atyfb_fillrect(struct fb_info *info, const struct fb_fillrect *rect);
--- a/drivers/video/fbdev/aty/atyfb_base.c
+++ b/drivers/video/fbdev/aty/atyfb_base.c
@@ -3093,17 +3093,18 @@ static int atyfb_setup_sparc(struct pci_
 		/*
 		 * PLL Reference Divider M:
 		 */
-		M = pll_regs[2];
+		M = pll_regs[PLL_REF_DIV];
 
 		/*
 		 * PLL Feedback Divider N (Dependent on CLOCK_CNTL):
 		 */
-		N = pll_regs[7 + (clock_cntl & 3)];
+		N = pll_regs[VCLK0_FB_DIV + (clock_cntl & 3)];
 
 		/*
 		 * PLL Post Divider P (Dependent on CLOCK_CNTL):
 		 */
-		P = 1 << (pll_regs[6] >> ((clock_cntl & 3) << 1));
+		P = aty_postdividers[((pll_regs[VCLK_POST_DIV] >> ((clock_cntl & 3) << 1)) & 3) |
+		                     ((pll_regs[PLL_EXT_CNTL] >> (2 + (clock_cntl & 3))) & 4)];
 
 		/*
 		 * PLL Divider Q:
--- a/drivers/video/fbdev/aty/mach64_ct.c
+++ b/drivers/video/fbdev/aty/mach64_ct.c
@@ -114,7 +114,7 @@ static void aty_st_pll_ct(int offset, u8
  */
 
 #define Maximum_DSP_PRECISION 7
-static u8 postdividers[] = {1,2,4,8,3};
+const u8 aty_postdividers[8] = {1,2,4,8,3,5,6,12};
 
 static int aty_dsp_gt(const struct fb_info *info, u32 bpp, struct pll_ct *pll)
 {
@@ -221,7 +221,7 @@ static int aty_valid_pll_ct(const struct
 		pll->vclk_post_div += (q <  64*8);
 		pll->vclk_post_div += (q <  32*8);
 	}
-	pll->vclk_post_div_real = postdividers[pll->vclk_post_div];
+	pll->vclk_post_div_real = aty_postdividers[pll->vclk_post_div];
 	//    pll->vclk_post_div <<= 6;
 	pll->vclk_fb_div = q * pll->vclk_post_div_real / 8;
 	pllvclk = (1000000 * 2 * pll->vclk_fb_div) /
@@ -512,7 +512,7 @@ static int aty_init_pll_ct(const struct
 		u8 mclk_fb_div, pll_ext_cntl;
 		pll->ct.pll_ref_div = aty_ld_pll_ct(PLL_REF_DIV, par);
 		pll_ext_cntl = aty_ld_pll_ct(PLL_EXT_CNTL, par);
-		pll->ct.xclk_post_div_real = postdividers[pll_ext_cntl & 0x07];
+		pll->ct.xclk_post_div_real = aty_postdividers[pll_ext_cntl & 0x07];
 		mclk_fb_div = aty_ld_pll_ct(MCLK_FB_DIV, par);
 		if (pll_ext_cntl & PLL_MFB_TIMES_4_2B)
 			mclk_fb_div <<= 1;
@@ -534,7 +534,7 @@ static int aty_init_pll_ct(const struct
 		xpost_div += (q <  64*8);
 		xpost_div += (q <  32*8);
 	}
-	pll->ct.xclk_post_div_real = postdividers[xpost_div];
+	pll->ct.xclk_post_div_real = aty_postdividers[xpost_div];
 	pll->ct.mclk_fb_div = q * pll->ct.xclk_post_div_real / 8;
 
 #ifdef CONFIG_PPC
@@ -583,7 +583,7 @@ static int aty_init_pll_ct(const struct
 			mpost_div += (q <  64*8);
 			mpost_div += (q <  32*8);
 		}
-		sclk_post_div_real = postdividers[mpost_div];
+		sclk_post_div_real = aty_postdividers[mpost_div];
 		pll->ct.sclk_fb_div = q * sclk_post_div_real / 8;
 		pll->ct.spll_cntl2 = mpost_div << 4;
 #ifdef DEBUG
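
The post divider lookup introduced in atyfb_setup_sparc() can be
exercised on its own. The sketch below mirrors the bit manipulation in
the patch; the function names and the register values used to try it out
are made up, and post_div_old() is only meaningful for small field
values because, like the code it models, it does not mask its shift
count.

```c
#include <stdint.h>

/* Same table the patch exports as aty_postdividers[]. */
static const uint8_t demo_postdividers[8] = { 1, 2, 4, 8, 3, 5, 6, 12 };

/* Pre-patch calculation: ignores PLL_EXT_CNTL entirely. */
static unsigned int post_div_old(uint8_t vclk_post_div, uint8_t clock_cntl)
{
	return 1u << (vclk_post_div >> ((clock_cntl & 3) << 1));
}

/* Patched calculation: two bits from VCLK_POST_DIV, one from PLL_EXT_CNTL. */
static unsigned int post_div_new(uint8_t vclk_post_div, uint8_t pll_ext_cntl,
				 uint8_t clock_cntl)
{
	unsigned int sel = clock_cntl & 3;
	unsigned int low = (vclk_post_div >> (sel << 1)) & 3;
	unsigned int high = (pll_ext_cntl >> (2 + sel)) & 4;

	return demo_postdividers[low | high];
}
```

One combination consistent with the 180Hz report is a zero low field
with the extension bit set: the old code computes a divider of 1 where
the hardware is dividing by 3, so the detected dot clock is three times
too high.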




* [PATCH 4.9 18/71] perf script python: Fix export-to-postgresql.py occasional failure
  2018-10-16 17:08 [PATCH 4.9 00/71] 4.9.134-stable review Greg Kroah-Hartman
                   ` (16 preceding siblings ...)
  2018-10-16 17:09 ` [PATCH 4.9 17/71] mach64: detect the dot clock divider correctly on sparc Greg Kroah-Hartman
@ 2018-10-16 17:09 ` Greg Kroah-Hartman
  2018-10-16 17:09 ` [PATCH 4.9 19/71] mm: Preserve _PAGE_DEVMAP across mprotect() calls Greg Kroah-Hartman
                   ` (57 subsequent siblings)
  75 siblings, 0 replies; 82+ messages in thread
From: Greg Kroah-Hartman @ 2018-10-16 17:09 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Adrian Hunter, Jiri Olsa,
	Arnaldo Carvalho de Melo

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Adrian Hunter <adrian.hunter@intel.com>

commit 25e11700b54c7b6b5ebfc4361981dae12299557b upstream.

Occasional export failures were found to be caused by truncating 64-bit
pointers to 32 bits. Fix by explicitly setting types for all ctypes
arguments and results.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20180911114504.28516-2-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 tools/perf/scripts/python/export-to-postgresql.py |    9 +++++++++
 1 file changed, 9 insertions(+)

--- a/tools/perf/scripts/python/export-to-postgresql.py
+++ b/tools/perf/scripts/python/export-to-postgresql.py
@@ -204,14 +204,23 @@ from ctypes import *
 libpq = CDLL("libpq.so.5")
 PQconnectdb = libpq.PQconnectdb
 PQconnectdb.restype = c_void_p
+PQconnectdb.argtypes = [ c_char_p ]
 PQfinish = libpq.PQfinish
+PQfinish.argtypes = [ c_void_p ]
 PQstatus = libpq.PQstatus
+PQstatus.restype = c_int
+PQstatus.argtypes = [ c_void_p ]
 PQexec = libpq.PQexec
 PQexec.restype = c_void_p
+PQexec.argtypes = [ c_void_p, c_char_p ]
 PQresultStatus = libpq.PQresultStatus
+PQresultStatus.restype = c_int
+PQresultStatus.argtypes = [ c_void_p ]
 PQputCopyData = libpq.PQputCopyData
+PQputCopyData.restype = c_int
 PQputCopyData.argtypes = [ c_void_p, c_void_p, c_int ]
 PQputCopyEnd = libpq.PQputCopyEnd
+PQputCopyEnd.restype = c_int
 PQputCopyEnd.argtypes = [ c_void_p, c_void_p ]
 
 sys.path.append(os.environ['PERF_EXEC_PATH'] + \
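
The truncation this patch fixes can be reproduced without libpq. What follows is a minimal sketch, not part of the upstream commit: PTR is a made-up 64-bit address and both helper names are hypothetical. It relies on documented ctypes behaviour: a foreign function with no declared restype is assumed to return a C int, and Python integers passed without argtypes are converted to C int as well.

```python
import ctypes

# Made-up 64-bit address, used only for illustration.
PTR = 0x00007F0930720010


def via_default_int(value):
    """Model an undeclared restype/argtype: the value goes through a C int."""
    return ctypes.c_int(value).value


def via_void_p(value):
    """Model restype/argtypes declared as c_void_p: full pointer width."""
    return ctypes.c_void_p(value).value


truncated = via_default_int(PTR)   # only the low 32 bits survive
preserved = via_void_p(PTR)        # unchanged on a 64-bit system

print(hex(truncated), hex(preserved))
```

Whether a real export run fails depends on where the allocator happens to place the connection object, which would be consistent with the failures being occasional.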




* [PATCH 4.9 19/71] mm: Preserve _PAGE_DEVMAP across mprotect() calls
  2018-10-16 17:08 [PATCH 4.9 00/71] 4.9.134-stable review Greg Kroah-Hartman
                   ` (17 preceding siblings ...)
  2018-10-16 17:09 ` [PATCH 4.9 18/71] perf script python: Fix export-to-postgresql.py occasional failure Greg Kroah-Hartman
@ 2018-10-16 17:09 ` Greg Kroah-Hartman
  2018-10-16 17:09 ` [PATCH 4.9 20/71] i2c: i2c-scmi: fix for i2c_smbus_write_block_data Greg Kroah-Hartman
                   ` (56 subsequent siblings)
  75 siblings, 0 replies; 82+ messages in thread
From: Greg Kroah-Hartman @ 2018-10-16 17:09 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Jan Kara, Michal Hocko,
	Johannes Thumshirn, Dan Williams

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Jan Kara <jack@suse.cz>

commit 4628a64591e6cee181237060961e98c615c33966 upstream.

Currently the _PAGE_DEVMAP bit is not preserved in mprotect(2) calls. As a
result we will see warnings such as:

BUG: Bad page map in process JobWrk0013  pte:800001803875ea25 pmd:7624381067
addr:00007f0930720000 vm_flags:280000f9 anon_vma:          (null) mapping:ffff97f2384056f0 index:0
file:457-000000fe00000030-00000009-000000ca-00000001_2001.fileblock fault:xfs_filemap_fault [xfs] mmap:xfs_file_mmap [xfs] readpage:          (null)
CPU: 3 PID: 15848 Comm: JobWrk0013 Tainted: G        W          4.12.14-2.g7573215-default #1 SLE12-SP4 (unreleased)
Hardware name: Intel Corporation S2600WFD/S2600WFD, BIOS SE5C620.86B.01.00.0833.051120182255 05/11/2018
Call Trace:
 dump_stack+0x5a/0x75
 print_bad_pte+0x217/0x2c0
 ? enqueue_task_fair+0x76/0x9f0
 _vm_normal_page+0xe5/0x100
 zap_pte_range+0x148/0x740
 unmap_page_range+0x39a/0x4b0
 unmap_vmas+0x42/0x90
 unmap_region+0x99/0xf0
 ? vma_gap_callbacks_rotate+0x1a/0x20
 do_munmap+0x255/0x3a0
 vm_munmap+0x54/0x80
 SyS_munmap+0x1d/0x30
 do_syscall_64+0x74/0x150
 entry_SYSCALL_64_after_hwframe+0x3d/0xa2
...

when mprotect(2) gets used on DAX mappings. A wide variety of other
failures can also result from the missing _PAGE_DEVMAP flag when the
area is later used by get_user_pages().

Fix the problem by including _PAGE_DEVMAP in the set of flags that are
preserved by mprotect(2).

Fixes: 69660fd797c3 ("x86, mm: introduce _PAGE_DEVMAP")
Fixes: ebd31197931d ("powerpc/mm: Add devmap support for ppc64")
Cc: <stable@vger.kernel.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 arch/powerpc/include/asm/book3s/64/pgtable.h |    4 ++--
 arch/x86/include/asm/pgtable_types.h         |    2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -53,7 +53,7 @@
  */
 #define _HPAGE_CHG_MASK (PTE_RPN_MASK | _PAGE_HPTEFLAGS | _PAGE_DIRTY | \
 			 _PAGE_ACCESSED | H_PAGE_THP_HUGE | _PAGE_PTE | \
-			 _PAGE_SOFT_DIRTY)
+			 _PAGE_SOFT_DIRTY | _PAGE_DEVMAP)
 /*
  * user access blocked by key
  */
@@ -71,7 +71,7 @@
  */
 #define _PAGE_CHG_MASK	(PTE_RPN_MASK | _PAGE_HPTEFLAGS | _PAGE_DIRTY | \
 			 _PAGE_ACCESSED | _PAGE_SPECIAL | _PAGE_PTE |	\
-			 _PAGE_SOFT_DIRTY)
+			 _PAGE_SOFT_DIRTY | _PAGE_DEVMAP)
 /*
  * Mask of bits returned by pte_pgprot()
  */
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -134,7 +134,7 @@
  */
 #define _PAGE_CHG_MASK	(PTE_PFN_MASK | _PAGE_PCD | _PAGE_PWT |		\
 			 _PAGE_SPECIAL | _PAGE_ACCESSED | _PAGE_DIRTY |	\
-			 _PAGE_SOFT_DIRTY)
+			 _PAGE_SOFT_DIRTY | _PAGE_DEVMAP)
 #define _HPAGE_CHG_MASK (_PAGE_CHG_MASK | _PAGE_PSE)
 
 /* The ASID is the lower 12 bits of CR3 */
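
The role of _PAGE_CHG_MASK can be sketched as follows. This is a simplified model, not the kernel code: the bit positions and the sample PTE value are hypothetical, and pte_modify() here only mirrors the masking idea (bits inside the mask are kept from the old PTE, everything else comes from the new protection).

```python
# Hypothetical bit positions; only the masking logic matters here.
PAGE_RW = 1 << 1
PAGE_DIRTY = 1 << 6
PAGE_DEVMAP = 1 << 58
PFN_MASK = 0x000FFFFFFFFFF000


def pte_modify(pte, newprot, chg_mask):
    """Keep the bits in chg_mask from the old PTE, take the rest from newprot."""
    return (pte & chg_mask) | (newprot & ~chg_mask)


OLD_MASK = PFN_MASK | PAGE_DIRTY          # mask before the patch
NEW_MASK = OLD_MASK | PAGE_DEVMAP         # mask after the patch

pte = 0x3875E000 | PAGE_RW | PAGE_DIRTY | PAGE_DEVMAP
read_only = 0                             # new protection without PAGE_RW

before_fix = pte_modify(pte, read_only, OLD_MASK)
after_fix = pte_modify(pte, read_only, NEW_MASK)

print(bool(before_fix & PAGE_DEVMAP), bool(after_fix & PAGE_DEVMAP))
```

With the old mask the devmap bit is silently dropped by the protection change; with the new mask it survives while the write bit is still cleared.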




* [PATCH 4.9 20/71] i2c: i2c-scmi: fix for i2c_smbus_write_block_data
  2018-10-16 17:08 [PATCH 4.9 00/71] 4.9.134-stable review Greg Kroah-Hartman
                   ` (18 preceding siblings ...)
  2018-10-16 17:09 ` [PATCH 4.9 19/71] mm: Preserve _PAGE_DEVMAP across mprotect() calls Greg Kroah-Hartman
@ 2018-10-16 17:09 ` Greg Kroah-Hartman
  2018-10-16 17:09 ` [PATCH 4.9 21/71] xhci: Dont print a warning when setting link state for disabled ports Greg Kroah-Hartman
                   ` (55 subsequent siblings)
  75 siblings, 0 replies; 82+ messages in thread
From: Greg Kroah-Hartman @ 2018-10-16 17:09 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Edgar Cherkasov, Viktor Krasnov,
	Michael Brunner, Wolfram Sang

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Edgar Cherkasov <echerkasov@dev.rtsoft.ru>

commit 08d9db00fe0e300d6df976e6c294f974988226dd upstream.

The i2c-scmi driver crashes when the SMBus Write Block transaction is
executed:

WARNING: CPU: 9 PID: 2194 at mm/page_alloc.c:3931 __alloc_pages_slowpath+0x9db/0xec0
 Call Trace:
  ? get_page_from_freelist+0x49d/0x11f0
  ? alloc_pages_current+0x6a/0xe0
  ? new_slab+0x499/0x690
  __alloc_pages_nodemask+0x265/0x280
  alloc_pages_current+0x6a/0xe0
  kmalloc_order+0x18/0x40
  kmalloc_order_trace+0x24/0xb0
  ? acpi_ut_allocate_object_desc_dbg+0x62/0x10c
  __kmalloc+0x203/0x220
  acpi_os_allocate_zeroed+0x34/0x36
  acpi_ut_copy_eobject_to_iobject+0x266/0x31e
  acpi_evaluate_object+0x166/0x3b2
  acpi_smbus_cmi_access+0x144/0x530 [i2c_scmi]
  i2c_smbus_xfer+0xda/0x370
  i2cdev_ioctl_smbus+0x1bd/0x270
  i2cdev_ioctl+0xaa/0x250
  do_vfs_ioctl+0xa4/0x600
  SyS_ioctl+0x79/0x90
  do_syscall_64+0x73/0x130
  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
ACPI Error: Evaluating _SBW: 4 (20170831/smbus_cmi-185)

This problem occurs because the length of the ACPI Buffer object is not
initialized before the corresponding ACPI method is called, so the
allocation seen in the trace above is sized by an arbitrary value. The
patch below sets the length explicitly.

Signed-off-by: Edgar Cherkasov <echerkasov@dev.rtsoft.ru>
Acked-by: Viktor Krasnov <vkrasnov@dev.rtsoft.ru>
Acked-by: Michael Brunner <Michael.Brunner@kontron.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/i2c/busses/i2c-scmi.c |    1 +
 1 file changed, 1 insertion(+)

--- a/drivers/i2c/busses/i2c-scmi.c
+++ b/drivers/i2c/busses/i2c-scmi.c
@@ -152,6 +152,7 @@ acpi_smbus_cmi_access(struct i2c_adapter
 			mt_params[3].type = ACPI_TYPE_INTEGER;
 			mt_params[3].integer.value = len;
 			mt_params[4].type = ACPI_TYPE_BUFFER;
+			mt_params[4].buffer.length = len;
 			mt_params[4].buffer.pointer = data->block + 1;
 		}
 		break;




* [PATCH 4.9 21/71] xhci: Dont print a warning when setting link state for disabled ports
  2018-10-16 17:08 [PATCH 4.9 00/71] 4.9.134-stable review Greg Kroah-Hartman
                   ` (19 preceding siblings ...)
  2018-10-16 17:09 ` [PATCH 4.9 20/71] i2c: i2c-scmi: fix for i2c_smbus_write_block_data Greg Kroah-Hartman
@ 2018-10-16 17:09 ` Greg Kroah-Hartman
  2018-10-16 17:09 ` [PATCH 4.9 22/71] bnxt_en: Fix TX timeout during netpoll Greg Kroah-Hartman
                   ` (54 subsequent siblings)
  75 siblings, 0 replies; 82+ messages in thread
From: Greg Kroah-Hartman @ 2018-10-16 17:09 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Mathias Nyman, Yoshihiro Shimoda,
	Ross Zwisler

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Mathias Nyman <mathias.nyman@linux.intel.com>

commit 1208d8a84fdcae6b395c57911cdf907450d30e70 upstream.

When disabling a USB3 port the hub driver will set the port link state to
U3 to prevent "ejected" or "safely removed" devices that are still
physically connected from immediately re-enumerating.

If the device was really unplugged, error messages were printed
because the hub tried to set the U3 link state for a port that was no
longer enabled:

xhci-hcd ee000000.usb: Cannot set link state.
usb usb8-port1: cannot disable (err = -32)

Don't print an error message in xhci-hub if the hub tries to set the port
link state for a disabled port. Return -ENODEV instead, which also silences
the hub driver.

Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Tested-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: Ross Zwisler <zwisler@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/usb/host/xhci-hub.c |   18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

--- a/drivers/usb/host/xhci-hub.c
+++ b/drivers/usb/host/xhci-hub.c
@@ -1072,17 +1072,17 @@ int xhci_hub_control(struct usb_hcd *hcd
 				temp = readl(port_array[wIndex]);
 				break;
 			}
-
-			/* Software should not attempt to set
-			 * port link state above '3' (U3) and the port
-			 * must be enabled.
-			 */
-			if ((temp & PORT_PE) == 0 ||
-				(link_state > USB_SS_PORT_LS_U3)) {
-				xhci_warn(xhci, "Cannot set link state.\n");
+			/* Port must be enabled */
+			if (!(temp & PORT_PE)) {
+				retval = -ENODEV;
+				break;
+			}
+			/* Can't set port link state above '3' (U3) */
+			if (link_state > USB_SS_PORT_LS_U3) {
+				xhci_warn(xhci, "Cannot set port %d link state %d\n",
+					 wIndex, link_state);
 				goto error;
 			}
-
 			if (link_state == USB_SS_PORT_LS_U3) {
 				slot_id = xhci_find_slot_id_by_port(hcd, xhci,
 						wIndex + 1);
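
The reordered checks can be modelled in a few lines. This is a sketch under stated assumptions, not driver code: the constants are hypothetical encodings, set_link_state() is an invented name, and the -EPIPE result for the second check is assumed from the "err = -32" seen in the log quoted in the commit message.

```python
import errno

# Hypothetical encodings; the real values live in the xHCI and USB headers.
PORT_PE = 1 << 1       # "port enabled" bit in the port status register
LINK_STATE_U3 = 3      # highest link state software may request


def set_link_state(portsc, link_state, warnings):
    """Return 0 or a negative errno, mirroring the order of the new checks."""
    if not (portsc & PORT_PE):
        # Disabled port: fail quietly so an unplugged device logs nothing.
        return -errno.ENODEV
    if link_state > LINK_STATE_U3:
        # Still a genuine caller error, so it is still reported.
        warnings.append(f"Cannot set link state {link_state}")
        return -errno.EPIPE   # assumed error code, see lead-in
    return 0


log = []
print(set_link_state(0, LINK_STATE_U3, log), log)
```

Splitting the combined condition is what lets the two cases return different codes and lets only one of them print.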




* [PATCH 4.9 22/71] bnxt_en: Fix TX timeout during netpoll.
  2018-10-16 17:08 [PATCH 4.9 00/71] 4.9.134-stable review Greg Kroah-Hartman
                   ` (20 preceding siblings ...)
  2018-10-16 17:09 ` [PATCH 4.9 21/71] xhci: Dont print a warning when setting link state for disabled ports Greg Kroah-Hartman
@ 2018-10-16 17:09 ` Greg Kroah-Hartman
  2018-10-16 17:09 ` [PATCH 4.9 23/71] bonding: avoid possible dead-lock Greg Kroah-Hartman
                   ` (53 subsequent siblings)
  75 siblings, 0 replies; 82+ messages in thread
From: Greg Kroah-Hartman @ 2018-10-16 17:09 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Song Liu, Michael Chan, David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Michael Chan <michael.chan@broadcom.com>

[ Upstream commit 73f21c653f930f438d53eed29b5e4c65c8a0f906 ]

The current netpoll implementation in the bnxt_en driver has problems
that may cause it to miss TX completion events.  bnxt_poll_work() in effect is
only handling at most 1 TX packet before exiting.  In addition,
there may be in flight TX completions that ->poll() may miss even
after we fix bnxt_poll_work() to handle all visible TX completions.
netpoll may not call ->poll() again and HW may not generate IRQ
because the driver does not ARM the IRQ when the budget (0 for netpoll)
is reached.

We fix it by handling all TX completions and always ARMing the IRQ
when we exit ->poll() with 0 budget.

Also, the logic to ACK the completion ring in case it is almost filled
with TX completions needs to be adjusted to take care of the 0 budget
case, as discussed with Eric Dumazet <edumazet@google.com>.

Reported-by: Song Liu <songliubraving@fb.com>
Reviewed-by: Song Liu <songliubraving@fb.com>
Tested-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c |   13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -1666,8 +1666,11 @@ static int bnxt_poll_work(struct bnxt *b
 		if (TX_CMP_TYPE(txcmp) == CMP_TYPE_TX_L2_CMP) {
 			tx_pkts++;
 			/* return full budget so NAPI will complete. */
-			if (unlikely(tx_pkts > bp->tx_wake_thresh))
+			if (unlikely(tx_pkts > bp->tx_wake_thresh)) {
 				rx_pkts = budget;
+				raw_cons = NEXT_RAW_CMP(raw_cons);
+				break;
+			}
 		} else if ((TX_CMP_TYPE(txcmp) & 0x30) == 0x10) {
 			rc = bnxt_rx_pkt(bp, bnapi, &raw_cons, &agg_event);
 			if (likely(rc >= 0))
@@ -1685,7 +1688,7 @@ static int bnxt_poll_work(struct bnxt *b
 		}
 		raw_cons = NEXT_RAW_CMP(raw_cons);
 
-		if (rx_pkts == budget)
+		if (rx_pkts && rx_pkts == budget)
 			break;
 	}
 
@@ -1797,8 +1800,12 @@ static int bnxt_poll(struct napi_struct
 	while (1) {
 		work_done += bnxt_poll_work(bp, bnapi, budget - work_done);
 
-		if (work_done >= budget)
+		if (work_done >= budget) {
+			if (!budget)
+				BNXT_CP_DB_REARM(cpr->cp_doorbell,
+						 cpr->cp_raw_cons);
 			break;
+		}
 
 		if (!bnxt_has_work(bp, cpr)) {
 			napi_complete(napi);
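
Why a budget of 0 stopped the loop after one completion can be seen in a small model. This is a sketch, not the driver: poll_work() and its arguments are hypothetical, the completion ring is a plain list, and the tx_wake_thresh early exit is left out.

```python
def poll_work(completions, budget, fixed):
    """Return how many TX completions one pass over the ring handles."""
    tx_pkts = 0
    rx_pkts = 0
    for kind in completions:
        if kind == "tx":
            tx_pkts += 1
        else:
            rx_pkts += 1
        if fixed:
            # New check: only stop once some RX work has been done.
            stop = rx_pkts != 0 and rx_pkts == budget
        else:
            # Old check: 0 == 0 is true right after the first entry.
            stop = rx_pkts == budget
        if stop:
            break
    return tx_pkts


ring = ["tx"] * 5
print(poll_work(ring, budget=0, fixed=False), poll_work(ring, budget=0, fixed=True))
```

With netpoll's budget of 0 the old comparison is satisfied before any RX packet is counted, which matches the commit message's statement that at most one TX packet was handled.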




* [PATCH 4.9 23/71] bonding: avoid possible dead-lock
  2018-10-16 17:08 [PATCH 4.9 00/71] 4.9.134-stable review Greg Kroah-Hartman
                   ` (21 preceding siblings ...)
  2018-10-16 17:09 ` [PATCH 4.9 22/71] bnxt_en: Fix TX timeout during netpoll Greg Kroah-Hartman
@ 2018-10-16 17:09 ` Greg Kroah-Hartman
  2018-10-16 17:09 ` [PATCH 4.9 24/71] ip6_tunnel: be careful when accessing the inner header Greg Kroah-Hartman
                   ` (52 subsequent siblings)
  75 siblings, 0 replies; 82+ messages in thread
From: Greg Kroah-Hartman @ 2018-10-16 17:09 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Mahesh Bandewar, David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Mahesh Bandewar <maheshb@google.com>

[ Upstream commit d4859d749aa7090ffb743d15648adb962a1baeae ]

Syzkaller reported this on a slightly older kernel, but it still
applies to the current kernel:

======================================================
WARNING: possible circular locking dependency detected
4.18.0-next-20180823+ #46 Not tainted
------------------------------------------------------
syz-executor4/26841 is trying to acquire lock:
00000000dd41ef48 ((wq_completion)bond_dev->name){+.+.}, at: flush_workqueue+0x2db/0x1e10 kernel/workqueue.c:2652

but task is already holding lock:
00000000768ab431 (rtnl_mutex){+.+.}, at: rtnl_lock net/core/rtnetlink.c:77 [inline]
00000000768ab431 (rtnl_mutex){+.+.}, at: rtnetlink_rcv_msg+0x412/0xc30 net/core/rtnetlink.c:4708

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #2 (rtnl_mutex){+.+.}:
       __mutex_lock_common kernel/locking/mutex.c:925 [inline]
       __mutex_lock+0x171/0x1700 kernel/locking/mutex.c:1073
       mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:1088
       rtnl_lock+0x17/0x20 net/core/rtnetlink.c:77
       bond_netdev_notify drivers/net/bonding/bond_main.c:1310 [inline]
       bond_netdev_notify_work+0x44/0xd0 drivers/net/bonding/bond_main.c:1320
       process_one_work+0xc73/0x1aa0 kernel/workqueue.c:2153
       worker_thread+0x189/0x13c0 kernel/workqueue.c:2296
       kthread+0x35a/0x420 kernel/kthread.c:246
       ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:415

-> #1 ((work_completion)(&(&nnw->work)->work)){+.+.}:
       process_one_work+0xc0b/0x1aa0 kernel/workqueue.c:2129
       worker_thread+0x189/0x13c0 kernel/workqueue.c:2296
       kthread+0x35a/0x420 kernel/kthread.c:246
       ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:415

-> #0 ((wq_completion)bond_dev->name){+.+.}:
       lock_acquire+0x1e4/0x4f0 kernel/locking/lockdep.c:3901
       flush_workqueue+0x30a/0x1e10 kernel/workqueue.c:2655
       drain_workqueue+0x2a9/0x640 kernel/workqueue.c:2820
       destroy_workqueue+0xc6/0x9d0 kernel/workqueue.c:4155
       __alloc_workqueue_key+0xef9/0x1190 kernel/workqueue.c:4138
       bond_init+0x269/0x940 drivers/net/bonding/bond_main.c:4734
       register_netdevice+0x337/0x1100 net/core/dev.c:8410
       bond_newlink+0x49/0xa0 drivers/net/bonding/bond_netlink.c:453
       rtnl_newlink+0xef4/0x1d50 net/core/rtnetlink.c:3099
       rtnetlink_rcv_msg+0x46e/0xc30 net/core/rtnetlink.c:4711
       netlink_rcv_skb+0x172/0x440 net/netlink/af_netlink.c:2454
       rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:4729
       netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
       netlink_unicast+0x5a0/0x760 net/netlink/af_netlink.c:1343
       netlink_sendmsg+0xa18/0xfc0 net/netlink/af_netlink.c:1908
       sock_sendmsg_nosec net/socket.c:622 [inline]
       sock_sendmsg+0xd5/0x120 net/socket.c:632
       ___sys_sendmsg+0x7fd/0x930 net/socket.c:2115
       __sys_sendmsg+0x11d/0x290 net/socket.c:2153
       __do_sys_sendmsg net/socket.c:2162 [inline]
       __se_sys_sendmsg net/socket.c:2160 [inline]
       __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2160
       do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe

other info that might help us debug this:

Chain exists of:
  (wq_completion)bond_dev->name --> (work_completion)(&(&nnw->work)->work) --> rtnl_mutex

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(rtnl_mutex);
                               lock((work_completion)(&(&nnw->work)->work));
                               lock(rtnl_mutex);
  lock((wq_completion)bond_dev->name);

 *** DEADLOCK ***

1 lock held by syz-executor4/26841:

stack backtrace:
CPU: 1 PID: 26841 Comm: syz-executor4 Not tainted 4.18.0-next-20180823+ #46
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113
 print_circular_bug.isra.34.cold.55+0x1bd/0x27d kernel/locking/lockdep.c:1222
 check_prev_add kernel/locking/lockdep.c:1862 [inline]
 check_prevs_add kernel/locking/lockdep.c:1975 [inline]
 validate_chain kernel/locking/lockdep.c:2416 [inline]
 __lock_acquire+0x3449/0x5020 kernel/locking/lockdep.c:3412
 lock_acquire+0x1e4/0x4f0 kernel/locking/lockdep.c:3901
 flush_workqueue+0x30a/0x1e10 kernel/workqueue.c:2655
 drain_workqueue+0x2a9/0x640 kernel/workqueue.c:2820
 destroy_workqueue+0xc6/0x9d0 kernel/workqueue.c:4155
 __alloc_workqueue_key+0xef9/0x1190 kernel/workqueue.c:4138
 bond_init+0x269/0x940 drivers/net/bonding/bond_main.c:4734
 register_netdevice+0x337/0x1100 net/core/dev.c:8410
 bond_newlink+0x49/0xa0 drivers/net/bonding/bond_netlink.c:453
 rtnl_newlink+0xef4/0x1d50 net/core/rtnetlink.c:3099
 rtnetlink_rcv_msg+0x46e/0xc30 net/core/rtnetlink.c:4711
 netlink_rcv_skb+0x172/0x440 net/netlink/af_netlink.c:2454
 rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:4729
 netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
 netlink_unicast+0x5a0/0x760 net/netlink/af_netlink.c:1343
 netlink_sendmsg+0xa18/0xfc0 net/netlink/af_netlink.c:1908
 sock_sendmsg_nosec net/socket.c:622 [inline]
 sock_sendmsg+0xd5/0x120 net/socket.c:632
 ___sys_sendmsg+0x7fd/0x930 net/socket.c:2115
 __sys_sendmsg+0x11d/0x290 net/socket.c:2153
 __do_sys_sendmsg net/socket.c:2162 [inline]
 __se_sys_sendmsg net/socket.c:2160 [inline]
 __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2160
 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x457089
Code: fd b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f2df20a5c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f2df20a66d4 RCX: 0000000000457089
RDX: 0000000000000000 RSI: 0000000020000180 RDI: 0000000000000003
RBP: 0000000000930140 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
R13: 00000000004d40b8 R14: 00000000004c8ad8 R15: 0000000000000001

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/bonding/bond_main.c |   43 +++++++++++++++-------------------------
 include/net/bonding.h           |    7 ------
 2 files changed, 18 insertions(+), 32 deletions(-)

--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -216,6 +216,7 @@ static struct rtnl_link_stats64 *bond_ge
 static void bond_slave_arr_handler(struct work_struct *work);
 static bool bond_time_in_interval(struct bonding *bond, unsigned long last_act,
 				  int mod);
+static void bond_netdev_notify_work(struct work_struct *work);
 
 /*---------------------------- General routines -----------------------------*/
 
@@ -1250,6 +1251,8 @@ static struct slave *bond_alloc_slave(st
 			return NULL;
 		}
 	}
+	INIT_DELAYED_WORK(&slave->notify_work, bond_netdev_notify_work);
+
 	return slave;
 }
 
@@ -1257,6 +1260,7 @@ static void bond_free_slave(struct slave
 {
 	struct bonding *bond = bond_get_bond_by_slave(slave);
 
+	cancel_delayed_work_sync(&slave->notify_work);
 	if (BOND_MODE(bond) == BOND_MODE_8023AD)
 		kfree(SLAVE_AD_INFO(slave));
 
@@ -1278,39 +1282,26 @@ static void bond_fill_ifslave(struct sla
 	info->link_failure_count = slave->link_failure_count;
 }
 
-static void bond_netdev_notify(struct net_device *dev,
-			       struct netdev_bonding_info *info)
-{
-	rtnl_lock();
-	netdev_bonding_info_change(dev, info);
-	rtnl_unlock();
-}
-
 static void bond_netdev_notify_work(struct work_struct *_work)
 {
-	struct netdev_notify_work *w =
-		container_of(_work, struct netdev_notify_work, work.work);
+	struct slave *slave = container_of(_work, struct slave,
+					   notify_work.work);
+
+	if (rtnl_trylock()) {
+		struct netdev_bonding_info binfo;
 
-	bond_netdev_notify(w->dev, &w->bonding_info);
-	dev_put(w->dev);
-	kfree(w);
+		bond_fill_ifslave(slave, &binfo.slave);
+		bond_fill_ifbond(slave->bond, &binfo.master);
+		netdev_bonding_info_change(slave->dev, &binfo);
+		rtnl_unlock();
+	} else {
+		queue_delayed_work(slave->bond->wq, &slave->notify_work, 1);
+	}
 }
 
 void bond_queue_slave_event(struct slave *slave)
 {
-	struct bonding *bond = slave->bond;
-	struct netdev_notify_work *nnw = kzalloc(sizeof(*nnw), GFP_ATOMIC);
-
-	if (!nnw)
-		return;
-
-	dev_hold(slave->dev);
-	nnw->dev = slave->dev;
-	bond_fill_ifslave(slave, &nnw->bonding_info.slave);
-	bond_fill_ifbond(bond, &nnw->bonding_info.master);
-	INIT_DELAYED_WORK(&nnw->work, bond_netdev_notify_work);
-
-	queue_delayed_work(slave->bond->wq, &nnw->work, 0);
+	queue_delayed_work(slave->bond->wq, &slave->notify_work, 0);
 }
 
 void bond_lower_state_changed(struct slave *slave)
--- a/include/net/bonding.h
+++ b/include/net/bonding.h
@@ -139,12 +139,6 @@ struct bond_parm_tbl {
 	int mode;
 };
 
-struct netdev_notify_work {
-	struct delayed_work	work;
-	struct net_device	*dev;
-	struct netdev_bonding_info bonding_info;
-};
-
 struct slave {
 	struct net_device *dev; /* first - useful for panic debug */
 	struct bonding *bond; /* our master */
@@ -171,6 +165,7 @@ struct slave {
 #ifdef CONFIG_NET_POLL_CONTROLLER
 	struct netpoll *np;
 #endif
+	struct delayed_work notify_work;
 	struct kobject kobj;
 	struct rtnl_link_stats64 slave_stats;
 };
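
The trylock-or-requeue pattern used by the new bond_netdev_notify_work() can be sketched in userspace. This is an illustration under assumptions, not the kernel mechanism: a threading.Lock stands in for rtnl_mutex, and notify_work(), notify and requeue are hypothetical names.

```python
import threading

rtnl = threading.Lock()   # stand-in for rtnl_mutex


def notify_work(notify, requeue):
    """Run notify() under the lock, or requeue instead of blocking on it."""
    if rtnl.acquire(blocking=False):
        try:
            notify()
        finally:
            rtnl.release()
        return True
    # Lock is busy: hand the work back rather than sleeping on the lock.
    requeue()
    return False


events = []
with rtnl:   # someone else holds the lock, e.g. while flushing the queue
    notify_work(lambda: events.append("notified"), lambda: events.append("requeued"))
notify_work(lambda: events.append("notified"), lambda: events.append("requeued"))
print(events)
```

Because the work item never sleeps on the lock, a task that holds it while waiting for the workqueue is not left waiting on a worker that is in turn waiting for the lock.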




* [PATCH 4.9 24/71] ip6_tunnel: be careful when accessing the inner header
  2018-10-16 17:08 [PATCH 4.9 00/71] 4.9.134-stable review Greg Kroah-Hartman
                   ` (22 preceding siblings ...)
  2018-10-16 17:09 ` [PATCH 4.9 23/71] bonding: avoid possible dead-lock Greg Kroah-Hartman
@ 2018-10-16 17:09 ` Greg Kroah-Hartman
  2018-10-16 17:09 ` [PATCH 4.9 25/71] ip_tunnel: " Greg Kroah-Hartman
                   ` (51 subsequent siblings)
  75 siblings, 0 replies; 82+ messages in thread
From: Greg Kroah-Hartman @ 2018-10-16 17:09 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, syzbot+3fde91d4d394747d6db4,
	Alexander Potapenko, Paolo Abeni, David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Paolo Abeni <pabeni@redhat.com>

[ Upstream commit 76c0ddd8c3a683f6e2c6e60e11dc1a1558caf4bc ]

The ip6 tunnel xmit ndo assumes that the processed skb always
contains an ip[v6] header, but syzbot has found a way to send
frames that violate this assumption, leading to the following splat:

BUG: KMSAN: uninit-value in ip6ip6_tnl_xmit net/ipv6/ip6_tunnel.c:1307 [inline]
BUG: KMSAN: uninit-value in ip6_tnl_start_xmit+0x7d2/0x1ef0 net/ipv6/ip6_tunnel.c:1390
CPU: 0 PID: 4504 Comm: syz-executor558 Not tainted 4.16.0+ #87
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:17 [inline]
  dump_stack+0x185/0x1d0 lib/dump_stack.c:53
  kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067
  __msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:683
  ip6ip6_tnl_xmit net/ipv6/ip6_tunnel.c:1307 [inline]
  ip6_tnl_start_xmit+0x7d2/0x1ef0 net/ipv6/ip6_tunnel.c:1390
  __netdev_start_xmit include/linux/netdevice.h:4066 [inline]
  netdev_start_xmit include/linux/netdevice.h:4075 [inline]
  xmit_one net/core/dev.c:3026 [inline]
  dev_hard_start_xmit+0x5f1/0xc70 net/core/dev.c:3042
  __dev_queue_xmit+0x27ee/0x3520 net/core/dev.c:3557
  dev_queue_xmit+0x4b/0x60 net/core/dev.c:3590
  packet_snd net/packet/af_packet.c:2944 [inline]
  packet_sendmsg+0x7c70/0x8a30 net/packet/af_packet.c:2969
  sock_sendmsg_nosec net/socket.c:630 [inline]
  sock_sendmsg net/socket.c:640 [inline]
  ___sys_sendmsg+0xec0/0x1310 net/socket.c:2046
  __sys_sendmmsg+0x42d/0x800 net/socket.c:2136
  SYSC_sendmmsg+0xc4/0x110 net/socket.c:2167
  SyS_sendmmsg+0x63/0x90 net/socket.c:2162
  do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
RIP: 0033:0x441819
RSP: 002b:00007ffe58ee8268 EFLAGS: 00000213 ORIG_RAX: 0000000000000133
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000441819
RDX: 0000000000000002 RSI: 0000000020000100 RDI: 0000000000000003
RBP: 00000000006cd018 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000213 R12: 0000000000402510
R13: 00000000004025a0 R14: 0000000000000000 R15: 0000000000000000

Uninit was created at:
  kmsan_save_stack_with_flags mm/kmsan/kmsan.c:278 [inline]
  kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:188
  kmsan_kmalloc+0x94/0x100 mm/kmsan/kmsan.c:314
  kmsan_slab_alloc+0x11/0x20 mm/kmsan/kmsan.c:321
  slab_post_alloc_hook mm/slab.h:445 [inline]
  slab_alloc_node mm/slub.c:2737 [inline]
  __kmalloc_node_track_caller+0xaed/0x11c0 mm/slub.c:4369
  __kmalloc_reserve net/core/skbuff.c:138 [inline]
  __alloc_skb+0x2cf/0x9f0 net/core/skbuff.c:206
  alloc_skb include/linux/skbuff.h:984 [inline]
  alloc_skb_with_frags+0x1d4/0xb20 net/core/skbuff.c:5234
  sock_alloc_send_pskb+0xb56/0x1190 net/core/sock.c:2085
  packet_alloc_skb net/packet/af_packet.c:2803 [inline]
  packet_snd net/packet/af_packet.c:2894 [inline]
  packet_sendmsg+0x6454/0x8a30 net/packet/af_packet.c:2969
  sock_sendmsg_nosec net/socket.c:630 [inline]
  sock_sendmsg net/socket.c:640 [inline]
  ___sys_sendmsg+0xec0/0x1310 net/socket.c:2046
  __sys_sendmmsg+0x42d/0x800 net/socket.c:2136
  SYSC_sendmmsg+0xc4/0x110 net/socket.c:2167
  SyS_sendmmsg+0x63/0x90 net/socket.c:2162
  do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
  entry_SYSCALL_64_after_hwframe+0x3d/0xa2

This change addresses the issue by adding the needed check before
accessing the inner header.

The ipv4 side of the issue has apparently been present since the initial
ipv4 over ipv6 support, and the ipv6 side predates git history.

Fixes: c4d3efafcc93 ("[IPV6] IP6TUNNEL: Add support to IPv4 over IPv6 tunnel.")
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: syzbot+3fde91d4d394747d6db4@syzkaller.appspotmail.com
Tested-by: Alexander Potapenko <glider@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/ipv6/ip6_tunnel.c |   13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -1226,7 +1226,7 @@ static inline int
 ip4ip6_tnl_xmit(struct sk_buff *skb, struct net_device *dev)
 {
 	struct ip6_tnl *t = netdev_priv(dev);
-	const struct iphdr  *iph = ip_hdr(skb);
+	const struct iphdr  *iph;
 	int encap_limit = -1;
 	struct flowi6 fl6;
 	__u8 dsfield;
@@ -1234,6 +1234,11 @@ ip4ip6_tnl_xmit(struct sk_buff *skb, str
 	u8 tproto;
 	int err;
 
+	/* ensure we can access the full inner ip header */
+	if (!pskb_may_pull(skb, sizeof(struct iphdr)))
+		return -1;
+
+	iph = ip_hdr(skb);
 	memset(&(IPCB(skb)->opt), 0, sizeof(IPCB(skb)->opt));
 
 	tproto = ACCESS_ONCE(t->parms.proto);
@@ -1293,7 +1298,7 @@ static inline int
 ip6ip6_tnl_xmit(struct sk_buff *skb, struct net_device *dev)
 {
 	struct ip6_tnl *t = netdev_priv(dev);
-	struct ipv6hdr *ipv6h = ipv6_hdr(skb);
+	struct ipv6hdr *ipv6h;
 	int encap_limit = -1;
 	__u16 offset;
 	struct flowi6 fl6;
@@ -1302,6 +1307,10 @@ ip6ip6_tnl_xmit(struct sk_buff *skb, str
 	u8 tproto;
 	int err;
 
+	if (unlikely(!pskb_may_pull(skb, sizeof(*ipv6h))))
+		return -1;
+
+	ipv6h = ipv6_hdr(skb);
 	tproto = ACCESS_ONCE(t->parms.proto);
 	if ((tproto != IPPROTO_IPV6 && tproto != 0) ||
 	    ip6_tnl_addr_conflict(t, ipv6h))
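
The idea of checking the length before touching the inner header can be sketched outside the kernel. This is a simplified model, not the skb API: inner_ipv4_tos() is a hypothetical helper, the packet is a bytes object, and the length test only covers the "frame too short" case that pskb_may_pull() rejects.

```python
import struct

IPV4_HDR_LEN = 20   # sizeof(struct iphdr), without options


def inner_ipv4_tos(packet):
    """Return the TOS byte of the inner IPv4 header, or None if it is missing."""
    if len(packet) < IPV4_HDR_LEN:
        # Too short to hold a full header: refuse instead of reading garbage.
        return None
    _version_ihl, tos = struct.unpack_from("!BB", packet, 0)
    return tos


short_frame = b"\x45"
full_frame = bytes([0x45, 0x10]) + bytes(18)
print(inner_ipv4_tos(short_frame), inner_ipv4_tos(full_frame))
```

Without the length test, a one-byte frame like the one above would be parsed as if a whole header followed it, which is the kind of read KMSAN flagged.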




* [PATCH 4.9 25/71] ip_tunnel: be careful when accessing the inner header
  2018-10-16 17:08 [PATCH 4.9 00/71] 4.9.134-stable review Greg Kroah-Hartman
                   ` (23 preceding siblings ...)
  2018-10-16 17:09 ` [PATCH 4.9 24/71] ip6_tunnel: be careful when accessing the inner header Greg Kroah-Hartman
@ 2018-10-16 17:09 ` Greg Kroah-Hartman
  2018-10-16 17:09 ` [PATCH 4.9 26/71] ipv4: fix use-after-free in ip_cmsg_recv_dstaddr() Greg Kroah-Hartman
                   ` (50 subsequent siblings)
  75 siblings, 0 replies; 82+ messages in thread
From: Greg Kroah-Hartman @ 2018-10-16 17:09 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Cong Wang, Paolo Abeni, David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Paolo Abeni <pabeni@redhat.com>

[ Upstream commit ccfec9e5cb2d48df5a955b7bf47f7782157d3bc2 ]

Cong noted that we need the same checks introduced by commit 76c0ddd8c3a6
("ip6_tunnel: be careful when accessing the inner header")
even for ipv4 tunnels.

Fixes: c54419321455 ("GRE: Refactor GRE tunneling code.")
Suggested-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/ipv4/ip_tunnel.c |    9 +++++++++
 1 file changed, 9 insertions(+)

--- a/net/ipv4/ip_tunnel.c
+++ b/net/ipv4/ip_tunnel.c
@@ -627,6 +627,7 @@ void ip_tunnel_xmit(struct sk_buff *skb,
 		    const struct iphdr *tnl_params, u8 protocol)
 {
 	struct ip_tunnel *tunnel = netdev_priv(dev);
+	unsigned int inner_nhdr_len = 0;
 	const struct iphdr *inner_iph;
 	struct flowi4 fl4;
 	u8     tos, ttl;
@@ -636,6 +637,14 @@ void ip_tunnel_xmit(struct sk_buff *skb,
 	__be32 dst;
 	bool connected;
 
+	/* ensure we can access the inner net header, for several users below */
+	if (skb->protocol == htons(ETH_P_IP))
+		inner_nhdr_len = sizeof(struct iphdr);
+	else if (skb->protocol == htons(ETH_P_IPV6))
+		inner_nhdr_len = sizeof(struct ipv6hdr);
+	if (unlikely(!pskb_may_pull(skb, inner_nhdr_len)))
+		goto tx_error;
+
 	inner_iph = (const struct iphdr *)skb_inner_network_header(skb);
 	connected = (tunnel->parms.iph.daddr != 0);
 



^ permalink raw reply	[flat|nested] 82+ messages in thread

* [PATCH 4.9 26/71] ipv4: fix use-after-free in ip_cmsg_recv_dstaddr()
  2018-10-16 17:08 [PATCH 4.9 00/71] 4.9.134-stable review Greg Kroah-Hartman
                   ` (24 preceding siblings ...)
  2018-10-16 17:09 ` [PATCH 4.9 25/71] ip_tunnel: " Greg Kroah-Hartman
@ 2018-10-16 17:09 ` Greg Kroah-Hartman
  2018-10-16 17:09 ` [PATCH 4.9 27/71] ipv6: take rcu lock in rawv6_send_hdrinc() Greg Kroah-Hartman
                   ` (49 subsequent siblings)
  75 siblings, 0 replies; 82+ messages in thread
From: Greg Kroah-Hartman @ 2018-10-16 17:09 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Eric Dumazet, Willem de Bruijn,
	syzbot, David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <edumazet@google.com>

[ Upstream commit 64199fc0a46ba211362472f7f942f900af9492fd ]

Caching ip_hdr(skb) before a call to pskb_may_pull() is buggy:
pskb_may_pull() can reallocate skb->head, which leaves the cached
pointer stale. Do not do it.

Fixes: 2efd4fca703a ("ip: in cmsg IP(V6)_ORIGDSTADDR call pskb_may_pull")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/ipv4/ip_sockglue.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -134,7 +134,6 @@ static void ip_cmsg_recv_security(struct
 static void ip_cmsg_recv_dstaddr(struct msghdr *msg, struct sk_buff *skb)
 {
 	struct sockaddr_in sin;
-	const struct iphdr *iph = ip_hdr(skb);
 	__be16 *ports;
 	int end;
 
@@ -149,7 +148,7 @@ static void ip_cmsg_recv_dstaddr(struct
 	ports = (__be16 *)skb_transport_header(skb);
 
 	sin.sin_family = AF_INET;
-	sin.sin_addr.s_addr = iph->daddr;
+	sin.sin_addr.s_addr = ip_hdr(skb)->daddr;
 	sin.sin_port = ports[1];
 	memset(sin.sin_zero, 0, sizeof(sin.sin_zero));
 



^ permalink raw reply	[flat|nested] 82+ messages in thread
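
The rule in the patch above (do not keep a header pointer across a
pskb_may_pull() call) can be illustrated in userspace. This is only a
minimal sketch: struct toy_buf, toy_may_pull() and toy_read_byte() are
hypothetical names, not kernel APIs, and the "pull" here always moves
the linear area so that a cached pointer would be stale.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an sk_buff: a linear area plus bytes that
 * are not linear yet. All names here are illustrative only.
 */
struct toy_buf {
	uint8_t *head;		/* linear data, may be reallocated */
	size_t linear;		/* bytes currently in the linear area */
	const uint8_t *frag;	/* remaining non-linear bytes */
	size_t frag_len;
};

/* Rough analogue of pskb_may_pull(): make sure at least len bytes are
 * linear. Returns 1 on success, 0 if the packet is too short or the
 * allocation fails. On success the linear area may have moved.
 */
static int toy_may_pull(struct toy_buf *b, size_t len)
{
	size_t need;
	uint8_t *n;

	if (len <= b->linear)
		return 1;
	need = len - b->linear;
	if (need > b->frag_len)
		return 0;
	n = malloc(len);
	if (!n)
		return 0;
	memcpy(n, b->head, b->linear);
	memcpy(n + b->linear, b->frag, need);
	free(b->head);		/* any pointer into the old area is now stale */
	b->head = n;
	b->linear = len;
	b->frag += need;
	b->frag_len -= need;
	return 1;
}

/* Correct ordering: pull first, then derive the pointer from b->head. */
static int toy_read_byte(struct toy_buf *b, size_t off, uint8_t *out)
{
	if (!toy_may_pull(b, off + 1))
		return -1;
	*out = b->head[off];
	return 0;
}
```

Reading b->head only after the pull mirrors what the patch does by
calling ip_hdr(skb) at the point of use.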

* [PATCH 4.9 27/71] ipv6: take rcu lock in rawv6_send_hdrinc()
  2018-10-16 17:08 [PATCH 4.9 00/71] 4.9.134-stable review Greg Kroah-Hartman
                   ` (25 preceding siblings ...)
  2018-10-16 17:09 ` [PATCH 4.9 26/71] ipv4: fix use-after-free in ip_cmsg_recv_dstaddr() Greg Kroah-Hartman
@ 2018-10-16 17:09 ` Greg Kroah-Hartman
  2018-10-16 17:09 ` [PATCH 4.9 28/71] net: dsa: bcm_sf2: Call setup during switch resume Greg Kroah-Hartman
                   ` (48 subsequent siblings)
  75 siblings, 0 replies; 82+ messages in thread
From: Greg Kroah-Hartman @ 2018-10-16 17:09 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Wei Wang, Eric Dumazet, David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Wei Wang <weiwan@google.com>

[ Upstream commit a688caa34beb2fd2a92f1b6d33e40cde433ba160 ]

In rawv6_send_hdrinc(), in order to avoid an extra dst_hold(), we
directly assign the dst to the skb and set the passed-in dst to NULL to
avoid a double free.
However, in the error case, we free the skb and then do the stats update
with the dst pointer passed in. This causes a use-after-free on the dst.
Fix it by taking the rcu read lock right before the dst could get
released, to make sure the dst does not get freed until the stats update
is done.
Note: we don't have this issue in ipv4 because the dst is not used for
the stats update in v4.

Syzkaller reported the following crash:
BUG: KASAN: use-after-free in rawv6_send_hdrinc net/ipv6/raw.c:692 [inline]
BUG: KASAN: use-after-free in rawv6_sendmsg+0x4421/0x4630 net/ipv6/raw.c:921
Read of size 8 at addr ffff8801d95ba730 by task syz-executor0/32088

CPU: 1 PID: 32088 Comm: syz-executor0 Not tainted 4.19.0-rc2+ #93
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x1c4/0x2b4 lib/dump_stack.c:113
 print_address_description.cold.8+0x9/0x1ff mm/kasan/report.c:256
 kasan_report_error mm/kasan/report.c:354 [inline]
 kasan_report.cold.9+0x242/0x309 mm/kasan/report.c:412
 __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
 rawv6_send_hdrinc net/ipv6/raw.c:692 [inline]
 rawv6_sendmsg+0x4421/0x4630 net/ipv6/raw.c:921
 inet_sendmsg+0x1a1/0x690 net/ipv4/af_inet.c:798
 sock_sendmsg_nosec net/socket.c:621 [inline]
 sock_sendmsg+0xd5/0x120 net/socket.c:631
 ___sys_sendmsg+0x7fd/0x930 net/socket.c:2114
 __sys_sendmsg+0x11d/0x280 net/socket.c:2152
 __do_sys_sendmsg net/socket.c:2161 [inline]
 __se_sys_sendmsg net/socket.c:2159 [inline]
 __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2159
 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x457099
Code: fd b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f83756edc78 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f83756ee6d4 RCX: 0000000000457099
RDX: 0000000000000000 RSI: 0000000020003840 RDI: 0000000000000004
RBP: 00000000009300a0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
R13: 00000000004d4b30 R14: 00000000004c90b1 R15: 0000000000000000

Allocated by task 32088:
 save_stack+0x43/0xd0 mm/kasan/kasan.c:448
 set_track mm/kasan/kasan.c:460 [inline]
 kasan_kmalloc+0xc7/0xe0 mm/kasan/kasan.c:553
 kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:490
 kmem_cache_alloc+0x12e/0x730 mm/slab.c:3554
 dst_alloc+0xbb/0x1d0 net/core/dst.c:105
 ip6_dst_alloc+0x35/0xa0 net/ipv6/route.c:353
 ip6_rt_cache_alloc+0x247/0x7b0 net/ipv6/route.c:1186
 ip6_pol_route+0x8f8/0xd90 net/ipv6/route.c:1895
 ip6_pol_route_output+0x54/0x70 net/ipv6/route.c:2093
 fib6_rule_lookup+0x277/0x860 net/ipv6/fib6_rules.c:122
 ip6_route_output_flags+0x2c5/0x350 net/ipv6/route.c:2121
 ip6_route_output include/net/ip6_route.h:88 [inline]
 ip6_dst_lookup_tail+0xe27/0x1d60 net/ipv6/ip6_output.c:951
 ip6_dst_lookup_flow+0xc8/0x270 net/ipv6/ip6_output.c:1079
 rawv6_sendmsg+0x12d9/0x4630 net/ipv6/raw.c:905
 inet_sendmsg+0x1a1/0x690 net/ipv4/af_inet.c:798
 sock_sendmsg_nosec net/socket.c:621 [inline]
 sock_sendmsg+0xd5/0x120 net/socket.c:631
 ___sys_sendmsg+0x7fd/0x930 net/socket.c:2114
 __sys_sendmsg+0x11d/0x280 net/socket.c:2152
 __do_sys_sendmsg net/socket.c:2161 [inline]
 __se_sys_sendmsg net/socket.c:2159 [inline]
 __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2159
 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 5356:
 save_stack+0x43/0xd0 mm/kasan/kasan.c:448
 set_track mm/kasan/kasan.c:460 [inline]
 __kasan_slab_free+0x102/0x150 mm/kasan/kasan.c:521
 kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
 __cache_free mm/slab.c:3498 [inline]
 kmem_cache_free+0x83/0x290 mm/slab.c:3756
 dst_destroy+0x267/0x3c0 net/core/dst.c:141
 dst_destroy_rcu+0x16/0x19 net/core/dst.c:154
 __rcu_reclaim kernel/rcu/rcu.h:236 [inline]
 rcu_do_batch kernel/rcu/tree.c:2576 [inline]
 invoke_rcu_callbacks kernel/rcu/tree.c:2880 [inline]
 __rcu_process_callbacks kernel/rcu/tree.c:2847 [inline]
 rcu_process_callbacks+0xf23/0x2670 kernel/rcu/tree.c:2864
 __do_softirq+0x30b/0xad8 kernel/softirq.c:292

Fixes: 1789a640f556 ("raw: avoid two atomics in xmit")
Signed-off-by: Wei Wang <weiwan@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/ipv6/raw.c |   29 ++++++++++++++++++++---------
 1 file changed, 20 insertions(+), 9 deletions(-)

--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -645,8 +645,6 @@ static int rawv6_send_hdrinc(struct sock
 	skb->protocol = htons(ETH_P_IPV6);
 	skb->priority = sk->sk_priority;
 	skb->mark = sk->sk_mark;
-	skb_dst_set(skb, &rt->dst);
-	*dstp = NULL;
 
 	skb_put(skb, length);
 	skb_reset_network_header(skb);
@@ -656,8 +654,14 @@ static int rawv6_send_hdrinc(struct sock
 
 	skb->transport_header = skb->network_header;
 	err = memcpy_from_msg(iph, msg, length);
-	if (err)
-		goto error_fault;
+	if (err) {
+		err = -EFAULT;
+		kfree_skb(skb);
+		goto error;
+	}
+
+	skb_dst_set(skb, &rt->dst);
+	*dstp = NULL;
 
 	/* if egress device is enslaved to an L3 master device pass the
 	 * skb to its handler for processing
@@ -666,21 +670,28 @@ static int rawv6_send_hdrinc(struct sock
 	if (unlikely(!skb))
 		return 0;
 
+	/* Acquire rcu_read_lock() in case we need to use rt->rt6i_idev
+	 * in the error path. Since skb has been freed, the dst could
+	 * have been queued for deletion.
+	 */
+	rcu_read_lock();
 	IP6_UPD_PO_STATS(net, rt->rt6i_idev, IPSTATS_MIB_OUT, skb->len);
 	err = NF_HOOK(NFPROTO_IPV6, NF_INET_LOCAL_OUT, net, sk, skb,
 		      NULL, rt->dst.dev, dst_output);
 	if (err > 0)
 		err = net_xmit_errno(err);
-	if (err)
-		goto error;
+	if (err) {
+		IP6_INC_STATS(net, rt->rt6i_idev, IPSTATS_MIB_OUTDISCARDS);
+		rcu_read_unlock();
+		goto error_check;
+	}
+	rcu_read_unlock();
 out:
 	return 0;
 
-error_fault:
-	err = -EFAULT;
-	kfree_skb(skb);
 error:
 	IP6_INC_STATS(net, rt->rt6i_idev, IPSTATS_MIB_OUTDISCARDS);
+error_check:
 	if (err == -ENOBUFS && !np->recverr)
 		err = 0;
 	return err;



^ permalink raw reply	[flat|nested] 82+ messages in thread

* [PATCH 4.9 28/71] net: dsa: bcm_sf2: Call setup during switch resume
  2018-10-16 17:08 [PATCH 4.9 00/71] 4.9.134-stable review Greg Kroah-Hartman
                   ` (26 preceding siblings ...)
  2018-10-16 17:09 ` [PATCH 4.9 27/71] ipv6: take rcu lock in rawv6_send_hdrinc() Greg Kroah-Hartman
@ 2018-10-16 17:09 ` Greg Kroah-Hartman
  2018-10-16 17:09 ` [PATCH 4.9 29/71] net: hns: fix for unmapping problem when SMMU is on Greg Kroah-Hartman
                   ` (47 subsequent siblings)
  75 siblings, 0 replies; 82+ messages in thread
From: Greg Kroah-Hartman @ 2018-10-16 17:09 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Florian Fainelli, David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Florian Fainelli <f.fainelli@gmail.com>

[ Upstream commit 54baca096386d862d19c10f58f34bf787c6b3cbe ]

There is no reason to open code what the switch setup function does.
In fact, because we just issued a switch reset, all the registers are
back at their default values, which means, for instance, that unused
ports are enabled again, wasting power and leading to an inappropriate
switch core clock being selected.

Fixes: 8cfa94984c9c ("net: dsa: bcm_sf2: add suspend/resume callbacks")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/dsa/bcm_sf2.c |    8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

--- a/drivers/net/dsa/bcm_sf2.c
+++ b/drivers/net/dsa/bcm_sf2.c
@@ -744,7 +744,6 @@ static int bcm_sf2_sw_suspend(struct dsa
 static int bcm_sf2_sw_resume(struct dsa_switch *ds)
 {
 	struct bcm_sf2_priv *priv = bcm_sf2_to_priv(ds);
-	unsigned int port;
 	int ret;
 
 	ret = bcm_sf2_sw_rst(priv);
@@ -756,12 +755,7 @@ static int bcm_sf2_sw_resume(struct dsa_
 	if (priv->hw_params.num_gphy == 1)
 		bcm_sf2_gphy_enable_set(ds, true);
 
-	for (port = 0; port < DSA_MAX_PORTS; port++) {
-		if ((1 << port) & ds->enabled_port_mask)
-			bcm_sf2_port_setup(ds, port, NULL);
-		else if (dsa_is_cpu_port(ds, port))
-			bcm_sf2_imp_setup(ds, port);
-	}
+	ds->ops->setup(ds);
 
 	return 0;
 }



^ permalink raw reply	[flat|nested] 82+ messages in thread

* [PATCH 4.9 29/71] net: hns: fix for unmapping problem when SMMU is on
  2018-10-16 17:08 [PATCH 4.9 00/71] 4.9.134-stable review Greg Kroah-Hartman
                   ` (27 preceding siblings ...)
  2018-10-16 17:09 ` [PATCH 4.9 28/71] net: dsa: bcm_sf2: Call setup during switch resume Greg Kroah-Hartman
@ 2018-10-16 17:09 ` Greg Kroah-Hartman
  2018-10-16 17:09 ` [PATCH 4.9 30/71] net: ipv4: update fnhe_pmtu when first hops MTU changes Greg Kroah-Hartman
                   ` (46 subsequent siblings)
  75 siblings, 0 replies; 82+ messages in thread
From: Greg Kroah-Hartman @ 2018-10-16 17:09 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Yunsheng Lin, Peng Li, Yisen Zhuang,
	Salil Mehta, David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Yunsheng Lin <linyunsheng@huawei.com>

[ Upstream commit 2e9361efa707e186d91b938e44f9e326725259f7 ]

If the SMMU is on, it is more likely that skb_shinfo(skb)->frags[i]
cannot be sent by a single BD. When this happens, the
hns_nic_net_xmit_hw function maps the whole data of a frag using
skb_frag_dma_map, but unmaps each BD's data individually when tx is
done, which causes problems when the SMMU is on.

This patch fixes the problem by unmapping the whole data of a
frag when tx is done.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Reviewed-by: Yisen Zhuang <yisen.zhuang@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/ethernet/hisilicon/hns/hnae.c     |    2 -
 drivers/net/ethernet/hisilicon/hns/hns_enet.c |   30 ++++++++++++++++----------
 2 files changed, 20 insertions(+), 12 deletions(-)

--- a/drivers/net/ethernet/hisilicon/hns/hnae.c
+++ b/drivers/net/ethernet/hisilicon/hns/hnae.c
@@ -80,7 +80,7 @@ static void hnae_unmap_buffer(struct hna
 	if (cb->type == DESC_TYPE_SKB)
 		dma_unmap_single(ring_to_dev(ring), cb->dma, cb->length,
 				 ring_to_dma_dir(ring));
-	else
+	else if (cb->length)
 		dma_unmap_page(ring_to_dev(ring), cb->dma, cb->length,
 			       ring_to_dma_dir(ring));
 }
--- a/drivers/net/ethernet/hisilicon/hns/hns_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_enet.c
@@ -39,9 +39,9 @@
 #define SKB_TMP_LEN(SKB) \
 	(((SKB)->transport_header - (SKB)->mac_header) + tcp_hdrlen(SKB))
 
-static void fill_v2_desc(struct hnae_ring *ring, void *priv,
-			 int size, dma_addr_t dma, int frag_end,
-			 int buf_num, enum hns_desc_type type, int mtu)
+static void fill_v2_desc_hw(struct hnae_ring *ring, void *priv, int size,
+			    int send_sz, dma_addr_t dma, int frag_end,
+			    int buf_num, enum hns_desc_type type, int mtu)
 {
 	struct hnae_desc *desc = &ring->desc[ring->next_to_use];
 	struct hnae_desc_cb *desc_cb = &ring->desc_cb[ring->next_to_use];
@@ -63,7 +63,7 @@ static void fill_v2_desc(struct hnae_rin
 	desc_cb->type = type;
 
 	desc->addr = cpu_to_le64(dma);
-	desc->tx.send_size = cpu_to_le16((u16)size);
+	desc->tx.send_size = cpu_to_le16((u16)send_sz);
 
 	/* config bd buffer end */
 	hnae_set_bit(rrcfv, HNSV2_TXD_VLD_B, 1);
@@ -132,6 +132,14 @@ static void fill_v2_desc(struct hnae_rin
 	ring_ptr_move_fw(ring, next_to_use);
 }
 
+static void fill_v2_desc(struct hnae_ring *ring, void *priv,
+			 int size, dma_addr_t dma, int frag_end,
+			 int buf_num, enum hns_desc_type type, int mtu)
+{
+	fill_v2_desc_hw(ring, priv, size, size, dma, frag_end,
+			buf_num, type, mtu);
+}
+
 static const struct acpi_device_id hns_enet_acpi_match[] = {
 	{ "HISI00C1", 0 },
 	{ "HISI00C2", 0 },
@@ -288,15 +296,15 @@ static void fill_tso_desc(struct hnae_ri
 
 	/* when the frag size is bigger than hardware, split this frag */
 	for (k = 0; k < frag_buf_num; k++)
-		fill_v2_desc(ring, priv,
-			     (k == frag_buf_num - 1) ?
+		fill_v2_desc_hw(ring, priv, k == 0 ? size : 0,
+				(k == frag_buf_num - 1) ?
 					sizeoflast : BD_MAX_SEND_SIZE,
-			     dma + BD_MAX_SEND_SIZE * k,
-			     frag_end && (k == frag_buf_num - 1) ? 1 : 0,
-			     buf_num,
-			     (type == DESC_TYPE_SKB && !k) ?
+				dma + BD_MAX_SEND_SIZE * k,
+				frag_end && (k == frag_buf_num - 1) ? 1 : 0,
+				buf_num,
+				(type == DESC_TYPE_SKB && !k) ?
 					DESC_TYPE_SKB : DESC_TYPE_PAGE,
-			     mtu);
+				mtu);
 }
 
 netdev_tx_t hns_nic_net_xmit_hw(struct net_device *ndev,



^ permalink raw reply	[flat|nested] 82+ messages in thread
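
The bookkeeping in the patch above can be sketched in plain C: a
fragment is mapped once, split across several descriptors, and only the
first descriptor records the length to unmap. This is a simplified
sketch under assumptions: TOY_BD_MAX, struct toy_cb, toy_fill() and
toy_unmap_total() are hypothetical names, and no real DMA calls are
made.

```c
#include <stddef.h>

#define TOY_BD_MAX 4	/* hypothetical per-descriptor byte limit */

struct toy_cb {
	size_t dma;	/* offset standing in for a DMA address */
	size_t length;	/* bytes to unmap; 0 means nothing to unmap here */
	size_t send_sz;	/* bytes this descriptor actually sends */
};

/* Split one mapped fragment of size bytes across descriptors.
 * Returns the number of descriptors used, or 0 if it does not fit.
 */
static size_t toy_fill(struct toy_cb *cb, size_t max_bd, size_t dma,
		       size_t size)
{
	size_t n = (size + TOY_BD_MAX - 1) / TOY_BD_MAX;
	size_t last = size % TOY_BD_MAX ? size % TOY_BD_MAX : TOY_BD_MAX;
	size_t k;

	if (n == 0 || n > max_bd)
		return 0;
	for (k = 0; k < n; k++) {
		cb[k].dma = dma + TOY_BD_MAX * k;
		/* only the first descriptor owns the whole mapping */
		cb[k].length = (k == 0) ? size : 0;
		cb[k].send_sz = (k == n - 1) ? last : TOY_BD_MAX;
	}
	return n;
}

/* On completion, unmap once per mapping: skip zero-length entries.
 * Returns the total bytes unmapped and stores the number of unmap
 * operations in *calls.
 */
static size_t toy_unmap_total(const struct toy_cb *cb, size_t n,
			      size_t *calls)
{
	size_t total = 0;
	size_t k;

	*calls = 0;
	for (k = 0; k < n; k++) {
		if (!cb[k].length)
			continue;
		total += cb[k].length;
		(*calls)++;
	}
	return total;
}
```

With a 10-byte fragment and a 4-byte limit, three descriptors are
filled but only one unmap of 10 bytes takes place, which matches the
single skb_frag_dma_map() call made at transmit time.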

* [PATCH 4.9 30/71] net: ipv4: update fnhe_pmtu when first hops MTU changes
  2018-10-16 17:08 [PATCH 4.9 00/71] 4.9.134-stable review Greg Kroah-Hartman
                   ` (28 preceding siblings ...)
  2018-10-16 17:09 ` [PATCH 4.9 29/71] net: hns: fix for unmapping problem when SMMU is on Greg Kroah-Hartman
@ 2018-10-16 17:09 ` Greg Kroah-Hartman
  2018-10-16 17:09 ` [PATCH 4.9 31/71] net/ipv6: Display all addresses in output of /proc/net/if_inet6 Greg Kroah-Hartman
                   ` (45 subsequent siblings)
  75 siblings, 0 replies; 82+ messages in thread
From: Greg Kroah-Hartman @ 2018-10-16 17:09 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Sabrina Dubroca, Stefano Brivio,
	David Ahern, David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Sabrina Dubroca <sd@queasysnail.net>

[ Upstream commit af7d6cce53694a88d6a1bb60c9a239a6a5144459 ]

Since commit 5aad1de5ea2c ("ipv4: use separate genid for next hop
exceptions"), exceptions get deprecated separately from cached
routes. In particular, administrative changes don't clear PMTU anymore.

As Stefano described in commit e9fa1495d738 ("ipv6: Reflect MTU changes
on PMTU of exceptions for MTU-less routes"), the PMTU discovered before
the local MTU change can become stale:
 - if the local MTU is now lower than the PMTU, that PMTU is now
   incorrect
 - if the local MTU was the lowest value in the path, and is increased,
   we might discover a higher PMTU

Similarly to what commit e9fa1495d738 did for IPv6, update PMTU in those
cases.

If the exception was locked, the discovered PMTU was smaller than the
minimal accepted PMTU. In that case, if the new local MTU is smaller
than the current PMTU, let PMTU discovery figure out if locking of the
exception is still needed.

To do this, we need to know the old link MTU in the NETDEV_CHANGEMTU
notifier. By the time the notifier is called, dev->mtu has been
changed. This patch adds the old MTU as additional information in the
notifier structure, and a new call_netdevice_notifiers_mtu() function.

Fixes: 5aad1de5ea2c ("ipv4: use separate genid for next hop exceptions")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 include/linux/netdevice.h |    7 ++++++
 include/net/ip_fib.h      |    1 
 net/core/dev.c            |   28 +++++++++++++++++++++++--
 net/ipv4/fib_frontend.c   |   12 +++++++----
 net/ipv4/fib_semantics.c  |   50 ++++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 92 insertions(+), 6 deletions(-)

--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2338,6 +2338,13 @@ struct netdev_notifier_info {
 	struct net_device *dev;
 };
 
+struct netdev_notifier_info_ext {
+	struct netdev_notifier_info info; /* must be first */
+	union {
+		u32 mtu;
+	} ext;
+};
+
 struct netdev_notifier_change_info {
 	struct netdev_notifier_info info; /* must be first */
 	unsigned int flags_changed;
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -363,6 +363,7 @@ int ip_fib_check_default(__be32 gw, stru
 int fib_sync_down_dev(struct net_device *dev, unsigned long event, bool force);
 int fib_sync_down_addr(struct net_device *dev, __be32 local);
 int fib_sync_up(struct net_device *dev, unsigned int nh_flags);
+void fib_sync_mtu(struct net_device *dev, u32 orig_mtu);
 
 extern u32 fib_multipath_secret __read_mostly;
 
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1664,6 +1664,28 @@ int call_netdevice_notifiers(unsigned lo
 }
 EXPORT_SYMBOL(call_netdevice_notifiers);
 
+/**
+ *	call_netdevice_notifiers_mtu - call all network notifier blocks
+ *	@val: value passed unmodified to notifier function
+ *	@dev: net_device pointer passed unmodified to notifier function
+ *	@arg: additional u32 argument passed to the notifier function
+ *
+ *	Call all network notifier blocks.  Parameters and return value
+ *	are as for raw_notifier_call_chain().
+ */
+static int call_netdevice_notifiers_mtu(unsigned long val,
+					struct net_device *dev, u32 arg)
+{
+	struct netdev_notifier_info_ext info = {
+		.info.dev = dev,
+		.ext.mtu = arg,
+	};
+
+	BUILD_BUG_ON(offsetof(struct netdev_notifier_info_ext, info) != 0);
+
+	return call_netdevice_notifiers_info(val, dev, &info.info);
+}
+
 #ifdef CONFIG_NET_INGRESS
 static struct static_key ingress_needed __read_mostly;
 
@@ -6589,14 +6611,16 @@ int dev_set_mtu(struct net_device *dev,
 	err = __dev_set_mtu(dev, new_mtu);
 
 	if (!err) {
-		err = call_netdevice_notifiers(NETDEV_CHANGEMTU, dev);
+		err = call_netdevice_notifiers_mtu(NETDEV_CHANGEMTU, dev,
+						   orig_mtu);
 		err = notifier_to_errno(err);
 		if (err) {
 			/* setting mtu back and notifying everyone again,
 			 * so that they have a chance to revert changes.
 			 */
 			__dev_set_mtu(dev, orig_mtu);
-			call_netdevice_notifiers(NETDEV_CHANGEMTU, dev);
+			call_netdevice_notifiers_mtu(NETDEV_CHANGEMTU, dev,
+						     new_mtu);
 		}
 	}
 	return err;
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -1171,7 +1171,8 @@ static int fib_inetaddr_event(struct not
 static int fib_netdev_event(struct notifier_block *this, unsigned long event, void *ptr)
 {
 	struct net_device *dev = netdev_notifier_info_to_dev(ptr);
-	struct netdev_notifier_changeupper_info *info;
+	struct netdev_notifier_changeupper_info *upper_info = ptr;
+	struct netdev_notifier_info_ext *info_ext = ptr;
 	struct in_device *in_dev;
 	struct net *net = dev_net(dev);
 	unsigned int flags;
@@ -1206,16 +1207,19 @@ static int fib_netdev_event(struct notif
 			fib_sync_up(dev, RTNH_F_LINKDOWN);
 		else
 			fib_sync_down_dev(dev, event, false);
-		/* fall through */
+		rt_cache_flush(net);
+		break;
 	case NETDEV_CHANGEMTU:
+		fib_sync_mtu(dev, info_ext->ext.mtu);
 		rt_cache_flush(net);
 		break;
 	case NETDEV_CHANGEUPPER:
-		info = ptr;
+		upper_info = ptr;
 		/* flush all routes if dev is linked to or unlinked from
 		 * an L3 master device (e.g., VRF)
 		 */
-		if (info->upper_dev && netif_is_l3_master(info->upper_dev))
+		if (upper_info->upper_dev &&
+		    netif_is_l3_master(upper_info->upper_dev))
 			fib_disable_ip(dev, NETDEV_DOWN, true);
 		break;
 	}
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -1377,6 +1377,56 @@ int fib_sync_down_addr(struct net_device
 	return ret;
 }
 
+/* Update the PMTU of exceptions when:
+ * - the new MTU of the first hop becomes smaller than the PMTU
+ * - the old MTU was the same as the PMTU, and it limited discovery of
+ *   larger MTUs on the path. With that limit raised, we can now
+ *   discover larger MTUs
+ * A special case is locked exceptions, for which the PMTU is smaller
+ * than the minimal accepted PMTU:
+ * - if the new MTU is greater than the PMTU, don't make any change
+ * - otherwise, unlock and set PMTU
+ */
+static void nh_update_mtu(struct fib_nh *nh, u32 new, u32 orig)
+{
+	struct fnhe_hash_bucket *bucket;
+	int i;
+
+	bucket = rcu_dereference_protected(nh->nh_exceptions, 1);
+	if (!bucket)
+		return;
+
+	for (i = 0; i < FNHE_HASH_SIZE; i++) {
+		struct fib_nh_exception *fnhe;
+
+		for (fnhe = rcu_dereference_protected(bucket[i].chain, 1);
+		     fnhe;
+		     fnhe = rcu_dereference_protected(fnhe->fnhe_next, 1)) {
+			if (fnhe->fnhe_mtu_locked) {
+				if (new <= fnhe->fnhe_pmtu) {
+					fnhe->fnhe_pmtu = new;
+					fnhe->fnhe_mtu_locked = false;
+				}
+			} else if (new < fnhe->fnhe_pmtu ||
+				   orig == fnhe->fnhe_pmtu) {
+				fnhe->fnhe_pmtu = new;
+			}
+		}
+	}
+}
+
+void fib_sync_mtu(struct net_device *dev, u32 orig_mtu)
+{
+	unsigned int hash = fib_devindex_hashfn(dev->ifindex);
+	struct hlist_head *head = &fib_info_devhash[hash];
+	struct fib_nh *nh;
+
+	hlist_for_each_entry(nh, head, nh_hash) {
+		if (nh->nh_dev == dev)
+			nh_update_mtu(nh, dev->mtu, orig_mtu);
+	}
+}
+
 /* Event              force Flags           Description
  * NETDEV_CHANGE      0     LINKDOWN        Carrier OFF, not for scope host
  * NETDEV_DOWN        0     LINKDOWN|DEAD   Link down, not for scope host



^ permalink raw reply	[flat|nested] 82+ messages in thread
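
The update rule that nh_update_mtu() applies to each exception in the
patch above can be tried out in isolation. The sketch below copies only
the decision logic; struct toy_fnhe and toy_update_pmtu() are
hypothetical userspace names, and the RCU and hash-table walking of the
real code is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced view of a next-hop exception. */
struct toy_fnhe {
	uint32_t pmtu;
	bool mtu_locked;
};

/* Apply a first-hop MTU change from orig_mtu to new_mtu:
 * - locked exception: only lower (and unlock) if the new MTU does not
 *   exceed the recorded PMTU
 * - otherwise: follow the new MTU if it is below the PMTU, or if the
 *   old MTU was what limited the PMTU
 */
static void toy_update_pmtu(struct toy_fnhe *e, uint32_t new_mtu,
			    uint32_t orig_mtu)
{
	if (e->mtu_locked) {
		if (new_mtu <= e->pmtu) {
			e->pmtu = new_mtu;
			e->mtu_locked = false;
		}
	} else if (new_mtu < e->pmtu || orig_mtu == e->pmtu) {
		e->pmtu = new_mtu;
	}
}
```

For example, an unlocked exception with a PMTU of 1400 is left alone
when the link MTU goes from 1500 to 9000, because the bottleneck was
somewhere else on the path.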

* [PATCH 4.9 31/71] net/ipv6: Display all addresses in output of /proc/net/if_inet6
  2018-10-16 17:08 [PATCH 4.9 00/71] 4.9.134-stable review Greg Kroah-Hartman
                   ` (29 preceding siblings ...)
  2018-10-16 17:09 ` [PATCH 4.9 30/71] net: ipv4: update fnhe_pmtu when first hops MTU changes Greg Kroah-Hartman
@ 2018-10-16 17:09 ` Greg Kroah-Hartman
  2018-10-16 17:09 ` [PATCH 4.9 32/71] netlabel: check for IPV4MASK in addrinfo_get Greg Kroah-Hartman
                   ` (44 subsequent siblings)
  75 siblings, 0 replies; 82+ messages in thread
From: Greg Kroah-Hartman @ 2018-10-16 17:09 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Jeff Barnhill, David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Jeff Barnhill <0xeffeff@gmail.com>

[ Upstream commit 86f9bd1ff61c413a2a251fa736463295e4e24733 ]

The backend handling for /proc/net/if_inet6 in addrconf.c doesn't properly
handle starting/stopping the iteration.  The problem is that at some point
during the iteration, an overflow is detected and the process is
subsequently stopped.  The item being shown via seq_printf() when the
overflow occurs is not actually shown, though.  When start() is
subsequently called to resume iterating, it returns the next item, and
thus the item that was being processed when the overflow occurred never
gets printed.

Alter the meaning of the private data member "offset".  Currently, when it
is not 0 (which only happens at the very beginning), "offset" represents
the next hlist item to be printed.  After this change, "offset" always
represents the current item.

This is also consistent with the private data member "bucket", which
represents the current bucket, and also the use of "pos" as defined in
seq_file.txt:
    The pos passed to start() will always be either zero, or the most
    recent pos used in the previous session.

Signed-off-by: Jeff Barnhill <0xeffeff@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/ipv6/addrconf.c |    4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -4068,7 +4068,6 @@ static struct inet6_ifaddr *if6_get_firs
 				p++;
 				continue;
 			}
-			state->offset++;
 			return ifa;
 		}
 
@@ -4092,13 +4091,12 @@ static struct inet6_ifaddr *if6_get_next
 		return ifa;
 	}
 
+	state->offset = 0;
 	while (++state->bucket < IN6_ADDR_HSIZE) {
-		state->offset = 0;
 		hlist_for_each_entry_rcu_bh(ifa,
 				     &inet6_addr_lst[state->bucket], addr_lst) {
 			if (!net_eq(dev_net(ifa->idev->dev), net))
 				continue;
-			state->offset++;
 			return ifa;
 		}
 	}



^ permalink raw reply	[flat|nested] 82+ messages in thread
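
The "offset always represents the current item" convention from the
patch above can be sketched with a toy iterator. This is an
illustration under assumptions, not the seq_file API: toy_start(),
toy_next() and toy_dump() are hypothetical names, and the output
overflow is modelled as a fixed number of items per session.

```c
#include <stddef.h>

struct toy_iter {
	size_t offset;	/* index of the current item */
};

/* start(): return the item at the saved offset, or NULL at the end. */
static const int *toy_start(struct toy_iter *it, const int *items,
			    size_t n)
{
	return it->offset < n ? &items[it->offset] : NULL;
}

/* next(): advance to the following item. */
static const int *toy_next(struct toy_iter *it, const int *items,
			   size_t n)
{
	it->offset++;
	return it->offset < n ? &items[it->offset] : NULL;
}

/* Copy all items into out[], where each session has room for only
 * room items. When a session is full it stops, and a new session
 * calls toy_start() again. Returns the number of items written.
 */
static size_t toy_dump(const int *items, size_t n, size_t room, int *out)
{
	struct toy_iter it = { 0 };
	size_t written = 0;

	if (room == 0)
		return 0;
	for (;;) {
		const int *p = toy_start(&it, items, n);
		size_t used = 0;

		if (!p)
			break;
		while (p) {
			/* overflow: p was not emitted, and offset still
			 * names it, so the next session retries it
			 */
			if (used == room)
				break;
			out[written++] = *p;
			used++;
			p = toy_next(&it, items, n);
		}
	}
	return written;
}
```

If the offset instead named the next item to print, the item that hit
the overflow would be skipped on resume, which is the symptom described
in the commit message.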

* [PATCH 4.9 32/71] netlabel: check for IPV4MASK in addrinfo_get
  2018-10-16 17:08 [PATCH 4.9 00/71] 4.9.134-stable review Greg Kroah-Hartman
                   ` (30 preceding siblings ...)
  2018-10-16 17:09 ` [PATCH 4.9 31/71] net/ipv6: Display all addresses in output of /proc/net/if_inet6 Greg Kroah-Hartman
@ 2018-10-16 17:09 ` Greg Kroah-Hartman
  2018-10-16 17:09 ` [PATCH 4.9 33/71] net/usb: cancel pending work when unbinding smsc75xx Greg Kroah-Hartman
                   ` (43 subsequent siblings)
  75 siblings, 0 replies; 82+ messages in thread
From: Greg Kroah-Hartman @ 2018-10-16 17:09 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Sean Tranchetti, David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Sean Tranchetti <stranche@codeaurora.org>

[ Upstream commit f88b4c01b97e09535505cf3c327fdbce55c27f00 ]

netlbl_unlabel_addrinfo_get() assumes that if it finds the
NLBL_UNLABEL_A_IPV4ADDR attribute, it must also have the
NLBL_UNLABEL_A_IPV4MASK attribute. However, this is
not necessarily the case, as the current checks in
netlbl_unlabel_staticadd() and friends are not sufficient to
enforce this.

If passed a netlink message with NLBL_UNLABEL_A_IPV4ADDR,
NLBL_UNLABEL_A_IPV6ADDR, and NLBL_UNLABEL_A_IPV6MASK attributes,
these functions will all call netlbl_unlabel_addrinfo_get(), which
will then attempt to dereference NULL when fetching the non-existent
NLBL_UNLABEL_A_IPV4MASK attribute:

Unable to handle kernel NULL pointer dereference at virtual address 0
Process unlab (pid: 31762, stack limit = 0xffffff80502d8000)
Call trace:
	netlbl_unlabel_addrinfo_get+0x44/0xd8
	netlbl_unlabel_staticremovedef+0x98/0xe0
	genl_rcv_msg+0x354/0x388
	netlink_rcv_skb+0xac/0x118
	genl_rcv+0x34/0x48
	netlink_unicast+0x158/0x1f0
	netlink_sendmsg+0x32c/0x338
	sock_sendmsg+0x44/0x60
	___sys_sendmsg+0x1d0/0x2a8
	__sys_sendmsg+0x64/0xb4
	SyS_sendmsg+0x34/0x4c
	el0_svc_naked+0x34/0x38
Code: 51001149 7100113f 540000a0 f9401508 (79400108)
---[ end trace f6438a488e737143 ]---
Kernel panic - not syncing: Fatal exception

Signed-off-by: Sean Tranchetti <stranche@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/netlabel/netlabel_unlabeled.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/net/netlabel/netlabel_unlabeled.c
+++ b/net/netlabel/netlabel_unlabeled.c
@@ -787,7 +787,8 @@ static int netlbl_unlabel_addrinfo_get(s
 {
 	u32 addr_len;
 
-	if (info->attrs[NLBL_UNLABEL_A_IPV4ADDR]) {
+	if (info->attrs[NLBL_UNLABEL_A_IPV4ADDR] &&
+	    info->attrs[NLBL_UNLABEL_A_IPV4MASK]) {
 		addr_len = nla_len(info->attrs[NLBL_UNLABEL_A_IPV4ADDR]);
 		if (addr_len != sizeof(struct in_addr) &&
 		    addr_len != nla_len(info->attrs[NLBL_UNLABEL_A_IPV4MASK]))



^ permalink raw reply	[flat|nested] 82+ messages in thread
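
The check added by the patch above boils down to "test both attributes
of a pair before touching either". A minimal userspace sketch follows;
the TOY_A_* indices, struct toy_attr and toy_addrinfo_get() are
hypothetical and do not use the netlink API. The length checks are an
assumption of this sketch and are stricter than the ones in the kernel
function.

```c
#include <stddef.h>

enum {
	TOY_A_IPV4ADDR,
	TOY_A_IPV4MASK,
	TOY_A_IPV6ADDR,
	TOY_A_IPV6MASK,
	TOY_A_MAX
};

/* Hypothetical parsed attribute: only its length matters here. */
struct toy_attr {
	size_t len;
	const void *data;
};

/* Returns 4 or 6 for the address family found, or -1 if the request
 * does not carry a complete, well-sized address/mask pair.
 */
static int toy_addrinfo_get(const struct toy_attr *const *attrs)
{
	const struct toy_attr *a4 = attrs[TOY_A_IPV4ADDR];
	const struct toy_attr *m4 = attrs[TOY_A_IPV4MASK];
	const struct toy_attr *a6 = attrs[TOY_A_IPV6ADDR];
	const struct toy_attr *m6 = attrs[TOY_A_IPV6MASK];

	/* both halves must be present before either is dereferenced */
	if (a4 && m4) {
		if (a4->len != 4 || m4->len != a4->len)
			return -1;
		return 4;
	}
	if (a6 && m6) {
		if (a6->len != 16 || m6->len != a6->len)
			return -1;
		return 6;
	}
	return -1;
}
```

A request carrying an IPv4 address without a mask, together with a
complete IPv6 pair, now falls through to the IPv6 branch instead of
dereferencing a missing attribute.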

* [PATCH 4.9 33/71] net/usb: cancel pending work when unbinding smsc75xx
  2018-10-16 17:08 [PATCH 4.9 00/71] 4.9.134-stable review Greg Kroah-Hartman
                   ` (31 preceding siblings ...)
  2018-10-16 17:09 ` [PATCH 4.9 32/71] netlabel: check for IPV4MASK in addrinfo_get Greg Kroah-Hartman
@ 2018-10-16 17:09 ` Greg Kroah-Hartman
  2018-10-16 17:09 ` [PATCH 4.9 34/71] qlcnic: fix Tx descriptor corruption on 82xx devices Greg Kroah-Hartman
                   ` (42 subsequent siblings)
  75 siblings, 0 replies; 82+ messages in thread
From: Greg Kroah-Hartman @ 2018-10-16 17:09 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Yu Zhao, David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Yu Zhao <yuzhao@google.com>

[ Upstream commit f7b2a56e1f3dcbdb4cf09b2b63e859ffe0e09df8 ]

Cancel pending work before freeing the smsc75xx private data structure
during unbinding. This fixes the following crash in the driver:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000050
IP: mutex_lock+0x2b/0x3f
<snipped>
Workqueue: events smsc75xx_deferred_multicast_write [smsc75xx]
task: ffff8caa83e85700 task.stack: ffff948b80518000
RIP: 0010:mutex_lock+0x2b/0x3f
<snipped>
Call Trace:
 smsc75xx_deferred_multicast_write+0x40/0x1af [smsc75xx]
 process_one_work+0x18d/0x2fc
 worker_thread+0x1a2/0x269
 ? pr_cont_work+0x58/0x58
 kthread+0xfa/0x10a
 ? pr_cont_work+0x58/0x58
 ? rcu_read_unlock_sched_notrace+0x48/0x48
 ret_from_fork+0x22/0x40
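
As a rough userspace model of the ordering (all names below are
hypothetical; only the idea of cancelling deferred work before freeing
the data it uses comes from this patch):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical deferred work item: just a "queued" flag. */
struct work {
	bool pending;
};

/* Hypothetical private data that the work item lives in. */
struct priv {
	struct work set_multicast;
};

/* Mark the work as queued for later execution. */
static void work_schedule(struct work *w)
{
	w->pending = true;
}

/* Model of a synchronous cancel: report whether the work was queued. */
static bool work_cancel_sync(struct work *w)
{
	bool was_pending = w->pending;

	w->pending = false;
	return was_pending;
}

/*
 * Unbind: cancel the work first, then free the memory it would have
 * used. Returns true if a queued work item had to be cancelled.
 */
static bool priv_unbind(struct priv **pdata)
{
	bool cancelled = false;

	assert(pdata != NULL);
	if (*pdata) {
		cancelled = work_cancel_sync(&(*pdata)->set_multicast);
		free(*pdata);
		*pdata = NULL;
	}
	return cancelled;
}
```

Without the cancel step, a queued work item would later run against
memory that has already been freed, which is the crash shown above.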

Signed-off-by: Yu Zhao <yuzhao@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/usb/smsc75xx.c |    1 +
 1 file changed, 1 insertion(+)

--- a/drivers/net/usb/smsc75xx.c
+++ b/drivers/net/usb/smsc75xx.c
@@ -1518,6 +1518,7 @@ static void smsc75xx_unbind(struct usbne
 {
 	struct smsc75xx_priv *pdata = (struct smsc75xx_priv *)(dev->data[0]);
 	if (pdata) {
+		cancel_work_sync(&pdata->set_multicast);
 		netif_dbg(dev, ifdown, dev->net, "free pdata\n");
 		kfree(pdata);
 		pdata = NULL;




* [PATCH 4.9 34/71] qlcnic: fix Tx descriptor corruption on 82xx devices
  2018-10-16 17:08 [PATCH 4.9 00/71] 4.9.134-stable review Greg Kroah-Hartman
                   ` (32 preceding siblings ...)
  2018-10-16 17:09 ` [PATCH 4.9 33/71] net/usb: cancel pending work when unbinding smsc75xx Greg Kroah-Hartman
@ 2018-10-16 17:09 ` Greg Kroah-Hartman
  2018-10-16 17:09 ` [PATCH 4.9 35/71] qmi_wwan: Added support for Gemaltos Cinterion ALASxx WWAN interface Greg Kroah-Hartman
                   ` (41 subsequent siblings)
  75 siblings, 0 replies; 82+ messages in thread
From: Greg Kroah-Hartman @ 2018-10-16 17:09 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Shahed Shaikh, David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Shahed Shaikh <shahed.shaikh@cavium.com>

[ Upstream commit c333fa0c4f220f8f7ea5acd6b0ebf3bf13fd684d ]

In the regular NIC transmission flow, the driver always configures the
MAC using the Tx queue zero descriptor as part of the MAC learning flow.
On a NIC with multiple Tx queues, however, regular transmission can
occur on any non-zero Tx queue, and from that context the driver still
uses the Tx queue zero descriptor to configure the MAC. At the same
time, Tx queue zero may be in use by another CPU for regular
transmission, which can corrupt the Tx queue zero descriptor and cause
a FW abort.

This patch fixes this by making the driver always configure the
learned MAC address from the same Tx queue that is used for the
regular transmission.
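
A simplified userspace sketch of the idea follows. The types, ring
size and function names are hypothetical and there is no locking or
hardware here; it only shows the filter request being posted on the
ring the caller is already transmitting on, so ring zero is untouched.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_RINGS 4u	/* hypothetical number of Tx queues */
#define NUM_DESC  8u	/* hypothetical ring size, power of two */

struct tx_ring {
	uint32_t producer;		/* next free descriptor slot */
	uint64_t desc[NUM_DESC];	/* descriptor payloads */
};

struct adapter {
	struct tx_ring rings[NUM_RINGS];
};

/* Post a MAC-learning request on the ring passed in by the caller. */
static void change_filter(struct tx_ring *ring, uint64_t mac)
{
	assert(ring != NULL);
	ring->desc[ring->producer] = mac;
	ring->producer = (ring->producer + 1) & (NUM_DESC - 1);
}

/* Transmit path for one queue: the filter request stays on that queue. */
static void xmit_frame(struct adapter *ad, unsigned int queue,
		       uint64_t src_mac)
{
	assert(ad != NULL && queue < NUM_RINGS);
	change_filter(&ad->rings[queue], src_mac);
}
```

Before the fix, the equivalent of change_filter() always picked ring
zero on its own, regardless of which queue the caller owned.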

Fixes: 7e2cf4feba05 ("qlcnic: change driver hardware interface mechanism")
Signed-off-by: Shahed Shaikh <shahed.shaikh@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/ethernet/qlogic/qlcnic/qlcnic.h         |    8 +++++---
 drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c |    3 ++-
 drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.h |    3 ++-
 drivers/net/ethernet/qlogic/qlcnic/qlcnic_hw.h      |    3 ++-
 drivers/net/ethernet/qlogic/qlcnic/qlcnic_io.c      |   12 ++++++------
 5 files changed, 17 insertions(+), 12 deletions(-)

--- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic.h
+++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic.h
@@ -1800,7 +1800,8 @@ struct qlcnic_hardware_ops {
 	int (*config_loopback) (struct qlcnic_adapter *, u8);
 	int (*clear_loopback) (struct qlcnic_adapter *, u8);
 	int (*config_promisc_mode) (struct qlcnic_adapter *, u32);
-	void (*change_l2_filter) (struct qlcnic_adapter *, u64 *, u16);
+	void (*change_l2_filter)(struct qlcnic_adapter *adapter, u64 *addr,
+				 u16 vlan, struct qlcnic_host_tx_ring *tx_ring);
 	int (*get_board_info) (struct qlcnic_adapter *);
 	void (*set_mac_filter_count) (struct qlcnic_adapter *);
 	void (*free_mac_list) (struct qlcnic_adapter *);
@@ -2042,9 +2043,10 @@ static inline int qlcnic_nic_set_promisc
 }
 
 static inline void qlcnic_change_filter(struct qlcnic_adapter *adapter,
-					u64 *addr, u16 id)
+					u64 *addr, u16 vlan,
+					struct qlcnic_host_tx_ring *tx_ring)
 {
-	adapter->ahw->hw_ops->change_l2_filter(adapter, addr, id);
+	adapter->ahw->hw_ops->change_l2_filter(adapter, addr, vlan, tx_ring);
 }
 
 static inline int qlcnic_get_board_info(struct qlcnic_adapter *adapter)
--- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c
+++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c
@@ -2132,7 +2132,8 @@ out:
 }
 
 void qlcnic_83xx_change_l2_filter(struct qlcnic_adapter *adapter, u64 *addr,
-				  u16 vlan_id)
+				  u16 vlan_id,
+				  struct qlcnic_host_tx_ring *tx_ring)
 {
 	u8 mac[ETH_ALEN];
 	memcpy(&mac, addr, ETH_ALEN);
--- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.h
+++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.h
@@ -550,7 +550,8 @@ int qlcnic_83xx_wrt_reg_indirect(struct
 int qlcnic_83xx_nic_set_promisc(struct qlcnic_adapter *, u32);
 int qlcnic_83xx_config_hw_lro(struct qlcnic_adapter *, int);
 int qlcnic_83xx_config_rss(struct qlcnic_adapter *, int);
-void qlcnic_83xx_change_l2_filter(struct qlcnic_adapter *, u64 *, u16);
+void qlcnic_83xx_change_l2_filter(struct qlcnic_adapter *adapter, u64 *addr,
+				  u16 vlan, struct qlcnic_host_tx_ring *ring);
 int qlcnic_83xx_get_pci_info(struct qlcnic_adapter *, struct qlcnic_pci_info *);
 int qlcnic_83xx_set_nic_info(struct qlcnic_adapter *, struct qlcnic_info *);
 void qlcnic_83xx_initialize_nic(struct qlcnic_adapter *, int);
--- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_hw.h
+++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_hw.h
@@ -173,7 +173,8 @@ int qlcnic_82xx_napi_add(struct qlcnic_a
 			 struct net_device *netdev);
 void qlcnic_82xx_get_beacon_state(struct qlcnic_adapter *);
 void qlcnic_82xx_change_filter(struct qlcnic_adapter *adapter,
-			       u64 *uaddr, u16 vlan_id);
+			       u64 *uaddr, u16 vlan_id,
+			       struct qlcnic_host_tx_ring *tx_ring);
 int qlcnic_82xx_config_intr_coalesce(struct qlcnic_adapter *,
 				     struct ethtool_coalesce *);
 int qlcnic_82xx_set_rx_coalesce(struct qlcnic_adapter *);
--- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_io.c
+++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_io.c
@@ -268,13 +268,12 @@ static void qlcnic_add_lb_filter(struct
 }
 
 void qlcnic_82xx_change_filter(struct qlcnic_adapter *adapter, u64 *uaddr,
-			       u16 vlan_id)
+			       u16 vlan_id, struct qlcnic_host_tx_ring *tx_ring)
 {
 	struct cmd_desc_type0 *hwdesc;
 	struct qlcnic_nic_req *req;
 	struct qlcnic_mac_req *mac_req;
 	struct qlcnic_vlan_req *vlan_req;
-	struct qlcnic_host_tx_ring *tx_ring = adapter->tx_ring;
 	u32 producer;
 	u64 word;
 
@@ -301,7 +300,8 @@ void qlcnic_82xx_change_filter(struct ql
 
 static void qlcnic_send_filter(struct qlcnic_adapter *adapter,
 			       struct cmd_desc_type0 *first_desc,
-			       struct sk_buff *skb)
+			       struct sk_buff *skb,
+			       struct qlcnic_host_tx_ring *tx_ring)
 {
 	struct vlan_ethhdr *vh = (struct vlan_ethhdr *)(skb->data);
 	struct ethhdr *phdr = (struct ethhdr *)(skb->data);
@@ -335,7 +335,7 @@ static void qlcnic_send_filter(struct ql
 		    tmp_fil->vlan_id == vlan_id) {
 			if (jiffies > (QLCNIC_READD_AGE * HZ + tmp_fil->ftime))
 				qlcnic_change_filter(adapter, &src_addr,
-						     vlan_id);
+						     vlan_id, tx_ring);
 			tmp_fil->ftime = jiffies;
 			return;
 		}
@@ -350,7 +350,7 @@ static void qlcnic_send_filter(struct ql
 	if (!fil)
 		return;
 
-	qlcnic_change_filter(adapter, &src_addr, vlan_id);
+	qlcnic_change_filter(adapter, &src_addr, vlan_id, tx_ring);
 	fil->ftime = jiffies;
 	fil->vlan_id = vlan_id;
 	memcpy(fil->faddr, &src_addr, ETH_ALEN);
@@ -766,7 +766,7 @@ netdev_tx_t qlcnic_xmit_frame(struct sk_
 	}
 
 	if (adapter->drv_mac_learn)
-		qlcnic_send_filter(adapter, first_desc, skb);
+		qlcnic_send_filter(adapter, first_desc, skb, tx_ring);
 
 	tx_ring->tx_stats.tx_bytes += skb->len;
 	tx_ring->tx_stats.xmit_called++;




* [PATCH 4.9 35/71] qmi_wwan: Added support for Gemaltos Cinterion ALASxx WWAN interface
  2018-10-16 17:08 [PATCH 4.9 00/71] 4.9.134-stable review Greg Kroah-Hartman
                   ` (33 preceding siblings ...)
  2018-10-16 17:09 ` [PATCH 4.9 34/71] qlcnic: fix Tx descriptor corruption on 82xx devices Greg Kroah-Hartman
@ 2018-10-16 17:09 ` Greg Kroah-Hartman
  2018-10-16 17:09 ` [PATCH 4.9 36/71] team: Forbid enslaving team device to itself Greg Kroah-Hartman
                   ` (40 subsequent siblings)
  75 siblings, 0 replies; 82+ messages in thread
From: Greg Kroah-Hartman @ 2018-10-16 17:09 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Giacinto Cifelli, Bjørn Mork,
	David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Giacinto Cifelli <gciofono@gmail.com>

[ Upstream commit 4f7617705bfff84d756fe4401a1f4f032f374984 ]

Added support for Gemalto's Cinterion ALASxx WWAN interfaces
by adding QMI_FIXED_INTF with Cinterion's VID and PID.

Signed-off-by: Giacinto Cifelli <gciofono@gmail.com>
Acked-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/usb/qmi_wwan.c |    1 +
 1 file changed, 1 insertion(+)

--- a/drivers/net/usb/qmi_wwan.c
+++ b/drivers/net/usb/qmi_wwan.c
@@ -934,6 +934,7 @@ static const struct usb_device_id produc
 	{QMI_FIXED_INTF(0x0b3c, 0xc00b, 4)},	/* Olivetti Olicard 500 */
 	{QMI_FIXED_INTF(0x1e2d, 0x0060, 4)},	/* Cinterion PLxx */
 	{QMI_FIXED_INTF(0x1e2d, 0x0053, 4)},	/* Cinterion PHxx,PXxx */
+	{QMI_FIXED_INTF(0x1e2d, 0x0063, 10)},	/* Cinterion ALASxx (1 RmNet) */
 	{QMI_FIXED_INTF(0x1e2d, 0x0082, 4)},	/* Cinterion PHxx,PXxx (2 RmNet) */
 	{QMI_FIXED_INTF(0x1e2d, 0x0082, 5)},	/* Cinterion PHxx,PXxx (2 RmNet) */
 	{QMI_FIXED_INTF(0x1e2d, 0x0083, 4)},	/* Cinterion PHxx,PXxx (1 RmNet + USB Audio)*/




* [PATCH 4.9 36/71] team: Forbid enslaving team device to itself
  2018-10-16 17:08 [PATCH 4.9 00/71] 4.9.134-stable review Greg Kroah-Hartman
                   ` (34 preceding siblings ...)
  2018-10-16 17:09 ` [PATCH 4.9 35/71] qmi_wwan: Added support for Gemaltos Cinterion ALASxx WWAN interface Greg Kroah-Hartman
@ 2018-10-16 17:09 ` Greg Kroah-Hartman
  2018-10-16 17:09 ` [PATCH 4.9 37/71] net: dsa: bcm_sf2: Fix unbind ordering Greg Kroah-Hartman
                   ` (39 subsequent siblings)
  75 siblings, 0 replies; 82+ messages in thread
From: Greg Kroah-Hartman @ 2018-10-16 17:09 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Ido Schimmel, David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Ido Schimmel <idosch@mellanox.com>

[ Upstream commit 471b83bd8bbe4e89743683ef8ecb78f7029d8288 ]

team's ndo_add_slave() acquires 'team->lock' and later tries to open the
newly enslaved device via dev_open(). This emits a 'NETDEV_UP' event
that causes the VLAN driver to add VLAN 0 on the team device. team's
ndo_vlan_rx_add_vid() will also try to acquire 'team->lock' and
deadlock.

Fix this by checking early at the enslavement function that a team
device is not being enslaved to itself.
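
The self-deadlock and the early check can be modelled in userspace as
follows. This is a sketch under simplifying assumptions: struct
team_dev, the function names and the check_self switch are hypothetical,
and the non-recursive mutex is reduced to a flag that reports -EDEADLK
where a real mutex would block forever.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device with one non-recursive lock, modelled as a flag. */
struct team_dev {
	bool locked;
};

/* Stand-in for the VLAN add hook: it needs the device's own lock. */
static int vlan_rx_add_vid(const struct team_dev *dev)
{
	if (dev->locked)
		return -EDEADLK;	/* a real mutex would hang here */
	return 0;
}

/* Stand-in for the enslave path; check_self toggles the early check. */
static int add_slave(struct team_dev *team, struct team_dev *port,
		     bool check_self)
{
	int err;

	assert(team != NULL && port != NULL);
	if (check_self && team == port)
		return -EINVAL;

	team->locked = true;
	/* Opening the port raises NETDEV_UP, which adds VLAN 0 on it. */
	err = vlan_rx_add_vid(port);
	team->locked = false;
	return err;
}
```

When team and port are the same device, the unchecked path tries to
take the lock it already holds; the checked path rejects the request
before any lock is taken.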

A similar check was added to the bond driver in commit 09a89c219baf
("bonding: disallow enslaving a bond to itself").

WARNING: possible recursive locking detected
4.18.0-rc7+ #176 Not tainted
--------------------------------------------
syz-executor4/6391 is trying to acquire lock:
(____ptrval____) (&team->lock){+.+.}, at: team_vlan_rx_add_vid+0x3b/0x1e0 drivers/net/team/team.c:1868

but task is already holding lock:
(____ptrval____) (&team->lock){+.+.}, at: team_add_slave+0xdb/0x1c30 drivers/net/team/team.c:1947

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&team->lock);
  lock(&team->lock);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

2 locks held by syz-executor4/6391:
 #0: (____ptrval____) (rtnl_mutex){+.+.}, at: rtnl_lock net/core/rtnetlink.c:77 [inline]
 #0: (____ptrval____) (rtnl_mutex){+.+.}, at: rtnetlink_rcv_msg+0x412/0xc30 net/core/rtnetlink.c:4662
 #1: (____ptrval____) (&team->lock){+.+.}, at: team_add_slave+0xdb/0x1c30 drivers/net/team/team.c:1947

stack backtrace:
CPU: 1 PID: 6391 Comm: syz-executor4 Not tainted 4.18.0-rc7+ #176
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113
 print_deadlock_bug kernel/locking/lockdep.c:1765 [inline]
 check_deadlock kernel/locking/lockdep.c:1809 [inline]
 validate_chain kernel/locking/lockdep.c:2405 [inline]
 __lock_acquire.cold.64+0x1fb/0x486 kernel/locking/lockdep.c:3435
 lock_acquire+0x1e4/0x540 kernel/locking/lockdep.c:3924
 __mutex_lock_common kernel/locking/mutex.c:757 [inline]
 __mutex_lock+0x176/0x1820 kernel/locking/mutex.c:894
 mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:909
 team_vlan_rx_add_vid+0x3b/0x1e0 drivers/net/team/team.c:1868
 vlan_add_rx_filter_info+0x14a/0x1d0 net/8021q/vlan_core.c:210
 __vlan_vid_add net/8021q/vlan_core.c:278 [inline]
 vlan_vid_add+0x63e/0x9d0 net/8021q/vlan_core.c:308
 vlan_device_event.cold.12+0x2a/0x2f net/8021q/vlan.c:381
 notifier_call_chain+0x180/0x390 kernel/notifier.c:93
 __raw_notifier_call_chain kernel/notifier.c:394 [inline]
 raw_notifier_call_chain+0x2d/0x40 kernel/notifier.c:401
 call_netdevice_notifiers_info+0x3f/0x90 net/core/dev.c:1735
 call_netdevice_notifiers net/core/dev.c:1753 [inline]
 dev_open+0x173/0x1b0 net/core/dev.c:1433
 team_port_add drivers/net/team/team.c:1219 [inline]
 team_add_slave+0xa8b/0x1c30 drivers/net/team/team.c:1948
 do_set_master+0x1c9/0x220 net/core/rtnetlink.c:2248
 do_setlink+0xba4/0x3e10 net/core/rtnetlink.c:2382
 rtnl_setlink+0x2a9/0x400 net/core/rtnetlink.c:2636
 rtnetlink_rcv_msg+0x46e/0xc30 net/core/rtnetlink.c:4665
 netlink_rcv_skb+0x172/0x440 net/netlink/af_netlink.c:2455
 rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:4683
 netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
 netlink_unicast+0x5a0/0x760 net/netlink/af_netlink.c:1343
 netlink_sendmsg+0xa18/0xfd0 net/netlink/af_netlink.c:1908
 sock_sendmsg_nosec net/socket.c:642 [inline]
 sock_sendmsg+0xd5/0x120 net/socket.c:652
 ___sys_sendmsg+0x7fd/0x930 net/socket.c:2126
 __sys_sendmsg+0x11d/0x290 net/socket.c:2164
 __do_sys_sendmsg net/socket.c:2173 [inline]
 __se_sys_sendmsg net/socket.c:2171 [inline]
 __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2171
 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x456b29
Code: fd b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f9706bf8c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f9706bf96d4 RCX: 0000000000456b29
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000004
RBP: 00000000009300a0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
R13: 00000000004d3548 R14: 00000000004c8227 R15: 0000000000000000

Fixes: 87002b03baab ("net: introduce vlan_vid_[add/del] and use them instead of direct [add/kill]_vid ndo calls")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-and-tested-by: syzbot+bd051aba086537515cdb@syzkaller.appspotmail.com
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/team/team.c |    5 +++++
 1 file changed, 5 insertions(+)

--- a/drivers/net/team/team.c
+++ b/drivers/net/team/team.c
@@ -1171,6 +1171,11 @@ static int team_port_add(struct team *te
 		return -EBUSY;
 	}
 
+	if (dev == port_dev) {
+		netdev_err(dev, "Cannot enslave team device to itself\n");
+		return -EINVAL;
+	}
+
 	if (port_dev->features & NETIF_F_VLAN_CHALLENGED &&
 	    vlan_uses_dev(dev)) {
 		netdev_err(dev, "Device %s is VLAN challenged and team device has VLAN set up\n",




* [PATCH 4.9 37/71] net: dsa: bcm_sf2: Fix unbind ordering
  2018-10-16 17:08 [PATCH 4.9 00/71] 4.9.134-stable review Greg Kroah-Hartman
                   ` (35 preceding siblings ...)
  2018-10-16 17:09 ` [PATCH 4.9 36/71] team: Forbid enslaving team device to itself Greg Kroah-Hartman
@ 2018-10-16 17:09 ` Greg Kroah-Hartman
  2018-10-16 17:09 ` [PATCH 4.9 38/71] net: mvpp2: Extract the correct ethtype from the skb for tx csum offload Greg Kroah-Hartman
                   ` (38 subsequent siblings)
  75 siblings, 0 replies; 82+ messages in thread
From: Greg Kroah-Hartman @ 2018-10-16 17:09 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Florian Fainelli, David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Florian Fainelli <f.fainelli@gmail.com>

[ Upstream commit bf3b452b7af787b8bf27de6490dc4eedf6f97599 ]

The order in which we release resources is unfortunately leading to bus
errors while dismantling the port. This is because we set
priv->wol_ports_mask to 0 to tell bcm_sf2_sw_suspend() that it is now
permissible to clock gate the switch. Later on, when dsa_slave_destroy()
comes in from dsa_unregister_switch() and calls
dsa_switch_ops::port_disable, we perform the same dismantling again, and
this time we hit registers that are clock gated.

Make sure that dsa_unregister_switch() is the first thing that happens,
which takes care of releasing all user visible resources, then proceed
with clock gating hardware. We still need to set priv->wol_ports_mask to
0 to make sure that an enabled port properly gets disabled in case it
was previously used as part of Wake-on-LAN.
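
The ordering problem can be sketched in userspace like this. The
struct and function names are hypothetical, and a gated clock is
reduced to a flag that makes register access fail with -EIO; the
sketch only contrasts the two teardown orders.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical switch state: only the clock gate matters here. */
struct sw {
	bool clock_gated;
};

/* Touching port registers fails once the clock is gated. */
static int port_disable(const struct sw *s)
{
	return s->clock_gated ? -EIO : 0;
}

/* Unregistering the switch disables its ports as a side effect. */
static int unregister_switch(struct sw *s)
{
	return port_disable(s);
}

/* Suspending the switch gates its clock. */
static void sw_suspend(struct sw *s)
{
	s->clock_gated = true;
}

/* Remove path; unregister_first selects the order used by the fix. */
static int sw_remove(struct sw *s, bool unregister_first)
{
	int err;

	assert(s != NULL);
	if (unregister_first) {
		err = unregister_switch(s);
		sw_suspend(s);
	} else {
		sw_suspend(s);
		err = unregister_switch(s);
	}
	return err;
}
```

Only the unregister-first order reaches the port registers while the
clock is still running.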

Fixes: d9338023fb8e ("net: dsa: bcm_sf2: Make it a real platform device driver")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/dsa/bcm_sf2.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/drivers/net/dsa/bcm_sf2.c
+++ b/drivers/net/dsa/bcm_sf2.c
@@ -1129,10 +1129,10 @@ static int bcm_sf2_sw_remove(struct plat
 {
 	struct bcm_sf2_priv *priv = platform_get_drvdata(pdev);
 
-	/* Disable all ports and interrupts */
 	priv->wol_ports_mask = 0;
-	bcm_sf2_sw_suspend(priv->dev->ds);
 	dsa_unregister_switch(priv->dev->ds);
+	/* Disable all ports and interrupts */
+	bcm_sf2_sw_suspend(priv->dev->ds);
 	bcm_sf2_mdio_unregister(priv);
 
 	return 0;




* [PATCH 4.9 38/71] net: mvpp2: Extract the correct ethtype from the skb for tx csum offload
  2018-10-16 17:08 [PATCH 4.9 00/71] 4.9.134-stable review Greg Kroah-Hartman
                   ` (36 preceding siblings ...)
  2018-10-16 17:09 ` [PATCH 4.9 37/71] net: dsa: bcm_sf2: Fix unbind ordering Greg Kroah-Hartman
@ 2018-10-16 17:09 ` Greg Kroah-Hartman
  2018-10-16 17:09 ` [PATCH 4.9 39/71] net: systemport: Fix wake-up interrupt race during resume Greg Kroah-Hartman
                   ` (37 subsequent siblings)
  75 siblings, 0 replies; 82+ messages in thread
From: Greg Kroah-Hartman @ 2018-10-16 17:09 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Maxime Chevallier, David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Maxime Chevallier <maxime.chevallier@bootlin.com>

[ Upstream commit 35f3625c21852ad839f20c91c7d81c4c1101e207 ]

When offloading the L3 and L4 csum computation on TX, we need to extract
the l3_proto from the ethtype, independently of the presence of a vlan
tag.

The current driver uses skb->protocol as-is, resulting in packets with
the wrong L4 checksum being sent when there's a vlan tag in the packet
header and checksum offloading is enabled.

This commit makes use of vlan_get_protocol() to get the correct ethtype
regardless of the presence of a vlan tag.
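
To illustrate what "the correct ethtype" means, here is a userspace
sketch that walks a raw Ethernet frame and skips any VLAN tags. It is
not the kernel helper: frame_l3_proto() and the local macros are
defined here for the example only, and it works on raw bytes rather
than on an skb.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_ALEN	6	/* bytes per MAC address */
#define VLAN_HLEN	4	/* bytes per VLAN tag (TPID + TCI) */
#define ETH_P_IP	0x0800
#define ETH_P_8021Q	0x8100
#define ETH_P_8021AD	0x88A8

/*
 * Return the ethertype of the encapsulated protocol in host byte
 * order, skipping any 802.1Q/802.1ad tags. Returns 0 if the frame is
 * too short to contain one.
 */
static uint16_t frame_l3_proto(const uint8_t *frame, size_t len)
{
	size_t off = 2 * ETH_ALEN;	/* skip destination and source MAC */

	assert(frame != NULL);
	while (off + 2 <= len) {
		uint16_t type = (uint16_t)((frame[off] << 8) | frame[off + 1]);

		if (type != ETH_P_8021Q && type != ETH_P_8021AD)
			return type;
		off += VLAN_HLEN;	/* land on the next ethertype field */
	}
	return 0;
}
```

For a tagged IPv4 frame, the type field right after the MAC addresses
reads 0x8100, while the sketch returns 0x0800, which is the value the
checksum setup has to be based on.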

Fixes: 3f518509dedc ("ethernet: Add new driver for Marvell Armada 375 network unit")
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/ethernet/marvell/mvpp2.c |   10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

--- a/drivers/net/ethernet/marvell/mvpp2.c
+++ b/drivers/net/ethernet/marvell/mvpp2.c
@@ -29,6 +29,7 @@
 #include <linux/clk.h>
 #include <linux/hrtimer.h>
 #include <linux/ktime.h>
+#include <linux/if_vlan.h>
 #include <uapi/linux/ppp_defs.h>
 #include <net/ip.h>
 #include <net/ipv6.h>
@@ -4266,7 +4267,7 @@ static void mvpp2_txq_desc_put(struct mv
 }
 
 /* Set Tx descriptors fields relevant for CSUM calculation */
-static u32 mvpp2_txq_desc_csum(int l3_offs, int l3_proto,
+static u32 mvpp2_txq_desc_csum(int l3_offs, __be16 l3_proto,
 			       int ip_hdr_len, int l4_proto)
 {
 	u32 command;
@@ -5019,14 +5020,15 @@ static u32 mvpp2_skb_tx_csum(struct mvpp
 	if (skb->ip_summed == CHECKSUM_PARTIAL) {
 		int ip_hdr_len = 0;
 		u8 l4_proto;
+		__be16 l3_proto = vlan_get_protocol(skb);
 
-		if (skb->protocol == htons(ETH_P_IP)) {
+		if (l3_proto == htons(ETH_P_IP)) {
 			struct iphdr *ip4h = ip_hdr(skb);
 
 			/* Calculate IPv4 checksum and L4 checksum */
 			ip_hdr_len = ip4h->ihl;
 			l4_proto = ip4h->protocol;
-		} else if (skb->protocol == htons(ETH_P_IPV6)) {
+		} else if (l3_proto == htons(ETH_P_IPV6)) {
 			struct ipv6hdr *ip6h = ipv6_hdr(skb);
 
 			/* Read l4_protocol from one of IPv6 extra headers */
@@ -5038,7 +5040,7 @@ static u32 mvpp2_skb_tx_csum(struct mvpp
 		}
 
 		return mvpp2_txq_desc_csum(skb_network_offset(skb),
-				skb->protocol, ip_hdr_len, l4_proto);
+					   l3_proto, ip_hdr_len, l4_proto);
 	}
 
 	return MVPP2_TXD_L4_CSUM_NOT | MVPP2_TXD_IP_CSUM_DISABLE;




* [PATCH 4.9 39/71] net: systemport: Fix wake-up interrupt race during resume
  2018-10-16 17:08 [PATCH 4.9 00/71] 4.9.134-stable review Greg Kroah-Hartman
                   ` (37 preceding siblings ...)
  2018-10-16 17:09 ` [PATCH 4.9 38/71] net: mvpp2: Extract the correct ethtype from the skb for tx csum offload Greg Kroah-Hartman
@ 2018-10-16 17:09 ` Greg Kroah-Hartman
  2018-10-16 17:09 ` [PATCH 4.9 40/71] rtnl: limit IFLA_NUM_TX_QUEUES and IFLA_NUM_RX_QUEUES to 4096 Greg Kroah-Hartman
                   ` (36 subsequent siblings)
  75 siblings, 0 replies; 82+ messages in thread
From: Greg Kroah-Hartman @ 2018-10-16 17:09 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Florian Fainelli, David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Florian Fainelli <f.fainelli@gmail.com>

[ Upstream commit 45ec318578c0c22a11f5b9927d064418e1ab1905 ]

The AON_PM_L2 is normally used to trigger and identify the source of a
wake-up event. Since the RX_SYS clock is no longer turned off, we also
have an interrupt being sent to the SYSTEMPORT INTRL2_0 controller, and
that interrupt remains active until the magic packet detector is
disabled, which happens much later during driver resumption.

The race happens if we have a CPU that is entering the SYSTEMPORT
INTRL2_0 handler during resume, and another CPU has managed to clear the
wake-up interrupt during bcm_sysport_resume_from_wol(). In that case, we
have the first CPU stuck in the interrupt handler with an interrupt
cause that has been cleared under its feet, and so we keep returning
IRQ_NONE and we never make any progress.

This was not a problem before because we would always turn off the
RX_SYS clock during WoL, so the SYSTEMPORT INTRL2_0 would be turned
off as well, thus not latching the interrupt.

The fix is to make sure we do not enable either the MPD or
BRCM_TAG_MATCH interrupts since those are redundant with what the
AON_PM_L2 interrupt controller already processes and they would cause
such a race to occur.

Fixes: bb9051a2b230 ("net: systemport: Add support for WAKE_FILTER")
Fixes: 83e82f4c706b ("net: systemport: add Wake-on-LAN support")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/ethernet/broadcom/bcmsysport.c |   22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

--- a/drivers/net/ethernet/broadcom/bcmsysport.c
+++ b/drivers/net/ethernet/broadcom/bcmsysport.c
@@ -828,14 +828,22 @@ static void bcm_sysport_resume_from_wol(
 {
 	u32 reg;
 
-	/* Stop monitoring MPD interrupt */
-	intrl2_0_mask_set(priv, INTRL2_0_MPD);
-
 	/* Clear the MagicPacket detection logic */
 	reg = umac_readl(priv, UMAC_MPD_CTRL);
 	reg &= ~MPD_EN;
 	umac_writel(priv, reg, UMAC_MPD_CTRL);
 
+	reg = intrl2_0_readl(priv, INTRL2_CPU_STATUS);
+	if (reg & INTRL2_0_MPD)
+		netdev_info(priv->netdev, "Wake-on-LAN (MPD) interrupt!\n");
+
+	if (reg & INTRL2_0_BRCM_MATCH_TAG) {
+		reg = rxchk_readl(priv, RXCHK_BRCM_TAG_MATCH_STATUS) &
+				  RXCHK_BRCM_TAG_MATCH_MASK;
+		netdev_info(priv->netdev,
+			    "Wake-on-LAN (filters 0x%02x) interrupt!\n", reg);
+	}
+
 	netif_dbg(priv, wol, priv->netdev, "resumed from WOL\n");
 }
 
@@ -868,11 +876,6 @@ static irqreturn_t bcm_sysport_rx_isr(in
 	if (priv->irq0_stat & INTRL2_0_TX_RING_FULL)
 		bcm_sysport_tx_reclaim_all(priv);
 
-	if (priv->irq0_stat & INTRL2_0_MPD) {
-		netdev_info(priv->netdev, "Wake-on-LAN interrupt!\n");
-		bcm_sysport_resume_from_wol(priv);
-	}
-
 	return IRQ_HANDLED;
 }
 
@@ -1901,9 +1904,6 @@ static int bcm_sysport_suspend_to_wol(st
 	/* UniMAC receive needs to be turned on */
 	umac_enable_set(priv, CMD_RX_EN, 1);
 
-	/* Enable the interrupt wake-up source */
-	intrl2_0_mask_clear(priv, INTRL2_0_MPD);
-
 	netif_dbg(priv, wol, ndev, "entered WOL mode\n");
 
 	return 0;




* [PATCH 4.9 40/71] rtnl: limit IFLA_NUM_TX_QUEUES and IFLA_NUM_RX_QUEUES to 4096
  2018-10-16 17:08 [PATCH 4.9 00/71] 4.9.134-stable review Greg Kroah-Hartman
                   ` (38 preceding siblings ...)
  2018-10-16 17:09 ` [PATCH 4.9 39/71] net: systemport: Fix wake-up interrupt race during resume Greg Kroah-Hartman
@ 2018-10-16 17:09 ` Greg Kroah-Hartman
  2018-10-16 17:09 ` [PATCH 4.9 41/71] tcp/dccp: fix lockdep issue when SYN is backlogged Greg Kroah-Hartman
                   ` (35 subsequent siblings)
  75 siblings, 0 replies; 82+ messages in thread
From: Greg Kroah-Hartman @ 2018-10-16 17:09 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Eric Dumazet, syzbot, David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <edumazet@google.com>

[ Upstream commit 0e1d6eca5113858ed2caea61a5adc03c595f6096 ]

We have an impressive number of syzkaller bugs that are linked
to the fact that syzbot was able to create a networking device
with millions of TX (or RX) queues.

Let's limit the number of RX/TX queues to 4096; this really should
cover all known cases.

A separate patch will add various cond_resched() in the loops
handling sysfs entries at device creation and dismantle.

Tested:

lpaa6:~# ip link add gre-4097 numtxqueues 4097 numrxqueues 4097 type ip6gretap
RTNETLINK answers: Invalid argument

lpaa6:~# time ip link add gre-4096 numtxqueues 4096 numrxqueues 4096 type ip6gretap

real	0m0.180s
user	0m0.000s
sys	0m0.107s
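
The same bounds can be expressed as a standalone check. This is a
sketch only: check_queue_counts() and MAX_QUEUES are hypothetical
names, with the 4096 limit taken from the description above.

```c
#include <assert.h>	/* for assert()-based checks when exercising this */
#include <errno.h>

#define MAX_QUEUES 4096u	/* upper bound named in this patch */

/* Accept queue counts in [1, MAX_QUEUES] for both directions. */
static int check_queue_counts(unsigned int num_tx, unsigned int num_rx)
{
	if (num_tx < 1 || num_tx > MAX_QUEUES)
		return -EINVAL;
	if (num_rx < 1 || num_rx > MAX_QUEUES)
		return -EINVAL;
	return 0;
}
```

This mirrors the two commands above: 4097 queues are rejected with
"Invalid argument", 4096 are accepted.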

Fixes: 76ff5cc91935 ("rtnl: allow to specify number of rx and tx queues on device creation")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/core/rtnetlink.c |    6 ++++++
 1 file changed, 6 insertions(+)

--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -2368,6 +2368,12 @@ struct net_device *rtnl_create_link(stru
 	else if (ops->get_num_rx_queues)
 		num_rx_queues = ops->get_num_rx_queues();
 
+	if (num_tx_queues < 1 || num_tx_queues > 4096)
+		return ERR_PTR(-EINVAL);
+
+	if (num_rx_queues < 1 || num_rx_queues > 4096)
+		return ERR_PTR(-EINVAL);
+
 	err = -ENOMEM;
 	dev = alloc_netdev_mqs(ops->priv_size, ifname, name_assign_type,
 			       ops->setup, num_tx_queues, num_rx_queues);




* [PATCH 4.9 41/71] tcp/dccp: fix lockdep issue when SYN is backlogged
  2018-10-16 17:08 [PATCH 4.9 00/71] 4.9.134-stable review Greg Kroah-Hartman
                   ` (39 preceding siblings ...)
  2018-10-16 17:09 ` [PATCH 4.9 40/71] rtnl: limit IFLA_NUM_TX_QUEUES and IFLA_NUM_RX_QUEUES to 4096 Greg Kroah-Hartman
@ 2018-10-16 17:09 ` Greg Kroah-Hartman
  2018-10-16 17:09 ` [PATCH 4.9 42/71] inet: make sure to grab rcu_read_lock before using ireq->ireq_opt Greg Kroah-Hartman
                   ` (34 subsequent siblings)
  75 siblings, 0 replies; 82+ messages in thread
From: Greg Kroah-Hartman @ 2018-10-16 17:09 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Eric Dumazet, syzbot, David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <edumazet@google.com>

[ Upstream commit 1ad98e9d1bdf4724c0a8532fabd84bf3c457c2bc ]

In normal SYN processing, packets are handled without the listener
lock and in the RCU-protected ingress path.

But syzkaller is known to be able to trick us and SYN
packets might be processed in process context, after being
queued into socket backlog.

In commit 06f877d613be ("tcp/dccp: fix other lockdep splats
accessing ireq_opt") I made a very stupid fix, that happened
to work mostly because of the regular path being RCU protected.

Really the thing protecting ireq->ireq_opt is RCU read lock,
and the pseudo request refcnt is not relevant.

This patch extends what I did in commit 449809a66c1d ("tcp/dccp:
block BH for SYN processing") by adding an extra rcu_read_{lock|unlock}
pair in the paths that might be taken when processing SYN from the
socket backlog (thus possibly in process context).

Fixes: 06f877d613be ("tcp/dccp: fix other lockdep splats accessing ireq_opt")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 include/net/inet_sock.h |    3 +--
 net/dccp/input.c        |    4 +++-
 net/ipv4/tcp_input.c    |    4 +++-
 3 files changed, 7 insertions(+), 4 deletions(-)

--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -134,8 +134,7 @@ static inline int inet_request_bound_dev
 
 static inline struct ip_options_rcu *ireq_opt_deref(const struct inet_request_sock *ireq)
 {
-	return rcu_dereference_check(ireq->ireq_opt,
-				     atomic_read(&ireq->req.rsk_refcnt) > 0);
+	return rcu_dereference(ireq->ireq_opt);
 }
 
 struct inet_cork {
--- a/net/dccp/input.c
+++ b/net/dccp/input.c
@@ -605,11 +605,13 @@ int dccp_rcv_state_process(struct sock *
 	if (sk->sk_state == DCCP_LISTEN) {
 		if (dh->dccph_type == DCCP_PKT_REQUEST) {
 			/* It is possible that we process SYN packets from backlog,
-			 * so we need to make sure to disable BH right there.
+			 * so we need to make sure to disable BH and RCU right there.
 			 */
+			rcu_read_lock();
 			local_bh_disable();
 			acceptable = inet_csk(sk)->icsk_af_ops->conn_request(sk, skb) >= 0;
 			local_bh_enable();
+			rcu_read_unlock();
 			if (!acceptable)
 				return 1;
 			consume_skb(skb);
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -5978,11 +5978,13 @@ int tcp_rcv_state_process(struct sock *s
 			if (th->fin)
 				goto discard;
 			/* It is possible that we process SYN packets from backlog,
-			 * so we need to make sure to disable BH right there.
+			 * so we need to make sure to disable BH and RCU right there.
 			 */
+			rcu_read_lock();
 			local_bh_disable();
 			acceptable = icsk->icsk_af_ops->conn_request(sk, skb) >= 0;
 			local_bh_enable();
+			rcu_read_unlock();
 
 			if (!acceptable)
 				return 1;




* [PATCH 4.9 42/71] inet: make sure to grab rcu_read_lock before using ireq->ireq_opt
From: Greg Kroah-Hartman @ 2018-10-16 17:09 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Eric Dumazet, Willem de Bruijn,
	David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <edumazet@google.com>

[ Upstream commit 2ab2ddd301a22ca3c5f0b743593e4ad2953dfa53 ]

Timer handlers do not imply rcu_read_lock(), so my recent fix
triggered a LOCKDEP warning when a SYNACK is retransmitted.

Let's add rcu_read_lock()/rcu_read_unlock() pairs around ireq->ireq_opt
usages instead of guessing what is done by callers, since it is
not worth the pain.

Get rid of the ireq_opt_deref() helper: it is now a plain
rcu_dereference() and hides the logic without real benefit.

Fixes: 1ad98e9d1bdf ("tcp/dccp: fix lockdep issue when SYN is backlogged")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
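
A rough userspace sketch of the pattern this patch applies in
inet_csk_route_req(): take the read-side lock locally and release it on
every exit path, including the goto labels. The demo_* names and the
depth counter are hypothetical stand-ins, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the RCU read-side nesting depth. */
static int demo_read_depth;

static void demo_read_lock(void)   { demo_read_depth++; }
static void demo_read_unlock(void) { demo_read_depth--; }

struct demo_opt   { bool is_strictroute; };
struct demo_route { bool uses_gateway; };

/* Returns the route on success, NULL on failure; the lock is always
 * dropped before returning, whichever path is taken.
 */
static const struct demo_route *
demo_route_req(const struct demo_opt *opt_ptr, const struct demo_route *rt)
{
	demo_read_lock();
	const struct demo_opt *opt = opt_ptr; /* stands in for rcu_dereference() */

	if (!rt)
		goto no_route;
	if (opt && opt->is_strictroute && rt->uses_gateway)
		goto route_err;
	demo_read_unlock();
	return rt;

route_err:
	/* the real function drops its route reference here */
no_route:
	demo_read_unlock();
	return NULL;
}
```

The point of the sketch is only the balance: after any call, the depth
counter is back to its value on entry, so the caller's context (timer
handler or not) no longer matters.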
 include/net/inet_sock.h         |    5 -----
 net/dccp/ipv4.c                 |    4 +++-
 net/ipv4/inet_connection_sock.c |    5 ++++-
 net/ipv4/tcp_ipv4.c             |    4 +++-
 4 files changed, 10 insertions(+), 8 deletions(-)

--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -132,11 +132,6 @@ static inline int inet_request_bound_dev
 	return sk->sk_bound_dev_if;
 }
 
-static inline struct ip_options_rcu *ireq_opt_deref(const struct inet_request_sock *ireq)
-{
-	return rcu_dereference(ireq->ireq_opt);
-}
-
 struct inet_cork {
 	unsigned int		flags;
 	__be32			addr;
--- a/net/dccp/ipv4.c
+++ b/net/dccp/ipv4.c
@@ -493,9 +493,11 @@ static int dccp_v4_send_response(const s
 
 		dh->dccph_checksum = dccp_v4_csum_finish(skb, ireq->ir_loc_addr,
 							      ireq->ir_rmt_addr);
+		rcu_read_lock();
 		err = ip_build_and_send_pkt(skb, sk, ireq->ir_loc_addr,
 					    ireq->ir_rmt_addr,
-					    ireq_opt_deref(ireq));
+					    rcu_dereference(ireq->ireq_opt));
+		rcu_read_unlock();
 		err = net_xmit_eval(err);
 	}
 
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -410,7 +410,8 @@ struct dst_entry *inet_csk_route_req(con
 	struct ip_options_rcu *opt;
 	struct rtable *rt;
 
-	opt = ireq_opt_deref(ireq);
+	rcu_read_lock();
+	opt = rcu_dereference(ireq->ireq_opt);
 
 	flowi4_init_output(fl4, ireq->ir_iif, ireq->ir_mark,
 			   RT_CONN_FLAGS(sk), RT_SCOPE_UNIVERSE,
@@ -424,11 +425,13 @@ struct dst_entry *inet_csk_route_req(con
 		goto no_route;
 	if (opt && opt->opt.is_strictroute && rt->rt_uses_gateway)
 		goto route_err;
+	rcu_read_unlock();
 	return &rt->dst;
 
 route_err:
 	ip_rt_put(rt);
 no_route:
+	rcu_read_unlock();
 	__IP_INC_STATS(net, IPSTATS_MIB_OUTNOROUTES);
 	return NULL;
 }
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -859,9 +859,11 @@ static int tcp_v4_send_synack(const stru
 	if (skb) {
 		__tcp_v4_send_check(skb, ireq->ir_loc_addr, ireq->ir_rmt_addr);
 
+		rcu_read_lock();
 		err = ip_build_and_send_pkt(skb, sk, ireq->ir_loc_addr,
 					    ireq->ir_rmt_addr,
-					    ireq_opt_deref(ireq));
+					    rcu_dereference(ireq->ireq_opt));
+		rcu_read_unlock();
 		err = net_xmit_eval(err);
 	}
 




* [PATCH 4.9 43/71] inet: frags: change inet_frags_init_net() return value
From: Greg Kroah-Hartman @ 2018-10-16 17:09 UTC (permalink / raw)
  To: linux-kernel, netdev
  Cc: Greg Kroah-Hartman, stable, Eric Dumazet, David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <edumazet@google.com>

We will soon initialize one rhashtable per struct netns_frags
in inet_frags_init_net().

This patch changes the return value to eventually propagate an
error.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 787bea7748a76130566f881c2342a0be4127d182)
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
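
A small userspace sketch of the calling convention this patch sets up:
the init helper returns an int, and the caller undoes it if a later
step fails. The sketch_* names and the failure switch are hypothetical;
the stub init always returns 0, as the helper does at this point in the
series.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for struct netns_frags. */
struct sketch_frags {
	int mem;
	int initialized;
};

static int sketch_sysctl_should_fail; /* test knob, not in the kernel */

static int sketch_frags_init_net(struct sketch_frags *nf)
{
	nf->mem = 0;
	nf->initialized = 1;
	return 0;                     /* may return an error in later patches */
}

static void sketch_frags_exit_net(struct sketch_frags *nf)
{
	nf->initialized = 0;
}

static int sketch_sysctl_register(void)
{
	return sketch_sysctl_should_fail ? -ENOMEM : 0;
}

/* Same shape as the per-netns init functions changed by this patch. */
static int sketch_init_net(struct sketch_frags *nf)
{
	int res;

	res = sketch_frags_init_net(nf);
	if (res < 0)
		return res;
	res = sketch_sysctl_register();
	if (res < 0)
		sketch_frags_exit_net(nf); /* undo step 1 when step 2 fails */
	return res;
}
```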
 include/net/inet_frag.h                 |    3 ++-
 net/ieee802154/6lowpan/reassembly.c     |   11 ++++++++---
 net/ipv4/ip_fragment.c                  |   12 +++++++++---
 net/ipv6/netfilter/nf_conntrack_reasm.c |   12 +++++++++---
 net/ipv6/reassembly.c                   |   11 +++++++++--
 5 files changed, 37 insertions(+), 12 deletions(-)

--- a/include/net/inet_frag.h
+++ b/include/net/inet_frag.h
@@ -103,9 +103,10 @@ struct inet_frags {
 int inet_frags_init(struct inet_frags *);
 void inet_frags_fini(struct inet_frags *);
 
-static inline void inet_frags_init_net(struct netns_frags *nf)
+static inline int inet_frags_init_net(struct netns_frags *nf)
 {
 	atomic_set(&nf->mem, 0);
+	return 0;
 }
 void inet_frags_exit_net(struct netns_frags *nf, struct inet_frags *f);
 
--- a/net/ieee802154/6lowpan/reassembly.c
+++ b/net/ieee802154/6lowpan/reassembly.c
@@ -580,14 +580,19 @@ static int __net_init lowpan_frags_init_
 {
 	struct netns_ieee802154_lowpan *ieee802154_lowpan =
 		net_ieee802154_lowpan(net);
+	int res;
 
 	ieee802154_lowpan->frags.high_thresh = IPV6_FRAG_HIGH_THRESH;
 	ieee802154_lowpan->frags.low_thresh = IPV6_FRAG_LOW_THRESH;
 	ieee802154_lowpan->frags.timeout = IPV6_FRAG_TIMEOUT;
 
-	inet_frags_init_net(&ieee802154_lowpan->frags);
-
-	return lowpan_frags_ns_sysctl_register(net);
+	res = inet_frags_init_net(&ieee802154_lowpan->frags);
+	if (res < 0)
+		return res;
+	res = lowpan_frags_ns_sysctl_register(net);
+	if (res < 0)
+		inet_frags_exit_net(&ieee802154_lowpan->frags, &lowpan_frags);
+	return res;
 }
 
 static void __net_exit lowpan_frags_exit_net(struct net *net)
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -849,6 +849,8 @@ static void __init ip4_frags_ctl_registe
 
 static int __net_init ipv4_frags_init_net(struct net *net)
 {
+	int res;
+
 	/* Fragment cache limits.
 	 *
 	 * The fragment memory accounting code, (tries to) account for
@@ -874,9 +876,13 @@ static int __net_init ipv4_frags_init_ne
 
 	net->ipv4.frags.max_dist = 64;
 
-	inet_frags_init_net(&net->ipv4.frags);
-
-	return ip4_frags_ns_ctl_register(net);
+	res = inet_frags_init_net(&net->ipv4.frags);
+	if (res < 0)
+		return res;
+	res = ip4_frags_ns_ctl_register(net);
+	if (res < 0)
+		inet_frags_exit_net(&net->ipv4.frags, &ip4_frags);
+	return res;
 }
 
 static void __net_exit ipv4_frags_exit_net(struct net *net)
--- a/net/ipv6/netfilter/nf_conntrack_reasm.c
+++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
@@ -630,12 +630,18 @@ EXPORT_SYMBOL_GPL(nf_ct_frag6_gather);
 
 static int nf_ct_net_init(struct net *net)
 {
+	int res;
+
 	net->nf_frag.frags.high_thresh = IPV6_FRAG_HIGH_THRESH;
 	net->nf_frag.frags.low_thresh = IPV6_FRAG_LOW_THRESH;
 	net->nf_frag.frags.timeout = IPV6_FRAG_TIMEOUT;
-	inet_frags_init_net(&net->nf_frag.frags);
-
-	return nf_ct_frag6_sysctl_register(net);
+	res = inet_frags_init_net(&net->nf_frag.frags);
+	if (res < 0)
+		return res;
+	res = nf_ct_frag6_sysctl_register(net);
+	if (res < 0)
+		inet_frags_exit_net(&net->nf_frag.frags, &nf_frags);
+	return res;
 }
 
 static void nf_ct_net_exit(struct net *net)
--- a/net/ipv6/reassembly.c
+++ b/net/ipv6/reassembly.c
@@ -709,13 +709,20 @@ static void ip6_frags_sysctl_unregister(
 
 static int __net_init ipv6_frags_init_net(struct net *net)
 {
+	int res;
+
 	net->ipv6.frags.high_thresh = IPV6_FRAG_HIGH_THRESH;
 	net->ipv6.frags.low_thresh = IPV6_FRAG_LOW_THRESH;
 	net->ipv6.frags.timeout = IPV6_FRAG_TIMEOUT;
 
-	inet_frags_init_net(&net->ipv6.frags);
+	res = inet_frags_init_net(&net->ipv6.frags);
+	if (res < 0)
+		return res;
 
-	return ip6_frags_ns_sysctl_register(net);
+	res = ip6_frags_ns_sysctl_register(net);
+	if (res < 0)
+		inet_frags_exit_net(&net->ipv6.frags, &ip6_frags);
+	return res;
 }
 
 static void __net_exit ipv6_frags_exit_net(struct net *net)




* [PATCH 4.9 44/71] inet: frags: add a pointer to struct netns_frags
From: Greg Kroah-Hartman @ 2018-10-16 17:09 UTC (permalink / raw)
  To: linux-kernel, netdev
  Cc: Greg Kroah-Hartman, stable, Eric Dumazet, David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <edumazet@google.com>

In order to simplify the API, add to struct netns_frags a pointer to
its struct inet_frags. Callers then no longer need to pass the
struct inet_frags separately.

These functions no longer have a struct inet_frags parameter:

inet_frag_destroy(struct inet_frag_queue *q  /*, struct inet_frags *f */)
inet_frag_put(struct inet_frag_queue *q /*, struct inet_frags *f */)
inet_frag_kill(struct inet_frag_queue *q /*, struct inet_frags *f */)
inet_frags_exit_net(struct netns_frags *nf /*, struct inet_frags *f */)
ip6_expire_frag_queue(struct net *net, struct frag_queue *fq)

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 093ba72914b696521e4885756a68a3332782c8de)
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
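
A toy userspace sketch of the back-pointer idea, assuming simplified
structures: the per-namespace state carries a pointer to the
per-protocol descriptor, so a queue can reach it through q->net->f and
the extra function argument goes away. The toy_* types are hypothetical
and far smaller than the kernel ones.

```c
#include <assert.h>

/* Per-protocol descriptor (stands in for struct inet_frags). */
struct toy_frags {
	const char *name;
	int destroyed;                /* counts destroyed queues */
};

/* Per-namespace state (stands in for struct netns_frags), with the new f. */
struct toy_netns_frags {
	struct toy_frags *f;
};

/* One reassembly queue (stands in for struct inet_frag_queue). */
struct toy_frag_queue {
	struct toy_netns_frags *net;
	int refcnt;
};

/* No 'f' parameter: it is recovered from the queue itself. */
static void toy_frag_destroy(struct toy_frag_queue *q)
{
	struct toy_frags *f = q->net->f;

	f->destroyed++;
}

static void toy_frag_put(struct toy_frag_queue *q)
{
	if (--q->refcnt == 0)
		toy_frag_destroy(q);
}
```

The trade-off is one extra pointer per namespace structure in exchange
for shorter signatures and no risk of passing a mismatched descriptor.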
 include/net/inet_frag.h                 |   11 ++++++-----
 include/net/ipv6.h                      |    3 +--
 net/ieee802154/6lowpan/reassembly.c     |   13 +++++++------
 net/ipv4/inet_fragment.c                |   17 ++++++++++-------
 net/ipv4/ip_fragment.c                  |    9 +++++----
 net/ipv6/netfilter/nf_conntrack_reasm.c |   16 +++++++++-------
 net/ipv6/reassembly.c                   |   20 ++++++++++----------
 7 files changed, 48 insertions(+), 41 deletions(-)

--- a/include/net/inet_frag.h
+++ b/include/net/inet_frag.h
@@ -9,6 +9,7 @@ struct netns_frags {
 	int			high_thresh;
 	int			low_thresh;
 	int			max_dist;
+	struct inet_frags	*f;
 };
 
 /**
@@ -108,20 +109,20 @@ static inline int inet_frags_init_net(st
 	atomic_set(&nf->mem, 0);
 	return 0;
 }
-void inet_frags_exit_net(struct netns_frags *nf, struct inet_frags *f);
+void inet_frags_exit_net(struct netns_frags *nf);
 
-void inet_frag_kill(struct inet_frag_queue *q, struct inet_frags *f);
-void inet_frag_destroy(struct inet_frag_queue *q, struct inet_frags *f);
+void inet_frag_kill(struct inet_frag_queue *q);
+void inet_frag_destroy(struct inet_frag_queue *q);
 struct inet_frag_queue *inet_frag_find(struct netns_frags *nf,
 		struct inet_frags *f, void *key, unsigned int hash);
 
 void inet_frag_maybe_warn_overflow(struct inet_frag_queue *q,
 				   const char *prefix);
 
-static inline void inet_frag_put(struct inet_frag_queue *q, struct inet_frags *f)
+static inline void inet_frag_put(struct inet_frag_queue *q)
 {
 	if (atomic_dec_and_test(&q->refcnt))
-		inet_frag_destroy(q, f);
+		inet_frag_destroy(q);
 }
 
 static inline bool inet_frag_evicting(struct inet_frag_queue *q)
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -559,8 +559,7 @@ struct frag_queue {
 	u8			ecn;
 };
 
-void ip6_expire_frag_queue(struct net *net, struct frag_queue *fq,
-			   struct inet_frags *frags);
+void ip6_expire_frag_queue(struct net *net, struct frag_queue *fq);
 
 static inline bool ipv6_addr_any(const struct in6_addr *a)
 {
--- a/net/ieee802154/6lowpan/reassembly.c
+++ b/net/ieee802154/6lowpan/reassembly.c
@@ -93,10 +93,10 @@ static void lowpan_frag_expire(unsigned
 	if (fq->q.flags & INET_FRAG_COMPLETE)
 		goto out;
 
-	inet_frag_kill(&fq->q, &lowpan_frags);
+	inet_frag_kill(&fq->q);
 out:
 	spin_unlock(&fq->q.lock);
-	inet_frag_put(&fq->q, &lowpan_frags);
+	inet_frag_put(&fq->q);
 }
 
 static inline struct lowpan_frag_queue *
@@ -229,7 +229,7 @@ static int lowpan_frag_reasm(struct lowp
 	struct sk_buff *fp, *head = fq->q.fragments;
 	int sum_truesize;
 
-	inet_frag_kill(&fq->q, &lowpan_frags);
+	inet_frag_kill(&fq->q);
 
 	/* Make the one we just received the head. */
 	if (prev) {
@@ -437,7 +437,7 @@ int lowpan_frag_rcv(struct sk_buff *skb,
 		ret = lowpan_frag_queue(fq, skb, frag_type);
 		spin_unlock(&fq->q.lock);
 
-		inet_frag_put(&fq->q, &lowpan_frags);
+		inet_frag_put(&fq->q);
 		return ret;
 	}
 
@@ -585,13 +585,14 @@ static int __net_init lowpan_frags_init_
 	ieee802154_lowpan->frags.high_thresh = IPV6_FRAG_HIGH_THRESH;
 	ieee802154_lowpan->frags.low_thresh = IPV6_FRAG_LOW_THRESH;
 	ieee802154_lowpan->frags.timeout = IPV6_FRAG_TIMEOUT;
+	ieee802154_lowpan->frags.f = &lowpan_frags;
 
 	res = inet_frags_init_net(&ieee802154_lowpan->frags);
 	if (res < 0)
 		return res;
 	res = lowpan_frags_ns_sysctl_register(net);
 	if (res < 0)
-		inet_frags_exit_net(&ieee802154_lowpan->frags, &lowpan_frags);
+		inet_frags_exit_net(&ieee802154_lowpan->frags);
 	return res;
 }
 
@@ -601,7 +602,7 @@ static void __net_exit lowpan_frags_exit
 		net_ieee802154_lowpan(net);
 
 	lowpan_frags_ns_sysctl_unregister(net);
-	inet_frags_exit_net(&ieee802154_lowpan->frags, &lowpan_frags);
+	inet_frags_exit_net(&ieee802154_lowpan->frags);
 }
 
 static struct pernet_operations lowpan_frags_ops = {
--- a/net/ipv4/inet_fragment.c
+++ b/net/ipv4/inet_fragment.c
@@ -219,8 +219,9 @@ void inet_frags_fini(struct inet_frags *
 }
 EXPORT_SYMBOL(inet_frags_fini);
 
-void inet_frags_exit_net(struct netns_frags *nf, struct inet_frags *f)
+void inet_frags_exit_net(struct netns_frags *nf)
 {
+	struct inet_frags *f =nf->f;
 	unsigned int seq;
 	int i;
 
@@ -264,33 +265,34 @@ __acquires(hb->chain_lock)
 	return hb;
 }
 
-static inline void fq_unlink(struct inet_frag_queue *fq, struct inet_frags *f)
+static inline void fq_unlink(struct inet_frag_queue *fq)
 {
 	struct inet_frag_bucket *hb;
 
-	hb = get_frag_bucket_locked(fq, f);
+	hb = get_frag_bucket_locked(fq, fq->net->f);
 	hlist_del(&fq->list);
 	fq->flags |= INET_FRAG_COMPLETE;
 	spin_unlock(&hb->chain_lock);
 }
 
-void inet_frag_kill(struct inet_frag_queue *fq, struct inet_frags *f)
+void inet_frag_kill(struct inet_frag_queue *fq)
 {
 	if (del_timer(&fq->timer))
 		atomic_dec(&fq->refcnt);
 
 	if (!(fq->flags & INET_FRAG_COMPLETE)) {
-		fq_unlink(fq, f);
+		fq_unlink(fq);
 		atomic_dec(&fq->refcnt);
 	}
 }
 EXPORT_SYMBOL(inet_frag_kill);
 
-void inet_frag_destroy(struct inet_frag_queue *q, struct inet_frags *f)
+void inet_frag_destroy(struct inet_frag_queue *q)
 {
 	struct sk_buff *fp;
 	struct netns_frags *nf;
 	unsigned int sum, sum_truesize = 0;
+	struct inet_frags *f;
 
 	WARN_ON(!(q->flags & INET_FRAG_COMPLETE));
 	WARN_ON(del_timer(&q->timer) != 0);
@@ -298,6 +300,7 @@ void inet_frag_destroy(struct inet_frag_
 	/* Release all fragment data. */
 	fp = q->fragments;
 	nf = q->net;
+	f = nf->f;
 	while (fp) {
 		struct sk_buff *xp = fp->next;
 
@@ -333,7 +336,7 @@ static struct inet_frag_queue *inet_frag
 			atomic_inc(&qp->refcnt);
 			spin_unlock(&hb->chain_lock);
 			qp_in->flags |= INET_FRAG_COMPLETE;
-			inet_frag_put(qp_in, f);
+			inet_frag_put(qp_in);
 			return qp;
 		}
 	}
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -167,7 +167,7 @@ static void ip4_frag_free(struct inet_fr
 
 static void ipq_put(struct ipq *ipq)
 {
-	inet_frag_put(&ipq->q, &ip4_frags);
+	inet_frag_put(&ipq->q);
 }
 
 /* Kill ipq entry. It is not destroyed immediately,
@@ -175,7 +175,7 @@ static void ipq_put(struct ipq *ipq)
  */
 static void ipq_kill(struct ipq *ipq)
 {
-	inet_frag_kill(&ipq->q, &ip4_frags);
+	inet_frag_kill(&ipq->q);
 }
 
 static bool frag_expire_skip_icmp(u32 user)
@@ -875,20 +875,21 @@ static int __net_init ipv4_frags_init_ne
 	net->ipv4.frags.timeout = IP_FRAG_TIME;
 
 	net->ipv4.frags.max_dist = 64;
+	net->ipv4.frags.f = &ip4_frags;
 
 	res = inet_frags_init_net(&net->ipv4.frags);
 	if (res < 0)
 		return res;
 	res = ip4_frags_ns_ctl_register(net);
 	if (res < 0)
-		inet_frags_exit_net(&net->ipv4.frags, &ip4_frags);
+		inet_frags_exit_net(&net->ipv4.frags);
 	return res;
 }
 
 static void __net_exit ipv4_frags_exit_net(struct net *net)
 {
 	ip4_frags_ns_ctl_unregister(net);
-	inet_frags_exit_net(&net->ipv4.frags, &ip4_frags);
+	inet_frags_exit_net(&net->ipv4.frags);
 }
 
 static struct pernet_operations ip4_frags_ops = {
--- a/net/ipv6/netfilter/nf_conntrack_reasm.c
+++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
@@ -177,7 +177,7 @@ static void nf_ct_frag6_expire(unsigned
 	fq = container_of((struct inet_frag_queue *)data, struct frag_queue, q);
 	net = container_of(fq->q.net, struct net, nf_frag.frags);
 
-	ip6_expire_frag_queue(net, fq, &nf_frags);
+	ip6_expire_frag_queue(net, fq);
 }
 
 /* Creation primitives. */
@@ -263,7 +263,7 @@ static int nf_ct_frag6_queue(struct frag
 			 * this case. -DaveM
 			 */
 			pr_debug("end of fragment not rounded to 8 bytes.\n");
-			inet_frag_kill(&fq->q, &nf_frags);
+			inet_frag_kill(&fq->q);
 			return -EPROTO;
 		}
 		if (end > fq->q.len) {
@@ -356,7 +356,7 @@ found:
 	return 0;
 
 discard_fq:
-	inet_frag_kill(&fq->q, &nf_frags);
+	inet_frag_kill(&fq->q);
 err:
 	return -EINVAL;
 }
@@ -378,7 +378,7 @@ nf_ct_frag6_reasm(struct frag_queue *fq,
 	int    payload_len;
 	u8 ecn;
 
-	inet_frag_kill(&fq->q, &nf_frags);
+	inet_frag_kill(&fq->q);
 
 	WARN_ON(head == NULL);
 	WARN_ON(NFCT_FRAG6_CB(head)->offset != 0);
@@ -623,7 +623,7 @@ int nf_ct_frag6_gather(struct net *net,
 
 out_unlock:
 	spin_unlock_bh(&fq->q.lock);
-	inet_frag_put(&fq->q, &nf_frags);
+	inet_frag_put(&fq->q);
 	return ret;
 }
 EXPORT_SYMBOL_GPL(nf_ct_frag6_gather);
@@ -635,19 +635,21 @@ static int nf_ct_net_init(struct net *ne
 	net->nf_frag.frags.high_thresh = IPV6_FRAG_HIGH_THRESH;
 	net->nf_frag.frags.low_thresh = IPV6_FRAG_LOW_THRESH;
 	net->nf_frag.frags.timeout = IPV6_FRAG_TIMEOUT;
+	net->nf_frag.frags.f = &nf_frags;
+
 	res = inet_frags_init_net(&net->nf_frag.frags);
 	if (res < 0)
 		return res;
 	res = nf_ct_frag6_sysctl_register(net);
 	if (res < 0)
-		inet_frags_exit_net(&net->nf_frag.frags, &nf_frags);
+		inet_frags_exit_net(&net->nf_frag.frags);
 	return res;
 }
 
 static void nf_ct_net_exit(struct net *net)
 {
 	nf_ct_frags6_sysctl_unregister(net);
-	inet_frags_exit_net(&net->nf_frag.frags, &nf_frags);
+	inet_frags_exit_net(&net->nf_frag.frags);
 }
 
 static struct pernet_operations nf_ct_net_ops = {
--- a/net/ipv6/reassembly.c
+++ b/net/ipv6/reassembly.c
@@ -128,8 +128,7 @@ void ip6_frag_init(struct inet_frag_queu
 }
 EXPORT_SYMBOL(ip6_frag_init);
 
-void ip6_expire_frag_queue(struct net *net, struct frag_queue *fq,
-			   struct inet_frags *frags)
+void ip6_expire_frag_queue(struct net *net, struct frag_queue *fq)
 {
 	struct net_device *dev = NULL;
 
@@ -138,7 +137,7 @@ void ip6_expire_frag_queue(struct net *n
 	if (fq->q.flags & INET_FRAG_COMPLETE)
 		goto out;
 
-	inet_frag_kill(&fq->q, frags);
+	inet_frag_kill(&fq->q);
 
 	rcu_read_lock();
 	dev = dev_get_by_index_rcu(net, fq->iif);
@@ -166,7 +165,7 @@ out_rcu_unlock:
 	rcu_read_unlock();
 out:
 	spin_unlock(&fq->q.lock);
-	inet_frag_put(&fq->q, frags);
+	inet_frag_put(&fq->q);
 }
 EXPORT_SYMBOL(ip6_expire_frag_queue);
 
@@ -178,7 +177,7 @@ static void ip6_frag_expire(unsigned lon
 	fq = container_of((struct inet_frag_queue *)data, struct frag_queue, q);
 	net = container_of(fq->q.net, struct net, ipv6.frags);
 
-	ip6_expire_frag_queue(net, fq, &ip6_frags);
+	ip6_expire_frag_queue(net, fq);
 }
 
 static struct frag_queue *
@@ -359,7 +358,7 @@ found:
 	return -1;
 
 discard_fq:
-	inet_frag_kill(&fq->q, &ip6_frags);
+	inet_frag_kill(&fq->q);
 err:
 	__IP6_INC_STATS(net, ip6_dst_idev(skb_dst(skb)),
 			IPSTATS_MIB_REASMFAILS);
@@ -386,7 +385,7 @@ static int ip6_frag_reasm(struct frag_qu
 	int sum_truesize;
 	u8 ecn;
 
-	inet_frag_kill(&fq->q, &ip6_frags);
+	inet_frag_kill(&fq->q);
 
 	ecn = ip_frag_ecn_table[fq->ecn];
 	if (unlikely(ecn == 0xff))
@@ -563,7 +562,7 @@ static int ipv6_frag_rcv(struct sk_buff
 		ret = ip6_frag_queue(fq, skb, fhdr, IP6CB(skb)->nhoff);
 
 		spin_unlock(&fq->q.lock);
-		inet_frag_put(&fq->q, &ip6_frags);
+		inet_frag_put(&fq->q);
 		return ret;
 	}
 
@@ -714,6 +713,7 @@ static int __net_init ipv6_frags_init_ne
 	net->ipv6.frags.high_thresh = IPV6_FRAG_HIGH_THRESH;
 	net->ipv6.frags.low_thresh = IPV6_FRAG_LOW_THRESH;
 	net->ipv6.frags.timeout = IPV6_FRAG_TIMEOUT;
+	net->ipv6.frags.f = &ip6_frags;
 
 	res = inet_frags_init_net(&net->ipv6.frags);
 	if (res < 0)
@@ -721,14 +721,14 @@ static int __net_init ipv6_frags_init_ne
 
 	res = ip6_frags_ns_sysctl_register(net);
 	if (res < 0)
-		inet_frags_exit_net(&net->ipv6.frags, &ip6_frags);
+		inet_frags_exit_net(&net->ipv6.frags);
 	return res;
 }
 
 static void __net_exit ipv6_frags_exit_net(struct net *net)
 {
 	ip6_frags_ns_sysctl_unregister(net);
-	inet_frags_exit_net(&net->ipv6.frags, &ip6_frags);
+	inet_frags_exit_net(&net->ipv6.frags);
 }
 
 static struct pernet_operations ip6_frags_ops = {




* [PATCH 4.9 45/71] inet: frags: refactor ipfrag_init()
From: Greg Kroah-Hartman @ 2018-10-16 17:09 UTC (permalink / raw)
  To: linux-kernel, netdev
  Cc: Greg Kroah-Hartman, stable, Eric Dumazet, David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <edumazet@google.com>

We need to call inet_frags_init() before register_pernet_subsys(),
as a prereq for the following patch ("inet: frags: use rhashtables for reassembly units").

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 483a6e4fa055123142d8956866fe2aa9c98d546d)
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/ipv4/ip_fragment.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -899,8 +899,6 @@ static struct pernet_operations ip4_frag
 
 void __init ipfrag_init(void)
 {
-	ip4_frags_ctl_register();
-	register_pernet_subsys(&ip4_frags_ops);
 	ip4_frags.hashfn = ip4_hashfn;
 	ip4_frags.constructor = ip4_frag_init;
 	ip4_frags.destructor = ip4_frag_free;
@@ -910,4 +908,6 @@ void __init ipfrag_init(void)
 	ip4_frags.frags_cache_name = ip_frag_cache_name;
 	if (inet_frags_init(&ip4_frags))
 		panic("IP: failed to allocate ip4_frags cache\n");
+	ip4_frags_ctl_register();
+	register_pernet_subsys(&ip4_frags_ops);
 }




* [PATCH 4.9 46/71] inet: frags: refactor ipv6_frag_init()
From: Greg Kroah-Hartman @ 2018-10-16 17:09 UTC (permalink / raw)
  To: linux-kernel, netdev
  Cc: Greg Kroah-Hartman, stable, Eric Dumazet, David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <edumazet@google.com>

We want to call inet_frags_init() earlier.

This is a prereq to "inet: frags: use rhashtables for reassembly units"

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 5b975bab23615cd0fdf67af6c9298eb01c4b9f61)
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
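
A userspace sketch of the goto ladder this patch ends up with: steps run
in a fixed order, and each error label undoes only the steps that
already succeeded, in reverse. The ex_* helpers, the bitmask and the
failure knob are hypothetical and only model the control flow.

```c
#include <assert.h>

enum {
	STEP_FRAGS    = 1,  /* models inet_frags_init() */
	STEP_PROTOCOL = 2,  /* models inet6_add_protocol() */
	STEP_SYSCTL   = 4,  /* models ip6_frags_sysctl_register() */
	STEP_PERNET   = 8   /* models register_pernet_subsys() */
};

static unsigned int ex_state;   /* bitmask of steps currently set up */
static unsigned int ex_fail_at; /* test knob: step that should fail */

static int ex_do(unsigned int step)
{
	if (ex_fail_at == step)
		return -1;
	ex_state |= step;
	return 0;
}

static void ex_undo(unsigned int step)
{
	ex_state &= ~step;
}

static int ex_frag_init(void)
{
	int ret;

	ret = ex_do(STEP_FRAGS);
	if (ret)
		goto out;
	ret = ex_do(STEP_PROTOCOL);
	if (ret)
		goto err_protocol;
	ret = ex_do(STEP_SYSCTL);
	if (ret)
		goto err_sysctl;
	ret = ex_do(STEP_PERNET);
	if (ret)
		goto err_pernet;
out:
	return ret;

err_pernet:
	ex_undo(STEP_SYSCTL);
err_sysctl:
	ex_undo(STEP_PROTOCOL);
err_protocol:
	ex_undo(STEP_FRAGS);
	goto out;
}
```

Whichever step fails, ex_state returns to zero, which is the property
the new err_protocol label preserves once inet_frags_init() runs first.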
 net/ipv6/reassembly.c |   25 ++++++++++++++-----------
 1 file changed, 14 insertions(+), 11 deletions(-)

--- a/net/ipv6/reassembly.c
+++ b/net/ipv6/reassembly.c
@@ -740,10 +740,21 @@ int __init ipv6_frag_init(void)
 {
 	int ret;
 
-	ret = inet6_add_protocol(&frag_protocol, IPPROTO_FRAGMENT);
+	ip6_frags.hashfn = ip6_hashfn;
+	ip6_frags.constructor = ip6_frag_init;
+	ip6_frags.destructor = NULL;
+	ip6_frags.qsize = sizeof(struct frag_queue);
+	ip6_frags.match = ip6_frag_match;
+	ip6_frags.frag_expire = ip6_frag_expire;
+	ip6_frags.frags_cache_name = ip6_frag_cache_name;
+	ret = inet_frags_init(&ip6_frags);
 	if (ret)
 		goto out;
 
+	ret = inet6_add_protocol(&frag_protocol, IPPROTO_FRAGMENT);
+	if (ret)
+		goto err_protocol;
+
 	ret = ip6_frags_sysctl_register();
 	if (ret)
 		goto err_sysctl;
@@ -752,16 +763,6 @@ int __init ipv6_frag_init(void)
 	if (ret)
 		goto err_pernet;
 
-	ip6_frags.hashfn = ip6_hashfn;
-	ip6_frags.constructor = ip6_frag_init;
-	ip6_frags.destructor = NULL;
-	ip6_frags.qsize = sizeof(struct frag_queue);
-	ip6_frags.match = ip6_frag_match;
-	ip6_frags.frag_expire = ip6_frag_expire;
-	ip6_frags.frags_cache_name = ip6_frag_cache_name;
-	ret = inet_frags_init(&ip6_frags);
-	if (ret)
-		goto err_pernet;
 out:
 	return ret;
 
@@ -769,6 +770,8 @@ err_pernet:
 	ip6_frags_sysctl_unregister();
 err_sysctl:
 	inet6_del_protocol(&frag_protocol, IPPROTO_FRAGMENT);
+err_protocol:
+	inet_frags_fini(&ip6_frags);
 	goto out;
 }
 




* [PATCH 4.9 47/71] inet: frags: refactor lowpan_net_frag_init()
From: Greg Kroah-Hartman @ 2018-10-16 17:09 UTC (permalink / raw)
  To: linux-kernel, netdev
  Cc: Greg Kroah-Hartman, stable, Eric Dumazet, David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <edumazet@google.com>

We want to call inet_frags_init() earlier in lowpan_net_frag_init(),
similar to commit "inet: frags: refactor ipv6_frag_init()".

This is a prereq to "inet: frags: use rhashtables for reassembly units"

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 807f1844df4ac23594268fa9f41902d0549e92aa)
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/ieee802154/6lowpan/reassembly.c |   20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)

--- a/net/ieee802154/6lowpan/reassembly.c
+++ b/net/ieee802154/6lowpan/reassembly.c
@@ -614,14 +614,6 @@ int __init lowpan_net_frag_init(void)
 {
 	int ret;
 
-	ret = lowpan_frags_sysctl_register();
-	if (ret)
-		return ret;
-
-	ret = register_pernet_subsys(&lowpan_frags_ops);
-	if (ret)
-		goto err_pernet;
-
 	lowpan_frags.hashfn = lowpan_hashfn;
 	lowpan_frags.constructor = lowpan_frag_init;
 	lowpan_frags.destructor = NULL;
@@ -631,11 +623,21 @@ int __init lowpan_net_frag_init(void)
 	lowpan_frags.frags_cache_name = lowpan_frags_cache_name;
 	ret = inet_frags_init(&lowpan_frags);
 	if (ret)
-		goto err_pernet;
+		goto out;
+
+	ret = lowpan_frags_sysctl_register();
+	if (ret)
+		goto err_sysctl;
 
+	ret = register_pernet_subsys(&lowpan_frags_ops);
+	if (ret)
+		goto err_pernet;
+out:
 	return ret;
 err_pernet:
 	lowpan_frags_sysctl_unregister();
+err_sysctl:
+	inet_frags_fini(&lowpan_frags);
 	return ret;
 }
 




* [PATCH 4.9 48/71] ipv6: export ip6 fragments sysctl to unprivileged users
From: Greg Kroah-Hartman @ 2018-10-16 17:09 UTC (permalink / raw)
  To: linux-kernel, netdev
  Cc: Greg Kroah-Hartman, stable, Eric Dumazet, Nikolay Borisov,
	David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <edumazet@google.com>

IPv4 was changed in commit 52a773d645e9 ("net: Export ip fragment
sysctl to unprivileged users").

The only sysctl that is not per-netns, ip6frag_secret_interval,
is not used.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 18dcbe12fe9fca0ab825f7eff993060525ac2503)
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/ipv6/reassembly.c |    4 ----
 1 file changed, 4 deletions(-)

--- a/net/ipv6/reassembly.c
+++ b/net/ipv6/reassembly.c
@@ -643,10 +643,6 @@ static int __net_init ip6_frags_ns_sysct
 		table[1].data = &net->ipv6.frags.low_thresh;
 		table[1].extra2 = &net->ipv6.frags.high_thresh;
 		table[2].data = &net->ipv6.frags.timeout;
-
-		/* Don't export sysctls to unprivileged users */
-		if (net->user_ns != &init_user_ns)
-			table[0].procname = NULL;
 	}
 
 	hdr = register_net_sysctl(net, "net/ipv6", table);




* [PATCH 4.9 49/71] rhashtable: add schedule points
From: Greg Kroah-Hartman @ 2018-10-16 17:09 UTC
  To: linux-kernel, netdev
  Cc: Greg Kroah-Hartman, stable, Eric Dumazet, Herbert Xu, David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <edumazet@google.com>

Rehashing and destroying a large hash table takes a lot of time,
and happens in process context. It is safe to add cond_resched()
in rhashtable_rehash_table() and rhashtable_free_and_destroy().
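
As a rough userspace analogue of the schedule points described above (a
sketch, not the kernel code): the loop below visits every bucket and offers
the CPU to other runnable tasks after each one. sched_yield() stands in for
cond_resched() here, and struct demo_bucket and demo_rehash_all() are
hypothetical names introduced only for illustration.

```c
#include <sched.h>
#include <stddef.h>

/* Hypothetical stand-in for one hash bucket awaiting rehash. */
struct demo_bucket {
	int pending;
};

/*
 * Walk all buckets and yield after each one, so a very large table
 * cannot monopolize the CPU. Returns the number of yield points hit.
 * Assumption: sched_yield() is only a userspace analogue of the
 * kernel's cond_resched(); the two are not equivalent.
 */
static size_t demo_rehash_all(struct demo_bucket *tbl, size_t size)
{
	size_t i, yields = 0;

	for (i = 0; i < size; i++) {
		tbl[i].pending = 0;	/* stand-in for rehashing one chain */
		sched_yield();		/* schedule point, once per bucket */
		yields++;
	}
	return yields;
}
```

The yield is placed per bucket rather than per entry, which matches the
granularity the patch uses in both loops it touches.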

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit ae6da1f503abb5a5081f9f6c4a6881de97830f3e)
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 lib/rhashtable.c |    5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -251,8 +251,10 @@ static int rhashtable_rehash_table(struc
 	if (!new_tbl)
 		return 0;
 
-	for (old_hash = 0; old_hash < old_tbl->size; old_hash++)
+	for (old_hash = 0; old_hash < old_tbl->size; old_hash++) {
 		rhashtable_rehash_chain(ht, old_hash);
+		cond_resched();
+	}
 
 	/* Publish the new table pointer. */
 	rcu_assign_pointer(ht->tbl, new_tbl);
@@ -993,6 +995,7 @@ void rhashtable_free_and_destroy(struct
 		for (i = 0; i < tbl->size; i++) {
 			struct rhash_head *pos, *next;
 
+			cond_resched();
 			for (pos = rht_dereference(tbl->buckets[i], ht),
 			     next = !rht_is_a_nulls(pos) ?
 					rht_dereference(pos->next, ht) : NULL;




* [PATCH 4.9 50/71] inet: frags: use rhashtables for reassembly units
From: Greg Kroah-Hartman @ 2018-10-16 17:09 UTC
  To: linux-kernel, netdev
  Cc: Greg Kroah-Hartman, stable, Eric Dumazet, Kirill Tkhai,
	Herbert Xu, Florian Westphal, Jesper Dangaard Brouer,
	Alexander Aring, Stefan Schmidt, David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <edumazet@google.com>

Some applications still rely on IP fragmentation, and to be fair, the
Linux reassembly unit does not hold up under any serious load.

It uses static hash tables of 1024 buckets, and up to 128 items per bucket (!!!)

A work queue is supposed to garbage-collect items when the host is under
memory pressure, and to perform a hash rebuild, changing the seed used in
hash computations.

This work queue blocks softirqs for up to 25 ms when doing a hash rebuild,
which occurs every 5 seconds if the host is under fire.

Then there is the problem of sharing this hash table for all netns.

It is time to switch to rhashtables, and allocate one of them per netns
to speed up netns dismantle, since this is a critical metric these days.
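
With rhashtables, the patch identifies each reassembly queue by a fixed-size
key that is hashed as an array of 32-bit words and compared with memcmp().
A minimal userspace sketch of that idea follows. The layout of
struct demo_v4_key mirrors frag_v4_compare_key from the diff below;
demo_hash_words() is a simple FNV-style mix chosen for brevity and is not
the kernel's jhash2(); all demo_* names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same field order and widths as frag_v4_compare_key in the patch. */
struct demo_v4_key {
	uint32_t saddr;
	uint32_t daddr;
	uint32_t user;
	uint32_t vif;
	uint16_t id;
	uint16_t protocol;
};

/* Word-wise hashing and memcmp() are only sound if there is no padding. */
_Static_assert(sizeof(struct demo_v4_key) == 20, "key must have no padding");

/* Illustrative mix over 32-bit words (assumption: NOT jhash2). */
static uint32_t demo_hash_words(const void *data, size_t nwords, uint32_t seed)
{
	const unsigned char *p = data;
	uint32_t h = seed ^ 2166136261u;
	size_t i;

	for (i = 0; i < nwords; i++) {
		uint32_t w;

		memcpy(&w, p + 4 * i, sizeof(w));	/* alignment-safe load */
		h = (h ^ w) * 16777619u;
	}
	return h;
}

static uint32_t demo_key_hash(const struct demo_v4_key *k, uint32_t seed)
{
	return demo_hash_words(k, sizeof(*k) / sizeof(uint32_t), seed);
}

/* Returns 0 on match and 1 otherwise, the convention obj_cmpfn uses. */
static int demo_key_cmp(const struct demo_v4_key *a,
			const struct demo_v4_key *b)
{
	return !!memcmp(a, b, sizeof(*a));
}
```

Widening protocol to 16 bits in the key keeps the struct at exactly five
32-bit words, so every byte that is hashed or compared belongs to a named
field.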

Lookup now uses RCU. A followup patch will even remove
the refcount hold/release left from the prior implementation and save
a couple of atomic operations.
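
Until that followup lands, the lookup path in this patch still takes a
reference, but only if the count has not already dropped to zero
(atomic_inc_not_zero() in inet_frag_find()). The sketch below shows the same
"increment unless zero" pattern with C11 atomics in userspace.
demo_ref_inc_not_zero() is a hypothetical helper, not the kernel
implementation.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Take a reference only if the object is still live (count != 0).
 * Returns true if the count was incremented, false if it was zero.
 */
static bool demo_ref_inc_not_zero(atomic_int *ref)
{
	int old = atomic_load(ref);

	while (old != 0) {
		/* On failure, 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(ref, &old, old + 1))
			return true;
	}
	return false;
}
```

A lookup that finds an entry whose count is already zero treats it as
absent, since the entry is on its way to being freed after an RCU grace
period.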

Before this patch, a 16-CPU host (NIC with 16 RX queues) could not handle
more than 1 Mpps of fragment DDoS traffic.

After the patch, I reach 9 Mpps without any tuning, and can use up to 2GB
of storage for the fragments (the exact number depends on frags being
evicted after timeout):

$ grep FRAG /proc/net/sockstat
FRAG: inuse 1966916 memory 2140004608
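
As a quick sanity check on the sample above, dividing the reported memory by
the number of queues in use gives the average accounted cost per queue. The
helper below is a hypothetical illustration; reading the result as "one
queue structure plus one buffered fragment" is an assumption, not something
the output states.

```c
#include <stdint.h>

/* Average accounted bytes per in-use fragment queue (0 if none in use). */
static uint64_t demo_bytes_per_queue(uint64_t memory, uint64_t inuse)
{
	return inuse ? memory / inuse : 0;
}
```

For the figures shown, demo_bytes_per_queue(2140004608, 1966916) is 1088,
and the division is exact.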

A followup patch will change the limits for 64bit arches.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Florian Westphal <fw@strlen.de>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Alexander Aring <alex.aring@gmail.com>
Cc: Stefan Schmidt <stefan@osg.samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 648700f76b03b7e8149d13cc2bdb3355035258a9)
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 Documentation/networking/ip-sysctl.txt  |    7 
 include/net/inet_frag.h                 |   81 +++----
 include/net/ipv6.h                      |   16 -
 net/ieee802154/6lowpan/6lowpan_i.h      |   26 --
 net/ieee802154/6lowpan/reassembly.c     |   91 +++-----
 net/ipv4/inet_fragment.c                |  349 ++++++--------------------------
 net/ipv4/ip_fragment.c                  |  112 ++++------
 net/ipv6/netfilter/nf_conntrack_reasm.c |   51 +---
 net/ipv6/reassembly.c                   |  110 ++++------
 9 files changed, 267 insertions(+), 576 deletions(-)

--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -123,13 +123,10 @@ min_adv_mss - INTEGER
 IP Fragmentation:
 
 ipfrag_high_thresh - INTEGER
-	Maximum memory used to reassemble IP fragments. When
-	ipfrag_high_thresh bytes of memory is allocated for this purpose,
-	the fragment handler will toss packets until ipfrag_low_thresh
-	is reached. This also serves as a maximum limit to namespaces
-	different from the initial one.
+	Maximum memory used to reassemble IP fragments.
 
 ipfrag_low_thresh - INTEGER
+	(Obsolete since linux-4.17)
 	Maximum memory used to reassemble IP fragments before the kernel
 	begins to remove incomplete fragment queues to free up resources.
 	The kernel still accepts new fragments for defragmentation.
--- a/include/net/inet_frag.h
+++ b/include/net/inet_frag.h
@@ -1,7 +1,11 @@
 #ifndef __NET_FRAG_H__
 #define __NET_FRAG_H__
 
+#include <linux/rhashtable.h>
+
 struct netns_frags {
+	struct rhashtable       rhashtable ____cacheline_aligned_in_smp;
+
 	/* Keep atomic mem on separate cachelines in structs that include it */
 	atomic_t		mem ____cacheline_aligned_in_smp;
 	/* sysctls */
@@ -25,12 +29,30 @@ enum {
 	INET_FRAG_COMPLETE	= BIT(2),
 };
 
+struct frag_v4_compare_key {
+	__be32		saddr;
+	__be32		daddr;
+	u32		user;
+	u32		vif;
+	__be16		id;
+	u16		protocol;
+};
+
+struct frag_v6_compare_key {
+	struct in6_addr	saddr;
+	struct in6_addr	daddr;
+	u32		user;
+	__be32		id;
+	u32		iif;
+};
+
 /**
  * struct inet_frag_queue - fragment queue
  *
- * @lock: spinlock protecting the queue
+ * @node: rhash node
+ * @key: keys identifying this frag.
  * @timer: queue expiration timer
- * @list: hash bucket list
+ * @lock: spinlock protecting this frag
  * @refcnt: reference count of the queue
  * @fragments: received fragments head
  * @fragments_tail: received fragments tail
@@ -40,12 +62,16 @@ enum {
  * @flags: fragment queue flags
  * @max_size: maximum received fragment size
  * @net: namespace that this frag belongs to
- * @list_evictor: list of queues to forcefully evict (e.g. due to low memory)
+ * @rcu: rcu head for freeing deferall
  */
 struct inet_frag_queue {
-	spinlock_t		lock;
+	struct rhash_head	node;
+	union {
+		struct frag_v4_compare_key v4;
+		struct frag_v6_compare_key v6;
+	} key;
 	struct timer_list	timer;
-	struct hlist_node	list;
+	spinlock_t		lock;
 	atomic_t		refcnt;
 	struct sk_buff		*fragments;
 	struct sk_buff		*fragments_tail;
@@ -54,51 +80,20 @@ struct inet_frag_queue {
 	int			meat;
 	__u8			flags;
 	u16			max_size;
-	struct netns_frags	*net;
-	struct hlist_node	list_evictor;
-};
-
-#define INETFRAGS_HASHSZ	1024
-
-/* averaged:
- * max_depth = default ipfrag_high_thresh / INETFRAGS_HASHSZ /
- *	       rounded up (SKB_TRUELEN(0) + sizeof(struct ipq or
- *	       struct frag_queue))
- */
-#define INETFRAGS_MAXDEPTH	128
-
-struct inet_frag_bucket {
-	struct hlist_head	chain;
-	spinlock_t		chain_lock;
+	struct netns_frags      *net;
+	struct rcu_head		rcu;
 };
 
 struct inet_frags {
-	struct inet_frag_bucket	hash[INETFRAGS_HASHSZ];
-
-	struct work_struct	frags_work;
-	unsigned int next_bucket;
-	unsigned long last_rebuild_jiffies;
-	bool rebuild;
-
-	/* The first call to hashfn is responsible to initialize
-	 * rnd. This is best done with net_get_random_once.
-	 *
-	 * rnd_seqlock is used to let hash insertion detect
-	 * when it needs to re-lookup the hash chain to use.
-	 */
-	u32			rnd;
-	seqlock_t		rnd_seqlock;
 	int			qsize;
 
-	unsigned int		(*hashfn)(const struct inet_frag_queue *);
-	bool			(*match)(const struct inet_frag_queue *q,
-					 const void *arg);
 	void			(*constructor)(struct inet_frag_queue *q,
 					       const void *arg);
 	void			(*destructor)(struct inet_frag_queue *);
 	void			(*frag_expire)(unsigned long data);
 	struct kmem_cache	*frags_cachep;
 	const char		*frags_cache_name;
+	struct rhashtable_params rhash_params;
 };
 
 int inet_frags_init(struct inet_frags *);
@@ -107,15 +102,13 @@ void inet_frags_fini(struct inet_frags *
 static inline int inet_frags_init_net(struct netns_frags *nf)
 {
 	atomic_set(&nf->mem, 0);
-	return 0;
+	return rhashtable_init(&nf->rhashtable, &nf->f->rhash_params);
 }
 void inet_frags_exit_net(struct netns_frags *nf);
 
 void inet_frag_kill(struct inet_frag_queue *q);
 void inet_frag_destroy(struct inet_frag_queue *q);
-struct inet_frag_queue *inet_frag_find(struct netns_frags *nf,
-		struct inet_frags *f, void *key, unsigned int hash);
-
+struct inet_frag_queue *inet_frag_find(struct netns_frags *nf, void *key);
 void inet_frag_maybe_warn_overflow(struct inet_frag_queue *q,
 				   const char *prefix);
 
@@ -127,7 +120,7 @@ static inline void inet_frag_put(struct
 
 static inline bool inet_frag_evicting(struct inet_frag_queue *q)
 {
-	return !hlist_unhashed(&q->list_evictor);
+	return false;
 }
 
 /* Memory Tracking Functions. */
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -530,17 +530,8 @@ enum ip6_defrag_users {
 	__IP6_DEFRAG_CONNTRACK_BRIDGE_IN = IP6_DEFRAG_CONNTRACK_BRIDGE_IN + USHRT_MAX,
 };
 
-struct ip6_create_arg {
-	__be32 id;
-	u32 user;
-	const struct in6_addr *src;
-	const struct in6_addr *dst;
-	int iif;
-	u8 ecn;
-};
-
 void ip6_frag_init(struct inet_frag_queue *q, const void *a);
-bool ip6_frag_match(const struct inet_frag_queue *q, const void *a);
+extern const struct rhashtable_params ip6_rhash_params;
 
 /*
  *	Equivalent of ipv4 struct ip
@@ -548,11 +539,6 @@ bool ip6_frag_match(const struct inet_fr
 struct frag_queue {
 	struct inet_frag_queue	q;
 
-	__be32			id;		/* fragment id		*/
-	u32			user;
-	struct in6_addr		saddr;
-	struct in6_addr		daddr;
-
 	int			iif;
 	unsigned int		csum;
 	__u16			nhoffset;
--- a/net/ieee802154/6lowpan/6lowpan_i.h
+++ b/net/ieee802154/6lowpan/6lowpan_i.h
@@ -16,37 +16,19 @@ typedef unsigned __bitwise__ lowpan_rx_r
 #define LOWPAN_DISPATCH_FRAG1           0xc0
 #define LOWPAN_DISPATCH_FRAGN           0xe0
 
-struct lowpan_create_arg {
+struct frag_lowpan_compare_key {
 	u16 tag;
 	u16 d_size;
-	const struct ieee802154_addr *src;
-	const struct ieee802154_addr *dst;
+	const struct ieee802154_addr src;
+	const struct ieee802154_addr dst;
 };
 
-/* Equivalent of ipv4 struct ip
+/* Equivalent of ipv4 struct ipq
  */
 struct lowpan_frag_queue {
 	struct inet_frag_queue	q;
-
-	u16			tag;
-	u16			d_size;
-	struct ieee802154_addr	saddr;
-	struct ieee802154_addr	daddr;
 };
 
-static inline u32 ieee802154_addr_hash(const struct ieee802154_addr *a)
-{
-	switch (a->mode) {
-	case IEEE802154_ADDR_LONG:
-		return (((__force u64)a->extended_addr) >> 32) ^
-			(((__force u64)a->extended_addr) & 0xffffffff);
-	case IEEE802154_ADDR_SHORT:
-		return (__force u32)(a->short_addr + (a->pan_id << 16));
-	default:
-		return 0;
-	}
-}
-
 int lowpan_frag_rcv(struct sk_buff *skb, const u8 frag_type);
 void lowpan_net_frag_exit(void);
 int lowpan_net_frag_init(void);
--- a/net/ieee802154/6lowpan/reassembly.c
+++ b/net/ieee802154/6lowpan/reassembly.c
@@ -37,47 +37,15 @@ static struct inet_frags lowpan_frags;
 static int lowpan_frag_reasm(struct lowpan_frag_queue *fq,
 			     struct sk_buff *prev, struct net_device *ldev);
 
-static unsigned int lowpan_hash_frag(u16 tag, u16 d_size,
-				     const struct ieee802154_addr *saddr,
-				     const struct ieee802154_addr *daddr)
-{
-	net_get_random_once(&lowpan_frags.rnd, sizeof(lowpan_frags.rnd));
-	return jhash_3words(ieee802154_addr_hash(saddr),
-			    ieee802154_addr_hash(daddr),
-			    (__force u32)(tag + (d_size << 16)),
-			    lowpan_frags.rnd);
-}
-
-static unsigned int lowpan_hashfn(const struct inet_frag_queue *q)
-{
-	const struct lowpan_frag_queue *fq;
-
-	fq = container_of(q, struct lowpan_frag_queue, q);
-	return lowpan_hash_frag(fq->tag, fq->d_size, &fq->saddr, &fq->daddr);
-}
-
-static bool lowpan_frag_match(const struct inet_frag_queue *q, const void *a)
-{
-	const struct lowpan_frag_queue *fq;
-	const struct lowpan_create_arg *arg = a;
-
-	fq = container_of(q, struct lowpan_frag_queue, q);
-	return	fq->tag == arg->tag && fq->d_size == arg->d_size &&
-		ieee802154_addr_equal(&fq->saddr, arg->src) &&
-		ieee802154_addr_equal(&fq->daddr, arg->dst);
-}
-
 static void lowpan_frag_init(struct inet_frag_queue *q, const void *a)
 {
-	const struct lowpan_create_arg *arg = a;
+	const struct frag_lowpan_compare_key *key = a;
 	struct lowpan_frag_queue *fq;
 
 	fq = container_of(q, struct lowpan_frag_queue, q);
 
-	fq->tag = arg->tag;
-	fq->d_size = arg->d_size;
-	fq->saddr = *arg->src;
-	fq->daddr = *arg->dst;
+	BUILD_BUG_ON(sizeof(*key) > sizeof(q->key));
+	memcpy(&q->key, key, sizeof(*key));
 }
 
 static void lowpan_frag_expire(unsigned long data)
@@ -104,21 +72,17 @@ fq_find(struct net *net, const struct lo
 	const struct ieee802154_addr *src,
 	const struct ieee802154_addr *dst)
 {
-	struct inet_frag_queue *q;
-	struct lowpan_create_arg arg;
-	unsigned int hash;
 	struct netns_ieee802154_lowpan *ieee802154_lowpan =
 		net_ieee802154_lowpan(net);
+	struct frag_lowpan_compare_key key = {
+		.tag = cb->d_tag,
+		.d_size = cb->d_size,
+		.src = *src,
+		.dst = *dst,
+	};
+	struct inet_frag_queue *q;
 
-	arg.tag = cb->d_tag;
-	arg.d_size = cb->d_size;
-	arg.src = src;
-	arg.dst = dst;
-
-	hash = lowpan_hash_frag(cb->d_tag, cb->d_size, src, dst);
-
-	q = inet_frag_find(&ieee802154_lowpan->frags,
-			   &lowpan_frags, &arg, hash);
+	q = inet_frag_find(&ieee802154_lowpan->frags, &key);
 	if (IS_ERR_OR_NULL(q)) {
 		inet_frag_maybe_warn_overflow(q, pr_fmt());
 		return NULL;
@@ -610,17 +574,46 @@ static struct pernet_operations lowpan_f
 	.exit = lowpan_frags_exit_net,
 };
 
+static u32 lowpan_key_hashfn(const void *data, u32 len, u32 seed)
+{
+	return jhash2(data,
+		      sizeof(struct frag_lowpan_compare_key) / sizeof(u32), seed);
+}
+
+static u32 lowpan_obj_hashfn(const void *data, u32 len, u32 seed)
+{
+	const struct inet_frag_queue *fq = data;
+
+	return jhash2((const u32 *)&fq->key,
+		      sizeof(struct frag_lowpan_compare_key) / sizeof(u32), seed);
+}
+
+static int lowpan_obj_cmpfn(struct rhashtable_compare_arg *arg, const void *ptr)
+{
+	const struct frag_lowpan_compare_key *key = arg->key;
+	const struct inet_frag_queue *fq = ptr;
+
+	return !!memcmp(&fq->key, key, sizeof(*key));
+}
+
+static const struct rhashtable_params lowpan_rhash_params = {
+	.head_offset		= offsetof(struct inet_frag_queue, node),
+	.hashfn			= lowpan_key_hashfn,
+	.obj_hashfn		= lowpan_obj_hashfn,
+	.obj_cmpfn		= lowpan_obj_cmpfn,
+	.automatic_shrinking	= true,
+};
+
 int __init lowpan_net_frag_init(void)
 {
 	int ret;
 
-	lowpan_frags.hashfn = lowpan_hashfn;
 	lowpan_frags.constructor = lowpan_frag_init;
 	lowpan_frags.destructor = NULL;
 	lowpan_frags.qsize = sizeof(struct frag_queue);
-	lowpan_frags.match = lowpan_frag_match;
 	lowpan_frags.frag_expire = lowpan_frag_expire;
 	lowpan_frags.frags_cache_name = lowpan_frags_cache_name;
+	lowpan_frags.rhash_params = lowpan_rhash_params;
 	ret = inet_frags_init(&lowpan_frags);
 	if (ret)
 		goto out;
--- a/net/ipv4/inet_fragment.c
+++ b/net/ipv4/inet_fragment.c
@@ -25,12 +25,6 @@
 #include <net/inet_frag.h>
 #include <net/inet_ecn.h>
 
-#define INETFRAGS_EVICT_BUCKETS   128
-#define INETFRAGS_EVICT_MAX	  512
-
-/* don't rebuild inetfrag table with new secret more often than this */
-#define INETFRAGS_MIN_REBUILD_INTERVAL (5 * HZ)
-
 /* Given the OR values of all fragments, apply RFC 3168 5.3 requirements
  * Value : 0xff if frame should be dropped.
  *         0 or INET_ECN_CE value, to be ORed in to final iph->tos field
@@ -52,157 +46,8 @@ const u8 ip_frag_ecn_table[16] = {
 };
 EXPORT_SYMBOL(ip_frag_ecn_table);
 
-static unsigned int
-inet_frag_hashfn(const struct inet_frags *f, const struct inet_frag_queue *q)
-{
-	return f->hashfn(q) & (INETFRAGS_HASHSZ - 1);
-}
-
-static bool inet_frag_may_rebuild(struct inet_frags *f)
-{
-	return time_after(jiffies,
-	       f->last_rebuild_jiffies + INETFRAGS_MIN_REBUILD_INTERVAL);
-}
-
-static void inet_frag_secret_rebuild(struct inet_frags *f)
-{
-	int i;
-
-	write_seqlock_bh(&f->rnd_seqlock);
-
-	if (!inet_frag_may_rebuild(f))
-		goto out;
-
-	get_random_bytes(&f->rnd, sizeof(u32));
-
-	for (i = 0; i < INETFRAGS_HASHSZ; i++) {
-		struct inet_frag_bucket *hb;
-		struct inet_frag_queue *q;
-		struct hlist_node *n;
-
-		hb = &f->hash[i];
-		spin_lock(&hb->chain_lock);
-
-		hlist_for_each_entry_safe(q, n, &hb->chain, list) {
-			unsigned int hval = inet_frag_hashfn(f, q);
-
-			if (hval != i) {
-				struct inet_frag_bucket *hb_dest;
-
-				hlist_del(&q->list);
-
-				/* Relink to new hash chain. */
-				hb_dest = &f->hash[hval];
-
-				/* This is the only place where we take
-				 * another chain_lock while already holding
-				 * one.  As this will not run concurrently,
-				 * we cannot deadlock on hb_dest lock below, if its
-				 * already locked it will be released soon since
-				 * other caller cannot be waiting for hb lock
-				 * that we've taken above.
-				 */
-				spin_lock_nested(&hb_dest->chain_lock,
-						 SINGLE_DEPTH_NESTING);
-				hlist_add_head(&q->list, &hb_dest->chain);
-				spin_unlock(&hb_dest->chain_lock);
-			}
-		}
-		spin_unlock(&hb->chain_lock);
-	}
-
-	f->rebuild = false;
-	f->last_rebuild_jiffies = jiffies;
-out:
-	write_sequnlock_bh(&f->rnd_seqlock);
-}
-
-static bool inet_fragq_should_evict(const struct inet_frag_queue *q)
-{
-	if (!hlist_unhashed(&q->list_evictor))
-		return false;
-
-	return q->net->low_thresh == 0 ||
-	       frag_mem_limit(q->net) >= q->net->low_thresh;
-}
-
-static unsigned int
-inet_evict_bucket(struct inet_frags *f, struct inet_frag_bucket *hb)
-{
-	struct inet_frag_queue *fq;
-	struct hlist_node *n;
-	unsigned int evicted = 0;
-	HLIST_HEAD(expired);
-
-	spin_lock(&hb->chain_lock);
-
-	hlist_for_each_entry_safe(fq, n, &hb->chain, list) {
-		if (!inet_fragq_should_evict(fq))
-			continue;
-
-		if (!del_timer(&fq->timer))
-			continue;
-
-		hlist_add_head(&fq->list_evictor, &expired);
-		++evicted;
-	}
-
-	spin_unlock(&hb->chain_lock);
-
-	hlist_for_each_entry_safe(fq, n, &expired, list_evictor)
-		f->frag_expire((unsigned long) fq);
-
-	return evicted;
-}
-
-static void inet_frag_worker(struct work_struct *work)
-{
-	unsigned int budget = INETFRAGS_EVICT_BUCKETS;
-	unsigned int i, evicted = 0;
-	struct inet_frags *f;
-
-	f = container_of(work, struct inet_frags, frags_work);
-
-	BUILD_BUG_ON(INETFRAGS_EVICT_BUCKETS >= INETFRAGS_HASHSZ);
-
-	local_bh_disable();
-
-	for (i = ACCESS_ONCE(f->next_bucket); budget; --budget) {
-		evicted += inet_evict_bucket(f, &f->hash[i]);
-		i = (i + 1) & (INETFRAGS_HASHSZ - 1);
-		if (evicted > INETFRAGS_EVICT_MAX)
-			break;
-	}
-
-	f->next_bucket = i;
-
-	local_bh_enable();
-
-	if (f->rebuild && inet_frag_may_rebuild(f))
-		inet_frag_secret_rebuild(f);
-}
-
-static void inet_frag_schedule_worker(struct inet_frags *f)
-{
-	if (unlikely(!work_pending(&f->frags_work)))
-		schedule_work(&f->frags_work);
-}
-
 int inet_frags_init(struct inet_frags *f)
 {
-	int i;
-
-	INIT_WORK(&f->frags_work, inet_frag_worker);
-
-	for (i = 0; i < INETFRAGS_HASHSZ; i++) {
-		struct inet_frag_bucket *hb = &f->hash[i];
-
-		spin_lock_init(&hb->chain_lock);
-		INIT_HLIST_HEAD(&hb->chain);
-	}
-
-	seqlock_init(&f->rnd_seqlock);
-	f->last_rebuild_jiffies = 0;
 	f->frags_cachep = kmem_cache_create(f->frags_cache_name, f->qsize, 0, 0,
 					    NULL);
 	if (!f->frags_cachep)
@@ -214,66 +59,42 @@ EXPORT_SYMBOL(inet_frags_init);
 
 void inet_frags_fini(struct inet_frags *f)
 {
-	cancel_work_sync(&f->frags_work);
+	/* We must wait that all inet_frag_destroy_rcu() have completed. */
+	rcu_barrier();
+
 	kmem_cache_destroy(f->frags_cachep);
+	f->frags_cachep = NULL;
 }
 EXPORT_SYMBOL(inet_frags_fini);
 
-void inet_frags_exit_net(struct netns_frags *nf)
+static void inet_frags_free_cb(void *ptr, void *arg)
 {
-	struct inet_frags *f =nf->f;
-	unsigned int seq;
-	int i;
-
-	nf->low_thresh = 0;
-
-evict_again:
-	local_bh_disable();
-	seq = read_seqbegin(&f->rnd_seqlock);
-
-	for (i = 0; i < INETFRAGS_HASHSZ ; i++)
-		inet_evict_bucket(f, &f->hash[i]);
-
-	local_bh_enable();
-	cond_resched();
+	struct inet_frag_queue *fq = ptr;
 
-	if (read_seqretry(&f->rnd_seqlock, seq) ||
-	    sum_frag_mem_limit(nf))
-		goto evict_again;
-}
-EXPORT_SYMBOL(inet_frags_exit_net);
+	/* If we can not cancel the timer, it means this frag_queue
+	 * is already disappearing, we have nothing to do.
+	 * Otherwise, we own a refcount until the end of this function.
+	 */
+	if (!del_timer(&fq->timer))
+		return;
 
-static struct inet_frag_bucket *
-get_frag_bucket_locked(struct inet_frag_queue *fq, struct inet_frags *f)
-__acquires(hb->chain_lock)
-{
-	struct inet_frag_bucket *hb;
-	unsigned int seq, hash;
-
- restart:
-	seq = read_seqbegin(&f->rnd_seqlock);
-
-	hash = inet_frag_hashfn(f, fq);
-	hb = &f->hash[hash];
-
-	spin_lock(&hb->chain_lock);
-	if (read_seqretry(&f->rnd_seqlock, seq)) {
-		spin_unlock(&hb->chain_lock);
-		goto restart;
+	spin_lock_bh(&fq->lock);
+	if (!(fq->flags & INET_FRAG_COMPLETE)) {
+		fq->flags |= INET_FRAG_COMPLETE;
+		atomic_dec(&fq->refcnt);
 	}
+	spin_unlock_bh(&fq->lock);
 
-	return hb;
+	inet_frag_put(fq);
 }
 
-static inline void fq_unlink(struct inet_frag_queue *fq)
+void inet_frags_exit_net(struct netns_frags *nf)
 {
-	struct inet_frag_bucket *hb;
+	nf->low_thresh = 0; /* prevent creation of new frags */
 
-	hb = get_frag_bucket_locked(fq, fq->net->f);
-	hlist_del(&fq->list);
-	fq->flags |= INET_FRAG_COMPLETE;
-	spin_unlock(&hb->chain_lock);
+	rhashtable_free_and_destroy(&nf->rhashtable, inet_frags_free_cb, NULL);
 }
+EXPORT_SYMBOL(inet_frags_exit_net);
 
 void inet_frag_kill(struct inet_frag_queue *fq)
 {
@@ -281,12 +102,26 @@ void inet_frag_kill(struct inet_frag_que
 		atomic_dec(&fq->refcnt);
 
 	if (!(fq->flags & INET_FRAG_COMPLETE)) {
-		fq_unlink(fq);
+		struct netns_frags *nf = fq->net;
+
+		fq->flags |= INET_FRAG_COMPLETE;
+		rhashtable_remove_fast(&nf->rhashtable, &fq->node, nf->f->rhash_params);
 		atomic_dec(&fq->refcnt);
 	}
 }
 EXPORT_SYMBOL(inet_frag_kill);
 
+static void inet_frag_destroy_rcu(struct rcu_head *head)
+{
+	struct inet_frag_queue *q = container_of(head, struct inet_frag_queue,
+						 rcu);
+	struct inet_frags *f = q->net->f;
+
+	if (f->destructor)
+		f->destructor(q);
+	kmem_cache_free(f->frags_cachep, q);
+}
+
 void inet_frag_destroy(struct inet_frag_queue *q)
 {
 	struct sk_buff *fp;
@@ -310,55 +145,21 @@ void inet_frag_destroy(struct inet_frag_
 	}
 	sum = sum_truesize + f->qsize;
 
-	if (f->destructor)
-		f->destructor(q);
-	kmem_cache_free(f->frags_cachep, q);
+	call_rcu(&q->rcu, inet_frag_destroy_rcu);
 
 	sub_frag_mem_limit(nf, sum);
 }
 EXPORT_SYMBOL(inet_frag_destroy);
 
-static struct inet_frag_queue *inet_frag_intern(struct netns_frags *nf,
-						struct inet_frag_queue *qp_in,
-						struct inet_frags *f,
-						void *arg)
-{
-	struct inet_frag_bucket *hb = get_frag_bucket_locked(qp_in, f);
-	struct inet_frag_queue *qp;
-
-#ifdef CONFIG_SMP
-	/* With SMP race we have to recheck hash table, because
-	 * such entry could have been created on other cpu before
-	 * we acquired hash bucket lock.
-	 */
-	hlist_for_each_entry(qp, &hb->chain, list) {
-		if (qp->net == nf && f->match(qp, arg)) {
-			atomic_inc(&qp->refcnt);
-			spin_unlock(&hb->chain_lock);
-			qp_in->flags |= INET_FRAG_COMPLETE;
-			inet_frag_put(qp_in);
-			return qp;
-		}
-	}
-#endif
-	qp = qp_in;
-	if (!mod_timer(&qp->timer, jiffies + nf->timeout))
-		atomic_inc(&qp->refcnt);
-
-	atomic_inc(&qp->refcnt);
-	hlist_add_head(&qp->list, &hb->chain);
-
-	spin_unlock(&hb->chain_lock);
-
-	return qp;
-}
-
 static struct inet_frag_queue *inet_frag_alloc(struct netns_frags *nf,
 					       struct inet_frags *f,
 					       void *arg)
 {
 	struct inet_frag_queue *q;
 
+	if (!nf->high_thresh || frag_mem_limit(nf) > nf->high_thresh)
+		return NULL;
+
 	q = kmem_cache_zalloc(f->frags_cachep, GFP_ATOMIC);
 	if (!q)
 		return NULL;
@@ -369,64 +170,51 @@ static struct inet_frag_queue *inet_frag
 
 	setup_timer(&q->timer, f->frag_expire, (unsigned long)q);
 	spin_lock_init(&q->lock);
-	atomic_set(&q->refcnt, 1);
+	atomic_set(&q->refcnt, 3);
 
 	return q;
 }
 
 static struct inet_frag_queue *inet_frag_create(struct netns_frags *nf,
-						struct inet_frags *f,
 						void *arg)
 {
+	struct inet_frags *f = nf->f;
 	struct inet_frag_queue *q;
+	int err;
 
 	q = inet_frag_alloc(nf, f, arg);
 	if (!q)
 		return NULL;
 
-	return inet_frag_intern(nf, q, f, arg);
-}
+	mod_timer(&q->timer, jiffies + nf->timeout);
 
-struct inet_frag_queue *inet_frag_find(struct netns_frags *nf,
-				       struct inet_frags *f, void *key,
-				       unsigned int hash)
-{
-	struct inet_frag_bucket *hb;
-	struct inet_frag_queue *q;
-	int depth = 0;
-
-	if (!nf->high_thresh || frag_mem_limit(nf) > nf->high_thresh) {
-		inet_frag_schedule_worker(f);
+	err = rhashtable_insert_fast(&nf->rhashtable, &q->node,
+				     f->rhash_params);
+	if (err < 0) {
+		q->flags |= INET_FRAG_COMPLETE;
+		inet_frag_kill(q);
+		inet_frag_destroy(q);
 		return NULL;
 	}
+	return q;
+}
+EXPORT_SYMBOL(inet_frag_create);
 
-	if (frag_mem_limit(nf) > nf->low_thresh)
-		inet_frag_schedule_worker(f);
-
-	hash &= (INETFRAGS_HASHSZ - 1);
-	hb = &f->hash[hash];
-
-	spin_lock(&hb->chain_lock);
-	hlist_for_each_entry(q, &hb->chain, list) {
-		if (q->net == nf && f->match(q, key)) {
-			atomic_inc(&q->refcnt);
-			spin_unlock(&hb->chain_lock);
-			return q;
-		}
-		depth++;
-	}
-	spin_unlock(&hb->chain_lock);
-
-	if (depth <= INETFRAGS_MAXDEPTH)
-		return inet_frag_create(nf, f, key);
+/* TODO : call from rcu_read_lock() and no longer use refcount_inc_not_zero() */
+struct inet_frag_queue *inet_frag_find(struct netns_frags *nf, void *key)
+{
+	struct inet_frag_queue *fq;
 
-	if (inet_frag_may_rebuild(f)) {
-		if (!f->rebuild)
-			f->rebuild = true;
-		inet_frag_schedule_worker(f);
+	rcu_read_lock();
+	fq = rhashtable_lookup(&nf->rhashtable, key, nf->f->rhash_params);
+	if (fq) {
+		if (!atomic_inc_not_zero(&fq->refcnt))
+			fq = NULL;
+		rcu_read_unlock();
+		return fq;
 	}
-
-	return ERR_PTR(-ENOBUFS);
+	rcu_read_unlock();
+	return inet_frag_create(nf, key);
 }
 EXPORT_SYMBOL(inet_frag_find);
 
@@ -434,8 +222,7 @@ void inet_frag_maybe_warn_overflow(struc
 				   const char *prefix)
 {
 	static const char msg[] = "inet_frag_find: Fragment hash bucket"
-		" list length grew over limit " __stringify(INETFRAGS_MAXDEPTH)
-		". Dropping fragment.\n";
+		" list length grew over limit. Dropping fragment.\n";
 
 	if (PTR_ERR(q) == -ENOBUFS)
 		net_dbg_ratelimited("%s%s", prefix, msg);
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -68,15 +68,9 @@ struct ipfrag_skb_cb
 struct ipq {
 	struct inet_frag_queue q;
 
-	u32		user;
-	__be32		saddr;
-	__be32		daddr;
-	__be16		id;
-	u8		protocol;
 	u8		ecn; /* RFC3168 support */
 	u16		max_df_size; /* largest frag with DF set seen */
 	int             iif;
-	int             vif;   /* L3 master device index */
 	unsigned int    rid;
 	struct inet_peer *peer;
 };
@@ -96,41 +90,6 @@ int ip_frag_mem(struct net *net)
 static int ip_frag_reasm(struct ipq *qp, struct sk_buff *prev,
 			 struct net_device *dev);
 
-struct ip4_create_arg {
-	struct iphdr *iph;
-	u32 user;
-	int vif;
-};
-
-static unsigned int ipqhashfn(__be16 id, __be32 saddr, __be32 daddr, u8 prot)
-{
-	net_get_random_once(&ip4_frags.rnd, sizeof(ip4_frags.rnd));
-	return jhash_3words((__force u32)id << 16 | prot,
-			    (__force u32)saddr, (__force u32)daddr,
-			    ip4_frags.rnd);
-}
-
-static unsigned int ip4_hashfn(const struct inet_frag_queue *q)
-{
-	const struct ipq *ipq;
-
-	ipq = container_of(q, struct ipq, q);
-	return ipqhashfn(ipq->id, ipq->saddr, ipq->daddr, ipq->protocol);
-}
-
-static bool ip4_frag_match(const struct inet_frag_queue *q, const void *a)
-{
-	const struct ipq *qp;
-	const struct ip4_create_arg *arg = a;
-
-	qp = container_of(q, struct ipq, q);
-	return	qp->id == arg->iph->id &&
-		qp->saddr == arg->iph->saddr &&
-		qp->daddr == arg->iph->daddr &&
-		qp->protocol == arg->iph->protocol &&
-		qp->user == arg->user &&
-		qp->vif == arg->vif;
-}
 
 static void ip4_frag_init(struct inet_frag_queue *q, const void *a)
 {
@@ -139,17 +98,12 @@ static void ip4_frag_init(struct inet_fr
 					       frags);
 	struct net *net = container_of(ipv4, struct net, ipv4);
 
-	const struct ip4_create_arg *arg = a;
+	const struct frag_v4_compare_key *key = a;
 
-	qp->protocol = arg->iph->protocol;
-	qp->id = arg->iph->id;
-	qp->ecn = ip4_frag_ecn(arg->iph->tos);
-	qp->saddr = arg->iph->saddr;
-	qp->daddr = arg->iph->daddr;
-	qp->vif = arg->vif;
-	qp->user = arg->user;
+	q->key.v4 = *key;
+	qp->ecn = 0;
 	qp->peer = q->net->max_dist ?
-		inet_getpeer_v4(net->ipv4.peers, arg->iph->saddr, arg->vif, 1) :
+		inet_getpeer_v4(net->ipv4.peers, key->saddr, key->vif, 1) :
 		NULL;
 }
 
@@ -232,7 +186,7 @@ static void ip_expire(unsigned long arg)
 		/* Only an end host needs to send an ICMP
 		 * "Fragment Reassembly Timeout" message, per RFC792.
 		 */
-		if (frag_expire_skip_icmp(qp->user) &&
+		if (frag_expire_skip_icmp(qp->q.key.v4.user) &&
 		    (skb_rtable(head)->rt_type != RTN_LOCAL))
 			goto out;
 
@@ -260,17 +214,17 @@ out_rcu_unlock:
 static struct ipq *ip_find(struct net *net, struct iphdr *iph,
 			   u32 user, int vif)
 {
+	struct frag_v4_compare_key key = {
+		.saddr = iph->saddr,
+		.daddr = iph->daddr,
+		.user = user,
+		.vif = vif,
+		.id = iph->id,
+		.protocol = iph->protocol,
+	};
 	struct inet_frag_queue *q;
-	struct ip4_create_arg arg;
-	unsigned int hash;
-
-	arg.iph = iph;
-	arg.user = user;
-	arg.vif = vif;
 
-	hash = ipqhashfn(iph->id, iph->saddr, iph->daddr, iph->protocol);
-
-	q = inet_frag_find(&net->ipv4.frags, &ip4_frags, &arg, hash);
+	q = inet_frag_find(&net->ipv4.frags, &key);
 	if (IS_ERR_OR_NULL(q)) {
 		inet_frag_maybe_warn_overflow(q, pr_fmt());
 		return NULL;
@@ -659,7 +613,7 @@ out_nomem:
 	err = -ENOMEM;
 	goto out_fail;
 out_oversize:
-	net_info_ratelimited("Oversized IP packet from %pI4\n", &qp->saddr);
+	net_info_ratelimited("Oversized IP packet from %pI4\n", &qp->q.key.v4.saddr);
 out_fail:
 	__IP_INC_STATS(net, IPSTATS_MIB_REASMFAILS);
 	return err;
@@ -897,15 +851,47 @@ static struct pernet_operations ip4_frag
 	.exit = ipv4_frags_exit_net,
 };
 
+
+static u32 ip4_key_hashfn(const void *data, u32 len, u32 seed)
+{
+	return jhash2(data,
+		      sizeof(struct frag_v4_compare_key) / sizeof(u32), seed);
+}
+
+static u32 ip4_obj_hashfn(const void *data, u32 len, u32 seed)
+{
+	const struct inet_frag_queue *fq = data;
+
+	return jhash2((const u32 *)&fq->key.v4,
+		      sizeof(struct frag_v4_compare_key) / sizeof(u32), seed);
+}
+
+static int ip4_obj_cmpfn(struct rhashtable_compare_arg *arg, const void *ptr)
+{
+	const struct frag_v4_compare_key *key = arg->key;
+	const struct inet_frag_queue *fq = ptr;
+
+	return !!memcmp(&fq->key, key, sizeof(*key));
+}
+
+static const struct rhashtable_params ip4_rhash_params = {
+	.head_offset		= offsetof(struct inet_frag_queue, node),
+	.key_offset		= offsetof(struct inet_frag_queue, key),
+	.key_len		= sizeof(struct frag_v4_compare_key),
+	.hashfn			= ip4_key_hashfn,
+	.obj_hashfn		= ip4_obj_hashfn,
+	.obj_cmpfn		= ip4_obj_cmpfn,
+	.automatic_shrinking	= true,
+};
+
 void __init ipfrag_init(void)
 {
-	ip4_frags.hashfn = ip4_hashfn;
 	ip4_frags.constructor = ip4_frag_init;
 	ip4_frags.destructor = ip4_frag_free;
 	ip4_frags.qsize = sizeof(struct ipq);
-	ip4_frags.match = ip4_frag_match;
 	ip4_frags.frag_expire = ip_expire;
 	ip4_frags.frags_cache_name = ip_frag_cache_name;
+	ip4_frags.rhash_params = ip4_rhash_params;
 	if (inet_frags_init(&ip4_frags))
 		panic("IP: failed to allocate ip4_frags cache\n");
 	ip4_frags_ctl_register();
--- a/net/ipv6/netfilter/nf_conntrack_reasm.c
+++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
@@ -152,23 +152,6 @@ static inline u8 ip6_frag_ecn(const stru
 	return 1 << (ipv6_get_dsfield(ipv6h) & INET_ECN_MASK);
 }
 
-static unsigned int nf_hash_frag(__be32 id, const struct in6_addr *saddr,
-				 const struct in6_addr *daddr)
-{
-	net_get_random_once(&nf_frags.rnd, sizeof(nf_frags.rnd));
-	return jhash_3words(ipv6_addr_hash(saddr), ipv6_addr_hash(daddr),
-			    (__force u32)id, nf_frags.rnd);
-}
-
-
-static unsigned int nf_hashfn(const struct inet_frag_queue *q)
-{
-	const struct frag_queue *nq;
-
-	nq = container_of(q, struct frag_queue, q);
-	return nf_hash_frag(nq->id, &nq->saddr, &nq->daddr);
-}
-
 static void nf_ct_frag6_expire(unsigned long data)
 {
 	struct frag_queue *fq;
@@ -181,26 +164,19 @@ static void nf_ct_frag6_expire(unsigned
 }
 
 /* Creation primitives. */
-static inline struct frag_queue *fq_find(struct net *net, __be32 id,
-					 u32 user, struct in6_addr *src,
-					 struct in6_addr *dst, int iif, u8 ecn)
+static struct frag_queue *fq_find(struct net *net, __be32 id, u32 user,
+				  const struct ipv6hdr *hdr, int iif)
 {
+	struct frag_v6_compare_key key = {
+		.id = id,
+		.saddr = hdr->saddr,
+		.daddr = hdr->daddr,
+		.user = user,
+		.iif = iif,
+	};
 	struct inet_frag_queue *q;
-	struct ip6_create_arg arg;
-	unsigned int hash;
-
-	arg.id = id;
-	arg.user = user;
-	arg.src = src;
-	arg.dst = dst;
-	arg.iif = iif;
-	arg.ecn = ecn;
-
-	local_bh_disable();
-	hash = nf_hash_frag(id, src, dst);
 
-	q = inet_frag_find(&net->nf_frag.frags, &nf_frags, &arg, hash);
-	local_bh_enable();
+	q = inet_frag_find(&net->nf_frag.frags, &key);
 	if (IS_ERR_OR_NULL(q)) {
 		inet_frag_maybe_warn_overflow(q, pr_fmt());
 		return NULL;
@@ -592,8 +568,8 @@ int nf_ct_frag6_gather(struct net *net,
 	fhdr = (struct frag_hdr *)skb_transport_header(skb);
 
 	skb_orphan(skb);
-	fq = fq_find(net, fhdr->identification, user, &hdr->saddr, &hdr->daddr,
-		     skb->dev ? skb->dev->ifindex : 0, ip6_frag_ecn(hdr));
+	fq = fq_find(net, fhdr->identification, user, hdr,
+		     skb->dev ? skb->dev->ifindex : 0);
 	if (fq == NULL) {
 		pr_debug("Can't find and can't create new queue\n");
 		return -ENOMEM;
@@ -661,13 +637,12 @@ int nf_ct_frag6_init(void)
 {
 	int ret = 0;
 
-	nf_frags.hashfn = nf_hashfn;
 	nf_frags.constructor = ip6_frag_init;
 	nf_frags.destructor = NULL;
 	nf_frags.qsize = sizeof(struct frag_queue);
-	nf_frags.match = ip6_frag_match;
 	nf_frags.frag_expire = nf_ct_frag6_expire;
 	nf_frags.frags_cache_name = nf_frags_cache_name;
+	nf_frags.rhash_params = ip6_rhash_params;
 	ret = inet_frags_init(&nf_frags);
 	if (ret)
 		goto out;
--- a/net/ipv6/reassembly.c
+++ b/net/ipv6/reassembly.c
@@ -79,52 +79,13 @@ static struct inet_frags ip6_frags;
 static int ip6_frag_reasm(struct frag_queue *fq, struct sk_buff *prev,
 			  struct net_device *dev);
 
-/*
- * callers should be careful not to use the hash value outside the ipfrag_lock
- * as doing so could race with ipfrag_hash_rnd being recalculated.
- */
-static unsigned int inet6_hash_frag(__be32 id, const struct in6_addr *saddr,
-				    const struct in6_addr *daddr)
-{
-	net_get_random_once(&ip6_frags.rnd, sizeof(ip6_frags.rnd));
-	return jhash_3words(ipv6_addr_hash(saddr), ipv6_addr_hash(daddr),
-			    (__force u32)id, ip6_frags.rnd);
-}
-
-static unsigned int ip6_hashfn(const struct inet_frag_queue *q)
-{
-	const struct frag_queue *fq;
-
-	fq = container_of(q, struct frag_queue, q);
-	return inet6_hash_frag(fq->id, &fq->saddr, &fq->daddr);
-}
-
-bool ip6_frag_match(const struct inet_frag_queue *q, const void *a)
-{
-	const struct frag_queue *fq;
-	const struct ip6_create_arg *arg = a;
-
-	fq = container_of(q, struct frag_queue, q);
-	return	fq->id == arg->id &&
-		fq->user == arg->user &&
-		ipv6_addr_equal(&fq->saddr, arg->src) &&
-		ipv6_addr_equal(&fq->daddr, arg->dst) &&
-		(arg->iif == fq->iif ||
-		 !(ipv6_addr_type(arg->dst) & (IPV6_ADDR_MULTICAST |
-					       IPV6_ADDR_LINKLOCAL)));
-}
-EXPORT_SYMBOL(ip6_frag_match);
-
 void ip6_frag_init(struct inet_frag_queue *q, const void *a)
 {
 	struct frag_queue *fq = container_of(q, struct frag_queue, q);
-	const struct ip6_create_arg *arg = a;
+	const struct frag_v6_compare_key *key = a;
 
-	fq->id = arg->id;
-	fq->user = arg->user;
-	fq->saddr = *arg->src;
-	fq->daddr = *arg->dst;
-	fq->ecn = arg->ecn;
+	q->key.v6 = *key;
+	fq->ecn = 0;
 }
 EXPORT_SYMBOL(ip6_frag_init);
 
@@ -181,23 +142,22 @@ static void ip6_frag_expire(unsigned lon
 }
 
 static struct frag_queue *
-fq_find(struct net *net, __be32 id, const struct in6_addr *src,
-	const struct in6_addr *dst, int iif, u8 ecn)
+fq_find(struct net *net, __be32 id, const struct ipv6hdr *hdr, int iif)
 {
+	struct frag_v6_compare_key key = {
+		.id = id,
+		.saddr = hdr->saddr,
+		.daddr = hdr->daddr,
+		.user = IP6_DEFRAG_LOCAL_DELIVER,
+		.iif = iif,
+	};
 	struct inet_frag_queue *q;
-	struct ip6_create_arg arg;
-	unsigned int hash;
-
-	arg.id = id;
-	arg.user = IP6_DEFRAG_LOCAL_DELIVER;
-	arg.src = src;
-	arg.dst = dst;
-	arg.iif = iif;
-	arg.ecn = ecn;
 
-	hash = inet6_hash_frag(id, src, dst);
+	if (!(ipv6_addr_type(&hdr->daddr) & (IPV6_ADDR_MULTICAST |
+					    IPV6_ADDR_LINKLOCAL)))
+		key.iif = 0;
 
-	q = inet_frag_find(&net->ipv6.frags, &ip6_frags, &arg, hash);
+	q = inet_frag_find(&net->ipv6.frags, &key);
 	if (IS_ERR_OR_NULL(q)) {
 		inet_frag_maybe_warn_overflow(q, pr_fmt());
 		return NULL;
@@ -524,6 +484,7 @@ static int ipv6_frag_rcv(struct sk_buff
 	struct frag_queue *fq;
 	const struct ipv6hdr *hdr = ipv6_hdr(skb);
 	struct net *net = dev_net(skb_dst(skb)->dev);
+	int iif;
 
 	if (IP6CB(skb)->flags & IP6SKB_FRAGMENTED)
 		goto fail_hdr;
@@ -552,13 +513,14 @@ static int ipv6_frag_rcv(struct sk_buff
 		return 1;
 	}
 
-	fq = fq_find(net, fhdr->identification, &hdr->saddr, &hdr->daddr,
-		     skb->dev ? skb->dev->ifindex : 0, ip6_frag_ecn(hdr));
+	iif = skb->dev ? skb->dev->ifindex : 0;
+	fq = fq_find(net, fhdr->identification, hdr, iif);
 	if (fq) {
 		int ret;
 
 		spin_lock(&fq->q.lock);
 
+		fq->iif = iif;
 		ret = ip6_frag_queue(fq, skb, fhdr, IP6CB(skb)->nhoff);
 
 		spin_unlock(&fq->q.lock);
@@ -732,17 +694,47 @@ static struct pernet_operations ip6_frag
 	.exit = ipv6_frags_exit_net,
 };
 
+static u32 ip6_key_hashfn(const void *data, u32 len, u32 seed)
+{
+	return jhash2(data,
+		      sizeof(struct frag_v6_compare_key) / sizeof(u32), seed);
+}
+
+static u32 ip6_obj_hashfn(const void *data, u32 len, u32 seed)
+{
+	const struct inet_frag_queue *fq = data;
+
+	return jhash2((const u32 *)&fq->key.v6,
+		      sizeof(struct frag_v6_compare_key) / sizeof(u32), seed);
+}
+
+static int ip6_obj_cmpfn(struct rhashtable_compare_arg *arg, const void *ptr)
+{
+	const struct frag_v6_compare_key *key = arg->key;
+	const struct inet_frag_queue *fq = ptr;
+
+	return !!memcmp(&fq->key, key, sizeof(*key));
+}
+
+const struct rhashtable_params ip6_rhash_params = {
+	.head_offset		= offsetof(struct inet_frag_queue, node),
+	.hashfn			= ip6_key_hashfn,
+	.obj_hashfn		= ip6_obj_hashfn,
+	.obj_cmpfn		= ip6_obj_cmpfn,
+	.automatic_shrinking	= true,
+};
+EXPORT_SYMBOL(ip6_rhash_params);
+
 int __init ipv6_frag_init(void)
 {
 	int ret;
 
-	ip6_frags.hashfn = ip6_hashfn;
 	ip6_frags.constructor = ip6_frag_init;
 	ip6_frags.destructor = NULL;
 	ip6_frags.qsize = sizeof(struct frag_queue);
-	ip6_frags.match = ip6_frag_match;
 	ip6_frags.frag_expire = ip6_frag_expire;
 	ip6_frags.frags_cache_name = ip6_frag_cache_name;
+	ip6_frags.rhash_params = ip6_rhash_params;
 	ret = inet_frags_init(&ip6_frags);
 	if (ret)
 		goto out;
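
The rhashtable callbacks added in the diff above hash the lookup key as an array of 32-bit words and compare keys with memcmp(). A minimal userspace sketch of that idea follows. It assumes a padding-free key struct; struct frag_key_v4, key_hash() and key_cmp() are hypothetical names, the field widths are assumed, and the FNV-1a style mixing is only a stand-in for jhash2(), not the kernel's hash.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical userspace mirror of the IPv4 lookup key. The field names
 * follow the initializer in ip_find(); the field widths are assumed. */
struct frag_key_v4 {
	uint32_t saddr;
	uint32_t daddr;
	uint32_t user;
	uint32_t vif;
	uint16_t id;
	uint16_t protocol;
};

/* The key is hashed word by word and compared with memcmp(), so it must
 * be a whole number of 32-bit words and must not contain padding bytes. */
_Static_assert(sizeof(struct frag_key_v4) % sizeof(uint32_t) == 0,
	       "key must be a whole number of 32-bit words");
_Static_assert(sizeof(struct frag_key_v4) == 20, "key must have no padding");

/* Stand-in for jhash2(): seeded FNV-1a style mixing over 32-bit words. */
static uint32_t key_hash(const struct frag_key_v4 *key, uint32_t seed)
{
	uint32_t words[sizeof(*key) / sizeof(uint32_t)];
	uint32_t h = 2166136261u ^ seed;
	size_t i;

	memcpy(words, key, sizeof(words));
	for (i = 0; i < sizeof(words) / sizeof(words[0]); i++) {
		h ^= words[i];
		h *= 16777619u;
	}
	return h;
}

/* Same convention as ip4_obj_cmpfn(): 0 means "match", non-zero "differs". */
static int key_cmp(const struct frag_key_v4 *a, const struct frag_key_v4 *b)
{
	return !!memcmp(a, b, sizeof(*a));
}
```

Because equality is decided on raw bytes, any padding inside the key would have to be zeroed before lookup; the static assertions rule that case out for this sketch.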




* [PATCH 4.9 51/71] inet: frags: remove some helpers
  2018-10-16 17:08 [PATCH 4.9 00/71] 4.9.134-stable review Greg Kroah-Hartman
                   ` (49 preceding siblings ...)
  2018-10-16 17:09 ` [PATCH 4.9 50/71] inet: frags: use rhashtables for reassembly units Greg Kroah-Hartman
@ 2018-10-16 17:09 ` Greg Kroah-Hartman
  2018-10-16 17:09 ` [PATCH 4.9 52/71] inet: frags: get rif of inet_frag_evicting() Greg Kroah-Hartman
                   ` (24 subsequent siblings)
  75 siblings, 0 replies; 82+ messages in thread
From: Greg Kroah-Hartman @ 2018-10-16 17:09 UTC (permalink / raw)
  To: linux-kernel, netdev
  Cc: Greg Kroah-Hartman, stable, Eric Dumazet, David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <edumazet@google.com>

Remove sum_frag_mem_limit(), ip_frag_mem() & ip6_frag_mem()

Also since we use rhashtable we can bring back the number of fragments
in "grep FRAG /proc/net/sockstat /proc/net/sockstat6" that was
removed in commit 434d305405ab ("inet: frag: don't account number
of fragment queues")

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 6befe4a78b1553edb6eed3a78b4bcd9748526672)
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
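With this change the "inuse" field of the FRAG and FRAG6 lines reports the number of fragment queues again instead of a 0/1 flag. A small userspace sketch of reading such a line is given here for illustration only; parse_frag_line() is a hypothetical helper, not part of the patch, and the line layout is taken from the seq_printf() format strings in the diff.

```c
#include <stdio.h>

/* Parse one "FRAG: inuse <queues> memory <bytes>" line; the FRAG6 variant
 * is accepted as well. Returns 0 on success, -1 if the line does not match. */
static int parse_frag_line(const char *line, unsigned long long *inuse,
			   unsigned long long *memory)
{
	if (sscanf(line, "FRAG: inuse %llu memory %llu", inuse, memory) == 2)
		return 0;
	if (sscanf(line, "FRAG6: inuse %llu memory %llu", inuse, memory) == 2)
		return 0;
	return -1;
}
```

The values are read as unsigned long long so that the helper would also cope with a memory figure above 4GB.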
 include/net/inet_frag.h |    5 -----
 include/net/ip.h        |    1 -
 include/net/ipv6.h      |    7 -------
 net/ipv4/ip_fragment.c  |    5 -----
 net/ipv4/proc.c         |    6 +++---
 net/ipv6/proc.c         |    5 +++--
 6 files changed, 6 insertions(+), 23 deletions(-)

--- a/include/net/inet_frag.h
+++ b/include/net/inet_frag.h
@@ -140,11 +140,6 @@ static inline void add_frag_mem_limit(st
 	atomic_add(i, &nf->mem);
 }
 
-static inline int sum_frag_mem_limit(struct netns_frags *nf)
-{
-	return atomic_read(&nf->mem);
-}
-
 /* RFC 3168 support :
  * We want to check ECN values of all fragments, do detect invalid combinations.
  * In ipq->ecn, we store the OR value of each ip4_frag_ecn() fragment value.
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -548,7 +548,6 @@ static inline struct sk_buff *ip_check_d
 	return skb;
 }
 #endif
-int ip_frag_mem(struct net *net);
 
 /*
  *	Functions provided by ip_forward.c
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -330,13 +330,6 @@ static inline bool ipv6_accept_ra(struct
 	    idev->cnf.accept_ra;
 }
 
-#if IS_ENABLED(CONFIG_IPV6)
-static inline int ip6_frag_mem(struct net *net)
-{
-	return sum_frag_mem_limit(&net->ipv6.frags);
-}
-#endif
-
 #define IPV6_FRAG_HIGH_THRESH	(4 * 1024*1024)	/* 4194304 */
 #define IPV6_FRAG_LOW_THRESH	(3 * 1024*1024)	/* 3145728 */
 #define IPV6_FRAG_TIMEOUT	(60 * HZ)	/* 60 seconds */
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -82,11 +82,6 @@ static u8 ip4_frag_ecn(u8 tos)
 
 static struct inet_frags ip4_frags;
 
-int ip_frag_mem(struct net *net)
-{
-	return sum_frag_mem_limit(&net->ipv4.frags);
-}
-
 static int ip_frag_reasm(struct ipq *qp, struct sk_buff *prev,
 			 struct net_device *dev);
 
--- a/net/ipv4/proc.c
+++ b/net/ipv4/proc.c
@@ -54,7 +54,6 @@
 static int sockstat_seq_show(struct seq_file *seq, void *v)
 {
 	struct net *net = seq->private;
-	unsigned int frag_mem;
 	int orphans, sockets;
 
 	local_bh_disable();
@@ -74,8 +73,9 @@ static int sockstat_seq_show(struct seq_
 		   sock_prot_inuse_get(net, &udplite_prot));
 	seq_printf(seq, "RAW: inuse %d\n",
 		   sock_prot_inuse_get(net, &raw_prot));
-	frag_mem = ip_frag_mem(net);
-	seq_printf(seq,  "FRAG: inuse %u memory %u\n", !!frag_mem, frag_mem);
+	seq_printf(seq,  "FRAG: inuse %u memory %u\n",
+		   atomic_read(&net->ipv4.frags.rhashtable.nelems),
+		   frag_mem_limit(&net->ipv4.frags));
 	return 0;
 }
 
--- a/net/ipv6/proc.c
+++ b/net/ipv6/proc.c
@@ -38,7 +38,6 @@
 static int sockstat6_seq_show(struct seq_file *seq, void *v)
 {
 	struct net *net = seq->private;
-	unsigned int frag_mem = ip6_frag_mem(net);
 
 	seq_printf(seq, "TCP6: inuse %d\n",
 		       sock_prot_inuse_get(net, &tcpv6_prot));
@@ -48,7 +47,9 @@ static int sockstat6_seq_show(struct seq
 			sock_prot_inuse_get(net, &udplitev6_prot));
 	seq_printf(seq, "RAW6: inuse %d\n",
 		       sock_prot_inuse_get(net, &rawv6_prot));
-	seq_printf(seq, "FRAG6: inuse %u memory %u\n", !!frag_mem, frag_mem);
+	seq_printf(seq, "FRAG6: inuse %u memory %u\n",
+		   atomic_read(&net->ipv6.frags.rhashtable.nelems),
+		   frag_mem_limit(&net->ipv6.frags));
 	return 0;
 }
 




* [PATCH 4.9 52/71] inet: frags: get rif of inet_frag_evicting()
  2018-10-16 17:08 [PATCH 4.9 00/71] 4.9.134-stable review Greg Kroah-Hartman
                   ` (50 preceding siblings ...)
  2018-10-16 17:09 ` [PATCH 4.9 51/71] inet: frags: remove some helpers Greg Kroah-Hartman
@ 2018-10-16 17:09 ` Greg Kroah-Hartman
  2018-10-16 17:09 ` [PATCH 4.9 53/71] inet: frags: remove inet_frag_maybe_warn_overflow() Greg Kroah-Hartman
                   ` (23 subsequent siblings)
  75 siblings, 0 replies; 82+ messages in thread
From: Greg Kroah-Hartman @ 2018-10-16 17:09 UTC (permalink / raw)
  To: linux-kernel, netdev
  Cc: Greg Kroah-Hartman, stable, Eric Dumazet, David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <edumazet@google.com>

This refactors ip_expire() since one indentation level is removed.

Note: in the future, we should try hard to avoid the skb_clone()
since this is a serious performance cost.
Under DDOS, the ICMP message won't be sent because of rate limits.

The fact that ip6_expire_frag_queue() does not use skb_clone() is
disturbing too. Presumably IPv6 should have the same
issue as the one we fixed in commit ec4fbd64751d
("inet: frag: release spinlock before calling icmp_send()")

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 399d1404be660d355192ff4df5ccc3f4159ec1e4)
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 include/net/inet_frag.h |    5 ---
 net/ipv4/ip_fragment.c  |   65 +++++++++++++++++++++++-------------------------
 net/ipv6/reassembly.c   |    4 --
 3 files changed, 32 insertions(+), 42 deletions(-)

--- a/include/net/inet_frag.h
+++ b/include/net/inet_frag.h
@@ -118,11 +118,6 @@ static inline void inet_frag_put(struct
 		inet_frag_destroy(q);
 }
 
-static inline bool inet_frag_evicting(struct inet_frag_queue *q)
-{
-	return false;
-}
-
 /* Memory Tracking Functions. */
 
 static inline int frag_mem_limit(struct netns_frags *nf)
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -141,8 +141,11 @@ static bool frag_expire_skip_icmp(u32 us
  */
 static void ip_expire(unsigned long arg)
 {
-	struct ipq *qp;
+	struct sk_buff *clone, *head;
+	const struct iphdr *iph;
 	struct net *net;
+	struct ipq *qp;
+	int err;
 
 	qp = container_of((struct inet_frag_queue *) arg, struct ipq, q);
 	net = container_of(qp->q.net, struct net, ipv4.frags);
@@ -156,45 +159,41 @@ static void ip_expire(unsigned long arg)
 	ipq_kill(qp);
 	__IP_INC_STATS(net, IPSTATS_MIB_REASMFAILS);
 
-	if (!inet_frag_evicting(&qp->q)) {
-		struct sk_buff *clone, *head = qp->q.fragments;
-		const struct iphdr *iph;
-		int err;
+	head = qp->q.fragments;
 
-		__IP_INC_STATS(net, IPSTATS_MIB_REASMTIMEOUT);
+	__IP_INC_STATS(net, IPSTATS_MIB_REASMTIMEOUT);
 
-		if (!(qp->q.flags & INET_FRAG_FIRST_IN) || !qp->q.fragments)
-			goto out;
+	if (!(qp->q.flags & INET_FRAG_FIRST_IN) || !head)
+		goto out;
 
-		head->dev = dev_get_by_index_rcu(net, qp->iif);
-		if (!head->dev)
-			goto out;
+	head->dev = dev_get_by_index_rcu(net, qp->iif);
+	if (!head->dev)
+		goto out;
 
 
-		/* skb has no dst, perform route lookup again */
-		iph = ip_hdr(head);
-		err = ip_route_input_noref(head, iph->daddr, iph->saddr,
+	/* skb has no dst, perform route lookup again */
+	iph = ip_hdr(head);
+	err = ip_route_input_noref(head, iph->daddr, iph->saddr,
 					   iph->tos, head->dev);
-		if (err)
-			goto out;
+	if (err)
+		goto out;
+
+	/* Only an end host needs to send an ICMP
+	 * "Fragment Reassembly Timeout" message, per RFC792.
+	 */
+	if (frag_expire_skip_icmp(qp->q.key.v4.user) &&
+	    (skb_rtable(head)->rt_type != RTN_LOCAL))
+		goto out;
+
+	clone = skb_clone(head, GFP_ATOMIC);
 
-		/* Only an end host needs to send an ICMP
-		 * "Fragment Reassembly Timeout" message, per RFC792.
-		 */
-		if (frag_expire_skip_icmp(qp->q.key.v4.user) &&
-		    (skb_rtable(head)->rt_type != RTN_LOCAL))
-			goto out;
-
-		clone = skb_clone(head, GFP_ATOMIC);
-
-		/* Send an ICMP "Fragment Reassembly Timeout" message. */
-		if (clone) {
-			spin_unlock(&qp->q.lock);
-			icmp_send(clone, ICMP_TIME_EXCEEDED,
-				  ICMP_EXC_FRAGTIME, 0);
-			consume_skb(clone);
-			goto out_rcu_unlock;
-		}
+	/* Send an ICMP "Fragment Reassembly Timeout" message. */
+	if (clone) {
+		spin_unlock(&qp->q.lock);
+		icmp_send(clone, ICMP_TIME_EXCEEDED,
+			  ICMP_EXC_FRAGTIME, 0);
+		consume_skb(clone);
+		goto out_rcu_unlock;
 	}
 out:
 	spin_unlock(&qp->q.lock);
--- a/net/ipv6/reassembly.c
+++ b/net/ipv6/reassembly.c
@@ -106,10 +106,6 @@ void ip6_expire_frag_queue(struct net *n
 		goto out_rcu_unlock;
 
 	__IP6_INC_STATS(net, __in6_dev_get(dev), IPSTATS_MIB_REASMFAILS);
-
-	if (inet_frag_evicting(&fq->q))
-		goto out_rcu_unlock;
-
 	__IP6_INC_STATS(net, __in6_dev_get(dev), IPSTATS_MIB_REASMTIMEOUT);
 
 	/* Don't send error if the first segment did not arrive. */




* [PATCH 4.9 53/71] inet: frags: remove inet_frag_maybe_warn_overflow()
  2018-10-16 17:08 [PATCH 4.9 00/71] 4.9.134-stable review Greg Kroah-Hartman
                   ` (51 preceding siblings ...)
  2018-10-16 17:09 ` [PATCH 4.9 52/71] inet: frags: get rif of inet_frag_evicting() Greg Kroah-Hartman
@ 2018-10-16 17:09 ` Greg Kroah-Hartman
  2018-10-16 17:09 ` [PATCH 4.9 54/71] inet: frags: break the 2GB limit for frags storage Greg Kroah-Hartman
                   ` (22 subsequent siblings)
  75 siblings, 0 replies; 82+ messages in thread
From: Greg Kroah-Hartman @ 2018-10-16 17:09 UTC (permalink / raw)
  To: linux-kernel, netdev
  Cc: Greg Kroah-Hartman, stable, Eric Dumazet, David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <edumazet@google.com>

This function is obsolete, after rhashtable addition to inet defrag.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 2d44ed22e607f9a285b049de2263e3840673a260)
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 include/net/inet_frag.h                 |    2 --
 net/ieee802154/6lowpan/reassembly.c     |    5 ++---
 net/ipv4/inet_fragment.c                |   11 -----------
 net/ipv4/ip_fragment.c                  |    5 ++---
 net/ipv6/netfilter/nf_conntrack_reasm.c |    5 ++---
 net/ipv6/reassembly.c                   |    5 ++---
 6 files changed, 8 insertions(+), 25 deletions(-)

--- a/include/net/inet_frag.h
+++ b/include/net/inet_frag.h
@@ -109,8 +109,6 @@ void inet_frags_exit_net(struct netns_fr
 void inet_frag_kill(struct inet_frag_queue *q);
 void inet_frag_destroy(struct inet_frag_queue *q);
 struct inet_frag_queue *inet_frag_find(struct netns_frags *nf, void *key);
-void inet_frag_maybe_warn_overflow(struct inet_frag_queue *q,
-				   const char *prefix);
 
 static inline void inet_frag_put(struct inet_frag_queue *q)
 {
--- a/net/ieee802154/6lowpan/reassembly.c
+++ b/net/ieee802154/6lowpan/reassembly.c
@@ -83,10 +83,9 @@ fq_find(struct net *net, const struct lo
 	struct inet_frag_queue *q;
 
 	q = inet_frag_find(&ieee802154_lowpan->frags, &key);
-	if (IS_ERR_OR_NULL(q)) {
-		inet_frag_maybe_warn_overflow(q, pr_fmt());
+	if (!q)
 		return NULL;
-	}
+
 	return container_of(q, struct lowpan_frag_queue, q);
 }
 
--- a/net/ipv4/inet_fragment.c
+++ b/net/ipv4/inet_fragment.c
@@ -217,14 +217,3 @@ struct inet_frag_queue *inet_frag_find(s
 	return inet_frag_create(nf, key);
 }
 EXPORT_SYMBOL(inet_frag_find);
-
-void inet_frag_maybe_warn_overflow(struct inet_frag_queue *q,
-				   const char *prefix)
-{
-	static const char msg[] = "inet_frag_find: Fragment hash bucket"
-		" list length grew over limit. Dropping fragment.\n";
-
-	if (PTR_ERR(q) == -ENOBUFS)
-		net_dbg_ratelimited("%s%s", prefix, msg);
-}
-EXPORT_SYMBOL(inet_frag_maybe_warn_overflow);
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -219,10 +219,9 @@ static struct ipq *ip_find(struct net *n
 	struct inet_frag_queue *q;
 
 	q = inet_frag_find(&net->ipv4.frags, &key);
-	if (IS_ERR_OR_NULL(q)) {
-		inet_frag_maybe_warn_overflow(q, pr_fmt());
+	if (!q)
 		return NULL;
-	}
+
 	return container_of(q, struct ipq, q);
 }
 
--- a/net/ipv6/netfilter/nf_conntrack_reasm.c
+++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
@@ -177,10 +177,9 @@ static struct frag_queue *fq_find(struct
 	struct inet_frag_queue *q;
 
 	q = inet_frag_find(&net->nf_frag.frags, &key);
-	if (IS_ERR_OR_NULL(q)) {
-		inet_frag_maybe_warn_overflow(q, pr_fmt());
+	if (!q)
 		return NULL;
-	}
+
 	return container_of(q, struct frag_queue, q);
 }
 
--- a/net/ipv6/reassembly.c
+++ b/net/ipv6/reassembly.c
@@ -154,10 +154,9 @@ fq_find(struct net *net, __be32 id, cons
 		key.iif = 0;
 
 	q = inet_frag_find(&net->ipv6.frags, &key);
-	if (IS_ERR_OR_NULL(q)) {
-		inet_frag_maybe_warn_overflow(q, pr_fmt());
+	if (!q)
 		return NULL;
-	}
+
 	return container_of(q, struct frag_queue, q);
 }
 




* [PATCH 4.9 54/71] inet: frags: break the 2GB limit for frags storage
  2018-10-16 17:08 [PATCH 4.9 00/71] 4.9.134-stable review Greg Kroah-Hartman
                   ` (52 preceding siblings ...)
  2018-10-16 17:09 ` [PATCH 4.9 53/71] inet: frags: remove inet_frag_maybe_warn_overflow() Greg Kroah-Hartman
@ 2018-10-16 17:09 ` Greg Kroah-Hartman
  2018-10-16 17:09 ` [PATCH 4.9 55/71] inet: frags: do not clone skb in ip_expire() Greg Kroah-Hartman
                   ` (21 subsequent siblings)
  75 siblings, 0 replies; 82+ messages in thread
From: Greg Kroah-Hartman @ 2018-10-16 17:09 UTC (permalink / raw)
  To: linux-kernel, netdev
  Cc: Greg Kroah-Hartman, stable, Eric Dumazet, David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <edumazet@google.com>

Some users are willing to provision huge amounts of memory to be able
to perform reassembly reasonably well under pressure.

Current memory tracking is using one atomic_t and integers.

Switch to atomic_long_t so that 64bit arches can use more than 2GB,
without any cost for 32bit arches.

Note that this patch avoids an overflow error, if high_thresh was set
to ~2GB, since this test in inet_frag_alloc() was never true :

if (... || frag_mem_limit(nf) > nf->high_thresh)

Tested:

$ echo 16000000000 >/proc/sys/net/ipv4/ipfrag_high_thresh

<frag DDOS>

$ grep FRAG /proc/net/sockstat
FRAG: inuse 14705885 memory 16000002880

$ nstat -n ; sleep 1 ; nstat | grep Reas
IpReasmReqds                    3317150            0.0
IpReasmFails                    3317112            0.0

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 3e67f106f619dcfaf6f4e2039599bdb69848c714)
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
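The accounting change can be modelled in userspace as below. This is a sketch under stated assumptions, not the kernel code: struct frag_acct and the acct_*() helpers are hypothetical names, C11 atomic_llong stands in for atomic_long_t on a 64-bit arch, and over_limit_32() reflects one reading of the overflow remark in the commit message, namely that a 32-bit signed counter can never exceed a limit set to INT32_MAX.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace model of the per-namespace accounting; not the kernel struct. */
struct frag_acct {
	atomic_llong mem;		/* bytes currently charged */
	long long high_thresh;		/* limit in bytes */
};

static void acct_init(struct frag_acct *a, long long high_thresh)
{
	atomic_init(&a->mem, 0);
	a->high_thresh = high_thresh;
}

/* Counterparts of add_frag_mem_limit() and sub_frag_mem_limit(). */
static void acct_add(struct frag_acct *a, long long val)
{
	atomic_fetch_add(&a->mem, val);
}

static void acct_sub(struct frag_acct *a, long long val)
{
	atomic_fetch_sub(&a->mem, val);
}

/* Mirrors the "frag_mem_limit(nf) > nf->high_thresh" test. */
static bool acct_over_limit(struct frag_acct *a)
{
	return atomic_load(&a->mem) > a->high_thresh;
}

/* With a 32-bit signed counter and a limit of INT32_MAX, no counter value
 * can satisfy the test, so the limit would never be enforced. */
static bool over_limit_32(int32_t mem, int32_t high_thresh)
{
	return mem > high_thresh;
}
```

With the 64-bit counter, the 16000000000 byte threshold used in the test run above is representable and the comparison behaves as expected.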
 Documentation/networking/ip-sysctl.txt  |    4 ++--
 include/net/inet_frag.h                 |   20 ++++++++++----------
 net/ieee802154/6lowpan/reassembly.c     |   10 +++++-----
 net/ipv4/ip_fragment.c                  |   10 +++++-----
 net/ipv4/proc.c                         |    2 +-
 net/ipv6/netfilter/nf_conntrack_reasm.c |   10 +++++-----
 net/ipv6/proc.c                         |    2 +-
 net/ipv6/reassembly.c                   |    6 +++---
 8 files changed, 32 insertions(+), 32 deletions(-)

--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -122,10 +122,10 @@ min_adv_mss - INTEGER
 
 IP Fragmentation:
 
-ipfrag_high_thresh - INTEGER
+ipfrag_high_thresh - LONG INTEGER
 	Maximum memory used to reassemble IP fragments.
 
-ipfrag_low_thresh - INTEGER
+ipfrag_low_thresh - LONG INTEGER
 	(Obsolete since linux-4.17)
 	Maximum memory used to reassemble IP fragments before the kernel
 	begins to remove incomplete fragment queues to free up resources.
--- a/include/net/inet_frag.h
+++ b/include/net/inet_frag.h
@@ -7,11 +7,11 @@ struct netns_frags {
 	struct rhashtable       rhashtable ____cacheline_aligned_in_smp;
 
 	/* Keep atomic mem on separate cachelines in structs that include it */
-	atomic_t		mem ____cacheline_aligned_in_smp;
+	atomic_long_t		mem ____cacheline_aligned_in_smp;
 	/* sysctls */
+	long			high_thresh;
+	long			low_thresh;
 	int			timeout;
-	int			high_thresh;
-	int			low_thresh;
 	int			max_dist;
 	struct inet_frags	*f;
 };
@@ -101,7 +101,7 @@ void inet_frags_fini(struct inet_frags *
 
 static inline int inet_frags_init_net(struct netns_frags *nf)
 {
-	atomic_set(&nf->mem, 0);
+	atomic_long_set(&nf->mem, 0);
 	return rhashtable_init(&nf->rhashtable, &nf->f->rhash_params);
 }
 void inet_frags_exit_net(struct netns_frags *nf);
@@ -118,19 +118,19 @@ static inline void inet_frag_put(struct
 
 /* Memory Tracking Functions. */
 
-static inline int frag_mem_limit(struct netns_frags *nf)
+static inline long frag_mem_limit(const struct netns_frags *nf)
 {
-	return atomic_read(&nf->mem);
+	return atomic_long_read(&nf->mem);
 }
 
-static inline void sub_frag_mem_limit(struct netns_frags *nf, int i)
+static inline void sub_frag_mem_limit(struct netns_frags *nf, long val)
 {
-	atomic_sub(i, &nf->mem);
+	atomic_long_sub(val, &nf->mem);
 }
 
-static inline void add_frag_mem_limit(struct netns_frags *nf, int i)
+static inline void add_frag_mem_limit(struct netns_frags *nf, long val)
 {
-	atomic_add(i, &nf->mem);
+	atomic_long_add(val, &nf->mem);
 }
 
 /* RFC 3168 support :
--- a/net/ieee802154/6lowpan/reassembly.c
+++ b/net/ieee802154/6lowpan/reassembly.c
@@ -410,23 +410,23 @@ err:
 }
 
 #ifdef CONFIG_SYSCTL
-static int zero;
+static long zero;
 
 static struct ctl_table lowpan_frags_ns_ctl_table[] = {
 	{
 		.procname	= "6lowpanfrag_high_thresh",
 		.data		= &init_net.ieee802154_lowpan.frags.high_thresh,
-		.maxlen		= sizeof(int),
+		.maxlen		= sizeof(unsigned long),
 		.mode		= 0644,
-		.proc_handler	= proc_dointvec_minmax,
+		.proc_handler	= proc_doulongvec_minmax,
 		.extra1		= &init_net.ieee802154_lowpan.frags.low_thresh
 	},
 	{
 		.procname	= "6lowpanfrag_low_thresh",
 		.data		= &init_net.ieee802154_lowpan.frags.low_thresh,
-		.maxlen		= sizeof(int),
+		.maxlen		= sizeof(unsigned long),
 		.mode		= 0644,
-		.proc_handler	= proc_dointvec_minmax,
+		.proc_handler	= proc_doulongvec_minmax,
 		.extra1		= &zero,
 		.extra2		= &init_net.ieee802154_lowpan.frags.high_thresh
 	},
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -681,23 +681,23 @@ struct sk_buff *ip_check_defrag(struct n
 EXPORT_SYMBOL(ip_check_defrag);
 
 #ifdef CONFIG_SYSCTL
-static int zero;
+static long zero;
 
 static struct ctl_table ip4_frags_ns_ctl_table[] = {
 	{
 		.procname	= "ipfrag_high_thresh",
 		.data		= &init_net.ipv4.frags.high_thresh,
-		.maxlen		= sizeof(int),
+		.maxlen		= sizeof(unsigned long),
 		.mode		= 0644,
-		.proc_handler	= proc_dointvec_minmax,
+		.proc_handler	= proc_doulongvec_minmax,
 		.extra1		= &init_net.ipv4.frags.low_thresh
 	},
 	{
 		.procname	= "ipfrag_low_thresh",
 		.data		= &init_net.ipv4.frags.low_thresh,
-		.maxlen		= sizeof(int),
+		.maxlen		= sizeof(unsigned long),
 		.mode		= 0644,
-		.proc_handler	= proc_dointvec_minmax,
+		.proc_handler	= proc_doulongvec_minmax,
 		.extra1		= &zero,
 		.extra2		= &init_net.ipv4.frags.high_thresh
 	},
--- a/net/ipv4/proc.c
+++ b/net/ipv4/proc.c
@@ -73,7 +73,7 @@ static int sockstat_seq_show(struct seq_
 		   sock_prot_inuse_get(net, &udplite_prot));
 	seq_printf(seq, "RAW: inuse %d\n",
 		   sock_prot_inuse_get(net, &raw_prot));
-	seq_printf(seq,  "FRAG: inuse %u memory %u\n",
+	seq_printf(seq,  "FRAG: inuse %u memory %lu\n",
 		   atomic_read(&net->ipv4.frags.rhashtable.nelems),
 		   frag_mem_limit(&net->ipv4.frags));
 	return 0;
--- a/net/ipv6/netfilter/nf_conntrack_reasm.c
+++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
@@ -63,7 +63,7 @@ struct nf_ct_frag6_skb_cb
 static struct inet_frags nf_frags;
 
 #ifdef CONFIG_SYSCTL
-static int zero;
+static long zero;
 
 static struct ctl_table nf_ct_frag6_sysctl_table[] = {
 	{
@@ -76,18 +76,18 @@ static struct ctl_table nf_ct_frag6_sysc
 	{
 		.procname	= "nf_conntrack_frag6_low_thresh",
 		.data		= &init_net.nf_frag.frags.low_thresh,
-		.maxlen		= sizeof(unsigned int),
+		.maxlen		= sizeof(unsigned long),
 		.mode		= 0644,
-		.proc_handler	= proc_dointvec_minmax,
+		.proc_handler	= proc_doulongvec_minmax,
 		.extra1		= &zero,
 		.extra2		= &init_net.nf_frag.frags.high_thresh
 	},
 	{
 		.procname	= "nf_conntrack_frag6_high_thresh",
 		.data		= &init_net.nf_frag.frags.high_thresh,
-		.maxlen		= sizeof(unsigned int),
+		.maxlen		= sizeof(unsigned long),
 		.mode		= 0644,
-		.proc_handler	= proc_dointvec_minmax,
+		.proc_handler	= proc_doulongvec_minmax,
 		.extra1		= &init_net.nf_frag.frags.low_thresh
 	},
 	{ }
--- a/net/ipv6/proc.c
+++ b/net/ipv6/proc.c
@@ -47,7 +47,7 @@ static int sockstat6_seq_show(struct seq
 			sock_prot_inuse_get(net, &udplitev6_prot));
 	seq_printf(seq, "RAW6: inuse %d\n",
 		       sock_prot_inuse_get(net, &rawv6_prot));
-	seq_printf(seq, "FRAG6: inuse %u memory %u\n",
+	seq_printf(seq, "FRAG6: inuse %u memory %lu\n",
 		   atomic_read(&net->ipv6.frags.rhashtable.nelems),
 		   frag_mem_limit(&net->ipv6.frags));
 	return 0;
--- a/net/ipv6/reassembly.c
+++ b/net/ipv6/reassembly.c
@@ -546,15 +546,15 @@ static struct ctl_table ip6_frags_ns_ctl
 	{
 		.procname	= "ip6frag_high_thresh",
 		.data		= &init_net.ipv6.frags.high_thresh,
-		.maxlen		= sizeof(int),
+		.maxlen		= sizeof(unsigned long),
 		.mode		= 0644,
-		.proc_handler	= proc_dointvec_minmax,
+		.proc_handler	= proc_doulongvec_minmax,
 		.extra1		= &init_net.ipv6.frags.low_thresh
 	},
 	{
 		.procname	= "ip6frag_low_thresh",
 		.data		= &init_net.ipv6.frags.low_thresh,
-		.maxlen		= sizeof(int),
+		.maxlen		= sizeof(unsigned long),
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec_minmax,
 		.extra1		= &zero,




* [PATCH 4.9 55/71] inet: frags: do not clone skb in ip_expire()
  2018-10-16 17:08 [PATCH 4.9 00/71] 4.9.134-stable review Greg Kroah-Hartman
                   ` (53 preceding siblings ...)
  2018-10-16 17:09 ` [PATCH 4.9 54/71] inet: frags: break the 2GB limit for frags storage Greg Kroah-Hartman
@ 2018-10-16 17:09 ` Greg Kroah-Hartman
  2018-10-16 17:09 ` [PATCH 4.9 56/71] ipv6: frags: rewrite ip6_expire_frag_queue() Greg Kroah-Hartman
                   ` (20 subsequent siblings)
  75 siblings, 0 replies; 82+ messages in thread
From: Greg Kroah-Hartman @ 2018-10-16 17:09 UTC (permalink / raw)
  To: linux-kernel, netdev
  Cc: Greg Kroah-Hartman, stable, Eric Dumazet, David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <edumazet@google.com>

An skb_clone() was added in commit ec4fbd64751d ("inet: frag: release
spinlock before calling icmp_send()").

While that commit fixed the bug at the time, it also added a very high
cost for DDOS frags, as the ICMP rate limit is applied after this
expensive operation (skb_clone() + consume_skb(), implying memory
allocation, copy, and freeing).

We can use skb_get(head) here; all we want is to make sure the skb
won't be freed by another cpu.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 1eec5d5670084ee644597bd26c25e22c69b9f748)
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/ipv4/ip_fragment.c |   16 ++++++----------
 1 file changed, 6 insertions(+), 10 deletions(-)

--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -141,8 +141,8 @@ static bool frag_expire_skip_icmp(u32 us
  */
 static void ip_expire(unsigned long arg)
 {
-	struct sk_buff *clone, *head;
 	const struct iphdr *iph;
+	struct sk_buff *head;
 	struct net *net;
 	struct ipq *qp;
 	int err;
@@ -185,16 +185,12 @@ static void ip_expire(unsigned long arg)
 	    (skb_rtable(head)->rt_type != RTN_LOCAL))
 		goto out;
 
-	clone = skb_clone(head, GFP_ATOMIC);
+	skb_get(head);
+	spin_unlock(&qp->q.lock);
+	icmp_send(head, ICMP_TIME_EXCEEDED, ICMP_EXC_FRAGTIME, 0);
+	kfree_skb(head);
+	goto out_rcu_unlock;
 
-	/* Send an ICMP "Fragment Reassembly Timeout" message. */
-	if (clone) {
-		spin_unlock(&qp->q.lock);
-		icmp_send(clone, ICMP_TIME_EXCEEDED,
-			  ICMP_EXC_FRAGTIME, 0);
-		consume_skb(clone);
-		goto out_rcu_unlock;
-	}
 out:
 	spin_unlock(&qp->q.lock);
 out_rcu_unlock:



^ permalink raw reply	[flat|nested] 82+ messages in thread
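
For readers following the skb_clone() -> skb_get() change in 55/71, here is a
minimal user-space sketch of the idea, not the kernel code. struct toy_buf,
toy_get(), toy_put() and toy_expire() are hypothetical names invented for this
illustration, and the plain int counter is an assumption (the kernel uses an
atomic reference count). The point is only that pinning a buffer costs one
increment, whereas cloning would allocate and copy.

```c
#include <assert.h>

/* Hypothetical stand-in for struct sk_buff: only a reference count. */
struct toy_buf {
	int users;	/* number of holders (not atomic in this sketch) */
	int freed;	/* set instead of calling free(), so it can be inspected */
};

/* Analogue of skb_get(): take one more reference, no allocation, no copy. */
static struct toy_buf *toy_get(struct toy_buf *b)
{
	b->users++;
	return b;
}

/* Analogue of kfree_skb(): drop one reference, release on the last one. */
static void toy_put(struct toy_buf *b)
{
	assert(b->users > 0);
	if (--b->users == 0)
		b->freed = 1;
}

/* Expire path: pin the head, drop the lock, use the buffer, then unpin. */
static int toy_expire(struct toy_buf *head)
{
	int users_while_sending;

	toy_get(head);		/* pin before the queue lock is released */
	/* ... the queue lock would be released here ... */
	toy_put(head);		/* simulates another CPU dropping the queue's ref */
	users_while_sending = head->users;	/* our pin keeps the buffer alive */
	toy_put(head);		/* drop our pin; the last reference frees */
	return users_while_sending;
}
```

With a buffer that starts with a single reference, toy_expire() still sees one
holder while "sending", and the buffer is released only after the final put.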

* [PATCH 4.9 56/71] ipv6: frags: rewrite ip6_expire_frag_queue()
  2018-10-16 17:08 [PATCH 4.9 00/71] 4.9.134-stable review Greg Kroah-Hartman
                   ` (54 preceding siblings ...)
  2018-10-16 17:09 ` [PATCH 4.9 55/71] inet: frags: do not clone skb in ip_expire() Greg Kroah-Hartman
@ 2018-10-16 17:09 ` Greg Kroah-Hartman
  2018-10-16 17:09 ` [PATCH 4.9 57/71] rhashtable: reorganize struct rhashtable layout Greg Kroah-Hartman
                   ` (19 subsequent siblings)
  75 siblings, 0 replies; 82+ messages in thread
From: Greg Kroah-Hartman @ 2018-10-16 17:09 UTC (permalink / raw)
  To: linux-kernel, netdev
  Cc: Greg Kroah-Hartman, stable, Eric Dumazet, David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <edumazet@google.com>

Make it similar to IPv4 ip_expire(), and release the lock
before calling icmp functions.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 05c0b86b9696802fd0ce5676a92a63f1b455bdf3)
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/ipv6/reassembly.c |   24 ++++++++++++++++--------
 1 file changed, 16 insertions(+), 8 deletions(-)

--- a/net/ipv6/reassembly.c
+++ b/net/ipv6/reassembly.c
@@ -92,7 +92,9 @@ EXPORT_SYMBOL(ip6_frag_init);
 void ip6_expire_frag_queue(struct net *net, struct frag_queue *fq)
 {
 	struct net_device *dev = NULL;
+	struct sk_buff *head;
 
+	rcu_read_lock();
 	spin_lock(&fq->q.lock);
 
 	if (fq->q.flags & INET_FRAG_COMPLETE)
@@ -100,28 +102,34 @@ void ip6_expire_frag_queue(struct net *n
 
 	inet_frag_kill(&fq->q);
 
-	rcu_read_lock();
 	dev = dev_get_by_index_rcu(net, fq->iif);
 	if (!dev)
-		goto out_rcu_unlock;
+		goto out;
 
 	__IP6_INC_STATS(net, __in6_dev_get(dev), IPSTATS_MIB_REASMFAILS);
 	__IP6_INC_STATS(net, __in6_dev_get(dev), IPSTATS_MIB_REASMTIMEOUT);
 
 	/* Don't send error if the first segment did not arrive. */
-	if (!(fq->q.flags & INET_FRAG_FIRST_IN) || !fq->q.fragments)
-		goto out_rcu_unlock;
+	head = fq->q.fragments;
+	if (!(fq->q.flags & INET_FRAG_FIRST_IN) || !head)
+		goto out;
 
 	/* But use as source device on which LAST ARRIVED
 	 * segment was received. And do not use fq->dev
 	 * pointer directly, device might already disappeared.
 	 */
-	fq->q.fragments->dev = dev;
-	icmpv6_send(fq->q.fragments, ICMPV6_TIME_EXCEED, ICMPV6_EXC_FRAGTIME, 0);
-out_rcu_unlock:
-	rcu_read_unlock();
+	head->dev = dev;
+	skb_get(head);
+	spin_unlock(&fq->q.lock);
+
+	icmpv6_send(head, ICMPV6_TIME_EXCEED, ICMPV6_EXC_FRAGTIME, 0);
+	kfree_skb(head);
+	goto out_rcu_unlock;
+
 out:
 	spin_unlock(&fq->q.lock);
+out_rcu_unlock:
+	rcu_read_unlock();
 	inet_frag_put(&fq->q);
 }
 EXPORT_SYMBOL(ip6_expire_frag_queue);



^ permalink raw reply	[flat|nested] 82+ messages in thread

* [PATCH 4.9 57/71] rhashtable: reorganize struct rhashtable layout
  2018-10-16 17:08 [PATCH 4.9 00/71] 4.9.134-stable review Greg Kroah-Hartman
                   ` (55 preceding siblings ...)
  2018-10-16 17:09 ` [PATCH 4.9 56/71] ipv6: frags: rewrite ip6_expire_frag_queue() Greg Kroah-Hartman
@ 2018-10-16 17:09 ` Greg Kroah-Hartman
  2018-10-16 17:09 ` [PATCH 4.9 58/71] inet: frags: reorganize struct netns_frags Greg Kroah-Hartman
                   ` (18 subsequent siblings)
  75 siblings, 0 replies; 82+ messages in thread
From: Greg Kroah-Hartman @ 2018-10-16 17:09 UTC (permalink / raw)
  To: linux-kernel, netdev
  Cc: Greg Kroah-Hartman, stable, Eric Dumazet, David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <edumazet@google.com>

While under frags DDOS I noticed unfortunate false sharing between
@nelems and @params.automatic_shrinking.

Move @nelems to the end of struct rhashtable so that the first cache
line, which is almost never dirtied, can stay shared between all cpus.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit e5d672a0780d9e7118caad4c171ec88b8299398d)
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 include/linux/rhashtable.h |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/include/linux/rhashtable.h
+++ b/include/linux/rhashtable.h
@@ -138,7 +138,6 @@ struct rhashtable_params {
 /**
  * struct rhashtable - Hash table handle
  * @tbl: Bucket table
- * @nelems: Number of elements in table
  * @key_len: Key length for hashfn
  * @elasticity: Maximum chain length before rehash
  * @p: Configuration parameters
@@ -146,10 +145,10 @@ struct rhashtable_params {
  * @run_work: Deferred worker to expand/shrink asynchronously
  * @mutex: Mutex to protect current/future table swapping
  * @lock: Spin lock to protect walker list
+ * @nelems: Number of elements in table
  */
 struct rhashtable {
 	struct bucket_table __rcu	*tbl;
-	atomic_t			nelems;
 	unsigned int			key_len;
 	unsigned int			elasticity;
 	struct rhashtable_params	p;
@@ -157,6 +156,7 @@ struct rhashtable {
 	struct work_struct		run_work;
 	struct mutex                    mutex;
 	spinlock_t			lock;
+	atomic_t			nelems;
 };
 
 /**



^ permalink raw reply	[flat|nested] 82+ messages in thread

* [PATCH 4.9 58/71] inet: frags: reorganize struct netns_frags
  2018-10-16 17:08 [PATCH 4.9 00/71] 4.9.134-stable review Greg Kroah-Hartman
                   ` (56 preceding siblings ...)
  2018-10-16 17:09 ` [PATCH 4.9 57/71] rhashtable: reorganize struct rhashtable layout Greg Kroah-Hartman
@ 2018-10-16 17:09 ` Greg Kroah-Hartman
  2018-10-16 17:09 ` [PATCH 4.9 59/71] inet: frags: get rid of ipfrag_skb_cb/FRAG_CB Greg Kroah-Hartman
                   ` (17 subsequent siblings)
  75 siblings, 0 replies; 82+ messages in thread
From: Greg Kroah-Hartman @ 2018-10-16 17:09 UTC (permalink / raw)
  To: linux-kernel, netdev
  Cc: Greg Kroah-Hartman, stable, Eric Dumazet, David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <edumazet@google.com>

Put the read-mostly fields in a separate cache line
at the beginning of struct netns_frags, to reduce the
false sharing noticed in inet_frag_kill().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit c2615cf5a761b32bf74e85bddc223dfff3d9b9f0)
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 include/net/inet_frag.h |    9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

--- a/include/net/inet_frag.h
+++ b/include/net/inet_frag.h
@@ -4,16 +4,17 @@
 #include <linux/rhashtable.h>
 
 struct netns_frags {
-	struct rhashtable       rhashtable ____cacheline_aligned_in_smp;
-
-	/* Keep atomic mem on separate cachelines in structs that include it */
-	atomic_long_t		mem ____cacheline_aligned_in_smp;
 	/* sysctls */
 	long			high_thresh;
 	long			low_thresh;
 	int			timeout;
 	int			max_dist;
 	struct inet_frags	*f;
+
+	struct rhashtable       rhashtable ____cacheline_aligned_in_smp;
+
+	/* Keep atomic mem on separate cachelines in structs that include it */
+	atomic_long_t		mem ____cacheline_aligned_in_smp;
 };
 
 /**



^ permalink raw reply	[flat|nested] 82+ messages in thread
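
The layout idea behind 57/71 and 58/71 can be checked in user space with
offsetof(). This is a rough sketch under stated assumptions: struct toy_frags
and toy_line_of() are hypothetical, the 64-byte line size is assumed (it is
per-architecture in the kernel), and C11 _Alignas stands in for
____cacheline_aligned_in_smp.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_CACHELINE 64	/* assumed line size; the real value is per-arch */

/* Hypothetical miniature of struct netns_frags after the reordering. */
struct toy_frags {
	/* read-mostly configuration, grouped at the start */
	long high_thresh;
	long low_thresh;
	int timeout;
	int max_dist;

	/* frequently written counter, pushed to its own cache line */
	_Alignas(TOY_CACHELINE) long mem;
};

/* Compile-time check that the hot counter starts a new cache line. */
static_assert(offsetof(struct toy_frags, mem) % TOY_CACHELINE == 0,
	      "mem must start a cache line");

/* Index of the cache line that a given byte offset falls into. */
static size_t toy_line_of(size_t offset)
{
	return offset / TOY_CACHELINE;
}
```

Writers that dirty mem then invalidate only the second line, so readers of the
thresholds keep their copy of the first line.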

* [PATCH 4.9 59/71] inet: frags: get rid of ipfrag_skb_cb/FRAG_CB
  2018-10-16 17:08 [PATCH 4.9 00/71] 4.9.134-stable review Greg Kroah-Hartman
                   ` (57 preceding siblings ...)
  2018-10-16 17:09 ` [PATCH 4.9 58/71] inet: frags: reorganize struct netns_frags Greg Kroah-Hartman
@ 2018-10-16 17:09 ` Greg Kroah-Hartman
  2018-10-16 17:09 ` [PATCH 4.9 60/71] inet: frags: fix ip6frag_low_thresh boundary Greg Kroah-Hartman
                   ` (16 subsequent siblings)
  75 siblings, 0 replies; 82+ messages in thread
From: Greg Kroah-Hartman @ 2018-10-16 17:09 UTC (permalink / raw)
  To: linux-kernel, netdev
  Cc: Greg Kroah-Hartman, stable, Eric Dumazet, David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <edumazet@google.com>

ip_defrag uses skb->cb[] to store the fragment offset, and unfortunately
this integer is currently in a different cache line than skb->next,
meaning that we use two cache lines per skb when finding the insertion point.

By aliasing skb->ip_defrag_offset and skb->dev, we pack all the fields
in a single cache line and save precious memory bandwidth.

Note that after the fast path added by Changli Gao in commit
d6bebca92c66 ("fragment: add fast path for in-order fragments")
this change won't help the fast path, since we still need
to access prev->len (2nd cache line), but it will show great
benefits when the slow path is entered, since we perform
a linear scan of a potentially long list.

Also, note that this potentially long list is an attack vector;
we might consider also using an rb-tree there eventually.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit bf66337140c64c27fa37222b7abca7e49d63fb57)
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 include/linux/skbuff.h |    5 +++++
 1 file changed, 5 insertions(+)

--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -645,6 +645,11 @@ struct sk_buff {
 		};
 		struct rb_node	rbnode; /* used in netem & tcp stack */
 	};
+
+	union {
+		int			ip_defrag_offset;
+	};
+
 	struct sock		*sk;
 	struct net_device	*dev;
 



^ permalink raw reply	[flat|nested] 82+ messages in thread
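
The aliasing described in the 59/71 commit message can be sketched in user
space as below. This is an illustration of the concept only: struct toy_skb,
struct toy_dev and toy_queue() are hypothetical, and the union shown here
follows the commit message (offset sharing storage with the device pointer),
which may not match the exact layout of the hunk in this backport. The
ordering constraint is the interesting part: anything needed from dev has to
be read before the offset is stored.

```c
#include <assert.h>
#include <stddef.h>

struct toy_dev {
	int ifindex;
};

/* Hypothetical miniature skb: pointer and offset share the same storage. */
struct toy_skb {
	union {
		struct toy_dev *dev;	/* valid until the fragment is queued */
		int defrag_offset;	/* valid once the fragment is queued */
	};
};

/* Queue a fragment: save what we need from dev, then reuse its storage. */
static int toy_queue(struct toy_skb *skb, int offset, int *iif)
{
	struct toy_dev *dev;

	assert(skb != NULL && iif != NULL);
	dev = skb->dev;			/* must be read before the store below */
	if (dev)
		*iif = dev->ifindex;
	skb->defrag_offset = offset;	/* overwrites (part of) skb->dev */
	return skb->defrag_offset;
}
```

After toy_queue() returns, skb->dev must not be dereferenced any more, since
its bytes now hold the fragment offset.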

* [PATCH 4.9 60/71] inet: frags: fix ip6frag_low_thresh boundary
  2018-10-16 17:08 [PATCH 4.9 00/71] 4.9.134-stable review Greg Kroah-Hartman
                   ` (58 preceding siblings ...)
  2018-10-16 17:09 ` [PATCH 4.9 59/71] inet: frags: get rid of ipfrag_skb_cb/FRAG_CB Greg Kroah-Hartman
@ 2018-10-16 17:09 ` Greg Kroah-Hartman
  2018-10-16 17:09 ` [PATCH 4.9 61/71] ip: discard IPv4 datagrams with overlapping segments Greg Kroah-Hartman
                   ` (15 subsequent siblings)
  75 siblings, 0 replies; 82+ messages in thread
From: Greg Kroah-Hartman @ 2018-10-16 17:09 UTC (permalink / raw)
  To: linux-kernel, netdev
  Cc: Greg Kroah-Hartman, stable, Eric Dumazet,
	Maciej Żenczykowski, David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <edumazet@google.com>

Giving an integer to proc_doulongvec_minmax() is dangerous on 64bit arches,
since the linker might place a non-zero value next to it, preventing a change
to ip6frag_low_thresh.

ip6frag_low_thresh is not used anymore in the kernel, but we do not
want to prematurely break user scripts wanting to change it.

Since specifying a minimal value of 0 for proc_doulongvec_minmax()
is moot, let's remove these zero values in all defrag units.

Fixes: 6e00f7dd5e4e ("ipv6: frags: fix /proc/sys/net/ipv6/ip6frag_low_thresh")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 3d23401283e80ceb03f765842787e0e79ff598b7)
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/ieee802154/6lowpan/reassembly.c     |    2 -
 net/ipv4/ip_fragment.c                  |   40 ++++++++++++--------------------
 net/ipv6/netfilter/nf_conntrack_reasm.c |    2 -
 net/ipv6/reassembly.c                   |    4 ---
 4 files changed, 17 insertions(+), 31 deletions(-)

--- a/net/ieee802154/6lowpan/reassembly.c
+++ b/net/ieee802154/6lowpan/reassembly.c
@@ -410,7 +410,6 @@ err:
 }
 
 #ifdef CONFIG_SYSCTL
-static long zero;
 
 static struct ctl_table lowpan_frags_ns_ctl_table[] = {
 	{
@@ -427,7 +426,6 @@ static struct ctl_table lowpan_frags_ns_
 		.maxlen		= sizeof(unsigned long),
 		.mode		= 0644,
 		.proc_handler	= proc_doulongvec_minmax,
-		.extra1		= &zero,
 		.extra2		= &init_net.ieee802154_lowpan.frags.high_thresh
 	},
 	{
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -56,14 +56,6 @@
  */
 static const char ip_frag_cache_name[] = "ip4-frags";
 
-struct ipfrag_skb_cb
-{
-	struct inet_skb_parm	h;
-	int			offset;
-};
-
-#define FRAG_CB(skb)	((struct ipfrag_skb_cb *)((skb)->cb))
-
 /* Describe an entry in the "incomplete datagrams" queue. */
 struct ipq {
 	struct inet_frag_queue q;
@@ -351,13 +343,13 @@ static int ip_frag_queue(struct ipq *qp,
 	 * this fragment, right?
 	 */
 	prev = qp->q.fragments_tail;
-	if (!prev || FRAG_CB(prev)->offset < offset) {
+	if (!prev || prev->ip_defrag_offset < offset) {
 		next = NULL;
 		goto found;
 	}
 	prev = NULL;
 	for (next = qp->q.fragments; next != NULL; next = next->next) {
-		if (FRAG_CB(next)->offset >= offset)
+		if (next->ip_defrag_offset >= offset)
 			break;	/* bingo! */
 		prev = next;
 	}
@@ -368,7 +360,7 @@ found:
 	 * any overlaps are eliminated.
 	 */
 	if (prev) {
-		int i = (FRAG_CB(prev)->offset + prev->len) - offset;
+		int i = (prev->ip_defrag_offset + prev->len) - offset;
 
 		if (i > 0) {
 			offset += i;
@@ -385,8 +377,8 @@ found:
 
 	err = -ENOMEM;
 
-	while (next && FRAG_CB(next)->offset < end) {
-		int i = end - FRAG_CB(next)->offset; /* overlap is 'i' bytes */
+	while (next && next->ip_defrag_offset < end) {
+		int i = end - next->ip_defrag_offset; /* overlap is 'i' bytes */
 
 		if (i < next->len) {
 			int delta = -next->truesize;
@@ -399,7 +391,7 @@ found:
 			delta += next->truesize;
 			if (delta)
 				add_frag_mem_limit(qp->q.net, delta);
-			FRAG_CB(next)->offset += i;
+			next->ip_defrag_offset += i;
 			qp->q.meat -= i;
 			if (next->ip_summed != CHECKSUM_UNNECESSARY)
 				next->ip_summed = CHECKSUM_NONE;
@@ -423,7 +415,13 @@ found:
 		}
 	}
 
-	FRAG_CB(skb)->offset = offset;
+	/* Note : skb->ip_defrag_offset and skb->dev share the same location */
+	dev = skb->dev;
+	if (dev)
+		qp->iif = dev->ifindex;
+	/* Makes sure compiler wont do silly aliasing games */
+	barrier();
+	skb->ip_defrag_offset = offset;
 
 	/* Insert this fragment in the chain of fragments. */
 	skb->next = next;
@@ -434,11 +432,6 @@ found:
 	else
 		qp->q.fragments = skb;
 
-	dev = skb->dev;
-	if (dev) {
-		qp->iif = dev->ifindex;
-		skb->dev = NULL;
-	}
 	qp->q.stamp = skb->tstamp;
 	qp->q.meat += skb->len;
 	qp->ecn |= ecn;
@@ -514,7 +507,7 @@ static int ip_frag_reasm(struct ipq *qp,
 	}
 
 	WARN_ON(!head);
-	WARN_ON(FRAG_CB(head)->offset != 0);
+	WARN_ON(head->ip_defrag_offset != 0);
 
 	/* Allocate a new buffer for the datagram. */
 	ihlen = ip_hdrlen(head);
@@ -677,7 +670,7 @@ struct sk_buff *ip_check_defrag(struct n
 EXPORT_SYMBOL(ip_check_defrag);
 
 #ifdef CONFIG_SYSCTL
-static long zero;
+static int dist_min;
 
 static struct ctl_table ip4_frags_ns_ctl_table[] = {
 	{
@@ -694,7 +687,6 @@ static struct ctl_table ip4_frags_ns_ctl
 		.maxlen		= sizeof(unsigned long),
 		.mode		= 0644,
 		.proc_handler	= proc_doulongvec_minmax,
-		.extra1		= &zero,
 		.extra2		= &init_net.ipv4.frags.high_thresh
 	},
 	{
@@ -710,7 +702,7 @@ static struct ctl_table ip4_frags_ns_ctl
 		.maxlen		= sizeof(int),
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec_minmax,
-		.extra1		= &zero
+		.extra1		= &dist_min,
 	},
 	{ }
 };
--- a/net/ipv6/netfilter/nf_conntrack_reasm.c
+++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
@@ -63,7 +63,6 @@ struct nf_ct_frag6_skb_cb
 static struct inet_frags nf_frags;
 
 #ifdef CONFIG_SYSCTL
-static long zero;
 
 static struct ctl_table nf_ct_frag6_sysctl_table[] = {
 	{
@@ -79,7 +78,6 @@ static struct ctl_table nf_ct_frag6_sysc
 		.maxlen		= sizeof(unsigned long),
 		.mode		= 0644,
 		.proc_handler	= proc_doulongvec_minmax,
-		.extra1		= &zero,
 		.extra2		= &init_net.nf_frag.frags.high_thresh
 	},
 	{
--- a/net/ipv6/reassembly.c
+++ b/net/ipv6/reassembly.c
@@ -548,7 +548,6 @@ static const struct inet6_protocol frag_
 };
 
 #ifdef CONFIG_SYSCTL
-static int zero;
 
 static struct ctl_table ip6_frags_ns_ctl_table[] = {
 	{
@@ -564,8 +563,7 @@ static struct ctl_table ip6_frags_ns_ctl
 		.data		= &init_net.ipv6.frags.low_thresh,
 		.maxlen		= sizeof(unsigned long),
 		.mode		= 0644,
-		.proc_handler	= proc_dointvec_minmax,
-		.extra1		= &zero,
+		.proc_handler	= proc_doulongvec_minmax,
 		.extra2		= &init_net.ipv6.frags.high_thresh
 	},
 	{



^ permalink raw reply	[flat|nested] 82+ messages in thread
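
To make the hazard fixed in 60/71 concrete, here is a small user-space sketch,
assuming nothing about the real sysctl code beyond what the commit message
says. struct toy_layout and toy_read_min_as_ulong() are hypothetical; the
struct merely mimics "static int zero;" followed by whatever the linker put
next to it. Reading the bound with the width of an unsigned long pulls in the
neighbouring bytes on arches where long is wider than int.

```c
#include <assert.h>
#include <string.h>

/* Two adjacent ints: the intended bound and an unrelated neighbour. */
struct toy_layout {
	int zero;
	int neighbour;
};

static_assert(sizeof(struct toy_layout) == 2 * sizeof(int),
	      "no padding expected between the two ints");

/*
 * Read the bound the way a long-based handler would: sizeof(unsigned long)
 * bytes starting at &zero. memcpy() keeps the sketch free of aliasing UB.
 */
static unsigned long toy_read_min_as_ulong(const struct toy_layout *l)
{
	unsigned long v = 0;
	size_t n = sizeof(v) < sizeof(*l) ? sizeof(v) : sizeof(*l);

	memcpy(&v, l, n);
	return v;
}
```

Where unsigned long is twice the size of int, a non-zero neighbour turns the
intended minimum of 0 into a large value, so ordinary writes would be rejected.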

* [PATCH 4.9 61/71] ip: discard IPv4 datagrams with overlapping segments.
  2018-10-16 17:08 [PATCH 4.9 00/71] 4.9.134-stable review Greg Kroah-Hartman
                   ` (59 preceding siblings ...)
  2018-10-16 17:09 ` [PATCH 4.9 60/71] inet: frags: fix ip6frag_low_thresh boundary Greg Kroah-Hartman
@ 2018-10-16 17:09 ` Greg Kroah-Hartman
  2018-10-16 17:09 ` [PATCH 4.9 62/71] net: speed up skb_rbtree_purge() Greg Kroah-Hartman
                   ` (14 subsequent siblings)
  75 siblings, 0 replies; 82+ messages in thread
From: Greg Kroah-Hartman @ 2018-10-16 17:09 UTC (permalink / raw)
  To: linux-kernel, netdev
  Cc: Greg Kroah-Hartman, stable, David S. Miller, Peter Oskolkov,
	Eric Dumazet, Florian Westphal, Stephen Hemminger

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Peter Oskolkov <posk@google.com>

This behavior is required in IPv6, and there is little need
to tolerate overlapping fragments in IPv4. This change
simplifies the code and eliminates potential DDoS attack vectors.

Tested: ran ip_defrag selftest (not yet available upstream).

Suggested-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Peter Oskolkov <posk@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Florian Westphal <fw@strlen.de>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 7969e5c40dfd04799d4341f1b7cd266b6e47f227)
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 include/uapi/linux/snmp.h |    1 
 net/ipv4/ip_fragment.c    |   77 +++++++++++-----------------------------------
 net/ipv4/proc.c           |    1 
 3 files changed, 22 insertions(+), 57 deletions(-)

--- a/include/uapi/linux/snmp.h
+++ b/include/uapi/linux/snmp.h
@@ -55,6 +55,7 @@ enum
 	IPSTATS_MIB_ECT1PKTS,			/* InECT1Pkts */
 	IPSTATS_MIB_ECT0PKTS,			/* InECT0Pkts */
 	IPSTATS_MIB_CEPKTS,			/* InCEPkts */
+	IPSTATS_MIB_REASM_OVERLAPS,		/* ReasmOverlaps */
 	__IPSTATS_MIB_MAX
 };
 
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -275,6 +275,7 @@ static int ip_frag_reinit(struct ipq *qp
 /* Add new segment to existing queue. */
 static int ip_frag_queue(struct ipq *qp, struct sk_buff *skb)
 {
+	struct net *net = container_of(qp->q.net, struct net, ipv4.frags);
 	struct sk_buff *prev, *next;
 	struct net_device *dev;
 	unsigned int fragsize;
@@ -355,65 +356,23 @@ static int ip_frag_queue(struct ipq *qp,
 	}
 
 found:
-	/* We found where to put this one.  Check for overlap with
-	 * preceding fragment, and, if needed, align things so that
-	 * any overlaps are eliminated.
+	/* RFC5722, Section 4, amended by Errata ID : 3089
+	 *                          When reassembling an IPv6 datagram, if
+	 *   one or more its constituent fragments is determined to be an
+	 *   overlapping fragment, the entire datagram (and any constituent
+	 *   fragments) MUST be silently discarded.
+	 *
+	 * We do the same here for IPv4.
 	 */
-	if (prev) {
-		int i = (prev->ip_defrag_offset + prev->len) - offset;
 
-		if (i > 0) {
-			offset += i;
-			err = -EINVAL;
-			if (end <= offset)
-				goto err;
-			err = -ENOMEM;
-			if (!pskb_pull(skb, i))
-				goto err;
-			if (skb->ip_summed != CHECKSUM_UNNECESSARY)
-				skb->ip_summed = CHECKSUM_NONE;
-		}
-	}
-
-	err = -ENOMEM;
-
-	while (next && next->ip_defrag_offset < end) {
-		int i = end - next->ip_defrag_offset; /* overlap is 'i' bytes */
-
-		if (i < next->len) {
-			int delta = -next->truesize;
-
-			/* Eat head of the next overlapped fragment
-			 * and leave the loop. The next ones cannot overlap.
-			 */
-			if (!pskb_pull(next, i))
-				goto err;
-			delta += next->truesize;
-			if (delta)
-				add_frag_mem_limit(qp->q.net, delta);
-			next->ip_defrag_offset += i;
-			qp->q.meat -= i;
-			if (next->ip_summed != CHECKSUM_UNNECESSARY)
-				next->ip_summed = CHECKSUM_NONE;
-			break;
-		} else {
-			struct sk_buff *free_it = next;
-
-			/* Old fragment is completely overridden with
-			 * new one drop it.
-			 */
-			next = next->next;
-
-			if (prev)
-				prev->next = next;
-			else
-				qp->q.fragments = next;
-
-			qp->q.meat -= free_it->len;
-			sub_frag_mem_limit(qp->q.net, free_it->truesize);
-			kfree_skb(free_it);
-		}
-	}
+	/* Is there an overlap with the previous fragment? */
+	if (prev &&
+	    (prev->ip_defrag_offset + prev->len) > offset)
+		goto discard_qp;
+
+	/* Is there an overlap with the next fragment? */
+	if (next && next->ip_defrag_offset < end)
+		goto discard_qp;
 
 	/* Note : skb->ip_defrag_offset and skb->dev share the same location */
 	dev = skb->dev;
@@ -461,6 +420,10 @@ found:
 	skb_dst_drop(skb);
 	return -EINPROGRESS;
 
+discard_qp:
+	inet_frag_kill(&qp->q);
+	err = -EINVAL;
+	__IP_INC_STATS(net, IPSTATS_MIB_REASM_OVERLAPS);
 err:
 	kfree_skb(skb);
 	return err;
--- a/net/ipv4/proc.c
+++ b/net/ipv4/proc.c
@@ -134,6 +134,7 @@ static const struct snmp_mib snmp4_ipext
 	SNMP_MIB_ITEM("InECT1Pkts", IPSTATS_MIB_ECT1PKTS),
 	SNMP_MIB_ITEM("InECT0Pkts", IPSTATS_MIB_ECT0PKTS),
 	SNMP_MIB_ITEM("InCEPkts", IPSTATS_MIB_CEPKTS),
+	SNMP_MIB_ITEM("ReasmOverlaps", IPSTATS_MIB_REASM_OVERLAPS),
 	SNMP_MIB_SENTINEL
 };
 



^ permalink raw reply	[flat|nested] 82+ messages in thread
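
The two overlap tests that 61/71 leaves in ip_frag_queue() boil down to the
predicate below. It is a stand-alone sketch: struct toy_frag and
toy_overlaps() are hypothetical names, and the new fragment is assumed to
cover the half-open byte range [offset, end).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical queued fragment: start offset and payload length. */
struct toy_frag {
	int offset;
	int len;
};

/*
 * Return true if a new fragment covering [offset, end) overlaps either
 * neighbour. prev and next may be NULL when there is no such neighbour.
 */
static bool toy_overlaps(const struct toy_frag *prev,
			 const struct toy_frag *next,
			 int offset, int end)
{
	assert(offset <= end);

	/* Does the previous fragment run past our start? */
	if (prev && prev->offset + prev->len > offset)
		return true;

	/* Does the next fragment start before our end? */
	if (next && next->offset < end)
		return true;

	return false;
}
```

In the patch, a true result leads to the whole queue being discarded rather
than to the fragments being trimmed to fit.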

* [PATCH 4.9 62/71] net: speed up skb_rbtree_purge()
  2018-10-16 17:08 [PATCH 4.9 00/71] 4.9.134-stable review Greg Kroah-Hartman
                   ` (60 preceding siblings ...)
  2018-10-16 17:09 ` [PATCH 4.9 61/71] ip: discard IPv4 datagrams with overlapping segments Greg Kroah-Hartman
@ 2018-10-16 17:09 ` Greg Kroah-Hartman
  2018-10-16 17:10 ` [PATCH 4.9 63/71] net: modify skb_rbtree_purge to return the truesize of all purged skbs Greg Kroah-Hartman
                   ` (13 subsequent siblings)
  75 siblings, 0 replies; 82+ messages in thread
From: Greg Kroah-Hartman @ 2018-10-16 17:09 UTC (permalink / raw)
  To: linux-kernel, netdev; +Cc: Greg Kroah-Hartman, stable, David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <edumazet@google.com>

As measured in my prior patch ("sch_netem: faster rb tree removal"),
rbtree_postorder_for_each_entry_safe() is nice looking but much slower
than using rb_next() directly, except when the tree is small enough
to fit in CPU caches (then the cost is the same).

Also note that there is not even an increase of text size :
$ size net/core/skbuff.o.before net/core/skbuff.o
   text	   data	    bss	    dec	    hex	filename
  40711	   1298	      0	  42009	   a419	net/core/skbuff.o.before
  40711	   1298	      0	  42009	   a419	net/core/skbuff.o

Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 7c90584c66cc4b033a3b684b0e0950f79e7b7166)
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/core/skbuff.c |   11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -2433,12 +2433,15 @@ EXPORT_SYMBOL(skb_queue_purge);
  */
 void skb_rbtree_purge(struct rb_root *root)
 {
-	struct sk_buff *skb, *next;
+	struct rb_node *p = rb_first(root);
 
-	rbtree_postorder_for_each_entry_safe(skb, next, root, rbnode)
-		kfree_skb(skb);
+	while (p) {
+		struct sk_buff *skb = rb_entry(p, struct sk_buff, rbnode);
 
-	*root = RB_ROOT;
+		p = rb_next(p);
+		rb_erase(&skb->rbnode, root);
+		kfree_skb(skb);
+	}
 }
 
 /**



^ permalink raw reply	[flat|nested] 82+ messages in thread

* [PATCH 4.9 63/71] net: modify skb_rbtree_purge to return the truesize of all purged skbs.
  2018-10-16 17:08 [PATCH 4.9 00/71] 4.9.134-stable review Greg Kroah-Hartman
                   ` (61 preceding siblings ...)
  2018-10-16 17:09 ` [PATCH 4.9 62/71] net: speed up skb_rbtree_purge() Greg Kroah-Hartman
@ 2018-10-16 17:10 ` Greg Kroah-Hartman
  2018-10-16 17:10 ` [PATCH 4.9 64/71] ipv6: defrag: drop non-last frags smaller than min mtu Greg Kroah-Hartman
                   ` (12 subsequent siblings)
  75 siblings, 0 replies; 82+ messages in thread
From: Greg Kroah-Hartman @ 2018-10-16 17:10 UTC (permalink / raw)
  To: linux-kernel, netdev
  Cc: Greg Kroah-Hartman, stable, Eric Dumazet, Peter Oskolkov,
	Florian Westphal, David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Peter Oskolkov <posk@google.com>

Tested: see the next patch in the series.

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Peter Oskolkov <posk@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 385114dec8a49b5e5945e77ba7de6356106713f4)
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 include/linux/skbuff.h |    2 +-
 net/core/skbuff.c      |    6 +++++-
 2 files changed, 6 insertions(+), 2 deletions(-)

--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2418,7 +2418,7 @@ static inline void __skb_queue_purge(str
 		kfree_skb(skb);
 }
 
-void skb_rbtree_purge(struct rb_root *root);
+unsigned int skb_rbtree_purge(struct rb_root *root);
 
 void *netdev_alloc_frag(unsigned int fragsz);
 
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -2425,23 +2425,27 @@ EXPORT_SYMBOL(skb_queue_purge);
 /**
  *	skb_rbtree_purge - empty a skb rbtree
  *	@root: root of the rbtree to empty
+ *	Return value: the sum of truesizes of all purged skbs.
  *
  *	Delete all buffers on an &sk_buff rbtree. Each buffer is removed from
  *	the list and one reference dropped. This function does not take
  *	any lock. Synchronization should be handled by the caller (e.g., TCP
  *	out-of-order queue is protected by the socket lock).
  */
-void skb_rbtree_purge(struct rb_root *root)
+unsigned int skb_rbtree_purge(struct rb_root *root)
 {
 	struct rb_node *p = rb_first(root);
+	unsigned int sum = 0;
 
 	while (p) {
 		struct sk_buff *skb = rb_entry(p, struct sk_buff, rbnode);
 
 		p = rb_next(p);
 		rb_erase(&skb->rbnode, root);
+		sum += skb->truesize;
 		kfree_skb(skb);
 	}
+	return sum;
 }
 
 /**



^ permalink raw reply	[flat|nested] 82+ messages in thread
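
The "purge and report how much was freed" pattern from 63/71 can be tried out
in user space with a plain binary search tree. This is only a sketch:
struct toy_node, toy_insert() and toy_purge() are hypothetical, the tree is
unbalanced rather than red-black, and the leftmost node is found by walking
from the root each time instead of using rb_first()/rb_next().

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical tree node carrying an accounting size, like skb->truesize. */
struct toy_node {
	struct toy_node *left, *right;
	int key;
	unsigned int truesize;
};

/* Plain (unbalanced) BST insert; returns the new node or NULL on OOM. */
static struct toy_node *toy_insert(struct toy_node **root, int key,
				   unsigned int truesize)
{
	struct toy_node **link = root;
	struct toy_node *n;

	while (*link)
		link = key < (*link)->key ? &(*link)->left : &(*link)->right;

	n = calloc(1, sizeof(*n));
	if (!n)
		return NULL;
	n->key = key;
	n->truesize = truesize;
	*link = n;
	return n;
}

/* Free every node in key order and return the sum of their truesizes. */
static unsigned int toy_purge(struct toy_node **root)
{
	unsigned int sum = 0;

	assert(root != NULL);
	while (*root) {
		struct toy_node **link = root;
		struct toy_node *victim;

		/* The leftmost node has no left child by construction. */
		while ((*link)->left)
			link = &(*link)->left;
		victim = *link;
		*link = victim->right;	/* unlink: right child takes its place */
		sum += victim->truesize;
		free(victim);
	}
	return sum;
}
```

Returning the sum lets a caller adjust its memory accounting with a single
update after the purge instead of one update per freed buffer.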

* [PATCH 4.9 64/71] ipv6: defrag: drop non-last frags smaller than min mtu
  2018-10-16 17:08 [PATCH 4.9 00/71] 4.9.134-stable review Greg Kroah-Hartman
                   ` (62 preceding siblings ...)
  2018-10-16 17:10 ` [PATCH 4.9 63/71] net: modify skb_rbtree_purge to return the truesize of all purged skbs Greg Kroah-Hartman
@ 2018-10-16 17:10 ` Greg Kroah-Hartman
  2018-10-16 17:10 ` [PATCH 4.9 65/71] net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends Greg Kroah-Hartman
                   ` (11 subsequent siblings)
  75 siblings, 0 replies; 82+ messages in thread
From: Greg Kroah-Hartman @ 2018-10-16 17:10 UTC (permalink / raw)
  To: linux-kernel, netdev
  Cc: Greg Kroah-Hartman, stable, Peter Oskolkov, Eric Dumazet,
	Florian Westphal, David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Florian Westphal <fw@strlen.de>

Don't bother with pathological cases; they only waste cycles.
IPv6 requires a minimum MTU of 1280, so we should never see fragments
smaller than this (except the last frag).

v3: don't use awkward "-offset + len"
v2: drop IPv4 part, which added same check w. IPV4_MIN_MTU (68).
    There were concerns that there could be even smaller frags
    generated by intermediate nodes, e.g. on radio networks.

Cc: Peter Oskolkov <posk@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 0ed4229b08c13c84a3c301a08defdc9e7f4467e6)
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/ipv6/netfilter/nf_conntrack_reasm.c |    4 ++++
 net/ipv6/reassembly.c                   |    4 ++++
 2 files changed, 8 insertions(+)

--- a/net/ipv6/netfilter/nf_conntrack_reasm.c
+++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
@@ -564,6 +564,10 @@ int nf_ct_frag6_gather(struct net *net,
 	hdr = ipv6_hdr(skb);
 	fhdr = (struct frag_hdr *)skb_transport_header(skb);
 
+	if (skb->len - skb_network_offset(skb) < IPV6_MIN_MTU &&
+	    fhdr->frag_off & htons(IP6_MF))
+		return -EINVAL;
+
 	skb_orphan(skb);
 	fq = fq_find(net, fhdr->identification, user, hdr,
 		     skb->dev ? skb->dev->ifindex : 0);
--- a/net/ipv6/reassembly.c
+++ b/net/ipv6/reassembly.c
@@ -516,6 +516,10 @@ static int ipv6_frag_rcv(struct sk_buff
 		return 1;
 	}
 
+	if (skb->len - skb_network_offset(skb) < IPV6_MIN_MTU &&
+	    fhdr->frag_off & htons(IP6_MF))
+		goto fail_hdr;
+
 	iif = skb->dev ? skb->dev->ifindex : 0;
 	fq = fq_find(net, fhdr->identification, hdr, iif);
 	if (fq) {



^ permalink raw reply	[flat|nested] 82+ messages in thread
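
The check added by 64/71 can be exercised outside the kernel with the sketch
below. The names are hypothetical (toy_drop_small_frag() and the TOY_
constants); the values 1280 and 0x0001 are meant to mirror IPV6_MIN_MTU and
IP6_MF, and frag_off_be is assumed to be the fragment header's frag_off field
in network byte order.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_IPV6_MIN_MTU 1280	/* mirrors IPV6_MIN_MTU */
#define TOY_IP6_MF 0x0001	/* mirrors IP6_MF, the "more fragments" bit */

/*
 * Return true if the fragment should be dropped: it is shorter than the
 * IPv6 minimum MTU and it is not the last fragment (MF bit set).
 */
static bool toy_drop_small_frag(unsigned int pkt_len, uint16_t frag_off_be)
{
	bool more_fragments = (frag_off_be & htons(TOY_IP6_MF)) != 0;

	return pkt_len < TOY_IPV6_MIN_MTU && more_fragments;
}
```

A short last fragment (MF clear) is still accepted, since the tail of a
datagram can legitimately be smaller than the minimum MTU.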

* [PATCH 4.9 65/71] net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends
  2018-10-16 17:08 [PATCH 4.9 00/71] 4.9.134-stable review Greg Kroah-Hartman
                   ` (63 preceding siblings ...)
  2018-10-16 17:10 ` [PATCH 4.9 64/71] ipv6: defrag: drop non-last frags smaller than min mtu Greg Kroah-Hartman
@ 2018-10-16 17:10 ` Greg Kroah-Hartman
  2018-10-16 17:10 ` [PATCH 4.9 66/71] net: add rb_to_skb() and other rb tree helpers Greg Kroah-Hartman
                   ` (10 subsequent siblings)
  75 siblings, 0 replies; 82+ messages in thread
From: Greg Kroah-Hartman @ 2018-10-16 17:10 UTC (permalink / raw)
  To: linux-kernel, netdev
  Cc: Greg Kroah-Hartman, stable, Eric Dumazet, David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <edumazet@google.com>

After working on IP defragmentation lately, I found that some large
packets defeat the CHECKSUM_COMPLETE optimization because the NIC adds
zero padding to the last (small) fragment.

While removing the padding with pskb_trim_rcsum(), we set skb->ip_summed
to CHECKSUM_NONE, forcing a full csum validation, even if all prior
fragments had CHECKSUM_COMPLETE set.

We can instead compute the checksum of the part we are trimming,
usually smaller than the part we keep.
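
A minimal userspace sketch of the arithmetic behind this, not the kernel
code: the ones-complement sum of the kept bytes equals the sum of the
whole buffer minus the sum of the trimmed tail. The names fold16(),
oc_sum(), oc_sub(), oc_norm() and csum_after_trim() are hypothetical
helpers invented for this illustration. The sketch assumes the trim
point is at an even offset; at an odd offset the tail bytes sit in the
opposite byte lanes and the partial sum would need a byte swap first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into a 16-bit ones-complement sum. */
static uint16_t fold16(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Ones-complement sum of buf[0..len), read as big-endian 16-bit words.
 * An odd trailing byte is padded with a zero low byte.
 */
static uint16_t oc_sum(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	if (len & 1)
		sum += (uint32_t)buf[len - 1] << 8;
	return fold16(sum);
}

/* Ones-complement subtraction: a - b is a + ~b. */
static uint16_t oc_sub(uint16_t a, uint16_t b)
{
	return fold16((uint32_t)a + (uint16_t)~b);
}

/* 0x0000 and 0xffff both represent zero; map them to one value. */
static uint16_t oc_norm(uint16_t x)
{
	return x == 0xffff ? 0 : x;
}

/* Given the sum over buf[0..len), return the sum over buf[0..new_len)
 * by summing only the trimmed tail. Assumption: new_len is even.
 */
static uint16_t csum_after_trim(uint16_t full, const uint8_t *buf,
				size_t len, size_t new_len)
{
	assert(new_len <= len && (new_len & 1) == 0);
	return oc_sub(full, oc_sum(buf + new_len, len - new_len));
}
```

When only a few padding bytes are trimmed, the tail is much shorter than
the part that is kept, so summing the tail is cheaper than revalidating
the whole packet. Results should be compared through oc_norm(), since
the subtraction can produce either representation of zero.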

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 88078d98d1bb085d72af8437707279e203524fa5)
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 include/linux/skbuff.h |    5 ++---
 net/core/skbuff.c      |   14 ++++++++++++++
 2 files changed, 16 insertions(+), 3 deletions(-)

--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2954,6 +2954,7 @@ static inline unsigned char *skb_push_rc
 	return skb->data;
 }
 
+int pskb_trim_rcsum_slow(struct sk_buff *skb, unsigned int len);
 /**
  *	pskb_trim_rcsum - trim received skb and update checksum
  *	@skb: buffer to trim
@@ -2967,9 +2968,7 @@ static inline int pskb_trim_rcsum(struct
 {
 	if (likely(len >= skb->len))
 		return 0;
-	if (skb->ip_summed == CHECKSUM_COMPLETE)
-		skb->ip_summed = CHECKSUM_NONE;
-	return __pskb_trim(skb, len);
+	return pskb_trim_rcsum_slow(skb, len);
 }
 
 static inline int __skb_trim_rcsum(struct sk_buff *skb, unsigned int len)
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1578,6 +1578,20 @@ done:
 }
 EXPORT_SYMBOL(___pskb_trim);
 
+/* Note : use pskb_trim_rcsum() instead of calling this directly
+ */
+int pskb_trim_rcsum_slow(struct sk_buff *skb, unsigned int len)
+{
+	if (skb->ip_summed == CHECKSUM_COMPLETE) {
+		int delta = skb->len - len;
+
+		skb->csum = csum_sub(skb->csum,
+				     skb_checksum(skb, len, delta, 0));
+	}
+	return __pskb_trim(skb, len);
+}
+EXPORT_SYMBOL(pskb_trim_rcsum_slow);
+
 /**
  *	__pskb_pull_tail - advance tail of skb header
  *	@skb: buffer to reallocate




* [PATCH 4.9 66/71] net: add rb_to_skb() and other rb tree helpers
  2018-10-16 17:08 [PATCH 4.9 00/71] 4.9.134-stable review Greg Kroah-Hartman
                   ` (64 preceding siblings ...)
  2018-10-16 17:10 ` [PATCH 4.9 65/71] net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends Greg Kroah-Hartman
@ 2018-10-16 17:10 ` Greg Kroah-Hartman
  2018-10-16 17:10 ` [PATCH 4.9 67/71] ip: use rb trees for IP frag queue Greg Kroah-Hartman
                   ` (9 subsequent siblings)
  75 siblings, 0 replies; 82+ messages in thread
From: Greg Kroah-Hartman @ 2018-10-16 17:10 UTC (permalink / raw)
  To: linux-kernel, netdev
  Cc: Greg Kroah-Hartman, stable, Eric Dumazet, David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <edumazet@google.com>

Generalize private netem_rb_to_skb()

TCP rtx queue will soon be converted to rb-tree,
so we will need skb_rbtree_walk() helpers.
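
As a rough userspace illustration of why a NULL-safe conversion helper
is convenient: a plain container_of-style cast applied to a NULL node
pointer yields a bogus non-NULL pointer, while a "safe" variant maps
NULL to NULL, so the result of a first/next lookup can be tested
directly. The types struct node and struct pkt and the names
node_entry(), node_entry_safe() and node_to_pkt() below are
hypothetical stand-ins, not the kernel definitions. Unlike a macro that
evaluates its argument once, node_entry_safe() here evaluates ptr twice,
which is why it is wrapped in a function.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a tree node embedded in a larger object. */
struct node {
	struct node *left, *right;
};

/* Stand-in for a packet buffer that embeds the node. */
struct pkt {
	int seq;
	struct node rbnode;
};

/* Recover the enclosing object from a pointer to its member. */
#define node_entry(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Same, but a NULL node stays NULL instead of becoming garbage. */
#define node_entry_safe(ptr, type, member) \
	((ptr) ? node_entry(ptr, type, member) : NULL)

static struct pkt *node_to_pkt(struct node *n)
{
	return node_entry_safe(n, struct pkt, rbnode);
}

/* Usage example: both the normal and the empty-tree case. */
static void node_to_pkt_selfcheck(void)
{
	struct pkt p = { .seq = 7 };

	assert(node_to_pkt(&p.rbnode) == &p);
	assert(node_to_pkt(NULL) == NULL);
}
```

With a helper of this shape, a loop such as "while ((p = next(p)) !=
NULL)" needs no separate node variable, which is the kind of
simplification the tcp_input.c hunks in this patch make.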

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 18a4c0eab2623cc95be98a1e6af1ad18e7695977)
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 include/linux/skbuff.h |   18 ++++++++++++++++++
 net/ipv4/tcp_input.c   |   33 ++++++++++++---------------------
 2 files changed, 30 insertions(+), 21 deletions(-)

--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2988,6 +2988,12 @@ static inline int __skb_grow_rcsum(struc
 
 #define rb_to_skb(rb) rb_entry_safe(rb, struct sk_buff, rbnode)
 
+#define rb_to_skb(rb) rb_entry_safe(rb, struct sk_buff, rbnode)
+#define skb_rb_first(root) rb_to_skb(rb_first(root))
+#define skb_rb_last(root)  rb_to_skb(rb_last(root))
+#define skb_rb_next(skb)   rb_to_skb(rb_next(&(skb)->rbnode))
+#define skb_rb_prev(skb)   rb_to_skb(rb_prev(&(skb)->rbnode))
+
 #define skb_queue_walk(queue, skb) \
 		for (skb = (queue)->next;					\
 		     skb != (struct sk_buff *)(queue);				\
@@ -3002,6 +3008,18 @@ static inline int __skb_grow_rcsum(struc
 		for (; skb != (struct sk_buff *)(queue);			\
 		     skb = skb->next)
 
+#define skb_rbtree_walk(skb, root)						\
+		for (skb = skb_rb_first(root); skb != NULL;			\
+		     skb = skb_rb_next(skb))
+
+#define skb_rbtree_walk_from(skb)						\
+		for (; skb != NULL;						\
+		     skb = skb_rb_next(skb))
+
+#define skb_rbtree_walk_from_safe(skb, tmp)					\
+		for (; tmp = skb ? skb_rb_next(skb) : NULL, (skb != NULL);	\
+		     skb = tmp)
+
 #define skb_queue_walk_from_safe(queue, skb, tmp)				\
 		for (tmp = skb->next;						\
 		     skb != (struct sk_buff *)(queue);				\
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4406,7 +4406,7 @@ static void tcp_ofo_queue(struct sock *s
 
 	p = rb_first(&tp->out_of_order_queue);
 	while (p) {
-		skb = rb_entry(p, struct sk_buff, rbnode);
+		skb = rb_to_skb(p);
 		if (after(TCP_SKB_CB(skb)->seq, tp->rcv_nxt))
 			break;
 
@@ -4470,7 +4470,7 @@ static int tcp_try_rmem_schedule(struct
 static void tcp_data_queue_ofo(struct sock *sk, struct sk_buff *skb)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
-	struct rb_node **p, *q, *parent;
+	struct rb_node **p, *parent;
 	struct sk_buff *skb1;
 	u32 seq, end_seq;
 	bool fragstolen;
@@ -4529,7 +4529,7 @@ coalesce_done:
 	parent = NULL;
 	while (*p) {
 		parent = *p;
-		skb1 = rb_entry(parent, struct sk_buff, rbnode);
+		skb1 = rb_to_skb(parent);
 		if (before(seq, TCP_SKB_CB(skb1)->seq)) {
 			p = &parent->rb_left;
 			continue;
@@ -4574,9 +4574,7 @@ insert:
 
 merge_right:
 	/* Remove other segments covered by skb. */
-	while ((q = rb_next(&skb->rbnode)) != NULL) {
-		skb1 = rb_entry(q, struct sk_buff, rbnode);
-
+	while ((skb1 = skb_rb_next(skb)) != NULL) {
 		if (!after(end_seq, TCP_SKB_CB(skb1)->seq))
 			break;
 		if (before(end_seq, TCP_SKB_CB(skb1)->end_seq)) {
@@ -4591,7 +4589,7 @@ merge_right:
 		tcp_drop(sk, skb1);
 	}
 	/* If there is no skb after us, we are the last_skb ! */
-	if (!q)
+	if (!skb1)
 		tp->ooo_last_skb = skb;
 
 add_sack:
@@ -4792,7 +4790,7 @@ static struct sk_buff *tcp_skb_next(stru
 	if (list)
 		return !skb_queue_is_last(list, skb) ? skb->next : NULL;
 
-	return rb_entry_safe(rb_next(&skb->rbnode), struct sk_buff, rbnode);
+	return skb_rb_next(skb);
 }
 
 static struct sk_buff *tcp_collapse_one(struct sock *sk, struct sk_buff *skb,
@@ -4821,7 +4819,7 @@ static void tcp_rbtree_insert(struct rb_
 
 	while (*p) {
 		parent = *p;
-		skb1 = rb_entry(parent, struct sk_buff, rbnode);
+		skb1 = rb_to_skb(parent);
 		if (before(TCP_SKB_CB(skb)->seq, TCP_SKB_CB(skb1)->seq))
 			p = &parent->rb_left;
 		else
@@ -4941,19 +4939,12 @@ static void tcp_collapse_ofo_queue(struc
 	struct tcp_sock *tp = tcp_sk(sk);
 	u32 range_truesize, sum_tiny = 0;
 	struct sk_buff *skb, *head;
-	struct rb_node *p;
 	u32 start, end;
 
-	p = rb_first(&tp->out_of_order_queue);
-	skb = rb_entry_safe(p, struct sk_buff, rbnode);
+	skb = skb_rb_first(&tp->out_of_order_queue);
 new_range:
 	if (!skb) {
-		p = rb_last(&tp->out_of_order_queue);
-		/* Note: This is possible p is NULL here. We do not
-		 * use rb_entry_safe(), as ooo_last_skb is valid only
-		 * if rbtree is not empty.
-		 */
-		tp->ooo_last_skb = rb_entry(p, struct sk_buff, rbnode);
+		tp->ooo_last_skb = skb_rb_last(&tp->out_of_order_queue);
 		return;
 	}
 	start = TCP_SKB_CB(skb)->seq;
@@ -4961,7 +4952,7 @@ new_range:
 	range_truesize = skb->truesize;
 
 	for (head = skb;;) {
-		skb = tcp_skb_next(skb, NULL);
+		skb = skb_rb_next(skb);
 
 		/* Range is terminated when we see a gap or when
 		 * we are at the queue end.
@@ -5017,7 +5008,7 @@ static bool tcp_prune_ofo_queue(struct s
 		prev = rb_prev(node);
 		rb_erase(node, &tp->out_of_order_queue);
 		goal -= rb_to_skb(node)->truesize;
-		tcp_drop(sk, rb_entry(node, struct sk_buff, rbnode));
+		tcp_drop(sk, rb_to_skb(node));
 		if (!prev || goal <= 0) {
 			sk_mem_reclaim(sk);
 			if (atomic_read(&sk->sk_rmem_alloc) <= sk->sk_rcvbuf &&
@@ -5027,7 +5018,7 @@ static bool tcp_prune_ofo_queue(struct s
 		}
 		node = prev;
 	} while (node);
-	tp->ooo_last_skb = rb_entry(prev, struct sk_buff, rbnode);
+	tp->ooo_last_skb = rb_to_skb(prev);
 
 	/* Reset SACK state.  A conforming SACK implementation will
 	 * do the same at a timeout based retransmit.  When a connection




* [PATCH 4.9 67/71] ip: use rb trees for IP frag queue.
  2018-10-16 17:08 [PATCH 4.9 00/71] 4.9.134-stable review Greg Kroah-Hartman
                   ` (65 preceding siblings ...)
  2018-10-16 17:10 ` [PATCH 4.9 66/71] net: add rb_to_skb() and other rb tree helpers Greg Kroah-Hartman
@ 2018-10-16 17:10 ` Greg Kroah-Hartman
  2018-10-16 17:10 ` [PATCH 4.9 68/71] ip: add helpers to process in-order fragments faster Greg Kroah-Hartman
                   ` (8 subsequent siblings)
  75 siblings, 0 replies; 82+ messages in thread
From: Greg Kroah-Hartman @ 2018-10-16 17:10 UTC (permalink / raw)
  To: linux-kernel, netdev
  Cc: Greg Kroah-Hartman, stable, Jann Horn, Juha-Matti Tilli,
	Eric Dumazet, Peter Oskolkov, Florian Westphal, David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Peter Oskolkov <posk@google.com>

(commit fa0f527358bd900ef92f925878ed6bfbd51305cc upstream)

Similar to TCP OOO RX queue, it makes sense to use rb trees to store
IP fragments, so that OOO fragments are inserted faster.
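
A minimal sketch of the search rule, assuming each fragment covers the
byte range [offset, offset + len): go left when the new fragment ends
at or before the current one starts, go right when it starts at or
after the current one ends, and otherwise report an overlap. For
brevity this uses a plain unbalanced binary search tree rather than a
red-black tree, and struct frag and frag_insert() are hypothetical
names, not kernel interfaces.

```c
#include <assert.h>
#include <stddef.h>

struct frag {
	int offset;			/* first byte covered */
	int len;			/* number of bytes covered */
	struct frag *left, *right;
};

/* Insert f into the tree rooted at *root, keyed by byte range.
 * Returns 0 on success, -1 if f overlaps a stored fragment.
 */
static int frag_insert(struct frag **root, struct frag *f)
{
	struct frag **link = root;
	int end = f->offset + f->len;

	assert(f->len > 0);
	while (*link) {
		struct frag *cur = *link;

		if (end <= cur->offset)
			link = &cur->left;	/* entirely before cur */
		else if (f->offset >= cur->offset + cur->len)
			link = &cur->right;	/* entirely after cur */
		else
			return -1;		/* ranges intersect */
	}
	f->left = f->right = NULL;
	*link = f;
	return 0;
}
```

Because stored ranges never intersect, a single descent is enough to
find either the insertion point or one overlapping fragment; a balanced
tree keeps that descent logarithmic in the number of fragments.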

Tested:

- a follow-up patch contains a rather comprehensive ip defrag
  self-test (functional)
- ran neper `udp_stream -c -H <host> -F 100 -l 300 -T 20`:
    netstat --statistics
    Ip:
        282078937 total packets received
        0 forwarded
        0 incoming packets discarded
        946760 incoming packets delivered
        18743456 requests sent out
        101 fragments dropped after timeout
        282077129 reassemblies required
        944952 packets reassembled ok
        262734239 packet reassembles failed
   (The numbers/stats above are somewhat better re:
    reassemblies vs a kernel without this patchset. More
    comprehensive performance testing TBD).

Reported-by: Jann Horn <jannh@google.com>
Reported-by: Juha-Matti Tilli <juha-matti.tilli@iki.fi>
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Peter Oskolkov <posk@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 include/linux/skbuff.h                  |    4 
 include/net/inet_frag.h                 |    3 
 net/ipv4/inet_fragment.c                |   14 +-
 net/ipv4/ip_fragment.c                  |  182 +++++++++++++++++---------------
 net/ipv6/netfilter/nf_conntrack_reasm.c |    1 
 net/ipv6/reassembly.c                   |    1 
 6 files changed, 116 insertions(+), 89 deletions(-)

--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -643,14 +643,14 @@ struct sk_buff {
 				struct skb_mstamp skb_mstamp;
 			};
 		};
-		struct rb_node	rbnode; /* used in netem & tcp stack */
+		struct rb_node		rbnode; /* used in netem, ip4 defrag, and tcp stack */
 	};
 
 	union {
+		struct sock		*sk;
 		int			ip_defrag_offset;
 	};
 
-	struct sock		*sk;
 	struct net_device	*dev;
 
 	/*
--- a/include/net/inet_frag.h
+++ b/include/net/inet_frag.h
@@ -74,7 +74,8 @@ struct inet_frag_queue {
 	struct timer_list	timer;
 	spinlock_t		lock;
 	atomic_t		refcnt;
-	struct sk_buff		*fragments;
+	struct sk_buff		*fragments;  /* Used in IPv6. */
+	struct rb_root		rb_fragments; /* Used in IPv4. */
 	struct sk_buff		*fragments_tail;
 	ktime_t			stamp;
 	int			len;
--- a/net/ipv4/inet_fragment.c
+++ b/net/ipv4/inet_fragment.c
@@ -136,12 +136,16 @@ void inet_frag_destroy(struct inet_frag_
 	fp = q->fragments;
 	nf = q->net;
 	f = nf->f;
-	while (fp) {
-		struct sk_buff *xp = fp->next;
+	if (fp) {
+		do {
+			struct sk_buff *xp = fp->next;
 
-		sum_truesize += fp->truesize;
-		kfree_skb(fp);
-		fp = xp;
+			sum_truesize += fp->truesize;
+			kfree_skb(fp);
+			fp = xp;
+		} while (fp);
+	} else {
+		sum_truesize = skb_rbtree_purge(&q->rb_fragments);
 	}
 	sum = sum_truesize + f->qsize;
 
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -134,7 +134,7 @@ static bool frag_expire_skip_icmp(u32 us
 static void ip_expire(unsigned long arg)
 {
 	const struct iphdr *iph;
-	struct sk_buff *head;
+	struct sk_buff *head = NULL;
 	struct net *net;
 	struct ipq *qp;
 	int err;
@@ -150,14 +150,31 @@ static void ip_expire(unsigned long arg)
 
 	ipq_kill(qp);
 	__IP_INC_STATS(net, IPSTATS_MIB_REASMFAILS);
-
-	head = qp->q.fragments;
-
 	__IP_INC_STATS(net, IPSTATS_MIB_REASMTIMEOUT);
 
-	if (!(qp->q.flags & INET_FRAG_FIRST_IN) || !head)
+	if (!qp->q.flags & INET_FRAG_FIRST_IN)
 		goto out;
 
+	/* sk_buff::dev and sk_buff::rbnode are unionized. So we
+	 * pull the head out of the tree in order to be able to
+	 * deal with head->dev.
+	 */
+	if (qp->q.fragments) {
+		head = qp->q.fragments;
+		qp->q.fragments = head->next;
+	} else {
+		head = skb_rb_first(&qp->q.rb_fragments);
+		if (!head)
+			goto out;
+		rb_erase(&head->rbnode, &qp->q.rb_fragments);
+		memset(&head->rbnode, 0, sizeof(head->rbnode));
+		barrier();
+	}
+	if (head == qp->q.fragments_tail)
+		qp->q.fragments_tail = NULL;
+
+	sub_frag_mem_limit(qp->q.net, head->truesize);
+
 	head->dev = dev_get_by_index_rcu(net, qp->iif);
 	if (!head->dev)
 		goto out;
@@ -177,16 +194,16 @@ static void ip_expire(unsigned long arg)
 	    (skb_rtable(head)->rt_type != RTN_LOCAL))
 		goto out;
 
-	skb_get(head);
 	spin_unlock(&qp->q.lock);
 	icmp_send(head, ICMP_TIME_EXCEEDED, ICMP_EXC_FRAGTIME, 0);
-	kfree_skb(head);
 	goto out_rcu_unlock;
 
 out:
 	spin_unlock(&qp->q.lock);
 out_rcu_unlock:
 	rcu_read_unlock();
+	if (head)
+		kfree_skb(head);
 	ipq_put(qp);
 }
 
@@ -229,7 +246,7 @@ static int ip_frag_too_far(struct ipq *q
 	end = atomic_inc_return(&peer->rid);
 	qp->rid = end;
 
-	rc = qp->q.fragments && (end - start) > max;
+	rc = qp->q.fragments_tail && (end - start) > max;
 
 	if (rc) {
 		struct net *net;
@@ -243,7 +260,6 @@ static int ip_frag_too_far(struct ipq *q
 
 static int ip_frag_reinit(struct ipq *qp)
 {
-	struct sk_buff *fp;
 	unsigned int sum_truesize = 0;
 
 	if (!mod_timer(&qp->q.timer, jiffies + qp->q.net->timeout)) {
@@ -251,20 +267,14 @@ static int ip_frag_reinit(struct ipq *qp
 		return -ETIMEDOUT;
 	}
 
-	fp = qp->q.fragments;
-	do {
-		struct sk_buff *xp = fp->next;
-
-		sum_truesize += fp->truesize;
-		kfree_skb(fp);
-		fp = xp;
-	} while (fp);
+	sum_truesize = skb_rbtree_purge(&qp->q.rb_fragments);
 	sub_frag_mem_limit(qp->q.net, sum_truesize);
 
 	qp->q.flags = 0;
 	qp->q.len = 0;
 	qp->q.meat = 0;
 	qp->q.fragments = NULL;
+	qp->q.rb_fragments = RB_ROOT;
 	qp->q.fragments_tail = NULL;
 	qp->iif = 0;
 	qp->ecn = 0;
@@ -276,7 +286,8 @@ static int ip_frag_reinit(struct ipq *qp
 static int ip_frag_queue(struct ipq *qp, struct sk_buff *skb)
 {
 	struct net *net = container_of(qp->q.net, struct net, ipv4.frags);
-	struct sk_buff *prev, *next;
+	struct rb_node **rbn, *parent;
+	struct sk_buff *skb1;
 	struct net_device *dev;
 	unsigned int fragsize;
 	int flags, offset;
@@ -339,58 +350,58 @@ static int ip_frag_queue(struct ipq *qp,
 	if (err)
 		goto err;
 
-	/* Find out which fragments are in front and at the back of us
-	 * in the chain of fragments so far.  We must know where to put
-	 * this fragment, right?
-	 */
-	prev = qp->q.fragments_tail;
-	if (!prev || prev->ip_defrag_offset < offset) {
-		next = NULL;
-		goto found;
-	}
-	prev = NULL;
-	for (next = qp->q.fragments; next != NULL; next = next->next) {
-		if (next->ip_defrag_offset >= offset)
-			break;	/* bingo! */
-		prev = next;
-	}
+	/* Note : skb->rbnode and skb->dev share the same location. */
+	dev = skb->dev;
+	/* Makes sure compiler wont do silly aliasing games */
+	barrier();
 
-found:
 	/* RFC5722, Section 4, amended by Errata ID : 3089
 	 *                          When reassembling an IPv6 datagram, if
 	 *   one or more its constituent fragments is determined to be an
 	 *   overlapping fragment, the entire datagram (and any constituent
 	 *   fragments) MUST be silently discarded.
 	 *
-	 * We do the same here for IPv4.
+	 * We do the same here for IPv4 (and increment an snmp counter).
 	 */
 
-	/* Is there an overlap with the previous fragment? */
-	if (prev &&
-	    (prev->ip_defrag_offset + prev->len) > offset)
-		goto discard_qp;
-
-	/* Is there an overlap with the next fragment? */
-	if (next && next->ip_defrag_offset < end)
-		goto discard_qp;
+	/* Find out where to put this fragment.  */
+	skb1 = qp->q.fragments_tail;
+	if (!skb1) {
+		/* This is the first fragment we've received. */
+		rb_link_node(&skb->rbnode, NULL, &qp->q.rb_fragments.rb_node);
+		qp->q.fragments_tail = skb;
+	} else if ((skb1->ip_defrag_offset + skb1->len) < end) {
+		/* This is the common/special case: skb goes to the end. */
+		/* Detect and discard overlaps. */
+		if (offset < (skb1->ip_defrag_offset + skb1->len))
+			goto discard_qp;
+		/* Insert after skb1. */
+		rb_link_node(&skb->rbnode, &skb1->rbnode, &skb1->rbnode.rb_right);
+		qp->q.fragments_tail = skb;
+	} else {
+		/* Binary search. Note that skb can become the first fragment, but
+		 * not the last (covered above). */
+		rbn = &qp->q.rb_fragments.rb_node;
+		do {
+			parent = *rbn;
+			skb1 = rb_to_skb(parent);
+			if (end <= skb1->ip_defrag_offset)
+				rbn = &parent->rb_left;
+			else if (offset >= skb1->ip_defrag_offset + skb1->len)
+				rbn = &parent->rb_right;
+			else /* Found an overlap with skb1. */
+				goto discard_qp;
+		} while (*rbn);
+		/* Here we have parent properly set, and rbn pointing to
+		 * one of its NULL left/right children. Insert skb. */
+		rb_link_node(&skb->rbnode, parent, rbn);
+	}
+	rb_insert_color(&skb->rbnode, &qp->q.rb_fragments);
 
-	/* Note : skb->ip_defrag_offset and skb->dev share the same location */
-	dev = skb->dev;
 	if (dev)
 		qp->iif = dev->ifindex;
-	/* Makes sure compiler wont do silly aliasing games */
-	barrier();
 	skb->ip_defrag_offset = offset;
 
-	/* Insert this fragment in the chain of fragments. */
-	skb->next = next;
-	if (!next)
-		qp->q.fragments_tail = skb;
-	if (prev)
-		prev->next = skb;
-	else
-		qp->q.fragments = skb;
-
 	qp->q.stamp = skb->tstamp;
 	qp->q.meat += skb->len;
 	qp->ecn |= ecn;
@@ -412,7 +423,7 @@ found:
 		unsigned long orefdst = skb->_skb_refdst;
 
 		skb->_skb_refdst = 0UL;
-		err = ip_frag_reasm(qp, prev, dev);
+		err = ip_frag_reasm(qp, skb, dev);
 		skb->_skb_refdst = orefdst;
 		return err;
 	}
@@ -429,15 +440,15 @@ err:
 	return err;
 }
 
-
 /* Build a new IP datagram from all its fragments. */
-
-static int ip_frag_reasm(struct ipq *qp, struct sk_buff *prev,
+static int ip_frag_reasm(struct ipq *qp, struct sk_buff *skb,
 			 struct net_device *dev)
 {
 	struct net *net = container_of(qp->q.net, struct net, ipv4.frags);
 	struct iphdr *iph;
-	struct sk_buff *fp, *head = qp->q.fragments;
+	struct sk_buff *fp, *head = skb_rb_first(&qp->q.rb_fragments);
+	struct sk_buff **nextp; /* To build frag_list. */
+	struct rb_node *rbn;
 	int len;
 	int ihlen;
 	int err;
@@ -451,25 +462,20 @@ static int ip_frag_reasm(struct ipq *qp,
 		goto out_fail;
 	}
 	/* Make the one we just received the head. */
-	if (prev) {
-		head = prev->next;
-		fp = skb_clone(head, GFP_ATOMIC);
+	if (head != skb) {
+		fp = skb_clone(skb, GFP_ATOMIC);
 		if (!fp)
 			goto out_nomem;
-
-		fp->next = head->next;
-		if (!fp->next)
+		rb_replace_node(&skb->rbnode, &fp->rbnode, &qp->q.rb_fragments);
+		if (qp->q.fragments_tail == skb)
 			qp->q.fragments_tail = fp;
-		prev->next = fp;
-
-		skb_morph(head, qp->q.fragments);
-		head->next = qp->q.fragments->next;
-
-		consume_skb(qp->q.fragments);
-		qp->q.fragments = head;
+		skb_morph(skb, head);
+		rb_replace_node(&head->rbnode, &skb->rbnode,
+				&qp->q.rb_fragments);
+		consume_skb(head);
+		head = skb;
 	}
 
-	WARN_ON(!head);
 	WARN_ON(head->ip_defrag_offset != 0);
 
 	/* Allocate a new buffer for the datagram. */
@@ -494,24 +500,35 @@ static int ip_frag_reasm(struct ipq *qp,
 		clone = alloc_skb(0, GFP_ATOMIC);
 		if (!clone)
 			goto out_nomem;
-		clone->next = head->next;
-		head->next = clone;
 		skb_shinfo(clone)->frag_list = skb_shinfo(head)->frag_list;
 		skb_frag_list_init(head);
 		for (i = 0; i < skb_shinfo(head)->nr_frags; i++)
 			plen += skb_frag_size(&skb_shinfo(head)->frags[i]);
 		clone->len = clone->data_len = head->data_len - plen;
-		head->data_len -= clone->len;
-		head->len -= clone->len;
+		skb->truesize += clone->truesize;
 		clone->csum = 0;
 		clone->ip_summed = head->ip_summed;
 		add_frag_mem_limit(qp->q.net, clone->truesize);
+		skb_shinfo(head)->frag_list = clone;
+		nextp = &clone->next;
+	} else {
+		nextp = &skb_shinfo(head)->frag_list;
 	}
 
-	skb_shinfo(head)->frag_list = head->next;
 	skb_push(head, head->data - skb_network_header(head));
 
-	for (fp=head->next; fp; fp = fp->next) {
+	/* Traverse the tree in order, to build frag_list. */
+	rbn = rb_next(&head->rbnode);
+	rb_erase(&head->rbnode, &qp->q.rb_fragments);
+	while (rbn) {
+		struct rb_node *rbnext = rb_next(rbn);
+		fp = rb_to_skb(rbn);
+		rb_erase(rbn, &qp->q.rb_fragments);
+		rbn = rbnext;
+		*nextp = fp;
+		nextp = &fp->next;
+		fp->prev = NULL;
+		memset(&fp->rbnode, 0, sizeof(fp->rbnode));
 		head->data_len += fp->len;
 		head->len += fp->len;
 		if (head->ip_summed != fp->ip_summed)
@@ -522,7 +539,9 @@ static int ip_frag_reasm(struct ipq *qp,
 	}
 	sub_frag_mem_limit(qp->q.net, head->truesize);
 
+	*nextp = NULL;
 	head->next = NULL;
+	head->prev = NULL;
 	head->dev = dev;
 	head->tstamp = qp->q.stamp;
 	IPCB(head)->frag_max_size = max(qp->max_df_size, qp->q.max_size);
@@ -550,6 +569,7 @@ static int ip_frag_reasm(struct ipq *qp,
 
 	__IP_INC_STATS(net, IPSTATS_MIB_REASMOKS);
 	qp->q.fragments = NULL;
+	qp->q.rb_fragments = RB_ROOT;
 	qp->q.fragments_tail = NULL;
 	return 0;
 
--- a/net/ipv6/netfilter/nf_conntrack_reasm.c
+++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
@@ -470,6 +470,7 @@ nf_ct_frag6_reasm(struct frag_queue *fq,
 					  head->csum);
 
 	fq->q.fragments = NULL;
+	fq->q.rb_fragments = RB_ROOT;
 	fq->q.fragments_tail = NULL;
 
 	return true;
--- a/net/ipv6/reassembly.c
+++ b/net/ipv6/reassembly.c
@@ -466,6 +466,7 @@ static int ip6_frag_reasm(struct frag_qu
 	__IP6_INC_STATS(net, __in6_dev_get(dev), IPSTATS_MIB_REASMOKS);
 	rcu_read_unlock();
 	fq->q.fragments = NULL;
+	fq->q.rb_fragments = RB_ROOT;
 	fq->q.fragments_tail = NULL;
 	return 1;
 




* [PATCH 4.9 68/71] ip: add helpers to process in-order fragments faster.
  2018-10-16 17:08 [PATCH 4.9 00/71] 4.9.134-stable review Greg Kroah-Hartman
                   ` (66 preceding siblings ...)
  2018-10-16 17:10 ` [PATCH 4.9 67/71] ip: use rb trees for IP frag queue Greg Kroah-Hartman
@ 2018-10-16 17:10 ` Greg Kroah-Hartman
  2018-10-16 17:10 ` [PATCH 4.9 69/71] ip: process in-order fragments efficiently Greg Kroah-Hartman
                   ` (7 subsequent siblings)
  75 siblings, 0 replies; 82+ messages in thread
From: Greg Kroah-Hartman @ 2018-10-16 17:10 UTC (permalink / raw)
  To: linux-kernel, netdev
  Cc: Greg Kroah-Hartman, stable, Willem de Bruijn, Peter Oskolkov,
	Eric Dumazet, Florian Westphal, David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Peter Oskolkov <posk@google.com>

This patch introduces several helper functions/macros that will be
used in the follow-up patch. No runtime changes yet.

The new logic (fully implemented in the second patch) is as follows:

* Nodes in the rb-tree will now contain not single fragments, but lists
  of consecutive fragments ("runs").

* At each point in time, the current "active" run at the tail is
  maintained/tracked. Fragments that arrive in-order, adjacent
  to the previous tail fragment, are added to this tail run without
  triggering the re-balancing of the rb-tree.

* If a fragment arrives out of order with the offset _before_ the tail run,
  it is inserted into the rb-tree as a single fragment.

* If a fragment arrives after the current tail fragment (with a gap),
  it starts a new "tail" run, and is inserted into the rb-tree
  at the end as the head of the new run.

skb->cb is used to store additional information
needed here (suggested by Eric Dumazet).
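
The decision described in the list above can be sketched as a small
pure function. This is an illustration only: enum frag_action and
frag_classify() are hypothetical names, and tail_end stands for the
offset plus length of the current tail fragment. The tree-insert path
is assumed to do its own overlap check during the search.

```c
#include <assert.h>

enum frag_action {
	FRAG_NEW_RUN,		/* start a new run at the tail */
	FRAG_APPEND,		/* extend the current tail run */
	FRAG_TREE_INSERT,	/* out of order: search the tree */
	FRAG_OVERLAP		/* overlaps the tail: discard queue */
};

/* Decide how to queue a fragment covering [offset, offset + len).
 * have_tail is zero when no fragment has been queued yet.
 */
static enum frag_action frag_classify(int have_tail, int tail_end,
				      int offset, int len)
{
	int end = offset + len;

	assert(len > 0);
	if (!have_tail)
		return FRAG_NEW_RUN;		/* first fragment */
	if (tail_end < end) {
		if (offset < tail_end)
			return FRAG_OVERLAP;
		if (offset == tail_end)
			return FRAG_APPEND;	/* adjacent, in order */
		return FRAG_NEW_RUN;		/* after a gap */
	}
	return FRAG_TREE_INSERT;
}
```

For example, with a tail ending at byte 8, a fragment at offset 8 is
appended to the run and a fragment at offset 16 starts a new run. Only
the new-run and tree-insert cases touch the tree, which is what lets
in-order traffic avoid rebalancing.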

Reported-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Peter Oskolkov <posk@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 353c9cb360874e737fb000545f783df756c06f9a)
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 include/net/inet_frag.h |    6 +++
 net/ipv4/ip_fragment.c  |   73 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 79 insertions(+)

--- a/include/net/inet_frag.h
+++ b/include/net/inet_frag.h
@@ -56,7 +56,9 @@ struct frag_v6_compare_key {
  * @lock: spinlock protecting this frag
  * @refcnt: reference count of the queue
  * @fragments: received fragments head
+ * @rb_fragments: received fragments rb-tree root
  * @fragments_tail: received fragments tail
+ * @last_run_head: the head of the last "run". see ip_fragment.c
  * @stamp: timestamp of the last received fragment
  * @len: total length of the original datagram
  * @meat: length of received fragments so far
@@ -77,6 +79,7 @@ struct inet_frag_queue {
 	struct sk_buff		*fragments;  /* Used in IPv6. */
 	struct rb_root		rb_fragments; /* Used in IPv4. */
 	struct sk_buff		*fragments_tail;
+	struct sk_buff		*last_run_head;
 	ktime_t			stamp;
 	int			len;
 	int			meat;
@@ -112,6 +115,9 @@ void inet_frag_kill(struct inet_frag_que
 void inet_frag_destroy(struct inet_frag_queue *q);
 struct inet_frag_queue *inet_frag_find(struct netns_frags *nf, void *key);
 
+/* Free all skbs in the queue; return the sum of their truesizes. */
+unsigned int inet_frag_rbtree_purge(struct rb_root *root);
+
 static inline void inet_frag_put(struct inet_frag_queue *q)
 {
 	if (atomic_dec_and_test(&q->refcnt))
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -56,6 +56,57 @@
  */
 static const char ip_frag_cache_name[] = "ip4-frags";
 
+/* Use skb->cb to track consecutive/adjacent fragments coming at
+ * the end of the queue. Nodes in the rb-tree queue will
+ * contain "runs" of one or more adjacent fragments.
+ *
+ * Invariants:
+ * - next_frag is NULL at the tail of a "run";
+ * - the head of a "run" has the sum of all fragment lengths in frag_run_len.
+ */
+struct ipfrag_skb_cb {
+	struct inet_skb_parm	h;
+	struct sk_buff		*next_frag;
+	int			frag_run_len;
+};
+
+#define FRAG_CB(skb)		((struct ipfrag_skb_cb *)((skb)->cb))
+
+static void ip4_frag_init_run(struct sk_buff *skb)
+{
+	BUILD_BUG_ON(sizeof(struct ipfrag_skb_cb) > sizeof(skb->cb));
+
+	FRAG_CB(skb)->next_frag = NULL;
+	FRAG_CB(skb)->frag_run_len = skb->len;
+}
+
+/* Append skb to the last "run". */
+static void ip4_frag_append_to_last_run(struct inet_frag_queue *q,
+					struct sk_buff *skb)
+{
+	RB_CLEAR_NODE(&skb->rbnode);
+	FRAG_CB(skb)->next_frag = NULL;
+
+	FRAG_CB(q->last_run_head)->frag_run_len += skb->len;
+	FRAG_CB(q->fragments_tail)->next_frag = skb;
+	q->fragments_tail = skb;
+}
+
+/* Create a new "run" with the skb. */
+static void ip4_frag_create_run(struct inet_frag_queue *q, struct sk_buff *skb)
+{
+	if (q->last_run_head)
+		rb_link_node(&skb->rbnode, &q->last_run_head->rbnode,
+			     &q->last_run_head->rbnode.rb_right);
+	else
+		rb_link_node(&skb->rbnode, NULL, &q->rb_fragments.rb_node);
+	rb_insert_color(&skb->rbnode, &q->rb_fragments);
+
+	ip4_frag_init_run(skb);
+	q->fragments_tail = skb;
+	q->last_run_head = skb;
+}
+
 /* Describe an entry in the "incomplete datagrams" queue. */
 struct ipq {
 	struct inet_frag_queue q;
@@ -652,6 +703,28 @@ struct sk_buff *ip_check_defrag(struct n
 }
 EXPORT_SYMBOL(ip_check_defrag);
 
+unsigned int inet_frag_rbtree_purge(struct rb_root *root)
+{
+	struct rb_node *p = rb_first(root);
+	unsigned int sum = 0;
+
+	while (p) {
+		struct sk_buff *skb = rb_entry(p, struct sk_buff, rbnode);
+
+		p = rb_next(p);
+		rb_erase(&skb->rbnode, root);
+		while (skb) {
+			struct sk_buff *next = FRAG_CB(skb)->next_frag;
+
+			sum += skb->truesize;
+			kfree_skb(skb);
+			skb = next;
+		}
+	}
+	return sum;
+}
+EXPORT_SYMBOL(inet_frag_rbtree_purge);
+
 #ifdef CONFIG_SYSCTL
 static int dist_min;
 




* [PATCH 4.9 69/71] ip: process in-order fragments efficiently
  2018-10-16 17:08 [PATCH 4.9 00/71] 4.9.134-stable review Greg Kroah-Hartman
                   ` (67 preceding siblings ...)
  2018-10-16 17:10 ` [PATCH 4.9 68/71] ip: add helpers to process in-order fragments faster Greg Kroah-Hartman
@ 2018-10-16 17:10 ` Greg Kroah-Hartman
  2018-10-16 17:10 ` [PATCH 4.9 70/71] ip: frags: fix crash in ip_do_fragment() Greg Kroah-Hartman
                   ` (6 subsequent siblings)
  75 siblings, 0 replies; 82+ messages in thread
From: Greg Kroah-Hartman @ 2018-10-16 17:10 UTC (permalink / raw)
  To: linux-kernel, netdev
  Cc: Greg Kroah-Hartman, stable, Willem de Bruijn, Peter Oskolkov,
	Eric Dumazet, Florian Westphal, David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Peter Oskolkov <posk@google.com>

This patch changes the runtime behavior of IP defrag queue:
incoming in-order fragments are added to the end of the current
list/"run" of in-order fragments at the tail.

On some workloads, UDP stream performance is substantially improved:

RX: ./udp_stream -F 10 -T 2 -l 60
TX: ./udp_stream -c -H <host> -F 10 -T 5 -l 60

with this patchset applied on a 10Gbps receiver:

  throughput=9524.18
  throughput_units=Mbit/s

upstream (net-next):

  throughput=4608.93
  throughput_units=Mbit/s

Reported-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Peter Oskolkov <posk@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit a4fd284a1f8fd4b6c59aa59db2185b1e17c5c11c)
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/ipv4/inet_fragment.c |    2 
 net/ipv4/ip_fragment.c   |  110 +++++++++++++++++++++++++++++------------------
 2 files changed, 70 insertions(+), 42 deletions(-)

--- a/net/ipv4/inet_fragment.c
+++ b/net/ipv4/inet_fragment.c
@@ -145,7 +145,7 @@ void inet_frag_destroy(struct inet_frag_
 			fp = xp;
 		} while (fp);
 	} else {
-		sum_truesize = skb_rbtree_purge(&q->rb_fragments);
+		sum_truesize = inet_frag_rbtree_purge(&q->rb_fragments);
 	}
 	sum = sum_truesize + f->qsize;
 
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -125,8 +125,8 @@ static u8 ip4_frag_ecn(u8 tos)
 
 static struct inet_frags ip4_frags;
 
-static int ip_frag_reasm(struct ipq *qp, struct sk_buff *prev,
-			 struct net_device *dev);
+static int ip_frag_reasm(struct ipq *qp, struct sk_buff *skb,
+			 struct sk_buff *prev_tail, struct net_device *dev);
 
 
 static void ip4_frag_init(struct inet_frag_queue *q, const void *a)
@@ -217,7 +217,12 @@ static void ip_expire(unsigned long arg)
 		head = skb_rb_first(&qp->q.rb_fragments);
 		if (!head)
 			goto out;
-		rb_erase(&head->rbnode, &qp->q.rb_fragments);
+		if (FRAG_CB(head)->next_frag)
+			rb_replace_node(&head->rbnode,
+					&FRAG_CB(head)->next_frag->rbnode,
+					&qp->q.rb_fragments);
+		else
+			rb_erase(&head->rbnode, &qp->q.rb_fragments);
 		memset(&head->rbnode, 0, sizeof(head->rbnode));
 		barrier();
 	}
@@ -318,7 +323,7 @@ static int ip_frag_reinit(struct ipq *qp
 		return -ETIMEDOUT;
 	}
 
-	sum_truesize = skb_rbtree_purge(&qp->q.rb_fragments);
+	sum_truesize = inet_frag_rbtree_purge(&qp->q.rb_fragments);
 	sub_frag_mem_limit(qp->q.net, sum_truesize);
 
 	qp->q.flags = 0;
@@ -327,6 +332,7 @@ static int ip_frag_reinit(struct ipq *qp
 	qp->q.fragments = NULL;
 	qp->q.rb_fragments = RB_ROOT;
 	qp->q.fragments_tail = NULL;
+	qp->q.last_run_head = NULL;
 	qp->iif = 0;
 	qp->ecn = 0;
 
@@ -338,7 +344,7 @@ static int ip_frag_queue(struct ipq *qp,
 {
 	struct net *net = container_of(qp->q.net, struct net, ipv4.frags);
 	struct rb_node **rbn, *parent;
-	struct sk_buff *skb1;
+	struct sk_buff *skb1, *prev_tail;
 	struct net_device *dev;
 	unsigned int fragsize;
 	int flags, offset;
@@ -416,38 +422,41 @@ static int ip_frag_queue(struct ipq *qp,
 	 */
 
 	/* Find out where to put this fragment.  */
-	skb1 = qp->q.fragments_tail;
-	if (!skb1) {
-		/* This is the first fragment we've received. */
-		rb_link_node(&skb->rbnode, NULL, &qp->q.rb_fragments.rb_node);
-		qp->q.fragments_tail = skb;
-	} else if ((skb1->ip_defrag_offset + skb1->len) < end) {
-		/* This is the common/special case: skb goes to the end. */
+	prev_tail = qp->q.fragments_tail;
+	if (!prev_tail)
+		ip4_frag_create_run(&qp->q, skb);  /* First fragment. */
+	else if (prev_tail->ip_defrag_offset + prev_tail->len < end) {
+		/* This is the common case: skb goes to the end. */
 		/* Detect and discard overlaps. */
-		if (offset < (skb1->ip_defrag_offset + skb1->len))
+		if (offset < prev_tail->ip_defrag_offset + prev_tail->len)
 			goto discard_qp;
-		/* Insert after skb1. */
-		rb_link_node(&skb->rbnode, &skb1->rbnode, &skb1->rbnode.rb_right);
-		qp->q.fragments_tail = skb;
+		if (offset == prev_tail->ip_defrag_offset + prev_tail->len)
+			ip4_frag_append_to_last_run(&qp->q, skb);
+		else
+			ip4_frag_create_run(&qp->q, skb);
 	} else {
-		/* Binary search. Note that skb can become the first fragment, but
-		 * not the last (covered above). */
+		/* Binary search. Note that skb can become the first fragment,
+		 * but not the last (covered above).
+		 */
 		rbn = &qp->q.rb_fragments.rb_node;
 		do {
 			parent = *rbn;
 			skb1 = rb_to_skb(parent);
 			if (end <= skb1->ip_defrag_offset)
 				rbn = &parent->rb_left;
-			else if (offset >= skb1->ip_defrag_offset + skb1->len)
+			else if (offset >= skb1->ip_defrag_offset +
+						FRAG_CB(skb1)->frag_run_len)
 				rbn = &parent->rb_right;
 			else /* Found an overlap with skb1. */
 				goto discard_qp;
 		} while (*rbn);
 		/* Here we have parent properly set, and rbn pointing to
-		 * one of its NULL left/right children. Insert skb. */
+		 * one of its NULL left/right children. Insert skb.
+		 */
+		ip4_frag_init_run(skb);
 		rb_link_node(&skb->rbnode, parent, rbn);
+		rb_insert_color(&skb->rbnode, &qp->q.rb_fragments);
 	}
-	rb_insert_color(&skb->rbnode, &qp->q.rb_fragments);
 
 	if (dev)
 		qp->iif = dev->ifindex;
@@ -474,7 +483,7 @@ static int ip_frag_queue(struct ipq *qp,
 		unsigned long orefdst = skb->_skb_refdst;
 
 		skb->_skb_refdst = 0UL;
-		err = ip_frag_reasm(qp, skb, dev);
+		err = ip_frag_reasm(qp, skb, prev_tail, dev);
 		skb->_skb_refdst = orefdst;
 		return err;
 	}
@@ -493,7 +502,7 @@ err:
 
 /* Build a new IP datagram from all its fragments. */
 static int ip_frag_reasm(struct ipq *qp, struct sk_buff *skb,
-			 struct net_device *dev)
+			 struct sk_buff *prev_tail, struct net_device *dev)
 {
 	struct net *net = container_of(qp->q.net, struct net, ipv4.frags);
 	struct iphdr *iph;
@@ -517,10 +526,16 @@ static int ip_frag_reasm(struct ipq *qp,
 		fp = skb_clone(skb, GFP_ATOMIC);
 		if (!fp)
 			goto out_nomem;
-		rb_replace_node(&skb->rbnode, &fp->rbnode, &qp->q.rb_fragments);
+		FRAG_CB(fp)->next_frag = FRAG_CB(skb)->next_frag;
+		if (RB_EMPTY_NODE(&skb->rbnode))
+			FRAG_CB(prev_tail)->next_frag = fp;
+		else
+			rb_replace_node(&skb->rbnode, &fp->rbnode,
+					&qp->q.rb_fragments);
 		if (qp->q.fragments_tail == skb)
 			qp->q.fragments_tail = fp;
 		skb_morph(skb, head);
+		FRAG_CB(skb)->next_frag = FRAG_CB(head)->next_frag;
 		rb_replace_node(&head->rbnode, &skb->rbnode,
 				&qp->q.rb_fragments);
 		consume_skb(head);
@@ -556,7 +571,7 @@ static int ip_frag_reasm(struct ipq *qp,
 		for (i = 0; i < skb_shinfo(head)->nr_frags; i++)
 			plen += skb_frag_size(&skb_shinfo(head)->frags[i]);
 		clone->len = clone->data_len = head->data_len - plen;
-		skb->truesize += clone->truesize;
+		head->truesize += clone->truesize;
 		clone->csum = 0;
 		clone->ip_summed = head->ip_summed;
 		add_frag_mem_limit(qp->q.net, clone->truesize);
@@ -569,24 +584,36 @@ static int ip_frag_reasm(struct ipq *qp,
 	skb_push(head, head->data - skb_network_header(head));
 
 	/* Traverse the tree in order, to build frag_list. */
+	fp = FRAG_CB(head)->next_frag;
 	rbn = rb_next(&head->rbnode);
 	rb_erase(&head->rbnode, &qp->q.rb_fragments);
-	while (rbn) {
-		struct rb_node *rbnext = rb_next(rbn);
-		fp = rb_to_skb(rbn);
-		rb_erase(rbn, &qp->q.rb_fragments);
-		rbn = rbnext;
-		*nextp = fp;
-		nextp = &fp->next;
-		fp->prev = NULL;
-		memset(&fp->rbnode, 0, sizeof(fp->rbnode));
-		head->data_len += fp->len;
-		head->len += fp->len;
-		if (head->ip_summed != fp->ip_summed)
-			head->ip_summed = CHECKSUM_NONE;
-		else if (head->ip_summed == CHECKSUM_COMPLETE)
-			head->csum = csum_add(head->csum, fp->csum);
-		head->truesize += fp->truesize;
+	while (rbn || fp) {
+		/* fp points to the next sk_buff in the current run;
+		 * rbn points to the next run.
+		 */
+		/* Go through the current run. */
+		while (fp) {
+			*nextp = fp;
+			nextp = &fp->next;
+			fp->prev = NULL;
+			memset(&fp->rbnode, 0, sizeof(fp->rbnode));
+			head->data_len += fp->len;
+			head->len += fp->len;
+			if (head->ip_summed != fp->ip_summed)
+				head->ip_summed = CHECKSUM_NONE;
+			else if (head->ip_summed == CHECKSUM_COMPLETE)
+				head->csum = csum_add(head->csum, fp->csum);
+			head->truesize += fp->truesize;
+			fp = FRAG_CB(fp)->next_frag;
+		}
+		/* Move to the next run. */
+		if (rbn) {
+			struct rb_node *rbnext = rb_next(rbn);
+
+			fp = rb_to_skb(rbn);
+			rb_erase(rbn, &qp->q.rb_fragments);
+			rbn = rbnext;
+		}
 	}
 	sub_frag_mem_limit(qp->q.net, head->truesize);
 
@@ -622,6 +649,7 @@ static int ip_frag_reasm(struct ipq *qp,
 	qp->q.fragments = NULL;
 	qp->q.rb_fragments = RB_ROOT;
 	qp->q.fragments_tail = NULL;
+	qp->q.last_run_head = NULL;
 	return 0;
 
 out_nomem:



^ permalink raw reply	[flat|nested] 82+ messages in thread

* [PATCH 4.9 70/71] ip: frags: fix crash in ip_do_fragment()
  2018-10-16 17:08 [PATCH 4.9 00/71] 4.9.134-stable review Greg Kroah-Hartman
                   ` (68 preceding siblings ...)
  2018-10-16 17:10 ` [PATCH 4.9 69/71] ip: process in-order fragments efficiently Greg Kroah-Hartman
@ 2018-10-16 17:10 ` Greg Kroah-Hartman
  2018-10-16 17:10 ` [PATCH 4.9 71/71] ipv4: frags: precedence bug in ip_expire() Greg Kroah-Hartman
                   ` (5 subsequent siblings)
  75 siblings, 0 replies; 82+ messages in thread
From: Greg Kroah-Hartman @ 2018-10-16 17:10 UTC (permalink / raw)
  To: linux-kernel, netdev
  Cc: Greg Kroah-Hartman, stable, Eric Dumazet, Taehee Yoo, David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Taehee Yoo <ap420073@gmail.com>

commit 5d407b071dc369c26a38398326ee2be53651cfe4 upstream

A kernel crash occurs when a defragmented packet is fragmented
in ip_do_fragment().
In the defragment routine, skb_orphan() is called and
skb->ip_defrag_offset is set, but skb->sk and
skb->ip_defrag_offset are members of the same union, so
frag->sk is not NULL.
Hence the crash occurs in the skb->sk check in ip_do_fragment() when
a defragmented packet is fragmented.
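
The aliasing described above can be reproduced in user space with a small
sketch. struct toy_skb and the toy_* helpers below are hypothetical and only
mirror the one property that matters here, namely that sk and
ip_defrag_offset share storage. The sketch assumes a common platform where
a NULL pointer is all-zero bits and where writing a non-zero int into that
storage makes the pointer compare non-NULL.

```c
#include <assert.h>
#include <stddef.h>

struct toy_sock {
	int dummy;
};

/* Toy stand-in for the part of sk_buff under discussion. */
struct toy_skb {
	union {
		struct toy_sock *sk;	/* owning socket, or NULL */
		int ip_defrag_offset;	/* reuses the same bytes during defrag */
	};
};

/* Mimics the defrag path: orphan the skb, then record the fragment offset. */
static void toy_defrag_store_offset(struct toy_skb *skb, int offset)
{
	skb->sk = NULL;			/* state left behind by orphaning */
	skb->ip_defrag_offset = offset;	/* overwrites the storage of sk */
}

/* Mimics the fix: reset sk once the offset is no longer needed. */
static void toy_reasm_clear_sk(struct toy_skb *skb)
{
	skb->sk = NULL;
}

/* Mimics a later "does this skb have a socket?" check. */
static int toy_has_socket(const struct toy_skb *skb)
{
	return skb->sk != NULL;
}
```

After toy_defrag_store_offset(&skb, 1480), toy_has_socket(&skb) reports a
socket even though none is attached; calling toy_reasm_clear_sk(&skb) makes
the check behave again, which is the shape of the one-line fix in this patch.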

test commands:
   %iptables -t nat -I POSTROUTING -j MASQUERADE
   %hping3 192.168.4.2 -s 1000 -p 2000 -d 60000

splat looks like:
[  261.069429] kernel BUG at net/ipv4/ip_output.c:636!
[  261.075753] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
[  261.083854] CPU: 1 PID: 1349 Comm: hping3 Not tainted 4.19.0-rc2+ #3
[  261.100977] RIP: 0010:ip_do_fragment+0x1613/0x2600
[  261.106945] Code: e8 e2 38 e3 fe 4c 8b 44 24 18 48 8b 74 24 08 e9 92 f6 ff ff 80 3c 02 00 0f 85 da 07 00 00 48 8b b5 d0 00 00 00 e9 25 f6 ff ff <0f> 0b 0f 0b 44 8b 54 24 58 4c 8b 4c 24 18 4c 8b 5c 24 60 4c 8b 6c
[  261.127015] RSP: 0018:ffff8801031cf2c0 EFLAGS: 00010202
[  261.134156] RAX: 1ffff1002297537b RBX: ffffed0020639e6e RCX: 0000000000000004
[  261.142156] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880114ba9bd8
[  261.150157] RBP: ffff880114ba8a40 R08: ffffed0022975395 R09: ffffed0022975395
[  261.158157] R10: 0000000000000001 R11: ffffed0022975394 R12: ffff880114ba9ca4
[  261.166159] R13: 0000000000000010 R14: ffff880114ba9bc0 R15: dffffc0000000000
[  261.174169] FS:  00007fbae2199700(0000) GS:ffff88011b400000(0000) knlGS:0000000000000000
[  261.183012] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  261.189013] CR2: 00005579244fe000 CR3: 0000000119bf4000 CR4: 00000000001006e0
[  261.198158] Call Trace:
[  261.199018]  ? dst_output+0x180/0x180
[  261.205011]  ? save_trace+0x300/0x300
[  261.209018]  ? ip_copy_metadata+0xb00/0xb00
[  261.213034]  ? sched_clock_local+0xd4/0x140
[  261.218158]  ? kill_l4proto+0x120/0x120 [nf_conntrack]
[  261.223014]  ? rt_cpu_seq_stop+0x10/0x10
[  261.227014]  ? find_held_lock+0x39/0x1c0
[  261.233008]  ip_finish_output+0x51d/0xb50
[  261.237006]  ? ip_fragment.constprop.56+0x220/0x220
[  261.243011]  ? nf_ct_l4proto_register_one+0x5b0/0x5b0 [nf_conntrack]
[  261.250152]  ? rcu_is_watching+0x77/0x120
[  261.255010]  ? nf_nat_ipv4_out+0x1e/0x2b0 [nf_nat_ipv4]
[  261.261033]  ? nf_hook_slow+0xb1/0x160
[  261.265007]  ip_output+0x1c7/0x710
[  261.269005]  ? ip_mc_output+0x13f0/0x13f0
[  261.273002]  ? __local_bh_enable_ip+0xe9/0x1b0
[  261.278152]  ? ip_fragment.constprop.56+0x220/0x220
[  261.282996]  ? nf_hook_slow+0xb1/0x160
[  261.287007]  raw_sendmsg+0x21f9/0x4420
[  261.291008]  ? dst_output+0x180/0x180
[  261.297003]  ? sched_clock_cpu+0x126/0x170
[  261.301003]  ? find_held_lock+0x39/0x1c0
[  261.306155]  ? stop_critical_timings+0x420/0x420
[  261.311004]  ? check_flags.part.36+0x450/0x450
[  261.315005]  ? _raw_spin_unlock_irq+0x29/0x40
[  261.320995]  ? _raw_spin_unlock_irq+0x29/0x40
[  261.326142]  ? cyc2ns_read_end+0x10/0x10
[  261.330139]  ? raw_bind+0x280/0x280
[  261.334138]  ? sched_clock_cpu+0x126/0x170
[  261.338995]  ? check_flags.part.36+0x450/0x450
[  261.342991]  ? __lock_acquire+0x4500/0x4500
[  261.348994]  ? inet_sendmsg+0x11c/0x500
[  261.352989]  ? dst_output+0x180/0x180
[  261.357012]  inet_sendmsg+0x11c/0x500
[ ... ]

v2:
 - clear skb->sk at reassembly routine. (Eric Dumazet)

Fixes: fa0f527358bd ("ip: use rb trees for IP frag queue.")
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/ipv4/ip_fragment.c                  |    1 +
 net/ipv6/netfilter/nf_conntrack_reasm.c |    1 +
 2 files changed, 2 insertions(+)

--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -597,6 +597,7 @@ static int ip_frag_reasm(struct ipq *qp,
 			nextp = &fp->next;
 			fp->prev = NULL;
 			memset(&fp->rbnode, 0, sizeof(fp->rbnode));
+			fp->sk = NULL;
 			head->data_len += fp->len;
 			head->len += fp->len;
 			if (head->ip_summed != fp->ip_summed)
--- a/net/ipv6/netfilter/nf_conntrack_reasm.c
+++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
@@ -452,6 +452,7 @@ nf_ct_frag6_reasm(struct frag_queue *fq,
 		else if (head->ip_summed == CHECKSUM_COMPLETE)
 			head->csum = csum_add(head->csum, fp->csum);
 		head->truesize += fp->truesize;
+		fp->sk = NULL;
 	}
 	sub_frag_mem_limit(fq->q.net, head->truesize);
 



^ permalink raw reply	[flat|nested] 82+ messages in thread

* [PATCH 4.9 71/71] ipv4: frags: precedence bug in ip_expire()
  2018-10-16 17:08 [PATCH 4.9 00/71] 4.9.134-stable review Greg Kroah-Hartman
                   ` (69 preceding siblings ...)
  2018-10-16 17:10 ` [PATCH 4.9 70/71] ip: frags: fix crash in ip_do_fragment() Greg Kroah-Hartman
@ 2018-10-16 17:10 ` Greg Kroah-Hartman
  2018-10-17  7:20 ` [PATCH 4.9 00/71] 4.9.134-stable review Amit Pundir
                   ` (4 subsequent siblings)
  75 siblings, 0 replies; 82+ messages in thread
From: Greg Kroah-Hartman @ 2018-10-16 17:10 UTC (permalink / raw)
  To: linux-kernel, netdev
  Cc: Greg Kroah-Hartman, stable, Dan Carpenter, David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Dan Carpenter <dan.carpenter@oracle.com>

(commit 70837ffe3085c9a91488b52ca13ac84424da1042 upstream)

We accidentally removed the parentheses here, but they are required
because '!' has higher precedence than '&'.
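
A small stand-alone sketch of the difference, for illustration only. The
TOY_* flag values and the function names are hypothetical and are not taken
from the kernel headers; the grouping in the "buggy" variant is written out
explicitly to show how the compiler parses the unparenthesized expression.

```c
#include <assert.h>

#define TOY_FIRST_IN	0x1u	/* hypothetical: first fragment has arrived */
#define TOY_LAST_IN	0x2u	/* hypothetical: last fragment has arrived */

/* How "!flags & TOY_FIRST_IN" is parsed: '!' binds tighter than '&'. */
static int toy_first_missing_buggy(unsigned int flags)
{
	return (!flags) & TOY_FIRST_IN;
}

/* Intended test: true when the FIRST_IN bit is not set. */
static int toy_first_missing_fixed(unsigned int flags)
{
	return !(flags & TOY_FIRST_IN);
}
```

The two agree when flags is 0, but with only TOY_LAST_IN set the buggy form
yields 0 (it only ever tests whether flags is zero), while the fixed form
correctly reports that the first fragment is still missing.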

Fixes: fa0f527358bd ("ip: use rb trees for IP frag queue.")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/ipv4/ip_fragment.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -203,7 +203,7 @@ static void ip_expire(unsigned long arg)
 	__IP_INC_STATS(net, IPSTATS_MIB_REASMFAILS);
 	__IP_INC_STATS(net, IPSTATS_MIB_REASMTIMEOUT);
 
-	if (!qp->q.flags & INET_FRAG_FIRST_IN)
+	if (!(qp->q.flags & INET_FRAG_FIRST_IN))
 		goto out;
 
 	/* sk_buff::dev and sk_buff::rbnode are unionized. So we



^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 4.9 00/71] 4.9.134-stable review
  2018-10-16 17:08 [PATCH 4.9 00/71] 4.9.134-stable review Greg Kroah-Hartman
                   ` (70 preceding siblings ...)
  2018-10-16 17:10 ` [PATCH 4.9 71/71] ipv4: frags: precedence bug in ip_expire() Greg Kroah-Hartman
@ 2018-10-17  7:20 ` Amit Pundir
  2018-10-17  7:51   ` Greg Kroah-Hartman
  2018-10-17 13:19 ` Guenter Roeck
                   ` (3 subsequent siblings)
  75 siblings, 1 reply; 82+ messages in thread
From: Amit Pundir @ 2018-10-17  7:20 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: lkml, Linus Torvalds, Andrew Morton, Guenter Roeck, Shuah Khan,
	patches, Ben Hutchings, lkft-triage, stable

Hi,

On Tue, 16 Oct 2018 at 22:52, Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
>
> This is the start of the stable review cycle for the 4.9.134 release.
> There are 71 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
>
<snip> ..

> Paul Burton <paul.burton@mips.com>
>     MIPS: VDSO: Always map near top of user memory

This patch broke the stable-rc/linux-4.9.y build on kernelci
https://kernelci.org/build/stable-rc/branch/linux-4.9.y/kernel/v4.9.133-72-gda849e5647be/,
mostly because of a missing mips_gic_present() definition. It is added
in 582e2b4aecda ("MIPS: GIC: Introduce asm/mips-gic.h with accessor
functions"), which needs to be backported too because it doesn't apply
cleanly on linux-4.9.y.

Regards,
Amit Pundir

>
> Jann Horn <jannh@google.com>
>     mm/vmstat.c: fix outdated vmstat_text
>
> Daniel Rosenberg <drosen@google.com>
>     ext4: Fix error code in ext4_xattr_set_entry()
>
> Amber Lin <Amber.Lin@amd.com>
>     drm/amdgpu: Fix SDMA HQD destroy error on gfx_v7
>
> Vitaly Kuznetsov <vkuznets@redhat.com>
>     x86/kvm/lapic: always disable MMIO interface in x2APIC mode
>
> Nicolas Ferre <nicolas.ferre@microchip.com>
>     ARM: dts: at91: add new compatibility string for macb on sama5d3
>
> Nicolas Ferre <nicolas.ferre@microchip.com>
>     net: macb: disable scatter-gather for macb on sama5d3
>
> Jongsung Kim <neidhard.kim@lge.com>
>     stmmac: fix valid numbers of unicast filter entries
>
> Yu Zhao <yuzhao@google.com>
>     sound: enable interrupt after dma buffer initialization
>
> Dan Carpenter <dan.carpenter@oracle.com>
>     scsi: qla2xxx: Fix an endian bug in fcpcmd_is_corrupted()
>
> Laura Abbott <labbott@redhat.com>
>     scsi: iscsi: target: Don't use stack buffer for scatterlist
>
> Tony Lindgren <tony@atomide.com>
>     mfd: omap-usb-host: Fix dts probe of children
>
> Lei Yang <Lei.Yang@windriver.com>
>     selftests: memory-hotplug: add required configs
>
> Lei Yang <Lei.Yang@windriver.com>
>     selftests/efivarfs: add required kernel configs
>
> Danny Smith <danny.smith@axis.com>
>     ASoC: sigmadsp: safeload should not have lower byte limit
>
> Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
>     ASoC: wm8804: Add ACPI support
>
>
> -------------
>
> Diffstat:
>
>  Documentation/devicetree/bindings/net/macb.txt     |   1 +
>  Documentation/networking/ip-sysctl.txt             |  13 +-
>  Makefile                                           |   4 +-
>  arch/arm/boot/dts/sama5d3_emac.dtsi                |   2 +-
>  arch/mips/include/asm/processor.h                  |  10 +-
>  arch/mips/kernel/process.c                         |  25 +
>  arch/mips/kernel/vdso.c                            |  18 +-
>  arch/powerpc/include/asm/book3s/64/pgtable.h       |   4 +-
>  arch/x86/include/asm/pgtable_types.h               |   2 +-
>  arch/x86/include/uapi/asm/kvm.h                    |   1 +
>  arch/x86/kvm/lapic.c                               |  22 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c  |   2 +-
>  drivers/i2c/busses/i2c-scmi.c                      |   1 +
>  drivers/mfd/omap-usb-host.c                        |  11 +-
>  drivers/net/bonding/bond_main.c                    |  43 +-
>  drivers/net/dsa/bcm_sf2.c                          |  12 +-
>  drivers/net/ethernet/broadcom/bcmsysport.c         |  22 +-
>  drivers/net/ethernet/broadcom/bnxt/bnxt.c          |  13 +-
>  drivers/net/ethernet/cadence/macb.c                |   8 +
>  drivers/net/ethernet/hisilicon/hns/hnae.c          |   2 +-
>  drivers/net/ethernet/hisilicon/hns/hns_enet.c      |  30 +-
>  drivers/net/ethernet/marvell/mvpp2.c               |  10 +-
>  drivers/net/ethernet/qlogic/qlcnic/qlcnic.h        |   8 +-
>  .../net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c    |   3 +-
>  .../net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.h    |   3 +-
>  drivers/net/ethernet/qlogic/qlcnic/qlcnic_hw.h     |   3 +-
>  drivers/net/ethernet/qlogic/qlcnic/qlcnic_io.c     |  12 +-
>  .../net/ethernet/stmicro/stmmac/stmmac_platform.c  |   5 +-
>  drivers/net/team/team.c                            |   5 +
>  drivers/net/usb/qmi_wwan.c                         |   1 +
>  drivers/net/usb/smsc75xx.c                         |   1 +
>  drivers/scsi/qla2xxx/qla_target.h                  |   4 +-
>  drivers/target/iscsi/iscsi_target.c                |  22 +-
>  drivers/usb/host/xhci-hub.c                        |  18 +-
>  drivers/video/fbdev/aty/atyfb.h                    |   3 +-
>  drivers/video/fbdev/aty/atyfb_base.c               |   7 +-
>  drivers/video/fbdev/aty/mach64_ct.c                |  10 +-
>  fs/ext4/xattr.c                                    |   2 +-
>  include/linux/netdevice.h                          |   7 +
>  include/linux/rhashtable.h                         |   4 +-
>  include/linux/skbuff.h                             |  34 +-
>  include/net/bonding.h                              |   7 +-
>  include/net/inet_frag.h                            | 133 +++--
>  include/net/inet_sock.h                            |   6 -
>  include/net/ip.h                                   |   1 -
>  include/net/ip_fib.h                               |   1 +
>  include/net/ipv6.h                                 |  26 +-
>  include/uapi/linux/snmp.h                          |   1 +
>  lib/rhashtable.c                                   |   5 +-
>  mm/vmstat.c                                        |   1 -
>  net/core/dev.c                                     |  28 +-
>  net/core/rtnetlink.c                               |   6 +
>  net/core/skbuff.c                                  |  31 +-
>  net/dccp/input.c                                   |   4 +-
>  net/dccp/ipv4.c                                    |   4 +-
>  net/ieee802154/6lowpan/6lowpan_i.h                 |  26 +-
>  net/ieee802154/6lowpan/reassembly.c                | 148 +++---
>  net/ipv4/fib_frontend.c                            |  12 +-
>  net/ipv4/fib_semantics.c                           |  50 ++
>  net/ipv4/inet_connection_sock.c                    |   5 +-
>  net/ipv4/inet_fragment.c                           | 379 +++-----------
>  net/ipv4/ip_fragment.c                             | 573 ++++++++++++---------
>  net/ipv4/ip_sockglue.c                             |   3 +-
>  net/ipv4/ip_tunnel.c                               |   9 +
>  net/ipv4/proc.c                                    |   7 +-
>  net/ipv4/tcp_input.c                               |  37 +-
>  net/ipv4/tcp_ipv4.c                                |   4 +-
>  net/ipv6/addrconf.c                                |   4 +-
>  net/ipv6/ip6_tunnel.c                              |  13 +-
>  net/ipv6/netfilter/nf_conntrack_reasm.c            | 100 ++--
>  net/ipv6/proc.c                                    |   5 +-
>  net/ipv6/raw.c                                     |  29 +-
>  net/ipv6/reassembly.c                              | 212 ++++----
>  net/netlabel/netlabel_unlabeled.c                  |   3 +-
>  sound/hda/hdac_controller.c                        |   8 +-
>  sound/soc/codecs/sigmadsp.c                        |   3 +-
>  sound/soc/codecs/wm8804-i2c.c                      |  15 +-
>  tools/perf/scripts/python/export-to-postgresql.py  |   9 +
>  tools/testing/selftests/efivarfs/config            |   1 +
>  tools/testing/selftests/memory-hotplug/config      |   1 +
>  80 files changed, 1185 insertions(+), 1133 deletions(-)
>
>

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 4.9 00/71] 4.9.134-stable review
  2018-10-17  7:20 ` [PATCH 4.9 00/71] 4.9.134-stable review Amit Pundir
@ 2018-10-17  7:51   ` Greg Kroah-Hartman
  0 siblings, 0 replies; 82+ messages in thread
From: Greg Kroah-Hartman @ 2018-10-17  7:51 UTC (permalink / raw)
  To: Amit Pundir, Paul Burton, Huacai Chen
  Cc: lkml, Linus Torvalds, Andrew Morton, Guenter Roeck, Shuah Khan,
	patches, Ben Hutchings, lkft-triage, stable

On Wed, Oct 17, 2018 at 12:50:42PM +0530, Amit Pundir wrote:
> Hi,
> 
> On Tue, 16 Oct 2018 at 22:52, Greg Kroah-Hartman
> <gregkh@linuxfoundation.org> wrote:
> >
> > This is the start of the stable review cycle for the 4.9.134 release.
> > There are 71 patches in this series, all will be posted as a response
> > to this one.  If anyone has any issues with these being applied, please
> > let me know.
> >
> <snip> ..
> 
> > Paul Burton <paul.burton@mips.com>
> >     MIPS: VDSO: Always map near top of user memory
> 
> This patch broke stable-rc/linux-4.9.y build on kernelci
> https://kernelci.org/build/stable-rc/branch/linux-4.9.y/kernel/v4.9.133-72-gda849e5647be/,
> mostly because of missing mips_gic_present() definition. It is added
> in 582e2b4aecda ("MIPS: GIC: Introduce asm/mips-gic.h with accessor
> functions"), which need to be backported too because it doesn't apply
> cleanly on linux-4.9.y.

Thanks for letting me know, I've now just dropped this patch from the
tree.

greg k-h

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 4.9 00/71] 4.9.134-stable review
  2018-10-16 17:08 [PATCH 4.9 00/71] 4.9.134-stable review Greg Kroah-Hartman
                   ` (71 preceding siblings ...)
  2018-10-17  7:20 ` [PATCH 4.9 00/71] 4.9.134-stable review Amit Pundir
@ 2018-10-17 13:19 ` Guenter Roeck
  2018-10-17 13:32   ` Greg Kroah-Hartman
  2018-10-17 15:11 ` Rafael Tinoco
                   ` (2 subsequent siblings)
  75 siblings, 1 reply; 82+ messages in thread
From: Guenter Roeck @ 2018-10-17 13:19 UTC (permalink / raw)
  To: Greg Kroah-Hartman, linux-kernel
  Cc: torvalds, akpm, shuah, patches, ben.hutchings, lkft-triage, stable

On 10/16/2018 10:08 AM, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.9.134 release.
> There are 71 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
> 
> Responses should be made by Thu Oct 18 17:05:18 UTC 2018.
> Anything received after that time might be too late.
> 

Boot tests are still going on, but:

powerpc:defconfig (and others):

arch/powerpc/include/asm/book3s/64/pgtable.h: In function 'pte_modify':
arch/powerpc/include/asm/book3s/64/pgtable.h:74:24: error: '_PAGE_DEVMAP' undeclared

Introduced by commit 4091b46b1dfe ("mm: Preserve _PAGE_DEVMAP across mprotect() calls")
which fixes ebd31197931d for ppc64 which is not in v4.9.y. Unfortunately it also fixes
69660fd797c3 ("x86, mm: introduce _PAGE_DEVMAP") which _is_ in the branch. ebd31197931d
doesn't apply cleanly, so either someone needs to backport it, or the ppc changes need
to be dropped from 4091b46b1dfe.


Excellent example of why it isn't a good idea to fix two commits with one.

Guenter

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 4.9 00/71] 4.9.134-stable review
  2018-10-17 13:19 ` Guenter Roeck
@ 2018-10-17 13:32   ` Greg Kroah-Hartman
  0 siblings, 0 replies; 82+ messages in thread
From: Greg Kroah-Hartman @ 2018-10-17 13:32 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: linux-kernel, torvalds, akpm, shuah, patches, ben.hutchings,
	lkft-triage, stable

On Wed, Oct 17, 2018 at 06:19:06AM -0700, Guenter Roeck wrote:
> On 10/16/2018 10:08 AM, Greg Kroah-Hartman wrote:
> > This is the start of the stable review cycle for the 4.9.134 release.
> > There are 71 patches in this series, all will be posted as a response
> > to this one.  If anyone has any issues with these being applied, please
> > let me know.
> > 
> > Responses should be made by Thu Oct 18 17:05:18 UTC 2018.
> > Anything received after that time might be too late.
> > 
> 
> Boot tests are still going on, but:
> 
> powerpc:defconfig (and others):
> 
> arch/powerpc/include/asm/book3s/64/pgtable.h: In function 'pte_modify':
> arch/powerpc/include/asm/book3s/64/pgtable.h:74:24: error: '_PAGE_DEVMAP' undeclared
> 
> Introduced by commit 4091b46b1dfe ("mm: Preserve _PAGE_DEVMAP across mprotect() calls")
> which fixes ebd31197931d for ppc64 which is not in v4.9.y. Unfortunately it also fixes
> 69660fd797c3 ("x86, mm: introduce _PAGE_DEVMAP") which _is_ in the branch. ebd31197931d
> doesn't apply cleanly, so either someone needs to backport it, or the ppc changes need
> to be dropped from 4091b46b1dfe.

Now dropped from the tree, Jan Kara just reported the same thing.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 4.9 00/71] 4.9.134-stable review
  2018-10-16 17:08 [PATCH 4.9 00/71] 4.9.134-stable review Greg Kroah-Hartman
                   ` (72 preceding siblings ...)
  2018-10-17 13:19 ` Guenter Roeck
@ 2018-10-17 15:11 ` Rafael Tinoco
  2018-10-17 18:43 ` Shuah Khan
  2018-10-17 19:19 ` Guenter Roeck
  75 siblings, 0 replies; 82+ messages in thread
From: Rafael Tinoco @ 2018-10-17 15:11 UTC (permalink / raw)
  To: gregkh
  Cc: Linux Kernel, shuah, patches, lkft-triage, ben.hutchings, stable,
	akpm, torvalds, linux

On Tue, Oct 16, 2018 at 2:23 PM Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
>
> This is the start of the stable review cycle for the 4.9.134 release.
> There are 71 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Thu Oct 18 17:05:18 UTC 2018.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
>         https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.134-rc1.gz
> or in the git tree and branch at:
>         git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.9.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h

Results from Linaro’s test farm.
No regressions on arm64, arm, x86_64, and i386.

Summary
------------------------------------------------------------------------

kernel: 4.9.134-rc2
git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
git branch: linux-4.9.y
git commit: 9e48abe2679cbb419f7472c31d11c06711b5ebc7
git describe: v4.9.133-71-g9e48abe2679c
Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-4.9-oe/build/v4.9.133-71-g9e48abe2679c


No regressions (compared to build v4.9.133-72-gda849e5647be)


No fixes (compared to build v4.9.133-72-gda849e5647be)


Ran 20932 total tests in the following environments and test suites.

Environments
--------------
- dragonboard-410c - arm64
- hi6220-hikey - arm64
- i386
- juno-r2 - arm64
- qemu_arm
- qemu_arm64
- qemu_i386
- qemu_x86_64
- x15 - arm
- x86_64

Test Suites
-----------
* boot
* ltp-cap_bounds-tests
* ltp-containers-tests
* ltp-cve-tests
* ltp-fcntl-locktests-tests
* ltp-filecaps-tests
* ltp-fs_bind-tests
* ltp-fs_perms_simple-tests
* ltp-hugetlb-tests
* ltp-io-tests
* ltp-ipc-tests
* ltp-math-tests
* ltp-nptl-tests
* ltp-pty-tests
* ltp-sched-tests
* ltp-securebits-tests
* ltp-syscalls-tests
* ltp-timers-tests
* kselftest
* libhugetlbfs
* ltp-fs-tests
* ltp-fsx-tests
* ltp-open-posix-tests
* kselftest-vsyscall-mode-native
* kselftest-vsyscall-mode-none

-- 
Linaro LKFT
https://lkft.linaro.org

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 4.9 00/71] 4.9.134-stable review
  2018-10-16 17:08 [PATCH 4.9 00/71] 4.9.134-stable review Greg Kroah-Hartman
                   ` (73 preceding siblings ...)
  2018-10-17 15:11 ` Rafael Tinoco
@ 2018-10-17 18:43 ` Shuah Khan
  2018-10-17 19:19 ` Guenter Roeck
  75 siblings, 0 replies; 82+ messages in thread
From: Shuah Khan @ 2018-10-17 18:43 UTC (permalink / raw)
  To: Greg Kroah-Hartman, linux-kernel
  Cc: torvalds, akpm, linux, patches, ben.hutchings, lkft-triage,
	stable, Shuah Khan

On 10/16/2018 11:08 AM, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.9.134 release.
> There are 71 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
> 
> Responses should be made by Thu Oct 18 17:05:18 UTC 2018.
> Anything received after that time might be too late.
> 
> The whole patch series can be found in one patch at:
> 	https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.134-rc1.gz
> or in the git tree and branch at:
> 	git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.9.y
> and the diffstat can be found below.
> 
> thanks,
> 
> greg k-h
> 

Compiled and booted on my test system. No dmesg regressions.

thanks,
-- Shuah


^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 4.9 00/71] 4.9.134-stable review
  2018-10-16 17:08 [PATCH 4.9 00/71] 4.9.134-stable review Greg Kroah-Hartman
                   ` (74 preceding siblings ...)
  2018-10-17 18:43 ` Shuah Khan
@ 2018-10-17 19:19 ` Guenter Roeck
  2018-10-18  7:12   ` Greg Kroah-Hartman
  75 siblings, 1 reply; 82+ messages in thread
From: Guenter Roeck @ 2018-10-17 19:19 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: linux-kernel, torvalds, akpm, shuah, patches, ben.hutchings,
	lkft-triage, stable

On Tue, Oct 16, 2018 at 07:08:57PM +0200, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.9.134 release.
> There are 71 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
> 
> Responses should be made by Thu Oct 18 17:05:18 UTC 2018.
> Anything received after that time might be too late.
> 

For v4.9.133-70-gbc300460e99d:

Build results:
	total: 150 pass: 150 fail: 0
Qemu test results:
	total: 308 pass: 308 fail: 0

Details are available at https://kerneltests.org/builders/.

Guenter


* Re: [PATCH 4.9 00/71] 4.9.134-stable review
  2018-10-17 19:19 ` Guenter Roeck
@ 2018-10-18  7:12   ` Greg Kroah-Hartman
  0 siblings, 0 replies; 82+ messages in thread
From: Greg Kroah-Hartman @ 2018-10-18  7:12 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: linux-kernel, torvalds, akpm, shuah, patches, ben.hutchings,
	lkft-triage, stable

On Wed, Oct 17, 2018 at 12:19:39PM -0700, Guenter Roeck wrote:
> On Tue, Oct 16, 2018 at 07:08:57PM +0200, Greg Kroah-Hartman wrote:
> > This is the start of the stable review cycle for the 4.9.134 release.
> > There are 71 patches in this series, all will be posted as a response
> > to this one.  If anyone has any issues with these being applied, please
> > let me know.
> > 
> > Responses should be made by Thu Oct 18 17:05:18 UTC 2018.
> > Anything received after that time might be too late.
> > 
> 
> For v4.9.133-70-gbc300460e99d:
> 
> Build results:
> 	total: 150 pass: 150 fail: 0
> Qemu test results:
> 	total: 308 pass: 308 fail: 0
> 
> Details are available at https://kerneltests.org/builders/.

Great, thanks for testing all three of these and letting me know.

greg k-h


* Re: [PATCH 4.9 50/71] inet: frags: use rhashtables for reassembly units
  2018-10-16 17:09 ` [PATCH 4.9 50/71] inet: frags: use rhashtables for reassembly units Greg Kroah-Hartman
@ 2018-10-26 13:39   ` Stefan Schmidt
  2018-11-29 12:54     ` Greg Kroah-Hartman
  0 siblings, 1 reply; 82+ messages in thread
From: Stefan Schmidt @ 2018-10-26 13:39 UTC (permalink / raw)
  To: Greg Kroah-Hartman, linux-kernel, netdev
  Cc: stable, Eric Dumazet, Kirill Tkhai, Herbert Xu, Florian Westphal,
	Jesper Dangaard Brouer, Alexander Aring, Stefan Schmidt,
	David S. Miller

Hello Greg.

[Hope I am not too late for this]

On 16/10/2018 19:09, Greg Kroah-Hartman wrote:
> 4.9-stable review patch.  If anyone has any objections, please let me know.
> 
> ------------------
> 
> From: Eric Dumazet <edumazet@google.com>
> 
> Some applications still rely on IP fragmentation, and to be fair linux
> reassembly unit is not working under any serious load.
> 
> It uses static hash tables of 1024 buckets, and up to 128 items per bucket (!!!)
> 
> A work queue is supposed to garbage collect items when host is under memory
> pressure, and doing a hash rebuild, changing seed used in hash computations.
> 
> This work queue blocks softirqs for up to 25 ms when doing a hash rebuild,
> occurring every 5 seconds if host is under fire.
> 
> Then there is the problem of sharing this hash table for all netns.
> 
> It is time to switch to rhashtables, and allocate one of them per netns
> to speedup netns dismantle, since this is a critical metric these days.
> 
> Lookup is now using RCU. A followup patch will even remove
> the refcount hold/release left from prior implementation and save
> a couple of atomic operations.
> 
> Before this patch, 16 cpus (16 RX queue NIC) could not handle more
> than 1 Mpps frags DDOS.
> 
> After the patch, I reach 9 Mpps without any tuning, and can use up to 2GB
> of storage for the fragments (exact number depends on frags being evicted
> after timeout)
> 
> $ grep FRAG /proc/net/sockstat
> FRAG: inuse 1966916 memory 2140004608
> 
> A followup patch will change the limits for 64bit arches.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
> Cc: Herbert Xu <herbert@gondor.apana.org.au>
> Cc: Florian Westphal <fw@strlen.de>
> Cc: Jesper Dangaard Brouer <brouer@redhat.com>
> Cc: Alexander Aring <alex.aring@gmail.com>
> Cc: Stefan Schmidt <stefan@osg.samsung.com>
> Signed-off-by: David S. Miller <davem@davemloft.net>
> (cherry picked from commit 648700f76b03b7e8149d13cc2bdb3355035258a9)
> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> ---
>  Documentation/networking/ip-sysctl.txt  |    7 
>  include/net/inet_frag.h                 |   81 +++----
>  include/net/ipv6.h                      |   16 -
>  net/ieee802154/6lowpan/6lowpan_i.h      |   26 --
>  net/ieee802154/6lowpan/reassembly.c     |   91 +++-----
>  net/ipv4/inet_fragment.c                |  349 ++++++--------------------------
>  net/ipv4/ip_fragment.c                  |  112 ++++------
>  net/ipv6/netfilter/nf_conntrack_reasm.c |   51 +---
>  net/ipv6/reassembly.c                   |  110 ++++------
>  9 files changed, 267 insertions(+), 576 deletions(-)
> 

When this patch hit master a while back we had to address a regression
in the ieee802154 6lowpan layer. It seems this fix is missing in the
backport series (only looking at your patchset here, not the full tree).

https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=f18fa5de5ba7f1d6650951502bb96a6e4715a948

I would appreciate it if you could pull this into this series as well.

regards
Stefan Schmidt
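
For readers who want to check the fragment counters mentioned in the quoted
commit message, here is a minimal sketch of reading the FRAG line. It assumes
the "FRAG: inuse N memory M" layout shown in the quoted sockstat output, and
that the memory field is in bytes (which the "up to 2GB" remark suggests).
The helper name parse_frag_line is hypothetical and not part of any kernel
tooling.

```python
def parse_frag_line(line):
    """Parse a 'FRAG: inuse N memory M' line into a dict of ints.

    Assumption: the line has the layout quoted from /proc/net/sockstat
    in the commit message; other layouts raise ValueError.
    """
    label, _, rest = line.partition(":")
    if label.strip() != "FRAG":
        raise ValueError("not a FRAG line: %r" % line)
    fields = rest.split()
    # Fields alternate key, value: ['inuse', '1966916', 'memory', '2140004608']
    return {key: int(value) for key, value in zip(fields[0::2], fields[1::2])}


# Sample taken verbatim from the quoted commit message.
sample = "FRAG: inuse 1966916 memory 2140004608"
stats = parse_frag_line(sample)
print(stats["inuse"], stats["memory"])
```

On a live system the same helper could be applied to the matching line read
from /proc/net/sockstat instead of the hard-coded sample.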


* Re: [PATCH 4.9 50/71] inet: frags: use rhashtables for reassembly units
  2018-10-26 13:39   ` Stefan Schmidt
@ 2018-11-29 12:54     ` Greg Kroah-Hartman
  0 siblings, 0 replies; 82+ messages in thread
From: Greg Kroah-Hartman @ 2018-11-29 12:54 UTC (permalink / raw)
  To: Stefan Schmidt
  Cc: linux-kernel, netdev, stable, Eric Dumazet, Kirill Tkhai,
	Herbert Xu, Florian Westphal, Jesper Dangaard Brouer,
	Alexander Aring, Stefan Schmidt, David S. Miller

On Fri, Oct 26, 2018 at 03:39:47PM +0200, Stefan Schmidt wrote:
> Hello Greg.
> 
> [Hope I am not too late for this]
> 
> On 16/10/2018 19:09, Greg Kroah-Hartman wrote:
> > 4.9-stable review patch.  If anyone has any objections, please let me know.
> > 
> > ------------------
> > 
> > From: Eric Dumazet <edumazet@google.com>
> > 
> > Some applications still rely on IP fragmentation, and to be fair linux
> > reassembly unit is not working under any serious load.
> > 
> > It uses static hash tables of 1024 buckets, and up to 128 items per bucket (!!!)
> > 
> > A work queue is supposed to garbage collect items when host is under memory
> > pressure, and doing a hash rebuild, changing seed used in hash computations.
> > 
> > This work queue blocks softirqs for up to 25 ms when doing a hash rebuild,
> > occurring every 5 seconds if host is under fire.
> > 
> > Then there is the problem of sharing this hash table for all netns.
> > 
> > It is time to switch to rhashtables, and allocate one of them per netns
> > to speedup netns dismantle, since this is a critical metric these days.
> > 
> > Lookup is now using RCU. A followup patch will even remove
> > the refcount hold/release left from prior implementation and save
> > a couple of atomic operations.
> > 
> > Before this patch, 16 cpus (16 RX queue NIC) could not handle more
> > than 1 Mpps frags DDOS.
> > 
> > After the patch, I reach 9 Mpps without any tuning, and can use up to 2GB
> > of storage for the fragments (exact number depends on frags being evicted
> > after timeout)
> > 
> > $ grep FRAG /proc/net/sockstat
> > FRAG: inuse 1966916 memory 2140004608
> > 
> > A followup patch will change the limits for 64bit arches.
> > 
> > Signed-off-by: Eric Dumazet <edumazet@google.com>
> > Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
> > Cc: Herbert Xu <herbert@gondor.apana.org.au>
> > Cc: Florian Westphal <fw@strlen.de>
> > Cc: Jesper Dangaard Brouer <brouer@redhat.com>
> > Cc: Alexander Aring <alex.aring@gmail.com>
> > Cc: Stefan Schmidt <stefan@osg.samsung.com>
> > Signed-off-by: David S. Miller <davem@davemloft.net>
> > (cherry picked from commit 648700f76b03b7e8149d13cc2bdb3355035258a9)
> > Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > ---
> >  Documentation/networking/ip-sysctl.txt  |    7 
> >  include/net/inet_frag.h                 |   81 +++----
> >  include/net/ipv6.h                      |   16 -
> >  net/ieee802154/6lowpan/6lowpan_i.h      |   26 --
> >  net/ieee802154/6lowpan/reassembly.c     |   91 +++-----
> >  net/ipv4/inet_fragment.c                |  349 ++++++--------------------------
> >  net/ipv4/ip_fragment.c                  |  112 ++++------
> >  net/ipv6/netfilter/nf_conntrack_reasm.c |   51 +---
> >  net/ipv6/reassembly.c                   |  110 ++++------
> >  9 files changed, 267 insertions(+), 576 deletions(-)
> > 
> 
> When this patch hit master a while back we had to address a regression
> in the ieee802154 6lowpan layer. It seems this fix is missing in the
> backport series (only looking at your patchset here, not the full tree).
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=f18fa5de5ba7f1d6650951502bb96a6e4715a948
> 
> I would appreciate it if you could pull this into this series as well.

Now queued up for 4.14 and 4.9 as well, thanks.

greg k-h


end of thread, other threads:[~2018-11-29 12:54 UTC | newest]

Thread overview: 82+ messages
2018-10-16 17:08 [PATCH 4.9 00/71] 4.9.134-stable review Greg Kroah-Hartman
2018-10-16 17:08 ` [PATCH 4.9 01/71] ASoC: wm8804: Add ACPI support Greg Kroah-Hartman
2018-10-16 17:08 ` [PATCH 4.9 02/71] ASoC: sigmadsp: safeload should not have lower byte limit Greg Kroah-Hartman
2018-10-16 17:09 ` [PATCH 4.9 03/71] selftests/efivarfs: add required kernel configs Greg Kroah-Hartman
2018-10-16 17:09 ` [PATCH 4.9 04/71] selftests: memory-hotplug: add required configs Greg Kroah-Hartman
2018-10-16 17:09 ` [PATCH 4.9 05/71] mfd: omap-usb-host: Fix dts probe of children Greg Kroah-Hartman
2018-10-16 17:09 ` [PATCH 4.9 06/71] scsi: iscsi: target: Dont use stack buffer for scatterlist Greg Kroah-Hartman
2018-10-16 17:09 ` [PATCH 4.9 07/71] scsi: qla2xxx: Fix an endian bug in fcpcmd_is_corrupted() Greg Kroah-Hartman
2018-10-16 17:09 ` [PATCH 4.9 08/71] sound: enable interrupt after dma buffer initialization Greg Kroah-Hartman
2018-10-16 17:09 ` [PATCH 4.9 09/71] stmmac: fix valid numbers of unicast filter entries Greg Kroah-Hartman
2018-10-16 17:09 ` [PATCH 4.9 10/71] net: macb: disable scatter-gather for macb on sama5d3 Greg Kroah-Hartman
2018-10-16 17:09 ` [PATCH 4.9 11/71] ARM: dts: at91: add new compatibility string " Greg Kroah-Hartman
2018-10-16 17:09 ` [PATCH 4.9 12/71] x86/kvm/lapic: always disable MMIO interface in x2APIC mode Greg Kroah-Hartman
2018-10-16 17:09 ` [PATCH 4.9 13/71] drm/amdgpu: Fix SDMA HQD destroy error on gfx_v7 Greg Kroah-Hartman
2018-10-16 17:09 ` [PATCH 4.9 14/71] ext4: Fix error code in ext4_xattr_set_entry() Greg Kroah-Hartman
2018-10-16 17:09 ` [PATCH 4.9 15/71] mm/vmstat.c: fix outdated vmstat_text Greg Kroah-Hartman
2018-10-16 17:09 ` [PATCH 4.9 16/71] MIPS: VDSO: Always map near top of user memory Greg Kroah-Hartman
2018-10-16 17:09 ` [PATCH 4.9 17/71] mach64: detect the dot clock divider correctly on sparc Greg Kroah-Hartman
2018-10-16 17:09 ` [PATCH 4.9 18/71] perf script python: Fix export-to-postgresql.py occasional failure Greg Kroah-Hartman
2018-10-16 17:09 ` [PATCH 4.9 19/71] mm: Preserve _PAGE_DEVMAP across mprotect() calls Greg Kroah-Hartman
2018-10-16 17:09 ` [PATCH 4.9 20/71] i2c: i2c-scmi: fix for i2c_smbus_write_block_data Greg Kroah-Hartman
2018-10-16 17:09 ` [PATCH 4.9 21/71] xhci: Dont print a warning when setting link state for disabled ports Greg Kroah-Hartman
2018-10-16 17:09 ` [PATCH 4.9 22/71] bnxt_en: Fix TX timeout during netpoll Greg Kroah-Hartman
2018-10-16 17:09 ` [PATCH 4.9 23/71] bonding: avoid possible dead-lock Greg Kroah-Hartman
2018-10-16 17:09 ` [PATCH 4.9 24/71] ip6_tunnel: be careful when accessing the inner header Greg Kroah-Hartman
2018-10-16 17:09 ` [PATCH 4.9 25/71] ip_tunnel: " Greg Kroah-Hartman
2018-10-16 17:09 ` [PATCH 4.9 26/71] ipv4: fix use-after-free in ip_cmsg_recv_dstaddr() Greg Kroah-Hartman
2018-10-16 17:09 ` [PATCH 4.9 27/71] ipv6: take rcu lock in rawv6_send_hdrinc() Greg Kroah-Hartman
2018-10-16 17:09 ` [PATCH 4.9 28/71] net: dsa: bcm_sf2: Call setup during switch resume Greg Kroah-Hartman
2018-10-16 17:09 ` [PATCH 4.9 29/71] net: hns: fix for unmapping problem when SMMU is on Greg Kroah-Hartman
2018-10-16 17:09 ` [PATCH 4.9 30/71] net: ipv4: update fnhe_pmtu when first hops MTU changes Greg Kroah-Hartman
2018-10-16 17:09 ` [PATCH 4.9 31/71] net/ipv6: Display all addresses in output of /proc/net/if_inet6 Greg Kroah-Hartman
2018-10-16 17:09 ` [PATCH 4.9 32/71] netlabel: check for IPV4MASK in addrinfo_get Greg Kroah-Hartman
2018-10-16 17:09 ` [PATCH 4.9 33/71] net/usb: cancel pending work when unbinding smsc75xx Greg Kroah-Hartman
2018-10-16 17:09 ` [PATCH 4.9 34/71] qlcnic: fix Tx descriptor corruption on 82xx devices Greg Kroah-Hartman
2018-10-16 17:09 ` [PATCH 4.9 35/71] qmi_wwan: Added support for Gemaltos Cinterion ALASxx WWAN interface Greg Kroah-Hartman
2018-10-16 17:09 ` [PATCH 4.9 36/71] team: Forbid enslaving team device to itself Greg Kroah-Hartman
2018-10-16 17:09 ` [PATCH 4.9 37/71] net: dsa: bcm_sf2: Fix unbind ordering Greg Kroah-Hartman
2018-10-16 17:09 ` [PATCH 4.9 38/71] net: mvpp2: Extract the correct ethtype from the skb for tx csum offload Greg Kroah-Hartman
2018-10-16 17:09 ` [PATCH 4.9 39/71] net: systemport: Fix wake-up interrupt race during resume Greg Kroah-Hartman
2018-10-16 17:09 ` [PATCH 4.9 40/71] rtnl: limit IFLA_NUM_TX_QUEUES and IFLA_NUM_RX_QUEUES to 4096 Greg Kroah-Hartman
2018-10-16 17:09 ` [PATCH 4.9 41/71] tcp/dccp: fix lockdep issue when SYN is backlogged Greg Kroah-Hartman
2018-10-16 17:09 ` [PATCH 4.9 42/71] inet: make sure to grab rcu_read_lock before using ireq->ireq_opt Greg Kroah-Hartman
2018-10-16 17:09 ` [PATCH 4.9 43/71] inet: frags: change inet_frags_init_net() return value Greg Kroah-Hartman
2018-10-16 17:09 ` [PATCH 4.9 44/71] inet: frags: add a pointer to struct netns_frags Greg Kroah-Hartman
2018-10-16 17:09 ` [PATCH 4.9 45/71] inet: frags: refactor ipfrag_init() Greg Kroah-Hartman
2018-10-16 17:09 ` [PATCH 4.9 46/71] inet: frags: refactor ipv6_frag_init() Greg Kroah-Hartman
2018-10-16 17:09 ` [PATCH 4.9 47/71] inet: frags: refactor lowpan_net_frag_init() Greg Kroah-Hartman
2018-10-16 17:09 ` [PATCH 4.9 48/71] ipv6: export ip6 fragments sysctl to unprivileged users Greg Kroah-Hartman
2018-10-16 17:09 ` [PATCH 4.9 49/71] rhashtable: add schedule points Greg Kroah-Hartman
2018-10-16 17:09 ` [PATCH 4.9 50/71] inet: frags: use rhashtables for reassembly units Greg Kroah-Hartman
2018-10-26 13:39   ` Stefan Schmidt
2018-11-29 12:54     ` Greg Kroah-Hartman
2018-10-16 17:09 ` [PATCH 4.9 51/71] inet: frags: remove some helpers Greg Kroah-Hartman
2018-10-16 17:09 ` [PATCH 4.9 52/71] inet: frags: get rif of inet_frag_evicting() Greg Kroah-Hartman
2018-10-16 17:09 ` [PATCH 4.9 53/71] inet: frags: remove inet_frag_maybe_warn_overflow() Greg Kroah-Hartman
2018-10-16 17:09 ` [PATCH 4.9 54/71] inet: frags: break the 2GB limit for frags storage Greg Kroah-Hartman
2018-10-16 17:09 ` [PATCH 4.9 55/71] inet: frags: do not clone skb in ip_expire() Greg Kroah-Hartman
2018-10-16 17:09 ` [PATCH 4.9 56/71] ipv6: frags: rewrite ip6_expire_frag_queue() Greg Kroah-Hartman
2018-10-16 17:09 ` [PATCH 4.9 57/71] rhashtable: reorganize struct rhashtable layout Greg Kroah-Hartman
2018-10-16 17:09 ` [PATCH 4.9 58/71] inet: frags: reorganize struct netns_frags Greg Kroah-Hartman
2018-10-16 17:09 ` [PATCH 4.9 59/71] inet: frags: get rid of ipfrag_skb_cb/FRAG_CB Greg Kroah-Hartman
2018-10-16 17:09 ` [PATCH 4.9 60/71] inet: frags: fix ip6frag_low_thresh boundary Greg Kroah-Hartman
2018-10-16 17:09 ` [PATCH 4.9 61/71] ip: discard IPv4 datagrams with overlapping segments Greg Kroah-Hartman
2018-10-16 17:09 ` [PATCH 4.9 62/71] net: speed up skb_rbtree_purge() Greg Kroah-Hartman
2018-10-16 17:10 ` [PATCH 4.9 63/71] net: modify skb_rbtree_purge to return the truesize of all purged skbs Greg Kroah-Hartman
2018-10-16 17:10 ` [PATCH 4.9 64/71] ipv6: defrag: drop non-last frags smaller than min mtu Greg Kroah-Hartman
2018-10-16 17:10 ` [PATCH 4.9 65/71] net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends Greg Kroah-Hartman
2018-10-16 17:10 ` [PATCH 4.9 66/71] net: add rb_to_skb() and other rb tree helpers Greg Kroah-Hartman
2018-10-16 17:10 ` [PATCH 4.9 67/71] ip: use rb trees for IP frag queue Greg Kroah-Hartman
2018-10-16 17:10 ` [PATCH 4.9 68/71] ip: add helpers to process in-order fragments faster Greg Kroah-Hartman
2018-10-16 17:10 ` [PATCH 4.9 69/71] ip: process in-order fragments efficiently Greg Kroah-Hartman
2018-10-16 17:10 ` [PATCH 4.9 70/71] ip: frags: fix crash in ip_do_fragment() Greg Kroah-Hartman
2018-10-16 17:10 ` [PATCH 4.9 71/71] ipv4: frags: precedence bug in ip_expire() Greg Kroah-Hartman
2018-10-17  7:20 ` [PATCH 4.9 00/71] 4.9.134-stable review Amit Pundir
2018-10-17  7:51   ` Greg Kroah-Hartman
2018-10-17 13:19 ` Guenter Roeck
2018-10-17 13:32   ` Greg Kroah-Hartman
2018-10-17 15:11 ` Rafael Tinoco
2018-10-17 18:43 ` Shuah Khan
2018-10-17 19:19 ` Guenter Roeck
2018-10-18  7:12   ` Greg Kroah-Hartman
