linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Jiri Slaby <jslaby@suse.cz>,
	Michal Hocko <mhocko@suse.com>,
	Vladimir Davydov <vdavydov.dev@gmail.com>,
	Shakeel Butt <shakeelb@google.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: [PATCH 4.14 50/69] memcg: make it work on sparse non-0-node systems
Date: Fri,  7 Jun 2019 17:39:31 +0200	[thread overview]
Message-ID: <20190607153854.483053122@linuxfoundation.org> (raw)
In-Reply-To: <20190607153848.271562617@linuxfoundation.org>

From: Jiri Slaby <jslaby@suse.cz>

commit 3e8589963773a5c23e2f1fe4bcad0e9a90b7f471 upstream.

We have a single node system with node 0 disabled:
  Scanning NUMA topology in Northbridge 24
  Number of physical nodes 2
  Skipping disabled node 0
  Node 1 MemBase 0000000000000000 Limit 00000000fbff0000
  NODE_DATA(1) allocated [mem 0xfbfda000-0xfbfeffff]

This causes crashes in memcg when system boots:
  BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
  #PF error: [normal kernel read fault]
...
  RIP: 0010:list_lru_add+0x94/0x170
...
  Call Trace:
   d_lru_add+0x44/0x50
   dput.part.34+0xfc/0x110
   __fput+0x108/0x230
   task_work_run+0x9f/0xc0
   exit_to_usermode_loop+0xf5/0x100

It is reproducible as far as 4.12.  I did not try older kernels.  You have
to have a new enough systemd, e.g.  241 (the reason is unknown -- was not
investigated).  Cannot be reproduced with systemd 234.

The system crashes because the size of lru array is never updated in
memcg_update_all_list_lrus and the reads are past the zero-sized array,
causing dereferences of random memory.

The root cause are list_lru_memcg_aware checks in the list_lru code.  The
test in list_lru_memcg_aware is broken: it assumes node 0 is always
present, but it is not true on some systems as can be seen above.

So fix this by avoiding checks on node 0.  Remember the memcg-awareness by
a bool flag in struct list_lru.

Link: http://lkml.kernel.org/r/20190522091940.3615-1-jslaby@suse.cz
Fixes: 60d3fd32a7a9 ("list_lru: introduce per-memcg lists")
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Suggested-by: Vladimir Davydov <vdavydov.dev@gmail.com>
Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 include/linux/list_lru.h |    1 +
 mm/list_lru.c            |    8 +++-----
 2 files changed, 4 insertions(+), 5 deletions(-)

--- a/include/linux/list_lru.h
+++ b/include/linux/list_lru.h
@@ -52,6 +52,7 @@ struct list_lru {
 	struct list_lru_node	*node;
 #if defined(CONFIG_MEMCG) && !defined(CONFIG_SLOB)
 	struct list_head	list;
+	bool			memcg_aware;
 #endif
 };
 
--- a/mm/list_lru.c
+++ b/mm/list_lru.c
@@ -42,11 +42,7 @@ static void list_lru_unregister(struct l
 #if defined(CONFIG_MEMCG) && !defined(CONFIG_SLOB)
 static inline bool list_lru_memcg_aware(struct list_lru *lru)
 {
-	/*
-	 * This needs node 0 to be always present, even
-	 * in the systems supporting sparse numa ids.
-	 */
-	return !!lru->node[0].memcg_lrus;
+	return lru->memcg_aware;
 }
 
 static inline struct list_lru_one *
@@ -389,6 +385,8 @@ static int memcg_init_list_lru(struct li
 {
 	int i;
 
+	lru->memcg_aware = memcg_aware;
+
 	if (!memcg_aware)
 		return 0;
 



  parent reply	other threads:[~2019-06-07 15:43 UTC|newest]

Thread overview: 83+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-06-07 15:38 [PATCH 4.14 00/69] 4.14.124-stable review Greg Kroah-Hartman
2019-06-07 15:38 ` [PATCH 4.14 01/69] inet: switch IP ID generator to siphash Greg Kroah-Hartman
2019-06-07 15:38 ` [PATCH 4.14 02/69] ipv6: Consider sk_bound_dev_if when binding a raw socket to an address Greg Kroah-Hartman
2019-06-07 15:38 ` [PATCH 4.14 03/69] llc: fix skb leak in llc_build_and_send_ui_pkt() Greg Kroah-Hartman
2019-06-07 15:38 ` [PATCH 4.14 04/69] net: fec: fix the clk mismatch in failed_reset path Greg Kroah-Hartman
2019-06-07 15:38 ` [PATCH 4.14 05/69] net-gro: fix use-after-free read in napi_gro_frags() Greg Kroah-Hartman
2019-06-07 15:38 ` [PATCH 4.14 06/69] net: stmmac: fix reset gpio free missing Greg Kroah-Hartman
2019-06-07 15:38 ` [PATCH 4.14 07/69] usbnet: fix kernel crash after disconnect Greg Kroah-Hartman
2019-06-07 15:38 ` [PATCH 4.14 08/69] tipc: Avoid copying bytes beyond the supplied data Greg Kroah-Hartman
2019-06-07 15:38 ` [PATCH 4.14 09/69] net/mlx5: Allocate root ns memory using kzalloc to match kfree Greg Kroah-Hartman
2019-06-07 15:38 ` [PATCH 4.14 10/69] bnxt_en: Fix aggregation buffer leak under OOM condition Greg Kroah-Hartman
2019-06-07 15:38 ` [PATCH 4.14 11/69] ipv4/igmp: fix another memory leak in igmpv3_del_delrec() Greg Kroah-Hartman
2019-06-07 15:38 ` [PATCH 4.14 12/69] ipv4/igmp: fix build error if !CONFIG_IP_MULTICAST Greg Kroah-Hartman
2019-06-07 15:38 ` [PATCH 4.14 13/69] net: dsa: mv88e6xxx: fix handling of upper half of STATS_TYPE_PORT Greg Kroah-Hartman
2019-06-07 15:38 ` [PATCH 4.14 14/69] net: mvneta: Fix err code path of probe Greg Kroah-Hartman
2019-06-07 15:38 ` [PATCH 4.14 15/69] net: mvpp2: fix bad MVPP2_TXQ_SCHED_TOKEN_CNTR_REG queue value Greg Kroah-Hartman
2019-06-07 15:38 ` [PATCH 4.14 16/69] net: phy: marvell10g: report if the PHY fails to boot firmware Greg Kroah-Hartman
2019-06-07 15:38 ` [PATCH 4.14 17/69] crypto: vmx - ghash: do nosimd fallback manually Greg Kroah-Hartman
2019-06-07 15:38 ` [PATCH 4.14 18/69] xen/pciback: Dont disable PCI_COMMAND on PCI device reset Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 4.14 19/69] Revert "tipc: fix modprobe tipc failed after switch order of device registration" Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 4.14 20/69] tipc: fix modprobe tipc failed after switch order of device registration Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 4.14 21/69] sparc64: Fix regression in non-hypervisor TLB flush xcall Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 4.14 22/69] include/linux/bitops.h: sanitize rotate primitives Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 4.14 23/69] xhci: update bounce buffer with correct sg num Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 4.14 24/69] xhci: Use %zu for printing size_t type Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 4.14 25/69] xhci: Convert xhci_handshake() to use readl_poll_timeout_atomic() Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 4.14 26/69] usb: xhci: avoid null pointer deref when bos field is NULL Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 4.14 27/69] usbip: usbip_host: fix BUG: sleeping function called from invalid context Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 4.14 28/69] usbip: usbip_host: fix stub_dev lock context imbalance regression Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 4.14 29/69] USB: Fix slab-out-of-bounds write in usb_get_bos_descriptor Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 4.14 30/69] USB: sisusbvga: fix oops in error path of sisusb_probe Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 4.14 31/69] USB: Add LPM quirk for Surface Dock GigE adapter Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 4.14 32/69] USB: rio500: refuse more than one device at a time Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 4.14 33/69] USB: rio500: fix memory leak in close after disconnect Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 4.14 34/69] media: usb: siano: Fix general protection fault in smsusb Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 4.14 35/69] media: usb: siano: Fix false-positive "uninitialized variable" warning Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 4.14 36/69] media: smsusb: better handle optional alignment Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 4.14 37/69] scsi: zfcp: fix missing zfcp_port reference put on -EBUSY from port_remove Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 4.14 38/69] scsi: zfcp: fix to prevent port_remove with pure auto scan LUNs (only sdevs) Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 4.14 39/69] Btrfs: fix wrong ctime and mtime of a directory after log replay Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 4.14 40/69] Btrfs: fix race updating log root item during fsync Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 4.14 41/69] Btrfs: fix fsync not persisting changed attributes of a directory Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 4.14 42/69] Btrfs: incremental send, fix file corruption when no-holes feature is enabled Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 4.14 43/69] KVM: PPC: Book3S HV: XIVE: Do not clear IRQ data of passthrough interrupts Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 4.14 44/69] powerpc/perf: Fix MMCRA corruption by bhrb_filter Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 4.14 45/69] ALSA: hda/realtek - Set default power save node to 0 Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 4.14 46/69] KVM: s390: Do not report unusabled IDs via KVM_CAP_MAX_VCPU_ID Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 4.14 47/69] drm/nouveau/i2c: Disable i2c bus access after ->fini() Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 4.14 48/69] tty: serial: msm_serial: Fix XON/XOFF Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 4.14 49/69] tty: max310x: Fix external crystal register setup Greg Kroah-Hartman
2019-06-07 15:39 ` Greg Kroah-Hartman [this message]
2019-06-07 15:39 ` [PATCH 4.14 51/69] kernel/signal.c: trace_signal_deliver when signal_group_exit Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 4.14 52/69] docs: Fix conf.py for Sphinx 2.0 Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 4.14 53/69] doc: Cope with the deprecation of AutoReporter Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 4.14 54/69] doc: Cope with Sphinx logging deprecations Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 4.14 55/69] ima: show rules with IMA_INMASK correctly Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 4.14 56/69] serial: sh-sci: disable DMA for uart_console Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 4.14 57/69] staging: vc04_services: prevent integer overflow in create_pagelist() Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 4.14 58/69] staging: wlan-ng: fix adapter initialization failure Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 4.14 59/69] CIFS: cifs_read_allocate_pages: dont iterate through whole page array on ENOMEM Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 4.14 60/69] Revert "lockd: Show pid of lockd for remote locks" Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 4.14 61/69] gcc-plugins: Fix build failures under Darwin host Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 4.14 62/69] drm/vmwgfx: Dont send drm sysfs hotplug events on initial master set Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 4.14 63/69] drm/rockchip: shutdown drm subsystem on shutdown Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 4.14 64/69] Compiler Attributes: add support for __copy (gcc >= 9) Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 4.14 65/69] include/linux/module.h: copy __init/__exit attrs to init/cleanup_module Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 4.14 66/69] Revert "x86/build: Move _etext to actual end of .text" Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 4.14 67/69] Revert "binder: fix handling of misaligned binder object" Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 4.14 68/69] binder: fix race between munmap() and direct reclaim Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 4.14 69/69] media: uvcvideo: Fix uvc_alloc_entity() allocation alignment Greg Kroah-Hartman
2019-06-07 16:11 ` [PATCH 4.14 00/69] 4.14.124-stable review Guenter Roeck
2019-06-07 16:16   ` Greg Kroah-Hartman
2019-06-07 16:27     ` Guenter Roeck
2019-06-07 16:32       ` Greg Kroah-Hartman
2019-06-07 16:38         ` Guenter Roeck
2019-06-07 16:35       ` Ben Hutchings
2019-06-08  9:28         ` Greg Kroah-Hartman
2019-06-07 19:29 ` kernelci.org bot
2019-06-08  7:13 ` Naresh Kamboju
2019-06-08  9:32 ` Greg Kroah-Hartman
2019-06-08 19:06   ` Naresh Kamboju
2019-06-09  7:14     ` Greg Kroah-Hartman
2019-06-08 18:45 ` Guenter Roeck

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190607153854.483053122@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=akpm@linux-foundation.org \
    --cc=hannes@cmpxchg.org \
    --cc=jslaby@suse.cz \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mhocko@suse.com \
    --cc=raghavendra.kt@linux.vnet.ibm.com \
    --cc=shakeelb@google.com \
    --cc=stable@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=vdavydov.dev@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).