linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Johannes Weiner <hannes@cmpxchg.org>
To: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Nhat Pham <nphamcs@gmail.com>,
	Christian Heusel <christian@heusel.eu>,
	Seth Jennings <sjenning@redhat.com>,
	Dan Streetman <ddstreet@ieee.org>,
	Vitaly Wool <vitaly.wool@konsulko.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	David Runge <dave@sleepmap.de>,
	"Richard W.M. Jones" <rjones@redhat.com>,
	Mark W <instruform@gmail.com>,
	regressions@lists.linux.dev, Yosry Ahmed <yosryahmed@google.com>
Subject: Re: [REGRESSION] Null pointer dereference while shrinking zswap
Date: Wed, 17 Apr 2024 10:33:24 -0400	[thread overview]
Message-ID: <20240417143324.GA1055428@cmpxchg.org> (raw)
In-Reply-To: <246c1f4d-af13-40fa-b968-fbaf36b8f91f@linux.dev>

On Wed, Apr 17, 2024 at 11:44:45AM +0800, Chengming Zhou wrote:
> On 2024/4/17 08:22, Nhat Pham wrote:
> > On Tue, Apr 16, 2024 at 4:29 PM Nhat Pham <nphamcs@gmail.com> wrote:
> >>
> >> On Tue, Apr 16, 2024 at 3:14 PM Nhat Pham <nphamcs@gmail.com> wrote:
> >>>
> >>> On Tue, Apr 16, 2024 at 5:19 AM Christian Heusel <christian@heusel.eu> wrote:
> >>>>
> >>>> Hello everyone,
> >>>
> >>> Thanks for the report, Christian! Looking at it now.
> >>>
> >>>>
> >>>> while rebuilding a few packages in Arch Linux we have recently come
> >>>> across a regression in the linux kernel which was made visible by a test
> >>>> failure in libguestfs[0], where the booted kernel showed a Call Trace
> >>>> like the following one:
> >>>>
> >>>> [  218.738568] CPU: 0 PID: 167 Comm: guestfsd Not tainted 6.7.0-rc4-1-mainline-00158-gb5ba474f3f51 #1 bf39861cf50acae7a79c534e25532f28afe4e593^M
> >>>
> >>> Is this one of the kernel versions that was broken? That looks a bit
> >>> odd, as zswap shrinker landed on 6.8...
> >>
> >> Ah ignore this - I understand the versioning now...
> >>
> >>>
> >>>> [  218.739007] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Arch Linux 1.16.3-1-1 04/01/2014^M
> >>>> [  218.739787] RIP: 0010:memcg_page_state+0x9/0x30^M
> >>>> [  218.740299] Code: 0d b8 ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 0f 1f 00 0f 1f 44 00 00 <48> 8b 87 00 06 00 00 48 63 f6 31 d2 48 8b 04 f0 48 85 c0 48 0f 48^M
> >>>> [  218.740727] RSP: 0018:ffffb5fa808dfc10 EFLAGS: 00000202^M
> >>>> [  218.740862] RAX: 0000000000000000 RBX: ffffb5fa808dfce0 RCX: 0000000000000002^M
> >>>> [  218.741016] RDX: 0000000000000001 RSI: 0000000000000033 RDI: 0000000000000000^M
> >>>> [  218.741168] RBP: 0000000000000000 R08: ffff976681ff8000 R09: 0000000000000000^M
> >>>> [  218.741322] R10: 0000000000000001 R11: ffff9766833f9d00 R12: ffff9766ffffe780^M
> >>>> [  218.742167] R13: 0000000000000000 R14: ffff976680cc1800 R15: ffff976682204d80^M
> >>>> [  218.742376] FS:  00007f1479d9f540(0000) GS:ffff9766fbc00000(0000) knlGS:0000000000000000^M
> >>>> [  218.742569] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033^M
> >>>> [  218.743256] CR2: 0000000000000600 CR3: 0000000103606000 CR4: 0000000000750ef0^M
> >>>> [  218.743494] PKRU: 55555554^M
> >>>> [  218.743593] Call Trace:^M
> >>>> [  218.743733]  <TASK>^M
> >>>> [  218.743847]  ? __die+0x23/0x70^M
> >>>> [  218.743957]  ? page_fault_oops+0x171/0x4e0^M
> >>>> [  218.744056]  ? free_unref_page+0xf6/0x180^M
> >>>> [  218.744458]  ? exc_page_fault+0x7f/0x180^M
> >>>> [  218.744551]  ? asm_exc_page_fault+0x26/0x30^M
> >>>> [  218.744684]  ? memcg_page_state+0x9/0x30^M
> >>>> [  218.744779]  zswap_shrinker_count+0x9d/0x110^M
> >>>> [  218.744896]  do_shrink_slab+0x3a/0x360^M
> >>>> [  218.744990]  shrink_slab+0xc7/0x3c0^M
> >>>> [  218.745609]  drop_slab+0x85/0x140^M
> >>>> [  218.745691]  drop_caches_sysctl_handler+0x7e/0xd0^M
> >>>> [  218.745799]  proc_sys_call_handler+0x1c0/0x2e0^M
> >>>> [  218.745912]  vfs_write+0x23d/0x400^M
> >>>> [  218.745998]  ksys_write+0x6f/0xf0^M
> >>>> [  218.746080]  do_syscall_64+0x64/0xe0^M
> >>>> [  218.746169]  ? exit_to_user_mode_prepare+0x132/0x1f0^M
> >>>> [  218.746873]  entry_SYSCALL_64_after_hwframe+0x6e/0x76^M
> >>>>
> > 
> > Actually, inspecting the code a bit more - can memcg be null here?
> > 
> > Specifically, if mem_cgroup_disabled() is true, can we see null memcg
> > here? Looks like in this case, mem_cgroup_iter() can return null, and
> > the first iteration of drop_slab_node() can have null memcg if it's
> > returned by mem_cgroup_iter().
> > 
> > shrink_slab() will still proceed and call do_shrink_slab() if the
> > memcg is null - provided that mem_cgroup_disabled() holds:
> > 
> > if (!mem_cgroup_disabled() && !mem_cgroup_is_root(memcg))
> >       return shrink_slab_memcg(gfp_mask, nid, memcg, priority);
> > 
> 
> Ah, I think your analysis is very right, here memcg is NULL but memcg_page_state
> won't check.

+1.

I could reproduce the NULL crash locally with cgroup_disable=memory,
the shrinker enabled, and echo 3 >/proc/sys/vm/drop_caches.

> > Inside zswap_shrink_count(), all the memcg accessors in this area seem
> > to always check for null memcg (mem_cgroup_lruvec,
> > mem_cgroup_zswap_writeback_enabled), *except* memcg_page_state, which
> > is the one line that fail.
> > 
> > If this is all to it, we can simply add a null check or
> > mem_cgroup_disabled() check here, and use pool stats instead?
> 
> Both look ok to me. The VM could only set sc->memcg to NULL when memcg
> disabled, right?

Christian, can you please test the below patch on top of current
upstream?

diff --git a/mm/zswap.c b/mm/zswap.c
index caed028945b0..6f8850c44b61 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -1331,15 +1331,22 @@ static unsigned long zswap_shrinker_count(struct shrinker *shrinker,
 	if (!gfp_has_io_fs(sc->gfp_mask))
 		return 0;
 
-#ifdef CONFIG_MEMCG_KMEM
-	mem_cgroup_flush_stats(memcg);
-	nr_backing = memcg_page_state(memcg, MEMCG_ZSWAP_B) >> PAGE_SHIFT;
-	nr_stored = memcg_page_state(memcg, MEMCG_ZSWAPPED);
-#else
-	/* use pool stats instead of memcg stats */
-	nr_backing = zswap_pool_total_size >> PAGE_SHIFT;
-	nr_stored = atomic_read(&zswap_nr_stored);
-#endif
+	/*
+	 * For memcg, use the cgroup-wide ZSWAP stats since we don't
+	 * have them per-node and thus per-lruvec. Careful if memcg is
+	 * runtime-disabled: we can get sc->memcg == NULL, which is ok
+	 * for the lruvec, but not for memcg_page_state().
+	 *
+	 * Without memcg, use the zswap pool-wide metrics.
+	 */
+	if (!mem_cgroup_disabled()) {
+		mem_cgroup_flush_stats(memcg);
+		nr_backing = memcg_page_state(memcg, MEMCG_ZSWAP_B) >> PAGE_SHIFT;
+		nr_stored = memcg_page_state(memcg, MEMCG_ZSWAPPED);
+	} else {
+		nr_backing = zswap_pool_total_size >> PAGE_SHIFT;
+		nr_stored = atomic_read(&zswap_nr_stored);
+	}
 
 	if (!nr_stored)
 		return 0;

  reply	other threads:[~2024-04-17 14:33 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-04-16 12:19 [REGRESSION] Null pointer dereference while shrinking zswap Christian Heusel
2024-04-16 19:18 ` Andrew Morton
2024-04-16 19:57   ` Christian Heusel
2024-04-16 22:14 ` Nhat Pham
2024-04-16 22:29   ` Christian Heusel
2024-04-16 23:29   ` Nhat Pham
2024-04-17  0:22     ` Nhat Pham
2024-04-17  3:44       ` Chengming Zhou
2024-04-17 14:33         ` Johannes Weiner [this message]
2024-04-17 15:08           ` Richard W.M. Jones
2024-04-17 17:18           ` Christian Heusel
2024-04-18 12:40             ` Johannes Weiner
2024-04-18 14:25               ` Linux regression tracking (Thorsten Leemhuis)
2024-04-18 20:09               ` Yosry Ahmed
2024-04-19 14:22                 ` Johannes Weiner
2024-04-19 19:10                   ` Yosry Ahmed
2024-04-17  0:33 ` Nhat Pham

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240417143324.GA1055428@cmpxchg.org \
    --to=hannes@cmpxchg.org \
    --cc=akpm@linux-foundation.org \
    --cc=chengming.zhou@linux.dev \
    --cc=christian@heusel.eu \
    --cc=dave@sleepmap.de \
    --cc=ddstreet@ieee.org \
    --cc=instruform@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=nphamcs@gmail.com \
    --cc=regressions@lists.linux.dev \
    --cc=rjones@redhat.com \
    --cc=sjenning@redhat.com \
    --cc=vitaly.wool@konsulko.com \
    --cc=yosryahmed@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).