All of lore.kernel.org
 help / color / mirror / Atom feed
* Stuck looping on list_empty(list) in free_pcppages_bulk()
@ 2021-08-30 23:12 Sultan Alsawaf
  2021-08-31 12:44 ` Mel Gorman
  2021-08-31 16:51 ` Vlastimil Babka
  0 siblings, 2 replies; 5+ messages in thread
From: Sultan Alsawaf @ 2021-08-30 23:12 UTC (permalink / raw)
  To: linux-mm; +Cc: mhocko, mgorman, akpm, linux-kernel

Hi,

I was recently given the following CPU stall splat and asked to look into it:
----------------8<----------------
 rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
 rcu:         44-...0: (1 GPs behind) idle=77e/1/0x4000000000000000 softirq=28480656/28480657 fqs=279939 
         (detected by 62, t=1140032 jiffies, g=224165577, q=8883218)
 Sending NMI from CPU 62 to CPUs 44:
 NMI backtrace for cpu 44
 CPU: 44 PID: 83957 Comm: perl Tainted: G             L    5.8.18-100.fc31.x86_64 #1
 RIP: 0010:free_pcppages_bulk+0x63/0x2a0
 RSP: 0018:ffffb3078698fb60 EFLAGS: 00000086
 RAX: ffff8b647db30390 RBX: ffffee5fcab67f48 RCX: ffffee5f30c79980
 RDX: 0000000000c31e66 RSI: 0000000000000007 RDI: 0000000000000007
 RBP: 0000000000000000 R08: ffffb3078698fb80 R09: ffffb3078698fb80
 R10: 00000000002ffa93 R11: 0000000000000000 R12: ffff8b647db30390
 R13: 0000000000000000 R14: ffff8b647db30380 R15: 000000006a340084
 FS:  0000000000000000(0000) GS:ffff8b647db00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00007ff2b5200fa8 CR3: 00000034ada0a000 CR4: 0000000000340ee0
 Call Trace:
  free_unref_page_list+0x113/0x1a0
  release_pages+0x3ad/0x450
  tlb_flush_mmu+0x36/0x160
  unmap_page_range+0xab6/0xee0
  unmap_vmas+0x6a/0xd0
  exit_mmap+0x97/0x170
  mmput+0x61/0x140
  do_exit+0x306/0xb80
  ? syscall_trace_enter+0x160/0x290
  do_group_exit+0x3a/0xa0
  __x64_sys_exit_group+0x14/0x20
  do_syscall_64+0x4d/0x90
  entry_SYSCALL_64_after_hwframe+0x44/0xa9
---------------->8----------------

I apologize in advance for reporting a bug on an EOL kernel. I don't see any
changes as of 5.14 that could address something like this, so I'm emailing in
case whatever happened here may be a bug affecting newer kernels.

With gdb, it appears that the CPU got stuck in the list_empty(list) loop inside
free_pcppages_bulk():
----------------8<----------------
	do {
		batch_free++;
		if (++migratetype == MIGRATE_PCPTYPES)
			migratetype = 0;
		list = &pcp->lists[migratetype];
	} while (list_empty(list));
---------------->8----------------

Although this code snippet is slightly different in 5.14, it's still ultimately
the same. Side note: I noticed that the way `migratetype` is incremented causes
`&pcp->lists[1]` to get looked at first rather than `&pcp->lists[0]`, since
`migratetype` will start out at 1. This quirk is still present in 5.14, though
the variable in question is now called `pindex`.

With some more gdb digging, I found that the `count` variable was stored in %ESI
at the time of the stall. According to register dump in the splat, %ESI was 7.

It looks like, for some reason, the pcp count was 7 higher than the number of
pages actually present in the pcp lists.

I tried to find some way that this could happen, but the only thing I could
think of was that maybe an allocation had both __GFP_RECLAIMABLE and
__GFP_MOVABLE set in its gfp mask, in which case the rmqueue() call in
get_page_from_freelist() would pass in a migratetype equal to MIGRATE_PCPTYPES
and then pages could be added to an out-of-bounds pcp list while still
incrementing the overall pcp count. This seems pretty unlikely though. As
another side note, it looks like there's nothing stopping this from occurring;
there's only a VM_WARN_ON() in gfp_migratetype() that checks if both bits are
set.

Any ideas on what may have happened here, or a link to a commit that may have
fixed this issue in newer kernels, would be much appreciated.

Thanks,
Sultan

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Stuck looping on list_empty(list) in free_pcppages_bulk()
  2021-08-30 23:12 Stuck looping on list_empty(list) in free_pcppages_bulk() Sultan Alsawaf
@ 2021-08-31 12:44 ` Mel Gorman
  2021-08-31 16:37   ` Sultan Alsawaf
  2021-08-31 16:51 ` Vlastimil Babka
  1 sibling, 1 reply; 5+ messages in thread
From: Mel Gorman @ 2021-08-31 12:44 UTC (permalink / raw)
  To: Sultan Alsawaf; +Cc: linux-mm, mhocko, akpm, linux-kernel

On Mon, Aug 30, 2021 at 04:12:51PM -0700, Sultan Alsawaf wrote:
> I apologize in advance for reporting a bug on an EOL kernel. I don't see any
> changes as of 5.14 that could address something like this, so I'm emailing in
> case whatever happened here may be a bug affecting newer kernels.
> 
> With gdb, it appears that the CPU got stuck in the list_empty(list) loop inside
> free_pcppages_bulk():
> ----------------8<----------------
> 	do {
> 		batch_free++;
> 		if (++migratetype == MIGRATE_PCPTYPES)
> 			migratetype = 0;
> 		list = &pcp->lists[migratetype];
> 	} while (list_empty(list));
> ---------------->8----------------
> 
> Although this code snippet is slightly different in 5.14, it's still ultimately
> the same. Side note: I noticed that the way `migratetype` is incremented causes
> `&pcp->lists[1]` to get looked at first rather than `&pcp->lists[0]`, since
> `migratetype` will start out at 1. This quirk is still present in 5.14, though
> the variable in question is now called `pindex`.
> 
> With some more gdb digging, I found that the `count` variable was stored in %ESI
> at the time of the stall. According to register dump in the splat, %ESI was 7.
> 
> It looks like, for some reason, the pcp count was 7 higher than the number of
> pages actually present in the pcp lists.
> 

That's your answer -- the PCP count has been corrupted or misaccounted.
Given this is a Fedora kernel, check for any patches affecting
mm/page_alloc.c that could be accounting related or that would affect
the IRQ disabling or zone lock acquisition for problems. Another
possibility is memory corruption -- either kernel or the hardware
itself.

> I tried to find some way that this could happen, but the only thing I could
> think of was that maybe an allocation had both __GFP_RECLAIMABLE and
> __GFP_MOVABLE set in its gfp mask, in which case the rmqueue() call in
> get_page_from_freelist() would pass in a migratetype equal to MIGRATE_PCPTYPES
> and then pages could be added to an out-of-bounds pcp list while still
> incrementing the overall pcp count. This seems pretty unlikely though.

It's unlikely because it would be an outright bug to specify both flags.

> As
> another side note, it looks like there's nothing stopping this from occurring;
> there's only a VM_WARN_ON() in gfp_migratetype() that checks if both bits are
> set.
> 

There is no explicit check for it because they should not be both set.
I don't think this happens in kernel but if an out-of-tree module did
it, it might corrupt adjacent PCPs.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Stuck looping on list_empty(list) in free_pcppages_bulk()
  2021-08-31 12:44 ` Mel Gorman
@ 2021-08-31 16:37   ` Sultan Alsawaf
  0 siblings, 0 replies; 5+ messages in thread
From: Sultan Alsawaf @ 2021-08-31 16:37 UTC (permalink / raw)
  To: Mel Gorman; +Cc: linux-mm, mhocko, akpm, linux-kernel

On Tue, Aug 31, 2021 at 01:44:49PM +0100, Mel Gorman wrote:
> That's your answer -- the PCP count has been corrupted or misaccounted.
> Given this is a Fedora kernel, check for any patches affecting
> mm/page_alloc.c that could be accounting related or that would affect
> the IRQ disabling or zone lock acquisition for problems. Another
> possibility is memory corruption -- either kernel or the hardware
> itself.

Hmm, I don't see any changes to mm/page_alloc.c from Fedora for this kernel.

What about a memory allocation originating from inside an NMI? I think this
could occur quite easily with an eBPF program registered to a tracepoint, as
some of the eBPF helpers do allocate memory on the fly for certain map types.

> > I tried to find some way that this could happen, but the only thing I could
> > think of was that maybe an allocation had both __GFP_RECLAIMABLE and
> > __GFP_MOVABLE set in its gfp mask, in which case the rmqueue() call in
> > get_page_from_freelist() would pass in a migratetype equal to MIGRATE_PCPTYPES
> > and then pages could be added to an out-of-bounds pcp list while still
> > incrementing the overall pcp count. This seems pretty unlikely though.
> 
> It's unlikely because it would be an outright bug to specify both flags.

Perhaps that VM_WARN_ON should be changed to a VM_BUG_ON?

Thanks,
Sultan

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Stuck looping on list_empty(list) in free_pcppages_bulk()
  2021-08-30 23:12 Stuck looping on list_empty(list) in free_pcppages_bulk() Sultan Alsawaf
  2021-08-31 12:44 ` Mel Gorman
@ 2021-08-31 16:51 ` Vlastimil Babka
  2021-08-31 16:58   ` Sultan Alsawaf
  1 sibling, 1 reply; 5+ messages in thread
From: Vlastimil Babka @ 2021-08-31 16:51 UTC (permalink / raw)
  To: Sultan Alsawaf, linux-mm; +Cc: mhocko, mgorman, akpm, linux-kernel

On 8/31/21 01:12, Sultan Alsawaf wrote:
> With some more gdb digging, I found that the `count` variable was stored in %ESI
> at the time of the stall. According to register dump in the splat, %ESI was 7.
> 
> It looks like, for some reason, the pcp count was 7 higher than the number of
> pages actually present in the pcp lists.
> 
> I tried to find some way that this could happen, but the only thing I could
> think of was that maybe an allocation had both __GFP_RECLAIMABLE and
> __GFP_MOVABLE set in its gfp mask, in which case the rmqueue() call in
> get_page_from_freelist() would pass in a migratetype equal to MIGRATE_PCPTYPES
> and then pages could be added to an out-of-bounds pcp list while still
> incrementing the overall pcp count. This seems pretty unlikely though. As
> another side note, it looks like there's nothing stopping this from occurring;
> there's only a VM_WARN_ON() in gfp_migratetype() that checks if both bits are
> set.
> 
> Any ideas on what may have happened here, or a link to a commit that may have
> fixed this issue in newer kernels, would be much appreciated.

Does the kernel have commit 88e8ac11d2ea3 backported? If not, and there were
memory hotplug operations happening, the infinite loop could happen. If that
commit was not backported, and instead 5c3ad2eb7104 was backported, possibly
there are more scenarios outside hotplug that can cause trouble.

> Thanks,
> Sultan
> 


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Stuck looping on list_empty(list) in free_pcppages_bulk()
  2021-08-31 16:51 ` Vlastimil Babka
@ 2021-08-31 16:58   ` Sultan Alsawaf
  0 siblings, 0 replies; 5+ messages in thread
From: Sultan Alsawaf @ 2021-08-31 16:58 UTC (permalink / raw)
  To: Vlastimil Babka; +Cc: linux-mm, mhocko, mgorman, akpm, linux-kernel

On Tue, Aug 31, 2021 at 06:51:23PM +0200, Vlastimil Babka wrote:
> Does the kernel have commit 88e8ac11d2ea3 backported? If not, and there were
> memory hotplug operations happening, the infinite loop could happen. If that
> commit was not backported, and instead 5c3ad2eb7104 was backported, possibly
> there are more scenarios outside hotplug that can cause trouble.

Yes, it appears that 5.8.18 contains the 88e8ac11d2ea3 backport (commit ID
5b21a116b1365 in linux-5.8.y).

Thanks,
Sultan

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2021-08-31 16:58 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-30 23:12 Stuck looping on list_empty(list) in free_pcppages_bulk() Sultan Alsawaf
2021-08-31 12:44 ` Mel Gorman
2021-08-31 16:37   ` Sultan Alsawaf
2021-08-31 16:51 ` Vlastimil Babka
2021-08-31 16:58   ` Sultan Alsawaf

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.