From: "Paul E. McKenney" <paulmck@kernel.org>
To: Qian Cai <quic_qiancai@quicinc.com>
Cc: Mel Gorman <mgorman@techsingularity.net>,
	Andrew Morton <akpm@linux-foundation.org>,
	Nicolas Saenz Julienne <nsaenzju@redhat.com>,
	Marcelo Tosatti <mtosatti@redhat.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	Michal Hocko <mhocko@kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Linux-MM <linux-mm@kvack.org>,
	kafai@fb.com, kpsingh@kernel.org
Subject: Re: [PATCH 0/6] Drain remote per-cpu directly v3
Date: Thu, 19 May 2022 12:15:24 -0700	[thread overview]
Message-ID: <20220519191524.GC1790663@paulmck-ThinkPad-P17-Gen-1> (raw)
In-Reply-To: <YoZGSd6yQL3EP8tk@qian>

On Thu, May 19, 2022 at 09:29:45AM -0400, Qian Cai wrote:
> On Wed, May 18, 2022 at 10:15:03AM -0700, Paul E. McKenney wrote:
> > So does this python script somehow change the tracing state?  (It does
> > not look to me like it does, but I could easily be missing something.)
> 
> No, I don't think so either. It pretty much just offlines memory sections
> one at a time.

No idea.

> > Either way, is there something else waiting for these RCU flavors?
> > (There should not be.)  Nevertheless, if so, there should be
> > a synchronize_rcu_tasks(), synchronize_rcu_tasks_rude(), or
> > synchronize_rcu_tasks_trace() on some other blocked task's stack
> > somewhere.
> 
> There are only three blocked tasks when this happens. The kmemleak_scan()
> is just the victim waiting for the locks taken by the stuck
> offline_pages()->synchronize_rcu() task.

OK, then I believe that the RCU Tasks flavors were innocent bystanders.

Is the task doing offline_pages()->synchronize_rcu() doing this
repeatedly?  Or is there a stalled RCU grace period?  (From what
I can see, offline_pages() is not doing huge numbers of calls to
synchronize_rcu() in any of its loops, but I freely admit that I do not
know this code.)

If repeatedly, one workaround is to use synchronize_rcu_expedited()
instead of synchronize_rcu().  A better fix might be to batch the
grace periods, so that one RCU grace period serves several page
offline operations.  An alternative better fix might be to use
call_rcu() instead of synchronize_rcu().

>  task:kmemleak        state:D stack:25824 pid: 1033 ppid:     2 flags:0x00000008
>  Call trace:
>   __switch_to
>   __schedule
>   schedule
>   percpu_rwsem_wait
>   __percpu_down_read
>   percpu_down_read.constprop.0
>   get_online_mems

This is read-acquiring the mem_hotplug_lock.  It looks like offline_pages()
write-acquires this same lock.

>   kmemleak_scan
>   kmemleak_scan_thread
>   kthread
>   ret_from_fork
> 
>  task:cppc_fie        state:D stack:23472 pid: 1848 ppid:     2 flags:0x00000008
>  Call trace:
>   __switch_to
>   __schedule
>   lockdep_recursion
> 
>  task:tee             state:D stack:24816 pid:16733 ppid: 16732 flags:0x0000020c
>  Call trace:
>   __switch_to
>   __schedule
>   schedule
>   schedule_timeout
>   __wait_for_common
>   wait_for_completion
>   __wait_rcu_gp
>   synchronize_rcu

So, yes, this task is sleeping in synchronize_rcu() while holding the lock
that kmemleak_scan wants to acquire.


>   lru_cache_disable
>   __alloc_contig_migrate_range
>   isolate_single_pageblock
>   start_isolate_page_range
>   offline_pages
>   memory_subsys_offline
>   device_offline
>   online_store
>   dev_attr_store
>   sysfs_kf_write
>   kernfs_fop_write_iter
>   new_sync_write
>   vfs_write
>   ksys_write
>   __arm64_sys_write
>   invoke_syscall
>   el0_svc_common.constprop.0
>   do_el0_svc
>   el0_svc
>   el0t_64_sync_handler
>   el0t_64_sync
>  
> > Or maybe something sleeps waiting for an RCU Tasks * callback to
> > be invoked.  In that case (and in the above case, for that matter),
> > at least one of these pointers would be non-NULL on some CPU:
> > 
> > 1.	rcu_tasks__percpu.cblist.head
> > 2.	rcu_tasks_rude__percpu.cblist.head
> > 3.	rcu_tasks_trace__percpu.cblist.head
> > 
> > The ->func field of the pointed-to structure contains a pointer to
> > the callback function, which will help work out what is going on.
> > (Most likely a wakeup being lost or not provided.)
> 
> What would be some of the easy ways to find out those? I can't see anything
> interesting from the output of sysrq-t.

Again, I believe that these are victims of circumstance.  Though that does
not explain why reverting those three patches makes things work better.

Or is it possible that reverting those three patches simply decreases
the probability of failure, rather than eliminating the failure?
Such a decrease could be due to many things, for example, changes to
offsets and sizes of data structures.

> > Alternatively, if your system has hundreds of thousands of tasks and
> > you have attached BPF programs to short-lived socket structures and you
> > don't yet have the workaround, then you can see hangs.  (I am working on a
> > longer-term fix.)  In the short term, applying the workaround is the right
> > thing to do.  (Adding a couple of the BPF guys on CC for their thoughts.)
> 
> The system is pretty much idle after a fresh reboot. The only workload is
> to run the script.

Do you ever see RCU CPU stall warnings?

Could you please trace the offline_pages() function?  Is it really stuck,
or is it being invoked periodically during the hang?

							Thanx, Paul

