From: Suren Baghdasaryan <surenb@google.com>
To: Pavan Kondeti <quic_pkondeti@quicinc.com>
Cc: Michel Lespinasse <michel@lespinasse.org>,
	Linux-MM <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	kernel-team@fb.com, Laurent Dufour <ldufour@linux.ibm.com>,
	Jerome Glisse <jglisse@google.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Michal Hocko <mhocko@suse.com>, Vlastimil Babka <vbabka@suse.cz>,
	Davidlohr Bueso <dave@stgolabs.net>,
	Matthew Wilcox <willy@infradead.org>,
	Liam Howlett <liam.howlett@oracle.com>,
	Rik van Riel <riel@surriel.com>,
	Paul McKenney <paulmck@kernel.org>,
	Song Liu <songliubraving@fb.com>,
	Minchan Kim <minchan@google.com>,
	Joel Fernandes <joelaf@google.com>,
	David Rientjes <rientjes@google.com>,
	Axel Rasmussen <axelrasmussen@google.com>,
	Andy Lutomirski <luto@kernel.org>
Subject: Re: [PATCH v2 23/35] mm: add mmu_notifier_lock
Date: Wed, 27 Jul 2022 13:30:29 -0700
Message-ID: <CAJuCfpG0_xwGhTbzWRRwcBKO263TgrVm0T1gJ+PdzcL-EzcHpA@mail.gmail.com>
In-Reply-To: <20220727073420.GA8985@hu-pkondeti-hyd.qualcomm.com>

On Wed, Jul 27, 2022 at 12:34 AM Pavan Kondeti
<quic_pkondeti@quicinc.com> wrote:
>
> On Fri, Jan 28, 2022 at 05:09:54AM -0800, Michel Lespinasse wrote:
> > Introduce mmu_notifier_lock as a per-mm percpu_rw_semaphore,
> > as well as the code to initialize and destroy it together with the mm.
> >
> > This lock will be used to prevent races between mmu_notifier_register()
> > and speculative fault handlers that need to fire MMU notifications
> > without holding any of the mmap or rmap locks.
> >
> > Signed-off-by: Michel Lespinasse <michel@lespinasse.org>
> > ---
> >  include/linux/mm_types.h     |  6 +++++-
> >  include/linux/mmu_notifier.h | 27 +++++++++++++++++++++++++--
> >  kernel/fork.c                |  3 ++-
> >  3 files changed, 32 insertions(+), 4 deletions(-)
> >
> > diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> > index 305f05d2a4bc..f77e2dec038d 100644
> > --- a/include/linux/mm_types.h
> > +++ b/include/linux/mm_types.h
> > @@ -462,6 +462,7 @@ struct vm_area_struct {
> >  } __randomize_layout;
> >
> >  struct kioctx_table;
> > +struct percpu_rw_semaphore;
> >  struct mm_struct {
> >       struct {
> >               struct vm_area_struct *mmap;            /* list of VMAs */
> > @@ -608,7 +609,10 @@ struct mm_struct {
> >               struct file __rcu *exe_file;
> >  #ifdef CONFIG_MMU_NOTIFIER
> >               struct mmu_notifier_subscriptions *notifier_subscriptions;
> > -#endif
> > +#ifdef CONFIG_SPECULATIVE_PAGE_FAULT
> > +             struct percpu_rw_semaphore *mmu_notifier_lock;
> > +#endif       /* CONFIG_SPECULATIVE_PAGE_FAULT */
> > +#endif       /* CONFIG_MMU_NOTIFIER */
> >  #if defined(CONFIG_TRANSPARENT_HUGEPAGE) && !USE_SPLIT_PMD_PTLOCKS
> >               pgtable_t pmd_huge_pte; /* protected by page_table_lock */
> >  #endif
> > diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
> > index 45fc2c81e370..ace76fe91c0c 100644
> > --- a/include/linux/mmu_notifier.h
> > +++ b/include/linux/mmu_notifier.h
> > @@ -6,6 +6,8 @@
> >  #include <linux/spinlock.h>
> >  #include <linux/mm_types.h>
> >  #include <linux/mmap_lock.h>
> > +#include <linux/percpu-rwsem.h>
> > +#include <linux/slab.h>
> >  #include <linux/srcu.h>
> >  #include <linux/interval_tree.h>
> >
> > @@ -499,15 +501,35 @@ static inline void mmu_notifier_invalidate_range(struct mm_struct *mm,
> >               __mmu_notifier_invalidate_range(mm, start, end);
> >  }
> >
> > -static inline void mmu_notifier_subscriptions_init(struct mm_struct *mm)
> > +static inline bool mmu_notifier_subscriptions_init(struct mm_struct *mm)
> >  {
> > +#ifdef CONFIG_SPECULATIVE_PAGE_FAULT
> > +     mm->mmu_notifier_lock = kzalloc(sizeof(struct percpu_rw_semaphore), GFP_KERNEL);
> > +     if (!mm->mmu_notifier_lock)
> > +             return false;
> > +     if (percpu_init_rwsem(mm->mmu_notifier_lock)) {
> > +             kfree(mm->mmu_notifier_lock);
> > +             return false;
> > +     }
> > +#endif
> > +
> >       mm->notifier_subscriptions = NULL;
> > +     return true;
> >  }
> >
> >  static inline void mmu_notifier_subscriptions_destroy(struct mm_struct *mm)
> >  {
> >       if (mm_has_notifiers(mm))
> >               __mmu_notifier_subscriptions_destroy(mm);
> > +
> > +#ifdef CONFIG_SPECULATIVE_PAGE_FAULT
> > +     if (!in_atomic()) {
> > +             percpu_free_rwsem(mm->mmu_notifier_lock);
> > +             kfree(mm->mmu_notifier_lock);
> > +     } else {
> > +             percpu_rwsem_async_destroy(mm->mmu_notifier_lock);
> > +     }
> > +#endif
> >  }
> >
>
> We have received a bug report from a customer running the Android GKI kernel
> android-13-5.15 branch, where this series is included. As the call stack [1]
> indicates, the !in_atomic() test by itself is not sufficient to decide that
> the percpu rwsem can be freed synchronously.
>
> The scenario deduced from the callstack:
>
> - Context switch on CPU#0 from 'A' to idle. The idle thread took A's mm.
>
> - 'A' later ran on another CPU and exited. A's mm still has a reference.
>
> - Now CPU#0 is being hotplugged out. As part of this, the idle thread's
> mm is switched (in idle_task_exit()), but freeing its active_mm is
> deferred to finish_cpu(), which gets called later from the control processor
> (the thread which initiated the CPU hotplug). Please see the reasoning
> for why mmdrop() is not called in idle_task_exit() in
> commit bf2c59fce4074 ('sched/core: Fix illegal RCU from offline CPUs').
>
> - Now finish_cpu() calls percpu_free_rwsem() directly, since we are not
> in an atomic context. But we are in the hotplug path, where cpus_write_lock()
> has already been taken, and this causes the deadlock.
>
> I am not sure if there is a clean way other than freeing the per-cpu
> rwsemaphore asynchronously all the time.

Thanks for reporting this issue, Pavan. I think your suggestion of
doing unconditional async destruction of mmu_notifier_lock would be
fine here. percpu_rwsem_async_destroy() adds some overhead to schedule
the work, but I don't think the exit path is performance-critical
enough for that to matter. Michel, WDYT?

>
> [1]
>
> -001|context_switch(inline)
> -001|__schedule()
> -002|__preempt_count_sub(inline)
> -002|schedule()
> -003|_raw_spin_unlock_irq(inline)
> -003|spin_unlock_irq(inline)
> -003|percpu_rwsem_wait()
> -004|__preempt_count_add(inline)
> -004|__percpu_down_read()
> -005|percpu_down_read(inline)
> -005|cpus_read_lock() // trying to get cpu_hotplug_lock again
> -006|rcu_barrier()
> -007|rcu_sync_dtor()
> -008|mmu_notifier_subscriptions_destroy(inline)
> -008|__mmdrop()
> -009|mmdrop(inline)
> -009|finish_cpu()
> -010|cpuhp_invoke_callback()
> -011|cpuhp_invoke_callback_range(inline)
> -011|cpuhp_down_callbacks()
> -012|_cpu_down() // acquired cpu_hotplug_lock (write lock)
>
> Thanks,
> Pavan
>

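For anyone following the call stack in [1], the reference counting
behind the scenario can be replayed with a small userspace model. This
is only a sketch: struct mm_model, the drop_ctx enum and the helper
names are hypothetical, and the counter merely imitates how mm_count is
grabbed for a lazy active_mm and dropped later.

```c
#include <stdatomic.h>

/* Hypothetical labels for where a reference is dropped. */
enum drop_ctx { CTX_NONE, CTX_TASK_EXIT, CTX_FINISH_CPU };

/* Toy stand-in for the mm_count part of struct mm_struct. */
struct mm_model {
	atomic_int mm_count;
	enum drop_ctx freed_in;	/* context that dropped the last reference */
};

static void mm_model_grab(struct mm_model *mm)
{
	atomic_fetch_add(&mm->mm_count, 1);
}

/* Imitates mmdrop(): whoever drops the last reference does the freeing. */
static void mm_model_drop(struct mm_model *mm, enum drop_ctx ctx)
{
	if (atomic_fetch_sub(&mm->mm_count, 1) == 1)
		mm->freed_in = ctx;
}

/* Replays the reported sequence and returns where the mm got freed. */
static enum drop_ctx replay_scenario(void)
{
	struct mm_model mm = { .freed_in = CTX_NONE };

	/* Reference held on behalf of the users of A's mm. */
	atomic_init(&mm.mm_count, 1);

	/* CPU#0 switches from 'A' to idle; idle borrows A's mm. */
	mm_model_grab(&mm);

	/* 'A' runs on another CPU and exits; the mm stays alive. */
	mm_model_drop(&mm, CTX_TASK_EXIT);

	/*
	 * CPU#0 goes offline. idle_task_exit() switches the mm but does
	 * not drop the reference; finish_cpu() drops it later from the
	 * thread that initiated the hotplug.
	 */
	mm_model_drop(&mm, CTX_FINISH_CPU);

	return mm.freed_in;
}
```

The point of the model is that the final drop, and with it the
destruction of mmu_notifier_lock, lands in finish_cpu(), which runs in
a non-atomic context but inside the hotplug write-locked section.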