All of lore.kernel.org
 help / color / mirror / Atom feed
* blocking vs. non-blocking mmu notifiers
@ 2022-03-23  8:43 Juergen Gross
  2022-03-23  9:45 ` Michal Hocko
  0 siblings, 1 reply; 6+ messages in thread
From: Juergen Gross @ 2022-03-23  8:43 UTC (permalink / raw)
  To: linux-mm, lkml, Andrew Morton; +Cc: Michal Hocko, xen-devel


[-- Attachment #1.1.1: Type: text/plain, Size: 2405 bytes --]

Hi,

during analysis of a customer's problem on a 4.12 based kernel
(deadlock due to a blocking mmu notifier in a Xen driver) I came
across upstream patches 93065ac753e4 ("mm, oom: distinguish
blockable mode for mmu notifiers") et al.

The backtrace of the blocked tasks was typically something like:

  #0 [ffffc9004222f228] __schedule at ffffffff817223e2
  #1 [ffffc9004222f2b8] schedule at ffffffff81722a02
  #2 [ffffc9004222f2c8] schedule_preempt_disabled at ffffffff81722d0a
  #3 [ffffc9004222f2d0] __mutex_lock at ffffffff81724104
  #4 [ffffc9004222f360] mn_invl_range_start at ffffffffc01fd398 [xen_gntdev]
  #5 [ffffc9004222f398] __mmu_notifier_invalidate_page at ffffffff8123375a
  #6 [ffffc9004222f3c0] try_to_unmap_one at ffffffff812112cb
  #7 [ffffc9004222f478] rmap_walk_file at ffffffff812105cd
  #8 [ffffc9004222f4d0] try_to_unmap at ffffffff81212450
  #9 [ffffc9004222f508] shrink_page_list at ffffffff811e0755
#10 [ffffc9004222f5c8] shrink_inactive_list at ffffffff811e13cf
#11 [ffffc9004222f6a8] shrink_node_memcg at ffffffff811e241f
#12 [ffffc9004222f790] shrink_node at ffffffff811e29c5
#13 [ffffc9004222f808] do_try_to_free_pages at ffffffff811e2ee1
#14 [ffffc9004222f868] try_to_free_pages at ffffffff811e3248
#15 [ffffc9004222f8e8] __alloc_pages_slowpath at ffffffff81262c37
#16 [ffffc9004222f9f0] __alloc_pages_nodemask at ffffffff8121afc1
#17 [ffffc9004222fa48] alloc_pages_current at ffffffff8122f350
#18 [ffffc9004222fa78] __get_free_pages at ffffffff8121685a
#19 [ffffc9004222fa80] __pollwait at ffffffff8127e795
#20 [ffffc9004222faa8] evtchn_poll at ffffffffc00e802b [xen_evtchn]
#21 [ffffc9004222fab8] do_sys_poll at ffffffff8127f953
#22 [ffffc9004222fec8] sys_ppoll at ffffffff81280478
#23 [ffffc9004222ff30] do_syscall_64 at ffffffff81004954
#24 [ffffc9004222ff50] entry_SYSCALL_64_after_hwframe at ffffffff818000b6

It was found that the notifier of the Xen gntdev driver was using a
mutex resulting in the deadlock.

Michal Hocko suggested that backporting above mentioned patch might
help, as the mmu notifier call is happening in atomic context.

Looking into that I was wondering whether try_to_unmap_one() shouldn't
call mmu_notifier_invalidate_range_start_nonblock() instead of
mmu_notifier_invalidate_range_start() if this is true. Otherwise I
can't see how this deadlock could be avoided.

Any thoughts?


Juergen

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3149 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: blocking vs. non-blocking mmu notifiers
  2022-03-23  8:43 blocking vs. non-blocking mmu notifiers Juergen Gross
@ 2022-03-23  9:45 ` Michal Hocko
  2022-03-23 16:31   ` Jason Gunthorpe
  0 siblings, 1 reply; 6+ messages in thread
From: Michal Hocko @ 2022-03-23  9:45 UTC (permalink / raw)
  To: Juergen Gross
  Cc: linux-mm, lkml, Andrew Morton, xen-devel, Jerome Glisse, Jason Gunthorpe

[Let me add more people to the CC list - I am not really sure who is the
 most familiar with all the tricks that mmu notifiers might do]

On Wed 23-03-22 09:43:59, Juergen Gross wrote:
> Hi,
> 
> during analysis of a customer's problem on a 4.12 based kernel
> (deadlock due to a blocking mmu notifier in a Xen driver) I came
> across upstream patches 93065ac753e4 ("mm, oom: distinguish
> blockable mode for mmu notifiers") et al.
> 
> The backtrace of the blocked tasks was typically something like:
> 
>  #0 [ffffc9004222f228] __schedule at ffffffff817223e2
>  #1 [ffffc9004222f2b8] schedule at ffffffff81722a02
>  #2 [ffffc9004222f2c8] schedule_preempt_disabled at ffffffff81722d0a
>  #3 [ffffc9004222f2d0] __mutex_lock at ffffffff81724104
>  #4 [ffffc9004222f360] mn_invl_range_start at ffffffffc01fd398 [xen_gntdev]
>  #5 [ffffc9004222f398] __mmu_notifier_invalidate_page at ffffffff8123375a
>  #6 [ffffc9004222f3c0] try_to_unmap_one at ffffffff812112cb
>  #7 [ffffc9004222f478] rmap_walk_file at ffffffff812105cd
>  #8 [ffffc9004222f4d0] try_to_unmap at ffffffff81212450
>  #9 [ffffc9004222f508] shrink_page_list at ffffffff811e0755
> #10 [ffffc9004222f5c8] shrink_inactive_list at ffffffff811e13cf
> #11 [ffffc9004222f6a8] shrink_node_memcg at ffffffff811e241f
> #12 [ffffc9004222f790] shrink_node at ffffffff811e29c5
> #13 [ffffc9004222f808] do_try_to_free_pages at ffffffff811e2ee1
> #14 [ffffc9004222f868] try_to_free_pages at ffffffff811e3248
> #15 [ffffc9004222f8e8] __alloc_pages_slowpath at ffffffff81262c37
> #16 [ffffc9004222f9f0] __alloc_pages_nodemask at ffffffff8121afc1
> #17 [ffffc9004222fa48] alloc_pages_current at ffffffff8122f350
> #18 [ffffc9004222fa78] __get_free_pages at ffffffff8121685a
> #19 [ffffc9004222fa80] __pollwait at ffffffff8127e795
> #20 [ffffc9004222faa8] evtchn_poll at ffffffffc00e802b [xen_evtchn]
> #21 [ffffc9004222fab8] do_sys_poll at ffffffff8127f953
> #22 [ffffc9004222fec8] sys_ppoll at ffffffff81280478
> #23 [ffffc9004222ff30] do_syscall_64 at ffffffff81004954
> #24 [ffffc9004222ff50] entry_SYSCALL_64_after_hwframe at ffffffff818000b6
> 
> It was found that the notifier of the Xen gntdev driver was using a
> mutex resulting in the deadlock.
> 
> Michal Hocko suggested that backporting above mentioned patch might
> help, as the mmu notifier call is happening in atomic context.
> 
> Looking into that I was wondering whether try_to_unmap_one() shouldn't
> call mmu_notifier_invalidate_range_start_nonblock() instead of
> mmu_notifier_invalidate_range_start() if this is true. Otherwise I
> can't see how this deadlock could be avoided.

Just to be more explicit. The current upstream code calls
mmu_notifier_invalidate_range while the page table locks are held.
Are there any notifiers which could sleep in those? In other words
should we use the nonblock start/stop in try_to_unmap?

> Any thoughts?
> 
> 
> Juergen






-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: blocking vs. non-blocking mmu notifiers
  2022-03-23  9:45 ` Michal Hocko
@ 2022-03-23 16:31   ` Jason Gunthorpe
  2022-03-23 16:49     ` Michal Hocko
  0 siblings, 1 reply; 6+ messages in thread
From: Jason Gunthorpe @ 2022-03-23 16:31 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Juergen Gross, linux-mm, lkml, Andrew Morton, xen-devel, Jerome Glisse

On Wed, Mar 23, 2022 at 10:45:30AM +0100, Michal Hocko wrote:
> [Let me add more people to the CC list - I am not really sure who is the
>  most familiar with all the tricks that mmu notifiers might do]
> 
> On Wed 23-03-22 09:43:59, Juergen Gross wrote:
> > Hi,
> > 
> > during analysis of a customer's problem on a 4.12 based kernel
> > (deadlock due to a blocking mmu notifier in a Xen driver) I came
> > across upstream patches 93065ac753e4 ("mm, oom: distinguish
> > blockable mode for mmu notifiers") et al.
> > 
> > The backtrace of the blocked tasks was typically something like:
> > 
> >  #0 [ffffc9004222f228] __schedule at ffffffff817223e2
> >  #1 [ffffc9004222f2b8] schedule at ffffffff81722a02
> >  #2 [ffffc9004222f2c8] schedule_preempt_disabled at ffffffff81722d0a
> >  #3 [ffffc9004222f2d0] __mutex_lock at ffffffff81724104
> >  #4 [ffffc9004222f360] mn_invl_range_start at ffffffffc01fd398 [xen_gntdev]
> >  #5 [ffffc9004222f398] __mmu_notifier_invalidate_page at ffffffff8123375a
> >  #6 [ffffc9004222f3c0] try_to_unmap_one at ffffffff812112cb
> >  #7 [ffffc9004222f478] rmap_walk_file at ffffffff812105cd
> >  #8 [ffffc9004222f4d0] try_to_unmap at ffffffff81212450
> >  #9 [ffffc9004222f508] shrink_page_list at ffffffff811e0755
> > #10 [ffffc9004222f5c8] shrink_inactive_list at ffffffff811e13cf
> > #11 [ffffc9004222f6a8] shrink_node_memcg at ffffffff811e241f
> > #12 [ffffc9004222f790] shrink_node at ffffffff811e29c5
> > #13 [ffffc9004222f808] do_try_to_free_pages at ffffffff811e2ee1
> > #14 [ffffc9004222f868] try_to_free_pages at ffffffff811e3248
> > #15 [ffffc9004222f8e8] __alloc_pages_slowpath at ffffffff81262c37
> > #16 [ffffc9004222f9f0] __alloc_pages_nodemask at ffffffff8121afc1
> > #17 [ffffc9004222fa48] alloc_pages_current at ffffffff8122f350
> > #18 [ffffc9004222fa78] __get_free_pages at ffffffff8121685a
> > #19 [ffffc9004222fa80] __pollwait at ffffffff8127e795
> > #20 [ffffc9004222faa8] evtchn_poll at ffffffffc00e802b [xen_evtchn]
> > #21 [ffffc9004222fab8] do_sys_poll at ffffffff8127f953
> > #22 [ffffc9004222fec8] sys_ppoll at ffffffff81280478
> > #23 [ffffc9004222ff30] do_syscall_64 at ffffffff81004954
> > #24 [ffffc9004222ff50] entry_SYSCALL_64_after_hwframe at ffffffff818000b6
> > 
> > It was found that the notifier of the Xen gntdev driver was using a
> > mutex resulting in the deadlock.

The bug here is that prior to commit a81461b0546c ("xen/gntdev: update
to new mmu_notifier semantic") wired the mn_invl_range_start() which
takes a mutex to invalidate_page, which is defined to run in an atomic
context.

> > Michal Hocko suggested that backporting above mentioned patch might
> > help, as the mmu notifier call is happening in atomic context.

IIRC "blocking" was not supposed to be about atomic context or not, but
more about time to completion/guarenteed completion as it is used from
a memory reclaim path.

> Just to be more explicit. The current upstream code calls
> mmu_notifier_invalidate_range while the page table locks are held.
> Are there any notifiers which could sleep in those? 

There should not be, that would be a bug that lockdep would find.

> In other words should we use the nonblock start/stop in
> try_to_unmap?

AFAICT, no.

Jason

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: blocking vs. non-blocking mmu notifiers
  2022-03-23 16:31   ` Jason Gunthorpe
@ 2022-03-23 16:49     ` Michal Hocko
  2022-03-23 17:04       ` Jason Gunthorpe
  0 siblings, 1 reply; 6+ messages in thread
From: Michal Hocko @ 2022-03-23 16:49 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Juergen Gross, linux-mm, lkml, Andrew Morton, xen-devel, Jerome Glisse

On Wed 23-03-22 13:31:46, Jason Gunthorpe wrote:
> On Wed, Mar 23, 2022 at 10:45:30AM +0100, Michal Hocko wrote:
> > [Let me add more people to the CC list - I am not really sure who is the
> >  most familiar with all the tricks that mmu notifiers might do]
> > 
> > On Wed 23-03-22 09:43:59, Juergen Gross wrote:
> > > Hi,
> > > 
> > > during analysis of a customer's problem on a 4.12 based kernel
> > > (deadlock due to a blocking mmu notifier in a Xen driver) I came
> > > across upstream patches 93065ac753e4 ("mm, oom: distinguish
> > > blockable mode for mmu notifiers") et al.
> > > 
> > > The backtrace of the blocked tasks was typically something like:
> > > 
> > >  #0 [ffffc9004222f228] __schedule at ffffffff817223e2
> > >  #1 [ffffc9004222f2b8] schedule at ffffffff81722a02
> > >  #2 [ffffc9004222f2c8] schedule_preempt_disabled at ffffffff81722d0a
> > >  #3 [ffffc9004222f2d0] __mutex_lock at ffffffff81724104
> > >  #4 [ffffc9004222f360] mn_invl_range_start at ffffffffc01fd398 [xen_gntdev]
> > >  #5 [ffffc9004222f398] __mmu_notifier_invalidate_page at ffffffff8123375a
> > >  #6 [ffffc9004222f3c0] try_to_unmap_one at ffffffff812112cb
> > >  #7 [ffffc9004222f478] rmap_walk_file at ffffffff812105cd
> > >  #8 [ffffc9004222f4d0] try_to_unmap at ffffffff81212450
> > >  #9 [ffffc9004222f508] shrink_page_list at ffffffff811e0755
> > > #10 [ffffc9004222f5c8] shrink_inactive_list at ffffffff811e13cf
> > > #11 [ffffc9004222f6a8] shrink_node_memcg at ffffffff811e241f
> > > #12 [ffffc9004222f790] shrink_node at ffffffff811e29c5
> > > #13 [ffffc9004222f808] do_try_to_free_pages at ffffffff811e2ee1
> > > #14 [ffffc9004222f868] try_to_free_pages at ffffffff811e3248
> > > #15 [ffffc9004222f8e8] __alloc_pages_slowpath at ffffffff81262c37
> > > #16 [ffffc9004222f9f0] __alloc_pages_nodemask at ffffffff8121afc1
> > > #17 [ffffc9004222fa48] alloc_pages_current at ffffffff8122f350
> > > #18 [ffffc9004222fa78] __get_free_pages at ffffffff8121685a
> > > #19 [ffffc9004222fa80] __pollwait at ffffffff8127e795
> > > #20 [ffffc9004222faa8] evtchn_poll at ffffffffc00e802b [xen_evtchn]
> > > #21 [ffffc9004222fab8] do_sys_poll at ffffffff8127f953
> > > #22 [ffffc9004222fec8] sys_ppoll at ffffffff81280478
> > > #23 [ffffc9004222ff30] do_syscall_64 at ffffffff81004954
> > > #24 [ffffc9004222ff50] entry_SYSCALL_64_after_hwframe at ffffffff818000b6
> > > 
> > > It was found that the notifier of the Xen gntdev driver was using a
> > > mutex resulting in the deadlock.
> 
> The bug here is that prior to commit a81461b0546c ("xen/gntdev: update
> to new mmu_notifier semantic") wired the mn_invl_range_start() which
> takes a mutex to invalidate_page, which is defined to run in an atomic
> context.

Yeah, we have already identified that but quickly realized that the
whole mmu notifier overhaul which this fix depends on would be no no for
backporting to our older code base. So we are trying to find our way
around that.

> > > Michal Hocko suggested that backporting above mentioned patch might
> > > help, as the mmu notifier call is happening in atomic context.
> 
> IIRC "blocking" was not supposed to be about atomic context or not, but
> more about time to completion/guarenteed completion as it is used from
> a memory reclaim path.

Yeah, when I was introducing that I was mostly concerned about oom path.
But it is essentially impossible to tell what are the existing
assumptions different mmu notifiers operate on.

> 
> > Just to be more explicit. The current upstream code calls
> > mmu_notifier_invalidate_range while the page table locks are held.
> > Are there any notifiers which could sleep in those? 
> 
> There should not be, that would be a bug that lockdep would find.

OK, this is reassuring. Thanks!
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: blocking vs. non-blocking mmu notifiers
  2022-03-23 16:49     ` Michal Hocko
@ 2022-03-23 17:04       ` Jason Gunthorpe
  2022-03-24 12:42         ` Michal Hocko
  0 siblings, 1 reply; 6+ messages in thread
From: Jason Gunthorpe @ 2022-03-23 17:04 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Juergen Gross, linux-mm, lkml, Andrew Morton, xen-devel, Jerome Glisse

On Wed, Mar 23, 2022 at 05:49:43PM +0100, Michal Hocko wrote:
> > The bug here is that prior to commit a81461b0546c ("xen/gntdev: update
> > to new mmu_notifier semantic") wired the mn_invl_range_start() which
> > takes a mutex to invalidate_page, which is defined to run in an atomic
> > context.
> 
> Yeah, we have already identified that but quickly realized that the
> whole mmu notifier overhaul which this fix depends on would be no no for
> backporting to our older code base. So we are trying to find our way
> around that.

IMHO you don't need everything, just commit 369ea8242c0f ("mm/rmap:
update to new mmu_notifier semantic v2") which adds the missing
start/end outside the lock for the page callbacks.

Then you can take safely a8146 into gntdev.

Jason

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: blocking vs. non-blocking mmu notifiers
  2022-03-23 17:04       ` Jason Gunthorpe
@ 2022-03-24 12:42         ` Michal Hocko
  0 siblings, 0 replies; 6+ messages in thread
From: Michal Hocko @ 2022-03-24 12:42 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Juergen Gross, linux-mm, lkml, Andrew Morton, xen-devel, Jerome Glisse

On Wed 23-03-22 14:04:04, Jason Gunthorpe wrote:
> On Wed, Mar 23, 2022 at 05:49:43PM +0100, Michal Hocko wrote:
> > > The bug here is that prior to commit a81461b0546c ("xen/gntdev: update
> > > to new mmu_notifier semantic") wired the mn_invl_range_start() which
> > > takes a mutex to invalidate_page, which is defined to run in an atomic
> > > context.
> > 
> > Yeah, we have already identified that but quickly realized that the
> > whole mmu notifier overhaul which this fix depends on would be no no for
> > backporting to our older code base. So we are trying to find our way
> > around that.
> 
> IMHO you don't need everything, just commit 369ea8242c0f ("mm/rmap:
> update to new mmu_notifier semantic v2") which adds the missing
> start/end outside the lock for the page callbacks.
> 
> Then you can take safely a8146 into gntdev.

Thanks Jason!

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2022-03-24 12:43 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-23  8:43 blocking vs. non-blocking mmu notifiers Juergen Gross
2022-03-23  9:45 ` Michal Hocko
2022-03-23 16:31   ` Jason Gunthorpe
2022-03-23 16:49     ` Michal Hocko
2022-03-23 17:04       ` Jason Gunthorpe
2022-03-24 12:42         ` Michal Hocko

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.