From: Jerome Glisse <jglisse@redhat.com>
To: John Hubbard <jhubbard@nvidia.com>
Cc: linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org,
	Ralph Campbell <rcampbell@nvidia.com>,
	stable@vger.kernel.org, Evgeny Baskakov <ebaskakov@nvidia.com>,
	Mark Hairgrove <mhairgrove@nvidia.com>
Subject: Re: [PATCH 03/15] mm/hmm: HMM should have a callback before MM is destroyed v2
Date: Wed, 21 Mar 2018 18:46:21 -0400	[thread overview]
Message-ID: <20180321224620.GH3214@redhat.com> (raw)
In-Reply-To: <788cf786-edbf-ab43-af0d-abbe9d538757@nvidia.com>

On Wed, Mar 21, 2018 at 03:16:04PM -0700, John Hubbard wrote:
> On 03/21/2018 11:03 AM, Jerome Glisse wrote:
> > On Tue, Mar 20, 2018 at 09:14:34PM -0700, John Hubbard wrote:
> >> On 03/19/2018 07:00 PM, jglisse@redhat.com wrote:
> >>> From: Ralph Campbell <rcampbell@nvidia.com>
> 
> <snip>
> 
> >> Hi Jerome,
> >>
> >> This presents a deadlock problem (details below). As for solution ideas, 
> >> Mark Hairgrove points out that the MMU notifiers had to solve the
> >> same sort of problem, and part of the solution involves "avoid
> >> holding locks when issuing these callbacks". That's not an entire 
> >> solution description, of course, but it seems like a good start.
> >>
> >> Anyway, for the deadlock problem:
> >>
> >> Each of these ->release callbacks potentially has to wait for the 
> >> hmm_invalidate_range() callbacks to finish. That is not shown in any
> >> code directly, but it's because: when a device driver is processing 
> >> the above ->release callback, it has to allow any in-progress operations 
> >> to finish up (as specified clearly in your comment documentation above). 
> >>
> >> Some of those operations will invariably need to do things that result 
> >> in page invalidations, thus triggering the hmm_invalidate_range() callback.
> >> Then, the hmm_invalidate_range() callback tries to acquire the same 
> >> hmm->mirrors_sem lock, thus leading to deadlock:
> >>
> >> hmm_invalidate_range():
> >> // ...
> >> 	down_read(&hmm->mirrors_sem);
> >> 	list_for_each_entry(mirror, &hmm->mirrors, list)
> >> 		mirror->ops->sync_cpu_device_pagetables(mirror, action,
> >> 							start, end);
> >> 	up_read(&hmm->mirrors_sem);
> > 
> > That is just illegal: the release callback is not allowed to trigger
> > invalidation. All it does is kill all of the device's threads and stop
> > device page faults from happening, so there is no deadlock issue. I can
> > reinforce the comment some more (see [1] for an example of what it
> > should be).
> 
> That rule is fine, and it is true that the .release callback will not 
> directly trigger any invalidations. However, the problem is in letting 
> any *existing* outstanding operations finish up. We have to let 
> existing operations "drain", in order to meet the requirement that 
> everything is done when .release returns.
> 
> For example, if a device driver thread is in the middle of working through
> its fault buffer, it will call migrate_vma(), which will in turn unmap
> pages. That will cause an hmm_invalidate_range() callback, which tries
> to take hmm->mirrors_sem, and we deadlock.
> 
> There's no way to "kill" such a thread while it's in the middle of
> migrate_vma(), you have to let it finish up.
>
> > Also it is illegal for the sync callback to trigger any mmu_notifier
> > callback. I thought this was obvious. The sync callback should only
> > update the device page table and do _nothing else_. There is no way to
> > make this re-entrant.
> 
> That is obvious, yes. I am not trying to say there is any problem with
> that rule. It's the "drain outstanding operations during .release", 
> above, that is the real problem.

Maybe we should just relax the release callback wording: it should stop
any further processing of the fault buffer, but not wait for in-flight
processing to finish. In the nouveau code I kill things but I do not
wait, hence I do not deadlock.

What matters is to stop any further processing. Yes, some faults might
still be in flight, but they will serialize on the various locks. So
just do not wait in the release callback; kill things instead. I might
have a bug where I still fill in the GPU page table in nouveau; I will
check the nouveau code for that.
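
Something along these lines is what I have in mind on the driver side
(a rough sketch only; the driver structure and helper names are made up
for illustration, this is not the actual nouveau code):

static void dummy_mirror_release(struct hmm_mirror *mirror)
{
	struct dummy_drv *drv = container_of(mirror, struct dummy_drv, mirror);

	/* No new fault buffer entries are processed past this point. */
	WRITE_ONCE(drv->dead, true);

	/* Tell the hardware to stop raising new page faults. */
	dummy_hw_disable_faults(drv);

	/*
	 * Do NOT flush or wait for in-flight fault work here. Anything
	 * already inside hmm_vma_fault()/migrate_vma() will finish on
	 * its own and serialize on the existing locks; waiting for it
	 * from here is what creates the deadlock through
	 * hmm_invalidate_range() and mirrors_sem.
	 */
}

static void dummy_process_fault_entry(struct dummy_drv *drv,
				      struct dummy_fault *fault)
{
	/* Fault workers bail out once the mirror is being torn down. */
	if (READ_ONCE(drv->dead))
		return;

	/* ... hmm_vma_fault()/migrate_vma(), then update GPU page table ... */
}

The important part is that release() never blocks on anything that an
invalidation might also block on.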

Killing things should also kill the channel (I do not do that in
nouveau yet because I am waiting on a channel patchset), but I am not
sure the hardware likes it if we kill the channel before stopping fault
notification.

Cheers,
Jérôme
