From: Michal Hocko <mhocko@kernel.org>
To: Felix Kuehling <felix.kuehling@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>,
	LKML <linux-kernel@vger.kernel.org>,
	"David (ChunMing) Zhou" <David1.Zhou@amd.com>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Radim Krčmář" <rkrcmar@redhat.com>,
	"Alex Deucher" <alexander.deucher@amd.com>,
	"David Airlie" <airlied@linux.ie>,
	"Jani Nikula" <jani.nikula@linux.intel.com>,
	"Joonas Lahtinen" <joonas.lahtinen@linux.intel.com>,
	"Rodrigo Vivi" <rodrigo.vivi@intel.com>,
	"Doug Ledford" <dledford@redhat.com>,
	"Jason Gunthorpe" <jgg@ziepe.ca>,
	"Mike Marciniszyn" <mike.marciniszyn@intel.com>,
	"Dennis Dalessandro" <dennis.dalessandro@intel.com>,
	"Sudeep Dutt" <sudeep.dutt@intel.com>,
	"Ashutosh Dixit" <ashutosh.dixit@intel.com>,
	"Dimitri Sivanich" <sivanich@sgi.com>
Subject: Re: [RFC PATCH] mm, oom: distinguish blockable mode for mmu notifiers
Date: Mon, 25 Jun 2018 10:01:03 +0200
Message-ID: <20180625080103.GB28965@dhcp22.suse.cz>
In-Reply-To: <dd260800-6457-f3ff-47df-b65ef258f4b7@amd.com>

On Fri 22-06-18 16:09:06, Felix Kuehling wrote:
> On 2018-06-22 11:24 AM, Michal Hocko wrote:
> > On Fri 22-06-18 17:13:02, Christian König wrote:
> >> Hi Michal,
> >>
> >> [Adding Felix as well]
> >>
> >> Well first of all you have a misconception why at least the AMD graphics
> >> driver need to be able to sleep in an MMU notifier: We need to sleep because
> >> we need to wait for hardware operations to finish and *NOT* because we need
> >> to wait for locks.
> >>
> >> I'm not sure if your flag now means that you generally can't sleep in MMU
> >> notifiers any more, but if that's the case at least AMD hardware will break
> >> badly. In our case the approach of waiting for a short time for the process
> >> to be reaped and then select another victim actually sounds like the right
> >> thing to do.
> > Well, I do not need to make the notifier code non blocking all the time.
> > All I need is to ensure that it won't sleep if the flag says so and
> > return -EAGAIN instead.
> >
> > So here is what I do for amdgpu:
> 
> In the case of KFD we also need to take the DQM lock:
> 
> amdgpu_mn_invalidate_range_start_hsa -> amdgpu_amdkfd_evict_userptr ->
> kgd2kfd_quiesce_mm -> kfd_process_evict_queues -> evict_process_queues_cpsch
> 
> So we'd need to pass the blockable parameter all the way through that
> call chain.

Thanks, I had missed that part. So I guess I will start with something
similar to intel-gfx and back off when the current range needs some
treatment, i.e. return -EAGAIN as soon as a non-blockable call finds a
node overlapping the range. So this goes on top. Does it look correct?

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
index d138a526feff..e2d422b3eb0b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
@@ -266,6 +266,11 @@ static int amdgpu_mn_invalidate_range_start_hsa(struct mmu_notifier *mn,
 		struct amdgpu_mn_node *node;
 		struct amdgpu_bo *bo;
 
+		if (!blockable) {
+			amdgpu_mn_read_unlock();
+			return -EAGAIN;
+		}
+
 		node = container_of(it, struct amdgpu_mn_node, it);
 		it = interval_tree_iter_next(it, start, end);
 
-- 
Michal Hocko
SUSE Labs
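
The back-off in the hunk above can be modelled in plain userspace C. This is
only a rough sketch under stated assumptions, not the amdgpu code: struct
range_node, mn_read_lock(), mn_read_unlock(), evict_node() and
invalidate_range_start() are hypothetical stand-ins, a flat array replaces
the interval tree, and a counter replaces the real lock. It shows that a
non-blockable caller gets -EAGAIN as soon as one tracked node overlaps the
range, with the read lock dropped first, while a range that touches nothing
still succeeds.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a tracked userptr range; not the amdgpu type. */
struct range_node {
	unsigned long start;
	unsigned long last;	/* inclusive end of the tracked range */
	bool evicted;
};

/* Hypothetical stand-in for the notifier read lock: a simple depth counter. */
static int read_lock_depth;

static void mn_read_lock(void)
{
	read_lock_depth++;
}

static void mn_read_unlock(void)
{
	assert(read_lock_depth > 0);
	read_lock_depth--;
}

/* Stand-in for the work that may sleep (hardware waits, further locks). */
static void evict_node(struct range_node *node)
{
	node->evicted = true;
}

/*
 * Returns 0 when every overlapping node was handled, or -EAGAIN when the
 * caller must not block and at least one node overlaps [start, end).
 * Assumption of this model: 'end' is exclusive, node->last is inclusive.
 */
static int invalidate_range_start(struct range_node *nodes, size_t count,
				  unsigned long start, unsigned long end,
				  bool blockable)
{
	size_t i;

	if (end <= start)
		return 0;
	end--;	/* convert to an inclusive bound for the comparison */

	mn_read_lock();
	for (i = 0; i < count; i++) {
		struct range_node *node = &nodes[i];

		if (node->last < start || node->start > end)
			continue;	/* no overlap, nothing to do */

		if (!blockable) {
			/* Drop the lock before bailing out, as in the hunk. */
			mn_read_unlock();
			return -EAGAIN;
		}
		evict_node(node);
	}
	mn_read_unlock();
	return 0;
}
```

In this model the check sits inside the loop, so it only fires when there is
actual work to do. That is why a non-blockable call over an untracked range
still returns 0.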

