nouveau.lists.freedesktop.org archive mirror
From: John Hubbard <jhubbard@nvidia.com>
To: Lyude Paul <lyude@redhat.com>,
	Alistair Popple <apopple@nvidia.com>,
	linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>
Cc: "Alex Sierra" <alex.sierra@amd.com>,
	"Ralph Campbell" <rcampbell@nvidia.com>,
	"Matthew Wilcox (Oracle)" <willy@infradead.org>,
	dri-devel@lists.freedesktop.org,
	"David Airlie" <airlied@linux.ie>,
	"Michael Ellerman" <mpe@ellerman.id.au>,
	"Felix Kuehling" <Felix.Kuehling@amd.com>,
	"Pan, Xinhui" <Xinhui.Pan@amd.com>,
	linux-kernel@vger.kernel.org,
	"Nicholas Piggin" <npiggin@gmail.com>,
	"Ben Skeggs" <bskeggs@redhat.com>,
	"Daniel Vetter" <daniel@ffwll.ch>,
	nouveau@lists.freedesktop.org,
	"Alex Deucher" <alexander.deucher@amd.com>,
	"Dan Williams" <dan.j.williams@intel.com>,
	amd-gfx@lists.freedesktop.org, linuxppc-dev@lists.ozlabs.org,
	"Christian König" <christian.koenig@amd.com>,
	"Jason Gunthorpe" <jgg@nvidia.com>
Subject: Re: [Nouveau] [PATCH 6/7] nouveau/dmem: Evict device private memory during release
Date: Mon, 26 Sep 2022 15:14:09 -0700
Message-ID: <6ff9dcc5-c34b-963f-f5e7-7038eecae98b@nvidia.com>
In-Reply-To: <881735bda9b1ba0ecf3648af201840233508f206.camel@redhat.com>

On 9/26/22 14:35, Lyude Paul wrote:
>> +	for (i = 0; i < npages; i++) {
>> +		if (src_pfns[i] & MIGRATE_PFN_MIGRATE) {
>> +			struct page *dpage;
>> +
>> +			/*
>> +			 * __GFP_NOFAIL because the GPU is going away and there
>> +			 * is nothing sensible we can do if we can't copy the
>> +			 * data back.
>> +			 */
> 
> You'll have to excuse me for a moment since this area of nouveau isn't one of
> my strong points, but are we sure about this? IIRC __GFP_NOFAIL means infinite
> retry. In the case of a GPU hotplug event, I would assume we would rather just
> stop trying to migrate things to the GPU and just drop the data instead of
> hanging on infinite retries.
> 
Hi Lyude!

Actually, I think it's better in this case to keep trying (not
necessarily forever, only until memory becomes available) rather than
failing out and corrupting data.

That's because I'm not convinced this memory is discardable. And at
some point we're going to make all of this work with file-backed
memory, which will *definitely* not be discardable; I realize we're
not there yet, of course.

So here it's reasonable to commit to retrying indefinitely. Memory
should eventually become available, and if it doesn't, restarting the
machine is generally better than corrupting data.


thanks,

-- 
John Hubbard
NVIDIA


Thread overview: 26+ messages
2022-09-26  6:03 [Nouveau] [PATCH 0/7] Fix several device private page reference counting issues Alistair Popple
2022-09-26  6:03 ` [Nouveau] [PATCH 1/7] mm/memory.c: Fix race when faulting a device private page Alistair Popple
2022-09-29  0:07   ` Michael Ellerman
2022-09-29  1:40     ` Alistair Popple
2022-09-29  5:07       ` Michael Ellerman
2022-09-26  6:03 ` [Nouveau] [PATCH 2/7] mm: Free device private pages have zero refcount Alistair Popple
2022-09-26 14:36   ` Jason Gunthorpe
2022-09-27  2:06     ` Alistair Popple
2022-09-29 20:18       ` Dan Williams
2022-09-30  0:45         ` Alistair Popple
2022-09-30  1:49           ` Dan Williams
2022-09-26  6:03 ` [Nouveau] [PATCH 3/7] mm/migrate_device.c: Refactor migrate_vma and migrate_device_coherent_page() Alistair Popple
2022-09-26  6:03 ` [Nouveau] [PATCH 4/7] mm/migrate_device.c: Add migrate_device_range() Alistair Popple
2022-09-26  6:03 ` [Nouveau] [PATCH 5/7] nouveau/dmem: Refactor nouveau_dmem_fault_copy_one() Alistair Popple
2022-09-26 21:29   ` Lyude Paul
2022-09-28 11:30     ` Alistair Popple
2022-09-26  6:03 ` [Nouveau] [PATCH 6/7] nouveau/dmem: Evict device private memory during release Alistair Popple
2022-09-26 13:28   ` kernel test robot
2022-09-26 21:35   ` Lyude Paul
2022-09-26 22:14     ` John Hubbard [this message]
2022-09-26 23:45       ` Alistair Popple
2022-09-28 21:39         ` Lyude Paul
2022-09-26 23:07     ` Felix Kuehling
2022-09-27  1:39       ` Alistair Popple
2022-09-28 21:23         ` Lyude Paul
2022-09-26  6:03 ` [Nouveau] [PATCH 7/7] hmm-tests: Add test for migrate_device_range() Alistair Popple
