From: Jason Gunthorpe <jgg@ziepe.ca>
To: Alistair Popple <apopple@nvidia.com>
Cc: linux-mm@kvack.org, nouveau@lists.freedesktop.org,
	bskeggs@redhat.com, akpm@linux-foundation.org,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	kvm-ppc@vger.kernel.org, dri-devel@lists.freedesktop.org,
	jhubbard@nvidia.com, rcampbell@nvidia.com, jglisse@redhat.com
Subject: Re: [PATCH 1/9] mm/migrate.c: Always allow device private pages to migrate
Date: Wed, 10 Feb 2021 08:56:34 -0400
Message-ID: <20210210125634.GL4718@ziepe.ca>
In-Reply-To: <1780857.6Ip0F2Sa4d@nvdebian>

On Wed, Feb 10, 2021 at 02:40:10PM +1100, Alistair Popple wrote:
> On Wednesday, 10 February 2021 12:39:32 AM AEDT Jason Gunthorpe wrote:
> > On Tue, Feb 09, 2021 at 12:07:14PM +1100, Alistair Popple wrote:
> > > Device private pages are used to represent device memory that is not
> > > directly accessible from the CPU. Extra references to a device private
> > > page are only used to ensure the struct page itself remains valid whilst
> > > waiting for migration entries. Therefore extra references should not
> > > prevent device private page migration as this can lead to failures to
> > > migrate pages back to the CPU which are fatal to the user process.
> > 
> > This should identify the extra references in expected_count. Just
> > disabling this protection seems unsafe; ZONE_DEVICE is not so special
> > that the refcount means nothing.
> 
> This is similar to what migrate_vma_check_page() does now. The issue is that a
> migration wait takes a reference on the device private page so you can end up 
> with one thread stuck waiting for migration whilst the other can't migrate due 
> to the extra refcount.
> 
> Given device private pages can't undergo GUP and that it's not possible to 
> differentiate the migration wait refcount from any other refcount we assume 
> any possible extra reference must be from migration wait.

GUP is not the only thing that elevates the refcount; I think this is
an unsafe assumption.

Why is migration holding an extra refcount anyhow?

Jason
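
The disagreement above can be illustrated with a small userspace model. This
is only a sketch under stated assumptions, not the code in mm/migrate.c:
struct toy_page and every toy_* function are hypothetical names, and the
expected count is simplified to one isolation reference plus one reference
per mapping. It shows why a strict refcount comparison fails while a
migration waiter holds a reference, and why skipping the comparison for
device private pages also hides references from any other source.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: this is not the kernel's struct page. */
struct toy_page {
	int refcount;        /* every reference currently held */
	int mapcount;        /* page table mappings, one reference each */
	bool device_private; /* stands in for a device private page */
};

static void toy_get_ref(struct toy_page *p)
{
	p->refcount++;
}

static void toy_put_ref(struct toy_page *p)
{
	assert(p->refcount > 0);
	p->refcount--;
}

/*
 * References the migrating thread can account for in this model:
 * its own isolation reference plus one per mapping (an assumption
 * made for illustration).
 */
static int toy_expected_count(const struct toy_page *p)
{
	return 1 + p->mapcount;
}

/* Strict check: any unaccounted reference blocks migration. */
static bool toy_can_migrate(const struct toy_page *p)
{
	return p->refcount == toy_expected_count(p);
}

/*
 * Relaxed check modelled on the patch description: device private
 * pages always pass, so an unaccounted reference from any source
 * is ignored, not only one taken by a migration waiter.
 */
static bool toy_can_migrate_relaxed(const struct toy_page *p)
{
	if (p->device_private)
		return true;
	return toy_can_migrate(p);
}
```

With refcount = 2 and mapcount = 1 the strict check passes. After one
toy_get_ref() call, standing in for a thread waiting on the migration entry,
the strict check fails while the relaxed check still passes, and the relaxed
check would pass just the same if that reference had come from somewhere
else.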
