From: Alistair Popple <apopple@nvidia.com>
To: "Vlastimil Babka (SUSE)" <vbabka@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	amd-gfx@lists.freedesktop.org, nouveau@lists.freedesktop.org,
	dri-devel@lists.freedesktop.org,
	Felix Kuehling <Felix.Kuehling@amd.com>,
	Lyude Paul <lyude@redhat.com>
Subject: Re: [PATCH v2 0/8] Fix several device private page reference counting issues
Date: Wed, 26 Oct 2022 12:47:15 +1100
Message-ID: <8735bbuyvs.fsf@nvidia.com>
In-Reply-To: <f36153fe-214c-2904-e155-ab9cee8a2a2c@kernel.org>


"Vlastimil Babka (SUSE)" <vbabka@kernel.org> writes:

> On 9/28/22 14:01, Alistair Popple wrote:
>> This series aims to fix a number of page reference counting issues in
>> drivers dealing with device private ZONE_DEVICE pages. These result in
>> use-after-free type bugs, either from accessing a struct page which no
>> longer exists because it has been removed or accessing fields within the
>> struct page which are no longer valid because the page has been freed.
>>
>> During normal usage it is unlikely these will cause any problems. However
>> without these fixes it is possible to crash the kernel from userspace.
>> These crashes can be triggered either by unloading the kernel module or
>> unbinding the device from the driver prior to a userspace task exiting. In
>> modules such as Nouveau it is also possible to trigger some of these issues
>> by explicitly closing the device file-descriptor prior to the task exiting
>> and then accessing device private memory.
>
> Hi, as this series was noticed to create a CVE [1], do you think a stable
> backport is warranted? I think the "It is possible to launch the attack
> remotely." in [1] is incorrect though, right?

Right, I don't see how this could be exploited remotely. And I'm pretty
sure you need root as well because in practice the pgmap needs to be
freed, and for Nouveau at least that only happens on device removal.

> It looks to me that patch 1 would be needed since the CONFIG_DEVICE_PRIVATE
> introduction, while the following few only to kernels with 27674ef6c73f
> (probably not so critical as that includes no LTS)?

Patch 3 already has a Fixes: tag for 27674ef6c73f. Patch 1 would need to
go back to the introduction of CONFIG_DEVICE_PRIVATE. I think patches 4-8
would also need to go back that far, but there isn't as much impact there
and they would be harder to backport.
Without them device removal can loop indefinitely in kernel mode (if
patch 3 is present or the kernel is older than 27674ef6c73f).

 - Alistair

> Thanks,
> Vlastimil
>
> [1] https://nvd.nist.gov/vuln/detail/CVE-2022-3523
>
>> This involves some minor changes to both PowerPC and AMD GPU code.
>> Unfortunately I lack hardware to test either of those so any help there
>> would be appreciated. The changes mimic what is done for both Nouveau
>> and hmm-tests though so I doubt they will cause problems.
>>
>> To: Andrew Morton <akpm@linux-foundation.org>
>> To: linux-mm@kvack.org
>> Cc: linux-kernel@vger.kernel.org
>> Cc: amd-gfx@lists.freedesktop.org
>> Cc: nouveau@lists.freedesktop.org
>> Cc: dri-devel@lists.freedesktop.org
>>
>> Alistair Popple (8):
>>   mm/memory.c: Fix race when faulting a device private page
>>   mm: Free device private pages have zero refcount
>>   mm/memremap.c: Take a pgmap reference on page allocation
>>   mm/migrate_device.c: Refactor migrate_vma and migrate_deivce_coherent_page()
>>   mm/migrate_device.c: Add migrate_device_range()
>>   nouveau/dmem: Refactor nouveau_dmem_fault_copy_one()
>>   nouveau/dmem: Evict device private memory during release
>>   hmm-tests: Add test for migrate_device_range()
>>
>>  arch/powerpc/kvm/book3s_hv_uvmem.c       |  17 +-
>>  drivers/gpu/drm/amd/amdkfd/kfd_migrate.c |  19 +-
>>  drivers/gpu/drm/amd/amdkfd/kfd_migrate.h |   2 +-
>>  drivers/gpu/drm/amd/amdkfd/kfd_svm.c     |  11 +-
>>  drivers/gpu/drm/nouveau/nouveau_dmem.c   | 108 +++++++----
>>  include/linux/memremap.h                 |   1 +-
>>  include/linux/migrate.h                  |  15 ++-
>>  lib/test_hmm.c                           | 129 ++++++++++---
>>  lib/test_hmm_uapi.h                      |   1 +-
>>  mm/memory.c                              |  16 +-
>>  mm/memremap.c                            |  30 ++-
>>  mm/migrate.c                             |  34 +--
>>  mm/migrate_device.c                      | 239 +++++++++++++++++-------
>>  mm/page_alloc.c                          |   8 +-
>>  tools/testing/selftests/vm/hmm-tests.c   |  49 +++++-
>>  15 files changed, 516 insertions(+), 163 deletions(-)
>>
>> base-commit: 088b8aa537c2c767765f1c19b555f21ffe555786

Thread overview:
2022-09-28 12:01 [PATCH v2 0/8] Fix several device private page reference counting issues Alistair Popple
2022-09-28 12:01 ` [PATCH v2 1/8] mm/memory.c: Fix race when faulting a device private page Alistair Popple
2022-09-29 18:30   ` Felix Kuehling
2022-10-03  0:53     ` Alistair Popple
2022-10-03 17:34       ` Felix Kuehling
2022-09-28 12:01 ` [PATCH v2 2/8] mm: Free device private pages have zero refcount Alistair Popple
2022-09-29 19:21   ` Felix Kuehling
2022-09-28 12:01 ` [PATCH v2 3/8] mm/memremap.c: Take a pgmap reference on page allocation Alistair Popple
2022-09-28 12:01 ` [PATCH v2 4/8] mm/migrate_device.c: Refactor migrate_vma and migrate_deivce_coherent_page() Alistair Popple
2022-09-28 12:01 ` [PATCH v2 5/8] mm/migrate_device.c: Add migrate_device_range() Alistair Popple
2022-09-28 12:01 ` [PATCH v2 6/8] nouveau/dmem: Refactor nouveau_dmem_fault_copy_one() Alistair Popple
2022-09-28 12:01 ` [PATCH v2 7/8] nouveau/dmem: Evict device private memory during release Alistair Popple
2022-09-28 21:37   ` Lyude Paul
2022-09-28 12:01 ` [PATCH v2 8/8] hmm-tests: Add test for migrate_device_range() Alistair Popple
2022-09-28 15:10   ` Andrew Morton
2022-09-29 11:00     ` Alistair Popple
2022-10-25 10:17 ` [PATCH v2 0/8] Fix several device private page reference counting issues Vlastimil Babka (SUSE)
2022-10-26  1:47   ` Alistair Popple [this message]
