From: Daniel Vetter <daniel@ffwll.ch>
To: christian.koenig@amd.com
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>, dri-devel <dri-devel@lists.freedesktop.org>, amd-gfx list <amd-gfx@lists.freedesktop.org>, Greg KH <gregkh@linuxfoundation.org>, Alex Deucher <Alexander.Deucher@amd.com>, Qiang Yu <yuq825@gmail.com>
Subject: Re: [PATCH v4 11/14] drm/amdgpu: Guard against write accesses after device removal
Date: Mon, 8 Feb 2021 10:48:53 +0100
Message-ID: <YCEJBfA6ce4dD3JT@phenom.ffwll.local>
In-Reply-To: <fcb2cf17-d011-55c6-1545-9fa190e358c3@gmail.com>

On Mon, Feb 08, 2021 at 10:37:19AM +0100, Christian König wrote:
> On 07.02.21 at 22:50, Daniel Vetter wrote:
> > [SNIP]
> > > Clarification - as far as I know there are no page fault handlers for kernel
> > > mappings. And we are talking about kernel mappings here, right? If there were
> > > I could solve all those issues the same as I do for user mappings, by
> > > invalidating all existing mappings in the kernel (both kmaps and ioremaps) and
> > > inserting a dummy zero or ~0 filled page instead.
> > > Also, I assume forcefully remapping the IO BAR to a ~0 filled page would involve
> > > the ioremap API, and it's not something that I think can be easily done according to
> > > an answer I got to a related topic a few weeks ago
> > > https://www.spinics.net/lists/linux-pci/msg103396.html (that was the only reply
> > > I got)
> > mmiotrace can, but only for debug, and only on x86 platforms:
> >
> > https://www.kernel.org/doc/html/latest/trace/mmiotrace.html
> >
> > Should be feasible (but maybe not worth the effort) to extend this to
> > support fake unplug.
>
> Mhm, interesting idea you guys brought up here.
>
> We don't need a page fault for this to work, all we need to do is to insert
> dummy PTEs into the kernel's page table at the place where previously the
> MMIO mapping has been.

A simple pte trick isn't enough, because we need:
- drop all writes silently
- all reads return 0xff

ptes can't do that themselves, we minimally need write protection and then
silently proceed on each write fault without restarting the instruction.
Better would be to only catch reads, but x86 doesn't do write-only pte
permissions afaik.

> > > > But ugh ...
> > > >
> > > > Otoh validating an entire driver like amdgpu without such a trick
> > > > against 0xff reads is practically impossible. So maybe you need to add
> > > > this as one of the tasks here?
> > > Or I could just for validation purposes return ~0 from all reg reads in the code
> > > and ignore writes if drm_dev_unplugged, this could already easily validate a big
> > > portion of the code flow under such a scenario.
> > Hm yeah if you really wrap them all, that should work too. Since
> > iomem mappings have the __iomem pointer type, as long as amdgpu is sparse
> > warning free, should be doable to guarantee this.
>
> Problem is that ~0 is not always a valid register value.
>
> You would need to audit every register read to make sure it doesn't use the
> returned value blindly as an index or similar. That is quite a bit of work.

Yeah that's the entire crux here :-/
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
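
The wrapper approach discussed in the message above (return ~0 from every register read and drop writes once the device is unplugged) can be illustrated with a small userspace sketch. This is only a model under stated assumptions, not amdgpu code: `struct fake_dev`, `reg_read32`, `reg_write32`, `active_queue`, and the `unplugged` flag (standing in for a `drm_dev_unplugged` check) are all hypothetical names. It also shows the kind of caller that would need auditing, since ~0 used blindly as an index would run past the end of an array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_REGS   4u
#define NUM_QUEUES 8u

/* Hypothetical stand-in for a device; "unplugged" models a drm_dev_unplugged check. */
struct fake_dev {
    bool unplugged;
    uint32_t regs[NUM_REGS]; /* stands in for the MMIO BAR */
};

/* Read wrapper: after unplug, return all ones, like a read from a removed device. */
static uint32_t reg_read32(const struct fake_dev *dev, size_t idx)
{
    assert(dev != NULL && idx < NUM_REGS);
    if (dev->unplugged)
        return UINT32_MAX;
    return dev->regs[idx];
}

/* Write wrapper: after unplug, drop the write silently. */
static void reg_write32(struct fake_dev *dev, size_t idx, uint32_t val)
{
    assert(dev != NULL && idx < NUM_REGS);
    if (dev->unplugged)
        return;
    dev->regs[idx] = val;
}

/*
 * A caller that uses a register value as an index. The range check is the
 * part an audit would have to confirm exists; without it, ~0 would index
 * far outside any NUM_QUEUES-sized array. Returns -1 for an invalid value.
 */
static int active_queue(const struct fake_dev *dev)
{
    uint32_t q = reg_read32(dev, 0);

    if (q >= NUM_QUEUES)
        return -1;
    return (int)q;
}
```

With this model, a write issued after `unplugged` is set leaves the register contents untouched, and `active_queue` reports an error instead of handing ~0 to its caller. Whether every real read site has an equivalent check is exactly the audit work the message refers to.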