All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andrey Grodzovsky <Andrey.Grodzovsky@amd.com>
To: "Christian König" <christian.koenig@amd.com>,
	"Daniel Vetter" <daniel@ffwll.ch>
Cc: Greg KH <gregkh@linuxfoundation.org>,
	amd-gfx list <amd-gfx@lists.freedesktop.org>,
	dri-devel <dri-devel@lists.freedesktop.org>,
	Alex Deucher <Alexander.Deucher@amd.com>,
	Qiang Yu <yuq825@gmail.com>
Subject: Re: [PATCH v4 11/14] drm/amdgpu: Guard against write accesses after device removal
Date: Fri, 12 Feb 2021 10:00:05 -0500	[thread overview]
Message-ID: <d8f6d2fe-09a0-1eab-b20d-06ecfdd167fa@amd.com> (raw)
In-Reply-To: <7cbc510e-275c-5747-32d0-ce6a5f531cfe@amd.com>

Ping

Andrey

On 2/10/21 5:01 PM, Andrey Grodzovsky wrote:
>
> On 2/9/21 10:40 AM, Christian König wrote:
>> Am 09.02.21 um 15:30 schrieb Andrey Grodzovsky:
>>> [SNIP]
>>>>> Question - Why can't we just set those PTEs to point to system memory 
>>>>> (another RO dummy page)
>>>>> filled with 1s ?
>>>>
>>>>
>>>> Then writes are not discarded. E.g. the 1s would change to something else.
>>>>
>>>> Christian.
>>>
>>>
>>> I see but, what about marking the mappings as RO and discarding the write 
>>> access page faults continuously until the device is finalized ?
>>> Regarding using an unused range behind the upper bridge as Daniel suggested, 
>>> I wonder will this interfere with
>>> the upcoming feature to support BARs movement  during hot plug - 
>>> https://www.spinics.net/lists/linux-pci/msg103195.html ?
>>
>> In the picture in my head the address will never be used.
>>
>> But it doesn't even needs to be an unused range of the root bridge. That one 
>> is usually stuffed full by the BIOS.
>>
>> For my BAR resize work I looked at quite a bunch of NB documentation. At 
>> least for AMD CPUs we should always have an MMIO address which is never ever 
>> used. That makes this platform/CPU dependent, but the code is just minimal.
>>
>> The really really nice thing about this approach is that you could unit test 
>> and audit the code for problems even without *real* hotplug hardware. E.g. we 
>> can swap the kernel PTEs and see how the whole power and display code reacts 
>> to that etc....
>>
>> Christian.
>
>
> Tried to play with hacking mmio tracer as Daniel suggested but just hanged the 
> machine so...
>
> Can you tell me how to dynamically obtain this kind of unused MMIO address ? I 
> think given such address, writing
> to which is dropped and reading from return all 1s, I can then do something 
> like bellow, if that what u meant -
>
> for (address = adev->rmmio; address < adev->rmmio_size; adress += PAGE_SIZE) {
>
>     old_pte = get_locked_pte(init_mm, address)
>     dummy_pte = pfn_pte(fake_mmio_address, 0);
>     set_pte(&old_pte, dummy_pte)
>
> }
>
> flush_tlb ???
>
> P.S I hope to obtain thunderbolt eGPU adapter soon so even if this won't work 
> I still will be able to test how the driver handles all 1s.
>
> Andrey
>
>>
>>>
>>> Andrey
>>>
>>>
>>>>
>>>>
>>>>>
>>>>> Andrey
>>>>>
>>>>>
>>>>>>
>>>>>>> Christian.
>>>>>>>
>>>>>>>> It's a nifty idea indeed otherwise ...
>>>>>>>> -Daniel
>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Christian.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>>>>> But ugh ...
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Otoh validating an entire driver like amdgpu without such a trick
>>>>>>>>>>>>>> against 0xff reads is practically impossible. So maybe you need 
>>>>>>>>>>>>>> to add
>>>>>>>>>>>>>> this as one of the tasks here?
>>>>>>>>>>>>> Or I could just for validation purposes return ~0 from all reg 
>>>>>>>>>>>>> reads in the code
>>>>>>>>>>>>> and ignore writes if drm_dev_unplugged, this could already easily 
>>>>>>>>>>>>> validate a big
>>>>>>>>>>>>> portion of the code flow under such scenario.
>>>>>>>>>>>> Hm yeah if your really wrap them all, that should work too. Since
>>>>>>>>>>>> iommappings have __iomem pointer type, as long as amdgpu is sparse
>>>>>>>>>>>> warning free, should be doable to guarantee this.
>>>>>>>>>>> Problem is that ~0 is not always a valid register value.
>>>>>>>>>>>
>>>>>>>>>>> You would need to audit every register read that it doesn't use the 
>>>>>>>>>>> returned
>>>>>>>>>>> value blindly as index or similar. That is quite a bit of work.
>>>>>>>>>> Yeah that's the entire crux here :-/
>>>>>>>>>> -Daniel
>>>>>>
>>
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

WARNING: multiple messages have this Message-ID (diff)
From: Andrey Grodzovsky <Andrey.Grodzovsky@amd.com>
To: "Christian König" <christian.koenig@amd.com>,
	"Daniel Vetter" <daniel@ffwll.ch>
Cc: Rob Herring <robh@kernel.org>,
	Greg KH <gregkh@linuxfoundation.org>,
	amd-gfx list <amd-gfx@lists.freedesktop.org>,
	"Anholt, Eric" <eric@anholt.net>,
	Pekka Paalanen <ppaalanen@gmail.com>,
	dri-devel <dri-devel@lists.freedesktop.org>,
	Alex Deucher <Alexander.Deucher@amd.com>,
	Qiang Yu <yuq825@gmail.com>,
	"Wentland, Harry" <Harry.Wentland@amd.com>,
	Lucas Stach <l.stach@pengutronix.de>
Subject: Re: [PATCH v4 11/14] drm/amdgpu: Guard against write accesses after device removal
Date: Fri, 12 Feb 2021 10:00:05 -0500	[thread overview]
Message-ID: <d8f6d2fe-09a0-1eab-b20d-06ecfdd167fa@amd.com> (raw)
In-Reply-To: <7cbc510e-275c-5747-32d0-ce6a5f531cfe@amd.com>

Ping

Andrey

On 2/10/21 5:01 PM, Andrey Grodzovsky wrote:
>
> On 2/9/21 10:40 AM, Christian König wrote:
>> Am 09.02.21 um 15:30 schrieb Andrey Grodzovsky:
>>> [SNIP]
>>>>> Question - Why can't we just set those PTEs to point to system memory 
>>>>> (another RO dummy page)
>>>>> filled with 1s ?
>>>>
>>>>
>>>> Then writes are not discarded. E.g. the 1s would change to something else.
>>>>
>>>> Christian.
>>>
>>>
>>> I see but, what about marking the mappings as RO and discarding the write 
>>> access page faults continuously until the device is finalized ?
>>> Regarding using an unused range behind the upper bridge as Daniel suggested, 
>>> I wonder will this interfere with
>>> the upcoming feature to support BARs movement  during hot plug - 
>>> https://www.spinics.net/lists/linux-pci/msg103195.html ?
>>
>> In the picture in my head the address will never be used.
>>
>> But it doesn't even needs to be an unused range of the root bridge. That one 
>> is usually stuffed full by the BIOS.
>>
>> For my BAR resize work I looked at quite a bunch of NB documentation. At 
>> least for AMD CPUs we should always have an MMIO address which is never ever 
>> used. That makes this platform/CPU dependent, but the code is just minimal.
>>
>> The really really nice thing about this approach is that you could unit test 
>> and audit the code for problems even without *real* hotplug hardware. E.g. we 
>> can swap the kernel PTEs and see how the whole power and display code reacts 
>> to that etc....
>>
>> Christian.
>
>
> Tried to play with hacking mmio tracer as Daniel suggested but just hanged the 
> machine so...
>
> Can you tell me how to dynamically obtain this kind of unused MMIO address ? I 
> think given such address, writing
> to which is dropped and reading from return all 1s, I can then do something 
> like bellow, if that what u meant -
>
> for (address = adev->rmmio; address < adev->rmmio_size; adress += PAGE_SIZE) {
>
>     old_pte = get_locked_pte(init_mm, address)
>     dummy_pte = pfn_pte(fake_mmio_address, 0);
>     set_pte(&old_pte, dummy_pte)
>
> }
>
> flush_tlb ???
>
> P.S I hope to obtain thunderbolt eGPU adapter soon so even if this won't work 
> I still will be able to test how the driver handles all 1s.
>
> Andrey
>
>>
>>>
>>> Andrey
>>>
>>>
>>>>
>>>>
>>>>>
>>>>> Andrey
>>>>>
>>>>>
>>>>>>
>>>>>>> Christian.
>>>>>>>
>>>>>>>> It's a nifty idea indeed otherwise ...
>>>>>>>> -Daniel
>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Christian.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>>>>> But ugh ...
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Otoh validating an entire driver like amdgpu without such a trick
>>>>>>>>>>>>>> against 0xff reads is practically impossible. So maybe you need 
>>>>>>>>>>>>>> to add
>>>>>>>>>>>>>> this as one of the tasks here?
>>>>>>>>>>>>> Or I could just for validation purposes return ~0 from all reg 
>>>>>>>>>>>>> reads in the code
>>>>>>>>>>>>> and ignore writes if drm_dev_unplugged, this could already easily 
>>>>>>>>>>>>> validate a big
>>>>>>>>>>>>> portion of the code flow under such scenario.
>>>>>>>>>>>> Hm yeah if your really wrap them all, that should work too. Since
>>>>>>>>>>>> iommappings have __iomem pointer type, as long as amdgpu is sparse
>>>>>>>>>>>> warning free, should be doable to guarantee this.
>>>>>>>>>>> Problem is that ~0 is not always a valid register value.
>>>>>>>>>>>
>>>>>>>>>>> You would need to audit every register read that it doesn't use the 
>>>>>>>>>>> returned
>>>>>>>>>>> value blindly as index or similar. That is quite a bit of work.
>>>>>>>>>> Yeah that's the entire crux here :-/
>>>>>>>>>> -Daniel
>>>>>>
>>
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

  reply	other threads:[~2021-02-12 15:00 UTC|newest]

Thread overview: 196+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-18 21:01 [PATCH v4 00/14] RFC Support hot device unplug in amdgpu Andrey Grodzovsky
2021-01-18 21:01 ` Andrey Grodzovsky
2021-01-18 21:01 ` [PATCH v4 01/14] drm/ttm: Remap all page faults to per process dummy page Andrey Grodzovsky
2021-01-18 21:01   ` Andrey Grodzovsky
2021-01-18 21:48   ` Alex Deucher
2021-01-18 21:48     ` Alex Deucher
2021-01-19  8:41   ` Christian König
2021-01-19  8:41     ` Christian König
2021-01-19 13:56   ` Daniel Vetter
2021-01-19 13:56     ` Daniel Vetter
2021-01-25 15:28     ` Andrey Grodzovsky
2021-01-25 15:28       ` Andrey Grodzovsky
2021-01-27 14:29       ` Andrey Grodzovsky
2021-01-27 14:29         ` Andrey Grodzovsky
2021-02-02 14:21         ` Daniel Vetter
2021-02-02 14:21           ` Daniel Vetter
2021-01-18 21:01 ` [PATCH v4 02/14] drm: Unamp the entire device address space on device unplug Andrey Grodzovsky
2021-01-18 21:01   ` Andrey Grodzovsky
2021-01-18 21:01 ` [PATCH v4 03/14] drm/ttm: Expose ttm_tt_unpopulate for driver use Andrey Grodzovsky
2021-01-18 21:01   ` Andrey Grodzovsky
2021-01-18 21:01 ` [PATCH v4 04/14] drm/sched: Cancel and flush all oustatdning jobs before finish Andrey Grodzovsky
2021-01-18 21:01   ` Andrey Grodzovsky
2021-01-18 21:49   ` Alex Deucher
2021-01-18 21:49     ` Alex Deucher
2021-01-19  8:42   ` Christian König
2021-01-19  8:42     ` Christian König
2021-01-19  9:50     ` Christian König
2021-01-19  9:50       ` Christian König
2021-01-18 21:01 ` [PATCH v4 05/14] drm/amdgpu: Split amdgpu_device_fini into early and late Andrey Grodzovsky
2021-01-18 21:01   ` Andrey Grodzovsky
2021-01-19  8:45   ` Christian König
2021-01-19  8:45     ` Christian König
2021-01-18 21:01 ` [PATCH v4 06/14] drm/amdgpu: Add early fini callback Andrey Grodzovsky
2021-01-18 21:01   ` Andrey Grodzovsky
2021-01-18 21:01 ` [PATCH v4 07/14] drm/amdgpu: Register IOMMU topology notifier per device Andrey Grodzovsky
2021-01-18 21:01   ` Andrey Grodzovsky
2021-01-18 21:52   ` Alex Deucher
2021-01-18 21:52     ` Alex Deucher
2021-01-19  8:48   ` Christian König
2021-01-19  8:48     ` Christian König
2021-01-19 13:45     ` Daniel Vetter
2021-01-19 13:45       ` Daniel Vetter
2021-01-19 21:21       ` Andrey Grodzovsky
2021-01-19 21:21         ` Andrey Grodzovsky
2021-01-19 22:01         ` Daniel Vetter
2021-01-19 22:01           ` Daniel Vetter
2021-01-20  4:21           ` Andrey Grodzovsky
2021-01-20  4:21             ` Andrey Grodzovsky
2021-01-20  8:38             ` Daniel Vetter
2021-01-20  8:38               ` Daniel Vetter
     [not found]               ` <1a5f7ccb-1f91-91be-1cb1-e7cb43ac2c13@amd.com>
2021-01-21 10:48                 ` Daniel Vetter
2021-01-21 10:48                   ` Daniel Vetter
2021-01-20  5:01     ` Andrey Grodzovsky
2021-01-20  5:01       ` Andrey Grodzovsky
2021-01-20 19:38       ` Andrey Grodzovsky
2021-01-20 19:38         ` Andrey Grodzovsky
2021-01-21 10:42         ` Christian König
2021-01-21 10:42           ` Christian König
2021-01-18 21:01 ` [PATCH v4 08/14] drm/amdgpu: Fix a bunch of sdma code crash post device unplug Andrey Grodzovsky
2021-01-18 21:01   ` Andrey Grodzovsky
2021-01-19  8:51   ` Christian König
2021-01-19  8:51     ` Christian König
2021-01-18 21:01 ` [PATCH v4 09/14] drm/amdgpu: Remap all page faults to per process dummy page Andrey Grodzovsky
2021-01-18 21:01   ` Andrey Grodzovsky
2021-01-19  8:52   ` Christian König
2021-01-19  8:52     ` Christian König
2021-01-18 21:01 ` [PATCH v4 10/14] dmr/amdgpu: Move some sysfs attrs creation to default_attr Andrey Grodzovsky
2021-01-18 21:01   ` Andrey Grodzovsky
2021-01-19  7:34   ` Greg KH
2021-01-19  7:34     ` Greg KH
2021-01-19 16:36     ` Andrey Grodzovsky
2021-01-19 16:36       ` Andrey Grodzovsky
2021-01-19 17:47       ` Greg KH
2021-01-19 17:47         ` Greg KH
2021-01-19 19:04         ` Alex Deucher
2021-01-19 19:04           ` Alex Deucher
2021-01-19 19:16           ` Andrey Grodzovsky
2021-01-19 19:16             ` Andrey Grodzovsky
2021-01-19 19:41           ` Greg KH
2021-01-19 19:41             ` Greg KH
2021-01-19  8:53   ` Christian König
2021-01-19  8:53     ` Christian König
2021-01-18 21:01 ` [PATCH v4 11/14] drm/amdgpu: Guard against write accesses after device removal Andrey Grodzovsky
2021-01-18 21:01   ` Andrey Grodzovsky
2021-01-19  8:55   ` Christian König
2021-01-19  8:55     ` Christian König
2021-01-19 15:35     ` Andrey Grodzovsky
2021-01-19 15:35       ` Andrey Grodzovsky
2021-01-19 15:39       ` Christian König
2021-01-19 15:39         ` Christian König
2021-01-19 18:05       ` Daniel Vetter
2021-01-19 18:05         ` Daniel Vetter
2021-01-19 18:22         ` Andrey Grodzovsky
2021-01-19 18:22           ` Andrey Grodzovsky
2021-01-19 18:59           ` Christian König
2021-01-19 18:59             ` Christian König
2021-01-19 19:16             ` Andrey Grodzovsky
2021-01-19 19:16               ` Andrey Grodzovsky
2021-01-20 19:34               ` Andrey Grodzovsky
2021-01-20 19:34                 ` Andrey Grodzovsky
2021-01-28 17:23             ` Andrey Grodzovsky
2021-01-28 17:23               ` Andrey Grodzovsky
2021-01-29 15:16               ` Christian König
2021-01-29 15:16                 ` Christian König
2021-01-29 17:35                 ` Andrey Grodzovsky
2021-01-29 17:35                   ` Andrey Grodzovsky
2021-01-29 19:25                   ` Christian König
2021-01-29 19:25                     ` Christian König
2021-02-05 16:22                     ` Andrey Grodzovsky
2021-02-05 16:22                       ` Andrey Grodzovsky
2021-02-05 22:10                       ` Daniel Vetter
2021-02-05 22:10                         ` Daniel Vetter
2021-02-05 23:09                         ` Andrey Grodzovsky
2021-02-05 23:09                           ` Andrey Grodzovsky
2021-02-06 14:18                           ` Daniel Vetter
2021-02-06 14:18                             ` Daniel Vetter
2021-02-07 21:28                         ` Andrey Grodzovsky
2021-02-07 21:28                           ` Andrey Grodzovsky
2021-02-07 21:50                           ` Daniel Vetter
2021-02-07 21:50                             ` Daniel Vetter
2021-02-08  9:37                             ` Christian König
2021-02-08  9:37                               ` Christian König
2021-02-08  9:48                               ` Daniel Vetter
2021-02-08  9:48                                 ` Daniel Vetter
2021-02-08 10:03                                 ` Christian König
2021-02-08 10:03                                   ` Christian König
2021-02-08 10:11                                   ` Daniel Vetter
2021-02-08 10:11                                     ` Daniel Vetter
2021-02-08 13:59                                     ` Christian König
2021-02-08 13:59                                       ` Christian König
2021-02-08 16:23                                       ` Daniel Vetter
2021-02-08 16:23                                         ` Daniel Vetter
2021-02-08 22:15                                         ` Andrey Grodzovsky
2021-02-08 22:15                                           ` Andrey Grodzovsky
2021-02-09  7:58                                           ` Christian König
2021-02-09  7:58                                             ` Christian König
2021-02-09 14:30                                             ` Andrey Grodzovsky
2021-02-09 14:30                                               ` Andrey Grodzovsky
2021-02-09 15:40                                               ` Christian König
2021-02-09 15:40                                                 ` Christian König
2021-02-10 22:01                                                 ` Andrey Grodzovsky
2021-02-10 22:01                                                   ` Andrey Grodzovsky
2021-02-12 15:00                                                   ` Andrey Grodzovsky [this message]
2021-02-12 15:00                                                     ` Andrey Grodzovsky
2021-02-08 22:09                               ` Andrey Grodzovsky
2021-02-08 22:09                                 ` Andrey Grodzovsky
2021-02-09  8:27                                 ` Christian König
2021-02-09  8:27                                   ` Christian König
2021-02-09  9:46                                   ` Daniel Vetter
2021-02-09  9:46                                     ` Daniel Vetter
2021-01-18 21:01 ` [PATCH v4 12/14] drm/scheduler: Job timeout handler returns status Andrey Grodzovsky
2021-01-18 21:01   ` Andrey Grodzovsky
2021-01-19  7:53   ` Christian König
2021-01-19  7:53     ` Christian König
2021-01-19 17:47     ` Luben Tuikov
2021-01-19 17:47       ` Luben Tuikov
2021-01-19 18:53       ` Christian König
2021-01-19 18:53         ` Christian König
2021-01-18 21:01 ` [PATCH v4 13/14] drm/sched: Make timeout timer rearm conditional Andrey Grodzovsky
2021-01-18 21:01   ` Andrey Grodzovsky
2021-01-18 21:01 ` [PATCH v4 14/14] drm/amdgpu: Prevent any job recoveries after device is unplugged Andrey Grodzovsky
2021-01-18 21:01   ` Andrey Grodzovsky
2021-01-19 14:16 ` [PATCH v4 00/14] RFC Support hot device unplug in amdgpu Daniel Vetter
2021-01-19 14:16   ` Daniel Vetter
2021-01-19 17:31   ` Andrey Grodzovsky
2021-01-19 17:31     ` Andrey Grodzovsky
2021-01-19 18:08     ` Daniel Vetter
2021-01-19 18:08       ` Daniel Vetter
2021-01-19 18:18       ` Andrey Grodzovsky
2021-01-19 18:18         ` Andrey Grodzovsky
2021-01-20  9:05         ` Daniel Vetter
2021-01-20  9:05           ` Daniel Vetter
2021-01-20 14:19           ` Andrey Grodzovsky
2021-01-20 14:19             ` Andrey Grodzovsky
2021-01-20 15:59             ` Daniel Vetter
2021-01-20 15:59               ` Daniel Vetter
2021-02-08  5:59               ` Andrey Grodzovsky
2021-02-08  5:59                 ` Andrey Grodzovsky
2021-02-08  7:27                 ` Daniel Vetter
2021-02-08  7:27                   ` Daniel Vetter
2021-02-09  4:01                   ` Andrey Grodzovsky
2021-02-09  4:01                     ` Andrey Grodzovsky
2021-02-09  9:50                     ` Daniel Vetter
2021-02-09  9:50                       ` Daniel Vetter
2021-02-09 15:34                       ` Andrey Grodzovsky
2021-02-09 15:34                         ` Andrey Grodzovsky
2021-02-18 20:03                       ` Andrey Grodzovsky
2021-02-18 20:03                         ` Andrey Grodzovsky
2021-02-19 10:24                         ` Daniel Vetter
2021-02-19 10:24                           ` Daniel Vetter
2021-02-24 16:30                           ` Andrey Grodzovsky
2021-02-24 16:30                             ` Andrey Grodzovsky
2021-02-25 10:25                             ` Daniel Vetter
2021-02-25 10:25                               ` Daniel Vetter
2021-02-25 16:12                               ` Andrey Grodzovsky
2021-02-25 16:12                                 ` Andrey Grodzovsky

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d8f6d2fe-09a0-1eab-b20d-06ecfdd167fa@amd.com \
    --to=andrey.grodzovsky@amd.com \
    --cc=Alexander.Deucher@amd.com \
    --cc=amd-gfx@lists.freedesktop.org \
    --cc=christian.koenig@amd.com \
    --cc=daniel@ffwll.ch \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=gregkh@linuxfoundation.org \
    --cc=yuq825@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.