From: "Christian König" <ckoenig.leichtzumerken@gmail.com>
To: "Andrey Grodzovsky" <Andrey.Grodzovsky@amd.com>,
	"Christian König" <christian.koenig@amd.com>,
	"Daniel Vetter" <daniel.vetter@ffwll.ch>
Cc: Greg KH <gregkh@linuxfoundation.org>,
	amd-gfx list <amd-gfx@lists.freedesktop.org>,
	dri-devel <dri-devel@lists.freedesktop.org>,
	Alex Deucher <Alexander.Deucher@amd.com>,
	Qiang Yu <yuq825@gmail.com>
Subject: Re: [PATCH v4 11/14] drm/amdgpu: Guard against write accesses after device removal
Date: Fri, 29 Jan 2021 20:25:02 +0100
Message-ID: <0b502043-5a66-dcd5-53f9-5c190f22dc46@gmail.com>
In-Reply-To: <69f036e2-f102-8233-37f6-5254a484bf97@amd.com>

On 29.01.21 at 18:35, Andrey Grodzovsky wrote:
>
> On 1/29/21 10:16 AM, Christian König wrote:
>> On 28.01.21 at 18:23, Andrey Grodzovsky wrote:
>>>
>>> On 1/19/21 1:59 PM, Christian König wrote:
>>>> On 19.01.21 at 19:22, Andrey Grodzovsky wrote:
>>>>>
>>>>> On 1/19/21 1:05 PM, Daniel Vetter wrote:
>>>>>> [SNIP]
>>>>> So, say, write to some harmless scratch register many times in a 
>>>>> loop, for both the plugged and the unplugged case,
>>>>> and measure the total time delta?
>>>>
>>>> I think we should at least measure the following:
>>>>
>>>> 1. Writing X times to a scratch reg without your patch.
>>>> 2. Writing X times to a scratch reg with your patch.
>>>> 3. Writing X times to a scratch reg with the hardware physically 
>>>> disconnected.
>>>>
>>>> I suggest repeating that once for Polaris (or older) and once for 
>>>> Vega or Navi.
>>>>
>>>> The SRBM on Polaris is meant to introduce some delay in each 
>>>> access, so it might react differently than the newer hardware.
>>>>
>>>> Christian.
>>>
>>>
>>> See the attached results and the testing code. Ran on Polaris (gfx8) and 
>>> Vega10 (gfx9).
>>>
>>> In summary, over 1 million WREG32 writes in a loop, with and without this 
>>> patch, you get around 10ms of accumulated overhead (so a 0.00001 
>>> millisecond penalty for each WREG32) from using the drm_dev_enter check 
>>> when writing registers.
>>>
>>> P.S. I cannot test bullet 3, as I need an eGPU and currently I don't have 
>>> one.
>>
>> Well, if I'm not completely mistaken, that is 100ms of accumulated 
>> overhead. So around 100ns per write. And an even bigger problem is that 
>> this is a ~67% increase.
>
>
> My bad. And 67% relative to what? How do you calculate that?

My bad, (308501-209689)/209689=47% increase.
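As a quick cross-check of these figures, here is a minimal sketch in C. It assumes the two totals (209689 and 308501) are microseconds accumulated over the 1 million writes mentioned above; the unit is not stated in this message, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Relative increase of the patched total over the baseline, in percent. */
double percent_increase(uint64_t baseline, uint64_t patched)
{
    assert(baseline > 0 && patched >= baseline);
    return 100.0 * (double)(patched - baseline) / (double)baseline;
}

/*
 * Extra cost per write in nanoseconds.
 * ASSUMPTION: both totals are in microseconds (1 us = 1000 ns).
 */
double overhead_ns_per_write(uint64_t baseline_us, uint64_t patched_us,
                             uint64_t writes)
{
    assert(writes > 0 && patched_us >= baseline_us);
    return 1000.0 * (double)(patched_us - baseline_us) / (double)writes;
}
```

Under that assumption, percent_increase(209689, 308501) comes out at roughly 47.1 and overhead_ns_per_write(209689, 308501, 1000000) at roughly 98.8, which is consistent with the "around 100ns per write" estimate.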

>>
>> I'm not sure how many writes we do during normal operation, but that 
>> sounds like a bit much. Ideas?
>
> Well, you suggested moving the drm_dev_enter way up, but as I see it the 
> problem with this is that it increases the chance of a race where the
> device is extracted after we check drm_dev_enter (there is also 
> such a chance even when it's placed inside WREG32, but it's lower).
> Earlier I proposed that, instead of scattering all those guards all 
> over the code, we simply delay the release of system memory pages and 
> the unreserve of MMIO ranges until after the device itself is gone, 
> after the last drm device reference is dropped. But Daniel opposes 
> delaying the MMIO range unreserve until after the PCI remove code 
> because, according to him, it will upset the PCI subsystem.

Yeah, that's most likely true as well.
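
For reference, the trade-off between guarding every register write and guarding once "way up" can be sketched with a small userspace model in C. This is only an illustration under stated assumptions: struct model_dev and all function names are hypothetical, a plain atomic flag stands in for the drm_dev_enter() check, and none of this is the amdgpu code or the patch under review.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for a device (not an amdgpu structure). */
struct model_dev {
    atomic_bool unplugged;      /* set once the device has been removed */
    volatile uint32_t scratch;  /* stands in for a scratch register */
    size_t checks;              /* how often the guard was evaluated */
};

void model_dev_init(struct model_dev *dev)
{
    atomic_init(&dev->unplugged, false);
    dev->scratch = 0;
    dev->checks = 0;
}

/* Stand-in for the guard check: true while the device is still present. */
bool model_dev_present(struct model_dev *dev)
{
    dev->checks++;
    return !atomic_load(&dev->unplugged);
}

/* Fine-grained: one guard check per write (n checks for n writes). */
size_t write_guard_each(struct model_dev *dev, size_t n)
{
    size_t done = 0;

    for (size_t i = 0; i < n; i++) {
        if (!model_dev_present(dev))
            break;              /* stop as soon as removal is seen */
        dev->scratch = (uint32_t)i;
        done++;
    }
    return done;
}

/* Coarse: one guard check up front, then n unguarded writes. */
size_t write_guard_once(struct model_dev *dev, size_t n)
{
    if (!model_dev_present(dev))
        return 0;
    for (size_t i = 0; i < n; i++)
        dev->scratch = (uint32_t)i;  /* removal here goes unnoticed */
    return n;
}
```

In this model the fine-grained variant pays for n checks, which is the per-write overhead measured above, while the coarse variant pays for one check but keeps writing for the whole loop even if the flag is set right after the check.
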

Maybe Daniel has another idea when he's back from vacation.

Christian.

>
> Andrey
>
>>
>> Christian.
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


