From: "Christian König" <ckoenig.leichtzumerken@gmail.com>
To: "Andrey Grodzovsky" <Andrey.Grodzovsky@amd.com>, "Christian König" <christian.koenig@amd.com>, "Daniel Vetter" <daniel.vetter@ffwll.ch>
Cc: Greg KH <gregkh@linuxfoundation.org>, amd-gfx list <amd-gfx@lists.freedesktop.org>, dri-devel <dri-devel@lists.freedesktop.org>, Alex Deucher <Alexander.Deucher@amd.com>, Qiang Yu <yuq825@gmail.com>
Subject: Re: [PATCH v4 11/14] drm/amdgpu: Guard against write accesses after device removal
Date: Fri, 29 Jan 2021 20:25:02 +0100
Message-ID: <0b502043-5a66-dcd5-53f9-5c190f22dc46@gmail.com>
In-Reply-To: <69f036e2-f102-8233-37f6-5254a484bf97@amd.com>

On 29.01.21 at 18:35, Andrey Grodzovsky wrote:
>
> On 1/29/21 10:16 AM, Christian König wrote:
>> On 28.01.21 at 18:23, Andrey Grodzovsky wrote:
>>>
>>> On 1/19/21 1:59 PM, Christian König wrote:
>>>> On 19.01.21 at 19:22, Andrey Grodzovsky wrote:
>>>>>
>>>>> On 1/19/21 1:05 PM, Daniel Vetter wrote:
>>>>>> [SNIP]
>>>>> So, say, write to some harmless scratch register in a loop many
>>>>> times, for both the plugged and the unplugged case, and measure
>>>>> the total time delta?
>>>>
>>>> I think we should at least measure the following:
>>>>
>>>> 1. Writing X times to a scratch reg without your patch.
>>>> 2. Writing X times to a scratch reg with your patch.
>>>> 3. Writing X times to a scratch reg with the hardware physically
>>>>    disconnected.
>>>>
>>>> I suggest repeating that once for Polaris (or older) and once for
>>>> Vega or Navi.
>>>>
>>>> The SRBM on Polaris is meant to introduce some delay in each
>>>> access, so it might react differently than the newer hardware.
>>>>
>>>> Christian.
>>>
>>>
>>> See attached results and the testing code. Ran on Polaris (gfx8) and
>>> Vega10 (gfx9).
>>>
>>> In summary, over 1 million WREG32 calls in a loop, with and without
>>> this patch, you get around 10ms of accumulated overhead (so a
>>> 0.00001 millisecond penalty for each WREG32) for using the
>>> drm_dev_enter check when writing registers.
>>>
>>> P.S. Bullet 3 I cannot test, as I need an eGPU and currently I don't
>>> have one.
>>
>> Well, if I'm not completely mistaken, that is 100ms of accumulated
>> overhead. So around 100ns per write. And an even bigger problem is
>> that this is a ~67% increase.
>
>
> My bad, and 67% of what? How did you calculate it?

My bad, (308501-209689)/209689 = 47% increase.

>>
>> I'm not sure how many writes we do during normal operation, but that
>> sounds like a bit much. Ideas?
>
> Well, you suggested moving the drm_dev_enter check way up, but as I see
> it the problem with this is that it increases the chance of a race
> where the device is extracted after we check drm_dev_enter (there is
> also such a chance even when it's placed inside WREG32, but it's lower).
> Earlier I proposed that, instead of scattering all those guards all
> over the code, we simply delay the release of system memory pages and
> the unreserve of MMIO ranges until after the device itself is gone,
> after the last drm device reference is dropped. But Daniel opposes
> delaying the MMIO range unreserve until after the PCI remove code
> because, according to him, it will upset the PCI subsystem.

Yeah, that's most likely true as well.

Maybe Daniel has another idea when he's back from vacation.

Christian.

>
> Andrey
>
>>
>> Christian.
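The corrected figures in this exchange can be checked with a few lines of arithmetic. The following is only a sketch, not part of the posted test code: it assumes the two totals quoted in the thread (209689 and 308501) are in microseconds for 1,000,000 writes, and that the smaller one is the run without the drm_dev_enter guard. The thread states neither the unit nor which hardware the totals come from; the function and constant names are made up for illustration.

```python
# Totals quoted in the thread for 1,000,000 register writes.
# ASSUMPTION: both totals are in microseconds, and the smaller value is the
# unpatched (unguarded) run. The thread does not state either explicitly.
WRITES = 1_000_000
TOTAL_WITHOUT_GUARD_US = 209_689
TOTAL_WITH_GUARD_US = 308_501


def guard_overhead(base_us, guarded_us, writes):
    """Return (total overhead in ms, overhead per write in ns, increase in %)."""
    delta_us = guarded_us - base_us
    total_ms = delta_us / 1_000                # microseconds -> milliseconds
    per_write_ns = delta_us * 1_000 / writes   # microseconds -> ns, per write
    increase_pct = 100 * delta_us / base_us    # relative to the unguarded run
    return total_ms, per_write_ns, increase_pct


total_ms, per_write_ns, increase_pct = guard_overhead(
    TOTAL_WITHOUT_GUARD_US, TOTAL_WITH_GUARD_US, WRITES
)
print(f"accumulated overhead: {total_ms:.1f} ms")
print(f"per-write overhead:   {per_write_ns:.1f} ns")
print(f"relative increase:    {increase_pct:.1f} %")
```

Under those assumptions the result agrees with the reply: roughly 100 ms accumulated, roughly 100 ns per write, and a 47% increase rather than 67%.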