From: zhoucm1 <zhoucm1@amd.com>
To: "Chris Wilson" <chris@chris-wilson.co.uk>, "dri-devel@lists.freedesktop.org" <dri-devel@lists.freedesktop.org>, "Christian König" <ckoenig.leichtzumerken@gmail.com>, "Eric Anholt" <eric@anholt.net>, christian.koenig@amd.com
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>, "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 2/2] drm: Revert syncobj timeline changes.
Date: Tue, 13 Nov 2018 13:57:27 +0800
Message-ID: <cd0f89e7-1018-e4c3-3a81-4e796c384c1d@amd.com>
In-Reply-To: <154201973877.16646.5745251436337959698@skylake-alporthouse-com>

On 2018-11-12 18:48, Chris Wilson wrote:
> Quoting Christian König (2018-11-12 10:16:01)
>> On 09.11.18 at 23:26, Eric Anholt wrote:
>>
>> Eric Anholt <eric@anholt.net> writes:
>>
>> [ Unknown signature status ]
>> zhoucm1 <zhoucm1@amd.com> writes:
>>
>> On 2018-11-09 00:52, Christian König wrote:
>>
>> On 08.11.18 at 17:07, Koenig, Christian wrote:
>>
>> On 08.11.18 at 17:04, Eric Anholt wrote:
>>
>> Daniel suggested I submit this, since we're still seeing regressions
>> from it. This is a revert to before 48197bc564c7 ("drm: add syncobj
>> timeline support v9") and its followon fixes.
>>
>> This is a harmless false positive from lockdep; Chunming and I are
>> already working on a fix.
>>
>> On the other hand we had enough trouble with that patch, so if it
>> really bothers you feel free to add my Acked-by: Christian König
>> <christian.koenig@amd.com> and push it.
>>
>> NAK, please no. I don't think this is needed: the warning isn't
>> related to the syncobj timeline at all, but to a fence-array
>> implementation flaw that is merely exposed by syncobj.
>> In addition, Christian already has a fix for this warning, which I've
>> tested. Christian, please send it for public review.
>>
>> I backed out my revert of #2 (#1 still necessary) after adding the
>> lockdep regression fix, and now my CTS run got oomkilled after just a
>> few hours, with these notable lines in the unreclaimable slab info list:
>>
>> [ 6314.373099] drm_sched_fence      69095KB      69095KB
>> [ 6314.373653] kmemleak_object     428249KB     428384KB
>> [ 6314.373736] kmalloc-262144         256KB        256KB
>> [ 6314.373743] kmalloc-131072         128KB        128KB
>> [ 6314.373750] kmalloc-65536           64KB         64KB
>> [ 6314.373756] kmalloc-32768         1472KB       1728KB
>> [ 6314.373763] kmalloc-16384           64KB         64KB
>> [ 6314.373770] kmalloc-8192           208KB        208KB
>> [ 6314.373778] kmalloc-4096          2408KB       2408KB
>> [ 6314.373784] kmalloc-2048           288KB        336KB
>> [ 6314.373792] kmalloc-1024          1457KB       1512KB
>> [ 6314.373800] kmalloc-512            854KB       1048KB
>> [ 6314.373808] kmalloc-256            188KB        268KB
>> [ 6314.373817] kmalloc-192          69141KB      69142KB
>> [ 6314.373824] kmalloc-64           47703KB      47704KB
>> [ 6314.373886] kmalloc-128          46396KB      46396KB
>> [ 6314.373894] kmem_cache              31KB         35KB
>>
>> No results from kmemleak, though.
>>
>> OK, it looks like the #2 revert probably isn't related to the OOM issue.
>> Running a single job on otherwise unused DRM, watching /proc/slabinfo
>> every second for drm_sched_fence, I get:
>>
>> drm_sched_fence  0  0 192 21 1 : tunables 32 16 8 : slabdata 0 0 0 : globalstat  0  0 0 0 0 0 0 0 0 : cpustat 0 0 0 0
>> drm_sched_fence 16 21 192 21 1 : tunables 32 16 8 : slabdata 1 1 0 : globalstat 16 16 1 0 0 0 0 0 0 : cpustat 5 1 6 0
>> drm_sched_fence 13 21 192 21 1 : tunables 32 16 8 : slabdata 1 1 0 : globalstat 16 16 1 0 0 0 0 0 0 : cpustat 5 1 6 0
>> drm_sched_fence  6 21 192 21 1 : tunables 32 16 8 : slabdata 1 1 0 : globalstat 16 16 1 0 0 0 0 0 0 : cpustat 5 1 6 0
>> drm_sched_fence  4 21 192 21 1 : tunables 32 16 8 : slabdata 1 1 0 : globalstat 16 16 1 0 0 0 0 0 0 : cpustat 5 1 6 0
>> drm_sched_fence  2 21 192 21 1 : tunables 32 16 8 : slabdata 1 1 0 : globalstat 16 16 1 0 0 0 0 0 0 : cpustat 5 1 6 0
>> drm_sched_fence  0 21 192 21 1 : tunables 32 16 8 : slabdata 0 1 0 : globalstat 16 16 1 0 0 0 0 0 0 : cpustat 5 1 6 0
>>
>> So we generate a ton of fences, and I guess free them slowly because of
>> RCU? And presumably kmemleak was sucking up lots of memory because of
>> how many of these objects were lying around.
>>
>> That is certainly possible. Another possibility is that we don't drop the
>> reference in dma-fence-array early enough.
>>
>> E.g. the dma-fence-array will keep the reference to its fences until it is
>> destroyed, which is a bit late when you chain multiple dma-fence-array objects
>> together.
>>
>> David, can you take a look at this and propose a fix? That would probably be
>> good to have fixed in dma-fence-array separately from the timeline work.
> Note that drm_syncobj_replace_fence() leaks any existing fence for
> !timeline syncobjs.

Hi Chris,

Huh! Isn't the existing fence garbage collected? Could you point out
where/how the existing fence is leaked?

Thanks,
David

> Which coupled with the linear search ends up with
> a severe regression in both time and memory.
> -Chris
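
For anyone who wants to repeat the /proc/slabinfo observation quoted above, here is a minimal sketch in Python. It assumes the standard slabinfo column order (name, active_objs, num_objs, objsize, ...), which matches the drm_sched_fence lines in the quoted mail. The helper names `parse_slabinfo_line` and `watch_slab` are made up for illustration and are not part of any kernel tooling.

```python
import time


def parse_slabinfo_line(line):
    """Parse one data line of /proc/slabinfo; return None for headers or junk."""
    if line.startswith(("slabinfo", "#")):
        return None  # version banner or column header
    fields = line.split()
    if len(fields) < 4:
        return None
    try:
        active_objs, num_objs, objsize = (int(f) for f in fields[1:4])
    except ValueError:
        return None
    return {
        "name": fields[0],
        "active_objs": active_objs,   # objects currently in use
        "num_objs": num_objs,         # objects allocated in the cache
        "objsize": objsize,           # bytes per object
        "active_bytes": active_objs * objsize,
    }


def watch_slab(name, interval=1.0, samples=10, path="/proc/slabinfo"):
    """Print the in-use object count of one slab cache every `interval` seconds.

    Reading /proc/slabinfo usually requires root.
    """
    for _ in range(samples):
        with open(path) as f:
            for line in f:
                row = parse_slabinfo_line(line)
                if row and row["name"] == name:
                    print(row["name"], row["active_objs"], row["num_objs"],
                          row["active_bytes"])
        time.sleep(interval)
```

Calling `watch_slab("drm_sched_fence")` as root while a job runs should show whether the active object count drains back to zero, as in the quoted single-job run, or keeps growing.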