From: "Koenig, Christian" <Christian.Koenig@amd.com>
To: "Chris Wilson" <chris@chris-wilson.co.uk>,
"dri-devel@lists.freedesktop.org"
<dri-devel@lists.freedesktop.org>,
"Christian König" <ckoenig.leichtzumerken@gmail.com>,
"Eric Anholt" <eric@anholt.net>,
"Zhou, David(ChunMing)" <David1.Zhou@amd.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 2/2] drm: Revert syncobj timeline changes.
Date: Mon, 12 Nov 2018 11:47:56 +0000
Message-ID: <2c580da6-da64-a930-fad2-80705ac3bf5f@amd.com>
In-Reply-To: <154201973877.16646.5745251436337959698@skylake-alporthouse-com>
On 12.11.18 at 11:48, Chris Wilson wrote:
> Quoting Christian König (2018-11-12 10:16:01)
>> On 09.11.18 at 23:26, Eric Anholt wrote:
>>
>> Eric Anholt <eric@anholt.net> writes:
>>
>>
>> zhoucm1 <zhoucm1@amd.com> writes:
>>
>>
>> On 2018-11-09 00:52, Christian König wrote:
>>
>> On 08.11.18 at 17:07, Koenig, Christian wrote:
>>
>> On 08.11.18 at 17:04, Eric Anholt wrote:
>>
>> Daniel suggested I submit this, since we're still seeing regressions
>> from it. This is a revert to before 48197bc564c7 ("drm: add syncobj
>> timeline support v9") and its follow-on fixes.
>>
>> This is a harmless false positive from lockdep; ChunMing and I are
>> already working on a fix.
>>
>> On the other hand we had enough trouble with that patch, so if it
>> really bothers you feel free to add my Acked-by: Christian König
>> <christian.koenig@amd.com> and push it.
>>
>> NAK, please no. I don't think this is needed: the warning isn't related
>> to the syncobj timeline at all, but to a flaw in the fence-array
>> implementation that syncobj merely exposed.
>> In addition, Christian already has a fix for this warning, which I've
>> tested. Christian, please send it out for public review.
>>
>> I backed out my revert of #2 (#1 still necessary) after adding the
>> lockdep regression fix, and now my CTS run got oomkilled after just a
>> few hours, with these notable lines in the unreclaimable slab info list:
>>
>> [ 6314.373099] drm_sched_fence      69095KB     69095KB
>> [ 6314.373653] kmemleak_object     428249KB    428384KB
>> [ 6314.373736] kmalloc-262144         256KB       256KB
>> [ 6314.373743] kmalloc-131072         128KB       128KB
>> [ 6314.373750] kmalloc-65536           64KB        64KB
>> [ 6314.373756] kmalloc-32768         1472KB      1728KB
>> [ 6314.373763] kmalloc-16384           64KB        64KB
>> [ 6314.373770] kmalloc-8192           208KB       208KB
>> [ 6314.373778] kmalloc-4096          2408KB      2408KB
>> [ 6314.373784] kmalloc-2048           288KB       336KB
>> [ 6314.373792] kmalloc-1024          1457KB      1512KB
>> [ 6314.373800] kmalloc-512            854KB      1048KB
>> [ 6314.373808] kmalloc-256            188KB       268KB
>> [ 6314.373817] kmalloc-192          69141KB     69142KB
>> [ 6314.373824] kmalloc-64           47703KB     47704KB
>> [ 6314.373886] kmalloc-128          46396KB     46396KB
>> [ 6314.373894] kmem_cache              31KB        35KB
>>
>> No results from kmemleak, though.
>>
>> OK, it looks like the #2 revert probably isn't related to the OOM issue.
>> Running a single job on otherwise unused DRM, watching /proc/slabinfo
>> every second for drm_sched_fence, I get:
>>
>> drm_sched_fence 0 0 192 21 1 : tunables 32 16 8 : slabdata 0 0 0 : globalstat 0 0 0 0 0 0 0 0 0 : cpustat 0 0 0 0
>> drm_sched_fence 16 21 192 21 1 : tunables 32 16 8 : slabdata 1 1 0 : globalstat 16 16 1 0 0 0 0 0 0 : cpustat 5 1 6 0
>> drm_sched_fence 13 21 192 21 1 : tunables 32 16 8 : slabdata 1 1 0 : globalstat 16 16 1 0 0 0 0 0 0 : cpustat 5 1 6 0
>> drm_sched_fence 6 21 192 21 1 : tunables 32 16 8 : slabdata 1 1 0 : globalstat 16 16 1 0 0 0 0 0 0 : cpustat 5 1 6 0
>> drm_sched_fence 4 21 192 21 1 : tunables 32 16 8 : slabdata 1 1 0 : globalstat 16 16 1 0 0 0 0 0 0 : cpustat 5 1 6 0
>> drm_sched_fence 2 21 192 21 1 : tunables 32 16 8 : slabdata 1 1 0 : globalstat 16 16 1 0 0 0 0 0 0 : cpustat 5 1 6 0
>> drm_sched_fence 0 21 192 21 1 : tunables 32 16 8 : slabdata 0 1 0 : globalstat 16 16 1 0 0 0 0 0 0 : cpustat 5 1 6 0
>>
>> So we generate a ton of fences, and I guess free them slowly because of
>> RCU? And presumably kmemleak was sucking up lots of memory because of
>> how many of these objects were lying around.
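
The slabinfo samples above can be read mechanically. Below is a minimal
Python sketch, not taken from this thread, that assumes the slabinfo 2.1
column order (name, active_objs, num_objs, objsize, ...) and that
/proc/slabinfo is readable (usually root only). The names SlabStat,
parse_slabinfo_line, active_bytes and watch are made up for illustration.

```python
import time
from typing import Iterator, NamedTuple, Optional


class SlabStat(NamedTuple):
    name: str
    active_objs: int
    num_objs: int
    objsize: int  # bytes per object


def parse_slabinfo_line(line: str) -> Optional[SlabStat]:
    """Parse one data line of /proc/slabinfo; return None for header lines."""
    fields = line.split()
    if len(fields) < 4 or line.startswith(("slabinfo", "#")):
        return None
    return SlabStat(fields[0], int(fields[1]), int(fields[2]), int(fields[3]))


def active_bytes(stat: SlabStat) -> int:
    """Memory held by objects currently in use in this cache."""
    return stat.active_objs * stat.objsize


def watch(cache: str = "drm_sched_fence", interval: float = 1.0,
          samples: int = 10, path: str = "/proc/slabinfo") -> Iterator[SlabStat]:
    """Yield one SlabStat for `cache` per sample, sleeping `interval` seconds."""
    for _ in range(samples):
        with open(path) as f:
            for line in f:
                stat = parse_slabinfo_line(line)
                if stat is not None and stat.name == cache:
                    yield stat
        time.sleep(interval)
```

For the second sample quoted above (16 active objects of 192 bytes),
active_bytes() gives 3072 bytes. Assuming the same 192-byte object size,
the 69095KB of drm_sched_fence in the OOM report would correspond to
roughly 368,000 live objects.
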
>>
>>
>> That is certainly possible. Another possibility is that we don't drop the
>> reference in dma-fence-array early enough.
>>
>> E.g. the dma-fence-array will keep the reference to its fences until it is
>> destroyed, which is a bit late when you chain multiple dma-fence-array objects
>> together.
>>
>> David, can you take a look at this and propose a fix? That would probably be
>> good to have fixed in dma-fence-array separately from the timeline work.
> Note that drm_syncobj_replace_fence() leaks any existing fence for
> !timeline syncobjs. Coupled with the linear search, that ends up as
> a severe regression in both time and memory.
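
To make the leak pattern concrete, here is a toy refcount model in Python.
It is not the kernel code: Fence, SyncObj and both replace methods are
hypothetical, and it assumes the leak has the shape of an old fence pointer
being overwritten without its reference being dropped.

```python
from typing import List, Optional


class Fence:
    """Toy refcounted object; starts with the creator's single reference."""

    def __init__(self, seqno: int) -> None:
        self.seqno = seqno
        self.refcount = 1

    def get(self) -> "Fence":
        self.refcount += 1
        return self

    def put(self) -> None:
        assert self.refcount > 0, "refcount underflow"
        self.refcount -= 1


class SyncObj:
    """Toy container holding one reference to its current fence."""

    def __init__(self) -> None:
        self.fence: Optional[Fence] = None

    def replace_fence_leaky(self, fence: Fence) -> None:
        # Assumed bug shape: the old fence's reference is never dropped.
        self.fence = fence.get()

    def replace_fence(self, fence: Fence) -> None:
        old = self.fence
        self.fence = fence.get()
        if old is not None:
            old.put()  # release the reference the syncobj held


def live_fences(method: str, n: int) -> int:
    """Install n fences in turn and count those still referenced afterwards."""
    obj = SyncObj()
    fences: List[Fence] = []
    for seqno in range(n):
        fence = Fence(seqno)
        getattr(obj, method)(fence)
        fence.put()  # the creator drops its own reference
        fences.append(fence)
    return sum(1 for f in fences if f.refcount > 0)
```

In this model, live_fences("replace_fence_leaky", n) grows linearly with n,
while live_fences("replace_fence", n) stays at one fence (the one currently
installed).
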
Ok, enough is enough. I'm going to revert this.
Thanks for the info,
Christian.
> -Chris