From: Chris Wilson <chris@chris-wilson.co.uk>
To: "dri-devel@lists.freedesktop.org"
	<dri-devel@lists.freedesktop.org>,
	"Christian König" <ckoenig.leichtzumerken@gmail.com>,
	"Eric Anholt" <eric@anholt.net>,
	christian.koenig@amd.com, zhoucm1 <zhoucm1@amd.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 2/2] drm: Revert syncobj timeline changes.
Date: Mon, 12 Nov 2018 10:48:58 +0000
Message-ID: <154201973877.16646.5745251436337959698@skylake-alporthouse-com>
In-Reply-To: <199c35bc-e684-fbc4-dcef-d7105d82f0ff@gmail.com>

Quoting Christian König (2018-11-12 10:16:01)
> On 09.11.18 at 23:26, Eric Anholt wrote:
> 
>     Eric Anholt <eric@anholt.net> writes:
> 
> 
>         zhoucm1 <zhoucm1@amd.com> writes:
> 
> 
>             On 2018-11-09 00:52, Christian König wrote:
> 
>                 On 08.11.18 at 17:07, Koenig, Christian wrote:
> 
>                     On 08.11.18 at 17:04, Eric Anholt wrote:
> 
>                         Daniel suggested I submit this, since we're still seeing regressions
>                         from it.  This is a revert to before 48197bc564c7 ("drm: add syncobj
>                         timeline support v9") and its follow-on fixes.
> 
>                     This is a harmless false positive from lockdep; Chouming and I are
>                     already working on a fix.
> 
>                 On the other hand we had enough trouble with that patch, so if it
>                 really bothers you feel free to add my Acked-by: Christian König
>                 <christian.koenig@amd.com> and push it.
> 
>             NAK, please no. I don't think this is needed; the warning isn't
>             related to the syncobj timeline at all, but to a fence-array
>             implementation flaw that syncobj merely exposed.
>             In addition, Christian already has a fix for this warning, which I've
>             tested. Christian, please send it for public review.
> 
>         I backed out my revert of #2 (#1 still necessary) after adding the
>         lockdep regression fix, and now my CTS run got oomkilled after just a
>         few hours, with these notable lines in the unreclaimable slab info list:
> 
>         [ 6314.373099] drm_sched_fence        69095KB      69095KB
>         [ 6314.373653] kmemleak_object       428249KB     428384KB
>         [ 6314.373736] kmalloc-262144           256KB        256KB
>         [ 6314.373743] kmalloc-131072           128KB        128KB
>         [ 6314.373750] kmalloc-65536             64KB         64KB
>         [ 6314.373756] kmalloc-32768           1472KB       1728KB
>         [ 6314.373763] kmalloc-16384             64KB         64KB
>         [ 6314.373770] kmalloc-8192             208KB        208KB
>         [ 6314.373778] kmalloc-4096            2408KB       2408KB
>         [ 6314.373784] kmalloc-2048             288KB        336KB
>         [ 6314.373792] kmalloc-1024            1457KB       1512KB
>         [ 6314.373800] kmalloc-512              854KB       1048KB
>         [ 6314.373808] kmalloc-256              188KB        268KB
>         [ 6314.373817] kmalloc-192            69141KB      69142KB
>         [ 6314.373824] kmalloc-64             47703KB      47704KB
>         [ 6314.373886] kmalloc-128            46396KB      46396KB
>         [ 6314.373894] kmem_cache                31KB         35KB
> 
>         No results from kmemleak, though.
> 
>     OK, it looks like the #2 revert probably isn't related to the OOM issue.
>     Running a single job on otherwise unused DRM, watching /proc/slabinfo
>     every second for drm_sched_fence, I get:
> 
>     drm_sched_fence        0      0    192   21    1 : tunables   32   16    8 : slabdata      0      0      0 : globalstat       0      0     0    0    0    0    0    0    0 : cpustat      0      0      0      0
>     drm_sched_fence       16     21    192   21    1 : tunables   32   16    8 : slabdata      1      1      0 : globalstat      16     16     1    0    0    0    0    0    0 : cpustat      5      1      6      0
>     drm_sched_fence       13     21    192   21    1 : tunables   32   16    8 : slabdata      1      1      0 : globalstat      16     16     1    0    0    0    0    0    0 : cpustat      5      1      6      0
>     drm_sched_fence        6     21    192   21    1 : tunables   32   16    8 : slabdata      1      1      0 : globalstat      16     16     1    0    0    0    0    0    0 : cpustat      5      1      6      0
>     drm_sched_fence        4     21    192   21    1 : tunables   32   16    8 : slabdata      1      1      0 : globalstat      16     16     1    0    0    0    0    0    0 : cpustat      5      1      6      0
>     drm_sched_fence        2     21    192   21    1 : tunables   32   16    8 : slabdata      1      1      0 : globalstat      16     16     1    0    0    0    0    0    0 : cpustat      5      1      6      0
>     drm_sched_fence        0     21    192   21    1 : tunables   32   16    8 : slabdata      0      1      0 : globalstat      16     16     1    0    0    0    0    0    0 : cpustat      5      1      6      0
> 
>     So we generate a ton of fences, and I guess free them slowly because of
>     RCU?  And presumably kmemleak was sucking up lots of memory because of
>     how many of these objects were lying around.
> 
> 
> That is certainly possible. Another possibility is that we don't drop the
> reference in dma-fence-array early enough.
> 
> E.g. the dma-fence-array will keep the reference to its fences until it is
> destroyed, which is a bit late when you chain multiple dma-fence-array objects
> together.
> 
> David, can you take a look at this and propose a fix? It would probably be
> good to have that fixed in dma-fence-array separately from the timeline work.
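
For anyone wanting to repeat the measurement quoted above (watching
/proc/slabinfo once a second for drm_sched_fence), here is a minimal
userspace sketch. The helper names (SlabLine, parse_slab_line, find_cache,
watch) are made up for illustration; the only thing taken from the thread
is the line layout, where the first four fields are the cache name,
active objects, total objects and object size in bytes. Reading
/proc/slabinfo normally requires root.

```python
import time
from typing import NamedTuple, Optional


class SlabLine(NamedTuple):
    name: str
    active_objs: int
    num_objs: int
    objsize: int  # bytes per object


def parse_slab_line(line: str) -> SlabLine:
    # Only the first four whitespace-separated fields are used here.
    fields = line.split()
    return SlabLine(fields[0], int(fields[1]), int(fields[2]), int(fields[3]))


def find_cache(text: str, cache: str) -> Optional[SlabLine]:
    # Scan a full slabinfo dump for one cache; skip header lines.
    for line in text.splitlines():
        fields = line.split()
        if not fields or line.startswith("#") or line.startswith("slabinfo"):
            continue
        if fields[0] == cache:
            return parse_slab_line(line)
    return None


def watch(cache: str = "drm_sched_fence", samples: int = 5,
          interval: float = 1.0, path: str = "/proc/slabinfo") -> None:
    # Print active/total object counts once per interval.
    for _ in range(samples):
        with open(path) as f:
            entry = find_cache(f.read(), cache)
        if entry is not None:
            print(entry.active_objs, entry.num_objs,
                  entry.active_objs * entry.objsize, "bytes active")
        time.sleep(interval)
```

Calling watch() as root while a job runs should print one sample per
second; a count that never drains back towards zero after the job ends
would point at a leak rather than at delayed (e.g. RCU) freeing.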

Note that drm_syncobj_replace_fence() leaks any existing fence for
!timeline syncobjs, which, coupled with the linear search, ends up as
a severe regression in both time and memory.
-Chris
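
To illustrate the kind of leak described above, here is a toy
reference-counting model. It is not the kernel code and makes no claim
about what drm_syncobj_replace_fence() actually does internally; ToyFence,
ToySyncobj and run are hypothetical names. It only shows why installing a
new fence without dropping the reference held on the old one lets objects
pile up, one per replacement.

```python
class ToyFence:
    live = 0  # number of fences created and not yet freed

    def __init__(self):
        self.refcount = 1  # the creator holds the initial reference
        ToyFence.live += 1

    def get(self):
        self.refcount += 1
        return self

    def put(self):
        self.refcount -= 1
        if self.refcount == 0:
            ToyFence.live -= 1  # "freed"


class ToySyncobj:
    def __init__(self):
        self.fence = None

    def replace_leaky(self, fence):
        # Takes a reference on the new fence but forgets the old one
        # without dropping the reference held on it.
        self.fence = fence.get()

    def replace_balanced(self, fence):
        # Takes a reference on the new fence, then drops the old one.
        old = self.fence
        self.fence = fence.get()
        if old is not None:
            old.put()


def run(method: str, n: int) -> int:
    # Replace the fence n times and report how many fences stay alive.
    ToyFence.live = 0
    obj = ToySyncobj()
    for _ in range(n):
        fence = ToyFence()
        getattr(obj, method)(fence)
        fence.put()  # the caller drops its own reference
    return ToyFence.live
```

In this model run("replace_leaky", 100) leaves 100 fences alive, while
run("replace_balanced", 100) leaves only the one still installed.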
