From: Daniel Vetter <daniel@ffwll.ch>
To: Thomas Hellstrom <thomas@shipmail.org>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>, dri-devel@lists.freedesktop.org
Subject: Re: [PATCH 0/9] make struct drm_mm_node embeddable
Date: Mon, 15 Nov 2010 20:45:12 +0100
Message-ID: <20101115194512.GB3484@viiv.ffwll.ch>
In-Reply-To: <4CE0E815.3000908@shipmail.org>

Hi Thomas,

On Mon, Nov 15, 2010 at 08:58:13AM +0100, Thomas Hellstrom wrote:
> Nice work, although I have some comments about general applicability
> that we perhaps need to think about.
> 
> 1) The space representation and space allocation algorithm is
> something that is private to the aperture management system. For a
> specialized implementation like i915 that is all fine, but Ben has
> recently abstracted that part out of the core TTM bo implementation.
> As an example, vmwgfx is now using kernel idas to manage aperture
> space, and drm_mm objects for traditional VRAM space. Hence,
> embedding drm_mm objects into ttm bos will not really be worthwhile.
> At least not for aperture space management, and TTM will need to
> continue to "dance", both in the ida case and in the drm_mm case.

Yep, I've looked into this and noticed the recent addition of the ida
support. This is why I've added the "decent surgery" comment. Embedding
the drm_mm_node still looks possible, albeit perhaps not feasible (at
least I won't tackle this in the immediate future).

> For device address space, the situation is different, though, and it
> should be possible to embed the drm_mm objects, but that brings up
> the next thing:
> 
> 2) The algorithm used by drm_mm has been around for a while and has
> seen a fair amount of changes, but nobody has yet attacked the
> algorithm used to search for free space, which was just quickly put
> together as an improvement on what was the old mesa range manager.
> In moderate fragmentation situations, the performance will degrade,
> particularly with "best match" searches. In the near future we'd
> probably want to add something like a "hole rb tree" rather than a
> "hole list", and a choice of algorithm for the user. With embeddable
> objects, unless you want to waste space for unused members, you'd
> need a separate drm_mm node subclass for each algorithm, whereas if
> you don't embed, you only need to allocate what you need.

First a small rant about "best match" (to get it out of the way;-)
- "best match" is simply a misleading name: with alignment > size
  (at least on older hw) and mixes of unrestricted and range restricted
  allocations (ironlake has 2G of gtt, just 256M of it mappable), which is
  all possible with the latest experimental i915 patches, "best match" can
  do worse than the simpler approach.
- doing a full linear scan for every tiny state buffer/pixmap cache is
  slow.
On top of this, it serves as an excuse to not implement proper eviction
support.
</rant>
[If you agree, I'll happily write the patch to rip it out. It just doesn't
bother me 'cause it's only a few lines in drm_mm.c and I can ignore the
actual users.]
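
To make the scan-cost point concrete, here is a minimal sketch of the two search strategies over a plain array of holes. It is not the drm_mm.c code: struct hole, hole_fits(), search_first() and search_best() are hypothetical names made up for illustration, and the visited counter exists only to show how many holes each strategy touches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical free-range record, not the real struct drm_mm_node. */
struct hole {
	uint64_t start;
	uint64_t size;
};

/* Does a request of 'size' at 'align' fit into h? align == 0 means none. */
static int hole_fits(const struct hole *h, uint64_t size, uint64_t align)
{
	uint64_t waste = 0;

	if (align && h->start % align)
		waste = align - h->start % align;
	return h->size >= size + waste; /* sketch: ignores u64 overflow */
}

/* Simple search: stop at the first hole that fits. */
static ptrdiff_t search_first(const struct hole *holes, size_t n,
			      uint64_t size, uint64_t align, size_t *visited)
{
	assert(visited != NULL);
	for (size_t i = 0; i < n; i++) {
		(*visited)++;
		if (hole_fits(&holes[i], size, align))
			return (ptrdiff_t)i;
	}
	return -1;
}

/* "Best match": always walks every hole to find the smallest that fits. */
static ptrdiff_t search_best(const struct hole *holes, size_t n,
			     uint64_t size, uint64_t align, size_t *visited)
{
	ptrdiff_t best = -1;

	assert(visited != NULL);
	for (size_t i = 0; i < n; i++) {
		(*visited)++;
		if (!hole_fits(&holes[i], size, align))
			continue;
		if (best < 0 || holes[i].size < holes[best].size)
			best = (ptrdiff_t)i;
	}
	return best;
}
```

With n holes the best-match variant always costs n visits, even for a tiny request that the very first hole could have satisfied.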

Now to the more useful discussion: IMHO drm_mm.c should be an allocator
for vram/(g)tt, i.e. it needs to support:
- a mix of large/small sizes.
- fancy alignment constraints (new patches for drm/i915 are pushing things
  there).
- range-restricted allocations. I think current users only ever have one
  (start, end) set for restricted allocations, so this might actually be
  simplified.
If other users don't fit into this anymore, mea culpa, they need their
own allocator. You've already taken this path for vmwgfx by using the ida
allocator. And if the linear scan for the gem mmap offset allocator ever
shows up in profiles, I think it's better served with a buddy-style
pot-sized, pot-aligned (power-of-two) allocator. After all, fragmentation
of virtual address space isn't that severe a problem.
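
For reference, a buddy-style allocator of that kind can be sketched roughly as below. This is only an illustration under stated assumptions, not a proposal for the gem mmap offset code: it manages a fixed range of BUDDY_SLOTS slots through an implicit binary tree, and struct buddy, buddy_init(), buddy_alloc() and buddy_free() are all hypothetical names. Every request is rounded up to a power of two and the returned offset is aligned to that rounded size.

```c
#include <assert.h>

#define BUDDY_SLOTS 16u /* slots managed; must be a power of two */

struct buddy {
	/* longest[i]: largest free pot block (in slots) under tree node i */
	unsigned longest[2 * BUDDY_SLOTS - 1];
};

static int is_pot(unsigned x) { return x && !(x & (x - 1)); }
static unsigned max_u(unsigned a, unsigned b) { return a > b ? a : b; }

static unsigned round_up_pot(unsigned x)
{
	unsigned p = 1;

	while (p < x)
		p <<= 1;
	return p;
}

static void buddy_init(struct buddy *b)
{
	unsigned node_size = 2 * BUDDY_SLOTS;

	for (unsigned i = 0; i < 2 * BUDDY_SLOTS - 1; i++) {
		if (is_pot(i + 1)) /* first node of a new tree level */
			node_size /= 2;
		b->longest[i] = node_size;
	}
}

/* Returns a slot offset aligned to the rounded-up size, or -1. */
static int buddy_alloc(struct buddy *b, unsigned size)
{
	unsigned index = 0, node_size;
	int offset;

	if (size == 0 || size > BUDDY_SLOTS)
		return -1;
	size = round_up_pot(size);
	if (b->longest[0] < size)
		return -1;
	/* Descend towards a node of exactly 'size', preferring the left child. */
	for (node_size = BUDDY_SLOTS; node_size != size; node_size /= 2) {
		unsigned left = 2 * index + 1;

		index = b->longest[left] >= size ? left : left + 1;
	}
	b->longest[index] = 0; /* mark the block as allocated */
	offset = (int)((index + 1) * node_size - BUDDY_SLOTS);
	while (index) { /* refresh the summaries on the way back up */
		index = (index - 1) / 2;
		b->longest[index] = max_u(b->longest[2 * index + 1],
					  b->longest[2 * index + 2]);
	}
	return offset;
}

static void buddy_free(struct buddy *b, unsigned offset)
{
	unsigned node_size = 1, index;

	assert(offset < BUDDY_SLOTS);
	index = offset + BUDDY_SLOTS - 1;
	/* Walk up from the leaf to the node that was marked allocated. */
	while (b->longest[index] != 0) {
		if (index == 0)
			return; /* nothing allocated at this offset */
		index = (index - 1) / 2;
		node_size *= 2;
	}
	b->longest[index] = node_size;
	while (index) { /* merge with the buddy where both halves are free */
		unsigned left, right;

		index = (index - 1) / 2;
		node_size *= 2;
		left = b->longest[2 * index + 1];
		right = b->longest[2 * index + 2];
		b->longest[index] = (left + right == node_size) ?
				    node_size : max_u(left, right);
	}
}
```

Both alloc and free walk a single root-to-leaf path, so the cost is logarithmic in the number of slots; the price is internal waste from rounding every size up to a power of two.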

Hence I think that drivers with extremely specific needs should roll their
own allocator. So I don't think we should anticipate different allocator
algorithms. I see driver-specific stuff more in the area of clever
eviction algorithms - i915 is currently at 5 lru's for gtt mapped bos, and
we're still adding.

Of course I've spent a bunch of brain-cycles on creating a more efficient
allocator - O(n) just doesn't look that good. Now
- it should be fast in the common case
- and not degenerate into O(n) for ugly corner cases.
Which leaves us for the above allocation requirements of (u64 size, u32
alignment, bool range_restricted) with two 2d-range-trees. Now factoring
in that lru-scanning is also O(n_{gtt_mapped}) gives us a data-structure
I'm not really eager to create.

The current code seems to fare rather well because the hole_stack fifo is
good at avoiding the linear scan worst-case. And as soon as we start to
thrash the gtt, everything is totally snowed under by clflush overhead on
i915 anyway.

To make a long story short, I've opted to make the current code faster by
avoiding kmalloc and spoiling fewer cache-lines with useless data. And if
the linear scan ever shows up in profiles, we could always add some stats
to bail out early for large allocations. Or add a tree to heuristically
find a suitable hole (assuming worst-case waste due to alignment).
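
The "avoid kmalloc" part boils down to the usual embedding pattern, sketched below in plain userspace C. All names here (struct mm_node, struct gem_object, mm_insert_node(), node_to_object()) are hypothetical stand-ins rather than the actual API from the patches: the node lives inside the driver object, so no separate allocation is needed, and the object is recovered from the node with container_of.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an embeddable range node. */
struct mm_node {
	uint64_t start;
	uint64_t size;
	int allocated;
};

/* Hypothetical driver object with the node embedded by value. */
struct gem_object {
	const char *name;
	struct mm_node gtt_space; /* no pointer, no separate kmalloc */
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Fill in a caller-provided node instead of allocating a new one. */
static void mm_insert_node(struct mm_node *node, uint64_t start, uint64_t size)
{
	assert(!node->allocated);
	node->start = start;
	node->size = size;
	node->allocated = 1;
}

/* Get back from the node to the object that embeds it. */
static struct gem_object *node_to_object(struct mm_node *node)
{
	return container_of(node, struct gem_object, gtt_space);
}
```

Besides saving the allocation, the node data then shares cache-lines with the object it describes instead of sitting in a separately allocated chunk.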

Thanks a lot for your input on this.

Yours, Daniel
-- 
Daniel Vetter
Mail: daniel@ffwll.ch
Mobile: +41 (0)79 365 57 48
