Re: TTM placement & caching issue/questions

From: Daniel Vetter <daniel@ffwll.ch>
To: Thomas Hellstrom <thellstrom@vmware.com>
Cc: dri-devel@lists.freedesktop.org,
	Michel Danzer <daenzer@vmware.com>,
	linuxppc-dev@ozlabs.org
Subject: Re: TTM placement & caching issue/questions
Date: Thu, 4 Sep 2014 11:34:54 +0200
Message-ID: <20140904093454.GG15520@phenom.ffwll.local>
In-Reply-To: <54081844.7000604@vmware.com>

On Thu, Sep 04, 2014 at 09:44:04AM +0200, Thomas Hellstrom wrote:
> Last time I tested, (and it seems like Michel is on the same track),
> writing with the CPU to write-combined memory was substantially faster
> than writing to cached memory, with the additional side-effect that CPU
> caches are left unpolluted.
> 
> Moreover (although only tested on Intel's embedded chipsets), texturing
> from cpu-cache-coherent PCI memory was a real GPU performance hog
> compared to texturing from non-snooped memory. Hence, whenever a buffer
> could be classified as GPU-read-only (or almost at least), it should be
> placed in write-combined memory.

Just a quick comment since this explicitly refers to Intel chips: On
desktop/laptop chips with the big shared L3/L4 caches it's the other way
round. Cached uploads are substantially faster than WC, and not using
coherent access is a severe perf hit for texturing. I guess the hw guys
worked really hard to hide the snooping costs so that the GPU can benefit
from the massive bandwidth these caches can provide.
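
To make the contrast concrete, here is a minimal sketch of the rule of
thumb, not actual TTM or i915 code. The enum, the struct, and the
gpu_shares_llc flag are all made-up names for illustration; a real driver
would key this off its own platform data.

```c
#include <stdbool.h>

/* Hypothetical caching modes; these are NOT TTM's or i915's real flags. */
enum buf_caching {
	BUF_CACHED,		/* CPU-cached, GPU access is snooped/coherent */
	BUF_WRITE_COMBINED,	/* uncached write-combined CPU mapping */
};

/* Hypothetical platform description, assumed for this sketch only. */
struct platform_info {
	bool gpu_shares_llc;	/* GPU sits behind the big shared L3/L4 */
};

/*
 * Pick a CPU mapping for a buffer that the CPU fills and the GPU then
 * (almost) only reads, e.g. a texture upload.
 */
enum buf_caching pick_upload_caching(const struct platform_info *p)
{
	/* Shared-LLC parts: cached uploads win, coherent texturing is cheap. */
	if (p->gpu_shares_llc)
		return BUF_CACHED;

	/* Otherwise (e.g. the embedded chipsets above): snooping hurts, use WC. */
	return BUF_WRITE_COMBINED;
}
```

The point is only that the choice has to be per-platform rather than a
global "GPU-read-only means WC" policy.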
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch