From: Mika Kuoppala
To: Matthew Auld, intel-gfx@lists.freedesktop.org
Cc: Daniel Vetter, dri-devel@lists.freedesktop.org
Subject: Re: [Intel-gfx] [PATCH 1/5] drm/i915: document caching related bits
Date: Tue, 13 Jul 2021 14:41:30 +0300
Message-ID: <87wnputp91.fsf@gaia.fi.intel.com>
In-Reply-To: <20210713104554.2381406-1-matthew.auld@intel.com>
References: <20210713104554.2381406-1-matthew.auld@intel.com>

Matthew Auld writes:

> Try to document the object caching related bits, like cache_coherent and
> cache_dirty.
>
> Suggested-by: Daniel Vetter
> Signed-off-by: Matthew Auld
> ---
>  .../gpu/drm/i915/gem/i915_gem_object_types.h  | 135 +++++++++++++++++-
>  drivers/gpu/drm/i915/i915_drv.h               |   9 --
>  2 files changed, 131 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> index ef3de2ae9723..02c3529b774c 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> @@ -92,6 +92,57 @@ struct drm_i915_gem_object_ops {
>  	const char *name; /* friendly name for debug, e.g. lockdep classes */
>  };
>
> +/**
> + * enum i915_cache_level - The supported GTT caching values for system memory
> + * pages.
> + *
> + * These translate to some special GTT PTE bits when binding pages into some
> + * address space. It also determines whether an object, or rather its pages are
> + * coherent with the GPU, when also reading or writing through the CPU cache
> + * with those pages.
> + *
> + * Userspace can also control this through struct drm_i915_gem_caching.
> + */
> +enum i915_cache_level {
> +	/**
> +	 * @I915_CACHE_NONE:
> +	 *
> +	 * Not coherent with the CPU cache. If the cache is dirty and we need
> +	 * the underlying pages to be coherent with some later GPU access then
> +	 * we need to manually flush the pages.
> +	 *
> +	 * Note that on shared-LLC platforms reads through the CPU cache are
> +	 * still coherent even with this setting. See also
> +	 * I915_BO_CACHE_COHERENT_FOR_READ for more details.
> +	 */
> +	I915_CACHE_NONE = 0,
> +	/**
> +	 * @I915_CACHE_LLC:
> +	 *
> +	 * Coherent with the CPU cache. If the cache is dirty, then the GPU will
> +	 * ensure that access remains coherent, when both reading and writing
> +	 * through the CPU cache.
> +	 *
> +	 * Applies to both platforms with shared-LLC(HAS_LLC), and snooping
> +	 * based platforms(HAS_SNOOP).
> +	 */
> +	I915_CACHE_LLC,
> +	/**
> +	 * @I915_CACHE_L3_LLC:
> +	 *
> +	 * gen7+, L3 sits between the domain specifc caches, eg sampler/render

typo: specifc

> +	 * caches, and the large Last-Level-Cache. LLC is coherent with the CPU,
> +	 * but L3 is only visible to the GPU.
> +	 */

I don't get the difference between this and I915_CACHE_LLC.
Could the difference between LLC and L3_LLC be described here with an
example?

Thanks,
-Mika

> +	I915_CACHE_L3_LLC,
> +	/**
> +	 * @I915_CACHE_WT:
> +	 *
> +	 * hsw:gt3e Write-through for scanout buffers.
> +	 */
> +	I915_CACHE_WT,
> +};
> +
>  enum i915_map_type {
>  	I915_MAP_WB = 0,
>  	I915_MAP_WC,
> @@ -228,14 +279,90 @@ struct drm_i915_gem_object {
>  	unsigned int mem_flags;
>  #define I915_BO_FLAG_STRUCT_PAGE BIT(0) /* Object backed by struct pages */
>  #define I915_BO_FLAG_IOMEM BIT(1) /* Object backed by IO memory */
> -	/*
> -	 * Is the object to be mapped as read-only to the GPU
> -	 * Only honoured if hardware has relevant pte bit
> +	/**
> +	 * @cache_level: The desired GTT caching level.
> +	 *
> +	 * See enum i915_cache_level for possible values, along with what
> +	 * each does.
>  	 */
>  	unsigned int cache_level:3;
> -	unsigned int cache_coherent:2;
> +	/**
> +	 * @cache_coherent:
> +	 *
> +	 * Track whether the pages are coherent with the GPU if reading or
> +	 * writing through the CPU cache.
> +	 *
> +	 * This largely depends on the @cache_level, for example if the object
> +	 * is marked as I915_CACHE_LLC, then GPU access is coherent for both
> +	 * reads and writes through the CPU cache.
> +	 *
> +	 * Note that on platforms with shared-LLC support(HAS_LLC) reads through
> +	 * the CPU cache are always coherent, regardless of the @cache_level. On
> +	 * snooping based platforms this is not the case, unless the full
> +	 * I915_CACHE_LLC or similar setting is used.
> +	 *
> +	 * As a result of this we need to track coherency separately for reads
> +	 * and writes, in order to avoid superfluous flushing on shared-LLC
> +	 * platforms, for reads.
> +	 *
> +	 * I915_BO_CACHE_COHERENT_FOR_READ:
> +	 *
> +	 * When reading through the CPU cache, the GPU is still coherent. Note
> +	 * that no data has actually been modified here, so it might seem
> +	 * strange that we care about this.
> +	 *
> +	 * As an example, if some object is mapped on the CPU with write-back
> +	 * caching, and we read some page, then the cache likely now contains
> +	 * the data from that read. At this point the cache and main memory
> +	 * match up, so all good. But next the GPU needs to write some data to
> +	 * that same page. Now if the @cache_level is I915_CACHE_NONE and the
> +	 * the platform doesn't have the shared-LLC, then the GPU will
> +	 * effectively skip invalidating the cache(or however that works
> +	 * internally) when writing the new value. This is really bad since the
> +	 * GPU has just written some new data to main memory, but the CPU cache
> +	 * is still valid and now contains stale data. As a result the next time
> +	 * we do a cached read with the CPU, we are rewarded with stale data.
> +	 * Likewise if the cache is later flushed, we might be rewarded with
> +	 * overwriting main memory with stale data.
> +	 *
> +	 * I915_BO_CACHE_COHERENT_FOR_WRITE:
> +	 *
> +	 * When writing through the CPU cache, the GPU is still coherent. Note
> +	 * that this also implies I915_BO_CACHE_COHERENT_FOR_READ.
> +	 *
> +	 * This is never set when I915_CACHE_NONE is used for @cache_level,
> +	 * where instead we have to manually flush the caches after writing
> +	 * through the CPU cache. For other cache levels this should be set and
> +	 * the object is therefore considered coherent for both reads and writes
> +	 * through the CPU cache.
> +	 */
>  #define I915_BO_CACHE_COHERENT_FOR_READ BIT(0)
>  #define I915_BO_CACHE_COHERENT_FOR_WRITE BIT(1)
> +	unsigned int cache_coherent:2;
> +	/**
> +	 * @cache_dirty:
> +	 *
> +	 * Track if the cache might be dirty for the @pages i.e it has yet to be
> +	 * written back to main memory. As a result reading directly from main
> +	 * memory might yield stale data.
> +	 *
> +	 * This also ties into whether the kernel is tracking the object as
> +	 * coherent with the GPU, as per @cache_coherent, as it determines if
> +	 * flushing might be needed at various points.
> +	 *
> +	 * Another part of @cache_dirty is managing flushing when first
> +	 * acquiring the pages for system memory, at this point the pages are
> +	 * considered foreign, so the default assumption is that the cache is
> +	 * dirty, for example the page zeroing done my the kernel might leave
> +	 * writes though the CPU cache, or swapping-in, while the actual data in
> +	 * main memory is potentially stale. Note that this is a potential
> +	 * security issue when dealing with userspace objects and zeroing. Now,
> +	 * whether we actually need apply the big sledgehammer of flushing all
> +	 * the pages on acquire depends on if @cache_coherent is marked as
> +	 * I915_BO_CACHE_COHERENT_FOR_WRITE, i.e that the GPU will be coherent
> +	 * for both reads and writes though the CPU cache. So pretty much this
> +	 * should only be needed for I915_CACHE_NONE objects.
> +	 */
>  	unsigned int cache_dirty:1;
>
>  	/**
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index c4747f4407ef..37bb1a3cadd4 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -394,15 +394,6 @@ struct drm_i915_display_funcs {
>  	void (*read_luts)(struct intel_crtc_state *crtc_state);
>  };
>
> -enum i915_cache_level {
> -	I915_CACHE_NONE = 0,
> -	I915_CACHE_LLC, /* also used for snoopable memory on non-LLC */
> -	I915_CACHE_L3_LLC, /* gen7+, L3 sits between the domain specifc
> -			      caches, eg sampler/render caches, and the
> -			      large Last-Level-Cache. LLC is coherent with
> -			      the CPU, but L3 is only visible to the GPU. */
> -	I915_CACHE_WT, /* hsw:gt3e WriteThrough for scanouts */
> -};
>
>  #define I915_COLOR_UNEVICTABLE (-1) /* a non-vma sharing the address space */
>
> --
> 2.26.3
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
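
For reference, the relationship that the quoted kernel-doc describes between @cache_level, shared-LLC support and the two coherency bits can be sketched in standalone C. This is only an illustration under stated assumptions: sketch_cache_coherent() and sketch_flush_on_acquire() are hypothetical names, not driver functions, the has_llc parameter stands in for the HAS_LLC platform check, and the enum and bit values are restated from the patch so the snippet compiles outside the kernel.

```c
#include <assert.h>
#include <stdbool.h>

/* Restated from the quoted patch so this sketch builds on its own. */
enum i915_cache_level {
	I915_CACHE_NONE = 0,
	I915_CACHE_LLC,
	I915_CACHE_L3_LLC,
	I915_CACHE_WT,
};

/* The patch defines these as BIT(0) and BIT(1). */
#define I915_BO_CACHE_COHERENT_FOR_READ  (1u << 0)
#define I915_BO_CACHE_COHERENT_FOR_WRITE (1u << 1)

/*
 * Hypothetical helper (assumption, not the driver's code): derive the
 * cache_coherent bits from the cache level, following the wording of
 * the quoted documentation.
 */
static inline unsigned int
sketch_cache_coherent(enum i915_cache_level level, bool has_llc)
{
	assert((unsigned int)level <= I915_CACHE_WT);

	/* Any level other than NONE is described as coherent for both
	 * reads and writes through the CPU cache. */
	if (level != I915_CACHE_NONE)
		return I915_BO_CACHE_COHERENT_FOR_READ |
		       I915_BO_CACHE_COHERENT_FOR_WRITE;

	/* With I915_CACHE_NONE, only shared-LLC platforms keep reads
	 * through the CPU cache coherent. */
	return has_llc ? I915_BO_CACHE_COHERENT_FOR_READ : 0;
}

/*
 * Hypothetical helper: per the @cache_dirty text, flushing on first
 * acquiring the pages is only needed when the object is not coherent
 * for writes.
 */
static inline bool sketch_flush_on_acquire(unsigned int cache_coherent)
{
	return !(cache_coherent & I915_BO_CACHE_COHERENT_FOR_WRITE);
}
```

Under these rules an I915_CACHE_NONE object on a shared-LLC platform gets only the read bit, so it would still want a flush after CPU writes and when its pages are first acquired, while every other cache level gets both bits and skips that flush.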