All of lore.kernel.org
 help / color / mirror / Atom feed
From: Matthew Auld <matthew.auld@intel.com>
To: intel-gfx@lists.freedesktop.org
Cc: "Lionel Landwerlin" <lionel.g.landwerlin@linux.intel.com>,
	"Thomas Hellström" <thomas.hellstrom@linux.intel.com>,
	"Dave Airlie" <airlied@redhat.com>,
	"Jordan Justen" <jordan.l.justen@intel.com>,
	dri-devel@lists.freedesktop.org,
	"Daniel Vetter" <daniel.vetter@ffwll.ch>,
	"Kenneth Graunke" <kenneth@whitecape.org>,
	"Daniele Ceraolo Spurio" <daniele.ceraolospurio@intel.com>,
	"Jon Bloomfield" <jon.bloomfield@intel.com>,
	"Jason Ekstrand" <jason@jlekstrand.net>,
	mesa-dev@lists.freedesktop.org,
	"Daniel Vetter" <daniel.vetter@intel.com>
Subject: [PATCH 1/9] drm/doc/rfc: i915 DG1 uAPI
Date: Mon, 26 Apr 2021 10:38:53 +0100	[thread overview]
Message-ID: <20210426093901.28937-1-matthew.auld@intel.com> (raw)

Add an entry for the new uAPI needed for DG1. Also add the overall
upstream plan, including some notes for the TTM conversion.

v2(Daniel):
  - include the overall upstreaming plan
  - add a note for mmap, there are differences here for TTM vs i915
  - bunch of other suggestions from Daniel
v3:
 (Daniel)
  - add a note for set/get caching stuff
  - add some more docs for existing query and extensions stuff
  - add an actual code example for regions query
  - bunch of other stuff
 (Jason)
  - uAPI change(!):
	- try a simpler design with the placements extension
	- rather than have a generic setparam which can cover multiple
	  use cases, have each extension be responsible for one thing
	  only
v4:
 (Daniel)
  - add some more notes for ttm conversion
  - bunch of other stuff
 (Jason)
  - uAPI change(!):
	- drop all the extra rsvd members for the region_query and
	  region_info, just keep the bare minimum needed for padding

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@linux.intel.com>
Cc: Jon Bloomfield <jon.bloomfield@intel.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Cc: Dave Airlie <airlied@gmail.com>
Cc: dri-devel@lists.freedesktop.org
Cc: mesa-dev@lists.freedesktop.org
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Dave Airlie <airlied@redhat.com>
---
 Documentation/gpu/rfc/i915_gem_lmem.h   | 212 ++++++++++++++++++++++++
 Documentation/gpu/rfc/i915_gem_lmem.rst | 130 +++++++++++++++
 Documentation/gpu/rfc/index.rst         |   4 +
 3 files changed, 346 insertions(+)
 create mode 100644 Documentation/gpu/rfc/i915_gem_lmem.h
 create mode 100644 Documentation/gpu/rfc/i915_gem_lmem.rst

diff --git a/Documentation/gpu/rfc/i915_gem_lmem.h b/Documentation/gpu/rfc/i915_gem_lmem.h
new file mode 100644
index 000000000000..7ed59b6202d5
--- /dev/null
+++ b/Documentation/gpu/rfc/i915_gem_lmem.h
@@ -0,0 +1,212 @@
+/**
+ * enum drm_i915_gem_memory_class - Supported memory classes
+ */
+enum drm_i915_gem_memory_class {
+	/** @I915_MEMORY_CLASS_SYSTEM: System memory */
+	I915_MEMORY_CLASS_SYSTEM = 0,
+	/** @I915_MEMORY_CLASS_DEVICE: Device local-memory */
+	I915_MEMORY_CLASS_DEVICE,
+};
+
+/**
+ * struct drm_i915_gem_memory_class_instance - Identify particular memory region
+ */
+struct drm_i915_gem_memory_class_instance {
+	/** @memory_class: See enum drm_i915_gem_memory_class */
+	__u16 memory_class;
+
+	/** @memory_instance: Which instance */
+	__u16 memory_instance;
+};
+
+/**
+ * struct drm_i915_memory_region_info - Describes one region as known to the
+ * driver.
+ *
+ * Note that we reserve some stuff here for potential future work. As an example
+ * we might want expose the capabilities(see @caps) for a given region, which
+ * could include things like if the region is CPU mappable/accessible, what are
+ * the supported mapping types etc.
+ *
+ * Note this is using both struct drm_i915_query_item and struct drm_i915_query.
+ * For this new query we are adding the new query id DRM_I915_QUERY_MEMORY_REGIONS
+ * at &drm_i915_query_item.query_id.
+ */
+struct drm_i915_memory_region_info {
+	/** @region: The class:instance pair encoding */
+	struct drm_i915_gem_memory_class_instance region;
+
+	/** @pad: MBZ */
+	__u32 pad;
+
+	/** @caps: MBZ */
+	__u64 caps;
+
+	/** @probed_size: Memory probed by the driver (-1 = unknown) */
+	__u64 probed_size;
+
+	/** @unallocated_size: Estimate of memory remaining (-1 = unknown) */
+	__u64 unallocated_size;
+};
+
+/**
+ * struct drm_i915_query_memory_regions
+ *
+ * The region info query enumerates all regions known to the driver by filling
+ * in an array of struct drm_i915_memory_region_info structures.
+ *
+ * Example for getting the list of supported regions:
+ *
+ * .. code-block:: C
+ *
+ *	struct drm_i915_query_memory_regions *info;
+ *	struct drm_i915_query_item item = {
+ *		.query_id = DRM_I915_QUERY_MEMORY_REGIONS;
+ *	};
+ *	struct drm_i915_query query = {
+ *		.num_items = 1,
+ *		.items_ptr = (uintptr_t)&item,
+ *	};
+ *	int err, i;
+ *
+ *	// First query the size of the blob we need, this needs to be large
+ *	// enough to hold our array of regions. The kernel will fill out the
+ *	// item.length for us, which is the number of bytes we need.
+ *	err = ioctl(fd, DRM_IOCTL_I915_QUERY, &query);
+ *	if (err) ...
+ *
+ *	info = calloc(1, item.length);
+ *	// Now that we allocated the required number of bytes, we call the ioctl
+ *	// again, this time with the data_ptr pointing to our newly allocated
+ *	// blob, which the kernel can then populate with the all the region info.
+ *	item.data_ptr = (uintptr_t)&info,
+ *
+ *	err = ioctl(fd, DRM_IOCTL_I915_QUERY, &query);
+ *	if (err) ...
+ *
+ *	// We can now access each region in the array
+ *	for (i = 0; i < info->num_regions; i++) {
+ *		struct drm_i915_memory_region_info mr = info->regions[i];
+ *		u16 class = mr.region.class;
+ *		u16 instance = mr.region.instance;
+ *
+ *		....
+ *	}
+ *
+ *	free(info);
+ */
+struct drm_i915_query_memory_regions {
+	/** @num_regions: Number of supported regions */
+	__u32 num_regions;
+
+	/** @pad: MBZ */
+	__u32 pad;
+
+	/** @regions: Info about each supported region */
+	struct drm_i915_memory_region_info regions[];
+};
+
+#define DRM_I915_GEM_CREATE_EXT		0xdeadbeaf
+#define DRM_IOCTL_I915_GEM_CREATE_EXT	DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_CREATE_EXT, struct drm_i915_gem_create_ext)
+
+/**
+ * struct drm_i915_gem_create_ext - Existing gem_create behaviour, with added
+ * extension support using struct i915_user_extension.
+ *
+ * Note that in the future we want to have our buffer flags here, at least for
+ * the stuff that is immutable. Previously we would have two ioctls, one to
+ * create the object with gem_create, and another to apply various parameters,
+ * however this creates some ambiguity for the params which are considered
+ * immutable. Also in general we're phasing out the various SET/GET ioctls.
+ */
+struct drm_i915_gem_create_ext {
+	/**
+	 * @size: Requested size for the object.
+	 *
+	 * The (page-aligned) allocated size for the object will be returned.
+	 *
+	 * Note that for some devices we have might have further minimum
+	 * page-size restrictions(larger than 4K), like for device local-memory.
+	 * However in general the final size here should always reflect any
+	 * rounding up, if for example using the I915_GEM_CREATE_EXT_MEMORY_REGIONS
+	 * extension to place the object in device local-memory.
+	 */
+	__u64 size;
+	/**
+	 * @handle: Returned handle for the object.
+	 *
+	 * Object handles are nonzero.
+	 */
+	__u32 handle;
+	/** @flags: MBZ */
+	__u32 flags;
+	/**
+	 * @extensions: The chain of extensions to apply to this object.
+	 *
+	 * This will be useful in the future when we need to support several
+	 * different extensions, and we need to apply more than one when
+	 * creating the object. See struct i915_user_extension.
+	 *
+	 * If we don't supply any extensions then we get the same old gem_create
+	 * behaviour.
+	 *
+	 * For I915_GEM_CREATE_EXT_MEMORY_REGIONS usage see
+	 * struct drm_i915_gem_create_ext_memory_regions.
+	 */
+#define I915_GEM_CREATE_EXT_MEMORY_REGIONS 0
+	__u64 extensions;
+};
+
+/**
+ * struct drm_i915_gem_create_ext_memory_regions - The
+ * I915_GEM_CREATE_EXT_MEMORY_REGIONS extension.
+ *
+ * Set the object with the desired set of placements/regions in priority
+ * order. Each entry must be unique and supported by the device.
+ *
+ * This is provided as an array of struct drm_i915_gem_memory_class_instance, or
+ * an equivalent layout of class:instance pair encodings. See struct
+ * drm_i915_query_memory_regions and DRM_I915_QUERY_MEMORY_REGIONS for how to
+ * query the supported regions for a device.
+ *
+ * As an example, on discrete devices, if we wish to set the placement as
+ * device local-memory we can do something like:
+ *
+ * .. code-block:: C
+ *
+ *	struct drm_i915_gem_memory_class_instance region_lmem = {
+ *              .memory_class = I915_MEMORY_CLASS_DEVICE,
+ *              .memory_instance = 0,
+ *      };
+ *      struct drm_i915_gem_create_ext_memory_regions regions = {
+ *              .base = { .name = I915_GEM_CREATE_EXT_MEMORY_REGIONS },
+ *              .regions = (uintptr_t)&region_lmem,
+ *              .num_regions = 1,
+ *      };
+ *      struct drm_i915_gem_create_ext create_ext = {
+ *              .size = 16 * PAGE_SIZE,
+ *              .extensions = (uintptr_t)&regions,
+ *      };
+ *
+ *      int err = ioctl(fd, DRM_IOCTL_I915_GEM_CREATE_EXT, &create_ext);
+ *      if (err) ...
+ *
+ * At which point we get the object handle in &drm_i915_gem_create_ext.handle,
+ * along with the final object size in &drm_i915_gem_create_ext.size, which
+ * should account for any rounding up, if required.
+ */
+struct drm_i915_gem_create_ext_memory_regions {
+	/** @base: Extension link. See struct i915_user_extension. */
+	struct i915_user_extension base;
+
+	/** @pad: MBZ */
+	__u32 pad;
+	/** @num_regions: Number of elements in the @regions array. */
+	__u32 num_regions;
+	/**
+	 * @regions: The regions/placements array.
+	 *
+	 * An array of struct drm_i915_gem_memory_class_instance.
+	 */
+	__u64 regions;
+};
diff --git a/Documentation/gpu/rfc/i915_gem_lmem.rst b/Documentation/gpu/rfc/i915_gem_lmem.rst
new file mode 100644
index 000000000000..462f1efd9003
--- /dev/null
+++ b/Documentation/gpu/rfc/i915_gem_lmem.rst
@@ -0,0 +1,130 @@
+=========================
+I915 DG1/LMEM RFC Section
+=========================
+
+Upstream plan
+=============
+For upstream the overall plan for landing all the DG1 stuff and turning it for
+real, with all the uAPI bits is:
+
+* Merge basic HW enabling of DG1(still without pciid)
+* Merge the uAPI bits behind special CONFIG_BROKEN(or so) flag
+        * At this point we can still make changes, but importantly this lets us
+          start running IGTs which can utilize local-memory in CI
+* Convert over to TTM, make sure it all keeps working. Some of the work items:
+        * TTM shrinker for discrete
+        * dma_resv_lockitem for full dma_resv_lock, i.e not just trylock
+        * Use TTM CPU pagefault handler
+        * Route shmem backend over to TTM SYSTEM for discrete
+        * TTM purgeable object support
+        * Move i915 buddy allocator over to TTM
+        * MMAP ioctl mode(see `I915 MMAP`_)
+        * SET/GET ioctl caching(see `I915 SET/GET CACHING`_)
+* Add pciid for DG1 and turn on uAPI for real
+
+New object placement and region query uAPI
+==========================================
+Starting from DG1 we need to give userspace the ability to allocate buffers from
+device local-memory. Currently the driver supports gem_create, which can place
+buffers in system memory via shmem, and the usual assortment of other
+interfaces, like dumb buffers and userptr.
+
+To support this new capability, while also providing a uAPI which will work
+beyond just DG1, we propose to offer three new bits of uAPI:
+
+DRM_I915_QUERY_MEMORY_REGIONS
+-----------------------------
+New query ID which allows userspace to discover the list of supported memory
+regions(like system-memory and local-memory) for a given device. We identify
+each region with a class and instance pair, which should be unique. The class
+here would be DEVICE or SYSTEM, and the instance would be zero, on platforms
+like DG1.
+
+Side note: The class/instance design is borrowed from our existing engine uAPI,
+where we describe every physical engine in terms of its class, and the
+particular instance, since we can have more than one per class.
+
+In the future we also want to expose more information which can further
+describe the capabilities of a region.
+
+.. kernel-doc:: Documentation/gpu/rfc/i915_gem_lmem.h
+        :functions: drm_i915_gem_memory_class drm_i915_gem_memory_class_instance drm_i915_memory_region_info drm_i915_query_memory_regions
+
+GEM_CREATE_EXT
+--------------
+New ioctl which is basically just gem_create but now allows userspace to
+provide a chain of possible extensions. Note that if we don't provide any
+extensions then we get the exact same behaviour as gem_create.
+
+Side note: We also need to support PXP[1] in the near future, which is also
+applicable to integrated platforms, and adds its own gem_create_ext extension,
+which basically lets userspace mark a buffer as "protected".
+
+.. kernel-doc:: Documentation/gpu/rfc/i915_gem_lmem.h
+        :functions: drm_i915_gem_create_ext
+
+I915_GEM_CREATE_EXT_MEMORY_REGIONS
+----------------------------------
+Implemented as an extension for gem_create_ext, we would now allow userspace to
+optionally provide an immutable list of preferred placements at creation time,
+in priority order, for a given buffer object.  For the placements we expect
+them each to use the class/instance encoding, as per the output of the regions
+query. Having the list in priority order will be useful in the future when
+placing an object, say during eviction.
+
+.. kernel-doc:: Documentation/gpu/rfc/i915_gem_lmem.h
+        :functions: drm_i915_gem_create_ext_memory_regions
+
+One fair criticism here is that this seems a little over-engineered[2]. If we
+just consider DG1 then yes, a simple gem_create.flags or something is totally
+all that's needed to tell the kernel to allocate the buffer in local-memory or
+whatever. However looking to the future we need uAPI which can also support
+upcoming Xe HP multi-tile architecture in a sane way, where there can be
+multiple local-memory instances for a given device, and so using both class and
+instance in our uAPI to describe regions is desirable, although specifically
+for DG1 it's uninteresting, since we only have a single local-memory instance.
+
+Existing uAPI issues
+====================
+Some potential issues we still need to resolve.
+
+I915 MMAP
+---------
+In i915 there are multiple ways to MMAP GEM object, including mapping the same
+object using different mapping types(WC vs WB), i.e multiple active mmaps per
+object. TTM expects one MMAP at most for the lifetime of the object. If it
+turns out that we have to backpedal here, there might be some potential
+userspace fallout.
+
+I915 SET/GET CACHING
+--------------------
+In i915 we have set/get_caching ioctl. TTM doesn't let us to change this, but
+DG1 doesn't support non-snooped pcie transactions, so we can just always
+allocate as WB for smem-only buffers.  If/when our hw gains support for
+non-snooped pcie transactions then we must fix this mode at allocation time as
+a new GEM extension.
+
+This is related to the mmap problem, because in general (meaning, when we're
+not running on intel cpus) the cpu mmap must not, ever, be inconsistent with
+allocation mode.
+
+Possible idea is to let the kernel picks the mmap mode for userspace from the
+following table:
+
+smem-only: WB. Userspace does not need to call clflush.
+
+smem+lmem: We allocate uncached memory, and give userspace a WC mapping
+for when the buffer is in smem, and WC when it's in lmem. GPU does snooped
+access, which is a bit inefficient.
+
+lmem only: always WC
+
+This means on discrete you only get a single mmap mode, all others must be
+rejected. That's probably going to be a new default mode or something like
+that.
+
+Links
+=====
+[1] https://patchwork.freedesktop.org/series/86798/
+
+[2] https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5599#note_553791
diff --git a/Documentation/gpu/rfc/index.rst b/Documentation/gpu/rfc/index.rst
index a8621f7dab8b..05670442ca1b 100644
--- a/Documentation/gpu/rfc/index.rst
+++ b/Documentation/gpu/rfc/index.rst
@@ -15,3 +15,7 @@ host such documentation:
 
 * Once the code has landed move all the documentation to the right places in
   the main core, helper or driver sections.
+
+.. toctree::
+
+    i915_gem_lmem.rst
-- 
2.26.3

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

WARNING: multiple messages have this Message-ID (diff)
From: Matthew Auld <matthew.auld@intel.com>
To: intel-gfx@lists.freedesktop.org
Cc: "Lionel Landwerlin" <lionel.g.landwerlin@linux.intel.com>,
	"Thomas Hellström" <thomas.hellstrom@linux.intel.com>,
	"Dave Airlie" <airlied@redhat.com>,
	dri-devel@lists.freedesktop.org,
	"Daniel Vetter" <daniel.vetter@ffwll.ch>,
	"Kenneth Graunke" <kenneth@whitecape.org>,
	mesa-dev@lists.freedesktop.org,
	"Daniel Vetter" <daniel.vetter@intel.com>
Subject: [Intel-gfx] [PATCH 1/9] drm/doc/rfc: i915 DG1 uAPI
Date: Mon, 26 Apr 2021 10:38:53 +0100	[thread overview]
Message-ID: <20210426093901.28937-1-matthew.auld@intel.com> (raw)

Add an entry for the new uAPI needed for DG1. Also add the overall
upstream plan, including some notes for the TTM conversion.

v2(Daniel):
  - include the overall upstreaming plan
  - add a note for mmap, there are differences here for TTM vs i915
  - bunch of other suggestions from Daniel
v3:
 (Daniel)
  - add a note for set/get caching stuff
  - add some more docs for existing query and extensions stuff
  - add an actual code example for regions query
  - bunch of other stuff
 (Jason)
  - uAPI change(!):
	- try a simpler design with the placements extension
	- rather than have a generic setparam which can cover multiple
	  use cases, have each extension be responsible for one thing
	  only
v4:
 (Daniel)
  - add some more notes for ttm conversion
  - bunch of other stuff
 (Jason)
  - uAPI change(!):
	- drop all the extra rsvd members for the region_query and
	  region_info, just keep the bare minimum needed for padding

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@linux.intel.com>
Cc: Jon Bloomfield <jon.bloomfield@intel.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Cc: Dave Airlie <airlied@gmail.com>
Cc: dri-devel@lists.freedesktop.org
Cc: mesa-dev@lists.freedesktop.org
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Dave Airlie <airlied@redhat.com>
---
 Documentation/gpu/rfc/i915_gem_lmem.h   | 212 ++++++++++++++++++++++++
 Documentation/gpu/rfc/i915_gem_lmem.rst | 130 +++++++++++++++
 Documentation/gpu/rfc/index.rst         |   4 +
 3 files changed, 346 insertions(+)
 create mode 100644 Documentation/gpu/rfc/i915_gem_lmem.h
 create mode 100644 Documentation/gpu/rfc/i915_gem_lmem.rst

diff --git a/Documentation/gpu/rfc/i915_gem_lmem.h b/Documentation/gpu/rfc/i915_gem_lmem.h
new file mode 100644
index 000000000000..7ed59b6202d5
--- /dev/null
+++ b/Documentation/gpu/rfc/i915_gem_lmem.h
@@ -0,0 +1,212 @@
+/**
+ * enum drm_i915_gem_memory_class - Supported memory classes
+ */
+enum drm_i915_gem_memory_class {
+	/** @I915_MEMORY_CLASS_SYSTEM: System memory */
+	I915_MEMORY_CLASS_SYSTEM = 0,
+	/** @I915_MEMORY_CLASS_DEVICE: Device local-memory */
+	I915_MEMORY_CLASS_DEVICE,
+};
+
+/**
+ * struct drm_i915_gem_memory_class_instance - Identify particular memory region
+ */
+struct drm_i915_gem_memory_class_instance {
+	/** @memory_class: See enum drm_i915_gem_memory_class */
+	__u16 memory_class;
+
+	/** @memory_instance: Which instance */
+	__u16 memory_instance;
+};
+
+/**
+ * struct drm_i915_memory_region_info - Describes one region as known to the
+ * driver.
+ *
+ * Note that we reserve some stuff here for potential future work. As an example
+ * we might want expose the capabilities(see @caps) for a given region, which
+ * could include things like if the region is CPU mappable/accessible, what are
+ * the supported mapping types etc.
+ *
+ * Note this is using both struct drm_i915_query_item and struct drm_i915_query.
+ * For this new query we are adding the new query id DRM_I915_QUERY_MEMORY_REGIONS
+ * at &drm_i915_query_item.query_id.
+ */
+struct drm_i915_memory_region_info {
+	/** @region: The class:instance pair encoding */
+	struct drm_i915_gem_memory_class_instance region;
+
+	/** @pad: MBZ */
+	__u32 pad;
+
+	/** @caps: MBZ */
+	__u64 caps;
+
+	/** @probed_size: Memory probed by the driver (-1 = unknown) */
+	__u64 probed_size;
+
+	/** @unallocated_size: Estimate of memory remaining (-1 = unknown) */
+	__u64 unallocated_size;
+};
+
+/**
+ * struct drm_i915_query_memory_regions
+ *
+ * The region info query enumerates all regions known to the driver by filling
+ * in an array of struct drm_i915_memory_region_info structures.
+ *
+ * Example for getting the list of supported regions:
+ *
+ * .. code-block:: C
+ *
+ *	struct drm_i915_query_memory_regions *info;
+ *	struct drm_i915_query_item item = {
+ *		.query_id = DRM_I915_QUERY_MEMORY_REGIONS;
+ *	};
+ *	struct drm_i915_query query = {
+ *		.num_items = 1,
+ *		.items_ptr = (uintptr_t)&item,
+ *	};
+ *	int err, i;
+ *
+ *	// First query the size of the blob we need, this needs to be large
+ *	// enough to hold our array of regions. The kernel will fill out the
+ *	// item.length for us, which is the number of bytes we need.
+ *	err = ioctl(fd, DRM_IOCTL_I915_QUERY, &query);
+ *	if (err) ...
+ *
+ *	info = calloc(1, item.length);
+ *	// Now that we allocated the required number of bytes, we call the ioctl
+ *	// again, this time with the data_ptr pointing to our newly allocated
+ *	// blob, which the kernel can then populate with the all the region info.
+ *	item.data_ptr = (uintptr_t)&info,
+ *
+ *	err = ioctl(fd, DRM_IOCTL_I915_QUERY, &query);
+ *	if (err) ...
+ *
+ *	// We can now access each region in the array
+ *	for (i = 0; i < info->num_regions; i++) {
+ *		struct drm_i915_memory_region_info mr = info->regions[i];
+ *		u16 class = mr.region.class;
+ *		u16 instance = mr.region.instance;
+ *
+ *		....
+ *	}
+ *
+ *	free(info);
+ */
+struct drm_i915_query_memory_regions {
+	/** @num_regions: Number of supported regions */
+	__u32 num_regions;
+
+	/** @pad: MBZ */
+	__u32 pad;
+
+	/** @regions: Info about each supported region */
+	struct drm_i915_memory_region_info regions[];
+};
+
+#define DRM_I915_GEM_CREATE_EXT		0xdeadbeaf
+#define DRM_IOCTL_I915_GEM_CREATE_EXT	DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_CREATE_EXT, struct drm_i915_gem_create_ext)
+
+/**
+ * struct drm_i915_gem_create_ext - Existing gem_create behaviour, with added
+ * extension support using struct i915_user_extension.
+ *
+ * Note that in the future we want to have our buffer flags here, at least for
+ * the stuff that is immutable. Previously we would have two ioctls, one to
+ * create the object with gem_create, and another to apply various parameters,
+ * however this creates some ambiguity for the params which are considered
+ * immutable. Also in general we're phasing out the various SET/GET ioctls.
+ */
+struct drm_i915_gem_create_ext {
+	/**
+	 * @size: Requested size for the object.
+	 *
+	 * The (page-aligned) allocated size for the object will be returned.
+	 *
+	 * Note that for some devices we have might have further minimum
+	 * page-size restrictions(larger than 4K), like for device local-memory.
+	 * However in general the final size here should always reflect any
+	 * rounding up, if for example using the I915_GEM_CREATE_EXT_MEMORY_REGIONS
+	 * extension to place the object in device local-memory.
+	 */
+	__u64 size;
+	/**
+	 * @handle: Returned handle for the object.
+	 *
+	 * Object handles are nonzero.
+	 */
+	__u32 handle;
+	/** @flags: MBZ */
+	__u32 flags;
+	/**
+	 * @extensions: The chain of extensions to apply to this object.
+	 *
+	 * This will be useful in the future when we need to support several
+	 * different extensions, and we need to apply more than one when
+	 * creating the object. See struct i915_user_extension.
+	 *
+	 * If we don't supply any extensions then we get the same old gem_create
+	 * behaviour.
+	 *
+	 * For I915_GEM_CREATE_EXT_MEMORY_REGIONS usage see
+	 * struct drm_i915_gem_create_ext_memory_regions.
+	 */
+#define I915_GEM_CREATE_EXT_MEMORY_REGIONS 0
+	__u64 extensions;
+};
+
+/**
+ * struct drm_i915_gem_create_ext_memory_regions - The
+ * I915_GEM_CREATE_EXT_MEMORY_REGIONS extension.
+ *
+ * Set the object with the desired set of placements/regions in priority
+ * order. Each entry must be unique and supported by the device.
+ *
+ * This is provided as an array of struct drm_i915_gem_memory_class_instance, or
+ * an equivalent layout of class:instance pair encodings. See struct
+ * drm_i915_query_memory_regions and DRM_I915_QUERY_MEMORY_REGIONS for how to
+ * query the supported regions for a device.
+ *
+ * As an example, on discrete devices, if we wish to set the placement as
+ * device local-memory we can do something like:
+ *
+ * .. code-block:: C
+ *
+ *	struct drm_i915_gem_memory_class_instance region_lmem = {
+ *              .memory_class = I915_MEMORY_CLASS_DEVICE,
+ *              .memory_instance = 0,
+ *      };
+ *      struct drm_i915_gem_create_ext_memory_regions regions = {
+ *              .base = { .name = I915_GEM_CREATE_EXT_MEMORY_REGIONS },
+ *              .regions = (uintptr_t)&region_lmem,
+ *              .num_regions = 1,
+ *      };
+ *      struct drm_i915_gem_create_ext create_ext = {
+ *              .size = 16 * PAGE_SIZE,
+ *              .extensions = (uintptr_t)&regions,
+ *      };
+ *
+ *      int err = ioctl(fd, DRM_IOCTL_I915_GEM_CREATE_EXT, &create_ext);
+ *      if (err) ...
+ *
+ * At which point we get the object handle in &drm_i915_gem_create_ext.handle,
+ * along with the final object size in &drm_i915_gem_create_ext.size, which
+ * should account for any rounding up, if required.
+ */
+struct drm_i915_gem_create_ext_memory_regions {
+	/** @base: Extension link. See struct i915_user_extension. */
+	struct i915_user_extension base;
+
+	/** @pad: MBZ */
+	__u32 pad;
+	/** @num_regions: Number of elements in the @regions array. */
+	__u32 num_regions;
+	/**
+	 * @regions: The regions/placements array.
+	 *
+	 * An array of struct drm_i915_gem_memory_class_instance.
+	 */
+	__u64 regions;
+};
diff --git a/Documentation/gpu/rfc/i915_gem_lmem.rst b/Documentation/gpu/rfc/i915_gem_lmem.rst
new file mode 100644
index 000000000000..462f1efd9003
--- /dev/null
+++ b/Documentation/gpu/rfc/i915_gem_lmem.rst
@@ -0,0 +1,130 @@
+=========================
+I915 DG1/LMEM RFC Section
+=========================
+
+Upstream plan
+=============
+For upstream the overall plan for landing all the DG1 stuff and turning it for
+real, with all the uAPI bits is:
+
+* Merge basic HW enabling of DG1(still without pciid)
+* Merge the uAPI bits behind special CONFIG_BROKEN(or so) flag
+        * At this point we can still make changes, but importantly this lets us
+          start running IGTs which can utilize local-memory in CI
+* Convert over to TTM, make sure it all keeps working. Some of the work items:
+        * TTM shrinker for discrete
+        * dma_resv_lockitem for full dma_resv_lock, i.e not just trylock
+        * Use TTM CPU pagefault handler
+        * Route shmem backend over to TTM SYSTEM for discrete
+        * TTM purgeable object support
+        * Move i915 buddy allocator over to TTM
+        * MMAP ioctl mode(see `I915 MMAP`_)
+        * SET/GET ioctl caching(see `I915 SET/GET CACHING`_)
+* Add pciid for DG1 and turn on uAPI for real
+
+New object placement and region query uAPI
+==========================================
+Starting from DG1 we need to give userspace the ability to allocate buffers from
+device local-memory. Currently the driver supports gem_create, which can place
+buffers in system memory via shmem, and the usual assortment of other
+interfaces, like dumb buffers and userptr.
+
+To support this new capability, while also providing a uAPI which will work
+beyond just DG1, we propose to offer three new bits of uAPI:
+
+DRM_I915_QUERY_MEMORY_REGIONS
+-----------------------------
+New query ID which allows userspace to discover the list of supported memory
+regions(like system-memory and local-memory) for a given device. We identify
+each region with a class and instance pair, which should be unique. The class
+here would be DEVICE or SYSTEM, and the instance would be zero, on platforms
+like DG1.
+
+Side note: The class/instance design is borrowed from our existing engine uAPI,
+where we describe every physical engine in terms of its class, and the
+particular instance, since we can have more than one per class.
+
+In the future we also want to expose more information which can further
+describe the capabilities of a region.
+
+.. kernel-doc:: Documentation/gpu/rfc/i915_gem_lmem.h
+        :functions: drm_i915_gem_memory_class drm_i915_gem_memory_class_instance drm_i915_memory_region_info drm_i915_query_memory_regions
+
+GEM_CREATE_EXT
+--------------
+New ioctl which is basically just gem_create but now allows userspace to
+provide a chain of possible extensions. Note that if we don't provide any
+extensions then we get the exact same behaviour as gem_create.
+
+Side note: We also need to support PXP[1] in the near future, which is also
+applicable to integrated platforms, and adds its own gem_create_ext extension,
+which basically lets userspace mark a buffer as "protected".
+
+.. kernel-doc:: Documentation/gpu/rfc/i915_gem_lmem.h
+        :functions: drm_i915_gem_create_ext
+
+I915_GEM_CREATE_EXT_MEMORY_REGIONS
+----------------------------------
+Implemented as an extension for gem_create_ext, we would now allow userspace to
+optionally provide an immutable list of preferred placements at creation time,
+in priority order, for a given buffer object.  For the placements we expect
+them each to use the class/instance encoding, as per the output of the regions
+query. Having the list in priority order will be useful in the future when
+placing an object, say during eviction.
+
+.. kernel-doc:: Documentation/gpu/rfc/i915_gem_lmem.h
+        :functions: drm_i915_gem_create_ext_memory_regions
+
+One fair criticism here is that this seems a little over-engineered[2]. If we
+just consider DG1 then yes, a simple gem_create.flags or something is totally
+all that's needed to tell the kernel to allocate the buffer in local-memory or
+whatever. However looking to the future we need uAPI which can also support
+upcoming Xe HP multi-tile architecture in a sane way, where there can be
+multiple local-memory instances for a given device, and so using both class and
+instance in our uAPI to describe regions is desirable, although specifically
+for DG1 it's uninteresting, since we only have a single local-memory instance.
+
+Existing uAPI issues
+====================
+Some potential issues we still need to resolve.
+
+I915 MMAP
+---------
+In i915 there are multiple ways to MMAP GEM object, including mapping the same
+object using different mapping types(WC vs WB), i.e multiple active mmaps per
+object. TTM expects one MMAP at most for the lifetime of the object. If it
+turns out that we have to backpedal here, there might be some potential
+userspace fallout.
+
+I915 SET/GET CACHING
+--------------------
+In i915 we have set/get_caching ioctl. TTM doesn't let us to change this, but
+DG1 doesn't support non-snooped pcie transactions, so we can just always
+allocate as WB for smem-only buffers.  If/when our hw gains support for
+non-snooped pcie transactions then we must fix this mode at allocation time as
+a new GEM extension.
+
+This is related to the mmap problem, because in general (meaning, when we're
+not running on intel cpus) the cpu mmap must not, ever, be inconsistent with
+allocation mode.
+
+Possible idea is to let the kernel picks the mmap mode for userspace from the
+following table:
+
+smem-only: WB. Userspace does not need to call clflush.
+
+smem+lmem: We allocate uncached memory, and give userspace a WC mapping
+for when the buffer is in smem, and WC when it's in lmem. GPU does snooped
+access, which is a bit inefficient.
+
+lmem only: always WC
+
+This means on discrete you only get a single mmap mode, all others must be
+rejected. That's probably going to be a new default mode or something like
+that.
+
+Links
+=====
+[1] https://patchwork.freedesktop.org/series/86798/
+
+[2] https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5599#note_553791
diff --git a/Documentation/gpu/rfc/index.rst b/Documentation/gpu/rfc/index.rst
index a8621f7dab8b..05670442ca1b 100644
--- a/Documentation/gpu/rfc/index.rst
+++ b/Documentation/gpu/rfc/index.rst
@@ -15,3 +15,7 @@ host such documentation:
 
 * Once the code has landed move all the documentation to the right places in
   the main core, helper or driver sections.
+
+.. toctree::
+
+    i915_gem_lmem.rst
-- 
2.26.3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

             reply	other threads:[~2021-04-26  9:42 UTC|newest]

Thread overview: 50+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-26  9:38 Matthew Auld [this message]
2021-04-26  9:38 ` [Intel-gfx] [PATCH 1/9] drm/doc/rfc: i915 DG1 uAPI Matthew Auld
2021-04-26  9:38 ` [PATCH 2/9] drm/i915: mark stolen as private Matthew Auld
2021-04-26  9:38   ` [Intel-gfx] " Matthew Auld
2021-04-26  9:38 ` [PATCH 3/9] drm/i915/query: Expose memory regions through the query uAPI Matthew Auld
2021-04-26  9:38   ` [Intel-gfx] " Matthew Auld
2021-04-26  9:38 ` [PATCH 4/9] drm/i915: rework gem_create flow for upcoming extensions Matthew Auld
2021-04-26  9:38   ` [Intel-gfx] " Matthew Auld
2021-04-26  9:38 ` [PATCH 5/9] drm/i915/uapi: introduce drm_i915_gem_create_ext Matthew Auld
2021-04-26  9:38   ` [Intel-gfx] " Matthew Auld
2021-04-26  9:38 ` [PATCH 6/9] drm/i915/uapi: implement object placement extension Matthew Auld
2021-04-26  9:38   ` [Intel-gfx] " Matthew Auld
2021-04-28 17:28   ` Kenneth Graunke
2021-04-28 17:28     ` [Intel-gfx] " Kenneth Graunke
2021-04-26  9:38 ` [PATCH 7/9] drm/i915/lmem: support optional CPU clearing for special internal use Matthew Auld
2021-04-26  9:38   ` [Intel-gfx] " Matthew Auld
2021-04-26 12:53   ` kernel test robot
2021-04-26 14:03   ` kernel test robot
2021-04-26  9:39 ` [PATCH 8/9] drm/i915/gem: clear userspace buffers for LMEM Matthew Auld
2021-04-26  9:39   ` [Intel-gfx] " Matthew Auld
2021-04-26  9:39 ` [PATCH 9/9] drm/i915/gem: hide new uAPI behind CONFIG_BROKEN Matthew Auld
2021-04-26  9:39   ` [Intel-gfx] " Matthew Auld
2021-04-26 12:17 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for series starting with [1/9] drm/doc/rfc: i915 DG1 uAPI Patchwork
2021-04-26 12:18 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
2021-04-26 12:45 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
2021-04-26 15:11 ` [PATCH 1/9] " Jason Ekstrand
2021-04-26 15:11   ` [Intel-gfx] " Jason Ekstrand
2021-04-26 15:31   ` Matthew Auld
2021-04-26 15:31     ` [Intel-gfx] " Matthew Auld
2021-04-26 16:25     ` Jason Ekstrand
2021-04-26 16:25       ` [Intel-gfx] " Jason Ekstrand
2021-04-26 16:32       ` Daniel Vetter
2021-04-26 16:32         ` [Intel-gfx] " Daniel Vetter
2021-04-26 15:13 ` [Intel-gfx] ✗ Fi.CI.IGT: failure for series starting with [1/9] " Patchwork
2021-04-28 15:16 ` [PATCH 1/9] " Kenneth Graunke
2021-04-28 15:16   ` [Intel-gfx] " Kenneth Graunke
2021-04-28 16:10   ` Matthew Auld
2021-04-28 16:10     ` [Intel-gfx] " Matthew Auld
2021-04-28 15:51 ` Jason Ekstrand
2021-04-28 15:51   ` [Intel-gfx] " Jason Ekstrand
2021-04-28 16:41   ` Matthew Auld
2021-04-28 16:41     ` [Intel-gfx] " Matthew Auld
2021-04-28 16:56     ` Jason Ekstrand
2021-04-28 16:56       ` [Intel-gfx] " Jason Ekstrand
2021-04-28 17:12       ` Kenneth Graunke
2021-04-28 17:12         ` [Intel-gfx] " Kenneth Graunke
2021-04-28 17:30 ` Kenneth Graunke
2021-04-28 17:30   ` [Intel-gfx] " Kenneth Graunke
2021-04-28 17:39 ` Bloomfield, Jon
2021-04-28 17:39   ` [Intel-gfx] " Bloomfield, Jon

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210426093901.28937-1-matthew.auld@intel.com \
    --to=matthew.auld@intel.com \
    --cc=airlied@redhat.com \
    --cc=daniel.vetter@ffwll.ch \
    --cc=daniel.vetter@intel.com \
    --cc=daniele.ceraolospurio@intel.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=intel-gfx@lists.freedesktop.org \
    --cc=jason@jlekstrand.net \
    --cc=jon.bloomfield@intel.com \
    --cc=jordan.l.justen@intel.com \
    --cc=kenneth@whitecape.org \
    --cc=lionel.g.landwerlin@linux.intel.com \
    --cc=mesa-dev@lists.freedesktop.org \
    --cc=thomas.hellstrom@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.