dri-devel.lists.freedesktop.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/9] drm/doc/rfc: i915 DG1 uAPI
@ 2021-04-26  9:38 Matthew Auld
  2021-04-26  9:38 ` [PATCH 2/9] drm/i915: mark stolen as private Matthew Auld
                   ` (12 more replies)
  0 siblings, 13 replies; 22+ messages in thread
From: Matthew Auld @ 2021-04-26  9:38 UTC (permalink / raw)
  To: intel-gfx
  Cc: Lionel Landwerlin, Thomas Hellström, Dave Airlie,
	Jordan Justen, dri-devel, Daniel Vetter, Kenneth Graunke,
	Daniele Ceraolo Spurio, Jon Bloomfield, Jason Ekstrand, mesa-dev,
	Daniel Vetter

Add an entry for the new uAPI needed for DG1. Also add the overall
upstream plan, including some notes for the TTM conversion.

v2(Daniel):
  - include the overall upstreaming plan
  - add a note for mmap, there are differences here for TTM vs i915
  - bunch of other suggestions from Daniel
v3:
 (Daniel)
  - add a note for set/get caching stuff
  - add some more docs for existing query and extensions stuff
  - add an actual code example for regions query
  - bunch of other stuff
 (Jason)
  - uAPI change(!):
	- try a simpler design with the placements extension
	- rather than have a generic setparam which can cover multiple
	  use cases, have each extension be responsible for one thing
	  only
v4:
 (Daniel)
  - add some more notes for ttm conversion
  - bunch of other stuff
 (Jason)
  - uAPI change(!):
	- drop all the extra rsvd members for the region_query and
	  region_info, just keep the bare minimum needed for padding

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@linux.intel.com>
Cc: Jon Bloomfield <jon.bloomfield@intel.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Cc: Dave Airlie <airlied@gmail.com>
Cc: dri-devel@lists.freedesktop.org
Cc: mesa-dev@lists.freedesktop.org
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Dave Airlie <airlied@redhat.com>
---
 Documentation/gpu/rfc/i915_gem_lmem.h   | 212 ++++++++++++++++++++++++
 Documentation/gpu/rfc/i915_gem_lmem.rst | 130 +++++++++++++++
 Documentation/gpu/rfc/index.rst         |   4 +
 3 files changed, 346 insertions(+)
 create mode 100644 Documentation/gpu/rfc/i915_gem_lmem.h
 create mode 100644 Documentation/gpu/rfc/i915_gem_lmem.rst

diff --git a/Documentation/gpu/rfc/i915_gem_lmem.h b/Documentation/gpu/rfc/i915_gem_lmem.h
new file mode 100644
index 000000000000..7ed59b6202d5
--- /dev/null
+++ b/Documentation/gpu/rfc/i915_gem_lmem.h
@@ -0,0 +1,212 @@
+/**
+ * enum drm_i915_gem_memory_class - Supported memory classes
+ */
+enum drm_i915_gem_memory_class {
+	/** @I915_MEMORY_CLASS_SYSTEM: System memory */
+	I915_MEMORY_CLASS_SYSTEM = 0,
+	/** @I915_MEMORY_CLASS_DEVICE: Device local-memory */
+	I915_MEMORY_CLASS_DEVICE,
+};
+
+/**
+ * struct drm_i915_gem_memory_class_instance - Identify particular memory region
+ */
+struct drm_i915_gem_memory_class_instance {
+	/** @memory_class: See enum drm_i915_gem_memory_class */
+	__u16 memory_class;
+
+	/** @memory_instance: Which instance */
+	__u16 memory_instance;
+};
+
+/**
+ * struct drm_i915_memory_region_info - Describes one region as known to the
+ * driver.
+ *
+ * Note that we reserve some stuff here for potential future work. As an example
+ * we might want expose the capabilities(see @caps) for a given region, which
+ * could include things like if the region is CPU mappable/accessible, what are
+ * the supported mapping types etc.
+ *
+ * Note this is using both struct drm_i915_query_item and struct drm_i915_query.
+ * For this new query we are adding the new query id DRM_I915_QUERY_MEMORY_REGIONS
+ * at &drm_i915_query_item.query_id.
+ */
+struct drm_i915_memory_region_info {
+	/** @region: The class:instance pair encoding */
+	struct drm_i915_gem_memory_class_instance region;
+
+	/** @pad: MBZ */
+	__u32 pad;
+
+	/** @caps: MBZ */
+	__u64 caps;
+
+	/** @probed_size: Memory probed by the driver (-1 = unknown) */
+	__u64 probed_size;
+
+	/** @unallocated_size: Estimate of memory remaining (-1 = unknown) */
+	__u64 unallocated_size;
+};
+
+/**
+ * struct drm_i915_query_memory_regions
+ *
+ * The region info query enumerates all regions known to the driver by filling
+ * in an array of struct drm_i915_memory_region_info structures.
+ *
+ * Example for getting the list of supported regions:
+ *
+ * .. code-block:: C
+ *
+ *	struct drm_i915_query_memory_regions *info;
+ *	struct drm_i915_query_item item = {
+ *		.query_id = DRM_I915_QUERY_MEMORY_REGIONS;
+ *	};
+ *	struct drm_i915_query query = {
+ *		.num_items = 1,
+ *		.items_ptr = (uintptr_t)&item,
+ *	};
+ *	int err, i;
+ *
+ *	// First query the size of the blob we need, this needs to be large
+ *	// enough to hold our array of regions. The kernel will fill out the
+ *	// item.length for us, which is the number of bytes we need.
+ *	err = ioctl(fd, DRM_IOCTL_I915_QUERY, &query);
+ *	if (err) ...
+ *
+ *	info = calloc(1, item.length);
+ *	// Now that we allocated the required number of bytes, we call the ioctl
+ *	// again, this time with the data_ptr pointing to our newly allocated
+ *	// blob, which the kernel can then populate with the all the region info.
+ *	item.data_ptr = (uintptr_t)&info,
+ *
+ *	err = ioctl(fd, DRM_IOCTL_I915_QUERY, &query);
+ *	if (err) ...
+ *
+ *	// We can now access each region in the array
+ *	for (i = 0; i < info->num_regions; i++) {
+ *		struct drm_i915_memory_region_info mr = info->regions[i];
+ *		u16 class = mr.region.class;
+ *		u16 instance = mr.region.instance;
+ *
+ *		....
+ *	}
+ *
+ *	free(info);
+ */
+struct drm_i915_query_memory_regions {
+	/** @num_regions: Number of supported regions */
+	__u32 num_regions;
+
+	/** @pad: MBZ */
+	__u32 pad;
+
+	/** @regions: Info about each supported region */
+	struct drm_i915_memory_region_info regions[];
+};
+
+#define DRM_I915_GEM_CREATE_EXT		0xdeadbeaf
+#define DRM_IOCTL_I915_GEM_CREATE_EXT	DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_CREATE_EXT, struct drm_i915_gem_create_ext)
+
+/**
+ * struct drm_i915_gem_create_ext - Existing gem_create behaviour, with added
+ * extension support using struct i915_user_extension.
+ *
+ * Note that in the future we want to have our buffer flags here, at least for
+ * the stuff that is immutable. Previously we would have two ioctls, one to
+ * create the object with gem_create, and another to apply various parameters,
+ * however this creates some ambiguity for the params which are considered
+ * immutable. Also in general we're phasing out the various SET/GET ioctls.
+ */
+struct drm_i915_gem_create_ext {
+	/**
+	 * @size: Requested size for the object.
+	 *
+	 * The (page-aligned) allocated size for the object will be returned.
+	 *
+	 * Note that for some devices we have might have further minimum
+	 * page-size restrictions(larger than 4K), like for device local-memory.
+	 * However in general the final size here should always reflect any
+	 * rounding up, if for example using the I915_GEM_CREATE_EXT_MEMORY_REGIONS
+	 * extension to place the object in device local-memory.
+	 */
+	__u64 size;
+	/**
+	 * @handle: Returned handle for the object.
+	 *
+	 * Object handles are nonzero.
+	 */
+	__u32 handle;
+	/** @flags: MBZ */
+	__u32 flags;
+	/**
+	 * @extensions: The chain of extensions to apply to this object.
+	 *
+	 * This will be useful in the future when we need to support several
+	 * different extensions, and we need to apply more than one when
+	 * creating the object. See struct i915_user_extension.
+	 *
+	 * If we don't supply any extensions then we get the same old gem_create
+	 * behaviour.
+	 *
+	 * For I915_GEM_CREATE_EXT_MEMORY_REGIONS usage see
+	 * struct drm_i915_gem_create_ext_memory_regions.
+	 */
+#define I915_GEM_CREATE_EXT_MEMORY_REGIONS 0
+	__u64 extensions;
+};
+
+/**
+ * struct drm_i915_gem_create_ext_memory_regions - The
+ * I915_GEM_CREATE_EXT_MEMORY_REGIONS extension.
+ *
+ * Set the object with the desired set of placements/regions in priority
+ * order. Each entry must be unique and supported by the device.
+ *
+ * This is provided as an array of struct drm_i915_gem_memory_class_instance, or
+ * an equivalent layout of class:instance pair encodings. See struct
+ * drm_i915_query_memory_regions and DRM_I915_QUERY_MEMORY_REGIONS for how to
+ * query the supported regions for a device.
+ *
+ * As an example, on discrete devices, if we wish to set the placement as
+ * device local-memory we can do something like:
+ *
+ * .. code-block:: C
+ *
+ *	struct drm_i915_gem_memory_class_instance region_lmem = {
+ *              .memory_class = I915_MEMORY_CLASS_DEVICE,
+ *              .memory_instance = 0,
+ *      };
+ *      struct drm_i915_gem_create_ext_memory_regions regions = {
+ *              .base = { .name = I915_GEM_CREATE_EXT_MEMORY_REGIONS },
+ *              .regions = (uintptr_t)&region_lmem,
+ *              .num_regions = 1,
+ *      };
+ *      struct drm_i915_gem_create_ext create_ext = {
+ *              .size = 16 * PAGE_SIZE,
+ *              .extensions = (uintptr_t)&regions,
+ *      };
+ *
+ *      int err = ioctl(fd, DRM_IOCTL_I915_GEM_CREATE_EXT, &create_ext);
+ *      if (err) ...
+ *
+ * At which point we get the object handle in &drm_i915_gem_create_ext.handle,
+ * along with the final object size in &drm_i915_gem_create_ext.size, which
+ * should account for any rounding up, if required.
+ */
+struct drm_i915_gem_create_ext_memory_regions {
+	/** @base: Extension link. See struct i915_user_extension. */
+	struct i915_user_extension base;
+
+	/** @pad: MBZ */
+	__u32 pad;
+	/** @num_regions: Number of elements in the @regions array. */
+	__u32 num_regions;
+	/**
+	 * @regions: The regions/placements array.
+	 *
+	 * An array of struct drm_i915_gem_memory_class_instance.
+	 */
+	__u64 regions;
+};
diff --git a/Documentation/gpu/rfc/i915_gem_lmem.rst b/Documentation/gpu/rfc/i915_gem_lmem.rst
new file mode 100644
index 000000000000..462f1efd9003
--- /dev/null
+++ b/Documentation/gpu/rfc/i915_gem_lmem.rst
@@ -0,0 +1,130 @@
+=========================
+I915 DG1/LMEM RFC Section
+=========================
+
+Upstream plan
+=============
+For upstream the overall plan for landing all the DG1 stuff and turning it for
+real, with all the uAPI bits is:
+
+* Merge basic HW enabling of DG1(still without pciid)
+* Merge the uAPI bits behind special CONFIG_BROKEN(or so) flag
+        * At this point we can still make changes, but importantly this lets us
+          start running IGTs which can utilize local-memory in CI
+* Convert over to TTM, make sure it all keeps working. Some of the work items:
+        * TTM shrinker for discrete
+        * dma_resv_lockitem for full dma_resv_lock, i.e not just trylock
+        * Use TTM CPU pagefault handler
+        * Route shmem backend over to TTM SYSTEM for discrete
+        * TTM purgeable object support
+        * Move i915 buddy allocator over to TTM
+        * MMAP ioctl mode(see `I915 MMAP`_)
+        * SET/GET ioctl caching(see `I915 SET/GET CACHING`_)
+* Add pciid for DG1 and turn on uAPI for real
+
+New object placement and region query uAPI
+==========================================
+Starting from DG1 we need to give userspace the ability to allocate buffers from
+device local-memory. Currently the driver supports gem_create, which can place
+buffers in system memory via shmem, and the usual assortment of other
+interfaces, like dumb buffers and userptr.
+
+To support this new capability, while also providing a uAPI which will work
+beyond just DG1, we propose to offer three new bits of uAPI:
+
+DRM_I915_QUERY_MEMORY_REGIONS
+-----------------------------
+New query ID which allows userspace to discover the list of supported memory
+regions(like system-memory and local-memory) for a given device. We identify
+each region with a class and instance pair, which should be unique. The class
+here would be DEVICE or SYSTEM, and the instance would be zero, on platforms
+like DG1.
+
+Side note: The class/instance design is borrowed from our existing engine uAPI,
+where we describe every physical engine in terms of its class, and the
+particular instance, since we can have more than one per class.
+
+In the future we also want to expose more information which can further
+describe the capabilities of a region.
+
+.. kernel-doc:: Documentation/gpu/rfc/i915_gem_lmem.h
+        :functions: drm_i915_gem_memory_class drm_i915_gem_memory_class_instance drm_i915_memory_region_info drm_i915_query_memory_regions
+
+GEM_CREATE_EXT
+--------------
+New ioctl which is basically just gem_create but now allows userspace to
+provide a chain of possible extensions. Note that if we don't provide any
+extensions then we get the exact same behaviour as gem_create.
+
+Side note: We also need to support PXP[1] in the near future, which is also
+applicable to integrated platforms, and adds its own gem_create_ext extension,
+which basically lets userspace mark a buffer as "protected".
+
+.. kernel-doc:: Documentation/gpu/rfc/i915_gem_lmem.h
+        :functions: drm_i915_gem_create_ext
+
+I915_GEM_CREATE_EXT_MEMORY_REGIONS
+----------------------------------
+Implemented as an extension for gem_create_ext, we would now allow userspace to
+optionally provide an immutable list of preferred placements at creation time,
+in priority order, for a given buffer object.  For the placements we expect
+them each to use the class/instance encoding, as per the output of the regions
+query. Having the list in priority order will be useful in the future when
+placing an object, say during eviction.
+
+.. kernel-doc:: Documentation/gpu/rfc/i915_gem_lmem.h
+        :functions: drm_i915_gem_create_ext_memory_regions
+
+One fair criticism here is that this seems a little over-engineered[2]. If we
+just consider DG1 then yes, a simple gem_create.flags or something is totally
+all that's needed to tell the kernel to allocate the buffer in local-memory or
+whatever. However looking to the future we need uAPI which can also support
+upcoming Xe HP multi-tile architecture in a sane way, where there can be
+multiple local-memory instances for a given device, and so using both class and
+instance in our uAPI to describe regions is desirable, although specifically
+for DG1 it's uninteresting, since we only have a single local-memory instance.
+
+Existing uAPI issues
+====================
+Some potential issues we still need to resolve.
+
+I915 MMAP
+---------
+In i915 there are multiple ways to MMAP GEM object, including mapping the same
+object using different mapping types(WC vs WB), i.e multiple active mmaps per
+object. TTM expects one MMAP at most for the lifetime of the object. If it
+turns out that we have to backpedal here, there might be some potential
+userspace fallout.
+
+I915 SET/GET CACHING
+--------------------
+In i915 we have set/get_caching ioctl. TTM doesn't let us to change this, but
+DG1 doesn't support non-snooped pcie transactions, so we can just always
+allocate as WB for smem-only buffers.  If/when our hw gains support for
+non-snooped pcie transactions then we must fix this mode at allocation time as
+a new GEM extension.
+
+This is related to the mmap problem, because in general (meaning, when we're
+not running on intel cpus) the cpu mmap must not, ever, be inconsistent with
+allocation mode.
+
+Possible idea is to let the kernel picks the mmap mode for userspace from the
+following table:
+
+smem-only: WB. Userspace does not need to call clflush.
+
+smem+lmem: We allocate uncached memory, and give userspace a WC mapping
+for when the buffer is in smem, and WC when it's in lmem. GPU does snooped
+access, which is a bit inefficient.
+
+lmem only: always WC
+
+This means on discrete you only get a single mmap mode, all others must be
+rejected. That's probably going to be a new default mode or something like
+that.
+
+Links
+=====
+[1] https://patchwork.freedesktop.org/series/86798/
+
+[2] https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5599#note_553791
diff --git a/Documentation/gpu/rfc/index.rst b/Documentation/gpu/rfc/index.rst
index a8621f7dab8b..05670442ca1b 100644
--- a/Documentation/gpu/rfc/index.rst
+++ b/Documentation/gpu/rfc/index.rst
@@ -15,3 +15,7 @@ host such documentation:
 
 * Once the code has landed move all the documentation to the right places in
   the main core, helper or driver sections.
+
+.. toctree::
+
+    i915_gem_lmem.rst
-- 
2.26.3

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 2/9] drm/i915: mark stolen as private
  2021-04-26  9:38 [PATCH 1/9] drm/doc/rfc: i915 DG1 uAPI Matthew Auld
@ 2021-04-26  9:38 ` Matthew Auld
  2021-04-26  9:38 ` [PATCH 3/9] drm/i915/query: Expose memory regions through the query uAPI Matthew Auld
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 22+ messages in thread
From: Matthew Auld @ 2021-04-26  9:38 UTC (permalink / raw)
  To: intel-gfx
  Cc: Lionel Landwerlin, Thomas Hellström, Jordan Justen,
	dri-devel, Kenneth Graunke, Daniele Ceraolo Spurio,
	Jon Bloomfield, Jason Ekstrand, mesa-dev, Daniel Vetter

In the next patch we want to expose the supported regions to userspace,
which can then be fed into the gem_create_ext placement extensions. For
now treat stolen memory as private from userspace pov.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@linux.intel.com>
Cc: Jon Bloomfield <jon.bloomfield@intel.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Cc: Dave Airlie <airlied@gmail.com>
Cc: dri-devel@lists.freedesktop.org
Cc: mesa-dev@lists.freedesktop.org
---
 drivers/gpu/drm/i915/gem/i915_gem_stolen.c | 4 ++++
 drivers/gpu/drm/i915/intel_memory_region.h | 1 +
 2 files changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
index c5b64b2400e8..3bcbb146511a 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
@@ -803,6 +803,8 @@ i915_gem_stolen_lmem_setup(struct drm_i915_private *i915)
 
 	intel_memory_region_set_name(mem, "stolen-local");
 
+	mem->private = true;
+
 	return mem;
 }
 
@@ -821,6 +823,8 @@ i915_gem_stolen_smem_setup(struct drm_i915_private *i915)
 
 	intel_memory_region_set_name(mem, "stolen-system");
 
+	mem->private = true;
+
 	return mem;
 }
 
diff --git a/drivers/gpu/drm/i915/intel_memory_region.h b/drivers/gpu/drm/i915/intel_memory_region.h
index 4c8ec15af55f..942fc4f68764 100644
--- a/drivers/gpu/drm/i915/intel_memory_region.h
+++ b/drivers/gpu/drm/i915/intel_memory_region.h
@@ -86,6 +86,7 @@ struct intel_memory_region {
 	u16 instance;
 	enum intel_region_id id;
 	char name[16];
+	bool private; /* not for userspace */
 
 	struct list_head reserved;
 
-- 
2.26.3

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 3/9] drm/i915/query: Expose memory regions through the query uAPI
  2021-04-26  9:38 [PATCH 1/9] drm/doc/rfc: i915 DG1 uAPI Matthew Auld
  2021-04-26  9:38 ` [PATCH 2/9] drm/i915: mark stolen as private Matthew Auld
@ 2021-04-26  9:38 ` Matthew Auld
  2021-04-26  9:38 ` [PATCH 4/9] drm/i915: rework gem_create flow for upcoming extensions Matthew Auld
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 22+ messages in thread
From: Matthew Auld @ 2021-04-26  9:38 UTC (permalink / raw)
  To: intel-gfx
  Cc: Abdiel Janulgue, Thomas Hellström, Daniele Ceraolo Spurio,
	Jordan Justen, dri-devel, Kenneth Graunke, Jon Bloomfield,
	Jason Ekstrand, mesa-dev, Lionel Landwerlin, Daniel Vetter

From: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>

Returns the available memory region areas supported by the HW.

v2(Daniel & Jason):
    - Add some kernel-doc, including example usage.
    - Drop all the extra rsvd

Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@linux.intel.com>
Cc: Jon Bloomfield <jon.bloomfield@intel.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Cc: Dave Airlie <airlied@gmail.com>
Cc: dri-devel@lists.freedesktop.org
Cc: mesa-dev@lists.freedesktop.org
---
 drivers/gpu/drm/i915/i915_query.c          |  57 +++++++++++
 drivers/gpu/drm/i915/intel_memory_region.h |   8 +-
 include/uapi/drm/i915_drm.h                | 109 +++++++++++++++++++++
 3 files changed, 169 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_query.c b/drivers/gpu/drm/i915/i915_query.c
index fed337ad7b68..0b4cb2e1a15c 100644
--- a/drivers/gpu/drm/i915/i915_query.c
+++ b/drivers/gpu/drm/i915/i915_query.c
@@ -419,11 +419,68 @@ static int query_perf_config(struct drm_i915_private *i915,
 	}
 }
 
+static int query_memregion_info(struct drm_i915_private *i915,
+				struct drm_i915_query_item *query_item)
+{
+	struct drm_i915_query_memory_regions __user *query_ptr =
+		u64_to_user_ptr(query_item->data_ptr);
+	struct drm_i915_memory_region_info __user *info_ptr =
+		&query_ptr->regions[0];
+	struct drm_i915_memory_region_info info = { };
+	struct drm_i915_query_memory_regions query;
+	struct intel_memory_region *mr;
+	u32 total_length;
+	int ret, id;
+
+	if (query_item->flags != 0)
+		return -EINVAL;
+
+	total_length = sizeof(query);
+	for_each_memory_region(mr, i915, id) {
+		if (mr->private)
+			continue;
+
+		total_length += sizeof(info);
+	}
+
+	ret = copy_query_item(&query, sizeof(query), total_length, query_item);
+	if (ret != 0)
+		return ret;
+
+	if (query.num_regions)
+		return -EINVAL;
+
+	if (query.pad)
+		return  -EINVAL;
+
+	for_each_memory_region(mr, i915, id) {
+		if (mr->private)
+			continue;
+
+		info.region.memory_class = mr->type;
+		info.region.memory_instance = mr->instance;
+		info.probed_size = mr->total;
+		info.unallocated_size = mr->avail;
+
+		if (__copy_to_user(info_ptr, &info, sizeof(info)))
+			return -EFAULT;
+
+		query.num_regions++;
+		info_ptr++;
+	}
+
+	if (__copy_to_user(query_ptr, &query, sizeof(query)))
+		return -EFAULT;
+
+	return total_length;
+}
+
 static int (* const i915_query_funcs[])(struct drm_i915_private *dev_priv,
 					struct drm_i915_query_item *query_item) = {
 	query_topology_info,
 	query_engine_info,
 	query_perf_config,
+	query_memregion_info,
 };
 
 int i915_query_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
diff --git a/drivers/gpu/drm/i915/intel_memory_region.h b/drivers/gpu/drm/i915/intel_memory_region.h
index 942fc4f68764..7cd8e3d66a7f 100644
--- a/drivers/gpu/drm/i915/intel_memory_region.h
+++ b/drivers/gpu/drm/i915/intel_memory_region.h
@@ -11,6 +11,7 @@
 #include <linux/mutex.h>
 #include <linux/io-mapping.h>
 #include <drm/drm_mm.h>
+#include <drm/i915_drm.h>
 
 #include "i915_buddy.h"
 
@@ -19,12 +20,9 @@ struct drm_i915_gem_object;
 struct intel_memory_region;
 struct sg_table;
 
-/**
- *  Base memory type
- */
 enum intel_memory_type {
-	INTEL_MEMORY_SYSTEM = 0,
-	INTEL_MEMORY_LOCAL,
+	INTEL_MEMORY_SYSTEM = I915_MEMORY_CLASS_SYSTEM,
+	INTEL_MEMORY_LOCAL = I915_MEMORY_CLASS_DEVICE,
 	INTEL_MEMORY_STOLEN_SYSTEM,
 	INTEL_MEMORY_STOLEN_LOCAL,
 };
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 6a34243a7646..c5e9c68c310d 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -2230,6 +2230,7 @@ struct drm_i915_query_item {
 #define DRM_I915_QUERY_TOPOLOGY_INFO    1
 #define DRM_I915_QUERY_ENGINE_INFO	2
 #define DRM_I915_QUERY_PERF_CONFIG      3
+#define DRM_I915_QUERY_MEMORY_REGIONS   4
 /* Must be kept compact -- no holes and well documented */
 
 	/**
@@ -2464,6 +2465,114 @@ struct drm_i915_query_perf_config {
 	__u8 data[];
 };
 
+/**
+ * enum drm_i915_gem_memory_class - Supported memory classes
+ */
+enum drm_i915_gem_memory_class {
+	/** @I915_MEMORY_CLASS_SYSTEM: System memory */
+	I915_MEMORY_CLASS_SYSTEM = 0,
+	/** @I915_MEMORY_CLASS_DEVICE: Device local-memory */
+	I915_MEMORY_CLASS_DEVICE,
+};
+
+/**
+ * struct drm_i915_gem_memory_class_instance - Identify particular memory region
+ */
+struct drm_i915_gem_memory_class_instance {
+	/** @memory_class: See enum drm_i915_gem_memory_class */
+	__u16 memory_class;
+
+	/** @memory_instance: Which instance */
+	__u16 memory_instance;
+};
+
+/**
+ * struct drm_i915_memory_region_info - Describes one region as known to the
+ * driver.
+ *
+ * Note that we reserve some stuff here for potential future work. As an example
+ * we might want expose the capabilities(see @caps) for a given region, which
+ * could include things like if the region is CPU mappable/accessible, what are
+ * the supported mapping types etc.
+ *
+ * Note this is using both struct drm_i915_query_item and struct drm_i915_query.
+ * For this new query we are adding the new query id DRM_I915_QUERY_MEMORY_REGIONS
+ * at &drm_i915_query_item.query_id.
+ */
+struct drm_i915_memory_region_info {
+	/** @region: The class:instance pair encoding */
+	struct drm_i915_gem_memory_class_instance region;
+
+	/** @pad: MBZ */
+	__u32 pad;
+
+	/** @caps: MBZ */
+	__u64 caps;
+
+	/** @probed_size: Memory probed by the driver (-1 = unknown) */
+	__u64 probed_size;
+
+	/** @unallocated_size: Estimate of memory remaining (-1 = unknown) */
+	__u64 unallocated_size;
+};
+
+/**
+ * struct drm_i915_query_memory_regions
+ *
+ * The region info query enumerates all regions known to the driver by filling
+ * in an array of struct drm_i915_memory_region_info structures.
+ *
+ * Example for getting the list of supported regions:
+ *
+ * .. code-block:: C
+ *
+ *	struct drm_i915_query_memory_regions *info;
+ *	struct drm_i915_query_item item = {
+ *		.query_id = DRM_I915_QUERY_MEMORY_REGIONS;
+ *	};
+ *	struct drm_i915_query query = {
+ *		.num_items = 1,
+ *		.items_ptr = (uintptr_t)&item,
+ *	};
+ *	int err, i;
+ *
+ *	// First query the size of the blob we need, this needs to be large
+ *	// enough to hold our array of regions. The kernel will fill out the
+ *	// item.length for us, which is the number of bytes we need.
+ *	err = ioctl(fd, DRM_IOCTL_I915_QUERY, &query);
+ *	if (err) ...
+ *
+ *	info = calloc(1, item.length);
+ *	// Now that we allocated the required number of bytes, we call the ioctl
+ *	// again, this time with the data_ptr pointing to our newly allocated
+ *	// blob, which the kernel can then populate with the all the region info.
+ *	item.data_ptr = (uintptr_t)&info,
+ *
+ *	err = ioctl(fd, DRM_IOCTL_I915_QUERY, &query);
+ *	if (err) ...
+ *
+ *	// We can now access each region in the array
+ *	for (i = 0; i < info->num_regions; i++) {
+ *		struct drm_i915_memory_region_info mr = info->regions[i];
+ *		u16 class = mr.region.class;
+ *		u16 instance = mr.region.instance;
+ *
+ *		....
+ *	}
+ *
+ *	free(info);
+ */
+struct drm_i915_query_memory_regions {
+	/** @num_regions: Number of supported regions */
+	__u32 num_regions;
+
+	/** @pad: MBZ */
+	__u32 pad;
+
+	/** @regions: Info about each supported region */
+	struct drm_i915_memory_region_info regions[];
+};
+
 #if defined(__cplusplus)
 }
 #endif
-- 
2.26.3

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 4/9] drm/i915: rework gem_create flow for upcoming extensions
  2021-04-26  9:38 [PATCH 1/9] drm/doc/rfc: i915 DG1 uAPI Matthew Auld
  2021-04-26  9:38 ` [PATCH 2/9] drm/i915: mark stolen as private Matthew Auld
  2021-04-26  9:38 ` [PATCH 3/9] drm/i915/query: Expose memory regions through the query uAPI Matthew Auld
@ 2021-04-26  9:38 ` Matthew Auld
  2021-04-26  9:38 ` [PATCH 5/9] drm/i915/uapi: introduce drm_i915_gem_create_ext Matthew Auld
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 22+ messages in thread
From: Matthew Auld @ 2021-04-26  9:38 UTC (permalink / raw)
  To: intel-gfx
  Cc: Lionel Landwerlin, Thomas Hellström, Jordan Justen,
	dri-devel, Kenneth Graunke, Daniele Ceraolo Spurio,
	Jon Bloomfield, Jason Ekstrand, mesa-dev, Daniel Vetter

With the upcoming gem_create_ext we want to be able create a "vanilla"
object upfront and pass that directly to the extensions, before actually
initialising the object. Functionally this should be the same expect we
now feed the object into the lower-level region specific init_object.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@linux.intel.com>
Cc: Jon Bloomfield <jon.bloomfield@intel.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Cc: Dave Airlie <airlied@gmail.com>
Cc: dri-devel@lists.freedesktop.org
Cc: mesa-dev@lists.freedesktop.org
---
 drivers/gpu/drm/i915/gem/i915_gem_create.c | 92 +++++++++++++++-------
 1 file changed, 65 insertions(+), 27 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_create.c b/drivers/gpu/drm/i915/gem/i915_gem_create.c
index 45d60e3d98e3..73f29224f5fe 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_create.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_create.c
@@ -7,41 +7,51 @@
 #include "gem/i915_gem_region.h"
 
 #include "i915_drv.h"
+#include "i915_trace.h"
+
+static int i915_gem_publish(struct drm_i915_gem_object *obj,
+			    struct drm_file *file,
+			    u64 *size_p,
+			    u32 *handle_p)
+{
+	u64 size = obj->base.size;
+	int ret;
+
+	ret = drm_gem_handle_create(file, &obj->base, handle_p);
+	/* drop reference from allocate - handle holds it now */
+	i915_gem_object_put(obj);
+	if (ret)
+		return ret;
+
+	*size_p = size;
+	return 0;
+}
 
 static int
-i915_gem_create(struct drm_file *file,
-		struct intel_memory_region *mr,
-		u64 *size_p,
-		u32 *handle_p)
+i915_gem_setup(struct drm_i915_gem_object *obj,
+	       struct intel_memory_region *mr,
+	       u64 size)
 {
-	struct drm_i915_gem_object *obj;
-	u32 handle;
-	u64 size;
 	int ret;
 
 	GEM_BUG_ON(!is_power_of_2(mr->min_page_size));
-	size = round_up(*size_p, mr->min_page_size);
+	size = round_up(size, mr->min_page_size);
 	if (size == 0)
 		return -EINVAL;
 
 	/* For most of the ABI (e.g. mmap) we think in system pages */
 	GEM_BUG_ON(!IS_ALIGNED(size, PAGE_SIZE));
 
-	/* Allocate the new object */
-	obj = i915_gem_object_create_region(mr, size, 0);
-	if (IS_ERR(obj))
-		return PTR_ERR(obj);
-
-	GEM_BUG_ON(size != obj->base.size);
+	if (i915_gem_object_size_2big(size))
+		return -E2BIG;
 
-	ret = drm_gem_handle_create(file, &obj->base, &handle);
-	/* drop reference from allocate - handle holds it now */
-	i915_gem_object_put(obj);
+	ret = mr->ops->init_object(mr, obj, size, 0);
 	if (ret)
 		return ret;
 
-	*handle_p = handle;
-	*size_p = size;
+	GEM_BUG_ON(size != obj->base.size);
+
+	trace_i915_gem_object_create(obj);
 	return 0;
 }
 
@@ -50,9 +60,11 @@ i915_gem_dumb_create(struct drm_file *file,
 		     struct drm_device *dev,
 		     struct drm_mode_create_dumb *args)
 {
+	struct drm_i915_gem_object *obj;
 	enum intel_memory_type mem_type;
 	int cpp = DIV_ROUND_UP(args->bpp, 8);
 	u32 format;
+	int ret;
 
 	switch (cpp) {
 	case 1:
@@ -85,10 +97,22 @@ i915_gem_dumb_create(struct drm_file *file,
 	if (HAS_LMEM(to_i915(dev)))
 		mem_type = INTEL_MEMORY_LOCAL;
 
-	return i915_gem_create(file,
-			       intel_memory_region_by_type(to_i915(dev),
-							   mem_type),
-			       &args->size, &args->handle);
+	obj = i915_gem_object_alloc();
+	if (!obj)
+		return -ENOMEM;
+
+	ret = i915_gem_setup(obj,
+			     intel_memory_region_by_type(to_i915(dev),
+							      mem_type),
+			     args->size);
+	if (ret)
+		goto object_free;
+
+	return i915_gem_publish(obj, file, &args->size, &args->handle);
+
+object_free:
+	i915_gem_object_free(obj);
+	return ret;
 }
 
 /**
@@ -103,11 +127,25 @@ i915_gem_create_ioctl(struct drm_device *dev, void *data,
 {
 	struct drm_i915_private *i915 = to_i915(dev);
 	struct drm_i915_gem_create *args = data;
+	struct drm_i915_gem_object *obj;
+	int ret;
 
 	i915_gem_flush_free_objects(i915);
 
-	return i915_gem_create(file,
-			       intel_memory_region_by_type(i915,
-							   INTEL_MEMORY_SYSTEM),
-			       &args->size, &args->handle);
+	obj = i915_gem_object_alloc();
+	if (!obj)
+		return -ENOMEM;
+
+	ret = i915_gem_setup(obj,
+			     intel_memory_region_by_type(i915,
+							 INTEL_MEMORY_SYSTEM),
+			     args->size);
+	if (ret)
+		goto object_free;
+
+	return i915_gem_publish(obj, file, &args->size, &args->handle);
+
+object_free:
+	i915_gem_object_free(obj);
+	return ret;
 }
-- 
2.26.3

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 5/9] drm/i915/uapi: introduce drm_i915_gem_create_ext
  2021-04-26  9:38 [PATCH 1/9] drm/doc/rfc: i915 DG1 uAPI Matthew Auld
                   ` (2 preceding siblings ...)
  2021-04-26  9:38 ` [PATCH 4/9] drm/i915: rework gem_create flow for upcoming extensions Matthew Auld
@ 2021-04-26  9:38 ` Matthew Auld
  2021-04-26  9:38 ` [PATCH 6/9] drm/i915/uapi: implement object placement extension Matthew Auld
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 22+ messages in thread
From: Matthew Auld @ 2021-04-26  9:38 UTC (permalink / raw)
  To: intel-gfx
  Cc: Lionel Landwerlin, Jordan Justen, Kenneth Graunke, dri-devel,
	CQ Tang, Daniele Ceraolo Spurio, Jason Ekstrand, mesa-dev,
	Daniel Vetter

Same old gem_create but with now with extensions support. This is needed
to support various upcoming usecases.

v2:(Chris)
    - Use separate ioctl number for gem_create_ext, instead of hijacking
      the existing gem_create ioctl, otherwise we run into the issue
      with being unable to detect if the kernel supports the new extension
      behaviour.
    - We now have gem_create_ext.flags, which should be zeroed.
    - I915_GEM_CREATE_EXT_SETPARAM value is now zero, since this is the
      index into our array of extensions.
    - Setup a "vanilla" object which we can directly apply our extensions
      to.
v3:(Daniel & Jason)
    - drop I915_GEM_CREATE_EXT_SETPARAM. Instead just have each extension
      do one thing only, instead of generic setparam which can cover
      various use cases.
    - add some kernel-doc.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: CQ Tang <cq.tang@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@linux.intel.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Cc: Dave Airlie <airlied@gmail.com>
Cc: dri-devel@lists.freedesktop.org
Cc: mesa-dev@lists.freedesktop.org
---
 drivers/gpu/drm/i915/gem/i915_gem_create.c | 56 ++++++++++++++++++++++
 drivers/gpu/drm/i915/gem/i915_gem_ioctls.h |  2 +
 drivers/gpu/drm/i915/i915_drv.c            |  1 +
 include/uapi/drm/i915_drm.h                | 42 ++++++++++++++++
 4 files changed, 101 insertions(+)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_create.c b/drivers/gpu/drm/i915/gem/i915_gem_create.c
index 73f29224f5fe..90e9eb6601b5 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_create.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_create.c
@@ -8,6 +8,7 @@
 
 #include "i915_drv.h"
 #include "i915_trace.h"
+#include "i915_user_extensions.h"
 
 static int i915_gem_publish(struct drm_i915_gem_object *obj,
 			    struct drm_file *file,
@@ -149,3 +150,58 @@ i915_gem_create_ioctl(struct drm_device *dev, void *data,
 	i915_gem_object_free(obj);
 	return ret;
 }
+
+struct create_ext {
+	struct drm_i915_private *i915;
+	struct drm_i915_gem_object *vanilla_object;
+};
+
+static const i915_user_extension_fn create_extensions[] = {
+};
+
+/**
+ * Creates a new mm object and returns a handle to it.
+ * @dev: drm device pointer
+ * @data: ioctl data blob
+ * @file: drm file pointer
+ */
+int
+i915_gem_create_ext_ioctl(struct drm_device *dev, void *data,
+			  struct drm_file *file)
+{
+	struct drm_i915_private *i915 = to_i915(dev);
+	struct drm_i915_gem_create_ext *args = data;
+	struct create_ext ext_data = { .i915 = i915 };
+	struct drm_i915_gem_object *obj;
+	int ret;
+
+	if (args->flags)
+		return -EINVAL;
+
+	i915_gem_flush_free_objects(i915);
+
+	obj = i915_gem_object_alloc();
+	if (!obj)
+		return -ENOMEM;
+
+	ext_data.vanilla_object = obj;
+	ret = i915_user_extensions(u64_to_user_ptr(args->extensions),
+				   create_extensions,
+				   ARRAY_SIZE(create_extensions),
+				   &ext_data);
+	if (ret)
+		goto object_free;
+
+	ret = i915_gem_setup(obj,
+			     intel_memory_region_by_type(i915,
+							 INTEL_MEMORY_SYSTEM),
+			     args->size);
+	if (ret)
+		goto object_free;
+
+	return i915_gem_publish(obj, file, &args->size, &args->handle);
+
+object_free:
+	i915_gem_object_free(obj);
+	return ret;
+}
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ioctls.h b/drivers/gpu/drm/i915/gem/i915_gem_ioctls.h
index 7fd22f3efbef..28d6526e32ab 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ioctls.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ioctls.h
@@ -14,6 +14,8 @@ int i915_gem_busy_ioctl(struct drm_device *dev, void *data,
 			struct drm_file *file);
 int i915_gem_create_ioctl(struct drm_device *dev, void *data,
 			  struct drm_file *file);
+int i915_gem_create_ext_ioctl(struct drm_device *dev, void *data,
+			      struct drm_file *file);
 int i915_gem_execbuffer2_ioctl(struct drm_device *dev, void *data,
 			       struct drm_file *file);
 int i915_gem_get_aperture_ioctl(struct drm_device *dev, void *data,
diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 785dcf20c77b..b5878c089830 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -1728,6 +1728,7 @@ static const struct drm_ioctl_desc i915_ioctls[] = {
 	DRM_IOCTL_DEF_DRV(I915_GEM_ENTERVT, drm_noop, DRM_AUTH|DRM_MASTER|DRM_ROOT_ONLY),
 	DRM_IOCTL_DEF_DRV(I915_GEM_LEAVEVT, drm_noop, DRM_AUTH|DRM_MASTER|DRM_ROOT_ONLY),
 	DRM_IOCTL_DEF_DRV(I915_GEM_CREATE, i915_gem_create_ioctl, DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(I915_GEM_CREATE_EXT, i915_gem_create_ext_ioctl, DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(I915_GEM_PREAD, i915_gem_pread_ioctl, DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(I915_GEM_PWRITE, i915_gem_pwrite_ioctl, DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(I915_GEM_MMAP, i915_gem_mmap_ioctl, DRM_RENDER_ALLOW),
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index c5e9c68c310d..47a47b87380f 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -406,6 +406,7 @@ typedef struct _drm_i915_sarea {
 #define DRM_I915_QUERY			0x39
 #define DRM_I915_GEM_VM_CREATE		0x3a
 #define DRM_I915_GEM_VM_DESTROY		0x3b
+#define DRM_I915_GEM_CREATE_EXT		0x3c
 /* Must be kept compact -- no holes */
 
 #define DRM_IOCTL_I915_INIT		DRM_IOW( DRM_COMMAND_BASE + DRM_I915_INIT, drm_i915_init_t)
@@ -438,6 +439,7 @@ typedef struct _drm_i915_sarea {
 #define DRM_IOCTL_I915_GEM_ENTERVT	DRM_IO(DRM_COMMAND_BASE + DRM_I915_GEM_ENTERVT)
 #define DRM_IOCTL_I915_GEM_LEAVEVT	DRM_IO(DRM_COMMAND_BASE + DRM_I915_GEM_LEAVEVT)
 #define DRM_IOCTL_I915_GEM_CREATE	DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_CREATE, struct drm_i915_gem_create)
+#define DRM_IOCTL_I915_GEM_CREATE_EXT	DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_CREATE_EXT, struct drm_i915_gem_create_ext)
 #define DRM_IOCTL_I915_GEM_PREAD	DRM_IOW (DRM_COMMAND_BASE + DRM_I915_GEM_PREAD, struct drm_i915_gem_pread)
 #define DRM_IOCTL_I915_GEM_PWRITE	DRM_IOW (DRM_COMMAND_BASE + DRM_I915_GEM_PWRITE, struct drm_i915_gem_pwrite)
 #define DRM_IOCTL_I915_GEM_MMAP		DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_MMAP, struct drm_i915_gem_mmap)
@@ -2573,6 +2575,46 @@ struct drm_i915_query_memory_regions {
 	struct drm_i915_memory_region_info regions[];
 };
 
+/**
+ * struct drm_i915_gem_create_ext - Existing gem_create behaviour, with added
+ * extension support using struct i915_user_extension.
+ *
+ * Note that in the future we want to have our buffer flags here, at least for
+ * the stuff that is immutable. Previously we would have two ioctls, one to
+ * create the object with gem_create, and another to apply various parameters,
+ * however this creates some ambiguity for the params which are considered
+ * immutable. Also in general we're phasing out the various SET/GET ioctls.
+ */
+struct drm_i915_gem_create_ext {
+	/**
+	 * @size: Requested size for the object.
+	 *
+	 * The (page-aligned) allocated size for the object will be returned.
+	 *
+	 */
+	__u64 size;
+	/**
+	 * @handle: Returned handle for the object.
+	 *
+	 * Object handles are nonzero.
+	 */
+	__u32 handle;
+	/** @flags: MBZ */
+	__u32 flags;
+	/**
+	 * @extensions: The chain of extensions to apply to this object.
+	 *
+	 * This will be useful in the future when we need to support several
+	 * different extensions, and we need to apply more than one when
+	 * creating the object. See struct i915_user_extension.
+	 *
+	 * If we don't supply any extensions then we get the same old gem_create
+	 * behaviour.
+	 *
+	 */
+	__u64 extensions;
+};
+
 #if defined(__cplusplus)
 }
 #endif
-- 
2.26.3

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 6/9] drm/i915/uapi: implement object placement extension
  2021-04-26  9:38 [PATCH 1/9] drm/doc/rfc: i915 DG1 uAPI Matthew Auld
                   ` (3 preceding siblings ...)
  2021-04-26  9:38 ` [PATCH 5/9] drm/i915/uapi: introduce drm_i915_gem_create_ext Matthew Auld
@ 2021-04-26  9:38 ` Matthew Auld
  2021-04-28 17:28   ` Kenneth Graunke
  2021-04-26  9:38 ` [PATCH 7/9] drm/i915/lmem: support optional CPU clearing for special internal use Matthew Auld
                   ` (7 subsequent siblings)
  12 siblings, 1 reply; 22+ messages in thread
From: Matthew Auld @ 2021-04-26  9:38 UTC (permalink / raw)
  To: intel-gfx
  Cc: Lionel Landwerlin, Jordan Justen, Kenneth Graunke, dri-devel,
	CQ Tang, Daniele Ceraolo Spurio, Jason Ekstrand, mesa-dev,
	Daniel Vetter

Add new extension to support setting an immutable-priority-list of
potential placements, at creation time.

If we use the normal gem_create or gem_create_ext without the
extensions/placements then we still get the old behaviour with only
placing the object in system memory.

v2(Daniel & Jason):
    - Add a bunch of kernel-doc
    - Simplify design for placements extension

Testcase: igt/gem_create/create-ext-placement-sanity-check
Testcase: igt/gem_create/create-ext-placement-each
Testcase: igt/gem_create/create-ext-placement-all
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: CQ Tang <cq.tang@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@linux.intel.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Cc: Dave Airlie <airlied@gmail.com>
Cc: dri-devel@lists.freedesktop.org
Cc: mesa-dev@lists.freedesktop.org
---
 drivers/gpu/drm/i915/gem/i915_gem_create.c    | 215 ++++++++++++++++--
 drivers/gpu/drm/i915/gem/i915_gem_object.c    |   3 +
 .../gpu/drm/i915/gem/i915_gem_object_types.h  |   6 +
 .../drm/i915/gem/selftests/i915_gem_mman.c    |  26 +++
 drivers/gpu/drm/i915/intel_memory_region.c    |  16 ++
 drivers/gpu/drm/i915/intel_memory_region.h    |   4 +
 include/uapi/drm/i915_drm.h                   |  62 +++++
 7 files changed, 315 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_create.c b/drivers/gpu/drm/i915/gem/i915_gem_create.c
index 90e9eb6601b5..895f1666a8d3 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_create.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_create.c
@@ -4,12 +4,47 @@
  */
 
 #include "gem/i915_gem_ioctls.h"
+#include "gem/i915_gem_lmem.h"
 #include "gem/i915_gem_region.h"
 
 #include "i915_drv.h"
 #include "i915_trace.h"
 #include "i915_user_extensions.h"
 
+static u32 object_max_page_size(struct drm_i915_gem_object *obj)
+{
+	u32 max_page_size = 0;
+	int i;
+
+	for (i = 0; i < obj->mm.n_placements; i++) {
+		struct intel_memory_region *mr = obj->mm.placements[i];
+
+		GEM_BUG_ON(!is_power_of_2(mr->min_page_size));
+		max_page_size = max_t(u32, max_page_size, mr->min_page_size);
+	}
+
+	GEM_BUG_ON(!max_page_size);
+	return max_page_size;
+}
+
+static void object_set_placements(struct drm_i915_gem_object *obj,
+				  struct intel_memory_region **placements,
+				  unsigned int n_placements)
+{
+	GEM_BUG_ON(!n_placements);
+
+	if (n_placements == 1) {
+		struct intel_memory_region *mr = placements[0];
+		struct drm_i915_private *i915 = mr->i915;
+
+		obj->mm.placements = &i915->mm.regions[mr->id];
+		obj->mm.n_placements = 1;
+	} else {
+		obj->mm.placements = placements;
+		obj->mm.n_placements = n_placements;
+	}
+}
+
 static int i915_gem_publish(struct drm_i915_gem_object *obj,
 			    struct drm_file *file,
 			    u64 *size_p,
@@ -29,14 +64,12 @@ static int i915_gem_publish(struct drm_i915_gem_object *obj,
 }
 
 static int
-i915_gem_setup(struct drm_i915_gem_object *obj,
-	       struct intel_memory_region *mr,
-	       u64 size)
+i915_gem_setup(struct drm_i915_gem_object *obj, u64 size)
 {
+	struct intel_memory_region *mr = obj->mm.placements[0];
 	int ret;
 
-	GEM_BUG_ON(!is_power_of_2(mr->min_page_size));
-	size = round_up(size, mr->min_page_size);
+	size = round_up(size, object_max_page_size(obj));
 	if (size == 0)
 		return -EINVAL;
 
@@ -53,6 +86,7 @@ i915_gem_setup(struct drm_i915_gem_object *obj,
 	GEM_BUG_ON(size != obj->base.size);
 
 	trace_i915_gem_object_create(obj);
+
 	return 0;
 }
 
@@ -62,6 +96,7 @@ i915_gem_dumb_create(struct drm_file *file,
 		     struct drm_mode_create_dumb *args)
 {
 	struct drm_i915_gem_object *obj;
+	struct intel_memory_region *mr;
 	enum intel_memory_type mem_type;
 	int cpp = DIV_ROUND_UP(args->bpp, 8);
 	u32 format;
@@ -102,10 +137,10 @@ i915_gem_dumb_create(struct drm_file *file,
 	if (!obj)
 		return -ENOMEM;
 
-	ret = i915_gem_setup(obj,
-			     intel_memory_region_by_type(to_i915(dev),
-							      mem_type),
-			     args->size);
+	mr = intel_memory_region_by_type(to_i915(dev), mem_type);
+	object_set_placements(obj, &mr, 1);
+
+	ret = i915_gem_setup(obj, args->size);
 	if (ret)
 		goto object_free;
 
@@ -129,6 +164,7 @@ i915_gem_create_ioctl(struct drm_device *dev, void *data,
 	struct drm_i915_private *i915 = to_i915(dev);
 	struct drm_i915_gem_create *args = data;
 	struct drm_i915_gem_object *obj;
+	struct intel_memory_region *mr;
 	int ret;
 
 	i915_gem_flush_free_objects(i915);
@@ -137,10 +173,10 @@ i915_gem_create_ioctl(struct drm_device *dev, void *data,
 	if (!obj)
 		return -ENOMEM;
 
-	ret = i915_gem_setup(obj,
-			     intel_memory_region_by_type(i915,
-							 INTEL_MEMORY_SYSTEM),
-			     args->size);
+	mr = intel_memory_region_by_type(i915, INTEL_MEMORY_SYSTEM);
+	object_set_placements(obj, &mr, 1);
+
+	ret = i915_gem_setup(obj, args->size);
 	if (ret)
 		goto object_free;
 
@@ -156,7 +192,144 @@ struct create_ext {
 	struct drm_i915_gem_object *vanilla_object;
 };
 
+static void repr_placements(char *buf, size_t size,
+			    struct intel_memory_region **placements,
+			    int n_placements)
+{
+	int i;
+
+	buf[0] = '\0';
+
+	for (i = 0; i < n_placements; i++) {
+		struct intel_memory_region *mr = placements[i];
+		int r;
+
+		r = snprintf(buf, size, "\n  %s -> { class: %d, inst: %d }",
+			     mr->name, mr->type, mr->instance);
+		if (r >= size)
+			return;
+
+		buf += r;
+		size -= r;
+	}
+}
+
+static int set_placements(struct drm_i915_gem_create_ext_memory_regions *args,
+			  struct create_ext *ext_data)
+{
+	struct drm_i915_private *i915 = ext_data->i915;
+	struct drm_i915_gem_memory_class_instance __user *uregions =
+		u64_to_user_ptr(args->regions);
+	struct drm_i915_gem_object *obj = ext_data->vanilla_object;
+	struct intel_memory_region **placements;
+	u32 mask;
+	int i, ret = 0;
+
+	if (args->pad) {
+		drm_dbg(&i915->drm, "pad should be zero\n");
+		ret = -EINVAL;
+	}
+
+	if (!args->num_regions) {
+		drm_dbg(&i915->drm, "num_regions is zero\n");
+		ret = -EINVAL;
+	}
+
+	if (args->num_regions > ARRAY_SIZE(i915->mm.regions)) {
+		drm_dbg(&i915->drm, "num_regions is too large\n");
+		ret = -EINVAL;
+	}
+
+	if (ret)
+		return ret;
+
+	placements = kmalloc_array(args->num_regions,
+				   sizeof(struct intel_memory_region *),
+				   GFP_KERNEL);
+	if (!placements)
+		return -ENOMEM;
+
+	mask = 0;
+	for (i = 0; i < args->num_regions; i++) {
+		struct drm_i915_gem_memory_class_instance region;
+		struct intel_memory_region *mr;
+
+		if (copy_from_user(&region, uregions, sizeof(region))) {
+			ret = -EFAULT;
+			goto out_free;
+		}
+
+		mr = intel_memory_region_lookup(i915,
+						region.memory_class,
+						region.memory_instance);
+		if (!mr || mr->private) {
+			drm_dbg(&i915->drm, "Device is missing region { class: %d, inst: %d } at index = %d\n",
+				region.memory_class, region.memory_instance, i);
+			ret = -EINVAL;
+			goto out_dump;
+		}
+
+		if (mask & BIT(mr->id)) {
+			drm_dbg(&i915->drm, "Found duplicate placement %s -> { class: %d, inst: %d } at index = %d\n",
+				mr->name, region.memory_class,
+				region.memory_instance, i);
+			ret = -EINVAL;
+			goto out_dump;
+		}
+
+		placements[i] = mr;
+		mask |= BIT(mr->id);
+
+		++uregions;
+	}
+
+	if (obj->mm.placements) {
+		ret = -EINVAL;
+		goto out_dump;
+	}
+
+	object_set_placements(obj, placements, args->num_regions);
+	if (args->num_regions == 1)
+		kfree(placements);
+
+	return 0;
+
+out_dump:
+	if (1) {
+		char buf[256];
+
+		if (obj->mm.placements) {
+			repr_placements(buf,
+					sizeof(buf),
+					obj->mm.placements,
+					obj->mm.n_placements);
+			drm_dbg(&i915->drm,
+				"Placements were already set in previous EXT. Existing placements: %s\n",
+				buf);
+		}
+
+		repr_placements(buf, sizeof(buf), placements, i);
+		drm_dbg(&i915->drm, "New placements(so far validated): %s\n", buf);
+	}
+
+out_free:
+	kfree(placements);
+	return ret;
+}
+
+static int ext_set_placements(struct i915_user_extension __user *base,
+			      void *data)
+{
+	struct drm_i915_gem_create_ext_memory_regions ext;
+
+	if (copy_from_user(&ext, base, sizeof(ext)))
+		return -EFAULT;
+
+	return set_placements(&ext, data);
+}
+
 static const i915_user_extension_fn create_extensions[] = {
+	[I915_GEM_CREATE_EXT_MEMORY_REGIONS] = ext_set_placements,
 };
 
 /**
@@ -172,6 +345,7 @@ i915_gem_create_ext_ioctl(struct drm_device *dev, void *data,
 	struct drm_i915_private *i915 = to_i915(dev);
 	struct drm_i915_gem_create_ext *args = data;
 	struct create_ext ext_data = { .i915 = i915 };
+	struct intel_memory_region **placements_ext;
 	struct drm_i915_gem_object *obj;
 	int ret;
 
@@ -189,19 +363,26 @@ i915_gem_create_ext_ioctl(struct drm_device *dev, void *data,
 				   create_extensions,
 				   ARRAY_SIZE(create_extensions),
 				   &ext_data);
+	placements_ext = obj->mm.placements;
 	if (ret)
 		goto object_free;
 
-	ret = i915_gem_setup(obj,
-			     intel_memory_region_by_type(i915,
-							 INTEL_MEMORY_SYSTEM),
-			     args->size);
+	if (!placements_ext) {
+		struct intel_memory_region *mr =
+			intel_memory_region_by_type(i915, INTEL_MEMORY_SYSTEM);
+
+		object_set_placements(obj, &mr, 1);
+	}
+
+	ret = i915_gem_setup(obj, args->size);
 	if (ret)
 		goto object_free;
 
 	return i915_gem_publish(obj, file, &args->size, &args->handle);
 
 object_free:
+	if (obj->mm.n_placements > 1)
+		kfree(placements_ext);
 	i915_gem_object_free(obj);
 	return ret;
 }
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c
index ea74cbca95be..28144410df86 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
@@ -249,6 +249,9 @@ static void __i915_gem_free_objects(struct drm_i915_private *i915,
 		if (obj->ops->release)
 			obj->ops->release(obj);
 
+		if (obj->mm.n_placements > 1)
+			kfree(obj->mm.placements);
+
 		/* But keep the pointer alive for RCU-protected lookups */
 		call_rcu(&obj->rcu, __i915_gem_free_object_rcu);
 		cond_resched();
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
index 8e485cb3343c..69d6e54bc569 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
@@ -219,6 +219,12 @@ struct drm_i915_gem_object {
 		atomic_t pages_pin_count;
 		atomic_t shrink_pin;
 
+		/**
+		 * Priority list of potential placements for this object.
+		 */
+		struct intel_memory_region **placements;
+		int n_placements;
+
 		/**
 		 * Memory region for this object.
 		 */
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
index 5cf6df49c333..05a3b29f545e 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
@@ -842,6 +842,24 @@ static bool can_mmap(struct drm_i915_gem_object *obj, enum i915_mmap_type type)
 	return true;
 }
 
+static void object_set_placements(struct drm_i915_gem_object *obj,
+				  struct intel_memory_region **placements,
+				  unsigned int n_placements)
+{
+	GEM_BUG_ON(!n_placements);
+
+	if (n_placements == 1) {
+		struct drm_i915_private *i915 = to_i915(obj->base.dev);
+		struct intel_memory_region *mr = placements[0];
+
+		obj->mm.placements = &i915->mm.regions[mr->id];
+		obj->mm.n_placements = 1;
+	} else {
+		obj->mm.placements = placements;
+		obj->mm.n_placements = n_placements;
+	}
+}
+
 #define expand32(x) (((x) << 0) | ((x) << 8) | ((x) << 16) | ((x) << 24))
 static int __igt_mmap(struct drm_i915_private *i915,
 		      struct drm_i915_gem_object *obj,
@@ -950,6 +968,8 @@ static int igt_mmap(void *arg)
 			if (IS_ERR(obj))
 				return PTR_ERR(obj);
 
+			object_set_placements(obj, &mr, 1);
+
 			err = __igt_mmap(i915, obj, I915_MMAP_TYPE_GTT);
 			if (err == 0)
 				err = __igt_mmap(i915, obj, I915_MMAP_TYPE_WC);
@@ -1068,6 +1088,8 @@ static int igt_mmap_access(void *arg)
 		if (IS_ERR(obj))
 			return PTR_ERR(obj);
 
+		object_set_placements(obj, &mr, 1);
+
 		err = __igt_mmap_access(i915, obj, I915_MMAP_TYPE_GTT);
 		if (err == 0)
 			err = __igt_mmap_access(i915, obj, I915_MMAP_TYPE_WB);
@@ -1211,6 +1233,8 @@ static int igt_mmap_gpu(void *arg)
 		if (IS_ERR(obj))
 			return PTR_ERR(obj);
 
+		object_set_placements(obj, &mr, 1);
+
 		err = __igt_mmap_gpu(i915, obj, I915_MMAP_TYPE_GTT);
 		if (err == 0)
 			err = __igt_mmap_gpu(i915, obj, I915_MMAP_TYPE_WC);
@@ -1354,6 +1378,8 @@ static int igt_mmap_revoke(void *arg)
 		if (IS_ERR(obj))
 			return PTR_ERR(obj);
 
+		object_set_placements(obj, &mr, 1);
+
 		err = __igt_mmap_revoke(i915, obj, I915_MMAP_TYPE_GTT);
 		if (err == 0)
 			err = __igt_mmap_revoke(i915, obj, I915_MMAP_TYPE_WC);
diff --git a/drivers/gpu/drm/i915/intel_memory_region.c b/drivers/gpu/drm/i915/intel_memory_region.c
index 481a487faca6..d98e8b81d322 100644
--- a/drivers/gpu/drm/i915/intel_memory_region.c
+++ b/drivers/gpu/drm/i915/intel_memory_region.c
@@ -28,6 +28,22 @@ static const struct {
 	},
 };
 
+struct intel_memory_region *
+intel_memory_region_lookup(struct drm_i915_private *i915,
+			   u16 class, u16 instance)
+{
+	struct intel_memory_region *mr;
+	int id;
+
+	/* XXX: consider maybe converting to an rb tree at some point */
+	for_each_memory_region(mr, i915, id) {
+		if (mr->type == class && mr->instance == instance)
+			return mr;
+	}
+
+	return NULL;
+}
+
 struct intel_memory_region *
 intel_memory_region_by_type(struct drm_i915_private *i915,
 			    enum intel_memory_type mem_type)
diff --git a/drivers/gpu/drm/i915/intel_memory_region.h b/drivers/gpu/drm/i915/intel_memory_region.h
index 7cd8e3d66a7f..d24ce5a0b30b 100644
--- a/drivers/gpu/drm/i915/intel_memory_region.h
+++ b/drivers/gpu/drm/i915/intel_memory_region.h
@@ -97,6 +97,10 @@ struct intel_memory_region {
 	} objects;
 };
 
+struct intel_memory_region *
+intel_memory_region_lookup(struct drm_i915_private *i915,
+			   u16 class, u16 instance);
+
 int intel_memory_region_init_buddy(struct intel_memory_region *mem);
 void intel_memory_region_release_buddy(struct intel_memory_region *mem);
 
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 47a47b87380f..d025f7da2735 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -2591,6 +2591,11 @@ struct drm_i915_gem_create_ext {
 	 *
 	 * The (page-aligned) allocated size for the object will be returned.
 	 *
+	 * Note that for some devices we have might have further minimum
+	 * page-size restrictions(larger than 4K), like for device local-memory.
+	 * However in general the final size here should always reflect any
+	 * rounding up, if for example using the I915_GEM_CREATE_EXT_MEMORY_REGIONS
+	 * extension to place the object in device local-memory.
 	 */
 	__u64 size;
 	/**
@@ -2611,10 +2616,67 @@ struct drm_i915_gem_create_ext {
 	 * If we don't supply any extensions then we get the same old gem_create
 	 * behaviour.
 	 *
+	 * For I915_GEM_CREATE_EXT_MEMORY_REGIONS usage see
+	 * struct drm_i915_gem_create_ext_memory_regions.
 	 */
+#define I915_GEM_CREATE_EXT_MEMORY_REGIONS 0
 	__u64 extensions;
 };
 
+/**
+ * struct drm_i915_gem_create_ext_memory_regions - The
+ * I915_GEM_CREATE_EXT_MEMORY_REGIONS extension.
+ *
+ * Set the object with the desired set of placements/regions in priority
+ * order. Each entry must be unique and supported by the device.
+ *
+ * This is provided as an array of struct drm_i915_gem_memory_class_instance, or
+ * an equivalent layout of class:instance pair encodings. See struct
+ * drm_i915_query_memory_regions and DRM_I915_QUERY_MEMORY_REGIONS for how to
+ * query the supported regions for a device.
+ *
+ * As an example, on discrete devices, if we wish to set the placement as
+ * device local-memory we can do something like:
+ *
+ * .. code-block:: C
+ *
+ *	struct drm_i915_gem_memory_class_instance region_lmem = {
+ *              .memory_class = I915_MEMORY_CLASS_DEVICE,
+ *              .memory_instance = 0,
+ *      };
+ *      struct drm_i915_gem_create_ext_memory_regions regions = {
+ *              .base = { .name = I915_GEM_CREATE_EXT_MEMORY_REGIONS },
+ *              .regions = (uintptr_t)&region_lmem,
+ *              .num_regions = 1,
+ *      };
+ *      struct drm_i915_gem_create_ext create_ext = {
+ *              .size = 16 * PAGE_SIZE,
+ *              .extensions = (uintptr_t)&regions,
+ *      };
+ *
+ *      int err = ioctl(fd, DRM_IOCTL_I915_GEM_CREATE_EXT, &create_ext);
+ *      if (err) ...
+ *
+ * At which point we get the object handle in &drm_i915_gem_create_ext.handle,
+ * along with the final object size in &drm_i915_gem_create_ext.size, which
+ * should account for any rounding up, if required.
+ */
+struct drm_i915_gem_create_ext_memory_regions {
+	/** @base: Extension link. See struct i915_user_extension. */
+	struct i915_user_extension base;
+
+	/** @pad: MBZ */
+	__u32 pad;
+	/** @num_regions: Number of elements in the @regions array. */
+	__u32 num_regions;
+	/**
+	 * @regions: The regions/placements array.
+	 *
+	 * An array of struct drm_i915_gem_memory_class_instance.
+	 */
+	__u64 regions;
+};
+
 #if defined(__cplusplus)
 }
 #endif
-- 
2.26.3

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 7/9] drm/i915/lmem: support optional CPU clearing for special internal use
  2021-04-26  9:38 [PATCH 1/9] drm/doc/rfc: i915 DG1 uAPI Matthew Auld
                   ` (4 preceding siblings ...)
  2021-04-26  9:38 ` [PATCH 6/9] drm/i915/uapi: implement object placement extension Matthew Auld
@ 2021-04-26  9:38 ` Matthew Auld
  2021-04-26  9:39 ` [PATCH 8/9] drm/i915/gem: clear userspace buffers for LMEM Matthew Auld
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 22+ messages in thread
From: Matthew Auld @ 2021-04-26  9:38 UTC (permalink / raw)
  To: intel-gfx
  Cc: Lionel Landwerlin, Thomas Hellström, Jordan Justen,
	dri-devel, Kenneth Graunke, Daniele Ceraolo Spurio,
	Jon Bloomfield, Jason Ekstrand, mesa-dev, Daniel Vetter

For some internal device local-memory objects it would be useful to have
an option to CPU clear the pages upon gathering the backing store. Note
that this might be before the blitter is useable, which is the case for
some internal GuC objects.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@linux.intel.com>
Cc: Jon Bloomfield <jon.bloomfield@intel.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Cc: Dave Airlie <airlied@gmail.com>
Cc: dri-devel@lists.freedesktop.org
Cc: mesa-dev@lists.freedesktop.org
---
 .../gpu/drm/i915/gem/i915_gem_object_types.h  |  8 +-
 drivers/gpu/drm/i915/gem/i915_gem_region.c    | 22 +++++
 .../drm/i915/selftests/intel_memory_region.c  | 87 ++++++++++++++++++-
 3 files changed, 113 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
index 69d6e54bc569..0727d0c76aa0 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
@@ -172,11 +172,13 @@ struct drm_i915_gem_object {
 #define I915_BO_ALLOC_CONTIGUOUS BIT(0)
 #define I915_BO_ALLOC_VOLATILE   BIT(1)
 #define I915_BO_ALLOC_STRUCT_PAGE BIT(2)
+#define I915_BO_ALLOC_CPU_CLEAR  BIT(3)
 #define I915_BO_ALLOC_FLAGS (I915_BO_ALLOC_CONTIGUOUS | \
 			     I915_BO_ALLOC_VOLATILE | \
-			     I915_BO_ALLOC_STRUCT_PAGE)
-#define I915_BO_READONLY         BIT(3)
-#define I915_TILING_QUIRK_BIT    4 /* unknown swizzling; do not release! */
+			     I915_BO_ALLOC_STRUCT_PAGE | \
+			     I915_BO_ALLOC_CPU_CLEAR)
+#define I915_BO_READONLY         BIT(4)
+#define I915_TILING_QUIRK_BIT    5 /* unknown swizzling; do not release! */
 
 	/*
 	 * Is the object to be mapped as read-only to the GPU
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_region.c b/drivers/gpu/drm/i915/gem/i915_gem_region.c
index 6a84fb6dde24..5d603098da57 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_region.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_region.c
@@ -95,6 +95,28 @@ i915_gem_object_get_pages_buddy(struct drm_i915_gem_object *obj)
 	sg_mark_end(sg);
 	i915_sg_trim(st);
 
+	/* Intended for kernel internal use only */
+	if (obj->flags & I915_BO_ALLOC_CPU_CLEAR) {
+		struct scatterlist *sg;
+		unsigned long i;
+
+		for_each_sg(st->sgl, sg, st->nents, i) {
+			unsigned int length;
+			void __iomem *vaddr;
+			dma_addr_t daddr;
+
+			daddr = sg_dma_address(sg);
+			daddr -= mem->region.start;
+			length = sg_dma_len(sg);
+
+			vaddr = io_mapping_map_wc(&mem->iomap, daddr, length);
+			memset64(vaddr, 0, length / sizeof(u64));
+			io_mapping_unmap(vaddr);
+		}
+
+		wmb();
+	}
+
 	__i915_gem_object_set_pages(obj, st, sg_page_sizes);
 
 	return 0;
diff --git a/drivers/gpu/drm/i915/selftests/intel_memory_region.c b/drivers/gpu/drm/i915/selftests/intel_memory_region.c
index a5fc0bf3feb9..0fe4c81f7589 100644
--- a/drivers/gpu/drm/i915/selftests/intel_memory_region.c
+++ b/drivers/gpu/drm/i915/selftests/intel_memory_region.c
@@ -513,7 +513,7 @@ static int igt_cpu_check(struct drm_i915_gem_object *obj, u32 dword, u32 val)
 	if (err)
 		return err;
 
-	ptr = i915_gem_object_pin_map_unlocked(obj, I915_MAP_WC);
+	ptr = i915_gem_object_pin_map(obj, I915_MAP_WC);
 	if (IS_ERR(ptr))
 		return PTR_ERR(ptr);
 
@@ -593,7 +593,9 @@ static int igt_gpu_write(struct i915_gem_context *ctx,
 		if (err)
 			break;
 
+		i915_gem_object_lock(obj, NULL);
 		err = igt_cpu_check(obj, dword, rng);
+		i915_gem_object_unlock(obj);
 		if (err)
 			break;
 	} while (!__igt_timeout(end_time, NULL));
@@ -629,6 +631,88 @@ static int igt_lmem_create(void *arg)
 	return err;
 }
 
+static int igt_lmem_create_cleared_cpu(void *arg)
+{
+	struct drm_i915_private *i915 = arg;
+	I915_RND_STATE(prng);
+	IGT_TIMEOUT(end_time);
+	u32 size, i;
+	int err;
+
+	i915_gem_drain_freed_objects(i915);
+
+	size = max_t(u32, PAGE_SIZE, i915_prandom_u32_max_state(SZ_32M, &prng));
+	size = round_up(size, PAGE_SIZE);
+	i = 0;
+
+	do {
+		struct drm_i915_gem_object *obj;
+		void __iomem *vaddr;
+		unsigned int flags;
+		u32 dword, val;
+
+		/*
+		 * Alternate between cleared and uncleared allocations, while
+		 * also dirtying the pages each time to check that the pages are
+		 * always cleared if requested, since we should get some overlap
+		 * of the underlying pages, if not all, since we are the only
+		 * user.
+		 */
+
+		flags = I915_BO_ALLOC_CPU_CLEAR;
+		if (i & 1)
+			flags = 0;
+
+		obj = i915_gem_object_create_lmem(i915, size, flags);
+		if (IS_ERR(obj))
+			return PTR_ERR(obj);
+
+		i915_gem_object_lock(obj, NULL);
+		err = i915_gem_object_pin_pages(obj);
+		if (err)
+			goto out_put;
+
+		dword = i915_prandom_u32_max_state(PAGE_SIZE / sizeof(u32),
+						   &prng);
+
+		if (flags & I915_BO_ALLOC_CPU_CLEAR) {
+			err = igt_cpu_check(obj, dword, 0);
+			if (err) {
+				pr_err("%s failed with size=%u, flags=%u\n",
+				       __func__, size, flags);
+				goto out_unpin;
+			}
+		}
+
+		vaddr = i915_gem_object_pin_map(obj, I915_MAP_WC);
+		if (IS_ERR(vaddr)) {
+			err = PTR_ERR(vaddr);
+			goto out_unpin;
+		}
+
+		val = prandom_u32_state(&prng);
+
+		memset32(vaddr, val, obj->base.size / sizeof(u32));
+
+		i915_gem_object_flush_map(obj);
+		i915_gem_object_unpin_map(obj);
+out_unpin:
+		i915_gem_object_unpin_pages(obj);
+		__i915_gem_object_put_pages(obj);
+out_put:
+		i915_gem_object_unlock(obj);
+		i915_gem_object_put(obj);
+
+		if (err)
+			break;
+		++i;
+	} while (!__igt_timeout(end_time, NULL));
+
+	pr_info("%s completed (%u) iterations\n", __func__, i);
+
+	return err;
+}
+
 static int igt_lmem_write_gpu(void *arg)
 {
 	struct drm_i915_private *i915 = arg;
@@ -1043,6 +1127,7 @@ int intel_memory_region_live_selftests(struct drm_i915_private *i915)
 {
 	static const struct i915_subtest tests[] = {
 		SUBTEST(igt_lmem_create),
+		SUBTEST(igt_lmem_create_cleared_cpu),
 		SUBTEST(igt_lmem_write_cpu),
 		SUBTEST(igt_lmem_write_gpu),
 	};
-- 
2.26.3

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 8/9] drm/i915/gem: clear userspace buffers for LMEM
  2021-04-26  9:38 [PATCH 1/9] drm/doc/rfc: i915 DG1 uAPI Matthew Auld
                   ` (5 preceding siblings ...)
  2021-04-26  9:38 ` [PATCH 7/9] drm/i915/lmem: support optional CPU clearing for special internal use Matthew Auld
@ 2021-04-26  9:39 ` Matthew Auld
  2021-04-26  9:39 ` [PATCH 9/9] drm/i915/gem: hide new uAPI behind CONFIG_BROKEN Matthew Auld
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 22+ messages in thread
From: Matthew Auld @ 2021-04-26  9:39 UTC (permalink / raw)
  To: intel-gfx
  Cc: Lionel Landwerlin, Thomas Hellström, Jordan Justen,
	dri-devel, Kenneth Graunke, Daniele Ceraolo Spurio,
	Jon Bloomfield, Jason Ekstrand, mesa-dev, Daniel Vetter

All userspace objects must be cleared when allocating the backing store,
before they are potentially visible to userspace.  For now use simple
CPU based clearing to do this for device local-memory objects, note that
in the near future this will instead use the blitter engine.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@linux.intel.com>
Cc: Jon Bloomfield <jon.bloomfield@intel.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Cc: Dave Airlie <airlied@gmail.com>
Cc: dri-devel@lists.freedesktop.org
Cc: mesa-dev@lists.freedesktop.org
---
 drivers/gpu/drm/i915/gem/i915_gem_create.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_create.c b/drivers/gpu/drm/i915/gem/i915_gem_create.c
index 895f1666a8d3..338f3883e238 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_create.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_create.c
@@ -67,6 +67,7 @@ static int
 i915_gem_setup(struct drm_i915_gem_object *obj, u64 size)
 {
 	struct intel_memory_region *mr = obj->mm.placements[0];
+	unsigned int flags;
 	int ret;
 
 	size = round_up(size, object_max_page_size(obj));
@@ -79,7 +80,16 @@ i915_gem_setup(struct drm_i915_gem_object *obj, u64 size)
 	if (i915_gem_object_size_2big(size))
 		return -E2BIG;
 
-	ret = mr->ops->init_object(mr, obj, size, 0);
+	/*
+	 * For now resort to CPU based clearing for device local-memory, in the
+	 * near future this will use the blitter engine for accelerated, GPU
+	 * based clearing.
+	 */
+	flags = 0;
+	if (mr->type == INTEL_MEMORY_LOCAL)
+		flags = I915_BO_ALLOC_CPU_CLEAR;
+
+	ret = mr->ops->init_object(mr, obj, size, flags);
 	if (ret)
 		return ret;
 
-- 
2.26.3

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 9/9] drm/i915/gem: hide new uAPI behind CONFIG_BROKEN
  2021-04-26  9:38 [PATCH 1/9] drm/doc/rfc: i915 DG1 uAPI Matthew Auld
                   ` (6 preceding siblings ...)
  2021-04-26  9:39 ` [PATCH 8/9] drm/i915/gem: clear userspace buffers for LMEM Matthew Auld
@ 2021-04-26  9:39 ` Matthew Auld
  2021-04-26 15:11 ` [PATCH 1/9] drm/doc/rfc: i915 DG1 uAPI Jason Ekstrand
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 22+ messages in thread
From: Matthew Auld @ 2021-04-26  9:39 UTC (permalink / raw)
  To: intel-gfx
  Cc: Lionel Landwerlin, Thomas Hellström, Jordan Justen,
	dri-devel, Kenneth Graunke, Daniele Ceraolo Spurio,
	Jon Bloomfield, Jason Ekstrand, mesa-dev, Daniel Vetter

Treat it the same as the fake local-memory stuff, where it is disabled
for normal kernels, in case some random UMD is tempted to use this. Once
we have all the other bits and pieces in place, like the TTM conversion,
we can turn this on for real.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@linux.intel.com>
Cc: Jon Bloomfield <jon.bloomfield@intel.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Cc: Dave Airlie <airlied@gmail.com>
Cc: dri-devel@lists.freedesktop.org
Cc: mesa-dev@lists.freedesktop.org
---
 drivers/gpu/drm/i915/gem/i915_gem_create.c | 3 +++
 drivers/gpu/drm/i915/i915_query.c          | 3 +++
 2 files changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_create.c b/drivers/gpu/drm/i915/gem/i915_gem_create.c
index 338f3883e238..1d0728b878d5 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_create.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_create.c
@@ -332,6 +332,9 @@ static int ext_set_placements(struct i915_user_extension __user *base,
 {
 	struct drm_i915_gem_create_ext_memory_regions ext;
 
+	if (!IS_ENABLED(CONFIG_DRM_I915_UNSTABLE_FAKE_LMEM))
+		return -ENODEV;
+
 	if (copy_from_user(&ext, base, sizeof(ext)))
 		return -EFAULT;
 
diff --git a/drivers/gpu/drm/i915/i915_query.c b/drivers/gpu/drm/i915/i915_query.c
index 0b4cb2e1a15c..561684ded4a0 100644
--- a/drivers/gpu/drm/i915/i915_query.c
+++ b/drivers/gpu/drm/i915/i915_query.c
@@ -432,6 +432,9 @@ static int query_memregion_info(struct drm_i915_private *i915,
 	u32 total_length;
 	int ret, id;
 
+	if (!IS_ENABLED(CONFIG_DRM_I915_UNSTABLE_FAKE_LMEM))
+		return -ENODEV;
+
 	if (query_item->flags != 0)
 		return -EINVAL;
 
-- 
2.26.3

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/9] drm/doc/rfc: i915 DG1 uAPI
  2021-04-26  9:38 [PATCH 1/9] drm/doc/rfc: i915 DG1 uAPI Matthew Auld
                   ` (7 preceding siblings ...)
  2021-04-26  9:39 ` [PATCH 9/9] drm/i915/gem: hide new uAPI behind CONFIG_BROKEN Matthew Auld
@ 2021-04-26 15:11 ` Jason Ekstrand
  2021-04-26 15:31   ` Matthew Auld
  2021-04-28 15:16 ` Kenneth Graunke
                   ` (3 subsequent siblings)
  12 siblings, 1 reply; 22+ messages in thread
From: Jason Ekstrand @ 2021-04-26 15:11 UTC (permalink / raw)
  To: Matthew Auld
  Cc: Lionel Landwerlin, Thomas Hellström, Dave Airlie,
	Jordan Justen, Intel GFX, Maling list - DRI developers,
	Daniel Vetter, Kenneth Graunke, Daniele Ceraolo Spurio,
	Jon Bloomfield, ML mesa-dev, Daniel Vetter

On Mon, Apr 26, 2021 at 4:42 AM Matthew Auld <matthew.auld@intel.com> wrote:
>
> Add an entry for the new uAPI needed for DG1. Also add the overall
> upstream plan, including some notes for the TTM conversion.
>
> v2(Daniel):
>   - include the overall upstreaming plan
>   - add a note for mmap, there are differences here for TTM vs i915
>   - bunch of other suggestions from Daniel
> v3:
>  (Daniel)
>   - add a note for set/get caching stuff
>   - add some more docs for existing query and extensions stuff
>   - add an actual code example for regions query
>   - bunch of other stuff
>  (Jason)
>   - uAPI change(!):
>         - try a simpler design with the placements extension
>         - rather than have a generic setparam which can cover multiple
>           use cases, have each extension be responsible for one thing
>           only
> v4:
>  (Daniel)
>   - add some more notes for ttm conversion
>   - bunch of other stuff
>  (Jason)
>   - uAPI change(!):
>         - drop all the extra rsvd members for the region_query and
>           region_info, just keep the bare minimum needed for padding
>
> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> Cc: Lionel Landwerlin <lionel.g.landwerlin@linux.intel.com>
> Cc: Jon Bloomfield <jon.bloomfield@intel.com>
> Cc: Jordan Justen <jordan.l.justen@intel.com>
> Cc: Daniel Vetter <daniel.vetter@intel.com>
> Cc: Kenneth Graunke <kenneth@whitecape.org>
> Cc: Jason Ekstrand <jason@jlekstrand.net>
> Cc: Dave Airlie <airlied@gmail.com>
> Cc: dri-devel@lists.freedesktop.org
> Cc: mesa-dev@lists.freedesktop.org
> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> Acked-by: Dave Airlie <airlied@redhat.com>
> ---
>  Documentation/gpu/rfc/i915_gem_lmem.h   | 212 ++++++++++++++++++++++++
>  Documentation/gpu/rfc/i915_gem_lmem.rst | 130 +++++++++++++++
>  Documentation/gpu/rfc/index.rst         |   4 +
>  3 files changed, 346 insertions(+)
>  create mode 100644 Documentation/gpu/rfc/i915_gem_lmem.h
>  create mode 100644 Documentation/gpu/rfc/i915_gem_lmem.rst
>
> diff --git a/Documentation/gpu/rfc/i915_gem_lmem.h b/Documentation/gpu/rfc/i915_gem_lmem.h
> new file mode 100644
> index 000000000000..7ed59b6202d5
> --- /dev/null
> +++ b/Documentation/gpu/rfc/i915_gem_lmem.h
> @@ -0,0 +1,212 @@
> +/**
> + * enum drm_i915_gem_memory_class - Supported memory classes
> + */
> +enum drm_i915_gem_memory_class {
> +       /** @I915_MEMORY_CLASS_SYSTEM: System memory */
> +       I915_MEMORY_CLASS_SYSTEM = 0,
> +       /** @I915_MEMORY_CLASS_DEVICE: Device local-memory */
> +       I915_MEMORY_CLASS_DEVICE,
> +};
> +
> +/**
> + * struct drm_i915_gem_memory_class_instance - Identify particular memory region
> + */
> +struct drm_i915_gem_memory_class_instance {
> +       /** @memory_class: See enum drm_i915_gem_memory_class */
> +       __u16 memory_class;
> +
> +       /** @memory_instance: Which instance */
> +       __u16 memory_instance;
> +};
> +
> +/**
> + * struct drm_i915_memory_region_info - Describes one region as known to the
> + * driver.
> + *
> + * Note that we reserve some stuff here for potential future work. As an example
> + * we might want expose the capabilities(see @caps) for a given region, which
> + * could include things like if the region is CPU mappable/accessible, what are
> + * the supported mapping types etc.
> + *
> + * Note this is using both struct drm_i915_query_item and struct drm_i915_query.
> + * For this new query we are adding the new query id DRM_I915_QUERY_MEMORY_REGIONS
> + * at &drm_i915_query_item.query_id.
> + */
> +struct drm_i915_memory_region_info {
> +       /** @region: The class:instance pair encoding */
> +       struct drm_i915_gem_memory_class_instance region;
> +
> +       /** @pad: MBZ */
> +       __u32 pad;
> +
> +       /** @caps: MBZ */
> +       __u64 caps;
> +
> +       /** @probed_size: Memory probed by the driver (-1 = unknown) */
> +       __u64 probed_size;
> +
> +       /** @unallocated_size: Estimate of memory remaining (-1 = unknown) */
> +       __u64 unallocated_size;
> +};
> +
> +/**
> + * struct drm_i915_query_memory_regions
> + *
> + * The region info query enumerates all regions known to the driver by filling
> + * in an array of struct drm_i915_memory_region_info structures.
> + *
> + * Example for getting the list of supported regions:
> + *
> + * .. code-block:: C
> + *
> + *     struct drm_i915_query_memory_regions *info;
> + *     struct drm_i915_query_item item = {
> + *             .query_id = DRM_I915_QUERY_MEMORY_REGIONS;
> + *     };
> + *     struct drm_i915_query query = {
> + *             .num_items = 1,
> + *             .items_ptr = (uintptr_t)&item,
> + *     };
> + *     int err, i;
> + *
> + *     // First query the size of the blob we need, this needs to be large
> + *     // enough to hold our array of regions. The kernel will fill out the
> + *     // item.length for us, which is the number of bytes we need.
> + *     err = ioctl(fd, DRM_IOCTL_I915_QUERY, &query);
> + *     if (err) ...
> + *
> + *     info = calloc(1, item.length);
> + *     // Now that we allocated the required number of bytes, we call the ioctl
> + *     // again, this time with the data_ptr pointing to our newly allocated
> + *     // blob, which the kernel can then populate with the all the region info.
> + *     item.data_ptr = (uintptr_t)&info,
> + *
> + *     err = ioctl(fd, DRM_IOCTL_I915_QUERY, &query);
> + *     if (err) ...
> + *
> + *     // We can now access each region in the array
> + *     for (i = 0; i < info->num_regions; i++) {
> + *             struct drm_i915_memory_region_info mr = info->regions[i];
> + *             u16 class = mr.region.class;
> + *             u16 instance = mr.region.instance;
> + *
> + *             ....
> + *     }
> + *
> + *     free(info);
> + */
> +struct drm_i915_query_memory_regions {
> +       /** @num_regions: Number of supported regions */
> +       __u32 num_regions;
> +
> +       /** @pad: MBZ */
> +       __u32 pad;
> +
> +       /** @regions: Info about each supported region */
> +       struct drm_i915_memory_region_info regions[];
> +};
> +
> +#define DRM_I915_GEM_CREATE_EXT                0xdeadbeaf
> +#define DRM_IOCTL_I915_GEM_CREATE_EXT  DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_CREATE_EXT, struct drm_i915_gem_create_ext)

Here's another thought:  Instead of burning a new IOCTL number, should
we just re-use DRM_I915_GEM_CREATE?  The different structure size
should let us tell the two apart.

--Jason


> +
> +/**
> + * struct drm_i915_gem_create_ext - Existing gem_create behaviour, with added
> + * extension support using struct i915_user_extension.
> + *
> + * Note that in the future we want to have our buffer flags here, at least for
> + * the stuff that is immutable. Previously we would have two ioctls, one to
> + * create the object with gem_create, and another to apply various parameters,
> + * however this creates some ambiguity for the params which are considered
> + * immutable. Also in general we're phasing out the various SET/GET ioctls.
> + */
> +struct drm_i915_gem_create_ext {
> +       /**
> +        * @size: Requested size for the object.
> +        *
> +        * The (page-aligned) allocated size for the object will be returned.
> +        *
> +        * Note that for some devices we have might have further minimum
> +        * page-size restrictions(larger than 4K), like for device local-memory.
> +        * However in general the final size here should always reflect any
> +        * rounding up, if for example using the I915_GEM_CREATE_EXT_MEMORY_REGIONS
> +        * extension to place the object in device local-memory.
> +        */
> +       __u64 size;
> +       /**
> +        * @handle: Returned handle for the object.
> +        *
> +        * Object handles are nonzero.
> +        */
> +       __u32 handle;
> +       /** @flags: MBZ */
> +       __u32 flags;
> +       /**
> +        * @extensions: The chain of extensions to apply to this object.
> +        *
> +        * This will be useful in the future when we need to support several
> +        * different extensions, and we need to apply more than one when
> +        * creating the object. See struct i915_user_extension.
> +        *
> +        * If we don't supply any extensions then we get the same old gem_create
> +        * behaviour.
> +        *
> +        * For I915_GEM_CREATE_EXT_MEMORY_REGIONS usage see
> +        * struct drm_i915_gem_create_ext_memory_regions.
> +        */
> +#define I915_GEM_CREATE_EXT_MEMORY_REGIONS 0
> +       __u64 extensions;
> +};
> +
> +/**
> + * struct drm_i915_gem_create_ext_memory_regions - The
> + * I915_GEM_CREATE_EXT_MEMORY_REGIONS extension.
> + *
> + * Set the object with the desired set of placements/regions in priority
> + * order. Each entry must be unique and supported by the device.
> + *
> + * This is provided as an array of struct drm_i915_gem_memory_class_instance, or
> + * an equivalent layout of class:instance pair encodings. See struct
> + * drm_i915_query_memory_regions and DRM_I915_QUERY_MEMORY_REGIONS for how to
> + * query the supported regions for a device.
> + *
> + * As an example, on discrete devices, if we wish to set the placement as
> + * device local-memory we can do something like:
> + *
> + * .. code-block:: C
> + *
> + *     struct drm_i915_gem_memory_class_instance region_lmem = {
> + *              .memory_class = I915_MEMORY_CLASS_DEVICE,
> + *              .memory_instance = 0,
> + *      };
> + *      struct drm_i915_gem_create_ext_memory_regions regions = {
> + *              .base = { .name = I915_GEM_CREATE_EXT_MEMORY_REGIONS },
> + *              .regions = (uintptr_t)&region_lmem,
> + *              .num_regions = 1,
> + *      };
> + *      struct drm_i915_gem_create_ext create_ext = {
> + *              .size = 16 * PAGE_SIZE,
> + *              .extensions = (uintptr_t)&regions,
> + *      };
> + *
> + *      int err = ioctl(fd, DRM_IOCTL_I915_GEM_CREATE_EXT, &create_ext);
> + *      if (err) ...
> + *
> + * At which point we get the object handle in &drm_i915_gem_create_ext.handle,
> + * along with the final object size in &drm_i915_gem_create_ext.size, which
> + * should account for any rounding up, if required.
> + */
> +struct drm_i915_gem_create_ext_memory_regions {
> +       /** @base: Extension link. See struct i915_user_extension. */
> +       struct i915_user_extension base;
> +
> +       /** @pad: MBZ */
> +       __u32 pad;
> +       /** @num_regions: Number of elements in the @regions array. */
> +       __u32 num_regions;
> +       /**
> +        * @regions: The regions/placements array.
> +        *
> +        * An array of struct drm_i915_gem_memory_class_instance.
> +        */
> +       __u64 regions;
> +};
> diff --git a/Documentation/gpu/rfc/i915_gem_lmem.rst b/Documentation/gpu/rfc/i915_gem_lmem.rst
> new file mode 100644
> index 000000000000..462f1efd9003
> --- /dev/null
> +++ b/Documentation/gpu/rfc/i915_gem_lmem.rst
> @@ -0,0 +1,130 @@
> +=========================
> +I915 DG1/LMEM RFC Section
> +=========================
> +
> +Upstream plan
> +=============
> +For upstream the overall plan for landing all the DG1 stuff and turning it for
> +real, with all the uAPI bits is:
> +
> +* Merge basic HW enabling of DG1(still without pciid)
> +* Merge the uAPI bits behind special CONFIG_BROKEN(or so) flag
> +        * At this point we can still make changes, but importantly this lets us
> +          start running IGTs which can utilize local-memory in CI
> +* Convert over to TTM, make sure it all keeps working. Some of the work items:
> +        * TTM shrinker for discrete
> +        * dma_resv_lockitem for full dma_resv_lock, i.e not just trylock
> +        * Use TTM CPU pagefault handler
> +        * Route shmem backend over to TTM SYSTEM for discrete
> +        * TTM purgeable object support
> +        * Move i915 buddy allocator over to TTM
> +        * MMAP ioctl mode(see `I915 MMAP`_)
> +        * SET/GET ioctl caching(see `I915 SET/GET CACHING`_)
> +* Add pciid for DG1 and turn on uAPI for real
> +
> +New object placement and region query uAPI
> +==========================================
> +Starting from DG1 we need to give userspace the ability to allocate buffers from
> +device local-memory. Currently the driver supports gem_create, which can place
> +buffers in system memory via shmem, and the usual assortment of other
> +interfaces, like dumb buffers and userptr.
> +
> +To support this new capability, while also providing a uAPI which will work
> +beyond just DG1, we propose to offer three new bits of uAPI:
> +
> +DRM_I915_QUERY_MEMORY_REGIONS
> +-----------------------------
> +New query ID which allows userspace to discover the list of supported memory
> +regions(like system-memory and local-memory) for a given device. We identify
> +each region with a class and instance pair, which should be unique. The class
> +here would be DEVICE or SYSTEM, and the instance would be zero, on platforms
> +like DG1.
> +
> +Side note: The class/instance design is borrowed from our existing engine uAPI,
> +where we describe every physical engine in terms of its class, and the
> +particular instance, since we can have more than one per class.
> +
> +In the future we also want to expose more information which can further
> +describe the capabilities of a region.
> +
> +.. kernel-doc:: Documentation/gpu/rfc/i915_gem_lmem.h
> +        :functions: drm_i915_gem_memory_class drm_i915_gem_memory_class_instance drm_i915_memory_region_info drm_i915_query_memory_regions
> +
> +GEM_CREATE_EXT
> +--------------
> +New ioctl which is basically just gem_create but now allows userspace to
> +provide a chain of possible extensions. Note that if we don't provide any
> +extensions then we get the exact same behaviour as gem_create.
> +
> +Side note: We also need to support PXP[1] in the near future, which is also
> +applicable to integrated platforms, and adds its own gem_create_ext extension,
> +which basically lets userspace mark a buffer as "protected".
> +
> +.. kernel-doc:: Documentation/gpu/rfc/i915_gem_lmem.h
> +        :functions: drm_i915_gem_create_ext
> +
> +I915_GEM_CREATE_EXT_MEMORY_REGIONS
> +----------------------------------
> +Implemented as an extension for gem_create_ext, we would now allow userspace to
> +optionally provide an immutable list of preferred placements at creation time,
> +in priority order, for a given buffer object.  For the placements we expect
> +them each to use the class/instance encoding, as per the output of the regions
> +query. Having the list in priority order will be useful in the future when
> +placing an object, say during eviction.
> +
> +.. kernel-doc:: Documentation/gpu/rfc/i915_gem_lmem.h
> +        :functions: drm_i915_gem_create_ext_memory_regions
> +
> +One fair criticism here is that this seems a little over-engineered[2]. If we
> +just consider DG1 then yes, a simple gem_create.flags or something is totally
> +all that's needed to tell the kernel to allocate the buffer in local-memory or
> +whatever. However looking to the future we need uAPI which can also support
> +upcoming Xe HP multi-tile architecture in a sane way, where there can be
> +multiple local-memory instances for a given device, and so using both class and
> +instance in our uAPI to describe regions is desirable, although specifically
> +for DG1 it's uninteresting, since we only have a single local-memory instance.
> +
> +Existing uAPI issues
> +====================
> +Some potential issues we still need to resolve.
> +
> +I915 MMAP
> +---------
> +In i915 there are multiple ways to MMAP GEM object, including mapping the same
> +object using different mapping types(WC vs WB), i.e multiple active mmaps per
> +object. TTM expects one MMAP at most for the lifetime of the object. If it
> +turns out that we have to backpedal here, there might be some potential
> +userspace fallout.
> +
> +I915 SET/GET CACHING
> +--------------------
> +In i915 we have set/get_caching ioctl. TTM doesn't let us to change this, but
> +DG1 doesn't support non-snooped pcie transactions, so we can just always
> +allocate as WB for smem-only buffers.  If/when our hw gains support for
> +non-snooped pcie transactions then we must fix this mode at allocation time as
> +a new GEM extension.
> +
> +This is related to the mmap problem, because in general (meaning, when we're
> +not running on intel cpus) the cpu mmap must not, ever, be inconsistent with
> +allocation mode.
> +
> +Possible idea is to let the kernel picks the mmap mode for userspace from the
> +following table:
> +
> +smem-only: WB. Userspace does not need to call clflush.
> +
> +smem+lmem: We allocate uncached memory, and give userspace a WC mapping
> +for when the buffer is in smem, and WC when it's in lmem. GPU does snooped
> +access, which is a bit inefficient.
> +
> +lmem only: always WC
> +
> +This means on discrete you only get a single mmap mode, all others must be
> +rejected. That's probably going to be a new default mode or something like
> +that.
> +
> +Links
> +=====
> +[1] https://patchwork.freedesktop.org/series/86798/
> +
> +[2] https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5599#note_553791
> diff --git a/Documentation/gpu/rfc/index.rst b/Documentation/gpu/rfc/index.rst
> index a8621f7dab8b..05670442ca1b 100644
> --- a/Documentation/gpu/rfc/index.rst
> +++ b/Documentation/gpu/rfc/index.rst
> @@ -15,3 +15,7 @@ host such documentation:
>
>  * Once the code has landed move all the documentation to the right places in
>    the main core, helper or driver sections.
> +
> +.. toctree::
> +
> +    i915_gem_lmem.rst
> --
> 2.26.3
>
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/9] drm/doc/rfc: i915 DG1 uAPI
  2021-04-26 15:11 ` [PATCH 1/9] drm/doc/rfc: i915 DG1 uAPI Jason Ekstrand
@ 2021-04-26 15:31   ` Matthew Auld
  2021-04-26 16:25     ` Jason Ekstrand
  0 siblings, 1 reply; 22+ messages in thread
From: Matthew Auld @ 2021-04-26 15:31 UTC (permalink / raw)
  To: Jason Ekstrand
  Cc: Lionel Landwerlin, Thomas Hellström, Dave Airlie,
	Jordan Justen, Intel GFX, Maling list - DRI developers,
	Daniel Vetter, Kenneth Graunke, Daniele Ceraolo Spurio,
	Jon Bloomfield, ML mesa-dev, Daniel Vetter

On 26/04/2021 16:11, Jason Ekstrand wrote:
> On Mon, Apr 26, 2021 at 4:42 AM Matthew Auld <matthew.auld@intel.com> wrote:
>>
>> Add an entry for the new uAPI needed for DG1. Also add the overall
>> upstream plan, including some notes for the TTM conversion.
>>
>> v2(Daniel):
>>    - include the overall upstreaming plan
>>    - add a note for mmap, there are differences here for TTM vs i915
>>    - bunch of other suggestions from Daniel
>> v3:
>>   (Daniel)
>>    - add a note for set/get caching stuff
>>    - add some more docs for existing query and extensions stuff
>>    - add an actual code example for regions query
>>    - bunch of other stuff
>>   (Jason)
>>    - uAPI change(!):
>>          - try a simpler design with the placements extension
>>          - rather than have a generic setparam which can cover multiple
>>            use cases, have each extension be responsible for one thing
>>            only
>> v4:
>>   (Daniel)
>>    - add some more notes for ttm conversion
>>    - bunch of other stuff
>>   (Jason)
>>    - uAPI change(!):
>>          - drop all the extra rsvd members for the region_query and
>>            region_info, just keep the bare minimum needed for padding
>>
>> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
>> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
>> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
>> Cc: Lionel Landwerlin <lionel.g.landwerlin@linux.intel.com>
>> Cc: Jon Bloomfield <jon.bloomfield@intel.com>
>> Cc: Jordan Justen <jordan.l.justen@intel.com>
>> Cc: Daniel Vetter <daniel.vetter@intel.com>
>> Cc: Kenneth Graunke <kenneth@whitecape.org>
>> Cc: Jason Ekstrand <jason@jlekstrand.net>
>> Cc: Dave Airlie <airlied@gmail.com>
>> Cc: dri-devel@lists.freedesktop.org
>> Cc: mesa-dev@lists.freedesktop.org
>> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
>> Acked-by: Dave Airlie <airlied@redhat.com>
>> ---
>>   Documentation/gpu/rfc/i915_gem_lmem.h   | 212 ++++++++++++++++++++++++
>>   Documentation/gpu/rfc/i915_gem_lmem.rst | 130 +++++++++++++++
>>   Documentation/gpu/rfc/index.rst         |   4 +
>>   3 files changed, 346 insertions(+)
>>   create mode 100644 Documentation/gpu/rfc/i915_gem_lmem.h
>>   create mode 100644 Documentation/gpu/rfc/i915_gem_lmem.rst
>>
>> diff --git a/Documentation/gpu/rfc/i915_gem_lmem.h b/Documentation/gpu/rfc/i915_gem_lmem.h
>> new file mode 100644
>> index 000000000000..7ed59b6202d5
>> --- /dev/null
>> +++ b/Documentation/gpu/rfc/i915_gem_lmem.h
>> @@ -0,0 +1,212 @@
>> +/**
>> + * enum drm_i915_gem_memory_class - Supported memory classes
>> + */
>> +enum drm_i915_gem_memory_class {
>> +       /** @I915_MEMORY_CLASS_SYSTEM: System memory */
>> +       I915_MEMORY_CLASS_SYSTEM = 0,
>> +       /** @I915_MEMORY_CLASS_DEVICE: Device local-memory */
>> +       I915_MEMORY_CLASS_DEVICE,
>> +};
>> +
>> +/**
>> + * struct drm_i915_gem_memory_class_instance - Identify particular memory region
>> + */
>> +struct drm_i915_gem_memory_class_instance {
>> +       /** @memory_class: See enum drm_i915_gem_memory_class */
>> +       __u16 memory_class;
>> +
>> +       /** @memory_instance: Which instance */
>> +       __u16 memory_instance;
>> +};
>> +
>> +/**
>> + * struct drm_i915_memory_region_info - Describes one region as known to the
>> + * driver.
>> + *
>> + * Note that we reserve some stuff here for potential future work. As an example
>> + * we might want expose the capabilities(see @caps) for a given region, which
>> + * could include things like if the region is CPU mappable/accessible, what are
>> + * the supported mapping types etc.
>> + *
>> + * Note this is using both struct drm_i915_query_item and struct drm_i915_query.
>> + * For this new query we are adding the new query id DRM_I915_QUERY_MEMORY_REGIONS
>> + * at &drm_i915_query_item.query_id.
>> + */
>> +struct drm_i915_memory_region_info {
>> +       /** @region: The class:instance pair encoding */
>> +       struct drm_i915_gem_memory_class_instance region;
>> +
>> +       /** @pad: MBZ */
>> +       __u32 pad;
>> +
>> +       /** @caps: MBZ */
>> +       __u64 caps;
>> +
>> +       /** @probed_size: Memory probed by the driver (-1 = unknown) */
>> +       __u64 probed_size;
>> +
>> +       /** @unallocated_size: Estimate of memory remaining (-1 = unknown) */
>> +       __u64 unallocated_size;
>> +};
>> +
>> +/**
>> + * struct drm_i915_query_memory_regions
>> + *
>> + * The region info query enumerates all regions known to the driver by filling
>> + * in an array of struct drm_i915_memory_region_info structures.
>> + *
>> + * Example for getting the list of supported regions:
>> + *
>> + * .. code-block:: C
>> + *
>> + *     struct drm_i915_query_memory_regions *info;
>> + *     struct drm_i915_query_item item = {
>> + *             .query_id = DRM_I915_QUERY_MEMORY_REGIONS;
>> + *     };
>> + *     struct drm_i915_query query = {
>> + *             .num_items = 1,
>> + *             .items_ptr = (uintptr_t)&item,
>> + *     };
>> + *     int err, i;
>> + *
>> + *     // First query the size of the blob we need, this needs to be large
>> + *     // enough to hold our array of regions. The kernel will fill out the
>> + *     // item.length for us, which is the number of bytes we need.
>> + *     err = ioctl(fd, DRM_IOCTL_I915_QUERY, &query);
>> + *     if (err) ...
>> + *
>> + *     info = calloc(1, item.length);
>> + *     // Now that we allocated the required number of bytes, we call the ioctl
>> + *     // again, this time with the data_ptr pointing to our newly allocated
>> + *     // blob, which the kernel can then populate with the all the region info.
>> + *     item.data_ptr = (uintptr_t)&info,
>> + *
>> + *     err = ioctl(fd, DRM_IOCTL_I915_QUERY, &query);
>> + *     if (err) ...
>> + *
>> + *     // We can now access each region in the array
>> + *     for (i = 0; i < info->num_regions; i++) {
>> + *             struct drm_i915_memory_region_info mr = info->regions[i];
>> + *             u16 class = mr.region.class;
>> + *             u16 instance = mr.region.instance;
>> + *
>> + *             ....
>> + *     }
>> + *
>> + *     free(info);
>> + */
>> +struct drm_i915_query_memory_regions {
>> +       /** @num_regions: Number of supported regions */
>> +       __u32 num_regions;
>> +
>> +       /** @pad: MBZ */
>> +       __u32 pad;
>> +
>> +       /** @regions: Info about each supported region */
>> +       struct drm_i915_memory_region_info regions[];
>> +};
>> +
>> +#define DRM_I915_GEM_CREATE_EXT                0xdeadbeaf
>> +#define DRM_IOCTL_I915_GEM_CREATE_EXT  DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_CREATE_EXT, struct drm_i915_gem_create_ext)
> 
> Here's another thought:  Instead of burning a new IOCTL number, should
> we just re-use DRM_I915_GEM_CREATE?  The different structure size
> should let us tell the two apart.

Yeah, it was exactly like that in the original version[1]. Scrolling 
through the review comments, I think the concern was with an older 
kernel just silently ignoring the extensions with new userspace.

[1] https://patchwork.freedesktop.org/patch/404455/?series=84344&rev=1

> 
> --Jason
> 
> 
>> +
>> +/**
>> + * struct drm_i915_gem_create_ext - Existing gem_create behaviour, with added
>> + * extension support using struct i915_user_extension.
>> + *
>> + * Note that in the future we want to have our buffer flags here, at least for
>> + * the stuff that is immutable. Previously we would have two ioctls, one to
>> + * create the object with gem_create, and another to apply various parameters,
>> + * however this creates some ambiguity for the params which are considered
>> + * immutable. Also in general we're phasing out the various SET/GET ioctls.
>> + */
>> +struct drm_i915_gem_create_ext {
>> +       /**
>> +        * @size: Requested size for the object.
>> +        *
>> +        * The (page-aligned) allocated size for the object will be returned.
>> +        *
>> +        * Note that for some devices we have might have further minimum
>> +        * page-size restrictions(larger than 4K), like for device local-memory.
>> +        * However in general the final size here should always reflect any
>> +        * rounding up, if for example using the I915_GEM_CREATE_EXT_MEMORY_REGIONS
>> +        * extension to place the object in device local-memory.
>> +        */
>> +       __u64 size;
>> +       /**
>> +        * @handle: Returned handle for the object.
>> +        *
>> +        * Object handles are nonzero.
>> +        */
>> +       __u32 handle;
>> +       /** @flags: MBZ */
>> +       __u32 flags;
>> +       /**
>> +        * @extensions: The chain of extensions to apply to this object.
>> +        *
>> +        * This will be useful in the future when we need to support several
>> +        * different extensions, and we need to apply more than one when
>> +        * creating the object. See struct i915_user_extension.
>> +        *
>> +        * If we don't supply any extensions then we get the same old gem_create
>> +        * behaviour.
>> +        *
>> +        * For I915_GEM_CREATE_EXT_MEMORY_REGIONS usage see
>> +        * struct drm_i915_gem_create_ext_memory_regions.
>> +        */
>> +#define I915_GEM_CREATE_EXT_MEMORY_REGIONS 0
>> +       __u64 extensions;
>> +};
>> +
>> +/**
>> + * struct drm_i915_gem_create_ext_memory_regions - The
>> + * I915_GEM_CREATE_EXT_MEMORY_REGIONS extension.
>> + *
>> + * Set the object with the desired set of placements/regions in priority
>> + * order. Each entry must be unique and supported by the device.
>> + *
>> + * This is provided as an array of struct drm_i915_gem_memory_class_instance, or
>> + * an equivalent layout of class:instance pair encodings. See struct
>> + * drm_i915_query_memory_regions and DRM_I915_QUERY_MEMORY_REGIONS for how to
>> + * query the supported regions for a device.
>> + *
>> + * As an example, on discrete devices, if we wish to set the placement as
>> + * device local-memory we can do something like:
>> + *
>> + * .. code-block:: C
>> + *
>> + *     struct drm_i915_gem_memory_class_instance region_lmem = {
>> + *              .memory_class = I915_MEMORY_CLASS_DEVICE,
>> + *              .memory_instance = 0,
>> + *      };
>> + *      struct drm_i915_gem_create_ext_memory_regions regions = {
>> + *              .base = { .name = I915_GEM_CREATE_EXT_MEMORY_REGIONS },
>> + *              .regions = (uintptr_t)&region_lmem,
>> + *              .num_regions = 1,
>> + *      };
>> + *      struct drm_i915_gem_create_ext create_ext = {
>> + *              .size = 16 * PAGE_SIZE,
>> + *              .extensions = (uintptr_t)&regions,
>> + *      };
>> + *
>> + *      int err = ioctl(fd, DRM_IOCTL_I915_GEM_CREATE_EXT, &create_ext);
>> + *      if (err) ...
>> + *
>> + * At which point we get the object handle in &drm_i915_gem_create_ext.handle,
>> + * along with the final object size in &drm_i915_gem_create_ext.size, which
>> + * should account for any rounding up, if required.
>> + */
>> +struct drm_i915_gem_create_ext_memory_regions {
>> +       /** @base: Extension link. See struct i915_user_extension. */
>> +       struct i915_user_extension base;
>> +
>> +       /** @pad: MBZ */
>> +       __u32 pad;
>> +       /** @num_regions: Number of elements in the @regions array. */
>> +       __u32 num_regions;
>> +       /**
>> +        * @regions: The regions/placements array.
>> +        *
>> +        * An array of struct drm_i915_gem_memory_class_instance.
>> +        */
>> +       __u64 regions;
>> +};
>> diff --git a/Documentation/gpu/rfc/i915_gem_lmem.rst b/Documentation/gpu/rfc/i915_gem_lmem.rst
>> new file mode 100644
>> index 000000000000..462f1efd9003
>> --- /dev/null
>> +++ b/Documentation/gpu/rfc/i915_gem_lmem.rst
>> @@ -0,0 +1,130 @@
>> +=========================
>> +I915 DG1/LMEM RFC Section
>> +=========================
>> +
>> +Upstream plan
>> +=============
>> +For upstream the overall plan for landing all the DG1 stuff and turning it for
>> +real, with all the uAPI bits is:
>> +
>> +* Merge basic HW enabling of DG1(still without pciid)
>> +* Merge the uAPI bits behind special CONFIG_BROKEN(or so) flag
>> +        * At this point we can still make changes, but importantly this lets us
>> +          start running IGTs which can utilize local-memory in CI
>> +* Convert over to TTM, make sure it all keeps working. Some of the work items:
>> +        * TTM shrinker for discrete
>> +        * dma_resv_lockitem for full dma_resv_lock, i.e not just trylock
>> +        * Use TTM CPU pagefault handler
>> +        * Route shmem backend over to TTM SYSTEM for discrete
>> +        * TTM purgeable object support
>> +        * Move i915 buddy allocator over to TTM
>> +        * MMAP ioctl mode(see `I915 MMAP`_)
>> +        * SET/GET ioctl caching(see `I915 SET/GET CACHING`_)
>> +* Add pciid for DG1 and turn on uAPI for real
>> +
>> +New object placement and region query uAPI
>> +==========================================
>> +Starting from DG1 we need to give userspace the ability to allocate buffers from
>> +device local-memory. Currently the driver supports gem_create, which can place
>> +buffers in system memory via shmem, and the usual assortment of other
>> +interfaces, like dumb buffers and userptr.
>> +
>> +To support this new capability, while also providing a uAPI which will work
>> +beyond just DG1, we propose to offer three new bits of uAPI:
>> +
>> +DRM_I915_QUERY_MEMORY_REGIONS
>> +-----------------------------
>> +New query ID which allows userspace to discover the list of supported memory
>> +regions(like system-memory and local-memory) for a given device. We identify
>> +each region with a class and instance pair, which should be unique. The class
>> +here would be DEVICE or SYSTEM, and the instance would be zero, on platforms
>> +like DG1.
>> +
>> +Side note: The class/instance design is borrowed from our existing engine uAPI,
>> +where we describe every physical engine in terms of its class, and the
>> +particular instance, since we can have more than one per class.
>> +
>> +In the future we also want to expose more information which can further
>> +describe the capabilities of a region.
>> +
>> +.. kernel-doc:: Documentation/gpu/rfc/i915_gem_lmem.h
>> +        :functions: drm_i915_gem_memory_class drm_i915_gem_memory_class_instance drm_i915_memory_region_info drm_i915_query_memory_regions
>> +
>> +GEM_CREATE_EXT
>> +--------------
>> +New ioctl which is basically just gem_create but now allows userspace to
>> +provide a chain of possible extensions. Note that if we don't provide any
>> +extensions then we get the exact same behaviour as gem_create.
>> +
>> +Side note: We also need to support PXP[1] in the near future, which is also
>> +applicable to integrated platforms, and adds its own gem_create_ext extension,
>> +which basically lets userspace mark a buffer as "protected".
>> +
>> +.. kernel-doc:: Documentation/gpu/rfc/i915_gem_lmem.h
>> +        :functions: drm_i915_gem_create_ext
>> +
>> +I915_GEM_CREATE_EXT_MEMORY_REGIONS
>> +----------------------------------
>> +Implemented as an extension for gem_create_ext, we would now allow userspace to
>> +optionally provide an immutable list of preferred placements at creation time,
>> +in priority order, for a given buffer object.  For the placements we expect
>> +them each to use the class/instance encoding, as per the output of the regions
>> +query. Having the list in priority order will be useful in the future when
>> +placing an object, say during eviction.
>> +
>> +.. kernel-doc:: Documentation/gpu/rfc/i915_gem_lmem.h
>> +        :functions: drm_i915_gem_create_ext_memory_regions
>> +
>> +One fair criticism here is that this seems a little over-engineered[2]. If we
>> +just consider DG1 then yes, a simple gem_create.flags or something is totally
>> +all that's needed to tell the kernel to allocate the buffer in local-memory or
>> +whatever. However looking to the future we need uAPI which can also support
>> +upcoming Xe HP multi-tile architecture in a sane way, where there can be
>> +multiple local-memory instances for a given device, and so using both class and
>> +instance in our uAPI to describe regions is desirable, although specifically
>> +for DG1 it's uninteresting, since we only have a single local-memory instance.
>> +
>> +Existing uAPI issues
>> +====================
>> +Some potential issues we still need to resolve.
>> +
>> +I915 MMAP
>> +---------
>> +In i915 there are multiple ways to MMAP GEM object, including mapping the same
>> +object using different mapping types(WC vs WB), i.e multiple active mmaps per
>> +object. TTM expects one MMAP at most for the lifetime of the object. If it
>> +turns out that we have to backpedal here, there might be some potential
>> +userspace fallout.
>> +
>> +I915 SET/GET CACHING
>> +--------------------
>> +In i915 we have set/get_caching ioctl. TTM doesn't let us to change this, but
>> +DG1 doesn't support non-snooped pcie transactions, so we can just always
>> +allocate as WB for smem-only buffers.  If/when our hw gains support for
>> +non-snooped pcie transactions then we must fix this mode at allocation time as
>> +a new GEM extension.
>> +
>> +This is related to the mmap problem, because in general (meaning, when we're
>> +not running on intel cpus) the cpu mmap must not, ever, be inconsistent with
>> +allocation mode.
>> +
>> +Possible idea is to let the kernel picks the mmap mode for userspace from the
>> +following table:
>> +
>> +smem-only: WB. Userspace does not need to call clflush.
>> +
>> +smem+lmem: We allocate uncached memory, and give userspace a WC mapping
>> +for when the buffer is in smem, and WC when it's in lmem. GPU does snooped
>> +access, which is a bit inefficient.
>> +
>> +lmem only: always WC
>> +
>> +This means on discrete you only get a single mmap mode, all others must be
>> +rejected. That's probably going to be a new default mode or something like
>> +that.
>> +
>> +Links
>> +=====
>> +[1] https://patchwork.freedesktop.org/series/86798/
>> +
>> +[2] https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5599#note_553791
>> diff --git a/Documentation/gpu/rfc/index.rst b/Documentation/gpu/rfc/index.rst
>> index a8621f7dab8b..05670442ca1b 100644
>> --- a/Documentation/gpu/rfc/index.rst
>> +++ b/Documentation/gpu/rfc/index.rst
>> @@ -15,3 +15,7 @@ host such documentation:
>>
>>   * Once the code has landed move all the documentation to the right places in
>>     the main core, helper or driver sections.
>> +
>> +.. toctree::
>> +
>> +    i915_gem_lmem.rst
>> --
>> 2.26.3
>>
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/9] drm/doc/rfc: i915 DG1 uAPI
  2021-04-26 15:31   ` Matthew Auld
@ 2021-04-26 16:25     ` Jason Ekstrand
  2021-04-26 16:32       ` Daniel Vetter
  0 siblings, 1 reply; 22+ messages in thread
From: Jason Ekstrand @ 2021-04-26 16:25 UTC (permalink / raw)
  To: Matthew Auld
  Cc: Lionel Landwerlin, Thomas Hellström, Dave Airlie,
	Jordan Justen, Intel GFX, Maling list - DRI developers,
	Daniel Vetter, Kenneth Graunke, Daniele Ceraolo Spurio,
	Jon Bloomfield, ML mesa-dev, Daniel Vetter

On Mon, Apr 26, 2021 at 10:31 AM Matthew Auld <matthew.auld@intel.com> wrote:
>
> On 26/04/2021 16:11, Jason Ekstrand wrote:
> > On Mon, Apr 26, 2021 at 4:42 AM Matthew Auld <matthew.auld@intel.com> wrote:
> >>
> >> Add an entry for the new uAPI needed for DG1. Also add the overall
> >> upstream plan, including some notes for the TTM conversion.
> >>
> >> v2(Daniel):
> >>    - include the overall upstreaming plan
> >>    - add a note for mmap, there are differences here for TTM vs i915
> >>    - bunch of other suggestions from Daniel
> >> v3:
> >>   (Daniel)
> >>    - add a note for set/get caching stuff
> >>    - add some more docs for existing query and extensions stuff
> >>    - add an actual code example for regions query
> >>    - bunch of other stuff
> >>   (Jason)
> >>    - uAPI change(!):
> >>          - try a simpler design with the placements extension
> >>          - rather than have a generic setparam which can cover multiple
> >>            use cases, have each extension be responsible for one thing
> >>            only
> >> v4:
> >>   (Daniel)
> >>    - add some more notes for ttm conversion
> >>    - bunch of other stuff
> >>   (Jason)
> >>    - uAPI change(!):
> >>          - drop all the extra rsvd members for the region_query and
> >>            region_info, just keep the bare minimum needed for padding
> >>
> >> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> >> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> >> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> >> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> >> Cc: Lionel Landwerlin <lionel.g.landwerlin@linux.intel.com>
> >> Cc: Jon Bloomfield <jon.bloomfield@intel.com>
> >> Cc: Jordan Justen <jordan.l.justen@intel.com>
> >> Cc: Daniel Vetter <daniel.vetter@intel.com>
> >> Cc: Kenneth Graunke <kenneth@whitecape.org>
> >> Cc: Jason Ekstrand <jason@jlekstrand.net>
> >> Cc: Dave Airlie <airlied@gmail.com>
> >> Cc: dri-devel@lists.freedesktop.org
> >> Cc: mesa-dev@lists.freedesktop.org
> >> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> >> Acked-by: Dave Airlie <airlied@redhat.com>
> >> ---
> >>   Documentation/gpu/rfc/i915_gem_lmem.h   | 212 ++++++++++++++++++++++++
> >>   Documentation/gpu/rfc/i915_gem_lmem.rst | 130 +++++++++++++++
> >>   Documentation/gpu/rfc/index.rst         |   4 +
> >>   3 files changed, 346 insertions(+)
> >>   create mode 100644 Documentation/gpu/rfc/i915_gem_lmem.h
> >>   create mode 100644 Documentation/gpu/rfc/i915_gem_lmem.rst
> >>
> >> diff --git a/Documentation/gpu/rfc/i915_gem_lmem.h b/Documentation/gpu/rfc/i915_gem_lmem.h
> >> new file mode 100644
> >> index 000000000000..7ed59b6202d5
> >> --- /dev/null
> >> +++ b/Documentation/gpu/rfc/i915_gem_lmem.h
> >> @@ -0,0 +1,212 @@
> >> +/**
> >> + * enum drm_i915_gem_memory_class - Supported memory classes
> >> + */
> >> +enum drm_i915_gem_memory_class {
> >> +       /** @I915_MEMORY_CLASS_SYSTEM: System memory */
> >> +       I915_MEMORY_CLASS_SYSTEM = 0,
> >> +       /** @I915_MEMORY_CLASS_DEVICE: Device local-memory */
> >> +       I915_MEMORY_CLASS_DEVICE,
> >> +};
> >> +
> >> +/**
> >> + * struct drm_i915_gem_memory_class_instance - Identify particular memory region
> >> + */
> >> +struct drm_i915_gem_memory_class_instance {
> >> +       /** @memory_class: See enum drm_i915_gem_memory_class */
> >> +       __u16 memory_class;
> >> +
> >> +       /** @memory_instance: Which instance */
> >> +       __u16 memory_instance;
> >> +};
> >> +
> >> +/**
> >> + * struct drm_i915_memory_region_info - Describes one region as known to the
> >> + * driver.
> >> + *
> >> + * Note that we reserve some stuff here for potential future work. As an example
> >> + * we might want expose the capabilities(see @caps) for a given region, which
> >> + * could include things like if the region is CPU mappable/accessible, what are
> >> + * the supported mapping types etc.
> >> + *
> >> + * Note this is using both struct drm_i915_query_item and struct drm_i915_query.
> >> + * For this new query we are adding the new query id DRM_I915_QUERY_MEMORY_REGIONS
> >> + * at &drm_i915_query_item.query_id.
> >> + */
> >> +struct drm_i915_memory_region_info {
> >> +       /** @region: The class:instance pair encoding */
> >> +       struct drm_i915_gem_memory_class_instance region;
> >> +
> >> +       /** @pad: MBZ */
> >> +       __u32 pad;
> >> +
> >> +       /** @caps: MBZ */
> >> +       __u64 caps;
> >> +
> >> +       /** @probed_size: Memory probed by the driver (-1 = unknown) */
> >> +       __u64 probed_size;
> >> +
> >> +       /** @unallocated_size: Estimate of memory remaining (-1 = unknown) */
> >> +       __u64 unallocated_size;
> >> +};
> >> +
> >> +/**
> >> + * struct drm_i915_query_memory_regions
> >> + *
> >> + * The region info query enumerates all regions known to the driver by filling
> >> + * in an array of struct drm_i915_memory_region_info structures.
> >> + *
> >> + * Example for getting the list of supported regions:
> >> + *
> >> + * .. code-block:: C
> >> + *
> >> + *     struct drm_i915_query_memory_regions *info;
> >> + *     struct drm_i915_query_item item = {
> >> + *             .query_id = DRM_I915_QUERY_MEMORY_REGIONS;
> >> + *     };
> >> + *     struct drm_i915_query query = {
> >> + *             .num_items = 1,
> >> + *             .items_ptr = (uintptr_t)&item,
> >> + *     };
> >> + *     int err, i;
> >> + *
> >> + *     // First query the size of the blob we need, this needs to be large
> >> + *     // enough to hold our array of regions. The kernel will fill out the
> >> + *     // item.length for us, which is the number of bytes we need.
> >> + *     err = ioctl(fd, DRM_IOCTL_I915_QUERY, &query);
> >> + *     if (err) ...
> >> + *
> >> + *     info = calloc(1, item.length);
> >> + *     // Now that we allocated the required number of bytes, we call the ioctl
> >> + *     // again, this time with the data_ptr pointing to our newly allocated
> >> + *     // blob, which the kernel can then populate with the all the region info.
> >> + *     item.data_ptr = (uintptr_t)&info,
> >> + *
> >> + *     err = ioctl(fd, DRM_IOCTL_I915_QUERY, &query);
> >> + *     if (err) ...
> >> + *
> >> + *     // We can now access each region in the array
> >> + *     for (i = 0; i < info->num_regions; i++) {
> >> + *             struct drm_i915_memory_region_info mr = info->regions[i];
> >> + *             u16 class = mr.region.class;
> >> + *             u16 instance = mr.region.instance;
> >> + *
> >> + *             ....
> >> + *     }
> >> + *
> >> + *     free(info);
> >> + */
> >> +struct drm_i915_query_memory_regions {
> >> +       /** @num_regions: Number of supported regions */
> >> +       __u32 num_regions;
> >> +
> >> +       /** @pad: MBZ */
> >> +       __u32 pad;
> >> +
> >> +       /** @regions: Info about each supported region */
> >> +       struct drm_i915_memory_region_info regions[];
> >> +};
> >> +
> >> +#define DRM_I915_GEM_CREATE_EXT                0xdeadbeaf
> >> +#define DRM_IOCTL_I915_GEM_CREATE_EXT  DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_CREATE_EXT, struct drm_i915_gem_create_ext)
> >
> > Here's another thought:  Instead of burning a new IOCTL number, should
> > we just re-use DRM_I915_GEM_CREATE?  The different structure size
> > should let us tell the two apart.
>
> Yeah, it was exactly like that in the original version[1]. Scrolling
> through the review comments, I think the concern was with an older
> kernel just silently ignoring the extensions with new userspace.
>
> [1] https://patchwork.freedesktop.org/patch/404455/?series=84344&rev=1

Yeah, I think Chris is right.  I just dug through the code and there
really is no way for us to reject different sized structs; they just
show up zero-extended.  That's aggravating.

--Jason

> >
> > --Jason
> >
> >
> >> +
> >> +/**
> >> + * struct drm_i915_gem_create_ext - Existing gem_create behaviour, with added
> >> + * extension support using struct i915_user_extension.
> >> + *
> >> + * Note that in the future we want to have our buffer flags here, at least for
> >> + * the stuff that is immutable. Previously we would have two ioctls, one to
> >> + * create the object with gem_create, and another to apply various parameters,
> >> + * however this creates some ambiguity for the params which are considered
> >> + * immutable. Also in general we're phasing out the various SET/GET ioctls.
> >> + */
> >> +struct drm_i915_gem_create_ext {
> >> +       /**
> >> +        * @size: Requested size for the object.
> >> +        *
> >> +        * The (page-aligned) allocated size for the object will be returned.
> >> +        *
> >> +        * Note that for some devices we have might have further minimum
> >> +        * page-size restrictions(larger than 4K), like for device local-memory.
> >> +        * However in general the final size here should always reflect any
> >> +        * rounding up, if for example using the I915_GEM_CREATE_EXT_MEMORY_REGIONS
> >> +        * extension to place the object in device local-memory.
> >> +        */
> >> +       __u64 size;
> >> +       /**
> >> +        * @handle: Returned handle for the object.
> >> +        *
> >> +        * Object handles are nonzero.
> >> +        */
> >> +       __u32 handle;
> >> +       /** @flags: MBZ */
> >> +       __u32 flags;
> >> +       /**
> >> +        * @extensions: The chain of extensions to apply to this object.
> >> +        *
> >> +        * This will be useful in the future when we need to support several
> >> +        * different extensions, and we need to apply more than one when
> >> +        * creating the object. See struct i915_user_extension.
> >> +        *
> >> +        * If we don't supply any extensions then we get the same old gem_create
> >> +        * behaviour.
> >> +        *
> >> +        * For I915_GEM_CREATE_EXT_MEMORY_REGIONS usage see
> >> +        * struct drm_i915_gem_create_ext_memory_regions.
> >> +        */
> >> +#define I915_GEM_CREATE_EXT_MEMORY_REGIONS 0
> >> +       __u64 extensions;
> >> +};
> >> +
> >> +/**
> >> + * struct drm_i915_gem_create_ext_memory_regions - The
> >> + * I915_GEM_CREATE_EXT_MEMORY_REGIONS extension.
> >> + *
> >> + * Set the object with the desired set of placements/regions in priority
> >> + * order. Each entry must be unique and supported by the device.
> >> + *
> >> + * This is provided as an array of struct drm_i915_gem_memory_class_instance, or
> >> + * an equivalent layout of class:instance pair encodings. See struct
> >> + * drm_i915_query_memory_regions and DRM_I915_QUERY_MEMORY_REGIONS for how to
> >> + * query the supported regions for a device.
> >> + *
> >> + * As an example, on discrete devices, if we wish to set the placement as
> >> + * device local-memory we can do something like:
> >> + *
> >> + * .. code-block:: C
> >> + *
> >> + *     struct drm_i915_gem_memory_class_instance region_lmem = {
> >> + *              .memory_class = I915_MEMORY_CLASS_DEVICE,
> >> + *              .memory_instance = 0,
> >> + *      };
> >> + *      struct drm_i915_gem_create_ext_memory_regions regions = {
> >> + *              .base = { .name = I915_GEM_CREATE_EXT_MEMORY_REGIONS },
> >> + *              .regions = (uintptr_t)&region_lmem,
> >> + *              .num_regions = 1,
> >> + *      };
> >> + *      struct drm_i915_gem_create_ext create_ext = {
> >> + *              .size = 16 * PAGE_SIZE,
> >> + *              .extensions = (uintptr_t)&regions,
> >> + *      };
> >> + *
> >> + *      int err = ioctl(fd, DRM_IOCTL_I915_GEM_CREATE_EXT, &create_ext);
> >> + *      if (err) ...
> >> + *
> >> + * At which point we get the object handle in &drm_i915_gem_create_ext.handle,
> >> + * along with the final object size in &drm_i915_gem_create_ext.size, which
> >> + * should account for any rounding up, if required.
> >> + */
> >> +struct drm_i915_gem_create_ext_memory_regions {
> >> +       /** @base: Extension link. See struct i915_user_extension. */
> >> +       struct i915_user_extension base;
> >> +
> >> +       /** @pad: MBZ */
> >> +       __u32 pad;
> >> +       /** @num_regions: Number of elements in the @regions array. */
> >> +       __u32 num_regions;
> >> +       /**
> >> +        * @regions: The regions/placements array.
> >> +        *
> >> +        * An array of struct drm_i915_gem_memory_class_instance.
> >> +        */
> >> +       __u64 regions;
> >> +};
> >> diff --git a/Documentation/gpu/rfc/i915_gem_lmem.rst b/Documentation/gpu/rfc/i915_gem_lmem.rst
> >> new file mode 100644
> >> index 000000000000..462f1efd9003
> >> --- /dev/null
> >> +++ b/Documentation/gpu/rfc/i915_gem_lmem.rst
> >> @@ -0,0 +1,130 @@
> >> +=========================
> >> +I915 DG1/LMEM RFC Section
> >> +=========================
> >> +
> >> +Upstream plan
> >> +=============
> >> +For upstream the overall plan for landing all the DG1 stuff and turning it for
> >> +real, with all the uAPI bits is:
> >> +
> >> +* Merge basic HW enabling of DG1(still without pciid)
> >> +* Merge the uAPI bits behind special CONFIG_BROKEN(or so) flag
> >> +        * At this point we can still make changes, but importantly this lets us
> >> +          start running IGTs which can utilize local-memory in CI
> >> +* Convert over to TTM, make sure it all keeps working. Some of the work items:
> >> +        * TTM shrinker for discrete
> >> +        * dma_resv_lockitem for full dma_resv_lock, i.e not just trylock
> >> +        * Use TTM CPU pagefault handler
> >> +        * Route shmem backend over to TTM SYSTEM for discrete
> >> +        * TTM purgeable object support
> >> +        * Move i915 buddy allocator over to TTM
> >> +        * MMAP ioctl mode(see `I915 MMAP`_)
> >> +        * SET/GET ioctl caching(see `I915 SET/GET CACHING`_)
> >> +* Add pciid for DG1 and turn on uAPI for real
> >> +
> >> +New object placement and region query uAPI
> >> +==========================================
> >> +Starting from DG1 we need to give userspace the ability to allocate buffers from
> >> +device local-memory. Currently the driver supports gem_create, which can place
> >> +buffers in system memory via shmem, and the usual assortment of other
> >> +interfaces, like dumb buffers and userptr.
> >> +
> >> +To support this new capability, while also providing a uAPI which will work
> >> +beyond just DG1, we propose to offer three new bits of uAPI:
> >> +
> >> +DRM_I915_QUERY_MEMORY_REGIONS
> >> +-----------------------------
> >> +New query ID which allows userspace to discover the list of supported memory
> >> +regions(like system-memory and local-memory) for a given device. We identify
> >> +each region with a class and instance pair, which should be unique. The class
> >> +here would be DEVICE or SYSTEM, and the instance would be zero, on platforms
> >> +like DG1.
> >> +
> >> +Side note: The class/instance design is borrowed from our existing engine uAPI,
> >> +where we describe every physical engine in terms of its class, and the
> >> +particular instance, since we can have more than one per class.
> >> +
> >> +In the future we also want to expose more information which can further
> >> +describe the capabilities of a region.
> >> +
> >> +.. kernel-doc:: Documentation/gpu/rfc/i915_gem_lmem.h
> >> +        :functions: drm_i915_gem_memory_class drm_i915_gem_memory_class_instance drm_i915_memory_region_info drm_i915_query_memory_regions
> >> +
> >> +GEM_CREATE_EXT
> >> +--------------
> >> +New ioctl which is basically just gem_create but now allows userspace to
> >> +provide a chain of possible extensions. Note that if we don't provide any
> >> +extensions then we get the exact same behaviour as gem_create.
> >> +
> >> +Side note: We also need to support PXP[1] in the near future, which is also
> >> +applicable to integrated platforms, and adds its own gem_create_ext extension,
> >> +which basically lets userspace mark a buffer as "protected".
> >> +
> >> +.. kernel-doc:: Documentation/gpu/rfc/i915_gem_lmem.h
> >> +        :functions: drm_i915_gem_create_ext
> >> +
> >> +I915_GEM_CREATE_EXT_MEMORY_REGIONS
> >> +----------------------------------
> >> +Implemented as an extension for gem_create_ext, we would now allow userspace to
> >> +optionally provide an immutable list of preferred placements at creation time,
> >> +in priority order, for a given buffer object.  For the placements we expect
> >> +them each to use the class/instance encoding, as per the output of the regions
> >> +query. Having the list in priority order will be useful in the future when
> >> +placing an object, say during eviction.
> >> +
> >> +.. kernel-doc:: Documentation/gpu/rfc/i915_gem_lmem.h
> >> +        :functions: drm_i915_gem_create_ext_memory_regions
> >> +
> >> +One fair criticism here is that this seems a little over-engineered[2]. If we
> >> +just consider DG1 then yes, a simple gem_create.flags or something is totally
> >> +all that's needed to tell the kernel to allocate the buffer in local-memory or
> >> +whatever. However looking to the future we need uAPI which can also support
> >> +upcoming Xe HP multi-tile architecture in a sane way, where there can be
> >> +multiple local-memory instances for a given device, and so using both class and
> >> +instance in our uAPI to describe regions is desirable, although specifically
> >> +for DG1 it's uninteresting, since we only have a single local-memory instance.
> >> +
> >> +Existing uAPI issues
> >> +====================
> >> +Some potential issues we still need to resolve.
> >> +
> >> +I915 MMAP
> >> +---------
> >> +In i915 there are multiple ways to MMAP GEM object, including mapping the same
> >> +object using different mapping types(WC vs WB), i.e multiple active mmaps per
> >> +object. TTM expects one MMAP at most for the lifetime of the object. If it
> >> +turns out that we have to backpedal here, there might be some potential
> >> +userspace fallout.
> >> +
> >> +I915 SET/GET CACHING
> >> +--------------------
> >> +In i915 we have set/get_caching ioctl. TTM doesn't let us to change this, but
> >> +DG1 doesn't support non-snooped pcie transactions, so we can just always
> >> +allocate as WB for smem-only buffers.  If/when our hw gains support for
> >> +non-snooped pcie transactions then we must fix this mode at allocation time as
> >> +a new GEM extension.
> >> +
> >> +This is related to the mmap problem, because in general (meaning, when we're
> >> +not running on intel cpus) the cpu mmap must not, ever, be inconsistent with
> >> +allocation mode.
> >> +
> >> +Possible idea is to let the kernel picks the mmap mode for userspace from the
> >> +following table:
> >> +
> >> +smem-only: WB. Userspace does not need to call clflush.
> >> +
> >> +smem+lmem: We allocate uncached memory, and give userspace a WC mapping
> >> +for when the buffer is in smem, and WC when it's in lmem. GPU does snooped
> >> +access, which is a bit inefficient.
> >> +
> >> +lmem only: always WC
> >> +
> >> +This means on discrete you only get a single mmap mode, all others must be
> >> +rejected. That's probably going to be a new default mode or something like
> >> +that.
> >> +
> >> +Links
> >> +=====
> >> +[1] https://patchwork.freedesktop.org/series/86798/
> >> +
> >> +[2] https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5599#note_553791
> >> diff --git a/Documentation/gpu/rfc/index.rst b/Documentation/gpu/rfc/index.rst
> >> index a8621f7dab8b..05670442ca1b 100644
> >> --- a/Documentation/gpu/rfc/index.rst
> >> +++ b/Documentation/gpu/rfc/index.rst
> >> @@ -15,3 +15,7 @@ host such documentation:
> >>
> >>   * Once the code has landed move all the documentation to the right places in
> >>     the main core, helper or driver sections.
> >> +
> >> +.. toctree::
> >> +
> >> +    i915_gem_lmem.rst
> >> --
> >> 2.26.3
> >>
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/9] drm/doc/rfc: i915 DG1 uAPI
  2021-04-26 16:25     ` Jason Ekstrand
@ 2021-04-26 16:32       ` Daniel Vetter
  0 siblings, 0 replies; 22+ messages in thread
From: Daniel Vetter @ 2021-04-26 16:32 UTC (permalink / raw)
  To: Jason Ekstrand
  Cc: Lionel Landwerlin, Thomas Hellström, Dave Airlie,
	Jordan Justen, Intel GFX, Maling list - DRI developers,
	Daniel Vetter, Kenneth Graunke, Daniele Ceraolo Spurio,
	Jon Bloomfield, Matthew Auld, ML mesa-dev, Daniel Vetter

On Mon, Apr 26, 2021 at 11:25:09AM -0500, Jason Ekstrand wrote:
> On Mon, Apr 26, 2021 at 10:31 AM Matthew Auld <matthew.auld@intel.com> wrote:
> >
> > On 26/04/2021 16:11, Jason Ekstrand wrote:
> > > On Mon, Apr 26, 2021 at 4:42 AM Matthew Auld <matthew.auld@intel.com> wrote:
> > >>
> > >> Add an entry for the new uAPI needed for DG1. Also add the overall
> > >> upstream plan, including some notes for the TTM conversion.
> > >>
> > >> v2(Daniel):
> > >>    - include the overall upstreaming plan
> > >>    - add a note for mmap, there are differences here for TTM vs i915
> > >>    - bunch of other suggestions from Daniel
> > >> v3:
> > >>   (Daniel)
> > >>    - add a note for set/get caching stuff
> > >>    - add some more docs for existing query and extensions stuff
> > >>    - add an actual code example for regions query
> > >>    - bunch of other stuff
> > >>   (Jason)
> > >>    - uAPI change(!):
> > >>          - try a simpler design with the placements extension
> > >>          - rather than have a generic setparam which can cover multiple
> > >>            use cases, have each extension be responsible for one thing
> > >>            only
> > >> v4:
> > >>   (Daniel)
> > >>    - add some more notes for ttm conversion
> > >>    - bunch of other stuff
> > >>   (Jason)
> > >>    - uAPI change(!):
> > >>          - drop all the extra rsvd members for the region_query and
> > >>            region_info, just keep the bare minimum needed for padding
> > >>
> > >> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> > >> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> > >> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > >> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> > >> Cc: Lionel Landwerlin <lionel.g.landwerlin@linux.intel.com>
> > >> Cc: Jon Bloomfield <jon.bloomfield@intel.com>
> > >> Cc: Jordan Justen <jordan.l.justen@intel.com>
> > >> Cc: Daniel Vetter <daniel.vetter@intel.com>
> > >> Cc: Kenneth Graunke <kenneth@whitecape.org>
> > >> Cc: Jason Ekstrand <jason@jlekstrand.net>
> > >> Cc: Dave Airlie <airlied@gmail.com>
> > >> Cc: dri-devel@lists.freedesktop.org
> > >> Cc: mesa-dev@lists.freedesktop.org
> > >> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> > >> Acked-by: Dave Airlie <airlied@redhat.com>
> > >> ---
> > >>   Documentation/gpu/rfc/i915_gem_lmem.h   | 212 ++++++++++++++++++++++++
> > >>   Documentation/gpu/rfc/i915_gem_lmem.rst | 130 +++++++++++++++
> > >>   Documentation/gpu/rfc/index.rst         |   4 +
> > >>   3 files changed, 346 insertions(+)
> > >>   create mode 100644 Documentation/gpu/rfc/i915_gem_lmem.h
> > >>   create mode 100644 Documentation/gpu/rfc/i915_gem_lmem.rst
> > >>
> > >> diff --git a/Documentation/gpu/rfc/i915_gem_lmem.h b/Documentation/gpu/rfc/i915_gem_lmem.h
> > >> new file mode 100644
> > >> index 000000000000..7ed59b6202d5
> > >> --- /dev/null
> > >> +++ b/Documentation/gpu/rfc/i915_gem_lmem.h
> > >> @@ -0,0 +1,212 @@
> > >> +/**
> > >> + * enum drm_i915_gem_memory_class - Supported memory classes
> > >> + */
> > >> +enum drm_i915_gem_memory_class {
> > >> +       /** @I915_MEMORY_CLASS_SYSTEM: System memory */
> > >> +       I915_MEMORY_CLASS_SYSTEM = 0,
> > >> +       /** @I915_MEMORY_CLASS_DEVICE: Device local-memory */
> > >> +       I915_MEMORY_CLASS_DEVICE,
> > >> +};
> > >> +
> > >> +/**
> > >> + * struct drm_i915_gem_memory_class_instance - Identify particular memory region
> > >> + */
> > >> +struct drm_i915_gem_memory_class_instance {
> > >> +       /** @memory_class: See enum drm_i915_gem_memory_class */
> > >> +       __u16 memory_class;
> > >> +
> > >> +       /** @memory_instance: Which instance */
> > >> +       __u16 memory_instance;
> > >> +};
> > >> +
> > >> +/**
> > >> + * struct drm_i915_memory_region_info - Describes one region as known to the
> > >> + * driver.
> > >> + *
> > >> + * Note that we reserve some stuff here for potential future work. As an example
> > >> + * we might want expose the capabilities(see @caps) for a given region, which
> > >> + * could include things like if the region is CPU mappable/accessible, what are
> > >> + * the supported mapping types etc.
> > >> + *
> > >> + * Note this is using both struct drm_i915_query_item and struct drm_i915_query.
> > >> + * For this new query we are adding the new query id DRM_I915_QUERY_MEMORY_REGIONS
> > >> + * at &drm_i915_query_item.query_id.
> > >> + */
> > >> +struct drm_i915_memory_region_info {
> > >> +       /** @region: The class:instance pair encoding */
> > >> +       struct drm_i915_gem_memory_class_instance region;
> > >> +
> > >> +       /** @pad: MBZ */
> > >> +       __u32 pad;
> > >> +
> > >> +       /** @caps: MBZ */
> > >> +       __u64 caps;
> > >> +
> > >> +       /** @probed_size: Memory probed by the driver (-1 = unknown) */
> > >> +       __u64 probed_size;
> > >> +
> > >> +       /** @unallocated_size: Estimate of memory remaining (-1 = unknown) */
> > >> +       __u64 unallocated_size;
> > >> +};
> > >> +
> > >> +/**
> > >> + * struct drm_i915_query_memory_regions
> > >> + *
> > >> + * The region info query enumerates all regions known to the driver by filling
> > >> + * in an array of struct drm_i915_memory_region_info structures.
> > >> + *
> > >> + * Example for getting the list of supported regions:
> > >> + *
> > >> + * .. code-block:: C
> > >> + *
> > >> + *     struct drm_i915_query_memory_regions *info;
> > >> + *     struct drm_i915_query_item item = {
> > >> + *             .query_id = DRM_I915_QUERY_MEMORY_REGIONS;
> > >> + *     };
> > >> + *     struct drm_i915_query query = {
> > >> + *             .num_items = 1,
> > >> + *             .items_ptr = (uintptr_t)&item,
> > >> + *     };
> > >> + *     int err, i;
> > >> + *
> > >> + *     // First query the size of the blob we need, this needs to be large
> > >> + *     // enough to hold our array of regions. The kernel will fill out the
> > >> + *     // item.length for us, which is the number of bytes we need.
> > >> + *     err = ioctl(fd, DRM_IOCTL_I915_QUERY, &query);
> > >> + *     if (err) ...
> > >> + *
> > >> + *     info = calloc(1, item.length);
> > >> + *     // Now that we allocated the required number of bytes, we call the ioctl
> > >> + *     // again, this time with the data_ptr pointing to our newly allocated
> > >> + *     // blob, which the kernel can then populate with the all the region info.
> > >> + *     item.data_ptr = (uintptr_t)&info,
> > >> + *
> > >> + *     err = ioctl(fd, DRM_IOCTL_I915_QUERY, &query);
> > >> + *     if (err) ...
> > >> + *
> > >> + *     // We can now access each region in the array
> > >> + *     for (i = 0; i < info->num_regions; i++) {
> > >> + *             struct drm_i915_memory_region_info mr = info->regions[i];
> > >> + *             u16 class = mr.region.class;
> > >> + *             u16 instance = mr.region.instance;
> > >> + *
> > >> + *             ....
> > >> + *     }
> > >> + *
> > >> + *     free(info);
> > >> + */
> > >> +struct drm_i915_query_memory_regions {
> > >> +       /** @num_regions: Number of supported regions */
> > >> +       __u32 num_regions;
> > >> +
> > >> +       /** @pad: MBZ */
> > >> +       __u32 pad;
> > >> +
> > >> +       /** @regions: Info about each supported region */
> > >> +       struct drm_i915_memory_region_info regions[];
> > >> +};
> > >> +
> > >> +#define DRM_I915_GEM_CREATE_EXT                0xdeadbeaf
> > >> +#define DRM_IOCTL_I915_GEM_CREATE_EXT  DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_CREATE_EXT, struct drm_i915_gem_create_ext)
> > >
> > > Here's another thought:  Instead of burning a new IOCTL number, should
> > > we just re-use DRM_I915_GEM_CREATE?  The different structure size
> > > should let us tell the two apart.
> >
> > Yeah, it was exactly like that in the original version[1]. Scrolling
> > through the review comments, I think the concern was with an older
> > kernel just silently ignoring the extensions with new userspace.
> >
> > [1] https://patchwork.freedesktop.org/patch/404455/?series=84344&rev=1
> 
> Yeah, I think Chris is right.  I just dug through the code and there
> really is no way for us to reject different sized structs; they just
> show up zero-extended.  That's aggravating.

I'm not sure why that's a big deal? Userspace doing nonsense isn't really
an issue, as long as we have a getparam or similar for announcing the
flags.

But also we're not that limited yet on ioctl numbers (but when they're
gone, they're gone), so *shrug*.
-Daniel

> 
> --Jason
> 
> > >
> > > --Jason
> > >
> > >
> > >> +
> > >> +/**
> > >> + * struct drm_i915_gem_create_ext - Existing gem_create behaviour, with added
> > >> + * extension support using struct i915_user_extension.
> > >> + *
> > >> + * Note that in the future we want to have our buffer flags here, at least for
> > >> + * the stuff that is immutable. Previously we would have two ioctls, one to
> > >> + * create the object with gem_create, and another to apply various parameters,
> > >> + * however this creates some ambiguity for the params which are considered
> > >> + * immutable. Also in general we're phasing out the various SET/GET ioctls.
> > >> + */
> > >> +struct drm_i915_gem_create_ext {
> > >> +       /**
> > >> +        * @size: Requested size for the object.
> > >> +        *
> > >> +        * The (page-aligned) allocated size for the object will be returned.
> > >> +        *
> > >> +        * Note that for some devices we have might have further minimum
> > >> +        * page-size restrictions(larger than 4K), like for device local-memory.
> > >> +        * However in general the final size here should always reflect any
> > >> +        * rounding up, if for example using the I915_GEM_CREATE_EXT_MEMORY_REGIONS
> > >> +        * extension to place the object in device local-memory.
> > >> +        */
> > >> +       __u64 size;
> > >> +       /**
> > >> +        * @handle: Returned handle for the object.
> > >> +        *
> > >> +        * Object handles are nonzero.
> > >> +        */
> > >> +       __u32 handle;
> > >> +       /** @flags: MBZ */
> > >> +       __u32 flags;
> > >> +       /**
> > >> +        * @extensions: The chain of extensions to apply to this object.
> > >> +        *
> > >> +        * This will be useful in the future when we need to support several
> > >> +        * different extensions, and we need to apply more than one when
> > >> +        * creating the object. See struct i915_user_extension.
> > >> +        *
> > >> +        * If we don't supply any extensions then we get the same old gem_create
> > >> +        * behaviour.
> > >> +        *
> > >> +        * For I915_GEM_CREATE_EXT_MEMORY_REGIONS usage see
> > >> +        * struct drm_i915_gem_create_ext_memory_regions.
> > >> +        */
> > >> +#define I915_GEM_CREATE_EXT_MEMORY_REGIONS 0
> > >> +       __u64 extensions;
> > >> +};
> > >> +
> > >> +/**
> > >> + * struct drm_i915_gem_create_ext_memory_regions - The
> > >> + * I915_GEM_CREATE_EXT_MEMORY_REGIONS extension.
> > >> + *
> > >> + * Set the object with the desired set of placements/regions in priority
> > >> + * order. Each entry must be unique and supported by the device.
> > >> + *
> > >> + * This is provided as an array of struct drm_i915_gem_memory_class_instance, or
> > >> + * an equivalent layout of class:instance pair encodings. See struct
> > >> + * drm_i915_query_memory_regions and DRM_I915_QUERY_MEMORY_REGIONS for how to
> > >> + * query the supported regions for a device.
> > >> + *
> > >> + * As an example, on discrete devices, if we wish to set the placement as
> > >> + * device local-memory we can do something like:
> > >> + *
> > >> + * .. code-block:: C
> > >> + *
> > >> + *     struct drm_i915_gem_memory_class_instance region_lmem = {
> > >> + *              .memory_class = I915_MEMORY_CLASS_DEVICE,
> > >> + *              .memory_instance = 0,
> > >> + *      };
> > >> + *      struct drm_i915_gem_create_ext_memory_regions regions = {
> > >> + *              .base = { .name = I915_GEM_CREATE_EXT_MEMORY_REGIONS },
> > >> + *              .regions = (uintptr_t)&region_lmem,
> > >> + *              .num_regions = 1,
> > >> + *      };
> > >> + *      struct drm_i915_gem_create_ext create_ext = {
> > >> + *              .size = 16 * PAGE_SIZE,
> > >> + *              .extensions = (uintptr_t)&regions,
> > >> + *      };
> > >> + *
> > >> + *      int err = ioctl(fd, DRM_IOCTL_I915_GEM_CREATE_EXT, &create_ext);
> > >> + *      if (err) ...
> > >> + *
> > >> + * At which point we get the object handle in &drm_i915_gem_create_ext.handle,
> > >> + * along with the final object size in &drm_i915_gem_create_ext.size, which
> > >> + * should account for any rounding up, if required.
> > >> + */
> > >> +struct drm_i915_gem_create_ext_memory_regions {
> > >> +       /** @base: Extension link. See struct i915_user_extension. */
> > >> +       struct i915_user_extension base;
> > >> +
> > >> +       /** @pad: MBZ */
> > >> +       __u32 pad;
> > >> +       /** @num_regions: Number of elements in the @regions array. */
> > >> +       __u32 num_regions;
> > >> +       /**
> > >> +        * @regions: The regions/placements array.
> > >> +        *
> > >> +        * An array of struct drm_i915_gem_memory_class_instance.
> > >> +        */
> > >> +       __u64 regions;
> > >> +};
> > >> diff --git a/Documentation/gpu/rfc/i915_gem_lmem.rst b/Documentation/gpu/rfc/i915_gem_lmem.rst
> > >> new file mode 100644
> > >> index 000000000000..462f1efd9003
> > >> --- /dev/null
> > >> +++ b/Documentation/gpu/rfc/i915_gem_lmem.rst
> > >> @@ -0,0 +1,130 @@
> > >> +=========================
> > >> +I915 DG1/LMEM RFC Section
> > >> +=========================
> > >> +
> > >> +Upstream plan
> > >> +=============
> > >> +For upstream the overall plan for landing all the DG1 stuff and turning it for
> > >> +real, with all the uAPI bits is:
> > >> +
> > >> +* Merge basic HW enabling of DG1(still without pciid)
> > >> +* Merge the uAPI bits behind special CONFIG_BROKEN(or so) flag
> > >> +        * At this point we can still make changes, but importantly this lets us
> > >> +          start running IGTs which can utilize local-memory in CI
> > >> +* Convert over to TTM, make sure it all keeps working. Some of the work items:
> > >> +        * TTM shrinker for discrete
> > >> +        * dma_resv_lockitem for full dma_resv_lock, i.e not just trylock
> > >> +        * Use TTM CPU pagefault handler
> > >> +        * Route shmem backend over to TTM SYSTEM for discrete
> > >> +        * TTM purgeable object support
> > >> +        * Move i915 buddy allocator over to TTM
> > >> +        * MMAP ioctl mode(see `I915 MMAP`_)
> > >> +        * SET/GET ioctl caching(see `I915 SET/GET CACHING`_)
> > >> +* Add pciid for DG1 and turn on uAPI for real
> > >> +
> > >> +New object placement and region query uAPI
> > >> +==========================================
> > >> +Starting from DG1 we need to give userspace the ability to allocate buffers from
> > >> +device local-memory. Currently the driver supports gem_create, which can place
> > >> +buffers in system memory via shmem, and the usual assortment of other
> > >> +interfaces, like dumb buffers and userptr.
> > >> +
> > >> +To support this new capability, while also providing a uAPI which will work
> > >> +beyond just DG1, we propose to offer three new bits of uAPI:
> > >> +
> > >> +DRM_I915_QUERY_MEMORY_REGIONS
> > >> +-----------------------------
> > >> +New query ID which allows userspace to discover the list of supported memory
> > >> +regions(like system-memory and local-memory) for a given device. We identify
> > >> +each region with a class and instance pair, which should be unique. The class
> > >> +here would be DEVICE or SYSTEM, and the instance would be zero, on platforms
> > >> +like DG1.
> > >> +
> > >> +Side note: The class/instance design is borrowed from our existing engine uAPI,
> > >> +where we describe every physical engine in terms of its class, and the
> > >> +particular instance, since we can have more than one per class.
> > >> +
> > >> +In the future we also want to expose more information which can further
> > >> +describe the capabilities of a region.
> > >> +
> > >> +.. kernel-doc:: Documentation/gpu/rfc/i915_gem_lmem.h
> > >> +        :functions: drm_i915_gem_memory_class drm_i915_gem_memory_class_instance drm_i915_memory_region_info drm_i915_query_memory_regions
> > >> +
> > >> +GEM_CREATE_EXT
> > >> +--------------
> > >> +New ioctl which is basically just gem_create but now allows userspace to
> > >> +provide a chain of possible extensions. Note that if we don't provide any
> > >> +extensions then we get the exact same behaviour as gem_create.
> > >> +
> > >> +Side note: We also need to support PXP[1] in the near future, which is also
> > >> +applicable to integrated platforms, and adds its own gem_create_ext extension,
> > >> +which basically lets userspace mark a buffer as "protected".
> > >> +
> > >> +.. kernel-doc:: Documentation/gpu/rfc/i915_gem_lmem.h
> > >> +        :functions: drm_i915_gem_create_ext
> > >> +
> > >> +I915_GEM_CREATE_EXT_MEMORY_REGIONS
> > >> +----------------------------------
> > >> +Implemented as an extension for gem_create_ext, we would now allow userspace to
> > >> +optionally provide an immutable list of preferred placements at creation time,
> > >> +in priority order, for a given buffer object.  For the placements we expect
> > >> +them each to use the class/instance encoding, as per the output of the regions
> > >> +query. Having the list in priority order will be useful in the future when
> > >> +placing an object, say during eviction.
> > >> +
> > >> +.. kernel-doc:: Documentation/gpu/rfc/i915_gem_lmem.h
> > >> +        :functions: drm_i915_gem_create_ext_memory_regions
> > >> +
> > >> +One fair criticism here is that this seems a little over-engineered[2]. If we
> > >> +just consider DG1 then yes, a simple gem_create.flags or something is totally
> > >> +all that's needed to tell the kernel to allocate the buffer in local-memory or
> > >> +whatever. However looking to the future we need uAPI which can also support
> > >> +upcoming Xe HP multi-tile architecture in a sane way, where there can be
> > >> +multiple local-memory instances for a given device, and so using both class and
> > >> +instance in our uAPI to describe regions is desirable, although specifically
> > >> +for DG1 it's uninteresting, since we only have a single local-memory instance.
> > >> +
> > >> +Existing uAPI issues
> > >> +====================
> > >> +Some potential issues we still need to resolve.
> > >> +
> > >> +I915 MMAP
> > >> +---------
> > >> +In i915 there are multiple ways to MMAP GEM object, including mapping the same
> > >> +object using different mapping types(WC vs WB), i.e multiple active mmaps per
> > >> +object. TTM expects one MMAP at most for the lifetime of the object. If it
> > >> +turns out that we have to backpedal here, there might be some potential
> > >> +userspace fallout.
> > >> +
> > >> +I915 SET/GET CACHING
> > >> +--------------------
> > >> +In i915 we have set/get_caching ioctl. TTM doesn't let us to change this, but
> > >> +DG1 doesn't support non-snooped pcie transactions, so we can just always
> > >> +allocate as WB for smem-only buffers.  If/when our hw gains support for
> > >> +non-snooped pcie transactions then we must fix this mode at allocation time as
> > >> +a new GEM extension.
> > >> +
> > >> +This is related to the mmap problem, because in general (meaning, when we're
> > >> +not running on intel cpus) the cpu mmap must not, ever, be inconsistent with
> > >> +allocation mode.
> > >> +
> > >> +Possible idea is to let the kernel picks the mmap mode for userspace from the
> > >> +following table:
> > >> +
> > >> +smem-only: WB. Userspace does not need to call clflush.
> > >> +
> > >> +smem+lmem: We allocate uncached memory, and give userspace a WC mapping
> > >> +for when the buffer is in smem, and WC when it's in lmem. GPU does snooped
> > >> +access, which is a bit inefficient.
> > >> +
> > >> +lmem only: always WC
> > >> +
> > >> +This means on discrete you only get a single mmap mode, all others must be
> > >> +rejected. That's probably going to be a new default mode or something like
> > >> +that.
> > >> +
> > >> +Links
> > >> +=====
> > >> +[1] https://patchwork.freedesktop.org/series/86798/
> > >> +
> > >> +[2] https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5599#note_553791
> > >> diff --git a/Documentation/gpu/rfc/index.rst b/Documentation/gpu/rfc/index.rst
> > >> index a8621f7dab8b..05670442ca1b 100644
> > >> --- a/Documentation/gpu/rfc/index.rst
> > >> +++ b/Documentation/gpu/rfc/index.rst
> > >> @@ -15,3 +15,7 @@ host such documentation:
> > >>
> > >>   * Once the code has landed move all the documentation to the right places in
> > >>     the main core, helper or driver sections.
> > >> +
> > >> +.. toctree::
> > >> +
> > >> +    i915_gem_lmem.rst
> > >> --
> > >> 2.26.3
> > >>

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/9] drm/doc/rfc: i915 DG1 uAPI
  2021-04-26  9:38 [PATCH 1/9] drm/doc/rfc: i915 DG1 uAPI Matthew Auld
                   ` (8 preceding siblings ...)
  2021-04-26 15:11 ` [PATCH 1/9] drm/doc/rfc: i915 DG1 uAPI Jason Ekstrand
@ 2021-04-28 15:16 ` Kenneth Graunke
  2021-04-28 16:10   ` Matthew Auld
  2021-04-28 15:51 ` Jason Ekstrand
                   ` (2 subsequent siblings)
  12 siblings, 1 reply; 22+ messages in thread
From: Kenneth Graunke @ 2021-04-28 15:16 UTC (permalink / raw)
  To: intel-gfx, Matthew Auld
  Cc: Lionel Landwerlin, Thomas Hellström, Dave Airlie,
	Jordan Justen, dri-devel, Daniel Vetter, Daniele Ceraolo Spurio,
	Jon Bloomfield, Jason Ekstrand, mesa-dev, Daniel Vetter


[-- Attachment #1.1: Type: text/plain, Size: 2076 bytes --]

On Monday, April 26, 2021 2:38:53 AM PDT Matthew Auld wrote:
> +Existing uAPI issues
> +====================
> +Some potential issues we still need to resolve.
> +
> +I915 MMAP
> +---------
> +In i915 there are multiple ways to MMAP GEM object, including mapping the same
> +object using different mapping types(WC vs WB), i.e multiple active mmaps per
> +object. TTM expects one MMAP at most for the lifetime of the object. If it
> +turns out that we have to backpedal here, there might be some potential
> +userspace fallout.
> +
> +I915 SET/GET CACHING
> +--------------------
> +In i915 we have set/get_caching ioctl. TTM doesn't let us to change this, but
> +DG1 doesn't support non-snooped pcie transactions, so we can just always
> +allocate as WB for smem-only buffers.  If/when our hw gains support for
> +non-snooped pcie transactions then we must fix this mode at allocation time as
> +a new GEM extension.
> +
> +This is related to the mmap problem, because in general (meaning, when we're
> +not running on intel cpus) the cpu mmap must not, ever, be inconsistent with
> +allocation mode.
> +
> +Possible idea is to let the kernel picks the mmap mode for userspace from the
> +following table:
> +
> +smem-only: WB. Userspace does not need to call clflush.
> +
> +smem+lmem: We allocate uncached memory, and give userspace a WC mapping
> +for when the buffer is in smem, and WC when it's in lmem. GPU does snooped
> +access, which is a bit inefficient.

I think you meant to write something different here.  What I read was:

- If it's in SMEM, give them WC
- If it's in LMEM, give them WC

Presumably one of those should have been something else, since otherwise
you would have written "always WC" :)

> +
> +lmem only: always WC
> +
> +This means on discrete you only get a single mmap mode, all others must be
> +rejected. That's probably going to be a new default mode or something like
> +that.
> +
> +Links
> +=====
> +[1] https://patchwork.freedesktop.org/series/86798/
> +
> +[2] https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5599#note_553791

[-- Attachment #1.2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/9] drm/doc/rfc: i915 DG1 uAPI
  2021-04-26  9:38 [PATCH 1/9] drm/doc/rfc: i915 DG1 uAPI Matthew Auld
                   ` (9 preceding siblings ...)
  2021-04-28 15:16 ` Kenneth Graunke
@ 2021-04-28 15:51 ` Jason Ekstrand
  2021-04-28 16:41   ` Matthew Auld
  2021-04-28 17:30 ` Kenneth Graunke
  2021-04-28 17:39 ` Bloomfield, Jon
  12 siblings, 1 reply; 22+ messages in thread
From: Jason Ekstrand @ 2021-04-28 15:51 UTC (permalink / raw)
  To: Matthew Auld
  Cc: Lionel Landwerlin, Thomas Hellström, Dave Airlie,
	Jordan Justen, Intel GFX, Maling list - DRI developers,
	Daniel Vetter, Kenneth Graunke, Daniele Ceraolo Spurio,
	Jon Bloomfield, ML mesa-dev, Daniel Vetter

On Mon, Apr 26, 2021 at 4:42 AM Matthew Auld <matthew.auld@intel.com> wrote:
>
> Add an entry for the new uAPI needed for DG1. Also add the overall
> upstream plan, including some notes for the TTM conversion.
>
> v2(Daniel):
>   - include the overall upstreaming plan
>   - add a note for mmap, there are differences here for TTM vs i915
>   - bunch of other suggestions from Daniel
> v3:
>  (Daniel)
>   - add a note for set/get caching stuff
>   - add some more docs for existing query and extensions stuff
>   - add an actual code example for regions query
>   - bunch of other stuff
>  (Jason)
>   - uAPI change(!):
>         - try a simpler design with the placements extension
>         - rather than have a generic setparam which can cover multiple
>           use cases, have each extension be responsible for one thing
>           only
> v4:
>  (Daniel)
>   - add some more notes for ttm conversion
>   - bunch of other stuff
>  (Jason)
>   - uAPI change(!):
>         - drop all the extra rsvd members for the region_query and
>           region_info, just keep the bare minimum needed for padding
>
> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> Cc: Lionel Landwerlin <lionel.g.landwerlin@linux.intel.com>
> Cc: Jon Bloomfield <jon.bloomfield@intel.com>
> Cc: Jordan Justen <jordan.l.justen@intel.com>
> Cc: Daniel Vetter <daniel.vetter@intel.com>
> Cc: Kenneth Graunke <kenneth@whitecape.org>
> Cc: Jason Ekstrand <jason@jlekstrand.net>
> Cc: Dave Airlie <airlied@gmail.com>
> Cc: dri-devel@lists.freedesktop.org
> Cc: mesa-dev@lists.freedesktop.org
> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> Acked-by: Dave Airlie <airlied@redhat.com>
> ---
>  Documentation/gpu/rfc/i915_gem_lmem.h   | 212 ++++++++++++++++++++++++
>  Documentation/gpu/rfc/i915_gem_lmem.rst | 130 +++++++++++++++
>  Documentation/gpu/rfc/index.rst         |   4 +
>  3 files changed, 346 insertions(+)
>  create mode 100644 Documentation/gpu/rfc/i915_gem_lmem.h
>  create mode 100644 Documentation/gpu/rfc/i915_gem_lmem.rst
>
> diff --git a/Documentation/gpu/rfc/i915_gem_lmem.h b/Documentation/gpu/rfc/i915_gem_lmem.h
> new file mode 100644
> index 000000000000..7ed59b6202d5
> --- /dev/null
> +++ b/Documentation/gpu/rfc/i915_gem_lmem.h
> @@ -0,0 +1,212 @@
> +/**
> + * enum drm_i915_gem_memory_class - Supported memory classes
> + */
> +enum drm_i915_gem_memory_class {
> +       /** @I915_MEMORY_CLASS_SYSTEM: System memory */
> +       I915_MEMORY_CLASS_SYSTEM = 0,
> +       /** @I915_MEMORY_CLASS_DEVICE: Device local-memory */
> +       I915_MEMORY_CLASS_DEVICE,
> +};
> +
> +/**
> + * struct drm_i915_gem_memory_class_instance - Identify particular memory region
> + */
> +struct drm_i915_gem_memory_class_instance {
> +       /** @memory_class: See enum drm_i915_gem_memory_class */
> +       __u16 memory_class;
> +
> +       /** @memory_instance: Which instance */
> +       __u16 memory_instance;
> +};
> +
> +/**
> + * struct drm_i915_memory_region_info - Describes one region as known to the
> + * driver.
> + *
> + * Note that we reserve some stuff here for potential future work. As an example
> + * we might want expose the capabilities(see @caps) for a given region, which
> + * could include things like if the region is CPU mappable/accessible, what are
> + * the supported mapping types etc.
> + *
> + * Note this is using both struct drm_i915_query_item and struct drm_i915_query.
> + * For this new query we are adding the new query id DRM_I915_QUERY_MEMORY_REGIONS
> + * at &drm_i915_query_item.query_id.
> + */
> +struct drm_i915_memory_region_info {
> +       /** @region: The class:instance pair encoding */
> +       struct drm_i915_gem_memory_class_instance region;
> +
> +       /** @pad: MBZ */
> +       __u32 pad;
> +
> +       /** @caps: MBZ */
> +       __u64 caps;

As was commented on another thread somewhere, if we're going to have
caps, we should have another __u64 supported_caps which tells
userspace what caps the kernel is capable of advertising.  That way
userspace can tell the difference between a kernel which doesn't
advertise a cap and a kernel which can advertise the cap but where the
cap isn't supported.

> +
> +       /** @probed_size: Memory probed by the driver (-1 = unknown) */
> +       __u64 probed_size;
> +
> +       /** @unallocated_size: Estimate of memory remaining (-1 = unknown) */
> +       __u64 unallocated_size;
> +};
> +
> +/**
> + * struct drm_i915_query_memory_regions
> + *
> + * The region info query enumerates all regions known to the driver by filling
> + * in an array of struct drm_i915_memory_region_info structures.
> + *
> + * Example for getting the list of supported regions:
> + *
> + * .. code-block:: C
> + *
> + *     struct drm_i915_query_memory_regions *info;
> + *     struct drm_i915_query_item item = {
> + *             .query_id = DRM_I915_QUERY_MEMORY_REGIONS;
> + *     };
> + *     struct drm_i915_query query = {
> + *             .num_items = 1,
> + *             .items_ptr = (uintptr_t)&item,
> + *     };
> + *     int err, i;
> + *
> + *     // First query the size of the blob we need, this needs to be large
> + *     // enough to hold our array of regions. The kernel will fill out the
> + *     // item.length for us, which is the number of bytes we need.
> + *     err = ioctl(fd, DRM_IOCTL_I915_QUERY, &query);
> + *     if (err) ...
> + *
> + *     info = calloc(1, item.length);
> + *     // Now that we allocated the required number of bytes, we call the ioctl
> + *     // again, this time with the data_ptr pointing to our newly allocated
> + *     // blob, which the kernel can then populate with the all the region info.
> + *     item.data_ptr = (uintptr_t)&info,
> + *
> + *     err = ioctl(fd, DRM_IOCTL_I915_QUERY, &query);
> + *     if (err) ...
> + *
> + *     // We can now access each region in the array
> + *     for (i = 0; i < info->num_regions; i++) {
> + *             struct drm_i915_memory_region_info mr = info->regions[i];
> + *             u16 class = mr.region.class;
> + *             u16 instance = mr.region.instance;
> + *
> + *             ....
> + *     }
> + *
> + *     free(info);
> + */
> +struct drm_i915_query_memory_regions {
> +       /** @num_regions: Number of supported regions */
> +       __u32 num_regions;
> +
> +       /** @pad: MBZ */
> +       __u32 pad;
> +
> +       /** @regions: Info about each supported region */
> +       struct drm_i915_memory_region_info regions[];
> +};
> +
> +#define DRM_I915_GEM_CREATE_EXT                0xdeadbeaf
> +#define DRM_IOCTL_I915_GEM_CREATE_EXT  DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_CREATE_EXT, struct drm_i915_gem_create_ext)
> +
> +/**
> + * struct drm_i915_gem_create_ext - Existing gem_create behaviour, with added
> + * extension support using struct i915_user_extension.
> + *
> + * Note that in the future we want to have our buffer flags here, at least for
> + * the stuff that is immutable. Previously we would have two ioctls, one to
> + * create the object with gem_create, and another to apply various parameters,
> + * however this creates some ambiguity for the params which are considered
> + * immutable. Also in general we're phasing out the various SET/GET ioctls.
> + */
> +struct drm_i915_gem_create_ext {
> +       /**
> +        * @size: Requested size for the object.
> +        *
> +        * The (page-aligned) allocated size for the object will be returned.
> +        *
> +        * Note that for some devices we have might have further minimum
> +        * page-size restrictions(larger than 4K), like for device local-memory.
> +        * However in general the final size here should always reflect any
> +        * rounding up, if for example using the I915_GEM_CREATE_EXT_MEMORY_REGIONS
> +        * extension to place the object in device local-memory.
> +        */
> +       __u64 size;
> +       /**
> +        * @handle: Returned handle for the object.
> +        *
> +        * Object handles are nonzero.
> +        */
> +       __u32 handle;
> +       /** @flags: MBZ */
> +       __u32 flags;
> +       /**
> +        * @extensions: The chain of extensions to apply to this object.
> +        *
> +        * This will be useful in the future when we need to support several
> +        * different extensions, and we need to apply more than one when
> +        * creating the object. See struct i915_user_extension.
> +        *
> +        * If we don't supply any extensions then we get the same old gem_create
> +        * behaviour.
> +        *
> +        * For I915_GEM_CREATE_EXT_MEMORY_REGIONS usage see
> +        * struct drm_i915_gem_create_ext_memory_regions.
> +        */
> +#define I915_GEM_CREATE_EXT_MEMORY_REGIONS 0
> +       __u64 extensions;
> +};
> +
> +/**
> + * struct drm_i915_gem_create_ext_memory_regions - The
> + * I915_GEM_CREATE_EXT_MEMORY_REGIONS extension.
> + *
> + * Set the object with the desired set of placements/regions in priority
> + * order. Each entry must be unique and supported by the device.
> + *
> + * This is provided as an array of struct drm_i915_gem_memory_class_instance, or
> + * an equivalent layout of class:instance pair encodings. See struct
> + * drm_i915_query_memory_regions and DRM_I915_QUERY_MEMORY_REGIONS for how to
> + * query the supported regions for a device.
> + *
> + * As an example, on discrete devices, if we wish to set the placement as
> + * device local-memory we can do something like:
> + *
> + * .. code-block:: C
> + *
> + *     struct drm_i915_gem_memory_class_instance region_lmem = {
> + *              .memory_class = I915_MEMORY_CLASS_DEVICE,
> + *              .memory_instance = 0,
> + *      };
> + *      struct drm_i915_gem_create_ext_memory_regions regions = {
> + *              .base = { .name = I915_GEM_CREATE_EXT_MEMORY_REGIONS },
> + *              .regions = (uintptr_t)&region_lmem,
> + *              .num_regions = 1,
> + *      };
> + *      struct drm_i915_gem_create_ext create_ext = {
> + *              .size = 16 * PAGE_SIZE,
> + *              .extensions = (uintptr_t)&regions,
> + *      };
> + *
> + *      int err = ioctl(fd, DRM_IOCTL_I915_GEM_CREATE_EXT, &create_ext);
> + *      if (err) ...
> + *
> + * At which point we get the object handle in &drm_i915_gem_create_ext.handle,
> + * along with the final object size in &drm_i915_gem_create_ext.size, which
> + * should account for any rounding up, if required.
> + */
> +struct drm_i915_gem_create_ext_memory_regions {
> +       /** @base: Extension link. See struct i915_user_extension. */
> +       struct i915_user_extension base;
> +
> +       /** @pad: MBZ */
> +       __u32 pad;
> +       /** @num_regions: Number of elements in the @regions array. */
> +       __u32 num_regions;
> +       /**
> +        * @regions: The regions/placements array.
> +        *
> +        * An array of struct drm_i915_gem_memory_class_instance.
> +        */
> +       __u64 regions;
> +};
> diff --git a/Documentation/gpu/rfc/i915_gem_lmem.rst b/Documentation/gpu/rfc/i915_gem_lmem.rst
> new file mode 100644
> index 000000000000..462f1efd9003
> --- /dev/null
> +++ b/Documentation/gpu/rfc/i915_gem_lmem.rst
> @@ -0,0 +1,130 @@
> +=========================
> +I915 DG1/LMEM RFC Section
> +=========================
> +
> +Upstream plan
> +=============
> +For upstream the overall plan for landing all the DG1 stuff and turning it for
> +real, with all the uAPI bits is:
> +
> +* Merge basic HW enabling of DG1(still without pciid)
> +* Merge the uAPI bits behind special CONFIG_BROKEN(or so) flag
> +        * At this point we can still make changes, but importantly this lets us
> +          start running IGTs which can utilize local-memory in CI
> +* Convert over to TTM, make sure it all keeps working. Some of the work items:
> +        * TTM shrinker for discrete
> +        * dma_resv_lockitem for full dma_resv_lock, i.e not just trylock
> +        * Use TTM CPU pagefault handler
> +        * Route shmem backend over to TTM SYSTEM for discrete
> +        * TTM purgeable object support
> +        * Move i915 buddy allocator over to TTM
> +        * MMAP ioctl mode(see `I915 MMAP`_)
> +        * SET/GET ioctl caching(see `I915 SET/GET CACHING`_)
> +* Add pciid for DG1 and turn on uAPI for real

Part of this process should be another RFC e-mail, cc'd to mesa-dev
for final sign-off before we lock the API down.


> +
> +New object placement and region query uAPI
> +==========================================
> +Starting from DG1 we need to give userspace the ability to allocate buffers from
> +device local-memory. Currently the driver supports gem_create, which can place
> +buffers in system memory via shmem, and the usual assortment of other
> +interfaces, like dumb buffers and userptr.
> +
> +To support this new capability, while also providing a uAPI which will work
> +beyond just DG1, we propose to offer three new bits of uAPI:
> +
> +DRM_I915_QUERY_MEMORY_REGIONS
> +-----------------------------
> +New query ID which allows userspace to discover the list of supported memory
> +regions(like system-memory and local-memory) for a given device. We identify
> +each region with a class and instance pair, which should be unique. The class
> +here would be DEVICE or SYSTEM, and the instance would be zero, on platforms
> +like DG1.
> +
> +Side note: The class/instance design is borrowed from our existing engine uAPI,
> +where we describe every physical engine in terms of its class, and the
> +particular instance, since we can have more than one per class.
> +
> +In the future we also want to expose more information which can further
> +describe the capabilities of a region.
> +
> +.. kernel-doc:: Documentation/gpu/rfc/i915_gem_lmem.h
> +        :functions: drm_i915_gem_memory_class drm_i915_gem_memory_class_instance drm_i915_memory_region_info drm_i915_query_memory_regions
> +
> +GEM_CREATE_EXT
> +--------------
> +New ioctl which is basically just gem_create but now allows userspace to
> +provide a chain of possible extensions. Note that if we don't provide any
> +extensions then we get the exact same behaviour as gem_create.

"don't provide any extensions and set flags=0"

> +
> +Side note: We also need to support PXP[1] in the near future, which is also
> +applicable to integrated platforms, and adds its own gem_create_ext extension,
> +which basically lets userspace mark a buffer as "protected".
> +
> +.. kernel-doc:: Documentation/gpu/rfc/i915_gem_lmem.h
> +        :functions: drm_i915_gem_create_ext
> +
> +I915_GEM_CREATE_EXT_MEMORY_REGIONS
> +----------------------------------
> +Implemented as an extension for gem_create_ext, we would now allow userspace to
> +optionally provide an immutable list of preferred placements at creation time,
> +in priority order, for a given buffer object.  For the placements we expect
> +them each to use the class/instance encoding, as per the output of the regions
> +query. Having the list in priority order will be useful in the future when
> +placing an object, say during eviction.
> +
> +.. kernel-doc:: Documentation/gpu/rfc/i915_gem_lmem.h
> +        :functions: drm_i915_gem_create_ext_memory_regions
> +
> +One fair criticism here is that this seems a little over-engineered[2]. If we
> +just consider DG1 then yes, a simple gem_create.flags or something is totally
> +all that's needed to tell the kernel to allocate the buffer in local-memory or
> +whatever. However looking to the future we need uAPI which can also support
> +upcoming Xe HP multi-tile architecture in a sane way, where there can be
> +multiple local-memory instances for a given device, and so using both class and
> +instance in our uAPI to describe regions is desirable, although specifically
> +for DG1 it's uninteresting, since we only have a single local-memory instance.
> +
> +Existing uAPI issues
> +====================
> +Some potential issues we still need to resolve.
> +
> +I915 MMAP
> +---------
> +In i915 there are multiple ways to MMAP GEM object, including mapping the same
> +object using different mapping types(WC vs WB), i.e multiple active mmaps per
> +object. TTM expects one MMAP at most for the lifetime of the object. If it
> +turns out that we have to backpedal here, there might be some potential
> +userspace fallout.
> +
> +I915 SET/GET CACHING
> +--------------------
> +In i915 we have set/get_caching ioctl. TTM doesn't let us to change this, but
> +DG1 doesn't support non-snooped pcie transactions, so we can just always
> +allocate as WB for smem-only buffers.  If/when our hw gains support for
> +non-snooped pcie transactions then we must fix this mode at allocation time as
> +a new GEM extension.

From the Mesa PoV this should mostly be fine.  In Vulkan, we only ever
SET_CACHING right after BO creation.  In GL, we do SET_CACHING
multiple times on a BO but, from the perspective of the iris_bufmgr
API, it happens on BO creation.  We only SET_CACHING if we pull a BO
out of our internal cache with the wrong caching setting.  The Mesa
fix is pretty simple:  just add caching to the key we use for our
internal BO cache.  We can't do that retroactively, of course, but we
can fairly easily do it for all LMEM platforms going forward.

> +
> +This is related to the mmap problem, because in general (meaning, when we're
> +not running on intel cpus) the cpu mmap must not, ever, be inconsistent with
> +allocation mode.
> +
> +Possible idea is to let the kernel picks the mmap mode for userspace from the
> +following table:
> +
> +smem-only: WB. Userspace does not need to call clflush.
> +
> +smem+lmem: We allocate uncached memory, and give userspace a WC mapping
> +for when the buffer is in smem, and WC when it's in lmem. GPU does snooped
> +access, which is a bit inefficient.
> +
> +lmem only: always WC
> +
> +This means on discrete you only get a single mmap mode, all others must be
> +rejected. That's probably going to be a new default mode or something like
> +that.

Seems reasonable for now, I think.  Again, we can't apply it
retroactively to old Mesa drivers that have already shipped but I
don't see why we can't do this going forward.  We can also add a
create flag for changing caching settings.

--Jason
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/9] drm/doc/rfc: i915 DG1 uAPI
  2021-04-28 15:16 ` Kenneth Graunke
@ 2021-04-28 16:10   ` Matthew Auld
  0 siblings, 0 replies; 22+ messages in thread
From: Matthew Auld @ 2021-04-28 16:10 UTC (permalink / raw)
  To: Kenneth Graunke, intel-gfx
  Cc: Lionel Landwerlin, Thomas Hellström, Dave Airlie,
	Jordan Justen, dri-devel, Daniel Vetter, Daniele Ceraolo Spurio,
	Jon Bloomfield, Jason Ekstrand, mesa-dev, Daniel Vetter

On 28/04/2021 16:16, Kenneth Graunke wrote:
> On Monday, April 26, 2021 2:38:53 AM PDT Matthew Auld wrote:
>> +Existing uAPI issues
>> +====================
>> +Some potential issues we still need to resolve.
>> +
>> +I915 MMAP
>> +---------
>> +In i915 there are multiple ways to MMAP GEM object, including mapping the same
>> +object using different mapping types(WC vs WB), i.e multiple active mmaps per
>> +object. TTM expects one MMAP at most for the lifetime of the object. If it
>> +turns out that we have to backpedal here, there might be some potential
>> +userspace fallout.
>> +
>> +I915 SET/GET CACHING
>> +--------------------
>> +In i915 we have set/get_caching ioctl. TTM doesn't let us to change this, but
>> +DG1 doesn't support non-snooped pcie transactions, so we can just always
>> +allocate as WB for smem-only buffers.  If/when our hw gains support for
>> +non-snooped pcie transactions then we must fix this mode at allocation time as
>> +a new GEM extension.
>> +
>> +This is related to the mmap problem, because in general (meaning, when we're
>> +not running on intel cpus) the cpu mmap must not, ever, be inconsistent with
>> +allocation mode.
>> +
>> +Possible idea is to let the kernel picks the mmap mode for userspace from the
>> +following table:
>> +
>> +smem-only: WB. Userspace does not need to call clflush.
>> +
>> +smem+lmem: We allocate uncached memory, and give userspace a WC mapping
>> +for when the buffer is in smem, and WC when it's in lmem. GPU does snooped
>> +access, which is a bit inefficient.
> 
> I think you meant to write something different here.  What I read was:
> 
> - If it's in SMEM, give them WC
> - If it's in LMEM, give them WC
> 
> Presumably one of those should have been something else, since otherwise
> you would have written "always WC" :)

It should have been "always WC", sorry for the confusion.

"smem+lmem: We only ever allow a single mode, so simply allocate this as 
uncached memory, and always give userspace a WC mapping. GPU still does 
snooped access here(assuming we can't turn it off like on DG1), which is 
a bit inefficient."

> 
>> +
>> +lmem only: always WC
>> +
>> +This means on discrete you only get a single mmap mode, all others must be
>> +rejected. That's probably going to be a new default mode or something like
>> +that.
>> +
>> +Links
>> +=====
>> +[1] https://patchwork.freedesktop.org/series/86798/
>> +
>> +[2] https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5599#note_553791
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/9] drm/doc/rfc: i915 DG1 uAPI
  2021-04-28 15:51 ` Jason Ekstrand
@ 2021-04-28 16:41   ` Matthew Auld
  2021-04-28 16:56     ` Jason Ekstrand
  0 siblings, 1 reply; 22+ messages in thread
From: Matthew Auld @ 2021-04-28 16:41 UTC (permalink / raw)
  To: Jason Ekstrand
  Cc: Lionel Landwerlin, Thomas Hellström, Dave Airlie,
	Jordan Justen, Intel GFX, Maling list - DRI developers,
	Daniel Vetter, Kenneth Graunke, Daniele Ceraolo Spurio,
	Jon Bloomfield, ML mesa-dev, Daniel Vetter

On 28/04/2021 16:51, Jason Ekstrand wrote:
> On Mon, Apr 26, 2021 at 4:42 AM Matthew Auld <matthew.auld@intel.com> wrote:
>>
>> Add an entry for the new uAPI needed for DG1. Also add the overall
>> upstream plan, including some notes for the TTM conversion.
>>
>> v2(Daniel):
>>    - include the overall upstreaming plan
>>    - add a note for mmap, there are differences here for TTM vs i915
>>    - bunch of other suggestions from Daniel
>> v3:
>>   (Daniel)
>>    - add a note for set/get caching stuff
>>    - add some more docs for existing query and extensions stuff
>>    - add an actual code example for regions query
>>    - bunch of other stuff
>>   (Jason)
>>    - uAPI change(!):
>>          - try a simpler design with the placements extension
>>          - rather than have a generic setparam which can cover multiple
>>            use cases, have each extension be responsible for one thing
>>            only
>> v4:
>>   (Daniel)
>>    - add some more notes for ttm conversion
>>    - bunch of other stuff
>>   (Jason)
>>    - uAPI change(!):
>>          - drop all the extra rsvd members for the region_query and
>>            region_info, just keep the bare minimum needed for padding
>>
>> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
>> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
>> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
>> Cc: Lionel Landwerlin <lionel.g.landwerlin@linux.intel.com>
>> Cc: Jon Bloomfield <jon.bloomfield@intel.com>
>> Cc: Jordan Justen <jordan.l.justen@intel.com>
>> Cc: Daniel Vetter <daniel.vetter@intel.com>
>> Cc: Kenneth Graunke <kenneth@whitecape.org>
>> Cc: Jason Ekstrand <jason@jlekstrand.net>
>> Cc: Dave Airlie <airlied@gmail.com>
>> Cc: dri-devel@lists.freedesktop.org
>> Cc: mesa-dev@lists.freedesktop.org
>> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
>> Acked-by: Dave Airlie <airlied@redhat.com>
>> ---
>>   Documentation/gpu/rfc/i915_gem_lmem.h   | 212 ++++++++++++++++++++++++
>>   Documentation/gpu/rfc/i915_gem_lmem.rst | 130 +++++++++++++++
>>   Documentation/gpu/rfc/index.rst         |   4 +
>>   3 files changed, 346 insertions(+)
>>   create mode 100644 Documentation/gpu/rfc/i915_gem_lmem.h
>>   create mode 100644 Documentation/gpu/rfc/i915_gem_lmem.rst
>>
>> diff --git a/Documentation/gpu/rfc/i915_gem_lmem.h b/Documentation/gpu/rfc/i915_gem_lmem.h
>> new file mode 100644
>> index 000000000000..7ed59b6202d5
>> --- /dev/null
>> +++ b/Documentation/gpu/rfc/i915_gem_lmem.h
>> @@ -0,0 +1,212 @@
>> +/**
>> + * enum drm_i915_gem_memory_class - Supported memory classes
>> + */
>> +enum drm_i915_gem_memory_class {
>> +       /** @I915_MEMORY_CLASS_SYSTEM: System memory */
>> +       I915_MEMORY_CLASS_SYSTEM = 0,
>> +       /** @I915_MEMORY_CLASS_DEVICE: Device local-memory */
>> +       I915_MEMORY_CLASS_DEVICE,
>> +};
>> +
>> +/**
>> + * struct drm_i915_gem_memory_class_instance - Identify particular memory region
>> + */
>> +struct drm_i915_gem_memory_class_instance {
>> +       /** @memory_class: See enum drm_i915_gem_memory_class */
>> +       __u16 memory_class;
>> +
>> +       /** @memory_instance: Which instance */
>> +       __u16 memory_instance;
>> +};
>> +
>> +/**
>> + * struct drm_i915_memory_region_info - Describes one region as known to the
>> + * driver.
>> + *
>> + * Note that we reserve some stuff here for potential future work. As an example
>> + * we might want expose the capabilities(see @caps) for a given region, which
>> + * could include things like if the region is CPU mappable/accessible, what are
>> + * the supported mapping types etc.
>> + *
>> + * Note this is using both struct drm_i915_query_item and struct drm_i915_query.
>> + * For this new query we are adding the new query id DRM_I915_QUERY_MEMORY_REGIONS
>> + * at &drm_i915_query_item.query_id.
>> + */
>> +struct drm_i915_memory_region_info {
>> +       /** @region: The class:instance pair encoding */
>> +       struct drm_i915_gem_memory_class_instance region;
>> +
>> +       /** @pad: MBZ */
>> +       __u32 pad;
>> +
>> +       /** @caps: MBZ */
>> +       __u64 caps;
> 
> As was commented on another thread somewhere, if we're going to have
> caps, we should have another __u64 supported_caps which tells
> userspace what caps the kernel is capable of advertising.  That way
> userspace can tell the difference between a kernel which doesn't
> advertise a cap and a kernel which can advertise the cap but where the
> cap isn't supported.

Yeah, my plan was to just go with rsvd[], so drop the flags/caps for 
now, and add a comment/example for how we plan to extend this in the 
future(using your union + array trick). Hopefully that's reasonable.

> 
>> +
>> +       /** @probed_size: Memory probed by the driver (-1 = unknown) */
>> +       __u64 probed_size;
>> +
>> +       /** @unallocated_size: Estimate of memory remaining (-1 = unknown) */
>> +       __u64 unallocated_size;
>> +};
>> +
>> +/**
>> + * struct drm_i915_query_memory_regions
>> + *
>> + * The region info query enumerates all regions known to the driver by filling
>> + * in an array of struct drm_i915_memory_region_info structures.
>> + *
>> + * Example for getting the list of supported regions:
>> + *
>> + * .. code-block:: C
>> + *
>> + *     struct drm_i915_query_memory_regions *info;
>> + *     struct drm_i915_query_item item = {
>> + *             .query_id = DRM_I915_QUERY_MEMORY_REGIONS;
>> + *     };
>> + *     struct drm_i915_query query = {
>> + *             .num_items = 1,
>> + *             .items_ptr = (uintptr_t)&item,
>> + *     };
>> + *     int err, i;
>> + *
>> + *     // First query the size of the blob we need, this needs to be large
>> + *     // enough to hold our array of regions. The kernel will fill out the
>> + *     // item.length for us, which is the number of bytes we need.
>> + *     err = ioctl(fd, DRM_IOCTL_I915_QUERY, &query);
>> + *     if (err) ...
>> + *
>> + *     info = calloc(1, item.length);
>> + *     // Now that we allocated the required number of bytes, we call the ioctl
>> + *     // again, this time with the data_ptr pointing to our newly allocated
>> + *     // blob, which the kernel can then populate with the all the region info.
>> + *     item.data_ptr = (uintptr_t)&info,
>> + *
>> + *     err = ioctl(fd, DRM_IOCTL_I915_QUERY, &query);
>> + *     if (err) ...
>> + *
>> + *     // We can now access each region in the array
>> + *     for (i = 0; i < info->num_regions; i++) {
>> + *             struct drm_i915_memory_region_info mr = info->regions[i];
>> + *             u16 class = mr.region.class;
>> + *             u16 instance = mr.region.instance;
>> + *
>> + *             ....
>> + *     }
>> + *
>> + *     free(info);
>> + */
>> +struct drm_i915_query_memory_regions {
>> +       /** @num_regions: Number of supported regions */
>> +       __u32 num_regions;
>> +
>> +       /** @pad: MBZ */
>> +       __u32 pad;
>> +
>> +       /** @regions: Info about each supported region */
>> +       struct drm_i915_memory_region_info regions[];
>> +};
>> +
>> +#define DRM_I915_GEM_CREATE_EXT                0xdeadbeaf
>> +#define DRM_IOCTL_I915_GEM_CREATE_EXT  DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_CREATE_EXT, struct drm_i915_gem_create_ext)
>> +
>> +/**
>> + * struct drm_i915_gem_create_ext - Existing gem_create behaviour, with added
>> + * extension support using struct i915_user_extension.
>> + *
>> + * Note that in the future we want to have our buffer flags here, at least for
>> + * the stuff that is immutable. Previously we would have two ioctls, one to
>> + * create the object with gem_create, and another to apply various parameters,
>> + * however this creates some ambiguity for the params which are considered
>> + * immutable. Also in general we're phasing out the various SET/GET ioctls.
>> + */
>> +struct drm_i915_gem_create_ext {
>> +       /**
>> +        * @size: Requested size for the object.
>> +        *
>> +        * The (page-aligned) allocated size for the object will be returned.
>> +        *
>> +        * Note that for some devices we have might have further minimum
>> +        * page-size restrictions(larger than 4K), like for device local-memory.
>> +        * However in general the final size here should always reflect any
>> +        * rounding up, if for example using the I915_GEM_CREATE_EXT_MEMORY_REGIONS
>> +        * extension to place the object in device local-memory.
>> +        */
>> +       __u64 size;
>> +       /**
>> +        * @handle: Returned handle for the object.
>> +        *
>> +        * Object handles are nonzero.
>> +        */
>> +       __u32 handle;
>> +       /** @flags: MBZ */
>> +       __u32 flags;
>> +       /**
>> +        * @extensions: The chain of extensions to apply to this object.
>> +        *
>> +        * This will be useful in the future when we need to support several
>> +        * different extensions, and we need to apply more than one when
>> +        * creating the object. See struct i915_user_extension.
>> +        *
>> +        * If we don't supply any extensions then we get the same old gem_create
>> +        * behaviour.
>> +        *
>> +        * For I915_GEM_CREATE_EXT_MEMORY_REGIONS usage see
>> +        * struct drm_i915_gem_create_ext_memory_regions.
>> +        */
>> +#define I915_GEM_CREATE_EXT_MEMORY_REGIONS 0
>> +       __u64 extensions;
>> +};
>> +
>> +/**
>> + * struct drm_i915_gem_create_ext_memory_regions - The
>> + * I915_GEM_CREATE_EXT_MEMORY_REGIONS extension.
>> + *
>> + * Set the object with the desired set of placements/regions in priority
>> + * order. Each entry must be unique and supported by the device.
>> + *
>> + * This is provided as an array of struct drm_i915_gem_memory_class_instance, or
>> + * an equivalent layout of class:instance pair encodings. See struct
>> + * drm_i915_query_memory_regions and DRM_I915_QUERY_MEMORY_REGIONS for how to
>> + * query the supported regions for a device.
>> + *
>> + * As an example, on discrete devices, if we wish to set the placement as
>> + * device local-memory we can do something like:
>> + *
>> + * .. code-block:: C
>> + *
>> + *     struct drm_i915_gem_memory_class_instance region_lmem = {
>> + *              .memory_class = I915_MEMORY_CLASS_DEVICE,
>> + *              .memory_instance = 0,
>> + *      };
>> + *      struct drm_i915_gem_create_ext_memory_regions regions = {
>> + *              .base = { .name = I915_GEM_CREATE_EXT_MEMORY_REGIONS },
>> + *              .regions = (uintptr_t)&region_lmem,
>> + *              .num_regions = 1,
>> + *      };
>> + *      struct drm_i915_gem_create_ext create_ext = {
>> + *              .size = 16 * PAGE_SIZE,
>> + *              .extensions = (uintptr_t)&regions,
>> + *      };
>> + *
>> + *      int err = ioctl(fd, DRM_IOCTL_I915_GEM_CREATE_EXT, &create_ext);
>> + *      if (err) ...
>> + *
>> + * At which point we get the object handle in &drm_i915_gem_create_ext.handle,
>> + * along with the final object size in &drm_i915_gem_create_ext.size, which
>> + * should account for any rounding up, if required.
>> + */
>> +struct drm_i915_gem_create_ext_memory_regions {
>> +       /** @base: Extension link. See struct i915_user_extension. */
>> +       struct i915_user_extension base;
>> +
>> +       /** @pad: MBZ */
>> +       __u32 pad;
>> +       /** @num_regions: Number of elements in the @regions array. */
>> +       __u32 num_regions;
>> +       /**
>> +        * @regions: The regions/placements array.
>> +        *
>> +        * An array of struct drm_i915_gem_memory_class_instance.
>> +        */
>> +       __u64 regions;
>> +};
>> diff --git a/Documentation/gpu/rfc/i915_gem_lmem.rst b/Documentation/gpu/rfc/i915_gem_lmem.rst
>> new file mode 100644
>> index 000000000000..462f1efd9003
>> --- /dev/null
>> +++ b/Documentation/gpu/rfc/i915_gem_lmem.rst
>> @@ -0,0 +1,130 @@
>> +=========================
>> +I915 DG1/LMEM RFC Section
>> +=========================
>> +
>> +Upstream plan
>> +=============
>> +For upstream the overall plan for landing all the DG1 stuff and turning it for
>> +real, with all the uAPI bits is:
>> +
>> +* Merge basic HW enabling of DG1(still without pciid)
>> +* Merge the uAPI bits behind special CONFIG_BROKEN(or so) flag
>> +        * At this point we can still make changes, but importantly this lets us
>> +          start running IGTs which can utilize local-memory in CI
>> +* Convert over to TTM, make sure it all keeps working. Some of the work items:
>> +        * TTM shrinker for discrete
>> +        * dma_resv_lockitem for full dma_resv_lock, i.e not just trylock
>> +        * Use TTM CPU pagefault handler
>> +        * Route shmem backend over to TTM SYSTEM for discrete
>> +        * TTM purgeable object support
>> +        * Move i915 buddy allocator over to TTM
>> +        * MMAP ioctl mode(see `I915 MMAP`_)
>> +        * SET/GET ioctl caching(see `I915 SET/GET CACHING`_)
>> +* Add pciid for DG1 and turn on uAPI for real
> 
> Part of this process should be another RFC e-mail, cc'd to mesa-dev
> for final sign-off before we lock the API down.

Do you mean for the actual patches that implement the proposed uAPI, or 
are you referring to this doc/rfc patch?

> 
> 
>> +
>> +New object placement and region query uAPI
>> +==========================================
>> +Starting from DG1 we need to give userspace the ability to allocate buffers from
>> +device local-memory. Currently the driver supports gem_create, which can place
>> +buffers in system memory via shmem, and the usual assortment of other
>> +interfaces, like dumb buffers and userptr.
>> +
>> +To support this new capability, while also providing a uAPI which will work
>> +beyond just DG1, we propose to offer three new bits of uAPI:
>> +
>> +DRM_I915_QUERY_MEMORY_REGIONS
>> +-----------------------------
>> +New query ID which allows userspace to discover the list of supported memory
>> +regions(like system-memory and local-memory) for a given device. We identify
>> +each region with a class and instance pair, which should be unique. The class
>> +here would be DEVICE or SYSTEM, and the instance would be zero, on platforms
>> +like DG1.
>> +
>> +Side note: The class/instance design is borrowed from our existing engine uAPI,
>> +where we describe every physical engine in terms of its class, and the
>> +particular instance, since we can have more than one per class.
>> +
>> +In the future we also want to expose more information which can further
>> +describe the capabilities of a region.
>> +
>> +.. kernel-doc:: Documentation/gpu/rfc/i915_gem_lmem.h
>> +        :functions: drm_i915_gem_memory_class drm_i915_gem_memory_class_instance drm_i915_memory_region_info drm_i915_query_memory_regions
>> +
>> +GEM_CREATE_EXT
>> +--------------
>> +New ioctl which is basically just gem_create but now allows userspace to
>> +provide a chain of possible extensions. Note that if we don't provide any
>> +extensions then we get the exact same behaviour as gem_create.
> 
> "don't provide any extensions and set flags=0"
> 
>> +
>> +Side note: We also need to support PXP[1] in the near future, which is also
>> +applicable to integrated platforms, and adds its own gem_create_ext extension,
>> +which basically lets userspace mark a buffer as "protected".
>> +
>> +.. kernel-doc:: Documentation/gpu/rfc/i915_gem_lmem.h
>> +        :functions: drm_i915_gem_create_ext
>> +
>> +I915_GEM_CREATE_EXT_MEMORY_REGIONS
>> +----------------------------------
>> +Implemented as an extension for gem_create_ext, we would now allow userspace to
>> +optionally provide an immutable list of preferred placements at creation time,
>> +in priority order, for a given buffer object.  For the placements we expect
>> +them each to use the class/instance encoding, as per the output of the regions
>> +query. Having the list in priority order will be useful in the future when
>> +placing an object, say during eviction.
>> +
>> +.. kernel-doc:: Documentation/gpu/rfc/i915_gem_lmem.h
>> +        :functions: drm_i915_gem_create_ext_memory_regions
>> +
>> +One fair criticism here is that this seems a little over-engineered[2]. If we
>> +just consider DG1 then yes, a simple gem_create.flags or something is totally
>> +all that's needed to tell the kernel to allocate the buffer in local-memory or
>> +whatever. However looking to the future we need uAPI which can also support
>> +upcoming Xe HP multi-tile architecture in a sane way, where there can be
>> +multiple local-memory instances for a given device, and so using both class and
>> +instance in our uAPI to describe regions is desirable, although specifically
>> +for DG1 it's uninteresting, since we only have a single local-memory instance.
>> +
>> +Existing uAPI issues
>> +====================
>> +Some potential issues we still need to resolve.
>> +
>> +I915 MMAP
>> +---------
>> +In i915 there are multiple ways to MMAP GEM object, including mapping the same
>> +object using different mapping types(WC vs WB), i.e multiple active mmaps per
>> +object. TTM expects one MMAP at most for the lifetime of the object. If it
>> +turns out that we have to backpedal here, there might be some potential
>> +userspace fallout.
>> +
>> +I915 SET/GET CACHING
>> +--------------------
>> +In i915 we have set/get_caching ioctl. TTM doesn't let us to change this, but
>> +DG1 doesn't support non-snooped pcie transactions, so we can just always
>> +allocate as WB for smem-only buffers.  If/when our hw gains support for
>> +non-snooped pcie transactions then we must fix this mode at allocation time as
>> +a new GEM extension.
> 
>  From the Mesa PoV this should mostly be fine.  In Vulkan, we only ever
> SET_CACHING right after BO creation.  In GL, we do SET_CACHING
> multiple times on a BO but, from the perspective of the iris_bufmgr
> API, it happens on BO creation.  We only SET_CACHING if we pull a BO
> out of our internal cache with the wrong caching setting.  The Mesa
> fix is pretty simple:  just add caching to the key we use for our
> internal BO cache.  We can't do that retroactively, of course, but we
> can fairly easily do it for all LMEM platforms going forward.

Slightly orthogonal: what does Mesa do here for snooped vs LLC 
platforms? Does it make such a distinction? Just curious.

> 
>> +
>> +This is related to the mmap problem, because in general (meaning, when we're
>> +not running on intel cpus) the cpu mmap must not, ever, be inconsistent with
>> +allocation mode.
>> +
>> +Possible idea is to let the kernel picks the mmap mode for userspace from the
>> +following table:
>> +
>> +smem-only: WB. Userspace does not need to call clflush.
>> +
>> +smem+lmem: We allocate uncached memory, and give userspace a WC mapping
>> +for when the buffer is in smem, and WC when it's in lmem. GPU does snooped
>> +access, which is a bit inefficient.
>> +
>> +lmem only: always WC
>> +
>> +This means on discrete you only get a single mmap mode, all others must be
>> +rejected. That's probably going to be a new default mode or something like
>> +that.
> 
> Seems reasonable for now, I think.  Again, we can't apply it
> retroactively to old Mesa drivers that have already shipped but I
> don't see why we can't do this going forward.  We can also add a
> create flag for changing caching settings.
> 
> --Jason
> 
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/9] drm/doc/rfc: i915 DG1 uAPI
  2021-04-28 16:41   ` Matthew Auld
@ 2021-04-28 16:56     ` Jason Ekstrand
  2021-04-28 17:12       ` Kenneth Graunke
  0 siblings, 1 reply; 22+ messages in thread
From: Jason Ekstrand @ 2021-04-28 16:56 UTC (permalink / raw)
  To: Matthew Auld
  Cc: Lionel Landwerlin, Thomas Hellström, Dave Airlie,
	Jordan Justen, Intel GFX, Maling list - DRI developers,
	Daniel Vetter, Kenneth Graunke, Daniele Ceraolo Spurio,
	Jon Bloomfield, ML mesa-dev, Daniel Vetter

On Wed, Apr 28, 2021 at 11:41 AM Matthew Auld <matthew.auld@intel.com> wrote:
>
> On 28/04/2021 16:51, Jason Ekstrand wrote:
> > On Mon, Apr 26, 2021 at 4:42 AM Matthew Auld <matthew.auld@intel.com> wrote:
> >>
> >> Add an entry for the new uAPI needed for DG1. Also add the overall
> >> upstream plan, including some notes for the TTM conversion.
> >>
> >> v2(Daniel):
> >>    - include the overall upstreaming plan
> >>    - add a note for mmap, there are differences here for TTM vs i915
> >>    - bunch of other suggestions from Daniel
> >> v3:
> >>   (Daniel)
> >>    - add a note for set/get caching stuff
> >>    - add some more docs for existing query and extensions stuff
> >>    - add an actual code example for regions query
> >>    - bunch of other stuff
> >>   (Jason)
> >>    - uAPI change(!):
> >>          - try a simpler design with the placements extension
> >>          - rather than have a generic setparam which can cover multiple
> >>            use cases, have each extension be responsible for one thing
> >>            only
> >> v4:
> >>   (Daniel)
> >>    - add some more notes for ttm conversion
> >>    - bunch of other stuff
> >>   (Jason)
> >>    - uAPI change(!):
> >>          - drop all the extra rsvd members for the region_query and
> >>            region_info, just keep the bare minimum needed for padding
> >>
> >> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> >> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> >> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> >> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> >> Cc: Lionel Landwerlin <lionel.g.landwerlin@linux.intel.com>
> >> Cc: Jon Bloomfield <jon.bloomfield@intel.com>
> >> Cc: Jordan Justen <jordan.l.justen@intel.com>
> >> Cc: Daniel Vetter <daniel.vetter@intel.com>
> >> Cc: Kenneth Graunke <kenneth@whitecape.org>
> >> Cc: Jason Ekstrand <jason@jlekstrand.net>
> >> Cc: Dave Airlie <airlied@gmail.com>
> >> Cc: dri-devel@lists.freedesktop.org
> >> Cc: mesa-dev@lists.freedesktop.org
> >> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> >> Acked-by: Dave Airlie <airlied@redhat.com>
> >> ---
> >>   Documentation/gpu/rfc/i915_gem_lmem.h   | 212 ++++++++++++++++++++++++
> >>   Documentation/gpu/rfc/i915_gem_lmem.rst | 130 +++++++++++++++
> >>   Documentation/gpu/rfc/index.rst         |   4 +
> >>   3 files changed, 346 insertions(+)
> >>   create mode 100644 Documentation/gpu/rfc/i915_gem_lmem.h
> >>   create mode 100644 Documentation/gpu/rfc/i915_gem_lmem.rst
> >>
> >> diff --git a/Documentation/gpu/rfc/i915_gem_lmem.h b/Documentation/gpu/rfc/i915_gem_lmem.h
> >> new file mode 100644
> >> index 000000000000..7ed59b6202d5
> >> --- /dev/null
> >> +++ b/Documentation/gpu/rfc/i915_gem_lmem.h
> >> @@ -0,0 +1,212 @@
> >> +/**
> >> + * enum drm_i915_gem_memory_class - Supported memory classes
> >> + */
> >> +enum drm_i915_gem_memory_class {
> >> +       /** @I915_MEMORY_CLASS_SYSTEM: System memory */
> >> +       I915_MEMORY_CLASS_SYSTEM = 0,
> >> +       /** @I915_MEMORY_CLASS_DEVICE: Device local-memory */
> >> +       I915_MEMORY_CLASS_DEVICE,
> >> +};
> >> +
> >> +/**
> >> + * struct drm_i915_gem_memory_class_instance - Identify particular memory region
> >> + */
> >> +struct drm_i915_gem_memory_class_instance {
> >> +       /** @memory_class: See enum drm_i915_gem_memory_class */
> >> +       __u16 memory_class;
> >> +
> >> +       /** @memory_instance: Which instance */
> >> +       __u16 memory_instance;
> >> +};
> >> +
> >> +/**
> >> + * struct drm_i915_memory_region_info - Describes one region as known to the
> >> + * driver.
> >> + *
> >> + * Note that we reserve some stuff here for potential future work. As an example
> >> + * we might want expose the capabilities(see @caps) for a given region, which
> >> + * could include things like if the region is CPU mappable/accessible, what are
> >> + * the supported mapping types etc.
> >> + *
> >> + * Note this is using both struct drm_i915_query_item and struct drm_i915_query.
> >> + * For this new query we are adding the new query id DRM_I915_QUERY_MEMORY_REGIONS
> >> + * at &drm_i915_query_item.query_id.
> >> + */
> >> +struct drm_i915_memory_region_info {
> >> +       /** @region: The class:instance pair encoding */
> >> +       struct drm_i915_gem_memory_class_instance region;
> >> +
> >> +       /** @pad: MBZ */
> >> +       __u32 pad;
> >> +
> >> +       /** @caps: MBZ */
> >> +       __u64 caps;
> >
> > As was commented on another thread somewhere, if we're going to have
> > caps, we should have another __u64 supported_caps which tells
> > userspace what caps the kernel is capable of advertising.  That way
> > userspace can tell the difference between a kernel which doesn't
> > advertise a cap and a kernel which can advertise the cap but where the
> > cap isn't supported.
>
> Yeah, my plan was to just go with rsvd[], so drop the flags/caps for
> now, and add a comment/example for how we plan to extend this in the
> future(using your union + array trick). Hopefully that's reasonable.

That's fine with me too.  Just as long as we have an established plan
that works.

> >> +
> >> +       /** @probed_size: Memory probed by the driver (-1 = unknown) */
> >> +       __u64 probed_size;
> >> +
> >> +       /** @unallocated_size: Estimate of memory remaining (-1 = unknown) */
> >> +       __u64 unallocated_size;
> >> +};
> >> +
> >> +/**
> >> + * struct drm_i915_query_memory_regions
> >> + *
> >> + * The region info query enumerates all regions known to the driver by filling
> >> + * in an array of struct drm_i915_memory_region_info structures.
> >> + *
> >> + * Example for getting the list of supported regions:
> >> + *
> >> + * .. code-block:: C
> >> + *
> >> + *     struct drm_i915_query_memory_regions *info;
> >> + *     struct drm_i915_query_item item = {
> >> + *             .query_id = DRM_I915_QUERY_MEMORY_REGIONS;
> >> + *     };
> >> + *     struct drm_i915_query query = {
> >> + *             .num_items = 1,
> >> + *             .items_ptr = (uintptr_t)&item,
> >> + *     };
> >> + *     int err, i;
> >> + *
> >> + *     // First query the size of the blob we need, this needs to be large
> >> + *     // enough to hold our array of regions. The kernel will fill out the
> >> + *     // item.length for us, which is the number of bytes we need.
> >> + *     err = ioctl(fd, DRM_IOCTL_I915_QUERY, &query);
> >> + *     if (err) ...
> >> + *
> >> + *     info = calloc(1, item.length);
> >> + *     // Now that we allocated the required number of bytes, we call the ioctl
> >> + *     // again, this time with the data_ptr pointing to our newly allocated
> >> + *     // blob, which the kernel can then populate with the all the region info.
> >> + *     item.data_ptr = (uintptr_t)&info,
> >> + *
> >> + *     err = ioctl(fd, DRM_IOCTL_I915_QUERY, &query);
> >> + *     if (err) ...
> >> + *
> >> + *     // We can now access each region in the array
> >> + *     for (i = 0; i < info->num_regions; i++) {
> >> + *             struct drm_i915_memory_region_info mr = info->regions[i];
> >> + *             u16 class = mr.region.class;
> >> + *             u16 instance = mr.region.instance;
> >> + *
> >> + *             ....
> >> + *     }
> >> + *
> >> + *     free(info);
> >> + */
> >> +struct drm_i915_query_memory_regions {
> >> +       /** @num_regions: Number of supported regions */
> >> +       __u32 num_regions;
> >> +
> >> +       /** @pad: MBZ */
> >> +       __u32 pad;
> >> +
> >> +       /** @regions: Info about each supported region */
> >> +       struct drm_i915_memory_region_info regions[];
> >> +};
> >> +
> >> +#define DRM_I915_GEM_CREATE_EXT                0xdeadbeaf
> >> +#define DRM_IOCTL_I915_GEM_CREATE_EXT  DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_CREATE_EXT, struct drm_i915_gem_create_ext)
> >> +
> >> +/**
> >> + * struct drm_i915_gem_create_ext - Existing gem_create behaviour, with added
> >> + * extension support using struct i915_user_extension.
> >> + *
> >> + * Note that in the future we want to have our buffer flags here, at least for
> >> + * the stuff that is immutable. Previously we would have two ioctls, one to
> >> + * create the object with gem_create, and another to apply various parameters,
> >> + * however this creates some ambiguity for the params which are considered
> >> + * immutable. Also in general we're phasing out the various SET/GET ioctls.
> >> + */
> >> +struct drm_i915_gem_create_ext {
> >> +       /**
> >> +        * @size: Requested size for the object.
> >> +        *
> >> +        * The (page-aligned) allocated size for the object will be returned.
> >> +        *
> >> +        * Note that for some devices we have might have further minimum
> >> +        * page-size restrictions(larger than 4K), like for device local-memory.
> >> +        * However in general the final size here should always reflect any
> >> +        * rounding up, if for example using the I915_GEM_CREATE_EXT_MEMORY_REGIONS
> >> +        * extension to place the object in device local-memory.
> >> +        */
> >> +       __u64 size;
> >> +       /**
> >> +        * @handle: Returned handle for the object.
> >> +        *
> >> +        * Object handles are nonzero.
> >> +        */
> >> +       __u32 handle;
> >> +       /** @flags: MBZ */
> >> +       __u32 flags;
> >> +       /**
> >> +        * @extensions: The chain of extensions to apply to this object.
> >> +        *
> >> +        * This will be useful in the future when we need to support several
> >> +        * different extensions, and we need to apply more than one when
> >> +        * creating the object. See struct i915_user_extension.
> >> +        *
> >> +        * If we don't supply any extensions then we get the same old gem_create
> >> +        * behaviour.
> >> +        *
> >> +        * For I915_GEM_CREATE_EXT_MEMORY_REGIONS usage see
> >> +        * struct drm_i915_gem_create_ext_memory_regions.
> >> +        */
> >> +#define I915_GEM_CREATE_EXT_MEMORY_REGIONS 0
> >> +       __u64 extensions;
> >> +};
> >> +
> >> +/**
> >> + * struct drm_i915_gem_create_ext_memory_regions - The
> >> + * I915_GEM_CREATE_EXT_MEMORY_REGIONS extension.
> >> + *
> >> + * Set the object with the desired set of placements/regions in priority
> >> + * order. Each entry must be unique and supported by the device.
> >> + *
> >> + * This is provided as an array of struct drm_i915_gem_memory_class_instance, or
> >> + * an equivalent layout of class:instance pair encodings. See struct
> >> + * drm_i915_query_memory_regions and DRM_I915_QUERY_MEMORY_REGIONS for how to
> >> + * query the supported regions for a device.
> >> + *
> >> + * As an example, on discrete devices, if we wish to set the placement as
> >> + * device local-memory we can do something like:
> >> + *
> >> + * .. code-block:: C
> >> + *
> >> + *     struct drm_i915_gem_memory_class_instance region_lmem = {
> >> + *              .memory_class = I915_MEMORY_CLASS_DEVICE,
> >> + *              .memory_instance = 0,
> >> + *      };
> >> + *      struct drm_i915_gem_create_ext_memory_regions regions = {
> >> + *              .base = { .name = I915_GEM_CREATE_EXT_MEMORY_REGIONS },
> >> + *              .regions = (uintptr_t)&region_lmem,
> >> + *              .num_regions = 1,
> >> + *      };
> >> + *      struct drm_i915_gem_create_ext create_ext = {
> >> + *              .size = 16 * PAGE_SIZE,
> >> + *              .extensions = (uintptr_t)&regions,
> >> + *      };
> >> + *
> >> + *      int err = ioctl(fd, DRM_IOCTL_I915_GEM_CREATE_EXT, &create_ext);
> >> + *      if (err) ...
> >> + *
> >> + * At which point we get the object handle in &drm_i915_gem_create_ext.handle,
> >> + * along with the final object size in &drm_i915_gem_create_ext.size, which
> >> + * should account for any rounding up, if required.
> >> + */
> >> +struct drm_i915_gem_create_ext_memory_regions {
> >> +       /** @base: Extension link. See struct i915_user_extension. */
> >> +       struct i915_user_extension base;
> >> +
> >> +       /** @pad: MBZ */
> >> +       __u32 pad;
> >> +       /** @num_regions: Number of elements in the @regions array. */
> >> +       __u32 num_regions;
> >> +       /**
> >> +        * @regions: The regions/placements array.
> >> +        *
> >> +        * An array of struct drm_i915_gem_memory_class_instance.
> >> +        */
> >> +       __u64 regions;
> >> +};
> >> diff --git a/Documentation/gpu/rfc/i915_gem_lmem.rst b/Documentation/gpu/rfc/i915_gem_lmem.rst
> >> new file mode 100644
> >> index 000000000000..462f1efd9003
> >> --- /dev/null
> >> +++ b/Documentation/gpu/rfc/i915_gem_lmem.rst
> >> @@ -0,0 +1,130 @@
> >> +=========================
> >> +I915 DG1/LMEM RFC Section
> >> +=========================
> >> +
> >> +Upstream plan
> >> +=============
> >> +For upstream the overall plan for landing all the DG1 stuff and turning it for
> >> +real, with all the uAPI bits is:
> >> +
> >> +* Merge basic HW enabling of DG1(still without pciid)
> >> +* Merge the uAPI bits behind special CONFIG_BROKEN(or so) flag
> >> +        * At this point we can still make changes, but importantly this lets us
> >> +          start running IGTs which can utilize local-memory in CI
> >> +* Convert over to TTM, make sure it all keeps working. Some of the work items:
> >> +        * TTM shrinker for discrete
> >> +        * dma_resv_lockitem for full dma_resv_lock, i.e not just trylock
> >> +        * Use TTM CPU pagefault handler
> >> +        * Route shmem backend over to TTM SYSTEM for discrete
> >> +        * TTM purgeable object support
> >> +        * Move i915 buddy allocator over to TTM
> >> +        * MMAP ioctl mode(see `I915 MMAP`_)
> >> +        * SET/GET ioctl caching(see `I915 SET/GET CACHING`_)
> >> +* Add pciid for DG1 and turn on uAPI for real
> >
> > Part of this process should be another RFC e-mail, cc'd to mesa-dev
> > for final sign-off before we lock the API down.
>
> Do you mean for the actual patches that implement the proposed uAPI, or
> are you referring to this doc/rfc patch?

I mean that, before we add the PCI ID or remove the CONFIG_BROKEN or
whatever it is that enables the new uAPI for real,  we should have one
final review of the new uAPI.


> >> +
> >> +New object placement and region query uAPI
> >> +==========================================
> >> +Starting from DG1 we need to give userspace the ability to allocate buffers from
> >> +device local-memory. Currently the driver supports gem_create, which can place
> >> +buffers in system memory via shmem, and the usual assortment of other
> >> +interfaces, like dumb buffers and userptr.
> >> +
> >> +To support this new capability, while also providing a uAPI which will work
> >> +beyond just DG1, we propose to offer three new bits of uAPI:
> >> +
> >> +DRM_I915_QUERY_MEMORY_REGIONS
> >> +-----------------------------
> >> +New query ID which allows userspace to discover the list of supported memory
> >> +regions(like system-memory and local-memory) for a given device. We identify
> >> +each region with a class and instance pair, which should be unique. The class
> >> +here would be DEVICE or SYSTEM, and the instance would be zero, on platforms
> >> +like DG1.
> >> +
> >> +Side note: The class/instance design is borrowed from our existing engine uAPI,
> >> +where we describe every physical engine in terms of its class, and the
> >> +particular instance, since we can have more than one per class.
> >> +
> >> +In the future we also want to expose more information which can further
> >> +describe the capabilities of a region.
> >> +
> >> +.. kernel-doc:: Documentation/gpu/rfc/i915_gem_lmem.h
> >> +        :functions: drm_i915_gem_memory_class drm_i915_gem_memory_class_instance drm_i915_memory_region_info drm_i915_query_memory_regions
> >> +
> >> +GEM_CREATE_EXT
> >> +--------------
> >> +New ioctl which is basically just gem_create but now allows userspace to
> >> +provide a chain of possible extensions. Note that if we don't provide any
> >> +extensions then we get the exact same behaviour as gem_create.
> >
> > "don't provide any extensions and set flags=0"
> >
> >> +
> >> +Side note: We also need to support PXP[1] in the near future, which is also
> >> +applicable to integrated platforms, and adds its own gem_create_ext extension,
> >> +which basically lets userspace mark a buffer as "protected".
> >> +
> >> +.. kernel-doc:: Documentation/gpu/rfc/i915_gem_lmem.h
> >> +        :functions: drm_i915_gem_create_ext
> >> +
> >> +I915_GEM_CREATE_EXT_MEMORY_REGIONS
> >> +----------------------------------
> >> +Implemented as an extension for gem_create_ext, we would now allow userspace to
> >> +optionally provide an immutable list of preferred placements at creation time,
> >> +in priority order, for a given buffer object.  For the placements we expect
> >> +them each to use the class/instance encoding, as per the output of the regions
> >> +query. Having the list in priority order will be useful in the future when
> >> +placing an object, say during eviction.
> >> +
> >> +.. kernel-doc:: Documentation/gpu/rfc/i915_gem_lmem.h
> >> +        :functions: drm_i915_gem_create_ext_memory_regions
> >> +
> >> +One fair criticism here is that this seems a little over-engineered[2]. If we
> >> +just consider DG1 then yes, a simple gem_create.flags or something is totally
> >> +all that's needed to tell the kernel to allocate the buffer in local-memory or
> >> +whatever. However looking to the future we need uAPI which can also support
> >> +upcoming Xe HP multi-tile architecture in a sane way, where there can be
> >> +multiple local-memory instances for a given device, and so using both class and
> >> +instance in our uAPI to describe regions is desirable, although specifically
> >> +for DG1 it's uninteresting, since we only have a single local-memory instance.
> >> +
> >> +Existing uAPI issues
> >> +====================
> >> +Some potential issues we still need to resolve.
> >> +
> >> +I915 MMAP
> >> +---------
> >> +In i915 there are multiple ways to MMAP GEM object, including mapping the same
> >> +object using different mapping types(WC vs WB), i.e multiple active mmaps per
> >> +object. TTM expects one MMAP at most for the lifetime of the object. If it
> >> +turns out that we have to backpedal here, there might be some potential
> >> +userspace fallout.
> >> +
> >> +I915 SET/GET CACHING
> >> +--------------------
> >> +In i915 we have set/get_caching ioctl. TTM doesn't let us to change this, but
> >> +DG1 doesn't support non-snooped pcie transactions, so we can just always
> >> +allocate as WB for smem-only buffers.  If/when our hw gains support for
> >> +non-snooped pcie transactions then we must fix this mode at allocation time as
> >> +a new GEM extension.
> >
> >  From the Mesa PoV this should mostly be fine.  In Vulkan, we only ever
> > SET_CACHING right after BO creation.  In GL, we do SET_CACHING
> > multiple times on a BO but, from the perspective of the iris_bufmgr
> > API, it happens on BO creation.  We only SET_CACHING if we pull a BO
> > out of our internal cache with the wrong caching setting.  The Mesa
> > fix is pretty simple:  just add caching to the key we use for our
> > internal BO cache.  We can't do that retroactively, of course, but we
> > can fairly easily do it for all LMEM platforms going forward.
>
> Slightly orthogonal: what does Mesa do here for snooped vs LLC
> platforms? Does it make such a distinction? Just curious.

In Vulkan on non-LLC platforms, we only enable snooping for things
that are going to be mapped: staging buffers, state buffers, batches,
etc.  For anything that's not mapped (tiled images, etc.) we leave
snooping off on non-LLC platforms so we don't take a hit from it.  In
GL, I think it works out to be effectively the same but it's a less
obvious decision there.

--Jason


> >> +
> >> +This is related to the mmap problem, because in general (meaning, when we're
> >> +not running on intel cpus) the cpu mmap must not, ever, be inconsistent with
> >> +allocation mode.
> >> +
> >> +Possible idea is to let the kernel picks the mmap mode for userspace from the
> >> +following table:
> >> +
> >> +smem-only: WB. Userspace does not need to call clflush.
> >> +
> >> +smem+lmem: We allocate uncached memory, and give userspace a WC mapping
> >> +for when the buffer is in smem, and WC when it's in lmem. GPU does snooped
> >> +access, which is a bit inefficient.
> >> +
> >> +lmem only: always WC
> >> +
> >> +This means on discrete you only get a single mmap mode, all others must be
> >> +rejected. That's probably going to be a new default mode or something like
> >> +that.
> >
> > Seems reasonable for now, I think.  Again, we can't apply it
> > retroactively to old Mesa drivers that have already shipped but I
> > don't see why we can't do this going forward.  We can also add a
> > create flag for changing caching settings.
> >
> > --Jason
> >
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/9] drm/doc/rfc: i915 DG1 uAPI
  2021-04-28 16:56     ` Jason Ekstrand
@ 2021-04-28 17:12       ` Kenneth Graunke
  0 siblings, 0 replies; 22+ messages in thread
From: Kenneth Graunke @ 2021-04-28 17:12 UTC (permalink / raw)
  To: Matthew Auld, Jason Ekstrand
  Cc: Lionel Landwerlin, Thomas Hellström, Dave Airlie,
	Jordan Justen, Intel GFX, Maling list - DRI developers,
	Daniel Vetter, Daniele Ceraolo Spurio, Jon Bloomfield,
	ML mesa-dev, Daniel Vetter


[-- Attachment #1.1: Type: text/plain, Size: 959 bytes --]

On Wednesday, April 28, 2021 9:56:25 AM PDT Jason Ekstrand wrote:
> On Wed, Apr 28, 2021 at 11:41 AM Matthew Auld <matthew.auld@intel.com> wrote:
[snip]
> > Slightly orthogonal: what does Mesa do here for snooped vs LLC
> > platforms? Does it make such a distinction? Just curious.
> 
> In Vulkan on non-LLC platforms, we only enable snooping for things
> that are going to be mapped: staging buffers, state buffers, batches,
> etc.  For anything that's not mapped (tiled images, etc.) we leave
> snooping off on non-LLC platforms so we don't take a hit from it.  In
> GL, I think it works out to be effectively the same but it's a less
> obvious decision there.
> 
> --Jason

iris currently enables snooping on non-LLC platforms when Gallium marks
a resource as PIPE_USAGE_STAGING, which generally means it's going to be
mapped and "fast CPU access" is desired.  Most buffers are not snooped.

I don't believe i965 uses snooping at all, surprisingly.

--Ken

[-- Attachment #1.2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 6/9] drm/i915/uapi: implement object placement extension
  2021-04-26  9:38 ` [PATCH 6/9] drm/i915/uapi: implement object placement extension Matthew Auld
@ 2021-04-28 17:28   ` Kenneth Graunke
  0 siblings, 0 replies; 22+ messages in thread
From: Kenneth Graunke @ 2021-04-28 17:28 UTC (permalink / raw)
  To: intel-gfx, Matthew Auld
  Cc: Lionel Landwerlin, Jordan Justen, dri-devel, CQ Tang,
	Daniele Ceraolo Spurio, Jason Ekstrand, mesa-dev, Daniel Vetter


[-- Attachment #1.1: Type: text/plain, Size: 3849 bytes --]

On Monday, April 26, 2021 2:38:58 AM PDT Matthew Auld wrote:
> Add new extension to support setting an immutable-priority-list of
> potential placements, at creation time.
> 
> If we use the normal gem_create or gem_create_ext without the
> extensions/placements then we still get the old behaviour with only
> placing the object in system memory.
> 
> v2(Daniel & Jason):
>     - Add a bunch of kernel-doc
>     - Simplify design for placements extension
> 
> Testcase: igt/gem_create/create-ext-placement-sanity-check
> Testcase: igt/gem_create/create-ext-placement-each
> Testcase: igt/gem_create/create-ext-placement-all
> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> Signed-off-by: CQ Tang <cq.tang@intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> Cc: Lionel Landwerlin <lionel.g.landwerlin@linux.intel.com>
> Cc: Jordan Justen <jordan.l.justen@intel.com>
> Cc: Daniel Vetter <daniel.vetter@intel.com>
> Cc: Kenneth Graunke <kenneth@whitecape.org>
> Cc: Jason Ekstrand <jason@jlekstrand.net>
> Cc: Dave Airlie <airlied@gmail.com>
> Cc: dri-devel@lists.freedesktop.org
> Cc: mesa-dev@lists.freedesktop.org
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_create.c    | 215 ++++++++++++++++--
>  drivers/gpu/drm/i915/gem/i915_gem_object.c    |   3 +
>  .../gpu/drm/i915/gem/i915_gem_object_types.h  |   6 +
>  .../drm/i915/gem/selftests/i915_gem_mman.c    |  26 +++
>  drivers/gpu/drm/i915/intel_memory_region.c    |  16 ++
>  drivers/gpu/drm/i915/intel_memory_region.h    |   4 +
>  include/uapi/drm/i915_drm.h                   |  62 +++++
>  7 files changed, 315 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_create.c b/drivers/gpu/drm/i915/gem/i915_gem_create.c
> index 90e9eb6601b5..895f1666a8d3 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_create.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_create.c
> @@ -4,12 +4,47 @@
>   */
>  
>  #include "gem/i915_gem_ioctls.h"
> +#include "gem/i915_gem_lmem.h"
>  #include "gem/i915_gem_region.h"
>  
>  #include "i915_drv.h"
>  #include "i915_trace.h"
>  #include "i915_user_extensions.h"
>  
> +static u32 object_max_page_size(struct drm_i915_gem_object *obj)
> +{
> +	u32 max_page_size = 0;
> +	int i;
> +
> +	for (i = 0; i < obj->mm.n_placements; i++) {
> +		struct intel_memory_region *mr = obj->mm.placements[i];
> +
> +		GEM_BUG_ON(!is_power_of_2(mr->min_page_size));
> +		max_page_size = max_t(u32, max_page_size, mr->min_page_size);
> +	}
> +
> +	GEM_BUG_ON(!max_page_size);
> +	return max_page_size;
> +}
> +
> +static void object_set_placements(struct drm_i915_gem_object *obj,
> +				  struct intel_memory_region **placements,
> +				  unsigned int n_placements)
> +{
> +	GEM_BUG_ON(!n_placements);
> +
> +	if (n_placements == 1) {
> +		struct intel_memory_region *mr = placements[0];
> +		struct drm_i915_private *i915 = mr->i915;
> +
> +		obj->mm.placements = &i915->mm.regions[mr->id];
> +		obj->mm.n_placements = 1;
> +	} else {
> +		obj->mm.placements = placements;
> +		obj->mm.n_placements = n_placements;
> +	}
> +}
> +

I found this helper function rather odd looking at first.  In the
general case, it simply sets fields based on the parameters...but in
the n == 1 case, it goes and uses something else as the array.

On further inspection, this makes sense: normally, we have an array
of multiple placements in priority order.  That array is (essentially)
malloc'd.  But if there's only 1 item, having a malloc'd array of 1
thing is pretty silly.  We can just point at it directly.  Which means
the callers can kfree the array, and the object destructor should not.

Maybe a comment saying

   /* 
    * For the common case of one memory region, skip storing an
    * allocated array and just point at the region directly.
    */

would be helpful?

[-- Attachment #1.2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/9] drm/doc/rfc: i915 DG1 uAPI
  2021-04-26  9:38 [PATCH 1/9] drm/doc/rfc: i915 DG1 uAPI Matthew Auld
                   ` (10 preceding siblings ...)
  2021-04-28 15:51 ` Jason Ekstrand
@ 2021-04-28 17:30 ` Kenneth Graunke
  2021-04-28 17:39 ` Bloomfield, Jon
  12 siblings, 0 replies; 22+ messages in thread
From: Kenneth Graunke @ 2021-04-28 17:30 UTC (permalink / raw)
  To: intel-gfx, Matthew Auld
  Cc: Lionel Landwerlin, Thomas Hellström, Dave Airlie,
	Jordan Justen, dri-devel, Daniel Vetter, Daniele Ceraolo Spurio,
	Jon Bloomfield, Jason Ekstrand, mesa-dev, Daniel Vetter


[-- Attachment #1.1: Type: text/plain, Size: 2403 bytes --]

On Monday, April 26, 2021 2:38:53 AM PDT Matthew Auld wrote:
> Add an entry for the new uAPI needed for DG1. Also add the overall
> upstream plan, including some notes for the TTM conversion.
> 
> v2(Daniel):
>   - include the overall upstreaming plan
>   - add a note for mmap, there are differences here for TTM vs i915
>   - bunch of other suggestions from Daniel
> v3:
>  (Daniel)
>   - add a note for set/get caching stuff
>   - add some more docs for existing query and extensions stuff
>   - add an actual code example for regions query
>   - bunch of other stuff
>  (Jason)
>   - uAPI change(!):
> 	- try a simpler design with the placements extension
> 	- rather than have a generic setparam which can cover multiple
> 	  use cases, have each extension be responsible for one thing
> 	  only
> v4:
>  (Daniel)
>   - add some more notes for ttm conversion
>   - bunch of other stuff
>  (Jason)
>   - uAPI change(!):
> 	- drop all the extra rsvd members for the region_query and
> 	  region_info, just keep the bare minimum needed for padding
> 
> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> Cc: Lionel Landwerlin <lionel.g.landwerlin@linux.intel.com>
> Cc: Jon Bloomfield <jon.bloomfield@intel.com>
> Cc: Jordan Justen <jordan.l.justen@intel.com>
> Cc: Daniel Vetter <daniel.vetter@intel.com>
> Cc: Kenneth Graunke <kenneth@whitecape.org>
> Cc: Jason Ekstrand <jason@jlekstrand.net>
> Cc: Dave Airlie <airlied@gmail.com>
> Cc: dri-devel@lists.freedesktop.org
> Cc: mesa-dev@lists.freedesktop.org
> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> Acked-by: Dave Airlie <airlied@redhat.com>
> ---
>  Documentation/gpu/rfc/i915_gem_lmem.h   | 212 ++++++++++++++++++++++++
>  Documentation/gpu/rfc/i915_gem_lmem.rst | 130 +++++++++++++++
>  Documentation/gpu/rfc/index.rst         |   4 +
>  3 files changed, 346 insertions(+)
>  create mode 100644 Documentation/gpu/rfc/i915_gem_lmem.h
>  create mode 100644 Documentation/gpu/rfc/i915_gem_lmem.rst

With or without any of my suggestions,

Patch 7 is:

Acked-by: Kenneth Graunke <kenneth@whitecape.org>

The rest of the series (1-6, 8-9) are:

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

[-- Attachment #1.2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: [PATCH 1/9] drm/doc/rfc: i915 DG1 uAPI
  2021-04-26  9:38 [PATCH 1/9] drm/doc/rfc: i915 DG1 uAPI Matthew Auld
                   ` (11 preceding siblings ...)
  2021-04-28 17:30 ` Kenneth Graunke
@ 2021-04-28 17:39 ` Bloomfield, Jon
  12 siblings, 0 replies; 22+ messages in thread
From: Bloomfield, Jon @ 2021-04-28 17:39 UTC (permalink / raw)
  To: Auld, Matthew, intel-gfx
  Cc: Lionel Landwerlin, Thomas Hellström, Dave Airlie, Justen,
	Jordan L, dri-devel, Daniel Vetter, Kenneth Graunke,
	Ceraolo Spurio, Daniele, Jason Ekstrand, mesa-dev, Vetter,
	Daniel

> -----Original Message-----
> From: Auld, Matthew <matthew.auld@intel.com>
> Sent: Monday, April 26, 2021 2:39 AM
> To: intel-gfx@lists.freedesktop.org
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>; Thomas Hellström
> <thomas.hellstrom@linux.intel.com>; Ceraolo Spurio, Daniele
> <daniele.ceraolospurio@intel.com>; Lionel Landwerlin
> <lionel.g.landwerlin@linux.intel.com>; Bloomfield, Jon
> <jon.bloomfield@intel.com>; Justen, Jordan L <jordan.l.justen@intel.com>;
> Vetter, Daniel <daniel.vetter@intel.com>; Kenneth Graunke
> <kenneth@whitecape.org>; Jason Ekstrand <jason@jlekstrand.net>; Dave
> Airlie <airlied@gmail.com>; dri-devel@lists.freedesktop.org; mesa-
> dev@lists.freedesktop.org; Daniel Vetter <daniel.vetter@ffwll.ch>; Dave
> Airlie <airlied@redhat.com>
> Subject: [PATCH 1/9] drm/doc/rfc: i915 DG1 uAPI
> 
> Add an entry for the new uAPI needed for DG1. Also add the overall
> upstream plan, including some notes for the TTM conversion.
> 
> v2(Daniel):
>   - include the overall upstreaming plan
>   - add a note for mmap, there are differences here for TTM vs i915
>   - bunch of other suggestions from Daniel
> v3:
>  (Daniel)
>   - add a note for set/get caching stuff
>   - add some more docs for existing query and extensions stuff
>   - add an actual code example for regions query
>   - bunch of other stuff
>  (Jason)
>   - uAPI change(!):
> 	- try a simpler design with the placements extension
> 	- rather than have a generic setparam which can cover multiple
> 	  use cases, have each extension be responsible for one thing
> 	  only
> v4:
>  (Daniel)
>   - add some more notes for ttm conversion
>   - bunch of other stuff
>  (Jason)
>   - uAPI change(!):
> 	- drop all the extra rsvd members for the region_query and
> 	  region_info, just keep the bare minimum needed for padding
> 
> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> Cc: Lionel Landwerlin <lionel.g.landwerlin@linux.intel.com>
> Cc: Jon Bloomfield <jon.bloomfield@intel.com>
> Cc: Jordan Justen <jordan.l.justen@intel.com>
> Cc: Daniel Vetter <daniel.vetter@intel.com>
> Cc: Kenneth Graunke <kenneth@whitecape.org>
> Cc: Jason Ekstrand <jason@jlekstrand.net>
> Cc: Dave Airlie <airlied@gmail.com>
> Cc: dri-devel@lists.freedesktop.org
> Cc: mesa-dev@lists.freedesktop.org
> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> Acked-by: Dave Airlie <airlied@redhat.com>
> ---

Acked-by: Jon Bloomfield <jon.bloomfield@intel.com>
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2021-04-29  5:51 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-26  9:38 [PATCH 1/9] drm/doc/rfc: i915 DG1 uAPI Matthew Auld
2021-04-26  9:38 ` [PATCH 2/9] drm/i915: mark stolen as private Matthew Auld
2021-04-26  9:38 ` [PATCH 3/9] drm/i915/query: Expose memory regions through the query uAPI Matthew Auld
2021-04-26  9:38 ` [PATCH 4/9] drm/i915: rework gem_create flow for upcoming extensions Matthew Auld
2021-04-26  9:38 ` [PATCH 5/9] drm/i915/uapi: introduce drm_i915_gem_create_ext Matthew Auld
2021-04-26  9:38 ` [PATCH 6/9] drm/i915/uapi: implement object placement extension Matthew Auld
2021-04-28 17:28   ` Kenneth Graunke
2021-04-26  9:38 ` [PATCH 7/9] drm/i915/lmem: support optional CPU clearing for special internal use Matthew Auld
2021-04-26  9:39 ` [PATCH 8/9] drm/i915/gem: clear userspace buffers for LMEM Matthew Auld
2021-04-26  9:39 ` [PATCH 9/9] drm/i915/gem: hide new uAPI behind CONFIG_BROKEN Matthew Auld
2021-04-26 15:11 ` [PATCH 1/9] drm/doc/rfc: i915 DG1 uAPI Jason Ekstrand
2021-04-26 15:31   ` Matthew Auld
2021-04-26 16:25     ` Jason Ekstrand
2021-04-26 16:32       ` Daniel Vetter
2021-04-28 15:16 ` Kenneth Graunke
2021-04-28 16:10   ` Matthew Auld
2021-04-28 15:51 ` Jason Ekstrand
2021-04-28 16:41   ` Matthew Auld
2021-04-28 16:56     ` Jason Ekstrand
2021-04-28 17:12       ` Kenneth Graunke
2021-04-28 17:30 ` Kenneth Graunke
2021-04-28 17:39 ` Bloomfield, Jon

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).