All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v4 0/5] Documentation patch for batchbuffer submission
@ 2018-04-03 10:52 kevin.rogovin
  2018-04-03 10:52 ` [PATCH v4 1/5] i915.rst: Narration overview on GEM + minor reorder to improve narration kevin.rogovin
                   ` (5 more replies)
  0 siblings, 6 replies; 20+ messages in thread
From: kevin.rogovin @ 2018-04-03 10:52 UTC (permalink / raw)
  To: intel-gfx, tvrtko.ursulin, abdiel.janulgue, joonas.lahtinen, chris
  Cc: Kevin Rogovin

From: Kevin Rogovin <kevin.rogovin@intel.com>

Note: I want to make a one or two follow-up series that provide
narration and potentially additional documentation for GUC submission
and the breadcrumbs.

v4:
   Drop some details from narration to provide better focus.
   (suggested/requested by Chris Wilson)

   Spelling and grammar fixes.
   (errors caught by Tvrtko Ursuli)

   Move "at the bottom" details to part of documentation in
   i915_gem_execbuffer.c.
   (suggested by Chris Wilson)

   Place additional documentation of intel_engine_cs function
   pointers to inlined in struct.

   Fix documentation error about lazy creation of intel_ringbuffer
   and backing store in intel_lrc.c.
   (Caught by Chris Wilson)

v3:
   More documentation: emphasize that handling of batchbuffer
   requests creates a request struct that is added to the
   dependency tree and that the inform to the hardware that
   there is new data on a ringbuffer is deferred until dependencies
   are satsified.

   Break patch into more digestable chunks.  

v2:
   More documentation: intel_ringbuffer, sequence number.
   Expose to i915.rst existing documentation

   Call out GEM_EXECBUFFER as deprecated.
   Place code detailed documentation in source files.
   Call out INTEL_EXEC_RENDER.
   Reorder text to make it more readable.
   Refer to Command Buffer Parser instead of Batchbuffer Parser.
   (suggested by Joonas Lahtinen)

Kevin Rogovin (5):
  i915.rst: Narration overview on GEM + minor reorder to improve
    narration
  i915: add a text about what happens at bottom of stack in processing a
    batchbuffer
  i915.rst: add link to documentation in i915_gem_execbuffer.c
  i915: correct lazy ringbuffer and backing store documentation
  i915: add documentation to intel_engine_cs

 Documentation/gpu/i915.rst                 | 122 +++++++++++++++++++++++------
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  22 ++++++
 drivers/gpu/drm/i915/i915_vma.h            |  10 ++-
 drivers/gpu/drm/i915/intel_lrc.c           |  13 +--
 drivers/gpu/drm/i915/intel_ringbuffer.h    |  29 +++++++
 5 files changed, 158 insertions(+), 38 deletions(-)

-- 
2.16.2

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v4 1/5] i915.rst: Narration overview on GEM + minor reorder to improve narration
  2018-04-03 10:52 [PATCH v4 0/5] Documentation patch for batchbuffer submission kevin.rogovin
@ 2018-04-03 10:52 ` kevin.rogovin
  2018-04-03 12:07   ` Joonas Lahtinen
                     ` (3 more replies)
  2018-04-03 10:52 ` [PATCH v4 2/5] i915: add a text about what happens at bottom of stack in processing a batchbuffer kevin.rogovin
                   ` (4 subsequent siblings)
  5 siblings, 4 replies; 20+ messages in thread
From: kevin.rogovin @ 2018-04-03 10:52 UTC (permalink / raw)
  To: intel-gfx, tvrtko.ursulin, abdiel.janulgue, joonas.lahtinen, chris
  Cc: Kevin Rogovin

From: Kevin Rogovin <kevin.rogovin@intel.com>

Add a narration to i915.rst about Intel GEN GPU's: engines,
driver context and relocation.

Signed-off-by: Kevin Rogovin <kevin.rogovin@intel.com>
---
 Documentation/gpu/i915.rst      | 116 ++++++++++++++++++++++++++++++++--------
 drivers/gpu/drm/i915/i915_vma.h |  10 ++--
 2 files changed, 100 insertions(+), 26 deletions(-)

diff --git a/Documentation/gpu/i915.rst b/Documentation/gpu/i915.rst
index 41dc881b00dc..00f897f67f85 100644
--- a/Documentation/gpu/i915.rst
+++ b/Documentation/gpu/i915.rst
@@ -249,6 +249,99 @@ Memory Management and Command Submission
 This sections covers all things related to the GEM implementation in the
 i915 driver.
 
+Intel GPU Basics
+----------------
+
+An Intel GPU has multiple engines. There are several engine types.
+
+- RCS engine is for rendering 3D and performing compute, this is named `I915_EXEC_RENDER` in user space.
+- BCS is a blitting (copy) engine, this is named `I915_EXEC_BLT` in user space.
+- VCS is a video encode and decode engine, this is named `I915_EXEC_BSD` in user space
+- VECS is video enhancement engine, this is named `I915_EXEC_VEBOX` in user space.
+- The enumeration `I915_EXEC_DEFAULT` does not refer to specific engine; instead it is to be used by user space to specify a default rendering engine (for 3D) that may or may not be the same as RCS.
+
+The Intel GPU family is a family of integrated GPU's using Unified
+Memory Access. For having the GPU "do work", user space will feed the
+GPU batch buffers via one of the ioctls `DRM_IOCTL_I915_GEM_EXECBUFFER2`
+or `DRM_IOCTL_I915_GEM_EXECBUFFER2_WR`. Most such batchbuffers will
+instruct the GPU to perform work (for example rendering) and that work
+needs memory from which to read and memory to which to write. All memory
+is encapsulated within GEM buffer objects (usually created with the ioctl
+`DRM_IOCTL_I915_GEM_CREATE`). An ioctl providing a batchbuffer for the GPU
+to create will also list all GEM buffer objects that the batchbuffer reads
+and/or writes. For implementation details of memory management see
+`GEM BO Management Implementation Details`_.
+
+The i915 driver allows user space to create a context via the ioctl
+`DRM_IOCTL_I915_GEM_CONTEXT_CREATE` which is identified by a 32-bit
+integer. Such a context should be veiwed by user-space as -loosely-
+analogous to the idea of a CPU process of an operating system. The i915
+driver guarantees that commands issued to a fixed context are to be
+executed so that writes of a previously issued command are seen by
+reads of following commands. Actions issued between different contexts
+(even if from the same file descriptor) are NOT given that guarantee
+and the only way to synchornize across contexts (even from the same
+file descriptor) is through the use of fences. At least as far back as
+Gen4, also have that a context carries with it a GPU HW context;
+the HW context is essentially (most of atleast) the state of a GPU.
+In addition to the ordering gaurantees, the kernel will restore GPU
+state via HW context when commands are issued to a context, this saves
+user space the need to restore (most of atleast) the GPU state at the
+start of each batchbuffer. The ioctl `DRM_IOCTL_I915_GEM_CONTEXT_CREATE`
+is used by user space to create a hardware context which is identified
+by a 32-bit integer. The non-deprecated ioctls to submit batchbuffer
+work can pass that ID (in the lower bits of drm_i915_gem_execbuffer2::rsvd1)
+to identify what context to use with the command.
+
+The GPU has its own memory management and address space. The kernel
+driver maintains the memory translation table for the GPU. For older
+GPUs (i.e. those before Gen8), there is a single global such translation
+table, a global Graphics Translation Table (GTT). For newer generation
+GPUs each context has its own translation table, called Per-Process
+Graphics Translation Table (PPGTT). Of important note, is that although
+PPGTT is named per-process it is actually per context. When user space
+submits a batchbuffer, the kernel walks the list of GEM buffer objects
+used by the batchbuffer and guarantees that not only is the memory of
+each such GEM buffer object resident but it is also present in the
+(PP)GTT. If the GEM buffer object is not yet placed in the (PP)GTT,
+then it is given an address. Two consequences of this are: the kernel
+needs to edit the batchbuffer submitted to write the correct value of
+the GPU address when a GEM BO is assigned a GPU address and the kernel
+might evict a different GEM BO from the (PP)GTT to make address room
+for another GEM BO. Consequently, the ioctls submitting a batchbuffer
+for execution also include a list of all locations within buffers that
+refer to GPU-addresses so that the kernel can edit the buffer correctly.
+This process is dubbed relocation.
+
+GEM BO Management Implementation Details
+----------------------------------------
+
+.. kernel-doc:: drivers/gpu/drm/i915/i915_vma.h
+   :doc: Virtual Memory Address
+
+Buffer Object Eviction
+----------------------
+
+This section documents the interface functions for evicting buffer
+objects to make space available in the virtual gpu address spaces. Note
+that this is mostly orthogonal to shrinking buffer objects caches, which
+has the goal to make main memory (shared with the gpu through the
+unified memory architecture) available.
+
+.. kernel-doc:: drivers/gpu/drm/i915/i915_gem_evict.c
+   :internal:
+
+Buffer Object Memory Shrinking
+------------------------------
+
+This section documents the interface function for shrinking memory usage
+of buffer object caches. Shrinking is used to make main memory
+available. Note that this is mostly orthogonal to evicting buffer
+objects, which has the goal to make space in gpu virtual address spaces.
+
+.. kernel-doc:: drivers/gpu/drm/i915/i915_gem_shrinker.c
+   :internal:
+
 Batchbuffer Parsing
 -------------------
 
@@ -312,29 +405,6 @@ Object Tiling IOCTLs
 .. kernel-doc:: drivers/gpu/drm/i915/i915_gem_tiling.c
    :doc: buffer object tiling
 
-Buffer Object Eviction
-----------------------
-
-This section documents the interface functions for evicting buffer
-objects to make space available in the virtual gpu address spaces. Note
-that this is mostly orthogonal to shrinking buffer objects caches, which
-has the goal to make main memory (shared with the gpu through the
-unified memory architecture) available.
-
-.. kernel-doc:: drivers/gpu/drm/i915/i915_gem_evict.c
-   :internal:
-
-Buffer Object Memory Shrinking
-------------------------------
-
-This section documents the interface function for shrinking memory usage
-of buffer object caches. Shrinking is used to make main memory
-available. Note that this is mostly orthogonal to evicting buffer
-objects, which has the goal to make space in gpu virtual address spaces.
-
-.. kernel-doc:: drivers/gpu/drm/i915/i915_gem_shrinker.c
-   :internal:
-
 GuC
 ===
 
diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
index 8c5022095418..0000f23a7266 100644
--- a/drivers/gpu/drm/i915/i915_vma.h
+++ b/drivers/gpu/drm/i915/i915_vma.h
@@ -38,9 +38,13 @@
 enum i915_cache_level;
 
 /**
- * A VMA represents a GEM BO that is bound into an address space. Therefore, a
- * VMA's presence cannot be guaranteed before binding, or after unbinding the
- * object into/from the address space.
+ * DOC: Virtual Memory Address
+ *
+ * An `i915_vma` struct represents a GEM BO that is bound into an address
+ * space. Therefore, a VMA's presence cannot be guaranteed before binding, or
+ * after unbinding the object into/from the address space. The struct includes
+ * the bookkepping details needed for tracking it in all the lists with which
+ * it interacts.
  *
  * To make things as simple as possible (ie. no refcounting), a VMA's lifetime
  * will always be <= an objects lifetime. So object refcounting should cover us.
-- 
2.16.2

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v4 2/5] i915: add a text about what happens at bottom of stack in processing a batchbuffer
  2018-04-03 10:52 [PATCH v4 0/5] Documentation patch for batchbuffer submission kevin.rogovin
  2018-04-03 10:52 ` [PATCH v4 1/5] i915.rst: Narration overview on GEM + minor reorder to improve narration kevin.rogovin
@ 2018-04-03 10:52 ` kevin.rogovin
  2018-04-03 12:15   ` Joonas Lahtinen
  2018-04-03 10:52 ` [PATCH v4 3/5] i915.rst: add link to documentation in i915_gem_execbuffer.c kevin.rogovin
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 20+ messages in thread
From: kevin.rogovin @ 2018-04-03 10:52 UTC (permalink / raw)
  To: intel-gfx, tvrtko.ursulin, abdiel.janulgue, joonas.lahtinen, chris
  Cc: Kevin Rogovin

From: Kevin Rogovin <kevin.rogovin@intel.com>

Now that "DOC: User command execution" of i915_gem_execbuffer.c is included
in the i915.rst, it is benecifial (for new developers) to read what happens
at the bottom of the driver stack (in terms of bytes written to be read
by the GPU) when processing a user-space batchbuffer.

Signed-off-by: Kevin Rogovin <kevin.rogovin@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 8c170db8495d..1fe5da1fed47 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -81,6 +81,28 @@ enum {
  * but this remains just a hint as the kernel may choose a new location for
  * any object in the future.
  *
+ * At the level of talking to the hardware, submitting a batchbuffer for the
+ * GPU to execute is to add content to a buffer from which the HW
+ * command streamer is reading.
+ * 1. Add a command to load the HW context. For Logical Ring Contexts, i.e.
+ *    Execlists, this command is not placed on the same buffer as the
+ *    remaining items.
+ * 2. Add a command to invalidate caches to the buffer.
+ * 3. Add a batchbuffer start command to the buffer; the start command is
+ *    essentially a token together with the GPU address of the batchbuffer
+ *    to be executed.
+ * 4. Add a pipeline flush to the buffer.
+ * 5. Add a memory write command to the buffer to record when the GPU
+ *    is done executing the batchbuffer. The memory write writes the
+ *    global sequence number of the request, `i915_request::global_seqno``;
+ *    the i915 driver uses the current value in the register to determine
+ *    if the GPU has completed the batchbuffer.
+ * 6. Add a user interrupt command to the buffer. This command instructs
+ *    the GPU to issue an interrupt when the command, pipeline flush and
+ *    memory write are completed.
+ * 7. Inform the hardware of the additional commands added to the buffer
+ *    (by updating the tail pointer).
+ *
  * Processing an execbuf ioctl is conceptually split up into a few phases.
  *
  * 1. Validation - Ensure all the pointers, handles and flags are valid.
-- 
2.16.2

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v4 3/5] i915.rst: add link to documentation in i915_gem_execbuffer.c
  2018-04-03 10:52 [PATCH v4 0/5] Documentation patch for batchbuffer submission kevin.rogovin
  2018-04-03 10:52 ` [PATCH v4 1/5] i915.rst: Narration overview on GEM + minor reorder to improve narration kevin.rogovin
  2018-04-03 10:52 ` [PATCH v4 2/5] i915: add a text about what happens at bottom of stack in processing a batchbuffer kevin.rogovin
@ 2018-04-03 10:52 ` kevin.rogovin
  2018-04-03 12:16   ` Joonas Lahtinen
  2018-04-03 10:52 ` [PATCH v4 4/5] i915: correct lazy ringbuffer and backing store documentation kevin.rogovin
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 20+ messages in thread
From: kevin.rogovin @ 2018-04-03 10:52 UTC (permalink / raw)
  To: intel-gfx, tvrtko.ursulin, abdiel.janulgue, joonas.lahtinen, chris
  Cc: Kevin Rogovin

From: Kevin Rogovin <kevin.rogovin@intel.com>

Add the documentation of "DOC: User command execution" of
i915_gem_execbuffer.c into a new section in i915.rst.

Signed-off-by: Kevin Rogovin <kevin.rogovin@intel.com>
---
 Documentation/gpu/i915.rst | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/Documentation/gpu/i915.rst b/Documentation/gpu/i915.rst
index 00f897f67f85..efe15406f322 100644
--- a/Documentation/gpu/i915.rst
+++ b/Documentation/gpu/i915.rst
@@ -360,6 +360,12 @@ Batchbuffer Pools
 .. kernel-doc:: drivers/gpu/drm/i915/i915_gem_batch_pool.c
    :internal:
 
+User Batchbuffer Execution
+--------------------------
+
+.. kernel-doc:: drivers/gpu/drm/i915/i915_gem_execbuffer.c
+   :doc: User command execution
+
 Logical Rings, Logical Ring Contexts and Execlists
 --------------------------------------------------
 
-- 
2.16.2

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v4 4/5] i915: correct lazy ringbuffer and backing store documentation
  2018-04-03 10:52 [PATCH v4 0/5] Documentation patch for batchbuffer submission kevin.rogovin
                   ` (2 preceding siblings ...)
  2018-04-03 10:52 ` [PATCH v4 3/5] i915.rst: add link to documentation in i915_gem_execbuffer.c kevin.rogovin
@ 2018-04-03 10:52 ` kevin.rogovin
  2018-04-03 10:52 ` [PATCH v4 5/5] i915: add documentation to intel_engine_cs kevin.rogovin
  2018-04-03 12:19 ` ✗ Fi.CI.BAT: failure for Documentation patch for batchbuffer submission (rev4) Patchwork
  5 siblings, 0 replies; 20+ messages in thread
From: kevin.rogovin @ 2018-04-03 10:52 UTC (permalink / raw)
  To: intel-gfx, tvrtko.ursulin, abdiel.janulgue, joonas.lahtinen, chris
  Cc: Kevin Rogovin

From: Kevin Rogovin <kevin.rogovin@intel.com>

Correct documentation of logical ring context implementation to note
that ringbuffer and backing store are created lazily for all context
types (driver global, local default context and local extra context).

Signed-off-by: Kevin Rogovin <kevin.rogovin@intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c | 13 +------------
 1 file changed, 1 insertion(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 53f1c009ed7b..382a4656a1d9 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -70,22 +70,11 @@
  * - One ringbuffer per-engine inside each context.
  * - One backing object per-engine inside each context.
  *
- * The global default context starts its life with these new objects fully
- * allocated and populated. The local default context for each opened fd is
- * more complex, because we don't know at creation time which engine is going
- * to use them. To handle this, we have implemented a deferred creation of LR
- * contexts:
- *
- * The local context starts its life as a hollow or blank holder, that only
- * gets populated for a given engine once we receive an execbuffer. If later
+ * which are populated for a given engine once we receive an execbuffer.If later
  * on we receive another execbuffer ioctl for the same context but a different
  * engine, we allocate/populate a new ringbuffer and context backing object and
  * so on.
  *
- * Finally, regarding local contexts created using the ioctl call: as they are
- * only allowed with the render ring, we can allocate & populate them right
- * away (no need to defer anything, at least for now).
- *
  * Execlists implementation:
  * Execlists are the new method by which, on gen8+ hardware, workloads are
  * submitted for execution (as opposed to the legacy, ringbuffer-based, method).
-- 
2.16.2

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v4 5/5] i915: add documentation to intel_engine_cs
  2018-04-03 10:52 [PATCH v4 0/5] Documentation patch for batchbuffer submission kevin.rogovin
                   ` (3 preceding siblings ...)
  2018-04-03 10:52 ` [PATCH v4 4/5] i915: correct lazy ringbuffer and backing store documentation kevin.rogovin
@ 2018-04-03 10:52 ` kevin.rogovin
  2018-04-03 13:10   ` Joonas Lahtinen
  2018-04-03 12:19 ` ✗ Fi.CI.BAT: failure for Documentation patch for batchbuffer submission (rev4) Patchwork
  5 siblings, 1 reply; 20+ messages in thread
From: kevin.rogovin @ 2018-04-03 10:52 UTC (permalink / raw)
  To: intel-gfx, tvrtko.ursulin, abdiel.janulgue, joonas.lahtinen, chris
  Cc: Kevin Rogovin

From: Kevin Rogovin <kevin.rogovin@intel.com>

Add documentation to a number of the function pointer fields of
intel_engine_cs.

Signed-off-by: Kevin Rogovin <kevin.rogovin@intel.com>
---
 drivers/gpu/drm/i915/intel_ringbuffer.h | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 1f50727a5ddb..eafd1690acde 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -426,23 +426,52 @@ struct intel_engine_cs {
 
 	void		(*set_default_submission)(struct intel_engine_cs *engine);
 
+	/* In addition to pinning the context, returns the intel_ringbuffer
+	 * to which to write commands.
+	 */
 	struct intel_ring *(*context_pin)(struct intel_engine_cs *engine,
 					  struct i915_gem_context *ctx);
 	void		(*context_unpin)(struct intel_engine_cs *engine,
 					 struct i915_gem_context *ctx);
+
+	/* Request room on the ringbuffer of a request in order to write
+	 * commands for a request; In addition, if necessary, add commands
+	 * to the buffer so that the i915_gem_context of the request
+	 * is the one active for the commands.
+	 */
 	int		(*request_alloc)(struct i915_request *rq);
+
+	/* Called only once (and only if non-NULL) for an engine; used to
+	 * initialize the global driver default context.
+	 */
 	int		(*init_context)(struct i915_request *rq);
 
+	/* Add a GPU command to cache invalidate with EMIT_INVALIDATE,
+	 * to pipeline flush with EMIT_FLUSH or to do both with EMIT_BARRIER;
+	 * the GPU command is added to the buffer holding the commands of
+	 * the request (i.e. calling intel_ring_begin() on
+	 * i915_request::ring).
+	 */
 	int		(*emit_flush)(struct i915_request *request, u32 mode);
 #define EMIT_INVALIDATE	BIT(0)
 #define EMIT_FLUSH	BIT(1)
 #define EMIT_BARRIER	(EMIT_INVALIDATE | EMIT_FLUSH)
+	/* Add a batchbuffer start command; the GPU command is added to
+	 * the buffer holding the commands of the request (i.e. calling
+	 * intel_ring_begin() on i915_request::ring).
+	 */
 	int		(*emit_bb_start)(struct i915_request *rq,
 					 u64 offset, u32 length,
 					 unsigned int dispatch_flags);
 #define I915_DISPATCH_SECURE BIT(0)
 #define I915_DISPATCH_PINNED BIT(1)
 #define I915_DISPATCH_RS     BIT(2)
+	/* Add a memory write command that writes the global sequence number
+	 * (i915_request::global_seqno) and also add an interrupt command;
+	 * the GPU command is added to the buffer holding the commands of
+	 * the request (i.e. calling intel_ring_begin() on
+	 * i915_request::ring).
+	 */
 	void		(*emit_breadcrumb)(struct i915_request *rq, u32 *cs);
 	int		emit_breadcrumb_sz;
 
-- 
2.16.2

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH v4 1/5] i915.rst: Narration overview on GEM + minor reorder to improve narration
  2018-04-03 10:52 ` [PATCH v4 1/5] i915.rst: Narration overview on GEM + minor reorder to improve narration kevin.rogovin
@ 2018-04-03 12:07   ` Joonas Lahtinen
  2018-04-03 13:31     ` Jani Nikula
  2018-04-03 13:55   ` Mika Kuoppala
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 20+ messages in thread
From: Joonas Lahtinen @ 2018-04-03 12:07 UTC (permalink / raw)
  To: abdiel.janulgue, chris, intel-gfx, tvrtko.ursulin; +Cc: Kevin Rogovin

Quoting kevin.rogovin@intel.com (2018-04-03 13:52:23)
> From: Kevin Rogovin <kevin.rogovin@intel.com>
> 
> Add a narration to i915.rst about Intel GEN GPU's: engines,
> driver context and relocation.
> 
> Signed-off-by: Kevin Rogovin <kevin.rogovin@intel.com>

I'm still bummed by the long lines in the bulleted list, but regardless:

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Regards, Joonas

> ---
>  Documentation/gpu/i915.rst      | 116 ++++++++++++++++++++++++++++++++--------
>  drivers/gpu/drm/i915/i915_vma.h |  10 ++--
>  2 files changed, 100 insertions(+), 26 deletions(-)
> 
> diff --git a/Documentation/gpu/i915.rst b/Documentation/gpu/i915.rst
> index 41dc881b00dc..00f897f67f85 100644
> --- a/Documentation/gpu/i915.rst
> +++ b/Documentation/gpu/i915.rst
> @@ -249,6 +249,99 @@ Memory Management and Command Submission
>  This sections covers all things related to the GEM implementation in the
>  i915 driver.
>  
> +Intel GPU Basics
> +----------------
> +
> +An Intel GPU has multiple engines. There are several engine types.
> +
> +- RCS engine is for rendering 3D and performing compute, this is named `I915_EXEC_RENDER` in user space.
> +- BCS is a blitting (copy) engine, this is named `I915_EXEC_BLT` in user space.
> +- VCS is a video encode and decode engine, this is named `I915_EXEC_BSD` in user space
> +- VECS is video enhancement engine, this is named `I915_EXEC_VEBOX` in user space.
> +- The enumeration `I915_EXEC_DEFAULT` does not refer to specific engine; instead it is to be used by user space to specify a default rendering engine (for 3D) that may or may not be the same as RCS.
> +
> +The Intel GPU family is a family of integrated GPU's using Unified
> +Memory Access. For having the GPU "do work", user space will feed the
> +GPU batch buffers via one of the ioctls `DRM_IOCTL_I915_GEM_EXECBUFFER2`
> +or `DRM_IOCTL_I915_GEM_EXECBUFFER2_WR`. Most such batchbuffers will
> +instruct the GPU to perform work (for example rendering) and that work
> +needs memory from which to read and memory to which to write. All memory
> +is encapsulated within GEM buffer objects (usually created with the ioctl
> +`DRM_IOCTL_I915_GEM_CREATE`). An ioctl providing a batchbuffer for the GPU
> +to create will also list all GEM buffer objects that the batchbuffer reads
> +and/or writes. For implementation details of memory management see
> +`GEM BO Management Implementation Details`_.
> +
> +The i915 driver allows user space to create a context via the ioctl
> +`DRM_IOCTL_I915_GEM_CONTEXT_CREATE` which is identified by a 32-bit
> +integer. Such a context should be veiwed by user-space as -loosely-
> +analogous to the idea of a CPU process of an operating system. The i915
> +driver guarantees that commands issued to a fixed context are to be
> +executed so that writes of a previously issued command are seen by
> +reads of following commands. Actions issued between different contexts
> +(even if from the same file descriptor) are NOT given that guarantee
> +and the only way to synchornize across contexts (even from the same
> +file descriptor) is through the use of fences. At least as far back as
> +Gen4, also have that a context carries with it a GPU HW context;
> +the HW context is essentially (most of atleast) the state of a GPU.
> +In addition to the ordering gaurantees, the kernel will restore GPU
> +state via HW context when commands are issued to a context, this saves
> +user space the need to restore (most of atleast) the GPU state at the
> +start of each batchbuffer. The ioctl `DRM_IOCTL_I915_GEM_CONTEXT_CREATE`
> +is used by user space to create a hardware context which is identified
> +by a 32-bit integer. The non-deprecated ioctls to submit batchbuffer
> +work can pass that ID (in the lower bits of drm_i915_gem_execbuffer2::rsvd1)
> +to identify what context to use with the command.
> +
> +The GPU has its own memory management and address space. The kernel
> +driver maintains the memory translation table for the GPU. For older
> +GPUs (i.e. those before Gen8), there is a single global such translation
> +table, a global Graphics Translation Table (GTT). For newer generation
> +GPUs each context has its own translation table, called Per-Process
> +Graphics Translation Table (PPGTT). Of important note, is that although
> +PPGTT is named per-process it is actually per context. When user space
> +submits a batchbuffer, the kernel walks the list of GEM buffer objects
> +used by the batchbuffer and guarantees that not only is the memory of
> +each such GEM buffer object resident but it is also present in the
> +(PP)GTT. If the GEM buffer object is not yet placed in the (PP)GTT,
> +then it is given an address. Two consequences of this are: the kernel
> +needs to edit the batchbuffer submitted to write the correct value of
> +the GPU address when a GEM BO is assigned a GPU address and the kernel
> +might evict a different GEM BO from the (PP)GTT to make address room
> +for another GEM BO. Consequently, the ioctls submitting a batchbuffer
> +for execution also include a list of all locations within buffers that
> +refer to GPU-addresses so that the kernel can edit the buffer correctly.
> +This process is dubbed relocation.
> +
> +GEM BO Management Implementation Details
> +----------------------------------------
> +
> +.. kernel-doc:: drivers/gpu/drm/i915/i915_vma.h
> +   :doc: Virtual Memory Address
> +
> +Buffer Object Eviction
> +----------------------
> +
> +This section documents the interface functions for evicting buffer
> +objects to make space available in the virtual gpu address spaces. Note
> +that this is mostly orthogonal to shrinking buffer objects caches, which
> +has the goal to make main memory (shared with the gpu through the
> +unified memory architecture) available.
> +
> +.. kernel-doc:: drivers/gpu/drm/i915/i915_gem_evict.c
> +   :internal:
> +
> +Buffer Object Memory Shrinking
> +------------------------------
> +
> +This section documents the interface function for shrinking memory usage
> +of buffer object caches. Shrinking is used to make main memory
> +available. Note that this is mostly orthogonal to evicting buffer
> +objects, which has the goal to make space in gpu virtual address spaces.
> +
> +.. kernel-doc:: drivers/gpu/drm/i915/i915_gem_shrinker.c
> +   :internal:
> +
>  Batchbuffer Parsing
>  -------------------
>  
> @@ -312,29 +405,6 @@ Object Tiling IOCTLs
>  .. kernel-doc:: drivers/gpu/drm/i915/i915_gem_tiling.c
>     :doc: buffer object tiling
>  
> -Buffer Object Eviction
> -----------------------
> -
> -This section documents the interface functions for evicting buffer
> -objects to make space available in the virtual gpu address spaces. Note
> -that this is mostly orthogonal to shrinking buffer objects caches, which
> -has the goal to make main memory (shared with the gpu through the
> -unified memory architecture) available.
> -
> -.. kernel-doc:: drivers/gpu/drm/i915/i915_gem_evict.c
> -   :internal:
> -
> -Buffer Object Memory Shrinking
> -------------------------------
> -
> -This section documents the interface function for shrinking memory usage
> -of buffer object caches. Shrinking is used to make main memory
> -available. Note that this is mostly orthogonal to evicting buffer
> -objects, which has the goal to make space in gpu virtual address spaces.
> -
> -.. kernel-doc:: drivers/gpu/drm/i915/i915_gem_shrinker.c
> -   :internal:
> -
>  GuC
>  ===
>  
> diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
> index 8c5022095418..0000f23a7266 100644
> --- a/drivers/gpu/drm/i915/i915_vma.h
> +++ b/drivers/gpu/drm/i915/i915_vma.h
> @@ -38,9 +38,13 @@
>  enum i915_cache_level;
>  
>  /**
> - * A VMA represents a GEM BO that is bound into an address space. Therefore, a
> - * VMA's presence cannot be guaranteed before binding, or after unbinding the
> - * object into/from the address space.
> + * DOC: Virtual Memory Address
> + *
> + * An `i915_vma` struct represents a GEM BO that is bound into an address
> + * space. Therefore, a VMA's presence cannot be guaranteed before binding, or
> + * after unbinding the object into/from the address space. The struct includes
> + * the bookkepping details needed for tracking it in all the lists with which
> + * it interacts.
>   *
>   * To make things as simple as possible (ie. no refcounting), a VMA's lifetime
>   * will always be <= an objects lifetime. So object refcounting should cover us.
> -- 
> 2.16.2
> 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v4 2/5] i915: add a text about what happens at bottom of stack in processing a batchbuffer
  2018-04-03 10:52 ` [PATCH v4 2/5] i915: add a text about what happens at bottom of stack in processing a batchbuffer kevin.rogovin
@ 2018-04-03 12:15   ` Joonas Lahtinen
  0 siblings, 0 replies; 20+ messages in thread
From: Joonas Lahtinen @ 2018-04-03 12:15 UTC (permalink / raw)
  To: abdiel.janulgue, chris, intel-gfx, tvrtko.ursulin; +Cc: Kevin Rogovin

Quoting kevin.rogovin@intel.com (2018-04-03 13:52:24)
> From: Kevin Rogovin <kevin.rogovin@intel.com>
> 
> Now that "DOC: User command execution" of i915_gem_execbuffer.c is included
> in the i915.rst, it is benecifial (for new developers) to read what happens
> at the bottom of the driver stack (in terms of bytes written to be read
> by the GPU) when processing a user-space batchbuffer.
> 
> Signed-off-by: Kevin Rogovin <kevin.rogovin@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_gem_execbuffer.c | 22 ++++++++++++++++++++++
>  1 file changed, 22 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> index 8c170db8495d..1fe5da1fed47 100644
> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> @@ -81,6 +81,28 @@ enum {
>   * but this remains just a hint as the kernel may choose a new location for
>   * any object in the future.
>   *
> + * At the level of talking to the hardware, submitting a batchbuffer for the
> + * GPU to execute is to add content to a buffer from which the HW
> + * command streamer is reading.
> + * 1. Add a command to load the HW context. For Logical Ring Contexts, i.e.
> + *    Execlists, this command is not placed on the same buffer as the
> + *    remaining items.
> + * 2. Add a command to invalidate caches to the buffer.
> + * 3. Add a batchbuffer start command to the buffer; the start command is
> + *    essentially a token together with the GPU address of the batchbuffer
> + *    to be executed.
> + * 4. Add a pipeline flush to the buffer.
> + * 5. Add a memory write command to the buffer to record when the GPU
> + *    is done executing the batchbuffer. The memory write writes the
> + *    global sequence number of the request, `i915_request::global_seqno``;

Double tick here. With that fixed:

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Regards, Joonas

> + *    the i915 driver uses the current value in the register to determine
> + *    if the GPU has completed the batchbuffer.
> + * 6. Add a user interrupt command to the buffer. This command instructs
> + *    the GPU to issue an interrupt when the command, pipeline flush and
> + *    memory write are completed.
> + * 7. Inform the hardware of the additional commands added to the buffer
> + *    (by updating the tail pointer).
> + *
>   * Processing an execbuf ioctl is conceptually split up into a few phases.
>   *
>   * 1. Validation - Ensure all the pointers, handles and flags are valid.
> -- 
> 2.16.2
> 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v4 3/5] i915.rst: add link to documentation in i915_gem_execbuffer.c
  2018-04-03 10:52 ` [PATCH v4 3/5] i915.rst: add link to documentation in i915_gem_execbuffer.c kevin.rogovin
@ 2018-04-03 12:16   ` Joonas Lahtinen
  0 siblings, 0 replies; 20+ messages in thread
From: Joonas Lahtinen @ 2018-04-03 12:16 UTC (permalink / raw)
  To: abdiel.janulgue, chris, intel-gfx, tvrtko.ursulin; +Cc: Kevin Rogovin

Quoting kevin.rogovin@intel.com (2018-04-03 13:52:25)
> From: Kevin Rogovin <kevin.rogovin@intel.com>
> 
> Add the documentation of "DOC: User command execution" of
> i915_gem_execbuffer.c into a new section in i915.rst.
> 
> Signed-off-by: Kevin Rogovin <kevin.rogovin@intel.com>

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Regards, Joonas
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 20+ messages in thread

* ✗ Fi.CI.BAT: failure for Documentation patch for batchbuffer submission (rev4)
  2018-04-03 10:52 [PATCH v4 0/5] Documentation patch for batchbuffer submission kevin.rogovin
                   ` (4 preceding siblings ...)
  2018-04-03 10:52 ` [PATCH v4 5/5] i915: add documentation to intel_engine_cs kevin.rogovin
@ 2018-04-03 12:19 ` Patchwork
  5 siblings, 0 replies; 20+ messages in thread
From: Patchwork @ 2018-04-03 12:19 UTC (permalink / raw)
  To: kevin.rogovin; +Cc: intel-gfx

== Series Details ==

Series: Documentation patch for batchbuffer submission (rev4)
URL   : https://patchwork.freedesktop.org/series/38433/
State : failure

== Summary ==

Applying: i915.rst: Narration overview on GEM + minor reorder to improve narration
error: Failed to merge in the changes.
Using index info to reconstruct a base tree...
M	Documentation/gpu/i915.rst
Falling back to patching base and 3-way merge...
Auto-merging Documentation/gpu/i915.rst
CONFLICT (content): Merge conflict in Documentation/gpu/i915.rst
Patch failed at 0001 i915.rst: Narration overview on GEM + minor reorder to improve narration
The copy of the patch that failed is found in: .git/rebase-apply/patch
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v4 5/5] i915: add documentation to intel_engine_cs
  2018-04-03 10:52 ` [PATCH v4 5/5] i915: add documentation to intel_engine_cs kevin.rogovin
@ 2018-04-03 13:10   ` Joonas Lahtinen
  2018-04-03 14:34     ` Rogovin, Kevin
  0 siblings, 1 reply; 20+ messages in thread
From: Joonas Lahtinen @ 2018-04-03 13:10 UTC (permalink / raw)
  To: abdiel.janulgue, chris, intel-gfx, tvrtko.ursulin; +Cc: Kevin Rogovin

Quoting kevin.rogovin@intel.com (2018-04-03 13:52:27)
> From: Kevin Rogovin <kevin.rogovin@intel.com>
> 
> Add documentation to a number of the function pointer fields of
> intel_engine_cs.
> 
> Signed-off-by: Kevin Rogovin <kevin.rogovin@intel.com>
> ---
>  drivers/gpu/drm/i915/intel_ringbuffer.h | 29 +++++++++++++++++++++++++++++
>  1 file changed, 29 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index 1f50727a5ddb..eafd1690acde 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -426,23 +426,52 @@ struct intel_engine_cs {
>  
>         void            (*set_default_submission)(struct intel_engine_cs *engine);
>  
> +       /* In addition to pinning the context, returns the intel_ringbuffer
> +        * to which to write commands.

	/* Pin context and return intel_ring to write commands to. */

And if you have to resort to multi-line comments, make them
balanced:

	/*
	 * Foo...
	 * Bar...
	 */

These comments feel bit verbose for just being internal ones.

> +        */
>         struct intel_ring *(*context_pin)(struct intel_engine_cs *engine,
>                                           struct i915_gem_context *ctx);
>         void            (*context_unpin)(struct intel_engine_cs *engine,
>                                          struct i915_gem_context *ctx);
> +
> +       /* Request room on the ringbuffer of a request in order to write
> +        * commands for a request; In addition, if necessary, add commands
> +        * to the buffer so that the i915_gem_context of the request
> +        * is the one active for the commands.
> +        */

"Reserve room from the ringbuffer for commands and emit necessary context
switching commands."?

>         int             (*request_alloc)(struct i915_request *rq);
> +
> +       /* Called only once (and only if non-NULL) for an engine; used to
> +        * initialize the global driver default context.
> +        */
>         int             (*init_context)(struct i915_request *rq);
>  
> +       /* Add a GPU command to cache invalidate with EMIT_INVALIDATE,
> +        * to pipeline flush with EMIT_FLUSH or to do both with EMIT_BARRIER;
> +        * the GPU command is added to the buffer holding the commands of
> +        * the request (i.e. calling intel_ring_begin() on
> +        * i915_request::ring).
> +        */
>         int             (*emit_flush)(struct i915_request *request, u32 mode);
>  #define EMIT_INVALIDATE        BIT(0)
>  #define EMIT_FLUSH     BIT(1)
>  #define EMIT_BARRIER   (EMIT_INVALIDATE | EMIT_FLUSH)
> +       /* Add a batchbuffer start command; the GPU command is added to
> +        * the buffer holding the commands of the request (i.e. calling
> +        * intel_ring_begin() on i915_request::ring).
> +        */
>         int             (*emit_bb_start)(struct i915_request *rq,
>                                          u64 offset, u32 length,
>                                          unsigned int dispatch_flags);
>  #define I915_DISPATCH_SECURE BIT(0)
>  #define I915_DISPATCH_PINNED BIT(1)
>  #define I915_DISPATCH_RS     BIT(2)
> +       /* Add a memory write command that writes the global sequence number
> +        * (i915_request::global_seqno) and also add an interrupt command;
> +        * the GPU command is added to the buffer holding the commands of
> +        * the request (i.e. calling intel_ring_begin() on
> +        * i915_request::ring).

This is more about what a breadcrumb is than what this interface is
about. "Add commands for triggering a breadcrumb to be picked up" and
maybe explain elsewhere what a breadcrumb is.

So overall, try to make the comments bit less verbose and leave the
implementation detail to the implementation functions :)

Regards, Joonas

> +        */
>         void            (*emit_breadcrumb)(struct i915_request *rq, u32 *cs);
>         int             emit_breadcrumb_sz;
>  
> -- 
> 2.16.2
> 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v4 1/5] i915.rst: Narration overview on GEM + minor reorder to improve narration
  2018-04-03 12:07   ` Joonas Lahtinen
@ 2018-04-03 13:31     ` Jani Nikula
  0 siblings, 0 replies; 20+ messages in thread
From: Jani Nikula @ 2018-04-03 13:31 UTC (permalink / raw)
  To: Joonas Lahtinen, abdiel.janulgue, chris, intel-gfx, tvrtko.ursulin
  Cc: Kevin Rogovin

On Tue, 03 Apr 2018, Joonas Lahtinen <joonas.lahtinen@linux.intel.com> wrote:
> Quoting kevin.rogovin@intel.com (2018-04-03 13:52:23)
>> From: Kevin Rogovin <kevin.rogovin@intel.com>
>> 
>> Add a narration to i915.rst about Intel GEN GPU's: engines,
>> driver context and relocation.
>> 
>> Signed-off-by: Kevin Rogovin <kevin.rogovin@intel.com>
>
> I'm still bummed by the long lines in the bulleted list, but regardless:

Hum, there's no need to do that. Please reflow.

BR,
Jani.

>
> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
>
> Regards, Joonas
>
>> ---
>>  Documentation/gpu/i915.rst      | 116 ++++++++++++++++++++++++++++++++--------
>>  drivers/gpu/drm/i915/i915_vma.h |  10 ++--
>>  2 files changed, 100 insertions(+), 26 deletions(-)
>> 
>> diff --git a/Documentation/gpu/i915.rst b/Documentation/gpu/i915.rst
>> index 41dc881b00dc..00f897f67f85 100644
>> --- a/Documentation/gpu/i915.rst
>> +++ b/Documentation/gpu/i915.rst
>> @@ -249,6 +249,99 @@ Memory Management and Command Submission
>>  This sections covers all things related to the GEM implementation in the
>>  i915 driver.
>>  
>> +Intel GPU Basics
>> +----------------
>> +
>> +An Intel GPU has multiple engines. There are several engine types.
>> +
>> +- RCS engine is for rendering 3D and performing compute, this is named `I915_EXEC_RENDER` in user space.
>> +- BCS is a blitting (copy) engine, this is named `I915_EXEC_BLT` in user space.
>> +- VCS is a video encode and decode engine, this is named `I915_EXEC_BSD` in user space
>> +- VECS is video enhancement engine, this is named `I915_EXEC_VEBOX` in user space.
>> +- The enumeration `I915_EXEC_DEFAULT` does not refer to specific engine; instead it is to be used by user space to specify a default rendering engine (for 3D) that may or may not be the same as RCS.
>> +
>> +The Intel GPU family is a family of integrated GPU's using Unified
>> +Memory Access. For having the GPU "do work", user space will feed the
>> +GPU batch buffers via one of the ioctls `DRM_IOCTL_I915_GEM_EXECBUFFER2`
>> +or `DRM_IOCTL_I915_GEM_EXECBUFFER2_WR`. Most such batchbuffers will
>> +instruct the GPU to perform work (for example rendering) and that work
>> +needs memory from which to read and memory to which to write. All memory
>> +is encapsulated within GEM buffer objects (usually created with the ioctl
>> +`DRM_IOCTL_I915_GEM_CREATE`). An ioctl providing a batchbuffer for the GPU
>> +to create will also list all GEM buffer objects that the batchbuffer reads
>> +and/or writes. For implementation details of memory management see
>> +`GEM BO Management Implementation Details`_.
>> +
>> +The i915 driver allows user space to create a context via the ioctl
>> +`DRM_IOCTL_I915_GEM_CONTEXT_CREATE` which is identified by a 32-bit
>> +integer. Such a context should be veiwed by user-space as -loosely-
>> +analogous to the idea of a CPU process of an operating system. The i915
>> +driver guarantees that commands issued to a fixed context are to be
>> +executed so that writes of a previously issued command are seen by
>> +reads of following commands. Actions issued between different contexts
>> +(even if from the same file descriptor) are NOT given that guarantee
>> +and the only way to synchornize across contexts (even from the same
>> +file descriptor) is through the use of fences. At least as far back as
>> +Gen4, also have that a context carries with it a GPU HW context;
>> +the HW context is essentially (most of atleast) the state of a GPU.
>> +In addition to the ordering gaurantees, the kernel will restore GPU
>> +state via HW context when commands are issued to a context, this saves
>> +user space the need to restore (most of atleast) the GPU state at the
>> +start of each batchbuffer. The ioctl `DRM_IOCTL_I915_GEM_CONTEXT_CREATE`
>> +is used by user space to create a hardware context which is identified
>> +by a 32-bit integer. The non-deprecated ioctls to submit batchbuffer
>> +work can pass that ID (in the lower bits of drm_i915_gem_execbuffer2::rsvd1)
>> +to identify what context to use with the command.
>> +
>> +The GPU has its own memory management and address space. The kernel
>> +driver maintains the memory translation table for the GPU. For older
>> +GPUs (i.e. those before Gen8), there is a single global such translation
>> +table, a global Graphics Translation Table (GTT). For newer generation
>> +GPUs each context has its own translation table, called Per-Process
>> +Graphics Translation Table (PPGTT). Of important note, is that although
>> +PPGTT is named per-process it is actually per context. When user space
>> +submits a batchbuffer, the kernel walks the list of GEM buffer objects
>> +used by the batchbuffer and guarantees that not only is the memory of
>> +each such GEM buffer object resident but it is also present in the
>> +(PP)GTT. If the GEM buffer object is not yet placed in the (PP)GTT,
>> +then it is given an address. Two consequences of this are: the kernel
>> +needs to edit the batchbuffer submitted to write the correct value of
>> +the GPU address when a GEM BO is assigned a GPU address and the kernel
>> +might evict a different GEM BO from the (PP)GTT to make address room
>> +for another GEM BO. Consequently, the ioctls submitting a batchbuffer
>> +for execution also include a list of all locations within buffers that
>> +refer to GPU-addresses so that the kernel can edit the buffer correctly.
>> +This process is dubbed relocation.
>> +
>> +GEM BO Management Implementation Details
>> +----------------------------------------
>> +
>> +.. kernel-doc:: drivers/gpu/drm/i915/i915_vma.h
>> +   :doc: Virtual Memory Address
>> +
>> +Buffer Object Eviction
>> +----------------------
>> +
>> +This section documents the interface functions for evicting buffer
>> +objects to make space available in the virtual gpu address spaces. Note
>> +that this is mostly orthogonal to shrinking buffer objects caches, which
>> +has the goal to make main memory (shared with the gpu through the
>> +unified memory architecture) available.
>> +
>> +.. kernel-doc:: drivers/gpu/drm/i915/i915_gem_evict.c
>> +   :internal:
>> +
>> +Buffer Object Memory Shrinking
>> +------------------------------
>> +
>> +This section documents the interface function for shrinking memory usage
>> +of buffer object caches. Shrinking is used to make main memory
>> +available. Note that this is mostly orthogonal to evicting buffer
>> +objects, which has the goal to make space in gpu virtual address spaces.
>> +
>> +.. kernel-doc:: drivers/gpu/drm/i915/i915_gem_shrinker.c
>> +   :internal:
>> +
>>  Batchbuffer Parsing
>>  -------------------
>>  
>> @@ -312,29 +405,6 @@ Object Tiling IOCTLs
>>  .. kernel-doc:: drivers/gpu/drm/i915/i915_gem_tiling.c
>>     :doc: buffer object tiling
>>  
>> -Buffer Object Eviction
>> -----------------------
>> -
>> -This section documents the interface functions for evicting buffer
>> -objects to make space available in the virtual gpu address spaces. Note
>> -that this is mostly orthogonal to shrinking buffer objects caches, which
>> -has the goal to make main memory (shared with the gpu through the
>> -unified memory architecture) available.
>> -
>> -.. kernel-doc:: drivers/gpu/drm/i915/i915_gem_evict.c
>> -   :internal:
>> -
>> -Buffer Object Memory Shrinking
>> -------------------------------
>> -
>> -This section documents the interface function for shrinking memory usage
>> -of buffer object caches. Shrinking is used to make main memory
>> -available. Note that this is mostly orthogonal to evicting buffer
>> -objects, which has the goal to make space in gpu virtual address spaces.
>> -
>> -.. kernel-doc:: drivers/gpu/drm/i915/i915_gem_shrinker.c
>> -   :internal:
>> -
>>  GuC
>>  ===
>>  
>> diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
>> index 8c5022095418..0000f23a7266 100644
>> --- a/drivers/gpu/drm/i915/i915_vma.h
>> +++ b/drivers/gpu/drm/i915/i915_vma.h
>> @@ -38,9 +38,13 @@
>>  enum i915_cache_level;
>>  
>>  /**
>> - * A VMA represents a GEM BO that is bound into an address space. Therefore, a
>> - * VMA's presence cannot be guaranteed before binding, or after unbinding the
>> - * object into/from the address space.
>> + * DOC: Virtual Memory Address
>> + *
>> + * An `i915_vma` struct represents a GEM BO that is bound into an address
>> + * space. Therefore, a VMA's presence cannot be guaranteed before binding, or
>> + * after unbinding the object into/from the address space. The struct includes
>> + * the bookkepping details needed for tracking it in all the lists with which
>> + * it interacts.
>>   *
>>   * To make things as simple as possible (ie. no refcounting), a VMA's lifetime
>>   * will always be <= an objects lifetime. So object refcounting should cover us.
>> -- 
>> 2.16.2
>> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Jani Nikula, Intel Open Source Technology Center
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v4 1/5] i915.rst: Narration overview on GEM + minor reorder to improve narration
  2018-04-03 10:52 ` [PATCH v4 1/5] i915.rst: Narration overview on GEM + minor reorder to improve narration kevin.rogovin
  2018-04-03 12:07   ` Joonas Lahtinen
@ 2018-04-03 13:55   ` Mika Kuoppala
  2018-04-04  7:56   ` Mika Kuoppala
  2018-04-04 22:22   ` Belgaumkar, Vinay
  3 siblings, 0 replies; 20+ messages in thread
From: Mika Kuoppala @ 2018-04-03 13:55 UTC (permalink / raw)
  To: intel-gfx, tvrtko.ursulin, abdiel.janulgue, joonas.lahtinen, chris
  Cc: Kevin Rogovin

kevin.rogovin@intel.com writes:

> From: Kevin Rogovin <kevin.rogovin@intel.com>
>
> Add a narration to i915.rst about Intel GEN GPU's: engines,
> driver context and relocation.
>
> Signed-off-by: Kevin Rogovin <kevin.rogovin@intel.com>
> ---
>  Documentation/gpu/i915.rst      | 116 ++++++++++++++++++++++++++++++++--------
>  drivers/gpu/drm/i915/i915_vma.h |  10 ++--
>  2 files changed, 100 insertions(+), 26 deletions(-)
>
> diff --git a/Documentation/gpu/i915.rst b/Documentation/gpu/i915.rst
> index 41dc881b00dc..00f897f67f85 100644
> --- a/Documentation/gpu/i915.rst
> +++ b/Documentation/gpu/i915.rst
> @@ -249,6 +249,99 @@ Memory Management and Command Submission
>  This sections covers all things related to the GEM implementation in the
>  i915 driver.
>  
> +Intel GPU Basics
> +----------------
> +
> +An Intel GPU has multiple engines. There are several engine types.
> +
> +- RCS engine is for rendering 3D and performing compute, this is named `I915_EXEC_RENDER` in user space.
> +- BCS is a blitting (copy) engine, this is named `I915_EXEC_BLT` in user space.
> +- VCS is a video encode and decode engine, this is named `I915_EXEC_BSD` in user space
> +- VECS is video enhancement engine, this is named `I915_EXEC_VEBOX` in user space.
> +- The enumeration `I915_EXEC_DEFAULT` does not refer to specific engine; instead it is to be used by user space to specify a default rendering engine (for 3D) that may or may not be the same as RCS.
> +
> +The Intel GPU family is a family of integrated GPU's using Unified
> +Memory Access. For having the GPU "do work", user space will feed the
> +GPU batch buffers via one of the ioctls `DRM_IOCTL_I915_GEM_EXECBUFFER2`
> +or `DRM_IOCTL_I915_GEM_EXECBUFFER2_WR`. Most such batchbuffers will
> +instruct the GPU to perform work (for example rendering) and that work
> +needs memory from which to read and memory to which to write. All memory
> +is encapsulated within GEM buffer objects (usually created with the ioctl
> +`DRM_IOCTL_I915_GEM_CREATE`). An ioctl providing a batchbuffer for the GPU
> +to create will also list all GEM buffer objects that the batchbuffer reads
> +and/or writes. For implementation details of memory management see
> +`GEM BO Management Implementation Details`_.
> +
> +The i915 driver allows user space to create a context via the ioctl
> +`DRM_IOCTL_I915_GEM_CONTEXT_CREATE` which is identified by a 32-bit
> +integer. Such a context should be veiwed by user-space as -loosely-

s/veiwed/viewed

> +analogous to the idea of a CPU process of an operating system. The i915
> +driver guarantees that commands issued to a fixed context are to be
> +executed so that writes of a previously issued command are seen by
> +reads of following commands. Actions issued between different contexts
> +(even if from the same file descriptor) are NOT given that guarantee
> +and the only way to synchornize across contexts (even from the same
> +file descriptor) is through the use of fences. At least as far back as
> +Gen4, also have that a context carries with it a GPU HW context;
> +the HW context is essentially (most of atleast) the state of a GPU.
> +In addition to the ordering gaurantees, the kernel will restore GPU

s/gaurantees/guarantees

-Mika

> +state via HW context when commands are issued to a context, this saves
> +user space the need to restore (most of atleast) the GPU state at the
> +start of each batchbuffer. The ioctl `DRM_IOCTL_I915_GEM_CONTEXT_CREATE`
> +is used by user space to create a hardware context which is identified
> +by a 32-bit integer. The non-deprecated ioctls to submit batchbuffer
> +work can pass that ID (in the lower bits of drm_i915_gem_execbuffer2::rsvd1)
> +to identify what context to use with the command.
> +
> +The GPU has its own memory management and address space. The kernel
> +driver maintains the memory translation table for the GPU. For older
> +GPUs (i.e. those before Gen8), there is a single global such translation
> +table, a global Graphics Translation Table (GTT). For newer generation
> +GPUs each context has its own translation table, called Per-Process
> +Graphics Translation Table (PPGTT). Of important note, is that although
> +PPGTT is named per-process it is actually per context. When user space
> +submits a batchbuffer, the kernel walks the list of GEM buffer objects
> +used by the batchbuffer and guarantees that not only is the memory of
> +each such GEM buffer object resident but it is also present in the
> +(PP)GTT. If the GEM buffer object is not yet placed in the (PP)GTT,
> +then it is given an address. Two consequences of this are: the kernel
> +needs to edit the batchbuffer submitted to write the correct value of
> +the GPU address when a GEM BO is assigned a GPU address and the kernel
> +might evict a different GEM BO from the (PP)GTT to make address room
> +for another GEM BO. Consequently, the ioctls submitting a batchbuffer
> +for execution also include a list of all locations within buffers that
> +refer to GPU-addresses so that the kernel can edit the buffer correctly.
> +This process is dubbed relocation.
> +
> +GEM BO Management Implementation Details
> +----------------------------------------
> +
> +.. kernel-doc:: drivers/gpu/drm/i915/i915_vma.h
> +   :doc: Virtual Memory Address
> +
> +Buffer Object Eviction
> +----------------------
> +
> +This section documents the interface functions for evicting buffer
> +objects to make space available in the virtual gpu address spaces. Note
> +that this is mostly orthogonal to shrinking buffer objects caches, which
> +has the goal to make main memory (shared with the gpu through the
> +unified memory architecture) available.
> +
> +.. kernel-doc:: drivers/gpu/drm/i915/i915_gem_evict.c
> +   :internal:
> +
> +Buffer Object Memory Shrinking
> +------------------------------
> +
> +This section documents the interface function for shrinking memory usage
> +of buffer object caches. Shrinking is used to make main memory
> +available. Note that this is mostly orthogonal to evicting buffer
> +objects, which has the goal to make space in gpu virtual address spaces.
> +
> +.. kernel-doc:: drivers/gpu/drm/i915/i915_gem_shrinker.c
> +   :internal:
> +
>  Batchbuffer Parsing
>  -------------------
>  
> @@ -312,29 +405,6 @@ Object Tiling IOCTLs
>  .. kernel-doc:: drivers/gpu/drm/i915/i915_gem_tiling.c
>     :doc: buffer object tiling
>  
> -Buffer Object Eviction
> -----------------------
> -
> -This section documents the interface functions for evicting buffer
> -objects to make space available in the virtual gpu address spaces. Note
> -that this is mostly orthogonal to shrinking buffer objects caches, which
> -has the goal to make main memory (shared with the gpu through the
> -unified memory architecture) available.
> -
> -.. kernel-doc:: drivers/gpu/drm/i915/i915_gem_evict.c
> -   :internal:
> -
> -Buffer Object Memory Shrinking
> -------------------------------
> -
> -This section documents the interface function for shrinking memory usage
> -of buffer object caches. Shrinking is used to make main memory
> -available. Note that this is mostly orthogonal to evicting buffer
> -objects, which has the goal to make space in gpu virtual address spaces.
> -
> -.. kernel-doc:: drivers/gpu/drm/i915/i915_gem_shrinker.c
> -   :internal:
> -
>  GuC
>  ===
>  
> diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
> index 8c5022095418..0000f23a7266 100644
> --- a/drivers/gpu/drm/i915/i915_vma.h
> +++ b/drivers/gpu/drm/i915/i915_vma.h
> @@ -38,9 +38,13 @@
>  enum i915_cache_level;
>  
>  /**
> - * A VMA represents a GEM BO that is bound into an address space. Therefore, a
> - * VMA's presence cannot be guaranteed before binding, or after unbinding the
> - * object into/from the address space.
> + * DOC: Virtual Memory Address
> + *
> + * An `i915_vma` struct represents a GEM BO that is bound into an address
> + * space. Therefore, a VMA's presence cannot be guaranteed before binding, or
> + * after unbinding the object into/from the address space. The struct includes
> + * the bookkepping details needed for tracking it in all the lists with which
> + * it interacts.
>   *
>   * To make things as simple as possible (ie. no refcounting), a VMA's lifetime
>   * will always be <= an objects lifetime. So object refcounting should cover us.
> -- 
> 2.16.2
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v4 5/5] i915: add documentation to intel_engine_cs
  2018-04-03 13:10   ` Joonas Lahtinen
@ 2018-04-03 14:34     ` Rogovin, Kevin
  2018-04-04  9:31       ` Joonas Lahtinen
  0 siblings, 1 reply; 20+ messages in thread
From: Rogovin, Kevin @ 2018-04-03 14:34 UTC (permalink / raw)
  To: Joonas Lahtinen, abdiel.janulgue, chris, intel-gfx, tvrtko.ursulin

HI,


-----Original Message-----
From: Joonas Lahtinen [mailto:joonas.lahtinen@linux.intel.com] 

 - snip -
>>  
>>         void            (*set_default_submission)(struct intel_engine_cs *engine);
>>  
>> +       /* In addition to pinning the context, returns the intel_ringbuffer
>> +        * to which to write commands.

>	/* Pin context and return intel_ring to write commands to. */

I like that since it is shorter :)

>> +
>> +       /* Request room on the ringbuffer of a request in order to write
>> +        * commands for a request; In addition, if necessary, add commands
>> +        * to the buffer so that the i915_gem_context of the request
>> +        * is the one active for the commands.
>> +        */

> "Reserve room from the ringbuffer for commands and emit necessary context switching commands."?

Agreed; reserved is word to use here.

>> +       /* Add a batchbuffer start command; the GPU command is added to
>> +        * the buffer holding the commands of the request (i.e. calling
>> +        * intel_ring_begin() on i915_request::ring).
>> +        */
>>         int             (*emit_bb_start)(struct i915_request *rq,
>>                                          u64 offset, u32 length,
>>                                         unsigned int dispatch_flags);  
>> #define I915_DISPATCH_SECURE BIT(0)  #define I915_DISPATCH_PINNED 
>> BIT(1)
>>  #define I915_DISPATCH_RS     BIT(2)
>> +       /* Add a memory write command that writes the global sequence number
>> +        * (i915_request::global_seqno) and also add an interrupt command;
>> +        * the GPU command is added to the buffer holding the commands of
>> +        * the request (i.e. calling intel_ring_begin() on
>> +        * i915_request::ring).

>This is more about what a breadcrumb is than what this interface is about. "Add commands for triggering a breadcrumb 
> to be picked up" and maybe explain elsewhere what a breadcrumb is.

> So overall, try to make the comments bit less verbose and leave the implementation detail to the implementation functions :)

I am somewhat tempted to just drop this patch or add more documentation. The function pointers are used in the code common
to the legacy way and LRC way of submitting batchbuffers to the GPU, so they should have somekind of contract to what they are
supposed to do... but spelling out that contract might be a bit much...

Opinions?

 -Kevin
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v4 1/5] i915.rst: Narration overview on GEM + minor reorder to improve narration
  2018-04-03 10:52 ` [PATCH v4 1/5] i915.rst: Narration overview on GEM + minor reorder to improve narration kevin.rogovin
  2018-04-03 12:07   ` Joonas Lahtinen
  2018-04-03 13:55   ` Mika Kuoppala
@ 2018-04-04  7:56   ` Mika Kuoppala
  2018-04-04 22:22   ` Belgaumkar, Vinay
  3 siblings, 0 replies; 20+ messages in thread
From: Mika Kuoppala @ 2018-04-04  7:56 UTC (permalink / raw)
  To: intel-gfx, tvrtko.ursulin, abdiel.janulgue, joonas.lahtinen, chris
  Cc: Kevin Rogovin

kevin.rogovin@intel.com writes:

> From: Kevin Rogovin <kevin.rogovin@intel.com>
>
> Add a narration to i915.rst about Intel GEN GPU's: engines,
> driver context and relocation.
>
> Signed-off-by: Kevin Rogovin <kevin.rogovin@intel.com>

Few typos pointed on a previous mail with those fixed,
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>

> ---
>  Documentation/gpu/i915.rst      | 116 ++++++++++++++++++++++++++++++++--------
>  drivers/gpu/drm/i915/i915_vma.h |  10 ++--
>  2 files changed, 100 insertions(+), 26 deletions(-)
>
> diff --git a/Documentation/gpu/i915.rst b/Documentation/gpu/i915.rst
> index 41dc881b00dc..00f897f67f85 100644
> --- a/Documentation/gpu/i915.rst
> +++ b/Documentation/gpu/i915.rst
> @@ -249,6 +249,99 @@ Memory Management and Command Submission
>  This sections covers all things related to the GEM implementation in the
>  i915 driver.
>  
> +Intel GPU Basics
> +----------------
> +
> +An Intel GPU has multiple engines. There are several engine types.
> +
> +- RCS engine is for rendering 3D and performing compute, this is named `I915_EXEC_RENDER` in user space.
> +- BCS is a blitting (copy) engine, this is named `I915_EXEC_BLT` in user space.
> +- VCS is a video encode and decode engine, this is named `I915_EXEC_BSD` in user space
> +- VECS is video enhancement engine, this is named `I915_EXEC_VEBOX` in user space.
> +- The enumeration `I915_EXEC_DEFAULT` does not refer to specific engine; instead it is to be used by user space to specify a default rendering engine (for 3D) that may or may not be the same as RCS.
> +
> +The Intel GPU family is a family of integrated GPU's using Unified
> +Memory Access. For having the GPU "do work", user space will feed the
> +GPU batch buffers via one of the ioctls `DRM_IOCTL_I915_GEM_EXECBUFFER2`
> +or `DRM_IOCTL_I915_GEM_EXECBUFFER2_WR`. Most such batchbuffers will
> +instruct the GPU to perform work (for example rendering) and that work
> +needs memory from which to read and memory to which to write. All memory
> +is encapsulated within GEM buffer objects (usually created with the ioctl
> +`DRM_IOCTL_I915_GEM_CREATE`). An ioctl providing a batchbuffer for the GPU
> +to create will also list all GEM buffer objects that the batchbuffer reads
> +and/or writes. For implementation details of memory management see
> +`GEM BO Management Implementation Details`_.
> +
> +The i915 driver allows user space to create a context via the ioctl
> +`DRM_IOCTL_I915_GEM_CONTEXT_CREATE` which is identified by a 32-bit
> +integer. Such a context should be veiwed by user-space as -loosely-
> +analogous to the idea of a CPU process of an operating system. The i915
> +driver guarantees that commands issued to a fixed context are to be
> +executed so that writes of a previously issued command are seen by
> +reads of following commands. Actions issued between different contexts
> +(even if from the same file descriptor) are NOT given that guarantee
> +and the only way to synchornize across contexts (even from the same
> +file descriptor) is through the use of fences. At least as far back as
> +Gen4, also have that a context carries with it a GPU HW context;
> +the HW context is essentially (most of atleast) the state of a GPU.
> +In addition to the ordering gaurantees, the kernel will restore GPU
> +state via HW context when commands are issued to a context, this saves
> +user space the need to restore (most of atleast) the GPU state at the
> +start of each batchbuffer. The ioctl `DRM_IOCTL_I915_GEM_CONTEXT_CREATE`
> +is used by user space to create a hardware context which is identified
> +by a 32-bit integer. The non-deprecated ioctls to submit batchbuffer
> +work can pass that ID (in the lower bits of drm_i915_gem_execbuffer2::rsvd1)
> +to identify what context to use with the command.
> +
> +The GPU has its own memory management and address space. The kernel
> +driver maintains the memory translation table for the GPU. For older
> +GPUs (i.e. those before Gen8), there is a single global such translation
> +table, a global Graphics Translation Table (GTT). For newer generation
> +GPUs each context has its own translation table, called Per-Process
> +Graphics Translation Table (PPGTT). Of important note, is that although
> +PPGTT is named per-process it is actually per context. When user space
> +submits a batchbuffer, the kernel walks the list of GEM buffer objects
> +used by the batchbuffer and guarantees that not only is the memory of
> +each such GEM buffer object resident but it is also present in the
> +(PP)GTT. If the GEM buffer object is not yet placed in the (PP)GTT,
> +then it is given an address. Two consequences of this are: the kernel
> +needs to edit the batchbuffer submitted to write the correct value of
> +the GPU address when a GEM BO is assigned a GPU address and the kernel
> +might evict a different GEM BO from the (PP)GTT to make address room
> +for another GEM BO. Consequently, the ioctls submitting a batchbuffer
> +for execution also include a list of all locations within buffers that
> +refer to GPU-addresses so that the kernel can edit the buffer correctly.
> +This process is dubbed relocation.
> +
> +GEM BO Management Implementation Details
> +----------------------------------------
> +
> +.. kernel-doc:: drivers/gpu/drm/i915/i915_vma.h
> +   :doc: Virtual Memory Address
> +
> +Buffer Object Eviction
> +----------------------
> +
> +This section documents the interface functions for evicting buffer
> +objects to make space available in the virtual gpu address spaces. Note
> +that this is mostly orthogonal to shrinking buffer objects caches, which
> +has the goal to make main memory (shared with the gpu through the
> +unified memory architecture) available.
> +
> +.. kernel-doc:: drivers/gpu/drm/i915/i915_gem_evict.c
> +   :internal:
> +
> +Buffer Object Memory Shrinking
> +------------------------------
> +
> +This section documents the interface function for shrinking memory usage
> +of buffer object caches. Shrinking is used to make main memory
> +available. Note that this is mostly orthogonal to evicting buffer
> +objects, which has the goal to make space in gpu virtual address spaces.
> +
> +.. kernel-doc:: drivers/gpu/drm/i915/i915_gem_shrinker.c
> +   :internal:
> +
>  Batchbuffer Parsing
>  -------------------
>  
> @@ -312,29 +405,6 @@ Object Tiling IOCTLs
>  .. kernel-doc:: drivers/gpu/drm/i915/i915_gem_tiling.c
>     :doc: buffer object tiling
>  
> -Buffer Object Eviction
> -----------------------
> -
> -This section documents the interface functions for evicting buffer
> -objects to make space available in the virtual gpu address spaces. Note
> -that this is mostly orthogonal to shrinking buffer objects caches, which
> -has the goal to make main memory (shared with the gpu through the
> -unified memory architecture) available.
> -
> -.. kernel-doc:: drivers/gpu/drm/i915/i915_gem_evict.c
> -   :internal:
> -
> -Buffer Object Memory Shrinking
> -------------------------------
> -
> -This section documents the interface function for shrinking memory usage
> -of buffer object caches. Shrinking is used to make main memory
> -available. Note that this is mostly orthogonal to evicting buffer
> -objects, which has the goal to make space in gpu virtual address spaces.
> -
> -.. kernel-doc:: drivers/gpu/drm/i915/i915_gem_shrinker.c
> -   :internal:
> -
>  GuC
>  ===
>  
> diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
> index 8c5022095418..0000f23a7266 100644
> --- a/drivers/gpu/drm/i915/i915_vma.h
> +++ b/drivers/gpu/drm/i915/i915_vma.h
> @@ -38,9 +38,13 @@
>  enum i915_cache_level;
>  
>  /**
> - * A VMA represents a GEM BO that is bound into an address space. Therefore, a
> - * VMA's presence cannot be guaranteed before binding, or after unbinding the
> - * object into/from the address space.
> + * DOC: Virtual Memory Address
> + *
> + * An `i915_vma` struct represents a GEM BO that is bound into an address
> + * space. Therefore, a VMA's presence cannot be guaranteed before binding, or
> + * after unbinding the object into/from the address space. The struct includes
> + * the bookkepping details needed for tracking it in all the lists with which
> + * it interacts.
>   *
>   * To make things as simple as possible (ie. no refcounting), a VMA's lifetime
>   * will always be <= an objects lifetime. So object refcounting should cover us.
> -- 
> 2.16.2
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v4 5/5] i915: add documentation to intel_engine_cs
  2018-04-03 14:34     ` Rogovin, Kevin
@ 2018-04-04  9:31       ` Joonas Lahtinen
  2018-04-04  9:48         ` Jani Nikula
  0 siblings, 1 reply; 20+ messages in thread
From: Joonas Lahtinen @ 2018-04-04  9:31 UTC (permalink / raw)
  To: Rogovin, Kevin, abdiel.janulgue, chris, intel-gfx,
	tvrtko.ursulin, Jani Nikula

+ Jani for Sphinx

Quoting Rogovin, Kevin (2018-04-03 17:34:49)
> I am somewhat tempted to just drop this patch or add more documentation. The function pointers are used in the code common
> to the legacy way and LRC way of submitting batchbuffers to the GPU, so they should have somekind of contract to what they are
> supposed to do... but spelling out that contract might be a bit much...
> 
> Opinions?

No big feelings to either direction, you could add a documentation block
for the flow nearby.

If the struct members are referred to from documentation blocks, how far
are we from generating warnings if a patch renames something that
becomes non-existent in .rst or documentation block? (this one for Jani)

Regards, Joonas

> 
>  -Kevin
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v4 5/5] i915: add documentation to intel_engine_cs
  2018-04-04  9:31       ` Joonas Lahtinen
@ 2018-04-04  9:48         ` Jani Nikula
  2018-04-04 10:27           ` Joonas Lahtinen
  0 siblings, 1 reply; 20+ messages in thread
From: Jani Nikula @ 2018-04-04  9:48 UTC (permalink / raw)
  To: Joonas Lahtinen, Rogovin, Kevin, abdiel.janulgue, chris,
	intel-gfx, tvrtko.ursulin

On Wed, 04 Apr 2018, Joonas Lahtinen <joonas.lahtinen@linux.intel.com> wrote:
> + Jani for Sphinx
>
> Quoting Rogovin, Kevin (2018-04-03 17:34:49)
>> I am somewhat tempted to just drop this patch or add more documentation. The function pointers are used in the code common
>> to the legacy way and LRC way of submitting batchbuffers to the GPU, so they should have somekind of contract to what they are
>> supposed to do... but spelling out that contract might be a bit much...
>> 
>> Opinions?
>
> No big feelings to either direction, you could add a documentation block
> for the flow nearby.
>
> If the struct members are referred to from documentation blocks, how far
> are we from generating warnings if a patch renames something that
> becomes non-existent in .rst or documentation block? (this one for Jani)

So first of all, the comments here are not kernel-doc comments, just
regular comments. It's just free text.

If you want them to be kernel-doc comments, included to some fancy
generated documentation, you'll have to follow the guide at [1], wrap
them in /** and */ and add the @member: tag at the start.

Specifically, struct::member is not a thing. If you want to reference
documented struct members in kernel-doc comments, you'll need to use
&struct_name->member or &struct_name.member.

BR,
Jani.


-- 
Jani Nikula, Intel Open Source Technology Center
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v4 5/5] i915: add documentation to intel_engine_cs
  2018-04-04  9:48         ` Jani Nikula
@ 2018-04-04 10:27           ` Joonas Lahtinen
  2018-04-04 11:59             ` Jani Nikula
  0 siblings, 1 reply; 20+ messages in thread
From: Joonas Lahtinen @ 2018-04-04 10:27 UTC (permalink / raw)
  To: Rogovin, Kevin, abdiel.janulgue, chris, intel-gfx,
	tvrtko.ursulin, Jani Nikula

Quoting Jani Nikula (2018-04-04 12:48:55)
> On Wed, 04 Apr 2018, Joonas Lahtinen <joonas.lahtinen@linux.intel.com> wrote:
> > + Jani for Sphinx
> >
> > Quoting Rogovin, Kevin (2018-04-03 17:34:49)
> >> I am somewhat tempted to just drop this patch or add more documentation. The function pointers are used in the code common
> >> to the legacy way and LRC way of submitting batchbuffers to the GPU, so they should have somekind of contract to what they are
> >> supposed to do... but spelling out that contract might be a bit much...
> >> 
> >> Opinions?
> >
> > No big feelings to either direction, you could add a documentation block
> > for the flow nearby.
> >
> > If the struct members are referred to from documentation blocks, how far
> > are we from generating warnings if a patch renames something that
> > becomes non-existent in .rst or documentation block? (this one for Jani)
> 
> So first of all, the comments here are not kernel-doc comments, just
> regular comments. It's just free text.
> 
> If you want them to be kernel-doc comments, included to some fancy
> generated documentation, you'll have to follow the guide at [1], wrap
> them in /** and */ and add the @member: tag at the start.
> 
> Specifically, struct::member is not a thing. If you want to reference
> documented struct members in kernel-doc comments, you'll need to use
> &struct_name->member or &struct_name.member.

That was known, but thanks for reminding. What I was weighing was what's the
likelihood of noticing if some struct members get renamed in a patch and
the documentation breaks.

To be perfectly clear: Are we working towards a clean make htmldocs and
CI nagging when it gets broken?

Regards, Joonas

> 
> BR,
> Jani.
> 
> 
> -- 
> Jani Nikula, Intel Open Source Technology Center
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v4 5/5] i915: add documentation to intel_engine_cs
  2018-04-04 10:27           ` Joonas Lahtinen
@ 2018-04-04 11:59             ` Jani Nikula
  0 siblings, 0 replies; 20+ messages in thread
From: Jani Nikula @ 2018-04-04 11:59 UTC (permalink / raw)
  To: Joonas Lahtinen, Rogovin, Kevin, abdiel.janulgue, chris,
	intel-gfx, tvrtko.ursulin

On Wed, 04 Apr 2018, Joonas Lahtinen <joonas.lahtinen@linux.intel.com> wrote:
> To be perfectly clear: Are we working towards a clean make htmldocs and
> CI nagging when it gets broken?

I'd like that to be the goal, yes. I'm not sure if anyone's working
towards that.

BR,
Jani.


-- 
Jani Nikula, Intel Open Source Technology Center
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v4 1/5] i915.rst: Narration overview on GEM + minor reorder to improve narration
  2018-04-03 10:52 ` [PATCH v4 1/5] i915.rst: Narration overview on GEM + minor reorder to improve narration kevin.rogovin
                     ` (2 preceding siblings ...)
  2018-04-04  7:56   ` Mika Kuoppala
@ 2018-04-04 22:22   ` Belgaumkar, Vinay
  3 siblings, 0 replies; 20+ messages in thread
From: Belgaumkar, Vinay @ 2018-04-04 22:22 UTC (permalink / raw)
  To: kevin.rogovin, intel-gfx, tvrtko.ursulin, abdiel.janulgue,
	joonas.lahtinen, chris



On 4/3/2018 3:52 AM, kevin.rogovin@intel.com wrote:
> From: Kevin Rogovin <kevin.rogovin@intel.com>
> 
> Add a narration to i915.rst about Intel GEN GPU's: engines,
> driver context and relocation.
> 
> Signed-off-by: Kevin Rogovin <kevin.rogovin@intel.com>
> ---
>   Documentation/gpu/i915.rst      | 116 ++++++++++++++++++++++++++++++++--------
>   drivers/gpu/drm/i915/i915_vma.h |  10 ++--
>   2 files changed, 100 insertions(+), 26 deletions(-)
> 
> diff --git a/Documentation/gpu/i915.rst b/Documentation/gpu/i915.rst
> index 41dc881b00dc..00f897f67f85 100644
> --- a/Documentation/gpu/i915.rst
> +++ b/Documentation/gpu/i915.rst
> @@ -249,6 +249,99 @@ Memory Management and Command Submission
>   This sections covers all things related to the GEM implementation in the
>   i915 driver.
>   
> +Intel GPU Basics
> +----------------
> +
> +An Intel GPU has multiple engines. There are several engine types.
> +
> +- RCS engine is for rendering 3D and performing compute, this is named `I915_EXEC_RENDER` in user space.
> +- BCS is a blitting (copy) engine, this is named `I915_EXEC_BLT` in user space.
> +- VCS is a video encode and decode engine, this is named `I915_EXEC_BSD` in user space
> +- VECS is video enhancement engine, this is named `I915_EXEC_VEBOX` in user space.
> +- The enumeration `I915_EXEC_DEFAULT` does not refer to specific engine; instead it is to be used by user space to specify a default rendering engine (for 3D) that may or may not be the same as RCS.
> +
> +The Intel GPU family is a family of integrated GPU's using Unified
> +Memory Access. For having the GPU "do work", user space will feed the
> +GPU batch buffers via one of the ioctls `DRM_IOCTL_I915_GEM_EXECBUFFER2`
> +or `DRM_IOCTL_I915_GEM_EXECBUFFER2_WR`. Most such batchbuffers will
> +instruct the GPU to perform work (for example rendering) and that work
> +needs memory from which to read and memory to which to write. All memory
> +is encapsulated within GEM buffer objects (usually created with the ioctl
> +`DRM_IOCTL_I915_GEM_CREATE`). An ioctl providing a batchbuffer for the GPU
> +to create will also list all GEM buffer objects that the batchbuffer reads
> +and/or writes. For implementation details of memory management see
> +`GEM BO Management Implementation Details`_.
> +
> +The i915 driver allows user space to create a context via the ioctl
> +`DRM_IOCTL_I915_GEM_CONTEXT_CREATE` which is identified by a 32-bit

Context is defined here

> +integer. Such a context should be veiwed by user-space as -loosely-
> +analogous to the idea of a CPU process of an operating system. The i915
> +driver guarantees that commands issued to a fixed context are to be
> +executed so that writes of a previously issued command are seen by
> +reads of following commands. Actions issued between different contexts
> +(even if from the same file descriptor) are NOT given that guarantee
> +and the only way to synchornize across contexts (even from the same

s/synchornize/synchronize

> +file descriptor) is through the use of fences. At least as far back as
> +Gen4, also have that a context carries with it a GPU HW context;
> +the HW context is essentially (most of atleast) the state of a GPU.
> +In addition to the ordering gaurantees, the kernel will restore GPU
> +state via HW context when commands are issued to a context, this saves
> +user space the need to restore (most of atleast) the GPU state at the
> +start of each batchbuffer. The ioctl `DRM_IOCTL_I915_GEM_CONTEXT_CREATE`
> +is used by user space to create a hardware context which is identified

Duplicate of above definition of context?

> +by a 32-bit integer. The non-deprecated ioctls to submit batchbuffer
> +work can pass that ID (in the lower bits of drm_i915_gem_execbuffer2::rsvd1)
> +to identify what context to use with the command.
> +
> +The GPU has its own memory management and address space. The kernel
> +driver maintains the memory translation table for the GPU. For older
> +GPUs (i.e. those before Gen8), there is a single global such translation
> +table, a global Graphics Translation Table (GTT). For newer generation
> +GPUs each context has its own translation table, called Per-Process
> +Graphics Translation Table (PPGTT). Of important note, is that although
> +PPGTT is named per-process it is actually per context. When user space
> +submits a batchbuffer, the kernel walks the list of GEM buffer objects
> +used by the batchbuffer and guarantees that not only is the memory of
> +each such GEM buffer object resident but it is also present in the
> +(PP)GTT. If the GEM buffer object is not yet placed in the (PP)GTT,
> +then it is given an address. Two consequences of this are: the kernel
> +needs to edit the batchbuffer submitted to write the correct value of
> +the GPU address when a GEM BO is assigned a GPU address and the kernel
> +might evict a different GEM BO from the (PP)GTT to make address room
> +for another GEM BO. Consequently, the ioctls submitting a batchbuffer
> +for execution also include a list of all locations within buffers that
> +refer to GPU-addresses so that the kernel can edit the buffer correctly.
> +This process is dubbed relocation.
> +
> +GEM BO Management Implementation Details
> +----------------------------------------
> +
> +.. kernel-doc:: drivers/gpu/drm/i915/i915_vma.h
> +   :doc: Virtual Memory Address
> +
> +Buffer Object Eviction
> +----------------------
> +
> +This section documents the interface functions for evicting buffer
> +objects to make space available in the virtual gpu address spaces. Note
> +that this is mostly orthogonal to shrinking buffer objects caches, which
> +has the goal to make main memory (shared with the gpu through the
> +unified memory architecture) available.
> +
> +.. kernel-doc:: drivers/gpu/drm/i915/i915_gem_evict.c
> +   :internal:
> +
> +Buffer Object Memory Shrinking
> +------------------------------
> +
> +This section documents the interface function for shrinking memory usage
> +of buffer object caches. Shrinking is used to make main memory
> +available. Note that this is mostly orthogonal to evicting buffer
> +objects, which has the goal to make space in gpu virtual address spaces.
> +
> +.. kernel-doc:: drivers/gpu/drm/i915/i915_gem_shrinker.c
> +   :internal:
> +
>   Batchbuffer Parsing
>   -------------------
>   
> @@ -312,29 +405,6 @@ Object Tiling IOCTLs
>   .. kernel-doc:: drivers/gpu/drm/i915/i915_gem_tiling.c
>      :doc: buffer object tiling
>   
> -Buffer Object Eviction
> -----------------------
> -
> -This section documents the interface functions for evicting buffer
> -objects to make space available in the virtual gpu address spaces. Note
> -that this is mostly orthogonal to shrinking buffer objects caches, which
> -has the goal to make main memory (shared with the gpu through the
> -unified memory architecture) available.
> -
> -.. kernel-doc:: drivers/gpu/drm/i915/i915_gem_evict.c
> -   :internal:
> -
> -Buffer Object Memory Shrinking
> -------------------------------
> -
> -This section documents the interface function for shrinking memory usage
> -of buffer object caches. Shrinking is used to make main memory
> -available. Note that this is mostly orthogonal to evicting buffer
> -objects, which has the goal to make space in gpu virtual address spaces.
> -
> -.. kernel-doc:: drivers/gpu/drm/i915/i915_gem_shrinker.c
> -   :internal:
> -
>   GuC
>   ===
>   
> diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
> index 8c5022095418..0000f23a7266 100644
> --- a/drivers/gpu/drm/i915/i915_vma.h
> +++ b/drivers/gpu/drm/i915/i915_vma.h
> @@ -38,9 +38,13 @@
>   enum i915_cache_level;
>   
>   /**
> - * A VMA represents a GEM BO that is bound into an address space. Therefore, a
> - * VMA's presence cannot be guaranteed before binding, or after unbinding the
> - * object into/from the address space.
> + * DOC: Virtual Memory Address
> + *
> + * An `i915_vma` struct represents a GEM BO that is bound into an address
> + * space. Therefore, a VMA's presence cannot be guaranteed before binding, or
> + * after unbinding the object into/from the address space. The struct includes
> + * the bookkepping details needed for tracking it in all the lists with which
> + * it interacts.
>    *
>    * To make things as simple as possible (ie. no refcounting), a VMA's lifetime
>    * will always be <= an objects lifetime. So object refcounting should cover us.
> 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2018-04-04 22:22 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-04-03 10:52 [PATCH v4 0/5] Documentation patch for batchbuffer submission kevin.rogovin
2018-04-03 10:52 ` [PATCH v4 1/5] i915.rst: Narration overview on GEM + minor reorder to improve narration kevin.rogovin
2018-04-03 12:07   ` Joonas Lahtinen
2018-04-03 13:31     ` Jani Nikula
2018-04-03 13:55   ` Mika Kuoppala
2018-04-04  7:56   ` Mika Kuoppala
2018-04-04 22:22   ` Belgaumkar, Vinay
2018-04-03 10:52 ` [PATCH v4 2/5] i915: add a text about what happens at bottom of stack in processing a batchbuffer kevin.rogovin
2018-04-03 12:15   ` Joonas Lahtinen
2018-04-03 10:52 ` [PATCH v4 3/5] i915.rst: add link to documentation in i915_gem_execbuffer.c kevin.rogovin
2018-04-03 12:16   ` Joonas Lahtinen
2018-04-03 10:52 ` [PATCH v4 4/5] i915: correct lazy ringbuffer and backing store documentation kevin.rogovin
2018-04-03 10:52 ` [PATCH v4 5/5] i915: add documentation to intel_engine_cs kevin.rogovin
2018-04-03 13:10   ` Joonas Lahtinen
2018-04-03 14:34     ` Rogovin, Kevin
2018-04-04  9:31       ` Joonas Lahtinen
2018-04-04  9:48         ` Jani Nikula
2018-04-04 10:27           ` Joonas Lahtinen
2018-04-04 11:59             ` Jani Nikula
2018-04-03 12:19 ` ✗ Fi.CI.BAT: failure for Documentation patch for batchbuffer submission (rev4) Patchwork

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.