From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from mga05.intel.com (mga05.intel.com [192.55.52.43])
	by gabe.freedesktop.org (Postfix) with ESMTPS id E5BA510E337
	for igt-dev@lists.freedesktop.org; Tue, 11 Jul 2023 09:07:00 +0000 (UTC)
Date: Tue, 11 Jul 2023 11:06:32 +0200
From: Zbigniew Kempczyński
To: Karolina Stolarek
Message-ID: <20230711090632.yrmouv5ajsjhdjrs@zkempczy-mobl2>
References: <20230706060555.282757-1-zbigniew.kempczynski@intel.com>
 <20230706060555.282757-12-zbigniew.kempczynski@intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
Subject: Re: [igt-dev] [PATCH i-g-t v2 11/16] lib/intel_ctx: Add xe context information
Cc: igt-dev@lists.freedesktop.org
Errors-To: igt-dev-bounces@lists.freedesktop.org
Sender: "igt-dev"

On Fri, Jul 07, 2023 at 10:31:19AM +0200, Karolina Stolarek wrote:
> On 6.07.2023 08:05, Zbigniew Kempczyński wrote:
> > The most complicated part of adapting i915_blt to intel_blt - which
> > should handle both drivers - is how to achieve pipelined execution.
> > By pipelined execution I mean that all gpu workloads are executed
> > without stalls.
> >
> > Compared to i915 relocations and softpinning, the xe architecture
> > moves the binding (and also unbinding) responsibility from the
> > kernel to the user via the vm_bind ioctl(). To avoid stalls the
> > user has to provide in/out fences (syncobjs) between consecutive
> > bindings/execs. Of course for many igt tests we don't need pipelined
> > execution, just a synchronous bind followed by an exec. But
> > exercising the driver should also cover pipelining, to verify it is
> > possible to work without stalls.
> >
> > I decided to extend intel_ctx_t with all the objects necessary for
> > xe (vm, engine, syncobjs) to get flexibility in deciding how to
> > bind, execute and wait for (synchronize) those operations. The
> > context object, along with the i915 engine, is already passed to the
> > blitter library, so adding the fields required by xe doesn't break
> > i915 but allows the xe path to get all the data necessary to
> > execute.
> >
> > Using intel_ctx with xe requires some code patterns dictated by the
> > use of an allocator. For xe the allocator now tracks alloc()/free()
> > operations so it can do bind/unbind in a single call just before
> > execution. I've added two helpers in intel_ctx -
> > intel_ctx_xe_exec() and intel_ctx_xe_sync(). Depending on how
> > intel_ctx was created (with 0 or with real syncobj handles as
> > in/bind/out fences), bind and exec in intel_ctx_xe_exec() are
> > pipelined, but the last operation (exec) is synchronized. With real
> > syncobjs they are used to join the bind + exec calls, but there is
> > no wait for exec (sync-out) completion. This allows building longer
> > cascades of bind + exec operations without stalls.
> >
> > To wait for a sync-out fence the caller may use intel_ctx_xe_sync(),
> > which is a synchronous wait on the syncobj. It also allows the user
> > to reset the fences to prepare for the next operation.
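
To make the intended flow concrete, usage is meant to look roughly like
the sketch below (the helper names are from this patch; the fd, vm,
engine and ahnd setup and the batch at bb_offset are assumed to already
exist in the test):

	/* Real fences join bind with exec; sync_in left as 0 here. */
	uint32_t sync_bind = syncobj_create(fd, 0);
	uint32_t sync_out = syncobj_create(fd, 0);
	intel_ctx_t *ctx = intel_ctx_xe(fd, vm, engine,
					0, sync_bind, sync_out);

	/* Pipelined: bind signals sync_bind, exec waits on it and
	 * signals sync_out; nothing stalls on sync_out yet. */
	intel_ctx_xe_exec(ctx, ahnd, bb_offset);

	/* Synchronous wait on the sync-out fence, then reset the
	 * fences so the next bind + exec round can reuse them. */
	intel_ctx_xe_sync(ctx, true);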
> >
> > Signed-off-by: Zbigniew Kempczyński
> > ---
> >  lib/intel_ctx.c | 110 +++++++++++++++++++++++++++++++++++++++++++++++-
> >  lib/intel_ctx.h |  14 ++++++
> >  2 files changed, 123 insertions(+), 1 deletion(-)
> >
> > diff --git a/lib/intel_ctx.c b/lib/intel_ctx.c
> > index ded9c0f1e4..f210907fac 100644
> > --- a/lib/intel_ctx.c
> > +++ b/lib/intel_ctx.c
> > @@ -5,9 +5,12 @@
> >  #include
> > +#include "i915/gem_engine_topology.h"
> > +#include "igt_syncobj.h"
> > +#include "intel_allocator.h"
> >  #include "intel_ctx.h"
> >  #include "ioctl_wrappers.h"
> > -#include "i915/gem_engine_topology.h"
> > +#include "xe/xe_ioctl.h"
> >
> >  /**
> >   * SECTION:intel_ctx
> > @@ -390,3 +393,108 @@ unsigned int intel_ctx_engine_class(const intel_ctx_t *ctx, unsigned int engine)
> >  {
> >  	return intel_ctx_cfg_engine_class(&ctx->cfg, engine);
> >  }
> > +
> > +/**
> > + * intel_ctx_xe:
> > + * @fd: open xe drm file descriptor
> > + * @vm: vm handle
> > + * @engine: engine handle
> > + *
> > + * Returns an intel_ctx_t representing the xe context.
> > + */
> > +intel_ctx_t *intel_ctx_xe(int fd, uint32_t vm, uint32_t engine,
> > +			  uint32_t sync_in, uint32_t sync_bind, uint32_t sync_out)
> > +{
> > +	intel_ctx_t *ctx;
> > +
> > +	ctx = calloc(1, sizeof(*ctx));
> > +	igt_assert(ctx);
> > +
> > +	ctx->fd = fd;
> > +	ctx->vm = vm;
> > +	ctx->engine = engine;
> > +	ctx->sync_in = sync_in;
> > +	ctx->sync_bind = sync_bind;
> > +	ctx->sync_out = sync_out;
> > +
> > +	return ctx;
> > +}
> > +
> > +static int __xe_exec(int fd, struct drm_xe_exec *exec)
> > +{
> > +	int err = 0;
> > +
> > +	if (igt_ioctl(fd, DRM_IOCTL_XE_EXEC, exec)) {
> > +		err = -errno;
> > +		igt_assume(err != 0);
>
> Wouldn't "igt_assume(err)" be enough?
>
> > +	}
> > +	errno = 0;
> > +	return err;
> > +}
>
> I'm aware that it's a helper that you use in other execs, but it feels
> out of place here - it doesn't deal with intel_ctx_t. Maybe xe_util
> could be its new home?
>

I'm going to just export __xe_exec() from xe_ioctl.c.
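
Roughly like this (the exact header placement is still to be decided,
this is just the idea):

	/* xe/xe_ioctl.h - prototype of the exported helper */
	int __xe_exec(int fd, struct drm_xe_exec *exec);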
> > +
> > +int __intel_ctx_xe_exec(const intel_ctx_t *ctx, uint64_t ahnd, uint64_t bb_offset)
> > +{
> > +	struct drm_xe_sync syncs[2] = {
> > +		{ .flags = DRM_XE_SYNC_SYNCOBJ, },
> > +		{ .flags = DRM_XE_SYNC_SYNCOBJ | DRM_XE_SYNC_SIGNAL, },
> > +	};
> > +	struct drm_xe_exec exec = {
> > +		.engine_id = ctx->engine,
> > +		.syncs = (uintptr_t)syncs,
> > +		.num_syncs = 2,
> > +		.address = bb_offset,
> > +		.num_batch_buffer = 1,
> > +	};
> > +	uint32_t sync_in = ctx->sync_in;
> > +	uint32_t sync_bind = ctx->sync_bind ?: syncobj_create(ctx->fd, 0);
> > +	uint32_t sync_out = ctx->sync_out ?: syncobj_create(ctx->fd, 0);
> > +	int ret;
> > +
> > +	/* Synchronize allocator state -> vm */
> > +	intel_allocator_bind(ahnd, sync_in, sync_bind);
> > +
> > +	/* Pipelined exec */
> > +	syncs[0].handle = sync_bind;
> > +	syncs[1].handle = sync_out;
> > +
> > +	ret = __xe_exec(ctx->fd, &exec);
> > +	if (ret)
> > +		goto err;
> > +
> > +	if (!ctx->sync_bind || !ctx->sync_out)
> > +		syncobj_wait_err(ctx->fd, &sync_out, 1, INT64_MAX, 0);
>
> This whole flow is so nice and tidy, I like it.
>
> > +
> > +err:
> > +	if (!ctx->sync_bind)
> > +		syncobj_destroy(ctx->fd, sync_bind);
> > +
> > +	if (!ctx->sync_out)
> > +		syncobj_destroy(ctx->fd, sync_out);
> > +
> > +	return ret;
> > +}
> > +
> > +void intel_ctx_xe_exec(const intel_ctx_t *ctx, uint64_t ahnd, uint64_t bb_offset)
> > +{
> > +	igt_assert_eq(__intel_ctx_xe_exec(ctx, ahnd, bb_offset), 0);
> > +}
> > +
> > +#define RESET_SYNCOBJ(__fd, __sync) do { \
> > +	if (__sync) \
> > +		syncobj_reset((__fd), &(__sync), 1); \
> > +} while (0)
> > +
> > +int intel_ctx_xe_sync(intel_ctx_t *ctx, bool reset_syncs)
> > +{
> > +	int ret;
> > +
> > +	ret = syncobj_wait_err(ctx->fd, &ctx->sync_out, 1, INT64_MAX, 0);
> > +
> > +	if (reset_syncs) {
> > +		RESET_SYNCOBJ(ctx->fd, ctx->sync_in);
> > +		RESET_SYNCOBJ(ctx->fd, ctx->sync_bind);
> > +		RESET_SYNCOBJ(ctx->fd, ctx->sync_out);
> > +	}
>
> Is there a usecase where we want to do a synced execution without
> resetting syncobjs?
>

I don't know - that's why I left the decision to the user.

> > +
> > +	return ret;
> > +}
> > diff --git a/lib/intel_ctx.h b/lib/intel_ctx.h
> > index 3cfeaae81e..59d0360ada 100644
> > --- a/lib/intel_ctx.h
> > +++ b/lib/intel_ctx.h
> > @@ -67,6 +67,14 @@ int intel_ctx_cfg_engine_class(const intel_ctx_cfg_t *cfg, unsigned int engine);
> >  typedef struct intel_ctx {
> >  	uint32_t id;
> >  	intel_ctx_cfg_t cfg;
> > +
> > +	/* Xe */
> > +	int fd;
> > +	uint32_t vm;
> > +	uint32_t engine;
> > +	uint32_t sync_in;
> > +	uint32_t sync_bind;
> > +	uint32_t sync_out;
>
> Hmm, I wonder if we could wrap it in a struct. Yes, it would be painful
> to unpack, but now it feels like we've just added a bunch of fields
> that are irrelevant 80% of the time. Instead, we could have one
> additional field that could be NULL, and use it only if it's
> initialized.
> But maybe I'm just being too nit-picky.

I considered introducing a union of i915 and xe structs, but I would
need to rewrite almost all igts which use this struct, so I dropped the
idea. At the moment I need to handle both drivers, so mixing the fields
is not a big pain imo.
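
For the record, the union variant I had in mind would be something like
the sketch below (a sketch only, not a proposal) - moving the existing
fields under a driver-specific member is exactly what would force
touching every ctx->id and ctx->cfg user:

	typedef struct intel_ctx {
		union {
			struct {
				uint32_t id;
				intel_ctx_cfg_t cfg;
			} i915;
			struct {
				int fd;
				uint32_t vm;
				uint32_t engine;
				uint32_t sync_in;
				uint32_t sync_bind;
				uint32_t sync_out;
			} xe;
		};
	} intel_ctx_t;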
--
Zbigniew

>
> All the best,
> Karolina
>
> > +
> >  } intel_ctx_t;
> >
> >  int __intel_ctx_create(int fd, const intel_ctx_cfg_t *cfg,
> > @@ -81,4 +89,10 @@ void intel_ctx_destroy(int fd, const intel_ctx_t *ctx);
> >  unsigned int intel_ctx_engine_class(const intel_ctx_t *ctx, unsigned int engine);
> >
> > +intel_ctx_t *intel_ctx_xe(int fd, uint32_t vm, uint32_t engine,
> > +			  uint32_t sync_in, uint32_t sync_bind, uint32_t sync_out);
> > +int __intel_ctx_xe_exec(const intel_ctx_t *ctx, uint64_t ahnd, uint64_t bb_offset);
> > +void intel_ctx_xe_exec(const intel_ctx_t *ctx, uint64_t ahnd, uint64_t bb_offset);
> > +int intel_ctx_xe_sync(intel_ctx_t *ctx, bool reset_syncs);
> > +
> >  #endif