From mboxrd@z Thu Jan 1 00:00:00 1970
From: Tvrtko Ursulin
Organization: Intel Corporation UK Plc
To: Chris Wilson, intel-gfx@lists.freedesktop.org
Cc: igt-dev@lists.freedesktop.org
Date: Fri, 17 Jul 2020 09:34:07 +0100
Subject: Re: [Intel-gfx] [PATCH i-g-t] i915/gem_exec_balancer: Race breadcrumb signaling against timeslicing
In-Reply-To: <20200716204448.737869-1-chris@chris-wilson.co.uk>
References: <20200716204448.737869-1-chris@chris-wilson.co.uk>
List-Id: Intel graphics driver community testing & development

On 16/07/2020 21:44, Chris Wilson wrote:
> This is an attempt to chase down some preempt-to-busy races with
> breadcrumb signaling on the virtual engines. By using more semaphore
> spinners than available engines, we encourage very short timeslices, and
> we make each batch of random duration to try and coincide the end of a
> batch with the context being scheduled out.
>
> Signed-off-by: Chris Wilson
> Cc: Tvrtko Ursulin
> ---
>  tests/i915/gem_exec_balancer.c | 109 +++++++++++++++++++++++++++++++++
>  1 file changed, 109 insertions(+)
>
> diff --git a/tests/i915/gem_exec_balancer.c b/tests/i915/gem_exec_balancer.c
> index c5c0055fc..e4d9e0464 100644
> --- a/tests/i915/gem_exec_balancer.c
> +++ b/tests/i915/gem_exec_balancer.c
> @@ -2240,6 +2240,112 @@ static void hog(int i915)
>  	gem_quiescent_gpu(i915);
>  }
>  
> +static uint32_t sema_create(int i915, uint64_t addr, uint32_t **x)
> +{
> +	uint32_t handle = gem_create(i915, 4096);
> +
> +	*x = gem_mmap__device_coherent(i915, handle, 0, 4096, PROT_WRITE);
> +	for (int n = 1; n <= 32; n++) {
> +		uint32_t *cs = *x + n * 16;

Okay, so the semaphore target is in the batch itself; that's why the
first 16 dwords are nops, for convenience.

> +
> +		*cs++ = MI_SEMAPHORE_WAIT |
> +			MI_SEMAPHORE_POLL |
> +			MI_SEMAPHORE_SAD_GTE_SDD |
> +			(4 - 2);
> +		*cs++ = n;
> +		*cs++ = addr;
> +		*cs++ = addr >> 32;
> +
> +		*cs++ = MI_BATCH_BUFFER_END;
> +	}
> +
> +	return handle;
> +}
> +
> +static uint32_t *sema(int i915, uint32_t ctx)
> +{
> +	uint32_t *ctl;
> +	struct drm_i915_gem_exec_object2 batch = {
> +		.handle = sema_create(i915, 64 << 20, &ctl),
> +		.offset = 64 << 20,
> +		.flags = EXEC_OBJECT_PINNED
> +	};
> +	struct drm_i915_gem_execbuffer2 execbuf = {
> +		.buffers_ptr = to_user_pointer(&batch),
> +		.buffer_count = 1,
> +		.rsvd1 = gem_context_clone_with_engines(i915, ctx),
> +	};
> +	int64_t poll = 1;
> +
> +	for (int n = 1; n <= 32; n++) {
> +		execbuf.batch_start_offset = 64 * n,
> +		gem_execbuf(i915, &execbuf);
> +		/* Force a breadcrumb to be installed on each request */
> +		gem_wait(i915, batch.handle, &poll);
> +	}
> +
> +	gem_context_destroy(i915, execbuf.rsvd1);
> +
> +	igt_assert(gem_bo_busy(i915, batch.handle));
> +	gem_close(i915, batch.handle);
> +
> +	return ctl;
> +}
> +
> +static void __waits(int i915, int timeout, uint32_t ctx, unsigned int count)
> +{
> +	uint32_t *semaphores[count + 1];
> +
> +	for (int i = 0; i <= count; i++)
> +		semaphores[i] = sema(i915, ctx);
> +
> +	igt_until_timeout(timeout) {
> +		int i = rand() % (count + 1);
> +
> +		if ((*semaphores[i] += rand() % 32) >= 32) {

Each write releases some of the batches; once it knows it has released
all of them, it creates a new set.

> +			munmap(semaphores[i], 4096);
> +			semaphores[i] = sema(i915, ctx);
> +		}
> +	}
> +
> +	for (int i = 0; i <= count; i++) {
> +		*semaphores[i] = 0xffffffff;
> +		munmap(semaphores[i], 4096);
> +	}
> +}
> +
> +static void waits(int i915, int timeout)
> +{
> +	igt_require(gem_scheduler_has_preemption(i915));
> +	igt_require(gem_scheduler_has_semaphores(i915));
> +
> +	for (int class = 0; class < 32; class++) {
> +		struct i915_engine_class_instance *ci;
> +		unsigned int count;
> +		uint32_t ctx;
> +
> +		ci = list_engines(i915, 1u << class, &count);
> +		if (!ci)
> +			continue;
> +
> +		if (count < 2) {
> +			free(ci);
> +			continue;
> +		}
> +
> +		ctx = load_balancer_create(i915, ci, count);
> +
> +		__waits(i915, timeout, ctx, count);
> +
> +		gem_context_destroy(i915, ctx);
> +		igt_waitchildren();

I don't see any forking in the test, so this igt_waitchildren() looks
unnecessary.

> +
> +		free(ci);
> +	}
> +
> +	gem_quiescent_gpu(i915);
> +}
> +
>  static void nop(int i915)
>  {
>  	struct drm_i915_gem_exec_object2 batch = {
> @@ -2729,6 +2835,9 @@ igt_main
>  	igt_subtest("hog")
>  		hog(i915);
>  
> +	igt_subtest("waits")
> +		waits(i915, 5);
> +
>  	igt_subtest("smoke")
>  		smoketest(i915, 20);
>  
>

Looks okay in principle.

Reviewed-by: Tvrtko Ursulin

I am not sure whether the batch duration is too short in practice. The
add loop will end them all very rapidly; I think it needs only 64
iterations on average to end all 32. So 64 WC writes from the CPU might
be too short compared to CSB processing and breadcrumb signaling
latencies. Maybe some small random udelays in the loop would be more
realistic, perhaps as a second flavour of the test, just in case. The
more coverage the better.

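To illustrate the delayed flavour, here is a rough CPU-only sketch, not a
proposed patch. It models a single set of 32 batches with a plain variable
standing in for the mmapped control dword, and it assumes usleep() as the
delay primitive. The names batches_released() and release_set_with_delays()
are made up for this example and do not exist in IGT.

```c
#define _DEFAULT_SOURCE /* for usleep() under strict -std= modes */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

#define NUM_BATCHES 32

/*
 * Batch n (1..32) completes once the control dword is >= n, mirroring
 * the MI_SEMAPHORE_SAD_GTE_SDD comparison set up in sema_create().
 * Returns how many of the 32 batches a given value has released.
 */
static unsigned int batches_released(uint32_t sema)
{
	return sema >= NUM_BATCHES ? NUM_BATCHES : sema;
}

/*
 * Hypothetical helper: release one set of batches with the same
 * "+= rand() % 32" step as __waits(), sleeping for a random time of
 * less than max_delay_us microseconds after each write (0 disables
 * the delay). Returns the number of writes it took.
 */
static unsigned int release_set_with_delays(unsigned int max_delay_us)
{
	uint32_t sema = 0; /* stands in for the mmapped control dword */
	unsigned int writes = 0;

	assert(max_delay_us <= 1000); /* keep the delays small */

	while (batches_released(sema) < NUM_BATCHES) {
		sema += rand() % 32;
		writes++;

		if (max_delay_us)
			usleep(rand() % max_delay_us);
	}

	return writes;
}
```

Calling release_set_with_delays(0) corresponds to the loop as posted, while
a small non-zero bound spreads the writes out in time. In the real test the
delay would go inside the igt_until_timeout() loop, and the bound would
need tuning against actual timeslice lengths.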
Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx