From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
To: Chris Wilson <chris@chris-wilson.co.uk>, intel-gfx@lists.freedesktop.org
Subject: Re: [Intel-gfx] [PATCH 08/17] drm/i915/selftests: Add request throughput measurement to perf
Date: Tue, 10 Mar 2020 11:58:26 +0000
Message-ID: <08d601d5-1583-61e4-113d-8208a17d3d0f@linux.intel.com>
In-Reply-To: <158383855988.16414.10338993219228723247@build.alporthouse.com>

On 10/03/2020 11:09, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2020-03-10 10:38:21)
>>
>> On 06/03/2020 13:38, Chris Wilson wrote:
>>> +static int perf_many(void *arg)
>>> +{
>>> +	struct perf_parallel *p = arg;
>>> +	struct intel_engine_cs *engine = p->engine;
>>> +	struct intel_context *ce;
>>> +	IGT_TIMEOUT(end_time);
>>> +	unsigned long count;
>>> +	int err = 0;
>>> +	bool busy;
>>> +
>>> +	ce = intel_context_create(engine);
>>> +	if (IS_ERR(ce))
>>> +		return PTR_ERR(ce);
>>> +
>>> +	err = intel_context_pin(ce);
>>> +	if (err) {
>>> +		intel_context_put(ce);
>>> +		return err;
>>> +	}
>>> +
>>> +	busy = false;
>>> +	if (intel_engine_supports_stats(engine) &&
>>> +	    !intel_enable_engine_stats(engine)) {
>>> +		p->busy = intel_engine_get_busy_time(engine);
>>> +		busy = true;
>>> +	}
>>> +
>>> +	count = 0;
>>> +	p->time = ktime_get();
>>> +	do {
>>> +		struct i915_request *rq;
>>> +
>>> +		rq = i915_request_create(ce);
>>> +		if (IS_ERR(rq)) {
>>> +			err = PTR_ERR(rq);
>>> +			break;
>>> +		}
>>> +
>>> +		i915_request_add(rq);
>>
>> Any concerns on ring size here and maybe managing the wait explicitly?
>
> No concern, the intention is to flood the ring. If we are able to wait
> on the ring, we have succeeded in submitting faster than the engine can
> retire. (Which might be another issue for us to resolve, as it may be
> our own interrupt latency that is then the bottleneck.)
>
> If we did a sync0, sync1, many; that could give us some more insight
> into the interrupt latency in comparison to engine latency.
>
>>
>>> +		count++;
>>> +	} while (!__igt_timeout(end_time, NULL));
>>> +	p->time = ktime_sub(ktime_get(), p->time);
>>> +
>>> +	if (busy) {
>>> +		p->busy = ktime_sub(intel_engine_get_busy_time(engine),
>>> +				    p->busy);
>>> +		intel_disable_engine_stats(engine);
>>> +	}
>>> +
>>> +	err = switch_to_kernel_sync(ce, err);
>>> +	p->runtime = intel_context_get_total_runtime_ns(ce);
>>> +	p->count = count;
>>> +
>>> +	intel_context_unpin(ce);
>>> +	intel_context_put(ce);
>>> +	return err;
>>> +}
>>> +
>>> +static int perf_parallel_engines(void *arg)
>>> +{
>>> +	struct drm_i915_private *i915 = arg;
>>> +	static int (* const func[])(void *arg) = {
>>> +		perf_sync,
>>> +		perf_many,
>>> +		NULL,
>>> +	};
>>> +	const unsigned int nengines = num_uabi_engines(i915);
>>> +	struct intel_engine_cs *engine;
>>> +	int (* const *fn)(void *arg);
>>> +	struct pm_qos_request *qos;
>>> +	struct {
>>> +		struct perf_parallel p;
>>> +		struct task_struct *tsk;
>>> +	} *engines;
>>> +	int err = 0;
>>> +
>>> +	engines = kcalloc(nengines, sizeof(*engines), GFP_KERNEL);
>>> +	if (!engines)
>>> +		return -ENOMEM;
>>> +
>>> +	qos = kzalloc(sizeof(*qos), GFP_KERNEL);
>>> +	if (qos)
>>> +		pm_qos_add_request(qos, PM_QOS_CPU_DMA_LATENCY, 0);
>>> +
>>> +	for (fn = func; *fn; fn++) {
>>> +		char name[KSYM_NAME_LEN];
>>> +		struct igt_live_test t;
>>> +		unsigned int idx;
>>> +
>>> +		snprintf(name, sizeof(name), "%ps", *fn);
>>
>> Is this any better than just storing the name in local static array?
>
> It's easier for sure, and since the name is already in a static array,
> why not use it :)

It looks weird: it needs KSYM_NAME_LEN bytes of stack space and the
special %ps format specifier. But okay.

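For comparison, here is a minimal sketch of the static-array alternative
raised above, written as plain userspace C rather than selftest code. The
names demo_sync, demo_many, struct named_test and NAMED_TEST are
hypothetical and only illustrate the idea: each entry carries its own name,
so no stack buffer or %ps symbol lookup is needed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the test bodies; these are not i915 functions. */
static int demo_sync(void *arg) { (void)arg; return 0; }
static int demo_many(void *arg) { (void)arg; return 0; }

/* Each entry pairs a function with an explicitly stored name. */
struct named_test {
	const char *name;
	int (*fn)(void *arg);
};

/* Stringify the function identifier so the name cannot drift from the symbol. */
#define NAMED_TEST(f) { #f, f }

static const struct named_test demo_tests[] = {
	NAMED_TEST(demo_sync),
	NAMED_TEST(demo_many),
	{ NULL, NULL }, /* sentinel, mirrors the NULL-terminated func[] above */
};

/*
 * Run every entry in order. Returns the number of tests run, or the first
 * negative error code. *last_name is set to the name of the last test started.
 */
static int run_named_tests(void *arg, const char **last_name)
{
	const struct named_test *t;
	int count = 0;

	for (t = demo_tests; t->fn; t++) {
		int err;

		assert(t->name != NULL);
		if (last_name)
			*last_name = t->name;

		err = t->fn(arg);
		if (err < 0)
			return err;
		count++;
	}

	return count;
}
```

The trade-off is the one discussed here: the macro costs a small wrapper
struct, while %ps reuses the symbol table at the cost of a KSYM_NAME_LEN
buffer on the stack.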
>
>>> +		err = igt_live_test_begin(&t, i915, __func__, name);
>>> +		if (err)
>>> +			break;
>>> +
>>> +		atomic_set(&i915->selftest.counter, nengines);
>>> +
>>> +		idx = 0;
>>> +		for_each_uabi_engine(engine, i915) {
>>
>> For a pure driver overhead test I would suggest this to be a gt live test.
>
> It's a request performance test, so sits above the gt. My thinking is
> that this is more of a high-level request/scheduler test than
> execlists/guc (though it depends on those backends).
Okay, yeah, it makes sense.
>
>>> +			intel_engine_pm_get(engine);
>>> +
>>> +			memset(&engines[idx].p, 0, sizeof(engines[idx].p));
>>> +			engines[idx].p.engine = engine;
>>> +
>>> +			engines[idx].tsk = kthread_run(*fn, &engines[idx].p,
>>> +						       "igt:%s", engine->name);
>>
>> Test will get affected by the host CPU core count. How about we only
>> measure num_cpu engines? Might be even more important with discrete.
>
> No. We want to be able to fill the GPU with the different processors.
> Comparing glk to kbl helps highlight any inefficiencies we have -- we
> have to be efficient enough that core count is simply not a critical
> factor to offset our submission overhead.
>
> So we can run the same test and see how it scaled with engines vs cpus
> just by running it on different machines and look for problems.

Normally you would expect one core per engine to be enough to saturate
the engine. I am afraid adding more combinations will be confusing when
reading test results (same GPU, same engine count, different CPU core
count). How about two subtest variants? One is 1:1 CPU core to engine,
and another can be all engines like here?

Or possibly:

1. 1 CPU core - 1 engine - purest latency/overhead
2. 1 CPU core - N engines (N = all engines) - more
3. N CPU cores - N engines (N = min(engines, cores)) - global lock
   contention, stable setup
4. M CPU cores - N engines (N, M = max) - lock contention stress
5. N CPU cores - 1 engine (N = all cores) - more extreme lock contention

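To make the five variants concrete, here is a rough userspace C sketch that
maps each one to the number of CPU cores and engines it would use. The enum,
struct and function names are hypothetical, invented for illustration only;
they are not part of the patch or of i915.

```c
#include <assert.h>

/* The five proposed variants, in the order listed above. */
enum perf_variant {
	ONE_CPU_ONE_ENGINE,	/* 1: purest latency/overhead */
	ONE_CPU_ALL_ENGINES,	/* 2: one core feeding every engine */
	MIN_CPUS_MIN_ENGINES,	/* 3: N = min(engines, cores) on both sides */
	ALL_CPUS_ALL_ENGINES,	/* 4: M cores, N engines, both at maximum */
	ALL_CPUS_ONE_ENGINE,	/* 5: every core contending for one engine */
};

/* How many CPU cores and engines a variant would exercise. */
struct perf_shape {
	unsigned int cpus;
	unsigned int engines;
};

static unsigned int min_uint(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

static struct perf_shape perf_variant_shape(enum perf_variant v,
					    unsigned int ncpus,
					    unsigned int nengines)
{
	struct perf_shape s = { 1, 1 };	/* variant 1 is the default */

	assert(ncpus > 0 && nengines > 0);

	switch (v) {
	case ONE_CPU_ONE_ENGINE:
		break;
	case ONE_CPU_ALL_ENGINES:
		s.engines = nengines;
		break;
	case MIN_CPUS_MIN_ENGINES:
		s.cpus = s.engines = min_uint(ncpus, nengines);
		break;
	case ALL_CPUS_ALL_ENGINES:
		s.cpus = ncpus;
		s.engines = nengines;
		break;
	case ALL_CPUS_ONE_ENGINE:
		s.cpus = ncpus;
		break;
	}

	return s;
}
```

For example, on an assumed machine with 4 cores and 5 engines, variant 3
would give 4 cores and 4 engines, while variant 4 would give 4 cores and
5 engines.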
Regards,
Tvrtko