From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
To: Chris Wilson <chris@chris-wilson.co.uk>, intel-gfx@lists.freedesktop.org
Subject: Re: [Intel-gfx] [PATCH 08/17] drm/i915/selftests: Add request throughput measurement to perf
Date: Tue, 10 Mar 2020 11:58:26 +0000
Message-ID: <08d601d5-1583-61e4-113d-8208a17d3d0f@linux.intel.com>
In-Reply-To: <158383855988.16414.10338993219228723247@build.alporthouse.com>


On 10/03/2020 11:09, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2020-03-10 10:38:21)
>>
>> On 06/03/2020 13:38, Chris Wilson wrote:
>>> +static int perf_many(void *arg)
>>> +{
>>> +     struct perf_parallel *p = arg;
>>> +     struct intel_engine_cs *engine = p->engine;
>>> +     struct intel_context *ce;
>>> +     IGT_TIMEOUT(end_time);
>>> +     unsigned long count;
>>> +     int err = 0;
>>> +     bool busy;
>>> +
>>> +     ce = intel_context_create(engine);
>>> +     if (IS_ERR(ce))
>>> +             return PTR_ERR(ce);
>>> +
>>> +     err = intel_context_pin(ce);
>>> +     if (err) {
>>> +             intel_context_put(ce);
>>> +             return err;
>>> +     }
>>> +
>>> +     busy = false;
>>> +     if (intel_engine_supports_stats(engine) &&
>>> +         !intel_enable_engine_stats(engine)) {
>>> +             p->busy = intel_engine_get_busy_time(engine);
>>> +             busy = true;
>>> +     }
>>> +
>>> +     count = 0;
>>> +     p->time = ktime_get();
>>> +     do {
>>> +             struct i915_request *rq;
>>> +
>>> +             rq = i915_request_create(ce);
>>> +             if (IS_ERR(rq)) {
>>> +                     err = PTR_ERR(rq);
>>> +                     break;
>>> +             }
>>> +
>>> +             i915_request_add(rq);
>>
>> Any concerns on ring size here and maybe managing the wait explicitly?
> 
> No concern, the intention is to flood the ring. If we are able to wait
> on the ring, we have succeeded in submitting faster than the engine can
> retire. (Which might be another issue for us to resolve, as it may be
> our own interrupt latency that is then the bottleneck.)
> 
> If we did a sync0, sync1, many; that could give us some more insight
> into the interrupt latency in comparison to engine latency.
> 
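
For reference, a minimal sketch of what such a sync variant could look
like (the usual i915_request API assumed; perf_sync itself is not quoted
here, so the details below are illustrative only):

	count = 0;
	p->time = ktime_get();
	do {
		struct i915_request *rq;

		rq = i915_request_create(ce);
		if (IS_ERR(rq)) {
			err = PTR_ERR(rq);
			break;
		}

		/* Wait on each request in turn so interrupt latency,
		 * not ring occupancy, dominates the measurement.
		 */
		i915_request_get(rq);
		i915_request_add(rq);
		if (i915_request_wait(rq, 0, HZ) < 0)
			err = -ETIME;
		i915_request_put(rq);
		if (err)
			break;

		count++;
	} while (!__igt_timeout(end_time, NULL));
	p->time = ktime_sub(ktime_get(), p->time);
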
>>
>>> +             count++;
>>> +     } while (!__igt_timeout(end_time, NULL));
>>> +     p->time = ktime_sub(ktime_get(), p->time);
>>> +
>>> +     if (busy) {
>>> +             p->busy = ktime_sub(intel_engine_get_busy_time(engine),
>>> +                                 p->busy);
>>> +             intel_disable_engine_stats(engine);
>>> +     }
>>> +
>>> +     err = switch_to_kernel_sync(ce, err);
>>> +     p->runtime = intel_context_get_total_runtime_ns(ce);
>>> +     p->count = count;
>>> +
>>> +     intel_context_unpin(ce);
>>> +     intel_context_put(ce);
>>> +     return err;
>>> +}
>>> +
>>> +static int perf_parallel_engines(void *arg)
>>> +{
>>> +     struct drm_i915_private *i915 = arg;
>>> +     static int (* const func[])(void *arg) = {
>>> +             perf_sync,
>>> +             perf_many,
>>> +             NULL,
>>> +     };
>>> +     const unsigned int nengines = num_uabi_engines(i915);
>>> +     struct intel_engine_cs *engine;
>>> +     int (* const *fn)(void *arg);
>>> +     struct pm_qos_request *qos;
>>> +     struct {
>>> +             struct perf_parallel p;
>>> +             struct task_struct *tsk;
>>> +     } *engines;
>>> +     int err = 0;
>>> +
>>> +     engines = kcalloc(nengines, sizeof(*engines), GFP_KERNEL);
>>> +     if (!engines)
>>> +             return -ENOMEM;
>>> +
>>> +     qos = kzalloc(sizeof(*qos), GFP_KERNEL);
>>> +     if (qos)
>>> +             pm_qos_add_request(qos, PM_QOS_CPU_DMA_LATENCY, 0);
>>> +
>>> +     for (fn = func; *fn; fn++) {
>>> +             char name[KSYM_NAME_LEN];
>>> +             struct igt_live_test t;
>>> +             unsigned int idx;
>>> +
>>> +             snprintf(name, sizeof(name), "%ps", *fn);
>>
>> Is this any better than just storing the name in local static array?
> 
> It's easier for sure, and since the name is already in a static array,
> why not use it :)

It looks weird: it needs a KSYM_NAME_LEN buffer of stack space and the 
special %ps format specifier. But okay.
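
For comparison, the static-array alternative I had in mind would be 
something like this (a hypothetical sketch, not in the patch):

	static const struct {
		int (*fn)(void *arg);
		const char *name;
	} func[] = {
		{ perf_sync, "perf_sync" },
		{ perf_many, "perf_many" },
		{},
	};

It avoids the stack buffer, at the cost of spelling each name twice.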

> 
>>> +             err = igt_live_test_begin(&t, i915, __func__, name);
>>> +             if (err)
>>> +                     break;
>>> +
>>> +             atomic_set(&i915->selftest.counter, nengines);
>>> +
>>> +             idx = 0;
>>> +             for_each_uabi_engine(engine, i915) {
>>
>> For a pure driver overhead test I would suggest this to be a gt live test.
> 
> It's a request performance test, so sits above the gt. My thinking is
> that this is a more of a high level request/scheduler test than
> execlists/guc (though it depends on those backends).

Okay, yeah, it makes sense.

>   
>>> +                     intel_engine_pm_get(engine);
>>> +
>>> +                     memset(&engines[idx].p, 0, sizeof(engines[idx].p));
>>> +                     engines[idx].p.engine = engine;
>>> +
>>> +                     engines[idx].tsk = kthread_run(*fn, &engines[idx].p,
>>> +                                                    "igt:%s", engine->name);
>>
>> Test will get affected by the host CPU core count. How about we only
>> measure num_cpu engines? Might be even more important with discrete.
> 
> No. We want to be able to fill the GPU with the different processors.
> Comparing glk to kbl helps highlight any inefficiencies we have -- we
> have to be efficient enough that core count is simply not a critical
> factor to offset our submission overhead.
> 
> So we can run the same test and see how it scaled with engines vs cpus
> just by running it on different machines and look for problems.

Normally you would expect one core per engine to be enough to saturate 
the engine. I am afraid adding more combinations will be confusing when 
reading test results. (Same GPU, same engine count, different CPU core 
count.) How about two subtest variants? One with a 1:1 CPU core to 
engine mapping, and another running on all engines at once, like here?

Or possibly:

1. 1 CPU core - 1 engine - purest latency/overhead (see the sketch below)
2. 1 CPU core - N engines (N = all engines) - more
3. N CPU cores - N engines (N = min(engines, cores)) - global lock 
contention, stable setup
4. M CPU cores - N engines (N, M = max) - lock contention stress
5. N CPU cores - 1 engine (N = all cores) - more extreme lock contention
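
Variant 1 could be as simple as binding each kthread to its own CPU 
before waking it, e.g. (sketch only, assuming kthread_bind() is 
acceptable here and the per-engine setup stays as in the patch):

	idx = 0;
	for_each_uabi_engine(engine, i915) {
		struct task_struct *tsk;

		tsk = kthread_create(*fn, &engines[idx].p,
				     "igt:%s", engine->name);
		if (IS_ERR(tsk)) {
			err = PTR_ERR(tsk);
			break;
		}

		/* Pin each submitter to a dedicated CPU core. */
		kthread_bind(tsk, idx % num_online_cpus());
		wake_up_process(tsk);

		engines[idx++].tsk = tsk;
	}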

Regards,

Tvrtko
