From: Robert Bragg <robert@sixbynine.org>
To: "Gupta, Sourab" <sourab.gupta@intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>,
	"intel-gfx@lists.freedesktop.org"
	<intel-gfx@lists.freedesktop.org>,
	"Wu, Jabin" <jabin.wu@intel.com>,
	"Woo, Insoo" <insoo.woo@intel.com>,
	"Bragg, Robert" <robert.bragg@intel.com>
Subject: Re: [RFC 5/7] drm/i915: Wait for GPU to finish before event stop in Gen Perf PMU
Date: Thu, 25 Jun 2015 12:47:17 +0100	[thread overview]
Message-ID: <CAMou1-0mWFP7thEa9F0pzYcn0WP30VXcg69SCV6Poj7DoUFw_w@mail.gmail.com> (raw)
In-Reply-To: <1435220967.1938.24.camel@sourabgu-desktop>

On Thu, Jun 25, 2015 at 9:27 AM, Gupta, Sourab <sourab.gupta@intel.com> wrote:
> On Thu, 2015-06-25 at 07:42 +0000, Daniel Vetter wrote:
>> On Thu, Jun 25, 2015 at 06:02:35AM +0000, Gupta, Sourab wrote:
>> > On Mon, 2015-06-22 at 16:09 +0000, Daniel Vetter wrote:
>> > > On Mon, Jun 22, 2015 at 02:22:54PM +0100, Chris Wilson wrote:
>> > > > On Mon, Jun 22, 2015 at 03:25:07PM +0530, sourab.gupta@intel.com wrote:
>> > > > > From: Sourab Gupta <sourab.gupta@intel.com>
>> > > > >
>> > > > > To collect timestamps around any GPU workload, we need to insert
>> > > > > commands to capture them into the ringbuffer. Therefore, during the stop event
>> > > > > call, we need to wait for GPU to complete processing the last request for
>> > > > > which these commands were inserted.
>> > > > > We need to ensure this processing is done before event_destroy callback which
>> > > > > deallocates the buffer for holding the data.
>> > > >
>> > > > There's no reason for this to be synchronous. Just that you need an
>> > > > active reference on the output buffer to be only released after the
>> > > > final request.
>> > >
>> > > Yeah I think the interaction between OA sampling and GEM is the critical
>> > > piece here for both patch series. Step one is to have a per-pmu lock to
>> > > keep track of data private to OA and mmio based sampling as a starting
>> > > point. Then we need to figure out how to structure the connection without
>> > > OA/PMU and gem trampling over each other's feet.
>> > >
>> > > Wrt requests those are refcounted already and just waiting on them doesn't
>> > > need dev->struct_mutex. That should be all you need, assuming you do
>> > > correctly refcount them ...
>> > > -Daniel
>> >
>> > Hi Daniel/Chris,
>> >
>> > Thanks for your feedback. I acknowledge the issue with increasing
>> > coverage scope of i915 dev mutex. I'll have a per-pmu lock to keep track
>> > of data private to OA in my next version of patches.
>>
>> If you create new locking schemes please coordinate with Rob about them.
>> Better if we come up with a consistent locking scheme for all of OA. That
>> ofc doesn't exclude that we'll have per-pmu locking, just that the locking
>> design should match if it makes sense. The example I'm thinking of is
>> drrs&psr which both use the frontbuffer tracking code and both have their
>> private mutex. But the overall approach to locking and cleaning up the
>> async work is the same.
>>
> Ok, I'll coordinate with Robert about a consistent locking scheme here.
>
>> > W.r.t. having the synchronous wait during event stop, I thought through
>> > the method of having active reference on output buffer. This will let us
>> > have a delayed buffer destroy (controlled by decrement of output buffer
>> > refcount as and when gpu requests complete). This will be decoupled from
>> > the event stop here. But that is not the only thing at play here.
>> >
>> > The event stop is expected to disable the OA unit (which it is doing
>> > right now). In my usecase, I can't disable OA unit, till the time GPU
>> > processes the MI_RPC requests scheduled already, which otherwise may
>> > lead to hangs.
>> > So, I'm waiting synchronously for requests to complete before disabling
>> > OA unit.
>> >
>> > Also, the event_destroy callback should be destroying the buffers
>> > associated with event. (Right now, in current design, event_destroy
>> > callback waits for the above operations to be completed before
>> > proceeding to destroy the buffer).
>> >
>> > If I try to create a function which destroys the output buffer according
>> > to all references being released, all these operations will have to be
>> > performed together in that function, in this serial order (so that when
>> > refcount drops to 0, i.e. requests have completed, OA unit will be
>> > disabled, and then the buffer destroyed). This would lead to these
>> > operations being decoupled from the event_stop and event_destroy
>> > functions. This may be unintentional behaviour w.r.t. these callbacks
>> > from a userspace perspective.
>> >
>> > Robert, any thoughts?
>>
>> Yeah, no idea whether those perf callbacks are allowed to stall for a
>> longer time. If not we could simply push the cleanup/OA disabling into an
>> async work.
>> -Daniel
>
> Hi Daniel,
> Actually right now, the event_stop callback is not stalled. Instead I'm
> scheduling an async work fn to do the job. The event destroy is then
> waiting for the work fn to finish processing, before proceeding on to
> destroy the buffer.
> I'm also thinking through Chris' suggestions here and will try to come
> up with solution based on them.
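
For what it's worth, the ordering constraint being described here -- stop
only kicks async work, while destroy waits for that work before freeing the
buffer -- can be sketched as a little userspace model (all names invented;
this isn't the driver code, and the wait for outstanding GPU requests is
elided):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Userspace model of the described scheme; all names are made up.
 * In the driver the "work" would be a struct work_struct run by a
 * workqueue, and destroy would flush/wait for it. */
struct oa_event_model {
    bool stop_work_pending;   /* stop scheduled the work, not yet run */
    bool oa_unit_enabled;
    char *dest_buffer;        /* buffer holding forwarded samples */
};

/* The deferred work: wait for outstanding MI_RPC requests (elided in
 * this model), then disable the OA unit. */
static void stop_work_fn(struct oa_event_model *e)
{
    e->oa_unit_enabled = false;
    e->stop_work_pending = false;
}

/* event_stop: may be called in atomic context, so it only schedules
 * the work instead of waiting. */
static void event_stop_model(struct oa_event_model *e)
{
    e->stop_work_pending = true;
}

/* event_destroy: process context, so it can wait for the work to have
 * run before it is safe to free the buffer. */
static void event_destroy_model(struct oa_event_model *e)
{
    if (e->stop_work_pending)
        stop_work_fn(e);      /* stands in for a flush/wait */
    assert(!e->oa_unit_enabled);
    free(e->dest_buffer);
    e->dest_buffer = NULL;
}
```

In the real driver the work fn would of course also have to wait for the
outstanding requests before touching the OA unit.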

A notable detail here is that most pmu methods are called in atomic
context by events/core.c via inter-processor interrupts (as a somewhat
unintended side effect of userspace opening i915_oa events as
specific-cpu events, perf will make sure all methods are invoked on
that specific cpu). This means waiting in pmu->event_stop() isn't even
an option for us; event_init() and event->destroy() are the two
exceptions that are called in process context, and as noted it's only
the destroy that currently waits for the completion of the stop work
fn. That said, it does seem worth considering whether we can avoid
waiting in event_destroy() if it's not strictly necessary, and
Chris's suggestion to follow a refcounting scheme sounds workable.
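
If we did adopt the refcounting approach, I'd picture something roughly
like this -- again only a userspace sketch with invented names (in the
driver this would presumably be a kref, with puts coming from request
retirement):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Sketch of the refcounting Chris suggested; names are invented.
 * Each request that writes reports into the buffer holds a reference,
 * so the buffer (and the OA disable) outlives the last request. */
struct oa_buffer_model {
    int refcount;             /* a kref in the driver */
    bool oa_unit_enabled;
};

static struct oa_buffer_model *oa_buf_create(void)
{
    struct oa_buffer_model *b = malloc(sizeof(*b));
    b->refcount = 1;          /* the event's own reference */
    b->oa_unit_enabled = true;
    return b;
}

static void oa_buf_get(struct oa_buffer_model *b)
{
    b->refcount++;            /* taken when a request is queued */
}

/* Dropped from the event's destroy and from request retirement; only
 * the final put disables the OA unit and frees the buffer.  Returns
 * true when that final release happened. */
static bool oa_buf_put(struct oa_buffer_model *b)
{
    if (--b->refcount == 0) {
        b->oa_unit_enabled = false;
        free(b);
        return true;
    }
    return false;
}
```

The key property is that whichever of event destroy or the last request's
retirement happens second is what disables the OA unit and frees the
buffer, so neither path has to block on the other.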

Regards,
- Robert

>
> Thanks,
> Sourab
>
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

Thread overview: 23+ messages
2015-06-22  9:55 [RFC 0/7] Introduce framework for forwarding generic non-OA performance sourab.gupta
2015-06-22  9:55 ` [RFC 1/7] drm/i915: Add a new PMU for handling non-OA counter data profiling requests sourab.gupta
2015-06-22  9:55 ` [RFC 2/7] drm/i915: Register routines for Gen perf PMU driver sourab.gupta
2015-06-22  9:55 ` [RFC 3/7] drm/i915: Introduce timestamp node for timestamp data collection sourab.gupta
2015-06-22  9:55 ` [RFC 4/7] drm/i915: Add mechanism for forwarding the data samples to userspace through Gen PMU perf interface sourab.gupta
2015-06-22 13:21   ` Chris Wilson
2015-06-22  9:55 ` [RFC 5/7] drm/i915: Wait for GPU to finish before event stop in Gen Perf PMU sourab.gupta
2015-06-22 13:22   ` Chris Wilson
2015-06-22 16:09     ` Daniel Vetter
2015-06-25  6:02       ` Gupta, Sourab
2015-06-25  7:42         ` Daniel Vetter
2015-06-25  8:27           ` Gupta, Sourab
2015-06-25 11:47             ` Robert Bragg [this message]
2015-06-25  8:02         ` Chris Wilson
2015-06-25 17:31           ` Robert Bragg
2015-06-25 17:37             ` Chris Wilson
2015-06-25 18:20               ` Chris Wilson
2015-06-25 13:02         ` Robert Bragg
2015-06-25 13:07           ` Robert Bragg
2015-06-22  9:55 ` [RFC 6/7] drm/i915: Add routines for inserting commands in the ringbuf for capturing timestamps sourab.gupta
2015-06-22  9:55 ` [RFC 7/7] drm/i915: Add support for retrieving MMIO register values in Gen Perf PMU sourab.gupta
2015-06-22 13:29   ` Chris Wilson
2015-06-22 16:06   ` Daniel Vetter
