* Design of a GPU profiling debug interface
From: Peter Clifton @ 2010-10-30 13:04 UTC
  To: intel-gfx

I think I'll need some help with this. I'm by no means a kernel
programmer, so I'm feeling my way in the dark here.

I want to design an interface so I can synchronise my GPU idle-flags
polling with batchbuffer execution. I'm imagining, at a high level, doing
something like this in my application (or mesa). (Hand-wavy pseudocode.)

expose_event_handler ()
{
	static bool one_shot_trace = true;

	if (one_shot_trace)
		mesa_debug_i915_trace_idle (TRUE);

	/* RENDERING COMMANDS IN HERE */
	SwapBuffers();

	if (one_shot_trace)
		mesa_debug_i915_trace_idle (FALSE);

	one_shot_trace = false;
}


I was imagining adding a flag to the EXECBUFFER2 IOCTL, or perhaps
adding a new EXECBUFFER3 IOCTL (which I'm playing with locally now).
Basically I just want to flag execbuffers which I'm interested in seeing
profiling data for.
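
Roughly what I mean, against the existing execbuffer2 interface -- the
flag name and bit here are made up purely for illustration, and I've not
checked whether the bit clashes with anything:

#include <stdbool.h>
#include <xf86drm.h>
#include "i915_drm.h"

/* Hypothetical flag -- name and value invented, would need picking so
 * as not to clash with the existing I915_EXEC_* bits. */
#define I915_EXEC_TRACE_IDLE	(1 << 31)

static int
exec_batch_traced(int fd, struct drm_i915_gem_execbuffer2 *execbuf,
                  bool trace_idle)
{
	if (trace_idle)
		execbuf->flags |= I915_EXEC_TRACE_IDLE;

	/* Otherwise the submission path is exactly as it is today. */
	return drmIoctl(fd, DRM_IOCTL_I915_GEM_EXECBUFFER2, execbuf);
}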

In order to get really high-resolution profiling, it would be
advantageous to confine it to the time period of interest, otherwise the
data rate is too high. I guesstimated about 10MB/s for a binary
representation of the data I'm currently polling in user-space. More
spatial resolution would be nice too, so this could increase.


I think I have a vague idea how to do the GPU and logging parts, even if
I end up having to start the polling before the batchbuffer starts
executing.

What I've got little or no clue about is how to manage allocating the
memory to store the results in.

Should userspace (mesa?) be passing in buffers for the kernel to return
profiling data in, then retrieving them somehow when it "knows" the
batchbuffer is finished? That would probably require over-allocating,
with a guesstimate of the memory needed to log the given batchbuffer.
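
Something with roughly this shape, say (an entirely hypothetical
EXECBUFFER3-style struct -- every name here is invented, it's only to
show userspace handing the kernel somewhere to put the samples):

/* Hypothetical EXECBUFFER3 argument struct, invented for illustration. */
struct drm_i915_gem_execbuffer3 {
	struct drm_i915_gem_execbuffer2 base;	/* existing exec parameters */
	__u32 profile_handle;	/* GEM handle of a BO the kernel fills with
				 * idle-flag samples, or 0 for no profiling */
	__u32 profile_size;	/* size of that BO -- userspace over-allocates
				 * based on a guesstimate of the data rate */
	__u32 profile_used;	/* written back by the kernel: bytes actually
				 * logged for this batchbuffer */
	__u32 pad;
};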

What about exporting via debugfs? Assuming the above code fragment, we
could leave the last "frame" of polled data available, overwriting it
when the next request to start logging comes in. (That would perhaps
require some kind of sequence number if we have multiple batches that
come under the same request... or a separate IOCTL to turn logging on
and off.)

Also... I'm not sure how the locking would work if userspace is reading
out the debugfs file whilst another frame is being executed. (We'd
probably need to allocate a secondary logging buffer in that case.)


Thoughts?
-- 
Peter Clifton

Electrical Engineering Division,
Engineering Department,
University of Cambridge,
9, JJ Thomson Avenue,
Cambridge
CB3 0FA

Tel: +44 (0)7729 980173 - (No signal in the lab!)
Tel: +44 (0)1223 748328 - (Shared lab phone, ask for me)


* Re: Design of a GPU profiling debug interface
From: Jesse Barnes @ 2010-11-09 17:15 UTC
  To: Peter Clifton; +Cc: intel-gfx

On Sat, 30 Oct 2010 14:04:11 +0100
Peter Clifton <pcjc2@cam.ac.uk> wrote:

> I think I'll need some help with this. I'm by no means a kernel
> programmer, so I'm feeling my way in the dark here.
> 
> I want to design an interface so I can synchronise my GPU idle-flags
> polling with batchbuffer execution. I'm imagining, at a high level, doing
> something like this in my application (or mesa). (Hand-wavy pseudocode.)
> 
> expose_event_handler ()
> {
> 	static bool one_shot_trace = true;
> 
> 	if (one_shot_trace)
> 		mesa_debug_i915_trace_idle (TRUE);
> 
> 	/* RENDERING COMMANDS IN HERE */
> 	SwapBuffers();
> 
> 	if (one_shot_trace)
> 		mesa_debug_i915_trace_idle (FALSE);
> 
> 	one_shot_trace = false;
> }
> 
> 
> I was imagining adding a flag to the EXECBUFFER2 IOCTL, or perhaps
> adding a new EXECBUFFER3 IOCTL (which I'm playing with locally now).
> Basically I just want to flag execbuffers which I'm interested in seeing
> profiling data for.
> 
> In order to get really high-resolution profiling, it would be
> advantageous to confine it to the time period of interest, otherwise the
> data rate is too high. I guesstimated about 10MB/s for a binary
> representation of the data I'm currently polling in user-space. More
> spatial resolution would be nice too, so this could increase.

Would be very cool to be able to correlate the data...

> I think I have a vague idea how to do the GPU and logging parts, even if
> I end up having to start the polling before the batchbuffer starts
> executing.
> 
> What I've got little or no clue about is how to manage allocating the
> memory to store the results in.
> 
> Should userspace (mesa?) be passing in buffers for the kernel to return
> profiling data in, then retrieving them somehow when it "knows" the
> batchbuffer is finished? That would probably require over-allocating,
> with a guesstimate of the memory needed to log the given batchbuffer.
> 
> What about exporting via debugfs? Assuming the above code fragment, we
> could leave the last "frame" of polled data available, overwriting it
> when the next request to start logging comes in. (That would perhaps
> require some kind of sequence number if we have multiple batches that
> come under the same request... or a separate IOCTL to turn logging on
> and off.)

There's also relayfs, which is made for high bandwidth kernel->user
communication.  I'm not sure if it will make this any easier, but I
think there's some documentation in the kernel about it.
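
The setup boilerplate follows the pattern from
Documentation/filesystems/relay.txt pretty closely -- very rough,
untested sketch with invented names and no error handling:

#include <linux/relay.h>
#include <linux/debugfs.h>

/* relay needs these two callbacks so its per-cpu buffer files show up
 * in debugfs (straight from the relay documentation's example). */
static struct dentry *idle_create_buf_file(const char *filename,
					   struct dentry *parent, int mode,
					   struct rchan_buf *buf,
					   int *is_global)
{
	return debugfs_create_file(filename, mode, parent, buf,
				   &relay_file_operations);
}

static int idle_remove_buf_file(struct dentry *dentry)
{
	debugfs_remove(dentry);
	return 0;
}

static struct rchan_callbacks idle_trace_cb = {
	.create_buf_file = idle_create_buf_file,
	.remove_buf_file = idle_remove_buf_file,
};

static struct rchan *idle_trace_chan;

static int idle_trace_init(struct dentry *i915_debugfs_root)
{
	/* 8 sub-buffers of 2MB each -- a few seconds' worth at the
	 * ~10MB/s estimate; tune to taste. */
	idle_trace_chan = relay_open("i915_idle_trace", i915_debugfs_root,
				     2 * 1024 * 1024, 8, &idle_trace_cb, NULL);
	return idle_trace_chan ? 0 : -ENOMEM;
}

static void idle_trace_log(const void *sample, size_t len)
{
	/* Copy a sample into the channel; userspace reads it back from
	 * the per-cpu files relay creates under debugfs. */
	relay_write(idle_trace_chan, sample, len);
}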

A ring buffer with the last N timestamps might also be a good way of
exposing things.  Having more than one entry available means that if
userspace didn't get scheduled at the right time it would still have a
good chance of getting all the data it missed since the last read.
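
Something as dumb as this might do (kernel-side sketch only, names
invented; the sequence number is what lets the reader tell how much it
missed):

#define IDLE_LOG_ENTRIES 1024	/* power of two keeps the index mask cheap */

struct idle_log_entry {
	u64 seqno;	/* monotonically increasing sample number */
	u64 timestamp;	/* e.g. the TIMESTAMP register value */
	u32 instdone;
	u32 instdone1;
};

struct idle_log {
	struct idle_log_entry entries[IDLE_LOG_ENTRIES];
	u64 next_seqno;
};

/* Writer side: overwrite the oldest slot.  A reader that remembers the
 * last seqno it saw knows exactly how many samples it missed.  (A
 * reader would pair the barrier below with smp_rmb().) */
static void idle_log_push(struct idle_log *log, u64 timestamp,
			  u32 instdone, u32 instdone1)
{
	struct idle_log_entry *e =
		&log->entries[log->next_seqno & (IDLE_LOG_ENTRIES - 1)];

	e->timestamp = timestamp;
	e->instdone = instdone;
	e->instdone1 = instdone1;
	smp_wmb();	/* publish the data before the seqno */
	e->seqno = log->next_seqno++;
}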

> 
> Also... I'm not sure how the locking would work if userspace is reading
> out the debugfs file whilst another frame is being executed. (We'd
> probably need to allocate a secondary logging buffer in that case.)

The kernel implementation of the read() side of the file could do some
locking to prevent new data from corrupting a read in progress.
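
For example, if the logger copies a finished frame into an exported
buffer under a mutex, the debugfs read() just takes the same mutex
(sketch, reusing the idle_log struct above):

static struct idle_log idle_log_frame;	/* last completed frame */
static DEFINE_MUTEX(idle_log_lock);	/* also taken by the logger when
					 * it copies a new frame in */

static ssize_t idle_log_read(struct file *file, char __user *ubuf,
			     size_t count, loff_t *ppos)
{
	ssize_t ret;

	mutex_lock(&idle_log_lock);
	ret = simple_read_from_buffer(ubuf, count, ppos, &idle_log_frame,
				      sizeof(idle_log_frame));
	mutex_unlock(&idle_log_lock);

	return ret;
}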

-- 
Jesse Barnes, Intel Open Source Technology Center


* Re: Design of a GPU profiling debug interface
From: Eric Anholt @ 2010-11-09 22:27 UTC
  To: Peter Clifton, intel-gfx


On Sat, 30 Oct 2010 14:04:11 +0100, Peter Clifton <pcjc2@cam.ac.uk> wrote:
> I think I'll need some help with this. I'm by no means a kernel
> programmer, so I'm feeling my way in the dark here.
> 
> I want to design an interface so I can synchronise my GPU idle-flags
> polling with batchbuffer execution. I'm imagining, at a high level, doing
> something like this in my application (or mesa). (Hand-wavy pseudocode.)

Here's a thought.  It ties in a little with something Arjan was asking
for.  Have trace events around batchbuffer submit that report the
timestamp before and after -- you can do this on G45+ at least using
PIPE_CONTROL writes to a temporary BO that the kernel makes for the job.
The idle-bit tracing daemon would ask for those events, then run
(CPU-side) doing the capture of INSTDONEs and the TIMESTAMP register,
and also collecting the perf events.  It could throw out reg captures
that aren't within the start/end of a batchbuffer once perf events beyond
those points arrive, so you don't spew irrelevant data to disk.  I think
we've got the pid of the requester in the trace events, so you could
correlate the records with which app did the rendering.

The tricky part would be that the kernel needs to read the
PIPE_CONTROL-written timestamp values back "some time later", after the
GPU is done with them, and generate the events at that point -- so if you
enabled lots of tracking, you'd see a tracing stream that looks
something like:

i915_gem_request_submit
i915_gem_request_retire
i915_gem_request_started # the new gpu-side timestamp event
i915_gem_request_finished # the new gpu-side timestamp event
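
The new events themselves would just be more entries alongside the
existing ones in i915_trace.h -- something like this, very roughly (the
fields are guesses; where exactly the kernel reads the PIPE_CONTROL
value back is the hard part above):

TRACE_EVENT(i915_gem_request_started,
	    TP_PROTO(struct drm_device *dev, u32 seqno, u64 gpu_timestamp),
	    TP_ARGS(dev, seqno, gpu_timestamp),

	    TP_STRUCT__entry(
			     __field(u32, dev)
			     __field(u32, seqno)
			     __field(u64, gpu_timestamp)
			     ),

	    TP_fast_assign(
			   __entry->dev = dev->primary->index;
			   __entry->seqno = seqno;
			   __entry->gpu_timestamp = gpu_timestamp;
			   ),

	    TP_printk("dev=%u, seqno=%u, gpu_ts=%llu",
		      __entry->dev, __entry->seqno,
		      (unsigned long long)__entry->gpu_timestamp)
);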
