Design of a GPU profiling debug interface

* Design of a GPU profiling debug interface
@ 2010-10-30 13:04 Peter Clifton
  2010-11-09 17:15 ` Jesse Barnes
  2010-11-09 22:27 ` Eric Anholt
  0 siblings, 2 replies; 3+ messages in thread
From: Peter Clifton @ 2010-10-30 13:04 UTC (permalink / raw)
  To: intel-gfx

I think I'll need some help with this. I'm by no means a kernel
programmer, so I'm feeling my way in the dark with this.

I want to design an interface so I can synchronise my GPU idle flags
polling with batchbuffer execution. I'm imagining at a high level, doing
something like this in my application (or mesa). (Hand-wavey-pseudocode)

expose_event_handler ()
{
	static bool one_shot_trace = true;

	if (one_shot_trace)
		mesa_debug_i915_trace_idle (TRUE);

	/* RENDERING COMMANDS IN HERE */
	SwapBuffers();

	if (one_shot_trace)
		mesa_debug_i915_trace_idle (FALSE);

	one_shot_trace = false;
}

I was imagining adding a flag to the EXECBUFFER2 IOCTL, or perhaps
adding a new EXECBUFFER3 IOCTL (which I'm playing with locally now).
Basically I just want to flag execbuffers which I'm interested in seeing
profiling data for.

In order to get really high-resolution profiling, it would be
advantageous to confine it to the time-period of interest otherwise the
data rate is too high. I guestimated about 10MB/s for a binary
representation of the data I'm currently polling in user-space. More
spatial resolution would be nice too, so this could increase.

I think I have a vague idea how to do the GPU and logging parts, even if
I end up having to start the polling before the batchbuffer starts
executing.

What I've got little / no clue how to is manage allocation of memory to
store the results in.

Should userspace (mesa?) be passing buffers for the kernel to return
profiling data? Then retrieving it somehow when it "knows" the
batchbuffer is finished? This will probably require over-allocation with
a guestimate of required memory space to log the given batch-buffer.

What about exporting via debugfs. Assuming the above code-fragment, we
could leave the last "frame" of polled data available, with the data
being overwritten when the next request to start logging comes in.
(That would perhaps require some kind of sequence number assigned if we
have multiple batches which come under the same request... or a separate
IOCTL to turn on / off logging).

Also.. I'm not sure how the locking would work if userspace is reading
out the debugfs file whilst another frame is being executed. (We'd
probably need a secondary logging buffer allocating in that case).

Thoughts?
-- 
Peter Clifton

Electrical Engineering Division,
Engineering Department,
University of Cambridge,
9, JJ Thomson Avenue,
Cambridge
CB3 0FA

Tel: +44 (0)7729 980173 - (No signal in the lab!)
Tel: +44 (0)1223 748328 - (Shared lab phone, ask for me)

^ permalink raw reply	[flat|nested] 3+ messages in thread