* intel_prepare_render(intel); unhelpful?
@ 2010-10-31  1:15 Peter Clifton
  2010-11-01  3:54 ` Eric Anholt
  0 siblings, 1 reply; 7+ messages in thread
From: Peter Clifton @ 2010-10-31  1:15 UTC (permalink / raw)
  To: intel-gfx

Hi guys,

I was just poking around looking for somewhere quick and dirty to shove
my new experimental DRM IOCTL for retrieving IDLE data from the GPU. I
was looking at the various breakpoints in the debugger, and found
intel_prepare_render() being called more often than I'd like.

For instance, in intelClear() we call it - AIUI, flushing rendering
before code execution continues. Nasty ;(

I use glClear many times per frame on the stencil buffer, which always
ends up hitting the 3D engine for the clear (even normal colour buffer
clear seems to hit that path too, as BLIT can't do the correct kind of
tiling IIRC?)

If we can pre-determine we will use the 3D engine, presumably all the
state-changes in _mesa_meta_clear() will end up pipelined?

In a statistically meaningless sample of one benchmark iteration each,
blindly commenting out the intel_prepare_render() call took me from
27.7fps to 29.8fps.



I also noted that I'm hitting a path in intel_prepare_render which
throttles, even with vblank_mode=0. Why does it have to do this?

   if (intel->need_throttle && intel->first_post_swapbuffers_batch) {
      drm_intel_bo_wait_rendering(intel->first_post_swapbuffers_batch);
      drm_intel_bo_unreference(intel->first_post_swapbuffers_batch);
      intel->first_post_swapbuffers_batch = NULL;
      intel->need_throttle = GL_FALSE;
   }


Actually, bypassing it doesn't seem to have much, if any, positive effect
(although I thought I got one the first time I tried it). Never mind.


-- 
Peter Clifton

Electrical Engineering Division,
Engineering Department,
University of Cambridge,
9, JJ Thomson Avenue,
Cambridge
CB3 0FA

Tel: +44 (0)7729 980173 - (No signal in the lab!)
Tel: +44 (0)1223 748328 - (Shared lab phone, ask for me)


* Re: intel_prepare_render(intel); unhelpful?
  2010-10-31  1:15 intel_prepare_render(intel); unhelpful? Peter Clifton
@ 2010-11-01  3:54 ` Eric Anholt
  2010-11-01 19:20   ` Peter Clifton
  2010-11-01 19:52   ` Peter Clifton
  0 siblings, 2 replies; 7+ messages in thread
From: Eric Anholt @ 2010-11-01  3:54 UTC (permalink / raw)
  To: Peter Clifton, intel-gfx



On Sun, 31 Oct 2010 01:15:34 +0000, Peter Clifton <pcjc2@cam.ac.uk> wrote:
> Hi guys,
> 
> I was just poking around looking for somewhere quick and dirty to shove
> my new experimental DRM IOCTL for retrieving IDLE data from the GPU. I
> was looking at the various breakpoints in the debugger, and found
> intel_prepare_render() being called more often than I'd like.
> 
> For instance, in intelClear() we call it - AIUI, flushing rendering
> before code execution continues. Nasty ;(

I don't see flushing in intel_prepare_render() unless you're using
frontbuffer rendering, which you shouldn't be.

> I use glClear many times per frame on the stencil buffer, which always
> ends up hitting the 3D engine for the clear (even normal colour buffer
> clear seems to hit that path too, as BLIT can't do the correct kind of
> tiling IIRC?)
> 
> If we can pre-determine we will use the 3D engine, presumably all the
> state-changes in _mesa_meta_clear() will end up pipelined?
> 
> In a statistically meaningless sample of one benchmark iteration each,
> blindly commenting out the intel_prepare_render() call took me from
> 27.7fps to 29.8fps.
> 
> 
> 
> I also noted that I'm hitting a path in intel_prepare_render which
> throttles, even with vblank_mode=0. Why does it have to do this?

Because otherwise, many apps out there will dump frames out to the GPU
as fast as possible, which will bottleneck, and interactivity of the
application (input reactions happen N frames later) and the X Server (it
can't get any rendering out, because the app has hogged the GPU for the
next few seconds) is ruined.

Now, this version of the code has bothered me, since apps that execute
in one batchbuffer should end up getting overly penalized.  See
intel-throttle-hack of my mesa tree for a possible fix.
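
The trade-off described above can be sketched with a toy timing model. This is only an illustration under stated assumptions, not driver code: the function simulate() and struct sim_result are hypothetical names, each frame is assumed to be exactly one batchbuffer, and the throttle is assumed to make the CPU wait for the previous frame's batch before it starts building the next one.

```c
#include <assert.h>
#include <stdbool.h>

/* Result of simulating a run of frames. Time units are arbitrary. */
struct sim_result {
    long total_time;  /* time at which the GPU finishes the last frame */
    long worst_lag;   /* largest (GPU finish - CPU submit) over all frames */
};

/* Hypothetical model: one batch per frame, GPU executes frames in order.
 * With throttle set, the CPU stalls until the previous frame's batch has
 * completed before it builds the next frame. */
static struct sim_result simulate(int frames, long cpu_cost, long gpu_cost,
                                  bool throttle)
{
    struct sim_result r = { 0, 0 };
    long cpu_time = 0;     /* when the CPU is free to start the next frame */
    long prev_finish = 0;  /* when the GPU finished the previous frame */

    assert(frames >= 0 && cpu_cost >= 0 && gpu_cost >= 0);

    for (int i = 0; i < frames; i++) {
        if (throttle && cpu_time < prev_finish)
            cpu_time = prev_finish;  /* CPU waits for the earlier batch */

        cpu_time += cpu_cost;        /* build and submit this frame */
        long submit = cpu_time;

        /* The GPU starts once it is idle and the batch has arrived. */
        long gpu_start = submit > prev_finish ? submit : prev_finish;
        long finish = gpu_start + gpu_cost;

        if (finish - submit > r.worst_lag)
            r.worst_lag = finish - submit;
        prev_finish = finish;
    }
    r.total_time = prev_finish;
    return r;
}
```

With 10 frames, cpu_cost 1 and gpu_cost 4, the unthrottled run finishes at time 41 but the last frame lags its submission by 31 units, which is the interactivity problem. The throttled run keeps the lag at 4 but finishes at time 50, because the GPU sits idle while the CPU builds each frame. That idle time is the penalty for single-batch applications in this simplified model.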


_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


* Re: intel_prepare_render(intel); unhelpful?
  2010-11-01  3:54 ` Eric Anholt
@ 2010-11-01 19:20   ` Peter Clifton
  2010-11-01 19:52   ` Peter Clifton
  1 sibling, 0 replies; 7+ messages in thread
From: Peter Clifton @ 2010-11-01 19:20 UTC (permalink / raw)
  To: Eric Anholt, intel-gfx

On Sun, 2010-10-31 at 20:54 -0700, Eric Anholt wrote:
> On Sun, 31 Oct 2010 01:15:34 +0000, Peter Clifton <pcjc2@cam.ac.uk> wrote:
> > For instance, in intelClear() we call it - AIUI, flushing rendering
> > before code execution continues. Nasty ;(
> 
> I don't see flushing in intel_prepare_render() unless you're using
> frontbuffer rendering, which you shouldn't be.

Hmm.. no, I'm not rendering to the front buffer - and having re-read the
code, I see what you mean. I'll have to investigate further what is
going on, as I'm sure I noted a change in performance when I removed the
call. Perhaps some extra data is warranted here.

The batchbuffers I'm ending up with aren't small, so perhaps I'm hitting
a penalty at the end of intel_prepare_render() with the throttling.

I think the throttling logic looks suspect too, but I'll fetch and
read / try your new code before I go much more into depth on the
existing implementation. However.. this is my understanding of it:

The comments in the code suggest the intention is to wait on the
"swapbuffers before the one we just emitted", but I think the code does
this instead:


Render frame into back-buffer:
	(CPU: Render commands)
		Reference Batch1 (to be submitted) as first_post_swapbuffers
		Batch1 (After any previous GPU commands)
	(CPU: Render commands)
		Batch2 (After Batch1)
	(CPU: Render commands)
		Batch3 (After Batch2)

Swap buffers:
	(Sets need_throttle flag)
	Swap1 (After Batch3)

Render frame into back-buffer (Queued to wait for Swap1 - ON GPU?):
	(CPU: Render commands)
		First intel_prepare_render call stalls _CPU_ waiting for Batch1 to complete
		Reference Batch4 (to be submitted) as first_post_swapbuffers
		Batch4 (After Swap1)
	(CPU: Render commands)
		Batch5 (After Batch4)
	(CPU: Render commands)
		Batch6 (After Batch5)

Swap buffers:
	(Sets need_throttle flag)
	Swap2 (After Batch6 - ON GPU?)

Render frame into back-buffer:
	(CPU: Render commands)
		First intel_prepare_render call stalls _CPU_ waiting for Batch4 to complete
		Reference Batch7 (to be submitted) as first_post_swapbuffers
		Batch7
	(CPU: Render commands)
		Batch8
	(CPU: Render commands)
		Batch9

....
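
The sequence above can be reproduced with a small state-machine model. This is a sketch under assumptions rather than the driver's code: struct throttle_model and the model_* functions are hypothetical, batches are plain integer ids, and the rule that the first batch submitted while nothing is recorded becomes first_post_swapbuffers_batch is taken from the timeline above, not checked against the Mesa source.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the driver context. Only the two fields from
 * the throttle snippet are modelled; batch id 0 means "none". */
struct throttle_model {
    bool need_throttle;
    int first_post_swapbuffers_batch;
    int next_batch;  /* id the next submitted batch will receive */
};

/* Mirrors the throttle block: returns the batch the CPU would wait on,
 * or 0 when no wait happens. */
static int model_prepare_render(struct throttle_model *m)
{
    int waited = 0;
    if (m->need_throttle && m->first_post_swapbuffers_batch) {
        waited = m->first_post_swapbuffers_batch;
        m->first_post_swapbuffers_batch = 0;
        m->need_throttle = false;
    }
    return waited;
}

/* Assumption: the first batch submitted while no batch is recorded is
 * remembered as first_post_swapbuffers_batch. */
static int model_submit_batch(struct throttle_model *m)
{
    int id = m->next_batch++;
    if (m->first_post_swapbuffers_batch == 0)
        m->first_post_swapbuffers_batch = id;
    return id;
}

static void model_swap_buffers(struct throttle_model *m)
{
    m->need_throttle = true;
}

/* Renders one frame of `batches` batches, then swaps. Returns the batch
 * the CPU stalled on during this frame, or 0 if it never stalled. */
static int model_frame(struct throttle_model *m, int batches)
{
    int waited = 0;
    assert(batches >= 0);
    for (int i = 0; i < batches; i++) {
        int w = model_prepare_render(m);
        if (w != 0)
            waited = w;
        model_submit_batch(m);
    }
    model_swap_buffers(m);
    return waited;
}
```

Starting from { false, 0, 1 } and calling model_frame() three times with 3 batches each, the frames stall on nothing, then Batch1, then Batch4, which matches the timeline: each frame waits on the first batch of the frame before it, not on the swap before the one just emitted.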


* Re: intel_prepare_render(intel); unhelpful?
  2010-11-01  3:54 ` Eric Anholt
  2010-11-01 19:20   ` Peter Clifton
@ 2010-11-01 19:52   ` Peter Clifton
  2010-11-01 20:17     ` Eric Anholt
  1 sibling, 1 reply; 7+ messages in thread
From: Peter Clifton @ 2010-11-01 19:52 UTC (permalink / raw)
  To: Eric Anholt, intel-gfx

On Sun, 2010-10-31 at 20:54 -0700, Eric Anholt wrote:

> Now, this version of the code has bothered me, since apps that execute
> in one batchbuffer should end up getting overly penalized.  See
> intel-throttle-hack of my mesa tree for a possible fix.

I like it! I still can't quite figure out what synchronisation issue I
was running into with my app though. With a single wait_for_rendering /
synchronisation per frame, I can't quite contrive how the GPU would get
stalled at all during a sequence of consecutive frames.


* Re: intel_prepare_render(intel); unhelpful?
  2010-11-01 19:52   ` Peter Clifton
@ 2010-11-01 20:17     ` Eric Anholt
       [not found]       ` <1288645484.2714.2.camel@pcjc2lap>
  0 siblings, 1 reply; 7+ messages in thread
From: Eric Anholt @ 2010-11-01 20:17 UTC (permalink / raw)
  To: Peter Clifton, intel-gfx



On Mon, 01 Nov 2010 19:52:58 +0000, Peter Clifton <pcjc2@cam.ac.uk> wrote:
> On Sun, 2010-10-31 at 20:54 -0700, Eric Anholt wrote:
> 
> > Now, this version of the code has bothered me, since apps that execute
> > in one batchbuffer should end up getting overly penalized.  See
> > intel-throttle-hack of my mesa tree for a possible fix.
> 
> I like it! I still can't quite figure out what synchronisation issue I
> was running into with my app though. With a single wait_for_rendering /
> synchronisation per frame, I can't quite contrive how the GPU would get
> stalled at all during a sequence of consecutive frames.

I like it too, except for the whole "not just no performance
improvement, but actually penalty on everything I've tried" thing.


* Re: intel_prepare_render(intel); unhelpful?
       [not found]         ` <874oc0203e.fsf@pollan.anholt.net>
@ 2010-11-01 22:19           ` Peter Clifton
  2010-11-02 15:40             ` Eric Anholt
  0 siblings, 1 reply; 7+ messages in thread
From: Peter Clifton @ 2010-11-01 22:19 UTC (permalink / raw)
  To: Eric Anholt, intel-gfx

On Mon, 2010-11-01 at 14:41 -0700, Eric Anholt wrote:
> > I'm going to look at the case I "think" I hit an improvement for and
> > dissect _why_, then get back to you.

I'll check this again shortly.. (I recall I was testing this with the
display lists anyway)..
 
> > I'm chasing my code right now to see why it is emitting lots of batches
> > when not using display lists for benchmarking purposes. I got my figures
> > muddled up before.. I'm seeing 5 batches / frame when using Display
> > lists, and nearer 40 when not. (I previously reported the other way
> > around).
> 
> I'd love to know too.  INTEL_DEBUG=state (in the midst of much other
> spam) dumps out a report of how many times various state changes got
> flagged, which may highlight a change between the two modes.

The large number of batches was due to a dumb dumb thing I was doing
with VBOs.. rather than just discarding the memory after rendering some
primitives, I was mapping the same VBO and re-uploading, causing
synchronisation.

Actually, I had two VBOs and was alternating between them, but was still
of course causing synchronisation at the map stage. Fixed now, so my non
display-list code is much faster again.
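
Why alternating between two VBOs still synchronises can be seen in a toy model. This is only a sketch: count_map_stalls() is a hypothetical function, the costs are arbitrary time units, and it counts CPU waits only, not the batch flushes a real driver would also issue. The thread does not say exactly how the memory was discarded in the fix; re-specifying the store with glBufferData and a NULL data pointer before mapping is one common way, and is mentioned here purely as an assumption.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_VBOS 2

/* Hypothetical model: the CPU alternates between NUM_VBOS buffers, maps
 * one, uploads vertices (cpu_cost), then queues a draw that reads the
 * buffer for gpu_cost time units. Mapping a buffer the GPU has not
 * finished reading blocks the CPU. With discard_before_map set, the
 * buffer is given fresh storage first, so nothing is pending on it. */
static int count_map_stalls(int draws, long cpu_cost, long gpu_cost,
                            bool discard_before_map)
{
    long busy_until[NUM_VBOS] = { 0 };  /* when the GPU stops reading each */
    long cpu_time = 0;
    long gpu_free = 0;
    int stalls = 0;

    assert(draws >= 0);

    for (int i = 0; i < draws; i++) {
        int vbo = i % NUM_VBOS;

        if (discard_before_map)
            busy_until[vbo] = 0;         /* fresh storage, no pending reads */

        if (busy_until[vbo] > cpu_time) {
            cpu_time = busy_until[vbo];  /* map blocks until the GPU is done */
            stalls++;
        }
        cpu_time += cpu_cost;            /* upload the new vertices */

        /* The draw runs once the GPU is idle and the upload has finished. */
        long start = cpu_time > gpu_free ? cpu_time : gpu_free;
        gpu_free = start + gpu_cost;
        busy_until[vbo] = gpu_free;
    }
    return stalls;
}
```

With 10 draws, cpu_cost 1 and gpu_cost 4, the model stalls on 8 of the 10 maps when the two buffers are simply reused, and on none when the storage is discarded first. Two buffers only hide the wait while the CPU is less than two draws ahead of the GPU.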

I guess it kind of begs the question why the compiled display list needs
4 or 5 batches to do what my own code manages in 1.


* Re: intel_prepare_render(intel); unhelpful?
  2010-11-01 22:19           ` Peter Clifton
@ 2010-11-02 15:40             ` Eric Anholt
  0 siblings, 0 replies; 7+ messages in thread
From: Eric Anholt @ 2010-11-02 15:40 UTC (permalink / raw)
  To: Peter Clifton, intel-gfx



On Mon, 01 Nov 2010 22:19:34 +0000, Peter Clifton <pcjc2@cam.ac.uk> wrote:
> On Mon, 2010-11-01 at 14:41 -0700, Eric Anholt wrote:
> > > I'm going to look at the case I "think" I hit an improvement for and
> > > dissect _why_, then get back to you.
> 
> I'll check this again shortly.. (I recall I was testing this with the
> display lists anyway)..
>  
> > > I'm chasing my code right now to see why it is emitting lots of batches
> > > when not using display lists for benchmarking purposes. I got my figures
> > > muddled up before.. I'm seeing 5 batches / frame when using Display
> > > lists, and nearer 40 when not. (I previously reported the other way
> > > around).
> > 
> > I'd love to know too.  INTEL_DEBUG=state (in the midst of much other
> > spam) dumps out a report of how many times various state changes got
> > flagged, which may highlight a change between the two modes.
> 
> The large number of batches was due to a dumb dumb thing I was doing
> with VBOs.. rather than just discarding the memory after rendering some
> primitives, I was mapping the same VBO and re-uploading, causing
> synchronisation.
> 
> Actually, I had two VBOs and was alternating between them, but was still
> of course causing synchronisation at the map stage. Fixed now, so my non
> display-list code is much faster again.
> 
> I guess it kind of begs the question why the compiled display list needs
> 4 or 5 batches to do what my own code manages in 1.

Display lists are an awful, deprecated feature of GL.  The solution to
them being inefficient is to not use them :)

