From: John Harrison
Subject: Re: [RFC 15/44] drm/i915: Added deferred work handler for scheduler
Date: Thu, 24 Jul 2014 16:42:55 +0100
Message-ID: <53D1297F.70604@Intel.com>
References: <1403803475-16337-1-git-send-email-John.C.Harrison@Intel.com>
 <1403803475-16337-16-git-send-email-John.C.Harrison@Intel.com>
 <20140707191435.GD5821@phenom.ffwll.local>
 <53CFD6C2.2010600@Intel.com>
To: Daniel Vetter
Cc: intel-gfx
List-Id: intel-gfx@lists.freedesktop.org

On 23/07/2014 19:50, Daniel Vetter wrote:
> On Wed, Jul 23, 2014 at 5:37 PM, John Harrison
> wrote:
>>>> diff --git a/drivers/gpu/drm/i915/i915_drv.h
>>>> b/drivers/gpu/drm/i915/i915_drv.h
>>>> index 0977653..fbafa68 100644
>>>> --- a/drivers/gpu/drm/i915/i915_drv.h
>>>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>>>> @@ -1075,6 +1075,16 @@ struct i915_gem_mm {
>>>>           struct delayed_work idle_work;
>>>>           /**
>>>> +          * New scheme is to get an interrupt after every work packet
>>>> +          * in order to allow the low latency scheduling of pending
>>>> +          * packets. The idea behind adding new packets to a pending
>>>> +          * queue rather than directly into the hardware ring buffer
>>>> +          * is to allow high priority packets to overtake low priority
>>>> +          * ones.
>>>> +          */
>>>> +         struct work_struct scheduler_work;
>>> Latency for work items isn't too awesome, and e.g. Oscar's execlist code
>>> latches the next context right away from the irq handler. Why can't we do
>>> something similar for the scheduler? Fishing the next item out of a
>>> priority queue shouldn't be expensive ...
>>> -Daniel
>>
>> The problem is that taking batch buffers from the scheduler's queue and
>> submitting them to the hardware requires lots of processing that is not
>> IRQ compatible. It isn't just a simple register write. Half of the code
>> in 'i915_gem_do_execbuffer()' must be executed. It could probably be made
>> IRQ friendly, but that would place a lot of restrictions on a lot of code
>> that currently doesn't expect to be restricted. Instead, the submission
>> is done via a work handler that acquires the driver mutex lock.
>>
>> In order to cover the extra latency, the scheduler operates in a
>> multi-buffered mode and aims to keep eight batch buffers in flight at
>> all times. That number was obtained empirically by running lots of
>> benchmarks on Android with lots of different settings and seeing where
>> the buffer size stopped making a difference.
> So I've tried to stitch together that part of the scheduler from the
> patch series. Afaics you do the actual scheduling under the protection
> of irqsave spinlocks (well, you also hold the dev->struct_mutex). That
> means you disable local interrupts. Up to the actual submit point I
> spotted two such critical sections encompassing pretty much all the
> code.
>
> If we'd run the same code from the interrupt handler then only our own
> interrupt handler is blocked; all other interrupt processing can
> continue. So that's actually a lot nicer than what you have.
> In any case you can't do expensive operations under an irqsave spinlock
> anyway.
>
> So either I've missed something big here, or this justification doesn't
> hold up.
> -Daniel

The irqsave spinlock is only held while manipulating the internal
scheduler data structures. It is released immediately prior to calling
i915_gem_do_execbuffer_final(), so the actual submission code path runs
with the driver mutex held but no spinlocks. I'm pretty sure I got
'scheduling while atomic' bug checks the one time I accidentally left
the spinlock held. In outline, the split looks like the sketch below.
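A minimal sketch of what the work handler does; this is illustrative
rather than the exact code from the series. The 'scheduler_lock' name,
the node type and the sched_pop_highest_ready() helper are placeholders
I've made up for this mail; only mm.scheduler_work and
i915_gem_do_execbuffer_final() come from the actual patches.

/* Illustrative sketch only - names marked below are placeholders. */
static void i915_scheduler_work_handler(struct work_struct *work)
{
	struct drm_i915_private *dev_priv =
		container_of(work, struct drm_i915_private,
			     mm.scheduler_work);
	struct drm_device *dev = dev_priv->dev;
	unsigned long flags;

	mutex_lock(&dev->struct_mutex);

	for (;;) {
		struct scheduler_node *node;	/* placeholder type */

		/*
		 * The irqsave spinlock covers only the manipulation of
		 * the scheduler's own queue structures: pop the highest
		 * priority ready node, then drop the lock immediately.
		 */
		spin_lock_irqsave(&dev_priv->scheduler_lock, flags);
		node = sched_pop_highest_ready(dev_priv);
		spin_unlock_irqrestore(&dev_priv->scheduler_lock, flags);

		if (!node)
			break;

		/*
		 * All the expensive submission work (the second half of
		 * the execbuffer path) happens here in process context,
		 * holding the driver mutex but no spinlock, so it is
		 * free to sleep.
		 */
		i915_gem_do_execbuffer_final(node->params);
	}

	mutex_unlock(&dev->struct_mutex);
}

The point is that the critical sections under the spinlock are just the
queue manipulations; everything that can sleep runs under the mutex
alone.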