From mboxrd@z Thu Jan 1 00:00:00 1970 From: Daniel Vetter Subject: Re: [PATCH 1/2] drm/i915: fix up semaphore_waits_for Date: Tue, 25 Mar 2014 10:42:09 +0100 Message-ID: <20140325094209.GN26878@phenom.ffwll.local> References: <1394838536-5372-1-git-send-email-daniel.vetter@ffwll.ch> <87vbv7v4l4.fsf@gaia.fi.intel.com> <20140322175225.GG26519@phenom.ffwll.local> <20140324223711.GA2082@bwidawsk.net> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Return-path: Received: from mail-ee0-f48.google.com (mail-ee0-f48.google.com [74.125.83.48]) by gabe.freedesktop.org (Postfix) with ESMTP id 4F8836E0D0 for ; Tue, 25 Mar 2014 02:42:14 -0700 (PDT) Received: by mail-ee0-f48.google.com with SMTP id b57so206102eek.7 for ; Tue, 25 Mar 2014 02:42:13 -0700 (PDT) Content-Disposition: inline In-Reply-To: <20140324223711.GA2082@bwidawsk.net> List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" To: Ben Widawsky Cc: Intel Graphics Development , Daniel Vetter List-Id: intel-gfx@lists.freedesktop.org On Mon, Mar 24, 2014 at 03:37:12PM -0700, Ben Widawsky wrote: > On Sat, Mar 22, 2014 at 06:52:25PM +0100, Daniel Vetter wrote: > > On Fri, Mar 21, 2014 at 07:33:59PM +0200, Mika Kuoppala wrote: > > > Hi, > > > > > > Daniel Vetter writes: > > > > > > > There's an entire pile of issues in here: > > > > > > > > - Use the main RING_HEAD register, not ACTHD. ACTHD points at the gtt > > > > offset of the batch buffer when a batch is executed. Semaphores are > > > > always emitted to the main ring, so we always want to look at that. > > > > > > The ipehr check should make sure that we are at the ring and > > > acthd should not be at batch. > > > > > > Regardless I agree that RING_HEAD is much more safer. When I have > > > tried to get bottom of the snb reset hang, I have noticed that > > > after reset the acthd is sometimes 0x0c even tho head is 0x00, > > > on snb. > > > > Hm, should we maybe mask out the lower bits, too? If you can confirm this, > > can you please add a follow-up patch? > > > > > > - Mask the obtained HEAD pointer with the actual ring size, which is > > > > much smaller. Together with the above issue this resulted us in > > > > trying to dereference a pointer way outside of the ring mmio > > > > mapping. The resulting invalid access in interrupt context > > > > (hangcheck is executed from timers) lead to a full blown kernel > > > > panic. The fbcon panic handler then tried to frob our driver harder, > > > > resulting in a full machine hang at least on my snb here where I've > > > > stumbled over this. > > > > > > > > - Handle ring wrapping correctly and be a bit more explicit about how > > > > many dwords we're scanning. We probably should also scan more than > > > > just 4 ... > > > > > > ipehr dont update on MI_NOOPS and the head should point to > > > the extra MI_NOOP after semaphore sequence. So 4 should be > > > enough. Perhaps revisit this when bdw gains semaphores. > > > > Yeah, Chris also mentioned that the HEAD should point at one dword past > > the end of the ring, so should even work when there are no MI_NOOPs. In > > any case this code is more robust in case hw engineers suddenly change the > > behaviour. > > > > > > - Space out some of teh computations for readability. > > > > > > > > This prevents hard-hangs on my snb here. Verdict from QA is still > > > > pending, but the symptoms match. > > > > > > > > Cc: Mika Kuoppala > > > > Cc: Ben Widawsky > > > > Cc: Chris Wilson > > > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=74100 > > > > > > The hard hangs are not so frequent with this patch but > > > they are still there. This patch should take care of > > > the oops we have been seeing, but there is still > > > something else to be found until #74100 can be marked as > > > fixed. > > > > > > Very often after reset, when igt has pushed the quietence > > > batches into rings, blitter and video ring heads gets > > > moved properly but all updates to hardware status page and to > > > the sync registers are missing. And result is hang by hangcheck. > > > Repeat enough times and it is a hard hang. > > > > > > > Signed-off-by: Daniel Vetter > > > > > > Please remove the blowup comment and also update the > > > bugzilla tag. > > > > Yeah, QA also says that it doesn't fix the hard hangs, only seems to > > reduce them a bit on certain setups. I've updated the commit message. > > > > btw are you testing with FBDEV=n? The lack of a fbcon panic handler should > > greatly increase the chances that the last few message get correctly > > transmitted through other means like netconsole. > > > > > Patches 1-2 and the followup one are > > > > > > Reviewed-by: Mika Kuoppala > > > > Thanks for the review, patches merged. > > -Daniel > > Patch 2 was trivial to implement for gen8. This functionality is a lot > less trivial. I volunteered to do patch 2, are you going to port this to > gen8? If you fill out the bdw pieces for patch 2&3 the only thing you need to change here is to adjsut the number of dwords we walk backwards to make sure that the bdw semaphore cmd still fits. Or at least that's been my idea behind all this, maybe I've overlooked something. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch