From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
    id S1751096AbaLOF5x (ORCPT ); Mon, 15 Dec 2014 00:57:53 -0500
Received: from mx1.redhat.com ([209.132.183.28]:52443 "EHLO mx1.redhat.com"
    rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
    id S1750736AbaLOF5v (ORCPT ); Mon, 15 Dec 2014 00:57:51 -0500
Date: Mon, 15 Dec 2014 00:57:07 -0500
From: Dave Jones
To: Linus Torvalds
Cc: Chris Mason, Mike Galbraith, Ingo Molnar, Peter Zijlstra,
    =?iso-8859-1?Q?D=E2niel?= Fraga, Sasha Levin, "Paul E. McKenney",
    Linux Kernel Mailing List, Suresh Siddha, Oleg Nesterov, Peter Anvin
Subject: Re: frequent lockups in 3.18rc4
Message-ID: <20141215055707.GA26225@redhat.com>
Mail-Followup-To: Dave Jones, Linus Torvalds, Chris Mason, Mike Galbraith,
    Ingo Molnar, Peter Zijlstra, =?iso-8859-1?Q?D=E2niel?= Fraga,
    Sasha Levin, "Paul E. McKenney", Linux Kernel Mailing List,
    Suresh Siddha, Oleg Nesterov, Peter Anvin
References: <20141211145408.GB16800@redhat.com>
    <20141212185454.GB4716@redhat.com>
    <20141213165915.GA12756@redhat.com>
    <20141213223616.GA22559@redhat.com>
    <20141214234654.GA396@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.5.23 (2014-03-12)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Sun, Dec 14, 2014 at 09:47:26PM -0800, Linus Torvalds wrote:

 > so it's always in __do_page_fault, but at sometimes it has gotten into
 > handle_mm_fault too. So it really really looks like it is taking an
 > endless stream of page faults on that "xsaveq" instruction. Presumably
 > the page faulting never actually makes any progress, even though it
 > *thinks* the page tables are fine.
 >
 > DaveJ - you've seen that "endless page faults" behavior before. You
 > had a few traces that showed it. That was in that whole "pipe/page
 > fault oddness." email thread, where you would get endless faults in
 > copy_page_to_iter() with an error_code=0x2.
 >
 > That was the one where I chased it down to "page table entry must be
 > marked with _PAGE_PROTNONE", but VM_WRITE in the vma, because your
 > machine was alive enough that you got traces out of the endless loop.

We had a flashback to that old bug last month too.
See this mail & your followup. : https://lkml.org/lkml/2014/11/25/1171
That was during a bisect though, so may have been something entirely different,
but it is a spooky coincidence.

    Dave