From: Dave Jones
To: Linus Torvalds
Cc: Chris Mason, Mike Galbraith, Ingo Molnar, Peter Zijlstra,
    Dâniel Fraga, Sasha Levin, "Paul E. McKenney",
    Linux Kernel Mailing List
Subject: Re: frequent lockups in 3.18rc4
Date: Fri, 5 Dec 2014 14:37:50 -0500
Message-ID: <20141205193750.GB8251@redhat.com>
References: <20141201230339.GA20487@ret.masoncoding.com>
    <1417529606.3924.26.camel@maggy.simpson.net>
    <1417540493.21136.3@mail.thefacebook.com>
    <20141203184111.GA32005@redhat.com>
    <20141205171501.GA1320@redhat.com>
    <20141205184808.GA2753@redhat.com>

On Fri, Dec 05, 2014 at 11:31:11AM -0800, Linus Torvalds wrote:
 > On Fri, Dec 5, 2014 at 10:48 AM, Dave Jones wrote:
 > >
 > > In the meantime, I rebooted into the same kernel, and ran trinity
 > > solely doing the lsetxattr syscalls.
 >
 > Any particular reason for the lsetxattr guess? Just the last call
 > chain? I don't recognize it from the other traces, but maybe I just
 > didn't notice.

yeah just a wild guess, just that that trace looked so.. clean.

 > > The load was a bit lower, so I
 > > cranked up the number of child processes to 512, and then this
 > > happened..
 >
 > Ugh. "dump_trace()" being broken and looping forever? I don't actually
 > believe it, because this isn't even on the exception stack (well, the
 > NMI dumper is, but that one worked fine - this is the "nested" dumping
 > of just the allocation call chain)
 >
 > Smells like more random callchains to me. Unless this one is repeatable.
 >
 > Limiting trinity to just lsetxattr is interesting. Did it make things
 > fail faster?

It sure failed quickly, but not in the "machine is totally locked up"
sense, just "shit is all corrupted" sense. So it might be a completely
different thing, or it could be a different manifestation of a corruptor.

I guess we'll see how things go now that I marked it 'bad'.
I'll give it a quick run with just lsetxattr again just to see what
happens, but before I leave this one run over the weekend, I'll switch
it back to "do everything", and pick it up again on Monday.

	Dave