From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753259AbaKRCjl (ORCPT ); Mon, 17 Nov 2014 21:39:41 -0500
Received: from mx1.redhat.com ([209.132.183.28]:40829 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752483AbaKRCjj (ORCPT ); Mon, 17 Nov 2014 21:39:39 -0500
Date: Mon, 17 Nov 2014 21:39:30 -0500
From: Dave Jones
To: Linus Torvalds
Cc: Linux Kernel, the arch/x86 maintainers
Subject: Re: frequent lockups in 3.18rc4
Message-ID: <20141118023930.GA2871@redhat.com>
Mail-Followup-To: Dave Jones, Linus Torvalds, Linux Kernel, the arch/x86 maintainers
References: <20141114213124.GB3344@redhat.com> <20141115213405.GA31971@redhat.com> <20141116014006.GA5016@redhat.com> <20141117170359.GA1382@redhat.com> <20141118020959.GA2091@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.5.23 (2014-03-12)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Nov 17, 2014 at 06:21:08PM -0800, Linus Torvalds wrote:
 > On Mon, Nov 17, 2014 at 6:09 PM, Dave Jones wrote:
 > >
 > > After wasting countless hours rolling back to Fedora 20 and gcc 4.8.1,
 > > I saw the exact same trace on 3.17, so now I don't know what to think.
 >
 > Uhhuh.
 >
 > Has anything else changed? New trinity tests? If it has happened in as
 > little as ten minutes, and you don't recall having seen this until
 > about a week ago, it does sound like something changed.

Looking at the trinity commits over the last month or so, there's a few
new things, but nothing that sounds like it would trip up a bug like this.
"generate random ascii strings" and "mess with fcntl's after opening fd's
on startup" being the stand-outs. Everything else is pretty much cleanups
and code-motion.

There was a lot of work on the code that tracks mmaps about a month ago,
but that shouldn't have had any visible runtime differences.

hm, something I changed not that long ago, which I didn't commit yet,
was that it now runs more child processes than it used to (was 64, now 256).
I've been running like that for a while though. I want to say that was
before .17, but I'm not 100% sure.

So it could be that I'm just generating a lot more load now.
I could drop that back down and see if it 'goes away' or at least
happens less, but it strikes me that there's something here that needs
fixing regardless.

	Dave