From: Linus Torvalds
To: Chris Mason
Cc: Dave Jones, Mike Galbraith, Ingo Molnar, Peter Zijlstra, Dâniel Fraga, Sasha Levin, "Paul E. McKenney", Linux Kernel Mailing List, Suresh Siddha, Oleg Nesterov, Peter Anvin
Date: Thu, 18 Dec 2014 10:54:50 -0800
Subject: Re: frequent lockups in 3.18rc4
In-Reply-To: <1418918059.17358.6@mail.thefacebook.com>
References: <20141213165915.GA12756@redhat.com> <20141213223616.GA22559@redhat.com>
 <20141214234654.GA396@redhat.com> <20141215055707.GA26225@redhat.com>
 <20141218051327.GA31988@redhat.com> <1418918059.17358.6@mail.thefacebook.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Dec 18, 2014 at 7:54 AM, Chris Mason wrote:
>
> CPU 2 seems to be the one making the least progress. I think he's calling
> fork and then trying to allocate a debug object for his hrtimer, eventually
> wandering into fill_pool from __debug_object_init():

Good call. I agree - fill_pool() seems to be just plain nasty.

We've had this bug before, btw - a *loong* time ago in the original kmalloc stuff. You really should not fill a pool of memory that way. It's fundamentally wrong to fill a pool and then (later - after having released and re-acquired the lock) allocate from the pool. Somebody else will steal the allocations you did, and take advantage of your work.
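To make the "somebody else will steal the allocations" failure mode concrete, here is a minimal userspace sketch in C. It is not the kernel's debugobjects code; `struct obj`, `obj_alloc()`, `obj_free()` and the `spares` parameter are hypothetical names for illustration. Objects are still allocated with the lock dropped, but the object the caller actually needs is never published to the shared pool, so no other thread can take it between the unlock and the relock. Only the extra objects are donated.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical object type and pool (userspace sketch, not kernel code). */
struct obj { struct obj *next; };

static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;
static struct obj *pool_head;
static int pool_count;

/* Caller must hold pool_lock. Returns NULL if the pool is empty. */
static struct obj *pool_take_locked(void)
{
    struct obj *o = pool_head;
    if (o) {
        pool_head = o->next;
        pool_count--;
    }
    return o;
}

/* Caller must hold pool_lock. */
static void pool_put_locked(struct obj *o)
{
    o->next = pool_head;
    pool_head = o;
    pool_count++;
}

/*
 * Fragile pattern (what the mail warns about): refill the shared pool with
 * the lock dropped, re-acquire it, then allocate "from the pool". Another
 * thread can drain the pool in between, so this caller may never get one.
 *
 * Safer pattern shown here: keep our own object private and only donate
 * the spares to the shared pool.
 */
static struct obj *obj_alloc(int spares)
{
    struct obj *mine, *local = NULL;
    int i;

    pthread_mutex_lock(&pool_lock);
    mine = pool_take_locked();
    pthread_mutex_unlock(&pool_lock);
    if (mine)
        return mine;

    /* Pool is empty: allocate without holding the lock. */
    mine = malloc(sizeof(*mine));
    if (!mine)
        return NULL;

    /* Build spares on a private list first... */
    for (i = 0; i < spares; i++) {
        struct obj *o = malloc(sizeof(*o));
        if (!o)
            break;
        o->next = local;
        local = o;
    }

    /* ...then donate them in one locked section. `mine` never enters the
     * shared pool, so nobody can steal it from us. */
    pthread_mutex_lock(&pool_lock);
    while (local) {
        struct obj *o = local;
        local = o->next;
        pool_put_locked(o);
    }
    pthread_mutex_unlock(&pool_lock);
    return mine;
}

/* Return an object to the shared pool. */
static void obj_free(struct obj *o)
{
    pthread_mutex_lock(&pool_lock);
    pool_put_locked(o);
    pthread_mutex_unlock(&pool_lock);
}
```

The first call on an empty pool returns a private object and leaves `spares` objects behind for later callers; subsequent calls are served straight from the pool.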

The high/low watermarks are done completely wrong for that thing too - if things fall below a minimum level, you want to try to make sure it grows clearly past the minimum, so that you don't get stuck just around the minimum. But you need to spread out the pain, rather than make one unlucky allocator have to do all the work.

> It might be fun to run with CONFIG_DEBUG_OBJECTS off...Linus' patch clearly
> helped, I think we're off in a different bug now.

I'm not sure it was my patch. I'm wondering if it's because Dave still has preemption off, and the backtraces look different (and better) as a result.

But yes, trying with DEBUG_OBJECTS off might be a good idea. It's entirely possible that the debug code is actually triggering bugs of its own, rather than showing other people's bugs.

Linus
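One way to read the watermark advice is as hysteresis plus bounded per-call work. The single-threaded C sketch below is illustrative only: `POOL_LOW`, `POOL_HIGH`, `POOL_BATCH`, `struct pool` and `add_one()` are hypothetical, and real code would run this under the pool lock. Refilling starts when the pool drops below the low mark and keeps going until it reaches the high mark (clearly past the minimum), while each allocation contributes at most `POOL_BATCH` objects so no single caller does all the work.

```c
#include <stdbool.h>

/* Hypothetical tuning knobs for this sketch. */
#define POOL_LOW    16  /* below this, start refilling */
#define POOL_HIGH   64  /* keep refilling until we reach this */
#define POOL_BATCH   4  /* most objects any single caller adds per call */

struct pool {
    int level;       /* objects currently in the pool */
    bool refilling;  /* hysteresis state: set below LOW, cleared at HIGH */
};

/* Hypothetical stand-in for allocating one object into the pool. */
static bool add_one(struct pool *p)
{
    p->level++;
    return true;
}

/* Spread the refill cost: each caller adds at most POOL_BATCH objects. */
static void pool_maybe_refill(struct pool *p)
{
    int i;

    if (p->level < POOL_LOW)
        p->refilling = true;
    if (!p->refilling)
        return;

    for (i = 0; i < POOL_BATCH && p->level < POOL_HIGH; i++)
        if (!add_one(p))
            break;

    if (p->level >= POOL_HIGH)
        p->refilling = false;
}

/* Take one object, doing a small share of refill work first. */
static bool pool_get(struct pool *p)
{
    pool_maybe_refill(p);
    if (p->level == 0)
        return false;
    p->level--;
    return true;
}
```

Starting from 15 objects, each `pool_get()` call nets +3 until the pool reaches `POOL_HIGH`; refilling then stops and does not resume until the level falls below `POOL_LOW` again, so the pool does not hover right at the minimum.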