From: Dave Jones
Date: Fri, 26 Dec 2014 13:12:04 -0500
To: Linus Torvalds, Thomas Gleixner, Chris Mason, Mike Galbraith,
    Ingo Molnar, Peter Zijlstra, Dâniel Fraga, Sasha Levin,
    "Paul E. McKenney", Linux Kernel Mailing List, Suresh Siddha,
    Oleg Nesterov, Peter Anvin, John Stultz
Subject: Re: frequent lockups in 3.18rc4
Message-ID: <20141226181204.GA26527@codemonkey.org.uk>
In-Reply-To: <20141226163410.GA25161@codemonkey.org.uk>
References: <20141221223204.GA9618@codemonkey.org.uk>
    <20141222225725.GA8140@codemonkey.org.uk>
    <20141224030125.GA8725@codemonkey.org.uk>
    <20141226163410.GA25161@codemonkey.org.uk>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Dec 26, 2014 at
11:34:10AM -0500, Dave Jones wrote:

 > One thing I think I'll try is to try and narrow down which
 > syscalls are triggering those "Clocksource hpet had cycles off"
 > messages. I'm still unclear on exactly what is doing
 > the stomping on the hpet.

First I ran trinity with "-g vm", which limits it to a subset of
syscalls, specifically the VM-related ones. That triggered the messages.
Further experiments revealed:

-c mremap triggered it, but only when I also passed -C256 to crank up
the number of child processes. The same thing occurred with mprotect,
madvise, and remap_file_pages.

I couldn't trigger it with -c mmap, or with msync, mbind, move_pages,
migrate_pages, or mlock, regardless of how many child processes there were.

Given the high child count necessary to trigger it, it's nigh on
impossible to weed through all the calls that trinity made to figure
out which one actually triggered the messages.

I'm not even convinced that the syscall parameters are particularly
interesting. The "needs high load to trigger" aspect of the bug still
has a smell of scheduler interaction or a side effect of lock contention.
Looking at one child's syscall params in isolation might look quite dull,
but if we have N processes hammering on the same mapping, that's
probably a lot more interesting.

	Dave
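
For illustration only, here is a minimal sketch of the "N processes
hammering on the same mapping" pattern described above. It is not
trinity and is not claimed to reproduce the hpet messages. The function
name hammer_shared_mapping, the choice of madvise(MADV_DONTNEED), and
all parameter values are assumptions made for this example; trinity
itself randomises both the syscalls and their arguments. It assumes
Linux and Python 3.8 or later.

```python
import mmap
import os

PAGE = mmap.PAGESIZE


def hammer_shared_mapping(n_children=8, iterations=1000, n_pages=16):
    """Fork n_children processes that all touch and madvise() one mapping.

    Returns the children's exit codes (0 means the child finished cleanly,
    -1 means it was killed by a signal).
    """
    # fileno=-1 gives an anonymous mapping; the default flags are MAP_SHARED,
    # so every forked child operates on the same underlying pages.
    shared = mmap.mmap(-1, n_pages * PAGE)

    pids = []
    for _ in range(n_children):
        pid = os.fork()
        if pid == 0:
            # Child: dirty a page, then ask the kernel to drop it, repeatedly.
            code = 0
            try:
                for i in range(iterations):
                    offset = (i % n_pages) * PAGE
                    shared[offset] = i & 0xFF
                    shared.madvise(mmap.MADV_DONTNEED, offset, PAGE)
            except Exception:
                code = 1
            finally:
                # Leave without running the parent's cleanup handlers.
                os._exit(code)
        pids.append(pid)

    # Parent: collect every child's exit status.
    codes = []
    for pid in pids:
        _, status = os.waitpid(pid, 0)
        codes.append(os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1)

    shared.close()
    return codes


if __name__ == "__main__":
    print(hammer_shared_mapping())
```

Raising n_children plays the role that -C256 played in the trinity
runs: any single child's calls are dull on their own, and only the
concurrent access to the same pages creates the contention.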