Date: Thu, 27 Sep 2012 07:18:15 +0200
From: Borislav Petkov
To: Mike Galbraith
Cc: Linus Torvalds, Peter Zijlstra, Mel Gorman, Nikolay Ulyanitsky,
    linux-kernel@vger.kernel.org, Andreas Herrmann, Andrew Morton,
    Thomas Gleixner, Ingo Molnar, Suresh Siddha
Subject: Re: 20% performance drop on PostgreSQL 9.2 from kernel 3.5.3 to 3.6-rc5 on AMD chipsets - bisected
Message-ID: <20120927051815.GA1075@liondog.tnic>
References: <1348538258.7100.23.camel@marge.simpson.net>
 <1348574286.3881.40.camel@twins>
 <20120925131736.GA30652@x1.osrc.amd.com>
 <20120925170058.GC30158@x1.osrc.amd.com>
 <20120926163233.GA5339@x1.osrc.amd.com>
 <20120926213723.GA27692@liondog.tnic>
 <1348722568.7059.115.camel@marge.simpson.net>
In-Reply-To: <1348722568.7059.115.camel@marge.simpson.net>

On Thu, Sep 27, 2012 at 07:09:28AM +0200, Mike Galbraith wrote:
> > The way I understand it is, you either want to share L2 with a process,
> > because, for example, both working sets fit in the L2 and/or there's
> > some sharing which saves you moving everything over the L3. This is
> > where selecting a core on the same L2 is actually a good thing.
> Yeah, and if the wakee can't get to the L2 hot data instantly, it may be
> better to let wakee drag the data to an instantly accessible spot.

Yep, then moving it to another L2 amounts to the same thing.

[ … ]

> > A crazy thought: one could go and sample tasks while running their
> > timeslices with the perf counters to know exactly what type of workload
> > we're looking at. I.e., do I have a large number of L2 evictions? Yes,
> > then spread them out. No, then select the other core on the L2. And so
> > on.
>
> Hm. That sampling better be really cheap. Might help...

Yeah, that's why I said sampling rather than running the perf counters
during every timeslice. But if you count the proper events, you should be
able to tell exactly what the workload is doing (compute-bound, IO-bound,
contention, etc.).

> but how does that affect pgbench and ilk that must spread regardless
> of footprints.

Well, how do you measure the latency of the 1 process in the 1:N case?
Maybe pipeline stalls of the 1, along with some way to recognize that it
is the 1 in the 1:N case. Hmm.

-- 
Regards/Gruss,
Boris.