Date: Tue, 12 Jul 2016 11:26:35 -0700
From: "Paul E. McKenney"
To: "H. Peter Anvin"
Cc: Peter Zijlstra, tglx@linutronix.de, mingo@elte.hu, ak@linux.intel.com,
 linux-kernel@vger.kernel.org
Subject: Re: Odd performance results
Reply-To: paulmck@linux.vnet.ibm.com
References: <20160710042639.GA4068@linux.vnet.ibm.com>
 <7DF218CD-22F6-4E46-A628-2138AEA3A161@infradead.org>
 <20160710144327.GX4650@linux.vnet.ibm.com>
 <20160712145551.GU30909@twins.programming.kicks-ass.net>
 <20160712150529.GN7094@linux.vnet.ibm.com>
 <27d2c710-479d-77a9-f2c6-875e9c2bc40f@zytor.com>
In-Reply-To: <27d2c710-479d-77a9-f2c6-875e9c2bc40f@zytor.com>
Message-Id: <20160712182635.GV7094@linux.vnet.ibm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Jul 12, 2016 at 10:49:58AM -0700, H.
Peter Anvin wrote:
> On 07/12/16 08:05, Paul E. McKenney wrote:
> > On Tue, Jul 12, 2016 at 04:55:51PM +0200, Peter Zijlstra wrote:
> >> On Sun, Jul 10, 2016 at 07:43:27AM -0700, Paul E. McKenney wrote:
> >>> On Sun, Jul 10, 2016 at 07:17:19AM +0200, Peter Zijlstra wrote:
> >>>>
> >>>>
> >>>> On 10 July 2016 06:26:39 CEST, "Paul E. McKenney" wrote:
> >>>>> Hello!
> >>>>>
> >>>>> So I ran a quick benchmark which showed stair-step results.  I
> >>>>> immediately thought "Ah, this is due to CPU 0 and 1, 2 and 3,
> >>>>> 4 and 5, and 6 and 7 being threads in a core."  Then I thought
> >>>>> "Wait, this is an x86!"  Then I dumped out
> >>>>> cpu*/topology/thread_siblings_list, getting the following:
> >>>>>
> >>>>> cpu0/topology/thread_siblings_list: 0-1
> >>>>> cpu1/topology/thread_siblings_list: 0-1
> >>>>> cpu2/topology/thread_siblings_list: 2-3
> >>>>> cpu3/topology/thread_siblings_list: 2-3
> >>>>> cpu4/topology/thread_siblings_list: 4-5
> >>>>> cpu5/topology/thread_siblings_list: 4-5
> >>>>> cpu6/topology/thread_siblings_list: 6-7
> >>>>> cpu7/topology/thread_siblings_list: 6-7
> >>>>
> >>>>
> >>>> I'm guessing this is an AMD bulldozer like machine?
> >>>
> >>> /proc/cpuinfo thinks otherwise:
> >>>
> >>> processor	: 0
> >>> vendor_id	: GenuineIntel
> >>> cpu family	: 6
> >>> model		: 60
> >>> model name	: Intel(R) Core(TM) i7-4710MQ CPU @ 2.50GHz
> >>
> >> Weird, I've never seen an Intel box do that before... hpa, any idea? or
> >> is this just one weird BIOS.
> >
> > ;-)
> >
> > It is a Lenovo W541 laptop, for whatever that might be worth.  Roughly
> > one year old.
>
> Well, the obvious thing here is that CPUs 0-1, 2-3, 4-5, and 6-7 *are*
> indeed threads in a core...  Intel x86 products have supported
> multithreading since the Pentium 4.  So the "wait, this is an x86!" bit
> is strange to me.
>
> The CPU in question (and /proc/cpuinfo should show this) has four cores
> with a total of eight threads.
> The "siblings" and "cpu cores" fields in
> /proc/cpuinfo should show the same thing.  So I am utterly confused
> about what is unexpected here?

My prior experience with Intel x86 systems led me to expect that the
hardware-thread pairs would instead be 0 and 4, 1 and 5, 2 and 6, and
3 and 7.  This would result in a graph with a two-segment line, having
higher slope for the lower-numbered CPUs and a lower slope for the
higher-numbered CPUs, and I have in fact seen this behavior on older
Intel x86 systems.  See for example slides 64-67 of:

http://www.rdrop.com/users/paulmck/scalability/paper/Updates.2016.06.05a.TUDresden.pdf

But don't get me wrong, I do very much prefer the CPU-numbering approach
that my laptop uses, where the hardware threads in a given core have
consecutive numbers.

> Also, you mentioned absolutely nothing about what kind of benchmark it
> was, or what the "stairstepping" results imply, so it doesn't really
> make it any easier...

The benchmark was a POSIX-threads multithreaded benchmark with each
thread repeatedly searching a small linked list, which should fit into
the nearest-to-CPU cache.  The "stairstepping" results suggest to me
that a no-cache-miss pointer-following workload allows a single hardware
thread to consume most of a given core's relevant hardware resources,
at least on this particular chip.  Which is fine -- this sort of thing
always has been workload-specific.

If you want to see an example plot, take a look at:

	CodeSamples/defer/perf-rcu-qsbr.eps

within:

	git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/perfbook.git

							Thanx, Paul
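
For anyone wanting to check which of the two numbering schemes a given
box uses, here is a minimal Python sketch that collapses the per-CPU
thread_siblings_list values into a list of cores.  The function names
(parse_cpu_list, group_cores, read_sysfs_siblings) are made up for this
illustration, and the SPLIT table is an assumed example of the
0-and-4, 1-and-5 numbering described above, not data from a real
machine.  The parser handles only comma-separated CPU numbers and
"a-b" ranges, which is the form shown in the sysfs output quoted in
this thread.

```python
import glob
import re


def parse_cpu_list(text):
    """Parse a CPU list such as "0-1" or "0,4" into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def group_cores(siblings_by_cpu):
    """Collapse per-CPU sibling strings into a sorted list of cores.

    Each core is a tuple of the hardware-thread (CPU) numbers it contains.
    """
    cores = {tuple(parse_cpu_list(s)) for s in siblings_by_cpu.values()}
    return sorted(cores)


def read_sysfs_siblings(base="/sys/devices/system/cpu"):
    """Read thread_siblings_list for every CPU found under sysfs (Linux only)."""
    result = {}
    pattern = base + "/cpu[0-9]*/topology/thread_siblings_list"
    for path in glob.glob(pattern):
        cpu = int(re.search(r"cpu(\d+)/topology", path).group(1))
        with open(path) as f:
            result[cpu] = f.read()
    return result


# Values reported for the laptop in this thread: consecutive numbering.
LAPTOP = {0: "0-1", 1: "0-1", 2: "2-3", 3: "2-3",
          4: "4-5", 5: "4-5", 6: "6-7", 7: "6-7"}

# Assumed values for the split numbering (0 and 4, 1 and 5, ...).
SPLIT = {0: "0,4", 1: "1,5", 2: "2,6", 3: "3,7",
         4: "0,4", 5: "1,5", 6: "2,6", 7: "3,7"}
```

Calling group_cores(LAPTOP) gives the pairs (0, 1), (2, 3), (4, 5),
(6, 7), while group_cores(SPLIT) gives (0, 4), (1, 5), (2, 6), (3, 7).
On a live Linux system, group_cores(read_sysfs_siblings()) should show
which of the two layouts applies, which matters when a benchmark adds
threads in CPU-number order.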