From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751021AbcGLS0X (ORCPT <rfc822;w@1wt.eu>);
	Tue, 12 Jul 2016 14:26:23 -0400
Received: from mx0a-001b2d01.pphosted.com ([148.163.156.1]:38507 "EHLO
	mx0a-001b2d01.pphosted.com" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S1750814AbcGLS0W (ORCPT
	<rfc822;linux-kernel@vger.kernel.org>);
	Tue, 12 Jul 2016 14:26:22 -0400
X-IBM-Helo: d01dlp02.pok.ibm.com
X-IBM-MailFrom: paulmck@linux.vnet.ibm.com
Date: Tue, 12 Jul 2016 11:26:35 -0700
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: "H. Peter Anvin" <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>, tglx@linutronix.de, mingo@elte.hu,
        ak@linux.intel.com, linux-kernel@vger.kernel.org
Subject: Re: Odd performance results
Reply-To: paulmck@linux.vnet.ibm.com
References: <20160710042639.GA4068@linux.vnet.ibm.com>
 <7DF218CD-22F6-4E46-A628-2138AEA3A161@infradead.org>
 <20160710144327.GX4650@linux.vnet.ibm.com>
 <20160712145551.GU30909@twins.programming.kicks-ass.net>
 <20160712150529.GN7094@linux.vnet.ibm.com>
 <27d2c710-479d-77a9-f2c6-875e9c2bc40f@zytor.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <27d2c710-479d-77a9-f2c6-875e9c2bc40f@zytor.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
X-TM-AS-MML: disable
X-Content-Scanned: Fidelis XPS MAILER
x-cbid: 16071218-0040-0000-0000-000000C9E00B
X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused
x-cbparentid: 16071218-0041-0000-0000-000004A40F01
Message-Id: <20160712182635.GV7094@linux.vnet.ibm.com>
X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:,, definitions=2016-07-12_09:,,
 signatures=0
X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 spamscore=0 suspectscore=0
 malwarescore=0 phishscore=0 adultscore=0 bulkscore=0 classifier=spam
 adjust=0 reason=mlx scancount=1 engine=8.0.1-1604210000
 definitions=main-1607120163
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Jul 12, 2016 at 10:49:58AM -0700, H. Peter Anvin wrote:
> On 07/12/16 08:05, Paul E. McKenney wrote:
> > On Tue, Jul 12, 2016 at 04:55:51PM +0200, Peter Zijlstra wrote:
> >> On Sun, Jul 10, 2016 at 07:43:27AM -0700, Paul E. McKenney wrote:
> >>> On Sun, Jul 10, 2016 at 07:17:19AM +0200, Peter Zijlstra wrote:
> >>>>
> >>>>
> >>>> On 10 July 2016 06:26:39 CEST, "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> >>>>> Hello!
> >>>>>
> >>>>> So I ran a quick benchmark which showed stair-step results.  I
> >>>>> immediately
> >>>>> thought "Ah, this is due to CPU 0 and 1, 2 and 3, 4 and 5, and 6 and 7
> >>>>> being threads in a core."  Then I thought "Wait, this is an x86!"
> >>>>> Then I dumped out cpu*/topology/thread_siblings_list, getting the
> >>>>> following:
> >>>>>
> >>>>> 	cpu0/topology/thread_siblings_list: 0-1
> >>>>> 	cpu1/topology/thread_siblings_list: 0-1
> >>>>> 	cpu2/topology/thread_siblings_list: 2-3
> >>>>> 	cpu3/topology/thread_siblings_list: 2-3
> >>>>> 	cpu4/topology/thread_siblings_list: 4-5
> >>>>> 	cpu5/topology/thread_siblings_list: 4-5
> >>>>> 	cpu6/topology/thread_siblings_list: 6-7
> >>>>> 	cpu7/topology/thread_siblings_list: 6-7
> >>>>
> >>>>
> >>>> I'm guessing this is an AMD bulldozer like machine?
> >>>
> >>> /proc/cpuinfo thinks otherwise:
> >>>
> >>> processor	: 0
> >>> vendor_id	: GenuineIntel
> >>> cpu family	: 6
> >>> model		: 60
> >>> model name	: Intel(R) Core(TM) i7-4710MQ CPU @ 2.50GHz
> >>
> >> Weird, I've never seen an Intel box do that before... hpa, any idea? or
> >> is this just one weird BIOS.
> > 
> > ;-)
> > 
> > It is a Lenovo W541 laptop, for whatever that might be worth.  Roughly
> > one year old.
> 
> Well, the obvious thing here is that CPUs 0-1, 2-3, 4-5, and 6-7 *are*
> indeed threads in a core... Intel x86 products have supported
> multithreading since the Pentium 4.  So the "wait, this is an x86!" bit
> is strange to me.
> 
> The CPU in question (and /proc/cpuinfo should show this) has four cores
> with a total of eight threads.  The "siblings" and "cpu cores" fields in
> /proc/cpuinfo should show the same thing.  So I am utterly confused
> about what is unexpected here?

My prior experience with Intel x86 systems led me to expect that the
hardware-thread pairs would instead be 0 and 4, 1 and 5, 2 and 6, and 3
and 7.  This would result in a graph with a two-segment line, having
higher slope for the lower-numbered CPUs and a lower slope for the
higher-numbered CPUs, and I have in fact seen this behavior on older
Intel x86 systems.  See for example slides 64-67 of:

http://www.rdrop.com/users/paulmck/scalability/paper/Updates.2016.06.05a.TUDresden.pdf

But don't get me wrong, I do very much prefer the CPU-numbering approach
that my laptop uses, where the hardware threads in a given core have
consecutive numbers.
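
For anyone wanting to check which numbering a given box uses, here is a
rough sketch in Python.  It assumes the usual sysfs location
/sys/devices/system/cpu/cpuN/topology/thread_siblings_list; the function
names and the "consecutive"/"split"/"no-smt" labels are made up for
illustration and are not anything the kernel provides.

```python
import glob
import os
import re


def parse_cpu_list(text):
    """Parse a sysfs CPU list such as "0-1" or "0,4" into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def read_sibling_lists(root="/sys/devices/system/cpu"):
    """Return {cpu number: raw thread_siblings_list string} read from sysfs."""
    lists = {}
    pattern = os.path.join(root, "cpu[0-9]*", "topology", "thread_siblings_list")
    for path in glob.glob(pattern):
        # Only look at the part of the path after the root for the CPU number.
        cpu = int(re.search(r"cpu(\d+)", path[len(root):]).group(1))
        with open(path) as f:
            lists[cpu] = f.read().strip()
    return lists


def numbering_scheme(sibling_lists):
    """Classify the numbering (labels are hypothetical, for illustration only).

    "consecutive": the hardware threads of each core have adjacent numbers.
    "split":       the siblings are spread out, for example 0 and 4.
    "no-smt":      every core has exactly one hardware thread.
    """
    cores = {tuple(parse_cpu_list(s)) for s in sibling_lists.values()}
    if all(len(core) == 1 for core in cores):
        return "no-smt"
    if all(core[-1] - core[0] == len(core) - 1 for core in cores):
        return "consecutive"
    return "split"
```

Feeding it the eight lines from my laptop (0-1, 0-1, 2-3, ...) gives
"consecutive", and numbering_scheme(read_sibling_lists()) should do the
same check on a live system.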

> Also, you mentioned absolutely nothing about what kind of benchmark it
> was, or what the "stairstepping" results imply, so it doesn't really
> make it any easier...

The benchmark was a POSIX-threads multithreaded benchmark with each
thread repeatedly searching a small linked list, which should fit into
the nearest-to-CPU cache.  The "stairstepping" results suggest to me
that a no-cache-miss pointer-following workload allows a single hardware
thread to consume most of a given core's relevant hardware resources,
at least on this particular chip.  Which is fine -- this sort of thing
always has been workload-specific.
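
To make the shape argument concrete, here is a toy model in Python.  It
is only a sketch under stated assumptions, not the benchmark itself: it
assumes that thread i runs on CPU i, that a core with one busy hardware
thread delivers one unit of throughput, and that a second busy thread on
the same core adds a fixed fraction (second_thread_gain, a made-up
parameter).  The two CPU-to-core tables are the two numbering schemes
discussed above.

```python
def modeled_throughput(core_of_cpu, nthreads, second_thread_gain=0.0):
    """Modeled throughput with threads 0..nthreads-1 on CPUs 0..nthreads-1.

    core_of_cpu[cpu] is the core that CPU belongs to.  Each core contributes
    1.0 for its first busy hardware thread plus second_thread_gain for each
    additional one.  This is an assumed model, not a measurement.
    """
    busy = {}
    for cpu in range(nthreads):
        core = core_of_cpu[cpu]
        busy[core] = busy.get(core, 0) + 1
    return sum(1.0 + second_thread_gain * (n - 1) for n in busy.values())


# Siblings numbered consecutively: 0-1, 2-3, 4-5, 6-7.
CONSECUTIVE = [0, 0, 1, 1, 2, 2, 3, 3]
# Siblings split across halves: 0 and 4, 1 and 5, 2 and 6, 3 and 7.
SPLIT = [0, 1, 2, 3, 0, 1, 2, 3]


def curve(core_of_cpu, second_thread_gain=0.0):
    """Modeled throughput for 1..N threads, where N is the number of CPUs."""
    return [modeled_throughput(core_of_cpu, n, second_thread_gain)
            for n in range(1, len(core_of_cpu) + 1)]
```

With second_thread_gain=0, curve(CONSECUTIVE) steps 1, 1, 2, 2, 3, 3, 4, 4,
which is the stair-step, while curve(SPLIT) climbs 1, 2, 3, 4 and then
stays flat.  A nonzero gain turns the flat part into the lower-slope
second segment.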

If you want to see an example plot, take a look at:

	CodeSamples/defer/perf-rcu-qsbr.eps

within:

	git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/perfbook.git

							Thanx, Paul