Subject: [MODERATED] Re: [PATCH v4 3/8] Linux Patch #3
From: Dave Hansen
Date: Mon, 25 Jun 2018 10:26:10 -0700
To: speck@linutronix.de

On 06/25/2018 09:46 AM, speck for Paolo Bonzini wrote:
>> 32k is theoretically enough, but _only_ if none of the lines being
>> touched were in the cache previously. That's why it was a 64k
>> buffer in some examples.
>
> But pre-Skylake has 16k cache only, doesn't it? Does it need to
> read in 4 times the cache size?

I thought it's been 32k for a while. But, either way, I guess we
should be using the *enumerated* L1D size rather than a fixed 32k.

Here's a Haswell system, btw:

dave@o2:~$ cat /sys/devices/system/cpu/cpu0/cache/index0/size
32K
dave@o2:~$ cat /proc/cpuinfo | grep model
model name : Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz

Or a Westmere Xeon:

dave@bigbox:~$ cat /sys/devices/system/cpu/cpu0/cache/index0/size
32K

>> You also need guard pages at either end to ensure the prefetchers
>> don't run into the next page.
>
> Hmm, it would be a pity to require order 5 even. Earlier in the
> thread someone said that 52 KiB were enough; if that's confirmed we
> could keep order 4 and have guard pages.

52k was the theoretical floor: the smallest buffer size that would
guarantee all 32k of the L1D got _evicted_. But the recommendation
from the hardware folks was to use more than 52k so there was some
margin in case the analysis was imprecise.

BTW, this buffer does not necessarily need to be per-logical-CPU.
Tony Luck pointed out that we could just have one buffer per
hyperthread *slot* within a core: Core-0/Thread-0 could share its
buffer with Core-1/Thread-0, for instance. Having them be
NUMA-node-local would be nice too, but is not required for
correctness.

At the point where we've got two per NUMA node, I'm not sure we
really care much whether it's 128k or 64k consumed. I'd much rather
do what the hardware folks are comfortable with than save 64k.
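
For the "enumerated L1D size" part, here's an untested sketch of what
I mean. The kernel already discovers this via the cacheinfo code, so
a real patch would probably just reuse that, but this shows where the
number comes from: CPUID leaf 0x4 (deterministic cache parameters).

/*
 * Untested sketch: find the L1 data cache size from CPUID leaf 0x4.
 * Returns bytes, or 0 if no L1D was enumerated.
 */
static unsigned int l1d_size_bytes(void)
{
	unsigned int eax, ebx, ecx, edx;
	int i;

	for (i = 0; i < 16; i++) {
		cpuid_count(4, i, &eax, &ebx, &ecx, &edx);

		if ((eax & 0x1f) == 0)	/* cache type 0: no more caches */
			break;
		/* type 1 == data cache, level is in bits 7:5 */
		if ((eax & 0x1f) != 1 || ((eax >> 5) & 0x7) != 1)
			continue;

		return (((ebx >> 22) & 0x3ff) + 1) *	/* ways */
		       (((ebx >> 12) & 0x3ff) + 1) *	/* partitions */
		       ((ebx & 0xfff) + 1) *		/* line size */
		       (ecx + 1);			/* sets */
	}
	return 0;
}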
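
And a rough, untested sketch of Tony's sharing idea.
l1d_flush_buffer[] and MAX_SMT_SLOTS are made-up names for wherever
the per-slot buffers end up living; the point is just that the index
is the thread's position within its own core, not the logical CPU
number:

/* Made-up storage: one buffer per SMT sibling slot (NUMA-local
 * placement left out of the sketch). */
static void *l1d_flush_buffer[MAX_SMT_SLOTS];

/*
 * Map a logical CPU to its buffer by the thread's index within its
 * core, so Core-0/Thread-0 and Core-1/Thread-0 land on the same
 * buffer.
 */
static void *l1d_flush_buffer_for(int cpu)
{
	int slot = 0, sibling;

	for_each_cpu(sibling, topology_sibling_cpumask(cpu)) {
		if (sibling == cpu)
			break;
		slot++;
	}
	return l1d_flush_buffer[slot];
}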
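
On the guard pages: this is just a thought, but if the flush loop
only needs virtual contiguity, vmalloc() areas are separated by guard
holes already, so we'd get the "prefetchers can't run into the next
page" property for free and avoid a high-order physically-contiguous
allocation entirely:

/*
 * Untested thought: vmalloc gives us guard holes around the area, so
 * the prefetchers can't wander off the end.  Only works if the flush
 * sequence is happy with a virtually-contiguous buffer.
 */
static void *alloc_l1d_flush_buffer(unsigned int l1d_bytes)
{
	/* 2x the L1D size, per the "more than 52k" guidance above */
	return vmalloc(2 * l1d_bytes);
}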