From: Jan Beulich <jbeulich@suse.com>
To: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: "Roger Pau Monné" <roger.pau@citrix.com>,
	"xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>
Subject: Re: x86: memset() / clear_page() / page scrubbing
Date: Wed, 14 Apr 2021 10:12:53 +0200
Message-ID: <213c3706-5296-4673-dae2-12f9056ed73b@suse.com>
In-Reply-To: <21c00073-86a8-a040-fa40-e99e2fb434eb@citrix.com>

[-- Attachment #1: Type: text/plain, Size: 5516 bytes --]

On 13.04.2021 15:17, Andrew Cooper wrote:
> Do you have actual numbers from these experiments?

Attached is the collected raw output from a number of systems.

>  I've seen your patch
> from the thread, but at a minimum it's missing some hunks adding new
> CPUID bits.

It's not missing hunks - these additions are in a prereq patch that
I meant to post together with whatever this analysis would lead to.
If you think I should submit the prereqs ahead of time, I can of
course do so.

>  I do worry however whether the testing is likely to be
> realistic for non-idle scenarios.

Of course it's not going to be - in non-idle scenarios we'll always
be somewhere in the middle. Therefore I wanted to have numbers at
the edges (hot and cold cache respectively), as any numbers in
between are going to be much harder to obtain in a way that makes
them actually meaningful (and hence reasonably stable).

> It is little surprise that AVX-512 on Skylake is poor.  The
> frequency hit from using %zmm is staggering.  IceLake is expected to be
> better, but almost certainly won't exceed REP MOVSB, which is optimised
> in microcode for the data width of the CPU.

Right, much like AVX has improved but still doesn't get anywhere
near REP MOVS.

> For memset(), please don't move in the direction of memcpy().  memcpy()
> is problematic because the common case is likely to be a multiple of 8
> bytes, meaning that we feed 0 into the REP MOVSB, and this is a hit
> worth avoiding.

And you say this despite me having pointed out that REP STOSL may
be faster in a number of cases? Or do you mean to suggest we should
branch around the trailing REP {MOV,STO}SB?
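To make the two options concrete, here is a minimal sketch of the
memcpy() shape under discussion: the bulk is copied with REP MOVSQ,
then the 0..7 byte tail with REP MOVSB. Optionally, the REP MOVSB is
branched around when the tail is empty. The function name
copy_qwords_then_tail() and the skip_empty_tail parameter are
hypothetical, and this is not Xen's actual memcpy(). The inline asm
uses GCC/Clang syntax for x86-64, with a plain C loop elsewhere.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical sketch, not Xen's memcpy(): copy n / 8 qwords with
 * REP MOVSQ, then the n % 8 byte tail with REP MOVSB.  If skip_empty_tail
 * is set, the REP MOVSB is branched around when the tail is empty, which
 * is the common case for sizes that are a multiple of 8.
 * Assumes DF is clear, as the x86-64 SysV ABI guarantees on function entry.
 */
static void *copy_qwords_then_tail(void *dst, const void *src, size_t n,
                                   bool skip_empty_tail)
{
    void *ret = dst;

    assert(n == 0 || (dst != NULL && src != NULL));

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    size_t qwords = n / 8, tail = n % 8;

    __asm__ volatile ("rep movsq"
                      : "+D" (dst), "+S" (src), "+c" (qwords)
                      : : "memory");
    if (!skip_empty_tail || tail != 0)
        __asm__ volatile ("rep movsb"
                          : "+D" (dst), "+S" (src), "+c" (tail)
                          : : "memory");
#else
    /* Portable fallback so the sketch also runs on other architectures. */
    for (size_t i = 0; i < n; i++)
        ((unsigned char *)dst)[i] = ((const unsigned char *)src)[i];
    (void)skip_empty_tail;
#endif
    return ret;
}
```

With skip_empty_tail set, a 16-byte copy never executes REP MOVSB
at all. On parts with the "Fast Zero length" feature bits Andrew
mentions, the branch could instead be patched out.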

>  The "Fast Zero length $FOO" bits on future parts indicate
> when passing %ecx=0 is likely to be faster than branching around the
> invocation.

IOW down the road we could use alternatives patching to remove such
branches. But of course this only matters if we don't end up using
REP MOVSB / REP STOSB exclusively there anyway, as you seem to be
suggesting ...

> With ERMS/etc, our logic should be a REP MOVSB/STOSB only, without any
> cleverness about larger word sizes.  The Linux forms do this fairly well
> already, and probably better than Xen, although there might be some room
> for improvement IMO.

... here.

As to the Linux implementations - for memcpy_erms() I don't think
I see any room for improvement in the function itself. We could do
alternatives patching somewhat differently (and I probably would).
For memset_erms() the tiny bit of improvement over Linux's code
that I would see is to avoid the partial register access when
loading %al. But to be honest - in both cases I wouldn't have
bothered looking at their code anyway, if you hadn't pointed me
there.
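For reference, ERMS-style forms that do nothing but a single REP
MOVSB / REP STOSB might look like the sketch below. The function
names are hypothetical: they are modelled loosely on the Linux
memcpy_erms() / memset_erms() names mentioned above, not copied from
that code. The memset variant passes the fill byte as a full 32-bit
value in %eax. This leaves the compiler free to load it with a
whole-register write rather than a byte-sized move into %al, though
whether it actually does so is up to the compiler.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical ERMS-style helpers: one REP MOVSB / REP STOSB, with no
 * word-size handling.  GCC/Clang inline asm on x86-64, C loops elsewhere.
 * Assumes DF is clear, as the x86-64 SysV ABI guarantees.
 */
static void *memcpy_erms_sketch(void *dst, const void *src, size_t n)
{
    void *ret = dst;

    assert(n == 0 || (dst != NULL && src != NULL));
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    __asm__ volatile ("rep movsb"
                      : "+D" (dst), "+S" (src), "+c" (n)
                      : : "memory");
#else
    for (size_t i = 0; i < n; i++)
        ((unsigned char *)dst)[i] = ((const unsigned char *)src)[i];
#endif
    return ret;
}

static void *memset_erms_sketch(void *dst, int c, size_t n)
{
    void *ret = dst;

    assert(n == 0 || dst != NULL);
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    /*
     * The fill byte is handed over zero-extended to 32 bits in %eax, so
     * the compiler can materialize it with a full-register write (e.g.
     * MOVZBL) instead of a partial write to %al only.
     */
    __asm__ volatile ("rep stosb"
                      : "+D" (dst), "+c" (n)
                      : "a" ((unsigned int)(unsigned char)c)
                      : "memory");
#else
    for (size_t i = 0; i < n; i++)
        ((unsigned char *)dst)[i] = (unsigned char)c;
#endif
    return ret;
}
```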

> It is worth noting that we have extra variations of memset/memcpy where
> __builtin_memcpy() gets expanded inline, and the result is a
> compiler-chosen sequence, and doesn't hit any of our optimised
> sequences.  I'm not sure what to do about this, because there is surely
> a larger win from the cases which can be turned into a single mov, or an
> elided store/copy, than using a potentially inefficient sequence in the
> rare cases.  Maybe there is room for a fine-tuning option to say "just
> call memset() if you're going to expand it inline".

You mean "just call memset() instead of expanding it inline"?

If the inline expansion is merely REP STOS, I'm not sure we'd
actually gain anything from keeping the compiler from expanding it
inline. But if the inline construct is more complicated (as I
observe e.g. in map_vcpu_info() with gcc 10), then it would likely
be nice if there were such a control. I'll make a note to see if I
can find anything.

But this isn't relevant for {clear,copy}_page().

> For all set/copy operations, whether you want non-temporal or not
> depends on when/where the lines are next going to be consumed.  Page
> scrubbing in idle context is the only example I can think of where we
> aren't plausibly going to consume the destination imminently.  Even
> clear/copy page in a hypercall doesn't want to be non-temporal, because
> chances are good that the vcpu is going to touch the page on return.

I'm afraid the situation isn't that black-and-white. Take HAP or
IOMMU page table allocations, for example: They need to clear the
full page, yes. But often this is just to then insert one single
entry, i.e. re-use exactly one of the cache lines. Or take initial
population of guest RAM: The larger the guest, the less likely it
is for every individual page to get accessed again before its
contents get evicted from the caches. Judging from what Ankur said,
once we get to around L3 capacity, MOVNT / CLZERO may be preferable
there.
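A non-temporal page clear of the MOVNT kind might look like the
following sketch. It uses SSE2 streaming stores (MOVNTDQ, via the
_mm_stream_si128() intrinsic) followed by an SFENCE, because the
streaming stores are weakly ordered. clear_page_nt_sketch() and
SKETCH_PAGE_SIZE are hypothetical names, 4 KiB pages are assumed,
and this is not Xen's clear_page(). CLZERO is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

#define SKETCH_PAGE_SIZE 4096   /* assumed 4 KiB page */

/*
 * Sketch of a cache-bypassing page clear: 16-byte streaming stores,
 * then SFENCE so the weakly ordered stores are globally visible before
 * the page is handed to anyone else.
 */
static void clear_page_nt_sketch(void *pg)
{
    assert(((uintptr_t)pg & 15) == 0);  /* MOVNTDQ needs 16-byte alignment */
#if defined(__SSE2__)
    const __m128i zero = _mm_setzero_si128();
    __m128i *p = pg;

    for (size_t i = 0; i < SKETCH_PAGE_SIZE / sizeof(*p); i++)
        _mm_stream_si128(p + i, zero);
    _mm_sfence();
#else
    /* No streaming stores available: fall back to a plain clear. */
    for (size_t i = 0; i < SKETCH_PAGE_SIZE; i++)
        ((unsigned char *)pg)[i] = 0;
#endif
}
```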

I think in cases where we don't know how the page is going to be
used subsequently, we ought to favor latency over cache pollution
avoidance. But in cases where we know the subsequent usage pattern,
we may want to direct scrubbing / zeroing accordingly. Yet of
course it's not very helpful that there's no way to avoid
polluting caches and still have reasonably low latency, so using
some heuristics may be unavoidable.
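The heuristic described above could take roughly the following
shape. Callers that know how the page will be used pass a hint, and
everything else falls back to the latency-friendly temporal path.
The hint names, pick_clear_strategy(), and the L3-size threshold are
all hypothetical. They are drawn from the cases mentioned in this
thread: idle scrubbing, hypercall use, and bulk guest RAM population.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical usage hints a caller might pass when asking for zeroing. */
enum clear_hint {
    CLEAR_HINT_UNKNOWN,        /* no idea how the page will be used */
    CLEAR_HINT_CONSUMED_SOON,  /* e.g. clear/copy page in a hypercall */
    CLEAR_HINT_IDLE_SCRUB,     /* scrubbing from the idle loop */
    CLEAR_HINT_BULK_POPULATE,  /* e.g. initial population of guest RAM */
};

enum clear_strategy { CLEAR_TEMPORAL, CLEAR_NON_TEMPORAL };

/*
 * Heuristic sketch: favour latency (temporal stores) unless the caller
 * says the lines are unlikely to be touched again before eviction.
 * total_bytes is the size of the whole operation, l3_bytes the L3 size.
 */
static enum clear_strategy pick_clear_strategy(enum clear_hint hint,
                                               size_t total_bytes,
                                               size_t l3_bytes)
{
    assert(l3_bytes != 0);

    switch (hint) {
    case CLEAR_HINT_IDLE_SCRUB:
        return CLEAR_NON_TEMPORAL;
    case CLEAR_HINT_BULK_POPULATE:
        /* Only once the operation reaches roughly L3 capacity. */
        return total_bytes >= l3_bytes ? CLEAR_NON_TEMPORAL : CLEAR_TEMPORAL;
    case CLEAR_HINT_CONSUMED_SOON:
    case CLEAR_HINT_UNKNOWN:
    default:
        return CLEAR_TEMPORAL;
    }
}
```

The HAP / IOMMU page table case does not get its own hint here.
Whether clearing 63 lines non-temporally while reusing one line
pays off is exactly the kind of question such a heuristic would
need numbers for.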

And of course another goal of mine would be to avoid double zeroing
of pages: When scrubbing uses clear_page() anyway, there's no point
in the caller then calling clear_page() again. IMO, just like we
have xzalloc(), we should also have MEMF_zero. Internally the page
allocator can know whether a page was already scrubbed, and it
does know for sure whether scrubbing means zeroing.
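A toy model of the MEMF_zero idea follows. The allocator records
whether a free page has been scrubbed and whether scrubbing zeroes.
It clears a page itself only when scrubbing doesn't already guarantee
zeros. MEMF_zero here stands for the flag proposed above, with a
made-up bit value. The pool, the sketch_* functions, the 0xc2 scrub
pattern, and the clears_done counter are hypothetical scaffolding
for illustration, and scrub_is_zeroing is treated as a boot-time
property.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096
#define SKETCH_NR_PAGES  4
#define MEMF_zero        (1u << 0)   /* hypothetical bit value */

struct sketch_page {
    bool in_use;
    bool scrubbed;                   /* scrubbed since last freed? */
    unsigned char data[SKETCH_PAGE_SIZE];
};

static struct sketch_page pool[SKETCH_NR_PAGES];
static bool scrub_is_zeroing = true; /* false if scrubbing writes a pattern */
static unsigned int clears_done;     /* counts full-page zeroing passes */

static void sketch_clear_page(void *pg)
{
    memset(pg, 0, SKETCH_PAGE_SIZE);
    clears_done++;
}

/* Scrub a free page, e.g. from the idle loop or on demand at allocation. */
static void sketch_scrub_page(struct sketch_page *pg)
{
    assert(!pg->in_use);
    if (scrub_is_zeroing)
        sketch_clear_page(pg->data);
    else
        memset(pg->data, 0xc2, SKETCH_PAGE_SIZE);  /* arbitrary pattern */
    pg->scrubbed = true;
}

static void *sketch_alloc_page(unsigned int memflags)
{
    for (size_t i = 0; i < SKETCH_NR_PAGES; i++) {
        struct sketch_page *pg = &pool[i];

        if (pg->in_use)
            continue;
        if (!pg->scrubbed)
            sketch_scrub_page(pg);
        /* Zero only if asked to and scrubbing didn't already leave zeros. */
        if ((memflags & MEMF_zero) && !scrub_is_zeroing)
            sketch_clear_page(pg->data);
        pg->in_use = true;
        pg->scrubbed = false;        /* the new owner will dirty it */
        return pg->data;
    }
    return NULL;
}
```

With zeroing scrubs, a MEMF_zero allocation costs one clear at
most, and none for a page scrubbed earlier (e.g. when idle). The
caller never needs to call clear_page() again.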

Jan

[-- Attachment #2: xen-clear-page.txt --]
[-- Type: text/plain, Size: 18093 bytes --]

Aorus (Skylake):

(XEN) erms=1 fsrm=0 fzrm=0 fsrs=0 fsrcs=0 l1d=32k l2=1024k
(XEN) L1 w/o flush:
(XEN)  pre=5aa sse2=17c8 post=466
(XEN)  pre=302 stosb=544 post=6f2
(XEN)  pre=2f6 stosl=4de post=500
(XEN)  pre=308 stosq=4bc post=4b6
(XEN)  pre=300 avx=14d4 post=2fa
(XEN)  pre=2ea avx512=11ca post=300
(XEN)  pre=32c sse2=1620 post=330
(XEN)  pre=326 stosb=55a post=4b0
(XEN)  pre=332 stosl=4f2 post=4a2
(XEN)  pre=336 stosq=4ec post=47c
(XEN)  pre=332 avx=14f4 post=324
(XEN)  pre=3a2 avx512=1204 post=35c
(XEN)  pre=322 sse2=1606 post=330
(XEN)  pre=324 stosb=564 post=466
(XEN)  pre=31e stosl=4f8 post=49c
(XEN)  pre=322 stosq=4fa post=3e0
(XEN)  pre=340 avx=14f6 post=328
(XEN)  pre=326 avx512=120c post=322
(XEN) L1 w/ flush:
(XEN)  pre=2e4 sse2=c00 post=3e6
(XEN)  pre=34c stosb=916 post=722
(XEN)  pre=358 stosl=908 post=7b4
(XEN)  pre=360 stosq=a72 post=732
(XEN)  pre=33e avx=b3c post=33c
(XEN)  pre=348 avx512=a38 post=342
(XEN)  pre=342 sse2=c24 post=33e
(XEN)  pre=34e stosb=998 post=77c
(XEN)  pre=352 stosl=910 post=6e4
(XEN)  pre=356 stosq=94c post=74a
(XEN)  pre=334 avx=b44 post=332
(XEN)  pre=36e avx512=bca post=336
(XEN)  pre=356 sse2=c1a post=336
(XEN)  pre=35c stosb=92a post=6f0
(XEN)  pre=32e stosl=970 post=864
(XEN)  pre=358 stosq=94c post=756
(XEN)  pre=344 avx=b4c post=326
(XEN)  pre=34c avx512=a5c post=372
(XEN) L2 w/o flush:
(XEN)  pre=15f7c sse2=2eff8 post=c272
(XEN)  pre=cf8c stosb=cbf6 post=c6a4
(XEN)  pre=ce5c stosl=cc7e post=c6bc
(XEN)  pre=d3b6 stosq=7f5e6 post=d898
(XEN)  pre=cf56 avx=2d7de post=be1a
(XEN)  pre=cfe6 avx512=349c6 post=caf8
(XEN)  pre=dcee sse2=2f93e post=c97e
(XEN)  pre=dd6e stosb=d000 post=d102
(XEN)  pre=dad0 stosl=d034 post=d12e
(XEN)  pre=db00 stosq=d0ee post=d0b2
(XEN)  pre=dabc avx=2dec8 post=c830
(XEN)  pre=dc04 avx512=2dbbe post=c8aa
(XEN)  pre=db74 sse2=2f8e6 post=c89e
(XEN)  pre=dd4c stosb=d0a6 post=d16c
(XEN)  pre=da6c stosl=cfd0 post=d388
(XEN)  pre=d8c8 stosq=d054 post=d0b4
(XEN)  pre=db2e avx=2de78 post=cb3c
(XEN)  pre=d9ea avx512=2d9d6 post=c8f0
(XEN) L2 w/ flush:
(XEN)  pre=16000 sse2=16cf2 post=bfc4
(XEN)  pre=1604c stosb=12ab8 post=c66c
(XEN)  pre=16054 stosl=12624 post=c7a6
(XEN)  pre=16008 stosq=127b4 post=c54e
(XEN)  pre=15f7c avx=15a98 post=bd50
(XEN)  pre=16046 avx512=15760 post=13c52
(XEN)  pre=15f8a sse2=16dc0 post=bfb8
(XEN)  pre=15fb4 stosb=1293a post=c6da
(XEN)  pre=15f7c stosl=12672 post=c574
(XEN)  pre=15fee stosq=1245e post=c6fe
(XEN)  pre=15fc8 avx=15aae post=c01c
(XEN)  pre=1608c avx512=1ca32 post=c9ce
(XEN)  pre=15fba sse2=16cdc post=c076
(XEN)  pre=15ffe stosb=12992 post=c9b0
(XEN)  pre=16050 stosl=1290a post=c53e
(XEN)  pre=16002 stosq=129a6 post=c540
(XEN)  pre=15f98 avx=159ee post=bc50
(XEN)  pre=15fca avx512=159bc post=13d9a


Rome:

(XEN) erms=0 fsrm=0 fzrm=0 fsrs=0 fsrcs=0 l1d=32k l2=512k
(XEN) L1 w/o flush:
(XEN)  pre=4c4 sse2=eec post=384
(XEN)  pre=35c stosb=230 post=398
(XEN)  pre=35c stosl=230 post=3d4
(XEN)  pre=35c stosq=258 post=410
(XEN)  pre=370 avx=dd4 post=370
(XEN)  pre=370 clzero=758 post=35c
(XEN)  pre=35c sse2=e10 post=370
(XEN)  pre=35c stosb=21c post=370
(XEN)  pre=35c stosl=230 post=3fc
(XEN)  pre=35c stosq=21c post=3ac
(XEN)  pre=35c avx=d98 post=35c
(XEN)  pre=35c clzero=758 post=35c
(XEN)  pre=35c sse2=e24 post=35c
(XEN)  pre=35c stosb=21c post=3d4
(XEN)  pre=35c stosl=21c post=3d4
(XEN)  pre=370 stosq=21c post=3ac
(XEN)  pre=35c avx=d84 post=35c
(XEN)  pre=35c clzero=758 post=370
(XEN) L1 w/ flush:
(XEN)  pre=438 sse2=a50 post=35c
(XEN)  pre=438 stosb=d34 post=398
(XEN)  pre=438 stosl=d0c post=384
(XEN)  pre=438 stosq=aa0 post=384
(XEN)  pre=44c avx=924 post=370
(XEN)  pre=44c clzero=5f0 post=370
(XEN)  pre=438 sse2=a50 post=35c
(XEN)  pre=438 stosb=c30 post=398
(XEN)  pre=44c stosl=d20 post=3c0
(XEN)  pre=438 stosq=b04 post=370
(XEN)  pre=438 avx=938 post=370
(XEN)  pre=44c clzero=6f4 post=35c
(XEN)  pre=438 sse2=a3c post=35c
(XEN)  pre=44c stosb=adc post=384
(XEN)  pre=438 stosl=aa0 post=3c0
(XEN)  pre=44c stosq=a3c post=370
(XEN)  pre=438 avx=924 post=35c
(XEN)  pre=438 clzero=5c8 post=370
(XEN) L2 w/o flush:
(XEN)  pre=670c sse2=fe88 post=108ec
(XEN)  pre=6e28 stosb=24cc post=14500
(XEN)  pre=7120 stosl=2468 post=14d0c
(XEN)  pre=7490 stosq=247c post=1507c
(XEN)  pre=7a6c avx=fc6c post=119cc
(XEN)  pre=72b0 clzero=73f0 post=118b4
(XEN)  pre=7184 sse2=fdfc post=11e2c
(XEN)  pre=6f04 stosb=247c post=14b90
(XEN)  pre=7288 stosl=2530 post=15054
(XEN)  pre=75d0 stosq=24a4 post=15b30
(XEN)  pre=6fe0 avx=fc94 post=11864
(XEN)  pre=7198 clzero=74cc post=11d50
(XEN)  pre=751c sse2=fdfc post=121b0
(XEN)  pre=7350 stosb=24cc post=15360
(XEN)  pre=6e64 stosl=24b8 post=14f00
(XEN)  pre=7738 stosq=2440 post=14a8c
(XEN)  pre=6f90 avx=fcf8 post=11bc0
(XEN)  pre=729c clzero=747c post=11ae4
(XEN) L2 w/ flush:
(XEN)  pre=580c sse2=a870 post=10554
(XEN)  pre=5744 stosb=9c7c post=152ac
(XEN)  pre=5924 stosl=9a24 post=15c48
(XEN)  pre=56cc stosq=9df8 post=157fc
(XEN)  pre=5898 avx=a640 post=10338
(XEN)  pre=5974 clzero=69dc post=10dec
(XEN)  pre=5be0 sse2=a870 post=10ba8
(XEN)  pre=57a8 stosb=9ed4 post=15a40
(XEN)  pre=594c stosl=9d6c post=16198
(XEN)  pre=5438 stosq=9dd0 post=15860
(XEN)  pre=57d0 avx=a49c post=10b80
(XEN)  pre=52bc clzero=69dc post=f938
(XEN)  pre=56e0 sse2=ab54 post=10b08
(XEN)  pre=5654 stosb=9f88 post=1584c
(XEN)  pre=5654 stosl=a014 post=14ab4
(XEN)  pre=58c0 stosq=9a38 post=15dc4
(XEN)  pre=57a8 avx=a640 post=10c0c
(XEN)  pre=5618 clzero=69dc post=10554


Precision 7810 (Haswell):

(XEN) erms=1 fsrm=0 fzrm=0 fsrs=0 fsrcs=0 l1d=32k l2=256k
(XEN) L1 w/o flush:
(XEN)  pre=618 sse2=1324 post=41c
(XEN)  pre=3c4 stosb=6fc post=74c
(XEN)  pre=3ac stosl=6c4 post=728
(XEN)  pre=39c stosq=6b0 post=720
(XEN)  pre=3ac avx=df4 post=3e4
(XEN)  pre=38c sse2=f4c post=3a8
(XEN)  pre=38c stosb=6e4 post=748
(XEN)  pre=390 stosl=698 post=6f8
(XEN)  pre=380 stosq=6ac post=6ec
(XEN)  pre=3a4 avx=e28 post=3a8
(XEN)  pre=384 sse2=f50 post=374
(XEN)  pre=398 stosb=6ec post=6d4
(XEN)  pre=380 stosl=69c post=700
(XEN)  pre=3b8 stosq=698 post=6cc
(XEN)  pre=394 avx=e64 post=390
(XEN) L1 w/ flush:
(XEN)  pre=49c sse2=109c post=380
(XEN)  pre=480 stosb=1c08 post=864
(XEN)  pre=4d0 stosl=1bc8 post=820
(XEN)  pre=488 stosq=1bb8 post=834
(XEN)  pre=3ac avx=ddc post=388
(XEN)  pre=498 sse2=ef8 post=384
(XEN)  pre=474 stosb=1cb0 post=85c
(XEN)  pre=4a4 stosl=1bc4 post=85c
(XEN)  pre=47c stosq=1bcc post=828
(XEN)  pre=480 avx=df0 post=38c
(XEN)  pre=498 sse2=f08 post=370
(XEN)  pre=480 stosb=1ed4 post=880
(XEN)  pre=47c stosl=1bb0 post=848
(XEN)  pre=48c stosq=1ba0 post=850
(XEN)  pre=488 avx=de4 post=394
(XEN) L2 w/o flush:
(XEN)  pre=6450 sse2=7f78 post=39c8
(XEN)  pre=5478 stosb=3ab8 post=4b74
(XEN)  pre=4f68 stosl=3978 post=4d84
(XEN)  pre=4ca0 stosq=395c post=4e60
(XEN)  pre=52b4 avx=7974 post=3c84
(XEN)  pre=4fa8 sse2=7f24 post=3a80
(XEN)  pre=5118 stosb=3ad8 post=4e18
(XEN)  pre=4df0 stosl=3908 post=4ce8
(XEN)  pre=5028 stosq=396c post=4ef0
(XEN)  pre=5110 avx=7968 post=3ba4
(XEN)  pre=5088 sse2=7f20 post=3b1c
(XEN)  pre=4db8 stosb=3908 post=4ec4
(XEN)  pre=4eb4 stosl=3a00 post=4c00
(XEN)  pre=4f90 stosq=3970 post=4d98
(XEN)  pre=4f3c avx=7950 post=3a78
(XEN) L2 w/ flush:
(XEN)  pre=6380 sse2=786c post=3948
(XEN)  pre=6400 stosb=10478 post=4740
(XEN)  pre=6430 stosl=10564 post=46cc
(XEN)  pre=6430 stosq=10608 post=46c4
(XEN)  pre=6498 avx=7548 post=3978
(XEN)  pre=6418 sse2=7868 post=3934
(XEN)  pre=6350 stosb=10988 post=4798
(XEN)  pre=6410 stosl=10508 post=4678
(XEN)  pre=63dc stosq=105a8 post=46fc
(XEN)  pre=6500 avx=7564 post=39d0
(XEN)  pre=63b0 sse2=7890 post=397c
(XEN)  pre=648c stosb=10868 post=47f0
(XEN)  pre=64a0 stosl=106f4 post=46b4
(XEN)  pre=646c stosq=10468 post=4734
(XEN)  pre=63ec avx=75c4 post=3938


Dinar:

(XEN) erms=0 fsrm=0 fzrm=0 fsrs=0 fsrcs=0 l1d=16k l2=2048k
(XEN) L1 w/o flush:
(XEN)  pre=7e6 sse2=1c06 post=79d
(XEN)  pre=70a stosb=668 post=84f
(XEN)  pre=6dc stosl=676 post=83f
(XEN)  pre=6cf stosq=65b post=872
(XEN)  pre=6e0 avx=1a84 post=706
(XEN)  pre=709 sse2=19aa post=6ce
(XEN)  pre=6b7 stosb=601 post=844
(XEN)  pre=6e8 stosl=613 post=85e
(XEN)  pre=6a1 stosq=614 post=824
(XEN)  pre=6b9 avx=1a66 post=695
(XEN)  pre=6e2 sse2=199b post=6af
(XEN)  pre=6e7 stosb=602 post=839
(XEN)  pre=6cc stosl=61b post=845
(XEN)  pre=6ad stosq=607 post=815
(XEN)  pre=6ac avx=1a81 post=693
(XEN) L1 w/ flush:
(XEN)  pre=804 sse2=c48 post=6da
(XEN)  pre=7ca stosb=e16 post=82b
(XEN)  pre=7a3 stosl=ef0 post=81e
(XEN)  pre=7d7 stosq=dde post=829
(XEN)  pre=7ae avx=1562 post=6c0
(XEN)  pre=7c9 sse2=c3a post=6d8
(XEN)  pre=7ec stosb=db0 post=82b
(XEN)  pre=7f0 stosl=e3e post=84d
(XEN)  pre=7f1 stosq=de8 post=827
(XEN)  pre=7dd avx=157a post=6bd
(XEN)  pre=7d2 sse2=c49 post=6c4
(XEN)  pre=7a4 stosb=dfe post=848
(XEN)  pre=7ce stosl=e8c post=831
(XEN)  pre=7b3 stosq=daa post=81d
(XEN)  pre=7f8 avx=156b post=6d0
(XEN) L2 w/o flush:
(XEN)  pre=5e24f sse2=7ff69 post=40af6
(XEN)  pre=3c515 stosb=4ddc7 post=9f3bf
(XEN)  pre=3cfb9 stosl=4da3c post=9efcb
(XEN)  pre=3bc5c stosq=4dbd3 post=9ec1c
(XEN)  pre=3c927 avx=a6cc0 post=42aa1
(XEN)  pre=3cf6d sse2=7fe95 post=4223d
(XEN)  pre=3c55f stosb=4e035 post=9f25d
(XEN)  pre=3cd63 stosl=4dd8b post=9f14f
(XEN)  pre=3b8d3 stosq=4de1f post=9f050
(XEN)  pre=3c66f avx=a6cad post=43886
(XEN)  pre=3c990 sse2=7feb9 post=42a6d
(XEN)  pre=3c1a0 stosb=4dd45 post=9f04a
(XEN)  pre=3d0ae stosl=4de64 post=9f02b
(XEN)  pre=3c0ae stosq=4d9dc post=9edb8
(XEN)  pre=3d0b4 avx=a6c97 post=41e67
(XEN) L2 w/ flush:
(XEN)  pre=39194 sse2=55efd post=3a2a9
(XEN)  pre=391cf stosb=5a8bc post=95a1d
(XEN)  pre=3913c stosl=5a5a7 post=8fced
(XEN)  pre=3938b stosq=5a68b post=967d4
(XEN)  pre=38232 avx=9d328 post=3a4fe
(XEN)  pre=393a6 sse2=56027 post=3a2fe
(XEN)  pre=3917a stosb=59f3f post=9518a
(XEN)  pre=390c2 stosl=5a0f3 post=951bc
(XEN)  pre=3922e stosq=5a7f6 post=952db
(XEN)  pre=39443 avx=9d407 post=3a4c4
(XEN)  pre=38635 sse2=55fb8 post=3a557
(XEN)  pre=38237 stosb=5a2fb post=92f3a
(XEN)  pre=3914e stosl=5a8e5 post=8bb48
(XEN)  pre=39058 stosq=5a5dc post=96726
(XEN)  pre=3913c avx=9d33d post=3a2d1


Romley (Sandybridge):

(XEN) erms=0 fsrm=0 fzrm=0 fsrs=0 fsrcs=0 l1d=32k l2=256k
(XEN) L1 w/o flush:
(XEN)  pre=954 sse2=2958 post=798
(XEN)  pre=792 stosb=e7c post=af2
(XEN)  pre=732 stosl=b70 post=b28
(XEN)  pre=768 stosq=bdc post=ac2
(XEN)  pre=74a avx=26ac post=750
(XEN)  pre=774 sse2=27d2 post=708
(XEN)  pre=738 stosb=e4c post=ada
(XEN)  pre=714 stosl=b22 post=a98
(XEN)  pre=732 stosq=b34 post=ac2
(XEN)  pre=714 avx=2730 post=714
(XEN)  pre=714 sse2=27d8 post=70e
(XEN)  pre=72c stosb=e3a post=ab0
(XEN)  pre=714 stosl=b04 post=a74
(XEN)  pre=732 stosq=b04 post=a92
(XEN)  pre=714 avx=4fc8 post=714
(XEN) L1 w/ flush:
(XEN)  pre=7c8 sse2=2784 post=708
(XEN)  pre=72c stosb=2100 post=ca8
(XEN)  pre=80a stosl=1ed2 post=c1e
(XEN)  pre=7f2 stosq=2052 post=c90
(XEN)  pre=714 avx=2652 post=714
(XEN)  pre=7d4 sse2=2772 post=732
(XEN)  pre=7c8 stosb=2466 post=be2
(XEN)  pre=828 stosl=2004 post=c72
(XEN)  pre=7d4 stosq=20b2 post=c96
(XEN)  pre=81c avx=2682 post=714
(XEN)  pre=7d4 sse2=2754 post=72c
(XEN)  pre=7c8 stosb=2358 post=bca
(XEN)  pre=828 stosl=1ecc post=c48
(XEN)  pre=7c8 stosq=20b8 post=c00
(XEN)  pre=81c avx=26f4 post=714
(XEN) L2 w/o flush:
(XEN)  pre=9cf6 sse2=14b9e post=5706
(XEN)  pre=7ce0 stosb=6f00 post=74a6
(XEN)  pre=78ea stosl=5e26 post=79c8
(XEN)  pre=7926 stosq=5ec2 post=7848
(XEN)  pre=7920 avx=1410c post=5c70
(XEN)  pre=7bde sse2=14a06 post=5dea
(XEN)  pre=7ab2 stosb=6dda post=78c0
(XEN)  pre=7a6a stosl=5f34 post=792c
(XEN)  pre=7752 stosq=6054 post=7bfc
(XEN)  pre=7974 avx=14172 post=5de4
(XEN)  pre=7a76 sse2=14a54 post=5dc0
(XEN)  pre=77d6 stosb=6cd8 post=779a
(XEN)  pre=774c stosl=5dcc post=7c38
(XEN)  pre=788a stosq=5e62 post=7a04
(XEN)  pre=7722 avx=16aca post=5e2c
(XEN) L2 w/ flush:
(XEN)  pre=9cea sse2=14172 post=571e
(XEN)  pre=9c3c stosb=113e2 post=6d50
(XEN)  pre=9d56 stosl=10926 post=6ca8
(XEN)  pre=9ca2 stosq=10950 post=6db6
(XEN)  pre=9d44 avx=13b06 post=5700
(XEN)  pre=9df8 sse2=141cc post=56a6
(XEN)  pre=9cc0 stosb=112a4 post=6ca8
(XEN)  pre=9d50 stosl=109c8 post=6ca2
(XEN)  pre=9c84 stosq=10a10 post=6cf0
(XEN)  pre=9c84 avx=13b30 post=56e8
(XEN)  pre=9cde sse2=141ea post=579c
(XEN)  pre=9c7e stosb=11370 post=6c2a
(XEN)  pre=9d44 stosl=108de post=6c3c
(XEN)  pre=9bf4 stosq=1096e post=6ccc
(XEN)  pre=9c7e avx=13b18 post=56ac


Westmere:

(XEN) erms=0 fsrm=0 fzrm=0 fsrs=0 fsrcs=0 l1d=32k l2=256k
(XEN) L1 w/o flush:
(XEN)  pre=1184 sse2=2058 post=c60
(XEN)  pre=ad4 stosb=b60 post=1a24
(XEN)  pre=9d4 stosl=874 post=1348
(XEN)  pre=9e8 stosq=8d4 post=dd0
(XEN)  pre=9dc sse2=1dfc post=9e8
(XEN)  pre=9e8 stosb=a6c post=da4
(XEN)  pre=9d4 stosl=854 post=dd4
(XEN)  pre=9e8 stosq=8a4 post=d3c
(XEN)  pre=9d8 sse2=1e1c post=9ec
(XEN)  pre=9e8 stosb=a44 post=cc8
(XEN)  pre=9d4 stosl=81c post=d0c
(XEN)  pre=9ec stosq=810 post=cc8
(XEN) L1 w/ flush:
(XEN)  pre=b18 sse2=196c post=a84
(XEN)  pre=b08 stosb=15b8 post=116c
(XEN)  pre=b10 stosl=1440 post=163c
(XEN)  pre=a48 stosq=13d8 post=13b4
(XEN)  pre=b1c sse2=199c post=a3c
(XEN)  pre=bb8 stosb=15c4 post=12e8
(XEN)  pre=b0c stosl=1324 post=1430
(XEN)  pre=a48 stosq=135c post=12c4
(XEN)  pre=b1c sse2=199c post=a3c
(XEN)  pre=b18 stosb=1818 post=1320
(XEN)  pre=b10 stosl=1324 post=11bc
(XEN)  pre=a48 stosq=135c post=122c
(XEN) L2 w/o flush:
(XEN)  pre=8e20 sse2=f490 post=504c
(XEN)  pre=77a4 stosb=7804 post=6854
(XEN)  pre=778c stosl=7280 post=636c
(XEN)  pre=7594 stosq=7234 post=60c8
(XEN)  pre=70bc sse2=f3c4 post=55e0
(XEN)  pre=7014 stosb=77e8 post=5f68
(XEN)  pre=73f8 stosl=7264 post=62b8
(XEN)  pre=72ec stosq=7208 post=62fc
(XEN)  pre=6d80 sse2=f370 post=51a0
(XEN)  pre=6e34 stosb=7804 post=5f84
(XEN)  pre=7058 stosl=723c post=5fb8
(XEN)  pre=6f1c stosq=725c post=6188
(XEN) L2 w/ flush:
(XEN)  pre=8e48 sse2=cbc4 post=5034
(XEN)  pre=8d5c stosb=999c post=58d0
(XEN)  pre=8da0 stosl=912c post=590c
(XEN)  pre=8c10 stosq=8f80 post=5a0c
(XEN)  pre=8e10 sse2=cbd0 post=5030
(XEN)  pre=8cb0 stosb=9878 post=5960
(XEN)  pre=8de4 stosl=9060 post=58e4
(XEN)  pre=8c0c stosq=8fa0 post=5a10
(XEN)  pre=8d4c sse2=cbd0 post=502c
(XEN)  pre=8cf8 stosb=9834 post=58f0
(XEN)  pre=8de4 stosl=90d0 post=58e4
(XEN)  pre=8c0c stosq=9178 post=5998


Latitude E6410 (Sandybridge):

(XEN) erms=0 fsrm=0 fzrm=0 fsrs=0 fsrcs=0 l1d=32k l2=256k
(XEN) L1 w/o flush:
(XEN)  pre=68d sse2=3c06 post=460
(XEN)  pre=41f stosb=8a0 post=823
(XEN)  pre=413 stosl=6ae post=789
(XEN)  pre=413 stosq=6e3 post=78f
(XEN)  pre=413 sse2=3989 post=410
(XEN)  pre=422 stosb=81d post=771
(XEN)  pre=413 stosl=675 post=77d
(XEN)  pre=3f9 stosq=667 post=6fb
(XEN)  pre=437 sse2=38b7 post=416
(XEN)  pre=407 stosb=802 post=727
(XEN)  pre=407 stosl=65e post=754
(XEN)  pre=404 stosq=65b post=6ef
(XEN) L1 w/ flush:
(XEN)  pre=5b4 sse2=20ca post=433
(XEN)  pre=55f stosb=15a2 post=861
(XEN)  pre=565 stosl=1252 post=861
(XEN)  pre=559 stosq=1444 post=84d
(XEN)  pre=57c sse2=21ae post=436
(XEN)  pre=55f stosb=157e post=897
(XEN)  pre=56d stosl=1255 post=83b
(XEN)  pre=657 stosq=1282 post=86d
(XEN)  pre=565 sse2=21bd post=43a
(XEN)  pre=57c stosb=153d post=88d
(XEN)  pre=56b stosl=1247 post=87c
(XEN)  pre=573 stosq=1258 post=87f
(XEN) L2 w/o flush:
(XEN)  pre=602b sse2=1d4d4 post=3669
(XEN)  pre=4a6c stosb=4b79 post=44b4
(XEN)  pre=4976 stosl=4383 post=48d0
(XEN)  pre=4d95 stosq=435d post=47ba
(XEN)  pre=4bf3 sse2=1d333 post=39f7
(XEN)  pre=4bed stosb=4b3c post=4671
(XEN)  pre=5003 stosl=435d post=4de8
(XEN)  pre=4f0a stosq=4377 post=4874
(XEN)  pre=4d1e sse2=1d368 post=3e6e
(XEN)  pre=4f25 stosb=4b4a post=47a5
(XEN)  pre=4abf stosl=4316 post=47cc
(XEN)  pre=4f19 stosq=4351 post=48bb
(XEN) L2 w/ flush:
(XEN)  pre=60cb sse2=10310 post=3672
(XEN)  pre=60ce stosb=956c post=436b
(XEN)  pre=603d stosl=8a70 post=438f
(XEN)  pre=5fe1 stosq=876d post=442f
(XEN)  pre=60f8 sse2=103dc post=36aa
(XEN)  pre=6010 stosb=94db post=436e
(XEN)  pre=60fe stosl=8a7c post=43a7
(XEN)  pre=605b stosq=876d post=4473
(XEN)  pre=6093 sse2=10485 post=36b9
(XEN)  pre=604c stosb=93c4 post=43b3
(XEN)  pre=60b3 stosl=8c03 post=435f
(XEN)  pre=607e stosq=895c post=43fc


Tulsa (Fam0f Xeon (7100?)):

(XEN) erms=0 fsrm=0 fzrm=0 fsrs=0 fsrcs=0 l1d=16k l2=1024k
(XEN) L1 w/o flush:
(XEN)  pre=caf sse2=3cd4 post=b39
(XEN)  pre=b28 stosb=192b post=1485
(XEN)  pre=b9f stosl=d7b post=d37
(XEN)  pre=b28 stosq=c6b post=c8d
(XEN)  pre=b17 sse2=3223 post=ae4
(XEN)  pre=a8f stosb=bd2 post=b4a
(XEN)  pre=a8f stosl=aa0 post=af5
(XEN)  pre=ab1 stosq=a8f post=be3
(XEN)  pre=ac2 sse2=3212 post=ae4
(XEN)  pre=a8f stosb=bc1 post=ae4
(XEN)  pre=aa0 stosl=a6d post=ad3
(XEN)  pre=aa0 stosq=a6d post=ae4
(XEN) L1 w/ flush:
(XEN)  pre=b06 sse2=628c post=c27
(XEN)  pre=ae4 stosb=958c post=14eb
(XEN)  pre=b17 stosl=959d post=16a5
(XEN)  pre=b06 stosq=9669 post=15d9
(XEN)  pre=ae4 sse2=6127 post=bc1
(XEN)  pre=a7e stosb=9636 post=15b7
(XEN)  pre=aa0 stosl=92e4 post=1474
(XEN)  pre=a8f stosq=95e1 post=161d
(XEN)  pre=ab1 sse2=62bf post=c05
(XEN)  pre=a8f stosb=96e0 post=180a
(XEN)  pre=a8f stosl=9702 post=15a6
(XEN)  pre=aa0 stosq=9438 post=15d9
(XEN) L2 w/o flush:
(XEN)  pre=5342d sse2=d49ed post=21c8c
(XEN)  pre=25498 stosb=69d4b post=315c5
(XEN)  pre=249b4 stosl=6982e post=3166f
(XEN)  pre=2470c stosq=69e28 post=30bbe
(XEN)  pre=23f25 sse2=d4536 post=1fb14
(XEN)  pre=23e26 stosb=6ad2a post=30a6a
(XEN)  pre=23fcf stosl=68ed1 post=2fd22
(XEN)  pre=23e48 stosq=69be6 post=308e3
(XEN)  pre=23e9d sse2=d4459 post=20c9c
(XEN)  pre=23f69 stosb=6a70e post=30c9b
(XEN)  pre=24035 stosl=69069 post=302e9
(XEN)  pre=258fa stosq=69aa3 post=30cbd
(XEN) L2 w/ flush:
(XEN)  pre=263cd sse2=1306a2 post=21bd1
(XEN)  pre=265fe stosb=27aeb2 post=3177f
(XEN)  pre=26466 stosl=27f516 post=311c9
(XEN)  pre=26004 stosq=27cb40 post=3153d
(XEN)  pre=25641 sse2=13031d post=21b16
(XEN)  pre=26411 stosb=27f57c post=31ecd
(XEN)  pre=262f0 stosl=27b5bc post=315a3
(XEN)  pre=25f49 stosq=27b974 post=312a6
(XEN)  pre=25520 sse2=1310ed post=21b38
(XEN)  pre=2617a stosb=27d107 post=314d7
(XEN)  pre=261ad stosl=27bd81 post=309f3
(XEN)  pre=25ff3 stosq=27dff8 post=3164d
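The figures above are raw hexadecimal values. To compare methods,
it can help to convert them and take a per-method median. The
following small sketch does that for lines in exactly this format.
parse_sample() and median_for() are hypothetical helpers. The mail
states neither the unit (presumably TSC cycles) nor precisely what
"pre" and "post" measure.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* One parsed "(XEN)  pre=... <method>=... post=..." line; values are hex. */
struct sample {
    char method[16];
    unsigned long pre, val, post;
};

/* Returns 1 on success, 0 if the line isn't a sample line. */
static int parse_sample(const char *line, struct sample *s)
{
    return sscanf(line, "(XEN) pre=%lx %15[^=]=%lx post=%lx",
                  &s->pre, s->method, &s->val, &s->post) == 4;
}

static int cmp_ulong(const void *a, const void *b)
{
    unsigned long x = *(const unsigned long *)a;
    unsigned long y = *(const unsigned long *)b;

    return (x > y) - (x < y);
}

/* Median <method>= value across n lines (at most 64 samples), 0 if none. */
static unsigned long median_for(const char *const *lines, size_t n,
                                const char *method)
{
    unsigned long vals[64];
    size_t k = 0;
    struct sample s;

    for (size_t i = 0; i < n && k < sizeof(vals) / sizeof(vals[0]); i++)
        if (parse_sample(lines[i], &s) && strcmp(s.method, method) == 0)
            vals[k++] = s.val;
    if (k == 0)
        return 0;
    qsort(vals, k, sizeof(vals[0]), cmp_ulong);
    return vals[k / 2];   /* upper median when k is even */
}
```

For the three Aorus "L1 w/o flush" stosb samples (544, 55a, 564),
this yields 0x55a, i.e. 1370.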

Thread overview:
2021-04-08 13:58 x86: memset() / clear_page() / page scrubbing Jan Beulich
2021-04-09  6:08 ` Ankur Arora
2021-04-09  6:38   ` Jan Beulich
2021-04-09 21:01     ` Ankur Arora
2021-04-12  9:15       ` Jan Beulich
2021-04-13 13:17 ` Andrew Cooper
2021-04-14  8:12   ` Jan Beulich [this message]
2021-04-15 16:21     ` Andrew Cooper
2021-04-21 13:55       ` Jan Beulich
