From: Mikulas Patocka <mpatocka@redhat.com>
To: Dan Williams <dan.j.williams@intel.com>, Vishal Verma <vishal.l.verma@intel.com>, Dave Jiang <dave.jiang@intel.com>, Ira Weiny <ira.weiny@intel.com>, Mike Snitzer <msnitzer@redhat.com>
Cc: linux-nvdimm@lists.01.org, dm-devel@redhat.com
Subject: [PATCH v2] memcpy_flushcache: use cache flushing for larger lengths
Date: Mon, 30 Mar 2020 07:32:20 -0400 (EDT)
Message-ID: <alpine.LRH.2.02.2003300729320.9938@file01.intranet.prod.int.rdu2.redhat.com> (raw)
In-Reply-To: <alpine.LRH.2.02.2003291625590.32108@file01.intranet.prod.int.rdu2.redhat.com>

This is the second version of the patch - it adds a test for
boot_cpu_data.x86_clflush_size. There may be CPUs with a different cache
line size, and we don't want to run the 64-byte-aligned loop on them.

Mikulas

From: Mikulas Patocka <mpatocka@redhat.com>

memcpy_flushcache: use cache flushing for larger lengths

I tested dm-writecache performance on a machine with an Optane nvdimm,
and it turned out that for larger writes, cached stores + cache flushing
perform better than non-temporal stores. This is the throughput of
dm-writecache measured with this command:
dd if=/dev/zero of=/dev/mapper/wc bs=64 oflag=direct

block size	512		1024		2048		4096
movnti		496 MB/s	642 MB/s	725 MB/s	744 MB/s
clflushopt	373 MB/s	688 MB/s	1.1 GB/s	1.2 GB/s

We can see that for smaller blocks, movnti performs better, but for
larger blocks, clflushopt has better performance.

This patch changes the function __memcpy_flushcache accordingly, so that
with size >= 768 it performs cached stores and cache flushing. Note that
we must not use the new branch if the CPU doesn't have clflushopt - in
that case, the kernel would use the inefficient "clflush" instruction,
which has very bad performance.
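The structure of the new branch - peel 8-byte chunks until the destination reaches a 64-byte cache-line boundary, then copy one full line per iteration - can be sketched in portable userspace C. This is only an illustration: `copy_with_line_alignment` is a hypothetical name, and plain `memcpy` stands in for the `movnti`/`movq` asm; only the chunking and alignment logic mirror the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Sketch of the head-alignment pattern from the patch: 8-byte steps
 * until the destination is 64-byte aligned, then whole cache lines.
 * (The real code issues clflushopt on each completed line.) */
static void copy_with_line_alignment(void *dst, const void *src, size_t size)
{
	char *d = dst;
	const char *s = src;

	/* Peel 8-byte chunks until d sits on a cache-line boundary. */
	while (((uintptr_t)d & 63) != 0 && size >= 8) {
		memcpy(d, s, 8);
		d += 8; s += 8; size -= 8;
	}
	/* Main loop: one full 64-byte cache line per iteration. */
	while (size >= 64) {
		memcpy(d, s, 64);
		d += 64; s += 64; size -= 64;
	}
	/* Tail handled in one shot for simplicity in this sketch. */
	memcpy(d, s, size);
}
```

The kernel version can assume an 8-byte-aligned destination from the caller; the sketch above also falls back to the tail copy if alignment is never reached.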
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>

---
 arch/x86/lib/usercopy_64.c | 36 ++++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

Index: linux-2.6/arch/x86/lib/usercopy_64.c
===================================================================
--- linux-2.6.orig/arch/x86/lib/usercopy_64.c	2020-03-24 15:15:36.644945091 -0400
+++ linux-2.6/arch/x86/lib/usercopy_64.c	2020-03-30 07:17:51.450290007 -0400
@@ -152,6 +152,42 @@ void __memcpy_flushcache(void *_dst, con
 		return;
 	}
 
+	if (static_cpu_has(X86_FEATURE_CLFLUSHOPT) && size >= 768 && likely(boot_cpu_data.x86_clflush_size == 64)) {
+		while (!IS_ALIGNED(dest, 64)) {
+			asm("movq (%0), %%r8\n"
+			    "movnti %%r8, (%1)\n"
+				:: "r" (source), "r" (dest)
+				: "memory", "r8");
+			dest += 8;
+			source += 8;
+			size -= 8;
+		}
+		do {
+			asm("movq (%0), %%r8\n"
+			    "movq 8(%0), %%r9\n"
+			    "movq 16(%0), %%r10\n"
+			    "movq 24(%0), %%r11\n"
+			    "movq %%r8, (%1)\n"
+			    "movq %%r9, 8(%1)\n"
+			    "movq %%r10, 16(%1)\n"
+			    "movq %%r11, 24(%1)\n"
+			    "movq 32(%0), %%r8\n"
+			    "movq 40(%0), %%r9\n"
+			    "movq 48(%0), %%r10\n"
+			    "movq 56(%0), %%r11\n"
+			    "movq %%r8, 32(%1)\n"
+			    "movq %%r9, 40(%1)\n"
+			    "movq %%r10, 48(%1)\n"
+			    "movq %%r11, 56(%1)\n"
+				:: "r" (source), "r" (dest)
+				: "memory", "r8", "r9", "r10", "r11");
+			clflushopt((void *)dest);
+			dest += 64;
+			source += 64;
+			size -= 64;
+		} while (size >= 64);
+	}
+
 	/* 4x8 movnti loop */
 	while (size >= 32) {
 		asm("movq (%0), %%r8\n"
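The dispatch condition the patch adds can also be sketched in userspace C. Everything here is a hypothetical stand-in: `memcpy_flushcache_sketch`, the `has_clflushopt`/`clflush_size` parameters (modeling `static_cpu_has(X86_FEATURE_CLFLUSHOPT)` and `boot_cpu_data.x86_clflush_size`), and the two helper functions with byte counters are illustration only; plain `memcpy` replaces both store strategies.

```c
#include <stddef.h>
#include <string.h>

#define FLUSHCACHE_THRESHOLD 768  /* crossover point measured in the patch */

/* Counters so a caller can observe which path was taken (illustration). */
static size_t nt_bytes, cached_bytes;

/* Stand-in for the non-temporal (movnti) path. */
static void copy_nontemporal(void *d, const void *s, size_t n)
{
	memcpy(d, s, n);
	nt_bytes += n;
}

/* Stand-in for cached stores followed by clflushopt on each line. */
static void copy_cached_and_flush(void *d, const void *s, size_t n)
{
	memcpy(d, s, n);
	cached_bytes += n;
}

/* Dispatch mirroring the patch: large copies take the cached+flush
 * path only when the fast flush instruction is available and the
 * cache line is the expected 64 bytes; everything else stays on the
 * non-temporal path. */
static void memcpy_flushcache_sketch(void *d, const void *s, size_t n,
				     int has_clflushopt, int clflush_size)
{
	if (has_clflushopt && n >= FLUSHCACHE_THRESHOLD && clflush_size == 64)
		copy_cached_and_flush(d, s, n);
	else
		copy_nontemporal(d, s, n);
}
```

This makes the guard explicit: a CPU without clflushopt, or with a non-64-byte cache line, never enters the new branch, so it keeps the original movnti behavior.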