* [PATCH V6 0/2 RESEND] KSM replace hash algo with faster hash
@ 2018-04-18 19:32 Timofey Titovets
  2018-04-18 19:32 ` [PATCH V6 1/2 RESEND] xxHash: create arch dependent 32/64-bit xxhash() Timofey Titovets
  2018-04-18 19:32 ` [PATCH V6 2/2 RESEND] ksm: replace jhash2 with faster hash Timofey Titovets
  0 siblings, 2 replies; 27+ messages in thread
From: Timofey Titovets @ 2018-04-18 19:32 UTC (permalink / raw)
  To: linux-mm; +Cc: Timofey Titovets, Andrea Arcangeli, kvm, leesioh

From: Timofey Titovets <nefelim4ag@gmail.com>

The currently used jhash2 is slow; replacing it allows us to make KSM
less CPU hungry.

About speed (in kernel):
ksm: crc32c hash() 12081 MB/s
ksm: xxh64 hash() 8770 MB/s
ksm: xxh32 hash() 4529 MB/s
ksm: jhash2 hash() 1569 MB/s

Sioh Lee's tests (copied from another mail):

Test platform: openstack cloud platform (NEWTON version)
Experiment node: openstack based cloud compute node (CPU: Xeon E5-2620 v3, memory 64GB)
VM: (2 VCPU, RAM 4GB, DISK 20GB) * 4
Linux kernel: 4.14 (latest version)
KSM setup - sleep_millisecs: 200ms, pages_to_scan: 200

Experiment process:
First, we turn off KSM and launch 4 VMs. Then we turn on KSM and
measure the checksum computation time until full_scans becomes two.

The experimental results (each value is the average of the measured
values):
crc32c_intel: 1084.10ns
crc32c (no hardware acceleration): 7012.51ns
xxhash32: 2227.75ns
xxhash64: 1413.16ns
jhash2: 5128.30ns

In summary, the results show that crc32c_intel has an advantage over
all of the other hash functions used in the experiment (computation
time decreased by 84.54% compared to crc32c, 78.86% compared to
jhash2, 51.33% compared to xxhash32, and 23.28% compared to xxhash64).
The results are similar to Timofey's.

So:
- The first patch implements compile-time selection of the fastest
  xxhash implementation for the target platform.
- The second implements logic in KSM that benchmarks the available
  hash functions and picks the fastest one.

Thanks.
CC: Andrea Arcangeli <aarcange@redhat.com>
CC: linux-mm@kvack.org
CC: kvm@vger.kernel.org
CC: leesioh <solee@os.korea.ac.kr>

Timofey Titovets (2):
  xxHash: create arch dependent 32/64-bit xxhash()
  ksm: replace jhash2 with faster hash

 include/linux/xxhash.h | 23 +++++++++++++
 mm/Kconfig             |  2 ++
 mm/ksm.c               | 93 +++++++++++++++++++++++++++++++++++++++++++++++---
 3 files changed, 114 insertions(+), 4 deletions(-)

--
2.14.1

^ permalink raw reply	[flat|nested] 27+ messages in thread
* [PATCH V6 1/2 RESEND] xxHash: create arch dependent 32/64-bit xxhash()
  2018-04-18 19:32 [PATCH V6 0/2 RESEND] KSM replace hash algo with faster hash Timofey Titovets
@ 2018-04-18 19:32 ` Timofey Titovets
  2018-04-18 19:32 ` [PATCH V6 2/2 RESEND] ksm: replace jhash2 with faster hash Timofey Titovets
  1 sibling, 0 replies; 27+ messages in thread
From: Timofey Titovets @ 2018-04-18 19:32 UTC (permalink / raw)
  To: linux-mm; +Cc: Timofey Titovets, Andrea Arcangeli, kvm, leesioh

From: Timofey Titovets <nefelim4ag@gmail.com>

xxh32() is fast on both 32-bit and 64-bit platforms,
while xxh64() is fast only on 64-bit platforms.

Create xxhash(), which picks the fastest version at compile time.
As the result depends on the CPU word size, the main purpose of this
function is in-memory hashing.

Changes:
  v2:
    - Create that patch
  v3 -> v6:
    - Nothing, whole patchset version bump

Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
CC: Andrea Arcangeli <aarcange@redhat.com>
CC: linux-mm@kvack.org
CC: kvm@vger.kernel.org
CC: leesioh <solee@os.korea.ac.kr>
---
 include/linux/xxhash.h | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/include/linux/xxhash.h b/include/linux/xxhash.h
index 9e1f42cb57e9..52b073fea17f 100644
--- a/include/linux/xxhash.h
+++ b/include/linux/xxhash.h
@@ -107,6 +107,29 @@ uint32_t xxh32(const void *input, size_t length, uint32_t seed);
  */
 uint64_t xxh64(const void *input, size_t length, uint64_t seed);
 
+/**
+ * xxhash() - calculate wordsize hash of the input with a given seed
+ * @input:  The data to hash.
+ * @length: The length of the data to hash.
+ * @seed:   The seed can be used to alter the result predictably.
+ *
+ * If the hash does not need to be comparable between machines with
+ * different word sizes, this function will call whichever of xxh32()
+ * or xxh64() is faster.
+ *
+ * Return: wordsize hash of the data.
+ */
+
+static inline unsigned long xxhash(const void *input, size_t length,
+				   uint64_t seed)
+{
+#if BITS_PER_LONG == 64
+	return xxh64(input, length, seed);
+#else
+	return xxh32(input, length, seed);
+#endif
+}
+
 /*-****************************
  * Streaming Hash Functions
  *****************************/
--
2.14.1

^ permalink raw reply related	[flat|nested] 27+ messages in thread
* [PATCH V6 2/2 RESEND] ksm: replace jhash2 with faster hash
  2018-04-18 19:32 [PATCH V6 0/2 RESEND] KSM replace hash algo with faster hash Timofey Titovets
  2018-04-18 19:32 ` [PATCH V6 1/2 RESEND] xxHash: create arch dependent 32/64-bit xxhash() Timofey Titovets
@ 2018-04-18 19:32 ` Timofey Titovets
  2018-05-08 15:26   ` Claudio Imbrenda
  2018-05-22 20:22   ` Pavel Tatashin
  1 sibling, 2 replies; 27+ messages in thread
From: Timofey Titovets @ 2018-04-18 19:32 UTC (permalink / raw)
  To: linux-mm; +Cc: Timofey Titovets, leesioh, Andrea Arcangeli, kvm

From: Timofey Titovets <nefelim4ag@gmail.com>

1. Pick up Sioh Lee's crc32 patch, after some long conversation.
2. Merge it with my work on xxhash.
3. Add autoselect code to choose the fastest hash helper.

The base idea is the same: replace jhash2 with something faster.

Perf numbers:
Intel(R) Xeon(R) CPU E5-2420 v2 @ 2.20GHz
ksm: crc32c hash() 12081 MB/s
ksm: xxh64 hash() 8770 MB/s
ksm: xxh32 hash() 4529 MB/s
ksm: jhash2 hash() 1569 MB/s

As jhash2 will always be slower (for data sizes like PAGE_SIZE),
just drop it from the candidates.

Add a function to autoselect the hash algorithm based on hashing
speed, like the raid6 code does.

Move the init of zero_checksum from ksm_init() to the first call of
fasthash():
1. KSM init runs early in kernel init; running perf-testing code on
   the main kernel boot thread looks bad to me.
2. The crypto subsystem is not available that early in boot, so
   crc32c, even when compiled in, is not available.
   As crypto and KSM both init at subsys_initcall() (kernel init
   level 4), all possible consumers will run later, at levels 5+.

Output after KSM's first attempt to hash a page:
ksm: crc32c hash() 15218 MB/s
ksm: xxhash hash() 8640 MB/s
ksm: choice crc32c as hash function

Thanks.
Changes:
  v1 -> v2:
    - Move xxhash() to xxhash.h/c and separate patches
  v2 -> v3:
    - Move xxhash() xxhash.c -> xxhash.h
    - Replace xxhash_t with 'unsigned long'
    - Update kerneldoc above xxhash()
  v3 -> v4:
    - Merge xxhash/crc32 patches
    - Replace crc32 with crc32c (crc32 has the same speed as jhash2)
    - Add auto speed test and auto choice of the fastest hash function
  v4 -> v5:
    - Pick up the missed xxhash patch
    - Update code with the compile-time chosen xxhash
    - Add more macros to make the code more readable
    - As now only xxhash or crc32c can be used, skip the speed test
      and fall back to xxhash on crc32c allocation error
    - To work around the too-early init problem (crc32c not
      available), move zero_checksum init to the first call of
      fasthash()
    - Don't alloc a page for hash testing, use the arch zero pages
      for that
  v5 -> v6:
    - Use libcrc32c instead of the CRYPTO API, mainly for code/Kconfig
      deps simplification
    - Add crc32c_available():
      libcrc32c will BUG_ON() on crc32c problems, so test crc32c
      availability with crc32c_available()
    - Simplify choice_fastest_hash()
    - Simplify fasthash()
    - struct rmap_item and struct stable_node have sizeof == 64 on
      x86_64, which makes them cache friendly. As we don't suffer from
      hash collisions, change the hash type from unsigned long back
      to u32.
    - Fix kbuild robot warning, make all local functions static

Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
Signed-off-by: leesioh <solee@os.korea.ac.kr>
CC: Andrea Arcangeli <aarcange@redhat.com>
CC: linux-mm@kvack.org
CC: kvm@vger.kernel.org
---
 mm/Kconfig |  2 ++
 mm/ksm.c   | 93 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---
 2 files changed, 91 insertions(+), 4 deletions(-)

diff --git a/mm/Kconfig b/mm/Kconfig
index 03ff7703d322..b60bee4bb07e 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -305,6 +305,8 @@ config MMU_NOTIFIER
 config KSM
 	bool "Enable KSM for page merging"
 	depends on MMU
+	select XXHASH
+	select LIBCRC32C
 	help
 	  Enable Kernel Samepage Merging: KSM periodically scans those areas
 	  of an application's address space that an app has advised may be

diff --git a/mm/ksm.c b/mm/ksm.c
index c406f75957ad..2b84407fb918 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -25,7 +25,6 @@
 #include <linux/pagemap.h>
 #include <linux/rmap.h>
 #include <linux/spinlock.h>
-#include <linux/jhash.h>
 #include <linux/delay.h>
 #include <linux/kthread.h>
 #include <linux/wait.h>
@@ -41,6 +40,13 @@
 #include <linux/numa.h>
 
 #include <asm/tlbflush.h>
+
+/* Support for xxhash and crc32c */
+#include <crypto/hash.h>
+#include <linux/crc32c.h>
+#include <linux/xxhash.h>
+#include <linux/sizes.h>
+
 #include "internal.h"
 
 #ifdef CONFIG_NUMA
@@ -284,6 +290,87 @@ static DEFINE_SPINLOCK(ksm_mmlist_lock);
 		sizeof(struct __struct), __alignof__(struct __struct),\
 		(__flags), NULL)
 
+#define TIME_125MS	(HZ >> 3)
+#define PERF_TO_MBS(X)	(X*PAGE_SIZE*(1 << 3)/(SZ_1M))
+
+#define HASH_NONE	0
+#define HASH_CRC32C	1
+#define HASH_XXHASH	2
+
+static int fastest_hash = HASH_NONE;
+
+static bool __init crc32c_available(void)
+{
+	static struct shash_desc desc;
+
+	desc.tfm = crypto_alloc_shash("crc32c", 0, 0);
+	desc.flags = 0;
+
+	if (IS_ERR(desc.tfm)) {
+		pr_warn("ksm: alloc crc32c shash error %ld\n",
+			-PTR_ERR(desc.tfm));
+		return false;
+	}
+
+	crypto_free_shash(desc.tfm);
+	return true;
+}
+
+static void __init choice_fastest_hash(void)
+{
+
+	unsigned long je;
+	unsigned long perf_crc32c = 0;
+	unsigned long perf_xxhash = 0;
+
+	fastest_hash = HASH_XXHASH;
+	if (!crc32c_available())
+		goto out;
+
+	preempt_disable();
+	je = jiffies + TIME_125MS;
+	while (time_before(jiffies, je)) {
+		crc32c(0, ZERO_PAGE(0), PAGE_SIZE);
+		perf_crc32c++;
+	}
+	preempt_enable();
+
+	preempt_disable();
+	je = jiffies + TIME_125MS;
+	while (time_before(jiffies, je)) {
+		xxhash(ZERO_PAGE(0), PAGE_SIZE, 0);
+		perf_xxhash++;
+	}
+	preempt_enable();
+
+	pr_info("ksm: crc32c hash() %5ld MB/s\n", PERF_TO_MBS(perf_crc32c));
+	pr_info("ksm: xxhash hash() %5ld MB/s\n", PERF_TO_MBS(perf_xxhash));
+
+	if (perf_crc32c > perf_xxhash)
+		fastest_hash = HASH_CRC32C;
+out:
+	if (fastest_hash == HASH_CRC32C)
+		pr_info("ksm: choice crc32c as hash function\n");
+	else
+		pr_info("ksm: choice xxhash as hash function\n");
+}
+
+static u32 fasthash(const void *input, size_t length)
+{
+again:
+	switch (fastest_hash) {
+	case HASH_CRC32C:
+		return crc32c(0, input, length);
+	case HASH_XXHASH:
+		return xxhash(input, length, 0);
+	default:
+		choice_fastest_hash();
+		/* The correct value depends on page size and endianness */
+		zero_checksum = fasthash(ZERO_PAGE(0), PAGE_SIZE);
+		goto again;
+	}
+}
+
 static int __init ksm_slab_init(void)
 {
 	rmap_item_cache = KSM_KMEM_CACHE(rmap_item, 0);
@@ -979,7 +1066,7 @@ static u32 calc_checksum(struct page *page)
 {
 	u32 checksum;
 	void *addr = kmap_atomic(page);
-	checksum = jhash2(addr, PAGE_SIZE / 4, 17);
+	checksum = fasthash(addr, PAGE_SIZE);
 	kunmap_atomic(addr);
 	return checksum;
 }
@@ -3061,8 +3148,6 @@ static int __init ksm_init(void)
 	struct task_struct *ksm_thread;
 	int err;
 
-	/* The correct value depends on page size and endianness */
-	zero_checksum = calc_checksum(ZERO_PAGE(0));
 	/* Default to false for backwards compatibility */
 	ksm_use_zero_pages = false;
--
2.14.1

^ permalink raw reply related	[flat|nested] 27+ messages in thread
* Re: [PATCH V6 2/2 RESEND] ksm: replace jhash2 with faster hash 2018-04-18 19:32 ` [PATCH V6 2/2 RESEND] ksm: replace jhash2 with faster hash Timofey Titovets @ 2018-05-08 15:26 ` Claudio Imbrenda 2018-05-11 23:06 ` Timofey Titovets 2018-05-22 20:22 ` Pavel Tatashin 1 sibling, 1 reply; 27+ messages in thread From: Claudio Imbrenda @ 2018-05-08 15:26 UTC (permalink / raw) To: Timofey Titovets; +Cc: linux-mm, leesioh, Andrea Arcangeli, kvm On Wed, 18 Apr 2018 22:32:20 +0300 Timofey Titovets <nefelim4ag@gmail.com> wrote: > From: Timofey Titovets <nefelim4ag@gmail.com> > > 1. Pickup, Sioh Lee crc32 patch, after some long conversation > 2. Merge with my work on xxhash > 3. Add autoselect code to choice fastest hash helper. > > Base idea are same, replace jhash2 with something faster. > > Perf numbers: > Intel(R) Xeon(R) CPU E5-2420 v2 @ 2.20GHz > ksm: crc32c hash() 12081 MB/s > ksm: xxh64 hash() 8770 MB/s > ksm: xxh32 hash() 4529 MB/s > ksm: jhash2 hash() 1569 MB/s > > As jhash2 always will be slower (for data size like PAGE_SIZE), > just drop it from choice. > > Add function to autoselect hash algo on boot, > based on hashing speed, like raid6 code does. > > Move init of zero_checksum from init, to first call of fasthash(): > 1. KSM Init run on early kernel init, > run perf testing stuff on main kernel boot thread looks bad to This is my personal opinion, but I think it would be better and more uniform to have it during boot like raid6. It doesn't take too much time, and it allows to see immediately in dmesg what is going on. > me. 2. Crypto subsystem not avaliable at that early booting, > so crc32c even, compiled in, not avaliable > As crypto and ksm init, run at subsys_initcall() (4) kernel > level of init, all possible consumers will run later at 5+ levels have you tried moving ksm to a later stage? before commit a64fb3cd610c8e680 KSM was in fact initialized at level 6. After all, KSM cannot be triggered until userspace starts. 
> Output after first try of KSM to hash page: > ksm: crc32c hash() 15218 MB/s > ksm: xxhash hash() 8640 MB/s > ksm: choice crc32c as hash function > > Thanks. > > Changes: > v1 -> v2: > - Move xxhash() to xxhash.h/c and separate patches > v2 -> v3: > - Move xxhash() xxhash.c -> xxhash.h > - replace xxhash_t with 'unsigned long' > - update kerneldoc above xxhash() > v3 -> v4: > - Merge xxhash/crc32 patches > - Replace crc32 with crc32c (crc32 have same as jhash2 speed) > - Add auto speed test and auto choice of fastest hash function > v4 -> v5: > - Pickup missed xxhash patch > - Update code with compile time choicen xxhash > - Add more macros to make code more readable > - As now that only possible use xxhash or crc32c, > on crc32c allocation error, skip speed test and fallback to > xxhash > - For workaround too early init problem (crc32c not avaliable), > move zero_checksum init to first call of fastcall() > - Don't alloc page for hash testing, use arch zero pages for that > v5 -> v6: > - Use libcrc32c instead of CRYPTO API, mainly for > code/Kconfig deps Simplification > - Add crc32c_available(): > libcrc32c will BUG_ON on crc32c problems, > so test crc32c avaliable by crc32c_available() > - Simplify choice_fastest_hash() > - Simplify fasthash() > - struct rmap_item && stable_node have sizeof == 64 on x86_64, > that makes them cache friendly. As we don't suffer from hash > collisions, change hash type from unsigned long back to u32. 
> - Fix kbuild robot warning, make all local functions static > > Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com> > Signed-off-by: leesioh <solee@os.korea.ac.kr> > CC: Andrea Arcangeli <aarcange@redhat.com> > CC: linux-mm@kvack.org > CC: kvm@vger.kernel.org > --- > mm/Kconfig | 2 ++ > mm/ksm.c | 93 > +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--- 2 > files changed, 91 insertions(+), 4 deletions(-) > > diff --git a/mm/Kconfig b/mm/Kconfig > index 03ff7703d322..b60bee4bb07e 100644 > --- a/mm/Kconfig > +++ b/mm/Kconfig > @@ -305,6 +305,8 @@ config MMU_NOTIFIER > config KSM > bool "Enable KSM for page merging" > depends on MMU > + select XXHASH > + select LIBCRC32C > help > Enable Kernel Samepage Merging: KSM periodically scans > those areas of an application's address space that an app has advised > may be diff --git a/mm/ksm.c b/mm/ksm.c > index c406f75957ad..2b84407fb918 100644 > --- a/mm/ksm.c > +++ b/mm/ksm.c > @@ -25,7 +25,6 @@ > #include <linux/pagemap.h> > #include <linux/rmap.h> > #include <linux/spinlock.h> > -#include <linux/jhash.h> > #include <linux/delay.h> > #include <linux/kthread.h> > #include <linux/wait.h> > @@ -41,6 +40,13 @@ > #include <linux/numa.h> > > #include <asm/tlbflush.h> > + > +/* Support for xxhash and crc32c */ > +#include <crypto/hash.h> > +#include <linux/crc32c.h> > +#include <linux/xxhash.h> > +#include <linux/sizes.h> > + > #include "internal.h" > > #ifdef CONFIG_NUMA > @@ -284,6 +290,87 @@ static DEFINE_SPINLOCK(ksm_mmlist_lock); > sizeof(struct __struct), __alignof__(struct > __struct),\ (__flags), NULL) > > +#define TIME_125MS (HZ >> 3) > +#define PERF_TO_MBS(X) (X*PAGE_SIZE*(1 << 3)/(SZ_1M)) > + > +#define HASH_NONE 0 > +#define HASH_CRC32C 1 > +#define HASH_XXHASH 2 > + > +static int fastest_hash = HASH_NONE; > + > +static bool __init crc32c_available(void) > +{ > + static struct shash_desc desc; > + > + desc.tfm = crypto_alloc_shash("crc32c", 0, 0); will this work without the crypto api? 
> + desc.flags = 0; > + > + if (IS_ERR(desc.tfm)) { > + pr_warn("ksm: alloc crc32c shash error %ld\n", > + -PTR_ERR(desc.tfm)); > + return false; > + } > + > + crypto_free_shash(desc.tfm); > + return true; > +} > + > +static void __init choice_fastest_hash(void) s/choice/choose/ > +{ > + > + unsigned long je; > + unsigned long perf_crc32c = 0; > + unsigned long perf_xxhash = 0; > + > + fastest_hash = HASH_XXHASH; > + if (!crc32c_available()) > + goto out; > + > + preempt_disable(); > + je = jiffies + TIME_125MS; > + while (time_before(jiffies, je)) { > + crc32c(0, ZERO_PAGE(0), PAGE_SIZE); > + perf_crc32c++; > + } > + preempt_enable(); > + > + preempt_disable(); > + je = jiffies + TIME_125MS; > + while (time_before(jiffies, je)) { > + xxhash(ZERO_PAGE(0), PAGE_SIZE, 0); > + perf_xxhash++; > + } > + preempt_enable(); > + > + pr_info("ksm: crc32c hash() %5ld MB/s\n", > PERF_TO_MBS(perf_crc32c)); > + pr_info("ksm: xxhash hash() %5ld MB/s\n", > PERF_TO_MBS(perf_xxhash)); + > + if (perf_crc32c > perf_xxhash) > + fastest_hash = HASH_CRC32C; > +out: > + if (fastest_hash == HASH_CRC32C) > + pr_info("ksm: choice crc32c as hash function\n"); > + else > + pr_info("ksm: choice xxhash as hash function\n"); > +} I wonder if this can be generalized to have a list of possible hash functions, filtered by availability, and then tested for performance, more like the raid6 functions. > + > +static u32 fasthash(const void *input, size_t length) > +{ > +again: > + switch (fastest_hash) { > + case HASH_CRC32C: > + return crc32c(0, input, length); > + case HASH_XXHASH: > + return xxhash(input, length, 0); > + default: > + choice_fastest_hash(); same here s/choice/choose/ > + /* The correct value depends on page size and > endianness */ > + zero_checksum = fasthash(ZERO_PAGE(0), PAGE_SIZE); > + goto again; > + } > +} > + so if I understand correctly, the benchmark function will be called only when the function is called for the first time? 
> static int __init ksm_slab_init(void) > { > rmap_item_cache = KSM_KMEM_CACHE(rmap_item, 0); > @@ -979,7 +1066,7 @@ static u32 calc_checksum(struct page *page) > { > u32 checksum; > void *addr = kmap_atomic(page); > - checksum = jhash2(addr, PAGE_SIZE / 4, 17); > + checksum = fasthash(addr, PAGE_SIZE); > kunmap_atomic(addr); > return checksum; > } > @@ -3061,8 +3148,6 @@ static int __init ksm_init(void) > struct task_struct *ksm_thread; > int err; > > - /* The correct value depends on page size and endianness */ > - zero_checksum = calc_checksum(ZERO_PAGE(0)); > /* Default to false for backwards compatibility */ > ksm_use_zero_pages = false; > ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH V6 2/2 RESEND] ksm: replace jhash2 with faster hash
  2018-05-08 15:26 ` Claudio Imbrenda
@ 2018-05-11 23:06 ` Timofey Titovets
  2018-05-14 10:17 ` Claudio Imbrenda
  0 siblings, 1 reply; 27+ messages in thread
From: Timofey Titovets @ 2018-05-11 23:06 UTC (permalink / raw)
  To: imbrenda; +Cc: linux-mm, Sioh Lee, Andrea Arcangeli, kvm

Tue, 8 May 2018 at 18:26, Claudio Imbrenda <imbrenda@linux.vnet.ibm.com>:
> On Wed, 18 Apr 2018 22:32:20 +0300
> Timofey Titovets <nefelim4ag@gmail.com> wrote:
> > From: Timofey Titovets <nefelim4ag@gmail.com>
> >
> > 1. Pickup, Sioh Lee crc32 patch, after some long conversation
> > 2. Merge with my work on xxhash
> > 3. Add autoselect code to choice fastest hash helper.
> >
> > Base idea are same, replace jhash2 with something faster.
> >
> > Perf numbers:
> > Intel(R) Xeon(R) CPU E5-2420 v2 @ 2.20GHz
> > ksm: crc32c hash() 12081 MB/s
> > ksm: xxh64 hash() 8770 MB/s
> > ksm: xxh32 hash() 4529 MB/s
> > ksm: jhash2 hash() 1569 MB/s
> >
> > As jhash2 always will be slower (for data size like PAGE_SIZE),
> > just drop it from choice.
> >
> > Add function to autoselect hash algo on boot,
> > based on hashing speed, like raid6 code does.
> >
> > Move init of zero_checksum from init, to first call of fasthash():
> > 1. KSM Init run on early kernel init,
> > run perf testing stuff on main kernel boot thread looks bad to
>
> This is my personal opinion, but I think it would be better and more
> uniform to have it during boot like raid6. It doesn't take too much
> time, and it allows to see immediately in dmesg what is going on.

I don't like doing such things at boot: that will slow down boot and
add useless work in *MOST* cases.

E.g. anyone who uses btrfs as rootfs must wait for raid6_pq init in
order to mount, even if they don't use the raid56 functionality.

Same for KSM: who uses KSM? I think that 90% of current users are
servers with KVM VMs. I.e.
I don't think you use it on your notebook, and adding 250ms to every
bootup, even if you don't use KSM, looks like a bad idea to me.

And as this is an mm subsystem change, it will lead to *every Linux
device in the world* with KSM compiled in spending time and energy on
KSM init.

> > me. 2. Crypto subsystem not avaliable at that early booting,
> > so crc32c even, compiled in, not avaliable
> > As crypto and ksm init, run at subsys_initcall() (4) kernel
> > level of init, all possible consumers will run later at 5+ levels
>
> have you tried moving ksm to a later stage? before commit
> a64fb3cd610c8e680 KSM was in fact initialized at level 6. After all, KSM
> cannot be triggered until userspace starts.

Of course, and that works, but I don't have sufficient competence to
suggest such a change.

> > Output after first try of KSM to hash page:
> > ksm: crc32c hash() 15218 MB/s
> > ksm: xxhash hash() 8640 MB/s
> > ksm: choice crc32c as hash function
> >
> > Thanks.
on crc32c problems, > > so test crc32c avaliable by crc32c_available() > > - Simplify choice_fastest_hash() > > - Simplify fasthash() > > - struct rmap_item && stable_node have sizeof == 64 on x86_64, > > that makes them cache friendly. As we don't suffer from hash > > collisions, change hash type from unsigned long back to u32. > > - Fix kbuild robot warning, make all local functions static > > > > Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com> > > Signed-off-by: leesioh <solee@os.korea.ac.kr> > > CC: Andrea Arcangeli <aarcange@redhat.com> > > CC: linux-mm@kvack.org > > CC: kvm@vger.kernel.org > > --- > > mm/Kconfig | 2 ++ > > mm/ksm.c | 93 > > +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--- 2 > > files changed, 91 insertions(+), 4 deletions(-) > > > > diff --git a/mm/Kconfig b/mm/Kconfig > > index 03ff7703d322..b60bee4bb07e 100644 > > --- a/mm/Kconfig > > +++ b/mm/Kconfig > > @@ -305,6 +305,8 @@ config MMU_NOTIFIER > > config KSM > > bool "Enable KSM for page merging" > > depends on MMU > > + select XXHASH > > + select LIBCRC32C > > help > > Enable Kernel Samepage Merging: KSM periodically scans > > those areas of an application's address space that an app has advised > > may be diff --git a/mm/ksm.c b/mm/ksm.c > > index c406f75957ad..2b84407fb918 100644 > > --- a/mm/ksm.c > > +++ b/mm/ksm.c > > @@ -25,7 +25,6 @@ > > #include <linux/pagemap.h> > > #include <linux/rmap.h> > > #include <linux/spinlock.h> > > -#include <linux/jhash.h> > > #include <linux/delay.h> > > #include <linux/kthread.h> > > #include <linux/wait.h> > > @@ -41,6 +40,13 @@ > > #include <linux/numa.h> > > > > #include <asm/tlbflush.h> > > + > > +/* Support for xxhash and crc32c */ > > +#include <crypto/hash.h> > > +#include <linux/crc32c.h> > > +#include <linux/xxhash.h> > > +#include <linux/sizes.h> > > + > > #include "internal.h" > > > > #ifdef CONFIG_NUMA > > @@ -284,6 +290,87 @@ static DEFINE_SPINLOCK(ksm_mmlist_lock); > > sizeof(struct __struct), __alignof__(struct > > 
__struct),\ (__flags), NULL)
> >
> > +#define TIME_125MS (HZ >> 3)
> > +#define PERF_TO_MBS(X) (X*PAGE_SIZE*(1 << 3)/(SZ_1M))
> > +
> > +#define HASH_NONE 0
> > +#define HASH_CRC32C 1
> > +#define HASH_XXHASH 2
> > +
> > +static int fastest_hash = HASH_NONE;
> > +
> > +static bool __init crc32c_available(void)
> > +{
> > +	static struct shash_desc desc;
> > +
> > +	desc.tfm = crypto_alloc_shash("crc32c", 0, 0);
>
> will this work without the crypto api?

I don't know of a way to compile the kernel without the crypto API;
too many different subsystems depend on it, if I read the Kconfig
correctly, of course.

> > +	desc.flags = 0;
> > +
> > +	if (IS_ERR(desc.tfm)) {
> > +		pr_warn("ksm: alloc crc32c shash error %ld\n",
> > +			-PTR_ERR(desc.tfm));
> > +		return false;
> > +	}
> > +
> > +	crypto_free_shash(desc.tfm);
> > +	return true;
> > +}
> > +
> > +static void __init choice_fastest_hash(void)
>
> s/choice/choose/
>
> > +{
> > +
> > +	unsigned long je;
> > +	unsigned long perf_crc32c = 0;
> > +	unsigned long perf_xxhash = 0;
> > +
> > +	fastest_hash = HASH_XXHASH;
> > +	if (!crc32c_available())
> > +		goto out;
> > +
> > +	preempt_disable();
> > +	je = jiffies + TIME_125MS;
> > +	while (time_before(jiffies, je)) {
> > +		crc32c(0, ZERO_PAGE(0), PAGE_SIZE);
> > +		perf_crc32c++;
> > +	}
> > +	preempt_enable();
> > +
> > +	preempt_disable();
> > +	je = jiffies + TIME_125MS;
> > +	while (time_before(jiffies, je)) {
> > +		xxhash(ZERO_PAGE(0), PAGE_SIZE, 0);
> > +		perf_xxhash++;
> > +	}
> > +	preempt_enable();
> > +
> > +	pr_info("ksm: crc32c hash() %5ld MB/s\n", PERF_TO_MBS(perf_crc32c));
> > +	pr_info("ksm: xxhash hash() %5ld MB/s\n", PERF_TO_MBS(perf_xxhash));
> > +
> > +	if (perf_crc32c > perf_xxhash)
> > +		fastest_hash = HASH_CRC32C;
> > +out:
> > +	if (fastest_hash == HASH_CRC32C)
> > +		pr_info("ksm: choice crc32c as hash function\n");
> > +	else
> > +		pr_info("ksm: choice xxhash as hash function\n");
> > +}
>
> I wonder if this can be generalized to have a list of possible hash
functions, filtered by availability, and then tested for performance,
> more like the raid6 functions.

IIRC we talked about that on an old version of the patch set, and we
decided that:
- In an ideal situation, KSM should always use only one hash function.
  But we are afraid that hardware-accelerated crc32c can be missing in
  some way. So xxhash was added as an appropriate general-purpose
  fallback, which should work well enough for KSM in most cases.

So adding more complex logic, like raid6_pq has for all the different
instruction sets, is overkill.

> > +
> > +static u32 fasthash(const void *input, size_t length)
> > +{
> > +again:
> > +	switch (fastest_hash) {
> > +	case HASH_CRC32C:
> > +		return crc32c(0, input, length);
> > +	case HASH_XXHASH:
> > +		return xxhash(input, length, 0);
> > +	default:
> > +		choice_fastest_hash();
>
> same here s/choice/choose/
>
> > +		/* The correct value depends on page size and
> > endianness */
> > +		zero_checksum = fasthash(ZERO_PAGE(0), PAGE_SIZE);
> > +		goto again;
> > +	}
> > +}
> > +
>
> so if I understand correctly, the benchmark function will be called
> only when the function is called for the first time?

Yes, that is the case. It is a little bit tricky, but it will be
called only from the KSM thread, and only when the KSM thread tries to
do some useful work. So it must not block anything.

Thanks.

--
Have a nice day,
Timofey.

^ permalink raw reply	[flat|nested] 27+ messages in thread
* Re: [PATCH V6 2/2 RESEND] ksm: replace jhash2 with faster hash 2018-05-11 23:06 ` Timofey Titovets @ 2018-05-14 10:17 ` Claudio Imbrenda 2018-05-16 10:26 ` Timofey Titovets 0 siblings, 1 reply; 27+ messages in thread From: Claudio Imbrenda @ 2018-05-14 10:17 UTC (permalink / raw) To: Timofey Titovets; +Cc: linux-mm, Sioh Lee, Andrea Arcangeli, kvm On Sat, 12 May 2018 02:06:20 +0300 Timofey Titovets <nefelim4ag@gmail.com> wrote: > вт, 8 мая 2018 г. в 18:26, Claudio Imbrenda > <imbrenda@linux.vnet.ibm.com>: > > > On Wed, 18 Apr 2018 22:32:20 +0300 > > Timofey Titovets <nefelim4ag@gmail.com> wrote: > > > > From: Timofey Titovets <nefelim4ag@gmail.com> > > > > > > 1. Pickup, Sioh Lee crc32 patch, after some long conversation > > > 2. Merge with my work on xxhash > > > 3. Add autoselect code to choice fastest hash helper. > > > > > > Base idea are same, replace jhash2 with something faster. > > > > > > Perf numbers: > > > Intel(R) Xeon(R) CPU E5-2420 v2 @ 2.20GHz > > > ksm: crc32c hash() 12081 MB/s > > > ksm: xxh64 hash() 8770 MB/s > > > ksm: xxh32 hash() 4529 MB/s > > > ksm: jhash2 hash() 1569 MB/s > > > > > > As jhash2 always will be slower (for data size like PAGE_SIZE), > > > just drop it from choice. > > > > > > Add function to autoselect hash algo on boot, > > > based on hashing speed, like raid6 code does. > > > > > > Move init of zero_checksum from init, to first call of fasthash(): > > > 1. KSM Init run on early kernel init, > > > run perf testing stuff on main kernel boot thread looks bad > > > to > > > This is my personal opinion, but I think it would be better and more > > uniform to have it during boot like raid6. It doesn't take too much > > time, and it allows to see immediately in dmesg what is going on. > > I don't like such things at boot, that will slowdown boot and add > useless work in *MOST* cases. > > ex. Anyone who use btrfs as rootfs must wait raid6_pq init, for mount. > Even if they didn't use raid56 functionality. 
> > Same for ksm, who use ksm? I think that 90% of users currently > are servers with KVM's VMs. > > i.e. i don't think that you use it on your notebook, > and add 250ms to every bootup, even, if you did not use ksm > looks as bad idea for me. > > And as that a mm subsystem, that will lead to *every linux device in > the world* > with compiled in ksm, will spend time and energy to ksm init. fair enough > > > me. 2. Crypto subsystem not avaliable at that early booting, > > > so crc32c even, compiled in, not avaliable > > > As crypto and ksm init, run at subsys_initcall() (4) kernel > > > level of init, all possible consumers will run later at 5+ > > > levels > > > have you tried moving ksm to a later stage? before commit > > a64fb3cd610c8e680 KSM was in fact initialized at level 6. After > > all, KSM cannot be triggered until userspace starts. > > Of course and that works, > but i didn't have sufficient competence, > to suggest such changes. > > > > Output after first try of KSM to hash page: > > > ksm: crc32c hash() 15218 MB/s > > > ksm: xxhash hash() 8640 MB/s > > > ksm: choice crc32c as hash function > > > > > > Thanks. 
> > > > > > Changes: > > > v1 -> v2: > > > - Move xxhash() to xxhash.h/c and separate patches > > > v2 -> v3: > > > - Move xxhash() xxhash.c -> xxhash.h > > > - replace xxhash_t with 'unsigned long' > > > - update kerneldoc above xxhash() > > > v3 -> v4: > > > - Merge xxhash/crc32 patches > > > - Replace crc32 with crc32c (crc32 have same as jhash2 speed) > > > - Add auto speed test and auto choice of fastest hash function > > > v4 -> v5: > > > - Pickup missed xxhash patch > > > - Update code with compile time choicen xxhash > > > - Add more macros to make code more readable > > > - As now that only possible use xxhash or crc32c, > > > on crc32c allocation error, skip speed test and fallback to > > > xxhash > > > - For workaround too early init problem (crc32c not > > > avaliable), move zero_checksum init to first call of fastcall() > > > - Don't alloc page for hash testing, use arch zero pages for > > > that v5 -> v6: > > > - Use libcrc32c instead of CRYPTO API, mainly for > > > code/Kconfig deps Simplification > > > - Add crc32c_available(): > > > libcrc32c will BUG_ON on crc32c problems, > > > so test crc32c avaliable by crc32c_available() > > > - Simplify choice_fastest_hash() > > > - Simplify fasthash() > > > - struct rmap_item && stable_node have sizeof == 64 on x86_64, > > > that makes them cache friendly. As we don't suffer from hash > > > collisions, change hash type from unsigned long back to u32. 
> > > - Fix kbuild robot warning, make all local functions static > > > > > > Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com> > > > Signed-off-by: leesioh <solee@os.korea.ac.kr> > > > CC: Andrea Arcangeli <aarcange@redhat.com> > > > CC: linux-mm@kvack.org > > > CC: kvm@vger.kernel.org > > > --- > > > mm/Kconfig | 2 ++ > > > mm/ksm.c | 93 > > > +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--- 2 > > > files changed, 91 insertions(+), 4 deletions(-) > > > > > > diff --git a/mm/Kconfig b/mm/Kconfig > > > index 03ff7703d322..b60bee4bb07e 100644 > > > --- a/mm/Kconfig > > > +++ b/mm/Kconfig > > > @@ -305,6 +305,8 @@ config MMU_NOTIFIER > > > config KSM > > > bool "Enable KSM for page merging" > > > depends on MMU > > > + select XXHASH > > > + select LIBCRC32C > > > help > > > Enable Kernel Samepage Merging: KSM periodically scans > > > those areas of an application's address space that an app has > > > advised may be diff --git a/mm/ksm.c b/mm/ksm.c > > > index c406f75957ad..2b84407fb918 100644 > > > --- a/mm/ksm.c > > > +++ b/mm/ksm.c > > > @@ -25,7 +25,6 @@ > > > #include <linux/pagemap.h> > > > #include <linux/rmap.h> > > > #include <linux/spinlock.h> > > > -#include <linux/jhash.h> > > > #include <linux/delay.h> > > > #include <linux/kthread.h> > > > #include <linux/wait.h> > > > @@ -41,6 +40,13 @@ > > > #include <linux/numa.h> > > > > > > #include <asm/tlbflush.h> > > > + > > > +/* Support for xxhash and crc32c */ > > > +#include <crypto/hash.h> > > > +#include <linux/crc32c.h> > > > +#include <linux/xxhash.h> > > > +#include <linux/sizes.h> > > > + > > > #include "internal.h" > > > > > > #ifdef CONFIG_NUMA > > > @@ -284,6 +290,87 @@ static DEFINE_SPINLOCK(ksm_mmlist_lock); > > > sizeof(struct __struct), __alignof__(struct > > > __struct),\ (__flags), NULL) > > > > > > +#define TIME_125MS (HZ >> 3) > > > +#define PERF_TO_MBS(X) (X*PAGE_SIZE*(1 << 3)/(SZ_1M)) > > > + > > > +#define HASH_NONE 0 > > > +#define HASH_CRC32C 1 > > > +#define 
HASH_XXHASH 2 > > > + > > > +static int fastest_hash = HASH_NONE; > > > + > > > +static bool __init crc32c_available(void) > > > +{ > > > + static struct shash_desc desc; > > > + > > > + desc.tfm = crypto_alloc_shash("crc32c", 0, 0); > > > will this work without the crypto api? > > I didn't know a way to compile kernel without crypto api, > To many different sub systems depends on him, > if i read Kconfig correctly of course. I'm confused here. Why did you want to drop the dependency on the crypto API in Kconfig if you are using it anyway? Or did I misunderstand? > > > + desc.flags = 0; > > > + > > > + if (IS_ERR(desc.tfm)) { > > > + pr_warn("ksm: alloc crc32c shash error %ld\n", > > > + -PTR_ERR(desc.tfm)); > > > + return false; > > > + } > > > + > > > + crypto_free_shash(desc.tfm); > > > + return true; > > > +} > > > + > > > +static void __init choice_fastest_hash(void) > > > s/choice/choose/ > > > > +{ > > > + > > > + unsigned long je; > > > + unsigned long perf_crc32c = 0; > > > + unsigned long perf_xxhash = 0; > > > + > > > + fastest_hash = HASH_XXHASH; > > > + if (!crc32c_available()) > > > + goto out; > > > + > > > + preempt_disable(); > > > + je = jiffies + TIME_125MS; > > > + while (time_before(jiffies, je)) { > > > + crc32c(0, ZERO_PAGE(0), PAGE_SIZE); > > > + perf_crc32c++; > > > + } > > > + preempt_enable(); > > > + > > > + preempt_disable(); > > > + je = jiffies + TIME_125MS; > > > + while (time_before(jiffies, je)) { > > > + xxhash(ZERO_PAGE(0), PAGE_SIZE, 0); > > > + perf_xxhash++; > > > + } > > > + preempt_enable(); > > > + > > > + pr_info("ksm: crc32c hash() %5ld MB/s\n", > > > PERF_TO_MBS(perf_crc32c)); > > > + pr_info("ksm: xxhash hash() %5ld MB/s\n", > > > PERF_TO_MBS(perf_xxhash)); + > > > + if (perf_crc32c > perf_xxhash) > > > + fastest_hash = HASH_CRC32C; > > > +out: > > > + if (fastest_hash == HASH_CRC32C) > > > + pr_info("ksm: choice crc32c as hash function\n"); > > > + else > > > + pr_info("ksm: choice xxhash as hash function\n"); > > > +} 
> > > I wonder if this can be generalized to have a list of possible hash > > functions, filtered by availability, and then tested for > > performance, more like the raid6 functions. > > IIRC: > We was talk about that on old version of patch set. > And we decide what: > - in ideal situation, ksm must use only one hash function, always. > But, we afraid about that crc32c with hardware acceleration, can > be missed by some way. > So, as appropriate fallback, xxhash added, as general proporse, > which must work > good enough for ksm in most cases. > > So adding more complex logic, like raid6_pq have with all of different > instruction set are overkill. fair enough > > > + > > > +static u32 fasthash(const void *input, size_t length) > > > +{ > > > +again: > > > + switch (fastest_hash) { > > > + case HASH_CRC32C: > > > + return crc32c(0, input, length); > > > + case HASH_XXHASH: > > > + return xxhash(input, length, 0); > > > + default: > > > + choice_fastest_hash(); > > > same here s/choice/choose/ > > > > + /* The correct value depends on page size and > > > endianness */ > > > + zero_checksum = fasthash(ZERO_PAGE(0), PAGE_SIZE); > > > + goto again; > > > + } > > > +} > > > + > > > so if I understand correctly, the benchmark function will be called > > only when the function is called for the first time? > > yes, that is. > That a little bit tricky, > but it's will be called only from KSM thread, > and only what KSM thread will try do some useful work. > > So that must not block anything. > > Thanks. best regards Claudio Imbrenda ^ permalink raw reply [flat|nested] 27+ messages in thread
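The pattern debated in this exchange — a sentinel value, a one-time benchmark/setup triggered by the first real call, then a retry — can be sketched in plain userspace C. This is only an illustration of the lazy-selection idea; the names and the stand-in hash functions below are not the kernel's:

```c
#include <stdint.h>
#include <stddef.h>

enum { HASH_NONE, HASH_A, HASH_B };

static int fastest_hash = HASH_NONE;
static int setup_calls;	/* counts how often the one-time setup runs */

/* Cheap stand-ins for crc32c()/xxhash(): FNV-1a and a byte sum. */
static uint32_t hash_a(const void *p, size_t n)
{
	const unsigned char *s = p;
	uint32_t h = 2166136261u;

	while (n--)
		h = (h ^ *s++) * 16777619u;
	return h;
}

static uint32_t hash_b(const void *p, size_t n)
{
	const unsigned char *s = p;
	uint32_t h = 0;

	while (n--)
		h += *s++;
	return h;
}

/* Pretend benchmark: hard-pick HASH_A; the kernel patch times both here. */
static void choose_fastest_hash(void)
{
	setup_calls++;
	fastest_hash = HASH_A;
}

static uint32_t fasthash(const void *input, size_t length)
{
again:
	switch (fastest_hash) {
	case HASH_A:
		return hash_a(input, length);
	case HASH_B:
		return hash_b(input, length);
	default:
		choose_fastest_hash();	/* runs exactly once, on first use */
		goto again;
	}
}
```

The key property, matching Timofey's answer above, is that only the very first call pays for the setup; every later call takes one of the fast switch arms.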
* Re: [PATCH V6 2/2 RESEND] ksm: replace jhash2 with faster hash 2018-05-14 10:17 ` Claudio Imbrenda @ 2018-05-16 10:26 ` Timofey Titovets 0 siblings, 0 replies; 27+ messages in thread From: Timofey Titovets @ 2018-05-16 10:26 UTC (permalink / raw) To: imbrenda; +Cc: linux-mm, Sioh Lee, Andrea Arcangeli, kvm Mon, 14 May 2018 at 13:17, Claudio Imbrenda <imbrenda@linux.vnet.ibm.com>: > On Sat, 12 May 2018 02:06:20 +0300 > Timofey Titovets <nefelim4ag@gmail.com> wrote: > > Tue, 8 May 2018 at 18:26, Claudio Imbrenda > > <imbrenda@linux.vnet.ibm.com>: > > > > > On Wed, 18 Apr 2018 22:32:20 +0300 > > > Timofey Titovets <nefelim4ag@gmail.com> wrote: > > > > > > From: Timofey Titovets <nefelim4ag@gmail.com> > > > > > > > > 1. Pickup, Sioh Lee crc32 patch, after some long conversation > > > > 2. Merge with my work on xxhash > > > > 3. Add autoselect code to choice fastest hash helper. > > > > > > > > Base idea are same, replace jhash2 with something faster. > > > > > > > > Perf numbers: > > > > Intel(R) Xeon(R) CPU E5-2420 v2 @ 2.20GHz > > > > ksm: crc32c hash() 12081 MB/s > > > > ksm: xxh64 hash() 8770 MB/s > > > > ksm: xxh32 hash() 4529 MB/s > > > > ksm: jhash2 hash() 1569 MB/s > > > > > > > > As jhash2 always will be slower (for data size like PAGE_SIZE), > > > > just drop it from choice. > > > > > > > > Add function to autoselect hash algo on boot, > > > > based on hashing speed, like raid6 code does. > > > > > > > > Move init of zero_checksum from init, to first call of fasthash(): > > > > 1. KSM Init run on early kernel init, > > > > run perf testing stuff on main kernel boot thread looks bad > > > > to > > > > > This is my personal opinion, but I think it would be better and more > > > uniform to have it during boot like raid6. It doesn't take too much > > > time, and it allows to see immediately in dmesg what is going on. > > > > I don't like such things at boot, that will slowdown boot and add > > useless work in *MOST* cases. > > > > ex. 
Anyone who use btrfs as rootfs must wait raid6_pq init, for mount. > > Even if they didn't use raid56 functionality. > > > > Same for ksm, who use ksm? I think that 90% of users currently > > are servers with KVM's VMs. > > > > i.e. i don't think that you use it on your notebook, > > and add 250ms to every bootup, even, if you did not use ksm > > looks as bad idea for me. > > > > And as that a mm subsystem, that will lead to *every linux device in > > the world* > > with compiled in ksm, will spend time and energy to ksm init. > fair enough > > > > me. 2. Crypto subsystem not avaliable at that early booting, > > > > so crc32c even, compiled in, not avaliable > > > > As crypto and ksm init, run at subsys_initcall() (4) kernel > > > > level of init, all possible consumers will run later at 5+ > > > > levels > > > > > have you tried moving ksm to a later stage? before commit > > > a64fb3cd610c8e680 KSM was in fact initialized at level 6. After > > > all, KSM cannot be triggered until userspace starts. > > > > Of course and that works, > > but i didn't have sufficient competence, > > to suggest such changes. > > > > > > Output after first try of KSM to hash page: > > > > ksm: crc32c hash() 15218 MB/s > > > > ksm: xxhash hash() 8640 MB/s > > > > ksm: choice crc32c as hash function > > > > > > > > Thanks. 
> > > > > > > > Changes: > > > > v1 -> v2: > > > > - Move xxhash() to xxhash.h/c and separate patches > > > > v2 -> v3: > > > > - Move xxhash() xxhash.c -> xxhash.h > > > > - replace xxhash_t with 'unsigned long' > > > > - update kerneldoc above xxhash() > > > > v3 -> v4: > > > > - Merge xxhash/crc32 patches > > > > - Replace crc32 with crc32c (crc32 have same as jhash2 speed) > > > > - Add auto speed test and auto choice of fastest hash function > > > > v4 -> v5: > > > > - Pickup missed xxhash patch > > > > - Update code with compile time choicen xxhash > > > > - Add more macros to make code more readable > > > > - As now that only possible use xxhash or crc32c, > > > > on crc32c allocation error, skip speed test and fallback to > > > > xxhash > > > > - For workaround too early init problem (crc32c not > > > > avaliable), move zero_checksum init to first call of fastcall() > > > > - Don't alloc page for hash testing, use arch zero pages for > > > > that v5 -> v6: > > > > - Use libcrc32c instead of CRYPTO API, mainly for > > > > code/Kconfig deps Simplification > > > > - Add crc32c_available(): > > > > libcrc32c will BUG_ON on crc32c problems, > > > > so test crc32c avaliable by crc32c_available() > > > > - Simplify choice_fastest_hash() > > > > - Simplify fasthash() > > > > - struct rmap_item && stable_node have sizeof == 64 on x86_64, > > > > that makes them cache friendly. As we don't suffer from hash > > > > collisions, change hash type from unsigned long back to u32. 
> > > > - Fix kbuild robot warning, make all local functions static > > > > > > > > Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com> > > > > Signed-off-by: leesioh <solee@os.korea.ac.kr> > > > > CC: Andrea Arcangeli <aarcange@redhat.com> > > > > CC: linux-mm@kvack.org > > > > CC: kvm@vger.kernel.org > > > > --- > > > > mm/Kconfig | 2 ++ > > > > mm/ksm.c | 93 > > > > +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--- 2 > > > > files changed, 91 insertions(+), 4 deletions(-) > > > > > > > > diff --git a/mm/Kconfig b/mm/Kconfig > > > > index 03ff7703d322..b60bee4bb07e 100644 > > > > --- a/mm/Kconfig > > > > +++ b/mm/Kconfig > > > > @@ -305,6 +305,8 @@ config MMU_NOTIFIER > > > > config KSM > > > > bool "Enable KSM for page merging" > > > > depends on MMU > > > > + select XXHASH > > > > + select LIBCRC32C > > > > help > > > > Enable Kernel Samepage Merging: KSM periodically scans > > > > those areas of an application's address space that an app has > > > > advised may be diff --git a/mm/ksm.c b/mm/ksm.c > > > > index c406f75957ad..2b84407fb918 100644 > > > > --- a/mm/ksm.c > > > > +++ b/mm/ksm.c > > > > @@ -25,7 +25,6 @@ > > > > #include <linux/pagemap.h> > > > > #include <linux/rmap.h> > > > > #include <linux/spinlock.h> > > > > -#include <linux/jhash.h> > > > > #include <linux/delay.h> > > > > #include <linux/kthread.h> > > > > #include <linux/wait.h> > > > > @@ -41,6 +40,13 @@ > > > > #include <linux/numa.h> > > > > > > > > #include <asm/tlbflush.h> > > > > + > > > > +/* Support for xxhash and crc32c */ > > > > +#include <crypto/hash.h> > > > > +#include <linux/crc32c.h> > > > > +#include <linux/xxhash.h> > > > > +#include <linux/sizes.h> > > > > + > > > > #include "internal.h" > > > > > > > > #ifdef CONFIG_NUMA > > > > @@ -284,6 +290,87 @@ static DEFINE_SPINLOCK(ksm_mmlist_lock); > > > > sizeof(struct __struct), __alignof__(struct > > > > __struct),\ (__flags), NULL) > > > > > > > > +#define TIME_125MS (HZ >> 3) > > > > +#define PERF_TO_MBS(X) 
(X*PAGE_SIZE*(1 << 3)/(SZ_1M)) > > > > + > > > > +#define HASH_NONE 0 > > > > +#define HASH_CRC32C 1 > > > > +#define HASH_XXHASH 2 > > > > + > > > > +static int fastest_hash = HASH_NONE; > > > > + > > > > +static bool __init crc32c_available(void) > > > > +{ > > > > + static struct shash_desc desc; > > > > + > > > > + desc.tfm = crypto_alloc_shash("crc32c", 0, 0); > > > > > will this work without the crypto api? > > > > I didn't know a way to compile kernel without crypto api, > > To many different sub systems depends on him, > > if i read Kconfig correctly of course. > I'm confused here. Why did you want to drop the dependency on the > crypto API in Kconfig if you are using it anyway? Or did I > misunderstand? A misunderstanding, I think. You asked whether this will work without the crypto API; my answer was that I didn't see an obvious way to compile the kernel without the crypto API at all. Even without KSM adding a dependency on it, the kernel has plenty of other subsystems that depend on the crypto API. (And since the patch adds a dependency on the crypto API, there is no way to compile KSM without it, so I'm not sure that "KSM without the crypto API" is an issue that must be fixed.) Thanks. 
> > > > + desc.flags = 0; > > > > + > > > > + if (IS_ERR(desc.tfm)) { > > > > + pr_warn("ksm: alloc crc32c shash error %ld\n", > > > > + -PTR_ERR(desc.tfm)); > > > > + return false; > > > > + } > > > > + > > > > + crypto_free_shash(desc.tfm); > > > > + return true; > > > > +} > > > > + > > > > +static void __init choice_fastest_hash(void) > > > > > s/choice/choose/ > > > > > > +{ > > > > + > > > > + unsigned long je; > > > > + unsigned long perf_crc32c = 0; > > > > + unsigned long perf_xxhash = 0; > > > > + > > > > + fastest_hash = HASH_XXHASH; > > > > + if (!crc32c_available()) > > > > + goto out; > > > > + > > > > + preempt_disable(); > > > > + je = jiffies + TIME_125MS; > > > > + while (time_before(jiffies, je)) { > > > > + crc32c(0, ZERO_PAGE(0), PAGE_SIZE); > > > > + perf_crc32c++; > > > > + } > > > > + preempt_enable(); > > > > + > > > > + preempt_disable(); > > > > + je = jiffies + TIME_125MS; > > > > + while (time_before(jiffies, je)) { > > > > + xxhash(ZERO_PAGE(0), PAGE_SIZE, 0); > > > > + perf_xxhash++; > > > > + } > > > > + preempt_enable(); > > > > + > > > > + pr_info("ksm: crc32c hash() %5ld MB/s\n", > > > > PERF_TO_MBS(perf_crc32c)); > > > > + pr_info("ksm: xxhash hash() %5ld MB/s\n", > > > > PERF_TO_MBS(perf_xxhash)); + > > > > + if (perf_crc32c > perf_xxhash) > > > > + fastest_hash = HASH_CRC32C; > > > > +out: > > > > + if (fastest_hash == HASH_CRC32C) > > > > + pr_info("ksm: choice crc32c as hash function\n"); > > > > + else > > > > + pr_info("ksm: choice xxhash as hash function\n"); > > > > +} > > > > > I wonder if this can be generalized to have a list of possible hash > > > functions, filtered by availability, and then tested for > > > performance, more like the raid6 functions. > > > > IIRC: > > We was talk about that on old version of patch set. > > And we decide what: > > - in ideal situation, ksm must use only one hash function, always. > > But, we afraid about that crc32c with hardware acceleration, can > > be missed by some way. 
> > So, as appropriate fallback, xxhash added, as general proporse, > > which must work > > good enough for ksm in most cases. > > > > So adding more complex logic, like raid6_pq have with all of different > > instruction set are overkill. > fair enough > > > > + > > > > +static u32 fasthash(const void *input, size_t length) > > > > +{ > > > > +again: > > > > + switch (fastest_hash) { > > > > + case HASH_CRC32C: > > > > + return crc32c(0, input, length); > > > > + case HASH_XXHASH: > > > > + return xxhash(input, length, 0); > > > > + default: > > > > + choice_fastest_hash(); > > > > > same here s/choice/choose/ > > > > > > + /* The correct value depends on page size and > > > > endianness */ > > > > + zero_checksum = fasthash(ZERO_PAGE(0), PAGE_SIZE); > > > > + goto again; > > > > + } > > > > +} > > > > + > > > > > so if I understand correctly, the benchmark function will be called > > > only when the function is called for the first time? > > > > yes, that is. > > That a little bit tricky, > > but it's will be called only from KSM thread, > > and only what KSM thread will try do some useful work. > > > > So that must not block anything. > > > > Thanks. > best regards > Claudio Imbrenda -- Have a nice day, Timofey. ^ permalink raw reply [flat|nested] 27+ messages in thread
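The measurement loop quoted above — count how many pages can be hashed inside a fixed 125 ms window, then scale by 8 to get per-second throughput, which is exactly what PERF_TO_MBS does — can be mimicked in userspace with clock(). This is a sketch of the idea, not the kernel code; the byte-mixing "hash" is a stand-in for crc32c()/xxhash():

```c
#include <time.h>
#include <stdint.h>
#include <stddef.h>

#define PAGE_SIZE 4096UL
#define SZ_1M     (1024UL * 1024UL)
/* X pages hashed in 125 ms -> X * 8 pages/s -> bytes/s -> MB/s */
#define PERF_TO_MBS(x) ((x) * PAGE_SIZE * (1UL << 3) / SZ_1M)

static uint32_t toy_hash(const unsigned char *p, size_t n)
{
	uint32_t h = 0;

	while (n--)
		h = h * 31 + *p++;
	return h;
}

/* Count hash iterations over one page within ~125 ms of CPU time. */
static unsigned long bench_125ms(const unsigned char *page)
{
	clock_t end = clock() + CLOCKS_PER_SEC / 8;
	unsigned long iters = 0;
	volatile uint32_t sink;

	while (clock() < end) {
		sink = toy_hash(page, PAGE_SIZE);
		iters++;
	}
	(void)sink;
	return iters;
}
```

The kernel version additionally wraps each window in preempt_disable()/preempt_enable() so the count is not distorted by the KSM thread being scheduled out mid-window.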
* Re: [PATCH V6 2/2 RESEND] ksm: replace jhash2 with faster hash 2018-04-18 19:32 ` [PATCH V6 2/2 RESEND] ksm: replace jhash2 with faster hash Timofey Titovets 2018-05-08 15:26 ` Claudio Imbrenda @ 2018-05-22 20:22 ` Pavel Tatashin 2018-05-23 13:45 ` Timofey Titovets 1 sibling, 1 reply; 27+ messages in thread From: Pavel Tatashin @ 2018-05-22 20:22 UTC (permalink / raw) To: Timofey Titovets; +Cc: linux-mm, leesioh, Andrea Arcangeli, kvm Hi Timofey, > > Perf numbers: > Intel(R) Xeon(R) CPU E5-2420 v2 @ 2.20GHz > ksm: crc32c hash() 12081 MB/s > ksm: xxh64 hash() 8770 MB/s > ksm: xxh32 hash() 4529 MB/s > ksm: jhash2 hash() 1569 MB/s That is a very nice improvement over jhash2! > Add function to autoselect hash algo on boot, > based on hashing speed, like raid6 code does. Are you aware of hardware where crc32c is slower compared to xxhash? Perhaps always use crc32c when available? > + > +static u32 fasthash(const void *input, size_t length) > +{ > +again: > + switch (fastest_hash) { > + case HASH_CRC32C: > + return crc32c(0, input, length); > + case HASH_XXHASH: > + return xxhash(input, length, 0); You are losing half of the 64-bit word in the xxh64 case? Is this acceptable? Maybe do one more xor: in 64-bit case in xxhash() do: (v >> 32) | (u32)v ? > + default: > + choice_fastest_hash(); > + /* The correct value depends on page size and endianness */ > + zero_checksum = fasthash(ZERO_PAGE(0), PAGE_SIZE); > + goto again; > + } > +} choice_fastest_hash() does not belong to fasthash(). We are losing leaf function optimizations if you keep it in this hot-path. Also, fastest_hash should really be a static branch in order to avoid an extra load and conditional branch. I think crc32c should simply be used when it is available, and xxhash otherwise; the decision should be made in ksm_init(). Thank you, Pavel ^ permalink raw reply [flat|nested] 27+ messages in thread
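Pavel's folding suggestion — presumably `(v >> 32) ^ (u32)v` (an xor, as his prose says, rather than the `|` as typed), so that both halves of the 64-bit xxh64 result contribute to the 32-bit checksum instead of the high half being discarded — would look like this as a userspace sketch (names illustrative):

```c
#include <stdint.h>

/* Fold a 64-bit hash into 32 bits so the high half still matters. */
static uint32_t xxh64_fold(uint64_t v)
{
	return (uint32_t)(v >> 32) ^ (uint32_t)v;
}
```

Plain truncation maps, e.g., 0xdeadbeef00000000 and 0 to the same 32-bit value; the fold keeps them distinct.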
* Re: [PATCH V6 2/2 RESEND] ksm: replace jhash2 with faster hash 2018-05-22 20:22 ` Pavel Tatashin @ 2018-05-23 13:45 ` Timofey Titovets 2018-05-23 14:24 ` Pavel Tatashin 0 siblings, 1 reply; 27+ messages in thread From: Timofey Titovets @ 2018-05-23 13:45 UTC (permalink / raw) To: pasha.tatashin; +Cc: linux-mm, Sioh Lee, Andrea Arcangeli, kvm Tue, 22 May 2018 at 23:22, Pavel Tatashin <pasha.tatashin@oracle.com>: > Hi Timofey, > > > > Perf numbers: > > Intel(R) Xeon(R) CPU E5-2420 v2 @ 2.20GHz > > ksm: crc32c hash() 12081 MB/s > > ksm: xxh64 hash() 8770 MB/s > > ksm: xxh32 hash() 4529 MB/s > > ksm: jhash2 hash() 1569 MB/s > That is a very nice improvement over jhash2! > > Add function to autoselect hash algo on boot, > > based on hashing speed, like raid6 code does. > Are you aware of hardware where crc32c is slower compared to xxhash? > Perhaps always use crc32c when available? crc32c will always be available because of Kconfig. But if crc32c doesn't have HW acceleration, it will be slower. To talk about the whole range of HW, I would have to own that HW, so I can't say that *all* supported HW has accelerated crc32c. > > + > > +static u32 fasthash(const void *input, size_t length) > > +{ > > +again: > > + switch (fastest_hash) { > > + case HASH_CRC32C: > > + return crc32c(0, input, length); > > + case HASH_XXHASH: > > + return xxhash(input, length, 0); > You are loosing half of 64-bit word in xxh64 case? Is this acceptable? May > be do one more xor: in 64-bit case in xxhash() do: (v >> 32) | (u32)v ? AFAIK, that would make the hash function worse. In any case, in KSM the hash is used only to check whether a page has changed since the last scan, so it doesn't really matter (IMHO). > > + default: > > + choice_fastest_hash(); > > + /* The correct value depends on page size and endianness */ > > + zero_checksum = fasthash(ZERO_PAGE(0), PAGE_SIZE); > > + goto again; > > + } > > +} > choice_fastest_hash() does not belong to fasthash(). We are loosing leaf > function optimizations if you keep it in this hot-path. Also, fastest_hash > should really be a static branch in order to avoid extra load and conditional > branch. I don't think that will give any noticeable performance benefit compared to the hash computation and the memcmp in the RB tree. In theory, that could be replaced with a hand-written jump table to *avoid* runtime overhead. AFAIK, at 5 entries gcc converts a switch into a jump table by itself. > I think, crc32c should simply be used when it is available, and use xxhash > otherwise, the decision should be made in ksm_init() I already said, in the conversation above, why I think doing that at ksm_init() is a bad idea. > Thank you, > Pavel Thanks. -- Have a nice day, Timofey. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH V6 2/2 RESEND] ksm: replace jhash2 with faster hash 2018-05-23 13:45 ` Timofey Titovets @ 2018-05-23 14:24 ` Pavel Tatashin 2018-05-24 8:01 ` Timofey Titovets 0 siblings, 1 reply; 27+ messages in thread From: Pavel Tatashin @ 2018-05-23 14:24 UTC (permalink / raw) To: nefelim4ag; +Cc: Linux Memory Management List, solee, aarcange, kvm Hi Timofey, > crc32c will always be available, because of Kconfig. > But if crc32c doesn't have HW acceleration, it will be slower. > For talk about range of HW, i must have that HW, > so i can't say that *all* supported HW, have crc32c with acceleration. How about always defaulting to crc32c when HW acceleration is present, without doing timings? Do you have performance numbers of crc32c without acceleration? > > You are loosing half of 64-bit word in xxh64 case? Is this acceptable? May > > be do one more xor: in 64-bit case in xxhash() do: (v >> 32) | (u32)v ? > AFAIK, that lead to make hash function worse. > Even, in ksm hash used only for check if page has changed since last scan, > so that doesn't matter really (IMHO). I understand that losing half of the hash result might be acceptable in this case, but I am not really sure how XORing one more time can possibly make the hash function worse; could you please elaborate? > > choice_fastest_hash() does not belong to fasthash(). We are loosing leaf > > function optimizations if you keep it in this hot-path. Also, fastest_hash > > should really be a static branch in order to avoid extra load and > conditional > > branch. > I don't think what that will give any noticeable performance benefit. > In compare to hash computation and memcmp in RB. You are right, it is small compared to the hash and memcmp, but still I think it makes sense to use a static branch; after all, the value will never change at runtime after the first time it is set. > In theory, that can be replaced with self written jump table, to *avoid* > run time overhead. > AFAIK at 5 entries, gcc convert switch to jump table itself. > > I think, crc32c should simply be used when it is available, and use > xxhash > > otherwise, the decision should be made in ksm_init() > I already said, in above conversation, why i think do that at ksm_init() is > a bad idea. It really feels wrong to keep choice_fastest_hash() in fasthash(); it is done only once and really belongs to an init function, like ksm_init(). As I understand, you think it is a bad idea to keep it in ksm_init() because it slows down boot by 0.25s, which I agree with you is substantial. But I really do not think we should spend those 0.25s deciding which hash function is optimal at all; instead, we should default to one or the other during boot based on the hardware we are booting on. If crc32c without hw acceleration is no worse than jhash2, maybe we should simply switch to crc32c? Thank you, Pavel ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH V6 2/2 RESEND] ksm: replace jhash2 with faster hash 2018-05-23 14:24 ` Pavel Tatashin @ 2018-05-24 8:01 ` Timofey Titovets 2018-05-25 1:16 ` Pavel Tatashin 2018-05-27 13:03 ` [PATCH V6 2/2 RESEND] " Mike Rapoport 0 siblings, 2 replies; 27+ messages in thread From: Timofey Titovets @ 2018-05-24 8:01 UTC (permalink / raw) To: pasha.tatashin; +Cc: linux-mm, Sioh Lee, Andrea Arcangeli, kvm Wed, 23 May 2018 at 17:24, Pavel Tatashin <pasha.tatashin@oracle.com>: > Hi Timofey, > > crc32c will always be available, because of Kconfig. > > But if crc32c doesn't have HW acceleration, it will be slower. > > For talk about range of HW, i must have that HW, > > so i can't say that *all* supported HW, have crc32c with acceleration. > How about always defaulting to crc32c when HW acceleration is present > without doing timings? IIRC, yes, the shash API can return 'cra_priority'. > Do you have performance numbers of crc32c without acceleration? Yes, https://lkml.org/lkml/2017/12/30/222 The experimental results (the experimental value is the average of the measured values) crc32c_intel: 1084.10ns crc32c (no hardware acceleration): 7012.51ns xxhash32: 2227.75ns xxhash64: 1413.16ns jhash2: 5128.30ns > > > You are loosing half of 64-bit word in xxh64 case? Is this acceptable? > May > > > be do one more xor: in 64-bit case in xxhash() do: (v >> 32) | (u32)v ? > > AFAIK, that lead to make hash function worse. > > Even, in ksm hash used only for check if page has changed since last scan, > > so that doesn't matter really (IMHO). > I understand that losing half of the hash result might be acceptable in > this case, but I am not really sure how XOirng one more time can possibly > make hash function worse, could you please elaborate? IIRC, because xor is symmetric. I.e. shift: 0b01011010 >> 4 = 0b0101, and xor: 0b0101 ^ 0b1010 = 0b1111. Xor will decrease randomness/entropy and will lead to hash collisions. > > > choice_fastest_hash() does not belong to fasthash(). We are loosing leaf > > > function optimizations if you keep it in this hot-path. Also, > fastest_hash > > > should really be a static branch in order to avoid extra load and > > conditional > > > branch. > > I don't think what that will give any noticeable performance benefit. > > In compare to hash computation and memcmp in RB. > You are right, it is small compared to hash and memcmp, but still I think > it makes sense to use static branch, after all the value will never change > during runtime after the first time it is set. > > In theory, that can be replaced with self written jump table, to *avoid* > > run time overhead. > > AFAIK at 5 entries, gcc convert switch to jump table itself. > > > I think, crc32c should simply be used when it is available, and use > xxhash > > > otherwise, the decision should be made in ksm_init() > > I already said, in above conversation, why i think do that at ksm_init() > is > > a bad idea. > It really feels wrong to keep choice_fastest_hash() in fasthash(), it is > done only once and really belongs to the init function, like ksm_init(). As It's possible to move the decision from lazy load to the ksm thread; that would allow us to run the benchmark without slowing down boot. But for that to work, ksm must start later, after crypto init. > I understand, you think it is a bad idea to keep it in ksm_init() because > it slows down boot by 0.25s, which I agree with your is substantial. But, I > really do not think that we should spend those 0.25s at all deciding what > hash function is optimal, and instead default to one or another during boot > based on hardware we are booting on. If crc32c without hw acceleration is > no worse than jhash2, maybe we should simply switch to crc32c? crc32c with no HW acceleration is slower compared to jhash2 on x86, so I think the result will be the same on other arches. > Thank you, > Pavel Thanks. -- Have a nice day, Timofey. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH V6 2/2 RESEND] ksm: replace jhash2 with faster hash 2018-05-24 8:01 ` Timofey Titovets @ 2018-05-25 1:16 ` Pavel Tatashin 2018-05-26 20:25 ` [PATCH] " kbuild test robot 2018-05-26 21:06 ` kbuild test robot 1 sibling, 2 replies; 27+ messages in thread From: Pavel Tatashin @ 2018-05-25 1:16 UTC (permalink / raw) To: Timofey Titovets; +Cc: linux-mm, Sioh Lee, Andrea Arcangeli, kvm Hi Timofey, > > Do you have performance numbers of crc32c without acceleration? > Yes, https://lkml.org/lkml/2017/12/30/222 > > The experimental results (the experimental value is the average of the > measured values) > crc32c_intel: 1084.10ns > crc32c (no hardware acceleration): 7012.51ns > xxhash32: 2227.75ns > xxhash64: 1413.16ns > jhash2: 5128.30ns Excellent, thank you for this data. > > I understand that losing half of the hash result might be acceptable in > > this case, but I am not really sure how XOirng one more time can possibly > > make hash function worse, could you please elaborate? > > IIRC, because of xor are symmetric > i.e. shift: > 0b01011010 >> 4 = 0b0101 > and xor: > 0b0101 ^ 0b1010 = 0b1111 > Xor will decrease randomness/entropy and will lead to hash collisions. Makes perfect sense. Yes, XORing two random numbers reduces entropy. > That possible to move decision from lazy load, to ksm_thread, > that will allow us to start bench and not slowdown boot. > > But for that to works, ksm must start later, after init of crypto. After studying this dependency some more I agree: it is OK to choose the hash function where it is, but I still disagree that we must measure the performance at runtime. > crc32c with no hw, are slower in compare to jhash2 on x86, so i think on > other arches result will be same. Agreed. Below is your patch, updated with my suggested changes. 1. Remove the hard dependency on crc32c; use it only when it is available. 2. Do not spend time measuring performance; choose crc32c only if a HW-optimized implementation is available. 3. Replace the logic with static branches. 4. Fix a couple of minor bugs: fastest_hash_setup() and crc32c_available() were marked as __init functions, and thus could be unmapped by the time they are run for the first time; I think a section mismatch warning would catch those. Removed dead code: "desc.flags = 0", and also replaced desc with shash. Removed the unnecessary function-local static "static struct shash_desc desc"; this removes it from the data page. Fixed a few spelling errors, and other minor changes to pass ./scripts/checkpatch.pl. The patch is untested, but should work. Please let me know if you agree with the changes. If so, you can test and resubmit the series. Thank you, Pavel Patch: ========================================================================== ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] ksm: replace jhash2 with faster hash 2018-05-25 1:16 ` Pavel Tatashin @ 2018-05-26 20:25 ` kbuild test robot 2018-05-26 21:06 ` kbuild test robot 1 sibling, 0 replies; 27+ messages in thread From: kbuild test robot @ 2018-05-26 20:25 UTC (permalink / raw) To: Pavel Tatashin Cc: kbuild-all, Timofey Titovets, linux-mm, Sioh Lee, Andrea Arcangeli, kvm [-- Attachment #1: Type: text/plain, Size: 1714 bytes --] Hi Pavel, Thank you for the patch! Yet something to improve: [auto build test ERROR on mmotm/master] [also build test ERROR on v4.17-rc6] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Pavel-Tatashin/ksm-replace-jhash2-with-faster-hash/20180527-032512 base: git://git.cmpxchg.org/linux-mmotm.git master config: i386-randconfig-x078-201821 (attached as .config) compiler: gcc-7 (Debian 7.3.0-16) 7.3.0 reproduce: # save the attached .config to linux build tree make ARCH=i386 All errors (new ones prefixed by >>): mm/ksm.c: In function 'fasthash': >> mm/ksm.c:338:15: error: implicit declaration of function 'xxhash'; did you mean 'jhash'? 
[-Werror=implicit-function-declaration] return (u32)xxhash(input, length, 0); ^~~~~~ jhash cc1: some warnings being treated as errors vim +338 mm/ksm.c 331 332 static u32 fasthash(const void *input, size_t length) 333 { 334 if (static_branch_likely(&ksm_use_crc32c)) 335 return crc32c(0, input, length); 336 337 if (static_branch_likely(&ksm_use_xxhash)) > 338 return (u32)xxhash(input, length, 0); 339 340 /* Is done only once on the first call of fasthash() */ 341 fasthash_setup(); 342 343 /* Now, that we know the hash alg., calculate checksum for zero page */ 344 zero_checksum = fasthash(ZERO_PAGE(0), PAGE_SIZE); 345 346 return fasthash(input, length); 347 } 348 --- 0-DAY kernel test infrastructure Open Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation [-- Attachment #2: .config.gz --] [-- Type: application/gzip, Size: 27001 bytes --] ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] ksm: replace jhash2 with faster hash 2018-05-25 1:16 ` Pavel Tatashin 2018-05-26 20:25 ` [PATCH] " kbuild test robot @ 2018-05-26 21:06 ` kbuild test robot 1 sibling, 0 replies; 27+ messages in thread From: kbuild test robot @ 2018-05-26 21:06 UTC (permalink / raw) To: Pavel Tatashin Cc: kbuild-all, Timofey Titovets, linux-mm, Sioh Lee, Andrea Arcangeli, kvm [-- Attachment #1: Type: text/plain, Size: 1682 bytes --] Hi Pavel, Thank you for the patch! Yet something to improve: [auto build test ERROR on mmotm/master] [also build test ERROR on v4.17-rc6] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Pavel-Tatashin/ksm-replace-jhash2-with-faster-hash/20180527-032512 base: git://git.cmpxchg.org/linux-mmotm.git master config: i386-randconfig-s0-201821 (attached as .config) compiler: gcc-6 (Debian 6.4.0-9) 6.4.0 20171026 reproduce: # save the attached .config to linux build tree make ARCH=i386 All errors (new ones prefixed by >>): mm/ksm.c: In function 'fasthash': >> mm/ksm.c:338:15: error: implicit declaration of function 'xxhash' [-Werror=implicit-function-declaration] return (u32)xxhash(input, length, 0); ^~~~~~ cc1: some warnings being treated as errors vim +/xxhash +338 mm/ksm.c 331 332 static u32 fasthash(const void *input, size_t length) 333 { 334 if (static_branch_likely(&ksm_use_crc32c)) 335 return crc32c(0, input, length); 336 337 if (static_branch_likely(&ksm_use_xxhash)) > 338 return (u32)xxhash(input, length, 0); 339 340 /* Is done only once on the first call of fasthash() */ 341 fasthash_setup(); 342 343 /* Now, that we know the hash alg., calculate checksum for zero page */ 344 zero_checksum = fasthash(ZERO_PAGE(0), PAGE_SIZE); 345 346 return fasthash(input, length); 347 } 348 --- 0-DAY kernel test infrastructure Open Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation [-- Attachment #2: .config.gz --] [-- Type: 
application/gzip, Size: 29416 bytes --] ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH V6 2/2 RESEND] ksm: replace jhash2 with faster hash 2018-05-24 8:01 ` Timofey Titovets 2018-05-25 1:16 ` Pavel Tatashin @ 2018-05-27 13:03 ` Mike Rapoport 2018-05-29 14:45 ` Pavel Tatashin 1 sibling, 1 reply; 27+ messages in thread From: Mike Rapoport @ 2018-05-27 13:03 UTC (permalink / raw) To: Timofey Titovets Cc: pasha.tatashin, linux-mm, Sioh Lee, Andrea Arcangeli, kvm Hi, On Thu, May 24, 2018 at 11:01:20AM +0300, Timofey Titovets wrote: > On Wed, May 23, 2018 at 17:24, Pavel Tatashin <pasha.tatashin@oracle.com> wrote: > > > Hi Timofey, [ ... ] > > It really feels wrong to keep choice_fastest_hash() in fasthash(), it is > > done only once and really belongs to the init function, like ksm_init(). > > It is possible to move the decision from lazy load to ksm_thread; > that would allow us to start the benchmark without slowing down boot. > > But for that to work, KSM must start later, after the crypto init. What about moving choice_fastest_hash() to run_store()? KSM anyway starts with ksm_run = KSM_RUN_STOP and does not scan until userspace writes !0 to /sys/kernel/mm/ksm/run. Selecting the hash function when KSM is actually enabled seems quite appropriate... > > I understand, you think it is a bad idea to keep it in ksm_init() because > > it slows down boot by 0.25s, which I agree with you is substantial. But I > > really do not think that we should spend those 0.25s at all deciding what > > hash function is optimal, and should instead default to one or another during > > boot based on the hardware we are booting on. If crc32c without hw acceleration > > is no worse than jhash2, maybe we should simply switch to crc32c? > > crc32c with no HW acceleration is slower than jhash2 on x86, so I think the > result will be the same on other arches. > > > Thank you, > > Pavel > > Thanks. > > -- > Have a nice day, > Timofey. > -- Sincerely yours, Mike. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH V6 2/2 RESEND] ksm: replace jhash2 with faster hash 2018-05-27 13:03 ` [PATCH V6 2/2 RESEND] " Mike Rapoport @ 2018-05-29 14:45 ` Pavel Tatashin 2018-06-07 8:58 ` Timofey Titovets 0 siblings, 1 reply; 27+ messages in thread From: Pavel Tatashin @ 2018-05-29 14:45 UTC (permalink / raw) To: rppt; +Cc: Timofey Titovets, Linux Memory Management List, solee, aarcange, kvm > What about moving choice_fastest_hash() to run_store()? > KSM anyway starts with ksm_run = KSM_RUN_STOP and does not scan until > userspace writes !0 to /sys/kernel/mm/ksm/run. > Selection of the hash function when KSM is actually enabled seems quite > appropriate... Hi Mike, This is a good idea to select hash function from run_store() when (flags & KSM_RUN_MERGE) is set for the first time. Pavel ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH V6 2/2 RESEND] ksm: replace jhash2 with faster hash 2018-05-29 14:45 ` Pavel Tatashin @ 2018-06-07 8:58 ` Timofey Titovets 2018-06-07 11:52 ` Mike Rapoport 0 siblings, 1 reply; 27+ messages in thread From: Timofey Titovets @ 2018-06-07 8:58 UTC (permalink / raw) To: pasha.tatashin; +Cc: rppt, linux-mm, Sioh Lee, Andrea Arcangeli, kvm On Tue, May 29, 2018 at 17:46, Pavel Tatashin <pasha.tatashin@oracle.com> wrote: > > > What about moving choice_fastest_hash() to run_store()? > > > KSM anyway starts with ksm_run = KSM_RUN_STOP and does not scan until > > userspace writes !0 to /sys/kernel/mm/ksm/run. > > > Selection of the hash function when KSM is actually enabled seems quite > > appropriate... > > Hi Mike, > > This is a good idea to select hash function from run_store() when (flags & > KSM_RUN_MERGE) is set for the first time. > > Pavel IIRC, run_store is hidden under '#ifdef CONFIG_SYSFS'. So what about the case without CONFIG_SYSFS? -- Have a nice day, Timofey. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH V6 2/2 RESEND] ksm: replace jhash2 with faster hash 2018-06-07 8:58 ` Timofey Titovets @ 2018-06-07 11:52 ` Mike Rapoport 2018-06-08 1:29 ` Pavel Tatashin 0 siblings, 1 reply; 27+ messages in thread From: Mike Rapoport @ 2018-06-07 11:52 UTC (permalink / raw) To: Timofey Titovets Cc: pasha.tatashin, linux-mm, Sioh Lee, Andrea Arcangeli, kvm On Thu, Jun 07, 2018 at 11:58:05AM +0300, Timofey Titovets wrote: > On Tue, May 29, 2018 at 17:46, Pavel Tatashin <pasha.tatashin@oracle.com> wrote: > > > > > What about moving choice_fastest_hash() to run_store()? > > > > > KSM anyway starts with ksm_run = KSM_RUN_STOP and does not scan until > > > userspace writes !0 to /sys/kernel/mm/ksm/run. > > > > > Selection of the hash function when KSM is actually enabled seems quite > > > appropriate... > > > > Hi Mike, > > > > This is a good idea to select hash function from run_store() when (flags & > > KSM_RUN_MERGE) is set for the first time. > > > > Pavel > > IIRC, run_store is hidden under '#ifdef CONFIG_SYSFS'. > So what about the case without CONFIG_SYSFS? With CONFIG_SYSFS=n there is nothing that will set ksm_run to anything but zero and ksm_do_scan will never be called. > -- > Have a nice day, > Timofey. > -- Sincerely yours, Mike. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH V6 2/2 RESEND] ksm: replace jhash2 with faster hash 2018-06-07 11:52 ` Mike Rapoport @ 2018-06-08 1:29 ` Pavel Tatashin 2018-06-10 5:38 ` Mike Rapoport 2018-06-25 8:48 ` Mike Rapoport 0 siblings, 2 replies; 27+ messages in thread From: Pavel Tatashin @ 2018-06-08 1:29 UTC (permalink / raw) To: rppt; +Cc: Timofey Titovets, Linux Memory Management List, solee, aarcange, kvm > With CONFIG_SYSFS=n there is nothing that will set ksm_run to anything but > zero and ksm_do_scan will never be called. > Unfortunately, this is not so: In: /linux-master/mm/ksm.c 3143#else 3144 ksm_run = KSM_RUN_MERGE; /* no way for user to start it */ 3145 3146#endif /* CONFIG_SYSFS */ So, we do set ksm_run to run right from ksm_init() when CONFIG_SYSFS=n. I wonder if it is acceptable to use only xxhash when CONFIG_SYSFS=n? Thank you, Pavel ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH V6 2/2 RESEND] ksm: replace jhash2 with faster hash 2018-06-08 1:29 ` Pavel Tatashin @ 2018-06-10 5:38 ` Mike Rapoport 2018-06-22 18:48 ` Pavel Tatashin 2018-06-25 8:48 ` Mike Rapoport 1 sibling, 1 reply; 27+ messages in thread From: Mike Rapoport @ 2018-06-10 5:38 UTC (permalink / raw) To: Pavel Tatashin Cc: Timofey Titovets, Linux Memory Management List, solee, aarcange, kvm On Thu, Jun 07, 2018 at 09:29:49PM -0400, Pavel Tatashin wrote: > > With CONFIG_SYSFS=n there is nothing that will set ksm_run to anything but > > zero and ksm_do_scan will never be called. > > > > Unfortunatly, this is not so: > > In: /linux-master/mm/ksm.c > > 3143#else > 3144 ksm_run = KSM_RUN_MERGE; /* no way for user to start it */ > 3145 > 3146#endif /* CONFIG_SYSFS */ > > So, we do set ksm_run to run right from ksm_init() when CONFIG_SYSFS=n. Huh, missed that one... > I wonder if this is acceptible to only use xxhash when CONFIG_SYSFS=n ? A bit unrelated to CONFIG_SYSFS, but rather for rare use-cases in general. What will happen in the following scenario: * The system has crc32c HW acceleration * KSM chooses crc32c * KSM runs with crc32c * user removes crc32c HW acceleration module If I understand correctly, we'll then fall back to pure SW crc32c calculations, right? > Thank you, > Pavel > -- Sincerely yours, Mike. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH V6 2/2 RESEND] ksm: replace jhash2 with faster hash 2018-06-10 5:38 ` Mike Rapoport @ 2018-06-22 18:48 ` Pavel Tatashin 0 siblings, 0 replies; 27+ messages in thread From: Pavel Tatashin @ 2018-06-22 18:48 UTC (permalink / raw) To: rppt; +Cc: Timofey Titovets, Linux Memory Management List, solee, aarcange, kvm > A bit unrelated to CONFIG_SYSFS, but rather for rare use-cases in general. > What will happen in the following scenario: > > * The system has crc32c HW acceleration > * KSM chooses crc32c > * KSM runs with crc32c > * user removes crc32c HW acceleration module > > If I understand correctly, we'll then fall back to pure SW crc32c > calculations, right? Yes, we fall back to SW crc32c, which is slower compared to the HW-optimized version, but we won't change the hash function once it is set. I do not think it makes sense to add any extra logic into KSM for that, even after every page is unmerged and the ksm thread is stopped. Pavel ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH V6 2/2 RESEND] ksm: replace jhash2 with faster hash 2018-06-08 1:29 ` Pavel Tatashin 2018-06-10 5:38 ` Mike Rapoport @ 2018-06-25 8:48 ` Mike Rapoport 2018-09-13 10:35 ` Timofey Titovets 1 sibling, 1 reply; 27+ messages in thread From: Mike Rapoport @ 2018-06-25 8:48 UTC (permalink / raw) To: Pavel Tatashin Cc: Timofey Titovets, Linux Memory Management List, solee, aarcange, kvm On Thu, Jun 07, 2018 at 09:29:49PM -0400, Pavel Tatashin wrote: > > With CONFIG_SYSFS=n there is nothing that will set ksm_run to anything but > > zero and ksm_do_scan will never be called. > > > > Unfortunatly, this is not so: > > In: /linux-master/mm/ksm.c > > 3143#else > 3144 ksm_run = KSM_RUN_MERGE; /* no way for user to start it */ > 3145 > 3146#endif /* CONFIG_SYSFS */ > > So, we do set ksm_run to run right from ksm_init() when CONFIG_SYSFS=n. > > I wonder if this is acceptible to only use xxhash when CONFIG_SYSFS=n ? BTW, with CONFIG_SYSFS=n KSM may start running before hardware acceleration for crc32c is initialized... > Thank you, > Pavel > -- Sincerely yours, Mike. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH V6 2/2 RESEND] ksm: replace jhash2 with faster hash 2018-06-25 8:48 ` Mike Rapoport @ 2018-09-13 10:35 ` Timofey Titovets 2018-09-13 18:01 ` Mike Rapoport 0 siblings, 1 reply; 27+ messages in thread From: Timofey Titovets @ 2018-09-13 10:35 UTC (permalink / raw) To: rppt; +Cc: pasha.tatashin, linux-mm, Sioh Lee, Andrea Arcangeli, kvm On Mon, Jun 25, 2018 at 11:48, Mike Rapoport <rppt@linux.vnet.ibm.com> wrote: > > On Thu, Jun 07, 2018 at 09:29:49PM -0400, Pavel Tatashin wrote: > > > With CONFIG_SYSFS=n there is nothing that will set ksm_run to anything but > > > zero and ksm_do_scan will never be called. > > > > > > > Unfortunatly, this is not so: > > > > In: /linux-master/mm/ksm.c > > > > 3143#else > > 3144 ksm_run = KSM_RUN_MERGE; /* no way for user to start it */ > > 3145 > > 3146#endif /* CONFIG_SYSFS */ > > > > So, we do set ksm_run to run right from ksm_init() when CONFIG_SYSFS=n. > > > > I wonder if this is acceptible to only use xxhash when CONFIG_SYSFS=n ? > > BTW, with CONFIG_SYSFS=n KSM may start running before hardware acceleration > for crc32c is initialized... > > > Thank you, > > Pavel > > > > -- > Sincerely yours, > Mike. > A little thread bump: this patch set has not been able to move forward for about 8 months now. As I see it, the main open question in the thread is that we have a race between KSM initialization and the availability of the crypto API. Maybe we can then fall back to a simple plan and just replace our good old buddy jhash with the faster xxhash? That would move the crypto API & crc32c question into the background, and make things 2-3 times better for now. What do you all think about that? > crc32c_intel: 1084.10ns > crc32c (no hardware acceleration): 7012.51ns > xxhash32: 2227.75ns > xxhash64: 1413.16ns > jhash2: 5128.30ns -- Have a nice day, Timofey. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH V6 2/2 RESEND] ksm: replace jhash2 with faster hash 2018-09-13 10:35 ` Timofey Titovets @ 2018-09-13 18:01 ` Mike Rapoport 2018-09-13 18:10 ` Pasha Tatashin 0 siblings, 1 reply; 27+ messages in thread From: Mike Rapoport @ 2018-09-13 18:01 UTC (permalink / raw) To: Timofey Titovets Cc: Pasha Tatashin, linux-mm, Sioh Lee, Andrea Arcangeli, kvm (updated Pasha's e-mail) On Thu, Sep 13, 2018 at 01:35:20PM +0300, Timofey Titovets wrote: > On Mon, Jun 25, 2018 at 11:48, Mike Rapoport <rppt@linux.vnet.ibm.com> wrote: > > > > On Thu, Jun 07, 2018 at 09:29:49PM -0400, Pavel Tatashin wrote: > > > > With CONFIG_SYSFS=n there is nothing that will set ksm_run to anything but > > > > zero and ksm_do_scan will never be called. > > > > > > > > > > Unfortunatly, this is not so: > > > > > > In: /linux-master/mm/ksm.c > > > > > > 3143#else > > > 3144 ksm_run = KSM_RUN_MERGE; /* no way for user to start it */ > > > 3145 > > > 3146#endif /* CONFIG_SYSFS */ > > > > > > So, we do set ksm_run to run right from ksm_init() when CONFIG_SYSFS=n. > > > > > > I wonder if this is acceptible to only use xxhash when CONFIG_SYSFS=n ? > > > > BTW, with CONFIG_SYSFS=n KSM may start running before hardware acceleration > > for crc32c is initialized... > > > > > Thank you, > > > Pavel > > > > > > -- > > Sincerely yours, > > Mike. > > > > Little thread bump. > That patchset can't move forward already for about ~8 month. > As i see main question in thread: that we have a race with ksm > initialization and availability of crypto api. > Maybe we then can fall back to simple plan, and just replace old good > buddy jhash by just more fast xxhash? > That allow move question with crypto api & crc32 to background, and > make things better for now, in 2-3 times. > > What you all think about that?
Sounds reasonable to me > > crc32c_intel: 1084.10ns > > crc32c (no hardware acceleration): 7012.51ns > > xxhash32: 2227.75ns > > xxhash64: 1413.16ns > > jhash2: 5128.30ns > > -- > Have a nice day, > Timofey. > -- Sincerely yours, Mike. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH V6 2/2 RESEND] ksm: replace jhash2 with faster hash 2018-09-13 18:01 ` Mike Rapoport @ 2018-09-13 18:10 ` Pasha Tatashin 0 siblings, 0 replies; 27+ messages in thread From: Pasha Tatashin @ 2018-09-13 18:10 UTC (permalink / raw) To: Mike Rapoport, Timofey Titovets; +Cc: linux-mm, Sioh Lee, Andrea Arcangeli, kvm On 9/13/18 2:01 PM, Mike Rapoport wrote: > (updated Pasha's e-mail) > > On Thu, Sep 13, 2018 at 01:35:20PM +0300, Timofey Titovets wrote: >> On Mon, Jun 25, 2018 at 11:48, Mike Rapoport <rppt@linux.vnet.ibm.com> wrote: >>> >>> On Thu, Jun 07, 2018 at 09:29:49PM -0400, Pavel Tatashin wrote: >>>>> With CONFIG_SYSFS=n there is nothing that will set ksm_run to anything but >>>>> zero and ksm_do_scan will never be called. >>>>> >>>> >>>> Unfortunatly, this is not so: >>>> >>>> In: /linux-master/mm/ksm.c >>>> >>>> 3143#else >>>> 3144 ksm_run = KSM_RUN_MERGE; /* no way for user to start it */ >>>> 3145 >>>> 3146#endif /* CONFIG_SYSFS */ >>>> >>>> So, we do set ksm_run to run right from ksm_init() when CONFIG_SYSFS=n. >>>> >>>> I wonder if this is acceptible to only use xxhash when CONFIG_SYSFS=n ? >>> >>> BTW, with CONFIG_SYSFS=n KSM may start running before hardware acceleration >>> for crc32c is initialized... >>> >>>> Thank you, >>>> Pavel >>>> >>> >>> -- >>> Sincerely yours, >>> Mike. >>> >> >> Little thread bump. >> That patchset can't move forward already for about ~8 month. >> As i see main question in thread: that we have a race with ksm >> initialization and availability of crypto api. >> Maybe we then can fall back to simple plan, and just replace old good >> buddy jhash by just more fast xxhash? >> That allow move question with crypto api & crc32 to background, and >> make things better for now, in 2-3 times. >> >> What you all think about that? > > Sounds reasonable to me Same here, please send a new patch with xxhash, and after that we can work on a faster crc32.
Thank you, Pavel > >>> crc32c_intel: 1084.10ns >>> crc32c (no hardware acceleration): 7012.51ns >>> xxhash32: 2227.75ns >>> xxhash64: 1413.16ns >>> jhash2: 5128.30ns >> >> -- >> Have a nice day, >> Timofey. >> > ^ permalink raw reply [flat|nested] 27+ messages in thread
* [PATCH V6 0/2 RESEND] KSM replace hash algo with faster hash @ 2018-02-07 10:22 Timofey Titovets 2018-02-07 10:22 ` Timofey Titovets 0 siblings, 1 reply; 27+ messages in thread From: Timofey Titovets @ 2018-02-07 10:22 UTC (permalink / raw) To: linux-mm; +Cc: Timofey Titovets, Andrea Arcangeli, kvm, leesioh The currently used jhash is slow enough that replacing it allows us to make KSM less CPU hungry. About speed (in kernel): ksm: crc32c hash() 12081 MB/s ksm: xxh64 hash() 8770 MB/s ksm: xxh32 hash() 4529 MB/s ksm: jhash2 hash() 1569 MB/s Tests by Sioh Lee (copied from another mail): Test platform: openstack cloud platform (NEWTON version) Experiment node: openstack based cloud compute node (CPU: Xeon E5-2620 v3, memory 64GB) VM: (2 VCPU, RAM 4GB, DISK 20GB) * 4 Linux kernel: 4.14 (latest version) KSM setup - sleep_millisecs: 200ms, pages_to_scan: 200 Experiment process: First, we turn off KSM and launch 4 VMs. Then we turn on KSM and measure the checksum computation time until full_scans becomes two. The experimental results (each experimental value is the average of the measured values): crc32c_intel: 1084.10ns crc32c (no hardware acceleration): 7012.51ns xxhash32: 2227.75ns xxhash64: 1413.16ns jhash2: 5128.30ns In summary, the results show that crc32c_intel has an advantage over all of the hash functions used in the experiment (decreased by 84.54% compared to crc32c, 78.86% compared to jhash2, 51.33% compared to xxhash32, and 23.28% compared to xxhash64); the results are similar to Timofey's. So: - The first patch implements compile-time selection of the fastest xxhash implementation for the target platform. - The second implements logic in KSM that tests the speed of the hashes and picks the fastest one. Thanks.
CC: Andrea Arcangeli <aarcange@redhat.com> CC: linux-mm@kvack.org CC: kvm@vger.kernel.org CC: leesioh <solee@os.korea.ac.kr> Timofey Titovets (2): xxHash: create arch dependent 32/64-bit xxhash() ksm: replace jhash2 with faster hash include/linux/xxhash.h | 23 +++++++++++++ mm/Kconfig | 2 ++ mm/ksm.c | 93 +++++++++++++++++++++++++++++++++++++++++++++++--- 3 files changed, 114 insertions(+), 4 deletions(-) -- 2.14.1 ^ permalink raw reply [flat|nested] 27+ messages in thread
* [PATCH V6 2/2 RESEND] ksm: replace jhash2 with faster hash 2018-02-07 10:22 [PATCH V6 0/2 RESEND] KSM replace hash algo " Timofey Titovets @ 2018-02-07 10:22 ` Timofey Titovets 0 siblings, 0 replies; 27+ messages in thread From: Timofey Titovets @ 2018-02-07 10:22 UTC (permalink / raw) To: linux-mm; +Cc: Timofey Titovets, leesioh, Andrea Arcangeli, kvm 1. Pick up Sioh Lee's crc32 patch, after a long conversation. 2. Merge it with my work on xxhash. 3. Add autoselect code to choose the fastest hash helper. The base idea is the same: replace jhash2 with something faster. Perf numbers: Intel(R) Xeon(R) CPU E5-2420 v2 @ 2.20GHz ksm: crc32c hash() 12081 MB/s ksm: xxh64 hash() 8770 MB/s ksm: xxh32 hash() 4529 MB/s ksm: jhash2 hash() 1569 MB/s As jhash2 will always be slower (for data sizes like PAGE_SIZE), just drop it from the choice. Add a function to autoselect the hash algorithm on boot based on hashing speed, like the raid6 code does. Move the init of zero_checksum from ksm_init() to the first call of fasthash(): 1. KSM init runs early during kernel init; running perf-testing stuff on the main kernel boot thread looks bad to me. 2. The crypto subsystem is not available that early in boot, so crc32c, even if compiled in, is not available. As the crypto and KSM inits run at the subsys_initcall() kernel init level (4), all possible consumers will run later, at levels 5+. Output after KSM's first attempt to hash a page: ksm: crc32c hash() 15218 MB/s ksm: xxhash hash() 8640 MB/s ksm: choice crc32c as hash function Thanks.
Changes: v1 -> v2: - Move xxhash() to xxhash.h/c and separate the patches v2 -> v3: - Move xxhash() from xxhash.c -> xxhash.h - Replace xxhash_t with 'unsigned long' - Update the kerneldoc above xxhash() v3 -> v4: - Merge the xxhash/crc32 patches - Replace crc32 with crc32c (crc32 has the same speed as jhash2) - Add an auto speed test and auto choice of the fastest hash function v4 -> v5: - Pick up the missed xxhash patch - Update the code with the compile-time-chosen xxhash - Add more macros to make the code more readable - As it is now only possible to use xxhash or crc32c, on crc32c allocation error skip the speed test and fall back to xxhash - To work around the too-early-init problem (crc32c not available), move the zero_checksum init to the first call of fasthash() - Don't alloc a page for hash testing, use the arch zero pages for that v5 -> v6: - Use libcrc32c instead of the CRYPTO API, mainly for code/Kconfig deps simplification - Add crc32c_available(): libcrc32c will BUG_ON on crc32c problems, so test that crc32c is available with crc32c_available() - Simplify choice_fastest_hash() - Simplify fasthash() - struct rmap_item && stable_node have sizeof == 64 on x86_64, which makes them cache friendly. As we don't suffer from hash collisions, change the hash type from unsigned long back to u32.
- Fix kbuild robot warning, make all local functions static Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com> Signed-off-by: leesioh <solee@os.korea.ac.kr> CC: Andrea Arcangeli <aarcange@redhat.com> CC: linux-mm@kvack.org CC: kvm@vger.kernel.org --- mm/Kconfig | 2 ++ mm/ksm.c | 93 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--- 2 files changed, 91 insertions(+), 4 deletions(-) diff --git a/mm/Kconfig b/mm/Kconfig index 03ff7703d322..b60bee4bb07e 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -305,6 +305,8 @@ config MMU_NOTIFIER config KSM bool "Enable KSM for page merging" depends on MMU + select XXHASH + select LIBCRC32C help Enable Kernel Samepage Merging: KSM periodically scans those areas of an application's address space that an app has advised may be diff --git a/mm/ksm.c b/mm/ksm.c index c406f75957ad..2b84407fb918 100644 --- a/mm/ksm.c +++ b/mm/ksm.c @@ -25,7 +25,6 @@ #include <linux/pagemap.h> #include <linux/rmap.h> #include <linux/spinlock.h> -#include <linux/jhash.h> #include <linux/delay.h> #include <linux/kthread.h> #include <linux/wait.h> @@ -41,6 +40,13 @@ #include <linux/numa.h> #include <asm/tlbflush.h> + +/* Support for xxhash and crc32c */ +#include <crypto/hash.h> +#include <linux/crc32c.h> +#include <linux/xxhash.h> +#include <linux/sizes.h> + #include "internal.h" #ifdef CONFIG_NUMA @@ -284,6 +290,87 @@ static DEFINE_SPINLOCK(ksm_mmlist_lock); sizeof(struct __struct), __alignof__(struct __struct),\ (__flags), NULL) +#define TIME_125MS (HZ >> 3) +#define PERF_TO_MBS(X) (X*PAGE_SIZE*(1 << 3)/(SZ_1M)) + +#define HASH_NONE 0 +#define HASH_CRC32C 1 +#define HASH_XXHASH 2 + +static int fastest_hash = HASH_NONE; + +static bool __init crc32c_available(void) +{ + static struct shash_desc desc; + + desc.tfm = crypto_alloc_shash("crc32c", 0, 0); + desc.flags = 0; + + if (IS_ERR(desc.tfm)) { + pr_warn("ksm: alloc crc32c shash error %ld\n", + -PTR_ERR(desc.tfm)); + return false; + } + + crypto_free_shash(desc.tfm); + return true; +} + 
+static void __init choice_fastest_hash(void) +{ + + unsigned long je; + unsigned long perf_crc32c = 0; + unsigned long perf_xxhash = 0; + + fastest_hash = HASH_XXHASH; + if (!crc32c_available()) + goto out; + + preempt_disable(); + je = jiffies + TIME_125MS; + while (time_before(jiffies, je)) { + crc32c(0, ZERO_PAGE(0), PAGE_SIZE); + perf_crc32c++; + } + preempt_enable(); + + preempt_disable(); + je = jiffies + TIME_125MS; + while (time_before(jiffies, je)) { + xxhash(ZERO_PAGE(0), PAGE_SIZE, 0); + perf_xxhash++; + } + preempt_enable(); + + pr_info("ksm: crc32c hash() %5ld MB/s\n", PERF_TO_MBS(perf_crc32c)); + pr_info("ksm: xxhash hash() %5ld MB/s\n", PERF_TO_MBS(perf_xxhash)); + + if (perf_crc32c > perf_xxhash) + fastest_hash = HASH_CRC32C; +out: + if (fastest_hash == HASH_CRC32C) + pr_info("ksm: choice crc32c as hash function\n"); + else + pr_info("ksm: choice xxhash as hash function\n"); +} + +static u32 fasthash(const void *input, size_t length) +{ +again: + switch (fastest_hash) { + case HASH_CRC32C: + return crc32c(0, input, length); + case HASH_XXHASH: + return xxhash(input, length, 0); + default: + choice_fastest_hash(); + /* The correct value depends on page size and endianness */ + zero_checksum = fasthash(ZERO_PAGE(0), PAGE_SIZE); + goto again; + } +} + static int __init ksm_slab_init(void) { rmap_item_cache = KSM_KMEM_CACHE(rmap_item, 0); @@ -979,7 +1066,7 @@ static u32 calc_checksum(struct page *page) { u32 checksum; void *addr = kmap_atomic(page); - checksum = jhash2(addr, PAGE_SIZE / 4, 17); + checksum = fasthash(addr, PAGE_SIZE); kunmap_atomic(addr); return checksum; } @@ -3061,8 +3148,6 @@ static int __init ksm_init(void) struct task_struct *ksm_thread; int err; - /* The correct value depends on page size and endianness */ - zero_checksum = calc_checksum(ZERO_PAGE(0)); /* Default to false for backwards compatibility */ ksm_use_zero_pages = false; -- 2.14.1 ^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH V6 2/2 RESEND] ksm: replace jhash2 with faster hash @ 2018-02-07 10:22 ` Timofey Titovets 0 siblings, 0 replies; 27+ messages in thread From: Timofey Titovets @ 2018-02-07 10:22 UTC (permalink / raw) To: linux-mm; +Cc: Timofey Titovets, leesioh, Andrea Arcangeli, kvm 1. Pickup, Sioh Lee crc32 patch, after some long conversation 2. Merge with my work on xxhash 3. Add autoselect code to choice fastest hash helper. Base idea are same, replace jhash2 with something faster. Perf numbers: Intel(R) Xeon(R) CPU E5-2420 v2 @ 2.20GHz ksm: crc32c hash() 12081 MB/s ksm: xxh64 hash() 8770 MB/s ksm: xxh32 hash() 4529 MB/s ksm: jhash2 hash() 1569 MB/s As jhash2 always will be slower (for data size like PAGE_SIZE), just drop it from choice. Add function to autoselect hash algo on boot, based on hashing speed, like raid6 code does. Move init of zero_checksum from init, to first call of fasthash(): 1. KSM Init run on early kernel init, run perf testing stuff on main kernel boot thread looks bad to me. 2. Crypto subsystem not avaliable at that early booting, so crc32c even, compiled in, not avaliable As crypto and ksm init, run at subsys_initcall() (4) kernel level of init, all possible consumers will run later at 5+ levels Output after first try of KSM to hash page: ksm: crc32c hash() 15218 MB/s ksm: xxhash hash() 8640 MB/s ksm: choice crc32c as hash function Thanks. 
Changes: v1 -> v2: - Move xxhash() to xxhash.h/c and separate patches v2 -> v3: - Move xxhash() xxhash.c -> xxhash.h - replace xxhash_t with 'unsigned long' - update kerneldoc above xxhash() v3 -> v4: - Merge xxhash/crc32 patches - Replace crc32 with crc32c (crc32 have same as jhash2 speed) - Add auto speed test and auto choice of fastest hash function v4 -> v5: - Pickup missed xxhash patch - Update code with compile time choicen xxhash - Add more macros to make code more readable - As now that only possible use xxhash or crc32c, on crc32c allocation error, skip speed test and fallback to xxhash - For workaround too early init problem (crc32c not avaliable), move zero_checksum init to first call of fastcall() - Don't alloc page for hash testing, use arch zero pages for that v5 -> v6: - Use libcrc32c instead of CRYPTO API, mainly for code/Kconfig deps Simplification - Add crc32c_available(): libcrc32c will BUG_ON on crc32c problems, so test crc32c avaliable by crc32c_available() - Simplify choice_fastest_hash() - Simplify fasthash() - struct rmap_item && stable_node have sizeof == 64 on x86_64, that makes them cache friendly. As we don't suffer from hash collisions, change hash type from unsigned long back to u32. 
- Fix kbuild robot warning, make all local functions static Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com> Signed-off-by: leesioh <solee@os.korea.ac.kr> CC: Andrea Arcangeli <aarcange@redhat.com> CC: linux-mm@kvack.org CC: kvm@vger.kernel.org --- mm/Kconfig | 2 ++ mm/ksm.c | 93 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--- 2 files changed, 91 insertions(+), 4 deletions(-) diff --git a/mm/Kconfig b/mm/Kconfig index 03ff7703d322..b60bee4bb07e 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -305,6 +305,8 @@ config MMU_NOTIFIER config KSM bool "Enable KSM for page merging" depends on MMU + select XXHASH + select LIBCRC32C help Enable Kernel Samepage Merging: KSM periodically scans those areas of an application's address space that an app has advised may be diff --git a/mm/ksm.c b/mm/ksm.c index c406f75957ad..2b84407fb918 100644 --- a/mm/ksm.c +++ b/mm/ksm.c @@ -25,7 +25,6 @@ #include <linux/pagemap.h> #include <linux/rmap.h> #include <linux/spinlock.h> -#include <linux/jhash.h> #include <linux/delay.h> #include <linux/kthread.h> #include <linux/wait.h> @@ -41,6 +40,13 @@ #include <linux/numa.h> #include <asm/tlbflush.h> + +/* Support for xxhash and crc32c */ +#include <crypto/hash.h> +#include <linux/crc32c.h> +#include <linux/xxhash.h> +#include <linux/sizes.h> + #include "internal.h" #ifdef CONFIG_NUMA @@ -284,6 +290,87 @@ static DEFINE_SPINLOCK(ksm_mmlist_lock); sizeof(struct __struct), __alignof__(struct __struct),\ (__flags), NULL) +#define TIME_125MS (HZ >> 3) +#define PERF_TO_MBS(X) (X*PAGE_SIZE*(1 << 3)/(SZ_1M)) + +#define HASH_NONE 0 +#define HASH_CRC32C 1 +#define HASH_XXHASH 2 + +static int fastest_hash = HASH_NONE; + +static bool __init crc32c_available(void) +{ + static struct shash_desc desc; + + desc.tfm = crypto_alloc_shash("crc32c", 0, 0); + desc.flags = 0; + + if (IS_ERR(desc.tfm)) { + pr_warn("ksm: alloc crc32c shash error %ld\n", + -PTR_ERR(desc.tfm)); + return false; + } + + crypto_free_shash(desc.tfm); + return true; +} + 
+static void __init choice_fastest_hash(void)
+{
+
+	unsigned long je;
+	unsigned long perf_crc32c = 0;
+	unsigned long perf_xxhash = 0;
+
+	fastest_hash = HASH_XXHASH;
+	if (!crc32c_available())
+		goto out;
+
+	preempt_disable();
+	je = jiffies + TIME_125MS;
+	while (time_before(jiffies, je)) {
+		crc32c(0, ZERO_PAGE(0), PAGE_SIZE);
+		perf_crc32c++;
+	}
+	preempt_enable();
+
+	preempt_disable();
+	je = jiffies + TIME_125MS;
+	while (time_before(jiffies, je)) {
+		xxhash(ZERO_PAGE(0), PAGE_SIZE, 0);
+		perf_xxhash++;
+	}
+	preempt_enable();
+
+	pr_info("ksm: crc32c hash() %5ld MB/s\n", PERF_TO_MBS(perf_crc32c));
+	pr_info("ksm: xxhash hash() %5ld MB/s\n", PERF_TO_MBS(perf_xxhash));
+
+	if (perf_crc32c > perf_xxhash)
+		fastest_hash = HASH_CRC32C;
+out:
+	if (fastest_hash == HASH_CRC32C)
+		pr_info("ksm: choice crc32c as hash function\n");
+	else
+		pr_info("ksm: choice xxhash as hash function\n");
+}
+
+static u32 fasthash(const void *input, size_t length)
+{
+again:
+	switch (fastest_hash) {
+	case HASH_CRC32C:
+		return crc32c(0, input, length);
+	case HASH_XXHASH:
+		return xxhash(input, length, 0);
+	default:
+		choice_fastest_hash();
+		/* The correct value depends on page size and endianness */
+		zero_checksum = fasthash(ZERO_PAGE(0), PAGE_SIZE);
+		goto again;
+	}
+}
+
 static int __init ksm_slab_init(void)
 {
 	rmap_item_cache = KSM_KMEM_CACHE(rmap_item, 0);
@@ -979,7 +1066,7 @@ static u32 calc_checksum(struct page *page)
 {
 	u32 checksum;
 	void *addr = kmap_atomic(page);
-	checksum = jhash2(addr, PAGE_SIZE / 4, 17);
+	checksum = fasthash(addr, PAGE_SIZE);
 	kunmap_atomic(addr);
 	return checksum;
 }
@@ -3061,8 +3148,6 @@ static int __init ksm_init(void)
 	struct task_struct *ksm_thread;
 	int err;
 
-	/* The correct value depends on page size and endianness */
-	zero_checksum = calc_checksum(ZERO_PAGE(0));
 	/* Default to false for backwards compatibility */
 	ksm_use_zero_pages = false;
 
-- 
2.14.1
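The benchmark-then-dispatch idea in choice_fastest_hash()/fasthash() can be illustrated in userspace. This is a hedged Python sketch, not kernel code: `zlib.crc32` and `zlib.adler32` stand in for the kernel's crc32c and xxhash (neither is in the Python stdlib), `bench()` mirrors the 125 ms timed loop over a zero page, and `to_mbs()` mirrors the PERF_TO_MBS macro (pages hashed in 1/8 s, times PAGE_SIZE, times 8 windows per second, divided by 1 MiB).

```python
import time
import zlib

PAGE_SIZE = 4096
WINDOW = 0.125  # seconds; mirrors TIME_125MS, i.e. HZ >> 3

def bench(hash_fn, page):
    """Count how many pages hash_fn can digest within the time window."""
    iterations = 0
    deadline = time.monotonic() + WINDOW
    while time.monotonic() < deadline:
        hash_fn(page)
        iterations += 1
    return iterations

def to_mbs(iterations):
    # PERF_TO_MBS: (pages per 125 ms) * PAGE_SIZE * 8 / 1 MiB = MB/s
    return iterations * PAGE_SIZE * 8 // (1 << 20)

page = bytes(PAGE_SIZE)  # stand-in for ZERO_PAGE(0)
candidates = {"crc32": zlib.crc32, "adler32": zlib.adler32}
scores = {name: bench(fn, page) for name, fn in candidates.items()}
fastest = max(scores, key=scores.get)  # analogous to setting fastest_hash

for name, iters in scores.items():
    print(f"{name} hash(): {to_mbs(iters)} MB/s")
print(f"picked: {fastest}")
```

As in the patch, the selection runs once and all later hashing dispatches through the chosen function; the kernel version additionally disables preemption around each timed loop so the jiffies-bounded measurement is not skewed by rescheduling.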