* [PATCH V6 0/2 RESEND] KSM replace hash algo with faster hash
@ 2018-04-18 19:32 Timofey Titovets
  2018-04-18 19:32 ` [PATCH V6 1/2 RESEND] xxHash: create arch dependent 32/64-bit xxhash() Timofey Titovets
  2018-04-18 19:32 ` [PATCH V6 2/2 RESEND] ksm: replace jhash2 with faster hash Timofey Titovets
  0 siblings, 2 replies; 27+ messages in thread
From: Timofey Titovets @ 2018-04-18 19:32 UTC (permalink / raw)
  To: linux-mm; +Cc: Timofey Titovets, Andrea Arcangeli, kvm, leesioh

From: Timofey Titovets <nefelim4ag@gmail.com>

The currently used jhash is slow, and replacing it allows us to make KSM
less CPU hungry.

About speed (in kernel):
        ksm: crc32c   hash() 12081 MB/s
        ksm: xxh64    hash()  8770 MB/s
        ksm: xxh32    hash()  4529 MB/s
        ksm: jhash2   hash()  1569 MB/s

Tests by Sioh Lee (copied from another mail):
Test platform: openstack cloud platform (NEWTON version)
Experiment node: openstack based cloud compute node (CPU: xeon E5-2620 v3, memory 64gb)
VM: (2 VCPU, RAM 4GB, DISK 20GB) * 4
Linux kernel: 4.14 (latest version)
KSM setup - sleep_millisecs: 200ms, pages_to_scan: 200

Experiment process
First, we turn off KSM and launch 4 VMs.
Then we turn KSM on and measure the checksum computation time until full_scans becomes two.

The experimental results (each value is the average of the measured values):
crc32c_intel: 1084.10ns
crc32c (no hardware acceleration): 7012.51ns
xxhash32: 2227.75ns
xxhash64: 1413.16ns
jhash2: 5128.30ns

In summary, the results show that crc32c_intel outperforms all of the other
hash functions used in the experiment (computation time decreased by 84.54%
compared to crc32c, 78.86% compared to jhash2, 51.33% compared to xxhash32,
and 23.28% compared to xxhash64).
These results are consistent with Timofey's.

So:
  - The first patch implements compile-time selection of the fastest xxhash
    implementation for the target platform.
  - The second implements logic in KSM that benchmarks the hashing speed and
    picks the fastest hash.
  
Thanks.

CC: Andrea Arcangeli <aarcange@redhat.com>
CC: linux-mm@kvack.org
CC: kvm@vger.kernel.org
CC: leesioh <solee@os.korea.ac.kr>

Timofey Titovets (2):
  xxHash: create arch dependent 32/64-bit xxhash()
  ksm: replace jhash2 with faster hash

 include/linux/xxhash.h | 23 +++++++++++++
 mm/Kconfig             |  2 ++
 mm/ksm.c               | 93 +++++++++++++++++++++++++++++++++++++++++++++++---
 3 files changed, 114 insertions(+), 4 deletions(-)

-- 
2.14.1


* [PATCH V6 1/2 RESEND] xxHash: create arch dependent 32/64-bit xxhash()
  2018-04-18 19:32 [PATCH V6 0/2 RESEND] KSM replace hash algo with faster hash Timofey Titovets
@ 2018-04-18 19:32 ` Timofey Titovets
  2018-04-18 19:32 ` [PATCH V6 2/2 RESEND] ksm: replace jhash2 with faster hash Timofey Titovets
  1 sibling, 0 replies; 27+ messages in thread
From: Timofey Titovets @ 2018-04-18 19:32 UTC (permalink / raw)
  To: linux-mm; +Cc: Timofey Titovets, Andrea Arcangeli, kvm, leesioh

From: Timofey Titovets <nefelim4ag@gmail.com>

xxh32() - fast on both 32-bit and 64-bit platforms
xxh64() - fast only on 64-bit platforms

Create xxhash(), which picks the fastest version
at compile time.

As the result depends on the CPU word size,
the main purpose of this function is in-memory hashing.
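
For illustration, a caller hashing an in-memory buffer would look like
this (a hypothetical usage example, not part of the patch; hash_buf() is
a made-up name):

	#include <linux/xxhash.h>

	/* Hash an in-memory buffer with the word-sized variant.
	 * The result is not stable across 32-bit and 64-bit machines,
	 * which is fine for purely in-memory uses such as change
	 * detection.
	 */
	static unsigned long hash_buf(const void *buf, size_t len)
	{
		return xxhash(buf, len, 0);	/* fixed seed */
	}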

Changes:
  v2:
    - Create that patch
  v3 -> v6:
    - Nothing, whole patchset version bump

Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
CC: Andrea Arcangeli <aarcange@redhat.com>
CC: linux-mm@kvack.org
CC: kvm@vger.kernel.org
CC: leesioh <solee@os.korea.ac.kr>
---
 include/linux/xxhash.h | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/include/linux/xxhash.h b/include/linux/xxhash.h
index 9e1f42cb57e9..52b073fea17f 100644
--- a/include/linux/xxhash.h
+++ b/include/linux/xxhash.h
@@ -107,6 +107,29 @@ uint32_t xxh32(const void *input, size_t length, uint32_t seed);
  */
 uint64_t xxh64(const void *input, size_t length, uint64_t seed);
 
+/**
+ * xxhash() - calculate wordsize hash of the input with a given seed
+ * @input:  The data to hash.
+ * @length: The length of the data to hash.
+ * @seed:   The seed can be used to alter the result predictably.
+ *
+ * If the hash does not need to be comparable between machines with
+ * different word sizes, this function will call whichever of xxh32()
+ * or xxh64() is faster.
+ *
+ * Return:  wordsize hash of the data.
+ */
+
+static inline unsigned long xxhash(const void *input, size_t length,
+				   uint64_t seed)
+{
+#if BITS_PER_LONG == 64
+       return xxh64(input, length, seed);
+#else
+       return xxh32(input, length, seed);
+#endif
+}
+
 /*-****************************
  * Streaming Hash Functions
  *****************************/
-- 
2.14.1


* [PATCH V6 2/2 RESEND] ksm: replace jhash2 with faster hash
  2018-04-18 19:32 [PATCH V6 0/2 RESEND] KSM replace hash algo with faster hash Timofey Titovets
  2018-04-18 19:32 ` [PATCH V6 1/2 RESEND] xxHash: create arch dependent 32/64-bit xxhash() Timofey Titovets
@ 2018-04-18 19:32 ` Timofey Titovets
  2018-05-08 15:26   ` Claudio Imbrenda
  2018-05-22 20:22   ` Pavel Tatashin
  1 sibling, 2 replies; 27+ messages in thread
From: Timofey Titovets @ 2018-04-18 19:32 UTC (permalink / raw)
  To: linux-mm; +Cc: Timofey Titovets, leesioh, Andrea Arcangeli, kvm

From: Timofey Titovets <nefelim4ag@gmail.com>

1. Pick up Sioh Lee's crc32 patch, after a long conversation
2. Merge it with my work on xxhash
3. Add autoselection code to choose the fastest hash helper

The base idea is the same: replace jhash2 with something faster.

Perf numbers:
Intel(R) Xeon(R) CPU E5-2420 v2 @ 2.20GHz
ksm: crc32c   hash() 12081 MB/s
ksm: xxh64    hash()  8770 MB/s
ksm: xxh32    hash()  4529 MB/s
ksm: jhash2   hash()  1569 MB/s

As jhash2 will always be slower (for data sizes like PAGE_SIZE),
just drop it from the choice.

Add a function to autoselect the hash algorithm at boot,
based on hashing speed, like the raid6 code does.

Move the init of zero_checksum from ksm_init() to the first call of fasthash():
  1. KSM init runs during early kernel init;
     running perf-testing stuff on the main kernel boot thread looks bad to me.
  2. The crypto subsystem is not available that early in boot,
     so crc32c, even when compiled in, is not available.
     As both crypto and KSM init run at the subsys_initcall() level
     (initcall level 4), all possible consumers will run later, at levels 5+.

Output after KSM's first attempt to hash a page:
ksm: crc32c hash() 15218 MB/s
ksm: xxhash hash()  8640 MB/s
ksm: choice crc32c as hash function

Thanks.

Changes:
  v1 -> v2:
    - Move xxhash() to xxhash.h/c and separate patches
  v2 -> v3:
    - Move xxhash() xxhash.c -> xxhash.h
    - Replace xxhash_t with 'unsigned long'
    - Update kerneldoc above xxhash()
  v3 -> v4:
    - Merge xxhash/crc32 patches
    - Replace crc32 with crc32c (crc32 has the same speed as jhash2)
    - Add auto speed test and auto choice of the fastest hash function
  v4 -> v5:
    - Pick up missed xxhash patch
    - Update code with compile-time chosen xxhash
    - Add more macros to make the code more readable
    - As it is now only possible to use xxhash or crc32c,
      skip the speed test and fall back to xxhash on crc32c allocation error
    - To work around the too-early init problem (crc32c not available),
      move zero_checksum init to the first call of fasthash()
    - Don't allocate a page for hash testing; use the arch zero page for that
  v5 -> v6:
    - Use libcrc32c instead of the CRYPTO API, mainly to
      simplify the code/Kconfig deps
    - Add crc32c_available():
      libcrc32c will BUG_ON on crc32c problems,
      so test crc32c availability with crc32c_available()
    - Simplify choice_fastest_hash()
    - Simplify fasthash()
    - struct rmap_item and struct stable_node have sizeof == 64 on x86_64,
      which makes them cache friendly. As we don't suffer from hash collisions,
      change the hash type from unsigned long back to u32.
    - Fix kbuild robot warning; make all local functions static

Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
Signed-off-by: leesioh <solee@os.korea.ac.kr>
CC: Andrea Arcangeli <aarcange@redhat.com>
CC: linux-mm@kvack.org
CC: kvm@vger.kernel.org
---
 mm/Kconfig |  2 ++
 mm/ksm.c   | 93 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---
 2 files changed, 91 insertions(+), 4 deletions(-)

diff --git a/mm/Kconfig b/mm/Kconfig
index 03ff7703d322..b60bee4bb07e 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -305,6 +305,8 @@ config MMU_NOTIFIER
 config KSM
 	bool "Enable KSM for page merging"
 	depends on MMU
+	select XXHASH
+	select LIBCRC32C
 	help
 	  Enable Kernel Samepage Merging: KSM periodically scans those areas
 	  of an application's address space that an app has advised may be
diff --git a/mm/ksm.c b/mm/ksm.c
index c406f75957ad..2b84407fb918 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -25,7 +25,6 @@
 #include <linux/pagemap.h>
 #include <linux/rmap.h>
 #include <linux/spinlock.h>
-#include <linux/jhash.h>
 #include <linux/delay.h>
 #include <linux/kthread.h>
 #include <linux/wait.h>
@@ -41,6 +40,13 @@
 #include <linux/numa.h>
 
 #include <asm/tlbflush.h>
+
+/* Support for xxhash and crc32c */
+#include <crypto/hash.h>
+#include <linux/crc32c.h>
+#include <linux/xxhash.h>
+#include <linux/sizes.h>
+
 #include "internal.h"
 
 #ifdef CONFIG_NUMA
@@ -284,6 +290,87 @@ static DEFINE_SPINLOCK(ksm_mmlist_lock);
 		sizeof(struct __struct), __alignof__(struct __struct),\
 		(__flags), NULL)
 
+#define TIME_125MS  (HZ >> 3)
+#define PERF_TO_MBS(X) (X*PAGE_SIZE*(1 << 3)/(SZ_1M))
+
+#define HASH_NONE   0
+#define HASH_CRC32C 1
+#define HASH_XXHASH 2
+
+static int fastest_hash = HASH_NONE;
+
+static bool __init crc32c_available(void)
+{
+	static struct shash_desc desc;
+
+	desc.tfm = crypto_alloc_shash("crc32c", 0, 0);
+	desc.flags = 0;
+
+	if (IS_ERR(desc.tfm)) {
+		pr_warn("ksm: alloc crc32c shash error %ld\n",
+			-PTR_ERR(desc.tfm));
+		return false;
+	}
+
+	crypto_free_shash(desc.tfm);
+	return true;
+}
+
+static void __init choice_fastest_hash(void)
+{
+
+	unsigned long je;
+	unsigned long perf_crc32c = 0;
+	unsigned long perf_xxhash = 0;
+
+	fastest_hash = HASH_XXHASH;
+	if (!crc32c_available())
+		goto out;
+
+	preempt_disable();
+	je = jiffies + TIME_125MS;
+	while (time_before(jiffies, je)) {
+		crc32c(0, ZERO_PAGE(0), PAGE_SIZE);
+		perf_crc32c++;
+	}
+	preempt_enable();
+
+	preempt_disable();
+	je = jiffies + TIME_125MS;
+	while (time_before(jiffies, je)) {
+		xxhash(ZERO_PAGE(0), PAGE_SIZE, 0);
+		perf_xxhash++;
+	}
+	preempt_enable();
+
+	pr_info("ksm: crc32c hash() %5ld MB/s\n", PERF_TO_MBS(perf_crc32c));
+	pr_info("ksm: xxhash hash() %5ld MB/s\n", PERF_TO_MBS(perf_xxhash));
+
+	if (perf_crc32c > perf_xxhash)
+		fastest_hash = HASH_CRC32C;
+out:
+	if (fastest_hash == HASH_CRC32C)
+		pr_info("ksm: choice crc32c as hash function\n");
+	else
+		pr_info("ksm: choice xxhash as hash function\n");
+}
+
+static u32 fasthash(const void *input, size_t length)
+{
+again:
+	switch (fastest_hash) {
+	case HASH_CRC32C:
+		return crc32c(0, input, length);
+	case HASH_XXHASH:
+		return xxhash(input, length, 0);
+	default:
+		choice_fastest_hash();
+		/* The correct value depends on page size and endianness */
+		zero_checksum = fasthash(ZERO_PAGE(0), PAGE_SIZE);
+		goto again;
+	}
+}
+
 static int __init ksm_slab_init(void)
 {
 	rmap_item_cache = KSM_KMEM_CACHE(rmap_item, 0);
@@ -979,7 +1066,7 @@ static u32 calc_checksum(struct page *page)
 {
 	u32 checksum;
 	void *addr = kmap_atomic(page);
-	checksum = jhash2(addr, PAGE_SIZE / 4, 17);
+	checksum = fasthash(addr, PAGE_SIZE);
 	kunmap_atomic(addr);
 	return checksum;
 }
@@ -3061,8 +3148,6 @@ static int __init ksm_init(void)
 	struct task_struct *ksm_thread;
 	int err;
 
-	/* The correct value depends on page size and endianness */
-	zero_checksum = calc_checksum(ZERO_PAGE(0));
 	/* Default to false for backwards compatibility */
 	ksm_use_zero_pages = false;
 
-- 
2.14.1


* Re: [PATCH V6 2/2 RESEND] ksm: replace jhash2 with faster hash
  2018-04-18 19:32 ` [PATCH V6 2/2 RESEND] ksm: replace jhash2 with faster hash Timofey Titovets
@ 2018-05-08 15:26   ` Claudio Imbrenda
  2018-05-11 23:06     ` Timofey Titovets
  2018-05-22 20:22   ` Pavel Tatashin
  1 sibling, 1 reply; 27+ messages in thread
From: Claudio Imbrenda @ 2018-05-08 15:26 UTC (permalink / raw)
  To: Timofey Titovets; +Cc: linux-mm, leesioh, Andrea Arcangeli, kvm

On Wed, 18 Apr 2018 22:32:20 +0300
Timofey Titovets <nefelim4ag@gmail.com> wrote:

> From: Timofey Titovets <nefelim4ag@gmail.com>
> 
> 1. Pickup, Sioh Lee crc32 patch, after some long conversation
> 2. Merge with my work on xxhash
> 3. Add autoselect code to choice fastest hash helper.
> 
> Base idea are same, replace jhash2 with something faster.
> 
> Perf numbers:
> Intel(R) Xeon(R) CPU E5-2420 v2 @ 2.20GHz
> ksm: crc32c   hash() 12081 MB/s
> ksm: xxh64    hash()  8770 MB/s
> ksm: xxh32    hash()  4529 MB/s
> ksm: jhash2   hash()  1569 MB/s
> 
> As jhash2 always will be slower (for data size like PAGE_SIZE),
> just drop it from choice.
> 
> Add function to autoselect hash algo on boot,
> based on hashing speed, like raid6 code does.
> 
> Move init of zero_checksum from init, to first call of fasthash():
>   1. KSM Init run on early kernel init,
>      run perf testing stuff on main kernel boot thread looks bad to

This is my personal opinion, but I think it would be better and more
uniform to have it during boot, like raid6. It doesn't take too much
time, and it allows one to see immediately in dmesg what is going on.

> me. 2. Crypto subsystem not avaliable at that early booting,
>      so crc32c even, compiled in, not avaliable
>      As crypto and ksm init, run at subsys_initcall() (4) kernel
> level of init, all possible consumers will run later at 5+ levels

Have you tried moving KSM to a later stage? Before commit
a64fb3cd610c8e680, KSM was in fact initialized at level 6. After all, KSM
cannot be triggered until userspace starts.

> Output after first try of KSM to hash page:
> ksm: crc32c hash() 15218 MB/s
> ksm: xxhash hash()  8640 MB/s
> ksm: choice crc32c as hash function
> 
> Thanks.
> 
> Changes:
>   v1 -> v2:
>     - Move xxhash() to xxhash.h/c and separate patches
>   v2 -> v3:
>     - Move xxhash() xxhash.c -> xxhash.h
>     - replace xxhash_t with 'unsigned long'
>     - update kerneldoc above xxhash()
>   v3 -> v4:
>     - Merge xxhash/crc32 patches
>     - Replace crc32 with crc32c (crc32 have same as jhash2 speed)
>     - Add auto speed test and auto choice of fastest hash function
>   v4 -> v5:
>     - Pickup missed xxhash patch
>     - Update code with compile time choicen xxhash
>     - Add more macros to make code more readable
>     - As now that only possible use xxhash or crc32c,
>       on crc32c allocation error, skip speed test and fallback to
> xxhash
>     - For workaround too early init problem (crc32c not avaliable),
>       move zero_checksum init to first call of fastcall()
>     - Don't alloc page for hash testing, use arch zero pages for that
>   v5 -> v6:
>     - Use libcrc32c instead of CRYPTO API, mainly for
>       code/Kconfig deps Simplification
>     - Add crc32c_available():
>       libcrc32c will BUG_ON on crc32c problems,
>       so test crc32c avaliable by crc32c_available()
>     - Simplify choice_fastest_hash()
>     - Simplify fasthash()
>     - struct rmap_item && stable_node have sizeof == 64 on x86_64,
>       that makes them cache friendly. As we don't suffer from hash
> collisions, change hash type from unsigned long back to u32.
>     - Fix kbuild robot warning, make all local functions static
> 
> Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
> Signed-off-by: leesioh <solee@os.korea.ac.kr>
> CC: Andrea Arcangeli <aarcange@redhat.com>
> CC: linux-mm@kvack.org
> CC: kvm@vger.kernel.org
> ---
>  mm/Kconfig |  2 ++
>  mm/ksm.c   | 93
> +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--- 2
> files changed, 91 insertions(+), 4 deletions(-)
> 
> diff --git a/mm/Kconfig b/mm/Kconfig
> index 03ff7703d322..b60bee4bb07e 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -305,6 +305,8 @@ config MMU_NOTIFIER
>  config KSM
>  	bool "Enable KSM for page merging"
>  	depends on MMU
> +	select XXHASH
> +	select LIBCRC32C
>  	help
>  	  Enable Kernel Samepage Merging: KSM periodically scans
> those areas of an application's address space that an app has advised
> may be diff --git a/mm/ksm.c b/mm/ksm.c
> index c406f75957ad..2b84407fb918 100644
> --- a/mm/ksm.c
> +++ b/mm/ksm.c
> @@ -25,7 +25,6 @@
>  #include <linux/pagemap.h>
>  #include <linux/rmap.h>
>  #include <linux/spinlock.h>
> -#include <linux/jhash.h>
>  #include <linux/delay.h>
>  #include <linux/kthread.h>
>  #include <linux/wait.h>
> @@ -41,6 +40,13 @@
>  #include <linux/numa.h>
> 
>  #include <asm/tlbflush.h>
> +
> +/* Support for xxhash and crc32c */
> +#include <crypto/hash.h>
> +#include <linux/crc32c.h>
> +#include <linux/xxhash.h>
> +#include <linux/sizes.h>
> +
>  #include "internal.h"
> 
>  #ifdef CONFIG_NUMA
> @@ -284,6 +290,87 @@ static DEFINE_SPINLOCK(ksm_mmlist_lock);
>  		sizeof(struct __struct), __alignof__(struct
> __struct),\ (__flags), NULL)
> 
> +#define TIME_125MS  (HZ >> 3)
> +#define PERF_TO_MBS(X) (X*PAGE_SIZE*(1 << 3)/(SZ_1M))
> +
> +#define HASH_NONE   0
> +#define HASH_CRC32C 1
> +#define HASH_XXHASH 2
> +
> +static int fastest_hash = HASH_NONE;
> +
> +static bool __init crc32c_available(void)
> +{
> +	static struct shash_desc desc;
> +
> +	desc.tfm = crypto_alloc_shash("crc32c", 0, 0);

will this work without the crypto api?

> +	desc.flags = 0;
> +
> +	if (IS_ERR(desc.tfm)) {
> +		pr_warn("ksm: alloc crc32c shash error %ld\n",
> +			-PTR_ERR(desc.tfm));
> +		return false;
> +	}
> +
> +	crypto_free_shash(desc.tfm);
> +	return true;
> +}
> +
> +static void __init choice_fastest_hash(void)

s/choice/choose/

> +{
> +
> +	unsigned long je;
> +	unsigned long perf_crc32c = 0;
> +	unsigned long perf_xxhash = 0;
> +
> +	fastest_hash = HASH_XXHASH;
> +	if (!crc32c_available())
> +		goto out;
> +
> +	preempt_disable();
> +	je = jiffies + TIME_125MS;
> +	while (time_before(jiffies, je)) {
> +		crc32c(0, ZERO_PAGE(0), PAGE_SIZE);
> +		perf_crc32c++;
> +	}
> +	preempt_enable();
> +
> +	preempt_disable();
> +	je = jiffies + TIME_125MS;
> +	while (time_before(jiffies, je)) {
> +		xxhash(ZERO_PAGE(0), PAGE_SIZE, 0);
> +		perf_xxhash++;
> +	}
> +	preempt_enable();
> +
> +	pr_info("ksm: crc32c hash() %5ld MB/s\n",
> PERF_TO_MBS(perf_crc32c));
> +	pr_info("ksm: xxhash hash() %5ld MB/s\n",
> PERF_TO_MBS(perf_xxhash)); +
> +	if (perf_crc32c > perf_xxhash)
> +		fastest_hash = HASH_CRC32C;
> +out:
> +	if (fastest_hash == HASH_CRC32C)
> +		pr_info("ksm: choice crc32c as hash function\n");
> +	else
> +		pr_info("ksm: choice xxhash as hash function\n");
> +}

I wonder if this can be generalized to have a list of possible hash
functions, filtered by availability, and then tested for performance,
more like the raid6 functions.
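
Something along these lines, just to sketch the idea (untested; the
table layout and names are made up):

	struct ksm_hash_algo {
		const char *name;
		u32 (*hash)(const void *input, size_t length);
		bool (*available)(void);  /* NULL means always available */
	};

	static u32 hash_crc32c(const void *input, size_t length)
	{
		return crc32c(0, input, length);
	}

	static u32 hash_xxhash(const void *input, size_t length)
	{
		return xxhash(input, length, 0);
	}

	static const struct ksm_hash_algo ksm_hash_algos[] = {
		{ "crc32c", hash_crc32c, crc32c_available },
		{ "xxhash", hash_xxhash, NULL },
	};

choice_fastest_hash() would then loop over ksm_hash_algos[], skip the
entries whose ->available() returns false, benchmark the rest, and
remember the fastest one, so adding a new candidate is a one-line change.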

> +
> +static u32 fasthash(const void *input, size_t length)
> +{
> +again:
> +	switch (fastest_hash) {
> +	case HASH_CRC32C:
> +		return crc32c(0, input, length);
> +	case HASH_XXHASH:
> +		return xxhash(input, length, 0);
> +	default:
> +		choice_fastest_hash();

same here s/choice/choose/

> +		/* The correct value depends on page size and
> endianness */
> +		zero_checksum = fasthash(ZERO_PAGE(0), PAGE_SIZE);
> +		goto again;
> +	}
> +}
> +

so if I understand correctly, the benchmark function will be called
only when the function is called for the first time?

>  static int __init ksm_slab_init(void)
>  {
>  	rmap_item_cache = KSM_KMEM_CACHE(rmap_item, 0);
> @@ -979,7 +1066,7 @@ static u32 calc_checksum(struct page *page)
>  {
>  	u32 checksum;
>  	void *addr = kmap_atomic(page);
> -	checksum = jhash2(addr, PAGE_SIZE / 4, 17);
> +	checksum = fasthash(addr, PAGE_SIZE);
>  	kunmap_atomic(addr);
>  	return checksum;
>  }
> @@ -3061,8 +3148,6 @@ static int __init ksm_init(void)
>  	struct task_struct *ksm_thread;
>  	int err;
> 
> -	/* The correct value depends on page size and endianness */
> -	zero_checksum = calc_checksum(ZERO_PAGE(0));
>  	/* Default to false for backwards compatibility */
>  	ksm_use_zero_pages = false;
> 


* Re: [PATCH V6 2/2 RESEND] ksm: replace jhash2 with faster hash
  2018-05-08 15:26   ` Claudio Imbrenda
@ 2018-05-11 23:06     ` Timofey Titovets
  2018-05-14 10:17       ` Claudio Imbrenda
  0 siblings, 1 reply; 27+ messages in thread
From: Timofey Titovets @ 2018-05-11 23:06 UTC (permalink / raw)
  To: imbrenda; +Cc: linux-mm, Sioh Lee, Andrea Arcangeli, kvm

Tue, 8 May 2018 at 18:26, Claudio Imbrenda <imbrenda@linux.vnet.ibm.com>:

> On Wed, 18 Apr 2018 22:32:20 +0300
> Timofey Titovets <nefelim4ag@gmail.com> wrote:

> > From: Timofey Titovets <nefelim4ag@gmail.com>
> >
> > 1. Pickup, Sioh Lee crc32 patch, after some long conversation
> > 2. Merge with my work on xxhash
> > 3. Add autoselect code to choice fastest hash helper.
> >
> > Base idea are same, replace jhash2 with something faster.
> >
> > Perf numbers:
> > Intel(R) Xeon(R) CPU E5-2420 v2 @ 2.20GHz
> > ksm: crc32c   hash() 12081 MB/s
> > ksm: xxh64    hash()  8770 MB/s
> > ksm: xxh32    hash()  4529 MB/s
> > ksm: jhash2   hash()  1569 MB/s
> >
> > As jhash2 always will be slower (for data size like PAGE_SIZE),
> > just drop it from choice.
> >
> > Add function to autoselect hash algo on boot,
> > based on hashing speed, like raid6 code does.
> >
> > Move init of zero_checksum from init, to first call of fasthash():
> >   1. KSM Init run on early kernel init,
> >      run perf testing stuff on main kernel boot thread looks bad to

> This is my personal opinion, but I think it would be better and more
> uniform to have it during boot like raid6. It doesn't take too much
> time, and it allows to see immediately in dmesg what is going on.

I don't like such things at boot; that will slow down boot and add
useless work in *MOST* cases.

E.g. anyone who uses btrfs as rootfs must wait for raid6_pq init before
mount, even if they don't use the raid56 functionality.

Same for KSM: who uses KSM? I think that 90% of current users
are servers with KVM VMs.

I.e. I don't think that you use it on your notebook,
and adding 250ms to every bootup, even if you never use KSM,
looks like a bad idea to me.

And as this is the mm subsystem, that would lead to *every Linux device
in the world*
with KSM compiled in spending time and energy on KSM init.

> > me. 2. Crypto subsystem not avaliable at that early booting,
> >      so crc32c even, compiled in, not avaliable
> >      As crypto and ksm init, run at subsys_initcall() (4) kernel
> > level of init, all possible consumers will run later at 5+ levels

> have you tried moving ksm to a later stage? before commit
> a64fb3cd610c8e680 KSM was in fact initialized at level 6. After all, KSM
> cannot be triggered until userspace starts.

Of course, and that works,
but I don't have sufficient competence
to suggest such changes.

> > Output after first try of KSM to hash page:
> > ksm: crc32c hash() 15218 MB/s
> > ksm: xxhash hash()  8640 MB/s
> > ksm: choice crc32c as hash function
> >
> > Thanks.
> >
> > Changes:
> >   v1 -> v2:
> >     - Move xxhash() to xxhash.h/c and separate patches
> >   v2 -> v3:
> >     - Move xxhash() xxhash.c -> xxhash.h
> >     - replace xxhash_t with 'unsigned long'
> >     - update kerneldoc above xxhash()
> >   v3 -> v4:
> >     - Merge xxhash/crc32 patches
> >     - Replace crc32 with crc32c (crc32 have same as jhash2 speed)
> >     - Add auto speed test and auto choice of fastest hash function
> >   v4 -> v5:
> >     - Pickup missed xxhash patch
> >     - Update code with compile time choicen xxhash
> >     - Add more macros to make code more readable
> >     - As now that only possible use xxhash or crc32c,
> >       on crc32c allocation error, skip speed test and fallback to
> > xxhash
> >     - For workaround too early init problem (crc32c not avaliable),
> >       move zero_checksum init to first call of fastcall()
> >     - Don't alloc page for hash testing, use arch zero pages for that
> >   v5 -> v6:
> >     - Use libcrc32c instead of CRYPTO API, mainly for
> >       code/Kconfig deps Simplification
> >     - Add crc32c_available():
> >       libcrc32c will BUG_ON on crc32c problems,
> >       so test crc32c avaliable by crc32c_available()
> >     - Simplify choice_fastest_hash()
> >     - Simplify fasthash()
> >     - struct rmap_item && stable_node have sizeof == 64 on x86_64,
> >       that makes them cache friendly. As we don't suffer from hash
> > collisions, change hash type from unsigned long back to u32.
> >     - Fix kbuild robot warning, make all local functions static
> >
> > Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
> > Signed-off-by: leesioh <solee@os.korea.ac.kr>
> > CC: Andrea Arcangeli <aarcange@redhat.com>
> > CC: linux-mm@kvack.org
> > CC: kvm@vger.kernel.org
> > ---
> >  mm/Kconfig |  2 ++
> >  mm/ksm.c   | 93
> > +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--- 2
> > files changed, 91 insertions(+), 4 deletions(-)
> >
> > diff --git a/mm/Kconfig b/mm/Kconfig
> > index 03ff7703d322..b60bee4bb07e 100644
> > --- a/mm/Kconfig
> > +++ b/mm/Kconfig
> > @@ -305,6 +305,8 @@ config MMU_NOTIFIER
> >  config KSM
> >       bool "Enable KSM for page merging"
> >       depends on MMU
> > +     select XXHASH
> > +     select LIBCRC32C
> >       help
> >         Enable Kernel Samepage Merging: KSM periodically scans
> > those areas of an application's address space that an app has advised
> > may be diff --git a/mm/ksm.c b/mm/ksm.c
> > index c406f75957ad..2b84407fb918 100644
> > --- a/mm/ksm.c
> > +++ b/mm/ksm.c
> > @@ -25,7 +25,6 @@
> >  #include <linux/pagemap.h>
> >  #include <linux/rmap.h>
> >  #include <linux/spinlock.h>
> > -#include <linux/jhash.h>
> >  #include <linux/delay.h>
> >  #include <linux/kthread.h>
> >  #include <linux/wait.h>
> > @@ -41,6 +40,13 @@
> >  #include <linux/numa.h>
> >
> >  #include <asm/tlbflush.h>
> > +
> > +/* Support for xxhash and crc32c */
> > +#include <crypto/hash.h>
> > +#include <linux/crc32c.h>
> > +#include <linux/xxhash.h>
> > +#include <linux/sizes.h>
> > +
> >  #include "internal.h"
> >
> >  #ifdef CONFIG_NUMA
> > @@ -284,6 +290,87 @@ static DEFINE_SPINLOCK(ksm_mmlist_lock);
> >               sizeof(struct __struct), __alignof__(struct
> > __struct),\ (__flags), NULL)
> >
> > +#define TIME_125MS  (HZ >> 3)
> > +#define PERF_TO_MBS(X) (X*PAGE_SIZE*(1 << 3)/(SZ_1M))
> > +
> > +#define HASH_NONE   0
> > +#define HASH_CRC32C 1
> > +#define HASH_XXHASH 2
> > +
> > +static int fastest_hash = HASH_NONE;
> > +
> > +static bool __init crc32c_available(void)
> > +{
> > +     static struct shash_desc desc;
> > +
> > +     desc.tfm = crypto_alloc_shash("crc32c", 0, 0);

> will this work without the crypto api?

I don't know of a way to compile the kernel without the crypto API;
too many different subsystems depend on it,
if I read Kconfig correctly, of course.

> > +     desc.flags = 0;
> > +
> > +     if (IS_ERR(desc.tfm)) {
> > +             pr_warn("ksm: alloc crc32c shash error %ld\n",
> > +                     -PTR_ERR(desc.tfm));
> > +             return false;
> > +     }
> > +
> > +     crypto_free_shash(desc.tfm);
> > +     return true;
> > +}
> > +
> > +static void __init choice_fastest_hash(void)

> s/choice/choose/

> > +{
> > +
> > +     unsigned long je;
> > +     unsigned long perf_crc32c = 0;
> > +     unsigned long perf_xxhash = 0;
> > +
> > +     fastest_hash = HASH_XXHASH;
> > +     if (!crc32c_available())
> > +             goto out;
> > +
> > +     preempt_disable();
> > +     je = jiffies + TIME_125MS;
> > +     while (time_before(jiffies, je)) {
> > +             crc32c(0, ZERO_PAGE(0), PAGE_SIZE);
> > +             perf_crc32c++;
> > +     }
> > +     preempt_enable();
> > +
> > +     preempt_disable();
> > +     je = jiffies + TIME_125MS;
> > +     while (time_before(jiffies, je)) {
> > +             xxhash(ZERO_PAGE(0), PAGE_SIZE, 0);
> > +             perf_xxhash++;
> > +     }
> > +     preempt_enable();
> > +
> > +     pr_info("ksm: crc32c hash() %5ld MB/s\n",
> > PERF_TO_MBS(perf_crc32c));
> > +     pr_info("ksm: xxhash hash() %5ld MB/s\n",
> > PERF_TO_MBS(perf_xxhash)); +
> > +     if (perf_crc32c > perf_xxhash)
> > +             fastest_hash = HASH_CRC32C;
> > +out:
> > +     if (fastest_hash == HASH_CRC32C)
> > +             pr_info("ksm: choice crc32c as hash function\n");
> > +     else
> > +             pr_info("ksm: choice xxhash as hash function\n");
> > +}

> I wonder if this can be generalized to have a list of possible hash
> functions, filtered by availability, and then tested for performance,
> more like the raid6 functions.

IIRC:
We talked about that on an old version of the patch set,
and we decided that:
  - in an ideal situation, KSM should always use only one hash function.
    But we are afraid that hardware-accelerated crc32c can be missing
    in some configurations, so xxhash was added as an appropriate
    general-purpose fallback, which should work well enough for KSM
    in most cases.

So adding more complex logic, like raid6_pq has with all of its
different instruction sets, is overkill.

> > +
> > +static u32 fasthash(const void *input, size_t length)
> > +{
> > +again:
> > +     switch (fastest_hash) {
> > +     case HASH_CRC32C:
> > +             return crc32c(0, input, length);
> > +     case HASH_XXHASH:
> > +             return xxhash(input, length, 0);
> > +     default:
> > +             choice_fastest_hash();

> same here s/choice/choose/

> > +             /* The correct value depends on page size and
> > endianness */
> > +             zero_checksum = fasthash(ZERO_PAGE(0), PAGE_SIZE);
> > +             goto again;
> > +     }
> > +}
> > +

> so if I understand correctly, the benchmark function will be called
> only when the function is called for the first time?

Yes, that's it.
It's a little bit tricky,
but it will be called only from the KSM thread,
and only when the KSM thread tries to do some useful work.

So it must not block anything.

Thanks.

-- 
Have a nice day,
Timofey.


* Re: [PATCH V6 2/2 RESEND] ksm: replace jhash2 with faster hash
  2018-05-11 23:06     ` Timofey Titovets
@ 2018-05-14 10:17       ` Claudio Imbrenda
  2018-05-16 10:26         ` Timofey Titovets
  0 siblings, 1 reply; 27+ messages in thread
From: Claudio Imbrenda @ 2018-05-14 10:17 UTC (permalink / raw)
  To: Timofey Titovets; +Cc: linux-mm, Sioh Lee, Andrea Arcangeli, kvm

On Sat, 12 May 2018 02:06:20 +0300
Timofey Titovets <nefelim4ag@gmail.com> wrote:

> Tue, 8 May 2018 at 18:26, Claudio Imbrenda <imbrenda@linux.vnet.ibm.com>:
> 
> > On Wed, 18 Apr 2018 22:32:20 +0300
> > Timofey Titovets <nefelim4ag@gmail.com> wrote:  
> 
> > > From: Timofey Titovets <nefelim4ag@gmail.com>
> > >
> > > 1. Pickup, Sioh Lee crc32 patch, after some long conversation
> > > 2. Merge with my work on xxhash
> > > 3. Add autoselect code to choice fastest hash helper.
> > >
> > > Base idea are same, replace jhash2 with something faster.
> > >
> > > Perf numbers:
> > > Intel(R) Xeon(R) CPU E5-2420 v2 @ 2.20GHz
> > > ksm: crc32c   hash() 12081 MB/s
> > > ksm: xxh64    hash()  8770 MB/s
> > > ksm: xxh32    hash()  4529 MB/s
> > > ksm: jhash2   hash()  1569 MB/s
> > >
> > > As jhash2 always will be slower (for data size like PAGE_SIZE),
> > > just drop it from choice.
> > >
> > > Add function to autoselect hash algo on boot,
> > > based on hashing speed, like raid6 code does.
> > >
> > > Move init of zero_checksum from init, to first call of fasthash():
> > >   1. KSM Init run on early kernel init,
> > >      run perf testing stuff on main kernel boot thread looks bad
> > > to  
> 
> > This is my personal opinion, but I think it would be better and more
> > uniform to have it during boot like raid6. It doesn't take too much
> > time, and it allows to see immediately in dmesg what is going on.  
> 
> I don't like such things at boot, that will slowdown boot and add
> useless work in *MOST* cases.
> 
> ex. Anyone who use btrfs as rootfs must wait raid6_pq init, for mount.
> Even if they didn't use raid56 functionality.
> 
> Same for ksm, who use ksm? I think that 90% of users currently
> are servers with KVM's VMs.
> 
> i.e. i don't think that you use it on your notebook,
> and add 250ms to every bootup, even, if you did not use ksm
> looks as bad idea for me.
> 
> And as that a mm subsystem, that will lead to *every linux device in
> the world*
> with compiled in ksm, will spend time and energy to ksm init.

fair enough

> > > me. 2. Crypto subsystem not avaliable at that early booting,
> > >      so crc32c even, compiled in, not avaliable
> > >      As crypto and ksm init, run at subsys_initcall() (4) kernel
> > > level of init, all possible consumers will run later at 5+
> > > levels  
> 
> > have you tried moving ksm to a later stage? before commit
> > a64fb3cd610c8e680 KSM was in fact initialized at level 6. After
> > all, KSM cannot be triggered until userspace starts.  
> 
> Of course and that works,
> but i didn't have sufficient competence,
> to suggest such changes.
> 
> > > Output after first try of KSM to hash page:
> > > ksm: crc32c hash() 15218 MB/s
> > > ksm: xxhash hash()  8640 MB/s
> > > ksm: choice crc32c as hash function
> > >
> > > Thanks.
> > >
> > > Changes:
> > >   v1 -> v2:
> > >     - Move xxhash() to xxhash.h/c and separate patches
> > >   v2 -> v3:
> > >     - Move xxhash() xxhash.c -> xxhash.h
> > >     - replace xxhash_t with 'unsigned long'
> > >     - update kerneldoc above xxhash()
> > >   v3 -> v4:
> > >     - Merge xxhash/crc32 patches
> > >     - Replace crc32 with crc32c (crc32 have same as jhash2 speed)
> > >     - Add auto speed test and auto choice of fastest hash function
> > >   v4 -> v5:
> > >     - Pickup missed xxhash patch
> > >     - Update code with compile time choicen xxhash
> > >     - Add more macros to make code more readable
> > >     - As now that only possible use xxhash or crc32c,
> > >       on crc32c allocation error, skip speed test and fallback to
> > > xxhash
> > >     - For workaround too early init problem (crc32c not
> > > avaliable), move zero_checksum init to first call of fastcall()
> > >     - Don't alloc page for hash testing, use arch zero pages for
> > > that v5 -> v6:
> > >     - Use libcrc32c instead of CRYPTO API, mainly for
> > >       code/Kconfig deps Simplification
> > >     - Add crc32c_available():
> > >       libcrc32c will BUG_ON on crc32c problems,
> > >       so test crc32c avaliable by crc32c_available()
> > >     - Simplify choice_fastest_hash()
> > >     - Simplify fasthash()
> > >     - struct rmap_item && stable_node have sizeof == 64 on x86_64,
> > >       that makes them cache friendly. As we don't suffer from hash
> > > collisions, change hash type from unsigned long back to u32.
> > >     - Fix kbuild robot warning, make all local functions static
> > >
> > > Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
> > > Signed-off-by: leesioh <solee@os.korea.ac.kr>
> > > CC: Andrea Arcangeli <aarcange@redhat.com>
> > > CC: linux-mm@kvack.org
> > > CC: kvm@vger.kernel.org
> > > ---
> > >  mm/Kconfig |  2 ++
> > >  mm/ksm.c   | 93
> > > +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--- 2
> > > files changed, 91 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/mm/Kconfig b/mm/Kconfig
> > > index 03ff7703d322..b60bee4bb07e 100644
> > > --- a/mm/Kconfig
> > > +++ b/mm/Kconfig
> > > @@ -305,6 +305,8 @@ config MMU_NOTIFIER
> > >  config KSM
> > >       bool "Enable KSM for page merging"
> > >       depends on MMU
> > > +     select XXHASH
> > > +     select LIBCRC32C
> > >       help
> > >         Enable Kernel Samepage Merging: KSM periodically scans
> > > those areas of an application's address space that an app has
> > > advised may be diff --git a/mm/ksm.c b/mm/ksm.c
> > > index c406f75957ad..2b84407fb918 100644
> > > --- a/mm/ksm.c
> > > +++ b/mm/ksm.c
> > > @@ -25,7 +25,6 @@
> > >  #include <linux/pagemap.h>
> > >  #include <linux/rmap.h>
> > >  #include <linux/spinlock.h>
> > > -#include <linux/jhash.h>
> > >  #include <linux/delay.h>
> > >  #include <linux/kthread.h>
> > >  #include <linux/wait.h>
> > > @@ -41,6 +40,13 @@
> > >  #include <linux/numa.h>
> > >
> > >  #include <asm/tlbflush.h>
> > > +
> > > +/* Support for xxhash and crc32c */
> > > +#include <crypto/hash.h>
> > > +#include <linux/crc32c.h>
> > > +#include <linux/xxhash.h>
> > > +#include <linux/sizes.h>
> > > +
> > >  #include "internal.h"
> > >
> > >  #ifdef CONFIG_NUMA
> > > @@ -284,6 +290,87 @@ static DEFINE_SPINLOCK(ksm_mmlist_lock);
> > >               sizeof(struct __struct), __alignof__(struct
> > > __struct),\ (__flags), NULL)
> > >
> > > +#define TIME_125MS  (HZ >> 3)
> > > +#define PERF_TO_MBS(X) (X*PAGE_SIZE*(1 << 3)/(SZ_1M))
> > > +
> > > +#define HASH_NONE   0
> > > +#define HASH_CRC32C 1
> > > +#define HASH_XXHASH 2
> > > +
> > > +static int fastest_hash = HASH_NONE;
> > > +
> > > +static bool __init crc32c_available(void)
> > > +{
> > > +     static struct shash_desc desc;
> > > +
> > > +     desc.tfm = crypto_alloc_shash("crc32c", 0, 0);  
> 
> > will this work without the crypto api?  
> 
> I didn't know a way to compile kernel without crypto api,
> To many different sub systems depends on him,
> if i read Kconfig correctly of course.

I'm confused here. Why did you want to drop the dependency on the
crypto API in Kconfig if you are using it anyway? Or did I
misunderstand?

> > > +     desc.flags = 0;
> > > +
> > > +     if (IS_ERR(desc.tfm)) {
> > > +             pr_warn("ksm: alloc crc32c shash error %ld\n",
> > > +                     -PTR_ERR(desc.tfm));
> > > +             return false;
> > > +     }
> > > +
> > > +     crypto_free_shash(desc.tfm);
> > > +     return true;
> > > +}
> > > +
> > > +static void __init choice_fastest_hash(void)  
> 
> > s/choice/choose/  
> 
> > > +{
> > > +
> > > +     unsigned long je;
> > > +     unsigned long perf_crc32c = 0;
> > > +     unsigned long perf_xxhash = 0;
> > > +
> > > +     fastest_hash = HASH_XXHASH;
> > > +     if (!crc32c_available())
> > > +             goto out;
> > > +
> > > +     preempt_disable();
> > > +     je = jiffies + TIME_125MS;
> > > +     while (time_before(jiffies, je)) {
> > > +             crc32c(0, ZERO_PAGE(0), PAGE_SIZE);
> > > +             perf_crc32c++;
> > > +     }
> > > +     preempt_enable();
> > > +
> > > +     preempt_disable();
> > > +     je = jiffies + TIME_125MS;
> > > +     while (time_before(jiffies, je)) {
> > > +             xxhash(ZERO_PAGE(0), PAGE_SIZE, 0);
> > > +             perf_xxhash++;
> > > +     }
> > > +     preempt_enable();
> > > +
> > > +     pr_info("ksm: crc32c hash() %5ld MB/s\n",
> > > PERF_TO_MBS(perf_crc32c));
> > > +     pr_info("ksm: xxhash hash() %5ld MB/s\n",
> > > PERF_TO_MBS(perf_xxhash)); +
> > > +     if (perf_crc32c > perf_xxhash)
> > > +             fastest_hash = HASH_CRC32C;
> > > +out:
> > > +     if (fastest_hash == HASH_CRC32C)
> > > +             pr_info("ksm: choice crc32c as hash function\n");
> > > +     else
> > > +             pr_info("ksm: choice xxhash as hash function\n");
> > > +}  
> 
> > I wonder if this can be generalized to have a list of possible hash
> > functions, filtered by availability, and then tested for
> > performance, more like the raid6 functions.  
> 
> IIRC:
> We was talk about that on old version of patch set.
> And we decide what:
>   - in ideal situation, ksm must use only one hash function, always.
>     But, we afraid about that crc32c with hardware acceleration, can
> be missed by some way.
>     So, as appropriate fallback, xxhash added, as general proporse,
> which must work
>     good enough for ksm in most cases.
> 
> So adding more complex logic, like raid6_pq have with all of different
> instruction set are overkill.

fair enough

> > > +
> > > +static u32 fasthash(const void *input, size_t length)
> > > +{
> > > +again:
> > > +     switch (fastest_hash) {
> > > +     case HASH_CRC32C:
> > > +             return crc32c(0, input, length);
> > > +     case HASH_XXHASH:
> > > +             return xxhash(input, length, 0);
> > > +     default:
> > > +             choice_fastest_hash();  
> 
> > same here s/choice/choose/  
> 
> > > +             /* The correct value depends on page size and
> > > endianness */
> > > +             zero_checksum = fasthash(ZERO_PAGE(0), PAGE_SIZE);
> > > +             goto again;
> > > +     }
> > > +}
> > > +  
> 
> > so if I understand correctly, the benchmark function will be called
> > only when the function is called for the first time?  
> 
> yes, that is.
> That a little bit tricky,
> but it's will be called only from KSM thread,
> and only what KSM thread will try do some useful work.
> 
> So that must not block anything.
> 
> Thanks.


best regards

Claudio Imbrenda


* Re: [PATCH V6 2/2 RESEND] ksm: replace jhash2 with faster hash
  2018-05-14 10:17       ` Claudio Imbrenda
@ 2018-05-16 10:26         ` Timofey Titovets
  0 siblings, 0 replies; 27+ messages in thread
From: Timofey Titovets @ 2018-05-16 10:26 UTC (permalink / raw)
  To: imbrenda; +Cc: linux-mm, Sioh Lee, Andrea Arcangeli, kvm

Mon, 14 May 2018 at 13:17, Claudio Imbrenda <imbrenda@linux.vnet.ibm.com>:

> On Sat, 12 May 2018 02:06:20 +0300
> Timofey Titovets <nefelim4ag@gmail.com> wrote:

> > Tue, 8 May 2018 at 18:26, Claudio Imbrenda <imbrenda@linux.vnet.ibm.com>:
> >
> > > On Wed, 18 Apr 2018 22:32:20 +0300
> > > Timofey Titovets <nefelim4ag@gmail.com> wrote:
> >
> > > > From: Timofey Titovets <nefelim4ag@gmail.com>
> > > >
> > > > 1. Pickup, Sioh Lee crc32 patch, after some long conversation
> > > > 2. Merge with my work on xxhash
> > > > 3. Add autoselect code to choice fastest hash helper.
> > > >
> > > > Base idea are same, replace jhash2 with something faster.
> > > >
> > > > Perf numbers:
> > > > Intel(R) Xeon(R) CPU E5-2420 v2 @ 2.20GHz
> > > > ksm: crc32c   hash() 12081 MB/s
> > > > ksm: xxh64    hash()  8770 MB/s
> > > > ksm: xxh32    hash()  4529 MB/s
> > > > ksm: jhash2   hash()  1569 MB/s
> > > >
> > > > As jhash2 always will be slower (for data size like PAGE_SIZE),
> > > > just drop it from choice.
> > > >
> > > > Add function to autoselect hash algo on boot,
> > > > based on hashing speed, like raid6 code does.
> > > >
> > > > Move init of zero_checksum from init, to first call of fasthash():
> > > >   1. KSM Init run on early kernel init,
> > > >      run perf testing stuff on main kernel boot thread looks bad
> > > > to
> >
> > > This is my personal opinion, but I think it would be better and more
> > > uniform to have it during boot like raid6. It doesn't take too much
> > > time, and it allows to see immediately in dmesg what is going on.
> >
> > I don't like such things at boot, that will slowdown boot and add
> > useless work in *MOST* cases.
> >
> > ex. Anyone who use btrfs as rootfs must wait raid6_pq init, for mount.
> > Even if they didn't use raid56 functionality.
> >
> > Same for ksm, who use ksm? I think that 90% of users currently
> > are servers with KVM's VMs.
> >
> > i.e. i don't think that you use it on your notebook,
> > and add 250ms to every bootup, even, if you did not use ksm
> > looks as bad idea for me.
> >
> > And as that a mm subsystem, that will lead to *every linux device in
> > the world*
> > with compiled in ksm, will spend time and energy to ksm init.

> fair enough

> > > > me. 2. Crypto subsystem not avaliable at that early booting,
> > > >      so crc32c even, compiled in, not avaliable
> > > >      As crypto and ksm init, run at subsys_initcall() (4) kernel
> > > > level of init, all possible consumers will run later at 5+
> > > > levels
> >
> > > have you tried moving ksm to a later stage? before commit
> > > a64fb3cd610c8e680 KSM was in fact initialized at level 6. After
> > > all, KSM cannot be triggered until userspace starts.
> >
> > Of course and that works,
> > but i didn't have sufficient competence,
> > to suggest such changes.
> >
> > > > Output after first try of KSM to hash page:
> > > > ksm: crc32c hash() 15218 MB/s
> > > > ksm: xxhash hash()  8640 MB/s
> > > > ksm: choice crc32c as hash function
> > > >
> > > > Thanks.
> > > >
> > > > Changes:
> > > >   v1 -> v2:
> > > >     - Move xxhash() to xxhash.h/c and separate patches
> > > >   v2 -> v3:
> > > >     - Move xxhash() xxhash.c -> xxhash.h
> > > >     - replace xxhash_t with 'unsigned long'
> > > >     - update kerneldoc above xxhash()
> > > >   v3 -> v4:
> > > >     - Merge xxhash/crc32 patches
> > > >     - Replace crc32 with crc32c (crc32 have same as jhash2 speed)
> > > >     - Add auto speed test and auto choice of fastest hash function
> > > >   v4 -> v5:
> > > >     - Pickup missed xxhash patch
> > > >     - Update code with compile time choicen xxhash
> > > >     - Add more macros to make code more readable
> > > >     - As now that only possible use xxhash or crc32c,
> > > >       on crc32c allocation error, skip speed test and fallback to
> > > > xxhash
> > > >     - For workaround too early init problem (crc32c not
> > > > avaliable), move zero_checksum init to first call of fastcall()
> > > >     - Don't alloc page for hash testing, use arch zero pages for
> > > > that v5 -> v6:
> > > >     - Use libcrc32c instead of CRYPTO API, mainly for
> > > >       code/Kconfig deps Simplification
> > > >     - Add crc32c_available():
> > > >       libcrc32c will BUG_ON on crc32c problems,
> > > >       so test crc32c avaliable by crc32c_available()
> > > >     - Simplify choice_fastest_hash()
> > > >     - Simplify fasthash()
> > > >     - struct rmap_item && stable_node have sizeof == 64 on x86_64,
> > > >       that makes them cache friendly. As we don't suffer from hash
> > > > collisions, change hash type from unsigned long back to u32.
> > > >     - Fix kbuild robot warning, make all local functions static
> > > >
> > > > Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
> > > > Signed-off-by: leesioh <solee@os.korea.ac.kr>
> > > > CC: Andrea Arcangeli <aarcange@redhat.com>
> > > > CC: linux-mm@kvack.org
> > > > CC: kvm@vger.kernel.org
> > > > ---
> > > >  mm/Kconfig |  2 ++
> > > >  mm/ksm.c   | 93
> > > > +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--- 2
> > > > files changed, 91 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/mm/Kconfig b/mm/Kconfig
> > > > index 03ff7703d322..b60bee4bb07e 100644
> > > > --- a/mm/Kconfig
> > > > +++ b/mm/Kconfig
> > > > @@ -305,6 +305,8 @@ config MMU_NOTIFIER
> > > >  config KSM
> > > >       bool "Enable KSM for page merging"
> > > >       depends on MMU
> > > > +     select XXHASH
> > > > +     select LIBCRC32C
> > > >       help
> > > >         Enable Kernel Samepage Merging: KSM periodically scans
> > > > those areas of an application's address space that an app has
> > > > advised may be diff --git a/mm/ksm.c b/mm/ksm.c
> > > > index c406f75957ad..2b84407fb918 100644
> > > > --- a/mm/ksm.c
> > > > +++ b/mm/ksm.c
> > > > @@ -25,7 +25,6 @@
> > > >  #include <linux/pagemap.h>
> > > >  #include <linux/rmap.h>
> > > >  #include <linux/spinlock.h>
> > > > -#include <linux/jhash.h>
> > > >  #include <linux/delay.h>
> > > >  #include <linux/kthread.h>
> > > >  #include <linux/wait.h>
> > > > @@ -41,6 +40,13 @@
> > > >  #include <linux/numa.h>
> > > >
> > > >  #include <asm/tlbflush.h>
> > > > +
> > > > +/* Support for xxhash and crc32c */
> > > > +#include <crypto/hash.h>
> > > > +#include <linux/crc32c.h>
> > > > +#include <linux/xxhash.h>
> > > > +#include <linux/sizes.h>
> > > > +
> > > >  #include "internal.h"
> > > >
> > > >  #ifdef CONFIG_NUMA
> > > > @@ -284,6 +290,87 @@ static DEFINE_SPINLOCK(ksm_mmlist_lock);
> > > >               sizeof(struct __struct), __alignof__(struct
> > > > __struct),\ (__flags), NULL)
> > > >
> > > > +#define TIME_125MS  (HZ >> 3)
> > > > +#define PERF_TO_MBS(X) (X*PAGE_SIZE*(1 << 3)/(SZ_1M))
> > > > +
> > > > +#define HASH_NONE   0
> > > > +#define HASH_CRC32C 1
> > > > +#define HASH_XXHASH 2
> > > > +
> > > > +static int fastest_hash = HASH_NONE;
> > > > +
> > > > +static bool __init crc32c_available(void)
> > > > +{
> > > > +     static struct shash_desc desc;
> > > > +
> > > > +     desc.tfm = crypto_alloc_shash("crc32c", 0, 0);
> >
> > > will this work without the crypto api?
> >
> > I didn't know a way to compile kernel without crypto api,
> > To many different sub systems depends on him,
> > if i read Kconfig correctly of course.

> I'm confused here. Why did you want to drop the dependency on the
> crypto API in Kconfig if you are using it anyway? Or did I
> misunderstand?

A misunderstanding, I think.

You asked:
`will this work without the crypto api?`

My answer: I don't see an obvious way to even try to compile that
without the crypto API. Even without KSM adding deps on the crypto API,
the kernel has plenty of other subsystems
which depend on the crypto API.

(As the patch adds a dependency on the crypto API,
we have no way to compile KSM without the crypto API,
so I'm not sure that KSM without the crypto API is an issue which must be fixed.)

Thanks.

> > > > +     desc.flags = 0;
> > > > +
> > > > +     if (IS_ERR(desc.tfm)) {
> > > > +             pr_warn("ksm: alloc crc32c shash error %ld\n",
> > > > +                     -PTR_ERR(desc.tfm));
> > > > +             return false;
> > > > +     }
> > > > +
> > > > +     crypto_free_shash(desc.tfm);
> > > > +     return true;
> > > > +}
> > > > +
> > > > +static void __init choice_fastest_hash(void)
> >
> > > s/choice/choose/
> >
> > > > +{
> > > > +
> > > > +     unsigned long je;
> > > > +     unsigned long perf_crc32c = 0;
> > > > +     unsigned long perf_xxhash = 0;
> > > > +
> > > > +     fastest_hash = HASH_XXHASH;
> > > > +     if (!crc32c_available())
> > > > +             goto out;
> > > > +
> > > > +     preempt_disable();
> > > > +     je = jiffies + TIME_125MS;
> > > > +     while (time_before(jiffies, je)) {
> > > > +             crc32c(0, ZERO_PAGE(0), PAGE_SIZE);
> > > > +             perf_crc32c++;
> > > > +     }
> > > > +     preempt_enable();
> > > > +
> > > > +     preempt_disable();
> > > > +     je = jiffies + TIME_125MS;
> > > > +     while (time_before(jiffies, je)) {
> > > > +             xxhash(ZERO_PAGE(0), PAGE_SIZE, 0);
> > > > +             perf_xxhash++;
> > > > +     }
> > > > +     preempt_enable();
> > > > +
> > > > +     pr_info("ksm: crc32c hash() %5ld MB/s\n",
> > > > PERF_TO_MBS(perf_crc32c));
> > > > +     pr_info("ksm: xxhash hash() %5ld MB/s\n",
> > > > PERF_TO_MBS(perf_xxhash)); +
> > > > +     if (perf_crc32c > perf_xxhash)
> > > > +             fastest_hash = HASH_CRC32C;
> > > > +out:
> > > > +     if (fastest_hash == HASH_CRC32C)
> > > > +             pr_info("ksm: choice crc32c as hash function\n");
> > > > +     else
> > > > +             pr_info("ksm: choice xxhash as hash function\n");
> > > > +}
> >
> > > I wonder if this can be generalized to have a list of possible hash
> > > functions, filtered by availability, and then tested for
> > > performance, more like the raid6 functions.
> >
> > IIRC:
> > We was talk about that on old version of patch set.
> > And we decide what:
> >   - in ideal situation, ksm must use only one hash function, always.
> >     But, we afraid about that crc32c with hardware acceleration, can
> > be missed by some way.
> >     So, as appropriate fallback, xxhash added, as general proporse,
> > which must work
> >     good enough for ksm in most cases.
> >
> > So adding more complex logic, like raid6_pq have with all of different
> > instruction set are overkill.

> fair enough

> > > > +
> > > > +static u32 fasthash(const void *input, size_t length)
> > > > +{
> > > > +again:
> > > > +     switch (fastest_hash) {
> > > > +     case HASH_CRC32C:
> > > > +             return crc32c(0, input, length);
> > > > +     case HASH_XXHASH:
> > > > +             return xxhash(input, length, 0);
> > > > +     default:
> > > > +             choice_fastest_hash();
> >
> > > same here s/choice/choose/
> >
> > > > +             /* The correct value depends on page size and
> > > > endianness */
> > > > +             zero_checksum = fasthash(ZERO_PAGE(0), PAGE_SIZE);
> > > > +             goto again;
> > > > +     }
> > > > +}
> > > > +
> >
> > > so if I understand correctly, the benchmark function will be called
> > > only when the function is called for the first time?
> >
> > yes, that is.
> > That a little bit tricky,
> > but it's will be called only from KSM thread,
> > and only what KSM thread will try do some useful work.
> >
> > So that must not block anything.
> >
> > Thanks.


> best regards

> Claudio Imbrenda



--
Have a nice day,
Timofey.


* Re: [PATCH V6 2/2 RESEND] ksm: replace jhash2 with faster hash
  2018-04-18 19:32 ` [PATCH V6 2/2 RESEND] ksm: replace jhash2 with faster hash Timofey Titovets
  2018-05-08 15:26   ` Claudio Imbrenda
@ 2018-05-22 20:22   ` Pavel Tatashin
  2018-05-23 13:45     ` Timofey Titovets
  1 sibling, 1 reply; 27+ messages in thread
From: Pavel Tatashin @ 2018-05-22 20:22 UTC (permalink / raw)
  To: Timofey Titovets; +Cc: linux-mm, leesioh, Andrea Arcangeli, kvm

Hi Timofey,

> 
> Perf numbers:
> Intel(R) Xeon(R) CPU E5-2420 v2 @ 2.20GHz
> ksm: crc32c   hash() 12081 MB/s
> ksm: xxh64    hash()  8770 MB/s
> ksm: xxh32    hash()  4529 MB/s
> ksm: jhash2   hash()  1569 MB/s

That is a very nice improvement over jhash2!

> Add function to autoselect hash algo on boot,
> based on hashing speed, like raid6 code does.

Are you aware of hardware where crc32c is slower compared to xxhash?
Perhaps always use crc32c when available?

> +
> +static u32 fasthash(const void *input, size_t length)
> +{
> +again:
> +	switch (fastest_hash) {
> +	case HASH_CRC32C:
> +		return crc32c(0, input, length);
> +	case HASH_XXHASH:
> +		return xxhash(input, length, 0);

You are losing half of the 64-bit word in the xxh64 case? Is this acceptable?
Maybe do one more xor: in the 64-bit case, in xxhash() do (v >> 32) ^ (u32)v ?
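
For example (just a sketch; the helper name is made up):

	/* Fold the 64-bit xxh64 result into 32 bits instead of
	 * truncating it, so the high half still contributes.
	 */
	static inline u32 xxhash_fold32(const void *input, size_t length)
	{
		u64 v = xxh64(input, length, 0);

		return (u32)(v ^ (v >> 32));
	}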

> +	default:
> +		choice_fastest_hash();
> +		/* The correct value depends on page size and endianness */
> +		zero_checksum = fasthash(ZERO_PAGE(0), PAGE_SIZE);
> +		goto again;
> +	}
> +}

choice_fastest_hash() does not belong in fasthash(). We are losing leaf
function optimizations if you keep it in this hot path. Also, fastest_hash
should really be a static branch in order to avoid an extra load and
conditional branch.
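
Roughly like this (a sketch only; the key and helper names are made up):

	#include <linux/jump_label.h>

	static DEFINE_STATIC_KEY_FALSE(ksm_use_crc32c);

	/* Decided once, e.g. from ksm_init() */
	static void __init ksm_choose_hash(void)
	{
		if (crc32c_available())
			static_branch_enable(&ksm_use_crc32c);
	}

	static u32 fasthash(const void *input, size_t length)
	{
		if (static_branch_likely(&ksm_use_crc32c))
			return crc32c(0, input, length);
		return xxhash(input, length, 0);
	}

With a static branch the check compiles down to a patched jump instead
of a load plus a conditional branch.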

I think crc32c should simply be used when it is available, and xxhash used
otherwise; the decision should be made in ksm_init().

Thank you,
Pavel


* Re: [PATCH V6 2/2 RESEND] ksm: replace jhash2 with faster hash
  2018-05-22 20:22   ` Pavel Tatashin
@ 2018-05-23 13:45     ` Timofey Titovets
  2018-05-23 14:24       ` Pavel Tatashin
  0 siblings, 1 reply; 27+ messages in thread
From: Timofey Titovets @ 2018-05-23 13:45 UTC (permalink / raw)
  To: pasha.tatashin; +Cc: linux-mm, Sioh Lee, Andrea Arcangeli, kvm

Tue, 22 May 2018 at 23:22, Pavel Tatashin <pasha.tatashin@oracle.com>:

> Hi Timofey,

> >
> > Perf numbers:
> > Intel(R) Xeon(R) CPU E5-2420 v2 @ 2.20GHz
> > ksm: crc32c   hash() 12081 MB/s
> > ksm: xxh64    hash()  8770 MB/s
> > ksm: xxh32    hash()  4529 MB/s
> > ksm: jhash2   hash()  1569 MB/s

> That is a very nice improvement over jhash2!

> > Add function to autoselect hash algo on boot,
> > based on hashing speed, like raid6 code does.

> Are you aware of hardware where crc32c is slower compared to xxhash?
> Perhaps always use crc32c when available?

crc32c will always be available because of Kconfig.
But if crc32c doesn't have HW acceleration, it will be slower.

To talk about the whole range of HW, I would have to have that HW,
so I can't say that *all* supported HW has crc32c with acceleration.

> > +
> > +static u32 fasthash(const void *input, size_t length)
> > +{
> > +again:
> > +     switch (fastest_hash) {
> > +     case HASH_CRC32C:
> > +             return crc32c(0, input, length);
> > +     case HASH_XXHASH:
> > +             return xxhash(input, length, 0);

> You are losing half of the 64-bit word in the xxh64 case? Is this acceptable?
> Maybe do one more xor: in the 64-bit case, in xxhash() do (v >> 32) ^ (u32)v ?

AFAIK, that would make the hash function worse.
Anyway, in KSM the hash is used only to check whether a page has changed
since the last scan, so it doesn't really matter (IMHO).

> > +     default:
> > +             choice_fastest_hash();
> > +             /* The correct value depends on page size and endianness */
> > +             zero_checksum = fasthash(ZERO_PAGE(0), PAGE_SIZE);
> > +             goto again;
> > +     }
> > +}

> choice_fastest_hash() does not belong in fasthash(). We are losing leaf
> function optimizations if you keep it in this hot path. Also, fastest_hash
> should really be a static branch in order to avoid an extra load and
> conditional branch.

I don't think that will give any noticeable performance benefit
compared to the hash computation and the memcmp in the RB tree.

In theory, that can be replaced with a self-written jump table to *avoid*
runtime overhead.
AFAIK, at 5 entries gcc converts a switch to a jump table by itself.

> I think crc32c should simply be used when it is available, and xxhash used
> otherwise; the decision should be made in ksm_init().

I already said, earlier in this conversation, why I think doing that at
ksm_init() is a bad idea.

> Thank you,
> Pavel

Thanks.

-- 
Have a nice day,
Timofey.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH V6 2/2 RESEND] ksm: replace jhash2 with faster hash
  2018-05-23 13:45     ` Timofey Titovets
@ 2018-05-23 14:24       ` Pavel Tatashin
  2018-05-24  8:01         ` Timofey Titovets
  0 siblings, 1 reply; 27+ messages in thread
From: Pavel Tatashin @ 2018-05-23 14:24 UTC (permalink / raw)
  To: nefelim4ag; +Cc: Linux Memory Management List, solee, aarcange, kvm

Hi Timofey,

> crc32c will always be available, because of Kconfig.
> But if crc32c doesn't have HW acceleration, it will be slower.

> For talk about range of HW, i must have that HW,
> so i can't say that *all* supported HW, have crc32c with acceleration.

How about always defaulting to crc32c when HW acceleration is present
without doing timings?
Do you have performance numbers of crc32c without acceleration?

> > You are losing half of the 64-bit word in the xxh64 case? Is this acceptable?
> > Maybe do one more XOR: in the 64-bit case, in xxhash() do: (v >> 32) ^ (u32)v?

> AFAIK, that would make the hash function worse.
> Besides, in KSM the hash is only used to check whether a page has changed
> since the last scan, so it doesn't really matter (IMHO).

I understand that losing half of the hash result might be acceptable in
this case, but I am not really sure how XORing one more time can possibly
make the hash function worse; could you please elaborate?

> > choice_fastest_hash() does not belong in fasthash(). We are losing leaf
> > function optimizations if you keep it in this hot path. Also, fastest_hash
> > should really be a static branch in order to avoid an extra load and
> > conditional branch.

> I don't think that will give any noticeable performance benefit
> compared to the hash computation and the memcmp in the RB tree.

You are right, it is small compared to the hash and memcmp, but I still think
it makes sense to use a static branch; after all, the value will never change
at runtime after the first time it is set.


> In theory, that can be replaced with a self-written jump table to *avoid*
> runtime overhead.
> AFAIK, at 5 entries gcc converts a switch to a jump table by itself.

> > I think crc32c should simply be used when it is available, and xxhash used
> > otherwise; the decision should be made in ksm_init().

> I already said, earlier in this conversation, why I think doing that at
> ksm_init() is a bad idea.

It really feels wrong to keep choice_fastest_hash() in fasthash(); it is
done only once and really belongs in an init function, like ksm_init(). As
I understand it, you think it is a bad idea to keep it in ksm_init()
because it slows down boot by 0.25s, which I agree is substantial. But I
really do not think that we should spend those 0.25s at all deciding which
hash function is optimal; instead, we should default to one or the other
during boot based on the hardware we are booting on. If crc32c without hw
acceleration is no worse than jhash2, maybe we should simply switch to
crc32c?

Thank you,
Pavel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH V6 2/2 RESEND] ksm: replace jhash2 with faster hash
  2018-05-23 14:24       ` Pavel Tatashin
@ 2018-05-24  8:01         ` Timofey Titovets
  2018-05-25  1:16           ` Pavel Tatashin
  2018-05-27 13:03           ` [PATCH V6 2/2 RESEND] " Mike Rapoport
  0 siblings, 2 replies; 27+ messages in thread
From: Timofey Titovets @ 2018-05-24  8:01 UTC (permalink / raw)
  To: pasha.tatashin; +Cc: linux-mm, Sioh Lee, Andrea Arcangeli, kvm

On Wed, 23 May 2018 at 17:24, Pavel Tatashin <pasha.tatashin@oracle.com> wrote:

> Hi Timofey,

> > crc32c will always be available, because of Kconfig.
> > But if crc32c doesn't have HW acceleration, it will be slower.

> > To speak for the whole range of HW, I would have to own that HW,
> > so I can't say that *all* supported HW has crc32c with acceleration.

> How about always defaulting to crc32c when HW acceleration is present
> without doing timings?
IIRC, yes; the shash API can report 'cra_priority'.

> Do you have performance numbers of crc32c without acceleration?
Yes, https://lkml.org/lkml/2017/12/30/222

The experimental results (the experimental value is the average of the
measured values)
crc32c_intel: 1084.10ns
crc32c (no hardware acceleration): 7012.51ns
xxhash32: 2227.75ns
xxhash64: 1413.16ns
jhash2: 5128.30ns

> > > You are losing half of the 64-bit word in the xxh64 case? Is this acceptable?
> > > Maybe do one more XOR: in the 64-bit case, in xxhash() do: (v >> 32) ^ (u32)v?

> > AFAIK, that would make the hash function worse.
> > Besides, in KSM the hash is only used to check whether a page has changed
> > since the last scan, so it doesn't really matter (IMHO).

> I understand that losing half of the hash result might be acceptable in
> this case, but I am not really sure how XORing one more time can possibly
> make the hash function worse; could you please elaborate?

IIRC, because XOR is symmetric.
I.e. shift:
0b01011010 >> 4 = 0b0101
and XOR:
0b0101 ^ 0b1010 = 0b1111
XOR will decrease randomness/entropy and will lead to hash collisions.
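
For illustration, a plain-C sketch of the two 64-bit to 32-bit reductions
being discussed (the function names here are only illustrative, not from
the patch):

#include <stdint.h>

/* Truncation, as the patch does: keep the low 32 bits of xxh64(). */
static uint32_t reduce_truncate(uint64_t v)
{
        return (uint32_t)v;
}

/* Folding, as suggested above: XOR the high half into the low half. */
static uint32_t reduce_fold(uint64_t v)
{
        return (uint32_t)(v >> 32) ^ (uint32_t)v;
}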

> > > choice_fastest_hash() does not belong in fasthash(). We are losing leaf
> > > function optimizations if you keep it in this hot path. Also, fastest_hash
> > > should really be a static branch in order to avoid an extra load and
> > > conditional branch.

> > I don't think that will give any noticeable performance benefit
> > compared to the hash computation and the memcmp in the RB tree.

> You are right, it is small compared to the hash and memcmp, but I still think
> it makes sense to use a static branch; after all, the value will never change
> at runtime after the first time it is set.


> > In theory, that can be replaced with a self-written jump table to *avoid*
> > runtime overhead.
> > AFAIK, at 5 entries gcc converts a switch to a jump table by itself.

> > > I think crc32c should simply be used when it is available, and xxhash used
> > > otherwise; the decision should be made in ksm_init().

> > I already said, earlier in this conversation, why I think doing that at
> > ksm_init() is a bad idea.

> It really feels wrong to keep choice_fastest_hash() in fasthash(); it is
> done only once and really belongs in an init function, like ksm_init().

It is possible to move the decision from lazy load to ksm_thread;
that would allow us to run the benchmark without slowing down boot.

But for that to work, ksm must start later, after the crypto init.

> As I understand it, you think it is a bad idea to keep it in ksm_init()
> because it slows down boot by 0.25s, which I agree is substantial. But I
> really do not think that we should spend those 0.25s at all deciding which
> hash function is optimal; instead, we should default to one or the other
> during boot based on the hardware we are booting on. If crc32c without hw
> acceleration is no worse than jhash2, maybe we should simply switch to
> crc32c?

crc32c with no hw acceleration is slower compared to jhash2 on x86, so I
think the result will be the same on other arches.

> Thank you,
> Pavel

Thanks.

--
Have a nice day,
Timofey.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH V6 2/2 RESEND] ksm: replace jhash2 with faster hash
  2018-05-24  8:01         ` Timofey Titovets
@ 2018-05-25  1:16           ` Pavel Tatashin
  2018-05-26 20:25             ` [PATCH] " kbuild test robot
  2018-05-26 21:06             ` kbuild test robot
  2018-05-27 13:03           ` [PATCH V6 2/2 RESEND] " Mike Rapoport
  1 sibling, 2 replies; 27+ messages in thread
From: Pavel Tatashin @ 2018-05-25  1:16 UTC (permalink / raw)
  To: Timofey Titovets; +Cc: linux-mm, Sioh Lee, Andrea Arcangeli, kvm

Hi Timofey,

> > Do you have performance numbers of crc32c without acceleration?
> Yes, https://lkml.org/lkml/2017/12/30/222
> 
> The experimental results (the experimental value is the average of the
> measured values)
> crc32c_intel: 1084.10ns
> crc32c (no hardware acceleration): 7012.51ns
> xxhash32: 2227.75ns
> xxhash64: 1413.16ns
> jhash2: 5128.30ns

Excellent, thank you for this data.

> > I understand that losing half of the hash result might be acceptable in
> > this case, but I am not really sure how XORing one more time can possibly
> > make the hash function worse; could you please elaborate?
> 
> IIRC, because XOR is symmetric.
> I.e. shift:
> 0b01011010 >> 4 = 0b0101
> and XOR:
> 0b0101 ^ 0b1010 = 0b1111
> XOR will decrease randomness/entropy and will lead to hash collisions.

Makes perfect sense. Yes, XORing two random numbers reduces entropy.

> It is possible to move the decision from lazy load to ksm_thread;
> that would allow us to run the benchmark without slowing down boot.
> 
> But for that to work, ksm must start later, after the crypto init.

After studying this dependency some more, I agree it is OK to choose the
hash function where it is, but I still disagree that we must measure the
performance at runtime.

> crc32c with no hw acceleration is slower compared to jhash2 on x86, so I
> think the result will be the same on other arches.

Agreed.

Below is your patch, updated with my suggested changes.

1. Removes the dependency on crc32c; use it only when it is available.
2. Do not spend time measuring the performance; choose crc32c only if a
HW-optimized implementation of it is available.
3. Replace the logic with static branches.
4. Fix a couple of minor bugs (see the sketch below):
   fastest_hash_setup() and crc32c_available() were marked as __init
   functions, and thus could be unmapped by the time they are run for the
   first time. I think a section mismatch warning would catch those.
   Removed dead code: "desc.flags = 0", and also replaced desc with shash.
   Removed the unnecessary local global "static struct shash_desc desc";
   this removes it from the data page.
   Fixed a few spelling errors, and made other minor changes to pass
   ./scripts/checkpatch.pl.
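
For item 4, the cleaned-up availability check looks roughly like this (a
sketch only; the actual attached patch is elided in this archive view):

#include <crypto/hash.h>

/* Not __init: may run long after boot. No file-scope shash_desc kept. */
static bool crc32c_available(void)
{
        struct crypto_shash *shash = crypto_alloc_shash("crc32c", 0, 0);

        if (IS_ERR(shash)) {
                pr_warn("ksm: crc32c shash allocation error %ld\n",
                        PTR_ERR(shash));
                return false;
        }

        crypto_free_shash(shash);
        return true;
}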

The patch is untested, but should work. Please let me know if you agree
with the changes. If so, you can test and resubmit the series.

Thank you,
Pavel

Patch:
==========================================================================

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] ksm: replace jhash2 with faster hash
  2018-05-25  1:16           ` Pavel Tatashin
@ 2018-05-26 20:25             ` kbuild test robot
  2018-05-26 21:06             ` kbuild test robot
  1 sibling, 0 replies; 27+ messages in thread
From: kbuild test robot @ 2018-05-26 20:25 UTC (permalink / raw)
  To: Pavel Tatashin
  Cc: kbuild-all, Timofey Titovets, linux-mm, Sioh Lee, Andrea Arcangeli, kvm

[-- Attachment #1: Type: text/plain, Size: 1714 bytes --]

Hi Pavel,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on mmotm/master]
[also build test ERROR on v4.17-rc6]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Pavel-Tatashin/ksm-replace-jhash2-with-faster-hash/20180527-032512
base:   git://git.cmpxchg.org/linux-mmotm.git master
config: i386-randconfig-x078-201821 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All errors (new ones prefixed by >>):

   mm/ksm.c: In function 'fasthash':
>> mm/ksm.c:338:15: error: implicit declaration of function 'xxhash'; did you mean 'jhash'? [-Werror=implicit-function-declaration]
      return (u32)xxhash(input, length, 0);
                  ^~~~~~
                  jhash
   cc1: some warnings being treated as errors

vim +338 mm/ksm.c

   331	
   332	static u32 fasthash(const void *input, size_t length)
   333	{
   334		if (static_branch_likely(&ksm_use_crc32c))
   335			return crc32c(0, input, length);
   336	
   337		if (static_branch_likely(&ksm_use_xxhash))
 > 338			return (u32)xxhash(input, length, 0);
   339	
   340		/* Is done only once on the first call of fasthash() */
   341		fasthash_setup();
   342	
   343		/* Now, that we know the hash alg., calculate checksum for zero page */
   344		zero_checksum = fasthash(ZERO_PAGE(0), PAGE_SIZE);
   345	
   346		return fasthash(input, length);
   347	}
   348	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 27001 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] ksm: replace jhash2 with faster hash
  2018-05-25  1:16           ` Pavel Tatashin
  2018-05-26 20:25             ` [PATCH] " kbuild test robot
@ 2018-05-26 21:06             ` kbuild test robot
  1 sibling, 0 replies; 27+ messages in thread
From: kbuild test robot @ 2018-05-26 21:06 UTC (permalink / raw)
  To: Pavel Tatashin
  Cc: kbuild-all, Timofey Titovets, linux-mm, Sioh Lee, Andrea Arcangeli, kvm

[-- Attachment #1: Type: text/plain, Size: 1682 bytes --]

Hi Pavel,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on mmotm/master]
[also build test ERROR on v4.17-rc6]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Pavel-Tatashin/ksm-replace-jhash2-with-faster-hash/20180527-032512
base:   git://git.cmpxchg.org/linux-mmotm.git master
config: i386-randconfig-s0-201821 (attached as .config)
compiler: gcc-6 (Debian 6.4.0-9) 6.4.0 20171026
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All errors (new ones prefixed by >>):

   mm/ksm.c: In function 'fasthash':
>> mm/ksm.c:338:15: error: implicit declaration of function 'xxhash' [-Werror=implicit-function-declaration]
      return (u32)xxhash(input, length, 0);
                  ^~~~~~
   cc1: some warnings being treated as errors

vim +/xxhash +338 mm/ksm.c

   331	
   332	static u32 fasthash(const void *input, size_t length)
   333	{
   334		if (static_branch_likely(&ksm_use_crc32c))
   335			return crc32c(0, input, length);
   336	
   337		if (static_branch_likely(&ksm_use_xxhash))
 > 338			return (u32)xxhash(input, length, 0);
   339	
   340		/* Is done only once on the first call of fasthash() */
   341		fasthash_setup();
   342	
   343		/* Now, that we know the hash alg., calculate checksum for zero page */
   344		zero_checksum = fasthash(ZERO_PAGE(0), PAGE_SIZE);
   345	
   346		return fasthash(input, length);
   347	}
   348	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 29416 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH V6 2/2 RESEND] ksm: replace jhash2 with faster hash
  2018-05-24  8:01         ` Timofey Titovets
  2018-05-25  1:16           ` Pavel Tatashin
@ 2018-05-27 13:03           ` Mike Rapoport
  2018-05-29 14:45             ` Pavel Tatashin
  1 sibling, 1 reply; 27+ messages in thread
From: Mike Rapoport @ 2018-05-27 13:03 UTC (permalink / raw)
  To: Timofey Titovets
  Cc: pasha.tatashin, linux-mm, Sioh Lee, Andrea Arcangeli, kvm

Hi,

On Thu, May 24, 2018 at 11:01:20AM +0300, Timofey Titovets wrote:
> On Wed, 23 May 2018 at 17:24, Pavel Tatashin <pasha.tatashin@oracle.com> wrote:
> 
> > Hi Timofey,

[ ... ]
 
> > It really feels wrong to keep choice_fastest_hash() in fasthash(); it is
> > done only once and really belongs in an init function, like ksm_init().
> 
> It is possible to move the decision from lazy load to ksm_thread;
> that would allow us to run the benchmark without slowing down boot.
> 
> But for that to work, ksm must start later, after the crypto init.
 
What about moving choice_fastest_hash() to run_store()?

KSM anyway starts with ksm_run = KSM_RUN_STOP and does not scan until
userspace writes !0 to /sys/kernel/mm/ksm/run.

Selection of the hash function when KSM is actually enabled seems quite
appropriate...
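
Roughly like this (a simplified, hypothetical sketch: the real run_store()
also takes ksm_thread_mutex and handles unmerging, all elided here; the
fastest_hash/choice_fastest_hash()/fasthash() names come from the patch):

static ssize_t run_store(struct kobject *kobj, struct kobj_attribute *attr,
                         const char *buf, size_t count)
{
        unsigned long flags;
        int err;

        err = kstrtoul(buf, 10, &flags);
        if (err || flags > KSM_RUN_UNMERGE)
                return -EINVAL;

        /* Pick the hash once, the first time merging is enabled. */
        if ((flags & KSM_RUN_MERGE) && fastest_hash == HASH_NONE) {
                choice_fastest_hash();
                /* As in the patch: checksum of the zero page. */
                zero_checksum = fasthash(ZERO_PAGE(0), PAGE_SIZE);
        }

        /* ... the existing ksm_run update and thread wakeup go here ... */

        return count;
}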

> > As I understand it, you think it is a bad idea to keep it in ksm_init()
> > because it slows down boot by 0.25s, which I agree is substantial. But I
> > really do not think that we should spend those 0.25s at all deciding which
> > hash function is optimal; instead, we should default to one or the other
> > during boot based on the hardware we are booting on. If crc32c without hw
> > acceleration is no worse than jhash2, maybe we should simply switch to
> > crc32c?
> 
> crc32c with no hw acceleration is slower compared to jhash2 on x86, so I
> think the result will be the same on other arches.
> 
> > Thank you,
> > Pavel
> 
> Thanks.
> 
> --
> Have a nice day,
> Timofey.
> 

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH V6 2/2 RESEND] ksm: replace jhash2 with faster hash
  2018-05-27 13:03           ` [PATCH V6 2/2 RESEND] " Mike Rapoport
@ 2018-05-29 14:45             ` Pavel Tatashin
  2018-06-07  8:58               ` Timofey Titovets
  0 siblings, 1 reply; 27+ messages in thread
From: Pavel Tatashin @ 2018-05-29 14:45 UTC (permalink / raw)
  To: rppt; +Cc: Timofey Titovets, Linux Memory Management List, solee, aarcange, kvm

> What about moving choice_fastest_hash() to run_store()?

> KSM anyway starts with ksm_run = KSM_RUN_STOP and does not scan until
> userspace writes !0 to /sys/kernel/mm/ksm/run.

> Selection of the hash function when KSM is actually enabled seems quite
> appropriate...

Hi Mike,

It is a good idea to select the hash function from run_store() when
(flags & KSM_RUN_MERGE) is set for the first time.

Pavel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH V6 2/2 RESEND] ksm: replace jhash2 with faster hash
  2018-05-29 14:45             ` Pavel Tatashin
@ 2018-06-07  8:58               ` Timofey Titovets
  2018-06-07 11:52                 ` Mike Rapoport
  0 siblings, 1 reply; 27+ messages in thread
From: Timofey Titovets @ 2018-06-07  8:58 UTC (permalink / raw)
  To: pasha.tatashin; +Cc: rppt, linux-mm, Sioh Lee, Andrea Arcangeli, kvm

On Tue, 29 May 2018 at 17:46, Pavel Tatashin <pasha.tatashin@oracle.com> wrote:
>
> > What about moving choice_fastest_hash() to run_store()?
>
> > KSM anyway starts with ksm_run = KSM_RUN_STOP and does not scan until
> > userspace writes !0 to /sys/kernel/mm/ksm/run.
>
> > Selection of the hash function when KSM is actually enabled seems quite
> > appropriate...
>
> Hi Mike,
>
> It is a good idea to select the hash function from run_store() when
> (flags & KSM_RUN_MERGE) is set for the first time.
>
> Pavel

IIRC, run_store() is hidden under '#ifdef CONFIG_SYSFS'.
So, what about the case without CONFIG_SYSFS?

-- 
Have a nice day,
Timofey.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH V6 2/2 RESEND] ksm: replace jhash2 with faster hash
  2018-06-07  8:58               ` Timofey Titovets
@ 2018-06-07 11:52                 ` Mike Rapoport
  2018-06-08  1:29                   ` Pavel Tatashin
  0 siblings, 1 reply; 27+ messages in thread
From: Mike Rapoport @ 2018-06-07 11:52 UTC (permalink / raw)
  To: Timofey Titovets
  Cc: pasha.tatashin, linux-mm, Sioh Lee, Andrea Arcangeli, kvm

On Thu, Jun 07, 2018 at 11:58:05AM +0300, Timofey Titovets wrote:
> On Tue, 29 May 2018 at 17:46, Pavel Tatashin <pasha.tatashin@oracle.com> wrote:
> >
> > > What about moving choice_fastest_hash() to run_store()?
> >
> > > KSM anyway starts with ksm_run = KSM_RUN_STOP and does not scan until
> > > userspace writes !0 to /sys/kernel/mm/ksm/run.
> >
> > > Selection of the hash function when KSM is actually enabled seems quite
> > > appropriate...
> >
> > Hi Mike,
> >
> > It is a good idea to select the hash function from run_store() when
> > (flags & KSM_RUN_MERGE) is set for the first time.
> >
> > Pavel
> 
> IIRC, run_store() is hidden under '#ifdef CONFIG_SYSFS'.
> So, what about the case without CONFIG_SYSFS?

With CONFIG_SYSFS=n there is nothing that will set ksm_run to anything but
zero and ksm_do_scan will never be called.
 
> -- 
> Have a nice day,
> Timofey.
> 

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH V6 2/2 RESEND] ksm: replace jhash2 with faster hash
  2018-06-07 11:52                 ` Mike Rapoport
@ 2018-06-08  1:29                   ` Pavel Tatashin
  2018-06-10  5:38                     ` Mike Rapoport
  2018-06-25  8:48                     ` Mike Rapoport
  0 siblings, 2 replies; 27+ messages in thread
From: Pavel Tatashin @ 2018-06-08  1:29 UTC (permalink / raw)
  To: rppt; +Cc: Timofey Titovets, Linux Memory Management List, solee, aarcange, kvm

> With CONFIG_SYSFS=n there is nothing that will set ksm_run to anything but
> zero and ksm_do_scan will never be called.
>

Unfortunately, this is not so:

In: /linux-master/mm/ksm.c

3143#else
3144 ksm_run = KSM_RUN_MERGE; /* no way for user to start it */
3145
3146#endif /* CONFIG_SYSFS */

So, we do set ksm_run to run right from ksm_init() when CONFIG_SYSFS=n.

I wonder if it is acceptable to only use xxhash when CONFIG_SYSFS=n?

Thank you,
Pavel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH V6 2/2 RESEND] ksm: replace jhash2 with faster hash
  2018-06-08  1:29                   ` Pavel Tatashin
@ 2018-06-10  5:38                     ` Mike Rapoport
  2018-06-22 18:48                       ` Pavel Tatashin
  2018-06-25  8:48                     ` Mike Rapoport
  1 sibling, 1 reply; 27+ messages in thread
From: Mike Rapoport @ 2018-06-10  5:38 UTC (permalink / raw)
  To: Pavel Tatashin
  Cc: Timofey Titovets, Linux Memory Management List, solee, aarcange, kvm

On Thu, Jun 07, 2018 at 09:29:49PM -0400, Pavel Tatashin wrote:
> > With CONFIG_SYSFS=n there is nothing that will set ksm_run to anything but
> > zero and ksm_do_scan will never be called.
> >
> 
> Unfortunately, this is not so:
> 
> In: /linux-master/mm/ksm.c
> 
> 3143#else
> 3144 ksm_run = KSM_RUN_MERGE; /* no way for user to start it */
> 3145
> 3146#endif /* CONFIG_SYSFS */
> 
> So, we do set ksm_run to run right from ksm_init() when CONFIG_SYSFS=n.

Huh, missed that one...
 
> I wonder if it is acceptable to only use xxhash when CONFIG_SYSFS=n?

This is a bit unrelated to CONFIG_SYSFS, and more about rare use-cases in general.
What will happen in the following scenario:

* The system has crc32c HW acceleration
* KSM chooses crc32c
* KSM runs with crc32c
* user removes crc32c HW acceleration module

If I understand correctly, we'll then fall back to pure SW crc32c
calculations, right?

> Thank you,
> Pavel
> 

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH V6 2/2 RESEND] ksm: replace jhash2 with faster hash
  2018-06-10  5:38                     ` Mike Rapoport
@ 2018-06-22 18:48                       ` Pavel Tatashin
  0 siblings, 0 replies; 27+ messages in thread
From: Pavel Tatashin @ 2018-06-22 18:48 UTC (permalink / raw)
  To: rppt; +Cc: Timofey Titovets, Linux Memory Management List, solee, aarcange, kvm

> This is a bit unrelated to CONFIG_SYSFS, and more about rare use-cases in general.
> What will happen in the following scenario:
>
> * The system has crc32c HW acceleration
> * KSM chooses crc32c
> * KSM runs with crc32c
> * user removes crc32c HW acceleration module
>
> If I understand correctly, we'll then fall back to pure SW crc32c
> calculations, right?

Yes, we fall back to the SW crc32c, which is slower compared to the
HW-optimized one, but we won't change the hash function once it is set.
I do not think it makes sense to add any extra logic to ksm for that,
even after every page is unmerged and the ksm thread is stopped.

Pavel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH V6 2/2 RESEND] ksm: replace jhash2 with faster hash
  2018-06-08  1:29                   ` Pavel Tatashin
  2018-06-10  5:38                     ` Mike Rapoport
@ 2018-06-25  8:48                     ` Mike Rapoport
  2018-09-13 10:35                       ` Timofey Titovets
  1 sibling, 1 reply; 27+ messages in thread
From: Mike Rapoport @ 2018-06-25  8:48 UTC (permalink / raw)
  To: Pavel Tatashin
  Cc: Timofey Titovets, Linux Memory Management List, solee, aarcange, kvm

On Thu, Jun 07, 2018 at 09:29:49PM -0400, Pavel Tatashin wrote:
> > With CONFIG_SYSFS=n there is nothing that will set ksm_run to anything but
> > zero and ksm_do_scan will never be called.
> >
> 
> Unfortunately, this is not so:
> 
> In: /linux-master/mm/ksm.c
> 
> 3143#else
> 3144 ksm_run = KSM_RUN_MERGE; /* no way for user to start it */
> 3145
> 3146#endif /* CONFIG_SYSFS */
> 
> So, we do set ksm_run to run right from ksm_init() when CONFIG_SYSFS=n.
> 
> I wonder if it is acceptable to only use xxhash when CONFIG_SYSFS=n?

BTW, with CONFIG_SYSFS=n KSM may start running before hardware acceleration
for crc32c is initialized...
 
> Thank you,
> Pavel
> 

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH V6 2/2 RESEND] ksm: replace jhash2 with faster hash
  2018-06-25  8:48                     ` Mike Rapoport
@ 2018-09-13 10:35                       ` Timofey Titovets
  2018-09-13 18:01                         ` Mike Rapoport
  0 siblings, 1 reply; 27+ messages in thread
From: Timofey Titovets @ 2018-09-13 10:35 UTC (permalink / raw)
  To: rppt; +Cc: pasha.tatashin, linux-mm, Sioh Lee, Andrea Arcangeli, kvm

On Mon, 25 Jun 2018 at 11:48, Mike Rapoport <rppt@linux.vnet.ibm.com> wrote:
>
> On Thu, Jun 07, 2018 at 09:29:49PM -0400, Pavel Tatashin wrote:
> > > With CONFIG_SYSFS=n there is nothing that will set ksm_run to anything but
> > > zero and ksm_do_scan will never be called.
> > >
> >
> > Unfortunately, this is not so:
> >
> > In: /linux-master/mm/ksm.c
> >
> > 3143#else
> > 3144 ksm_run = KSM_RUN_MERGE; /* no way for user to start it */
> > 3145
> > 3146#endif /* CONFIG_SYSFS */
> >
> > So, we do set ksm_run to run right from ksm_init() when CONFIG_SYSFS=n.
> >
> > I wonder if it is acceptable to only use xxhash when CONFIG_SYSFS=n?
>
> BTW, with CONFIG_SYSFS=n KSM may start running before hardware acceleration
> for crc32c is initialized...
>
> > Thank you,
> > Pavel
> >
>
> --
> Sincerely yours,
> Mike.
>

A little thread bump.
This patchset hasn't been able to move forward for about ~8 months.
As I see it, the main question in the thread is that we have a race
between KSM initialization and the availability of the crypto API.
Maybe we can then fall back to a simple plan, and just replace our old
good buddy jhash with the faster xxhash?
That would move the crypto API & crc32 question to the background, and
make things 2-3 times better for now.

What do you all think about that?
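
I.e. the whole change would boil down to something like this in
calc_checksum() (a sketch on top of patch 1/2, which provides xxhash()):

#include <linux/xxhash.h>

static u32 calc_checksum(struct page *page)
{
        u32 checksum;
        void *addr = kmap_atomic(page);

        /* was: jhash2(addr, PAGE_SIZE / 4, 17) */
        checksum = xxhash(addr, PAGE_SIZE, 0);
        kunmap_atomic(addr);
        return checksum;
}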

> crc32c_intel: 1084.10ns
> crc32c (no hardware acceleration): 7012.51ns
> xxhash32: 2227.75ns
> xxhash64: 1413.16ns
> jhash2: 5128.30ns

-- 
Have a nice day,
Timofey.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH V6 2/2 RESEND] ksm: replace jhash2 with faster hash
  2018-09-13 10:35                       ` Timofey Titovets
@ 2018-09-13 18:01                         ` Mike Rapoport
  2018-09-13 18:10                           ` Pasha Tatashin
  0 siblings, 1 reply; 27+ messages in thread
From: Mike Rapoport @ 2018-09-13 18:01 UTC (permalink / raw)
  To: Timofey Titovets
  Cc: Pasha Tatashin, linux-mm, Sioh Lee, Andrea Arcangeli, kvm

(updated Pasha's e-mail)

On Thu, Sep 13, 2018 at 01:35:20PM +0300, Timofey Titovets wrote:
> On Mon, 25 Jun 2018 at 11:48, Mike Rapoport <rppt@linux.vnet.ibm.com> wrote:
> >
> > On Thu, Jun 07, 2018 at 09:29:49PM -0400, Pavel Tatashin wrote:
> > > > With CONFIG_SYSFS=n there is nothing that will set ksm_run to anything but
> > > > zero and ksm_do_scan will never be called.
> > > >
> > >
> > > Unfortunately, this is not so:
> > >
> > > In: /linux-master/mm/ksm.c
> > >
> > > 3143#else
> > > 3144 ksm_run = KSM_RUN_MERGE; /* no way for user to start it */
> > > 3145
> > > 3146#endif /* CONFIG_SYSFS */
> > >
> > > So, we do set ksm_run to run right from ksm_init() when CONFIG_SYSFS=n.
> > >
> > > I wonder if it is acceptable to only use xxhash when CONFIG_SYSFS=n?
> >
> > BTW, with CONFIG_SYSFS=n KSM may start running before hardware acceleration
> > for crc32c is initialized...
> >
> > > Thank you,
> > > Pavel
> > >
> >
> > --
> > Sincerely yours,
> > Mike.
> >
> 
> A little thread bump.
> This patchset hasn't been able to move forward for about ~8 months.
> As I see it, the main question in the thread is that we have a race
> between KSM initialization and the availability of the crypto API.
> Maybe we can then fall back to a simple plan, and just replace our old
> good buddy jhash with the faster xxhash?
> That would move the crypto API & crc32 question to the background, and
> make things 2-3 times better for now.
> 
> What do you all think about that?

Sounds reasonable to me

> > crc32c_intel: 1084.10ns
> > crc32c (no hardware acceleration): 7012.51ns
> > xxhash32: 2227.75ns
> > xxhash64: 1413.16ns
> > jhash2: 5128.30ns
> 
> -- 
> Have a nice day,
> Timofey.
> 

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH V6 2/2 RESEND] ksm: replace jhash2 with faster hash
  2018-09-13 18:01                         ` Mike Rapoport
@ 2018-09-13 18:10                           ` Pasha Tatashin
  0 siblings, 0 replies; 27+ messages in thread
From: Pasha Tatashin @ 2018-09-13 18:10 UTC (permalink / raw)
  To: Mike Rapoport, Timofey Titovets; +Cc: linux-mm, Sioh Lee, Andrea Arcangeli, kvm



On 9/13/18 2:01 PM, Mike Rapoport wrote:
> (updated Pasha's e-mail)
> 
> Thu, Sep 13, 2018 at 01:35:20PM +0300, Timofey Titovets wrote:
>> On Mon, 25 Jun 2018 at 11:48, Mike Rapoport <rppt@linux.vnet.ibm.com> wrote:
>>>
>>> On Thu, Jun 07, 2018 at 09:29:49PM -0400, Pavel Tatashin wrote:
>>>>> With CONFIG_SYSFS=n there is nothing that will set ksm_run to anything but
>>>>> zero and ksm_do_scan will never be called.
>>>>>
>>>>
>>>> Unfortunately, this is not so:
>>>>
>>>> In: /linux-master/mm/ksm.c
>>>>
>>>> 3143#else
>>>> 3144 ksm_run = KSM_RUN_MERGE; /* no way for user to start it */
>>>> 3145
>>>> 3146#endif /* CONFIG_SYSFS */
>>>>
>>>> So, we do set ksm_run to run right from ksm_init() when CONFIG_SYSFS=n.
>>>>
>>>> I wonder if it is acceptable to only use xxhash when CONFIG_SYSFS=n?
>>>
>>> BTW, with CONFIG_SYSFS=n KSM may start running before hardware acceleration
>>> for crc32c is initialized...
>>>
>>>> Thank you,
>>>> Pavel
>>>>
>>>
>>> --
>>> Sincerely yours,
>>> Mike.
>>>
>>
>> A little thread bump.
>> This patchset hasn't been able to move forward for about ~8 months.
>> As I see it, the main question in the thread is that we have a race
>> between KSM initialization and the availability of the crypto API.
>> Maybe we can then fall back to a simple plan, and just replace our old
>> good buddy jhash with the faster xxhash?
>> That would move the crypto API & crc32 question to the background, and
>> make things 2-3 times better for now.
>>
>> What do you all think about that?
> 
> Sounds reasonable to me

Same here, please send a new patch with xxhash, and after that we can
work on a faster crc32.

Thank you,
Pavel

> 
>>> crc32c_intel: 1084.10ns
>>> crc32c (no hardware acceleration): 7012.51ns
>>> xxhash32: 2227.75ns
>>> xxhash64: 1413.16ns
>>> jhash2: 5128.30ns
>>
>> -- 
>> Have a nice day,
>> Timofey.
>>
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH V6 2/2 RESEND] ksm: replace jhash2 with faster hash
  2018-02-07 10:22 [PATCH V6 0/2 RESEND] KSM replace hash algo " Timofey Titovets
@ 2018-02-07 10:22   ` Timofey Titovets
  0 siblings, 0 replies; 27+ messages in thread
From: Timofey Titovets @ 2018-02-07 10:22 UTC (permalink / raw)
  To: linux-mm; +Cc: Timofey Titovets, leesioh, Andrea Arcangeli, kvm

1. Pick up Sioh Lee's crc32 patch, after a long conversation
2. Merge it with my work on xxhash
3. Add autoselect code to choose the fastest hash helper

The base idea is the same: replace jhash2 with something faster.

Perf numbers:
Intel(R) Xeon(R) CPU E5-2420 v2 @ 2.20GHz
ksm: crc32c   hash() 12081 MB/s
ksm: xxh64    hash()  8770 MB/s
ksm: xxh32    hash()  4529 MB/s
ksm: jhash2   hash()  1569 MB/s

As jhash2 will always be slower (for data sizes like PAGE_SIZE),
just drop it from the choice.

Add a function to autoselect the hash algo at boot,
based on hashing speed, like the raid6 code does.

Move the init of zero_checksum from ksm_init() to the first call of fasthash():
  1. KSM init runs early during kernel init; running perf-testing stuff
     on the main kernel boot thread looks bad to me.
  2. The crypto subsystem is not available that early in boot,
     so crc32c, even when compiled in, is not available.
     As crypto and ksm init both run at the subsys_initcall() (4) init level,
     all possible consumers will run later, at levels 5+.

Output after KSM's first attempt to hash a page:
ksm: crc32c hash() 15218 MB/s
ksm: xxhash hash()  8640 MB/s
ksm: choice crc32c as hash function

Thanks.

Changes:
  v1 -> v2:
    - Move xxhash() to xxhash.h/c and separate patches
  v2 -> v3:
    - Move xxhash() xxhash.c -> xxhash.h
    - Replace xxhash_t with 'unsigned long'
    - Update the kerneldoc above xxhash()
  v3 -> v4:
    - Merge xxhash/crc32 patches
    - Replace crc32 with crc32c (crc32 have same as jhash2 speed)
    - Add auto speed test and auto choice of fastest hash function
  v4 -> v5:
    - Pick up the missed xxhash patch
    - Update code with the compile-time chosen xxhash
    - Add more macros to make the code more readable
    - As it is now only possible to use xxhash or crc32c,
      skip the speed test and fall back to xxhash on crc32c allocation error
    - To work around the too-early-init problem (crc32c not available),
      move zero_checksum init to the first call of fasthash()
    - Don't allocate a page for hash testing; use the arch zero page for that
  v5 -> v6:
    - Use libcrc32c instead of the CRYPTO API, mainly to simplify
      code/Kconfig deps
    - Add crc32c_available():
      libcrc32c will BUG_ON() on crc32c problems,
      so test crc32c availability with crc32c_available()
    - Simplify choice_fastest_hash()
    - Simplify fasthash()
    - struct rmap_item and stable_node have sizeof == 64 on x86_64,
      which makes them cache friendly. As we don't suffer from hash collisions,
      change the hash type from unsigned long back to u32.
    - Fix a kbuild robot warning: make all local functions static

Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
Signed-off-by: leesioh <solee@os.korea.ac.kr>
CC: Andrea Arcangeli <aarcange@redhat.com>
CC: linux-mm@kvack.org
CC: kvm@vger.kernel.org
---
 mm/Kconfig |  2 ++
 mm/ksm.c   | 93 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---
 2 files changed, 91 insertions(+), 4 deletions(-)

diff --git a/mm/Kconfig b/mm/Kconfig
index 03ff7703d322..b60bee4bb07e 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -305,6 +305,8 @@ config MMU_NOTIFIER
 config KSM
 	bool "Enable KSM for page merging"
 	depends on MMU
+	select XXHASH
+	select LIBCRC32C
 	help
 	  Enable Kernel Samepage Merging: KSM periodically scans those areas
 	  of an application's address space that an app has advised may be
diff --git a/mm/ksm.c b/mm/ksm.c
index c406f75957ad..2b84407fb918 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -25,7 +25,6 @@
 #include <linux/pagemap.h>
 #include <linux/rmap.h>
 #include <linux/spinlock.h>
-#include <linux/jhash.h>
 #include <linux/delay.h>
 #include <linux/kthread.h>
 #include <linux/wait.h>
@@ -41,6 +40,13 @@
 #include <linux/numa.h>
 
 #include <asm/tlbflush.h>
+
+/* Support for xxhash and crc32c */
+#include <crypto/hash.h>
+#include <linux/crc32c.h>
+#include <linux/xxhash.h>
+#include <linux/sizes.h>
+
 #include "internal.h"
 
 #ifdef CONFIG_NUMA
@@ -284,6 +290,87 @@ static DEFINE_SPINLOCK(ksm_mmlist_lock);
 		sizeof(struct __struct), __alignof__(struct __struct),\
 		(__flags), NULL)
 
+#define TIME_125MS  (HZ >> 3)
+#define PERF_TO_MBS(X) (X*PAGE_SIZE*(1 << 3)/(SZ_1M))
+
+#define HASH_NONE   0
+#define HASH_CRC32C 1
+#define HASH_XXHASH 2
+
+static int fastest_hash = HASH_NONE;
+
+static bool __init crc32c_available(void)
+{
+	static struct shash_desc desc;
+
+	desc.tfm = crypto_alloc_shash("crc32c", 0, 0);
+	desc.flags = 0;
+
+	if (IS_ERR(desc.tfm)) {
+		pr_warn("ksm: alloc crc32c shash error %ld\n",
+			-PTR_ERR(desc.tfm));
+		return false;
+	}
+
+	crypto_free_shash(desc.tfm);
+	return true;
+}
+
+static void __init choice_fastest_hash(void)
+{
+
+	unsigned long je;
+	unsigned long perf_crc32c = 0;
+	unsigned long perf_xxhash = 0;
+
+	fastest_hash = HASH_XXHASH;
+	if (!crc32c_available())
+		goto out;
+
+	preempt_disable();
+	je = jiffies + TIME_125MS;
+	while (time_before(jiffies, je)) {
+		crc32c(0, ZERO_PAGE(0), PAGE_SIZE);
+		perf_crc32c++;
+	}
+	preempt_enable();
+
+	preempt_disable();
+	je = jiffies + TIME_125MS;
+	while (time_before(jiffies, je)) {
+		xxhash(ZERO_PAGE(0), PAGE_SIZE, 0);
+		perf_xxhash++;
+	}
+	preempt_enable();
+
+	pr_info("ksm: crc32c hash() %5ld MB/s\n", PERF_TO_MBS(perf_crc32c));
+	pr_info("ksm: xxhash hash() %5ld MB/s\n", PERF_TO_MBS(perf_xxhash));
+
+	if (perf_crc32c > perf_xxhash)
+		fastest_hash = HASH_CRC32C;
+out:
+	if (fastest_hash == HASH_CRC32C)
+		pr_info("ksm: choice crc32c as hash function\n");
+	else
+		pr_info("ksm: choice xxhash as hash function\n");
+}
+
+static u32 fasthash(const void *input, size_t length)
+{
+again:
+	switch (fastest_hash) {
+	case HASH_CRC32C:
+		return crc32c(0, input, length);
+	case HASH_XXHASH:
+		return xxhash(input, length, 0);
+	default:
+		choice_fastest_hash();
+		/* The correct value depends on page size and endianness */
+		zero_checksum = fasthash(ZERO_PAGE(0), PAGE_SIZE);
+		goto again;
+	}
+}
+
 static int __init ksm_slab_init(void)
 {
 	rmap_item_cache = KSM_KMEM_CACHE(rmap_item, 0);
@@ -979,7 +1066,7 @@ static u32 calc_checksum(struct page *page)
 {
 	u32 checksum;
 	void *addr = kmap_atomic(page);
-	checksum = jhash2(addr, PAGE_SIZE / 4, 17);
+	checksum = fasthash(addr, PAGE_SIZE);
 	kunmap_atomic(addr);
 	return checksum;
 }
@@ -3061,8 +3148,6 @@ static int __init ksm_init(void)
 	struct task_struct *ksm_thread;
 	int err;
 
-	/* The correct value depends on page size and endianness */
-	zero_checksum = calc_checksum(ZERO_PAGE(0));
 	/* Default to false for backwards compatibility */
 	ksm_use_zero_pages = false;
 
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2018-09-13 18:11 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-04-18 19:32 [PATCH V6 0/2 RESEND] KSM replace hash algo with faster hash Timofey Titovets
2018-04-18 19:32 ` [PATCH V6 1/2 RESEND] xxHash: create arch dependent 32/64-bit xxhash() Timofey Titovets
2018-04-18 19:32 ` [PATCH V6 2/2 RESEND] ksm: replace jhash2 with faster hash Timofey Titovets
2018-05-08 15:26   ` Claudio Imbrenda
2018-05-11 23:06     ` Timofey Titovets
2018-05-14 10:17       ` Claudio Imbrenda
2018-05-16 10:26         ` Timofey Titovets
2018-05-22 20:22   ` Pavel Tatashin
2018-05-23 13:45     ` Timofey Titovets
2018-05-23 14:24       ` Pavel Tatashin
2018-05-24  8:01         ` Timofey Titovets
2018-05-25  1:16           ` Pavel Tatashin
2018-05-26 20:25             ` [PATCH] " kbuild test robot
2018-05-26 21:06             ` kbuild test robot
2018-05-27 13:03           ` [PATCH V6 2/2 RESEND] " Mike Rapoport
2018-05-29 14:45             ` Pavel Tatashin
2018-06-07  8:58               ` Timofey Titovets
2018-06-07 11:52                 ` Mike Rapoport
2018-06-08  1:29                   ` Pavel Tatashin
2018-06-10  5:38                     ` Mike Rapoport
2018-06-22 18:48                       ` Pavel Tatashin
2018-06-25  8:48                     ` Mike Rapoport
2018-09-13 10:35                       ` Timofey Titovets
2018-09-13 18:01                         ` Mike Rapoport
2018-09-13 18:10                           ` Pasha Tatashin
  -- strict thread matches above, loose matches on Subject: below --
2018-02-07 10:22 [PATCH V6 0/2 RESEND] KSM replace hash algo " Timofey Titovets
2018-02-07 10:22 ` [PATCH V6 2/2 RESEND] ksm: replace jhash2 " Timofey Titovets
