* [PATCH RESEND V8 0/2] Currently used jhash are slow enough and replace it allow as to make KSM
@ 2018-10-23 18:25 Timofey Titovets
  2018-10-23 18:25 ` [PATCH RESEND V8 1/2] xxHash: create arch dependent 32/64-bit xxhash() Timofey Titovets
  2018-10-23 18:25 ` [PATCH RESEND V8 2/2] ksm: replace jhash2 with xxhash Timofey Titovets
  0 siblings, 2 replies; 4+ messages in thread
From: Timofey Titovets @ 2018-10-23 18:25 UTC (permalink / raw)
  To: linux-mm; +Cc: Timofey Titovets, Andrea Arcangeli, kvm, leesioh

From: Timofey Titovets <timofey.titovets@synesis.ru>

About speed (in kernel):
        ksm: crc32c   hash() 12081 MB/s
        ksm: xxh64    hash()  8770 MB/s
        ksm: xxh32    hash()  4529 MB/s
        ksm: jhash2   hash()  1569 MB/s

Test results from Sioh Lee (copied from another mail):
Test platform: openstack cloud platform (NEWTON version)
Experiment node: openstack based cloud compute node (CPU: xeon E5-2620 v3, memory 64gb)
VM: (2 VCPU, RAM 4GB, DISK 20GB) * 4
Linux kernel: 4.14 (latest version)
KSM setup - sleep_millisecs: 200ms, pages_to_scan: 200

Experiment process
First, we turn off KSM and launch 4 VMs.
Then we turn KSM on and measure the checksum computation time until full_scans reaches 2.

The experimental results (the experimental value is the average of the measured values)
crc32c_intel: 1084.10ns
crc32c (no hardware acceleration): 7012.51ns
xxhash32: 2227.75ns
xxhash64: 1413.16ns
jhash2: 5128.30ns

In summary, the results show that crc32c_intel outperforms every other
hash function used in the experiment: its computation time is lower by
84.54% compared to crc32c (no hardware acceleration), 78.86% compared to
jhash2, 51.33% compared to xxhash32, and 23.28% compared to xxhash64.
The results are similar to those of Timofey.
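
The percentages above can be rechecked from the measured averages. A minimal
sketch in userspace C; the helper names percent_reduction() and
print_reductions() are hypothetical and not part of this patch set:

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: how much lower fast_ns is than slow_ns, in percent. */
double percent_reduction(double fast_ns, double slow_ns)
{
	return (1.0 - fast_ns / slow_ns) * 100.0;
}

/* Print the reduction of crc32c_intel against each other hash listed above. */
void print_reductions(void)
{
	static const struct {
		const char *name;
		double avg_ns;
	} others[] = {
		{ "crc32c (no hardware acceleration)", 7012.51 },
		{ "jhash2", 5128.30 },
		{ "xxhash32", 2227.75 },
		{ "xxhash64", 1413.16 },
	};
	const double crc32c_intel_ns = 1084.10;
	size_t i;

	for (i = 0; i < sizeof(others) / sizeof(others[0]); i++)
		printf("%-34s %.2f%%\n", others[i].name,
		       percent_reduction(crc32c_intel_ns, others[i].avg_ns));
}
```

Calling print_reductions() from a main() prints one line per hash. The last
digit may differ from the quoted figures depending on whether values are
rounded or truncated.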

However, use only xxhash for now: crc32c needs the crypto API to be
initialized first, and making that work well in all situations
requires a tricky solution.

So:
  - The first patch implements compile-time selection of the fastest xxhash
    implementation for the target platform.
  - The second patch replaces jhash2 with xxhash.

Thanks.

CC: Andrea Arcangeli <aarcange@redhat.com>
CC: linux-mm@kvack.org
CC: kvm@vger.kernel.org
CC: leesioh <solee@os.korea.ac.kr>

Timofey Titovets (2):
  xxHash: create arch dependent 32/64-bit xxhash()
  ksm: replace jhash2 with xxhash

 include/linux/xxhash.h | 23 +++++++++++++++++++++++
 mm/Kconfig             |  1 +
 mm/ksm.c               |  4 ++--
 3 files changed, 26 insertions(+), 2 deletions(-)

-- 
2.19.0


* [PATCH RESEND V8 1/2] xxHash: create arch dependent 32/64-bit xxhash()
  2018-10-23 18:25 [PATCH RESEND V8 0/2] Currently used jhash are slow enough and replace it allow as to make KSM Timofey Titovets
@ 2018-10-23 18:25 ` Timofey Titovets
  2018-11-08 18:31   ` Pavel Tatashin
  2018-10-23 18:25 ` [PATCH RESEND V8 2/2] ksm: replace jhash2 with xxhash Timofey Titovets
  1 sibling, 1 reply; 4+ messages in thread
From: Timofey Titovets @ 2018-10-23 18:25 UTC (permalink / raw)
  To: linux-mm; +Cc: Timofey Titovets, Andrea Arcangeli, kvm, leesioh

xxh32() - fast on both 32-bit and 64-bit platforms
xxh64() - fast only on 64-bit platforms

Create xxhash(), which picks the fastest version
at compile time.

Because the result depends on the CPU word size,
it is mainly intended for in-memory hashing.
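
The compile-time selection described above can be sketched in userspace C.
This is only an illustration under stated assumptions: FNV-1a is used as a
stand-in for xxh32()/xxh64(), ULONG_MAX replaces the kernel's BITS_PER_LONG,
and all demo_* names are hypothetical.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in 32-bit hash (FNV-1a); the kernel patch calls xxh32() here. */
uint32_t demo_hash32(const void *input, size_t length)
{
	const unsigned char *p = input;
	uint32_t h = 2166136261u;		/* FNV-1a 32-bit offset basis */
	size_t i;

	for (i = 0; i < length; i++) {
		h ^= p[i];
		h *= 16777619u;			/* FNV-1a 32-bit prime */
	}
	return h;
}

/* Stand-in 64-bit hash (FNV-1a); the kernel patch calls xxh64() here. */
uint64_t demo_hash64(const void *input, size_t length)
{
	const unsigned char *p = input;
	uint64_t h = 14695981039346656037ull;	/* FNV-1a 64-bit offset basis */
	size_t i;

	for (i = 0; i < length; i++) {
		h ^= p[i];
		h *= 1099511628211ull;		/* FNV-1a 64-bit prime */
	}
	return h;
}

/*
 * Word-size hash: the preprocessor picks one branch at compile time,
 * so there is no runtime check. The result differs between 32-bit and
 * 64-bit builds, which is why it only suits in-memory hashing.
 */
unsigned long demo_wordsize_hash(const void *input, size_t length)
{
#if ULONG_MAX > 0xFFFFFFFFUL
	return demo_hash64(input, length);
#else
	return demo_hash32(input, length);
#endif
}
```

On a platform where unsigned long is 64 bits wide, demo_wordsize_hash()
returns the 64-bit result; elsewhere it returns the 32-bit one.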

Changes:
  v2:
    - Create this patch
  v3 -> v8:
    - Nothing, whole patchset version bump

Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com>
Reviewed-by: Mike Rapoport <rppt@linux.vnet.ibm.com>

CC: Andrea Arcangeli <aarcange@redhat.com>
CC: linux-mm@kvack.org
CC: kvm@vger.kernel.org
CC: leesioh <solee@os.korea.ac.kr>
---
 include/linux/xxhash.h | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/include/linux/xxhash.h b/include/linux/xxhash.h
index 9e1f42cb57e9..52b073fea17f 100644
--- a/include/linux/xxhash.h
+++ b/include/linux/xxhash.h
@@ -107,6 +107,29 @@ uint32_t xxh32(const void *input, size_t length, uint32_t seed);
  */
 uint64_t xxh64(const void *input, size_t length, uint64_t seed);
 
+/**
+ * xxhash() - calculate wordsize hash of the input with a given seed
+ * @input:  The data to hash.
+ * @length: The length of the data to hash.
+ * @seed:   The seed can be used to alter the result predictably.
+ *
+ * If the hash does not need to be comparable between machines with
+ * different word sizes, this function will call whichever of xxh32()
+ * or xxh64() is faster.
+ *
+ * Return:  wordsize hash of the data.
+ */
+
+static inline unsigned long xxhash(const void *input, size_t length,
+				   uint64_t seed)
+{
+#if BITS_PER_LONG == 64
+       return xxh64(input, length, seed);
+#else
+       return xxh32(input, length, seed);
+#endif
+}
+
 /*-****************************
  * Streaming Hash Functions
  *****************************/
-- 
2.19.0


* [PATCH RESEND V8 2/2] ksm: replace jhash2 with xxhash
  2018-10-23 18:25 [PATCH RESEND V8 0/2] Currently used jhash are slow enough and replace it allow as to make KSM Timofey Titovets
  2018-10-23 18:25 ` [PATCH RESEND V8 1/2] xxHash: create arch dependent 32/64-bit xxhash() Timofey Titovets
@ 2018-10-23 18:25 ` Timofey Titovets
  1 sibling, 0 replies; 4+ messages in thread
From: Timofey Titovets @ 2018-10-23 18:25 UTC (permalink / raw)
  To: linux-mm; +Cc: Timofey Titovets, leesioh, Andrea Arcangeli, kvm

Replace jhash2 with xxhash.

Perf numbers:
Intel(R) Xeon(R) CPU E5-2420 v2 @ 2.20GHz
ksm: crc32c   hash() 12081 MB/s
ksm: xxh64    hash()  8770 MB/s
ksm: xxh32    hash()  4529 MB/s
ksm: jhash2   hash()  1569 MB/s

From Sioh Lee:
crc32c_intel: 1084.10ns
crc32c (no hardware acceleration): 7012.51ns
xxhash32: 2227.75ns
xxhash64: 1413.16ns
jhash2: 5128.30ns

jhash2 will always be slower for PAGE_SIZE-sized data,
so don't use it in KSM at all.

Use only xxhash for now: crc32c needs the crypto API to be
initialized first, and making that work well in all situations
requires a tricky solution.

Thanks.

Changes:
  v1 -> v2:
    - Move xxhash() to xxhash.h/c and separate patches
  v2 -> v3:
    - Move xxhash() xxhash.c -> xxhash.h
    - replace xxhash_t with 'unsigned long'
    - update kerneldoc above xxhash()
  v3 -> v4:
    - Merge xxhash/crc32 patches
    - Replace crc32 with crc32c (crc32 has the same speed as jhash2)
    - Add auto speed test and auto choice of fastest hash function
  v4 -> v5:
    - Pick up the missed xxhash patch
    - Update code to use the compile-time chosen xxhash
    - Add more macros to make code more readable
    - As only xxhash or crc32c can be used now,
      on crc32c allocation error, skip the speed test and fall back to xxhash
    - To work around the too-early init problem (crc32c not available),
      move zero_checksum init to the first call of fastcall()
    - Don't alloc a page for hash testing, use arch zero pages for that
  v5 -> v6:
    - Use libcrc32c instead of the crypto API, mainly to simplify
      code/Kconfig deps
    - Add crc32c_available():
      libcrc32c will BUG_ON on crc32c problems,
      so test crc32c availability with crc32c_available()
    - Simplify choice_fastest_hash()
    - Simplify fasthash()
    - struct rmap_item && stable_node have sizeof == 64 on x86_64,
      which makes them cache friendly. As we don't suffer from hash collisions,
      change hash type from unsigned long back to u32.
    - Fix kbuild robot warning, make all local functions static
  v6 -> v7:
    - Drop crc32c for now and use only xxhash in ksm.
  v7 -> v8:
    - Remove empty line changes
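
Keeping the checksum as u32 means the word-size result of xxhash() is
narrowed when it is stored, as calc_checksum() does in this patch by
assigning to a u32. A minimal userspace sketch of that narrowing;
demo_store_checksum() is a hypothetical name, not a kernel function:

```c
#include <stdint.h>

/*
 * Hypothetical helper: store a word-size hash in a 32-bit checksum field.
 * Converting to uint32_t keeps only the low 32 bits (the value modulo 2^32),
 * so on a 64-bit build the upper half of the hash is discarded.
 */
uint32_t demo_store_checksum(unsigned long wordsize_hash)
{
	return (uint32_t)wordsize_hash;
}
```

On a 32-bit build the conversion is a no-op, since unsigned long is already
32 bits wide there.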

Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
Signed-off-by: leesioh <solee@os.korea.ac.kr>
Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com>
Reviewed-by: Mike Rapoport <rppt@linux.vnet.ibm.com>

CC: Andrea Arcangeli <aarcange@redhat.com>
CC: linux-mm@kvack.org
CC: kvm@vger.kernel.org
---
 mm/Kconfig | 1 +
 mm/ksm.c   | 4 ++--
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/mm/Kconfig b/mm/Kconfig
index a550635ea5c3..b5f923081bce 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -297,6 +297,7 @@ config MMU_NOTIFIER
 config KSM
 	bool "Enable KSM for page merging"
 	depends on MMU
+	select XXHASH
 	help
 	  Enable Kernel Samepage Merging: KSM periodically scans those areas
 	  of an application's address space that an app has advised may be
diff --git a/mm/ksm.c b/mm/ksm.c
index 5b0894b45ee5..1a088306ef81 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -25,7 +25,7 @@
 #include <linux/pagemap.h>
 #include <linux/rmap.h>
 #include <linux/spinlock.h>
-#include <linux/jhash.h>
+#include <linux/xxhash.h>
 #include <linux/delay.h>
 #include <linux/kthread.h>
 #include <linux/wait.h>
@@ -1009,7 +1009,7 @@ static u32 calc_checksum(struct page *page)
 {
 	u32 checksum;
 	void *addr = kmap_atomic(page);
-	checksum = jhash2(addr, PAGE_SIZE / 4, 17);
+	checksum = xxhash(addr, PAGE_SIZE, 0);
 	kunmap_atomic(addr);
 	return checksum;
 }
-- 
2.19.0


* Re: [PATCH RESEND V8 1/2] xxHash: create arch dependent 32/64-bit xxhash()
  2018-10-23 18:25 ` [PATCH RESEND V8 1/2] xxHash: create arch dependent 32/64-bit xxhash() Timofey Titovets
@ 2018-11-08 18:31   ` Pavel Tatashin
  0 siblings, 0 replies; 4+ messages in thread
From: Pavel Tatashin @ 2018-11-08 18:31 UTC (permalink / raw)
  To: Andrew Morton, Michal Hocko
  Cc: Timofey Titovets, linux-mm, Andrea Arcangeli, kvm, leesioh

Hi Andrew,

Can you please accept these patches? They are simple yet provide a good
performance improvement. Timofey has been resending them for a while.

Thank you,
Pasha

On 18-10-23 21:25:53, Timofey Titovets wrote:
> xxh32() - fast on both 32-bit and 64-bit platforms
> xxh64() - fast only on 64-bit platforms
> 
> Create xxhash(), which picks the fastest version
> at compile time.
> 
> Because the result depends on the CPU word size,
> it is mainly intended for in-memory hashing.
> 
> Changes:
>   v2:
>     - Create that patch
>   v3 -> v8:
>     - Nothing, whole patchset version bump
> 
> Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
> Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com>
> Reviewed-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
> 
> CC: Andrea Arcangeli <aarcange@redhat.com>
> CC: linux-mm@kvack.org
> CC: kvm@vger.kernel.org
> CC: leesioh <solee@os.korea.ac.kr>
> ---
>  include/linux/xxhash.h | 23 +++++++++++++++++++++++
>  1 file changed, 23 insertions(+)
> 
> diff --git a/include/linux/xxhash.h b/include/linux/xxhash.h
> index 9e1f42cb57e9..52b073fea17f 100644
> --- a/include/linux/xxhash.h
> +++ b/include/linux/xxhash.h
> @@ -107,6 +107,29 @@ uint32_t xxh32(const void *input, size_t length, uint32_t seed);
>   */
>  uint64_t xxh64(const void *input, size_t length, uint64_t seed);
>  
> +/**
> + * xxhash() - calculate wordsize hash of the input with a given seed
> + * @input:  The data to hash.
> + * @length: The length of the data to hash.
> + * @seed:   The seed can be used to alter the result predictably.
> + *
> + * If the hash does not need to be comparable between machines with
> + * different word sizes, this function will call whichever of xxh32()
> + * or xxh64() is faster.
> + *
> + * Return:  wordsize hash of the data.
> + */
> +
> +static inline unsigned long xxhash(const void *input, size_t length,
> +				   uint64_t seed)
> +{
> +#if BITS_PER_LONG == 64
> +       return xxh64(input, length, seed);
> +#else
> +       return xxh32(input, length, seed);
> +#endif
> +}
> +
>  /*-****************************
>   * Streaming Hash Functions
>   *****************************/
> -- 
> 2.19.0
> 
