* [PATCH V8 0/2] Currently used jhash are slow enough and replace it allow as to make KSM
@ 2018-09-13 21:41 Timofey Titovets
2018-09-13 21:41 ` [PATCH V8 1/2] xxHash: create arch dependent 32/64-bit xxhash() Timofey Titovets
2018-09-13 21:41 ` [PATCH V8 2/2] ksm: replace jhash2 with xxhash Timofey Titovets
0 siblings, 2 replies; 5+ messages in thread
From: Timofey Titovets @ 2018-09-13 21:41 UTC (permalink / raw)
To: linux-mm; +Cc: rppt, Timofey Titovets, Andrea Arcangeli, kvm, leesioh
About speed (in kernel):
ksm: crc32c hash() 12081 MB/s
ksm: xxh64 hash() 8770 MB/s
ksm: xxh32 hash() 4529 MB/s
ksm: jhash2 hash() 1569 MB/s
From Sioh Lee's tests (copied from another mail):
Test platform: openstack cloud platform (NEWTON version)
Experiment node: openstack based cloud compute node (CPU: xeon E5-2620 v3, memory 64gb)
VM: (2 VCPU, RAM 4GB, DISK 20GB) * 4
Linux kernel: 4.14 (latest version)
KSM setup - sleep_millisecs: 200ms, pages_to_scan: 200
Experiment process
First, we turn off KSM and launch 4 VMs.
Then we turn on KSM and measure the checksum computation time until full_scans becomes two.
The experimental results (each value is the average of the measured values):
crc32c_intel: 1084.10ns
crc32c (no hardware acceleration): 7012.51ns
xxhash32: 2227.75ns
xxhash64: 1413.16ns
jhash2: 5128.30ns
In summary, the results show that crc32c_intel outperforms all of the
other hash functions used in the experiment (computation time decreased by
84.54% compared to crc32c, 78.86% compared to jhash2, 51.33% compared to
xxhash32, and 23.28% compared to xxhash64).
The results are similar to those of Timofey.
However, we use only xxhash for now, because using crc32c requires
the crypto API to be initialized first, which needs a tricky
solution to work well in all situations.
So:
- The first patch implements compile-time selection of the fastest xxhash
implementation for the target platform.
- The second patch replaces jhash2 with xxhash.
Thanks.
CC: Andrea Arcangeli <aarcange@redhat.com>
CC: linux-mm@kvack.org
CC: kvm@vger.kernel.org
CC: leesioh <solee@os.korea.ac.kr>
Timofey Titovets (2):
xxHash: create arch dependent 32/64-bit xxhash()
ksm: replace jhash2 with xxhash
include/linux/xxhash.h | 23 +++++++++++++++++++++++
mm/Kconfig | 1 +
mm/ksm.c | 4 ++--
3 files changed, 26 insertions(+), 2 deletions(-)
--
2.19.0
* [PATCH V8 1/2] xxHash: create arch dependent 32/64-bit xxhash()
2018-09-13 21:41 [PATCH V8 0/2] Currently used jhash are slow enough and replace it allow as to make KSM Timofey Titovets
@ 2018-09-13 21:41 ` Timofey Titovets
2018-09-14 8:41 ` Mike Rapoport
2018-09-13 21:41 ` [PATCH V8 2/2] ksm: replace jhash2 with xxhash Timofey Titovets
1 sibling, 1 reply; 5+ messages in thread
From: Timofey Titovets @ 2018-09-13 21:41 UTC (permalink / raw)
To: linux-mm; +Cc: rppt, Timofey Titovets, Andrea Arcangeli, kvm, leesioh
From: Timofey Titovets <nefelim4ag@gmail.com>
xxh32() - fast on both 32-bit and 64-bit platforms
xxh64() - fast only on 64-bit platforms
Create xxhash(), which picks the fastest version
at compile time.
As the result depends on the CPU word size,
its main purpose is in-memory hashing.
Changes:
v2:
- Create that patch
v3 -> v8:
- Nothing, whole patchset version bump
Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com>
CC: Andrea Arcangeli <aarcange@redhat.com>
CC: linux-mm@kvack.org
CC: kvm@vger.kernel.org
CC: leesioh <solee@os.korea.ac.kr>
---
include/linux/xxhash.h | 23 +++++++++++++++++++++++
1 file changed, 23 insertions(+)
diff --git a/include/linux/xxhash.h b/include/linux/xxhash.h
index 9e1f42cb57e9..52b073fea17f 100644
--- a/include/linux/xxhash.h
+++ b/include/linux/xxhash.h
@@ -107,6 +107,29 @@ uint32_t xxh32(const void *input, size_t length, uint32_t seed);
*/
uint64_t xxh64(const void *input, size_t length, uint64_t seed);
+/**
+ * xxhash() - calculate wordsize hash of the input with a given seed
+ * @input: The data to hash.
+ * @length: The length of the data to hash.
+ * @seed: The seed can be used to alter the result predictably.
+ *
+ * If the hash does not need to be comparable between machines with
+ * different word sizes, this function will call whichever of xxh32()
+ * or xxh64() is faster.
+ *
+ * Return: wordsize hash of the data.
+ */
+
+static inline unsigned long xxhash(const void *input, size_t length,
+ uint64_t seed)
+{
+#if BITS_PER_LONG == 64
+ return xxh64(input, length, seed);
+#else
+ return xxh32(input, length, seed);
+#endif
+}
+
/*-****************************
* Streaming Hash Functions
*****************************/
--
2.19.0
* [PATCH V8 2/2] ksm: replace jhash2 with xxhash
2018-09-13 21:41 [PATCH V8 0/2] Currently used jhash are slow enough and replace it allow as to make KSM Timofey Titovets
2018-09-13 21:41 ` [PATCH V8 1/2] xxHash: create arch dependent 32/64-bit xxhash() Timofey Titovets
@ 2018-09-13 21:41 ` Timofey Titovets
2018-09-14 8:42 ` Mike Rapoport
1 sibling, 1 reply; 5+ messages in thread
From: Timofey Titovets @ 2018-09-13 21:41 UTC (permalink / raw)
To: linux-mm; +Cc: rppt, Timofey Titovets, leesioh, Andrea Arcangeli, kvm
From: Timofey Titovets <nefelim4ag@gmail.com>
Replace jhash2 with xxhash.
Perf numbers:
Intel(R) Xeon(R) CPU E5-2420 v2 @ 2.20GHz
ksm: crc32c hash() 12081 MB/s
ksm: xxh64 hash() 8770 MB/s
ksm: xxh32 hash() 4529 MB/s
ksm: jhash2 hash() 1569 MB/s
From Sioh Lee:
crc32c_intel: 1084.10ns
crc32c (no hardware acceleration): 7012.51ns
xxhash32: 2227.75ns
xxhash64: 1413.16ns
jhash2: 5128.30ns
As jhash2 will always be slower for data sizes like PAGE_SIZE,
don't use it in ksm at all.
Use only xxhash for now, because using crc32c requires
the crypto API to be initialized first, which needs a
tricky solution to work well in all situations.
Thanks.
Changes:
v1 -> v2:
- Move xxhash() to xxhash.h/c and separate patches
v2 -> v3:
- Move xxhash() xxhash.c -> xxhash.h
- replace xxhash_t with 'unsigned long'
- update kerneldoc above xxhash()
v3 -> v4:
- Merge xxhash/crc32 patches
- Replace crc32 with crc32c (crc32 have same as jhash2 speed)
- Add auto speed test and auto choice of fastest hash function
v4 -> v5:
- Pick up the missed xxhash patch
- Update code to use the compile-time chosen xxhash
- Add more macros to make the code more readable
- As now only xxhash or crc32c are possible: on crc32c allocation error,
skip the speed test and fall back to xxhash
- To work around the too-early init problem (crc32c not available),
move zero_checksum init to the first call of fastcall()
- Don't allocate a page for hash testing; use the arch zero pages for that
v5 -> v6:
- Use libcrc32c instead of the CRYPTO API, mainly to simplify
code/Kconfig dependencies
- Add crc32c_available():
libcrc32c will BUG_ON on crc32c problems,
so test crc32c availability with crc32c_available()
- Simplify choice_fastest_hash()
- Simplify fasthash()
- struct rmap_item and stable_node have sizeof == 64 on x86_64,
which makes them cache friendly. As we don't suffer from hash collisions,
change the hash type from unsigned long back to u32.
- Fix kbuild robot warning; make all local functions static
v6 -> v7:
- Drop crc32c for now and use only xxhash in ksm.
v7 -> v8:
- Remove empty line changes
Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
Signed-off-by: leesioh <solee@os.korea.ac.kr>
Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com>
CC: Andrea Arcangeli <aarcange@redhat.com>
CC: linux-mm@kvack.org
CC: kvm@vger.kernel.org
---
mm/Kconfig | 1 +
mm/ksm.c | 4 ++--
2 files changed, 3 insertions(+), 2 deletions(-)
diff --git a/mm/Kconfig b/mm/Kconfig
index a550635ea5c3..b5f923081bce 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -297,6 +297,7 @@ config MMU_NOTIFIER
config KSM
bool "Enable KSM for page merging"
depends on MMU
+ select XXHASH
help
Enable Kernel Samepage Merging: KSM periodically scans those areas
of an application's address space that an app has advised may be
diff --git a/mm/ksm.c b/mm/ksm.c
index 5b0894b45ee5..1a088306ef81 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -25,7 +25,7 @@
#include <linux/pagemap.h>
#include <linux/rmap.h>
#include <linux/spinlock.h>
-#include <linux/jhash.h>
+#include <linux/xxhash.h>
#include <linux/delay.h>
#include <linux/kthread.h>
#include <linux/wait.h>
@@ -1009,7 +1009,7 @@ static u32 calc_checksum(struct page *page)
{
u32 checksum;
void *addr = kmap_atomic(page);
- checksum = jhash2(addr, PAGE_SIZE / 4, 17);
+ checksum = xxhash(addr, PAGE_SIZE, 0);
kunmap_atomic(addr);
return checksum;
}
--
2.19.0
* Re: [PATCH V8 1/2] xxHash: create arch dependent 32/64-bit xxhash()
2018-09-13 21:41 ` [PATCH V8 1/2] xxHash: create arch dependent 32/64-bit xxhash() Timofey Titovets
@ 2018-09-14 8:41 ` Mike Rapoport
0 siblings, 0 replies; 5+ messages in thread
From: Mike Rapoport @ 2018-09-14 8:41 UTC (permalink / raw)
To: Timofey Titovets
Cc: linux-mm, Timofey Titovets, Andrea Arcangeli, kvm, leesioh
On Fri, Sep 14, 2018 at 12:41:01AM +0300, Timofey Titovets wrote:
> From: Timofey Titovets <nefelim4ag@gmail.com>
>
> xxh32() - fast on both 32-bit and 64-bit platforms
> xxh64() - fast only on 64-bit platforms
>
> Create xxhash(), which picks the fastest version
> at compile time.
>
> As the result depends on the CPU word size,
> its main purpose is in-memory hashing.
>
> Changes:
> v2:
> - Create that patch
> v3 -> v8:
> - Nothing, whole patchset version bump
>
> Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
> Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com>
Reviewed-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
> CC: Andrea Arcangeli <aarcange@redhat.com>
> CC: linux-mm@kvack.org
> CC: kvm@vger.kernel.org
> CC: leesioh <solee@os.korea.ac.kr>
> ---
> include/linux/xxhash.h | 23 +++++++++++++++++++++++
> 1 file changed, 23 insertions(+)
>
> diff --git a/include/linux/xxhash.h b/include/linux/xxhash.h
> index 9e1f42cb57e9..52b073fea17f 100644
> --- a/include/linux/xxhash.h
> +++ b/include/linux/xxhash.h
> @@ -107,6 +107,29 @@ uint32_t xxh32(const void *input, size_t length, uint32_t seed);
> */
> uint64_t xxh64(const void *input, size_t length, uint64_t seed);
>
> +/**
> + * xxhash() - calculate wordsize hash of the input with a given seed
> + * @input: The data to hash.
> + * @length: The length of the data to hash.
> + * @seed: The seed can be used to alter the result predictably.
> + *
> + * If the hash does not need to be comparable between machines with
> + * different word sizes, this function will call whichever of xxh32()
> + * or xxh64() is faster.
> + *
> + * Return: wordsize hash of the data.
> + */
> +
> +static inline unsigned long xxhash(const void *input, size_t length,
> + uint64_t seed)
> +{
> +#if BITS_PER_LONG == 64
> + return xxh64(input, length, seed);
> +#else
> + return xxh32(input, length, seed);
> +#endif
> +}
> +
> /*-****************************
> * Streaming Hash Functions
> *****************************/
> --
> 2.19.0
>
--
Sincerely yours,
Mike.
* Re: [PATCH V8 2/2] ksm: replace jhash2 with xxhash
2018-09-13 21:41 ` [PATCH V8 2/2] ksm: replace jhash2 with xxhash Timofey Titovets
@ 2018-09-14 8:42 ` Mike Rapoport
0 siblings, 0 replies; 5+ messages in thread
From: Mike Rapoport @ 2018-09-14 8:42 UTC (permalink / raw)
To: Timofey Titovets
Cc: linux-mm, Timofey Titovets, leesioh, Andrea Arcangeli, kvm
On Fri, Sep 14, 2018 at 12:41:02AM +0300, Timofey Titovets wrote:
> From: Timofey Titovets <nefelim4ag@gmail.com>
>
> Replace jhash2 with xxhash.
>
> Perf numbers:
> Intel(R) Xeon(R) CPU E5-2420 v2 @ 2.20GHz
> ksm: crc32c hash() 12081 MB/s
> ksm: xxh64 hash() 8770 MB/s
> ksm: xxh32 hash() 4529 MB/s
> ksm: jhash2 hash() 1569 MB/s
>
> From Sioh Lee:
> crc32c_intel: 1084.10ns
> crc32c (no hardware acceleration): 7012.51ns
> xxhash32: 2227.75ns
> xxhash64: 1413.16ns
> jhash2: 5128.30ns
>
> As jhash2 will always be slower for data sizes like PAGE_SIZE,
> don't use it in ksm at all.
>
> Use only xxhash for now, because using crc32c requires
> the crypto API to be initialized first, which needs a
> tricky solution to work well in all situations.
>
> Thanks.
>
> Changes:
> v1 -> v2:
> - Move xxhash() to xxhash.h/c and separate patches
> v2 -> v3:
> - Move xxhash() xxhash.c -> xxhash.h
> - replace xxhash_t with 'unsigned long'
> - update kerneldoc above xxhash()
> v3 -> v4:
> - Merge xxhash/crc32 patches
> - Replace crc32 with crc32c (crc32 have same as jhash2 speed)
> - Add auto speed test and auto choice of fastest hash function
> v4 -> v5:
> - Pick up the missed xxhash patch
> - Update code to use the compile-time chosen xxhash
> - Add more macros to make the code more readable
> - As now only xxhash or crc32c are possible: on crc32c allocation error,
> skip the speed test and fall back to xxhash
> - To work around the too-early init problem (crc32c not available),
> move zero_checksum init to the first call of fastcall()
> - Don't allocate a page for hash testing; use the arch zero pages for that
> v5 -> v6:
> - Use libcrc32c instead of the CRYPTO API, mainly to simplify
> code/Kconfig dependencies
> - Add crc32c_available():
> libcrc32c will BUG_ON on crc32c problems,
> so test crc32c availability with crc32c_available()
> - Simplify choice_fastest_hash()
> - Simplify fasthash()
> - struct rmap_item and stable_node have sizeof == 64 on x86_64,
> which makes them cache friendly. As we don't suffer from hash collisions,
> change the hash type from unsigned long back to u32.
> - Fix kbuild robot warning; make all local functions static
> v6 -> v7:
> - Drop crc32c for now and use only xxhash in ksm.
> v7 -> v8:
> - Remove empty line changes
>
> Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
> Signed-off-by: leesioh <solee@os.korea.ac.kr>
> Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com>
Reviewed-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
> CC: Andrea Arcangeli <aarcange@redhat.com>
> CC: linux-mm@kvack.org
> CC: kvm@vger.kernel.org
> ---
> mm/Kconfig | 1 +
> mm/ksm.c | 4 ++--
> 2 files changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/mm/Kconfig b/mm/Kconfig
> index a550635ea5c3..b5f923081bce 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -297,6 +297,7 @@ config MMU_NOTIFIER
> config KSM
> bool "Enable KSM for page merging"
> depends on MMU
> + select XXHASH
> help
> Enable Kernel Samepage Merging: KSM periodically scans those areas
> of an application's address space that an app has advised may be
> diff --git a/mm/ksm.c b/mm/ksm.c
> index 5b0894b45ee5..1a088306ef81 100644
> --- a/mm/ksm.c
> +++ b/mm/ksm.c
> @@ -25,7 +25,7 @@
> #include <linux/pagemap.h>
> #include <linux/rmap.h>
> #include <linux/spinlock.h>
> -#include <linux/jhash.h>
> +#include <linux/xxhash.h>
> #include <linux/delay.h>
> #include <linux/kthread.h>
> #include <linux/wait.h>
> @@ -1009,7 +1009,7 @@ static u32 calc_checksum(struct page *page)
> {
> u32 checksum;
> void *addr = kmap_atomic(page);
> - checksum = jhash2(addr, PAGE_SIZE / 4, 17);
> + checksum = xxhash(addr, PAGE_SIZE, 0);
> kunmap_atomic(addr);
> return checksum;
> }
> --
> 2.19.0
>
--
Sincerely yours,
Mike.