* [PATCH] vfs: Optimize dedupe comparison
@ 2021-07-15 14:13 Nikolay Borisov
  2021-07-15 14:30 ` Matthew Wilcox
  0 siblings, 1 reply; 8+ messages in thread
From: Nikolay Borisov @ 2021-07-15 14:13 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: viro, david, djwong, Nikolay Borisov

Currently vfs_dedupe_file_range_compare() uses a plain memcmp(), which
effectively means the code is doing a byte-by-byte comparison. Instead,
the code could do word-sized comparisons without adverse effect on
performance, provided that the comparison's length is at least as big
as the native word size and the resulting memory addresses are
properly aligned.

On a workload consisting of running duperemove (a userspace program
doing deduplication of duplicated extents) on a fully-duplicated
dataset of 80G spread among 20k 4M files, I get the following
results:

		Unpatched:		Patched:
real		21m45.275s		21m14.445s
user		0m0.986s		0m0.933s
sys		1m30.734s		1m8.900s (-25%)

Notable changes in the perf profiles:
 .... omitted for brevity ....
     0.29%     +1.01%  [kernel.vmlinux]         [k] vfs_dedupe_file_range_compare.constprop.0
    23.62%             [kernel.vmlinux]         [k] memcmp
 .... omitted for brevity ....

The memcmp() is eliminated altogether, replaced by the newly
introduced loop in vfs_dedupe_file_range_compare(), hence the 1%
increase of cycles spent there.
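(Editor's illustration, not part of the patch; the function name is
invented. The strategy above reads in plain userspace C roughly as:)

```c
#include <stdint.h>
#include <string.h>

/* Sketch of the strategy described above: when both addresses are
 * word-aligned and the length is at least one word, compare native
 * words; fall back to memcmp() for unaligned buffers and for the
 * sub-word tail. */
static int ranges_equal(const void *a, const void *b, size_t len)
{
	const size_t ws = sizeof(unsigned long);
	const unsigned long *wa = a, *wb = b;
	size_t i, words;

	if ((((uintptr_t)a | (uintptr_t)b) & (ws - 1)) || len < ws)
		return memcmp(a, b, len) == 0;

	words = len / ws;
	for (i = 0; i < words; i++)
		if (wa[i] != wb[i])
			return 0;

	/* compare the tail shorter than one word, if any */
	return memcmp((const char *)a + words * ws,
		      (const char *)b + words * ws, len & (ws - 1)) == 0;
}
```

(The fallback keeps the function correct for any input; the word loop
is only a fast path.)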

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
 fs/remap_range.c | 31 +++++++++++++++++++++++++++++--
 1 file changed, 29 insertions(+), 2 deletions(-)

diff --git a/fs/remap_range.c b/fs/remap_range.c
index e4a5fdd7ad7b..041e03b082ed 100644
--- a/fs/remap_range.c
+++ b/fs/remap_range.c
@@ -212,6 +212,7 @@ static int vfs_dedupe_file_range_compare(struct inode *src, loff_t srcoff,
 	loff_t cmp_len;
 	bool same;
 	int error;
+	const uint8_t block_size = sizeof(unsigned long);

 	error = -EINVAL;
 	same = true;
@@ -256,9 +257,35 @@ static int vfs_dedupe_file_range_compare(struct inode *src, loff_t srcoff,
 		flush_dcache_page(src_page);
 		flush_dcache_page(dest_page);

-		if (memcmp(src_addr + src_poff, dest_addr + dest_poff, cmp_len))
-			same = false;

+		if (!IS_ALIGNED((unsigned long)(src_addr + src_poff), block_size) ||
+		    !IS_ALIGNED((unsigned long)(dest_addr + dest_poff), block_size) ||
+		    cmp_len < block_size) {
+			if (memcmp(src_addr + src_poff, dest_addr + dest_poff,
+				   cmp_len))
+				same = false;
+		} else {
+			size_t i;
+			size_t blocks = cmp_len / block_size;
+			loff_t rem_len = cmp_len - (blocks * block_size);
+			unsigned long *src = src_addr + src_poff;
+			unsigned long *dst = dest_addr + dest_poff;
+
+			for (i = 0; i < blocks; i++) {
+				if (src[i] != dst[i]) {
+					same = false;
+					goto finished;
+				}
+			}
+
+			if (rem_len) {
+				/* tail compare; keep src_addr/dest_addr
+				 * intact for kunmap_atomic */
+				if (memcmp(src + blocks, dst + blocks, rem_len))
+					same = false;
+			}
+		}
+finished:
 		kunmap_atomic(dest_addr);
 		kunmap_atomic(src_addr);
 unlock:
--
2.25.1



* Re: [PATCH] vfs: Optimize dedupe comparison
  2021-07-15 14:13 [PATCH] vfs: Optimize dedupe comparison Nikolay Borisov
@ 2021-07-15 14:30 ` Matthew Wilcox
  2021-07-15 14:44   ` Nikolay Borisov
  0 siblings, 1 reply; 8+ messages in thread
From: Matthew Wilcox @ 2021-07-15 14:30 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: linux-fsdevel, viro, david, djwong

On Thu, Jul 15, 2021 at 05:13:09PM +0300, Nikolay Borisov wrote:
> Currently vfs_dedupe_file_range_compare() uses a plain memcmp(), which
> effectively means the code is doing a byte-by-byte comparison. Instead,
> the code could do word-sized comparisons without adverse effect on
> performance, provided that the comparison's length is at least as big
> as the native word size and the resulting memory addresses are
> properly aligned.

Sounds to me like somebody hasn't optimised memcmp() very well ...
is this x86-64?

> @@ -256,9 +257,35 @@ static int vfs_dedupe_file_range_compare(struct inode *src, loff_t srcoff,
>  		flush_dcache_page(src_page);
>  		flush_dcache_page(dest_page);
> 
> -		if (memcmp(src_addr + src_poff, dest_addr + dest_poff, cmp_len))
> -			same = false;
> 
> +		if (!IS_ALIGNED((unsigned long)(src_addr + src_poff), block_size) ||
> +		    !IS_ALIGNED((unsigned long)(dest_addr + dest_poff), block_size) ||
> +		    cmp_len < block_size) {

Can this even happen?  Surely we can only dedup on a block boundary and
blocks are required to be a power of two and at least 512 bytes in size?

> +			if (memcmp(src_addr + src_poff, dest_addr + dest_poff,
> +				   cmp_len))
> +				same = false;
> +		} else {
> +			size_t i;
> +			size_t blocks = cmp_len / block_size;
> +			loff_t rem_len = cmp_len - (blocks * block_size);
> +			unsigned long *src = src_addr + src_poff;
> +			unsigned long *dst = dest_addr + dest_poff;
> +
> +			for (i = 0; i < blocks; i++) {
> +				if (src[i] != dst[i]) {
> +					same = false;
> +					goto finished;
> +				}
> +			}
> +
> +			if (rem_len) {
> +				/* tail compare; keep src_addr/dest_addr
> +				 * intact for kunmap_atomic */
> +				if (memcmp(src + blocks, dst + blocks, rem_len))
> +					same = false;
> +			}
> +		}
> +finished:
>  		kunmap_atomic(dest_addr);
>  		kunmap_atomic(src_addr);
>  unlock:
> --
> 2.25.1
> 


* Re: [PATCH] vfs: Optimize dedupe comparison
  2021-07-15 14:30 ` Matthew Wilcox
@ 2021-07-15 14:44   ` Nikolay Borisov
  2021-07-15 15:09     ` Matthew Wilcox
  0 siblings, 1 reply; 8+ messages in thread
From: Nikolay Borisov @ 2021-07-15 14:44 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: linux-fsdevel, viro, david, djwong



On 15.07.21 17:30, Matthew Wilcox wrote:
> On Thu, Jul 15, 2021 at 05:13:09PM +0300, Nikolay Borisov wrote:
>> Currently vfs_dedupe_file_range_compare() uses a plain memcmp(), which
>> effectively means the code is doing a byte-by-byte comparison. Instead,
>> the code could do word-sized comparisons without adverse effect on
>> performance, provided that the comparison's length is at least as big
>> as the native word size and the resulting memory addresses are
>> properly aligned.
> 
> Sounds to me like somebody hasn't optimised memcmp() very well ...
> is this x86-64?
> 

That was my first impression, here's the profile:

       │    Disassembly of section .text:
       │
       │    ffffffff815c6f60 <memcmp>:
       │    memcmp():
       │      test   %rdx,%rdx
       │    ↓ je     22
       │      xor    %ecx,%ecx
       │    ↓ jmp    12
 49.32 │ 9:   add    $0x1,%rcx
  0.03 │      cmp    %rcx,%rdx
 11.82 │    ↓ je     21
  0.01 │12:   movzbl (%rdi,%rcx,1),%eax
 38.19 │      movzbl (%rsi,%rcx,1),%r8d
  0.59 │      sub    %r8d,%eax
  0.04 │    ↑ je     9
       │    ← retq
       │21: ← retq
       │22:   xor    %eax,%eax
       │    ← retq


It's indeed on x86-64, and according to arch/x86/boot/string.h the
sources are using __builtin_memcmp.

>> @@ -256,9 +257,35 @@ static int vfs_dedupe_file_range_compare(struct inode *src, loff_t srcoff,
>>  		flush_dcache_page(src_page);
>>  		flush_dcache_page(dest_page);
>>
>> -		if (memcmp(src_addr + src_poff, dest_addr + dest_poff, cmp_len))
>> -			same = false;
>>
>> +		if (!IS_ALIGNED((unsigned long)(src_addr + src_poff), block_size) ||
>> +		    !IS_ALIGNED((unsigned long)(dest_addr + dest_poff), block_size) ||
>> +		    cmp_len < block_size) {
> 
> Can this even happen?  Surely we can only dedup on a block boundary and
> blocks are required to be a power of two and at least 512 bytes in size?

I was wondering the same thing, but AFAICS it seems to be possible,
i.e. if userspace passes bad offsets. While all kinds of internal fs
synchronization ops are going to be performed on aligned offsets, that
doesn't mean the original ones, passed from userspace, are themselves
aligned.
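(Editorial aside: the gate in the patch is just the kernel's
IS_ALIGNED() test; a userspace rendering for illustration — the macro
is adapted here, the kernel version uses typeof, and the helper name
is invented.)

```c
#include <stdint.h>
#include <stddef.h>

/* Userspace rendering of the kernel's IS_ALIGNED() macro;
 * 'a' must be a power of two. */
#define IS_ALIGNED(x, a) (((x) & ((uintptr_t)(a) - 1)) == 0)

/* The gate from the patch, reduced to its essentials: take the word
 * path only if both mapped addresses are word-aligned and there is at
 * least one whole word to compare. */
static int can_take_word_path(uintptr_t src, uintptr_t dst, size_t len)
{
	const size_t ws = sizeof(unsigned long);

	return IS_ALIGNED(src, ws) && IS_ALIGNED(dst, ws) && len >= ws;
}
```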
> 
>> +			if (memcmp(src_addr + src_poff, dest_addr + dest_poff,
>> +				   cmp_len))
>> +				same = false;
>> +		} else {
>> +			int i;
>> +			size_t blocks = cmp_len / block_size;
>> +			loff_t rem_len = cmp_len - (blocks * block_size);
>> +			unsigned long *src = src_addr + src_poff;
>> +			unsigned long *dst = dest_addr + src_poff;
>> +
>> +			for (i = 0; i < blocks; i++) {
>> +				if (src[i] - dst[i]) {
>> +					same = false;
>> +					goto finished;
>> +				}
>> +			}
>> +
>> +			if (rem_len) {
>> +				src_addr += src_poff + (blocks * block_size);
>> +				dest_addr += dest_poff + (blocks * block_size);
>> +				if (memcmp(src_addr, dest_addr, rem_len))
>> +					same = false;
>> +			}
>> +		}
>> +finished:
>>  		kunmap_atomic(dest_addr);
>>  		kunmap_atomic(src_addr);
>>  unlock:
>> --
>> 2.25.1
>>
> 


* Re: [PATCH] vfs: Optimize dedupe comparison
  2021-07-15 14:44   ` Nikolay Borisov
@ 2021-07-15 15:09     ` Matthew Wilcox
  2021-07-15 22:33       ` Dave Chinner
  2021-07-16 12:10       ` Nikolay Borisov
  0 siblings, 2 replies; 8+ messages in thread
From: Matthew Wilcox @ 2021-07-15 15:09 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: linux-fsdevel, viro, david, djwong

On Thu, Jul 15, 2021 at 05:44:15PM +0300, Nikolay Borisov wrote:
> That was my first impression, here's the profile:
> 
>        │    Disassembly of section .text:
>        │
>        │    ffffffff815c6f60 <memcmp>:
>        │    memcmp():
>        │      test   %rdx,%rdx
>        │    ↓ je     22
>        │      xor    %ecx,%ecx
>        │    ↓ jmp    12
>  49.32 │ 9:   add    $0x1,%rcx
>   0.03 │      cmp    %rcx,%rdx
>  11.82 │    ↓ je     21
>   0.01 │12:   movzbl (%rdi,%rcx,1),%eax
>  38.19 │      movzbl (%rsi,%rcx,1),%r8d
>   0.59 │      sub    %r8d,%eax
>   0.04 │    ↑ je     9

That looks like a byte loop to me ...

> It's indeed on x86-64, and according to arch/x86/boot/string.h the
> sources are using __builtin_memcmp.

I think the 'boot' part of that path might indicate that it's not what's
actually being used by the kernel.

$ git grep __HAVE_ARCH_MEMCMP
arch/arc/include/asm/string.h:#define __HAVE_ARCH_MEMCMP
arch/arm64/include/asm/string.h:#define __HAVE_ARCH_MEMCMP
arch/csky/abiv2/inc/abi/string.h:#define __HAVE_ARCH_MEMCMP
arch/powerpc/include/asm/string.h:#define __HAVE_ARCH_MEMCMP
arch/s390/include/asm/string.h:#define __HAVE_ARCH_MEMCMP       /* arch function */
arch/s390/lib/string.c:#ifdef __HAVE_ARCH_MEMCMP
arch/s390/purgatory/string.c:#define __HAVE_ARCH_MEMCMP /* arch function */
arch/sparc/include/asm/string.h:#define __HAVE_ARCH_MEMCMP
include/linux/string.h:#ifndef __HAVE_ARCH_MEMCMP
lib/string.c:#ifndef __HAVE_ARCH_MEMCMP

So I think x86-64 is using the stupid one.
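(Editorial aside: the "stupid one" — the generic fallback in
lib/string.c — is essentially the following byte loop, paraphrased
here rather than copied verbatim; it matches the movzbl loop in the
profile above.)

```c
#include <stddef.h>

/* Paraphrase of the generic memcmp() fallback from lib/string.c: a
 * byte-at-a-time loop with no word-sized fast path. */
static int generic_memcmp(const void *cs, const void *ct, size_t count)
{
	const unsigned char *su1 = cs, *su2 = ct;
	int res = 0;

	for (; count > 0; su1++, su2++, count--) {
		res = *su1 - *su2;
		if (res != 0)
			break;
	}
	return res;
}
```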

> > Can this even happen?  Surely we can only dedup on a block boundary and
> > blocks are required to be a power of two and at least 512 bytes in size?
> 
> I was wondering the same thing, but AFAICS it seems to be possible,
> i.e. if userspace passes bad offsets. While all kinds of internal fs
> synchronization ops are going to be performed on aligned offsets, that
> doesn't mean the original ones, passed from userspace, are themselves
> aligned.

Ah, I thought it'd be failed before we got to this point.

But honestly, I think x86-64 needs to be fixed to either use
__builtin_memcmp() or to have a nicely written custom memcmp().  I
tried to find the gcc implementation of __builtin_memcmp() on
x86-64, but I can't.


* Re: [PATCH] vfs: Optimize dedupe comparison
  2021-07-15 15:09     ` Matthew Wilcox
@ 2021-07-15 22:33       ` Dave Chinner
  2021-07-20 14:58         ` Nikolay Borisov
  2021-07-16 12:10       ` Nikolay Borisov
  1 sibling, 1 reply; 8+ messages in thread
From: Dave Chinner @ 2021-07-15 22:33 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Nikolay Borisov, linux-fsdevel, viro, djwong

On Thu, Jul 15, 2021 at 04:09:06PM +0100, Matthew Wilcox wrote:
> On Thu, Jul 15, 2021 at 05:44:15PM +0300, Nikolay Borisov wrote:
> > I was wondering the same thing, but AFAICS it seems to be possible,
> > i.e. if userspace passes bad offsets. While all kinds of internal fs
> > synchronization ops are going to be performed on aligned offsets, that
> > doesn't mean the original ones, passed from userspace, are themselves
> > aligned.
> 
> Ah, I thought it'd be failed before we got to this point.
> 
> But honestly, I think x86-64 needs to be fixed to either use
> __builtin_memcmp() or to have a nicely written custom memcmp().  I
> tried to find the gcc implementation of __builtin_memcmp() on
> x86-64, but I can't.

Yup, this. memcmp() is widely used in hot paths through all the
filesystem code and the rest of the kernel. We should fix the
generic infrastructure problem, not play whack-a-mole with custom
one-off fixes that avoid the problem just where it shows up in some
profile...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH] vfs: Optimize dedupe comparison
  2021-07-15 15:09     ` Matthew Wilcox
  2021-07-15 22:33       ` Dave Chinner
@ 2021-07-16 12:10       ` Nikolay Borisov
  1 sibling, 0 replies; 8+ messages in thread
From: Nikolay Borisov @ 2021-07-16 12:10 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: linux-fsdevel, viro, david, djwong



On 15.07.21 18:09, Matthew Wilcox wrote:
> On Thu, Jul 15, 2021 at 05:44:15PM +0300, Nikolay Borisov wrote:
>> That was my first impression, here's the profile:
>>
>>        │    Disassembly of section .text:
>>        │
>>        │    ffffffff815c6f60 <memcmp>:
>>        │    memcmp():
>>        │      test   %rdx,%rdx
>>        │    ↓ je     22
>>        │      xor    %ecx,%ecx
>>        │    ↓ jmp    12
>>  49.32 │ 9:   add    $0x1,%rcx
>>   0.03 │      cmp    %rcx,%rdx
>>  11.82 │    ↓ je     21
>>   0.01 │12:   movzbl (%rdi,%rcx,1),%eax
>>  38.19 │      movzbl (%rsi,%rcx,1),%r8d
>>   0.59 │      sub    %r8d,%eax
>>   0.04 │    ↑ je     9
> 
> That looks like a byte loop to me ...
> 
>> It's indeed on x86-64, and according to arch/x86/boot/string.h the
>> sources are using __builtin_memcmp.
> 
> I think the 'boot' part of that path might indicate that it's not what's
> actually being used by the kernel.
> 
> $ git grep __HAVE_ARCH_MEMCMP
> arch/arc/include/asm/string.h:#define __HAVE_ARCH_MEMCMP
> arch/arm64/include/asm/string.h:#define __HAVE_ARCH_MEMCMP
> arch/csky/abiv2/inc/abi/string.h:#define __HAVE_ARCH_MEMCMP
> arch/powerpc/include/asm/string.h:#define __HAVE_ARCH_MEMCMP
> arch/s390/include/asm/string.h:#define __HAVE_ARCH_MEMCMP       /* arch function */
> arch/s390/lib/string.c:#ifdef __HAVE_ARCH_MEMCMP
> arch/s390/purgatory/string.c:#define __HAVE_ARCH_MEMCMP /* arch function */
> arch/sparc/include/asm/string.h:#define __HAVE_ARCH_MEMCMP
> include/linux/string.h:#ifndef __HAVE_ARCH_MEMCMP
> lib/string.c:#ifndef __HAVE_ARCH_MEMCMP
> 
> So I think x86-64 is using the stupid one.
> 
>>> Can this even happen?  Surely we can only dedup on a block boundary and
>>> blocks are required to be a power of two and at least 512 bytes in size?
>>
>> I was wondering the same thing, but AFAICS it seems to be possible,
>> i.e. if userspace passes bad offsets. While all kinds of internal fs
>> synchronization ops are going to be performed on aligned offsets, that
>> doesn't mean the original ones, passed from userspace, are themselves
>> aligned.
> 
> Ah, I thought it'd be failed before we got to this point.
> 
> But honestly, I think x86-64 needs to be fixed to either use
> __builtin_memcmp() or to have a nicely written custom memcmp().  I
> tried to find the gcc implementation of __builtin_memcmp() on
> x86-64, but I can't.

__builtin_memcmp is a no-go: since memcmp is an ifunc [0], the builtin
ends up resolving to a call to memcmp, which causes link (and hence
build) failures. So what remains is to either patch particular call
sites or, as Dave suggested, have a generic optimized implementation.

glibc's implementation [1] seems straightforward enough to be
convertible to kernel style. However it would need definitive proof it
actually improves performance in a variety of scenarios.

[0] https://sourceware.org/glibc/wiki/GNU_IFUNC
[1] https://sourceware.org/git/?p=glibc.git;a=blob;f=string/memcmp.c

> 


* Re: [PATCH] vfs: Optimize dedupe comparison
  2021-07-15 22:33       ` Dave Chinner
@ 2021-07-20 14:58         ` Nikolay Borisov
  2021-07-20 15:12           ` Matthew Wilcox
  0 siblings, 1 reply; 8+ messages in thread
From: Nikolay Borisov @ 2021-07-20 14:58 UTC (permalink / raw)
  To: Dave Chinner, Matthew Wilcox; +Cc: linux-fsdevel, viro, djwong



On 16.07.21 1:33, Dave Chinner wrote:
> On Thu, Jul 15, 2021 at 04:09:06PM +0100, Matthew Wilcox wrote:
>> On Thu, Jul 15, 2021 at 05:44:15PM +0300, Nikolay Borisov wrote:
>>> I was wondering the same thing, but AFAICS it seems to be possible,
>>> i.e. if userspace passes bad offsets. While all kinds of internal fs
>>> synchronization ops are going to be performed on aligned offsets, that
>>> doesn't mean the original ones, passed from userspace, are themselves
>>> aligned.
>>
>> Ah, I thought it'd be failed before we got to this point.
>>
>> But honestly, I think x86-64 needs to be fixed to either use
>> __builtin_memcmp() or to have a nicely written custom memcmp().  I
>> tried to find the gcc implementation of __builtin_memcmp() on
>> x86-64, but I can't.
> 
> Yup, this. memcmp() is widely used in hot paths through all the
> filesystem code and the rest of the kernel. We should fix the
> generic infrastructure problem, not play whack-a-mole with custom
> one-off fixes that avoid the problem just where it shows up in some
> profile...

I ported glibc's implementation of memcmp to the kernel and after
running the same workload I get the same performance as with the basic
memcmp implementation of doing byte comparison ...

> 
> Cheers,
> 
> Dave.
> 


* Re: [PATCH] vfs: Optimize dedupe comparison
  2021-07-20 14:58         ` Nikolay Borisov
@ 2021-07-20 15:12           ` Matthew Wilcox
  0 siblings, 0 replies; 8+ messages in thread
From: Matthew Wilcox @ 2021-07-20 15:12 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: Dave Chinner, linux-fsdevel, viro, djwong

On Tue, Jul 20, 2021 at 05:58:39PM +0300, Nikolay Borisov wrote:
> 
> 
On 16.07.21 1:33, Dave Chinner wrote:
> > On Thu, Jul 15, 2021 at 04:09:06PM +0100, Matthew Wilcox wrote:
> >> On Thu, Jul 15, 2021 at 05:44:15PM +0300, Nikolay Borisov wrote:
> >>> I was wondering the same thing, but AFAICS it seems to be possible,
> >>> i.e. if userspace passes bad offsets. While all kinds of internal fs
> >>> synchronization ops are going to be performed on aligned offsets, that
> >>> doesn't mean the original ones, passed from userspace, are themselves
> >>> aligned.
> >>
> >> Ah, I thought it'd be failed before we got to this point.
> >>
> >> But honestly, I think x86-64 needs to be fixed to either use
> >> __builtin_memcmp() or to have a nicely written custom memcmp().  I
> >> tried to find the gcc implementation of __builtin_memcmp() on
> >> x86-64, but I can't.
> > 
> > Yup, this. memcmp() is widely used in hot paths through all the
> > filesystem code and the rest of the kernel. We should fix the
> > generic infrastructure problem, not play whack-a-mole with custom
> > one-off fixes that avoid the problem just where it shows up in some
> > profile...
> 
> I ported glibc's implementation of memcmp to the kernel and after
> running the same workload I get the same performance as with the basic
> memcmp implementation of doing byte comparison ...

That's bizarre because the glibc memcmp that you pointed to earlier
basically does what your open-coded solution did.  Is it possible
you have a bug in one of the tests and it's falling back to the byte
loop?

Specifically for the dedup case, we only need the optimisation that

	if ((p1 | p2 | length) & 7)
		... do the byte loop ...
	... do the long-based comparison ...

so another possibility is that memcmp is doing too many tests.
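(Editorial sketch of the shortcut suggested above, assuming an 8-byte
unsigned long; the function name is invented. One combined test on
both pointers and the length selects the word loop, and an
8-byte-aligned length leaves no sub-word tail to handle.)

```c
#include <stdint.h>
#include <string.h>

/* One test up front: if either pointer or the length has any of the
 * low three bits set, do the byte-wise memcmp(); otherwise compare
 * whole longs.  Returns 0 on equal, nonzero on different (the sign of
 * the first difference is not preserved on the word path). */
static int dedupe_cmp(const void *p1, const void *p2, size_t length)
{
	const unsigned long *a = p1, *b = p2;
	size_t i;

	if (((uintptr_t)p1 | (uintptr_t)p2 | length) & 7)
		return memcmp(p1, p2, length);

	for (i = 0; i < length / sizeof(unsigned long); i++)
		if (a[i] != b[i])
			return 1;
	return 0;
}
```

(For the dedupe caller only equality matters, so discarding the sign
of the difference on the word path is fine.)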

