* [PATCH v7 0/6] Btrfs: populate heuristic with code
@ 2017-08-25  9:18 Timofey Titovets
  2017-08-25  9:18 ` [PATCH v7 1/6] Btrfs: heuristic make use compression workspaces Timofey Titovets
                   ` (5 more replies)
  0 siblings, 6 replies; 14+ messages in thread
From: Timofey Titovets @ 2017-08-25  9:18 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Timofey Titovets

Based on kdave for-next

Patch summaries:
1. Move the heuristic to use compression workspaces
   A bit tricky, but it works.

2. Add heuristic counters and a buffer to the workspaces

3. Implement simple input data sampling
   Take 16-byte samples at 256-byte shifts
   over the input data and collect info about how many
   different bytes (symbols) were found in the sample data

4. Implement a check of the sample for repeated data
   Just iterate over the sample and do memcmp()

5. Add code to calculate
   how many unique bytes were found in the sample data
   This can quickly detect easily compressible data

6. Add code to calculate the byte core set size,
   i.e. how many unique bytes cover 90% of the sample data
   This code requires the numbers in the bucket to be sorted
   It can detect easily compressible data with many repeated bytes
   and incompressible data with evenly distributed bytes

Changes v1 -> v2:
  - Change input data iterator shift 512 -> 256
  - Replace magic macro numbers with direct values
  - Drop useless symbol population in the bucket,
    as nothing cares where and which symbol is stored
    in the bucket for now

Changes v2 -> v3 (only update #3 patch):
  - Fix u64 division problem by using u32 for input_size
  - Fix input size calculation: start - end -> end - start
  - Add missing sort.h header

Changes v3 -> v4 (only update #1 patch):
  - Change counter type in bucket item u16 -> u32
  - Drop other fields from the bucket item for now,
    as nothing uses them

Changes v4 -> v5
  - Move heuristic code to an external file
  - Make the heuristic use compression workspaces
  - Add a check of the sample for zeroes

Changes v5 -> v6
  - Add some code to handle page-unaligned range start/end
  - Replace the zeroed-sample check with a check for repeated data

Changes v6 -> v7
  - Add missing part of the first patch
  - Make use of IS_ALIGNED() to check tail alignment

Timofey Titovets (6):
  Btrfs: heuristic make use compression workspaces
  Btrfs: heuristic workspace add bucket and sample items
  Btrfs: implement heuristic sampling logic
  Btrfs: heuristic add detection of repeated data patterns
  Btrfs: heuristic add byte set calculation
  Btrfs: heuristic add byte core set calculation

 fs/btrfs/Makefile      |   2 +-
 fs/btrfs/compression.c |  18 ++--
 fs/btrfs/compression.h |   7 +-
 fs/btrfs/heuristic.c   | 223 +++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 237 insertions(+), 13 deletions(-)
 create mode 100644 fs/btrfs/heuristic.c

--
2.14.1

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v7 1/6] Btrfs: heuristic make use compression workspaces
  2017-08-25  9:18 [PATCH v7 0/6] Btrfs: populate heuristic with code Timofey Titovets
@ 2017-08-25  9:18 ` Timofey Titovets
  2017-09-27 13:12   ` David Sterba
  2017-08-25  9:18 ` [PATCH v7 2/6] Btrfs: heuristic workspace add bucket and sample items Timofey Titovets
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 14+ messages in thread
From: Timofey Titovets @ 2017-08-25  9:18 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Timofey Titovets

Move the heuristic to an external file.
Implement compression workspace support for
heuristic resources.

Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
---
 fs/btrfs/Makefile      |  2 +-
 fs/btrfs/compression.c | 18 +++++--------
 fs/btrfs/compression.h |  7 ++++-
 fs/btrfs/heuristic.c   | 73 ++++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 87 insertions(+), 13 deletions(-)
 create mode 100644 fs/btrfs/heuristic.c

diff --git a/fs/btrfs/Makefile b/fs/btrfs/Makefile
index 128ce17a80b0..6fa8479dff43 100644
--- a/fs/btrfs/Makefile
+++ b/fs/btrfs/Makefile
@@ -9,7 +9,7 @@ btrfs-y += super.o ctree.o extent-tree.o print-tree.o root-tree.o dir-item.o \
 	   export.o tree-log.o free-space-cache.o zlib.o lzo.o \
 	   compression.o delayed-ref.o relocation.o delayed-inode.o scrub.o \
 	   reada.o backref.o ulist.o qgroup.o send.o dev-replace.o raid56.o \
-	   uuid-tree.o props.o hash.o free-space-tree.o
+	   uuid-tree.o props.o hash.o free-space-tree.o heuristic.o

 btrfs-$(CONFIG_BTRFS_FS_POSIX_ACL) += acl.o
 btrfs-$(CONFIG_BTRFS_FS_CHECK_INTEGRITY) += check-integrity.o
diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 883ecc58fd0d..f0aaf27bcc95 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -704,6 +704,7 @@ static struct {
 static const struct btrfs_compress_op * const btrfs_compress_op[] = {
 	&btrfs_zlib_compress,
 	&btrfs_lzo_compress,
+	&btrfs_heuristic,
 };

 void __init btrfs_init_compress(void)
@@ -1065,18 +1066,13 @@ int btrfs_decompress_buf2page(const char *buf, unsigned long buf_start,
  */
 int btrfs_compress_heuristic(struct inode *inode, u64 start, u64 end)
 {
-	u64 index = start >> PAGE_SHIFT;
-	u64 end_index = end >> PAGE_SHIFT;
-	struct page *page;
-	int ret = 1;
+	int ret;
+	enum btrfs_compression_type type = BTRFS_HEURISTIC;
+	struct list_head *workspace = find_workspace(type);

-	while (index <= end_index) {
-		page = find_get_page(inode->i_mapping, index);
-		kmap(page);
-		kunmap(page);
-		put_page(page);
-		index++;
-	}
+	ret = btrfs_compress_op[type-1]->heuristic(workspace, inode,
+						   start, end);

+	free_workspace(type, workspace);
 	return ret;
 }
diff --git a/fs/btrfs/compression.h b/fs/btrfs/compression.h
index 3b1b0ac15fdc..10e9ffa6dfa4 100644
--- a/fs/btrfs/compression.h
+++ b/fs/btrfs/compression.h
@@ -99,7 +99,8 @@ enum btrfs_compression_type {
 	BTRFS_COMPRESS_NONE  = 0,
 	BTRFS_COMPRESS_ZLIB  = 1,
 	BTRFS_COMPRESS_LZO   = 2,
-	BTRFS_COMPRESS_TYPES = 2,
+	BTRFS_HEURISTIC = 3,
+	BTRFS_COMPRESS_TYPES = 3,
 };

 struct btrfs_compress_op {
@@ -123,10 +124,14 @@ struct btrfs_compress_op {
 			  struct page *dest_page,
 			  unsigned long start_byte,
 			  size_t srclen, size_t destlen);
+
+	int (*heuristic)(struct list_head *workspace,
+			 struct inode *inode, u64 start, u64 end);
 };

 extern const struct btrfs_compress_op btrfs_zlib_compress;
 extern const struct btrfs_compress_op btrfs_lzo_compress;
+extern const struct btrfs_compress_op btrfs_heuristic;

 int btrfs_compress_heuristic(struct inode *inode, u64 start, u64 end);

diff --git a/fs/btrfs/heuristic.c b/fs/btrfs/heuristic.c
new file mode 100644
index 000000000000..92f9335bafd4
--- /dev/null
+++ b/fs/btrfs/heuristic.c
@@ -0,0 +1,73 @@
+/*
+ * Copyright (c) 2017
+ * All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License v2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ */
+
+#include <linux/kernel.h>
+#include <linux/types.h>
+#include <linux/sizes.h>
+#include <linux/pagemap.h>
+#include <linux/string.h>
+#include <linux/bio.h>
+#include "compression.h"
+
+struct workspace {
+	struct list_head list;
+};
+
+static void heuristic_free_workspace(struct list_head *ws)
+{
+	struct workspace *workspace = list_entry(ws, struct workspace, list);
+	kfree(workspace);
+}
+
+static struct list_head *heuristic_alloc_workspace(void)
+{
+	struct workspace *workspace;
+
+	workspace = kzalloc(sizeof(*workspace), GFP_KERNEL);
+	if (!workspace)
+		return ERR_PTR(-ENOMEM);
+
+	INIT_LIST_HEAD(&workspace->list);
+
+	return &workspace->list;
+}
+
+static int heuristic(struct list_head *ws, struct inode *inode,
+		     u64 start, u64 end)
+{
+	struct page *page;
+	u64 index, index_end;
+
+	index = start >> PAGE_SHIFT;
+	index_end = end >> PAGE_SHIFT;
+
+	/* Don't miss unaligned end */
+	if (!IS_ALIGNED(end, PAGE_SIZE))
+		index_end++;
+
+	for (; index < index_end; index++) {
+		page = find_get_page(inode->i_mapping, index);
+		kmap(page);
+		kunmap(page);
+		put_page(page);
+	}
+
+	return 1;
+}
+
+const struct btrfs_compress_op btrfs_heuristic = {
+	.alloc_workspace	= heuristic_alloc_workspace,
+	.free_workspace		= heuristic_free_workspace,
+	.heuristic              = heuristic,
+};
--
2.14.1


* [PATCH v7 2/6] Btrfs: heuristic workspace add bucket and sample items
  2017-08-25  9:18 [PATCH v7 0/6] Btrfs: populate heuristic with code Timofey Titovets
  2017-08-25  9:18 ` [PATCH v7 1/6] Btrfs: heuristic make use compression workspaces Timofey Titovets
@ 2017-08-25  9:18 ` Timofey Titovets
  2017-09-27 13:22   ` David Sterba
  2017-08-25  9:18 ` [PATCH v7 3/6] Btrfs: implement heuristic sampling logic Timofey Titovets
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 14+ messages in thread
From: Timofey Titovets @ 2017-08-25  9:18 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Timofey Titovets

Heuristic workspace:
 - Add a bucket for storing byte type counters
 - Add a sample array for storing a partial copy of
   the input data range
 - Add a counter storing the current sample size to the workspace

Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
---
 fs/btrfs/heuristic.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/fs/btrfs/heuristic.c b/fs/btrfs/heuristic.c
index 92f9335bafd4..e3924c87af08 100644
--- a/fs/btrfs/heuristic.c
+++ b/fs/btrfs/heuristic.c
@@ -20,13 +20,29 @@
 #include <linux/bio.h>
 #include "compression.h"

+#define READ_SIZE 16
+#define ITER_SHIFT 256
+#define BUCKET_SIZE 256
+#define MAX_SAMPLE_SIZE (BTRFS_MAX_UNCOMPRESSED*READ_SIZE/ITER_SHIFT)
+
+struct bucket_item {
+	u32 count;
+};
+
 struct workspace {
+	u8  *sample;
+	/* Partial copy of input data */
+	u32 sample_size;
+	/* Bucket store counter for each byte type */
+	struct bucket_item bucket[BUCKET_SIZE];
 	struct list_head list;
 };

 static void heuristic_free_workspace(struct list_head *ws)
 {
 	struct workspace *workspace = list_entry(ws, struct workspace, list);
+
+	kvfree(workspace->sample);
 	kfree(workspace);
 }

@@ -38,9 +54,16 @@ static struct list_head *heuristic_alloc_workspace(void)
 	if (!workspace)
 		return ERR_PTR(-ENOMEM);

+	workspace->sample = kvmalloc(MAX_SAMPLE_SIZE, GFP_KERNEL);
+	if (!workspace->sample)
+		goto fail;
+
 	INIT_LIST_HEAD(&workspace->list);

 	return &workspace->list;
+fail:
+	heuristic_free_workspace(&workspace->list);
+	return ERR_PTR(-ENOMEM);
 }

 static int heuristic(struct list_head *ws, struct inode *inode,
--
2.14.1


* [PATCH v7 3/6] Btrfs: implement heuristic sampling logic
  2017-08-25  9:18 [PATCH v7 0/6] Btrfs: populate heuristic with code Timofey Titovets
  2017-08-25  9:18 ` [PATCH v7 1/6] Btrfs: heuristic make use compression workspaces Timofey Titovets
  2017-08-25  9:18 ` [PATCH v7 2/6] Btrfs: heuristic workspace add bucket and sample items Timofey Titovets
@ 2017-08-25  9:18 ` Timofey Titovets
  2017-09-27 13:38   ` David Sterba
  2017-08-25  9:18 ` [PATCH v7 4/6] Btrfs: heuristic add detection of repeated data patterns Timofey Titovets
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 14+ messages in thread
From: Timofey Titovets @ 2017-08-25  9:18 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Timofey Titovets

Copy sample data from the input data range to the sample buffer,
then calculate byte type counts for that sample into the bucket.

Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
---
 fs/btrfs/heuristic.c | 38 +++++++++++++++++++++++++++++++++++++-
 1 file changed, 37 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/heuristic.c b/fs/btrfs/heuristic.c
index e3924c87af08..5192e51ab81e 100644
--- a/fs/btrfs/heuristic.c
+++ b/fs/btrfs/heuristic.c
@@ -69,8 +69,20 @@ static struct list_head *heuristic_alloc_workspace(void)
 static int heuristic(struct list_head *ws, struct inode *inode,
 		     u64 start, u64 end)
 {
+	struct workspace *workspace = list_entry(ws, struct workspace, list);
 	struct page *page;
 	u64 index, index_end;
+	u32 a, b;
+	u8 *in_data, *sample = workspace->sample;
+	u8 byte;
+
+	/*
+	 * Compression only handle first 128kb of input range
+	 * And just shift over range in loop for compressing it.
+	 * Let's do the same.
+	*/
+	if (end - start > BTRFS_MAX_UNCOMPRESSED)
+		end = start + BTRFS_MAX_UNCOMPRESSED;

 	index = start >> PAGE_SHIFT;
 	index_end = end >> PAGE_SHIFT;
@@ -79,13 +91,37 @@ static int heuristic(struct list_head *ws, struct inode *inode,
 	if (!IS_ALIGNED(end, PAGE_SIZE))
 		index_end++;

+	b = 0;
 	for (; index < index_end; index++) {
 		page = find_get_page(inode->i_mapping, index);
-		kmap(page);
+		in_data = kmap(page);
+		/* Handle case where start unaligned to PAGE_SIZE */
+		a = start%PAGE_SIZE;
+		while (a < PAGE_SIZE - READ_SIZE) {
+			/* Prevent sample overflow */
+			if (b >= MAX_SAMPLE_SIZE)
+				break;
+			/* Don't sample mem trash from last page */
+			if (start > end - READ_SIZE)
+				break;
+			memcpy(&sample[b], &in_data[a], READ_SIZE);
+			a += ITER_SHIFT;
+			start += ITER_SHIFT;
+			b += READ_SIZE;
+		}
 		kunmap(page);
 		put_page(page);
 	}

+	workspace->sample_size = b;
+
+	memset(workspace->bucket, 0, sizeof(*workspace->bucket)*BUCKET_SIZE);
+
+	for (a = 0; a < workspace->sample_size; a++) {
+		byte = sample[a];
+		workspace->bucket[byte].count++;
+	}
+
 	return 1;
 }

--
2.14.1


* [PATCH v7 4/6] Btrfs: heuristic add detection of repeated data patterns
  2017-08-25  9:18 [PATCH v7 0/6] Btrfs: populate heuristic with code Timofey Titovets
                   ` (2 preceding siblings ...)
  2017-08-25  9:18 ` [PATCH v7 3/6] Btrfs: implement heuristic sampling logic Timofey Titovets
@ 2017-08-25  9:18 ` Timofey Titovets
  2017-09-27 13:47   ` David Sterba
  2017-08-25  9:18 ` [PATCH v7 5/6] Btrfs: heuristic add byte set calculation Timofey Titovets
  2017-08-25  9:18 ` [PATCH v7 6/6] Btrfs: heuristic add byte core " Timofey Titovets
  5 siblings, 1 reply; 14+ messages in thread
From: Timofey Titovets @ 2017-08-25  9:18 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Timofey Titovets

Walk over the data sample and use memcmp() to detect
repeated data (like zeroes).

Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
---
 fs/btrfs/heuristic.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/fs/btrfs/heuristic.c b/fs/btrfs/heuristic.c
index 5192e51ab81e..f1fa6e4f1c11 100644
--- a/fs/btrfs/heuristic.c
+++ b/fs/btrfs/heuristic.c
@@ -66,6 +66,19 @@ static struct list_head *heuristic_alloc_workspace(void)
 	return ERR_PTR(-ENOMEM);
 }

+static bool sample_repeated_patterns(struct workspace *ws)
+{
+	u32 i = 0;
+	u8 *p = ws->sample;
+
+	for (; i < ws->sample_size - READ_SIZE; i += READ_SIZE) {
+		if (memcmp(&p[i], &p[i + READ_SIZE], READ_SIZE))
+			return false;
+	}
+
+	return true;
+}
+
 static int heuristic(struct list_head *ws, struct inode *inode,
 		     u64 start, u64 end)
 {
@@ -115,6 +128,9 @@ static int heuristic(struct list_head *ws, struct inode *inode,

 	workspace->sample_size = b;

+	if (sample_repeated_patterns(workspace))
+		return 1;
+
 	memset(workspace->bucket, 0, sizeof(*workspace->bucket)*BUCKET_SIZE);

 	for (a = 0; a < workspace->sample_size; a++) {
--
2.14.1


* [PATCH v7 5/6] Btrfs: heuristic add byte set calculation
  2017-08-25  9:18 [PATCH v7 0/6] Btrfs: populate heuristic with code Timofey Titovets
                   ` (3 preceding siblings ...)
  2017-08-25  9:18 ` [PATCH v7 4/6] Btrfs: heuristic add detection of repeated data patterns Timofey Titovets
@ 2017-08-25  9:18 ` Timofey Titovets
  2017-09-27 13:50   ` David Sterba
  2017-08-25  9:18 ` [PATCH v7 6/6] Btrfs: heuristic add byte core " Timofey Titovets
  5 siblings, 1 reply; 14+ messages in thread
From: Timofey Titovets @ 2017-08-25  9:18 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Timofey Titovets

Calculate the byte set size for the data sample:
count how many unique bytes appeared in the sample
by counting all bytes in the bucket with count > 0.
If the byte set is small (~25%), the data is easily compressible.

Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
---
 fs/btrfs/heuristic.c | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/fs/btrfs/heuristic.c b/fs/btrfs/heuristic.c
index f1fa6e4f1c11..ef723e991576 100644
--- a/fs/btrfs/heuristic.c
+++ b/fs/btrfs/heuristic.c
@@ -24,6 +24,7 @@
 #define ITER_SHIFT 256
 #define BUCKET_SIZE 256
 #define MAX_SAMPLE_SIZE (BTRFS_MAX_UNCOMPRESSED*READ_SIZE/ITER_SHIFT)
+#define BYTE_SET_THRESHOLD 64

 struct bucket_item {
 	u32 count;
@@ -66,6 +67,27 @@ static struct list_head *heuristic_alloc_workspace(void)
 	return ERR_PTR(-ENOMEM);
 }

+static u32 byte_set_size(const struct workspace *ws)
+{
+	u32 a = 0;
+	u32 byte_set_size = 0;
+
+	for (; a < BYTE_SET_THRESHOLD; a++) {
+		if (ws->bucket[a].count > 0)
+			byte_set_size++;
+	}
+
+	for (; a < BUCKET_SIZE; a++) {
+		if (ws->bucket[a].count > 0) {
+			byte_set_size++;
+			if (byte_set_size > BYTE_SET_THRESHOLD)
+				return byte_set_size;
+		}
+	}
+
+	return byte_set_size;
+}
+
 static bool sample_repeated_patterns(struct workspace *ws)
 {
 	u32 i = 0;
@@ -138,6 +160,10 @@ static int heuristic(struct list_head *ws, struct inode *inode,
 		workspace->bucket[byte].count++;
 	}

+	a = byte_set_size(workspace);
+	if (a > BYTE_SET_THRESHOLD)
+		return 2;
+
 	return 1;
 }

--
2.14.1


* [PATCH v7 6/6] Btrfs: heuristic add byte core set calculation
  2017-08-25  9:18 [PATCH v7 0/6] Btrfs: populate heuristic with code Timofey Titovets
                   ` (4 preceding siblings ...)
  2017-08-25  9:18 ` [PATCH v7 5/6] Btrfs: heuristic add byte set calculation Timofey Titovets
@ 2017-08-25  9:18 ` Timofey Titovets
  2017-09-27 13:54   ` David Sterba
  2017-09-27 13:56   ` David Sterba
  5 siblings, 2 replies; 14+ messages in thread
From: Timofey Titovets @ 2017-08-25  9:18 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Timofey Titovets

Calculate the byte core set for the data sample:
sort the bucket's counts in decreasing order and
count how many values cover 90% of the sample.
If the core set is small (<= 25%), the data is easily compressible.
If the core set is large (>= 80%), the data is not compressible.

Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
---
 fs/btrfs/heuristic.c | 51 ++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 50 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/heuristic.c b/fs/btrfs/heuristic.c
index ef723e991576..df0cefa42857 100644
--- a/fs/btrfs/heuristic.c
+++ b/fs/btrfs/heuristic.c
@@ -18,6 +18,7 @@
 #include <linux/pagemap.h>
 #include <linux/string.h>
 #include <linux/bio.h>
+#include <linux/sort.h>
 #include "compression.h"

 #define READ_SIZE 16
@@ -25,6 +26,8 @@
 #define BUCKET_SIZE 256
 #define MAX_SAMPLE_SIZE (BTRFS_MAX_UNCOMPRESSED*READ_SIZE/ITER_SHIFT)
 #define BYTE_SET_THRESHOLD 64
+#define BYTE_CORE_SET_LOW  BYTE_SET_THRESHOLD
+#define BYTE_CORE_SET_HIGH 200 /* ~80% */

 struct bucket_item {
 	u32 count;
@@ -67,6 +70,45 @@ static struct list_head *heuristic_alloc_workspace(void)
 	return ERR_PTR(-ENOMEM);
 }

+/* For bucket sorting */
+static inline int bucket_compare(const void *lv, const void *rv)
+{
+	struct bucket_item *l = (struct bucket_item *)(lv);
+	struct bucket_item *r = (struct bucket_item *)(rv);
+
+	return r->count - l->count;
+}
+
+/*
+ * Byte Core set size
+ * How many bytes use 90% of sample
+ */
+static int byte_core_set_size(struct workspace *ws)
+{
+	u32 a = 0;
+	u32 coreset_sum = 0;
+	u32 core_set_threshold = ws->sample_size*90/100;
+	struct bucket_item *bucket = ws->bucket;
+
+	/* Sort in reverse order */
+	sort(bucket, BUCKET_SIZE, sizeof(*bucket),
+	     &bucket_compare, NULL);
+
+	for (; a < BYTE_CORE_SET_LOW; a++)
+		coreset_sum += bucket[a].count;
+
+	if (coreset_sum > core_set_threshold)
+		return a;
+
+	for (; a < BYTE_CORE_SET_HIGH && bucket[a].count > 0; a++) {
+		coreset_sum += bucket[a].count;
+		if (coreset_sum > core_set_threshold)
+			break;
+	}
+
+	return a;
+}
+
 static u32 byte_set_size(const struct workspace *ws)
 {
 	u32 a = 0;
@@ -164,7 +206,14 @@ static int heuristic(struct list_head *ws, struct inode *inode,
 	if (a > BYTE_SET_THRESHOLD)
 		return 2;

-	return 1;
+	a = byte_core_set_size(workspace);
+	if (a <= BYTE_CORE_SET_LOW)
+		return 3;
+
+	if (a >= BYTE_CORE_SET_HIGH)
+		return 0;
+
+	return 4;
 }

 const struct btrfs_compress_op btrfs_heuristic = {
--
2.14.1


* Re: [PATCH v7 1/6] Btrfs: heuristic make use compression workspaces
  2017-08-25  9:18 ` [PATCH v7 1/6] Btrfs: heuristic make use compression workspaces Timofey Titovets
@ 2017-09-27 13:12   ` David Sterba
  0 siblings, 0 replies; 14+ messages in thread
From: David Sterba @ 2017-09-27 13:12 UTC (permalink / raw)
  To: Timofey Titovets; +Cc: linux-btrfs

On Fri, Aug 25, 2017 at 12:18:40PM +0300, Timofey Titovets wrote:
> Move heuristic to external file
> Implement compression workspaces support for
> heuristic resources

I think the file compression.c is suitable, there's not that much code
the heuristic adds and it is only related to compression.

> Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
> ---
>  fs/btrfs/Makefile      |  2 +-
>  fs/btrfs/compression.c | 18 +++++--------
>  fs/btrfs/compression.h |  7 ++++-
>  fs/btrfs/heuristic.c   | 73 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 87 insertions(+), 13 deletions(-)
>  create mode 100644 fs/btrfs/heuristic.c
> 
> diff --git a/fs/btrfs/Makefile b/fs/btrfs/Makefile
> index 128ce17a80b0..6fa8479dff43 100644
> --- a/fs/btrfs/Makefile
> +++ b/fs/btrfs/Makefile
> @@ -9,7 +9,7 @@ btrfs-y += super.o ctree.o extent-tree.o print-tree.o root-tree.o dir-item.o \
>  	   export.o tree-log.o free-space-cache.o zlib.o lzo.o \
>  	   compression.o delayed-ref.o relocation.o delayed-inode.o scrub.o \
>  	   reada.o backref.o ulist.o qgroup.o send.o dev-replace.o raid56.o \
> -	   uuid-tree.o props.o hash.o free-space-tree.o
> +	   uuid-tree.o props.o hash.o free-space-tree.o heuristic.o
> 
>  btrfs-$(CONFIG_BTRFS_FS_POSIX_ACL) += acl.o
>  btrfs-$(CONFIG_BTRFS_FS_CHECK_INTEGRITY) += check-integrity.o
> diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
> index 883ecc58fd0d..f0aaf27bcc95 100644
> --- a/fs/btrfs/compression.c
> +++ b/fs/btrfs/compression.c
> @@ -704,6 +704,7 @@ static struct {
>  static const struct btrfs_compress_op * const btrfs_compress_op[] = {
>  	&btrfs_zlib_compress,
>  	&btrfs_lzo_compress,
> +	&btrfs_heuristic,
>  };
> 
>  void __init btrfs_init_compress(void)
> @@ -1065,18 +1066,13 @@ int btrfs_decompress_buf2page(const char *buf, unsigned long buf_start,
>   */
>  int btrfs_compress_heuristic(struct inode *inode, u64 start, u64 end)
>  {
> -	u64 index = start >> PAGE_SHIFT;
> -	u64 end_index = end >> PAGE_SHIFT;
> -	struct page *page;
> -	int ret = 1;
> +	int ret;
> +	enum btrfs_compression_type type = BTRFS_HEURISTIC;
> +	struct list_head *workspace = find_workspace(type);
> 
> -	while (index <= end_index) {
> -		page = find_get_page(inode->i_mapping, index);
> -		kmap(page);
> -		kunmap(page);
> -		put_page(page);
> -		index++;
> -	}
> +	ret = btrfs_compress_op[type-1]->heuristic(workspace, inode,
> +						   start, end);
> 
> +	free_workspace(type, workspace);
>  	return ret;
>  }
> diff --git a/fs/btrfs/compression.h b/fs/btrfs/compression.h
> index 3b1b0ac15fdc..10e9ffa6dfa4 100644
> --- a/fs/btrfs/compression.h
> +++ b/fs/btrfs/compression.h
> @@ -99,7 +99,8 @@ enum btrfs_compression_type {
>  	BTRFS_COMPRESS_NONE  = 0,
>  	BTRFS_COMPRESS_ZLIB  = 1,
>  	BTRFS_COMPRESS_LZO   = 2,
> -	BTRFS_COMPRESS_TYPES = 2,
> +	BTRFS_HEURISTIC = 3,
> +	BTRFS_COMPRESS_TYPES = 3,

Compression cannot be another type. I get that it's for the workspace
management, but that would need to be handled in another way anyway.

>  };
> 
>  struct btrfs_compress_op {
> @@ -123,10 +124,14 @@ struct btrfs_compress_op {
>  			  struct page *dest_page,
>  			  unsigned long start_byte,
>  			  size_t srclen, size_t destlen);
> +
> +	int (*heuristic)(struct list_head *workspace,
> +			 struct inode *inode, u64 start, u64 end);
>  };
> 
>  extern const struct btrfs_compress_op btrfs_zlib_compress;
>  extern const struct btrfs_compress_op btrfs_lzo_compress;
> +extern const struct btrfs_compress_op btrfs_heuristic;
> 
>  int btrfs_compress_heuristic(struct inode *inode, u64 start, u64 end);
> 
> diff --git a/fs/btrfs/heuristic.c b/fs/btrfs/heuristic.c
> new file mode 100644
> index 000000000000..92f9335bafd4
> --- /dev/null
> +++ b/fs/btrfs/heuristic.c
> @@ -0,0 +1,73 @@
> +/*
> + * Copyright (c) 2017
> + * All rights reserved.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public
> + * License v2 as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * General Public License for more details.
> + */
> +
> +#include <linux/kernel.h>
> +#include <linux/types.h>
> +#include <linux/sizes.h>
> +#include <linux/pagemap.h>
> +#include <linux/string.h>
> +#include <linux/bio.h>
> +#include "compression.h"
> +
> +struct workspace {
> +	struct list_head list;
> +};
> +
> +static void heuristic_free_workspace(struct list_head *ws)
> +{
> +	struct workspace *workspace = list_entry(ws, struct workspace, list);
> +	kfree(workspace);
> +}
> +
> +static struct list_head *heuristic_alloc_workspace(void)
> +{
> +	struct workspace *workspace;
> +
> +	workspace = kzalloc(sizeof(*workspace), GFP_KERNEL);
> +	if (!workspace)
> +		return ERR_PTR(-ENOMEM);
> +
> +	INIT_LIST_HEAD(&workspace->list);
> +
> +	return &workspace->list;
> +}
> +
> +static int heuristic(struct list_head *ws, struct inode *inode,
> +		     u64 start, u64 end)
> +{
> +	struct page *page;
> +	u64 index, index_end;
> +
> +	index = start >> PAGE_SHIFT;
> +	index_end = end >> PAGE_SHIFT;
> +
> +	/* Don't miss unaligned end */
> +	if (!IS_ALIGNED(end, PAGE_SIZE))
> +		index_end++;
> +
> +	for (; index < index_end; index++) {
> +		page = find_get_page(inode->i_mapping, index);
> +		kmap(page);
> +		kunmap(page);
> +		put_page(page);
> +	}
> +
> +	return 1;
> +}
> +
> +const struct btrfs_compress_op btrfs_heuristic = {
> +	.alloc_workspace	= heuristic_alloc_workspace,
> +	.free_workspace		= heuristic_free_workspace,
> +	.heuristic              = heuristic,
> +};
> --
> 2.14.1
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [PATCH v7 2/6] Btrfs: heuristic workspace add bucket and sample items
  2017-08-25  9:18 ` [PATCH v7 2/6] Btrfs: heuristic workspace add bucket and sample items Timofey Titovets
@ 2017-09-27 13:22   ` David Sterba
  0 siblings, 0 replies; 14+ messages in thread
From: David Sterba @ 2017-09-27 13:22 UTC (permalink / raw)
  To: Timofey Titovets; +Cc: linux-btrfs

On Fri, Aug 25, 2017 at 12:18:41PM +0300, Timofey Titovets wrote:
> Heuristic workspace:
>  - Add bucket for storing byte type counters
>  - Add sample array for storing partial copy of
>    input data range
>  - Add counter for store current sample size to workspace
> 
> Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
> ---
>  fs/btrfs/heuristic.c | 23 +++++++++++++++++++++++
>  1 file changed, 23 insertions(+)
> 
> diff --git a/fs/btrfs/heuristic.c b/fs/btrfs/heuristic.c
> index 92f9335bafd4..e3924c87af08 100644
> --- a/fs/btrfs/heuristic.c
> +++ b/fs/btrfs/heuristic.c
> @@ -20,13 +20,29 @@
>  #include <linux/bio.h>
>  #include "compression.h"
> 
> +#define READ_SIZE 16
> +#define ITER_SHIFT 256
> +#define BUCKET_SIZE 256
> +#define MAX_SAMPLE_SIZE (BTRFS_MAX_UNCOMPRESSED*READ_SIZE/ITER_SHIFT)

All the defines need a short explanation of their purpose. I've seen
that e.g. READ_SIZE is used as the sample size, so I suggest renaming it.

Style issue (here and in many other places): binary operators are
separated by a space on both sides:

#define MAX_SAMPLE_SIZE (BTRFS_MAX_UNCOMPRESSED * READ_SIZE / ITER_SHIFT)

> +
> +struct bucket_item {
> +	u32 count;
> +};
> +
>  struct workspace {
> +	u8  *sample;
> +	/* Partial copy of input data */
> +	u32 sample_size;
> +	/* Bucket store counter for each byte type */
> +	struct bucket_item bucket[BUCKET_SIZE];
>  	struct list_head list;
>  };
> 
>  static void heuristic_free_workspace(struct list_head *ws)
>  {
>  	struct workspace *workspace = list_entry(ws, struct workspace, list);
> +
> +	kvfree(workspace->sample);
>  	kfree(workspace);
>  }
> 
> @@ -38,9 +54,16 @@ static struct list_head *heuristic_alloc_workspace(void)
>  	if (!workspace)
>  		return ERR_PTR(-ENOMEM);
> 
> +	workspace->sample = kvmalloc(MAX_SAMPLE_SIZE, GFP_KERNEL);
> +	if (!workspace->sample)
> +		goto fail;
> +
>  	INIT_LIST_HEAD(&workspace->list);
> 
>  	return &workspace->list;
> +fail:
> +	heuristic_free_workspace(&workspace->list);
> +	return ERR_PTR(-ENOMEM);
>  }

Regarding the workspaces: I think we'll need to separate them completely
from the compression workspaces. The size is different, the usage
pattern is different and they can be reused by all compression types.

The size difference is obvious: if the size is e.g. 4k, and we have
something like 128k for a compression workspace, we'd waste too much space.

The heuristic workspace could be used more frequently, when we just
sample the data and maybe decide not to compress. Here the locks and
workspaces will not interfere with actual compression and
decompression.

So the idea is to preallocate a few heuristic workspaces in a similar
way as we do for compression, and add similar fallback logic for when
we need them but all are in use.

I'm hoping that the heuristic calculations will be fast enough that we
can avoid the fallback allocation at all.


* Re: [PATCH v7 3/6] Btrfs: implement heuristic sampling logic
  2017-08-25  9:18 ` [PATCH v7 3/6] Btrfs: implement heuristic sampling logic Timofey Titovets
@ 2017-09-27 13:38   ` David Sterba
  0 siblings, 0 replies; 14+ messages in thread
From: David Sterba @ 2017-09-27 13:38 UTC (permalink / raw)
  To: Timofey Titovets; +Cc: linux-btrfs

On Fri, Aug 25, 2017 at 12:18:42PM +0300, Timofey Titovets wrote:
> Copy sample data from input data range to sample buffer
> then calculate byte type count for that sample into bucket.
> 
> Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
> ---
>  fs/btrfs/heuristic.c | 38 +++++++++++++++++++++++++++++++++++++-
>  1 file changed, 37 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/heuristic.c b/fs/btrfs/heuristic.c
> index e3924c87af08..5192e51ab81e 100644
> --- a/fs/btrfs/heuristic.c
> +++ b/fs/btrfs/heuristic.c
> @@ -69,8 +69,20 @@ static struct list_head *heuristic_alloc_workspace(void)
>  static int heuristic(struct list_head *ws, struct inode *inode,
>  		     u64 start, u64 end)
>  {
> +	struct workspace *workspace = list_entry(ws, struct workspace, list);
>  	struct page *page;
>  	u64 index, index_end;
> +	u32 a, b;

Please use more descriptive variable names. Using 'i' for simple
iteration is ok.

> +	u8 *in_data, *sample = workspace->sample;
> +	u8 byte;
> +
> +	/*
> +	 * Compression handles only the first 128KB of the input range
> +	 * and just shifts over the range in a loop while compressing it.
> +	 * Let's do the same.
> +	 */
> +	if (end - start > BTRFS_MAX_UNCOMPRESSED)
> +		end = start + BTRFS_MAX_UNCOMPRESSED;
> 
>  	index = start >> PAGE_SHIFT;
>  	index_end = end >> PAGE_SHIFT;
> @@ -79,13 +91,37 @@ static int heuristic(struct list_head *ws, struct inode *inode,
>  	if (!IS_ALIGNED(end, PAGE_SIZE))
>  		index_end++;
> 
> +	b = 0;
>  	for (; index < index_end; index++) {
>  		page = find_get_page(inode->i_mapping, index);
> -		kmap(page);
> +		in_data = kmap(page);
> +		/* Handle the case where start is unaligned to PAGE_SIZE */
> +		a = start%PAGE_SIZE;

		a = start % PAGE_SIZE;

> +		while (a < PAGE_SIZE - READ_SIZE) {
> +			/* Prevent sample overflow */
> +			if (b >= MAX_SAMPLE_SIZE)
> +				break;
> +			/* Don't sample mem trash from last page */
> +			if (start > end - READ_SIZE)
> +				break;

I think you can merge the two conditions into one, if you calculate
beforehand where 'b' would overflow and where 'start' would reach
'end - READ_SIZE'.

> +			memcpy(&sample[b], &in_data[a], READ_SIZE);
> +			a += ITER_SHIFT;
> +			start += ITER_SHIFT;
> +			b += READ_SIZE;
> +		}
>  		kunmap(page);
>  		put_page(page);
>  	}
> 
> +	workspace->sample_size = b;
> +
> +	memset(workspace->bucket, 0, sizeof(*workspace->bucket)*BUCKET_SIZE);
> +
> +	for (a = 0; a < workspace->sample_size; a++) {
> +		byte = sample[a];
> +		workspace->bucket[byte].count++;
> +	}
> +
>  	return 1;
>  }
> 
> --
> 2.14.1
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [PATCH v7 4/6] Btrfs: heuristic add detection of repeated data patterns
  2017-08-25  9:18 ` [PATCH v7 4/6] Btrfs: heuristic add detection of repeated data patterns Timofey Titovets
@ 2017-09-27 13:47   ` David Sterba
  0 siblings, 0 replies; 14+ messages in thread
From: David Sterba @ 2017-09-27 13:47 UTC (permalink / raw)
  To: Timofey Titovets; +Cc: linux-btrfs

On Fri, Aug 25, 2017 at 12:18:43PM +0300, Timofey Titovets wrote:
> Walk over data sample and use memcmp to detect
> repeated data (like zeroed)
> 
> Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
> ---
>  fs/btrfs/heuristic.c | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
> 
> diff --git a/fs/btrfs/heuristic.c b/fs/btrfs/heuristic.c
> index 5192e51ab81e..f1fa6e4f1c11 100644
> --- a/fs/btrfs/heuristic.c
> +++ b/fs/btrfs/heuristic.c
> @@ -66,6 +66,19 @@ static struct list_head *heuristic_alloc_workspace(void)
>  	return ERR_PTR(-ENOMEM);
>  }
> 
> +static bool sample_repeated_patterns(struct workspace *ws)
> +{
> +	u32 i = 0;
> +	u8 *p = ws->sample;
> +
> +	for (; i < ws->sample_size - READ_SIZE; i += READ_SIZE) {

Please use the inline initialization of the iteration variables.

	for (i = 0; i < ws->sample_size - READ_SIZE; i += READ_SIZE) {

> +		if (memcmp(&p[i], &p[i + READ_SIZE], READ_SIZE))
> +			return false;

I think zeros are by far the most common repeated pattern, for that we
can use some speedy tests, built on top of "find first bit", possibly
with hardware support.

OTOH if it's a more generic "any repeated bytes" check, this might
actually be slower, and the heuristic needs to be as fast as possible.
I don't want to start optimizing now, so the generic pattern is ok for
now.

> +	}
> +
> +	return true;
> +}
> +
>  static int heuristic(struct list_head *ws, struct inode *inode,
>  		     u64 start, u64 end)
>  {
> @@ -115,6 +128,9 @@ static int heuristic(struct list_head *ws, struct inode *inode,
> 
>  	workspace->sample_size = b;
> 
> +	if (sample_repeated_patterns(workspace))
> +		return 1;
> +
>  	memset(workspace->bucket, 0, sizeof(*workspace->bucket)*BUCKET_SIZE);
> 
>  	for (a = 0; a < workspace->sample_size; a++) {
> --
> 2.14.1


* Re: [PATCH v7 5/6] Btrfs: heuristic add byte set calculation
  2017-08-25  9:18 ` [PATCH v7 5/6] Btrfs: heuristic add byte set calculation Timofey Titovets
@ 2017-09-27 13:50   ` David Sterba
  0 siblings, 0 replies; 14+ messages in thread
From: David Sterba @ 2017-09-27 13:50 UTC (permalink / raw)
  To: Timofey Titovets; +Cc: linux-btrfs

On Fri, Aug 25, 2017 at 12:18:44PM +0300, Timofey Titovets wrote:
> Calculate byte set size for data sample:
> Calculate how many unique bytes have been seen in the sample
> by counting all bytes in the bucket with count > 0.
> If the byte set is low (~25%), the data is easily compressible.
> 
> Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
> ---
>  fs/btrfs/heuristic.c | 26 ++++++++++++++++++++++++++
>  1 file changed, 26 insertions(+)
> 
> diff --git a/fs/btrfs/heuristic.c b/fs/btrfs/heuristic.c
> index f1fa6e4f1c11..ef723e991576 100644
> --- a/fs/btrfs/heuristic.c
> +++ b/fs/btrfs/heuristic.c
> @@ -24,6 +24,7 @@
>  #define ITER_SHIFT 256
>  #define BUCKET_SIZE 256
>  #define MAX_SAMPLE_SIZE (BTRFS_MAX_UNCOMPRESSED*READ_SIZE/ITER_SHIFT)
> +#define BYTE_SET_THRESHOLD 64

An explanation is needed; it's partially covered by the changelog, but
we need to see it in the code as well.

> 
>  struct bucket_item {
>  	u32 count;
> @@ -66,6 +67,27 @@ static struct list_head *heuristic_alloc_workspace(void)
>  	return ERR_PTR(-ENOMEM);
>  }
> 
> +static u32 byte_set_size(const struct workspace *ws)
> +{
> +	u32 a = 0;

Please use 'i'.

> +	u32 byte_set_size = 0;
> +
> +	for (; a < BYTE_SET_THRESHOLD; a++) {

	for (i = 0; ...)


> +		if (ws->bucket[a].count > 0)
> +			byte_set_size++;
> +	}
> +
> +	for (; a < BUCKET_SIZE; a++) {

So here the initialization is intentionally skipped; please add a
comment that this is expected, or explain it like "continue past the
first part of the bucket until ...".

> +		if (ws->bucket[a].count > 0) {
> +			byte_set_size++;
> +			if (byte_set_size > BYTE_SET_THRESHOLD)
> +				return byte_set_size;
> +		}
> +	}
> +
> +	return byte_set_size;
> +}
> +
>  static bool sample_repeated_patterns(struct workspace *ws)
>  {
>  	u32 i = 0;
> @@ -138,6 +160,10 @@ static int heuristic(struct list_head *ws, struct inode *inode,
>  		workspace->bucket[byte].count++;
>  	}
> 
> +	a = byte_set_size(workspace);
> +	if (a > BYTE_SET_THRESHOLD)
> +		return 2;
> +
>  	return 1;
>  }
> 
> --
> 2.14.1


* Re: [PATCH v7 6/6] Btrfs: heuristic add byte core set calculation
  2017-08-25  9:18 ` [PATCH v7 6/6] Btrfs: heuristic add byte core " Timofey Titovets
@ 2017-09-27 13:54   ` David Sterba
  2017-09-27 13:56   ` David Sterba
  1 sibling, 0 replies; 14+ messages in thread
From: David Sterba @ 2017-09-27 13:54 UTC (permalink / raw)
  To: Timofey Titovets; +Cc: linux-btrfs

On Fri, Aug 25, 2017 at 12:18:45PM +0300, Timofey Titovets wrote:
> Calculate byte core set for data sample:
> Sort the bucket's numbers in decreasing order,
> then count how many numbers cover 90% of the sample.
> If the core set is low (<=25%), the data is easily compressible;
> if the core set is high (>=80%), the data is not compressible.
> 
> Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
> ---
>  fs/btrfs/heuristic.c | 51 ++++++++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 50 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/heuristic.c b/fs/btrfs/heuristic.c
> index ef723e991576..df0cefa42857 100644
> --- a/fs/btrfs/heuristic.c
> +++ b/fs/btrfs/heuristic.c
> @@ -18,6 +18,7 @@
>  #include <linux/pagemap.h>
>  #include <linux/string.h>
>  #include <linux/bio.h>
> +#include <linux/sort.h>
>  #include "compression.h"
> 
>  #define READ_SIZE 16
> @@ -25,6 +26,8 @@
>  #define BUCKET_SIZE 256
>  #define MAX_SAMPLE_SIZE (BTRFS_MAX_UNCOMPRESSED*READ_SIZE/ITER_SHIFT)
>  #define BYTE_SET_THRESHOLD 64
> +#define BYTE_CORE_SET_LOW  BYTE_SET_THRESHOLD
> +#define BYTE_CORE_SET_HIGH 200 // ~80%

Please don't use the // c++ style of comments.

> 
>  struct bucket_item {
>  	u32 count;
> @@ -67,6 +70,45 @@ static struct list_head *heuristic_alloc_workspace(void)
>  	return ERR_PTR(-ENOMEM);
>  }
> 
> +/* For bucket sorting */

"Compare buckets by size, ascending */

> +static inline int bucket_compare(const void *lv, const void *rv)
> +{
> +	struct bucket_item *l = (struct bucket_item *)(lv);
> +	struct bucket_item *r = (struct bucket_item *)(rv);

The const should also be used for the specific types.

> +
> +	return r->count - l->count;
> +}
> +
> +/*
> + * Byte Core set size
> + * How many bytes use 90% of sample
> + */
> +static int byte_core_set_size(struct workspace *ws)
> +{
> +	u32 a = 0;
> +	u32 coreset_sum = 0;
> +	u32 core_set_threshold = ws->sample_size*90/100;

	u32 core_set_threshold = ws->sample_size * 90 / 100;

> +	struct bucket_item *bucket = ws->bucket;
> +
> +	/* Sort in reverse order */

So if it's reverse, please add _rev to the end of the comparator
function name. The common naming is "comp_WHAT", so I suggest using
comp_bucket_rev.

> +	sort(bucket, BUCKET_SIZE, sizeof(*bucket),
> +	     &bucket_compare, NULL);
> +
> +	for (; a < BYTE_CORE_SET_LOW; a++)
> +		coreset_sum += bucket[a].count;
> +
> +	if (coreset_sum > core_set_threshold)
> +		return a;
> +
> +	for (; a < BYTE_CORE_SET_HIGH && bucket[a].count > 0; a++) {
> +		coreset_sum += bucket[a].count;
> +		if (coreset_sum > core_set_threshold)
> +			break;
> +	}
> +
> +	return a;
> +}
> +
>  static u32 byte_set_size(const struct workspace *ws)
>  {
>  	u32 a = 0;
> @@ -164,7 +206,14 @@ static int heuristic(struct list_head *ws, struct inode *inode,
>  	if (a > BYTE_SET_THRESHOLD)
>  		return 2;
> 
> -	return 1;
> +	a = byte_core_set_size(workspace);
> +	if (a <= BYTE_CORE_SET_LOW)
> +		return 3;
> +
> +	if (a >= BYTE_CORE_SET_HIGH)
> +		return 0;
> +
> +	return 4;
>  }
> 
>  const struct btrfs_compress_op btrfs_heuristic = {
> --
> 2.14.1


* Re: [PATCH v7 6/6] Btrfs: heuristic add byte core set calculation
  2017-08-25  9:18 ` [PATCH v7 6/6] Btrfs: heuristic add byte core " Timofey Titovets
  2017-09-27 13:54   ` David Sterba
@ 2017-09-27 13:56   ` David Sterba
  1 sibling, 0 replies; 14+ messages in thread
From: David Sterba @ 2017-09-27 13:56 UTC (permalink / raw)
  To: Timofey Titovets; +Cc: linux-btrfs

On Fri, Aug 25, 2017 at 12:18:45PM +0300, Timofey Titovets wrote:
> Calculate byte core set for data sample:
> Sort the bucket's numbers in decreasing order,
> then count how many numbers cover 90% of the sample.
> If the core set is low (<=25%), the data is easily compressible;
> if the core set is high (>=80%), the data is not compressible.
> 
> Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
> [...]
> @@ -164,7 +206,14 @@ static int heuristic(struct list_head *ws, struct inode *inode,
>  	if (a > BYTE_SET_THRESHOLD)
>  		return 2;
> 
> -	return 1;
> +	a = byte_core_set_size(workspace);
> +	if (a <= BYTE_CORE_SET_LOW)
> +		return 3;
> +
> +	if (a >= BYTE_CORE_SET_HIGH)
> +		return 0;
> +
> +	return 4;

So the return value determines guessed compressibility, this could use
some human readable enums or defines.


end of thread, other threads:[~2017-09-27 13:58 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-08-25  9:18 [PATCH v7 0/6] Btrfs: populate heuristic with code Timofey Titovets
2017-08-25  9:18 ` [PATCH v7 1/6] Btrfs: heuristic make use compression workspaces Timofey Titovets
2017-09-27 13:12   ` David Sterba
2017-08-25  9:18 ` [PATCH v7 2/6] Btrfs: heuristic workspace add bucket and sample items Timofey Titovets
2017-09-27 13:22   ` David Sterba
2017-08-25  9:18 ` [PATCH v7 3/6] Btrfs: implement heuristic sampling logic Timofey Titovets
2017-09-27 13:38   ` David Sterba
2017-08-25  9:18 ` [PATCH v7 4/6] Btrfs: heuristic add detection of repeated data patterns Timofey Titovets
2017-09-27 13:47   ` David Sterba
2017-08-25  9:18 ` [PATCH v7 5/6] Btrfs: heuristic add byte set calculation Timofey Titovets
2017-09-27 13:50   ` David Sterba
2017-08-25  9:18 ` [PATCH v7 6/6] Btrfs: heuristic add byte core " Timofey Titovets
2017-09-27 13:54   ` David Sterba
2017-09-27 13:56   ` David Sterba
