All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v8 0/6] Btrfs: populate heuristic with code
@ 2017-09-28 14:33 Timofey Titovets
  2017-09-28 14:33 ` [PATCH v8 1/6] Btrfs: compression.c separated heuristic/compression workspaces Timofey Titovets
                   ` (6 more replies)
  0 siblings, 7 replies; 14+ messages in thread
From: Timofey Titovets @ 2017-09-28 14:33 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Timofey Titovets

Based on linux master 4.14-rc2
Duplicated to github:
https://github.com/Nefelim4ag/linux/tree/heuristic_v8

Compile tested, hand tested on live system

Patches short:
1. Implement workspaces for heuristic
   Separate heuristic/compression workspaces
   Main target for that patch:
   Maximum code sharing, minimum changes

2. Add heuristic counters and buffer to workspaces
   Add some base macros for heuristic

3. Implement simple input data sampling
   It's get 16 byte samples with 256 bytes shifts
   over input data. Collect info about how many
   different bytes (symbols) has been found in sample data
   (i.e. systematic sampling used for now)

4. Implement check sample to repeated data
   Just iterate over sample and do memcmp()
   ex. will detect zeroed data

5. Add code for calculate how many unique bytes has been found
   in sample data.
   That heuristic can detect text like data (configs, xml, json, html & etc)
   Because in most text like data byte set are restricted to limit number
   of possible characters, and that restriction in most cases
   make data easy compressible.

6. Add code for calculate byte core set size
   i.e. how many unique bytes use 90% of sample data

   Several type of structured binary data have in general
   nearly all types of bytes, but distribution can be Uniform
   where in bucket all byte types will have the nearly same count
   (ex. Encrypted data)
   and as ex. Normal (Gaussian), where count of bytes will be not so linear

   That code require that numbers in bucket must be sorted
   That can detect easy compressible data with many repeated bytes
   That can detect not compressible data with evenly distributed bytes

Changes v1 -> v2:
  - Change input data iterator shift 512 -> 256
  - Replace magic macro numbers with direct values
  - Drop useless symbol population in bucket
    as no one care about where and what symbol stored
    in bucket at now

Changes v2 -> v3 (only update #3 patch):
  - Fix u64 division problem by use u32 for input_size
  - Fix input size calculation start - end -> end - start
  - Add missing sort.h header

Changes v3 -> v4 (only update #1 patch):
  - Change counter type in bucket item u16 -> u32
  - Drop other fields from bucket item for now,
    no one use it

Change v4 -> v5
  - Move heuristic code to external file
  - Make heuristic use compression workspaces
  - Add check sample to zeroes

Change v5 -> v6
  - Add some code to hande page unaligned range start/end
  - replace sample zeroed check with check for repeated data

Change v6 -> v7
  - Add missing part of first patch
  - Make use of IS_ALIGNED() for check tail aligment

Change v7 -> v8
  - All code moved to compression.c (again)
  - Heuristic workspaces inmplemented another way
    i.e. only share logic with compression workspaces
  - Some style fixes suggested by Devid
  - Move sampling function from heuristic code
    (I'm afraid of big functions)
  - Much more comments and explanations

Timofey Titovets (6):
  Btrfs: compression.c separated heuristic/compression workspaces
  Btrfs: heuristic workspace add bucket and sample items, macros
  Btrfs: implement heuristic sampling logic
  Btrfs: heuristic add detection of repeated data patterns
  Btrfs: heuristic add byte set calculation
  Btrfs: heuristic add byte core set calculation

 fs/btrfs/compression.c | 393 +++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 366 insertions(+), 27 deletions(-)

--
2.14.2

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v8 1/6] Btrfs: compression.c separated heuristic/compression workspaces
  2017-09-28 14:33 [PATCH v8 0/6] Btrfs: populate heuristic with code Timofey Titovets
@ 2017-09-28 14:33 ` Timofey Titovets
  2017-09-28 14:33 ` [PATCH v8 2/6] Btrfs: heuristic workspace add bucket and sample items, macros Timofey Titovets
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 14+ messages in thread
From: Timofey Titovets @ 2017-09-28 14:33 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Timofey Titovets

Compression heuristic itself is not a compression type,
as current infrastructure supposed to provide workspaces
for several compression types, it's difficult to just add
heuristic workspace.

Just refactor the code to support compression/heuristic
workspaces with maximum code sharing and minimum changes in it.

Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
---
 fs/btrfs/compression.c | 138 ++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 120 insertions(+), 18 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index b51d23f5cafa..c3624e8e3919 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -690,7 +690,33 @@ blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
 	return ret;
 }

-static struct {
+
+struct heuristic_ws {
+	struct list_head list;
+};
+
+static void free_heuristic_ws(struct list_head *ws)
+{
+	struct heuristic_ws *workspace;
+
+	workspace = list_entry(ws, struct heuristic_ws, list);
+
+	kfree(workspace);
+}
+
+static struct list_head *alloc_heuristic_ws(void){
+	struct heuristic_ws *ws;
+
+	ws = kzalloc(sizeof(*ws), GFP_KERNEL);
+	if (!ws)
+		return ERR_PTR(-ENOMEM);
+
+	INIT_LIST_HEAD(&ws->list);
+
+	return &ws->list;
+}
+
+struct workspaces_list {
 	struct list_head idle_ws;
 	spinlock_t ws_lock;
 	/* Number of free workspaces */
@@ -699,7 +725,11 @@ static struct {
 	atomic_t total_ws;
 	/* Waiters for a free workspace */
 	wait_queue_head_t ws_wait;
-} btrfs_comp_ws[BTRFS_COMPRESS_TYPES];
+};
+
+static struct workspaces_list btrfs_comp_ws[BTRFS_COMPRESS_TYPES];
+
+static struct workspaces_list btrfs_heuristic_ws;

 static const struct btrfs_compress_op * const btrfs_compress_op[] = {
 	&btrfs_zlib_compress,
@@ -709,11 +739,24 @@ static const struct btrfs_compress_op * const btrfs_compress_op[] = {

 void __init btrfs_init_compress(void)
 {
+	struct list_head *workspace;
 	int i;

-	for (i = 0; i < BTRFS_COMPRESS_TYPES; i++) {
-		struct list_head *workspace;
+	INIT_LIST_HEAD(&btrfs_heuristic_ws.idle_ws);
+	spin_lock_init(&btrfs_heuristic_ws.ws_lock);
+	atomic_set(&btrfs_heuristic_ws.total_ws, 0);
+	init_waitqueue_head(&btrfs_heuristic_ws.ws_wait);

+	workspace = alloc_heuristic_ws();
+	if (IS_ERR(workspace)) {
+		pr_warn("BTRFS: cannot preallocate heuristic workspace, will try later\n");
+	} else {
+		atomic_set(&btrfs_heuristic_ws.total_ws, 1);
+		btrfs_heuristic_ws.free_ws = 1;
+		list_add(workspace, &btrfs_heuristic_ws.idle_ws);
+	}
+
+	for (i = 0; i < BTRFS_COMPRESS_TYPES; i++) {
 		INIT_LIST_HEAD(&btrfs_comp_ws[i].idle_ws);
 		spin_lock_init(&btrfs_comp_ws[i].ws_lock);
 		atomic_set(&btrfs_comp_ws[i].total_ws, 0);
@@ -740,18 +783,33 @@ void __init btrfs_init_compress(void)
  * Preallocation makes a forward progress guarantees and we do not return
  * errors.
  */
-static struct list_head *find_workspace(int type)
+static struct list_head *__find_workspace(int type, bool heuristic)
 {
 	struct list_head *workspace;
 	int cpus = num_online_cpus();
 	int idx = type - 1;
 	unsigned nofs_flag;

-	struct list_head *idle_ws	= &btrfs_comp_ws[idx].idle_ws;
-	spinlock_t *ws_lock		= &btrfs_comp_ws[idx].ws_lock;
-	atomic_t *total_ws		= &btrfs_comp_ws[idx].total_ws;
-	wait_queue_head_t *ws_wait	= &btrfs_comp_ws[idx].ws_wait;
-	int *free_ws			= &btrfs_comp_ws[idx].free_ws;
+	struct list_head *idle_ws;
+	spinlock_t *ws_lock;
+	atomic_t *total_ws;
+	wait_queue_head_t *ws_wait;
+	int *free_ws;
+
+	if (!heuristic) {
+		idle_ws		= &btrfs_comp_ws[idx].idle_ws;
+		ws_lock		= &btrfs_comp_ws[idx].ws_lock;
+		total_ws	= &btrfs_comp_ws[idx].total_ws;
+		ws_wait		= &btrfs_comp_ws[idx].ws_wait;
+		free_ws		= &btrfs_comp_ws[idx].free_ws;
+	} else {
+		idle_ws		= &btrfs_heuristic_ws.idle_ws;
+		ws_lock		= &btrfs_heuristic_ws.ws_lock;
+		total_ws	= &btrfs_heuristic_ws.total_ws;
+		ws_wait		= &btrfs_heuristic_ws.ws_wait;
+		free_ws		= &btrfs_heuristic_ws.free_ws;
+	}
+
 again:
 	spin_lock(ws_lock);
 	if (!list_empty(idle_ws)) {
@@ -781,7 +839,10 @@ static struct list_head *find_workspace(int type)
 	 * context of btrfs_compress_bio/btrfs_compress_pages
 	 */
 	nofs_flag = memalloc_nofs_save();
-	workspace = btrfs_compress_op[idx]->alloc_workspace();
+	if (!heuristic)
+		workspace = btrfs_compress_op[idx]->alloc_workspace();
+	else
+		workspace = alloc_heuristic_ws();
 	memalloc_nofs_restore(nofs_flag);

 	if (IS_ERR(workspace)) {
@@ -812,18 +873,38 @@ static struct list_head *find_workspace(int type)
 	return workspace;
 }

+static struct list_head *find_workspace(int type)
+{
+	return __find_workspace(type, false);
+}
+
 /*
  * put a workspace struct back on the list or free it if we have enough
  * idle ones sitting around
  */
-static void free_workspace(int type, struct list_head *workspace)
+static void __free_workspace(int type, struct list_head *workspace,
+			     bool heuristic)
 {
 	int idx = type - 1;
-	struct list_head *idle_ws	= &btrfs_comp_ws[idx].idle_ws;
-	spinlock_t *ws_lock		= &btrfs_comp_ws[idx].ws_lock;
-	atomic_t *total_ws		= &btrfs_comp_ws[idx].total_ws;
-	wait_queue_head_t *ws_wait	= &btrfs_comp_ws[idx].ws_wait;
-	int *free_ws			= &btrfs_comp_ws[idx].free_ws;
+	struct list_head *idle_ws;
+	spinlock_t *ws_lock;
+	atomic_t *total_ws;
+	wait_queue_head_t *ws_wait;
+	int *free_ws;
+
+	if (!heuristic) {
+		idle_ws		= &btrfs_comp_ws[idx].idle_ws;
+		ws_lock		= &btrfs_comp_ws[idx].ws_lock;
+		total_ws	= &btrfs_comp_ws[idx].total_ws;
+		ws_wait		= &btrfs_comp_ws[idx].ws_wait;
+		free_ws		= &btrfs_comp_ws[idx].free_ws;
+	} else {
+		idle_ws		= &btrfs_heuristic_ws.idle_ws;
+		ws_lock		= &btrfs_heuristic_ws.ws_lock;
+		total_ws	= &btrfs_heuristic_ws.total_ws;
+		ws_wait		= &btrfs_heuristic_ws.ws_wait;
+		free_ws		= &btrfs_heuristic_ws.free_ws;
+	}

 	spin_lock(ws_lock);
 	if (*free_ws <= num_online_cpus()) {
@@ -834,7 +915,10 @@ static void free_workspace(int type, struct list_head *workspace)
 	}
 	spin_unlock(ws_lock);

-	btrfs_compress_op[idx]->free_workspace(workspace);
+	if (!heuristic)
+		btrfs_compress_op[idx]->free_workspace(workspace);
+	else
+		free_heuristic_ws(workspace);
 	atomic_dec(total_ws);
 wake:
 	/*
@@ -845,6 +929,11 @@ static void free_workspace(int type, struct list_head *workspace)
 		wake_up(ws_wait);
 }

+static void free_workspace(int type, struct list_head *ws)
+{
+	return __free_workspace(type, ws, false);
+}
+
 /*
  * cleanup function for module exit
  */
@@ -853,6 +942,13 @@ static void free_workspaces(void)
 	struct list_head *workspace;
 	int i;

+	while (!list_empty(&btrfs_heuristic_ws.idle_ws)) {
+		workspace = btrfs_heuristic_ws.idle_ws.next;
+		list_del(workspace);
+		free_heuristic_ws(workspace);
+		atomic_dec(&btrfs_heuristic_ws.total_ws);
+	}
+
 	for (i = 0; i < BTRFS_COMPRESS_TYPES; i++) {
 		while (!list_empty(&btrfs_comp_ws[i].idle_ws)) {
 			workspace = btrfs_comp_ws[i].idle_ws.next;
@@ -1066,11 +1162,15 @@ int btrfs_decompress_buf2page(const char *buf, unsigned long buf_start,
  */
 int btrfs_compress_heuristic(struct inode *inode, u64 start, u64 end)
 {
+	struct list_head *ws_list = __find_workspace(0, true);
+	struct heuristic_ws *ws;
 	u64 index = start >> PAGE_SHIFT;
 	u64 end_index = end >> PAGE_SHIFT;
 	struct page *page;
 	int ret = 1;

+	ws = list_entry(ws_list, struct heuristic_ws, list);
+
 	while (index <= end_index) {
 		page = find_get_page(inode->i_mapping, index);
 		kmap(page);
@@ -1079,5 +1179,7 @@ int btrfs_compress_heuristic(struct inode *inode, u64 start, u64 end)
 		index++;
 	}

+	__free_workspace(0, ws_list, true);
+
 	return ret;
 }
--
2.14.2

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v8 2/6] Btrfs: heuristic workspace add bucket and sample items, macros
  2017-09-28 14:33 [PATCH v8 0/6] Btrfs: populate heuristic with code Timofey Titovets
  2017-09-28 14:33 ` [PATCH v8 1/6] Btrfs: compression.c separated heuristic/compression workspaces Timofey Titovets
@ 2017-09-28 14:33 ` Timofey Titovets
  2017-09-28 14:33 ` [PATCH v8 3/6] Btrfs: implement heuristic sampling logic Timofey Titovets
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 14+ messages in thread
From: Timofey Titovets @ 2017-09-28 14:33 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Timofey Titovets

Added macros:
 - For future sampling algo
 - For bucket size

Heuristic workspace:
 - Add bucket for storing byte type counters
 - Add sample array for storing partial copy of
   input data range
 - Add counter for store current sample size to workspace

Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
---
 fs/btrfs/compression.c | 57 +++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 56 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index c3624e8e3919..1715655d050e 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -691,7 +691,50 @@ blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
 }


+/*
+ * Heuristic use systematic sampling to collect data from
+ * input data range, so some constant needed to control algo
+ *
+ * @SAMPLING_READ_SIZE - how many bytes will be copied from on each sample
+ * @SAMPLING_INTERVAL  - period that used to iterate over input data range
+ */
+#define SAMPLING_READ_SIZE 16
+#define SAMPLING_INTERVAL 256
+
+/*
+ * For statistic analize of input data range
+ * consider that data consists from bytes
+ * so this is Galois Field with 256 objects
+ * each object have attribute count, i.e. how many times
+ * that object detected in sample
+ */
+#define BUCKET_SIZE 256
+
+/*
+ * The size of the sample is based on a statistical sampling rule of thumb.
+ * That common to perform sampling tests as long as number of elements in
+ * each cell is at least five.
+ *
+ * Instead of five, for now choose 32 value to obtain more accurate results.
+ * If the data contains the maximum number of symbols, which is 256,
+ * lets obtain a sample size bound of 8192.
+ *
+ * So sample at most 8KB of data per data range: 16 consecutive
+ * bytes from up to 512 locations.
+ */
+#define MAX_SAMPLE_SIZE (BTRFS_MAX_UNCOMPRESSED * \
+			  SAMPLING_READ_SIZE / SAMPLING_INTERVAL)
+
+
+struct bucket_item {
+	u32 count;
+};
+
 struct heuristic_ws {
+	/* Partial copy of input data */
+	u8 *sample;
+	/* Bucket store counter for each byte type */
+	struct bucket_item *bucket;
 	struct list_head list;
 };

@@ -701,6 +744,8 @@ static void free_heuristic_ws(struct list_head *ws)

 	workspace = list_entry(ws, struct heuristic_ws, list);

+	kvfree(workspace->sample);
+	kfree(workspace->bucket);
 	kfree(workspace);
 }

@@ -711,9 +756,19 @@ static struct list_head *alloc_heuristic_ws(void){
 	if (!ws)
 		return ERR_PTR(-ENOMEM);

-	INIT_LIST_HEAD(&ws->list);
+	ws->sample = kvmalloc(MAX_SAMPLE_SIZE, GFP_KERNEL);
+	if (!ws->sample)
+		goto fail;
+
+	ws->bucket = kcalloc(BUCKET_SIZE, sizeof(*ws->bucket), GFP_KERNEL);
+	if (!ws->bucket)
+		goto fail;

+	INIT_LIST_HEAD(&ws->list);
 	return &ws->list;
+fail:
+	free_heuristic_ws(&ws->list);
+	return ERR_PTR(-ENOMEM);
 }

 struct workspaces_list {
--
2.14.2

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v8 3/6] Btrfs: implement heuristic sampling logic
  2017-09-28 14:33 [PATCH v8 0/6] Btrfs: populate heuristic with code Timofey Titovets
  2017-09-28 14:33 ` [PATCH v8 1/6] Btrfs: compression.c separated heuristic/compression workspaces Timofey Titovets
  2017-09-28 14:33 ` [PATCH v8 2/6] Btrfs: heuristic workspace add bucket and sample items, macros Timofey Titovets
@ 2017-09-28 14:33 ` Timofey Titovets
  2017-09-28 14:33 ` [PATCH v8 4/6] Btrfs: heuristic add detection of repeated data patterns Timofey Titovets
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 14+ messages in thread
From: Timofey Titovets @ 2017-09-28 14:33 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Timofey Titovets

Copy sample data from input data range to sample buffer
then calculate byte type count for that sample into bucket.

Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
---
 fs/btrfs/compression.c | 71 +++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 61 insertions(+), 10 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 1715655d050e..e2419639ae7f 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -725,7 +725,6 @@ blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
 #define MAX_SAMPLE_SIZE (BTRFS_MAX_UNCOMPRESSED * \
 			  SAMPLING_READ_SIZE / SAMPLING_INTERVAL)

-
 struct bucket_item {
 	u32 count;
 };
@@ -733,6 +732,7 @@ struct bucket_item {
 struct heuristic_ws {
 	/* Partial copy of input data */
 	u8 *sample;
+	u32 sample_size;
 	/* Bucket store counter for each byte type */
 	struct bucket_item *bucket;
 	struct list_head list;
@@ -1200,6 +1200,57 @@ int btrfs_decompress_buf2page(const char *buf, unsigned long buf_start,
 	return 1;
 }

+static void heuristic_collect_sample(struct inode *inode, u64 start, u64 end,
+				     struct heuristic_ws *ws)
+{
+	struct page *page;
+	u64 index, index_end;
+	u32 i, curr_sample_pos;
+	u8 *in_data;
+
+	/*
+	 * Compression only handle first 128kb of input range
+	 * And just shift over range in loop for compressing it.
+	 * Let's do the same.
+	 *
+	 * MAX_SAMPLE_SIZE - calculated in assume that heuristic will process
+	 * not more then BTRFS_MAX_UNCOMPRESSED at run
+	 */
+	if (end - start > BTRFS_MAX_UNCOMPRESSED)
+		end = start + BTRFS_MAX_UNCOMPRESSED;
+
+	index = start >> PAGE_SHIFT;
+	index_end = end >> PAGE_SHIFT;
+
+	/* Don't miss unaligned end */
+	if (!IS_ALIGNED(end, PAGE_SIZE))
+		index_end++;
+
+	curr_sample_pos = 0;
+	while (index < index_end) {
+		page = find_get_page(inode->i_mapping, index);
+		in_data = kmap(page);
+		/* Handle case where start unaligned to PAGE_SIZE */
+		i = start % PAGE_SIZE;
+		while (i < PAGE_SIZE - SAMPLING_READ_SIZE) {
+			/* Don't sample mem trash from last page */
+			if (start > end - SAMPLING_READ_SIZE)
+				break;
+			memcpy(&ws->sample[curr_sample_pos],
+			       &in_data[i], SAMPLING_READ_SIZE);
+			i += SAMPLING_INTERVAL;
+			start += SAMPLING_INTERVAL;
+			curr_sample_pos += SAMPLING_READ_SIZE;
+		}
+		kunmap(page);
+		put_page(page);
+
+		index++;
+	}
+
+	ws->sample_size = curr_sample_pos;
+}
+
 /*
  * Compression heuristic.
  *
@@ -1219,19 +1270,19 @@ int btrfs_compress_heuristic(struct inode *inode, u64 start, u64 end)
 {
 	struct list_head *ws_list = __find_workspace(0, true);
 	struct heuristic_ws *ws;
-	u64 index = start >> PAGE_SHIFT;
-	u64 end_index = end >> PAGE_SHIFT;
-	struct page *page;
+	u32 i;
+	u8 byte;
 	int ret = 1;

 	ws = list_entry(ws_list, struct heuristic_ws, list);

-	while (index <= end_index) {
-		page = find_get_page(inode->i_mapping, index);
-		kmap(page);
-		kunmap(page);
-		put_page(page);
-		index++;
+	heuristic_collect_sample(inode, start, end, ws);
+
+	memset(ws->bucket, 0, sizeof(*ws->bucket)*BUCKET_SIZE);
+
+	for (i = 0; i < ws->sample_size; i++) {
+		byte = ws->sample[i];
+		ws->bucket[byte].count++;
 	}

 	__free_workspace(0, ws_list, true);
--
2.14.2

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v8 4/6] Btrfs: heuristic add detection of repeated data patterns
  2017-09-28 14:33 [PATCH v8 0/6] Btrfs: populate heuristic with code Timofey Titovets
                   ` (2 preceding siblings ...)
  2017-09-28 14:33 ` [PATCH v8 3/6] Btrfs: implement heuristic sampling logic Timofey Titovets
@ 2017-09-28 14:33 ` Timofey Titovets
  2017-09-28 14:33 ` [PATCH v8 5/6] Btrfs: heuristic add byte set calculation Timofey Titovets
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 14+ messages in thread
From: Timofey Titovets @ 2017-09-28 14:33 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Timofey Titovets

Walk over data sample and use memcmp to detect
repeated data (like zeroed)

Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
---
 fs/btrfs/compression.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index e2419639ae7f..1cb4df023d5e 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -1200,6 +1200,16 @@ int btrfs_decompress_buf2page(const char *buf, unsigned long buf_start,
 	return 1;
 }

+
+static bool sample_repeated_patterns(struct heuristic_ws *ws)
+{
+	u32 half_of_sample = ws->sample_size / 2;
+	u8 *p = ws->sample;
+
+	return !memcmp(&p[0], &p[half_of_sample], half_of_sample);
+}
+
+
 static void heuristic_collect_sample(struct inode *inode, u64 start, u64 end,
 				     struct heuristic_ws *ws)
 {
@@ -1278,6 +1288,11 @@ int btrfs_compress_heuristic(struct inode *inode, u64 start, u64 end)

 	heuristic_collect_sample(inode, start, end, ws);

+	if (sample_repeated_patterns(ws)) {
+		ret = 1;
+		goto out;
+	}
+
 	memset(ws->bucket, 0, sizeof(*ws->bucket)*BUCKET_SIZE);

 	for (i = 0; i < ws->sample_size; i++) {
@@ -1285,7 +1300,7 @@ int btrfs_compress_heuristic(struct inode *inode, u64 start, u64 end)
 		ws->bucket[byte].count++;
 	}

+out:
 	__free_workspace(0, ws_list, true);
-
 	return ret;
 }
--
2.14.2

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v8 5/6] Btrfs: heuristic add byte set calculation
  2017-09-28 14:33 [PATCH v8 0/6] Btrfs: populate heuristic with code Timofey Titovets
                   ` (3 preceding siblings ...)
  2017-09-28 14:33 ` [PATCH v8 4/6] Btrfs: heuristic add detection of repeated data patterns Timofey Titovets
@ 2017-09-28 14:33 ` Timofey Titovets
  2017-09-28 14:33 ` [PATCH v8 6/6] Btrfs: heuristic add byte core " Timofey Titovets
  2017-09-29 16:22 ` [PATCH v8 0/6] Btrfs: populate heuristic with code David Sterba
  6 siblings, 0 replies; 14+ messages in thread
From: Timofey Titovets @ 2017-09-28 14:33 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Timofey Titovets

Calculate byte set size for data sample:
Calculate how many unique bytes has been in sample
By count all bytes in bucket with count > 0
If byte set low (~25%), data are easily compressible
Otherwise need additional analize

Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
---
 fs/btrfs/compression.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 48 insertions(+)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 1cb4df023d5e..83daf2f9d051 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -1201,6 +1201,48 @@ int btrfs_decompress_buf2page(const char *buf, unsigned long buf_start,
 }


+/*
+ * Count byte types in bucket
+ * That heuristic can detect text like data (configs, xml, json, html & etc)
+ * Because in most text like data byte set are restricted to limit number
+ * of possible characters, and that restriction in most cases
+ * make data easy compressible.
+ *
+ * @BYTE_SET_THRESHOLD - assume that all data with that byte set size:
+ *	less - compressible
+ *	more - need additional analize
+ */
+
+#define BYTE_SET_THRESHOLD 64
+
+static u32 byte_set_size(const struct heuristic_ws *ws)
+{
+	u32 i;
+	u32 byte_set_size = 0;
+
+	for (i = 0; i < BYTE_SET_THRESHOLD; i++) {
+		if (ws->bucket[i].count > 0)
+			byte_set_size++;
+	}
+
+	/*
+	 * Continue collecting count of byte types in bucket
+	 * If byte set size bigger then threshold
+	 * That useless to continue, because for that data type
+	 * detection technique fail
+	 */
+	for (; i < BUCKET_SIZE; i++) {
+		if (ws->bucket[i].count > 0) {
+			byte_set_size++;
+			if (byte_set_size > BYTE_SET_THRESHOLD)
+				return byte_set_size;
+		}
+	}
+
+	return byte_set_size;
+}
+
+
 static bool sample_repeated_patterns(struct heuristic_ws *ws)
 {
 	u32 half_of_sample = ws->sample_size / 2;
@@ -1300,6 +1342,12 @@ int btrfs_compress_heuristic(struct inode *inode, u64 start, u64 end)
 		ws->bucket[byte].count++;
 	}

+	i = byte_set_size(ws);
+	if (i < BYTE_SET_THRESHOLD) {
+		ret = 2;
+		goto out;
+	}
+
 out:
 	__free_workspace(0, ws_list, true);
 	return ret;
--
2.14.2

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v8 6/6] Btrfs: heuristic add byte core set calculation
  2017-09-28 14:33 [PATCH v8 0/6] Btrfs: populate heuristic with code Timofey Titovets
                   ` (4 preceding siblings ...)
  2017-09-28 14:33 ` [PATCH v8 5/6] Btrfs: heuristic add byte set calculation Timofey Titovets
@ 2017-09-28 14:33 ` Timofey Titovets
  2017-09-29 16:22 ` [PATCH v8 0/6] Btrfs: populate heuristic with code David Sterba
  6 siblings, 0 replies; 14+ messages in thread
From: Timofey Titovets @ 2017-09-28 14:33 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Timofey Titovets

Calculate byte core set for data sample:
Sort bucket's numbers in decreasing order
Count how many numbers use 90% of sample
If core set are low (<=25%), data are easily compressible
If core set high (>=80%), data are not compressible

Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
---
 fs/btrfs/compression.c | 68 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 68 insertions(+)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 83daf2f9d051..1aa04ae214a7 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -33,6 +33,7 @@
 #include <linux/bit_spinlock.h>
 #include <linux/slab.h>
 #include <linux/sched/mm.h>
+#include <linux/sort.h>
 #include "ctree.h"
 #include "disk-io.h"
 #include "transaction.h"
@@ -1201,6 +1202,62 @@ int btrfs_decompress_buf2page(const char *buf, unsigned long buf_start,
 }


+/* Compare buckets by size, ascending */
+static inline int bucket_comp_rev(const void *lv, const void *rv)
+{
+	const struct bucket_item *l = (struct bucket_item *)(lv);
+	const struct bucket_item *r = (struct bucket_item *)(rv);
+
+	return r->count - l->count;
+}
+
+/*
+ * Byte Core set size
+ * How many bytes use 90% of sample
+ *
+ * Several type of structure d binary data have in general
+ * nearly all types of bytes, but distribution can be Uniform
+ * where in bucket all byte types will have the nearly same count
+ * (ex. Encrypted data)
+ * and as ex. Normal (Gaussian), where count of bytes will be not so linear
+ * in that case data can be compressible, probably compressible, and
+ * not compressible, so assume:
+ *
+ * @BYTE_CORE_SET_LOW - main part of byte types repeated frequently
+ *                      compression algo can easy fix that
+ * @BYTE_CORE_SET_HIGH - data have Uniform distribution and with high
+ *                       probability not compressible
+ *
+ */
+
+#define BYTE_CORE_SET_LOW  64
+#define BYTE_CORE_SET_HIGH 200
+
+static int byte_core_set_size(struct heuristic_ws *ws)
+{
+	u32 i;
+	u32 coreset_sum = 0;
+	u32 core_set_threshold = ws->sample_size * 90 / 100;
+	struct bucket_item *bucket = ws->bucket;
+
+	/* Sort in reverse order */
+	sort(bucket, BUCKET_SIZE, sizeof(*bucket), &bucket_comp_rev, NULL);
+
+	for (i = 0; i < BYTE_CORE_SET_LOW; i++)
+		coreset_sum += bucket[i].count;
+
+	if (coreset_sum > core_set_threshold)
+		return i;
+
+	for (; i < BYTE_CORE_SET_HIGH && bucket[i].count > 0; i++) {
+		coreset_sum += bucket[i].count;
+		if (coreset_sum > core_set_threshold)
+			break;
+	}
+
+	return i;
+}
+
 /*
  * Count byte types in bucket
  * That heuristic can detect text like data (configs, xml, json, html & etc)
@@ -1348,6 +1405,17 @@ int btrfs_compress_heuristic(struct inode *inode, u64 start, u64 end)
 		goto out;
 	}

+	i = byte_core_set_size(ws);
+	if (i <= BYTE_CORE_SET_LOW) {
+		ret = 3;
+		goto out;
+	}
+
+	if (i >= BYTE_CORE_SET_HIGH) {
+		ret = 0;
+		goto out;
+	}
+
 out:
 	__free_workspace(0, ws_list, true);
 	return ret;
--
2.14.2

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v8 0/6] Btrfs: populate heuristic with code
  2017-09-28 14:33 [PATCH v8 0/6] Btrfs: populate heuristic with code Timofey Titovets
                   ` (5 preceding siblings ...)
  2017-09-28 14:33 ` [PATCH v8 6/6] Btrfs: heuristic add byte core " Timofey Titovets
@ 2017-09-29 16:22 ` David Sterba
  2017-10-19 15:39   ` David Sterba
  6 siblings, 1 reply; 14+ messages in thread
From: David Sterba @ 2017-09-29 16:22 UTC (permalink / raw)
  To: Timofey Titovets; +Cc: linux-btrfs

On Thu, Sep 28, 2017 at 05:33:35PM +0300, Timofey Titovets wrote:
> Compile tested, hand tested on live system
> 
> Change v7 -> v8
>   - All code moved to compression.c (again)
>   - Heuristic workspaces inmplemented another way
>     i.e. only share logic with compression workspaces
>   - Some style fixes suggested by Devid
>   - Move sampling function from heuristic code
>     (I'm afraid of big functions)
>   - Much more comments and explanations

Thanks for the update, I went through the patches and they looked good
enough to be put into for-next. I may have more comments about a few
things, but nothing serious that would hinder testing.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v8 0/6] Btrfs: populate heuristic with code
  2017-09-29 16:22 ` [PATCH v8 0/6] Btrfs: populate heuristic with code David Sterba
@ 2017-10-19 15:39   ` David Sterba
  2017-10-19 22:48     ` Timofey Titovets
  0 siblings, 1 reply; 14+ messages in thread
From: David Sterba @ 2017-10-19 15:39 UTC (permalink / raw)
  To: Timofey Titovets; +Cc: David Sterba, linux-btrfs

On Fri, Sep 29, 2017 at 06:22:00PM +0200, David Sterba wrote:
> On Thu, Sep 28, 2017 at 05:33:35PM +0300, Timofey Titovets wrote:
> > Compile tested, hand tested on live system
> > 
> > Change v7 -> v8
> >   - All code moved to compression.c (again)
> >   - Heuristic workspaces inmplemented another way
> >     i.e. only share logic with compression workspaces
> >   - Some style fixes suggested by Devid
> >   - Move sampling function from heuristic code
> >     (I'm afraid of big functions)
> >   - Much more comments and explanations
> 
> Thanks for the update, I went through the patches and they looked good
> enough to be put into for-next. I may have more comments about a few
> things, but nothing serious that would hinder testing.

I did a final pass through the patches and edited comments wehre I was
not able to undrerstand them. Please check the updated patches in [1] if
I did not accidentally change the meaning.

I'm about to add the patchset to the main patch pile for 4.15 soon.
Further tuning is possible and such patches will be probably accepted
during the 4.15 development cycle once the as parts have landed. It's
desirable to gather some testing results of heuristic effects on various
data types. So far I've been watching for performance drops only.

In case the heuristic would turn out to cause problems we can't fix
during 4.15 cycle, we can still disable it. This is only a last resort
measure but we need to be prepared.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v8 0/6] Btrfs: populate heuristic with code
  2017-10-19 15:39   ` David Sterba
@ 2017-10-19 22:48     ` Timofey Titovets
  2017-10-20 13:45       ` David Sterba
  0 siblings, 1 reply; 14+ messages in thread
From: Timofey Titovets @ 2017-10-19 22:48 UTC (permalink / raw)
  To: David Sterba, Timofey Titovets, linux-btrfs

2017-10-19 18:39 GMT+03:00 David Sterba <dsterba@suse.cz>:
> On Fri, Sep 29, 2017 at 06:22:00PM +0200, David Sterba wrote:
>> On Thu, Sep 28, 2017 at 05:33:35PM +0300, Timofey Titovets wrote:
>> > Compile tested, hand tested on live system
>> >
>> > Change v7 -> v8
>> >   - All code moved to compression.c (again)
>> >   - Heuristic workspaces inmplemented another way
>> >     i.e. only share logic with compression workspaces
>> >   - Some style fixes suggested by Devid
>> >   - Move sampling function from heuristic code
>> >     (I'm afraid of big functions)
>> >   - Much more comments and explanations
>>
>> Thanks for the update, I went through the patches and they looked good
>> enough to be put into for-next. I may have more comments about a few
>> things, but nothing serious that would hinder testing.
>
> I did a final pass through the patches and edited comments wehre I was
> not able to undrerstand them. Please check the updated patches in [1] if
> I did not accidentally change the meaning.

I don't see a link [1] in mail, may be you missed it?
I look at my patches in for-next branch, and that's not looks like
changed, so i assume your link not point at kernel.org %).

> I'm about to add the patchset to the main patch pile for 4.15 soon.
> Further tuning is possible and such patches will be probably accepted
> during the 4.15 development cycle once the as parts have landed. It's
> desirable to gather some testing results of heuristic effects on various
> data types. So far I've been watching for performance drops only.

Just for my information, you compare compress + heuristic with
compression force?

P.S.
Just to sync that we expect from heuristic:
it's expected to get some performance drops on easy compressible data, because
heuristic not free,
but how much this drops?
Main reason for heuristic, it's to win cpu/latency cost for bad
compressible data.
In compare to direct compression.
That allow to provide some worst case stable latency/throughput for userspace.

P.S.S.
I send some emails before, where i show slow paths in heuristic
(sort(), ilog2()).
So i expect that kernel can see same slow downs on that paths.
But i'm don't have enough skills for now, to perform kernel profiling.

> In case the heuristic would turn out to cause problems we can't fix
> during 4.15 cycle, we can still disable it. This is only a last resort
> measure but we need to be prepared.
kk

Thanks.


-- 
Have a nice day,
Timofey.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v8 0/6] Btrfs: populate heuristic with code
  2017-10-19 22:48     ` Timofey Titovets
@ 2017-10-20 13:45       ` David Sterba
  2017-10-22 13:44         ` Timofey Titovets
  0 siblings, 1 reply; 14+ messages in thread
From: David Sterba @ 2017-10-20 13:45 UTC (permalink / raw)
  To: Timofey Titovets; +Cc: linux-btrfs

On Fri, Oct 20, 2017 at 01:48:01AM +0300, Timofey Titovets wrote:
> 2017-10-19 18:39 GMT+03:00 David Sterba <dsterba@suse.cz>:
> > On Fri, Sep 29, 2017 at 06:22:00PM +0200, David Sterba wrote:
> >> On Thu, Sep 28, 2017 at 05:33:35PM +0300, Timofey Titovets wrote:
> >> > Compile tested, hand tested on live system
> >> >
> >> > Change v7 -> v8
> >> >   - All code moved to compression.c (again)
> >> >   - Heuristic workspaces inmplemented another way
> >> >     i.e. only share logic with compression workspaces
> >> >   - Some style fixes suggested by Devid
> >> >   - Move sampling function from heuristic code
> >> >     (I'm afraid of big functions)
> >> >   - Much more comments and explanations
> >>
> >> Thanks for the update, I went through the patches and they looked good
> >> enough to be put into for-next. I may have more comments about a few
> >> things, but nothing serious that would hinder testing.
> >
> > I did a final pass through the patches and edited comments wehre I was
> > not able to undrerstand them. Please check the updated patches in [1] if
> > I did not accidentally change the meaning.
> 
> I don't see a link [1] in mail, may be you missed it?

Yeah, sorry:
https://github.com/kdave/btrfs-devel/commits/ext/timofey/heuristic

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v8 0/6] Btrfs: populate heuristic with code
  2017-10-20 13:45       ` David Sterba
@ 2017-10-22 13:44         ` Timofey Titovets
  2017-10-23 18:36           ` Timofey Titovets
  2017-10-24 19:23           ` David Sterba
  0 siblings, 2 replies; 14+ messages in thread
From: Timofey Titovets @ 2017-10-22 13:44 UTC (permalink / raw)
  To: David Sterba, Timofey Titovets, linux-btrfs

2017-10-20 16:45 GMT+03:00 David Sterba <dsterba@suse.cz>:
> On Fri, Oct 20, 2017 at 01:48:01AM +0300, Timofey Titovets wrote:
>> 2017-10-19 18:39 GMT+03:00 David Sterba <dsterba@suse.cz>:
>> > On Fri, Sep 29, 2017 at 06:22:00PM +0200, David Sterba wrote:
>> >> On Thu, Sep 28, 2017 at 05:33:35PM +0300, Timofey Titovets wrote:
>> >> > Compile tested, hand tested on live system
>> >> >
>> >> > Change v7 -> v8
>> >> >   - All code moved to compression.c (again)
>> >> >   - Heuristic workspaces inmplemented another way
>> >> >     i.e. only share logic with compression workspaces
>> >> >   - Some style fixes suggested by Devid
>> >> >   - Move sampling function from heuristic code
>> >> >     (I'm afraid of big functions)
>> >> >   - Much more comments and explanations
>> >>
>> >> Thanks for the update, I went through the patches and they looked good
>> >> enough to be put into for-next. I may have more comments about a few
>> >> things, but nothing serious that would hinder testing.
>> >
>> > I did a final pass through the patches and edited comments wehre I was
>> > not able to undrerstand them. Please check the updated patches in [1] if
>> > I did not accidentally change the meaning.
>>
>> I don't see a link [1] in mail, may be you missed it?
>
> Yeah, sorry:
> https://github.com/kdave/btrfs-devel/commits/ext/timofey/heuristic

I did re-read updated comments, looks ok to me
(i only found one typo, leave a comment).


Thanks
-- 
Have a nice day,
Timofey.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v8 0/6] Btrfs: populate heuristic with code
  2017-10-22 13:44         ` Timofey Titovets
@ 2017-10-23 18:36           ` Timofey Titovets
  2017-10-24 19:23           ` David Sterba
  1 sibling, 0 replies; 14+ messages in thread
From: Timofey Titovets @ 2017-10-23 18:36 UTC (permalink / raw)
  To: David Sterba, Timofey Titovets, linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 2040 bytes --]

2017-10-22 16:44 GMT+03:00 Timofey Titovets <nefelim4ag@gmail.com>:
> 2017-10-20 16:45 GMT+03:00 David Sterba <dsterba@suse.cz>:
>> On Fri, Oct 20, 2017 at 01:48:01AM +0300, Timofey Titovets wrote:
>>> 2017-10-19 18:39 GMT+03:00 David Sterba <dsterba@suse.cz>:
>>> > On Fri, Sep 29, 2017 at 06:22:00PM +0200, David Sterba wrote:
>>> >> On Thu, Sep 28, 2017 at 05:33:35PM +0300, Timofey Titovets wrote:
>>> >> > Compile tested, hand tested on live system
>>> >> >
>>> >> > Change v7 -> v8
>>> >> >   - All code moved to compression.c (again)
>>> >> >   - Heuristic workspaces inmplemented another way
>>> >> >     i.e. only share logic with compression workspaces
>>> >> >   - Some style fixes suggested by Devid
>>> >> >   - Move sampling function from heuristic code
>>> >> >     (I'm afraid of big functions)
>>> >> >   - Much more comments and explanations
>>> >>
>>> >> Thanks for the update, I went through the patches and they looked good
>>> >> enough to be put into for-next. I may have more comments about a few
>>> >> things, but nothing serious that would hinder testing.
>>> >
>>> > I did a final pass through the patches and edited comments wehre I was
>>> > not able to undrerstand them. Please check the updated patches in [1] if
>>> > I did not accidentally change the meaning.
>>>
>>> I don't see a link [1] in mail, may be you missed it?
>>
>> Yeah, sorry:
>> https://github.com/kdave/btrfs-devel/commits/ext/timofey/heuristic
>
> I did re-read updated comments, looks ok to me
> (i only found one typo, leave a comment).
>
>
> Thanks
> --
> Have a nice day,
> Timofey.

Can you please try that patch? (in attach)

I think some time about performance hit of heuristic and
how to avoid using sorting,

That patch will try prefind min/max values (before sorting) in array,
and (max - min), used to filter edge data cases where
byte core size < 64 or bigger > 200
It's a bit hacky workaround =\,
That show a ~same speedup on my data set as show using of radix sort.
(i.e. x2 speed up)

Thanks.

-- 
Have a nice day,
Timofey.

[-- Attachment #2: 0001-Btrfs-heuristic-try-avoid-bucket-sorting-on-edge-dat.patch --]
[-- Type: text/x-patch, Size: 2144 bytes --]

From fb2a329828e64ad0e224a8cb97dbc17147149629 Mon Sep 17 00:00:00 2001
From: Timofey Titovets <nefelim4ag@gmail.com>
Date: Mon, 23 Oct 2017 21:24:29 +0300
Subject: [PATCH] Btrfs: heuristic try avoid bucket sorting on edge data cases

Heap sort used in kernel are too slow and costly,
So let's make some statistic assume about egde input data cases
Based on observation of difference between min/max values in bucket.

Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
---
 fs/btrfs/compression.c | 38 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 0ca16909894e..56b67ec4fb5b 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -1310,8 +1310,46 @@ static int byte_core_set_size(struct heuristic_ws *ws)
 	u32 i;
 	u32 coreset_sum = 0;
 	const u32 core_set_threshold = ws->sample_size * 90 / 100;
+	struct bucket_item *max, *min;
+	struct bucket_item tmp;
 	struct bucket_item *bucket = ws->bucket;
 
+
+	/* Presort for find min/max value */
+	max = &bucket[0];
+	min = &bucket[BUCKET_SIZE - 1];
+	for (i = 1; i < BUCKET_SIZE - 1; i++) {
+		if (bucket[i].count > max->count) {
+			tmp = *max;
+			*max = bucket[i];
+			bucket[i] = tmp;
+		}
+		if (bucket[i].count < min->count) {
+			tmp = *min;
+			*min = bucket[i];
+			bucket[i] = tmp;
+		}
+	}
+
+	/*
+	 * Hacks for avoid sorting on Edge data cases (sorting too constly)
+	 * i.e. that will fast filter easy compressible
+	 * and bad compressible data
+	 * Based on observation of number distribution on different data sets
+	 *
+	 * Assume 1: For bad compressible data distribution between min/max
+	 * will be less then 0.6% of sample size
+	 *
+	 * Assume 2: For good compressible data distribution between min/max
+	 * will be far bigger then 4% of sample size
+	 */
+
+	if (max->count - min->count < ws->sample_size * 6 / 1000)
+		return BYTE_CORE_SET_HIGH + 1;
+
+	if (max->count - min->count > ws->sample_size * 4 / 100)
+		return BYTE_CORE_SET_LOW - 1;
+
 	/* Sort in reverse order */
 	sort(bucket, BUCKET_SIZE, sizeof(*bucket), &bucket_comp_rev, NULL);
 
-- 
2.14.2


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v8 0/6] Btrfs: populate heuristic with code
  2017-10-22 13:44         ` Timofey Titovets
  2017-10-23 18:36           ` Timofey Titovets
@ 2017-10-24 19:23           ` David Sterba
  1 sibling, 0 replies; 14+ messages in thread
From: David Sterba @ 2017-10-24 19:23 UTC (permalink / raw)
  To: Timofey Titovets; +Cc: linux-btrfs

On Sun, Oct 22, 2017 at 04:44:04PM +0300, Timofey Titovets wrote:
> 2017-10-20 16:45 GMT+03:00 David Sterba <dsterba@suse.cz>:
> > On Fri, Oct 20, 2017 at 01:48:01AM +0300, Timofey Titovets wrote:
> >> 2017-10-19 18:39 GMT+03:00 David Sterba <dsterba@suse.cz>:
> >> > On Fri, Sep 29, 2017 at 06:22:00PM +0200, David Sterba wrote:
> >> >> On Thu, Sep 28, 2017 at 05:33:35PM +0300, Timofey Titovets wrote:
> >> >> > Compile tested, hand tested on live system
> >> >> >
> >> >> > Change v7 -> v8
> >> >> >   - All code moved to compression.c (again)
> >> >> >   - Heuristic workspaces inmplemented another way
> >> >> >     i.e. only share logic with compression workspaces
> >> >> >   - Some style fixes suggested by Devid
> >> >> >   - Move sampling function from heuristic code
> >> >> >     (I'm afraid of big functions)
> >> >> >   - Much more comments and explanations
> >> >>
> >> >> Thanks for the update, I went through the patches and they looked good
> >> >> enough to be put into for-next. I may have more comments about a few
> >> >> things, but nothing serious that would hinder testing.
> >> >
> >> > I did a final pass through the patches and edited comments wehre I was
> >> > not able to undrerstand them. Please check the updated patches in [1] if
> >> > I did not accidentally change the meaning.
> >>
> >> I don't see a link [1] in mail, may be you missed it?
> >
> > Yeah, sorry:
> > https://github.com/kdave/btrfs-devel/commits/ext/timofey/heuristic
> 
> I did re-read updated comments, looks ok to me
> (i only found one typo, leave a comment).

Thanks, fixed and branch added to the rest for 4.15.

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2017-10-24 19:25 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-09-28 14:33 [PATCH v8 0/6] Btrfs: populate heuristic with code Timofey Titovets
2017-09-28 14:33 ` [PATCH v8 1/6] Btrfs: compression.c separated heuristic/compression workspaces Timofey Titovets
2017-09-28 14:33 ` [PATCH v8 2/6] Btrfs: heuristic workspace add bucket and sample items, macros Timofey Titovets
2017-09-28 14:33 ` [PATCH v8 3/6] Btrfs: implement heuristic sampling logic Timofey Titovets
2017-09-28 14:33 ` [PATCH v8 4/6] Btrfs: heuristic add detection of repeated data patterns Timofey Titovets
2017-09-28 14:33 ` [PATCH v8 5/6] Btrfs: heuristic add byte set calculation Timofey Titovets
2017-09-28 14:33 ` [PATCH v8 6/6] Btrfs: heuristic add byte core " Timofey Titovets
2017-09-29 16:22 ` [PATCH v8 0/6] Btrfs: populate heuristic with code David Sterba
2017-10-19 15:39   ` David Sterba
2017-10-19 22:48     ` Timofey Titovets
2017-10-20 13:45       ` David Sterba
2017-10-22 13:44         ` Timofey Titovets
2017-10-23 18:36           ` Timofey Titovets
2017-10-24 19:23           ` David Sterba

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.