All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v5 0/6] Btrfs: populate heuristic with code
@ 2017-08-23  0:26 Timofey Titovets
  2017-08-23  0:26 ` [PATCH v5 1/6] Btrfs: heuristic make use compression workspaces Timofey Titovets
                   ` (5 more replies)
  0 siblings, 6 replies; 9+ messages in thread
From: Timofey Titovets @ 2017-08-23  0:26 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Timofey Titovets

Based on kdave for-next

Patches short:
1. Move heuristic to use compression workspaces
   Bit tricky, but works.

2. Add heuristic counters and buffer to workspaces

3. Implement simple input data sampling
   It's get 16 byte samples with 256 bytes shifts
   over input data. Collect info about how many
   different bytes (symbols) has been found in sample data

4. Implement check sample to zeroes
   Just check all bytes in sample to 0

5. Add code for calculate
   how many unique bytes has been found in sample data
   That can fast detect easy compressible data

6. Add code for calculate byte core set size
   i.e. how many unique bytes use 90% of sample data
   That code require that numbers in bucket must be sorted
   That can detect easy compressible data with many repeated bytes
   That can detect not compressible data with evenly distributed bytes

Changes v1 -> v2:
  - Change input data iterator shift 512 -> 256
  - Replace magic macro numbers with direct values
  - Drop useless symbol population in bucket
    as no one care about where and what symbol stored
    in bucket at now

Changes v2 -> v3 (only update #3 patch):
  - Fix u64 division problem by use u32 for input_size
  - Fix input size calculation start - end -> end - start
  - Add missing sort.h header

Changes v3 -> v4 (only update #1 patch):
  - Change counter type in bucket item u16 -> u32
  - Drop other fields from bucket item for now,
    no one use it

Change v4 -> v5
  - Move heuristic code to external file
  - Make heuristic use compression workspaces
  - Add check sample to zeroes

Timofey Titovets (6):
  Btrfs: heuristic make use compression workspaces
  Btrfs: heuristic workspace add bucket and sample items
  Btrfs: Implement heuristic sampling logic
  Btrfs: heuristic add detection of zeroed sample
  Btrfs: heuristic add byte set calculation
  Btrfs: heuristic add byte core set calculation

 fs/btrfs/Makefile      |   2 +-
 fs/btrfs/compression.c |  18 ++--
 fs/btrfs/compression.h |   7 +-
 fs/btrfs/heuristic.c   | 220 +++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 234 insertions(+), 13 deletions(-)
 create mode 100644 fs/btrfs/heuristic.c

--
2.14.1

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v5 1/6] Btrfs: heuristic make use compression workspaces
  2017-08-23  0:26 [PATCH v5 0/6] Btrfs: populate heuristic with code Timofey Titovets
@ 2017-08-23  0:26 ` Timofey Titovets
  2017-08-23  0:26 ` [PATCH v5 2/6] Btrfs: heuristic workspace add bucket and sample items Timofey Titovets
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: Timofey Titovets @ 2017-08-23  0:26 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Timofey Titovets

Move heuristic to external file
Implement compression workspaces support for
heuristic resources

Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
---
 fs/btrfs/Makefile      |  2 +-
 fs/btrfs/compression.c | 18 +++++--------
 fs/btrfs/compression.h |  7 ++++-
 fs/btrfs/heuristic.c   | 70 ++++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 84 insertions(+), 13 deletions(-)
 create mode 100644 fs/btrfs/heuristic.c

diff --git a/fs/btrfs/Makefile b/fs/btrfs/Makefile
index 128ce17a80b0..6fa8479dff43 100644
--- a/fs/btrfs/Makefile
+++ b/fs/btrfs/Makefile
@@ -9,7 +9,7 @@ btrfs-y += super.o ctree.o extent-tree.o print-tree.o root-tree.o dir-item.o \
 	   export.o tree-log.o free-space-cache.o zlib.o lzo.o \
 	   compression.o delayed-ref.o relocation.o delayed-inode.o scrub.o \
 	   reada.o backref.o ulist.o qgroup.o send.o dev-replace.o raid56.o \
-	   uuid-tree.o props.o hash.o free-space-tree.o
+	   uuid-tree.o props.o hash.o free-space-tree.o heuristic.o

 btrfs-$(CONFIG_BTRFS_FS_POSIX_ACL) += acl.o
 btrfs-$(CONFIG_BTRFS_FS_CHECK_INTEGRITY) += check-integrity.o
diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 883ecc58fd0d..f0aaf27bcc95 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -704,6 +704,7 @@ static struct {
 static const struct btrfs_compress_op * const btrfs_compress_op[] = {
 	&btrfs_zlib_compress,
 	&btrfs_lzo_compress,
+	&btrfs_heuristic,
 };

 void __init btrfs_init_compress(void)
@@ -1065,18 +1066,13 @@ int btrfs_decompress_buf2page(const char *buf, unsigned long buf_start,
  */
 int btrfs_compress_heuristic(struct inode *inode, u64 start, u64 end)
 {
-	u64 index = start >> PAGE_SHIFT;
-	u64 end_index = end >> PAGE_SHIFT;
-	struct page *page;
-	int ret = 1;
+	int ret;
+	enum btrfs_compression_type type = BTRFS_HEURISTIC;
+	struct list_head *workspace = find_workspace(type);

-	while (index <= end_index) {
-		page = find_get_page(inode->i_mapping, index);
-		kmap(page);
-		kunmap(page);
-		put_page(page);
-		index++;
-	}
+	ret = btrfs_compress_op[type-1]->heuristic(workspace, inode,
+						   start, end);

+	free_workspace(type, workspace);
 	return ret;
 }
diff --git a/fs/btrfs/compression.h b/fs/btrfs/compression.h
index 3b1b0ac15fdc..10e9ffa6dfa4 100644
--- a/fs/btrfs/compression.h
+++ b/fs/btrfs/compression.h
@@ -99,7 +99,8 @@ enum btrfs_compression_type {
 	BTRFS_COMPRESS_NONE  = 0,
 	BTRFS_COMPRESS_ZLIB  = 1,
 	BTRFS_COMPRESS_LZO   = 2,
-	BTRFS_COMPRESS_TYPES = 2,
+	BTRFS_HEURISTIC = 3,
+	BTRFS_COMPRESS_TYPES = 3,
 };

 struct btrfs_compress_op {
@@ -123,10 +124,14 @@ struct btrfs_compress_op {
 			  struct page *dest_page,
 			  unsigned long start_byte,
 			  size_t srclen, size_t destlen);
+
+	int (*heuristic)(struct list_head *workspace,
+			 struct inode *inode, u64 start, u64 end);
 };

 extern const struct btrfs_compress_op btrfs_zlib_compress;
 extern const struct btrfs_compress_op btrfs_lzo_compress;
+extern const struct btrfs_compress_op btrfs_heuristic;

 int btrfs_compress_heuristic(struct inode *inode, u64 start, u64 end);

diff --git a/fs/btrfs/heuristic.c b/fs/btrfs/heuristic.c
new file mode 100644
index 000000000000..96ae3e9334bc
--- /dev/null
+++ b/fs/btrfs/heuristic.c
@@ -0,0 +1,70 @@
+/*
+ * Copyright (c) 2017
+ * All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License v2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ */
+
+#include <linux/kernel.h>
+#include <linux/types.h>
+#include <linux/sizes.h>
+#include <linux/pagemap.h>
+#include <linux/string.h>
+#include <linux/bio.h>
+#include "compression.h"
+
+struct workspace {
+	struct list_head list;
+};
+
+static void heuristic_free_workspace(struct list_head *ws)
+{
+	struct workspace *workspace = list_entry(ws, struct workspace, list);
+	kfree(workspace);
+}
+
+static struct list_head *heuristic_alloc_workspace(void)
+{
+	struct workspace *workspace;
+
+	workspace = kzalloc(sizeof(*workspace), GFP_KERNEL);
+	if (!workspace)
+		return ERR_PTR(-ENOMEM);
+
+	INIT_LIST_HEAD(&workspace->list);
+
+	return &workspace->list;
+}
+
+static int heuristic(struct list_head *ws, struct inode *inode,
+		     u64 start, u64 end)
+{
+	struct page *page;
+	u64 index, index_end;
+	u8 *input_data;
+
+	index = start >> PAGE_SHIFT;
+	index_end = end >> PAGE_SHIFT;
+
+	for (; index <= index_end; index++) {
+		page = find_get_page(inode->i_mapping, index);
+		input_data = kmap(page);
+		kunmap(page);
+		put_page(page);
+	}
+
+	return 1;
+}
+
+const struct btrfs_compress_op btrfs_heuristic = {
+	.alloc_workspace	= heuristic_alloc_workspace,
+	.free_workspace		= heuristic_free_workspace,
+	.heuristic              = heuristic,
+};
--
2.14.1

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v5 2/6] Btrfs: heuristic workspace add bucket and sample items
  2017-08-23  0:26 [PATCH v5 0/6] Btrfs: populate heuristic with code Timofey Titovets
  2017-08-23  0:26 ` [PATCH v5 1/6] Btrfs: heuristic make use compression workspaces Timofey Titovets
@ 2017-08-23  0:26 ` Timofey Titovets
  2017-08-23  0:26 ` [PATCH v5 3/6] Btrfs: implement heuristic sampling logic Timofey Titovets
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: Timofey Titovets @ 2017-08-23  0:26 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Timofey Titovets

Heuristic workspace:
 - Add bucket for storing byte type counters
 - Add sample array for storing partial copy of
   input data range
 - Add counter for store current sample size to workspace

Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
---
 fs/btrfs/heuristic.c | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/fs/btrfs/heuristic.c b/fs/btrfs/heuristic.c
index 96ae3e9334bc..2c2cadc9dfad 100644
--- a/fs/btrfs/heuristic.c
+++ b/fs/btrfs/heuristic.c
@@ -20,13 +20,36 @@
 #include <linux/bio.h>
 #include "compression.h"

+#define READ_SIZE 16
+#define ITER_SHIFT 256
+#define BUCKET_SIZE 256
+
+/*
+ * While mapping 128KiB range into pages (with 4k PAGE_SIZE as ex),
+ * and iterate that with index <= index_end
+ * code get 0-32 items, that a 33 pages
+ */
+#define MAX_INPUT_PAGES ((BTRFS_MAX_UNCOMPRESSED >> PAGE_SHIFT)+1)
+#define MAX_SAMPLE_SIZE (MAX_INPUT_PAGES*PAGE_SIZE*READ_SIZE/ITER_SHIFT)
+
+struct bucket_item {
+	u32 count;
+};
+
 struct workspace {
+	u8  *sample;
+	/* Partial copy of input data */
+	u32 sample_size;
+	/* Bucket store counter for each byte type */
+	struct bucket_item bucket[BUCKET_SIZE];
 	struct list_head list;
 };

 static void heuristic_free_workspace(struct list_head *ws)
 {
 	struct workspace *workspace = list_entry(ws, struct workspace, list);
+
+	kvfree(workspace->sample);
 	kfree(workspace);
 }

@@ -38,9 +61,16 @@ static struct list_head *heuristic_alloc_workspace(void)
 	if (!workspace)
 		return ERR_PTR(-ENOMEM);

+	workspace->sample = kvmalloc(MAX_SAMPLE_SIZE, GFP_KERNEL);
+	if (!workspace->sample)
+		goto fail;
+
 	INIT_LIST_HEAD(&workspace->list);

 	return &workspace->list;
+fail:
+	heuristic_free_workspace(&workspace->list);
+	return ERR_PTR(-ENOMEM);
 }

 static int heuristic(struct list_head *ws, struct inode *inode,
--
2.14.1

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v5 3/6] Btrfs: implement heuristic sampling logic
  2017-08-23  0:26 [PATCH v5 0/6] Btrfs: populate heuristic with code Timofey Titovets
  2017-08-23  0:26 ` [PATCH v5 1/6] Btrfs: heuristic make use compression workspaces Timofey Titovets
  2017-08-23  0:26 ` [PATCH v5 2/6] Btrfs: heuristic workspace add bucket and sample items Timofey Titovets
@ 2017-08-23  0:26 ` Timofey Titovets
  2017-08-23  0:26 ` [PATCH v5 4/6] Btrfs: heuristic add detection of zeroed sample Timofey Titovets
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: Timofey Titovets @ 2017-08-23  0:26 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Timofey Titovets

Copy sample data from input data range to sample buffer
then calculate byte type count for that sample into bucket.

Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
---
 fs/btrfs/heuristic.c | 31 +++++++++++++++++++++++++++++--
 1 file changed, 29 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/heuristic.c b/fs/btrfs/heuristic.c
index 2c2cadc9dfad..5336638a3b7c 100644
--- a/fs/btrfs/heuristic.c
+++ b/fs/btrfs/heuristic.c
@@ -76,20 +76,47 @@ static struct list_head *heuristic_alloc_workspace(void)
 static int heuristic(struct list_head *ws, struct inode *inode,
 		     u64 start, u64 end)
 {
+	struct workspace *workspace = list_entry(ws, struct workspace, list);
 	struct page *page;
 	u64 index, index_end;
-	u8 *input_data;
+	u8 *in_data;
+	u32 a, b;
+	u8 byte;
+
+	/*
+	 * Compression only handle first 128kb of input range
+	 * And just shift over range in loop for compressing it.
+	 * Let's do the same.
+	 */
+	if (end - start > BTRFS_MAX_UNCOMPRESSED)
+		end = start + BTRFS_MAX_UNCOMPRESSED;

 	index = start >> PAGE_SHIFT;
 	index_end = end >> PAGE_SHIFT;

+	b = 0;
 	for (; index <= index_end; index++) {
 		page = find_get_page(inode->i_mapping, index);
-		input_data = kmap(page);
+		in_data = kmap(page);
+		a = 0;
+		while (a < PAGE_SIZE-READ_SIZE && b < MAX_SAMPLE_SIZE) {
+			memcpy(&workspace->sample[b], &in_data[a], READ_SIZE);
+			a += ITER_SHIFT;
+			b += READ_SIZE;
+		}
 		kunmap(page);
 		put_page(page);
 	}

+	workspace->sample_size = b;
+
+	memset(workspace->bucket, 0, sizeof(*workspace->bucket)*BUCKET_SIZE);
+
+	for (a = 0; a < workspace->sample_size; a++) {
+		byte = workspace->sample[a];
+		workspace->bucket[byte].count++;
+	}
+
 	return 1;
 }

--
2.14.1

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v5 4/6] Btrfs: heuristic add detection of zeroed sample
  2017-08-23  0:26 [PATCH v5 0/6] Btrfs: populate heuristic with code Timofey Titovets
                   ` (2 preceding siblings ...)
  2017-08-23  0:26 ` [PATCH v5 3/6] Btrfs: implement heuristic sampling logic Timofey Titovets
@ 2017-08-23  0:26 ` Timofey Titovets
  2017-08-23 17:55   ` Diego Calleja
  2017-08-23  0:26 ` [PATCH v5 5/6] Btrfs: heuristic add byte set calculation Timofey Titovets
  2017-08-23  0:26 ` [PATCH v5 6/6] Btrfs: heuristic add byte core " Timofey Titovets
  5 siblings, 1 reply; 9+ messages in thread
From: Timofey Titovets @ 2017-08-23  0:26 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Timofey Titovets

Use memcmp for check sample data to zeroes.

Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
---
 fs/btrfs/heuristic.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/fs/btrfs/heuristic.c b/fs/btrfs/heuristic.c
index 5336638a3b7c..4557ea1db373 100644
--- a/fs/btrfs/heuristic.c
+++ b/fs/btrfs/heuristic.c
@@ -73,6 +73,21 @@ static struct list_head *heuristic_alloc_workspace(void)
 	return ERR_PTR(-ENOMEM);
 }

+static bool sample_zeroed(struct workspace *workspace)
+{
+	u32 i;
+	u8 zero[READ_SIZE];
+
+	memset(&zero, 0, sizeof(zero));
+
+	for (i = 0; i < workspace->sample_size; i += sizeof(zero)) {
+		if (memcmp(&workspace->sample[i], &zero, sizeof(zero)))
+			return false;
+	}
+
+	return true;
+}
+
 static int heuristic(struct list_head *ws, struct inode *inode,
 		     u64 start, u64 end)
 {
@@ -110,6 +125,9 @@ static int heuristic(struct list_head *ws, struct inode *inode,

 	workspace->sample_size = b;

+	if (sample_zeroed(workspace))
+		return 1;
+
 	memset(workspace->bucket, 0, sizeof(*workspace->bucket)*BUCKET_SIZE);

 	for (a = 0; a < workspace->sample_size; a++) {
--
2.14.1

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v5 5/6] Btrfs: heuristic add byte set calculation
  2017-08-23  0:26 [PATCH v5 0/6] Btrfs: populate heuristic with code Timofey Titovets
                   ` (3 preceding siblings ...)
  2017-08-23  0:26 ` [PATCH v5 4/6] Btrfs: heuristic add detection of zeroed sample Timofey Titovets
@ 2017-08-23  0:26 ` Timofey Titovets
  2017-08-23  0:26 ` [PATCH v5 6/6] Btrfs: heuristic add byte core " Timofey Titovets
  5 siblings, 0 replies; 9+ messages in thread
From: Timofey Titovets @ 2017-08-23  0:26 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Timofey Titovets

Calculate byte set size for data sample:
Calculate how many unique bytes has been in sample
By count all bytes in bucket with count > 0
If byte set low (~25%), data are easily compressible

Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
---
 fs/btrfs/heuristic.c | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/fs/btrfs/heuristic.c b/fs/btrfs/heuristic.c
index 4557ea1db373..953428fde305 100644
--- a/fs/btrfs/heuristic.c
+++ b/fs/btrfs/heuristic.c
@@ -31,6 +31,7 @@
  */
 #define MAX_INPUT_PAGES ((BTRFS_MAX_UNCOMPRESSED >> PAGE_SHIFT)+1)
 #define MAX_SAMPLE_SIZE (MAX_INPUT_PAGES*PAGE_SIZE*READ_SIZE/ITER_SHIFT)
+#define BYTE_SET_THRESHOLD 64

 struct bucket_item {
 	u32 count;
@@ -73,6 +74,27 @@ static struct list_head *heuristic_alloc_workspace(void)
 	return ERR_PTR(-ENOMEM);
 }

+static int byte_set_size(const struct workspace *workspace)
+{
+	int a = 0;
+	int byte_set_size = 0;
+
+	for (; a < BYTE_SET_THRESHOLD; a++) {
+		if (workspace->bucket[a].count > 0)
+			byte_set_size++;
+	}
+
+	for (; a < BUCKET_SIZE; a++) {
+		if (workspace->bucket[a].count > 0) {
+			byte_set_size++;
+			if (byte_set_size > BYTE_SET_THRESHOLD)
+				return byte_set_size;
+		}
+	}
+
+	return byte_set_size;
+}
+
 static bool sample_zeroed(struct workspace *workspace)
 {
 	u32 i;
@@ -135,6 +157,10 @@ static int heuristic(struct list_head *ws, struct inode *inode,
 		workspace->bucket[byte].count++;
 	}

+	a = byte_set_size(workspace);
+	if (a > BYTE_SET_THRESHOLD)
+		return 2;
+
 	return 1;
 }

--
2.14.1

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v5 6/6] Btrfs: heuristic add byte core set calculation
  2017-08-23  0:26 [PATCH v5 0/6] Btrfs: populate heuristic with code Timofey Titovets
                   ` (4 preceding siblings ...)
  2017-08-23  0:26 ` [PATCH v5 5/6] Btrfs: heuristic add byte set calculation Timofey Titovets
@ 2017-08-23  0:26 ` Timofey Titovets
  5 siblings, 0 replies; 9+ messages in thread
From: Timofey Titovets @ 2017-08-23  0:26 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Timofey Titovets

Calculate byte core set for data sample:
Sort bucket's numbers in decreasing order
Count how many numbers use 90% of sample
If core set are low (<=25%), data are easily compressible
If core set high (>=80%), data are not compressible

Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
---
 fs/btrfs/heuristic.c | 51 ++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 50 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/heuristic.c b/fs/btrfs/heuristic.c
index 953428fde305..14128f77d5ae 100644
--- a/fs/btrfs/heuristic.c
+++ b/fs/btrfs/heuristic.c
@@ -18,6 +18,7 @@
 #include <linux/pagemap.h>
 #include <linux/string.h>
 #include <linux/bio.h>
+#include <linux/sort.h>
 #include "compression.h"

 #define READ_SIZE 16
@@ -32,6 +33,8 @@
 #define MAX_INPUT_PAGES ((BTRFS_MAX_UNCOMPRESSED >> PAGE_SHIFT)+1)
 #define MAX_SAMPLE_SIZE (MAX_INPUT_PAGES*PAGE_SIZE*READ_SIZE/ITER_SHIFT)
 #define BYTE_SET_THRESHOLD 64
+#define BYTE_CORE_SET_LOW  BYTE_SET_THRESHOLD
+#define BYTE_CORE_SET_HIGH 200 // ~80%

 struct bucket_item {
 	u32 count;
@@ -74,6 +77,45 @@ static struct list_head *heuristic_alloc_workspace(void)
 	return ERR_PTR(-ENOMEM);
 }

+/* For bucket sorting */
+static inline int bucket_compare(const void *lv, const void *rv)
+{
+	struct bucket_item *l = (struct bucket_item *)(lv);
+	struct bucket_item *r = (struct bucket_item *)(rv);
+
+	return r->count - l->count;
+}
+
+/*
+ * Byte Core set size
+ * How many bytes use 90% of sample
+ */
+static int byte_core_set_size(struct workspace *workspace)
+{
+	int a = 0;
+	u32 coreset_sum = 0;
+	struct bucket_item *bucket = workspace->bucket;
+	u32 core_set_threshold = workspace->sample_size*90/100;
+
+	/* Sort in reverse order */
+	sort(bucket, BUCKET_SIZE, sizeof(*bucket),
+	     &bucket_compare, NULL);
+
+	for (; a < BYTE_CORE_SET_LOW; a++)
+		coreset_sum += bucket[a].count;
+
+	if (coreset_sum > core_set_threshold)
+		return a;
+
+	for (; a < BYTE_CORE_SET_HIGH && bucket[a].count > 0; a++) {
+		coreset_sum += bucket[a].count;
+		if (coreset_sum > core_set_threshold)
+			break;
+	}
+
+	return a;
+}
+
 static int byte_set_size(const struct workspace *workspace)
 {
 	int a = 0;
@@ -161,7 +203,14 @@ static int heuristic(struct list_head *ws, struct inode *inode,
 	if (a > BYTE_SET_THRESHOLD)
 		return 2;

-	return 1;
+	a = byte_core_set_size(workspace);
+	if (a <= BYTE_CORE_SET_LOW)
+		return 3;
+
+	if (a >= BYTE_CORE_SET_HIGH)
+		return 0;
+
+	return 4;
 }

 const struct btrfs_compress_op btrfs_heuristic = {
--
2.14.1

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH v5 4/6] Btrfs: heuristic add detection of zeroed sample
  2017-08-23  0:26 ` [PATCH v5 4/6] Btrfs: heuristic add detection of zeroed sample Timofey Titovets
@ 2017-08-23 17:55   ` Diego Calleja
  2017-08-23 20:03     ` Timofey Titovets
  0 siblings, 1 reply; 9+ messages in thread
From: Diego Calleja @ 2017-08-23 17:55 UTC (permalink / raw)
  To: Timofey Titovets; +Cc: linux-btrfs

El miércoles, 23 de agosto de 2017 2:26:48 (CEST) Timofey Titovets escribió:
> +	for (i = 0; i < workspace->sample_size; i += sizeof(zero)) {
> +		if (memcmp(&workspace->sample[i], &zero, sizeof(zero)))
> +			return false;

Instead of just checking for 0, wouldn't it be a better idea to check
for any kind of repetitions?

As in, iterate over the sample and memcmp() each part of sample with
the previous one. The cost would be the same, and it would detect not
just zeros, but any kind of repeated data. Is there any reason I'm
missing for not doing this?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v5 4/6] Btrfs: heuristic add detection of zeroed sample
  2017-08-23 17:55   ` Diego Calleja
@ 2017-08-23 20:03     ` Timofey Titovets
  0 siblings, 0 replies; 9+ messages in thread
From: Timofey Titovets @ 2017-08-23 20:03 UTC (permalink / raw)
  To: Diego Calleja; +Cc: linux-btrfs

2017-08-23 20:55 GMT+03:00 Diego Calleja <diegocg@gmail.com>:
> El miércoles, 23 de agosto de 2017 2:26:48 (CEST) Timofey Titovets escribió:
>> +     for (i = 0; i < workspace->sample_size; i += sizeof(zero)) {
>> +             if (memcmp(&workspace->sample[i], &zero, sizeof(zero)))
>> +                     return false;
>
> Instead of just checking for 0, wouldn't it be a better idea to check
> for any kind of repetitions?
>
> As in, iterate over the sample and memcmp() each part of sample with
> the previous one. The cost would be the same, and it would detect not
> just zeros, but any kind of repeated data. Is there any reason I'm
> missing for not doing this?

Thank you, i was not think about that,
That approach seems better, i will update the patch.

-- 
Have a nice day,
Timofey.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2017-08-23 20:03 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-08-23  0:26 [PATCH v5 0/6] Btrfs: populate heuristic with code Timofey Titovets
2017-08-23  0:26 ` [PATCH v5 1/6] Btrfs: heuristic make use compression workspaces Timofey Titovets
2017-08-23  0:26 ` [PATCH v5 2/6] Btrfs: heuristic workspace add bucket and sample items Timofey Titovets
2017-08-23  0:26 ` [PATCH v5 3/6] Btrfs: implement heuristic sampling logic Timofey Titovets
2017-08-23  0:26 ` [PATCH v5 4/6] Btrfs: heuristic add detection of zeroed sample Timofey Titovets
2017-08-23 17:55   ` Diego Calleja
2017-08-23 20:03     ` Timofey Titovets
2017-08-23  0:26 ` [PATCH v5 5/6] Btrfs: heuristic add byte set calculation Timofey Titovets
2017-08-23  0:26 ` [PATCH v5 6/6] Btrfs: heuristic add byte core " Timofey Titovets

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.