From: Timofey Titovets <nefelim4ag@gmail.com>
To: linux-btrfs@vger.kernel.org
Cc: Timofey Titovets <nefelim4ag@gmail.com>
Subject: [PATCH v7 6/6] Btrfs: heuristic add byte core set calculation
Date: Fri, 25 Aug 2017 12:18:45 +0300
Message-ID: <20170825091845.4120-7-nefelim4ag@gmail.com>
In-Reply-To: <20170825091845.4120-1-nefelim4ag@gmail.com>

Calculate the byte core set size for the data sample:
Sort the bucket counts in decreasing order.
Count how many byte values are needed to cover 90% of the sample.
If the core set is small (<= 25%, i.e. <= 64 byte values), the data is easily compressible.
If the core set is large (>= ~80%, i.e. >= 200 byte values), the data is not compressible.

Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
---
 fs/btrfs/heuristic.c | 51 ++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 50 insertions(+), 1 deletion(-)
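
For reviewers, here is a minimal user-space sketch of the core set calculation described in the commit message. It is not the kernel code. It assumes a plain 256-entry array of byte counts instead of struct bucket_item, and it uses qsort(3) in place of the kernel's sort(). The names byte_histogram(), core_set_size(), classify_sample() and enum core_set_verdict are hypothetical. Unlike byte_core_set_size() in the patch, it does not round small results up to 64. Instead, it returns the exact number of byte values needed to exceed 90% of the sample, capped at 200.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define BUCKET_SIZE   256
#define CORE_SET_LOW   64 /* 64/256 = 25% of all byte values */
#define CORE_SET_HIGH 200 /* 200/256 ~ 78%, "about 80%" */

/* Hypothetical verdict names; the patch itself returns 3, 0 or 4. */
enum core_set_verdict {
	CORE_SET_COMPRESSIBLE,   /* small core set */
	CORE_SET_INCOMPRESSIBLE, /* large core set */
	CORE_SET_UNDECIDED,      /* somewhere in between */
};

/* Count how often each byte value occurs in the sample. */
static void byte_histogram(const uint8_t *data, size_t len,
			   uint32_t counts[BUCKET_SIZE])
{
	memset(counts, 0, BUCKET_SIZE * sizeof(counts[0]));
	for (size_t i = 0; i < len; i++)
		counts[data[i]]++;
}

/* qsort() comparator for decreasing order; explicit comparisons avoid
 * relying on unsigned subtraction wrapping into a negative int. */
static int cmp_desc(const void *lv, const void *rv)
{
	uint32_t l = *(const uint32_t *)lv;
	uint32_t r = *(const uint32_t *)rv;

	return (r > l) - (r < l);
}

/*
 * Core set size: how many of the most frequent byte values are needed
 * before their combined count exceeds 90% of the sample. Sorts counts[]
 * in place and stops early at CORE_SET_HIGH.
 */
static uint32_t core_set_size(uint32_t counts[BUCKET_SIZE], uint32_t sample_size)
{
	const uint32_t threshold = sample_size * 90 / 100;
	uint32_t sum = 0;
	uint32_t i;

	qsort(counts, BUCKET_SIZE, sizeof(counts[0]), cmp_desc);

	for (i = 0; i < CORE_SET_HIGH && counts[i] > 0; i++) {
		sum += counts[i];
		if (sum > threshold)
			return i + 1; /* values 0..i were needed */
	}
	return i;
}

/* Map the core set size onto the low/high thresholds from the patch. */
static enum core_set_verdict classify_sample(const uint8_t *data, size_t len)
{
	uint32_t counts[BUCKET_SIZE];
	uint32_t size;

	byte_histogram(data, len, counts);
	size = core_set_size(counts, (uint32_t)len);

	if (size <= CORE_SET_LOW)
		return CORE_SET_COMPRESSIBLE;
	if (size >= CORE_SET_HIGH)
		return CORE_SET_INCOMPRESSIBLE;
	return CORE_SET_UNDECIDED;
}
```

For example, a 1024-byte sample that cycles through 16 distinct byte values has a core set of 15 and is classified as compressible. A sample that cycles through 128 values has a core set of 116 and falls in between the two thresholds. A sample that cycles through all 256 values hits the cap of 200 and is classified as not compressible.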

diff --git a/fs/btrfs/heuristic.c b/fs/btrfs/heuristic.c
index ef723e991576..df0cefa42857 100644
--- a/fs/btrfs/heuristic.c
+++ b/fs/btrfs/heuristic.c
@@ -18,6 +18,7 @@
 #include <linux/pagemap.h>
 #include <linux/string.h>
 #include <linux/bio.h>
+#include <linux/sort.h>
 #include "compression.h"

 #define READ_SIZE 16
@@ -25,6 +26,8 @@
 #define BUCKET_SIZE 256
 #define MAX_SAMPLE_SIZE (BTRFS_MAX_UNCOMPRESSED*READ_SIZE/ITER_SHIFT)
 #define BYTE_SET_THRESHOLD 64
+#define BYTE_CORE_SET_LOW  BYTE_SET_THRESHOLD
+#define BYTE_CORE_SET_HIGH 200 // ~80%

 struct bucket_item {
 	u32 count;
@@ -67,6 +70,45 @@ static struct list_head *heuristic_alloc_workspace(void)
 	return ERR_PTR(-ENOMEM);
 }

+/* For bucket sorting */
+static inline int bucket_compare(const void *lv, const void *rv)
+{
+	const struct bucket_item *l = lv;
+	const struct bucket_item *r = rv;
+
+	return (r->count > l->count) - (r->count < l->count);
+}
+
+/*
+ * Byte core set size:
+ * how many distinct byte values make up 90% of the sample
+ */
+static int byte_core_set_size(struct workspace *ws)
+{
+	u32 a = 0;
+	u32 coreset_sum = 0;
+	u32 core_set_threshold = ws->sample_size*90/100;
+	struct bucket_item *bucket = ws->bucket;
+
+	/* Sort in decreasing order of count */
+	sort(bucket, BUCKET_SIZE, sizeof(*bucket),
+	     &bucket_compare, NULL);
+
+	for (; a < BYTE_CORE_SET_LOW; a++)
+		coreset_sum += bucket[a].count;
+
+	if (coreset_sum > core_set_threshold)
+		return a;
+
+	for (; a < BYTE_CORE_SET_HIGH && bucket[a].count > 0; a++) {
+		coreset_sum += bucket[a].count;
+		if (coreset_sum > core_set_threshold)
+			break;
+	}
+
+	return a;
+}
+
 static u32 byte_set_size(const struct workspace *ws)
 {
 	u32 a = 0;
@@ -164,7 +206,14 @@ static int heuristic(struct list_head *ws, struct inode *inode,
 	if (a > BYTE_SET_THRESHOLD)
 		return 2;

-	return 1;
+	a = byte_core_set_size(workspace);
+	if (a <= BYTE_CORE_SET_LOW)
+		return 3;
+
+	if (a >= BYTE_CORE_SET_HIGH)
+		return 0;
+
+	return 4;
 }

 const struct btrfs_compress_op btrfs_heuristic = {
--
2.14.1


Thread overview: 14+ messages
2017-08-25  9:18 [PATCH v7 0/6] Btrfs: populate heuristic with code Timofey Titovets
2017-08-25  9:18 ` [PATCH v7 1/6] Btrfs: heuristic make use compression workspaces Timofey Titovets
2017-09-27 13:12   ` David Sterba
2017-08-25  9:18 ` [PATCH v7 2/6] Btrfs: heuristic workspace add bucket and sample items Timofey Titovets
2017-09-27 13:22   ` David Sterba
2017-08-25  9:18 ` [PATCH v7 3/6] Btrfs: implement heuristic sampling logic Timofey Titovets
2017-09-27 13:38   ` David Sterba
2017-08-25  9:18 ` [PATCH v7 4/6] Btrfs: heuristic add detection of repeated data patterns Timofey Titovets
2017-09-27 13:47   ` David Sterba
2017-08-25  9:18 ` [PATCH v7 5/6] Btrfs: heuristic add byte set calculation Timofey Titovets
2017-09-27 13:50   ` David Sterba
2017-08-25  9:18 ` [PATCH v7 6/6] Btrfs: heuristic add byte core set calculation Timofey Titovets (this message)
2017-09-27 13:54   ` David Sterba
2017-09-27 13:56   ` David Sterba
