linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/3] fuse: Solve request_find() bottleneck
@ 2018-09-11 10:11 Kirill Tkhai
  2018-09-11 10:11 ` [PATCH 1/3] fuse: Change interrupt requests allocation algorhythm Kirill Tkhai
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Kirill Tkhai @ 2018-09-11 10:11 UTC (permalink / raw)
  To: miklos, kuznet, ktkhai, linux-fsdevel, linux-kernel

Hi,

We noticed the performance bottle neck in FUSE running our
Virtuozzo storage over rdma. On some types of workload
we observe 20% of time spent in request_find() in profiler.
This function is iterating over long list of requests, and it
scales bad.

The patch introduces hash table to reduce the number
of iterations, we do in this function. Also, algorithm
of generating IDs for interrupt requests is changed,
simplified request_find() function and killed
fuse_req::intr_unique field.

Kirill
---

Kirill Tkhai (3):
      fuse: Change interrupt requests allocation algorhythm
      fuse: Kill fuse_req::intr_unique
      fuse: Use hash table to link processing request


 fs/fuse/dev.c    |   47 +++++++++++++++++++++++++++++++++++++----------
 fs/fuse/fuse_i.h |   11 +++++------
 fs/fuse/inode.c  |    5 ++++-
 3 files changed, 46 insertions(+), 17 deletions(-)

--
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 1/3] fuse: Change interrupt requests allocation algorhythm
  2018-09-11 10:11 [PATCH 0/3] fuse: Solve request_find() bottleneck Kirill Tkhai
@ 2018-09-11 10:11 ` Kirill Tkhai
  2018-09-11 10:12 ` [PATCH 2/3] fuse: Kill fuse_req::intr_unique Kirill Tkhai
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 8+ messages in thread
From: Kirill Tkhai @ 2018-09-11 10:11 UTC (permalink / raw)
  To: miklos, kuznet, ktkhai, linux-fsdevel, linux-kernel

Using of two unconnected IDs req->in.h.unique and
req->intr_unique does not allow to link requests
to a hash table. We need can't use none of them
as a key to calculate hash.

This patch changes the algorhythm of allocation
of IDs for a request. Plain requests obtain even
ID, while interrupt requests are encoded in
the low bit. So, in next patches we will be able
to use the rest of ID bits to caculate hash,
and the hash will be the same for plain and
interrupt requests.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
---
 fs/fuse/dev.c |    9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 11ea2c4a38ab..f24fd6f61a7a 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -25,6 +25,10 @@
 MODULE_ALIAS_MISCDEV(FUSE_MINOR);
 MODULE_ALIAS("devname:fuse");
 
+/* Ordinary requests have even IDs, while interrupts IDs are odd */
+#define FUSE_INT_REQ_BIT (1ULL << 0)
+#define FUSE_REQ_ID_STEP (1ULL << 1)
+
 static struct kmem_cache *fuse_req_cachep;
 
 static struct fuse_dev *fuse_get_dev(struct file *file)
@@ -319,7 +323,8 @@ static unsigned len_args(unsigned numargs, struct fuse_arg *args)
 
 static u64 fuse_get_unique(struct fuse_iqueue *fiq)
 {
-	return ++fiq->reqctr;
+	fiq->reqctr += FUSE_REQ_ID_STEP;
+	return fiq->reqctr;
 }
 
 static void queue_request(struct fuse_iqueue *fiq, struct fuse_req *req)
@@ -1084,7 +1089,7 @@ __releases(fiq->waitq.lock)
 	int err;
 
 	list_del_init(&req->intr_entry);
-	req->intr_unique = fuse_get_unique(fiq);
+	req->intr_unique = (req->in.h.unique | FUSE_INT_REQ_BIT);
 	memset(&ih, 0, sizeof(ih));
 	memset(&arg, 0, sizeof(arg));
 	ih.len = reqsize;

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 2/3] fuse: Kill fuse_req::intr_unique
  2018-09-11 10:11 [PATCH 0/3] fuse: Solve request_find() bottleneck Kirill Tkhai
  2018-09-11 10:11 ` [PATCH 1/3] fuse: Change interrupt requests allocation algorhythm Kirill Tkhai
@ 2018-09-11 10:12 ` Kirill Tkhai
  2018-09-11 10:12 ` [PATCH 3/3] fuse: Use hash table to link processing request Kirill Tkhai
  2018-09-24 12:08 ` [PATCH 0/3] fuse: Solve request_find() bottleneck Kirill Tkhai
  3 siblings, 0 replies; 8+ messages in thread
From: Kirill Tkhai @ 2018-09-11 10:12 UTC (permalink / raw)
  To: miklos, kuznet, ktkhai, linux-fsdevel, linux-kernel

This field is not needed after the previous patch,
since we can easily convert request ID to interrupt
request ID and vice versa.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
---
 fs/fuse/dev.c    |   11 ++++++-----
 fs/fuse/fuse_i.h |    3 ---
 2 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index f24fd6f61a7a..dda177b57ea2 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -1089,12 +1089,11 @@ __releases(fiq->waitq.lock)
 	int err;
 
 	list_del_init(&req->intr_entry);
-	req->intr_unique = (req->in.h.unique | FUSE_INT_REQ_BIT);
 	memset(&ih, 0, sizeof(ih));
 	memset(&arg, 0, sizeof(arg));
 	ih.len = reqsize;
 	ih.opcode = FUSE_INTERRUPT;
-	ih.unique = req->intr_unique;
+	ih.unique = (req->in.h.unique | FUSE_INT_REQ_BIT);
 	arg.unique = req->in.h.unique;
 
 	spin_unlock(&fiq->waitq.lock);
@@ -1799,8 +1798,10 @@ static struct fuse_req *request_find(struct fuse_pqueue *fpq, u64 unique)
 {
 	struct fuse_req *req;
 
+	unique &= ~FUSE_INT_REQ_BIT;
+
 	list_for_each_entry(req, &fpq->processing, list) {
-		if (req->in.h.unique == unique || req->intr_unique == unique)
+		if (req->in.h.unique == unique)
 			return req;
 	}
 	return NULL;
@@ -1878,8 +1879,8 @@ static ssize_t fuse_dev_do_write(struct fuse_dev *fud,
 	if (!req)
 		goto err_unlock_pq;
 
-	/* Is it an interrupt reply? */
-	if (req->intr_unique == oh.unique) {
+	/* Is it an interrupt reply ID? */
+	if (oh.unique & FUSE_INT_REQ_BIT) {
 		spin_unlock(&fpq->lock);
 
 		err = -EINVAL;
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index f78e9614bb5f..f72e4974b3bb 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -311,9 +311,6 @@ struct fuse_req {
 	/** refcount */
 	refcount_t count;
 
-	/** Unique ID for the interrupt request */
-	u64 intr_unique;
-
 	/* Request flags, updated with test/set/clear_bit() */
 	unsigned long flags;
 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 3/3] fuse: Use hash table to link processing request
  2018-09-11 10:11 [PATCH 0/3] fuse: Solve request_find() bottleneck Kirill Tkhai
  2018-09-11 10:11 ` [PATCH 1/3] fuse: Change interrupt requests allocation algorhythm Kirill Tkhai
  2018-09-11 10:12 ` [PATCH 2/3] fuse: Kill fuse_req::intr_unique Kirill Tkhai
@ 2018-09-11 10:12 ` Kirill Tkhai
  2018-09-25  9:08   ` Miklos Szeredi
  2018-09-24 12:08 ` [PATCH 0/3] fuse: Solve request_find() bottleneck Kirill Tkhai
  3 siblings, 1 reply; 8+ messages in thread
From: Kirill Tkhai @ 2018-09-11 10:12 UTC (permalink / raw)
  To: miklos, kuznet, ktkhai, linux-fsdevel, linux-kernel

We noticed the performance bottle neck in FUSE running our
Virtuozzo storage over rdma. On some types of workload
we observe 20% of times pent in request_find() in profiler.
This function is iterating over long requests list, and it
scales bad.

The patch introduces hash table to reduce the number
of iterations, we do in this function. Hash generating
algorithm is taken from hash_add() function, while
512 lines table is used to store pending requests.
This fixes problem and improves the performance.

Reported-by: Alexey Kuznetsov <kuznet@virtuozzo.com>
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
---
 fs/fuse/dev.c    |   29 +++++++++++++++++++++++++----
 fs/fuse/fuse_i.h |    8 +++++---
 fs/fuse/inode.c  |    5 ++++-
 3 files changed, 34 insertions(+), 8 deletions(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index dda177b57ea2..867825cd04fa 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -327,6 +327,20 @@ static u64 fuse_get_unique(struct fuse_iqueue *fiq)
 	return fiq->reqctr;
 }
 
+static unsigned int __fuse_req_hash(u64 unique)
+{
+	unique &= ~FUSE_INT_REQ_BIT;
+
+	/* Borrowed from hash_add() */
+	return hash_min(unique,
+			HASH_BITS(((struct fuse_pqueue *)0)->processing));
+}
+
+static unsigned int fuse_req_hash(struct fuse_req *req)
+{
+	return __fuse_req_hash(req->in.h.unique);
+}
+
 static void queue_request(struct fuse_iqueue *fiq, struct fuse_req *req)
 {
 	req->in.h.len = sizeof(struct fuse_in_header) +
@@ -1314,7 +1328,7 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file,
 		err = reqsize;
 		goto out_end;
 	}
-	list_move_tail(&req->list, &fpq->processing);
+	list_move_tail(&req->list, &fpq->processing[fuse_req_hash(req)]);
 	spin_unlock(&fpq->lock);
 	set_bit(FR_SENT, &req->flags);
 	/* matches barrier in request_wait_answer() */
@@ -1797,10 +1811,12 @@ static int fuse_notify(struct fuse_conn *fc, enum fuse_notify_code code,
 static struct fuse_req *request_find(struct fuse_pqueue *fpq, u64 unique)
 {
 	struct fuse_req *req;
+	unsigned int hash;
 
 	unique &= ~FUSE_INT_REQ_BIT;
+	hash = __fuse_req_hash(unique);
 
-	list_for_each_entry(req, &fpq->processing, list) {
+	list_for_each_entry(req, &fpq->processing[hash], list) {
 		if (req->in.h.unique == unique)
 			return req;
 	}
@@ -2108,6 +2124,7 @@ void fuse_abort_conn(struct fuse_conn *fc, bool is_abort)
 		struct fuse_dev *fud;
 		struct fuse_req *req, *next;
 		LIST_HEAD(to_end);
+		int i;
 
 		fc->connected = 0;
 		fc->blocked = 0;
@@ -2129,7 +2146,9 @@ void fuse_abort_conn(struct fuse_conn *fc, bool is_abort)
 				}
 				spin_unlock(&req->waitq.lock);
 			}
-			list_splice_tail_init(&fpq->processing, &to_end);
+			for (i = 0; i < FUSE_PQ_HASH_SIZE; i++)
+				list_splice_tail_init(&fpq->processing[i],
+						      &to_end);
 			spin_unlock(&fpq->lock);
 		}
 		fc->max_background = UINT_MAX;
@@ -2169,10 +2188,12 @@ int fuse_dev_release(struct inode *inode, struct file *file)
 		struct fuse_conn *fc = fud->fc;
 		struct fuse_pqueue *fpq = &fud->pq;
 		LIST_HEAD(to_end);
+		int i;
 
 		spin_lock(&fpq->lock);
 		WARN_ON(!list_empty(&fpq->io));
-		list_splice_init(&fpq->processing, &to_end);
+		for (i = 0; i < FUSE_PQ_HASH_SIZE; i++)
+			list_splice_init(&fpq->processing[i], &to_end);
 		spin_unlock(&fpq->lock);
 
 		end_requests(fc, &to_end);
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index f72e4974b3bb..ed69e1530216 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -408,6 +408,8 @@ struct fuse_iqueue {
 	struct fasync_struct *fasync;
 };
 
+#define FUSE_PQ_HASH_SIZE 512
+
 struct fuse_pqueue {
 	/** Connection established */
 	unsigned connected;
@@ -415,11 +417,11 @@ struct fuse_pqueue {
 	/** Lock protecting accessess to  members of this structure */
 	spinlock_t lock;
 
-	/** The list of requests being processed */
-	struct list_head processing;
-
 	/** The list of requests under I/O */
 	struct list_head io;
+
+	/** The lists of requests being processed */
+	struct list_head processing[FUSE_PQ_HASH_SIZE];
 };
 
 /**
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index db9e60b7eb69..b28412e75c18 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -594,9 +594,12 @@ static void fuse_iqueue_init(struct fuse_iqueue *fiq)
 
 static void fuse_pqueue_init(struct fuse_pqueue *fpq)
 {
+	int i;
+
 	memset(fpq, 0, sizeof(struct fuse_pqueue));
 	spin_lock_init(&fpq->lock);
-	INIT_LIST_HEAD(&fpq->processing);
+	for (i = 0; i < FUSE_PQ_HASH_SIZE; i++)
+		INIT_LIST_HEAD(&fpq->processing[i]);
 	INIT_LIST_HEAD(&fpq->io);
 	fpq->connected = 1;
 }

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 0/3] fuse: Solve request_find() bottleneck
  2018-09-11 10:11 [PATCH 0/3] fuse: Solve request_find() bottleneck Kirill Tkhai
                   ` (2 preceding siblings ...)
  2018-09-11 10:12 ` [PATCH 3/3] fuse: Use hash table to link processing request Kirill Tkhai
@ 2018-09-24 12:08 ` Kirill Tkhai
  2018-09-24 15:04   ` Miklos Szeredi
  3 siblings, 1 reply; 8+ messages in thread
From: Kirill Tkhai @ 2018-09-24 12:08 UTC (permalink / raw)
  To: miklos, kuznet, linux-fsdevel, linux-kernel

2 weeks passed, so ping.

Miklos, any reaction on this?

Thanks,
Kirill

On 11.09.2018 13:11, Kirill Tkhai wrote:
> Hi,
> 
> We noticed the performance bottle neck in FUSE running our
> Virtuozzo storage over rdma. On some types of workload
> we observe 20% of time spent in request_find() in profiler.
> This function is iterating over long list of requests, and it
> scales bad.
> 
> The patch introduces hash table to reduce the number
> of iterations, we do in this function. Also, algorithm
> of generating IDs for interrupt requests is changed,
> simplified request_find() function and killed
> fuse_req::intr_unique field.
> 
> Kirill
> ---
> 
> Kirill Tkhai (3):
>       fuse: Change interrupt requests allocation algorhythm
>       fuse: Kill fuse_req::intr_unique
>       fuse: Use hash table to link processing request
> 
> 
>  fs/fuse/dev.c    |   47 +++++++++++++++++++++++++++++++++++++----------
>  fs/fuse/fuse_i.h |   11 +++++------
>  fs/fuse/inode.c  |    5 ++++-
>  3 files changed, 46 insertions(+), 17 deletions(-)
> 
> --
> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 0/3] fuse: Solve request_find() bottleneck
  2018-09-24 12:08 ` [PATCH 0/3] fuse: Solve request_find() bottleneck Kirill Tkhai
@ 2018-09-24 15:04   ` Miklos Szeredi
  0 siblings, 0 replies; 8+ messages in thread
From: Miklos Szeredi @ 2018-09-24 15:04 UTC (permalink / raw)
  To: Kirill Tkhai; +Cc: kuznet, linux-fsdevel, linux-kernel

On Mon, Sep 24, 2018 at 2:08 PM, Kirill Tkhai <ktkhai@virtuozzo.com> wrote:
> 2 weeks passed, so ping.
>
> Miklos, any reaction on this?
>
> Thanks,
> Kirill
>
> On 11.09.2018 13:11, Kirill Tkhai wrote:
>> Hi,
>>
>> We noticed the performance bottle neck in FUSE running our
>> Virtuozzo storage over rdma. On some types of workload
>> we observe 20% of time spent in request_find() in profiler.
>> This function is iterating over long list of requests, and it
>> scales bad.
>>
>> The patch introduces hash table to reduce the number
>> of iterations, we do in this function. Also, algorithm
>> of generating IDs for interrupt requests is changed,
>> simplified request_find() function and killed
>> fuse_req::intr_unique field.

Concept looks good.  Will review details...

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 3/3] fuse: Use hash table to link processing request
  2018-09-11 10:12 ` [PATCH 3/3] fuse: Use hash table to link processing request Kirill Tkhai
@ 2018-09-25  9:08   ` Miklos Szeredi
  2018-09-25  9:35     ` Kirill Tkhai
  0 siblings, 1 reply; 8+ messages in thread
From: Miklos Szeredi @ 2018-09-25  9:08 UTC (permalink / raw)
  To: Kirill Tkhai; +Cc: kuznet, linux-fsdevel, linux-kernel

On Tue, Sep 11, 2018 at 12:12 PM, Kirill Tkhai <ktkhai@virtuozzo.com> wrote:
> We noticed the performance bottle neck in FUSE running our
> Virtuozzo storage over rdma. On some types of workload
> we observe 20% of times pent in request_find() in profiler.
> This function is iterating over long requests list, and it
> scales bad.
>
> The patch introduces hash table to reduce the number
> of iterations, we do in this function. Hash generating
> algorithm is taken from hash_add() function, while
> 512 lines table is used to store pending requests.
> This fixes problem and improves the performance.

Pushed to fuse.git#for-next with a number of small changes.   E.g. I
reduced the number of cachlines to 256 to make the hashtable size just
4k.   Was there a scientific reason for choosing 512 as the optimal
number of cache lines?

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 3/3] fuse: Use hash table to link processing request
  2018-09-25  9:08   ` Miklos Szeredi
@ 2018-09-25  9:35     ` Kirill Tkhai
  0 siblings, 0 replies; 8+ messages in thread
From: Kirill Tkhai @ 2018-09-25  9:35 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: kuznet, linux-fsdevel, linux-kernel

On 25.09.2018 12:08, Miklos Szeredi wrote:
> On Tue, Sep 11, 2018 at 12:12 PM, Kirill Tkhai <ktkhai@virtuozzo.com> wrote:
>> We noticed the performance bottle neck in FUSE running our
>> Virtuozzo storage over rdma. On some types of workload
>> we observe 20% of times pent in request_find() in profiler.
>> This function is iterating over long requests list, and it
>> scales bad.
>>
>> The patch introduces hash table to reduce the number
>> of iterations, we do in this function. Hash generating
>> algorithm is taken from hash_add() function, while
>> 512 lines table is used to store pending requests.
>> This fixes problem and improves the performance.
> 
> Pushed to fuse.git#for-next with a number of small changes.   E.g. I

Thanks!

> reduced the number of cachlines to 256 to make the hashtable size just
> 4k.   Was there a scientific reason for choosing 512 as the optimal
> number of cache lines?

I just tried to choose a size, which is not small for all of potential
users. But, it looks like 256 should be also enough. 
So, there was no hidden mathematics...

Kirill

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2018-09-25 15:42 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-09-11 10:11 [PATCH 0/3] fuse: Solve request_find() bottleneck Kirill Tkhai
2018-09-11 10:11 ` [PATCH 1/3] fuse: Change interrupt requests allocation algorhythm Kirill Tkhai
2018-09-11 10:12 ` [PATCH 2/3] fuse: Kill fuse_req::intr_unique Kirill Tkhai
2018-09-11 10:12 ` [PATCH 3/3] fuse: Use hash table to link processing request Kirill Tkhai
2018-09-25  9:08   ` Miklos Szeredi
2018-09-25  9:35     ` Kirill Tkhai
2018-09-24 12:08 ` [PATCH 0/3] fuse: Solve request_find() bottleneck Kirill Tkhai
2018-09-24 15:04   ` Miklos Szeredi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).