* [PATCH 1/5] percpu_ida: Fix data race on cpus_have_tags cpumask
       [not found] <cover.1382950629.git.agordeev@redhat.com>
@ 2013-10-28 10:05 ` Alexander Gordeev
  2013-10-28 21:20   ` Kent Overstreet
  2013-10-28 10:05 ` [PATCH 2/5] percpu_ida: Move waking up waiters out of atomic contexts Alexander Gordeev
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 7+ messages in thread
From: Alexander Gordeev @ 2013-10-28 10:05 UTC
  To: Kent Overstreet
  Cc: Oleg Nesterov, Jens Axboe, Nicholas A. Bellinger, linux-kernel

Function steal_tags() might miss a bit in cpus_have_tags due to
unsynchronized access from percpu_ida_free(). As a result, function
percpu_ida_alloc() might enter an unwakeable sleep. This update adds
memory barriers to prevent the described scenario.

In fact, accesses to cpus_have_tags are currently fenced by the
prepare_to_wait() and wake_up() calls, and it does not appear the
aforementioned sequence could actually be hit. Nevertheless, explicit
memory barriers still seem justifiable.
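
The barriers form the usual publish/observe pairing; as a minimal
sketch of the pattern the diff below implements (surrounding locking
omitted):

    /* freeing side: publish the cpumask bit before the wakeup */
    cpumask_set_cpu(smp_processor_id(), &pool->cpus_have_tags);
    smp_wmb();      /* pairs with smp_rmb() on the stealing side */
    wake_up(&pool->wait);

    /* stealing side: observe the bit published above */
    smp_rmb();      /* pairs with smp_wmb() on the freeing side */
    nr = cpumask_weight(&pool->cpus_have_tags);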

Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
---
 lib/percpu_ida.c |   12 ++++++++++--
 1 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/lib/percpu_ida.c b/lib/percpu_ida.c
index b0698ea..96cfacf 100644
--- a/lib/percpu_ida.c
+++ b/lib/percpu_ida.c
@@ -68,6 +68,11 @@ static inline void steal_tags(struct percpu_ida *pool,
 	unsigned cpus_have_tags, cpu = pool->cpu_last_stolen;
 	struct percpu_ida_cpu *remote;
 
+	/*
+	 * Pairs with smp_wmb() in percpu_ida_free()
+	 */
+	smp_rmb();
+
 	for (cpus_have_tags = cpumask_weight(&pool->cpus_have_tags);
 	     cpus_have_tags * pool->percpu_max_size > pool->nr_tags / 2;
 	     cpus_have_tags--) {
@@ -231,8 +236,11 @@ void percpu_ida_free(struct percpu_ida *pool, unsigned tag)
 	spin_unlock(&tags->lock);
 
 	if (nr_free == 1) {
-		cpumask_set_cpu(smp_processor_id(),
-				&pool->cpus_have_tags);
+		cpumask_set_cpu(smp_processor_id(), &pool->cpus_have_tags);
+		/*
+		 * Pairs with smp_rmb() in steal_tags()
+		 */
+		smp_wmb();
 		wake_up(&pool->wait);
 	}
 
-- 
1.7.7.6


-- 
Regards,
Alexander Gordeev
agordeev@redhat.com


* [PATCH 2/5] percpu_ida: Move waking up waiters out of atomic contexts
       [not found] <cover.1382950629.git.agordeev@redhat.com>
  2013-10-28 10:05 ` [PATCH 1/5] percpu_ida: Fix data race on cpus_have_tags cpumask Alexander Gordeev
@ 2013-10-28 10:05 ` Alexander Gordeev
  2013-10-28 21:21   ` Kent Overstreet
  2013-10-28 10:05 ` [PATCH 3/5] percpu_ida: Optimize freeing tags when maximum cache size is 1 Alexander Gordeev
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 7+ messages in thread
From: Alexander Gordeev @ 2013-10-28 10:05 UTC
  To: Kent Overstreet
  Cc: Oleg Nesterov, Jens Axboe, Nicholas A. Bellinger, linux-kernel

Currently percpu_ida_free() always wakes up waiters with local
interrupts disabled and sometimes with pool->lock held. Yet, there
does not appear to be any reason why this could not be done outside
of these atomic contexts.
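
The resulting shape is the usual flag-and-defer pattern; a minimal
sketch (variable names simplified relative to the diff below):

    unsigned long flags;
    bool wake = false;

    local_irq_save(flags);
    /* free the tag and note whether anyone needs waking */
    if (nr_free == 1)
        wake = true;
    local_irq_restore(flags);

    if (wake)
        wake_up(&pool->wait);  /* no longer under irq-disable or pool->lock */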

Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
---
 lib/percpu_ida.c |   11 +++++++----
 1 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/lib/percpu_ida.c b/lib/percpu_ida.c
index 96cfacf..9dd8741 100644
--- a/lib/percpu_ida.c
+++ b/lib/percpu_ida.c
@@ -223,6 +223,7 @@ void percpu_ida_free(struct percpu_ida *pool, unsigned tag)
 	struct percpu_ida_cpu *tags;
 	unsigned long flags;
 	unsigned nr_free;
+	bool wake_up = false;
 
 	BUG_ON(tag >= pool->nr_tags);
 
@@ -241,8 +242,8 @@ void percpu_ida_free(struct percpu_ida *pool, unsigned tag)
 		 * Pairs with smp_rmb() in steal_tags()
 		 */
 		smp_wmb();
-		wake_up(&pool->wait);
-	}
+		wake_up = true;
+	} 
 
 	if (nr_free == pool->percpu_max_size) {
 		spin_lock(&pool->lock);
@@ -255,13 +256,15 @@ void percpu_ida_free(struct percpu_ida *pool, unsigned tag)
 			move_tags(pool->freelist, &pool->nr_free,
 				  tags->freelist, &tags->nr_free,
 				  pool->percpu_batch_size);
-
-			wake_up(&pool->wait);
+			wake_up = true;
 		}
 		spin_unlock(&pool->lock);
 	}
 
 	local_irq_restore(flags);
+
+	if (wake_up)
+		wake_up(&pool->wait);
 }
 EXPORT_SYMBOL_GPL(percpu_ida_free);
 
-- 
1.7.7.6


-- 
Regards,
Alexander Gordeev
agordeev@redhat.com


* [PATCH 3/5] percpu_ida: Optimize freeing tags when maximum cache size is 1
       [not found] <cover.1382950629.git.agordeev@redhat.com>
  2013-10-28 10:05 ` [PATCH 1/5] percpu_ida: Fix data race on cpus_have_tags cpumask Alexander Gordeev
  2013-10-28 10:05 ` [PATCH 2/5] percpu_ida: Move waking up waiters out of atomic contexts Alexander Gordeev
@ 2013-10-28 10:05 ` Alexander Gordeev
  2013-10-28 10:06 ` [PATCH 4/5] percpu_ida: Sanity check initialization parameters Alexander Gordeev
  2013-10-28 10:06 ` [PATCH 5/5] percpu_ida: Allow variable maximum number of cached tags Alexander Gordeev
  4 siblings, 0 replies; 7+ messages in thread
From: Alexander Gordeev @ 2013-10-28 10:05 UTC
  To: Kent Overstreet
  Cc: Oleg Nesterov, Jens Axboe, Nicholas A. Bellinger, linux-kernel

In case percpu_max_size is 1, percpu_ida_free() sets a bit in
the cpumask and immediately transfers the tag to the pool.
As a result, the very next call to percpu_ida_alloc() on the
same CPU has to pull a tag from the pool back into the local
cache, and so on. Hence, the positive effects of local caching
are largely negated.

This update assumes stealing tags is faster than ping-ponging
between the local caches and the pool, and stops returning tags
to the pool in case percpu_max_size is 1.
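
A sketch of the ping-pong being avoided, assuming a pool initialized
with percpu_max_size == 1 (the alloc call uses the gfp-based interface
this series is based on):

    percpu_ida_free(pool, tag);
    /* nr_free reached percpu_max_size (1), so the tag was just
     * handed straight back to the global freelist... */
    tag = percpu_ida_alloc(pool, GFP_NOWAIT);
    /* ...and this allocation on the same CPU must take pool->lock
     * and refill the local cache from the pool all over again. */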

Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
---
 lib/percpu_ida.c |    5 ++---
 1 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/lib/percpu_ida.c b/lib/percpu_ida.c
index 9dd8741..4adc3e5 100644
--- a/lib/percpu_ida.c
+++ b/lib/percpu_ida.c
@@ -243,9 +243,8 @@ void percpu_ida_free(struct percpu_ida *pool, unsigned tag)
 		 */
 		smp_wmb();
 		wake_up = true;
-	} 
-
-	if (nr_free == pool->percpu_max_size) {
+	} else if ((nr_free == pool->percpu_max_size) &&
+		   (pool->percpu_max_size > 1)) {
 		spin_lock(&pool->lock);
 
 		/*
-- 
1.7.7.6


-- 
Regards,
Alexander Gordeev
agordeev@redhat.com


* [PATCH 4/5] percpu_ida: Sanity check initialization parameters
       [not found] <cover.1382950629.git.agordeev@redhat.com>
                   ` (2 preceding siblings ...)
  2013-10-28 10:05 ` [PATCH 3/5] percpu_ida: Optimize freeing tags when maximum cache size is 1 Alexander Gordeev
@ 2013-10-28 10:06 ` Alexander Gordeev
  2013-10-28 10:06 ` [PATCH 5/5] percpu_ida: Allow variable maximum number of cached tags Alexander Gordeev
  4 siblings, 0 replies; 7+ messages in thread
From: Alexander Gordeev @ 2013-10-28 10:06 UTC
  To: Kent Overstreet
  Cc: Oleg Nesterov, Jens Axboe, Nicholas A. Bellinger, linux-kernel
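
The diff below makes __percpu_ida_init() fail early on inconsistent
parameters. A hypothetical caller, with made-up values, would see:

    int err;
    struct percpu_ida pool;

    err = __percpu_ida_init(&pool, 128, 8, 16); /* batch_size 16 > max_size 8 */
    /* err == -ERANGE */
    err = __percpu_ida_init(&pool, 128, 8, 0);  /* zero batch_size */
    /* err == -EINVAL */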

Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
---
 lib/percpu_ida.c |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/lib/percpu_ida.c b/lib/percpu_ida.c
index 4adc3e5..1fc89f9 100644
--- a/lib/percpu_ida.c
+++ b/lib/percpu_ida.c
@@ -298,6 +298,11 @@ int __percpu_ida_init(struct percpu_ida *pool, unsigned long nr_tags,
 {
 	unsigned i, cpu, order;
 
+	if (batch_size > max_size)
+		return -ERANGE;
+	if (!batch_size)
+		return -EINVAL;
+
 	memset(pool, 0, sizeof(*pool));
 
 	init_waitqueue_head(&pool->wait);
-- 
1.7.7.6


-- 
Regards,
Alexander Gordeev
agordeev@redhat.com


* [PATCH 5/5] percpu_ida: Allow variable maximum number of cached tags
       [not found] <cover.1382950629.git.agordeev@redhat.com>
                   ` (3 preceding siblings ...)
  2013-10-28 10:06 ` [PATCH 4/5] percpu_ida: Sanity check initialization parameters Alexander Gordeev
@ 2013-10-28 10:06 ` Alexander Gordeev
  4 siblings, 0 replies; 7+ messages in thread
From: Alexander Gordeev @ 2013-10-28 10:06 UTC
  To: Kent Overstreet
  Cc: Oleg Nesterov, Jens Axboe, Nicholas A. Bellinger, linux-kernel

Currently the threshold for stealing tags from remote CPUs
is set to half of the total number of tags. However, in the
general case this threshold is a function not only of the
total number of tags and the maximum number of tags per CPU,
but also of the usage pattern. Just let percpu_ida users
decide how big this threshold value should be.
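
For example, a caller can keep the old behaviour or tune it (sketch;
the variables stand in for whatever the user derives from its
workload):

    /* old behaviour: steal only once remotely cached tags
     * exceed half of the total */
    err = __percpu_ida_init(&pool, nr_tags, max_size, batch_size,
                            nr_tags / 2);

    /* or steal eagerly: any remotely cached tag is fair game,
     * as the reserved-tags pool in blk-mq now does */
    err = __percpu_ida_init(&pool, nr_tags, max_size, batch_size, 0);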

Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
---
 block/blk-mq-tag.c         |    9 +++++----
 include/linux/percpu_ida.h |    5 +++--
 lib/percpu_ida.c           |    7 +++++--
 3 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index d64a02f..da0b3dd 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -145,10 +145,11 @@ struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags,
 	tags->nr_max_cache = nr_cache;
 	tags->nr_batch_move = max(1u, nr_cache / 2);
 
-	ret = __percpu_ida_init(&tags->free_tags, tags->nr_tags -
-				tags->nr_reserved_tags,
+	ret = __percpu_ida_init(&tags->free_tags,
+				tags->nr_tags - tags->nr_reserved_tags,
 				tags->nr_max_cache,
-				tags->nr_batch_move);
+				tags->nr_batch_move,
+				(tags->nr_tags - tags->nr_reserved_tags) / 2);
 	if (ret)
 		goto err_free_tags;
 
@@ -158,7 +159,7 @@ struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags,
 		 * no cached. It's fine reserved tags allocation is slow.
 		 */
 		ret = __percpu_ida_init(&tags->reserved_tags, reserved_tags,
-				1, 1);
+				1, 1, 0);
 		if (ret)
 			goto err_reserved_tags;
 	}
diff --git a/include/linux/percpu_ida.h b/include/linux/percpu_ida.h
index 1900bd0..d874cca 100644
--- a/include/linux/percpu_ida.h
+++ b/include/linux/percpu_ida.h
@@ -18,6 +18,7 @@ struct percpu_ida {
 	unsigned			nr_tags;
 	unsigned			percpu_max_size;
 	unsigned			percpu_batch_size;
+	unsigned			max_cached;
 
 	struct percpu_ida_cpu __percpu	*tag_cpu;
 
@@ -66,11 +67,11 @@ void percpu_ida_free(struct percpu_ida *pool, unsigned tag);
 
 void percpu_ida_destroy(struct percpu_ida *pool);
 int __percpu_ida_init(struct percpu_ida *pool, unsigned long nr_tags,
-	unsigned long max_size, unsigned long batch_size);
+	unsigned long max_size, unsigned long batch_size, unsigned max_cached);
 static inline int percpu_ida_init(struct percpu_ida *pool, unsigned long nr_tags)
 {
 	return __percpu_ida_init(pool, nr_tags, IDA_DEFAULT_PCPU_SIZE,
-		IDA_DEFAULT_PCPU_BATCH_MOVE);
+		IDA_DEFAULT_PCPU_BATCH_MOVE, nr_tags / 2);
 }
 
 typedef int (*percpu_ida_cb)(unsigned, void *);
diff --git a/lib/percpu_ida.c b/lib/percpu_ida.c
index 1fc89f9..241f8a3 100644
--- a/lib/percpu_ida.c
+++ b/lib/percpu_ida.c
@@ -74,7 +74,7 @@ static inline void steal_tags(struct percpu_ida *pool,
 	smp_rmb();
 
 	for (cpus_have_tags = cpumask_weight(&pool->cpus_have_tags);
-	     cpus_have_tags * pool->percpu_max_size > pool->nr_tags / 2;
+	     cpus_have_tags * pool->percpu_max_size > pool->max_cached;
 	     cpus_have_tags--) {
 		cpu = cpumask_next(cpu, &pool->cpus_have_tags);
 
@@ -294,7 +294,7 @@ EXPORT_SYMBOL_GPL(percpu_ida_destroy);
  * performance, the workload should not span more cpus than nr_tags / 128.
  */
 int __percpu_ida_init(struct percpu_ida *pool, unsigned long nr_tags,
-	unsigned long max_size, unsigned long batch_size)
+	unsigned long max_size, unsigned long batch_size, unsigned max_cached)
 {
 	unsigned i, cpu, order;
 
@@ -302,6 +302,8 @@ int __percpu_ida_init(struct percpu_ida *pool, unsigned long nr_tags,
 		return -ERANGE;
 	if (!batch_size)
 		return -EINVAL;
+	if (max_cached > nr_tags)
+		return -EINVAL;
 
 	memset(pool, 0, sizeof(*pool));
 
@@ -310,6 +312,7 @@ int __percpu_ida_init(struct percpu_ida *pool, unsigned long nr_tags,
 	pool->nr_tags = nr_tags;
 	pool->percpu_max_size = max_size;
 	pool->percpu_batch_size = batch_size;
+	pool->max_cached = max_cached;
 
 	/* Guard against overflow */
 	if (nr_tags > (unsigned) INT_MAX + 1) {
-- 
1.7.7.6


-- 
Regards,
Alexander Gordeev
agordeev@redhat.com


* Re: [PATCH 1/5] percpu_ida: Fix data race on cpus_have_tags cpumask
  2013-10-28 10:05 ` [PATCH 1/5] percpu_ida: Fix data race on cpus_have_tags cpumask Alexander Gordeev
@ 2013-10-28 21:20   ` Kent Overstreet
  0 siblings, 0 replies; 7+ messages in thread
From: Kent Overstreet @ 2013-10-28 21:20 UTC
  To: Alexander Gordeev
  Cc: Oleg Nesterov, Jens Axboe, Nicholas A. Bellinger, linux-kernel

On Mon, Oct 28, 2013 at 11:05:06AM +0100, Alexander Gordeev wrote:
> Function steal_tags() might miss a bit in cpus_have_tags due to
> unsynchronized access from percpu_ida_free(). As a result, function
> percpu_ida_alloc() might enter an unwakeable sleep. This update adds
> memory barriers to prevent the described scenario.
> 
> In fact, accesses to cpus_have_tags are currently fenced by the
> prepare_to_wait() and wake_up() calls, and it does not appear the
> aforementioned sequence could actually be hit. Nevertheless, explicit
> memory barriers still seem justifiable.

Good catch!

Acked-by: Kent Overstreet <kmo@daterainc.com>


* Re: [PATCH 2/5] percpu_ida: Move waking up waiters out of atomic contexts
  2013-10-28 10:05 ` [PATCH 2/5] percpu_ida: Move waking up waiters out of atomic contexts Alexander Gordeev
@ 2013-10-28 21:21   ` Kent Overstreet
  0 siblings, 0 replies; 7+ messages in thread
From: Kent Overstreet @ 2013-10-28 21:21 UTC
  To: Alexander Gordeev
  Cc: Oleg Nesterov, Jens Axboe, Nicholas A. Bellinger, linux-kernel

On Mon, Oct 28, 2013 at 11:05:25AM +0100, Alexander Gordeev wrote:
> Currently percpu_ida_free() always wakes up waiters with local
> interrupts disabled and sometimes with pool->lock held. Yet, there
> does not appear to be any reason why this could not be done outside
> of these atomic contexts.

This should be a noticeable performance boost; nested irqsave/restore is painful.

Acked-by: Kent Overstreet <kmo@daterainc.com>

