* [PATCH 0/2] fix hangs with shared sqpoll
@ 2021-04-16  0:22 Pavel Begunkov
  2021-04-16  0:22 ` [PATCH 1/2] percpu_ref: add percpu_ref_atomic_count() Pavel Begunkov
                   ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Pavel Begunkov @ 2021-04-16  0:22 UTC (permalink / raw)
  To: Jens Axboe, io-uring
  Cc: Dennis Zhou, Tejun Heo, Christoph Lameter, Joakim Hassila

A late-caught 5.12 bug with nasty hangs. Thanks Jens for the reproducer.

Pavel Begunkov (2):
  percpu_ref: add percpu_ref_atomic_count()
  io_uring: fix shared sqpoll cancellation hangs

 fs/io_uring.c                   |  5 +++--
 include/linux/percpu-refcount.h |  1 +
 lib/percpu-refcount.c           | 26 ++++++++++++++++++++++++++
 3 files changed, 30 insertions(+), 2 deletions(-)

-- 
2.24.0


* [PATCH 1/2] percpu_ref: add percpu_ref_atomic_count()
  2021-04-16  0:22 [PATCH 0/2] fix hangs with shared sqpoll Pavel Begunkov
@ 2021-04-16  0:22 ` Pavel Begunkov
  2021-04-16  4:45   ` Dennis Zhou
  2021-04-16 15:31   ` Bart Van Assche
  2021-04-16  0:22 ` [PATCH 2/2] io_uring: fix shared sqpoll cancellation hangs Pavel Begunkov
  2021-04-16  0:26 ` [PATCH 0/2] fix hangs with shared sqpoll Pavel Begunkov
  2 siblings, 2 replies; 17+ messages in thread
From: Pavel Begunkov @ 2021-04-16  0:22 UTC (permalink / raw)
  To: Jens Axboe, io-uring
  Cc: Dennis Zhou, Tejun Heo, Christoph Lameter, Joakim Hassila

Add percpu_ref_atomic_count(), which returns the number of references
held by a percpu_ref that has previously been switched into atomic mode.
The caller is responsible for making sure the ref is in the right mode.
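
A typical caller is expected to switch the ref into atomic mode first and
only then read it, e.g. (a sketch mirroring how the next patch uses it):

	percpu_ref_switch_to_atomic_sync(&ctx->refs);
	inflight = percpu_ref_atomic_count(&ctx->refs);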

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 include/linux/percpu-refcount.h |  1 +
 lib/percpu-refcount.c           | 26 ++++++++++++++++++++++++++
 2 files changed, 27 insertions(+)

diff --git a/include/linux/percpu-refcount.h b/include/linux/percpu-refcount.h
index 16c35a728b4c..0ff40e79efa2 100644
--- a/include/linux/percpu-refcount.h
+++ b/include/linux/percpu-refcount.h
@@ -131,6 +131,7 @@ void percpu_ref_kill_and_confirm(struct percpu_ref *ref,
 void percpu_ref_resurrect(struct percpu_ref *ref);
 void percpu_ref_reinit(struct percpu_ref *ref);
 bool percpu_ref_is_zero(struct percpu_ref *ref);
+unsigned long percpu_ref_atomic_count(struct percpu_ref *ref);
 
 /**
  * percpu_ref_kill - drop the initial ref
diff --git a/lib/percpu-refcount.c b/lib/percpu-refcount.c
index a1071cdefb5a..56286995e2b8 100644
--- a/lib/percpu-refcount.c
+++ b/lib/percpu-refcount.c
@@ -425,6 +425,32 @@ bool percpu_ref_is_zero(struct percpu_ref *ref)
 }
 EXPORT_SYMBOL_GPL(percpu_ref_is_zero);
 
+/**
+ * percpu_ref_atomic_count - returns the number of remaining references
+ * @ref: percpu_ref to test
+ *
+ * This function is safe to call as long as @ref has been switched into
+ * atomic mode, and is between init and exit.
+ */
+unsigned long percpu_ref_atomic_count(struct percpu_ref *ref)
+{
+	unsigned long __percpu *percpu_count;
+	unsigned long count, flags;
+
+	if (WARN_ON_ONCE(__ref_is_percpu(ref, &percpu_count)))
+		return -1UL;
+
+	/* protect us from being destroyed */
+	spin_lock_irqsave(&percpu_ref_switch_lock, flags);
+	if (ref->data)
+		count = atomic_long_read(&ref->data->count);
+	else
+		count = ref->percpu_count_ptr >> __PERCPU_REF_FLAG_BITS;
+	spin_unlock_irqrestore(&percpu_ref_switch_lock, flags);
+
+	return count;
+}
+
 /**
  * percpu_ref_reinit - re-initialize a percpu refcount
  * @ref: perpcu_ref to re-initialize
-- 
2.24.0


* [PATCH 2/2] io_uring: fix shared sqpoll cancellation hangs
  2021-04-16  0:22 [PATCH 0/2] fix hangs with shared sqpoll Pavel Begunkov
  2021-04-16  0:22 ` [PATCH 1/2] percpu_ref: add percpu_ref_atomic_count() Pavel Begunkov
@ 2021-04-16  0:22 ` Pavel Begunkov
  2021-04-16  0:26 ` [PATCH 0/2] fix hangs with shared sqpoll Pavel Begunkov
  2 siblings, 0 replies; 17+ messages in thread
From: Pavel Begunkov @ 2021-04-16  0:22 UTC (permalink / raw)
  To: Jens Axboe, io-uring
  Cc: Dennis Zhou, Tejun Heo, Christoph Lameter, Joakim Hassila

[  736.982891] INFO: task iou-sqp-4294:4295 blocked for more than 122 seconds.
[  736.982897] Call Trace:
[  736.982901]  schedule+0x68/0xe0
[  736.982903]  io_uring_cancel_sqpoll+0xdb/0x110
[  736.982908]  io_sqpoll_cancel_cb+0x24/0x30
[  736.982911]  io_run_task_work_head+0x28/0x50
[  736.982913]  io_sq_thread+0x4e3/0x720

We call io_uring_cancel_sqpoll() one by one for each ctx, either from
sq_thread() itself or via task works, and it's intended to cancel all
requests of a specified context. However, the function uses per-task
counters to track the number of inflight requests, so it counts more
requests than belong to the current io_uring ctx and goes to sleep
waiting for their completions (e.g. from IRQ), which will never happen.

Switch to counting via the context's own ctx->refs (forced into atomic
mode), so that only the ctx being cancelled is accounted for.

Reported-by: Joakim Hassila <joj@mac.com>
Reported-by: Jens Axboe <axboe@kernel.dk>
Fixes: 37d1e2e3642e2 ("io_uring: move SQPOLL thread io-wq forked worker")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 fs/io_uring.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index dff34975d86b..c1c843b044c0 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -9000,10 +9000,11 @@ static void io_uring_cancel_sqpoll(struct io_ring_ctx *ctx)
 
 	WARN_ON_ONCE(!sqd || ctx->sq_data->thread != current);
 
+	percpu_ref_switch_to_atomic_sync(&ctx->refs);
 	atomic_inc(&tctx->in_idle);
 	do {
 		/* read completions before cancelations */
-		inflight = tctx_inflight(tctx);
+		inflight = percpu_ref_atomic_count(&ctx->refs);
 		if (!inflight)
 			break;
 		io_uring_try_cancel_requests(ctx, current, NULL);
@@ -9014,7 +9015,7 @@ static void io_uring_cancel_sqpoll(struct io_ring_ctx *ctx)
 		 * avoids a race where a completion comes in before we did
 		 * prepare_to_wait().
 		 */
-		if (inflight == tctx_inflight(tctx))
+		if (inflight == percpu_ref_atomic_count(&ctx->refs))
 			schedule();
 		finish_wait(&tctx->wait, &wait);
 	} while (1);
-- 
2.24.0


* Re: [PATCH 0/2] fix hangs with shared sqpoll
  2021-04-16  0:22 [PATCH 0/2] fix hangs with shared sqpoll Pavel Begunkov
  2021-04-16  0:22 ` [PATCH 1/2] percpu_ref: add percpu_ref_atomic_count() Pavel Begunkov
  2021-04-16  0:22 ` [PATCH 2/2] io_uring: fix shared sqpoll cancellation hangs Pavel Begunkov
@ 2021-04-16  0:26 ` Pavel Begunkov
  2021-04-16 13:04   ` Jens Axboe
  2 siblings, 1 reply; 17+ messages in thread
From: Pavel Begunkov @ 2021-04-16  0:26 UTC (permalink / raw)
  To: Jens Axboe, io-uring
  Cc: Dennis Zhou, Tejun Heo, Christoph Lameter, Joakim Hassila

On 16/04/2021 01:22, Pavel Begunkov wrote:
> Late catched 5.12 bug with nasty hangs. Thanks Jens for a reproducer.

1/2 is basically a rip-off of one of Jens' old patches, but I can't
find it anywhere. If you still have it, especially if it was
reviewed/etc., it may make sense to go with it instead.

> 
> Pavel Begunkov (2):
>   percpu_ref: add percpu_ref_atomic_count()
>   io_uring: fix shared sqpoll cancellation hangs
> 
>  fs/io_uring.c                   |  5 +++--
>  include/linux/percpu-refcount.h |  1 +
>  lib/percpu-refcount.c           | 26 ++++++++++++++++++++++++++
>  3 files changed, 30 insertions(+), 2 deletions(-)
> 

-- 
Pavel Begunkov

* Re: [PATCH 1/2] percpu_ref: add percpu_ref_atomic_count()
  2021-04-16  0:22 ` [PATCH 1/2] percpu_ref: add percpu_ref_atomic_count() Pavel Begunkov
@ 2021-04-16  4:45   ` Dennis Zhou
  2021-04-16 13:16     ` Pavel Begunkov
  2021-04-16 15:31   ` Bart Van Assche
  1 sibling, 1 reply; 17+ messages in thread
From: Dennis Zhou @ 2021-04-16  4:45 UTC (permalink / raw)
  To: Pavel Begunkov
  Cc: Jens Axboe, io-uring, Tejun Heo, Christoph Lameter, Joakim Hassila

Hello,

On Fri, Apr 16, 2021 at 01:22:51AM +0100, Pavel Begunkov wrote:
> Add percpu_ref_atomic_count(), which returns number of references of a
> percpu_ref switched prior into atomic mode, so the caller is responsible
> to make sure it's in the right mode.
> 
> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
> ---
>  include/linux/percpu-refcount.h |  1 +
>  lib/percpu-refcount.c           | 26 ++++++++++++++++++++++++++
>  2 files changed, 27 insertions(+)
> 
> diff --git a/include/linux/percpu-refcount.h b/include/linux/percpu-refcount.h
> index 16c35a728b4c..0ff40e79efa2 100644
> --- a/include/linux/percpu-refcount.h
> +++ b/include/linux/percpu-refcount.h
> @@ -131,6 +131,7 @@ void percpu_ref_kill_and_confirm(struct percpu_ref *ref,
>  void percpu_ref_resurrect(struct percpu_ref *ref);
>  void percpu_ref_reinit(struct percpu_ref *ref);
>  bool percpu_ref_is_zero(struct percpu_ref *ref);
> +unsigned long percpu_ref_atomic_count(struct percpu_ref *ref);
>  
>  /**
>   * percpu_ref_kill - drop the initial ref
> diff --git a/lib/percpu-refcount.c b/lib/percpu-refcount.c
> index a1071cdefb5a..56286995e2b8 100644
> --- a/lib/percpu-refcount.c
> +++ b/lib/percpu-refcount.c
> @@ -425,6 +425,32 @@ bool percpu_ref_is_zero(struct percpu_ref *ref)
>  }
>  EXPORT_SYMBOL_GPL(percpu_ref_is_zero);
>  
> +/**
> + * percpu_ref_atomic_count - returns number of left references
> + * @ref: percpu_ref to test
> + *
> + * This function is safe to call as long as @ref is switch into atomic mode,
> + * and is between init and exit.
> + */
> +unsigned long percpu_ref_atomic_count(struct percpu_ref *ref)
> +{
> +	unsigned long __percpu *percpu_count;
> +	unsigned long count, flags;
> +
> +	if (WARN_ON_ONCE(__ref_is_percpu(ref, &percpu_count)))
> +		return -1UL;
> +
> +	/* protect us from being destroyed */
> +	spin_lock_irqsave(&percpu_ref_switch_lock, flags);
> +	if (ref->data)
> +		count = atomic_long_read(&ref->data->count);
> +	else
> +		count = ref->percpu_count_ptr >> __PERCPU_REF_FLAG_BITS;

Sorry I missed Jens' patch before and also the update to percpu_ref.
However, I feel like I'm missing something. This isn't entirely related
to your patch, but I'm not following why percpu_count_ptr stores the
excess count of an exited percpu_ref and doesn't warn when it's not
zero. It seems like this should be an error if it's not 0?

Granted we have made some contract with the user to do the right thing,
but say someone does mess up, we don't indicate to them hey this ref is
actually dead and if they're waiting for it to go to 0, it never will.

> +	spin_unlock_irqrestore(&percpu_ref_switch_lock, flags);
> +
> +	return count;
> +}
> +
>  /**
>   * percpu_ref_reinit - re-initialize a percpu refcount
>   * @ref: perpcu_ref to re-initialize
> -- 
> 2.24.0
> 

Thanks,
Dennis

* Re: [PATCH 0/2] fix hangs with shared sqpoll
  2021-04-16  0:26 ` [PATCH 0/2] fix hangs with shared sqpoll Pavel Begunkov
@ 2021-04-16 13:04   ` Jens Axboe
  2021-04-16 13:12     ` Pavel Begunkov
  0 siblings, 1 reply; 17+ messages in thread
From: Jens Axboe @ 2021-04-16 13:04 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring
  Cc: Dennis Zhou, Tejun Heo, Christoph Lameter, Joakim Hassila

On 4/15/21 6:26 PM, Pavel Begunkov wrote:
> On 16/04/2021 01:22, Pavel Begunkov wrote:
>> Late catched 5.12 bug with nasty hangs. Thanks Jens for a reproducer.
> 
> 1/2 is basically a rip off of one of old Jens' patches, but can't
> find it anywhere. If you still have it, especially if it was
> reviewed/etc., may make sense to go with it instead

I wonder if we can do something like the below instead - we don't
care about a particularly stable count in terms of wakeup
reliance, and it'd save a nasty sync atomic switch.

Totally untested...


diff --git a/fs/io_uring.c b/fs/io_uring.c
index 6c182a3a221b..9edbcf01ea49 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -8928,7 +8928,7 @@ static void io_uring_cancel_sqpoll(struct io_ring_ctx *ctx)
 	atomic_inc(&tctx->in_idle);
 	do {
 		/* read completions before cancelations */
-		inflight = tctx_inflight(tctx, false);
+		inflight = percpu_ref_sum(&ctx->refs);
 		if (!inflight)
 			break;
 		io_uring_try_cancel_requests(ctx, current, NULL);
@@ -8939,7 +8939,7 @@ static void io_uring_cancel_sqpoll(struct io_ring_ctx *ctx)
 		 * avoids a race where a completion comes in before we did
 		 * prepare_to_wait().
 		 */
-		if (inflight == tctx_inflight(tctx, false))
+		if (inflight == percpu_ref_sum(&ctx->refs))
 			schedule();
 		finish_wait(&tctx->wait, &wait);
 	} while (1);
diff --git a/include/linux/percpu-refcount.h b/include/linux/percpu-refcount.h
index 16c35a728b4c..2f29f34bc993 100644
--- a/include/linux/percpu-refcount.h
+++ b/include/linux/percpu-refcount.h
@@ -131,6 +131,7 @@ void percpu_ref_kill_and_confirm(struct percpu_ref *ref,
 void percpu_ref_resurrect(struct percpu_ref *ref);
 void percpu_ref_reinit(struct percpu_ref *ref);
 bool percpu_ref_is_zero(struct percpu_ref *ref);
+long percpu_ref_sum(struct percpu_ref *ref);
 
 /**
  * percpu_ref_kill - drop the initial ref
diff --git a/lib/percpu-refcount.c b/lib/percpu-refcount.c
index a1071cdefb5a..b09ed9fdd32d 100644
--- a/lib/percpu-refcount.c
+++ b/lib/percpu-refcount.c
@@ -475,3 +475,31 @@ void percpu_ref_resurrect(struct percpu_ref *ref)
 	spin_unlock_irqrestore(&percpu_ref_switch_lock, flags);
 }
 EXPORT_SYMBOL_GPL(percpu_ref_resurrect);
+
+/**
+ * percpu_ref_sum - return approximate ref counts
+ * @ref: percpu_ref to sum
+ *
+ * Note that this should only really be used to compare refs, as by the
+ * very nature of percpu references, the value may be stale even before it
+ * has been returned.
+ */
+long percpu_ref_sum(struct percpu_ref *ref)
+{
+	unsigned long __percpu *percpu_count;
+	long ret;
+
+	rcu_read_lock();
+	if (__ref_is_percpu(ref, &percpu_count)) {
+		ret = atomic_long_read(&ref->data->count);
+	} else {
+		int cpu;
+
+		ret = 0;
+		for_each_possible_cpu(cpu)
+			ret += *per_cpu_ptr(percpu_count, cpu);
+	}
+	rcu_read_unlock();
+
+	return ret;
+}

-- 
Jens Axboe


* Re: [PATCH 0/2] fix hangs with shared sqpoll
  2021-04-16 13:04   ` Jens Axboe
@ 2021-04-16 13:12     ` Pavel Begunkov
  2021-04-16 13:58       ` Jens Axboe
  0 siblings, 1 reply; 17+ messages in thread
From: Pavel Begunkov @ 2021-04-16 13:12 UTC (permalink / raw)
  To: Jens Axboe, io-uring
  Cc: Dennis Zhou, Tejun Heo, Christoph Lameter, Joakim Hassila

On 16/04/2021 14:04, Jens Axboe wrote:
> On 4/15/21 6:26 PM, Pavel Begunkov wrote:
>> On 16/04/2021 01:22, Pavel Begunkov wrote:
>>> Late catched 5.12 bug with nasty hangs. Thanks Jens for a reproducer.
>>
>> 1/2 is basically a rip off of one of old Jens' patches, but can't
>> find it anywhere. If you still have it, especially if it was
>> reviewed/etc., may make sense to go with it instead
> 
> I wonder if we can do something like the below instead - we don't
> care about a particularly stable count in terms of wakeup
> reliance, and it'd save a nasty sync atomic switch.

But we care about it being monotonic. There are nuances with it.
I think non-synced summing may put it to eternal sleep.

Are you looking to save on switching? It's almost always already
dying from a prior ref_kill.

> 
> Totally untested...
> 
> 
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index 6c182a3a221b..9edbcf01ea49 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -8928,7 +8928,7 @@ static void io_uring_cancel_sqpoll(struct io_ring_ctx *ctx)
>  	atomic_inc(&tctx->in_idle);
>  	do {
>  		/* read completions before cancelations */
> -		inflight = tctx_inflight(tctx, false);
> +		inflight = percpu_ref_sum(&ctx->refs);
>  		if (!inflight)
>  			break;
>  		io_uring_try_cancel_requests(ctx, current, NULL);
> @@ -8939,7 +8939,7 @@ static void io_uring_cancel_sqpoll(struct io_ring_ctx *ctx)
>  		 * avoids a race where a completion comes in before we did
>  		 * prepare_to_wait().
>  		 */
> -		if (inflight == tctx_inflight(tctx, false))
> +		if (inflight == percpu_ref_sum(&ctx->refs))
>  			schedule();
>  		finish_wait(&tctx->wait, &wait);
>  	} while (1);
> diff --git a/include/linux/percpu-refcount.h b/include/linux/percpu-refcount.h
> index 16c35a728b4c..2f29f34bc993 100644
> --- a/include/linux/percpu-refcount.h
> +++ b/include/linux/percpu-refcount.h
> @@ -131,6 +131,7 @@ void percpu_ref_kill_and_confirm(struct percpu_ref *ref,
>  void percpu_ref_resurrect(struct percpu_ref *ref);
>  void percpu_ref_reinit(struct percpu_ref *ref);
>  bool percpu_ref_is_zero(struct percpu_ref *ref);
> +long percpu_ref_sum(struct percpu_ref *ref);
>  
>  /**
>   * percpu_ref_kill - drop the initial ref
> diff --git a/lib/percpu-refcount.c b/lib/percpu-refcount.c
> index a1071cdefb5a..b09ed9fdd32d 100644
> --- a/lib/percpu-refcount.c
> +++ b/lib/percpu-refcount.c
> @@ -475,3 +475,31 @@ void percpu_ref_resurrect(struct percpu_ref *ref)
>  	spin_unlock_irqrestore(&percpu_ref_switch_lock, flags);
>  }
>  EXPORT_SYMBOL_GPL(percpu_ref_resurrect);
> +
> +/**
> + * percpu_ref_sum - return approximate ref counts
> + * @ref: perpcu_ref to sum
> + *
> + * Note that this should only really be used to compare refs, as by the
> + * very nature of percpu references, the value may be stale even before it
> + * has been returned.
> + */
> +long percpu_ref_sum(struct percpu_ref *ref)
> +{
> +	unsigned long __percpu *percpu_count;
> +	long ret;
> +
> +	rcu_read_lock();
> +	if (__ref_is_percpu(ref, &percpu_count)) {
> +		ret = atomic_long_read(&ref->data->count);
> +	} else {
> +		int cpu;
> +
> +		ret = 0;
> +		for_each_possible_cpu(cpu)
> +			ret += *per_cpu_ptr(percpu_count, cpu);
> +	}
> +	rcu_read_unlock();
> +
> +	return ret;
> +}
> 

-- 
Pavel Begunkov

* Re: [PATCH 1/2] percpu_ref: add percpu_ref_atomic_count()
  2021-04-16  4:45   ` Dennis Zhou
@ 2021-04-16 13:16     ` Pavel Begunkov
  2021-04-16 14:10       ` Ming Lei
  0 siblings, 1 reply; 17+ messages in thread
From: Pavel Begunkov @ 2021-04-16 13:16 UTC (permalink / raw)
  To: Dennis Zhou
  Cc: Jens Axboe, io-uring, Tejun Heo, Christoph Lameter,
	Joakim Hassila, Ming Lei

On 16/04/2021 05:45, Dennis Zhou wrote:
> Hello,
> 
> On Fri, Apr 16, 2021 at 01:22:51AM +0100, Pavel Begunkov wrote:
>> Add percpu_ref_atomic_count(), which returns number of references of a
>> percpu_ref switched prior into atomic mode, so the caller is responsible
>> to make sure it's in the right mode.
>>
>> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
>> ---
>>  include/linux/percpu-refcount.h |  1 +
>>  lib/percpu-refcount.c           | 26 ++++++++++++++++++++++++++
>>  2 files changed, 27 insertions(+)
>>
>> diff --git a/include/linux/percpu-refcount.h b/include/linux/percpu-refcount.h
>> index 16c35a728b4c..0ff40e79efa2 100644
>> --- a/include/linux/percpu-refcount.h
>> +++ b/include/linux/percpu-refcount.h
>> @@ -131,6 +131,7 @@ void percpu_ref_kill_and_confirm(struct percpu_ref *ref,
>>  void percpu_ref_resurrect(struct percpu_ref *ref);
>>  void percpu_ref_reinit(struct percpu_ref *ref);
>>  bool percpu_ref_is_zero(struct percpu_ref *ref);
>> +unsigned long percpu_ref_atomic_count(struct percpu_ref *ref);
>>  
>>  /**
>>   * percpu_ref_kill - drop the initial ref
>> diff --git a/lib/percpu-refcount.c b/lib/percpu-refcount.c
>> index a1071cdefb5a..56286995e2b8 100644
>> --- a/lib/percpu-refcount.c
>> +++ b/lib/percpu-refcount.c
>> @@ -425,6 +425,32 @@ bool percpu_ref_is_zero(struct percpu_ref *ref)
>>  }
>>  EXPORT_SYMBOL_GPL(percpu_ref_is_zero);
>>  
>> +/**
>> + * percpu_ref_atomic_count - returns number of left references
>> + * @ref: percpu_ref to test
>> + *
>> + * This function is safe to call as long as @ref is switch into atomic mode,
>> + * and is between init and exit.
>> + */
>> +unsigned long percpu_ref_atomic_count(struct percpu_ref *ref)
>> +{
>> +	unsigned long __percpu *percpu_count;
>> +	unsigned long count, flags;
>> +
>> +	if (WARN_ON_ONCE(__ref_is_percpu(ref, &percpu_count)))
>> +		return -1UL;
>> +
>> +	/* protect us from being destroyed */
>> +	spin_lock_irqsave(&percpu_ref_switch_lock, flags);
>> +	if (ref->data)
>> +		count = atomic_long_read(&ref->data->count);
>> +	else
>> +		count = ref->percpu_count_ptr >> __PERCPU_REF_FLAG_BITS;
> 
> Sorry I missed Jens' patch before and also the update to percpu_ref.
> However, I feel like I'm missing something. This isn't entirely related
> to your patch, but I'm not following why percpu_count_ptr stores the
> excess count of an exited percpu_ref and doesn't warn when it's not
> zero. It seems like this should be an error if it's not 0?
> 
> Granted we have made some contract with the user to do the right thing,
> but say someone does mess up, we don't indicate to them hey this ref is
> actually dead and if they're waiting for it to go to 0, it never will.

fwiw, I copied is_zero, but skimming through the code I don't
immediately see why it is so...

Cc Ming, he split out some parts of it into dynamic allocation not too
long ago, maybe he knows the trick.

> 
>> +	spin_unlock_irqrestore(&percpu_ref_switch_lock, flags);
>> +
>> +	return count;
>> +}
>> +
>>  /**
>>   * percpu_ref_reinit - re-initialize a percpu refcount
>>   * @ref: perpcu_ref to re-initialize
>> -- 
>> 2.24.0
>>

-- 
Pavel Begunkov

* Re: [PATCH 0/2] fix hangs with shared sqpoll
  2021-04-16 13:12     ` Pavel Begunkov
@ 2021-04-16 13:58       ` Jens Axboe
  2021-04-16 14:09         ` Pavel Begunkov
  0 siblings, 1 reply; 17+ messages in thread
From: Jens Axboe @ 2021-04-16 13:58 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring
  Cc: Dennis Zhou, Tejun Heo, Christoph Lameter, Joakim Hassila

On 4/16/21 7:12 AM, Pavel Begunkov wrote:
> On 16/04/2021 14:04, Jens Axboe wrote:
>> On 4/15/21 6:26 PM, Pavel Begunkov wrote:
>>> On 16/04/2021 01:22, Pavel Begunkov wrote:
>>>> Late catched 5.12 bug with nasty hangs. Thanks Jens for a reproducer.
>>>
>>> 1/2 is basically a rip off of one of old Jens' patches, but can't
>>> find it anywhere. If you still have it, especially if it was
>>> reviewed/etc., may make sense to go with it instead
>>
>> I wonder if we can do something like the below instead - we don't
>> care about a particularly stable count in terms of wakeup
>> reliance, and it'd save a nasty sync atomic switch.
> 
> But we care about it being monotonous. There are nuances with it.

Do we, though? We care about it changing when something has happened,
but not about it being monotonic.

> I think, non sync'ed summing may put it to eternal sleep.

That's what the two reads are about, that's the same as before. The
numbers are racy in both cases, but that's why we compare after having
added ourselves to the wait queue.
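
To make the ordering explicit, the loop in question is roughly this (a
sketch only; count() stands for whichever helper we end up using,
tctx_inflight() or a percpu_ref read):

	inflight = count(ctx);
	if (!inflight)
		break;
	io_uring_try_cancel_requests(ctx, current, NULL);

	prepare_to_wait(&tctx->wait, &wait, TASK_UNINTERRUPTIBLE);
	/* re-read only after we're on the waitqueue, so a completion that
	 * changes the count also wakes us up */
	if (inflight == count(ctx))
		schedule();
	finish_wait(&tctx->wait, &wait);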

> Are you looking to save on switching? It's almost always is already
> dying with prior ref_kill

Yep, always looking to avoid a sync switch if at all possible. For 99%
of the cases it's fine, it's the last case in busy prod that wreaks
havoc.

-- 
Jens Axboe


* Re: [PATCH 0/2] fix hangs with shared sqpoll
  2021-04-16 13:58       ` Jens Axboe
@ 2021-04-16 14:09         ` Pavel Begunkov
  2021-04-16 14:42           ` Pavel Begunkov
       [not found]           ` <20210417013115.15032-1-hdanton@sina.com>
  0 siblings, 2 replies; 17+ messages in thread
From: Pavel Begunkov @ 2021-04-16 14:09 UTC (permalink / raw)
  To: Jens Axboe, io-uring
  Cc: Dennis Zhou, Tejun Heo, Christoph Lameter, Joakim Hassila

On 16/04/2021 14:58, Jens Axboe wrote:
> On 4/16/21 7:12 AM, Pavel Begunkov wrote:
>> On 16/04/2021 14:04, Jens Axboe wrote:
>>> On 4/15/21 6:26 PM, Pavel Begunkov wrote:
>>>> On 16/04/2021 01:22, Pavel Begunkov wrote:
>>>>> Late catched 5.12 bug with nasty hangs. Thanks Jens for a reproducer.
>>>>
>>>> 1/2 is basically a rip off of one of old Jens' patches, but can't
>>>> find it anywhere. If you still have it, especially if it was
>>>> reviewed/etc., may make sense to go with it instead
>>>
>>> I wonder if we can do something like the below instead - we don't
>>> care about a particularly stable count in terms of wakeup
>>> reliance, and it'd save a nasty sync atomic switch.
>>
>> But we care about it being monotonous. There are nuances with it.
> 
> Do we, though? We care about it changing when something has happened,
> but not about it being monotonic.

We may find inflight == get_inflight() when it's not really so, and so
get to schedule() while there are pending requests that are not going
to be cancelled on their own. And those pending requests may have been
non-discoverable and so non-cancellable, e.g. because they were part of
a link/hardlink.

>> I think, non sync'ed summing may put it to eternal sleep.
> 
> That's what the two reads are about, that's the same as before. The
> numbers are racy in both cases, but that's why we compare after having
> added ourselves to the wait queue.
> 
>> Are you looking to save on switching? It's almost always is already
>> dying with prior ref_kill
> 
> Yep, always looking to avoid a sync switch if at all possible. For 99%
> of the cases it's fine, it's the last case in busy prod that wreaks
> havoc.

Limited to sqpoll, so I wouldn't worry, also considering that sqpoll
doesn't have many file notes (as they were called before). We can
completely avoid it, and even make it faster, if it happens from
sq_thread() as it gets to exit, but do we want that for 5.12?

-- 
Pavel Begunkov

* Re: [PATCH 1/2] percpu_ref: add percpu_ref_atomic_count()
  2021-04-16 13:16     ` Pavel Begunkov
@ 2021-04-16 14:10       ` Ming Lei
  2021-04-16 14:37         ` Dennis Zhou
  0 siblings, 1 reply; 17+ messages in thread
From: Ming Lei @ 2021-04-16 14:10 UTC (permalink / raw)
  To: Pavel Begunkov
  Cc: Dennis Zhou, Jens Axboe, io-uring, Tejun Heo, Christoph Lameter,
	Joakim Hassila

On Fri, Apr 16, 2021 at 02:16:41PM +0100, Pavel Begunkov wrote:
> On 16/04/2021 05:45, Dennis Zhou wrote:
> > Hello,
> > 
> > On Fri, Apr 16, 2021 at 01:22:51AM +0100, Pavel Begunkov wrote:
> >> Add percpu_ref_atomic_count(), which returns number of references of a
> >> percpu_ref switched prior into atomic mode, so the caller is responsible
> >> to make sure it's in the right mode.
> >>
> >> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
> >> ---
> >>  include/linux/percpu-refcount.h |  1 +
> >>  lib/percpu-refcount.c           | 26 ++++++++++++++++++++++++++
> >>  2 files changed, 27 insertions(+)
> >>
> >> diff --git a/include/linux/percpu-refcount.h b/include/linux/percpu-refcount.h
> >> index 16c35a728b4c..0ff40e79efa2 100644
> >> --- a/include/linux/percpu-refcount.h
> >> +++ b/include/linux/percpu-refcount.h
> >> @@ -131,6 +131,7 @@ void percpu_ref_kill_and_confirm(struct percpu_ref *ref,
> >>  void percpu_ref_resurrect(struct percpu_ref *ref);
> >>  void percpu_ref_reinit(struct percpu_ref *ref);
> >>  bool percpu_ref_is_zero(struct percpu_ref *ref);
> >> +unsigned long percpu_ref_atomic_count(struct percpu_ref *ref);
> >>  
> >>  /**
> >>   * percpu_ref_kill - drop the initial ref
> >> diff --git a/lib/percpu-refcount.c b/lib/percpu-refcount.c
> >> index a1071cdefb5a..56286995e2b8 100644
> >> --- a/lib/percpu-refcount.c
> >> +++ b/lib/percpu-refcount.c
> >> @@ -425,6 +425,32 @@ bool percpu_ref_is_zero(struct percpu_ref *ref)
> >>  }
> >>  EXPORT_SYMBOL_GPL(percpu_ref_is_zero);
> >>  
> >> +/**
> >> + * percpu_ref_atomic_count - returns number of left references
> >> + * @ref: percpu_ref to test
> >> + *
> >> + * This function is safe to call as long as @ref is switch into atomic mode,
> >> + * and is between init and exit.
> >> + */
> >> +unsigned long percpu_ref_atomic_count(struct percpu_ref *ref)
> >> +{
> >> +	unsigned long __percpu *percpu_count;
> >> +	unsigned long count, flags;
> >> +
> >> +	if (WARN_ON_ONCE(__ref_is_percpu(ref, &percpu_count)))
> >> +		return -1UL;
> >> +
> >> +	/* protect us from being destroyed */
> >> +	spin_lock_irqsave(&percpu_ref_switch_lock, flags);
> >> +	if (ref->data)
> >> +		count = atomic_long_read(&ref->data->count);
> >> +	else
> >> +		count = ref->percpu_count_ptr >> __PERCPU_REF_FLAG_BITS;
> > 
> > Sorry I missed Jens' patch before and also the update to percpu_ref.
> > However, I feel like I'm missing something. This isn't entirely related
> > to your patch, but I'm not following why percpu_count_ptr stores the
> > excess count of an exited percpu_ref and doesn't warn when it's not
> > zero. It seems like this should be an error if it's not 0?
> > 
> > Granted we have made some contract with the user to do the right thing,
> > but say someone does mess up, we don't indicate to them hey this ref is
> > actually dead and if they're waiting for it to go to 0, it never will.
> 
> fwiw, I copied is_zero, but skimming through the code don't immediately
> see myself why it is so...
> 
> Cc Ming, he split out some parts of it to dynamic allocation not too
> long ago, maybe he knows the trick.

I remember that percpu_ref_is_zero() can be called even after
percpu_ref_exit() returns, and it looks like percpu_ref_is_zero() isn't
classified as 'active use'.


Thanks,
Ming


* Re: [PATCH 1/2] percpu_ref: add percpu_ref_atomic_count()
  2021-04-16 14:10       ` Ming Lei
@ 2021-04-16 14:37         ` Dennis Zhou
  2021-04-19  2:03           ` Ming Lei
  0 siblings, 1 reply; 17+ messages in thread
From: Dennis Zhou @ 2021-04-16 14:37 UTC (permalink / raw)
  To: Ming Lei
  Cc: Pavel Begunkov, Jens Axboe, io-uring, Tejun Heo,
	Christoph Lameter, Joakim Hassila

On Fri, Apr 16, 2021 at 10:10:07PM +0800, Ming Lei wrote:
> On Fri, Apr 16, 2021 at 02:16:41PM +0100, Pavel Begunkov wrote:
> > On 16/04/2021 05:45, Dennis Zhou wrote:
> > > Hello,
> > > 
> > > On Fri, Apr 16, 2021 at 01:22:51AM +0100, Pavel Begunkov wrote:
> > >> Add percpu_ref_atomic_count(), which returns number of references of a
> > >> percpu_ref switched prior into atomic mode, so the caller is responsible
> > >> to make sure it's in the right mode.
> > >>
> > >> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
> > >> ---
> > >>  include/linux/percpu-refcount.h |  1 +
> > >>  lib/percpu-refcount.c           | 26 ++++++++++++++++++++++++++
> > >>  2 files changed, 27 insertions(+)
> > >>
> > >> diff --git a/include/linux/percpu-refcount.h b/include/linux/percpu-refcount.h
> > >> index 16c35a728b4c..0ff40e79efa2 100644
> > >> --- a/include/linux/percpu-refcount.h
> > >> +++ b/include/linux/percpu-refcount.h
> > >> @@ -131,6 +131,7 @@ void percpu_ref_kill_and_confirm(struct percpu_ref *ref,
> > >>  void percpu_ref_resurrect(struct percpu_ref *ref);
> > >>  void percpu_ref_reinit(struct percpu_ref *ref);
> > >>  bool percpu_ref_is_zero(struct percpu_ref *ref);
> > >> +unsigned long percpu_ref_atomic_count(struct percpu_ref *ref);
> > >>  
> > >>  /**
> > >>   * percpu_ref_kill - drop the initial ref
> > >> diff --git a/lib/percpu-refcount.c b/lib/percpu-refcount.c
> > >> index a1071cdefb5a..56286995e2b8 100644
> > >> --- a/lib/percpu-refcount.c
> > >> +++ b/lib/percpu-refcount.c
> > >> @@ -425,6 +425,32 @@ bool percpu_ref_is_zero(struct percpu_ref *ref)
> > >>  }
> > >>  EXPORT_SYMBOL_GPL(percpu_ref_is_zero);
> > >>  
> > >> +/**
> > >> + * percpu_ref_atomic_count - returns number of left references
> > >> + * @ref: percpu_ref to test
> > >> + *
> > >> + * This function is safe to call as long as @ref is switch into atomic mode,
> > >> + * and is between init and exit.
> > >> + */
> > >> +unsigned long percpu_ref_atomic_count(struct percpu_ref *ref)
> > >> +{
> > >> +	unsigned long __percpu *percpu_count;
> > >> +	unsigned long count, flags;
> > >> +
> > >> +	if (WARN_ON_ONCE(__ref_is_percpu(ref, &percpu_count)))
> > >> +		return -1UL;
> > >> +
> > >> +	/* protect us from being destroyed */
> > >> +	spin_lock_irqsave(&percpu_ref_switch_lock, flags);
> > >> +	if (ref->data)
> > >> +		count = atomic_long_read(&ref->data->count);
> > >> +	else
> > >> +		count = ref->percpu_count_ptr >> __PERCPU_REF_FLAG_BITS;
> > > 
> > > Sorry I missed Jens' patch before and also the update to percpu_ref.
> > > However, I feel like I'm missing something. This isn't entirely related
> > > to your patch, but I'm not following why percpu_count_ptr stores the
> > > excess count of an exited percpu_ref and doesn't warn when it's not
> > > zero. It seems like this should be an error if it's not 0?
> > > 
> > > Granted we have made some contract with the user to do the right thing,
> > > but say someone does mess up, we don't indicate to them hey this ref is
> > > actually dead and if they're waiting for it to go to 0, it never will.
> > 
> > fwiw, I copied is_zero, but skimming through the code don't immediately
> > see myself why it is so...
> > 
> > Cc Ming, he split out some parts of it to dynamic allocation not too
> > long ago, maybe he knows the trick.
> 
> I remembered that percpu_ref_is_zero() can be called even after percpu_ref_exit()
> returns, and looks percpu_ref_is_zero() isn't classified into 'active use'.
> 

Looking at the commit prior, it seems like percpu_ref_is_zero() was
subject to the usual init and exit lifetime. I guess I'm just not
convinced it should ever be > 0. I'll think about it a little longer and
might fix it.

Thanks,
Dennis

* Re: [PATCH 0/2] fix hangs with shared sqpoll
  2021-04-16 14:09         ` Pavel Begunkov
@ 2021-04-16 14:42           ` Pavel Begunkov
       [not found]           ` <20210417013115.15032-1-hdanton@sina.com>
  1 sibling, 0 replies; 17+ messages in thread
From: Pavel Begunkov @ 2021-04-16 14:42 UTC (permalink / raw)
  To: Jens Axboe, io-uring
  Cc: Dennis Zhou, Tejun Heo, Christoph Lameter, Joakim Hassila

On 16/04/2021 15:09, Pavel Begunkov wrote:
> On 16/04/2021 14:58, Jens Axboe wrote:
>> On 4/16/21 7:12 AM, Pavel Begunkov wrote:
>>> On 16/04/2021 14:04, Jens Axboe wrote:
>>>> On 4/15/21 6:26 PM, Pavel Begunkov wrote:
>>>>> On 16/04/2021 01:22, Pavel Begunkov wrote:
>>>>>> Late catched 5.12 bug with nasty hangs. Thanks Jens for a reproducer.
>>>>>
>>>>> 1/2 is basically a rip off of one of old Jens' patches, but can't
>>>>> find it anywhere. If you still have it, especially if it was
>>>>> reviewed/etc., may make sense to go with it instead
>>>>
>>>> I wonder if we can do something like the below instead - we don't
>>>> care about a particularly stable count in terms of wakeup
>>>> reliance, and it'd save a nasty sync atomic switch.
>>>
>>> But we care about it being monotonous. There are nuances with it.
>>
>> Do we, though? We care about it changing when something has happened,
>> but not about it being monotonic.
> 
> We may find inflight == get_inflight(), when it's not really so,
> and so get to schedule() awhile there are pending requests that
> are not going to be cancelled by itself. And those pending requests
> may have been non-discoverable and so non-cancellable, e.g. because
> were a part of a ling/hardlink.

Anyway, there might be other problems because of how wake_up()s and
ctx->refs puts are ordered. This needs to be reworked, probably without
ctx->refs in the first place.

>>> I think, non sync'ed summing may put it to eternal sleep.
>>
>> That's what the two reads are about, that's the same as before. The
>> numbers are racy in both cases, but that's why we compare after having
>> added ourselves to the wait queue.
>>
>>> Are you looking to save on switching? It's almost always is already
>>> dying with prior ref_kill
>>
>> Yep, always looking to avoid a sync switch if at all possible. For 99%
>> of the cases it's fine, it's the last case in busy prod that wreaks
>> havoc.
> 
> Limited to sqpoll, so I wouldn't worry. Also considering that sqpoll
> doesn't have many file notes (as it was called before). We can
> completely avoid it and even make faster if happens from sq_thread()
> on it getting to exit, but do we want it for 5.12?
> 

-- 
Pavel Begunkov

* Re: [PATCH 1/2] percpu_ref: add percpu_ref_atomic_count()
  2021-04-16  0:22 ` [PATCH 1/2] percpu_ref: add percpu_ref_atomic_count() Pavel Begunkov
  2021-04-16  4:45   ` Dennis Zhou
@ 2021-04-16 15:31   ` Bart Van Assche
  2021-04-16 15:34     ` Jens Axboe
  1 sibling, 1 reply; 17+ messages in thread
From: Bart Van Assche @ 2021-04-16 15:31 UTC (permalink / raw)
  To: Pavel Begunkov, Jens Axboe, io-uring
  Cc: Dennis Zhou, Tejun Heo, Christoph Lameter, Joakim Hassila

On 4/15/21 5:22 PM, Pavel Begunkov wrote:
> diff --git a/lib/percpu-refcount.c b/lib/percpu-refcount.c
> index a1071cdefb5a..56286995e2b8 100644
> --- a/lib/percpu-refcount.c
> +++ b/lib/percpu-refcount.c
> @@ -425,6 +425,32 @@ bool percpu_ref_is_zero(struct percpu_ref *ref)
>  }
>  EXPORT_SYMBOL_GPL(percpu_ref_is_zero);
>  
> +/**
> + * percpu_ref_atomic_count - returns number of left references
> + * @ref: percpu_ref to test
> + *
> + * This function is safe to call as long as @ref is switch into atomic mode,
> + * and is between init and exit.
> + */

How about using the name percpu_ref_read() instead of
percpu_ref_atomic_count()?

Thanks,

Bart.

* Re: [PATCH 1/2] percpu_ref: add percpu_ref_atomic_count()
  2021-04-16 15:31   ` Bart Van Assche
@ 2021-04-16 15:34     ` Jens Axboe
  0 siblings, 0 replies; 17+ messages in thread
From: Jens Axboe @ 2021-04-16 15:34 UTC (permalink / raw)
  To: Bart Van Assche, Pavel Begunkov, io-uring
  Cc: Dennis Zhou, Tejun Heo, Christoph Lameter, Joakim Hassila

On 4/16/21 9:31 AM, Bart Van Assche wrote:
> On 4/15/21 5:22 PM, Pavel Begunkov wrote:
>> diff --git a/lib/percpu-refcount.c b/lib/percpu-refcount.c
>> index a1071cdefb5a..56286995e2b8 100644
>> --- a/lib/percpu-refcount.c
>> +++ b/lib/percpu-refcount.c
>> @@ -425,6 +425,32 @@ bool percpu_ref_is_zero(struct percpu_ref *ref)
>>  }
>>  EXPORT_SYMBOL_GPL(percpu_ref_is_zero);
>>  
>> +/**
>> + * percpu_ref_atomic_count - returns number of left references
>> + * @ref: percpu_ref to test
>> + *
>> + * This function is safe to call as long as @ref is switch into atomic mode,
>> + * and is between init and exit.
>> + */
> 
> How about using the name percpu_ref_read() instead of
> percpu_ref_atomic_count()?

Not sure we're going that route, but in any case, I think it's important
to have it visibly require the ref to be in atomic mode. Maybe
percpu_ref_read_atomic() would be better, but I do think 'atomic' has
to be in the name.

-- 
Jens Axboe


* Re: [PATCH 0/2] fix hangs with shared sqpoll
       [not found]           ` <20210417013115.15032-1-hdanton@sina.com>
@ 2021-04-18 13:56             ` Pavel Begunkov
  0 siblings, 0 replies; 17+ messages in thread
From: Pavel Begunkov @ 2021-04-18 13:56 UTC (permalink / raw)
  To: Hillf Danton; +Cc: Jens Axboe, io-uring

On 4/17/21 2:31 AM, Hillf Danton wrote:
> On Fri, 16 Apr 2021 15:42:07 Pavel Begunkov wrote:
>> On 16/04/2021 15:09, Pavel Begunkov wrote:
>>> On 16/04/2021 14:58, Jens Axboe wrote:
>>>> On 4/16/21 7:12 AM, Pavel Begunkov wrote:
>>>>> On 16/04/2021 14:04, Jens Axboe wrote:
>>>>>> On 4/15/21 6:26 PM, Pavel Begunkov wrote:
>>>>>>> On 16/04/2021 01:22, Pavel Begunkov wrote:
>>>>>>>> Late catched 5.12 bug with nasty hangs. Thanks Jens for a reproducer.
>>>>>>>
>>>>>>> 1/2 is basically a rip off of one of old Jens' patches, but can't
>>>>>>> find it anywhere. If you still have it, especially if it was
>>>>>>> reviewed/etc., may make sense to go with it instead
>>>>>>
>>>>>> I wonder if we can do something like the below instead - we don't
>>>>>> care about a particularly stable count in terms of wakeup
>>>>>> reliance, and it'd save a nasty sync atomic switch.
>>>>>
>>>>> But we care about it being monotonous. There are nuances with it.
>>>>
>>>> Do we, though? We care about it changing when something has happened,
>>>> but not about it being monotonic.
>>>
>>> We may find inflight == get_inflight(), when it's not really so,
>>> and so get to schedule() awhile there are pending requests that
>>> are not going to be cancelled by itself. And those pending requests
>>> may have been non-discoverable and so non-cancellable, e.g. because
>>> were a part of a ling/hardlink.
>>
>> Anyway, there might be other problems because of how wake_up()'s
>> and ctx->refs putting is ordered. Needs to be remade, probably
>> without ctx->refs in the first place.
>>
> Given the test rounds in the current tree, next tree and his tree the

Whose "his" tree?

> percpu count had survived, one of the quick questions is how it fell apart
> last night?

What "percpu count had survived"? Do you mean the percpu-related patch
from the series? What fell apart?

--   
Pavel Begunkov

* Re: [PATCH 1/2] percpu_ref: add percpu_ref_atomic_count()
  2021-04-16 14:37         ` Dennis Zhou
@ 2021-04-19  2:03           ` Ming Lei
  0 siblings, 0 replies; 17+ messages in thread
From: Ming Lei @ 2021-04-19  2:03 UTC (permalink / raw)
  To: Dennis Zhou
  Cc: Pavel Begunkov, Jens Axboe, io-uring, Tejun Heo,
	Christoph Lameter, Joakim Hassila

On Fri, Apr 16, 2021 at 02:37:03PM +0000, Dennis Zhou wrote:
> On Fri, Apr 16, 2021 at 10:10:07PM +0800, Ming Lei wrote:
> > On Fri, Apr 16, 2021 at 02:16:41PM +0100, Pavel Begunkov wrote:
> > > On 16/04/2021 05:45, Dennis Zhou wrote:
> > > > Hello,
> > > > 
> > > > On Fri, Apr 16, 2021 at 01:22:51AM +0100, Pavel Begunkov wrote:
> > > >> Add percpu_ref_atomic_count(), which returns number of references of a
> > > >> percpu_ref switched prior into atomic mode, so the caller is responsible
> > > >> to make sure it's in the right mode.
> > > >>
> > > >> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
> > > >> ---
> > > >>  include/linux/percpu-refcount.h |  1 +
> > > >>  lib/percpu-refcount.c           | 26 ++++++++++++++++++++++++++
> > > >>  2 files changed, 27 insertions(+)
> > > >>
> > > >> diff --git a/include/linux/percpu-refcount.h b/include/linux/percpu-refcount.h
> > > >> index 16c35a728b4c..0ff40e79efa2 100644
> > > >> --- a/include/linux/percpu-refcount.h
> > > >> +++ b/include/linux/percpu-refcount.h
> > > >> @@ -131,6 +131,7 @@ void percpu_ref_kill_and_confirm(struct percpu_ref *ref,
> > > >>  void percpu_ref_resurrect(struct percpu_ref *ref);
> > > >>  void percpu_ref_reinit(struct percpu_ref *ref);
> > > >>  bool percpu_ref_is_zero(struct percpu_ref *ref);
> > > >> +unsigned long percpu_ref_atomic_count(struct percpu_ref *ref);
> > > >>  
> > > >>  /**
> > > >>   * percpu_ref_kill - drop the initial ref
> > > >> diff --git a/lib/percpu-refcount.c b/lib/percpu-refcount.c
> > > >> index a1071cdefb5a..56286995e2b8 100644
> > > >> --- a/lib/percpu-refcount.c
> > > >> +++ b/lib/percpu-refcount.c
> > > >> @@ -425,6 +425,32 @@ bool percpu_ref_is_zero(struct percpu_ref *ref)
> > > >>  }
> > > >>  EXPORT_SYMBOL_GPL(percpu_ref_is_zero);
> > > >>  
> > > >> +/**
> > > >> + * percpu_ref_atomic_count - returns number of left references
> > > >> + * @ref: percpu_ref to test
> > > >> + *
> > > >> + * This function is safe to call as long as @ref is switch into atomic mode,
> > > >> + * and is between init and exit.
> > > >> + */
> > > >> +unsigned long percpu_ref_atomic_count(struct percpu_ref *ref)
> > > >> +{
> > > >> +	unsigned long __percpu *percpu_count;
> > > >> +	unsigned long count, flags;
> > > >> +
> > > >> +	if (WARN_ON_ONCE(__ref_is_percpu(ref, &percpu_count)))
> > > >> +		return -1UL;
> > > >> +
> > > >> +	/* protect us from being destroyed */
> > > >> +	spin_lock_irqsave(&percpu_ref_switch_lock, flags);
> > > >> +	if (ref->data)
> > > >> +		count = atomic_long_read(&ref->data->count);
> > > >> +	else
> > > >> +		count = ref->percpu_count_ptr >> __PERCPU_REF_FLAG_BITS;
> > > > 
> > > > Sorry I missed Jens' patch before and also the update to percpu_ref.
> > > > However, I feel like I'm missing something. This isn't entirely related
> > > > to your patch, but I'm not following why percpu_count_ptr stores the
> > > > excess count of an exited percpu_ref and doesn't warn when it's not
> > > > zero. It seems like this should be an error if it's not 0?
> > > > 
> > > > Granted we have made some contract with the user to do the right thing,
> > > > but say someone does mess up, we don't indicate to them hey this ref is
> > > > actually dead and if they're waiting for it to go to 0, it never will.
> > > 
> > > fwiw, I copied is_zero, but skimming through the code don't immediately
> > > see myself why it is so...
> > > 
> > > Cc Ming, he split out some parts of it to dynamic allocation not too
> > > long ago, maybe he knows the trick.
> > 
> > I remembered that percpu_ref_is_zero() can be called even after percpu_ref_exit()
> > returns, and looks percpu_ref_is_zero() isn't classified into 'active use'.
> > 
> 
> Looking at the commit prior, it seems like percpu_ref_is_zero() was
> subject to the usual init and exit lifetime. I guess I'm just not
> convinced it should ever be > 0. I'll think about it a little longer and
> might fix it.

The count may not be > 0 at that time, but percpu_ref_is_zero() was
allowed to read an uninitialized refcount, and there was such a kernel
oops report:

https://lore.kernel.org/lkml/165db20c-bfc5-fca8-1ecf-45d85ea5d9e2@kernel.dk/#r




Thanks, 
Ming

