io-uring.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* signals not reliably interrupting io_uring_enter anymore
@ 2020-07-04  0:00 Andres Freund
  2020-07-04  0:15 ` Andres Freund
  0 siblings, 1 reply; 10+ messages in thread
From: Andres Freund @ 2020-07-04  0:00 UTC (permalink / raw)
  To: Jens Axboe, io-uring

Hi,

I haven't yet fully analyzed the problem, but after updating to
cdd3bb54332f82295ed90cd0c09c78cd0c0ee822 io_uring using postgres does
not work reliably anymore.

The symptom is that io_uring_enter(IORING_ENTER_GETEVENTS) isn't
interrupted by signals anymore. The signal handler executes, but
afterwards the syscall is restarted. Previously io_uring_enter reliably
returned EINTR in that case.

Currently postgres relies on signals interrupting io_uring_enter(). We
probably can find a way to not do so, but it'd not be entirely trivial.

I suspect the issue is

commit ce593a6c480a22acba08795be313c0c6d49dd35d (tag: io_uring-5.8-2020-07-01, linux-block/io_uring-5.8)
Author: Jens Axboe <axboe@kernel.dk>
Date:   2020-06-30 12:39:05 -0600

    io_uring: use signal based task_work running

as that appears to have changed the error returned by
io_uring_enter(GETEVENTS) after having been interrupted by a signal from
EINTR to ERESTARTSYS.


I'll check to make sure that the issue doesn't exist before the above
commit.


Greetings,

Andres Freund

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: signals not reliably interrupting io_uring_enter anymore
  2020-07-04  0:00 signals not reliably interrupting io_uring_enter anymore Andres Freund
@ 2020-07-04  0:15 ` Andres Freund
  2020-07-04  0:48   ` Jens Axboe
  0 siblings, 1 reply; 10+ messages in thread
From: Andres Freund @ 2020-07-04  0:15 UTC (permalink / raw)
  To: Jens Axboe, io-uring

Hi,

On 2020-07-03 17:00:49 -0700, Andres Freund wrote:
> I haven't yet fully analyzed the problem, but after updating to
> cdd3bb54332f82295ed90cd0c09c78cd0c0ee822 io_uring using postgres does
> not work reliably anymore.
> 
> The symptom is that io_uring_enter(IORING_ENTER_GETEVENTS) isn't
> interrupted by signals anymore. The signal handler executes, but
> afterwards the syscall is restarted. Previously io_uring_enter reliably
> returned EINTR in that case.
> 
> Currently postgres relies on signals interrupting io_uring_enter(). We
> probably can find a way to not do so, but it'd not be entirely trivial.
> 
> I suspect the issue is
> 
> commit ce593a6c480a22acba08795be313c0c6d49dd35d (tag: io_uring-5.8-2020-07-01, linux-block/io_uring-5.8)
> Author: Jens Axboe <axboe@kernel.dk>
> Date:   2020-06-30 12:39:05 -0600
> 
>     io_uring: use signal based task_work running
> 
> as that appears to have changed the error returned by
> io_uring_enter(GETEVENTS) after having been interrupted by a signal from
> EINTR to ERESTARTSYS.
> 
> 
> I'll check to make sure that the issue doesn't exist before the above
> commit.

Indeed, on cd77006e01b3198c75fb7819b3d0ff89709539bb the PG issue doesn't
exist, which pretty much confirms that the above commit is the issue.

What was the reason for changing EINTR to ERESTARTSYS in the above
commit? I assume trying to avoid returning spurious EINTRs to userland?

Greetings,

Andres Freund

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: signals not reliably interrupting io_uring_enter anymore
  2020-07-04  0:15 ` Andres Freund
@ 2020-07-04  0:48   ` Jens Axboe
  2020-07-04  1:13     ` Andres Freund
  0 siblings, 1 reply; 10+ messages in thread
From: Jens Axboe @ 2020-07-04  0:48 UTC (permalink / raw)
  To: Andres Freund, io-uring

On 7/3/20 6:15 PM, Andres Freund wrote:
> Hi,
> 
> On 2020-07-03 17:00:49 -0700, Andres Freund wrote:
>> I haven't yet fully analyzed the problem, but after updating to
>> cdd3bb54332f82295ed90cd0c09c78cd0c0ee822 io_uring using postgres does
>> not work reliably anymore.
>>
>> The symptom is that io_uring_enter(IORING_ENTER_GETEVENTS) isn't
>> interrupted by signals anymore. The signal handler executes, but
>> afterwards the syscall is restarted. Previously io_uring_enter reliably
>> returned EINTR in that case.
>>
>> Currently postgres relies on signals interrupting io_uring_enter(). We
>> probably can find a way to not do so, but it'd not be entirely trivial.
>>
>> I suspect the issue is
>>
>> commit ce593a6c480a22acba08795be313c0c6d49dd35d (tag: io_uring-5.8-2020-07-01, linux-block/io_uring-5.8)
>> Author: Jens Axboe <axboe@kernel.dk>
>> Date:   2020-06-30 12:39:05 -0600
>>
>>     io_uring: use signal based task_work running
>>
>> as that appears to have changed the error returned by
>> io_uring_enter(GETEVENTS) after having been interrupted by a signal from
>> EINTR to ERESTARTSYS.
>>
>>
>> I'll check to make sure that the issue doesn't exist before the above
>> commit.
> 
> Indeed, on cd77006e01b3198c75fb7819b3d0ff89709539bb the PG issue doesn't
> exist, which pretty much confirms that the above commit is the issue.
> 
> What was the reason for changing EINTR to ERESTARTSYS in the above
> commit? I assume trying to avoid returning spurious EINTRs to userland?

Yeah, for when it's running task_work. I wonder if something like the
below will do the trick?


diff --git a/fs/io_uring.c b/fs/io_uring.c
index 700644a016a7..0efa73d78451 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -6197,11 +6197,11 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events,
 	do {
 		prepare_to_wait_exclusive(&ctx->wait, &iowq.wq,
 						TASK_INTERRUPTIBLE);
-		/* make sure we run task_work before checking for signals */
-		if (current->task_works)
-			task_work_run();
 		if (signal_pending(current)) {
-			ret = -ERESTARTSYS;
+			if (current->task_works)
+				ret = -ERESTARTSYS;
+			else
+				ret = -EINTR;
 			break;
 		}
 		if (io_should_wake(&iowq, false))
@@ -6210,7 +6210,7 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events,
 	} while (1);
 	finish_wait(&ctx->wait, &iowq.wq);
 
-	restore_saved_sigmask_unless(ret == -ERESTARTSYS);
+	restore_saved_sigmask_unless(ret == -EINTR);
 
 	return READ_ONCE(rings->cq.head) == READ_ONCE(rings->cq.tail) ? ret : 0;
 }

-- 
Jens Axboe


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: signals not reliably interrupting io_uring_enter anymore
  2020-07-04  0:48   ` Jens Axboe
@ 2020-07-04  1:13     ` Andres Freund
  2020-07-04  1:52       ` Jens Axboe
  0 siblings, 1 reply; 10+ messages in thread
From: Andres Freund @ 2020-07-04  1:13 UTC (permalink / raw)
  To: Jens Axboe, io-uring

Hi, 

On July 3, 2020 5:48:21 PM PDT, Jens Axboe <axboe@kernel.dk> wrote:
>On 7/3/20 6:15 PM, Andres Freund wrote:
>> Hi,
>> 
>> On 2020-07-03 17:00:49 -0700, Andres Freund wrote:
>>> I haven't yet fully analyzed the problem, but after updating to
>>> cdd3bb54332f82295ed90cd0c09c78cd0c0ee822 io_uring using postgres
>does
>>> not work reliably anymore.
>>>
>>> The symptom is that io_uring_enter(IORING_ENTER_GETEVENTS) isn't
>>> interrupted by signals anymore. The signal handler executes, but
>>> afterwards the syscall is restarted. Previously io_uring_enter
>reliably
>>> returned EINTR in that case.
>>>
>>> Currently postgres relies on signals interrupting io_uring_enter().
>We
>>> probably can find a way to not do so, but it'd not be entirely
>trivial.
>>>
>>> I suspect the issue is
>>>
>>> commit ce593a6c480a22acba08795be313c0c6d49dd35d (tag:
>io_uring-5.8-2020-07-01, linux-block/io_uring-5.8)
>>> Author: Jens Axboe <axboe@kernel.dk>
>>> Date:   2020-06-30 12:39:05 -0600
>>>
>>>     io_uring: use signal based task_work running
>>>
>>> as that appears to have changed the error returned by
>>> io_uring_enter(GETEVENTS) after having been interrupted by a signal
>from
>>> EINTR to ERESTARTSYS.
>>>
>>>
>>> I'll check to make sure that the issue doesn't exist before the
>above
>>> commit.
>> 
>> Indeed, on cd77006e01b3198c75fb7819b3d0ff89709539bb the PG issue
>doesn't
>> exist, which pretty much confirms that the above commit is the issue.
>> 
>> What was the reason for changing EINTR to ERESTARTSYS in the above
>> commit? I assume trying to avoid returning spurious EINTRs to
>userland?
>
>Yeah, for when it's running task_work. I wonder if something like the
>below will do the trick?
>
>
>diff --git a/fs/io_uring.c b/fs/io_uring.c
>index 700644a016a7..0efa73d78451 100644
>--- a/fs/io_uring.c
>+++ b/fs/io_uring.c
>@@ -6197,11 +6197,11 @@ static int io_cqring_wait(struct io_ring_ctx
>*ctx, int min_events,
> 	do {
> 		prepare_to_wait_exclusive(&ctx->wait, &iowq.wq,
> 						TASK_INTERRUPTIBLE);
>-		/* make sure we run task_work before checking for signals */
>-		if (current->task_works)
>-			task_work_run();
> 		if (signal_pending(current)) {
>-			ret = -ERESTARTSYS;
>+			if (current->task_works)
>+				ret = -ERESTARTSYS;
>+			else
>+				ret = -EINTR;
> 			break;
> 		}
> 		if (io_should_wake(&iowq, false))
>@@ -6210,7 +6210,7 @@ static int io_cqring_wait(struct io_ring_ctx
>*ctx, int min_events,
> 	} while (1);
> 	finish_wait(&ctx->wait, &iowq.wq);
> 
>-	restore_saved_sigmask_unless(ret == -ERESTARTSYS);
>+	restore_saved_sigmask_unless(ret == -EINTR);
> 
>	return READ_ONCE(rings->cq.head) == READ_ONCE(rings->cq.tail) ? ret :
>0;
> }

I'll try in a bit. Suspect however that there'd be trouble if there were both an actual signal and task work pending?

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: signals not reliably interrupting io_uring_enter anymore
  2020-07-04  1:13     ` Andres Freund
@ 2020-07-04  1:52       ` Jens Axboe
  2020-07-04  2:08         ` Jens Axboe
  0 siblings, 1 reply; 10+ messages in thread
From: Jens Axboe @ 2020-07-04  1:52 UTC (permalink / raw)
  To: Andres Freund, io-uring

On 7/3/20 7:13 PM, Andres Freund wrote:
> Hi, 
> 
> On July 3, 2020 5:48:21 PM PDT, Jens Axboe <axboe@kernel.dk> wrote:
>> On 7/3/20 6:15 PM, Andres Freund wrote:
>>> Hi,
>>>
>>> On 2020-07-03 17:00:49 -0700, Andres Freund wrote:
>>>> I haven't yet fully analyzed the problem, but after updating to
>>>> cdd3bb54332f82295ed90cd0c09c78cd0c0ee822 io_uring using postgres
>> does
>>>> not work reliably anymore.
>>>>
>>>> The symptom is that io_uring_enter(IORING_ENTER_GETEVENTS) isn't
>>>> interrupted by signals anymore. The signal handler executes, but
>>>> afterwards the syscall is restarted. Previously io_uring_enter
>> reliably
>>>> returned EINTR in that case.
>>>>
>>>> Currently postgres relies on signals interrupting io_uring_enter().
>> We
>>>> probably can find a way to not do so, but it'd not be entirely
>> trivial.
>>>>
>>>> I suspect the issue is
>>>>
>>>> commit ce593a6c480a22acba08795be313c0c6d49dd35d (tag:
>> io_uring-5.8-2020-07-01, linux-block/io_uring-5.8)
>>>> Author: Jens Axboe <axboe@kernel.dk>
>>>> Date:   2020-06-30 12:39:05 -0600
>>>>
>>>>     io_uring: use signal based task_work running
>>>>
>>>> as that appears to have changed the error returned by
>>>> io_uring_enter(GETEVENTS) after having been interrupted by a signal
>> from
>>>> EINTR to ERESTARTSYS.
>>>>
>>>>
>>>> I'll check to make sure that the issue doesn't exist before the
>> above
>>>> commit.
>>>
>>> Indeed, on cd77006e01b3198c75fb7819b3d0ff89709539bb the PG issue
>> doesn't
>>> exist, which pretty much confirms that the above commit is the issue.
>>>
>>> What was the reason for changing EINTR to ERESTARTSYS in the above
>>> commit? I assume trying to avoid returning spurious EINTRs to
>> userland?
>>
>> Yeah, for when it's running task_work. I wonder if something like the
>> below will do the trick?
>>
>>
>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>> index 700644a016a7..0efa73d78451 100644
>> --- a/fs/io_uring.c
>> +++ b/fs/io_uring.c
>> @@ -6197,11 +6197,11 @@ static int io_cqring_wait(struct io_ring_ctx
>> *ctx, int min_events,
>> 	do {
>> 		prepare_to_wait_exclusive(&ctx->wait, &iowq.wq,
>> 						TASK_INTERRUPTIBLE);
>> -		/* make sure we run task_work before checking for signals */
>> -		if (current->task_works)
>> -			task_work_run();
>> 		if (signal_pending(current)) {
>> -			ret = -ERESTARTSYS;
>> +			if (current->task_works)
>> +				ret = -ERESTARTSYS;
>> +			else
>> +				ret = -EINTR;
>> 			break;
>> 		}
>> 		if (io_should_wake(&iowq, false))
>> @@ -6210,7 +6210,7 @@ static int io_cqring_wait(struct io_ring_ctx
>> *ctx, int min_events,
>> 	} while (1);
>> 	finish_wait(&ctx->wait, &iowq.wq);
>>
>> -	restore_saved_sigmask_unless(ret == -ERESTARTSYS);
>> +	restore_saved_sigmask_unless(ret == -EINTR);
>>
>> 	return READ_ONCE(rings->cq.head) == READ_ONCE(rings->cq.tail) ? ret :
>> 0;
>> }
> 
> I'll try in a bit. Suspect however that there'd be trouble if there
> were both an actual signal and task work pending?

Yes, I have that worry too. We'd really need to check if we have an
actual signal pending - if we do, we still do -EINTR. If not, then we
just do -ERESTARTSYS and restart the system call after task_work has
been completed. Half-assed approach below, I suspect this won't _really_
work without appropriate locking. Which would be unfortunate.

Either that, or we'd need to know if an actual signal was delivered when
we get re-entered due to returning -ERESTARTSYS. If it was just
task_work being run, then we're fine. But if an actual signal was
pending, then we'd need to return -EINTR.

CC'ing Oleg to see if he has any good ideas here.


diff --git a/fs/io_uring.c b/fs/io_uring.c
index 700644a016a7..715d56144f15 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -6197,11 +6197,11 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events,
 	do {
 		prepare_to_wait_exclusive(&ctx->wait, &iowq.wq,
 						TASK_INTERRUPTIBLE);
-		/* make sure we run task_work before checking for signals */
-		if (current->task_works)
-			task_work_run();
 		if (signal_pending(current)) {
-			ret = -ERESTARTSYS;
+			if (has_pending_signal(current))
+				ret = -EINTR;
+			else
+				ret = -ERESTARTSYS;
 			break;
 		}
 		if (io_should_wake(&iowq, false))
@@ -6210,7 +6210,7 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events,
 	} while (1);
 	finish_wait(&ctx->wait, &iowq.wq);
 
-	restore_saved_sigmask_unless(ret == -ERESTARTSYS);
+	restore_saved_sigmask_unless(ret == -EINTR);
 
 	return READ_ONCE(rings->cq.head) == READ_ONCE(rings->cq.tail) ? ret : 0;
 }
diff --git a/include/linux/signal.h b/include/linux/signal.h
index 6bb1a3f0258c..8ef23b0bb406 100644
--- a/include/linux/signal.h
+++ b/include/linux/signal.h
@@ -321,6 +321,7 @@ static inline void disallow_signal(int sig)
 extern struct kmem_cache *sighand_cachep;
 
 extern bool unhandled_signal(struct task_struct *tsk, int sig);
+extern bool has_pending_signal(struct task_struct *tsk);
 
 /*
  * In POSIX a signal is sent either to a specific thread (Linux task)
diff --git a/kernel/signal.c b/kernel/signal.c
index ee22ec78fd6d..8923872e5228 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -151,12 +151,16 @@ static inline bool has_pending_signals(sigset_t *signal, sigset_t *blocked)
 
 #define PENDING(p,b) has_pending_signals(&(p)->signal, (b))
 
+inline bool has_pending_signal(struct task_struct *t)
+{
+	return PENDING(&t->pending, &t->blocked) ||
+		PENDING(&t->signal->shared_pending, &t->blocked);
+}
+
 static bool recalc_sigpending_tsk(struct task_struct *t)
 {
 	if ((t->jobctl & (JOBCTL_PENDING_MASK | JOBCTL_TRAP_FREEZE)) ||
-	    PENDING(&t->pending, &t->blocked) ||
-	    PENDING(&t->signal->shared_pending, &t->blocked) ||
-	    cgroup_task_frozen(t)) {
+	    has_pending_signal(t) || cgroup_task_frozen(t)) {
 		set_tsk_thread_flag(t, TIF_SIGPENDING);
 		return true;
 	}

-- 
Jens Axboe


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: signals not reliably interrupting io_uring_enter anymore
  2020-07-04  1:52       ` Jens Axboe
@ 2020-07-04  2:08         ` Jens Axboe
  2020-07-04  2:56           ` Jens Axboe
  0 siblings, 1 reply; 10+ messages in thread
From: Jens Axboe @ 2020-07-04  2:08 UTC (permalink / raw)
  To: Andres Freund, io-uring

On 7/3/20 7:52 PM, Jens Axboe wrote:
> On 7/3/20 7:13 PM, Andres Freund wrote:
>> Hi, 
>>
>> On July 3, 2020 5:48:21 PM PDT, Jens Axboe <axboe@kernel.dk> wrote:
>>> On 7/3/20 6:15 PM, Andres Freund wrote:
>>>> Hi,
>>>>
>>>> On 2020-07-03 17:00:49 -0700, Andres Freund wrote:
>>>>> I haven't yet fully analyzed the problem, but after updating to
>>>>> cdd3bb54332f82295ed90cd0c09c78cd0c0ee822 io_uring using postgres
>>> does
>>>>> not work reliably anymore.
>>>>>
>>>>> The symptom is that io_uring_enter(IORING_ENTER_GETEVENTS) isn't
>>>>> interrupted by signals anymore. The signal handler executes, but
>>>>> afterwards the syscall is restarted. Previously io_uring_enter
>>> reliably
>>>>> returned EINTR in that case.
>>>>>
>>>>> Currently postgres relies on signals interrupting io_uring_enter().
>>> We
>>>>> probably can find a way to not do so, but it'd not be entirely
>>> trivial.
>>>>>
>>>>> I suspect the issue is
>>>>>
>>>>> commit ce593a6c480a22acba08795be313c0c6d49dd35d (tag:
>>> io_uring-5.8-2020-07-01, linux-block/io_uring-5.8)
>>>>> Author: Jens Axboe <axboe@kernel.dk>
>>>>> Date:   2020-06-30 12:39:05 -0600
>>>>>
>>>>>     io_uring: use signal based task_work running
>>>>>
>>>>> as that appears to have changed the error returned by
>>>>> io_uring_enter(GETEVENTS) after having been interrupted by a signal
>>> from
>>>>> EINTR to ERESTARTSYS.
>>>>>
>>>>>
>>>>> I'll check to make sure that the issue doesn't exist before the
>>> above
>>>>> commit.
>>>>
>>>> Indeed, on cd77006e01b3198c75fb7819b3d0ff89709539bb the PG issue
>>> doesn't
>>>> exist, which pretty much confirms that the above commit is the issue.
>>>>
>>>> What was the reason for changing EINTR to ERESTARTSYS in the above
>>>> commit? I assume trying to avoid returning spurious EINTRs to
>>> userland?
>>>
>>> Yeah, for when it's running task_work. I wonder if something like the
>>> below will do the trick?
>>>
>>>
>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>> index 700644a016a7..0efa73d78451 100644
>>> --- a/fs/io_uring.c
>>> +++ b/fs/io_uring.c
>>> @@ -6197,11 +6197,11 @@ static int io_cqring_wait(struct io_ring_ctx
>>> *ctx, int min_events,
>>> 	do {
>>> 		prepare_to_wait_exclusive(&ctx->wait, &iowq.wq,
>>> 						TASK_INTERRUPTIBLE);
>>> -		/* make sure we run task_work before checking for signals */
>>> -		if (current->task_works)
>>> -			task_work_run();
>>> 		if (signal_pending(current)) {
>>> -			ret = -ERESTARTSYS;
>>> +			if (current->task_works)
>>> +				ret = -ERESTARTSYS;
>>> +			else
>>> +				ret = -EINTR;
>>> 			break;
>>> 		}
>>> 		if (io_should_wake(&iowq, false))
>>> @@ -6210,7 +6210,7 @@ static int io_cqring_wait(struct io_ring_ctx
>>> *ctx, int min_events,
>>> 	} while (1);
>>> 	finish_wait(&ctx->wait, &iowq.wq);
>>>
>>> -	restore_saved_sigmask_unless(ret == -ERESTARTSYS);
>>> +	restore_saved_sigmask_unless(ret == -EINTR);
>>>
>>> 	return READ_ONCE(rings->cq.head) == READ_ONCE(rings->cq.tail) ? ret :
>>> 0;
>>> }
>>
>> I'll try in a bit. Suspect however that there'd be trouble if there
>> were both an actual signal and task work pending?
> 
> Yes, I have that worry too. We'd really need to check if we have an
> actual signal pending - if we do, we still do -EINTR. If not, then we
> just do -ERESTARTSYS and restart the system call after task_work has
> been completed. Half-assed approach below, I suspect this won't _really_
> work without appropriate locking. Which would be unfortunate.
> 
> Either that, or we'd need to know if an actual signal was delivered when
> we get re-entered due to returning -ERESTARTSYS. If it was just
> task_work being run, then we're fine. But if an actual signal was
> pending, then we'd need to return -EINTR.
> 
> CC'ing Oleg to see if he has any good ideas here.

This might be simpler:

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 700644a016a7..8f54fd5085bf 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -6197,11 +6197,11 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events,
 	do {
 		prepare_to_wait_exclusive(&ctx->wait, &iowq.wq,
 						TASK_INTERRUPTIBLE);
-		/* make sure we run task_work before checking for signals */
-		if (current->task_works)
-			task_work_run();
 		if (signal_pending(current)) {
-			ret = -ERESTARTSYS;
+			if (!sigisemptyset(&current->pending.signal))
+				ret = -EINTR;
+			else
+				ret = -ERESTARTSYS;
 			break;
 		}
 		if (io_should_wake(&iowq, false))
@@ -6210,7 +6210,7 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events,
 	} while (1);
 	finish_wait(&ctx->wait, &iowq.wq);
 
-	restore_saved_sigmask_unless(ret == -ERESTARTSYS);
+	restore_saved_sigmask_unless(ret == -EINTR);
 
 	return READ_ONCE(rings->cq.head) == READ_ONCE(rings->cq.tail) ? ret : 0;
 }

-- 
Jens Axboe


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: signals not reliably interrupting io_uring_enter anymore
  2020-07-04  2:08         ` Jens Axboe
@ 2020-07-04  2:56           ` Jens Axboe
  2020-07-04 14:55             ` Jens Axboe
  0 siblings, 1 reply; 10+ messages in thread
From: Jens Axboe @ 2020-07-04  2:56 UTC (permalink / raw)
  To: Andres Freund, io-uring; +Cc: Oleg Nesterov

On 7/3/20 8:08 PM, Jens Axboe wrote:
> On 7/3/20 7:52 PM, Jens Axboe wrote:
>> On 7/3/20 7:13 PM, Andres Freund wrote:
>>> Hi, 
>>>
>>> On July 3, 2020 5:48:21 PM PDT, Jens Axboe <axboe@kernel.dk> wrote:
>>>> On 7/3/20 6:15 PM, Andres Freund wrote:
>>>>> Hi,
>>>>>
>>>>> On 2020-07-03 17:00:49 -0700, Andres Freund wrote:
>>>>>> I haven't yet fully analyzed the problem, but after updating to
>>>>>> cdd3bb54332f82295ed90cd0c09c78cd0c0ee822 io_uring using postgres
>>>> does
>>>>>> not work reliably anymore.
>>>>>>
>>>>>> The symptom is that io_uring_enter(IORING_ENTER_GETEVENTS) isn't
>>>>>> interrupted by signals anymore. The signal handler executes, but
>>>>>> afterwards the syscall is restarted. Previously io_uring_enter
>>>> reliably
>>>>>> returned EINTR in that case.
>>>>>>
>>>>>> Currently postgres relies on signals interrupting io_uring_enter().
>>>> We
>>>>>> probably can find a way to not do so, but it'd not be entirely
>>>> trivial.
>>>>>>
>>>>>> I suspect the issue is
>>>>>>
>>>>>> commit ce593a6c480a22acba08795be313c0c6d49dd35d (tag:
>>>> io_uring-5.8-2020-07-01, linux-block/io_uring-5.8)
>>>>>> Author: Jens Axboe <axboe@kernel.dk>
>>>>>> Date:   2020-06-30 12:39:05 -0600
>>>>>>
>>>>>>     io_uring: use signal based task_work running
>>>>>>
>>>>>> as that appears to have changed the error returned by
>>>>>> io_uring_enter(GETEVENTS) after having been interrupted by a signal
>>>> from
>>>>>> EINTR to ERESTARTSYS.
>>>>>>
>>>>>>
>>>>>> I'll check to make sure that the issue doesn't exist before the
>>>> above
>>>>>> commit.
>>>>>
>>>>> Indeed, on cd77006e01b3198c75fb7819b3d0ff89709539bb the PG issue
>>>> doesn't
>>>>> exist, which pretty much confirms that the above commit is the issue.
>>>>>
>>>>> What was the reason for changing EINTR to ERESTARTSYS in the above
>>>>> commit? I assume trying to avoid returning spurious EINTRs to
>>>> userland?
>>>>
>>>> Yeah, for when it's running task_work. I wonder if something like the
>>>> below will do the trick?
>>>>
>>>>
>>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>>> index 700644a016a7..0efa73d78451 100644
>>>> --- a/fs/io_uring.c
>>>> +++ b/fs/io_uring.c
>>>> @@ -6197,11 +6197,11 @@ static int io_cqring_wait(struct io_ring_ctx
>>>> *ctx, int min_events,
>>>> 	do {
>>>> 		prepare_to_wait_exclusive(&ctx->wait, &iowq.wq,
>>>> 						TASK_INTERRUPTIBLE);
>>>> -		/* make sure we run task_work before checking for signals */
>>>> -		if (current->task_works)
>>>> -			task_work_run();
>>>> 		if (signal_pending(current)) {
>>>> -			ret = -ERESTARTSYS;
>>>> +			if (current->task_works)
>>>> +				ret = -ERESTARTSYS;
>>>> +			else
>>>> +				ret = -EINTR;
>>>> 			break;
>>>> 		}
>>>> 		if (io_should_wake(&iowq, false))
>>>> @@ -6210,7 +6210,7 @@ static int io_cqring_wait(struct io_ring_ctx
>>>> *ctx, int min_events,
>>>> 	} while (1);
>>>> 	finish_wait(&ctx->wait, &iowq.wq);
>>>>
>>>> -	restore_saved_sigmask_unless(ret == -ERESTARTSYS);
>>>> +	restore_saved_sigmask_unless(ret == -EINTR);
>>>>
>>>> 	return READ_ONCE(rings->cq.head) == READ_ONCE(rings->cq.tail) ? ret :
>>>> 0;
>>>> }
>>>
>>> I'll try in a bit. Suspect however that there'd be trouble if there
>>> were both an actual signal and task work pending?
>>
>> Yes, I have that worry too. We'd really need to check if we have an
>> actual signal pending - if we do, we still do -EINTR. If not, then we
>> just do -ERESTARTSYS and restart the system call after task_work has
>> been completed. Half-assed approach below, I suspect this won't _really_
>> work without appropriate locking. Which would be unfortunate.
>>
>> Either that, or we'd need to know if an actual signal was delivered when
>> we get re-entered due to returning -ERESTARTSYS. If it was just
>> task_work being run, then we're fine. But if an actual signal was
>> pending, then we'd need to return -EINTR.
>>
>> CC'ing Oleg to see if he has any good ideas here.
> 
> This might be simpler:

Or... That's it for today, I'll check in after the weekend.

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 700644a016a7..25a1877d3d84 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -6197,11 +6197,18 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events,
 	do {
 		prepare_to_wait_exclusive(&ctx->wait, &iowq.wq,
 						TASK_INTERRUPTIBLE);
-		/* make sure we run task_work before checking for signals */
-		if (current->task_works)
-			task_work_run();
 		if (signal_pending(current)) {
-			ret = -ERESTARTSYS;
+			if (current->jobctl & JOBCTL_TASK_WORK) {
+				spin_lock_irq(&current->sighand->siglock);
+				current->jobctl &= ~JOBCTL_TASK_WORK;
+				recalc_sigpending();
+				spin_unlock_irq(&current->sighand->siglock);
+				if (current->task_works) {
+					task_work_run();
+					continue;
+				}
+			}
+			ret = -EINTR;
 			break;
 		}
 		if (io_should_wake(&iowq, false))
@@ -6210,7 +6217,7 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events,
 	} while (1);
 	finish_wait(&ctx->wait, &iowq.wq);
 
-	restore_saved_sigmask_unless(ret == -ERESTARTSYS);
+	restore_saved_sigmask_unless(ret == -EINTR);
 
 	return READ_ONCE(rings->cq.head) == READ_ONCE(rings->cq.tail) ? ret : 0;
 }

-- 
Jens Axboe


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: signals not reliably interrupting io_uring_enter anymore
  2020-07-04  2:56           ` Jens Axboe
@ 2020-07-04 14:55             ` Jens Axboe
  2020-07-04 19:11               ` Andres Freund
  0 siblings, 1 reply; 10+ messages in thread
From: Jens Axboe @ 2020-07-04 14:55 UTC (permalink / raw)
  To: Andres Freund, io-uring; +Cc: Oleg Nesterov

On 7/3/20 8:56 PM, Jens Axboe wrote:
> On 7/3/20 8:08 PM, Jens Axboe wrote:
>> On 7/3/20 7:52 PM, Jens Axboe wrote:
>>> On 7/3/20 7:13 PM, Andres Freund wrote:
>>>> Hi, 
>>>>
>>>> On July 3, 2020 5:48:21 PM PDT, Jens Axboe <axboe@kernel.dk> wrote:
>>>>> On 7/3/20 6:15 PM, Andres Freund wrote:
>>>>>> Hi,
>>>>>>
>>>>>> On 2020-07-03 17:00:49 -0700, Andres Freund wrote:
>>>>>>> I haven't yet fully analyzed the problem, but after updating to
>>>>>>> cdd3bb54332f82295ed90cd0c09c78cd0c0ee822 io_uring using postgres
>>>>> does
>>>>>>> not work reliably anymore.
>>>>>>>
>>>>>>> The symptom is that io_uring_enter(IORING_ENTER_GETEVENTS) isn't
>>>>>>> interrupted by signals anymore. The signal handler executes, but
>>>>>>> afterwards the syscall is restarted. Previously io_uring_enter
>>>>> reliably
>>>>>>> returned EINTR in that case.
>>>>>>>
>>>>>>> Currently postgres relies on signals interrupting io_uring_enter().
>>>>> We
>>>>>>> probably can find a way to not do so, but it'd not be entirely
>>>>> trivial.
>>>>>>>
>>>>>>> I suspect the issue is
>>>>>>>
>>>>>>> commit ce593a6c480a22acba08795be313c0c6d49dd35d (tag:
>>>>> io_uring-5.8-2020-07-01, linux-block/io_uring-5.8)
>>>>>>> Author: Jens Axboe <axboe@kernel.dk>
>>>>>>> Date:   2020-06-30 12:39:05 -0600
>>>>>>>
>>>>>>>     io_uring: use signal based task_work running
>>>>>>>
>>>>>>> as that appears to have changed the error returned by
>>>>>>> io_uring_enter(GETEVENTS) after having been interrupted by a signal
>>>>> from
>>>>>>> EINTR to ERESTARTSYS.
>>>>>>>
>>>>>>>
>>>>>>> I'll check to make sure that the issue doesn't exist before the
>>>>> above
>>>>>>> commit.
>>>>>>
>>>>>> Indeed, on cd77006e01b3198c75fb7819b3d0ff89709539bb the PG issue
>>>>> doesn't
>>>>>> exist, which pretty much confirms that the above commit is the issue.
>>>>>>
>>>>>> What was the reason for changing EINTR to ERESTARTSYS in the above
>>>>>> commit? I assume trying to avoid returning spurious EINTRs to
>>>>> userland?
>>>>>
>>>>> Yeah, for when it's running task_work. I wonder if something like the
>>>>> below will do the trick?
>>>>>
>>>>>
>>>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>>>> index 700644a016a7..0efa73d78451 100644
>>>>> --- a/fs/io_uring.c
>>>>> +++ b/fs/io_uring.c
>>>>> @@ -6197,11 +6197,11 @@ static int io_cqring_wait(struct io_ring_ctx
>>>>> *ctx, int min_events,
>>>>> 	do {
>>>>> 		prepare_to_wait_exclusive(&ctx->wait, &iowq.wq,
>>>>> 						TASK_INTERRUPTIBLE);
>>>>> -		/* make sure we run task_work before checking for signals */
>>>>> -		if (current->task_works)
>>>>> -			task_work_run();
>>>>> 		if (signal_pending(current)) {
>>>>> -			ret = -ERESTARTSYS;
>>>>> +			if (current->task_works)
>>>>> +				ret = -ERESTARTSYS;
>>>>> +			else
>>>>> +				ret = -EINTR;
>>>>> 			break;
>>>>> 		}
>>>>> 		if (io_should_wake(&iowq, false))
>>>>> @@ -6210,7 +6210,7 @@ static int io_cqring_wait(struct io_ring_ctx
>>>>> *ctx, int min_events,
>>>>> 	} while (1);
>>>>> 	finish_wait(&ctx->wait, &iowq.wq);
>>>>>
>>>>> -	restore_saved_sigmask_unless(ret == -ERESTARTSYS);
>>>>> +	restore_saved_sigmask_unless(ret == -EINTR);
>>>>>
>>>>> 	return READ_ONCE(rings->cq.head) == READ_ONCE(rings->cq.tail) ? ret :
>>>>> 0;
>>>>> }
>>>>
>>>> I'll try in a bit. Suspect however that there'd be trouble if there
>>>> were both an actual signal and task work pending?
>>>
>>> Yes, I have that worry too. We'd really need to check if we have an
>>> actual signal pending - if we do, we still do -EINTR. If not, then we
>>> just do -ERESTARTSYS and restart the system call after task_work has
>>> been completed. Half-assed approach below, I suspect this won't _really_
>>> work without appropriate locking. Which would be unfortunate.
>>>
>>> Either that, or we'd need to know if an actual signal was delivered when
>>> we get re-entered due to returning -ERESTARTSYS. If it was just
>>> task_work being run, then we're fine. But if an actual signal was
>>> pending, then we'd need to return -EINTR.
>>>
>>> CC'ing Oleg to see if he has any good ideas here.
>>
>> This might be simpler:
> 
> Or... That's it for today, I'll check in after the weekend.

This tests out fine for me, and it avoids TWA_SIGNAL if we're not using
an eventfd.


diff --git a/fs/io_uring.c b/fs/io_uring.c
index 700644a016a7..d37d7ea5ebe5 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -4072,14 +4072,22 @@ struct io_poll_table {
 	int error;
 };
 
-static int io_req_task_work_add(struct io_kiocb *req, struct callback_head *cb,
-				int notify)
+static int io_req_task_work_add(struct io_kiocb *req, struct callback_head *cb)
 {
 	struct task_struct *tsk = req->task;
-	int ret;
+	struct io_ring_ctx *ctx = req->ctx;
+	int ret, notify = TWA_RESUME;
 
-	if (req->ctx->flags & IORING_SETUP_SQPOLL)
+	/*
+	 * SQPOLL kernel thread doesn't need notification, just a wakeup.
+	 * If we're not using an eventfd, then TWA_RESUME is always fine,
+	 * as we won't have dependencies between request completions for
+	 * other kernel wait conditions.
+	 */
+	if (ctx->flags & IORING_SETUP_SQPOLL)
 		notify = 0;
+	else if (ctx->cq_ev_fd)
+		notify = TWA_SIGNAL;
 
 	ret = task_work_add(tsk, cb, notify);
 	if (!ret)
@@ -4110,7 +4118,7 @@ static int __io_async_wake(struct io_kiocb *req, struct io_poll_iocb *poll,
 	 * of executing it. We can't safely execute it anyway, as we may not
 	 * have the needed state needed for it anyway.
 	 */
-	ret = io_req_task_work_add(req, &req->task_work, TWA_SIGNAL);
+	ret = io_req_task_work_add(req, &req->task_work);
 	if (unlikely(ret)) {
 		WRITE_ONCE(poll->canceled, true);
 		tsk = io_wq_get_task(req->ctx->io_wq);
@@ -6201,7 +6209,14 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events,
 		if (current->task_works)
 			task_work_run();
 		if (signal_pending(current)) {
-			ret = -ERESTARTSYS;
+			if (current->jobctl & JOBCTL_TASK_WORK) {
+				spin_lock_irq(&current->sighand->siglock);
+				current->jobctl &= ~JOBCTL_TASK_WORK;
+				recalc_sigpending();
+				spin_unlock_irq(&current->sighand->siglock);
+				continue;
+			}
+			ret = -EINTR;
 			break;
 		}
 		if (io_should_wake(&iowq, false))
@@ -6210,7 +6225,7 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events,
 	} while (1);
 	finish_wait(&ctx->wait, &iowq.wq);
 
-	restore_saved_sigmask_unless(ret == -ERESTARTSYS);
+	restore_saved_sigmask_unless(ret == -EINTR);
 
 	return READ_ONCE(rings->cq.head) == READ_ONCE(rings->cq.tail) ? ret : 0;
 }

-- 
Jens Axboe


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: signals not reliably interrupting io_uring_enter anymore
  2020-07-04 14:55             ` Jens Axboe
@ 2020-07-04 19:11               ` Andres Freund
  2020-07-04 19:45                 ` Jens Axboe
  0 siblings, 1 reply; 10+ messages in thread
From: Andres Freund @ 2020-07-04 19:11 UTC (permalink / raw)
  To: Jens Axboe; +Cc: io-uring, Oleg Nesterov

Hi,

On 2020-07-04 08:55:39 -0600, Jens Axboe wrote:
> This tests out fine for me, and it avoids TWA_SIGNAL if we're not using
> an eventfd.

I tried this variant and it does fix the problem for me, all my tests
pass again.  I assume this will eventually need to be in stable for 5.7?

Regards,

Andres

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: signals not reliably interrupting io_uring_enter anymore
  2020-07-04 19:11               ` Andres Freund
@ 2020-07-04 19:45                 ` Jens Axboe
  0 siblings, 0 replies; 10+ messages in thread
From: Jens Axboe @ 2020-07-04 19:45 UTC (permalink / raw)
  To: Andres Freund; +Cc: io-uring, Oleg Nesterov

On 7/4/20 1:11 PM, Andres Freund wrote:
> Hi,
> 
> On 2020-07-04 08:55:39 -0600, Jens Axboe wrote:
>> This tests out fine for me, and it avoids TWA_SIGNAL if we're not using
>> an eventfd.
> 
> I tried this variant and it does fix the problem for me, all my tests
> pass again.  I assume this will eventually need to be in stable for 5.7?

Yeah, it should go in with the original, I'm aware of it and will keep
an eye on it.

Thanks for testing! Both this patch, but also the -rc series. So much
easier to catch incidents like this upfront.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2020-07-04 19:45 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-04  0:00 signals not reliably interrupting io_uring_enter anymore Andres Freund
2020-07-04  0:15 ` Andres Freund
2020-07-04  0:48   ` Jens Axboe
2020-07-04  1:13     ` Andres Freund
2020-07-04  1:52       ` Jens Axboe
2020-07-04  2:08         ` Jens Axboe
2020-07-04  2:56           ` Jens Axboe
2020-07-04 14:55             ` Jens Axboe
2020-07-04 19:11               ` Andres Freund
2020-07-04 19:45                 ` Jens Axboe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).