linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jens Axboe <axboe@kernel.dk>
To: Hillf Danton <hdanton@sina.com>
Cc: Oleg Nesterov <oleg@redhat.com>,
	io-uring <io-uring@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [PATCH RFC v2] kernel: decouple TASK_WORK TWA_SIGNAL handling from signals
Date: Fri, 2 Oct 2020 07:44:53 -0600	[thread overview]
Message-ID: <de200581-1e95-c175-4b6a-a41636a34507@kernel.dk> (raw)
In-Reply-To: <20201002133813.3180-1-hdanton@sina.com>

On 10/2/20 7:38 AM, Hillf Danton wrote:
> 
> On Thu, 1 Oct 2020 11:27:04 -0600 Jens Axboe wrote:
>> On 10/1/20 10:27 AM, Oleg Nesterov wrote:
>>> Jens,
>>>
>>> I'll read this version tomorrow, but:
>>>
>>> On 10/01, Jens Axboe wrote:
>>>>
>>>>  static inline int signal_pending(struct task_struct *p)
>>>>  {
>>>> -	return unlikely(test_tsk_thread_flag(p,TIF_SIGPENDING));
>>>> +#ifdef TIF_TASKWORK
>>>> +	/*
>>>> +	 * TIF_TASKWORK isn't really a signal, but it requires the same
>>>> +	 * behavior of restarting the system call to force a kernel/user
>>>> +	 * transition.
>>>> +	 */
>>>> +	return unlikely(test_tsk_thread_flag(p, TIF_SIGPENDING) ||
>>>> +			test_tsk_thread_flag(p, TIF_TASKWORK));
>>>> +#else
>>>> +	return unlikely(test_tsk_thread_flag(p, TIF_SIGPENDING));
>>>> +#endif
>>>
>>> This change alone is already very wrong.
>>>
>>> signal_pending(task) == T means that this task will do get_signal() as
>>> soon as it can, and this basically means you can't "divorce" SIGPENDING
>>> and TASKWORK.
>>>
>>> Simple example. Suppose we have a single-threaded task T.
>>>
>>> Someone does task_work_add(T, TWA_SIGNAL). This makes signal_pending()==T
>>> and this is what we need.
>>>
>>> Now suppose that another task sends a signal to T before T calls
>>> task_work_run() and clears TIF_TASKWORK. In this case SIGPENDING won't
>>> be set because signal_pending() is already set (see wants_signal), and
>>> this means that T won't notice this signal.
>>
>> That's a good point, and I have been thinking along those lines. The
>> "problem" is the two different use cases:
>>
>> 1) The "should I return from schedule() or break out of schedule() loops
>>    kind of use cases".
>>
>> 2) Internal signal delivery use cases.
>>
>> The former wants one that factors in TIF_TASKWORK, while the latter
>> should of course only look at TIF_SIGPENDING.
>>
>> Now, my gut reaction would be to have __signal_pending() that purely
>> checks for TIF_SIGPENDING, and make sure we use that on the signal
>> delivery side of things. Or something with a better name than that, but
>> functionally the same. Ala:
>>
>> static inline int __signal_pending(struct task_struct *p)
>> {
>> 	return unlikely(test_tsk_thread_flag(p, TIF_SIGPENDING));
>> }
>>
>> static inline int signal_pending(struct task_struct *p)
>> {
>> #ifdef TIF_TASKWORK
>> 	return unlikely(test_tsk_thread_flag(p, TIF_TASKWORK)||
>> 			__signal_pending(p));
>> #else
>> 	return __signal_pending(p));
>> #endif
>> }
>>
>> and then use __signal_pending() on the signal delivery side.
>>
>> It's still not great in the sense that renaming signal_pending() would
>> be a better choice, but that's a whole lot of churn...
> 
> To avoid that churn, IIUC replace TWA_SIGNAL with TWA_RESUME on
> adding task work, which is compensated by adding a counter of
> event source in IO ctx and waiting for event to arrive instead
> of signal.

That doesn't work. If the task is waiting in cqring_wait(), then
there's no issue already. The problem is if it's waiting somewhere
else.

Imagine three threads, call them T1-3. T1 creates a pipe, and creates
a ring. T1 queues a poll request for the read end of the pipe, and now
does a wait for T2. T2 is a completer thread, so it ends up waiting
for events on the ring. T2 is now in cqring_wait(). T3 is created,
and it writes to the pipe. This write triggers the original poll
request from T1, and task_work is now queued for T1. This task work
needs to be processed for T2 to wakeup and complete, but it can't
since T1 is in pthread_join() for T2.

This is why TWA_SIGNAL is needed, we need it to break the T1 wait
loop and process this work. No amount of changes in io_uring can
fix this dependency, and if you look at the last series posted,
it does in fact not even have any io_uring changes.

Hence the goal is to have TWA_SIGNAL have the same kind of semantics
it does now, but decoupled from ->sighand since that is problematic
on particularly threaded setups.

-- 
Jens Axboe


      parent reply	other threads:[~2020-10-02 13:44 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-10-01 15:03 Jens Axboe
2020-10-01 16:27 ` Oleg Nesterov
2020-10-01 17:27   ` Jens Axboe
     [not found]   ` <20201002133813.3180-1-hdanton@sina.com>
2020-10-02 13:44     ` Jens Axboe [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=de200581-1e95-c175-4b6a-a41636a34507@kernel.dk \
    --to=axboe@kernel.dk \
    --cc=hdanton@sina.com \
    --cc=io-uring@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=oleg@redhat.com \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    --subject='Re: [PATCH RFC v2] kernel: decouple TASK_WORK TWA_SIGNAL handling from signals' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).