All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jan Kiszka <jan.kiszka@domain.hid>
To: Gilles Chanteperdrix <gilles.chanteperdrix@xenomai.org>
Cc: Xenomai core <Xenomai-core@domain.hid>
Subject: Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : nucleus: Fix race between gatekeeper and	thread deletion
Date: Thu, 14 Jul 2011 22:57:00 +0200	[thread overview]
Message-ID: <4E1F581C.4050809@domain.hid> (raw)
In-Reply-To: <4E1DEE27.7030900@domain.hid>

[-- Attachment #1: Type: text/plain, Size: 4747 bytes --]

On 2011-07-13 21:12, Gilles Chanteperdrix wrote:
> On 07/13/2011 09:04 PM, Jan Kiszka wrote:
>> On 2011-07-13 20:39, Gilles Chanteperdrix wrote:
>>> On 07/12/2011 07:43 PM, Jan Kiszka wrote:
>>>> On 2011-07-12 19:38, Gilles Chanteperdrix wrote:
>>>>> On 07/12/2011 07:34 PM, Jan Kiszka wrote:
>>>>>> On 2011-07-12 19:31, Gilles Chanteperdrix wrote:
>>>>>>> On 07/12/2011 02:57 PM, Jan Kiszka wrote:
>>>>>>>>  			xnlock_put_irqrestore(&nklock, s);
>>>>>>>>  			xnpod_schedule();
>>>>>>>>  		}
>>>>>>>> @@ -1036,6 +1043,7 @@ redo:
>>>>>>>>  	 * to process this signal anyway.
>>>>>>>>  	 */
>>>>>>>>  	if (rthal_current_domain == rthal_root_domain) {
>>>>>>>> +		XENO_BUGON(NUCLEUS, xnthread_test_info(thread, XNATOMIC));
>>>>>>>
>>>>>>> Misleading dead code again, XNATOMIC is cleared not ten lines above.
>>>>>>
>>>>>> Nope, I forgot to remove that line.
>>>>>>
>>>>>>>
>>>>>>>>  		if (XENO_DEBUG(NUCLEUS) && (!signal_pending(this_task)
>>>>>>>>  		    || this_task->state != TASK_RUNNING))
>>>>>>>>  			xnpod_fatal
>>>>>>>> @@ -1044,6 +1052,8 @@ redo:
>>>>>>>>  		return -ERESTARTSYS;
>>>>>>>>  	}
>>>>>>>>  
>>>>>>>> +	xnthread_clear_info(thread, XNATOMIC);
>>>>>>>
>>>>>>> Why this? I find the xnthread_clear_info(XNATOMIC) right at the right
>>>>>>> place at the point it currently is.
>>>>>>
>>>>>> Nope. Now we either clear XNATOMIC after successful migration or when
>>>>>> the signal is about to be sent (ie. in the hook). That way we can test
>>>>>> more reliably (TM) in the gatekeeper if the thread can be migrated.
>>>>>
>>>>> Ok for adding the XNATOMIC test, because it improves the robustness, but
>>>>> why changing the way XNATOMIC is set and clear? Chances of breaking
>>>>> thing while changing code in this area are really high...
>>>>
>>>> The current code is (most probably) broken as it does not properly
>>>> synchronizes the gatekeeper against a signaled and "runaway" target
>>>> Linux task.
>>>>
>>>> We need an indication if a Linux signal will (or already has) woken up
>>>> the to-be-migrated task. That task may have continued over its context,
>>>> potentially on a different CPU. Providing this indication is the purpose
>>>> of changing where XNATOMIC is cleared.
>>>
>>> What about synchronizing with the gatekeeper with a semaphore, as done
>>> in the first patch you sent, but doing it in xnshadow_harden, as soon as
>>> we detect that we are not back from schedule in primary mode? It seems
>>> it would avoid any further issue, as we would then be guaranteed that
>>> the thread could not switch to TASK_INTERRUPTIBLE again before the
>>> gatekeeper is finished.
>>
>> The problem is that the gatekeeper tests the task state without holding
>> the task's rq lock (which is not available to us without a kernel
>> patch). That cannot work reliably as long as we accept signals. That's
>> why I'm trying to move state change and test under nklock.
>>
>>>
>>> What worries me is the comment in xnshadow_harden:
>>>
>>> 	 * gatekeeper sent us to primary mode. Since
>>> 	 * TASK_UNINTERRUPTIBLE is unavailable to us without wrecking
>>> 	 * the runqueue's count of uniniterruptible tasks, we just
>>> 	 * notice the issue and gracefully fail; the caller will have
>>> 	 * to process this signal anyway.
>>> 	 */
>>>
>>> Does this mean that we can not switch to TASK_UNINTERRUPTIBLE at this
>>> point? Or simply that TASK_UNINTERRUPTIBLE is not available for the
>>> business of xnshadow_harden?
>>>
>>
>> TASK_UNINTERRUPTIBLE is not available without patching the kernel's
>> scheduler for the reason mentioned in the comment (the scheduler becomes
>> confused and may pick the wrong tasks, IIRC).
> 
> Does not using down/up in the taskexit event handler risk to cause the
> same issue?

Yes, and that means the first patch is incomplete without something like
the second.

> 
>>
>> But I would refrain from trying to "improve" the gatekeeper design. I've
>> recently mentioned this to Philippe offlist: For Xenomai 3 with some
>> ipipe v3, we must rather patch schedule() to enable zero-switch domain
>> migration. Means: enter the scheduler, let it suspend current and pick
>> another task, but then simply escalate to the RT domain before doing any
>> context switch. That's much cheaper than the current design and
>> hopefully also less error-prone.
> 
> So, do you want me to merge your for-upstream branch?

You may merge up to for-upstream^, ie. without any gatekeeper fixes.

I strongly suspect that there are still more races in the migration
path. The crashes we face even with all patches applied may be related
to a shadow task being executed under Linux and Xenomai at the same time.

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

  reply	other threads:[~2011-07-14 20:57 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <E1QfDvt-0003TN-G7@domain.hid>
2011-07-11 18:53 ` [Xenomai-core] [Xenomai-git] Jan Kiszka : nucleus: Fix race between gatekeeper and thread deletion Gilles Chanteperdrix
2011-07-11 19:10   ` Jan Kiszka
2011-07-11 19:16     ` Jan Kiszka
2011-07-11 19:51       ` Gilles Chanteperdrix
2011-07-11 19:59         ` Jan Kiszka
2011-07-11 20:02           ` Gilles Chanteperdrix
2011-07-11 20:06             ` Jan Kiszka
2011-07-11 20:09               ` Gilles Chanteperdrix
2011-07-11 20:12                 ` Jan Kiszka
2011-07-12  6:41                   ` Gilles Chanteperdrix
2011-07-12  7:22                     ` Jan Kiszka
2011-07-12  7:49                       ` Gilles Chanteperdrix
2011-07-12 10:59                       ` Gilles Chanteperdrix
2011-07-12 11:00                         ` Jan Kiszka
2011-07-12 11:04                           ` Gilles Chanteperdrix
2011-07-12 11:06                             ` Jan Kiszka
2011-07-12 11:08                               ` Gilles Chanteperdrix
2011-07-12 11:10                                 ` Jan Kiszka
2011-07-12 11:26                                   ` Gilles Chanteperdrix
2011-07-12 11:29                                     ` Jan Kiszka
2011-07-12 11:41                                       ` Gilles Chanteperdrix
2011-07-12 11:56                                         ` Jan Kiszka
2011-07-12 11:58                                           ` Jan Kiszka
2011-07-12 12:06                                             ` Gilles Chanteperdrix
2011-07-12 12:13                                               ` Jan Kiszka
2011-07-12 12:57                                                 ` Jan Kiszka
2011-07-12 15:48                                                   ` Philippe Gerum
2011-07-12 16:18                                                     ` Jan Kiszka
2011-07-12 17:31                                                   ` Gilles Chanteperdrix
2011-07-12 17:34                                                     ` Jan Kiszka
2011-07-12 17:38                                                       ` Gilles Chanteperdrix
2011-07-12 17:43                                                         ` Jan Kiszka
2011-07-13 18:39                                                           ` Gilles Chanteperdrix
2011-07-13 19:04                                                             ` Jan Kiszka
2011-07-13 19:12                                                               ` Gilles Chanteperdrix
2011-07-14 20:57                                                                 ` Jan Kiszka [this message]
2011-07-15 12:30                                                                   ` Gilles Chanteperdrix
2011-07-15 13:10                                                                     ` Jan Kiszka
2011-07-16  8:13                                                                       ` Jan Kiszka
2011-07-16  8:52                                                                         ` Philippe Gerum
2011-07-16  9:15                                                                           ` Jan Kiszka
2011-07-16  9:56                                                                             ` Philippe Gerum
2011-07-16 17:16                                                                               ` Jan Kiszka
2011-07-13 19:35                                                             ` Philippe Gerum

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4E1F581C.4050809@domain.hid \
    --to=jan.kiszka@domain.hid \
    --cc=Xenomai-core@domain.hid \
    --cc=gilles.chanteperdrix@xenomai.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.