From: Jan Kiszka <jan.kiszka@siemens.com>
To: Philippe Gerum <rpm@xenomai.org>
Cc: Xenomai <xenomai@xenomai.org>
Subject: Re: [PATCH] irq_pipeline: Prevent returning to user space with pending inband IRQs
Date: Fri, 11 Jun 2021 09:15:34 +0200
Message-ID: <99135b48-1452-4f47-eb93-1bfae2e5e0a2@siemens.com>
In-Reply-To: <87wnr37eht.fsf@xenomai.org>

On 09.06.21 08:12, Philippe Gerum wrote:
> 
> Jan Kiszka <jan.kiszka@siemens.com> writes:
> 
>> On 08.06.21 19:39, Philippe Gerum wrote:
>>>
>>> Philippe Gerum <rpm@xenomai.org> writes:
>>>
>>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>>
>>>>> On 07.06.21 18:19, Philippe Gerum wrote:
>>>>>>
>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>>>>
>>>>>>> On 07.06.21 17:04, Philippe Gerum wrote:
>>>>>>>>
>>>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>>>>>>
>>>>>>>>> On 07.06.21 15:48, Jan Kiszka wrote:
>>>>>>>>>> On 07.06.21 15:44, Jan Kiszka wrote:
>>>>>>>>>>> On 07.06.21 15:03, Philippe Gerum wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>>>>>>>>>>
>>>>>>>>>>>>> On 07.06.21 09:17, Philippe Gerum wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 06.06.21 16:43, Philippe Gerum via Xenomai wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Jan Kiszka <jan.kiszka@web.de> writes:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> From: Jan Kiszka <jan.kiszka@siemens.com>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The correct order is hard IRQs off first, then stalling inband, see also
>>>>>>>>>>>>>>>>> I-pipe. Otherwise, we risk taking an interrupt for the inband stage but
>>>>>>>>>>>>>>>>> not injecting it before returning to user space.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>>>>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Fixes the RCU stalls Florian reported, at least for me.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I'm afraid this wasn't the last regression of translating I-pipe into
>>>>>>>>>>>>>>>>> dovetail..
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>  include/linux/entry-common.h | 3 ++-
>>>>>>>>>>>>>>>>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h
>>>>>>>>>>>>>>>>> index 0fb45b2d6094..00540110985e 100644
>>>>>>>>>>>>>>>>> --- a/include/linux/entry-common.h
>>>>>>>>>>>>>>>>> +++ b/include/linux/entry-common.h
>>>>>>>>>>>>>>>>> @@ -203,7 +203,8 @@ static inline void local_irq_disable_exit_to_user(void);
>>>>>>>>>>>>>>>>>  #ifndef local_irq_disable_exit_to_user
>>>>>>>>>>>>>>>>>  static inline void local_irq_disable_exit_to_user(void)
>>>>>>>>>>>>>>>>>  {
>>>>>>>>>>>>>>>>> -	local_irq_disable_full();
>>>>>>>>>>>>>>>>> +	hard_cond_local_irq_disable();
>>>>>>>>>>>>>>>>> +	local_irq_disable();
>>>>>>>>>>>>>>>>>  }
>>>>>>>>>>>>>>>>>  #endif
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Good spot, a real issue here, just like the one the CPU_IDLE code deals
>>>>>>>>>>>>>>>> with. The fix looks good, but maybe we could do even better.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The way local_irq_disable_full() works tends to introduce this issue in
>>>>>>>>>>>>>>>> a sneaky way, when the code path does not synchronize the interrupt log
>>>>>>>>>>>>>>>> (exit_to_user_mode_loop may be at fault in this case, which does not
>>>>>>>>>>>>>>>> traverse synchronize_pipeline()). Lack of synchronization would still
>>>>>>>>>>>>>>>> hit us if some trap handler turns hard irqs off -> on -> off the same
>>>>>>>>>>>>>>>> way, and we don't leave through the user exit path.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Inverting the order of the actions in local_irq_disable_full() may be an
>>>>>>>>>>>>>>>> option (inband_irq_disable would allow that); making sure we cannot be
>>>>>>>>>>>>>>>> caught by the same issue when returning to kernel mode is another
>>>>>>>>>>>>>>>> way. This needs more thought, I think.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> So, always this way?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> local_irq_disable_full:
>>>>>>>>>>>>>>> 	hard_local_irq_disable	
>>>>>>>>>>>>>>> 	local_irq_disable
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> local_irq_enable_full:
>>>>>>>>>>>>>>> 	hard_local_irq_enable
>>>>>>>>>>>>>>> 	local_irq_enable
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yes.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Or even flip local_irq_enable_full as well?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Not this one, hard irqs must be on before local_irq_enable() is issued
>>>>>>>>>>>>>> otherwise we would trigger a debug assertion. The reason for that
>>>>>>>>>>>>>> assertion is that the I-pipe version of local_irq_enable() forcibly
>>>>>>>>>>>>>> enables hard irqs, something we want to depart from because whether
>>>>>>>>>>>>>> or not the caller expects such a side effect is error-prone. For
>>>>>>>>>>>>>> this reason, inband_irq_enable() expects hard irqs on, which should be
>>>>>>>>>>>>>> the current hard irq state for the caller under normal circumstances.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Was there ever code that required ordering local_irq_disable_full the
>>>>>>>>>>>>>>> other way around?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> After review, I don't think so. The current order for
>>>>>>>>>>>>>> local_irq_disable_full() was rather determined by applying a reverse
>>>>>>>>>>>>>> sequence compared to local_irq_enable_full(). So this looks good. I need
>>>>>>>>>>>>>> to test this on the armv[7/8] side here first before merging.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Is there a commit somewhere that I can drop already?
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> 1. These are fine with me in their latest iteration:
>>>>>>>>>>>>
>>>>>>>>>>>>   irq_pipeline: Clean up stage_info field and users
>>>>>>>>>>>>   irq_pipeline: Account for stage migration across faults
>>>>>>>>>>>>   irq_pipeline: Warn when calling irqentry_enter with oob stalled
>>>>>>>>>>>>   x86: dovetail: Fix TS flag reservation
>>>>>>>>>>>>
>>>>>>>>>>>> 2. This one should be replaced by a fix in local_irq_disable_full(),
>>>>>>>>>>>> pending some tests ongoing here:
>>>>>>>>>>>>
>>>>>>>>>>>> irq_pipeline: Prevent returning to user space with pending inband IRQs
>>>>>>>>>>>>
>>>>>>>>>>>> However, as I mentioned earlier, this may not be sufficient per se; we
>>>>>>>>>>>> may have to synchronize the in-band log when unstalling the stage on the
>>>>>>>>>>>> kernel exit path, i.e. the exit point fixed up by the "Account for stage
>>>>>>>>>>>> migration across faults" patch. This would address the following
>>>>>>>>>>>> scenario:
>>>>>>>>>>>>
>>>>>>>>>>>>         irqentry_enter on kernel context:
>>>>>>>>>>>>                        stall_inband();
>>>>>>>>>>>>                                 trap_handler();
>>>>>>>>>>>>                                         hard_local_irq_enable()
>>>>>>>>>>>>                                         hard_local_irq_disable()
>>>>>>>>>>>>                        if (state.stage_info == IRQENTRY_INBAND_UNSTALLED ||
>>>>>>>>>>>>                            state.stage_info == IRQENTRY_OOB) {
>>>>>>>>>>>>                              unstall_inband();
>>>>>>>>>>>>                        	     synchronize_pipeline();
>>>>>>>>>>>
>>>>>>>>>>> Oh, unstall_inband does not synchronize? Because of
>>>>>>>>>>> hard_local_irq_disable or in general?
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Ah, found it - unstall != inband_irq_enable. Why not use the latter
>>>>>>>>>> directly?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ...because it's instrumented to detect hard_irqs_disabled.
>>>>>>>>>
>>>>>>>>> Can we safely sync at that point?
>>>>>>>>
>>>>>>>> Yes we may do this, because we restore the original -unstalled- state of
>>>>>>>> the in-band stage. So if we have pending IRQs, we may play them at this
>>>>>>>> point. What we obviously may not do is create synchronization points
>>>>>>>> where the kernel does not expect irqs.
>>>>>>>>
>>>>>>>
>>>>>>> Again: When will any reschedules that might be requested by those
>>>>>>> replayed IRQs be handled then?
>>>>>>>
>>>>>>
>>>>>> Nowhere unless we provide a log synchronization routine dedicated to the
>>>>>> in-band kernel exit path, which would fold in the rescheduling call via
>>>>>> irqentry_exit_cond_resched(). I'll write that one.
>>>>>>
>>>>>
>>>>> How about simply moving the replay up, before the kernel checks for
>>>>> pending reschedules anyway? We can then stall inband again, to please
>>>>> lockdep & Co., as long as there is no hard-irq-on section before the interrupt return.
>>>>>
>>>>
>>>> Yes, this is the right way to do this. I'm on it.
>>>
>>> Please test this (heavily) with cobalt. This survived a 2-hour-long kvm
>>> test here, running loops of the evl test suite + hackbench over 64 vcpus,
>>> so far so good. No reason for any difference in behavior between cobalt
>>> and evl in this area, this is merely a dovetail business.
>>>
>>> commit 30808d958277a525baad33f3c68ccb7910dc3176 (HEAD -> rebase/v5.10.y-evl)
>>> Author: Philippe Gerum <rpm@xenomai.org>
>>>
>>>     genirq: irq_pipeline: synchronize log on irq exit to kernel
>>>     
>>>     We must make sure to play any IRQ which might be pending in the
>>>     in-band log before leaving an interrupt frame for a preempted kernel
>>>     context.
>>>     
>>>     This completes "irq_pipeline: Account for stage migration across
>>>     faults", so that we synchronize the log once the in-band stage is
>>>     unstalled. In addition, we also care to do this before
>>>     preempt_schedule_irq() runs, so that we won't miss any rescheduling
>>>     request which might have been triggered by some IRQ we just played.
>>>     
>>>     Signed-off-by: Philippe Gerum <rpm@xenomai.org>
>>>     Suggested-by: Jan Kiszka <jan.kiszka@siemens.com>
>>>
>>> diff --git a/kernel/entry/common.c b/kernel/entry/common.c
>>> index 4e81c0c03e5726a..99512307537561b 100644
>>> --- a/kernel/entry/common.c
>>> +++ b/kernel/entry/common.c
>>> @@ -477,8 +477,57 @@ bool irqexit_may_preempt_schedule(irqentry_state_t state,
>>>  
>>>  #endif
>>>  
>>> +#ifdef CONFIG_IRQ_PIPELINE
>>> +
>>> +static bool irqentry_syncstage(irqentry_state_t state) /* hard irqs off */
>>> +{
>>> +	/*
>>> +	 * If pipelining interrupts, enable in-band IRQs then
>>> +	 * synchronize the interrupt log on exit if:
>>> +	 *
>>> +	 * - irqentry_enter() stalled the stage in order to mirror the
>>> +	 * hardware state.
>>> +	 *
>>> +	 * - we were coming from oob, thus went through a stage migration
>>> +	 * that was caused by taking a CPU exception, e.g., a fault.
>>> +	 *
>>> +	 * We run before preempt_schedule_irq() may be called later on
>>> +	 * by preemptible kernels, so that any rescheduling request
>>> +	 * triggered by in-band IRQ handlers is considered.
>>> +	 */
>>> +	if (state.stage_info == IRQENTRY_INBAND_UNSTALLED ||
>>> +		state.stage_info == IRQENTRY_OOB) {
>>> +		unstall_inband_nocheck();
>>> +		synchronize_pipeline_on_irq();
>>> +		stall_inband_nocheck();
>>> +		return true;
>>> +	}
>>> +
>>> +	return false;
>>> +}
>>> +
>>> +static void irqentry_unstall(void)
>>> +{
>>> +	unstall_inband_nocheck();
>>> +}
>>> +
>>> +#else
>>> +
>>> +static bool irqentry_syncstage(irqentry_state_t state)
>>> +{
>>> +	return false;
>>> +}
>>> +
>>> +static void irqentry_unstall(void)
>>> +{
>>> +}
>>> +
>>> +#endif
>>> +
>>>  noinstr void irqentry_exit(struct pt_regs *regs, irqentry_state_t state)
>>>  {
>>> +	bool synchronized = false;
>>> +
>>>  	if (running_oob())
>>>  		return;
>>>  
>>> @@ -488,7 +537,11 @@ noinstr void irqentry_exit(struct pt_regs *regs, irqentry_state_t state)
>>>  	if (user_mode(regs)) {
>>>  		irqentry_exit_to_user_mode(regs);
>>>  		return;
>>> -	} else if (irqexit_may_preempt_schedule(state, regs)) {
>>> +	}
>>> +
>>> +	synchronized = irqentry_syncstage(state);
>>> +
>>> +	if (irqexit_may_preempt_schedule(state, regs)) {
>>>  		/*
>>>  		 * If RCU was not watching on entry this needs to be done
>>>  		 * carefully and needs the same ordering of lockdep/tracing
>>> @@ -521,17 +574,9 @@ noinstr void irqentry_exit(struct pt_regs *regs, irqentry_state_t state)
>>>  	}
>>>  
>>>  out:
>>> -#ifdef CONFIG_IRQ_PIPELINE
>>> -	/*
>>> -	 * If pipelining interrupts, clear the in-band stall bit if
>>> -	 * irqentry_enter() raised it in order to mirror the hardware
>>> -	 * state. Also clear it when we where coming from oob, thus went
>>> -	 * through a migration that was caused by taking, e.g., a fault.
>>> -	 */
>>> -	if (state.stage_info == IRQENTRY_INBAND_UNSTALLED ||
>>> -	    state.stage_info == IRQENTRY_OOB)
>>> -		unstall_inband();
>>> -#endif
>>> +	if (synchronized)
>>> +		irqentry_unstall();
>>> +
>>>  	return;
>>>  }
>>>  
>>>
>>
>> I've put that into a staging/v5.10.y-dovetail branch, pointed
>> xenomai-images to that and started a run in our lab. That should give us
>> some confidence that things work under cobalt across all archs, and we
>> can make that commit official.
>>
>> However, heavy stressing would require triggering migration-on-fault in
>> tighter loops while running something like stress-ng in the background.
>> Maybe it would make sense to add such a pattern to the switchtest (would
>> also require a way to silence the related kernel warnings during
>> runtime, not only build-time)?
>>
> 
> switchtest aims at exercising the core scheduling logic including the
> transition between execution stages, so migration-on-fault would fit
> there indeed.
> 

Do we need this testcase first? My feeling from review and testing is
that it is safe to move forward with the patch already.

Jan

-- 
Siemens AG, T RDA IOT
Corporate Competence Center Embedded Linux


