* [PATCH] irq_pipeline: Prevent returning to user space with pending inband IRQs @ 2021-06-06 12:35 Jan Kiszka 2021-06-06 14:43 ` Philippe Gerum 0 siblings, 1 reply; 27+ messages in thread From: Jan Kiszka @ 2021-06-06 12:35 UTC (permalink / raw) To: Xenomai, Philippe Gerum From: Jan Kiszka <jan.kiszka@siemens.com> The correct order is hard IRQs off first, then stalling inband, see also I-pipe. Otherwise, we risk to take an interrupt for the inband stage but do not inject it before returning to user space. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> --- Fixes the RCU stalls Florian reported, at least for me. I'm afraid this wasn't the last regression of translating I-pipe into dovetail.. include/linux/entry-common.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h index 0fb45b2d6094..00540110985e 100644 --- a/include/linux/entry-common.h +++ b/include/linux/entry-common.h @@ -203,7 +203,8 @@ static inline void local_irq_disable_exit_to_user(void); #ifndef local_irq_disable_exit_to_user static inline void local_irq_disable_exit_to_user(void) { - local_irq_disable_full(); + hard_cond_local_irq_disable(); + local_irq_disable(); } #endif -- 2.26.2 ^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH] irq_pipeline: Prevent returning to user space with pending inband IRQs 2021-06-06 12:35 [PATCH] irq_pipeline: Prevent returning to user space with pending inband IRQs Jan Kiszka @ 2021-06-06 14:43 ` Philippe Gerum 2021-06-07 5:53 ` Jan Kiszka 0 siblings, 1 reply; 27+ messages in thread From: Philippe Gerum @ 2021-06-06 14:43 UTC (permalink / raw) To: Jan Kiszka; +Cc: Xenomai, Bezdeka, Florian (T RDA IOT SES-DE) Jan Kiszka <jan.kiszka@web.de> writes: > From: Jan Kiszka <jan.kiszka@siemens.com> > > The correct order is hard IRQs off first, then stalling inband, see also > I-pipe. Otherwise, we risk to take an interrupt for the inband stage but > do not inject it before returning to user space. > > Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> > --- > > Fixes the RCU stalls Florian reported, at least for me. > > I'm afraid this wasn't the last regression of translating I-pipe into > dovetail.. > > include/linux/entry-common.h | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h > index 0fb45b2d6094..00540110985e 100644 > --- a/include/linux/entry-common.h > +++ b/include/linux/entry-common.h > @@ -203,7 +203,8 @@ static inline void local_irq_disable_exit_to_user(void); > #ifndef local_irq_disable_exit_to_user > static inline void local_irq_disable_exit_to_user(void) > { > - local_irq_disable_full(); > + hard_cond_local_irq_disable(); > + local_irq_disable(); > } > #endif Good spot, real issue here, just like the CPU_IDLE code deals with. The fix looks good, but maybe we could do even better. The way local_irq_disable_full() works tends to introduce this issue in a sneaky way, when the code path does not synchronize the interrupt log (exit_to_user_mode_loop may be at fault in this case, which does not traverse synchronize_pipeline()).
Lack of synchronization would still hit us if some trap handler turns hard irqs off -> on -> off the same way and we don't leave through the user exit path. Inverting the order of the actions in local_irq_disable_full() may be one option (inband_irq_disable would allow that); making sure we cannot be caught by the same issue when returning to kernel mode is another. This needs more thought, I think. -- Philippe.
* Re: [PATCH] irq_pipeline: Prevent returning to user space with pending inband IRQs 2021-06-06 14:43 ` Philippe Gerum @ 2021-06-07 5:53 ` Jan Kiszka 2021-06-07 7:17 ` Philippe Gerum 0 siblings, 1 reply; 27+ messages in thread From: Jan Kiszka @ 2021-06-07 5:53 UTC (permalink / raw) To: Philippe Gerum; +Cc: Xenomai On 06.06.21 16:43, Philippe Gerum via Xenomai wrote: > > Jan Kiszka <jan.kiszka@web.de> writes: > >> From: Jan Kiszka <jan.kiszka@siemens.com> >> >> The correct order is hard IRQs off first, then stalling inband, see also >> I-pipe. Otherwise, we risk to take an interrupt for the inband stage but >> do not inject it before returning to user space. >> >> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> >> --- >> >> Fixes the RCU stalls Florian reported, at least for me. >> >> I'm afraid this wasn't the last regression of translating I-pipe into >> dovetail.. >> >> include/linux/entry-common.h | 3 ++- >> 1 file changed, 2 insertions(+), 1 deletion(-) >> >> diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h >> index 0fb45b2d6094..00540110985e 100644 >> --- a/include/linux/entry-common.h >> +++ b/include/linux/entry-common.h >> @@ -203,7 +203,8 @@ static inline void local_irq_disable_exit_to_user(void); >> #ifndef local_irq_disable_exit_to_user >> static inline void local_irq_disable_exit_to_user(void) >> { >> - local_irq_disable_full(); >> + hard_cond_local_irq_disable(); >> + local_irq_disable(); >> } >> #endif > > Good spot, real issue here, just like the CPU_IDLE code deals with. The > fix looks good, but maybe we could do even better. > > The way local_irq_disable_full() works tends to introduce this issue in > a sneaky way, when the code path does not synchronize the interrupt log > (exit_to_user_mode_loop may be at fault in this case, which does not > traverses synchronize_pipeline()). 
Lack of synchronization would still > hit us if some trap handler turns hard irqs off -> on -> off the same > way, and we don't leave through the user exit path. > > Inverting the order of the actions in local_irq_disable_full() may be an > option (inband_irq_disable would allow that), making sure we cannot be > caught in the same issue when returning to kernel mode is another > way. This needs more thought I think. > So, always this way? local_irq_disable_full: hard_local_irq_disable local_irq_disable local_irq_enable_full: hard_local_irq_enable local_irq_enable Or even flip local_irq_enable_full as well? Was there ever code that required ordering local_irq_disable_full the other way around? Jan -- Siemens AG, T RDA IOT Corporate Competence Center Embedded Linux ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] irq_pipeline: Prevent returning to user space with pending inband IRQs 2021-06-07 5:53 ` Jan Kiszka @ 2021-06-07 7:17 ` Philippe Gerum 2021-06-07 10:22 ` Jan Kiszka 0 siblings, 1 reply; 27+ messages in thread From: Philippe Gerum @ 2021-06-07 7:17 UTC (permalink / raw) To: Jan Kiszka; +Cc: Xenomai Jan Kiszka <jan.kiszka@siemens.com> writes: > On 06.06.21 16:43, Philippe Gerum via Xenomai wrote: >> >> Jan Kiszka <jan.kiszka@web.de> writes: >> >>> From: Jan Kiszka <jan.kiszka@siemens.com> >>> >>> The correct order is hard IRQs off first, then stalling inband, see also >>> I-pipe. Otherwise, we risk to take an interrupt for the inband stage but >>> do not inject it before returning to user space. >>> >>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> >>> --- >>> >>> Fixes the RCU stalls Florian reported, at least for me. >>> >>> I'm afraid this wasn't the last regression of translating I-pipe into >>> dovetail.. >>> >>> include/linux/entry-common.h | 3 ++- >>> 1 file changed, 2 insertions(+), 1 deletion(-) >>> >>> diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h >>> index 0fb45b2d6094..00540110985e 100644 >>> --- a/include/linux/entry-common.h >>> +++ b/include/linux/entry-common.h >>> @@ -203,7 +203,8 @@ static inline void local_irq_disable_exit_to_user(void); >>> #ifndef local_irq_disable_exit_to_user >>> static inline void local_irq_disable_exit_to_user(void) >>> { >>> - local_irq_disable_full(); >>> + hard_cond_local_irq_disable(); >>> + local_irq_disable(); >>> } >>> #endif >> >> Good spot, real issue here, just like the CPU_IDLE code deals with. The >> fix looks good, but maybe we could do even better. >> >> The way local_irq_disable_full() works tends to introduce this issue in >> a sneaky way, when the code path does not synchronize the interrupt log >> (exit_to_user_mode_loop may be at fault in this case, which does not >> traverses synchronize_pipeline()). 
Lack of synchronization would still >> hit us if some trap handler turns hard irqs off -> on -> off the same >> way, and we don't leave through the user exit path. >> >> Inverting the order of the actions in local_irq_disable_full() may be an >> option (inband_irq_disable would allow that), making sure we cannot be >> caught in the same issue when returning to kernel mode is another >> way. This needs more thought I think. >> > > So, always this way? > > local_irq_disable_full: > hard_local_irq_disable > local_irq_disable > > local_irq_enable_full: > hard_local_irq_enable > local_irq_enable > Yes. > Or even flip local_irq_enable_full as well? Not this one, hard irqs must be on before local_irq_enable() is issued otherwise we would trigger a debug assertion. The reason for such assertion is that the I-pipe version of local_irq_enable() force enables hard irqs on, and this is something we want to depart from because whether the caller expects or not such side-effect is error-prone. For this reason, inband_irq_enable() expects hard irqs on, which should be the current hard irq state for the caller under normal circumstances. > > Was there ever code that required ordering local_irq_disable_full the > other way around? > After review, I don't think so. The current order for local_irq_disable_full() was rather determined by applying a reverse sequence compared to local_irq_enable_full(). So this looks good. I need to test this on the armv[7/8] side here first before merging. -- Philippe. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] irq_pipeline: Prevent returning to user space with pending inband IRQs 2021-06-07 7:17 ` Philippe Gerum @ 2021-06-07 10:22 ` Jan Kiszka 2021-06-07 13:03 ` Philippe Gerum 0 siblings, 1 reply; 27+ messages in thread From: Jan Kiszka @ 2021-06-07 10:22 UTC (permalink / raw) To: Philippe Gerum; +Cc: Xenomai On 07.06.21 09:17, Philippe Gerum wrote: > > Jan Kiszka <jan.kiszka@siemens.com> writes: > >> On 06.06.21 16:43, Philippe Gerum via Xenomai wrote: >>> >>> Jan Kiszka <jan.kiszka@web.de> writes: >>> >>>> From: Jan Kiszka <jan.kiszka@siemens.com> >>>> >>>> The correct order is hard IRQs off first, then stalling inband, see also >>>> I-pipe. Otherwise, we risk to take an interrupt for the inband stage but >>>> do not inject it before returning to user space. >>>> >>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> >>>> --- >>>> >>>> Fixes the RCU stalls Florian reported, at least for me. >>>> >>>> I'm afraid this wasn't the last regression of translating I-pipe into >>>> dovetail.. >>>> >>>> include/linux/entry-common.h | 3 ++- >>>> 1 file changed, 2 insertions(+), 1 deletion(-) >>>> >>>> diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h >>>> index 0fb45b2d6094..00540110985e 100644 >>>> --- a/include/linux/entry-common.h >>>> +++ b/include/linux/entry-common.h >>>> @@ -203,7 +203,8 @@ static inline void local_irq_disable_exit_to_user(void); >>>> #ifndef local_irq_disable_exit_to_user >>>> static inline void local_irq_disable_exit_to_user(void) >>>> { >>>> - local_irq_disable_full(); >>>> + hard_cond_local_irq_disable(); >>>> + local_irq_disable(); >>>> } >>>> #endif >>> >>> Good spot, real issue here, just like the CPU_IDLE code deals with. The >>> fix looks good, but maybe we could do even better. 
>>> >>> The way local_irq_disable_full() works tends to introduce this issue in >>> a sneaky way, when the code path does not synchronize the interrupt log >>> (exit_to_user_mode_loop may be at fault in this case, which does not >>> traverses synchronize_pipeline()). Lack of synchronization would still >>> hit us if some trap handler turns hard irqs off -> on -> off the same >>> way, and we don't leave through the user exit path. >>> >>> Inverting the order of the actions in local_irq_disable_full() may be an >>> option (inband_irq_disable would allow that), making sure we cannot be >>> caught in the same issue when returning to kernel mode is another >>> way. This needs more thought I think. >>> >> >> So, always this way? >> >> local_irq_disable_full: >> hard_local_irq_disable >> local_irq_disable >> >> local_irq_enable_full: >> hard_local_irq_enable >> local_irq_enable >> > > Yes. > >> Or even flip local_irq_enable_full as well? > > Not this one, hard irqs must be on before local_irq_enable() is issued > otherwise we would trigger a debug assertion. The reason for such > assertion is that the I-pipe version of local_irq_enable() force enables > hard irqs on, and this is something we want to depart from because > whether the caller expects or not such side-effect is error-prone. For > this reason, inband_irq_enable() expects hard irqs on, which should be > the current hard irq state for the caller under normal circumstances. > >> >> Was there ever code that required ordering local_irq_disable_full the >> other way around? >> > > After review, I don't think so. The current order for > local_irq_disable_full() was rather determined by applying a reverse > sequence compared to local_irq_enable_full(). So this looks good. I need > to test this on the armv[7/8] side here first before merging. > Is there a commit somewhere that I can drop already? 
Jan -- Siemens AG, T RDA IOT Corporate Competence Center Embedded Linux
* Re: [PATCH] irq_pipeline: Prevent returning to user space with pending inband IRQs 2021-06-07 10:22 ` Jan Kiszka @ 2021-06-07 13:03 ` Philippe Gerum 2021-06-07 13:28 ` Philippe Gerum 2021-06-07 13:44 ` Jan Kiszka 0 siblings, 2 replies; 27+ messages in thread From: Philippe Gerum @ 2021-06-07 13:03 UTC (permalink / raw) To: Jan Kiszka; +Cc: Xenomai Jan Kiszka <jan.kiszka@siemens.com> writes: > On 07.06.21 09:17, Philippe Gerum wrote: >> >> Jan Kiszka <jan.kiszka@siemens.com> writes: >> >>> On 06.06.21 16:43, Philippe Gerum via Xenomai wrote: >>>> >>>> Jan Kiszka <jan.kiszka@web.de> writes: >>>> >>>>> From: Jan Kiszka <jan.kiszka@siemens.com> >>>>> >>>>> The correct order is hard IRQs off first, then stalling inband, see also >>>>> I-pipe. Otherwise, we risk to take an interrupt for the inband stage but >>>>> do not inject it before returning to user space. >>>>> >>>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> >>>>> --- >>>>> >>>>> Fixes the RCU stalls Florian reported, at least for me. >>>>> >>>>> I'm afraid this wasn't the last regression of translating I-pipe into >>>>> dovetail.. >>>>> >>>>> include/linux/entry-common.h | 3 ++- >>>>> 1 file changed, 2 insertions(+), 1 deletion(-) >>>>> >>>>> diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h >>>>> index 0fb45b2d6094..00540110985e 100644 >>>>> --- a/include/linux/entry-common.h >>>>> +++ b/include/linux/entry-common.h >>>>> @@ -203,7 +203,8 @@ static inline void local_irq_disable_exit_to_user(void); >>>>> #ifndef local_irq_disable_exit_to_user >>>>> static inline void local_irq_disable_exit_to_user(void) >>>>> { >>>>> - local_irq_disable_full(); >>>>> + hard_cond_local_irq_disable(); >>>>> + local_irq_disable(); >>>>> } >>>>> #endif >>>> >>>> Good spot, real issue here, just like the CPU_IDLE code deals with. The >>>> fix looks good, but maybe we could do even better. 
>>>> >>>> The way local_irq_disable_full() works tends to introduce this issue in >>>> a sneaky way, when the code path does not synchronize the interrupt log >>>> (exit_to_user_mode_loop may be at fault in this case, which does not >>>> traverses synchronize_pipeline()). Lack of synchronization would still >>>> hit us if some trap handler turns hard irqs off -> on -> off the same >>>> way, and we don't leave through the user exit path. >>>> >>>> Inverting the order of the actions in local_irq_disable_full() may be an >>>> option (inband_irq_disable would allow that), making sure we cannot be >>>> caught in the same issue when returning to kernel mode is another >>>> way. This needs more thought I think. >>>> >>> >>> So, always this way? >>> >>> local_irq_disable_full: >>> hard_local_irq_disable >>> local_irq_disable >>> >>> local_irq_enable_full: >>> hard_local_irq_enable >>> local_irq_enable >>> >> >> Yes. >> >>> Or even flip local_irq_enable_full as well? >> >> Not this one, hard irqs must be on before local_irq_enable() is issued >> otherwise we would trigger a debug assertion. The reason for such >> assertion is that the I-pipe version of local_irq_enable() force enables >> hard irqs on, and this is something we want to depart from because >> whether the caller expects or not such side-effect is error-prone. For >> this reason, inband_irq_enable() expects hard irqs on, which should be >> the current hard irq state for the caller under normal circumstances. >> >>> >>> Was there ever code that required ordering local_irq_disable_full the >>> other way around? >>> >> >> After review, I don't think so. The current order for >> local_irq_disable_full() was rather determined by applying a reverse >> sequence compared to local_irq_enable_full(). So this looks good. I need >> to test this on the armv[7/8] side here first before merging. >> > > Is there a commit somewhere that I can drop already? > 1. 
These are fine with me in their latest iteration: irq_pipeline: Clean up stage_info field and users irq_pipeline: Account for stage migration across faults irq_pipeline: Warn when calling irqentry_enter with oob stalled x86: dovetail: Fix TS flag reservation 2. This one should be replaced by a fix in local_irq_disable_full(), pending some tests ongoing here: irq_pipeline: Prevent returning to user space with pending inband IRQs However, as I mentioned earlier, this may not be sufficient per se, we may have to synchronize the in-band log when unstalling the stage on the kernel exit path, i.e. the exit point fixed up by the "Account for stage migration across faults" patch. This would address the following scenario: irqentry_enter on kernel context: stall_inband(); trap_handler(); hard_local_irq_enable() hard_local_irq_disable() if (state.stage_info == IRQENTRY_INBAND_UNSTALLED || state.stage_info == IRQENTRY_OOB) { unstall_inband(); synchronize_pipeline(); } 3. I believe local_irq_restore_full() is fragile, it mixes hard and virtual irq states carelessly. I'll likely issue a patch for this one after testing. -- Philippe. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] irq_pipeline: Prevent returning to user space with pending inband IRQs 2021-06-07 13:03 ` Philippe Gerum @ 2021-06-07 13:28 ` Philippe Gerum 2021-06-07 13:29 ` Philippe Gerum 2021-06-07 13:44 ` Jan Kiszka 0 siblings, 1 reply; 27+ messages in thread From: Philippe Gerum @ 2021-06-07 13:28 UTC (permalink / raw) To: Jan Kiszka; +Cc: Xenomai Philippe Gerum <rpm@xenomai.org> writes: > > 2. This one should be replaced by a fix in local_irq_disable_full(), > pending some tests ongoing here: > > irq_pipeline: Prevent returning to user space with pending inband IRQs > Tests went ok. So [1] should be replaced by a patch inverting the order in local_irq_disable_full() as we discussed. [1] irq_pipeline: Prevent returning to user space with pending inband IRQs -- Philippe.
* Re: [PATCH] irq_pipeline: Prevent returning to user space with pending inband IRQs 2021-06-07 13:28 ` Philippe Gerum @ 2021-06-07 13:29 ` Philippe Gerum 2021-06-07 13:40 ` Jan Kiszka 0 siblings, 1 reply; 27+ messages in thread From: Philippe Gerum @ 2021-06-07 13:29 UTC (permalink / raw) To: Jan Kiszka; +Cc: Xenomai Philippe Gerum <rpm@xenomai.org> writes: > Philippe Gerum <rpm@xenomai.org> writes: >> >> 2. This one should be replaced by a fix in local_irq_disable_full(), >> pending some tests ongoing here: >> >> irq_pipeline: Prevent returning to user space with pending inband IRQs >> > > Tests went ok. So [1] should be replaced by a patch inverting the order > in local_irq_disable_full() as we discussed. > > [1] irq_pipeline: Prevent returning to user space with pending inband IRQs Likewise for local_irq_save_full() which may cause the same problem, i.e.: diff --git a/include/linux/irqflags.h b/include/linux/irqflags.h index 8663f19cc7b7b68..051c72751db42d1 100644 --- a/include/linux/irqflags.h +++ b/include/linux/irqflags.h @@ -242,14 +242,14 @@ do { \ #define local_irq_disable_full() \ do { \ - local_irq_disable(); \ hard_local_irq_disable(); \ + local_irq_disable(); \ } while (0) #define local_irq_save_full(__flags) \ do { \ - local_irq_save(__flags); \ hard_local_irq_disable(); \ + local_irq_save(__flags); \ } while (0) #define local_irq_restore_full(__flags) \ -- Philippe. ^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH] irq_pipeline: Prevent returning to user space with pending inband IRQs 2021-06-07 13:29 ` Philippe Gerum @ 2021-06-07 13:40 ` Jan Kiszka 2021-06-07 15:11 ` Philippe Gerum 0 siblings, 1 reply; 27+ messages in thread From: Jan Kiszka @ 2021-06-07 13:40 UTC (permalink / raw) To: Philippe Gerum; +Cc: Xenomai On 07.06.21 15:29, Philippe Gerum wrote: > > Philippe Gerum <rpm@xenomai.org> writes: > >> Philippe Gerum <rpm@xenomai.org> writes: >>> >>> 2. This one should be replaced by a fix in local_irq_disable_full(), >>> pending some tests ongoing here: >>> >>> irq_pipeline: Prevent returning to user space with pending inband IRQs >>> >> >> Tests went ok. So [1] should be replaced by a patch inverting the order >> in local_irq_disable_full() as we discussed. >> >> [1] irq_pipeline: Prevent returning to user space with pending inband IRQs > > Likewise for local_irq_save_full() which may cause the same problem, > i.e.: > > diff --git a/include/linux/irqflags.h b/include/linux/irqflags.h > index 8663f19cc7b7b68..051c72751db42d1 100644 > --- a/include/linux/irqflags.h > +++ b/include/linux/irqflags.h > @@ -242,14 +242,14 @@ do { \ > > #define local_irq_disable_full() \ > do { \ > - local_irq_disable(); \ > hard_local_irq_disable(); \ > + local_irq_disable(); \ > } while (0) > > #define local_irq_save_full(__flags) \ > do { \ > - local_irq_save(__flags); \ > hard_local_irq_disable(); \ > + local_irq_save(__flags); \ > } while (0) > > #define local_irq_restore_full(__flags) \ > > As you tested already, I assume you will make the corresponding commit available via -rebase. Let me know otherwise if I should do that. Jan -- Siemens AG, T RDA IOT Corporate Competence Center Embedded Linux ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] irq_pipeline: Prevent returning to user space with pending inband IRQs 2021-06-07 13:40 ` Jan Kiszka @ 2021-06-07 15:11 ` Philippe Gerum 0 siblings, 0 replies; 27+ messages in thread From: Philippe Gerum @ 2021-06-07 15:11 UTC (permalink / raw) To: Jan Kiszka; +Cc: Xenomai Jan Kiszka <jan.kiszka@siemens.com> writes: > On 07.06.21 15:29, Philippe Gerum wrote: >> >> Philippe Gerum <rpm@xenomai.org> writes: >> >>> Philippe Gerum <rpm@xenomai.org> writes: >>>> >>>> 2. This one should be replaced by a fix in local_irq_disable_full(), >>>> pending some tests ongoing here: >>>> >>>> irq_pipeline: Prevent returning to user space with pending inband IRQs >>>> >>> >>> Tests went ok. So [1] should be replaced by a patch inverting the order >>> in local_irq_disable_full() as we discussed. >>> >>> [1] irq_pipeline: Prevent returning to user space with pending inband IRQs >> >> Likewise for local_irq_save_full() which may cause the same problem, >> i.e.: >> >> diff --git a/include/linux/irqflags.h b/include/linux/irqflags.h >> index 8663f19cc7b7b68..051c72751db42d1 100644 >> --- a/include/linux/irqflags.h >> +++ b/include/linux/irqflags.h >> @@ -242,14 +242,14 @@ do { \ >> >> #define local_irq_disable_full() \ >> do { \ >> - local_irq_disable(); \ >> hard_local_irq_disable(); \ >> + local_irq_disable(); \ >> } while (0) >> >> #define local_irq_save_full(__flags) \ >> do { \ >> - local_irq_save(__flags); \ >> hard_local_irq_disable(); \ >> + local_irq_save(__flags); \ >> } while (0) >> >> #define local_irq_restore_full(__flags) \ >> >> > > As you tested already, I assume you will make the corresponding commit > available via -rebase. Let me know otherwise if I should do that. > > Jan Yes, I'll do that. -- Philippe. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] irq_pipeline: Prevent returning to user space with pending inband IRQs 2021-06-07 13:03 ` Philippe Gerum 2021-06-07 13:28 ` Philippe Gerum @ 2021-06-07 13:44 ` Jan Kiszka 2021-06-07 13:48 ` Jan Kiszka 2021-06-07 14:36 ` Philippe Gerum 1 sibling, 2 replies; 27+ messages in thread From: Jan Kiszka @ 2021-06-07 13:44 UTC (permalink / raw) To: Philippe Gerum; +Cc: Xenomai On 07.06.21 15:03, Philippe Gerum wrote: > > Jan Kiszka <jan.kiszka@siemens.com> writes: > >> On 07.06.21 09:17, Philippe Gerum wrote: >>> >>> Jan Kiszka <jan.kiszka@siemens.com> writes: >>> >>>> On 06.06.21 16:43, Philippe Gerum via Xenomai wrote: >>>>> >>>>> Jan Kiszka <jan.kiszka@web.de> writes: >>>>> >>>>>> From: Jan Kiszka <jan.kiszka@siemens.com> >>>>>> >>>>>> The correct order is hard IRQs off first, then stalling inband, see also >>>>>> I-pipe. Otherwise, we risk to take an interrupt for the inband stage but >>>>>> do not inject it before returning to user space. >>>>>> >>>>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> >>>>>> --- >>>>>> >>>>>> Fixes the RCU stalls Florian reported, at least for me. >>>>>> >>>>>> I'm afraid this wasn't the last regression of translating I-pipe into >>>>>> dovetail.. >>>>>> >>>>>> include/linux/entry-common.h | 3 ++- >>>>>> 1 file changed, 2 insertions(+), 1 deletion(-) >>>>>> >>>>>> diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h >>>>>> index 0fb45b2d6094..00540110985e 100644 >>>>>> --- a/include/linux/entry-common.h >>>>>> +++ b/include/linux/entry-common.h >>>>>> @@ -203,7 +203,8 @@ static inline void local_irq_disable_exit_to_user(void); >>>>>> #ifndef local_irq_disable_exit_to_user >>>>>> static inline void local_irq_disable_exit_to_user(void) >>>>>> { >>>>>> - local_irq_disable_full(); >>>>>> + hard_cond_local_irq_disable(); >>>>>> + local_irq_disable(); >>>>>> } >>>>>> #endif >>>>> >>>>> Good spot, real issue here, just like the CPU_IDLE code deals with. 
The >>>>> fix looks good, but maybe we could do even better. >>>>> >>>>> The way local_irq_disable_full() works tends to introduce this issue in >>>>> a sneaky way, when the code path does not synchronize the interrupt log >>>>> (exit_to_user_mode_loop may be at fault in this case, which does not >>>>> traverses synchronize_pipeline()). Lack of synchronization would still >>>>> hit us if some trap handler turns hard irqs off -> on -> off the same >>>>> way, and we don't leave through the user exit path. >>>>> >>>>> Inverting the order of the actions in local_irq_disable_full() may be an >>>>> option (inband_irq_disable would allow that), making sure we cannot be >>>>> caught in the same issue when returning to kernel mode is another >>>>> way. This needs more thought I think. >>>>> >>>> >>>> So, always this way? >>>> >>>> local_irq_disable_full: >>>> hard_local_irq_disable >>>> local_irq_disable >>>> >>>> local_irq_enable_full: >>>> hard_local_irq_enable >>>> local_irq_enable >>>> >>> >>> Yes. >>> >>>> Or even flip local_irq_enable_full as well? >>> >>> Not this one, hard irqs must be on before local_irq_enable() is issued >>> otherwise we would trigger a debug assertion. The reason for such >>> assertion is that the I-pipe version of local_irq_enable() force enables >>> hard irqs on, and this is something we want to depart from because >>> whether the caller expects or not such side-effect is error-prone. For >>> this reason, inband_irq_enable() expects hard irqs on, which should be >>> the current hard irq state for the caller under normal circumstances. >>> >>>> >>>> Was there ever code that required ordering local_irq_disable_full the >>>> other way around? >>>> >>> >>> After review, I don't think so. The current order for >>> local_irq_disable_full() was rather determined by applying a reverse >>> sequence compared to local_irq_enable_full(). So this looks good. I need >>> to test this on the armv[7/8] side here first before merging. 
>>> >> >> Is there a commit somewhere that I can drop already? >> > > 1. These are fine with me in their latest iteration: > > irq_pipeline: Clean up stage_info field and users > irq_pipeline: Account for stage migration across faults > irq_pipeline: Warn when calling irqentry_enter with oob stalled > x86: dovetail: Fix TS flag reservation > > 2. This one should be replaced by a fix in local_irq_disable_full(), > pending some tests ongoing here: > > irq_pipeline: Prevent returning to user space with pending inband IRQs > > However, as I mentioned earlier, this may not be sufficient per se, we > may have to synchronize the in-band log when unstalling the stage on the > kernel exit path, i.e. the exit point fixed up by the "Account for stage > migration across faults" patch. This would address the following > scenario: > > irqentry_enter on kernel context: > stall_inband(); > trap_handler(); > hard_local_irq_enable() > hard_local_irq_disable() > if (state.stage_info == IRQENTRY_INBAND_UNSTALLED || > state.stage_info == IRQENTRY_OOB) { > unstall_inband(); > synchronize_pipeline(); Oh, unstall_inband does not synchronize? Because of hard_local_irq_disable or in general? Jan -- Siemens AG, T RDA IOT Corporate Competence Center Embedded Linux ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] irq_pipeline: Prevent returning to user space with pending inband IRQs 2021-06-07 13:44 ` Jan Kiszka @ 2021-06-07 13:48 ` Jan Kiszka 2021-06-07 13:58 ` Jan Kiszka 2021-06-07 14:37 ` Philippe Gerum 2021-06-07 14:36 ` Philippe Gerum 1 sibling, 2 replies; 27+ messages in thread From: Jan Kiszka @ 2021-06-07 13:48 UTC (permalink / raw) To: Philippe Gerum; +Cc: Xenomai On 07.06.21 15:44, Jan Kiszka wrote: > On 07.06.21 15:03, Philippe Gerum wrote: >> >> Jan Kiszka <jan.kiszka@siemens.com> writes: >> >>> On 07.06.21 09:17, Philippe Gerum wrote: >>>> >>>> Jan Kiszka <jan.kiszka@siemens.com> writes: >>>> >>>>> On 06.06.21 16:43, Philippe Gerum via Xenomai wrote: >>>>>> >>>>>> Jan Kiszka <jan.kiszka@web.de> writes: >>>>>> >>>>>>> From: Jan Kiszka <jan.kiszka@siemens.com> >>>>>>> >>>>>>> The correct order is hard IRQs off first, then stalling inband, see also >>>>>>> I-pipe. Otherwise, we risk to take an interrupt for the inband stage but >>>>>>> do not inject it before returning to user space. >>>>>>> >>>>>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> >>>>>>> --- >>>>>>> >>>>>>> Fixes the RCU stalls Florian reported, at least for me. >>>>>>> >>>>>>> I'm afraid this wasn't the last regression of translating I-pipe into >>>>>>> dovetail.. 
>>>>>>> >>>>>>> include/linux/entry-common.h | 3 ++- >>>>>>> 1 file changed, 2 insertions(+), 1 deletion(-) >>>>>>> >>>>>>> diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h >>>>>>> index 0fb45b2d6094..00540110985e 100644 >>>>>>> --- a/include/linux/entry-common.h >>>>>>> +++ b/include/linux/entry-common.h >>>>>>> @@ -203,7 +203,8 @@ static inline void local_irq_disable_exit_to_user(void); >>>>>>> #ifndef local_irq_disable_exit_to_user >>>>>>> static inline void local_irq_disable_exit_to_user(void) >>>>>>> { >>>>>>> - local_irq_disable_full(); >>>>>>> + hard_cond_local_irq_disable(); >>>>>>> + local_irq_disable(); >>>>>>> } >>>>>>> #endif >>>>>> >>>>>> Good spot, real issue here, just like the CPU_IDLE code deals with. The >>>>>> fix looks good, but maybe we could do even better. >>>>>> >>>>>> The way local_irq_disable_full() works tends to introduce this issue in >>>>>> a sneaky way, when the code path does not synchronize the interrupt log >>>>>> (exit_to_user_mode_loop may be at fault in this case, which does not >>>>>> traverses synchronize_pipeline()). Lack of synchronization would still >>>>>> hit us if some trap handler turns hard irqs off -> on -> off the same >>>>>> way, and we don't leave through the user exit path. >>>>>> >>>>>> Inverting the order of the actions in local_irq_disable_full() may be an >>>>>> option (inband_irq_disable would allow that), making sure we cannot be >>>>>> caught in the same issue when returning to kernel mode is another >>>>>> way. This needs more thought I think. >>>>>> >>>>> >>>>> So, always this way? >>>>> >>>>> local_irq_disable_full: >>>>> hard_local_irq_disable >>>>> local_irq_disable >>>>> >>>>> local_irq_enable_full: >>>>> hard_local_irq_enable >>>>> local_irq_enable >>>>> >>>> >>>> Yes. >>>> >>>>> Or even flip local_irq_enable_full as well? >>>> >>>> Not this one, hard irqs must be on before local_irq_enable() is issued >>>> otherwise we would trigger a debug assertion. 
The reason for such >>>> assertion is that the I-pipe version of local_irq_enable() force enables >>>> hard irqs on, and this is something we want to depart from because >>>> whether the caller expects or not such side-effect is error-prone. For >>>> this reason, inband_irq_enable() expects hard irqs on, which should be >>>> the current hard irq state for the caller under normal circumstances. >>>> >>>>> >>>>> Was there ever code that required ordering local_irq_disable_full the >>>>> other way around? >>>>> >>>> >>>> After review, I don't think so. The current order for >>>> local_irq_disable_full() was rather determined by applying a reverse >>>> sequence compared to local_irq_enable_full(). So this looks good. I need >>>> to test this on the armv[7/8] side here first before merging. >>>> >>> >>> Is there a commit somewhere that I can drop already? >>> >> >> 1. These are fine with me in their latest iteration: >> >> irq_pipeline: Clean up stage_info field and users >> irq_pipeline: Account for stage migration across faults >> irq_pipeline: Warn when calling irqentry_enter with oob stalled >> x86: dovetail: Fix TS flag reservation >> >> 2. This one should be replaced by a fix in local_irq_disable_full(), >> pending some tests ongoing here: >> >> irq_pipeline: Prevent returning to user space with pending inband IRQs >> >> However, as I mentioned earlier, this may not be sufficient per se, we >> may have to synchronize the in-band log when unstalling the stage on the >> kernel exit path, i.e. the exit point fixed up by the "Account for stage >> migration across faults" patch. This would address the following >> scenario: >> >> irqentry_enter on kernel context: >> stall_inband(); >> trap_handler(); >> hard_local_irq_enable() >> hard_local_irq_disable() >> if (state.stage_info == IRQENTRY_INBAND_UNSTALLED || >> state.stage_info == IRQENTRY_OOB) { >> unstall_inband(); >> synchronize_pipeline(); > > Oh, unstall_inband does not synchronize? 
Because of > hard_local_irq_disable or in general? > Ah, found it - unstall != inband_irq_enable. Why not use the latter directly? Jan -- Siemens AG, T RDA IOT Corporate Competence Center Embedded Linux ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] irq_pipeline: Prevent returning to user space with pending inband IRQs 2021-06-07 13:48 ` Jan Kiszka @ 2021-06-07 13:58 ` Jan Kiszka 2021-06-07 14:44 ` Philippe Gerum 2021-06-07 15:04 ` Philippe Gerum 2021-06-07 14:37 ` Philippe Gerum 1 sibling, 2 replies; 27+ messages in thread From: Jan Kiszka @ 2021-06-07 13:58 UTC (permalink / raw) To: Philippe Gerum; +Cc: Xenomai On 07.06.21 15:48, Jan Kiszka wrote: > On 07.06.21 15:44, Jan Kiszka wrote: >> On 07.06.21 15:03, Philippe Gerum wrote:
>>> [...]
>>> However, as I mentioned earlier, this may not be sufficient per se, we
>>> may have to synchronize the in-band log when unstalling the stage on the
>>> kernel exit path, i.e. the exit point fixed up by the "Account for stage
>>> migration across faults" patch.
This would address the following >>> scenario: >>> >>> irqentry_enter on kernel context: >>> stall_inband(); >>> trap_handler(); >>> hard_local_irq_enable() >>> hard_local_irq_disable() >>> if (state.stage_info == IRQENTRY_INBAND_UNSTALLED || >>> state.stage_info == IRQENTRY_OOB) { >>> unstall_inband(); >>> synchronize_pipeline(); >> >> Oh, unstall_inband does not synchronize? Because of >> hard_local_irq_disable or in general? >> > > Ah, found it - unstall != inband_irq_enable. Why not using the latter > directly? > ...because it's instrumented to detect hard_irqs_disabled. Can we safely sync at that point? Will that also ensure that an inband rescheduling event will not be delayed? We are not running irqentry_exit_cond_resched after that. Jan -- Siemens AG, T RDA IOT Corporate Competence Center Embedded Linux ^ permalink raw reply [flat|nested] 27+ messages in thread
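[Editorial note: the distinction Jan found can be condensed as below. This is pseudocode reconstructed from the thread, not from a verified Dovetail tree, so treat the exact names and semantics as assumptions:]

```
/* unstall_inband(): only lifts the virtual in-band mask; any IRQs
 * already logged for the in-band stage remain pending. */
unstall_inband():
        clear inband stall bit

/* inband_irq_enable(): also synchronizes the log, but is instrumented
 * and may not be called with hard IRQs off. */
inband_irq_enable():
        WARN_ON(hard_irqs_disabled())   /* the instrumentation in question */
        clear inband stall bit
        synchronize_pipeline()          /* play the pending in-band log */
```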
* Re: [PATCH] irq_pipeline: Prevent returning to user space with pending inband IRQs 2021-06-07 13:58 ` Jan Kiszka @ 2021-06-07 14:44 ` Philippe Gerum 2021-06-07 14:52 ` Jan Kiszka 2021-06-07 15:04 ` Philippe Gerum 1 sibling, 1 reply; 27+ messages in thread From: Philippe Gerum @ 2021-06-07 14:44 UTC (permalink / raw) To: Jan Kiszka; +Cc: Xenomai Jan Kiszka <jan.kiszka@siemens.com> writes: > On 07.06.21 15:48, Jan Kiszka wrote: >> On 07.06.21 15:44, Jan Kiszka wrote: >>> On 07.06.21 15:03, Philippe Gerum wrote:
>>>> [...]
>>>> However, as I mentioned earlier, this may not be sufficient per se, we
>>>> may have to synchronize the in-band log when unstalling the stage on the
>>>> kernel exit path, i.e. the exit point fixed up by the "Account for stage
>>>> migration across faults" patch.
This would address the following >>>> scenario: >>>> >>>> irqentry_enter on kernel context: >>>> stall_inband(); >>>> trap_handler(); >>>> hard_local_irq_enable() >>>> hard_local_irq_disable() >>>> if (state.stage_info == IRQENTRY_INBAND_UNSTALLED || >>>> state.stage_info == IRQENTRY_OOB) { >>>> unstall_inband(); >>>> synchronize_pipeline(); >>> >>> Oh, unstall_inband does not synchronize? Because of >>> hard_local_irq_disable or in general? >>> >> >> Ah, found it - unstall != inband_irq_enable. Why not using the latter >> directly? >> > > ...because it's instrumented to detect hard_irqs_disabled. > > Can we safely sync at that point? Will that also ensure that an inband > rescheduling event will not be delayed? We are not running > irqentry_exit_cond_resched after that. > We should have hit irqexit_may_preempt_schedule() and rescheduled appropriately before branching to unstall_inband(). Since we cannot log more IRQs in the meantime, we should not miss any rescheduling opportunity. [Yes, __inband_irq_enable() reschedules, but does not perform any of the expected rcu tweaks, so we should not count on it in this particular case]. -- Philippe. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] irq_pipeline: Prevent returning to user space with pending inband IRQs 2021-06-07 14:44 ` Philippe Gerum @ 2021-06-07 14:52 ` Jan Kiszka 0 siblings, 0 replies; 27+ messages in thread From: Jan Kiszka @ 2021-06-07 14:52 UTC (permalink / raw) To: Philippe Gerum; +Cc: Xenomai On 07.06.21 16:44, Philippe Gerum wrote: > Jan Kiszka <jan.kiszka@siemens.com> writes: >> On 07.06.21 15:48, Jan Kiszka wrote: >>> On 07.06.21 15:44, Jan Kiszka wrote: >>>> On 07.06.21 15:03, Philippe Gerum wrote:
>>>>> [...]
>>>>> However, as I mentioned earlier, this may not be sufficient per se, we
>>>>> may have to synchronize the in-band log when unstalling the stage on the
>>>>> kernel exit path, i.e. the exit point fixed up by the "Account for stage
>>>>> migration across faults" patch.
This would address the following >>>>> scenario: >>>>> >>>>> irqentry_enter on kernel context: >>>>> stall_inband(); >>>>> trap_handler(); >>>>> hard_local_irq_enable() >>>>> hard_local_irq_disable() >>>>> if (state.stage_info == IRQENTRY_INBAND_UNSTALLED || >>>>> state.stage_info == IRQENTRY_OOB) { >>>>> unstall_inband(); >>>>> synchronize_pipeline(); >>>> >>>> Oh, unstall_inband does not synchronize? Because of >>>> hard_local_irq_disable or in general? >>>> >>> >>> Ah, found it - unstall != inband_irq_enable. Why not using the latter >>> directly? >>> >> >> ...because it's instrumented to detect hard_irqs_disabled. >> >> Can we safely sync at that point? Will that also ensure that an inband >> rescheduling event will not be delayed? We are not running >> irqentry_exit_cond_resched after that. >> > > We should have hit irqexit_may_preempt_schedule() and rescheduled > appropriately before branching to unstall_inband(). Since we cannot log > more IRQs in the meantime, we should not miss any rescheduling > opportunity. > > [Yes, __inband_irq_enable() reschedules, but does not perform any of the > expected rcu tweaks, so we should not count on it in this particular > case]. > I don't get this argument yet: irqexit_may_preempt_schedule() runs before we replayed the log, no? So it cannot act on any reschedules that only replay the log will mark pending. Jan -- Siemens AG, T RDA IOT Corporate Competence Center Embedded Linux ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] irq_pipeline: Prevent returning to user space with pending inband IRQs 2021-06-07 13:58 ` Jan Kiszka 2021-06-07 14:44 ` Philippe Gerum @ 2021-06-07 15:04 ` Philippe Gerum 2021-06-07 15:38 ` Jan Kiszka 1 sibling, 1 reply; 27+ messages in thread From: Philippe Gerum @ 2021-06-07 15:04 UTC (permalink / raw) To: Jan Kiszka; +Cc: Xenomai Jan Kiszka <jan.kiszka@siemens.com> writes: > On 07.06.21 15:48, Jan Kiszka wrote: >> On 07.06.21 15:44, Jan Kiszka wrote: >>> On 07.06.21 15:03, Philippe Gerum wrote:
>>>> [...]
>>>> However, as I mentioned earlier, this may not be sufficient per se, we
>>>> may have to synchronize the in-band log when unstalling the stage on the
>>>> kernel exit path, i.e. the exit point fixed up by the "Account for stage
>>>> migration across faults" patch.
This would address the following >>>> scenario: >>>> >>>> irqentry_enter on kernel context: >>>> stall_inband(); >>>> trap_handler(); >>>> hard_local_irq_enable() >>>> hard_local_irq_disable() >>>> if (state.stage_info == IRQENTRY_INBAND_UNSTALLED || >>>> state.stage_info == IRQENTRY_OOB) { >>>> unstall_inband(); >>>> synchronize_pipeline(); >>> >>> Oh, unstall_inband does not synchronize? Because of >>> hard_local_irq_disable or in general? >>> >> >> Ah, found it - unstall != inband_irq_enable. Why not using the latter >> directly? >> > > ...because it's instrumented to detect hard_irqs_disabled. > > Can we safely sync at that point? Yes we may do this, because we restore the original -unstalled- state of the in-band stage. So if we have pending IRQs, we may play them at this point. What we may not do obviously, is creating synchronization points where the kernel does not expect irqs. -- Philippe. ^ permalink raw reply [flat|nested] 27+ messages in thread
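[Editorial note: Philippe's argument corresponds to the following ordering on the kernel-return path — a pseudocode sketch of the quoted scenario; the names follow the thread and are assumptions, not the final code:]

```
irqentry_exit(regs, state):
        /* runs with hard IRQs off: nothing new can enter the log below */
        irqexit_may_preempt_schedule(state)   /* reschedule on known events */
        if state.stage_info == IRQENTRY_INBAND_UNSTALLED or
           state.stage_info == IRQENTRY_OOB:
                unstall_inband()          /* restore the unstalled state */
                synchronize_pipeline()    /* play IRQs logged while the trap
                                             handler ran with hard IRQs on */
```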
* Re: [PATCH] irq_pipeline: Prevent returning to user space with pending inband IRQs 2021-06-07 15:04 ` Philippe Gerum @ 2021-06-07 15:38 ` Jan Kiszka 2021-06-07 16:19 ` Philippe Gerum 0 siblings, 1 reply; 27+ messages in thread From: Jan Kiszka @ 2021-06-07 15:38 UTC (permalink / raw) To: Philippe Gerum; +Cc: Xenomai On 07.06.21 17:04, Philippe Gerum wrote: > Jan Kiszka <jan.kiszka@siemens.com> writes: >> On 07.06.21 15:48, Jan Kiszka wrote: >>> On 07.06.21 15:44, Jan Kiszka wrote: >>>> On 07.06.21 15:03, Philippe Gerum wrote:
>>>>> [...]
>>>>> However, as I mentioned earlier, this may not be sufficient per se, we
>>>>> may have to synchronize the in-band log when unstalling the stage on the
>>>>> kernel exit path, i.e. the exit point fixed up by the "Account for stage
>>>>> migration across faults" patch.
This would address the following >>>>> scenario: >>>>> >>>>> irqentry_enter on kernel context: >>>>> stall_inband(); >>>>> trap_handler(); >>>>> hard_local_irq_enable() >>>>> hard_local_irq_disable() >>>>> if (state.stage_info == IRQENTRY_INBAND_UNSTALLED || >>>>> state.stage_info == IRQENTRY_OOB) { >>>>> unstall_inband(); >>>>> synchronize_pipeline(); >>>> >>>> Oh, unstall_inband does not synchronize? Because of >>>> hard_local_irq_disable or in general? >>>> >>> >>> Ah, found it - unstall != inband_irq_enable. Why not using the latter >>> directly? >>> >> >> ...because it's instrumented to detect hard_irqs_disabled. >> >> Can we safely sync at that point? > > Yes we may do this, because we restore the original -unstalled- state of > the in-band stage. So if we have pending IRQs, we may play them at this > point. What we may not do obviously, is creating synchronization points > where the kernel does not expect irqs. > Again: When will any reschedules that might be requested by those replayed IRQs be handled then? Jan -- Siemens AG, T RDA IOT Corporate Competence Center Embedded Linux ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] irq_pipeline: Prevent returning to user space with pending inband IRQs 2021-06-07 15:38 ` Jan Kiszka @ 2021-06-07 16:19 ` Philippe Gerum 2021-06-07 16:35 ` Jan Kiszka 0 siblings, 1 reply; 27+ messages in thread From: Philippe Gerum @ 2021-06-07 16:19 UTC (permalink / raw) To: Jan Kiszka; +Cc: Xenomai Jan Kiszka <jan.kiszka@siemens.com> writes: > On 07.06.21 17:04, Philippe Gerum wrote: >> >> Jan Kiszka <jan.kiszka@siemens.com> writes: >> >>> On 07.06.21 15:48, Jan Kiszka wrote: >>>> On 07.06.21 15:44, Jan Kiszka wrote: >>>>> On 07.06.21 15:03, Philippe Gerum wrote: >>>>>> >>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes: >>>>>> >>>>>>> On 07.06.21 09:17, Philippe Gerum wrote: >>>>>>>> >>>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes: >>>>>>>> >>>>>>>>> On 06.06.21 16:43, Philippe Gerum via Xenomai wrote: >>>>>>>>>> >>>>>>>>>> Jan Kiszka <jan.kiszka@web.de> writes: >>>>>>>>>> >>>>>>>>>>> From: Jan Kiszka <jan.kiszka@siemens.com> >>>>>>>>>>> >>>>>>>>>>> The correct order is hard IRQs off first, then stalling inband, see also >>>>>>>>>>> I-pipe. Otherwise, we risk to take an interrupt for the inband stage but >>>>>>>>>>> do not inject it before returning to user space. >>>>>>>>>>> >>>>>>>>>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> >>>>>>>>>>> --- >>>>>>>>>>> >>>>>>>>>>> Fixes the RCU stalls Florian reported, at least for me. >>>>>>>>>>> >>>>>>>>>>> I'm afraid this wasn't the last regression of translating I-pipe into >>>>>>>>>>> dovetail.. 
>>>>>>>>>>> >>>>>>>>>>> include/linux/entry-common.h | 3 ++- >>>>>>>>>>> 1 file changed, 2 insertions(+), 1 deletion(-) >>>>>>>>>>> >>>>>>>>>>> diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h >>>>>>>>>>> index 0fb45b2d6094..00540110985e 100644 >>>>>>>>>>> --- a/include/linux/entry-common.h >>>>>>>>>>> +++ b/include/linux/entry-common.h >>>>>>>>>>> @@ -203,7 +203,8 @@ static inline void local_irq_disable_exit_to_user(void); >>>>>>>>>>> #ifndef local_irq_disable_exit_to_user >>>>>>>>>>> static inline void local_irq_disable_exit_to_user(void) >>>>>>>>>>> { >>>>>>>>>>> - local_irq_disable_full(); >>>>>>>>>>> + hard_cond_local_irq_disable(); >>>>>>>>>>> + local_irq_disable(); >>>>>>>>>>> } >>>>>>>>>>> #endif >>>>>>>>>> >>>>>>>>>> Good spot, real issue here, just like the CPU_IDLE code deals with. The >>>>>>>>>> fix looks good, but maybe we could do even better. >>>>>>>>>> >>>>>>>>>> The way local_irq_disable_full() works tends to introduce this issue in >>>>>>>>>> a sneaky way, when the code path does not synchronize the interrupt log >>>>>>>>>> (exit_to_user_mode_loop may be at fault in this case, which does not >>>>>>>>>> traverses synchronize_pipeline()). Lack of synchronization would still >>>>>>>>>> hit us if some trap handler turns hard irqs off -> on -> off the same >>>>>>>>>> way, and we don't leave through the user exit path. >>>>>>>>>> >>>>>>>>>> Inverting the order of the actions in local_irq_disable_full() may be an >>>>>>>>>> option (inband_irq_disable would allow that), making sure we cannot be >>>>>>>>>> caught in the same issue when returning to kernel mode is another >>>>>>>>>> way. This needs more thought I think. >>>>>>>>>> >>>>>>>>> >>>>>>>>> So, always this way? >>>>>>>>> >>>>>>>>> local_irq_disable_full: >>>>>>>>> hard_local_irq_disable >>>>>>>>> local_irq_disable >>>>>>>>> >>>>>>>>> local_irq_enable_full: >>>>>>>>> hard_local_irq_enable >>>>>>>>> local_irq_enable >>>>>>>>> >>>>>>>> >>>>>>>> Yes. 
>>>>>>>> >>>>>>>>> Or even flip local_irq_enable_full as well? >>>>>>>> >>>>>>>> Not this one, hard irqs must be on before local_irq_enable() is issued >>>>>>>> otherwise we would trigger a debug assertion. The reason for such >>>>>>>> assertion is that the I-pipe version of local_irq_enable() force enables >>>>>>>> hard irqs on, and this is something we want to depart from because >>>>>>>> whether the caller expects or not such side-effect is error-prone. For >>>>>>>> this reason, inband_irq_enable() expects hard irqs on, which should be >>>>>>>> the current hard irq state for the caller under normal circumstances. >>>>>>>> >>>>>>>>> >>>>>>>>> Was there ever code that required ordering local_irq_disable_full the >>>>>>>>> other way around? >>>>>>>>> >>>>>>>> >>>>>>>> After review, I don't think so. The current order for >>>>>>>> local_irq_disable_full() was rather determined by applying a reverse >>>>>>>> sequence compared to local_irq_enable_full(). So this looks good. I need >>>>>>>> to test this on the armv[7/8] side here first before merging. >>>>>>>> >>>>>>> >>>>>>> Is there a commit somewhere that I can drop already? >>>>>>> >>>>>> >>>>>> 1. These are fine with me in their latest iteration: >>>>>> >>>>>> irq_pipeline: Clean up stage_info field and users >>>>>> irq_pipeline: Account for stage migration across faults >>>>>> irq_pipeline: Warn when calling irqentry_enter with oob stalled >>>>>> x86: dovetail: Fix TS flag reservation >>>>>> >>>>>> 2. This one should be replaced by a fix in local_irq_disable_full(), >>>>>> pending some tests ongoing here: >>>>>> >>>>>> irq_pipeline: Prevent returning to user space with pending inband IRQs >>>>>> >>>>>> However, as I mentioned earlier, this may not be sufficient per se, we >>>>>> may have to synchronize the in-band log when unstalling the stage on the >>>>>> kernel exit path, i.e. the exit point fixed up by the "Account for stage >>>>>> migration across faults" patch. 
This would address the following >>>>>> scenario: >>>>>> >>>>>> irqentry_enter on kernel context: >>>>>> stall_inband(); >>>>>> trap_handler(); >>>>>> hard_local_irq_enable() >>>>>> hard_local_irq_disable() >>>>>> if (state.stage_info == IRQENTRY_INBAND_UNSTALLED || >>>>>> state.stage_info == IRQENTRY_OOB) { >>>>>> unstall_inband(); >>>>>> synchronize_pipeline(); >>>>> >>>>> Oh, unstall_inband does not synchronize? Because of >>>>> hard_local_irq_disable or in general? >>>>> >>>> >>>> Ah, found it - unstall != inband_irq_enable. Why not using the latter >>>> directly? >>>> >>> >>> ...because it's instrumented to detect hard_irqs_disabled. >>> >>> Can we safely sync at that point? >> >> Yes we may do this, because we restore the original -unstalled- state of >> the in-band stage. So if we have pending IRQs, we may play them at this >> point. What we may not do obviously, is creating synchronization points >> where the kernel does not expect irqs. >> > > Again: When will any reschedules that might be requested by those > replayed IRQs be handled then? > Nowhere unless we provide a log synchronization routine dedicated to the in-band kernel exit path, which would fold in the rescheduling call via irqentry_exit_cond_resched(). I'll write that one. -- Philippe. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] irq_pipeline: Prevent returning to user space with pending inband IRQs 2021-06-07 16:19 ` Philippe Gerum @ 2021-06-07 16:35 ` Jan Kiszka 2021-06-08 16:50 ` Philippe Gerum 0 siblings, 1 reply; 27+ messages in thread From: Jan Kiszka @ 2021-06-07 16:35 UTC (permalink / raw) To: Philippe Gerum; +Cc: Xenomai On 07.06.21 18:19, Philippe Gerum wrote: > > Jan Kiszka <jan.kiszka@siemens.com> writes: > >> On 07.06.21 17:04, Philippe Gerum wrote: >>> >>> Jan Kiszka <jan.kiszka@siemens.com> writes: >>> >>>> On 07.06.21 15:48, Jan Kiszka wrote: >>>>> On 07.06.21 15:44, Jan Kiszka wrote: >>>>>> On 07.06.21 15:03, Philippe Gerum wrote: >>>>>>> >>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes: >>>>>>> >>>>>>>> On 07.06.21 09:17, Philippe Gerum wrote: >>>>>>>>> >>>>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes: >>>>>>>>> >>>>>>>>>> On 06.06.21 16:43, Philippe Gerum via Xenomai wrote: >>>>>>>>>>> >>>>>>>>>>> Jan Kiszka <jan.kiszka@web.de> writes: >>>>>>>>>>> >>>>>>>>>>>> From: Jan Kiszka <jan.kiszka@siemens.com> >>>>>>>>>>>> >>>>>>>>>>>> The correct order is hard IRQs off first, then stalling inband, see also >>>>>>>>>>>> I-pipe. Otherwise, we risk to take an interrupt for the inband stage but >>>>>>>>>>>> do not inject it before returning to user space. >>>>>>>>>>>> >>>>>>>>>>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> >>>>>>>>>>>> --- >>>>>>>>>>>> >>>>>>>>>>>> Fixes the RCU stalls Florian reported, at least for me. >>>>>>>>>>>> >>>>>>>>>>>> I'm afraid this wasn't the last regression of translating I-pipe into >>>>>>>>>>>> dovetail.. 
>>>>>>>>>>>> >>>>>>>>>>>> include/linux/entry-common.h | 3 ++- >>>>>>>>>>>> 1 file changed, 2 insertions(+), 1 deletion(-) >>>>>>>>>>>> >>>>>>>>>>>> diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h >>>>>>>>>>>> index 0fb45b2d6094..00540110985e 100644 >>>>>>>>>>>> --- a/include/linux/entry-common.h >>>>>>>>>>>> +++ b/include/linux/entry-common.h >>>>>>>>>>>> @@ -203,7 +203,8 @@ static inline void local_irq_disable_exit_to_user(void); >>>>>>>>>>>> #ifndef local_irq_disable_exit_to_user >>>>>>>>>>>> static inline void local_irq_disable_exit_to_user(void) >>>>>>>>>>>> { >>>>>>>>>>>> - local_irq_disable_full(); >>>>>>>>>>>> + hard_cond_local_irq_disable(); >>>>>>>>>>>> + local_irq_disable(); >>>>>>>>>>>> } >>>>>>>>>>>> #endif >>>>>>>>>>> >>>>>>>>>>> Good spot, real issue here, just like the CPU_IDLE code deals with. The >>>>>>>>>>> fix looks good, but maybe we could do even better. >>>>>>>>>>> >>>>>>>>>>> The way local_irq_disable_full() works tends to introduce this issue in >>>>>>>>>>> a sneaky way, when the code path does not synchronize the interrupt log >>>>>>>>>>> (exit_to_user_mode_loop may be at fault in this case, which does not >>>>>>>>>>> traverses synchronize_pipeline()). Lack of synchronization would still >>>>>>>>>>> hit us if some trap handler turns hard irqs off -> on -> off the same >>>>>>>>>>> way, and we don't leave through the user exit path. >>>>>>>>>>> >>>>>>>>>>> Inverting the order of the actions in local_irq_disable_full() may be an >>>>>>>>>>> option (inband_irq_disable would allow that), making sure we cannot be >>>>>>>>>>> caught in the same issue when returning to kernel mode is another >>>>>>>>>>> way. This needs more thought I think. >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> So, always this way? 
>>>>>>>>>> >>>>>>>>>> local_irq_disable_full: >>>>>>>>>> hard_local_irq_disable >>>>>>>>>> local_irq_disable >>>>>>>>>> >>>>>>>>>> local_irq_enable_full: >>>>>>>>>> hard_local_irq_enable >>>>>>>>>> local_irq_enable >>>>>>>>>> >>>>>>>>> >>>>>>>>> Yes. >>>>>>>>> >>>>>>>>>> Or even flip local_irq_enable_full as well? >>>>>>>>> >>>>>>>>> Not this one, hard irqs must be on before local_irq_enable() is issued >>>>>>>>> otherwise we would trigger a debug assertion. The reason for such >>>>>>>>> assertion is that the I-pipe version of local_irq_enable() force enables >>>>>>>>> hard irqs on, and this is something we want to depart from because >>>>>>>>> whether the caller expects or not such side-effect is error-prone. For >>>>>>>>> this reason, inband_irq_enable() expects hard irqs on, which should be >>>>>>>>> the current hard irq state for the caller under normal circumstances. >>>>>>>>> >>>>>>>>>> >>>>>>>>>> Was there ever code that required ordering local_irq_disable_full the >>>>>>>>>> other way around? >>>>>>>>>> >>>>>>>>> >>>>>>>>> After review, I don't think so. The current order for >>>>>>>>> local_irq_disable_full() was rather determined by applying a reverse >>>>>>>>> sequence compared to local_irq_enable_full(). So this looks good. I need >>>>>>>>> to test this on the armv[7/8] side here first before merging. >>>>>>>>> >>>>>>>> >>>>>>>> Is there a commit somewhere that I can drop already? >>>>>>>> >>>>>>> >>>>>>> 1. These are fine with me in their latest iteration: >>>>>>> >>>>>>> irq_pipeline: Clean up stage_info field and users >>>>>>> irq_pipeline: Account for stage migration across faults >>>>>>> irq_pipeline: Warn when calling irqentry_enter with oob stalled >>>>>>> x86: dovetail: Fix TS flag reservation >>>>>>> >>>>>>> 2. 
This one should be replaced by a fix in local_irq_disable_full(), >>>>>>> pending some tests ongoing here: >>>>>>> >>>>>>> irq_pipeline: Prevent returning to user space with pending inband IRQs >>>>>>> >>>>>>> However, as I mentioned earlier, this may not be sufficient per se, we >>>>>>> may have to synchronize the in-band log when unstalling the stage on the >>>>>>> kernel exit path, i.e. the exit point fixed up by the "Account for stage >>>>>>> migration across faults" patch. This would address the following >>>>>>> scenario: >>>>>>> >>>>>>> irqentry_enter on kernel context: >>>>>>> stall_inband(); >>>>>>> trap_handler(); >>>>>>> hard_local_irq_enable() >>>>>>> hard_local_irq_disable() >>>>>>> if (state.stage_info == IRQENTRY_INBAND_UNSTALLED || >>>>>>> state.stage_info == IRQENTRY_OOB) { >>>>>>> unstall_inband(); >>>>>>> synchronize_pipeline(); >>>>>> >>>>>> Oh, unstall_inband does not synchronize? Because of >>>>>> hard_local_irq_disable or in general? >>>>>> >>>>> >>>>> Ah, found it - unstall != inband_irq_enable. Why not using the latter >>>>> directly? >>>>> >>>> >>>> ...because it's instrumented to detect hard_irqs_disabled. >>>> >>>> Can we safely sync at that point? >>> >>> Yes we may do this, because we restore the original -unstalled- state of >>> the in-band stage. So if we have pending IRQs, we may play them at this >>> point. What we may not do obviously, is creating synchronization points >>> where the kernel does not expect irqs. >>> >> >> Again: When will any reschedules that might be requested by those >> replayed IRQs be handled then? >> > > Nowhere unless we provide a log synchronization routine dedicated to the > in-band kernel exit path, which would fold in the rescheduling call via > irqentry_exit_cond_resched(). I'll write that one. > How about simply moving the replay up, before the kernel checks for pending reschedules anyway? 
We can stall inband then again, to please lockdep & Co., as long as there is no hard-irq-on section before the intret. Jan -- Siemens AG, T RDA IOT Corporate Competence Center Embedded Linux ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] irq_pipeline: Prevent returning to user space with pending inband IRQs 2021-06-07 16:35 ` Jan Kiszka @ 2021-06-08 16:50 ` Philippe Gerum 2021-06-08 17:39 ` Philippe Gerum 0 siblings, 1 reply; 27+ messages in thread From: Philippe Gerum @ 2021-06-08 16:50 UTC (permalink / raw) To: Jan Kiszka; +Cc: Xenomai Jan Kiszka <jan.kiszka@siemens.com> writes: > On 07.06.21 18:19, Philippe Gerum wrote: >> >> Jan Kiszka <jan.kiszka@siemens.com> writes: >> >>> On 07.06.21 17:04, Philippe Gerum wrote: >>>> >>>> Jan Kiszka <jan.kiszka@siemens.com> writes: >>>> >>>>> On 07.06.21 15:48, Jan Kiszka wrote: >>>>>> On 07.06.21 15:44, Jan Kiszka wrote: >>>>>>> On 07.06.21 15:03, Philippe Gerum wrote: >>>>>>>> >>>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes: >>>>>>>> >>>>>>>>> On 07.06.21 09:17, Philippe Gerum wrote: >>>>>>>>>> >>>>>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes: >>>>>>>>>> >>>>>>>>>>> On 06.06.21 16:43, Philippe Gerum via Xenomai wrote: >>>>>>>>>>>> >>>>>>>>>>>> Jan Kiszka <jan.kiszka@web.de> writes: >>>>>>>>>>>> >>>>>>>>>>>>> From: Jan Kiszka <jan.kiszka@siemens.com> >>>>>>>>>>>>> >>>>>>>>>>>>> The correct order is hard IRQs off first, then stalling inband, see also >>>>>>>>>>>>> I-pipe. Otherwise, we risk to take an interrupt for the inband stage but >>>>>>>>>>>>> do not inject it before returning to user space. >>>>>>>>>>>>> >>>>>>>>>>>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> >>>>>>>>>>>>> --- >>>>>>>>>>>>> >>>>>>>>>>>>> Fixes the RCU stalls Florian reported, at least for me. >>>>>>>>>>>>> >>>>>>>>>>>>> I'm afraid this wasn't the last regression of translating I-pipe into >>>>>>>>>>>>> dovetail.. 
>>>>>>>>>>>>> >>>>>>>>>>>>> include/linux/entry-common.h | 3 ++- >>>>>>>>>>>>> 1 file changed, 2 insertions(+), 1 deletion(-) >>>>>>>>>>>>> >>>>>>>>>>>>> diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h >>>>>>>>>>>>> index 0fb45b2d6094..00540110985e 100644 >>>>>>>>>>>>> --- a/include/linux/entry-common.h >>>>>>>>>>>>> +++ b/include/linux/entry-common.h >>>>>>>>>>>>> @@ -203,7 +203,8 @@ static inline void local_irq_disable_exit_to_user(void); >>>>>>>>>>>>> #ifndef local_irq_disable_exit_to_user >>>>>>>>>>>>> static inline void local_irq_disable_exit_to_user(void) >>>>>>>>>>>>> { >>>>>>>>>>>>> - local_irq_disable_full(); >>>>>>>>>>>>> + hard_cond_local_irq_disable(); >>>>>>>>>>>>> + local_irq_disable(); >>>>>>>>>>>>> } >>>>>>>>>>>>> #endif >>>>>>>>>>>> >>>>>>>>>>>> Good spot, real issue here, just like the CPU_IDLE code deals with. The >>>>>>>>>>>> fix looks good, but maybe we could do even better. >>>>>>>>>>>> >>>>>>>>>>>> The way local_irq_disable_full() works tends to introduce this issue in >>>>>>>>>>>> a sneaky way, when the code path does not synchronize the interrupt log >>>>>>>>>>>> (exit_to_user_mode_loop may be at fault in this case, which does not >>>>>>>>>>>> traverses synchronize_pipeline()). Lack of synchronization would still >>>>>>>>>>>> hit us if some trap handler turns hard irqs off -> on -> off the same >>>>>>>>>>>> way, and we don't leave through the user exit path. >>>>>>>>>>>> >>>>>>>>>>>> Inverting the order of the actions in local_irq_disable_full() may be an >>>>>>>>>>>> option (inband_irq_disable would allow that), making sure we cannot be >>>>>>>>>>>> caught in the same issue when returning to kernel mode is another >>>>>>>>>>>> way. This needs more thought I think. >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> So, always this way? 
>>>>>>>>>>> >>>>>>>>>>> local_irq_disable_full: >>>>>>>>>>> hard_local_irq_disable >>>>>>>>>>> local_irq_disable >>>>>>>>>>> >>>>>>>>>>> local_irq_enable_full: >>>>>>>>>>> hard_local_irq_enable >>>>>>>>>>> local_irq_enable >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Yes. >>>>>>>>>> >>>>>>>>>>> Or even flip local_irq_enable_full as well? >>>>>>>>>> >>>>>>>>>> Not this one, hard irqs must be on before local_irq_enable() is issued >>>>>>>>>> otherwise we would trigger a debug assertion. The reason for such >>>>>>>>>> assertion is that the I-pipe version of local_irq_enable() force enables >>>>>>>>>> hard irqs on, and this is something we want to depart from because >>>>>>>>>> whether the caller expects or not such side-effect is error-prone. For >>>>>>>>>> this reason, inband_irq_enable() expects hard irqs on, which should be >>>>>>>>>> the current hard irq state for the caller under normal circumstances. >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Was there ever code that required ordering local_irq_disable_full the >>>>>>>>>>> other way around? >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> After review, I don't think so. The current order for >>>>>>>>>> local_irq_disable_full() was rather determined by applying a reverse >>>>>>>>>> sequence compared to local_irq_enable_full(). So this looks good. I need >>>>>>>>>> to test this on the armv[7/8] side here first before merging. >>>>>>>>>> >>>>>>>>> >>>>>>>>> Is there a commit somewhere that I can drop already? >>>>>>>>> >>>>>>>> >>>>>>>> 1. These are fine with me in their latest iteration: >>>>>>>> >>>>>>>> irq_pipeline: Clean up stage_info field and users >>>>>>>> irq_pipeline: Account for stage migration across faults >>>>>>>> irq_pipeline: Warn when calling irqentry_enter with oob stalled >>>>>>>> x86: dovetail: Fix TS flag reservation >>>>>>>> >>>>>>>> 2. 
This one should be replaced by a fix in local_irq_disable_full(), >>>>>>>> pending some tests ongoing here: >>>>>>>> >>>>>>>> irq_pipeline: Prevent returning to user space with pending inband IRQs >>>>>>>> >>>>>>>> However, as I mentioned earlier, this may not be sufficient per se, we >>>>>>>> may have to synchronize the in-band log when unstalling the stage on the >>>>>>>> kernel exit path, i.e. the exit point fixed up by the "Account for stage >>>>>>>> migration across faults" patch. This would address the following >>>>>>>> scenario: >>>>>>>> >>>>>>>> irqentry_enter on kernel context: >>>>>>>> stall_inband(); >>>>>>>> trap_handler(); >>>>>>>> hard_local_irq_enable() >>>>>>>> hard_local_irq_disable() >>>>>>>> if (state.stage_info == IRQENTRY_INBAND_UNSTALLED || >>>>>>>> state.stage_info == IRQENTRY_OOB) { >>>>>>>> unstall_inband(); >>>>>>>> synchronize_pipeline(); >>>>>>> >>>>>>> Oh, unstall_inband does not synchronize? Because of >>>>>>> hard_local_irq_disable or in general? >>>>>>> >>>>>> >>>>>> Ah, found it - unstall != inband_irq_enable. Why not using the latter >>>>>> directly? >>>>>> >>>>> >>>>> ...because it's instrumented to detect hard_irqs_disabled. >>>>> >>>>> Can we safely sync at that point? >>>> >>>> Yes we may do this, because we restore the original -unstalled- state of >>>> the in-band stage. So if we have pending IRQs, we may play them at this >>>> point. What we may not do obviously, is creating synchronization points >>>> where the kernel does not expect irqs. >>>> >>> >>> Again: When will any reschedules that might be requested by those >>> replayed IRQs be handled then? >>> >> >> Nowhere unless we provide a log synchronization routine dedicated to the >> in-band kernel exit path, which would fold in the rescheduling call via >> irqentry_exit_cond_resched(). I'll write that one. >> > > How about simply moving the replay up, before the kernel checks for > pending reschedules anyway? 
We can stall inband then again, to please > lockdep & Co., as long as there is no hard-irq-on section before the intret. > Yes, this is the right way to do this. I'm on it. -- Philippe. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] irq_pipeline: Prevent returning to user space with pending inband IRQs 2021-06-08 16:50 ` Philippe Gerum @ 2021-06-08 17:39 ` Philippe Gerum 2021-06-09 5:34 ` Jan Kiszka 0 siblings, 1 reply; 27+ messages in thread From: Philippe Gerum @ 2021-06-08 17:39 UTC (permalink / raw) To: Jan Kiszka; +Cc: Xenomai Philippe Gerum <rpm@xenomai.org> writes: > Jan Kiszka <jan.kiszka@siemens.com> writes: > >> On 07.06.21 18:19, Philippe Gerum wrote: >>> >>> Jan Kiszka <jan.kiszka@siemens.com> writes: >>> >>>> On 07.06.21 17:04, Philippe Gerum wrote: >>>>> >>>>> Jan Kiszka <jan.kiszka@siemens.com> writes: >>>>> >>>>>> On 07.06.21 15:48, Jan Kiszka wrote: >>>>>>> On 07.06.21 15:44, Jan Kiszka wrote: >>>>>>>> On 07.06.21 15:03, Philippe Gerum wrote: >>>>>>>>> >>>>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes: >>>>>>>>> >>>>>>>>>> On 07.06.21 09:17, Philippe Gerum wrote: >>>>>>>>>>> >>>>>>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes: >>>>>>>>>>> >>>>>>>>>>>> On 06.06.21 16:43, Philippe Gerum via Xenomai wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> Jan Kiszka <jan.kiszka@web.de> writes: >>>>>>>>>>>>> >>>>>>>>>>>>>> From: Jan Kiszka <jan.kiszka@siemens.com> >>>>>>>>>>>>>> >>>>>>>>>>>>>> The correct order is hard IRQs off first, then stalling inband, see also >>>>>>>>>>>>>> I-pipe. Otherwise, we risk to take an interrupt for the inband stage but >>>>>>>>>>>>>> do not inject it before returning to user space. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> >>>>>>>>>>>>>> --- >>>>>>>>>>>>>> >>>>>>>>>>>>>> Fixes the RCU stalls Florian reported, at least for me. >>>>>>>>>>>>>> >>>>>>>>>>>>>> I'm afraid this wasn't the last regression of translating I-pipe into >>>>>>>>>>>>>> dovetail.. 
>>>>>>>>>>>>>> >>>>>>>>>>>>>> include/linux/entry-common.h | 3 ++- >>>>>>>>>>>>>> 1 file changed, 2 insertions(+), 1 deletion(-) >>>>>>>>>>>>>> >>>>>>>>>>>>>> diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h >>>>>>>>>>>>>> index 0fb45b2d6094..00540110985e 100644 >>>>>>>>>>>>>> --- a/include/linux/entry-common.h >>>>>>>>>>>>>> +++ b/include/linux/entry-common.h >>>>>>>>>>>>>> @@ -203,7 +203,8 @@ static inline void local_irq_disable_exit_to_user(void); >>>>>>>>>>>>>> #ifndef local_irq_disable_exit_to_user >>>>>>>>>>>>>> static inline void local_irq_disable_exit_to_user(void) >>>>>>>>>>>>>> { >>>>>>>>>>>>>> - local_irq_disable_full(); >>>>>>>>>>>>>> + hard_cond_local_irq_disable(); >>>>>>>>>>>>>> + local_irq_disable(); >>>>>>>>>>>>>> } >>>>>>>>>>>>>> #endif >>>>>>>>>>>>> >>>>>>>>>>>>> Good spot, real issue here, just like the CPU_IDLE code deals with. The >>>>>>>>>>>>> fix looks good, but maybe we could do even better. >>>>>>>>>>>>> >>>>>>>>>>>>> The way local_irq_disable_full() works tends to introduce this issue in >>>>>>>>>>>>> a sneaky way, when the code path does not synchronize the interrupt log >>>>>>>>>>>>> (exit_to_user_mode_loop may be at fault in this case, which does not >>>>>>>>>>>>> traverses synchronize_pipeline()). Lack of synchronization would still >>>>>>>>>>>>> hit us if some trap handler turns hard irqs off -> on -> off the same >>>>>>>>>>>>> way, and we don't leave through the user exit path. >>>>>>>>>>>>> >>>>>>>>>>>>> Inverting the order of the actions in local_irq_disable_full() may be an >>>>>>>>>>>>> option (inband_irq_disable would allow that), making sure we cannot be >>>>>>>>>>>>> caught in the same issue when returning to kernel mode is another >>>>>>>>>>>>> way. This needs more thought I think. >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> So, always this way? 
>>>>>>>>>>>> >>>>>>>>>>>> local_irq_disable_full: >>>>>>>>>>>> hard_local_irq_disable >>>>>>>>>>>> local_irq_disable >>>>>>>>>>>> >>>>>>>>>>>> local_irq_enable_full: >>>>>>>>>>>> hard_local_irq_enable >>>>>>>>>>>> local_irq_enable >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Yes. >>>>>>>>>>> >>>>>>>>>>>> Or even flip local_irq_enable_full as well? >>>>>>>>>>> >>>>>>>>>>> Not this one, hard irqs must be on before local_irq_enable() is issued >>>>>>>>>>> otherwise we would trigger a debug assertion. The reason for such >>>>>>>>>>> assertion is that the I-pipe version of local_irq_enable() force enables >>>>>>>>>>> hard irqs on, and this is something we want to depart from because >>>>>>>>>>> whether the caller expects or not such side-effect is error-prone. For >>>>>>>>>>> this reason, inband_irq_enable() expects hard irqs on, which should be >>>>>>>>>>> the current hard irq state for the caller under normal circumstances. >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Was there ever code that required ordering local_irq_disable_full the >>>>>>>>>>>> other way around? >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> After review, I don't think so. The current order for >>>>>>>>>>> local_irq_disable_full() was rather determined by applying a reverse >>>>>>>>>>> sequence compared to local_irq_enable_full(). So this looks good. I need >>>>>>>>>>> to test this on the armv[7/8] side here first before merging. >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Is there a commit somewhere that I can drop already? >>>>>>>>>> >>>>>>>>> >>>>>>>>> 1. These are fine with me in their latest iteration: >>>>>>>>> >>>>>>>>> irq_pipeline: Clean up stage_info field and users >>>>>>>>> irq_pipeline: Account for stage migration across faults >>>>>>>>> irq_pipeline: Warn when calling irqentry_enter with oob stalled >>>>>>>>> x86: dovetail: Fix TS flag reservation >>>>>>>>> >>>>>>>>> 2. 
This one should be replaced by a fix in local_irq_disable_full(), >>>>>>>>> pending some tests ongoing here: >>>>>>>>> >>>>>>>>> irq_pipeline: Prevent returning to user space with pending inband IRQs >>>>>>>>> >>>>>>>>> However, as I mentioned earlier, this may not be sufficient per se, we >>>>>>>>> may have to synchronize the in-band log when unstalling the stage on the >>>>>>>>> kernel exit path, i.e. the exit point fixed up by the "Account for stage >>>>>>>>> migration across faults" patch. This would address the following >>>>>>>>> scenario: >>>>>>>>> >>>>>>>>> irqentry_enter on kernel context: >>>>>>>>> stall_inband(); >>>>>>>>> trap_handler(); >>>>>>>>> hard_local_irq_enable() >>>>>>>>> hard_local_irq_disable() >>>>>>>>> if (state.stage_info == IRQENTRY_INBAND_UNSTALLED || >>>>>>>>> state.stage_info == IRQENTRY_OOB) { >>>>>>>>> unstall_inband(); >>>>>>>>> synchronize_pipeline(); >>>>>>>> >>>>>>>> Oh, unstall_inband does not synchronize? Because of >>>>>>>> hard_local_irq_disable or in general? >>>>>>>> >>>>>>> >>>>>>> Ah, found it - unstall != inband_irq_enable. Why not using the latter >>>>>>> directly? >>>>>>> >>>>>> >>>>>> ...because it's instrumented to detect hard_irqs_disabled. >>>>>> >>>>>> Can we safely sync at that point? >>>>> >>>>> Yes we may do this, because we restore the original -unstalled- state of >>>>> the in-band stage. So if we have pending IRQs, we may play them at this >>>>> point. What we may not do obviously, is creating synchronization points >>>>> where the kernel does not expect irqs. >>>>> >>>> >>>> Again: When will any reschedules that might be requested by those >>>> replayed IRQs be handled then? >>>> >>> >>> Nowhere unless we provide a log synchronization routine dedicated to the >>> in-band kernel exit path, which would fold in the rescheduling call via >>> irqentry_exit_cond_resched(). I'll write that one. >>> >> >> How about simply moving the replay up, before the kernel checks for >> pending reschedules anyway? 
We can stall inband then again, to please >> lockdep & Co., as long as there is no hard-irq-on section before the intret. >> > > Yes, this is the right way to do this. I'm on it. Please test this (heavily) with cobalt. This survived a 2-hours long kvm test here, running loops of the evl test suite + hackbench over 64 vcpus, so far so good. No reason for any difference in behavior between cobalt and evl in this area, this is merely a dovetail business. commit 30808d958277a525baad33f3c68ccb7910dc3176 (HEAD -> rebase/v5.10.y-evl) Author: Philippe Gerum <rpm@xenomai.org> genirq: irq_pipeline: synchronize log on irq exit to kernel We must make sure to play any IRQ which might be pending in the in-band log before leaving an interrupt frame for a preempted kernel context. This completes "irq_pipeline: Account for stage migration across faults", so that we synchronize the log once the in-band stage is unstalled. In addition, we also care to do this before preempt_schedule_irq() runs, so that we won't miss any rescheduling request which might have been triggered by some IRQ we just played. Signed-off-by: Philippe Gerum <rpm@xenomai.org> Suggested-by: Jan Kiszka <jan.kiszka@siemens.com> diff --git a/kernel/entry/common.c b/kernel/entry/common.c index 4e81c0c03e5726a..99512307537561b 100644 --- a/kernel/entry/common.c +++ b/kernel/entry/common.c @@ -477,8 +477,57 @@ bool irqexit_may_preempt_schedule(irqentry_state_t state, #endif +#ifdef CONFIG_IRQ_PIPELINE + +static bool irqentry_syncstage(irqentry_state_t state) /* hard irqs off */ +{ + /* + * If pipelining interrupts, enable in-band IRQs then + * synchronize the interrupt log on exit if: + * + * - irqentry_enter() stalled the stage in order to mirror the + * hardware state. + * + * - we where coming from oob, thus went through a stage migration + * that was caused by taking a CPU exception, e.g., a fault. 
+ * + * We run before preempt_schedule_irq() may be called later on + * by preemptible kernels, so that any rescheduling request + * triggered by in-band IRQ handlers is considered. + */ + if (state.stage_info == IRQENTRY_INBAND_UNSTALLED || + state.stage_info == IRQENTRY_OOB) { + unstall_inband_nocheck(); + synchronize_pipeline_on_irq(); + stall_inband_nocheck(); + return true; + } + + return false; +} + +static void irqentry_unstall(void) +{ + unstall_inband_nocheck(); +} + +#else + +static bool irqentry_syncstage(irqentry_state_t state) +{ + return false; +} + +static void irqentry_unstall(void) +{ +} + +#endif + noinstr void irqentry_exit(struct pt_regs *regs, irqentry_state_t state) { + bool synchronized = false; + if (running_oob()) return; @@ -488,7 +537,11 @@ noinstr void irqentry_exit(struct pt_regs *regs, irqentry_state_t state) if (user_mode(regs)) { irqentry_exit_to_user_mode(regs); return; - } else if (irqexit_may_preempt_schedule(state, regs)) { + } + + synchronized = irqentry_syncstage(state); + + if (irqexit_may_preempt_schedule(state, regs)) { /* * If RCU was not watching on entry this needs to be done * carefully and needs the same ordering of lockdep/tracing @@ -521,17 +574,9 @@ noinstr void irqentry_exit(struct pt_regs *regs, irqentry_state_t state) } out: -#ifdef CONFIG_IRQ_PIPELINE - /* - * If pipelining interrupts, clear the in-band stall bit if - * irqentry_enter() raised it in order to mirror the hardware - * state. Also clear it when we where coming from oob, thus went - * through a migration that was caused by taking, e.g., a fault. - */ - if (state.stage_info == IRQENTRY_INBAND_UNSTALLED || - state.stage_info == IRQENTRY_OOB) - unstall_inband(); -#endif + if (synchronized) + irqentry_unstall(); + return; } -- Philippe. ^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH] irq_pipeline: Prevent returning to user space with pending inband IRQs 2021-06-08 17:39 ` Philippe Gerum @ 2021-06-09 5:34 ` Jan Kiszka 2021-06-09 6:12 ` Philippe Gerum 0 siblings, 1 reply; 27+ messages in thread From: Jan Kiszka @ 2021-06-09 5:34 UTC (permalink / raw) To: Philippe Gerum; +Cc: Xenomai On 08.06.21 19:39, Philippe Gerum wrote: > > Philippe Gerum <rpm@xenomai.org> writes: > >> Jan Kiszka <jan.kiszka@siemens.com> writes: >> >>> On 07.06.21 18:19, Philippe Gerum wrote: >>>> >>>> Jan Kiszka <jan.kiszka@siemens.com> writes: >>>> >>>>> On 07.06.21 17:04, Philippe Gerum wrote: >>>>>> >>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes: >>>>>> >>>>>>> On 07.06.21 15:48, Jan Kiszka wrote: >>>>>>>> On 07.06.21 15:44, Jan Kiszka wrote: >>>>>>>>> On 07.06.21 15:03, Philippe Gerum wrote: >>>>>>>>>> >>>>>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes: >>>>>>>>>> >>>>>>>>>>> On 07.06.21 09:17, Philippe Gerum wrote: >>>>>>>>>>>> >>>>>>>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes: >>>>>>>>>>>> >>>>>>>>>>>>> On 06.06.21 16:43, Philippe Gerum via Xenomai wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> Jan Kiszka <jan.kiszka@web.de> writes: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> From: Jan Kiszka <jan.kiszka@siemens.com> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> The correct order is hard IRQs off first, then stalling inband, see also >>>>>>>>>>>>>>> I-pipe. Otherwise, we risk to take an interrupt for the inband stage but >>>>>>>>>>>>>>> do not inject it before returning to user space. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> >>>>>>>>>>>>>>> --- >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Fixes the RCU stalls Florian reported, at least for me. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I'm afraid this wasn't the last regression of translating I-pipe into >>>>>>>>>>>>>>> dovetail.. 
>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> include/linux/entry-common.h | 3 ++- >>>>>>>>>>>>>>> 1 file changed, 2 insertions(+), 1 deletion(-) >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h >>>>>>>>>>>>>>> index 0fb45b2d6094..00540110985e 100644 >>>>>>>>>>>>>>> --- a/include/linux/entry-common.h >>>>>>>>>>>>>>> +++ b/include/linux/entry-common.h >>>>>>>>>>>>>>> @@ -203,7 +203,8 @@ static inline void local_irq_disable_exit_to_user(void); >>>>>>>>>>>>>>> #ifndef local_irq_disable_exit_to_user >>>>>>>>>>>>>>> static inline void local_irq_disable_exit_to_user(void) >>>>>>>>>>>>>>> { >>>>>>>>>>>>>>> - local_irq_disable_full(); >>>>>>>>>>>>>>> + hard_cond_local_irq_disable(); >>>>>>>>>>>>>>> + local_irq_disable(); >>>>>>>>>>>>>>> } >>>>>>>>>>>>>>> #endif >>>>>>>>>>>>>> >>>>>>>>>>>>>> Good spot, real issue here, just like the CPU_IDLE code deals with. The >>>>>>>>>>>>>> fix looks good, but maybe we could do even better. >>>>>>>>>>>>>> >>>>>>>>>>>>>> The way local_irq_disable_full() works tends to introduce this issue in >>>>>>>>>>>>>> a sneaky way, when the code path does not synchronize the interrupt log >>>>>>>>>>>>>> (exit_to_user_mode_loop may be at fault in this case, which does not >>>>>>>>>>>>>> traverses synchronize_pipeline()). Lack of synchronization would still >>>>>>>>>>>>>> hit us if some trap handler turns hard irqs off -> on -> off the same >>>>>>>>>>>>>> way, and we don't leave through the user exit path. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Inverting the order of the actions in local_irq_disable_full() may be an >>>>>>>>>>>>>> option (inband_irq_disable would allow that), making sure we cannot be >>>>>>>>>>>>>> caught in the same issue when returning to kernel mode is another >>>>>>>>>>>>>> way. This needs more thought I think. >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> So, always this way? 
>>>>>>>>>>>>> >>>>>>>>>>>>> local_irq_disable_full: >>>>>>>>>>>>> hard_local_irq_disable >>>>>>>>>>>>> local_irq_disable >>>>>>>>>>>>> >>>>>>>>>>>>> local_irq_enable_full: >>>>>>>>>>>>> hard_local_irq_enable >>>>>>>>>>>>> local_irq_enable >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Yes. >>>>>>>>>>>> >>>>>>>>>>>>> Or even flip local_irq_enable_full as well? >>>>>>>>>>>> >>>>>>>>>>>> Not this one, hard irqs must be on before local_irq_enable() is issued >>>>>>>>>>>> otherwise we would trigger a debug assertion. The reason for such >>>>>>>>>>>> assertion is that the I-pipe version of local_irq_enable() force enables >>>>>>>>>>>> hard irqs on, and this is something we want to depart from because >>>>>>>>>>>> whether the caller expects or not such side-effect is error-prone. For >>>>>>>>>>>> this reason, inband_irq_enable() expects hard irqs on, which should be >>>>>>>>>>>> the current hard irq state for the caller under normal circumstances. >>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Was there ever code that required ordering local_irq_disable_full the >>>>>>>>>>>>> other way around? >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> After review, I don't think so. The current order for >>>>>>>>>>>> local_irq_disable_full() was rather determined by applying a reverse >>>>>>>>>>>> sequence compared to local_irq_enable_full(). So this looks good. I need >>>>>>>>>>>> to test this on the armv[7/8] side here first before merging. >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Is there a commit somewhere that I can drop already? >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> 1. These are fine with me in their latest iteration: >>>>>>>>>> >>>>>>>>>> irq_pipeline: Clean up stage_info field and users >>>>>>>>>> irq_pipeline: Account for stage migration across faults >>>>>>>>>> irq_pipeline: Warn when calling irqentry_enter with oob stalled >>>>>>>>>> x86: dovetail: Fix TS flag reservation >>>>>>>>>> >>>>>>>>>> 2. 
This one should be replaced by a fix in local_irq_disable_full(), >>>>>>>>>> pending some tests ongoing here: >>>>>>>>>> >>>>>>>>>> irq_pipeline: Prevent returning to user space with pending inband IRQs >>>>>>>>>> >>>>>>>>>> However, as I mentioned earlier, this may not be sufficient per se, we >>>>>>>>>> may have to synchronize the in-band log when unstalling the stage on the >>>>>>>>>> kernel exit path, i.e. the exit point fixed up by the "Account for stage >>>>>>>>>> migration across faults" patch. This would address the following >>>>>>>>>> scenario: >>>>>>>>>> >>>>>>>>>> irqentry_enter on kernel context: >>>>>>>>>> stall_inband(); >>>>>>>>>> trap_handler(); >>>>>>>>>> hard_local_irq_enable() >>>>>>>>>> hard_local_irq_disable() >>>>>>>>>> if (state.stage_info == IRQENTRY_INBAND_UNSTALLED || >>>>>>>>>> state.stage_info == IRQENTRY_OOB) { >>>>>>>>>> unstall_inband(); >>>>>>>>>> synchronize_pipeline(); >>>>>>>>> >>>>>>>>> Oh, unstall_inband does not synchronize? Because of >>>>>>>>> hard_local_irq_disable or in general? >>>>>>>>> >>>>>>>> >>>>>>>> Ah, found it - unstall != inband_irq_enable. Why not using the latter >>>>>>>> directly? >>>>>>>> >>>>>>> >>>>>>> ...because it's instrumented to detect hard_irqs_disabled. >>>>>>> >>>>>>> Can we safely sync at that point? >>>>>> >>>>>> Yes we may do this, because we restore the original -unstalled- state of >>>>>> the in-band stage. So if we have pending IRQs, we may play them at this >>>>>> point. What we may not do obviously, is creating synchronization points >>>>>> where the kernel does not expect irqs. >>>>>> >>>>> >>>>> Again: When will any reschedules that might be requested by those >>>>> replayed IRQs be handled then? >>>>> >>>> >>>> Nowhere unless we provide a log synchronization routine dedicated to the >>>> in-band kernel exit path, which would fold in the rescheduling call via >>>> irqentry_exit_cond_resched(). I'll write that one. 
>>>> >>> >>> How about simply moving the replay up, before the kernel checks for >>> pending reschedules anyway? We can stall inband then again, to please >>> lockdep & Co., as long as there is no hard-irq-on section before the intret. >>> >> >> Yes, this is the right way to do this. I'm on it. > > Please test this (heavily) with cobalt. This survived a 2-hours long kvm > test here, running loops of the evl test suite + hackbench over 64 vcpus, > so far so good. No reason for any difference in behavior between cobalt > and evl in this area, this is merely a dovetail business. > > commit 30808d958277a525baad33f3c68ccb7910dc3176 (HEAD -> rebase/v5.10.y-evl) > Author: Philippe Gerum <rpm@xenomai.org> > > genirq: irq_pipeline: synchronize log on irq exit to kernel > > We must make sure to play any IRQ which might be pending in the > in-band log before leaving an interrupt frame for a preempted kernel > context. > > This completes "irq_pipeline: Account for stage migration across > faults", so that we synchronize the log once the in-band stage is > unstalled. In addition, we also care to do this before > preempt_schedule_irq() runs, so that we won't miss any rescheduling > request which might have been triggered by some IRQ we just played. > > Signed-off-by: Philippe Gerum <rpm@xenomai.org> > Suggested-by: Jan Kiszka <jan.kiszka@siemens.com> > > diff --git a/kernel/entry/common.c b/kernel/entry/common.c > index 4e81c0c03e5726a..99512307537561b 100644 > --- a/kernel/entry/common.c > +++ b/kernel/entry/common.c > @@ -477,8 +477,57 @@ bool irqexit_may_preempt_schedule(irqentry_state_t state, > > #endif > > +#ifdef CONFIG_IRQ_PIPELINE > + > +static bool irqentry_syncstage(irqentry_state_t state) /* hard irqs off */ > +{ > + /* > + * If pipelining interrupts, enable in-band IRQs then > + * synchronize the interrupt log on exit if: > + * > + * - irqentry_enter() stalled the stage in order to mirror the > + * hardware state. 
> + * > + * - we where coming from oob, thus went through a stage migration > + * that was caused by taking a CPU exception, e.g., a fault. > + * > + * We run before preempt_schedule_irq() may be called later on > + * by preemptible kernels, so that any rescheduling request > + * triggered by in-band IRQ handlers is considered. > + */ > + if (state.stage_info == IRQENTRY_INBAND_UNSTALLED || > + state.stage_info == IRQENTRY_OOB) { > + unstall_inband_nocheck(); > + synchronize_pipeline_on_irq(); > + stall_inband_nocheck(); > + return true; > + } > + > + return false; > +} > + > +static void irqentry_unstall(void) > +{ > + unstall_inband_nocheck(); > +} > + > +#else > + > +static bool irqentry_syncstage(irqentry_state_t state) > +{ > + return false; > +} > + > +static void irqentry_unstall(void) > +{ > +} > + > +#endif > + > noinstr void irqentry_exit(struct pt_regs *regs, irqentry_state_t state) > { > + bool synchronized = false; > + > if (running_oob()) > return; > > @@ -488,7 +537,11 @@ noinstr void irqentry_exit(struct pt_regs *regs, irqentry_state_t state) > if (user_mode(regs)) { > irqentry_exit_to_user_mode(regs); > return; > - } else if (irqexit_may_preempt_schedule(state, regs)) { > + } > + > + synchronized = irqentry_syncstage(state); > + > + if (irqexit_may_preempt_schedule(state, regs)) { > /* > * If RCU was not watching on entry this needs to be done > * carefully and needs the same ordering of lockdep/tracing > @@ -521,17 +574,9 @@ noinstr void irqentry_exit(struct pt_regs *regs, irqentry_state_t state) > } > > out: > -#ifdef CONFIG_IRQ_PIPELINE > - /* > - * If pipelining interrupts, clear the in-band stall bit if > - * irqentry_enter() raised it in order to mirror the hardware > - * state. Also clear it when we where coming from oob, thus went > - * through a migration that was caused by taking, e.g., a fault. 
> -	 */
> -	if (state.stage_info == IRQENTRY_INBAND_UNSTALLED ||
> -	    state.stage_info == IRQENTRY_OOB)
> -		unstall_inband();
> -#endif
> +	if (synchronized)
> +		irqentry_unstall();
> +
> 	return;
>  }
> 

I've put that into a staging/v5.10.y-dovetail branch, pointed
xenomai-images to that and started a run in our lab. That should give us
some confidence that things work under cobalt across all archs, and we
can make that commit official.

However, heavy stressing would require triggering migration-on-fault in
tighter loops while running something like stress-ng in the background.
Maybe it would make sense to add such a pattern to the switchtest (would
also require a way to silence the related kernel warnings during
runtime, not only build-time)?

Jan

-- 
Siemens AG, T RDA IOT
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 27+ messages in thread
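The local_irq_disable_full() ordering debate quoted in this message can also be reduced to a toy model. The sketch below is hypothetical mock state, not kernel code; it only shows why "hard IRQs off first, then stall inband" closes the window in which an interrupt is logged but never replayed before the return to user space.

```c
#include <assert.h>
#include <stdbool.h>

/* Mock two-stage state: a CPU interrupt line, a virtual mask for the
 * in-band stage, and a log of deferred in-band IRQs. */
struct pipeline {
    bool hard_irqs_on;   /* hardware interrupt line */
    bool inband_stalled; /* virtual mask (stall bit) */
    int  pending;        /* IRQs sitting in the in-band log */
    int  delivered;      /* IRQs handled in-band */
};

/* A hardware IRQ can land whenever the line is open; if the in-band
 * stage is stalled it is only logged, otherwise delivered at once. */
static void maybe_take_irq(struct pipeline *p)
{
    if (!p->hard_irqs_on)
        return;
    if (p->inband_stalled)
        p->pending++;
    else
        p->delivered++;
}

/* Buggy order: stall in-band first, then mask the line. An IRQ
 * hitting in between is logged but never replayed before intret. */
static int exit_path_buggy(struct pipeline *p)
{
    p->inband_stalled = true;
    maybe_take_irq(p);          /* the window */
    p->hard_irqs_on = false;
    return p->pending;          /* left behind at exit */
}

/* Fixed order: mask the line first, then stall; no window remains. */
static int exit_path_fixed(struct pipeline *p)
{
    p->hard_irqs_on = false;
    maybe_take_irq(p);          /* cannot fire: line is masked */
    p->inband_stalled = true;
    return p->pending;
}
```

Starting from an open line and an unstalled stage, the buggy order leaves one IRQ stranded in the log while the fixed order leaves none, which is the failure mode behind the RCU stalls reported at the top of the thread.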
* Re: [PATCH] irq_pipeline: Prevent returning to user space with pending inband IRQs 2021-06-09 5:34 ` Jan Kiszka @ 2021-06-09 6:12 ` Philippe Gerum 2021-06-11 7:15 ` Jan Kiszka 0 siblings, 1 reply; 27+ messages in thread From: Philippe Gerum @ 2021-06-09 6:12 UTC (permalink / raw) To: Jan Kiszka; +Cc: Xenomai Jan Kiszka <jan.kiszka@siemens.com> writes: > On 08.06.21 19:39, Philippe Gerum wrote: >> >> Philippe Gerum <rpm@xenomai.org> writes: >> >>> Jan Kiszka <jan.kiszka@siemens.com> writes: >>> >>>> On 07.06.21 18:19, Philippe Gerum wrote: >>>>> >>>>> Jan Kiszka <jan.kiszka@siemens.com> writes: >>>>> >>>>>> On 07.06.21 17:04, Philippe Gerum wrote: >>>>>>> >>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes: >>>>>>> >>>>>>>> On 07.06.21 15:48, Jan Kiszka wrote: >>>>>>>>> On 07.06.21 15:44, Jan Kiszka wrote: >>>>>>>>>> On 07.06.21 15:03, Philippe Gerum wrote: >>>>>>>>>>> >>>>>>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes: >>>>>>>>>>> >>>>>>>>>>>> On 07.06.21 09:17, Philippe Gerum wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes: >>>>>>>>>>>>> >>>>>>>>>>>>>> On 06.06.21 16:43, Philippe Gerum via Xenomai wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Jan Kiszka <jan.kiszka@web.de> writes: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> From: Jan Kiszka <jan.kiszka@siemens.com> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> The correct order is hard IRQs off first, then stalling inband, see also >>>>>>>>>>>>>>>> I-pipe. Otherwise, we risk to take an interrupt for the inband stage but >>>>>>>>>>>>>>>> do not inject it before returning to user space. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> >>>>>>>>>>>>>>>> --- >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Fixes the RCU stalls Florian reported, at least for me. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I'm afraid this wasn't the last regression of translating I-pipe into >>>>>>>>>>>>>>>> dovetail.. 
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> include/linux/entry-common.h | 3 ++- >>>>>>>>>>>>>>>> 1 file changed, 2 insertions(+), 1 deletion(-) >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h >>>>>>>>>>>>>>>> index 0fb45b2d6094..00540110985e 100644 >>>>>>>>>>>>>>>> --- a/include/linux/entry-common.h >>>>>>>>>>>>>>>> +++ b/include/linux/entry-common.h >>>>>>>>>>>>>>>> @@ -203,7 +203,8 @@ static inline void local_irq_disable_exit_to_user(void); >>>>>>>>>>>>>>>> #ifndef local_irq_disable_exit_to_user >>>>>>>>>>>>>>>> static inline void local_irq_disable_exit_to_user(void) >>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>> - local_irq_disable_full(); >>>>>>>>>>>>>>>> + hard_cond_local_irq_disable(); >>>>>>>>>>>>>>>> + local_irq_disable(); >>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>> #endif >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Good spot, real issue here, just like the CPU_IDLE code deals with. The >>>>>>>>>>>>>>> fix looks good, but maybe we could do even better. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> The way local_irq_disable_full() works tends to introduce this issue in >>>>>>>>>>>>>>> a sneaky way, when the code path does not synchronize the interrupt log >>>>>>>>>>>>>>> (exit_to_user_mode_loop may be at fault in this case, which does not >>>>>>>>>>>>>>> traverses synchronize_pipeline()). Lack of synchronization would still >>>>>>>>>>>>>>> hit us if some trap handler turns hard irqs off -> on -> off the same >>>>>>>>>>>>>>> way, and we don't leave through the user exit path. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Inverting the order of the actions in local_irq_disable_full() may be an >>>>>>>>>>>>>>> option (inband_irq_disable would allow that), making sure we cannot be >>>>>>>>>>>>>>> caught in the same issue when returning to kernel mode is another >>>>>>>>>>>>>>> way. This needs more thought I think. >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> So, always this way? 
>>>>>>>>>>>>>> >>>>>>>>>>>>>> local_irq_disable_full: >>>>>>>>>>>>>> hard_local_irq_disable >>>>>>>>>>>>>> local_irq_disable >>>>>>>>>>>>>> >>>>>>>>>>>>>> local_irq_enable_full: >>>>>>>>>>>>>> hard_local_irq_enable >>>>>>>>>>>>>> local_irq_enable >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Yes. >>>>>>>>>>>>> >>>>>>>>>>>>>> Or even flip local_irq_enable_full as well? >>>>>>>>>>>>> >>>>>>>>>>>>> Not this one, hard irqs must be on before local_irq_enable() is issued >>>>>>>>>>>>> otherwise we would trigger a debug assertion. The reason for such >>>>>>>>>>>>> assertion is that the I-pipe version of local_irq_enable() force enables >>>>>>>>>>>>> hard irqs on, and this is something we want to depart from because >>>>>>>>>>>>> whether the caller expects or not such side-effect is error-prone. For >>>>>>>>>>>>> this reason, inband_irq_enable() expects hard irqs on, which should be >>>>>>>>>>>>> the current hard irq state for the caller under normal circumstances. >>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Was there ever code that required ordering local_irq_disable_full the >>>>>>>>>>>>>> other way around? >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> After review, I don't think so. The current order for >>>>>>>>>>>>> local_irq_disable_full() was rather determined by applying a reverse >>>>>>>>>>>>> sequence compared to local_irq_enable_full(). So this looks good. I need >>>>>>>>>>>>> to test this on the armv[7/8] side here first before merging. >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Is there a commit somewhere that I can drop already? >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> 1. These are fine with me in their latest iteration: >>>>>>>>>>> >>>>>>>>>>> irq_pipeline: Clean up stage_info field and users >>>>>>>>>>> irq_pipeline: Account for stage migration across faults >>>>>>>>>>> irq_pipeline: Warn when calling irqentry_enter with oob stalled >>>>>>>>>>> x86: dovetail: Fix TS flag reservation >>>>>>>>>>> >>>>>>>>>>> 2. 
This one should be replaced by a fix in local_irq_disable_full(), >>>>>>>>>>> pending some tests ongoing here: >>>>>>>>>>> >>>>>>>>>>> irq_pipeline: Prevent returning to user space with pending inband IRQs >>>>>>>>>>> >>>>>>>>>>> However, as I mentioned earlier, this may not be sufficient per se, we >>>>>>>>>>> may have to synchronize the in-band log when unstalling the stage on the >>>>>>>>>>> kernel exit path, i.e. the exit point fixed up by the "Account for stage >>>>>>>>>>> migration across faults" patch. This would address the following >>>>>>>>>>> scenario: >>>>>>>>>>> >>>>>>>>>>> irqentry_enter on kernel context: >>>>>>>>>>> stall_inband(); >>>>>>>>>>> trap_handler(); >>>>>>>>>>> hard_local_irq_enable() >>>>>>>>>>> hard_local_irq_disable() >>>>>>>>>>> if (state.stage_info == IRQENTRY_INBAND_UNSTALLED || >>>>>>>>>>> state.stage_info == IRQENTRY_OOB) { >>>>>>>>>>> unstall_inband(); >>>>>>>>>>> synchronize_pipeline(); >>>>>>>>>> >>>>>>>>>> Oh, unstall_inband does not synchronize? Because of >>>>>>>>>> hard_local_irq_disable or in general? >>>>>>>>>> >>>>>>>>> >>>>>>>>> Ah, found it - unstall != inband_irq_enable. Why not using the latter >>>>>>>>> directly? >>>>>>>>> >>>>>>>> >>>>>>>> ...because it's instrumented to detect hard_irqs_disabled. >>>>>>>> >>>>>>>> Can we safely sync at that point? >>>>>>> >>>>>>> Yes we may do this, because we restore the original -unstalled- state of >>>>>>> the in-band stage. So if we have pending IRQs, we may play them at this >>>>>>> point. What we may not do obviously, is creating synchronization points >>>>>>> where the kernel does not expect irqs. >>>>>>> >>>>>> >>>>>> Again: When will any reschedules that might be requested by those >>>>>> replayed IRQs be handled then? >>>>>> >>>>> >>>>> Nowhere unless we provide a log synchronization routine dedicated to the >>>>> in-band kernel exit path, which would fold in the rescheduling call via >>>>> irqentry_exit_cond_resched(). I'll write that one. 
>>>>> >>>> >>>> How about simply moving the replay up, before the kernel checks for >>>> pending reschedules anyway? We can stall inband then again, to please >>>> lockdep & Co., as long as there is no hard-irq-on section before the intret. >>>> >>> >>> Yes, this is the right way to do this. I'm on it. >> >> Please test this (heavily) with cobalt. This survived a 2-hours long kvm >> test here, running loops of the evl test suite + hackbench over 64 vcpus, >> so far so good. No reason for any difference in behavior between cobalt >> and evl in this area, this is merely a dovetail business. >> >> commit 30808d958277a525baad33f3c68ccb7910dc3176 (HEAD -> rebase/v5.10.y-evl) >> Author: Philippe Gerum <rpm@xenomai.org> >> >> genirq: irq_pipeline: synchronize log on irq exit to kernel >> >> We must make sure to play any IRQ which might be pending in the >> in-band log before leaving an interrupt frame for a preempted kernel >> context. >> >> This completes "irq_pipeline: Account for stage migration across >> faults", so that we synchronize the log once the in-band stage is >> unstalled. In addition, we also care to do this before >> preempt_schedule_irq() runs, so that we won't miss any rescheduling >> request which might have been triggered by some IRQ we just played. >> >> Signed-off-by: Philippe Gerum <rpm@xenomai.org> >> Suggested-by: Jan Kiszka <jan.kiszka@siemens.com> >> >> diff --git a/kernel/entry/common.c b/kernel/entry/common.c >> index 4e81c0c03e5726a..99512307537561b 100644 >> --- a/kernel/entry/common.c >> +++ b/kernel/entry/common.c >> @@ -477,8 +477,57 @@ bool irqexit_may_preempt_schedule(irqentry_state_t state, >> >> #endif >> >> +#ifdef CONFIG_IRQ_PIPELINE >> + >> +static bool irqentry_syncstage(irqentry_state_t state) /* hard irqs off */ >> +{ >> + /* >> + * If pipelining interrupts, enable in-band IRQs then >> + * synchronize the interrupt log on exit if: >> + * >> + * - irqentry_enter() stalled the stage in order to mirror the >> + * hardware state. 
>> + * >> + * - we where coming from oob, thus went through a stage migration >> + * that was caused by taking a CPU exception, e.g., a fault. >> + * >> + * We run before preempt_schedule_irq() may be called later on >> + * by preemptible kernels, so that any rescheduling request >> + * triggered by in-band IRQ handlers is considered. >> + */ >> + if (state.stage_info == IRQENTRY_INBAND_UNSTALLED || >> + state.stage_info == IRQENTRY_OOB) { >> + unstall_inband_nocheck(); >> + synchronize_pipeline_on_irq(); >> + stall_inband_nocheck(); >> + return true; >> + } >> + >> + return false; >> +} >> + >> +static void irqentry_unstall(void) >> +{ >> + unstall_inband_nocheck(); >> +} >> + >> +#else >> + >> +static bool irqentry_syncstage(irqentry_state_t state) >> +{ >> + return false; >> +} >> + >> +static void irqentry_unstall(void) >> +{ >> +} >> + >> +#endif >> + >> noinstr void irqentry_exit(struct pt_regs *regs, irqentry_state_t state) >> { >> + bool synchronized = false; >> + >> if (running_oob()) >> return; >> >> @@ -488,7 +537,11 @@ noinstr void irqentry_exit(struct pt_regs *regs, irqentry_state_t state) >> if (user_mode(regs)) { >> irqentry_exit_to_user_mode(regs); >> return; >> - } else if (irqexit_may_preempt_schedule(state, regs)) { >> + } >> + >> + synchronized = irqentry_syncstage(state); >> + >> + if (irqexit_may_preempt_schedule(state, regs)) { >> /* >> * If RCU was not watching on entry this needs to be done >> * carefully and needs the same ordering of lockdep/tracing >> @@ -521,17 +574,9 @@ noinstr void irqentry_exit(struct pt_regs *regs, irqentry_state_t state) >> } >> >> out: >> -#ifdef CONFIG_IRQ_PIPELINE >> - /* >> - * If pipelining interrupts, clear the in-band stall bit if >> - * irqentry_enter() raised it in order to mirror the hardware >> - * state. Also clear it when we where coming from oob, thus went >> - * through a migration that was caused by taking, e.g., a fault. 
>> -	 */
>> -	if (state.stage_info == IRQENTRY_INBAND_UNSTALLED ||
>> -	    state.stage_info == IRQENTRY_OOB)
>> -		unstall_inband();
>> -#endif
>> +	if (synchronized)
>> +		irqentry_unstall();
>> +
>> 	return;
>>  }
>> 
> 
> I've put that into a staging/v5.10.y-dovetail branch, pointed
> xenomai-images to that and started a run in our lab. That should give us
> some confidence that things work under cobalt across all archs, and we
> can make that commit official.
> 
> However, heavy stressing would require triggering migration-on-fault in
> tighter loops while running something like stress-ng in the background.
> Maybe it would make sense to add such a pattern to the switchtest (would
> also require a way to silence the related kernel warnings during
> runtime, not only build-time)?
> 

switchtest aims at exercising the core scheduling logic including the
transition between execution stages, so migration-on-fault would fit
there indeed.

-- 
Philippe.

^ permalink raw reply	[flat|nested] 27+ messages in thread
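One subtlety buried in the quoted exchange ("unstall != inband_irq_enable") is worth a tiny illustration: clearing the stall bit alone does not drain the log, which is why the fix has to call the synchronization routine explicitly. The helpers below are hypothetical mocks sketching that distinction, not the actual Dovetail API.

```c
#include <assert.h>
#include <stdbool.h>

/* Mock in-band stage state: a stall bit and a log of deferred IRQs. */
static int  pending, delivered;
static bool stalled;

/* unstall_inband()-like: clears the virtual mask only; anything
 * already logged stays pending until someone synchronizes. */
static void unstall_inband_model(void)
{
    stalled = false;
}

/* inband_irq_enable()-like: clears the mask *and* replays the log
 * (in the real API this also expects hard irqs to be on). */
static void inband_irq_enable_model(void)
{
    stalled = false;
    while (pending > 0) {
        pending--;
        delivered++;
    }
}
```

A pending IRQ survives the plain unstall and is only played by the enabling variant, mirroring why the exit path needs an explicit synchronize step between the nocheck unstall/stall pair.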
* Re: [PATCH] irq_pipeline: Prevent returning to user space with pending inband IRQs 2021-06-09 6:12 ` Philippe Gerum @ 2021-06-11 7:15 ` Jan Kiszka 2021-06-11 15:39 ` Philippe Gerum 0 siblings, 1 reply; 27+ messages in thread From: Jan Kiszka @ 2021-06-11 7:15 UTC (permalink / raw) To: Philippe Gerum; +Cc: Xenomai On 09.06.21 08:12, Philippe Gerum wrote: > > Jan Kiszka <jan.kiszka@siemens.com> writes: > >> On 08.06.21 19:39, Philippe Gerum wrote: >>> >>> Philippe Gerum <rpm@xenomai.org> writes: >>> >>>> Jan Kiszka <jan.kiszka@siemens.com> writes: >>>> >>>>> On 07.06.21 18:19, Philippe Gerum wrote: >>>>>> >>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes: >>>>>> >>>>>>> On 07.06.21 17:04, Philippe Gerum wrote: >>>>>>>> >>>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes: >>>>>>>> >>>>>>>>> On 07.06.21 15:48, Jan Kiszka wrote: >>>>>>>>>> On 07.06.21 15:44, Jan Kiszka wrote: >>>>>>>>>>> On 07.06.21 15:03, Philippe Gerum wrote: >>>>>>>>>>>> >>>>>>>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes: >>>>>>>>>>>> >>>>>>>>>>>>> On 07.06.21 09:17, Philippe Gerum wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> On 06.06.21 16:43, Philippe Gerum via Xenomai wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Jan Kiszka <jan.kiszka@web.de> writes: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> From: Jan Kiszka <jan.kiszka@siemens.com> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> The correct order is hard IRQs off first, then stalling inband, see also >>>>>>>>>>>>>>>>> I-pipe. Otherwise, we risk to take an interrupt for the inband stage but >>>>>>>>>>>>>>>>> do not inject it before returning to user space. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> >>>>>>>>>>>>>>>>> --- >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Fixes the RCU stalls Florian reported, at least for me. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I'm afraid this wasn't the last regression of translating I-pipe into >>>>>>>>>>>>>>>>> dovetail.. 
>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> include/linux/entry-common.h | 3 ++- >>>>>>>>>>>>>>>>> 1 file changed, 2 insertions(+), 1 deletion(-) >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h >>>>>>>>>>>>>>>>> index 0fb45b2d6094..00540110985e 100644 >>>>>>>>>>>>>>>>> --- a/include/linux/entry-common.h >>>>>>>>>>>>>>>>> +++ b/include/linux/entry-common.h >>>>>>>>>>>>>>>>> @@ -203,7 +203,8 @@ static inline void local_irq_disable_exit_to_user(void); >>>>>>>>>>>>>>>>> #ifndef local_irq_disable_exit_to_user >>>>>>>>>>>>>>>>> static inline void local_irq_disable_exit_to_user(void) >>>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>>> - local_irq_disable_full(); >>>>>>>>>>>>>>>>> + hard_cond_local_irq_disable(); >>>>>>>>>>>>>>>>> + local_irq_disable(); >>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>> #endif >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Good spot, real issue here, just like the CPU_IDLE code deals with. The >>>>>>>>>>>>>>>> fix looks good, but maybe we could do even better. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> The way local_irq_disable_full() works tends to introduce this issue in >>>>>>>>>>>>>>>> a sneaky way, when the code path does not synchronize the interrupt log >>>>>>>>>>>>>>>> (exit_to_user_mode_loop may be at fault in this case, which does not >>>>>>>>>>>>>>>> traverses synchronize_pipeline()). Lack of synchronization would still >>>>>>>>>>>>>>>> hit us if some trap handler turns hard irqs off -> on -> off the same >>>>>>>>>>>>>>>> way, and we don't leave through the user exit path. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Inverting the order of the actions in local_irq_disable_full() may be an >>>>>>>>>>>>>>>> option (inband_irq_disable would allow that), making sure we cannot be >>>>>>>>>>>>>>>> caught in the same issue when returning to kernel mode is another >>>>>>>>>>>>>>>> way. This needs more thought I think. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> So, always this way? 
>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> local_irq_disable_full: >>>>>>>>>>>>>>> hard_local_irq_disable >>>>>>>>>>>>>>> local_irq_disable >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> local_irq_enable_full: >>>>>>>>>>>>>>> hard_local_irq_enable >>>>>>>>>>>>>>> local_irq_enable >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Yes. >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Or even flip local_irq_enable_full as well? >>>>>>>>>>>>>> >>>>>>>>>>>>>> Not this one, hard irqs must be on before local_irq_enable() is issued >>>>>>>>>>>>>> otherwise we would trigger a debug assertion. The reason for such >>>>>>>>>>>>>> assertion is that the I-pipe version of local_irq_enable() force enables >>>>>>>>>>>>>> hard irqs on, and this is something we want to depart from because >>>>>>>>>>>>>> whether the caller expects or not such side-effect is error-prone. For >>>>>>>>>>>>>> this reason, inband_irq_enable() expects hard irqs on, which should be >>>>>>>>>>>>>> the current hard irq state for the caller under normal circumstances. >>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Was there ever code that required ordering local_irq_disable_full the >>>>>>>>>>>>>>> other way around? >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> After review, I don't think so. The current order for >>>>>>>>>>>>>> local_irq_disable_full() was rather determined by applying a reverse >>>>>>>>>>>>>> sequence compared to local_irq_enable_full(). So this looks good. I need >>>>>>>>>>>>>> to test this on the armv[7/8] side here first before merging. >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Is there a commit somewhere that I can drop already? >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> 1. These are fine with me in their latest iteration: >>>>>>>>>>>> >>>>>>>>>>>> irq_pipeline: Clean up stage_info field and users >>>>>>>>>>>> irq_pipeline: Account for stage migration across faults >>>>>>>>>>>> irq_pipeline: Warn when calling irqentry_enter with oob stalled >>>>>>>>>>>> x86: dovetail: Fix TS flag reservation >>>>>>>>>>>> >>>>>>>>>>>> 2. 
This one should be replaced by a fix in local_irq_disable_full(), >>>>>>>>>>>> pending some tests ongoing here: >>>>>>>>>>>> >>>>>>>>>>>> irq_pipeline: Prevent returning to user space with pending inband IRQs >>>>>>>>>>>> >>>>>>>>>>>> However, as I mentioned earlier, this may not be sufficient per se, we >>>>>>>>>>>> may have to synchronize the in-band log when unstalling the stage on the >>>>>>>>>>>> kernel exit path, i.e. the exit point fixed up by the "Account for stage >>>>>>>>>>>> migration across faults" patch. This would address the following >>>>>>>>>>>> scenario: >>>>>>>>>>>> >>>>>>>>>>>> irqentry_enter on kernel context: >>>>>>>>>>>> stall_inband(); >>>>>>>>>>>> trap_handler(); >>>>>>>>>>>> hard_local_irq_enable() >>>>>>>>>>>> hard_local_irq_disable() >>>>>>>>>>>> if (state.stage_info == IRQENTRY_INBAND_UNSTALLED || >>>>>>>>>>>> state.stage_info == IRQENTRY_OOB) { >>>>>>>>>>>> unstall_inband(); >>>>>>>>>>>> synchronize_pipeline(); >>>>>>>>>>> >>>>>>>>>>> Oh, unstall_inband does not synchronize? Because of >>>>>>>>>>> hard_local_irq_disable or in general? >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Ah, found it - unstall != inband_irq_enable. Why not using the latter >>>>>>>>>> directly? >>>>>>>>>> >>>>>>>>> >>>>>>>>> ...because it's instrumented to detect hard_irqs_disabled. >>>>>>>>> >>>>>>>>> Can we safely sync at that point? >>>>>>>> >>>>>>>> Yes we may do this, because we restore the original -unstalled- state of >>>>>>>> the in-band stage. So if we have pending IRQs, we may play them at this >>>>>>>> point. What we may not do obviously, is creating synchronization points >>>>>>>> where the kernel does not expect irqs. >>>>>>>> >>>>>>> >>>>>>> Again: When will any reschedules that might be requested by those >>>>>>> replayed IRQs be handled then? >>>>>>> >>>>>> >>>>>> Nowhere unless we provide a log synchronization routine dedicated to the >>>>>> in-band kernel exit path, which would fold in the rescheduling call via >>>>>> irqentry_exit_cond_resched(). 
I'll write that one. >>>>>> >>>>> >>>>> How about simply moving the replay up, before the kernel checks for >>>>> pending reschedules anyway? We can stall inband then again, to please >>>>> lockdep & Co., as long as there is no hard-irq-on section before the intret. >>>>> >>>> >>>> Yes, this is the right way to do this. I'm on it. >>> >>> Please test this (heavily) with cobalt. This survived a 2-hours long kvm >>> test here, running loops of the evl test suite + hackbench over 64 vcpus, >>> so far so good. No reason for any difference in behavior between cobalt >>> and evl in this area, this is merely a dovetail business. >>> >>> commit 30808d958277a525baad33f3c68ccb7910dc3176 (HEAD -> rebase/v5.10.y-evl) >>> Author: Philippe Gerum <rpm@xenomai.org> >>> >>> genirq: irq_pipeline: synchronize log on irq exit to kernel >>> >>> We must make sure to play any IRQ which might be pending in the >>> in-band log before leaving an interrupt frame for a preempted kernel >>> context. >>> >>> This completes "irq_pipeline: Account for stage migration across >>> faults", so that we synchronize the log once the in-band stage is >>> unstalled. In addition, we also care to do this before >>> preempt_schedule_irq() runs, so that we won't miss any rescheduling >>> request which might have been triggered by some IRQ we just played. 
>>> >>> Signed-off-by: Philippe Gerum <rpm@xenomai.org> >>> Suggested-by: Jan Kiszka <jan.kiszka@siemens.com> >>> >>> diff --git a/kernel/entry/common.c b/kernel/entry/common.c >>> index 4e81c0c03e5726a..99512307537561b 100644 >>> --- a/kernel/entry/common.c >>> +++ b/kernel/entry/common.c >>> @@ -477,8 +477,57 @@ bool irqexit_may_preempt_schedule(irqentry_state_t state, >>> >>> #endif >>> >>> +#ifdef CONFIG_IRQ_PIPELINE >>> + >>> +static bool irqentry_syncstage(irqentry_state_t state) /* hard irqs off */ >>> +{ >>> + /* >>> + * If pipelining interrupts, enable in-band IRQs then >>> + * synchronize the interrupt log on exit if: >>> + * >>> + * - irqentry_enter() stalled the stage in order to mirror the >>> + * hardware state. >>> + * >>> + * - we where coming from oob, thus went through a stage migration >>> + * that was caused by taking a CPU exception, e.g., a fault. >>> + * >>> + * We run before preempt_schedule_irq() may be called later on >>> + * by preemptible kernels, so that any rescheduling request >>> + * triggered by in-band IRQ handlers is considered. 
>>> + */ >>> + if (state.stage_info == IRQENTRY_INBAND_UNSTALLED || >>> + state.stage_info == IRQENTRY_OOB) { >>> + unstall_inband_nocheck(); >>> + synchronize_pipeline_on_irq(); >>> + stall_inband_nocheck(); >>> + return true; >>> + } >>> + >>> + return false; >>> +} >>> + >>> +static void irqentry_unstall(void) >>> +{ >>> + unstall_inband_nocheck(); >>> +} >>> + >>> +#else >>> + >>> +static bool irqentry_syncstage(irqentry_state_t state) >>> +{ >>> + return false; >>> +} >>> + >>> +static void irqentry_unstall(void) >>> +{ >>> +} >>> + >>> +#endif >>> + >>> noinstr void irqentry_exit(struct pt_regs *regs, irqentry_state_t state) >>> { >>> + bool synchronized = false; >>> + >>> if (running_oob()) >>> return; >>> >>> @@ -488,7 +537,11 @@ noinstr void irqentry_exit(struct pt_regs *regs, irqentry_state_t state) >>> if (user_mode(regs)) { >>> irqentry_exit_to_user_mode(regs); >>> return; >>> - } else if (irqexit_may_preempt_schedule(state, regs)) { >>> + } >>> + >>> + synchronized = irqentry_syncstage(state); >>> + >>> + if (irqexit_may_preempt_schedule(state, regs)) { >>> /* >>> * If RCU was not watching on entry this needs to be done >>> * carefully and needs the same ordering of lockdep/tracing >>> @@ -521,17 +574,9 @@ noinstr void irqentry_exit(struct pt_regs *regs, irqentry_state_t state) >>> } >>> >>> out: >>> -#ifdef CONFIG_IRQ_PIPELINE >>> - /* >>> - * If pipelining interrupts, clear the in-band stall bit if >>> - * irqentry_enter() raised it in order to mirror the hardware >>> - * state. Also clear it when we where coming from oob, thus went >>> - * through a migration that was caused by taking, e.g., a fault. >>> - */ >>> - if (state.stage_info == IRQENTRY_INBAND_UNSTALLED || >>> - state.stage_info == IRQENTRY_OOB) >>> - unstall_inband(); >>> -#endif >>> + if (synchronized) >>> + irqentry_unstall(); >>> + >>> return; >>> } >>> >>> >> >> I've put that into a staging/v5.10.y-dovetail branch, pointed >> xenomai-images to that and started a run in our lab. 
That should give us >> some confidence that things work under cobalt across all archs, and we >> can make that commit official. >> >> However, heavy stressing would require triggering migration-on-fault in >> tighter loops while running something like stress-ng in the background. >> Maybe it would make sense to add such a pattern to the switchtest (would >> also require a way to silence the related kernel warnings during >> runtime, not only build-time)? >> > > switchtest aims at exercising the core scheduling logic including the > transition between execution stages, so migration-on-fault would fit > there indeed. > Do we need this testcase first? My feeling from review and testing is that it is safe to move forward with the patch already. Jan -- Siemens AG, T RDA IOT Corporate Competence Center Embedded Linux ^ permalink raw reply [flat|nested] 27+ messages in thread
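[Editor's note] The race discussed above can be sketched as a toy state machine: one flag for the CPU's hard interrupt state, one for the in-band stage's virtual stall bit, and a counter for IRQs deferred into the in-band log. The names mirror Dovetail's, but this is a deliberate simplification for illustration, not the real implementation — in particular, `maybe_take_irq()` stands in for hardware interrupt delivery.

```c
#include <assert.h>
#include <stdbool.h>

static bool hard_irqs_on;   /* CPU interrupt flag */
static bool inband_stalled; /* virtual mask of the in-band stage */
static int  log_pending;    /* IRQs deferred into the in-band log */

static void maybe_take_irq(void)
{
    if (!hard_irqs_on)
        return;            /* CPU cannot take the interrupt at all */
    if (inband_stalled)
        log_pending++;     /* deferred: must be synchronized later */
    /* else: delivered immediately, nothing left pending */
}

/* Pre-fix ordering: stall first, then hard irqs off. An IRQ firing
 * in the window between the two lands in the log, with no remaining
 * synchronization point before the return to user space. */
static int exit_path_stall_first(void)
{
    inband_stalled = true;
    maybe_take_irq();      /* window: hard irqs are still on */
    hard_irqs_on = false;
    return log_pending;    /* > 0 means an IRQ is stuck in the log */
}

/* Fixed ordering per the patch: hard irqs off first, then stall. */
static int exit_path_hard_first(void)
{
    hard_irqs_on = false;
    maybe_take_irq();      /* CPU is masked: nothing can slip in */
    inband_stalled = true;
    return log_pending;
}

static void reset_model(void)
{
    hard_irqs_on = true;
    inband_stalled = false;
    log_pending = 0;
}
```

Running both orderings against an IRQ arriving in the critical window shows why only the hard-first sequence is safe on the user exit path.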
* Re: [PATCH] irq_pipeline: Prevent returning to user space with pending inband IRQs 2021-06-11 7:15 ` Jan Kiszka @ 2021-06-11 15:39 ` Philippe Gerum 0 siblings, 0 replies; 27+ messages in thread From: Philippe Gerum @ 2021-06-11 15:39 UTC (permalink / raw) To: Jan Kiszka; +Cc: Xenomai Jan Kiszka <jan.kiszka@siemens.com> writes: > On 09.06.21 08:12, Philippe Gerum wrote: >> >> Jan Kiszka <jan.kiszka@siemens.com> writes: >> >>> On 08.06.21 19:39, Philippe Gerum wrote: >>>> >>>> Philippe Gerum <rpm@xenomai.org> writes: >>>> >>>>> Jan Kiszka <jan.kiszka@siemens.com> writes: >>>>> >>>>>> On 07.06.21 18:19, Philippe Gerum wrote: >>>>>>> >>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes: >>>>>>> >>>>>>>> On 07.06.21 17:04, Philippe Gerum wrote: >>>>>>>>> >>>>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes: >>>>>>>>> >>>>>>>>>> On 07.06.21 15:48, Jan Kiszka wrote: >>>>>>>>>>> On 07.06.21 15:44, Jan Kiszka wrote: >>>>>>>>>>>> On 07.06.21 15:03, Philippe Gerum wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes: >>>>>>>>>>>>> >>>>>>>>>>>>>> On 07.06.21 09:17, Philippe Gerum wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On 06.06.21 16:43, Philippe Gerum via Xenomai wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Jan Kiszka <jan.kiszka@web.de> writes: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> From: Jan Kiszka <jan.kiszka@siemens.com> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> The correct order is hard IRQs off first, then stalling inband, see also >>>>>>>>>>>>>>>>>> I-pipe. Otherwise, we risk to take an interrupt for the inband stage but >>>>>>>>>>>>>>>>>> do not inject it before returning to user space. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> >>>>>>>>>>>>>>>>>> --- >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Fixes the RCU stalls Florian reported, at least for me. 
>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> I'm afraid this wasn't the last regression of translating I-pipe into >>>>>>>>>>>>>>>>>> dovetail.. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> include/linux/entry-common.h | 3 ++- >>>>>>>>>>>>>>>>>> 1 file changed, 2 insertions(+), 1 deletion(-) >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h >>>>>>>>>>>>>>>>>> index 0fb45b2d6094..00540110985e 100644 >>>>>>>>>>>>>>>>>> --- a/include/linux/entry-common.h >>>>>>>>>>>>>>>>>> +++ b/include/linux/entry-common.h >>>>>>>>>>>>>>>>>> @@ -203,7 +203,8 @@ static inline void local_irq_disable_exit_to_user(void); >>>>>>>>>>>>>>>>>> #ifndef local_irq_disable_exit_to_user >>>>>>>>>>>>>>>>>> static inline void local_irq_disable_exit_to_user(void) >>>>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>>>> - local_irq_disable_full(); >>>>>>>>>>>>>>>>>> + hard_cond_local_irq_disable(); >>>>>>>>>>>>>>>>>> + local_irq_disable(); >>>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>>> #endif >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Good spot, real issue here, just like the CPU_IDLE code deals with. The >>>>>>>>>>>>>>>>> fix looks good, but maybe we could do even better. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> The way local_irq_disable_full() works tends to introduce this issue in >>>>>>>>>>>>>>>>> a sneaky way, when the code path does not synchronize the interrupt log >>>>>>>>>>>>>>>>> (exit_to_user_mode_loop may be at fault in this case, which does not >>>>>>>>>>>>>>>>> traverses synchronize_pipeline()). Lack of synchronization would still >>>>>>>>>>>>>>>>> hit us if some trap handler turns hard irqs off -> on -> off the same >>>>>>>>>>>>>>>>> way, and we don't leave through the user exit path. 
>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Inverting the order of the actions in local_irq_disable_full() may be an >>>>>>>>>>>>>>>>> option (inband_irq_disable would allow that), making sure we cannot be >>>>>>>>>>>>>>>>> caught in the same issue when returning to kernel mode is another >>>>>>>>>>>>>>>>> way. This needs more thought I think. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> So, always this way? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> local_irq_disable_full: >>>>>>>>>>>>>>>> hard_local_irq_disable >>>>>>>>>>>>>>>> local_irq_disable >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> local_irq_enable_full: >>>>>>>>>>>>>>>> hard_local_irq_enable >>>>>>>>>>>>>>>> local_irq_enable >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Yes. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Or even flip local_irq_enable_full as well? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Not this one, hard irqs must be on before local_irq_enable() is issued >>>>>>>>>>>>>>> otherwise we would trigger a debug assertion. The reason for such >>>>>>>>>>>>>>> assertion is that the I-pipe version of local_irq_enable() force enables >>>>>>>>>>>>>>> hard irqs on, and this is something we want to depart from because >>>>>>>>>>>>>>> whether the caller expects or not such side-effect is error-prone. For >>>>>>>>>>>>>>> this reason, inband_irq_enable() expects hard irqs on, which should be >>>>>>>>>>>>>>> the current hard irq state for the caller under normal circumstances. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Was there ever code that required ordering local_irq_disable_full the >>>>>>>>>>>>>>>> other way around? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> After review, I don't think so. The current order for >>>>>>>>>>>>>>> local_irq_disable_full() was rather determined by applying a reverse >>>>>>>>>>>>>>> sequence compared to local_irq_enable_full(). So this looks good. I need >>>>>>>>>>>>>>> to test this on the armv[7/8] side here first before merging. 
>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Is there a commit somewhere that I can drop already? >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> 1. These are fine with me in their latest iteration: >>>>>>>>>>>>> >>>>>>>>>>>>> irq_pipeline: Clean up stage_info field and users >>>>>>>>>>>>> irq_pipeline: Account for stage migration across faults >>>>>>>>>>>>> irq_pipeline: Warn when calling irqentry_enter with oob stalled >>>>>>>>>>>>> x86: dovetail: Fix TS flag reservation >>>>>>>>>>>>> >>>>>>>>>>>>> 2. This one should be replaced by a fix in local_irq_disable_full(), >>>>>>>>>>>>> pending some tests ongoing here: >>>>>>>>>>>>> >>>>>>>>>>>>> irq_pipeline: Prevent returning to user space with pending inband IRQs >>>>>>>>>>>>> >>>>>>>>>>>>> However, as I mentioned earlier, this may not be sufficient per se, we >>>>>>>>>>>>> may have to synchronize the in-band log when unstalling the stage on the >>>>>>>>>>>>> kernel exit path, i.e. the exit point fixed up by the "Account for stage >>>>>>>>>>>>> migration across faults" patch. This would address the following >>>>>>>>>>>>> scenario: >>>>>>>>>>>>> >>>>>>>>>>>>> irqentry_enter on kernel context: >>>>>>>>>>>>> stall_inband(); >>>>>>>>>>>>> trap_handler(); >>>>>>>>>>>>> hard_local_irq_enable() >>>>>>>>>>>>> hard_local_irq_disable() >>>>>>>>>>>>> if (state.stage_info == IRQENTRY_INBAND_UNSTALLED || >>>>>>>>>>>>> state.stage_info == IRQENTRY_OOB) { >>>>>>>>>>>>> unstall_inband(); >>>>>>>>>>>>> synchronize_pipeline(); >>>>>>>>>>>> >>>>>>>>>>>> Oh, unstall_inband does not synchronize? Because of >>>>>>>>>>>> hard_local_irq_disable or in general? >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Ah, found it - unstall != inband_irq_enable. Why not using the latter >>>>>>>>>>> directly? >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> ...because it's instrumented to detect hard_irqs_disabled. >>>>>>>>>> >>>>>>>>>> Can we safely sync at that point? 
>>>>>>>>> >>>>>>>>> Yes we may do this, because we restore the original -unstalled- state of >>>>>>>>> the in-band stage. So if we have pending IRQs, we may play them at this >>>>>>>>> point. What we may not do obviously, is creating synchronization points >>>>>>>>> where the kernel does not expect irqs. >>>>>>>>> >>>>>>>> >>>>>>>> Again: When will any reschedules that might be requested by those >>>>>>>> replayed IRQs be handled then? >>>>>>>> >>>>>>> >>>>>>> Nowhere unless we provide a log synchronization routine dedicated to the >>>>>>> in-band kernel exit path, which would fold in the rescheduling call via >>>>>>> irqentry_exit_cond_resched(). I'll write that one. >>>>>>> >>>>>> >>>>>> How about simply moving the replay up, before the kernel checks for >>>>>> pending reschedules anyway? We can stall inband then again, to please >>>>>> lockdep & Co., as long as there is no hard-irq-on section before the intret. >>>>>> >>>>> >>>>> Yes, this is the right way to do this. I'm on it. >>>> >>>> Please test this (heavily) with cobalt. This survived a 2-hours long kvm >>>> test here, running loops of the evl test suite + hackbench over 64 vcpus, >>>> so far so good. No reason for any difference in behavior between cobalt >>>> and evl in this area, this is merely a dovetail business. >>>> >>>> commit 30808d958277a525baad33f3c68ccb7910dc3176 (HEAD -> rebase/v5.10.y-evl) >>>> Author: Philippe Gerum <rpm@xenomai.org> >>>> >>>> genirq: irq_pipeline: synchronize log on irq exit to kernel >>>> >>>> We must make sure to play any IRQ which might be pending in the >>>> in-band log before leaving an interrupt frame for a preempted kernel >>>> context. >>>> >>>> This completes "irq_pipeline: Account for stage migration across >>>> faults", so that we synchronize the log once the in-band stage is >>>> unstalled. 
In addition, we also care to do this before >>>> preempt_schedule_irq() runs, so that we won't miss any rescheduling >>>> request which might have been triggered by some IRQ we just played. >>>> >>>> Signed-off-by: Philippe Gerum <rpm@xenomai.org> >>>> Suggested-by: Jan Kiszka <jan.kiszka@siemens.com> >>>> >>>> diff --git a/kernel/entry/common.c b/kernel/entry/common.c >>>> index 4e81c0c03e5726a..99512307537561b 100644 >>>> --- a/kernel/entry/common.c >>>> +++ b/kernel/entry/common.c >>>> @@ -477,8 +477,57 @@ bool irqexit_may_preempt_schedule(irqentry_state_t state, >>>> >>>> #endif >>>> >>>> +#ifdef CONFIG_IRQ_PIPELINE >>>> + >>>> +static bool irqentry_syncstage(irqentry_state_t state) /* hard irqs off */ >>>> +{ >>>> + /* >>>> + * If pipelining interrupts, enable in-band IRQs then >>>> + * synchronize the interrupt log on exit if: >>>> + * >>>> + * - irqentry_enter() stalled the stage in order to mirror the >>>> + * hardware state. >>>> + * >>>> + * - we where coming from oob, thus went through a stage migration >>>> + * that was caused by taking a CPU exception, e.g., a fault. >>>> + * >>>> + * We run before preempt_schedule_irq() may be called later on >>>> + * by preemptible kernels, so that any rescheduling request >>>> + * triggered by in-band IRQ handlers is considered. 
>>>> + */ >>>> + if (state.stage_info == IRQENTRY_INBAND_UNSTALLED || >>>> + state.stage_info == IRQENTRY_OOB) { >>>> + unstall_inband_nocheck(); >>>> + synchronize_pipeline_on_irq(); >>>> + stall_inband_nocheck(); >>>> + return true; >>>> + } >>>> + >>>> + return false; >>>> +} >>>> + >>>> +static void irqentry_unstall(void) >>>> +{ >>>> + unstall_inband_nocheck(); >>>> +} >>>> + >>>> +#else >>>> + >>>> +static bool irqentry_syncstage(irqentry_state_t state) >>>> +{ >>>> + return false; >>>> +} >>>> + >>>> +static void irqentry_unstall(void) >>>> +{ >>>> +} >>>> + >>>> +#endif >>>> + >>>> noinstr void irqentry_exit(struct pt_regs *regs, irqentry_state_t state) >>>> { >>>> + bool synchronized = false; >>>> + >>>> if (running_oob()) >>>> return; >>>> >>>> @@ -488,7 +537,11 @@ noinstr void irqentry_exit(struct pt_regs *regs, irqentry_state_t state) >>>> if (user_mode(regs)) { >>>> irqentry_exit_to_user_mode(regs); >>>> return; >>>> - } else if (irqexit_may_preempt_schedule(state, regs)) { >>>> + } >>>> + >>>> + synchronized = irqentry_syncstage(state); >>>> + >>>> + if (irqexit_may_preempt_schedule(state, regs)) { >>>> /* >>>> * If RCU was not watching on entry this needs to be done >>>> * carefully and needs the same ordering of lockdep/tracing >>>> @@ -521,17 +574,9 @@ noinstr void irqentry_exit(struct pt_regs *regs, irqentry_state_t state) >>>> } >>>> >>>> out: >>>> -#ifdef CONFIG_IRQ_PIPELINE >>>> - /* >>>> - * If pipelining interrupts, clear the in-band stall bit if >>>> - * irqentry_enter() raised it in order to mirror the hardware >>>> - * state. Also clear it when we where coming from oob, thus went >>>> - * through a migration that was caused by taking, e.g., a fault. 
>>>> - */ >>>> - if (state.stage_info == IRQENTRY_INBAND_UNSTALLED || >>>> - state.stage_info == IRQENTRY_OOB) >>>> - unstall_inband(); >>>> -#endif >>>> + if (synchronized) >>>> + irqentry_unstall(); >>>> + >>>> return; >>>> } >>>> >>>> >>> >>> I've put that into a staging/v5.10.y-dovetail branch, pointed >>> xenomai-images to that and started a run in our lab. That should give us >>> some confidence that things work under cobalt across all archs, and we >>> can make that commit official. >>> >>> However, heavy stressing would require triggering migration-on-fault in >>> tighter loops while running something like stress-ng in the background. >>> Maybe it would make sense to add such a pattern to the switchtest (would >>> also require a way to silence the related kernel warnings during >>> runtime, not only build-time)? >>> >> >> switchtest aims at exercising the core scheduling logic including the >> transition between execution stages, so migration-on-fault would fit >> there indeed. >> > > Do we need this testcase first? My feeling from review and testing is > that is safe to move forward with the patch already. > Looks ok here too. The patch is fairly straightforward, and this has been tested on the EVL side of the universe on a handful of armv7, armv8 machines without any glitch. I'll merge this patch shortly. -- Philippe. ^ permalink raw reply [flat|nested] 27+ messages in thread
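[Editor's note] The point of running the stage synchronization ahead of the preemption check, as the patch above does, can be modeled with a small sketch. This is a hypothetical simulation of the control flow in `irqentry_exit()` — the pending counter and resched flag are stand-ins for the in-band log and `need_resched`, not the kernel's actual data structures.

```c
#include <assert.h>
#include <stdbool.h>

static bool stalled;
static int  pending;      /* IRQs sitting in the in-band log */
static bool need_resched; /* set by a replayed IRQ, e.g. a timer tick */
static bool rescheduled;

static void play_irq(void) { need_resched = true; }

static void synchronize_pipeline(void)
{
    while (pending > 0) {
        pending--;
        play_irq();
    }
}

/* Model of irqentry_syncstage(): briefly unstall, drain the log,
 * then restall so the stall bookkeeping stays consistent until
 * the final unstall on the way out. */
static void irqentry_syncstage(void)
{
    stalled = false;
    synchronize_pipeline();
    stalled = true;
}

static void preempt_check(void)
{
    if (need_resched)
        rescheduled = true;
}

/* Exit order from the patch: sync first, then the preemption
 * check, so a resched request raised by a replayed IRQ is seen. */
static bool exit_sync_first(void)
{
    irqentry_syncstage();
    preempt_check();
    stalled = false;
    return rescheduled;
}

/* The ordering being avoided: checking for preemption before the
 * log is played silently misses the resched request. */
static bool exit_preempt_first(void)
{
    preempt_check();
    irqentry_syncstage();
    stalled = false;
    return rescheduled;
}

static void reset_model(void)
{
    stalled = true;
    pending = 1;
    need_resched = false;
    rescheduled = false;
}
```

With one IRQ pending in the log, only the sync-first ordering honors the rescheduling request it raises.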
* Re: [PATCH] irq_pipeline: Prevent returning to user space with pending inband IRQs 2021-06-07 13:48 ` Jan Kiszka 2021-06-07 13:58 ` Jan Kiszka @ 2021-06-07 14:37 ` Philippe Gerum 1 sibling, 0 replies; 27+ messages in thread From: Philippe Gerum @ 2021-06-07 14:37 UTC (permalink / raw) To: Jan Kiszka; +Cc: Xenomai Jan Kiszka <jan.kiszka@siemens.com> writes: > On 07.06.21 15:44, Jan Kiszka wrote: >> On 07.06.21 15:03, Philippe Gerum wrote: >>> >>> Jan Kiszka <jan.kiszka@siemens.com> writes: >>> >>>> On 07.06.21 09:17, Philippe Gerum wrote: >>>>> >>>>> Jan Kiszka <jan.kiszka@siemens.com> writes: >>>>> >>>>>> On 06.06.21 16:43, Philippe Gerum via Xenomai wrote: >>>>>>> >>>>>>> Jan Kiszka <jan.kiszka@web.de> writes: >>>>>>> >>>>>>>> From: Jan Kiszka <jan.kiszka@siemens.com> >>>>>>>> >>>>>>>> The correct order is hard IRQs off first, then stalling inband, see also >>>>>>>> I-pipe. Otherwise, we risk to take an interrupt for the inband stage but >>>>>>>> do not inject it before returning to user space. >>>>>>>> >>>>>>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> >>>>>>>> --- >>>>>>>> >>>>>>>> Fixes the RCU stalls Florian reported, at least for me. >>>>>>>> >>>>>>>> I'm afraid this wasn't the last regression of translating I-pipe into >>>>>>>> dovetail.. 
>>>>>>>> >>>>>>>> include/linux/entry-common.h | 3 ++- >>>>>>>> 1 file changed, 2 insertions(+), 1 deletion(-) >>>>>>>> >>>>>>>> diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h >>>>>>>> index 0fb45b2d6094..00540110985e 100644 >>>>>>>> --- a/include/linux/entry-common.h >>>>>>>> +++ b/include/linux/entry-common.h >>>>>>>> @@ -203,7 +203,8 @@ static inline void local_irq_disable_exit_to_user(void); >>>>>>>> #ifndef local_irq_disable_exit_to_user >>>>>>>> static inline void local_irq_disable_exit_to_user(void) >>>>>>>> { >>>>>>>> - local_irq_disable_full(); >>>>>>>> + hard_cond_local_irq_disable(); >>>>>>>> + local_irq_disable(); >>>>>>>> } >>>>>>>> #endif >>>>>>> >>>>>>> Good spot, real issue here, just like the CPU_IDLE code deals with. The >>>>>>> fix looks good, but maybe we could do even better. >>>>>>> >>>>>>> The way local_irq_disable_full() works tends to introduce this issue in >>>>>>> a sneaky way, when the code path does not synchronize the interrupt log >>>>>>> (exit_to_user_mode_loop may be at fault in this case, which does not >>>>>>> traverses synchronize_pipeline()). Lack of synchronization would still >>>>>>> hit us if some trap handler turns hard irqs off -> on -> off the same >>>>>>> way, and we don't leave through the user exit path. >>>>>>> >>>>>>> Inverting the order of the actions in local_irq_disable_full() may be an >>>>>>> option (inband_irq_disable would allow that), making sure we cannot be >>>>>>> caught in the same issue when returning to kernel mode is another >>>>>>> way. This needs more thought I think. >>>>>>> >>>>>> >>>>>> So, always this way? >>>>>> >>>>>> local_irq_disable_full: >>>>>> hard_local_irq_disable >>>>>> local_irq_disable >>>>>> >>>>>> local_irq_enable_full: >>>>>> hard_local_irq_enable >>>>>> local_irq_enable >>>>>> >>>>> >>>>> Yes. >>>>> >>>>>> Or even flip local_irq_enable_full as well? 
>>>>> >>>>> Not this one, hard irqs must be on before local_irq_enable() is issued >>>>> otherwise we would trigger a debug assertion. The reason for such >>>>> assertion is that the I-pipe version of local_irq_enable() force enables >>>>> hard irqs on, and this is something we want to depart from because >>>>> whether the caller expects or not such side-effect is error-prone. For >>>>> this reason, inband_irq_enable() expects hard irqs on, which should be >>>>> the current hard irq state for the caller under normal circumstances. >>>>> >>>>>> >>>>>> Was there ever code that required ordering local_irq_disable_full the >>>>>> other way around? >>>>>> >>>>> >>>>> After review, I don't think so. The current order for >>>>> local_irq_disable_full() was rather determined by applying a reverse >>>>> sequence compared to local_irq_enable_full(). So this looks good. I need >>>>> to test this on the armv[7/8] side here first before merging. >>>>> >>>> >>>> Is there a commit somewhere that I can drop already? >>>> >>> >>> 1. These are fine with me in their latest iteration: >>> >>> irq_pipeline: Clean up stage_info field and users >>> irq_pipeline: Account for stage migration across faults >>> irq_pipeline: Warn when calling irqentry_enter with oob stalled >>> x86: dovetail: Fix TS flag reservation >>> >>> 2. This one should be replaced by a fix in local_irq_disable_full(), >>> pending some tests ongoing here: >>> >>> irq_pipeline: Prevent returning to user space with pending inband IRQs >>> >>> However, as I mentioned earlier, this may not be sufficient per se, we >>> may have to synchronize the in-band log when unstalling the stage on the >>> kernel exit path, i.e. the exit point fixed up by the "Account for stage >>> migration across faults" patch. 
This would address the following >>> scenario: >>> >>> irqentry_enter on kernel context: >>> stall_inband(); >>> trap_handler(); >>> hard_local_irq_enable() >>> hard_local_irq_disable() >>> if (state.stage_info == IRQENTRY_INBAND_UNSTALLED || >>> state.stage_info == IRQENTRY_OOB) { >>> unstall_inband(); >>> synchronize_pipeline(); >> >> Oh, unstall_inband does not synchronize? Because of >> hard_local_irq_disable or in general? >> > > Ah, found it - unstall != inband_irq_enable. Why not using the latter > directly? > Well, in this case I should have, hence the bug. -- Philippe. ^ permalink raw reply [flat|nested] 27+ messages in thread
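[Editor's note] The distinction Jan stumbled over — `unstall_inband()` being a bare bit flip while `inband_irq_enable()` also plays the log — can be sketched as follows. Again a toy model under stated assumptions: the hard-irq assertion mimics the instrumentation Philippe mentions, and `synchronize_pipeline()` is reduced to draining a counter.

```c
#include <assert.h>
#include <stdbool.h>

static bool hard_irqs_on = true;
static bool stalled = true;
static int  pending;   /* IRQs deferred into the in-band log */
static int  delivered;

static void synchronize_pipeline(void)
{
    while (pending > 0) {
        pending--;
        delivered++;
    }
}

/* unstall_inband(): only flips the virtual mask bit. Any IRQs
 * sitting in the log stay there until someone synchronizes. */
static void unstall_inband(void)
{
    stalled = false;
}

/* inband_irq_enable(): flips the bit *and* synchronizes the log.
 * It asserts that hard irqs are on, which is why it cannot be
 * called from the hard-irqs-off section in irqentry_exit(). */
static void inband_irq_enable(void)
{
    assert(hard_irqs_on);
    stalled = false;
    synchronize_pipeline();
}
```

A quick run makes the difference visible: after `unstall_inband()` the log is untouched; after `inband_irq_enable()` it is empty.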
* Re: [PATCH] irq_pipeline: Prevent returning to user space with pending inband IRQs 2021-06-07 13:44 ` Jan Kiszka 2021-06-07 13:48 ` Jan Kiszka @ 2021-06-07 14:36 ` Philippe Gerum 1 sibling, 0 replies; 27+ messages in thread From: Philippe Gerum @ 2021-06-07 14:36 UTC (permalink / raw) To: Jan Kiszka; +Cc: Xenomai Jan Kiszka <jan.kiszka@siemens.com> writes: > On 07.06.21 15:03, Philippe Gerum wrote: >> >> Jan Kiszka <jan.kiszka@siemens.com> writes: >> >>> On 07.06.21 09:17, Philippe Gerum wrote: >>>> >>>> Jan Kiszka <jan.kiszka@siemens.com> writes: >>>> >>>>> On 06.06.21 16:43, Philippe Gerum via Xenomai wrote: >>>>>> >>>>>> Jan Kiszka <jan.kiszka@web.de> writes: >>>>>> >>>>>>> From: Jan Kiszka <jan.kiszka@siemens.com> >>>>>>> >>>>>>> The correct order is hard IRQs off first, then stalling inband, see also >>>>>>> I-pipe. Otherwise, we risk to take an interrupt for the inband stage but >>>>>>> do not inject it before returning to user space. >>>>>>> >>>>>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> >>>>>>> --- >>>>>>> >>>>>>> Fixes the RCU stalls Florian reported, at least for me. >>>>>>> >>>>>>> I'm afraid this wasn't the last regression of translating I-pipe into >>>>>>> dovetail.. >>>>>>> >>>>>>> include/linux/entry-common.h | 3 ++- >>>>>>> 1 file changed, 2 insertions(+), 1 deletion(-) >>>>>>> >>>>>>> diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h >>>>>>> index 0fb45b2d6094..00540110985e 100644 >>>>>>> --- a/include/linux/entry-common.h >>>>>>> +++ b/include/linux/entry-common.h >>>>>>> @@ -203,7 +203,8 @@ static inline void local_irq_disable_exit_to_user(void); >>>>>>> #ifndef local_irq_disable_exit_to_user >>>>>>> static inline void local_irq_disable_exit_to_user(void) >>>>>>> { >>>>>>> - local_irq_disable_full(); >>>>>>> + hard_cond_local_irq_disable(); >>>>>>> + local_irq_disable(); >>>>>>> } >>>>>>> #endif >>>>>> >>>>>> Good spot, real issue here, just like the CPU_IDLE code deals with. 
The >>>>>> fix looks good, but maybe we could do even better. >>>>>> >>>>>> The way local_irq_disable_full() works tends to introduce this issue in >>>>>> a sneaky way, when the code path does not synchronize the interrupt log >>>>>> (exit_to_user_mode_loop may be at fault in this case, which does not >>>>>> traverses synchronize_pipeline()). Lack of synchronization would still >>>>>> hit us if some trap handler turns hard irqs off -> on -> off the same >>>>>> way, and we don't leave through the user exit path. >>>>>> >>>>>> Inverting the order of the actions in local_irq_disable_full() may be an >>>>>> option (inband_irq_disable would allow that), making sure we cannot be >>>>>> caught in the same issue when returning to kernel mode is another >>>>>> way. This needs more thought I think. >>>>>> >>>>> >>>>> So, always this way? >>>>> >>>>> local_irq_disable_full: >>>>> hard_local_irq_disable >>>>> local_irq_disable >>>>> >>>>> local_irq_enable_full: >>>>> hard_local_irq_enable >>>>> local_irq_enable >>>>> >>>> >>>> Yes. >>>> >>>>> Or even flip local_irq_enable_full as well? >>>> >>>> Not this one, hard irqs must be on before local_irq_enable() is issued >>>> otherwise we would trigger a debug assertion. The reason for such >>>> assertion is that the I-pipe version of local_irq_enable() force enables >>>> hard irqs on, and this is something we want to depart from because >>>> whether the caller expects or not such side-effect is error-prone. For >>>> this reason, inband_irq_enable() expects hard irqs on, which should be >>>> the current hard irq state for the caller under normal circumstances. >>>> >>>>> >>>>> Was there ever code that required ordering local_irq_disable_full the >>>>> other way around? >>>>> >>>> >>>> After review, I don't think so. The current order for >>>> local_irq_disable_full() was rather determined by applying a reverse >>>> sequence compared to local_irq_enable_full(). So this looks good. 
I need >>>> to test this on the armv[7/8] side here first before merging. >>>> >>> >>> Is there a commit somewhere that I can drop already? >>> >> >> 1. These are fine with me in their latest iteration: >> >> irq_pipeline: Clean up stage_info field and users >> irq_pipeline: Account for stage migration across faults >> irq_pipeline: Warn when calling irqentry_enter with oob stalled >> x86: dovetail: Fix TS flag reservation >> >> 2. This one should be replaced by a fix in local_irq_disable_full(), >> pending some tests ongoing here: >> >> irq_pipeline: Prevent returning to user space with pending inband IRQs >> >> However, as I mentioned earlier, this may not be sufficient per se, we >> may have to synchronize the in-band log when unstalling the stage on the >> kernel exit path, i.e. the exit point fixed up by the "Account for stage >> migration across faults" patch. This would address the following >> scenario: >> >> irqentry_enter on kernel context: >> stall_inband(); >> trap_handler(); >> hard_local_irq_enable() >> hard_local_irq_disable() >> if (state.stage_info == IRQENTRY_INBAND_UNSTALLED || >> state.stage_info == IRQENTRY_OOB) { >> unstall_inband(); >> synchronize_pipeline(); > > Oh, unstall_inband does not synchronize? Because of > hard_local_irq_disable or in general? > In general, this is only a bit flip. The caller has to do the right thing afterwards, e.g. inband_irq_enable() or inband_irq_restore() do so. -- Philippe. ^ permalink raw reply [flat|nested] 27+ messages in thread
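[Editor's note] The orderings the thread converges on — hard irqs first on disable, hard irqs first on enable — can be summed up in one sketch. This is an illustrative model, not the kernel code: the `assert` in `inband_irq_enable()` stands in for the debug assertion that would fire if `local_irq_enable_full()` flipped its two steps.

```c
#include <assert.h>
#include <stdbool.h>

static bool hard_on = true;
static bool stalled;
static int  pending;   /* IRQs deferred into the in-band log */
static int  delivered;

static void inband_irq_enable(void)
{
    assert(hard_on);   /* must not run with hard irqs off */
    stalled = false;
    while (pending > 0) {
        pending--;
        delivered++;
    }
}

/* Agreed-upon ordering: hard irqs off first, then the virtual
 * mask, closing the window in which an IRQ could be logged with
 * no synchronization point left. */
static void local_irq_disable_full(void)
{
    hard_on = false;
    stalled = true;
}

/* Enable keeps its original order: hard irqs on first, so that
 * inband_irq_enable() may synchronize the log. Flipping this
 * would trip the assertion above. */
static void local_irq_enable_full(void)
{
    hard_on = true;
    inband_irq_enable();
}
```

Injecting a pending IRQ while fully disabled and then re-enabling shows the log is drained exactly once, on the enable path.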
end of thread, other threads:[~2021-06-11 15:39 UTC | newest] Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2021-06-06 12:35 [PATCH] irq_pipeline: Prevent returning to user space with pending inband IRQs Jan Kiszka 2021-06-06 14:43 ` Philippe Gerum 2021-06-07 5:53 ` Jan Kiszka 2021-06-07 7:17 ` Philippe Gerum 2021-06-07 10:22 ` Jan Kiszka 2021-06-07 13:03 ` Philippe Gerum 2021-06-07 13:28 ` Philippe Gerum 2021-06-07 13:29 ` Philippe Gerum 2021-06-07 13:40 ` Jan Kiszka 2021-06-07 15:11 ` Philippe Gerum 2021-06-07 13:44 ` Jan Kiszka 2021-06-07 13:48 ` Jan Kiszka 2021-06-07 13:58 ` Jan Kiszka 2021-06-07 14:44 ` Philippe Gerum 2021-06-07 14:52 ` Jan Kiszka 2021-06-07 15:04 ` Philippe Gerum 2021-06-07 15:38 ` Jan Kiszka 2021-06-07 16:19 ` Philippe Gerum 2021-06-07 16:35 ` Jan Kiszka 2021-06-08 16:50 ` Philippe Gerum 2021-06-08 17:39 ` Philippe Gerum 2021-06-09 5:34 ` Jan Kiszka 2021-06-09 6:12 ` Philippe Gerum 2021-06-11 7:15 ` Jan Kiszka 2021-06-11 15:39 ` Philippe Gerum 2021-06-07 14:37 ` Philippe Gerum 2021-06-07 14:36 ` Philippe Gerum