* [PATCH] irq_pipeline: Prevent returning to user space with pending inband IRQs @ 2021-06-06 12:35 Jan Kiszka 2021-06-06 14:43 ` Philippe Gerum 0 siblings, 1 reply; 27+ messages in thread From: Jan Kiszka @ 2021-06-06 12:35 UTC (permalink / raw) To: Xenomai, Philippe Gerum From: Jan Kiszka <jan.kiszka@siemens.com> The correct order is hard IRQs off first, then stalling inband, see also I-pipe. Otherwise, we risk to take an interrupt for the inband stage but do not inject it before returning to user space. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> --- Fixes the RCU stalls Florian reported, at least for me. I'm afraid this wasn't the last regression of translating I-pipe into dovetail.. include/linux/entry-common.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h index 0fb45b2d6094..00540110985e 100644 --- a/include/linux/entry-common.h +++ b/include/linux/entry-common.h @@ -203,7 +203,8 @@ static inline void local_irq_disable_exit_to_user(void); #ifndef local_irq_disable_exit_to_user static inline void local_irq_disable_exit_to_user(void) { - local_irq_disable_full(); + hard_cond_local_irq_disable(); + local_irq_disable(); } #endif -- 2.26.2 ^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH] irq_pipeline: Prevent returning to user space with pending inband IRQs 2021-06-06 12:35 [PATCH] irq_pipeline: Prevent returning to user space with pending inband IRQs Jan Kiszka @ 2021-06-06 14:43 ` Philippe Gerum 2021-06-07 5:53 ` Jan Kiszka 0 siblings, 1 reply; 27+ messages in thread From: Philippe Gerum @ 2021-06-06 14:43 UTC (permalink / raw) To: Jan Kiszka; +Cc: Xenomai, Bezdeka, Florian (T RDA IOT SES-DE) Jan Kiszka <jan.kiszka@web.de> writes: > From: Jan Kiszka <jan.kiszka@siemens.com> > > The correct order is hard IRQs off first, then stalling inband, see also > I-pipe. Otherwise, we risk to take an interrupt for the inband stage but > do not inject it before returning to user space. > > Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> > --- > > Fixes the RCU stalls Florian reported, at least for me. > > I'm afraid this wasn't the last regression of translating I-pipe into > dovetail.. > > include/linux/entry-common.h | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h > index 0fb45b2d6094..00540110985e 100644 > --- a/include/linux/entry-common.h > +++ b/include/linux/entry-common.h > @@ -203,7 +203,8 @@ static inline void local_irq_disable_exit_to_user(void); > #ifndef local_irq_disable_exit_to_user > static inline void local_irq_disable_exit_to_user(void) > { > - local_irq_disable_full(); > + hard_cond_local_irq_disable(); > + local_irq_disable(); > } > #endif Good spot, real issue here, just like the CPU_IDLE code deals with. The fix looks good, but maybe we could do even better. The way local_irq_disable_full() works tends to introduce this issue in a sneaky way, when the code path does not synchronize the interrupt log (exit_to_user_mode_loop may be at fault in this case, which does not traverse synchronize_pipeline()).
Lack of synchronization would still hit us if some trap handler turns hard irqs off -> on -> off the same way and we don't leave through the user exit path. Inverting the order of the actions in local_irq_disable_full() may be one option (inband_irq_disable would allow that); making sure we cannot be caught by the same issue when returning to kernel mode is another. This needs more thought, I think. -- Philippe.
* Re: [PATCH] irq_pipeline: Prevent returning to user space with pending inband IRQs 2021-06-06 14:43 ` Philippe Gerum @ 2021-06-07 5:53 ` Jan Kiszka 2021-06-07 7:17 ` Philippe Gerum 0 siblings, 1 reply; 27+ messages in thread From: Jan Kiszka @ 2021-06-07 5:53 UTC (permalink / raw) To: Philippe Gerum; +Cc: Xenomai On 06.06.21 16:43, Philippe Gerum via Xenomai wrote: > > Jan Kiszka <jan.kiszka@web.de> writes: > >> From: Jan Kiszka <jan.kiszka@siemens.com> >> >> The correct order is hard IRQs off first, then stalling inband, see also >> I-pipe. Otherwise, we risk to take an interrupt for the inband stage but >> do not inject it before returning to user space. >> >> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> >> --- >> >> Fixes the RCU stalls Florian reported, at least for me. >> >> I'm afraid this wasn't the last regression of translating I-pipe into >> dovetail.. >> >> include/linux/entry-common.h | 3 ++- >> 1 file changed, 2 insertions(+), 1 deletion(-) >> >> diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h >> index 0fb45b2d6094..00540110985e 100644 >> --- a/include/linux/entry-common.h >> +++ b/include/linux/entry-common.h >> @@ -203,7 +203,8 @@ static inline void local_irq_disable_exit_to_user(void); >> #ifndef local_irq_disable_exit_to_user >> static inline void local_irq_disable_exit_to_user(void) >> { >> - local_irq_disable_full(); >> + hard_cond_local_irq_disable(); >> + local_irq_disable(); >> } >> #endif > > Good spot, real issue here, just like the CPU_IDLE code deals with. The > fix looks good, but maybe we could do even better. > > The way local_irq_disable_full() works tends to introduce this issue in > a sneaky way, when the code path does not synchronize the interrupt log > (exit_to_user_mode_loop may be at fault in this case, which does not > traverses synchronize_pipeline()). 
Lack of synchronization would still > hit us if some trap handler turns hard irqs off -> on -> off the same > way, and we don't leave through the user exit path. > > Inverting the order of the actions in local_irq_disable_full() may be an > option (inband_irq_disable would allow that), making sure we cannot be > caught in the same issue when returning to kernel mode is another > way. This needs more thought I think. > So, always this way? local_irq_disable_full: hard_local_irq_disable local_irq_disable local_irq_enable_full: hard_local_irq_enable local_irq_enable Or even flip local_irq_enable_full as well? Was there ever code that required ordering local_irq_disable_full the other way around? Jan -- Siemens AG, T RDA IOT Corporate Competence Center Embedded Linux ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] irq_pipeline: Prevent returning to user space with pending inband IRQs 2021-06-07 5:53 ` Jan Kiszka @ 2021-06-07 7:17 ` Philippe Gerum 2021-06-07 10:22 ` Jan Kiszka 0 siblings, 1 reply; 27+ messages in thread From: Philippe Gerum @ 2021-06-07 7:17 UTC (permalink / raw) To: Jan Kiszka; +Cc: Xenomai Jan Kiszka <jan.kiszka@siemens.com> writes: > On 06.06.21 16:43, Philippe Gerum via Xenomai wrote: >> >> Jan Kiszka <jan.kiszka@web.de> writes: >> >>> From: Jan Kiszka <jan.kiszka@siemens.com> >>> >>> The correct order is hard IRQs off first, then stalling inband, see also >>> I-pipe. Otherwise, we risk to take an interrupt for the inband stage but >>> do not inject it before returning to user space. >>> >>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> >>> --- >>> >>> Fixes the RCU stalls Florian reported, at least for me. >>> >>> I'm afraid this wasn't the last regression of translating I-pipe into >>> dovetail.. >>> >>> include/linux/entry-common.h | 3 ++- >>> 1 file changed, 2 insertions(+), 1 deletion(-) >>> >>> diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h >>> index 0fb45b2d6094..00540110985e 100644 >>> --- a/include/linux/entry-common.h >>> +++ b/include/linux/entry-common.h >>> @@ -203,7 +203,8 @@ static inline void local_irq_disable_exit_to_user(void); >>> #ifndef local_irq_disable_exit_to_user >>> static inline void local_irq_disable_exit_to_user(void) >>> { >>> - local_irq_disable_full(); >>> + hard_cond_local_irq_disable(); >>> + local_irq_disable(); >>> } >>> #endif >> >> Good spot, real issue here, just like the CPU_IDLE code deals with. The >> fix looks good, but maybe we could do even better. >> >> The way local_irq_disable_full() works tends to introduce this issue in >> a sneaky way, when the code path does not synchronize the interrupt log >> (exit_to_user_mode_loop may be at fault in this case, which does not >> traverses synchronize_pipeline()). 
Lack of synchronization would still >> hit us if some trap handler turns hard irqs off -> on -> off the same >> way, and we don't leave through the user exit path. >> >> Inverting the order of the actions in local_irq_disable_full() may be an >> option (inband_irq_disable would allow that), making sure we cannot be >> caught in the same issue when returning to kernel mode is another >> way. This needs more thought I think. >> > > So, always this way? > > local_irq_disable_full: > hard_local_irq_disable > local_irq_disable > > local_irq_enable_full: > hard_local_irq_enable > local_irq_enable > Yes. > Or even flip local_irq_enable_full as well? Not this one, hard irqs must be on before local_irq_enable() is issued otherwise we would trigger a debug assertion. The reason for such assertion is that the I-pipe version of local_irq_enable() force enables hard irqs on, and this is something we want to depart from because whether the caller expects or not such side-effect is error-prone. For this reason, inband_irq_enable() expects hard irqs on, which should be the current hard irq state for the caller under normal circumstances. > > Was there ever code that required ordering local_irq_disable_full the > other way around? > After review, I don't think so. The current order for local_irq_disable_full() was rather determined by applying a reverse sequence compared to local_irq_enable_full(). So this looks good. I need to test this on the armv[7/8] side here first before merging. -- Philippe. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] irq_pipeline: Prevent returning to user space with pending inband IRQs 2021-06-07 7:17 ` Philippe Gerum @ 2021-06-07 10:22 ` Jan Kiszka 2021-06-07 13:03 ` Philippe Gerum 0 siblings, 1 reply; 27+ messages in thread From: Jan Kiszka @ 2021-06-07 10:22 UTC (permalink / raw) To: Philippe Gerum; +Cc: Xenomai On 07.06.21 09:17, Philippe Gerum wrote: > > Jan Kiszka <jan.kiszka@siemens.com> writes: > >> On 06.06.21 16:43, Philippe Gerum via Xenomai wrote: >>> >>> Jan Kiszka <jan.kiszka@web.de> writes: >>> >>>> From: Jan Kiszka <jan.kiszka@siemens.com> >>>> >>>> The correct order is hard IRQs off first, then stalling inband, see also >>>> I-pipe. Otherwise, we risk to take an interrupt for the inband stage but >>>> do not inject it before returning to user space. >>>> >>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> >>>> --- >>>> >>>> Fixes the RCU stalls Florian reported, at least for me. >>>> >>>> I'm afraid this wasn't the last regression of translating I-pipe into >>>> dovetail.. >>>> >>>> include/linux/entry-common.h | 3 ++- >>>> 1 file changed, 2 insertions(+), 1 deletion(-) >>>> >>>> diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h >>>> index 0fb45b2d6094..00540110985e 100644 >>>> --- a/include/linux/entry-common.h >>>> +++ b/include/linux/entry-common.h >>>> @@ -203,7 +203,8 @@ static inline void local_irq_disable_exit_to_user(void); >>>> #ifndef local_irq_disable_exit_to_user >>>> static inline void local_irq_disable_exit_to_user(void) >>>> { >>>> - local_irq_disable_full(); >>>> + hard_cond_local_irq_disable(); >>>> + local_irq_disable(); >>>> } >>>> #endif >>> >>> Good spot, real issue here, just like the CPU_IDLE code deals with. The >>> fix looks good, but maybe we could do even better. 
>>> >>> The way local_irq_disable_full() works tends to introduce this issue in >>> a sneaky way, when the code path does not synchronize the interrupt log >>> (exit_to_user_mode_loop may be at fault in this case, which does not >>> traverses synchronize_pipeline()). Lack of synchronization would still >>> hit us if some trap handler turns hard irqs off -> on -> off the same >>> way, and we don't leave through the user exit path. >>> >>> Inverting the order of the actions in local_irq_disable_full() may be an >>> option (inband_irq_disable would allow that), making sure we cannot be >>> caught in the same issue when returning to kernel mode is another >>> way. This needs more thought I think. >>> >> >> So, always this way? >> >> local_irq_disable_full: >> hard_local_irq_disable >> local_irq_disable >> >> local_irq_enable_full: >> hard_local_irq_enable >> local_irq_enable >> > > Yes. > >> Or even flip local_irq_enable_full as well? > > Not this one, hard irqs must be on before local_irq_enable() is issued > otherwise we would trigger a debug assertion. The reason for such > assertion is that the I-pipe version of local_irq_enable() force enables > hard irqs on, and this is something we want to depart from because > whether the caller expects or not such side-effect is error-prone. For > this reason, inband_irq_enable() expects hard irqs on, which should be > the current hard irq state for the caller under normal circumstances. > >> >> Was there ever code that required ordering local_irq_disable_full the >> other way around? >> > > After review, I don't think so. The current order for > local_irq_disable_full() was rather determined by applying a reverse > sequence compared to local_irq_enable_full(). So this looks good. I need > to test this on the armv[7/8] side here first before merging. > Is there a commit somewhere that I can drop already? 
Jan -- Siemens AG, T RDA IOT Corporate Competence Center Embedded Linux
* Re: [PATCH] irq_pipeline: Prevent returning to user space with pending inband IRQs 2021-06-07 10:22 ` Jan Kiszka @ 2021-06-07 13:03 ` Philippe Gerum 2021-06-07 13:28 ` Philippe Gerum 2021-06-07 13:44 ` Jan Kiszka 0 siblings, 2 replies; 27+ messages in thread From: Philippe Gerum @ 2021-06-07 13:03 UTC (permalink / raw) To: Jan Kiszka; +Cc: Xenomai Jan Kiszka <jan.kiszka@siemens.com> writes: > On 07.06.21 09:17, Philippe Gerum wrote: >> >> Jan Kiszka <jan.kiszka@siemens.com> writes: >> >>> On 06.06.21 16:43, Philippe Gerum via Xenomai wrote: >>>> >>>> Jan Kiszka <jan.kiszka@web.de> writes: >>>> >>>>> From: Jan Kiszka <jan.kiszka@siemens.com> >>>>> >>>>> The correct order is hard IRQs off first, then stalling inband, see also >>>>> I-pipe. Otherwise, we risk to take an interrupt for the inband stage but >>>>> do not inject it before returning to user space. >>>>> >>>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> >>>>> --- >>>>> >>>>> Fixes the RCU stalls Florian reported, at least for me. >>>>> >>>>> I'm afraid this wasn't the last regression of translating I-pipe into >>>>> dovetail.. >>>>> >>>>> include/linux/entry-common.h | 3 ++- >>>>> 1 file changed, 2 insertions(+), 1 deletion(-) >>>>> >>>>> diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h >>>>> index 0fb45b2d6094..00540110985e 100644 >>>>> --- a/include/linux/entry-common.h >>>>> +++ b/include/linux/entry-common.h >>>>> @@ -203,7 +203,8 @@ static inline void local_irq_disable_exit_to_user(void); >>>>> #ifndef local_irq_disable_exit_to_user >>>>> static inline void local_irq_disable_exit_to_user(void) >>>>> { >>>>> - local_irq_disable_full(); >>>>> + hard_cond_local_irq_disable(); >>>>> + local_irq_disable(); >>>>> } >>>>> #endif >>>> >>>> Good spot, real issue here, just like the CPU_IDLE code deals with. The >>>> fix looks good, but maybe we could do even better. 
>>>> >>>> The way local_irq_disable_full() works tends to introduce this issue in >>>> a sneaky way, when the code path does not synchronize the interrupt log >>>> (exit_to_user_mode_loop may be at fault in this case, which does not >>>> traverses synchronize_pipeline()). Lack of synchronization would still >>>> hit us if some trap handler turns hard irqs off -> on -> off the same >>>> way, and we don't leave through the user exit path. >>>> >>>> Inverting the order of the actions in local_irq_disable_full() may be an >>>> option (inband_irq_disable would allow that), making sure we cannot be >>>> caught in the same issue when returning to kernel mode is another >>>> way. This needs more thought I think. >>>> >>> >>> So, always this way? >>> >>> local_irq_disable_full: >>> hard_local_irq_disable >>> local_irq_disable >>> >>> local_irq_enable_full: >>> hard_local_irq_enable >>> local_irq_enable >>> >> >> Yes. >> >>> Or even flip local_irq_enable_full as well? >> >> Not this one, hard irqs must be on before local_irq_enable() is issued >> otherwise we would trigger a debug assertion. The reason for such >> assertion is that the I-pipe version of local_irq_enable() force enables >> hard irqs on, and this is something we want to depart from because >> whether the caller expects or not such side-effect is error-prone. For >> this reason, inband_irq_enable() expects hard irqs on, which should be >> the current hard irq state for the caller under normal circumstances. >> >>> >>> Was there ever code that required ordering local_irq_disable_full the >>> other way around? >>> >> >> After review, I don't think so. The current order for >> local_irq_disable_full() was rather determined by applying a reverse >> sequence compared to local_irq_enable_full(). So this looks good. I need >> to test this on the armv[7/8] side here first before merging. >> > > Is there a commit somewhere that I can drop already? > 1. 
These are fine with me in their latest iteration: irq_pipeline: Clean up stage_info field and users irq_pipeline: Account for stage migration across faults irq_pipeline: Warn when calling irqentry_enter with oob stalled x86: dovetail: Fix TS flag reservation 2. This one should be replaced by a fix in local_irq_disable_full(), pending some tests ongoing here: irq_pipeline: Prevent returning to user space with pending inband IRQs However, as I mentioned earlier, this may not be sufficient per se, we may have to synchronize the in-band log when unstalling the stage on the kernel exit path, i.e. the exit point fixed up by the "Account for stage migration across faults" patch. This would address the following scenario: irqentry_enter on kernel context: stall_inband(); trap_handler(); hard_local_irq_enable() hard_local_irq_disable() if (state.stage_info == IRQENTRY_INBAND_UNSTALLED || state.stage_info == IRQENTRY_OOB) { unstall_inband(); synchronize_pipeline(); } 3. I believe local_irq_restore_full() is fragile, it mixes hard and virtual irq states carelessly. I'll likely issue a patch for this one after testing. -- Philippe. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] irq_pipeline: Prevent returning to user space with pending inband IRQs 2021-06-07 13:03 ` Philippe Gerum @ 2021-06-07 13:28 ` Philippe Gerum 2021-06-07 13:29 ` Philippe Gerum 2021-06-07 13:44 ` Jan Kiszka 0 siblings, 1 reply; 27+ messages in thread From: Philippe Gerum @ 2021-06-07 13:28 UTC (permalink / raw) To: Jan Kiszka; +Cc: Xenomai Philippe Gerum <rpm@xenomai.org> writes: > > 2. This one should be replaced by a fix in local_irq_disable_full(), > pending some tests ongoing here: > > irq_pipeline: Prevent returning to user space with pending inband IRQs > Tests went ok. So [1] should be replaced by a patch inverting the order in local_irq_disable_full() as we discussed. [1] irq_pipeline: Prevent returning to user space with pending inband IRQs -- Philippe.
* Re: [PATCH] irq_pipeline: Prevent returning to user space with pending inband IRQs 2021-06-07 13:28 ` Philippe Gerum @ 2021-06-07 13:29 ` Philippe Gerum 2021-06-07 13:40 ` Jan Kiszka 0 siblings, 1 reply; 27+ messages in thread From: Philippe Gerum @ 2021-06-07 13:29 UTC (permalink / raw) To: Jan Kiszka; +Cc: Xenomai Philippe Gerum <rpm@xenomai.org> writes: > Philippe Gerum <rpm@xenomai.org> writes: >> >> 2. This one should be replaced by a fix in local_irq_disable_full(), >> pending some tests ongoing here: >> >> irq_pipeline: Prevent returning to user space with pending inband IRQs >> > > Tests went ok. So [1] should be replaced by a patch inverting the order > in local_irq_disable_full() as we discussed. > > [1] irq_pipeline: Prevent returning to user space with pending inband IRQs Likewise for local_irq_save_full() which may cause the same problem, i.e.: diff --git a/include/linux/irqflags.h b/include/linux/irqflags.h index 8663f19cc7b7b68..051c72751db42d1 100644 --- a/include/linux/irqflags.h +++ b/include/linux/irqflags.h @@ -242,14 +242,14 @@ do { \ #define local_irq_disable_full() \ do { \ - local_irq_disable(); \ hard_local_irq_disable(); \ + local_irq_disable(); \ } while (0) #define local_irq_save_full(__flags) \ do { \ - local_irq_save(__flags); \ hard_local_irq_disable(); \ + local_irq_save(__flags); \ } while (0) #define local_irq_restore_full(__flags) \ -- Philippe. ^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH] irq_pipeline: Prevent returning to user space with pending inband IRQs 2021-06-07 13:29 ` Philippe Gerum @ 2021-06-07 13:40 ` Jan Kiszka 2021-06-07 15:11 ` Philippe Gerum 0 siblings, 1 reply; 27+ messages in thread From: Jan Kiszka @ 2021-06-07 13:40 UTC (permalink / raw) To: Philippe Gerum; +Cc: Xenomai On 07.06.21 15:29, Philippe Gerum wrote: > > Philippe Gerum <rpm@xenomai.org> writes: > >> Philippe Gerum <rpm@xenomai.org> writes: >>> >>> 2. This one should be replaced by a fix in local_irq_disable_full(), >>> pending some tests ongoing here: >>> >>> irq_pipeline: Prevent returning to user space with pending inband IRQs >>> >> >> Tests went ok. So [1] should be replaced by a patch inverting the order >> in local_irq_disable_full() as we discussed. >> >> [1] irq_pipeline: Prevent returning to user space with pending inband IRQs > > Likewise for local_irq_save_full() which may cause the same problem, > i.e.: > > diff --git a/include/linux/irqflags.h b/include/linux/irqflags.h > index 8663f19cc7b7b68..051c72751db42d1 100644 > --- a/include/linux/irqflags.h > +++ b/include/linux/irqflags.h > @@ -242,14 +242,14 @@ do { \ > > #define local_irq_disable_full() \ > do { \ > - local_irq_disable(); \ > hard_local_irq_disable(); \ > + local_irq_disable(); \ > } while (0) > > #define local_irq_save_full(__flags) \ > do { \ > - local_irq_save(__flags); \ > hard_local_irq_disable(); \ > + local_irq_save(__flags); \ > } while (0) > > #define local_irq_restore_full(__flags) \ > > As you tested already, I assume you will make the corresponding commit available via -rebase. Let me know otherwise if I should do that. Jan -- Siemens AG, T RDA IOT Corporate Competence Center Embedded Linux ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] irq_pipeline: Prevent returning to user space with pending inband IRQs 2021-06-07 13:40 ` Jan Kiszka @ 2021-06-07 15:11 ` Philippe Gerum 0 siblings, 0 replies; 27+ messages in thread From: Philippe Gerum @ 2021-06-07 15:11 UTC (permalink / raw) To: Jan Kiszka; +Cc: Xenomai Jan Kiszka <jan.kiszka@siemens.com> writes: > On 07.06.21 15:29, Philippe Gerum wrote: >> >> Philippe Gerum <rpm@xenomai.org> writes: >> >>> Philippe Gerum <rpm@xenomai.org> writes: >>>> >>>> 2. This one should be replaced by a fix in local_irq_disable_full(), >>>> pending some tests ongoing here: >>>> >>>> irq_pipeline: Prevent returning to user space with pending inband IRQs >>>> >>> >>> Tests went ok. So [1] should be replaced by a patch inverting the order >>> in local_irq_disable_full() as we discussed. >>> >>> [1] irq_pipeline: Prevent returning to user space with pending inband IRQs >> >> Likewise for local_irq_save_full() which may cause the same problem, >> i.e.: >> >> diff --git a/include/linux/irqflags.h b/include/linux/irqflags.h >> index 8663f19cc7b7b68..051c72751db42d1 100644 >> --- a/include/linux/irqflags.h >> +++ b/include/linux/irqflags.h >> @@ -242,14 +242,14 @@ do { \ >> >> #define local_irq_disable_full() \ >> do { \ >> - local_irq_disable(); \ >> hard_local_irq_disable(); \ >> + local_irq_disable(); \ >> } while (0) >> >> #define local_irq_save_full(__flags) \ >> do { \ >> - local_irq_save(__flags); \ >> hard_local_irq_disable(); \ >> + local_irq_save(__flags); \ >> } while (0) >> >> #define local_irq_restore_full(__flags) \ >> >> > > As you tested already, I assume you will make the corresponding commit > available via -rebase. Let me know otherwise if I should do that. > > Jan Yes, I'll do that. -- Philippe. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] irq_pipeline: Prevent returning to user space with pending inband IRQs 2021-06-07 13:03 ` Philippe Gerum 2021-06-07 13:28 ` Philippe Gerum @ 2021-06-07 13:44 ` Jan Kiszka 2021-06-07 13:48 ` Jan Kiszka 2021-06-07 14:36 ` Philippe Gerum 1 sibling, 2 replies; 27+ messages in thread From: Jan Kiszka @ 2021-06-07 13:44 UTC (permalink / raw) To: Philippe Gerum; +Cc: Xenomai On 07.06.21 15:03, Philippe Gerum wrote: > > Jan Kiszka <jan.kiszka@siemens.com> writes: > >> On 07.06.21 09:17, Philippe Gerum wrote: >>> >>> Jan Kiszka <jan.kiszka@siemens.com> writes: >>> >>>> On 06.06.21 16:43, Philippe Gerum via Xenomai wrote: >>>>> >>>>> Jan Kiszka <jan.kiszka@web.de> writes: >>>>> >>>>>> From: Jan Kiszka <jan.kiszka@siemens.com> >>>>>> >>>>>> The correct order is hard IRQs off first, then stalling inband, see also >>>>>> I-pipe. Otherwise, we risk to take an interrupt for the inband stage but >>>>>> do not inject it before returning to user space. >>>>>> >>>>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> >>>>>> --- >>>>>> >>>>>> Fixes the RCU stalls Florian reported, at least for me. >>>>>> >>>>>> I'm afraid this wasn't the last regression of translating I-pipe into >>>>>> dovetail.. >>>>>> >>>>>> include/linux/entry-common.h | 3 ++- >>>>>> 1 file changed, 2 insertions(+), 1 deletion(-) >>>>>> >>>>>> diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h >>>>>> index 0fb45b2d6094..00540110985e 100644 >>>>>> --- a/include/linux/entry-common.h >>>>>> +++ b/include/linux/entry-common.h >>>>>> @@ -203,7 +203,8 @@ static inline void local_irq_disable_exit_to_user(void); >>>>>> #ifndef local_irq_disable_exit_to_user >>>>>> static inline void local_irq_disable_exit_to_user(void) >>>>>> { >>>>>> - local_irq_disable_full(); >>>>>> + hard_cond_local_irq_disable(); >>>>>> + local_irq_disable(); >>>>>> } >>>>>> #endif >>>>> >>>>> Good spot, real issue here, just like the CPU_IDLE code deals with. 
The >>>>> fix looks good, but maybe we could do even better. >>>>> >>>>> The way local_irq_disable_full() works tends to introduce this issue in >>>>> a sneaky way, when the code path does not synchronize the interrupt log >>>>> (exit_to_user_mode_loop may be at fault in this case, which does not >>>>> traverses synchronize_pipeline()). Lack of synchronization would still >>>>> hit us if some trap handler turns hard irqs off -> on -> off the same >>>>> way, and we don't leave through the user exit path. >>>>> >>>>> Inverting the order of the actions in local_irq_disable_full() may be an >>>>> option (inband_irq_disable would allow that), making sure we cannot be >>>>> caught in the same issue when returning to kernel mode is another >>>>> way. This needs more thought I think. >>>>> >>>> >>>> So, always this way? >>>> >>>> local_irq_disable_full: >>>> hard_local_irq_disable >>>> local_irq_disable >>>> >>>> local_irq_enable_full: >>>> hard_local_irq_enable >>>> local_irq_enable >>>> >>> >>> Yes. >>> >>>> Or even flip local_irq_enable_full as well? >>> >>> Not this one, hard irqs must be on before local_irq_enable() is issued >>> otherwise we would trigger a debug assertion. The reason for such >>> assertion is that the I-pipe version of local_irq_enable() force enables >>> hard irqs on, and this is something we want to depart from because >>> whether the caller expects or not such side-effect is error-prone. For >>> this reason, inband_irq_enable() expects hard irqs on, which should be >>> the current hard irq state for the caller under normal circumstances. >>> >>>> >>>> Was there ever code that required ordering local_irq_disable_full the >>>> other way around? >>>> >>> >>> After review, I don't think so. The current order for >>> local_irq_disable_full() was rather determined by applying a reverse >>> sequence compared to local_irq_enable_full(). So this looks good. I need >>> to test this on the armv[7/8] side here first before merging. 
>>> >> >> Is there a commit somewhere that I can drop already? >> > > 1. These are fine with me in their latest iteration: > > irq_pipeline: Clean up stage_info field and users > irq_pipeline: Account for stage migration across faults > irq_pipeline: Warn when calling irqentry_enter with oob stalled > x86: dovetail: Fix TS flag reservation > > 2. This one should be replaced by a fix in local_irq_disable_full(), > pending some tests ongoing here: > > irq_pipeline: Prevent returning to user space with pending inband IRQs > > However, as I mentioned earlier, this may not be sufficient per se, we > may have to synchronize the in-band log when unstalling the stage on the > kernel exit path, i.e. the exit point fixed up by the "Account for stage > migration across faults" patch. This would address the following > scenario: > > irqentry_enter on kernel context: > stall_inband(); > trap_handler(); > hard_local_irq_enable() > hard_local_irq_disable() > if (state.stage_info == IRQENTRY_INBAND_UNSTALLED || > state.stage_info == IRQENTRY_OOB) { > unstall_inband(); > synchronize_pipeline(); Oh, unstall_inband does not synchronize? Because of hard_local_irq_disable or in general? Jan -- Siemens AG, T RDA IOT Corporate Competence Center Embedded Linux ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] irq_pipeline: Prevent returning to user space with pending inband IRQs 2021-06-07 13:44 ` Jan Kiszka @ 2021-06-07 13:48 ` Jan Kiszka 2021-06-07 13:58 ` Jan Kiszka 2021-06-07 14:37 ` Philippe Gerum 2021-06-07 14:36 ` Philippe Gerum 1 sibling, 2 replies; 27+ messages in thread From: Jan Kiszka @ 2021-06-07 13:48 UTC (permalink / raw) To: Philippe Gerum; +Cc: Xenomai On 07.06.21 15:44, Jan Kiszka wrote: > On 07.06.21 15:03, Philippe Gerum wrote: >> >> Jan Kiszka <jan.kiszka@siemens.com> writes: >> >>> On 07.06.21 09:17, Philippe Gerum wrote: >>>> >>>> Jan Kiszka <jan.kiszka@siemens.com> writes: >>>> >>>>> On 06.06.21 16:43, Philippe Gerum via Xenomai wrote: >>>>>> >>>>>> Jan Kiszka <jan.kiszka@web.de> writes: >>>>>> >>>>>>> From: Jan Kiszka <jan.kiszka@siemens.com> >>>>>>> >>>>>>> The correct order is hard IRQs off first, then stalling inband, see also >>>>>>> I-pipe. Otherwise, we risk to take an interrupt for the inband stage but >>>>>>> do not inject it before returning to user space. >>>>>>> >>>>>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> >>>>>>> --- >>>>>>> >>>>>>> Fixes the RCU stalls Florian reported, at least for me. >>>>>>> >>>>>>> I'm afraid this wasn't the last regression of translating I-pipe into >>>>>>> dovetail.. 
>>>>>>> >>>>>>> include/linux/entry-common.h | 3 ++- >>>>>>> 1 file changed, 2 insertions(+), 1 deletion(-) >>>>>>> >>>>>>> diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h >>>>>>> index 0fb45b2d6094..00540110985e 100644 >>>>>>> --- a/include/linux/entry-common.h >>>>>>> +++ b/include/linux/entry-common.h >>>>>>> @@ -203,7 +203,8 @@ static inline void local_irq_disable_exit_to_user(void); >>>>>>> #ifndef local_irq_disable_exit_to_user >>>>>>> static inline void local_irq_disable_exit_to_user(void) >>>>>>> { >>>>>>> - local_irq_disable_full(); >>>>>>> + hard_cond_local_irq_disable(); >>>>>>> + local_irq_disable(); >>>>>>> } >>>>>>> #endif >>>>>> >>>>>> Good spot, real issue here, just like the CPU_IDLE code deals with. The >>>>>> fix looks good, but maybe we could do even better. >>>>>> >>>>>> The way local_irq_disable_full() works tends to introduce this issue in >>>>>> a sneaky way, when the code path does not synchronize the interrupt log >>>>>> (exit_to_user_mode_loop may be at fault in this case, which does not >>>>>> traverses synchronize_pipeline()). Lack of synchronization would still >>>>>> hit us if some trap handler turns hard irqs off -> on -> off the same >>>>>> way, and we don't leave through the user exit path. >>>>>> >>>>>> Inverting the order of the actions in local_irq_disable_full() may be an >>>>>> option (inband_irq_disable would allow that), making sure we cannot be >>>>>> caught in the same issue when returning to kernel mode is another >>>>>> way. This needs more thought I think. >>>>>> >>>>> >>>>> So, always this way? >>>>> >>>>> local_irq_disable_full: >>>>> hard_local_irq_disable >>>>> local_irq_disable >>>>> >>>>> local_irq_enable_full: >>>>> hard_local_irq_enable >>>>> local_irq_enable >>>>> >>>> >>>> Yes. >>>> >>>>> Or even flip local_irq_enable_full as well? >>>> >>>> Not this one, hard irqs must be on before local_irq_enable() is issued >>>> otherwise we would trigger a debug assertion. 
The reason for such >>>> assertion is that the I-pipe version of local_irq_enable() force enables >>>> hard irqs on, and this is something we want to depart from because >>>> whether the caller expects or not such side-effect is error-prone. For >>>> this reason, inband_irq_enable() expects hard irqs on, which should be >>>> the current hard irq state for the caller under normal circumstances. >>>> >>>>> >>>>> Was there ever code that required ordering local_irq_disable_full the >>>>> other way around? >>>>> >>>> >>>> After review, I don't think so. The current order for >>>> local_irq_disable_full() was rather determined by applying a reverse >>>> sequence compared to local_irq_enable_full(). So this looks good. I need >>>> to test this on the armv[7/8] side here first before merging. >>>> >>> >>> Is there a commit somewhere that I can drop already? >>> >> >> 1. These are fine with me in their latest iteration: >> >> irq_pipeline: Clean up stage_info field and users >> irq_pipeline: Account for stage migration across faults >> irq_pipeline: Warn when calling irqentry_enter with oob stalled >> x86: dovetail: Fix TS flag reservation >> >> 2. This one should be replaced by a fix in local_irq_disable_full(), >> pending some tests ongoing here: >> >> irq_pipeline: Prevent returning to user space with pending inband IRQs >> >> However, as I mentioned earlier, this may not be sufficient per se, we >> may have to synchronize the in-band log when unstalling the stage on the >> kernel exit path, i.e. the exit point fixed up by the "Account for stage >> migration across faults" patch. This would address the following >> scenario: >> >> irqentry_enter on kernel context: >> stall_inband(); >> trap_handler(); >> hard_local_irq_enable() >> hard_local_irq_disable() >> if (state.stage_info == IRQENTRY_INBAND_UNSTALLED || >> state.stage_info == IRQENTRY_OOB) { >> unstall_inband(); >> synchronize_pipeline(); > > Oh, unstall_inband does not synchronize? 
Because of > hard_local_irq_disable or in general? > Ah, found it - unstall != inband_irq_enable. Why not use the latter directly? Jan -- Siemens AG, T RDA IOT Corporate Competence Center Embedded Linux ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] irq_pipeline: Prevent returning to user space with pending inband IRQs 2021-06-07 13:48 ` Jan Kiszka @ 2021-06-07 13:58 ` Jan Kiszka 2021-06-07 14:44 ` Philippe Gerum 2021-06-07 15:04 ` Philippe Gerum 2021-06-07 14:37 ` Philippe Gerum 1 sibling, 2 replies; 27+ messages in thread From: Jan Kiszka @ 2021-06-07 13:58 UTC (permalink / raw) To: Philippe Gerum; +Cc: Xenomai On 07.06.21 15:48, Jan Kiszka wrote: > On 07.06.21 15:44, Jan Kiszka wrote: >> On 07.06.21 15:03, Philippe Gerum wrote:
>>> [...]
>>> However, as I mentioned earlier, this may not be sufficient per se, we
>>> may have to synchronize the in-band log when unstalling the stage on the
>>> kernel exit path, i.e. the exit point fixed up by the "Account for stage
>>> migration across faults" patch.
This would address the following >>> scenario: >>> >>> irqentry_enter on kernel context: >>> stall_inband(); >>> trap_handler(); >>> hard_local_irq_enable() >>> hard_local_irq_disable() >>> if (state.stage_info == IRQENTRY_INBAND_UNSTALLED || >>> state.stage_info == IRQENTRY_OOB) { >>> unstall_inband(); >>> synchronize_pipeline(); >> >> Oh, unstall_inband does not synchronize? Because of >> hard_local_irq_disable or in general? >> > > Ah, found it - unstall != inband_irq_enable. Why not using the latter > directly? > ...because it's instrumented to detect hard_irqs_disabled. Can we safely sync at that point? Will that also ensure that an inband rescheduling event will not be delayed? We are not running irqentry_exit_cond_resched after that. Jan -- Siemens AG, T RDA IOT Corporate Competence Center Embedded Linux ^ permalink raw reply [flat|nested] 27+ messages in thread
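[Editorial note: the distinction Jan found can be condensed as below. This is pseudocode reconstructed from the thread, not from a verified Dovetail tree, so treat the exact names and semantics as assumptions:]

```
/* unstall_inband(): only lifts the virtual in-band mask; any IRQs
 * already logged for the in-band stage remain pending. */
unstall_inband():
        clear inband stall bit

/* inband_irq_enable(): also synchronizes the log, but is instrumented
 * and may not be called with hard IRQs off. */
inband_irq_enable():
        WARN_ON(hard_irqs_disabled())   /* the instrumentation in question */
        clear inband stall bit
        synchronize_pipeline()          /* play the pending in-band log */
```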
* Re: [PATCH] irq_pipeline: Prevent returning to user space with pending inband IRQs 2021-06-07 13:58 ` Jan Kiszka @ 2021-06-07 14:44 ` Philippe Gerum 2021-06-07 14:52 ` Jan Kiszka 2021-06-07 15:04 ` Philippe Gerum 1 sibling, 1 reply; 27+ messages in thread From: Philippe Gerum @ 2021-06-07 14:44 UTC (permalink / raw) To: Jan Kiszka; +Cc: Xenomai Jan Kiszka <jan.kiszka@siemens.com> writes: > On 07.06.21 15:48, Jan Kiszka wrote: >> On 07.06.21 15:44, Jan Kiszka wrote: >>> On 07.06.21 15:03, Philippe Gerum wrote:
>>>> [...]
>>>> However, as I mentioned earlier, this may not be sufficient per se, we
>>>> may have to synchronize the in-band log when unstalling the stage on the
>>>> kernel exit path, i.e. the exit point fixed up by the "Account for stage
>>>> migration across faults" patch.
This would address the following >>>> scenario: >>>> >>>> irqentry_enter on kernel context: >>>> stall_inband(); >>>> trap_handler(); >>>> hard_local_irq_enable() >>>> hard_local_irq_disable() >>>> if (state.stage_info == IRQENTRY_INBAND_UNSTALLED || >>>> state.stage_info == IRQENTRY_OOB) { >>>> unstall_inband(); >>>> synchronize_pipeline(); >>> >>> Oh, unstall_inband does not synchronize? Because of >>> hard_local_irq_disable or in general? >>> >> >> Ah, found it - unstall != inband_irq_enable. Why not using the latter >> directly? >> > > ...because it's instrumented to detect hard_irqs_disabled. > > Can we safely sync at that point? Will that also ensure that an inband > rescheduling event will not be delayed? We are not running > irqentry_exit_cond_resched after that. > We should have hit irqexit_may_preempt_schedule() and rescheduled appropriately before branching to unstall_inband(). Since we cannot log more IRQs in the meantime, we should not miss any rescheduling opportunity. [Yes, __inband_irq_enable() reschedules, but does not perform any of the expected rcu tweaks, so we should not count on it in this particular case]. -- Philippe. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] irq_pipeline: Prevent returning to user space with pending inband IRQs 2021-06-07 14:44 ` Philippe Gerum @ 2021-06-07 14:52 ` Jan Kiszka 0 siblings, 0 replies; 27+ messages in thread From: Jan Kiszka @ 2021-06-07 14:52 UTC (permalink / raw) To: Philippe Gerum; +Cc: Xenomai On 07.06.21 16:44, Philippe Gerum wrote: > Jan Kiszka <jan.kiszka@siemens.com> writes: >> On 07.06.21 15:48, Jan Kiszka wrote: >>> On 07.06.21 15:44, Jan Kiszka wrote: >>>> On 07.06.21 15:03, Philippe Gerum wrote:
>>>>> [...]
>>>>> However, as I mentioned earlier, this may not be sufficient per se, we
>>>>> may have to synchronize the in-band log when unstalling the stage on the
>>>>> kernel exit path, i.e. the exit point fixed up by the "Account for stage
>>>>> migration across faults" patch.
This would address the following >>>>> scenario: >>>>> >>>>> irqentry_enter on kernel context: >>>>> stall_inband(); >>>>> trap_handler(); >>>>> hard_local_irq_enable() >>>>> hard_local_irq_disable() >>>>> if (state.stage_info == IRQENTRY_INBAND_UNSTALLED || >>>>> state.stage_info == IRQENTRY_OOB) { >>>>> unstall_inband(); >>>>> synchronize_pipeline(); >>>> >>>> Oh, unstall_inband does not synchronize? Because of >>>> hard_local_irq_disable or in general? >>>> >>> >>> Ah, found it - unstall != inband_irq_enable. Why not using the latter >>> directly? >>> >> >> ...because it's instrumented to detect hard_irqs_disabled. >> >> Can we safely sync at that point? Will that also ensure that an inband >> rescheduling event will not be delayed? We are not running >> irqentry_exit_cond_resched after that. >> > > We should have hit irqexit_may_preempt_schedule() and rescheduled > appropriately before branching to unstall_inband(). Since we cannot log > more IRQs in the meantime, we should not miss any rescheduling > opportunity. > > [Yes, __inband_irq_enable() reschedules, but does not perform any of the > expected rcu tweaks, so we should not count on it in this particular > case]. > I don't get this argument yet: irqexit_may_preempt_schedule() runs before we replayed the log, no? So it cannot act on any reschedules that only replay the log will mark pending. Jan -- Siemens AG, T RDA IOT Corporate Competence Center Embedded Linux ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] irq_pipeline: Prevent returning to user space with pending inband IRQs 2021-06-07 13:58 ` Jan Kiszka 2021-06-07 14:44 ` Philippe Gerum @ 2021-06-07 15:04 ` Philippe Gerum 2021-06-07 15:38 ` Jan Kiszka 1 sibling, 1 reply; 27+ messages in thread From: Philippe Gerum @ 2021-06-07 15:04 UTC (permalink / raw) To: Jan Kiszka; +Cc: Xenomai Jan Kiszka <jan.kiszka@siemens.com> writes: > On 07.06.21 15:48, Jan Kiszka wrote: >> On 07.06.21 15:44, Jan Kiszka wrote: >>> On 07.06.21 15:03, Philippe Gerum wrote:
>>>> [...]
>>>> However, as I mentioned earlier, this may not be sufficient per se, we
>>>> may have to synchronize the in-band log when unstalling the stage on the
>>>> kernel exit path, i.e. the exit point fixed up by the "Account for stage
>>>> migration across faults" patch.
This would address the following >>>> scenario: >>>> >>>> irqentry_enter on kernel context: >>>> stall_inband(); >>>> trap_handler(); >>>> hard_local_irq_enable() >>>> hard_local_irq_disable() >>>> if (state.stage_info == IRQENTRY_INBAND_UNSTALLED || >>>> state.stage_info == IRQENTRY_OOB) { >>>> unstall_inband(); >>>> synchronize_pipeline(); >>> >>> Oh, unstall_inband does not synchronize? Because of >>> hard_local_irq_disable or in general? >>> >> >> Ah, found it - unstall != inband_irq_enable. Why not using the latter >> directly? >> > > ...because it's instrumented to detect hard_irqs_disabled. > > Can we safely sync at that point? Yes we may do this, because we restore the original -unstalled- state of the in-band stage. So if we have pending IRQs, we may play them at this point. What we may not do obviously, is creating synchronization points where the kernel does not expect irqs. -- Philippe. ^ permalink raw reply [flat|nested] 27+ messages in thread
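[Editorial note: Philippe's argument corresponds to the following ordering on the kernel-return path — a pseudocode sketch of the quoted scenario; the names follow the thread and are assumptions, not the final code:]

```
irqentry_exit(regs, state):
        /* runs with hard IRQs off: nothing new can enter the log below */
        irqexit_may_preempt_schedule(state)   /* reschedule on known events */
        if state.stage_info == IRQENTRY_INBAND_UNSTALLED or
           state.stage_info == IRQENTRY_OOB:
                unstall_inband()          /* restore the unstalled state */
                synchronize_pipeline()    /* play IRQs logged while the trap
                                             handler ran with hard IRQs on */
```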
* Re: [PATCH] irq_pipeline: Prevent returning to user space with pending inband IRQs 2021-06-07 15:04 ` Philippe Gerum @ 2021-06-07 15:38 ` Jan Kiszka 2021-06-07 16:19 ` Philippe Gerum 0 siblings, 1 reply; 27+ messages in thread From: Jan Kiszka @ 2021-06-07 15:38 UTC (permalink / raw) To: Philippe Gerum; +Cc: Xenomai On 07.06.21 17:04, Philippe Gerum wrote: > Jan Kiszka <jan.kiszka@siemens.com> writes: >> On 07.06.21 15:48, Jan Kiszka wrote: >>> On 07.06.21 15:44, Jan Kiszka wrote: >>>> On 07.06.21 15:03, Philippe Gerum wrote:
>>>>> [...]
>>>>> However, as I mentioned earlier, this may not be sufficient per se, we
>>>>> may have to synchronize the in-band log when unstalling the stage on the
>>>>> kernel exit path, i.e. the exit point fixed up by the "Account for stage
>>>>> migration across faults" patch.
This would address the following >>>>> scenario: >>>>> >>>>> irqentry_enter on kernel context: >>>>> stall_inband(); >>>>> trap_handler(); >>>>> hard_local_irq_enable() >>>>> hard_local_irq_disable() >>>>> if (state.stage_info == IRQENTRY_INBAND_UNSTALLED || >>>>> state.stage_info == IRQENTRY_OOB) { >>>>> unstall_inband(); >>>>> synchronize_pipeline(); >>>> >>>> Oh, unstall_inband does not synchronize? Because of >>>> hard_local_irq_disable or in general? >>>> >>> >>> Ah, found it - unstall != inband_irq_enable. Why not using the latter >>> directly? >>> >> >> ...because it's instrumented to detect hard_irqs_disabled. >> >> Can we safely sync at that point? > > Yes we may do this, because we restore the original -unstalled- state of > the in-band stage. So if we have pending IRQs, we may play them at this > point. What we may not do obviously, is creating synchronization points > where the kernel does not expect irqs. > Again: When will any reschedules that might be requested by those replayed IRQs be handled then? Jan -- Siemens AG, T RDA IOT Corporate Competence Center Embedded Linux ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] irq_pipeline: Prevent returning to user space with pending inband IRQs 2021-06-07 15:38 ` Jan Kiszka @ 2021-06-07 16:19 ` Philippe Gerum 2021-06-07 16:35 ` Jan Kiszka 0 siblings, 1 reply; 27+ messages in thread From: Philippe Gerum @ 2021-06-07 16:19 UTC (permalink / raw) To: Jan Kiszka; +Cc: Xenomai Jan Kiszka <jan.kiszka@siemens.com> writes: > On 07.06.21 17:04, Philippe Gerum wrote: >> >> Jan Kiszka <jan.kiszka@siemens.com> writes: >> >>> On 07.06.21 15:48, Jan Kiszka wrote: >>>> On 07.06.21 15:44, Jan Kiszka wrote: >>>>> On 07.06.21 15:03, Philippe Gerum wrote: >>>>>> >>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes: >>>>>> >>>>>>> On 07.06.21 09:17, Philippe Gerum wrote: >>>>>>>> >>>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes: >>>>>>>> >>>>>>>>> On 06.06.21 16:43, Philippe Gerum via Xenomai wrote: >>>>>>>>>> >>>>>>>>>> Jan Kiszka <jan.kiszka@web.de> writes: >>>>>>>>>> >>>>>>>>>>> From: Jan Kiszka <jan.kiszka@siemens.com> >>>>>>>>>>> >>>>>>>>>>> The correct order is hard IRQs off first, then stalling inband, see also >>>>>>>>>>> I-pipe. Otherwise, we risk to take an interrupt for the inband stage but >>>>>>>>>>> do not inject it before returning to user space. >>>>>>>>>>> >>>>>>>>>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> >>>>>>>>>>> --- >>>>>>>>>>> >>>>>>>>>>> Fixes the RCU stalls Florian reported, at least for me. >>>>>>>>>>> >>>>>>>>>>> I'm afraid this wasn't the last regression of translating I-pipe into >>>>>>>>>>> dovetail.. 
>>>>>>>>>>> >>>>>>>>>>> include/linux/entry-common.h | 3 ++- >>>>>>>>>>> 1 file changed, 2 insertions(+), 1 deletion(-) >>>>>>>>>>> >>>>>>>>>>> diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h >>>>>>>>>>> index 0fb45b2d6094..00540110985e 100644 >>>>>>>>>>> --- a/include/linux/entry-common.h >>>>>>>>>>> +++ b/include/linux/entry-common.h >>>>>>>>>>> @@ -203,7 +203,8 @@ static inline void local_irq_disable_exit_to_user(void); >>>>>>>>>>> #ifndef local_irq_disable_exit_to_user >>>>>>>>>>> static inline void local_irq_disable_exit_to_user(void) >>>>>>>>>>> { >>>>>>>>>>> - local_irq_disable_full(); >>>>>>>>>>> + hard_cond_local_irq_disable(); >>>>>>>>>>> + local_irq_disable(); >>>>>>>>>>> } >>>>>>>>>>> #endif >>>>>>>>>> >>>>>>>>>> Good spot, real issue here, just like the CPU_IDLE code deals with. The >>>>>>>>>> fix looks good, but maybe we could do even better. >>>>>>>>>> >>>>>>>>>> The way local_irq_disable_full() works tends to introduce this issue in >>>>>>>>>> a sneaky way, when the code path does not synchronize the interrupt log >>>>>>>>>> (exit_to_user_mode_loop may be at fault in this case, which does not >>>>>>>>>> traverses synchronize_pipeline()). Lack of synchronization would still >>>>>>>>>> hit us if some trap handler turns hard irqs off -> on -> off the same >>>>>>>>>> way, and we don't leave through the user exit path. >>>>>>>>>> >>>>>>>>>> Inverting the order of the actions in local_irq_disable_full() may be an >>>>>>>>>> option (inband_irq_disable would allow that), making sure we cannot be >>>>>>>>>> caught in the same issue when returning to kernel mode is another >>>>>>>>>> way. This needs more thought I think. >>>>>>>>>> >>>>>>>>> >>>>>>>>> So, always this way? >>>>>>>>> >>>>>>>>> local_irq_disable_full: >>>>>>>>> hard_local_irq_disable >>>>>>>>> local_irq_disable >>>>>>>>> >>>>>>>>> local_irq_enable_full: >>>>>>>>> hard_local_irq_enable >>>>>>>>> local_irq_enable >>>>>>>>> >>>>>>>> >>>>>>>> Yes. 
>>>>>>>> >>>>>>>>> Or even flip local_irq_enable_full as well? >>>>>>>> >>>>>>>> Not this one, hard irqs must be on before local_irq_enable() is issued >>>>>>>> otherwise we would trigger a debug assertion. The reason for such >>>>>>>> assertion is that the I-pipe version of local_irq_enable() force enables >>>>>>>> hard irqs on, and this is something we want to depart from because >>>>>>>> whether the caller expects or not such side-effect is error-prone. For >>>>>>>> this reason, inband_irq_enable() expects hard irqs on, which should be >>>>>>>> the current hard irq state for the caller under normal circumstances. >>>>>>>> >>>>>>>>> >>>>>>>>> Was there ever code that required ordering local_irq_disable_full the >>>>>>>>> other way around? >>>>>>>>> >>>>>>>> >>>>>>>> After review, I don't think so. The current order for >>>>>>>> local_irq_disable_full() was rather determined by applying a reverse >>>>>>>> sequence compared to local_irq_enable_full(). So this looks good. I need >>>>>>>> to test this on the armv[7/8] side here first before merging. >>>>>>>> >>>>>>> >>>>>>> Is there a commit somewhere that I can drop already? >>>>>>> >>>>>> >>>>>> 1. These are fine with me in their latest iteration: >>>>>> >>>>>> irq_pipeline: Clean up stage_info field and users >>>>>> irq_pipeline: Account for stage migration across faults >>>>>> irq_pipeline: Warn when calling irqentry_enter with oob stalled >>>>>> x86: dovetail: Fix TS flag reservation >>>>>> >>>>>> 2. This one should be replaced by a fix in local_irq_disable_full(), >>>>>> pending some tests ongoing here: >>>>>> >>>>>> irq_pipeline: Prevent returning to user space with pending inband IRQs >>>>>> >>>>>> However, as I mentioned earlier, this may not be sufficient per se, we >>>>>> may have to synchronize the in-band log when unstalling the stage on the >>>>>> kernel exit path, i.e. the exit point fixed up by the "Account for stage >>>>>> migration across faults" patch. 
This would address the following >>>>>> scenario: >>>>>> >>>>>> irqentry_enter on kernel context: >>>>>> stall_inband(); >>>>>> trap_handler(); >>>>>> hard_local_irq_enable() >>>>>> hard_local_irq_disable() >>>>>> if (state.stage_info == IRQENTRY_INBAND_UNSTALLED || >>>>>> state.stage_info == IRQENTRY_OOB) { >>>>>> unstall_inband(); >>>>>> synchronize_pipeline(); >>>>> >>>>> Oh, unstall_inband does not synchronize? Because of >>>>> hard_local_irq_disable or in general? >>>>> >>>> >>>> Ah, found it - unstall != inband_irq_enable. Why not using the latter >>>> directly? >>>> >>> >>> ...because it's instrumented to detect hard_irqs_disabled. >>> >>> Can we safely sync at that point? >> >> Yes we may do this, because we restore the original -unstalled- state of >> the in-band stage. So if we have pending IRQs, we may play them at this >> point. What we may not do obviously, is creating synchronization points >> where the kernel does not expect irqs. >> > > Again: When will any reschedules that might be requested by those > replayed IRQs be handled then? > Nowhere unless we provide a log synchronization routine dedicated to the in-band kernel exit path, which would fold in the rescheduling call via irqentry_exit_cond_resched(). I'll write that one. -- Philippe. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] irq_pipeline: Prevent returning to user space with pending inband IRQs 2021-06-07 16:19 ` Philippe Gerum @ 2021-06-07 16:35 ` Jan Kiszka 2021-06-08 16:50 ` Philippe Gerum 0 siblings, 1 reply; 27+ messages in thread From: Jan Kiszka @ 2021-06-07 16:35 UTC (permalink / raw) To: Philippe Gerum; +Cc: Xenomai On 07.06.21 18:19, Philippe Gerum wrote: > > Jan Kiszka <jan.kiszka@siemens.com> writes: > >> On 07.06.21 17:04, Philippe Gerum wrote: >>> >>> Jan Kiszka <jan.kiszka@siemens.com> writes: >>> >>>> On 07.06.21 15:48, Jan Kiszka wrote: >>>>> On 07.06.21 15:44, Jan Kiszka wrote: >>>>>> On 07.06.21 15:03, Philippe Gerum wrote: >>>>>>> >>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes: >>>>>>> >>>>>>>> On 07.06.21 09:17, Philippe Gerum wrote: >>>>>>>>> >>>>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes: >>>>>>>>> >>>>>>>>>> On 06.06.21 16:43, Philippe Gerum via Xenomai wrote: >>>>>>>>>>> >>>>>>>>>>> Jan Kiszka <jan.kiszka@web.de> writes: >>>>>>>>>>> >>>>>>>>>>>> From: Jan Kiszka <jan.kiszka@siemens.com> >>>>>>>>>>>> >>>>>>>>>>>> The correct order is hard IRQs off first, then stalling inband, see also >>>>>>>>>>>> I-pipe. Otherwise, we risk to take an interrupt for the inband stage but >>>>>>>>>>>> do not inject it before returning to user space. >>>>>>>>>>>> >>>>>>>>>>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> >>>>>>>>>>>> --- >>>>>>>>>>>> >>>>>>>>>>>> Fixes the RCU stalls Florian reported, at least for me. >>>>>>>>>>>> >>>>>>>>>>>> I'm afraid this wasn't the last regression of translating I-pipe into >>>>>>>>>>>> dovetail.. 
>>>>>>>>>>>> >>>>>>>>>>>> include/linux/entry-common.h | 3 ++- >>>>>>>>>>>> 1 file changed, 2 insertions(+), 1 deletion(-) >>>>>>>>>>>> >>>>>>>>>>>> diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h >>>>>>>>>>>> index 0fb45b2d6094..00540110985e 100644 >>>>>>>>>>>> --- a/include/linux/entry-common.h >>>>>>>>>>>> +++ b/include/linux/entry-common.h >>>>>>>>>>>> @@ -203,7 +203,8 @@ static inline void local_irq_disable_exit_to_user(void); >>>>>>>>>>>> #ifndef local_irq_disable_exit_to_user >>>>>>>>>>>> static inline void local_irq_disable_exit_to_user(void) >>>>>>>>>>>> { >>>>>>>>>>>> - local_irq_disable_full(); >>>>>>>>>>>> + hard_cond_local_irq_disable(); >>>>>>>>>>>> + local_irq_disable(); >>>>>>>>>>>> } >>>>>>>>>>>> #endif >>>>>>>>>>> >>>>>>>>>>> Good spot, real issue here, just like the CPU_IDLE code deals with. The >>>>>>>>>>> fix looks good, but maybe we could do even better. >>>>>>>>>>> >>>>>>>>>>> The way local_irq_disable_full() works tends to introduce this issue in >>>>>>>>>>> a sneaky way, when the code path does not synchronize the interrupt log >>>>>>>>>>> (exit_to_user_mode_loop may be at fault in this case, which does not >>>>>>>>>>> traverses synchronize_pipeline()). Lack of synchronization would still >>>>>>>>>>> hit us if some trap handler turns hard irqs off -> on -> off the same >>>>>>>>>>> way, and we don't leave through the user exit path. >>>>>>>>>>> >>>>>>>>>>> Inverting the order of the actions in local_irq_disable_full() may be an >>>>>>>>>>> option (inband_irq_disable would allow that), making sure we cannot be >>>>>>>>>>> caught in the same issue when returning to kernel mode is another >>>>>>>>>>> way. This needs more thought I think. >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> So, always this way? 
>>>>>>>>>> >>>>>>>>>> local_irq_disable_full: >>>>>>>>>> hard_local_irq_disable >>>>>>>>>> local_irq_disable >>>>>>>>>> >>>>>>>>>> local_irq_enable_full: >>>>>>>>>> hard_local_irq_enable >>>>>>>>>> local_irq_enable >>>>>>>>>> >>>>>>>>> >>>>>>>>> Yes. >>>>>>>>> >>>>>>>>>> Or even flip local_irq_enable_full as well? >>>>>>>>> >>>>>>>>> Not this one, hard irqs must be on before local_irq_enable() is issued >>>>>>>>> otherwise we would trigger a debug assertion. The reason for such >>>>>>>>> assertion is that the I-pipe version of local_irq_enable() force enables >>>>>>>>> hard irqs on, and this is something we want to depart from because >>>>>>>>> whether the caller expects or not such side-effect is error-prone. For >>>>>>>>> this reason, inband_irq_enable() expects hard irqs on, which should be >>>>>>>>> the current hard irq state for the caller under normal circumstances. >>>>>>>>> >>>>>>>>>> >>>>>>>>>> Was there ever code that required ordering local_irq_disable_full the >>>>>>>>>> other way around? >>>>>>>>>> >>>>>>>>> >>>>>>>>> After review, I don't think so. The current order for >>>>>>>>> local_irq_disable_full() was rather determined by applying a reverse >>>>>>>>> sequence compared to local_irq_enable_full(). So this looks good. I need >>>>>>>>> to test this on the armv[7/8] side here first before merging. >>>>>>>>> >>>>>>>> >>>>>>>> Is there a commit somewhere that I can drop already? >>>>>>>> >>>>>>> >>>>>>> 1. These are fine with me in their latest iteration: >>>>>>> >>>>>>> irq_pipeline: Clean up stage_info field and users >>>>>>> irq_pipeline: Account for stage migration across faults >>>>>>> irq_pipeline: Warn when calling irqentry_enter with oob stalled >>>>>>> x86: dovetail: Fix TS flag reservation >>>>>>> >>>>>>> 2. 
This one should be replaced by a fix in local_irq_disable_full(), >>>>>>> pending some tests ongoing here: >>>>>>> >>>>>>> irq_pipeline: Prevent returning to user space with pending inband IRQs >>>>>>> >>>>>>> However, as I mentioned earlier, this may not be sufficient per se, we >>>>>>> may have to synchronize the in-band log when unstalling the stage on the >>>>>>> kernel exit path, i.e. the exit point fixed up by the "Account for stage >>>>>>> migration across faults" patch. This would address the following >>>>>>> scenario: >>>>>>> >>>>>>> irqentry_enter on kernel context: >>>>>>> stall_inband(); >>>>>>> trap_handler(); >>>>>>> hard_local_irq_enable() >>>>>>> hard_local_irq_disable() >>>>>>> if (state.stage_info == IRQENTRY_INBAND_UNSTALLED || >>>>>>> state.stage_info == IRQENTRY_OOB) { >>>>>>> unstall_inband(); >>>>>>> synchronize_pipeline(); >>>>>> >>>>>> Oh, unstall_inband does not synchronize? Because of >>>>>> hard_local_irq_disable or in general? >>>>>> >>>>> >>>>> Ah, found it - unstall != inband_irq_enable. Why not using the latter >>>>> directly? >>>>> >>>> >>>> ...because it's instrumented to detect hard_irqs_disabled. >>>> >>>> Can we safely sync at that point? >>> >>> Yes we may do this, because we restore the original -unstalled- state of >>> the in-band stage. So if we have pending IRQs, we may play them at this >>> point. What we may not do obviously, is creating synchronization points >>> where the kernel does not expect irqs. >>> >> >> Again: When will any reschedules that might be requested by those >> replayed IRQs be handled then? >> > > Nowhere unless we provide a log synchronization routine dedicated to the > in-band kernel exit path, which would fold in the rescheduling call via > irqentry_exit_cond_resched(). I'll write that one. > How about simply moving the replay up, before the kernel checks for pending reschedules anyway? 
We can stall inband then again, to please lockdep & Co., as long as there is no hard-irq-on section before the intret. Jan -- Siemens AG, T RDA IOT Corporate Competence Center Embedded Linux ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] irq_pipeline: Prevent returning to user space with pending inband IRQs 2021-06-07 16:35 ` Jan Kiszka @ 2021-06-08 16:50 ` Philippe Gerum 2021-06-08 17:39 ` Philippe Gerum 0 siblings, 1 reply; 27+ messages in thread From: Philippe Gerum @ 2021-06-08 16:50 UTC (permalink / raw) To: Jan Kiszka; +Cc: Xenomai Jan Kiszka <jan.kiszka@siemens.com> writes: > On 07.06.21 18:19, Philippe Gerum wrote: >> >> Jan Kiszka <jan.kiszka@siemens.com> writes: >> >>> On 07.06.21 17:04, Philippe Gerum wrote: >>>> >>>> Jan Kiszka <jan.kiszka@siemens.com> writes: >>>> >>>>> On 07.06.21 15:48, Jan Kiszka wrote: >>>>>> On 07.06.21 15:44, Jan Kiszka wrote: >>>>>>> On 07.06.21 15:03, Philippe Gerum wrote: >>>>>>>> >>>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes: >>>>>>>> >>>>>>>>> On 07.06.21 09:17, Philippe Gerum wrote: >>>>>>>>>> >>>>>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes: >>>>>>>>>> >>>>>>>>>>> On 06.06.21 16:43, Philippe Gerum via Xenomai wrote: >>>>>>>>>>>> >>>>>>>>>>>> Jan Kiszka <jan.kiszka@web.de> writes: >>>>>>>>>>>> >>>>>>>>>>>>> From: Jan Kiszka <jan.kiszka@siemens.com> >>>>>>>>>>>>> >>>>>>>>>>>>> The correct order is hard IRQs off first, then stalling inband, see also >>>>>>>>>>>>> I-pipe. Otherwise, we risk to take an interrupt for the inband stage but >>>>>>>>>>>>> do not inject it before returning to user space. >>>>>>>>>>>>> >>>>>>>>>>>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> >>>>>>>>>>>>> --- >>>>>>>>>>>>> >>>>>>>>>>>>> Fixes the RCU stalls Florian reported, at least for me. >>>>>>>>>>>>> >>>>>>>>>>>>> I'm afraid this wasn't the last regression of translating I-pipe into >>>>>>>>>>>>> dovetail.. 
>>>>>>>>>>>>> >>>>>>>>>>>>> include/linux/entry-common.h | 3 ++- >>>>>>>>>>>>> 1 file changed, 2 insertions(+), 1 deletion(-) >>>>>>>>>>>>> >>>>>>>>>>>>> diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h >>>>>>>>>>>>> index 0fb45b2d6094..00540110985e 100644 >>>>>>>>>>>>> --- a/include/linux/entry-common.h >>>>>>>>>>>>> +++ b/include/linux/entry-common.h >>>>>>>>>>>>> @@ -203,7 +203,8 @@ static inline void local_irq_disable_exit_to_user(void); >>>>>>>>>>>>> #ifndef local_irq_disable_exit_to_user >>>>>>>>>>>>> static inline void local_irq_disable_exit_to_user(void) >>>>>>>>>>>>> { >>>>>>>>>>>>> - local_irq_disable_full(); >>>>>>>>>>>>> + hard_cond_local_irq_disable(); >>>>>>>>>>>>> + local_irq_disable(); >>>>>>>>>>>>> } >>>>>>>>>>>>> #endif >>>>>>>>>>>> >>>>>>>>>>>> Good spot, real issue here, just like the CPU_IDLE code deals with. The >>>>>>>>>>>> fix looks good, but maybe we could do even better. >>>>>>>>>>>> >>>>>>>>>>>> The way local_irq_disable_full() works tends to introduce this issue in >>>>>>>>>>>> a sneaky way, when the code path does not synchronize the interrupt log >>>>>>>>>>>> (exit_to_user_mode_loop may be at fault in this case, which does not >>>>>>>>>>>> traverses synchronize_pipeline()). Lack of synchronization would still >>>>>>>>>>>> hit us if some trap handler turns hard irqs off -> on -> off the same >>>>>>>>>>>> way, and we don't leave through the user exit path. >>>>>>>>>>>> >>>>>>>>>>>> Inverting the order of the actions in local_irq_disable_full() may be an >>>>>>>>>>>> option (inband_irq_disable would allow that), making sure we cannot be >>>>>>>>>>>> caught in the same issue when returning to kernel mode is another >>>>>>>>>>>> way. This needs more thought I think. >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> So, always this way? 
>>>>>>>>>>> >>>>>>>>>>> local_irq_disable_full: >>>>>>>>>>> hard_local_irq_disable >>>>>>>>>>> local_irq_disable >>>>>>>>>>> >>>>>>>>>>> local_irq_enable_full: >>>>>>>>>>> hard_local_irq_enable >>>>>>>>>>> local_irq_enable >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Yes. >>>>>>>>>> >>>>>>>>>>> Or even flip local_irq_enable_full as well? >>>>>>>>>> >>>>>>>>>> Not this one, hard irqs must be on before local_irq_enable() is issued >>>>>>>>>> otherwise we would trigger a debug assertion. The reason for such >>>>>>>>>> assertion is that the I-pipe version of local_irq_enable() force enables >>>>>>>>>> hard irqs on, and this is something we want to depart from because >>>>>>>>>> whether the caller expects or not such side-effect is error-prone. For >>>>>>>>>> this reason, inband_irq_enable() expects hard irqs on, which should be >>>>>>>>>> the current hard irq state for the caller under normal circumstances. >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Was there ever code that required ordering local_irq_disable_full the >>>>>>>>>>> other way around? >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> After review, I don't think so. The current order for >>>>>>>>>> local_irq_disable_full() was rather determined by applying a reverse >>>>>>>>>> sequence compared to local_irq_enable_full(). So this looks good. I need >>>>>>>>>> to test this on the armv[7/8] side here first before merging. >>>>>>>>>> >>>>>>>>> >>>>>>>>> Is there a commit somewhere that I can drop already? >>>>>>>>> >>>>>>>> >>>>>>>> 1. These are fine with me in their latest iteration: >>>>>>>> >>>>>>>> irq_pipeline: Clean up stage_info field and users >>>>>>>> irq_pipeline: Account for stage migration across faults >>>>>>>> irq_pipeline: Warn when calling irqentry_enter with oob stalled >>>>>>>> x86: dovetail: Fix TS flag reservation >>>>>>>> >>>>>>>> 2. 
This one should be replaced by a fix in local_irq_disable_full(), >>>>>>>> pending some tests ongoing here: >>>>>>>> >>>>>>>> irq_pipeline: Prevent returning to user space with pending inband IRQs >>>>>>>> >>>>>>>> However, as I mentioned earlier, this may not be sufficient per se, we >>>>>>>> may have to synchronize the in-band log when unstalling the stage on the >>>>>>>> kernel exit path, i.e. the exit point fixed up by the "Account for stage >>>>>>>> migration across faults" patch. This would address the following >>>>>>>> scenario: >>>>>>>> >>>>>>>> irqentry_enter on kernel context: >>>>>>>> stall_inband(); >>>>>>>> trap_handler(); >>>>>>>> hard_local_irq_enable() >>>>>>>> hard_local_irq_disable() >>>>>>>> if (state.stage_info == IRQENTRY_INBAND_UNSTALLED || >>>>>>>> state.stage_info == IRQENTRY_OOB) { >>>>>>>> unstall_inband(); >>>>>>>> synchronize_pipeline(); >>>>>>> >>>>>>> Oh, unstall_inband does not synchronize? Because of >>>>>>> hard_local_irq_disable or in general? >>>>>>> >>>>>> >>>>>> Ah, found it - unstall != inband_irq_enable. Why not using the latter >>>>>> directly? >>>>>> >>>>> >>>>> ...because it's instrumented to detect hard_irqs_disabled. >>>>> >>>>> Can we safely sync at that point? >>>> >>>> Yes we may do this, because we restore the original -unstalled- state of >>>> the in-band stage. So if we have pending IRQs, we may play them at this >>>> point. What we may not do obviously, is creating synchronization points >>>> where the kernel does not expect irqs. >>>> >>> >>> Again: When will any reschedules that might be requested by those >>> replayed IRQs be handled then? >>> >> >> Nowhere unless we provide a log synchronization routine dedicated to the >> in-band kernel exit path, which would fold in the rescheduling call via >> irqentry_exit_cond_resched(). I'll write that one. >> > > How about simply moving the replay up, before the kernel checks for > pending reschedules anyway? 
We can stall inband then again, to please > lockdep & Co., as long as there is no hard-irq-on section before the intret. > Yes, this is the right way to do this. I'm on it. -- Philippe. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] irq_pipeline: Prevent returning to user space with pending inband IRQs 2021-06-08 16:50 ` Philippe Gerum @ 2021-06-08 17:39 ` Philippe Gerum 2021-06-09 5:34 ` Jan Kiszka 0 siblings, 1 reply; 27+ messages in thread From: Philippe Gerum @ 2021-06-08 17:39 UTC (permalink / raw) To: Jan Kiszka; +Cc: Xenomai Philippe Gerum <rpm@xenomai.org> writes: > Jan Kiszka <jan.kiszka@siemens.com> writes: > >> On 07.06.21 18:19, Philippe Gerum wrote: >>> >>> Jan Kiszka <jan.kiszka@siemens.com> writes: >>> >>>> On 07.06.21 17:04, Philippe Gerum wrote: >>>>> >>>>> Jan Kiszka <jan.kiszka@siemens.com> writes: >>>>> >>>>>> On 07.06.21 15:48, Jan Kiszka wrote: >>>>>>> On 07.06.21 15:44, Jan Kiszka wrote: >>>>>>>> On 07.06.21 15:03, Philippe Gerum wrote: >>>>>>>>> >>>>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes: >>>>>>>>> >>>>>>>>>> On 07.06.21 09:17, Philippe Gerum wrote: >>>>>>>>>>> >>>>>>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes: >>>>>>>>>>> >>>>>>>>>>>> On 06.06.21 16:43, Philippe Gerum via Xenomai wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> Jan Kiszka <jan.kiszka@web.de> writes: >>>>>>>>>>>>> >>>>>>>>>>>>>> From: Jan Kiszka <jan.kiszka@siemens.com> >>>>>>>>>>>>>> >>>>>>>>>>>>>> The correct order is hard IRQs off first, then stalling inband, see also >>>>>>>>>>>>>> I-pipe. Otherwise, we risk to take an interrupt for the inband stage but >>>>>>>>>>>>>> do not inject it before returning to user space. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> >>>>>>>>>>>>>> --- >>>>>>>>>>>>>> >>>>>>>>>>>>>> Fixes the RCU stalls Florian reported, at least for me. >>>>>>>>>>>>>> >>>>>>>>>>>>>> I'm afraid this wasn't the last regression of translating I-pipe into >>>>>>>>>>>>>> dovetail.. 
>>>>>>>>>>>>>> >>>>>>>>>>>>>> include/linux/entry-common.h | 3 ++- >>>>>>>>>>>>>> 1 file changed, 2 insertions(+), 1 deletion(-) >>>>>>>>>>>>>> >>>>>>>>>>>>>> diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h >>>>>>>>>>>>>> index 0fb45b2d6094..00540110985e 100644 >>>>>>>>>>>>>> --- a/include/linux/entry-common.h >>>>>>>>>>>>>> +++ b/include/linux/entry-common.h >>>>>>>>>>>>>> @@ -203,7 +203,8 @@ static inline void local_irq_disable_exit_to_user(void); >>>>>>>>>>>>>> #ifndef local_irq_disable_exit_to_user >>>>>>>>>>>>>> static inline void local_irq_disable_exit_to_user(void) >>>>>>>>>>>>>> { >>>>>>>>>>>>>> - local_irq_disable_full(); >>>>>>>>>>>>>> + hard_cond_local_irq_disable(); >>>>>>>>>>>>>> + local_irq_disable(); >>>>>>>>>>>>>> } >>>>>>>>>>>>>> #endif >>>>>>>>>>>>> >>>>>>>>>>>>> Good spot, real issue here, just like the CPU_IDLE code deals with. The >>>>>>>>>>>>> fix looks good, but maybe we could do even better. >>>>>>>>>>>>> >>>>>>>>>>>>> The way local_irq_disable_full() works tends to introduce this issue in >>>>>>>>>>>>> a sneaky way, when the code path does not synchronize the interrupt log >>>>>>>>>>>>> (exit_to_user_mode_loop may be at fault in this case, which does not >>>>>>>>>>>>> traverses synchronize_pipeline()). Lack of synchronization would still >>>>>>>>>>>>> hit us if some trap handler turns hard irqs off -> on -> off the same >>>>>>>>>>>>> way, and we don't leave through the user exit path. >>>>>>>>>>>>> >>>>>>>>>>>>> Inverting the order of the actions in local_irq_disable_full() may be an >>>>>>>>>>>>> option (inband_irq_disable would allow that), making sure we cannot be >>>>>>>>>>>>> caught in the same issue when returning to kernel mode is another >>>>>>>>>>>>> way. This needs more thought I think. >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> So, always this way? 
>>>>>>>>>>>> >>>>>>>>>>>> local_irq_disable_full: >>>>>>>>>>>> hard_local_irq_disable >>>>>>>>>>>> local_irq_disable >>>>>>>>>>>> >>>>>>>>>>>> local_irq_enable_full: >>>>>>>>>>>> hard_local_irq_enable >>>>>>>>>>>> local_irq_enable >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Yes. >>>>>>>>>>> >>>>>>>>>>>> Or even flip local_irq_enable_full as well? >>>>>>>>>>> >>>>>>>>>>> Not this one, hard irqs must be on before local_irq_enable() is issued >>>>>>>>>>> otherwise we would trigger a debug assertion. The reason for such >>>>>>>>>>> assertion is that the I-pipe version of local_irq_enable() force enables >>>>>>>>>>> hard irqs on, and this is something we want to depart from because >>>>>>>>>>> whether the caller expects or not such side-effect is error-prone. For >>>>>>>>>>> this reason, inband_irq_enable() expects hard irqs on, which should be >>>>>>>>>>> the current hard irq state for the caller under normal circumstances. >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Was there ever code that required ordering local_irq_disable_full the >>>>>>>>>>>> other way around? >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> After review, I don't think so. The current order for >>>>>>>>>>> local_irq_disable_full() was rather determined by applying a reverse >>>>>>>>>>> sequence compared to local_irq_enable_full(). So this looks good. I need >>>>>>>>>>> to test this on the armv[7/8] side here first before merging. >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Is there a commit somewhere that I can drop already? >>>>>>>>>> >>>>>>>>> >>>>>>>>> 1. These are fine with me in their latest iteration: >>>>>>>>> >>>>>>>>> irq_pipeline: Clean up stage_info field and users >>>>>>>>> irq_pipeline: Account for stage migration across faults >>>>>>>>> irq_pipeline: Warn when calling irqentry_enter with oob stalled >>>>>>>>> x86: dovetail: Fix TS flag reservation >>>>>>>>> >>>>>>>>> 2. 
This one should be replaced by a fix in local_irq_disable_full(), >>>>>>>>> pending some tests ongoing here: >>>>>>>>> >>>>>>>>> irq_pipeline: Prevent returning to user space with pending inband IRQs >>>>>>>>> >>>>>>>>> However, as I mentioned earlier, this may not be sufficient per se, we >>>>>>>>> may have to synchronize the in-band log when unstalling the stage on the >>>>>>>>> kernel exit path, i.e. the exit point fixed up by the "Account for stage >>>>>>>>> migration across faults" patch. This would address the following >>>>>>>>> scenario: >>>>>>>>> >>>>>>>>> irqentry_enter on kernel context: >>>>>>>>> stall_inband(); >>>>>>>>> trap_handler(); >>>>>>>>> hard_local_irq_enable() >>>>>>>>> hard_local_irq_disable() >>>>>>>>> if (state.stage_info == IRQENTRY_INBAND_UNSTALLED || >>>>>>>>> state.stage_info == IRQENTRY_OOB) { >>>>>>>>> unstall_inband(); >>>>>>>>> synchronize_pipeline(); >>>>>>>> >>>>>>>> Oh, unstall_inband does not synchronize? Because of >>>>>>>> hard_local_irq_disable or in general? >>>>>>>> >>>>>>> >>>>>>> Ah, found it - unstall != inband_irq_enable. Why not using the latter >>>>>>> directly? >>>>>>> >>>>>> >>>>>> ...because it's instrumented to detect hard_irqs_disabled. >>>>>> >>>>>> Can we safely sync at that point? >>>>> >>>>> Yes we may do this, because we restore the original -unstalled- state of >>>>> the in-band stage. So if we have pending IRQs, we may play them at this >>>>> point. What we may not do obviously, is creating synchronization points >>>>> where the kernel does not expect irqs. >>>>> >>>> >>>> Again: When will any reschedules that might be requested by those >>>> replayed IRQs be handled then? >>>> >>> >>> Nowhere unless we provide a log synchronization routine dedicated to the >>> in-band kernel exit path, which would fold in the rescheduling call via >>> irqentry_exit_cond_resched(). I'll write that one. >>> >> >> How about simply moving the replay up, before the kernel checks for >> pending reschedules anyway? 
We can stall inband then again, to please >> lockdep & Co., as long as there is no hard-irq-on section before the intret. >> > > Yes, this is the right way to do this. I'm on it. Please test this (heavily) with cobalt. This survived a 2-hours long kvm test here, running loops of the evl test suite + hackbench over 64 vcpus, so far so good. No reason for any difference in behavior between cobalt and evl in this area, this is merely a dovetail business. commit 30808d958277a525baad33f3c68ccb7910dc3176 (HEAD -> rebase/v5.10.y-evl) Author: Philippe Gerum <rpm@xenomai.org> genirq: irq_pipeline: synchronize log on irq exit to kernel We must make sure to play any IRQ which might be pending in the in-band log before leaving an interrupt frame for a preempted kernel context. This completes "irq_pipeline: Account for stage migration across faults", so that we synchronize the log once the in-band stage is unstalled. In addition, we also care to do this before preempt_schedule_irq() runs, so that we won't miss any rescheduling request which might have been triggered by some IRQ we just played. Signed-off-by: Philippe Gerum <rpm@xenomai.org> Suggested-by: Jan Kiszka <jan.kiszka@siemens.com> diff --git a/kernel/entry/common.c b/kernel/entry/common.c index 4e81c0c03e5726a..99512307537561b 100644 --- a/kernel/entry/common.c +++ b/kernel/entry/common.c @@ -477,8 +477,57 @@ bool irqexit_may_preempt_schedule(irqentry_state_t state, #endif +#ifdef CONFIG_IRQ_PIPELINE + +static bool irqentry_syncstage(irqentry_state_t state) /* hard irqs off */ +{ + /* + * If pipelining interrupts, enable in-band IRQs then + * synchronize the interrupt log on exit if: + * + * - irqentry_enter() stalled the stage in order to mirror the + * hardware state. + * + * - we where coming from oob, thus went through a stage migration + * that was caused by taking a CPU exception, e.g., a fault. 
+ * + * We run before preempt_schedule_irq() may be called later on + * by preemptible kernels, so that any rescheduling request + * triggered by in-band IRQ handlers is considered. + */ + if (state.stage_info == IRQENTRY_INBAND_UNSTALLED || + state.stage_info == IRQENTRY_OOB) { + unstall_inband_nocheck(); + synchronize_pipeline_on_irq(); + stall_inband_nocheck(); + return true; + } + + return false; +} + +static void irqentry_unstall(void) +{ + unstall_inband_nocheck(); +} + +#else + +static bool irqentry_syncstage(irqentry_state_t state) +{ + return false; +} + +static void irqentry_unstall(void) +{ +} + +#endif + noinstr void irqentry_exit(struct pt_regs *regs, irqentry_state_t state) { + bool synchronized = false; + if (running_oob()) return; @@ -488,7 +537,11 @@ noinstr void irqentry_exit(struct pt_regs *regs, irqentry_state_t state) if (user_mode(regs)) { irqentry_exit_to_user_mode(regs); return; - } else if (irqexit_may_preempt_schedule(state, regs)) { + } + + synchronized = irqentry_syncstage(state); + + if (irqexit_may_preempt_schedule(state, regs)) { /* * If RCU was not watching on entry this needs to be done * carefully and needs the same ordering of lockdep/tracing @@ -521,17 +574,9 @@ noinstr void irqentry_exit(struct pt_regs *regs, irqentry_state_t state) } out: -#ifdef CONFIG_IRQ_PIPELINE - /* - * If pipelining interrupts, clear the in-band stall bit if - * irqentry_enter() raised it in order to mirror the hardware - * state. Also clear it when we where coming from oob, thus went - * through a migration that was caused by taking, e.g., a fault. - */ - if (state.stage_info == IRQENTRY_INBAND_UNSTALLED || - state.stage_info == IRQENTRY_OOB) - unstall_inband(); -#endif + if (synchronized) + irqentry_unstall(); + return; } -- Philippe. ^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH] irq_pipeline: Prevent returning to user space with pending inband IRQs 2021-06-08 17:39 ` Philippe Gerum @ 2021-06-09 5:34 ` Jan Kiszka 2021-06-09 6:12 ` Philippe Gerum 0 siblings, 1 reply; 27+ messages in thread From: Jan Kiszka @ 2021-06-09 5:34 UTC (permalink / raw) To: Philippe Gerum; +Cc: Xenomai On 08.06.21 19:39, Philippe Gerum wrote: > > Philippe Gerum <rpm@xenomai.org> writes: > >> Jan Kiszka <jan.kiszka@siemens.com> writes: >> >>> On 07.06.21 18:19, Philippe Gerum wrote: >>>> >>>> Jan Kiszka <jan.kiszka@siemens.com> writes: >>>> >>>>> On 07.06.21 17:04, Philippe Gerum wrote: >>>>>> >>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes: >>>>>> >>>>>>> On 07.06.21 15:48, Jan Kiszka wrote: >>>>>>>> On 07.06.21 15:44, Jan Kiszka wrote: >>>>>>>>> On 07.06.21 15:03, Philippe Gerum wrote: >>>>>>>>>> >>>>>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes: >>>>>>>>>> >>>>>>>>>>> On 07.06.21 09:17, Philippe Gerum wrote: >>>>>>>>>>>> >>>>>>>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes: >>>>>>>>>>>> >>>>>>>>>>>>> On 06.06.21 16:43, Philippe Gerum via Xenomai wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> Jan Kiszka <jan.kiszka@web.de> writes: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> From: Jan Kiszka <jan.kiszka@siemens.com> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> The correct order is hard IRQs off first, then stalling inband, see also >>>>>>>>>>>>>>> I-pipe. Otherwise, we risk to take an interrupt for the inband stage but >>>>>>>>>>>>>>> do not inject it before returning to user space. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> >>>>>>>>>>>>>>> --- >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Fixes the RCU stalls Florian reported, at least for me. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I'm afraid this wasn't the last regression of translating I-pipe into >>>>>>>>>>>>>>> dovetail.. 
>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> include/linux/entry-common.h | 3 ++- >>>>>>>>>>>>>>> 1 file changed, 2 insertions(+), 1 deletion(-) >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h >>>>>>>>>>>>>>> index 0fb45b2d6094..00540110985e 100644 >>>>>>>>>>>>>>> --- a/include/linux/entry-common.h >>>>>>>>>>>>>>> +++ b/include/linux/entry-common.h >>>>>>>>>>>>>>> @@ -203,7 +203,8 @@ static inline void local_irq_disable_exit_to_user(void); >>>>>>>>>>>>>>> #ifndef local_irq_disable_exit_to_user >>>>>>>>>>>>>>> static inline void local_irq_disable_exit_to_user(void) >>>>>>>>>>>>>>> { >>>>>>>>>>>>>>> - local_irq_disable_full(); >>>>>>>>>>>>>>> + hard_cond_local_irq_disable(); >>>>>>>>>>>>>>> + local_irq_disable(); >>>>>>>>>>>>>>> } >>>>>>>>>>>>>>> #endif >>>>>>>>>>>>>> >>>>>>>>>>>>>> Good spot, real issue here, just like the CPU_IDLE code deals with. The >>>>>>>>>>>>>> fix looks good, but maybe we could do even better. >>>>>>>>>>>>>> >>>>>>>>>>>>>> The way local_irq_disable_full() works tends to introduce this issue in >>>>>>>>>>>>>> a sneaky way, when the code path does not synchronize the interrupt log >>>>>>>>>>>>>> (exit_to_user_mode_loop may be at fault in this case, which does not >>>>>>>>>>>>>> traverses synchronize_pipeline()). Lack of synchronization would still >>>>>>>>>>>>>> hit us if some trap handler turns hard irqs off -> on -> off the same >>>>>>>>>>>>>> way, and we don't leave through the user exit path. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Inverting the order of the actions in local_irq_disable_full() may be an >>>>>>>>>>>>>> option (inband_irq_disable would allow that), making sure we cannot be >>>>>>>>>>>>>> caught in the same issue when returning to kernel mode is another >>>>>>>>>>>>>> way. This needs more thought I think. >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> So, always this way? 
>>>>>>>>>>>>> >>>>>>>>>>>>> local_irq_disable_full: >>>>>>>>>>>>> hard_local_irq_disable >>>>>>>>>>>>> local_irq_disable >>>>>>>>>>>>> >>>>>>>>>>>>> local_irq_enable_full: >>>>>>>>>>>>> hard_local_irq_enable >>>>>>>>>>>>> local_irq_enable >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Yes. >>>>>>>>>>>> >>>>>>>>>>>>> Or even flip local_irq_enable_full as well? >>>>>>>>>>>> >>>>>>>>>>>> Not this one, hard irqs must be on before local_irq_enable() is issued >>>>>>>>>>>> otherwise we would trigger a debug assertion. The reason for such >>>>>>>>>>>> assertion is that the I-pipe version of local_irq_enable() force enables >>>>>>>>>>>> hard irqs on, and this is something we want to depart from because >>>>>>>>>>>> whether the caller expects or not such side-effect is error-prone. For >>>>>>>>>>>> this reason, inband_irq_enable() expects hard irqs on, which should be >>>>>>>>>>>> the current hard irq state for the caller under normal circumstances. >>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Was there ever code that required ordering local_irq_disable_full the >>>>>>>>>>>>> other way around? >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> After review, I don't think so. The current order for >>>>>>>>>>>> local_irq_disable_full() was rather determined by applying a reverse >>>>>>>>>>>> sequence compared to local_irq_enable_full(). So this looks good. I need >>>>>>>>>>>> to test this on the armv[7/8] side here first before merging. >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Is there a commit somewhere that I can drop already? >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> 1. These are fine with me in their latest iteration: >>>>>>>>>> >>>>>>>>>> irq_pipeline: Clean up stage_info field and users >>>>>>>>>> irq_pipeline: Account for stage migration across faults >>>>>>>>>> irq_pipeline: Warn when calling irqentry_enter with oob stalled >>>>>>>>>> x86: dovetail: Fix TS flag reservation >>>>>>>>>> >>>>>>>>>> 2. 
This one should be replaced by a fix in local_irq_disable_full(), >>>>>>>>>> pending some tests ongoing here: >>>>>>>>>> >>>>>>>>>> irq_pipeline: Prevent returning to user space with pending inband IRQs >>>>>>>>>> >>>>>>>>>> However, as I mentioned earlier, this may not be sufficient per se, we >>>>>>>>>> may have to synchronize the in-band log when unstalling the stage on the >>>>>>>>>> kernel exit path, i.e. the exit point fixed up by the "Account for stage >>>>>>>>>> migration across faults" patch. This would address the following >>>>>>>>>> scenario: >>>>>>>>>> >>>>>>>>>> irqentry_enter on kernel context: >>>>>>>>>> stall_inband(); >>>>>>>>>> trap_handler(); >>>>>>>>>> hard_local_irq_enable() >>>>>>>>>> hard_local_irq_disable() >>>>>>>>>> if (state.stage_info == IRQENTRY_INBAND_UNSTALLED || >>>>>>>>>> state.stage_info == IRQENTRY_OOB) { >>>>>>>>>> unstall_inband(); >>>>>>>>>> synchronize_pipeline(); >>>>>>>>> >>>>>>>>> Oh, unstall_inband does not synchronize? Because of >>>>>>>>> hard_local_irq_disable or in general? >>>>>>>>> >>>>>>>> >>>>>>>> Ah, found it - unstall != inband_irq_enable. Why not using the latter >>>>>>>> directly? >>>>>>>> >>>>>>> >>>>>>> ...because it's instrumented to detect hard_irqs_disabled. >>>>>>> >>>>>>> Can we safely sync at that point? >>>>>> >>>>>> Yes we may do this, because we restore the original -unstalled- state of >>>>>> the in-band stage. So if we have pending IRQs, we may play them at this >>>>>> point. What we may not do obviously, is creating synchronization points >>>>>> where the kernel does not expect irqs. >>>>>> >>>>> >>>>> Again: When will any reschedules that might be requested by those >>>>> replayed IRQs be handled then? >>>>> >>>> >>>> Nowhere unless we provide a log synchronization routine dedicated to the >>>> in-band kernel exit path, which would fold in the rescheduling call via >>>> irqentry_exit_cond_resched(). I'll write that one. 
>>>> >>> >>> How about simply moving the replay up, before the kernel checks for >>> pending reschedules anyway? We can stall inband then again, to please >>> lockdep & Co., as long as there is no hard-irq-on section before the intret. >>> >> >> Yes, this is the right way to do this. I'm on it. > > Please test this (heavily) with cobalt. This survived a 2-hours long kvm > test here, running loops of the evl test suite + hackbench over 64 vcpus, > so far so good. No reason for any difference in behavior between cobalt > and evl in this area, this is merely a dovetail business. > > commit 30808d958277a525baad33f3c68ccb7910dc3176 (HEAD -> rebase/v5.10.y-evl) > Author: Philippe Gerum <rpm@xenomai.org> > > genirq: irq_pipeline: synchronize log on irq exit to kernel > > We must make sure to play any IRQ which might be pending in the > in-band log before leaving an interrupt frame for a preempted kernel > context. > > This completes "irq_pipeline: Account for stage migration across > faults", so that we synchronize the log once the in-band stage is > unstalled. In addition, we also care to do this before > preempt_schedule_irq() runs, so that we won't miss any rescheduling > request which might have been triggered by some IRQ we just played. > > Signed-off-by: Philippe Gerum <rpm@xenomai.org> > Suggested-by: Jan Kiszka <jan.kiszka@siemens.com> > > diff --git a/kernel/entry/common.c b/kernel/entry/common.c > index 4e81c0c03e5726a..99512307537561b 100644 > --- a/kernel/entry/common.c > +++ b/kernel/entry/common.c > @@ -477,8 +477,57 @@ bool irqexit_may_preempt_schedule(irqentry_state_t state, > > #endif > > +#ifdef CONFIG_IRQ_PIPELINE > + > +static bool irqentry_syncstage(irqentry_state_t state) /* hard irqs off */ > +{ > + /* > + * If pipelining interrupts, enable in-band IRQs then > + * synchronize the interrupt log on exit if: > + * > + * - irqentry_enter() stalled the stage in order to mirror the > + * hardware state. 
> + * > + * - we where coming from oob, thus went through a stage migration > + * that was caused by taking a CPU exception, e.g., a fault. > + * > + * We run before preempt_schedule_irq() may be called later on > + * by preemptible kernels, so that any rescheduling request > + * triggered by in-band IRQ handlers is considered. > + */ > + if (state.stage_info == IRQENTRY_INBAND_UNSTALLED || > + state.stage_info == IRQENTRY_OOB) { > + unstall_inband_nocheck(); > + synchronize_pipeline_on_irq(); > + stall_inband_nocheck(); > + return true; > + } > + > + return false; > +} > + > +static void irqentry_unstall(void) > +{ > + unstall_inband_nocheck(); > +} > + > +#else > + > +static bool irqentry_syncstage(irqentry_state_t state) > +{ > + return false; > +} > + > +static void irqentry_unstall(void) > +{ > +} > + > +#endif > + > noinstr void irqentry_exit(struct pt_regs *regs, irqentry_state_t state) > { > + bool synchronized = false; > + > if (running_oob()) > return; > > @@ -488,7 +537,11 @@ noinstr void irqentry_exit(struct pt_regs *regs, irqentry_state_t state) > if (user_mode(regs)) { > irqentry_exit_to_user_mode(regs); > return; > - } else if (irqexit_may_preempt_schedule(state, regs)) { > + } > + > + synchronized = irqentry_syncstage(state); > + > + if (irqexit_may_preempt_schedule(state, regs)) { > /* > * If RCU was not watching on entry this needs to be done > * carefully and needs the same ordering of lockdep/tracing > @@ -521,17 +574,9 @@ noinstr void irqentry_exit(struct pt_regs *regs, irqentry_state_t state) > } > > out: > -#ifdef CONFIG_IRQ_PIPELINE > - /* > - * If pipelining interrupts, clear the in-band stall bit if > - * irqentry_enter() raised it in order to mirror the hardware > - * state. Also clear it when we where coming from oob, thus went > - * through a migration that was caused by taking, e.g., a fault. 
> -	 */
> -	if (state.stage_info == IRQENTRY_INBAND_UNSTALLED ||
> -	    state.stage_info == IRQENTRY_OOB)
> -		unstall_inband();
> -#endif
> +	if (synchronized)
> +		irqentry_unstall();
> +
> 	return;
>  }
> 

I've put that into a staging/v5.10.y-dovetail branch, pointed
xenomai-images to that and started a run in our lab. That should give us
some confidence that things work under cobalt across all archs, and we
can make that commit official.

However, heavy stressing would require triggering migration-on-fault in
tighter loops while running something like stress-ng in the background.
Maybe it would make sense to add such a pattern to the switchtest (would
also require a way to silence the related kernel warnings during
runtime, not only build-time)?

Jan

-- 
Siemens AG, T RDA IOT
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 27+ messages in thread
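The local_irq_disable_full() ordering debate quoted in this message can also be reduced to a toy model. The sketch below is hypothetical mock state, not kernel code; it only shows why "hard IRQs off first, then stall inband" closes the window in which an interrupt is logged but never replayed before the return to user space.

```c
#include <assert.h>
#include <stdbool.h>

/* Mock two-stage state: a CPU interrupt line, a virtual mask for the
 * in-band stage, and a log of deferred in-band IRQs. */
struct pipeline {
    bool hard_irqs_on;   /* hardware interrupt line */
    bool inband_stalled; /* virtual mask (stall bit) */
    int  pending;        /* IRQs sitting in the in-band log */
    int  delivered;      /* IRQs handled in-band */
};

/* A hardware IRQ can land whenever the line is open; if the in-band
 * stage is stalled it is only logged, otherwise delivered at once. */
static void maybe_take_irq(struct pipeline *p)
{
    if (!p->hard_irqs_on)
        return;
    if (p->inband_stalled)
        p->pending++;
    else
        p->delivered++;
}

/* Buggy order: stall in-band first, then mask the line. An IRQ
 * hitting in between is logged but never replayed before intret. */
static int exit_path_buggy(struct pipeline *p)
{
    p->inband_stalled = true;
    maybe_take_irq(p);          /* the window */
    p->hard_irqs_on = false;
    return p->pending;          /* left behind at exit */
}

/* Fixed order: mask the line first, then stall; no window remains. */
static int exit_path_fixed(struct pipeline *p)
{
    p->hard_irqs_on = false;
    maybe_take_irq(p);          /* cannot fire: line is masked */
    p->inband_stalled = true;
    return p->pending;
}
```

Starting from an open line and an unstalled stage, the buggy order leaves one IRQ stranded in the log while the fixed order leaves none, which is the failure mode behind the RCU stalls reported at the top of the thread.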
* Re: [PATCH] irq_pipeline: Prevent returning to user space with pending inband IRQs 2021-06-09 5:34 ` Jan Kiszka @ 2021-06-09 6:12 ` Philippe Gerum 2021-06-11 7:15 ` Jan Kiszka 0 siblings, 1 reply; 27+ messages in thread From: Philippe Gerum @ 2021-06-09 6:12 UTC (permalink / raw) To: Jan Kiszka; +Cc: Xenomai Jan Kiszka <jan.kiszka@siemens.com> writes: > On 08.06.21 19:39, Philippe Gerum wrote: >> >> Philippe Gerum <rpm@xenomai.org> writes: >> >>> Jan Kiszka <jan.kiszka@siemens.com> writes: >>> >>>> On 07.06.21 18:19, Philippe Gerum wrote: >>>>> >>>>> Jan Kiszka <jan.kiszka@siemens.com> writes: >>>>> >>>>>> On 07.06.21 17:04, Philippe Gerum wrote: >>>>>>> >>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes: >>>>>>> >>>>>>>> On 07.06.21 15:48, Jan Kiszka wrote: >>>>>>>>> On 07.06.21 15:44, Jan Kiszka wrote: >>>>>>>>>> On 07.06.21 15:03, Philippe Gerum wrote: >>>>>>>>>>> >>>>>>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes: >>>>>>>>>>> >>>>>>>>>>>> On 07.06.21 09:17, Philippe Gerum wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes: >>>>>>>>>>>>> >>>>>>>>>>>>>> On 06.06.21 16:43, Philippe Gerum via Xenomai wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Jan Kiszka <jan.kiszka@web.de> writes: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> From: Jan Kiszka <jan.kiszka@siemens.com> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> The correct order is hard IRQs off first, then stalling inband, see also >>>>>>>>>>>>>>>> I-pipe. Otherwise, we risk to take an interrupt for the inband stage but >>>>>>>>>>>>>>>> do not inject it before returning to user space. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> >>>>>>>>>>>>>>>> --- >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Fixes the RCU stalls Florian reported, at least for me. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I'm afraid this wasn't the last regression of translating I-pipe into >>>>>>>>>>>>>>>> dovetail.. 
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> include/linux/entry-common.h | 3 ++- >>>>>>>>>>>>>>>> 1 file changed, 2 insertions(+), 1 deletion(-) >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h >>>>>>>>>>>>>>>> index 0fb45b2d6094..00540110985e 100644 >>>>>>>>>>>>>>>> --- a/include/linux/entry-common.h >>>>>>>>>>>>>>>> +++ b/include/linux/entry-common.h >>>>>>>>>>>>>>>> @@ -203,7 +203,8 @@ static inline void local_irq_disable_exit_to_user(void); >>>>>>>>>>>>>>>> #ifndef local_irq_disable_exit_to_user >>>>>>>>>>>>>>>> static inline void local_irq_disable_exit_to_user(void) >>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>> - local_irq_disable_full(); >>>>>>>>>>>>>>>> + hard_cond_local_irq_disable(); >>>>>>>>>>>>>>>> + local_irq_disable(); >>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>> #endif >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Good spot, real issue here, just like the CPU_IDLE code deals with. The >>>>>>>>>>>>>>> fix looks good, but maybe we could do even better. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> The way local_irq_disable_full() works tends to introduce this issue in >>>>>>>>>>>>>>> a sneaky way, when the code path does not synchronize the interrupt log >>>>>>>>>>>>>>> (exit_to_user_mode_loop may be at fault in this case, which does not >>>>>>>>>>>>>>> traverses synchronize_pipeline()). Lack of synchronization would still >>>>>>>>>>>>>>> hit us if some trap handler turns hard irqs off -> on -> off the same >>>>>>>>>>>>>>> way, and we don't leave through the user exit path. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Inverting the order of the actions in local_irq_disable_full() may be an >>>>>>>>>>>>>>> option (inband_irq_disable would allow that), making sure we cannot be >>>>>>>>>>>>>>> caught in the same issue when returning to kernel mode is another >>>>>>>>>>>>>>> way. This needs more thought I think. >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> So, always this way? 
>>>>>>>>>>>>>> >>>>>>>>>>>>>> local_irq_disable_full: >>>>>>>>>>>>>> hard_local_irq_disable >>>>>>>>>>>>>> local_irq_disable >>>>>>>>>>>>>> >>>>>>>>>>>>>> local_irq_enable_full: >>>>>>>>>>>>>> hard_local_irq_enable >>>>>>>>>>>>>> local_irq_enable >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Yes. >>>>>>>>>>>>> >>>>>>>>>>>>>> Or even flip local_irq_enable_full as well? >>>>>>>>>>>>> >>>>>>>>>>>>> Not this one, hard irqs must be on before local_irq_enable() is issued >>>>>>>>>>>>> otherwise we would trigger a debug assertion. The reason for such >>>>>>>>>>>>> assertion is that the I-pipe version of local_irq_enable() force enables >>>>>>>>>>>>> hard irqs on, and this is something we want to depart from because >>>>>>>>>>>>> whether the caller expects or not such side-effect is error-prone. For >>>>>>>>>>>>> this reason, inband_irq_enable() expects hard irqs on, which should be >>>>>>>>>>>>> the current hard irq state for the caller under normal circumstances. >>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Was there ever code that required ordering local_irq_disable_full the >>>>>>>>>>>>>> other way around? >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> After review, I don't think so. The current order for >>>>>>>>>>>>> local_irq_disable_full() was rather determined by applying a reverse >>>>>>>>>>>>> sequence compared to local_irq_enable_full(). So this looks good. I need >>>>>>>>>>>>> to test this on the armv[7/8] side here first before merging. >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Is there a commit somewhere that I can drop already? >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> 1. These are fine with me in their latest iteration: >>>>>>>>>>> >>>>>>>>>>> irq_pipeline: Clean up stage_info field and users >>>>>>>>>>> irq_pipeline: Account for stage migration across faults >>>>>>>>>>> irq_pipeline: Warn when calling irqentry_enter with oob stalled >>>>>>>>>>> x86: dovetail: Fix TS flag reservation >>>>>>>>>>> >>>>>>>>>>> 2. 
This one should be replaced by a fix in local_irq_disable_full(), >>>>>>>>>>> pending some tests ongoing here: >>>>>>>>>>> >>>>>>>>>>> irq_pipeline: Prevent returning to user space with pending inband IRQs >>>>>>>>>>> >>>>>>>>>>> However, as I mentioned earlier, this may not be sufficient per se, we >>>>>>>>>>> may have to synchronize the in-band log when unstalling the stage on the >>>>>>>>>>> kernel exit path, i.e. the exit point fixed up by the "Account for stage >>>>>>>>>>> migration across faults" patch. This would address the following >>>>>>>>>>> scenario: >>>>>>>>>>> >>>>>>>>>>> irqentry_enter on kernel context: >>>>>>>>>>> stall_inband(); >>>>>>>>>>> trap_handler(); >>>>>>>>>>> hard_local_irq_enable() >>>>>>>>>>> hard_local_irq_disable() >>>>>>>>>>> if (state.stage_info == IRQENTRY_INBAND_UNSTALLED || >>>>>>>>>>> state.stage_info == IRQENTRY_OOB) { >>>>>>>>>>> unstall_inband(); >>>>>>>>>>> synchronize_pipeline(); >>>>>>>>>> >>>>>>>>>> Oh, unstall_inband does not synchronize? Because of >>>>>>>>>> hard_local_irq_disable or in general? >>>>>>>>>> >>>>>>>>> >>>>>>>>> Ah, found it - unstall != inband_irq_enable. Why not using the latter >>>>>>>>> directly? >>>>>>>>> >>>>>>>> >>>>>>>> ...because it's instrumented to detect hard_irqs_disabled. >>>>>>>> >>>>>>>> Can we safely sync at that point? >>>>>>> >>>>>>> Yes we may do this, because we restore the original -unstalled- state of >>>>>>> the in-band stage. So if we have pending IRQs, we may play them at this >>>>>>> point. What we may not do obviously, is creating synchronization points >>>>>>> where the kernel does not expect irqs. >>>>>>> >>>>>> >>>>>> Again: When will any reschedules that might be requested by those >>>>>> replayed IRQs be handled then? >>>>>> >>>>> >>>>> Nowhere unless we provide a log synchronization routine dedicated to the >>>>> in-band kernel exit path, which would fold in the rescheduling call via >>>>> irqentry_exit_cond_resched(). I'll write that one. 
>>>>> >>>> >>>> How about simply moving the replay up, before the kernel checks for >>>> pending reschedules anyway? We can stall inband then again, to please >>>> lockdep & Co., as long as there is no hard-irq-on section before the intret. >>>> >>> >>> Yes, this is the right way to do this. I'm on it. >> >> Please test this (heavily) with cobalt. This survived a 2-hours long kvm >> test here, running loops of the evl test suite + hackbench over 64 vcpus, >> so far so good. No reason for any difference in behavior between cobalt >> and evl in this area, this is merely a dovetail business. >> >> commit 30808d958277a525baad33f3c68ccb7910dc3176 (HEAD -> rebase/v5.10.y-evl) >> Author: Philippe Gerum <rpm@xenomai.org> >> >> genirq: irq_pipeline: synchronize log on irq exit to kernel >> >> We must make sure to play any IRQ which might be pending in the >> in-band log before leaving an interrupt frame for a preempted kernel >> context. >> >> This completes "irq_pipeline: Account for stage migration across >> faults", so that we synchronize the log once the in-band stage is >> unstalled. In addition, we also care to do this before >> preempt_schedule_irq() runs, so that we won't miss any rescheduling >> request which might have been triggered by some IRQ we just played. >> >> Signed-off-by: Philippe Gerum <rpm@xenomai.org> >> Suggested-by: Jan Kiszka <jan.kiszka@siemens.com> >> >> diff --git a/kernel/entry/common.c b/kernel/entry/common.c >> index 4e81c0c03e5726a..99512307537561b 100644 >> --- a/kernel/entry/common.c >> +++ b/kernel/entry/common.c >> @@ -477,8 +477,57 @@ bool irqexit_may_preempt_schedule(irqentry_state_t state, >> >> #endif >> >> +#ifdef CONFIG_IRQ_PIPELINE >> + >> +static bool irqentry_syncstage(irqentry_state_t state) /* hard irqs off */ >> +{ >> + /* >> + * If pipelining interrupts, enable in-band IRQs then >> + * synchronize the interrupt log on exit if: >> + * >> + * - irqentry_enter() stalled the stage in order to mirror the >> + * hardware state. 
>> + * >> + * - we where coming from oob, thus went through a stage migration >> + * that was caused by taking a CPU exception, e.g., a fault. >> + * >> + * We run before preempt_schedule_irq() may be called later on >> + * by preemptible kernels, so that any rescheduling request >> + * triggered by in-band IRQ handlers is considered. >> + */ >> + if (state.stage_info == IRQENTRY_INBAND_UNSTALLED || >> + state.stage_info == IRQENTRY_OOB) { >> + unstall_inband_nocheck(); >> + synchronize_pipeline_on_irq(); >> + stall_inband_nocheck(); >> + return true; >> + } >> + >> + return false; >> +} >> + >> +static void irqentry_unstall(void) >> +{ >> + unstall_inband_nocheck(); >> +} >> + >> +#else >> + >> +static bool irqentry_syncstage(irqentry_state_t state) >> +{ >> + return false; >> +} >> + >> +static void irqentry_unstall(void) >> +{ >> +} >> + >> +#endif >> + >> noinstr void irqentry_exit(struct pt_regs *regs, irqentry_state_t state) >> { >> + bool synchronized = false; >> + >> if (running_oob()) >> return; >> >> @@ -488,7 +537,11 @@ noinstr void irqentry_exit(struct pt_regs *regs, irqentry_state_t state) >> if (user_mode(regs)) { >> irqentry_exit_to_user_mode(regs); >> return; >> - } else if (irqexit_may_preempt_schedule(state, regs)) { >> + } >> + >> + synchronized = irqentry_syncstage(state); >> + >> + if (irqexit_may_preempt_schedule(state, regs)) { >> /* >> * If RCU was not watching on entry this needs to be done >> * carefully and needs the same ordering of lockdep/tracing >> @@ -521,17 +574,9 @@ noinstr void irqentry_exit(struct pt_regs *regs, irqentry_state_t state) >> } >> >> out: >> -#ifdef CONFIG_IRQ_PIPELINE >> - /* >> - * If pipelining interrupts, clear the in-band stall bit if >> - * irqentry_enter() raised it in order to mirror the hardware >> - * state. Also clear it when we where coming from oob, thus went >> - * through a migration that was caused by taking, e.g., a fault. 
>> -	 */
>> -	if (state.stage_info == IRQENTRY_INBAND_UNSTALLED ||
>> -	    state.stage_info == IRQENTRY_OOB)
>> -		unstall_inband();
>> -#endif
>> +	if (synchronized)
>> +		irqentry_unstall();
>> +
>> 	return;
>>  }
>> 
> 
> I've put that into a staging/v5.10.y-dovetail branch, pointed
> xenomai-images to that and started a run in our lab. That should give us
> some confidence that things work under cobalt across all archs, and we
> can make that commit official.
> 
> However, heavy stressing would require triggering migration-on-fault in
> tighter loops while running something like stress-ng in the background.
> Maybe it would make sense to add such a pattern to the switchtest (would
> also require a way to silence the related kernel warnings during
> runtime, not only build-time)?
> 

switchtest aims at exercising the core scheduling logic including the
transition between execution stages, so migration-on-fault would fit
there indeed.

-- 
Philippe.

^ permalink raw reply	[flat|nested] 27+ messages in thread
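One subtlety buried in the quoted exchange ("unstall != inband_irq_enable") is worth a tiny illustration: clearing the stall bit alone does not drain the log, which is why the fix has to call the synchronization routine explicitly. The helpers below are hypothetical mocks sketching that distinction, not the actual Dovetail API.

```c
#include <assert.h>
#include <stdbool.h>

/* Mock in-band stage state: a stall bit and a log of deferred IRQs. */
static int  pending, delivered;
static bool stalled;

/* unstall_inband()-like: clears the virtual mask only; anything
 * already logged stays pending until someone synchronizes. */
static void unstall_inband_model(void)
{
    stalled = false;
}

/* inband_irq_enable()-like: clears the mask *and* replays the log
 * (in the real API this also expects hard irqs to be on). */
static void inband_irq_enable_model(void)
{
    stalled = false;
    while (pending > 0) {
        pending--;
        delivered++;
    }
}
```

A pending IRQ survives the plain unstall and is only played by the enabling variant, mirroring why the exit path needs an explicit synchronize step between the nocheck unstall/stall pair.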
* Re: [PATCH] irq_pipeline: Prevent returning to user space with pending inband IRQs 2021-06-09 6:12 ` Philippe Gerum @ 2021-06-11 7:15 ` Jan Kiszka 2021-06-11 15:39 ` Philippe Gerum 0 siblings, 1 reply; 27+ messages in thread From: Jan Kiszka @ 2021-06-11 7:15 UTC (permalink / raw) To: Philippe Gerum; +Cc: Xenomai On 09.06.21 08:12, Philippe Gerum wrote: > > Jan Kiszka <jan.kiszka@siemens.com> writes: > >> On 08.06.21 19:39, Philippe Gerum wrote: >>> >>> Philippe Gerum <rpm@xenomai.org> writes: >>> >>>> Jan Kiszka <jan.kiszka@siemens.com> writes: >>>> >>>>> On 07.06.21 18:19, Philippe Gerum wrote: >>>>>> >>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes: >>>>>> >>>>>>> On 07.06.21 17:04, Philippe Gerum wrote: >>>>>>>> >>>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes: >>>>>>>> >>>>>>>>> On 07.06.21 15:48, Jan Kiszka wrote: >>>>>>>>>> On 07.06.21 15:44, Jan Kiszka wrote: >>>>>>>>>>> On 07.06.21 15:03, Philippe Gerum wrote: >>>>>>>>>>>> >>>>>>>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes: >>>>>>>>>>>> >>>>>>>>>>>>> On 07.06.21 09:17, Philippe Gerum wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> On 06.06.21 16:43, Philippe Gerum via Xenomai wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Jan Kiszka <jan.kiszka@web.de> writes: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> From: Jan Kiszka <jan.kiszka@siemens.com> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> The correct order is hard IRQs off first, then stalling inband, see also >>>>>>>>>>>>>>>>> I-pipe. Otherwise, we risk to take an interrupt for the inband stage but >>>>>>>>>>>>>>>>> do not inject it before returning to user space. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> >>>>>>>>>>>>>>>>> --- >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Fixes the RCU stalls Florian reported, at least for me. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I'm afraid this wasn't the last regression of translating I-pipe into >>>>>>>>>>>>>>>>> dovetail.. 
>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> include/linux/entry-common.h | 3 ++- >>>>>>>>>>>>>>>>> 1 file changed, 2 insertions(+), 1 deletion(-) >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h >>>>>>>>>>>>>>>>> index 0fb45b2d6094..00540110985e 100644 >>>>>>>>>>>>>>>>> --- a/include/linux/entry-common.h >>>>>>>>>>>>>>>>> +++ b/include/linux/entry-common.h >>>>>>>>>>>>>>>>> @@ -203,7 +203,8 @@ static inline void local_irq_disable_exit_to_user(void); >>>>>>>>>>>>>>>>> #ifndef local_irq_disable_exit_to_user >>>>>>>>>>>>>>>>> static inline void local_irq_disable_exit_to_user(void) >>>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>>> - local_irq_disable_full(); >>>>>>>>>>>>>>>>> + hard_cond_local_irq_disable(); >>>>>>>>>>>>>>>>> + local_irq_disable(); >>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>> #endif >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Good spot, real issue here, just like the CPU_IDLE code deals with. The >>>>>>>>>>>>>>>> fix looks good, but maybe we could do even better. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> The way local_irq_disable_full() works tends to introduce this issue in >>>>>>>>>>>>>>>> a sneaky way, when the code path does not synchronize the interrupt log >>>>>>>>>>>>>>>> (exit_to_user_mode_loop may be at fault in this case, which does not >>>>>>>>>>>>>>>> traverses synchronize_pipeline()). Lack of synchronization would still >>>>>>>>>>>>>>>> hit us if some trap handler turns hard irqs off -> on -> off the same >>>>>>>>>>>>>>>> way, and we don't leave through the user exit path. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Inverting the order of the actions in local_irq_disable_full() may be an >>>>>>>>>>>>>>>> option (inband_irq_disable would allow that), making sure we cannot be >>>>>>>>>>>>>>>> caught in the same issue when returning to kernel mode is another >>>>>>>>>>>>>>>> way. This needs more thought I think. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> So, always this way? 
>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> local_irq_disable_full: >>>>>>>>>>>>>>> hard_local_irq_disable >>>>>>>>>>>>>>> local_irq_disable >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> local_irq_enable_full: >>>>>>>>>>>>>>> hard_local_irq_enable >>>>>>>>>>>>>>> local_irq_enable >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Yes. >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Or even flip local_irq_enable_full as well? >>>>>>>>>>>>>> >>>>>>>>>>>>>> Not this one, hard irqs must be on before local_irq_enable() is issued >>>>>>>>>>>>>> otherwise we would trigger a debug assertion. The reason for such >>>>>>>>>>>>>> assertion is that the I-pipe version of local_irq_enable() force enables >>>>>>>>>>>>>> hard irqs on, and this is something we want to depart from because >>>>>>>>>>>>>> whether the caller expects or not such side-effect is error-prone. For >>>>>>>>>>>>>> this reason, inband_irq_enable() expects hard irqs on, which should be >>>>>>>>>>>>>> the current hard irq state for the caller under normal circumstances. >>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Was there ever code that required ordering local_irq_disable_full the >>>>>>>>>>>>>>> other way around? >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> After review, I don't think so. The current order for >>>>>>>>>>>>>> local_irq_disable_full() was rather determined by applying a reverse >>>>>>>>>>>>>> sequence compared to local_irq_enable_full(). So this looks good. I need >>>>>>>>>>>>>> to test this on the armv[7/8] side here first before merging. >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Is there a commit somewhere that I can drop already? >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> 1. These are fine with me in their latest iteration: >>>>>>>>>>>> >>>>>>>>>>>> irq_pipeline: Clean up stage_info field and users >>>>>>>>>>>> irq_pipeline: Account for stage migration across faults >>>>>>>>>>>> irq_pipeline: Warn when calling irqentry_enter with oob stalled >>>>>>>>>>>> x86: dovetail: Fix TS flag reservation >>>>>>>>>>>> >>>>>>>>>>>> 2. 
This one should be replaced by a fix in local_irq_disable_full(), >>>>>>>>>>>> pending some tests ongoing here: >>>>>>>>>>>> >>>>>>>>>>>> irq_pipeline: Prevent returning to user space with pending inband IRQs >>>>>>>>>>>> >>>>>>>>>>>> However, as I mentioned earlier, this may not be sufficient per se, we >>>>>>>>>>>> may have to synchronize the in-band log when unstalling the stage on the >>>>>>>>>>>> kernel exit path, i.e. the exit point fixed up by the "Account for stage >>>>>>>>>>>> migration across faults" patch. This would address the following >>>>>>>>>>>> scenario: >>>>>>>>>>>> >>>>>>>>>>>> irqentry_enter on kernel context: >>>>>>>>>>>> stall_inband(); >>>>>>>>>>>> trap_handler(); >>>>>>>>>>>> hard_local_irq_enable() >>>>>>>>>>>> hard_local_irq_disable() >>>>>>>>>>>> if (state.stage_info == IRQENTRY_INBAND_UNSTALLED || >>>>>>>>>>>> state.stage_info == IRQENTRY_OOB) { >>>>>>>>>>>> unstall_inband(); >>>>>>>>>>>> synchronize_pipeline(); >>>>>>>>>>> >>>>>>>>>>> Oh, unstall_inband does not synchronize? Because of >>>>>>>>>>> hard_local_irq_disable or in general? >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Ah, found it - unstall != inband_irq_enable. Why not using the latter >>>>>>>>>> directly? >>>>>>>>>> >>>>>>>>> >>>>>>>>> ...because it's instrumented to detect hard_irqs_disabled. >>>>>>>>> >>>>>>>>> Can we safely sync at that point? >>>>>>>> >>>>>>>> Yes we may do this, because we restore the original -unstalled- state of >>>>>>>> the in-band stage. So if we have pending IRQs, we may play them at this >>>>>>>> point. What we may not do obviously, is creating synchronization points >>>>>>>> where the kernel does not expect irqs. >>>>>>>> >>>>>>> >>>>>>> Again: When will any reschedules that might be requested by those >>>>>>> replayed IRQs be handled then? >>>>>>> >>>>>> >>>>>> Nowhere unless we provide a log synchronization routine dedicated to the >>>>>> in-band kernel exit path, which would fold in the rescheduling call via >>>>>> irqentry_exit_cond_resched(). 
I'll write that one. >>>>>> >>>>> >>>>> How about simply moving the replay up, before the kernel checks for >>>>> pending reschedules anyway? We can stall inband then again, to please >>>>> lockdep & Co., as long as there is no hard-irq-on section before the intret. >>>>> >>>> >>>> Yes, this is the right way to do this. I'm on it. >>> >>> Please test this (heavily) with cobalt. This survived a 2-hours long kvm >>> test here, running loops of the evl test suite + hackbench over 64 vcpus, >>> so far so good. No reason for any difference in behavior between cobalt >>> and evl in this area, this is merely a dovetail business. >>> >>> commit 30808d958277a525baad33f3c68ccb7910dc3176 (HEAD -> rebase/v5.10.y-evl) >>> Author: Philippe Gerum <rpm@xenomai.org> >>> >>> genirq: irq_pipeline: synchronize log on irq exit to kernel >>> >>> We must make sure to play any IRQ which might be pending in the >>> in-band log before leaving an interrupt frame for a preempted kernel >>> context. >>> >>> This completes "irq_pipeline: Account for stage migration across >>> faults", so that we synchronize the log once the in-band stage is >>> unstalled. In addition, we also care to do this before >>> preempt_schedule_irq() runs, so that we won't miss any rescheduling >>> request which might have been triggered by some IRQ we just played. 
>>> >>> Signed-off-by: Philippe Gerum <rpm@xenomai.org> >>> Suggested-by: Jan Kiszka <jan.kiszka@siemens.com> >>> >>> diff --git a/kernel/entry/common.c b/kernel/entry/common.c >>> index 4e81c0c03e5726a..99512307537561b 100644 >>> --- a/kernel/entry/common.c >>> +++ b/kernel/entry/common.c >>> @@ -477,8 +477,57 @@ bool irqexit_may_preempt_schedule(irqentry_state_t state, >>> >>> #endif >>> >>> +#ifdef CONFIG_IRQ_PIPELINE >>> + >>> +static bool irqentry_syncstage(irqentry_state_t state) /* hard irqs off */ >>> +{ >>> + /* >>> + * If pipelining interrupts, enable in-band IRQs then >>> + * synchronize the interrupt log on exit if: >>> + * >>> + * - irqentry_enter() stalled the stage in order to mirror the >>> + * hardware state. >>> + * >>> + * - we where coming from oob, thus went through a stage migration >>> + * that was caused by taking a CPU exception, e.g., a fault. >>> + * >>> + * We run before preempt_schedule_irq() may be called later on >>> + * by preemptible kernels, so that any rescheduling request >>> + * triggered by in-band IRQ handlers is considered. 
>>> + */ >>> + if (state.stage_info == IRQENTRY_INBAND_UNSTALLED || >>> + state.stage_info == IRQENTRY_OOB) { >>> + unstall_inband_nocheck(); >>> + synchronize_pipeline_on_irq(); >>> + stall_inband_nocheck(); >>> + return true; >>> + } >>> + >>> + return false; >>> +} >>> + >>> +static void irqentry_unstall(void) >>> +{ >>> + unstall_inband_nocheck(); >>> +} >>> + >>> +#else >>> + >>> +static bool irqentry_syncstage(irqentry_state_t state) >>> +{ >>> + return false; >>> +} >>> + >>> +static void irqentry_unstall(void) >>> +{ >>> +} >>> + >>> +#endif >>> + >>> noinstr void irqentry_exit(struct pt_regs *regs, irqentry_state_t state) >>> { >>> + bool synchronized = false; >>> + >>> if (running_oob()) >>> return; >>> >>> @@ -488,7 +537,11 @@ noinstr void irqentry_exit(struct pt_regs *regs, irqentry_state_t state) >>> if (user_mode(regs)) { >>> irqentry_exit_to_user_mode(regs); >>> return; >>> - } else if (irqexit_may_preempt_schedule(state, regs)) { >>> + } >>> + >>> + synchronized = irqentry_syncstage(state); >>> + >>> + if (irqexit_may_preempt_schedule(state, regs)) { >>> /* >>> * If RCU was not watching on entry this needs to be done >>> * carefully and needs the same ordering of lockdep/tracing >>> @@ -521,17 +574,9 @@ noinstr void irqentry_exit(struct pt_regs *regs, irqentry_state_t state) >>> } >>> >>> out: >>> -#ifdef CONFIG_IRQ_PIPELINE >>> - /* >>> - * If pipelining interrupts, clear the in-band stall bit if >>> - * irqentry_enter() raised it in order to mirror the hardware >>> - * state. Also clear it when we where coming from oob, thus went >>> - * through a migration that was caused by taking, e.g., a fault. >>> - */ >>> - if (state.stage_info == IRQENTRY_INBAND_UNSTALLED || >>> - state.stage_info == IRQENTRY_OOB) >>> - unstall_inband(); >>> -#endif >>> + if (synchronized) >>> + irqentry_unstall(); >>> + >>> return; >>> } >>> >>> >> >> I've put that into a staging/v5.10.y-dovetail branch, pointed >> xenomai-images to that and started a run in our lab. 
That should give us >> some confidence that things work under cobalt across all archs, and we >> can make that commit official. >> >> However, heavy stressing would require triggering migration-on-fault in >> tighter loops while running something like stress-ng in the background. >> Maybe it would make sense to add such a pattern to the switchtest (would >> also require a way to silence the related kernel warnings during >> runtime, not only build-time)? >> > > switchtest aims at exercising the core scheduling logic including the > transition between execution stages, so migration-on-fault would fit > there indeed. > Do we need this testcase first? My feeling from review and testing is that it is safe to move forward with the patch already. Jan -- Siemens AG, T RDA IOT Corporate Competence Center Embedded Linux ^ permalink raw reply [flat|nested] 27+ messages in thread
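[Editor's note] The race discussed above can be sketched as a toy state machine: one flag for the CPU's hard interrupt state, one for the in-band stage's virtual stall bit, and a counter for IRQs deferred into the in-band log. The names mirror Dovetail's, but this is a deliberate simplification for illustration, not the real implementation — in particular, `maybe_take_irq()` stands in for hardware interrupt delivery.

```c
#include <assert.h>
#include <stdbool.h>

static bool hard_irqs_on;   /* CPU interrupt flag */
static bool inband_stalled; /* virtual mask of the in-band stage */
static int  log_pending;    /* IRQs deferred into the in-band log */

static void maybe_take_irq(void)
{
    if (!hard_irqs_on)
        return;            /* CPU cannot take the interrupt at all */
    if (inband_stalled)
        log_pending++;     /* deferred: must be synchronized later */
    /* else: delivered immediately, nothing left pending */
}

/* Pre-fix ordering: stall first, then hard irqs off. An IRQ firing
 * in the window between the two lands in the log, with no remaining
 * synchronization point before the return to user space. */
static int exit_path_stall_first(void)
{
    inband_stalled = true;
    maybe_take_irq();      /* window: hard irqs are still on */
    hard_irqs_on = false;
    return log_pending;    /* > 0 means an IRQ is stuck in the log */
}

/* Fixed ordering per the patch: hard irqs off first, then stall. */
static int exit_path_hard_first(void)
{
    hard_irqs_on = false;
    maybe_take_irq();      /* CPU is masked: nothing can slip in */
    inband_stalled = true;
    return log_pending;
}

static void reset_model(void)
{
    hard_irqs_on = true;
    inband_stalled = false;
    log_pending = 0;
}
```

Running both orderings against an IRQ arriving in the critical window shows why only the hard-first sequence is safe on the user exit path.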
* Re: [PATCH] irq_pipeline: Prevent returning to user space with pending inband IRQs 2021-06-11 7:15 ` Jan Kiszka @ 2021-06-11 15:39 ` Philippe Gerum 0 siblings, 0 replies; 27+ messages in thread From: Philippe Gerum @ 2021-06-11 15:39 UTC (permalink / raw) To: Jan Kiszka; +Cc: Xenomai Jan Kiszka <jan.kiszka@siemens.com> writes: > On 09.06.21 08:12, Philippe Gerum wrote: >> >> Jan Kiszka <jan.kiszka@siemens.com> writes: >> >>> On 08.06.21 19:39, Philippe Gerum wrote: >>>> >>>> Philippe Gerum <rpm@xenomai.org> writes: >>>> >>>>> Jan Kiszka <jan.kiszka@siemens.com> writes: >>>>> >>>>>> On 07.06.21 18:19, Philippe Gerum wrote: >>>>>>> >>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes: >>>>>>> >>>>>>>> On 07.06.21 17:04, Philippe Gerum wrote: >>>>>>>>> >>>>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes: >>>>>>>>> >>>>>>>>>> On 07.06.21 15:48, Jan Kiszka wrote: >>>>>>>>>>> On 07.06.21 15:44, Jan Kiszka wrote: >>>>>>>>>>>> On 07.06.21 15:03, Philippe Gerum wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes: >>>>>>>>>>>>> >>>>>>>>>>>>>> On 07.06.21 09:17, Philippe Gerum wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On 06.06.21 16:43, Philippe Gerum via Xenomai wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Jan Kiszka <jan.kiszka@web.de> writes: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> From: Jan Kiszka <jan.kiszka@siemens.com> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> The correct order is hard IRQs off first, then stalling inband, see also >>>>>>>>>>>>>>>>>> I-pipe. Otherwise, we risk to take an interrupt for the inband stage but >>>>>>>>>>>>>>>>>> do not inject it before returning to user space. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> >>>>>>>>>>>>>>>>>> --- >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Fixes the RCU stalls Florian reported, at least for me. 
>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> I'm afraid this wasn't the last regression of translating I-pipe into >>>>>>>>>>>>>>>>>> dovetail.. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> include/linux/entry-common.h | 3 ++- >>>>>>>>>>>>>>>>>> 1 file changed, 2 insertions(+), 1 deletion(-) >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h >>>>>>>>>>>>>>>>>> index 0fb45b2d6094..00540110985e 100644 >>>>>>>>>>>>>>>>>> --- a/include/linux/entry-common.h >>>>>>>>>>>>>>>>>> +++ b/include/linux/entry-common.h >>>>>>>>>>>>>>>>>> @@ -203,7 +203,8 @@ static inline void local_irq_disable_exit_to_user(void); >>>>>>>>>>>>>>>>>> #ifndef local_irq_disable_exit_to_user >>>>>>>>>>>>>>>>>> static inline void local_irq_disable_exit_to_user(void) >>>>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>>>> - local_irq_disable_full(); >>>>>>>>>>>>>>>>>> + hard_cond_local_irq_disable(); >>>>>>>>>>>>>>>>>> + local_irq_disable(); >>>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>>> #endif >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Good spot, real issue here, just like the CPU_IDLE code deals with. The >>>>>>>>>>>>>>>>> fix looks good, but maybe we could do even better. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> The way local_irq_disable_full() works tends to introduce this issue in >>>>>>>>>>>>>>>>> a sneaky way, when the code path does not synchronize the interrupt log >>>>>>>>>>>>>>>>> (exit_to_user_mode_loop may be at fault in this case, which does not >>>>>>>>>>>>>>>>> traverses synchronize_pipeline()). Lack of synchronization would still >>>>>>>>>>>>>>>>> hit us if some trap handler turns hard irqs off -> on -> off the same >>>>>>>>>>>>>>>>> way, and we don't leave through the user exit path. 
>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Inverting the order of the actions in local_irq_disable_full() may be an >>>>>>>>>>>>>>>>> option (inband_irq_disable would allow that), making sure we cannot be >>>>>>>>>>>>>>>>> caught in the same issue when returning to kernel mode is another >>>>>>>>>>>>>>>>> way. This needs more thought I think. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> So, always this way? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> local_irq_disable_full: >>>>>>>>>>>>>>>> hard_local_irq_disable >>>>>>>>>>>>>>>> local_irq_disable >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> local_irq_enable_full: >>>>>>>>>>>>>>>> hard_local_irq_enable >>>>>>>>>>>>>>>> local_irq_enable >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Yes. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Or even flip local_irq_enable_full as well? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Not this one, hard irqs must be on before local_irq_enable() is issued >>>>>>>>>>>>>>> otherwise we would trigger a debug assertion. The reason for such >>>>>>>>>>>>>>> assertion is that the I-pipe version of local_irq_enable() force enables >>>>>>>>>>>>>>> hard irqs on, and this is something we want to depart from because >>>>>>>>>>>>>>> whether the caller expects or not such side-effect is error-prone. For >>>>>>>>>>>>>>> this reason, inband_irq_enable() expects hard irqs on, which should be >>>>>>>>>>>>>>> the current hard irq state for the caller under normal circumstances. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Was there ever code that required ordering local_irq_disable_full the >>>>>>>>>>>>>>>> other way around? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> After review, I don't think so. The current order for >>>>>>>>>>>>>>> local_irq_disable_full() was rather determined by applying a reverse >>>>>>>>>>>>>>> sequence compared to local_irq_enable_full(). So this looks good. I need >>>>>>>>>>>>>>> to test this on the armv[7/8] side here first before merging. 
>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Is there a commit somewhere that I can drop already? >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> 1. These are fine with me in their latest iteration: >>>>>>>>>>>>> >>>>>>>>>>>>> irq_pipeline: Clean up stage_info field and users >>>>>>>>>>>>> irq_pipeline: Account for stage migration across faults >>>>>>>>>>>>> irq_pipeline: Warn when calling irqentry_enter with oob stalled >>>>>>>>>>>>> x86: dovetail: Fix TS flag reservation >>>>>>>>>>>>> >>>>>>>>>>>>> 2. This one should be replaced by a fix in local_irq_disable_full(), >>>>>>>>>>>>> pending some tests ongoing here: >>>>>>>>>>>>> >>>>>>>>>>>>> irq_pipeline: Prevent returning to user space with pending inband IRQs >>>>>>>>>>>>> >>>>>>>>>>>>> However, as I mentioned earlier, this may not be sufficient per se, we >>>>>>>>>>>>> may have to synchronize the in-band log when unstalling the stage on the >>>>>>>>>>>>> kernel exit path, i.e. the exit point fixed up by the "Account for stage >>>>>>>>>>>>> migration across faults" patch. This would address the following >>>>>>>>>>>>> scenario: >>>>>>>>>>>>> >>>>>>>>>>>>> irqentry_enter on kernel context: >>>>>>>>>>>>> stall_inband(); >>>>>>>>>>>>> trap_handler(); >>>>>>>>>>>>> hard_local_irq_enable() >>>>>>>>>>>>> hard_local_irq_disable() >>>>>>>>>>>>> if (state.stage_info == IRQENTRY_INBAND_UNSTALLED || >>>>>>>>>>>>> state.stage_info == IRQENTRY_OOB) { >>>>>>>>>>>>> unstall_inband(); >>>>>>>>>>>>> synchronize_pipeline(); >>>>>>>>>>>> >>>>>>>>>>>> Oh, unstall_inband does not synchronize? Because of >>>>>>>>>>>> hard_local_irq_disable or in general? >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Ah, found it - unstall != inband_irq_enable. Why not using the latter >>>>>>>>>>> directly? >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> ...because it's instrumented to detect hard_irqs_disabled. >>>>>>>>>> >>>>>>>>>> Can we safely sync at that point? 
>>>>>>>>> >>>>>>>>> Yes we may do this, because we restore the original -unstalled- state of >>>>>>>>> the in-band stage. So if we have pending IRQs, we may play them at this >>>>>>>>> point. What we may not do obviously, is creating synchronization points >>>>>>>>> where the kernel does not expect irqs. >>>>>>>>> >>>>>>>> >>>>>>>> Again: When will any reschedules that might be requested by those >>>>>>>> replayed IRQs be handled then? >>>>>>>> >>>>>>> >>>>>>> Nowhere unless we provide a log synchronization routine dedicated to the >>>>>>> in-band kernel exit path, which would fold in the rescheduling call via >>>>>>> irqentry_exit_cond_resched(). I'll write that one. >>>>>>> >>>>>> >>>>>> How about simply moving the replay up, before the kernel checks for >>>>>> pending reschedules anyway? We can stall inband then again, to please >>>>>> lockdep & Co., as long as there is no hard-irq-on section before the intret. >>>>>> >>>>> >>>>> Yes, this is the right way to do this. I'm on it. >>>> >>>> Please test this (heavily) with cobalt. This survived a 2-hours long kvm >>>> test here, running loops of the evl test suite + hackbench over 64 vcpus, >>>> so far so good. No reason for any difference in behavior between cobalt >>>> and evl in this area, this is merely a dovetail business. >>>> >>>> commit 30808d958277a525baad33f3c68ccb7910dc3176 (HEAD -> rebase/v5.10.y-evl) >>>> Author: Philippe Gerum <rpm@xenomai.org> >>>> >>>> genirq: irq_pipeline: synchronize log on irq exit to kernel >>>> >>>> We must make sure to play any IRQ which might be pending in the >>>> in-band log before leaving an interrupt frame for a preempted kernel >>>> context. >>>> >>>> This completes "irq_pipeline: Account for stage migration across >>>> faults", so that we synchronize the log once the in-band stage is >>>> unstalled. 
In addition, we also care to do this before >>>> preempt_schedule_irq() runs, so that we won't miss any rescheduling >>>> request which might have been triggered by some IRQ we just played. >>>> >>>> Signed-off-by: Philippe Gerum <rpm@xenomai.org> >>>> Suggested-by: Jan Kiszka <jan.kiszka@siemens.com> >>>> >>>> diff --git a/kernel/entry/common.c b/kernel/entry/common.c >>>> index 4e81c0c03e5726a..99512307537561b 100644 >>>> --- a/kernel/entry/common.c >>>> +++ b/kernel/entry/common.c >>>> @@ -477,8 +477,57 @@ bool irqexit_may_preempt_schedule(irqentry_state_t state, >>>> >>>> #endif >>>> >>>> +#ifdef CONFIG_IRQ_PIPELINE >>>> + >>>> +static bool irqentry_syncstage(irqentry_state_t state) /* hard irqs off */ >>>> +{ >>>> + /* >>>> + * If pipelining interrupts, enable in-band IRQs then >>>> + * synchronize the interrupt log on exit if: >>>> + * >>>> + * - irqentry_enter() stalled the stage in order to mirror the >>>> + * hardware state. >>>> + * >>>> + * - we where coming from oob, thus went through a stage migration >>>> + * that was caused by taking a CPU exception, e.g., a fault. >>>> + * >>>> + * We run before preempt_schedule_irq() may be called later on >>>> + * by preemptible kernels, so that any rescheduling request >>>> + * triggered by in-band IRQ handlers is considered. 
>>>> + */ >>>> + if (state.stage_info == IRQENTRY_INBAND_UNSTALLED || >>>> + state.stage_info == IRQENTRY_OOB) { >>>> + unstall_inband_nocheck(); >>>> + synchronize_pipeline_on_irq(); >>>> + stall_inband_nocheck(); >>>> + return true; >>>> + } >>>> + >>>> + return false; >>>> +} >>>> + >>>> +static void irqentry_unstall(void) >>>> +{ >>>> + unstall_inband_nocheck(); >>>> +} >>>> + >>>> +#else >>>> + >>>> +static bool irqentry_syncstage(irqentry_state_t state) >>>> +{ >>>> + return false; >>>> +} >>>> + >>>> +static void irqentry_unstall(void) >>>> +{ >>>> +} >>>> + >>>> +#endif >>>> + >>>> noinstr void irqentry_exit(struct pt_regs *regs, irqentry_state_t state) >>>> { >>>> + bool synchronized = false; >>>> + >>>> if (running_oob()) >>>> return; >>>> >>>> @@ -488,7 +537,11 @@ noinstr void irqentry_exit(struct pt_regs *regs, irqentry_state_t state) >>>> if (user_mode(regs)) { >>>> irqentry_exit_to_user_mode(regs); >>>> return; >>>> - } else if (irqexit_may_preempt_schedule(state, regs)) { >>>> + } >>>> + >>>> + synchronized = irqentry_syncstage(state); >>>> + >>>> + if (irqexit_may_preempt_schedule(state, regs)) { >>>> /* >>>> * If RCU was not watching on entry this needs to be done >>>> * carefully and needs the same ordering of lockdep/tracing >>>> @@ -521,17 +574,9 @@ noinstr void irqentry_exit(struct pt_regs *regs, irqentry_state_t state) >>>> } >>>> >>>> out: >>>> -#ifdef CONFIG_IRQ_PIPELINE >>>> - /* >>>> - * If pipelining interrupts, clear the in-band stall bit if >>>> - * irqentry_enter() raised it in order to mirror the hardware >>>> - * state. Also clear it when we where coming from oob, thus went >>>> - * through a migration that was caused by taking, e.g., a fault. 
>>>> - */ >>>> - if (state.stage_info == IRQENTRY_INBAND_UNSTALLED || >>>> - state.stage_info == IRQENTRY_OOB) >>>> - unstall_inband(); >>>> -#endif >>>> + if (synchronized) >>>> + irqentry_unstall(); >>>> + >>>> return; >>>> } >>>> >>>> >>> >>> I've put that into a staging/v5.10.y-dovetail branch, pointed >>> xenomai-images to that and started a run in our lab. That should give us >>> some confidence that things work under cobalt across all archs, and we >>> can make that commit official. >>> >>> However, heavy stressing would require triggering migration-on-fault in >>> tighter loops while running something like stress-ng in the background. >>> Maybe it would make sense to add such a pattern to the switchtest (would >>> also require a way to silence the related kernel warnings during >>> runtime, not only build-time)? >>> >> >> switchtest aims at exercising the core scheduling logic including the >> transition between execution stages, so migration-on-fault would fit >> there indeed. >> > > Do we need this testcase first? My feeling from review and testing is > that is safe to move forward with the patch already. > Looks ok here too. The patch is fairly straightforward, and this has been tested on the EVL side of the universe on a handful of armv7, armv8 machines without any glitch. I'll merge this patch shortly. -- Philippe. ^ permalink raw reply [flat|nested] 27+ messages in thread
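[Editor's note] The point of running the stage synchronization ahead of the preemption check, as the patch above does, can be modeled with a small sketch. This is a hypothetical simulation of the control flow in `irqentry_exit()` — the pending counter and resched flag are stand-ins for the in-band log and `need_resched`, not the kernel's actual data structures.

```c
#include <assert.h>
#include <stdbool.h>

static bool stalled;
static int  pending;      /* IRQs sitting in the in-band log */
static bool need_resched; /* set by a replayed IRQ, e.g. a timer tick */
static bool rescheduled;

static void play_irq(void) { need_resched = true; }

static void synchronize_pipeline(void)
{
    while (pending > 0) {
        pending--;
        play_irq();
    }
}

/* Model of irqentry_syncstage(): briefly unstall, drain the log,
 * then restall so the stall bookkeeping stays consistent until
 * the final unstall on the way out. */
static void irqentry_syncstage(void)
{
    stalled = false;
    synchronize_pipeline();
    stalled = true;
}

static void preempt_check(void)
{
    if (need_resched)
        rescheduled = true;
}

/* Exit order from the patch: sync first, then the preemption
 * check, so a resched request raised by a replayed IRQ is seen. */
static bool exit_sync_first(void)
{
    irqentry_syncstage();
    preempt_check();
    stalled = false;
    return rescheduled;
}

/* The ordering being avoided: checking for preemption before the
 * log is played silently misses the resched request. */
static bool exit_preempt_first(void)
{
    preempt_check();
    irqentry_syncstage();
    stalled = false;
    return rescheduled;
}

static void reset_model(void)
{
    stalled = true;
    pending = 1;
    need_resched = false;
    rescheduled = false;
}
```

With one IRQ pending in the log, only the sync-first ordering honors the rescheduling request it raises.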
* Re: [PATCH] irq_pipeline: Prevent returning to user space with pending inband IRQs 2021-06-07 13:48 ` Jan Kiszka 2021-06-07 13:58 ` Jan Kiszka @ 2021-06-07 14:37 ` Philippe Gerum 1 sibling, 0 replies; 27+ messages in thread From: Philippe Gerum @ 2021-06-07 14:37 UTC (permalink / raw) To: Jan Kiszka; +Cc: Xenomai Jan Kiszka <jan.kiszka@siemens.com> writes: > On 07.06.21 15:44, Jan Kiszka wrote: >> On 07.06.21 15:03, Philippe Gerum wrote: >>> >>> Jan Kiszka <jan.kiszka@siemens.com> writes: >>> >>>> On 07.06.21 09:17, Philippe Gerum wrote: >>>>> >>>>> Jan Kiszka <jan.kiszka@siemens.com> writes: >>>>> >>>>>> On 06.06.21 16:43, Philippe Gerum via Xenomai wrote: >>>>>>> >>>>>>> Jan Kiszka <jan.kiszka@web.de> writes: >>>>>>> >>>>>>>> From: Jan Kiszka <jan.kiszka@siemens.com> >>>>>>>> >>>>>>>> The correct order is hard IRQs off first, then stalling inband, see also >>>>>>>> I-pipe. Otherwise, we risk to take an interrupt for the inband stage but >>>>>>>> do not inject it before returning to user space. >>>>>>>> >>>>>>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> >>>>>>>> --- >>>>>>>> >>>>>>>> Fixes the RCU stalls Florian reported, at least for me. >>>>>>>> >>>>>>>> I'm afraid this wasn't the last regression of translating I-pipe into >>>>>>>> dovetail.. 
>>>>>>>> >>>>>>>> include/linux/entry-common.h | 3 ++- >>>>>>>> 1 file changed, 2 insertions(+), 1 deletion(-) >>>>>>>> >>>>>>>> diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h >>>>>>>> index 0fb45b2d6094..00540110985e 100644 >>>>>>>> --- a/include/linux/entry-common.h >>>>>>>> +++ b/include/linux/entry-common.h >>>>>>>> @@ -203,7 +203,8 @@ static inline void local_irq_disable_exit_to_user(void); >>>>>>>> #ifndef local_irq_disable_exit_to_user >>>>>>>> static inline void local_irq_disable_exit_to_user(void) >>>>>>>> { >>>>>>>> - local_irq_disable_full(); >>>>>>>> + hard_cond_local_irq_disable(); >>>>>>>> + local_irq_disable(); >>>>>>>> } >>>>>>>> #endif >>>>>>> >>>>>>> Good spot, real issue here, just like the CPU_IDLE code deals with. The >>>>>>> fix looks good, but maybe we could do even better. >>>>>>> >>>>>>> The way local_irq_disable_full() works tends to introduce this issue in >>>>>>> a sneaky way, when the code path does not synchronize the interrupt log >>>>>>> (exit_to_user_mode_loop may be at fault in this case, which does not >>>>>>> traverses synchronize_pipeline()). Lack of synchronization would still >>>>>>> hit us if some trap handler turns hard irqs off -> on -> off the same >>>>>>> way, and we don't leave through the user exit path. >>>>>>> >>>>>>> Inverting the order of the actions in local_irq_disable_full() may be an >>>>>>> option (inband_irq_disable would allow that), making sure we cannot be >>>>>>> caught in the same issue when returning to kernel mode is another >>>>>>> way. This needs more thought I think. >>>>>>> >>>>>> >>>>>> So, always this way? >>>>>> >>>>>> local_irq_disable_full: >>>>>> hard_local_irq_disable >>>>>> local_irq_disable >>>>>> >>>>>> local_irq_enable_full: >>>>>> hard_local_irq_enable >>>>>> local_irq_enable >>>>>> >>>>> >>>>> Yes. >>>>> >>>>>> Or even flip local_irq_enable_full as well? 
>>>>> >>>>> Not this one, hard irqs must be on before local_irq_enable() is issued >>>>> otherwise we would trigger a debug assertion. The reason for such >>>>> assertion is that the I-pipe version of local_irq_enable() force enables >>>>> hard irqs on, and this is something we want to depart from because >>>>> whether the caller expects or not such side-effect is error-prone. For >>>>> this reason, inband_irq_enable() expects hard irqs on, which should be >>>>> the current hard irq state for the caller under normal circumstances. >>>>> >>>>>> >>>>>> Was there ever code that required ordering local_irq_disable_full the >>>>>> other way around? >>>>>> >>>>> >>>>> After review, I don't think so. The current order for >>>>> local_irq_disable_full() was rather determined by applying a reverse >>>>> sequence compared to local_irq_enable_full(). So this looks good. I need >>>>> to test this on the armv[7/8] side here first before merging. >>>>> >>>> >>>> Is there a commit somewhere that I can drop already? >>>> >>> >>> 1. These are fine with me in their latest iteration: >>> >>> irq_pipeline: Clean up stage_info field and users >>> irq_pipeline: Account for stage migration across faults >>> irq_pipeline: Warn when calling irqentry_enter with oob stalled >>> x86: dovetail: Fix TS flag reservation >>> >>> 2. This one should be replaced by a fix in local_irq_disable_full(), >>> pending some tests ongoing here: >>> >>> irq_pipeline: Prevent returning to user space with pending inband IRQs >>> >>> However, as I mentioned earlier, this may not be sufficient per se, we >>> may have to synchronize the in-band log when unstalling the stage on the >>> kernel exit path, i.e. the exit point fixed up by the "Account for stage >>> migration across faults" patch. 
This would address the following >>> scenario: >>> >>> irqentry_enter on kernel context: >>> stall_inband(); >>> trap_handler(); >>> hard_local_irq_enable() >>> hard_local_irq_disable() >>> if (state.stage_info == IRQENTRY_INBAND_UNSTALLED || >>> state.stage_info == IRQENTRY_OOB) { >>> unstall_inband(); >>> synchronize_pipeline(); >> >> Oh, unstall_inband does not synchronize? Because of >> hard_local_irq_disable or in general? >> > > Ah, found it - unstall != inband_irq_enable. Why not using the latter > directly? > Well, in this case I should have, hence the bug. -- Philippe. ^ permalink raw reply [flat|nested] 27+ messages in thread
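[Editor's note] The distinction Jan stumbled over — `unstall_inband()` being a bare bit flip while `inband_irq_enable()` also plays the log — can be sketched as follows. Again a toy model under stated assumptions: the hard-irq assertion mimics the instrumentation Philippe mentions, and `synchronize_pipeline()` is reduced to draining a counter.

```c
#include <assert.h>
#include <stdbool.h>

static bool hard_irqs_on = true;
static bool stalled = true;
static int  pending;   /* IRQs deferred into the in-band log */
static int  delivered;

static void synchronize_pipeline(void)
{
    while (pending > 0) {
        pending--;
        delivered++;
    }
}

/* unstall_inband(): only flips the virtual mask bit. Any IRQs
 * sitting in the log stay there until someone synchronizes. */
static void unstall_inband(void)
{
    stalled = false;
}

/* inband_irq_enable(): flips the bit *and* synchronizes the log.
 * It asserts that hard irqs are on, which is why it cannot be
 * called from the hard-irqs-off section in irqentry_exit(). */
static void inband_irq_enable(void)
{
    assert(hard_irqs_on);
    stalled = false;
    synchronize_pipeline();
}
```

A quick run makes the difference visible: after `unstall_inband()` the log is untouched; after `inband_irq_enable()` it is empty.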
* Re: [PATCH] irq_pipeline: Prevent returning to user space with pending inband IRQs 2021-06-07 13:44 ` Jan Kiszka 2021-06-07 13:48 ` Jan Kiszka @ 2021-06-07 14:36 ` Philippe Gerum 1 sibling, 0 replies; 27+ messages in thread From: Philippe Gerum @ 2021-06-07 14:36 UTC (permalink / raw) To: Jan Kiszka; +Cc: Xenomai Jan Kiszka <jan.kiszka@siemens.com> writes: > On 07.06.21 15:03, Philippe Gerum wrote: >> >> Jan Kiszka <jan.kiszka@siemens.com> writes: >> >>> On 07.06.21 09:17, Philippe Gerum wrote: >>>> >>>> Jan Kiszka <jan.kiszka@siemens.com> writes: >>>> >>>>> On 06.06.21 16:43, Philippe Gerum via Xenomai wrote: >>>>>> >>>>>> Jan Kiszka <jan.kiszka@web.de> writes: >>>>>> >>>>>>> From: Jan Kiszka <jan.kiszka@siemens.com> >>>>>>> >>>>>>> The correct order is hard IRQs off first, then stalling inband, see also >>>>>>> I-pipe. Otherwise, we risk to take an interrupt for the inband stage but >>>>>>> do not inject it before returning to user space. >>>>>>> >>>>>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> >>>>>>> --- >>>>>>> >>>>>>> Fixes the RCU stalls Florian reported, at least for me. >>>>>>> >>>>>>> I'm afraid this wasn't the last regression of translating I-pipe into >>>>>>> dovetail.. >>>>>>> >>>>>>> include/linux/entry-common.h | 3 ++- >>>>>>> 1 file changed, 2 insertions(+), 1 deletion(-) >>>>>>> >>>>>>> diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h >>>>>>> index 0fb45b2d6094..00540110985e 100644 >>>>>>> --- a/include/linux/entry-common.h >>>>>>> +++ b/include/linux/entry-common.h >>>>>>> @@ -203,7 +203,8 @@ static inline void local_irq_disable_exit_to_user(void); >>>>>>> #ifndef local_irq_disable_exit_to_user >>>>>>> static inline void local_irq_disable_exit_to_user(void) >>>>>>> { >>>>>>> - local_irq_disable_full(); >>>>>>> + hard_cond_local_irq_disable(); >>>>>>> + local_irq_disable(); >>>>>>> } >>>>>>> #endif >>>>>> >>>>>> Good spot, real issue here, just like the CPU_IDLE code deals with. 
The >>>>>> fix looks good, but maybe we could do even better. >>>>>> >>>>>> The way local_irq_disable_full() works tends to introduce this issue in >>>>>> a sneaky way, when the code path does not synchronize the interrupt log >>>>>> (exit_to_user_mode_loop may be at fault in this case, which does not >>>>>> traverses synchronize_pipeline()). Lack of synchronization would still >>>>>> hit us if some trap handler turns hard irqs off -> on -> off the same >>>>>> way, and we don't leave through the user exit path. >>>>>> >>>>>> Inverting the order of the actions in local_irq_disable_full() may be an >>>>>> option (inband_irq_disable would allow that), making sure we cannot be >>>>>> caught in the same issue when returning to kernel mode is another >>>>>> way. This needs more thought I think. >>>>>> >>>>> >>>>> So, always this way? >>>>> >>>>> local_irq_disable_full: >>>>> hard_local_irq_disable >>>>> local_irq_disable >>>>> >>>>> local_irq_enable_full: >>>>> hard_local_irq_enable >>>>> local_irq_enable >>>>> >>>> >>>> Yes. >>>> >>>>> Or even flip local_irq_enable_full as well? >>>> >>>> Not this one, hard irqs must be on before local_irq_enable() is issued >>>> otherwise we would trigger a debug assertion. The reason for such >>>> assertion is that the I-pipe version of local_irq_enable() force enables >>>> hard irqs on, and this is something we want to depart from because >>>> whether the caller expects or not such side-effect is error-prone. For >>>> this reason, inband_irq_enable() expects hard irqs on, which should be >>>> the current hard irq state for the caller under normal circumstances. >>>> >>>>> >>>>> Was there ever code that required ordering local_irq_disable_full the >>>>> other way around? >>>>> >>>> >>>> After review, I don't think so. The current order for >>>> local_irq_disable_full() was rather determined by applying a reverse >>>> sequence compared to local_irq_enable_full(). So this looks good. 
I need >>>> to test this on the armv[7/8] side here first before merging. >>>> >>> >>> Is there a commit somewhere that I can drop already? >>> >> >> 1. These are fine with me in their latest iteration: >> >> irq_pipeline: Clean up stage_info field and users >> irq_pipeline: Account for stage migration across faults >> irq_pipeline: Warn when calling irqentry_enter with oob stalled >> x86: dovetail: Fix TS flag reservation >> >> 2. This one should be replaced by a fix in local_irq_disable_full(), >> pending some tests ongoing here: >> >> irq_pipeline: Prevent returning to user space with pending inband IRQs >> >> However, as I mentioned earlier, this may not be sufficient per se, we >> may have to synchronize the in-band log when unstalling the stage on the >> kernel exit path, i.e. the exit point fixed up by the "Account for stage >> migration across faults" patch. This would address the following >> scenario: >> >> irqentry_enter on kernel context: >> stall_inband(); >> trap_handler(); >> hard_local_irq_enable() >> hard_local_irq_disable() >> if (state.stage_info == IRQENTRY_INBAND_UNSTALLED || >> state.stage_info == IRQENTRY_OOB) { >> unstall_inband(); >> synchronize_pipeline(); > > Oh, unstall_inband does not synchronize? Because of > hard_local_irq_disable or in general? > In general, this is only a bit flip. The caller has to do the right thing afterwards, e.g. inband_irq_enable() or inband_irq_restore() do so. -- Philippe. ^ permalink raw reply [flat|nested] 27+ messages in thread
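[Editor's note] The orderings the thread converges on — hard irqs first on disable, hard irqs first on enable — can be summed up in one sketch. This is an illustrative model, not the kernel code: the `assert` in `inband_irq_enable()` stands in for the debug assertion that would fire if `local_irq_enable_full()` flipped its two steps.

```c
#include <assert.h>
#include <stdbool.h>

static bool hard_on = true;
static bool stalled;
static int  pending;   /* IRQs deferred into the in-band log */
static int  delivered;

static void inband_irq_enable(void)
{
    assert(hard_on);   /* must not run with hard irqs off */
    stalled = false;
    while (pending > 0) {
        pending--;
        delivered++;
    }
}

/* Agreed-upon ordering: hard irqs off first, then the virtual
 * mask, closing the window in which an IRQ could be logged with
 * no synchronization point left. */
static void local_irq_disable_full(void)
{
    hard_on = false;
    stalled = true;
}

/* Enable keeps its original order: hard irqs on first, so that
 * inband_irq_enable() may synchronize the log. Flipping this
 * would trip the assertion above. */
static void local_irq_enable_full(void)
{
    hard_on = true;
    inband_irq_enable();
}
```

Injecting a pending IRQ while fully disabled and then re-enabling shows the log is drained exactly once, on the enable path.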
end of thread, other threads:[~2021-06-11 15:39 UTC | newest] Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2021-06-06 12:35 [PATCH] irq_pipeline: Prevent returning to user space with pending inband IRQs Jan Kiszka 2021-06-06 14:43 ` Philippe Gerum 2021-06-07 5:53 ` Jan Kiszka 2021-06-07 7:17 ` Philippe Gerum 2021-06-07 10:22 ` Jan Kiszka 2021-06-07 13:03 ` Philippe Gerum 2021-06-07 13:28 ` Philippe Gerum 2021-06-07 13:29 ` Philippe Gerum 2021-06-07 13:40 ` Jan Kiszka 2021-06-07 15:11 ` Philippe Gerum 2021-06-07 13:44 ` Jan Kiszka 2021-06-07 13:48 ` Jan Kiszka 2021-06-07 13:58 ` Jan Kiszka 2021-06-07 14:44 ` Philippe Gerum 2021-06-07 14:52 ` Jan Kiszka 2021-06-07 15:04 ` Philippe Gerum 2021-06-07 15:38 ` Jan Kiszka 2021-06-07 16:19 ` Philippe Gerum 2021-06-07 16:35 ` Jan Kiszka 2021-06-08 16:50 ` Philippe Gerum 2021-06-08 17:39 ` Philippe Gerum 2021-06-09 5:34 ` Jan Kiszka 2021-06-09 6:12 ` Philippe Gerum 2021-06-11 7:15 ` Jan Kiszka 2021-06-11 15:39 ` Philippe Gerum 2021-06-07 14:37 ` Philippe Gerum 2021-06-07 14:36 ` Philippe Gerum