xenomai.lists.linux.dev archive mirror
 help / color / mirror / Atom feed
* [PATCH] arm64: dovetail: Fix undefinstr/break trap handling
@ 2023-08-25 12:58 Florian Bezdeka
  2023-08-28 12:33 ` Jan Kiszka
  0 siblings, 1 reply; 7+ messages in thread
From: Florian Bezdeka @ 2023-08-25 12:58 UTC (permalink / raw)
  To: xenomai; +Cc: jan.kiszka, rpm, Florian Bezdeka, Clara Kowalsky

When running an compat RT application on arm64 the break trap is
handled via the undefined instruction trap.

A possible call stack looks like this:

Call trace:
  handle_inband_event+0x2d0/0x320
  inband_event_notify+0x28/0x50
  signal_wake_up_state+0x7c/0xa4
  complete_signal+0x104/0x2d0
  __send_signal_locked+0x1d0/0x3e4
  send_signal_locked+0xf0/0x140
  force_sig_info_to_task+0xa0/0x164
  force_sig_fault+0x64/0x94
  arm64_force_sig_fault+0x48/0x80
  send_user_sigtrap+0x50/0x8c
  aarch32_break_handler+0xac/0x1d0
  do_undefinstr+0x6c/0x360
  el0_undef+0x4c/0xd0
  el0t_32_sync_handler+0xd0/0x140
  el0t_32_sync+0x190/0x194

The trap is never reported to the companion core at that stage so
running_oob() in do_undefinstr() will always return true. As the
following bailout happens before calling the compat breakpoint
detection (aarch32_break_handler()) debugging the compat 
application does not work.

In addition aarch32_break_handler() has to report the trap entry to the
companion core.

Reported-by: Clara Kowalsky <clara.kowalsky@siemens.com>
Tested-by: Clara Kowalsky <clara.kowalsky@siemens.com>
Signed-off-by: Florian Bezdeka <florian.bezdeka@siemens.com>
---
 arch/arm64/kernel/debug-monitors.c | 3 +++
 arch/arm64/kernel/traps.c          | 7 -------
 2 files changed, 3 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/kernel/debug-monitors.c b/arch/arm64/kernel/debug-monitors.c
index 32271ed24ef5..ef7ac042a0a6 100644
--- a/arch/arm64/kernel/debug-monitors.c
+++ b/arch/arm64/kernel/debug-monitors.c
@@ -373,7 +373,10 @@ int aarch32_break_handler(struct pt_regs *regs)
 	if (!bp)
 		return -EFAULT;
 
+	mark_trap_entry(ARM64_TRAP_UNDI, regs);
 	send_user_sigtrap(TRAP_BRKPT);
+	mark_trap_exit(ARM64_TRAP_UNDI, regs);
+
 	return 0;
 }
 NOKPROBE_SYMBOL(aarch32_break_handler);
diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index cc68be400244..9bf2f309248f 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -489,13 +489,6 @@ void arm64_notify_segfault(unsigned long addr)
 
 void do_undefinstr(struct pt_regs *regs, unsigned long esr)
 {
-	/*
-	 * If the companion core did not switched us to in-band
-	 * context, we may assume that it has handled the trap.
-	 */
-	if (running_oob())
-		return;
-
 	/* check for AArch32 breakpoint instructions */
 	if (!aarch32_break_handler(regs))
 		return;
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] arm64: dovetail: Fix undefinstr/break trap handling
  2023-08-25 12:58 [PATCH] arm64: dovetail: Fix undefinstr/break trap handling Florian Bezdeka
@ 2023-08-28 12:33 ` Jan Kiszka
  2023-08-28 12:52   ` Florian Bezdeka
  0 siblings, 1 reply; 7+ messages in thread
From: Jan Kiszka @ 2023-08-28 12:33 UTC (permalink / raw)
  To: Florian Bezdeka, xenomai; +Cc: rpm, Clara Kowalsky

On 25.08.23 14:58, Florian Bezdeka wrote:
> When running an compat RT application on arm64 the break trap is
> handled via the undefined instruction trap.
> 
> A possible call stack looks like this:
> 
> Call trace:
>   handle_inband_event+0x2d0/0x320
>   inband_event_notify+0x28/0x50
>   signal_wake_up_state+0x7c/0xa4
>   complete_signal+0x104/0x2d0
>   __send_signal_locked+0x1d0/0x3e4
>   send_signal_locked+0xf0/0x140
>   force_sig_info_to_task+0xa0/0x164
>   force_sig_fault+0x64/0x94
>   arm64_force_sig_fault+0x48/0x80
>   send_user_sigtrap+0x50/0x8c
>   aarch32_break_handler+0xac/0x1d0
>   do_undefinstr+0x6c/0x360
>   el0_undef+0x4c/0xd0
>   el0t_32_sync_handler+0xd0/0x140
>   el0t_32_sync+0x190/0x194
> 
> The trap is never reported to the companion core at that stage so
> running_oob() in do_undefinstr() will always return true. As the
> following bailout happens before calling the compat breakpoint
> detection (aarch32_break_handler()) debugging the compat 
> application does not work.
> 
> In addition aarch32_break_handler() has to report the trap entry to the
> companion core.
> 
> Reported-by: Clara Kowalsky <clara.kowalsky@siemens.com>
> Tested-by: Clara Kowalsky <clara.kowalsky@siemens.com>
> Signed-off-by: Florian Bezdeka <florian.bezdeka@siemens.com>
> ---
>  arch/arm64/kernel/debug-monitors.c | 3 +++
>  arch/arm64/kernel/traps.c          | 7 -------
>  2 files changed, 3 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/arm64/kernel/debug-monitors.c b/arch/arm64/kernel/debug-monitors.c
> index 32271ed24ef5..ef7ac042a0a6 100644
> --- a/arch/arm64/kernel/debug-monitors.c
> +++ b/arch/arm64/kernel/debug-monitors.c
> @@ -373,7 +373,10 @@ int aarch32_break_handler(struct pt_regs *regs)
>  	if (!bp)
>  		return -EFAULT;
>  
> +	mark_trap_entry(ARM64_TRAP_UNDI, regs);
>  	send_user_sigtrap(TRAP_BRKPT);
> +	mark_trap_exit(ARM64_TRAP_UNDI, regs);
> +
>  	return 0;
>  }
>  NOKPROBE_SYMBOL(aarch32_break_handler);
> diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
> index cc68be400244..9bf2f309248f 100644
> --- a/arch/arm64/kernel/traps.c
> +++ b/arch/arm64/kernel/traps.c
> @@ -489,13 +489,6 @@ void arm64_notify_segfault(unsigned long addr)
>  
>  void do_undefinstr(struct pt_regs *regs, unsigned long esr)
>  {
> -	/*
> -	 * If the companion core did not switched us to in-band
> -	 * context, we may assume that it has handled the trap.
> -	 */
> -	if (running_oob())
> -		return;
> -
>  	/* check for AArch32 breakpoint instructions */
>  	if (!aarch32_break_handler(regs))
>  		return;

This is not against v6.5-dovetail-rebase, right? We likely need to start
from the top, then backport to stable.

Also note that this change came in via
https://source.denx.de/Xenomai/linux-dovetail/-/commit/2b2ccdaeb8116727cf4076960d664a3cedff0ac6,
so just dropping it will likely cause problems elsewhere. Should we
rather move that down in the handler?

Jan

-- 
Siemens AG, Technology
Linux Expert Center


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] arm64: dovetail: Fix undefinstr/break trap handling
  2023-08-28 12:33 ` Jan Kiszka
@ 2023-08-28 12:52   ` Florian Bezdeka
  2023-08-28 13:04     ` Jan Kiszka
  0 siblings, 1 reply; 7+ messages in thread
From: Florian Bezdeka @ 2023-08-28 12:52 UTC (permalink / raw)
  To: Jan Kiszka, xenomai; +Cc: rpm, Clara Kowalsky

On Mon, 2023-08-28 at 14:33 +0200, Jan Kiszka wrote:
> On 25.08.23 14:58, Florian Bezdeka wrote:
> > When running an compat RT application on arm64 the break trap is
> > handled via the undefined instruction trap.
> > 
> > A possible call stack looks like this:
> > 
> > Call trace:
> >   handle_inband_event+0x2d0/0x320
> >   inband_event_notify+0x28/0x50
> >   signal_wake_up_state+0x7c/0xa4
> >   complete_signal+0x104/0x2d0
> >   __send_signal_locked+0x1d0/0x3e4
> >   send_signal_locked+0xf0/0x140
> >   force_sig_info_to_task+0xa0/0x164
> >   force_sig_fault+0x64/0x94
> >   arm64_force_sig_fault+0x48/0x80
> >   send_user_sigtrap+0x50/0x8c
> >   aarch32_break_handler+0xac/0x1d0
> >   do_undefinstr+0x6c/0x360
> >   el0_undef+0x4c/0xd0
> >   el0t_32_sync_handler+0xd0/0x140
> >   el0t_32_sync+0x190/0x194
> > 
> > The trap is never reported to the companion core at that stage so
> > running_oob() in do_undefinstr() will always return true. As the
> > following bailout happens before calling the compat breakpoint
> > detection (aarch32_break_handler()) debugging the compat 
> > application does not work.
> > 
> > In addition aarch32_break_handler() has to report the trap entry to the
> > companion core.
> > 
> > Reported-by: Clara Kowalsky <clara.kowalsky@siemens.com>
> > Tested-by: Clara Kowalsky <clara.kowalsky@siemens.com>
> > Signed-off-by: Florian Bezdeka <florian.bezdeka@siemens.com>
> > ---
> >  arch/arm64/kernel/debug-monitors.c | 3 +++
> >  arch/arm64/kernel/traps.c          | 7 -------
> >  2 files changed, 3 insertions(+), 7 deletions(-)
> > 
> > diff --git a/arch/arm64/kernel/debug-monitors.c b/arch/arm64/kernel/debug-monitors.c
> > index 32271ed24ef5..ef7ac042a0a6 100644
> > --- a/arch/arm64/kernel/debug-monitors.c
> > +++ b/arch/arm64/kernel/debug-monitors.c
> > @@ -373,7 +373,10 @@ int aarch32_break_handler(struct pt_regs *regs)
> >  	if (!bp)
> >  		return -EFAULT;
> >  
> > +	mark_trap_entry(ARM64_TRAP_UNDI, regs);
> >  	send_user_sigtrap(TRAP_BRKPT);
> > +	mark_trap_exit(ARM64_TRAP_UNDI, regs);
> > +
> >  	return 0;
> >  }
> >  NOKPROBE_SYMBOL(aarch32_break_handler);
> > diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
> > index cc68be400244..9bf2f309248f 100644
> > --- a/arch/arm64/kernel/traps.c
> > +++ b/arch/arm64/kernel/traps.c
> > @@ -489,13 +489,6 @@ void arm64_notify_segfault(unsigned long addr)
> >  
> >  void do_undefinstr(struct pt_regs *regs, unsigned long esr)
> >  {
> > -	/*
> > -	 * If the companion core did not switched us to in-band
> > -	 * context, we may assume that it has handled the trap.
> > -	 */
> > -	if (running_oob())
> > -		return;
> > -
> >  	/* check for AArch32 breakpoint instructions */
> >  	if (!aarch32_break_handler(regs))
> >  		return;
> 
> This is not against v6.5-dovetail-rebase, right? We likely need to start
> from the top, then backport to stable.

That applied for 6.1 and all other (lower) versions that are currently
covered by our CI. Might need some lifting for 6.5, didn't check yet.

> 
> Also note that this change came in via
> https://source.denx.de/Xenomai/linux-dovetail/-/commit/2b2ccdaeb8116727cf4076960d664a3cedff0ac6,
> so just dropping it will likely cause problems elsewhere. Should we
> rather move that down in the handler?

Well, as written in the commit message running_oob() will always return
true for RT tasks so I'm quite sure that the invalid instruction
handling was broken on arm64 for some time now. We bailed out even
before sending the notification to the companion core.

Moving it down might fix the compat case but native arm64 would stay
broken. No?

Florian

> 
> Jan
> 
> -- 
> Siemens AG, Technology
> Linux Expert Center


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] arm64: dovetail: Fix undefinstr/break trap handling
  2023-08-28 12:52   ` Florian Bezdeka
@ 2023-08-28 13:04     ` Jan Kiszka
  2023-08-28 14:36       ` Philippe Gerum
  0 siblings, 1 reply; 7+ messages in thread
From: Jan Kiszka @ 2023-08-28 13:04 UTC (permalink / raw)
  To: Florian Bezdeka, xenomai, Philippe Gerum; +Cc: rpm, Clara Kowalsky

On 28.08.23 14:52, Florian Bezdeka wrote:
> On Mon, 2023-08-28 at 14:33 +0200, Jan Kiszka wrote:
>> On 25.08.23 14:58, Florian Bezdeka wrote:
>>> When running an compat RT application on arm64 the break trap is
>>> handled via the undefined instruction trap.
>>>
>>> A possible call stack looks like this:
>>>
>>> Call trace:
>>>   handle_inband_event+0x2d0/0x320
>>>   inband_event_notify+0x28/0x50
>>>   signal_wake_up_state+0x7c/0xa4
>>>   complete_signal+0x104/0x2d0
>>>   __send_signal_locked+0x1d0/0x3e4
>>>   send_signal_locked+0xf0/0x140
>>>   force_sig_info_to_task+0xa0/0x164
>>>   force_sig_fault+0x64/0x94
>>>   arm64_force_sig_fault+0x48/0x80
>>>   send_user_sigtrap+0x50/0x8c
>>>   aarch32_break_handler+0xac/0x1d0
>>>   do_undefinstr+0x6c/0x360
>>>   el0_undef+0x4c/0xd0
>>>   el0t_32_sync_handler+0xd0/0x140
>>>   el0t_32_sync+0x190/0x194
>>>
>>> The trap is never reported to the companion core at that stage so
>>> running_oob() in do_undefinstr() will always return true. As the
>>> following bailout happens before calling the compat breakpoint
>>> detection (aarch32_break_handler()) debugging the compat 
>>> application does not work.
>>>
>>> In addition aarch32_break_handler() has to report the trap entry to the
>>> companion core.
>>>
>>> Reported-by: Clara Kowalsky <clara.kowalsky@siemens.com>
>>> Tested-by: Clara Kowalsky <clara.kowalsky@siemens.com>
>>> Signed-off-by: Florian Bezdeka <florian.bezdeka@siemens.com>
>>> ---
>>>  arch/arm64/kernel/debug-monitors.c | 3 +++
>>>  arch/arm64/kernel/traps.c          | 7 -------
>>>  2 files changed, 3 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/arch/arm64/kernel/debug-monitors.c b/arch/arm64/kernel/debug-monitors.c
>>> index 32271ed24ef5..ef7ac042a0a6 100644
>>> --- a/arch/arm64/kernel/debug-monitors.c
>>> +++ b/arch/arm64/kernel/debug-monitors.c
>>> @@ -373,7 +373,10 @@ int aarch32_break_handler(struct pt_regs *regs)
>>>  	if (!bp)
>>>  		return -EFAULT;
>>>  
>>> +	mark_trap_entry(ARM64_TRAP_UNDI, regs);
>>>  	send_user_sigtrap(TRAP_BRKPT);
>>> +	mark_trap_exit(ARM64_TRAP_UNDI, regs);
>>> +
>>>  	return 0;
>>>  }
>>>  NOKPROBE_SYMBOL(aarch32_break_handler);
>>> diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
>>> index cc68be400244..9bf2f309248f 100644
>>> --- a/arch/arm64/kernel/traps.c
>>> +++ b/arch/arm64/kernel/traps.c
>>> @@ -489,13 +489,6 @@ void arm64_notify_segfault(unsigned long addr)
>>>  
>>>  void do_undefinstr(struct pt_regs *regs, unsigned long esr)
>>>  {
>>> -	/*
>>> -	 * If the companion core did not switched us to in-band
>>> -	 * context, we may assume that it has handled the trap.
>>> -	 */
>>> -	if (running_oob())
>>> -		return;
>>> -
>>>  	/* check for AArch32 breakpoint instructions */
>>>  	if (!aarch32_break_handler(regs))
>>>  		return;
>>
>> This is not against v6.5-dovetail-rebase, right? We likely need to start
>> from the top, then backport to stable.
> 
> That applied for 6.1 and all other (lower) versions that are currently
> covered by our CI. Might need some lifting for 6.5, didn't check yet.
> 
>>
>> Also note that this change came in via
>> https://source.denx.de/Xenomai/linux-dovetail/-/commit/2b2ccdaeb8116727cf4076960d664a3cedff0ac6,
>> so just dropping it will likely cause problems elsewhere. Should we
>> rather move that down in the handler?
> 
> Well, as written in the commit message running_oob() will always return
> true for RT tasks so I'm quite sure that the invalid instruction
> handling was broken on arm64 for some time now. We bailed out even
> before sending the notification to the companion core.
> 
> Moving it down might fix the compat case but native arm64 would stay
> broken. No?
> 

Something looks still fishy, either in the original patch that
introduced the condition (I still don't get that special case) or now
with this change trying to restore things. I agree that also the
original change needed the notification to be delivered.

Philippe, can you help clarifying the logic behind
do_undefinstr/do_el0_undef?

Jan

-- 
Siemens AG, Technology
Linux Expert Center


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] arm64: dovetail: Fix undefinstr/break trap handling
  2023-08-28 13:04     ` Jan Kiszka
@ 2023-08-28 14:36       ` Philippe Gerum
  2023-08-29  7:42         ` Florian Bezdeka
  0 siblings, 1 reply; 7+ messages in thread
From: Philippe Gerum @ 2023-08-28 14:36 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Florian Bezdeka, xenomai, Clara Kowalsky


Jan Kiszka <jan.kiszka@siemens.com> writes:

> On 28.08.23 14:52, Florian Bezdeka wrote:
>> On Mon, 2023-08-28 at 14:33 +0200, Jan Kiszka wrote:
>>> On 25.08.23 14:58, Florian Bezdeka wrote:
>>>> When running an compat RT application on arm64 the break trap is
>>>> handled via the undefined instruction trap.
>>>>
>>>> A possible call stack looks like this:
>>>>
>>>> Call trace:
>>>>   handle_inband_event+0x2d0/0x320
>>>>   inband_event_notify+0x28/0x50
>>>>   signal_wake_up_state+0x7c/0xa4
>>>>   complete_signal+0x104/0x2d0
>>>>   __send_signal_locked+0x1d0/0x3e4
>>>>   send_signal_locked+0xf0/0x140
>>>>   force_sig_info_to_task+0xa0/0x164
>>>>   force_sig_fault+0x64/0x94
>>>>   arm64_force_sig_fault+0x48/0x80
>>>>   send_user_sigtrap+0x50/0x8c
>>>>   aarch32_break_handler+0xac/0x1d0
>>>>   do_undefinstr+0x6c/0x360
>>>>   el0_undef+0x4c/0xd0
>>>>   el0t_32_sync_handler+0xd0/0x140
>>>>   el0t_32_sync+0x190/0x194
>>>>
>>>> The trap is never reported to the companion core at that stage so
>>>> running_oob() in do_undefinstr() will always return true. As the
>>>> following bailout happens before calling the compat breakpoint
>>>> detection (aarch32_break_handler()) debugging the compat 
>>>> application does not work.
>>>>
>>>> In addition aarch32_break_handler() has to report the trap entry to the
>>>> companion core.
>>>>
>>>> Reported-by: Clara Kowalsky <clara.kowalsky@siemens.com>
>>>> Tested-by: Clara Kowalsky <clara.kowalsky@siemens.com>
>>>> Signed-off-by: Florian Bezdeka <florian.bezdeka@siemens.com>
>>>> ---
>>>>  arch/arm64/kernel/debug-monitors.c | 3 +++
>>>>  arch/arm64/kernel/traps.c          | 7 -------
>>>>  2 files changed, 3 insertions(+), 7 deletions(-)
>>>>
>>>> diff --git a/arch/arm64/kernel/debug-monitors.c b/arch/arm64/kernel/debug-monitors.c
>>>> index 32271ed24ef5..ef7ac042a0a6 100644
>>>> --- a/arch/arm64/kernel/debug-monitors.c
>>>> +++ b/arch/arm64/kernel/debug-monitors.c
>>>> @@ -373,7 +373,10 @@ int aarch32_break_handler(struct pt_regs *regs)
>>>>  	if (!bp)
>>>>  		return -EFAULT;
>>>>  
>>>> +	mark_trap_entry(ARM64_TRAP_UNDI, regs);
>>>>  	send_user_sigtrap(TRAP_BRKPT);
>>>> +	mark_trap_exit(ARM64_TRAP_UNDI, regs);
>>>> +
>>>>  	return 0;
>>>>  }
>>>>  NOKPROBE_SYMBOL(aarch32_break_handler);
>>>> diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
>>>> index cc68be400244..9bf2f309248f 100644
>>>> --- a/arch/arm64/kernel/traps.c
>>>> +++ b/arch/arm64/kernel/traps.c
>>>> @@ -489,13 +489,6 @@ void arm64_notify_segfault(unsigned long addr)
>>>>  
>>>>  void do_undefinstr(struct pt_regs *regs, unsigned long esr)
>>>>  {
>>>> -	/*
>>>> -	 * If the companion core did not switched us to in-band
>>>> -	 * context, we may assume that it has handled the trap.
>>>> -	 */
>>>> -	if (running_oob())
>>>> -		return;
>>>> -
>>>>  	/* check for AArch32 breakpoint instructions */
>>>>  	if (!aarch32_break_handler(regs))
>>>>  		return;
>>>
>>> This is not against v6.5-dovetail-rebase, right? We likely need to start
>>> from the top, then backport to stable.
>> 
>> That applied for 6.1 and all other (lower) versions that are currently
>> covered by our CI. Might need some lifting for 6.5, didn't check yet.
>> 
>>>
>>> Also note that this change came in via
>>> https://source.denx.de/Xenomai/linux-dovetail/-/commit/2b2ccdaeb8116727cf4076960d664a3cedff0ac6,
>>> so just dropping it will likely cause problems elsewhere. Should we
>>> rather move that down in the handler?
>> 
>> Well, as written in the commit message running_oob() will always return
>> true for RT tasks so I'm quite sure that the invalid instruction
>> handling was broken on arm64 for some time now. We bailed out even
>> before sending the notification to the companion core.
>> 
>> Moving it down might fix the compat case but native arm64 would stay
>> broken. No?
>> 
>
> Something looks still fishy, either in the original patch that
> introduced the condition (I still don't get that special case) or now
> with this change trying to restore things. I agree that also the
> original change needed the notification to be delivered.
>
> Philippe, can you help clarifying the logic behind
> do_undefinstr/do_el0_undef?
>

This very much looks like an unfortunate attempt to mimic the arm32
logic, which does notify the core about the undefined insn trap prior to
calling do_undefinstr() (und_fault from the asm entry code).

So Florian is right, this should not apply to the arm64 side because
unlike arm32, we need to wait until all branches which might be able to
handle this fault directly from oob are considered
(e.g. try_emulate_mrs()), before assuming that we might need the core to
switch us in-band. IOW, do_el0_undef() is broken since it wrongly
assumes that such switch might already have happened on entry. As a
matter of fact, it did not for a reason.

I see another issue hiding in the dark: emulation of the deprecated
armv8 SWP{B} instruction cannot be done from the oob stage.  So in
addition to fixing the aarch32 break handler, I would notify the core
before handling the CONDTEST_PASS case as well (bluntly disabling
all emulations from the oob stage entirely seems wrong ATM).

-- 
Philippe.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] arm64: dovetail: Fix undefinstr/break trap handling
  2023-08-28 14:36       ` Philippe Gerum
@ 2023-08-29  7:42         ` Florian Bezdeka
  2023-09-01 13:29           ` Philippe Gerum
  0 siblings, 1 reply; 7+ messages in thread
From: Florian Bezdeka @ 2023-08-29  7:42 UTC (permalink / raw)
  To: Philippe Gerum, Jan Kiszka; +Cc: xenomai, Clara Kowalsky

On Mon, 2023-08-28 at 16:36 +0200, Philippe Gerum wrote:
> Jan Kiszka <jan.kiszka@siemens.com> writes:
> 
> > On 28.08.23 14:52, Florian Bezdeka wrote:
> > > On Mon, 2023-08-28 at 14:33 +0200, Jan Kiszka wrote:
> > > > On 25.08.23 14:58, Florian Bezdeka wrote:
> > > > > When running an compat RT application on arm64 the break trap is
> > > > > handled via the undefined instruction trap.
> > > > > 
> > > > > A possible call stack looks like this:
> > > > > 
> > > > > Call trace:
> > > > >   handle_inband_event+0x2d0/0x320
> > > > >   inband_event_notify+0x28/0x50
> > > > >   signal_wake_up_state+0x7c/0xa4
> > > > >   complete_signal+0x104/0x2d0
> > > > >   __send_signal_locked+0x1d0/0x3e4
> > > > >   send_signal_locked+0xf0/0x140
> > > > >   force_sig_info_to_task+0xa0/0x164
> > > > >   force_sig_fault+0x64/0x94
> > > > >   arm64_force_sig_fault+0x48/0x80
> > > > >   send_user_sigtrap+0x50/0x8c
> > > > >   aarch32_break_handler+0xac/0x1d0
> > > > >   do_undefinstr+0x6c/0x360
> > > > >   el0_undef+0x4c/0xd0
> > > > >   el0t_32_sync_handler+0xd0/0x140
> > > > >   el0t_32_sync+0x190/0x194
> > > > > 
> > > > > The trap is never reported to the companion core at that stage so
> > > > > running_oob() in do_undefinstr() will always return true. As the
> > > > > following bailout happens before calling the compat breakpoint
> > > > > detection (aarch32_break_handler()) debugging the compat 
> > > > > application does not work.
> > > > > 
> > > > > In addition aarch32_break_handler() has to report the trap entry to the
> > > > > companion core.
> > > > > 
> > > > > Reported-by: Clara Kowalsky <clara.kowalsky@siemens.com>
> > > > > Tested-by: Clara Kowalsky <clara.kowalsky@siemens.com>
> > > > > Signed-off-by: Florian Bezdeka <florian.bezdeka@siemens.com>
> > > > > ---
> > > > >  arch/arm64/kernel/debug-monitors.c | 3 +++
> > > > >  arch/arm64/kernel/traps.c          | 7 -------
> > > > >  2 files changed, 3 insertions(+), 7 deletions(-)
> > > > > 
> > > > > diff --git a/arch/arm64/kernel/debug-monitors.c b/arch/arm64/kernel/debug-monitors.c
> > > > > index 32271ed24ef5..ef7ac042a0a6 100644
> > > > > --- a/arch/arm64/kernel/debug-monitors.c
> > > > > +++ b/arch/arm64/kernel/debug-monitors.c
> > > > > @@ -373,7 +373,10 @@ int aarch32_break_handler(struct pt_regs *regs)
> > > > >  	if (!bp)
> > > > >  		return -EFAULT;
> > > > >  
> > > > > +	mark_trap_entry(ARM64_TRAP_UNDI, regs);
> > > > >  	send_user_sigtrap(TRAP_BRKPT);
> > > > > +	mark_trap_exit(ARM64_TRAP_UNDI, regs);
> > > > > +
> > > > >  	return 0;
> > > > >  }
> > > > >  NOKPROBE_SYMBOL(aarch32_break_handler);
> > > > > diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
> > > > > index cc68be400244..9bf2f309248f 100644
> > > > > --- a/arch/arm64/kernel/traps.c
> > > > > +++ b/arch/arm64/kernel/traps.c
> > > > > @@ -489,13 +489,6 @@ void arm64_notify_segfault(unsigned long addr)
> > > > >  
> > > > >  void do_undefinstr(struct pt_regs *regs, unsigned long esr)
> > > > >  {
> > > > > -	/*
> > > > > -	 * If the companion core did not switched us to in-band
> > > > > -	 * context, we may assume that it has handled the trap.
> > > > > -	 */
> > > > > -	if (running_oob())
> > > > > -		return;
> > > > > -
> > > > >  	/* check for AArch32 breakpoint instructions */
> > > > >  	if (!aarch32_break_handler(regs))
> > > > >  		return;
> > > > 
> > > > This is not against v6.5-dovetail-rebase, right? We likely need to start
> > > > from the top, then backport to stable.
> > > 
> > > That applied for 6.1 and all other (lower) versions that are currently
> > > covered by our CI. Might need some lifting for 6.5, didn't check yet.
> > > 
> > > > 
> > > > Also note that this change came in via
> > > > https://source.denx.de/Xenomai/linux-dovetail/-/commit/2b2ccdaeb8116727cf4076960d664a3cedff0ac6,
> > > > so just dropping it will likely cause problems elsewhere. Should we
> > > > rather move that down in the handler?
> > > 
> > > Well, as written in the commit message running_oob() will always return
> > > true for RT tasks so I'm quite sure that the invalid instruction
> > > handling was broken on arm64 for some time now. We bailed out even
> > > before sending the notification to the companion core.
> > > 
> > > Moving it down might fix the compat case but native arm64 would stay
> > > broken. No?
> > > 
> > 
> > Something looks still fishy, either in the original patch that
> > introduced the condition (I still don't get that special case) or now
> > with this change trying to restore things. I agree that also the
> > original change needed the notification to be delivered.
> > 
> > Philippe, can you help clarifying the logic behind
> > do_undefinstr/do_el0_undef?
> > 
> 
> This very much looks like an unfortunate attempt to mimic the arm32
> logic, which does notify the core about the undefined insn trap prior to
> calling do_undefinstr() (und_fault from the asm entry code).
> 
> So Florian is right, this should not apply to the arm64 side because
> unlike arm32, we need to wait until all branches which might be able to
> handle this fault directly from oob are considered
> (e.g. try_emulate_mrs()), before assuming that we might need the core to
> switch us in-band. IOW, do_el0_undef() is broken since it wrongly
> assumes that such switch might already have happened on entry. As a
> matter of fact, it did not for a reason.
> 
> I see another issue hiding in the dark: emulation of the deprecated
> armv8 SWP{B} instruction cannot be done from the oob stage.  So in
> addition to fixing the aarch32 break handler, I would notify the core
> before handling the CONDTEST_PASS case as well (bluntly disabling
> all emulations from the oob stage entirely seems wrong ATM).

So it seems we are on the same page now, that's great.

To sum up: my "backport" is missing the armv8 SWP{B} part and we still
lack a fix for recent dovetail versions.

Who will write the necessary fixes? Philippe, could you jump on that?

Best regards,
Florian

> 
> -- 
> Philippe.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] arm64: dovetail: Fix undefinstr/break trap handling
  2023-08-29  7:42         ` Florian Bezdeka
@ 2023-09-01 13:29           ` Philippe Gerum
  0 siblings, 0 replies; 7+ messages in thread
From: Philippe Gerum @ 2023-09-01 13:29 UTC (permalink / raw)
  To: Florian Bezdeka; +Cc: Jan Kiszka, xenomai, Clara Kowalsky


Florian Bezdeka <florian.bezdeka@siemens.com> writes:

> On Mon, 2023-08-28 at 16:36 +0200, Philippe Gerum wrote:
>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>> 
>> > On 28.08.23 14:52, Florian Bezdeka wrote:
>> > > On Mon, 2023-08-28 at 14:33 +0200, Jan Kiszka wrote:
>> > > > On 25.08.23 14:58, Florian Bezdeka wrote:
>> > > > > When running an compat RT application on arm64 the break trap is
>> > > > > handled via the undefined instruction trap.
>> > > > > 
>> > > > > A possible call stack looks like this:
>> > > > > 
>> > > > > Call trace:
>> > > > >   handle_inband_event+0x2d0/0x320
>> > > > >   inband_event_notify+0x28/0x50
>> > > > >   signal_wake_up_state+0x7c/0xa4
>> > > > >   complete_signal+0x104/0x2d0
>> > > > >   __send_signal_locked+0x1d0/0x3e4
>> > > > >   send_signal_locked+0xf0/0x140
>> > > > >   force_sig_info_to_task+0xa0/0x164
>> > > > >   force_sig_fault+0x64/0x94
>> > > > >   arm64_force_sig_fault+0x48/0x80
>> > > > >   send_user_sigtrap+0x50/0x8c
>> > > > >   aarch32_break_handler+0xac/0x1d0
>> > > > >   do_undefinstr+0x6c/0x360
>> > > > >   el0_undef+0x4c/0xd0
>> > > > >   el0t_32_sync_handler+0xd0/0x140
>> > > > >   el0t_32_sync+0x190/0x194
>> > > > > 
>> > > > > The trap is never reported to the companion core at that stage so
>> > > > > running_oob() in do_undefinstr() will always return true. As the
>> > > > > following bailout happens before calling the compat breakpoint
>> > > > > detection (aarch32_break_handler()) debugging the compat 
>> > > > > application does not work.
>> > > > > 
>> > > > > In addition aarch32_break_handler() has to report the trap entry to the
>> > > > > companion core.
>> > > > > 
>> > > > > Reported-by: Clara Kowalsky <clara.kowalsky@siemens.com>
>> > > > > Tested-by: Clara Kowalsky <clara.kowalsky@siemens.com>
>> > > > > Signed-off-by: Florian Bezdeka <florian.bezdeka@siemens.com>
>> > > > > ---
>> > > > >  arch/arm64/kernel/debug-monitors.c | 3 +++
>> > > > >  arch/arm64/kernel/traps.c          | 7 -------
>> > > > >  2 files changed, 3 insertions(+), 7 deletions(-)
>> > > > > 
>> > > > > diff --git a/arch/arm64/kernel/debug-monitors.c b/arch/arm64/kernel/debug-monitors.c
>> > > > > index 32271ed24ef5..ef7ac042a0a6 100644
>> > > > > --- a/arch/arm64/kernel/debug-monitors.c
>> > > > > +++ b/arch/arm64/kernel/debug-monitors.c
>> > > > > @@ -373,7 +373,10 @@ int aarch32_break_handler(struct pt_regs *regs)
>> > > > >  	if (!bp)
>> > > > >  		return -EFAULT;
>> > > > >  
>> > > > > +	mark_trap_entry(ARM64_TRAP_UNDI, regs);
>> > > > >  	send_user_sigtrap(TRAP_BRKPT);
>> > > > > +	mark_trap_exit(ARM64_TRAP_UNDI, regs);
>> > > > > +
>> > > > >  	return 0;
>> > > > >  }
>> > > > >  NOKPROBE_SYMBOL(aarch32_break_handler);
>> > > > > diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
>> > > > > index cc68be400244..9bf2f309248f 100644
>> > > > > --- a/arch/arm64/kernel/traps.c
>> > > > > +++ b/arch/arm64/kernel/traps.c
>> > > > > @@ -489,13 +489,6 @@ void arm64_notify_segfault(unsigned long addr)
>> > > > >  
>> > > > >  void do_undefinstr(struct pt_regs *regs, unsigned long esr)
>> > > > >  {
>> > > > > -	/*
>> > > > > -	 * If the companion core did not switched us to in-band
>> > > > > -	 * context, we may assume that it has handled the trap.
>> > > > > -	 */
>> > > > > -	if (running_oob())
>> > > > > -		return;
>> > > > > -
>> > > > >  	/* check for AArch32 breakpoint instructions */
>> > > > >  	if (!aarch32_break_handler(regs))
>> > > > >  		return;
>> > > > 
>> > > > This is not against v6.5-dovetail-rebase, right? We likely need to start
>> > > > from the top, then backport to stable.
>> > > 
>> > > That applied for 6.1 and all other (lower) versions that are currently
>> > > covered by our CI. Might need some lifting for 6.5, didn't check yet.
>> > > 
>> > > > 
>> > > > Also note that this change came in via
>> > > > https://source.denx.de/Xenomai/linux-dovetail/-/commit/2b2ccdaeb8116727cf4076960d664a3cedff0ac6,
>> > > > so just dropping it will likely cause problems elsewhere. Should we
>> > > > rather move that down in the handler?
>> > > 
>> > > Well, as written in the commit message running_oob() will always return
>> > > true for RT tasks so I'm quite sure that the invalid instruction
>> > > handling was broken on arm64 for some time now. We bailed out even
>> > > before sending the notification to the companion core.
>> > > 
>> > > Moving it down might fix the compat case but native arm64 would stay
>> > > broken. No?
>> > > 
>> > 
>> > Something looks still fishy, either in the original patch that
>> > introduced the condition (I still don't get that special case) or now
>> > with this change trying to restore things. I agree that also the
>> > original change needed the notification to be delivered.
>> > 
>> > Philippe, can you help clarifying the logic behind
>> > do_undefinstr/do_el0_undef?
>> > 
>> 
>> This very much looks like an unfortunate attempt to mimic the arm32
>> logic, which does notify the core about the undefined insn trap prior to
>> calling do_undefinstr() (und_fault from the asm entry code).
>> 
>> So Florian is right, this should not apply to the arm64 side because
>> unlike arm32, we need to wait until all branches which might be able to
>> handle this fault directly from oob are considered
>> (e.g. try_emulate_mrs()), before assuming that we might need the core to
>> switch us in-band. IOW, do_el0_undef() is broken since it wrongly
>> assumes that such switch might already have happened on entry. As a
>> matter of fact, it did not for a reason.
>> 
>> I see another issue hiding in the dark: emulation of the deprecated
>> armv8 SWP{B} instruction cannot be done from the oob stage.  So in
>> addition to fixing the aarch32 break handler, I would notify the core
>> before handling the CONDTEST_PASS case as well (bluntly disabling
>> all emulations from the oob stage entirely seems wrong ATM).
>
> So it seems we are on the same page now, that's great.
>
> To sum up: my "backport" is missing the armv8 SWP{B} part and we still
> lack a fix for recent dovetail versions.
>
> Who will write the necessary fixes? Philippe, could you jump on that?
>

Would you submit a consolidated patch against 6.1 or 6.5 which would
include all the bits we have been discussing first? I could pick it from
there and update other branches.

-- 
Philippe.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2023-09-01 13:33 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-08-25 12:58 [PATCH] arm64: dovetail: Fix undefinstr/break trap handling Florian Bezdeka
2023-08-28 12:33 ` Jan Kiszka
2023-08-28 12:52   ` Florian Bezdeka
2023-08-28 13:04     ` Jan Kiszka
2023-08-28 14:36       ` Philippe Gerum
2023-08-29  7:42         ` Florian Bezdeka
2023-09-01 13:29           ` Philippe Gerum

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).