linux-kernel.vger.kernel.org archive mirror
* kernel 6.2 stuck at boot (efi_call_rts) on arm64
@ 2023-03-16  7:54 Andrea Righi
  2023-03-16  7:58 ` Ard Biesheuvel
  2023-03-16  9:45 ` Linux regression tracking #adding (Thorsten Leemhuis)
  0 siblings, 2 replies; 31+ messages in thread
From: Andrea Righi @ 2023-03-16  7:54 UTC (permalink / raw)
  To: Jason A. Donenfeld; +Cc: Ard Biesheuvel, Paolo Pisati, linux-efi, linux-kernel

Hello,

The latest v6.2.6 kernel fails to boot on some arm64 systems: the kernel
gets stuck and never completes the boot. On the console I see this:

[   72.043484] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[   72.049571] rcu:     22-...0: (30 GPs behind) idle=b10c/1/0x4000000000000000 softirq=164/164 fqs=6443
[   72.058520]     (detected by 28, t=15005 jiffies, g=449, q=174 ncpus=32)
[   72.064949] Task dump for CPU 22:
[   72.068251] task:kworker/u64:5   state:R  running task     stack:0     pid:447   ppid:2      flags:0x0000000a
[   72.078156] Workqueue: efi_rts_wq efi_call_rts
[   72.082595] Call trace:
[   72.085029]  __switch_to+0xbc/0x100
[   72.088508]  0xffff80000fe83d4c

After that, as a consequence, I start to get a lot of hung task timeout traces.
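
For anyone triaging a similar console log, the fields that matter in the
stall report above (the stalled CPU, the task, and the workqueue function it
was running) can be pulled out mechanically. The following is only a rough
sketch: the function name summarize_stall and the regular expressions are
illustrative assumptions fitted to the lines quoted above, not an existing
tool, and other kernels may format these lines differently.

```python
import re

# Console excerpt copied from the report above.
CONSOLE_LOG = """\
[   72.043484] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[   72.049571] rcu:     22-...0: (30 GPs behind) idle=b10c/1/0x4000000000000000 softirq=164/164 fqs=6443
[   72.058520]     (detected by 28, t=15005 jiffies, g=449, q=174 ncpus=32)
[   72.064949] Task dump for CPU 22:
[   72.068251] task:kworker/u64:5   state:R  running task     stack:0     pid:447   ppid:2      flags:0x0000000a
[   72.078156] Workqueue: efi_rts_wq efi_call_rts
"""


def summarize_stall(log):
    """Extract the stalled CPU, task name, and workqueue/work function.

    Hypothetical helper: any field that is not found is reported as None.
    """
    cpu = re.search(r"Task dump for CPU (\d+):", log)
    task = re.search(r"\btask:(\S+)", log)
    wq = re.search(r"Workqueue: (\S+) (\S+)", log)
    return {
        "cpu": int(cpu.group(1)) if cpu else None,
        "task": task.group(1) if task else None,
        "workqueue": wq.group(1) if wq else None,
        "work_fn": wq.group(2) if wq else None,
    }


print(summarize_stall(CONSOLE_LOG))
```

Here the work function is efi_call_rts on the efi_rts_wq workqueue, which
points at an EFI runtime service call rather than at ordinary kernel code.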

I tried to bisect the problem and I found that the offending commit is
this one:

 e7b813b32a42 ("efi: random: refresh non-volatile random seed when RNG is initialized")

I've reverted this commit for now and everything works just fine, but I
was wondering if the problem could be caused by a lack of entropy on
these arm64 boxes or something else.

Any suggestions? Let me know if you want me to run any specific tests.

Thanks,
-Andrea


* Re: kernel 6.2 stuck at boot (efi_call_rts) on arm64
  2023-03-16  7:54 kernel 6.2 stuck at boot (efi_call_rts) on arm64 Andrea Righi
@ 2023-03-16  7:58 ` Ard Biesheuvel
  2023-03-16  9:45   ` Andrea Righi
  2023-03-16  9:45 ` Linux regression tracking #adding (Thorsten Leemhuis)
  1 sibling, 1 reply; 31+ messages in thread
From: Ard Biesheuvel @ 2023-03-16  7:58 UTC (permalink / raw)
  To: Andrea Righi; +Cc: Jason A. Donenfeld, Paolo Pisati, linux-efi, linux-kernel

Hello Andrea,

On Thu, 16 Mar 2023 at 08:54, Andrea Righi <andrea.righi@canonical.com> wrote:
>
> Hello,
>
> the latest v6.2.6 kernel fails to boot on some arm64 systems, the kernel
> gets stuck and never completes the boot. On the console I see this:
>
> [   72.043484] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> [   72.049571] rcu:     22-...0: (30 GPs behind) idle=b10c/1/0x4000000000000000 softirq=164/164 fqs=6443
> [   72.058520]     (detected by 28, t=15005 jiffies, g=449, q=174 ncpus=32)
> [   72.064949] Task dump for CPU 22:
> [   72.068251] task:kworker/u64:5   state:R  running task     stack:0     pid:447   ppid:2      flags:0x0000000a
> [   72.078156] Workqueue: efi_rts_wq efi_call_rts
> [   72.082595] Call trace:
> [   72.085029]  __switch_to+0xbc/0x100
> [   72.088508]  0xffff80000fe83d4c
>
> After that, as a consequence, I start to get a lot of hung task timeout traces.
>
> I tried to bisect the problem and I found that the offending commit is
> this one:
>
>  e7b813b32a42 ("efi: random: refresh non-volatile random seed when RNG is initialized")
>
> I've reverted this commit for now and everything works just fine, but I
> was wondering if the problem could be caused by a lack of entropy on
> these arm64 boxes or something else.
>
> Any suggestion? Let me know if you want me to do any specific test.
>

Thanks for the report.

This is most likely the EFI SetVariable() call going off into the
weeds and never returning.

Is this an Ampere Altra system by any chance? Do you see it on
different types of hardware?

Could you check whether SetVariable works on this system? E.g. by
updating the EFI boot timeout (sudo efibootmgr -t <n>)?
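
If the check has to be scripted (for example across several machines), a
wrapper with a deadline avoids a terminal that just sits there. This is a
minimal sketch under stated assumptions: probe is a hypothetical helper, the
10 second deadline is arbitrary, and the commented invocation needs root and
really does write the boot timeout variable. If the firmware call never
returns, the child may stay stuck inside the kernel and ignore the kill, so
the helper deliberately does not wait for it afterwards.

```python
import subprocess


def probe(cmd, timeout_s=10.0):
    """Run cmd and classify the outcome as 'ok', 'failed', or 'hung'.

    Hypothetical helper: 'hung' means the command did not exit within
    timeout_s seconds.
    """
    proc = subprocess.Popen(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    try:
        rc = proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # Best effort only: a task blocked inside the kernel may not die,
        # so do not wait for it again here.
        proc.kill()
        return "hung"
    return "ok" if rc == 0 else "failed"


# Example invocation (assumption: run as root on the machine under test):
#   print(probe(["efibootmgr", "-t", "10"]))
```

A result of "hung" on an otherwise healthy system would be consistent with
SetVariable() not returning, though it does not prove it on its own.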


* Re: kernel 6.2 stuck at boot (efi_call_rts) on arm64
  2023-03-16  7:54 kernel 6.2 stuck at boot (efi_call_rts) on arm64 Andrea Righi
  2023-03-16  7:58 ` Ard Biesheuvel
@ 2023-03-16  9:45 ` Linux regression tracking #adding (Thorsten Leemhuis)
  2023-04-05 12:50   ` Linux regression tracking #update (Thorsten Leemhuis)
  1 sibling, 1 reply; 31+ messages in thread
From: Linux regression tracking #adding (Thorsten Leemhuis) @ 2023-03-16  9:45 UTC (permalink / raw)
  To: Andrea Righi, Jason A. Donenfeld
  Cc: Ard Biesheuvel, Paolo Pisati, linux-efi, linux-kernel,
	Linux kernel regressions list

[CCing the regression list, as it should be in the loop for regressions:
https://docs.kernel.org/admin-guide/reporting-regressions.html]

[TLDR: I'm adding this report to the list of tracked Linux kernel
regressions; the text you find below is based on a few template
paragraphs you might have encountered already in similar form.
See link in footer if these mails annoy you.]

On 16.03.23 08:54, Andrea Righi wrote:
> Hello,
> 
> the latest v6.2.6 kernel fails to boot on some arm64 systems, the kernel
> gets stuck and never completes the boot. On the console I see this:
> 
> [   72.043484] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> [   72.049571] rcu:     22-...0: (30 GPs behind) idle=b10c/1/0x4000000000000000 softirq=164/164 fqs=6443
> [   72.058520]     (detected by 28, t=15005 jiffies, g=449, q=174 ncpus=32)
> [   72.064949] Task dump for CPU 22:
> [   72.068251] task:kworker/u64:5   state:R  running task     stack:0     pid:447   ppid:2      flags:0x0000000a
> [   72.078156] Workqueue: efi_rts_wq efi_call_rts
> [   72.082595] Call trace:
> [   72.085029]  __switch_to+0xbc/0x100
> [   72.088508]  0xffff80000fe83d4c
> 
> After that, as a consequence, I start to get a lot of hung task timeout traces.
> 
> I tried to bisect the problem and I found that the offending commit is
> this one:
> 
>  e7b813b32a42 ("efi: random: refresh non-volatile random seed when RNG is initialized")
> 
> I've reverted this commit for now and everything works just fine, but I
> was wondering if the problem could be caused by a lack of entropy on
> these arm64 boxes or something else.

Thanks for the report. To be sure the issue doesn't fall through the
cracks unnoticed, I'm adding it to regzbot, the Linux kernel regression
tracking bot:

#regzbot ^introduced e7b813b32a42
#regzbot title efi: stuck at boot (efi_call_rts) on arm64
#regzbot ignore-activity

This isn't a regression? This issue or a fix for it are already
discussed somewhere else? It was fixed already? You want to clarify when
the regression started to happen? Or point out I got the title or
something else totally wrong? Then just reply and tell me -- ideally
while also telling regzbot about it, as explained by the page listed in
the footer of this mail.

Developers: When fixing the issue, remember to add 'Link:' tags pointing
to the report (the parent of this mail). See page linked in footer for
details.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.


* Re: kernel 6.2 stuck at boot (efi_call_rts) on arm64
  2023-03-16  7:58 ` Ard Biesheuvel
@ 2023-03-16  9:45   ` Andrea Righi
  2023-03-16  9:55     ` Ard Biesheuvel
  0 siblings, 1 reply; 31+ messages in thread
From: Andrea Righi @ 2023-03-16  9:45 UTC (permalink / raw)
  To: Ard Biesheuvel; +Cc: Jason A. Donenfeld, Paolo Pisati, linux-efi, linux-kernel

On Thu, Mar 16, 2023 at 08:58:20AM +0100, Ard Biesheuvel wrote:
> Hello Andrea,
> 
> On Thu, 16 Mar 2023 at 08:54, Andrea Righi <andrea.righi@canonical.com> wrote:
> >
> > Hello,
> >
> > the latest v6.2.6 kernel fails to boot on some arm64 systems, the kernel
> > gets stuck and never completes the boot. On the console I see this:
> >
> > [   72.043484] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> > [   72.049571] rcu:     22-...0: (30 GPs behind) idle=b10c/1/0x4000000000000000 softirq=164/164 fqs=6443
> > [   72.058520]     (detected by 28, t=15005 jiffies, g=449, q=174 ncpus=32)
> > [   72.064949] Task dump for CPU 22:
> > [   72.068251] task:kworker/u64:5   state:R  running task     stack:0     pid:447   ppid:2      flags:0x0000000a
> > [   72.078156] Workqueue: efi_rts_wq efi_call_rts
> > [   72.082595] Call trace:
> > [   72.085029]  __switch_to+0xbc/0x100
> > [   72.088508]  0xffff80000fe83d4c
> >
> > After that, as a consequence, I start to get a lot of hung task timeout traces.
> >
> > I tried to bisect the problem and I found that the offending commit is
> > this one:
> >
> >  e7b813b32a42 ("efi: random: refresh non-volatile random seed when RNG is initialized")
> >
> > I've reverted this commit for now and everything works just fine, but I
> > was wondering if the problem could be caused by a lack of entropy on
> > these arm64 boxes or something else.
> >
> > Any suggestion? Let me know if you want me to do any specific test.
> >
> 
> Thanks for the report.
> 
> This is most likely the EFI SetVariable() call going off into the
> weeds and never returning.
> 
> Is this an Ampere Altra system by any chance? Do you see it on
> different types of hardware?

This is: Ampere eMAG / Lenovo ThinkSystem HR330a.

> 
> Could you check whether SetVariable works on this system? E.g. by
> updating the EFI boot timeout (sudo efibootmgr -t <n>)?

ubuntu@kuzzle:~$ sudo efibootmgr -t 10
^C^C^C^C

^ Stuck there, so it really looks like SetVariable is the problem.

Thanks,
-Andrea


* Re: kernel 6.2 stuck at boot (efi_call_rts) on arm64
  2023-03-16  9:45   ` Andrea Righi
@ 2023-03-16  9:55     ` Ard Biesheuvel
  2023-03-16 10:03       ` Andrea Righi
  0 siblings, 1 reply; 31+ messages in thread
From: Ard Biesheuvel @ 2023-03-16  9:55 UTC (permalink / raw)
  To: Andrea Righi
  Cc: Jason A. Donenfeld, Paolo Pisati, linux-efi, linux-kernel, Darren Hart

(cc Darren)

On Thu, 16 Mar 2023 at 10:45, Andrea Righi <andrea.righi@canonical.com> wrote:
>
> On Thu, Mar 16, 2023 at 08:58:20AM +0100, Ard Biesheuvel wrote:
> > Hello Andrea,
> >
> > On Thu, 16 Mar 2023 at 08:54, Andrea Righi <andrea.righi@canonical.com> wrote:
> > >
> > > Hello,
> > >
> > > the latest v6.2.6 kernel fails to boot on some arm64 systems, the kernel
> > > gets stuck and never completes the boot. On the console I see this:
> > >
> > > [   72.043484] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> > > [   72.049571] rcu:     22-...0: (30 GPs behind) idle=b10c/1/0x4000000000000000 softirq=164/164 fqs=6443
> > > [   72.058520]     (detected by 28, t=15005 jiffies, g=449, q=174 ncpus=32)
> > > [   72.064949] Task dump for CPU 22:
> > > [   72.068251] task:kworker/u64:5   state:R  running task     stack:0     pid:447   ppid:2      flags:0x0000000a
> > > [   72.078156] Workqueue: efi_rts_wq efi_call_rts
> > > [   72.082595] Call trace:
> > > [   72.085029]  __switch_to+0xbc/0x100
> > > [   72.088508]  0xffff80000fe83d4c
> > >
> > > After that, as a consequence, I start to get a lot of hung task timeout traces.
> > >
> > > I tried to bisect the problem and I found that the offending commit is
> > > this one:
> > >
> > >  e7b813b32a42 ("efi: random: refresh non-volatile random seed when RNG is initialized")
> > >
> > > I've reverted this commit for now and everything works just fine, but I
> > > was wondering if the problem could be caused by a lack of entropy on
> > > these arm64 boxes or something else.
> > >
> > > Any suggestion? Let me know if you want me to do any specific test.
> > >
> >
> > Thanks for the report.
> >
> > This is most likely the EFI SetVariable() call going off into the
> > weeds and never returning.
> >
> > Is this an Ampere Altra system by any chance? Do you see it on
> > different types of hardware?
>
> This is: Ampere eMAG / Lenovo ThinkSystem HR330a.
>
> >
> > Could you check whether SetVariable works on this system? E.g. by
> > updating the EFI boot timeout (sudo efibootmgr -t <n>)?
>
> ubuntu@kuzzle:~$ sudo efibootmgr -t 10
> ^C^C^C^C
>
> ^ Stuck there, so it really looks like SetVariable is the problem.
>

Could you please share the output of

dmidecode -s bios
dmidecode -s system-family

Thanks,
Ard.


* Re: kernel 6.2 stuck at boot (efi_call_rts) on arm64
  2023-03-16  9:55     ` Ard Biesheuvel
@ 2023-03-16 10:03       ` Andrea Righi
  2023-03-16 10:18         ` Ard Biesheuvel
  0 siblings, 1 reply; 31+ messages in thread
From: Andrea Righi @ 2023-03-16 10:03 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Jason A. Donenfeld, Paolo Pisati, linux-efi, linux-kernel, Darren Hart

On Thu, Mar 16, 2023 at 10:55:58AM +0100, Ard Biesheuvel wrote:
> (cc Darren)
> 
> On Thu, 16 Mar 2023 at 10:45, Andrea Righi <andrea.righi@canonical.com> wrote:
> >
> > On Thu, Mar 16, 2023 at 08:58:20AM +0100, Ard Biesheuvel wrote:
> > > Hello Andrea,
> > >
> > > On Thu, 16 Mar 2023 at 08:54, Andrea Righi <andrea.righi@canonical.com> wrote:
> > > >
> > > > Hello,
> > > >
> > > > the latest v6.2.6 kernel fails to boot on some arm64 systems, the kernel
> > > > gets stuck and never completes the boot. On the console I see this:
> > > >
> > > > [   72.043484] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> > > > [   72.049571] rcu:     22-...0: (30 GPs behind) idle=b10c/1/0x4000000000000000 softirq=164/164 fqs=6443
> > > > [   72.058520]     (detected by 28, t=15005 jiffies, g=449, q=174 ncpus=32)
> > > > [   72.064949] Task dump for CPU 22:
> > > > [   72.068251] task:kworker/u64:5   state:R  running task     stack:0     pid:447   ppid:2      flags:0x0000000a
> > > > [   72.078156] Workqueue: efi_rts_wq efi_call_rts
> > > > [   72.082595] Call trace:
> > > > [   72.085029]  __switch_to+0xbc/0x100
> > > > [   72.088508]  0xffff80000fe83d4c
> > > >
> > > > After that, as a consequence, I start to get a lot of hung task timeout traces.
> > > >
> > > > I tried to bisect the problem and I found that the offending commit is
> > > > this one:
> > > >
> > > >  e7b813b32a42 ("efi: random: refresh non-volatile random seed when RNG is initialized")
> > > >
> > > > I've reverted this commit for now and everything works just fine, but I
> > > > was wondering if the problem could be caused by a lack of entropy on
> > > > these arm64 boxes or something else.
> > > >
> > > > Any suggestion? Let me know if you want me to do any specific test.
> > > >
> > >
> > > Thanks for the report.
> > >
> > > This is most likely the EFI SetVariable() call going off into the
> > > weeds and never returning.
> > >
> > > Is this an Ampere Altra system by any chance? Do you see it on
> > > different types of hardware?
> >
> > This is: Ampere eMAG / Lenovo ThinkSystem HR330a.
> >
> > >
> > > Could you check whether SetVariable works on this system? E.g. by
> > > updating the EFI boot timeout (sudo efibootmgr -t <n>)?
> >
> > ubuntu@kuzzle:~$ sudo efibootmgr -t 10
> > ^C^C^C^C
> >
> > ^ Stuck there, so it really looks like SetVariable is the problem.
> >
> 
> Could you please share the output of
> 
> dmidecode -s bios
> dmidecode -s system-family

$ sudo dmidecode -s bios-vendor
LENOVO
$ sudo dmidecode -s bios-version
hve104r-1.15
$ sudo dmidecode -s bios-release-date
02/26/2021
$ sudo dmidecode -s bios-revision
1.15
$ sudo dmidecode -s system-family
Lenovo ThinkSystem HR330A/HR350A

-Andrea


* Re: kernel 6.2 stuck at boot (efi_call_rts) on arm64
  2023-03-16 10:03       ` Andrea Righi
@ 2023-03-16 10:18         ` Ard Biesheuvel
  2023-03-16 11:33           ` Andrea Righi
  0 siblings, 1 reply; 31+ messages in thread
From: Ard Biesheuvel @ 2023-03-16 10:18 UTC (permalink / raw)
  To: Andrea Righi
  Cc: Jason A. Donenfeld, Paolo Pisati, linux-efi, linux-kernel, Darren Hart

On Thu, 16 Mar 2023 at 11:03, Andrea Righi <andrea.righi@canonical.com> wrote:
>
> On Thu, Mar 16, 2023 at 10:55:58AM +0100, Ard Biesheuvel wrote:
> > (cc Darren)
> >
> > On Thu, 16 Mar 2023 at 10:45, Andrea Righi <andrea.righi@canonical.com> wrote:
> > >
> > > On Thu, Mar 16, 2023 at 08:58:20AM +0100, Ard Biesheuvel wrote:
> > > > Hello Andrea,
> > > >
> > > > On Thu, 16 Mar 2023 at 08:54, Andrea Righi <andrea.righi@canonical.com> wrote:
> > > > >
> > > > > Hello,
> > > > >
> > > > > the latest v6.2.6 kernel fails to boot on some arm64 systems, the kernel
> > > > > gets stuck and never completes the boot. On the console I see this:
> > > > >
> > > > > [   72.043484] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> > > > > [   72.049571] rcu:     22-...0: (30 GPs behind) idle=b10c/1/0x4000000000000000 softirq=164/164 fqs=6443
> > > > > [   72.058520]     (detected by 28, t=15005 jiffies, g=449, q=174 ncpus=32)
> > > > > [   72.064949] Task dump for CPU 22:
> > > > > [   72.068251] task:kworker/u64:5   state:R  running task     stack:0     pid:447   ppid:2      flags:0x0000000a
> > > > > [   72.078156] Workqueue: efi_rts_wq efi_call_rts
> > > > > [   72.082595] Call trace:
> > > > > [   72.085029]  __switch_to+0xbc/0x100
> > > > > [   72.088508]  0xffff80000fe83d4c
> > > > >
> > > > > After that, as a consequence, I start to get a lot of hung task timeout traces.
> > > > >
> > > > > I tried to bisect the problem and I found that the offending commit is
> > > > > this one:
> > > > >
> > > > >  e7b813b32a42 ("efi: random: refresh non-volatile random seed when RNG is initialized")
> > > > >
> > > > > I've reverted this commit for now and everything works just fine, but I
> > > > > was wondering if the problem could be caused by a lack of entropy on
> > > > > these arm64 boxes or something else.
> > > > >
> > > > > Any suggestion? Let me know if you want me to do any specific test.
> > > > >
> > > >
> > > > Thanks for the report.
> > > >
> > > > This is most likely the EFI SetVariable() call going off into the
> > > > weeds and never returning.
> > > >
> > > > Is this an Ampere Altra system by any chance? Do you see it on
> > > > different types of hardware?
> > >
> > > This is: Ampere eMAG / Lenovo ThinkSystem HR330a.
> > >
> > > >
> > > > Could you check whether SetVariable works on this system? E.g. by
> > > > updating the EFI boot timeout (sudo efibootmgr -t <n>)?
> > >
> > > ubuntu@kuzzle:~$ sudo efibootmgr -t 10
> > > ^C^C^C^C
> > >
> > > ^ Stuck there, so it really looks like SetVariable is the problem.
> > >
> >
> > Could you please share the output of
> >
> > dmidecode -s bios
> > dmidecode -s system-family
>
> $ sudo dmidecode -s bios-vendor
> LENOVO
> $ sudo dmidecode -s bios-version
> hve104r-1.15
> $ sudo dmidecode -s bios-release-date
> 02/26/2021
> $ sudo dmidecode -s bios-revision
> 1.15
> $ sudo dmidecode -s system-family
> Lenovo ThinkSystem HR330A/HR350A
>

Thanks

Mind checking if this patch fixes your issue as well?

https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/commit/?h=altra-fix&id=77fa99dd4741456da85049c13ec31a148f5f5ac0


* Re: kernel 6.2 stuck at boot (efi_call_rts) on arm64
  2023-03-16 10:18         ` Ard Biesheuvel
@ 2023-03-16 11:33           ` Andrea Righi
  2023-03-16 12:21             ` Ard Biesheuvel
  0 siblings, 1 reply; 31+ messages in thread
From: Andrea Righi @ 2023-03-16 11:33 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Jason A. Donenfeld, Paolo Pisati, linux-efi, linux-kernel, Darren Hart

On Thu, Mar 16, 2023 at 11:18:21AM +0100, Ard Biesheuvel wrote:
> On Thu, 16 Mar 2023 at 11:03, Andrea Righi <andrea.righi@canonical.com> wrote:
> >
> > On Thu, Mar 16, 2023 at 10:55:58AM +0100, Ard Biesheuvel wrote:
> > > (cc Darren)
> > >
> > > On Thu, 16 Mar 2023 at 10:45, Andrea Righi <andrea.righi@canonical.com> wrote:
> > > >
> > > > On Thu, Mar 16, 2023 at 08:58:20AM +0100, Ard Biesheuvel wrote:
> > > > > Hello Andrea,
> > > > >
> > > > > On Thu, 16 Mar 2023 at 08:54, Andrea Righi <andrea.righi@canonical.com> wrote:
> > > > > >
> > > > > > Hello,
> > > > > >
> > > > > > the latest v6.2.6 kernel fails to boot on some arm64 systems, the kernel
> > > > > > gets stuck and never completes the boot. On the console I see this:
> > > > > >
> > > > > > [   72.043484] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> > > > > > [   72.049571] rcu:     22-...0: (30 GPs behind) idle=b10c/1/0x4000000000000000 softirq=164/164 fqs=6443
> > > > > > [   72.058520]     (detected by 28, t=15005 jiffies, g=449, q=174 ncpus=32)
> > > > > > [   72.064949] Task dump for CPU 22:
> > > > > > [   72.068251] task:kworker/u64:5   state:R  running task     stack:0     pid:447   ppid:2      flags:0x0000000a
> > > > > > [   72.078156] Workqueue: efi_rts_wq efi_call_rts
> > > > > > [   72.082595] Call trace:
> > > > > > [   72.085029]  __switch_to+0xbc/0x100
> > > > > > [   72.088508]  0xffff80000fe83d4c
> > > > > >
> > > > > > After that, as a consequence, I start to get a lot of hung task timeout traces.
> > > > > >
> > > > > > I tried to bisect the problem and I found that the offending commit is
> > > > > > this one:
> > > > > >
> > > > > >  e7b813b32a42 ("efi: random: refresh non-volatile random seed when RNG is initialized")
> > > > > >
> > > > > > I've reverted this commit for now and everything works just fine, but I
> > > > > > was wondering if the problem could be caused by a lack of entropy on
> > > > > > these arm64 boxes or something else.
> > > > > >
> > > > > > Any suggestion? Let me know if you want me to do any specific test.
> > > > > >
> > > > >
> > > > > Thanks for the report.
> > > > >
> > > > > This is most likely the EFI SetVariable() call going off into the
> > > > > weeds and never returning.
> > > > >
> > > > > Is this an Ampere Altra system by any chance? Do you see it on
> > > > > different types of hardware?
> > > >
> > > > This is: Ampere eMAG / Lenovo ThinkSystem HR330a.
> > > >
> > > > >
> > > > > Could you check whether SetVariable works on this system? E.g. by
> > > > > updating the EFI boot timeout (sudo efibootmgr -t <n>)?
> > > >
> > > > ubuntu@kuzzle:~$ sudo efibootmgr -t 10
> > > > ^C^C^C^C
> > > >
> > > > ^ Stuck there, so it really looks like SetVariable is the problem.
> > > >
> > >
> > > Could you please share the output of
> > >
> > > dmidecode -s bios
> > > dmidecode -s system-family
> >
> > $ sudo dmidecode -s bios-vendor
> > LENOVO
> > $ sudo dmidecode -s bios-version
> > hve104r-1.15
> > $ sudo dmidecode -s bios-release-date
> > 02/26/2021
> > $ sudo dmidecode -s bios-revision
> > 1.15
> > $ sudo dmidecode -s system-family
> > Lenovo ThinkSystem HR330A/HR350A
> >
> 
> Thanks
> 
> Mind checking if this patch fixes your issue as well?
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/commit/?h=altra-fix&id=77fa99dd4741456da85049c13ec31a148f5f5ac0

Unfortunately this doesn't seem to be enough: I'm still getting the same
problem even with this patch applied.

-Andrea


* Re: kernel 6.2 stuck at boot (efi_call_rts) on arm64
  2023-03-16 11:33           ` Andrea Righi
@ 2023-03-16 12:21             ` Ard Biesheuvel
  2023-03-16 12:38               ` Ard Biesheuvel
  0 siblings, 1 reply; 31+ messages in thread
From: Ard Biesheuvel @ 2023-03-16 12:21 UTC (permalink / raw)
  To: Andrea Righi
  Cc: Jason A. Donenfeld, Paolo Pisati, linux-efi, linux-kernel, Darren Hart

On Thu, 16 Mar 2023 at 12:34, Andrea Righi <andrea.righi@canonical.com> wrote:
>
> On Thu, Mar 16, 2023 at 11:18:21AM +0100, Ard Biesheuvel wrote:
> > On Thu, 16 Mar 2023 at 11:03, Andrea Righi <andrea.righi@canonical.com> wrote:
> > >
> > > On Thu, Mar 16, 2023 at 10:55:58AM +0100, Ard Biesheuvel wrote:
> > > > (cc Darren)
> > > >
> > > > On Thu, 16 Mar 2023 at 10:45, Andrea Righi <andrea.righi@canonical.com> wrote:
> > > > >
> > > > > On Thu, Mar 16, 2023 at 08:58:20AM +0100, Ard Biesheuvel wrote:
> > > > > > Hello Andrea,
> > > > > >
> > > > > > On Thu, 16 Mar 2023 at 08:54, Andrea Righi <andrea.righi@canonical.com> wrote:
> > > > > > >
> > > > > > > Hello,
> > > > > > >
> > > > > > > the latest v6.2.6 kernel fails to boot on some arm64 systems, the kernel
> > > > > > > gets stuck and never completes the boot. On the console I see this:
> > > > > > >
> > > > > > > [   72.043484] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> > > > > > > [   72.049571] rcu:     22-...0: (30 GPs behind) idle=b10c/1/0x4000000000000000 softirq=164/164 fqs=6443
> > > > > > > [   72.058520]     (detected by 28, t=15005 jiffies, g=449, q=174 ncpus=32)
> > > > > > > [   72.064949] Task dump for CPU 22:
> > > > > > > [   72.068251] task:kworker/u64:5   state:R  running task     stack:0     pid:447   ppid:2      flags:0x0000000a
> > > > > > > [   72.078156] Workqueue: efi_rts_wq efi_call_rts
> > > > > > > [   72.082595] Call trace:
> > > > > > > [   72.085029]  __switch_to+0xbc/0x100
> > > > > > > [   72.088508]  0xffff80000fe83d4c
> > > > > > >
> > > > > > > After that, as a consequence, I start to get a lot of hung task timeout traces.
> > > > > > >
> > > > > > > I tried to bisect the problem and I found that the offending commit is
> > > > > > > this one:
> > > > > > >
> > > > > > >  e7b813b32a42 ("efi: random: refresh non-volatile random seed when RNG is initialized")
> > > > > > >
> > > > > > > I've reverted this commit for now and everything works just fine, but I
> > > > > > > was wondering if the problem could be caused by a lack of entropy on
> > > > > > > these arm64 boxes or something else.
> > > > > > >
> > > > > > > Any suggestion? Let me know if you want me to do any specific test.
> > > > > > >
> > > > > >
> > > > > > Thanks for the report.
> > > > > >
> > > > > > This is most likely the EFI SetVariable() call going off into the
> > > > > > weeds and never returning.
> > > > > >
> > > > > > Is this an Ampere Altra system by any chance? Do you see it on
> > > > > > different types of hardware?
> > > > >
> > > > > This is: Ampere eMAG / Lenovo ThinkSystem HR330a.
> > > > >
> > > > > >
> > > > > > Could you check whether SetVariable works on this system? E.g. by
> > > > > > updating the EFI boot timeout (sudo efibootmgr -t <n>)?
> > > > >
> > > > > ubuntu@kuzzle:~$ sudo efibootmgr -t 10
> > > > > ^C^C^C^C
> > > > >
> > > > > ^ Stuck there, so it really looks like SetVariable is the problem.
> > > > >
> > > >
> > > > Could you please share the output of
> > > >
> > > > dmidecode -s bios
> > > > dmidecode -s system-family
> > >
> > > $ sudo dmidecode -s bios-vendor
> > > LENOVO
> > > $ sudo dmidecode -s bios-version
> > > hve104r-1.15
> > > $ sudo dmidecode -s bios-release-date
> > > 02/26/2021
> > > $ sudo dmidecode -s bios-revision
> > > 1.15
> > > $ sudo dmidecode -s system-family
> > > Lenovo ThinkSystem HR330A/HR350A
> > >
> >
> > Thanks
> >
> > Mind checking if this patch fixes your issue as well?
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/commit/?h=altra-fix&id=77fa99dd4741456da85049c13ec31a148f5f5ac0
>
> Unfortunately this doesn't seem to be enough, I'm still getting the same
> problem also with this patch applied.
>

Thanks for trying.

How about the last 3 patches on this branch?

https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/log/?h=efi-smbios-altra-fix


* Re: kernel 6.2 stuck at boot (efi_call_rts) on arm64
  2023-03-16 12:21             ` Ard Biesheuvel
@ 2023-03-16 12:38               ` Ard Biesheuvel
  2023-03-16 12:41                 ` Andrea Righi
  0 siblings, 1 reply; 31+ messages in thread
From: Ard Biesheuvel @ 2023-03-16 12:38 UTC (permalink / raw)
  To: Andrea Righi
  Cc: Jason A. Donenfeld, Paolo Pisati, linux-efi, linux-kernel, Darren Hart

On Thu, 16 Mar 2023 at 13:21, Ard Biesheuvel <ardb@kernel.org> wrote:
>
> On Thu, 16 Mar 2023 at 12:34, Andrea Righi <andrea.righi@canonical.com> wrote:
> >
> > On Thu, Mar 16, 2023 at 11:18:21AM +0100, Ard Biesheuvel wrote:
> > > On Thu, 16 Mar 2023 at 11:03, Andrea Righi <andrea.righi@canonical.com> wrote:
> > > >
> > > > On Thu, Mar 16, 2023 at 10:55:58AM +0100, Ard Biesheuvel wrote:
> > > > > (cc Darren)
> > > > >
> > > > > On Thu, 16 Mar 2023 at 10:45, Andrea Righi <andrea.righi@canonical.com> wrote:
> > > > > >
> > > > > > On Thu, Mar 16, 2023 at 08:58:20AM +0100, Ard Biesheuvel wrote:
> > > > > > > Hello Andrea,
> > > > > > >
> > > > > > > On Thu, 16 Mar 2023 at 08:54, Andrea Righi <andrea.righi@canonical.com> wrote:
> > > > > > > >
> > > > > > > > Hello,
> > > > > > > >
> > > > > > > > the latest v6.2.6 kernel fails to boot on some arm64 systems, the kernel
> > > > > > > > gets stuck and never completes the boot. On the console I see this:
> > > > > > > >
> > > > > > > > [   72.043484] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> > > > > > > > [   72.049571] rcu:     22-...0: (30 GPs behind) idle=b10c/1/0x4000000000000000 softirq=164/164 fqs=6443
> > > > > > > > [   72.058520]     (detected by 28, t=15005 jiffies, g=449, q=174 ncpus=32)
> > > > > > > > [   72.064949] Task dump for CPU 22:
> > > > > > > > [   72.068251] task:kworker/u64:5   state:R  running task     stack:0     pid:447   ppid:2      flags:0x0000000a
> > > > > > > > [   72.078156] Workqueue: efi_rts_wq efi_call_rts
> > > > > > > > [   72.082595] Call trace:
> > > > > > > > [   72.085029]  __switch_to+0xbc/0x100
> > > > > > > > [   72.088508]  0xffff80000fe83d4c
> > > > > > > >
> > > > > > > > After that, as a consequence, I start to get a lot of hung task timeout traces.
> > > > > > > >
> > > > > > > > I tried to bisect the problem and I found that the offending commit is
> > > > > > > > this one:
> > > > > > > >
> > > > > > > >  e7b813b32a42 ("efi: random: refresh non-volatile random seed when RNG is initialized")
> > > > > > > >
> > > > > > > > I've reverted this commit for now and everything works just fine, but I
> > > > > > > > was wondering if the problem could be caused by a lack of entropy on
> > > > > > > > these arm64 boxes or something else.
> > > > > > > >
> > > > > > > > Any suggestion? Let me know if you want me to do any specific test.
> > > > > > > >
> > > > > > >
> > > > > > > Thanks for the report.
> > > > > > >
> > > > > > > This is most likely the EFI SetVariable() call going off into the
> > > > > > > weeds and never returning.
> > > > > > >
> > > > > > > Is this an Ampere Altra system by any chance? Do you see it on
> > > > > > > different types of hardware?
> > > > > >
> > > > > > This is: Ampere eMAG / Lenovo ThinkSystem HR330a.
> > > > > >
> > > > > > >
> > > > > > > Could you check whether SetVariable works on this system? E.g. by
> > > > > > > updating the EFI boot timeout (sudo efibootmgr -t <n>)?
> > > > > >
> > > > > > ubuntu@kuzzle:~$ sudo efibootmgr -t 10
> > > > > > ^C^C^C^C
> > > > > >
> > > > > > ^ Stuck there, so it really looks like SetVariable is the problem.
> > > > > >
> > > > >
> > > > > Could you please share the output of
> > > > >
> > > > > dmidecode -s bios
> > > > > dmidecode -s system-family
> > > >
> > > > $ sudo dmidecode -s bios-vendor
> > > > LENOVO
> > > > $ sudo dmidecode -s bios-version
> > > > hve104r-1.15
> > > > $ sudo dmidecode -s bios-release-date
> > > > 02/26/2021
> > > > $ sudo dmidecode -s bios-revision
> > > > 1.15
> > > > $ sudo dmidecode -s system-family
> > > > Lenovo ThinkSystem HR330A/HR350A
> > > >
> > >
> > > Thanks
> > >
> > > Mind checking if this patch fixes your issue as well?
> > >
> > > https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/commit/?h=altra-fix&id=77fa99dd4741456da85049c13ec31a148f5f5ac0
> >
> > Unfortunately this doesn't seem to be enough, I'm still getting the same
> > problem also with this patch applied.
> >
>
> Thanks for trying.
>
> How about the last 3 patches on this branch?
>
> https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/log/?h=efi-smbios-altra-fix

Actually, that may not match your hardware.

Does your kernel log have a line like

SMCCC: SOC_ID: ID = jep106:036b:0019 Revision = 0x00000102

?

* Re: kernel 6.2 stuck at boot (efi_call_rts) on arm64
  2023-03-16 12:38               ` Ard Biesheuvel
@ 2023-03-16 12:41                 ` Andrea Righi
  2023-03-16 12:43                   ` Ard Biesheuvel
  0 siblings, 1 reply; 31+ messages in thread
From: Andrea Righi @ 2023-03-16 12:41 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Jason A. Donenfeld, Paolo Pisati, linux-efi, linux-kernel, Darren Hart

On Thu, Mar 16, 2023 at 01:38:30PM +0100, Ard Biesheuvel wrote:
> On Thu, 16 Mar 2023 at 13:21, Ard Biesheuvel <ardb@kernel.org> wrote:
> >
> > On Thu, 16 Mar 2023 at 12:34, Andrea Righi <andrea.righi@canonical.com> wrote:
> > >
> > > On Thu, Mar 16, 2023 at 11:18:21AM +0100, Ard Biesheuvel wrote:
> > > > On Thu, 16 Mar 2023 at 11:03, Andrea Righi <andrea.righi@canonical.com> wrote:
> > > > >
> > > > > On Thu, Mar 16, 2023 at 10:55:58AM +0100, Ard Biesheuvel wrote:
> > > > > > (cc Darren)
> > > > > >
> > > > > > On Thu, 16 Mar 2023 at 10:45, Andrea Righi <andrea.righi@canonical.com> wrote:
> > > > > > >
> > > > > > > On Thu, Mar 16, 2023 at 08:58:20AM +0100, Ard Biesheuvel wrote:
> > > > > > > > Hello Andrea,
> > > > > > > >
> > > > > > > > On Thu, 16 Mar 2023 at 08:54, Andrea Righi <andrea.righi@canonical.com> wrote:
> > > > > > > > >
> > > > > > > > > Hello,
> > > > > > > > >
> > > > > > > > > the latest v6.2.6 kernel fails to boot on some arm64 systems, the kernel
> > > > > > > > > gets stuck and never completes the boot. On the console I see this:
> > > > > > > > >
> > > > > > > > > [   72.043484] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> > > > > > > > > [   72.049571] rcu:     22-...0: (30 GPs behind) idle=b10c/1/0x4000000000000000 softirq=164/164 fqs=6443
> > > > > > > > > [   72.058520]     (detected by 28, t=15005 jiffies, g=449, q=174 ncpus=32)
> > > > > > > > > [   72.064949] Task dump for CPU 22:
> > > > > > > > > [   72.068251] task:kworker/u64:5   state:R  running task     stack:0     pid:447   ppid:2      flags:0x0000000a
> > > > > > > > > [   72.078156] Workqueue: efi_rts_wq efi_call_rts
> > > > > > > > > [   72.082595] Call trace:
> > > > > > > > > [   72.085029]  __switch_to+0xbc/0x100
> > > > > > > > > [   72.088508]  0xffff80000fe83d4c
> > > > > > > > >
> > > > > > > > > After that, as a consequence, I start to get a lot of hung task timeout traces.
> > > > > > > > >
> > > > > > > > > I tried to bisect the problem and I found that the offending commit is
> > > > > > > > > this one:
> > > > > > > > >
> > > > > > > > >  e7b813b32a42 ("efi: random: refresh non-volatile random seed when RNG is initialized")
> > > > > > > > >
> > > > > > > > > I've reverted this commit for now and everything works just fine, but I
> > > > > > > > > was wondering if the problem could be caused by a lack of entropy on
> > > > > > > > > these arm64 boxes or something else.
> > > > > > > > >
> > > > > > > > > Any suggestion? Let me know if you want me to do any specific test.
> > > > > > > > >
> > > > > > > >
> > > > > > > > Thanks for the report.
> > > > > > > >
> > > > > > > > This is most likely the EFI SetVariable() call going off into the
> > > > > > > > weeds and never returning.
> > > > > > > >
> > > > > > > > Is this an Ampere Altra system by any chance? Do you see it on
> > > > > > > > different types of hardware?
> > > > > > >
> > > > > > > This is: Ampere eMAG / Lenovo ThinkSystem HR330a.
> > > > > > >
> > > > > > > >
> > > > > > > > Could you check whether SetVariable works on this system? E.g. by
> > > > > > > > updating the EFI boot timeout (sudo efibootmgr -t <n>)?
> > > > > > >
> > > > > > > ubuntu@kuzzle:~$ sudo efibootmgr -t 10
> > > > > > > ^C^C^C^C
> > > > > > >
> > > > > > > ^ Stuck there, so it really looks like SetVariable is the problem.
> > > > > > >
> > > > > >
> > > > > > Could you please share the output of
> > > > > >
> > > > > > dmidecode -s bios
> > > > > > dmidecode -s system-family
> > > > >
> > > > > $ sudo dmidecode -s bios-vendor
> > > > > LENOVO
> > > > > $ sudo dmidecode -s bios-version
> > > > > hve104r-1.15
> > > > > $ sudo dmidecode -s bios-release-date
> > > > > 02/26/2021
> > > > > $ sudo dmidecode -s bios-revision
> > > > > 1.15
> > > > > $ sudo dmidecode -s system-family
> > > > > Lenovo ThinkSystem HR330A/HR350A
> > > > >
> > > >
> > > > Thanks
> > > >
> > > > Mind checking if this patch fixes your issue as well?
> > > >
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/commit/?h=altra-fix&id=77fa99dd4741456da85049c13ec31a148f5f5ac0
> > >
> > > Unfortunately this doesn't seem to be enough, I'm still getting the same
> > > problem also with this patch applied.
> > >
> >
> > Thanks for trying.
> >
> > How about the last 3 patches on this branch?
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/log/?h=efi-smbios-altra-fix
> 
> Actually, that may not match your hardware.
> 
> Does your kernel log have a line like
> 
> SMCCC: SOC_ID: ID = jep106:036b:0019 Revision = 0x00000102
> 
> ?

$ sudo dmesg | grep "SMCCC: SOC_ID"
[    5.320782] SMCCC: SOC_ID: ARCH_SOC_ID not implemented, skipping ....
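
For anyone who wants to run the same check on several machines, below is a
minimal sketch in Python (not part of any patch discussed here) that
classifies the SOC_ID line. It assumes only the two message shapes quoted
in this thread; the name parse_soc_id is made up for illustration.

```python
import re

# Matches the shape of the example line quoted in this thread:
#   SMCCC: SOC_ID: ID = jep106:036b:0019 Revision = 0x00000102
_SOC_ID_RE = re.compile(
    r"SMCCC: SOC_ID: ID = jep106:([0-9a-fA-F]{4}):([0-9a-fA-F]{4})"
    r" Revision = 0x([0-9a-fA-F]+)"
)


def parse_soc_id(dmesg_text):
    """Return (jep106_code, soc_id, revision) as ints, or None if no ID line exists."""
    match = _SOC_ID_RE.search(dmesg_text)
    if match is None:
        # Covers the "ARCH_SOC_ID not implemented" case seen on this box.
        return None
    return tuple(int(group, 16) for group in match.groups())
```

Fed the dmesg output from this machine it returns None, because the
firmware reports ARCH_SOC_ID as not implemented.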

-Andrea

* Re: kernel 6.2 stuck at boot (efi_call_rts) on arm64
  2023-03-16 12:41                 ` Andrea Righi
@ 2023-03-16 12:43                   ` Ard Biesheuvel
  2023-03-16 12:49                     ` Andrea Righi
  0 siblings, 1 reply; 31+ messages in thread
From: Ard Biesheuvel @ 2023-03-16 12:43 UTC (permalink / raw)
  To: Andrea Righi
  Cc: Jason A. Donenfeld, Paolo Pisati, linux-efi, linux-kernel, Darren Hart

On Thu, 16 Mar 2023 at 13:41, Andrea Righi <andrea.righi@canonical.com> wrote:
>
> On Thu, Mar 16, 2023 at 01:38:30PM +0100, Ard Biesheuvel wrote:
> > On Thu, 16 Mar 2023 at 13:21, Ard Biesheuvel <ardb@kernel.org> wrote:
> > >
> > > On Thu, 16 Mar 2023 at 12:34, Andrea Righi <andrea.righi@canonical.com> wrote:
> > > >
> > > > On Thu, Mar 16, 2023 at 11:18:21AM +0100, Ard Biesheuvel wrote:
> > > > > On Thu, 16 Mar 2023 at 11:03, Andrea Righi <andrea.righi@canonical.com> wrote:
> > > > > >
> > > > > > On Thu, Mar 16, 2023 at 10:55:58AM +0100, Ard Biesheuvel wrote:
> > > > > > > (cc Darren)
> > > > > > >
> > > > > > > On Thu, 16 Mar 2023 at 10:45, Andrea Righi <andrea.righi@canonical.com> wrote:
> > > > > > > >
> > > > > > > > On Thu, Mar 16, 2023 at 08:58:20AM +0100, Ard Biesheuvel wrote:
> > > > > > > > > Hello Andrea,
> > > > > > > > >
> > > > > > > > > On Thu, 16 Mar 2023 at 08:54, Andrea Righi <andrea.righi@canonical.com> wrote:
> > > > > > > > > >
> > > > > > > > > > Hello,
> > > > > > > > > >
> > > > > > > > > > the latest v6.2.6 kernel fails to boot on some arm64 systems, the kernel
> > > > > > > > > > gets stuck and never completes the boot. On the console I see this:
> > > > > > > > > >
> > > > > > > > > > [   72.043484] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> > > > > > > > > > [   72.049571] rcu:     22-...0: (30 GPs behind) idle=b10c/1/0x4000000000000000 softirq=164/164 fqs=6443
> > > > > > > > > > [   72.058520]     (detected by 28, t=15005 jiffies, g=449, q=174 ncpus=32)
> > > > > > > > > > [   72.064949] Task dump for CPU 22:
> > > > > > > > > > [   72.068251] task:kworker/u64:5   state:R  running task     stack:0     pid:447   ppid:2      flags:0x0000000a
> > > > > > > > > > [   72.078156] Workqueue: efi_rts_wq efi_call_rts
> > > > > > > > > > [   72.082595] Call trace:
> > > > > > > > > > [   72.085029]  __switch_to+0xbc/0x100
> > > > > > > > > > [   72.088508]  0xffff80000fe83d4c
> > > > > > > > > >
> > > > > > > > > > After that, as a consequence, I start to get a lot of hung task timeout traces.
> > > > > > > > > >
> > > > > > > > > > I tried to bisect the problem and I found that the offending commit is
> > > > > > > > > > this one:
> > > > > > > > > >
> > > > > > > > > >  e7b813b32a42 ("efi: random: refresh non-volatile random seed when RNG is initialized")
> > > > > > > > > >
> > > > > > > > > > I've reverted this commit for now and everything works just fine, but I
> > > > > > > > > > was wondering if the problem could be caused by a lack of entropy on
> > > > > > > > > > these arm64 boxes or something else.
> > > > > > > > > >
> > > > > > > > > > Any suggestion? Let me know if you want me to do any specific test.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > Thanks for the report.
> > > > > > > > >
> > > > > > > > > This is most likely the EFI SetVariable() call going off into the
> > > > > > > > > weeds and never returning.
> > > > > > > > >
> > > > > > > > > Is this an Ampere Altra system by any chance? Do you see it on
> > > > > > > > > different types of hardware?
> > > > > > > >
> > > > > > > > This is: Ampere eMAG / Lenovo ThinkSystem HR330a.
> > > > > > > >
> > > > > > > > >
> > > > > > > > > Could you check whether SetVariable works on this system? E.g. by
> > > > > > > > > updating the EFI boot timeout (sudo efibootmgr -t <n>)?
> > > > > > > >
> > > > > > > > ubuntu@kuzzle:~$ sudo efibootmgr -t 10
> > > > > > > > ^C^C^C^C
> > > > > > > >
> > > > > > > > ^ Stuck there, so it really looks like SetVariable is the problem.
> > > > > > > >
> > > > > > >
> > > > > > > Could you please share the output of
> > > > > > >
> > > > > > > dmidecode -s bios
> > > > > > > dmidecode -s system-family
> > > > > >
> > > > > > $ sudo dmidecode -s bios-vendor
> > > > > > LENOVO
> > > > > > $ sudo dmidecode -s bios-version
> > > > > > hve104r-1.15
> > > > > > $ sudo dmidecode -s bios-release-date
> > > > > > 02/26/2021
> > > > > > $ sudo dmidecode -s bios-revision
> > > > > > 1.15
> > > > > > $ sudo dmidecode -s system-family
> > > > > > Lenovo ThinkSystem HR330A/HR350A
> > > > > >
> > > > >
> > > > > Thanks
> > > > >
> > > > > Mind checking if this patch fixes your issue as well?
> > > > >
> > > > > https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/commit/?h=altra-fix&id=77fa99dd4741456da85049c13ec31a148f5f5ac0
> > > >
> > > > Unfortunately this doesn't seem to be enough, I'm still getting the same
> > > > problem also with this patch applied.
> > > >
> > >
> > > Thanks for trying.
> > >
> > > How about the last 3 patches on this branch?
> > >
> > > https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/log/?h=efi-smbios-altra-fix
> >
> > Actually, that may not match your hardware.
> >
> > Does your kernel log have a line like
> >
> > SMCCC: SOC_ID: ID = jep106:036b:0019 Revision = 0x00000102
> >
> > ?
>
> $ sudo dmesg | grep "SMCCC: SOC_ID"
> [    5.320782] SMCCC: SOC_ID: ARCH_SOC_ID not implemented, skipping ....
>

Thanks. Could you share the entire dmidecode output somewhere? Or at
least the type 4 record(s)?

* Re: kernel 6.2 stuck at boot (efi_call_rts) on arm64
  2023-03-16 12:43                   ` Ard Biesheuvel
@ 2023-03-16 12:49                     ` Andrea Righi
  2023-03-16 13:45                       ` Ard Biesheuvel
  0 siblings, 1 reply; 31+ messages in thread
From: Andrea Righi @ 2023-03-16 12:49 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Jason A. Donenfeld, Paolo Pisati, linux-efi, linux-kernel, Darren Hart

On Thu, Mar 16, 2023 at 01:43:32PM +0100, Ard Biesheuvel wrote:
> On Thu, 16 Mar 2023 at 13:41, Andrea Righi <andrea.righi@canonical.com> wrote:
> >
> > On Thu, Mar 16, 2023 at 01:38:30PM +0100, Ard Biesheuvel wrote:
> > > On Thu, 16 Mar 2023 at 13:21, Ard Biesheuvel <ardb@kernel.org> wrote:
> > > >
> > > > On Thu, 16 Mar 2023 at 12:34, Andrea Righi <andrea.righi@canonical.com> wrote:
> > > > >
> > > > > On Thu, Mar 16, 2023 at 11:18:21AM +0100, Ard Biesheuvel wrote:
> > > > > > On Thu, 16 Mar 2023 at 11:03, Andrea Righi <andrea.righi@canonical.com> wrote:
> > > > > > >
> > > > > > > On Thu, Mar 16, 2023 at 10:55:58AM +0100, Ard Biesheuvel wrote:
> > > > > > > > (cc Darren)
> > > > > > > >
> > > > > > > > On Thu, 16 Mar 2023 at 10:45, Andrea Righi <andrea.righi@canonical.com> wrote:
> > > > > > > > >
> > > > > > > > > On Thu, Mar 16, 2023 at 08:58:20AM +0100, Ard Biesheuvel wrote:
> > > > > > > > > > Hello Andrea,
> > > > > > > > > >
> > > > > > > > > > On Thu, 16 Mar 2023 at 08:54, Andrea Righi <andrea.righi@canonical.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > Hello,
> > > > > > > > > > >
> > > > > > > > > > > the latest v6.2.6 kernel fails to boot on some arm64 systems, the kernel
> > > > > > > > > > > gets stuck and never completes the boot. On the console I see this:
> > > > > > > > > > >
> > > > > > > > > > > [   72.043484] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> > > > > > > > > > > [   72.049571] rcu:     22-...0: (30 GPs behind) idle=b10c/1/0x4000000000000000 softirq=164/164 fqs=6443
> > > > > > > > > > > [   72.058520]     (detected by 28, t=15005 jiffies, g=449, q=174 ncpus=32)
> > > > > > > > > > > [   72.064949] Task dump for CPU 22:
> > > > > > > > > > > [   72.068251] task:kworker/u64:5   state:R  running task     stack:0     pid:447   ppid:2      flags:0x0000000a
> > > > > > > > > > > [   72.078156] Workqueue: efi_rts_wq efi_call_rts
> > > > > > > > > > > [   72.082595] Call trace:
> > > > > > > > > > > [   72.085029]  __switch_to+0xbc/0x100
> > > > > > > > > > > [   72.088508]  0xffff80000fe83d4c
> > > > > > > > > > >
> > > > > > > > > > > After that, as a consequence, I start to get a lot of hung task timeout traces.
> > > > > > > > > > >
> > > > > > > > > > > I tried to bisect the problem and I found that the offending commit is
> > > > > > > > > > > this one:
> > > > > > > > > > >
> > > > > > > > > > >  e7b813b32a42 ("efi: random: refresh non-volatile random seed when RNG is initialized")
> > > > > > > > > > >
> > > > > > > > > > > I've reverted this commit for now and everything works just fine, but I
> > > > > > > > > > > was wondering if the problem could be caused by a lack of entropy on
> > > > > > > > > > > these arm64 boxes or something else.
> > > > > > > > > > >
> > > > > > > > > > > Any suggestion? Let me know if you want me to do any specific test.
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Thanks for the report.
> > > > > > > > > >
> > > > > > > > > > This is most likely the EFI SetVariable() call going off into the
> > > > > > > > > > weeds and never returning.
> > > > > > > > > >
> > > > > > > > > > Is this an Ampere Altra system by any chance? Do you see it on
> > > > > > > > > > different types of hardware?
> > > > > > > > >
> > > > > > > > > This is: Ampere eMAG / Lenovo ThinkSystem HR330a.
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Could you check whether SetVariable works on this system? E.g. by
> > > > > > > > > > updating the EFI boot timeout (sudo efibootmgr -t <n>)?
> > > > > > > > >
> > > > > > > > > ubuntu@kuzzle:~$ sudo efibootmgr -t 10
> > > > > > > > > ^C^C^C^C
> > > > > > > > >
> > > > > > > > > ^ Stuck there, so it really looks like SetVariable is the problem.
> > > > > > > > >
> > > > > > > >
> > > > > > > > Could you please share the output of
> > > > > > > >
> > > > > > > > dmidecode -s bios
> > > > > > > > dmidecode -s system-family
> > > > > > >
> > > > > > > $ sudo dmidecode -s bios-vendor
> > > > > > > LENOVO
> > > > > > > $ sudo dmidecode -s bios-version
> > > > > > > hve104r-1.15
> > > > > > > $ sudo dmidecode -s bios-release-date
> > > > > > > 02/26/2021
> > > > > > > $ sudo dmidecode -s bios-revision
> > > > > > > 1.15
> > > > > > > $ sudo dmidecode -s system-family
> > > > > > > Lenovo ThinkSystem HR330A/HR350A
> > > > > > >
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > > Mind checking if this patch fixes your issue as well?
> > > > > >
> > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/commit/?h=altra-fix&id=77fa99dd4741456da85049c13ec31a148f5f5ac0
> > > > >
> > > > > Unfortunately this doesn't seem to be enough, I'm still getting the same
> > > > > problem also with this patch applied.
> > > > >
> > > >
> > > > Thanks for trying.
> > > >
> > > > How about the last 3 patches on this branch?
> > > >
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/log/?h=efi-smbios-altra-fix
> > >
> > > Actually, that may not match your hardware.
> > >
> > > Does your kernel log have a line like
> > >
> > > SMCCC: SOC_ID: ID = jep106:036b:0019 Revision = 0x00000102
> > >
> > > ?
> >
> > $ sudo dmesg | grep "SMCCC: SOC_ID"
> > [    5.320782] SMCCC: SOC_ID: ARCH_SOC_ID not implemented, skipping ....
> >
> 
> Thanks. Could you share the entire dmidecode output somewhere? Or at
> least the type 4 record(s)?

Sure, here's the full output of dmidecode:
https://pastebin.ubuntu.com/p/4ZmKmP2xTm/

-Andrea

* Re: kernel 6.2 stuck at boot (efi_call_rts) on arm64
  2023-03-16 12:49                     ` Andrea Righi
@ 2023-03-16 13:45                       ` Ard Biesheuvel
  2023-03-16 13:46                         ` Ard Biesheuvel
  2023-03-16 13:50                         ` Andrea Righi
  0 siblings, 2 replies; 31+ messages in thread
From: Ard Biesheuvel @ 2023-03-16 13:45 UTC (permalink / raw)
  To: Andrea Righi
  Cc: Jason A. Donenfeld, Paolo Pisati, linux-efi, linux-kernel, Darren Hart

On Thu, 16 Mar 2023 at 13:50, Andrea Righi <andrea.righi@canonical.com> wrote:
>
> On Thu, Mar 16, 2023 at 01:43:32PM +0100, Ard Biesheuvel wrote:
> > On Thu, 16 Mar 2023 at 13:41, Andrea Righi <andrea.righi@canonical.com> wrote:
> > >
> > > On Thu, Mar 16, 2023 at 01:38:30PM +0100, Ard Biesheuvel wrote:
> > > > On Thu, 16 Mar 2023 at 13:21, Ard Biesheuvel <ardb@kernel.org> wrote:
> > > > >
> > > > > On Thu, 16 Mar 2023 at 12:34, Andrea Righi <andrea.righi@canonical.com> wrote:
> > > > > >
> > > > > > On Thu, Mar 16, 2023 at 11:18:21AM +0100, Ard Biesheuvel wrote:
> > > > > > > On Thu, 16 Mar 2023 at 11:03, Andrea Righi <andrea.righi@canonical.com> wrote:
> > > > > > > >
> > > > > > > > On Thu, Mar 16, 2023 at 10:55:58AM +0100, Ard Biesheuvel wrote:
> > > > > > > > > (cc Darren)
> > > > > > > > >
> > > > > > > > > On Thu, 16 Mar 2023 at 10:45, Andrea Righi <andrea.righi@canonical.com> wrote:
> > > > > > > > > >
> > > > > > > > > > On Thu, Mar 16, 2023 at 08:58:20AM +0100, Ard Biesheuvel wrote:
> > > > > > > > > > > Hello Andrea,
> > > > > > > > > > >
> > > > > > > > > > > On Thu, 16 Mar 2023 at 08:54, Andrea Righi <andrea.righi@canonical.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > Hello,
> > > > > > > > > > > >
> > > > > > > > > > > > the latest v6.2.6 kernel fails to boot on some arm64 systems, the kernel
> > > > > > > > > > > > gets stuck and never completes the boot. On the console I see this:
> > > > > > > > > > > >
> > > > > > > > > > > > [   72.043484] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> > > > > > > > > > > > [   72.049571] rcu:     22-...0: (30 GPs behind) idle=b10c/1/0x4000000000000000 softirq=164/164 fqs=6443
> > > > > > > > > > > > [   72.058520]     (detected by 28, t=15005 jiffies, g=449, q=174 ncpus=32)
> > > > > > > > > > > > [   72.064949] Task dump for CPU 22:
> > > > > > > > > > > > [   72.068251] task:kworker/u64:5   state:R  running task     stack:0     pid:447   ppid:2      flags:0x0000000a
> > > > > > > > > > > > [   72.078156] Workqueue: efi_rts_wq efi_call_rts
> > > > > > > > > > > > [   72.082595] Call trace:
> > > > > > > > > > > > [   72.085029]  __switch_to+0xbc/0x100
> > > > > > > > > > > > [   72.088508]  0xffff80000fe83d4c
> > > > > > > > > > > >
> > > > > > > > > > > > After that, as a consequence, I start to get a lot of hung task timeout traces.
> > > > > > > > > > > >
> > > > > > > > > > > > I tried to bisect the problem and I found that the offending commit is
> > > > > > > > > > > > this one:
> > > > > > > > > > > >
> > > > > > > > > > > >  e7b813b32a42 ("efi: random: refresh non-volatile random seed when RNG is initialized")
> > > > > > > > > > > >
> > > > > > > > > > > > I've reverted this commit for now and everything works just fine, but I
> > > > > > > > > > > > was wondering if the problem could be caused by a lack of entropy on
> > > > > > > > > > > > these arm64 boxes or something else.
> > > > > > > > > > > >
> > > > > > > > > > > > Any suggestion? Let me know if you want me to do any specific test.
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Thanks for the report.
> > > > > > > > > > >
> > > > > > > > > > > This is most likely the EFI SetVariable() call going off into the
> > > > > > > > > > > weeds and never returning.
> > > > > > > > > > >
> > > > > > > > > > > Is this an Ampere Altra system by any chance? Do you see it on
> > > > > > > > > > > different types of hardware?
> > > > > > > > > >
> > > > > > > > > > This is: Ampere eMAG / Lenovo ThinkSystem HR330a.
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Could you check whether SetVariable works on this system? E.g. by
> > > > > > > > > > > updating the EFI boot timeout (sudo efibootmgr -t <n>)?
> > > > > > > > > >
> > > > > > > > > > ubuntu@kuzzle:~$ sudo efibootmgr -t 10
> > > > > > > > > > ^C^C^C^C
> > > > > > > > > >
> > > > > > > > > > ^ Stuck there, so it really looks like SetVariable is the problem.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > Could you please share the output of
> > > > > > > > >
> > > > > > > > > dmidecode -s bios
> > > > > > > > > dmidecode -s system-family
> > > > > > > >
> > > > > > > > $ sudo dmidecode -s bios-vendor
> > > > > > > > LENOVO
> > > > > > > > $ sudo dmidecode -s bios-version
> > > > > > > > hve104r-1.15
> > > > > > > > $ sudo dmidecode -s bios-release-date
> > > > > > > > 02/26/2021
> > > > > > > > $ sudo dmidecode -s bios-revision
> > > > > > > > 1.15
> > > > > > > > $ sudo dmidecode -s system-family
> > > > > > > > Lenovo ThinkSystem HR330A/HR350A
> > > > > > > >
> > > > > > >
> > > > > > > Thanks
> > > > > > >
> > > > > > > Mind checking if this patch fixes your issue as well?
> > > > > > >
> > > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/commit/?h=altra-fix&id=77fa99dd4741456da85049c13ec31a148f5f5ac0
> > > > > >
> > > > > > Unfortunately this doesn't seem to be enough, I'm still getting the same
> > > > > > problem also with this patch applied.
> > > > > >
> > > > >
> > > > > Thanks for trying.
> > > > >
> > > > > How about the last 3 patches on this branch?
> > > > >
> > > > > https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/log/?h=efi-smbios-altra-fix
> > > >
> > > > Actually, that may not match your hardware.
> > > >
> > > > Does your kernel log have a line like
> > > >
> > > > SMCCC: SOC_ID: ID = jep106:036b:0019 Revision = 0x00000102
> > > >
> > > > ?
> > >
> > > $ sudo dmesg | grep "SMCCC: SOC_ID"
> > > [    5.320782] SMCCC: SOC_ID: ARCH_SOC_ID not implemented, skipping ....
> > >
> >
> > Thanks. Could you share the entire dmidecode output somewhere? Or at
> > least the type 4 record(s)?
>
> Sure, here's the full output of dmidecode:
> https://pastebin.ubuntu.com/p/4ZmKmP2xTm/
>

Thanks. I have updated my SMBIOS patches to take the processor version
'eMAG' into account, which appears to be what these boxes are using.

I have updated the efi/urgent branch here with the latest versions.
Mind giving them a spin?


In the meantime, just for the record - could you please run this as well?

hexdump -C /sys/firmware/dmi/entries/4-0/raw

(as root)

There seem to be eMAG boxes that put the type 4 ID in the wrong word
order, so I'd like to make sure we have a record of the binary
representation.
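
To make the word-order question concrete, here is a rough sketch in Python
of how the raw record could be decoded. It is an illustration only, not the
kernel code: the offsets (Processor ID at 0x08, Processor Version string
index at 0x10) are taken from the SMBIOS type 4 layout as I understand it,
the sysfs raw file is assumed to include the string table, and the names
parse_type4 and read_type4 are made up.

```python
import struct


def parse_type4(raw):
    """Decode Processor ID and Processor Version from a raw SMBIOS type 4 record."""
    if len(raw) < 2 or raw[0] != 4:
        raise ValueError("not an SMBIOS type 4 record")
    length = raw[1]  # size of the formatted area, header included
    if length < 0x11 or len(raw) < length:
        raise ValueError("type 4 record is truncated")
    # Processor ID: 8 bytes at offset 0x08, read here as two little-endian
    # 32-bit words so that both word orders can be compared.
    first, second = struct.unpack_from("<II", raw, 0x08)
    # The string table follows the formatted area; string indices are 1-based.
    strings = raw[length:].split(b"\x00")
    index = raw[0x10]  # Processor Version string index
    version = ""
    if 0 < index <= len(strings):
        version = strings[index - 1].decode("ascii", "replace")
    return {
        "id_words": (first, second),
        "id_words_swapped": (second, first),
        "version": version,
    }


def read_type4(path="/sys/firmware/dmi/entries/4-0/raw"):
    """Read and decode the first type 4 record from sysfs (needs root)."""
    with open(path, "rb") as handle:
        return parse_type4(handle.read())
```

Calling read_type4() as root prints nothing by itself; the returned
dictionary holds the ID in both word orders plus the version string, which
is the part that should read 'eMAG' on these boxes.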

Thanks a lot for spending time on this.

* Re: kernel 6.2 stuck at boot (efi_call_rts) on arm64
  2023-03-16 13:45                       ` Ard Biesheuvel
@ 2023-03-16 13:46                         ` Ard Biesheuvel
  2023-03-16 13:50                         ` Andrea Righi
  1 sibling, 0 replies; 31+ messages in thread
From: Ard Biesheuvel @ 2023-03-16 13:46 UTC (permalink / raw)
  To: Andrea Righi
  Cc: Jason A. Donenfeld, Paolo Pisati, linux-efi, linux-kernel, Darren Hart

On Thu, 16 Mar 2023 at 14:45, Ard Biesheuvel <ardb@kernel.org> wrote:
>
> On Thu, 16 Mar 2023 at 13:50, Andrea Righi <andrea.righi@canonical.com> wrote:
> >
> > On Thu, Mar 16, 2023 at 01:43:32PM +0100, Ard Biesheuvel wrote:
> > > On Thu, 16 Mar 2023 at 13:41, Andrea Righi <andrea.righi@canonical.com> wrote:
> > > >
> > > > On Thu, Mar 16, 2023 at 01:38:30PM +0100, Ard Biesheuvel wrote:
> > > > > On Thu, 16 Mar 2023 at 13:21, Ard Biesheuvel <ardb@kernel.org> wrote:
> > > > > >
> > > > > > On Thu, 16 Mar 2023 at 12:34, Andrea Righi <andrea.righi@canonical.com> wrote:
> > > > > > >
> > > > > > > On Thu, Mar 16, 2023 at 11:18:21AM +0100, Ard Biesheuvel wrote:
> > > > > > > > On Thu, 16 Mar 2023 at 11:03, Andrea Righi <andrea.righi@canonical.com> wrote:
> > > > > > > > >
> > > > > > > > > On Thu, Mar 16, 2023 at 10:55:58AM +0100, Ard Biesheuvel wrote:
> > > > > > > > > > (cc Darren)
> > > > > > > > > >
> > > > > > > > > > On Thu, 16 Mar 2023 at 10:45, Andrea Righi <andrea.righi@canonical.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Thu, Mar 16, 2023 at 08:58:20AM +0100, Ard Biesheuvel wrote:
> > > > > > > > > > > > Hello Andrea,
> > > > > > > > > > > >
> > > > > > > > > > > > On Thu, 16 Mar 2023 at 08:54, Andrea Righi <andrea.righi@canonical.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > Hello,
> > > > > > > > > > > > >
> > > > > > > > > > > > > the latest v6.2.6 kernel fails to boot on some arm64 systems, the kernel
> > > > > > > > > > > > > gets stuck and never completes the boot. On the console I see this:
> > > > > > > > > > > > >
> > > > > > > > > > > > > [   72.043484] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> > > > > > > > > > > > > [   72.049571] rcu:     22-...0: (30 GPs behind) idle=b10c/1/0x4000000000000000 softirq=164/164 fqs=6443
> > > > > > > > > > > > > [   72.058520]     (detected by 28, t=15005 jiffies, g=449, q=174 ncpus=32)
> > > > > > > > > > > > > [   72.064949] Task dump for CPU 22:
> > > > > > > > > > > > > [   72.068251] task:kworker/u64:5   state:R  running task     stack:0     pid:447   ppid:2      flags:0x0000000a
> > > > > > > > > > > > > [   72.078156] Workqueue: efi_rts_wq efi_call_rts
> > > > > > > > > > > > > [   72.082595] Call trace:
> > > > > > > > > > > > > [   72.085029]  __switch_to+0xbc/0x100
> > > > > > > > > > > > > [   72.088508]  0xffff80000fe83d4c
> > > > > > > > > > > > >
> > > > > > > > > > > > > After that, as a consequence, I start to get a lot of hung task timeout traces.
> > > > > > > > > > > > >
> > > > > > > > > > > > > I tried to bisect the problem and I found that the offending commit is
> > > > > > > > > > > > > this one:
> > > > > > > > > > > > >
> > > > > > > > > > > > >  e7b813b32a42 ("efi: random: refresh non-volatile random seed when RNG is initialized")
> > > > > > > > > > > > >
> > > > > > > > > > > > > I've reverted this commit for now and everything works just fine, but I
> > > > > > > > > > > > > was wondering if the problem could be caused by a lack of entropy on
> > > > > > > > > > > > > these arm64 boxes or something else.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Any suggestion? Let me know if you want me to do any specific test.
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks for the report.
> > > > > > > > > > > >
> > > > > > > > > > > > This is most likely the EFI SetVariable() call going off into the
> > > > > > > > > > > > weeds and never returning.
> > > > > > > > > > > >
> > > > > > > > > > > > Is this an Ampere Altra system by any chance? Do you see it on
> > > > > > > > > > > > different types of hardware?
> > > > > > > > > > >
> > > > > > > > > > > This is: Ampere eMAG / Lenovo ThinkSystem HR330a.
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Could you check whether SetVariable works on this system? E.g. by
> > > > > > > > > > > > updating the EFI boot timeout (sudo efibootmgr -t <n>)?
> > > > > > > > > > >
> > > > > > > > > > > ubuntu@kuzzle:~$ sudo efibootmgr -t 10
> > > > > > > > > > > ^C^C^C^C
> > > > > > > > > > >
> > > > > > > > > > > ^ Stuck there, so it really looks like SetVariable is the problem.
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Could you please share the output of
> > > > > > > > > >
> > > > > > > > > > dmidecode -s bios
> > > > > > > > > > dmidecode -s system-family
> > > > > > > > >
> > > > > > > > > $ sudo dmidecode -s bios-vendor
> > > > > > > > > LENOVO
> > > > > > > > > $ sudo dmidecode -s bios-version
> > > > > > > > > hve104r-1.15
> > > > > > > > > $ sudo dmidecode -s bios-release-date
> > > > > > > > > 02/26/2021
> > > > > > > > > $ sudo dmidecode -s bios-revision
> > > > > > > > > 1.15
> > > > > > > > > $ sudo dmidecode -s system-family
> > > > > > > > > Lenovo ThinkSystem HR330A/HR350A
> > > > > > > > >
> > > > > > > >
> > > > > > > > Thanks
> > > > > > > >
> > > > > > > > Mind checking if this patch fixes your issue as well?
> > > > > > > >
> > > > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/commit/?h=altra-fix&id=77fa99dd4741456da85049c13ec31a148f5f5ac0
> > > > > > >
> > > > > > > Unfortunately this doesn't seem to be enough, I'm still getting the same
> > > > > > > problem also with this patch applied.
> > > > > > >
> > > > > >
> > > > > > Thanks for trying.
> > > > > >
> > > > > > How about the last 3 patches on this branch?
> > > > > >
> > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/log/?h=efi-smbios-altra-fix
> > > > >
> > > > > Actually, that may not match your hardware.
> > > > >
> > > > > Does your kernel log have a line like
> > > > >
> > > > > SMCCC: SOC_ID: ID = jep106:036b:0019 Revision = 0x00000102
> > > > >
> > > > > ?
> > > >
> > > > $ sudo dmesg | grep "SMCCC: SOC_ID"
> > > > [    5.320782] SMCCC: SOC_ID: ARCH_SOC_ID not implemented, skipping ....
> > > >
> > >
> > > Thanks. Could you share the entire dmidecode output somewhere? Or at
> > > least the type 4 record(s)?
> >
> > Sure, here's the full output of dmidecode:
> > https://pastebin.ubuntu.com/p/4ZmKmP2xTm/
> >
>
> Thanks. I have updated my SMBIOS patches to take the processor version
> 'eMAG' into account, which appears to be what these boxes are using.
>
> I have updated the efi/urgent branch here with the latest versions.
> Mind giving them a spin?
>

https://git.kernel.org/pub/scm/linux/kernel/git/efi/efi.git/log/?h=urgent

* Re: kernel 6.2 stuck at boot (efi_call_rts) on arm64
  2023-03-16 13:45                       ` Ard Biesheuvel
  2023-03-16 13:46                         ` Ard Biesheuvel
@ 2023-03-16 13:50                         ` Andrea Righi
  2023-03-16 13:53                           ` Ard Biesheuvel
  1 sibling, 1 reply; 31+ messages in thread
From: Andrea Righi @ 2023-03-16 13:50 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Jason A. Donenfeld, Paolo Pisati, linux-efi, linux-kernel, Darren Hart

On Thu, Mar 16, 2023 at 02:45:49PM +0100, Ard Biesheuvel wrote:
> On Thu, 16 Mar 2023 at 13:50, Andrea Righi <andrea.righi@canonical.com> wrote:
> >
> > On Thu, Mar 16, 2023 at 01:43:32PM +0100, Ard Biesheuvel wrote:
> > > On Thu, 16 Mar 2023 at 13:41, Andrea Righi <andrea.righi@canonical.com> wrote:
> > > >
> > > > On Thu, Mar 16, 2023 at 01:38:30PM +0100, Ard Biesheuvel wrote:
> > > > > On Thu, 16 Mar 2023 at 13:21, Ard Biesheuvel <ardb@kernel.org> wrote:
> > > > > >
> > > > > > On Thu, 16 Mar 2023 at 12:34, Andrea Righi <andrea.righi@canonical.com> wrote:
> > > > > > >
> > > > > > > On Thu, Mar 16, 2023 at 11:18:21AM +0100, Ard Biesheuvel wrote:
> > > > > > > > On Thu, 16 Mar 2023 at 11:03, Andrea Righi <andrea.righi@canonical.com> wrote:
> > > > > > > > >
> > > > > > > > > On Thu, Mar 16, 2023 at 10:55:58AM +0100, Ard Biesheuvel wrote:
> > > > > > > > > > (cc Darren)
> > > > > > > > > >
> > > > > > > > > > On Thu, 16 Mar 2023 at 10:45, Andrea Righi <andrea.righi@canonical.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Thu, Mar 16, 2023 at 08:58:20AM +0100, Ard Biesheuvel wrote:
> > > > > > > > > > > > Hello Andrea,
> > > > > > > > > > > >
> > > > > > > > > > > > On Thu, 16 Mar 2023 at 08:54, Andrea Righi <andrea.righi@canonical.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > Hello,
> > > > > > > > > > > > >
> > > > > > > > > > > > > the latest v6.2.6 kernel fails to boot on some arm64 systems, the kernel
> > > > > > > > > > > > > gets stuck and never completes the boot. On the console I see this:
> > > > > > > > > > > > >
> > > > > > > > > > > > > [   72.043484] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> > > > > > > > > > > > > [   72.049571] rcu:     22-...0: (30 GPs behind) idle=b10c/1/0x4000000000000000 softirq=164/164 fqs=6443
> > > > > > > > > > > > > [   72.058520]     (detected by 28, t=15005 jiffies, g=449, q=174 ncpus=32)
> > > > > > > > > > > > > [   72.064949] Task dump for CPU 22:
> > > > > > > > > > > > > [   72.068251] task:kworker/u64:5   state:R  running task     stack:0     pid:447   ppid:2      flags:0x0000000a
> > > > > > > > > > > > > [   72.078156] Workqueue: efi_rts_wq efi_call_rts
> > > > > > > > > > > > > [   72.082595] Call trace:
> > > > > > > > > > > > > [   72.085029]  __switch_to+0xbc/0x100
> > > > > > > > > > > > > [   72.088508]  0xffff80000fe83d4c
> > > > > > > > > > > > >
> > > > > > > > > > > > > After that, as a consequence, I start to get a lot of hung task timeout traces.
> > > > > > > > > > > > >
> > > > > > > > > > > > > I tried to bisect the problem and I found that the offending commit is
> > > > > > > > > > > > > this one:
> > > > > > > > > > > > >
> > > > > > > > > > > > >  e7b813b32a42 ("efi: random: refresh non-volatile random seed when RNG is initialized")
> > > > > > > > > > > > >
> > > > > > > > > > > > > I've reverted this commit for now and everything works just fine, but I
> > > > > > > > > > > > > was wondering if the problem could be caused by a lack of entropy on
> > > > > > > > > > > > > these arm64 boxes or something else.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Any suggestion? Let me know if you want me to do any specific test.
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks for the report.
> > > > > > > > > > > >
> > > > > > > > > > > > This is most likely the EFI SetVariable() call going off into the
> > > > > > > > > > > > weeds and never returning.
> > > > > > > > > > > >
> > > > > > > > > > > > Is this an Ampere Altra system by any chance? Do you see it on
> > > > > > > > > > > > different types of hardware?
> > > > > > > > > > >
> > > > > > > > > > > This is: Ampere eMAG / Lenovo ThinkSystem HR330a.
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Could you check whether SetVariable works on this system? E.g. by
> > > > > > > > > > > > updating the EFI boot timeout (sudo efibootmgr -t <n>)?
> > > > > > > > > > >
> > > > > > > > > > > ubuntu@kuzzle:~$ sudo efibootmgr -t 10
> > > > > > > > > > > ^C^C^C^C
> > > > > > > > > > >
> > > > > > > > > > > ^ Stuck there, so it really looks like SetVariable is the problem.
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Could you please share the output of
> > > > > > > > > >
> > > > > > > > > > dmidecode -s bios
> > > > > > > > > > dmidecode -s system-family
> > > > > > > > >
> > > > > > > > > $ sudo dmidecode -s bios-vendor
> > > > > > > > > LENOVO
> > > > > > > > > $ sudo dmidecode -s bios-version
> > > > > > > > > hve104r-1.15
> > > > > > > > > $ sudo dmidecode -s bios-release-date
> > > > > > > > > 02/26/2021
> > > > > > > > > $ sudo dmidecode -s bios-revision
> > > > > > > > > 1.15
> > > > > > > > > $ sudo dmidecode -s system-family
> > > > > > > > > Lenovo ThinkSystem HR330A/HR350A
> > > > > > > > >
> > > > > > > >
> > > > > > > > Thanks
> > > > > > > >
> > > > > > > > Mind checking if this patch fixes your issue as well?
> > > > > > > >
> > > > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/commit/?h=altra-fix&id=77fa99dd4741456da85049c13ec31a148f5f5ac0
> > > > > > >
> > > > > > > Unfortunately this doesn't seem to be enough, I'm still getting the same
> > > > > > > problem also with this patch applied.
> > > > > > >
> > > > > >
> > > > > > Thanks for trying.
> > > > > >
> > > > > > How about the last 3 patches on this branch?
> > > > > >
> > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/log/?h=efi-smbios-altra-fix
> > > > >
> > > > > Actually, that may not match your hardware.
> > > > >
> > > > > Does your kernel log have a line like
> > > > >
> > > > > SMCCC: SOC_ID: ID = jep106:036b:0019 Revision = 0x00000102
> > > > >
> > > > > ?
> > > >
> > > > $ sudo dmesg | grep "SMCCC: SOC_ID"
> > > > [    5.320782] SMCCC: SOC_ID: ARCH_SOC_ID not implemented, skipping ....
> > > >
> > >
> > > Thanks. Could you share the entire dmidecode output somewhere? Or at
> > > least the type 4 record(s)?
> >
> > Sure, here's the full output of dmidecode:
> > https://pastebin.ubuntu.com/p/4ZmKmP2xTm/
> >
> 
> Thanks. I have updated my SMBIOS patches to take the processor version
> 'eMAG' into account, which appears to be what these boxes are using.
> 
> I have updated the efi/urgent branch here with the latest versions.
> Mind giving them a spin?
> 
> 
> In the meantime, just for the record - could you please run this as well?
> 
> hexdump -C /sys/firmware/dmi/entries/4-0/raw
> 
> (as root)

hm.. I don't have that in /sys/firmware/, this is what I have:

# ls -l /sys/firmware/dmi/
total 0
drwxr-xr-x 2 root root 0 Mar 16 13:26 tables
# ls -l /sys/firmware/dmi/tables/
total 0
-r-------- 1 root root 5004 Mar 16 13:26 DMI
-r-------- 1 root root   24 Mar 16 13:26 smbios_entry_point

-Andrea

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: kernel 6.2 stuck at boot (efi_call_rts) on arm64
  2023-03-16 13:50                         ` Andrea Righi
@ 2023-03-16 13:53                           ` Ard Biesheuvel
  2023-03-16 13:59                             ` Andrea Righi
  0 siblings, 1 reply; 31+ messages in thread
From: Ard Biesheuvel @ 2023-03-16 13:53 UTC (permalink / raw)
  To: Andrea Righi
  Cc: Jason A. Donenfeld, Paolo Pisati, linux-efi, linux-kernel, Darren Hart

On Thu, 16 Mar 2023 at 14:50, Andrea Righi <andrea.righi@canonical.com> wrote:
>
> On Thu, Mar 16, 2023 at 02:45:49PM +0100, Ard Biesheuvel wrote:
> > On Thu, 16 Mar 2023 at 13:50, Andrea Righi <andrea.righi@canonical.com> wrote:
> > >
> > > On Thu, Mar 16, 2023 at 01:43:32PM +0100, Ard Biesheuvel wrote:
> > > > On Thu, 16 Mar 2023 at 13:41, Andrea Righi <andrea.righi@canonical.com> wrote:
> > > > >
> > > > > On Thu, Mar 16, 2023 at 01:38:30PM +0100, Ard Biesheuvel wrote:
> > > > > > On Thu, 16 Mar 2023 at 13:21, Ard Biesheuvel <ardb@kernel.org> wrote:
> > > > > > >
> > > > > > > On Thu, 16 Mar 2023 at 12:34, Andrea Righi <andrea.righi@canonical.com> wrote:
> > > > > > > >
> > > > > > > > On Thu, Mar 16, 2023 at 11:18:21AM +0100, Ard Biesheuvel wrote:
> > > > > > > > > On Thu, 16 Mar 2023 at 11:03, Andrea Righi <andrea.righi@canonical.com> wrote:
> > > > > > > > > >
> > > > > > > > > > On Thu, Mar 16, 2023 at 10:55:58AM +0100, Ard Biesheuvel wrote:
> > > > > > > > > > > (cc Darren)
> > > > > > > > > > >
> > > > > > > > > > > On Thu, 16 Mar 2023 at 10:45, Andrea Righi <andrea.righi@canonical.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > On Thu, Mar 16, 2023 at 08:58:20AM +0100, Ard Biesheuvel wrote:
> > > > > > > > > > > > > Hello Andrea,
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Thu, 16 Mar 2023 at 08:54, Andrea Righi <andrea.righi@canonical.com> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Hello,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > the latest v6.2.6 kernel fails to boot on some arm64 systems, the kernel
> > > > > > > > > > > > > > gets stuck and never completes the boot. On the console I see this:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > [   72.043484] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> > > > > > > > > > > > > > [   72.049571] rcu:     22-...0: (30 GPs behind) idle=b10c/1/0x4000000000000000 softirq=164/164 fqs=6443
> > > > > > > > > > > > > > [   72.058520]     (detected by 28, t=15005 jiffies, g=449, q=174 ncpus=32)
> > > > > > > > > > > > > > [   72.064949] Task dump for CPU 22:
> > > > > > > > > > > > > > [   72.068251] task:kworker/u64:5   state:R  running task     stack:0     pid:447   ppid:2      flags:0x0000000a
> > > > > > > > > > > > > > [   72.078156] Workqueue: efi_rts_wq efi_call_rts
> > > > > > > > > > > > > > [   72.082595] Call trace:
> > > > > > > > > > > > > > [   72.085029]  __switch_to+0xbc/0x100
> > > > > > > > > > > > > > [   72.088508]  0xffff80000fe83d4c
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > After that, as a consequence, I start to get a lot of hung task timeout traces.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I tried to bisect the problem and I found that the offending commit is
> > > > > > > > > > > > > > this one:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >  e7b813b32a42 ("efi: random: refresh non-volatile random seed when RNG is initialized")
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I've reverted this commit for now and everything works just fine, but I
> > > > > > > > > > > > > > was wondering if the problem could be caused by a lack of entropy on
> > > > > > > > > > > > > > these arm64 boxes or something else.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Any suggestion? Let me know if you want me to do any specific test.
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks for the report.
> > > > > > > > > > > > >
> > > > > > > > > > > > > This is most likely the EFI SetVariable() call going off into the
> > > > > > > > > > > > > weeds and never returning.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Is this an Ampere Altra system by any chance? Do you see it on
> > > > > > > > > > > > > different types of hardware?
> > > > > > > > > > > >
> > > > > > > > > > > > This is: Ampere eMAG / Lenovo ThinkSystem HR330a.
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Could you check whether SetVariable works on this system? E.g. by
> > > > > > > > > > > > > updating the EFI boot timeout (sudo efibootmgr -t <n>)?
> > > > > > > > > > > >
> > > > > > > > > > > > ubuntu@kuzzle:~$ sudo efibootmgr -t 10
> > > > > > > > > > > > ^C^C^C^C
> > > > > > > > > > > >
> > > > > > > > > > > > ^ Stuck there, so it really looks like SetVariable is the problem.
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Could you please share the output of
> > > > > > > > > > >
> > > > > > > > > > > dmidecode -s bios
> > > > > > > > > > > dmidecode -s system-family
> > > > > > > > > >
> > > > > > > > > > $ sudo dmidecode -s bios-vendor
> > > > > > > > > > LENOVO
> > > > > > > > > > $ sudo dmidecode -s bios-version
> > > > > > > > > > hve104r-1.15
> > > > > > > > > > $ sudo dmidecode -s bios-release-date
> > > > > > > > > > 02/26/2021
> > > > > > > > > > $ sudo dmidecode -s bios-revision
> > > > > > > > > > 1.15
> > > > > > > > > > $ sudo dmidecode -s system-family
> > > > > > > > > > Lenovo ThinkSystem HR330A/HR350A
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > Thanks
> > > > > > > > >
> > > > > > > > > Mind checking if this patch fixes your issue as well?
> > > > > > > > >
> > > > > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/commit/?h=altra-fix&id=77fa99dd4741456da85049c13ec31a148f5f5ac0
> > > > > > > >
> > > > > > > > Unfortunately this doesn't seem to be enough, I'm still getting the same
> > > > > > > > problem also with this patch applied.
> > > > > > > >
> > > > > > >
> > > > > > > Thanks for trying.
> > > > > > >
> > > > > > > How about the last 3 patches on this branch?
> > > > > > >
> > > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/log/?h=efi-smbios-altra-fix
> > > > > >
> > > > > > Actually, that may not match your hardware.
> > > > > >
> > > > > > Does your kernel log have a line like
> > > > > >
> > > > > > SMCCC: SOC_ID: ID = jep106:036b:0019 Revision = 0x00000102
> > > > > >
> > > > > > ?
> > > > >
> > > > > $ sudo dmesg | grep "SMCCC: SOC_ID"
> > > > > [    5.320782] SMCCC: SOC_ID: ARCH_SOC_ID not implemented, skipping ....
> > > > >
> > > >
> > > > Thanks. Could you share the entire dmidecode output somewhere? Or at
> > > > least the type 4 record(s)?
> > >
> > > Sure, here's the full output of dmidecode:
> > > https://pastebin.ubuntu.com/p/4ZmKmP2xTm/
> > >
> >
> > Thanks. I have updated my SMBIOS patches to take the processor version
> > 'eMAG' into account, which appears to be what these boxes are using.
> >
> > I have updated the efi/urgent branch here with the latest versions.
> > Mind giving them a spin?
> >
> >
> > In the meantime, just for the record - could you please run this as well?
> >
> > hexdump -C /sys/firmware/dmi/entries/4-0/raw
> >
> > (as root)
>
> hm.. I don't have that in /sys/firmware/, this is what I have:
>
> # ls -l /sys/firmware/dmi/
> total 0
> drwxr-xr-x 2 root root 0 Mar 16 13:26 tables
> # ls -l /sys/firmware/dmi/tables/
> total 0
> -r-------- 1 root root 5004 Mar 16 13:26 DMI
> -r-------- 1 root root   24 Mar 16 13:26 smbios_entry_point
>

You'll need to load the dmi_sysfs module for that. But no big deal
otherwise, I'm pretty sure the word order is the correct one on your
system in any case (it decodes the value correctly in the next line)
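
For anyone else hitting the same missing path: a minimal sketch, assuming
the sysfs layout shown in this thread, of how to tell whether the raw
entry is available. The helper name find_dmi_raw_entry is hypothetical.
With only tables/ present it returns None, which is the hint that
dmi_sysfs still needs to be loaded. Reading the file needs root.

```python
from pathlib import Path
from typing import Optional


def find_dmi_raw_entry(dmi_root: Path, dmi_type: int = 4,
                       instance: int = 0) -> Optional[Path]:
    """Return the path of entries/<type>-<instance>/raw, or None if absent.

    The entries/ directory is only populated once the dmi_sysfs module
    is loaded; without it only tables/ is present.
    """
    raw = dmi_root / "entries" / f"{dmi_type}-{instance}" / "raw"
    return raw if raw.is_file() else None


if __name__ == "__main__":
    path = find_dmi_raw_entry(Path("/sys/firmware/dmi"))
    if path is None:
        print("no entries/4-0/raw here; try: sudo modprobe dmi_sysfs")
    else:
        # Dump the record as space-separated hex bytes.
        print(path.read_bytes().hex(" "))
```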

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: kernel 6.2 stuck at boot (efi_call_rts) on arm64
  2023-03-16 13:53                           ` Ard Biesheuvel
@ 2023-03-16 13:59                             ` Andrea Righi
  2023-03-16 14:06                               ` Ard Biesheuvel
  0 siblings, 1 reply; 31+ messages in thread
From: Andrea Righi @ 2023-03-16 13:59 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Jason A. Donenfeld, Paolo Pisati, linux-efi, linux-kernel, Darren Hart

On Thu, Mar 16, 2023 at 02:53:24PM +0100, Ard Biesheuvel wrote:
> On Thu, 16 Mar 2023 at 14:50, Andrea Righi <andrea.righi@canonical.com> wrote:
> >
> > On Thu, Mar 16, 2023 at 02:45:49PM +0100, Ard Biesheuvel wrote:
> > > On Thu, 16 Mar 2023 at 13:50, Andrea Righi <andrea.righi@canonical.com> wrote:
> > > >
> > > > On Thu, Mar 16, 2023 at 01:43:32PM +0100, Ard Biesheuvel wrote:
> > > > > On Thu, 16 Mar 2023 at 13:41, Andrea Righi <andrea.righi@canonical.com> wrote:
> > > > > >
> > > > > > On Thu, Mar 16, 2023 at 01:38:30PM +0100, Ard Biesheuvel wrote:
> > > > > > > On Thu, 16 Mar 2023 at 13:21, Ard Biesheuvel <ardb@kernel.org> wrote:
> > > > > > > >
> > > > > > > > On Thu, 16 Mar 2023 at 12:34, Andrea Righi <andrea.righi@canonical.com> wrote:
> > > > > > > > >
> > > > > > > > > On Thu, Mar 16, 2023 at 11:18:21AM +0100, Ard Biesheuvel wrote:
> > > > > > > > > > On Thu, 16 Mar 2023 at 11:03, Andrea Righi <andrea.righi@canonical.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Thu, Mar 16, 2023 at 10:55:58AM +0100, Ard Biesheuvel wrote:
> > > > > > > > > > > > (cc Darren)
> > > > > > > > > > > >
> > > > > > > > > > > > On Thu, 16 Mar 2023 at 10:45, Andrea Righi <andrea.righi@canonical.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Thu, Mar 16, 2023 at 08:58:20AM +0100, Ard Biesheuvel wrote:
> > > > > > > > > > > > > > Hello Andrea,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Thu, 16 Mar 2023 at 08:54, Andrea Righi <andrea.righi@canonical.com> wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Hello,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > the latest v6.2.6 kernel fails to boot on some arm64 systems, the kernel
> > > > > > > > > > > > > > > gets stuck and never completes the boot. On the console I see this:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > [   72.043484] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> > > > > > > > > > > > > > > [   72.049571] rcu:     22-...0: (30 GPs behind) idle=b10c/1/0x4000000000000000 softirq=164/164 fqs=6443
> > > > > > > > > > > > > > > [   72.058520]     (detected by 28, t=15005 jiffies, g=449, q=174 ncpus=32)
> > > > > > > > > > > > > > > [   72.064949] Task dump for CPU 22:
> > > > > > > > > > > > > > > [   72.068251] task:kworker/u64:5   state:R  running task     stack:0     pid:447   ppid:2      flags:0x0000000a
> > > > > > > > > > > > > > > [   72.078156] Workqueue: efi_rts_wq efi_call_rts
> > > > > > > > > > > > > > > [   72.082595] Call trace:
> > > > > > > > > > > > > > > [   72.085029]  __switch_to+0xbc/0x100
> > > > > > > > > > > > > > > [   72.088508]  0xffff80000fe83d4c
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > After that, as a consequence, I start to get a lot of hung task timeout traces.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I tried to bisect the problem and I found that the offending commit is
> > > > > > > > > > > > > > > this one:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >  e7b813b32a42 ("efi: random: refresh non-volatile random seed when RNG is initialized")
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I've reverted this commit for now and everything works just fine, but I
> > > > > > > > > > > > > > > was wondering if the problem could be caused by a lack of entropy on
> > > > > > > > > > > > > > > these arm64 boxes or something else.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Any suggestion? Let me know if you want me to do any specific test.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks for the report.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > This is most likely the EFI SetVariable() call going off into the
> > > > > > > > > > > > > > weeds and never returning.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Is this an Ampere Altra system by any chance? Do you see it on
> > > > > > > > > > > > > > different types of hardware?
> > > > > > > > > > > > >
> > > > > > > > > > > > > This is: Ampere eMAG / Lenovo ThinkSystem HR330a.
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Could you check whether SetVariable works on this system? E.g. by
> > > > > > > > > > > > > > updating the EFI boot timeout (sudo efibootmgr -t <n>)?
> > > > > > > > > > > > >
> > > > > > > > > > > > > ubuntu@kuzzle:~$ sudo efibootmgr -t 10
> > > > > > > > > > > > > ^C^C^C^C
> > > > > > > > > > > > >
> > > > > > > > > > > > > ^ Stuck there, so it really looks like SetVariable is the problem.
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Could you please share the output of
> > > > > > > > > > > >
> > > > > > > > > > > > dmidecode -s bios
> > > > > > > > > > > > dmidecode -s system-family
> > > > > > > > > > >
> > > > > > > > > > > $ sudo dmidecode -s bios-vendor
> > > > > > > > > > > LENOVO
> > > > > > > > > > > $ sudo dmidecode -s bios-version
> > > > > > > > > > > hve104r-1.15
> > > > > > > > > > > $ sudo dmidecode -s bios-release-date
> > > > > > > > > > > 02/26/2021
> > > > > > > > > > > $ sudo dmidecode -s bios-revision
> > > > > > > > > > > 1.15
> > > > > > > > > > > $ sudo dmidecode -s system-family
> > > > > > > > > > > Lenovo ThinkSystem HR330A/HR350A
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Thanks
> > > > > > > > > >
> > > > > > > > > > Mind checking if this patch fixes your issue as well?
> > > > > > > > > >
> > > > > > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/commit/?h=altra-fix&id=77fa99dd4741456da85049c13ec31a148f5f5ac0
> > > > > > > > >
> > > > > > > > > Unfortunately this doesn't seem to be enough, I'm still getting the same
> > > > > > > > > problem also with this patch applied.
> > > > > > > > >
> > > > > > > >
> > > > > > > > Thanks for trying.
> > > > > > > >
> > > > > > > > How about the last 3 patches on this branch?
> > > > > > > >
> > > > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/log/?h=efi-smbios-altra-fix
> > > > > > >
> > > > > > > Actually, that may not match your hardware.
> > > > > > >
> > > > > > > Does your kernel log have a line like
> > > > > > >
> > > > > > > SMCCC: SOC_ID: ID = jep106:036b:0019 Revision = 0x00000102
> > > > > > >
> > > > > > > ?
> > > > > >
> > > > > > $ sudo dmesg | grep "SMCCC: SOC_ID"
> > > > > > [    5.320782] SMCCC: SOC_ID: ARCH_SOC_ID not implemented, skipping ....
> > > > > >
> > > > >
> > > > > Thanks. Could you share the entire dmidecode output somewhere? Or at
> > > > > least the type 4 record(s)?
> > > >
> > > > Sure, here's the full output of dmidecode:
> > > > https://pastebin.ubuntu.com/p/4ZmKmP2xTm/
> > > >
> > >
> > > Thanks. I have updated my SMBIOS patches to take the processor version
> > > 'eMAG' into account, which appears to be what these boxes are using.
> > >
> > > I have updated the efi/urgent branch here with the latest versions.
> > > Mind giving them a spin?
> > >
> > >
> > > In the meantime, just for the record - could you please run this as well?
> > >
> > > hexdump -C /sys/firmware/dmi/entries/4-0/raw
> > >
> > > (as root)
> >
> > hm.. I don't have that in /sys/firmware/, this is what I have:
> >
> > # ls -l /sys/firmware/dmi/
> > total 0
> > drwxr-xr-x 2 root root 0 Mar 16 13:26 tables
> > # ls -l /sys/firmware/dmi/tables/
> > total 0
> > -r-------- 1 root root 5004 Mar 16 13:26 DMI
> > -r-------- 1 root root   24 Mar 16 13:26 smbios_entry_point
> >
> 
> You'll need to load the dmi_sysfs module for that. But no big deal
> otherwise, I'm pretty sure the word order is the correct one on your
> system in any case (it decodes the value correctly in the next line)

ok, much better after modprobe dmi_sysfs. :)

$ sudo hexdump -C /sys/firmware/dmi/entries/4-0/raw 
00000000  04 30 04 00 01 03 fe 02  02 00 3f 50 00 00 00 00  |.0........?P....|
00000010  03 89 b8 0b e4 0c b8 0b  41 06 05 00 06 00 07 00  |........A.......|
00000020  04 00 00 20 20 20 7c 00  01 01 00 00 00 00 00 00  |...   |.........|
00000030  43 50 55 20 31 00 41 6d  70 65 72 65 28 54 4d 29  |CPU 1.Ampere(TM)|
00000040  00 65 4d 41 47 20 00 30  30 30 30 30 30 30 30 30  |.eMAG .000000000|
00000050  30 30 30 30 30 30 30 35  30 30 35 30 31 30 35 30  |0000000500501050|
00000060  32 46 42 30 39 38 38 00  55 6e 6b 6e 6f 77 6e 00  |2FB0988.Unknown.|
00000070  55 6e 6b 6e 6f 77 6e 00  00                       |Unknown..|
00000079

-Andrea

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: kernel 6.2 stuck at boot (efi_call_rts) on arm64
  2023-03-16 13:59                             ` Andrea Righi
@ 2023-03-16 14:06                               ` Ard Biesheuvel
  2023-03-16 14:08                                 ` Ard Biesheuvel
  0 siblings, 1 reply; 31+ messages in thread
From: Ard Biesheuvel @ 2023-03-16 14:06 UTC (permalink / raw)
  To: Andrea Righi
  Cc: Jason A. Donenfeld, Paolo Pisati, linux-efi, linux-kernel, Darren Hart

On Thu, 16 Mar 2023 at 14:59, Andrea Righi <andrea.righi@canonical.com> wrote:
>
> On Thu, Mar 16, 2023 at 02:53:24PM +0100, Ard Biesheuvel wrote:
> > On Thu, 16 Mar 2023 at 14:50, Andrea Righi <andrea.righi@canonical.com> wrote:
> > >
> > > On Thu, Mar 16, 2023 at 02:45:49PM +0100, Ard Biesheuvel wrote:
> > > > On Thu, 16 Mar 2023 at 13:50, Andrea Righi <andrea.righi@canonical.com> wrote:
> > > > >
> > > > > On Thu, Mar 16, 2023 at 01:43:32PM +0100, Ard Biesheuvel wrote:
> > > > > > On Thu, 16 Mar 2023 at 13:41, Andrea Righi <andrea.righi@canonical.com> wrote:
> > > > > > >
> > > > > > > On Thu, Mar 16, 2023 at 01:38:30PM +0100, Ard Biesheuvel wrote:
> > > > > > > > On Thu, 16 Mar 2023 at 13:21, Ard Biesheuvel <ardb@kernel.org> wrote:
> > > > > > > > >
> > > > > > > > > On Thu, 16 Mar 2023 at 12:34, Andrea Righi <andrea.righi@canonical.com> wrote:
> > > > > > > > > >
> > > > > > > > > > On Thu, Mar 16, 2023 at 11:18:21AM +0100, Ard Biesheuvel wrote:
> > > > > > > > > > > On Thu, 16 Mar 2023 at 11:03, Andrea Righi <andrea.righi@canonical.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > On Thu, Mar 16, 2023 at 10:55:58AM +0100, Ard Biesheuvel wrote:
> > > > > > > > > > > > > (cc Darren)
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Thu, 16 Mar 2023 at 10:45, Andrea Righi <andrea.righi@canonical.com> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Thu, Mar 16, 2023 at 08:58:20AM +0100, Ard Biesheuvel wrote:
> > > > > > > > > > > > > > > Hello Andrea,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Thu, 16 Mar 2023 at 08:54, Andrea Righi <andrea.righi@canonical.com> wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Hello,
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > the latest v6.2.6 kernel fails to boot on some arm64 systems, the kernel
> > > > > > > > > > > > > > > > gets stuck and never completes the boot. On the console I see this:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > [   72.043484] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> > > > > > > > > > > > > > > > [   72.049571] rcu:     22-...0: (30 GPs behind) idle=b10c/1/0x4000000000000000 softirq=164/164 fqs=6443
> > > > > > > > > > > > > > > > [   72.058520]     (detected by 28, t=15005 jiffies, g=449, q=174 ncpus=32)
> > > > > > > > > > > > > > > > [   72.064949] Task dump for CPU 22:
> > > > > > > > > > > > > > > > [   72.068251] task:kworker/u64:5   state:R  running task     stack:0     pid:447   ppid:2      flags:0x0000000a
> > > > > > > > > > > > > > > > [   72.078156] Workqueue: efi_rts_wq efi_call_rts
> > > > > > > > > > > > > > > > [   72.082595] Call trace:
> > > > > > > > > > > > > > > > [   72.085029]  __switch_to+0xbc/0x100
> > > > > > > > > > > > > > > > [   72.088508]  0xffff80000fe83d4c
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > After that, as a consequence, I start to get a lot of hung task timeout traces.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I tried to bisect the problem and I found that the offending commit is
> > > > > > > > > > > > > > > > this one:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >  e7b813b32a42 ("efi: random: refresh non-volatile random seed when RNG is initialized")
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I've reverted this commit for now and everything works just fine, but I
> > > > > > > > > > > > > > > > was wondering if the problem could be caused by a lack of entropy on
> > > > > > > > > > > > > > > > these arm64 boxes or something else.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Any suggestion? Let me know if you want me to do any specific test.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks for the report.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > This is most likely the EFI SetVariable() call going off into the
> > > > > > > > > > > > > > > weeds and never returning.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Is this an Ampere Altra system by any chance? Do you see it on
> > > > > > > > > > > > > > > different types of hardware?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > This is: Ampere eMAG / Lenovo ThinkSystem HR330a.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Could you check whether SetVariable works on this system? E.g. by
> > > > > > > > > > > > > > > updating the EFI boot timeout (sudo efibootmgr -t <n>)?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > ubuntu@kuzzle:~$ sudo efibootmgr -t 10
> > > > > > > > > > > > > > ^C^C^C^C
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > ^ Stuck there, so it really looks like SetVariable is the problem.
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Could you please share the output of
> > > > > > > > > > > > >
> > > > > > > > > > > > > dmidecode -s bios
> > > > > > > > > > > > > dmidecode -s system-family
> > > > > > > > > > > >
> > > > > > > > > > > > $ sudo dmidecode -s bios-vendor
> > > > > > > > > > > > LENOVO
> > > > > > > > > > > > $ sudo dmidecode -s bios-version
> > > > > > > > > > > > hve104r-1.15
> > > > > > > > > > > > $ sudo dmidecode -s bios-release-date
> > > > > > > > > > > > 02/26/2021
> > > > > > > > > > > > $ sudo dmidecode -s bios-revision
> > > > > > > > > > > > 1.15
> > > > > > > > > > > > $ sudo dmidecode -s system-family
> > > > > > > > > > > > Lenovo ThinkSystem HR330A/HR350A
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Thanks
> > > > > > > > > > >
> > > > > > > > > > > Mind checking if this patch fixes your issue as well?
> > > > > > > > > > >
> > > > > > > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/commit/?h=altra-fix&id=77fa99dd4741456da85049c13ec31a148f5f5ac0
> > > > > > > > > >
> > > > > > > > > > Unfortunately this doesn't seem to be enough, I'm still getting the same
> > > > > > > > > > problem also with this patch applied.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > Thanks for trying.
> > > > > > > > >
> > > > > > > > > How about the last 3 patches on this branch?
> > > > > > > > >
> > > > > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/log/?h=efi-smbios-altra-fix
> > > > > > > >
> > > > > > > > Actually, that may not match your hardware.
> > > > > > > >
> > > > > > > > Does your kernel log have a line like
> > > > > > > >
> > > > > > > > SMCCC: SOC_ID: ID = jep106:036b:0019 Revision = 0x00000102
> > > > > > > >
> > > > > > > > ?
> > > > > > >
> > > > > > > $ sudo dmesg | grep "SMCCC: SOC_ID"
> > > > > > > [    5.320782] SMCCC: SOC_ID: ARCH_SOC_ID not implemented, skipping ....
> > > > > > >
> > > > > >
> > > > > > Thanks. Could you share the entire dmidecode output somewhere? Or at
> > > > > > least the type 4 record(s)?
> > > > >
> > > > > Sure, here's the full output of dmidecode:
> > > > > https://pastebin.ubuntu.com/p/4ZmKmP2xTm/
> > > > >
> > > >
> > > > Thanks. I have updated my SMBIOS patches to take the processor version
> > > > 'eMAG' into account, which appears to be what these boxes are using.
> > > >
> > > > I have updated the efi/urgent branch here with the latest versions.
> > > > Mind giving them a spin?
> > > >
> > > >
> > > > In the meantime, just for the record - could you please run this as well?
> > > >
> > > > hexdump -C /sys/firmware/dmi/entries/4-0/raw
> > > >
> > > > (as root)
> > >
> > > hm.. I don't have that in /sys/firmware/, this is what I have:
> > >
> > > # ls -l /sys/firmware/dmi/
> > > total 0
> > > drwxr-xr-x 2 root root 0 Mar 16 13:26 tables
> > > # ls -l /sys/firmware/dmi/tables/
> > > total 0
> > > -r-------- 1 root root 5004 Mar 16 13:26 DMI
> > > -r-------- 1 root root   24 Mar 16 13:26 smbios_entry_point
> > >
> >
> > You'll need to load the dmi_sysfs module for that. But no big deal
> > otherwise, I'm pretty sure the word order is the correct one on your
> > system in any case (it decodes the value correctly in the next line)
>
> ok, much better after modprobe dmi_sysfs. :)
>

Yeah better, thanks.

> $ sudo hexdump -C /sys/firmware/dmi/entries/4-0/raw
> 00000000  04 30 04 00 01 03 fe 02  02 00 3f 50 00 00 00 00  |.0........?P....|
> 00000010  03 89 b8 0b e4 0c b8 0b  41 06 05 00 06 00 07 00  |........A.......|
> 00000020  04 00 00 20 20 20 7c 00  01 01 00 00 00 00 00 00  |...   |.........|
> 00000030  43 50 55 20 31 00 41 6d  70 65 72 65 28 54 4d 29  |CPU 1.Ampere(TM)|
> 00000040  00 65 4d 41 47 20 00 30  30 30 30 30 30 30 30 30  |.eMAG .000000000|

Darn, this means we have to match for "eMAG " (with the trailing
space), so the branch I just pushed needs to be updated for this.

> 00000050  30 30 30 30 30 30 30 35  30 30 35 30 31 30 35 30  |0000000500501050|
> 00000060  32 46 42 30 39 38 38 00  55 6e 6b 6e 6f 77 6e 00  |2FB0988.Unknown.|
> 00000070  55 6e 6b 6e 6f 77 6e 00  00                       |Unknown..|
> 00000079
>
> -Andrea

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: kernel 6.2 stuck at boot (efi_call_rts) on arm64
  2023-03-16 14:06                               ` Ard Biesheuvel
@ 2023-03-16 14:08                                 ` Ard Biesheuvel
  2023-03-16 14:25                                   ` Andrea Righi
  2023-03-16 17:52                                   ` Andrea Righi
  0 siblings, 2 replies; 31+ messages in thread
From: Ard Biesheuvel @ 2023-03-16 14:08 UTC (permalink / raw)
  To: Andrea Righi
  Cc: Jason A. Donenfeld, Paolo Pisati, linux-efi, linux-kernel, Darren Hart

On Thu, 16 Mar 2023 at 15:06, Ard Biesheuvel <ardb@kernel.org> wrote:
>
> On Thu, 16 Mar 2023 at 14:59, Andrea Righi <andrea.righi@canonical.com> wrote:
> >
> > On Thu, Mar 16, 2023 at 02:53:24PM +0100, Ard Biesheuvel wrote:
> > > On Thu, 16 Mar 2023 at 14:50, Andrea Righi <andrea.righi@canonical.com> wrote:
> > > >
> > > > On Thu, Mar 16, 2023 at 02:45:49PM +0100, Ard Biesheuvel wrote:
> > > > > On Thu, 16 Mar 2023 at 13:50, Andrea Righi <andrea.righi@canonical.com> wrote:
> > > > > >
> > > > > > On Thu, Mar 16, 2023 at 01:43:32PM +0100, Ard Biesheuvel wrote:
> > > > > > > On Thu, 16 Mar 2023 at 13:41, Andrea Righi <andrea.righi@canonical.com> wrote:
> > > > > > > >
> > > > > > > > On Thu, Mar 16, 2023 at 01:38:30PM +0100, Ard Biesheuvel wrote:
> > > > > > > > > On Thu, 16 Mar 2023 at 13:21, Ard Biesheuvel <ardb@kernel.org> wrote:
> > > > > > > > > >
> > > > > > > > > > On Thu, 16 Mar 2023 at 12:34, Andrea Righi <andrea.righi@canonical.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Thu, Mar 16, 2023 at 11:18:21AM +0100, Ard Biesheuvel wrote:
> > > > > > > > > > > > On Thu, 16 Mar 2023 at 11:03, Andrea Righi <andrea.righi@canonical.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Thu, Mar 16, 2023 at 10:55:58AM +0100, Ard Biesheuvel wrote:
> > > > > > > > > > > > > > (cc Darren)
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Thu, 16 Mar 2023 at 10:45, Andrea Righi <andrea.righi@canonical.com> wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Thu, Mar 16, 2023 at 08:58:20AM +0100, Ard Biesheuvel wrote:
> > > > > > > > > > > > > > > > Hello Andrea,
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Thu, 16 Mar 2023 at 08:54, Andrea Righi <andrea.righi@canonical.com> wrote:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Hello,
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > the latest v6.2.6 kernel fails to boot on some arm64 systems, the kernel
> > > > > > > > > > > > > > > > > gets stuck and never completes the boot. On the console I see this:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > [   72.043484] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> > > > > > > > > > > > > > > > > [   72.049571] rcu:     22-...0: (30 GPs behind) idle=b10c/1/0x4000000000000000 softirq=164/164 fqs=6443
> > > > > > > > > > > > > > > > > [   72.058520]     (detected by 28, t=15005 jiffies, g=449, q=174 ncpus=32)
> > > > > > > > > > > > > > > > > [   72.064949] Task dump for CPU 22:
> > > > > > > > > > > > > > > > > [   72.068251] task:kworker/u64:5   state:R  running task     stack:0     pid:447   ppid:2      flags:0x0000000a
> > > > > > > > > > > > > > > > > [   72.078156] Workqueue: efi_rts_wq efi_call_rts
> > > > > > > > > > > > > > > > > [   72.082595] Call trace:
> > > > > > > > > > > > > > > > > [   72.085029]  __switch_to+0xbc/0x100
> > > > > > > > > > > > > > > > > [   72.088508]  0xffff80000fe83d4c
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > After that, as a consequence, I start to get a lot of hung task timeout traces.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > I tried to bisect the problem and I found that the offending commit is
> > > > > > > > > > > > > > > > > this one:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >  e7b813b32a42 ("efi: random: refresh non-volatile random seed when RNG is initialized")
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > I've reverted this commit for now and everything works just fine, but I
> > > > > > > > > > > > > > > > > was wondering if the problem could be caused by a lack of entropy on
> > > > > > > > > > > > > > > > > these arm64 boxes or something else.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Any suggestion? Let me know if you want me to do any specific test.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Thanks for the report.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > This is most likely the EFI SetVariable() call going off into the
> > > > > > > > > > > > > > > > weeds and never returning.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Is this an Ampere Altra system by any chance? Do you see it on
> > > > > > > > > > > > > > > > different types of hardware?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > This is: Ampere eMAG / Lenovo ThinkSystem HR330a.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Could you check whether SetVariable works on this system? E.g. by
> > > > > > > > > > > > > > > > updating the EFI boot timeout (sudo efibootmgr -t <n>)?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > ubuntu@kuzzle:~$ sudo efibootmgr -t 10
> > > > > > > > > > > > > > > ^C^C^C^C
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > ^ Stuck there, so it really looks like SetVariable is the problem.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Could you please share the output of
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > dmidecode -s bios
> > > > > > > > > > > > > > dmidecode -s system-family
> > > > > > > > > > > > >
> > > > > > > > > > > > > $ sudo dmidecode -s bios-vendor
> > > > > > > > > > > > > LENOVO
> > > > > > > > > > > > > $ sudo dmidecode -s bios-version
> > > > > > > > > > > > > hve104r-1.15
> > > > > > > > > > > > > $ sudo dmidecode -s bios-release-date
> > > > > > > > > > > > > 02/26/2021
> > > > > > > > > > > > > $ sudo dmidecode -s bios-revision
> > > > > > > > > > > > > 1.15
> > > > > > > > > > > > > $ sudo dmidecode -s system-family
> > > > > > > > > > > > > Lenovo ThinkSystem HR330A/HR350A
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks
> > > > > > > > > > > >
> > > > > > > > > > > > Mind checking if this patch fixes your issue as well?
> > > > > > > > > > > >
> > > > > > > > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/commit/?h=altra-fix&id=77fa99dd4741456da85049c13ec31a148f5f5ac0
> > > > > > > > > > >
> > > > > > > > > > > Unfortunately this doesn't seem to be enough, I'm still getting the same
> > > > > > > > > > > problem also with this patch applied.
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Thanks for trying.
> > > > > > > > > >
> > > > > > > > > > How about the last 3 patches on this branch?
> > > > > > > > > >
> > > > > > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/log/?h=efi-smbios-altra-fix
> > > > > > > > >
> > > > > > > > > Actually, that may not match your hardware.
> > > > > > > > >
> > > > > > > > > Does your kernel log have a line like
> > > > > > > > >
> > > > > > > > > SMCCC: SOC_ID: ID = jep106:036b:0019 Revision = 0x00000102
> > > > > > > > >
> > > > > > > > > ?
> > > > > > > >
> > > > > > > > $ sudo dmesg | grep "SMCCC: SOC_ID"
> > > > > > > > [    5.320782] SMCCC: SOC_ID: ARCH_SOC_ID not implemented, skipping ....
> > > > > > > >
> > > > > > >
> > > > > > > Thanks. Could you share the entire dmidecode output somewhere? Or at
> > > > > > > least the type 4 record(s)?
> > > > > >
> > > > > > Sure, here's the full output of dmidecode:
> > > > > > https://pastebin.ubuntu.com/p/4ZmKmP2xTm/
> > > > > >
> > > > >
> > > > > Thanks. I have updated my SMBIOS patches to take the processor version
> > > > > 'eMAG' into account, which appears to be what these boxes are using.
> > > > >
> > > > > I have updated the efi/urgent branch here with the latest versions.
> > > > > Mind giving them a spin?
> > > > >
> > > > >
> > > > > > In the meantime, just for the record - could you please run this as well?
> > > > >
> > > > > hexdump -C /sys/firmware/dmi/entries/4-0/raw
> > > > >
> > > > > (as root)
> > > >
> > > > hm.. I don't have that in /sys/firmware/, this is what I have:
> > > >
> > > > # ls -l /sys/firmware/dmi/
> > > > total 0
> > > > drwxr-xr-x 2 root root 0 Mar 16 13:26 tables
> > > > # ls -l /sys/firmware/dmi/tables/
> > > > total 0
> > > > -r-------- 1 root root 5004 Mar 16 13:26 DMI
> > > > -r-------- 1 root root   24 Mar 16 13:26 smbios_entry_point
> > > >
> > >
> > > You'll need to load the dmi_sysfs module for that. But no big deal
> > > > otherwise, I'm pretty sure the word order is the correct one on your
> > > system in any case (it decodes the value correctly in the next line)
> >
> > ok, much better after modprobe dmi_sysfs. :)
> >
>
> Yeah better, thanks.
>
> > $ sudo hexdump -C /sys/firmware/dmi/entries/4-0/raw
> > 00000000  04 30 04 00 01 03 fe 02  02 00 3f 50 00 00 00 00  |.0........?P....|
> > 00000010  03 89 b8 0b e4 0c b8 0b  41 06 05 00 06 00 07 00  |........A.......|
> > 00000020  04 00 00 20 20 20 7c 00  01 01 00 00 00 00 00 00  |...   |.........|
> > 00000030  43 50 55 20 31 00 41 6d  70 65 72 65 28 54 4d 29  |CPU 1.Ampere(TM)|
> > 00000040  00 65 4d 41 47 20 00 30  30 30 30 30 30 30 30 30  |.eMAG .000000000|
>
> Darn, this means we have to match for "eMAG " (with the trailing
> space), so the branch I just pushed needs to be updated for this.
>

I.e.,

--- a/drivers/firmware/efi/libstub/arm64.c
+++ b/drivers/firmware/efi/libstub/arm64.c
@@ -36,7 +36,7 @@ static bool system_needs_vamap(void)
        default:
                version = efi_get_smbios_string(&record->header, 4,
                                                processor_version);
-               if (!version || strcmp(version, "eMAG"))
+               if (!version || strncmp(version, "eMAG", 4))
                        break;

                fallthrough;

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: kernel 6.2 stuck at boot (efi_call_rts) on arm64
  2023-03-16 14:08                                 ` Ard Biesheuvel
@ 2023-03-16 14:25                                   ` Andrea Righi
  2023-03-16 17:52                                   ` Andrea Righi
  1 sibling, 0 replies; 31+ messages in thread
From: Andrea Righi @ 2023-03-16 14:25 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Jason A. Donenfeld, Paolo Pisati, linux-efi, linux-kernel, Darren Hart

On Thu, Mar 16, 2023 at 03:08:53PM +0100, Ard Biesheuvel wrote:
> On Thu, 16 Mar 2023 at 15:06, Ard Biesheuvel <ardb@kernel.org> wrote:
> >
> > On Thu, 16 Mar 2023 at 14:59, Andrea Righi <andrea.righi@canonical.com> wrote:
> > >
> > > On Thu, Mar 16, 2023 at 02:53:24PM +0100, Ard Biesheuvel wrote:
> > > > On Thu, 16 Mar 2023 at 14:50, Andrea Righi <andrea.righi@canonical.com> wrote:
> > > > >
> > > > > On Thu, Mar 16, 2023 at 02:45:49PM +0100, Ard Biesheuvel wrote:
> > > > > > On Thu, 16 Mar 2023 at 13:50, Andrea Righi <andrea.righi@canonical.com> wrote:
> > > > > > >
> > > > > > > On Thu, Mar 16, 2023 at 01:43:32PM +0100, Ard Biesheuvel wrote:
> > > > > > > > On Thu, 16 Mar 2023 at 13:41, Andrea Righi <andrea.righi@canonical.com> wrote:
> > > > > > > > >
> > > > > > > > > On Thu, Mar 16, 2023 at 01:38:30PM +0100, Ard Biesheuvel wrote:
> > > > > > > > > > On Thu, 16 Mar 2023 at 13:21, Ard Biesheuvel <ardb@kernel.org> wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Thu, 16 Mar 2023 at 12:34, Andrea Righi <andrea.righi@canonical.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > On Thu, Mar 16, 2023 at 11:18:21AM +0100, Ard Biesheuvel wrote:
> > > > > > > > > > > > > On Thu, 16 Mar 2023 at 11:03, Andrea Righi <andrea.righi@canonical.com> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Thu, Mar 16, 2023 at 10:55:58AM +0100, Ard Biesheuvel wrote:
> > > > > > > > > > > > > > > (cc Darren)
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Thu, 16 Mar 2023 at 10:45, Andrea Righi <andrea.righi@canonical.com> wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Thu, Mar 16, 2023 at 08:58:20AM +0100, Ard Biesheuvel wrote:
> > > > > > > > > > > > > > > > > Hello Andrea,
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Thu, 16 Mar 2023 at 08:54, Andrea Righi <andrea.righi@canonical.com> wrote:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Hello,
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > the latest v6.2.6 kernel fails to boot on some arm64 systems, the kernel
> > > > > > > > > > > > > > > > > > gets stuck and never completes the boot. On the console I see this:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > [   72.043484] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> > > > > > > > > > > > > > > > > > [   72.049571] rcu:     22-...0: (30 GPs behind) idle=b10c/1/0x4000000000000000 softirq=164/164 fqs=6443
> > > > > > > > > > > > > > > > > > [   72.058520]     (detected by 28, t=15005 jiffies, g=449, q=174 ncpus=32)
> > > > > > > > > > > > > > > > > > [   72.064949] Task dump for CPU 22:
> > > > > > > > > > > > > > > > > > [   72.068251] task:kworker/u64:5   state:R  running task     stack:0     pid:447   ppid:2      flags:0x0000000a
> > > > > > > > > > > > > > > > > > [   72.078156] Workqueue: efi_rts_wq efi_call_rts
> > > > > > > > > > > > > > > > > > [   72.082595] Call trace:
> > > > > > > > > > > > > > > > > > [   72.085029]  __switch_to+0xbc/0x100
> > > > > > > > > > > > > > > > > > [   72.088508]  0xffff80000fe83d4c
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > After that, as a consequence, I start to get a lot of hung task timeout traces.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > I tried to bisect the problem and I found that the offending commit is
> > > > > > > > > > > > > > > > > > this one:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >  e7b813b32a42 ("efi: random: refresh non-volatile random seed when RNG is initialized")
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > I've reverted this commit for now and everything works just fine, but I
> > > > > > > > > > > > > > > > > > was wondering if the problem could be caused by a lack of entropy on
> > > > > > > > > > > > > > > > > > these arm64 boxes or something else.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Any suggestion? Let me know if you want me to do any specific test.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Thanks for the report.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > This is most likely the EFI SetVariable() call going off into the
> > > > > > > > > > > > > > > > > weeds and never returning.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Is this an Ampere Altra system by any chance? Do you see it on
> > > > > > > > > > > > > > > > > different types of hardware?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > This is: Ampere eMAG / Lenovo ThinkSystem HR330a.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Could you check whether SetVariable works on this system? E.g. by
> > > > > > > > > > > > > > > > > updating the EFI boot timeout (sudo efibootmgr -t <n>)?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > ubuntu@kuzzle:~$ sudo efibootmgr -t 10
> > > > > > > > > > > > > > > > ^C^C^C^C
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > ^ Stuck there, so it really looks like SetVariable is the problem.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Could you please share the output of
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > dmidecode -s bios
> > > > > > > > > > > > > > > dmidecode -s system-family
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > $ sudo dmidecode -s bios-vendor
> > > > > > > > > > > > > > LENOVO
> > > > > > > > > > > > > > $ sudo dmidecode -s bios-version
> > > > > > > > > > > > > > hve104r-1.15
> > > > > > > > > > > > > > $ sudo dmidecode -s bios-release-date
> > > > > > > > > > > > > > 02/26/2021
> > > > > > > > > > > > > > $ sudo dmidecode -s bios-revision
> > > > > > > > > > > > > > 1.15
> > > > > > > > > > > > > > $ sudo dmidecode -s system-family
> > > > > > > > > > > > > > Lenovo ThinkSystem HR330A/HR350A
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks
> > > > > > > > > > > > >
> > > > > > > > > > > > > Mind checking if this patch fixes your issue as well?
> > > > > > > > > > > > >
> > > > > > > > > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/commit/?h=altra-fix&id=77fa99dd4741456da85049c13ec31a148f5f5ac0
> > > > > > > > > > > >
> > > > > > > > > > > > Unfortunately this doesn't seem to be enough, I'm still getting the same
> > > > > > > > > > > > problem also with this patch applied.
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Thanks for trying.
> > > > > > > > > > >
> > > > > > > > > > > How about the last 3 patches on this branch?
> > > > > > > > > > >
> > > > > > > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/log/?h=efi-smbios-altra-fix
> > > > > > > > > >
> > > > > > > > > > Actually, that may not match your hardware.
> > > > > > > > > >
> > > > > > > > > > Does your kernel log have a line like
> > > > > > > > > >
> > > > > > > > > > SMCCC: SOC_ID: ID = jep106:036b:0019 Revision = 0x00000102
> > > > > > > > > >
> > > > > > > > > > ?
> > > > > > > > >
> > > > > > > > > $ sudo dmesg | grep "SMCCC: SOC_ID"
> > > > > > > > > [    5.320782] SMCCC: SOC_ID: ARCH_SOC_ID not implemented, skipping ....
> > > > > > > > >
> > > > > > > >
> > > > > > > > Thanks. Could you share the entire dmidecode output somewhere? Or at
> > > > > > > > least the type 4 record(s)?
> > > > > > >
> > > > > > > Sure, here's the full output of dmidecode:
> > > > > > > https://pastebin.ubuntu.com/p/4ZmKmP2xTm/
> > > > > > >
> > > > > >
> > > > > > Thanks. I have updated my SMBIOS patches to take the processor version
> > > > > > 'eMAG' into account, which appears to be what these boxes are using.
> > > > > >
> > > > > > I have updated the efi/urgent branch here with the latest versions.
> > > > > > Mind giving them a spin?
> > > > > >
> > > > > >
> > > > > > > In the meantime, just for the record - could you please run this as well?
> > > > > >
> > > > > > hexdump -C /sys/firmware/dmi/entries/4-0/raw
> > > > > >
> > > > > > (as root)
> > > > >
> > > > > hm.. I don't have that in /sys/firmware/, this is what I have:
> > > > >
> > > > > # ls -l /sys/firmware/dmi/
> > > > > total 0
> > > > > drwxr-xr-x 2 root root 0 Mar 16 13:26 tables
> > > > > # ls -l /sys/firmware/dmi/tables/
> > > > > total 0
> > > > > -r-------- 1 root root 5004 Mar 16 13:26 DMI
> > > > > -r-------- 1 root root   24 Mar 16 13:26 smbios_entry_point
> > > > >
> > > >
> > > > You'll need to load the dmi_sysfs module for that. But no big deal
> > > > > otherwise, I'm pretty sure the word order is the correct one on your
> > > > system in any case (it decodes the value correctly in the next line)
> > >
> > > ok, much better after modprobe dmi_sysfs. :)
> > >
> >
> > Yeah better, thanks.
> >
> > > $ sudo hexdump -C /sys/firmware/dmi/entries/4-0/raw
> > > 00000000  04 30 04 00 01 03 fe 02  02 00 3f 50 00 00 00 00  |.0........?P....|
> > > 00000010  03 89 b8 0b e4 0c b8 0b  41 06 05 00 06 00 07 00  |........A.......|
> > > 00000020  04 00 00 20 20 20 7c 00  01 01 00 00 00 00 00 00  |...   |.........|
> > > 00000030  43 50 55 20 31 00 41 6d  70 65 72 65 28 54 4d 29  |CPU 1.Ampere(TM)|
> > > 00000040  00 65 4d 41 47 20 00 30  30 30 30 30 30 30 30 30  |.eMAG .000000000|
> >
> > Darn, this means we have to match for "eMAG " (with the trailing
> > space), so the branch I just pushed needs to be updated for this.
> >
> 
> I.e.,
> 
> --- a/drivers/firmware/efi/libstub/arm64.c
> +++ b/drivers/firmware/efi/libstub/arm64.c
> @@ -36,7 +36,7 @@ static bool system_needs_vamap(void)
>         default:
>                 version = efi_get_smbios_string(&record->header, 4,
>                                                 processor_version);
> -               if (!version || strcmp(version, "eMAG"))
> +               if (!version || strncmp(version, "eMAG", 4))
>                         break;
> 
>                 fallthrough;

OK, I can add that and test it.

-Andrea

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: kernel 6.2 stuck at boot (efi_call_rts) on arm64
  2023-03-16 14:08                                 ` Ard Biesheuvel
  2023-03-16 14:25                                   ` Andrea Righi
@ 2023-03-16 17:52                                   ` Andrea Righi
  2023-03-16 18:55                                     ` Ard Biesheuvel
  1 sibling, 1 reply; 31+ messages in thread
From: Andrea Righi @ 2023-03-16 17:52 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Jason A. Donenfeld, Paolo Pisati, linux-efi, linux-kernel, Darren Hart

On Thu, Mar 16, 2023 at 03:08:53PM +0100, Ard Biesheuvel wrote:
> On Thu, 16 Mar 2023 at 15:06, Ard Biesheuvel <ardb@kernel.org> wrote:
> >
> > On Thu, 16 Mar 2023 at 14:59, Andrea Righi <andrea.righi@canonical.com> wrote:
> > >
> > > On Thu, Mar 16, 2023 at 02:53:24PM +0100, Ard Biesheuvel wrote:
> > > > On Thu, 16 Mar 2023 at 14:50, Andrea Righi <andrea.righi@canonical.com> wrote:
> > > > >
> > > > > On Thu, Mar 16, 2023 at 02:45:49PM +0100, Ard Biesheuvel wrote:
> > > > > > On Thu, 16 Mar 2023 at 13:50, Andrea Righi <andrea.righi@canonical.com> wrote:
> > > > > > >
> > > > > > > On Thu, Mar 16, 2023 at 01:43:32PM +0100, Ard Biesheuvel wrote:
> > > > > > > > On Thu, 16 Mar 2023 at 13:41, Andrea Righi <andrea.righi@canonical.com> wrote:
> > > > > > > > >
> > > > > > > > > On Thu, Mar 16, 2023 at 01:38:30PM +0100, Ard Biesheuvel wrote:
> > > > > > > > > > On Thu, 16 Mar 2023 at 13:21, Ard Biesheuvel <ardb@kernel.org> wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Thu, 16 Mar 2023 at 12:34, Andrea Righi <andrea.righi@canonical.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > On Thu, Mar 16, 2023 at 11:18:21AM +0100, Ard Biesheuvel wrote:
> > > > > > > > > > > > > On Thu, 16 Mar 2023 at 11:03, Andrea Righi <andrea.righi@canonical.com> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Thu, Mar 16, 2023 at 10:55:58AM +0100, Ard Biesheuvel wrote:
> > > > > > > > > > > > > > > (cc Darren)
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Thu, 16 Mar 2023 at 10:45, Andrea Righi <andrea.righi@canonical.com> wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Thu, Mar 16, 2023 at 08:58:20AM +0100, Ard Biesheuvel wrote:
> > > > > > > > > > > > > > > > > Hello Andrea,
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Thu, 16 Mar 2023 at 08:54, Andrea Righi <andrea.righi@canonical.com> wrote:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Hello,
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > the latest v6.2.6 kernel fails to boot on some arm64 systems, the kernel
> > > > > > > > > > > > > > > > > > gets stuck and never completes the boot. On the console I see this:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > [   72.043484] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> > > > > > > > > > > > > > > > > > [   72.049571] rcu:     22-...0: (30 GPs behind) idle=b10c/1/0x4000000000000000 softirq=164/164 fqs=6443
> > > > > > > > > > > > > > > > > > [   72.058520]     (detected by 28, t=15005 jiffies, g=449, q=174 ncpus=32)
> > > > > > > > > > > > > > > > > > [   72.064949] Task dump for CPU 22:
> > > > > > > > > > > > > > > > > > [   72.068251] task:kworker/u64:5   state:R  running task     stack:0     pid:447   ppid:2      flags:0x0000000a
> > > > > > > > > > > > > > > > > > [   72.078156] Workqueue: efi_rts_wq efi_call_rts
> > > > > > > > > > > > > > > > > > [   72.082595] Call trace:
> > > > > > > > > > > > > > > > > > [   72.085029]  __switch_to+0xbc/0x100
> > > > > > > > > > > > > > > > > > [   72.088508]  0xffff80000fe83d4c
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > After that, as a consequence, I start to get a lot of hung task timeout traces.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > I tried to bisect the problem and I found that the offending commit is
> > > > > > > > > > > > > > > > > > this one:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >  e7b813b32a42 ("efi: random: refresh non-volatile random seed when RNG is initialized")
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > I've reverted this commit for now and everything works just fine, but I
> > > > > > > > > > > > > > > > > > was wondering if the problem could be caused by a lack of entropy on
> > > > > > > > > > > > > > > > > > these arm64 boxes or something else.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Any suggestion? Let me know if you want me to do any specific test.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Thanks for the report.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > This is most likely the EFI SetVariable() call going off into the
> > > > > > > > > > > > > > > > > weeds and never returning.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Is this an Ampere Altra system by any chance? Do you see it on
> > > > > > > > > > > > > > > > > different types of hardware?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > This is: Ampere eMAG / Lenovo ThinkSystem HR330a.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Could you check whether SetVariable works on this system? E.g. by
> > > > > > > > > > > > > > > > > updating the EFI boot timeout (sudo efibootmgr -t <n>)?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > ubuntu@kuzzle:~$ sudo efibootmgr -t 10
> > > > > > > > > > > > > > > > ^C^C^C^C
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > ^ Stuck there, so it really looks like SetVariable is the problem.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Could you please share the output of
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > dmidecode -s bios
> > > > > > > > > > > > > > > dmidecode -s system-family
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > $ sudo dmidecode -s bios-vendor
> > > > > > > > > > > > > > LENOVO
> > > > > > > > > > > > > > $ sudo dmidecode -s bios-version
> > > > > > > > > > > > > > hve104r-1.15
> > > > > > > > > > > > > > $ sudo dmidecode -s bios-release-date
> > > > > > > > > > > > > > 02/26/2021
> > > > > > > > > > > > > > $ sudo dmidecode -s bios-revision
> > > > > > > > > > > > > > 1.15
> > > > > > > > > > > > > > $ sudo dmidecode -s system-family
> > > > > > > > > > > > > > Lenovo ThinkSystem HR330A/HR350A
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks
> > > > > > > > > > > > >
> > > > > > > > > > > > > Mind checking if this patch fixes your issue as well?
> > > > > > > > > > > > >
> > > > > > > > > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/commit/?h=altra-fix&id=77fa99dd4741456da85049c13ec31a148f5f5ac0
> > > > > > > > > > > >
> > > > > > > > > > > > Unfortunately this doesn't seem to be enough, I'm still getting the same
> > > > > > > > > > > > problem also with this patch applied.
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Thanks for trying.
> > > > > > > > > > >
> > > > > > > > > > > How about the last 3 patches on this branch?
> > > > > > > > > > >
> > > > > > > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/log/?h=efi-smbios-altra-fix
> > > > > > > > > >
> > > > > > > > > > Actually, that may not match your hardware.
> > > > > > > > > >
> > > > > > > > > > Does your kernel log have a line like
> > > > > > > > > >
> > > > > > > > > > SMCCC: SOC_ID: ID = jep106:036b:0019 Revision = 0x00000102
> > > > > > > > > >
> > > > > > > > > > ?
> > > > > > > > >
> > > > > > > > > $ sudo dmesg | grep "SMCCC: SOC_ID"
> > > > > > > > > [    5.320782] SMCCC: SOC_ID: ARCH_SOC_ID not implemented, skipping ....
> > > > > > > > >
> > > > > > > >
> > > > > > > > Thanks. Could you share the entire dmidecode output somewhere? Or at
> > > > > > > > least the type 4 record(s)?
> > > > > > >
> > > > > > > Sure, here's the full output of dmidecode:
> > > > > > > https://pastebin.ubuntu.com/p/4ZmKmP2xTm/
> > > > > > >
> > > > > >
> > > > > > Thanks. I have updated my SMBIOS patches to take the processor version
> > > > > > 'eMAG' into account, which appears to be what these boxes are using.
> > > > > >
> > > > > > I have updated the efi/urgent branch here with the latest versions.
> > > > > > Mind giving them a spin?
> > > > > >
> > > > > >
> > > > > > > In the meantime, just for the record - could you please run this as well?
> > > > > >
> > > > > > hexdump -C /sys/firmware/dmi/entries/4-0/raw
> > > > > >
> > > > > > (as root)
> > > > >
> > > > > hm.. I don't have that in /sys/firmware/, this is what I have:
> > > > >
> > > > > # ls -l /sys/firmware/dmi/
> > > > > total 0
> > > > > drwxr-xr-x 2 root root 0 Mar 16 13:26 tables
> > > > > # ls -l /sys/firmware/dmi/tables/
> > > > > total 0
> > > > > -r-------- 1 root root 5004 Mar 16 13:26 DMI
> > > > > -r-------- 1 root root   24 Mar 16 13:26 smbios_entry_point
> > > > >
> > > >
> > > > You'll need to load the dmi_sysfs module for that. But no big deal
> > > > > otherwise, I'm pretty sure the word order is the correct one on your
> > > > system in any case (it decodes the value correctly in the next line)
> > >
> > > ok, much better after modprobe dmi_sysfs. :)
> > >
> >
> > Yeah better, thanks.
> >
> > > $ sudo hexdump -C /sys/firmware/dmi/entries/4-0/raw
> > > 00000000  04 30 04 00 01 03 fe 02  02 00 3f 50 00 00 00 00  |.0........?P....|
> > > 00000010  03 89 b8 0b e4 0c b8 0b  41 06 05 00 06 00 07 00  |........A.......|
> > > 00000020  04 00 00 20 20 20 7c 00  01 01 00 00 00 00 00 00  |...   |.........|
> > > 00000030  43 50 55 20 31 00 41 6d  70 65 72 65 28 54 4d 29  |CPU 1.Ampere(TM)|
> > > 00000040  00 65 4d 41 47 20 00 30  30 30 30 30 30 30 30 30  |.eMAG .000000000|
> >
> > Darn, this means we have to match for "eMAG " (with the trailing
> > space), so the branch I just pushed needs to be updated for this.
> >
> 
> I.e.,
> 
> --- a/drivers/firmware/efi/libstub/arm64.c
> +++ b/drivers/firmware/efi/libstub/arm64.c
> @@ -36,7 +36,7 @@ static bool system_needs_vamap(void)
>         default:
>                 version = efi_get_smbios_string(&record->header, 4,
>                                                 processor_version);
> -               if (!version || strcmp(version, "eMAG"))
> +               if (!version || strncmp(version, "eMAG", 4))
>                         break;
> 
>                 fallthrough;

Yay! Success! I just tested your latest efi/urgent (with the fixup) and
the system completed the boot without any soft lockups.

Thanks!
-Andrea

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: kernel 6.2 stuck at boot (efi_call_rts) on arm64
  2023-03-16 17:52                                   ` Andrea Righi
@ 2023-03-16 18:55                                     ` Ard Biesheuvel
  2023-03-16 18:57                                       ` Andrea Righi
  2023-03-16 22:28                                       ` Darren Hart
  0 siblings, 2 replies; 31+ messages in thread
From: Ard Biesheuvel @ 2023-03-16 18:55 UTC (permalink / raw)
  To: Andrea Righi
  Cc: Jason A. Donenfeld, Paolo Pisati, linux-efi, linux-kernel, Darren Hart

On Thu, 16 Mar 2023 at 18:52, Andrea Righi <andrea.righi@canonical.com> wrote:
...
> Yay! Success! I just tested your latest efi/urgent (with the fixup) and
> system completed the boot without any soft lockups.
>

Thanks for confirming. I'll take that as a tested-by

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: kernel 6.2 stuck at boot (efi_call_rts) on arm64
  2023-03-16 18:55                                     ` Ard Biesheuvel
@ 2023-03-16 18:57                                       ` Andrea Righi
  2023-03-16 22:28                                       ` Darren Hart
  1 sibling, 0 replies; 31+ messages in thread
From: Andrea Righi @ 2023-03-16 18:57 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Jason A. Donenfeld, Paolo Pisati, linux-efi, linux-kernel, Darren Hart

On Thu, Mar 16, 2023 at 07:55:36PM +0100, Ard Biesheuvel wrote:
...
> >
> > Yay! Success! I just tested your latest efi/urgent (with the fixup) and
> > system completed the boot without any soft lockups.
> >
> 
> Thanks for confirming. I'll take that as a tested-by

Sure, thanks!

Tested-by: Andrea Righi <andrea.righi@canonical.com>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: kernel 6.2 stuck at boot (efi_call_rts) on arm64
  2023-03-16 18:55                                     ` Ard Biesheuvel
  2023-03-16 18:57                                       ` Andrea Righi
@ 2023-03-16 22:28                                       ` Darren Hart
  2023-03-18 10:35                                         ` Ard Biesheuvel
  1 sibling, 1 reply; 31+ messages in thread
From: Darren Hart @ 2023-03-16 22:28 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Andrea Righi, Jason A. Donenfeld, Paolo Pisati, linux-efi, linux-kernel

On Thu, Mar 16, 2023 at 07:55:36PM +0100, Ard Biesheuvel wrote:
> On Thu, 16 Mar 2023 at 18:52, Andrea Righi <andrea.righi@canonical.com> wrote:
...
> > Yay! Success! I just tested your latest efi/urgent (with the fixup) and
> > system completed the boot without any soft lockups.
> >
> 
> Thanks for confirming. I'll take that as a tested-by

The solution in the current branch looks like the best approach we have to date
to address the broadest set of affected systems. We could switch the eMAG test to an
MIDR test I believe (but this won't work for Altra as that would capture all the
Neoverse N1 cores beyond Altra). I can look into the MIDR test if you think it's
worthwhile - but since I don't think we can eliminate the SMBIOS string test, it
doesn't buy us much since we don't need a greedier eMAG test (there aren't more
of them to match).

Given that some OEM Altra platforms change the processor ID, I don't see a
better solution currently than adding their "product name" to the smbios
string tests unfortunately.

-- 
Darren Hart
Ampere Computing / OS and Kernel
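
To make the MIDR trade-off above concrete, here is a small sketch of what
such a test looks at. It is not kernel code: struct midr_fields,
midr_decode() and midr_same_part() are hypothetical names. The field layout
is the architectural one for MIDR_EL1. The value 0x503f0002 is the first
dword of the processor ID in the hexdump earlier in the thread (bytes
02 00 3f 50, little-endian); treating it as an MIDR value is an assumption
that the thread does not confirm. 0x410fd0c0 is the base Neoverse N1 ID
(implementer 0x41, part 0xd0c), the core Altra is built on.

```c
#include <stdint.h>

struct midr_fields {
	unsigned int implementer;	/* bits [31:24] */
	unsigned int variant;		/* bits [23:20] */
	unsigned int architecture;	/* bits [19:16] */
	unsigned int partnum;		/* bits [15:4]  */
	unsigned int revision;		/* bits [3:0]   */
};

/* Split a 32-bit MIDR_EL1 value into its architectural fields. */
struct midr_fields midr_decode(uint32_t midr)
{
	struct midr_fields f;

	f.implementer  = (midr >> 24) & 0xff;
	f.variant      = (midr >> 20) & 0xf;
	f.architecture = (midr >> 16) & 0xf;
	f.partnum      = (midr >> 4) & 0xfff;
	f.revision     = midr & 0xf;
	return f;
}

/*
 * Compare implementer and part number only, ignoring variant,
 * architecture and revision. This is the granularity at which an MIDR
 * test can tell cores apart.
 */
int midr_same_part(uint32_t a, uint32_t b)
{
	return ((a ^ b) & 0xff00fff0u) == 0;
}
```

Under these assumptions 0x503f0002 decodes to implementer 0x50, part 0x000,
variant 3, revision 2, which is distinct from N1. In contrast, any N1
variant/revision compares equal to 0x410fd0c0 with midr_same_part(), so
such a test cannot single out Altra among N1-based parts.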

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: kernel 6.2 stuck at boot (efi_call_rts) on arm64
  2023-03-16 22:28                                       ` Darren Hart
@ 2023-03-18 10:35                                         ` Ard Biesheuvel
  2023-03-20 18:00                                           ` Darren Hart
  2023-04-13 20:24                                           ` Andrea Righi
  0 siblings, 2 replies; 31+ messages in thread
From: Ard Biesheuvel @ 2023-03-18 10:35 UTC (permalink / raw)
  To: Darren Hart
  Cc: Andrea Righi, Jason A. Donenfeld, Paolo Pisati, linux-efi, linux-kernel

On Thu, 16 Mar 2023 at 23:28, Darren Hart <darren@os.amperecomputing.com> wrote:
>
> On Thu, Mar 16, 2023 at 07:55:36PM +0100, Ard Biesheuvel wrote:
> > On Thu, 16 Mar 2023 at 18:52, Andrea Righi <andrea.righi@canonical.com> wrote:
...
> > >
> > > Yay! Success! I just tested your latest efi/urgent (with the fixup) and
> > > system completed the boot without any soft lockups.
> > >
> >
> > Thanks for confirming. I'll take that as a tested-by
>
> The solution in the current branch looks like the best approach we have to date
> to address the broadest set of affected systems. We could switch the eMAG test to an
> MIDR test I believe (but this won't work for Altra as that would capture all the
> Neoverse N1 cores beyond Altra). I can look into the MIDR test if you think it's
> worthwhile - but since I don't think we can eliminate the SMBIOS string test, it
> doesn't buy us much since we don't need a greedier eMAG test (there aren't more
> of them to match).
>
> Given that some OEM Altra platforms change the processor ID, I don't see a
> better solution currently than adding their "product name" to the smbios
> string tests unfortunately.
>

Indeed. I spotted a Gigabyte system [0] with a different processor ID,
but with a version we can test for.

So for now, I'll go with

        socid = (u32 *)record->processor_id;
        switch (*socid & 0xffff000f) {
                static char const altra[] = "Ampere(TM) Altra(TM) Processor";
                static char const emag[] = "eMAG";
        default:
                version = efi_get_smbios_string(&record->header, 4,
                                                processor_version);
                if (!version || (strncmp(version, altra, sizeof(altra) - 1) &&
                                 strncmp(version, emag, sizeof(emag) - 1)))
                        break;

                fallthrough;

        case 0x0a160001:        // Altra
        case 0x0a160002:        // Altra Max
                efi_warn("Working around broken SetVirtualAddressMap()\n");
...

which should cover all the affected systems we encountered so far.

I'll push this to linux-next to let it soak for a little bit, and then
send it to Linus sometime during the week.

Thanks,
Ard.


[0] https://pastebin.com/HQLE1yYv
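
The matching rule in the snippet above can be exercised in userspace. The
following is a restatement for illustration, not the libstub function:
needs_vamap_workaround() is a made-up name, it takes the processor ID dword
and version string as plain arguments instead of reading the SMBIOS record,
and returning 1 stands in for taking the workaround path. The mask, the two
case values and the two strings are taken from the snippet.

```c
#include <stdint.h>
#include <string.h>

/*
 * Return 1 if the workaround path would be taken, 0 otherwise.
 * socid:   first 32 bits of the SMBIOS type 4 processor ID
 * version: processor version string, or NULL if unavailable
 */
int needs_vamap_workaround(uint32_t socid, const char *version)
{
	static const char altra[] = "Ampere(TM) Altra(TM) Processor";
	static const char emag[] = "eMAG";

	/* Keep the top 16 bits and the low nibble; ignore bits [15:4]. */
	switch (socid & 0xffff000fu) {
	case 0x0a160001u:	/* Altra */
	case 0x0a160002u:	/* Altra Max */
		return 1;
	default:
		if (!version)
			return 0;
		/* Prefix matches, so trailing text or spaces are accepted. */
		return strncmp(version, altra, sizeof(altra) - 1) == 0 ||
		       strncmp(version, emag, sizeof(emag) - 1) == 0;
	}
}
```

For the eMAG box in this thread the processor ID dword is 0x503f0002, which
matches neither case label, so the decision falls to the string test, where
"eMAG " matches by prefix. An ID such as 0x0a16abc2 (a made-up value) still
hits the Altra Max case because bits [15:4] are masked out.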

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: kernel 6.2 stuck at boot (efi_call_rts) on arm64
  2023-03-18 10:35                                         ` Ard Biesheuvel
@ 2023-03-20 18:00                                           ` Darren Hart
  2023-04-13 20:24                                           ` Andrea Righi
  1 sibling, 0 replies; 31+ messages in thread
From: Darren Hart @ 2023-03-20 18:00 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Andrea Righi, Jason A. Donenfeld, Paolo Pisati, linux-efi, linux-kernel

On Sat, Mar 18, 2023 at 11:35:44AM +0100, Ard Biesheuvel wrote:
> On Thu, 16 Mar 2023 at 23:28, Darren Hart <darren@os.amperecomputing.com> wrote:
> >
> > On Thu, Mar 16, 2023 at 07:55:36PM +0100, Ard Biesheuvel wrote:
> > > On Thu, 16 Mar 2023 at 18:52, Andrea Righi <andrea.righi@canonical.com> wrote:
> ...
> > > >
> > > > Yay! Success! I just tested your latest efi/urgent (with the fixup) and
> > > > system completed the boot without any soft lockups.
> > > >
> > >
> > > Thanks for confirming. I'll take that as a tested-by
> >
> > The solution in the current branch looks like the best approach we have to date
> > to address the broadest set of affected systems. We could switch the eMAG test to an
> > MIDR test I believe (but this won't work for Altra as that would capture all the
> > Neoverse N1 cores beyond Altra). I can look into the MIDR test if you think it's
> > worthwhile - but since I don't think we can eliminate the SMBIOS string test, it
> > doesn't buy us much since we don't need a greedier eMAG test (there aren't more
> > of them to match).
> >
> > Given that some OEM Altra platforms change the processor ID, I don't see a
> > better solution currently than adding their "product name" to the smbios
> > string tests unfortunately.
> >
> 
> Indeed. I spotted a Gigabyte system [0] with a different processor ID,
> but with a version we can test for.
> 
> So for now, I'll go with
> 
>         socid = (u32 *)record->processor_id;
>         switch (*socid & 0xffff000f) {
>                 static char const altra[] = "Ampere(TM) Altra(TM) Processor";
>                 static char const emag[] = "eMAG";
>         default:
>                 version = efi_get_smbios_string(&record->header, 4,
>                                                 processor_version);
>                 if (!version || (strncmp(version, altra, sizeof(altra) - 1) &&
>                                  strncmp(version, emag, sizeof(emag) - 1)))
>                         break;
> 
>                 fallthrough;
> 
>         case 0x0a160001:        // Altra
>         case 0x0a160002:        // Altra Max
>                 efi_warn("Working around broken SetVirtualAddressMap()\n");
> ...
> 
> which should cover all the affected systems we encountered so far.
> 
> I'll push this to linux-next to let it soak for a little bit, and then
> send it to Linus sometime during the week.

Thank you Ard, I think this is our best option.

-- 
Darren Hart
Ampere Computing / OS and Kernel

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: kernel 6.2 stuck at boot (efi_call_rts) on arm64
  2023-03-16  9:45 ` Linux regression tracking #adding (Thorsten Leemhuis)
@ 2023-04-05 12:50   ` Linux regression tracking #update (Thorsten Leemhuis)
  0 siblings, 0 replies; 31+ messages in thread
From: Linux regression tracking #update (Thorsten Leemhuis) @ 2023-04-05 12:50 UTC (permalink / raw)
  To: Andrea Righi, Jason A. Donenfeld
  Cc: Ard Biesheuvel, Paolo Pisati, linux-efi, linux-kernel,
	Linux kernel regressions list

[TLDR: This mail is primarily relevant for Linux kernel regression
tracking. See link in footer if these mails annoy you.]

On 16.03.23 10:45, Linux regression tracking #adding (Thorsten Leemhuis)
wrote:
> On 16.03.23 08:54, Andrea Righi wrote:
>> Hello,
>>
>> the latest v6.2.6 kernel fails to boot on some arm64 systems, the kernel
>> gets stuck and never completes the boot. On the console I see this:
>>
>> [   72.043484] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
>> [   72.049571] rcu:     22-...0: (30 GPs behind) idle=b10c/1/0x4000000000000000 softirq=164/164 fqs=6443
>> [   72.058520]     (detected by 28, t=15005 jiffies, g=449, q=174 ncpus=32)
>> [   72.064949] Task dump for CPU 22:
>> [   72.068251] task:kworker/u64:5   state:R  running task     stack:0     pid:447   ppid:2      flags:0x0000000a
>> [   72.078156] Workqueue: efi_rts_wq efi_call_rts
>> [   72.082595] Call trace:
>> [   72.085029]  __switch_to+0xbc/0x100
>> [   72.088508]  0xffff80000fe83d4c
>>
>> After that, as a consequence, I start to get a lot of hung task timeout traces.
>>
>> I tried to bisect the problem and I found that the offending commit is
>> this one:
>>
>>  e7b813b32a42 ("efi: random: refresh non-volatile random seed when RNG is initialized")
>>
>> I've reverted this commit for now and everything works just fine, but I
>> was wondering if the problem could be caused by a lack of entropy on
>> these arm64 boxes or something else.
> 
> Thanks for the report. To be sure the issue doesn't fall through the
> cracks unnoticed, I'm adding it to regzbot, the Linux kernel regression
> tracking bot:
> 
> #regzbot ^introduced e7b813b32a42
> #regzbot title efi: stuck at boot (efi_call_rts) on arm64
> #regzbot ignore-activity

#regzbot fix: eb684408f3ea4856
#regzbot ignore-activity

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.



* Re: kernel 6.2 stuck at boot (efi_call_rts) on arm64
  2023-03-18 10:35                                         ` Ard Biesheuvel
  2023-03-20 18:00                                           ` Darren Hart
@ 2023-04-13 20:24                                           ` Andrea Righi
  2023-04-17 22:05                                             ` Darren Hart
  1 sibling, 1 reply; 31+ messages in thread
From: Andrea Righi @ 2023-04-13 20:24 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Darren Hart, Jason A. Donenfeld, Paolo Pisati, linux-efi, linux-kernel

On Sat, Mar 18, 2023 at 11:35:44AM +0100, Ard Biesheuvel wrote:
> On Thu, 16 Mar 2023 at 23:28, Darren Hart <darren@os.amperecomputing.com> wrote:
> >
> > On Thu, Mar 16, 2023 at 07:55:36PM +0100, Ard Biesheuvel wrote:
> > > On Thu, 16 Mar 2023 at 18:52, Andrea Righi <andrea.righi@canonical.com> wrote:
> ...
> > > >
> > > > Yay! Success! I just tested your latest efi/urgent (with the fixup) and
> > > > system completed the boot without any soft lockups.
> > > >
> > >
> > > Thanks for confirming. I'll take that as a tested-by
> >
> > The solution in the current branch looks like the best approach we have to date
> > to cover the broadest set of affected systems. We could switch the eMAG test to
> > an MIDR test, I believe (but this won't work for Altra, as that would capture all
> > the Neoverse N1 cores beyond Altra). I can look into the MIDR test if you think
> > it's worthwhile, but since I don't think we can eliminate the SMBIOS string test,
> > it doesn't buy us much: we don't need a greedier eMAG test (there aren't more of
> > them to match).
> >
> > Given that some OEM Altra platforms change the processor ID, I don't currently
> > see a better solution than adding their "product name" to the SMBIOS string
> > tests, unfortunately.
> >
> 
> Indeed. I spotted a Gigabyte system [0] with a different processor ID,
> but with a processor version string we can test for.
> 
> So for now, I'll go with
> 
>         socid = (u32 *)record->processor_id;
>         switch (*socid & 0xffff000f) {
>                 static char const altra[] = "Ampere(TM) Altra(TM) Processor";
>                 static char const emag[] = "eMAG";
>         default:
>                 version = efi_get_smbios_string(&record->header, 4,
>                                                 processor_version);
>                 if (!version || (strncmp(version, altra, sizeof(altra) - 1) &&
>                                  strncmp(version, emag, sizeof(emag) - 1)))
>                         break;
> 
>                 fallthrough;
> 
>         case 0x0a160001:        // Altra
>         case 0x0a160002:        // Altra Max
>                 efi_warn("Working around broken SetVirtualAddressMap()\n");
> ...
> 
> which should cover all the affected systems we encountered so far.
> 
> I'll push this to linux-next to let it soak for a little bit, and then
> send it to Linus sometime during the week.
> 
> Thanks,
> Ard.
> 
> 
> [0] https://pastebin.com/HQLE1yYv
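
For reference, the matching logic quoted above can be sketched as a
stand-alone predicate. This is only an illustration, not the kernel
code: needs_svam_workaround() and has_prefix() are hypothetical names,
and the version string is passed in directly instead of being looked
up with efi_get_smbios_string(). The mask, the two SoC ID values and
the two strings are taken from the quoted snippet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: true if `s` starts with `prefix`. */
static bool has_prefix(const char *s, const char *prefix)
{
	assert(s != NULL && prefix != NULL);
	return strncmp(s, prefix, strlen(prefix)) == 0;
}

/*
 * Hypothetical stand-alone version of the quoted check.
 * socid:   first 32 bits of the SMBIOS type 4 processor ID
 * version: processor version string, or NULL if unavailable
 */
static bool needs_svam_workaround(uint32_t socid, const char *version)
{
	switch (socid & 0xffff000f) {
	case 0x0a160001:	/* Altra */
	case 0x0a160002:	/* Altra Max */
		return true;
	default:
		/* Unknown ID: fall back to a prefix match on the string. */
		return version != NULL &&
		       (has_prefix(version, "Ampere(TM) Altra(TM) Processor") ||
			has_prefix(version, "eMAG"));
	}
}
```

The ID test comes first because it needs no string lookup; the string
test only matters for firmware that reports some other processor ID.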

Not sure if it's a similar issue, but I have found another Ampere box
that boots fine with your fixes, but where the efivarfs.sh kselftest
fails with some I/O errors, specifically:

$ sudo ./efivarfs.sh
--------------------
running test_create
--------------------
./efivarfs.sh: line 58: printf: write error: Input/output error
/sys/firmware/efi/efivars/test_create-210be57c-9849-4fc7-a635-e6382d1aec27 has invalid size
  [FAIL]
--------------------
running test_create_empty
--------------------
  [PASS]
--------------------
running test_create_read
--------------------
  [PASS]
--------------------
running test_delete
--------------------
./efivarfs.sh: line 103: printf: write error: Input/output error
  [PASS]
--------------------
running test_zero_size_delete
--------------------
./efivarfs.sh: line 126: printf: write error: Input/output error
./efivarfs.sh: line 134: printf: write error: Input/output error
/sys/firmware/efi/efivars/test_zero_size_delete-210be57c-9849-4fc7-a635-e6382d1aec27 should have been deleted
  [FAIL]
--------------------
running test_open_unlink
--------------------
open(O_WRONLY): Operation not permitted
  [FAIL]
--------------------
running test_valid_filenames
--------------------
./efivarfs.sh: line 158: printf: write error: Input/output error
./efivarfs.sh: line 158: printf: write error: Input/output error
./efivarfs.sh: line 158: printf: write error: Input/output error
./efivarfs.sh: line 158: printf: write error: Input/output error
  [PASS]
--------------------
running test_invalid_filenames
--------------------
  [PASS]

If it helps:

$ sudo hexdump -C /sys/firmware/dmi/entries/4-0/raw
00000000  04 30 04 00 01 03 fe 02  c1 d0 3f 41 00 00 00 00  |.0........?A....|
00000010  03 8a 72 06 b8 0b f0 0a  41 06 05 00 06 00 07 00  |..r.....A.......|
00000020  04 05 06 50 50 50 04 00  01 01 01 00 01 00 01 00  |...PPP..........|
00000030  43 50 55 20 31 00 41 6d  70 65 72 65 28 52 29 00  |CPU 1.Ampere(R).|
00000040  41 6d 70 65 72 65 28 52  29 20 41 6c 74 72 61 28  |Ampere(R) Altra(|
00000050  52 29 20 50 72 6f 63 65  73 73 6f 72 00 30 30 30  |R) Processor.000|
00000060  30 30 30 30 30 30 30 30  30 30 30 30 30 30 32 35  |0000000000000025|
00000070  35 30 32 30 39 30 33 33  38 36 35 42 34 00 30 30  |50209033865B4.00|
00000080  30 30 30 30 30 31 00 51  38 30 2d 33 30 00 00     |000001.Q80-30..|
0000008f
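
To make the dump easier to read, here is a minimal sketch that pulls
the processor ID and the processor version string out of these bytes.
It assumes the usual SMBIOS type 4 layout (formatted-area length at
offset 0x01, processor ID at 0x08, processor version string index at
0x10, string table right after the formatted area). The names
dmi_4_0_raw, read_le32() and smbios_string() are made up for this
example and are not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* The 143 bytes from the dump above (type 4 record plus string table). */
static const uint8_t dmi_4_0_raw[] = {
	0x04, 0x30, 0x04, 0x00, 0x01, 0x03, 0xfe, 0x02, 0xc1, 0xd0, 0x3f, 0x41, 0x00, 0x00, 0x00, 0x00,
	0x03, 0x8a, 0x72, 0x06, 0xb8, 0x0b, 0xf0, 0x0a, 0x41, 0x06, 0x05, 0x00, 0x06, 0x00, 0x07, 0x00,
	0x04, 0x05, 0x06, 0x50, 0x50, 0x50, 0x04, 0x00, 0x01, 0x01, 0x01, 0x00, 0x01, 0x00, 0x01, 0x00,
	0x43, 0x50, 0x55, 0x20, 0x31, 0x00, 0x41, 0x6d, 0x70, 0x65, 0x72, 0x65, 0x28, 0x52, 0x29, 0x00,
	0x41, 0x6d, 0x70, 0x65, 0x72, 0x65, 0x28, 0x52, 0x29, 0x20, 0x41, 0x6c, 0x74, 0x72, 0x61, 0x28,
	0x52, 0x29, 0x20, 0x50, 0x72, 0x6f, 0x63, 0x65, 0x73, 0x73, 0x6f, 0x72, 0x00, 0x30, 0x30, 0x30,
	0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x32, 0x35,
	0x35, 0x30, 0x32, 0x30, 0x39, 0x30, 0x33, 0x33, 0x38, 0x36, 0x35, 0x42, 0x34, 0x00, 0x30, 0x30,
	0x30, 0x30, 0x30, 0x30, 0x30, 0x31, 0x00, 0x51, 0x38, 0x30, 0x2d, 0x33, 0x30, 0x00, 0x00,
};

/* Assumed SMBIOS type 4 field offsets. */
#define PROCESSOR_ID_OFFSET		0x08
#define PROCESSOR_VERSION_OFFSET	0x10

/* Read a little-endian 32-bit value from a byte buffer. */
static uint32_t read_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/*
 * Return string number `index` (1-based) from the string table that
 * follows the formatted area, or NULL if there is no such string.
 * rec[1] holds the length of the formatted area.
 */
static const char *smbios_string(const uint8_t *rec, size_t size, uint8_t index)
{
	size_t pos;

	assert(rec != NULL);
	if (index == 0 || size < 2 || rec[1] > size)
		return NULL;

	pos = rec[1];
	while (pos < size && rec[pos] != 0) {
		const uint8_t *end = memchr(rec + pos, 0, size - pos);

		if (end == NULL)
			return NULL;	/* unterminated string */
		if (--index == 0)
			return (const char *)(rec + pos);
		pos = (size_t)(end - rec) + 1;
	}
	return NULL;
}
```

With this dump, read_le32(dmi_4_0_raw + PROCESSOR_ID_OFFSET) gives
0x413fd0c1, and the string selected by the byte at
PROCESSOR_VERSION_OFFSET is "Ampere(R) Altra(R) Processor".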

I guess EFI is not very reliable here...

-Andrea

* Re: kernel 6.2 stuck at boot (efi_call_rts) on arm64
  2023-04-13 20:24                                           ` Andrea Righi
@ 2023-04-17 22:05                                             ` Darren Hart
  2023-04-18  5:42                                               ` Andrea Righi
  0 siblings, 1 reply; 31+ messages in thread
From: Darren Hart @ 2023-04-17 22:05 UTC (permalink / raw)
  To: Andrea Righi
  Cc: Ard Biesheuvel, Jason A. Donenfeld, Paolo Pisati, linux-efi,
	linux-kernel

On Thu, Apr 13, 2023 at 10:24:38PM +0200, Andrea Righi wrote:
> 
> Not sure if it's a similar issue, but I have found another Ampere box
> that is booting fine with your fixes, but the efivarfs.sh kselftest is
> failing with some I/O errors, specifically:

Thanks for reporting. Can you confirm this worked reliably for you prior
to v6.1?

--
Darren

-- 
Darren Hart
Ampere Computing / OS and Kernel

* Re: kernel 6.2 stuck at boot (efi_call_rts) on arm64
  2023-04-17 22:05                                             ` Darren Hart
@ 2023-04-18  5:42                                               ` Andrea Righi
  0 siblings, 0 replies; 31+ messages in thread
From: Andrea Righi @ 2023-04-18  5:42 UTC (permalink / raw)
  To: Darren Hart
  Cc: Ard Biesheuvel, Jason A. Donenfeld, Paolo Pisati, linux-efi,
	linux-kernel

On Mon, Apr 17, 2023 at 03:05:18PM -0700, Darren Hart wrote:
> On Thu, Apr 13, 2023 at 10:24:38PM +0200, Andrea Righi wrote:
> > 
> > Not sure if it's a similar issue, but I have found another Ampere box
> > that is booting fine with your fixes, but the efivarfs.sh kselftest is
> > failing with some I/O errors, specifically:
> 
> Thanks for reporting. Can you confirm this worked reliably for you prior
> to v6.1?
> 
> --
> Darren

I tested again and I can confirm that after a reboot everything looks fine.
Maybe EFI was messed up by a previous test and the latest kernel fixes
everything. Anyway, this issue seems resolved for me.

Thanks,
-Andrea

end of thread, other threads:[~2023-04-18  5:43 UTC | newest]

Thread overview: 31+ messages
2023-03-16  7:54 kernel 6.2 stuck at boot (efi_call_rts) on arm64 Andrea Righi
2023-03-16  7:58 ` Ard Biesheuvel
2023-03-16  9:45   ` Andrea Righi
2023-03-16  9:55     ` Ard Biesheuvel
2023-03-16 10:03       ` Andrea Righi
2023-03-16 10:18         ` Ard Biesheuvel
2023-03-16 11:33           ` Andrea Righi
2023-03-16 12:21             ` Ard Biesheuvel
2023-03-16 12:38               ` Ard Biesheuvel
2023-03-16 12:41                 ` Andrea Righi
2023-03-16 12:43                   ` Ard Biesheuvel
2023-03-16 12:49                     ` Andrea Righi
2023-03-16 13:45                       ` Ard Biesheuvel
2023-03-16 13:46                         ` Ard Biesheuvel
2023-03-16 13:50                         ` Andrea Righi
2023-03-16 13:53                           ` Ard Biesheuvel
2023-03-16 13:59                             ` Andrea Righi
2023-03-16 14:06                               ` Ard Biesheuvel
2023-03-16 14:08                                 ` Ard Biesheuvel
2023-03-16 14:25                                   ` Andrea Righi
2023-03-16 17:52                                   ` Andrea Righi
2023-03-16 18:55                                     ` Ard Biesheuvel
2023-03-16 18:57                                       ` Andrea Righi
2023-03-16 22:28                                       ` Darren Hart
2023-03-18 10:35                                         ` Ard Biesheuvel
2023-03-20 18:00                                           ` Darren Hart
2023-04-13 20:24                                           ` Andrea Righi
2023-04-17 22:05                                             ` Darren Hart
2023-04-18  5:42                                               ` Andrea Righi
2023-03-16  9:45 ` Linux regression tracking #adding (Thorsten Leemhuis)
2023-04-05 12:50   ` Linux regression tracking #update (Thorsten Leemhuis)
