* [REGRESSION][BISECTED] Boot stall from merge tag 'net-next-6.2'
@ 2023-05-26 19:17 Sami Korkalainen
  2023-05-27  1:17 ` Bagas Sanjaya
  0 siblings, 1 reply; 27+ messages in thread
From: Sami Korkalainen @ 2023-05-26 19:17 UTC (permalink / raw)
  To: stable; +Cc: regressions

Linux 6.2 and newer are (mostly) unbootable on my old HP 6730b laptop; 6.1.30 still works fine.
The weirdest thing is that newer kernels (like 6.3.4 and 6.4-rc3) may boot fine on the first try, but after a reboot the very same version doesn't boot.

Sometimes, when trying to boot, I get this message repeated forever:
ACPI Error: No handler or method for GPE [XX], disabling event (20221020/evgpe-839)
On newer kernels, the date is 20230331 instead of 20221020. There is also another error, but I can't read it because it gets overwritten by the ACPI error above; see the image linked at the end.

And sometimes the screen just stays completely blank.

I tried booting with acpi=off, but it did not help.

I bisected, and the first bad commit is 7e68dd7d07a2:
"Merge tag 'net-next-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next"
       
Because the later kernels boot seemingly at random (as mentioned above), I retested the last good commit, 7c4a6309e27f, by booting it several times; it booted every time.

I tried to get some boot logs, but the boot process does not get far enough to produce any.

Kernel .config file: https://0x0.st/Hqt1.txt
     
Environment (outputs of a working Linux 6.1 build):
Software (output of the ver_linux script): https://0x0.st/Hqte.txt
Processor information (from /proc/cpuinfo): https://0x0.st/Hqt2.txt
Module information (from /proc/modules): https://0x0.st/HqtL.txt
/proc/ioports: https://0x0.st/Hqt9.txt
/proc/iomem:   https://0x0.st/Hqtf.txt
PCI information ('lspci -vvv' as root): https://0x0.st/HqtO.txt
SCSI information (from /proc/scsi/scsi)

Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
Vendor: ATA      Model: KINGSTON SVP200S Rev: C4
Type:   Direct-Access                    ANSI  SCSI revision: 05
Host: scsi1 Channel: 00 Id: 00 Lun: 00
Vendor: hp       Model: CDDVDW TS-L633M  Rev: 0301
Type:   CD-ROM                           ANSI  SCSI revision: 05
       
Distribution: Arch Linux
Boot manager: systemd-boot (UEFI)

git bisect log: https://0x0.st/Hqgx.txt
ACPI Error (sorry for the dusty screen): https://0x0.st/HqEk.jpeg

#regzbot ^introduced 7e68dd7d07a2

Best regards
Sami Korkalainen

* Re: [REGRESSION][BISECTED] Boot stall from merge tag 'net-next-6.2'
  2023-05-26 19:17 [REGRESSION][BISECTED] Boot stall from merge tag 'net-next-6.2' Sami Korkalainen
@ 2023-05-27  1:17 ` Bagas Sanjaya
  2023-05-27  4:07   ` Sami Korkalainen
  0 siblings, 1 reply; 27+ messages in thread
From: Bagas Sanjaya @ 2023-05-27  1:17 UTC (permalink / raw)
  To: Sami Korkalainen, Linux Stable
  Cc: Linux Regressions, Linux Networking, Linux ACPI,
	Rafael J. Wysocki, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni

[-- Attachment #1: Type: text/plain, Size: 2842 bytes --]

On Fri, May 26, 2023 at 07:17:26PM +0000, Sami Korkalainen wrote:
> Linux 6.2 and newer are (mostly) unbootable on my old HP 6730b laptop, the 6.1.30 works still fine.
> The weirdest thing is that newer kernels (like 6.3.4 and 6.4-rc3) may boot ok on the first try, but when rebooting, the very same version doesn't boot.
>        
> Some times, when trying to boot, I get this message repeated forever:
> ACPI Error: No handler or method for GPE [XX], disabling event (20221020/evgpe-839)
> On newer kernels, the date is 20230331 instead of 20221020. There is also some other error, but I can't read it as it gets overwritten by the other ACPI error, see image linked at the end.
> 
> And some times, the screen will just stay completely blank.
> 
> I tried booting with acpi=off, but it does not help.
>        
> I bisected and this is the first bad commit 7e68dd7d07a2
> "Merge tag 'net-next-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next"

I think networking changes shouldn't cause this ACPI regression, right?

>        
> As the later kernels had the seemingly random booting behaviour (mentioned above), I retested the last good one 7c4a6309e27f by booting it several times and it boots every time.
> 
> I tried getting some boot logs, but the boot process does not go far enough to make any logs.
> 
> Kernel .config file: https://0x0.st/Hqt1.txt
>      
> Environment (outputs of a working Linux 6.1 build):
> Software (output of the ver_linux script): https://0x0.st/Hqte.txt
> Processor information (from /proc/cpuinfo): https://0x0.st/Hqt2.txt
> Module information (from /proc/modules): https://0x0.st/HqtL.txt
> /proc/ioports: https://0x0.st/Hqt9.txt
> /proc/iomem:   https://0x0.st/Hqtf.txt
> PCI information ('lspci -vvv' as root): https://0x0.st/HqtO.txt
> SCSI information (from /proc/scsi/scsi)

Where is SCSI info?

> 
> Attached devices:
> Host: scsi0 Channel: 00 Id: 00 Lun: 00
> Vendor: ATA      Model: KINGSTON SVP200S Rev: C4
> Type:   Direct-Access                    ANSI  SCSI revision: 05
> Host: scsi1 Channel: 00 Id: 00 Lun: 00
> Vendor: hp       Model: CDDVDW TS-L633M  Rev: 0301
> Type:   CD-ROM                           ANSI  SCSI revision: 05
>        
> Distribution: Arch Linux
> Boot manager: systemd-boot (UEFI)
> 
> git bisect log: https://0x0.st/Hqgx.txt
> ACPI Error (sorry for the dusty screen): https://0x0.st/HqEk.jpeg
> 
> #regzbot ^introduced 7e68dd7d07a2
> 
> Best regards
> Sami Korkalainen

Anyway, I'm also Cc'ing the netdev and ACPI lists and maintainers (maybe they
have an idea of what's going on here) and fixing up the regzbot entry title:

#regzbot title: Boot stall with ACPI error (no handler/method for GPE) caused by net-next 6.2 pull

Thanks.

-- 
An old man doll... just what I always wanted! - Clara

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

* Re: [REGRESSION][BISECTED] Boot stall from merge tag 'net-next-6.2'
  2023-05-27  1:17 ` Bagas Sanjaya
@ 2023-05-27  4:07   ` Sami Korkalainen
  2023-06-12 14:07     ` Bagas Sanjaya
  0 siblings, 1 reply; 27+ messages in thread
From: Sami Korkalainen @ 2023-05-27  4:07 UTC (permalink / raw)
  To: Bagas Sanjaya
  Cc: Linux Stable, Linux Regressions, Linux Networking, Linux ACPI,
	Rafael J. Wysocki, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni

>Where is SCSI info?

Right there, under that heading (it was so short that I put it directly in the message; maybe I should have put it in a pastebin too, for consistency and clarity):

Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
Vendor: ATA      Model: KINGSTON SVP200S Rev: C4
Type:   Direct-Access                    ANSI  SCSI revision: 05
Host: scsi1 Channel: 00 Id: 00 Lun: 00
Vendor: hp       Model: CDDVDW TS-L633M  Rev: 0301
Type:   CD-ROM                           ANSI  SCSI revision: 05

>I think networking changes shouldn't cause this ACPI regression, right?
Yeah, beats me, but that's what I got by bisecting. My expertise ends about here.

* Re: [REGRESSION][BISECTED] Boot stall from merge tag 'net-next-6.2'
  2023-05-27  4:07   ` Sami Korkalainen
@ 2023-06-12 14:07     ` Bagas Sanjaya
  2023-06-12 19:05       ` Sami Korkalainen
  0 siblings, 1 reply; 27+ messages in thread
From: Bagas Sanjaya @ 2023-06-12 14:07 UTC (permalink / raw)
  To: Sami Korkalainen, Linus Torvalds
  Cc: Linux Stable, Linux Regressions, Linux Networking, Linux ACPI,
	Rafael J. Wysocki, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Linux regression tracking (Thorsten Leemhuis)

[-- Attachment #1: Type: text/plain, Size: 1239 bytes --]

On Sat, May 27, 2023 at 04:07:56AM +0000, Sami Korkalainen wrote:
> >Where is SCSI info?
> 
> Right there, under the text (It was so short, that I thought to put it in the message. Maybe I should have put that also in pastebin for consistency and clarity):
> 
> Attached devices:
> Host: scsi0 Channel: 00 Id: 00 Lun: 00
> Vendor: ATA      Model: KINGSTON SVP200S Rev: C4
> Type:   Direct-Access                    ANSI  SCSI revision: 05
> Host: scsi1 Channel: 00 Id: 00 Lun: 00
> Vendor: hp       Model: CDDVDW TS-L633M  Rev: 0301
> Type:   CD-ROM                           ANSI  SCSI revision: 05
> 
> >I think networking changes shouldn't cause this ACPI regression, right?
> Yeah, beats me, but that's what I got by bisecting. My expertise ends about here.

Hmm, no reply for a while.

Networking people: It looks like your v6.2 PR introduced a seemingly
unrelated ACPICA regression. Can you explain why?

ACPICA people: Can you figure out why this regression happens?

Sami: Can you try the latest mainline and repeat the bisection as confirmation?

I'm considering removing this from regression tracking if there are
no replies within several more days.

Thanks.

-- 
An old man doll... just what I always wanted! - Clara

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

* Re: [REGRESSION][BISECTED] Boot stall from merge tag 'net-next-6.2'
  2023-06-12 14:07     ` Bagas Sanjaya
@ 2023-06-12 19:05       ` Sami Korkalainen
  2023-06-12 19:50         ` Andrew Lunn
  0 siblings, 1 reply; 27+ messages in thread
From: Sami Korkalainen @ 2023-06-12 19:05 UTC (permalink / raw)
  To: Bagas Sanjaya
  Cc: Linus Torvalds, Linux Stable, Linux Regressions,
	Linux Networking, Linux ACPI, Rafael J. Wysocki, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Linux regression tracking (Thorsten Leemhuis)

OK. I will try the latest mainline and, if it does not work, I will bisect again, but that will take at least a couple of weeks on this old PC. I can't really compile more than once a day.

Regards
Sami Korkalainen
___________________________

Sent with Proton Mail secure email.

------- Original Message -------
On Monday, June 12th, 2023 at 17.07, Bagas Sanjaya <bagasdotme@gmail.com> wrote:


> On Sat, May 27, 2023 at 04:07:56AM +0000, Sami Korkalainen wrote:
> 
> > > Where is SCSI info?
> > 
> > Right there, under the text (It was so short, that I thought to put it in the message. Maybe I should have put that also in pastebin for consistency and clarity):
> > 
> > Attached devices:
> > Host: scsi0 Channel: 00 Id: 00 Lun: 00
> > Vendor: ATA Model: KINGSTON SVP200S Rev: C4
> > Type: Direct-Access ANSI SCSI revision: 05
> > Host: scsi1 Channel: 00 Id: 00 Lun: 00
> > Vendor: hp Model: CDDVDW TS-L633M Rev: 0301
> > Type: CD-ROM ANSI SCSI revision: 05
> > 
> > > I think networking changes shouldn't cause this ACPI regression, right?
> > Yeah, beats me, but that's what I got by bisecting. My expertise ends about here.
> 
> 
> Hmm, no reply for a while.
> 
> Networking people: It looks like your v6.2 PR introduces unrelated
> ACPICA regression. Can you explain why?
> 
> ACPICA people: Can you figure out why do this regression happen?
> 
> Sami: Can you try latest mainline and repeat bisection as confirmation?
> 
> I'm considering to remove this from regression tracking if there is
> no replies in several more days.
> 
> Thanks.
> 
> --
> An old man doll... just what I always wanted! - Clara

* Re: [REGRESSION][BISECTED] Boot stall from merge tag 'net-next-6.2'
  2023-06-12 19:05       ` Sami Korkalainen
@ 2023-06-12 19:50         ` Andrew Lunn
  2023-06-21  6:07           ` Sami Korkalainen
  0 siblings, 1 reply; 27+ messages in thread
From: Andrew Lunn @ 2023-06-12 19:50 UTC (permalink / raw)
  To: Sami Korkalainen
  Cc: Bagas Sanjaya, Linus Torvalds, Linux Stable, Linux Regressions,
	Linux Networking, Linux ACPI, Rafael J. Wysocki, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Linux regression tracking (Thorsten Leemhuis)

On Mon, Jun 12, 2023 at 07:05:45PM +0000, Sami Korkalainen wrote:

> Ok. I will try the latest mainline and if it does not work, I try
> bisecting again, but it should take at least a couple of weeks with
> this old PC. Can't really compile more than once a day.

Cross compiling Linux has been possible for at least 20 years. Do the
build on something modern and copy the results to the target.

      Andrew

* Re: [REGRESSION][BISECTED] Boot stall from merge tag 'net-next-6.2'
  2023-06-12 19:50         ` Andrew Lunn
@ 2023-06-21  6:07           ` Sami Korkalainen
  2023-06-21  8:46             ` Linux regression tracking (Thorsten Leemhuis)
  0 siblings, 1 reply; 27+ messages in thread
From: Sami Korkalainen @ 2023-06-21  6:07 UTC (permalink / raw)
  To: Bagas Sanjaya
  Cc: Andrew Lunn, Linus Torvalds, Linux Stable, Linux Regressions,
	Linux Networking, Linux ACPI, Rafael J. Wysocki, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Linux regression tracking (Thorsten Leemhuis)

[-- Attachment #1: Type: text/plain, Size: 538 bytes --]

I bisected again. It seems I made a mistake last time, as I got a different result this time, probably because these problematic kernels sometimes boot fine, as I said before.
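
How much an intermittently booting kernel can mislead a bisection can be estimated with a little arithmetic. The following is only a minimal sketch in C, under an assumption that is not stated anywhere in this thread: each boot of a bad kernel succeeds independently with some probability p_ok. The function names are hypothetical and chosen for illustration.

```c
#include <assert.h>

/*
 * Probability that a *bad* kernel passes n consecutive boot tests,
 * assuming (hypothetically) that every boot succeeds independently
 * with probability p_ok.
 */
double false_good_probability(double p_ok, unsigned int n)
{
	double prob = 1.0;

	for (unsigned int i = 0; i < n; i++)
		prob *= p_ok;
	return prob;
}

/*
 * Smallest number of consecutive successful boots n such that
 * p_ok^n <= max_risk, i.e. the risk of marking a bad commit "good"
 * during bisection stays at or below max_risk.
 */
unsigned int boots_needed(double p_ok, double max_risk)
{
	unsigned int n = 0;
	double prob = 1.0;

	assert(p_ok >= 0.0 && p_ok < 1.0);
	assert(max_risk > 0.0 && max_risk <= 1.0);

	while (prob > max_risk) {
		prob *= p_ok;
		n++;
	}
	return n;
}
```

Under these assumptions, a bad kernel that still boots half of the time needs 5 clean boots in a row before the chance of wrongly marking it good drops to about 3%; a single test boot per bisection step would be wrong half of the time.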

Anyway, first bad commit (makes much more sense this time):
e7b813b32a42a3a6281a4fd9ae7700a0257c1d50
efi: random: refresh non-volatile random seed when RNG is initialized

I confirmed that this is the code causing the issue by commenting it out (see the patch file). Without this code, the latest mainline boots fine.

Best regards
Sami Korkalainen

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: patch.diff --]
[-- Type: text/x-patch; name=patch.diff, Size: 1137 bytes --]

diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index abeff7dc0b58..c362c807f5d6 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -360,7 +360,7 @@ static void __init efi_debugfs_init(void)
 #else
 static inline void efi_debugfs_init(void) {}
 #endif
-
+/*
 static void refresh_nv_rng_seed(struct work_struct *work)
 {
 	u8 seed[EFI_RANDOM_SEED_SIZE];
@@ -378,7 +378,7 @@ static int refresh_nv_rng_seed_notification(struct notifier_block *nb, unsigned
 	return NOTIFY_DONE;
 }
 static struct notifier_block refresh_nv_rng_seed_nb = { .notifier_call = refresh_nv_rng_seed_notification };
-
+*/
 /*
  * We register the efi subsystem with the firmware subsystem and the
  * efivars subsystem with the efi subsystem, if the system was booted with
@@ -450,10 +450,10 @@ static int __init efisubsys_init(void)
 	if (efi.coco_secret != EFI_INVALID_TABLE_ADDR)
 		platform_device_register_simple("efi_secret", 0, NULL, 0);
 #endif
-
+/*
 	if (efi_rt_services_supported(EFI_RT_SUPPORTED_SET_VARIABLE))
 		execute_with_initialized_rng(&refresh_nv_rng_seed_nb);
-
+*/
 	return 0;
 
 err_remove_group:

* Re: [REGRESSION][BISECTED] Boot stall from merge tag 'net-next-6.2'
  2023-06-21  6:07           ` Sami Korkalainen
@ 2023-06-21  8:46             ` Linux regression tracking (Thorsten Leemhuis)
  2023-06-21 17:56               ` Linus Torvalds
                                 ` (2 more replies)
  0 siblings, 3 replies; 27+ messages in thread
From: Linux regression tracking (Thorsten Leemhuis) @ 2023-06-21  8:46 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: Andrew Lunn, Linus Torvalds, Linux Stable, Linux Regressions,
	Bagas Sanjaya, Sami Korkalainen

[added Jason (who authored the culprit) to the list of recipients; moved
net people and list to BCC, guess they are not much interested in this
anymore then]

On 21.06.23 08:07, Sami Korkalainen wrote:
> I bisected again. It seems I made some mistake last time, as I got a
> different result this time. Maybe, because these problematic kernels may
> boot fine sometimes, like I said before.
> 
> Anyway, first bad commit (makes much more sense this time): 
> e7b813b32a42a3a6281a4fd9ae7700a0257c1d50 efi: random: refresh
> non-volatile random seed when RNG is initialized
> 
> I confirmed that this is the code causing the issue by commenting it
> out (see the patch file). Without this code, the latest mainline boots fine.

Jason, in that case it seems this is something for you. For the initial
report, see here:

https://lore.kernel.org/all/GQUnKz2al3yke5mB2i1kp3SzNHjK8vi6KJEh7rnLrOQ24OrlljeCyeWveLW9pICEmB9Qc8PKdNt3w1t_g3-Uvxq1l8Wj67PpoMeWDoH8PKk=@proton.me/

Quoting a part of it:

```
Linux 6.2 and newer are (mostly) unbootable on my old HP 6730b laptop,
the 6.1.30 works still fine.
The weirdest thing is that newer kernels (like 6.3.4 and 6.4-rc3) may
boot ok on the first try, but when rebooting, the very same version
doesn't boot.

Some times, when trying to boot, I get this message repeated forever:
ACPI Error: No handler or method for GPE [XX], disabling event
(20221020/evgpe-839)
On newer kernels, the date is 20230331 instead of 20221020. There is
also some other error, but I can't read it as it gets overwritten by the
other ACPI error, see image linked at the end.

And some times, the screen will just stay completely blank.

I tried booting with acpi=off, but it does not help.
```
Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

#regzbot introduced e7b813b32a42a3a6281a4fd9ae7700a0257c1d50

* Re: [REGRESSION][BISECTED] Boot stall from merge tag 'net-next-6.2'
  2023-06-21  8:46             ` Linux regression tracking (Thorsten Leemhuis)
@ 2023-06-21 17:56               ` Linus Torvalds
  2023-06-21 18:08                 ` Linus Torvalds
  2023-06-21 17:57               ` Jason A. Donenfeld
  2023-06-21 18:49               ` Jason A. Donenfeld
  2 siblings, 1 reply; 27+ messages in thread
From: Linus Torvalds @ 2023-06-21 17:56 UTC (permalink / raw)
  To: Linux regressions mailing list
  Cc: Jason A. Donenfeld, Andrew Lunn, Linux Stable, Bagas Sanjaya,
	Sami Korkalainen

On Wed, 21 Jun 2023 at 01:46, Linux regression tracking (Thorsten
Leemhuis) <regressions@leemhuis.info> wrote:
>
> Jason, in that case it seems this is something for you. For the initial
> report, see here:

I'll just revert it for now. Writing EFI variables has always been
fraught with danger - more so than just reading them - and this one
just looks horrible anyway.

Calling execute_with_initialized_rng() can end up having the callback
done under a spinlock with interrupts disabled, which is probably why
it then has that odd double indirection through a one-time work. And
in no situation should we start writing to EFI variables during early
subsystem initialization, I feel.
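
The "double indirection" pattern described here can be illustrated outside the kernel. What follows is a user-space sketch in C, not the kernel's notifier or workqueue API: every name in it (one_shot_work, notify_rng_ready, run_pending_work, refresh_seed_stub) is hypothetical. It only shows the idea that a callback which must not block records a request, and the slow operation runs later, at most once, from a context that is allowed to block.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a one-time deferred work item. */
struct one_shot_work {
	void (*fn)(void *ctx);	/* slow operation, may block */
	void *ctx;
	bool pending;		/* requested but not yet run */
	bool done;		/* already ran once */
};

/*
 * Notification callback: may be invoked from a context that must not
 * block (think: under a lock), so it only records the request.
 */
void notify_rng_ready(struct one_shot_work *w)
{
	if (!w->done)
		w->pending = true;
}

/*
 * Called later from a context that may block. Runs the slow operation
 * at most once and returns true if it ran on this call.
 */
bool run_pending_work(struct one_shot_work *w)
{
	assert(w != NULL && w->fn != NULL);

	if (!w->pending || w->done)
		return false;
	w->pending = false;
	w->done = true;
	w->fn(w->ctx);
	return true;
}

/* Stub for the slow operation: it just counts how often it was called. */
void refresh_seed_stub(void *ctx)
{
	(*(int *)ctx)++;
}
```

In this sketch, calling notify_rng_ready() any number of times leaves the counter behind refresh_seed_stub() untouched; only the first run_pending_work() after a notification increments it, and later notifications are ignored.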

It also probably shouldn't use the "set_variable" function at all, but
the non-blocking one, and who knows if it should try to do some
serialization with efi/vars.c.

I think it would be better off done in user space, but if we can't
trust user space to do the right thing, at least do it much much
later.

               Linus

* Re: [REGRESSION][BISECTED] Boot stall from merge tag 'net-next-6.2'
  2023-06-21  8:46             ` Linux regression tracking (Thorsten Leemhuis)
  2023-06-21 17:56               ` Linus Torvalds
@ 2023-06-21 17:57               ` Jason A. Donenfeld
  2023-06-23 13:55                 ` Ard Biesheuvel
  2023-06-21 18:49               ` Jason A. Donenfeld
  2 siblings, 1 reply; 27+ messages in thread
From: Jason A. Donenfeld @ 2023-06-21 17:57 UTC (permalink / raw)
  To: regressions, Ard Biesheuvel
  Cc: Andrew Lunn, Linus Torvalds, Linux Stable, Linux Regressions,
	Bagas Sanjaya, Sami Korkalainen

+Ard - any ideas here?

On Wed, Jun 21, 2023 at 10:46 AM Linux regression tracking (Thorsten
Leemhuis) <regressions@leemhuis.info> wrote:
>
> [added Jason (who authored the culprit) to the list of recipients; moved
> net people and list to BCC, guess they are not much interested in this
> anymore then]
>
> On 21.06.23 08:07, Sami Korkalainen wrote:
> > I bisected again. It seems I made some mistake last time, as I got a
> > different result this time. Maybe, because these problematic kernels may
> > boot fine sometimes, like I said before.
> >
> > Anyway, first bad commit (makes much more sense this time):
> > e7b813b32a42a3a6281a4fd9ae7700a0257c1d50 efi: random: refresh
> > non-volatile random seed when RNG is initialized
> >
> > I confirmed that this is the code causing the issue by commenting it
> > out (see the patch file). Without this code, the latest mainline boots fine.
>
> Jason, in that case it seems this is something for you. For the initial
> report, see here:
>
> https://lore.kernel.org/all/GQUnKz2al3yke5mB2i1kp3SzNHjK8vi6KJEh7rnLrOQ24OrlljeCyeWveLW9pICEmB9Qc8PKdNt3w1t_g3-Uvxq1l8Wj67PpoMeWDoH8PKk=@proton.me/
>
> Quoting a part of it:
>
> ```
> Linux 6.2 and newer are (mostly) unbootable on my old HP 6730b laptop,
> the 6.1.30 works still fine.
> The weirdest thing is that newer kernels (like 6.3.4 and 6.4-rc3) may
> boot ok on the first try, but when rebooting, the very same version
> doesn't boot.
>
> Some times, when trying to boot, I get this message repeated forever:
> ACPI Error: No handler or method for GPE [XX], disabling event
> (20221020/evgpe-839)
> On newer kernels, the date is 20230331 instead of 20221020. There is
> also some other error, but I can't read it as it gets overwritten by the
> other ACPI error, see image linked at the end.
>
> And some times, the screen will just stay completely blank.
>
> I tried booting with acpi=off, but it does not help.
> ```
> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> --
> Everything you wanna know about Linux kernel regression tracking:
> https://linux-regtracking.leemhuis.info/about/#tldr
> If I did something stupid, please tell me, as explained on that page.
>
> #regzbot introduced e7b813b32a42a3a6281a4fd9ae7700a0257c1d50

* Re: [REGRESSION][BISECTED] Boot stall from merge tag 'net-next-6.2'
  2023-06-21 17:56               ` Linus Torvalds
@ 2023-06-21 18:08                 ` Linus Torvalds
  2023-06-22 18:34                   ` Thorsten Leemhuis
  0 siblings, 1 reply; 27+ messages in thread
From: Linus Torvalds @ 2023-06-21 18:08 UTC (permalink / raw)
  To: Linux regressions mailing list
  Cc: Jason A. Donenfeld, Andrew Lunn, Linux Stable, Bagas Sanjaya,
	Sami Korkalainen

On Wed, 21 Jun 2023 at 10:56, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> I'll just revert it for now.

Btw, Thorsten, is there a good way to refer to the regzbot entry in a
commit message some way? I know about the email interface, but I'd
love to just be able to link to the regression entry. Now I just
linked to the report in this thread.

Maybe you don't keep a long-term stable link around anywhere, and you
just pick up on reverts directly, but I suspect it would be nice to be
able to just link to any regression entry directly.

            Linus

* Re: [REGRESSION][BISECTED] Boot stall from merge tag 'net-next-6.2'
  2023-06-21  8:46             ` Linux regression tracking (Thorsten Leemhuis)
  2023-06-21 17:56               ` Linus Torvalds
  2023-06-21 17:57               ` Jason A. Donenfeld
@ 2023-06-21 18:49               ` Jason A. Donenfeld
  2023-06-21 19:51                 ` Linus Torvalds
  2 siblings, 1 reply; 27+ messages in thread
From: Jason A. Donenfeld @ 2023-06-21 18:49 UTC (permalink / raw)
  To: Sami Korkalainen
  Cc: Andrew Lunn, Linus Torvalds, Linux Stable, Linux Regressions,
	Bagas Sanjaya, Brad Spengler, regressions

Hi Sami,

Would you try applying
https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?id=13bb06f8dd42071cb9a49f6e21099eea05d4b856
instead of the revert?

Spender (CC'd) suggested to me that the reason for your first
mis-bisect, and possibly for the result you wound up with, has more to do
with some non-determinism in the actual underlying bug that the above
commit fixes. If applying 13bb06f8dd4207 fixes the issue, then Linus
can revert the revert he just committed.

Jason

* Re: [REGRESSION][BISECTED] Boot stall from merge tag 'net-next-6.2'
  2023-06-21 18:49               ` Jason A. Donenfeld
@ 2023-06-21 19:51                 ` Linus Torvalds
  2023-06-22 13:40                   ` Jason A. Donenfeld
  0 siblings, 1 reply; 27+ messages in thread
From: Linus Torvalds @ 2023-06-21 19:51 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: Sami Korkalainen, Andrew Lunn, Linux Stable, Linux Regressions,
	Bagas Sanjaya, Brad Spengler, regressions

On Wed, 21 Jun 2023 at 11:49, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>
> Would you try applying
> https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?id=13bb06f8dd42071cb9a49f6e21099eea05d4b856
> instead of the revert?

That commit just got merged into my tree, and it fixes a real bug, but
it _shouldn't_ be what Sami sees.

The bug it fixes was only introduced in this merge window.

So any boot failures seen in older kernels would only be because it
was then backported to stable trees, but Sami mentions kernel versions
that don't have those stable backports (eg the original questionable
bisection that ended up on a bad commit 7e68dd7d07a2).

Now, with non-repeatable boot failures, anything is possible, and Sami
does mention 6.1.30 as good (implying that 6.1.31 might not be - and
that is when the backport happened).

So it's certainly worth checking out, but on the face of it, that
bisection result doesn't really support the bug being due to
e9523a0d81899 (which came *after* e7b813b32a42).

                Linus

* Re: [REGRESSION][BISECTED] Boot stall from merge tag 'net-next-6.2'
  2023-06-21 19:51                 ` Linus Torvalds
@ 2023-06-22 13:40                   ` Jason A. Donenfeld
  2023-06-22 19:25                     ` Sami Korkalainen
  0 siblings, 1 reply; 27+ messages in thread
From: Jason A. Donenfeld @ 2023-06-22 13:40 UTC (permalink / raw)
  To: Linus Torvalds, Sami Korkalainen
  Cc: Andrew Lunn, Linux Stable, Linux Regressions, Bagas Sanjaya,
	Brad Spengler, regressions

On Wed, Jun 21, 2023 at 9:51 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> Now, with non-repeatable boot failures, anything is possible, and Sami
> does mention 6.1.30 as good (implying that 6.1.31 might not be - and
> that is when the backport happened).
>
> So it's certainly worth checking out, but on the face of it, that
> bisection result doesn't really support the bug being due to
> e9523a0d81899 (which came *after* e7b813b32a42).

Sami - awaiting your results.

* Re: [REGRESSION][BISECTED] Boot stall from merge tag 'net-next-6.2'
  2023-06-21 18:08                 ` Linus Torvalds
@ 2023-06-22 18:34                   ` Thorsten Leemhuis
  0 siblings, 0 replies; 27+ messages in thread
From: Thorsten Leemhuis @ 2023-06-22 18:34 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jason A. Donenfeld, Andrew Lunn, Linux Stable, Bagas Sanjaya,
	Sami Korkalainen, Konstantin Ryabitsev,
	Linux regressions mailing list

[CCing Konstantin]

On 21.06.23 20:08, Linus Torvalds wrote:
> On Wed, 21 Jun 2023 at 10:56, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>>
>> I'll just revert it for now.
> 
> Btw, Thorsten, is there a good way to refer to the regzbot entry in a
> commit message some way? I know about the email interface, but I'd
> love to just be able to link to the regression entry.

There is a separate page for each tracked regression:

https://linux-regtracking.leemhuis.info/regzbot/regression/lore/GQUnKz2al3yke5mB2i1kp3SzNHjK8vi6KJEh7rnLrOQ24OrlljeCyeWveLW9pICEmB9Qc8PKdNt3w1t_g3-Uvxq1l8Wj67PpoMeWDoH8PKk=@proton.me/

FWIW, such pages existed earlier already, but before sending this reply
I wanted to fix a related bug that changed the url slightly. One can
find that link by clicking on "activity" in the regzbot webui (I need to
find a better place for this link to make it more approachable :-/ ).

And yes, in this case the URL sadly is rather long -- and the long msgid
is only partly to blame. If we really want to link there more regularly
I could work towards making that url shorter.

That being said: I wonder if we really want to add these links to commit
messages regularly. In case of this particular regression...

> Now I just linked to the report in this thread.

...the thread with the report basically contains nearly everything
relevant (except a link to the commit with the revert; but in this case
that's where the journey of a curious reader would start).

But yes, for regressions with a more complex history it's different, as
there the regzbot webui makes things a bit easier -- among others by
directly pointing to patches in the same or other threads that otherwise
are hard to find from the original thread, unless you know how to search
for them on lore.

I sometimes wonder if the real solution for this kind of problem would
be some bot (regzbot? bugbot?) that does something similar to the
pr-tracker-bot:

1) bot notices when a patch with a Link: or Closes: tag to a thread with
the msgid <foo> is posted or applied to next, mainline, or stable
2) bot posts a reply to <foo> with a short msg like "a patch that links
to this thread was (posted|merged); for details see <url>"

That would solve a few things (that might or might not be worth solving):

 * bug reporters would become aware of the progress in case the
developer forgets to CC them (which happens)

 * people that run into an issue and search for existing mailed reports
on lore currently have no simple way to find fixes that are already
under review or were applied somewhere already

That together with lore is also more likely to be long-term stable than
links to the regzbot webui.

Ciao, Thorsten

* Re: [REGRESSION][BISECTED] Boot stall from merge tag 'net-next-6.2'
  2023-06-22 13:40                   ` Jason A. Donenfeld
@ 2023-06-22 19:25                     ` Sami Korkalainen
  0 siblings, 0 replies; 27+ messages in thread
From: Sami Korkalainen @ 2023-06-22 19:25 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: Linus Torvalds, Andrew Lunn, Linux Stable, Linux Regressions,
	Bagas Sanjaya, Brad Spengler, regressions

> > Now, with non-repeatable boot failures, anything is possible, and Sami
> > does mention 6.1.30 as good (implying that 6.1.31 might not be - 

6.1.34 is ok

> Sami - awaiting your results.

I cherry-picked that commit 13bb06f8dd42071cb9a49f6e21099eea05d4b856
on top of 6.4-rc7 and it does not fix the issue for me.

—Sami

* Re: [REGRESSION][BISECTED] Boot stall from merge tag 'net-next-6.2'
  2023-06-21 17:57               ` Jason A. Donenfeld
@ 2023-06-23 13:55                 ` Ard Biesheuvel
  2023-06-23 17:29                   ` Linus Torvalds
  2023-06-23 18:20                   ` Sami Korkalainen
  0 siblings, 2 replies; 27+ messages in thread
From: Ard Biesheuvel @ 2023-06-23 13:55 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: regressions, Andrew Lunn, Linus Torvalds, Linux Stable,
	Linux Regressions, Bagas Sanjaya, Sami Korkalainen

On Wed, 21 Jun 2023 at 19:57, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>
> +Ard - any ideas here?
>
> On Wed, Jun 21, 2023 at 10:46 AM Linux regression tracking (Thorsten
> Leemhuis) <regressions@leemhuis.info> wrote:
> >
> > [added Jason (who authored the culprit) to the list of recipients; moved
> > net people and list to BCC, guess they are not much interested in this
> > anymore then]
> >
> > On 21.06.23 08:07, Sami Korkalainen wrote:
> > > I bisected again. It seems I made some mistake last time, as I got a
> > > different result this time. Maybe, because these problematic kernels may
> > > boot fine sometimes, like I said before.
> > >
> > > Anyway, first bad commit (makes much more sense this time):
> > > e7b813b32a42a3a6281a4fd9ae7700a0257c1d50 efi: random: refresh
> > > non-volatile random seed when RNG is initialized
> > >
> > > I confirmed that this is the code causing the issue by commenting it
> > > out (see the patch file). Without this code, the latest mainline boots fine.
> >
> > Jason, in that case it seems this is something for you. For the initial
> > report, see here:
> >
> > https://lore.kernel.org/all/GQUnKz2al3yke5mB2i1kp3SzNHjK8vi6KJEh7rnLrOQ24OrlljeCyeWveLW9pICEmB9Qc8PKdNt3w1t_g3-Uvxq1l8Wj67PpoMeWDoH8PKk=@proton.me/
> >
> > Quoting a part of it:
> >
> > ```
> > Linux 6.2 and newer are (mostly) unbootable on my old HP 6730b laptop,
> > the 6.1.30 works still fine.
> > The weirdest thing is that newer kernels (like 6.3.4 and 6.4-rc3) may
> > boot ok on the first try, but when rebooting, the very same version
> > doesn't boot.
> >
> > Some times, when trying to boot, I get this message repeated forever:
> > ACPI Error: No handler or method for GPE [XX], disabling event
> > (20221020/evgpe-839)
> > On newer kernels, the date is 20230331 instead of 20221020. There is
> > also some other error, but I can't read it as it gets overwritten by the
> > other ACPI error, see image linked at the end.
> >
> > And some times, the screen will just stay completely blank.
> >
> > I tried booting with acpi=off, but it does not help.

Catching up with email after my vacation, apologies for the delay.

This ship seems to have sailed in the meantime, but I'll contribute
some observations anyway.

The machine in question appears to be a Vista-era Windows laptop, and I
am not surprised at all that the firmware is flaky. In those days,
firmware testing was limited to boot testing Windows, and nobody
bothered testing for EFI compliance beyond that (as it is not needed
to get the Windows sticker).

However, the failure mode still strikes me as odd, and I'd be
interested in finding out whether booting with efi=noruntime makes a
difference at all, as that would prevent the SetVariable() call from
taking place, without affecting anything else.

Setting the variable from user space is ultimately a better choice, I
think. The reason it was avoided here is so that we don't have to
rely on user space to set limited permissions on the efivarfs file
entry in order to prevent the seed from being world readable (which is
something, e.g., systemd does today for other 'sensitive' EFI
variables, whatever that means). But given that this variable is in
its own GUIDed namespace, we could easily fix that in efivarfs itself.
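
As a rough illustration of the user-space approach described above, the
following sketch writes a seed through an efivarfs-style directory. It
relies on the efivarfs file layout (a 4-byte little-endian attribute
word followed by the variable data) and creates the file with mode 0600
so it is not world readable. The variable name "ExampleSeed", the
all-zero GUID, and the helper names are placeholders invented for this
example, and whether efivarfs honours the requested mode on creation is
an assumption, not something confirmed in this thread.

```python
import os
import struct

# Standard UEFI variable attribute bits.
EFI_VARIABLE_NON_VOLATILE = 0x1
EFI_VARIABLE_BOOTSERVICE_ACCESS = 0x2
EFI_VARIABLE_RUNTIME_ACCESS = 0x4


def efivar_payload(data: bytes, attributes: int) -> bytes:
    """Build an efivarfs file body: 32-bit LE attributes, then the data."""
    return struct.pack("<I", attributes) + data


def write_seed(efivars_dir: str, name: str, guid: str, seed: bytes) -> str:
    """Create <name>-<guid> under efivars_dir with owner-only permissions.

    Hypothetical helper: name and guid are supplied by the caller; this
    sketch does not hard-code the kernel's real seed variable.
    """
    path = os.path.join(efivars_dir, f"{name}-{guid}")
    attrs = (EFI_VARIABLE_NON_VOLATILE
             | EFI_VARIABLE_BOOTSERVICE_ACCESS
             | EFI_VARIABLE_RUNTIME_ACCESS)
    payload = efivar_payload(seed, attrs)
    # O_EXCL: fail rather than overwrite an existing variable.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    try:
        # The attributes and data go out in a single write() call.
        if os.write(fd, payload) != len(payload):
            raise OSError("short write to EFI variable file")
    finally:
        os.close(fd)
    return path
```

On a real system efivars_dir would be /sys/firmware/efi/efivars and the
seed would come from something like os.urandom(32); pointing it at a
scratch directory is enough to inspect the resulting file layout.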

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [REGRESSION][BISECTED] Boot stall from merge tag 'net-next-6.2'
  2023-06-23 13:55                 ` Ard Biesheuvel
@ 2023-06-23 17:29                   ` Linus Torvalds
  2023-06-23 20:30                     ` Jason A. Donenfeld
  2023-06-23 18:20                   ` Sami Korkalainen
  1 sibling, 1 reply; 27+ messages in thread
From: Linus Torvalds @ 2023-06-23 17:29 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Jason A. Donenfeld, regressions, Andrew Lunn, Linux Stable,
	Linux Regressions, Bagas Sanjaya, Sami Korkalainen

On Fri, 23 Jun 2023 at 06:55, Ard Biesheuvel <ardb@kernel.org> wrote:
>
> Setting the variable from user space is ultimately a better choice, I
> think.

Doing it from the kernel might still be an option, but I think it was
a huge mistake to do it *early*.

Early boot is fragile to begin with when not everything is set up, and
*much* harder to debug.

So not only are problems more likely to happen in the first place,
when they do happen they are a lot harder to figure out.

Maybe it would make more sense to write a new seed at kernel shutdown.
Not only do you presumably have a ton more entropy at that point, but
if things go sideways it's also less of a problem to have a dead
machine.

Of course, shutdown is another really hard to debug situation, so not
optimal either.

               Linus

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [REGRESSION][BISECTED] Boot stall from merge tag 'net-next-6.2'
  2023-06-23 13:55                 ` Ard Biesheuvel
  2023-06-23 17:29                   ` Linus Torvalds
@ 2023-06-23 18:20                   ` Sami Korkalainen
  2023-06-23 18:38                     ` Ard Biesheuvel
  1 sibling, 1 reply; 27+ messages in thread
From: Sami Korkalainen @ 2023-06-23 18:20 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Jason A. Donenfeld, regressions, Andrew Lunn, Linus Torvalds,
	Linux Stable, Linux Regressions, Bagas Sanjaya

> However, the failure mode still strikes me as odd, and I'd be
> interested in finding out whether booting with efi=noruntime makes a
> difference at all, as that would prevent the SetVariable() call from
> taking place, without affecting anything else.

No boot stall with efi=noruntime. Tested on 6.3.9 and 6.4-rc7.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [REGRESSION][BISECTED] Boot stall from merge tag 'net-next-6.2'
  2023-06-23 18:20                   ` Sami Korkalainen
@ 2023-06-23 18:38                     ` Ard Biesheuvel
  2023-06-23 19:01                       ` Linus Torvalds
  0 siblings, 1 reply; 27+ messages in thread
From: Ard Biesheuvel @ 2023-06-23 18:38 UTC (permalink / raw)
  To: Sami Korkalainen
  Cc: Jason A. Donenfeld, regressions, Andrew Lunn, Linus Torvalds,
	Linux Stable, Linux Regressions, Bagas Sanjaya

On Fri, 23 Jun 2023 at 20:20, Sami Korkalainen
<sami.korkalainen@proton.me> wrote:
>

Please don't send me encrypted emails.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [REGRESSION][BISECTED] Boot stall from merge tag 'net-next-6.2'
  2023-06-23 18:38                     ` Ard Biesheuvel
@ 2023-06-23 19:01                       ` Linus Torvalds
  0 siblings, 0 replies; 27+ messages in thread
From: Linus Torvalds @ 2023-06-23 19:01 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Sami Korkalainen, Jason A. Donenfeld, regressions, Andrew Lunn,
	Linux Stable, Linux Regressions, Bagas Sanjaya

On Fri, 23 Jun 2023 at 11:39, Ard Biesheuvel <ardb@kernel.org> wrote:
>
> On Fri, 23 Jun 2023 at 20:20, Sami Korkalainen
> <sami.korkalainen@proton.me> wrote:
> >
>
> Please don't send me encrypted emails.

Heh. That must be protonmail doing some crazy stuff based on
recipient. Here's Sami's email on the lists:

   https://lore.kernel.org/all/CzNbNfn7R2cqLMD6_jp11Dku0OoXYJhx2AMfk8JXeQVP2EGdt7tqeYD4HH0COhp2o_yj5kN6Ao7oObSelRi8yiz-5ltbQ2xtjBvplvgcZjo=@proton.me/

(and it's what I got too). No encryption anywhere, just the message ID
from hell.

So for some reason protonmail decided that *you* are special, and
singled you out for their super sekrit encryption. Presumably because
Sami has your pgp key.

                  Linus

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [REGRESSION][BISECTED] Boot stall from merge tag 'net-next-6.2'
  2023-06-23 17:29                   ` Linus Torvalds
@ 2023-06-23 20:30                     ` Jason A. Donenfeld
  2023-06-23 21:52                       ` Linus Torvalds
  0 siblings, 1 reply; 27+ messages in thread
From: Jason A. Donenfeld @ 2023-06-23 20:30 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ard Biesheuvel, regressions, Andrew Lunn, Linux Stable,
	Linux Regressions, Bagas Sanjaya, Sami Korkalainen

Hi Linus, Ard,

On Fri, Jun 23, 2023 at 7:30 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> Maybe it would make more sense to write a new seed at kernel shutdown.
> Not only do you presumably have a ton more entropy at that point, but
> if things go sideways it's also less of a problem to have a dead
> machine.

We always have to write when using so that we don't credit the same
seed twice, so it's gotta be used at a stage when SetVariable is
somewhat working.

> On Fri, 23 Jun 2023 at 06:55, Ard Biesheuvel <ardb@kernel.org> wrote:
> >
> > Setting the variable from user space is ultimately a better choice, I
> > think.
>
> Doing it from the kernel might still be an option, but I think it was
> a huge mistake to do it *early*.
>
> Early boot is fragile to begin with when not everything is set up, and
> *much* harder to debug.
>
> So not only are problems more likely to happen in the first place,
> when they do happen they are a lot harder to figure out.

I think it's still worth doing in the kernel - or trying to do, at least.

I wonder why SetVariable is failing on this system, and whether
there's a way to work around it. If we wind up needing to quirk around
it somewhat, then I suspect your suggestion of not doing this as early
in boot might be wise. Specifically, what if we do this after
workqueues are available and do it from one of them? That's still
early enough in boot that it makes the feature useful, but the
scheduler is alive at that point. Then in the worst case, we just get
a wq stall splat, which the user is able to report, and then can
figure out what to do from there.
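
The idea of moving the write off the critical boot path can be modelled
in user space roughly as follows. This is only an analogy using a Python
thread, not the kernel workqueue API: write_fn stands in for the
firmware call, and both helper names are made up for this sketch. The
point it shows is that a hung write turns into a reportable "stalled"
condition instead of blocking the caller forever.

```python
import threading


def start_deferred_write(write_fn):
    """Run write_fn on a background thread; return an Event set on completion."""
    done = threading.Event()

    def worker():
        write_fn()   # may block indefinitely if the "firmware" misbehaves
        done.set()

    # daemon=True so a hung worker cannot keep the process alive.
    threading.Thread(target=worker, daemon=True).start()
    return done


def report_if_stalled(done, timeout_s):
    """Return "ok" if the deferred write finished within timeout_s seconds."""
    if done.wait(timeout_s):
        return "ok"
    return "stalled: deferred seed write did not finish"
```

The caller continues with its own work after start_deferred_write() and
checks the event later, which is the property that makes a stall
visible and reportable.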

Jason

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [REGRESSION][BISECTED] Boot stall from merge tag 'net-next-6.2'
  2023-06-23 20:30                     ` Jason A. Donenfeld
@ 2023-06-23 21:52                       ` Linus Torvalds
  2023-06-23 22:55                         ` Ard Biesheuvel
  2023-06-25 14:40                         ` Jason A. Donenfeld
  0 siblings, 2 replies; 27+ messages in thread
From: Linus Torvalds @ 2023-06-23 21:52 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: Ard Biesheuvel, regressions, Andrew Lunn, Linux Stable,
	Linux Regressions, Bagas Sanjaya, Sami Korkalainen

On Fri, 23 Jun 2023 at 13:31, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>
> We always have to write when using so that we don't credit the same
> seed twice, so it's gotta be used at a stage when SetVariable is
> somewhat working.

This code isn't even the code that "uses" the alleged entropy from
that EFI variable in the first place. That's the code in
efi_random_get_seed() in the EFI boot sequence, and appends it to the
bootup randomness buffers.

And that code already seems to clear the EFI variable (or seems to
append to it).

So this argument seems to be complete garbage - we absolutely do not
have to write it, and your patch already just wrote it in the wrong
place anyway.

Don't make excuses. That code caused boot failures, it was all done in
the wrong place, and at entirely the wrong time.

                  Linus

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [REGRESSION][BISECTED] Boot stall from merge tag 'net-next-6.2'
  2023-06-23 21:52                       ` Linus Torvalds
@ 2023-06-23 22:55                         ` Ard Biesheuvel
  2023-06-23 23:02                           ` Linus Torvalds
  2023-06-25 14:40                         ` Jason A. Donenfeld
  1 sibling, 1 reply; 27+ messages in thread
From: Ard Biesheuvel @ 2023-06-23 22:55 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jason A. Donenfeld, regressions, Andrew Lunn, Linux Stable,
	Linux Regressions, Bagas Sanjaya, Sami Korkalainen

On Fri, 23 Jun 2023 at 23:52, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Fri, 23 Jun 2023 at 13:31, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> >
> > We always have to write when using so that we don't credit the same
> > seed twice, so it's gotta be used at a stage when SetVariable is
> > somewhat working.
>
> This code isn't even the code that "uses" the alleged entropy from
> that EFI variable in the first place. That's the code in
> efi_random_get_seed() in the EFI boot sequence, and appends it to the
> bootup randomness buffers.
>
> And that code already seems to clear the EFI variable (or seems to
> append to it).
>

It reads the variable twice (once to obtain the size and once to grab
the data), and replaces it with a zero-length string, which causes the
variable to disappear. (This is typically NOR flash with spare blocks
managed by a fault tolerant write layer in software, and so really
wiping the seed or overwriting it is not generally possible)
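
The read-twice-then-delete sequence described above can be sketched
against a mock variable store. MockVariableStore and consume_seed are
invented for this illustration and are not real EFI bindings; the status
strings only mimic the usual "buffer too small" convention, where the
first call reports the required size and the second call fetches the
data.

```python
class MockVariableStore:
    """Stand-in for firmware variable services (illustrative only)."""

    def __init__(self):
        self._vars = {}

    def set_variable(self, key, data: bytes):
        if len(data) == 0:
            # Writing a zero-length value makes the variable disappear.
            self._vars.pop(key, None)
        else:
            self._vars[key] = bytes(data)

    def get_variable(self, key, buffer_size: int):
        """Return (status, size, data), mimicking a sized-buffer read."""
        if key not in self._vars:
            return ("NOT_FOUND", 0, b"")
        data = self._vars[key]
        if buffer_size < len(data):
            return ("BUFFER_TOO_SMALL", len(data), b"")
        return ("SUCCESS", len(data), data)


def consume_seed(store, key):
    """Read the seed (size query, then data) and delete the variable."""
    status, size, _ = store.get_variable(key, 0)        # first read: size only
    if status != "BUFFER_TOO_SMALL":
        return None                                     # no seed present
    status, _, data = store.get_variable(key, size)     # second read: the data
    if status != "SUCCESS":
        return None
    store.set_variable(key, b"")                        # zero-length write deletes
    return data
```

Calling consume_seed() a second time on the same key returns None,
which corresponds to the seed not being credited twice.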

Using SetVariable() from boot services to delete a variable is highly
unlikely to regress older systems in a similar way.

> So this argument seems to be complete garbage - we absolutely do not
> have to write it, and your patch already just wrote it in the wrong
> place anyway.
>
> Don't make excuses. That code caused boot failures, it was all done in
> the wrong place, and at entirely the wrong time.
>

With the revert applied, the kernel/EFI stub only consumes the
variable and deletes it, but never creates it by itself, and so the
code does nothing if the variable is never created in the first place.

If we leave it up to user space to create it, we won't need any policy
or quirks handling in the kernel at all, which I'd prefer. The only
thing we should do is special case the variable's scope GUID in
efivarfs so the file is not created world-readable like we do for
other variables. (This predates my involvement but I think this was an
oversight). Using efivarfs will also ensure that the 'storage
paranoia' logic is used on x86. (This is something I failed to take
into account when I reviewed Jason's patch)

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [REGRESSION][BISECTED] Boot stall from merge tag 'net-next-6.2'
  2023-06-23 22:55                         ` Ard Biesheuvel
@ 2023-06-23 23:02                           ` Linus Torvalds
  2023-06-25 15:36                             ` David Laight
  0 siblings, 1 reply; 27+ messages in thread
From: Linus Torvalds @ 2023-06-23 23:02 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Jason A. Donenfeld, regressions, Andrew Lunn, Linux Stable,
	Linux Regressions, Bagas Sanjaya, Sami Korkalainen

On Fri, 23 Jun 2023 at 15:55, Ard Biesheuvel <ardb@kernel.org> wrote:
>
> With the revert applied, the kernel/EFI stub only consumes the
> variable and deletes it, but never creates it by itself, and so the
> code does nothing if the variable is never created in the first place.

Right.

But my *point* was that if we want to create it, we DAMN WELL DO NOT
WANT TO DO SO AT BOOT TIME.

Boot time is absolutely the worst possible time to do it.

We'd be much better off doing so at shutdown time, when we at least
have (a) maximal entropy and (b) failures are less critical.

Jason's argument against that was pure and utter BS.

Now, there are real arguments against shutdown time: it too is
horrible to debug. So shutdown is not exactly great either. It's
better than bootup, but it really would be better to do it at a point
where we can actually get reasonable results out if something goes
wrong. Which it clearly did.

              Linus

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [REGRESSION][BISECTED] Boot stall from merge tag 'net-next-6.2'
  2023-06-23 21:52                       ` Linus Torvalds
  2023-06-23 22:55                         ` Ard Biesheuvel
@ 2023-06-25 14:40                         ` Jason A. Donenfeld
  1 sibling, 0 replies; 27+ messages in thread
From: Jason A. Donenfeld @ 2023-06-25 14:40 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ard Biesheuvel, regressions, Andrew Lunn, Linux Stable,
	Linux Regressions, Bagas Sanjaya, Sami Korkalainen

On Fri, Jun 23, 2023 at 02:52:25PM -0700, Linus Torvalds wrote:
> On Fri, 23 Jun 2023 at 13:31, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> >
> > We always have to write when using so that we don't credit the same
> > seed twice, so it's gotta be used at a stage when SetVariable is
> > somewhat working.
> 
> This code isn't even the code that "uses" the alleged entropy from
> that EFI variable in the first place. That's the code in
> efi_random_get_seed() in the EFI boot sequence, and appends it to the
> bootup randomness buffers.
> 
> And that code already seems to clear the EFI variable (or seems to
> append to it).

Oh, doh, yea, you're right. Sorry. My mistake.

So indeed, we can probably get away with just delaying this until much
later in boot, and doing this inside of a workqueue or similar, instead
of in some special early boot context. Or maybe shutdown? Shutdown seems
like it'd better handle potential firmware issues since hanging on
shutdown is a lot better than hanging on boot. But it would be nice to
keep this working during unclean shutdown, which maybe means doing it
sometime after bootup is still better.

> So this argument seems to be complete garbage - we absolutely do not
> have to write it, and your patch already just wrote it in the wrong
> place anyway.
> 
> Don't make excuses. That code caused boot failures, it was all done in
> the wrong place, and at entirely the wrong time.

Yes, my point was entirely wrong. I was mistaken. But it wasn't an
*excuse*. I was just momentarily confused. No malice here, I promise.

Jason

^ permalink raw reply	[flat|nested] 27+ messages in thread

* RE: [REGRESSION][BISECTED] Boot stall from merge tag 'net-next-6.2'
  2023-06-23 23:02                           ` Linus Torvalds
@ 2023-06-25 15:36                             ` David Laight
  0 siblings, 0 replies; 27+ messages in thread
From: David Laight @ 2023-06-25 15:36 UTC (permalink / raw)
  To: 'Linus Torvalds', Ard Biesheuvel
  Cc: Jason A. Donenfeld, regressions, Andrew Lunn, Linux Stable,
	Linux Regressions, Bagas Sanjaya, Sami Korkalainen

From: Linus Torvalds
> Sent: 24 June 2023 00:03
> 
> On Fri, 23 Jun 2023 at 15:55, Ard Biesheuvel <ardb@kernel.org> wrote:
> >
> > With the revert applied, the kernel/EFI stub only consumes the
> > variable and deletes it, but never creates it by itself, and so the
> > code does nothing if the variable is never created in the first place.
> 
> Right.
> 
> But my *point* was that if we want to create it, we DAMN WELL DO NOT
> WANT TO DO SO AT BOOT TIME.
> 
> Boot time is absolutely the worst possible time to do it.
> 
> We'd be much better off doing so at shutdown time, when we at least
> have (a) maximal entropy and (b) failures are less critical.

Or maybe better - especially for embedded systems which don't
often get shut down properly (or any where someone can force
a system crash and then get no saved entropy) - after the system
has been running long enough to get a reasonable amount of
entropy.

Also, why delete the entropy during boot?
Clearly it is sub-optimal to use it twice, but that has to
be better than not using any at all?

	David


^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2023-06-25 15:36 UTC | newest]

Thread overview: 27+ messages
2023-05-26 19:17 [REGRESSION][BISECTED] Boot stall from merge tag 'net-next-6.2' Sami Korkalainen
2023-05-27  1:17 ` Bagas Sanjaya
2023-05-27  4:07   ` Sami Korkalainen
2023-06-12 14:07     ` Bagas Sanjaya
2023-06-12 19:05       ` Sami Korkalainen
2023-06-12 19:50         ` Andrew Lunn
2023-06-21  6:07           ` Sami Korkalainen
2023-06-21  8:46             ` Linux regression tracking (Thorsten Leemhuis)
2023-06-21 17:56               ` Linus Torvalds
2023-06-21 18:08                 ` Linus Torvalds
2023-06-22 18:34                   ` Thorsten Leemhuis
2023-06-21 17:57               ` Jason A. Donenfeld
2023-06-23 13:55                 ` Ard Biesheuvel
2023-06-23 17:29                   ` Linus Torvalds
2023-06-23 20:30                     ` Jason A. Donenfeld
2023-06-23 21:52                       ` Linus Torvalds
2023-06-23 22:55                         ` Ard Biesheuvel
2023-06-23 23:02                           ` Linus Torvalds
2023-06-25 15:36                             ` David Laight
2023-06-25 14:40                         ` Jason A. Donenfeld
2023-06-23 18:20                   ` Sami Korkalainen
2023-06-23 18:38                     ` Ard Biesheuvel
2023-06-23 19:01                       ` Linus Torvalds
2023-06-21 18:49               ` Jason A. Donenfeld
2023-06-21 19:51                 ` Linus Torvalds
2023-06-22 13:40                   ` Jason A. Donenfeld
2023-06-22 19:25                     ` Sami Korkalainen
