All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] do not clean dummy variable in kexec path
@ 2019-08-05  8:35 ` Dave Young
  0 siblings, 0 replies; 18+ messages in thread
From: Dave Young @ 2019-08-05  8:35 UTC (permalink / raw)
  To: linux-efi, ard.biesheuvel; +Cc: kexec, linux-kernel, matthewgarrett, bhsharma

kexec reboot fails randomly in UEFI based kvm guest.  The firmware
just reset while calling efi_delete_dummy_variable();  Unfortunately
I don't know how to debug the firmware, it is also possible a potential
problem on real hardware as well although nobody reproduced it.

The intention of efi_delete_dummy_variable is to trigger garbage collection
when entering virtual mode.  But SetVirtualAddressMap can only run once
for each physical reboot, thus kexec_enter_virtual_mode is not necessarily
a good place to clean dummy object.

Drop efi_delete_dummy_variable so that kexec reboot can work.

Signed-off-by: Dave Young <dyoung@redhat.com>
---
 arch/x86/platform/efi/efi.c |    3 ---
 1 file changed, 3 deletions(-)

--- linux-x86.orig/arch/x86/platform/efi/efi.c
+++ linux-x86/arch/x86/platform/efi/efi.c
@@ -894,9 +894,6 @@ static void __init kexec_enter_virtual_m
 
 	if (efi_enabled(EFI_OLD_MEMMAP) && (__supported_pte_mask & _PAGE_NX))
 		runtime_code_page_mkexec();
-
-	/* clean DUMMY object */
-	efi_delete_dummy_variable();
 #endif
 }
 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH] do not clean dummy variable in kexec path
@ 2019-08-05  8:35 ` Dave Young
  0 siblings, 0 replies; 18+ messages in thread
From: Dave Young @ 2019-08-05  8:35 UTC (permalink / raw)
  To: linux-efi, ard.biesheuvel; +Cc: matthewgarrett, bhsharma, kexec, linux-kernel

kexec reboot fails randomly in UEFI based kvm guest.  The firmware
just reset while calling efi_delete_dummy_variable();  Unfortunately
I don't know how to debug the firmware, it is also possible a potential
problem on real hardware as well although nobody reproduced it.

The intention of efi_delete_dummy_variable is to trigger garbage collection
when entering virtual mode.  But SetVirtualAddressMap can only run once
for each physical reboot, thus kexec_enter_virtual_mode is not necessarily
a good place to clean dummy object.

Drop efi_delete_dummy_variable so that kexec reboot can work.

Signed-off-by: Dave Young <dyoung@redhat.com>
---
 arch/x86/platform/efi/efi.c |    3 ---
 1 file changed, 3 deletions(-)

--- linux-x86.orig/arch/x86/platform/efi/efi.c
+++ linux-x86/arch/x86/platform/efi/efi.c
@@ -894,9 +894,6 @@ static void __init kexec_enter_virtual_m
 
 	if (efi_enabled(EFI_OLD_MEMMAP) && (__supported_pte_mask & _PAGE_NX))
 		runtime_code_page_mkexec();
-
-	/* clean DUMMY object */
-	efi_delete_dummy_variable();
 #endif
 }
 

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] do not clean dummy variable in kexec path
  2019-08-05  8:35 ` Dave Young
@ 2019-08-05 15:55   ` Ard Biesheuvel
  -1 siblings, 0 replies; 18+ messages in thread
From: Ard Biesheuvel @ 2019-08-05 15:55 UTC (permalink / raw)
  To: Dave Young
  Cc: linux-efi, Kexec Mailing List, Linux Kernel Mailing List,
	Matthew Garrett, Bhupesh Sharma

On Mon, 5 Aug 2019 at 11:36, Dave Young <dyoung@redhat.com> wrote:
>
> kexec reboot fails randomly in UEFI based kvm guest.  The firmware
> just reset while calling efi_delete_dummy_variable();  Unfortunately
> I don't know how to debug the firmware, it is also possible a potential
> problem on real hardware as well although nobody reproduced it.
>
> The intention of efi_delete_dummy_variable is to trigger garbage collection
> when entering virtual mode.  But SetVirtualAddressMap can only run once
> for each physical reboot, thus kexec_enter_virtual_mode is not necessarily
> a good place to clean dummy object.
>

I would argue that this means it is not a good place to *create* the
dummy variable, and if we don't create it, we don't have to delete it
either.

> Drop efi_delete_dummy_variable so that kexec reboot can work.
>

Creating it and not deleting it is bad, so please try and see if we
can omit the creation on this code path instead.


> Signed-off-by: Dave Young <dyoung@redhat.com>
> ---
>  arch/x86/platform/efi/efi.c |    3 ---
>  1 file changed, 3 deletions(-)
>
> --- linux-x86.orig/arch/x86/platform/efi/efi.c
> +++ linux-x86/arch/x86/platform/efi/efi.c
> @@ -894,9 +894,6 @@ static void __init kexec_enter_virtual_m
>
>         if (efi_enabled(EFI_OLD_MEMMAP) && (__supported_pte_mask & _PAGE_NX))
>                 runtime_code_page_mkexec();
> -
> -       /* clean DUMMY object */
> -       efi_delete_dummy_variable();
>  #endif
>  }
>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] do not clean dummy variable in kexec path
@ 2019-08-05 15:55   ` Ard Biesheuvel
  0 siblings, 0 replies; 18+ messages in thread
From: Ard Biesheuvel @ 2019-08-05 15:55 UTC (permalink / raw)
  To: Dave Young
  Cc: Matthew Garrett, Bhupesh Sharma, linux-efi, Kexec Mailing List,
	Linux Kernel Mailing List

On Mon, 5 Aug 2019 at 11:36, Dave Young <dyoung@redhat.com> wrote:
>
> kexec reboot fails randomly in UEFI based kvm guest.  The firmware
> just reset while calling efi_delete_dummy_variable();  Unfortunately
> I don't know how to debug the firmware, it is also possible a potential
> problem on real hardware as well although nobody reproduced it.
>
> The intention of efi_delete_dummy_variable is to trigger garbage collection
> when entering virtual mode.  But SetVirtualAddressMap can only run once
> for each physical reboot, thus kexec_enter_virtual_mode is not necessarily
> a good place to clean dummy object.
>

I would argue that this means it is not a good place to *create* the
dummy variable, and if we don't create it, we don't have to delete it
either.

> Drop efi_delete_dummy_variable so that kexec reboot can work.
>

Creating it and not deleting it is bad, so please try and see if we
can omit the creation on this code path instead.


> Signed-off-by: Dave Young <dyoung@redhat.com>
> ---
>  arch/x86/platform/efi/efi.c |    3 ---
>  1 file changed, 3 deletions(-)
>
> --- linux-x86.orig/arch/x86/platform/efi/efi.c
> +++ linux-x86/arch/x86/platform/efi/efi.c
> @@ -894,9 +894,6 @@ static void __init kexec_enter_virtual_m
>
>         if (efi_enabled(EFI_OLD_MEMMAP) && (__supported_pte_mask & _PAGE_NX))
>                 runtime_code_page_mkexec();
> -
> -       /* clean DUMMY object */
> -       efi_delete_dummy_variable();
>  #endif
>  }
>

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] do not clean dummy variable in kexec path
  2019-08-05  8:35 ` Dave Young
@ 2019-08-05 17:09   ` Matthew Garrett
  -1 siblings, 0 replies; 18+ messages in thread
From: Matthew Garrett @ 2019-08-05 17:09 UTC (permalink / raw)
  To: Dave Young
  Cc: linux-efi, Ard Biesheuvel, kexec, Linux Kernel Mailing List, bhsharma

On Mon, Aug 5, 2019 at 1:36 AM Dave Young <dyoung@redhat.com> wrote:
>
> kexec reboot fails randomly in UEFI based kvm guest.  The firmware
> just reset while calling efi_delete_dummy_variable();  Unfortunately
> I don't know how to debug the firmware, it is also possible a potential
> problem on real hardware as well although nobody reproduced it.
>
> The intention of efi_delete_dummy_variable is to trigger garbage collection
> when entering virtual mode.  But SetVirtualAddressMap can only run once
> for each physical reboot, thus kexec_enter_virtual_mode is not necessarily
> a good place to clean dummy object.

I agree that this isn't necessarily the best place to do this in the
kexec case, but given we control the firmware, figuring out what's
actually breaking seems like a good plan.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] do not clean dummy variable in kexec path
@ 2019-08-05 17:09   ` Matthew Garrett
  0 siblings, 0 replies; 18+ messages in thread
From: Matthew Garrett @ 2019-08-05 17:09 UTC (permalink / raw)
  To: Dave Young
  Cc: bhsharma, linux-efi, kexec, Linux Kernel Mailing List, Ard Biesheuvel

On Mon, Aug 5, 2019 at 1:36 AM Dave Young <dyoung@redhat.com> wrote:
>
> kexec reboot fails randomly in UEFI based kvm guest.  The firmware
> just reset while calling efi_delete_dummy_variable();  Unfortunately
> I don't know how to debug the firmware, it is also possible a potential
> problem on real hardware as well although nobody reproduced it.
>
> The intention of efi_delete_dummy_variable is to trigger garbage collection
> when entering virtual mode.  But SetVirtualAddressMap can only run once
> for each physical reboot, thus kexec_enter_virtual_mode is not necessarily
> a good place to clean dummy object.

I agree that this isn't necessarily the best place to do this in the
kexec case, but given we control the firmware, figuring out what's
actually breaking seems like a good plan.

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] do not clean dummy variable in kexec path
  2019-08-05 15:55   ` Ard Biesheuvel
@ 2019-08-06  2:41     ` Dave Young
  -1 siblings, 0 replies; 18+ messages in thread
From: Dave Young @ 2019-08-06  2:41 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-efi, Kexec Mailing List, Linux Kernel Mailing List,
	Matthew Garrett, Bhupesh Sharma

On 08/05/19 at 06:55pm, Ard Biesheuvel wrote:
> On Mon, 5 Aug 2019 at 11:36, Dave Young <dyoung@redhat.com> wrote:
> >
> > kexec reboot fails randomly in UEFI based kvm guest.  The firmware
> > just reset while calling efi_delete_dummy_variable();  Unfortunately
> > I don't know how to debug the firmware, it is also possible a potential
> > problem on real hardware as well although nobody reproduced it.
> >
> > The intention of efi_delete_dummy_variable is to trigger garbage collection
> > when entering virtual mode.  But SetVirtualAddressMap can only run once
> > for each physical reboot, thus kexec_enter_virtual_mode is not necessarily
> > a good place to clean dummy object.
> >
> 
> I would argue that this means it is not a good place to *create* the
> dummy variable, and if we don't create it, we don't have to delete it
> either.
> 
> > Drop efi_delete_dummy_variable so that kexec reboot can work.
> >
> 
> Creating it and not deleting it is bad, so please try and see if we
> can omit the creation on this code path instead.

I'm not sure in this case the var is created or not, the logic seems
tricky to me.  It seems to me it is intend to force delete a non-exist
var here.

Matthew, can you comment here about Ard's question?

> 
> 
> > Signed-off-by: Dave Young <dyoung@redhat.com>
> > ---
> >  arch/x86/platform/efi/efi.c |    3 ---
> >  1 file changed, 3 deletions(-)
> >
> > --- linux-x86.orig/arch/x86/platform/efi/efi.c
> > +++ linux-x86/arch/x86/platform/efi/efi.c
> > @@ -894,9 +894,6 @@ static void __init kexec_enter_virtual_m
> >
> >         if (efi_enabled(EFI_OLD_MEMMAP) && (__supported_pte_mask & _PAGE_NX))
> >                 runtime_code_page_mkexec();
> > -
> > -       /* clean DUMMY object */
> > -       efi_delete_dummy_variable();
> >  #endif
> >  }
> >

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] do not clean dummy variable in kexec path
@ 2019-08-06  2:41     ` Dave Young
  0 siblings, 0 replies; 18+ messages in thread
From: Dave Young @ 2019-08-06  2:41 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Matthew Garrett, Bhupesh Sharma, linux-efi, Kexec Mailing List,
	Linux Kernel Mailing List

On 08/05/19 at 06:55pm, Ard Biesheuvel wrote:
> On Mon, 5 Aug 2019 at 11:36, Dave Young <dyoung@redhat.com> wrote:
> >
> > kexec reboot fails randomly in UEFI based kvm guest.  The firmware
> > just reset while calling efi_delete_dummy_variable();  Unfortunately
> > I don't know how to debug the firmware, it is also possible a potential
> > problem on real hardware as well although nobody reproduced it.
> >
> > The intention of efi_delete_dummy_variable is to trigger garbage collection
> > when entering virtual mode.  But SetVirtualAddressMap can only run once
> > for each physical reboot, thus kexec_enter_virtual_mode is not necessarily
> > a good place to clean dummy object.
> >
> 
> I would argue that this means it is not a good place to *create* the
> dummy variable, and if we don't create it, we don't have to delete it
> either.
> 
> > Drop efi_delete_dummy_variable so that kexec reboot can work.
> >
> 
> Creating it and not deleting it is bad, so please try and see if we
> can omit the creation on this code path instead.

I'm not sure in this case the var is created or not, the logic seems
tricky to me.  It seems to me it is intend to force delete a non-exist
var here.

Matthew, can you comment here about Ard's question?

> 
> 
> > Signed-off-by: Dave Young <dyoung@redhat.com>
> > ---
> >  arch/x86/platform/efi/efi.c |    3 ---
> >  1 file changed, 3 deletions(-)
> >
> > --- linux-x86.orig/arch/x86/platform/efi/efi.c
> > +++ linux-x86/arch/x86/platform/efi/efi.c
> > @@ -894,9 +894,6 @@ static void __init kexec_enter_virtual_m
> >
> >         if (efi_enabled(EFI_OLD_MEMMAP) && (__supported_pte_mask & _PAGE_NX))
> >                 runtime_code_page_mkexec();
> > -
> > -       /* clean DUMMY object */
> > -       efi_delete_dummy_variable();
> >  #endif
> >  }
> >

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] do not clean dummy variable in kexec path
  2019-08-05 17:09   ` Matthew Garrett
@ 2019-08-06  2:44     ` Dave Young
  -1 siblings, 0 replies; 18+ messages in thread
From: Dave Young @ 2019-08-06  2:44 UTC (permalink / raw)
  To: Matthew Garrett
  Cc: linux-efi, Ard Biesheuvel, kexec, Linux Kernel Mailing List, bhsharma

On 08/05/19 at 10:09am, Matthew Garrett wrote:
> On Mon, Aug 5, 2019 at 1:36 AM Dave Young <dyoung@redhat.com> wrote:
> >
> > kexec reboot fails randomly in UEFI based kvm guest.  The firmware
> > just reset while calling efi_delete_dummy_variable();  Unfortunately
> > I don't know how to debug the firmware, it is also possible a potential
> > problem on real hardware as well although nobody reproduced it.
> >
> > The intention of efi_delete_dummy_variable is to trigger garbage collection
> > when entering virtual mode.  But SetVirtualAddressMap can only run once
> > for each physical reboot, thus kexec_enter_virtual_mode is not necessarily
> > a good place to clean dummy object.
> 
> I agree that this isn't necessarily the best place to do this in the
> kexec case, but given we control the firmware, figuring out what's
> actually breaking seems like a good plan.

I'm more than glad to get the root cause, if you can help on debugging I
would like to share the efi var file etc.

But it is indeed a problem cause weird reset on end user part, but even if we can
not find the root cause (in firmware..)  I think we still need avoid it
with such workaround.

Thanks
Dave

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] do not clean dummy variable in kexec path
@ 2019-08-06  2:44     ` Dave Young
  0 siblings, 0 replies; 18+ messages in thread
From: Dave Young @ 2019-08-06  2:44 UTC (permalink / raw)
  To: Matthew Garrett
  Cc: bhsharma, linux-efi, kexec, Linux Kernel Mailing List, Ard Biesheuvel

On 08/05/19 at 10:09am, Matthew Garrett wrote:
> On Mon, Aug 5, 2019 at 1:36 AM Dave Young <dyoung@redhat.com> wrote:
> >
> > kexec reboot fails randomly in UEFI based kvm guest.  The firmware
> > just reset while calling efi_delete_dummy_variable();  Unfortunately
> > I don't know how to debug the firmware, it is also possible a potential
> > problem on real hardware as well although nobody reproduced it.
> >
> > The intention of efi_delete_dummy_variable is to trigger garbage collection
> > when entering virtual mode.  But SetVirtualAddressMap can only run once
> > for each physical reboot, thus kexec_enter_virtual_mode is not necessarily
> > a good place to clean dummy object.
> 
> I agree that this isn't necessarily the best place to do this in the
> kexec case, but given we control the firmware, figuring out what's
> actually breaking seems like a good plan.

I'm more than glad to get the root cause, if you can help on debugging I
would like to share the efi var file etc.

But it is indeed a problem cause weird reset on end user part, but even if we can
not find the root cause (in firmware..)  I think we still need avoid it
with such workaround.

Thanks
Dave

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] do not clean dummy variable in kexec path
  2019-08-05 15:55   ` Ard Biesheuvel
@ 2019-08-08  7:49     ` Dave Young
  -1 siblings, 0 replies; 18+ messages in thread
From: Dave Young @ 2019-08-08  7:49 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-efi, Kexec Mailing List, Linux Kernel Mailing List,
	Matthew Garrett, Bhupesh Sharma

On 08/05/19 at 06:55pm, Ard Biesheuvel wrote:
> On Mon, 5 Aug 2019 at 11:36, Dave Young <dyoung@redhat.com> wrote:
> >
> > kexec reboot fails randomly in UEFI based kvm guest.  The firmware
> > just reset while calling efi_delete_dummy_variable();  Unfortunately
> > I don't know how to debug the firmware, it is also possible a potential
> > problem on real hardware as well although nobody reproduced it.
> >
> > The intention of efi_delete_dummy_variable is to trigger garbage collection
> > when entering virtual mode.  But SetVirtualAddressMap can only run once
> > for each physical reboot, thus kexec_enter_virtual_mode is not necessarily
> > a good place to clean dummy object.
> >
> 
> I would argue that this means it is not a good place to *create* the
> dummy variable, and if we don't create it, we don't have to delete it
> either.
> 
> > Drop efi_delete_dummy_variable so that kexec reboot can work.
> >
> 
> Creating it and not deleting it is bad, so please try and see if we
> can omit the creation on this code path instead.
> 

Check the code for the dummy var, it is created only in below chunk:
arch/x86/platform/efi/quirks.c:
efi_query_variable_store():
[snip]
	/*
	 * We account for that by refusing the write if permitting it would
	 * reduce the available space to under 5KB. This figure was provided by
	 * Samsung, so should be safe.
	 */
	if ((remaining_size - size < EFI_MIN_RESERVE) &&
		!efi_no_storage_paranoia) {

		/*
		 * Triggering garbage collection may require that the firmware
		 * generate a real EFI_OUT_OF_RESOURCES error. We can force
		 * that by attempting to use more space than is available.
		 */
		unsigned long dummy_size = remaining_size + 1024;
		void *dummy = kzalloc(dummy_size, GFP_KERNEL);

		if (!dummy)
			return EFI_OUT_OF_RESOURCES;

		status = efi.set_variable((efi_char16_t *)efi_dummy_name,
					  &EFI_DUMMY_GUID,
					  EFI_VARIABLE_NON_VOLATILE |
					  EFI_VARIABLE_BOOTSERVICE_ACCESS |
					  EFI_VARIABLE_RUNTIME_ACCESS,
					  dummy_size, dummy);

		if (status == EFI_SUCCESS) {
			/*
			 * This should have failed, so if it didn't make sure
			 * that we delete it...
			 */
			efi_delete_dummy_variable();
		}

[snip]

So the dummy var only be created when the if condition matched, also
once creating succeeded it is deleted.  The deleting while entering
virtual mode is always deleting a non exist efi var.  Please correct me
if I miss something. 

If above is true, then at least in the kexec path can be dropped because
we have a real bug which resets machine.

Thanks
Dave

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] do not clean dummy variable in kexec path
@ 2019-08-08  7:49     ` Dave Young
  0 siblings, 0 replies; 18+ messages in thread
From: Dave Young @ 2019-08-08  7:49 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Matthew Garrett, Bhupesh Sharma, linux-efi, Kexec Mailing List,
	Linux Kernel Mailing List

On 08/05/19 at 06:55pm, Ard Biesheuvel wrote:
> On Mon, 5 Aug 2019 at 11:36, Dave Young <dyoung@redhat.com> wrote:
> >
> > kexec reboot fails randomly in UEFI based kvm guest.  The firmware
> > just reset while calling efi_delete_dummy_variable();  Unfortunately
> > I don't know how to debug the firmware, it is also possible a potential
> > problem on real hardware as well although nobody reproduced it.
> >
> > The intention of efi_delete_dummy_variable is to trigger garbage collection
> > when entering virtual mode.  But SetVirtualAddressMap can only run once
> > for each physical reboot, thus kexec_enter_virtual_mode is not necessarily
> > a good place to clean dummy object.
> >
> 
> I would argue that this means it is not a good place to *create* the
> dummy variable, and if we don't create it, we don't have to delete it
> either.
> 
> > Drop efi_delete_dummy_variable so that kexec reboot can work.
> >
> 
> Creating it and not deleting it is bad, so please try and see if we
> can omit the creation on this code path instead.
> 

Check the code for the dummy var, it is created only in below chunk:
arch/x86/platform/efi/quirks.c:
efi_query_variable_store():
[snip]
	/*
	 * We account for that by refusing the write if permitting it would
	 * reduce the available space to under 5KB. This figure was provided by
	 * Samsung, so should be safe.
	 */
	if ((remaining_size - size < EFI_MIN_RESERVE) &&
		!efi_no_storage_paranoia) {

		/*
		 * Triggering garbage collection may require that the firmware
		 * generate a real EFI_OUT_OF_RESOURCES error. We can force
		 * that by attempting to use more space than is available.
		 */
		unsigned long dummy_size = remaining_size + 1024;
		void *dummy = kzalloc(dummy_size, GFP_KERNEL);

		if (!dummy)
			return EFI_OUT_OF_RESOURCES;

		status = efi.set_variable((efi_char16_t *)efi_dummy_name,
					  &EFI_DUMMY_GUID,
					  EFI_VARIABLE_NON_VOLATILE |
					  EFI_VARIABLE_BOOTSERVICE_ACCESS |
					  EFI_VARIABLE_RUNTIME_ACCESS,
					  dummy_size, dummy);

		if (status == EFI_SUCCESS) {
			/*
			 * This should have failed, so if it didn't make sure
			 * that we delete it...
			 */
			efi_delete_dummy_variable();
		}

[snip]

So the dummy var only be created when the if condition matched, also
once creating succeeded it is deleted.  The deleting while entering
virtual mode is always deleting a non exist efi var.  Please correct me
if I miss something. 

If above is true, then at least in the kexec path can be dropped because
we have a real bug which resets machine.

Thanks
Dave

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] do not clean dummy variable in kexec path
  2019-08-05 15:55   ` Ard Biesheuvel
                     ` (2 preceding siblings ...)
  (?)
@ 2019-08-13 11:28   ` Laszlo Ersek
  2019-08-13 11:31     ` Laszlo Ersek
  2019-08-13 21:14     ` Matthew Garrett
  -1 siblings, 2 replies; 18+ messages in thread
From: Laszlo Ersek @ 2019-08-13 11:28 UTC (permalink / raw)
  To: Ard Biesheuvel, Dave Young, Matthew Garrett; +Cc: kexec devel list

On 08/05/19 17:55, ard.biesheuvel at linaro.org (Ard Biesheuvel) wrote:
> On Mon, 5 Aug 2019 at 11:36, Dave Young <dyoung at redhat.com> wrote:
>>
>> kexec reboot fails randomly in UEFI based kvm guest.  The firmware
>> just reset while calling efi_delete_dummy_variable();  Unfortunately
>> I don't know how to debug the firmware, it is also possible a
>> potential problem on real hardware as well although nobody reproduced
>> it.
>>
>> The intention of efi_delete_dummy_variable is to trigger garbage
>> collection when entering virtual mode.  But SetVirtualAddressMap can
>> only run once for each physical reboot, thus kexec_enter_virtual_mode
>> is not necessarily a good place to clean dummy object.
>>
>
> I would argue that this means it is not a good place to *create* the
> dummy variable, and if we don't create it, we don't have to delete it
> either.
>
>> Drop efi_delete_dummy_variable so that kexec reboot can work.
>>
>
> Creating it and not deleting it is bad, so please try and see if we
> can omit the creation on this code path instead.

The existent code seems to date back to the following upstream bug
report:

  https://bugzilla.kernel.org/show_bug.cgi?id=55471

The following commits appear relevant:

[1] 68d929862e29 ("efi: be more paranoid about available space when
    creating variables", 2013-03-06) -- part of v3.9-rc2

[2] cc5a080c5d40 ("efi: Pass boot services variable info to runtime
    code", 2013-04-15) -- part of v3.9-rc8

[3] 31ff2f20d900 ("efi: Distinguish between "remaining space" and
    actually used space", 2013-04-15) -- direct descendant of
    cc5a080c5d40, hence also part of v3.9-rc8

[4] f8b8404337de ("Modify UEFI anti-bricking code", 2013-06-10) -- part
    of v3.10-rc6

Commit [1] correctly observed that a varstore reclaim doesn't occur
after ExitBootServices() -- see an excerpt of the message, rewrapped for
readability, below:

> UEFI variables are typically stored in flash. For various reasons,
> avaiable space is typically not reclaimed immediately upon the
> deletion of a variable - instead, the system will garbage collect
> during initialisation after a reboot.

(I verified yesterday, using the edk2 source code, that there is no
varstore reclaim after ExitBootServices(), indeed.)

Thus, (it had been known that,) creating a non-volatile UEFI variable,
and then deleting it too, would not trigger a reclaim *at once*. After
the dummy creation/deletion, a cold reboot of the machine would still be
necessary.


Then looking at commit [4], I guess the idea was to trip the "high water
mark", in order to force a reclaim, with the following considerations:

- the high water mark should be tripped as early as possible, and as
  conservatively as possible, to avoid causing problems for the platform
  firmware,

- but the reclaim would still only occur at *next cold boot*.

In that regard, commit [4] looks justified to me.


However, there are two things in [4] that confuse me:

(a) Attempting to delete the dummy variable in efi_enter_virtual_mode().

(b) The following part, in efi_query_variable_store():

+               /*
+                * The runtime code may now have triggered a garbage collection
+                * run, so check the variable info again
+                */

Let me start with (b). That code is essentially dead, I would say, based
on the information that had already been captured in the commit message
of [1]. Reclaim would never happen after ExitBootServices(). (I assume
efi_query_variable_store() is only invoked after ExitBootServices(),
i.e., from kernel space proper -- sorry if that's a wrong assumption.)

Either way, (b) doesn't hurt; even if it's dead code on a specific
firmware platform, it shouldn't cause problems.


Considering (a): what justified the attempt to delete the dummy variable
in efi_enter_virtual_mode(), in commit [4]? Was that meant as a
fail-safe just so we don't leave a dummy variable lying around?

Because, that hunk in efi_enter_virtual_mode(), from [4], is where the
current trouble originates from. The function efi_enter_virtual_mode()
got split in two later, namely in:

[5] fabb37c736f9 ("x86/efi: Split efi_enter_virtual_mode", 2014-03-04)
    -- part of v3.15-rc1

and the "clean DUMMY object" logic was diligently copied to the new
function called kexec_enter_virtual_mode().

However, commit [5] did not *introduce* the cleanup logic specifically
for kexec. The cleanup hunk from commit [4] had never been conditional
on (!efi_setup). Therefore commit [5] only preserved the kexec behavior,
when it duplicated the cleanup to kexec_enter_virtual_mode().

So even if we consider the "clean DUMMY object" hunk from [4] a
justified fail-safe for the normal boot path, it doesn't apply to the
kexec path -- the cold-booted primary kernel will have gone through
those motions already, will it not?

Therefore, we should do two things:

- Remove the cleanup from the kexec path -- the cleanup logic from [4],
  even if justified for the cold boot path, should have never modified
  the kexec path.

- Consider moving the cleanup even for cold boot to a different spot.
  I'd argue for efi_query_variable_store() -- we should delete the dummy
  variable there, *regardless* of whether we just created it, or it had
  existed previously.

I think the present patch is certainly an improvement (it covers the
first item), as long as we explain in the commit message that [4] should
have restricted, as a minimum, the cleanup logic to the cold boot path.
In that case, the cleanup wouldn't have been explicitly duplicated for
kexec_enter_virtual_mode() in [5], and now it wouldn't be blowing up.


Regarding the particulars of the crash -- I've given some firmware
debugging hints in
<https://bugzilla.redhat.com/show_bug.cgi?id=1707669>, as I don't have
time to track that down myself. (Even this email is quite
extra-curricular for me.)

(Note: I'm not subscribed to the kexec list, so please CC me if you'd
like me to see your followup.)

Thanks
Laszlo


>> Signed-off-by: Dave Young <dyoung at redhat.com>
>> ---
>>  arch/x86/platform/efi/efi.c |    3 ---
>>  1 file changed, 3 deletions(-)
>>
>> --- linux-x86.orig/arch/x86/platform/efi/efi.c
>> +++ linux-x86/arch/x86/platform/efi/efi.c
>> @@ -894,9 +894,6 @@ static void __init kexec_enter_virtual_m
>>
>>         if (efi_enabled(EFI_OLD_MEMMAP) && (__supported_pte_mask & _PAGE_NX))
>>                 runtime_code_page_mkexec();
>> -
>> -       /* clean DUMMY object */
>> -       efi_delete_dummy_variable();
>>  #endif
>>  }
>>
>
>


_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] do not clean dummy variable in kexec path
  2019-08-13 11:28   ` Laszlo Ersek
@ 2019-08-13 11:31     ` Laszlo Ersek
  2019-08-13 21:14     ` Matthew Garrett
  1 sibling, 0 replies; 18+ messages in thread
From: Laszlo Ersek @ 2019-08-13 11:31 UTC (permalink / raw)
  To: Ard Biesheuvel, Dave Young, Matthew Garrett; +Cc: kexec devel list

On 08/13/19 13:28, Laszlo Ersek wrote:
> On 08/05/19 17:55, ard.biesheuvel at linaro.org (Ard Biesheuvel) wrote:

>> I would argue that this means it is not a good place to *create* the
>> dummy variable, and if we don't create it, we don't have to delete it
>> either.

> - Consider moving the cleanup even for cold boot to a different spot.
>   I'd argue for efi_query_variable_store() -- we should delete the dummy
>   variable there, *regardless* of whether we just created it, or it had
>   existed previously.

To clarify: I suggest deleting the dummy -- regardless of it
pre-existing, vs. us creating it on the spot -- *wherever* we create it
during cold boot.

I'm not arguing for efi_query_variable_store() per se. If there is a
better init / setup function, we should move both the creation and the
(independent!) removal there.

Thanks
Laszlo

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] do not clean dummy variable in kexec path
  2019-08-13 11:28   ` Laszlo Ersek
  2019-08-13 11:31     ` Laszlo Ersek
@ 2019-08-13 21:14     ` Matthew Garrett
  2019-09-13  9:17       ` Ard Biesheuvel
  1 sibling, 1 reply; 18+ messages in thread
From: Matthew Garrett @ 2019-08-13 21:14 UTC (permalink / raw)
  To: Laszlo Ersek; +Cc: kexec devel list, Dave Young, Ard Biesheuvel

On Tue, Aug 13, 2019 at 4:28 AM Laszlo Ersek <lersek@redhat.com> wrote:
> (I verified yesterday, using the edk2 source code, that there is no
> varstore reclaim after ExitBootServices(), indeed.)

Some implementations do reclaim at runtime, in which case the
create/delete dance will permit variable creation.

> (a) Attempting to delete the dummy variable in efi_enter_virtual_mode().

To be clear, the dummy variable should never actually come into
existence - we explicitly attempt to create a variable that's bigger
than the available space, so the expectation is that it will always
fail. However, should it somehow end up being created, there's a race
between the creation and the deletion and so there's a (small) risk
that the variable actually ends up there. The cleanup in
enter_virtual_mode() is just there to ensure that anything that did
end up being created on a previous boot is deleted - the expectation
is that it'll be a noop.

> (b) The following part, in efi_query_variable_store():
>
> +               /*
> +                * The runtime code may now have triggered a garbage collection
> +                * run, so check the variable info again
> +                */
>
> Let me start with (b). That code is essentially dead, I would say, based
> on the information that had already been captured in the commit message
> of [1]. Reclaim would never happen after ExitBootServices(). (I assume
> efi_query_variable_store() is only invoked after ExitBootServices(),
> i.e., from kernel space proper -- sorry if that's a wrong assumption.)

It's dead code on Tiano, but not on at least one vendor implementation.

> Considering (a): what justified the attempt to delete the dummy variable
> in efi_enter_virtual_mode(), in commit [4]? Was that meant as a
> fail-safe just so we don't leave a dummy variable lying around?

Yes.

> So even if we consider the "clean DUMMY object" hunk from [4] a
> justified fail-safe for the normal boot path, it doesn't apply to the
> kexec path -- the cold-booted primary kernel will have gone through
> those motions already, will it not?
>
> Therefore, we should do two things:
>
> - Remove the cleanup from the kexec path -- the cleanup logic from [4],
>   even if justified for the cold boot path, should have never modified
>   the kexec path.

I agree that there's no benefit in it being called in the kexec path.

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] do not clean dummy variable in kexec path
  2019-08-13 21:14     ` Matthew Garrett
@ 2019-09-13  9:17       ` Ard Biesheuvel
  2019-09-17 17:51         ` Matthew Garrett
  0 siblings, 1 reply; 18+ messages in thread
From: Ard Biesheuvel @ 2019-09-13  9:17 UTC (permalink / raw)
  To: Matthew Garrett; +Cc: Dave Young, Laszlo Ersek, kexec devel list

On Tue, 13 Aug 2019 at 22:14, Matthew Garrett <mjg59@google.com> wrote:
>
> On Tue, Aug 13, 2019 at 4:28 AM Laszlo Ersek <lersek@redhat.com> wrote:
> > (I verified yesterday, using the edk2 source code, that there is no
> > varstore reclaim after ExitBootServices(), indeed.)
>
> Some implementations do reclaim at runtime, in which case the
> create/delete dance will permit variable creation.
>
> > (a) Attempting to delete the dummy variable in efi_enter_virtual_mode().
>
> To be clear, the dummy variable should never actually come into
> existence - we explicitly attempt to create a variable that's bigger
> than the available space, so the expectation is that it will always
> fail. However, should it somehow end up being created, there's a race
> between the creation and the deletion and so there's a (small) risk
> that the variable actually ends up there. The cleanup in
> enter_virtual_mode() is just there to ensure that anything that did
> end up being created on a previous boot is deleted - the expectation
> is that it'll be a noop.
>
> > (b) The following part, in efi_query_variable_store():
> >
> > +               /*
> > +                * The runtime code may now have triggered a garbage collection
> > +                * run, so check the variable info again
> > +                */
> >
> > Let me start with (b). That code is essentially dead, I would say, based
> > on the information that had already been captured in the commit message
> > of [1]. Reclaim would never happen after ExitBootServices(). (I assume
> > efi_query_variable_store() is only invoked after ExitBootServices(),
> > i.e., from kernel space proper -- sorry if that's a wrong assumption.)
>
> It's dead code on Tiano, but not on at least one vendor implementation.
>
> > Considering (a): what justified the attempt to delete the dummy variable
> > in efi_enter_virtual_mode(), in commit [4]? Was that meant as a
> > fail-safe just so we don't leave a dummy variable lying around?
>
> Yes.
>
> > So even if we consider the "clean DUMMY object" hunk from [4] a
> > justified fail-safe for the normal boot path, it doesn't apply to the
> > kexec path -- the cold-booted primary kernel will have gone through
> > those motions already, will it not?
> >
> > Therefore, we should do two things:
> >
> > - Remove the cleanup from the kexec path -- the cleanup logic from [4],
> >   even if justified for the cold boot path, should have never modified
> >   the kexec path.
>
> I agree that there's no benefit in it being called in the kexec path.

Can I take that as an ack?

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] do not clean dummy variable in kexec path
  2019-09-13  9:17       ` Ard Biesheuvel
@ 2019-09-17 17:51         ` Matthew Garrett
  2019-09-25 15:44           ` Ard Biesheuvel
  0 siblings, 1 reply; 18+ messages in thread
From: Matthew Garrett @ 2019-09-17 17:51 UTC (permalink / raw)
  To: Ard Biesheuvel; +Cc: Dave Young, Laszlo Ersek, kexec devel list

On Fri, Sep 13, 2019 at 2:18 AM Ard Biesheuvel
<ard.biesheuvel@linaro.org> wrote:

> > > - Remove the cleanup from the kexec path -- the cleanup logic from [4],
> > >   even if justified for the cold boot path, should have never modified
> > >   the kexec path.
> >
> > I agree that there's no benefit in it being called in the kexec path.
>
> Can I take that as an ack?

An ack of this hunk.

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] do not clean dummy variable in kexec path
  2019-09-17 17:51         ` Matthew Garrett
@ 2019-09-25 15:44           ` Ard Biesheuvel
  0 siblings, 0 replies; 18+ messages in thread
From: Ard Biesheuvel @ 2019-09-25 15:44 UTC (permalink / raw)
  To: Matthew Garrett; +Cc: Dave Young, Laszlo Ersek, kexec devel list

On Tue, 17 Sep 2019 at 19:52, Matthew Garrett <mjg59@google.com> wrote:
>
> On Fri, Sep 13, 2019 at 2:18 AM Ard Biesheuvel
> <ard.biesheuvel@linaro.org> wrote:
>
> > > > - Remove the cleanup from the kexec path -- the cleanup logic from [4],
> > > >   even if justified for the cold boot path, should have never modified
> > > >   the kexec path.
> > >
> > > I agree that there's no benefit in it being called in the kexec path.
> >
> > Can I take that as an ack?
>
> An ack of this hunk.

Given that the patch in question has only one hunk, I'll take this as
an ack of the entire patch, and queue it as a fix.

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2019-09-25 15:45 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-05  8:35 [PATCH] do not clean dummy variable in kexec path Dave Young
2019-08-05  8:35 ` Dave Young
2019-08-05 15:55 ` Ard Biesheuvel
2019-08-05 15:55   ` Ard Biesheuvel
2019-08-06  2:41   ` Dave Young
2019-08-06  2:41     ` Dave Young
2019-08-08  7:49   ` Dave Young
2019-08-08  7:49     ` Dave Young
2019-08-13 11:28   ` Laszlo Ersek
2019-08-13 11:31     ` Laszlo Ersek
2019-08-13 21:14     ` Matthew Garrett
2019-09-13  9:17       ` Ard Biesheuvel
2019-09-17 17:51         ` Matthew Garrett
2019-09-25 15:44           ` Ard Biesheuvel
2019-08-05 17:09 ` Matthew Garrett
2019-08-05 17:09   ` Matthew Garrett
2019-08-06  2:44   ` Dave Young
2019-08-06  2:44     ` Dave Young

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.