All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2] panic, kexec: Make __crash_kexec() NMI safe
@ 2022-06-20 11:15 ` Valentin Schneider
  0 siblings, 0 replies; 26+ messages in thread
From: Valentin Schneider @ 2022-06-20 11:15 UTC (permalink / raw)
  To: linux-kernel, kexec, linux-rt-users
  Cc: Eric Biederman, Arnd Bergmann, Petr Mladek, Thomas Gleixner,
	Sebastian Andrzej Siewior, Juri Lelli, Luis Claudio R. Goncalves

Attempting to get a crash dump out of a debug PREEMPT_RT kernel via an NMI
panic() doesn't work. The cause of that lies in the PREEMPT_RT definition
of mutex_trylock():

	if (IS_ENABLED(CONFIG_DEBUG_RT_MUTEXES) && WARN_ON_ONCE(!in_task()))
		return 0;

This prevents an NMI panic() from executing the main body of
__crash_kexec() which does the actual kexec into the kdump kernel.
The warning and return are explained by:

  6ce47fd961fa ("rtmutex: Warn if trylock is called from hard/softirq context")
  [...]
  The reasons for this are:

      1) There is a potential deadlock in the slowpath

      2) Another cpu which blocks on the rtmutex will boost the task
	 which allegedly locked the rtmutex, but that cannot work
	 because the hard/softirq context borrows the task context.

Furthermore, grabbing the lock isn't NMI safe, so do away with it and
use an atomic variable to serialize reads vs writes of
kexec_crash_image.

Tested by triggering NMI panics via:

  $ echo 1 > /proc/sys/kernel/panic_on_unrecovered_nmi
  $ echo 1 > /proc/sys/kernel/unknown_nmi_panic
  $ echo 1 > /proc/sys/kernel/panic

  $ ipmitool power diag

Fixes: 6ce47fd961fa ("rtmutex: Warn if trylock is called from hard/softirq context")
Signed-off-by: Valentin Schneider <vschneid@redhat.com>
---
v1 -> v2
++++++++

o Changed from Peterson-like synchronization to simpler atomic_cmpxchg
  (Petr)
o Slightly reworded changelog
o Added Fixes: tag. Technically should be up to since kexec can happen
  in an NMI, but that isn't such a clear target
---
 include/linux/kexec.h |  1 +
 kernel/kexec.c        | 16 ++++++++++++----
 kernel/kexec_core.c   | 36 +++++++++++++++++++-----------------
 kernel/kexec_file.c   | 11 +++++++++++
 4 files changed, 43 insertions(+), 21 deletions(-)

diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index ce6536f1d269..5849a15ae3dd 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -369,6 +369,7 @@ extern int kimage_crash_copy_vmcoreinfo(struct kimage *image);
 
 extern struct kimage *kexec_image;
 extern struct kimage *kexec_crash_image;
+extern atomic_t crash_kexec_lock;
 extern int kexec_load_disabled;
 
 #ifndef kexec_flush_icache_page
diff --git a/kernel/kexec.c b/kernel/kexec.c
index b5e40f069768..73e0df2c608f 100644
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -94,14 +94,20 @@ static int do_kexec_load(unsigned long entry, unsigned long nr_segments,
 	/*
 	 * Because we write directly to the reserved memory region when loading
 	 * crash kernels we need a mutex here to prevent multiple crash kernels
-	 * from attempting to load simultaneously, and to prevent a crash kernel
-	 * from loading over the top of a in use crash kernel.
-	 *
-	 * KISS: always take the mutex.
+	 * from attempting to load simultaneously.
 	 */
 	if (!mutex_trylock(&kexec_mutex))
 		return -EBUSY;
 
+	/*
+	 * Prevent loading a new crash kernel while one is in use.
+	 * See associated comment in __crash_kexec().
+	 */
+	if (atomic_cmpxchg_acquire(&crash_kexec_lock, 0, 1)) {
+		ret = -EBUSY;
+		goto out_unlock_mutex;
+	}
+
 	if (flags & KEXEC_ON_CRASH) {
 		dest_image = &kexec_crash_image;
 		if (kexec_crash_image)
@@ -165,6 +171,8 @@ static int do_kexec_load(unsigned long entry, unsigned long nr_segments,
 
 	kimage_free(image);
 out_unlock:
+	atomic_set_release(&crash_kexec_lock, 0);
+out_unlock_mutex:
 	mutex_unlock(&kexec_mutex);
 	return ret;
 }
diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index 4d34c78334ce..f957109a266c 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -933,6 +933,7 @@ int kimage_load_segment(struct kimage *image,
 
 struct kimage *kexec_image;
 struct kimage *kexec_crash_image;
+atomic_t crash_kexec_lock = ATOMIC_INIT(0);
 int kexec_load_disabled;
 #ifdef CONFIG_SYSCTL
 static struct ctl_table kexec_core_sysctls[] = {
@@ -964,25 +965,26 @@ late_initcall(kexec_core_sysctl_init);
  */
 void __noclone __crash_kexec(struct pt_regs *regs)
 {
-	/* Take the kexec_mutex here to prevent sys_kexec_load
-	 * running on one cpu from replacing the crash kernel
-	 * we are using after a panic on a different cpu.
-	 *
-	 * If the crash kernel was not located in a fixed area
-	 * of memory the xchg(&kexec_crash_image) would be
-	 * sufficient.  But since I reuse the memory...
+	/*
+	 * This should be taking kexec_mutex before doing anything with the
+	 * kexec_crash_image, but this code can be run in NMI context which
+	 * means we can't even trylock. This is circumvented by using an
+	 * atomic variable that is *also* used by the codepaths that take
+	 * the mutex to modify kexec_crash_image.
 	 */
-	if (mutex_trylock(&kexec_mutex)) {
-		if (kexec_crash_image) {
-			struct pt_regs fixed_regs;
-
-			crash_setup_regs(&fixed_regs, regs);
-			crash_save_vmcoreinfo();
-			machine_crash_shutdown(&fixed_regs);
-			machine_kexec(kexec_crash_image);
-		}
-		mutex_unlock(&kexec_mutex);
+	if (atomic_cmpxchg_acquire(&crash_kexec_lock, 0, 1))
+		return;
+
+	if (kexec_crash_image) {
+		struct pt_regs fixed_regs;
+
+		crash_setup_regs(&fixed_regs, regs);
+		crash_save_vmcoreinfo();
+		machine_crash_shutdown(&fixed_regs);
+		machine_kexec(kexec_crash_image);
 	}
+
+	atomic_set_release(&crash_kexec_lock, 0);
 }
 STACK_FRAME_NON_STANDARD(__crash_kexec);
 
diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index 145321a5e798..3faec031cfc9 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -337,6 +337,15 @@ SYSCALL_DEFINE5(kexec_file_load, int, kernel_fd, int, initrd_fd,
 	if (!mutex_trylock(&kexec_mutex))
 		return -EBUSY;
 
+	/*
+	 * Prevent loading a new crash kernel while one is in use.
+	 * See associated comment in __crash_kexec().
+	 */
+	if (atomic_cmpxchg_acquire(&crash_kexec_lock, 0, 1)) {
+		ret = -EBUSY;
+		goto out_mutex_unlock;
+	}
+
 	dest_image = &kexec_image;
 	if (flags & KEXEC_FILE_ON_CRASH) {
 		dest_image = &kexec_crash_image;
@@ -406,6 +415,8 @@ SYSCALL_DEFINE5(kexec_file_load, int, kernel_fd, int, initrd_fd,
 	if ((flags & KEXEC_FILE_ON_CRASH) && kexec_crash_image)
 		arch_kexec_protect_crashkres();
 
+	atomic_set_release(&crash_kexec_lock, 0);
+out_mutex_unlock:
 	mutex_unlock(&kexec_mutex);
 	kimage_free(image);
 	return ret;
-- 
2.31.1


_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v2] panic, kexec: Make __crash_kexec() NMI safe
@ 2022-06-20 11:15 ` Valentin Schneider
  0 siblings, 0 replies; 26+ messages in thread
From: Valentin Schneider @ 2022-06-20 11:15 UTC (permalink / raw)
  To: linux-kernel, kexec, linux-rt-users
  Cc: Eric Biederman, Arnd Bergmann, Petr Mladek, Thomas Gleixner,
	Sebastian Andrzej Siewior, Juri Lelli, Luis Claudio R. Goncalves

Attempting to get a crash dump out of a debug PREEMPT_RT kernel via an NMI
panic() doesn't work. The cause of that lies in the PREEMPT_RT definition
of mutex_trylock():

	if (IS_ENABLED(CONFIG_DEBUG_RT_MUTEXES) && WARN_ON_ONCE(!in_task()))
		return 0;

This prevents an NMI panic() from executing the main body of
__crash_kexec() which does the actual kexec into the kdump kernel.
The warning and return are explained by:

  6ce47fd961fa ("rtmutex: Warn if trylock is called from hard/softirq context")
  [...]
  The reasons for this are:

      1) There is a potential deadlock in the slowpath

      2) Another cpu which blocks on the rtmutex will boost the task
	 which allegedly locked the rtmutex, but that cannot work
	 because the hard/softirq context borrows the task context.

Furthermore, grabbing the lock isn't NMI safe, so do away with it and
use an atomic variable to serialize reads vs writes of
kexec_crash_image.

Tested by triggering NMI panics via:

  $ echo 1 > /proc/sys/kernel/panic_on_unrecovered_nmi
  $ echo 1 > /proc/sys/kernel/unknown_nmi_panic
  $ echo 1 > /proc/sys/kernel/panic

  $ ipmitool power diag

Fixes: 6ce47fd961fa ("rtmutex: Warn if trylock is called from hard/softirq context")
Signed-off-by: Valentin Schneider <vschneid@redhat.com>
---
v1 -> v2
++++++++

o Changed from Peterson-like synchronization to simpler atomic_cmpxchg
  (Petr)
o Slightly reworded changelog
o Added Fixes: tag. Technically should be up to since kexec can happen
  in an NMI, but that isn't such a clear target
---
 include/linux/kexec.h |  1 +
 kernel/kexec.c        | 16 ++++++++++++----
 kernel/kexec_core.c   | 36 +++++++++++++++++++-----------------
 kernel/kexec_file.c   | 11 +++++++++++
 4 files changed, 43 insertions(+), 21 deletions(-)

diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index ce6536f1d269..5849a15ae3dd 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -369,6 +369,7 @@ extern int kimage_crash_copy_vmcoreinfo(struct kimage *image);
 
 extern struct kimage *kexec_image;
 extern struct kimage *kexec_crash_image;
+extern atomic_t crash_kexec_lock;
 extern int kexec_load_disabled;
 
 #ifndef kexec_flush_icache_page
diff --git a/kernel/kexec.c b/kernel/kexec.c
index b5e40f069768..73e0df2c608f 100644
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -94,14 +94,20 @@ static int do_kexec_load(unsigned long entry, unsigned long nr_segments,
 	/*
 	 * Because we write directly to the reserved memory region when loading
 	 * crash kernels we need a mutex here to prevent multiple crash kernels
-	 * from attempting to load simultaneously, and to prevent a crash kernel
-	 * from loading over the top of a in use crash kernel.
-	 *
-	 * KISS: always take the mutex.
+	 * from attempting to load simultaneously.
 	 */
 	if (!mutex_trylock(&kexec_mutex))
 		return -EBUSY;
 
+	/*
+	 * Prevent loading a new crash kernel while one is in use.
+	 * See associated comment in __crash_kexec().
+	 */
+	if (atomic_cmpxchg_acquire(&crash_kexec_lock, 0, 1)) {
+		ret = -EBUSY;
+		goto out_unlock_mutex;
+	}
+
 	if (flags & KEXEC_ON_CRASH) {
 		dest_image = &kexec_crash_image;
 		if (kexec_crash_image)
@@ -165,6 +171,8 @@ static int do_kexec_load(unsigned long entry, unsigned long nr_segments,
 
 	kimage_free(image);
 out_unlock:
+	atomic_set_release(&crash_kexec_lock, 0);
+out_unlock_mutex:
 	mutex_unlock(&kexec_mutex);
 	return ret;
 }
diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index 4d34c78334ce..f957109a266c 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -933,6 +933,7 @@ int kimage_load_segment(struct kimage *image,
 
 struct kimage *kexec_image;
 struct kimage *kexec_crash_image;
+atomic_t crash_kexec_lock = ATOMIC_INIT(0);
 int kexec_load_disabled;
 #ifdef CONFIG_SYSCTL
 static struct ctl_table kexec_core_sysctls[] = {
@@ -964,25 +965,26 @@ late_initcall(kexec_core_sysctl_init);
  */
 void __noclone __crash_kexec(struct pt_regs *regs)
 {
-	/* Take the kexec_mutex here to prevent sys_kexec_load
-	 * running on one cpu from replacing the crash kernel
-	 * we are using after a panic on a different cpu.
-	 *
-	 * If the crash kernel was not located in a fixed area
-	 * of memory the xchg(&kexec_crash_image) would be
-	 * sufficient.  But since I reuse the memory...
+	/*
+	 * This should be taking kexec_mutex before doing anything with the
+	 * kexec_crash_image, but this code can be run in NMI context which
+	 * means we can't even trylock. This is circumvented by using an
+	 * atomic variable that is *also* used by the codepaths that take
+	 * the mutex to modify kexec_crash_image.
 	 */
-	if (mutex_trylock(&kexec_mutex)) {
-		if (kexec_crash_image) {
-			struct pt_regs fixed_regs;
-
-			crash_setup_regs(&fixed_regs, regs);
-			crash_save_vmcoreinfo();
-			machine_crash_shutdown(&fixed_regs);
-			machine_kexec(kexec_crash_image);
-		}
-		mutex_unlock(&kexec_mutex);
+	if (atomic_cmpxchg_acquire(&crash_kexec_lock, 0, 1))
+		return;
+
+	if (kexec_crash_image) {
+		struct pt_regs fixed_regs;
+
+		crash_setup_regs(&fixed_regs, regs);
+		crash_save_vmcoreinfo();
+		machine_crash_shutdown(&fixed_regs);
+		machine_kexec(kexec_crash_image);
 	}
+
+	atomic_set_release(&crash_kexec_lock, 0);
 }
 STACK_FRAME_NON_STANDARD(__crash_kexec);
 
diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index 145321a5e798..3faec031cfc9 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -337,6 +337,15 @@ SYSCALL_DEFINE5(kexec_file_load, int, kernel_fd, int, initrd_fd,
 	if (!mutex_trylock(&kexec_mutex))
 		return -EBUSY;
 
+	/*
+	 * Prevent loading a new crash kernel while one is in use.
+	 * See associated comment in __crash_kexec().
+	 */
+	if (atomic_cmpxchg_acquire(&crash_kexec_lock, 0, 1)) {
+		ret = -EBUSY;
+		goto out_mutex_unlock;
+	}
+
 	dest_image = &kexec_image;
 	if (flags & KEXEC_FILE_ON_CRASH) {
 		dest_image = &kexec_crash_image;
@@ -406,6 +415,8 @@ SYSCALL_DEFINE5(kexec_file_load, int, kernel_fd, int, initrd_fd,
 	if ((flags & KEXEC_FILE_ON_CRASH) && kexec_crash_image)
 		arch_kexec_protect_crashkres();
 
+	atomic_set_release(&crash_kexec_lock, 0);
+out_mutex_unlock:
 	mutex_unlock(&kexec_mutex);
 	kimage_free(image);
 	return ret;
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] panic, kexec: Make __crash_kexec() NMI safe
  2022-06-20 11:15 ` Valentin Schneider
@ 2022-06-23  9:31   ` Sebastian Andrzej Siewior
  -1 siblings, 0 replies; 26+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-06-23  9:31 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: linux-kernel, kexec, linux-rt-users, Eric Biederman,
	Arnd Bergmann, Petr Mladek, Thomas Gleixner, Juri Lelli,
	Luis Claudio R. Goncalves

On 2022-06-20 12:15:20 [+0100], Valentin Schneider wrote:
> Attempting to get a crash dump out of a debug PREEMPT_RT kernel via an NMI
> panic() doesn't work. The cause of that lies in the PREEMPT_RT definition
> of mutex_trylock():
> 
> Fixes: 6ce47fd961fa ("rtmutex: Warn if trylock is called from hard/softirq context")
> Signed-off-by: Valentin Schneider <vschneid@redhat.com>

Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

> ---
> v1 -> v2
> ++++++++
> 
> o Changed from Peterson-like synchronization to simpler atomic_cmpxchg
>   (Petr)
> o Slightly reworded changelog
> o Added Fixes: tag. Technically should be up to since kexec can happen
>   in an NMI, but that isn't such a clear target

RT-wise it would be needed for each release.
There is also a mutex_unlock() in case an image is missing. This can go
via the scheduler if there is a waiter which does not look good with the
NMI in the picture.

Sebastian

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] panic, kexec: Make __crash_kexec() NMI safe
@ 2022-06-23  9:31   ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 26+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-06-23  9:31 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: linux-kernel, kexec, linux-rt-users, Eric Biederman,
	Arnd Bergmann, Petr Mladek, Thomas Gleixner, Juri Lelli,
	Luis Claudio R. Goncalves

On 2022-06-20 12:15:20 [+0100], Valentin Schneider wrote:
> Attempting to get a crash dump out of a debug PREEMPT_RT kernel via an NMI
> panic() doesn't work. The cause of that lies in the PREEMPT_RT definition
> of mutex_trylock():
> 
> Fixes: 6ce47fd961fa ("rtmutex: Warn if trylock is called from hard/softirq context")
> Signed-off-by: Valentin Schneider <vschneid@redhat.com>

Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

> ---
> v1 -> v2
> ++++++++
> 
> o Changed from Peterson-like synchronization to simpler atomic_cmpxchg
>   (Petr)
> o Slightly reworded changelog
> o Added Fixes: tag. Technically should be up to since kexec can happen
>   in an NMI, but that isn't such a clear target

RT-wise it would be needed for each release.
There is also a mutex_unlock() in case an image is missing. This can go
via the scheduler if there is a waiter which does not look good with the
NMI in the picture.

Sebastian

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] panic, kexec: Make __crash_kexec() NMI safe
  2022-06-23  9:31   ` Sebastian Andrzej Siewior
@ 2022-06-23 11:39     ` Valentin Schneider
  -1 siblings, 0 replies; 26+ messages in thread
From: Valentin Schneider @ 2022-06-23 11:39 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-kernel, kexec, linux-rt-users, Eric Biederman,
	Arnd Bergmann, Petr Mladek, Thomas Gleixner, Juri Lelli,
	Luis Claudio R. Goncalves

On 23/06/22 11:31, Sebastian Andrzej Siewior wrote:
> On 2022-06-20 12:15:20 [+0100], Valentin Schneider wrote:
>> Attempting to get a crash dump out of a debug PREEMPT_RT kernel via an NMI
>> panic() doesn't work. The cause of that lies in the PREEMPT_RT definition
>> of mutex_trylock():
>> 
> …
>
>> Fixes: 6ce47fd961fa ("rtmutex: Warn if trylock is called from hard/softirq context")
>> Signed-off-by: Valentin Schneider <vschneid@redhat.com>
>
> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
>
>> ---
>> v1 -> v2
>> ++++++++
>> 
>> o Changed from Peterson-like synchronization to simpler atomic_cmpxchg
>>   (Petr)
>> o Slightly reworded changelog
>> o Added Fixes: tag. Technically should be up to since kexec can happen
>>   in an NMI, but that isn't such a clear target
>
> RT-wise it would be needed for each release.

So git tells me the Fixes: commit dates from v4.2; from [1] and [2] I get
that both current longterm stable and stable-rt trees go as far back as
v4.9, so I'm guessing if that gets picked up for the stable trees then it
should make its way into the stable -rt trees?

[1]: https://wiki.linuxfoundation.org/realtime/preempt_rt_versions
[2]: https://www.kernel.org/category/releases.html

> There is also a mutex_unlock() in case an image is missing. This can go
> via the scheduler if there is a waiter which does not look good with the
> NMI in the picture.
>
> Sebastian


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] panic, kexec: Make __crash_kexec() NMI safe
@ 2022-06-23 11:39     ` Valentin Schneider
  0 siblings, 0 replies; 26+ messages in thread
From: Valentin Schneider @ 2022-06-23 11:39 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-kernel, kexec, linux-rt-users, Eric Biederman,
	Arnd Bergmann, Petr Mladek, Thomas Gleixner, Juri Lelli,
	Luis Claudio R. Goncalves

On 23/06/22 11:31, Sebastian Andrzej Siewior wrote:
> On 2022-06-20 12:15:20 [+0100], Valentin Schneider wrote:
>> Attempting to get a crash dump out of a debug PREEMPT_RT kernel via an NMI
>> panic() doesn't work. The cause of that lies in the PREEMPT_RT definition
>> of mutex_trylock():
>> 
> …
>
>> Fixes: 6ce47fd961fa ("rtmutex: Warn if trylock is called from hard/softirq context")
>> Signed-off-by: Valentin Schneider <vschneid@redhat.com>
>
> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
>
>> ---
>> v1 -> v2
>> ++++++++
>> 
>> o Changed from Peterson-like synchronization to simpler atomic_cmpxchg
>>   (Petr)
>> o Slightly reworded changelog
>> o Added Fixes: tag. Technically should be up to since kexec can happen
>>   in an NMI, but that isn't such a clear target
>
> RT-wise it would be needed for each release.

So git tells me the Fixes: commit dates from v4.2; from [1] and [2] I get
that both current longterm stable and stable-rt trees go as far back as
v4.9, so I'm guessing if that gets picked up for the stable trees then it
should make its way into the stable -rt trees?

[1]: https://wiki.linuxfoundation.org/realtime/preempt_rt_versions
[2]: https://www.kernel.org/category/releases.html

> There is also a mutex_unlock() in case an image is missing. This can go
> via the scheduler if there is a waiter which does not look good with the
> NMI in the picture.
>
> Sebastian


_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] panic, kexec: Make __crash_kexec() NMI safe
  2022-06-23 11:39     ` Valentin Schneider
@ 2022-06-23 13:35       ` Sebastian Andrzej Siewior
  -1 siblings, 0 replies; 26+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-06-23 13:35 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: linux-kernel, kexec, linux-rt-users, Eric Biederman,
	Arnd Bergmann, Petr Mladek, Thomas Gleixner, Juri Lelli,
	Luis Claudio R. Goncalves

On 2022-06-23 12:39:57 [+0100], Valentin Schneider wrote:
> > RT-wise it would be needed for each release.
> 
> So git tells me the Fixes: commit dates from v4.2; from [1] and [2] I get
> that both current longterm stable and stable-rt trees go as far back as
> v4.9, so I'm guessing if that gets picked up for the stable trees then it
> should make its way into the stable -rt trees?

correct, if it goes -stable. Just pointing it out. But the
mutex_unlock() looks like it might also be relevant for non-RT kernels.

Sebastian

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] panic, kexec: Make __crash_kexec() NMI safe
@ 2022-06-23 13:35       ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 26+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-06-23 13:35 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: linux-kernel, kexec, linux-rt-users, Eric Biederman,
	Arnd Bergmann, Petr Mladek, Thomas Gleixner, Juri Lelli,
	Luis Claudio R. Goncalves

On 2022-06-23 12:39:57 [+0100], Valentin Schneider wrote:
> > RT-wise it would be needed for each release.
> 
> So git tells me the Fixes: commit dates from v4.2; from [1] and [2] I get
> that both current longterm stable and stable-rt trees go as far back as
> v4.9, so I'm guessing if that gets picked up for the stable trees then it
> should make its way into the stable -rt trees?

correct, if it goes -stable. Just pointing it out. But the
mutex_unlock() looks like it might also be relevant for non-RT kernels.

Sebastian

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] panic, kexec: Make __crash_kexec() NMI safe
  2022-06-20 11:15 ` Valentin Schneider
@ 2022-06-24  1:30   ` Baoquan He
  -1 siblings, 0 replies; 26+ messages in thread
From: Baoquan He @ 2022-06-24  1:30 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: linux-kernel, kexec, linux-rt-users, Eric Biederman,
	Arnd Bergmann, Petr Mladek, Thomas Gleixner,
	Sebastian Andrzej Siewior, Juri Lelli, Luis Claudio R. Goncalves

On 06/20/22 at 12:15pm, Valentin Schneider wrote:
> Attempting to get a crash dump out of a debug PREEMPT_RT kernel via an NMI
> panic() doesn't work. The cause of that lies in the PREEMPT_RT definition
> of mutex_trylock():
> 
> 	if (IS_ENABLED(CONFIG_DEBUG_RT_MUTEXES) && WARN_ON_ONCE(!in_task()))
> 		return 0;
> 
> This prevents an NMI panic() from executing the main body of
> __crash_kexec() which does the actual kexec into the kdump kernel.
> The warning and return are explained by:
> 
>   6ce47fd961fa ("rtmutex: Warn if trylock is called from hard/softirq context")
>   [...]
>   The reasons for this are:
> 
>       1) There is a potential deadlock in the slowpath
> 
>       2) Another cpu which blocks on the rtmutex will boost the task
> 	 which allegedly locked the rtmutex, but that cannot work
> 	 because the hard/softirq context borrows the task context.
> 
> Furthermore, grabbing the lock isn't NMI safe, so do away with it and
> use an atomic variable to serialize reads vs writes of
> kexec_crash_image.
> 
> Tested by triggering NMI panics via:
> 
>   $ echo 1 > /proc/sys/kernel/panic_on_unrecovered_nmi
>   $ echo 1 > /proc/sys/kernel/unknown_nmi_panic
>   $ echo 1 > /proc/sys/kernel/panic
> 
>   $ ipmitool power diag
> 
> Fixes: 6ce47fd961fa ("rtmutex: Warn if trylock is called from hard/softirq context")
> Signed-off-by: Valentin Schneider <vschneid@redhat.com>
> ---
> v1 -> v2
> ++++++++
> 
> o Changed from Peterson-like synchronization to simpler atomic_cmpxchg
>   (Petr)
> o Slightly reworded changelog
> o Added Fixes: tag. Technically should be up to since kexec can happen
>   in an NMI, but that isn't such a clear target
> ---
>  include/linux/kexec.h |  1 +
>  kernel/kexec.c        | 16 ++++++++++++----
>  kernel/kexec_core.c   | 36 +++++++++++++++++++-----------------
>  kernel/kexec_file.c   | 11 +++++++++++
>  4 files changed, 43 insertions(+), 21 deletions(-)
> 
> diff --git a/include/linux/kexec.h b/include/linux/kexec.h
> index ce6536f1d269..5849a15ae3dd 100644
> --- a/include/linux/kexec.h
> +++ b/include/linux/kexec.h
> @@ -369,6 +369,7 @@ extern int kimage_crash_copy_vmcoreinfo(struct kimage *image);
>  
>  extern struct kimage *kexec_image;
>  extern struct kimage *kexec_crash_image;
> +extern atomic_t crash_kexec_lock;
>  extern int kexec_load_disabled;
>  
>  #ifndef kexec_flush_icache_page
> diff --git a/kernel/kexec.c b/kernel/kexec.c
> index b5e40f069768..73e0df2c608f 100644
> --- a/kernel/kexec.c
> +++ b/kernel/kexec.c
> @@ -94,14 +94,20 @@ static int do_kexec_load(unsigned long entry, unsigned long nr_segments,
>  	/*
>  	 * Because we write directly to the reserved memory region when loading
>  	 * crash kernels we need a mutex here to prevent multiple crash kernels
> -	 * from attempting to load simultaneously, and to prevent a crash kernel
> -	 * from loading over the top of a in use crash kernel.
> -	 *
> -	 * KISS: always take the mutex.
> +	 * from attempting to load simultaneously.
>  	 */
>  	if (!mutex_trylock(&kexec_mutex))
>  		return -EBUSY;

So kexec_mutex is degenerated to only avoid simultaneous loading,
should we rename to reflect that?, e.g kexec_load_mutex.

>  
> +	/*
> +	 * Prevent loading a new crash kernel while one is in use.
> +	 * See associated comment in __crash_kexec().
> +	 */
> +	if (atomic_cmpxchg_acquire(&crash_kexec_lock, 0, 1)) {
> +		ret = -EBUSY;
> +		goto out_unlock_mutex;
> +	}
> +
>  	if (flags & KEXEC_ON_CRASH) {
>  		dest_image = &kexec_crash_image;
>  		if (kexec_crash_image)
> @@ -165,6 +171,8 @@ static int do_kexec_load(unsigned long entry, unsigned long nr_segments,
>  
>  	kimage_free(image);
>  out_unlock:
> +	atomic_set_release(&crash_kexec_lock, 0);
> +out_unlock_mutex:
>  	mutex_unlock(&kexec_mutex);
>  	return ret;
>  }
> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> index 4d34c78334ce..f957109a266c 100644
> --- a/kernel/kexec_core.c
> +++ b/kernel/kexec_core.c
> @@ -933,6 +933,7 @@ int kimage_load_segment(struct kimage *image,
>  
>  struct kimage *kexec_image;
>  struct kimage *kexec_crash_image;
> +atomic_t crash_kexec_lock = ATOMIC_INIT(0);
>  int kexec_load_disabled;
>  #ifdef CONFIG_SYSCTL
>  static struct ctl_table kexec_core_sysctls[] = {
> @@ -964,25 +965,26 @@ late_initcall(kexec_core_sysctl_init);
>   */
>  void __noclone __crash_kexec(struct pt_regs *regs)
>  {
> -	/* Take the kexec_mutex here to prevent sys_kexec_load
> -	 * running on one cpu from replacing the crash kernel
> -	 * we are using after a panic on a different cpu.
> -	 *
> -	 * If the crash kernel was not located in a fixed area
> -	 * of memory the xchg(&kexec_crash_image) would be
> -	 * sufficient.  But since I reuse the memory...
> +	/*
> +	 * This should be taking kexec_mutex before doing anything with the
> +	 * kexec_crash_image, but this code can be run in NMI context which
> +	 * means we can't even trylock. This is circumvented by using an
> +	 * atomic variable that is *also* used by the codepaths that take
> +	 * the mutex to modify kexec_crash_image.
>  	 */
> -	if (mutex_trylock(&kexec_mutex)) {
> -		if (kexec_crash_image) {
> -			struct pt_regs fixed_regs;
> -
> -			crash_setup_regs(&fixed_regs, regs);
> -			crash_save_vmcoreinfo();
> -			machine_crash_shutdown(&fixed_regs);
> -			machine_kexec(kexec_crash_image);
> -		}
> -		mutex_unlock(&kexec_mutex);
> +	if (atomic_cmpxchg_acquire(&crash_kexec_lock, 0, 1))
> +		return;
> +
> +	if (kexec_crash_image) {
> +		struct pt_regs fixed_regs;
> +
> +		crash_setup_regs(&fixed_regs, regs);
> +		crash_save_vmcoreinfo();
> +		machine_crash_shutdown(&fixed_regs);
> +		machine_kexec(kexec_crash_image);
>  	}
> +
> +	atomic_set_release(&crash_kexec_lock, 0);
>  }
>  STACK_FRAME_NON_STANDARD(__crash_kexec);
>  
> diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
> index 145321a5e798..3faec031cfc9 100644
> --- a/kernel/kexec_file.c
> +++ b/kernel/kexec_file.c
> @@ -337,6 +337,15 @@ SYSCALL_DEFINE5(kexec_file_load, int, kernel_fd, int, initrd_fd,
>  	if (!mutex_trylock(&kexec_mutex))
>  		return -EBUSY;
>  
> +	/*
> +	 * Prevent loading a new crash kernel while one is in use.
> +	 * See associated comment in __crash_kexec().
> +	 */
> +	if (atomic_cmpxchg_acquire(&crash_kexec_lock, 0, 1)) {
> +		ret = -EBUSY;
> +		goto out_mutex_unlock;
> +	}
> +
>  	dest_image = &kexec_image;
>  	if (flags & KEXEC_FILE_ON_CRASH) {
>  		dest_image = &kexec_crash_image;
> @@ -406,6 +415,8 @@ SYSCALL_DEFINE5(kexec_file_load, int, kernel_fd, int, initrd_fd,
>  	if ((flags & KEXEC_FILE_ON_CRASH) && kexec_crash_image)
>  		arch_kexec_protect_crashkres();
>  
> +	atomic_set_release(&crash_kexec_lock, 0);
> +out_mutex_unlock:
>  	mutex_unlock(&kexec_mutex);
>  	kimage_free(image);
>  	return ret;
> -- 
> 2.31.1
> 


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] panic, kexec: Make __crash_kexec() NMI safe
@ 2022-06-24  1:30   ` Baoquan He
  0 siblings, 0 replies; 26+ messages in thread
From: Baoquan He @ 2022-06-24  1:30 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: linux-kernel, kexec, linux-rt-users, Eric Biederman,
	Arnd Bergmann, Petr Mladek, Thomas Gleixner,
	Sebastian Andrzej Siewior, Juri Lelli, Luis Claudio R. Goncalves

On 06/20/22 at 12:15pm, Valentin Schneider wrote:
> Attempting to get a crash dump out of a debug PREEMPT_RT kernel via an NMI
> panic() doesn't work. The cause of that lies in the PREEMPT_RT definition
> of mutex_trylock():
> 
> 	if (IS_ENABLED(CONFIG_DEBUG_RT_MUTEXES) && WARN_ON_ONCE(!in_task()))
> 		return 0;
> 
> This prevents an NMI panic() from executing the main body of
> __crash_kexec() which does the actual kexec into the kdump kernel.
> The warning and return are explained by:
> 
>   6ce47fd961fa ("rtmutex: Warn if trylock is called from hard/softirq context")
>   [...]
>   The reasons for this are:
> 
>       1) There is a potential deadlock in the slowpath
> 
>       2) Another cpu which blocks on the rtmutex will boost the task
> 	 which allegedly locked the rtmutex, but that cannot work
> 	 because the hard/softirq context borrows the task context.
> 
> Furthermore, grabbing the lock isn't NMI safe, so do away with it and
> use an atomic variable to serialize reads vs writes of
> kexec_crash_image.
> 
> Tested by triggering NMI panics via:
> 
>   $ echo 1 > /proc/sys/kernel/panic_on_unrecovered_nmi
>   $ echo 1 > /proc/sys/kernel/unknown_nmi_panic
>   $ echo 1 > /proc/sys/kernel/panic
> 
>   $ ipmitool power diag
> 
> Fixes: 6ce47fd961fa ("rtmutex: Warn if trylock is called from hard/softirq context")
> Signed-off-by: Valentin Schneider <vschneid@redhat.com>
> ---
> v1 -> v2
> ++++++++
> 
> o Changed from Peterson-like synchronization to simpler atomic_cmpxchg
>   (Petr)
> o Slightly reworded changelog
> o Added Fixes: tag. Technically should be up to since kexec can happen
>   in an NMI, but that isn't such a clear target
> ---
>  include/linux/kexec.h |  1 +
>  kernel/kexec.c        | 16 ++++++++++++----
>  kernel/kexec_core.c   | 36 +++++++++++++++++++-----------------
>  kernel/kexec_file.c   | 11 +++++++++++
>  4 files changed, 43 insertions(+), 21 deletions(-)
> 
> diff --git a/include/linux/kexec.h b/include/linux/kexec.h
> index ce6536f1d269..5849a15ae3dd 100644
> --- a/include/linux/kexec.h
> +++ b/include/linux/kexec.h
> @@ -369,6 +369,7 @@ extern int kimage_crash_copy_vmcoreinfo(struct kimage *image);
>  
>  extern struct kimage *kexec_image;
>  extern struct kimage *kexec_crash_image;
> +extern atomic_t crash_kexec_lock;
>  extern int kexec_load_disabled;
>  
>  #ifndef kexec_flush_icache_page
> diff --git a/kernel/kexec.c b/kernel/kexec.c
> index b5e40f069768..73e0df2c608f 100644
> --- a/kernel/kexec.c
> +++ b/kernel/kexec.c
> @@ -94,14 +94,20 @@ static int do_kexec_load(unsigned long entry, unsigned long nr_segments,
>  	/*
>  	 * Because we write directly to the reserved memory region when loading
>  	 * crash kernels we need a mutex here to prevent multiple crash kernels
> -	 * from attempting to load simultaneously, and to prevent a crash kernel
> -	 * from loading over the top of a in use crash kernel.
> -	 *
> -	 * KISS: always take the mutex.
> +	 * from attempting to load simultaneously.
>  	 */
>  	if (!mutex_trylock(&kexec_mutex))
>  		return -EBUSY;

So kexec_mutex is degenerated to only avoid simultaneous loading,
should we rename to reflect that?, e.g kexec_load_mutex.

>  
> +	/*
> +	 * Prevent loading a new crash kernel while one is in use.
> +	 * See associated comment in __crash_kexec().
> +	 */
> +	if (atomic_cmpxchg_acquire(&crash_kexec_lock, 0, 1)) {
> +		ret = -EBUSY;
> +		goto out_unlock_mutex;
> +	}
> +
>  	if (flags & KEXEC_ON_CRASH) {
>  		dest_image = &kexec_crash_image;
>  		if (kexec_crash_image)
> @@ -165,6 +171,8 @@ static int do_kexec_load(unsigned long entry, unsigned long nr_segments,
>  
>  	kimage_free(image);
>  out_unlock:
> +	atomic_set_release(&crash_kexec_lock, 0);
> +out_unlock_mutex:
>  	mutex_unlock(&kexec_mutex);
>  	return ret;
>  }
> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> index 4d34c78334ce..f957109a266c 100644
> --- a/kernel/kexec_core.c
> +++ b/kernel/kexec_core.c
> @@ -933,6 +933,7 @@ int kimage_load_segment(struct kimage *image,
>  
>  struct kimage *kexec_image;
>  struct kimage *kexec_crash_image;
> +atomic_t crash_kexec_lock = ATOMIC_INIT(0);
>  int kexec_load_disabled;
>  #ifdef CONFIG_SYSCTL
>  static struct ctl_table kexec_core_sysctls[] = {
> @@ -964,25 +965,26 @@ late_initcall(kexec_core_sysctl_init);
>   */
>  void __noclone __crash_kexec(struct pt_regs *regs)
>  {
> -	/* Take the kexec_mutex here to prevent sys_kexec_load
> -	 * running on one cpu from replacing the crash kernel
> -	 * we are using after a panic on a different cpu.
> -	 *
> -	 * If the crash kernel was not located in a fixed area
> -	 * of memory the xchg(&kexec_crash_image) would be
> -	 * sufficient.  But since I reuse the memory...
> +	/*
> +	 * This should be taking kexec_mutex before doing anything with the
> +	 * kexec_crash_image, but this code can be run in NMI context which
> +	 * means we can't even trylock. This is circumvented by using an
> +	 * atomic variable that is *also* used by the codepaths that take
> +	 * the mutex to modify kexec_crash_image.
>  	 */
> -	if (mutex_trylock(&kexec_mutex)) {
> -		if (kexec_crash_image) {
> -			struct pt_regs fixed_regs;
> -
> -			crash_setup_regs(&fixed_regs, regs);
> -			crash_save_vmcoreinfo();
> -			machine_crash_shutdown(&fixed_regs);
> -			machine_kexec(kexec_crash_image);
> -		}
> -		mutex_unlock(&kexec_mutex);
> +	if (atomic_cmpxchg_acquire(&crash_kexec_lock, 0, 1))
> +		return;
> +
> +	if (kexec_crash_image) {
> +		struct pt_regs fixed_regs;
> +
> +		crash_setup_regs(&fixed_regs, regs);
> +		crash_save_vmcoreinfo();
> +		machine_crash_shutdown(&fixed_regs);
> +		machine_kexec(kexec_crash_image);
>  	}
> +
> +	atomic_set_release(&crash_kexec_lock, 0);
>  }
>  STACK_FRAME_NON_STANDARD(__crash_kexec);
>  
> diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
> index 145321a5e798..3faec031cfc9 100644
> --- a/kernel/kexec_file.c
> +++ b/kernel/kexec_file.c
> @@ -337,6 +337,15 @@ SYSCALL_DEFINE5(kexec_file_load, int, kernel_fd, int, initrd_fd,
>  	if (!mutex_trylock(&kexec_mutex))
>  		return -EBUSY;
>  
> +	/*
> +	 * Prevent loading a new crash kernel while one is in use.
> +	 * See associated comment in __crash_kexec().
> +	 */
> +	if (atomic_cmpxchg_acquire(&crash_kexec_lock, 0, 1)) {
> +		ret = -EBUSY;
> +		goto out_mutex_unlock;
> +	}
> +
>  	dest_image = &kexec_image;
>  	if (flags & KEXEC_FILE_ON_CRASH) {
>  		dest_image = &kexec_crash_image;
> @@ -406,6 +415,8 @@ SYSCALL_DEFINE5(kexec_file_load, int, kernel_fd, int, initrd_fd,
>  	if ((flags & KEXEC_FILE_ON_CRASH) && kexec_crash_image)
>  		arch_kexec_protect_crashkres();
>  
> +	atomic_set_release(&crash_kexec_lock, 0);
> +out_mutex_unlock:
>  	mutex_unlock(&kexec_mutex);
>  	kimage_free(image);
>  	return ret;
> -- 
> 2.31.1
> 


_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] panic, kexec: Make __crash_kexec() NMI safe
  2022-06-24  1:30   ` Baoquan He
@ 2022-06-24 13:37     ` Valentin Schneider
  -1 siblings, 0 replies; 26+ messages in thread
From: Valentin Schneider @ 2022-06-24 13:37 UTC (permalink / raw)
  To: Baoquan He
  Cc: linux-kernel, kexec, linux-rt-users, Eric Biederman,
	Arnd Bergmann, Petr Mladek, Thomas Gleixner,
	Sebastian Andrzej Siewior, Juri Lelli, Luis Claudio R. Goncalves

On 24/06/22 09:30, Baoquan He wrote:
> On 06/20/22 at 12:15pm, Valentin Schneider wrote:
>> @@ -94,14 +94,20 @@ static int do_kexec_load(unsigned long entry, unsigned long nr_segments,
>>  	/*
>>  	 * Because we write directly to the reserved memory region when loading
>>  	 * crash kernels we need a mutex here to prevent multiple crash kernels
>> -	 * from attempting to load simultaneously, and to prevent a crash kernel
>> -	 * from loading over the top of a in use crash kernel.
>> -	 *
>> -	 * KISS: always take the mutex.
>> +	 * from attempting to load simultaneously.
>>  	 */
>>  	if (!mutex_trylock(&kexec_mutex))
>>  		return -EBUSY;
>
> So kexec_mutex is degenerated to only avoid simultaneous loading,
> should we rename to reflect that?, e.g kexec_load_mutex.
>

It's also serializing crash_get_memory_size() and crash_shrink_memory();
more generally it should still be the preferred serialization mechanism as
it's a "proper" lock visible by instrumentation, the atomic variable is a
side character for the NMI case.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] panic, kexec: Make __crash_kexec() NMI safe
@ 2022-06-24 13:37     ` Valentin Schneider
  0 siblings, 0 replies; 26+ messages in thread
From: Valentin Schneider @ 2022-06-24 13:37 UTC (permalink / raw)
  To: Baoquan He
  Cc: linux-kernel, kexec, linux-rt-users, Eric Biederman,
	Arnd Bergmann, Petr Mladek, Thomas Gleixner,
	Sebastian Andrzej Siewior, Juri Lelli, Luis Claudio R. Goncalves

On 24/06/22 09:30, Baoquan He wrote:
> On 06/20/22 at 12:15pm, Valentin Schneider wrote:
>> @@ -94,14 +94,20 @@ static int do_kexec_load(unsigned long entry, unsigned long nr_segments,
>>  	/*
>>  	 * Because we write directly to the reserved memory region when loading
>>  	 * crash kernels we need a mutex here to prevent multiple crash kernels
>> -	 * from attempting to load simultaneously, and to prevent a crash kernel
>> -	 * from loading over the top of a in use crash kernel.
>> -	 *
>> -	 * KISS: always take the mutex.
>> +	 * from attempting to load simultaneously.
>>  	 */
>>  	if (!mutex_trylock(&kexec_mutex))
>>  		return -EBUSY;
>
> So kexec_mutex is degenerated to only avoid simultaneous loading,
> should we rename to reflect that?, e.g kexec_load_mutex.
>

It's also serializing crash_get_memory_size() and crash_shrink_memory();
more generally it should still be the preferred serialization mechanism as
it's a "proper" lock visible by instrumentation, the atomic variable is a
side character for the NMI case.


_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] panic, kexec: Make __crash_kexec() NMI safe
  2022-06-20 11:15 ` Valentin Schneider
@ 2022-06-25 17:04   ` Eric W. Biederman
  -1 siblings, 0 replies; 26+ messages in thread
From: Eric W. Biederman @ 2022-06-25 17:04 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: linux-kernel, kexec, linux-rt-users, Arnd Bergmann, Petr Mladek,
	Thomas Gleixner, Sebastian Andrzej Siewior, Juri Lelli,
	Luis Claudio R. Goncalves, Andrew Morton, Vivek Goyal

Valentin Schneider <vschneid@redhat.com> writes:

> Attempting to get a crash dump out of a debug PREEMPT_RT kernel via an NMI
> panic() doesn't work. The cause of that lies in the PREEMPT_RT definition
> of mutex_trylock():
>
> 	if (IS_ENABLED(CONFIG_DEBUG_RT_MUTEXES) && WARN_ON_ONCE(!in_task()))
> 		return 0;
>
> This prevents an NMI panic() from executing the main body of
> __crash_kexec() which does the actual kexec into the kdump kernel.
> The warning and return are explained by:
>
>   6ce47fd961fa ("rtmutex: Warn if trylock is called from hard/softirq context")
>   [...]
>   The reasons for this are:
>
>       1) There is a potential deadlock in the slowpath
>
>       2) Another cpu which blocks on the rtmutex will boost the task
> 	 which allegedly locked the rtmutex, but that cannot work
> 	 because the hard/softirq context borrows the task context.
>
> Furthermore, grabbing the lock isn't NMI safe, so do away with it and
> use an atomic variable to serialize reads vs writes of
> kexec_crash_image.
>
> Tested by triggering NMI panics via:
>
>   $ echo 1 > /proc/sys/kernel/panic_on_unrecovered_nmi
>   $ echo 1 > /proc/sys/kernel/unknown_nmi_panic
>   $ echo 1 > /proc/sys/kernel/panic
>
>   $ ipmitool power diag
>
> Fixes: 6ce47fd961fa ("rtmutex: Warn if trylock is called from hard/softirq context")
> Signed-off-by: Valentin Schneider <vschneid@redhat.com>

I am not particularly fond of this patch as it adds more complexity than
is necessary to solve the problem.

Calling a spade a spade PREEMPT_RT's mutex_trylock implementation is
broken as it can not support the use cases of an ordinary mutex_trylock.
I have not seen (possibly I skimmed too quickly) anywhere in the
discussion why PREEMPT_RT is not being fixed.  Looking at the code
there is enough going on in try_to_take_rt_mutex that I can imagine
that some part of that code is not nmi safe.  So I can believe
PREEMPT_RT may be unfix-ably broken.


At this point I recommend going back to being ``unconventional'' with
the kexec locking and effectively reverting commit 8c5a1cf0ad3a ("kexec:
use a mutex for locking rather than xchg()").

That would also mean that we don't have to worry about the lockdep code
doing something weird in the future and breaking kexec.

Your change starting to is atomic_cmpxchng is most halfway to a revert
of commit 8c5a1cf0ad3a ("kexec: use a mutex for locking rather than
xchg()").  So we might as well go the whole way and just document that
the kexec on panic code can not use conventional kernel locking
primitives and has to dig deep and build it's own.  At which point it
makes no sense for the rest of the kexec code to use anything different.

Eric

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] panic, kexec: Make __crash_kexec() NMI safe
@ 2022-06-25 17:04   ` Eric W. Biederman
  0 siblings, 0 replies; 26+ messages in thread
From: Eric W. Biederman @ 2022-06-25 17:04 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: linux-kernel, kexec, linux-rt-users, Arnd Bergmann, Petr Mladek,
	Thomas Gleixner, Sebastian Andrzej Siewior, Juri Lelli,
	Luis Claudio R. Goncalves, Andrew Morton, Vivek Goyal

Valentin Schneider <vschneid@redhat.com> writes:

> Attempting to get a crash dump out of a debug PREEMPT_RT kernel via an NMI
> panic() doesn't work. The cause of that lies in the PREEMPT_RT definition
> of mutex_trylock():
>
> 	if (IS_ENABLED(CONFIG_DEBUG_RT_MUTEXES) && WARN_ON_ONCE(!in_task()))
> 		return 0;
>
> This prevents an NMI panic() from executing the main body of
> __crash_kexec() which does the actual kexec into the kdump kernel.
> The warning and return are explained by:
>
>   6ce47fd961fa ("rtmutex: Warn if trylock is called from hard/softirq context")
>   [...]
>   The reasons for this are:
>
>       1) There is a potential deadlock in the slowpath
>
>       2) Another cpu which blocks on the rtmutex will boost the task
> 	 which allegedly locked the rtmutex, but that cannot work
> 	 because the hard/softirq context borrows the task context.
>
> Furthermore, grabbing the lock isn't NMI safe, so do away with it and
> use an atomic variable to serialize reads vs writes of
> kexec_crash_image.
>
> Tested by triggering NMI panics via:
>
>   $ echo 1 > /proc/sys/kernel/panic_on_unrecovered_nmi
>   $ echo 1 > /proc/sys/kernel/unknown_nmi_panic
>   $ echo 1 > /proc/sys/kernel/panic
>
>   $ ipmitool power diag
>
> Fixes: 6ce47fd961fa ("rtmutex: Warn if trylock is called from hard/softirq context")
> Signed-off-by: Valentin Schneider <vschneid@redhat.com>

I am not particularly fond of this patch as it adds more complexity than
is necessary to solve the problem.

Calling a spade a spade PREEMPT_RT's mutex_trylock implementation is
broken as it can not support the use cases of an ordinary mutex_trylock.
I have not seen (possibly I skimmed too quickly) anywhere in the
discussion why PREEMPT_RT is not being fixed.  Looking at the code
there is enough going on in try_to_take_rt_mutex that I can imagine
that some part of that code is not nmi safe.  So I can believe
PREEMPT_RT may be unfix-ably broken.


At this point I recommend going back to being ``unconventional'' with
the kexec locking and effectively reverting commit 8c5a1cf0ad3a ("kexec:
use a mutex for locking rather than xchg()").

That would also mean that we don't have to worry about the lockdep code
doing something weird in the future and breaking kexec.

Your change starting to is atomic_cmpxchng is most halfway to a revert
of commit 8c5a1cf0ad3a ("kexec: use a mutex for locking rather than
xchg()").  So we might as well go the whole way and just document that
the kexec on panic code can not use conventional kernel locking
primitives and has to dig deep and build it's own.  At which point it
makes no sense for the rest of the kexec code to use anything different.

Eric

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] panic, kexec: Make __crash_kexec() NMI safe
  2022-06-24 13:37     ` Valentin Schneider
@ 2022-06-26 10:37       ` Baoquan He
  -1 siblings, 0 replies; 26+ messages in thread
From: Baoquan He @ 2022-06-26 10:37 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: linux-kernel, kexec, linux-rt-users, Eric Biederman,
	Arnd Bergmann, Petr Mladek, Thomas Gleixner,
	Sebastian Andrzej Siewior, Juri Lelli, Luis Claudio R. Goncalves

On 06/24/22 at 02:37pm, Valentin Schneider wrote:
> On 24/06/22 09:30, Baoquan He wrote:
> > On 06/20/22 at 12:15pm, Valentin Schneider wrote:
> >> @@ -94,14 +94,20 @@ static int do_kexec_load(unsigned long entry, unsigned long nr_segments,
> >>  	/*
> >>  	 * Because we write directly to the reserved memory region when loading
> >>  	 * crash kernels we need a mutex here to prevent multiple crash kernels
> >> -	 * from attempting to load simultaneously, and to prevent a crash kernel
> >> -	 * from loading over the top of a in use crash kernel.
> >> -	 *
> >> -	 * KISS: always take the mutex.
> >> +	 * from attempting to load simultaneously.
> >>  	 */
> >>  	if (!mutex_trylock(&kexec_mutex))
> >>  		return -EBUSY;
> >
> > So kexec_mutex is degenerated to only avoid simultaneous loading,
> > should we rename to reflect that?, e.g kexec_load_mutex.
> >
> 
> It's also serializing crash_get_memory_size() and crash_shrink_memory();
> more generally it should still be the preferred serialization mechanism as
> it's a "proper" lock visible by instrumentation, the atomic variable is a
> side character for the NMI case.

You are right. I only checked the code comment in this place. Then this
patch looks good to me, thx.

Acked-by: Baoquan He <bhe@redhat.com>


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] panic, kexec: Make __crash_kexec() NMI safe
@ 2022-06-26 10:37       ` Baoquan He
  0 siblings, 0 replies; 26+ messages in thread
From: Baoquan He @ 2022-06-26 10:37 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: linux-kernel, kexec, linux-rt-users, Eric Biederman,
	Arnd Bergmann, Petr Mladek, Thomas Gleixner,
	Sebastian Andrzej Siewior, Juri Lelli, Luis Claudio R. Goncalves

On 06/24/22 at 02:37pm, Valentin Schneider wrote:
> On 24/06/22 09:30, Baoquan He wrote:
> > On 06/20/22 at 12:15pm, Valentin Schneider wrote:
> >> @@ -94,14 +94,20 @@ static int do_kexec_load(unsigned long entry, unsigned long nr_segments,
> >>  	/*
> >>  	 * Because we write directly to the reserved memory region when loading
> >>  	 * crash kernels we need a mutex here to prevent multiple crash kernels
> >> -	 * from attempting to load simultaneously, and to prevent a crash kernel
> >> -	 * from loading over the top of a in use crash kernel.
> >> -	 *
> >> -	 * KISS: always take the mutex.
> >> +	 * from attempting to load simultaneously.
> >>  	 */
> >>  	if (!mutex_trylock(&kexec_mutex))
> >>  		return -EBUSY;
> >
> > So kexec_mutex is degenerated to only avoid simultaneous loading,
> > should we rename to reflect that?, e.g kexec_load_mutex.
> >
> 
> It's also serializing crash_get_memory_size() and crash_shrink_memory();
> more generally it should still be the preferred serialization mechanism as
> it's a "proper" lock visible by instrumentation, the atomic variable is a
> side character for the NMI case.

You are right. I only checked the code comment in this place. Then this
patch looks good to me, thx.

Acked-by: Baoquan He <bhe@redhat.com>


_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] panic, kexec: Make __crash_kexec() NMI safe
  2022-06-26 10:37       ` Baoquan He
@ 2022-06-26 10:45         ` Baoquan He
  -1 siblings, 0 replies; 26+ messages in thread
From: Baoquan He @ 2022-06-26 10:45 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: linux-kernel, kexec, linux-rt-users, Eric Biederman,
	Arnd Bergmann, Petr Mladek, Thomas Gleixner,
	Sebastian Andrzej Siewior, Juri Lelli, Luis Claudio R. Goncalves

On 06/26/22 at 06:37pm, Baoquan He wrote:
> On 06/24/22 at 02:37pm, Valentin Schneider wrote:
> > On 24/06/22 09:30, Baoquan He wrote:
> > > On 06/20/22 at 12:15pm, Valentin Schneider wrote:
> > >> @@ -94,14 +94,20 @@ static int do_kexec_load(unsigned long entry, unsigned long nr_segments,
> > >>  	/*
> > >>  	 * Because we write directly to the reserved memory region when loading
> > >>  	 * crash kernels we need a mutex here to prevent multiple crash kernels
> > >> -	 * from attempting to load simultaneously, and to prevent a crash kernel
> > >> -	 * from loading over the top of a in use crash kernel.
> > >> -	 *
> > >> -	 * KISS: always take the mutex.
> > >> +	 * from attempting to load simultaneously.
> > >>  	 */
> > >>  	if (!mutex_trylock(&kexec_mutex))
> > >>  		return -EBUSY;
> > >
> > > So kexec_mutex is degenerated to only avoid simultaneous loading,
> > > should we rename to reflect that?, e.g kexec_load_mutex.
> > >
> > 
> > It's also serializing crash_get_memory_size() and crash_shrink_memory();
> > more generally it should still be the preferred serialization mechanism as
> > it's a "proper" lock visible by instrumentation, the atomic variable is a
> > side character for the NMI case.
> 
> You are right. I only checked the code comment in this place. Then this
> patch looks good to me, thx.
> 
> Acked-by: Baoquan He <bhe@redhat.com>

OK, just saw Eric's comment after I replied.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] panic, kexec: Make __crash_kexec() NMI safe
@ 2022-06-26 10:45         ` Baoquan He
  0 siblings, 0 replies; 26+ messages in thread
From: Baoquan He @ 2022-06-26 10:45 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: linux-kernel, kexec, linux-rt-users, Eric Biederman,
	Arnd Bergmann, Petr Mladek, Thomas Gleixner,
	Sebastian Andrzej Siewior, Juri Lelli, Luis Claudio R. Goncalves

On 06/26/22 at 06:37pm, Baoquan He wrote:
> On 06/24/22 at 02:37pm, Valentin Schneider wrote:
> > On 24/06/22 09:30, Baoquan He wrote:
> > > On 06/20/22 at 12:15pm, Valentin Schneider wrote:
> > >> @@ -94,14 +94,20 @@ static int do_kexec_load(unsigned long entry, unsigned long nr_segments,
> > >>  	/*
> > >>  	 * Because we write directly to the reserved memory region when loading
> > >>  	 * crash kernels we need a mutex here to prevent multiple crash kernels
> > >> -	 * from attempting to load simultaneously, and to prevent a crash kernel
> > >> -	 * from loading over the top of a in use crash kernel.
> > >> -	 *
> > >> -	 * KISS: always take the mutex.
> > >> +	 * from attempting to load simultaneously.
> > >>  	 */
> > >>  	if (!mutex_trylock(&kexec_mutex))
> > >>  		return -EBUSY;
> > >
> > > So kexec_mutex is degenerated to only avoid simultaneous loading,
> > > should we rename to reflect that?, e.g kexec_load_mutex.
> > >
> > 
> > It's also serializing crash_get_memory_size() and crash_shrink_memory();
> > more generally it should still be the preferred serialization mechanism as
> > it's a "proper" lock visible by instrumentation, the atomic variable is a
> > side character for the NMI case.
> 
> You are right. I only checked the code comment in this place. Then this
> patch looks good to me, thx.
> 
> Acked-by: Baoquan He <bhe@redhat.com>

OK, just saw Eric's comment after I replied.


_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] panic, kexec: Make __crash_kexec() NMI safe
  2022-06-25 17:04   ` Eric W. Biederman
@ 2022-06-27 12:42     ` Valentin Schneider
  -1 siblings, 0 replies; 26+ messages in thread
From: Valentin Schneider @ 2022-06-27 12:42 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, kexec, linux-rt-users, Arnd Bergmann, Petr Mladek,
	Thomas Gleixner, Sebastian Andrzej Siewior, Juri Lelli,
	Luis Claudio R. Goncalves, Andrew Morton, Vivek Goyal

On 25/06/22 12:04, Eric W. Biederman wrote:
> I am not particularly fond of this patch as it adds more complexity than
> is necessary to solve the problem.
>
> Calling a spade a spade PREEMPT_RT's mutex_trylock implementation is
> broken as it can not support the use cases of an ordinary mutex_trylock.
> I have not seen (possibly I skimmed too quickly) anywhere in the
> discussion why PREEMPT_RT is not being fixed.  Looking at the code
> there is enough going on in try_to_take_rt_mutex that I can imagine
> that some part of that code is not nmi safe.  So I can believe
> PREEMPT_RT may be unfix-ably broken.
>

AFAICT same goes for !PREEMPT_RT given the mutex_unlock(); it's a bit
convoluted but you can craft scenarios where the NMI ends up spinning on
mutex->wait_lock that is owned by the interrupted task, e.g.

  CPU0                    CPU1

                          crash_shrink_memory()
                            mutex_lock();
  crash_get_memory_size()
    mutex_lock()
      raw_spin_lock(&lock->wait_lock);
      // Lock acquired
  <NMI>
                            mutex_unlock()
                              <Release lock->owner>;

  // Owner is free at this point so this succeeds
  mutex_trylock();
  // No kexec_crash_image
  mutex_unlock()
    raw_spin_lock(&lock->wait_lock);

>
> At this point I recommend going back to being ``unconventional'' with
> the kexec locking and effectively reverting commit 8c5a1cf0ad3a ("kexec:
> use a mutex for locking rather than xchg()").
>
> That would also mean that we don't have to worry about the lockdep code
> doing something weird in the future and breaking kexec.
>
> Your change starting to is atomic_cmpxchng is most halfway to a revert
> of commit 8c5a1cf0ad3a ("kexec: use a mutex for locking rather than
> xchg()").  So we might as well go the whole way and just document that
> the kexec on panic code can not use conventional kernel locking
> primitives and has to dig deep and build it's own.  At which point it
> makes no sense for the rest of the kexec code to use anything different.
>

Hm, I'm a bit torn about that one, ideally I'd prefer to keep "homegrown"
locking primitives to just where they are needed (loading & kexec'ing), but
I'm also not immensely fond of the "hybrid" mutex+cmpxchg approach.

> Eric


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] panic, kexec: Make __crash_kexec() NMI safe
@ 2022-06-27 12:42     ` Valentin Schneider
  0 siblings, 0 replies; 26+ messages in thread
From: Valentin Schneider @ 2022-06-27 12:42 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, kexec, linux-rt-users, Arnd Bergmann, Petr Mladek,
	Thomas Gleixner, Sebastian Andrzej Siewior, Juri Lelli,
	Luis Claudio R. Goncalves, Andrew Morton, Vivek Goyal

On 25/06/22 12:04, Eric W. Biederman wrote:
> I am not particularly fond of this patch as it adds more complexity than
> is necessary to solve the problem.
>
> Calling a spade a spade PREEMPT_RT's mutex_trylock implementation is
> broken as it can not support the use cases of an ordinary mutex_trylock.
> I have not seen (possibly I skimmed too quickly) anywhere in the
> discussion why PREEMPT_RT is not being fixed.  Looking at the code
> there is enough going on in try_to_take_rt_mutex that I can imagine
> that some part of that code is not nmi safe.  So I can believe
> PREEMPT_RT may be unfix-ably broken.
>

AFAICT same goes for !PREEMPT_RT given the mutex_unlock(); it's a bit
convoluted but you can craft scenarios where the NMI ends up spinning on
mutex->wait_lock that is owned by the interrupted task, e.g.

  CPU0                    CPU1

                          crash_shrink_memory()
                            mutex_lock();
  crash_get_memory_size()
    mutex_lock()
      raw_spin_lock(&lock->wait_lock);
      // Lock acquired
  <NMI>
                            mutex_unlock()
                              <Release lock->owner>;

  // Owner is free at this point so this succeeds
  mutex_trylock();
  // No kexec_crash_image
  mutex_unlock()
    raw_spin_lock(&lock->wait_lock);

>
> At this point I recommend going back to being ``unconventional'' with
> the kexec locking and effectively reverting commit 8c5a1cf0ad3a ("kexec:
> use a mutex for locking rather than xchg()").
>
> That would also mean that we don't have to worry about the lockdep code
> doing something weird in the future and breaking kexec.
>
> Your change starting to is atomic_cmpxchng is most halfway to a revert
> of commit 8c5a1cf0ad3a ("kexec: use a mutex for locking rather than
> xchg()").  So we might as well go the whole way and just document that
> the kexec on panic code can not use conventional kernel locking
> primitives and has to dig deep and build it's own.  At which point it
> makes no sense for the rest of the kexec code to use anything different.
>

Hm, I'm a bit torn about that one, ideally I'd prefer to keep "homegrown"
locking primitives to just where they are needed (loading & kexec'ing), but
I'm also not immensely fond of the "hybrid" mutex+cmpxchg approach.

> Eric


_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] panic, kexec: Make __crash_kexec() NMI safe
  2022-06-27 12:42     ` Valentin Schneider
@ 2022-06-28 17:33       ` Valentin Schneider
  -1 siblings, 0 replies; 26+ messages in thread
From: Valentin Schneider @ 2022-06-28 17:33 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, kexec, linux-rt-users, Arnd Bergmann, Petr Mladek,
	Thomas Gleixner, Sebastian Andrzej Siewior, Juri Lelli,
	Luis Claudio R. Goncalves, Andrew Morton, Vivek Goyal

On 27/06/22 13:42, Valentin Schneider wrote:
> On 25/06/22 12:04, Eric W. Biederman wrote:
>> At this point I recommend going back to being ``unconventional'' with
>> the kexec locking and effectively reverting commit 8c5a1cf0ad3a ("kexec:
>> use a mutex for locking rather than xchg()").
>>
>> That would also mean that we don't have to worry about the lockdep code
>> doing something weird in the future and breaking kexec.
>>
>> Your change starting to is atomic_cmpxchng is most halfway to a revert
>> of commit 8c5a1cf0ad3a ("kexec: use a mutex for locking rather than
>> xchg()").  So we might as well go the whole way and just document that
>> the kexec on panic code can not use conventional kernel locking
>> primitives and has to dig deep and build it's own.  At which point it
>> makes no sense for the rest of the kexec code to use anything different.
>>
>
> Hm, I'm a bit torn about that one, ideally I'd prefer to keep "homegrown"
> locking primitives to just where they are needed (loading & kexec'ing), but
> I'm also not immensely fond of the "hybrid" mutex+cmpxchg approach.
>

8c5a1cf0ad3a ("kexec: use a mutex for locking rather than xchg()") was
straightforward enough because it turned

        if (xchg(&lock, 1))
                return -EBUSY;

into

        if (!mutex_trylock(&lock))
                return -EBUSY;

Now, most of the kexec_mutex uses are trylocks, except for:
- crash_get_memory_size()
- crash_shrink_memory()

I really don't want to go down the route of turning those into cmpxchg
try-loops, would it be acceptable to make those use trylocks (i.e. return
-EBUSY if the cmpxchg fails)?

Otherwise, we keep the mutexes for functions like those which go nowhere
near an NMI.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] panic, kexec: Make __crash_kexec() NMI safe
@ 2022-06-28 17:33       ` Valentin Schneider
  0 siblings, 0 replies; 26+ messages in thread
From: Valentin Schneider @ 2022-06-28 17:33 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, kexec, linux-rt-users, Arnd Bergmann, Petr Mladek,
	Thomas Gleixner, Sebastian Andrzej Siewior, Juri Lelli,
	Luis Claudio R. Goncalves, Andrew Morton, Vivek Goyal

On 27/06/22 13:42, Valentin Schneider wrote:
> On 25/06/22 12:04, Eric W. Biederman wrote:
>> At this point I recommend going back to being ``unconventional'' with
>> the kexec locking and effectively reverting commit 8c5a1cf0ad3a ("kexec:
>> use a mutex for locking rather than xchg()").
>>
>> That would also mean that we don't have to worry about the lockdep code
>> doing something weird in the future and breaking kexec.
>>
>> Your change starting to is atomic_cmpxchng is most halfway to a revert
>> of commit 8c5a1cf0ad3a ("kexec: use a mutex for locking rather than
>> xchg()").  So we might as well go the whole way and just document that
>> the kexec on panic code can not use conventional kernel locking
>> primitives and has to dig deep and build it's own.  At which point it
>> makes no sense for the rest of the kexec code to use anything different.
>>
>
> Hm, I'm a bit torn about that one, ideally I'd prefer to keep "homegrown"
> locking primitives to just where they are needed (loading & kexec'ing), but
> I'm also not immensely fond of the "hybrid" mutex+cmpxchg approach.
>

8c5a1cf0ad3a ("kexec: use a mutex for locking rather than xchg()") was
straightforward enough because it turned

        if (xchg(&lock, 1))
                return -EBUSY;

into

        if (!mutex_trylock(&lock))
                return -EBUSY;

Now, most of the kexec_mutex uses are trylocks, except for:
- crash_get_memory_size()
- crash_shrink_memory()

I really don't want to go down the route of turning those into cmpxchg
try-loops, would it be acceptable to make those use trylocks (i.e. return
-EBUSY if the cmpxchg fails)?

Otherwise, we keep the mutexes for functions like those which go nowhere
near an NMI.


_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] panic, kexec: Make __crash_kexec() NMI safe
  2022-06-28 17:33       ` Valentin Schneider
@ 2022-06-29 11:55         ` Petr Mladek
  -1 siblings, 0 replies; 26+ messages in thread
From: Petr Mladek @ 2022-06-29 11:55 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: Eric W. Biederman, linux-kernel, kexec, linux-rt-users,
	Arnd Bergmann, Thomas Gleixner, Sebastian Andrzej Siewior,
	Juri Lelli, Luis Claudio R. Goncalves, Andrew Morton,
	Vivek Goyal

On Tue 2022-06-28 18:33:08, Valentin Schneider wrote:
> On 27/06/22 13:42, Valentin Schneider wrote:
> > On 25/06/22 12:04, Eric W. Biederman wrote:
> >> At this point I recommend going back to being ``unconventional'' with
> >> the kexec locking and effectively reverting commit 8c5a1cf0ad3a ("kexec:
> >> use a mutex for locking rather than xchg()").
> >>
> >> That would also mean that we don't have to worry about the lockdep code
> >> doing something weird in the future and breaking kexec.
> >>
> >> Your change starting to is atomic_cmpxchng is most halfway to a revert
> >> of commit 8c5a1cf0ad3a ("kexec: use a mutex for locking rather than
> >> xchg()").  So we might as well go the whole way and just document that
> >> the kexec on panic code can not use conventional kernel locking
> >> primitives and has to dig deep and build it's own.  At which point it
> >> makes no sense for the rest of the kexec code to use anything different.
> >>
> >
> > Hm, I'm a bit torn about that one, ideally I'd prefer to keep "homegrown"
> > locking primitives to just where they are needed (loading & kexec'ing), but
> > I'm also not immensely fond of the "hybrid" mutex+cmpxchg approach.
> >
> 
> 8c5a1cf0ad3a ("kexec: use a mutex for locking rather than xchg()") was
> straightforward enough because it turned
> 
>         if (xchg(&lock, 1))
>                 return -EBUSY;
> 
> into
> 
>         if (!mutex_trylock(&lock))
>                 return -EBUSY;
> 
> Now, most of the kexec_mutex uses are trylocks, except for:
> - crash_get_memory_size()
> - crash_shrink_memory()
> 
> I really don't want to go down the route of turning those into cmpxchg
> try-loops, would it be acceptable to make those use trylocks (i.e. return
> -EBUSY if the cmpxchg fails)?

IMHO, -EBUSY is acceptable for both crash_get_memory_size()
and crash_shrink_memory(). They are used in the sysfs interface.

> Otherwise, we keep the mutexes for functions like those which go nowhere
> near an NMI.

If we go this way then I would hide the locking into some wrappers,
like crash_kexec_trylock()/unlock() that would do both mutex
and xchg. The xchg part might be hidden in a separate wrapper
__crash_kexec_trylock()/unlock() or
crash_kexec_atomic_trylock()/unlock().

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] panic, kexec: Make __crash_kexec() NMI safe
@ 2022-06-29 11:55         ` Petr Mladek
  0 siblings, 0 replies; 26+ messages in thread
From: Petr Mladek @ 2022-06-29 11:55 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: Eric W. Biederman, linux-kernel, kexec, linux-rt-users,
	Arnd Bergmann, Thomas Gleixner, Sebastian Andrzej Siewior,
	Juri Lelli, Luis Claudio R. Goncalves, Andrew Morton,
	Vivek Goyal

On Tue 2022-06-28 18:33:08, Valentin Schneider wrote:
> On 27/06/22 13:42, Valentin Schneider wrote:
> > On 25/06/22 12:04, Eric W. Biederman wrote:
> >> At this point I recommend going back to being ``unconventional'' with
> >> the kexec locking and effectively reverting commit 8c5a1cf0ad3a ("kexec:
> >> use a mutex for locking rather than xchg()").
> >>
> >> That would also mean that we don't have to worry about the lockdep code
> >> doing something weird in the future and breaking kexec.
> >>
> >> Your change starting to is atomic_cmpxchng is most halfway to a revert
> >> of commit 8c5a1cf0ad3a ("kexec: use a mutex for locking rather than
> >> xchg()").  So we might as well go the whole way and just document that
> >> the kexec on panic code can not use conventional kernel locking
> >> primitives and has to dig deep and build it's own.  At which point it
> >> makes no sense for the rest of the kexec code to use anything different.
> >>
> >
> > Hm, I'm a bit torn about that one, ideally I'd prefer to keep "homegrown"
> > locking primitives to just where they are needed (loading & kexec'ing), but
> > I'm also not immensely fond of the "hybrid" mutex+cmpxchg approach.
> >
> 
> 8c5a1cf0ad3a ("kexec: use a mutex for locking rather than xchg()") was
> straightforward enough because it turned
> 
>         if (xchg(&lock, 1))
>                 return -EBUSY;
> 
> into
> 
>         if (!mutex_trylock(&lock))
>                 return -EBUSY;
> 
> Now, most of the kexec_mutex uses are trylocks, except for:
> - crash_get_memory_size()
> - crash_shrink_memory()
> 
> I really don't want to go down the route of turning those into cmpxchg
> try-loops, would it be acceptable to make those use trylocks (i.e. return
> -EBUSY if the cmpxchg fails)?

IMHO, -EBUSY is acceptable for both crash_get_memory_size()
and crash_shrink_memory(). They are used in the sysfs interface.

> Otherwise, we keep the mutexes for functions like those which go nowhere
> near an NMI.

If we go this way then I would hide the locking into some wrappers,
like crash_kexec_trylock()/unlock() that would do both mutex
and xchg. The xchg part might be hidden in a separate wrapper
__crash_kexec_trylock()/unlock() or
crash_kexec_atomic_trylock()/unlock().

Best Regards,
Petr

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] panic, kexec: Make __crash_kexec() NMI safe
  2022-06-29 11:55         ` Petr Mladek
@ 2022-06-29 12:23           ` Valentin Schneider
  -1 siblings, 0 replies; 26+ messages in thread
From: Valentin Schneider @ 2022-06-29 12:23 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Eric W. Biederman, linux-kernel, kexec, linux-rt-users,
	Arnd Bergmann, Thomas Gleixner, Sebastian Andrzej Siewior,
	Juri Lelli, Luis Claudio R. Goncalves, Andrew Morton,
	Vivek Goyal

On 29/06/22 13:55, Petr Mladek wrote:
> On Tue 2022-06-28 18:33:08, Valentin Schneider wrote:
>>
>> 8c5a1cf0ad3a ("kexec: use a mutex for locking rather than xchg()") was
>> straightforward enough because it turned
>>
>>         if (xchg(&lock, 1))
>>                 return -EBUSY;
>>
>> into
>>
>>         if (!mutex_trylock(&lock))
>>                 return -EBUSY;
>>
>> Now, most of the kexec_mutex uses are trylocks, except for:
>> - crash_get_memory_size()
>> - crash_shrink_memory()
>>
>> I really don't want to go down the route of turning those into cmpxchg
>> try-loops, would it be acceptable to make those use trylocks (i.e. return
>> -EBUSY if the cmpxchg fails)?
>
> IMHO, -EBUSY is acceptable for both crash_get_memory_size()
> and crash_shrink_memory(). They are used in the sysfs interface.
>
>> Otherwise, we keep the mutexes for functions like those which go nowhere
>> near an NMI.
>
> If we go this way then I would hide the locking into some wrappers,
> like crash_kexec_trylock()/unlock() that would do both mutex
> and xchg. The xchg part might be hidden in a separate wrapper
> __crash_kexec_trylock()/unlock() or
> crash_kexec_atomic_trylock()/unlock().
>

Makes sense, thanks. I've started playing with the trylock/-EBUSY approach,
I'll toss it out if I don't end up hating it.

> Best Regards,
> Petr


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] panic, kexec: Make __crash_kexec() NMI safe
@ 2022-06-29 12:23           ` Valentin Schneider
  0 siblings, 0 replies; 26+ messages in thread
From: Valentin Schneider @ 2022-06-29 12:23 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Eric W. Biederman, linux-kernel, kexec, linux-rt-users,
	Arnd Bergmann, Thomas Gleixner, Sebastian Andrzej Siewior,
	Juri Lelli, Luis Claudio R. Goncalves, Andrew Morton,
	Vivek Goyal

On 29/06/22 13:55, Petr Mladek wrote:
> On Tue 2022-06-28 18:33:08, Valentin Schneider wrote:
>>
>> 8c5a1cf0ad3a ("kexec: use a mutex for locking rather than xchg()") was
>> straightforward enough because it turned
>>
>>         if (xchg(&lock, 1))
>>                 return -EBUSY;
>>
>> into
>>
>>         if (!mutex_trylock(&lock))
>>                 return -EBUSY;
>>
>> Now, most of the kexec_mutex uses are trylocks, except for:
>> - crash_get_memory_size()
>> - crash_shrink_memory()
>>
>> I really don't want to go down the route of turning those into cmpxchg
>> try-loops, would it be acceptable to make those use trylocks (i.e. return
>> -EBUSY if the cmpxchg fails)?
>
> IMHO, -EBUSY is acceptable for both crash_get_memory_size()
> and crash_shrink_memory(). They are used in the sysfs interface.
>
>> Otherwise, we keep the mutexes for functions like those which go nowhere
>> near an NMI.
>
> If we go this way then I would hide the locking into some wrappers,
> like crash_kexec_trylock()/unlock() that would do both mutex
> and xchg. The xchg part might be hidden in a separate wrapper
> __crash_kexec_trylock()/unlock() or
> crash_kexec_atomic_trylock()/unlock().
>

Makes sense, thanks. I've started playing with the trylock/-EBUSY approach,
I'll toss it out if I don't end up hating it.

> Best Regards,
> Petr


_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2022-06-29 12:25 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-20 11:15 [PATCH v2] panic, kexec: Make __crash_kexec() NMI safe Valentin Schneider
2022-06-20 11:15 ` Valentin Schneider
2022-06-23  9:31 ` Sebastian Andrzej Siewior
2022-06-23  9:31   ` Sebastian Andrzej Siewior
2022-06-23 11:39   ` Valentin Schneider
2022-06-23 11:39     ` Valentin Schneider
2022-06-23 13:35     ` Sebastian Andrzej Siewior
2022-06-23 13:35       ` Sebastian Andrzej Siewior
2022-06-24  1:30 ` Baoquan He
2022-06-24  1:30   ` Baoquan He
2022-06-24 13:37   ` Valentin Schneider
2022-06-24 13:37     ` Valentin Schneider
2022-06-26 10:37     ` Baoquan He
2022-06-26 10:37       ` Baoquan He
2022-06-26 10:45       ` Baoquan He
2022-06-26 10:45         ` Baoquan He
2022-06-25 17:04 ` Eric W. Biederman
2022-06-25 17:04   ` Eric W. Biederman
2022-06-27 12:42   ` Valentin Schneider
2022-06-27 12:42     ` Valentin Schneider
2022-06-28 17:33     ` Valentin Schneider
2022-06-28 17:33       ` Valentin Schneider
2022-06-29 11:55       ` Petr Mladek
2022-06-29 11:55         ` Petr Mladek
2022-06-29 12:23         ` Valentin Schneider
2022-06-29 12:23           ` Valentin Schneider

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.