Kernel crash due to memory corruption with v5.4.26-rt17 and PowerPC e500


* Kernel crash due to memory corruption with v5.4.26-rt17 and PowerPC e500
@ 2020-05-04  9:40 Mark Marshall
  2020-05-29 13:14 ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 11+ messages in thread
From: Mark Marshall @ 2020-05-04  9:40 UTC (permalink / raw)
  To: linux-rt-users
  Cc: Mark Marshall, thomas.graziadei, Thomas Gleixner, bigeasy,
	linux-kernel, rostedt

Hi RT experts,

We are using the RT kernel with the PowerPC e500.  Until recently we
were on the 4.19 kernel series, and are in the process of upgrading.
Since we switched to the v5.4 version, we get a reproducible kernel
crash.  The crashes all contain the "BUG: Bad rss-counter state" line,
and then after that it appears that a structure of type mm_struct or
vm_area_struct is corrupted.
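
For readers who have not met this message before: in kernels of this era it is printed by a sanity check that runs when an mm_struct is finally freed (the first trace below goes through __mmdrop), and it fires if any of the per-mm RSS counters is not back at zero. The following is a minimal userspace sketch of that kind of check, not the kernel's code; the toy_* names and the exact message format are assumptions made for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Toy stand-ins for the per-mm RSS counter indices (names are illustrative). */
enum {
	TOY_FILEPAGES,
	TOY_ANONPAGES,
	TOY_SWAPENTS,
	TOY_SHMEMPAGES,
	TOY_NR_COUNTERS
};

struct toy_mm {
	long rss[TOY_NR_COUNTERS];
};

/*
 * Meant to run when the last reference to the mm goes away.  Every
 * counter should be zero by then; a non-zero value means pages were
 * accounted to the wrong mm or the structure was overwritten.
 * Returns the number of bad counters.
 */
static int toy_check_mm(const struct toy_mm *mm)
{
	int bad = 0;

	assert(mm != NULL);
	for (int i = 0; i < TOY_NR_COUNTERS; i++) {
		if (mm->rss[i] != 0) {
			fprintf(stderr, "Bad rss-counter state idx:%d val:%ld\n",
				i, mm->rss[i]);
			bad++;
		}
	}
	return bad;
}
```

Because the check only runs at teardown, it reports that the accounting went wrong at some earlier point, not where; the corruption itself has to be located by other means.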

The easiest way we have found to reproduce the crash is to repeatedly
insert and then remove a module.  The crash then appears to be related
to either paging in the module or in exiting the mdev process.  (The
crash does also happen at other times, but it is hard to reproduce
reliably then).  This simple script will almost always crash:

   for i in $(seq 1000) ; do echo $i ; modprobe crc7 ; rmmod crc7 ; done

(The crc7 module is chosen as it is small and simple.  Any module will
work / crash).

We have tried kernels v5.0, v5.2 and v5.6.  The v5.0 and v5.2 kernels
do not show the problem.  The v5.6 kernel does show the problem.
Switching off RT fixes the problem.

I have reduced the functionality in the kernel to a bare minimum
(removing networking, USB and PCI, as we have some out-of-tree patches
in those areas) and we still get the crash.

Here are a couple of example stack traces:

000: NIP [c003f8e0] __mmdrop+0x2c8/0x3dc
000: LR [c003f8e0] __mmdrop+0x2c8/0x3dc
000: Call Trace:
000: [e953fd48] [c003f8e0] __mmdrop+0x2c8/0x3dc
000:  (unreliable)
000: [e953fd88] [c00c6d28] rcu_core+0x324/0x78c
000: [e953fe58] [c00c79e0] rcu_cpu_kthread+0x1f4/0x42c
000: [e953fe98] [c00838fc] smpboot_thread_fn+0x2e8/0x488
000: [e953fef8] [c007d514] kthread+0x1b0/0x1b8
000: [e953ff38] [c001a26c] ret_from_kernel_thread+0x14/0x1c

000: NIP [c010cdd4] acct_collect+0x3a8/0x3e0
000: LR [c010cdd4] acct_collect+0x3a8/0x3e0
000: Call Trace:
000: [c6f2bbe0] [c010cdd4] acct_collect+0x3a8/0x3e0
000:  (unreliable)
000: [c6f2bc10] [c0049354] do_exit+0x294/0xf9c
000: [c6f2bcf0] [c0013030] die+0x220/0x2c4
000: [c6f2bd30] [c00132cc] exception_common+0x1f8/0x238
000: [c6f2bd30] [c00132cc] exception_common+0x1f8/0x238
000: [c6f2bd70] [c0013404] _exception+0x34/0x80
000: [c6f2bd90] [c001a4a8] ret_from_except_full+0x0/0x4

I have added some debugging code where the mm_struct and
vm_area_struct have "poison" values at the start and the end, and
this seems to show that the vm_area_struct is getting corrupted, but
I'm not able to see where.
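
The poison-value technique mentioned here can be sketched in userspace C as follows. This is only an illustration under assumptions: the structure, its fields and the guard patterns are hypothetical and are not the debugging code used in this report.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Arbitrary illustrative guard patterns. */
#define GUARD_HEAD 0xdeadbeefu
#define GUARD_TAIL 0xfeedfaceu

/* Hypothetical stand-in for a structure such as vm_area_struct. */
struct guarded_vma {
	uint32_t head_guard;	/* first member: catches writes from below */
	unsigned long vm_start;
	unsigned long vm_end;
	uint32_t tail_guard;	/* last member: catches writes from above */
};

static void guarded_vma_init(struct guarded_vma *v,
			     unsigned long start, unsigned long end)
{
	assert(v != NULL);
	v->head_guard = GUARD_HEAD;
	v->vm_start = start;
	v->vm_end = end;
	v->tail_guard = GUARD_TAIL;
}

/* Returns 0 if intact; bit 0 set if the head guard was overwritten,
 * bit 1 set if the tail guard was overwritten. */
static int guarded_vma_check(const struct guarded_vma *v)
{
	int damaged = 0;

	assert(v != NULL);
	if (v->head_guard != GUARD_HEAD)
		damaged |= 1;
	if (v->tail_guard != GUARD_TAIL)
		damaged |= 2;
	return damaged;
}
```

A check like this shows that an overwrite happened and on which side of the structure, but only at the points where it is called, so it narrows the time window without identifying the writer.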

We have switched on all of the debugging that we can, including
KASAN, and this shows nothing.

Can anyone help us?  What can we try next?  Is anyone using the e500
with the RT kernel?  Does anyone have any idea how to debug problems
related to the error message "Bad rss-counter state"?

Any help or advice would be most gratefully received.

Many thanks,
Mark Marshall and Thomas Graziadei

PS.  Thomas Graziadei (my colleague) did find a bug in the start_32.S
file for the e500, and we have the fix for that included.  We have
also tried removing the LAZY_PREEMPTION patch completely, and this
doesn't help.


* Re: Kernel crash due to memory corruption with v5.4.26-rt17 and PowerPC e500
  2020-05-04  9:40 Kernel crash due to memory corruption with v5.4.26-rt17 and PowerPC e500 Mark Marshall
@ 2020-05-29 13:14 ` Sebastian Andrzej Siewior
  2020-05-29 15:38   ` Mark Marshall
  0 siblings, 1 reply; 11+ messages in thread
From: Sebastian Andrzej Siewior @ 2020-05-29 13:14 UTC (permalink / raw)
  To: Mark Marshall
  Cc: linux-rt-users, Mark Marshall, thomas.graziadei, Thomas Gleixner,
	linux-kernel, rostedt

On 2020-05-04 11:40:08 [+0200], Mark Marshall wrote:
> The easiest way we have found to reproduce the crash is to repeatedly
> insert and then remove a module.  The crash then appears to be related
> to either paging in the module or in exiting the mdev process.  (The
> crash does also happen at other times, but it is hard to reproduce
> reliably then).  This simple script will almost always crash:
> 
>    for i in $(seq 1000) ; do echo $i ; modprobe crc7 ; rmmod crc7 ; done

So I tried that on 5.6.14-rt7 with the qemu version of e500 (the SMP and
UP version). No luck. I don't have anything with real hardware.
Could you share the .config in case this is related?

> (The crc7 module is chosen as it is small and simple.  Any module will
> work / crash).
> 
> We have tried kernels v5.0, v5.2 and v5.6.  The v5.0 and v5.2 kernels
> do not show the problem.  The v5.6 kernel does show the problem.
> Switching off RT fixes the problem.
> 
> I have reduced the functionality in the kernel to a bare minimum
> (removing networking, USB and PCI, as we have some out-of-tree patches
> in those areas) and we still get the crash.
…
> I have added some debugging code where the mm_struct and
> vm_area_struct have "poison" values at the start and the end, and
> this seems to show that the vm_area_struct is getting corrupted, but
> I'm not able to see where.

oh.

> We have switched on all of the debugging that we can, including
> KASAN, and this shows nothing.
> 
> 
> Can anyone help us?  What can we try next?  Is anyone using the e500
> with the RT kernel?  Does anyone have any idea how to debug problems
> related to the error message "Bad rss-counter state"?
> 
> Any help or advice would be most gratefully received.

I don't have any ideas. You could try to apply only a part of the RT
patch and see if the problem is still there. If you are lucky you find
the patch that introduces the problem. If not, the problem appears with
the RT switch…

> Many thanks,
> Mark Marshall and Thomas Graziadei

Sebastian


* Re: Kernel crash due to memory corruption with v5.4.26-rt17 and PowerPC e500
  2020-05-29 13:14 ` Sebastian Andrzej Siewior
@ 2020-05-29 15:38   ` Mark Marshall
  2020-05-29 16:15     ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 11+ messages in thread
From: Mark Marshall @ 2020-05-29 15:38 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-rt-users, Mark Marshall, thomas.graziadei, Thomas Gleixner,
	linux-kernel, rostedt

Hi Sebastian & list,

I had assumed that my e-mail had got lost or overlooked; I was meaning to
post a follow-up message this week...

All I could find from the debugging and tracing that we added was that
something was going wrong with the mm data structures somewhere in the
exec code.  In the end I just spent a week or two poring over the diffs
of this code between the versions that I knew worked and didn't work.

I eventually found the culprit.  On the working kernel versions there is
a patch called "mm: Protect activate_mm() by preempt_[disable&enable]_rt()".
This is commit f0b4a9cb253a on the V4.19.82-rt30 branch, for instance.
Although the commit message talks about ARM, it seems that we need this for
PowerPC too (I guess, any PowerPC with the "nohash" MMU?).

Could you please add this commit back to the RT branch?  I'm not sure how
to find out the history of this commit.  For instance, why has it been
removed from the RT patchset?  How are these things tracked, generally?

Best regards,
Mark


* Re: Kernel crash due to memory corruption with v5.4.26-rt17 and PowerPC e500
  2020-05-29 15:38   ` Mark Marshall
@ 2020-05-29 16:15     ` Sebastian Andrzej Siewior
  2020-05-29 16:37       ` Sebastian Andrzej Siewior
  2020-05-29 19:03       ` Mark Marshall
  0 siblings, 2 replies; 11+ messages in thread
From: Sebastian Andrzej Siewior @ 2020-05-29 16:15 UTC (permalink / raw)
  To: Mark Marshall
  Cc: linux-rt-users, Mark Marshall, thomas.graziadei, Thomas Gleixner,
	linux-kernel, rostedt

On 2020-05-29 17:38:39 [+0200], Mark Marshall wrote:
> Hi Sebastian & list,
Hi,

> I had assumed that my e-mail had got lost or overlooked; I was meaning to
> post a follow-up message this week...
> 
> All I could find from the debugging and tracing that we added was that
> something was going wrong with the mm data structures somewhere in the
> exec code.  In the end I just spent a week or two poring over the diffs
> of this code between the versions that I knew worked and didn't work.
> 
> I eventually found the culprit.  On the working kernel versions there is
> a patch called "mm: Protect activate_mm() by preempt_[disable&enable]_rt()".
> This is commit f0b4a9cb253a on the V4.19.82-rt30 branch, for instance.
> Although the commit message talks about ARM, it seems that we need this for
> PowerPC too (I guess, any PowerPC with the "nohash" MMU?).

Could you drop me your config, please? I need to dig here a little and I
should have seen this on qemu, right?

> Could you please add this commit back to the RT branch?  I'm not sure how
> to find out the history of this commit.  For instance, why has it been
> removed from the RT patchset?  How are these things tracked, generally?

I dropped that patch in v5.4.3-rt1. I couldn't reproduce the issue that
was documented in the patch and the code that triggered the warning was
removed / reworked in commit
    b5466f8728527 ("ARM: mm: remove IPI broadcasting on ASID rollover")

So it looked like it was no longer needed and then got dropped during the
rebase.
In order to get it back into the RT queue I need to understand why it is
required. What exactly is it fixing? Let me stare at it for a little…

> Best regards,
> Mark

Sebastian


* Re: Kernel crash due to memory corruption with v5.4.26-rt17 and PowerPC e500
  2020-05-29 16:15     ` Sebastian Andrzej Siewior
@ 2020-05-29 16:37       ` Sebastian Andrzej Siewior
  2020-07-06 16:50         ` Sebastian Andrzej Siewior
  2020-05-29 19:03       ` Mark Marshall
  1 sibling, 1 reply; 11+ messages in thread
From: Sebastian Andrzej Siewior @ 2020-05-29 16:37 UTC (permalink / raw)
  To: Mark Marshall
  Cc: linux-rt-users, Mark Marshall, thomas.graziadei, Thomas Gleixner,
	linux-kernel, rostedt

On 2020-05-29 18:15:18 [+0200], To Mark Marshall wrote:
> In order to get it back into the RT queue I need to understand why it is
> required. What exactly is it fixing? Let me stare at it for a little…

it used to be local_irq_disable() which then became preempt_disable()
local_irq_disable() due to ARM's limitation.

> > Best regards,
> > Mark
> 
Sebastian


* Re: Kernel crash due to memory corruption with v5.4.26-rt17 and PowerPC e500
  2020-05-29 16:37       ` Sebastian Andrzej Siewior
@ 2020-07-06 16:50         ` Sebastian Andrzej Siewior
  2020-07-10 10:59           ` Thomas Graziadei
  0 siblings, 1 reply; 11+ messages in thread
From: Sebastian Andrzej Siewior @ 2020-07-06 16:50 UTC (permalink / raw)
  To: Mark Marshall
  Cc: linux-rt-users, Mark Marshall, thomas.graziadei, Thomas Gleixner,
	linux-kernel, rostedt

On 2020-05-29 18:37:22 [+0200], To Mark Marshall wrote:
> On 2020-05-29 18:15:18 [+0200], To Mark Marshall wrote:
> > In order to get it back into the RT queue I need to understand why it is
> > required. What exactly is it fixing? Let me stare at it for a little…
> 
> it used to be local_irq_disable() which then became preempt_disable()
> local_irq_disable() due to ARM's limitation.

Any luck on your side?

I *think* if you swap the mm assignment in exec_mmap() then it should be
gone. Basically:
|         tsk->active_mm = mm;
|         tsk->mm = mm;

However, I am thinking of applying something like this:

diff --git a/fs/exec.c b/fs/exec.c
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1035,11 +1035,15 @@ static int exec_mmap(struct mm_struct *mm)
 		}
 	}
 	task_lock(tsk);
+
+	task_lock_mm();
 	active_mm = tsk->active_mm;
 	membarrier_exec_mmap(mm);
 	tsk->mm = mm;
 	tsk->active_mm = mm;
 	activate_mm(active_mm, mm);
+	task_unlock_mm();
+
 	tsk->mm->vmacache_seqnum = 0;
 	vmacache_flush(tsk);
 	task_unlock(tsk);
diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
--- a/include/linux/sched/task.h
+++ b/include/linux/sched/task.h
@@ -176,4 +176,31 @@ static inline void task_unlock(struct task_struct *p)
 	spin_unlock(&p->alloc_lock);
 }
 
+#ifdef CONFIG_PREEMPT_RT
+/*
+ * Protects ->mm and ->active_mm.
+ * Avoids scheduling so switch_mm() or enter_lazy_tlb() will not read the
+ * members while they are updated.
+ */
+static inline void task_lock_mm(void)
+{
+	preempt_disable();
+}
+
+static inline void task_unlock_mm(void)
+{
+	preempt_enable();
+}
+
+#else
+
+static inline void task_lock_mm(void)
+{
+}
+
+static inline void task_unlock_mm(void)
+{
+}
+#endif
+
 #endif /* _LINUX_SCHED_TASK_H */
diff --git a/mm/mmu_context.c b/mm/mmu_context.c
--- a/mm/mmu_context.c
+++ b/mm/mmu_context.c
@@ -25,6 +25,7 @@ void use_mm(struct mm_struct *mm)
 	struct task_struct *tsk = current;
 
 	task_lock(tsk);
+	task_lock_mm();
 	active_mm = tsk->active_mm;
 	if (active_mm != mm) {
 		mmgrab(mm);
@@ -32,6 +33,7 @@ void use_mm(struct mm_struct *mm)
 	}
 	tsk->mm = mm;
 	switch_mm(active_mm, mm, tsk);
+	task_unlock_mm();
 	task_unlock(tsk);
 #ifdef finish_arch_post_lock_switch
 	finish_arch_post_lock_switch();
@@ -55,10 +57,12 @@ void unuse_mm(struct mm_struct *mm)
 	struct task_struct *tsk = current;
 
 	task_lock(tsk);
+	task_lock_mm();
 	sync_mm_rss(mm);
 	tsk->mm = NULL;
 	/* active_mm is still 'mm' */
 	enter_lazy_tlb(mm, tsk);
+	task_unlock_mm();
 	task_unlock(tsk);
 }
 EXPORT_SYMBOL_GPL(unuse_mm);
-- 
2.27.0

> > > Best regards,
> > > Mark

Sebastian


* RE: Kernel crash due to memory corruption with v5.4.26-rt17 and PowerPC e500
  2020-07-06 16:50         ` Sebastian Andrzej Siewior
@ 2020-07-10 10:59           ` Thomas Graziadei
  2020-08-12 12:45             ` Thomas Graziadei
  0 siblings, 1 reply; 11+ messages in thread
From: Thomas Graziadei @ 2020-07-10 10:59 UTC (permalink / raw)
  To: 'Sebastian Andrzej Siewior', Mark Marshall
  Cc: linux-rt-users, Mark Marshall, Thomas Gleixner, linux-kernel, rostedt

[-- Attachment #1: Type: text/plain, Size: 4133 bytes --]

Hi Sebastian,

thanks for looking into this.

We could reproduce the issue with QEMU.
At runtime you need to set mdev as the kernel's hotplug client (/proc/sys/kernel/hotplug) and give it a dummy /etc/mdev.conf like (.* 1:1 777). Then just do a loop and insmod/rmmod crc4.ko and crc7.ko.

Swapping the mm assignment did not work -> exception after 1900 iterations
Your second suggestion with check.patch (attached to this email for completeness, only protecting the exec_mmap function) did not work either -> exception after 2600 iterations

Your third suggestion (a modification to the original revert) enclosed in this e-mail does seem to work. Still no problems after 30000 iterations.

By the way, as noticed in your kernel config, we would be quite interested in a gcc 9 compiler for our platform. Is there a mainline/maintained version or fork for this or another possibility to get it?
 
Regards,
Thomas

-----Original Message-----
From: Sebastian Andrzej Siewior [mailto:bigeasy@linutronix.de] 
Sent: Monday, July 06, 2020 6:50 PM
To: Mark Marshall <markmarshall14@gmail.com>
Cc: linux-rt-users <linux-rt-users@vger.kernel.org>; Mark Marshall <mark.marshall@omicronenergy.com>; Thomas Graziadei <thomas.graziadei@omicronenergy.com>; Thomas Gleixner <tglx@linutronix.de>; linux-kernel@vger.kernel.org; rostedt@goodmis.org
Subject: Re: Kernel crash due to memory corruption with v5.4.26-rt17 and PowerPC e500

On 2020-05-29 18:37:22 [+0200], To Mark Marshall wrote:
> On 2020-05-29 18:15:18 [+0200], To Mark Marshall wrote:
> > In order to get it back into the RT queue I need to understand why
> > it is required. What exactly is it fixing? Let me stare at it for a
> > little…
> 
> it used to be local_irq_disable() which then became preempt_disable()
> local_irq_disable() due to ARM's limitation.

Any luck on your side?

I *think* if you swap the mm assignment in exec_mmap() then it should be gone. Basically:
|         tsk->active_mm = mm;
|         tsk->mm = mm;

However, I am thinking of applying something like this:

diff --git a/fs/exec.c b/fs/exec.c
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1035,11 +1035,15 @@ static int exec_mmap(struct mm_struct *mm)
 		}
 	}
 	task_lock(tsk);
+
+	task_lock_mm();
 	active_mm = tsk->active_mm;
 	membarrier_exec_mmap(mm);
 	tsk->mm = mm;
 	tsk->active_mm = mm;
 	activate_mm(active_mm, mm);
+	task_unlock_mm();
+
 	tsk->mm->vmacache_seqnum = 0;
 	vmacache_flush(tsk);
 	task_unlock(tsk);
diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
--- a/include/linux/sched/task.h
+++ b/include/linux/sched/task.h
@@ -176,4 +176,31 @@ static inline void task_unlock(struct task_struct *p)
 	spin_unlock(&p->alloc_lock);
 }
 
+#ifdef CONFIG_PREEMPT_RT
+/*
+ * Protects ->mm and ->active_mm.
+ * Avoids scheduling so switch_mm() or enter_lazy_tlb() will not read the
+ * members while they are updated.
+ */
+static inline void task_lock_mm(void)
+{
+	preempt_disable();
+}
+
+static inline void task_unlock_mm(void)
+{
+	preempt_enable();
+}
+
+#else
+
+static inline void task_lock_mm(void)
+{
+}
+
+static inline void task_unlock_mm(void)
+{
+}
+#endif
+
 #endif /* _LINUX_SCHED_TASK_H */
diff --git a/mm/mmu_context.c b/mm/mmu_context.c
--- a/mm/mmu_context.c
+++ b/mm/mmu_context.c
@@ -25,6 +25,7 @@ void use_mm(struct mm_struct *mm)
 	struct task_struct *tsk = current;
 
 	task_lock(tsk);
+	task_lock_mm();
 	active_mm = tsk->active_mm;
 	if (active_mm != mm) {
 		mmgrab(mm);
@@ -32,6 +33,7 @@ void use_mm(struct mm_struct *mm)
 	}
 	tsk->mm = mm;
 	switch_mm(active_mm, mm, tsk);
+	task_unlock_mm();
 	task_unlock(tsk);
 #ifdef finish_arch_post_lock_switch
 	finish_arch_post_lock_switch();
@@ -55,10 +57,12 @@ void unuse_mm(struct mm_struct *mm)
 	struct task_struct *tsk = current;
 
 	task_lock(tsk);
+	task_lock_mm();
 	sync_mm_rss(mm);
 	tsk->mm = NULL;
 	/* active_mm is still 'mm' */
 	enter_lazy_tlb(mm, tsk);
+	task_unlock_mm();
 	task_unlock(tsk);
 }
 EXPORT_SYMBOL_GPL(unuse_mm);
--
2.27.0

> > > Best regards,
> > > Mark

Sebastian

[-- Attachment #2: check.patch --]
[-- Type: application/octet-stream, Size: 437 bytes --]

diff --git a/fs/exec.c b/fs/exec.c
index 77603ceed51f9..1310fb4d5f0d4 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1037,8 +1037,12 @@ static int exec_mmap(struct mm_struct *mm)
 	task_lock(tsk);
 	active_mm = tsk->active_mm;
 	membarrier_exec_mmap(mm);
+
+	preempt_disable();
 	tsk->mm = mm;
 	tsk->active_mm = mm;
+	preempt_enable();
+
 	activate_mm(active_mm, mm);
 	tsk->mm->vmacache_seqnum = 0;
 	vmacache_flush(tsk);


* Re: Kernel crash due to memory corruption with v5.4.26-rt17 and PowerPC e500
  2020-07-10 10:59           ` Thomas Graziadei
@ 2020-08-12 12:45             ` Thomas Graziadei
  2020-08-19  7:11               ` 'Sebastian Andrzej Siewior'
  2020-09-01  7:41               ` 'Sebastian Andrzej Siewior'
  0 siblings, 2 replies; 11+ messages in thread
From: Thomas Graziadei @ 2020-08-12 12:45 UTC (permalink / raw)
  To: 'Sebastian Andrzej Siewior', Mark Marshall
  Cc: linux-rt-users, Mark Marshall, Thomas Gleixner, linux-kernel, rostedt

Hi Sebastian,

any progress on your side?

Do you think the patch could be applied for the next versions?

Regards,
Thomas



* Re: Kernel crash due to memory corruption with v5.4.26-rt17 and PowerPC e500
  2020-08-12 12:45             ` Thomas Graziadei
@ 2020-08-19  7:11               ` 'Sebastian Andrzej Siewior'
  2020-09-01  7:41               ` 'Sebastian Andrzej Siewior'
  1 sibling, 0 replies; 11+ messages in thread
From: 'Sebastian Andrzej Siewior' @ 2020-08-19  7:11 UTC (permalink / raw)
  To: Thomas Graziadei
  Cc: Mark Marshall, linux-rt-users, Mark Marshall, Thomas Gleixner,
	linux-kernel, rostedt

On 2020-08-12 14:45:22 [+0200], Thomas Graziadei wrote:
> Hi Sebastian,
Hi Thomas,

> any progress on your side?

due to lack of time none. But I am on it…

> Do you think the patch could be applied for the next versions?

So I had a theory about why it happens, but your tests ruled it out, so
now I need to figure out the real cause so I can write it in the changelog.

I believe you made it happen in qemu and you sent a .config and
everything so I will stare into it as soon as I can.

> Regards,
> Thomas

Sebastian


* Re: Kernel crash due to memory corruption with v5.4.26-rt17 and PowerPC e500
  2020-08-12 12:45             ` Thomas Graziadei
  2020-08-19  7:11               ` 'Sebastian Andrzej Siewior'
@ 2020-09-01  7:41               ` 'Sebastian Andrzej Siewior'
  1 sibling, 0 replies; 11+ messages in thread
From: 'Sebastian Andrzej Siewior' @ 2020-09-01  7:41 UTC (permalink / raw)
  To: Thomas Graziadei
  Cc: Mark Marshall, linux-rt-users, Mark Marshall, Thomas Gleixner,
	linux-kernel, rostedt

On 2020-08-12 14:45:22 [+0200], Thomas Graziadei wrote:
> Hi Sebastian,
Hi Thomas,

> any progress on your side?
> 
> Do you think the patch could be applied for the next versions?

Yes.  The ->active_mm change needs to be protected against scheduling
regardless of the arch/mmu. Otherwise the mm will be put twice. For this
to trigger you need to exec from a kernel thread and get preempted.
This will be addressed in use_mm() by commit
    38cf307c1f201 ("mm: fix kthread_use_mm() vs TLB invalidate")

which is in v5.9-rc1 and exec_mmap() is under discussion at
    https://lore.kernel.org/linux-arch/20200828100022.1099682-2-npiggin@gmail.com/
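
One interleaving that would produce the double put described above can be modelled in userspace C. This is a toy sketch under stated assumptions, not kernel code: the toy_* types and functions are hypothetical, reference counting is reduced to a plain integer, and toy_switch() is a simplified guess at how the scheduler borrows and returns the lazy-TLB reference for tasks without an mm of their own.

```c
#include <assert.h>
#include <stddef.h>

struct toy_mm { int count; };			/* simplified reference count */
struct toy_task { struct toy_mm *mm, *active_mm; };
struct toy_result { int x_count, y_count; };

static void toy_grab(struct toy_mm *mm) { assert(mm != NULL); mm->count++; }
static void toy_drop(struct toy_mm *mm) { assert(mm != NULL); mm->count--; }

/* Simplified lazy-TLB handling on a context switch (model assumption). */
static void toy_switch(struct toy_task *prev, struct toy_task *next)
{
	struct toy_mm *old = prev->active_mm;

	if (!next->mm) {		/* to a kernel thread: borrow prev's mm */
		next->active_mm = old;
		toy_grab(old);
	}
	if (!prev->mm) {		/* from a kernel thread: return its loan */
		prev->active_mm = NULL;
		toy_drop(old);
	}
}

/* A kernel thread execs; optionally it is preempted after reading
 * ->active_mm but before ->mm and ->active_mm are updated. */
static struct toy_result toy_exec_from_kthread(int preempt_in_window)
{
	struct toy_mm x = { 1 }, y = { 1 }, fresh = { 1 }; /* one owner each */
	struct toy_task owner_y = { &y, &y };
	struct toy_task kthread = { NULL, &x };
	struct toy_mm *saved;
	struct toy_result r;

	toy_grab(&x);			/* the kthread's loan of x */

	saved = kthread.active_mm;	/* exec path reads ->active_mm */
	if (preempt_in_window) {
		toy_switch(&kthread, &owner_y);	/* loan of x is returned */
		toy_switch(&owner_y, &kthread);	/* kthread now borrows y */
	}
	kthread.mm = &fresh;
	kthread.active_mm = &fresh;
	toy_drop(saved);		/* drops x, believed to be the loan */

	r.x_count = x.count;
	r.y_count = y.count;
	return r;
}
```

In this model, calling toy_exec_from_kthread(0) leaves x with only its owner's reference, while toy_exec_from_kthread(1) drops x a second time (so it reaches zero while its owner still uses it) and leaves an extra reference on y. Disabling preemption across the whole read-and-update sequence, as the task_lock_mm() patch earlier in this thread does, removes the window in which that switch can occur.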

> Regards,
> Thomas

Sebastian


* Re: Kernel crash due to memory corruption with v5.4.26-rt17 and PowerPC e500
  2020-05-29 16:15     ` Sebastian Andrzej Siewior
  2020-05-29 16:37       ` Sebastian Andrzej Siewior
@ 2020-05-29 19:03       ` Mark Marshall
  1 sibling, 0 replies; 11+ messages in thread
From: Mark Marshall @ 2020-05-29 19:03 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-rt-users, Mark Marshall, thomas.graziadei, Thomas Gleixner,
	linux-kernel, rostedt

[-- Attachment #1: Type: text/plain, Size: 2851 bytes --]

My config is attached.  This is the greatly reduced config that I used
when trying to narrow down the problem.  We normally have much more
enabled, but that had no effect on the bug in my testing.  We do,
unfortunately, have quite a few out-of-tree patches, but they are all
in USB or Networking, which are disabled here.

I've never tried out the kernel under qemu, but I will try that next
week to see if I can reproduce the problem there.  It's certainly
quite a narrow race window though, so it might behave quite
differently under qemu.  In general, how reliable is qemu at showing
these kinds of problems?

Thanks,
Mark

PS.
I've also noticed that THREAD_SHIFT is set in this config.  That's
because when I added lots of debug options, I got warnings about the
stack being too small.  This had no impact on the bug: I increased the
size of the stack and the stack warnings stopped, but the bug was still
the same.

On Fri, 29 May 2020 at 18:15, Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
>
> On 2020-05-29 17:38:39 [+0200], Mark Marshall wrote:
> > Hi Sebastian & list,
> Hi,
>
> > I had assumed that my e-mail had got lost or overlooked, I was meaning to
> > post a follow up message this week...
> >
> > All I could find from the debugging and tracing that we added was that
> > something was going wrong with the mm data structures somewhere in the
> > exec code.  In the end I just spent a week or two pouring over the diffs
> > of this code between the versions that I new worked and didn't work.
> >
> > I eventually found the culprit.  On the working kernel versions there is
> > a patch called "mm: Protect activate_mm() by preempt_[disable&enable]_rt()".
> > This is commit f0b4a9cb253a on the V4.19.82-rt30 branch, for instance.
> > Although the commit message talks about ARM, it seems that we need this for
> > PowerPC too (I guess, any PowerPC with the "nohash" MMU?).
>
> Could you drop me your config, please? I need to dig here a little and I
> should have seen this on qemu, right?
>
> > Could you please add this commit back to the RT branch?  I'm not sure how
> > to find out the history of this commit.  For instance, why has it been
> > removed from the RT patchset?  How are these things tracked, generally?
>
> I dropped that patch in v5.4.3-rt1. I couldn't reproduce the issue that
> was documented in the patch and the code that triggered the warning was
> removed / reworked in commit
>     b5466f8728527 ("ARM: mm: remove IPI broadcasting on ASID rollover")
>
> So it looked like no longer needed and then got dropped during the
> rebase.
> In order to get it back into the RT queue I need to understand why it is
> required. What exactly is it fixing. Let me stare at for a little…
>
> > Best regards,
> > Mark
>
> Sebastian
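
For readers unfamiliar with the patch discussed above, here is a minimal
userspace sketch of the idea as I understand it: the mm pointers of the task
are updated and activate_mm() is called inside a preempt_disable_rt() /
preempt_enable_rt() pair, so the task cannot be preempted between the pointer
update and the MMU context switch. This is an assumption-laden model, not the
patch itself. The struct definitions, the preempt_depth counter, the stubbed
activate_mm() and the function name exec_mmap_sketch() are all hypothetical
stand-ins; the real change lives in kernel code and should be read from
commit f0b4a9cb253a directly.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins for kernel types; only the fields used here. */
struct mm_struct { int id; };
struct task_struct {
	struct mm_struct *mm;
	struct mm_struct *active_mm;
};

/* Models the preemption counter; a value > 0 means preemption is disabled. */
static int preempt_depth;

/* Records whether the last activate_mm() call ran with preemption off. */
static bool activated_with_preempt_off;

/* Stand-ins for the RT helpers named in the patch title. */
static void preempt_disable_rt(void)
{
	preempt_depth++;
}

static void preempt_enable_rt(void)
{
	assert(preempt_depth > 0);	/* catch an unbalanced enable */
	preempt_depth--;
}

/* Stub for the arch hook that switches the MMU context. */
static void activate_mm(struct mm_struct *prev, struct mm_struct *next)
{
	(void)prev;
	(void)next;
	activated_with_preempt_off = preempt_depth > 0;
}

/* Sketch of the protected region: pointer update and activation stay together. */
static void exec_mmap_sketch(struct task_struct *tsk, struct mm_struct *mm)
{
	struct mm_struct *active_mm;

	preempt_disable_rt();
	active_mm = tsk->active_mm;
	tsk->mm = mm;
	tsk->active_mm = mm;
	activate_mm(active_mm, mm);
	preempt_enable_rt();
}
```

Calling exec_mmap_sketch() with a task and a new mm leaves both tsk->mm and
tsk->active_mm pointing at the new mm, with the stubbed activate_mm() having
run while the model's preemption counter was non-zero. Whether this is the
exact window that corrupts the mm structures on e500 is not established in
this thread.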

[-- Attachment #2: config-5.4-rt --]
[-- Type: application/octet-stream, Size: 5142 bytes --]

# CONFIG_SWAP is not set
CONFIG_SYSVIPC=y
CONFIG_HIGH_RES_TIMERS=y
CONFIG_PREEMPT_RT=y
CONFIG_IRQ_TIME_ACCOUNTING=y
CONFIG_BSD_PROCESS_ACCT=y
CONFIG_BSD_PROCESS_ACCT_V3=y
CONFIG_RCU_EXPERT=y
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_BLK_DEV_INITRD=y
# CONFIG_RD_BZIP2 is not set
# CONFIG_RD_LZMA is not set
# CONFIG_RD_XZ is not set
# CONFIG_RD_LZO is not set
# CONFIG_RD_LZ4 is not set
# CONFIG_SGETMASK_SYSCALL is not set
# CONFIG_SYSFS_SYSCALL is not set
CONFIG_KALLSYMS_ALL=y
CONFIG_BPF_SYSCALL=y
# CONFIG_RSEQ is not set
CONFIG_EMBEDDED=y
CONFIG_PERF_EVENTS=y
# CONFIG_COMPAT_BRK is not set
CONFIG_PPC_85xx=y
CONFIG_MPC85xx_DS=y
CONFIG_MPC85xx_RDB=y
CONFIG_P1010_RDB=y
CONFIG_MAIO400=y
CONFIG_MIC400=y
CONFIG_GEN_RTC=y
CONFIG_HZ_1000=y
CONFIG_THREAD_SHIFT=14
# CONFIG_SUSPEND is not set
# CONFIG_SECCOMP is not set
CONFIG_FSL_LBC=y
CONFIG_JUMP_LABEL=y
CONFIG_STRICT_KERNEL_RWX=y
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
CONFIG_MODULE_FORCE_UNLOAD=y
CONFIG_MODVERSIONS=y
# CONFIG_BLK_DEV_BSG is not set
CONFIG_BLK_DEV_INTEGRITY=y
# CONFIG_MQ_IOSCHED_DEADLINE is not set
# CONFIG_MQ_IOSCHED_KYBER is not set
# CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS is not set
# CONFIG_COMPACTION is not set
# CONFIG_MIGRATION is not set
CONFIG_UEVENT_HELPER=y
CONFIG_UEVENT_HELPER_PATH="/sbin/hotplug"
CONFIG_DEVTMPFS=y
CONFIG_DEVTMPFS_MOUNT=y
# CONFIG_STANDALONE is not set
CONFIG_FW_LOADER_USER_HELPER=y
CONFIG_FW_LOADER_USER_HELPER_FALLBACK=y
CONFIG_MTD=y
CONFIG_MTD_CMDLINE_PARTS=y
CONFIG_MTD_BLOCK=y
CONFIG_MTD_CFI=y
CONFIG_MTD_CFI_INTELEXT=y
CONFIG_MTD_CFI_AMDSTD=y
CONFIG_MTD_RAW_NAND=y
CONFIG_MTD_NAND_FSL_IFC=y
CONFIG_MTD_SPI_NOR=y
CONFIG_MTD_UBI=y
CONFIG_MTD_UBI_FASTMAP=y
CONFIG_MTD_UBI_BLOCK=y
CONFIG_BLK_DEV_LOOP=y
CONFIG_BLK_DEV_LOOP_MIN_COUNT=1
CONFIG_BLK_DEV_RAM=y
CONFIG_BLK_DEV_RAM_COUNT=2
CONFIG_BLK_DEV_RAM_SIZE=131072
CONFIG_EEPROM_AT24=y
CONFIG_EEPROM_AT25=y
CONFIG_EEPROM_93CX6=m
CONFIG_SCSI=y
# CONFIG_SCSI_PROC_FS is not set
CONFIG_BLK_DEV_SD=y
# CONFIG_SCSI_LOWLEVEL is not set
CONFIG_INPUT_EVDEV=y
# CONFIG_KEYBOARD_ATKBD is not set
CONFIG_KEYBOARD_GPIO=y
# CONFIG_INPUT_MOUSE is not set
# CONFIG_SERIO is not set
CONFIG_LEGACY_PTY_COUNT=64
CONFIG_DEVKMEM=y
CONFIG_SERIAL_8250=y
CONFIG_SERIAL_8250_CONSOLE=y
CONFIG_SERIAL_8250_NR_UARTS=2
CONFIG_SERIAL_8250_RUNTIME_UARTS=2
CONFIG_SERIAL_8250_MANY_PORTS=y
CONFIG_SERIAL_8250_DETECT_IRQ=y
CONFIG_SERIAL_8250_RSA=y
# CONFIG_NVRAM is not set
CONFIG_TCG_TPM=y
CONFIG_TCG_TIS_SPI=y
CONFIG_I2C=y
CONFIG_I2C_CHARDEV=y
CONFIG_I2C_MPC=y
CONFIG_SPI=y
CONFIG_SPI_FSL_ESPI=y
CONFIG_GPIOLIB=y
CONFIG_GPIO_SYSFS=y
CONFIG_GPIO_MPC8XXX=y
CONFIG_GPIO_PCA953X=y
CONFIG_GPIO_PCA953X_IRQ=y
# CONFIG_HWMON is not set
CONFIG_WATCHDOG=y
CONFIG_WATCHDOG_NOWAYOUT=y
CONFIG_BOOKE_WDT=y
CONFIG_BOOKE_WDT_DEFAULT_TIMEOUT=34
# CONFIG_VGA_CONSOLE is not set
# CONFIG_HID is not set
# CONFIG_USB_SUPPORT is not set
CONFIG_RTC_DRV_DS1307=y
CONFIG_RTC_DRV_CMOS=y
# CONFIG_DNOTIFY is not set
CONFIG_PROC_KCORE=y
CONFIG_TMPFS=y
CONFIG_CONFIGFS_FS=y
CONFIG_JFFS2_FS=y
CONFIG_JFFS2_FS_WBUF_VERIFY=y
CONFIG_JFFS2_SUMMARY=y
CONFIG_JFFS2_FS_XATTR=y
CONFIG_UBIFS_FS=y
CONFIG_SQUASHFS=y
CONFIG_SQUASHFS_FILE_DIRECT=y
CONFIG_SQUASHFS_XATTR=y
CONFIG_SQUASHFS_LZ4=y
CONFIG_SQUASHFS_LZO=y
CONFIG_SQUASHFS_XZ=y
CONFIG_SQUASHFS_4K_DEVBLK_SIZE=y
CONFIG_KEYS=y
CONFIG_CRYPTO_ECDH=y
CONFIG_CRYPTO_CCM=y
CONFIG_CRYPTO_GCM=y
CONFIG_CRYPTO_ECHAINIV=m
CONFIG_CRYPTO_CBC=y
CONFIG_CRYPTO_CTS=y
CONFIG_CRYPTO_XTS=y
CONFIG_CRYPTO_ESSIV=y
CONFIG_CRYPTO_CMAC=y
CONFIG_CRYPTO_MD5=y
CONFIG_CRYPTO_MD5_PPC=y
CONFIG_CRYPTO_MICHAEL_MIC=m
CONFIG_CRYPTO_SHA1=y
CONFIG_CRYPTO_SHA1_PPC_SPE=y
CONFIG_CRYPTO_SHA256_PPC_SPE=y
CONFIG_CRYPTO_SHA512=y
CONFIG_CRYPTO_AES=y
CONFIG_CRYPTO_AES_PPC_SPE=y
CONFIG_CRYPTO_ARC4=y
CONFIG_CRYPTO_DES=y
CONFIG_CRYPTO_DEV_FSL_CAAM=y
CONFIG_ASYMMETRIC_KEY_TYPE=y
CONFIG_ASYMMETRIC_PUBLIC_KEY_SUBTYPE=y
CONFIG_X509_CERTIFICATE_PARSER=y
CONFIG_PKCS7_MESSAGE_PARSER=y
CONFIG_SYSTEM_TRUSTED_KEYRING=y
CONFIG_CRC_CCITT=m
CONFIG_CRC_ITU_T=m
CONFIG_CRC7=m
CONFIG_LIBCRC32C=y
# CONFIG_XZ_DEC_X86 is not set
# CONFIG_XZ_DEC_IA64 is not set
# CONFIG_XZ_DEC_ARM is not set
# CONFIG_XZ_DEC_ARMTHUMB is not set
# CONFIG_XZ_DEC_SPARC is not set
CONFIG_DYNAMIC_DEBUG=y
CONFIG_STRIP_ASM_SYMS=y
CONFIG_DEBUG_PAGEALLOC=y
CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT=y
CONFIG_PAGE_POISONING=y
CONFIG_DEBUG_OBJECTS=y
CONFIG_DEBUG_OBJECTS_FREE=y
CONFIG_DEBUG_OBJECTS_TIMERS=y
CONFIG_DEBUG_OBJECTS_WORK=y
CONFIG_DEBUG_OBJECTS_RCU_HEAD=y
CONFIG_DEBUG_OBJECTS_PERCPU_COUNTER=y
CONFIG_DEBUG_KMEMLEAK=y
CONFIG_DEBUG_VM=y
CONFIG_DEBUG_VM_VMACACHE=y
CONFIG_DEBUG_VM_RB=y
CONFIG_DEBUG_VM_PGFLAGS=y
CONFIG_DEBUG_VM_POISON=y
CONFIG_DEBUG_VIRTUAL=y
CONFIG_DEBUG_MEMORY_INIT=y
CONFIG_DEBUG_STACKOVERFLOW=y
CONFIG_KASAN=y
CONFIG_DETECT_HUNG_TASK=y
CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=60
CONFIG_BOOTPARAM_HUNG_TASK_PANIC=y
# CONFIG_SCHED_DEBUG is not set
CONFIG_SCHED_STACK_END_CHECK=y
# CONFIG_DEBUG_PREEMPT is not set
# CONFIG_DEBUG_BUGVERBOSE is not set
CONFIG_RCU_EQS_DEBUG=y
CONFIG_FUNCTION_TRACER=y
CONFIG_BUG_ON_DATA_CORRUPTION=y
CONFIG_UBSAN=y
CONFIG_PPC_DISABLE_WERROR=y
CONFIG_PPC_EMULATED_STATS=y
CONFIG_PPC_IRQ_SOFT_MASK_DEBUG=y
CONFIG_BDI_SWITCH=y

Thread overview: 11+ messages
2020-05-04  9:40 Kernel crash due to memory corruption with v5.4.26-rt17 and PowerPC e500 Mark Marshall
2020-05-29 13:14 ` Sebastian Andrzej Siewior
2020-05-29 15:38   ` Mark Marshall
2020-05-29 16:15     ` Sebastian Andrzej Siewior
2020-05-29 16:37       ` Sebastian Andrzej Siewior
2020-07-06 16:50         ` Sebastian Andrzej Siewior
2020-07-10 10:59           ` Thomas Graziadei
2020-08-12 12:45             ` Thomas Graziadei
2020-08-19  7:11               ` 'Sebastian Andrzej Siewior'
2020-09-01  7:41               ` 'Sebastian Andrzej Siewior'
2020-05-29 19:03       ` Mark Marshall
