* Kernel crash due to memory corruption with v5.4.26-rt17 and PowerPC e500
@ 2020-05-04  9:40 Mark Marshall
  2020-05-29 13:14 ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 11+ messages in thread
From: Mark Marshall @ 2020-05-04  9:40 UTC
To: linux-rt-users
Cc: Mark Marshall, thomas.graziadei, Thomas Gleixner, bigeasy, linux-kernel, rostedt

Hi RT experts,

We are using the RT kernel with the PowerPC e500. Until recently we
were on the 4.19 kernel series, and are in the process of upgrading.
When we switched to the v5.4 version, we started getting a reproducible
kernel crash. The crashes all contain the "BUG: Bad rss-counter state"
line, and then after that it appears that a structure of type mm_struct
or vm_area_struct is corrupted.

The easiest way we have found to reproduce the crash is to repeatedly
insert and then remove a module. The crash then appears to be related
to either paging in the module or in exiting the mdev process. (The
crash does also happen at other times, but it is hard to reproduce
reliably then). This simple script will almost always crash:

for i in $(seq 1000) ; do echo $i ; modprobe crc7 ; rmmod crc7 ; done

(The crc7 module is chosen as it is small and simple. Any module will
work / crash).

We have tried kernels v5.0, v5.2 and v5.6. The v5.0 and v5.2 kernels
do not show the problem. The v5.6 kernel does show the problem.
Switching off RT fixes the problem.

I have reduced the functionality in the kernel to a bare minimum
(removing networking, USB and PCI, as we have some out-of-tree patches
in those areas) and we still get the crash.

Here are a couple of example stack traces:

000: NIP [c003f8e0] __mmdrop+0x2c8/0x3dc
000: LR [c003f8e0] __mmdrop+0x2c8/0x3dc
000: Call Trace:
000: [e953fd48] [c003f8e0] __mmdrop+0x2c8/0x3dc
000:  (unreliable)
000: [e953fd88] [c00c6d28] rcu_core+0x324/0x78c
000: [e953fe58] [c00c79e0] rcu_cpu_kthread+0x1f4/0x42c
000: [e953fe98] [c00838fc] smpboot_thread_fn+0x2e8/0x488
000: [e953fef8] [c007d514] kthread+0x1b0/0x1b8
000: [e953ff38] [c001a26c] ret_from_kernel_thread+0x14/0x1c

000: NIP [c010cdd4] acct_collect+0x3a8/0x3e0
000: LR [c010cdd4] acct_collect+0x3a8/0x3e0
000: Call Trace:
000: [c6f2bbe0] [c010cdd4] acct_collect+0x3a8/0x3e0
000:  (unreliable)
000: [c6f2bc10] [c0049354] do_exit+0x294/0xf9c
000: [c6f2bcf0] [c0013030] die+0x220/0x2c4
000: [c6f2bd30] [c00132cc] exception_common+0x1f8/0x238
000: [c6f2bd30] [c00132cc] exception_common+0x1f8/0x238
000: [c6f2bd70] [c0013404] _exception+0x34/0x80
000: [c6f2bd90] [c001a4a8] ret_from_except_full+0x0/0x4

I have added some debugging code where the mm_struct and
vm_area_struct have "poison" values at the start and the end, and
this seems to show that the vm_area_struct is getting corrupted, but
I'm not able to see where.

We have switched on all of the debugging that we can, including
KASAN, and this shows nothing.

Can anyone help us? What can we try next? Is anyone using the e500
with the RT kernel? Does anyone have any idea how to debug problems
related to the error message "Bad rss-counter state"?

Any help or advice would be most gratefully received.

Many thanks,
Mark Marshall and Thomas Graziadei

PS. Thomas Graziadei (my colleague) did find a bug in the start_32.S
file for the e500, and we have the fix for that included. We have also
tried removing the LAZY_PREEMPTION patch completely, and this doesn't
help.
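The one-liner in the report keeps looping after the first failure. A minimal sketch of a variant that stops as soon as the kernel log contains the marker line quoted in the report could look like the following. The function names are made up for illustration, and it assumes `dmesg` is readable and that the log did not already contain the marker before the run started.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not taken from the thread.

# Succeeds if the text on stdin contains the marker line from the report.
has_rss_bug() {
    grep -q 'BUG: Bad rss-counter state'
}

# Load and unload module $1 up to $2 times, stopping at the first hit.
stress_module() {
    mod=$1
    max=$2
    i=1
    while [ "$i" -le "$max" ]; do
        echo "$i"
        modprobe "$mod" && rmmod "$mod"
        if dmesg | has_rss_bug; then
            echo "marker found after $i iterations" >&2
            return 1
        fi
        i=$((i + 1))
    done
    return 0
}

# Only touch modules when asked explicitly, e.g.: ./stress.sh run crc7 1000
if [ "${1:-}" = "run" ]; then
    stress_module "${2:-crc7}" "${3:-1000}"
fi
```

Stopping at the first hit keeps the interesting part of the log from being pushed out by later output.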
* Re: Kernel crash due to memory corruption with v5.4.26-rt17 and PowerPC e500
  2020-05-04  9:40 Kernel crash due to memory corruption with v5.4.26-rt17 and PowerPC e500 Mark Marshall
@ 2020-05-29 13:14 ` Sebastian Andrzej Siewior
  2020-05-29 15:38   ` Mark Marshall
  0 siblings, 1 reply; 11+ messages in thread
From: Sebastian Andrzej Siewior @ 2020-05-29 13:14 UTC
To: Mark Marshall
Cc: linux-rt-users, Mark Marshall, thomas.graziadei, Thomas Gleixner, linux-kernel, rostedt

On 2020-05-04 11:40:08 [+0200], Mark Marshall wrote:
> The easiest way we have found to reproduce the crash is to repeatedly
> insert and then remove a module. The crash then appears to be related
> to either paging in the module or in exiting the mdev process. (The
> crash does also happen at other times, but it is hard to reproduce
> reliably then). This simple script will almost always crash:
>
> for i in $(seq 1000) ; do echo $i ; modprobe crc7 ; rmmod crc7 ; done

So I tried that on 5.6.14-rt7 with the qemu version of e500 (the SMP and
UP version). No luck. I don't have anything with real hardware.
Could you share the .config in case this is related?

> (The crc7 module is chosen as it is small and simple. Any module will
> work / crash).
>
> We have tried kernels v5.0, v5.2 and v5.6. The v5.0 and v5.2 kernels
> do not show the problem. The v5.6 kernel does show the problem.
> Switching off RT fixes the problem.
>
> I have reduced the functionality in the kernel to a bare minimum
> (removing networking, USB and PCI, as we have some out-of-tree patches
> in those areas) and we still get the crash.
…
> I have added some debugging code where the mm_struct and
> vm_area_struct have "poison" values at the start and the end, and
> this seems to show that the vm_area_struct is getting corrupted, but
> I'm not able to see where.

oh.

> We have switched on all of the debugging that we can, including
> KASAN, and this shows nothing.
>
>
> Can anyone help us? What can we try next? Is anyone using the e500
> with the RT kernel? Does anyone have any idea how to debug problems
> related to the error message "Bad rss-counter state"?
>
> Any help or advice would be most gratefully received.

I don't have any ideas. You could try to apply only a part of the RT
patch and see if the problem is still there. If you are lucky you find
the patch that introduces the problem. If not, the problem appears with
the RT switch…

> Many thanks,
> Mark Marshall and Thomas Graziadei

Sebastian
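Applying only a part of the RT patch amounts to a manual bisection over the patch series. A rough sketch of a helper for that follows. It assumes the RT release is unpacked as a quilt-style directory containing a `series` file with one patch name per line; the directory layout, the function names and the use of `patch -p1` are all assumptions for illustration, not something stated in the thread.

```shell
#!/bin/sh
# Sketch only: layout and helper names are assumptions.

# Print the first $2 patch names from series file $1,
# skipping blank lines and '#' comment lines.
first_n_patches() {
    grep -v -e '^[[:space:]]*$' -e '^[[:space:]]*#' "$1" | head -n "$2"
}

# Apply the first $3 patches from directory $1 to the kernel tree in $2.
apply_first_n() {
    patch_dir=$1
    tree=$2
    count=$3
    first_n_patches "$patch_dir/series" "$count" | while read -r p; do
        # Stop at the first patch that does not apply cleanly.
        patch -d "$tree" -p1 < "$patch_dir/$p" || exit 1
    done
}

# Example: ./partial-rt.sh run ./patches ./linux-5.4.26 120
if [ "${1:-}" = "run" ]; then
    apply_first_n "$2" "$3" "$4"
fi
```

Intermediate points in the series are not guaranteed to build or boot, so a failing step may say nothing about the bug itself.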
* Re: Kernel crash due to memory corruption with v5.4.26-rt17 and PowerPC e500
  2020-05-29 13:14 ` Sebastian Andrzej Siewior
@ 2020-05-29 15:38   ` Mark Marshall
  2020-05-29 16:15     ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 11+ messages in thread
From: Mark Marshall @ 2020-05-29 15:38 UTC
To: Sebastian Andrzej Siewior
Cc: linux-rt-users, Mark Marshall, thomas.graziadei, Thomas Gleixner, linux-kernel, rostedt

Hi Sebastian & list,

I had assumed that my e-mail had got lost or overlooked; I was meaning to
post a follow-up message this week...

All I could find from the debugging and tracing that we added was that
something was going wrong with the mm data structures somewhere in the
exec code. In the end I just spent a week or two poring over the diffs
of this code between the versions that I knew worked and didn't work.

I eventually found the culprit. On the working kernel versions there is
a patch called "mm: Protect activate_mm() by preempt_[disable&enable]_rt()".
This is commit f0b4a9cb253a on the v4.19.82-rt30 branch, for instance.
Although the commit message talks about ARM, it seems that we need this for
PowerPC too (I guess, any PowerPC with the "nohash" MMU?).

Could you please add this commit back to the RT branch? I'm not sure how
to find out the history of this commit. For instance, why has it been
removed from the RT patchset? How are these things tracked, generally?

Best regards,
Mark
* Re: Kernel crash due to memory corruption with v5.4.26-rt17 and PowerPC e500
  2020-05-29 15:38   ` Mark Marshall
@ 2020-05-29 16:15     ` Sebastian Andrzej Siewior
  2020-05-29 16:37       ` Sebastian Andrzej Siewior
  2020-05-29 19:03       ` Mark Marshall
  0 siblings, 2 replies; 11+ messages in thread
From: Sebastian Andrzej Siewior @ 2020-05-29 16:15 UTC
To: Mark Marshall
Cc: linux-rt-users, Mark Marshall, thomas.graziadei, Thomas Gleixner, linux-kernel, rostedt

On 2020-05-29 17:38:39 [+0200], Mark Marshall wrote:
> Hi Sebastian & list,
Hi,

> I had assumed that my e-mail had got lost or overlooked; I was meaning to
> post a follow-up message this week...
>
> All I could find from the debugging and tracing that we added was that
> something was going wrong with the mm data structures somewhere in the
> exec code. In the end I just spent a week or two poring over the diffs
> of this code between the versions that I knew worked and didn't work.
>
> I eventually found the culprit. On the working kernel versions there is
> a patch called "mm: Protect activate_mm() by preempt_[disable&enable]_rt()".
> This is commit f0b4a9cb253a on the v4.19.82-rt30 branch, for instance.
> Although the commit message talks about ARM, it seems that we need this for
> PowerPC too (I guess, any PowerPC with the "nohash" MMU?).

Could you drop me your config, please? I need to dig here a little and I
should have seen this on qemu, right?

> Could you please add this commit back to the RT branch? I'm not sure how
> to find out the history of this commit. For instance, why has it been
> removed from the RT patchset? How are these things tracked, generally?

I dropped that patch in v5.4.3-rt1. I couldn't reproduce the issue that
was documented in the patch and the code that triggered the warning was
removed / reworked in commit
  b5466f8728527 ("ARM: mm: remove IPI broadcasting on ASID rollover")

So it looked like it was no longer needed and then got dropped during the
rebase.
In order to get it back into the RT queue I need to understand why it is
required. What exactly is it fixing? Let me stare at it for a little…

> Best regards,
> Mark

Sebastian
* Re: Kernel crash due to memory corruption with v5.4.26-rt17 and PowerPC e500
  2020-05-29 16:15     ` Sebastian Andrzej Siewior
@ 2020-05-29 16:37       ` Sebastian Andrzej Siewior
  2020-07-06 16:50         ` Sebastian Andrzej Siewior
  2020-05-29 19:03       ` Mark Marshall
  1 sibling, 1 reply; 11+ messages in thread
From: Sebastian Andrzej Siewior @ 2020-05-29 16:37 UTC
To: Mark Marshall
Cc: linux-rt-users, Mark Marshall, thomas.graziadei, Thomas Gleixner, linux-kernel, rostedt

On 2020-05-29 18:15:18 [+0200], To Mark Marshall wrote:
> In order to get it back into the RT queue I need to understand why it is
> required. What exactly is it fixing. Let me stare at for a little…

it used to be local_irq_disable() which then became preempt_disable()
local_irq_disable() due to ARM's limitation.

> > Best regards,
> > Mark
>
Sebastian
* Re: Kernel crash due to memory corruption with v5.4.26-rt17 and PowerPC e500
  2020-05-29 16:37       ` Sebastian Andrzej Siewior
@ 2020-07-06 16:50         ` Sebastian Andrzej Siewior
  2020-07-10 10:59           ` Thomas Graziadei
  0 siblings, 1 reply; 11+ messages in thread
From: Sebastian Andrzej Siewior @ 2020-07-06 16:50 UTC
To: Mark Marshall
Cc: linux-rt-users, Mark Marshall, thomas.graziadei, Thomas Gleixner, linux-kernel, rostedt

On 2020-05-29 18:37:22 [+0200], To Mark Marshall wrote:
> On 2020-05-29 18:15:18 [+0200], To Mark Marshall wrote:
> > In order to get it back into the RT queue I need to understand why it is
> > required. What exactly is it fixing. Let me stare at for a little…
>
> it used to be local_irq_disable() which then became preempt_disable()
> local_irq_disable() due to ARM's limitation.

Any luck on your side?

I *think* if you swap the mm assignment in exec_mmap() then it should be
gone. Basically:
|	tsk->active_mm = mm;
|	tsk->mm = mm;

However, I am thinking of applying something like this:

diff --git a/fs/exec.c b/fs/exec.c
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1035,11 +1035,15 @@ static int exec_mmap(struct mm_struct *mm)
 		}
 	}
 	task_lock(tsk);
+
+	task_lock_mm();
 	active_mm = tsk->active_mm;
 	membarrier_exec_mmap(mm);
 	tsk->mm = mm;
 	tsk->active_mm = mm;
 	activate_mm(active_mm, mm);
+	task_unlock_mm();
+
 	tsk->mm->vmacache_seqnum = 0;
 	vmacache_flush(tsk);
 	task_unlock(tsk);
diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
--- a/include/linux/sched/task.h
+++ b/include/linux/sched/task.h
@@ -176,4 +176,31 @@ static inline void task_unlock(struct task_struct *p)
 	spin_unlock(&p->alloc_lock);
 }
 
+#ifdef CONFIG_PREEMPT_RT
+/*
+ * Protects ->mm and ->active_mm.
+ * Avoids scheduling so switch_mm() or enter_lazy_tlb() will not read the
+ * members while they are updated.
+ */
+static inline void task_lock_mm(void)
+{
+	preempt_disable();
+}
+
+static inline void task_unlock_mm(void)
+{
+	preempt_enable();
+}
+
+#else
+
+static inline void task_lock_mm(void)
+{
+}
+
+static inline void task_unlock_mm(void)
+{
+}
+#endif
+
 #endif /* _LINUX_SCHED_TASK_H */
diff --git a/mm/mmu_context.c b/mm/mmu_context.c
--- a/mm/mmu_context.c
+++ b/mm/mmu_context.c
@@ -25,6 +25,7 @@ void use_mm(struct mm_struct *mm)
 	struct task_struct *tsk = current;
 
 	task_lock(tsk);
+	task_lock_mm();
 	active_mm = tsk->active_mm;
 	if (active_mm != mm) {
 		mmgrab(mm);
@@ -32,6 +33,7 @@ void use_mm(struct mm_struct *mm)
 	}
 	tsk->mm = mm;
 	switch_mm(active_mm, mm, tsk);
+	task_unlock_mm();
 	task_unlock(tsk);
 #ifdef finish_arch_post_lock_switch
 	finish_arch_post_lock_switch();
@@ -55,10 +57,12 @@ void unuse_mm(struct mm_struct *mm)
 	struct task_struct *tsk = current;
 
 	task_lock(tsk);
+	task_lock_mm();
 	sync_mm_rss(mm);
 	tsk->mm = NULL;
 	/* active_mm is still 'mm' */
 	enter_lazy_tlb(mm, tsk);
+	task_unlock_mm();
 	task_unlock(tsk);
 }
 EXPORT_SYMBOL_GPL(unuse_mm);
-- 
2.27.0

> > > Best regards,
> > > Mark

Sebastian
* RE: Kernel crash due to memory corruption with v5.4.26-rt17 and PowerPC e500
  2020-07-06 16:50         ` Sebastian Andrzej Siewior
@ 2020-07-10 10:59           ` Thomas Graziadei
  2020-08-12 12:45             ` Thomas Graziadei
  0 siblings, 1 reply; 11+ messages in thread
From: Thomas Graziadei @ 2020-07-10 10:59 UTC
To: 'Sebastian Andrzej Siewior', Mark Marshall
Cc: linux-rt-users, Mark Marshall, Thomas Gleixner, linux-kernel, rostedt

[-- Attachment #1: Type: text/plain, Size: 4133 bytes --]

Hi Sebastian,

thanks for looking into this.

We could reproduce the issue with QEMU.
At runtime you need to set mdev as the kernel's hotplug client
(/proc/sys/kernel/hotplug) and give it a dummy /etc/mdev.conf like
(.* 1:1 777). Then just do a loop and insmod/rmmod crc4.ko and crc7.ko.

Swapping the mm assignment did not work -> exception after 1900
iterations
Your second suggestion with check.patch (attached to this email for
completeness, only protecting the exec_mmap function) did not work
either -> exception after 2600 iterations

Your third suggestion (a modification to the original revert) enclosed
in this e-mail does seem to work. Still no problems after 30000
iterations.

By the way, as noticed in your kernel config, we would be quite
interested in a gcc 9 compiler for our platform. Is there a
mainline/maintained version or fork for this or another possibility to
get it?

Regards,
Thomas

-----Original Message-----
From: Sebastian Andrzej Siewior [mailto:bigeasy@linutronix.de]
Sent: Monday, July 06, 2020 6:50 PM
To: Mark Marshall <markmarshall14@gmail.com>
Cc: linux-rt-users <linux-rt-users@vger.kernel.org>; Mark Marshall <mark.marshall@omicronenergy.com>; Thomas Graziadei <thomas.graziadei@omicronenergy.com>; Thomas Gleixner <tglx@linutronix.de>; linux-kernel@vger.kernel.org; rostedt@goodmis.org
Subject: Re: Kernel crash due to memory corruption with v5.4.26-rt17 and PowerPC e500

On 2020-05-29 18:37:22 [+0200], To Mark Marshall wrote:
> On 2020-05-29 18:15:18 [+0200], To Mark Marshall wrote:
> > In order to get it back into the RT queue I need to understand why
> > it is required. What exactly is it fixing. Let me stare at for a
> > little…
>
> it used to be local_irq_disable() which then became preempt_disable()
> local_irq_disable() due to ARM's limitation.

Any luck on your side?

I *think* if you swap the mm assignment in exec_mmap() then it should be
gone. Basically:
|	tsk->active_mm = mm;
|	tsk->mm = mm;

However I think to apply something like this:

diff --git a/fs/exec.c b/fs/exec.c
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1035,11 +1035,15 @@ static int exec_mmap(struct mm_struct *mm)
 		}
 	}
 	task_lock(tsk);
+
+	task_lock_mm();
 	active_mm = tsk->active_mm;
 	membarrier_exec_mmap(mm);
 	tsk->mm = mm;
 	tsk->active_mm = mm;
 	activate_mm(active_mm, mm);
+	task_unlock_mm();
+
 	tsk->mm->vmacache_seqnum = 0;
 	vmacache_flush(tsk);
 	task_unlock(tsk);
diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
--- a/include/linux/sched/task.h
+++ b/include/linux/sched/task.h
@@ -176,4 +176,31 @@ static inline void task_unlock(struct task_struct *p)
 	spin_unlock(&p->alloc_lock);
 }
 
+#ifdef CONFIG_PREEMPT_RT
+/*
+ * Protects ->mm and ->active_mm.
+ * Avoids scheduling so switch_mm() or enter_lazy_tlb() will not read the
+ * members while they are updated.
+ */
+static inline void task_lock_mm(void)
+{
+	preempt_disable();
+}
+
+static inline void task_unlock_mm(void)
+{
+	preempt_enable();
+}
+
+#else
+
+static inline void task_lock_mm(void)
+{
+}
+
+static inline void task_unlock_mm(void)
+{
+}
+#endif
+
 #endif /* _LINUX_SCHED_TASK_H */
diff --git a/mm/mmu_context.c b/mm/mmu_context.c
--- a/mm/mmu_context.c
+++ b/mm/mmu_context.c
@@ -25,6 +25,7 @@ void use_mm(struct mm_struct *mm)
 	struct task_struct *tsk = current;
 
 	task_lock(tsk);
+	task_lock_mm();
 	active_mm = tsk->active_mm;
 	if (active_mm != mm) {
 		mmgrab(mm);
@@ -32,6 +33,7 @@ void use_mm(struct mm_struct *mm)
 	}
 	tsk->mm = mm;
 	switch_mm(active_mm, mm, tsk);
+	task_unlock_mm();
 	task_unlock(tsk);
 #ifdef finish_arch_post_lock_switch
 	finish_arch_post_lock_switch();
@@ -55,10 +57,12 @@ void unuse_mm(struct mm_struct *mm)
 	struct task_struct *tsk = current;
 
 	task_lock(tsk);
+	task_lock_mm();
 	sync_mm_rss(mm);
 	tsk->mm = NULL;
 	/* active_mm is still 'mm' */
 	enter_lazy_tlb(mm, tsk);
+	task_unlock_mm();
 	task_unlock(tsk);
 }
 EXPORT_SYMBOL_GPL(unuse_mm);
-- 
2.27.0

> > > Best regards,
> > > Mark

Sebastian

[-- Attachment #2: check.patch --]
[-- Type: application/octet-stream, Size: 437 bytes --]

diff --git a/fs/exec.c b/fs/exec.c
index 77603ceed51f9..1310fb4d5f0d4 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1037,8 +1037,12 @@ static int exec_mmap(struct mm_struct *mm)
 	task_lock(tsk);
 	active_mm = tsk->active_mm;
 	membarrier_exec_mmap(mm);
+
+	preempt_disable();
 	tsk->mm = mm;
 	tsk->active_mm = mm;
+	preempt_enable();
+
 	activate_mm(active_mm, mm);
 	tsk->mm->vmacache_seqnum = 0;
 	vmacache_flush(tsk);
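The QEMU reproduction steps described in the message above could be scripted roughly as follows. This is a sketch under assumptions: the `/sbin/mdev` path, the module directory argument and the function names are not from the thread; only the hotplug file, the `/etc/mdev.conf` rule and the two module names are.

```shell
#!/bin/sh
# Sketch only: /sbin/mdev and the helper names are assumptions.

# The dummy mdev.conf rule quoted in the message.
mdev_conf_line() {
    printf '%s\n' '.* 1:1 777'
}

# Register mdev as the kernel's hotplug client and install the dummy rule.
setup_mdev() {
    mdev_conf_line > /etc/mdev.conf
    echo /sbin/mdev > /proc/sys/kernel/hotplug
}

# Insert and remove crc4.ko and crc7.ko from directory $1, $2 times.
stress_crc() {
    dir=$1
    max=$2
    i=1
    while [ "$i" -le "$max" ]; do
        echo "$i"
        insmod "$dir/crc4.ko" && rmmod crc4
        insmod "$dir/crc7.ko" && rmmod crc7
        i=$((i + 1))
    done
}

# Example: ./qemu-repro.sh run /lib/modules/extra 30000
if [ "${1:-}" = "run" ]; then
    setup_mdev
    stress_crc "$2" "$3"
fi
```

Each module load and unload generates a uevent, so with mdev registered as the hotplug client every iteration makes the kernel spawn a helper process, which is presumably why this setup triggers the exec path so reliably.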
* Re: Kernel crash due to memory corruption with v5.4.26-rt17 and PowerPC e500
  2020-07-10 10:59           ` Thomas Graziadei
@ 2020-08-12 12:45             ` Thomas Graziadei
  2020-08-19  7:11               ` 'Sebastian Andrzej Siewior'
  2020-09-01  7:41               ` 'Sebastian Andrzej Siewior'
  0 siblings, 2 replies; 11+ messages in thread
From: Thomas Graziadei @ 2020-08-12 12:45 UTC
To: 'Sebastian Andrzej Siewior', Mark Marshall
Cc: linux-rt-users, Mark Marshall, Thomas Gleixner, linux-kernel, rostedt

Hi Sebastian,

any progress on your side?

Do you think the patch could be applied for the next versions?

Regards,
Thomas
* Re: Kernel crash due to memory corruption with v5.4.26-rt17 and PowerPC e500
  2020-08-12 12:45             ` Thomas Graziadei
@ 2020-08-19  7:11               ` 'Sebastian Andrzej Siewior'
  2020-09-01  7:41               ` 'Sebastian Andrzej Siewior'
  1 sibling, 0 replies; 11+ messages in thread
From: 'Sebastian Andrzej Siewior' @ 2020-08-19 7:11 UTC
To: Thomas Graziadei
Cc: Mark Marshall, linux-rt-users, Mark Marshall, Thomas Gleixner, linux-kernel, rostedt

On 2020-08-12 14:45:22 [+0200], Thomas Graziadei wrote:
> Hi Sebastian,
Hi Thomas,

> any progress on your side?

due to lack of time none. But I am on it…

> Do you think the patch could be applied for the next versions?

So I had a theory why it happens, but then you said no, so now I need to
figure out why it happens so I can write it in the changelog.
I believe you made it happen in qemu and you sent a .config and
everything, so I will stare into it as soon as I can.

> Regards,
> Thomas

Sebastian
* Re: Kernel crash due to memory corruption with v5.4.26-rt17 and PowerPC e500
  2020-08-12 12:45             ` Thomas Graziadei
  2020-08-19  7:11               ` 'Sebastian Andrzej Siewior'
@ 2020-09-01  7:41               ` 'Sebastian Andrzej Siewior'
  1 sibling, 0 replies; 11+ messages in thread
From: 'Sebastian Andrzej Siewior' @ 2020-09-01 7:41 UTC
To: Thomas Graziadei
Cc: Mark Marshall, linux-rt-users, Mark Marshall, Thomas Gleixner, linux-kernel, rostedt

On 2020-08-12 14:45:22 [+0200], Thomas Graziadei wrote:
> Hi Sebastian,
Hi Thomas,

> any progress on your side?
>
> Do you think the patch could be applied for the next versions?

Yes. The ->active_mm change needs to be protected against scheduling
regardless of the arch/mmu. Otherwise the mm will be put twice. For this
to trigger you need to exec from a kernel thread and get preempted.
This will be addressed in use_mm() by commit
   38cf307c1f201 ("mm: fix kthread_use_mm() vs TLB invalidate")

which is in v5.9-rc1 and exec_mmap() is under discussion at
   https://lore.kernel.org/linux-arch/20200828100022.1099682-2-npiggin@gmail.com/

> Regards,
> Thomas

Sebastian
* Re: Kernel crash due to memory corruption with v5.4.26-rt17 and PowerPC e500
  2020-05-29 16:15     ` Sebastian Andrzej Siewior
  2020-05-29 16:37       ` Sebastian Andrzej Siewior
@ 2020-05-29 19:03       ` Mark Marshall
  1 sibling, 0 replies; 11+ messages in thread
From: Mark Marshall @ 2020-05-29 19:03 UTC
To: Sebastian Andrzej Siewior
Cc: linux-rt-users, Mark Marshall, thomas.graziadei, Thomas Gleixner, linux-kernel, rostedt

[-- Attachment #1: Type: text/plain, Size: 2851 bytes --]

My config is attached. This is the greatly reduced config that I
used when trying to narrow down the problem. We normally have much
more enabled, but that had no effect on the bug in my testing. We do,
unfortunately, have quite a few out-of-tree patches, but they are all in
USB or Networking, which are disabled here.

I've never tried out the kernel under qemu, but I will try that next
week to see if I can reproduce the problem there. It's certainly quite
a narrow race window though, so it might behave quite differently under
qemu. In general, how reliable is qemu at showing these kinds of
problems?

Thanks,
Mark

PS. I've also noticed that THREAD_SHIFT is set in this config. That's
because when I added lots of debug options, I got warnings about the
stack being too small. This had no impact on the bug that I had: I
increased the size of the stack, and the stack warnings stopped, but the
bug was still the same.

[-- Attachment #2: config-5.4-rt --]
[-- Type: application/octet-stream, Size: 5142 bytes --]

# CONFIG_SWAP is not set
CONFIG_SYSVIPC=y
CONFIG_HIGH_RES_TIMERS=y
CONFIG_PREEMPT_RT=y
CONFIG_IRQ_TIME_ACCOUNTING=y
CONFIG_BSD_PROCESS_ACCT=y
CONFIG_BSD_PROCESS_ACCT_V3=y
CONFIG_RCU_EXPERT=y
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_BLK_DEV_INITRD=y
# CONFIG_RD_BZIP2 is not set
# CONFIG_RD_LZMA is not set
# CONFIG_RD_XZ is not set
# CONFIG_RD_LZO is not set
# CONFIG_RD_LZ4 is not set
# CONFIG_SGETMASK_SYSCALL is not set
# CONFIG_SYSFS_SYSCALL is not set
CONFIG_KALLSYMS_ALL=y
CONFIG_BPF_SYSCALL=y
# CONFIG_RSEQ is not set
CONFIG_EMBEDDED=y
CONFIG_PERF_EVENTS=y
# CONFIG_COMPAT_BRK is not set
CONFIG_PPC_85xx=y
CONFIG_MPC85xx_DS=y
CONFIG_MPC85xx_RDB=y
CONFIG_P1010_RDB=y
CONFIG_MAIO400=y
CONFIG_MIC400=y
CONFIG_GEN_RTC=y
CONFIG_HZ_1000=y
CONFIG_THREAD_SHIFT=14
# CONFIG_SUSPEND is not set
# CONFIG_SECCOMP is not set
CONFIG_FSL_LBC=y
CONFIG_JUMP_LABEL=y
CONFIG_STRICT_KERNEL_RWX=y
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
CONFIG_MODULE_FORCE_UNLOAD=y
CONFIG_MODVERSIONS=y
# CONFIG_BLK_DEV_BSG is not set
CONFIG_BLK_DEV_INTEGRITY=y
# CONFIG_MQ_IOSCHED_DEADLINE is not set
# CONFIG_MQ_IOSCHED_KYBER is not set
# CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS is not set
# CONFIG_COMPACTION is not set
# CONFIG_MIGRATION is not set
CONFIG_UEVENT_HELPER=y
CONFIG_UEVENT_HELPER_PATH="/sbin/hotplug"
CONFIG_DEVTMPFS=y
CONFIG_DEVTMPFS_MOUNT=y
# CONFIG_STANDALONE is not set
CONFIG_FW_LOADER_USER_HELPER=y
CONFIG_FW_LOADER_USER_HELPER_FALLBACK=y
CONFIG_MTD=y
CONFIG_MTD_CMDLINE_PARTS=y
CONFIG_MTD_BLOCK=y
CONFIG_MTD_CFI=y
CONFIG_MTD_CFI_INTELEXT=y
CONFIG_MTD_CFI_AMDSTD=y
CONFIG_MTD_RAW_NAND=y
CONFIG_MTD_NAND_FSL_IFC=y
CONFIG_MTD_SPI_NOR=y
CONFIG_MTD_UBI=y
CONFIG_MTD_UBI_FASTMAP=y
CONFIG_MTD_UBI_BLOCK=y
CONFIG_BLK_DEV_LOOP=y
CONFIG_BLK_DEV_LOOP_MIN_COUNT=1
CONFIG_BLK_DEV_RAM=y
CONFIG_BLK_DEV_RAM_COUNT=2
CONFIG_BLK_DEV_RAM_SIZE=131072
CONFIG_EEPROM_AT24=y CONFIG_EEPROM_AT25=y CONFIG_EEPROM_93CX6=m CONFIG_SCSI=y # CONFIG_SCSI_PROC_FS is not set CONFIG_BLK_DEV_SD=y # CONFIG_SCSI_LOWLEVEL is not set CONFIG_INPUT_EVDEV=y # CONFIG_KEYBOARD_ATKBD is not set CONFIG_KEYBOARD_GPIO=y # CONFIG_INPUT_MOUSE is not set # CONFIG_SERIO is not set CONFIG_LEGACY_PTY_COUNT=64 CONFIG_DEVKMEM=y CONFIG_SERIAL_8250=y CONFIG_SERIAL_8250_CONSOLE=y CONFIG_SERIAL_8250_NR_UARTS=2 CONFIG_SERIAL_8250_RUNTIME_UARTS=2 CONFIG_SERIAL_8250_MANY_PORTS=y CONFIG_SERIAL_8250_DETECT_IRQ=y CONFIG_SERIAL_8250_RSA=y # CONFIG_NVRAM is not set CONFIG_TCG_TPM=y CONFIG_TCG_TIS_SPI=y CONFIG_I2C=y CONFIG_I2C_CHARDEV=y CONFIG_I2C_MPC=y CONFIG_SPI=y CONFIG_SPI_FSL_ESPI=y CONFIG_GPIOLIB=y CONFIG_GPIO_SYSFS=y CONFIG_GPIO_MPC8XXX=y CONFIG_GPIO_PCA953X=y CONFIG_GPIO_PCA953X_IRQ=y # CONFIG_HWMON is not set CONFIG_WATCHDOG=y CONFIG_WATCHDOG_NOWAYOUT=y CONFIG_BOOKE_WDT=y CONFIG_BOOKE_WDT_DEFAULT_TIMEOUT=34 # CONFIG_VGA_CONSOLE is not set # CONFIG_HID is not set # CONFIG_USB_SUPPORT is not set CONFIG_RTC_DRV_DS1307=y CONFIG_RTC_DRV_CMOS=y # CONFIG_DNOTIFY is not set CONFIG_PROC_KCORE=y CONFIG_TMPFS=y CONFIG_CONFIGFS_FS=y CONFIG_JFFS2_FS=y CONFIG_JFFS2_FS_WBUF_VERIFY=y CONFIG_JFFS2_SUMMARY=y CONFIG_JFFS2_FS_XATTR=y CONFIG_UBIFS_FS=y CONFIG_SQUASHFS=y CONFIG_SQUASHFS_FILE_DIRECT=y CONFIG_SQUASHFS_XATTR=y CONFIG_SQUASHFS_LZ4=y CONFIG_SQUASHFS_LZO=y CONFIG_SQUASHFS_XZ=y CONFIG_SQUASHFS_4K_DEVBLK_SIZE=y CONFIG_KEYS=y CONFIG_CRYPTO_ECDH=y CONFIG_CRYPTO_CCM=y CONFIG_CRYPTO_GCM=y CONFIG_CRYPTO_ECHAINIV=m CONFIG_CRYPTO_CBC=y CONFIG_CRYPTO_CTS=y CONFIG_CRYPTO_XTS=y CONFIG_CRYPTO_ESSIV=y CONFIG_CRYPTO_CMAC=y CONFIG_CRYPTO_MD5=y CONFIG_CRYPTO_MD5_PPC=y CONFIG_CRYPTO_MICHAEL_MIC=m CONFIG_CRYPTO_SHA1=y CONFIG_CRYPTO_SHA1_PPC_SPE=y CONFIG_CRYPTO_SHA256_PPC_SPE=y CONFIG_CRYPTO_SHA512=y CONFIG_CRYPTO_AES=y CONFIG_CRYPTO_AES_PPC_SPE=y CONFIG_CRYPTO_ARC4=y CONFIG_CRYPTO_DES=y CONFIG_CRYPTO_DEV_FSL_CAAM=y CONFIG_ASYMMETRIC_KEY_TYPE=y CONFIG_ASYMMETRIC_PUBLIC_KEY_SUBTYPE=y 
CONFIG_X509_CERTIFICATE_PARSER=y CONFIG_PKCS7_MESSAGE_PARSER=y CONFIG_SYSTEM_TRUSTED_KEYRING=y CONFIG_CRC_CCITT=m CONFIG_CRC_ITU_T=m CONFIG_CRC7=m CONFIG_LIBCRC32C=y # CONFIG_XZ_DEC_X86 is not set # CONFIG_XZ_DEC_IA64 is not set # CONFIG_XZ_DEC_ARM is not set # CONFIG_XZ_DEC_ARMTHUMB is not set # CONFIG_XZ_DEC_SPARC is not set CONFIG_DYNAMIC_DEBUG=y CONFIG_STRIP_ASM_SYMS=y CONFIG_DEBUG_PAGEALLOC=y CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT=y CONFIG_PAGE_POISONING=y CONFIG_DEBUG_OBJECTS=y CONFIG_DEBUG_OBJECTS_FREE=y CONFIG_DEBUG_OBJECTS_TIMERS=y CONFIG_DEBUG_OBJECTS_WORK=y CONFIG_DEBUG_OBJECTS_RCU_HEAD=y CONFIG_DEBUG_OBJECTS_PERCPU_COUNTER=y CONFIG_DEBUG_KMEMLEAK=y CONFIG_DEBUG_VM=y CONFIG_DEBUG_VM_VMACACHE=y CONFIG_DEBUG_VM_RB=y CONFIG_DEBUG_VM_PGFLAGS=y CONFIG_DEBUG_VM_POISON=y CONFIG_DEBUG_VIRTUAL=y CONFIG_DEBUG_MEMORY_INIT=y CONFIG_DEBUG_STACKOVERFLOW=y CONFIG_KASAN=y CONFIG_DETECT_HUNG_TASK=y CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=60 CONFIG_BOOTPARAM_HUNG_TASK_PANIC=y # CONFIG_SCHED_DEBUG is not set CONFIG_SCHED_STACK_END_CHECK=y # CONFIG_DEBUG_PREEMPT is not set # CONFIG_DEBUG_BUGVERBOSE is not set CONFIG_RCU_EQS_DEBUG=y CONFIG_FUNCTION_TRACER=y CONFIG_BUG_ON_DATA_CORRUPTION=y CONFIG_UBSAN=y CONFIG_PPC_DISABLE_WERROR=y CONFIG_PPC_EMULATED_STATS=y CONFIG_PPC_IRQ_SOFT_MASK_DEBUG=y CONFIG_BDI_SWITCH=y ^ permalink raw reply [flat|nested] 11+ messages in thread
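For anyone comparing their own setup against the attached config, a small script can help confirm which of the options relevant to this thread are actually enabled. The following is only a rough sketch in Python, not something posted in the thread: the function name `parse_config` and the sample text are illustrative assumptions, and the sample holds just a handful of lines copied from the attachment. The attachment looks like a reduced, defconfig-style fragment, so an option that is absent is presumably left at its Kconfig default rather than disabled; the sketch therefore keeps "is not set" (mapped to `None`) distinct from "not mentioned at all".

```python
import re

# Matches either "# CONFIG_FOO is not set" or "CONFIG_FOO=value".
# Working on the whole text (rather than line by line) means it also copes
# with a config whose line breaks were lost, e.g. when pasted from a web page.
_OPTION_RE = re.compile(
    r'# (CONFIG_\w+) is not set|(CONFIG_\w+)=("[^"]*"|\S+)'
)


def parse_config(text):
    """Return {option: value}; options marked 'is not set' map to None."""
    options = {}
    for unset_name, set_name, value in _OPTION_RE.findall(text):
        if unset_name:
            options[unset_name] = None
        else:
            options[set_name] = value.strip('"')
    return options


# A few lines copied from the attachment above (illustrative subset only).
SAMPLE = """\
# CONFIG_SWAP is not set
CONFIG_PREEMPT_RT=y
CONFIG_PPC_85xx=y
CONFIG_HZ_1000=y
CONFIG_UEVENT_HELPER_PATH="/sbin/hotplug"
CONFIG_CRC7=m
CONFIG_KASAN=y
# CONFIG_DEBUG_PREEMPT is not set
"""

config = parse_config(SAMPLE)
for name in ("CONFIG_PREEMPT_RT", "CONFIG_KASAN", "CONFIG_CRC7",
             "CONFIG_DEBUG_PREEMPT", "CONFIG_SMP"):
    if name not in config:
        print(name, "-> not mentioned (Kconfig default applies)")
    else:
        print(name, "->", config[name])
```

To check a real file, the same function could be called on `open("config-5.4-rt").read()`, assuming the attachment has been saved under that name.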
Thread overview: 11+ messages
2020-05-04  9:40 Kernel crash due to memory corruption with v5.4.26-rt17 and PowerPC e500 Mark Marshall
2020-05-29 13:14 ` Sebastian Andrzej Siewior
2020-05-29 15:38 ` Mark Marshall
2020-05-29 16:15 ` Sebastian Andrzej Siewior
2020-05-29 16:37 ` Sebastian Andrzej Siewior
2020-07-06 16:50 ` Sebastian Andrzej Siewior
2020-07-10 10:59 ` Thomas Graziadei
2020-08-12 12:45 ` Thomas Graziadei
2020-08-19  7:11 ` 'Sebastian Andrzej Siewior'
2020-09-01  7:41 ` 'Sebastian Andrzej Siewior'
2020-05-29 19:03 ` Mark Marshall