linux-m68k.vger.kernel.org archive mirror
From: Michael Schmitz <schmitzmic@gmail.com>
To: Finn Thain <fthain@linux-m68k.org>
Cc: linux-m68k@vger.kernel.org
Subject: Re: Mainline kernel crashes, was Re: RFC: remove set_fs for m68k
Date: Mon, 13 Sep 2021 15:26:24 +1200
Message-ID: <82f6f161-b9e0-bf9b-3c20-aa2ce810d99a@gmail.com>
In-Reply-To: <d59a44c-ddea-a774-d217-2484ad582dc0@linux-m68k.org>

Hi Finn,

On 13/09/21 13:27, Finn Thain wrote:
> On Sun, 12 Sep 2021, Finn Thain wrote:
>
>> ... I've now done as you did, that is,
>>
>> diff --git a/arch/m68k/kernel/irq.c b/arch/m68k/kernel/irq.c
>> index 9ab4f550342e..b46d8a57f4da 100644
>> --- a/arch/m68k/kernel/irq.c
>> +++ b/arch/m68k/kernel/irq.c
>> @@ -20,10 +20,13 @@
>>  asmlinkage void do_IRQ(int irq, struct pt_regs *regs)
>>  {
>>  	struct pt_regs *oldregs = set_irq_regs(regs);
>> +	unsigned long flags;
>>
>> +	local_irq_save(flags);
>>  	irq_enter();
>>  	generic_handle_irq(irq);
>>  	irq_exit();
>> +	local_irq_restore(flags);
>>
>>  	set_irq_regs(oldregs);
>>  }
>>
>> There may be a better way to achieve that. If the final IPL can be found
>> in regs then it doesn't need to be saved again.
>>
>> I haven't looked for a possible entropy pool improvement from correct
>> locking in random.c -- it would not surprise me if there was one.
>>
>> But none of this explains the panics I saw so I went looking for potential
>> race conditions in the irq_enter_rcu() and irq_exit_rcu() code. I haven't
>> found the bug yet.
>>
>
> Turns out that the panic bug was not affected by that patch...
>
> running --mmap -1 --mmap-odirect --mmap-bytes 100% -t 60 --timestamp --no-rand-seed --times
> stress-ng: 17:06:09.62 info:  [1241] setting to a 60 second run per stressor
> stress-ng: 17:06:09.62 info:  [1241] dispatching hogs: 1 mmap
> [  807.270000] Kernel panic - not syncing: Aiee, killing interrupt handler!
> [  807.270000] CPU: 0 PID: 1243 Comm: stress-ng Not tainted 5.14.0-multi-00002-g69f953866c7e #2
> [  807.270000] Stack from 00bcbde4:
> [  807.270000]         00bcbde4 00488d85 00488d85 000c0000 00bcbe00 003f3708 00488d85 00bcbe20
> [  807.270000]         003f270e 000c0000 418004fc 00bca000 009f8a80 00bca000 00a06fc0 00bcbe5c
> [  807.270000]         000317f6 0048098b 00000009 418004fc 00bca000 00000000 07408000 00000009
> [  807.270000]         00000008 00bcbf38 00a06fc0 00000006 00000000 00000001 00bcbe6c 000319ac
> [  807.270000]         00000009 01438a20 00bcbeb8 0003acf0 00000009 0000000f 0000000e c043c000
> [  807.270000]         00000000 07408000 00000003 00bcbf98 efb2c944 efb2b8a8 00039afa 00bca000
> [  807.270000] Call Trace: [<000c0000>] insert_vmap_area.constprop.91+0xbc/0x15a
> [  807.270000]  [<003f3708>] dump_stack+0x10/0x16
> [  807.270000]  [<003f270e>] panic+0xba/0x2bc
> [  807.270000]  [<000c0000>] insert_vmap_area.constprop.91+0xbc/0x15a
> [  807.270000]  [<000317f6>] do_exit+0x87e/0x9d6
> [  807.270000]  [<000319ac>] do_group_exit+0x28/0xb6
> [  807.270000]  [<0003acf0>] get_signal+0x126/0x720
> [  807.270000]  [<00039afa>] send_signal+0xde/0x16e
> [  807.270000]  [<00004f70>] do_notify_resume+0x38/0x61c
> [  807.270000]  [<0003abaa>] force_sig_fault_to_task+0x36/0x3a
> [  807.270000]  [<0003abc6>] force_sig_fault+0x18/0x1c
> [  807.270000]  [<000074f4>] send_fault_sig+0x44/0xc6
> [  807.270000]  [<00006a62>] buserr_c+0x2c8/0x6a2
> [  807.270000]  [<00002cfc>] do_signal_return+0x10/0x1a
> [  807.270000]  [<0018800e>] ext4_htree_fill_tree+0x7c/0x32a
> [  807.270000]  [<0010800a>] d_absolute_path+0x18/0x6a
> [  807.270000]
> [  807.270000] ---[ end Kernel panic - not syncing: Aiee, killing interrupt handler! ]---
>
> On the Quadra 630, the panic almost completely disappeared when I enabled
> the relevant CONFIG_DEBUG_* options. After about 7 hours of stress testing
> I got this:
>
> [23982.680000] list_add corruption. next->prev should be prev (00b51e98), but was 00bb22d8. (next=00b75cd0).

I chased a similar list corruption bug (a corrupt shadow LRU list in
mm/workingset.c:shadow_lru_isolate()) in 4.10. I believe that was
related to an out-of-bounds memory access, maybe in get_reg() from
drivers/char/random.c, but it might have been something else.

That bug had disappeared in 4.12, and I haven't seen it since.
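
For reference, the sanity check behind the "list_add corruption" message
quoted above can be sketched in userspace C roughly as follows. This is
a simplified approximation of what the CONFIG_DEBUG_LIST checks in
lib/list_debug.c do, not the kernel code itself. The names
list_add_valid() and list_add_checked() are hypothetical, chosen for
this sketch, and it reports errors via stderr where the kernel would
BUG().

```c
#include <stdbool.h>
#include <stdio.h>

/* Minimal doubly linked list node, laid out like the kernel's list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/*
 * Verify that prev and next are still neighbours before linking a new
 * entry between them. Returns false if the list looks corrupted.
 */
static bool list_add_valid(struct list_head *new, struct list_head *prev,
			   struct list_head *next)
{
	if (next->prev != prev) {
		fprintf(stderr,
			"list_add corruption. next->prev should be prev (%p), but was %p. (next=%p).\n",
			(void *)prev, (void *)next->prev, (void *)next);
		return false;
	}
	if (prev->next != next) {
		fprintf(stderr,
			"list_add corruption. prev->next should be next (%p), but was %p. (prev=%p).\n",
			(void *)next, (void *)prev->next, (void *)prev);
		return false;
	}
	if (new == prev || new == next) {
		fprintf(stderr, "list_add double add: new=%p, prev=%p, next=%p.\n",
			(void *)new, (void *)prev, (void *)next);
		return false;
	}
	return true;
}

/* Insert new right after head, but only if the neighbours are consistent. */
static bool list_add_checked(struct list_head *new, struct list_head *head)
{
	struct list_head *prev = head;
	struct list_head *next = head->next;

	if (!list_add_valid(new, prev, next))
		return false;

	next->prev = new;
	new->next = next;
	new->prev = prev;
	prev->next = new;
	return true;
}
```

The point of the sketch is that the check only notices the damage at
the next list_add() on the affected list. A stray write to next->prev
can therefore happen long before the report, and in unrelated code.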


> [23982.690000] kernel BUG at lib/list_debug.c:25!
> [23982.700000] *** TRAP #7 ***   FORMAT=0
> [23982.710000] Current process id is 15489
> [23982.720000] BAD KERNEL TRAP: 00000000
> [23982.740000] Modules linked in:
> [23982.750000] PC: [<00261e62>] __list_add_valid+0x62/0xc0
> [23982.760000] SR: 2000  SP: e2fb938b  a2: 00bcba80
> [23982.770000] d0: 00000022    d1: 00000002    d2: 008c4e40    d3: 00b7a9c0
> [23982.780000] d4: 00b51e98    d5: 000da3c0    a0: 00067f00    a1: 00b51d2c
> [23982.790000] Process stress-ng (pid: 15489, task=35ee07ca)
> [23982.800000] Frame format=0
> [23982.810000] Stack from 00b51e80:
> [23982.810000]         004cbab9 004ea3a1 00000019 004ea34f 00b51e98 00bb22d8 00b75cd0 008c4e38
> [23982.810000]         00b51ecc 000da3f2 008c4e40 00b51e98 00b75cd0 00b51e5c 000f5d40 00b75cd0
> [23982.810000]         00b7a9c0 00bb22d0 00b7a9c0 00b51f04 000dc346 00b51e5c 008c4e38 00b7a9c0
> [23982.810000]         c4c97000 00000000 c4c96000 00102073 00b14960 c4c97000 00b51e5c 00b75c94
> [23982.810000]         00000001 00b51f24 000d5628 00b51e5c 00b75c94 00102070 00000000 00b75c94
> [23982.810000]         00b75c94 00b51f3c 000d5728 00b14960 00b75c94 c4c97000 00000000 00b51f78
> [23982.830000] Call Trace: [<000da3f2>] anon_vma_chain_link+0x32/0x80
> [23982.840000]  [<000f5d40>] kmem_cache_alloc+0x0/0x200
> [23982.850000]  [<000dc346>] anon_vma_clone+0xc6/0x180
> [23982.860000]  [<00102073>] cdev_get+0x33/0x80
> [23982.870000]  [<000d5628>] __split_vma+0x68/0x140
> [23982.880000]  [<00102070>] cdev_get+0x30/0x80
> [23982.890000]  [<000d5728>] split_vma+0x28/0x40
> [23982.900000]  [<000d83ba>] mprotect_fixup+0x13a/0x200
> [23982.910000]  [<00102070>] cdev_get+0x30/0x80
> [23982.920000]  [<000d8280>] mprotect_fixup+0x0/0x200
> [23982.930000]  [<000d85b2>] sys_mprotect+0x132/0x1c0
> [23982.940000]  [<00102070>] cdev_get+0x30/0x80
> [23982.950000]  [<00001000>] kernel_pg_dir+0x0/0x1000
> [23982.960000]  [<000071df>] flush_icache_range+0x1f/0x40
> [23982.970000]  [<00002ca4>] syscall+0x8/0xc
> [23982.980000]  [<00001000>] kernel_pg_dir+0x0/0x1000
> [23982.990000]  [<00001000>] kernel_pg_dir+0x0/0x1000
> [23983.000000]  [<00002000>] _start+0x0/0x40
> [23983.010000]  [<0018800e>] ext4_ext_remove_space+0x20e/0x1540
> [23983.030000]
> [23983.040000] Code: 4879 004e a3a1 4879 004c bab9 4e93 4e47 <b089> 6704 b088 661c 2f08 2f2e 000c 2f00 4879 004e a404 47f9 0043 d16c 4e93 4878
> [23983.060000] Disabling lock debugging due to kernel taint
>
> I am still unable to reproduce this in Aranym or QEMU. (Though I did find
> a QEMU bug in the attempt.)
>
> I suppose list pointer corruption could have resulted in the above panic
> had it gone undetected. So it's tempting to blame the panic on bad DRAM --

Yes, such list corruption may well cause a kernel panic. I'd expect bad
DRAM to manifest in other ways before a corrupted linked list causes a
kernel panic, though. Filesystem corruption, for instance.

> especially if this anon_vma_chain struct always gets placed at the same
> physical address (?)

Does it? I think this would be part of the per-process VMA data, not 
something like page tables ...

Incidentally - have you ever checked whether Al Viro's signal handling 
fixes have an impact on these bugs?

Cheers,

	Michael


