From: Michael Schmitz <schmitzmic@gmail.com>
To: Finn Thain <fthain@linux-m68k.org>
Cc: linux-m68k@vger.kernel.org
Subject: Re: Mainline kernel crashes, was Re: RFC: remove set_fs for m68k
Date: Fri, 17 Sep 2021 10:28:56 +1200
Message-ID: <f2094795-c683-b7b2-76e9-dc33e1610c96@gmail.com>
In-Reply-To: <19f1bb6c-5ac5-e7d-c7f4-f89b5e6c8ec6@linux-m68k.org>
Hi Finn,
On 16/09/21 21:04, Finn Thain wrote:
> On Wed, 15 Sep 2021, Michael Schmitz wrote:
>
>> On 15/09/21 13:38, Finn Thain wrote:
>>> On Mon, 13 Sep 2021, Michael Schmitz wrote:
>>>
>>>>>> Incidentally - have you ever checked whether Al Viro's signal
>>>>>> handling fixes have an impact on these bugs?
>>>>>
>>>>> I will try that patch series if you think it is related.
>>>>
>>>> Initial tests look promising (but I've said that before).
>>>
>>> Here's what I found in recent tests on my Quadra 630.
>>>
>>> The usual stress-ng panic can happen without list corruption, even
>>> with local_irq_save/restore() added to do_IRQ().
>>>
>>> The panic did not show up at all during stress tests with Al's signal
>>> handling patch series.
>>>
>>> I think my results are consistent with yours.
>>
>> Thanks - that's encouraging to hear. My tests with Christoph's patches
>> on top of Al's haven't shown any further errors either, but I'll give
>> that combination some more workout.
>
> Further stress testing here using Al's patches did eventually result in
> the same panic that I see using mainline (below).
That's bad - there's another bug lurking in the exception return code,
it seems. Not a regression though.
>
>>
>> Would you care to add your tested-by for Al's patches?
>
> Sure. I haven't seen any regression, so
> Tested-by: Finn Thain <fthain@linux-m68k.org>
>
> ---
> running --mmap -1 --mmap-osync --mmap-bytes 100% -t 60 --timestamp --no-rand-seed --times
> stress-ng: 22:52:11.63 info: [5491] setting to a 60 second run per stressor
> stress-ng: 22:52:11.64 info: [5491] dispatching hogs: 1 mmap
> [ 9858.090000] Kernel panic - not syncing: Aiee, killing interrupt handler!
That one's from do_exit(), right at the start. Can you instrument that
to print the hardirq and softirq counts separately?
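A minimal userspace sketch of the split being asked for. It assumes the preempt_count bit layout documented in include/linux/preempt.h (8 preempt bits, then 8 softirq bits, then 4 hardirq bits). The sketch_* names are hypothetical and exist only for illustration. In the kernel itself, the existing hardirq_count() and softirq_count() helpers could presumably be printed just before the in_interrupt() check instead.

```c
#include <stdio.h>

/* Assumed field widths, mirroring include/linux/preempt.h. */
#define SKETCH_PREEMPT_BITS  8
#define SKETCH_SOFTIRQ_BITS  8
#define SKETCH_HARDIRQ_BITS  4

#define SKETCH_SOFTIRQ_SHIFT SKETCH_PREEMPT_BITS
#define SKETCH_HARDIRQ_SHIFT (SKETCH_SOFTIRQ_SHIFT + SKETCH_SOFTIRQ_BITS)

#define SKETCH_SOFTIRQ_MASK \
	(((1UL << SKETCH_SOFTIRQ_BITS) - 1) << SKETCH_SOFTIRQ_SHIFT)
#define SKETCH_HARDIRQ_MASK \
	(((1UL << SKETCH_HARDIRQ_BITS) - 1) << SKETCH_HARDIRQ_SHIFT)

/* Softirq field of a raw preempt_count value (hypothetical helper). */
static unsigned long sketch_softirq_count(unsigned long pc)
{
	return pc & SKETCH_SOFTIRQ_MASK;
}

/* Hardirq field of a raw preempt_count value (hypothetical helper). */
static unsigned long sketch_hardirq_count(unsigned long pc)
{
	return pc & SKETCH_HARDIRQ_MASK;
}

/* Format both fields separately, the way a debug print might show them. */
static int sketch_format_counts(char *buf, size_t len, unsigned long pc)
{
	return snprintf(buf, len, "hardirq=%#lx softirq=%#lx",
			sketch_hardirq_count(pc), sketch_softirq_count(pc));
}
```

Printing the two fields separately shows whether the panic is triggered by a stale hardirq count or a stale softirq count, which the combined in_interrupt() test cannot distinguish.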
> [ 9858.090000] CPU: 0 PID: 5493 Comm: stress-ng Not tainted 5.14.0-multi-00003-gb2406d5d331a #7
> [ 9858.090000] Stack from 00b4bde4:
> [ 9858.090000] 00b4bde4 00488d5f 00488d5f 00040000 00b4be00 003f3630 00488d5f 00b4be20
> [ 9858.090000] 003f2636 00040000 418004fc 00b4a000 009f8540 00b4a000 00a07440 00b4be5c
> [ 9858.090000] 0003171e 00480965 00000009 418004fc 00b4a000 00000000 073f8000 00000009
> [ 9858.090000] 00000008 00b4bf38 00a07440 00000006 00000000 00000001 00b4be6c 000318d4
> [ 9858.090000] 00000009 01438f30 00b4beb8 0003ac18 00000009 0000000f 0000000e c043c000
> [ 9858.090000] 00000000 073f8000 00000003 00b4bf98 eff82944 eff818a8 00039a22 00b4a000
> [ 9858.090000] Call Trace: [<00040000>] rcu_free_pwq+0x1c/0x1e
> [ 9858.090000] [<003f3630>] dump_stack+0x10/0x16
> [ 9858.090000] [<003f2636>] panic+0xba/0x2bc
> [ 9858.090000] [<00040000>] rcu_free_pwq+0x1c/0x1e
> [ 9858.090000] [<0003171e>] do_exit+0x87e/0x9d6
That offset into do_exit() does not make sense to me - in my version,
that's beyond the end of do_exit(). Does this correspond to the
in_interrupt() test in do_exit() in your image?
> [ 9858.090000] [<000318d4>] do_group_exit+0x28/0xb6
> [ 9858.090000] [<0003ac18>] get_signal+0x126/0x720
> [ 9858.090000] [<00039a22>] send_signal+0xde/0x16e
> [ 9858.090000] [<00004f0c>] do_notify_resume+0x38/0x5dc
> [ 9858.090000] [<0003aad2>] force_sig_fault_to_task+0x36/0x3a
> [ 9858.090000] [<0003aaee>] force_sig_fault+0x18/0x1c
> [ 9858.090000] [<00007450>] send_fault_sig+0x44/0xc6
> [ 9858.090000] [<000069be>] buserr_c+0x2c8/0x6a2
> [ 9858.090000] [<00002cd8>] do_signal_return+0x10/0x1a
RESTORE_SWITCH_STACK in my version. We don't get there in interrupt
context unless it's the only interrupt on the kernel stack.

This is after do_notify_resume() which would have called setup_frame()
in case there was a signal pending (which we can pretty much assume
here, unless you're tracing stress-ng).

I can't see anything in do_signal() and its call chain that would cause
our stack pointer to change upon return from do_notify_resume() ...

Could you add code to do_notify_resume() that compares the 'regs'
argument upon entry and return, and prints both if there is a mismatch?

I know, grasping at straws again ...
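
A rough userspace sketch of the entry/return comparison suggested above. The struct sketch_regs type is a hypothetical stand-in and not the real m68k struct pt_regs layout. All sketch_* names are assumptions made for illustration. The register contents may legitimately change when a signal frame is set up, so the address comparison is probably the more telling of the two checks.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for struct pt_regs; the real layout differs. */
struct sketch_regs {
	unsigned long d0;
	unsigned long sr;
	unsigned long pc;
};

/* Address and contents of the regs argument as seen on entry. */
struct sketch_snapshot {
	const struct sketch_regs *addr;
	struct sketch_regs copy;
};

/* Record where regs lives and what it contains. */
static void sketch_snapshot_take(struct sketch_snapshot *s,
				 const struct sketch_regs *regs)
{
	s->addr = regs;
	memcpy(&s->copy, regs, sizeof(*regs));
}

/* Return 0 if nothing changed; otherwise print both versions and return 1. */
static int sketch_snapshot_check(const struct sketch_snapshot *s,
				 const struct sketch_regs *regs)
{
	if (s->addr == regs && memcmp(&s->copy, regs, sizeof(*regs)) == 0)
		return 0;

	fprintf(stderr, "regs mismatch: entry %p pc=%#lx sr=%#lx d0=%#lx\n",
		(const void *)s->addr, s->copy.pc, s->copy.sr, s->copy.d0);
	fprintf(stderr, "               exit  %p pc=%#lx sr=%#lx d0=%#lx\n",
		(const void *)regs, regs->pc, regs->sr, regs->d0);
	return 1;
}
```

In a kernel build the snapshot would be taken at the top of do_notify_resume() and checked just before it returns, with the output going through the kernel log rather than stderr.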
Cheers,
Michael
> [ 9858.090000] [<0018800e>] ext4_htree_fill_tree+0x154/0x32a
> [ 9858.090000] [<0010800a>] d_path+0x86/0x114
> [ 9858.090000]
> [ 9858.090000] ---[ end Kernel panic - not syncing: Aiee, killing interrupt handler! ]---
>
Thread overview: 73+ messages
2021-07-09 7:01 RFC: remove set_fs for m68k Christoph Hellwig
2021-07-09 7:01 ` [PATCH 1/7] m68k: document that access_ok is broken for !CONFIG_CPU_HAS_ADDRESS_SPACES Christoph Hellwig
2021-07-09 7:01 ` [PATCH 2/7] m68k: use BUILD_BUG for passing invalid sizes to get_user/put_user Christoph Hellwig
2021-07-09 7:01 ` [PATCH 3/7] m68k: remove the inline copy_{from,to}_user variants Christoph Hellwig
2021-07-09 7:01 ` [PATCH 4/7] m68k: remove the err argument to the get_user/put_user assembly helpers Christoph Hellwig
2021-07-09 7:01 ` [PATCH 5/7] m68k: factor the 8-byte lowlevel {get,put}_user code into helpers Christoph Hellwig
2021-07-09 7:01 ` [PATCH 6/7] m68k: provide __{get,put}_kernel_nofault Christoph Hellwig
2021-07-09 7:01 ` [PATCH 7/7] m68k: remove set_fs() Christoph Hellwig
2021-07-11 7:20 ` RFC: remove set_fs for m68k Michael Schmitz
2021-07-12 9:50 ` Christoph Hellwig
2021-07-12 10:20 ` Andreas Schwab
2021-07-12 19:12 ` Michael Schmitz
2021-07-13 5:41 ` Christoph Hellwig
2021-07-13 8:16 ` Michael Schmitz
2021-07-13 8:54 ` Christoph Hellwig
2021-07-14 19:26 ` Michael Schmitz
2021-07-14 20:03 ` Andreas Schwab
2021-07-15 5:44 ` Michael Schmitz
2021-07-16 2:03 ` Michael Schmitz
2021-07-17 5:41 ` Michael Schmitz
2021-07-18 1:14 ` Michael Schmitz
2021-07-21 17:05 ` Christoph Hellwig
2021-07-21 19:20 ` Michael Schmitz
2021-07-23 4:00 ` Michael Schmitz
2021-07-23 5:11 ` Christoph Hellwig
2021-07-25 7:36 ` Michael Schmitz
2021-07-31 19:31 ` Michael Schmitz
2021-08-06 3:10 ` Michael Schmitz
2021-08-11 9:12 ` Christoph Hellwig
2021-08-12 3:37 ` Michael Schmitz
2021-08-15 7:42 ` Christoph Hellwig
2021-08-15 19:17 ` Michael Schmitz
2021-08-16 6:58 ` Christoph Hellwig
[not found] ` <23f745f2-9086-81fb-3d9e-40ea08a1923@linux-m68k.org>
[not found] ` <20210816075155.GA29187@lst.de>
[not found] ` <d407a2a1-738b-5cd5-c2ed-b7250c5da8ec@gmail.com>
[not found] ` <83571ae-10ae-2919-cde-b6b4a5769c9@linux-m68k.org>
[not found] ` <dc594142-e459-533e-cac2-c7a213cec464@gmail.com>
[not found] ` <f4ab2dcb-6761-c60b-54ce-35d0d017d371@gmail.com>
[not found] ` <d772d22e-a945-3e35-80a2-f4783893bea@linux-m68k.org>
[not found] ` <b2c55280-657b-51c2-065c-3fc93db050b9@gmail.com>
[not found] ` <d7b8f7eb-fc18-c8d-fe3e-dcdf19d3f4b@linux-m68k.org>
[not found] ` <755e55ba-4ce2-b4e4-a628-5abc183a557a@linux-m68k.org>
[not found] ` <b52a10fe-3e4b-5740-d3f8-52bce3bc988@linux-m68k.org>
[not found] ` <31f27da7-be60-8eb-9834-748b653c2246@linux-m68k.org>
2021-09-07 3:28 ` Mainline kernel crashes, was " Finn Thain
2021-09-07 5:53 ` Michael Schmitz
2021-09-07 23:50 ` Finn Thain
2021-09-08 8:54 ` Michael Schmitz
2021-09-09 9:40 ` Finn Thain
2021-09-09 23:29 ` Michael Schmitz
2021-09-09 22:51 ` Finn Thain
2021-09-10 0:03 ` Michael Schmitz
2021-09-12 0:51 ` Finn Thain
2021-09-12 3:55 ` Brad Boyer
2021-09-13 1:27 ` Finn Thain
2021-09-13 3:26 ` Michael Schmitz
2021-09-13 5:22 ` Finn Thain
2021-09-13 7:20 ` Michael Schmitz
2021-09-14 3:13 ` Michael Schmitz
2021-09-15 1:38 ` Finn Thain
2021-09-15 8:37 ` Michael Schmitz
2021-09-16 9:04 ` Finn Thain
2021-09-16 22:28 ` Michael Schmitz [this message]
2021-09-21 21:14 ` Michael Schmitz
2021-08-22 19:33 ` Michael Schmitz
2021-08-23 4:04 ` Michael Schmitz
2021-08-23 17:59 ` Linus Torvalds
2021-08-23 21:31 ` Michael Schmitz
2021-08-23 21:49 ` Linus Torvalds
2021-08-24 8:08 ` Andreas Schwab
2021-08-24 8:44 ` Michael Schmitz
2021-08-24 8:59 ` Andreas Schwab
2021-08-25 7:51 ` Michael Schmitz
2021-08-25 8:44 ` Andreas Schwab
2021-08-25 22:59 ` Michael Schmitz
2021-08-25 23:30 ` Brad Boyer
2021-08-26 7:46 ` Michael Schmitz
2021-08-26 7:45 ` Andreas Schwab
2021-09-14 2:43 ` Michael Schmitz
2021-09-14 15:54 ` Linus Torvalds
2021-09-14 16:28 ` Al Viro
2021-09-14 16:38 ` Linus Torvalds
2021-09-15 1:06 ` Al Viro
2021-07-12 19:04 ` Michael Schmitz