From: Vlastimil Babka <vbabka@suse.cz>
To: Mike Galbraith <efault@gmx.de>,
	Thomas Gleixner <tglx@linutronix.de>,
	LKML <linux-kernel@vger.kernel.org>
Cc: linux-rt-users@vger.kernel.org,
	Mel Gorman <mgorman@techsingularity.net>,
	Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Subject: Re: [rfc/patch] mm/slub: restore/expand unfreeze_partials() local exclusion scope
Date: Wed, 21 Jul 2021 10:44:03 +0200	[thread overview]
Message-ID: <69da2ecd-a797-e264-fbfa-13108dc7a573@suse.cz> (raw)
In-Reply-To: <240f104fc6757d8c38fa01342511eda931632d5a.camel@gmx.de>

On 7/21/21 6:56 AM, Mike Galbraith wrote:
> On Tue, 2021-07-20 at 13:26 +0200, Mike Galbraith wrote:
>> On Tue, 2021-07-20 at 10:56 +0200, Vlastimil Babka wrote:
>> > > crash> bt -sx
>> > > PID: 18761  TASK: ffff88812fff0000  CPU: 0   COMMAND: "hackbench"
>> > >  #0 [ffff88818f8ff980] machine_kexec+0x14f at ffffffff81051c8f
>> > >  #1 [ffff88818f8ff9c8] __crash_kexec+0xd2 at ffffffff8111ef72
>> > >  #2 [ffff88818f8ffa88] crash_kexec+0x30 at ffffffff8111fd10
>> > >  #3 [ffff88818f8ffa98] oops_end+0xd3 at ffffffff810267e3
>> > >  #4 [ffff88818f8ffab8] exc_general_protection+0x195 at ffffffff8179fdb5
>> > >  #5 [ffff88818f8ffb50] asm_exc_general_protection+0x1e at ffffffff81800a0e
>> > >     [exception RIP: __unfreeze_partials+156]
>> >
>> > Hm going back to this report...
>> > So could it be that it was still put_cpu_partial() preempting
>> > __slab_alloc() and messing up the partial list, but for some reason
>> > the put_cpu_partial() side crashed this time?
>>
>> Thinking this bug is toast, I emptied the trash bin, so no can peek.
> 
> I made fireworks while waiting for bike riding time, boom #10 was
> finally the right flavor, but...
> 
> crash> bt -sx
> PID: 32     TASK: ffff888100a56000  CPU: 3   COMMAND: "rcuc/3"
>  #0 [ffff888100aa7a90] machine_kexec+0x14f at ffffffff81051c8f
>  #1 [ffff888100aa7ad8] __crash_kexec+0xd2 at ffffffff81120612
>  #2 [ffff888100aa7b98] crash_kexec+0x30 at ffffffff811213b0
>  #3 [ffff888100aa7ba8] oops_end+0xd3 at ffffffff810267e3
>  #4 [ffff888100aa7bc8] exc_general_protection+0x195 at ffffffff817a2cc5
>  #5 [ffff888100aa7c60] asm_exc_general_protection+0x1e at ffffffff81800a0e
>     [exception RIP: __unfreeze_partials+149]
>     RIP: ffffffff8124a295  RSP: ffff888100aa7d10  RFLAGS: 00010202
>     RAX: 0000000000190016  RBX: 0000000000190016  RCX: 000000017fffffff
>     RDX: 00000001ffffffff  RSI: 0000000000000023  RDI: ffffffff81e58b10
>     RBP: ffff888100aa7da0   R8: 0000000000000000   R9: 0000000000190018
>     R10: ffff888100aa7db8  R11: 000000000002d9e4  R12: ffff888100190500
>     R13: ffff88810018c980  R14: 00000001ffffffff  R15: ffffea0004571588
>     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>  #6 [ffff888100aa7db0] put_cpu_partial+0x8e at ffffffff8124a56e
>  #7 [ffff888100aa7dd0] kmem_cache_free+0x3a8 at ffffffff8124d238
>  #8 [ffff888100aa7e08] rcu_do_batch+0x186 at ffffffff810eb246
>  #9 [ffff888100aa7e70] rcu_core+0x25f at ffffffff810eeb2f
> #10 [ffff888100aa7eb0] rcu_cpu_kthread+0x94 at ffffffff810eed24
> #11 [ffff888100aa7ee0] smpboot_thread_fn+0x249 at ffffffff8109e559
> #12 [ffff888100aa7f18] kthread+0x1ac at ffffffff810984dc
> #13 [ffff888100aa7f50] ret_from_fork+0x1f at ffffffff81001b1f
> crash> runq
> ...
> CPU 3 RUNQUEUE: ffff88840ece9980
>   CURRENT: PID: 32     TASK: ffff888100a56000  COMMAND: "rcuc/3"
>   RT PRIO_ARRAY: ffff88840ece9bc0
>      [ 94] PID: 32     TASK: ffff888100a56000  COMMAND: "rcuc/3"
>   CFS RB_ROOT: ffff88840ece9a40
>      [120] PID: 33     TASK: ffff888100a51000  COMMAND: "ksoftirqd/3"
> ...
> crash> bt -sx 33
> PID: 33     TASK: ffff888100a51000  CPU: 3   COMMAND: "ksoftirqd/3"
>  #0 [ffff888100aabdf0] __schedule+0x2d7 at ffffffff817ad3a7
>  #1 [ffff888100aabec8] schedule+0x3b at ffffffff817ae4eb
>  #2 [ffff888100aabee0] smpboot_thread_fn+0x18c at ffffffff8109e49c
>  #3 [ffff888100aabf18] kthread+0x1ac at ffffffff810984dc
>  #4 [ffff888100aabf50] ret_from_fork+0x1f at ffffffff81001b1f
> crash>

So this doesn't look like our put_cpu_partial() preempting a __slab_alloc() on
the same cpu, right? There might have been a __slab_alloc() in an irq handler
preempting us, but we can't see that in the backtrace anymore. I don't
immediately see the root cause, and this scenario should be possible on !RT
too, where we however haven't seen these explosions.
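
For anyone trying to picture the interleaving we're worried about, here is a
minimal userspace model (emphatically not the SLUB code itself; pcp_partial,
push_partial() and detach_partials() are made-up names for illustration): one
side pushes onto a per-cpu partial list with a cmpxchg loop, the other
detaches the whole list in two non-atomic steps. If the detaching side can be
preempted between its load and its store, a push that lands in between is
silently thrown away and the chain the walker follows no longer matches what
is published:

/*
 * Illustrative userspace model only -- not the kernel's SLUB code.
 * pcp_partial, push_partial() and detach_partials() are invented names.
 */
#include <stdatomic.h>
#include <stddef.h>
#include <stdio.h>

struct slab {
	struct slab *next;
	int id;
};

/* stand-in for this CPU's s->cpu_slab->partial head */
static _Atomic(struct slab *) pcp_partial;

/* put_cpu_partial()-like side: cmpxchg loop, retries if the head moved */
static void push_partial(struct slab *s)
{
	struct slab *old;

	do {
		old = atomic_load(&pcp_partial);
		s->next = old;
	} while (!atomic_compare_exchange_weak(&pcp_partial, &old, s));
}

/*
 * unfreeze_partials()-like side as it would look without sufficient local
 * exclusion: read the head, then clear it in a second step.  A push that
 * preempts us between the load and the store is silently discarded, and
 * the list we walk no longer matches what is published.
 */
static struct slab *detach_partials(void)
{
	struct slab *list = atomic_load(&pcp_partial);

	/* <-- preemption point: a concurrent push_partial() lands here */
	atomic_store(&pcp_partial, NULL);
	return list;
}

int main(void)
{
	static struct slab a = { .id = 1 }, b = { .id = 2 };

	push_partial(&a);
	push_partial(&b);
	for (struct slab *s = detach_partials(); s; s = s->next)
		printf("slab %d\n", s->id);
	return 0;
}

Whether the real code actually leaves such a window on RT (and why !RT
wouldn't hit it) is exactly the open question above.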

BTW did my ugly patch work?

Thanks.
