From: Andrey Konovalov <andreyknvl@google.com>
To: Qian Cai <cai@lca.pw>
Cc: Catalin Marinas <catalin.marinas@arm.com>,
Will Deacon <will.deacon@arm.com>,
kasan-dev <kasan-dev@googlegroups.com>,
Linux ARM <linux-arm-kernel@lists.infradead.org>,
Andrey Ryabinin <aryabinin@virtuozzo.com>,
Dmitry Vyukov <dvyukov@google.com>
Subject: Re: soft lockups with SLAB_CONSISTENCY_CHECKS + KASAN_SW_TAGS (was: livelock with KASAN_SW_TAGS)
Date: Tue, 19 Feb 2019 19:56:57 +0100
Message-ID: <CAAeHK+yUQ0kZUspiFazjkFu7CRdaL_DZijUXD1po45gnGZkV3w@mail.gmail.com>
In-Reply-To: <1550601754.6911.41.camel@lca.pw>
On Tue, Feb 19, 2019 at 7:42 PM Qian Cai <cai@lca.pw> wrote:
>
> On Tue, 2019-02-19 at 18:56 +0100, Andrey Konovalov wrote:
> > > > Once the machine is restricted to 16 CPUs (nr_cpus=16), although it still
> > > > triggers soft lockups and msgstress03 seems to run forever, the machine is
> > > > still responsive and it is possible to log in via ssh. Hence, it is possible
> > > > to capture a task dump (echo t >/proc/sysrq-trigger) while this is happening.
> > > >
> > > > https://git.sr.ht/~cai/linux-debug/tree/master/console
> > > >
> > > > Some traces look strange; they appear to be running free_debug_processing()
> > > > in a loop,
> > > >
> > > > [ 1986.002139] Call trace:
> > > > [ 1986.002145] _raw_spin_unlock_irqrestore+0x44/0xac
> > > > [ 1986.002152] free_debug_processing+0x2f4/0x3e4
> > > > [ 1986.002157] kmem_cache_free+0x44c/0x870
> > > > [ 1986.002163] free_object_rcu+0x200/0x228
> > > > [ 1986.002169] rcu_process_callbacks+0xb00/0x12c0
> > > > [ 1986.002175] __do_softirq+0x644/0xfd0
> > > > [ 1986.002181] irq_exit+0x29c/0x370
> > > > [ 1986.002187] __handle_domain_irq+0xe0/0x1c4
> > > > [ 1986.002192] gic_handle_irq+0x1c4/0x3b0
> > > > [ 1986.002197] el1_irq+0xb0/0x140
> > > > [ 1986.002203] lock_release+0x660/0x7dc
> > > > [ 1986.002209] rcu_lock_release+0x20/0x28
> > > > [ 1986.002214] do_msgrcv+0x708/0xed0
> > > > [ 1986.002219] ksys_msgrcv+0x4c/0x60
> > > > [ 1986.002224] __arm64_sys_msgrcv+0xb8/0x194
> > > > [ 1986.002230] el0_svc_handler+0x230/0x3bc
> > > > [ 1986.002236] el0_svc+0x8/0xc
> > > > [ 1986.007106] OUTLINED_FUNCTION_169+0x4/0xc
> > > > [ 1986.011885] free_debug_processing+0x2f4/0x3e4
> > > > [ 1986.017186] load_msg+0x4c/0x324
> > > > [ 1986.021617] kmem_cache_free+0x44c/0x870
> > > > [ 1986.026917] ksys_msgsnd+0x1e0/0xe5c
> > > > [ 1988.050035] _raw_spin_unlock_irqrestore+0x44/0xac
> > > > [ 1988.054821] free_debug_processing+0x2f4/0x3e4
> > > > [ 1988.059260] kfree+0x3f8/0x7ac
> > > > [ 1988.062313] free_msg+0x50/0xb0
> > > > [ 1988.065450] do_msgrcv+0xd80/0xed0
> > > > [ 1988.068846] ksys_msgrcv+0x4c/0x60
> > > > [ 1988.072243] __arm64_sys_msgrcv+0xb8/0x194
> > > > [ 1988.076336] el0_svc_handler+0x230/0x3bc
> > > > [ 1988.080255] el0_svc+0x
> > >
> > > I'm hoping that Andrey can make sense of this, since he recently hacked up
> > > freelist_ptr(), although only if CONFIG_SLAB_FREELIST_HARDENED=y, which
> > > isn't the case in your .config.
> >
> > So far, I've been unable to trigger this in QEMU either.
> >
> > Qian, could you check if this still happens after adding that -pg flag
> > in KASAN Makefile?
>
> Yes, it still happens, although the reproducer (LTP msgstress0[3-4]) is making
> slow progress, so it is not strictly a livelock now. The situation gets worse if
> the system has more CPUs, probably because more CPUs are trying to acquire the
> spinlock in free_debug_processing() and then flood the console with soft
> lockups.
>
> One workaround is to add "KASAN_SANITIZE_string.o := n" to lib/Makefile, which
> stops KASAN instrumentation from being inserted into check_bytes8(); the
> reproducers then run smoothly without triggering any soft lockups.
>
> It looks like check_bytes8() is a big CPU consumer, especially with KASAN
> instrumentation added.
>
> 0000000000001a2c <check_bytes8>:
> 1a2c: d2c20008 mov x8, #0x100000000000
> 1a30: f2fdffe8 movk x8, #0xefff, lsl #48
> 1a34: 34000202 cbz w2, 1a74 <check_bytes8+0x48>
> 1a38: aa1e03e3 mov x3, x30
> 1a3c: 94000047 bl 1b58 <OUTLINED_FUNCTION_2>
> 1a40: aa0303fe mov x30, x3
> 1a44: 54000060 b.eq 1a50 <check_bytes8+0x24> // b.none
> 1a48: 7103fd3f cmp w9, #0xff
> 1a4c: 54000101 b.ne 1a6c <check_bytes8+0x40> // b.any
> 1a50: 39400009 ldrb w9, [x0]
> 1a54: 6b21013f cmp w9, w1, uxtb
> 1a58: 54000101 b.ne 1a78 <check_bytes8+0x4c> // b.any
> 1a5c: 91000400 add x0, x0, #0x1
> 1a60: 51000442 sub w2, w2, #0x1
> 1a64: 35fffea2 cbnz w2, 1a38 <check_bytes8+0xc>
> 1a68: 14000003 b 1a74 <check_bytes8+0x48>
> 1a6c: d4212400 brk #0x920
> 1a70: 17fffff8 b 1a50 <check_bytes8+0x24>
> 1a74: aa1f03e0 mov x0, xzr
> 1a78: d65f03c0 ret
>
> This function is called over and over again (with interrupts disabled),
>
> free_debug_processing [1]
> free_consistency_checks
> check_object
> memchr_inv [2]
> check_bytes8
>
> [1] iterate all objects in the slab.
> [2] while (words) { words--;
Ah, so it doesn't lock up, it's just very slow? memchr_inv() is the only
caller of check_bytes8(), so we could remove instrumentation from the
latter and add one KASAN range check to the former. But I'd say
this is the expected behavior: KASAN slows things down, and I don't
think it makes much sense to enable it together with other memory
debugging options.
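
To make the idea concrete, here is a minimal userspace sketch, not a
kernel patch. It assumes the per-byte cost can be modelled by a counter:
model_check_read() is a hypothetical stand-in for a KASAN access check,
and memchr_inv_model() is a simplified model that skips the word-at-a-time
path of the real memchr_inv(). All names below are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a KASAN access check; it only counts calls. */
static unsigned long nr_checks;

static void model_check_read(const void *p, size_t size)
{
	(void)p;
	(void)size;
	nr_checks++;
}

/* Models the current situation: every byte load is checked. */
static const uint8_t *check_bytes8_instrumented(const uint8_t *start,
						uint8_t value, size_t bytes)
{
	while (bytes) {
		model_check_read(start, 1);
		if (*start != value)
			return start;
		start++;
		bytes--;
	}
	return NULL;
}

/* Models the proposal: the byte loop itself carries no checks. */
static const uint8_t *check_bytes8_plain(const uint8_t *start,
					 uint8_t value, size_t bytes)
{
	while (bytes) {
		if (*start != value)
			return start;
		start++;
		bytes--;
	}
	return NULL;
}

/* One range check in the caller covers the whole region up front. */
static const void *memchr_inv_model(const void *start, int c, size_t bytes)
{
	model_check_read(start, bytes);
	return check_bytes8_plain(start, (uint8_t)c, bytes);
}

/* Scans a clean 0x5a-filled buffer and returns how many checks were made. */
static unsigned long count_checks(size_t bytes, int per_byte)
{
	static uint8_t buf[256];
	const void *bad;

	assert(bytes <= sizeof(buf));
	memset(buf, 0x5a, sizeof(buf));
	nr_checks = 0;
	bad = per_byte ? (const void *)check_bytes8_instrumented(buf, 0x5a, bytes)
		       : memchr_inv_model(buf, 0x5a, bytes);
	assert(bad == NULL);	/* the buffer holds no mismatching byte */
	return nr_checks;
}
```

In this model the per-byte variant performs one check per byte scanned,
while the range-check variant performs a single check regardless of
length. How much that would help on real hardware is not something this
sketch can show.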
>
> I also noticed that even a single "top" command is now consuming 30% - 40%
> of a CPU all the time. Sometimes it jumps to 80% or so.
>
> 5969 root 20 0 24512 10560 4736 R 83.8 0.0 3:25.79 top