From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-3.8 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 19F6EC43331 for ; Tue, 31 Mar 2020 13:25:36 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id D40E220772 for ; Tue, 31 Mar 2020 13:25:35 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730926AbgCaNZe convert rfc822-to-8bit (ORCPT ); Tue, 31 Mar 2020 09:25:34 -0400 Received: from mail-ot1-f65.google.com ([209.85.210.65]:37792 "EHLO mail-ot1-f65.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730442AbgCaNZe (ORCPT ); Tue, 31 Mar 2020 09:25:34 -0400 Received: by mail-ot1-f65.google.com with SMTP id g23so21972328otq.4 for ; Tue, 31 Mar 2020 06:25:33 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc:content-transfer-encoding; bh=XGyKuM7pLO0meZP23Om4xZ9WDxe8z5tqirUELAeA9HE=; b=L5iA+AU96X/IddS4TS1JX/5r7hJLggSsZvF8FTbxA3K/sFesyT7Yrilx8t9XnGfZxl 1dbKN2F0AdiJPjx3geqa2sdc0KC4uhPSUQEiXMl2BvbXWL0IDj6wl9ekIFDOabdp6MqP TOTsHIygsp4xYRwr6Dfhuxy0SfKkw8Ihd/M/7jVf+JUi4Csqy4n/cMkmkyktBkm54TYw aOX5HHT5cCu580V4XAFm1DP00/A6jLdUtsR5/aDvHVCvO0J4dQNEb3IDeIbM59uAP+Wn yOeRphgglK5+vISEeLR7pro0XeBwTU88tsMrSExItD/luEC/SFxvPQY1UF4MZ8jX6X2B so3Q== X-Gm-Message-State: ANhLgQ0xNh9SfqOMbU221M/2POJxN0OQXyPHhp078G9swT5WGW0O2GAZ KepuKDhADNp/oOxX55qVeZO8JZGLX7BbgeDVo48= X-Google-Smtp-Source: ADFU+vsv5dzMKVCHS+yQWZ+D38OtHEpnqweDoDNLY1ZJ7s1k7jb/zTnDPhJUwt3g/NBC8q9N8hSzvtYNSXQlojIDg5U= X-Received: by 2002:a05:6830:15c2:: with SMTP id j2mr12236793otr.107.1585661133086; Tue, 31 Mar 2020 06:25:33 -0700 (PDT) MIME-Version: 1.0 References: <20200313174701.148376-1-bigeasy@linutronix.de> <20200313174701.148376-7-bigeasy@linutronix.de> In-Reply-To: <20200313174701.148376-7-bigeasy@linutronix.de> From: Geert Uytterhoeven Date: Tue, 31 Mar 2020 15:25:21 +0200 Message-ID: Subject: Re: [PATCH 6/9] lockdep: Introduce wait-type checks To: Sebastian Andrzej Siewior , Peter Zijlstra Cc: Linux Kernel Mailing List , Ingo Molnar , Will Deacon , "Paul E . McKenney" , Joel Fernandes , Steven Rostedt , Linus Torvalds , Thomas Gleixner , Linux ARM , Russell King , Catalin Marinas Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 8BIT Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi all, On Fri, Mar 13, 2020 at 6:48 PM Sebastian Andrzej Siewior wrote: > From: Peter Zijlstra > > Extend lockdep to validate lock wait-type context. > > The current wait-types are: > > LD_WAIT_FREE, /* wait free, rcu etc.. */ > LD_WAIT_SPIN, /* spin loops, raw_spinlock_t etc.. */ > LD_WAIT_CONFIG, /* CONFIG_PREEMPT_LOCK, spinlock_t etc.. */ > LD_WAIT_SLEEP, /* sleeping locks, mutex_t etc.. */ > > Where lockdep validates that the current lock (the one being acquired) > fits in the current wait-context (as generated by the held stack). > > This ensures that there is no attempt to acquire mutexes while holding > spinlocks, to acquire spinlocks while holding raw_spinlocks and so on. In > other words, its a more fancy might_sleep(). > > Obviously RCU made the entire ordeal more complex than a simple single > value test because RCU can be acquired in (pretty much) any context and > while it presents a context to nested locks it is not the same as it > got acquired in. > > Therefore its necessary to split the wait_type into two values, one > representing the acquire (outer) and one representing the nested context > (inner). For most 'normal' locks these two are the same. > > [ To make static initialization easier we have the rule that: > .outer == INV means .outer == .inner; because INV == 0. ] > > It further means that its required to find the minimal .inner of the held > stack to compare against the outer of the new lock; because while 'normal' > RCU presents a CONFIG type to nested locks, if it is taken while already > holding a SPIN type it obviously doesn't relax the rules. > > Below is an example output generated by the trivial test code: > > raw_spin_lock(&foo); > spin_lock(&bar); > spin_unlock(&bar); > raw_spin_unlock(&foo); > > [ BUG: Invalid wait context ] > ----------------------------- > swapper/0/1 is trying to lock: > ffffc90000013f20 (&bar){....}-{3:3}, at: kernel_init+0xdb/0x187 > other info that might help us debug this: > 1 lock held by swapper/0/1: > #0: ffffc90000013ee0 (&foo){+.+.}-{2:2}, at: kernel_init+0xd1/0x187 > > The way to read it is to look at the new -{n,m} part in the lock > description; -{3:3} for the attempted lock, and try and match that up to > the held locks, which in this case is the one: -{2,2}. > > This tells that the acquiring lock requires a more relaxed environment than > presented by the lock stack. > > Currently only the normal locks and RCU are converted, the rest of the > lockdep users defaults to .inner = INV which is ignored. More conversions > can be done when desired. > > The check for spinlock_t nesting is not enabled by default. It's a separate > config option for now as there are known problems which are currently > addressed. The config option allows to identify these problems and to > verify that the solutions found are indeed solving them. > > The config switch will be removed and the checks will permanently enabled > once the vast majority of issues has been addressed. > > [ bigeasy: Move LD_WAIT_FREE,… out of CONFIG_LOCKDEP to avoid compile > failure with CONFIG_DEBUG_SPINLOCK + !CONFIG_LOCKDEP] > [ tglx: Add the config option ] > > Requested-by: Thomas Gleixner > Signed-off-by: Peter Zijlstra (Intel) > Signed-off-by: Sebastian Andrzej Siewior On arm64 (e.g. R-Car H3 ES2.0): +============================= +[ BUG: Invalid wait context ] +5.6.0-salvator-x-09423-gb29514ba13a9c459-dirty #679 Not tainted +----------------------------- +swapper/5/0 is trying to lock: +ffffff86ff76f398 (&pool->lock){..-.}-{3:3}, at: __queue_work+0x134/0x430 +other info that might help us debug this: +1 lock held by swapper/5/0: + #0: ffffffc01103a4a0 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire.constprop.59+0x0/0x38 +stack backtrace: +CPU: 5 PID: 0 Comm: swapper/5 Not tainted 5.6.0-salvator-x-09423-gb29514ba13a9c459-dirty #679 +Hardware name: Renesas Salvator-X 2nd version board based on r8a77951 (DT) +Call trace: + dump_backtrace+0x0/0x180 + show_stack+0x14/0x1c + dump_stack+0xdc/0x12c + __lock_acquire+0x37c/0xf9c + lock_acquire+0x258/0x288 + _raw_spin_lock+0x34/0x48 + __queue_work+0x134/0x430 + queue_work_on+0x48/0x8c + timers_update_nohz+0x24/0x2c + tick_nohz_activate.isra.15.part.16+0x5c/0x80 + tick_setup_sched_timer+0xe0/0xf0 + hrtimer_run_queues+0x88/0xf8 + run_local_timers+0x20/0x58 + update_process_times+0x24/0x50 + tick_periodic+0xd8/0xe8 + tick_handle_periodic+0x30/0x98 + arch_timer_handler_phys+0x28/0x3c + handle_percpu_devid_irq+0x64/0x110 + generic_handle_irq+0x20/0x34 + __handle_domain_irq+0x94/0x98 + gic_handle_irq+0x78/0xbc + el1_irq+0xf4/0x1c0 + arch_cpu_idle+0x38/0x54 + default_idle_call+0x2c/0x30 + do_idle+0x13c/0x244 + cpu_startup_entry+0x20/0x24 + secondary_start_kernel+0x1c4/0x1d8 On arm32 (e.g SH-Mobile AG5, using Cortex-A9 TWD): +============================= +[ BUG: Invalid wait context ] +5.6.0-kzm9g-09424-g2698719b4c4f35db-dirty #253 Not tainted +----------------------------- +swapper/0/0 is trying to lock: +dfbc5250 (&pool->lock){..-.}-{3:3}, at: __queue_work+0x14c/0x4d0 +other info that might help us debug this: +1 lock held by swapper/0/0: + #0: c0a17074 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire.constprop.33+0x0/0x38 +stack backtrace: +CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.6.0-kzm9g-09424-g2698719b4c4f35db-dirty #253 +Hardware name: Generic SH73A0 (Flattened Device Tree) +[] (unwind_backtrace) from [] (show_stack+0x10/0x14) +[] (show_stack) from [] (dump_stack+0xa8/0xe0) +[] (dump_stack) from [] (__lock_acquire+0x398/0x1568) +[] (__lock_acquire) from [] (lock_acquire+0x274/0x2ac) +[] (lock_acquire) from [] (_raw_spin_lock+0x40/0x50) +[] (_raw_spin_lock) from [] (__queue_work+0x14c/0x4d0) +[] (__queue_work) from [] (queue_work_on+0x48/0x6c) +[] (queue_work_on) from [] (tick_setup_sched_timer+0x148/0x188) +[] (tick_setup_sched_timer) from [] (hrtimer_run_queues+0x74/0x114) +[] (hrtimer_run_queues) from [] (run_local_timers+0x14/0x54) +[] (run_local_timers) from [] (update_process_times+0x24/0x54) +[] (update_process_times) from [] (tick_handle_periodic+0x28/0xa0) +[] (tick_handle_periodic) from [] (twd_handler+0x2c/0x38) +[] (twd_handler) from [] (handle_percpu_devid_irq+0x58/0xfc) +[] (handle_percpu_devid_irq) from [] (generic_handle_irq+0x28/0x38) +[] (generic_handle_irq) from [] (__handle_domain_irq+0x90/0xa0) +[] (__handle_domain_irq) from [] (gic_handle_irq+0x58/0x90) +[] (gic_handle_irq) from [] (__irq_svc+0x70/0x98) +Exception stack(0xc0a01f40 to 0xc0a01f88) +1f40: 00000001 00000006 c0a0bac0 00000000 00000001 ffffe000 c0a08ea8 c0a94532 +1f60: c0a08eec 00000001 c09278e0 00000000 ffffe000 c0a01f90 c0108448 c010844c +1f80: 60000013 ffffffff +[] (__irq_svc) from [] (arch_cpu_idle+0x20/0x3c) +[] (arch_cpu_idle) from [] (do_idle+0xe8/0x13c) +[] (do_idle) from [] (cpu_startup_entry+0x18/0x1c) +[] (cpu_startup_entry) from [] (start_kernel+0x3f0/0x49c) +[] (start_kernel) from [<00000000>] (0x0) I also see it on other arm32 platforms using Renesas-specific timers, but I'll ignore those until the issues with "standard" ARM timers have been resolved ;-) Thanks! Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds