From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI, SPF_HELO_NONE,SPF_PASS autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 15F7CC433E6 for ; Tue, 26 Jan 2021 23:41:56 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id E357020665 for ; Tue, 26 Jan 2021 23:41:55 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729054AbhAZX13 (ORCPT ); Tue, 26 Jan 2021 18:27:29 -0500 Received: from mx2.suse.de ([195.135.220.15]:58950 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2389209AbhAZQ7e (ORCPT ); Tue, 26 Jan 2021 11:59:34 -0500 X-Virus-Scanned: by amavisd-new at test-mx.suse.de DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1611680326; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=Ywldab530I7sgs7wwDL1d+8MVhSu4IVOVcidJWTtdko=; b=siaxoypXU2slFQCpKtPoDRwzaC7z5xzvsqJmtEbYQYhEXOiM/qY0aTuVzs5nhxphT/6rxD mgGB6Nt2Ztwk0W/4GEBW89Xv/v6sR9yZI2KLQHEAg4hcUW51PG5798ipgi0I1w+mefFsOg 7XgyQz7S++NQ9BWiKFkMC2YWp8rNKh8= Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id DA67AAE55; Tue, 26 Jan 2021 16:58:45 +0000 (UTC) Date: Tue, 26 Jan 2021 17:58:45 +0100 From: Petr Mladek To: Sergey Senozhatsky Cc: Qais Yousef , Thomas Gleixner , Steven Rostedt , John Ogness , "Peter Zijlstra (Intel)" , linux-kernel@vger.kernel.org Subject: Re: [PATCH 0/2] Fix BUG: Invalid wait context in hrtimer_interrupt() Message-ID: References: <20210123233741.3614408-1-qais.yousef@arm.com> <20210126145917.r64vzibgpiwyuake@e107158-lin> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed 2021-01-27 00:58:33, Sergey Senozhatsky wrote: > On (21/01/26 14:59), Qais Yousef wrote: > > On 01/26/21 13:46, Sergey Senozhatsky wrote: > > > On (21/01/23 23:37), Qais Yousef wrote: > > [..] > > > I reported it here: https://lore.kernel.org/lkml/20210114124512.mg3vexudfmoadef5@e107158-lin/ > > > > # [67628.388606] hrtimer: interrupt took 304720 ns > > [67628.393546] > > [67628.393550] ============================= > > [67628.393554] [ BUG: Invalid wait context ] > > [67628.393557] 5.11.0-rc3-00019-g86be331946f7 #37 Not tainted > > [67628.393560] ----------------------------- > > [67628.393563] sugov:0/192 is trying to lock: > > [67628.393566] ffff000800b1d898 (&port_lock_key){-.-.}-{3:3}, at: pl011_console_write+0x138/0x218 > > [67628.393581] other info that might help us debug this: > > [67628.393584] context-{2:2} > > [67628.393586] 4 locks held by sugov:0/192: > > [67628.393589] #0: ffff0008059cb720 (&sg_policy->work_lock){+.+.}-{4:4}, at: sugov_work+0x58/0x88 > > [67628.393603] #1: ffff800015446f20 (prepare_lock){+.+.}-{4:4}, at: clk_prepare_lock+0x34/0xb0 > > [67628.393618] #2: ffff8000152aaa60 (console_lock){+.+.}-{0:0}, at: vprintk_emit+0x12c/0x310 > > [67628.393632] #3: ffff8000152aab88 (console_owner){-.-.}-{0:0}, at: console_unlock+0x190/0x6d8 > > [67628.393646] stack backtrace: > > [67628.393649] CPU: 0 PID: 192 Comm: sugov:0 Not tainted 5.11.0-rc3-00019-g86be331946f7 #37 > > [67628.393653] Hardware name: ARM Juno development board (r2) (DT) > > [67628.393656] Call trace: > > [67628.393659] dump_backtrace+0x0/0x1b0 > > [67628.393661] show_stack+0x20/0x70 > > [67628.393664] dump_stack+0xf8/0x168 > > [67628.393666] __lock_acquire+0x964/0x1c88 > > [67628.393669] lock_acquire+0x26c/0x500 > > [67628.393671] _raw_spin_lock+0x64/0x88 > > [67628.393674] pl011_console_write+0x138/0x218 > > [67628.393677] console_unlock+0x2c4/0x6d8 > > [67628.393680] vprintk_emit+0x134/0x310 > > [67628.393682] vprintk_default+0x40/0x50 > > [67628.393685] vprintk_func+0xfc/0x2b0 > > [67628.393687] printk+0x68/0x90 > > [67628.393690] hrtimer_interrupt+0x24c/0x250 > > [67628.393693] arch_timer_handler_phys+0x3c/0x50 > > [67628.393695] handle_percpu_devid_irq+0xd8/0x460 > > [67628.393698] generic_handle_irq+0x38/0x50 > > [67628.393701] __handle_domain_irq+0x6c/0xc8 > > [67628.393704] gic_handle_irq+0xb0/0xf0 > > [67628.393706] el1_irq+0xb4/0x180 > > [67628.393709] _raw_spin_unlock_irqrestore+0x58/0xa8 > > [67628.393712] hrtimer_start_range_ns+0x1b0/0x420 > > [67628.393715] msg_submit+0x100/0x108 > > [67628.393717] mbox_send_message+0x84/0x128 > > [67628.393720] scpi_send_message+0x11c/0x2a8 > > [67628.393723] scpi_dvfs_set_idx+0x48/0x70 > > [67628.393726] scpi_dvfs_set_rate+0x60/0x78 > > [67628.393728] clk_change_rate+0x184/0x8a8 > > [67628.393731] clk_core_set_rate_nolock+0x1c0/0x280 > > [67628.393734] clk_set_rate+0x40/0x98 > > [67628.393736] scpi_cpufreq_set_target+0x40/0x68 > > [67628.393739] __cpufreq_driver_target+0x1a0/0x5d0 > > Thanks. > > > > Why is it invalid? Is this... -rt kernel, perhaps? > > > > No I was running on Linus master at the time. > > > > AFAIU printk will hold the console lock which could sleep in interrupt context. > > > > Did I miss something? > > printk() is not permitted to sleep/schedule/etc and it never does. > Generally it should be OK to call it from IRQ (module recursion paths). To make it clear. printk() takes the mutex via console_trylock(). It does not sleep when the lock is not available. But lockdep should be aware of this. Honestly, I am not sure what the lockdep complains about here. IMHO, the key is that we try to take lock (&port_lock_key){-.-.}-{3:3} in context context-{2:2} where {-.-.} looks fine. Both dots and dashes looks safe, see get_usage_char() So, it is likely the {3:3} vs. {2:2}. I have never seen this. It has been added by the commit de8f5e4f2dc1f032 ("lockdep: Introduce wait-type checks"). Might it be that we take normal mutex under a raw one? It would be safe in normal kernel. But it would be bad in RT one. But lockep should complain about it even in the normal kernel. Best Regards, Petr