From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE, SPF_PASS autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 90266C48BE5 for ; Mon, 21 Jun 2021 07:24:17 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 65B8961205 for ; Mon, 21 Jun 2021 07:24:17 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230161AbhFUH03 (ORCPT ); Mon, 21 Jun 2021 03:26:29 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46468 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230028AbhFUH02 (ORCPT ); Mon, 21 Jun 2021 03:26:28 -0400 Received: from mail-io1-xd2e.google.com (mail-io1-xd2e.google.com [IPv6:2607:f8b0:4864:20::d2e]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 38C58C061574 for ; Mon, 21 Jun 2021 00:24:13 -0700 (PDT) Received: by mail-io1-xd2e.google.com with SMTP id l64so14592774ioa.7 for ; Mon, 21 Jun 2021 00:24:13 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=5ud4Z0WiKV1T/EnA8dpybNxTYNaYVrDg9lP9IWWY5v4=; b=frUuGMM1IL7J4k7/ro+rNgmB5wKfDnlJYiO+4b1I0LM2NamN76dNRHQRpILFpVE0JR SL5tfsOKQtSxfePSEImoiyTvF0XO/IbM8UJSbDB+Uzs6d5Uj54DOdUd8x2OodforNPJ7 8fE2bOUIIX565aF+7reXZiJuMqf9QeS5c5OCp9O1ohHnDUl02m0dXv5w2PnZ24wVc0Na xzAIEDWWxXSFSJGZ8V9Y9D/6KnK8nA8jHkRPoACkwglPMQZZ3GCXZdG7mW4ZPeNFcsj8 9vjObV0fOFomVvRtIIyQGPYMoDKpskmTkDwvqlYELpIWYPwlLt7o8I1cGQ+k95NQR4Ai tMzA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=5ud4Z0WiKV1T/EnA8dpybNxTYNaYVrDg9lP9IWWY5v4=; b=KSKrKPo4i8QvAU41ynTOT/wcTLEBkS/qqmgtEJTqIjpsXpGE24S/eLNjsSTb59M5sS aAIT8FF+zd4jQPtCLO4sDsGqbnl6gem+aKryMkwLUlPQ7crQwSMdbkAku0v+hg9F/Ewc KL9ioSo+TErqF5h3TS3dLUueSkp77oGTVgl5enZVznkIa/My0NiEH6/s4LD+1B81t0ZB VHPtIqK7LBScFAOYvR53Yvn7kjw6tHbmKjr0bM0khMqqrlFzjQloj8VoIU+DC8zTOL3F 1RNFgfjD1vCr2j3j23CLOMzytlcS8bESajdneTDXXGt+ZcC+89J7momiJozkXdDROzxJ KYdw== X-Gm-Message-State: AOAM531v40GbqG+bwCsyfeKfh2X2cgIHHIf4IQlKJqo4H7KyxoNQKQng QaECN3RB3gIwswNcZqUCxo4= X-Google-Smtp-Source: ABdhPJxbQZP0vSnSUrgoYjHaCjWhXuEiljBUgHWRyTW9S2CZWzAo7+OJRlKqOHDnicInXFG72y83hg== X-Received: by 2002:a02:7348:: with SMTP id a8mr16182708jae.116.1624260252396; Mon, 21 Jun 2021 00:24:12 -0700 (PDT) Received: from auth1-smtp.messagingengine.com (auth1-smtp.messagingengine.com. [66.111.4.227]) by smtp.gmail.com with ESMTPSA id i12sm2087636ilk.22.2021.06.21.00.24.11 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 21 Jun 2021 00:24:11 -0700 (PDT) Received: from compute3.internal (compute3.nyi.internal [10.202.2.43]) by mailauth.nyi.internal (Postfix) with ESMTP id 2C3E127C0054; Mon, 21 Jun 2021 03:24:10 -0400 (EDT) Received: from mailfrontend1 ([10.202.2.162]) by compute3.internal (MEProxy); Mon, 21 Jun 2021 03:24:11 -0400 X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeduledrfeefkedgudduiecutefuodetggdotefrod ftvfcurfhrohhfihhlvgemucfhrghsthforghilhdpqfgfvfdpuffrtefokffrpgfnqfgh necuuegrihhlohhuthemuceftddtnecusecvtfgvtghiphhivghnthhsucdlqddutddtmd enucfjughrpeffhffvuffkfhggtggujgesthdtredttddtvdenucfhrhhomhepuehoqhhu nhcuhfgvnhhguceosghoqhhunhdrfhgvnhhgsehgmhgrihhlrdgtohhmqeenucggtffrrg htthgvrhhnpeegiefgudfhvdekuefgkefgiefhgfefteelhedtvedufedvtdelvdelheet leejhfenucffohhmrghinhepkhgvrhhnvghlrdhorhhgpdhqvghmuhdqphhrohhjvggtth drohhrghenucevlhhushhtvghrufhiiigvpedtnecurfgrrhgrmhepmhgrihhlfhhrohhm pegsohhquhhnodhmvghsmhhtphgruhhthhhpvghrshhonhgrlhhithihqdeiledvgeehtd eigedqudejjeekheehhedvqdgsohhquhhnrdhfvghngheppehgmhgrihhlrdgtohhmsehf ihigmhgvrdhnrghmvg X-ME-Proxy: Received: by mail.messagingengine.com (Postfix) with ESMTPA; Mon, 21 Jun 2021 03:24:09 -0400 (EDT) Date: Mon, 21 Jun 2021 15:23:35 +0800 From: Boqun Feng To: Johannes Berg Cc: Peter Zijlstra , Andrea Righi , linux-kernel@vger.kernel.org Subject: Re: lockdep issue with trig->leddev_list_lock again/still? Message-ID: References: <8a61ecda99843307018e3e71a5540682436443fc.camel@sipsolutions.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <8a61ecda99843307018e3e71a5540682436443fc.camel@sipsolutions.net> Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Jun 16, 2021 at 09:54:04PM +0200, Johannes Berg wrote: > Hi Boqun, all, > Hello, Sorry for the late response, please see below: > I've been reading the threads at > > https://lore.kernel.org/lkml/20201101092614.GB3989@xps-13-7390/ > > and > > https://lore.kernel.org/lkml/20201102104152.GG9930@xps-13-7390/ > > as well as the previous fix commit 27af8e2c90fb ("leds: trigger: fix > potential deadlock with libata"), but I cannot make sense of this. > > > I'm getting a report - on 5.10.27, which has that fix backported - that > says we have a chain with leddev_list_lock involved in it, but I don't > really understand it. > > FWIW, I have the following patch, that Peter gave me some time ago but > said you (Boqun) wrote, applied to the tree as well: > > diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c > index 48d736aa03b2..9de9ae8a4ae0 100644 > --- a/kernel/locking/lockdep.c > +++ b/kernel/locking/lockdep.c > @@ -2667,8 +2667,18 @@ static int check_irq_usage(struct task_struct *curr, struct held_lock *prev, > * Step 3: we found a bad match! Now retrieve a lock from the backward > * list whose usage mask matches the exclusive usage mask from the > * lock found on the forward list. > + * > + * Note, we should only keep the LOCKF_ENABLED_IRQ_ALL bits, considering > + * the follow case: > + * > + * When trying to add A -> B to the graph, we find that there is a > + * hardirq-safe L, that L -> ... -> A, and another hardirq-unsafe M, > + * that B -> ... -> M. However M is **softirq-safe**, if we use exact > + * invert bits of M's usage_mask, we will find another lock N that is > + * **softirq-unsafe** and N -> ... -> A, however N -> .. -> M will not > + * cause a inversion deadlock. > */ > - backward_mask = original_mask(target_entry1->class->usage_mask); > + backward_mask = original_mask(target_entry1->class->usage_mask & LOCKF_ENABLED_IRQ_ALL); > > ret = find_usage_backwards(&this, backward_mask, &target_entry); > if (bfs_error(ret)) { > > > Without the patch, the lockdep report makes no sense at all, so thanks > for that. :-) > > > With that patch, I'm getting the following report from our driver vs. > libata vs. LED triggers: > > [ 498.399434] ===================================================== > [ 498.399481] WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected > [ 498.399523] 5.10.27-dev-lockdep-fix2+ #1 Tainted: G OE > [ 498.399565] ----------------------------------------------------- > [ 498.399608] ip/364 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire: > [ 498.399650] ffff8880310fa438 (&iwl_txq_cmd_queue_lock_class){+.-.}-{2:2}, at: iwl_pcie_gen2_enqueue_hcmd+0x2c5/0x2100 [iwlwifi] > [ 498.399749] > [ 498.399749] and this task is already holding: > [ 498.399789] ffff88800558b5b8 (&trig->leddev_list_lock){.-.-}-{2:2}, at: led_trigger_event+0x34/0xd0 > [ 498.399845] which would create a new lock dependency: > [ 498.399884] (&trig->leddev_list_lock){.-.-}-{2:2} -> (&iwl_txq_cmd_queue_lock_class){+.-.}-{2:2} > [ 498.399938] > [ 498.399938] but this new dependency connects a HARDIRQ-irq-safe lock: > [ 498.399984] (&host->lock){-.-.}-{2:2} > [ 498.399992] > [ 498.399992] ... which became HARDIRQ-irq-safe at: > [ 498.400077] lock_acquire+0x1b8/0x720 > [ 498.400121] _raw_spin_lock_irqsave+0x4d/0x70 > [ 498.400165] ata_bmdma_interrupt+0x21/0x5a0 > [ 498.400208] __handle_irq_event_percpu+0x25e/0x630 > [ 498.400250] handle_irq_event+0x10e/0x290 > [ 498.400290] handle_edge_irq+0x1f7/0xbb0 > [ 498.400332] asm_call_irq_on_stack+0x12/0x20 > [ 498.400374] common_interrupt+0xfb/0x1a0 > [ 498.400415] asm_common_interrupt+0x1e/0x40 > [ 498.400456] _raw_spin_unlock_irq+0x2f/0x40 > [ 498.400497] ata_sff_flush_pio_task+0xa0/0x170 > [ 498.400537] ata_exec_internal_sg+0xbe0/0x1840 > [ 498.400577] ata_dev_read_id+0x8cd/0xcb0 > [ 498.400617] ata_dev_reread_id+0xef/0x1c0 > [ 498.400658] ata_dev_revalidate+0xcb/0x1f0 > [ 498.400698] ata_do_set_mode+0x126b/0x2410 > [ 498.400741] ata_set_mode+0x305/0x3e0 > [ 498.400786] ata_eh_recover+0x268c/0x4110 > [ 498.400828] ata_do_eh+0x7a/0xe0 > [ 498.400866] ata_sff_error_handler+0x263/0x360 > [ 498.400906] ata_bmdma_error_handler+0x295/0x580 > [ 498.400945] ata_scsi_port_error_handler+0x7c1/0x17b0 > [ 498.400986] ata_scsi_error+0x1f8/0x410 > [ 498.401027] scsi_error_handler+0x29b/0x14a0 > [ 498.401077] kthread+0x370/0x440 > [ 498.401119] ret_from_fork+0x22/0x30 > [ 498.401158] > [ 498.401158] to a HARDIRQ-irq-unsafe lock: > [ 498.401197] (&iwl_txq_cmd_queue_lock_class){+.-.}-{2:2} > [ 498.401204] > [ 498.401204] ... which became HARDIRQ-irq-unsafe at: > [ 498.401277] ... > [ 498.401282] lock_acquire+0x1b8/0x720 > [ 498.401359] _raw_spin_lock_bh+0x38/0x50 > [ 498.401418] iwl_pcie_hcmd_complete+0x1c7/0x12b0 [iwlwifi] > [ 498.401481] iwl_pcie_rx_handle+0xe56/0x27e0 [iwlwifi] > [ 498.401543] iwl_pcie_napi_poll_msix+0xa3/0x240 [iwlwifi] > [ 498.401591] net_rx_action+0x3dd/0xea0 > [ 498.401628] __do_softirq+0x1bc/0x899 > [ 498.401668] asm_call_irq_on_stack+0x12/0x20 > [ 498.401711] do_softirq_own_stack+0x8b/0xb0 > [ 498.401756] do_softirq+0xab/0xd0 > [ 498.401797] __local_bh_enable_ip+0xf3/0x110 > [ 498.401854] iwl_pcie_irq_rx_msix_handler+0x26e/0x4e0 [iwlwifi] > [ 498.401901] irq_thread_fn+0x92/0x160 > [ 498.401940] irq_thread+0x244/0x4a0 > [ 498.401980] kthread+0x370/0x440 > [ 498.402022] ret_from_fork+0x22/0x30 > [ 498.402061] > [ 498.402061] other info that might help us debug this: > [ 498.402061] > [ 498.402101] Chain exists of: > [ 498.402101] &host->lock --> &trig->leddev_list_lock --> &iwl_txq_cmd_queue_lock_class > [ 498.402101] > [ 498.402153] Possible interrupt unsafe locking scenario: > [ 498.402153] > [ 498.402199] CPU0 CPU1 > [ 498.402239] ---- ---- > [ 498.402278] lock(&iwl_txq_cmd_queue_lock_class); > [ 498.402320] local_irq_disable(); > [ 498.402358] lock(&host->lock); > [ 498.402398] lock(&trig->leddev_list_lock); > [ 498.402437] > [ 498.402473] lock(&host->lock); > [ 498.402515] > [ 498.402515] *** DEADLOCK *** > [ 498.402515] > [ 498.402555] 3 locks held by ip/364: > [ 498.402594] #0: ffffffff9cf45f08 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x319/0x910 > [ 498.402654] #1: ffff8880055885e8 (&rdev->wiphy.mtx){+.+.}-{3:3}, at: ieee80211_open+0x165/0x230 [mac80211] > [ 498.402782] #2: ffff88800558b5b8 (&trig->leddev_list_lock){.-.-}-{2:2}, at: led_trigger_event+0x34/0xd0 > [ 498.402837] > [ 498.402837] the dependencies between HARDIRQ-irq-safe lock and the holding lock: > [ 498.402888] -> (&host->lock){-.-.}-{2:2} { > [ 498.402934] IN-HARDIRQ-W at: > [ 498.402977] lock_acquire+0x1b8/0x720 > [ 498.403020] _raw_spin_lock_irqsave+0x4d/0x70 > [ 498.403065] ata_bmdma_interrupt+0x21/0x5a0 > [ 498.403109] __handle_irq_event_percpu+0x25e/0x630 > [ 498.403162] handle_irq_event+0x10e/0x290 > [ 498.403208] handle_edge_irq+0x1f7/0xbb0 > [ 498.403248] asm_call_irq_on_stack+0x12/0x20 > [ 498.403290] common_interrupt+0xfb/0x1a0 > [ 498.403331] asm_common_interrupt+0x1e/0x40 > [ 498.403378] _raw_spin_unlock_irq+0x2f/0x40 > [ 498.403420] ata_sff_flush_pio_task+0xa0/0x170 > [ 498.403461] ata_exec_internal_sg+0xbe0/0x1840 > [ 498.403501] ata_dev_read_id+0x8cd/0xcb0 > [ 498.403541] ata_dev_reread_id+0xef/0x1c0 > [ 498.403582] ata_dev_revalidate+0xcb/0x1f0 > [ 498.403623] ata_do_set_mode+0x126b/0x2410 > [ 498.403664] ata_set_mode+0x305/0x3e0 > [ 498.403706] ata_eh_recover+0x268c/0x4110 > [ 498.403749] ata_do_eh+0x7a/0xe0 > [ 498.403790] ata_sff_error_handler+0x263/0x360 > [ 498.403831] ata_bmdma_error_handler+0x295/0x580 > [ 498.403870] ata_scsi_port_error_handler+0x7c1/0x17b0 > [ 498.403911] ata_scsi_error+0x1f8/0x410 > [ 498.403951] scsi_error_handler+0x29b/0x14a0 > [ 498.403991] kthread+0x370/0x440 > [ 498.404030] ret_from_fork+0x22/0x30 > [ 498.404070] IN-SOFTIRQ-W at: > [ 498.404110] lock_acquire+0x1b8/0x720 > [ 498.404150] _raw_spin_lock_irqsave+0x4d/0x70 > [ 498.404190] ata_bmdma_interrupt+0x21/0x5a0 > [ 498.404232] __handle_irq_event_percpu+0x25e/0x630 > [ 498.404272] handle_irq_event+0x10e/0x290 > [ 498.404312] handle_edge_irq+0x1f7/0xbb0 > [ 498.404363] common_interrupt+0xad/0x1a0 > [ 498.404414] asm_common_interrupt+0x1e/0x40 > [ 498.404455] __do_softirq+0x173/0x899 > [ 498.404495] asm_call_irq_on_stack+0x12/0x20 > [ 498.404539] do_softirq_own_stack+0x8b/0xb0 > [ 498.404578] irq_exit_rcu+0x12c/0x1c0 > [ 498.404619] sysvec_apic_timer_interrupt+0x4f/0xb0 > [ 498.404661] asm_sysvec_apic_timer_interrupt+0x12/0x20 > [ 498.404703] _raw_spin_unlock_irqrestore+0x31/0x50 > [ 498.404744] ata_scsi_queuecmd+0xd5/0x190 > [ 498.404786] scsi_queue_rq+0x12ff/0x2c50 > [ 498.404848] blk_mq_dispatch_rq_list+0x367/0x2050 > [ 498.404889] __blk_mq_do_dispatch_sched+0x35f/0xa10 > [ 498.404930] __blk_mq_sched_dispatch_requests+0x305/0x4b0 > [ 498.404970] blk_mq_sched_dispatch_requests+0xd8/0x130 > [ 498.405012] __blk_mq_run_hw_queue+0x116/0x2c0 > [ 498.405052] blk_mq_run_work_fn+0x48/0x60 > [ 498.405094] process_one_work+0x7cc/0x1410 > [ 498.405135] worker_thread+0x583/0xf70 > [ 498.405174] kthread+0x370/0x440 > [ 498.405215] ret_from_fork+0x22/0x30 > [ 498.405253] INITIAL USE at: > [ 498.405295] lock_acquire+0x1b8/0x720 > [ 498.405336] _raw_spin_lock_irqsave+0x4d/0x70 > [ 498.405376] ata_dev_init+0x19c/0x3d0 > [ 498.405416] ata_link_init+0x1a6/0x310 > [ 498.405462] ata_port_alloc+0x4d7/0x5c0 > [ 498.405502] ata_host_alloc+0x1d2/0x260 > [ 498.405542] ata_host_alloc_pinfo+0x21/0x580 > [ 498.405583] ata_pci_sff_prepare_host+0x41/0xc0 > [ 498.405623] ata_pci_bmdma_prepare_host+0x14/0x60 > [ 498.405661] piix_init_one+0x41b/0xbe0 > [ 498.405703] local_pci_probe+0xde/0x170 > [ 498.405743] pci_device_probe+0x351/0x6d0 > [ 498.405787] really_probe+0x222/0xcc0 > [ 498.405827] driver_probe_device+0x1fe/0x390 > [ 498.405868] device_driver_attach+0x1e9/0x250 > [ 498.405909] __driver_attach+0xfb/0x270 > [ 498.405949] bus_for_each_dev+0x123/0x1c0 > [ 498.405988] driver_attach+0x3d/0x50 > [ 498.406028] bus_add_driver+0x35c/0x580 > [ 498.406068] driver_register+0x219/0x380 > [ 498.406108] __pci_register_driver+0x185/0x240 > [ 498.406150] piix_init+0x1e/0x2e > [ 498.406190] do_one_initcall+0xcb/0x410 > [ 498.406232] kernel_init_freeable+0x627/0x6bc > [ 498.406273] kernel_init+0x12/0x127 > [ 498.406313] ret_from_fork+0x22/0x30 > [ 498.406351] } > [ 498.406393] ... key at: [] __key.6+0x0/0x40 > [ 498.406432] ... acquired at: > [ 498.406479] _raw_read_lock_irqsave+0x89/0x93 > [ 498.406522] led_trigger_blink_oneshot+0x3f/0xe0 > [ 498.406564] ledtrig_disk_activity+0x79/0xf0 > [ 498.406602] ata_qc_complete+0x69/0xd80 > [ 498.406641] ata_do_link_abort+0x1a4/0x360 > [ 498.406681] ata_port_freeze+0x69/0x80 > [ 498.406719] ata_hsm_qc_complete+0x1da/0x240 > [ 498.406758] ata_sff_hsm_move+0x39a/0x1e10 > [ 498.406797] ata_sff_pio_task+0x397/0x4a0 > [ 498.406837] process_one_work+0x7cc/0x1410 > [ 498.406877] worker_thread+0x583/0xf70 > [ 498.406917] kthread+0x370/0x440 > [ 498.406957] ret_from_fork+0x22/0x30 > [ 498.406992] > [ 498.407029] -> (&trig->leddev_list_lock){.-.-}-{2:2} { > [ 498.407075] IN-HARDIRQ-R at: > [ 498.407117] lock_acquire+0x1b8/0x720 > [ 498.407168] _raw_read_lock_irqsave+0x59/0x93 > [ 498.407212] led_trigger_blink_oneshot+0x3f/0xe0 > [ 498.407252] ledtrig_disk_activity+0x79/0xf0 > [ 498.407291] ata_qc_complete+0x69/0xd80 > [ 498.407331] ata_hsm_qc_complete+0x1c1/0x240 > [ 498.407371] ata_sff_hsm_move+0x39a/0x1e10 > [ 498.407411] __ata_sff_port_intr+0x1d2/0x3e0 > [ 498.407450] ata_bmdma_port_intr+0x96/0x330 > [ 498.409897] ata_bmdma_interrupt+0x1fd/0x5a0 > [ 498.409940] __handle_irq_event_percpu+0x25e/0x630 > [ 498.409980] handle_irq_event+0x10e/0x290 > [ 498.410018] handle_edge_irq+0x1f7/0xbb0 > [ 498.410057] asm_call_irq_on_stack+0x12/0x20 > [ 498.410098] common_interrupt+0xfb/0x1a0 > [ 498.410141] asm_common_interrupt+0x1e/0x40 > [ 498.410182] _raw_spin_unlock_irq+0x2f/0x40 > [ 498.410222] ata_sff_flush_pio_task+0xa0/0x170 > [ 498.410262] ata_exec_internal_sg+0xbe0/0x1840 > [ 498.410301] ata_dev_read_id+0x8cd/0xcb0 > [ 498.410340] ata_dev_reread_id+0xef/0x1c0 > [ 498.410381] ata_dev_revalidate+0xcb/0x1f0 > [ 498.410421] ata_do_set_mode+0x126b/0x2410 > [ 498.410461] ata_set_mode+0x305/0x3e0 > [ 498.410502] ata_eh_recover+0x268c/0x4110 > [ 498.410542] ata_do_eh+0x7a/0xe0 > [ 498.410580] ata_sff_error_handler+0x263/0x360 > [ 498.410618] ata_bmdma_error_handler+0x295/0x580 > [ 498.410658] ata_scsi_port_error_handler+0x7c1/0x17b0 > [ 498.410700] ata_scsi_error+0x1f8/0x410 > [ 498.410739] scsi_error_handler+0x29b/0x14a0 > [ 498.410779] kthread+0x370/0x440 > [ 498.410819] ret_from_fork+0x22/0x30 > [ 498.410858] IN-SOFTIRQ-R at: > [ 498.410909] lock_acquire+0x1b8/0x720 > [ 498.410948] _raw_read_lock_irqsave+0x59/0x93 > [ 498.410987] led_trigger_event+0x34/0xd0 > [ 498.411029] kbd_bh+0x199/0x220 > [ 498.411070] tasklet_action_common.constprop.0+0x260/0x300 > [ 498.411110] tasklet_action+0x22/0x30 > [ 498.411159] __do_softirq+0x1bc/0x899 > [ 498.411200] run_ksoftirqd+0x3f/0x70 > [ 498.411243] smpboot_thread_fn+0x356/0x810 > [ 498.411283] kthread+0x370/0x440 > [ 498.411321] ret_from_fork+0x22/0x30 > [ 498.411358] INITIAL USE at: > [ 498.411400] lock_acquire+0x1b8/0x720 > [ 498.411440] _raw_write_lock_irqsave+0x4d/0x70 > [ 498.411481] led_trigger_set+0x47a/0xd80 > [ 498.411522] led_trigger_set_default+0x16e/0x1f0 > [ 498.411562] led_classdev_register_ext+0x596/0x7e0 > [ 498.411603] input_leds_connect+0x35a/0x649 [input_leds] > [ 498.411645] input_attach_handler.isra.0+0x10b/0x160 > [ 498.411685] input_register_handler+0x139/0x1e0 > [ 498.411724] 0xffffffffc02d8015 > [ 498.411762] do_one_initcall+0xcb/0x410 > [ 498.411802] do_init_module+0x1d8/0x770 > [ 498.411842] load_module+0x9eec/0xa9d0 > [ 498.411881] __do_sys_finit_module+0x131/0x1c0 > [ 498.411928] __x64_sys_finit_module+0x73/0xb0 > [ 498.411968] do_syscall_64+0x38/0x90 > [ 498.412010] entry_SYSCALL_64_after_hwframe+0x44/0xa9 > [ 498.412048] INITIAL READ USE at: > [ 498.412088] lock_acquire+0x1b8/0x720 > [ 498.412129] _raw_read_lock_irqsave+0x89/0x93 > [ 498.412169] led_trigger_event+0x34/0xd0 > [ 498.412213] rfkill_global_led_trigger_worker+0xbe/0xf0 > [ 498.412253] process_one_work+0x7cc/0x1410 > [ 498.412293] worker_thread+0x583/0xf70 > [ 498.412333] kthread+0x370/0x440 > [ 498.412373] ret_from_fork+0x22/0x30 > [ 498.412410] } > [ 498.412452] ... key at: [] __key.0+0x0/0x40 > [ 498.412491] ... acquired at: > [ 498.412530] lock_acquire+0x1b8/0x720 > [ 498.412572] _raw_spin_lock_irqsave+0x4d/0x70 > [ 498.412631] iwl_pcie_gen2_enqueue_hcmd+0x2c5/0x2100 [iwlwifi] > [ 498.412692] iwl_trans_txq_send_hcmd+0x191/0xf20 [iwlwifi] > [ 498.412751] iwl_trans_send_cmd+0x1b4/0x410 [iwlwifi] > [ 498.412834] iwl_mvm_send_cmd+0xf1/0x1d0 [iwlmvm] > [ 498.412896] iwl_mvm_led_set+0x1df/0x270 [iwlmvm] > [ 498.412954] iwl_led_brightness_set+0x1f/0x30 [iwlmvm] > [ 498.413001] led_set_brightness_nosleep+0xd4/0x1a0 > [ 498.413041] led_set_brightness+0x108/0x150 > [ 498.413088] led_trigger_event+0x73/0xd0 > [ 498.413170] ieee80211_led_radio+0x6e/0x90 [mac80211] > [ 498.413253] ieee80211_do_open+0x896/0x1fa0 [mac80211] > [ 498.413333] ieee80211_open+0x176/0x230 [mac80211] > [ 498.413383] __dev_open+0x260/0x450 > [ 498.413423] __dev_change_flags+0x4c9/0x750 > [ 498.413464] dev_change_flags+0x87/0x160 > [ 498.413504] do_setlink+0x7a3/0x3160 > [ 498.413543] __rtnl_newlink+0xbdd/0x15a0 > [ 498.413588] rtnl_newlink+0x64/0xa0 > [ 498.413627] rtnetlink_rcv_msg+0x36a/0x910 > [ 498.413670] netlink_rcv_skb+0x140/0x3d0 > [ 498.413710] rtnetlink_rcv+0x15/0x20 > [ 498.413748] netlink_unicast+0x465/0x7a0 > [ 498.413788] netlink_sendmsg+0x7b4/0xcf0 > [ 498.413830] sock_sendmsg+0xec/0x120 > [ 498.413870] ____sys_sendmsg+0x696/0x9f0 > [ 498.413909] ___sys_sendmsg+0x101/0x190 > [ 498.413948] __sys_sendmsg+0xc8/0x160 > [ 498.413986] __x64_sys_sendmsg+0x78/0xb0 > [ 498.414026] do_syscall_64+0x38/0x90 > [ 498.414067] entry_SYSCALL_64_after_hwframe+0x44/0xa9 > [ 498.414105] > [ 498.414142] > [ 498.414142] the dependencies between the lock to be acquired > [ 498.414145] and HARDIRQ-irq-unsafe lock: > [ 498.414228] -> (&iwl_txq_cmd_queue_lock_class){+.-.}-{2:2} { > [ 498.414275] HARDIRQ-ON-W at: > [ 498.414316] lock_acquire+0x1b8/0x720 > [ 498.414357] _raw_spin_lock_bh+0x38/0x50 > [ 498.414414] iwl_pcie_hcmd_complete+0x1c7/0x12b0 [iwlwifi] > [ 498.414473] iwl_pcie_rx_handle+0xe56/0x27e0 [iwlwifi] > [ 498.414533] iwl_pcie_napi_poll_msix+0xa3/0x240 [iwlwifi] > [ 498.414579] net_rx_action+0x3dd/0xea0 > [ 498.414636] __do_softirq+0x1bc/0x899 > [ 498.414676] asm_call_irq_on_stack+0x12/0x20 > [ 498.414717] do_softirq_own_stack+0x8b/0xb0 > [ 498.414756] do_softirq+0xab/0xd0 > [ 498.414795] __local_bh_enable_ip+0xf3/0x110 > [ 498.414851] iwl_pcie_irq_rx_msix_handler+0x26e/0x4e0 [iwlwifi] > [ 498.414896] irq_thread_fn+0x92/0x160 > [ 498.414935] irq_thread+0x244/0x4a0 > [ 498.414974] kthread+0x370/0x440 > [ 498.415014] ret_from_fork+0x22/0x30 > [ 498.415051] IN-SOFTIRQ-W at: > [ 498.415091] lock_acquire+0x1b8/0x720 > [ 498.415131] _raw_spin_lock_bh+0x38/0x50 > [ 498.415195] iwl_pcie_hcmd_complete+0x1c7/0x12b0 [iwlwifi] > [ 498.415254] iwl_pcie_rx_handle+0xe56/0x27e0 [iwlwifi] > [ 498.415314] iwl_pcie_napi_poll_msix+0xa3/0x240 [iwlwifi] > [ 498.415360] net_rx_action+0x3dd/0xea0 > [ 498.415406] __do_softirq+0x1bc/0x899 > [ 498.415445] asm_call_irq_on_stack+0x12/0x20 > [ 498.415484] do_softirq_own_stack+0x8b/0xb0 > [ 498.415524] do_softirq+0xab/0xd0 > [ 498.415565] __local_bh_enable_ip+0xf3/0x110 > [ 498.415620] iwl_pcie_irq_rx_msix_handler+0x26e/0x4e0 [iwlwifi] > [ 498.415665] irq_thread_fn+0x92/0x160 > [ 498.415705] irq_thread+0x244/0x4a0 > [ 498.415744] kthread+0x370/0x440 > [ 498.415783] ret_from_fork+0x22/0x30 > [ 498.415820] INITIAL USE at: > [ 498.415860] lock_acquire+0x1b8/0x720 > [ 498.415899] _raw_spin_lock_irqsave+0x4d/0x70 > [ 498.415956] iwl_pcie_gen2_enqueue_hcmd+0x2c5/0x2100 [iwlwifi] > [ 498.416016] iwl_trans_txq_send_hcmd+0x3fc/0xf20 [iwlwifi] > [ 498.416074] iwl_trans_send_cmd+0x1b4/0x410 [iwlwifi] > [ 498.416133] iwl_dbg_tlv_send_hcmds.isra.0+0x18f/0x240 [iwlwifi] > [ 498.416192] _iwl_dbg_tlv_time_point+0x261/0x2ca0 [iwlwifi] > [ 498.416248] iwl_run_unified_mvm_ucode+0x1fe/0x760 [iwlmvm] > [ 498.416303] iwl_run_init_mvm_ucode+0x5ef/0xb80 [iwlmvm] > [ 498.416359] iwl_mvm_start_get_nvm+0x1ee/0x7d0 [iwlmvm] > [ 498.416413] iwl_op_mode_mvm_start+0x2082/0x3280 [iwlmvm] > [ 498.416470] _iwl_op_mode_start.isra.0+0xef/0x1f0 [iwlwifi] > [ 498.416528] iwl_opmode_register+0x13f/0x210 [iwlwifi] > [ 498.416572] 0xffffffffc055803f > [ 498.416617] do_one_initcall+0xcb/0x410 > [ 498.416658] do_init_module+0x1d8/0x770 > [ 498.416697] load_module+0x9eec/0xa9d0 > [ 498.416736] __do_sys_finit_module+0x131/0x1c0 > [ 498.416775] __x64_sys_finit_module+0x73/0xb0 > [ 498.416814] do_syscall_64+0x38/0x90 > [ 498.416854] entry_SYSCALL_64_after_hwframe+0x44/0xa9 > [ 498.416893] } > [ 498.416949] ... key at: [] iwl_txq_cmd_queue_lock_class.89285+0x0/0xfffffffffff99440 [iwlwifi] > [ 498.416993] ... acquired at: > [ 498.417030] lock_acquire+0x1b8/0x720 > [ 498.417069] _raw_spin_lock_irqsave+0x4d/0x70 > [ 498.417126] iwl_pcie_gen2_enqueue_hcmd+0x2c5/0x2100 [iwlwifi] > [ 498.417182] iwl_trans_txq_send_hcmd+0x191/0xf20 [iwlwifi] > [ 498.417240] iwl_trans_send_cmd+0x1b4/0x410 [iwlwifi] > [ 498.417298] iwl_mvm_send_cmd+0xf1/0x1d0 [iwlmvm] > [ 498.417356] iwl_mvm_led_set+0x1df/0x270 [iwlmvm] > [ 498.417412] iwl_led_brightness_set+0x1f/0x30 [iwlmvm] > [ 498.417457] led_set_brightness_nosleep+0xd4/0x1a0 > [ 498.417498] led_set_brightness+0x108/0x150 > [ 498.417537] led_trigger_event+0x73/0xd0 > [ 498.417615] ieee80211_led_radio+0x6e/0x90 [mac80211] > [ 498.417695] ieee80211_do_open+0x896/0x1fa0 [mac80211] > [ 498.417775] ieee80211_open+0x176/0x230 [mac80211] > [ 498.417823] __dev_open+0x260/0x450 > [ 498.417873] __dev_change_flags+0x4c9/0x750 > [ 498.417913] dev_change_flags+0x87/0x160 > [ 498.417950] do_setlink+0x7a3/0x3160 > [ 498.417988] __rtnl_newlink+0xbdd/0x15a0 > [ 498.418027] rtnl_newlink+0x64/0xa0 > [ 498.418066] rtnetlink_rcv_msg+0x36a/0x910 > [ 498.418105] netlink_rcv_skb+0x140/0x3d0 > [ 498.418143] rtnetlink_rcv+0x15/0x20 > [ 498.418191] netlink_unicast+0x465/0x7a0 > [ 498.418229] netlink_sendmsg+0x7b4/0xcf0 > [ 498.418267] sock_sendmsg+0xec/0x120 > [ 498.418307] ____sys_sendmsg+0x696/0x9f0 > [ 498.418345] ___sys_sendmsg+0x101/0x190 > [ 498.418383] __sys_sendmsg+0xc8/0x160 > [ 498.418421] __x64_sys_sendmsg+0x78/0xb0 > [ 498.418459] do_syscall_64+0x38/0x90 > [ 498.418499] entry_SYSCALL_64_after_hwframe+0x44/0xa9 > [ 498.418535] > [ 498.418572] > [ 498.418572] stack backtrace: > [ 498.418620] CPU: 3 PID: 364 Comm: ip Tainted: G OE 5.10.27-dev-lockdep-fix2+ #1 > [ 498.418666] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014 > [ 498.418709] Call Trace: > [ 498.418754] dump_stack+0xb3/0xed > [ 498.418798] check_irq_usage.cold+0x4e1/0x66d > [ 498.419100] __lock_acquire+0x2b0e/0x5460 > [ 498.419239] lock_acquire+0x1b8/0x720 > [ 498.419618] _raw_spin_lock_irqsave+0x4d/0x70 > [ 498.419737] iwl_pcie_gen2_enqueue_hcmd+0x2c5/0x2100 [iwlwifi] > [ 498.420081] iwl_trans_txq_send_hcmd+0x191/0xf20 [iwlwifi] > [ 498.420245] iwl_trans_send_cmd+0x1b4/0x410 [iwlwifi] > [ 498.420305] iwl_mvm_send_cmd+0xf1/0x1d0 [iwlmvm] > [ 498.420365] iwl_mvm_led_set+0x1df/0x270 [iwlmvm] > [ 498.423581] iwl_led_brightness_set+0x1f/0x30 [iwlmvm] > [ 498.423627] led_set_brightness_nosleep+0xd4/0x1a0 > [ 498.423668] led_set_brightness+0x108/0x150 > [ 498.423709] led_trigger_event+0x73/0xd0 > [ 498.423789] ieee80211_led_radio+0x6e/0x90 [mac80211] > [ 498.423869] ieee80211_do_open+0x896/0x1fa0 [mac80211] > [ 498.424028] ieee80211_open+0x176/0x230 [mac80211] > [ 498.424076] __dev_open+0x260/0x450 > [ 498.424249] __dev_change_flags+0x4c9/0x750 > [ 498.424339] dev_change_flags+0x87/0x160 > [ 498.424382] do_setlink+0x7a3/0x3160 > [ 498.425060] __rtnl_newlink+0xbdd/0x15a0 > [ 498.425697] rtnl_newlink+0x64/0xa0 > [ 498.425739] rtnetlink_rcv_msg+0x36a/0x910 > [ 498.425938] netlink_rcv_skb+0x140/0x3d0 > [ 498.426137] rtnetlink_rcv+0x15/0x20 > [ 498.426177] netlink_unicast+0x465/0x7a0 > [ 498.426384] netlink_sendmsg+0x7b4/0xcf0 > [ 498.426556] sock_sendmsg+0xec/0x120 > [ 498.426597] ____sys_sendmsg+0x696/0x9f0 > > > Note that the iwlwifi module was a newer version, not necessarily > matching the 5.10(.27) version, especially wrt. the locking of the txq- > >lock (which is represented here by iwl_txq_cmd_queue_lock_class due to > us separating the lock classes for different types of queues). > > > Clearly, the scenario lockdep outputs isn't really an issue at all, > since that will just continue to make progress on CPU1, finish, and then > the interrupt on CPU0 can make progress. > > I tried thinking about adding write_lock() to the scenario as you > (Boqun) did in the thread I linked to above, but even then I don't see > how it could deadlock since the write_lock() for leddev_list_lock is > always write_lock_irqsave(), so it can't get stuck on any of the other > locks involved here due to interrupts, I think? > > > All of the bits here that lockdep is reporting seem correct, but I can't > figure out how this chain matters? > > &host->lock --> &trig->leddev_list_lock --> &iwl_txq_cmd_queue_lock_class > > Yes, obviously we take leddev_list_lock under host->lock, and we also > take the iwl lock under leddev_list_lock. No argument there. But the > argument seems to be that we can have the iwl lock taken without > disabling interrupts (which we do), and that's a different code path > unrelated to the leddev_list_lock? > > Basically, I don't think I understand how a read-lock in the middle can > cause this chain to be a problem? > The deadlock senario lockdep reported is as follow: CPU 0 CPU 1 CPU 2 CPU 3 ===== ==== ===== ===== spin_lock(&txq->lock); spin_lock(&host->lock); read_lock(&trig->leddev_list_lock); write_lock_irqsave(&trig->leddev_list_lock); spin_lock(&txq->lock); /* The following blocked because of fairness */ read_lock(&trig->leddev_list_lock); spin_lock(&host->lock); The key to understand this deadlock is that via a write_lock() (no matter whether irq is disabled or not), a read_lock() can block another read_lock() because of the fairness: in this case, CPU 2 is the reader of ->leddev_list_lock, and when CPU 3 comes and tries to acquire the writer of ->leddev_list_lock, it gets blocked. And no one would acquire ->leddev_list_lock earlier than CPU 3 (*fairness*, even a reader cannot steal the lock), so CPU 1 will get blocked on its read_lock(). Therefore, this is a 4-CPU involved deadlock. > Can anyone help me decode this? Does the above help? Regards, Boqun > > Thanks, > johannes >