From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-0.6 required=3.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 91610C33C9E for ; Wed, 8 Jan 2020 11:43:14 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 646212077B for ; Wed, 8 Jan 2020 11:43:14 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="pEMTfcU5" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728054AbgAHLnN (ORCPT ); Wed, 8 Jan 2020 06:43:13 -0500 Received: from mail-lf1-f54.google.com ([209.85.167.54]:33806 "EHLO mail-lf1-f54.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727613AbgAHLnM (ORCPT ); Wed, 8 Jan 2020 06:43:12 -0500 Received: by mail-lf1-f54.google.com with SMTP id l18so2231002lfc.1 for ; Wed, 08 Jan 2020 03:43:10 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=XXfm8PhlopV3AkYcnPLvA6eEp627Iq7NjTcekZwIaDs=; b=pEMTfcU5jp6EcnRQeGp56SiT0YitreUnOzIMluGLUh7ZzTLxk4KM2YzQu30El1tDjT cANr108PeEjNI9cPiJun8UdDHxUdKjt1AzsSUke/H/HlsbbF8qE/RP+WH+YjA3uAsqD7 3wIxwx4v6YG8t3SUuf6+Zz63QkAdB1Kh2k9DiP0WqAw2f4pQ1CO9EtnwoazCQ3RoZDQJ rgbRM8+pXWP+zOuJkWO0mTFzwAC4uBzxAELLyk1fUGp8hlYltN3zzwJ/5UL5RsyQ4edf FStamg9/r5wFZk0aE7kqXqWtBbWPDPK4Gt4gsMEiC6WetrjAYCvwFSxDv9M3aPhXpmEn M2ww== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=XXfm8PhlopV3AkYcnPLvA6eEp627Iq7NjTcekZwIaDs=; b=CdvwkEIDcK30D3KpY4QKY7aDJK3tDZSXXtGRVuIWQieyEImlkjZ57m4ZpLe7+Ol1pL rAoyXKjMR2r9ZEBo00XMTANvZkBbxae3veiqSMsNutmjjHGaZS6K3h1r4KbfBwPrGFuw sFlFa+RtS6H+GH02yF5uoPq/y8XKi6pzvtzu4SyAmpCnSFr0X7wehcBkeVQQwLQNXdi6 0t3zhfWiEODxvpfvvGQ/mA4ttfxfmFghR2W5iFFTigU15mVOlloOTIV5B9Vb8jHZnv63 UNTEf7Bep16Fk/H4l+zSUjG14u1wI5SRYLiew+jhU6X5kL505AKY8blVrn0remgP3V4d eqQw== X-Gm-Message-State: APjAAAVyWQGZniWZUwnO7lHu/DaAwcLvJjmiCpZyVDOaowGy7MK1Y6Nh DokRUuX3V+DhjDikCqQIq2cWzcMlMINKq/t2FsU= X-Google-Smtp-Source: APXvYqxDvWEOeRdrQd+xwgqJSgQj3h7KdTU03cafJp8sEyPCdInreZDznbejGd7Mmewwgr3QjWizoP+F9kSuED6oBSo= X-Received: by 2002:ac2:430d:: with SMTP id l13mr2807474lfh.112.1578483789443; Wed, 08 Jan 2020 03:43:09 -0800 (PST) MIME-Version: 1.0 References: <000000000000ab3f800598cec624@google.com> <000000000000802598059b6c7989@google.com> In-Reply-To: From: Taehee Yoo Date: Wed, 8 Jan 2020 20:42:58 +0900 Message-ID: Subject: Re: WARNING: bad unlock balance in sch_direct_xmit To: Cong Wang Cc: syzbot , Linux Kernel Network Developers Content-Type: text/plain; charset="UTF-8" Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org On Wed, 8 Jan 2020 at 09:34, Cong Wang wrote: > > On Tue, Jan 7, 2020 at 3:31 AM Taehee Yoo wrote: > > After "ip link set team0 master team1", the "team1 -> team0" locking path > > will be recorded in lockdep key of both team1 and team0. > > Then, if "ip link set team1 master team0" is executed, "team0 -> team1" > > locking path also will be recorded in lockdep key. At this moment, > > lockdep will catch possible deadlock situation and it prints the above > > warning message. But, both "team0 -> team1" and "team1 -> team0" > > will not be existing concurrently. so the above message is actually wrong. > > In order to avoid this message, a recorded locking path should be > > removed. So, both lockdep_unregister_key() and lockdep_register_key() > > are needed. > > > > So, after you move the key down to each netdevice, they are now treated > as different locks. Is this stacked device scenario the reason why you > move it to per-netdevice? If so, I wonder why not just use nested locks? > Like: > > netif_addr_nested_lock(upper, 0); > netif_addr_nested_lock(lower, 1); > netif_addr_nested_unlock(lower); > netif_addr_nested_unlock(upper); > > For this case, they could still share a same key. > > Thanks for the details! Yes, the reason for using dynamic lockdep key is to avoid lockdep warning in stacked device scenario. But, the addr_list_lock case is a little bit different. There was a bug in netif_addr_lock_nested() that "dev->netdev_ops->ndo_get_lock_subclass" isn't updated after "master" and "nomaster" command. So, the wrong subclass is used, so lockdep warning message was printed. There were some ways to fix this problem, using dynamic key is just one of them. I think using the correct subclass in netif_addr_lock_nested() is also a correct way to fix that problem. Another minor reason was that the subclass is limited by 8. but dynamic key has no limitation. Unfortunately, dynamic key has a problem too. lockdep limits the maximum number of lockdep keys. $cat /proc/lockdep_stats lock-classes: 1228 [max: 8192] So, If so many network interfaces are created, they use so many lockdep keys. If so, lockdep will stop. This is the cause of "BUG: MAX_LOCKDEP_KEYS too low!". Thank you! Taehee Yoo