Date: Tue, 08 Dec 2020 12:12:52 -0700
From: stranche@codeaurora.org
To: Wei Wang
Cc: Eric Dumazet, David Ahern, Martin KaFai Lau, Mahesh Bandewar,
    Jakub Kicinski, Linux Kernel Network Developers,
    Subash Abhinov Kasiviswanathan
Subject: Re: Refcount mismatch when unregistering netdevice from kernel
References: <56e72b72-685f-925d-db2d-d245c1557987@gmail.com>
Message-ID: <307c2de1a2ddbdcd0a346c57da88b394@codeaurora.org>
X-Mailing-List: netdev@vger.kernel.org

Hi Wei and Eric,

Thanks for the replies. This was reported to us on the 5.4.61 kernel
during a customer regression suite, so we don't have an exact
reproducer unfortunately. From the trace logs we've added, it seems like
this is happening during IPv6 transport mode XFRM data transfer when the
device is unregistered in the middle of it, but we've been unable to
reproduce it ourselves. We're open to trying out and sharing debug
patches if needed, though.

> rt6_uncached_list_flush_dev() actually tries to replace the inet6_dev
> with loopback_dev, and release the reference to the previous inet6_dev
> by calling in6_dev_put(), which is actually doing the same thing as
> ip6_dst_ifdown(). I don't understand why you say " a reference to the
> inet6_dev is simply dropped".

Fair. I was going off the semantics used by the dst_dev_put() function,
which calls dst_ops->ifdown() explicitly.
At least in the case of xfrm6_dst_ifdown(), this swap of the loopback
device and putting the refcount seems like it could be missing a few
things.

> The additional refcount to the DST is also released by doing the
> following:
>     if (rt_dev == dev) {
>         rt->dst.dev = blackhole_netdev;
>         dev_hold(rt->dst.dev);
>         dev_put(rt_dev);
>     }
> Am I missing something?

That dev_put() is on the actual netdevice struct, not the inet6_dev
associated with it. We're seeing many calls to icmp6_dst_alloc() and
xfrm6_fill_dst() here, both of which seem to associate a reference to
the inet6_dev struct with the DST in addition to the standard
dev_hold() on the netdevice during the dst_alloc()/dst_init().

Thanks,
Sean
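
To make the distinction above concrete, here is a toy userspace model
of the two counters. It is only a sketch, not kernel code: every
toy_* name is hypothetical, the structs are not the real layouts, and
it assumes (as described above) that a DST takes one reference on the
netdevice and a separate one on the inet6_dev. It shows that swapping
only the netdevice, as in the quoted hunk, leaves the inet6_dev
reference in place until it is swapped and put as well.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for the kernel objects; NOT the real struct layouts. */
struct toy_netdev    { int refcnt; };
struct toy_inet6_dev { int refcnt; };
struct toy_dst {
	struct toy_netdev    *dev;   /* reference taken at attach time */
	struct toy_inet6_dev *idev;  /* second, independent reference */
};

static void toy_dev_hold(struct toy_netdev *d) { d->refcnt++; }
static void toy_dev_put(struct toy_netdev *d)
{
	assert(d->refcnt > 0);
	d->refcnt--;
}
static void toy_in6_dev_hold(struct toy_inet6_dev *i) { i->refcnt++; }
static void toy_in6_dev_put(struct toy_inet6_dev *i)
{
	assert(i->refcnt > 0);
	i->refcnt--;
}

/* Assumption: the DST pins both the netdevice and the inet6_dev. */
static void toy_dst_attach(struct toy_dst *dst, struct toy_netdev *dev,
			   struct toy_inet6_dev *idev)
{
	dst->dev = dev;
	toy_dev_hold(dev);
	dst->idev = idev;
	toy_in6_dev_hold(idev);
}

/* Mirrors the quoted hunk: only the netdevice reference is moved. */
static void toy_swap_netdev(struct toy_dst *dst, struct toy_netdev *old,
			    struct toy_netdev *blackhole)
{
	if (dst->dev == old) {
		dst->dev = blackhole;
		toy_dev_hold(dst->dev);
		toy_dev_put(old);
	}
}

/* The extra step: move the inet6_dev reference to a loopback idev. */
static void toy_swap_idev(struct toy_dst *dst, struct toy_inet6_dev *old,
			  struct toy_inet6_dev *lo_idev)
{
	if (dst->idev == old) {
		dst->idev = lo_idev;
		toy_in6_dev_hold(lo_idev);
		toy_in6_dev_put(old);
	}
}

/* inet6_dev references left after swapping only the netdevice. */
int toy_idev_refs_after_netdev_swap(void)
{
	struct toy_netdev dev = { 0 }, blackhole = { 0 };
	struct toy_inet6_dev idev = { 0 };
	struct toy_dst dst = { NULL, NULL };

	toy_dst_attach(&dst, &dev, &idev);
	toy_swap_netdev(&dst, &dev, &blackhole);
	assert(dev.refcnt == 0);   /* netdevice reference is gone */
	return idev.refcnt;        /* inet6_dev is still pinned */
}

/* inet6_dev references left after swapping both objects. */
int toy_idev_refs_after_both_swaps(void)
{
	struct toy_netdev dev = { 0 }, blackhole = { 0 };
	struct toy_inet6_dev idev = { 0 }, lo_idev = { 0 };
	struct toy_dst dst = { NULL, NULL };

	toy_dst_attach(&dst, &dev, &idev);
	toy_swap_netdev(&dst, &dev, &blackhole);
	toy_swap_idev(&dst, &idev, &lo_idev);
	return idev.refcnt;
}
```

In this model the first scenario leaves one inet6_dev reference
outstanding and the second leaves none. Whether any real ifdown path
actually skips the second step is exactly the open question in this
thread; the model does not answer it.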