Date: Fri, 15 Mar 2019 23:04:57 +0100
From: Stefano Brivio
To: Eric Dumazet
Cc: David Miller, liuzhiqiang26@huawei.com, petrm@mellanox.com,
    idosch@mellanox.com, sd@queasysnail.net, mousuanming@huawei.com,
    netdev@vger.kernel.org, mingfangsen@huawei.com, zhoukang7@huawei.com,
    wangxiaogang3@huawei.com
Subject: Re: [PATCH v2] vxlan: remove the redundant gro_cells_destroy() calling.
Message-ID: <20190315230457.094e2939@elisabeth>
In-Reply-To: <1b09614f-e500-f59b-5f1e-f896c3fd39ac@gmail.com>
References: <20190315162824.732b18ac@elisabeth>
    <005ad387-8d51-561e-a5b9-8e851e03d5e9@gmail.com>
    <20190315.110249.648596993203657814.davem@davemloft.net>
    <20190315220841.078e15b7@elisabeth>
    <1b09614f-e500-f59b-5f1e-f896c3fd39ac@gmail.com>
Organization: Red Hat
X-Mailing-List: netdev@vger.kernel.org

On Fri, 15 Mar 2019 14:26:10 -0700
Eric Dumazet wrote:

> On 03/15/2019 02:08 PM, Stefano Brivio wrote:
> > On Fri, 15 Mar 2019 11:56:01 -0700
> > Eric Dumazet wrote:
> >
> >> On 03/15/2019 11:02 AM, David Miller wrote:
> >>> From: Eric Dumazet
> >>> Date: Fri, 15 Mar 2019 09:06:25 -0700
> >>>
> >>>>
> >>>>
> >>>> On 03/15/2019 08:28 AM, Stefano Brivio wrote:
> >>>>> On Fri, 15 Mar 2019 23:18:52 +0800
> >>>>> Zhiqiang Liu wrote:
> >>>>>
> >>>>>> In vxlan_destroy_tunnels func, unregister_netdevice_queue is called after
> >>>>>> gro_cells_destroy func. However, in unregister_netdevice_queue func, the
> >>>>>> gro_cells_destroy func will also call the gro_cells_destroy func as the
> >>>>>> following routine:
> >>>>>> unregister_netdevice_many() -> rollback_registered_many()
> >>>>>> -> ndo_uninit() -> gro_cells_destroy()
> >>>>>>
> >>>>>> Signed-off-by: Suanming.Mou
> >>>>>> Reviewed-by: Zhiqiang Liu
> >>>>>> Reviewed-by: Stefano Brivio
> >>>>>
> >>>>> NACK, please read my and Eric's comments to v1 -- giving me more than 23
> >>>>> minutes to answer would have been a nice touch as well :)
> >>>>>
> >>>>
> >>>> Sorry for the confusion, I forgot to add the question marks to my sentences.
> >>>>
> >>>> In fact, this is a bug fix, that we missed in the previous fix.
> >>>>
> >>>> Technically the bug is older.
> >>>
> >>> Please elaborate.
> >>>
> >>
> >> Commit ad6c9986bcb62
> >> ("vxlan: Fix GRO cells race condition between receive and link delete")
> >>
> >> fixed a race condition for the typical case a vxlan device is dismantled from the
> >> current netns.
> >>
> >> But if a netns is dismantled, we call vxlan_destroy_tunnels()
> >> to schedule a unregister_netdevice_queue() of all the vxlan tunnels
> >> that are related to this netns.
> >
> > Won't that happen via ops_exit_list() only after synchronize_rcu() is
> > called by cleanup_net(), though? Is there another path I missed?
>
> Just look at vxlan_destroy_tunnels().
>
> The call to gro_cells_destroy(&vxlan->gro_cells);
> is done _before_
> unregister_netdevice_queue(vxlan->dev, head);
>
> So packets can still fly, the RCU grace period has not yet started.

Wait, what... :/ thanks for pointing that out, I guess it was too
obvious for me to notice.

Zhiqiang, could you maybe update the commit message with these two bits
of information (the real issue explained by Eric, and the different
Fixes: tag), and post v3? This would be an actual fix and not a
clean-up, so it doesn't need to wait for net-next to re-open.

-- 
Stefano
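
For reference, the ordering problem Eric describes can be modeled in a
few lines of userspace C. This is only a rough sketch and not the vxlan
code: struct toy_dev, toy_rx() and the two teardown helpers are made-up
names, and the two flags are assumed stand-ins for "packets can still
reach the device" and "the GRO cells are still allocated".

```c
#include <stdbool.h>

/* Toy stand-in for a tunnel device; NOT the kernel's struct vxlan_dev. */
struct toy_dev {
	bool rx_possible;   /* true until unregister and the grace period are done */
	bool cells_valid;   /* true until the GRO cells are destroyed */
};

/* Hypothetical receive path: false means it would touch destroyed cells. */
static bool toy_rx(const struct toy_dev *d)
{
	if (!d->rx_possible)
		return true;        /* no packet can reach the device anymore */
	return d->cells_valid;      /* a packet arrives: cells must still exist */
}

/* Ordering discussed as the bug: cells destroyed before unregistering. */
static bool teardown_cells_first(struct toy_dev *d)
{
	bool ok;

	d->cells_valid = false;     /* models gro_cells_destroy() */
	ok = toy_rx(d);             /* packet arriving in the window */
	d->rx_possible = false;     /* models unregister + RCU grace period */
	return ok;
}

/* Ordering after the fix: unregister first, cells destroyed afterwards
 * (in the kernel this would happen via ndo_uninit()). */
static bool teardown_unregister_first(struct toy_dev *d)
{
	bool ok;

	ok = toy_rx(d);             /* packet before teardown: cells valid */
	d->rx_possible = false;     /* models unregister + RCU grace period */
	ok = ok && toy_rx(d);       /* nothing can reach the device now */
	d->cells_valid = false;     /* models the later gro_cells_destroy() */
	return ok;
}
```

With a device initialized as { true, true }, teardown_cells_first()
returns false (the receive path would see destroyed cells), while
teardown_unregister_first() returns true. The model ignores
concurrency entirely; it only shows why the order of the two calls
matters.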