Subject: Re: [PATCH net-next v4] skb_expand_head() adjust skb->truesize incorrectly
From: Vasily Averin
To: Eric Dumazet, Christoph Paasch, "David S. Miller"
Cc: Hideaki YOSHIFUJI, David Ahern, Jakub Kicinski, netdev, linux-kernel@vger.kernel.org, kernel@openvz.org, Alexey Kuznetsov, Julian Wiedmann
References: <67740366-7f1b-c953-dfe1-d2085297bdf3@gmail.com> <8a183782-f4b9-e12a-55d1-c4a3c4078369@virtuozzo.com> <2984f16b-7f20-e72d-1661-b942fdc4ff9b@virtuozzo.com>
Message-ID: <27f87dd8-f6e4-b2b0-2b3a-9378fddf147f@virtuozzo.com>
Date: Thu, 2 Sep 2021 11:31:59 +0300
In-Reply-To: <2984f16b-7f20-e72d-1661-b942fdc4ff9b@virtuozzo.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 9/2/21 10:33 AM, Vasily Averin wrote:
> On 9/2/21 10:13 AM, Vasily Averin wrote:
>> On 9/2/21 7:48 AM, Eric Dumazet wrote:
>>> On 9/1/21 9:32 PM, Eric Dumazet wrote:
>>>> I think you missed netem case, in particular
>>>> skb_orphan_partial() which I already pointed out.
>>>>
>>>> You can setup a stack of virtual devices (tunnels),
>>>> with a qdisc on them, before ip6_xmit() is finally called...
>>>>
>>>> Socket might have been closed already.
>>>>
>>>> To test your patch, you could force a skb_orphan_partial() at the beginning
>>>> of skb_expand_head() (extending code coverage)
>>>
>>> To clarify :
>>>
>>> It is ok to 'downgrade' an skb->destructor having a ref on sk->sk_wmem_alloc to
>>> something owning a ref on sk->refcnt.
>>>
>>> But the opposite operation (ref on sk->sk_refcnt --> ref on sk->sk_wmem_alloc) is not safe.
>>
>> Could you please explain in more details, since I still have a completely opposite point of view?
>>
>> Every sk referenced in skb have sk_wmem_alloc > 0
>> It is assigned to 1 in sk_alloc and decremented right before last __sk_free(),
>> inside both sk_free() sock_wfree() and __sock_wfree()
>>
>> So it is safe to adjust skb->sk->sk_wmem_alloc,
>> because alive skb keeps reference to alive sk and last one keeps sk_wmem_alloc > 0
>>
>> So any destructor used sk->sk_refcnt will already have sk_wmem_alloc > 0,
>> because last sock_put() calls sk_free().
>>
>> However now I'm not sure in reversed direction.
>> skb_set_owner_w() check !sk_fullsock(sk) and call sock_hold(sk);
>> If sk->sk_refcnt can be 0 here (i.e. after execution of old destructor inside skb_orphan)
>> -- it can be trigger pointed problem:
>> "refcount_add() will trigger a warning (panic under KASAN)".
>>
>> Could you please explain where I'm wrong?
>
> To clarify:
> I'm agree it is unsafe to call on alive skb:

I explained the problem badly in my previous letter, so let me try once again.

I'm talking about this piece of code:

+	} else if (sk && skb->destructor != sock_edemux) {
+		delta = osize - skb_end_offset(skb);
+		if (!is_skb_wmem(skb))
+			skb_set_owner_w(skb, sk);
+		skb->truesize += delta;
+		if (sk_fullsock(sk))
+			refcount_add(delta, &sk->sk_wmem_alloc);
 	}

It is called on a live, already expanded skb, and it is incorrect for two reasons:
a) if the old destructor holds a ref on sk->sk_wmem_alloc,
   it can drop the counter to 0 and release sk.
b) if the old destructor holds a ref on sk->sk_refcnt and !sk_fullsock(sk),
   the old destructor can drop the last reference and release sk.
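To make a) and b) easier to follow outside the kernel, here is a minimal userspace sketch. It is only a toy model under my own assumptions, not kernel code: struct toy_ref, struct toy_sock, toy_ref_add(), toy_ref_sub_and_test(), toy_case_a() and toy_case_b() are hypothetical names, and toy_ref only imitates the rule that adding to a counter which already reached 0 is reported as a use-after-free.

```c
#include <assert.h>
#include <stdio.h>

/* Toy stand-in for refcount_t; NOT the kernel implementation. */
struct toy_ref {
	int val;
	int warnings;	/* how many times we added to a dead counter */
};

/* Imitates the refcount_add() rule: adding to 0 means the object is gone. */
static void toy_ref_add(struct toy_ref *r, int n)
{
	if (r->val == 0) {
		r->warnings++;
		fprintf(stderr, "toy_ref: addition on 0 (use-after-free)\n");
		return;
	}
	r->val += n;
}

/* Subtracts n and reports whether the counter reached 0. */
static int toy_ref_sub_and_test(struct toy_ref *r, int n)
{
	r->val -= n;
	return r->val == 0;
}

/* Hypothetical miniature of struct sock with the two counters discussed. */
struct toy_sock {
	struct toy_ref wmem_alloc;	/* models sk->sk_wmem_alloc */
	struct toy_ref refcnt;		/* models sk->sk_refcnt */
	int released;
};

/* Case a): the skb's truesize is the last thing accounted in wmem_alloc. */
int toy_case_a(int truesize, int delta)
{
	struct toy_sock sk = { .wmem_alloc = { .val = truesize } };

	/* old destructor (run from skb_orphan) drops the skb's share */
	if (toy_ref_sub_and_test(&sk.wmem_alloc, truesize))
		sk.released = 1;
	assert(sk.released);

	/* the later refcount_add(delta, ...) now hits a dead counter */
	toy_ref_add(&sk.wmem_alloc, delta);
	return sk.wmem_alloc.warnings;
}

/* Case b): the skb holds the last plain reference on a non-full socket. */
int toy_case_b(void)
{
	struct toy_sock sk = { .refcnt = { .val = 1 } };

	/* old destructor drops the last reference */
	if (toy_ref_sub_and_test(&sk.refcnt, 1))
		sk.released = 1;
	assert(sk.released);

	/* sock_hold() equivalent on the already released socket */
	toy_ref_add(&sk.refcnt, 1);
	return sk.refcnt.warnings;
}
```

Calling toy_case_a(768, 256) and toy_case_b() from a small main() should report one warning each; the sizes 768 and 256 are arbitrary values picked for the illustration.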
We can work around the release of sk by moving
refcount_add(delta, &sk->sk_wmem_alloc) before skb_set_owner_w():

	} else if (sk && skb->destructor != sock_edemux) {
		delta = osize - skb_end_offset(skb);
		refcount_add(delta, &sk->sk_wmem_alloc);
		if (!is_skb_wmem(skb))
			skb_set_owner_w(skb, sk);
		skb->truesize += delta;
#ifdef CONFIG_INET
		if (!sk_fullsock(sk))
			WARN_ON(refcount_sub_and_test(delta, &sk->sk_wmem_alloc));
#endif
	}

However, it does not resolve b) completely:

void skb_set_owner_w(struct sk_buff *skb, struct sock *sk)
{
	skb_orphan(skb); <<< old destructor releases last sk->sk_refcnt
...
	skb->sk = sk;
...
	if (unlikely(!sk_fullsock(sk))) {
		skb->destructor = sock_edemux;
		sock_hold(sk); <<<< ...and it triggers the warning/panic
		return;
	}

Thank you,
	Vasily Averin