From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI, SPF_HELO_NONE,SPF_PASS autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7970BC433FE for ; Sat, 18 Sep 2021 09:23:40 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 5AF916113A for ; Sat, 18 Sep 2021 09:23:40 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S236255AbhIRJZC (ORCPT ); Sat, 18 Sep 2021 05:25:02 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:34270 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S236527AbhIRJY7 (ORCPT ); Sat, 18 Sep 2021 05:24:59 -0400 Received: from mail-wm1-x331.google.com (mail-wm1-x331.google.com [IPv6:2a00:1450:4864:20::331]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1A35FC061757 for ; Sat, 18 Sep 2021 02:23:36 -0700 (PDT) Received: by mail-wm1-x331.google.com with SMTP id s24so9090633wmh.4 for ; Sat, 18 Sep 2021 02:23:36 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=pyKD1JIq/bvkde+v3KKxiO0PuPyoOwDjP8n0rHd3UkA=; b=R8GFxIBtPTvt0iyrhv08FZGjMqI0lpxugH8moIn7i5LhM4B1UjwcvxZfzcbbBoJjb2 uUv542Bxqz20dJWFgo7E4fvBA+SJpVwfc7ZCGEnNHOeQfRbSN9Ch5sGzf/tJNkYifS4q HTz+A6JYPHyhTiCyZxtqS+cgLREW5C+9mjRkorQa0rkUqZlGt7a5hx9p0QCw8GjmN1XN vQ+9GQfHCRr4Q/K8hCi822FdCteg4VVVqluzwIqRYKB69hvm18uq1MT28d1M9HBSXWvO rGNng1MiMO6O38J7x2Skf3aidzr9eR3dqtxUNiRI/+BoBsCpCe7ta2HxjSn7pUkTnjtN sduQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=pyKD1JIq/bvkde+v3KKxiO0PuPyoOwDjP8n0rHd3UkA=; b=CVlC3rpB2+mj66wuE8AfN3Ca/ZHFkg6FVP6nUpaa34kQjYYMT04DIpsqGHfA6xF9ld SA77oiML/VGGt4jl8cG4ao8BZ+V78fMx5r2Ij8mKNyUG7WE6L7kGlWB9zq2zbGgiWfLf CpWcLXdJyIny3pUjlCliN9MW2dZCPAEoYALDTTh6ROVRubFfdcT0oSKumenk6dZMtrNZ +vz9o+8kDuMVtrFFNie3RR41C6x75CTkJTCkcSD9wlXkK4LOJWpnh+4PZMZL/qCW3AIN gdnKXrBP/6XxXLd4FDVrpjkNTt7G1vLRhD43CaUfWHM8cuBWcld+fVaSra7ugsWpbNH1 Q8lA== X-Gm-Message-State: AOAM530dj8kFnNJDRoxWm2guY8UnUwcHnvpKgnF5jVseT24VwFyVRklo xiqc2NOFdFI5HWvl8G6oeh60Mg== X-Google-Smtp-Source: ABdhPJysVYQx/YJuWxHkJrEF5utDxIvws4jhCriJGHkBxQpD+GNKKIVXM3hpnxsOuYvBL9c94gh6fA== X-Received: by 2002:a05:600c:2046:: with SMTP id p6mr19952056wmg.88.1631957014573; Sat, 18 Sep 2021 02:23:34 -0700 (PDT) Received: from Iliass-MBP (ppp-2-87-157-253.home.otenet.gr. [2.87.157.253]) by smtp.gmail.com with ESMTPSA id r25sm9473875wra.76.2021.09.18.02.23.32 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 18 Sep 2021 02:23:34 -0700 (PDT) Date: Sat, 18 Sep 2021 12:23:29 +0300 From: Ilias Apalodimas To: Yunsheng Lin Cc: Jesper Dangaard Brouer , brouer@redhat.com, Alexander Duyck , davem@davemloft.net, kuba@kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, linuxarm@openeuler.org, hawk@kernel.org, jonathan.lemon@gmail.com, alobakin@pm.me, willemb@google.com, cong.wang@bytedance.com, pabeni@redhat.com, haokexin@gmail.com, nogikh@google.com, elver@google.com, memxor@gmail.com, edumazet@google.com, dsahern@gmail.com Subject: Re: [PATCH net-next v2 3/3] skbuff: keep track of pp page when __skb_frag_ref() is called Message-ID: References: <36676c07-c2ca-bbd2-972c-95b4027c424f@huawei.com> <4a682251-3b40-b16a-8999-69acb36634f3@huawei.com> <5d8232b1-4b85-f755-a92a-d305bff9eab3@huawei.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <5d8232b1-4b85-f755-a92a-d305bff9eab3@huawei.com> Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org [...] > >>>>> I was mostly referring to the skb head here and how would you trigger the > >>>>> recycling path. > >>>>> > >>>>> I think we are talking about different things here. > >>>>> One idea is to use the last bit of frag->bv_page to identify fragments > >>>>> allocated from page_pool, which is done today with the signature. > >>>>> > >>>>> The signature however exists in the head page so my question was, can we rid > >>>>> of that without having a performance penalty? > >>>> > >>>> As both skb frag and head page is eventually operated on the head page > >>>> of a compound page(if it is a compound page) for normal case too, maybe > >>>> we can refactor the code to get the head page of a compound page before > >>>> the signature checking without doing a second virt_to_head_page() or > >>>> compound_head() call? > >>> > >>> Yea that's doable, but my concern is different here. If we do that the > >>> standard network stack, even for drivers that don't use page_pool, will > >>> have to do a virt_to_head_page() -> check signature, to decide if it has to > >>> try recycling the packet. That's the performance part I am worried about, > >>> since it happens for every packet. > >> > >> Yes, there is theoretically performance penalty for virt_to_head_page() or > >> compound_head(), will do more test if we decide to go with the signature > >> checking. > > > > Can we check this somehow? I can send a patch for this, but my > > testing is limited to 1Gbit for the recycling. I can find > > 25/100Gbit interfaces for the 'normal' path. > > I have done the signature checking for frag page of a skb, I am not > able to see noticable change between patched(patched with this patch) and > unpatched, for small packet drop test case(perfermance data is about 34Mpps). > > As the hns3 driver does not use the build_skb() API, so I am not able to test > the signature checking penalty for head page of a skb, any chance to do the > testing for head page of a skb on your side? Yea I think I'll see what I can do. I wasn't expecting any regression on the recycling path, since we do eventually call virt_to_head_page(), but it's always good to confirm. Thanks for testing it. > > > > >> > >>> > >>>> > >>>>> > >>>>> IOW in skb_free_head() an we replace: > >>>>> > >>>>> if (skb_pp_recycle(skb, head)) > >>>>> with > >>>>> if (page->pp_magic & ~0x3UL) == PP_SIGNATURE) > >>>>> and get rid of the 'bool recycle' argument in __skb_frag_unref()? > >>>> > >>>> For the frag page of a skb, it seems ok to get rid of the 'bool recycle' > >>>> argument in __skb_frag_unref(), as __skb_frag_unref() and __skb_frag_ref() > >>>> is symmetrically called to put/get a page. > >>>> > >>>> For the head page of a skb, we might need to make sure the head page > >>>> passed to __build_skb_around() meet below condition: > >>>> do pp_frag_count incrementing instead of _refcount incrementing when > >>>> the head page is not newly allocated and it is from page pool. > >>>> It seems hard to audit that? > >>> > >>> Yea that seems a bit weird at least to me and I am not sure, it's the only > >>> place we'll have to go and do that. > >> > >> Yes, That is why I avoid changing the behavior of a head page for a skb. > >> In other word, maybe we should not track if head page for a skb is pp page > >> or not when the page'_refcount is incremented during network stack journey, > >> just treat it as normal page? > >> > > > > I am not sure I understand this. > > I was saying only treat the head page of a skb as pp page when it is newly > allocated from page pool, if that page is reference-counted to build another > head page for another skb later, just treat it as normal page. But the problem here is that a cloned/expanded SKB could trigger a race when freeing the fragments. That's why we reset the pp_recycle bit if there's still references to the frags. What does 'normal' page means here? We'll have to at least unmap dma part. > > > > >>> > >>>> > >>>> > >>>>> > >>>>>> bit 0 of frag->bv_page is different way of indicatior for a pp page, > >>>>>> it is better we do not confuse with the page signature way. Using > >>>>>> a bit 0 may give us a free word in 'struct page' if we manage to > >>>>>> use skb->pp_recycle to indicate a head page of the skb uniquely, meaning > >>>>>> page->pp_magic can be used for future feature. > >>>>>> > >>>>>> > >>>>>>> > >>>>>>>> > >>>>>>>>> for pp_recycle right now? __skb_frag_unref() in skb_shift() or > >>>>>>>>> skb_try_coalesce() (the latter can probably be removed tbh). > >>>>>>>> > >>>>>>>> If we decide to go with accurate indicator of a pp page, we just need > >>>>>>>> to make sure network stack use __skb_frag_unref() and __skb_frag_ref() > >>>>>>>> to put and get a page frag, the indicator checking need only done in > >>>>>>>> __skb_frag_unref() and __skb_frag_ref(), so the skb_shift() and > >>>>>>>> skb_try_coalesce() should be fine too. > >>>>>>>> > >>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> Another way is to use the bit 0 of frag->bv_page ptr to indicate if a frag > >>>>>>>>>> page is from page pool. > >>>>>>>>> > >>>>>>>>> Instead of the 'struct page' signature? And the pp_recycle bit will > >>>>>>>>> continue to exist? > >>>>>>>> > >>>>>>>> pp_recycle bit might only exist or is only used for the head page for the skb. > >>>>>>>> The bit 0 of frag->bv_page ptr can be used to indicate a frag page uniquely. > >>>>>>>> Doing a memcpying of shinfo or "*fragto = *fragfrom" automatically pass the > >>>>>>>> indicator to the new shinfo before doing a __skb_frag_ref(), and __skb_frag_ref() > >>>>>>>> will increment the _refcount or pp_frag_count according to the bit 0 of > >>>>>>>> frag->bv_page. > >>>>>>>> > >>>>>>>> By the way, I also prototype the above idea, and it seems to work well too. > >>>>>>>> > >>>>>>> > >>>>>>> As long as no one else touches this, it's just another way of identifying a > >>>>>>> page_pool allocated page. But are we gaining by that? Not using > >>>>>>> virt_to_head_page() as stated above? But in that case you still need to > >>>>>>> keep pp_recycle around. > >>>>>> > >>>>>> No, we do not need the pp_recycle, as long as the we make sure __skb_frag_ref() > >>>>>> is called after memcpying the shinfo or doing "*fragto = *fragfrom". > >>>>> > >>>>> But we'll have to keep it for the skb head in this case. > >>>> > >>>> As above, I am not really look into skb head case:) > >>> > >>> Let me take a step back here, because I think we drifted a bit. > >>> The page signature was introduced in order to be able to identify skb > >>> fragments. The problem was that you couldn't rely on the pp_recycle bit of > >>> the skb head, since fragments could come from anywhere. So you use the > >>> skb bit as a hint for skb frags, and you eventually decide using the page > >>> signature. > >>> > >>> So we got 3 options (Anything I've missed ?) > >>> - try to remove pp_recycle bit, since the page signature is enough for the > >>> skb head and fragments. That in my opinion is the cleanest option, as > >>> long as we can prove there's no performance hit on the standard network > >>> path. > >>> > >>> - Replace the page signature with frag->bv_page bit0. In that case we > >>> still have to keep the pp_recycle bit, but we do have an 'easier' > >>> indication that a skb frag comes from page_pool. That's still pretty > >>> safe, since you now have unique identifiers for the skb and page > >>> fragments and you can be sure of their origin (page pool or not). > >>> What I am missing here, is what do we get out of this? I think the > >>> advantage is not having to call virt_to_head_page() for frags ? > >> > >> Not using the signature will free a word space in struct page for future > >> feature? > > > > Yea that's another thing we gain, but I am not sure how useful how this is > > going to turn out. > > > >> > >>> > >>> - Keep all of them(?) and use frag->bv_page bit0 similarly to pp_recycle > >>> bit? I don't see much value on this one, I am just keeping it here for > >>> completeness. > >> > >> > >> For safty and performance reason: > >> 1. maybe we should move the pp_recycle bit from "struct sk_buff" to > >> "struct skb_shared_info", and use it to only indicate if the head page of > >> a skb is from page pool. > > > > What's the safety or performance we gain out of this? The only performance > > safety is that we still have two ways to indicate a pp page. > the pp_recycle bit in "struct skb_shared_info" or frag->bv_page bit0 tell > if we want to treat a page as pp page, the page signature checking is used > to tell if we if set those bits correctly? > Yea but in the long run we'll want the page signature. So that's basically (2) once we do that. > > I can think of is the dirty cache line of the recycle bit we set to 0. > > If we do move it to skb_shared)info we'll have to make sure it's on the > > same cacheline as the ones we already change. > > Yes, when we move the pp_recycle bit to skb_shared_info, that bit is only > set once, and we seems to not need to worry about skb doing cloning or > expanding as the it is part of head page(shinfo is part of head page). > > >> > >> 2. The frag->bv_page bit0 is used to indicate if the frag page of a skb is > >> from page pool, and modify __skb_frag_unref() and __skb_frag_ref() to keep > >> track of it. > >> > >> 3. For safty or debugging reason, keep the page signature for now, and put a > >> page signature WARN_ON checking in page pool to catch any misbehaviour? > >> > >> If there is not bug showing up later, maybe we can free the page signature space > >> for other usage? > > > > Yea that's essentially identical to (2) but we move the pp_recycle on the > > skb_shared_info. I'd really prefer getting rid of the pp_recycle entirely, > > When also removing the pp_recycle for head page of a skb, it seems a little > risky as we are not sure when a not-newly-allocated pp page is called with > __build_skb_around() to build to head page? Removing the pp_recyle, is only safe if we keep the page signature. I was suggesting we follow (1) first before starting moving things around. > > > since it's the cleanest thing we can do in my head. If we ever need an > > extra 4/8 bytes in the future, we can always go back and implement this. > > > > Alexander/Jesper any additional thoughts? > > Thanks /Ilias