From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.3 required=3.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_IN_DEF_DKIM_WL autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 64E14C433C1 for ; Wed, 24 Mar 2021 22:49:29 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id EE26A61A1E for ; Wed, 24 Mar 2021 22:49:28 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org EE26A61A1E Authentication-Results: mail.kernel.org; dmarc=fail (p=reject dis=none) header.from=google.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 6A4A96B032D; Wed, 24 Mar 2021 18:49:28 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 654AA6B0332; Wed, 24 Mar 2021 18:49:28 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 4CE516B0333; Wed, 24 Mar 2021 18:49:28 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0124.hostedemail.com [216.40.44.124]) by kanga.kvack.org (Postfix) with ESMTP id 308776B032D for ; Wed, 24 Mar 2021 18:49:28 -0400 (EDT) Received: from smtpin05.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id E66E2180ACEEB for ; Wed, 24 Mar 2021 22:49:27 +0000 (UTC) X-FDA: 77956260774.05.ACBD31A Received: from mail-pj1-f46.google.com (mail-pj1-f46.google.com [209.85.216.46]) by imf06.hostedemail.com (Postfix) with ESMTP id 365DCC0007CC for ; Wed, 24 Mar 2021 22:49:27 +0000 (UTC) Received: by mail-pj1-f46.google.com with SMTP id lr1-20020a17090b4b81b02900ea0a3f38c1so3324061pjb.0 for ; Wed, 24 Mar 2021 15:49:27 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=CnG9+rCYeZJOIO9wMA1OXXczz9AhODbjeHBnYJ2S+dQ=; b=eTp2wE3KCcGL21ofZv6+GyxjukrFrzM4orBTTCdTZi62hw3w9tW9Q7zv2c01Al6gsO 97oQPxXzoeZKpLW8VSCq8NNFKYBlBiUZmZNJp3pJdLjTKwSJ8zzLXffVO6p/zEuJ+lp9 Wqw+/wIvYMInmyLsmfJ2b4PEDWD58m2Ihzg4GKOgh6D3ydDkAE1KZEBCxIUEnIxM3pHH PfHP/whaBXAVlYcwax2Ci/U2I1x3kFfXHFXSdNYCZxR5WQZ0jns6WPKbBk4oZWTel0k7 ro/niqhhCx94gnqMkOkn1iAWzkkLRUs7mXITMVx5WaYhy391bq59krI9cSMJSzYUaFMr ocAQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=CnG9+rCYeZJOIO9wMA1OXXczz9AhODbjeHBnYJ2S+dQ=; b=EjED0qY2944pKe5J2GdAs2l/pZ9Krsz7G0NH6MYaQB+0En4QLvrVGbNaxozdPWjdov iXfk7HQJCs5SpESqVKzBZQE/Tu7GmQNRhw5xxwGaeZ2VhOL+qAWfOquy8gYZnAa0Ew3H sHCCgEmBl6W+r0RYkfo4H0LQm4MYzh2g0+/+QL3nYeBzgqsSTP5nYp3wP1EX5M9w0L5B Bz+/8uKynXnMGVUPgt2QnWDUo+t/NTe834m7WLZYn3ydfj46+8ta094Q0EW4t+2409+F X7zgjOpMp80YAebJVEXl/lZW/iK+OMzcRsy7YJ5rSr51WBBkODUL2IwNyi0yZcYss+VQ ae2Q== X-Gm-Message-State: AOAM530Ff+LKAdyHP6ujb06V8Bg7TfF3QXBcpDzR1ggUvoN7Q6faBWgj ZgazGaKrlka8zbsCr8tBQnB5rWJF14JhITAQSrAznQ== X-Google-Smtp-Source: ABdhPJyPb44Fh9aB04e0tm1x3uWtjFBWj7OBBB88nVTq2jx4ZXu57a+iK6vqBVVgTjATsZexfmrkBxnckZ2oyfpH3Mg= X-Received: by 2002:a17:90a:9d82:: with SMTP id k2mr5507894pjp.48.1616626166323; Wed, 24 Mar 2021 15:49:26 -0700 (PDT) MIME-Version: 1.0 References: <20210316041645.144249-1-arjunroy.kdev@gmail.com> In-Reply-To: From: Arjun Roy Date: Wed, 24 Mar 2021 15:49:15 -0700 Message-ID: Subject: Re: [mm, net-next v2] mm: net: memcg accounting for TCP rx zerocopy To: Johannes Weiner Cc: Michal Hocko , Arjun Roy , Andrew Morton , David Miller , netdev , Linux Kernel Mailing List , Cgroups , Linux MM , Shakeel Butt , Eric Dumazet , Soheil Hassas Yeganeh , Jakub Kicinski , Yang Shi , Roman Gushchin Content-Type: text/plain; charset="UTF-8" X-Stat-Signature: ro5iqerinko69sfau8ardkasxzw1rxhz X-Rspamd-Server: rspam02 X-Rspamd-Queue-Id: 365DCC0007CC Received-SPF: none (google.com>: No applicable sender policy available) receiver=imf06; identity=mailfrom; envelope-from=""; helo=mail-pj1-f46.google.com; client-ip=209.85.216.46 X-HE-DKIM-Result: pass/pass X-HE-Tag: 1616626167-608280 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Wed, Mar 24, 2021 at 2:24 PM Johannes Weiner wrote: > > On Wed, Mar 24, 2021 at 10:12:46AM +0100, Michal Hocko wrote: > > On Tue 23-03-21 11:47:54, Arjun Roy wrote: > > > On Tue, Mar 23, 2021 at 7:34 AM Michal Hocko wrote: > > > > > > > > On Wed 17-03-21 18:12:55, Johannes Weiner wrote: > > > > [...] > > > > > Here is an idea of how it could work: > > > > > > > > > > struct page already has > > > > > > > > > > struct { /* page_pool used by netstack */ > > > > > /** > > > > > * @dma_addr: might require a 64-bit value even on > > > > > * 32-bit architectures. > > > > > */ > > > > > dma_addr_t dma_addr; > > > > > }; > > > > > > > > > > and as you can see from its union neighbors, there is quite a bit more > > > > > room to store private data necessary for the page pool. > > > > > > > > > > When a page's refcount hits zero and it's a networking page, we can > > > > > feed it back to the page pool instead of the page allocator. > > > > > > > > > > From a first look, we should be able to use the PG_owner_priv_1 page > > > > > flag for network pages (see how this flag is overloaded, we can add a > > > > > PG_network alias). With this, we can identify the page in __put_page() > > > > > and __release_page(). These functions are already aware of different > > > > > types of pages and do their respective cleanup handling. We can > > > > > similarly make network a first-class citizen and hand pages back to > > > > > the network allocator from in there. > > > > > > > > For compound pages we have a concept of destructors. Maybe we can extend > > > > that for order-0 pages as well. The struct page is heavily packed and > > > > compound_dtor shares the storage without other metadata > > > > int pages; /* 16 4 */ > > > > unsigned char compound_dtor; /* 16 1 */ > > > > atomic_t hpage_pinned_refcount; /* 16 4 */ > > > > pgtable_t pmd_huge_pte; /* 16 8 */ > > > > void * zone_device_data; /* 16 8 */ > > > > > > > > But none of those should really require to be valid when a page is freed > > > > unless I am missing something. It would really require to check their > > > > users whether they can leave the state behind. But if we can establish a > > > > contract that compound_dtor can be always valid when a page is freed > > > > this would be really a nice and useful abstraction because you wouldn't > > > > have to care about the specific type of page. > > Yeah technically nobody should leave these fields behind, but it > sounds pretty awkward to manage an overloaded destructor with a > refcounted object: > > Either every put would have to check ref==1 before to see if it will > be the one to free the page, and then set up the destructor before > putting the final ref. But that means we can't support lockless > tryget() schemes like we have in the page cache with a destructor. > Ah, I think I see what you were getting at with your prior email - at first I thought your suggestion was that, since the driver may have its own refcount, every put would need to check ref == 1 and call into the driver if need be. Instead, and correct me if I'm wrong, it seems like what you're advocating is: 1) The (opted in) driver no longer hangs onto the ref, 2) Now refcount can go all the way to 0, 3) And when it does, due to the special destructor this page has, it goes back to the driver, rather than the system? > Or you'd have to set up the destructor every time an overloaded field > reverts to its null state, e.g. hpage_pinned_refcount goes back to 0. > > Neither of those sound practical to me. > > > > > But maybe I am just overlooking the real complexity there. > > > > -- > > > > > > For now probably the easiest way is to have network pages be first > > > class with a specific flag as previously discussed and have concrete > > > handling for it, rather than trying to establish the contract across > > > page types. > > > > If you are going to claim a page flag then it would be much better to > > have it more generic. Flags are really scarce and if all you care about > > is PageHasDestructor() and provide one via page->dtor then the similar > > mechanism can be reused by somebody else. Or does anything prevent that? > > I was suggesting to alias PG_owner_priv_1, which currently isn't used > on network pages. We don't need to allocate a brandnew page flag. > Just to be certain, is there any danger of having a page, that would not be a network driver page originally, being inside __put_page(), such that PG_owner_priv_1 is set (but with one of its other overloaded meanings)? Thanks, -Arjun > I agree that a generic destructor for order-0 pages would be nice, but > due to the decentralized nature of refcounting the only way I think it > would work in practice is by adding a new field to struct page that is > not in conflict with any existing ones. > > Comparably, creating a network type page consumes no additional space.