From mboxrd@z Thu Jan 1 00:00:00 1970
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
References:
 <20240305020153.2787423-1-almasrymina@google.com>
 <20240305020153.2787423-3-almasrymina@google.com>
From: Mina Almasry
Date: Fri, 22 Mar 2024 10:54:54 -0700
Subject: Re: [RFC PATCH net-next v6 02/15] net: page_pool: create hooks for
 custom page providers
To: Christoph Hellwig
Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-doc@vger.kernel.org, linux-alpha@vger.kernel.org,
 linux-mips@vger.kernel.org, linux-parisc@vger.kernel.org,
 sparclinux@vger.kernel.org, linux-trace-kernel@vger.kernel.org,
 linux-arch@vger.kernel.org, bpf@vger.kernel.org,
 linux-kselftest@vger.kernel.org, linux-media@vger.kernel.org,
 dri-devel@lists.freedesktop.org, "David S. Miller", Eric Dumazet,
 Jakub Kicinski, Paolo Abeni, Jonathan Corbet, Richard Henderson,
 Ivan Kokshaysky, Matt Turner, Thomas Bogendoerfer,
 "James E.J. Bottomley", Helge Deller, Andreas Larsson,
 Jesper Dangaard Brouer, Ilias Apalodimas, Steven Rostedt,
 Masami Hiramatsu, Mathieu Desnoyers, Arnd Bergmann,
 Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
 Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
 John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
 David Ahern, Willem de Bruijn, Shuah Khan, Sumit Semwal,
 Christian König, Pavel Begunkov, David Wei, Jason Gunthorpe,
 Yunsheng Lin, Shailend Chand, Harshitha Ramamurthy, Shakeel Butt,
 Jeroen de Borst, Praveen Kaligineedi
Content-Type: text/plain; charset="UTF-8"

On Sun, Mar 17, 2024 at 7:03 PM Christoph Hellwig wrote:
>
> On Mon, Mar 04, 2024 at 06:01:37PM -0800, Mina Almasry wrote:
> > From: Jakub Kicinski
> >
> > The page providers which try to reuse the same pages will
> > need to hold onto the ref, even if page gets released from
> > the pool - as in releasing the page from the pp just transfers
> > the "ownership" reference from pp to the provider, and provider
> > will wait for other references to be gone before feeding this
> > page back into the pool.
>
> The word hook always rings a giant warning bell for me, and looking into
> this series I am concerned indeed.
>
> The only provider provided here is the dma-buf one, and that basically
> is the only sensible one for the documented design.

Sorry, I don't mean to argue, but as David mentioned, there are plans,
some in the works and some not, to extend this to other memory types.
David mentioned io_uring and Jakub's huge page use cases, which may want
to reuse this design. I have an additional one in mind: extending devmem
TCP to storage devices. Currently storage devices do not support dmabuf,
and my understanding is that it's very hard for them to do so; NVMe uses
pci_p2pdma instead. I wonder whether devmem TCP could be extended in the
future to support pci_p2pdma, and with it NVMe devices.

Additionally, I've been thinking about a use case of limiting the amount
of memory the net stack can use. Currently the page pool is free to
allocate as much memory as it wants from the buddy allocator. This may
be undesirable in very low memory setups such as overcommitted VMs. We
can imagine a memory provider that allows allocation only if the
page_pool is below a certain limit. We can also imagine a memory
provider that preallocates memory and only uses that pinned pool.

None of these are in the works at the moment, but they are examples of
how this can be (reasonably?) extended.

> So instead of
> adding hooks that random proprietary crap can hook into,

To be completely honest, I'm unsure how one would design hooks for
proprietary code to hook into. I think that would be done on the basis
of EXPORT_SYMBOL? We do not export these hooks, nor do we plan to at the
moment.

> why not hard
> code the dma buf provide and just use a flag?  That'll also avoid
> expensive indirect calls.
>

Thankfully the indirect calls do not seem to be an issue.
We've been able to hit 95% line rate with devmem TCP, and I think the
remaining 5% is due to a bottleneck unrelated to the indirect calls.
Page_pool benchmarks show a very minor degradation in the fast path, so
small it may be just noise in the measurement (may!):

https://lore.kernel.org/netdev/20240305020153.2787423-1-almasrymina@google.com/T/#m1c308df9665724879947a345c4b1ec3b51ff6856

This is because the code path that makes the indirect calls is the
allocation slow path. The page_pool recycles netmem aggressively, so
that path is taken rarely.

-- 
Thanks,
Mina