From: Mina Almasry
Date: Tue, 7 Nov 2023 15:03:53 -0800
Subject: Re: [RFC PATCH v3 05/12] netdev: netdevice devmem allocator
To: David Ahern, David Wei
Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-arch@vger.kernel.org, linux-kselftest@vger.kernel.org,
    linux-media@vger.kernel.org, dri-devel@lists.freedesktop.org,
    "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni,
    Jesper Dangaard Brouer, Ilias Apalodimas, Arnd Bergmann,
    Willem de Bruijn, Shuah Khan, Sumit Semwal, Christian König,
    Shakeel Butt, Jeroen de Borst, Praveen Kaligineedi,
    Willem de Bruijn, Kaiyuan Zhang
References: <20231106024413.2801438-1-almasrymina@google.com>
    <20231106024413.2801438-6-almasrymina@google.com>
    <3b0d612c-e33b-48aa-a861-fbb042572fc9@kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Nov 7, 2023 at 2:55 PM David Ahern wrote:
>
> On 11/7/23 3:10 PM, Mina Almasry wrote:
> > On Mon, Nov 6, 2023 at 3:44 PM David Ahern wrote:
> >>
> >> On 11/5/23 7:44 PM, Mina Almasry wrote:
> >>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> >>> index eeeda849115c..1c351c138a5b 100644
> >>> --- a/include/linux/netdevice.h
> >>> +++ b/include/linux/netdevice.h
> >>> @@ -843,6 +843,9 @@ struct netdev_dmabuf_binding {
> >>>  };
> >>>
> >>>  #ifdef CONFIG_DMA_SHARED_BUFFER
> >>> +struct page_pool_iov *
> >>> +netdev_alloc_devmem(struct netdev_dmabuf_binding *binding);
> >>> +void netdev_free_devmem(struct page_pool_iov *ppiov);
> >>
> >> netdev_{alloc,free}_dmabuf?
> >>
> >
> > Can do.
> >
> >> I say that because a dmabuf can be host memory, at least I am not aware
> >> of a restriction that a dmabuf is device memory.
> >>
> >
> > In my limited experience dma-buf is generally device memory, and
> > that's really its use case. CONFIG_UDMABUF is a driver that mocks
> > dma-buf with a memfd which I think is used for testing. But I can do
> > the rename, it's more clear anyway, I think.
>
> config UDMABUF
>         bool "userspace dmabuf misc driver"
>         default n
>         depends on DMA_SHARED_BUFFER
>         depends on MEMFD_CREATE || COMPILE_TEST
>         help
>           A driver to let userspace turn memfd regions into dma-bufs.
>           Qemu can use this to create host dmabufs for guest framebuffers.
>
>
> Qemu is just a userspace process; it is no way a special one.
>
> Treating host memory as a dmabuf should radically simplify the io_uring
> extension of this set.

I agree actually, and I was about to make that comment on David Wei's
series once I had the time.

David, your io_uring RX zerocopy proposal actually works with devmem
TCP. If you're inclined to do that instead, what you'd do roughly is
(I think):

- Allocate a memfd.
- Use CONFIG_UDMABUF to create a dma-buf out of that memfd.
- Bind the dma-buf to the NIC using the netlink API in this RFC.
- Your io_uring extensions and io_uring uapi should work almost as-is
  on top of this series, I think.

If you do this, the incoming packets should land in your memfd, which
may or may not work for you. In the future, if you feel inclined to use
device memory, the approach I'm describing here would extend to it more
easily, because you'd already be using dma-bufs for your user memory;
you'd just replace one kind of dma-buf (UDMABUF) with another.

> That the io_uring set needs to dive into
> page_pools is just wrong - complicating the design and code and pushing
> io_uring into a realm it does not need to be involved in.
>
> Most (all?) of this patch set can work with any memory; only device
> memory is unreadable.
>
>

-- 
Thanks,
Mina