From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id B3272C4167D for ; Tue, 14 Nov 2023 16:11:09 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233834AbjKNQLK (ORCPT ); Tue, 14 Nov 2023 11:11:10 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43146 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233830AbjKNQLI (ORCPT ); Tue, 14 Nov 2023 11:11:08 -0500 Received: from mail-qv1-xf30.google.com (mail-qv1-xf30.google.com [IPv6:2607:f8b0:4864:20::f30]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 486BB12C; Tue, 14 Nov 2023 08:11:05 -0800 (PST) Received: by mail-qv1-xf30.google.com with SMTP id 6a1803df08f44-66cfd35f595so29861386d6.2; Tue, 14 Nov 2023 08:11:05 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1699978264; x=1700583064; darn=vger.kernel.org; h=content-transfer-encoding:in-reply-to:from:content-language :references:cc:to:subject:user-agent:mime-version:date:message-id :from:to:cc:subject:date:message-id:reply-to; bh=WcBBKnTEqaoj3LxtGCYhc7t/cWobqHG+KSslQabs03I=; b=jiko082Ua1lfJDUQiLyoKwfG4+tg/Zq2/lpZ0rN+zmUtHB+nhwDjwiisusEiHMiHeF JzQz7qTgeVzIorGtR1fv+qZHdXiDmVpR0WOYFhWEV5XfwwRNSaPBFXeQZDQKCSKVAS21 KdROydjR/E38ZRACuls+YVdof3UVbl0YtkWV0w0WL86FA3j/axlorSIAVxf9WnHfHch1 J9xllmqzbDG1IZyMHHlujzYFaqjmUJnJZiVnFxuRUDrOpb93fJugSG9dG70mbUXQPDHf 5HZYEdWSn6dF+av2EHeemNH+cdhs7oESWuJhEF0BBmNGb/FFiWFU+UNpRycXYxnxn36z I2Og== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1699978264; x=1700583064; h=content-transfer-encoding:in-reply-to:from:content-language :references:cc:to:subject:user-agent:mime-version:date:message-id :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=WcBBKnTEqaoj3LxtGCYhc7t/cWobqHG+KSslQabs03I=; b=ZbwkeihKhLbBIuKCgy8cPLk+ePiOqBKInOr0oG09XpWJxL10imCfLS0Qqmz3/Dwcby HM3kja1Cy9WI+4xgWbF0qMlzMwx1D76HsNWXn92cWtv2gk7qfqsuRZk67VWzgHc4Ibel jMsCFUIr8mcJH87lC8hoA/W0WQUZjzQOH/SAZwikFBTDA7hLzwZCy2I0d+DYzMAx+C8T kT2A+ei9sF++h40jExsy7DKRG5TuB6oScuNqlv96OCsyx2idkXj9bXQSJuBLQy/dGuxf XnVtMoX61T1KByCd92f7gDS+zqnI/OXi/aa7EJPUko9HOAFWPCCM7tQopKZtJcR85uTg Ie4w== X-Gm-Message-State: AOJu0Yz1XCrys4BFIIpFNrY7TKec8m1IWEehhPNnXFiuB3y1rJfnmiQd ZZaQeXlVH17oipcAwbpOeEU= X-Google-Smtp-Source: AGHT+IFP/S3WR7/gOKNC8Du/goDp8eEHlLIGAtwXreEQAWZa6gXfFFZ7r8yADmexKAwCka48pB7dfg== X-Received: by 2002:a05:6214:1245:b0:66d:6af7:4571 with SMTP id r5-20020a056214124500b0066d6af74571mr4183692qvv.17.1699978264358; Tue, 14 Nov 2023 08:11:04 -0800 (PST) Received: from [172.25.81.254] ([12.186.190.1]) by smtp.gmail.com with ESMTPSA id g1-20020a0cf841000000b00656329bb3b1sm3023598qvo.10.2023.11.14.08.11.03 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Tue, 14 Nov 2023 08:11:03 -0800 (PST) Message-ID: <2aa9c139-eee8-c707-6e62-5415c26c2a1a@gmail.com> Date: Tue, 14 Nov 2023 16:09:51 +0000 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [RFC PATCH v3 05/12] netdev: netdevice devmem allocator To: David Ahern , Mina Almasry , David Wei Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-media@vger.kernel.org, dri-devel@lists.freedesktop.org, "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Jesper Dangaard Brouer , Ilias Apalodimas , Arnd Bergmann , Willem de Bruijn , Shuah Khan , Sumit Semwal , =?UTF-8?Q?Christian_K=C3=B6nig?= , Shakeel Butt , Jeroen de Borst , Praveen Kaligineedi , Willem de Bruijn , Kaiyuan Zhang References: <20231106024413.2801438-1-almasrymina@google.com> <20231106024413.2801438-6-almasrymina@google.com> <3b0d612c-e33b-48aa-a861-fbb042572fc9@kernel.org> <3687e70e-29e6-34af-c943-8c0830ff92b8@gmail.com> Content-Language: en-US From: Pavel Begunkov In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 11/11/23 17:19, David Ahern wrote: > On 11/10/23 7:26 AM, Pavel Begunkov wrote: >> On 11/7/23 23:03, Mina Almasry wrote: >>> On Tue, Nov 7, 2023 at 2:55 PM David Ahern wrote: >>>> >>>> On 11/7/23 3:10 PM, Mina Almasry wrote: >>>>> On Mon, Nov 6, 2023 at 3:44 PM David Ahern wrote: >>>>>> >>>>>> On 11/5/23 7:44 PM, Mina Almasry wrote: >>>>>>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h >>>>>>> index eeeda849115c..1c351c138a5b 100644 >>>>>>> --- a/include/linux/netdevice.h >>>>>>> +++ b/include/linux/netdevice.h >>>>>>> @@ -843,6 +843,9 @@ struct netdev_dmabuf_binding { >>>>>>>   }; >>>>>>> >>>>>>>   #ifdef CONFIG_DMA_SHARED_BUFFER >>>>>>> +struct page_pool_iov * >>>>>>> +netdev_alloc_devmem(struct netdev_dmabuf_binding *binding); >>>>>>> +void netdev_free_devmem(struct page_pool_iov *ppiov); >>>>>> >>>>>> netdev_{alloc,free}_dmabuf? >>>>>> >>>>> >>>>> Can do. >>>>> >>>>>> I say that because a dmabuf can be host memory, at least I am not >>>>>> aware >>>>>> of a restriction that a dmabuf is device memory. >>>>>> >>>>> >>>>> In my limited experience dma-buf is generally device memory, and >>>>> that's really its use case. CONFIG_UDMABUF is a driver that mocks >>>>> dma-buf with a memfd which I think is used for testing. But I can do >>>>> the rename, it's more clear anyway, I think. >>>> >>>> config UDMABUF >>>>          bool "userspace dmabuf misc driver" >>>>          default n >>>>          depends on DMA_SHARED_BUFFER >>>>          depends on MEMFD_CREATE || COMPILE_TEST >>>>          help >>>>            A driver to let userspace turn memfd regions into dma-bufs. >>>>            Qemu can use this to create host dmabufs for guest >>>> framebuffers. >>>> >>>> >>>> Qemu is just a userspace process; it is no way a special one. >>>> >>>> Treating host memory as a dmabuf should radically simplify the io_uring >>>> extension of this set. >>> >>> I agree actually, and I was about to make that comment to David Wei's >>> series once I have the time. >>> >>> David, your io_uring RX zerocopy proposal actually works with devmem >>> TCP, if you're inclined to do that instead, what you'd do roughly is >>> (I think): >> That would be a Frankenstein's monster api with no good reason for it. > > It brings a consistent API from a networking perspective. > > io_uring should not need to be in the page pool and memory management > business. Have you or David coded up the re-use of the socket APIs with > dmabuf to see how much smaller it makes the io_uring change - or even > walked through from a theoretical perspective? Yes, we did the mental exercise, which is why we're converting to pp. I don't see many opportunities for reuse for the main data path, potentially apart from using the iov format instead of pages. If the goal is to minimise the amount of code, it can mimic the tcp devmem api with netlink, ioctl-ish buffer return, but that'd be a pretty bad api for io_uring, overly complicated and limiting optimisation options. If not, then we have to do some buffer management in io_uring, and I don't see anything wrong with that. It shouldn't be a burden for networking if all that extra code is contained in io_uring and only exposed via pp ops and following the rules. -- Pavel Begunkov