From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7C7C0C001B2 for ; Thu, 9 Nov 2023 16:07:22 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1343776AbjKIQHX (ORCPT ); Thu, 9 Nov 2023 11:07:23 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47064 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234482AbjKIQHV (ORCPT ); Thu, 9 Nov 2023 11:07:21 -0500 Received: from mail-wr1-x42c.google.com (mail-wr1-x42c.google.com [IPv6:2a00:1450:4864:20::42c]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4EFB9325A; Thu, 9 Nov 2023 08:07:19 -0800 (PST) Received: by mail-wr1-x42c.google.com with SMTP id ffacd0b85a97d-32da4ffd7e5so666291f8f.0; Thu, 09 Nov 2023 08:07:19 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1699546038; x=1700150838; darn=vger.kernel.org; h=content-transfer-encoding:content-language:in-reply-to:mime-version :user-agent:date:message-id:from:references:cc:to:subject:from:to:cc :subject:date:message-id:reply-to; bh=bLnVRYPlFfc13GAjjN2Xsp7jGpHoZsbxAdnsHea6XjM=; b=S1PdBEEbQjC//UPEJlQzyAeUAkuHeKKJJ9YxSwGJsBWIinlJkTs+futdHjvOEVf1FC i1Lq84/hnwkn4XeC5I0BBN/0oIHI5af7Fi3x6itpWXj6tG56p2o+2ARGn2y3UatsLXoM 2BRCpof7VunFpfVP3P0nRrTARM2YsPO0VE0dPjXa8wqDzjKgoixES1wF4kmsqRx0ezw9 h3fikdiuHtuXCTH3ejazBRyAvX0r13K29IX5G5vNdpNq2cBxgpRFb+JkL0Tem9jL0+K7 pAhZsBr6i3kJ2w4D3MLLjQvI1ypeSrlzcBEEphirMcF7RjNd0iiclyA2FqT68ZsCnMv5 hM3A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1699546038; x=1700150838; h=content-transfer-encoding:content-language:in-reply-to:mime-version :user-agent:date:message-id:from:references:cc:to:subject :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=bLnVRYPlFfc13GAjjN2Xsp7jGpHoZsbxAdnsHea6XjM=; b=R/MbwkS/v8iIbH06fbpsFkDdZD9+AYldSwwIVdg6X9PNPW5NkpenkDc9jDsYzVHhTT J7VrGwVPZB1LflTmKfBY70EhNcRqlMD+LA/5rW2XJg0SdxntYLDp1NQWRunqbhaJU1xL R/nOoJEzR6nzeRqBGN8bC1Qmhe4pReIQ4u/3QvTL83ZSyyHXzyc1nsSFDuwXSgRTlGQ5 AEwJ/V44dN0PXF/XYwUkQOkK0XgXxazJi4TwIMlWcGFERrmzfY9SWsBHzBRfbMqkQzfv 4Sz63UHyC5/kfLXFw0DSlEyRgW4+jh2nSoqU/uXcTN16mJVwgTt1V7mIz/OjCDpj+naD /Osw== X-Gm-Message-State: AOJu0YwiZkdiOR1l2yn/EgV10xBCIJqDIqkgH4smPRNFKS8MUyVbibSC xeGsTXVc6tTg8V3ClCJS6Yk= X-Google-Smtp-Source: AGHT+IGVRroWe9+M86SBSGyIl3Qj9MvOmsO9RPSIFo66Ff8BWlVF0jn+QQTFhXSBQi9W892f6tI7Bg== X-Received: by 2002:a05:6000:1888:b0:32f:87e8:707d with SMTP id a8-20020a056000188800b0032f87e8707dmr7113717wri.5.1699546037464; Thu, 09 Nov 2023 08:07:17 -0800 (PST) Received: from [192.168.1.122] (cpc159313-cmbg20-2-0-cust161.5-4.cable.virginm.net. [82.0.78.162]) by smtp.gmail.com with ESMTPSA id t10-20020a5d42ca000000b0032f7fab0712sm7577069wrr.52.2023.11.09.08.07.16 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Thu, 09 Nov 2023 08:07:17 -0800 (PST) Subject: Re: [RFC PATCH v3 10/12] tcp: RX path for devmem TCP To: Mina Almasry , David Ahern Cc: Willem de Bruijn , Stanislav Fomichev , netdev@vger.kernel.org, linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-media@vger.kernel.org, dri-devel@lists.freedesktop.org, linaro-mm-sig@lists.linaro.org, "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Jesper Dangaard Brouer , Ilias Apalodimas , Arnd Bergmann , Shuah Khan , Sumit Semwal , =?UTF-8?Q?Christian_K=c3=b6nig?= , Shakeel Butt , Jeroen de Borst , Praveen Kaligineedi , Willem de Bruijn , Kaiyuan Zhang References: <20231106024413.2801438-1-almasrymina@google.com> <20231106024413.2801438-11-almasrymina@google.com> <93eb6a2b-a991-40ca-8f26-f520c986729a@kernel.org> <7ce2d027-1e02-4a63-afb7-7304fbfbdf90@kernel.org> From: Edward Cree Message-ID: <6f853286-e463-b684-cc1e-405119528697@gmail.com> Date: Thu, 9 Nov 2023 16:07:15 +0000 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.14.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8 Content-Language: en-GB Content-Transfer-Encoding: 8bit Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 09/11/2023 02:39, Mina Almasry wrote: > On Wed, Nov 8, 2023 at 7:36 AM Edward Cree wrote: >> If not then surely the way to return a memory area >> in an io_uring idiom is just to post a new read sqe ('RX descriptor') >> pointing into it, rather than explicitly returning it with setsockopt. > > We're interested in using this with regular TCP sockets, not > necessarily io_uring. Fair. I just wanted to push against the suggestion upthread that "oh, since io_uring supports setsockopt() we can just ignore it and it'll all magically work later" (paraphrased). If you can keep the "allocate buffers out of a devmem region" and "post RX descriptors built on those buffers" APIs separate (inside the kernel; obviously both triggered by a single call to the setsockopt() uAPI) that'll likely make things simpler for the io_uring interface I describe, which will only want the latter. -ed PS: Here's a crazy idea that I haven't thought through at all: what if you allow device memory to be mmap()ed into process address space (obviously with none of r/w/x because it's unreachable), so that your various uAPIs can just operate on pointers (e.g. the setsockopt becomes the madvise it's named after; recvmsg just uses or populates the iovec rather than needing a cmsg). Then if future devices have their memory CXL accessible that can potentially be enabled with no change to the uAPI (userland just starts being able to access the region without faulting). And you can maybe add a semantic flag to recvmsg saying "if you don't use all the buffers in my iovec, keep hold of the rest of them for future incoming traffic, and if I post new buffers with my next recvmsg, add those to the tail of the RXQ rather than replacing the ones you've got". That way you can still have the "userland directly fills the RX ring" behaviour even with TCP sockets.