From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: [PATCHSET v15] io_uring IO interface
To: Marek Majkowski
Cc: Avi Kivity, hch@lst.de, Jann Horn, jmoyer@redhat.com, linux-aio@kvack.org, linux-api@vger.kernel.org, linux-block@vger.kernel.org, viro@zeniv.linux.org.uk
References: <20190211190049.7888-1-axboe@kernel.dk> <20190221121022.7867-1-marek@cloudflare.com>
From: Jens Axboe
Date: Fri, 22 Feb 2019 15:32:49 -0700
X-Mailing-List: linux-block@vger.kernel.org

On 2/22/19 8:01 AM, Marek Majkowski wrote:
> On Thu, Feb 21, 2019 at 6:48 PM Jens Axboe wrote:
>>
>> On 2/21/19 5:10 AM, Marek Majkowski wrote:
>>>> From: Jens Axboe
>>>> Subject: [PATCHSET v15] io_uring IO interface
>>>> Message-ID: <20190211190049.7888-1-axboe@kernel.dk> (raw)
>>>>
>>>> Some final tweaks, mostly cosmetic, but also two important fixes:
>>>>
>>>> 1) Ensure that we account the skb appropriately against the socket.
>>>>    Some network config options apparently return an skb with
>>>>    ->truesize != 0 when allocated with a size of 0, ensure we add
>>>>    those as references against sock->sk_wmem_alloc. Reported by
>>>>    Matt Mullins.
>>>
>>> Jens,
>>>
>>> I tried using io_uring with network sockets. It seems to be doing the
>>> right thing. One bit is missing though: "flags" as in recv(2).
>>>
>>> In a perfect world I would like to specify at least:
>>> - MSG_DONTWAIT
>>> - MSG_WAITALL
>>> - MSG_NOSIGNAL
>>>
>>> Right now, unless I'm missing something, io_uring_sqe doesn't have a
>>> place where we could store these. "flags" is needed for any
>>> non-trivial network I/O.
>>
>> We have flags for sqes, depending on the type. You can add to the
>> union that already holds rw_flags/fsync_flags/poll_events? There's
>> also a (smaller) flags field that applies for all types, which
>> currently only holds the fixed file flag.
>
> The "sqe->flags" right now is used by the IOSQE_FIXED_FILE which has
> the same value as MSG_OOB.
>
> Sticking recv/send flags into the "rw_flags" union perhaps could work,
> barring the discussion about naming. The obvious names don't make
> sense. recv_flags, send_flags or socket_flags don't sound right.
>
> If we tried to add networking stuff to io_uring (for batching and async), then:
> - send()/recv() could work, only needs the "flags" field
> - sendmsg()/recvmsg() likewise
> - sendto()/recvfrom() require two more pointers: (struct sockaddr
>   *dest_addr, socklen_t addrlen)
> - sendmmsg() / recvmmsg() are perhaps irrelevant
>
> Non-blocking stuff like socket(), setsockopt(), bind() perhaps don't
> need to be considered, although could benefit from batching.

If we just do separate opcodes for them, then there's 32 bits of flag
space for each one. That should be more than adequate.

> Not sure what to think about connect() and accept(). In the
> prehistoric days there seems to have been an attempt to add socket
> things to libaio struct iocb. See:
>
> https://code.woboq.org/linux/include/libaio.h.html#iocb::(anonymous)::saddr
>
> struct iocb {
>     ...
>     union {
>         ...
>         struct io_iocb_sockaddr saddr;
>     } u;
> };
>
> Are there chances of reserving space for two pointers in io_uring_sqe,
> which could be used for sendto/recvfrom/accept if we decided to add
> more network support?

There is already space for that. We have 3 x 64 bits at the end of the
sqe, 16 of those are used for the fixed buffers, which networking
probably wants to support as well. But that still leaves 176 bits
(192 - 16) of space for opcode-specific data.
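
To make the layout argument concrete, here is a rough sketch of how a
recv-style opcode could carry its flags in the existing 32-bit union and
how much of the trailing area stays free. This is an illustration under
assumptions, not the kernel header: "sketch_sqe", "msg_flags",
"SKETCH_OP_RECV", "sketch_prep_recv" and "sketch_tail_free_bits" are
hypothetical names, and the field order simply mirrors the sqe as
described in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>   /* MSG_WAITALL, MSG_NOSIGNAL */

/*
 * Illustrative mirror of the sqe layout discussed in this thread.
 * Not the kernel header; all "sketch" names are hypothetical.
 */
struct sketch_sqe {
	uint8_t  opcode;
	uint8_t  flags;          /* small per-sqe flags (fixed file flag) */
	uint16_t ioprio;
	int32_t  fd;
	uint64_t off;
	uint64_t addr;
	uint32_t len;
	union {                  /* 32 bits of opcode-specific flags */
		int32_t  rw_flags;
		uint32_t fsync_flags;
		uint16_t poll_events;
		uint32_t msg_flags;  /* hypothetical: recv(2)/send(2) flags */
	};
	uint64_t user_data;
	union {                  /* 3 x 64 bits at the end of the sqe */
		uint16_t buf_index;  /* 16 bits for fixed buffers */
		uint64_t pad[3];
	};
};

static_assert(sizeof(struct sketch_sqe) == 64, "sketch sqe should be 64 bytes");

enum { SKETCH_OP_RECV = 200 };   /* hypothetical opcode value */

/* Bits left in the trailing area once the fixed buffer index is taken. */
static unsigned sketch_tail_free_bits(void)
{
	unsigned tail_bits = 8u * (unsigned)sizeof(((struct sketch_sqe *)0)->pad);
	unsigned used_bits = 8u * (unsigned)sizeof(((struct sketch_sqe *)0)->buf_index);

	return tail_bits - used_bits;
}

/*
 * Fill an sqe for a recv-like request. The MSG_* flags go into the
 * opcode-specific union, so they cannot collide with sqe->flags.
 */
static void sketch_prep_recv(struct sketch_sqe *sqe, int fd, void *buf,
			     uint32_t len, uint32_t msg_flags)
{
	memset(sqe, 0, sizeof(*sqe));
	sqe->opcode = SKETCH_OP_RECV;
	sqe->fd = fd;
	sqe->addr = (uint64_t)(uintptr_t)buf;
	sqe->len = len;
	sqe->msg_flags = msg_flags;
}
```

Giving each opcode its own member in the union sidesteps the
IOSQE_FIXED_FILE vs. MSG_OOB overlap mentioned above, since the two
values never share a field. A caller would do something like
sketch_prep_recv(&sqe, fd, buf, len, MSG_WAITALL | MSG_NOSIGNAL).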
-- 
Jens Axboe