From: Jens Axboe <axboe@kernel.dk>
To: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>,
	io-uring@vger.kernel.org
Cc: joseph.qi@linux.alibaba.com
Subject: Re: [PATCH] __io_uring_get_cqe: eliminate unnecessary io_uring_enter() syscalls
Date: Mon, 2 Mar 2020 08:24:03 -0700
Message-ID: <5370d9cf-2ca6-53bc-0e32-544a43ca88a3@kernel.dk>
In-Reply-To: <91e11a5a-1880-8ce3-18c5-6843abd2cf2b@kernel.dk>

On 3/2/20 7:05 AM, Jens Axboe wrote:
> On 3/1/20 9:18 PM, Xiaoguang Wang wrote:
>> When an application follows a submit-one-sqe-then-wait-for-its-completion
>> programming model, __io_uring_get_cqe() results in many unnecessary
>> syscalls; see the test program below:
>>
>>     #define _GNU_SOURCE         /* for O_DIRECT */
>>     #include <stdio.h>
>>     #include <stdlib.h>
>>     #include <string.h>
>>     #include <fcntl.h>
>>     #include <unistd.h>
>>     #include "liburing.h"
>>
>>     int main(int argc, char *argv[])
>>     {
>>             struct io_uring ring;
>>             int fd, ret;
>>             struct io_uring_sqe *sqe;
>>             struct io_uring_cqe *cqe;
>>             struct iovec iov;
>>             off_t offset, filesize = 0;
>>             void *buf;
>>
>>             if (argc < 2) {
>>                     printf("%s: file\n", argv[0]);
>>                     return 1;
>>             }
>>
>>             ret = io_uring_queue_init(4, &ring, 0);
>>             if (ret < 0) {
>>                     fprintf(stderr, "queue_init: %s\n", strerror(-ret));
>>                     return 1;
>>             }
>>
>>             fd = open(argv[1], O_RDONLY | O_DIRECT);
>>             if (fd < 0) {
>>                     perror("open");
>>                     return 1;
>>             }
>>
>>             if (posix_memalign(&buf, 4096, 4096))
>>                     return 1;
>>             iov.iov_base = buf;
>>             iov.iov_len = 4096;
>>
>>             offset = 0;
>>             do {
>>                     sqe = io_uring_get_sqe(&ring);
>>                     if (!sqe) {
>>                             printf("here\n");
>>                             break;
>>                     }
>>                     io_uring_prep_readv(sqe, fd, &iov, 1, offset);
>>
>>                     ret = io_uring_submit(&ring);
>>                     if (ret < 0) {
>>                             fprintf(stderr, "io_uring_submit: %s\n", strerror(-ret));
>>                             return 1;
>>                     }
>>
>>                     ret = io_uring_wait_cqe(&ring, &cqe);
>>                     if (ret < 0) {
>>                             fprintf(stderr, "io_uring_wait_cqe: %s\n", strerror(-ret));
>>                             return 1;
>>                     }
>>
>>                     if (cqe->res <= 0) {
>>                             if (cqe->res < 0) {
>>                                     fprintf(stderr, "got error: %d\n", cqe->res);
>>                                     ret = 1;
>>                             }
>>                             io_uring_cqe_seen(&ring, cqe);
>>                             break;
>>                     }
>>                     offset += cqe->res;
>>                     filesize += cqe->res;
>>                     io_uring_cqe_seen(&ring, cqe);
>>             } while (1);
>>
>>             printf("filesize: %ld\n", filesize);
>>             close(fd);
>>             io_uring_queue_exit(&ring);
>>             return 0;
>>     }
>>
>> dd if=/dev/zero of=testfile bs=4096 count=16
>> ./test  testfile
>> then use bpftrace to trace the io_uring_enter syscalls. With the original code:
>> [lege@localhost ~]$ sudo bpftrace -e "tracepoint:syscalls:sys_enter_io_uring_enter {@c[tid] = count();}"
>> Attaching 1 probe...
>> @c[11184]: 49
>> The test above issues 49 syscalls, which is counterintuitive. Looking
>> into the code, __io_uring_get_cqe() issues one syscall too many: when
>> it makes its first syscall, one cqe should already be ready on return,
>> so we don't need to wait again.
>>
>> To fix this, set wait_nr to zero after the first syscall. With this
>> patch, bpftrace shows 33 io_uring_enter syscalls instead.
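
For readers not steeped in liburing internals, here is a minimal sketch of
the idea. The helper names (sys_io_uring_enter() for the raw-syscall
wrapper, peek_cqe() for the internal "is a cqe already available?" check)
and the loop structure are illustrative, not the liburing source of the
time:

    /* Sketch only; helper names are stand-ins, not real liburing code. */
    static int get_cqe_sketch(struct io_uring *ring,
                              struct io_uring_cqe **cqe_ptr,
                              unsigned submit, unsigned wait_nr)
    {
            int ret;

            do {
                    /* A cqe may already be in the ring; no syscall needed. */
                    if (peek_cqe(ring, cqe_ptr) == 0 && *cqe_ptr)
                            return 0;

                    ret = sys_io_uring_enter(ring->ring_fd, submit, wait_nr,
                                             IORING_ENTER_GETEVENTS, NULL);
                    if (ret < 0)
                            return -errno;

                    /*
                     * The patch's idea: the enter call above already waited
                     * for wait_nr completions, so the next loop iteration
                     * should find the cqe via the peek. Clearing submit and
                     * wait_nr keeps that iteration from making a second,
                     * unnecessary syscall just to wait again.
                     */
                    submit = 0;
                    wait_nr = 0;
            } while (1);
    }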
> 
> Thanks, that's a nice fix, we definitely don't want to be doing
> 50% more system calls than we have to...

Actually, I don't think the fix is quite safe. For one, if we get an error
on the __io_uring_enter(), then we may not have waited for entries. Or if
we submitted less than we thought we would, we would not have waited
either. So we need to check for full success before deeming it safe to
clear wait_nr.
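
In terms of the sketch above (same illustrative helper names, not the real
liburing code), that means clearing wait_nr only when the enter call
reports full success:

            ret = sys_io_uring_enter(ring->ring_fd, submit, wait_nr,
                                     IORING_ENTER_GETEVENTS, NULL);
            if (ret < 0)
                    return -errno;

            /*
             * Only a call that submitted everything we asked for is
             * guaranteed to have also waited for wait_nr completions;
             * on an error or a short submit the wait may not have
             * happened, so wait_nr must stay set for the next pass.
             */
            if ((unsigned) ret == submit) {
                    submit = 0;
                    wait_nr = 0;
            } else {
                    submit -= ret;  /* retry the remaining sqes */
            }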

-- 
Jens Axboe



Thread overview: 8+ messages
2020-03-02  4:18 [PATCH] __io_uring_get_cqe: eliminate unnecessary io_uring_enter() syscalls Xiaoguang Wang
2020-03-02 14:05 ` Jens Axboe
2020-03-02 15:24   ` Jens Axboe [this message]
2020-03-02 15:37     ` Jens Axboe
2020-03-03 13:11       ` Xiaoguang Wang
2020-03-03 14:35         ` Jens Axboe
2020-03-04 13:27       ` Xiaoguang Wang
2020-03-04 13:57         ` Jens Axboe
