All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sedat Dilek <sedat.dilek@gmail.com>
To: Jens Axboe <axboe@kernel.dk>
Cc: io-uring@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	akpm@linux-foundation.org
Subject: Re: [PATCHSET v5 0/12] Add support for async buffered reads
Date: Thu, 28 May 2020 20:20:50 +0200	[thread overview]
Message-ID: <CA+icZUUnXYO9qX3PJTs=jJXT_1F_d8x5oUfuxkR_mViyfkFDOQ@mail.gmail.com> (raw)
In-Reply-To: <fd169130-6ac4-f135-d85f-56daa25c8c9f@kernel.dk>

On Thu, May 28, 2020 at 7:14 PM Jens Axboe <axboe@kernel.dk> wrote:
>
> On 5/28/20 11:12 AM, Sedat Dilek wrote:
> > On Thu, May 28, 2020 at 7:06 PM Jens Axboe <axboe@kernel.dk> wrote:
> >>
> >> On 5/28/20 11:02 AM, Sedat Dilek wrote:
> >>> On Tue, May 26, 2020 at 10:59 PM Jens Axboe <axboe@kernel.dk> wrote:
> >>>>
> >>>> We technically support this already through io_uring, but it's
> >>>> implemented with a thread backend to support cases where we would
> >>>> block. This isn't ideal.
> >>>>
> >>>> After a few prep patches, the core of this patchset is adding support
> >>>> for async callbacks on page unlock. With this primitive, we can simply
> >>>> retry the IO operation. With io_uring, this works a lot like poll based
> >>>> retry for files that support it. If a page is currently locked and
> >>>> needed, -EIOCBQUEUED is returned with a callback armed. The callers
> >>>> callback is responsible for restarting the operation.
> >>>>
> >>>> With this callback primitive, we can add support for
> >>>> generic_file_buffered_read(), which is what most file systems end up
> >>>> using for buffered reads. XFS/ext4/btrfs/bdev is wired up, but probably
> >>>> trivial to add more.
> >>>>
> >>>> The file flags support for this by setting FMODE_BUF_RASYNC, similar
> >>>> to what we do for FMODE_NOWAIT. Open to suggestions here if this is
> >>>> the preferred method or not.
> >>>>
> >>>> In terms of results, I wrote a small test app that randomly reads 4G
> >>>> of data in 4K chunks from a file hosted by ext4. The app uses a queue
> >>>> depth of 32. If you want to test yourself, you can just use buffered=1
> >>>> with ioengine=io_uring with fio. No application changes are needed to
> >>>> use the more optimized buffered async read.
> >>>>
> >>>> preadv for comparison:
> >>>>         real    1m13.821s
> >>>>         user    0m0.558s
> >>>>         sys     0m11.125s
> >>>>         CPU     ~13%
> >>>>
> >>>> Mainline:
> >>>>         real    0m12.054s
> >>>>         user    0m0.111s
> >>>>         sys     0m5.659s
> >>>>         CPU     ~32% + ~50% == ~82%
> >>>>
> >>>> This patchset:
> >>>>         real    0m9.283s
> >>>>         user    0m0.147s
> >>>>         sys     0m4.619s
> >>>>         CPU     ~52%
> >>>>
> >>>> The CPU numbers are just a rough estimate. For the mainline io_uring
> >>>> run, this includes the app itself and all the threads doing IO on its
> >>>> behalf (32% for the app, ~1.6% per worker and 32 of them). Context
> >>>> switch rate is much smaller with the patchset, since we only have the
> >>>> one task performing IO.
> >>>>
> >>>> Also ran a simple fio based test case, varying the queue depth from 1
> >>>> to 16, doubling every time:
> >>>>
> >>>> [buf-test]
> >>>> filename=/data/file
> >>>> direct=0
> >>>> ioengine=io_uring
> >>>> norandommap
> >>>> rw=randread
> >>>> bs=4k
> >>>> iodepth=${QD}
> >>>> randseed=89
> >>>> runtime=10s
> >>>>
> >>>> QD/Test         Patchset IOPS           Mainline IOPS
> >>>> 1               9046                    8294
> >>>> 2               19.8k                   18.9k
> >>>> 4               39.2k                   28.5k
> >>>> 8               64.4k                   31.4k
> >>>> 16              65.7k                   37.8k
> >>>>
> >>>> Outside of my usual environment, so this is just running on a virtualized
> >>>> NVMe device in qemu, using ext4 as the file system. NVMe isn't very
> >>>> efficient virtualized, so we run out of steam at ~65K which is why we
> >>>> flatline on the patched side (nvme_submit_cmd() eats ~75% of the test app
> >>>> CPU). Before that happens, it's a linear increase. Not shown is context
> >>>> switch rate, which is massively lower with the new code. The old thread
> >>>> offload adds a blocking thread per pending IO, so context rate quickly
> >>>> goes through the roof.
> >>>>
> >>>> The goal here is efficiency. Async thread offload adds latency, and
> >>>> it also adds noticable overhead on items such as adding pages to the
> >>>> page cache. By allowing proper async buffered read support, we don't
> >>>> have X threads hammering on the same inode page cache, we have just
> >>>> the single app actually doing IO.
> >>>>
> >>>> Been beating on this and it's solid for me, and I'm now pretty happy
> >>>> with how it all turned out. Not aware of any missing bits/pieces or
> >>>> code cleanups that need doing.
> >>>>
> >>>> Series can also be found here:
> >>>>
> >>>> https://git.kernel.dk/cgit/linux-block/log/?h=async-buffered.5
> >>>>
> >>>> or pull from:
> >>>>
> >>>> git://git.kernel.dk/linux-block async-buffered.5
> >>>>
> >>>
> >>> Hi Jens,
> >>>
> >>> I have pulled linux-block.git#async-buffered.5 on top of Linux v5.7-rc7.
> >>>
> >>> From first feelings:
> >>> The booting into the system (until sddm display-login-manager) took a
> >>> bit longer.
> >>> The same after login and booting into KDE/Plasma.
> >>
> >> There is no difference for "regular" use cases, only io_uring with
> >> buffered reads will behave differently. So I don't think you have longer
> >> boot times due to this.
> >>
> >>> I am building/linking with LLVM/Clang/LLD v10.0.1-rc1 on Debian/testing AMD64.
> >>>
> >>> Here I have an internal HDD (SATA) and my Debian-system is on an
> >>> external HDD connected via USB-3.0.
> >>> Primarily, I use Ext4-FS.
> >>>
> >>> As said above is the "emotional" side, but I need some technical instructions.
> >>>
> >>> How can I see Async Buffer Reads is active on a Ext4-FS-formatted partition?
> >>
> >> You can't see that. It'll always be available on ext4 with this series,
> >> and you can watch io_uring instances to see if anyone is using it.
> >>
> >
> > Thanks for answering my questions.
> >
> > How can I "watch io_uring instances"?
>
> You can enable io_uring tracing:
>
> # echo 1 > /sys/kernel/debug/tracing/events/io_uring/io_uring_create/enable
> # tail /sys/kernel/debug/tracing/trace
>
> and see if you get any events for setup. Generally you can also look for
> the existence of io_wq_manager processes, these will exist for an
> io_uring instance.
>
> > FIO?
> > Debian has fio version 3.19-2 in its apt repositories.
> > Version OK?
>
> Yeah that should work.
>

I did:

# echo 1 > /sys/kernel/debug/tracing/events/io_uring/io_uring_create/enable
# cat /sys/kernel/debug/tracing/events/io_uring/io_uring_create/enable
1

# cat ./buf-test-dileks-min
[buf-test-dileks-min]
filename=/path/to/iso-image-file
buffered=1
ioengine=io_uring

# fio --showcmd ./buf-test-dileks-min
fio --name=buf-test-dileks-min --buffered=1 --ioengine=io_uring
--filename=filename=/path/to/iso-image-file

# fio ./buf-test-dileks-min

# ps -ef | egrep 'f[i]o|i[o]_wq'
root        6695    6066 24 20:13 pts/2    00:00:00 fio ./buf-test-dileks-min
root        6701    6695 22 20:13 ?        00:00:00 fio ./buf-test-dileks-min
root        6702       2  0 20:13 ?        00:00:00 [io_wq_manager]
root        6703       2  0 20:13 ?        00:00:00 [io_wqe_worker-0]

# LC_ALL=C tail -f /sys/kernel/debug/tracing/trace
...
# entries-in-buffer/entries-written: 16/16   #P:4
#
#                              _-----=> irqs-off
#                             / _----=> need-resched
#                            | / _---=> hardirq/softirq
#                            || / _--=> preempt-depth
#                            ||| /     delay
#           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
#              | |       |   ||||       |         |
...
             fio-6701  [001] ....  6775.117015: io_uring_create: ring
00000000ef052188, fd 5 sq size 1, cq size 2, flags 0

Looks like this works.

Thanks Jens.

- Sedat -

  reply	other threads:[~2020-05-28 18:20 UTC|newest]

Thread overview: 81+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-26 19:51 [PATCHSET v5 0/12] Add support for async buffered reads Jens Axboe
2020-05-26 19:51 ` [PATCH 01/12] block: read-ahead submission should imply no-wait as well Jens Axboe
2020-05-26 19:51 ` [PATCH 02/12] mm: allow read-ahead with IOCB_NOWAIT set Jens Axboe
2020-05-26 20:45   ` Johannes Weiner
2020-05-26 19:51 ` [PATCH 03/12] mm: abstract out wake_page_match() from wake_page_function() Jens Axboe
2020-05-26 21:02   ` Johannes Weiner
2020-06-01 14:16   ` Matthew Wilcox
2020-05-26 19:51 ` [PATCH 04/12] mm: add support for async page locking Jens Axboe
2020-05-26 21:59   ` Johannes Weiner
2020-05-26 22:01     ` Jens Axboe
2020-05-27 16:02       ` Johannes Weiner
2020-06-01 14:26   ` Matthew Wilcox
2020-06-01 17:15     ` Jens Axboe
2020-05-26 19:51 ` [PATCH 05/12] mm: support async buffered reads in generic_file_buffered_read() Jens Axboe
2020-05-26 22:00   ` Johannes Weiner
2020-05-26 19:51 ` [PATCH 06/12] fs: add FMODE_BUF_RASYNC Jens Axboe
2020-05-26 19:51 ` [PATCH 07/12] ext4: flag as supporting buffered async reads Jens Axboe
2020-05-26 19:51 ` [PATCH 08/12] block: flag block devices as supporting IOCB_WAITQ Jens Axboe
2020-05-26 19:51 ` [PATCH 09/12] xfs: flag files as supporting buffered async reads Jens Axboe
2020-05-28 17:53   ` Darrick J. Wong
2020-05-28 19:23     ` Jens Axboe
2020-05-26 19:51 ` [PATCH 10/12] btrfs: " Jens Axboe
2020-05-26 19:57   ` Chris Mason
2020-05-26 19:51 ` [PATCH 11/12] mm: add kiocb_wait_page_queue_init() helper Jens Axboe
2020-05-26 22:01   ` Johannes Weiner
2020-05-26 19:51 ` [PATCH 12/12] io_uring: support true async buffered reads, if file provides it Jens Axboe
2020-05-28 17:02 ` [PATCHSET v5 0/12] Add support for async buffered reads Sedat Dilek
2020-05-28 17:02   ` Sedat Dilek
2020-05-28 17:06   ` Jens Axboe
2020-05-28 17:12     ` Sedat Dilek
2020-05-28 17:12       ` Sedat Dilek
2020-05-28 17:14       ` Jens Axboe
2020-05-28 18:20         ` Sedat Dilek [this message]
2020-05-28 18:20           ` Sedat Dilek
2020-05-29 10:02     ` Sedat Dilek
2020-05-29 10:02       ` Sedat Dilek
2020-05-29 11:22       ` Sedat Dilek
2020-05-29 11:22         ` Sedat Dilek
2020-05-30 13:36         ` Sedat Dilek
2020-05-30 13:36           ` Sedat Dilek
2020-05-30 18:57           ` Sedat Dilek
2020-05-30 18:57             ` Sedat Dilek
2020-05-31  1:57             ` Jens Axboe
2020-05-31  7:04               ` Sedat Dilek
2020-05-31  7:04                 ` Sedat Dilek
2020-05-31  7:12                 ` Sedat Dilek
2020-05-31  7:12                   ` Sedat Dilek
2020-06-01 13:35                   ` Sedat Dilek
2020-06-01 13:35                     ` Sedat Dilek
2020-06-01 14:04                     ` Jens Axboe
2020-06-01 14:13                       ` Sedat Dilek
2020-06-01 14:13                         ` Sedat Dilek
2020-06-01 14:14                         ` Jens Axboe
2020-06-01 14:35                           ` Jens Axboe
2020-06-01 14:43                             ` Sedat Dilek
2020-06-01 14:43                               ` Sedat Dilek
2020-06-01 14:46                               ` Jens Axboe
2020-06-01 14:51                                 ` Sedat Dilek
2020-06-01 14:51                                   ` Sedat Dilek
2020-06-01 20:18                                 ` Sedat Dilek
2020-06-01 20:18                                   ` Sedat Dilek
2020-06-04  0:59 ` Andres Freund
2020-06-04  1:04   ` Jens Axboe
2020-06-04  1:30     ` Andres Freund
2020-06-05 19:56       ` Andres Freund
2020-06-05 14:42     ` Jens Axboe
2020-06-05 20:20       ` Andres Freund
2020-06-05 20:21         ` Jens Axboe
2020-06-05 20:36         ` Andres Freund
2020-06-05 20:53           ` Jens Axboe
2020-06-05 21:13             ` Jens Axboe
2020-06-05 21:21               ` Jens Axboe
2020-06-05 22:30                 ` Andres Freund
2020-06-05 22:36                   ` Andres Freund
2020-06-05 22:49                     ` Jens Axboe
2020-06-05 22:54                       ` Andres Freund
2020-06-05 22:56                         ` Jens Axboe
2020-06-05 23:02                           ` Andres Freund
2020-06-06  0:33                 ` Sedat Dilek
2020-06-06  0:33                   ` Sedat Dilek
2020-06-06 16:04                   ` Jens Axboe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CA+icZUUnXYO9qX3PJTs=jJXT_1F_d8x5oUfuxkR_mViyfkFDOQ@mail.gmail.com' \
    --to=sedat.dilek@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=axboe@kernel.dk \
    --cc=io-uring@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.