All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCHSET v3 0/5] Support for RWF_UNCACHED
@ 2019-12-11 15:29 Jens Axboe
  2019-12-11 15:29 ` [PATCH 1/5] fs: add read support " Jens Axboe
                   ` (6 more replies)
  0 siblings, 7 replies; 72+ messages in thread
From: Jens Axboe @ 2019-12-11 15:29 UTC (permalink / raw)
  To: linux-mm, linux-fsdevel, linux-block; +Cc: willy, clm, torvalds, david

Recently someone asked me how io_uring buffered IO compares to mmaped
IO in terms of performance. So I ran some tests with buffered IO, and
found the experience to be somewhat painful. The test case is pretty
basic, random reads over a dataset that's 10x the size of RAM.
Performance starts out fine, and then the page cache fills up and we
hit a throughput cliff. CPU usage of the IO threads go up, and we have
kswapd spending 100% of a core trying to keep up. Seeing that, I was
reminded of the many complaints I here about buffered IO, and the fact
that most of the folks complaining will ultimately bite the bullet and
move to O_DIRECT to just get the kernel out of the way.

But I don't think it needs to be like that. Switching to O_DIRECT isn't
always easily doable. The buffers have different life times, size and
alignment constraints, etc. On top of that, mixing buffered and O_DIRECT
can be painful.

Seems to me that we have an opportunity to provide something that sits
somewhere in between buffered and O_DIRECT, and this is where
RWF_UNCACHED enters the picture. If this flag is set on IO, we get the
following behavior:

- If the data is in cache, it remains in cache and the copy (in or out)
  is served to/from that.

- If the data is NOT in cache, we add it while performing the IO. When
  the IO is done, we remove it again.

With this, I can do 100% smooth buffered reads or writes without pushing
the kernel to the state where kswapd is sweating bullets. In fact it
doesn't even register.

Comments appreciated! This should work on any standard file system,
using either the generic helpers or iomap. I have tested ext4 and xfs
for the right read/write behavior, but no further validation has been
done yet. Patches are against current git, and can also be found here:

https://git.kernel.dk/cgit/linux-block/log/?h=buffered-uncached

 fs/ceph/file.c          |  2 +-
 fs/dax.c                |  2 +-
 fs/ext4/file.c          |  2 +-
 fs/iomap/apply.c        | 26 ++++++++++-
 fs/iomap/buffered-io.c  | 54 ++++++++++++++++-------
 fs/iomap/direct-io.c    |  3 +-
 fs/iomap/fiemap.c       |  5 ++-
 fs/iomap/seek.c         |  6 ++-
 fs/iomap/swapfile.c     |  2 +-
 fs/nfs/file.c           |  2 +-
 include/linux/fs.h      |  7 ++-
 include/linux/iomap.h   | 10 ++++-
 include/uapi/linux/fs.h |  5 ++-
 mm/filemap.c            | 95 ++++++++++++++++++++++++++++++++++++-----
 14 files changed, 181 insertions(+), 40 deletions(-)

Changes since v2:
- Rework the write side according to Chinners suggestions. Much cleaner
  this way. It does mean that we invalidate the full write region if just
  ONE page (or more) had to be created, where before it was more granular.
  I don't think that's a concern, and on the plus side, we now no longer
  have to chunk invalidations into 15/16 pages at the time.
- Cleanups

Changes since v1:
- Switch to pagevecs for write_drop_cached_pages()
- Use page_offset() instead of manual shift
- Ensure we hold a reference on the page between calling ->write_end()
  and checking the mapping on the locked page
- Fix XFS multi-page streamed writes, we'd drop the UNCACHED flag after
  the first page

-- 
Jens Axboe



^ permalink raw reply	[flat|nested] 72+ messages in thread
* [PATCHSET v4 0/5] Support for RWF_UNCACHED
@ 2019-12-12 19:01 Jens Axboe
  2019-12-12 19:01 ` [PATCH 1/5] fs: add read support " Jens Axboe
  0 siblings, 1 reply; 72+ messages in thread
From: Jens Axboe @ 2019-12-12 19:01 UTC (permalink / raw)
  To: linux-mm, linux-fsdevel, linux-block; +Cc: willy, clm, torvalds, david

Recently someone asked me how io_uring buffered IO compares to mmaped
IO in terms of performance. So I ran some tests with buffered IO, and
found the experience to be somewhat painful. The test case is pretty
basic, random reads over a dataset that's 10x the size of RAM.
Performance starts out fine, and then the page cache fills up and we
hit a throughput cliff. CPU usage of the IO threads go up, and we have
kswapd spending 100% of a core trying to keep up. Seeing that, I was
reminded of the many complaints I here about buffered IO, and the fact
that most of the folks complaining will ultimately bite the bullet and
move to O_DIRECT to just get the kernel out of the way.

But I don't think it needs to be like that. Switching to O_DIRECT isn't
always easily doable. The buffers have different life times, size and
alignment constraints, etc. On top of that, mixing buffered and O_DIRECT
can be painful.

Seems to me that we have an opportunity to provide something that sits
somewhere in between buffered and O_DIRECT, and this is where
RWF_UNCACHED enters the picture. If this flag is set on IO, we get the
following behavior:

- If the data is in cache, it remains in cache and the copy (in or out)
  is served to/from that. This is true for both reads and writes.

- For writes, if the data is NOT in cache, we add it while performing the
  IO. When the IO is done, we remove it again.

- For reads, if the data is NOT in the cache, we allocate a private page
  and use that for IO. When the IO is done, we free this page. The page
  never sees the page cache.

With this, I can do 100% smooth buffered reads or writes without pushing
the kernel to the state where kswapd is sweating bullets. In fact it
doesn't even register.

Comments appreciated! This should work on any standard file system,
using either the generic helpers or iomap. I have tested ext4 and xfs
for the right read/write behavior, but no further validation has been
done yet. This version contains the bigger prep patch of switching
iomap_apply() and actors to struct iomap_data, and I hope I didn't
mess that up too badly. I'll try and exercise it all, I've done XFS
mounts and reads+writes and it seems happy from that POV at least.

The core of the changes are actually really small, the majority of
the diff is just prep work to get there.

Patches are against current git, and can also be found here:

https://git.kernel.dk/cgit/linux-block/log/?h=buffered-uncached

 fs/ceph/file.c          |   2 +-
 fs/dax.c                |  25 +++--
 fs/ext4/file.c          |   2 +-
 fs/iomap/apply.c        |  50 ++++++---
 fs/iomap/buffered-io.c  | 225 +++++++++++++++++++++++++---------------
 fs/iomap/direct-io.c    |  57 +++++-----
 fs/iomap/fiemap.c       |  48 +++++----
 fs/iomap/seek.c         |  64 +++++++-----
 fs/iomap/swapfile.c     |  27 ++---
 fs/nfs/file.c           |   2 +-
 include/linux/fs.h      |   7 +-
 include/linux/iomap.h   |  20 +++-
 include/uapi/linux/fs.h |   5 +-
 mm/filemap.c            |  89 +++++++++++++---
 14 files changed, 416 insertions(+), 207 deletions(-)

Changes since v3:
- Add iomap_actor_data to cut down on arguments
- Fix bad flag drop in iomap_write_begin()
- Remove unused IOMAP_WRITE_F_UNCACHED flag
- Don't use the page cache at all for reads

Changes since v2:
- Rework the write side according to Chinners suggestions. Much cleaner
  this way. It does mean that we invalidate the full write region if just
  ONE page (or more) had to be created, where before it was more granular.
  I don't think that's a concern, and on the plus side, we now no longer
  have to chunk invalidations into 15/16 pages at the time.
- Cleanups

Changes since v1:
- Switch to pagevecs for write_drop_cached_pages()
- Use page_offset() instead of manual shift
- Ensure we hold a reference on the page between calling ->write_end()
  and checking the mapping on the locked page
- Fix XFS multi-page streamed writes, we'd drop the UNCACHED flag after
  the first page

-- 
Jens Axboe



^ permalink raw reply	[flat|nested] 72+ messages in thread
* [PATCHSET v2 0/5] Support for RWF_UNCACHED
@ 2019-12-10 20:42 Jens Axboe
  2019-12-10 20:43 ` [PATCH 1/5] fs: add read support " Jens Axboe
  0 siblings, 1 reply; 72+ messages in thread
From: Jens Axboe @ 2019-12-10 20:42 UTC (permalink / raw)
  To: linux-mm, linux-fsdevel, linux-block; +Cc: willy, clm

ecently someone asked me how io_uring buffered IO compares to mmaped
IO in terms of performance. So I ran some tests with buffered IO, and
found the experience to be somewhat painful. The test case is pretty
basic, random reads over a dataset that's 10x the size of RAM.
Performance starts out fine, and then the page cache fills up and we
hit a throughput cliff. CPU usage of the IO threads go up, and we have
kswapd spending 100% of a core trying to keep up. Seeing that, I was
reminded of the many complaints I here about buffered IO, and the fact
that most of the folks complaining will ultimately bite the bullet and
move to O_DIRECT to just get the kernel out of the way.

But I don't think it needs to be like that. Switching to O_DIRECT isn't
always easily doable. The buffers have different life times, size and
alignment constraints, etc. On top of that, mixing buffered and O_DIRECT
can be painful.

Seems to me that we have an opportunity to provide something that sits
somewhere in between buffered and O_DIRECT, and this is where
RWF_UNCACHED enters the picture. If this flag is set on IO, we get the
following behavior:

- If the data is in cache, it remains in cache and the copy (in or out)
  is served to/from that.

- If the data is NOT in cache, we add it while performing the IO. When
  the IO is done, we remove it again.

With this, I can do 100% smooth buffered reads or writes without pushing
the kernel to the state where kswapd is sweating bullets. In fact it
doesn't even register.

Comments appreciated! This should work on any standard file system,
using either the generic helpers or iomap. Patches are against current
git, and can also be found here:

https://git.kernel.dk/cgit/linux-block/log/?h=buffered-uncached

 fs/ceph/file.c          |   2 +-
 fs/dax.c                |   2 +-
 fs/ext4/file.c          |   2 +-
 fs/iomap/apply.c        |   2 +-
 fs/iomap/buffered-io.c  |  89 +++++++++++++++++++------
 fs/iomap/direct-io.c    |   3 +-
 fs/iomap/fiemap.c       |   5 +-
 fs/iomap/seek.c         |   6 +-
 fs/iomap/swapfile.c     |   2 +-
 fs/nfs/file.c           |   2 +-
 include/linux/fs.h      |  11 +++-
 include/linux/iomap.h   |   6 +-
 include/uapi/linux/fs.h |   5 +-
 mm/filemap.c            | 139 ++++++++++++++++++++++++++++++++++++----
 14 files changed, 230 insertions(+), 46 deletions(-)

Changes since v1:
- Switch to pagevecs for write_drop_cached_pages()
- Use page_offset() instead of manual shift
- Ensure we hold a reference on the page between calling ->write_end()
  and checking the mapping on the locked page
- Fix XFS multi-page streamed writes, we'd drop the UNCACHED flag after
  the first page

-- 
Jens Axboe



^ permalink raw reply	[flat|nested] 72+ messages in thread
* [PATCHSET 0/5] Support for RWF_UNCACHED
@ 2019-12-10 16:24 Jens Axboe
  2019-12-10 16:24 ` [PATCH 1/5] fs: add read support " Jens Axboe
  0 siblings, 1 reply; 72+ messages in thread
From: Jens Axboe @ 2019-12-10 16:24 UTC (permalink / raw)
  To: linux-mm, linux-fsdevel, linux-block

Recently someone asked me how io_uring buffered IO compares to mmaped
IO in terms of performance. So I ran some tests with buffered IO, and
found the experience to be somewhat painful. The test case is pretty
basic, random reads over a dataset that's 10x the size of RAM.
Performance starts out fine, and then the page cache fills up and we
hit a throughput cliff. CPU usage of the IO threads go up, and we have
kswapd spending 100% of a core trying to keep up. Seeing that, I was
reminded of the many complaints I here about buffered IO, and the fact
that most of the folks complaining will ultimately bite the bullet and
move to O_DIRECT to just get the kernel out of the way.

But I don't think it needs to be like that. Switching to O_DIRECT isn't
always easily doable. The buffers have different life times, size and
alignment constraints, etc. On top of that, mixing buffered and O_DIRECT
can be painful.

Seems to me that we have an opportunity to provide something that sits
somewhere in between buffered and O_DIRECT, and this is where
RWF_UNCACHED enters the picture. If this flag is set on IO, we get the
following behavior:

- If the data is in cache, it remains in cache and the copy (in or out)
  is served to/from that.

- If the data is NOT in cache, we add it while performing the IO. When
  the IO is done, we remove it again.

With this, I can do 100% smooth buffered reads or writes without pushing
the kernel to the state where kswapd is sweating bullets. In fact it
doesn't even register.

Comments appreciated! Patches are against current git (ish), and can
also be found here:

https://git.kernel.dk/cgit/linux-block/log/?h=buffered-uncached

 fs/ceph/file.c          |   2 +-
 fs/dax.c                |   2 +-
 fs/ext4/file.c          |   2 +-
 fs/iomap/apply.c        |   2 +-
 fs/iomap/buffered-io.c  |  75 +++++++++++++++++------
 fs/iomap/direct-io.c    |   3 +-
 fs/iomap/fiemap.c       |   5 +-
 fs/iomap/seek.c         |   6 +-
 fs/iomap/swapfile.c     |   2 +-
 fs/nfs/file.c           |   2 +-
 include/linux/fs.h      |   9 ++-
 include/linux/iomap.h   |   6 +-
 include/uapi/linux/fs.h |   5 +-
 mm/filemap.c            | 132 ++++++++++++++++++++++++++++++++++++----
 14 files changed, 208 insertions(+), 45 deletions(-)

-- 
Jens Axboe



^ permalink raw reply	[flat|nested] 72+ messages in thread

end of thread, other threads:[~2020-02-01 10:33 UTC | newest]

Thread overview: 72+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-12-11 15:29 [PATCHSET v3 0/5] Support for RWF_UNCACHED Jens Axboe
2019-12-11 15:29 ` [PATCH 1/5] fs: add read support " Jens Axboe
2019-12-11 15:29 ` [PATCH 2/5] mm: make generic_perform_write() take a struct kiocb Jens Axboe
2019-12-11 15:29 ` [PATCH 3/5] mm: make buffered writes work with RWF_UNCACHED Jens Axboe
2019-12-11 15:29 ` [PATCH 4/5] iomap: pass in the write_begin/write_end flags to iomap_actor Jens Axboe
2019-12-11 17:19   ` Linus Torvalds
2019-12-11 17:19     ` Linus Torvalds
2019-12-11 15:29 ` [PATCH 5/5] iomap: support RWF_UNCACHED for buffered writes Jens Axboe
2019-12-11 17:19   ` Matthew Wilcox
2019-12-11 18:05     ` Jens Axboe
2019-12-12 22:34   ` Dave Chinner
2019-12-13  0:54     ` Jens Axboe
2019-12-13  0:57       ` Jens Axboe
2019-12-16  4:17         ` Dave Chinner
2019-12-17 14:31           ` Jens Axboe
2019-12-18  0:49             ` Dave Chinner
2019-12-18  1:01               ` Jens Axboe
2019-12-11 17:37 ` [PATCHSET v3 0/5] Support for RWF_UNCACHED Linus Torvalds
2019-12-11 17:37   ` Linus Torvalds
2019-12-11 17:56   ` Jens Axboe
2019-12-11 19:14     ` Linus Torvalds
2019-12-11 19:14       ` Linus Torvalds
2019-12-11 19:34     ` Jens Axboe
2019-12-11 20:03       ` Linus Torvalds
2019-12-11 20:03         ` Linus Torvalds
2019-12-11 20:08         ` Jens Axboe
2019-12-11 20:18           ` Linus Torvalds
2019-12-11 20:18             ` Linus Torvalds
2019-12-11 21:04             ` Johannes Weiner
2019-12-12  1:30               ` Jens Axboe
2019-12-11 23:41             ` Jens Axboe
2019-12-12  1:08               ` Linus Torvalds
2019-12-12  1:08                 ` Linus Torvalds
2019-12-12  1:11                 ` Jens Axboe
2019-12-12  1:22                   ` Linus Torvalds
2019-12-12  1:22                     ` Linus Torvalds
2019-12-12  1:29                     ` Jens Axboe
2019-12-12  1:41                       ` Linus Torvalds
2019-12-12  1:41                         ` Linus Torvalds
2019-12-12  1:56                         ` Matthew Wilcox
2019-12-12  2:47                           ` Linus Torvalds
2019-12-12  2:47                             ` Linus Torvalds
2019-12-12 17:52                             ` Matthew Wilcox
2019-12-12 18:29                               ` Linus Torvalds
2019-12-12 18:29                                 ` Linus Torvalds
2019-12-12 20:05                                 ` Matthew Wilcox
2019-12-12  1:41                       ` Jens Axboe
2019-12-12  1:49                         ` Linus Torvalds
2019-12-12  1:49                           ` Linus Torvalds
2019-12-12  1:09               ` Jens Axboe
2019-12-12  2:03                 ` Jens Axboe
2019-12-12  2:10                   ` Jens Axboe
2019-12-12  2:21                   ` Matthew Wilcox
2019-12-12  2:38                     ` Jens Axboe
2019-12-12 22:18                 ` Dave Chinner
2019-12-13  1:32                   ` Chris Mason
2020-01-07 17:42                     ` Christoph Hellwig
2020-01-08 14:09                       ` Chris Mason
2020-02-01 10:33                     ` Andres Freund
2019-12-11 20:43           ` Matthew Wilcox
2019-12-11 20:04       ` Jens Axboe
2019-12-12 10:44 ` Martin Steigerwald
2019-12-12 15:16   ` Jens Axboe
2019-12-12 21:45     ` Martin Steigerwald
2019-12-12 22:15       ` Jens Axboe
2019-12-12 22:18     ` Linus Torvalds
2019-12-12 22:18       ` Linus Torvalds
  -- strict thread matches above, loose matches on Subject: below --
2019-12-12 19:01 [PATCHSET v4 " Jens Axboe
2019-12-12 19:01 ` [PATCH 1/5] fs: add read support " Jens Axboe
2019-12-12 21:21   ` Matthew Wilcox
2019-12-12 21:27     ` Jens Axboe
2019-12-10 20:42 [PATCHSET v2 0/5] Support " Jens Axboe
2019-12-10 20:43 ` [PATCH 1/5] fs: add read support " Jens Axboe
2019-12-10 16:24 [PATCHSET 0/5] Support " Jens Axboe
2019-12-10 16:24 ` [PATCH 1/5] fs: add read support " Jens Axboe

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.