From: Jens Axboe <axboe@kernel.dk>
To: Martin Steigerwald <martin@lichtvoll.de>
Cc: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
	linux-block@vger.kernel.org, willy@infradead.org, clm@fb.com,
	torvalds@linux-foundation.org, david@fromorbit.com
Subject: Re: [PATCHSET v3 0/5] Support for RWF_UNCACHED
Date: Thu, 12 Dec 2019 15:15:33 -0700
Message-ID: <05adab5c-1405-f4a3-b14f-3242fa5ce8fc@kernel.dk>
In-Reply-To: <2091494.0NDvsO6yje@merkaba>

On 12/12/19 2:45 PM, Martin Steigerwald wrote:
> Jens Axboe - 12.12.19, 16:16:31 CET:
>> On 12/12/19 3:44 AM, Martin Steigerwald wrote:
>>> Jens Axboe - 11.12.19, 16:29:38 CET:
>>>> Recently someone asked me how io_uring buffered IO compares to
>>>> mmaped IO in terms of performance. So I ran some tests with
>>>> buffered IO, and found the experience to be somewhat painful. The
>>>> test case is pretty basic, random reads over a dataset that's 10x
>>>> the size of RAM. Performance starts out fine, and then the page
>>>> cache fills up and we hit a throughput cliff. CPU usage of the IO
>>>> threads goes up, and we have kswapd spending 100% of a core trying
>>>> to keep up. Seeing that, I was reminded of the many complaints I
>>>> hear about buffered IO, and the fact that most of the folks
>>>> complaining will ultimately bite the bullet and move to O_DIRECT
>>>> to just get the kernel out of the way.
>>>>
>>>> But I don't think it needs to be like that. Switching to O_DIRECT
>>>> isn't always easily doable. The buffers have different lifetimes,
>>>> size and alignment constraints, etc. On top of that, mixing
>>>> buffered and O_DIRECT can be painful.
>>>>
>>>> Seems to me that we have an opportunity to provide something that
>>>> sits somewhere in between buffered and O_DIRECT, and this is where
>>>> RWF_UNCACHED enters the picture. If this flag is set on IO, we get
>>>> the following behavior:
>>>>
>>>> - If the data is in cache, it remains in cache and the copy (in or
>>>> out) is served to/from that.
>>>>
>>>> - If the data is NOT in cache, we add it while performing the IO.
>>>> When the IO is done, we remove it again.
>>>>
>>>> With this, I can do 100% smooth buffered reads or writes without
>>>> pushing the kernel to the state where kswapd is sweating bullets.
>>>> In fact it doesn't even register.
>>>
>>> A question from a user or Linux Performance trainer perspective:
>>>
>>> How does this compare with posix_fadvise() with POSIX_FADV_DONTNEED
>>> that for example the nocache [1] command is using? Excerpt from the
>>> posix_fadvise(2) manpage:
>>>
>>>        POSIX_FADV_DONTNEED
>>>        
>>>               The specified data will not be accessed  in  the  near
>>>               future.
>>>               
>>>               POSIX_FADV_DONTNEED  attempts to free cached pages
>>>               associated with the specified region.  This is useful,
>>>               for  example,  while streaming large files.  A program
>>>               may periodically request the  kernel  to  free  cached
>>>               data  that  has already been used, so that more useful
>>>               cached pages are not discarded instead.
>>>
>>> [1] packaged in Debian as nocache or available here:
>>> https://github.com/Feh/nocache
>>>
>>> In any case, it would be nice to have some option in rsync… I still
>>> have not changed my backup script to call rsync via nocache.
>>
>> I don't know the nocache tool, but I'm guessing it just does the
>> writes (or reads) and then uses FADV_DONTNEED to drop behind those
>> pages? That's fine for slower use cases, it won't work very well for
>> fast IO. The write side currently works pretty much like that
>> internally, whereas the read side doesn't use the page cache at all.
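
The drop-behind pattern guessed at above (write, then FADV_DONTNEED on
the same range) could look roughly like the sketch below. It is only an
illustration of the pattern, not the nocache tool's code: the function
name, the 1 MiB chunk size, and the choice to fsync each chunk before
advising are all assumptions made here. The flush is there because
POSIX_FADV_DONTNEED only attempts to free pages, and pages that are
still dirty are not freed.

```python
import os

CHUNK = 1 << 20  # 1 MiB per write; an arbitrary illustrative size


def write_drop_behind(path, data, chunk=CHUNK):
    """Buffered write that asks the kernel to drop each range after use.

    Hypothetical helper for illustration only. Returns bytes written.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        offset = 0
        while offset < len(view):
            # Ordinary buffered write through the page cache.
            n = os.pwrite(fd, view[offset:offset + chunk], offset)
            # Flush first so the pages are clean before asking for
            # them to be dropped.
            os.fsync(fd)
            # Hint that this range will not be needed again soon.
            os.posix_fadvise(fd, offset, n, os.POSIX_FADV_DONTNEED)
            offset += n
        return offset
    finally:
        os.close(fd)
```

Each chunk costs a write, a sync and an advise call, which fits the
remark above that this approach suits slower use cases.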
> 
> Yes, it does that. And yeah, I saw you changed the read side to bypass
> the cache entirely.
> 
> Also, as I understand it, this is primarily for asynchronous use via
> io_uring?

Or preadv2/pwritev2, they also allow passing in RWF_* flags.
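
As a rough sketch of where such a per-call flag goes on the synchronous
path: Python's os.preadv() takes the same flags argument and uses
preadv2(2) on Linux builds that support it. The helper name read_at is
made up for this example. RWF_UNCACHED itself comes from this patchset
and is not exposed by Python, so the sketch defaults to flags=0 and
does not guess the flag's numeric value; on a kernel carrying the
patches, the caller would pass that value in the flags argument.

```python
import os


def read_at(fd, offset, length, flags=0):
    """Positional read of up to `length` bytes with per-call RWF_* flags.

    Hypothetical helper for illustration. `flags` is handed straight to
    os.preadv(); 0 means a plain buffered read.
    """
    buf = bytearray(length)
    # os.preadv fills the supplied buffers and returns the byte count.
    n = os.preadv(fd, [buf], offset, flags)
    return bytes(buf[:n])
```

The point of the per-call flag is that the same file descriptor can mix
ordinary buffered reads with flagged ones, unlike O_DIRECT, which is
set on the descriptor.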

-- 
Jens Axboe

