From: Gao Xiang <hsiangkao@linux.alibaba.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: Theodore Ts'o <tytso@mit.edu>,
	lsf-pc@lists.linux-foundation.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org, linux-block@vger.kernel.org
Subject: Re: [LSF/MM/BPF TOPIC] Cloud storage optimizations
Date: Wed, 1 Mar 2023 13:51:28 +0800
Message-ID: <01ff76e3-87fd-0105-c363-44eecff12b57@linux.alibaba.com>
In-Reply-To: <Y/7lvBfzJXntfWal@casper.infradead.org>



On 2023/3/1 13:42, Matthew Wilcox wrote:
> On Wed, Mar 01, 2023 at 01:09:34PM +0800, Gao Xiang wrote:
>> On 2023/3/1 13:01, Matthew Wilcox wrote:
>>> On Wed, Mar 01, 2023 at 12:49:10PM +0800, Gao Xiang wrote:
>>>>> The only problem is that the readahead code doesn't tell the filesystem
>>>>> whether the request is sync or async.  This should be a simple matter
>>>>> of adding a new 'bool async' to the readahead_control and then setting
>>>>> REQ_RAHEAD based on that, rather than on whether the request came in
>>>>> through readahead() or read_folio() (eg see mpage_readahead()).
>>>>
>>>> Great!  In addition to that, just (somewhat) off topic, if we have a
>>>> "bool async" now, I think it will immediately have some users (such as
>>>> EROFS), since we'd like to do post-processing (such as decompression)
>>>> immediately in the same context with sync readahead (due to missing
>>>> pages) and leave it to another kworker for async readahead (I think
>>>> it's almost same for decryption and verification).
>>>>
>>>> So "bool async" is quite useful on my side if it could be possible
>>>> passed to fs side.  I'd like to raise my hands to have it.
>>>
>>> That's a really interesting use-case; thanks for bringing it up.
>>>
>>> Ideally, we'd have the waiting task do the
>>> decompression/decryption/verification for proper accounting of CPU.
>>> Unfortunately, if the folio isn't uptodate, the task doesn't even hold
>>> a reference to the folio while it waits, so there's no way to wake the
>>> task and let it know that it has work to do.  At least not at the moment
>>> ... let me think about that a bit (and if you see a way to do it, feel
>>> free to propose it)
>>
>> Honestly, I'd like to take the folio lock until all post-processing is
>> done and make it uptodate and unlock so that only we need is to pass
>> locked-folios requests to kworkers for async way or sync handling in
>> the original context.
>>
>> If we unlocked these folios in advance without uptodate, which means
>> we have to lock it again (which could have more lock contention) and
>> need to have a way to trace I/Oed but not post-processed stuff in
>> addition to no I/Oed stuff.
> 
> Right, look at how it's handled right now ...
> 
> sys_read() ends up in filemap_get_pages() which (assuming no folio in
> cache) calls page_cache_sync_readahead().  That creates locked, !uptodate
> folios and asks the filesystem to fill them.  Unless that completes
> incredibly quickly, filemap_get_pages() ends up in filemap_update_page()
> which calls folio_put_wait_locked().
> 
> If the filesystem BIO completion routine could identify if there was
> a task waiting and select one of them, it could wake up the waiter and
> pass it a description of what work it needed to do (with the folio still
> locked), rather than do the postprocessing itself and unlock the folio

Currently, EROFS sync decompression waits in .readahead() with the page
cache folios locked.  The original context keeps one "completion" together
with the BIO descriptor (bi_private), so the filesystem BIO completion
just needs to complete that completion and wake up the original context
(the pages were missing, so the original context needs the page data
immediately anyway), which then goes on in .readahead() and unlocks the folios.
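
To make the flow above concrete, here is a minimal userspace sketch, not
kernel code.  It assumes a threading.Event can stand in for the completion
reached through bi_private, and a reversed byte string can stand in for
decompression.  Folio, bio_end_io() and sync_readahead() are hypothetical
names chosen for illustration only; they are not the EROFS functions.

```python
import threading


class Folio:
    """Toy stand-in for a page cache folio (hypothetical, not a kernel type)."""

    def __init__(self):
        self.locked = True      # readahead creates folios locked and !uptodate
        self.uptodate = False
        self.data = None


def bio_end_io(done, raw, payload):
    # I/O completion side: record the raw bytes and signal the waiter.
    # No post-processing happens here.
    raw.append(payload)
    done.set()


def sync_readahead(folio, payload):
    done = threading.Event()    # assumed analogue of the completion in bi_private
    raw = []
    io = threading.Thread(target=bio_end_io, args=(done, raw, payload))
    io.start()                  # "submit" the I/O
    done.wait()                 # original context sleeps; folio stays locked
    io.join()
    folio.data = raw[0][::-1]   # placeholder "decompression" in the original context
    folio.uptodate = True
    folio.locked = False        # unlock only after post-processing is done
    return folio
```

The point of the sketch is only the ordering: the completion side does
nothing but signal, and the folio is marked uptodate and unlocked once, by
the context that was waiting for the data.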

Does this approach have a flaw?  Or am I missing something?

Thanks,
Gao Xiang

> 
> But that all seems _very_ hard to do with 100% reliability.  Note the
> comment in folio_wait_bit_common() which points out that the waiters
> bit may be set even when there are no waiters.  The wake_up code
> doesn't seem to support this kind of thing (all waiters are
> non-exclusive, but only wake up one of them).
