Linux-Fsdevel Archive on lore.kernel.org
 help / color / Atom feed
* Re: [PATCH 2/3] mm/filemap: initiate readahead even if IOCB_NOWAIT is set for the I/O
       [not found]   ` <20190130124420.1834-3-vbabka@suse.cz>
@ 2019-01-31  9:56     ` Michal Hocko
  2019-01-31 10:15       ` Jiri Kosina
  2019-02-01  1:44       ` Dave Chinner
  0 siblings, 2 replies; 11+ messages in thread
From: Michal Hocko @ 2019-01-31  9:56 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, Linus Torvalds, linux-kernel, linux-mm, linux-api,
	Peter Zijlstra, Greg KH, Jann Horn, Jiri Kosina,
	Dominique Martinet, Andy Lutomirski, Dave Chinner, Kevin Easton,
	Matthew Wilcox, Cyril Hrubis, Tejun Heo, Kirill A . Shutemov,
	Daniel Gruss, Jiri Kosina, linux-fsdevel

[Cc fs-devel]

On Wed 30-01-19 13:44:19, Vlastimil Babka wrote:
> From: Jiri Kosina <jkosina@suse.cz>
> 
> preadv2(RWF_NOWAIT) can be used to open a side-channel to pagecache contents, as
> it reveals metadata about residency of pages in pagecache.
> 
> If preadv2(RWF_NOWAIT) returns immediately, it provides a clear "page not
> resident" information, and vice versa.
> 
> Close that sidechannel by always initiating readahead on the cache if we
> encounter a cache miss for preadv2(RWF_NOWAIT); with that in place, probing
> the pagecache residency itself will actually populate the cache, making the
> sidechannel useless.

I guess the current wording doesn't disallow background IO to be
triggered for EAGAIN case. I am not sure whether that breaks clever
applications which try to perform larger IO for those cases though.

> Originally-by: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Dominique Martinet <asmadeus@codewreck.org>
> Cc: Andy Lutomirski <luto@amacapital.net>
> Cc: Dave Chinner <david@fromorbit.com>
> Cc: Kevin Easton <kevin@guarana.org>
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: Cyril Hrubis <chrubis@suse.cz>
> Cc: Tejun Heo <tj@kernel.org>
> Cc: Kirill A. Shutemov <kirill@shutemov.name>
> Cc: Daniel Gruss <daniel@gruss.cc>
> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> ---
>  mm/filemap.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 9f5e323e883e..7bcdd36e629d 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -2075,8 +2075,6 @@ static ssize_t generic_file_buffered_read(struct kiocb *iocb,
>  
>  		page = find_get_page(mapping, index);
>  		if (!page) {
> -			if (iocb->ki_flags & IOCB_NOWAIT)
> -				goto would_block;
>  			page_cache_sync_readahead(mapping,
>  					ra, filp,
>  					index, last_index - index);

Maybe a stupid question but I am not really familiar with this path but
what exactly does prevent a sync read down page_cache_sync_readahead
path?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/3] mm/filemap: initiate readahead even if IOCB_NOWAIT is set for the I/O
  2019-01-31  9:56     ` [PATCH 2/3] mm/filemap: initiate readahead even if IOCB_NOWAIT is set for the I/O Michal Hocko
@ 2019-01-31 10:15       ` Jiri Kosina
  2019-01-31 10:23         ` Michal Hocko
  2019-02-01  1:44       ` Dave Chinner
  1 sibling, 1 reply; 11+ messages in thread
From: Jiri Kosina @ 2019-01-31 10:15 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Vlastimil Babka, Andrew Morton, Linus Torvalds, linux-kernel,
	linux-mm, linux-api, Peter Zijlstra, Greg KH, Jann Horn,
	Dominique Martinet, Andy Lutomirski, Dave Chinner, Kevin Easton,
	Matthew Wilcox, Cyril Hrubis, Tejun Heo, Kirill A . Shutemov,
	Daniel Gruss, linux-fsdevel

On Thu, 31 Jan 2019, Michal Hocko wrote:

> > diff --git a/mm/filemap.c b/mm/filemap.c
> > index 9f5e323e883e..7bcdd36e629d 100644
> > --- a/mm/filemap.c
> > +++ b/mm/filemap.c
> > @@ -2075,8 +2075,6 @@ static ssize_t generic_file_buffered_read(struct kiocb *iocb,
> >  
> >  		page = find_get_page(mapping, index);
> >  		if (!page) {
> > -			if (iocb->ki_flags & IOCB_NOWAIT)
> > -				goto would_block;
> >  			page_cache_sync_readahead(mapping,
> >  					ra, filp,
> >  					index, last_index - index);
> 
> Maybe a stupid question but I am not really familiar with this path but
> what exactly does prevent a sync read down page_cache_sync_readahead
> path?

page_cache_sync_readahead() only submits the read ahead request(s), it 
doesn't wait for it to finish.

-- 
Jiri Kosina
SUSE Labs


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/3] mm/filemap: initiate readahead even if IOCB_NOWAIT is set for the I/O
  2019-01-31 10:15       ` Jiri Kosina
@ 2019-01-31 10:23         ` Michal Hocko
  2019-01-31 10:30           ` Jiri Kosina
  2019-01-31 17:54           ` Linus Torvalds
  0 siblings, 2 replies; 11+ messages in thread
From: Michal Hocko @ 2019-01-31 10:23 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Vlastimil Babka, Andrew Morton, Linus Torvalds, linux-kernel,
	linux-mm, linux-api, Peter Zijlstra, Greg KH, Jann Horn,
	Dominique Martinet, Andy Lutomirski, Dave Chinner, Kevin Easton,
	Matthew Wilcox, Cyril Hrubis, Tejun Heo, Kirill A . Shutemov,
	Daniel Gruss, linux-fsdevel

On Thu 31-01-19 11:15:28, Jiri Kosina wrote:
> On Thu, 31 Jan 2019, Michal Hocko wrote:
> 
> > > diff --git a/mm/filemap.c b/mm/filemap.c
> > > index 9f5e323e883e..7bcdd36e629d 100644
> > > --- a/mm/filemap.c
> > > +++ b/mm/filemap.c
> > > @@ -2075,8 +2075,6 @@ static ssize_t generic_file_buffered_read(struct kiocb *iocb,
> > >  
> > >  		page = find_get_page(mapping, index);
> > >  		if (!page) {
> > > -			if (iocb->ki_flags & IOCB_NOWAIT)
> > > -				goto would_block;
> > >  			page_cache_sync_readahead(mapping,
> > >  					ra, filp,
> > >  					index, last_index - index);
> > 
> > Maybe a stupid question but I am not really familiar with this path but
> > what exactly does prevent a sync read down page_cache_sync_readahead
> > path?
> 
> page_cache_sync_readahead() only submits the read ahead request(s), it 
> doesn't wait for it to finish.

OK, I guess my question was not precise. What does prevent taking fs
locks down the path?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/3] mm/filemap: initiate readahead even if IOCB_NOWAIT is set for the I/O
  2019-01-31 10:23         ` Michal Hocko
@ 2019-01-31 10:30           ` Jiri Kosina
  2019-01-31 11:32             ` Michal Hocko
  2019-01-31 17:54           ` Linus Torvalds
  1 sibling, 1 reply; 11+ messages in thread
From: Jiri Kosina @ 2019-01-31 10:30 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Vlastimil Babka, Andrew Morton, Linus Torvalds, linux-kernel,
	linux-mm, linux-api, Peter Zijlstra, Greg KH, Jann Horn,
	Dominique Martinet, Andy Lutomirski, Dave Chinner, Kevin Easton,
	Matthew Wilcox, Cyril Hrubis, Tejun Heo, Kirill A . Shutemov,
	Daniel Gruss, linux-fsdevel

On Thu, 31 Jan 2019, Michal Hocko wrote:

> > > > diff --git a/mm/filemap.c b/mm/filemap.c
> > > > index 9f5e323e883e..7bcdd36e629d 100644
> > > > --- a/mm/filemap.c
> > > > +++ b/mm/filemap.c
> > > > @@ -2075,8 +2075,6 @@ static ssize_t generic_file_buffered_read(struct kiocb *iocb,
> > > >  
> > > >  		page = find_get_page(mapping, index);
> > > >  		if (!page) {
> > > > -			if (iocb->ki_flags & IOCB_NOWAIT)
> > > > -				goto would_block;
> > > >  			page_cache_sync_readahead(mapping,
> > > >  					ra, filp,
> > > >  					index, last_index - index);
> > > 
> > > Maybe a stupid question but I am not really familiar with this path but
> > > what exactly does prevent a sync read down page_cache_sync_readahead
> > > path?
> > 
> > page_cache_sync_readahead() only submits the read ahead request(s), it 
> > doesn't wait for it to finish.
> 
> OK, I guess my question was not precise. What does prevent taking fs
> locks down the path?

Well, RWF_NOWAIT doesn't mean the kernel can't reschedule while executing 
preadv2(), right? It just means it will not wait for the arrival of the 
whole data blob into pagecache in case it's not there.

-- 
Jiri Kosina
SUSE Labs


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/3] mm/filemap: initiate readahead even if IOCB_NOWAIT is set for the I/O
  2019-01-31 10:30           ` Jiri Kosina
@ 2019-01-31 11:32             ` Michal Hocko
  0 siblings, 0 replies; 11+ messages in thread
From: Michal Hocko @ 2019-01-31 11:32 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Vlastimil Babka, Andrew Morton, Linus Torvalds, linux-kernel,
	linux-mm, linux-api, Peter Zijlstra, Greg KH, Jann Horn,
	Dominique Martinet, Andy Lutomirski, Dave Chinner, Kevin Easton,
	Matthew Wilcox, Cyril Hrubis, Tejun Heo, Kirill A . Shutemov,
	Daniel Gruss, linux-fsdevel

On Thu 31-01-19 11:30:24, Jiri Kosina wrote:
> On Thu, 31 Jan 2019, Michal Hocko wrote:
> 
> > > > > diff --git a/mm/filemap.c b/mm/filemap.c
> > > > > index 9f5e323e883e..7bcdd36e629d 100644
> > > > > --- a/mm/filemap.c
> > > > > +++ b/mm/filemap.c
> > > > > @@ -2075,8 +2075,6 @@ static ssize_t generic_file_buffered_read(struct kiocb *iocb,
> > > > >  
> > > > >  		page = find_get_page(mapping, index);
> > > > >  		if (!page) {
> > > > > -			if (iocb->ki_flags & IOCB_NOWAIT)
> > > > > -				goto would_block;
> > > > >  			page_cache_sync_readahead(mapping,
> > > > >  					ra, filp,
> > > > >  					index, last_index - index);
> > > > 
> > > > Maybe a stupid question but I am not really familiar with this path but
> > > > what exactly does prevent a sync read down page_cache_sync_readahead
> > > > path?
> > > 
> > > page_cache_sync_readahead() only submits the read ahead request(s), it 
> > > doesn't wait for it to finish.
> > 
> > OK, I guess my question was not precise. What does prevent taking fs
> > locks down the path?
> 
> Well, RWF_NOWAIT doesn't mean the kernel can't reschedule while executing 
> preadv2(), right? It just means it will not wait for the arrival of the 
> whole data blob into pagecache in case it's not there.

No, it can reschedule for sure but the man page says: 
: If this flag is specified, the preadv2() system call will return
: instantly if it would have to read data from the backing storage or wait
: for a lock.

I assume that the lock is meant to be a filesystem lock here.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/3] mm/filemap: initiate readahead even if IOCB_NOWAIT is set for the I/O
  2019-01-31 10:23         ` Michal Hocko
  2019-01-31 10:30           ` Jiri Kosina
@ 2019-01-31 17:54           ` Linus Torvalds
  2019-02-01  5:13             ` Dave Chinner
  1 sibling, 1 reply; 11+ messages in thread
From: Linus Torvalds @ 2019-01-31 17:54 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Jiri Kosina, Vlastimil Babka, Andrew Morton,
	Linux List Kernel Mailing, Linux-MM, Linux API, Peter Zijlstra,
	Greg KH, Jann Horn, Dominique Martinet, Andy Lutomirski,
	Dave Chinner, Kevin Easton, Matthew Wilcox, Cyril Hrubis,
	Tejun Heo, Kirill A . Shutemov, Daniel Gruss, linux-fsdevel

On Thu, Jan 31, 2019 at 2:23 AM Michal Hocko <mhocko@kernel.org> wrote:
>
> OK, I guess my question was not precise. What does prevent taking fs
> locks down the path?

IOCB_NOWAIT has never meant that, and will never mean it.

We will never give user space those kinds of guarantees. We do locking
for various reasons. For example, we'll do the mm lock just when
fetching/storing data from/to user space if there's a page fault. Or -
more obviously - we'll also check for - and sleep on - mandatory locks
in rw_verify_area().

There is nothing like "atomic IO" to user space. We simply do not give
those kinds of guarantees. That's even more true when this is a
information leak that we shouldn't expose to user space in the first
place.

                  Linus

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/3] mm/filemap: initiate readahead even if IOCB_NOWAIT is set for the I/O
  2019-01-31  9:56     ` [PATCH 2/3] mm/filemap: initiate readahead even if IOCB_NOWAIT is set for the I/O Michal Hocko
  2019-01-31 10:15       ` Jiri Kosina
@ 2019-02-01  1:44       ` Dave Chinner
  2019-02-12 15:48         ` Jiri Kosina
  1 sibling, 1 reply; 11+ messages in thread
From: Dave Chinner @ 2019-02-01  1:44 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Vlastimil Babka, Andrew Morton, Linus Torvalds, linux-kernel,
	linux-mm, linux-api, Peter Zijlstra, Greg KH, Jann Horn,
	Jiri Kosina, Dominique Martinet, Andy Lutomirski, Kevin Easton,
	Matthew Wilcox, Cyril Hrubis, Tejun Heo, Kirill A . Shutemov,
	Daniel Gruss, Jiri Kosina, linux-fsdevel

On Thu, Jan 31, 2019 at 10:56:44AM +0100, Michal Hocko wrote:
> [Cc fs-devel]
> 
> On Wed 30-01-19 13:44:19, Vlastimil Babka wrote:
> > From: Jiri Kosina <jkosina@suse.cz>
> > 
> > preadv2(RWF_NOWAIT) can be used to open a side-channel to pagecache contents, as
> > it reveals metadata about residency of pages in pagecache.
> > 
> > If preadv2(RWF_NOWAIT) returns immediately, it provides a clear "page not
> > resident" information, and vice versa.
> > 
> > Close that sidechannel by always initiating readahead on the cache if we
> > encounter a cache miss for preadv2(RWF_NOWAIT); with that in place, probing
> > the pagecache residency itself will actually populate the cache, making the
> > sidechannel useless.
> 
> I guess the current wording doesn't disallow background IO to be
> triggered for EAGAIN case. I am not sure whether that breaks clever
> applications which try to perform larger IO for those cases though.

Actually, it does:

RWF_NOWAIT (since Linux 4.14)

    Do  not  wait for data which is not immediately available.  If
    this flag is specified, the preadv2() system call will return
    instantly if it would have to read data from the backing storage
    or wait for a lock.

page_cache_sync_readahead() can block on page allocation, it calls
->readpages() which means there are page locks and filesystem locks
in play (e.g.  for block mapping), there's potential for blocking on
metadata IO (both submission and completion) to read block maps, the
data readahead can be submitted for IO so it can get stuck anywhere
in the IO path, etc...

Basically, it completely subverts the documented behaviour of
RWF_NOWAIT.

There are applications (like Samba (*)) that are planning to use
this to avoid blocking their main processing threads on buffered
IO. This change makes RWF_NOWAIT pretty much useless to them - it
/was/ the only solution we had for reliably issuing non-blocking IO,
with this patch it isn't a viable solution at all.

(*) https://github.com/samba-team/samba/commit/6381044c0270a647c20935d22fd23f235d19b328

IOWs, if this change goes through, it needs to be documented as an
intentional behavioural bug in the preadv2 manpage so that userspace
developers are aware of the new limitations of RWF_NOWAIT and should
avoid it like the plague.

But worse than that is nobody has bothered to (or ask someone
familiar with the code to) do an audit of RWF_NOWAIT usage after I
pointed out the behavioural issues. The one person who was engaged
and /had done an audit/ got shouted down with so much bullshit they
just walked away....

So, I'll invite the incoherent, incandescent O_DIRECT rage flames of
Linus to be unleashed again and point out the /other reference/ to
IOCB_NOWAIT in mm/filemap.c. That is, in generic_file_read_iter(),
in the *generic O_DIRECT read path*:

	if (iocb->ki_flags & IOCB_DIRECT) {
.....
		if (iocb->ki_flags & IOCB_NOWAIT) {
			if (filemap_range_has_page(mapping, iocb->ki_pos,
						   iocb->ki_pos + count - 1))
				return -EAGAIN;
		} else {
.....

This page cache probe is about 100 lines of code down from the code
that this patch modifies, in it's direct caller. It's not hard to
find, I shouldn't have to point it out, nor have to explain how it
makes this patch completely irrelevant.

> > diff --git a/mm/filemap.c b/mm/filemap.c
> > index 9f5e323e883e..7bcdd36e629d 100644
> > --- a/mm/filemap.c
> > +++ b/mm/filemap.c
> > @@ -2075,8 +2075,6 @@ static ssize_t generic_file_buffered_read(struct kiocb *iocb,
> >  
> >  		page = find_get_page(mapping, index);
> >  		if (!page) {
> > -			if (iocb->ki_flags & IOCB_NOWAIT)
> > -				goto would_block;
> >  			page_cache_sync_readahead(mapping,
> >  					ra, filp,
> >  					index, last_index - index);
> 
> Maybe a stupid question but I am not really familiar with this path but
> what exactly does prevent a sync read down page_cache_sync_readahead
> path?

It's effectively useless as a workaround because you can avoid the
readahead IO being issued relatively easily:

void page_cache_sync_readahead(struct address_space *mapping,
                               struct file_ra_state *ra, struct file *filp,
                               pgoff_t offset, unsigned long req_size)
{
        /* no read-ahead */
        if (!ra->ra_pages)
                return;

        if (blk_cgroup_congested())
                return;
....

IOWs, we just have to issue enough IO to congest the block device (or,
even easier, a rate-limited cgroup), and we can still use RWF_NOWAIT
to probe the page cache. Or if we can convince ra->ra_pages to be
zero (e.g. it's on bdi device with no readahead configured because
it's real fast) then it doesn't work there, either.

So this a) isn't a robust workaround, b) it breaks documented API
semantics and c) isn't the only path to page cache probing via
RWF_NOWAIT. It's just a new game of whack-a-mole.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/3] mm/filemap: initiate readahead even if IOCB_NOWAIT is set for the I/O
  2019-01-31 17:54           ` Linus Torvalds
@ 2019-02-01  5:13             ` Dave Chinner
  2019-02-01  7:05               ` Linus Torvalds
  0 siblings, 1 reply; 11+ messages in thread
From: Dave Chinner @ 2019-02-01  5:13 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Michal Hocko, Jiri Kosina, Vlastimil Babka, Andrew Morton,
	Linux List Kernel Mailing, Linux-MM, Linux API, Peter Zijlstra,
	Greg KH, Jann Horn, Dominique Martinet, Andy Lutomirski,
	Kevin Easton, Matthew Wilcox, Cyril Hrubis, Tejun Heo,
	Kirill A . Shutemov, Daniel Gruss, linux-fsdevel

On Thu, Jan 31, 2019 at 09:54:16AM -0800, Linus Torvalds wrote:
> On Thu, Jan 31, 2019 at 2:23 AM Michal Hocko <mhocko@kernel.org> wrote:
> >
> > OK, I guess my question was not precise. What does prevent taking fs
> > locks down the path?
> 
> IOCB_NOWAIT has never meant that, and will never mean it.

I think you're wrong, Linus. IOCB_NOWAIT was specifically designed
to prevent blocking on filesystem locks during AIO submission. The
initial commits spell that out pretty clearly:

commit b745fafaf70c0a98a2e1e7ac8cb14542889ceb0e
Author: Goldwyn Rodrigues <rgoldwyn@suse.com>
Date:   Tue Jun 20 07:05:43 2017 -0500

    fs: Introduce RWF_NOWAIT and FMODE_AIO_NOWAIT
    
    RWF_NOWAIT informs kernel to bail out if an AIO request will block
    for reasons such as file allocations, or a writeback triggered,
    or would block while allocating requests while performing
    direct I/O.
    
    RWF_NOWAIT is translated to IOCB_NOWAIT for iocb->ki_flags.
    
    FMODE_AIO_NOWAIT is a flag which identifies the file opened is capable
    of returning -EAGAIN if the AIO call will block. This must be set by
    supporting filesystems in the ->open() call.
    
    Filesystems xfs, btrfs and ext4 would be supported in the following patches.
    
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Jan Kara <jack@suse.cz>
    Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit 29a5d29ec181ebdc98a26cedbd76ce9870248892
Author: Goldwyn Rodrigues <rgoldwyn@suse.com>
Date:   Tue Jun 20 07:05:48 2017 -0500

    xfs: nowait aio support
    
    If IOCB_NOWAIT is set, bail if the i_rwsem is not lockable
    immediately.
    
    IF IOMAP_NOWAIT is set, return EAGAIN in xfs_file_iomap_begin
    if it needs allocation either due to file extension, writing to a hole,
    or COW or waiting for other DIOs to finish.
    
    Return -EAGAIN if we don't have extent list in memory.
    
    Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit 728fbc0e10b7f3ce2ee043b32e3453fd5201c055
Author: Goldwyn Rodrigues <rgoldwyn@suse.com>
Date:   Tue Jun 20 07:05:47 2017 -0500

    ext4: nowait aio support
    
    Return EAGAIN if any of the following checks fail for direct I/O:
      + i_rwsem is lockable
      + Writing beyond end of file (will trigger allocation)
      + Blocks are not allocated at the write location
    
    Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
    Reviewed-by: Jan Kara <jack@suse.cz>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

> We will never give user space those kinds of guarantees. We do locking
> for various reasons.  For example, we'll do the mm lock just when
> fetching/storing data from/to user space if there's a page fault.

You are conflating "best effort non-blocking operation" with
"atomic guarantee".  RWF_NOWAIT/IOCB_NOWAIT is the
former, not the latter.

i.e. RWF_NOWAIT addresses the "every second IO submission blocks"
problems that AIO submission suffered from due to filesystem lock
contention, not the rare and unusual things like  "page fault during
get_user_pages in direct IO submission".  Maybe one day, but right
now those rare cases are not pain points for applications that
require nonblock AIO submission via RWF_NOWAIT.

> Or -
> more obviously - we'll also check for - and sleep on - mandatory locks
> in rw_verify_area().

Well, only if you don't use fcntl(O_NONBLOCK) on the file to tell
mandatory locking to fail with -EAGAIN instead of sleeping.

-Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/3] mm/filemap: initiate readahead even if IOCB_NOWAIT is set for the I/O
  2019-02-01  5:13             ` Dave Chinner
@ 2019-02-01  7:05               ` Linus Torvalds
  2019-02-01  7:21                 ` Linus Torvalds
  0 siblings, 1 reply; 11+ messages in thread
From: Linus Torvalds @ 2019-02-01  7:05 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Michal Hocko, Jiri Kosina, Vlastimil Babka, Andrew Morton,
	Linux List Kernel Mailing, Linux-MM, Linux API, Peter Zijlstra,
	Greg KH, Jann Horn, Dominique Martinet, Andy Lutomirski,
	Kevin Easton, Matthew Wilcox, Cyril Hrubis, Tejun Heo,
	Kirill A . Shutemov, Daniel Gruss, linux-fsdevel

On Thu, Jan 31, 2019 at 9:16 PM Dave Chinner <david@fromorbit.com> wrote:
>
> You are conflating "best effort non-blocking operation" with
> "atomic guarantee".  RWF_NOWAIT/IOCB_NOWAIT is the
> former, not the latter.

Right.

That's my *point*, Dave.

It's not 'atomic guarantee", and never will be. We are in 100%
agreement. That's what I _said_.

And part of "best effort" is very much "not a security information leak".

I really don't see why you are so argumentative.

As I mentioned earlier in the thread, it's actually quite possible
that users will actually find that starting read-ahead is a *good*
thing, Dave.

Even - in fact *particularly* - the user you brought up: samba using
RWF_NOWAIT to try to do things synchronously quickly.

So Dave, why are you being so negative?

             Linus

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/3] mm/filemap: initiate readahead even if IOCB_NOWAIT is set for the I/O
  2019-02-01  7:05               ` Linus Torvalds
@ 2019-02-01  7:21                 ` Linus Torvalds
  0 siblings, 0 replies; 11+ messages in thread
From: Linus Torvalds @ 2019-02-01  7:21 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Michal Hocko, Jiri Kosina, Vlastimil Babka, Andrew Morton,
	Linux List Kernel Mailing, Linux-MM, Linux API, Peter Zijlstra,
	Greg KH, Jann Horn, Dominique Martinet, Andy Lutomirski,
	Kevin Easton, Matthew Wilcox, Cyril Hrubis, Tejun Heo,
	Kirill A . Shutemov, Daniel Gruss, linux-fsdevel

On Thu, Jan 31, 2019 at 11:05 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> And part of "best effort" is very much "not a security information leak".

Side note: it's entirely possible that the preadv2(RWF_NOWAIT)
interface is actually already effectively too slow to be effectively
used as much of an attack vector.

One of the advantages of mincore() for the attack was that you could
just get a lot of page status information in one go. With RWF_NOWAIT,
you only really get "up to the first non-cached page", so it's already
a weaker signal than mincore() gave.

System calls aren't horrendously slow (at least not with fixed
non-meltdown CPU's), but it might still be a somewhat noticeable
inconvenience in an attack that is already probably not all that easy
to do on an arbitrary target.

So it might not be a huge deal. But I think we should at least try to
make things less useful for these kinds of attack vectors.

And no, that doesn't mean "stop all theoretical attacks". It means
"let's try to make things less convenient as a data leak".

That's why things like "oh, you can still see the signal if you can
keep the backing device congested" is not something I'd worry about.
It's just another (big) inconvenience, and not all that simple to do.
At some point, it's simply not worth it as an attack vector any more.

               Linus

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/3] mm/filemap: initiate readahead even if IOCB_NOWAIT is set for the I/O
  2019-02-01  1:44       ` Dave Chinner
@ 2019-02-12 15:48         ` Jiri Kosina
  0 siblings, 0 replies; 11+ messages in thread
From: Jiri Kosina @ 2019-02-12 15:48 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Michal Hocko, Vlastimil Babka, Andrew Morton, Linus Torvalds,
	linux-kernel, linux-mm, linux-api, Peter Zijlstra, Greg KH,
	Jann Horn, Dominique Martinet, Andy Lutomirski, Kevin Easton,
	Matthew Wilcox, Cyril Hrubis, Tejun Heo, Kirill A . Shutemov,
	Daniel Gruss, linux-fsdevel

On Fri, 1 Feb 2019, Dave Chinner wrote:

> So, I'll invite the incoherent, incandescent O_DIRECT rage flames of
> Linus to be unleashed again and point out the /other reference/ to
> IOCB_NOWAIT in mm/filemap.c. That is, in generic_file_read_iter(),
> in the *generic O_DIRECT read path*:
> 
> 	if (iocb->ki_flags & IOCB_DIRECT) {
> .....
> 		if (iocb->ki_flags & IOCB_NOWAIT) {
> 			if (filemap_range_has_page(mapping, iocb->ki_pos,
> 						   iocb->ki_pos + count - 1))
> 				return -EAGAIN;
> 		} else {
> .....

OK, thanks Dave, this is a good point I've missed in this mail before 
(probabably as I focused only on the aspect of disagreement what NONBLOCK 
actually means :) ). I will look into fixing it for next iteration.

> It's effectively useless as a workaround because you can avoid the
> readahead IO being issued relatively easily:
> 
> void page_cache_sync_readahead(struct address_space *mapping,
>                                struct file_ra_state *ra, struct file *filp,
>                                pgoff_t offset, unsigned long req_size)
> {
>         /* no read-ahead */
>         if (!ra->ra_pages)
>                 return;
> 
>         if (blk_cgroup_congested())
>                 return;
> ....
> 
> IOWs, we just have to issue enough IO to congest the block device (or,
> even easier, a rate-limited cgroup), and we can still use RWF_NOWAIT
> to probe the page cache. Or if we can convince ra->ra_pages to be
> zero (e.g. it's on bdi device with no readahead configured because
> it's real fast) then it doesn't work there, either.

It's though questionable whether the noise level here wouldn't be too high 
already for any sidechannel to work reliably. So I'd suggest to operate 
under the assumption that it would be too noisy, unless anyone is able to 
prove otherwise.

Thanks,

-- 
Jiri Kosina
SUSE Labs


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, back to index

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <nycvar.YFH.7.76.1901051817390.16954@cbobk.fhfr.pm>
     [not found] ` <20190130124420.1834-1-vbabka@suse.cz>
     [not found]   ` <20190130124420.1834-3-vbabka@suse.cz>
2019-01-31  9:56     ` [PATCH 2/3] mm/filemap: initiate readahead even if IOCB_NOWAIT is set for the I/O Michal Hocko
2019-01-31 10:15       ` Jiri Kosina
2019-01-31 10:23         ` Michal Hocko
2019-01-31 10:30           ` Jiri Kosina
2019-01-31 11:32             ` Michal Hocko
2019-01-31 17:54           ` Linus Torvalds
2019-02-01  5:13             ` Dave Chinner
2019-02-01  7:05               ` Linus Torvalds
2019-02-01  7:21                 ` Linus Torvalds
2019-02-01  1:44       ` Dave Chinner
2019-02-12 15:48         ` Jiri Kosina

Linux-Fsdevel Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-fsdevel/0 linux-fsdevel/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-fsdevel linux-fsdevel/ https://lore.kernel.org/linux-fsdevel \
		linux-fsdevel@vger.kernel.org linux-fsdevel@archiver.kernel.org
	public-inbox-index linux-fsdevel


Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-fsdevel


AGPL code for this site: git clone https://public-inbox.org/ public-inbox