linux-kernel.vger.kernel.org archive mirror
From: Matthew Wilcox <willy@infradead.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: David Howells <dhowells@redhat.com>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Anna Schumaker <anna.schumaker@netapp.com>,
	Dave Wysochanski <dwysocha@redhat.com>,
	Dominique Martinet <asmadeus@codewreck.org>,
	Jeff Layton <jlayton@kernel.org>,
	Latchesar Ionkov <lucho@ionkov.net>,
	Marc Dionne <marc.dionne@auristor.com>,
	Omar Sandoval <osandov@osandov.com>,
	Shyam Prasad N <nspmangalore@gmail.com>,
	Steve French <sfrench@samba.org>,
	Trond Myklebust <trondmy@hammerspace.com>,
	Peter Zijlstra <peterz@infradead.org>,
	ceph-devel@vger.kernel.org, linux-afs@lists.infradead.org,
	linux-cachefs@redhat.com, CIFS <linux-cifs@vger.kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Linux-MM <linux-mm@kvack.org>,
	"open list:NFS, SUNRPC, AND..." <linux-nfs@vger.kernel.org>,
	v9fs-developer@lists.sourceforge.net,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: Out of order read() completion and buffer filling beyond returned amount
Date: Mon, 17 Jan 2022 13:30:05 +0000
Message-ID: <YeVvXToTxCsMzHZv@casper.infradead.org>
In-Reply-To: <CAHk-=wjQG5HnwQD98z8de1EvRzDnebZxh=gQUVTKCn0DOp7PQw@mail.gmail.com>

On Mon, Jan 17, 2022 at 12:19:29PM +0200, Linus Torvalds wrote:
> On Mon, Jan 17, 2022 at 11:57 AM David Howells <dhowells@redhat.com> wrote:
> >
> > Do you have an opinion on whether it's permissible for a filesystem to write
> > into the read() buffer beyond the amount it claims to return, though still
> > within the specified size of the buffer?
> 
> I'm pretty sure that would seriously violate POSIX in the general
> case, and maybe even break some programs that do fancy buffer
> management (ie I could imagine some circular buffer thing that expects
> any "unwritten" ('unread'?) parts to stay with the old contents)
> 
> That said, that's for generic 'read()' cases for things like tty's or
> pipes etc that can return partial reads in the first place.
> 
> If it's a regular file, then any partial read *already* violates
> POSIX, and nobody sane would do any such buffer management because
> it's supposed to be a 'can't happen' thing.
> 
> And since you mention DIO, that's doubly true, and is already outside
> basic POSIX, and has already violated things like "all or nothing"
> rules for visibility of writes-vs-reads (which admittedly most Linux
> filesystems have violated even outside of DIO, since the strictest
> reading of the rules are incredibly nasty anyway). But filesystems
> like XFS which took some of the strict rules more seriously already
> ignored them for DIO, afaik.

I think for DIO, you're sacrificing the entire buffer with any filesystem.
If the underlying file is split across multiple drives, or is even
just fragmented on a single drive, we'll submit multiple BIOs, which
will complete independently (even for SCSI, which writes sequentially;
never mind NVMe, which can DMA blocks asynchronously).  It might be
more apparent in a networking situation, where errors are more common,
but it's always been a possibility since Linux introduced DIO.
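To make the point concrete, here is a toy user-space model of a direct-I/O read that is split into several independently completing segments, one of which fails. It is a sketch under stated assumptions, not kernel code: the `Bio` class and `dio_read()` function are hypothetical, and the "returned amount" is assumed to be the contiguous successful prefix starting at offset 0. Even so, segments that completed after the failed one have already landed in the caller's buffer, beyond the count that read() reports.

```python
# Toy model only: Bio and dio_read() are hypothetical names, not kernel APIs.
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Bio:
    offset: int      # where this segment lands in the user buffer
    data: bytes      # bytes the device would DMA into that segment
    ok: bool = True  # False models an I/O error on this segment


def dio_read(buf: bytearray, bios: list[Bio], completion_order: list[int]) -> int:
    """Complete BIOs in completion_order, copying each successful one straight
    into buf, then return the count a read() would report: the length of the
    contiguous successful prefix starting at offset 0."""
    for i in completion_order:
        bio = bios[i]
        if bio.ok:
            # Data is written as soon as this BIO completes, whether or not
            # the segments before it have finished (or will succeed at all).
            buf[bio.offset:bio.offset + len(bio.data)] = bio.data

    returned = 0
    for bio in sorted(bios, key=lambda b: b.offset):
        if not bio.ok or bio.offset != returned:
            break  # first failed (or missing) segment ends the reported count
        returned += len(bio.data)
    return returned


if __name__ == "__main__":
    buf = bytearray(b"." * 16)
    bios = [Bio(0, b"AAAA"), Bio(4, b"BBBB", ok=False),
            Bio(8, b"CCCC"), Bio(12, b"DDDD")]
    n = dio_read(buf, bios, completion_order=[2, 0, 3, 1])
    print(n, bytes(buf))  # prints: 4 b'AAAA....CCCCDDDD'
```

The caller is told 4 bytes were read, yet bytes 8..15 of its buffer were overwritten, which is why a program doing O_DIRECT reads should treat everything past the returned count as undefined.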


Thread overview: 4+ messages
2022-01-17  9:57 Out of order read() completion and buffer filling beyond returned amount David Howells
2022-01-17 10:19 ` Linus Torvalds
2022-01-17 13:30   ` Matthew Wilcox [this message]
2022-01-18  7:25     ` Christoph Hellwig