All of lore.kernel.org
 help / color / mirror / Atom feed
From: Linus Torvalds <torvalds@linux-foundation.org>
To: Dave Chinner <david@fromorbit.com>
Cc: Kent Overstreet <kent.overstreet@gmail.com>,
	Dave Chinner <dchinner@redhat.com>,
	"Darrick J . Wong" <darrick.wong@oracle.com>,
	Christoph Hellwig <hch@lst.de>,
	Matthew Wilcox <willy@infradead.org>,
	Amir Goldstein <amir73il@gmail.com>, Jan Kara <jack@suse.cz>,
	Linux List Kernel Mailing <linux-kernel@vger.kernel.org>,
	linux-xfs <linux-xfs@vger.kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Josef Bacik <josef@toxicpanda.com>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: pagecache locking (was: bcachefs status update) merged)
Date: Thu, 13 Jun 2019 17:08:16 -1000	[thread overview]
Message-ID: <CAHk-=whMHtg62J2KDKnyOTaoLs9GxcNz1hN9QKqpxoO=0bJqdQ@mail.gmail.com> (raw)
In-Reply-To: <20190613235524.GK14363@dread.disaster.area>

On Thu, Jun 13, 2019 at 1:56 PM Dave Chinner <david@fromorbit.com> wrote:
>
> - buffered read and buffered write can run concurrently if they
> don't overlap, but right now they are serialised because that's the
> only way to provide POSIX atomic write vs read semantics (only XFS
> provides userspace with that guarantee).

I do not believe that posix itself actually requires that at all,
although extended standards may.

That said, from a quality of implementation standpoint, it's obviously
a good thing to do, so it might be worth looking at if something
reasonable can be done. The XFS atomicity guarantees are better than
what other filesystems give, but they might also not be exactly
required.

But POSIX actually ends up being pretty lax, and says

  "Writes can be serialized with respect to other reads and writes. If
a read() of file data can be proven (by any means) to occur after a
write() of the data, it must reflect that write(), even if the calls
are made by different processes. A similar requirement applies to
multiple write operations to the same file position. This is needed to
guarantee the propagation of data from write() calls to subsequent
read() calls. This requirement is particularly significant for
networked file systems, where some caching schemes violate these
semantics."

Note the "can" in "can be serialized", not "must". Also note that
whole language about how the read file data must match the written
data only if the read can be proven to have occurred after a write of
that data.  Concurrency is very much left in the air, only provably
serial operations matter.

(There is also language that talks about "after the write has
successfully returned" etc - again, it's about reads that occur
_after_ the write, not concurrently with the write).

The only atomicity guarantees are about the usual pipe writes and
PIPE_BUF. Those are very explicit.

Of course, there are lots of standards outside of just the POSIX
read/write thing, so you may be thinking of some other stricter
standard. POSIX itself has always been pretty permissive.

And as mentioned, I do agree from a QoI standpoint that atomicity is
nice, and that the XFS behavior is better. However, it does seem that
nobody really cares, because I'm not sure we've ever done it in
general (although we do have that i_rwsem, but I think it's mainly
used to give the proper lseek behavior). And so the XFS behavior may
not necessarily be *worth* it, although I presume you have some test
for this as part of xfstests.

                Linus

  parent reply	other threads:[~2019-06-14  3:16 UTC|newest]

Thread overview: 66+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-06-10 19:14 bcachefs status update (it's done cooking; let's get this sucker merged) Kent Overstreet
2019-06-10 19:14 ` [PATCH 01/12] Compiler Attributes: add __flatten Kent Overstreet
2019-06-12 17:16   ` Greg KH
2019-06-10 19:14 ` [PATCH 02/12] locking: SIX locks (shared/intent/exclusive) Kent Overstreet
2019-06-10 19:14 ` [PATCH 03/12] mm: pagecache add lock Kent Overstreet
2019-06-10 19:14 ` [PATCH 04/12] mm: export find_get_pages() Kent Overstreet
2019-06-10 19:14 ` [PATCH 05/12] fs: insert_inode_locked2() Kent Overstreet
2019-06-10 19:14 ` [PATCH 06/12] fs: factor out d_mark_tmpfile() Kent Overstreet
2019-06-10 19:14 ` [PATCH 07/12] Propagate gfp_t when allocating pte entries from __vmalloc Kent Overstreet
2019-06-10 19:14 ` [PATCH 08/12] block: Add some exports for bcachefs Kent Overstreet
2019-06-10 19:14 ` [PATCH 09/12] bcache: optimize continue_at_nobarrier() Kent Overstreet
2019-06-10 19:14 ` [PATCH 10/12] bcache: move closures to lib/ Kent Overstreet
2019-06-11 10:25   ` Coly Li
2019-06-13  7:28   ` Christoph Hellwig
2019-06-13 11:04     ` Kent Overstreet
2019-06-10 19:14 ` [PATCH 11/12] closures: closure_wait_event() Kent Overstreet
2019-06-11 10:25   ` Coly Li
2019-06-12 17:17   ` Greg KH
2019-06-10 19:14 ` [PATCH 12/12] closures: fix a race on wakeup from closure_sync Kent Overstreet
2019-07-16 10:47   ` Coly Li
2019-07-18  7:46     ` Coly Li
2019-07-22 17:22       ` Kent Overstreet
2020-11-24 23:07         ` Marc Smith
2020-11-25 18:10           ` Marc Smith
2019-06-10 20:46 ` bcachefs status update (it's done cooking; let's get this sucker merged) Linus Torvalds
2019-06-11  1:17   ` Kent Overstreet
2019-06-11  4:33     ` Dave Chinner
2019-06-12 16:21       ` Kent Overstreet
2019-06-12 23:02         ` Dave Chinner
2019-06-13 18:36           ` pagecache locking (was: bcachefs status update) merged) Kent Overstreet
2019-06-13 21:13             ` Andreas Dilger
2019-06-13 21:21               ` Kent Overstreet
2019-06-14  0:35                 ` Dave Chinner
2019-06-13 23:55             ` Dave Chinner
2019-06-14  2:30               ` Linus Torvalds
2019-06-14  7:30                 ` Dave Chinner
2019-06-15  1:15                   ` Linus Torvalds
2019-06-14  3:08               ` Linus Torvalds [this message]
2019-06-15  4:01                 ` Linus Torvalds
2019-06-17 22:47                   ` Dave Chinner
2019-06-17 23:38                     ` Linus Torvalds
2019-06-18  4:21                       ` Amir Goldstein
2019-06-19 10:38                         ` Jan Kara
2019-06-19 22:37                           ` Dave Chinner
2019-07-03  0:04                             ` pagecache locking Boaz Harrosh
     [not found]                               ` <DM6PR19MB250857CB8A3A1C8279D6F2F3C5FB0@DM6PR19MB2508.namprd19.prod.outlook.com>
2019-07-03  1:25                                 ` Boaz Harrosh
2019-07-05 23:31                               ` Dave Chinner
2019-07-07 15:05                                 ` Boaz Harrosh
2019-07-07 23:55                                   ` Dave Chinner
2019-07-08 13:31                                 ` Jan Kara
2019-07-09 23:47                                   ` Dave Chinner
2019-07-10  8:41                                     ` Jan Kara
2019-06-14 17:08               ` pagecache locking (was: bcachefs status update) merged) Kent Overstreet
2019-06-19  8:21           ` bcachefs status update (it's done cooking; let's get this sucker merged) Jan Kara
2019-07-03  1:04             ` [PATCH] mm: Support madvise_willneed override by Filesystems Boaz Harrosh
2019-07-03 17:21               ` Jan Kara
2019-07-03 18:03                 ` Boaz Harrosh
2019-06-11  4:55     ` bcachefs status update (it's done cooking; let's get this sucker merged) Linus Torvalds
2019-06-11 14:26       ` Matthew Wilcox
2019-06-11  4:10   ` Dave Chinner
2019-06-11  4:39     ` Linus Torvalds
2019-06-11  7:10       ` Dave Chinner
2019-06-12  2:07         ` Linus Torvalds
2019-07-03  5:59 ` Stefan K
2020-06-10 11:02 ` Stefan K
2020-06-10 11:32   ` Kent Overstreet

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAHk-=whMHtg62J2KDKnyOTaoLs9GxcNz1hN9QKqpxoO=0bJqdQ@mail.gmail.com' \
    --to=torvalds@linux-foundation.org \
    --cc=akpm@linux-foundation.org \
    --cc=amir73il@gmail.com \
    --cc=darrick.wong@oracle.com \
    --cc=david@fromorbit.com \
    --cc=dchinner@redhat.com \
    --cc=hch@lst.de \
    --cc=jack@suse.cz \
    --cc=josef@toxicpanda.com \
    --cc=kent.overstreet@gmail.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=viro@zeniv.linux.org.uk \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.