From: Linus Torvalds <torvalds@linux-foundation.org>
To: Andreas Gruenbacher <agruenba@redhat.com>
Cc: cluster-devel <cluster-devel@redhat.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [GIT PULL] gfs2 fix
Date: Wed, 27 Apr 2022 13:25:56 -0700
Message-ID: <CAHk-=wgSYSNc5sF2EVxhjbSc+c4LTs90aYaK2wavNd_m2bUkGg@mail.gmail.com>
In-Reply-To: <CAHc6FU5654k7QBU97g_Ubj8cJEWuA_bXPuXOPpBBYoXVPMJG=g@mail.gmail.com>

On Wed, Apr 27, 2022 at 12:41 PM Andreas Gruenbacher
<agruenba@redhat.com> wrote:
>
> I wonder if this could be documented in the read and write manual
> pages. Or would that be asking too much?

I don't think it would be asking too much, since it's basically just
describing what Linux has always done in all the major filesystems.

Eg look at filemap_read(), which is basically the canonical read
function, and note how it doesn't take a single lock at that level.

We *do* have synchronization at a page level, though, ie we've always
had that page-level "uptodate" bit, of course (ok, so "always" isn't
true - back in the distant past it was the 'struct buffer_head' that
was the synchronization point).

Even that, though, does not synchronize against "new writes", but
only against "new creations" (which may, of course, be writers, but are
equally likely to be just reading the contents from disk).

That said:

 (a) different filesystems can and will do different things.

Not all filesystems use filemap_read() at all, and even the ones that
do often have their own wrappers. Such wrappers *can* do extra
serialization, and have their own rules. But ext4 does not, for
example (see ext4_file_read_iter()).

And as mentioned, I *think* XFS honors that old POSIX rule for
historical reasons.

 (b) we do have *different* locking

For example, these days we do actually serialize properly on
file->f_pos, which means that a certain *class* of read/write operations
is atomic wrt each other: we hold that f_pos lock over the whole
operation, so if you do file reads and writes using the same file
descriptor, they'll be disjoint.

That, btw, hasn't always been true. If you had multiple threads using
the same file pointer, I think we used to get basically random
results. So we have actually strengthened our locking in this area,
and made it much better.
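
As an illustration of the f_pos point, here is a minimal userspace
sketch. It is not taken from the kernel tree, and the names
shared_fpos_demo(), write_records(), REC_SIZE and NUM_RECS are made up
for this example. It assumes a kernel with the f_pos serialization
described above (the write(2) man page dates the fix to Linux 3.14) and
a writable /tmp. Two processes share one open file description after
fork() and both append records with plain write(); if position updates
are serialized, no record is lost or overwritten.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define REC_SIZE 64   /* bytes per record (arbitrary, for illustration) */
#define NUM_RECS 200  /* records written by each process (arbitrary) */

/* Write NUM_RECS records filled with 'tag' using plain write(), which
 * both reads and advances the shared file position. */
static int write_records(int fd, char tag)
{
    char rec[REC_SIZE];

    assert(fd >= 0);
    memset(rec, tag, sizeof rec);
    for (int i = 0; i < NUM_RECS; i++)
        if (write(fd, rec, sizeof rec) != (ssize_t)sizeof rec)
            return -1;
    return 0;
}

/* Returns 0 if all records from both writers are present and intact,
 * -1 on any error or if a record was lost or mixed. */
int shared_fpos_demo(void)
{
    char path[] = "/tmp/fpos-demo-XXXXXX";
    char rec[REC_SIZE];
    int fd, status, rc, count_a = 0, count_b = 0;
    pid_t pid;
    off_t end;

    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);           /* file disappears once the fd is closed */

    pid = fork();           /* child shares the same open file description */
    if (pid < 0) {
        close(fd);
        return -1;
    }
    if (pid == 0)
        _exit(write_records(fd, 'B') == 0 ? 0 : 1);

    rc = write_records(fd, 'A');
    if (waitpid(pid, &status, 0) != pid ||
        !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        rc = -1;

    /* Overlapping writes would leave the file shorter than expected. */
    end = lseek(fd, 0, SEEK_END);
    if (end != (off_t)2 * NUM_RECS * REC_SIZE)
        rc = -1;

    /* Every record must consist of a single tag byte throughout. */
    for (off_t off = 0; rc == 0 && off < end; off += REC_SIZE) {
        if (pread(fd, rec, sizeof rec, off) != (ssize_t)sizeof rec) {
            rc = -1;
            break;
        }
        for (size_t i = 1; i < sizeof rec; i++)
            if (rec[i] != rec[0])
                rc = -1;
        if (rec[0] == 'A')
            count_a++;
        else if (rec[0] == 'B')
            count_b++;
    }
    if (count_a != NUM_RECS || count_b != NUM_RECS)
        rc = -1;

    close(fd);
    return rc;
}
```

The order in which 'A' and 'B' records interleave is not defined; the
sketch only checks that the records are disjoint.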

But note how even if you have the same file descriptor open, and then
do pread/pwrite, those can and will happen concurrently.
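
A small sketch of why pread/pwrite fall outside the f_pos
serialization: they carry their own offset and never consult or update
the shared file position. The function name positional_io_demo() and
the /tmp path are assumptions made for this example only.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 0 if pwrite()/pread() worked at their explicit offsets and
 * left the file position untouched, -1 otherwise. */
int positional_io_demo(void)
{
    char path[] = "/tmp/pio-demo-XXXXXX";
    char buf[8] = { 0 };
    int rc = -1;
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    unlink(path);                       /* file disappears on close */

    /* Plain write() uses the file position and moves it to 10. */
    if (write(fd, "0123456789", 10) != 10)
        goto out;
    /* Park the file position at a known value. */
    if (lseek(fd, 4, SEEK_SET) != 4)
        goto out;

    /* Positional I/O: the offset is an argument, f_pos is not involved. */
    if (pwrite(fd, "AB", 2, 0) != 2)
        goto out;
    assert(sizeof buf >= 4);
    if (pread(fd, buf, 4, 0) != 4)
        goto out;
    if (memcmp(buf, "AB23", 4) != 0)
        goto out;

    /* The file position is still where lseek() left it. */
    if (lseek(fd, 0, SEEK_CUR) != 4)
        goto out;
    rc = 0;
out:
    close(fd);
    return rc;
}
```

Since nothing here goes through the shared position, two such calls on
the same descriptor have no common lock at this level; any ordering
between them is up to the filesystem.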

And mmap accesses and modifications are obviously *always* concurrent,
even if the fault itself - but not the accesses - might end up being
serialized due to some filesystem locking implementation detail.
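
To make the mmap case concrete, here is a hedged sketch in which a file
is modified with ordinary CPU stores through a MAP_SHARED mapping, with
no read or write system call involved in the modification itself. The
function name mmap_store_demo() and the /tmp path are assumptions for
this example, and it assumes a filesystem whose shared mappings and
read path use the same page cache (the common case on Linux).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 0 if data stored through the mapping is visible to pread()
 * on the same file, -1 otherwise. */
int mmap_store_demo(void)
{
    char path[] = "/tmp/mmap-demo-XXXXXX";
    char buf[5];
    char *map;
    long page = sysconf(_SC_PAGESIZE);
    int rc = -1;
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    unlink(path);                       /* file disappears on close */
    if (page <= 0 || ftruncate(fd, (off_t)page) != 0)
        goto out;

    map = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
               MAP_SHARED, fd, 0);
    if (map == MAP_FAILED)
        goto out;

    /* Plain memory stores: at most the first touch takes a page fault,
     * the stores themselves are not system calls. */
    assert((size_t)page >= sizeof buf);
    memcpy(map, "hello", 5);

    /* The read path sees the stores through the shared page cache. */
    if (pread(fd, buf, sizeof buf, 0) == (ssize_t)sizeof buf &&
        memcmp(buf, "hello", sizeof buf) == 0)
        rc = 0;

    munmap(map, (size_t)page);
out:
    close(fd);
    return rc;
}
```

Because the stores are just memory accesses, a concurrent reader could
observe any intermediate state of the memcpy(); the sketch checks only
the final contents from a single thread.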

End result: the exact serialization is complex, depends on the
filesystem, and is just not really something that should be described
or even relied on (eg that f_pos serialization is something we do
properly now, but didn't necessarily do in the past, so ..)

Is it then worth pointing out one odd POSIX rule that basically nobody
but some very low-level filesystem people have ever heard about, and
that no version of Linux has ever conformed to in the main default
filesystems, and that no user has ever cared about?

             Linus
