From: Linus Torvalds <torvalds@linux-foundation.org>
To: Andreas Gruenbacher <agruenba@redhat.com>
Cc: cluster-devel <cluster-devel@redhat.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [GIT PULL] gfs2 fix
Date: Wed, 27 Apr 2022 17:00:16 -0700
Message-ID: <CAHk-=wicJdoCjPLu7FhaErr6Z3UaW820U2b+F-8P4qwSFUZ0mg@mail.gmail.com>
In-Reply-To: <CAHk-=whQxvMvty8SjiGMh+gM4VmCYvqn6EAwmrDXJaHT2Aa+UA@mail.gmail.com>

On Wed, Apr 27, 2022 at 3:20 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> So I really think
>
>  (a) you are mis-reading the standard by attributing too strong logic
> to paperwork that is English prose and not so exact
>
>  (b) documenting Linux as not doing what you are mis-reading it for is
> only encouraging others to mis-read it too
>
> The whole "arbitrary writes have to be all-or-nothing wrt all other
> system calls" is simply not realistic, and has never been. Not just
> not in Linux, but in *ANY* operating system that POSIX was meant to
> describe.

Side note: a lot of those "atomic" things in that documentation have
come from a history of signal handling atomicity issues, and from all
the issues people had with (a) user-space threading implementations
and (b) emulation layers from non-Unixy environments.

So when they say that things like "rename()" has to be all-or-nothing,
it's to clarify that you can't emulate it as a "link and delete
original" kind of operation (which old UNIX *did* do) and claim to be
POSIX.

Because while the end result of rename() and link()+unlink() might be
similar, people did rely on that whole "use rename as a way to create
an atomic marker in the filesystem" (which is a very traditional UNIX
pattern).

So "rename()" has to be atomic, and the legacy behavior of link+unlink
is not valid in POSIX.
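
As a minimal sketch of that traditional pattern (the helper name
publish_marker() and the scratch-file naming are hypothetical, made up
for illustration): write the content under a temporary name in the same
directory, then rename() it into place, so that anybody looking for the
final name sees either nothing or the complete file.

```python
import os
import tempfile


def publish_marker(directory, name, payload):
    """Make `name` appear in `directory` either complete or not at all."""
    final_path = os.path.join(directory, name)
    # Hypothetical scratch-file naming. The scratch file has to live on
    # the same filesystem as the final name for rename() to work.
    fd, tmp_path = tempfile.mkstemp(prefix=name + ".", suffix=".tmp",
                                    dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(payload)
            tmp.flush()
            os.fsync(tmp.fileno())      # get the data out before publishing
        os.rename(tmp_path, final_path)  # the single step that publishes it
    except BaseException:
        # Clean up the scratch file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return final_path
```

A link()+unlink() emulation would instead leave a window where both
names (or, depending on the order, neither name) exist, which is the
intermediate state this pattern relies on never seeing.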

Similarly, you can't implement "pread()" as a "lseek+read+lseek back",
because that doesn't work if somebody else is doing another "pread()"
on the same file descriptor concurrently.

Again, people *did* write exactly those kinds of implementations
of "pread()", and yes, they were broken for both signals and for
threading.
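
The failure can be sketched deterministically, assuming a POSIX system
where Python's os.pread() is available. The emulated_pread() helper and
its "interloper" hook are hypothetical: the hook simply stands in for a
signal handler or another thread that touches the shared file position
at the worst possible moment.

```python
import os
import tempfile


def emulated_pread(fd, length, offset, interloper=None):
    """Broken pread() emulation: lseek + read + lseek back."""
    saved = os.lseek(fd, 0, os.SEEK_CUR)  # remember the shared position
    os.lseek(fd, offset, os.SEEK_SET)     # move it to the wanted offset
    if interloper is not None:
        interloper()                      # somebody else moves the position
    data = os.read(fd, length)            # reads from wherever it is *now*
    os.lseek(fd, saved, os.SEEK_SET)      # put the position back
    return data


def demo_pread():
    with tempfile.TemporaryFile() as f:
        f.write(b"0123456789")
        f.flush()
        fd = f.fileno()
        os.lseek(fd, 0, os.SEEK_SET)

        # The real pread() takes the offset as an argument and leaves
        # the file position alone.
        good = os.pread(fd, 4, 2)
        pos_after = os.lseek(fd, 0, os.SEEK_CUR)

        # The emulation depends on the shared position staying put.
        raced = emulated_pread(
            fd, 4, 2, interloper=lambda: os.lseek(fd, 6, os.SEEK_SET))
        return good, pos_after, raced
```

In this sketch demo_pread() returns the bytes at offset 2 for the real
call, with the file position still at 0, while the emulated call hands
back the bytes at offset 6 because the position moved underneath it.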

So there's "atomicity" and then there is "atomicity".

That "all or nothing" can be a very practical thing to describe
*roughly* how it must work on a higher level, or it can be a
theoretical "transactional" thing that works literally like a database
where the operation happens in full and you must not see any
intermediate state.

And no, "write()" and friends have never ever been about some
transactional operation where you can't see how the file grows as it
is being written to. That kind of atomicity has simply never existed,
not even in theory.

So when you see POSIX saying that a "read()" system call is "atomic",
you should *not* see it as a transaction thing, but see it in the
historical context of "people used to do threading libraries in user
space, and since they didn't want a big read() to block all other
threads, they'd split it up into many smaller reads and now another
thread *also* doing 'read()' system calls would see the data it read
being not one contiguous region, but multiple regions where the file
position changed in the middle".
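
A small sketch of that historical failure mode, again with hypothetical
names: split_read() plays the part of a user-space threading library
that chops one big read() into smaller ones, and the between_chunks
hook stands in for another thread being scheduled in between and doing
its own read() on the same file descriptor.

```python
import os
import tempfile


def split_read(fd, total, chunk, between_chunks=None):
    """Emulate one big read() as several small ones (the broken approach)."""
    parts = []
    remaining = total
    while remaining > 0:
        data = os.read(fd, min(chunk, remaining))
        if not data:                      # end of file
            break
        parts.append(data)
        remaining -= len(data)
        if between_chunks is not None and remaining > 0:
            between_chunks()              # the "other thread" gets to run
    return b"".join(parts)


def demo_split_read():
    with tempfile.TemporaryFile() as f:
        f.write(b"abcdefghijkl")
        f.flush()
        fd = f.fileno()
        os.lseek(fd, 0, os.SEEK_SET)

        stolen = []
        first = split_read(
            fd, 8, 4,
            between_chunks=lambda: stolen.append(os.read(fd, 2)))
        return first, stolen
```

Here the caller asked for eight bytes and got eight bytes, but not one
contiguous region: the other reader's two bytes came out of the middle,
because the file position changed between the small reads.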

Similarly, a "read()" system call will not be interrupted by a signal
in the middle, where the signal handler would do a "lseek()" or
another "read()", and now the original "read()" data suddenly is
affected.

That's why that whole "f_pos is atomic" thing is a big deal.

Because there literally were threading libraries (and badly emulated
environments) where that *WASN'T* the case, and _that_ is why POSIX
then talks about it.

So think of POSIX not as some hard set of "this is exactly how things
work and we describe every detail".

Instead, treat it a bit like historians treat Herodotus - interpreting
his histories by taking the issues of the time into account.  POSIX is
trying to clarify and document the problems of the time it was
written, and taking other things for granted.

                 Linus
