From: Mikulas Patocka <mpatocka@redhat.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
Alexander Viro <viro@zeniv.linux.org.uk>,
Andrew Morton <akpm@linux-foundation.org>,
Matthew Wilcox <willy@infradead.org>, Jan Kara <jack@suse.cz>,
Eric Sandeen <esandeen@redhat.com>,
Dave Chinner <dchinner@redhat.com>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>
Subject: the "read" syscall sees partial effects of the "write" syscall
Date: Fri, 18 Sep 2020 08:25:28 -0400 (EDT)
Message-ID: <alpine.LRH.2.02.2009180509370.19302@file01.intranet.prod.int.rdu2.redhat.com>
In-Reply-To: <CAPcyv4gFz6vBVVp_aiX4i2rL+8fps3gTQGj5cYw8QESCf7=DfQ@mail.gmail.com>
Hi
I'd like to ask about this problem: when we write to a file, the kernel
takes the inode lock for writing. When we read from a file, no lock is
taken, so the read syscall can return data that has been only partially
modified by a concurrent write syscall.
The standard specifies that the effects of the write syscall are atomic;
see this:
https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_09_07
> 2.9.7 Thread Interactions with Regular File Operations
>
> All of the following functions shall be atomic with respect to each
> other in the effects specified in POSIX.1-2017 when they operate on
> regular files or symbolic links:
>
> chmod() fchownat() lseek() readv() unlink()
> chown() fcntl() lstat() pwrite() unlinkat()
> close() fstat() open() rename() utime()
> creat() fstatat() openat() renameat() utimensat()
> dup2() ftruncate() pread() stat() utimes()
> fchmod() lchown() read() symlink() write()
> fchmodat() link() readlink() symlinkat() writev()
> fchown() linkat() readlinkat() truncate()
>
> If two threads each call one of these functions, each call shall either
> see all of the specified effects of the other call, or none of them. The
> requirement on the close() function shall also apply whenever a file
> descriptor is successfully closed, however caused (for example, as a
> consequence of calling close(), calling dup2(), or of process
> termination).
Should the read call take the inode lock for reading to make it atomic
w.r.t. the write syscall? (I know that taking the read lock causes a big
performance hit due to cache line bouncing.)
I've created this program to test it. It has two threads, one writing and
the other reading and verifying. When I run it on OpenBSD or FreeBSD, it
passes; on Linux it fails with "we read modified bytes".
Mikulas
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <string.h>
#include <pthread.h>

#define L 65536

static int h;
static pthread_barrier_t barrier;
static pthread_t thr;
static char rpattern[L];
static char wpattern[L];

/* Reader thread: after each barrier, read the whole block and check that
 * all bytes are equal, i.e. that we saw either all or none of a write. */
static void *reader(__attribute__((unused)) void *ptr)
{
	while (1) {
		int r;
		size_t i;

		r = pthread_barrier_wait(&barrier);
		if (r > 0) fprintf(stderr, "pthread_barrier_wait: %s\n", strerror(r)), exit(1);
		r = pread(h, rpattern, L, 0);
		if (r != L) perror("pread"), exit(1);
		for (i = 0; i < L; i++) {
			if (rpattern[i] != rpattern[0])
				fprintf(stderr, "we read modified bytes\n"), exit(1);
		}
	}
	return NULL;
}

int main(int argc, char *argv[])
{
	int r;

	if (argc < 2) fprintf(stderr, "usage: %s <file>\n", argv[0]), exit(1);
	h = open(argv[1], O_RDWR | O_CREAT | O_TRUNC, 0644);
	if (h < 0) perror("open"), exit(1);
	/* Initial contents: L zero bytes. */
	r = pwrite(h, wpattern, L, 0);
	if (r != L) perror("pwrite"), exit(1);
	r = pthread_barrier_init(&barrier, NULL, 2);
	if (r) fprintf(stderr, "pthread_barrier_init: %s\n", strerror(r)), exit(1);
	r = pthread_create(&thr, NULL, reader, NULL);
	if (r) fprintf(stderr, "pthread_create: %s\n", strerror(r)), exit(1);
	/* Writer loop: bump every byte, then race the pwrite against the
	 * reader's pread, both released by the same barrier. */
	while (1) {
		size_t i;

		for (i = 0; i < L; i++)
			wpattern[i]++;
		r = pthread_barrier_wait(&barrier);
		if (r > 0) fprintf(stderr, "pthread_barrier_wait: %s\n", strerror(r)), exit(1);
		r = pwrite(h, wpattern, L, 0);
		if (r != L) perror("pwrite"), exit(1);
	}
}