Date: Fri, 18 Sep 2020 08:25:28 -0400 (EDT)
From: Mikulas Patocka
To: Dan Williams
cc: Linus Torvalds, Alexander Viro, Andrew Morton, Matthew Wilcox,
    Jan Kara, Eric Sandeen, Dave Chinner, Linux Kernel Mailing List,
    linux-fsdevel
Subject: the "read" syscall sees partial effects of the "write" syscall

Hi

I'd like to ask about this problem: when we write to a file, the kernel
takes the write inode lock. When we read from a file, no lock is taken -
thus the read syscall can read data that are halfway modified by the write
syscall.

The standard specifies that the effects of the write syscall are atomic -
see this:
https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_09_07

> 2.9.7 Thread Interactions with Regular File Operations
>
> All of the following functions shall be atomic with respect to each
> other in the effects specified in POSIX.1-2017 when they operate on
> regular files or symbolic links:
>
> chmod()     fchownat()   lseek()       readv()       unlink()
> chown()     fcntl()      lstat()       pwrite()      unlinkat()
> close()     fstat()      open()        rename()      utime()
> creat()     fstatat()    openat()      renameat()    utimensat()
> dup2()      ftruncate()  pread()       stat()        utimes()
> fchmod()    lchown()     read()        symlink()     write()
> fchmodat()  link()       readlink()    symlinkat()   writev()
> fchown()    linkat()     readlinkat()  truncate()
>
> If two threads each call one of these functions, each call shall either
> see all of the specified effects of the other call, or none of them. The
> requirement on the close() function shall also apply whenever a file
> descriptor is successfully closed, however caused (for example, as a
> consequence of calling close(), calling dup2(), or of process
> termination).

Should the read call take the read inode lock to make it atomic w.r.t. the
write syscall? (I know - taking the read lock causes a big performance hit
due to cache line bouncing)

I've created this program to test it - it has two threads, one writing and
the other reading and verifying. When I run it on OpenBSD or FreeBSD, it
passes; on Linux it fails with "we read modified bytes".

Mikulas

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <fcntl.h>
#include <pthread.h>

#define L 65536

static int h;
static pthread_barrier_t barrier;
static pthread_t thr;

static char rpattern[L];
static char wpattern[L];

/* Reader thread: read the whole block and check that every byte is equal. */
static void *reader(__attribute__((unused)) void *ptr)
{
	while (1) {
		int r;
		size_t i;
		/* Start each round together with the writer. */
		r = pthread_barrier_wait(&barrier);
		if (r > 0) fprintf(stderr, "pthread_barrier_wait: %s\n", strerror(r)), exit(1);
		r = pread(h, rpattern, L, 0);
		if (r != L) perror("pread"), exit(1);
		/* A mix of byte values means we saw a partially applied write. */
		for (i = 0; i < L; i++) {
			if (rpattern[i] != rpattern[0])
				fprintf(stderr, "we read modified bytes\n"), exit(1);
		}
	}
	return NULL;
}

/* argv[1] is the path of the test file. */
int main(__attribute__((unused)) int argc, char *argv[])
{
	int r;
	h = open(argv[1], O_RDWR | O_CREAT | O_TRUNC, 0644);
	if (h < 0) perror("open"), exit(1);
	/* Fill the file with the initial (all zero) pattern. */
	r = pwrite(h, wpattern, L, 0);
	if (r != L) perror("pwrite"), exit(1);
	r = pthread_barrier_init(&barrier, NULL, 2);
	if (r) fprintf(stderr, "pthread_barrier_init: %s\n", strerror(r)), exit(1);
	r = pthread_create(&thr, NULL, reader, NULL);
	if (r) fprintf(stderr, "pthread_create: %s\n", strerror(r)), exit(1);
	/* Writer loop: bump every byte, then overwrite the whole block. */
	while (1) {
		size_t i;
		for (i = 0; i < L; i++)
			wpattern[i]++;
		r = pthread_barrier_wait(&barrier);
		if (r > 0) fprintf(stderr, "pthread_barrier_wait: %s\n", strerror(r)), exit(1);
		r = pwrite(h, wpattern, L, 0);
		if (r != L) perror("pwrite"), exit(1);
	}
}