From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx2.suse.de ([195.135.220.15]:57985 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754174AbeEWH6l (ORCPT ); Wed, 23 May 2018 03:58:41 -0400 Subject: Re: [PATCH] Btrfs: implement unlocked buffered write To: Christoph Hellwig , Chris Mason Cc: robbieko , linux-btrfs@vger.kernel.org References: <1526442757-7167-1-git-send-email-robbieko@synology.com> <20180522180828.GA8340@infradead.org> <20180523063713.GA18285@infradead.org> From: Nikolay Borisov Message-ID: <66878a15-902b-dff8-1604-a918e2b25298@suse.com> Date: Wed, 23 May 2018 10:58:36 +0300 MIME-Version: 1.0 In-Reply-To: <20180523063713.GA18285@infradead.org> Content-Type: text/plain; charset=utf-8 Sender: linux-btrfs-owner@vger.kernel.org List-ID: On 23.05.2018 09:37, Christoph Hellwig wrote: > On Tue, May 22, 2018 at 02:31:36PM -0400, Chris Mason wrote: >>> And what protects two writes from interleaving their results now? >> >> page locks...ish, we at least won't have results interleaved in a single >> page. For btrfs it'll actually be multiple pages since we try to do more >> than one at a time. > > I think you are going to break just about every assumption people > have that any single write is going to be atomic vs another write. > > E.g. this comes from the posix read definition reference by the > write definition: > > "I/O is intended to be atomic to ordinary files and pipes and FIFOs. > Atomic means that all the bytes from a single operation that started out > together end up together, without interleaving from other I/O > operations. It is a known attribute of terminals that this is not > honored, and terminals are explicitly (and implicitly permanently) > excepted, making the behavior unspecified. The behavior for other device > types is also left unspecified, but the wording is intended to imply > that future standards might choose to specify atomicity (or not). > " > > Without taking the inode lock (or some sort of range lock) you can > easily interleave data from two write requests. > >> But we're not avoiding the inode lock completely, we're just dropping it for >> the expensive parts of writing to the file. A quick guess about what the >> expensive parts are: > > The way I read the patch it basically 'avoids' the inode lock for almost > the whole write call, just minus some setup. You are right, this drops the inode_lock for some rather crucial ops: prepare_pages - this one allocates pages in the page cache mapping for this inode and locks them lock_and_cleanup_extent_if_need - this one completes any previous writes for the range currenlty being written and locks the range of extents. Looking at the code I *think* the locking provided by prepare_pages/lock_and_cleanup_extent_if_need *should* provide the necessary exclusivity to guarantee atomicity w.r.t to different writes. So Robbie, This reliance on btrfs-internal locks to provide exclusivity *must* be explicitly documented in the commit log. > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html >