From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from synology.com ([59.124.61.242]:41813 "EHLO synology.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754124AbeEWH0K (ORCPT ); Wed, 23 May 2018 03:26:10 -0400 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8; format=flowed Date: Wed, 23 May 2018 15:26:09 +0800 From: robbieko To: Chris Mason Cc: Christoph Hellwig , linux-btrfs@vger.kernel.org, linux-btrfs-owner@vger.kernel.org Subject: Re: [PATCH] Btrfs: implement unlocked buffered write In-Reply-To: References: <1526442757-7167-1-git-send-email-robbieko@synology.com> <20180522180828.GA8340@infradead.org> Message-ID: <7ceaa6253162b32bf23e57e1763647de@synology.com> Sender: linux-btrfs-owner@vger.kernel.org List-ID: Chris Mason 於 2018-05-23 02:31 寫到: > On 22 May 2018, at 14:08, Christoph Hellwig wrote: > >> On Wed, May 16, 2018 at 11:52:37AM +0800, robbieko wrote: >>> From: Robbie Ko >>> >>> This idea is from direct io. By this patch, we can make the buffered >>> write parallel, and improve the performance and latency. But because >>> we >>> can not update isize without i_mutex, the unlocked buffered write >>> just >>> can be done in front of the EOF. >>> >>> We needn't worry about the race between buffered write and truncate, >>> because the truncate need wait until all the buffered write end. >>> >>> And we also needn't worry about the race between dio write and punch >>> hole, >>> because we have extent lock to protect our operation. >>> >>> I ran fio to test the performance of this feature. >> >> And what protects two writes from interleaving their results now? > > page locks...ish, we at least won't have results interleaved in a > single page. For btrfs it'll actually be multiple pages since we try > to do more than one at a time. > > I haven't verified all the assumptions around truncate and fallocate > and friends expecting the dio special locking to be inside i_size. In > general this makes me a little uncomfortable. > > But we're not avoiding the inode lock completely, we're just dropping > it for the expensive parts of writing to the file. A quick guess > about what the expensive parts are: > > 1) balance_dirty_pages() > 2) btrfs_btree_balance_dirty() > 3) metadata reservations/enospc waiting. > The expensive part of buffered_write are: 1. prepare_pages() --wait_on_page_writeback() Because writeback submit page to PG_writeback. We must wait until the page writeback IO ends. 2. lock_and_cleanup_extent_if_need --btrfs_start_ordered_extent When a large number of ordered_extent queue is in endio_write_workers workqueue. Buffered_write assumes that ordered_extent is the last one in the endio_write_workers workqueue, and waits for all ordered_extents to be processed before because the workqueue is a FIFO. Thanks. Robbie Ko > Can I bribe you to benchmark how much each of those things is > impacting the iops/latency benefits? > > -chris > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" > in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html