From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-9.8 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI, SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id DB63FC433E1 for ; Thu, 20 Aug 2020 13:16:56 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id B3E3322BF3 for ; Thu, 20 Aug 2020 13:16:56 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1597929416; bh=5N3R9ej26xSgXvjO3fMYpG3b6zxAP3I8LVzaW3P5g2Q=; h=From:To:Cc:Subject:Date:In-Reply-To:References:List-ID:From; b=t6oEu6egESpPDxx4SU7VEFT85x9oTbXM3cOTf9dbn5VfRVKXZA8LlYGkeHuxb5dqd Gq2DhUOvE28LkK1MUA951BM2IZHjejH4ul0ZLSIn5lRNfuNfDMDDT09DNB5DLaY5GF eAXlREvBsjxgIZ2Vtpdt9vNl8IAwrQAMCHuYBvaI= Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729559AbgHTNQz (ORCPT ); Thu, 20 Aug 2020 09:16:55 -0400 Received: from mail.kernel.org ([198.145.29.99]:48482 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728368AbgHTJfO (ORCPT ); Thu, 20 Aug 2020 05:35:14 -0400 Received: from localhost (83-86-89-107.cable.dynamic.v4.ziggo.nl [83.86.89.107]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id 85B2820724; Thu, 20 Aug 2020 09:35:12 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1597916113; bh=5N3R9ej26xSgXvjO3fMYpG3b6zxAP3I8LVzaW3P5g2Q=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=x9mKBYL/8AaWDSujguB4G4bBGBhI1XN6VI1AynM4opyzUu7qk9gHRC65yL2iXxcF4 o8iy/iBESoU8K93u3NV5rb8/Z2EmmSO4YLkh8pQWHgPJH4z0XWvd0wxMEYsbPAC7HE tmj+3cYFiHhHO4llZB/UL/xRWhoZ5696IPhzDzTs= From: Greg Kroah-Hartman To: linux-kernel@vger.kernel.org Cc: Greg Kroah-Hartman , stable@vger.kernel.org, Josef Bacik , Filipe Manana , David Sterba Subject: [PATCH 5.7 016/204] btrfs: only commit delayed items at fsync if we are logging a directory Date: Thu, 20 Aug 2020 11:18:33 +0200 Message-Id: <20200820091607.031110603@linuxfoundation.org> X-Mailer: git-send-email 2.28.0 In-Reply-To: <20200820091606.194320503@linuxfoundation.org> References: <20200820091606.194320503@linuxfoundation.org> User-Agent: quilt/0.66 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Filipe Manana commit 5aa7d1a7f4a2f8ca6be1f32415e9365d026e8fa7 upstream. When logging an inode we are committing its delayed items if either the inode is a directory or if it is a new inode, created in the current transaction. We need to do it for directories, since new directory indexes are stored as delayed items of the inode and when logging a directory we need to be able to access all indexes from the fs/subvolume tree in order to figure out which index ranges need to be logged. However for new inodes that are not directories, we do not need to do it because the only type of delayed item they can have is the inode item, and we are guaranteed to always log an up to date version of the inode item: *) for a full fsync we do it by committing the delayed inode and then copying the item from the fs/subvolume tree with copy_inode_items_to_log(); *) for a fast fsync we always log the inode item based on the contents of the in-memory struct btrfs_inode. We guarantee this is always done since commit e4545de5b035c7 ("Btrfs: fix fsync data loss after append write"). So stop running delayed items for a new inodes that are not directories, since that forces committing the delayed inode into the fs/subvolume tree, wasting time and adding contention to the tree when a full fsync is not required. We will only do it in case a fast fsync is needed. This patch is part of a series that has the following patches: 1/4 btrfs: only commit the delayed inode when doing a full fsync 2/4 btrfs: only commit delayed items at fsync if we are logging a directory 3/4 btrfs: stop incremening log_batch for the log root tree when syncing log 4/4 btrfs: remove no longer needed use of log_writers for the log root tree After the entire patchset applied I saw about 12% decrease on max latency reported by dbench. The test was done on a qemu vm, with 8 cores, 16Gb of ram, using kvm and using a raw NVMe device directly (no intermediary fs on the host). The test was invoked like the following: mkfs.btrfs -f /dev/sdk mount -o ssd -o nospace_cache /dev/sdk /mnt/sdk dbench -D /mnt/sdk -t 300 8 umount /mnt/dsk CC: stable@vger.kernel.org # 5.4+ Reviewed-by: Josef Bacik Signed-off-by: Filipe Manana Signed-off-by: David Sterba Signed-off-by: Greg Kroah-Hartman --- fs/btrfs/tree-log.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) --- a/fs/btrfs/tree-log.c +++ b/fs/btrfs/tree-log.c @@ -5150,7 +5150,6 @@ static int btrfs_log_inode(struct btrfs_ const loff_t end, struct btrfs_log_ctx *ctx) { - struct btrfs_fs_info *fs_info = root->fs_info; struct btrfs_path *path; struct btrfs_path *dst_path; struct btrfs_key min_key; @@ -5193,15 +5192,17 @@ static int btrfs_log_inode(struct btrfs_ max_key.offset = (u64)-1; /* - * Only run delayed items if we are a dir or a new file. + * Only run delayed items if we are a directory. We want to make sure + * all directory indexes hit the fs/subvolume tree so we can find them + * and figure out which index ranges have to be logged. + * * Otherwise commit the delayed inode only if the full sync flag is set, * as we want to make sure an up to date version is in the subvolume * tree so copy_inode_items_to_log() / copy_items() can find it and copy * it to the log tree. For a non full sync, we always log the inode item * based on the in-memory struct btrfs_inode which is always up to date. */ - if (S_ISDIR(inode->vfs_inode.i_mode) || - inode->generation > fs_info->last_trans_committed) + if (S_ISDIR(inode->vfs_inode.i_mode)) ret = btrfs_commit_inode_delayed_items(trans, inode); else if (test_bit(BTRFS_INODE_NEEDS_FULL_SYNC, &inode->runtime_flags)) ret = btrfs_commit_inode_delayed_inode(inode);