From: Jan Kara <jack@suse.cz> To: <linux-ext4@vger.kernel.org> Cc: Ted Tso <tytso@mit.edu>, Jan Kara <jack@suse.cz> Subject: [PATCH 0/19 v3] ext4: Fix transaction overflow due to revoke descriptors Date: Fri, 4 Oct 2019 00:05:46 +0200 [thread overview] Message-ID: <20191003215523.7313-1-jack@suse.cz> (raw) Hello, Here is v3 of this series with couple more bugs fixed. Now all failed tests Ted higlighted pass for me. Changes since v2: * Fixed bug in revoke credit estimates for extent freeing in bigalloc filesystems * Fixed bug in xattr code treating positive return of ext4_journal_ensure_credits() as error * Fixed preexisting bug in ext4_evict_inode() where we reserved too few credits * Added trace point to jbd2_journal_restart() * Fix some kernel doc bugs * Rebased on top of 5.4-rc1 Changes since v1: * Reordered some patches to reduce code churn * Computation in jbd2_revoke_descriptors_per_block() was too early - moved it to later when journal superblock is loaded and so the feature checking actually works. * Made sure nobody outside JBD2 uses handle->h_buffer_credits since now it contains also credits for revoke descriptors and it was confusing come users. * Updated cover letter with more details about reproducer Original cover letter: I've recently got a bug report where JBD2 assertion failed due to transaction commit running out of journal space. After closer inspection of the crash dump it seems that the problem is that there were too many journal descriptor blocks (more that max_transaction_size >> 5 + 32 we estimate in jbd2_log_space_left()) due to descriptor blocks with revoke records. In fact the estimate on the number of descriptor blocks looks pretty arbitrary and there can be much more descriptor blocks needed for revoke records. We need one revoke record for every metadata block freed. So in the worst case (1k blocksize, 64-bit journal feature enabled, checksumming enabled) we fit 125 revoke record in one descriptor block. In common cases its about 500 revoke records per descriptor block. Now when we free large directories or large file with data journalling enabled, we can have *lots* of blocks to revoke - with extent mapped files easily millions in a single transaction which can mean 10k descriptor blocks - clearly more than the estimate of 128 descriptor blocks per transaction ;) This patch series aims at fixing the problem by accounting descriptor blocks into transaction credits and reserving appropriate amount of credits for revoke descriptors on transaction handle start. Similar to normal transaction credits, the filesystem has to provide estimate for the number of blocks it is going to revoke using the transaction handle so that credits for revoke descriptors can be reserved. The series has survived fstests in couple configurations and also the stress test like: Create filesystem with 1KB blocksize and journal size 32MB Mount the filesystem with -o nodelalloc for (( i = 0; i < 4; i++ )); do dd if=/dev/zero of=file$i bs=1M count=2048 conv=fsync chattr +j file$i done for (( i = 0; i < 4; i++ )); do rm file$i& done which reliably triggers the assertion failure in JBD2 on unpatched kernel. Review and comments are welcome :). Honza Previous versions: Link: http://lore.kernel.org/r/20190927111536.16455-1-jack@suse.cz Link: http://lore.kernel.org/r/20190930103544.11479-1-jack@suse.cz
WARNING: multiple messages have this Message-ID (diff)
From: Jan Kara <jack@suse.cz> To: Ted Tso <tytso@mit.edu> Cc: <linux-ext4@vger.kernel.org>, Jan Kara <jack@suse.cz> Subject: [PATCH 0/25 v3] ext4: Fix transaction overflow due to revoke descriptors Date: Tue, 5 Nov 2019 17:44:06 +0100 [thread overview] Message-ID: <20191003215523.7313-1-jack@suse.cz> (raw) Message-ID: <20191105164406.eLBpC9DJG-1AAoIhXpHzOpmOgZ3LGdu_J-Or4O79Hvo@z> (raw) Hello, Here is v3 of this series with couple more bugs fixed. Now all failed tests Ted higlighted pass for me. Changes since v3: * Added reviewed-by tags * Added update of precomputed number of revoke records per descriptor block when journal features change * Fix possible inode leak in ext4_add_mkdir() in case of crash * Fix other cases where last inode reference gets dropped while the transaction is running * Improve comment about the new meaning of t_outstanding_credits * Added patch to reduce estimate on the base number of reserved descriptor blocks * jbd2_journal_revoke() now returns error in case there are not enough revoke credits so that ext4 can abort handle etc. * Improve estimate on the number of necessary revoke records for truncate * Add fix for too small array of journal buffers * Account number of revoke records in the transaction to avoid overestimation of necessary revoke descriptor blocks Changes since v2: * Fixed bug in revoke credit estimates for extent freeing in bigalloc filesystems * Fixed bug in xattr code treating positive return of ext4_journal_ensure_credits() as error * Fixed preexisting bug in ext4_evict_inode() where we reserved too few credits * Added trace point to jbd2_journal_restart() * Fix some kernel doc bugs * Rebased on top of 5.4-rc1 Changes since v1: * Reordered some patches to reduce code churn * Computation in jbd2_revoke_descriptors_per_block() was too early - moved it to later when journal superblock is loaded and so the feature checking actually works. * Made sure nobody outside JBD2 uses handle->h_buffer_credits since now it contains also credits for revoke descriptors and it was confusing come users. * Updated cover letter with more details about reproducer Original cover letter: I've recently got a bug report where JBD2 assertion failed due to transaction commit running out of journal space. After closer inspection of the crash dump it seems that the problem is that there were too many journal descriptor blocks (more that max_transaction_size >> 5 + 32 we estimate in jbd2_log_space_left()) due to descriptor blocks with revoke records. In fact the estimate on the number of descriptor blocks looks pretty arbitrary and there can be much more descriptor blocks needed for revoke records. We need one revoke record for every metadata block freed. So in the worst case (1k blocksize, 64-bit journal feature enabled, checksumming enabled) we fit 125 revoke record in one descriptor block. In common cases its about 500 revoke records per descriptor block. Now when we free large directories or large file with data journalling enabled, we can have *lots* of blocks to revoke - with extent mapped files easily millions in a single transaction which can mean 10k descriptor blocks - clearly more than the estimate of 128 descriptor blocks per transaction ;) This patch series aims at fixing the problem by accounting descriptor blocks into transaction credits and reserving appropriate amount of credits for revoke descriptors on transaction handle start. Similar to normal transaction credits, the filesystem has to provide estimate for the number of blocks it is going to revoke using the transaction handle so that credits for revoke descriptors can be reserved. The series has survived fstests in couple configurations and also the stress test like: Create filesystem with 1KB blocksize and journal size 32MB Mount the filesystem with -o nodelalloc for (( i = 0; i < 4; i++ )); do dd if=/dev/zero of=file$i bs=1M count=2048 conv=fsync chattr +j file$i done for (( i = 0; i < 4; i++ )); do rm file$i& done which reliably triggers the assertion failure in JBD2 on unpatched kernel. Review and comments are welcome :). Honza Previous versions: Link: http://lore.kernel.org/r/20190927111536.16455-1-jack@suse.cz Link: http://lore.kernel.org/r/20190930103544.11479-1-jack@suse.cz
next reply other threads:[~2019-10-03 22:05 UTC|newest] Thread overview: 101+ messages / expand[flat|nested] mbox.gz Atom feed top 2019-10-03 22:05 Jan Kara [this message] 2019-10-03 22:05 ` [PATCH 01/22] jbd2: Fix possible overflow in jbd2_log_space_left() Jan Kara 2019-10-21 1:08 ` Theodore Y. Ts'o 2019-10-03 22:05 ` [PATCH 02/22] jbd2: Fixup stale comment in commit code Jan Kara 2019-10-21 1:08 ` Theodore Y. Ts'o 2019-10-03 22:05 ` [PATCH 03/22] ext4: Do not iput inode under running transaction in ext4_mkdir() Jan Kara 2019-10-21 1:21 ` Theodore Y. Ts'o 2019-10-24 10:19 ` Jan Kara 2019-10-24 12:09 ` Theodore Y. Ts'o 2019-10-24 13:37 ` Jan Kara 2019-11-04 12:35 ` Theodore Y. Ts'o 2019-10-03 22:05 ` [PATCH 04/22] ext4: Fix credit estimate for final inode freeing Jan Kara 2019-10-21 1:07 ` Theodore Y. Ts'o 2019-10-24 10:30 ` Jan Kara 2019-10-03 22:05 ` [PATCH 05/22] ext4: Fix ext4_should_journal_data() for EA inodes Jan Kara 2019-10-21 1:38 ` Theodore Y. Ts'o 2019-10-23 16:55 ` Jan Kara 2019-10-03 22:05 ` [PATCH 06/22] ext4: Use ext4_journal_extend() instead of jbd2_journal_extend() Jan Kara 2019-10-21 1:39 ` Theodore Y. Ts'o 2019-10-03 22:05 ` [PATCH 07/22] ext4: Avoid unnecessary revokes in ext4_alloc_branch() Jan Kara 2019-10-21 13:39 ` Theodore Y. Ts'o 2019-10-03 22:05 ` [PATCH 08/22] ext4: Provide function to handle transaction restarts Jan Kara 2019-10-21 16:20 ` Theodore Y. Ts'o 2019-10-23 16:25 ` Jan Kara 2019-10-03 22:05 ` [PATCH 09/22] ext4, jbd2: Provide accessor function for handle credits Jan Kara 2019-10-21 16:21 ` Theodore Y. Ts'o 2019-10-03 22:05 ` [PATCH 10/22] ocfs2: Use accessor function for h_buffer_credits Jan Kara 2019-10-21 16:21 ` Theodore Y. Ts'o 2019-10-03 22:05 ` [PATCH 11/22] jbd2: Fix statistics for the number of logged blocks Jan Kara 2019-10-21 16:24 ` Theodore Y. Ts'o 2019-10-03 22:05 ` [PATCH 12/22] jbd2: Reorganize jbd2_journal_stop() Jan Kara 2019-10-21 17:29 ` Theodore Y. Ts'o 2019-10-03 22:05 ` [PATCH 13/22] jbd2: Drop pointless check from jbd2_journal_stop() Jan Kara 2019-10-21 17:30 ` Theodore Y. Ts'o 2019-10-03 22:06 ` [PATCH 14/22] jbd2: Drop pointless wakeup " Jan Kara 2019-10-21 17:34 ` Theodore Y. Ts'o 2019-10-03 22:06 ` [PATCH 15/22] jbd2: Factor out common parts of stopping and restarting a handle Jan Kara 2019-10-21 17:49 ` Theodore Y. Ts'o 2019-10-23 16:17 ` Jan Kara 2019-11-04 12:36 ` Theodore Y. Ts'o 2019-11-04 12:59 ` Jan Kara 2019-10-03 22:06 ` [PATCH 16/22] jbd2: Account descriptor blocks into t_outstanding_credits Jan Kara 2019-10-21 21:04 ` Theodore Y. Ts'o 2019-10-23 13:09 ` Jan Kara 2019-10-03 22:06 ` [PATCH 17/22] jbd2: Drop jbd2_space_needed() Jan Kara 2019-10-21 21:05 ` Theodore Y. Ts'o 2019-10-03 22:06 ` [PATCH 18/22] jbd2: Reserve space for revoke descriptor blocks Jan Kara 2019-10-21 21:47 ` Theodore Y. Ts'o 2019-10-23 13:27 ` Jan Kara 2019-10-03 22:06 ` [PATCH 19/22] jbd2: Rename h_buffer_credits to h_total_credits Jan Kara 2019-10-21 21:48 ` Theodore Y. Ts'o 2019-10-03 22:06 ` [PATCH 20/22] jbd2: Make credit checking more strict Jan Kara 2019-10-21 22:29 ` Theodore Y. Ts'o 2019-10-23 13:30 ` Jan Kara 2019-10-03 22:06 ` [PATCH 21/22] ext4: Reserve revoke credits for freed blocks Jan Kara 2019-10-21 23:18 ` Theodore Y. Ts'o 2019-10-23 16:13 ` Jan Kara 2019-11-04 13:08 ` Theodore Y. Ts'o 2019-11-05 8:31 ` Jan Kara 2019-10-03 22:06 ` [PATCH 22/22] jbd2: Provide trace event for handle restarts Jan Kara 2019-10-21 23:18 ` Theodore Y. Ts'o 2019-10-19 19:19 ` [PATCH 0/19 v3] ext4: Fix transaction overflow due to revoke descriptors Theodore Y. Ts'o 2019-10-24 13:09 ` Jan Kara 2019-10-24 15:12 ` Jan Kara 2019-11-04 3:32 ` Theodore Y. Ts'o 2019-11-04 11:22 ` Jan Kara 2019-11-04 13:09 ` Theodore Y. Ts'o 2019-11-05 16:44 ` [PATCH 0/25 " Jan Kara 2019-11-05 16:44 ` [PATCH 01/25] jbd2: Fix possible overflow in jbd2_log_space_left() Jan Kara 2019-11-05 16:44 ` [PATCH 02/25] jbd2: Fixup stale comment in commit code Jan Kara 2019-11-05 16:44 ` [PATCH 03/25] jbd2: Completely fill journal descriptor blocks Jan Kara 2019-11-05 16:44 ` [PATCH 04/25] ext4: Move marking of handle as sync to ext4_add_nondir() Jan Kara 2019-11-05 16:44 ` [PATCH 05/25] ext4: Do not iput inode under running transaction Jan Kara 2019-11-05 16:44 ` [PATCH 06/25] ext4: Fix credit estimate for final inode freeing Jan Kara 2019-11-05 21:00 ` Theodore Y. Ts'o 2019-11-05 16:44 ` [PATCH 07/25] ext4: Fix ext4_should_journal_data() for EA inodes Jan Kara 2019-11-05 16:44 ` [PATCH 08/25] ext4: Use ext4_journal_extend() instead of jbd2_journal_extend() Jan Kara 2019-11-05 16:44 ` [PATCH 09/25] ext4: Avoid unnecessary revokes in ext4_alloc_branch() Jan Kara 2019-11-05 16:44 ` [PATCH 10/25] ext4: Provide function to handle transaction restarts Jan Kara 2019-11-05 16:44 ` [PATCH 11/25] ext4, jbd2: Provide accessor function for handle credits Jan Kara 2019-11-05 16:44 ` [PATCH 12/25] ocfs2: Use accessor function for h_buffer_credits Jan Kara 2019-11-05 16:44 ` [PATCH 13/25] jbd2: Fix statistics for the number of logged blocks Jan Kara 2019-11-05 16:44 ` [PATCH 14/25] jbd2: Reorganize jbd2_journal_stop() Jan Kara 2019-11-05 16:44 ` [PATCH 15/25] jbd2: Drop pointless check from jbd2_journal_stop() Jan Kara 2019-11-05 16:44 ` [PATCH 16/25] jbd2: Drop pointless wakeup " Jan Kara 2019-11-05 16:44 ` [PATCH 17/25] jbd2: Factor out common parts of stopping and restarting a handle Jan Kara 2019-11-05 16:44 ` [PATCH 18/25] jbd2: Account descriptor blocks into t_outstanding_credits Jan Kara 2019-11-05 16:44 ` [PATCH 19/25] jbd2: Drop jbd2_space_needed() Jan Kara 2019-11-05 16:44 ` [PATCH 20/25] jbd2: Reserve space for revoke descriptor blocks Jan Kara 2019-11-15 7:52 ` Eric Biggers 2019-11-15 10:02 ` Jan Kara 2019-11-15 14:20 ` Theodore Y. Ts'o 2019-11-15 17:10 ` Eric Biggers 2019-11-05 16:44 ` [PATCH 21/25] jbd2: Rename h_buffer_credits to h_total_credits Jan Kara 2019-11-05 16:44 ` [PATCH 22/25] jbd2: Make credit checking more strict Jan Kara 2019-11-05 16:44 ` [PATCH 23/25] ext4: Reserve revoke credits for freed blocks Jan Kara 2019-11-05 16:44 ` [PATCH 24/25] jbd2: Provide trace event for handle restarts Jan Kara 2019-11-05 16:44 ` [PATCH 25/25] jbd2: Fine tune estimate of necessary descriptor blocks Jan Kara 2019-11-05 21:04 ` [PATCH 0/25 v3] ext4: Fix transaction overflow due to revoke descriptors Theodore Y. Ts'o [not found] ` <20191112220614.GA11089@mit.edu> [not found] ` <20191113094545.GC6367@quack2.suse.cz> 2019-11-14 5:26 ` [PATCH 0/19 " Theodore Y. Ts'o 2019-11-14 8:49 ` Jan Kara
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=20191003215523.7313-1-jack@suse.cz \ --to=jack@suse.cz \ --cc=linux-ext4@vger.kernel.org \ --cc=tytso@mit.edu \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).