linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jeffle Xu <jefflexu@linux.alibaba.com>
To: enwlinux@gmail.com, linux-ext4@vger.kernel.org
Cc: tytso@mit.edu, joseph.qi@linux.alibaba.com
Subject: [PATCH RFC] ext4: fix partial cluster initialization when splitting extent
Date: Thu, 14 May 2020 16:14:57 +0800	[thread overview]
Message-ID: <1589444097-38535-1-git-send-email-jefflexu@linux.alibaba.com> (raw)

Hi Eric, would you mind explaining why the magic number '2' is used here
when calculating the physical cluster number of the partial cluster in
commit f4226d9ea400 ("ext4: fix partial cluster initialization") ?

```
+     /*
+      * If we're going to split the extent, note that
+      * the cluster containing the block after 'end' is
+      * in use to avoid freeing it when removing blocks.
+      */
+     if (sbi->s_cluster_ratio > 1) {
+             pblk = ext4_ext_pblock(ex) + end - ee_block + 2;
+             partial_cluster =
+                     -(long long) EXT4_B2C(sbi, pblk);
+     }
```

As far as I understand, there we are initializing the partial cluster
describing the beginning of the split extent after 'end'. The
corrsponding physical block number of the first block in the split
extent should be 'ext4_ext_pblock(ex) + end - ee_block + 1'.

This bug will cause xfstests shared/298 failure on ext4 with bigalloc
enabled sometimes. Ext4 error messages indicate that previously freed
blocks are being freed again, and the following fsck will fail due to
the inconsistency of block bitmap and bg descriptor.

The following is an example case:

1. First, Initialize a ext4 filesystem with cluster size '16K', block size
'4K', in which case, one cluster contains four blocks.

2. Create one file (e.g., xxx.img) on this ext4 filesystem. Now the extent
tree of this file is like:

...
36864:[0]4:220160
36868:[0]14332:145408
51200:[0]2:231424
...

3. Then execute PUNCH_HOLE fallocate on this file. The hole range is
like:

..
ext4_ext_remove_space: dev 254,16 ino 12 since 49506 end 49506 depth 1
ext4_ext_remove_space: dev 254,16 ino 12 since 49544 end 49546 depth 1
ext4_ext_remove_space: dev 254,16 ino 12 since 49605 end 49607 depth 1
...

4. Then the extent tree of this file after punching is like

...
49507:[0]37:158047
49547:[0]58:158087
...

5. Detailed procedure of punching hole [49544, 49546]

5.1. The block address space:
```
lblk        ~49505  49506   49507~49543     49544~49546    49547~
	  ---------+------+-------------+----------------+--------
	    extent | hole |   extent	|	hole	 | extent
	  ---------+------+-------------+----------------+--------
pblk       ~158045  158046  158047~158083  158084~158086   158087~
```

5.2. The detailed layout of cluster 39521:
```
		cluster 39521
	<------------------------------->

		hole		  extent
	<----------------------><--------

lblk      49544   49545   49546   49547
	+-------+-------+-------+-------+
	|	|	|	|	|
	+-------+-------+-------+-------+
pblk     158084  1580845  158086  158087
```

5.3. The ftrace output when punching hole [49544, 49546]:
- ext4_ext_remove_space (start 49544, end 49546)
  - ext4_ext_rm_leaf (start 49544, end 49546, last_extent [49507(158047), 40], partial [pclu 39522 lblk 0 state 2])
    - ext4_remove_blocks (extent [49507(158047), 40], from 49544 to 49546, partial [pclu 39522 lblk 0 state 2]
      - ext4_free_blocks: (block 158084 count 4)
        - ext4_mballoc_free (extent 1/6753/1)

In this case, the whole cluster 39521 is freed mistakenly when freeing
pblock 158084~158086 (i.e., the first three blocks of this cluster),
although pblock 158087 (the last remaining block of this cluster) has
not been freed yet.

The root cause of this isuue is that, the pclu of the partial cluster is
calculated mistakenly in ext4_ext_remove_space(). The correct
partial_cluster.pclu (i.e., the cluster number of the first block in the
next extent, that is, lblock 49597 (pblock 158086)) should be 39521 rather
than 39522.

Fixes: f4226d9ea400 ("ext4: fix partial cluster initialization")
Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
---
 fs/ext4/extents.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index f2b577b..cb74496 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -2828,7 +2828,7 @@ int ext4_ext_remove_space(struct inode *inode, ext4_lblk_t start,
 			 * in use to avoid freeing it when removing blocks.
 			 */
 			if (sbi->s_cluster_ratio > 1) {
-				pblk = ext4_ext_pblock(ex) + end - ee_block + 2;
+				pblk = ext4_ext_pblock(ex) + end - ee_block + 1;
 				partial.pclu = EXT4_B2C(sbi, pblk);
 				partial.state = nofree;
 			}
-- 
1.8.3.1


             reply	other threads:[~2020-05-14  8:15 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-14  8:14 Jeffle Xu [this message]
2020-05-14 22:21 ` [PATCH RFC] ext4: fix partial cluster initialization when splitting extent Eric Whitney
2020-05-18 22:08   ` Eric Whitney
2020-05-19  3:29     ` JeffleXu
2020-05-21 21:26       ` Eric Whitney
2020-05-22  3:09         ` JeffleXu
2020-05-22  4:02 Jeffle Xu
2020-05-22  4:15 ` JeffleXu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1589444097-38535-1-git-send-email-jefflexu@linux.alibaba.com \
    --to=jefflexu@linux.alibaba.com \
    --cc=enwlinux@gmail.com \
    --cc=joseph.qi@linux.alibaba.com \
    --cc=linux-ext4@vger.kernel.org \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).