On Aug 21, 2019, at 12:27 PM, Harshad Shirwadkar wrote: > > On every dentry deletion, this patch traverses directory inode in > reverse direction and frees up empty dirent blocks until it finds a > non-empty dirent block. We leverage the fact that we never clear > dentry name when we delete dentrys by merging them with the previous > one. So, even though the dirent block has only fake dentry which spans > across the entire block, we can use the name in this dead entry to > perform dx lookup and find intermediate dx node blocks as well as > offset inside these blocks. One high-level limitation with this implementation is that it is unlikely to remove any directory blocks until the directory is almost entirely empty since "rm -r" will return entries in hash order, which does not match the order that the leaf blocks are allocated in the file. Even worse, if files in the directory are not deleted in hash order, no leaf block will be completely empty until about 99% of the files have been deleted - assume 24-byte filenames in 4096-byte blocks means up to 128 entries per block, typically 3/4 full, or 1/96 entries will be left in each block before it becomes empty. One option that was discussed in the past is to use the high 4 bits of dx_entry->block (i.e. the opposite of dx_get_block()) to store the "fullness" of each block (in 1/16th of a block, or 256-byte increments for 4096-byte blocks) and try to merge entries into an adjacent block if it becomes mostly empty (e.g. if the current block plus the neighbour are below 50% full). That allows removing blocks much earlier as the directory shrinks, rather than waiting until each block is completely empty. A fullness of "0" would mean "unset", since we don't set it yet, and once this feature is available there would never be a block that is entirely empty. > As of now, we only support non-indexed directories and indexed > directories with no intermediate dx nodes. This technique can also be > used to remove intermediate dx nodes. But it needs a little more > interesting logic to make that happen since we don't store directory > entry name in intermediate nodes. > > Ran kvm-xfstests smoke test-suite and verified that there are no > failures. Also, verified that when all the files are deleted in a > directory, directory shrinks to either 4k for non-indexed directories > or 8k for indexed directories with 1 level. > > This patch is an improvement over previous patch that I sent out > earlier this month. So, if this patch looks alright, then we can drop > the other shrinking patch. > > Signed-off-by: Harshad Shirwadkar > > --- > This patch supersedes the other directory shrinking patch sent in Aug > 2019 ("ext4: shrink directory when last block is empty"). > > fs/ext4/namei.c | 176 ++++++++++++++++++++++++++++++++++++++++++++++++ > 1 file changed, 176 insertions(+) > > > > +static inline bool is_empty_dirent_block(struct inode *dir, > + struct buffer_head *bh) > +{ > + struct ext4_dir_entry_2 *de = (struct ext4_dir_entry_2 *)bh->b_data; > + int csum_size = 0; > + > + if (ext4_has_metadata_csum(dir->i_sb)) > + csum_size = sizeof(struct ext4_dir_entry_tail); > + > + return ext4_rec_len_from_disk(de->rec_len, dir->i_sb->s_blocksize) == > + dir->i_sb->s_blocksize - csum_size && de->inode == 0; > +} This may not always detect empty directory blocks properly, because ext4_generic_delete_entry() will only merge deleted entries with the previous entry. It at least appears possible that if entries are not deleted in the proper order (e.g. in reverse of the order they are listed in the directory) there may be multiple empty entries in a block, and the above check will fail. Instead, this checks should walk all entries in a block and return false if any one of them has a non-zero de->inode. In the common case there may be only a single entry, or the first entry will be used, so it should be fairly quick to decide that the block cannot be removed. Another option is to change ext4_generic_delete_entry() to also try to merge with the immediately following entry to ensure that an empty block always has rec_len of the full blocksize. However, I think this is probably not a worthwhile effort since it would be better to support removing blocks that are partly empty rather than entirely empty. > @@ -2510,6 +2684,8 @@ static int ext4_delete_entry(handle_t *handle, > if (unlikely(err)) > goto out; > > + ext4_try_dir_shrink(handle, dir); > + > return 0; I think it would be inefficient to try shrinking the directory after _every_ directory entry is removed. Instead, there should be some way to determine here if ext4_generic_delete_entry() removed the last entry from the directory block, and only shrink in that case. Cheers, Andreas