linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: harshad shirwadkar <harshadshirwadkar@gmail.com>
To: "Theodore Y. Ts'o" <tytso@mit.edu>
Cc: Ext4 Developers List <linux-ext4@vger.kernel.org>
Subject: Re: [PATCH] ext4: attempt to shrink directory on dentry removal
Date: Sun, 25 Aug 2019 19:46:50 -0700	[thread overview]
Message-ID: <CAD+ocbyrJDbUFtQdq087iJCc=41CNACeAA2U_KQG-B4w19Soqw@mail.gmail.com> (raw)
In-Reply-To: <20190824023110.GB19348@mit.edu>

On Fri, Aug 23, 2019 at 7:31 PM Theodore Y. Ts'o <tytso@mit.edu> wrote:
>
> On Wed, Aug 21, 2019 at 11:27:40AM -0700, Harshad Shirwadkar wrote:
> > As of now, we only support non-indexed directories and indexed
> > directories with no intermediate dx nodes.
>
> From my testing, it doen't work on non-indexed directories; the
> problem is in the is_empty_dirent_block() function.
>
> > This technique can also be used to remove intermediate dx nodes. But
> > it needs a little more interesting logic to make that happen since
> > we don't store directory entry name in intermediate nodes.
>
> Actually, that's not the problem.  An empty dx node is not allowed; in
> a two-level htree directory, if removing an empty leaf node becomes
> causes its parent dx node to become empty, we need to remove the
> parent dx node immediately.  Which is fine, because when we did the
> dx_probe, we know the path from the root to the leaf node.
>
> But removing the parent dx node means we need to be able to remove an
> empty block from the directory, regardless whether it is at the end of
> the directory or not.  OK, so how to do that?
>
> First of all, how to do find any directory block's parent?  If it is a
> leaf node, we can simply calculate the hash on the first directory
> entry (whether it is "dead" or not), and then do a lookup based on
> that hash.  For an intermediate dx node, we can do the same thing,
> since a dx node is simply a list of hashes and block numbers.  The
> first hash in the dx node can be used to do a htree lookup, and that
> will give us the full path from the root of the htree to the dx node.
>
> So, suppose we delete a file, and that causes a leaf node to become
> empty.  We can actually shrink the directory right then and there, and
> not wait until the last block in the directory beomes empty.  How do we do that?
>
> 1) We'll call the logical block number of that empty leaf block Empty,
>    and the block number of the parent of that empty leaf block Parent(Empty).
>    We'll also call the logical block number of the last entry in the
>    directory, Last.
>
> 2) Remove the pointer to Empty from Parent(Empty).  If that causes
>    Parent(Empty) to become empty, we also need to remove the
>    reference to Parent(Empty) in Parent(Parent(Empty)).  (And for 3
>    level htree directories, which are present in Largedir
>    directories, we might also need to do that one more level up.)
>
> 3) To free the directory block Empty, if Empty != Last, we copy the
>    contents of Last into the directory block Empty, and then determine
>    Parent(Last), and find the pointer to Last, and update it to be
>    Empty.  At this point, the last directory block is not in use, so
>    we can release the last directory block, and shrink the size of the
>    directory by one block.

If last is an intermediate dx node, is there a way to find out if it
actually is an intermediate dx node? Because an empty dirent block and
an intermediate dx block look the same. Unless we do dx_probe() there
is no way to know if a block is an intermediate dx block. Is that
right or am I missing something?

Looking at your following comment, if metadata_csum feature is
enabled, then we can distinguish if a block is an empty dirent block
or an index block based on dentry->rec_len. If metadata csum is
enabled, then for index blocks, fake_dentry->rec_len is set to
blocksize while for a dirent not dentry->rec_len is set to blocksize -
sizeof(ext4_dir_entry_tail). Is my understanding correct?

>
> 4) If we need to free Parent(Empty) because it was emptied by step 2,
>    follow the procedure in step 3.
>
> This has a couple of benefits.  The first is that it will work for
> large dx directories, where directory shrinking is most needed.
> Secondly, also allows the directory shrinking to take gradually place
> as the directory is emptied, instead waiting until last directory
> block is empty.  (And for multi-level dx directories that have been
> optimized via e2fsck -fD, the above is needed to allow shrinking from
> the end won't work at all.  Why that is the case is left as an
> exercise to the reader.  Hint: try creating a very large directory and
> optimize it using e2fsck -fD, and see where the index nodes end up.)
>
> BTW, for non-indexed directories, we must always shrink from the end.
> We can't play games with swapping with the last directory block,
> unless we can guarantee that (a) there are no open fd's on the
> directories (since there might be telldir/seekdir cookies that have to
> stay valid), and (b) the directory isn't exported via NFS.  Given that
> establishes these guarantees is tricky, and most of the file systems
> we care about will have indexing enabled, I'm not *that* concerned
> about making directory shrinking working well for non-indexed
> directories.  (Which will tend to only exist from ext2 file systems.)
>
>
> +static inline bool is_empty_dirent_block(struct inode *dir,
> +                                        struct buffer_head *bh)
> +{
> +       struct ext4_dir_entry_2 *de = (struct ext4_dir_entry_2 *)bh->b_data;
> +       int     csum_size = 0;
> +
> +       if (ext4_has_metadata_csum(dir->i_sb))
> +               csum_size = sizeof(struct ext4_dir_entry_tail);
>
> The dx_tail only exists for indexed directories.  So the if statement
> should read:
>
>         if (ext4_has_metadata_csum(dir->i_sb) && is_dx(dir))
>
Ah I see, thanks. I didn't have metadata csum enabled so it worked for
me. Will fix it.

Thanks,
Harshad

>
> > +     /*
> > +      * If i was 0 when we began above loop, we would have overwritten count
> > +      * and limit values sin ce those values live in dx_entry->hash of the
>
> s/sin ce/since/
>
>
> > +     /*
> > +      * Read blocks from directory in reverse orders and clean them up one by
> > +      * one!
> > +      */
>
> s/reverse orders/reverse order/
>
>
> > +             info = &((struct dx_root *)frames[0].bh->b_data)->info;
> > +             if (info->indirect_levels > 0) {
> > +                     /*
> > +                      * We don't shrink in this case. That's because the
> > +                      * block that we just read could very well be an index
> > +                      * block. If it's an index block, we need to do special
> > +                      * handling to delete the block.
>
> Please delete this comment, and replace it with "we don't support dx
> directories with a depth > 2 for now".  See the above discussion...
>
> Cheers,
>
>                                         - Ted

  reply	other threads:[~2019-08-26  2:47 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-21 18:27 [PATCH] ext4: attempt to shrink directory on dentry removal Harshad Shirwadkar
2019-08-24  2:31 ` Theodore Y. Ts'o
2019-08-26  2:46   ` harshad shirwadkar [this message]
2019-08-26  3:19     ` Theodore Y. Ts'o
2019-08-26  5:07 ` Andreas Dilger
2019-08-26 21:46   ` harshad shirwadkar
2019-08-27  2:38     ` Theodore Y. Ts'o
2019-08-27 20:53     ` Andreas Dilger
2019-08-27  2:27   ` Theodore Y. Ts'o

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAD+ocbyrJDbUFtQdq087iJCc=41CNACeAA2U_KQG-B4w19Soqw@mail.gmail.com' \
    --to=harshadshirwadkar@gmail.com \
    --cc=linux-ext4@vger.kernel.org \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).