linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Filipe Manana <fdmanana@kernel.org>
To: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH] btrfs: fix infinite directory reads
Date: Sun, 13 Aug 2023 19:23:53 +0100	[thread overview]
Message-ID: <CAL3q7H6ottXN_pvQoZNkiTcjv3R83zwgoSEqby53vuw=aKm-yA@mail.gmail.com> (raw)
In-Reply-To: <c9ceb0e15d92d0634600603b38965d9b6d986b6d.1691923900.git.fdmanana@suse.com>

On Sun, Aug 13, 2023 at 3:35 PM <fdmanana@kernel.org> wrote:
>
> From: Filipe Manana <fdmanana@suse.com>
>
> The readdir implementation currently processes always up to the last index
> it finds. This however can result in an infinite loop if the directory has
> a large number of entries such that they won't all fit in the given buffer
> passed to the readdir callback, that is, dir_emit() returns a non-zero
> value. Because in that case readdir() will be called again and if in the
> meanwhile new directory entries were added and we still can't put all the
> remaining entries in the buffer, we keep repeating this over and over.
>
> The following C program and test script reproduce the problem:
>
>   $ cat /mnt/readdir_prog.c
>   #include <sys/types.h>
>   #include <dirent.h>
>   #include <stdio.h>
>
>   int main(int argc, char *argv[])
>   {
>     DIR *dir = opendir(".");
>     struct dirent *dd;
>
>     while ((dd = readdir(dir))) {
>       printf("%s\n", dd->d_name);
>       rename(dd->d_name, "TEMPFILE");
>       rename("TEMPFILE", dd->d_name);
>     }
>     closedir(dir);
>   }
>
>   $ gcc -o /mnt/readdir_prog /mnt/readdir_prog.c
>
>   $ cat test.sh
>   #!/bin/bash
>
>   DEV=/dev/sdi
>   MNT=/mnt/sdi
>
>   mkfs.btrfs -f $DEV &> /dev/null
>   #mkfs.xfs -f $DEV &> /dev/null
>   #mkfs.ext4 -F $DEV &> /dev/null
>
>   mount $DEV $MNT
>
>   mkdir $MNT/testdir
>   for ((i = 1; i <= 2000; i++)); do
>       echo -n > $MNT/testdir/file_$i
>   done
>
>   cd $MNT/testdir
>   /mnt/readdir_prog
>
>   cd /mnt
>
>   umount $MNT
>
> This behaviour is surprising to applications and it's unlike ext4, xfs,
> tmpfs, vfat and other filesystems, which always finish. In this case where
> new entries were added due to renames, some file names may be reported
> more than once, but this varies according to each filesystem - for example
> ext4 never reported the same file more than once while xfs reports the
> first 13 file names twice.
>
> So change our readdir implementation to track the last index number when
> opendir() is called and then make readdir() never process beyond that
> index number. This gives the same behaviour as ext4.
>
> Reported-by: Rob Landley <rob@landley.net>
> Link: https://lore.kernel.org/linux-btrfs/2c8c55ec-04c6-e0dc-9c5c-8c7924778c35@landley.net/
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=217681
> Signed-off-by: Filipe Manana <fdmanana@suse.com>

Tested-by: Rob Landley <rob@landley.net>

Given in this other thread:
https://lore.kernel.org/linux-btrfs/d5a42b8f-fd8d-7974-fd78-f76399e78541@landley.net/

> ---
>  fs/btrfs/ctree.h         |   1 +
>  fs/btrfs/delayed-inode.c |   5 +-
>  fs/btrfs/delayed-inode.h |   1 +
>  fs/btrfs/inode.c         | 131 +++++++++++++++++++++++----------------
>  4 files changed, 84 insertions(+), 54 deletions(-)
>
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index f2d2b313bde5..9419f4e37a58 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -443,6 +443,7 @@ struct btrfs_drop_extents_args {
>
>  struct btrfs_file_private {
>         void *filldir_buf;
> +       u64 last_index;
>         struct extent_state *llseek_cached_state;
>  };
>
> diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
> index 6b457b010cbc..6d51db066503 100644
> --- a/fs/btrfs/delayed-inode.c
> +++ b/fs/btrfs/delayed-inode.c
> @@ -1632,6 +1632,7 @@ int btrfs_inode_delayed_dir_index_count(struct btrfs_inode *inode)
>  }
>
>  bool btrfs_readdir_get_delayed_items(struct inode *inode,
> +                                    u64 last_index,
>                                      struct list_head *ins_list,
>                                      struct list_head *del_list)
>  {
> @@ -1651,14 +1652,14 @@ bool btrfs_readdir_get_delayed_items(struct inode *inode,
>
>         mutex_lock(&delayed_node->mutex);
>         item = __btrfs_first_delayed_insertion_item(delayed_node);
> -       while (item) {
> +       while (item && item->index <= last_index) {
>                 refcount_inc(&item->refs);
>                 list_add_tail(&item->readdir_list, ins_list);
>                 item = __btrfs_next_delayed_item(item);
>         }
>
>         item = __btrfs_first_delayed_deletion_item(delayed_node);
> -       while (item) {
> +       while (item && item->index <= last_index) {
>                 refcount_inc(&item->refs);
>                 list_add_tail(&item->readdir_list, del_list);
>                 item = __btrfs_next_delayed_item(item);
> diff --git a/fs/btrfs/delayed-inode.h b/fs/btrfs/delayed-inode.h
> index 4f21daa3dbc7..dc1085b2a397 100644
> --- a/fs/btrfs/delayed-inode.h
> +++ b/fs/btrfs/delayed-inode.h
> @@ -148,6 +148,7 @@ void btrfs_destroy_delayed_inodes(struct btrfs_fs_info *fs_info);
>
>  /* Used for readdir() */
>  bool btrfs_readdir_get_delayed_items(struct inode *inode,
> +                                    u64 last_index,
>                                      struct list_head *ins_list,
>                                      struct list_head *del_list);
>  void btrfs_readdir_put_delayed_items(struct inode *inode,
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index c268c5861a24..3b7e8a1b9b8e 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -5885,6 +5885,74 @@ static struct dentry *btrfs_lookup(struct inode *dir, struct dentry *dentry,
>         return d_splice_alias(inode, dentry);
>  }
>
> +/*
> + * Find the highest existing sequence number in a directory and then set the
> + * in-memory index_cnt variable to the first free sequence number.
> + */
> +static int btrfs_set_inode_index_count(struct btrfs_inode *inode)
> +{
> +       struct btrfs_root *root = inode->root;
> +       struct btrfs_key key, found_key;
> +       struct btrfs_path *path;
> +       struct extent_buffer *leaf;
> +       int ret;
> +
> +       key.objectid = btrfs_ino(inode);
> +       key.type = BTRFS_DIR_INDEX_KEY;
> +       key.offset = (u64)-1;
> +
> +       path = btrfs_alloc_path();
> +       if (!path)
> +               return -ENOMEM;
> +
> +       ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
> +       if (ret < 0)
> +               goto out;
> +       /* FIXME: we should be able to handle this */
> +       if (ret == 0)
> +               goto out;
> +       ret = 0;
> +
> +       if (path->slots[0] == 0) {
> +               inode->index_cnt = BTRFS_DIR_START_INDEX;
> +               goto out;
> +       }
> +
> +       path->slots[0]--;
> +
> +       leaf = path->nodes[0];
> +       btrfs_item_key_to_cpu(leaf, &found_key, path->slots[0]);
> +
> +       if (found_key.objectid != btrfs_ino(inode) ||
> +           found_key.type != BTRFS_DIR_INDEX_KEY) {
> +               inode->index_cnt = BTRFS_DIR_START_INDEX;
> +               goto out;
> +       }
> +
> +       inode->index_cnt = found_key.offset + 1;
> +out:
> +       btrfs_free_path(path);
> +       return ret;
> +}
> +
> +static int btrfs_get_dir_last_index(struct btrfs_inode *dir, u64 *index)
> +{
> +       if (dir->index_cnt == (u64)-1) {
> +               int ret;
> +
> +               ret = btrfs_inode_delayed_dir_index_count(dir);
> +               if (ret) {
> +                       ret = btrfs_set_inode_index_count(dir);
> +                       if (ret)
> +                               return ret;
> +               }
> +       }
> +
> +       *index = dir->index_cnt;
> +
> +       return 0;
> +}
> +
>  /*
>   * All this infrastructure exists because dir_emit can fault, and we are holding
>   * the tree lock when doing readdir.  For now just allocate a buffer and copy
> @@ -5897,10 +5965,17 @@ static struct dentry *btrfs_lookup(struct inode *dir, struct dentry *dentry,
>  static int btrfs_opendir(struct inode *inode, struct file *file)
>  {
>         struct btrfs_file_private *private;
> +       u64 last_index;
> +       int ret;
> +
> +       ret = btrfs_get_dir_last_index(BTRFS_I(inode), &last_index);
> +       if (ret)
> +               return ret;
>
>         private = kzalloc(sizeof(struct btrfs_file_private), GFP_KERNEL);
>         if (!private)
>                 return -ENOMEM;
> +       private->last_index = last_index;
>         private->filldir_buf = kzalloc(PAGE_SIZE, GFP_KERNEL);
>         if (!private->filldir_buf) {
>                 kfree(private);
> @@ -5967,7 +6042,8 @@ static int btrfs_real_readdir(struct file *file, struct dir_context *ctx)
>
>         INIT_LIST_HEAD(&ins_list);
>         INIT_LIST_HEAD(&del_list);
> -       put = btrfs_readdir_get_delayed_items(inode, &ins_list, &del_list);
> +       put = btrfs_readdir_get_delayed_items(inode, private->last_index,
> +                                             &ins_list, &del_list);
>
>  again:
>         key.type = BTRFS_DIR_INDEX_KEY;
> @@ -5985,6 +6061,8 @@ static int btrfs_real_readdir(struct file *file, struct dir_context *ctx)
>                         break;
>                 if (found_key.offset < ctx->pos)
>                         continue;
> +               if (found_key.offset > private->last_index)
> +                       break;
>                 if (btrfs_should_delete_dir_index(&del_list, found_key.offset))
>                         continue;
>                 di = btrfs_item_ptr(leaf, path->slots[0], struct btrfs_dir_item);
> @@ -6120,57 +6198,6 @@ static int btrfs_update_time(struct inode *inode, struct timespec64 *now,
>         return dirty ? btrfs_dirty_inode(BTRFS_I(inode)) : 0;
>  }
>
> -/*
> - * find the highest existing sequence number in a directory
> - * and then set the in-memory index_cnt variable to reflect
> - * free sequence numbers
> - */
> -static int btrfs_set_inode_index_count(struct btrfs_inode *inode)
> -{
> -       struct btrfs_root *root = inode->root;
> -       struct btrfs_key key, found_key;
> -       struct btrfs_path *path;
> -       struct extent_buffer *leaf;
> -       int ret;
> -
> -       key.objectid = btrfs_ino(inode);
> -       key.type = BTRFS_DIR_INDEX_KEY;
> -       key.offset = (u64)-1;
> -
> -       path = btrfs_alloc_path();
> -       if (!path)
> -               return -ENOMEM;
> -
> -       ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
> -       if (ret < 0)
> -               goto out;
> -       /* FIXME: we should be able to handle this */
> -       if (ret == 0)
> -               goto out;
> -       ret = 0;
> -
> -       if (path->slots[0] == 0) {
> -               inode->index_cnt = BTRFS_DIR_START_INDEX;
> -               goto out;
> -       }
> -
> -       path->slots[0]--;
> -
> -       leaf = path->nodes[0];
> -       btrfs_item_key_to_cpu(leaf, &found_key, path->slots[0]);
> -
> -       if (found_key.objectid != btrfs_ino(inode) ||
> -           found_key.type != BTRFS_DIR_INDEX_KEY) {
> -               inode->index_cnt = BTRFS_DIR_START_INDEX;
> -               goto out;
> -       }
> -
> -       inode->index_cnt = found_key.offset + 1;
> -out:
> -       btrfs_free_path(path);
> -       return ret;
> -}
> -
>  /*
>   * helper to find a free sequence number in a given directory.  This current
>   * code is very simple, later versions will do smarter things in the btree
> --
> 2.34.1
>

  reply	other threads:[~2023-08-13 18:24 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-08-13 11:34 [PATCH] btrfs: fix infinite directory reads fdmanana
2023-08-13 18:23 ` Filipe Manana [this message]
2023-08-14 14:18 ` David Sterba
2024-01-24 22:55 ` Eugeniu Rosca
2024-01-25 12:04   ` Filipe Manana

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAL3q7H6ottXN_pvQoZNkiTcjv3R83zwgoSEqby53vuw=aKm-yA@mail.gmail.com' \
    --to=fdmanana@kernel.org \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).