From: Filipe Manana <fdmanana@gmail.com>
To: Johannes Thumshirn <Johannes.Thumshirn@wdc.com>
Cc: David Sterba <dsterba@suse.com>,
	linux-btrfs <linux-btrfs@vger.kernel.org>,
	Josef Bacik <josef@toxicpanda.com>,
	Naohiro Aota <Naohiro.Aota@wdc.com>,
	Filipe Manana <fdmanana@suse.com>,
	Anand Jain <anand.jain@oracle.com>
Subject: Re: [PATCH v3 1/3] btrfs: discard relocated block groups
Date: Wed, 14 Apr 2021 12:16:31 +0100
Message-ID: <CAL3q7H6Bgqkdf8Z+xRBH8C=XxtrGzXyNUf6BHaLw54LZb3Agsg@mail.gmail.com>
In-Reply-To: <PH0PR04MB74167FB19522DBEB1F70E80D9B4F9@PH0PR04MB7416.namprd04.prod.outlook.com>

On Tue, Apr 13, 2021 at 6:48 PM Johannes Thumshirn
<Johannes.Thumshirn@wdc.com> wrote:
>
> On 13/04/2021 14:57, Filipe Manana wrote:
> > And what about the other mechanism that triggers discards on pinned
> > extents, after the transaction commits the super blocks?
> > Why isn't that happening (with -o discard=sync)? We create the delayed
> > references to drop extents from the relocated block group, which
> > results in pinning extents.
> > This is the case that surprised me that it isn't working for you.
>
> I think this is the case. I would have expected to end up in this
> part of btrfs_finish_extent_commit():
>
>
>         /*
>          * Transaction is finished.  We don't need the lock anymore.  We
>          * do need to clean up the block groups in case of a transaction
>          * abort.
>          */
>         deleted_bgs = &trans->transaction->deleted_bgs;
>         list_for_each_entry_safe(block_group, tmp, deleted_bgs, bg_list) {
>                 u64 trimmed = 0;
>
>                 ret = -EROFS;
>                 if (!TRANS_ABORTED(trans))
>                         ret = btrfs_discard_extent(fs_info,
>                                                    block_group->start,
>                                                    block_group->length,
>                                                    &trimmed);
>
>                 list_del_init(&block_group->bg_list);
>                 btrfs_unfreeze_block_group(block_group);
>                 btrfs_put_block_group(block_group);
>
>                 if (ret) {
>                         const char *errstr = btrfs_decode_error(ret);
>                         btrfs_warn(fs_info,
>                            "discard failed while removing blockgroup: errno=%d %s",
>                                    ret, errstr);
>                 }
>         }
>
> and the btrfs_discard_extent() over the whole block group would then trigger a
> REQ_OP_ZONE_RESET operation, resetting the device's zone.
>
> But as btrfs_delete_unused_bgs() doesn't add the block group to the
> ->deleted_bgs list, we're not reaching the above code. I /think/ (i.e. verification
> pending) the -o discard=sync case works for regular block devices, as each extent
> is discarded on its own, by this (also in btrfs_finish_extent_commit()):
>
>         while (!TRANS_ABORTED(trans)) {
>                 struct extent_state *cached_state = NULL;
>
>                 mutex_lock(&fs_info->unused_bg_unpin_mutex);
>                 ret = find_first_extent_bit(unpin, 0, &start, &end,
>                                             EXTENT_DIRTY, &cached_state);
>                 if (ret) {
>                         mutex_unlock(&fs_info->unused_bg_unpin_mutex);
>                         break;
>                 }
>
>                 if (btrfs_test_opt(fs_info, DISCARD_SYNC))
>                         ret = btrfs_discard_extent(fs_info, start,
>                                                    end + 1 - start, NULL);
>
>                 clear_extent_dirty(unpin, start, end, &cached_state);
>                 unpin_extent_range(fs_info, start, end, true);
>                 mutex_unlock(&fs_info->unused_bg_unpin_mutex);
>                 free_extent_state(cached_state);
>                 cond_resched();
>         }
>
> If this is the case, my patch will essentially discard the data twice, for a
> non-zoned block device, which is certainly not ideal.

Yep, that's what puzzled me: why would we need to do it for non-zoned
filesystems when using -o discard=sync?
I assumed you ran into a case where discard was not happening due to
some bug in the extent pinning/unpinning mechanism.
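
To make the double discard concrete, here is a rough user-space sketch. It is
a toy model, not kernel code: every toy_* name is invented for illustration,
and "discard" is reduced to counting bytes. It assumes two paths, a
per-extent discard at unpin time (only with discard=sync) and a whole-chunk
discard at relocation time, and shows the effect of guarding the latter with
a zoned check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy model, not kernel code: "discard" only counts bytes. */
struct toy_fs {
	bool zoned;         /* stands in for btrfs_is_zoned() */
	bool discard_sync;  /* stands in for -o discard=sync */
	uint64_t discarded; /* total bytes passed to toy_discard() */
};

struct toy_extent {
	uint64_t start;
	uint64_t len;
};

static void toy_discard(struct toy_fs *fs, uint64_t start, uint64_t len)
{
	(void)start;
	assert(len > 0);
	fs->discarded += len;
}

/* Per-extent path, modelled on the unpin loop quoted above. */
static void toy_unpin_extents(struct toy_fs *fs,
			      const struct toy_extent *ext, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (fs->discard_sync)
			toy_discard(fs, ext[i].start, ext[i].len);
	}
}

/* Whole-chunk path, modelled on the discard added to relocation. */
static void toy_relocate_chunk(struct toy_fs *fs, uint64_t start,
			       uint64_t len, bool zoned_only)
{
	if (!zoned_only || fs->zoned)
		toy_discard(fs, start, len);
}

/* Relocate one 16 KiB chunk holding two pinned 4 KiB extents. */
uint64_t toy_run(bool zoned, bool discard_sync, bool zoned_only)
{
	struct toy_fs fs = { .zoned = zoned, .discard_sync = discard_sync };
	const struct toy_extent pinned[] = { { 0, 4096 }, { 8192, 4096 } };

	toy_relocate_chunk(&fs, 0, 16384, zoned_only);
	toy_unpin_extents(&fs, pinned, 2);
	return fs.discarded;
}
```

In this model, toy_run(false, true, false) counts the two extents twice (once
as part of the chunk, once on their own), while toy_run(false, true, true)
counts them only once.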

> So the correct fix would
> be to get the block group into the 'trans->transaction->deleted_bgs' list
> after relocation, which would work if we didn't check for block_group->ro in
> btrfs_delete_unused_bgs(), but I suppose this check is there for a reason.

Actually the check for ->ro does not make sense anymore since I
introduced the delete_unused_bgs_mutex in commit
67c5e7d464bc466471b05e027abe8a6b29687ebd.

When the ->ro check was added
(47ab2a6c689913db23ccae38349714edf8365e0a), it was meant to prevent
the cleaner kthread and relocation tasks from calling
btrfs_remove_chunk() concurrently, but checking for ->ro only was
buggy, hence the addition of delete_unused_bgs_mutex later.
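
As a generic illustration of why a flag check alone tends to be insufficient
and a mutex is needed, here is a small user-space sketch. It is not the
actual kernel race: all toy_* names are hypothetical, and the interleaving of
the two tasks is scripted by hand instead of using real threads. It only
shows that a check and the action it guards have to happen as one step under
a lock.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy state, not a kernel structure. */
struct toy_chunk {
	bool exists;  /* chunk is still present */
	bool locked;  /* stands in for a held mutex */
	int removals; /* how many times removal ran */
};

/* Step 1 of a flag-only scheme: look at the state without a lock. */
static bool toy_check(const struct toy_chunk *c)
{
	return c->exists;
}

/* Step 2: act on the earlier check, which may be stale by now. */
static void toy_remove(struct toy_chunk *c)
{
	c->exists = false;
	c->removals++;
}

/* Scripted interleaving: A checks, B checks, A removes, B removes. */
int toy_flag_only(void)
{
	struct toy_chunk c = { .exists = true };
	bool a_ok = toy_check(&c);
	bool b_ok = toy_check(&c);

	if (a_ok)
		toy_remove(&c);
	if (b_ok)
		toy_remove(&c);
	return c.removals;
}

/* Check and removal done as one step while holding the lock. */
static int toy_locked_remove(struct toy_chunk *c)
{
	int ret = -ENOENT;

	if (c->locked)
		return -EBUSY; /* a real mutex would sleep here instead */
	c->locked = true;
	if (toy_check(c)) {
		toy_remove(c);
		ret = 0;
	}
	c->locked = false;
	return ret;
}

/* Task A runs first; task B's return value is stored in *second_ret. */
int toy_with_lock(int *second_ret)
{
	struct toy_chunk c = { .exists = true };

	assert(second_ret != NULL);
	toy_locked_remove(&c);
	*second_ret = toy_locked_remove(&c);
	return c.removals;
}
```

With the flag-only interleaving the removal runs twice. With the lock, the
second task finds the chunk already gone and gets -ENOENT.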

>
> How about changing the patch to the following:

Looks good.
However, would just removing the ->ro check be enough as well?

Thanks Johannes.

>
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 6d9b2369f17a..ba13b2ea3c6f 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -3103,6 +3103,9 @@ static int btrfs_relocate_chunk(struct btrfs_fs_info *fs_info, u64 chunk_offset)
>         struct btrfs_root *root = fs_info->chunk_root;
>         struct btrfs_trans_handle *trans;
>         struct btrfs_block_group *block_group;
> +       u64 length;
>         int ret;
>
>         /*
> @@ -3130,8 +3133,16 @@ static int btrfs_relocate_chunk(struct btrfs_fs_info *fs_info, u64 chunk_offset)
>         if (!block_group)
>                 return -ENOENT;
>         btrfs_discard_cancel_work(&fs_info->discard_ctl, block_group);
> +       length = block_group->length;
>         btrfs_put_block_group(block_group);
>
> +       /*
> +        * For a zoned filesystem we need to discard/zone-reset here, as the
> +        * discard code won't discard the whole block-group, but only single
> +        * extents.
> +        */
> +       if (btrfs_is_zoned(fs_info)) {
> +               ret = btrfs_discard_extent(fs_info, chunk_offset, length, NULL);
> +               if (ret) /* Non working discard is not fatal */
> +                       btrfs_warn(fs_info, "discarding chunk %llu failed",
> +                                  chunk_offset);
> +       }
> +
>         trans = btrfs_start_trans_remove_block_group(root->fs_info,
>                                                      chunk_offset);
>         if (IS_ERR(trans)) {



-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”

Thread overview: 16+ messages
2021-04-09 10:53 [PATCH v3 0/3] btrfs: zoned: automatic BG reclaim Johannes Thumshirn
2021-04-09 10:53 ` [PATCH v3 1/3] btrfs: discard relocated block groups Johannes Thumshirn
2021-04-09 11:37   ` Filipe Manana
2021-04-12 13:49     ` Johannes Thumshirn
2021-04-12 14:08       ` Filipe Manana
2021-04-12 14:21         ` Johannes Thumshirn
2021-04-13 12:43           ` Johannes Thumshirn
2021-04-13 12:57             ` Filipe Manana
2021-04-13 17:48               ` Johannes Thumshirn
2021-04-14 11:16                 ` Filipe Manana [this message]
2021-04-14 11:22                   ` Johannes Thumshirn
2021-04-14 11:32                     ` Filipe Manana
2021-04-14 12:59                     ` Johannes Thumshirn
2021-04-14 13:13                       ` Filipe Manana
2021-04-09 10:53 ` [PATCH v3 2/3] btrfs: rename delete_unused_bgs_mutex Johannes Thumshirn
2021-04-09 10:53 ` [PATCH v3 3/3] btrfs: zoned: automatically reclaim zones Johannes Thumshirn
