* [RFC][PATCH] btrfs: clean snapshots one by one
@ 2013-02-11 16:11 David Sterba
  2013-02-17 19:55 ` Alex Lyakas
  0 siblings, 1 reply; 12+ messages in thread
From: David Sterba @ 2013-02-11 16:11 UTC (permalink / raw)
  To: linux-btrfs; +Cc: alex.btrfs, David Sterba

Each time pick one dead root from the list and let the caller know
whether it needs to continue. This should improve responsiveness during
umount and balance, which at some point wait for all currently queued
dead roots to be cleaned.

A new dead root is added to the end of the list, so the snapshots
disappear in the order of deletion.

Snapshot cleaning is now done only from the cleaner thread, and the
other threads wake it if needed.

Signed-off-by: David Sterba <dsterba@suse.cz>
---

* btrfs_clean_old_snapshots is removed from the reloc loop, I don't know if this
  is safe wrt reloc's assumptions

* btrfs_run_delayed_iputs is left in place in super_commit, may get removed as
  well because transaction commit calls it in the end

* the responsiveness can be improved further if btrfs_drop_snapshot checks
  fs_closing, but this needs changes to error handling in the main reloc loop

 fs/btrfs/disk-io.c     |    8 ++++--
 fs/btrfs/relocation.c  |    3 --
 fs/btrfs/transaction.c |   57 ++++++++++++++++++++++++++++++++----------------
 fs/btrfs/transaction.h |    2 +-
 4 files changed, 44 insertions(+), 26 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 51bff86..6a02336 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1635,15 +1635,17 @@ static int cleaner_kthread(void *arg)
 	struct btrfs_root *root = arg;
 
 	do {
+		int again = 0;
+
 		if (!(root->fs_info->sb->s_flags & MS_RDONLY) &&
 		    mutex_trylock(&root->fs_info->cleaner_mutex)) {
 			btrfs_run_delayed_iputs(root);
-			btrfs_clean_old_snapshots(root);
+			again = btrfs_clean_one_deleted_snapshot(root);
 			mutex_unlock(&root->fs_info->cleaner_mutex);
 			btrfs_run_defrag_inodes(root->fs_info);
 		}
 
-		if (!try_to_freeze()) {
+		if (!try_to_freeze() && !again) {
 			set_current_state(TASK_INTERRUPTIBLE);
 			if (!kthread_should_stop())
 				schedule();
@@ -3301,8 +3303,8 @@ int btrfs_commit_super(struct btrfs_root *root)
 
 	mutex_lock(&root->fs_info->cleaner_mutex);
 	btrfs_run_delayed_iputs(root);
-	btrfs_clean_old_snapshots(root);
 	mutex_unlock(&root->fs_info->cleaner_mutex);
+	wake_up_process(root->fs_info->cleaner_kthread);
 
 	/* wait until ongoing cleanup work done */
 	down_write(&root->fs_info->cleanup_work_sem);
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index ba5a321..ab6a718 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -4060,10 +4060,7 @@ int btrfs_relocate_block_group(struct btrfs_root *extent_root, u64 group_start)
 
 	while (1) {
 		mutex_lock(&fs_info->cleaner_mutex);
-
-		btrfs_clean_old_snapshots(fs_info->tree_root);
 		ret = relocate_block_group(rc);
-
 		mutex_unlock(&fs_info->cleaner_mutex);
 		if (ret < 0) {
 			err = ret;
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 361fb7d..f1e3606 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -895,7 +895,7 @@ static noinline int commit_cowonly_roots(struct btrfs_trans_handle *trans,
 int btrfs_add_dead_root(struct btrfs_root *root)
 {
 	spin_lock(&root->fs_info->trans_lock);
-	list_add(&root->root_list, &root->fs_info->dead_roots);
+	list_add_tail(&root->root_list, &root->fs_info->dead_roots);
 	spin_unlock(&root->fs_info->trans_lock);
 	return 0;
 }
@@ -1783,31 +1783,50 @@ cleanup_transaction:
 }
 
 /*
- * interface function to delete all the snapshots we have scheduled for deletion
+ * return < 0 if error
+ * 0 if there are no more dead_roots at the time of call
+ * 1 if there are more to be processed, call me again
+ *
+ * A return value of 1 means there are certainly more snapshots to delete, but
+ * if a new one comes along during processing, it may return 0. We don't mind,
+ * because btrfs_commit_super will poke the cleaner thread and it will process
+ * it a few seconds later.
  */
-int btrfs_clean_old_snapshots(struct btrfs_root *root)
+int btrfs_clean_one_deleted_snapshot(struct btrfs_root *root)
 {
-	LIST_HEAD(list);
+	int ret;
+	int run_again = 1;
 	struct btrfs_fs_info *fs_info = root->fs_info;
 
+	if (root->fs_info->sb->s_flags & MS_RDONLY) {
+		pr_debug("btrfs: cleaner called for RO fs!\n");
+		return 0;
+	}
+
 	spin_lock(&fs_info->trans_lock);
-	list_splice_init(&fs_info->dead_roots, &list);
+	if (list_empty(&fs_info->dead_roots)) {
+		spin_unlock(&fs_info->trans_lock);
+		return 0;
+	}
+	root = list_first_entry(&fs_info->dead_roots,
+			struct btrfs_root, root_list);
+	list_del(&root->root_list);
 	spin_unlock(&fs_info->trans_lock);
 
-	while (!list_empty(&list)) {
-		int ret;
-
-		root = list_entry(list.next, struct btrfs_root, root_list);
-		list_del(&root->root_list);
+	pr_debug("btrfs: cleaner removing %llu\n",
+			(unsigned long long)root->objectid);
 
-		btrfs_kill_all_delayed_nodes(root);
+	btrfs_kill_all_delayed_nodes(root);
 
-		if (btrfs_header_backref_rev(root->node) <
-		    BTRFS_MIXED_BACKREF_REV)
-			ret = btrfs_drop_snapshot(root, NULL, 0, 0);
-		else
-			ret =btrfs_drop_snapshot(root, NULL, 1, 0);
-		BUG_ON(ret < 0);
-	}
-	return 0;
+	if (btrfs_header_backref_rev(root->node) <
+			BTRFS_MIXED_BACKREF_REV)
+		ret = btrfs_drop_snapshot(root, NULL, 0, 0);
+	else
+		ret = btrfs_drop_snapshot(root, NULL, 1, 0);
+	/*
+	 * If we encounter a transaction abort during snapshot cleaning, we
+	 * don't want to crash here
+	 */
+	BUG_ON(ret < 0 && (ret != -EAGAIN || ret != -EROFS));
+	return run_again || ret == -EAGAIN;
 }
diff --git a/fs/btrfs/transaction.h b/fs/btrfs/transaction.h
index 69700f7..f8e9583 100644
--- a/fs/btrfs/transaction.h
+++ b/fs/btrfs/transaction.h
@@ -118,7 +118,7 @@ int btrfs_write_and_wait_transaction(struct btrfs_trans_handle *trans,
 
 int btrfs_add_dead_root(struct btrfs_root *root);
 int btrfs_defrag_root(struct btrfs_root *root, int cacheonly);
-int btrfs_clean_old_snapshots(struct btrfs_root *root);
+int btrfs_clean_one_deleted_snapshot(struct btrfs_root *root);
 int btrfs_commit_transaction(struct btrfs_trans_handle *trans,
 			     struct btrfs_root *root);
 int btrfs_commit_transaction_async(struct btrfs_trans_handle *trans,
-- 
1.7.9



* Re: [RFC][PATCH] btrfs: clean snapshots one by one
  2013-02-11 16:11 [RFC][PATCH] btrfs: clean snapshots one by one David Sterba
@ 2013-02-17 19:55 ` Alex Lyakas
  2013-02-22 23:46   ` David Sterba
  2013-03-01 16:17   ` [PATCH v2] " David Sterba
  0 siblings, 2 replies; 12+ messages in thread
From: Alex Lyakas @ 2013-02-17 19:55 UTC (permalink / raw)
  To: David Sterba; +Cc: linux-btrfs

Hi David,
thank you for addressing this issue.

On Mon, Feb 11, 2013 at 6:11 PM, David Sterba <dsterba@suse.cz> wrote:
> Each time pick one dead root from the list and let the caller know
> whether it needs to continue. This should improve responsiveness during
> umount and balance, which at some point wait for all currently queued
> dead roots to be cleaned.
>
> A new dead root is added to the end of the list, so the snapshots
> disappear in the order of deletion.
>
> Snapshot cleaning is now done only from the cleaner thread, and the
> other threads wake it if needed.
This is great.


>
> Signed-off-by: David Sterba <dsterba@suse.cz>
> ---
>
> * btrfs_clean_old_snapshots is removed from the reloc loop, I don't know if this
>   is safe wrt reloc's assumptions
>
> * btrfs_run_delayed_iputs is left in place in super_commit, may get removed as
>   well because transaction commit calls it in the end
>
> * the responsiveness can be improved further if btrfs_drop_snapshot checks
>   fs_closing, but this needs changes to error handling in the main reloc loop
>
>  fs/btrfs/disk-io.c     |    8 ++++--
>  fs/btrfs/relocation.c  |    3 --
>  fs/btrfs/transaction.c |   57 ++++++++++++++++++++++++++++++++----------------
>  fs/btrfs/transaction.h |    2 +-
>  4 files changed, 44 insertions(+), 26 deletions(-)
>
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 51bff86..6a02336 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -1635,15 +1635,17 @@ static int cleaner_kthread(void *arg)
>         struct btrfs_root *root = arg;
>
>         do {
> +               int again = 0;
> +
>                 if (!(root->fs_info->sb->s_flags & MS_RDONLY) &&
>                     mutex_trylock(&root->fs_info->cleaner_mutex)) {
>                         btrfs_run_delayed_iputs(root);
> -                       btrfs_clean_old_snapshots(root);
> +                       again = btrfs_clean_one_deleted_snapshot(root);
>                         mutex_unlock(&root->fs_info->cleaner_mutex);
>                         btrfs_run_defrag_inodes(root->fs_info);
>                 }
>
> -               if (!try_to_freeze()) {
> +               if (!try_to_freeze() && !again) {
>                         set_current_state(TASK_INTERRUPTIBLE);
>                         if (!kthread_should_stop())
>                                 schedule();
> @@ -3301,8 +3303,8 @@ int btrfs_commit_super(struct btrfs_root *root)
>
>         mutex_lock(&root->fs_info->cleaner_mutex);
>         btrfs_run_delayed_iputs(root);
> -       btrfs_clean_old_snapshots(root);
>         mutex_unlock(&root->fs_info->cleaner_mutex);
> +       wake_up_process(root->fs_info->cleaner_kthread);
I am probably missing something, but if the cleaner wakes up here,
won't it attempt cleaning the next snap? Because I don't see the
cleaner checking anywhere that we are unmounting. Or is dead_roots
supposed to be empty at this point?


>
>         /* wait until ongoing cleanup work done */
>         down_write(&root->fs_info->cleanup_work_sem);
> diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
> index ba5a321..ab6a718 100644
> --- a/fs/btrfs/relocation.c
> +++ b/fs/btrfs/relocation.c
> @@ -4060,10 +4060,7 @@ int btrfs_relocate_block_group(struct btrfs_root *extent_root, u64 group_start)
>
>         while (1) {
>                 mutex_lock(&fs_info->cleaner_mutex);
> -
> -               btrfs_clean_old_snapshots(fs_info->tree_root);
>                 ret = relocate_block_group(rc);
> -
>                 mutex_unlock(&fs_info->cleaner_mutex);
>                 if (ret < 0) {
>                         err = ret;
> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
> index 361fb7d..f1e3606 100644
> --- a/fs/btrfs/transaction.c
> +++ b/fs/btrfs/transaction.c
> @@ -895,7 +895,7 @@ static noinline int commit_cowonly_roots(struct btrfs_trans_handle *trans,
>  int btrfs_add_dead_root(struct btrfs_root *root)
>  {
>         spin_lock(&root->fs_info->trans_lock);
> -       list_add(&root->root_list, &root->fs_info->dead_roots);
> +       list_add_tail(&root->root_list, &root->fs_info->dead_roots);
>         spin_unlock(&root->fs_info->trans_lock);
>         return 0;
>  }
> @@ -1783,31 +1783,50 @@ cleanup_transaction:
>  }
>
>  /*
> - * interface function to delete all the snapshots we have scheduled for deletion
> + * return < 0 if error
> + * 0 if there are no more dead_roots at the time of call
> + * 1 if there are more to be processed, call me again
> + *
> + * A return value of 1 means there are certainly more snapshots to delete, but
> + * if a new one comes along during processing, it may return 0. We don't mind,
> + * because btrfs_commit_super will poke the cleaner thread and it will process
> + * it a few seconds later.
>   */
> -int btrfs_clean_old_snapshots(struct btrfs_root *root)
> +int btrfs_clean_one_deleted_snapshot(struct btrfs_root *root)
>  {
> -       LIST_HEAD(list);
> +       int ret;
> +       int run_again = 1;
>         struct btrfs_fs_info *fs_info = root->fs_info;
>
> +       if (root->fs_info->sb->s_flags & MS_RDONLY) {
> +               pr_debug("btrfs: cleaner called for RO fs!\n");
> +               return 0;
> +       }
> +
>         spin_lock(&fs_info->trans_lock);
> -       list_splice_init(&fs_info->dead_roots, &list);
> +       if (list_empty(&fs_info->dead_roots)) {
> +               spin_unlock(&fs_info->trans_lock);
> +               return 0;
> +       }
> +       root = list_first_entry(&fs_info->dead_roots,
> +                       struct btrfs_root, root_list);
> +       list_del(&root->root_list);
>         spin_unlock(&fs_info->trans_lock);
>
> -       while (!list_empty(&list)) {
> -               int ret;
> -
> -               root = list_entry(list.next, struct btrfs_root, root_list);
> -               list_del(&root->root_list);
> +       pr_debug("btrfs: cleaner removing %llu\n",
> +                       (unsigned long long)root->objectid);
>
> -               btrfs_kill_all_delayed_nodes(root);
> +       btrfs_kill_all_delayed_nodes(root);
>
> -               if (btrfs_header_backref_rev(root->node) <
> -                   BTRFS_MIXED_BACKREF_REV)
> -                       ret = btrfs_drop_snapshot(root, NULL, 0, 0);
> -               else
> -                       ret =btrfs_drop_snapshot(root, NULL, 1, 0);
> -               BUG_ON(ret < 0);
> -       }
> -       return 0;
> +       if (btrfs_header_backref_rev(root->node) <
> +                       BTRFS_MIXED_BACKREF_REV)
> +               ret = btrfs_drop_snapshot(root, NULL, 0, 0);
> +       else
> +               ret = btrfs_drop_snapshot(root, NULL, 1, 0);
> +       /*
> +        * If we encounter a transaction abort during snapshot cleaning, we
> +        * don't want to crash here
> +        */
> +       BUG_ON(ret < 0 && (ret != -EAGAIN || ret != -EROFS));
> +       return run_again || ret == -EAGAIN;
Can you tell me when btrfs_drop_snapshot is supposed to return EAGAIN?

>  }
> diff --git a/fs/btrfs/transaction.h b/fs/btrfs/transaction.h
> index 69700f7..f8e9583 100644
> --- a/fs/btrfs/transaction.h
> +++ b/fs/btrfs/transaction.h
> @@ -118,7 +118,7 @@ int btrfs_write_and_wait_transaction(struct btrfs_trans_handle *trans,
>
>  int btrfs_add_dead_root(struct btrfs_root *root);
>  int btrfs_defrag_root(struct btrfs_root *root, int cacheonly);
> -int btrfs_clean_old_snapshots(struct btrfs_root *root);
> +int btrfs_clean_one_deleted_snapshot(struct btrfs_root *root);
>  int btrfs_commit_transaction(struct btrfs_trans_handle *trans,
>                              struct btrfs_root *root);
>  int btrfs_commit_transaction_async(struct btrfs_trans_handle *trans,
> --
> 1.7.9
>

Thanks,
Alex.


* Re: [RFC][PATCH] btrfs: clean snapshots one by one
  2013-02-17 19:55 ` Alex Lyakas
@ 2013-02-22 23:46   ` David Sterba
  2013-03-01 16:17   ` [PATCH v2] " David Sterba
  1 sibling, 0 replies; 12+ messages in thread
From: David Sterba @ 2013-02-22 23:46 UTC (permalink / raw)
  To: Alex Lyakas; +Cc: David Sterba, linux-btrfs

On Sun, Feb 17, 2013 at 09:55:23PM +0200, Alex Lyakas wrote:
> > --- a/fs/btrfs/disk-io.c
> > +++ b/fs/btrfs/disk-io.c
> > @@ -1635,15 +1635,17 @@ static int cleaner_kthread(void *arg)
> >         struct btrfs_root *root = arg;
> >
> >         do {
> > +               int again = 0;
> > +
> >                 if (!(root->fs_info->sb->s_flags & MS_RDONLY) &&
> >                     mutex_trylock(&root->fs_info->cleaner_mutex)) {
> >                         btrfs_run_delayed_iputs(root);
> > -                       btrfs_clean_old_snapshots(root);
> > +                       again = btrfs_clean_one_deleted_snapshot(root);
> >                         mutex_unlock(&root->fs_info->cleaner_mutex);
> >                         btrfs_run_defrag_inodes(root->fs_info);
> >                 }
> >
> > -               if (!try_to_freeze()) {
> > +               if (!try_to_freeze() && !again) {
> >                         set_current_state(TASK_INTERRUPTIBLE);
> >                         if (!kthread_should_stop())
> >                                 schedule();
> > @@ -3301,8 +3303,8 @@ int btrfs_commit_super(struct btrfs_root *root)
> >
> >         mutex_lock(&root->fs_info->cleaner_mutex);
> >         btrfs_run_delayed_iputs(root);
> > -       btrfs_clean_old_snapshots(root);
> >         mutex_unlock(&root->fs_info->cleaner_mutex);
> > +       wake_up_process(root->fs_info->cleaner_kthread);
> I am probably missing something, but if the cleaner wakes up here,
> won't it attempt cleaning the next snap? Because I don't see the
> cleaner checking anywhere that we are unmounting. Or is dead_roots
> supposed to be empty at this point?

No, you're right, the check of the umount semaphore is missing (it was
in the dusted patchset, titled 'avoid cleaner deadlock', which we now
solve in another way, so I did not realize the patch is actually
needed). So, this hunk should do it:

--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1627,11 +1627,13 @@ static int cleaner_kthread(void *arg)
                int again = 0;

                if (!(root->fs_info->sb->s_flags & MS_RDONLY) &&
+                   down_read_trylock(&root->fs_info->sb->s_umount) &&
                    mutex_trylock(&root->fs_info->cleaner_mutex)) {
                        btrfs_run_delayed_iputs(root);
                        again = btrfs_clean_one_deleted_snapshot(root);
                        mutex_unlock(&root->fs_info->cleaner_mutex);
                        btrfs_run_defrag_inodes(root->fs_info);
+                       up_read(&root->fs_info->sb->s_umount);
                }

                if (!try_to_freeze() && !again) {
---

Seems that also checking for btrfs_fs_closing != 0 would help here.
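
E.g. a sketch of the extended condition in cleaner_kthread (illustrative
only; btrfs_fs_closing() reads fs_info->closing):

		if (!(root->fs_info->sb->s_flags & MS_RDONLY) &&
		    !btrfs_fs_closing(root->fs_info) &&
		    down_read_trylock(&root->fs_info->sb->s_umount) &&
		    mutex_trylock(&root->fs_info->cleaner_mutex)) {
			...
		}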

And to the second part: no, dead_roots is not supposed to be empty.

> > @@ -1783,31 +1783,50 @@ cleanup_transaction:
> >  }
> >
> >  /*
> > - * interface function to delete all the snapshots we have scheduled for deletion
> > + * return < 0 if error
> > + * 0 if there are no more dead_roots at the time of call
> > + * 1 if there are more to be processed, call me again
> > + *
> > + * A return value of 1 means there are certainly more snapshots to delete, but
> > + * if a new one comes along during processing, it may return 0. We don't mind,
> > + * because btrfs_commit_super will poke the cleaner thread and it will process
> > + * it a few seconds later.
> >   */
> > -int btrfs_clean_old_snapshots(struct btrfs_root *root)
> > +int btrfs_clean_one_deleted_snapshot(struct btrfs_root *root)
> >  {
> > -       LIST_HEAD(list);
> > +       int ret;
> > +       int run_again = 1;
> >         struct btrfs_fs_info *fs_info = root->fs_info;
> >
> > +       if (root->fs_info->sb->s_flags & MS_RDONLY) {
> > +               pr_debug("btrfs: cleaner called for RO fs!\n");
> > +               return 0;
> > +       }
> > +
> >         spin_lock(&fs_info->trans_lock);
> > -       list_splice_init(&fs_info->dead_roots, &list);
> > +       if (list_empty(&fs_info->dead_roots)) {
> > +               spin_unlock(&fs_info->trans_lock);
> > +               return 0;
> > +       }
> > +       root = list_first_entry(&fs_info->dead_roots,
> > +                       struct btrfs_root, root_list);
> > +       list_del(&root->root_list);
> >         spin_unlock(&fs_info->trans_lock);
> >
> > -       while (!list_empty(&list)) {
> > -               int ret;
> > -
> > -               root = list_entry(list.next, struct btrfs_root, root_list);
> > -               list_del(&root->root_list);
> > +       pr_debug("btrfs: cleaner removing %llu\n",
> > +                       (unsigned long long)root->objectid);
> >
> > -               btrfs_kill_all_delayed_nodes(root);
> > +       btrfs_kill_all_delayed_nodes(root);
> >
> > -               if (btrfs_header_backref_rev(root->node) <
> > -                   BTRFS_MIXED_BACKREF_REV)
> > -                       ret = btrfs_drop_snapshot(root, NULL, 0, 0);
> > -               else
> > -                       ret =btrfs_drop_snapshot(root, NULL, 1, 0);
> > -               BUG_ON(ret < 0);
> > -       }
> > -       return 0;
> > +       if (btrfs_header_backref_rev(root->node) <
> > +                       BTRFS_MIXED_BACKREF_REV)
> > +               ret = btrfs_drop_snapshot(root, NULL, 0, 0);
> > +       else
> > +               ret = btrfs_drop_snapshot(root, NULL, 1, 0);
> > +       /*
> > +        * If we encounter a transaction abort during snapshot cleaning, we
> > +        * don't want to crash here
> > +        */
> > +       BUG_ON(ret < 0 && (ret != -EAGAIN || ret != -EROFS));
> > +       return run_again || ret == -EAGAIN;
> Can you tell me when btrfs_drop_snapshot is supposed to return EAGAIN?

That's another inconsistency of this patch, sorry.

--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -6976,6 +6976,12 @@ int btrfs_drop_snapshot(struct btrfs_root *root,
        wc->reada_count = BTRFS_NODEPTRS_PER_BLOCK(root);

        while (1) {
+               if (btrfs_fs_closing(root->fs_info)) {
+                       printk(KERN_DEBUG "btrfs: drop snapshot early exit\n");
+                       err = -EAGAIN;
+                       goto out_end_trans;
+               }
+
                ret = walk_down_tree(trans, root, path, wc);
                if (ret < 0) {
                        err = ret;
---

i.e. the main loop in drop_snapshot checks for the fs going down and returns
EAGAIN in that case. Initially the hunk was there, but drop_snapshot is also
called from inside the reloc loop and the callers do not expect it to end
early. (Though it's possible to enhance the exit paths, that's beyond the
scope of this patch now.)

Fortunately there's the for_reloc argument in

int btrfs_drop_snapshot(struct btrfs_root *root,
			 struct btrfs_block_rsv *block_rsv, int update_ref,
			 int for_reloc)
{

so we can safely exit early only if it's not in reloc.
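
I.e. a sketch of the gated check (this is what v2 below ends up doing):

	while (1) {
		if (!for_reloc && btrfs_fs_closing(root->fs_info)) {
			pr_debug("btrfs: drop snapshot early exit\n");
			err = -EAGAIN;
			goto out_end_trans;
		}
		...
	}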

Thanks for your comments, I'll send an updated version.


david


* [PATCH v2] btrfs: clean snapshots one by one
  2013-02-17 19:55 ` Alex Lyakas
  2013-02-22 23:46   ` David Sterba
@ 2013-03-01 16:17   ` David Sterba
  2013-03-01 16:30     ` David Sterba
  2013-03-02 21:32     ` Alex Lyakas
  1 sibling, 2 replies; 12+ messages in thread
From: David Sterba @ 2013-03-01 16:17 UTC (permalink / raw)
  To: linux-btrfs; +Cc: alex.btrfs, David Sterba

Each time pick one dead root from the list and let the caller know
whether it needs to continue. This should improve responsiveness during
umount and balance, which at some point wait for all currently queued
dead roots to be cleaned.

A new dead root is added to the end of the list, so the snapshots
disappear in the order of deletion.

Snapshot cleaning is now done only from the cleaner thread, and the
other threads wake it if needed.

Signed-off-by: David Sterba <dsterba@suse.cz>
---

v1->v2:
- added s_umount trylock in cleaner thread
- added exit into drop_snapshot if fs is going down

patch based on cmason/integration

 fs/btrfs/disk-io.c     |   10 ++++++--
 fs/btrfs/extent-tree.c |    8 ++++++
 fs/btrfs/relocation.c  |    3 --
 fs/btrfs/transaction.c |   57 ++++++++++++++++++++++++++++++++----------------
 fs/btrfs/transaction.h |    2 +-
 5 files changed, 54 insertions(+), 26 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index eb7c143..cc85fc7 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1652,15 +1652,19 @@ static int cleaner_kthread(void *arg)
 	struct btrfs_root *root = arg;
 
 	do {
+		int again = 0;
+
 		if (!(root->fs_info->sb->s_flags & MS_RDONLY) &&
+		    down_read_trylock(&root->fs_info->sb->s_umount) &&
 		    mutex_trylock(&root->fs_info->cleaner_mutex)) {
 			btrfs_run_delayed_iputs(root);
-			btrfs_clean_old_snapshots(root);
+			again = btrfs_clean_one_deleted_snapshot(root);
 			mutex_unlock(&root->fs_info->cleaner_mutex);
 			btrfs_run_defrag_inodes(root->fs_info);
+			up_read(&root->fs_info->sb->s_umount);
 		}
 
-		if (!try_to_freeze()) {
+		if (!try_to_freeze() && !again) {
 			set_current_state(TASK_INTERRUPTIBLE);
 			if (!kthread_should_stop())
 				schedule();
@@ -3338,8 +3342,8 @@ int btrfs_commit_super(struct btrfs_root *root)
 
 	mutex_lock(&root->fs_info->cleaner_mutex);
 	btrfs_run_delayed_iputs(root);
-	btrfs_clean_old_snapshots(root);
 	mutex_unlock(&root->fs_info->cleaner_mutex);
+	wake_up_process(root->fs_info->cleaner_kthread);
 
 	/* wait until ongoing cleanup work done */
 	down_write(&root->fs_info->cleanup_work_sem);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index d2b3a5e..0119ae7 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -7078,6 +7078,8 @@ static noinline int walk_up_tree(struct btrfs_trans_handle *trans,
  * reference count by one. if update_ref is true, this function
  * also make sure backrefs for the shared block and all lower level
  * blocks are properly updated.
+ *
+ * If called with for_reloc == 0, may exit early with -EAGAIN
  */
 int btrfs_drop_snapshot(struct btrfs_root *root,
 			 struct btrfs_block_rsv *block_rsv, int update_ref,
@@ -7179,6 +7181,12 @@ int btrfs_drop_snapshot(struct btrfs_root *root,
 	wc->reada_count = BTRFS_NODEPTRS_PER_BLOCK(root);
 
 	while (1) {
+		if (!for_reloc && btrfs_fs_closing(root->fs_info)) {
+			pr_debug("btrfs: drop snapshot early exit\n");
+			err = -EAGAIN;
+			goto out_end_trans;
+		}
+
 		ret = walk_down_tree(trans, root, path, wc);
 		if (ret < 0) {
 			err = ret;
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index ba5a321..ab6a718 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -4060,10 +4060,7 @@ int btrfs_relocate_block_group(struct btrfs_root *extent_root, u64 group_start)
 
 	while (1) {
 		mutex_lock(&fs_info->cleaner_mutex);
-
-		btrfs_clean_old_snapshots(fs_info->tree_root);
 		ret = relocate_block_group(rc);
-
 		mutex_unlock(&fs_info->cleaner_mutex);
 		if (ret < 0) {
 			err = ret;
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index a83d486..6b233c15 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -950,7 +950,7 @@ static noinline int commit_cowonly_roots(struct btrfs_trans_handle *trans,
 int btrfs_add_dead_root(struct btrfs_root *root)
 {
 	spin_lock(&root->fs_info->trans_lock);
-	list_add(&root->root_list, &root->fs_info->dead_roots);
+	list_add_tail(&root->root_list, &root->fs_info->dead_roots);
 	spin_unlock(&root->fs_info->trans_lock);
 	return 0;
 }
@@ -1858,31 +1858,50 @@ cleanup_transaction:
 }
 
 /*
- * interface function to delete all the snapshots we have scheduled for deletion
+ * return < 0 if error
+ * 0 if there are no more dead_roots at the time of call
+ * 1 if there are more to be processed, call me again
+ *
+ * A return value of 1 means there are certainly more snapshots to delete, but
+ * if a new one comes along during processing, it may return 0. We don't mind,
+ * because btrfs_commit_super will poke the cleaner thread and it will process
+ * it a few seconds later.
  */
-int btrfs_clean_old_snapshots(struct btrfs_root *root)
+int btrfs_clean_one_deleted_snapshot(struct btrfs_root *root)
 {
-	LIST_HEAD(list);
+	int ret;
+	int run_again = 1;
 	struct btrfs_fs_info *fs_info = root->fs_info;
 
+	if (root->fs_info->sb->s_flags & MS_RDONLY) {
+		pr_debug("btrfs: cleaner called for RO fs!\n");
+		return 0;
+	}
+
 	spin_lock(&fs_info->trans_lock);
-	list_splice_init(&fs_info->dead_roots, &list);
+	if (list_empty(&fs_info->dead_roots)) {
+		spin_unlock(&fs_info->trans_lock);
+		return 0;
+	}
+	root = list_first_entry(&fs_info->dead_roots,
+			struct btrfs_root, root_list);
+	list_del(&root->root_list);
 	spin_unlock(&fs_info->trans_lock);
 
-	while (!list_empty(&list)) {
-		int ret;
-
-		root = list_entry(list.next, struct btrfs_root, root_list);
-		list_del(&root->root_list);
+	pr_debug("btrfs: cleaner removing %llu\n",
+			(unsigned long long)root->objectid);
 
-		btrfs_kill_all_delayed_nodes(root);
+	btrfs_kill_all_delayed_nodes(root);
 
-		if (btrfs_header_backref_rev(root->node) <
-		    BTRFS_MIXED_BACKREF_REV)
-			ret = btrfs_drop_snapshot(root, NULL, 0, 0);
-		else
-			ret =btrfs_drop_snapshot(root, NULL, 1, 0);
-		BUG_ON(ret < 0);
-	}
-	return 0;
+	if (btrfs_header_backref_rev(root->node) <
+			BTRFS_MIXED_BACKREF_REV)
+		ret = btrfs_drop_snapshot(root, NULL, 0, 0);
+	else
+		ret = btrfs_drop_snapshot(root, NULL, 1, 0);
+	/*
+	 * If we encounter a transaction abort during snapshot cleaning, we
+	 * don't want to crash here
+	 */
+	BUG_ON(ret < 0 && (ret != -EAGAIN || ret != -EROFS));
+	return run_again || ret == -EAGAIN;
 }
diff --git a/fs/btrfs/transaction.h b/fs/btrfs/transaction.h
index 5afd7b1..52f6f25 100644
--- a/fs/btrfs/transaction.h
+++ b/fs/btrfs/transaction.h
@@ -121,7 +121,7 @@ int btrfs_write_and_wait_transaction(struct btrfs_trans_handle *trans,
 
 int btrfs_add_dead_root(struct btrfs_root *root);
 int btrfs_defrag_root(struct btrfs_root *root);
-int btrfs_clean_old_snapshots(struct btrfs_root *root);
+int btrfs_clean_one_deleted_snapshot(struct btrfs_root *root);
 int btrfs_commit_transaction(struct btrfs_trans_handle *trans,
 			     struct btrfs_root *root);
 int btrfs_commit_transaction_async(struct btrfs_trans_handle *trans,
-- 
1.7.9



* Re: [PATCH v2] btrfs: clean snapshots one by one
  2013-03-01 16:17   ` [PATCH v2] " David Sterba
@ 2013-03-01 16:30     ` David Sterba
  2013-03-06  9:08       ` Ilya Dryomov
  2013-03-02 21:32     ` Alex Lyakas
  1 sibling, 1 reply; 12+ messages in thread
From: David Sterba @ 2013-03-01 16:30 UTC (permalink / raw)
  To: David Sterba; +Cc: linux-btrfs, alex.btrfs, idryomov

The mail did not make it to the list on Monday; anyway, now I have a
few testing results to share.

The umount responsiveness is much improved; it took only a few seconds
even when there were many roots scheduled for cleaning.

With

$ echo 'func btrfs_drop_snapshot +pf' > /sys/kernel/debug/dynamic_debug/control
$ echo 'func btrfs_clean_one_deleted_snapshot +pf' > /sys/kernel/debug/dynamic_debug/control

one can watch how it proceeds.

After umount and mount, the cleaning process starts again after 30
seconds, when the transaction commit triggers.

The next test was to let it run with balance (just 'fi balance', no other
options). It worked well, but I hit an oops from the balance cancel command
that was triggered sometime during the balance. According to the log it
happened right after the balance finished. CCing Ilya, I'm not sure if this
is somehow induced by the patch.

3407         BUG_ON(fs_info->balance_ctl || atomic_read(&fs_info->balance_running));
3408         atomic_dec(&fs_info->balance_cancel_req);
3409         mutex_unlock(&fs_info->balance_mutex);
3410         return 0;
3411 }

 ------------[ cut here ]------------
 kernel BUG at fs/btrfs/volumes.c:3407!
 invalid opcode: 0000 [#1] SMP
 Modules linked in: btrfs aoe dm_crypt loop [last unloaded: btrfs]
 CPU 0
 Pid: 19117, comm: btrfs Tainted: G        W    3.8.0-default+ #267 Intel
 RIP: 0010:[<ffffffffa0160919>]  [<ffffffffa0160919>] btrfs_cancel_balance+0x179/0x180 [btrfs]
 RSP: 0018:ffff88005fddfd78  EFLAGS: 00010286
 RAX: ffff88005fddffd8 RBX: ffff88005adec000 RCX: 0000000000000006
 RDX: 0000000000000001 RSI: ffff880027cc88c0 RDI: 2222222222222222
 RBP: ffff88005fddfdd8 R08: 2222222222222222 R09: 2222222222222222
 R10: 2222222222222222 R11: 0000000000000000 R12: ffff88005adeeac8
 R13: ffff88005adeeb70 R14: ffff88005fddfd78 R15: ffff88005adeeb88
 FS:  00007fe136daa740(0000) GS:ffff88007dc00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
 CR2: 00007ff04c655000 CR3: 000000005fc4b000 CR4: 00000000000007f0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
 Process btrfs (pid: 19117, threadinfo ffff88005fdde000, task ffff880027cc8240)
 Stack:
  0000000000000000 ffff880027cc8240 ffffffff810745c0 ffff88005fddfd90
  ffff88005fddfd90 ffffffff810a6c6d ffff88005fddfdc8 ffff88007724d6c0
  ffff880055e6d000 0000000000000002 ffff880078ccc080 ffff880078ccc538
 Call Trace:
  [<ffffffff810745c0>] ? wake_up_bit+0x40/0x40
  [<ffffffff810a6c6d>] ? lock_release_holdtime+0x3d/0x1c0
  [<ffffffffa016c8a3>] btrfs_ioctl+0x693/0x1d80 [btrfs]
  [<ffffffff81079723>] ? up_read+0x23/0x40
  [<ffffffff8195e268>] ? __do_page_fault+0x238/0x590
  [<ffffffff81087d1f>] ? local_clock+0x6f/0x80
  [<ffffffff810a6909>] ? trace_hardirqs_off_caller+0x29/0xc0
  [<ffffffff811708f8>] do_vfs_ioctl+0x98/0x560
  [<ffffffff8195a906>] ? error_sti+0x5/0x6
  [<ffffffff8195a458>] ? retint_swapgs+0x13/0x1b
  [<ffffffff81170e17>] sys_ioctl+0x57/0x90
  [<ffffffff8139adce>] ? trace_hardirqs_on_thunk+0x3a/0x3f
  [<ffffffff81962e99>] system_call_fastpath+0x16/0x1b
 RIP  [<ffffffffa0160919>] btrfs_cancel_balance+0x179/0x180 [btrfs]
  RSP <ffff88005fddfd78>
 ---[ end trace 930320f35566d010 ]---



* Re: [PATCH v2] btrfs: clean snapshots one by one
  2013-03-01 16:17   ` [PATCH v2] " David Sterba
  2013-03-01 16:30     ` David Sterba
@ 2013-03-02 21:32     ` Alex Lyakas
  2013-03-06 18:09       ` David Sterba
  1 sibling, 1 reply; 12+ messages in thread
From: Alex Lyakas @ 2013-03-02 21:32 UTC (permalink / raw)
  To: David Sterba; +Cc: linux-btrfs

Hi David,

On Fri, Mar 1, 2013 at 6:17 PM, David Sterba <dsterba@suse.cz> wrote:
> Each time pick one dead root from the list and let the caller know
> whether it needs to continue. This should improve responsiveness during
> umount and balance, which at some point wait for all currently queued
> dead roots to be cleaned.
>
> A new dead root is added to the end of the list, so the snapshots
> disappear in the order of deletion.
>
> Snapshot cleaning is now done only from the cleaner thread, and the
> other threads wake it if needed.
>
> Signed-off-by: David Sterba <dsterba@suse.cz>
> ---
>
> v1->v2:
> - added s_umount trylock in cleaner thread
> - added exit into drop_snapshot if fs is going down
>
> patch based on cmason/integration
>
>  fs/btrfs/disk-io.c     |   10 ++++++--
>  fs/btrfs/extent-tree.c |    8 ++++++
>  fs/btrfs/relocation.c  |    3 --
>  fs/btrfs/transaction.c |   57 ++++++++++++++++++++++++++++++++----------------
>  fs/btrfs/transaction.h |    2 +-
>  5 files changed, 54 insertions(+), 26 deletions(-)
>
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index eb7c143..cc85fc7 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -1652,15 +1652,19 @@ static int cleaner_kthread(void *arg)
>         struct btrfs_root *root = arg;
>
>         do {
> +               int again = 0;
> +
>                 if (!(root->fs_info->sb->s_flags & MS_RDONLY) &&
> +                   down_read_trylock(&root->fs_info->sb->s_umount) &&
>                     mutex_trylock(&root->fs_info->cleaner_mutex)) {
>                         btrfs_run_delayed_iputs(root);
> -                       btrfs_clean_old_snapshots(root);
> +                       again = btrfs_clean_one_deleted_snapshot(root);
>                         mutex_unlock(&root->fs_info->cleaner_mutex);
>                         btrfs_run_defrag_inodes(root->fs_info);
> +                       up_read(&root->fs_info->sb->s_umount);
>                 }
>
> -               if (!try_to_freeze()) {
> +               if (!try_to_freeze() && !again) {
>                         set_current_state(TASK_INTERRUPTIBLE);
>                         if (!kthread_should_stop())
>                                 schedule();
> @@ -3338,8 +3342,8 @@ int btrfs_commit_super(struct btrfs_root *root)
>
>         mutex_lock(&root->fs_info->cleaner_mutex);
>         btrfs_run_delayed_iputs(root);
> -       btrfs_clean_old_snapshots(root);
>         mutex_unlock(&root->fs_info->cleaner_mutex);
> +       wake_up_process(root->fs_info->cleaner_kthread);
>
>         /* wait until ongoing cleanup work done */
>         down_write(&root->fs_info->cleanup_work_sem);
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index d2b3a5e..0119ae7 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -7078,6 +7078,8 @@ static noinline int walk_up_tree(struct btrfs_trans_handle *trans,
>   * reference count by one. if update_ref is true, this function
>   * also make sure backrefs for the shared block and all lower level
>   * blocks are properly updated.
> + *
> + * If called with for_reloc == 0, may exit early with -EAGAIN
>   */
>  int btrfs_drop_snapshot(struct btrfs_root *root,
>                          struct btrfs_block_rsv *block_rsv, int update_ref,
> @@ -7179,6 +7181,12 @@ int btrfs_drop_snapshot(struct btrfs_root *root,
>         wc->reada_count = BTRFS_NODEPTRS_PER_BLOCK(root);
>
>         while (1) {
> +               if (!for_reloc && btrfs_fs_closing(root->fs_info)) {
> +                       pr_debug("btrfs: drop snapshot early exit\n");
> +                       err = -EAGAIN;
> +                       goto out_end_trans;
> +               }
> +
>                 ret = walk_down_tree(trans, root, path, wc);
>                 if (ret < 0) {
>                         err = ret;
> diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
> index ba5a321..ab6a718 100644
> --- a/fs/btrfs/relocation.c
> +++ b/fs/btrfs/relocation.c
> @@ -4060,10 +4060,7 @@ int btrfs_relocate_block_group(struct btrfs_root *extent_root, u64 group_start)
>
>         while (1) {
>                 mutex_lock(&fs_info->cleaner_mutex);
> -
> -               btrfs_clean_old_snapshots(fs_info->tree_root);
>                 ret = relocate_block_group(rc);
> -
>                 mutex_unlock(&fs_info->cleaner_mutex);
>                 if (ret < 0) {
>                         err = ret;
> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
> index a83d486..6b233c15 100644
> --- a/fs/btrfs/transaction.c
> +++ b/fs/btrfs/transaction.c
> @@ -950,7 +950,7 @@ static noinline int commit_cowonly_roots(struct btrfs_trans_handle *trans,
>  int btrfs_add_dead_root(struct btrfs_root *root)
>  {
>         spin_lock(&root->fs_info->trans_lock);
> -       list_add(&root->root_list, &root->fs_info->dead_roots);
> +       list_add_tail(&root->root_list, &root->fs_info->dead_roots);
>         spin_unlock(&root->fs_info->trans_lock);
>         return 0;
>  }
> @@ -1858,31 +1858,50 @@ cleanup_transaction:
>  }
>
>  /*
> - * interface function to delete all the snapshots we have scheduled for deletion
> + * return < 0 if error
> + * 0 if there are no more dead_roots at the time of call
> + * 1 if there are more to be processed, call me again
> + *
> + * A return value of 1 means there are certainly more snapshots to delete, but
> + * if a new one comes along during processing, it may return 0. We don't mind,
> + * because btrfs_commit_super will poke the cleaner thread and it will process
> + * it a few seconds later.
>   */
> -int btrfs_clean_old_snapshots(struct btrfs_root *root)
> +int btrfs_clean_one_deleted_snapshot(struct btrfs_root *root)
>  {
> -       LIST_HEAD(list);
> +       int ret;
> +       int run_again = 1;
>         struct btrfs_fs_info *fs_info = root->fs_info;
>
> +       if (root->fs_info->sb->s_flags & MS_RDONLY) {
> +               pr_debug("btrfs: cleaner called for RO fs!\n");
> +               return 0;
> +       }
> +
>         spin_lock(&fs_info->trans_lock);
> -       list_splice_init(&fs_info->dead_roots, &list);
> +       if (list_empty(&fs_info->dead_roots)) {
> +               spin_unlock(&fs_info->trans_lock);
> +               return 0;
> +       }
> +       root = list_first_entry(&fs_info->dead_roots,
> +                       struct btrfs_root, root_list);
> +       list_del(&root->root_list);
>         spin_unlock(&fs_info->trans_lock);
>
> -       while (!list_empty(&list)) {
> -               int ret;
> -
> -               root = list_entry(list.next, struct btrfs_root, root_list);
> -               list_del(&root->root_list);
> +       pr_debug("btrfs: cleaner removing %llu\n",
> +                       (unsigned long long)root->objectid);
>
> -               btrfs_kill_all_delayed_nodes(root);
> +       btrfs_kill_all_delayed_nodes(root);
>
> -               if (btrfs_header_backref_rev(root->node) <
> -                   BTRFS_MIXED_BACKREF_REV)
> -                       ret = btrfs_drop_snapshot(root, NULL, 0, 0);
> -               else
> -                       ret =btrfs_drop_snapshot(root, NULL, 1, 0);
> -               BUG_ON(ret < 0);
> -       }
> -       return 0;
> +       if (btrfs_header_backref_rev(root->node) <
> +                       BTRFS_MIXED_BACKREF_REV)
> +               ret = btrfs_drop_snapshot(root, NULL, 0, 0);
> +       else
> +               ret = btrfs_drop_snapshot(root, NULL, 1, 0);
> +       /*
> +        * If we encounter a transaction abort during snapshot cleaning, we
> +        * don't want to crash here
> +        */
> +       BUG_ON(ret < 0 && (ret != -EAGAIN || ret != -EROFS));
> +       return run_again || ret == -EAGAIN;
I think that this expression will always evaluate to "true", because
"run_again" is set to 1 and never reset.

So now if btrfs_drop_snapshot() returns -EAGAIN, then
btrfs_clean_one_deleted_snapshot() will return 1, thus telling the
caller "call me again", like your comment says. However, in this case
we are unmounting, so is this ok? Or is this just to avoid the cleaner
going to sleep?



>  }
> diff --git a/fs/btrfs/transaction.h b/fs/btrfs/transaction.h
> index 5afd7b1..52f6f25 100644
> --- a/fs/btrfs/transaction.h
> +++ b/fs/btrfs/transaction.h
> @@ -121,7 +121,7 @@ int btrfs_write_and_wait_transaction(struct btrfs_trans_handle *trans,
>
>  int btrfs_add_dead_root(struct btrfs_root *root);
>  int btrfs_defrag_root(struct btrfs_root *root);
> -int btrfs_clean_old_snapshots(struct btrfs_root *root);
> +int btrfs_clean_one_deleted_snapshot(struct btrfs_root *root);
>  int btrfs_commit_transaction(struct btrfs_trans_handle *trans,
>                              struct btrfs_root *root);
>  int btrfs_commit_transaction_async(struct btrfs_trans_handle *trans,
> --
> 1.7.9
>

Also, I want to ask, hope this is not inappropriate. Do you also agree
with Josef that it's ok for BTRFS_IOC_SNAP_DESTROY not to commit the
transaction, but just to detach from it? Had we committed, we would
have ensured that the ORPHAN_ITEM is in the root tree, thus preventing
the subvol from re-appearing after a crash.
It seems a little inconsistent with snap creation, where not only is the
transaction committed, but a delalloc flush is performed to ensure
that all data is on disk before creating the snap.

Thanks,
Alex.


* Re: [PATCH v2] btrfs: clean snapshots one by one
  2013-03-01 16:30     ` David Sterba
@ 2013-03-06  9:08       ` Ilya Dryomov
  2013-03-06 16:47         ` David Sterba
  0 siblings, 1 reply; 12+ messages in thread
From: Ilya Dryomov @ 2013-03-06  9:08 UTC (permalink / raw)
  To: dsterba, linux-btrfs, alex.btrfs

On Fri, Mar 01, 2013 at 05:30:57PM +0100, David Sterba wrote:
> The mail did not make it to the list on Monday; anyway, now I have a
> few testing results to share.
> 
> The umount responsiveness is much improved; it took only a few seconds
> even when there were many roots scheduled for cleaning.
> 
> With
> 
> $ echo 'func btrfs_drop_snapshot +pf' > /sys/kernel/debug/dynamic_debug/control
> $ echo 'func btrfs_clean_one_deleted_snapshot +pf' > /sys/kernel/debug/dynamic_debug/control
> 
> one can watch how it proceeds.
> 
> After umount and mount, the cleaning process starts again after 30
> seconds, when the transaction commit triggers.
> 
> The next test was to let it run with balance (just 'fi balance', no other
> options). It worked well, but I hit an oops from the balance cancel command
> that was triggered sometime during the balance. According to the log it
> happened right after the balance finished. CCing Ilya, I'm not sure if this
> is somehow induced by the patch.
> 
> 3407         BUG_ON(fs_info->balance_ctl || atomic_read(&fs_info->balance_running));

Were you by any chance running this on top of a for-linus that included
a raid56 merge commit?  If so, this is the result of a raid56 mismerge --
fs_info->balance_ctl is supposed to be cleared in __cancel_balance(), a
call to which was accidentally nuked.
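
I.e., as a sketch of what went missing (assuming the call sits in the
balance completion path, per the above):

	/* clears fs_info->balance_ctl, among other teardown */
	__cancel_balance(fs_info);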

Thanks,

		Ilya


* Re: [PATCH v2] btrfs: clean snapshots one by one
  2013-03-06  9:08       ` Ilya Dryomov
@ 2013-03-06 16:47         ` David Sterba
  0 siblings, 0 replies; 12+ messages in thread
From: David Sterba @ 2013-03-06 16:47 UTC (permalink / raw)
  To: Ilya Dryomov; +Cc: dsterba, linux-btrfs, alex.btrfs

On Wed, Mar 06, 2013 at 11:08:57AM +0200, Ilya Dryomov wrote:
> On Fri, Mar 01, 2013 at 05:30:57PM +0100, David Sterba wrote:
> > 3407         BUG_ON(fs_info->balance_ctl || atomic_read(&fs_info->balance_running));
> 
> Were you by any chance running this on top of a for-linus that included
> a raid56 merge commit?  If so, this is the result of a raid56 mismerge --
> fs_info->balance_ctl is supposed to be cleared in __cancel_balance(), a
> call to which was accidentally nuked.

I've been testing either plain next/master or trees based on linus', so
yes, this merge was there and the missing cancel explains the BUG_ON.

david


* Re: [PATCH v2] btrfs: clean snapshots one by one
  2013-03-02 21:32     ` Alex Lyakas
@ 2013-03-06 18:09       ` David Sterba
  2013-03-07  3:12         ` Chris Mason
  0 siblings, 1 reply; 12+ messages in thread
From: David Sterba @ 2013-03-06 18:09 UTC (permalink / raw)
  To: Alex Lyakas; +Cc: David Sterba, linux-btrfs

On Sat, Mar 02, 2013 at 11:32:07PM +0200, Alex Lyakas wrote:
> On Fri, Mar 1, 2013 at 6:17 PM, David Sterba <dsterba@suse.cz> wrote:
> > -               btrfs_kill_all_delayed_nodes(root);
> > +       btrfs_kill_all_delayed_nodes(root);
> >
> > -               if (btrfs_header_backref_rev(root->node) <
> > -                   BTRFS_MIXED_BACKREF_REV)
> > -                       ret = btrfs_drop_snapshot(root, NULL, 0, 0);
> > -               else
> > -                       ret =btrfs_drop_snapshot(root, NULL, 1, 0);
> > -               BUG_ON(ret < 0);
> > -       }
> > -       return 0;
> > +       if (btrfs_header_backref_rev(root->node) <
> > +                       BTRFS_MIXED_BACKREF_REV)
> > +               ret = btrfs_drop_snapshot(root, NULL, 0, 0);
> > +       else
> > +               ret = btrfs_drop_snapshot(root, NULL, 1, 0);
> > +       /*
> > +        * If we encounter a transaction abort during snapshot cleaning, we
> > +        * don't want to crash here
> > +        */
> > +       BUG_ON(ret < 0 && (ret != -EAGAIN || ret != -EROFS));
> > +       return run_again || ret == -EAGAIN;
> I think that this expression will always evaluate to "true", because
> "run_again" is set to 1 and never reset.

That's right, I probably had some intentions in the past, but now it's
just superfluous.

> So now if btrfs_drop_snapshot() returns -EAGAIN, then
> btrfs_clean_one_deleted_snapshot() will return 1, thus telling the
> caller "call me again", like your comment says. However, in this case
> we are unmounting, so is this ok? Or is this just to avoid the cleaner
> going to sleep?

close_ctree calls kthread_stop(cleaner), which would wake the cleaner anyway.

The cases when we want to send the cleaner to sleep are when there's
nothing to do, or the read-only case, which is just a safety check.

From that point, the last statement should be 'return 1' and run_again
removed.
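
I.e. a sketch of the tail of btrfs_clean_one_deleted_snapshot() after that
change (the BUG_ON condition is written with &&, assuming that was the
intent of the posted (ret != -EAGAIN || ret != -EROFS), which is always
true):

	if (btrfs_header_backref_rev(root->node) < BTRFS_MIXED_BACKREF_REV)
		ret = btrfs_drop_snapshot(root, NULL, 0, 0);
	else
		ret = btrfs_drop_snapshot(root, NULL, 1, 0);
	/* tolerate -EAGAIN (fs closing) and -EROFS (aborted transaction) */
	BUG_ON(ret < 0 && ret != -EAGAIN && ret != -EROFS);
	return 1;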

> Also, I want to ask, hope this is not inappropriate. Do you also agree
> with Josef that it's ok for BTRFS_IOC_SNAP_DESTROY not to commit the
> transaction, but just to detach from it? Had we committed, we would
> have ensured that the ORPHAN_ITEM is in the root tree, thus preventing
> the subvol from re-appearing after a crash.
> It seems a little inconsistent with snap creation, where not only is the
> transaction committed, but a delalloc flush is performed to ensure
> that all data is on disk before creating the snap.

That's another question, can you please point me to the thread where
this was discussed?

thanks,
david


* Re: [PATCH v2] btrfs: clean snapshots one by one
  2013-03-06 18:09       ` David Sterba
@ 2013-03-07  3:12         ` Chris Mason
  2013-03-07 11:55           ` David Sterba
  0 siblings, 1 reply; 12+ messages in thread
From: Chris Mason @ 2013-03-07  3:12 UTC (permalink / raw)
  To: David Sterba; +Cc: Alex Lyakas, linux-btrfs

On Wed, Mar 06, 2013 at 11:09:15AM -0700, David Sterba wrote:
> On Sat, Mar 02, 2013 at 11:32:07PM +0200, Alex Lyakas wrote:
> > On Fri, Mar 1, 2013 at 6:17 PM, David Sterba <dsterba@suse.cz> wrote:
> > > -               btrfs_kill_all_delayed_nodes(root);
> > > +       btrfs_kill_all_delayed_nodes(root);
> > >
> > > -               if (btrfs_header_backref_rev(root->node) <
> > > -                   BTRFS_MIXED_BACKREF_REV)
> > > -                       ret = btrfs_drop_snapshot(root, NULL, 0, 0);
> > > -               else
> > > -                       ret =btrfs_drop_snapshot(root, NULL, 1, 0);
> > > -               BUG_ON(ret < 0);
> > > -       }
> > > -       return 0;
> > > +       if (btrfs_header_backref_rev(root->node) <
> > > +                       BTRFS_MIXED_BACKREF_REV)
> > > +               ret = btrfs_drop_snapshot(root, NULL, 0, 0);
> > > +       else
> > > +               ret = btrfs_drop_snapshot(root, NULL, 1, 0);
> > > +       /*
> > > +        * If we encounter a transaction abort during snapshot cleaning, we
> > > +        * don't want to crash here
> > > +        */
> > > +       BUG_ON(ret < 0 && (ret != -EAGAIN || ret != -EROFS));
> > > +       return run_again || ret == -EAGAIN;
> > I think that this expression will always evaluate to "true", because
> > "run_again" is set to 1 and never reset.
> 
> That's right, I probably had some intentions in the past, but now it's
> just superfluous.
> 
> > So now if btrfs_drop_snapshot() returns -EAGAIN, then
> > btrfs_clean_one_deleted_snapshot() will return 1, thus telling the
> > caller "call me again", like your comment says. However, in this case
> > we are unmounting, so is this ok? Or is this just to avoid the cleaner
> > going to sleep?
> 
> close_ctree calls kthread_stop(cleaner), which would wake the cleaner anyway.
> 
> The cases when we want to send the cleaner to sleep are when there's
> nothing to do, or the read-only case, which is just a safety check.
> 
> From that point, the last statement should be 'return 1' and run_again
> removed.
> 
> > Also, I want to ask, hope this is not inappropriate. Do you also agree
> > with Josef that it's ok for BTRFS_IOC_SNAP_DESTROY not to commit the
> > transaction, but just to detach from it? Had we committed, we would
> > have ensured that the ORPHAN_ITEM is in the root tree, thus preventing
> > the subvol from re-appearing after a crash.
> > It seems a little inconsistent with snap creation, where not only is the
> > transaction committed, but a delalloc flush is performed to ensure
> > that all data is on disk before creating the snap.
> 
> That's another question, can you please point me to the thread where
> this was discussed?

That's a really old one.  The original snapshot code expected people to
run sync first, but that's not very user friendly.  The idea is that if
you write a file and then take a snapshot, that file should be in the
snapshot.

-chris



* Re: [PATCH v2] btrfs: clean snapshots one by one
  2013-03-07  3:12         ` Chris Mason
@ 2013-03-07 11:55           ` David Sterba
  2013-03-16 19:33             ` Alex Lyakas
  0 siblings, 1 reply; 12+ messages in thread
From: David Sterba @ 2013-03-07 11:55 UTC (permalink / raw)
  To: Chris Mason, David Sterba, Alex Lyakas, linux-btrfs

On Wed, Mar 06, 2013 at 10:12:11PM -0500, Chris Mason wrote:
> > > Also, I want to ask, hope this is not inappropriate. Do you also agree
> > > with Josef that it's ok for BTRFS_IOC_SNAP_DESTROY not to commit the
> > > transaction, but just to detach from it? Had we committed, we would
> > > have ensured that the ORPHAN_ITEM is in the root tree, thus preventing
> > > the subvol from re-appearing after a crash.
> > > It seems a little inconsistent with snap creation, where not only is the
> > > transaction committed, but a delalloc flush is performed to ensure
> > > that all data is on disk before creating the snap.
> > 
> > That's another question, can you please point me to the thread where
> > this was discussed?
> 
> That's a really old one.  The original snapshot code expected people to
> run sync first, but that's not very user friendly.  The idea is that if
> you write a file and then take a snapshot, that file should be in the
> snapshot.

The snapshot behaviour sounds ok to me.

That a subvol/snapshot may re-appear after a crash if the transaction
commit did not happen does not feel so good. We know that the subvol is
only scheduled for deletion and needs to be processed by the cleaner.

From that point I'd rather see the commit happen, to avoid any
unexpected surprises.  A subvolume that re-appears still holds the data
references and consumes space, although the user does not expect that.

Automated snapshotting and deleting needs some guarantees about the
behaviour and what to do after a crash. So now it has to process the
backlog of previously deleted snapshots and verify that they're not
there, compared to "deleted -> will never appear, can forget about it".
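
(As a sketch of the alternative, assuming the snap_destroy ioctl currently
detaches via btrfs_end_transaction -- committing instead would persist the
orphan item before returning to user space:)

	/* in btrfs_ioctl_snap_destroy(), instead of btrfs_end_transaction() */
	ret = btrfs_commit_transaction(trans, root);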


david


* Re: [PATCH v2] btrfs: clean snapshots one by one
  2013-03-07 11:55           ` David Sterba
@ 2013-03-16 19:33             ` Alex Lyakas
  0 siblings, 0 replies; 12+ messages in thread
From: Alex Lyakas @ 2013-03-16 19:33 UTC (permalink / raw)
  To: dsterba, linux-btrfs

Hi David,

On Thu, Mar 7, 2013 at 1:55 PM, David Sterba <dsterba@suse.cz> wrote:
> On Wed, Mar 06, 2013 at 10:12:11PM -0500, Chris Mason wrote:
>> > > Also, I want to ask, hope this is not inappropriate. Do you also agree
>> > > with Josef that it's ok for BTRFS_IOC_SNAP_DESTROY not to commit the
>> > > transaction, but just to detach from it? Had we committed, we would
>> > > have ensured that the ORPHAN_ITEM is in the root tree, thus preventing
>> > > the subvol from re-appearing after a crash.
>> > > It seems a little inconsistent with snap creation, where not only is the
>> > > transaction committed, but a delalloc flush is performed to ensure
>> > > that all data is on disk before creating the snap.
>> >
>> > That's another question, can you please point me to the thread where
>> > this was discussed?
http://www.spinics.net/lists/linux-btrfs/msg22256.html


>>
>> That's a really old one.  The original snapshot code expected people to
>> run sync first, but that's not very user friendly.  The idea is that if
>> you write a file and then take a snapshot, that file should be in the
>> snapshot.
>
> The snapshot behaviour sounds ok to me.
>
> That a subvol/snapshot may re-appear after a crash if the transaction
> commit did not happen does not feel so good. We know that the subvol is
> only scheduled for deletion and needs to be processed by the cleaner.
>
> From that point I'd rather see the commit happen, to avoid any
> unexpected surprises.  A subvolume that re-appears still holds the data
> references and consumes space, although the user does not expect that.
>
> Automated snapshotting and deleting needs some guarantees about the
> behaviour and what to do after a crash. So now it has to process the
> backlog of previously deleted snapshots and verify that they're not
> there, compared to "deleted -> will never appear, can forget about it".

Exactly. Currently, user space has no idea when the deletion will
start, or when it is completed (it has to track the ROOT_ITEM, drop
progress, ORPHAN_ITEM etc.). That's why I was thinking that at least
committing a transaction on snap_destroy could ensure that the deletion
will not be reverted.

Thanks,
Alex.


Thread overview: 12+ messages
2013-02-11 16:11 [RFC][PATCH] btrfs: clean snapshots one by one David Sterba
2013-02-17 19:55 ` Alex Lyakas
2013-02-22 23:46   ` David Sterba
2013-03-01 16:17   ` [PATCH v2] " David Sterba
2013-03-01 16:30     ` David Sterba
2013-03-06  9:08       ` Ilya Dryomov
2013-03-06 16:47         ` David Sterba
2013-03-02 21:32     ` Alex Lyakas
2013-03-06 18:09       ` David Sterba
2013-03-07  3:12         ` Chris Mason
2013-03-07 11:55           ` David Sterba
2013-03-16 19:33             ` Alex Lyakas
