Linux-BTRFS Archive on lore.kernel.org
 help / Atom feed
* [PATCH 2/2] Btrfs: fix unprotected deletion from pending_chunks list
@ 2014-12-02 18:07 Filipe Manana
  2019-01-21 19:07 ` Alex Lyakas
  0 siblings, 1 reply; 4+ messages in thread
From: Filipe Manana @ 2014-12-02 18:07 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Filipe Manana

On block group remove if the corresponding extent map was on the
transaction->pending_chunks list, we were deleting the extent map
from that list, through remove_extent_mapping(), without any
synchronization with chunk allocation (which iterates that list
and adds new elements to it). Fix this by ensure that this is done
while the chunk mutex is held, since that's the mutex that protects
the list in the chunk allocation code path.

This applies on top (depends on) of my previous patch titled:
"Btrfs: fix race between fs trimming and block group remove/allocation"

But the issue in fact was already present before that change, it only
became easier to hit after Josef's 3.18 patch that added automatic
removal of empty block groups.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/extent-tree.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 17d429d..a7b81b4 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -9524,19 +9524,25 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
 		list_move_tail(&em->list, &root->fs_info->pinned_chunks);
 	}
 	spin_unlock(&block_group->lock);
-	unlock_chunks(root);
 
 	if (remove_em) {
 		struct extent_map_tree *em_tree;
 
 		em_tree = &root->fs_info->mapping_tree.map_tree;
 		write_lock(&em_tree->lock);
+		/*
+		 * The em might be in the pending_chunks list, so make sure the
+		 * chunk mutex is locked, since remove_extent_mapping() will
+		 * delete us from that list.
+		 */
 		remove_extent_mapping(em_tree, em);
 		write_unlock(&em_tree->lock);
 		/* once for the tree */
 		free_extent_map(em);
 	}
 
+	unlock_chunks(root);
+
 	btrfs_put_block_group(block_group);
 	btrfs_put_block_group(block_group);
 
-- 
2.1.3


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH 2/2] Btrfs: fix unprotected deletion from pending_chunks list
  2014-12-02 18:07 [PATCH 2/2] Btrfs: fix unprotected deletion from pending_chunks list Filipe Manana
@ 2019-01-21 19:07 ` Alex Lyakas
  2019-01-21 20:05   ` Filipe Manana
  0 siblings, 1 reply; 4+ messages in thread
From: Alex Lyakas @ 2019-01-21 19:07 UTC (permalink / raw)
  To: Filipe Manana; +Cc: linux-btrfs, Filipe Manana

Hi Filipe,

On Tue, Dec 2, 2014 at 8:08 PM Filipe Manana <fdmanana@suse.com> wrote:
>
> On block group remove if the corresponding extent map was on the
> transaction->pending_chunks list, we were deleting the extent map
> from that list, through remove_extent_mapping(), without any
> synchronization with chunk allocation (which iterates that list
> and adds new elements to it). Fix this by ensure that this is done
> while the chunk mutex is held, since that's the mutex that protects
> the list in the chunk allocation code path.
>
> This applies on top (depends on) of my previous patch titled:
> "Btrfs: fix race between fs trimming and block group remove/allocation"
>
> But the issue in fact was already present before that change, it only
> became easier to hit after Josef's 3.18 patch that added automatic
> removal of empty block groups.
>
> Signed-off-by: Filipe Manana <fdmanana@suse.com>
> ---
>  fs/btrfs/extent-tree.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 17d429d..a7b81b4 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -9524,19 +9524,25 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
>                 list_move_tail(&em->list, &root->fs_info->pinned_chunks);
>         }
>         spin_unlock(&block_group->lock);
> -       unlock_chunks(root);
>
>         if (remove_em) {
>                 struct extent_map_tree *em_tree;
>
>                 em_tree = &root->fs_info->mapping_tree.map_tree;
>                 write_lock(&em_tree->lock);
> +               /*
> +                * The em might be in the pending_chunks list, so make sure the
> +                * chunk mutex is locked, since remove_extent_mapping() will
> +                * delete us from that list.
> +                */
>                 remove_extent_mapping(em_tree, em);
>                 write_unlock(&em_tree->lock);
If the "em" was in pending_chunks, it will be deleted from that list
by "remove_extent_mapping". But it looks like in this case we also
need to drop the extra ref on "em", which was held by pending_chunks
list. I don't see it being done anywhere else. So we should check
before the remove_extent_mapping() call whether "em" was in
pending_chunks, and, if yes, drop the extra ref?

Thanks,
Alex.


>                 /* once for the tree */
>                 free_extent_map(em);
>         }
>
> +       unlock_chunks(root);
> +
>         btrfs_put_block_group(block_group);
>         btrfs_put_block_group(block_group);
>
> --
> 2.1.3
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH 2/2] Btrfs: fix unprotected deletion from pending_chunks list
  2019-01-21 19:07 ` Alex Lyakas
@ 2019-01-21 20:05   ` Filipe Manana
  2019-01-22 10:41     ` Alex Lyakas
  0 siblings, 1 reply; 4+ messages in thread
From: Filipe Manana @ 2019-01-21 20:05 UTC (permalink / raw)
  To: Alex Lyakas; +Cc: linux-btrfs, Filipe Manana

On Mon, Jan 21, 2019 at 7:07 PM Alex Lyakas <alex@zadara.com> wrote:
>
> Hi Filipe,
>
> On Tue, Dec 2, 2014 at 8:08 PM Filipe Manana <fdmanana@suse.com> wrote:
> >
> > On block group remove if the corresponding extent map was on the
> > transaction->pending_chunks list, we were deleting the extent map
> > from that list, through remove_extent_mapping(), without any
> > synchronization with chunk allocation (which iterates that list
> > and adds new elements to it). Fix this by ensure that this is done
> > while the chunk mutex is held, since that's the mutex that protects
> > the list in the chunk allocation code path.
> >
> > This applies on top (depends on) of my previous patch titled:
> > "Btrfs: fix race between fs trimming and block group remove/allocation"
> >
> > But the issue in fact was already present before that change, it only
> > became easier to hit after Josef's 3.18 patch that added automatic
> > removal of empty block groups.
> >
> > Signed-off-by: Filipe Manana <fdmanana@suse.com>
> > ---
> >  fs/btrfs/extent-tree.c | 8 +++++++-
> >  1 file changed, 7 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> > index 17d429d..a7b81b4 100644
> > --- a/fs/btrfs/extent-tree.c
> > +++ b/fs/btrfs/extent-tree.c
> > @@ -9524,19 +9524,25 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
> >                 list_move_tail(&em->list, &root->fs_info->pinned_chunks);
> >         }
> >         spin_unlock(&block_group->lock);
> > -       unlock_chunks(root);
> >
> >         if (remove_em) {
> >                 struct extent_map_tree *em_tree;
> >
> >                 em_tree = &root->fs_info->mapping_tree.map_tree;
> >                 write_lock(&em_tree->lock);
> > +               /*
> > +                * The em might be in the pending_chunks list, so make sure the
> > +                * chunk mutex is locked, since remove_extent_mapping() will
> > +                * delete us from that list.
> > +                */
> >                 remove_extent_mapping(em_tree, em);
> >                 write_unlock(&em_tree->lock);
> If the "em" was in pending_chunks, it will be deleted from that list
> by "remove_extent_mapping". But it looks like in this case we also
> need to drop the extra ref on "em", which was held by pending_chunks
> list. I don't see it being done anywhere else. So we should check
> before the remove_extent_mapping() call whether "em" was in
> pending_chunks, and, if yes, drop the extra ref?

This was part of a large patch set that fixed multiple issues with
automatic removal of block groups.
Dropping the extent map reference was done on another patch of that patch set:

commit 495e64f4fe0363bc79fa0dfb41c271787e01b5c3
Author: Filipe Manana <fdmanana@suse.com>
Date:   Tue Dec 2 18:07:30 2014 +0000

    Btrfs: fix fs mapping extent map leak

Over 4 years ago....

>
> Thanks,
> Alex.
>
>
> >                 /* once for the tree */
> >                 free_extent_map(em);
> >         }
> >
> > +       unlock_chunks(root);
> > +
> >         btrfs_put_block_group(block_group);
> >         btrfs_put_block_group(block_group);
> >
> > --
> > 2.1.3
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH 2/2] Btrfs: fix unprotected deletion from pending_chunks list
  2019-01-21 20:05   ` Filipe Manana
@ 2019-01-22 10:41     ` Alex Lyakas
  0 siblings, 0 replies; 4+ messages in thread
From: Alex Lyakas @ 2019-01-22 10:41 UTC (permalink / raw)
  To: Filipe Manana; +Cc: linux-btrfs, Filipe Manana

Hi Filipe,

Thank you for your response. I realize it was a long time, ago, but we
are just now in the process of moving to stable kernel 4.14.x.

Regarding the fix, I see now the relevant code in "btrfs_remove_block_group":
    mutex_lock(&fs_info->chunk_mutex);
    if (!list_empty(&em->list)) {
        /* We're in the transaction->pending_chunks list. */
        free_extent_map(em);
    }
...
However, this brings another doubt. Let's say we indeed performed
free_extent_map in the above code. But later we may do:
        /*
         * Our em might be in trans->transaction->pending_chunks which
         * is protected by fs_info->chunk_mutex ([lock|unlock]_chunks),
         * and so is the fs_info->pinned_chunks list.
         *
         * So at this point we must be holding the chunk_mutex to avoid
         * any races with chunk allocation (more specifically at
         * volumes.c:contains_pending_extent()), to ensure it always
         * sees the em, either in the pending_chunks list or in the
         * pinned_chunks list.
         */
        list_move_tail(&em->list, &fs_info->pinned_chunks);

So we have dropped the ref that was held by
"transaction->pending_chunks" list, and now we moved the "em" to the
pinned_chunks without a ref. But the code assumes that "pinned_chunks"
also has a ref on the "em". For example in close_ctree, we do:
    while (!list_empty(&fs_info->pinned_chunks)) {
        struct extent_map *em;

        em = list_first_entry(&fs_info->pinned_chunks,
                      struct extent_map, list);
        list_del_init(&em->list);
        free_extent_map(em);
    }

Can you please comment on that?

Thanks,
Alex.


On Mon, Jan 21, 2019 at 10:06 PM Filipe Manana <fdmanana@kernel.org> wrote:
>
> On Mon, Jan 21, 2019 at 7:07 PM Alex Lyakas <alex@zadara.com> wrote:
> >
> > Hi Filipe,
> >
> > On Tue, Dec 2, 2014 at 8:08 PM Filipe Manana <fdmanana@suse.com> wrote:
> > >
> > > On block group remove if the corresponding extent map was on the
> > > transaction->pending_chunks list, we were deleting the extent map
> > > from that list, through remove_extent_mapping(), without any
> > > synchronization with chunk allocation (which iterates that list
> > > and adds new elements to it). Fix this by ensure that this is done
> > > while the chunk mutex is held, since that's the mutex that protects
> > > the list in the chunk allocation code path.
> > >
> > > This applies on top (depends on) of my previous patch titled:
> > > "Btrfs: fix race between fs trimming and block group remove/allocation"
> > >
> > > But the issue in fact was already present before that change, it only
> > > became easier to hit after Josef's 3.18 patch that added automatic
> > > removal of empty block groups.
> > >
> > > Signed-off-by: Filipe Manana <fdmanana@suse.com>
> > > ---
> > >  fs/btrfs/extent-tree.c | 8 +++++++-
> > >  1 file changed, 7 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> > > index 17d429d..a7b81b4 100644
> > > --- a/fs/btrfs/extent-tree.c
> > > +++ b/fs/btrfs/extent-tree.c
> > > @@ -9524,19 +9524,25 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
> > >                 list_move_tail(&em->list, &root->fs_info->pinned_chunks);
> > >         }
> > >         spin_unlock(&block_group->lock);
> > > -       unlock_chunks(root);
> > >
> > >         if (remove_em) {
> > >                 struct extent_map_tree *em_tree;
> > >
> > >                 em_tree = &root->fs_info->mapping_tree.map_tree;
> > >                 write_lock(&em_tree->lock);
> > > +               /*
> > > +                * The em might be in the pending_chunks list, so make sure the
> > > +                * chunk mutex is locked, since remove_extent_mapping() will
> > > +                * delete us from that list.
> > > +                */
> > >                 remove_extent_mapping(em_tree, em);
> > >                 write_unlock(&em_tree->lock);
> > If the "em" was in pending_chunks, it will be deleted from that list
> > by "remove_extent_mapping". But it looks like in this case we also
> > need to drop the extra ref on "em", which was held by pending_chunks
> > list. I don't see it being done anywhere else. So we should check
> > before the remove_extent_mapping() call whether "em" was in
> > pending_chunks, and, if yes, drop the extra ref?
>
> This was part of a large patch set that fixed multiple issues with
> automatic removal of block groups.
> Dropping the extent map reference was done on another patch of that patch set:
>
> commit 495e64f4fe0363bc79fa0dfb41c271787e01b5c3
> Author: Filipe Manana <fdmanana@suse.com>
> Date:   Tue Dec 2 18:07:30 2014 +0000
>
>     Btrfs: fix fs mapping extent map leak
>
> Over 4 years ago....
>
> >
> > Thanks,
> > Alex.
> >
> >
> > >                 /* once for the tree */
> > >                 free_extent_map(em);
> > >         }
> > >
> > > +       unlock_chunks(root);
> > > +
> > >         btrfs_put_block_group(block_group);
> > >         btrfs_put_block_group(block_group);
> > >
> > > --
> > > 2.1.3
> > >
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> > > the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, back to index

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-12-02 18:07 [PATCH 2/2] Btrfs: fix unprotected deletion from pending_chunks list Filipe Manana
2019-01-21 19:07 ` Alex Lyakas
2019-01-21 20:05   ` Filipe Manana
2019-01-22 10:41     ` Alex Lyakas

Linux-BTRFS Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-btrfs/0 linux-btrfs/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-btrfs linux-btrfs/ https://lore.kernel.org/linux-btrfs \
		linux-btrfs@vger.kernel.org linux-btrfs@archiver.kernel.org
	public-inbox-index linux-btrfs


Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-btrfs


AGPL code for this site: git clone https://public-inbox.org/ public-inbox