linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] Btrfs: don't remove raid type sysfs entries until unmount
@ 2014-05-22  0:12 Chris Mason
  2014-05-22  1:21 ` Jeff Mahoney
  0 siblings, 1 reply; 10+ messages in thread
From: Chris Mason @ 2014-05-22  0:12 UTC (permalink / raw)
  To: linux-btrfs, David Sterba, Jeff Mahoney


The Btrfs sysfs code removes entries for raid types that are no
longer in use.  This means that if you have a raid0 FS and use balance
to turn it into a raid1 FS, the raid0 sysfs entries will go away.

The rough chain of events is:

__link_block_group() -> see we're the first RAIDX, add sysfs entry

btrfs_remove_block_group() -> notice we're removing the last RAIDX
remove sysfs entry

This all makes sense until we try to add RAIDX back into the FS again.
The problem is that our RAID kobjects are just in an array that gets
freed at unmount time, instead of an array of pointers to kobjects that
get freed when the great sysfs in the sky is done with them.

When we remove the sysfs entry for a given raid level, the syfs code
free's the name.  When we use the same kobject to add back the RAIDX
entry again, sysfs sees the old name pointer and tries to free it again.

All of which is a long way of saying we're using sysfs wrong.  For now,
just don't remove entries for raid levels that we're no longer using.

Signed-off-by: Chris Mason <clm@fb.com>
Reported-by: Dave Sterba <dsterba@suse.cz>

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index ddf16bf..acdc7ed 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -8535,12 +8535,14 @@ static void __link_block_group(struct btrfs_space_info *space_info,
 		struct kobject *kobj = &space_info->block_group_kobjs[index];
 		int ret;
 
-		kobject_get(&space_info->kobj); /* put in release */
-		ret = kobject_add(kobj, &space_info->kobj, "%s",
-				  get_raid_name(index));
-		if (ret) {
-			pr_warn("BTRFS: failed to add kobject for block cache. ignoring.\n");
-			kobject_put(&space_info->kobj);
+		if (!kobj->name) {
+			kobject_get(&space_info->kobj); /* put in release */
+			ret = kobject_add(kobj, &space_info->kobj, "%s",
+					  get_raid_name(index));
+			if (ret) {
+				pr_warn("BTRFS: failed to add kobject for block cache. ignoring.\n");
+				kobject_put(&space_info->kobj);
+			}
 		}
 	}
 }
@@ -8976,8 +8978,6 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
 	 */
 	list_del_init(&block_group->list);
 	if (list_empty(&block_group->space_info->block_groups[index])) {
-		kobject_del(&block_group->space_info->block_group_kobjs[index]);
-		kobject_put(&block_group->space_info->block_group_kobjs[index]);
 		clear_avail_alloc_bits(root->fs_info, block_group->flags);
 	}
 	up_write(&block_group->space_info->groups_sem);

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] Btrfs: don't remove raid type sysfs entries until unmount
  2014-05-22  0:12 [PATCH] Btrfs: don't remove raid type sysfs entries until unmount Chris Mason
@ 2014-05-22  1:21 ` Jeff Mahoney
  2014-05-22 12:19   ` Chris Mason
  0 siblings, 1 reply; 10+ messages in thread
From: Jeff Mahoney @ 2014-05-22  1:21 UTC (permalink / raw)
  To: Chris Mason, linux-btrfs, David Sterba

On 05/21/2014 08:12 PM, Chris Mason wrote:
> 
> The Btrfs sysfs code removes entries for raid types that are no
> longer in use.  This means that if you have a raid0 FS and use balance
> to turn it into a raid1 FS, the raid0 sysfs entries will go away.
> 
> The rough chain of events is:
> 
> __link_block_group() -> see we're the first RAIDX, add sysfs entry
> 
> btrfs_remove_block_group() -> notice we're removing the last RAIDX
> remove sysfs entry
> 
> This all makes sense until we try to add RAIDX back into the FS again.
> The problem is that our RAID kobjects are just in an array that gets
> freed at unmount time, instead of an array of pointers to kobjects that
> get freed when the great sysfs in the sky is done with them.
> 
> When we remove the sysfs entry for a given raid level, the syfs code
> free's the name.  When we use the same kobject to add back the RAIDX
> entry again, sysfs sees the old name pointer and tries to free it again.
> 
> All of which is a long way of saying we're using sysfs wrong.  For now,
> just don't remove entries for raid levels that we're no longer using.

Hi Chris -

Thanks for posting the problem. I disagree that sysfs is being used
wrong here - or well, not on purpose. The problem is only that I didn't
anticipate raid levels going away and only initialize the kobjects i
update_space_info.

The name being double-freed is only part of the problem. The kobject
isn't reinitialized at all. Clearing ->name and re-calling kobject_init
would fix it - but I think we can do one better.

kobject_cleanup's comment for freeing ->name claims that ->name being
set means that it was allocated by the kobject infrastructure. The raid
names come directly out of an array anyway, so we don't even need to
copy it. We can avoid the string allocation entirely and handle the
reuse properly.

The following (untested, tonight anyway) patch should fix it.

-Jeff

--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3401,10 +3401,8 @@ static int update_space_info(struct btrfs_fs_info *info, u64 flags,
 		return ret;
 	}
 
-	for (i = 0; i < BTRFS_NR_RAID_TYPES; i++) {
+	for (i = 0; i < BTRFS_NR_RAID_TYPES; i++)
 		INIT_LIST_HEAD(&found->block_groups[i]);
-		kobject_init(&found->block_group_kobjs[i], &btrfs_raid_ktype);
-	}
 	init_rwsem(&found->groups_sem);
 	spin_lock_init(&found->lock);
 	found->flags = flags & BTRFS_BLOCK_GROUP_TYPE_MASK;
@@ -8356,8 +8354,9 @@ static void __link_block_group(struct btrfs_space_info *space_info,
 		int ret;
 
 		kobject_get(&space_info->kobj); /* put in release */
-		ret = kobject_add(kobj, &space_info->kobj, "%s",
-				  get_raid_name(index));
+		kobject->name = get_raid_name(index);
+		ret = kobject_init_and_add(kobj, &btrfs_raid_ktype,
+					   &space_info->kobj, NULL);
 		if (ret) {
 			pr_warn("BTRFS: failed to add kobject for block cache. ignoring.\n");
 			kobject_put(&space_info->kobj);
diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index c5eb214..d742d79 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -288,6 +288,7 @@ static struct attribute *raid_attributes[] = {
 
 static void release_raid_kobj(struct kobject *kobj)
 {
+	kobj->name = NULL;
 	kobject_put(kobj->parent);
 }
 


-Jeff


> Signed-off-by: Chris Mason <clm@fb.com>
> Reported-by: Dave Sterba <dsterba@suse.cz>
> 
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index ddf16bf..acdc7ed 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -8535,12 +8535,14 @@ static void __link_block_group(struct btrfs_space_info *space_info,
>  		struct kobject *kobj = &space_info->block_group_kobjs[index];
>  		int ret;
>  
> -		kobject_get(&space_info->kobj); /* put in release */
> -		ret = kobject_add(kobj, &space_info->kobj, "%s",
> -				  get_raid_name(index));
> -		if (ret) {
> -			pr_warn("BTRFS: failed to add kobject for block cache. ignoring.\n");
> -			kobject_put(&space_info->kobj);
> +		if (!kobj->name) {
> +			kobject_get(&space_info->kobj); /* put in release */
> +			ret = kobject_add(kobj, &space_info->kobj, "%s",
> +					  get_raid_name(index));
> +			if (ret) {
> +				pr_warn("BTRFS: failed to add kobject for block cache. ignoring.\n");
> +				kobject_put(&space_info->kobj);
> +			}
>  		}
>  	}
>  }
> @@ -8976,8 +8978,6 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
>  	 */
>  	list_del_init(&block_group->list);
>  	if (list_empty(&block_group->space_info->block_groups[index])) {
> -		kobject_del(&block_group->space_info->block_group_kobjs[index]);
> -		kobject_put(&block_group->space_info->block_group_kobjs[index]);
>  		clear_avail_alloc_bits(root->fs_info, block_group->flags);
>  	}
>  	up_write(&block_group->space_info->groups_sem);
> 




^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] Btrfs: don't remove raid type sysfs entries until unmount
  2014-05-22  1:21 ` Jeff Mahoney
@ 2014-05-22 12:19   ` Chris Mason
  2014-05-22 15:05     ` Jeff Mahoney
  0 siblings, 1 reply; 10+ messages in thread
From: Chris Mason @ 2014-05-22 12:19 UTC (permalink / raw)
  To: Jeff Mahoney, linux-btrfs, David Sterba



On 05/21/2014 09:21 PM, Jeff Mahoney wrote:
> On 05/21/2014 08:12 PM, Chris Mason wrote:
>>
>> The Btrfs sysfs code removes entries for raid types that are no
>> longer in use.  This means that if you have a raid0 FS and use balance
>> to turn it into a raid1 FS, the raid0 sysfs entries will go away.
>>
>> The rough chain of events is:
>>
>> __link_block_group() -> see we're the first RAIDX, add sysfs entry
>>
>> btrfs_remove_block_group() -> notice we're removing the last RAIDX
>> remove sysfs entry
>>
>> This all makes sense until we try to add RAIDX back into the FS again.
>> The problem is that our RAID kobjects are just in an array that gets
>> freed at unmount time, instead of an array of pointers to kobjects that
>> get freed when the great sysfs in the sky is done with them.
>>
>> When we remove the sysfs entry for a given raid level, the syfs code
>> free's the name.  When we use the same kobject to add back the RAIDX
>> entry again, sysfs sees the old name pointer and tries to free it again.
>>
>> All of which is a long way of saying we're using sysfs wrong.  For now,
>> just don't remove entries for raid levels that we're no longer using.
> 
> Hi Chris -
> 
> Thanks for posting the problem. I disagree that sysfs is being used
> wrong here - or well, not on purpose. The problem is only that I didn't
> anticipate raid levels going away and only initialize the kobjects i
> update_space_info.

Thanks Jeff, this is more a comment on the sadness that comes from using
sysfs, not a knock on your patch ;)

> 
> The name being double-freed is only part of the problem. The kobject
> isn't reinitialized at all. Clearing ->name and re-calling kobject_init
> would fix it - but I think we can do one better.
> 
> kobject_cleanup's comment for freeing ->name claims that ->name being
> set means that it was allocated by the kobject infrastructure. The raid
> names come directly out of an array anyway, so we don't even need to
> copy it. We can avoid the string allocation entirely and handle the
> reuse properly.
> 
> The following (untested, tonight anyway) patch should fix it.

Can we safely reinit a kobject that has been put in use in sysfs?  Given
all the things that can hold refs etc is this legal?

-chris

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] Btrfs: don't remove raid type sysfs entries until unmount
  2014-05-22 12:19   ` Chris Mason
@ 2014-05-22 15:05     ` Jeff Mahoney
  2014-05-22 17:17       ` Chris Mason
  0 siblings, 1 reply; 10+ messages in thread
From: Jeff Mahoney @ 2014-05-22 15:05 UTC (permalink / raw)
  To: Chris Mason, linux-btrfs, David Sterba

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 5/22/14, 8:19 AM, Chris Mason wrote:
> Can we safely reinit a kobject that has been put in use in sysfs?
> Given all the things that can hold refs etc is this legal?

It depends on how the kobject is being used.

It wouldn't be safe to re-use the kobject embedded in space_info since
it controls the lifetime of the object, regardless of its use in sysfs.

The kobjects for block groups only exist for creating the
subdirectories and their lifetime is actually the lifetime of the
space_info. We take a reference to the space_info when we add them to
sysfs because that's where they draw their data. The only reference to
a block group kobject is taken when we add it to sysfs and is dropped
when we remove it. Holding a sysfs file open doesn't pin the kobject,
so once we remove it from sysfs (kobject_del waits for readers to
complete), it's safe to reinitialize it.

- -Jeff

- -- 
Jeff Mahoney
SUSE Labs
-----BEGIN PGP SIGNATURE-----
Version: GnuPG/MacGPG2 v2.0.22 (Darwin)

iQIcBAEBAgAGBQJTfhJBAAoJEB57S2MheeWymZoP/Ru0dB9DDQW7eCgeOuXTNMZt
uy29gJsUjC6QoQagJ0aHT1SiNw9EFHuDnfrnUaOGxO2l0dDostdrU4xqH/632xVv
9o4QrTWtg0d7mN2+D4YaYLB/74Nd1O86CE+3zbmM8CY8Ueh+JMR63jSSU5k2zeCX
ts0dcdK31DBP/aoof6NS8KNdJOpi0dIiefH2Si4Zm2RYBxcMBrKpI8KIrhab1Bo6
vomgq4XkAZUK7G2C/gOmGdqMlSMwMsOhiZ2fYH7pdR74q4vPu6K+RnxFnIuOzmPi
nzMxvgYnjjn1+cynTmZD3xMj94MyCamV1wbj4jciIiDdNrr1XRJH1EMqA5T3wDN4
vw52dVm6sDGizO4/coU21WMum2IUrIbCESXK74Tef3FLWzRWerKeZStIYBYGmVuz
WGTfwUJ+SnhEwSAEHbYIDM8fogVvei2UlPAwQ4o2kSzM9kKf+E5YbwYJDbdKlQ0C
tF6kyZFzkJ2ZEEVGv7fqswHudkWGafrPxRCN9iMvxWUWcxNybh1PZ3MW45uvfyA7
UrZYuDruTS2WorYSivKdFi+9DgTxms/ekfiNi4SZQ5Z1JteAPU8AS5OAQOuo+UCD
uIdlRzorK/pzYH0c/qbN5vy/t+cBVYU6ya238R9eTbq1jQGWWZlZuwaUJQb9nhmS
yKBOJnpRMrJOpO6Onrrh
=0M9G
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] Btrfs: don't remove raid type sysfs entries until unmount
  2014-05-22 15:05     ` Jeff Mahoney
@ 2014-05-22 17:17       ` Chris Mason
  2014-05-23 12:38         ` David Sterba
  0 siblings, 1 reply; 10+ messages in thread
From: Chris Mason @ 2014-05-22 17:17 UTC (permalink / raw)
  To: Jeff Mahoney, linux-btrfs, David Sterba

On 05/22/2014 11:05 AM, Jeff Mahoney wrote:
> - gpg control packet
> On 5/22/14, 8:19 AM, Chris Mason wrote:
>> Can we safely reinit a kobject that has been put in use in sysfs?
>> Given all the things that can hold refs etc is this legal?
> 
> It depends on how the kobject is being used.
> 
> It wouldn't be safe to re-use the kobject embedded in space_info since
> it controls the lifetime of the object, regardless of its use in sysfs.
> 
> The kobjects for block groups only exist for creating the
> subdirectories and their lifetime is actually the lifetime of the
> space_info. We take a reference to the space_info when we add them to
> sysfs because that's where they draw their data. The only reference to
> a block group kobject is taken when we add it to sysfs and is dropped
> when we remove it. Holding a sysfs file open doesn't pin the kobject,
> so once we remove it from sysfs (kobject_del waits for readers to
> complete), it's safe to reinitialize it.
> 

Fair enough, once you've tested this new patch a bit I'll drop mine for
yours.

Thanks!

-chris


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] Btrfs: don't remove raid type sysfs entries until unmount
  2014-05-22 17:17       ` Chris Mason
@ 2014-05-23 12:38         ` David Sterba
  2014-05-23 12:56           ` Jeff Mahoney
  0 siblings, 1 reply; 10+ messages in thread
From: David Sterba @ 2014-05-23 12:38 UTC (permalink / raw)
  To: Chris Mason; +Cc: Jeff Mahoney, linux-btrfs

On Thu, May 22, 2014 at 01:17:50PM -0400, Chris Mason wrote:
> On 05/22/2014 11:05 AM, Jeff Mahoney wrote:
> > - gpg control packet
> > On 5/22/14, 8:19 AM, Chris Mason wrote:
> >> Can we safely reinit a kobject that has been put in use in sysfs?
> >> Given all the things that can hold refs etc is this legal?
> > 
> > It depends on how the kobject is being used.
> > 
> > It wouldn't be safe to re-use the kobject embedded in space_info since
> > it controls the lifetime of the object, regardless of its use in sysfs.
> > 
> > The kobjects for block groups only exist for creating the
> > subdirectories and their lifetime is actually the lifetime of the
> > space_info. We take a reference to the space_info when we add them to
> > sysfs because that's where they draw their data. The only reference to
> > a block group kobject is taken when we add it to sysfs and is dropped
> > when we remove it. Holding a sysfs file open doesn't pin the kobject,
> > so once we remove it from sysfs (kobject_del waits for readers to
> > complete), it's safe to reinitialize it.
> > 
> 
> Fair enough, once you've tested this new patch a bit I'll drop mine for
> yours.

With a fixup in __link_block_group

-               kobject->name = get_raid_name(index);
+               kobj->name = get_raid_name(index);

it crashes right away, tests/btrfs/001, patch applied on top of 3.15-rc6,
I haven't analyzed the cause yet.

[  520.880250] ------------[ cut here ]------------
[  520.884185] kernel BUG at mm/slub.c:3376!
[  520.884185] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
[  520.884185] Modules linked in: rpcsec_gss_krb5 dm_crypt loop btrfs
[  520.884185] CPU: 1 PID: 3698 Comm: umount Not tainted 3.15.0-rc6-default+ #144
[  520.884185] Hardware name: Intel Corporation Santa Rosa platform/Matanzas, BIOS TSRSCRB1.86C.0047.B00.0610170821 10/17/06
[  520.884185] task: ffff880075892370 ti: ffff88006e0ce000 task.ti: ffff88006e0ce000
[  520.884185] RIP: 0010:[<ffffffff8116efcf>]  [<ffffffff8116efcf>] kfree+0x1bf/0x1d0
[  520.884185] RSP: 0018:ffff88006e0cfcb8  EFLAGS: 00010246
[  520.884185] RAX: 4000000000000000 RBX: ffffffffa00bb63e RCX: 0000000180190007
[  520.884185] RDX: 0000000000000000 RSI: ffffea0001d5ad40 RDI: ffffffffa00bb63e
[  520.884185] RBP: ffff88006e0cfce8 R08: 0000000000000001 R09: 0000000000000000
[  520.884185] R10: 0000000000000000 R11: 0000000000000001 R12: ffffea0000802ec0
[  520.884185] R13: ffffffff813ba137 R14: ffffffffa00bb63e R15: 0000000000000003
[  520.884185] FS:  00007fba2c0cd7e0(0000) GS:ffff88007d400000(0000) knlGS:0000000000000000
[  520.884185] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  520.884185] CR2: 00007fba2b759610 CR3: 00000000756a6000 CR4: 00000000000007e0
[  520.884185] Stack:
[  520.884185]  ffff88007a849000 ffffffffa00cde20 ffff880075950be0 ffff880075950ba8
[  520.884185]  ffffffffa00bb63e 0000000000000003 ffff88006e0cfd18 ffffffff813ba137
[  520.884185]  ffff880075950ba8 ffff880075950800 ffff8800755e5a58 0000000000000004
[  520.884185] Call Trace:
[  520.884185]  [<ffffffff813ba137>] kobject_release+0xa7/0x1d0
[  520.884185]  [<ffffffff813ba291>] kobject_put+0x31/0x70
[  520.884185]  [<ffffffffa00223ca>] btrfs_free_block_groups+0x30a/0x420 [btrfs]
[  520.884185]  [<ffffffffa002f939>] close_ctree+0x149/0x2e0 [btrfs]
[  520.884185]  [<ffffffff8119560f>] ? dispose_list+0x4f/0x60
[  520.884185]  [<ffffffff81196544>] ? evict_inodes+0x114/0x130
[  520.884185]  [<ffffffffa00019b9>] btrfs_put_super+0x19/0x20 [btrfs]
[  520.884185]  [<ffffffff8117c78e>] generic_shutdown_super+0x7e/0x110
[  520.884185]  [<ffffffff8117c8b6>] kill_anon_super+0x16/0x30
[  520.884185]  [<ffffffffa000289a>] btrfs_kill_super+0x1a/0xa0 [btrfs]
[  520.884185]  [<ffffffff8117c94d>] deactivate_locked_super+0x4d/0x80
[  520.884185]  [<ffffffff8117cf5a>] deactivate_super+0x4a/0x70
[  520.884185]  [<ffffffff8119a341>] mntput_no_expire+0x111/0x180
[  520.884185]  [<ffffffff8119a247>] ? mntput_no_expire+0x17/0x180
[  520.884185]  [<ffffffff81190213>] ? dput+0x23/0x110
[  520.884185]  [<ffffffff8119b8eb>] SyS_umount+0xcb/0x420
[  520.884185]  [<ffffffff81a16092>] ? system_call_fastpath+0x16/0x1b
[  520.884185]  [<ffffffff81a16092>] system_call_fastpath+0x16/0x1b
[  520.884185] Code: c4 40 74 05 41 8b 74 24 68 4c 89 e7 e8 4b 25 fc ff e9 4f ff ff ff 49 8b 44 24 30 49 8b 14 24 80 e6 80 4c 0f 45 e0 e9 ae fe ff ff <0f> 0b 66 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5
[  520.884185] RIP  [<ffffffff8116efcf>] kfree+0x1bf/0x1d0
[  520.884185]  RSP <ffff88006e0cfcb8>
[  521.242658] ---[ end trace 4228a5dbb43499c0 ]---

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] Btrfs: don't remove raid type sysfs entries until unmount
  2014-05-23 12:38         ` David Sterba
@ 2014-05-23 12:56           ` Jeff Mahoney
  2014-05-23 14:32             ` David Sterba
  0 siblings, 1 reply; 10+ messages in thread
From: Jeff Mahoney @ 2014-05-23 12:56 UTC (permalink / raw)
  To: dsterba, Chris Mason, linux-btrfs

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 5/23/14, 8:38 AM, David Sterba wrote:
> On Thu, May 22, 2014 at 01:17:50PM -0400, Chris Mason wrote:
>> On 05/22/2014 11:05 AM, Jeff Mahoney wrote:
>>> - gpg control packet On 5/22/14, 8:19 AM, Chris Mason wrote:
>>>> Can we safely reinit a kobject that has been put in use in
>>>> sysfs? Given all the things that can hold refs etc is this
>>>> legal?
>>> 
>>> It depends on how the kobject is being used.
>>> 
>>> It wouldn't be safe to re-use the kobject embedded in
>>> space_info since it controls the lifetime of the object,
>>> regardless of its use in sysfs.
>>> 
>>> The kobjects for block groups only exist for creating the 
>>> subdirectories and their lifetime is actually the lifetime of
>>> the space_info. We take a reference to the space_info when we
>>> add them to sysfs because that's where they draw their data.
>>> The only reference to a block group kobject is taken when we
>>> add it to sysfs and is dropped when we remove it. Holding a
>>> sysfs file open doesn't pin the kobject, so once we remove it
>>> from sysfs (kobject_del waits for readers to complete), it's
>>> safe to reinitialize it.
>>> 
>> 
>> Fair enough, once you've tested this new patch a bit I'll drop
>> mine for yours.
> 
> With a fixup in __link_block_group
> 
> -               kobject->name = get_raid_name(index); +
> kobj->name = get_raid_name(index);
> 
> it crashes right away, tests/btrfs/001, patch applied on top of
> 3.15-rc6, I haven't analyzed the cause yet.

Sigh. Yep. kobject_cleanup caches ->name before the release function
and ignores the cleared value. It seems the "free name if we alloced
it" comment in there was leftover from the middle of a patch series
Kay applied in late 2007. Commit 0f4dafc05 adds the comment while
adding a flag to indicate that kobject_set_name_vargs set the name.
The commit af5ca3f4e says kobject names must be dynamic but didn't
update the comment.

Ok, so we can't save the strdup.

- -Jeff


> [  520.880250] ------------[ cut here ]------------ [  520.884185]
> kernel BUG at mm/slub.c:3376! [  520.884185] invalid opcode: 0000
> [#1] SMP DEBUG_PAGEALLOC [  520.884185] Modules linked in:
> rpcsec_gss_krb5 dm_crypt loop btrfs [  520.884185] CPU: 1 PID: 3698
> Comm: umount Not tainted 3.15.0-rc6-default+ #144 [  520.884185]
> Hardware name: Intel Corporation Santa Rosa platform/Matanzas, BIOS
> TSRSCRB1.86C.0047.B00.0610170821 10/17/06 [  520.884185] task:
> ffff880075892370 ti: ffff88006e0ce000 task.ti: ffff88006e0ce000 [
> 520.884185] RIP: 0010:[<ffffffff8116efcf>]  [<ffffffff8116efcf>]
> kfree+0x1bf/0x1d0 [  520.884185] RSP: 0018:ffff88006e0cfcb8
> EFLAGS: 00010246 [  520.884185] RAX: 4000000000000000 RBX:
> ffffffffa00bb63e RCX: 0000000180190007 [  520.884185] RDX:
> 0000000000000000 RSI: ffffea0001d5ad40 RDI: ffffffffa00bb63e [
> 520.884185] RBP: ffff88006e0cfce8 R08: 0000000000000001 R09:
> 0000000000000000 [  520.884185] R10: 0000000000000000 R11:
> 0000000000000001 R12: ffffea0000802ec0 [  520.884185] R13:
> ffffffff813ba137 R14: ffffffffa00bb63e R15: 0000000000000003 [
> 520.884185] FS:  00007fba2c0cd7e0(0000) GS:ffff88007d400000(0000)
> knlGS:0000000000000000 [  520.884185] CS:  0010 DS: 0000 ES: 0000
> CR0: 000000008005003b [  520.884185] CR2: 00007fba2b759610 CR3:
> 00000000756a6000 CR4: 00000000000007e0 [  520.884185] Stack: [
> 520.884185]  ffff88007a849000 ffffffffa00cde20 ffff880075950be0
> ffff880075950ba8 [  520.884185]  ffffffffa00bb63e 0000000000000003
> ffff88006e0cfd18 ffffffff813ba137 [  520.884185]  ffff880075950ba8
> ffff880075950800 ffff8800755e5a58 0000000000000004 [  520.884185]
> Call Trace: [  520.884185]  [<ffffffff813ba137>]
> kobject_release+0xa7/0x1d0 [  520.884185]  [<ffffffff813ba291>]
> kobject_put+0x31/0x70 [  520.884185]  [<ffffffffa00223ca>]
> btrfs_free_block_groups+0x30a/0x420 [btrfs] [  520.884185]
> [<ffffffffa002f939>] close_ctree+0x149/0x2e0 [btrfs] [  520.884185]
> [<ffffffff8119560f>] ? dispose_list+0x4f/0x60 [  520.884185]
> [<ffffffff81196544>] ? evict_inodes+0x114/0x130 [  520.884185]
> [<ffffffffa00019b9>] btrfs_put_super+0x19/0x20 [btrfs] [
> 520.884185]  [<ffffffff8117c78e>]
> generic_shutdown_super+0x7e/0x110 [  520.884185]
> [<ffffffff8117c8b6>] kill_anon_super+0x16/0x30 [  520.884185]
> [<ffffffffa000289a>] btrfs_kill_super+0x1a/0xa0 [btrfs] [
> 520.884185]  [<ffffffff8117c94d>]
> deactivate_locked_super+0x4d/0x80 [  520.884185]
> [<ffffffff8117cf5a>] deactivate_super+0x4a/0x70 [  520.884185]
> [<ffffffff8119a341>] mntput_no_expire+0x111/0x180 [  520.884185]
> [<ffffffff8119a247>] ? mntput_no_expire+0x17/0x180 [  520.884185]
> [<ffffffff81190213>] ? dput+0x23/0x110 [  520.884185]
> [<ffffffff8119b8eb>] SyS_umount+0xcb/0x420 [  520.884185]
> [<ffffffff81a16092>] ? system_call_fastpath+0x16/0x1b [
> 520.884185]  [<ffffffff81a16092>] system_call_fastpath+0x16/0x1b [
> 520.884185] Code: c4 40 74 05 41 8b 74 24 68 4c 89 e7 e8 4b 25 fc
> ff e9 4f ff ff ff 49 8b 44 24 30 49 8b 14 24 80 e6 80 4c 0f 45 e0
> e9 ae fe ff ff <0f> 0b 66 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00
> 55 48 89 e5 [  520.884185] RIP  [<ffffffff8116efcf>]
> kfree+0x1bf/0x1d0 [  520.884185]  RSP <ffff88006e0cfcb8> [
> 521.242658] ---[ end trace 4228a5dbb43499c0 ]---
> 


- -- 
Jeff Mahoney
SUSE Labs
-----BEGIN PGP SIGNATURE-----
Version: GnuPG/MacGPG2 v2.0.22 (Darwin)

iQIcBAEBAgAGBQJTf0VmAAoJEB57S2MheeWyzg4QAL8ZbJ4t9Pky0iE5p9odTbHe
RaZkS6ctraqHtHgmViqTxu78cnbRvxoQ6oOE9ZSCS9YczWkuABFk8EQ0AbTRnblo
wO6awRI0++rHmf1gA+soAtu6TfLOYR43SOFu2H9aMK/YLAGBzFnaLefeFfloj8gb
xVZnBlv3M4syKtcOmrxkCraxz9G9XakwFDI1AGtSTgxBU4tFIE0cyQOtQZf0sAlI
XqphtcoyWMa+T9GOLdFlUpV0p1OMArZhj+/tyMV7UotwHG5Y754kBU4Vd1wCiP/q
p/+lmnlMxVRdREHkGYZLpeG39bQQjMhp8qBgznPBln/R3IQq0nFJxrcPTO3ADH5T
LuLZwl63vW/uetWbVCvDLev27RXmg9aU1RtOGes0bK7HKDx+EymE5CpayADPnmlo
ZLcRol6wol8+R4BJQI9tKffy30PN7Uzhtt79X7EZiVHX5DMGH3aMnOwUihdlgvNd
YBgzwuXFjceoby7y+K1p7CpsDuh2VTBDnQ5e5SiJ3uyLpLeLwvctJ2BmVsEyS076
e4TLQRo+bW9M1y2Pn70B3tjKcQDPnufavk5bvsJqXqXLbpAp2xQDaN0RQrbnXeZG
PU8uknhLvOmdMdxw9wqVPOGpUOIZytDofAVK4QSXGzjCsMFnl3NLqj0tGgUh9DOU
mNH6TpALtxBaY1sJh8DN
=3sYp
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] Btrfs: don't remove raid type sysfs entries until unmount
  2014-05-23 12:56           ` Jeff Mahoney
@ 2014-05-23 14:32             ` David Sterba
  2014-05-23 14:34               ` Jeff Mahoney
  0 siblings, 1 reply; 10+ messages in thread
From: David Sterba @ 2014-05-23 14:32 UTC (permalink / raw)
  To: Jeff Mahoney; +Cc: Chris Mason, linux-btrfs

On Fri, May 23, 2014 at 08:56:06AM -0400, Jeff Mahoney wrote:
> Sigh. Yep. kobject_cleanup caches ->name before the release function
> and ignores the cleared value. It seems the "free name if we alloced
> it" comment in there was leftover from the middle of a patch series
> Kay applied in late 2007. Commit 0f4dafc05 adds the comment while
> adding a flag to indicate that kobject_set_name_vargs set the name.
> The commit af5ca3f4e says kobject names must be dynamic but didn't
> update the comment.
> 
> Ok, so we can't save the strdup.

The tests have been running fine for more than 40 minutes, previously it
took a few minutes to trigger the bug.

my quick fix:

--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -8354,7 +8354,8 @@ static void __link_block_group(struct btrfs_space_info *space_info,
                int ret;

                kobject_get(&space_info->kobj); /* put in release */
-               kobj->name = get_raid_name(index);
+               kobj->name = kstrdup(get_raid_name(index), GFP_ATOMIC);
+               BUG_ON(!kobj->name);
                ret = kobject_init_and_add(kobj, &btrfs_raid_ktype,
                                           &space_info->kobj, NULL);
                if (ret) {
diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index d742d7973d3e..ac26f91968a7 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -288,7 +288,7 @@ static struct attribute *raid_attributes[] = {

 static void release_raid_kobj(struct kobject *kobj)
 {
-       kobj->name = NULL;
+       /* kobj->name = NULL; */
        kobject_put(kobj->parent);
 }


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] Btrfs: don't remove raid type sysfs entries until unmount
  2014-05-23 14:32             ` David Sterba
@ 2014-05-23 14:34               ` Jeff Mahoney
  2014-05-23 18:09                 ` Jeff Mahoney
  0 siblings, 1 reply; 10+ messages in thread
From: Jeff Mahoney @ 2014-05-23 14:34 UTC (permalink / raw)
  To: dsterba, Chris Mason, linux-btrfs

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 5/23/14, 10:32 AM, David Sterba wrote:
> On Fri, May 23, 2014 at 08:56:06AM -0400, Jeff Mahoney wrote:
>> Sigh. Yep. kobject_cleanup caches ->name before the release
>> function and ignores the cleared value. It seems the "free name
>> if we alloced it" comment in there was leftover from the middle
>> of a patch series Kay applied in late 2007. Commit 0f4dafc05 adds
>> the comment while adding a flag to indicate that
>> kobject_set_name_vargs set the name. The commit af5ca3f4e says
>> kobject names must be dynamic but didn't update the comment.
>> 
>> Ok, so we can't save the strdup.
> 
> The tests have been running fine for more than 40 minutes,
> previously it took a few minutes to trigger the bug.
> 
> my quick fix:
> 
> --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@
> -8354,7 +8354,8 @@ static void __link_block_group(struct
> btrfs_space_info *space_info, int ret;
> 
> kobject_get(&space_info->kobj); /* put in release */ -
> kobj->name = get_raid_name(index); +               kobj->name =
> kstrdup(get_raid_name(index), GFP_ATOMIC); +
> BUG_ON(!kobj->name); ret = kobject_init_and_add(kobj,
> &btrfs_raid_ktype, &space_info->kobj, NULL); if (ret) { diff --git
> a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c index
> d742d7973d3e..ac26f91968a7 100644 --- a/fs/btrfs/sysfs.c +++
> b/fs/btrfs/sysfs.c @@ -288,7 +288,7 @@ static struct attribute
> *raid_attributes[] = {
> 
> static void release_raid_kobj(struct kobject *kobj) { -
> kobj->name = NULL; +       /* kobj->name = NULL; */ 
> kobject_put(kobj->parent); }
> 

I'm testing as well. I'm seeing:
kobject (ffff88033f847e78): tried to init an initialized object,
something is seriously wrong.

.. as well. A memset would clear this up, but perhaps that's not the
right solution after all.

- -Jeff

- -- 
Jeff Mahoney
SUSE Labs
-----BEGIN PGP SIGNATURE-----
Version: GnuPG/MacGPG2 v2.0.22 (Darwin)

iQIcBAEBAgAGBQJTf1yEAAoJEB57S2MheeWyZCkQAK09TN//RPDTlTidMwg76LOh
As1Q+dHPvCnIa3nvlI4XpCtVaIdOHiaDzGLHOSA/Dfq3PpNvETg/9R5eqcnMEXN+
W74Wr+ZJ5iwSq5jeMP4p73Ph+Dwzmc6OVr7mD4+CqVkdMuPBzWBl+aKA6d1UEWFb
2J/nUTS4EQz9I7O6UBgXt0pkZXavwp0gqqpHn4YV848gAnNyz6qhHe3nXLS/3XCW
LX3WUS1HROoD2fTzKMSxlw1xHiHFZ2XItOMgP9Ma1pr/cQA1InGxTCIVshtkx8Hq
uxgSorK/6H0iRsCVRuXX6JowYodX3UJ9x3VUQQbpI5sm5X6OlPzcwS9C4MqjjVNn
3Qqx7CJk79+CZBcBYMdK2borjrmCzQeyGQb49cH4eVNKUYLL2ja/ygoFq7aKOMgc
dTuZbcwQc43uVm768fAxTYEDqJEdHI+LUKoK05XDEm6SbaDt81T2to0wxyJjpiE3
rbNmS1ubjE+WvoZzyYREd2WRUqZsuJV27017CGkJSHqj4OjfG3lvhkrVtM5Tu+jw
8f+KlgAHHN674mRXqUDU4a8Twc7aC5Y36v5mscXAIm2wQKh5EKdPAxedUOqHmMT9
t3WJG8ErmitEuZ8Hb4FzIeL1U3ztnULg+K7vYLCuQS1/C4p3xXU6jtOgPngIuGmj
BGMMRo82DAEc+pxHJI7w
=RN4K
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] Btrfs: don't remove raid type sysfs entries until unmount
  2014-05-23 14:34               ` Jeff Mahoney
@ 2014-05-23 18:09                 ` Jeff Mahoney
  0 siblings, 0 replies; 10+ messages in thread
From: Jeff Mahoney @ 2014-05-23 18:09 UTC (permalink / raw)
  To: dsterba, Chris Mason, linux-btrfs

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 5/23/14, 10:34 AM, Jeff Mahoney wrote:
> On 5/23/14, 10:32 AM, David Sterba wrote:
>> On Fri, May 23, 2014 at 08:56:06AM -0400, Jeff Mahoney wrote:
>>> Sigh. Yep. kobject_cleanup caches ->name before the release 
>>> function and ignores the cleared value. It seems the "free
>>> name if we alloced it" comment in there was leftover from the
>>> middle of a patch series Kay applied in late 2007. Commit
>>> 0f4dafc05 adds the comment while adding a flag to indicate
>>> that kobject_set_name_vargs set the name. The commit af5ca3f4e
>>> says kobject names must be dynamic but didn't update the
>>> comment.
>>> 
>>> Ok, so we can't save the strdup.
> 
>> The tests have been running fine for more than 40 minutes, 
>> previously it took a few minutes to trigger the bug.
> 
>> my quick fix:
> 
>> --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ 
>> -8354,7 +8354,8 @@ static void __link_block_group(struct 
>> btrfs_space_info *space_info, int ret;
> 
>> kobject_get(&space_info->kobj); /* put in release */ - kobj->name
>> = get_raid_name(index); +               kobj->name = 
>> kstrdup(get_raid_name(index), GFP_ATOMIC); + BUG_ON(!kobj->name);
>> ret = kobject_init_and_add(kobj, &btrfs_raid_ktype,
>> &space_info->kobj, NULL); if (ret) { diff --git 
>> a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c index 
>> d742d7973d3e..ac26f91968a7 100644 --- a/fs/btrfs/sysfs.c +++ 
>> b/fs/btrfs/sysfs.c @@ -288,7 +288,7 @@ static struct attribute 
>> *raid_attributes[] = {
> 
>> static void release_raid_kobj(struct kobject *kobj) { - 
>> kobj->name = NULL; +       /* kobj->name = NULL; */ 
>> kobject_put(kobj->parent); }
> 
> 
> I'm testing as well. I'm seeing: kobject (ffff88033f847e78): tried
> to init an initialized object, something is seriously wrong.
> 
> .. as well. A memset would clear this up, but perhaps that's not
> the right solution after all.

For background, the reason the kobjects are statically allocated is
because they cover all of the block group caches for that raid type
and it seemed silly to have a special structure that only contains a
kobject and an int. So it uses its index in that array to determine
which raid type it covers. Putting the kobject, list head, and type in
a single structure ends up introducing more complexity for little gain
since we end up with that lockdep alloc during reclaim problem again
(or end up allocating every time and freeing if it's not needed.) In
retrospect, we waste less memory allocating a special structure
on-demand anyway and it eliminates the questions.

- -Jeff

- -- 
Jeff Mahoney
SUSE Labs
-----BEGIN PGP SIGNATURE-----
Version: GnuPG/MacGPG2 v2.0.22 (Darwin)

iQIcBAEBAgAGBQJTf47SAAoJEB57S2MheeWyJrAP/RLfyy90y+6pARBi1bseMW0d
maQcNk6DkWYt4DYenMe+lvJJf+RPBotsFaDzq1Eblq8DDi1r7hb2wZvpS0AaCbCg
hRLAWXpRi4cXLfjt5fQp+chymE6QUc3PYCFa9DvqcCjKBv3Wm2ArVAUbL4V1mucJ
klLwI1O2Y8i1EbBifPIHzClK6xeiAYhe7yOs+CoH9Q2oZA+vuHtAuNyUTOMXS0sY
88/2o/cnqX762fxT6J8DPPMqKiu3YwoiNGOW2gEpHyujKmY66BnfHsrx07lR3atj
eOD5VmUsHKYWvEWqbmGoCTHmT3wJbaWw1tGlcOIS2LVdbq538HUzsEk3ENhB9kfU
ANz7b4X7iGSzAeLdkLzMKGt5ApAqqEOGO5BBtcQhmPaMtLXNXgEraSjEkd4dO1Xs
lyte4PQ+FICI/hKwOVcZPt/64mC1ABfmdnAwBx4zSECsWv7HuXTtVAX5MhZX1vGQ
l9sdkwq1nfGhcY4Zf/KIF8Q2ljRH/k1I9XCdFJpyUSnk5NBvV7qLBw+9C8rQzirs
J5DTYWDAq1h1W0vn3JRzPi9AmO7DGPBeJIyGlR8JAxxjw29PX9j0mzB6T2Hc3zZe
jzIzjBnBdJRc2fiFIJXkKc1QpdJ6J6DKQaMGv6Yz58TYu3/juAKRD2C9DuKvPWWb
FvK4VPhhsCQhZzJvQsko
=MwZr
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2014-05-23 18:09 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-05-22  0:12 [PATCH] Btrfs: don't remove raid type sysfs entries until unmount Chris Mason
2014-05-22  1:21 ` Jeff Mahoney
2014-05-22 12:19   ` Chris Mason
2014-05-22 15:05     ` Jeff Mahoney
2014-05-22 17:17       ` Chris Mason
2014-05-23 12:38         ` David Sterba
2014-05-23 12:56           ` Jeff Mahoney
2014-05-23 14:32             ` David Sterba
2014-05-23 14:34               ` Jeff Mahoney
2014-05-23 18:09                 ` Jeff Mahoney

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).