All of lore.kernel.org
 help / color / mirror / Atom feed
* ext3/jbd oops in journal_start
@ 2009-10-31  6:14 Sage Weil
  2009-10-31  8:18 ` Dmitry Monakhov
  0 siblings, 1 reply; 7+ messages in thread
From: Sage Weil @ 2009-10-31  6:14 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel

Hi,

I'm consistently seeing ext3 oops on a fresh ~60 GB fs on 2.6.32-rc3 (and 
2.6.31).  data=writeback or data=ordered.  It's not the hardware or 
drive... I have 8 boxes (each with slightly different hardware) that crash 
identically.

The oops is at fs/jbd/transaction.c, journal_start():

		J_ASSERT(handle->h_transaction->t_journal == journal);

because handle->h_transaction is 0x1bf (or some other value close to 
that).  I can trigger on the 10th or so call to journal_start after 
mounting.

Has anyone seen this before?  I feel like I must be doing something silly 
here, since I can't find any references to this particular crash, but I'm 
having no problem triggering it right away, even after a fresh mke2fs 
-j...

Any suggestions on where to look or should I just start testing older 
kernel versions and bisect?

sage


[   83.550657] handle->h_transaction 00000000000001bf
[   83.555564] BUG: unable to handle kernel NULL pointer dereference at 00000000000001bf
[   83.559531] IP: [<ffffffff8118793c>] journal_start+0x87/0x184
[   83.559531] PGD 10e351067 PUD 10e1cb067 PMD 0 
[   83.559531] Oops: 0000 [#1] PREEMPT SMP 
[   83.559531] last sysfs file: /sys/class/net/lo/operstate
[   83.559531] CPU 1 
[   83.559531] Modules linked in: btrfs zlib_deflate fan ac battery 
ide_pci_generic shpchp k8temp serio_raw psmouse pcspkr ehci_hcd 
serverworks processor ohci_hcd pci_hotplug thermal button
[   83.559531] Pid: 2849, comm: cosd Not tainted 2.6.32-rc5 #7 H8SSL-I2
[   83.559531] RIP: 0010:[<ffffffff8118793c>]  [<ffffffff8118793c>] journal_start+0x87/0x184
[   83.559531] RSP: 0018:ffff88010e335b28  EFLAGS: 00010292
[   83.559531] RAX: 00000000000001bf RBX: ffff88010eeee4e0 RCX: 000000000000ad01
[   83.559531] RDX: ffff88002f400000 RSI: 0000000000000001 RDI: ffffffff81610214
[   83.559531] RBP: ffff88010e335b58 R08: ffff88010e3359d7 R09: 0000000000000000
[   83.559531] R10: ffffffff8106314b R11: ffff88010e335908 R12: ffff88010eeee4e0
[   83.559531] R13: ffff88010e17a200 R14: ffff88010f535800 R15: 000000000000000b
[   83.559531] FS:  00007fe3bce8b6f0(0000) GS:ffff88002f400000(0000) knlGS:0000000000000000
[   83.559531] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[   83.559531] CR2: 00000000000001bf CR3: 0000000110223000 CR4: 00000000000006e0
[   83.559531] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   83.559531] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[   83.559531] Process cosd (pid: 2849, threadinfo ffff88010e334000, task ffff88010e17a200)
[   83.559531] Stack:
[   83.559531]  ffff88010e335b58 ffffffff814cbb10 ffffea0006cf6038 ffff88010eeea888
[   83.559531] <0> 0000000000000000 00000000000005f4 ffff88010e335b68 ffffffff811443b3
[   83.559531] <0> ffff88010e335c08 ffffffff8113c347 ffff88010e335ca8 ffffffff81070369
[   83.559531] Call Trace:
[   83.559531]  [<ffffffff811443b3>] ext3_journal_start_sb+0x4a/0x4c
[   83.559531]  [<ffffffff8113c347>] ext3_write_begin+0x9c/0x1e2
[   83.559531]  [<ffffffff81070369>] ? __lock_acquire+0x17d8/0x17ea
[   83.559531]  [<ffffffff810a5021>] generic_file_buffered_write+0x120/0x2a5
[   83.559531]  [<ffffffff810a564d>] __generic_file_aio_write+0x34f/0x383
[   83.559531]  [<ffffffff810a56e4>] generic_file_aio_write+0x63/0xaa
[   83.559531]  [<ffffffff810d98b2>] do_sync_write+0xe7/0x12d
[   83.559531]  [<ffffffff8105f368>] ? autoremove_wake_function+0x0/0x38
[   83.559531]  [<ffffffff8106a7fc>] ? put_lock_stats+0xe/0x27
[   83.559531]  [<ffffffff8125752c>] ? security_file_permission+0x11/0x13
[   83.559531]  [<ffffffff810da240>] vfs_write+0xae/0x14a
[   83.559531]  [<ffffffff810da3a0>] sys_write+0x47/0x6e
[   83.559531]  [<ffffffff8100baab>] system_call_fastpath+0x16/0x1b
[   83.559531] Code: 89 de 48 c7 c7 e9 01 61 81 31 c0 e8 71 f6 31 00 48 8b 
33 48 c7 c7 f7 01 61 81 31 c0 e8 60 f6 31 00 48 8b 03 48 c7 c7 14 02 61 81 
<48> 8b 30 31 c0 e8 4c f6 31 00 48 8b 03 48 8b 30 4c 39 f6 74 11 
[   83.559531] RIP  [<ffffffff8118793c>] journal_start+0x87/0x184
[   83.559531]  RSP <ffff88010e335b28>
[   83.559531] CR2: 00000000000001bf
[   83.847504] ---[ end trace 450f151cbabc2177 ]---


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: ext3/jbd oops in journal_start
  2009-10-31  6:14 ext3/jbd oops in journal_start Sage Weil
@ 2009-10-31  8:18 ` Dmitry Monakhov
  2009-11-03  6:06   ` Sage Weil
  0 siblings, 1 reply; 7+ messages in thread
From: Dmitry Monakhov @ 2009-10-31  8:18 UTC (permalink / raw)
  To: Sage Weil; +Cc: linux-kernel, linux-fsdevel

Sage Weil <sage@newdream.net> writes:

> Hi,
>
> I'm consistently seeing ext3 oops on a fresh ~60 GB fs on 2.6.32-rc3 (and 
> 2.6.31).  data=writeback or data=ordered.  It's not the hardware or 
> drive... I have 8 boxes (each with slightly different hardware) that crash 
> identically.
Strange, 2.6.31 with ext3 is quite popular configuration...
Can you please post exact test-case.
>
> The oops is at fs/jbd/transaction.c, journal_start():
>
> 		J_ASSERT(handle->h_transaction->t_journal == journal);
*handle = journal_current_handle()

IMHO it's looks like you have entered here with current->journal_info != NULL

, but journal_info contains unexpected data
This may happens in two cases:
1) calling jbd code from other filesystem.
2) Some fs forget to zero current->journal_info on exit from vfs
According to call trace we have got second case. Do you use some 
unusual/experimental fs?
>
> because handle->h_transaction is 0x1bf (or some other value close to 
> that).  I can trigger on the 10th or so call to journal_start after 
> mounting.
>
> Has anyone seen this before?  I feel like I must be doing something silly 
> here, since I can't find any references to this particular crash, but I'm 
> having no problem triggering it right away, even after a fresh mke2fs 
> -j...
>
> Any suggestions on where to look or should I just start testing older 
> kernel versions and bisect?
>
> sage
>
>
> [   83.550657] handle->h_transaction 00000000000001bf
> [   83.555564] BUG: unable to handle kernel NULL pointer dereference at 00000000000001bf
> [   83.559531] IP: [<ffffffff8118793c>] journal_start+0x87/0x184
> [   83.559531] PGD 10e351067 PUD 10e1cb067 PMD 0 
> [   83.559531] Oops: 0000 [#1] PREEMPT SMP 
> [   83.559531] last sysfs file: /sys/class/net/lo/operstate
> [   83.559531] CPU 1 
> [   83.559531] Modules linked in: btrfs zlib_deflate fan ac battery 
> ide_pci_generic shpchp k8temp serio_raw psmouse pcspkr ehci_hcd 
> serverworks processor ohci_hcd pci_hotplug thermal button
> [   83.559531] Pid: 2849, comm: cosd Not tainted 2.6.32-rc5 #7 H8SSL-I2
> [   83.559531] RIP: 0010:[<ffffffff8118793c>]  [<ffffffff8118793c>] journal_start+0x87/0x184
> [   83.559531] RSP: 0018:ffff88010e335b28  EFLAGS: 00010292
> [   83.559531] RAX: 00000000000001bf RBX: ffff88010eeee4e0 RCX: 000000000000ad01
> [   83.559531] RDX: ffff88002f400000 RSI: 0000000000000001 RDI: ffffffff81610214
> [   83.559531] RBP: ffff88010e335b58 R08: ffff88010e3359d7 R09: 0000000000000000
> [   83.559531] R10: ffffffff8106314b R11: ffff88010e335908 R12: ffff88010eeee4e0
> [   83.559531] R13: ffff88010e17a200 R14: ffff88010f535800 R15: 000000000000000b
> [   83.559531] FS:  00007fe3bce8b6f0(0000) GS:ffff88002f400000(0000) knlGS:0000000000000000
> [   83.559531] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [   83.559531] CR2: 00000000000001bf CR3: 0000000110223000 CR4: 00000000000006e0
> [   83.559531] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [   83.559531] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [   83.559531] Process cosd (pid: 2849, threadinfo ffff88010e334000, task ffff88010e17a200)
> [   83.559531] Stack:
> [   83.559531]  ffff88010e335b58 ffffffff814cbb10 ffffea0006cf6038 ffff88010eeea888
> [   83.559531] <0> 0000000000000000 00000000000005f4 ffff88010e335b68 ffffffff811443b3
> [   83.559531] <0> ffff88010e335c08 ffffffff8113c347 ffff88010e335ca8 ffffffff81070369
> [   83.559531] Call Trace:
> [   83.559531]  [<ffffffff811443b3>] ext3_journal_start_sb+0x4a/0x4c
> [   83.559531]  [<ffffffff8113c347>] ext3_write_begin+0x9c/0x1e2
> [   83.559531]  [<ffffffff81070369>] ? __lock_acquire+0x17d8/0x17ea
> [   83.559531]  [<ffffffff810a5021>] generic_file_buffered_write+0x120/0x2a5
> [   83.559531]  [<ffffffff810a564d>] __generic_file_aio_write+0x34f/0x383
> [   83.559531]  [<ffffffff810a56e4>] generic_file_aio_write+0x63/0xaa
> [   83.559531]  [<ffffffff810d98b2>] do_sync_write+0xe7/0x12d
> [   83.559531]  [<ffffffff8105f368>] ? autoremove_wake_function+0x0/0x38
> [   83.559531]  [<ffffffff8106a7fc>] ? put_lock_stats+0xe/0x27
> [   83.559531]  [<ffffffff8125752c>] ? security_file_permission+0x11/0x13
> [   83.559531]  [<ffffffff810da240>] vfs_write+0xae/0x14a
> [   83.559531]  [<ffffffff810da3a0>] sys_write+0x47/0x6e
> [   83.559531]  [<ffffffff8100baab>] system_call_fastpath+0x16/0x1b
> [   83.559531] Code: 89 de 48 c7 c7 e9 01 61 81 31 c0 e8 71 f6 31 00 48 8b 
> 33 48 c7 c7 f7 01 61 81 31 c0 e8 60 f6 31 00 48 8b 03 48 c7 c7 14 02 61 81 
> <48> 8b 30 31 c0 e8 4c f6 31 00 48 8b 03 48 8b 30 4c 39 f6 74 11 
> [   83.559531] RIP  [<ffffffff8118793c>] journal_start+0x87/0x184
> [   83.559531]  RSP <ffff88010e335b28>
> [   83.559531] CR2: 00000000000001bf
> [   83.847504] ---[ end trace 450f151cbabc2177 ]---
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: ext3/jbd oops in journal_start
  2009-10-31  8:18 ` Dmitry Monakhov
@ 2009-11-03  6:06   ` Sage Weil
  2009-11-03 10:01       ` Dmitry Monakhov
  2009-11-03 15:26     ` Chris Mason
  0 siblings, 2 replies; 7+ messages in thread
From: Sage Weil @ 2009-11-03  6:06 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Dmitry Monakhov, linux-kernel, linux-fsdevel

On Sat, 31 Oct 2009, Dmitry Monakhov wrote:

> Sage Weil <sage@newdream.net> writes:
> 
> > Hi,
> >
> > I'm consistently seeing ext3 oops on a fresh ~60 GB fs on 2.6.32-rc3 (and 
> > 2.6.31).  data=writeback or data=ordered.  It's not the hardware or 
> > drive... I have 8 boxes (each with slightly different hardware) that crash 
> > identically.
> Strange, 2.6.31 with ext3 is quite popular configuration...
> Can you please post exact test-case.
> >
> > The oops is at fs/jbd/transaction.c, journal_start():
> >
> > 		J_ASSERT(handle->h_transaction->t_journal == journal);
> *handle = journal_current_handle()
> 
> IMHO it's looks like you have entered here with current->journal_info != NULL
> 
> , but journal_info contains unexpected data
> This may happens in two cases:
> 1) calling jbd code from other filesystem.
> 2) Some fs forget to zero current->journal_info on exit from vfs
> According to call trace we have got second case. Do you use some 
> unusual/experimental fs?

Yep, it was #2.  It turns out btrfs s setting current->journal_info 
(for no reason that I can see?), and with the transaction ioctl a 
transaction can span multiple calls.

Chris, is it ok to just remove the journal_info bits?  Nothing in fs/btrfs 
even looks at it.  I'm not sure what the point of only conditionally 
setting/clearly journal_info would be either, unless it's for debugging or 
something?

Thanks-
sage

---
From: Sage Weil <sage@newdream.net>
Date: Mon, 2 Nov 2009 14:21:29 -0800
Subject: [PATCH] Btrfs: don't set current->journal_info

Btrfs doesn't use current->journal_info for anything, so don't set it.  
We currently cause a NULL dereference in jbd if a process starts a btrfs 
user transaction and then touches another mounted fs that uses jbd, since 
current->journal_info is only supposed to be set for the duration of a 
single call into the fs.

Signed-off-by: Sage Weil <sage@newdream.net>
---
 fs/btrfs/transaction.c |    8 --------
 1 files changed, 0 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index bca82a4..c6dbbb8 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -186,9 +186,6 @@ static struct btrfs_trans_handle *start_transaction(struct btrfs_root *root,
 	h->alloc_exclude_start = 0;
 	h->delayed_ref_updates = 0;
 
-	if (!current->journal_info)
-		current->journal_info = h;
-
 	root->fs_info->running_transaction->use_count++;
 	record_root_in_trans(h, root);
 	mutex_unlock(&root->fs_info->trans_mutex);
@@ -321,8 +318,6 @@ static int __btrfs_end_transaction(struct btrfs_trans_handle *trans,
 	put_transaction(cur_trans);
 	mutex_unlock(&info->trans_mutex);
 
-	if (current->journal_info == trans)
-		current->journal_info = NULL;
 	memset(trans, 0, sizeof(*trans));
 	kmem_cache_free(btrfs_trans_handle_cachep, trans);
 
@@ -1105,9 +1100,6 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans,
 
 	mutex_unlock(&root->fs_info->trans_mutex);
 
-	if (current->journal_info == trans)
-		current->journal_info = NULL;
-
 	kmem_cache_free(btrfs_trans_handle_cachep, trans);
 	return ret;
 }
-- 
1.5.6.5


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: ext3/jbd oops in journal_start
  2009-11-03  6:06   ` Sage Weil
  2009-11-03 10:01       ` Dmitry Monakhov
@ 2009-11-03 10:01       ` Dmitry Monakhov
  1 sibling, 0 replies; 7+ messages in thread
From: Dmitry Monakhov @ 2009-11-03 10:01 UTC (permalink / raw)
  To: Sage Weil; +Cc: linux-btrfs, linux-kernel, linux-fsdevel

2009/11/3 Sage Weil <sage@newdream.net>:
> On Sat, 31 Oct 2009, Dmitry Monakhov wrote:
>
>> Sage Weil <sage@newdream.net> writes:
>>
>> > Hi,
>> >
>> > I'm consistently seeing ext3 oops on a fresh ~60 GB fs on 2.6.32-r=
c3 (and
>> > 2.6.31). =C2=A0data=3Dwriteback or data=3Dordered. =C2=A0It's not =
the hardware or
>> > drive... I have 8 boxes (each with slightly different hardware) th=
at crash
>> > identically.
>> Strange, 2.6.31 with ext3 is quite popular configuration...
>> Can you please post exact test-case.
>> >
>> > The oops is at fs/jbd/transaction.c, journal_start():
>> >
>> > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 J_ASSERT(handle->h_trans=
action->t_journal =3D=3D journal);
>> *handle =3D journal_current_handle()
>>
>> IMHO it's looks like you have entered here with current->journal_inf=
o !=3D NULL
>>
>> , but journal_info contains unexpected data
>> This may happens in two cases:
>> 1) calling jbd code from other filesystem.
>> 2) Some fs forget to zero current->journal_info on exit from vfs
>> According to call trace we have got second case. Do you use some
>> unusual/experimental fs?
>
> Yep, it was #2. =C2=A0It turns out btrfs s setting current->journal_i=
nfo
> (for no reason that I can see?), and with the transaction ioctl a
> transaction can span multiple calls.
Whait a minute. Thats ioctl totally wrong! How it can prevent from prob=
lems?
issues:
1) If we are using this thechnic to preserving atomic behaviour, but
process dies unexpectedly in the midle of complex
operation which  must be done attomically. Then we call trans_end
which result in committing some internal file state.

2) What happens when second process try to touch file with opened
transaction? what behavuour expected from fsync?
process 1                                    process 2
1 fd =3D open("./file)                         open("./file")
2 ioctl_start_trans
3 write(fd, "a", 1)                           write(fd,"b", 1)
4                                                 fsync
5
What data do you expect after state (4), and after state (5)
>
> Chris, is it ok to just remove the journal_info bits? =C2=A0Nothing i=
n fs/btrfs
> even looks at it. =C2=A0I'm not sure what the point of only condition=
ally
> setting/clearly journal_info would be either, unless it's for debuggi=
ng or
> something?
>
> Thanks-
> sage
>
> ---
> From: Sage Weil <sage@newdream.net>
> Date: Mon, 2 Nov 2009 14:21:29 -0800
> Subject: [PATCH] Btrfs: don't set current->journal_info
>
> Btrfs doesn't use current->journal_info for anything, so don't set it=
=2E
> We currently cause a NULL dereference in jbd if a process starts a bt=
rfs
> user transaction and then touches another mounted fs that uses jbd, s=
ince
> current->journal_info is only supposed to be set for the duration of =
a
> single call into the fs.

NAK The root of bad thing is ioctl patch.
If you really want to design external(from kernel point of view)
atomic interface then you have to use another
locking mechanism not transaction. Some thing like lock_flag. And it
must be stored on inode in order to provide
mutal exclusion for concurent task. Otherwise this mechanism is useless=
=2E
This patch just hide problem under a carpet. Actually setting
current->journal_info
is a good way to catch  fs to fs switching.
>
> Signed-off-by: Sage Weil <sage@newdream.net>
> ---
> =C2=A0fs/btrfs/transaction.c | =C2=A0 =C2=A08 --------
> =C2=A01 files changed, 0 insertions(+), 8 deletions(-)
>
> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
> index bca82a4..c6dbbb8 100644
> --- a/fs/btrfs/transaction.c
> +++ b/fs/btrfs/transaction.c
> @@ -186,9 +186,6 @@ static struct btrfs_trans_handle *start_transacti=
on(struct btrfs_root *root,
> =C2=A0 =C2=A0 =C2=A0 =C2=A0h->alloc_exclude_start =3D 0;
> =C2=A0 =C2=A0 =C2=A0 =C2=A0h->delayed_ref_updates =3D 0;
>
> - =C2=A0 =C2=A0 =C2=A0 if (!current->journal_info)
> - =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 current->journal_i=
nfo =3D h;
> -
> =C2=A0 =C2=A0 =C2=A0 =C2=A0root->fs_info->running_transaction->use_co=
unt++;
> =C2=A0 =C2=A0 =C2=A0 =C2=A0record_root_in_trans(h, root);
> =C2=A0 =C2=A0 =C2=A0 =C2=A0mutex_unlock(&root->fs_info->trans_mutex);
> @@ -321,8 +318,6 @@ static int __btrfs_end_transaction(struct btrfs_t=
rans_handle *trans,
> =C2=A0 =C2=A0 =C2=A0 =C2=A0put_transaction(cur_trans);
> =C2=A0 =C2=A0 =C2=A0 =C2=A0mutex_unlock(&info->trans_mutex);
>
> - =C2=A0 =C2=A0 =C2=A0 if (current->journal_info =3D=3D trans)
> - =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 current->journal_i=
nfo =3D NULL;
> =C2=A0 =C2=A0 =C2=A0 =C2=A0memset(trans, 0, sizeof(*trans));
> =C2=A0 =C2=A0 =C2=A0 =C2=A0kmem_cache_free(btrfs_trans_handle_cachep,=
 trans);
>
> @@ -1105,9 +1100,6 @@ int btrfs_commit_transaction(struct btrfs_trans=
_handle *trans,
>
> =C2=A0 =C2=A0 =C2=A0 =C2=A0mutex_unlock(&root->fs_info->trans_mutex);
>
> - =C2=A0 =C2=A0 =C2=A0 if (current->journal_info =3D=3D trans)
> - =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 current->journal_i=
nfo =3D NULL;
> -
> =C2=A0 =C2=A0 =C2=A0 =C2=A0kmem_cache_free(btrfs_trans_handle_cachep,=
 trans);
> =C2=A0 =C2=A0 =C2=A0 =C2=A0return ret;
> =C2=A0}
> --
> 1.5.6.5
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel=
" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: ext3/jbd oops in journal_start
@ 2009-11-03 10:01       ` Dmitry Monakhov
  0 siblings, 0 replies; 7+ messages in thread
From: Dmitry Monakhov @ 2009-11-03 10:01 UTC (permalink / raw)
  To: Sage Weil; +Cc: linux-btrfs, linux-kernel, linux-fsdevel

2009/11/3 Sage Weil <sage@newdream.net>:
> On Sat, 31 Oct 2009, Dmitry Monakhov wrote:
>
>> Sage Weil <sage@newdream.net> writes:
>>
>> > Hi,
>> >
>> > I'm consistently seeing ext3 oops on a fresh ~60 GB fs on 2.6.32-rc3 (and
>> > 2.6.31).  data=writeback or data=ordered.  It's not the hardware or
>> > drive... I have 8 boxes (each with slightly different hardware) that crash
>> > identically.
>> Strange, 2.6.31 with ext3 is quite popular configuration...
>> Can you please post exact test-case.
>> >
>> > The oops is at fs/jbd/transaction.c, journal_start():
>> >
>> >             J_ASSERT(handle->h_transaction->t_journal == journal);
>> *handle = journal_current_handle()
>>
>> IMHO it's looks like you have entered here with current->journal_info != NULL
>>
>> , but journal_info contains unexpected data
>> This may happens in two cases:
>> 1) calling jbd code from other filesystem.
>> 2) Some fs forget to zero current->journal_info on exit from vfs
>> According to call trace we have got second case. Do you use some
>> unusual/experimental fs?
>
> Yep, it was #2.  It turns out btrfs s setting current->journal_info
> (for no reason that I can see?), and with the transaction ioctl a
> transaction can span multiple calls.
Whait a minute. Thats ioctl totally wrong! How it can prevent from problems?
issues:
1) If we are using this thechnic to preserving atomic behaviour, but
process dies unexpectedly in the midle of complex
operation which  must be done attomically. Then we call trans_end
which result in committing some internal file state.

2) What happens when second process try to touch file with opened
transaction? what behavuour expected from fsync?
process 1                                    process 2
1 fd = open("./file)                         open("./file")
2 ioctl_start_trans
3 write(fd, "a", 1)                           write(fd,"b", 1)
4                                                 fsync
5
What data do you expect after state (4), and after state (5)
>
> Chris, is it ok to just remove the journal_info bits?  Nothing in fs/btrfs
> even looks at it.  I'm not sure what the point of only conditionally
> setting/clearly journal_info would be either, unless it's for debugging or
> something?
>
> Thanks-
> sage
>
> ---
> From: Sage Weil <sage@newdream.net>
> Date: Mon, 2 Nov 2009 14:21:29 -0800
> Subject: [PATCH] Btrfs: don't set current->journal_info
>
> Btrfs doesn't use current->journal_info for anything, so don't set it.
> We currently cause a NULL dereference in jbd if a process starts a btrfs
> user transaction and then touches another mounted fs that uses jbd, since
> current->journal_info is only supposed to be set for the duration of a
> single call into the fs.

NAK The root of bad thing is ioctl patch.
If you really want to design external(from kernel point of view)
atomic interface then you have to use another
locking mechanism not transaction. Some thing like lock_flag. And it
must be stored on inode in order to provide
mutal exclusion for concurent task. Otherwise this mechanism is useless.
This patch just hide problem under a carpet. Actually setting
current->journal_info
is a good way to catch  fs to fs switching.
>
> Signed-off-by: Sage Weil <sage@newdream.net>
> ---
>  fs/btrfs/transaction.c |    8 --------
>  1 files changed, 0 insertions(+), 8 deletions(-)
>
> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
> index bca82a4..c6dbbb8 100644
> --- a/fs/btrfs/transaction.c
> +++ b/fs/btrfs/transaction.c
> @@ -186,9 +186,6 @@ static struct btrfs_trans_handle *start_transaction(struct btrfs_root *root,
>        h->alloc_exclude_start = 0;
>        h->delayed_ref_updates = 0;
>
> -       if (!current->journal_info)
> -               current->journal_info = h;
> -
>        root->fs_info->running_transaction->use_count++;
>        record_root_in_trans(h, root);
>        mutex_unlock(&root->fs_info->trans_mutex);
> @@ -321,8 +318,6 @@ static int __btrfs_end_transaction(struct btrfs_trans_handle *trans,
>        put_transaction(cur_trans);
>        mutex_unlock(&info->trans_mutex);
>
> -       if (current->journal_info == trans)
> -               current->journal_info = NULL;
>        memset(trans, 0, sizeof(*trans));
>        kmem_cache_free(btrfs_trans_handle_cachep, trans);
>
> @@ -1105,9 +1100,6 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans,
>
>        mutex_unlock(&root->fs_info->trans_mutex);
>
> -       if (current->journal_info == trans)
> -               current->journal_info = NULL;
> -
>        kmem_cache_free(btrfs_trans_handle_cachep, trans);
>        return ret;
>  }
> --
> 1.5.6.5
>
>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: ext3/jbd oops in journal_start
@ 2009-11-03 10:01       ` Dmitry Monakhov
  0 siblings, 0 replies; 7+ messages in thread
From: Dmitry Monakhov @ 2009-11-03 10:01 UTC (permalink / raw)
  To: Sage Weil; +Cc: linux-btrfs, linux-kernel, linux-fsdevel

2009/11/3 Sage Weil <sage@newdream.net>:
> On Sat, 31 Oct 2009, Dmitry Monakhov wrote:
>
>> Sage Weil <sage@newdream.net> writes:
>>
>> > Hi,
>> >
>> > I'm consistently seeing ext3 oops on a fresh ~60 GB fs on 2.6.32-rc3 (and
>> > 2.6.31).  data=writeback or data=ordered.  It's not the hardware or
>> > drive... I have 8 boxes (each with slightly different hardware) that crash
>> > identically.
>> Strange, 2.6.31 with ext3 is quite popular configuration...
>> Can you please post exact test-case.
>> >
>> > The oops is at fs/jbd/transaction.c, journal_start():
>> >
>> >             J_ASSERT(handle->h_transaction->t_journal == journal);
>> *handle = journal_current_handle()
>>
>> IMHO it's looks like you have entered here with current->journal_info != NULL
>>
>> , but journal_info contains unexpected data
>> This may happens in two cases:
>> 1) calling jbd code from other filesystem.
>> 2) Some fs forget to zero current->journal_info on exit from vfs
>> According to call trace we have got second case. Do you use some
>> unusual/experimental fs?
>
> Yep, it was #2.  It turns out btrfs s setting current->journal_info
> (for no reason that I can see?), and with the transaction ioctl a
> transaction can span multiple calls.
Whait a minute. Thats ioctl totally wrong! How it can prevent from problems?
issues:
1) If we are using this thechnic to preserving atomic behaviour, but
process dies unexpectedly in the midle of complex
operation which  must be done attomically. Then we call trans_end
which result in committing some internal file state.

2) What happens when second process try to touch file with opened
transaction? what behavuour expected from fsync?
process 1                                    process 2
1 fd = open("./file)                         open("./file")
2 ioctl_start_trans
3 write(fd, "a", 1)                           write(fd,"b", 1)
4                                                 fsync
5
What data do you expect after state (4), and after state (5)
>
> Chris, is it ok to just remove the journal_info bits?  Nothing in fs/btrfs
> even looks at it.  I'm not sure what the point of only conditionally
> setting/clearly journal_info would be either, unless it's for debugging or
> something?
>
> Thanks-
> sage
>
> ---
> From: Sage Weil <sage@newdream.net>
> Date: Mon, 2 Nov 2009 14:21:29 -0800
> Subject: [PATCH] Btrfs: don't set current->journal_info
>
> Btrfs doesn't use current->journal_info for anything, so don't set it.
> We currently cause a NULL dereference in jbd if a process starts a btrfs
> user transaction and then touches another mounted fs that uses jbd, since
> current->journal_info is only supposed to be set for the duration of a
> single call into the fs.

NAK The root of bad thing is ioctl patch.
If you really want to design external(from kernel point of view)
atomic interface then you have to use another
locking mechanism not transaction. Some thing like lock_flag. And it
must be stored on inode in order to provide
mutal exclusion for concurent task. Otherwise this mechanism is useless.
This patch just hide problem under a carpet. Actually setting
current->journal_info
is a good way to catch  fs to fs switching.
>
> Signed-off-by: Sage Weil <sage@newdream.net>
> ---
>  fs/btrfs/transaction.c |    8 --------
>  1 files changed, 0 insertions(+), 8 deletions(-)
>
> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
> index bca82a4..c6dbbb8 100644
> --- a/fs/btrfs/transaction.c
> +++ b/fs/btrfs/transaction.c
> @@ -186,9 +186,6 @@ static struct btrfs_trans_handle *start_transaction(struct btrfs_root *root,
>        h->alloc_exclude_start = 0;
>        h->delayed_ref_updates = 0;
>
> -       if (!current->journal_info)
> -               current->journal_info = h;
> -
>        root->fs_info->running_transaction->use_count++;
>        record_root_in_trans(h, root);
>        mutex_unlock(&root->fs_info->trans_mutex);
> @@ -321,8 +318,6 @@ static int __btrfs_end_transaction(struct btrfs_trans_handle *trans,
>        put_transaction(cur_trans);
>        mutex_unlock(&info->trans_mutex);
>
> -       if (current->journal_info == trans)
> -               current->journal_info = NULL;
>        memset(trans, 0, sizeof(*trans));
>        kmem_cache_free(btrfs_trans_handle_cachep, trans);
>
> @@ -1105,9 +1100,6 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans,
>
>        mutex_unlock(&root->fs_info->trans_mutex);
>
> -       if (current->journal_info == trans)
> -               current->journal_info = NULL;
> -
>        kmem_cache_free(btrfs_trans_handle_cachep, trans);
>        return ret;
>  }
> --
> 1.5.6.5
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: ext3/jbd oops in journal_start
  2009-11-03  6:06   ` Sage Weil
  2009-11-03 10:01       ` Dmitry Monakhov
@ 2009-11-03 15:26     ` Chris Mason
  1 sibling, 0 replies; 7+ messages in thread
From: Chris Mason @ 2009-11-03 15:26 UTC (permalink / raw)
  To: Sage Weil; +Cc: linux-btrfs, Dmitry Monakhov, linux-kernel, linux-fsdevel

On Mon, Nov 02, 2009 at 10:06:36PM -0800, Sage Weil wrote:
> On Sat, 31 Oct 2009, Dmitry Monakhov wrote:
> 
> > Sage Weil <sage@newdream.net> writes:
> > 
> > > Hi,
> > >
> > > I'm consistently seeing ext3 oops on a fresh ~60 GB fs on 2.6.32-rc3 (and 
> > > 2.6.31).  data=writeback or data=ordered.  It's not the hardware or 
> > > drive... I have 8 boxes (each with slightly different hardware) that crash 
> > > identically.
> > Strange, 2.6.31 with ext3 is quite popular configuration...
> > Can you please post exact test-case.
> > >
> > > The oops is at fs/jbd/transaction.c, journal_start():
> > >
> > > 		J_ASSERT(handle->h_transaction->t_journal == journal);
> > *handle = journal_current_handle()
> > 
> > IMHO it's looks like you have entered here with current->journal_info != NULL
> > 
> > , but journal_info contains unexpected data
> > This may happens in two cases:
> > 1) calling jbd code from other filesystem.
> > 2) Some fs forget to zero current->journal_info on exit from vfs
> > According to call trace we have got second case. Do you use some 
> > unusual/experimental fs?
> 
> Yep, it was #2.  It turns out btrfs s setting current->journal_info 
> (for no reason that I can see?), and with the transaction ioctl a 
> transaction can span multiple calls.
> 
> Chris, is it ok to just remove the journal_info bits?  Nothing in fs/btrfs 
> even looks at it.  I'm not sure what the point of only conditionally 
> setting/clearly journal_info would be either, unless it's for debugging or 
> something?

Josef has plans to use it later on, but he sent along patches that will
avoid setting journal_info for userland trans.  I'll get these
integrated and pushed out.

-chris


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2009-11-03 15:26 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-10-31  6:14 ext3/jbd oops in journal_start Sage Weil
2009-10-31  8:18 ` Dmitry Monakhov
2009-11-03  6:06   ` Sage Weil
2009-11-03 10:01     ` Dmitry Monakhov
2009-11-03 10:01       ` Dmitry Monakhov
2009-11-03 10:01       ` Dmitry Monakhov
2009-11-03 15:26     ` Chris Mason

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.