* crash in filesystem during reboot (and proposed patch)
@ 2012-06-15 19:22 Sadasivan Shaiju
From: Sadasivan Shaiju @ 2012-06-15 19:22 UTC
To: linux-kernel; +Cc: Daniel Walker
Hi,
I am getting the following crashes during a reboot of the system.
It looks like a race condition during umount.
<4>Call Trace:
<4>[] clear_inode+0x28/0xe8
<4>[] generic_drop_inode+0x3c/0xa8
<4>[] d_kill+0x4c/0x78
<4>[] __shrink_dcache_sb+0x258/0x360
<4>[] shrink_dcache_parent+0x140/0x190
<4>[] proc_flush_task+0xac/0x2e8
<4>[] release_task+0x80/0x4c0
<4>[] wait_consider_task+0x608/0xa80
<4>[] do_wait+0x10c/0x2b8
<4>[] SyS_wait4+0x88/0x120
<4>[] compat_sys_wait4+0xc8/0xd0
<4>[] handle_sysn32+0x44/0x84
Call Trace:
[] file_ra_state_init+0x0/0x20
[] __dentry_open+0x26c/0x3d0
[] do_filp_open+0x70c/0xbc8
[] do_sys_open+0x78/0x1e0
[] handle_sysn32+0x44/0x84
Call Trace:
[<ffffffff812ae3e4>] iput+0x3c/0x88
[<ffffffff812aaa84>] d_kill+0x4c/0x78
[<ffffffff812aad08>] __shrink_dcache_sb+0x258/0x360
[<ffffffff812ab300>] shrink_dcache_parent+0x140/0x190
[<ffffffff812eea14>] proc_flush_task+0xac/0x2e8
[<ffffffff811e6538>] release_task+0x80/0x4c0
[<ffffffff811e80c8>] do_exit+0x6f8/0x908
[<ffffffff8121dee8>] unregister_module_notifier+0x0/0x10
Call Trace:
[<ffffffff812ae3e4>] iput+0x3c/0x88
[<ffffffff812aaa84>] d_kill+0x4c/0x78
[<ffffffff812ab6b8>] dput+0x120/0x220
[<ffffffff812a0f1c>] do_lookup+0xdc/0x210
[<ffffffff812a33e8>] __link_path_walk+0x910/0x1408
[<ffffffff812a4194>] path_walk+0x64/0x108
[<ffffffff812a4350>] do_path_lookup+0x60/0x68
[<ffffffff812a519c>] do_filp_open+0xdc/0xbc8
[<ffffffff81293768>] do_sys_open+0x78/0x1e0
[<ffffffff81103844>] handle_sysn32+0x44/0x84
Call Trace:
[<ffffffff812af56c>] __destroy_inode+0x74/0xb0
[<ffffffff812af5bc>] destroy_inode+0x14/0x50
[<ffffffff812aacd4>] d_kill+0x4c/0x78
[<ffffffff812aaf58>] __shrink_dcache_sb+0x258/0x360
[<ffffffff812ab550>] shrink_dcache_parent+0x140/0x190
[<ffffffff812eec9c>] proc_flush_task+0xac/0x2e8
[<ffffffff811e65f0>] release_task+0x80/0x4c0
[<ffffffff811e7038>] wait_consider_task+0x608/0xa80
[<ffffffff811e75bc>] do_wait+0x10c/0x2b8
[<ffffffff811e7928>] SyS_waitid+0xa0/0x200
[<ffffffff812241cc>] compat_sys_waitid+0x64/0xd8
[<ffffffff81103844>] handle_sysn32+0x44/0x84
I am thinking of putting the following fix in
shrink_dcache_parent(). Please let me know if there is any problem
with this fix.
Index: linux-2.6.32/fs/dcache.c
===================================================================
--- linux-2.6.32.orig/fs/dcache.c 2012-05-30 15:59:18.000000000 -0700
+++ linux-2.6.32/fs/dcache.c 2012-06-11 17:10:33.000000000 -0700
@@ -881,8 +881,14 @@
 	struct super_block *sb = parent->d_sb;
 	int found;
 
-	while ((found = select_parent(parent)) != 0)
-		__shrink_dcache_sb(sb, &found, 0);
+	while ((found = select_parent(parent)) != 0) {
+		if (down_read_trylock(&sb->s_umount)) {
+			if ((sb->s_root != NULL)) {
+				__shrink_dcache_sb(sb, &found, 0);
+			}
+			up_read(&sb->s_umount);
+		}
+	}
 }
 
 /*
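For illustration, here is a minimal userspace sketch of the locking pattern in this patch. It is not kernel code and assumes a simplified model: struct sb_model, select_parent_model() and shrink_model() are hypothetical stand-ins for struct super_block, select_parent() and __shrink_dcache_sb(), and a C11 atomic counter models only the trylock side of the s_umount rw_semaphore. One deliberate difference from the posted patch: the sketch leaves the loop when the trylock fails or s_root is NULL. In the patch as posted, nothing in the loop body changes what select_parent() returns in those cases, so the loop may keep retrying.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the s_umount rw_semaphore (trylock side only):
 * state == 0 means free, > 0 counts readers, -1 means held for write. */
struct rwsem_model {
	atomic_int state;
};

static bool down_read_trylock_model(struct rwsem_model *sem)
{
	int old = atomic_load(&sem->state);

	/* Give up as soon as a writer holds the lock; only retry if
	 * another reader changed the count between the load and the CAS. */
	while (old >= 0) {
		if (atomic_compare_exchange_weak(&sem->state, &old, old + 1))
			return true;
	}
	return false;
}

static void up_read_model(struct rwsem_model *sem)
{
	atomic_fetch_sub(&sem->state, 1);
}

static bool down_write_trylock_model(struct rwsem_model *sem)
{
	int expected = 0;

	return atomic_compare_exchange_strong(&sem->state, &expected, -1);
}

static void up_write_model(struct rwsem_model *sem)
{
	atomic_store(&sem->state, 0);
}

/* Hypothetical stand-in for struct super_block. */
struct sb_model {
	struct rwsem_model s_umount;
	void *s_root;           /* NULL once unmount has torn the tree down */
	int unused_dentries;    /* what select_parent() would report */
};

static void sb_model_init(struct sb_model *sb, void *root, int unused)
{
	atomic_init(&sb->s_umount.state, 0);
	sb->s_root = root;
	sb->unused_dentries = unused;
}

static int select_parent_model(struct sb_model *sb)
{
	return sb->unused_dentries;
}

/* Pretend every selected dentry is freed. */
static void shrink_model(struct sb_model *sb, int *count)
{
	sb->unused_dentries -= *count;
	*count = 0;
}

/* The patched shrink_dcache_parent() loop, except that it returns instead
 * of retrying when the lock is busy or s_root is gone. Returns the number
 * of dentries shrunk. */
static int shrink_dcache_parent_model(struct sb_model *sb)
{
	int found, shrunk = 0;

	assert(sb != NULL);
	while ((found = select_parent_model(sb)) != 0) {
		if (!down_read_trylock_model(&sb->s_umount))
			break;          /* unmount holds s_umount for write */
		if (sb->s_root == NULL) {
			up_read_model(&sb->s_umount);
			break;          /* already unmounted: nothing safe to touch */
		}
		shrunk += found;
		shrink_model(sb, &found);
		up_read_model(&sb->s_umount);
	}
	return shrunk;
}
```

Leaving the loop is an assumption of this sketch, on the theory that the unmount path reclaims whatever dentries remain; whether that holds for the dentries proc_flush_task() is flushing would need to be checked against the 2.6.32 sources.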
Regards,
shaiju
* RE: crash in filesystem during reboot (and proposed patch)
2012-06-22 21:29 ` Andrew Morton
@ 2012-06-23 0:53 ` Sadasivan Shaiju
From: Sadasivan Shaiju @ 2012-06-23 0:53 UTC
To: Andrew Morton; +Cc: linux-kernel
Hi Andrew,
Please see inline.
-----Original Message-----
From: Andrew Morton [mailto:akpm@linux-foundation.org]
Sent: Friday, June 22, 2012 2:30 PM
To: Sadasivan Shaiju
Cc: linux-kernel@vger.kernel.org
Subject: Re: crash in filesytem during reboot . (and proposed patch)
On Fri, 15 Jun 2012 11:12:09 -0700
Sadasivan Shaiju <sshaiju@mvista.com> wrote:
> Hi
>
>
>
Your email is quadruple-spaced. Please, fix that.
Sure, I will fix this.
> I am getting the following crashes during a reboot of the system
> . It looks like a race condition during unmount .
>
> <4>Call Trace:
> <4>[] clear_inode+0x28/0xe8
> <4>[] generic_drop_inode+0x3c/0xa8
> <4>[] d_kill+0x4c/0x78
> <4>[] __shrink_dcache_sb+0x258/0x360
> <4>[] shrink_dcache_parent+0x140/0x190
> <4>[] proc_flush_task+0xac/0x2e8
> <4>[] release_task+0x80/0x4c0
> <4>[] wait_consider_task+0x608/0xa80
> <4>[] do_wait+0x10c/0x2b8
> <4>[] SyS_wait4+0x88/0x120
> <4>[] compat_sys_wait4+0xc8/0xd0
> <4>[] handle_sysn32+0x44/0x84
>
> Call Trace:
> [] file_ra_state_init+0x0/0x20
> [] __dentry_open+0x26c/0x3d0
> [] do_filp_open+0x70c/0xbc8
> [] do_sys_open+0x78/0x1e0
> [] handle_sysn32+0x44/0x84
>
> Call Trace:
> [<ffffffff812ae3e4>] iput+0x3c/0x88
> [<ffffffff812aaa84>] d_kill+0x4c/0x78
> [<ffffffff812aad08>] __shrink_dcache_sb+0x258/0x360
> [<ffffffff812ab300>] shrink_dcache_parent+0x140/0x190
> [<ffffffff812eea14>] proc_flush_task+0xac/0x2e8
> [<ffffffff811e6538>] release_task+0x80/0x4c0
> [<ffffffff811e80c8>] do_exit+0x6f8/0x908
> [<ffffffff8121dee8>] unregister_module_notifier+0x0/0x10
>
> Call Trace:
> [<ffffffff812ae3e4>] iput+0x3c/0x88
> [<ffffffff812aaa84>] d_kill+0x4c/0x78
> [<ffffffff812ab6b8>] dput+0x120/0x220
> [<ffffffff812a0f1c>] do_lookup+0xdc/0x210
> [<ffffffff812a33e8>] __link_path_walk+0x910/0x1408
> [<ffffffff812a4194>] path_walk+0x64/0x108
> [<ffffffff812a4350>] do_path_lookup+0x60/0x68
> [<ffffffff812a519c>] do_filp_open+0xdc/0xbc8
> [<ffffffff81293768>] do_sys_open+0x78/0x1e0
> [<ffffffff81103844>] handle_sysn32+0x44/0x84
>
> ...
>
> I am thinking of putting the following fix in
> shrink_dcache_parent() . Please let me know is there any problem
> with this fix .
>
> ...
>
> --- linux-2.6.32.orig/fs/dcache.c 2012-05-30 15:59:18.000000000 -0700
> +++ linux-2.6.32/fs/dcache.c 2012-06-11 17:10:33.000000000 -0700
> @@ -881,8 +881,14 @@
> struct super_block *sb = parent->d_sb;
> int found;
>
> - while ((found = select_parent(parent)) != 0)
> - __shrink_dcache_sb(sb, &found, 0);
> + while ((found = select_parent(parent)) != 0) {
> + if (down_read_trylock(&sb->s_umount)) {
> + if ((sb->s_root != NULL)) {
> + __shrink_dcache_sb(sb, &found, 0);
> + }
> + up_read(&sb->s_umount);
> + }
> + }
> }
Please fully describe the race which you believe you have found. What
races against what?
The race is between generic_shutdown_super() and __shrink_dcache_sb().
Under high memory pressure, one of our user processes crashed, and its
parent was cleaning up after it with the following call stack:
<4>[] clear_inode+0x28/0xe8
<4>[] generic_drop_inode+0x3c/0xa8
<4>[] d_kill+0x4c/0x78
<4>[] __shrink_dcache_sb+0x258/0x360
<4>[] shrink_dcache_parent+0x140/0x190
<4>[] proc_flush_task+0xac/0x2e8
<4>[] release_task+0x80/0x4c0
<4>[] wait_consider_task+0x608/0xa80
<4>[] do_wait+0x10c/0x2b8
<4>[] SyS_wait4+0x88/0x120
<4>[] compat_sys_wait4+0xc8/0xd0
<4>[] handle_sysn32+0x44/0x84
During that time the system was rebooted and unmounting started.
Meanwhile, the parent process was still cleaning up the child's
dentries, so clear_inode() referenced a stale inode and crashed. So my
patch tries to take the s_umount lock, so that __shrink_dcache_sb() is
not called while an unmount is in progress. This prevents clear_inode()
from accessing the stale inode.
A similar race is already prevented in prune_dcache() (between
generic_shutdown_super() and __shrink_dcache_sb()).
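The race described above can be made concrete with a small userspace model (not kernel code) that enumerates every interleaving of two simplified threads: a shrinker standing in for the shrink_dcache_parent() path, and an unmount path that takes s_umount for write, clears s_root, frees the inodes and then releases the lock. All names, and the choice of steps, are assumptions of this sketch; in particular, collapsing generic_shutdown_super() into "clear s_root, then free the inodes" is a simplification.

```c
#include <assert.h>
#include <stdbool.h>

enum { DONE = -1 };

/* Hypothetical shared state; each field is an assumption of the model. */
struct model {
	bool patched;         /* shrinker uses trylock + s_root check */
	bool writer;          /* unmount path holds s_umount for write */
	int readers;          /* shrinker holds s_umount for read */
	bool s_root;          /* sb->s_root still set */
	bool inode_freed;     /* unmount has freed the inodes */
	bool use_after_free;  /* shrinker touched an inode after it was freed */
	int shrink_pc;        /* next shrinker step, DONE when finished */
	int umount_pc;        /* next unmount step, DONE when finished */
};

/* Run one shrinker step; returns false if the shrinker has finished. */
static bool step_shrinker(struct model *m)
{
	if (m->shrink_pc == DONE)
		return false;
	if (!m->patched) {                /* original: touch the inode directly */
		m->use_after_free |= m->inode_freed;
		m->shrink_pc = DONE;
		return true;
	}
	switch (m->shrink_pc) {
	case 0:                           /* down_read_trylock(&sb->s_umount) */
		if (m->writer) {
			m->shrink_pc = DONE;      /* busy: skip the shrink */
		} else {
			m->readers++;
			m->shrink_pc = 1;
		}
		break;
	case 1:                           /* if (sb->s_root != NULL) */
		if (m->s_root) {
			m->shrink_pc = 2;
		} else {
			m->readers--;
			m->shrink_pc = DONE;
		}
		break;
	case 2:                           /* __shrink_dcache_sb() touches the inode */
		m->use_after_free |= m->inode_freed;
		m->shrink_pc = 3;
		break;
	default:                          /* up_read(&sb->s_umount) */
		m->readers--;
		m->shrink_pc = DONE;
		break;
	}
	return true;
}

/* Run one unmount step; returns false if it is blocked or finished. */
static bool step_umount(struct model *m)
{
	switch (m->umount_pc) {
	case 0:                           /* down_write() waits for readers */
		if (m->writer || m->readers > 0)
			return false;
		m->writer = true;
		break;
	case 1:                           /* tree torn down, s_root cleared */
		m->s_root = false;
		break;
	case 2:                           /* inodes freed */
		m->inode_freed = true;
		break;
	case 3:                           /* up_write() */
		m->writer = false;
		break;
	default:                          /* DONE */
		return false;
	}
	m->umount_pc = (m->umount_pc == 3) ? DONE : m->umount_pc + 1;
	return true;
}

/* Explore every interleaving depth-first; count those ending in a use-after-free. */
static int count_bad(struct model m)
{
	assert(m.readers >= 0);
	if (m.shrink_pc == DONE && m.umount_pc == DONE)
		return m.use_after_free ? 1 : 0;

	int bad = 0;
	struct model next = m;

	if (step_shrinker(&next))
		bad += count_bad(next);
	next = m;
	if (step_umount(&next))
		bad += count_bad(next);
	return bad;
}

static int bad_interleavings(bool patched)
{
	struct model m = { .patched = patched, .s_root = true };

	return count_bad(m);
}
```

In this model bad_interleavings(false) returns 2 (the original shrinker touches a freed inode in 2 of the 5 possible interleavings) and bad_interleavings(true) returns 0, because the unmount path cannot take s_umount for write while the shrinker holds it for read, and a shrinker that runs afterwards sees s_root cleared. That only shows the locking rule is consistent inside the model; it says nothing about other 2.6.32 paths that may touch the same inodes.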
Please also confirm that the bug is still present in current kernels -
2.6.32 is rather old.
I am not sure whether the bug is still present in current kernels, but
I do see some RCU locking in this area in the current kernel.
We are moving to the 3.4 kernel, but the current product is still based
on 2.6.32, so we need to fix this issue in 2.6.32.
Regards,
Shaiju.
Thanks.
* Re: crash in filesystem during reboot (and proposed patch)
[not found] <797b2bac7e6fb198ea25433e302856b9@mail.gmail.com>
@ 2012-06-22 21:29 ` Andrew Morton
2012-06-23 0:53 ` Sadasivan Shaiju
From: Andrew Morton @ 2012-06-22 21:29 UTC
To: Sadasivan Shaiju; +Cc: linux-kernel
On Fri, 15 Jun 2012 11:12:09 -0700
Sadasivan Shaiju <sshaiju@mvista.com> wrote:
> Hi
>
>
>
Your email is quadruple-spaced. Please, fix that.
> I am getting the following crashes during a reboot of the system
> . It looks like a race condition during unmount .
>
> <4>Call Trace:
> <4>[] clear_inode+0x28/0xe8
> <4>[] generic_drop_inode+0x3c/0xa8
> <4>[] d_kill+0x4c/0x78
> <4>[] __shrink_dcache_sb+0x258/0x360
> <4>[] shrink_dcache_parent+0x140/0x190
> <4>[] proc_flush_task+0xac/0x2e8
> <4>[] release_task+0x80/0x4c0
> <4>[] wait_consider_task+0x608/0xa80
> <4>[] do_wait+0x10c/0x2b8
> <4>[] SyS_wait4+0x88/0x120
> <4>[] compat_sys_wait4+0xc8/0xd0
> <4>[] handle_sysn32+0x44/0x84
>
> Call Trace:
> [] file_ra_state_init+0x0/0x20
> [] __dentry_open+0x26c/0x3d0
> [] do_filp_open+0x70c/0xbc8
> [] do_sys_open+0x78/0x1e0
> [] handle_sysn32+0x44/0x84
>
> Call Trace:
> [<ffffffff812ae3e4>] iput+0x3c/0x88
> [<ffffffff812aaa84>] d_kill+0x4c/0x78
> [<ffffffff812aad08>] __shrink_dcache_sb+0x258/0x360
> [<ffffffff812ab300>] shrink_dcache_parent+0x140/0x190
> [<ffffffff812eea14>] proc_flush_task+0xac/0x2e8
> [<ffffffff811e6538>] release_task+0x80/0x4c0
> [<ffffffff811e80c8>] do_exit+0x6f8/0x908
> [<ffffffff8121dee8>] unregister_module_notifier+0x0/0x10
>
> Call Trace:
> [<ffffffff812ae3e4>] iput+0x3c/0x88
> [<ffffffff812aaa84>] d_kill+0x4c/0x78
> [<ffffffff812ab6b8>] dput+0x120/0x220
> [<ffffffff812a0f1c>] do_lookup+0xdc/0x210
> [<ffffffff812a33e8>] __link_path_walk+0x910/0x1408
> [<ffffffff812a4194>] path_walk+0x64/0x108
> [<ffffffff812a4350>] do_path_lookup+0x60/0x68
> [<ffffffff812a519c>] do_filp_open+0xdc/0xbc8
> [<ffffffff81293768>] do_sys_open+0x78/0x1e0
> [<ffffffff81103844>] handle_sysn32+0x44/0x84
>
> ...
>
> I am thinking of putting the following fix in
> shrink_dcache_parent() . Please let me know is there any problem
> with this fix .
>
> ...
>
> --- linux-2.6.32.orig/fs/dcache.c 2012-05-30 15:59:18.000000000 -0700
> +++ linux-2.6.32/fs/dcache.c 2012-06-11 17:10:33.000000000 -0700
> @@ -881,8 +881,14 @@
> struct super_block *sb = parent->d_sb;
> int found;
>
> - while ((found = select_parent(parent)) != 0)
> - __shrink_dcache_sb(sb, &found, 0);
> + while ((found = select_parent(parent)) != 0) {
> + if (down_read_trylock(&sb->s_umount)) {
> + if ((sb->s_root != NULL)) {
> + __shrink_dcache_sb(sb, &found, 0);
> + }
> + up_read(&sb->s_umount);
> + }
> + }
> }
Please fully describe the race which you believe you have found. What
races against what?
Please also confirm that the bug is still present in current kernels -
2.6.32 is rather old.
Thanks.
* crash in filesystem during reboot (and proposed patch)
@ 2012-06-15 19:16 Sadasivan Shaiju
From: Sadasivan Shaiju @ 2012-06-15 19:16 UTC
To: linux-kernel; +Cc: walker
* crash in filesystem during reboot (and proposed patch)
@ 2012-06-15 18:42 Sadasivan Shaiju
From: Sadasivan Shaiju @ 2012-06-15 18:42 UTC
To: linux-kernel