All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] Btrfs: Check for NULL page in extent_range_uptodate
@ 2012-01-25 19:03 Mitch Harder
  2012-01-30 21:41 ` Vincent Vanackere
  0 siblings, 1 reply; 4+ messages in thread
From: Mitch Harder @ 2012-01-25 19:03 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Mitch Harder

A user has encountered a NULL pointer kernel oops in btrfs when
encountering media errors.  The problem has been identified
as an unhandled NULL pointer returned from find_get_page().
This modification simply checks for a NULL page, and returns
with an error if found (the extent_range_uptodate() function
returns 1 on errors).

After testing this patch, the user reported that the error with
the NULL pointer oops was solved.  However, there is still a
remaining problem with a thread becoming stuck in
wait_on_page_locked(page) in the read_extent_buffer_pages(...)
function in extent_io.c

       for (i = start_i; i < num_pages; i++) {
               page = extent_buffer_page(eb, i);
               wait_on_page_locked(page);
               if (!PageUptodate(page))
                       ret = -EIO;
       }

This patch leaves the issue with the locked page yet to be resolved.

Signed-off-by: Mitch Harder <mitch.harder@sabayonlinux.org>
---
 fs/btrfs/extent_io.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 9d09a4f..fcf77e1 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3909,6 +3909,8 @@ int extent_range_uptodate(struct extent_io_tree *tree,
 	while (start <= end) {
 		index = start >> PAGE_CACHE_SHIFT;
 		page = find_get_page(tree->mapping, index);
+		if (!page)
+			return 1;
 		uptodate = PageUptodate(page);
 		page_cache_release(page);
 		if (!uptodate) {
-- 
1.7.3.4


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] Btrfs: Check for NULL page in extent_range_uptodate
  2012-01-25 19:03 [PATCH] Btrfs: Check for NULL page in extent_range_uptodate Mitch Harder
@ 2012-01-30 21:41 ` Vincent Vanackere
  2012-01-30 23:13   ` Mitch Harder
  0 siblings, 1 reply; 4+ messages in thread
From: Vincent Vanackere @ 2012-01-30 21:41 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Mitch Harder

On Wed, Jan 25, 2012 at 20:03, Mitch Harder
<mitch.harder@sabayonlinux.org> wrote:
> A user has encountered a NULL pointer kernel oops in btrfs when
> encountering media errors. =C2=A0The problem has been identified
> as an unhandled NULL pointer returned from find_get_page().
> This modification simply checks for a NULL page, and returns
> with an error if found (the extent_range_uptodate() function
> returns 1 on errors).
>
> After testing this patch, the user reported that the error with
> the NULL pointer oops was solved. =C2=A0However, there is still a
> remaining problem with a thread becoming stuck in
> wait_on_page_locked(page) in the read_extent_buffer_pages(...)
> function in extent_io.c
>
> =C2=A0 =C2=A0 =C2=A0 for (i =3D start_i; i < num_pages; i++) {
> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 page =3D extent_buff=
er_page(eb, i);
> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 wait_on_page_locked(=
page);
> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 if (!PageUptodate(pa=
ge))
> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0=
 =C2=A0 ret =3D -EIO;
> =C2=A0 =C2=A0 =C2=A0 }
>
> This patch leaves the issue with the locked page yet to be resolved.
>
> Signed-off-by: Mitch Harder <mitch.harder@sabayonlinux.org>
> ---
> =C2=A0fs/btrfs/extent_io.c | =C2=A0 =C2=A02 ++
> =C2=A01 files changed, 2 insertions(+), 0 deletions(-)
>
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 9d09a4f..fcf77e1 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -3909,6 +3909,8 @@ int extent_range_uptodate(struct extent_io_tree=
 *tree,
> =C2=A0 =C2=A0 =C2=A0 =C2=A0while (start <=3D end) {
> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0index =3D star=
t >> PAGE_CACHE_SHIFT;
> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0page =3D find_=
get_page(tree->mapping, index);
> + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 if (!page)
> + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0=
 =C2=A0 return 1;
> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0uptodate =3D P=
ageUptodate(page);
> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0page_cache_rel=
ease(page);
> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0if (!uptodate)=
 {
> --
> 1.7.3.4
>


Hi,

 If any btrfs developer could have a look at it while I can still
reproduce the situation (it won't last long, I'll send the disk to RMA
next week), I'm still interested in solving the remaining part of the
btrfs bug. Here is the trace I get with the current linux kernel
(6bc2b95ee602659c1be6fac0f6aadeb0c5c29a5d) :

[  330.530015] btrfs bad tree block start 959241011200 959241011200
[  480.288046] INFO: task cat:2627 blocked for more than 120 seconds.
[  480.288050] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[  480.288052] cat             D ffffffff8180c600     0  2627   2468 0x=
00000004
[  480.288057]  ffff8801fe135618 0000000000000086 ffff8801fe1355d8
ffff880222061650
[  480.288062]  ffff880215b5db80 ffff8801fe135fd8 ffff8801fe135fd8
ffff8801fe135fd8
[  480.288067]  ffff8802241a16e0 ffff880215b5db80 ffff8801fe1355e8
ffff88022fd93e88
[  480.288071] Call Trace:
[  480.288080]  [<ffffffff81114440>] ? __lock_page+0x70/0x70
[  480.288084]  [<ffffffff8162c0af>] schedule+0x3f/0x60
[  480.288087]  [<ffffffff8162c15f>] io_schedule+0x8f/0xd0
[  480.288091]  [<ffffffff8111444e>] sleep_on_page+0xe/0x20
[  480.288094]  [<ffffffff8162a96f>] __wait_on_bit+0x5f/0x90
[  480.288098]  [<ffffffff811145b8>] wait_on_page_bit+0x78/0x80
[  480.288102]  [<ffffffff81070c70>] ? autoremove_wake_function+0x40/0x=
40
[  480.288129]  [<ffffffffa005d161>]
read_extent_buffer_pages+0x471/0x4d0 [btrfs]
[  480.288142]  [<ffffffffa00347b0>] ? verify_parent_transid+0x160/0x16=
0 [btrfs]
[  480.288155]  [<ffffffffa003513a>]
btree_read_extent_buffer_pages.isra.99+0x8a/0xc0 [btrfs]
[  480.288169]  [<ffffffffa00371e1>] read_tree_block+0x41/0x60 [btrfs]
[  480.288179]  [<ffffffffa001d6a3>]
read_block_for_search.isra.34+0xf3/0x3d0 [btrfs]
[  480.288190]  [<ffffffffa001f930>] btrfs_search_slot+0x300/0x8a0 [btr=
fs]
[  480.288203]  [<ffffffffa0031ab4>] btrfs_lookup_csum+0x74/0x170 [btrf=
s]
[  480.288216]  [<ffffffffa0031d5f>] __btrfs_lookup_bio_sums+0x1af/0x3b=
0 [btrfs]
[  480.288228]  [<ffffffffa0031fb6>] btrfs_lookup_bio_sums+0x16/0x20 [b=
trfs]
[  480.288242]  [<ffffffffa003e650>] btrfs_submit_bio_hook+0x140/0x170 =
[btrfs]
[  480.288256]  [<ffffffffa00405d0>] ? btrfs_real_readdir+0x720/0x720 [=
btrfs]
[  480.288272]  [<ffffffffa00571aa>] submit_one_bio+0x6a/0xa0 [btrfs]
[  480.288287]  [<ffffffffa005be64>] extent_readpages+0xe4/0x100 [btrfs=
]
[  480.288301]  [<ffffffffa00405d0>] ? btrfs_real_readdir+0x720/0x720 [=
btrfs]
[  480.288315]  [<ffffffffa003eebf>] btrfs_readpages+0x1f/0x30 [btrfs]
[  480.288319]  [<ffffffff81120bef>] __do_page_cache_readahead+0x1af/0x=
250
[  480.288323]  [<ffffffff81120ff1>] ra_submit+0x21/0x30
[  480.288326]  [<ffffffff81121115>] ondemand_readahead+0x115/0x230
[  480.288330]  [<ffffffff81137eb9>] ? __do_fault+0x419/0x530
[  480.288333]  [<ffffffff81121311>] page_cache_sync_readahead+0x31/0x5=
0
[  480.288337]  [<ffffffff811167d8>] generic_file_aio_read+0x438/0x780
[  480.288342]  [<ffffffff81173db2>] do_sync_read+0xd2/0x110
[  480.288346]  [<ffffffff81294113>] ? security_file_permission+0x93/0x=
b0
[  480.288349]  [<ffffffff81174231>] ? rw_verify_area+0x61/0xf0
[  480.288352]  [<ffffffff81174710>] vfs_read+0xb0/0x180
[  480.288355]  [<ffffffff8117482a>] sys_read+0x4a/0x90
[  480.288359]  [<ffffffff81635229>] system_call_fastpath+0x16/0x1b
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" =
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] Btrfs: Check for NULL page in extent_range_uptodate
  2012-01-30 21:41 ` Vincent Vanackere
@ 2012-01-30 23:13   ` Mitch Harder
  2012-01-31  9:06     ` Vincent Vanackere
  0 siblings, 1 reply; 4+ messages in thread
From: Mitch Harder @ 2012-01-30 23:13 UTC (permalink / raw)
  To: Vincent Vanackere; +Cc: linux-btrfs

On Mon, Jan 30, 2012 at 3:41 PM, Vincent Vanackere
<vincent.vanackere@gmail.com> wrote:
> On Wed, Jan 25, 2012 at 20:03, Mitch Harder
> <mitch.harder@sabayonlinux.org> wrote:
>> A user has encountered a NULL pointer kernel oops in btrfs when
>> encountering media errors. =A0The problem has been identified
>> as an unhandled NULL pointer returned from find_get_page().
>> This modification simply checks for a NULL page, and returns
>> with an error if found (the extent_range_uptodate() function
>> returns 1 on errors).
>>
>> After testing this patch, the user reported that the error with
>> the NULL pointer oops was solved. =A0However, there is still a
>> remaining problem with a thread becoming stuck in
>> wait_on_page_locked(page) in the read_extent_buffer_pages(...)
>> function in extent_io.c
>>
>> =A0 =A0 =A0 for (i =3D start_i; i < num_pages; i++) {
>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 page =3D extent_buffer_page(eb, i);
>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 wait_on_page_locked(page);
>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 if (!PageUptodate(page))
>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 ret =3D -EIO;
>> =A0 =A0 =A0 }
>>
>> This patch leaves the issue with the locked page yet to be resolved.
>>
>> Signed-off-by: Mitch Harder <mitch.harder@sabayonlinux.org>
>> ---
>> =A0fs/btrfs/extent_io.c | =A0 =A02 ++
>> =A01 files changed, 2 insertions(+), 0 deletions(-)
>>
>> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
>> index 9d09a4f..fcf77e1 100644
>> --- a/fs/btrfs/extent_io.c
>> +++ b/fs/btrfs/extent_io.c
>> @@ -3909,6 +3909,8 @@ int extent_range_uptodate(struct extent_io_tre=
e *tree,
>> =A0 =A0 =A0 =A0while (start <=3D end) {
>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0index =3D start >> PAGE_CACHE_SHIFT;
>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0page =3D find_get_page(tree->mapping,=
 index);
>> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 if (!page)
>> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 return 1;
>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0uptodate =3D PageUptodate(page);
>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0page_cache_release(page);
>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0if (!uptodate) {
>> --
>> 1.7.3.4
>>
>
>
> Hi,
>
> =A0If any btrfs developer could have a look at it while I can still
> reproduce the situation (it won't last long, I'll send the disk to RM=
A
> next week), I'm still interested in solving the remaining part of the
> btrfs bug. Here is the trace I get with the current linux kernel
> (6bc2b95ee602659c1be6fac0f6aadeb0c5c29a5d) :
>
> [ =A0330.530015] btrfs bad tree block start 959241011200 959241011200
> [ =A0480.288046] INFO: task cat:2627 blocked for more than 120 second=
s.
> [ =A0480.288050] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [ =A0480.288052] cat =A0 =A0 =A0 =A0 =A0 =A0 D ffffffff8180c600 =A0 =A0=
 0 =A02627 =A0 2468 0x00000004
> [ =A0480.288057] =A0ffff8801fe135618 0000000000000086 ffff8801fe1355d=
8
> ffff880222061650
> [ =A0480.288062] =A0ffff880215b5db80 ffff8801fe135fd8 ffff8801fe135fd=
8
> ffff8801fe135fd8
> [ =A0480.288067] =A0ffff8802241a16e0 ffff880215b5db80 ffff8801fe1355e=
8
> ffff88022fd93e88
> [ =A0480.288071] Call Trace:
> [ =A0480.288080] =A0[<ffffffff81114440>] ? __lock_page+0x70/0x70
> [ =A0480.288084] =A0[<ffffffff8162c0af>] schedule+0x3f/0x60
> [ =A0480.288087] =A0[<ffffffff8162c15f>] io_schedule+0x8f/0xd0
> [ =A0480.288091] =A0[<ffffffff8111444e>] sleep_on_page+0xe/0x20
> [ =A0480.288094] =A0[<ffffffff8162a96f>] __wait_on_bit+0x5f/0x90
> [ =A0480.288098] =A0[<ffffffff811145b8>] wait_on_page_bit+0x78/0x80
> [ =A0480.288102] =A0[<ffffffff81070c70>] ? autoremove_wake_function+0=
x40/0x40
> [ =A0480.288129] =A0[<ffffffffa005d161>]
> read_extent_buffer_pages+0x471/0x4d0 [btrfs]
> [ =A0480.288142] =A0[<ffffffffa00347b0>] ? verify_parent_transid+0x16=
0/0x160 [btrfs]
> [ =A0480.288155] =A0[<ffffffffa003513a>]
> btree_read_extent_buffer_pages.isra.99+0x8a/0xc0 [btrfs]
> [ =A0480.288169] =A0[<ffffffffa00371e1>] read_tree_block+0x41/0x60 [b=
trfs]
> [ =A0480.288179] =A0[<ffffffffa001d6a3>]
> read_block_for_search.isra.34+0xf3/0x3d0 [btrfs]
> [ =A0480.288190] =A0[<ffffffffa001f930>] btrfs_search_slot+0x300/0x8a=
0 [btrfs]
> [ =A0480.288203] =A0[<ffffffffa0031ab4>] btrfs_lookup_csum+0x74/0x170=
 [btrfs]
> [ =A0480.288216] =A0[<ffffffffa0031d5f>] __btrfs_lookup_bio_sums+0x1a=
f/0x3b0 [btrfs]
> [ =A0480.288228] =A0[<ffffffffa0031fb6>] btrfs_lookup_bio_sums+0x16/0=
x20 [btrfs]
> [ =A0480.288242] =A0[<ffffffffa003e650>] btrfs_submit_bio_hook+0x140/=
0x170 [btrfs]
> [ =A0480.288256] =A0[<ffffffffa00405d0>] ? btrfs_real_readdir+0x720/0=
x720 [btrfs]
> [ =A0480.288272] =A0[<ffffffffa00571aa>] submit_one_bio+0x6a/0xa0 [bt=
rfs]
> [ =A0480.288287] =A0[<ffffffffa005be64>] extent_readpages+0xe4/0x100 =
[btrfs]
> [ =A0480.288301] =A0[<ffffffffa00405d0>] ? btrfs_real_readdir+0x720/0=
x720 [btrfs]
> [ =A0480.288315] =A0[<ffffffffa003eebf>] btrfs_readpages+0x1f/0x30 [b=
trfs]
> [ =A0480.288319] =A0[<ffffffff81120bef>] __do_page_cache_readahead+0x=
1af/0x250
> [ =A0480.288323] =A0[<ffffffff81120ff1>] ra_submit+0x21/0x30
> [ =A0480.288326] =A0[<ffffffff81121115>] ondemand_readahead+0x115/0x2=
30
> [ =A0480.288330] =A0[<ffffffff81137eb9>] ? __do_fault+0x419/0x530
> [ =A0480.288333] =A0[<ffffffff81121311>] page_cache_sync_readahead+0x=
31/0x50
> [ =A0480.288337] =A0[<ffffffff811167d8>] generic_file_aio_read+0x438/=
0x780
> [ =A0480.288342] =A0[<ffffffff81173db2>] do_sync_read+0xd2/0x110
> [ =A0480.288346] =A0[<ffffffff81294113>] ? security_file_permission+0=
x93/0xb0
> [ =A0480.288349] =A0[<ffffffff81174231>] ? rw_verify_area+0x61/0xf0
> [ =A0480.288352] =A0[<ffffffff81174710>] vfs_read+0xb0/0x180
> [ =A0480.288355] =A0[<ffffffff8117482a>] sys_read+0x4a/0x90
> [ =A0480.288359] =A0[<ffffffff81635229>] system_call_fastpath+0x16/0x=
1b

Jeff Mahoney has been working on a large overhaul of error
handling/BUG_ONs.  It is difficult to say when it  will be ready, or
if it will even address this specific problem.

I'd go ahead and return the disk.  I doubt you'll be the last user to
have bad sectors, so there'll be more opportunities to see how this
issue is handled after the changes to error handling.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" =
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] Btrfs: Check for NULL page in extent_range_uptodate
  2012-01-30 23:13   ` Mitch Harder
@ 2012-01-31  9:06     ` Vincent Vanackere
  0 siblings, 0 replies; 4+ messages in thread
From: Vincent Vanackere @ 2012-01-31  9:06 UTC (permalink / raw)
  To: Mitch Harder; +Cc: linux-btrfs

On Tue, Jan 31, 2012 at 00:13, Mitch Harder
<mitch.harder@sabayonlinux.org> wrote:

> Jeff Mahoney has been working on a large overhaul of error
> handling/BUG_ONs. =C2=A0It is difficult to say when it =C2=A0will be =
ready, or
> if it will even address this specific problem.
>
> I'd go ahead and return the disk. =C2=A0I doubt you'll be the last us=
er to
> have bad sectors, so there'll be more opportunities to see how this
> issue is handled after the changes to error handling.

Ok I'm returning the disk now. Thanks for the help !
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" =
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2012-01-31  9:06 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-01-25 19:03 [PATCH] Btrfs: Check for NULL page in extent_range_uptodate Mitch Harder
2012-01-30 21:41 ` Vincent Vanackere
2012-01-30 23:13   ` Mitch Harder
2012-01-31  9:06     ` Vincent Vanackere

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.