* md: oops on dropping bitmaps from an array
@ 2009-11-30 14:00 Arkadiusz Miskiewicz
2009-12-01 1:18 ` Neil Brown
0 siblings, 1 reply; 3+ messages in thread
From: Arkadiusz Miskiewicz @ 2009-11-30 14:00 UTC
To: linux-raid
After reading http://etbe.coker.com.au/2008/01/28/write-intent-bitmaps/ I wanted
to try out bitmaps and turned them on, but
http://blog.liw.fi/posts/write-intent-bitmaps/ prompted me to run
mdadm --grow --bitmap=none /dev/md3
which ended with the oops below:
2.6.31.5, raid10 array
[2500705.083965] BUG: unable to handle kernel NULL pointer dereference at
(null)
[2500705.090142] IP: [<ffffffffa001387a>] bitmap_daemon_work+0x20a/0x500
[md_mod]
[2500705.090142] PGD 0
[2500705.090142] Oops: 0002 [#1] SMP
[2500705.090142] last sysfs file:
/sys/devices/pci0000:00/0000:00:1e.0/0000:09:0c.0/local_cpus
[2500705.090142] CPU 5
[2500705.090142] Modules linked in: pppoe pppox ppp_generic slhc configs
tcp_diag inet_diag ipmi_watchdog netconsole configfs sit tunnel4 sch_sfq ext3
jbd mbcache raid1 dm_mod e1000
e1000e ipmi_devintf ipmi_si ipmi_msghandler 8021q garp stp xfs exportfs sd_mod
crc_t10dif mptsas mptscsih mptbase scsi_transport_sas scsi_mod raid10 md_mod
[last unloaded: scsi_wait_scan]
[2500705.090142] Pid: 1423, comm: md2_raid10 xid: #0 Not tainted 2.6.31.5-0.3
#1 S5000VSA
[2500705.090142] RIP: 0010:[<ffffffffa001387a>] [<ffffffffa001387a>]
bitmap_daemon_work+0x20a/0x500 [md_mod]
[2500705.090142] RSP: 0018:ffff8801d6cffca0 EFLAGS: 00010046
[2500705.090142] RAX: 0000000000000000 RBX: ffff8801b8aaff00 RCX:
ffff8801d76c82a0
[2500705.090142] RDX: 0000000000000001 RSI: 0000000000000246 RDI:
ffff8801b8aaff54
[2500705.090142] RBP: ffff8801d6cffcf0 R08: 0000000000000000 R09:
0008e260c60a0f7c
[2500705.090142] R10: 6f983d4c554e7d25 R11: 00000000ffffffff R12:
0000000000000000
[2500705.090142] R13: ffffea00040c4fa0 R14: 0000000000000246 R15:
0000000000000800
[2500705.090142] FS: 0000000000000000(0000) GS:ffff8800280af000(0000)
knlGS:0000000000000000
[2500705.090142] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[2500705.090142] CR2: 0000000000000000 CR3: 0000000107e95000 CR4:
00000000000006e0
[2500705.090142] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[2500705.090142] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
0000000000000400
[2500705.090142] Process md2_raid10 (pid: 1423, threadinfo ffff8801d6cfe000,
task ffff8801d6cf23a0)
[2500705.090142] Stack:
[2500705.090142] ffff8801d6cffce0 ffff8801b8aaff54 ffff8800280c0140
faadb9534992dcbd
[2500705.090142] <0> ffff8800139dcd00 ffff8801d76c8000 ffff8801d90f5368
00000000000005b1
[2500705.090142] <0> ffff8801d6cffe80 ffff8801d90f5350 ffff8801d6cffd30
ffffffffa000da6b
[2500705.090142] Call Trace:
[2500705.090142] [<ffffffffa000da6b>] md_check_recovery+0x3b/0x5c0 [md_mod]
[2500705.090142] [<ffffffffa0024074>] cleanup_module+0x2d34/0x4140 [raid10]
[2500705.090142] [<ffffffff8146adff>] ? schedule_timeout+0x15f/0x220
[2500705.090142] [<ffffffff81062740>] ? process_timeout+0x0/0x40
[2500705.090142] [<ffffffffa000b9f4>] md_register_thread+0x1b4/0x2b0 [md_mod]
[2500705.090142] [<ffffffff81074c50>] ? autoremove_wake_function+0x0/0x60
[2500705.090142] [<ffffffffa000b9a0>] ? md_register_thread+0x160/0x2b0
[md_mod]
[2500705.090142] [<ffffffff81074746>] kthread+0xb6/0xc0
[2500705.090142] [<ffffffff810042da>] child_rip+0xa/0x20
[2500705.090142] [<ffffffff81074690>] ? kthread+0x0/0xc0
[2500705.090142] [<ffffffff810042d0>] ? child_rip+0x0/0x20
[2500705.090142] Code: e8 ac ee ff ff 8b 7b 50 85 ff 0f 85 8b 01 00 00 48 8b
7d b8 e8 a8 97 45 e1 49 8b 55 20 49 89 c6 48 8b 43 78 8d 14 95 01 00 00 00
<0f> b3 10 48 8b 4b 30 4c 89 e6 48 8d
55 c4 48 89 df 83 e9 09 48
[2500705.090142] RIP [<ffffffffa001387a>] bitmap_daemon_work+0x20a/0x500
[md_mod]
[2500705.090142] RSP <ffff8801d6cffca0>
[2500705.090142] CR2: 0000000000000000
[2500705.090142] ---[ end trace 5a64b46437e34911 ]---
--
Arkadiusz Miśkiewicz PLD/Linux Team
arekm / maven.pl http://ftp.pld-linux.org/
* Re: md: oops on dropping bitmaps from an array
2009-11-30 14:00 md: oops on dropping bitmaps from an array Arkadiusz Miskiewicz
@ 2009-12-01 1:18 ` Neil Brown
2009-12-01 16:08 ` John Robinson
0 siblings, 1 reply; 3+ messages in thread
From: Neil Brown @ 2009-12-01 1:18 UTC
To: Arkadiusz Miskiewicz; +Cc: linux-raid
On Mon, 30 Nov 2009 15:00:24 +0100
Arkadiusz Miskiewicz <a.miskiewicz@gmail.com> wrote:
>
> After reading
> http://etbe.coker.com.au/2008/01/28/write-intent-bitmaps/ wanted to
> check bitmaps, turned these on but
> http://blog.liw.fi/posts/write-intent-bitmaps/ caused me to do
Yes, bitmaps can slow things down. Using a larger bitmap chunk size
can help. mdadm-3.1.1 uses a significantly larger chunk size by
default which should help.
I just ran some fairly simple tests on a RAID5 over five 150G drives.
The test was untarring and then removing the linux kernel source.
The old default bitmap chunksize is 512K, which resulted in a slowdown of
7%-9% compared with no bitmap.
The new default size is 64MB, which resulted in a slowdown of 0.3% to 2%.
So there is still a cost, but it is smaller.
A larger bitmap chunksize will theoretically make the resync time after
a crash a little longer, but it would still be a fraction of 1% with
this size of bitmap chunk.
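To put rough numbers on that trade-off, here is a minimal sketch. It is an illustration only: the 150 GiB covered size is an assumption (how much of the array a bitmap actually covers depends on the RAID level), "512K" and "64MB" are read as binary units, and the helper names are made up for this example.

```python
# Back-of-the-envelope bitmap sizing.
# ASSUMPTIONS (not from the test above): the bitmap covers 150 GiB,
# "512K"/"64MB" mean KiB/MiB, and each chunk is tracked by one bit.
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def bitmap_chunks(covered_bytes: int, chunk_bytes: int) -> int:
    """Number of bitmap chunks needed to cover the given size."""
    return -(-covered_bytes // chunk_bytes)  # ceiling division


def resync_fraction_per_chunk(covered_bytes: int, chunk_bytes: int) -> float:
    """Share of the covered area that a single dirty chunk forces to resync."""
    return chunk_bytes / covered_bytes


COVERED = 150 * GIB  # hypothetical covered size

if __name__ == "__main__":
    for label, chunk in (("512K", 512 * KIB), ("64MB", 64 * MIB)):
        chunks = bitmap_chunks(COVERED, chunk)
        share = resync_fraction_per_chunk(COVERED, chunk)
        print(f"{label}: {chunks} chunks, {share:.4%} of the area per dirty chunk")
```

Under these assumptions a larger chunk means far fewer bits to update during normal writes, while one dirty chunk still accounts for well under 1% of the covered area.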
Like any insurance there is a cost.
They pay you every time your house burns down,
You pay them every time it doesn't.
It is a question of when the cost is justified, which needs to be decided
on an individual basis.
>
> mdadm --grow --bitmap=none /dev/md3
>
> which ended with the oops below:
>
> 2.6.31.5, raid10 array
>
> [2500705.083965] BUG: unable to handle kernel NULL pointer dereference at (null)
> [2500705.090142] IP: [<ffffffffa001387a>] bitmap_daemon_work+0x20a/0x500
That is bad. I think I can see what is happening. I suspect that this
is a fairly hard race to hit - you must have been unlucky :-( But thanks
very much for reporting it. I'll see about getting it fixed. I
probably just need to wrap a mutex around bitmap_daemon_work and grab
it before destroying the bitmap, but I need to read the code more
carefully and make sure.
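The general idea can be sketched as a userspace analogy. This is not the md code and not necessarily the fix that was eventually applied: the class names, the `bitmap_lock` attribute and the timing are all hypothetical, and Python threads stand in for the kernel thread and the mdadm request.

```python
import threading
import time


class Bitmap:
    """Toy stand-in for a bitmap: just a list of per-page counters."""

    def __init__(self, pages: int) -> None:
        self.pages = [0] * pages


class Array:
    """Toy stand-in for an array whose bitmap can be removed at runtime."""

    def __init__(self, pages: int) -> None:
        self.bitmap = Bitmap(pages)
        self.bitmap_lock = threading.Lock()  # hypothetical name
        self.passes = 0

    def daemon_work(self) -> bool:
        """One housekeeping pass; returns False once the bitmap is gone."""
        with self.bitmap_lock:
            bitmap = self.bitmap
            if bitmap is None:  # bitmap was destroyed: nothing to touch
                return False
            for i in range(len(bitmap.pages)):
                bitmap.pages[i] += 1
            self.passes += 1
            return True

    def destroy_bitmap(self) -> None:
        """Drop the bitmap; blocks until any in-progress pass has finished."""
        with self.bitmap_lock:
            self.bitmap = None


def run_demo() -> int:
    """Run the daemon in a thread, then remove the bitmap underneath it."""
    array = Array(pages=64)
    first_pass_done = threading.Event()

    def worker() -> None:
        while array.daemon_work():
            first_pass_done.set()
            time.sleep(0.001)  # periodic daemon: idle between passes

    thread = threading.Thread(target=worker)
    thread.start()
    first_pass_done.wait()
    array.destroy_bitmap()
    thread.join()
    return array.passes


if __name__ == "__main__":
    print("passes completed before removal:", run_demo())
```

Because both the housekeeping pass and the teardown hold the same lock, the pass either sees a complete bitmap or sees that it is gone; it can never dereference a half-destroyed one.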
Thanks,
NeilBrown
* Re: md: oops on dropping bitmaps from an array
2009-12-01 1:18 ` Neil Brown
@ 2009-12-01 16:08 ` John Robinson
0 siblings, 0 replies; 3+ messages in thread
From: John Robinson @ 2009-12-01 16:08 UTC
To: Linux RAID
On 01/12/2009 01:18, Neil Brown wrote:
> On Mon, 30 Nov 2009 15:00:24 +0100
> Arkadiusz Miskiewicz <a.miskiewicz@gmail.com> wrote:
[...]
>> [2500705.083965] BUG: unable to handle kernel NULL pointer dereference at (null)
>> [2500705.090142] IP: [<ffffffffa001387a>] bitmap_daemon_work+0x20a/0x500
>
> That is bad. I think I can see what is happening. I suspect that this
> is a fairly hard race to hit - you must have been unlucky :-(
I had an oops removing a bitmap a wee while ago, and I mentioned it on
this list, but I couldn't post a backtrace. I guess I was unlucky too
but I'm delighted to hear luck's not going to be an issue in the future :-)
Cheers,
John.