* md: oops on dropping bitmaps from an array
@ 2009-11-30 14:00 Arkadiusz Miskiewicz
  2009-12-01  1:18 ` Neil Brown
  0 siblings, 1 reply; 3+ messages in thread
From: Arkadiusz Miskiewicz @ 2009-11-30 14:00 UTC (permalink / raw)
  To: linux-raid


After reading http://etbe.coker.com.au/2008/01/28/write-intent-bitmaps/ I wanted
to try bitmaps and turned them on, but http://blog.liw.fi/posts/write-intent-bitmaps/
prompted me to run

mdadm --grow --bitmap=none /dev/md3

which ended with the oops below:

2.6.31.5, raid10 array

[2500705.083965] BUG: unable to handle kernel NULL pointer dereference at (null)
[2500705.090142] IP: [<ffffffffa001387a>] bitmap_daemon_work+0x20a/0x500 [md_mod]
[2500705.090142] PGD 0
[2500705.090142] Oops: 0002 [#1] SMP
[2500705.090142] last sysfs file: /sys/devices/pci0000:00/0000:00:1e.0/0000:09:0c.0/local_cpus
[2500705.090142] CPU 5
[2500705.090142] Modules linked in: pppoe pppox ppp_generic slhc configs tcp_diag inet_diag ipmi_watchdog netconsole configfs sit tunnel4 sch_sfq ext3 jbd mbcache raid1 dm_mod e1000 e1000e ipmi_devintf ipmi_si ipmi_msghandler 8021q garp stp xfs exportfs sd_mod crc_t10dif mptsas mptscsih mptbase scsi_transport_sas scsi_mod raid10 md_mod [last unloaded: scsi_wait_scan]
[2500705.090142] Pid: 1423, comm: md2_raid10 xid: #0 Not tainted 2.6.31.5-0.3 #1 S5000VSA
[2500705.090142] RIP: 0010:[<ffffffffa001387a>]  [<ffffffffa001387a>] bitmap_daemon_work+0x20a/0x500 [md_mod]
[2500705.090142] RSP: 0018:ffff8801d6cffca0  EFLAGS: 00010046
[2500705.090142] RAX: 0000000000000000 RBX: ffff8801b8aaff00 RCX: ffff8801d76c82a0
[2500705.090142] RDX: 0000000000000001 RSI: 0000000000000246 RDI: ffff8801b8aaff54
[2500705.090142] RBP: ffff8801d6cffcf0 R08: 0000000000000000 R09: 0008e260c60a0f7c
[2500705.090142] R10: 6f983d4c554e7d25 R11: 00000000ffffffff R12: 0000000000000000
[2500705.090142] R13: ffffea00040c4fa0 R14: 0000000000000246 R15: 0000000000000800
[2500705.090142] FS:  0000000000000000(0000) GS:ffff8800280af000(0000) knlGS:0000000000000000
[2500705.090142] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[2500705.090142] CR2: 0000000000000000 CR3: 0000000107e95000 CR4: 00000000000006e0
[2500705.090142] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[2500705.090142] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[2500705.090142] Process md2_raid10 (pid: 1423, threadinfo ffff8801d6cfe000, task ffff8801d6cf23a0)
[2500705.090142] Stack:
[2500705.090142]  ffff8801d6cffce0 ffff8801b8aaff54 ffff8800280c0140 faadb9534992dcbd
[2500705.090142] <0> ffff8800139dcd00 ffff8801d76c8000 ffff8801d90f5368 00000000000005b1
[2500705.090142] <0> ffff8801d6cffe80 ffff8801d90f5350 ffff8801d6cffd30 ffffffffa000da6b
[2500705.090142] Call Trace:
[2500705.090142]  [<ffffffffa000da6b>] md_check_recovery+0x3b/0x5c0 [md_mod]
[2500705.090142]  [<ffffffffa0024074>] cleanup_module+0x2d34/0x4140 [raid10]
[2500705.090142]  [<ffffffff8146adff>] ? schedule_timeout+0x15f/0x220
[2500705.090142]  [<ffffffff81062740>] ? process_timeout+0x0/0x40
[2500705.090142]  [<ffffffffa000b9f4>] md_register_thread+0x1b4/0x2b0 [md_mod]
[2500705.090142]  [<ffffffff81074c50>] ? autoremove_wake_function+0x0/0x60
[2500705.090142]  [<ffffffffa000b9a0>] ? md_register_thread+0x160/0x2b0 [md_mod]
[2500705.090142]  [<ffffffff81074746>] kthread+0xb6/0xc0
[2500705.090142]  [<ffffffff810042da>] child_rip+0xa/0x20
[2500705.090142]  [<ffffffff81074690>] ? kthread+0x0/0xc0
[2500705.090142]  [<ffffffff810042d0>] ? child_rip+0x0/0x20
[2500705.090142] Code: e8 ac ee ff ff 8b 7b 50 85 ff 0f 85 8b 01 00 00 48 8b 7d b8 e8 a8 97 45 e1 49 8b 55 20 49 89 c6 48 8b 43 78 8d 14 95 01 00 00 00 <0f> b3 10 48 8b 4b 30 4c 89 e6 48 8d 55 c4 48 89 df 83 e9 09 48
[2500705.090142] RIP  [<ffffffffa001387a>] bitmap_daemon_work+0x20a/0x500 [md_mod]
[2500705.090142]  RSP <ffff8801d6cffca0>
[2500705.090142] CR2: 0000000000000000
[2500705.090142] ---[ end trace 5a64b46437e34911 ]---

-- 
Arkadiusz Miśkiewicz        PLD/Linux Team
arekm / maven.pl            http://ftp.pld-linux.org/


* Re: md: oops on dropping bitmaps from an array
  2009-11-30 14:00 md: oops on dropping bitmaps from an array Arkadiusz Miskiewicz
@ 2009-12-01  1:18 ` Neil Brown
  2009-12-01 16:08   ` John Robinson
  0 siblings, 1 reply; 3+ messages in thread
From: Neil Brown @ 2009-12-01  1:18 UTC (permalink / raw)
  To: Arkadiusz Miskiewicz; +Cc: linux-raid

On Mon, 30 Nov 2009 15:00:24 +0100
Arkadiusz Miskiewicz <a.miskiewicz@gmail.com> wrote:

> 
> After reading
> http://etbe.coker.com.au/2008/01/28/write-intent-bitmaps/ wanted to
> check bitmaps, turned these on but
> http://blog.liw.fi/posts/write-intent-bitmaps/ caused me to do

Yes, bitmaps can slow things down.  Using a larger bitmap chunk size
can help, and mdadm-3.1.1 uses a significantly larger chunk size by
default.
I just ran some fairly simple tests on a RAID5 array of five 150G drives.
The test was untarring and then removing the Linux kernel source.

The old default bitmap chunk size is 512K, which resulted in a slowdown
of 7%-9% compared with no bitmap.
The new default size is 64MB, which resulted in a slowdown of 0.3% to 2%.

So there is still a cost, but a smaller one.
In theory, a larger bitmap chunk size makes the resync after a crash
take a little longer, but with a bitmap chunk of this size it would
still be a fraction of 1%.
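To make the chunk size trade-off concrete, here is a back-of-envelope sketch using the drive and chunk sizes from the test above. It assumes "150G" means 150 * 10^9 bytes per member device, that 512K and 64MB are binary units, and that one bitmap bit tracks one chunk of per-device space; these are simplifications for illustration, not figures taken from the md or mdadm sources.

```python
# Back-of-envelope sketch of the bitmap chunk size trade-off.
# Assumptions (not from md/mdadm sources): "150G" = 150 * 10**9 bytes per
# member device, "512K"/"64MB" are binary units, and one bitmap bit
# tracks one chunk of per-device space.
DEVICE_BYTES = 150 * 10**9
KIB = 1024
MIB = 1024 * KIB


def bitmap_bits(device_bytes: int, chunk_bytes: int) -> int:
    """Bits needed to cover the device: ceil(device_bytes / chunk_bytes)."""
    return -(-device_bytes // chunk_bytes)


def resync_fraction_per_bit(device_bytes: int, chunk_bytes: int) -> float:
    """Share of the device that one dirty bit forces to be resynced."""
    return chunk_bytes / device_bytes


if __name__ == "__main__":
    for label, chunk in (("512K (old default)", 512 * KIB),
                         ("64MB (new default)", 64 * MIB)):
        bits = bitmap_bits(DEVICE_BYTES, chunk)
        pct = 100 * resync_fraction_per_bit(DEVICE_BYTES, chunk)
        print(f"{label}: {bits} bits, {pct:.4f}% of the device per dirty bit")
```

Under these assumptions the old default needs 286103 bits per device and the new one 2236, i.e. 128 times fewer chunks. Writes that land in a chunk whose bit is already set need no new bitmap update, which is presumably where most of the saving comes from; the price is that each dirty bit left behind by a crash means resyncing up to 64MB instead of 512K.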

Like any insurance, there is a cost.
  They pay you every time your house burns down,
  You pay them every time it doesn't.

Whether the cost is justified is a decision that has to be made case
by case.


> 
> mdadm --grow --bitmap=none /dev/md3
> 
> which ended with the oops below:
> 
> 2.6.31.5, raid10 array
> 
> [2500705.083965] BUG: unable to handle kernel NULL pointer dereference at (null)
> [2500705.090142] IP: [<ffffffffa001387a>] bitmap_daemon_work+0x20a/0x500

That is bad.  I think I can see what is happening.  I suspect that this
is a fairly hard race to hit - you must have been unlucky :-(  But thanks
very much for reporting it.  I'll see about getting it fixed.  I
probably just need to wrap a mutex around bitmap_daemon_work and grab
it before destroying the bitmap, but I need to read the code more
carefully to make sure.
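The suggested fix is only described in words here. As a rough illustration of the pattern, below is a user-space Python model, not the md code and not the eventual patch: a periodic worker and a teardown path share one mutex, so the bitmap cannot be removed while a pass over it is in progress. All names (Array, Bitmap, daemon_work, destroy_bitmap, bitmap_lock, run_demo) are hypothetical.

```python
import threading


class Bitmap:
    """Hypothetical stand-in for the in-kernel write-intent bitmap."""

    def __init__(self, nbits: int) -> None:
        self.bits = [False] * nbits

    def clear_clean_bits(self) -> None:
        # Stand-in for the periodic daemon pass that clears clean bits.
        for i in range(len(self.bits)):
            self.bits[i] = False


class Array:
    """Hypothetical array object; the names are illustrative, not md's."""

    def __init__(self, nbits: int) -> None:
        self.bitmap = Bitmap(nbits)
        # One mutex shared by the periodic worker and the teardown path.
        self.bitmap_lock = threading.Lock()

    def daemon_work(self) -> bool:
        """One periodic pass; returns False once the bitmap is gone."""
        with self.bitmap_lock:
            # In C, without this lock, teardown could free the bitmap
            # between this check and the use below.
            if self.bitmap is None:
                return False
            self.bitmap.clear_clean_bits()
            return True

    def destroy_bitmap(self) -> None:
        """Teardown (think --bitmap=none); waits for any pass in progress."""
        with self.bitmap_lock:
            self.bitmap = None


def run_demo(nbits: int = 1024) -> bool:
    """Run the worker in a thread while the main thread removes the bitmap."""
    array = Array(nbits)
    stop = threading.Event()

    def worker() -> None:
        # Periodic passes; the short wait happens outside the lock,
        # giving the teardown path a chance to take it.
        while array.daemon_work():
            if stop.wait(0.001):
                break

    t = threading.Thread(target=worker)
    t.start()
    array.destroy_bitmap()
    stop.set()
    t.join()
    return array.bitmap is None and array.daemon_work() is False
```

In Python the garbage collector hides the dangling pointer, so the model cannot crash either way; the point is only the ordering guarantee that the shared lock gives: teardown waits for an in-progress pass, and any later pass sees that the bitmap is gone.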

Thanks,
NeilBrown



* Re: md: oops on dropping bitmaps from an array
  2009-12-01  1:18 ` Neil Brown
@ 2009-12-01 16:08   ` John Robinson
  0 siblings, 0 replies; 3+ messages in thread
From: John Robinson @ 2009-12-01 16:08 UTC (permalink / raw)
  To: Linux RAID

On 01/12/2009 01:18, Neil Brown wrote:
> On Mon, 30 Nov 2009 15:00:24 +0100
> Arkadiusz Miskiewicz <a.miskiewicz@gmail.com> wrote:
[...]
>> [2500705.083965] BUG: unable to handle kernel NULL pointer dereference at (null)
>> [2500705.090142] IP: [<ffffffffa001387a>] bitmap_daemon_work+0x20a/0x500
> 
> That is bad.  I think I can see what is happening.  I suspect that this
> is a fairly hard race to hit - you must have been unlucky :-(

I had an oops removing a bitmap a wee while ago and mentioned it on
this list, but I couldn't post a backtrace. I guess I was unlucky too,
but I'm delighted to hear luck's not going to be an issue in the future :-)

Cheers,

John.


