* Is Swapping on software RAID1 possible in linux 2.4 ?
@ 2001-07-05 11:24 ` Peter Zaitsev
  2001-07-05 12:13 ` Neil Brown
  ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread

From: Peter Zaitsev @ 2001-07-05 11:24 UTC (permalink / raw)
To: linux-kernel

Hello linux-kernel,

Does anyone have information on this subject?  I am seeing constant
failures when the system swaps on RAID1, and I just wanted to be sure
whether this may be the problem or not.  It works without any problems
with the 2.2 kernel.

-- 
Best regards,
Peter                          mailto:pz@spylog.ru

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: Is Swapping on software RAID1 possible in linux 2.4 ?
  2001-07-05 11:24 ` Is Swapping on software RAID1 possible in linux 2.4 ? Peter Zaitsev
@ 2001-07-05 12:13 ` Neil Brown
  2001-07-05 13:22   ` Re[2]: " Peter Zaitsev
  ` (2 subsequent siblings)
  3 siblings, 0 replies; 13+ messages in thread

From: Neil Brown @ 2001-07-05 12:13 UTC (permalink / raw)
To: Peter Zaitsev; +Cc: linux-kernel

On Thursday July 5, pz@spylog.ru wrote:
> Hello linux-kernel,
>
> Does anyone have information on this subject?  I am seeing constant
> failures when the system swaps on RAID1, and I just wanted to be sure
> whether this may be the problem or not.  It works without any problems
> with the 2.2 kernel.

It certainly should work in 2.4.  What sort of "constant failures" are
you experiencing?

Though it does appear to work in 2.2, there is a possibility of data
corruption if you swap onto a raid1 array that is resyncing.  This
possibility does not exist in 2.4.

NeilBrown

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re[2]: Is Swapping on software RAID1 possible in linux 2.4 ?
  2001-07-05 11:24 ` Is Swapping on software RAID1 possible in linux 2.4 ? Peter Zaitsev
  2001-07-05 12:13 ` Neil Brown
@ 2001-07-05 13:22 ` Peter Zaitsev
  2001-07-05 13:42   ` Arjan van de Ven
  ` (2 more replies)
  2001-07-05 14:54 ` Nick DeClario
  2001-07-06  9:38 ` Re[2]: " Peter Zaitsev
  3 siblings, 3 replies; 13+ messages in thread

From: Peter Zaitsev @ 2001-07-05 13:22 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-kernel

Hello Neil,

Thursday, July 05, 2001, 4:13:00 PM, you wrote:

NB> On Thursday July 5, pz@spylog.ru wrote:
>> Hello linux-kernel,
>>
>> Does anyone have information on this subject?  I am seeing constant
>> failures when the system swaps on RAID1, and I just wanted to be sure
>> whether this may be the problem or not.  It works without any problems
>> with the 2.2 kernel.

NB> It certainly should work in 2.4.  What sort of "constant failures" are
NB> you experiencing?

NB> Though it does appear to work in 2.2, there is a possibility of data
NB> corruption if you swap onto a raid1 array that is resyncing.  This
NB> possibility does not exist in 2.4.

The problem is that I am constantly getting these X-order-allocation
errors in the kernel log, after which the system becomes unstable and
often hangs, or leaves processes which cannot be killed even by the
"-9" signal.
The installed debugging patches produce the following allocation paths:

> Jun 20 05:56:14 tor kernel: Call Trace: [__get_free_pages+20/36]
> [__get_free_pages+20/36] [kmem_cache_grow+187/520] [kmalloc+183/224]
> [raid1_alloc_r1bh+105/256] [raid1_make_request+832/852]
> [raid1_make_request+80/852]
> Jun 20 05:56:14 tor kernel: [md_make_request+79/124]
> [generic_make_request+293/308] [submit_bh+87/116] [brw_page+143/160]
> [rw_swap_page_base+336/428] [rw_swap_page+112/184] [swap_writepage+120/128]
> [page_launder+644/2132]
> Jun 20 05:56:14 tor kernel: [do_try_to_free_pages+52/124]
> [kswapd+89/228] [kernel_thread+40/56]

one more trace (partially garbled by interleaved console output):

SR>> Jun 19 09:50:08 garnet kernel: __alloc_pages: 0-order allocation failed.
SR>> Jun 19 09:50:08 garnet kernel: __alloc_pages: 0-order allocation failed from
SR>> c01Jun 19 09:50:08 garnet kernel: ^M^Mf4a2bc74 c024ac20 00000000 c012ca09
SR>> c024abe0
SR>> Jun 19 09:50:08 garnet kernel: 00000008 c03225e0 00000003 00000001
SR>> c029c9Jun 19 09:50:08 garnet kernel: f0ebb760 00000001 00000008
SR>> c03225e0 c0197bJun 19 09:50:08 garnet kernel: Call Trace:
SR>> [alloc_bounce_page+13/140] [alloc_bouJun 19 09:50:08 garnet kernel:
SR>> [raid1_make_request+832/852] [md_make_requJun 19 09:50:08 garnet kernel:
SR>> [swap_writepage+120/128] [page_launder+644Jun 19 09:50:08 garnet kernel:
SR>> [sock_poll+35/40] [do_select+230/476] [sysJun 19 10:21:27 garnet kernel:
SR>> sending pkt_too_big to self
SR>> Jun 19 10:21:55 garnet kernel: sending pkt_too_big to self
SR>> Jun 19 10:34:36 garnet kernel: sending pkt_too_big to self
SR>> Jun 19 10:35:33 garnet last message repeated 2 times
SR>> Jun 19 10:36:50 garnet kernel: sending pkt_too_big to self

That's why I thought this problem is related to the raid1 swapping I'm
using.  Of course, I'm speaking about a synced RAID1.

-- 
Best regards,
Peter                          mailto:pz@spylog.ru

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: Is Swapping on software RAID1 possible in linux 2.4 ?
  2001-07-05 13:22 ` Re[2]: " Peter Zaitsev
@ 2001-07-05 13:42 ` Arjan van de Ven
  2001-07-05 18:56   ` Pete Zaitcev
  2001-07-12  1:14   ` Re[2]: " Neil Brown
  2 siblings, 0 replies; 13+ messages in thread

From: Arjan van de Ven @ 2001-07-05 13:42 UTC (permalink / raw)
To: Peter Zaitsev, linux-kernel

Peter Zaitsev wrote:
>
> That's why I thought this problem is related to the raid1 swapping I'm
> using.

Well, there is the potential problem that RAID1 cannot avoid allocating
memory on some occasions, for the 2nd bufferhead.  ATARAID raid0 has the
same problem for now, and there is no real solution to this.  You can
pre-allocate a bunch of bufferheads, but under high load you will run
out of those, no matter how many you pre-allocate.  Of course you can
then wait for the "in flight" ones to become available again, and that
is the best thing I've come up with so far.

It would be nice if the 3 subsystems that need such bufferheads now
(MD RAID1, ATARAID RAID0 and the bouncebuffer(head) code) could share
their pool.

Greetings,
   Arjan van de Ven

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: Is Swapping on software RAID1 possible in linux 2.4 ?
  2001-07-05 13:22 ` Re[2]: " Peter Zaitsev
  2001-07-05 13:42 ` Arjan van de Ven
@ 2001-07-05 18:56 ` Pete Zaitcev
  2001-07-12  1:14   ` Re[2]: " Neil Brown
  2 siblings, 0 replies; 13+ messages in thread

From: Pete Zaitcev @ 2001-07-05 18:56 UTC (permalink / raw)
To: linux-kernel

In linux-kernel, you wrote:

> Peter Zaitsev wrote:
> >
> > That's why I thought this problem is related to the raid1 swapping I'm
> > using.
>
> Well, there is the potential problem that RAID1 cannot avoid allocating
> memory on some occasions, for the 2nd bufferhead.  ATARAID raid0 has the
> same problem for now, and there is no real solution to this.  You can
> pre-allocate a bunch of bufferheads, but under high load you will run
> out of those, no matter how many you pre-allocate.

Arjan, why doesn't it sleep instead (GFP_KERNEL)?

-- Pete

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: Re[2]: Is Swapping on software RAID1 possible in linux 2.4 ?
  2001-07-05 13:22 ` Re[2]: " Peter Zaitsev
  2001-07-05 13:42 ` Arjan van de Ven
  2001-07-05 18:56 ` Pete Zaitcev
@ 2001-07-12  1:14 ` Neil Brown
  2001-07-12  1:48   ` Andrew Morton
  2 siblings, 1 reply; 13+ messages in thread

From: Neil Brown @ 2001-07-12 1:14 UTC (permalink / raw)
To: Peter Zaitsev; +Cc: linux-kernel

On Thursday July 5, pz@spylog.ru wrote:
> Hello Neil,
>
> Thursday, July 05, 2001, 4:13:00 PM, you wrote:
>
> NB> On Thursday July 5, pz@spylog.ru wrote:
> >> Hello linux-kernel,
> >>
> >> Does anyone have information on this subject?  I am seeing constant
> >> failures when the system swaps on RAID1, and I just wanted to be sure
> >> whether this may be the problem or not.  It works without any problems
> >> with the 2.2 kernel.
>
> NB> It certainly should work in 2.4.  What sort of "constant failures" are
> NB> you experiencing?
>
> NB> Though it does appear to work in 2.2, there is a possibility of data
> NB> corruption if you swap onto a raid1 array that is resyncing.  This
> NB> possibility does not exist in 2.4.
>
> The problem is that I am constantly getting these X-order-allocation
> errors in the kernel log, after which the system becomes unstable and
> often hangs, or leaves processes which cannot be killed even by the
> "-9" signal.
> The installed debugging patches produce the following allocation paths:

These "X-order-allocation" failures are just an indication that you are
running out of memory.  raid1 is explicitly written to cope: if memory
allocation fails, it waits for some to be freed, and it has made sure in
advance that there is some memory that it will get first dibs on when it
becomes free, so there is no risk of deadlock.

However, this does not explain why you are getting unkillable processes.

Can you try putting swap on just one of the partitions that you raid1
together, instead of on the raid1 array, and see if you can still get
processes to become unkillable?
Also, can you find out what the process is doing when it is unkillable?
If you compile with alt-sysrq support, then alt-sysrq-t should print the
process table.  If you can get this out of dmesg and run it through
ksymoops, it might be most interesting.

NeilBrown

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: Is Swapping on software RAID1 possible in linux 2.4 ?
  2001-07-12  1:14 ` Re[2]: " Neil Brown
@ 2001-07-12  1:48 ` Andrew Morton
  2001-07-12  3:22   ` Neil Brown
  0 siblings, 1 reply; 13+ messages in thread

From: Andrew Morton @ 2001-07-12 1:48 UTC (permalink / raw)
To: Neil Brown; +Cc: Peter Zaitsev, linux-kernel

Neil Brown wrote:
>
> Also, can you find out what the process is doing when it is unkillable?
> If you compile with alt-sysrq support, then alt-sysrq-t should print the
> process table.  If you can get this out of dmesg and run it through
> ksymoops, it might be most interesting.

Neil, he showed us a trace the other day - kswapd was stuck in
raid1_alloc_r1bh().  This is basically the same situation as I had
yesterday, where bdflush was stuck in the same place.

It is completely fatal to the VM for these two processes to get stuck
in this way.

The approach I took was to beef up the reserved bh queues and to keep a
number of them reserved *only* for the swapout and dirty buffer flush
functions.  That way, we have at hand the memory we need to be able to
free up memory.

It was necessary to define a new task_struct.flags bit so we can
identify when the caller is a `buffer flusher' - I expect we'll need
that in other places as well.

An easy way to demonstrate the problem is to put ext3 on RAID1, boot
with `mem=64m' and run `dd if=/dev/zero of=foo bs=1024k count=1k'.
The machine wedges on the first run.  This is due to a bdflush deadlock.
Once swap is on RAID1, there will be kswapd deadlocks as well.  The
patch *should* fix those, but I haven't tested that.

Could you please review these changes?

BTW: I removed the initial buffer_head reservation code.  It's not
necessary with the modified reservation algorithm - as soon as we start
to use the device, the reserve pools will build up.  There will be a
deadlock opportunity if the machine is totally and utterly oom when the
RAID device initially starts up, but it's really not worth the code
space to even bother about this.
--- linux-2.4.6/include/linux/sched.h	Wed May  2 22:00:07 2001
+++ lk-ext3/include/linux/sched.h	Thu Jul 12 01:03:20 2001
@@ -413,7 +418,7 @@ struct task_struct {
 #define PF_SIGNALED	0x00000400	/* killed by a signal */
 #define PF_MEMALLOC	0x00000800	/* Allocating memory */
 #define PF_VFORK	0x00001000	/* Wake up parent in mm_release */
-
+#define PF_FLUSH	0x00002000	/* Flushes buffers to disk */
 #define PF_USEDFPU	0x00100000	/* task used FPU this quantum (SMP) */
 
 /*
--- linux-2.4.6/include/linux/raid/raid1.h	Tue Dec 12 08:20:08 2000
+++ lk-ext3/include/linux/raid/raid1.h	Thu Jul 12 01:15:39 2001
@@ -37,12 +37,12 @@ struct raid1_private_data {
 	/* buffer pool */
 	/* buffer_heads that we have pre-allocated have b_pprev -> &freebh
 	 * and are linked into a stack using b_next
-	 * raid1_bh that are pre-allocated have R1BH_PreAlloc set.
 	 * All these variable are protected by device_lock
 	 */
 	struct buffer_head	*freebh;
 	int			freebh_cnt;	/* how many are on the list */
 	struct raid1_bh		*freer1;
+	unsigned		freer1_cnt;
 	struct raid1_bh		*freebuf;	/* each bh_req has a page allocated */
 	md_wait_queue_head_t	wait_buffer;
@@ -87,5 +87,4 @@ struct raid1_bh {
 /* bits for raid1_bh.state */
 #define	R1BH_Uptodate	1
 #define	R1BH_SyncPhase	2
-#define	R1BH_PreAlloc	3 /* this was pre-allocated, add to free list */
 #endif
--- linux-2.4.6/fs/buffer.c	Wed Jul  4 18:21:31 2001
+++ lk-ext3/fs/buffer.c	Thu Jul 12 01:03:57 2001
@@ -2685,6 +2748,7 @@ int bdflush(void *sem)
 	sigfillset(&tsk->blocked);
 	recalc_sigpending(tsk);
 	spin_unlock_irq(&tsk->sigmask_lock);
+	current->flags |= PF_FLUSH;
 
 	up((struct semaphore *)sem);
@@ -2726,6 +2790,7 @@ int kupdate(void *sem)
 	siginitsetinv(&current->blocked, sigmask(SIGCONT) | sigmask(SIGSTOP));
 	recalc_sigpending(tsk);
 	spin_unlock_irq(&tsk->sigmask_lock);
+	current->flags |= PF_FLUSH;
 
 	up((struct semaphore *)sem);
--- linux-2.4.6/drivers/md/raid1.c	Wed Jul  4 18:21:26 2001
+++ lk-ext3/drivers/md/raid1.c	Thu Jul 12 01:28:58 2001
@@ -51,6 +51,28 @@ static mdk_personality_t raid1_personali
 static md_spinlock_t retry_list_lock = MD_SPIN_LOCK_UNLOCKED;
 struct raid1_bh *raid1_retry_list = NULL, **raid1_retry_tail;
 
+/*
+ * We need to scale the number of reserved buffers by the page size
+ * to make writepage()s sucessful. --akpm
+ */
+#define R1_BLOCKS_PP	(PAGE_CACHE_SIZE / 1024)
+#define FREER1_MEMALLOC_RESERVED	(16 * R1_BLOCKS_PP)
+
+/*
+ * Return true if the caller make take a bh from the list.
+ * PF_FLUSH and PF_MEMALLOC tasks are allowed to use the reserves, because
+ * they're trying to *free* some memory.
+ *
+ * Requires that conf->device_lock be held.
+ */
+static int may_take_bh(raid1_conf_t *conf, int cnt)
+{
+	int min_free = (current->flags & (PF_FLUSH|PF_MEMALLOC)) ?
+			cnt :
+			(cnt + FREER1_MEMALLOC_RESERVED * conf->raid_disks);
+	return conf->freebh_cnt >= min_free;
+}
+
 static struct buffer_head *raid1_alloc_bh(raid1_conf_t *conf, int cnt)
 {
 	/* return a linked list of "cnt" struct buffer_heads.
@@ -62,7 +84,7 @@ static struct buffer_head *raid1_alloc_b
 	while(cnt) {
 		struct buffer_head *t;
 		md_spin_lock_irq(&conf->device_lock);
-		if (conf->freebh_cnt >= cnt)
+		if (may_take_bh(conf, cnt))
 			while (cnt) {
 				t = conf->freebh;
 				conf->freebh = t->b_next;
@@ -83,7 +105,7 @@ static struct buffer_head *raid1_alloc_b
 			cnt--;
 		} else {
 			PRINTK("raid1: waiting for %d bh\n", cnt);
-			wait_event(conf->wait_buffer, conf->freebh_cnt >= cnt);
+			wait_event(conf->wait_buffer, may_take_bh(conf, cnt));
 		}
 	}
 	return bh;
@@ -96,9 +118,9 @@ static inline void raid1_free_bh(raid1_c
 	while (bh) {
 		struct buffer_head *t = bh;
 		bh=bh->b_next;
-		if (t->b_pprev == NULL)
+		if (conf->freebh_cnt >= FREER1_MEMALLOC_RESERVED) {
 			kfree(t);
-		else {
+		} else {
 			t->b_next= conf->freebh;
 			conf->freebh = t;
 			conf->freebh_cnt++;
@@ -108,29 +130,6 @@ static inline void raid1_free_bh(raid1_c
 	wake_up(&conf->wait_buffer);
 }
 
-static int raid1_grow_bh(raid1_conf_t *conf, int cnt)
-{
-	/* allocate cnt buffer_heads, possibly less if kalloc fails */
-	int i = 0;
-
-	while (i < cnt) {
-		struct buffer_head *bh;
-		bh = kmalloc(sizeof(*bh), GFP_KERNEL);
-		if (!bh) break;
-		memset(bh, 0, sizeof(*bh));
-
-		md_spin_lock_irq(&conf->device_lock);
-		bh->b_pprev = &conf->freebh;
-		bh->b_next = conf->freebh;
-		conf->freebh = bh;
-		conf->freebh_cnt++;
-		md_spin_unlock_irq(&conf->device_lock);
-
-		i++;
-	}
-	return i;
-}
-
 static int raid1_shrink_bh(raid1_conf_t *conf, int cnt)
 {
 	/* discard cnt buffer_heads, if we can find them */
@@ -147,7 +146,16 @@ static int raid1_shrink_bh(raid1_conf_t
 	md_spin_unlock_irq(&conf->device_lock);
 	return i;
 }
-
+
+/*
+ * Return true if the caller make take a raid1_bh from the list.
+ * Requires that conf->device_lock be held.
+ */
+static int may_take_r1bh(raid1_conf_t *conf)
+{
+	return ((conf->freer1_cnt > FREER1_MEMALLOC_RESERVED) ||
+		(current->flags & (PF_FLUSH|PF_MEMALLOC))) && conf->freer1;
+}
 
 static struct raid1_bh *raid1_alloc_r1bh(raid1_conf_t *conf)
 {
@@ -155,8 +163,9 @@ static struct raid1_bh *raid1_alloc_r1bh
 	do {
 		md_spin_lock_irq(&conf->device_lock);
-		if (conf->freer1) {
+		if (may_take_r1bh(conf)) {
 			r1_bh = conf->freer1;
+			conf->freer1_cnt--;
 			conf->freer1 = r1_bh->next_r1;
 			r1_bh->next_r1 = NULL;
 			r1_bh->state = 0;
@@ -170,7 +179,7 @@ static struct raid1_bh *raid1_alloc_r1bh
 			memset(r1_bh, 0, sizeof(*r1_bh));
 			return r1_bh;
 		}
-		wait_event(conf->wait_buffer, conf->freer1);
+		wait_event(conf->wait_buffer, may_take_r1bh(conf));
 	} while (1);
 }
 
@@ -178,49 +187,30 @@ static inline void raid1_free_r1bh(struc
 {
 	struct buffer_head *bh = r1_bh->mirror_bh_list;
 	raid1_conf_t *conf = mddev_to_conf(r1_bh->mddev);
+	unsigned long flags;
 
 	r1_bh->mirror_bh_list = NULL;
 
-	if (test_bit(R1BH_PreAlloc, &r1_bh->state)) {
-		unsigned long flags;
-		spin_lock_irqsave(&conf->device_lock, flags);
+	spin_lock_irqsave(&conf->device_lock, flags);
+	if (conf->freer1_cnt < FREER1_MEMALLOC_RESERVED) {
 		r1_bh->next_r1 = conf->freer1;
 		conf->freer1 = r1_bh;
+		conf->freer1_cnt++;
 		spin_unlock_irqrestore(&conf->device_lock, flags);
 	} else {
+		spin_unlock_irqrestore(&conf->device_lock, flags);
 		kfree(r1_bh);
 	}
 	raid1_free_bh(conf, bh);
 }
 
-static int raid1_grow_r1bh (raid1_conf_t *conf, int cnt)
-{
-	int i = 0;
-
-	while (i < cnt) {
-		struct raid1_bh *r1_bh;
-		r1_bh = (struct raid1_bh*)kmalloc(sizeof(*r1_bh), GFP_KERNEL);
-		if (!r1_bh)
-			break;
-		memset(r1_bh, 0, sizeof(*r1_bh));
-
-		md_spin_lock_irq(&conf->device_lock);
-		set_bit(R1BH_PreAlloc, &r1_bh->state);
-		r1_bh->next_r1 = conf->freer1;
-		conf->freer1 = r1_bh;
-		md_spin_unlock_irq(&conf->device_lock);
-
-		i++;
-	}
-	return i;
-}
-
 static void raid1_shrink_r1bh(raid1_conf_t *conf)
 {
 	md_spin_lock_irq(&conf->device_lock);
 	while (conf->freer1) {
 		struct raid1_bh *r1_bh = conf->freer1;
 		conf->freer1 = r1_bh->next_r1;
+		conf->freer1_cnt--;	/* pedantry */
 		kfree(r1_bh);
 	}
 	md_spin_unlock_irq(&conf->device_lock);
@@ -1610,21 +1600,6 @@ static int raid1_run (mddev_t *mddev)
 		goto out_free_conf;
 	}
 
-
-	/* pre-allocate some buffer_head structures.
-	 * As a minimum, 1 r1bh and raid_disks buffer_heads
-	 * would probably get us by in tight memory situations,
-	 * but a few more is probably a good idea.
-	 * For now, try 16 r1bh and 16*raid_disks bufferheads
-	 * This will allow at least 16 concurrent reads or writes
-	 * even if kmalloc starts failing
-	 */
-	if (raid1_grow_r1bh(conf, 16) < 16 ||
-	    raid1_grow_bh(conf, 16*conf->raid_disks)< 16*conf->raid_disks) {
-		printk(MEM_ERROR, mdidx(mddev));
-		goto out_free_conf;
-	}
-
 	for (i = 0; i < MD_SB_DISKS; i++) {
 		descriptor = sb->disks+i;
@@ -1713,6 +1688,8 @@ out_free_conf:
 	raid1_shrink_r1bh(conf);
 	raid1_shrink_bh(conf, conf->freebh_cnt);
 	raid1_shrink_buffers(conf);
+	if (conf->freer1_cnt != 0)
+		BUG();
 	kfree(conf);
 	mddev->private = NULL;
 out:

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: Is Swapping on software RAID1 possible in linux 2.4 ?
  2001-07-12  1:48 ` Andrew Morton
@ 2001-07-12  3:22 ` Neil Brown
  2001-07-12  4:53   ` Andrew Morton
  0 siblings, 1 reply; 13+ messages in thread

From: Neil Brown @ 2001-07-12 3:22 UTC (permalink / raw)
To: Andrew Morton; +Cc: Peter Zaitsev, linux-kernel

On Thursday July 12, andrewm@uow.edu.au wrote:
>
> Could you please review these changes?

I think I see what you are trying to do, and there is nothing obviously
wrong except this comment :-)

> + * Return true if the caller make take a raid1_bh from the list.
                               ^^^^

but now that I see what the problem is, I think a simpler patch would be

--- drivers/md/raid1.c	2001/07/12 02:00:35	1.1
+++ drivers/md/raid1.c	2001/07/12 02:01:42
@@ -83,6 +83,7 @@
 			cnt--;
 		} else {
 			PRINTK("raid1: waiting for %d bh\n", cnt);
+			run_task_queue(&tq_disk);
 			wait_event(conf->wait_buffer, conf->freebh_cnt >= cnt);
 		}
 	}
@@ -170,6 +171,7 @@
 			memset(r1_bh, 0, sizeof(*r1_bh));
 			return r1_bh;
 		}
+		run_task_queue(&tq_disk);
 		wait_event(conf->wait_buffer, conf->freer1);
 	} while (1);
 }

This is needed anyway to be "correct", as you should always unplug the
queues before waiting for IO to complete.

On the issue of whether to pre-allocate some reserved structures or not,
I think it's six of one, half a dozen of the other.  My rationale for
pre-allocating was that the buffers that we hold on to would have been
allocated together, and so are probably fairly dense within their pages,
so there is no risk of hogging excess memory that isn't actually being
used.

Mind you, if I was really serious about being gentle on the memory
allocation, I would use

	kmem_cache_alloc(bh_cachep, SLAB_whatever)

instead of

	kmalloc(sizeof(struct buffer_head), GFP_whatever)

but I hadn't 'got' the slab stuff properly when I was writing that code.

Peter, does the above little patch help your problem?

NeilBrown

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: Is Swapping on software RAID1 possible in linux 2.4 ?
  2001-07-12  3:22 ` Neil Brown
@ 2001-07-12  4:53 ` Andrew Morton
  0 siblings, 0 replies; 13+ messages in thread

From: Andrew Morton @ 2001-07-12 4:53 UTC (permalink / raw)
To: Neil Brown; +Cc: Peter Zaitsev, linux-kernel

Neil Brown wrote:
>
> --- drivers/md/raid1.c	2001/07/12 02:00:35	1.1
> +++ drivers/md/raid1.c	2001/07/12 02:01:42
> @@ -83,6 +83,7 @@
>  			cnt--;
>  		} else {
>  			PRINTK("raid1: waiting for %d bh\n", cnt);
> +			run_task_queue(&tq_disk);
>  			wait_event(conf->wait_buffer, conf->freebh_cnt >= cnt);
>  		}
>  	}
> @@ -170,6 +171,7 @@
>  			memset(r1_bh, 0, sizeof(*r1_bh));
>  			return r1_bh;
>  		}
> +		run_task_queue(&tq_disk);
>  		wait_event(conf->wait_buffer, conf->freer1);
>  	} while (1);
>  }
>
> This is needed anyway to be "correct", as you should always unplug
> the queues before waiting for IO to complete.

The problem with this approach is the waitqueue - you get several tasks
on the waitqueue, and bdflush loses the race: some other thread steals
the r1bh and bdflush goes back to sleep.

Replacing the wait_event() with a special raid1_wait_event() which
unplugs *each time* the caller is woken does help - but it is still
easy to deadlock the system.

Clearly this approach is racy: it assumes that the reserved buffers have
actually been submitted when we unplug - they may not yet have been.
But the lockup is too easy to trigger for that to be a satisfactory
explanation.

The most effective, aggressive, successful and grotty fix for this
problem is to remove the wait_event altogether and replace it with:

	run_task_queue(&tq_disk);
	current->policy |= SCHED_YIELD;
	__set_current_state(TASK_RUNNING);
	schedule();

This can still deadlock in bad OOM situations, but I think we're dead
anyway.  A combination of this approach plus the PF_FLUSH reservations
would work even better, but I found the PF_FLUSH stuff was sufficient.
> Mind you, if I was really serious about being gentle on the memory
> allocation, I would use
>	kmem_cache_alloc(bh_cachep, SLAB_whatever)
> instead of
>	kmalloc(sizeof(struct buffer_head), GFP_whatever)

get_unused_buffer_head() and put_unused_buffer_head() should be
exported API functions.

-

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: Is Swapping on software RAID1 possible in linux 2.4 ?
  2001-07-05 11:24 ` Is Swapping on software RAID1 possible in linux 2.4 ? Peter Zaitsev
  2001-07-05 12:13 ` Neil Brown
  2001-07-05 13:22 ` Re[2]: " Peter Zaitsev
@ 2001-07-05 14:54 ` Nick DeClario
  2001-07-05 15:12   ` Joseph Bueno
  2001-07-11 12:08   ` Paul Jakma
  2001-07-06  9:38 ` Re[2]: " Peter Zaitsev
  3 siblings, 2 replies; 13+ messages in thread

From: Nick DeClario @ 2001-07-05 14:54 UTC (permalink / raw)
To: Peter Zaitsev; +Cc: linux-kernel

Just out of curiosity, what are the advantages of having a RAID1 swap
partition?  Setting the swap priority to 0 (pri=0) in the fstab for all
of the swap partitions on your system should have the same effect as
doing it with RAID, but without the overhead, right?  RAID1 would also
mirror your swap.  Why would you want that?

Regards,
	-Nick

Peter Zaitsev wrote:
>
> Hello linux-kernel,
>
> Does anyone have information on this subject?  I am seeing constant
> failures when the system swaps on RAID1, and I just wanted to be sure
> whether this may be the problem or not.  It works without any problems
> with the 2.2 kernel.
>
> --
> Best regards,
> Peter                          mailto:pz@spylog.ru
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

-- 
Nicholas DeClario
Systems Engineer
Guardian Digital, Inc.			(201) 934-9230
Pioneering. Open Source. Security.	nick@guardiandigital.com
					http://www.guardiandigital.com

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: Is Swapping on software RAID1 possible in linux 2.4 ?
  2001-07-05 14:54 ` Nick DeClario
@ 2001-07-05 15:12 ` Joseph Bueno
  1 sibling, 0 replies; 13+ messages in thread

From: Joseph Bueno @ 2001-07-05 15:12 UTC (permalink / raw)
To: nick; +Cc: Peter Zaitsev, linux-kernel

Nick DeClario wrote:
>
> Just out of curiosity, what are the advantages of having a RAID1 swap
> partition?  Setting the swap priority to 0 (pri=0) in the fstab for all
> of the swap partitions on your system should have the same effect as
> doing it with RAID, but without the overhead, right?  RAID1 would also
> mirror your swap.  Why would you want that?
>
> Regards,
> 	-Nick

Hi,

Setting the same swap priority on several partitions is equivalent to
RAID0 (striping), not RAID1 (mirroring).

Mirroring your swap partition is important because if the disk
containing your swap fails, your system is dead.  If you want to keep
your system running even if one disk fails, you need to mirror ALL your
active partitions, including swap.

If you only mirror your data partitions, you are only protected against
data loss in case of a disk crash (and only assuming you shut down
gracefully before the system panics while trying to read/write a
crashed swap partition and leaves your data in some inconsistent
state).

Regards
--
Joseph Bueno

^ permalink raw reply	[flat|nested] 13+ messages in thread
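[Editorial note: to make the striping-vs-mirroring distinction concrete,
here is what the two setups look like in /etc/fstab.  The device names
(sda5, sdb5, md0) are hypothetical examples; the pri= option and the
round-robin use of equal-priority swap areas are as documented in
swapon(2)/swapon(8).]

```text
# Striping-style swap: two plain partitions at the same priority.
# The kernel round-robins swap pages across both, which is fast,
# but a failure of either disk loses the pages swapped onto it.
/dev/sda5   none   swap   sw,pri=1   0 0
/dev/sdb5   none   swap   sw,pri=1   0 0

# Mirrored swap: md0 is a RAID1 array built from the two partitions,
# so either disk can fail without losing any swapped page.
/dev/md0    none   swap   sw         0 0
```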
* Re: Is Swapping on software RAID1 possible in linux 2.4 ?
  2001-07-05 14:54 ` Nick DeClario
  2001-07-05 15:12 ` Joseph Bueno
@ 2001-07-11 12:08 ` Paul Jakma
  1 sibling, 0 replies; 13+ messages in thread

From: Paul Jakma @ 2001-07-11 12:08 UTC (permalink / raw)
To: Nick DeClario; +Cc: Peter Zaitsev, linux-kernel

On Thu, 5 Jul 2001, Nick DeClario wrote:

> RAID1 would also mirror your swap.  Why would you want that?

Redundancy.  There is no point having your data redundant if your swap
isn't - one drive failure will take out the box the moment it tries to
access swap on the failed drive.

PS: I have 2 boxes deployed running RH's 2.4.2, with swap on top of LVM
on top of RAID1.  No problems so far, even during resync.

--paulj

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re[2]: Is Swapping on software RAID1 possible in linux 2.4 ?
  2001-07-05 11:24 ` Is Swapping on software RAID1 possible in linux 2.4 ? Peter Zaitsev
  ` (2 preceding siblings ...)
  2001-07-05 14:54 ` Nick DeClario
@ 2001-07-06  9:38 ` Peter Zaitsev
  3 siblings, 0 replies; 13+ messages in thread

From: Peter Zaitsev @ 2001-07-06 9:38 UTC (permalink / raw)
To: Nick DeClario; +Cc: linux-kernel

Hello Nick,

Thursday, July 05, 2001, 6:54:37 PM, you wrote:

The idea is simple: I want my system to survive if one of the disks
fails, so I store all of my data, including swap, on RAID partitions.

ND> Just out of curiosity, what are the advantages of having a RAID1 swap
ND> partition?  Setting the swap priority to 0 (pri=0) in the fstab for all
ND> of the swap partitions on your system should have the same effect as
ND> doing it with RAID, but without the overhead, right?  RAID1 would also
ND> mirror your swap.  Why would you want that?

ND> Peter Zaitsev wrote:
>>
>> Hello linux-kernel,
>>
>> Does anyone have information on this subject?  I am seeing constant
>> failures when the system swaps on RAID1, and I just wanted to be sure
>> whether this may be the problem or not.  It works without any problems
>> with the 2.2 kernel.

-- 
Best regards,
Peter                          mailto:pz@spylog.ru

^ permalink raw reply	[flat|nested] 13+ messages in thread
end of thread, other threads:[~2001-07-12  4:52 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <mailman.994340644.23368.linux-kernel2news@redhat.com>
2001-07-05 11:24 ` Is Swapping on software RAID1 possible in linux 2.4 ? Peter Zaitsev
2001-07-05 12:13   ` Neil Brown
2001-07-05 13:22   ` Re[2]: " Peter Zaitsev
2001-07-05 13:42     ` Arjan van de Ven
2001-07-05 18:56     ` Pete Zaitcev
2001-07-12  1:14     ` Re[2]: " Neil Brown
2001-07-12  1:48       ` Andrew Morton
2001-07-12  3:22         ` Neil Brown
2001-07-12  4:53           ` Andrew Morton
2001-07-05 14:54   ` Nick DeClario
2001-07-05 15:12     ` Joseph Bueno
2001-07-11 12:08     ` Paul Jakma
2001-07-06  9:38   ` Re[2]: " Peter Zaitsev