From: Andrew Morton <andrewm@uow.edu.au>
To: Neil Brown <neilb@cse.unsw.edu.au>
Cc: Peter Zaitsev <pz@spylog.ru>, linux-kernel@vger.kernel.org
Subject: Re: Is Swapping on software RAID1 possible in linux 2.4 ?
Date: Thu, 12 Jul 2001 14:53:11 +1000
Message-ID: <3B4D2D37.9440B48D@uow.edu.au>
In-Reply-To: message from Andrew Morton on Thursday July 12, <1011478953412.20010705152412@spylog.ru> <15172.22988.643481.421716@notabene.cse.unsw.edu.au> <11486070195.20010705172249@spylog.ru> <15180.63984.722843.539959@notabene.cse.unsw.edu.au> <3B4D01E3.1A2F534F@uow.edu.au> <15181.6162.414864.195108@notabene.cse.unsw.edu.au>
Neil Brown wrote:
>
> --- drivers/md/raid1.c 2001/07/12 02:00:35 1.1
> +++ drivers/md/raid1.c 2001/07/12 02:01:42
> @@ -83,6 +83,7 @@
> cnt--;
> } else {
> PRINTK("raid1: waiting for %d bh\n", cnt);
> + run_task_queue(&tq_disk);
> wait_event(conf->wait_buffer, conf->freebh_cnt >= cnt);
> }
> }
> @@ -170,6 +171,7 @@
> memset(r1_bh, 0, sizeof(*r1_bh));
> return r1_bh;
> }
> + run_task_queue(&tq_disk);
> wait_event(conf->wait_buffer, conf->freer1);
> } while (1);
> }
>
> This is needed anyway to be "correct", as you should always unplug
> the queues before waiting for IO to complete.
The problem with this approach is the waitqueue - you get several
tasks on the waitqueue, and bdflush loses the race - some other
thread steals the r1bh and bdflush goes back to sleep.
Replacing the wait_event() with a special raid1_wait_event()
which unplugs *each time* the caller is woken does help - but
it is still easy to deadlock the system.
Clearly this approach is racy: it assumes that the reserved buffers have
actually been submitted when we unplug - they may not yet have been.
But the lockup is too easy to trigger for that to be a satisfactory
explanation.
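A userspace toy model may make the unplug-on-every-wakeup idea concrete. Everything here is an assumption for illustration, not raid1 code: struct pool, pool_submit(), pool_get() and unplug_locked() are hypothetical names, a pthread condition variable stands in for conf->wait_buffer, and "unplugging" completes queued I/O instantly, which real disks do not. The model also shows the race described above: unplugging only helps if the buffers a waiter needs are already sitting in the plugged queue.

```c
#include <assert.h>
#include <pthread.h>

/*
 * Hypothetical userspace model of raid1's reserved buffer pool.
 * Buffers in use are submitted into a "plugged" queue; they only
 * complete and return to the pool when somebody unplugs.
 */
struct pool {
	pthread_mutex_t lock;
	pthread_cond_t wait_buffer;	/* models conf->wait_buffer */
	int free_cnt;			/* models conf->freebh_cnt */
	int plugged;			/* submitted, still queued */
	int unplugs;			/* times the queue was kicked */
};

/*
 * Models run_task_queue(&tq_disk).  In this toy, queued I/O completes
 * immediately and its buffers go straight back to the pool.
 * Caller holds p->lock.
 */
static void unplug_locked(struct pool *p)
{
	p->unplugs++;
	p->free_cnt += p->plugged;
	p->plugged = 0;
	pthread_cond_broadcast(&p->wait_buffer);
}

/* Queue I/O for cnt buffers; it sits in the plugged queue. */
static void pool_submit(struct pool *p, int cnt)
{
	pthread_mutex_lock(&p->lock);
	p->plugged += cnt;
	pthread_mutex_unlock(&p->lock);
}

/*
 * Take cnt buffers.  Unplug before every sleep, including after each
 * wakeup, so a waiter that lost the race for freed buffers kicks the
 * queue again instead of just going back to sleep.
 */
static void pool_get(struct pool *p, int cnt)
{
	pthread_mutex_lock(&p->lock);
	while (p->free_cnt < cnt) {
		unplug_locked(p);
		if (p->free_cnt >= cnt)
			break;
		/* Not enough was queued: sleep until another unplug. */
		pthread_cond_wait(&p->wait_buffer, &p->lock);
	}
	p->free_cnt -= cnt;
	pthread_mutex_unlock(&p->lock);
}
```

A caller takes buffers with pool_get(), hands them to pool_submit() once its I/O is queued, and they only reach free_cnt again after some thread unplugs. If the buffers a sleeper needs are still held by a task that has not submitted them yet, the unplug finds nothing to complete and the sleeper stays asleep.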
The most effective, aggressive, successful and grotty fix for this
problem is to remove the wait_event altogether and replace it with:
	run_task_queue(&tq_disk);
	current->policy |= SCHED_YIELD;
	__set_current_state(TASK_RUNNING);
	schedule();
This can still deadlock in bad OOM situations, but I think we're
dead anyway. A combination of this approach plus the PF_FLUSH
reservations would work even better, but I found the PF_FLUSH
stuff was sufficient.
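The yield-and-retry scheme can be sketched against a similar toy pool. Again the names are hypothetical and the model is an assumption: it is single-threaded with no locking, and userspace sched_yield() stands in for setting SCHED_YIELD and calling schedule(). If nothing is ever returned to the pool, the loop keeps yielding forever; nothing here addresses that case.

```c
#include <assert.h>
#include <sched.h>
#include <stdbool.h>

/* Hypothetical single-threaded model; no locking. */
struct rpool {
	int free_cnt;	/* buffers available now */
	int plugged;	/* submitted, still queued */
	int unplugs;	/* times the queue was kicked */
	int yields;	/* times we gave up the CPU */
};

/* Models run_task_queue(&tq_disk); queued I/O completes at once. */
static void rpool_unplug(struct rpool *p)
{
	p->unplugs++;
	p->free_cnt += p->plugged;
	p->plugged = 0;
}

/* Take cnt buffers if they are available right now. */
static bool rpool_try_get(struct rpool *p, int cnt)
{
	if (p->free_cnt < cnt)
		return false;
	p->free_cnt -= cnt;
	return true;
}

/*
 * No waitqueue: on failure, unplug, let other tasks run, and retry.
 * sched_yield() stands in for SCHED_YIELD + schedule().
 */
static void rpool_get_yielding(struct rpool *p, int cnt)
{
	while (!rpool_try_get(p, cnt)) {
		rpool_unplug(p);
		p->yields++;
		sched_yield();
	}
}
```

Because the caller never sleeps on a waitqueue, there is no queue of waiters for another thread to jump ahead of; every retry starts with a fresh unplug.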
> Mind you, if I was really serious about being
> gentle on the memory allocation, I would use
> kmem_cache_alloc(bh_cachep,SLAB_whatever)
> instead of
> kmalloc(sizeof(struct buffer_head), GFP_whatever)
get/put_unused_buffer_head() should be exported API functions.
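For the allocation point, a dedicated cache of same-size objects can be sketched in userspace as a simple free list. This is a hypothetical model, not the slab allocator or fs/buffer.c: struct obj stands in for struct buffer_head, and the reserve parameter (objects handed out only to callers allowed to dip into it, loosely in the spirit of the PF_FLUSH reservations mentioned earlier) is an illustrative assumption.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct buffer_head. */
struct obj {
	struct obj *next;	/* free-list link while cached */
	char payload[96];
};

/* Hypothetical dedicated cache of same-size objects. */
struct obj_cache {
	struct obj *free_list;
	int nr_free;
	int reserve;	/* kept back for callers allowed to use it */
};

/*
 * Prefill with n objects.  Returns 0 on success, -1 if malloc fails;
 * on failure, cache_destroy() releases whatever was allocated.
 */
static int cache_init(struct obj_cache *c, int n, int reserve)
{
	c->free_list = NULL;
	c->nr_free = 0;
	c->reserve = reserve;
	for (int i = 0; i < n; i++) {
		struct obj *o = malloc(sizeof(*o));
		if (!o)
			return -1;
		o->next = c->free_list;
		c->free_list = o;
		c->nr_free++;
	}
	return 0;
}

/*
 * Ordinary callers may not take the cache below its reserve; callers
 * with may_use_reserve set may.  Returns NULL if nothing is available.
 */
static struct obj *cache_get(struct obj_cache *c, int may_use_reserve)
{
	int floor = may_use_reserve ? 0 : c->reserve;
	struct obj *o;

	if (c->nr_free <= floor)
		return NULL;
	o = c->free_list;
	c->free_list = o->next;
	c->nr_free--;
	return o;
}

/* Return an object to the cache for the next caller. */
static void cache_put(struct obj_cache *c, struct obj *o)
{
	o->next = c->free_list;
	c->free_list = o;
	c->nr_free++;
}

/* Free every cached object. */
static void cache_destroy(struct obj_cache *c)
{
	while (c->free_list) {
		struct obj *o = c->free_list;
		c->free_list = o->next;
		free(o);
	}
	c->nr_free = 0;
}
```

Since every object has the same size, a freed one can be handed straight to the next caller, and a few can be held back for the writeout path; a plain general-purpose allocation call has no notion of such a reserve.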
-