From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S265031AbTFYUEp (ORCPT ); Wed, 25 Jun 2003 16:04:45 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S265036AbTFYUEp (ORCPT ); Wed, 25 Jun 2003 16:04:45 -0400 Received: from 216-42-72-146.ppp.netsville.net ([216.42.72.146]:60807 "EHLO tiny.suse.com") by vger.kernel.org with ESMTP id S265031AbTFYUEl (ORCPT ); Wed, 25 Jun 2003 16:04:41 -0400 Subject: Re: [PATCH] io stalls From: Chris Mason To: Andrea Arcangeli Cc: Nick Piggin , Marc-Christian Petersen , Jens Axboe , Marcelo Tosatti , Georg Nikodym , lkml , Matthias Mueller In-Reply-To: <20030625192523.GB19940@dualathlon.random> References: <1055353360.23697.235.camel@tiny.suse.com> <20030611181217.GX26270@dualathlon.random> <1055356032.24111.240.camel@tiny.suse.com> <20030611183503.GY26270@dualathlon.random> <3EE7D1AA.30701@cyberone.com.au> <20030612012951.GG1500@dualathlon.random> <1055384547.24111.322.camel@tiny.suse.com> <3EE7E876.80808@cyberone.com.au> <20030612024608.GE1415@dualathlon.random> <1056567822.10097.133.camel@tiny.suse.com> <20030625192523.GB19940@dualathlon.random> Content-Type: text/plain Organization: Message-Id: <1056572288.10091.161.camel@tiny.suse.com> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.2.2 Date: 25 Jun 2003 16:18:09 -0400 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org On Wed, 2003-06-25 at 15:25, Andrea Arcangeli wrote: > > There were a few other small changes to Andrea's patch, he wasn't > > setting q->full when get_request decided there were too many sectors in > > wasn't is really in the past, because I'm doing it in 2.4.21rc8aa1 and > in my latest status. > Hmm, I thought I grabbed the patch from rc8aa1, clearly not though, sorry about that. > > I changed generic_unplug_device to zero the elevator_sequence field of > > the last request on the queue. This means there won't be any merges > > with requests pending once an unplug is done, and helps limit the number > > of sectors that need to be sent down during the run_task_queue(&tq_disk) > > in wait_on_buffer. > > this sounds like an hack, forbidding merges is pretty bad for > performance in general, of course most of the merging happens in between > the unplugs, but during heavy I/O with frequent unplugs from many > readers this may hurt performance. And as you said this mostly has the > advantage of limiting the size of the queue, like I enforce in my tree > with the elevator-lowlatency. I doubt this is right. > Well, I would hit sysrq-t when I noticed read stalls, and the reader was frequently in run_task_queue. I kept the hunk because it made a noticeable difference. I agree there's a throughput tradeoff here, my goal for the patch was to find the major places I could improve latency and change them, then go back later and decide if each one was worth it. Your elevator-lowlatency patch doesn't enforce sector limits for merged requests, so a merger could constantly come in and steal space in the sector limit from other waiters. This lead to high latency in __get_request_wait. That hunk for generic_unplug_device solves both of those problems. > > I lowered the -aa default limit on sectors in flight from 4MB to 2MB. > > I got a few complains for performance slowdown, originally it was 2MB, > so I increased it to 4, from 4M it should be enough for most hardware. > I've no preference really. I didn't notice a throughput difference but my scsi drives only have 2MB of cache. > this is very similar to my status in -aa, exept for the hack that > forbids merging which I think is wrong and the fact you miss the > wake_up_nr that I added to give a meaning to the batching again and that > you don't avoid the unplugs in get_request_wait_wakeup until the queue > is empty. I mean this: > > +static void get_request_wait_wakeup(request_queue_t *q, int rw) > +{ > + /* > + * avoid losing an unplug if a second __get_request_wait did the > + * generic_unplug_device while our __get_request_wait was > running > + * w/o the queue_lock held and w/ our request out of the queue. > + */ > + if (q->rq[rw].count == 0 && waitqueue_active(&q->wait_for_requests[rw])) > + __generic_unplug_device(q); > +} > + > > you said last week the above is racy and it even hanged your box, could > you elaborate? The above is in 2.4.21rc8aa1 and it works fine so far > (though especially the race in wait_for_request is never been known to > be reproducible) It caused hangs/stalls, but I didn't have the sector throttling code at the time and it really changes the interaction of things. I think the hang went a little like this: Lets say all the pending io is done, but the wait queue isn't empty yet because all the waiting tasks haven't yet been scheduled in. Also, we have fewer than nr_requests processes waiting to start io, so when they do all get scheduled in they won't generate an unplug. q->rq.count = q->nr_requests, q->full = 1 new io comes in, sees q->full = 1, unplugs and sleeps. No io is done because the queue is empty. All the old waiters finally get scheduled in and grab their requests, but get_request_wait_wakeup doesn't unplug because q->rq.count != 0. If no additional io comes in, the queue never gets unplugged, and our waiter never gets woken. With the sector throttling on, we've got additional wakeups coming from blk_finished_io (or blk_finished_sectors in my patch). I kept out the wakeup_nr idea because I couldn't figure out how to keep __get_request_wait fair with it in. -chris