linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Neil Conway <nconway.list@ukaea.org.uk>
To: Jens Axboe <axboe@suse.de>
Cc: Martin Dalecki <dalecki@evision-ventures.com>,
	Alan Cox <alan@lxorguk.ukuu.org.uk>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] 2.5.15 IDE 61
Date: Wed, 15 May 2002 12:37:48 +0100	[thread overview]
Message-ID: <3CE2488C.AF2AFB9A@ukaea.org.uk> (raw)
In-Reply-To: <E177dYp-00083c-00@the-village.bc.nu> <3CE11F90.5070701@evision-ventures.com> <3CE13943.FBD5B1D6@ukaea.org.uk> <20020514163241.GR17509@suse.de> <3CE13F99.5BDED3DF@ukaea.org.uk> <20020514165113.GT17509@suse.de>

Jens Axboe wrote:
> On Tue, May 14 2002, Neil Conway wrote:
> > > To really serialize operations the queue _must_ be shared with whoever
> > > requires serialiation.
> > Why will this help?  The hardware can still be doing DMA on hda while
> > the queue's request_fn is called quite legitimately for a hdb request -
> > and the IDE code MUST impose the serialization here to avoid hitting the
> > cable with commands destined for hdb. (For example, by waiting for
> > !channel->busy.)
>
> Current IDE code leaves a request on the list until it has completed
> (this is ignoring TCQ of course), so there's no way that you could start
> serving a second request before the first one completes.

I didn't understand why this was the case, so I've been reading source
code (and sleeping for a few minutes too ;-)).  Wow, is it hard for a
newbie to follow - if I didn't know better I'd swear the kernel was
obfuscated.

I _think_ I now understand what's going on here: you guys already know
this stuff but perhaps others can learn from my slog through the code
(?) (and/or spot the dumb mistakes).  (I did notice what could be a
potential problem too, skip to the end if you want to find it.)

On a "totally" idle system, if a process decides to read from a file,
the sequence appears to be (with minor simplifications for clarity):

sys_read(), file->fops->read(), usually then into generic_file_read(),
page_cache_readahead(), and then the important one:
do_page_cache_readahead().  This does a couple of seriously important
things.  (The name implies it's only used for readahead but the comments
show this isn't the case (if one reads them!)).

Firstly, do_page_cache_readahead() invokes the call chain:
a_ops->readpage(), block_read_full_page(), submit_bh(), submit_bio(),
generic_make_request() (this one finds the right queue for the I/O) and
down into __make_request_fcn(q,bio).  This routine is also key: it adds
requests to the device queue, and a little more: if the queue is empty
it "plugs" the device (before adding the request),  ("plugging" the
device refers to preventing I/O while a request queue is populated), and
also queues a task to the tq_disk task-queue which will unplug the
device at some later time.  The "unplug" routine is central: see below.

Control returns to do_page_cache_readahead() after the call to
readpage().  It then calls run_task_queue(&tq_disk), thus starting the
call chain: generic_unplug_device(),..., q->request_fn().  This is the
end of the line for the block-layer: request_fn() is part of the device
code - for IDE it's do_ide_request().

So this call to request_fn()/do_ide_request() is the first time the IDE
code is really involved in the loop.

The key bit to notice here is that request_fn() is only called if the
device was plugged.  This in turn only happens (well almost, see below)
if the queue was empty.  Thus one concludes that if do_ide_request() is
busy servicing requests and thus the queue is non-empty, the block layer
will never again call do_ide_request(); it will be allowed to get on
with things in its own time.

This I believe, is what you meant Jens when you said "so there's no way
that you could start serving a second request before the first one
completes".  Am I right so far?

Now a possible fly in the ointment: __make_request_fcn() will actually
plug a non-empty queue if BIO_RW_BARRIER is set.  This appears right now
to be impossible because I can't find any code in the entire kernel that
uses bio_barrier() or any similar construct.  But if it's a work in
progress and this bit is set one day, then it seems to me that the block
layer could plug a non-empty queue, and then subsequently any call to
run_task_queue(&tq_disk) would call the block-device's request_fn(). 
This would violate the assumption that request_fn() is never called
twice without the queue emptying in between.  The current IDE code would
survive this because it checks for hwgroup->busy.

Anyway, IFF the BARRIER stuff is not an issue, then I guess the
block-layer really can do all the serialization we need if we set things
up right.  I now probably have to retract my assertions that we vitally
need hwgroup->busy (or equiv) because there really does appear to be no
route into request_fn() from the block layer other than if the queue was
empty...  In the real world though, as you suggested, it's probably
worth having.

all the best,
Neil

PS: Errors and Omissions Expected.

  reply	other threads:[~2002-05-15 11:38 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2002-05-14  9:49 [PATCH] 2.5.15 IDE 61 Neil Conway
2002-05-14  8:52 ` Martin Dalecki
2002-05-14 10:12   ` Neil Conway
2002-05-14  9:30     ` Martin Dalecki
2002-05-14 11:10       ` Neil Conway
2002-05-14 10:21         ` Martin Dalecki
2002-05-14 11:38           ` Russell King
2002-05-14 10:49             ` Martin Dalecki
2002-05-14 12:10             ` Alan Cox
2002-05-14 11:11               ` Martin Dalecki
2002-05-14 12:47                 ` Alan Cox
2002-05-14 12:30                   ` Martin Dalecki
2002-05-15 14:43                 ` Pavel Machek
2002-05-14 12:00               ` Russell King
2002-05-14 11:03                 ` Martin Dalecki
2002-05-14 13:03               ` Neil Conway
2002-05-14 13:27                 ` Andre Hedrick
2002-05-14 14:45                 ` Alan Cox
2002-05-14 14:30                   ` Martin Dalecki
2002-05-14 16:20                     ` Neil Conway
2002-05-14 16:32                       ` Jens Axboe
2002-05-14 16:47                         ` Neil Conway
2002-05-14 16:51                           ` Jens Axboe
2002-05-15 11:37                             ` Neil Conway [this message]
2002-05-14 22:51                           ` Mike Fedyk
2002-05-14 16:26                     ` Jens Axboe
2002-05-14 19:34                     ` Anton Altaparmakov
2002-05-15  6:16                       ` Jens Axboe
2002-05-15  8:32                         ` Anton Altaparmakov
2002-05-15  9:42                           ` Martin Dalecki
2002-05-15  9:32                       ` Martin Dalecki
2002-05-15 11:44                         ` Neil Conway
2002-05-15 11:02                           ` Martin Dalecki
2002-05-15 13:10                             ` Alan Cox
2002-05-15 13:34                               ` Neil Conway
2002-05-15 13:04                                 ` Martin Dalecki
2002-05-15 14:08                               ` benh
2002-05-15 16:40                         ` Denis Vlasenko
2002-05-15 11:55                           ` Neil Conway
2002-05-17  7:07                             ` Mike Fedyk
2002-05-17 11:06                               ` Neil Conway
2002-05-17 10:12                                 ` Martin Dalecki
2002-05-14 16:03                   ` Neil Conway
2002-05-14 16:46                     ` Alan Cox
2002-05-14 12:52       ` Daniela Engert
  -- strict thread matches above, loose matches on Subject: below --
2002-05-06  3:53 Linux-2.5.14 Linus Torvalds
2002-05-13  9:48 ` [PATCH] 2.5.15 IDE 61 Martin Dalecki

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3CE2488C.AF2AFB9A@ukaea.org.uk \
    --to=nconway.list@ukaea.org.uk \
    --cc=alan@lxorguk.ukuu.org.uk \
    --cc=axboe@suse.de \
    --cc=dalecki@evision-ventures.com \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).