linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: One more bio for for floppy users in 2.5.33..
@ 2002-09-07 14:43 linux
  0 siblings, 0 replies; 30+ messages in thread
From: linux @ 2002-09-07 14:43 UTC (permalink / raw)
  To: linux-kernel

On Fri, 6 Sep 2002, Linus Torvalds wrote:
> Note that the delay for motor on/off is _much_ larger than the actual 
> delay for seeking.
> 
> The seek itself is on the order of a few ms, with the head settle time 
> being in the tens (possibly even a few hundred) ms per track. So assuming 
> you end up reading 4 tracks or so due to readahead, that's still in the 
> range of about one second.
> 
> In contrast, the motor on/off time is something like 5 seconds if I
> remember correctly. Of course, you can certainly eject the floppy while
> the motor is still running, but I'd suggest against it.

You're forgetting the transfer rate.  A 1440K floppy has 160
tracks (80 cylinders * 2 heads), or 9K per track.

It spins at 300 RPM, so it takes at least 200 ms to read that track.
45K/sec.

A 64K read spans 7.11 tracks, which will take 1422 ms to read.
Add 100 ms for initial rotational latency, and assume that subsequent
tracks are optimally arranged for continuous reads.

That's 1.5 secods just to transfer the data.

*Then* you can add all of the seek and motor spin-up/down times
mentioned above.

(Of course, if the floppy *isn't* formatted optimally, add an
extra 100 ms per seek, or 700 ms total, of rotatinal latency.)

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: One more bio for for floppy users in 2.5.33..
  2002-09-10  7:25                       ` Rogier Wolff
@ 2002-09-10  8:01                         ` Andrew Morton
  0 siblings, 0 replies; 30+ messages in thread
From: Andrew Morton @ 2002-09-10  8:01 UTC (permalink / raw)
  To: Rogier Wolff
  Cc: Linus Torvalds, Andrew Morton, Suparna Bhattacharya, Jens Axboe,
	linux-kernel

Rogier Wolff wrote:
> 
> ...
> 
> Ehmm. I'm in the data-recovery business, and we seem to have lost
> the ability to recover the other 3k of a 4k page if one of the blocks
> is bad.
>
> And we're annoyed about the read-ahead trying to read blocks past
> a bad block without returning to the application.
> 

You can use the raw driver, or O_DIRECT against /dev/hdXX.  That
will give 512-byte granularity and no readahead.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: One more bio for for floppy users in 2.5.33..
  2002-09-05 19:03                     ` Linus Torvalds
  2002-09-05 19:35                       ` Andrew Morton
@ 2002-09-10  7:25                       ` Rogier Wolff
  2002-09-10  8:01                         ` Andrew Morton
  1 sibling, 1 reply; 30+ messages in thread
From: Rogier Wolff @ 2002-09-10  7:25 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrew Morton, Suparna Bhattacharya, Jens Axboe, linux-kernel

On Thu, Sep 05, 2002 at 12:03:30PM -0700, Linus Torvalds wrote:
> 
> On Thu, 5 Sep 2002, Andrew Morton wrote:
> >
> > It would be simpler if it was nr_of_pages_completed.
> 
> Well.. Maybe.

Ehmm. I'm in the data-recovery business, and we seem to have lost
the ability to recover the other 3k of a 4k page if one of the blocks 
is bad. 

And we're annoyed about the read-ahead trying to read blocks past
a bad block without returning to the application. 

			Roger.

-- 
** R.E.Wolff@BitWizard.nl ** http://www.BitWizard.nl/ ** +31-15-2137555 **
*-- BitWizard writes Linux device drivers for any device you may have! --*
* The Worlds Ecosystem is a stable system. Stable systems may experience *
* excursions from the stable situation. We are currenly in such an       * 
* excursion: The stable situation does not include humans. ***************

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: One more bio for for floppy users in 2.5.33..
  2002-09-06  6:59                             ` Linus Torvalds
@ 2002-09-09 14:08                               ` Bob_Tracy
  0 siblings, 0 replies; 30+ messages in thread
From: Bob_Tracy @ 2002-09-09 14:08 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Helge Hafting, linux-kernel

Linus Torvalds wrote:
> Oh, well. I don't seem to be the only one who doesn't use the dang things 
> any more. The floppy driver has been broken in 2.5.x for half a year or 
> whatever, and there weren't _that_ many people who ever even mentioned it. 

Oh, we're out here alright...  I didn't speak up (until now) for several
reasons:

(a) development kernel == things break.
(b) when things break, people squawk -- I hardly expected that I'd need
    to add my voice to the "me too" chorus (bad form, don't you know?).
(c) broken things tend to get fixed quickly, which is another way of
    saying I have faith in the developers' abilities :-).

Anyway, floppy support is still reasonably important to me.  I've got a
few legacy systems that cannot boot from from CD-ROM, and a 3.5" floppy
is still handy for small sneaker-net transfers between non-networked
machines.

Thanks for reading...

-- 
-----------------------------------------------------------------------
Bob Tracy                   WTO + WIPO = DMCA? http://www.anti-dmca.org
rct@frus.com
-----------------------------------------------------------------------

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: One more bio for for floppy users in 2.5.33..
  2002-09-05 18:42                   ` Andrew Morton
  2002-09-05 19:03                     ` Linus Torvalds
@ 2002-09-07 11:18                     ` Daniel Phillips
  1 sibling, 0 replies; 30+ messages in thread
From: Daniel Phillips @ 2002-09-07 11:18 UTC (permalink / raw)
  To: Andrew Morton, Linus Torvalds
  Cc: Suparna Bhattacharya, Jens Axboe, linux-kernel

On Thursday 05 September 2002 20:42, Andrew Morton wrote:
> The `nr_of_sectors_completed' would be tricky to handle - we don't know how
> to bring the pagecache pages uptodate in response to a 512-byte completion.
> We'd have to teach the pagecache read functions to permit userspace to read
> from non-uptodate pages by looking at the buffer_head state.  And then
> look at buffer_req to distinguish "this part of the page hasn't been read yet"
> from "this part of the page had an IO error".  Ick.

It's yet another reason to have subpages, and see, keeping state
at the right granularity really does matter.

I'm just adding items to the list of reasons why we need it, so
when the time comes to do it, I won't have to waste a lot of energy
explaining why.

-- 
Daniel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: One more bio for for floppy users in 2.5.33..
  2002-09-06  6:47                           ` Helge Hafting
@ 2002-09-06  6:59                             ` Linus Torvalds
  2002-09-09 14:08                               ` Bob_Tracy
  0 siblings, 1 reply; 30+ messages in thread
From: Linus Torvalds @ 2002-09-06  6:59 UTC (permalink / raw)
  To: Helge Hafting; +Cc: linux-kernel


On Fri, 6 Sep 2002, Helge Hafting wrote:
> 
> I can think of one case where large readahead hurts for floppy, even
> with partial completion:
> 
> 1. Grab a stack of floppies
> 2. Try mounting (or mount+ls) one after another,
>    in search of the right one.

Note that the delay for motor on/off is _much_ larger than the actual 
delay for seeking.

The seek itself is on the order of a few ms, with the head settle time 
being in the tens (possibly even a few hundred) ms per track. So assuming 
you end up reading 4 tracks or so due to readahead, that's still in the 
range of about one second.

In contrast, the motor on/off time is something like 5 seconds if I
remember correctly. Of course, you can certainly eject the floppy while
the motor is still running, but I'd suggest against it.

> So I think a smaller readahead might make sense for floppies,
> unless people don't do this sort of search anymore.  I don't.

I do agree that a 64kB readahead is likely to be excessive on a floppy, 
I'm just saying that I doubt it will be all that noticeable in most cases. 

The absolute worst case is when opening, reading a sector, and closing
again several times in succession, at which point right now we'll end up
serializing. But even at 64kB, that's going to be faster than most people
can change floppies if they actually want to even glance at what the
contents are.

The reason I want the first sectors to be returned early is that I thought 
it was quite noticeable to do just a simple "ls" on the floppy when I 
tested. Of course, that may be just me: it's literally been several years 
since I really used floppies, and maybe they really always were that slow. 
But I thought the root directory was on track zero, so it _should_ return 
it first thing. 

Oh, well. I don't seem to be the only one who doesn't use the dang things 
any more. The floppy driver has been broken in 2.5.x for half a year or 
whatever, and there weren't _that_ many people who ever even mentioned it. 

			Linus


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: One more bio for for floppy users in 2.5.33..
  2002-09-05 20:27                         ` Linus Torvalds
  2002-09-05 20:38                           ` Linus Torvalds
@ 2002-09-06  6:47                           ` Helge Hafting
  2002-09-06  6:59                             ` Linus Torvalds
  1 sibling, 1 reply; 30+ messages in thread
From: Helge Hafting @ 2002-09-06  6:47 UTC (permalink / raw)
  To: Linus Torvalds, linux-kernel

Linus Torvalds wrote:
[...]
> I do think we might make the read-ahead window configurable, and make slow
> devices have slightly smaller windows.
> 
> On the other hand, I don't think the 64kB IO actually _hurts_ per se, as
> long as it doesn't delay the stuff we care about.

I can think of one case where large readahead hurts for floppy, even
with partial completion:

1. Grab a stack of floppies
2. Try mounting (or mount+ls) one after another,
   in search of the right one.

You'll get the results on screen fast enough 
(mount succeeded/failed or ls showed the right/wrong files)
but when it is the wrong floppy you have to wait for
several tracks to read before you may eject and try
the next one.  

Sure, it is only reading so ejecting "by force" won't
hurt the fs but then you have to wait on io errors instead.

So I think a smaller readahead might make sense for floppies,
unless people don't do this sort of search anymore.  I don't.

Helge Hafting

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: One more bio for for floppy users in 2.5.33..
  2002-09-05 19:35                       ` Andrew Morton
  2002-09-05 20:27                         ` Linus Torvalds
@ 2002-09-05 20:48                         ` Linus Torvalds
  1 sibling, 0 replies; 30+ messages in thread
From: Linus Torvalds @ 2002-09-05 20:48 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Suparna Bhattacharya, Jens Axboe, linux-kernel


Btw, the update to do partial completion will need a few more fixes: right
now the different callers of "bio->bi_end_io(bio)" are not very careful
about updating the bio information, since no bi_end_io() function has had
any reason to care before.

That turns the 2-liner patch into slightly more, since for example the 
failure cases in __make_request() need to make sure that they pass in the 
right size/sector count. So the

		....
	end_io:
		bio->bi_end_io(bio);
		return 0;

would become something like

		....
	end_io:
		bio->bi_size = 0;
		bio->bi_end_io(bio, nr_sectors);
		return 0;

if we had this interface.

To avoid those kinds of silly bugs and to avoid havind the bi_end_io() 
function have to look up all the bio information, maybe the end_io calling 
convention really should be

	void bio_end_io(struct bio *bio,
		unsigned int completed,
		unsigned int left, 
		unsigned int uptodate);

and then a failure would just be

	bio->bi_end_io(bio, nr_sectors, 0, 0);

and the end-io function would have all the information it needs to decide 
how much has been done / is left undone directly in the arguments.

One final question would be whether we would want to make all of these
byte counts, for some future networked block device where we might be
getting the completions back in odd-sized chunks, for example? Right now
much of the bio code already is able to handle byte-sized stuff, and it 
might be nice to not have to maintain byte counts in NBD if the bio layer 
already does it anyway..

		Linus


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: One more bio for for floppy users in 2.5.33..
  2002-09-05 20:27                         ` Linus Torvalds
@ 2002-09-05 20:38                           ` Linus Torvalds
  2002-09-06  6:47                           ` Helge Hafting
  1 sibling, 0 replies; 30+ messages in thread
From: Linus Torvalds @ 2002-09-05 20:38 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Suparna Bhattacharya, Jens Axboe, linux-kernel


On Thu, 5 Sep 2002, Linus Torvalds wrote:
> 
> With partial results, it would need to do only a slightly different 
> traversal:
> 
> 	end = bio->bi_io_vec + bio->bi_vcnt*PAGE_SIZE - bio->bi_size
> 
> 	start = end - nr_sectors * 512
> 
> 	PAGE_ALIGN(start)
> 	PAGE_ALIGN(end)
> 
> but it's otherwise the exact same code (doing all the edge calculations in 
> bytes, and then only traversing pages that have now been fully done and 
> weren't fully done last time).
> 
> It _looks_ like it literally needs just a few lines of changes.

Ok, so we now have two "few lines of code" changes. Who wants to actually
_do_ these? I'll do it if nobody else wants to, but I'd much rather see
somebody else _do_ want to do this and test it out and just send me a
tested patch ;)

		Linus


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: One more bio for for floppy users in 2.5.33..
  2002-09-05 19:35                       ` Andrew Morton
@ 2002-09-05 20:27                         ` Linus Torvalds
  2002-09-05 20:38                           ` Linus Torvalds
  2002-09-06  6:47                           ` Helge Hafting
  2002-09-05 20:48                         ` Linus Torvalds
  1 sibling, 2 replies; 30+ messages in thread
From: Linus Torvalds @ 2002-09-05 20:27 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Suparna Bhattacharya, Jens Axboe, linux-kernel


On Thu, 5 Sep 2002, Andrew Morton wrote:
> 
> >  - the code really cares about and uses the incremental information. At
> >    which point it will already have "handled" any previous incremental
> >    stuff, and the only thing it really cares about is the "new increment".
> 
> So the BIO client would need to keep some state somewhere about "this is
> the next page to unlock".  That would best be in the BIO somewhere.

Well, the information actually is there already, although I have to admit 
that it's kind of subtle. 

Look at the current mpage_end_io_read(), and realize that it already does 
a traversal in pages from

	start = bio->bi_io_vec

	end = bio->bi_io_vec + bio->bi_vcnt - 1

which is actually very close to what it would do with partial results.

With partial results, it would need to do only a slightly different 
traversal:

	end = bio->bi_io_vec + bio->bi_vcnt*PAGE_SIZE - bio->bi_size

	start = end - nr_sectors * 512

	PAGE_ALIGN(start)
	PAGE_ALIGN(end)

but it's otherwise the exact same code (doing all the edge calculations in 
bytes, and then only traversing pages that have now been fully done and 
weren't fully done last time).

It _looks_ like it literally needs just a few lines of changes.

So I would actually argue against adding new information: we _do_ actually 
have the information already, and adding more "convenient" forms of it 
adds more aliasing and coherency issues. The current form isn't _that_ 
complicated to use, and we might hide it behind a simple macro:

	#define GET_PAGE_INDEXES(bio, start, end) \
		... set start/end to point into the ...
		... proper bio->bi_io_vec subarray  ...

> Well we still will have the problem where reading 512 bytes from /dev/fd0
> does 64k of IO.  That is most sweet for reading a bunch of 32k ext2 files
> from a hard drive, not so nice for reading fd0's partition table.

I do think we might make the read-ahead window configurable, and make slow 
devices have slightly smaller windows. 

On the other hand, I don't think the 64kB IO actually _hurts_ per se, as 
long as it doesn't delay the stuff we care about.

		Linus


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: One more bio for for floppy users in 2.5.33..
  2002-09-05 18:38                       ` Jens Axboe
@ 2002-09-05 19:47                         ` Peter Osterlund
  0 siblings, 0 replies; 30+ messages in thread
From: Peter Osterlund @ 2002-09-05 19:47 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Linus Torvalds, Andrew Morton, Suparna Bhattacharya, linux-kernel

Jens Axboe <axboe@suse.de> writes:

> And yes, this is 100% identical to the earlier bio code (I forget what
> revisions, and that was prior to BK I think).

It changed in 2.5.5-pre1:

<axboe@burns.home.kernel.dk> (02/02/11 1.262.3.11)
	Remove nr_sectors from bio_end_io end I/O callback. It was a relic
	from when completion was potentially called more than once to indicate
	partial end I/O. These days bio->bi_end_io is _only_ called when I/O
	has completed on the entire bio.

-- 
Peter Osterlund - petero2@telia.com
http://w1.894.telia.com/~u89404340

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: One more bio for for floppy users in 2.5.33..
  2002-09-05 19:03                     ` Linus Torvalds
@ 2002-09-05 19:35                       ` Andrew Morton
  2002-09-05 20:27                         ` Linus Torvalds
  2002-09-05 20:48                         ` Linus Torvalds
  2002-09-10  7:25                       ` Rogier Wolff
  1 sibling, 2 replies; 30+ messages in thread
From: Andrew Morton @ 2002-09-05 19:35 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Suparna Bhattacharya, Jens Axboe, linux-kernel

Linus Torvalds wrote:
> 
> On Thu, 5 Sep 2002, Andrew Morton wrote:
> >
> > It would be simpler if it was nr_of_pages_completed.
> 
> Well.. Maybe.
> 
> I actually think that you in practice really only have two cases:
> 
>  - we only care about full completion. Easily tested for by just looking
>    at bi_size, and leaving the code as it is right now.

OK.

>  - the code really cares about and uses the incremental information. At
>    which point it will already have "handled" any previous incremental
>    stuff, and the only thing it really cares about is the "new increment".

So the BIO client would need to keep some state somewhere about "this is
the next page to unlock".  That would best be in the BIO somewhere.

(I assume the block layer handles the case where the hardware signals
completion in non-sector-ascending-order?  That must have been fun to
code and test).
 
> So I'd suggest making at least the first cut only have the incremental
> information, since the global information already exists through bi_size.

Well we still will have the problem where reading 512 bytes from /dev/fd0
does 64k of IO.  That is most sweet for reading a bunch of 32k ext2 files
from a hard drive, not so nice for reading fd0's partition table.

We can do all sorts of things by dropping parameters, hints and tunables
into q->backing_dev_info.  We just need to work out what is sensible for
this.  ummm.  I guess backing_dev_info.read_k_per_sec would suffice.

Or .i_am_a_floppy ;)

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: One more bio for for floppy users in 2.5.33..
  2002-09-05 18:42                   ` Andrew Morton
@ 2002-09-05 19:03                     ` Linus Torvalds
  2002-09-05 19:35                       ` Andrew Morton
  2002-09-10  7:25                       ` Rogier Wolff
  2002-09-07 11:18                     ` Daniel Phillips
  1 sibling, 2 replies; 30+ messages in thread
From: Linus Torvalds @ 2002-09-05 19:03 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Suparna Bhattacharya, Jens Axboe, linux-kernel


On Thu, 5 Sep 2002, Andrew Morton wrote:
>
> It would be simpler if it was nr_of_pages_completed.

Well.. Maybe.

I actually think that you in practice really only have two cases:

 - we only care about full completion. Easily tested for by just looking 
   at bi_size, and leaving the code as it is right now.

 - the code really cares about and uses the incremental information. At 
   which point it will already have "handled" any previous incremental 
   stuff, and the only thing it really cares about is the "new increment".

So I'd suggest making at least the first cut only have the incremental 
information, since the global information already exists through bi_size.

		Linus


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: One more bio for for floppy users in 2.5.33..
  2002-09-05 18:28                 ` Linus Torvalds
  2002-09-05 18:31                   ` Jens Axboe
@ 2002-09-05 18:42                   ` Andrew Morton
  2002-09-05 19:03                     ` Linus Torvalds
  2002-09-07 11:18                     ` Daniel Phillips
  1 sibling, 2 replies; 30+ messages in thread
From: Andrew Morton @ 2002-09-05 18:42 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Suparna Bhattacharya, Jens Axboe, linux-kernel

Linus Torvalds wrote:
> 
> On Thu, 5 Sep 2002, Andrew Morton wrote:
> >
> > OK.  But still, I don't see why we need partial BIO completions.  If
> > we say that the basic unit of completion is a whole BIO, then readahead
> > can then manage latency via the outgoing bio size.
> 
> But that's horrible. The floppy driver can take huge bio's no problem, and
> limiting bio sizes to track sizes would be a huge pain in the driver for
> no good reason. In fact, it would be pretty much impossible, since the
> tracks aren't even page-aligned.

erp.  Yes, a BIO can span several tracks.  I see the point.

> ...
> > In the testing I did a few weeks back, everything checked out.  An
> > application which was reading the raw device at 95% of media bandwidth
> > never blocked.  An application which was capable of processing data at
> > 120% of media bandwidth achieved 100%.
> 
> And an application that only wanted one sector? To notice that the
> filesystem is of the wrong type, for example?

OK.  That's the initial-64k thing.  If you read 512 bytes from /dev/fd0,
readahead will go and issue a 64k request, and your 512-byte request takes
ages.

> Throughput is _secondary_ to many latency concerns. And you cannot fix the
> latency with full bio's (see the track issue).

The `nr_of_sectors_completed' would be tricky to handle - we don't know how
to bring the pagecache pages uptodate in response to a 512-byte completion.
We'd have to teach the pagecache read functions to permit userspace to read
from non-uptodate pages by looking at the buffer_head state.  And then
look at buffer_req to distinguish "this part of the page hasn't been read yet"
from "this part of the page had an IO error".  Ick.

It would be simpler if it was nr_of_pages_completed.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: One more bio for for floppy users in 2.5.33..
  2002-09-05 18:31                   ` Jens Axboe
@ 2002-09-05 18:38                     ` Linus Torvalds
  2002-09-05 18:38                       ` Jens Axboe
  0 siblings, 1 reply; 30+ messages in thread
From: Linus Torvalds @ 2002-09-05 18:38 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Andrew Morton, Suparna Bhattacharya, linux-kernel


On Thu, 5 Sep 2002, Jens Axboe wrote:
> 
> However, I don't see how this is a two-liner change. Basically you are
> changing bi_end_io() from a completion to partial completion invokation,

Right. So we'll add one

	bio->bi_end_io(bio, nr_sectors)

to the partial bio completion case in "finish_that_request_first()".

That's one line.

The other line is

	if (bio->bi_size) return;

which has to be added to the end-request thing.

(yeah, yeah, there are several end-request functions, and we need to fix 
up the function prototypes too etc, but the point is that the actual 
_work_ is just two real lines of new code).

			Linus


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: One more bio for for floppy users in 2.5.33..
  2002-09-05 18:38                     ` Linus Torvalds
@ 2002-09-05 18:38                       ` Jens Axboe
  2002-09-05 19:47                         ` Peter Osterlund
  0 siblings, 1 reply; 30+ messages in thread
From: Jens Axboe @ 2002-09-05 18:38 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andrew Morton, Suparna Bhattacharya, linux-kernel

On Thu, Sep 05 2002, Linus Torvalds wrote:
> 
> On Thu, 5 Sep 2002, Jens Axboe wrote:
> > 
> > However, I don't see how this is a two-liner change. Basically you are
> > changing bi_end_io() from a completion to partial completion invokation,
> 
> Right. So we'll add one
> 
> 	bio->bi_end_io(bio, nr_sectors)
> 
> to the partial bio completion case in "finish_that_request_first()".
> 
> That's one line.
> 
> The other line is
> 
> 	if (bio->bi_size) return;
> 
> which has to be added to the end-request thing.
> 
> (yeah, yeah, there are several end-request functions, and we need to fix 
> up the function prototypes too etc, but the point is that the actual 
> _work_ is just two real lines of new code).

Ah ok, then we completely agree on what needs doing :-)

And yes, this is 100% identical to the earlier bio code (I forget what
revisions, and that was prior to BK I think).

Jens


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: One more bio for for floppy users in 2.5.33..
  2002-09-05 18:28                 ` Linus Torvalds
@ 2002-09-05 18:31                   ` Jens Axboe
  2002-09-05 18:38                     ` Linus Torvalds
  2002-09-05 18:42                   ` Andrew Morton
  1 sibling, 1 reply; 30+ messages in thread
From: Jens Axboe @ 2002-09-05 18:31 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andrew Morton, Suparna Bhattacharya, linux-kernel

On Thu, Sep 05 2002, Linus Torvalds wrote:
> > OK.  But still, I don't see why we need partial BIO completions.  If
> > we say that the basic unit of completion is a whole BIO, then readahead
> > can then manage latency via the outgoing bio size.
> 
> But that's horrible. The floppy driver can take huge bio's no problem, and 
> limiting bio sizes to track sizes would be a huge pain in the driver for 
> no good reason. In fact, it would be pretty much impossible, since the 
> tracks aren't even page-aligned.
> 
> So limiting bio's fundamentally _cannot_ do the right thing. While adding
> two lines of code _can_.

I agree that partial completions are the right thing to do here, and in
fact this is how the interface was originally remember?

However, I don't see how this is a two-liner change. Basically you are
changing bi_end_io() from a completion to partial completion invokation,
which requires changing (and complicating) all of them. Just adding
a sector count to bio_endio() does not enable that to partially complete
some pages. What am I missing?

Jens


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: One more bio for for floppy users in 2.5.33..
  2002-09-05 18:00               ` Andrew Morton
@ 2002-09-05 18:28                 ` Linus Torvalds
  2002-09-05 18:31                   ` Jens Axboe
  2002-09-05 18:42                   ` Andrew Morton
  0 siblings, 2 replies; 30+ messages in thread
From: Linus Torvalds @ 2002-09-05 18:28 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Suparna Bhattacharya, Jens Axboe, linux-kernel


On Thu, 5 Sep 2002, Andrew Morton wrote:
> 
> OK.  But still, I don't see why we need partial BIO completions.  If
> we say that the basic unit of completion is a whole BIO, then readahead
> can then manage latency via the outgoing bio size.

But that's horrible. The floppy driver can take huge bio's no problem, and 
limiting bio sizes to track sizes would be a huge pain in the driver for 
no good reason. In fact, it would be pretty much impossible, since the 
tracks aren't even page-aligned.

So limiting bio's fundamentally _cannot_ do the right thing. While adding
two lines of code _can_.

> Well I'd be interested in knowing specifically what is wrong with the
> behaviour of 2.5.33 against a floppy disk.

Try it (not plain 2.5.33, but current BK where floppy actually works). It
works, but reading a single sector will pause until it has read several 
tracks. Even though the sector came back much earlier.

Also, you missed the error case argument. Right now we _cannot_ handle 
error cases for partial transfers AT ALL. The bio interface simply does 
not support it.  And there is no way you can fix that with the current 
interface. While the partial completion case allows for at least partial 
recovery.

Andrew, give it up. The current interface _sucks_.

> In the testing I did a few weeks back, everything checked out.  An
> application which was reading the raw device at 95% of media bandwidth
> never blocked.  An application which was capable of processing data at
> 120% of media bandwidth achieved 100%.

And an application that only wanted one sector? To notice that the 
filesystem is of the wrong type, for example? 

Throughput is _secondary_ to many latency concerns. And you cannot fix the 
latency with full bio's (see the track issue).

			Linus


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: One more bio for for floppy users in 2.5.33..
  2002-09-05 17:05             ` Linus Torvalds
@ 2002-09-05 18:00               ` Andrew Morton
  2002-09-05 18:28                 ` Linus Torvalds
  0 siblings, 1 reply; 30+ messages in thread
From: Andrew Morton @ 2002-09-05 18:00 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Suparna Bhattacharya, Jens Axboe, linux-kernel

Linus Torvalds wrote:
> 
> ...
> A fast disk driver will _never ever_ do a partial request completion. A
> high-performance subsystem will put in the scatter-gather list and say
> "go" to the controller, and the controller will send exactly one interrupt
> back when it is all done.

OK.  But still, I don't see why we need partial BIO completions.  If
we say that the basic unit of completion is a whole BIO, then readahead
can then manage latency via the outgoing bio size.
 
> So for such a system, you'd never see partial completions anyway.
> 
> Partial completions are a feature of slow hardware. And slow hardware is
> exactly when we want to know about it.

Well I'd be interested in knowing specifically what is wrong with the
behaviour of 2.5.33 against a floppy disk.

In the testing I did a few weeks back, everything checked out.  An
application which was reading the raw device at 95% of media bandwidth
never blocked.  An application which was capable of processing data at
120% of media bandwidth achieved 100%.

It could be that the initial 64k read at the start-of-file is
too big, and the many-small-file behaviour is poor?

A specific "this sucks" testcase would be helpful...

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: One more bio for for floppy users in 2.5.33..
  2002-09-05 16:26           ` Andrew Morton
@ 2002-09-05 17:05             ` Linus Torvalds
  2002-09-05 18:00               ` Andrew Morton
  0 siblings, 1 reply; 30+ messages in thread
From: Linus Torvalds @ 2002-09-05 17:05 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Suparna Bhattacharya, Jens Axboe, linux-kernel


On Thu, 5 Sep 2002, Andrew Morton wrote:
> Linus Torvalds wrote:
> > 
> > ...
> > I would suggest:
> > 
> >  - add a "nr of sectors completed" argument to the "bi_end_io()" function,
> >    so that it looks like
> > 
> >         void xxx_bio_end_io(struct bio *bio, unsigned long completed)
> >         {
> >                 /*
> >                  * Old completion handlers that don't understand it
> >                  * should just return immediately for partial bio
> >                  * completion notifiers
> >                  */
> >                 if (bio->b_size)
> >                         return;
> >                 ...
> >         }
> > 
> >    which would allow things like mpage_end_io_read() to unlock pages as
> >    they complete, instead of unlocking them all in one go.
> 
> It's a feature!  We don't want to have to soak up 20,000 context
> switches a second just reading a file from an 80MB/sec disk.

You didn't think it through.

The current behaviour is a BUG.

A fast disk driver will _never ever_ do a partial request completion. A 
high-performance subsystem will put in the scatter-gather list and say 
"go" to the controller, and the controller will send exactly one interrupt 
back when it is all done.

So for such a system, you'd never see partial completions anyway.

Partial completions are a feature of slow hardware. And slow hardware is 
exactly when we want to know about it.

So my approach has no downsides, only upsides.

		Linus


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: One more bio for for floppy users in 2.5.33..
  2002-09-05 15:06         ` Linus Torvalds
@ 2002-09-05 16:26           ` Andrew Morton
  2002-09-05 17:05             ` Linus Torvalds
  0 siblings, 1 reply; 30+ messages in thread
From: Andrew Morton @ 2002-09-05 16:26 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Suparna Bhattacharya, Jens Axboe, linux-kernel

Linus Torvalds wrote:
> 
> ...
> I would suggest:
> 
>  - add a "nr of sectors completed" argument to the "bi_end_io()" function,
>    so that it looks like
> 
>         void xxx_bio_end_io(struct bio *bio, unsigned long completed)
>         {
>                 /*
>                  * Old completion handlers that don't understand it
>                  * should just return immediately for partial bio
>                  * completion notifiers
>                  */
>                 if (bio->b_size)
>                         return;
>                 ...
>         }
> 
>    which would allow things like mpage_end_io_read() to unlock pages as
>    they complete, instead of unlocking them all in one go.

It's a feature!  We don't want to have to soak up 20,000 context
switches a second just reading a file from an 80MB/sec disk.

> Comments? It looks trivial to do this from a bio level, and it would not
> be hard to update the existing end_io functions (because you can always
> just update them with the one-liner "if not totally done, return" to get
> the old behaviour).
> 
> Andrew? I really dislike the lack of concurrency in our current mpage
> read-ahead thing. Whaddayathink?

Well it is supposed to lay out two BIOs, but if that happens, it's
fragile - it relies on BIO_MAX_SIZE=64k and default readahead=128k.

What I think we need to do here is to get Jens' bio_add_page() stuff
up and running and to then go through the device drivers and set their
max BIO size to something which is inversely proportional to the
device's expected bandwidth.

This way, the floppy readahead would lay out 1- or 2-page BIOs, and
the honking FC array would lay out 128k or larger BIOs.

(In fact, I would prefer that the device's nominal read- and write-
bandwidths be made available in some manner.  This way the VM can
make these granularity-size decisions for readahead, and the VM
can then solve the machine-full-of-dirty-memory-against-a-slow-device
problem.  But the non-blocking page reclaim code probably solves that
too).

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: One more bio for for floppy users in 2.5.33..
  2002-09-05  7:03       ` Suparna Bhattacharya
@ 2002-09-05 15:06         ` Linus Torvalds
  2002-09-05 16:26           ` Andrew Morton
  0 siblings, 1 reply; 30+ messages in thread
From: Linus Torvalds @ 2002-09-05 15:06 UTC (permalink / raw)
  To: Suparna Bhattacharya; +Cc: Jens Axboe, linux-kernel, Andrew Morton


[ Suparna, I'll only react to one thing in your email right now, I'll try 
  to take the time to look at your _real_ questions later after I've had
  my coffee and woken up.. ]

On Thu, 5 Sep 2002, Suparna Bhattacharya wrote:
> 
> Barthlomeij actually raised another point about the situation when the
> partial completion happens with an error (i.e. not uptodate case), a situation
> which doesn't automatically get handled correctly. Something to think
> about ?

Good point. Yes. However, it's hard to pass the errors down, since we've 
largely lost the individual parts of the bio (ie people don't even use the 
buffer_heads any more.

There's a somewhat related issue with bio's, namely that partial
_successful_ completion also doesn't notify anybody, which can suck from a
latency standpoint if there are big delays in the partial IO (even if it
is all successful). That's certainly true of the floppy driver, for
example, where we can build up a 64kB bio due to read-ahead and then it
takes many milliseconds between partial completions (which are done one
track at a time in the absolute best case).

Note that this actually makes read-ahead much less effective, because it 
means that we're not doing work while the read-ahead is happening: the 
read-ahead improves throughput by doing big blocks at a time, but it does 
_not_ get the improvement that it used to get of having asynchronous IO 
going on. 

We should wake up the person that maybe only needed the first page.

But right now, we've kind of lost that ability, because the bio itself 
does not contain any such information. We used to have a list of buffer 
heads in the request, and could wake them up one by one, but..

I would suggest:

 - add a "nr of sectors completed" argument to the "bi_end_io()" function, 
   so that it looks like

	void xxx_bio_end_io(struct bio *bio, unsigned long completed)
	{
		/*
		 * Old completion handlers that don't understand it
		 * should just return immediately for partial bio
		 * completion notifiers
		 */
		if (bio->b_size)
			return;
		...
	}

   which would allow things like mpage_end_io_read() to unlock pages as 
   they complete, instead of unlocking them all in one go.

Comments? It looks trivial to do this from a bio level, and it would not 
be hard to update the existing end_io functions (because you can always 
just update them with the one-liner "if not totally done, return" to get 
the old behaviour).

Andrew? I really dislike the lack of concurrency in our current mpage 
read-ahead thing. Whaddayathink?

		Linus


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: One more bio for for floppy users in 2.5.33..
  2002-09-04 16:36     ` Linus Torvalds
@ 2002-09-05  7:03       ` Suparna Bhattacharya
  2002-09-05 15:06         ` Linus Torvalds
  0 siblings, 1 reply; 30+ messages in thread
From: Suparna Bhattacharya @ 2002-09-05  7:03 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Jens Axboe, linux-kernel

On Wed, Sep 04, 2002 at 09:36:42AM -0700, Linus Torvalds wrote:
> 
> On Wed, 4 Sep 2002, Suparna Bhattacharya wrote:
> > 
> > Oh yes, even I had this fixed this in the bio traversal patches 
> > I had posted (had this in the core patch, and mentioned it 
> > in the description in the note :) ), guess it went unnoticed.
> 
> Well, I've never seen a "this should go in" about it.
> 
> Also, it was apparently mixed up with the "bio splitup" stuff, which was 

True, that fix ought to have gone in as a separate patch. As Jens said, 
it was like we all knew it had to be fixed, but maybe no one quite saw 
the urgency till the case that used it came about. 

Barthlomeij actually raised another point about the situation when the
partial completion happens with an error (i.e. not uptodate case), a situation
which doesn't automatically get handled correctly. Something to think
about ?

> discussed at least with Jens, and I feel strongly that we shouldn't split, 
> we should build up. Jens was working on exactly that.
> 
> In other words, I absolutely hate the fact that a major bug-fix was 
>  (a) not marked as such and sent to me
> and
>  (b) mixed up with experimental work for other drivers

I agree. The fix was a matter on the side. I'd expected it to have gone
in otherwise anyway, since it wasn't something new I'd discovered. It 
was perhaps just so obvious that it didn't happen so far !

I couldn't resist bringing it up in this context, because I really wanted
some critical feedback about whether the approach seemed right, or 
whether I should just let the patch lie by the side till someone found a use 
for it, without having feeling guilty about having abandoned it and not 
following it through enough. At least at Ottawa it sounded like this was
something people were interested in, and that was the whole reason for
my doing in the first place. (Will get to the bio splitting point later
down the line).

After all, getting all drivers fixed up to adhere to this would require 
some work, something I know I can't do on my own (I just
don't understand the subtleties that someone more familiar which them
would), and which won't be right to cause others  to handle unless people
felt it was the correct and useful thing to do. 

I don't even mind dropping it if it seems like a "not now" or "not right"
thing (I already seem to spent more effort on this then I probably should 
have), but at least after a bit of discussion to take it to that point.

> 
> Even now (assuming I hadn't fixed it on my own), I would have preferred to
> get that fix separately, as it would have impacted the floppy driver, for
> example (the fix broke the floppy driver even more than it was before,
> because the floppy driver was stupidly trying to work around the original
> bug by hand).
> 
> Imagine what a horror it is to figure out why a large experimental patch 
> breaks an existing driver? My first reaction would have been to just throw 
> the large new patch away, since it obviously broke the floppy even more. 
> Instead, if I had been passed the bug-fix only, and the floppy had broken 
> worse that it was originally, it would have been absolutely _obvious_ 
> where the problem was.
> 
> In short: please please PLEASE keep fixes to existing code separate from 
> new stuff. It makes everything easier, and I have absolutely no problems 
> with applying "obvious fixes" even if they might break something else.

OK. 

> 
> In contrast, the new stuff I really don't know if it should go in at all, 
> considering that it's trivial (and CPU-efficient) to build up legal bio 
> request on the fly and _not_ depend on splitting them later (or at least 
> making splitting a special thing only used by things like MD and other 
> such indirection layers).

This is exactly the kind of feedback or discussion I need. 

The patch actually addresses a set of problems and not just bio splitting.

Ideally I'd have liked to separately handle different problems differently
in separate patches. In fact that's the way I had started out at first, 
only to realise that things become much simpler or cleaner to implement, 
if we treat this as a whole (excluding the bug fix above) as a basic 
traversal logic change.

The problems are like this:

Today partial segment completion of a bio affects the bv_offset and bv_len
fields in the vecs themselves, because there is nothing in the bio to 
remember how far we are inside a current segment (and as deriving that by
calculating it backwards from bi_size and bi_vcnt could be inefficient).
This means that a higher layer which has passed a vector to bio can't 
assume that the vec entries would be valid after the i/o is completed. 
So though we have a uniform vec abstraction, passing it around subsystems
seamlessly may throw up some unexpected results. (Just that are likely
not to hit this often, which is why we don't hear complaints about it)

Today, it is hard to handle partial submission of a request/bio where
partial submission state has to be remembered across function invokations
(or interrupts) without making a copy of the request _and_ its corresponding 
bios (together with vectors), since the block layer traversal functions 
combine traversal and completion without a clear concept of separation 
between submission and completion state within a request. IDE mult-count 
PIO is of course a case in point. Not sure if there are others.

Today a bio cannot be split in the middle of a segment, which means that
MD or other such indirection layers need to do their own bvec allocations
in such cases. Once we have bio with has a different start offset from the
first bvec's start, we lose some simplicity (which is why I'm not
sure if all drivers necessarily need to support such bio s the first time
around). But with such a field in the bio (bi_voffset), we can use 
that as a moving counter during traversal. (BTW, an extra field may not  
have been indispensable if we didn't need to separate submission and 
completion state, e.g. rq->current_nr_sectors might work for just this 
purpose as well, and that was what I was planning to try before Jens 
suggested going ahead with bi_voffset, but then found it handy to
manage completion state separately from submission state).

BTW, I agree splitting truly becomes an issue mainly in the
case of such layered drivers, in other cases it is a good idea to build
before hand; but for indirection drivers it _is_ an issue ... one can't expect
to predict all situations and that would limit flexibility there.

Another point that I've always wanted to bring up is that the moment we
get to a need to divide an i/o into separate requests either to a single
or multiple drivers, we _are_ doing an allocation of the request struct
(using a slot in the pool of available requests for that queue). Building
the right bio sizes doesn't do away with this, does it ?

In any case, special case problem situations can always be dealt with 
special case code, but since we've redesigned the block i/o layer 
the question is whether we've got our abstractions right to deal with 
various requirements at the appropriate level and avoid workarounds 
by drivers or callers into block. 

So rather than look at the above problems individually and frequency
of situations where we can imagine they'd occur in practice, could 
you think about the basics a bit and let me know what you think ?

(a) Not the right away to go at all ?
(b) Think about it later, not now ?
(c) Try and get more pieces working correctly with this and then see ?

One concern that I have with the patches I posted (I did split them 
up last time around, so they look smaller) is the increase in number 
of counters to update / track. This also has the effect of requiring 
updates happen via helpers rather than drivers individually managing 
these counters which is not necessarily a bad thing in itself, but 
I have to say that I was a trifle worried about potential
unexpected side effects in the short run for drivers that make their
own assumptions about these. I was hoping that driver maintainers
would be able to help sound an alert for any issues they see right
away.

Regards
Suparna

> 
> 			Linus
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: One more bio for for floppy users in 2.5.33..
  2002-09-04  7:25   ` Suparna Bhattacharya
@ 2002-09-04 16:36     ` Linus Torvalds
  2002-09-05  7:03       ` Suparna Bhattacharya
  0 siblings, 1 reply; 30+ messages in thread
From: Linus Torvalds @ 2002-09-04 16:36 UTC (permalink / raw)
  To: Suparna Bhattacharya; +Cc: Jens Axboe, linux-kernel


On Wed, 4 Sep 2002, Suparna Bhattacharya wrote:
> 
> Oh yes, even I had this fixed this in the bio traversal patches 
> I had posted (had this in the core patch, and mentioned it 
> in the description in the note :) ), guess it went unnoticed.

Well, I've never seen a "this should go in" about it.

Also, it was apparently mixed up with the "bio splitup" stuff, which was 
discussed at least with Jens, and I feel strongly that we shouldn't split, 
we should build up. Jens was working on exactly that.

In other words, I absolutely hate the fact that a major bug-fix was 
 (a) not marked as such and sent to me
and
 (b) mixed up with experimental work for other drivers

Even now (assuming I hadn't fixed it on my own), I would have preferred to
get that fix separately, as it would have impacted the floppy driver, for
example (the fix broke the floppy driver even more than it was before,
because the floppy driver was stupidly trying to work around the original
bug by hand).

Imagine what a horror it is to figure out why a large experimental patch 
breaks an existing driver? My first reaction would have been to just throw 
the large new patch away, since it obviously broke the floppy even more. 
Instead, if I had been passed the bug-fix only, and the floppy had broken 
worse that it was originally, it would have been absolutely _obvious_ 
where the problem was.

In short: please please PLEASE keep fixes to existing code separate from 
new stuff. It makes everything easier, and I have absolutely no problems 
with applying "obvious fixes" even if they might break something else.

In contrast, the new stuff I really don't know if it should go in at all, 
considering that it's trivial (and CPU-efficient) to build up legal bio 
request on the fly and _not_ depend on splitting them later (or at least 
making splitting a special thing only used by things like MD and other 
such indirection layers).

			Linus


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: One more bio for for floppy users in 2.5.33..
  2002-09-03 18:02 ` Jens Axboe
@ 2002-09-04  7:25   ` Suparna Bhattacharya
  2002-09-04 16:36     ` Linus Torvalds
  0 siblings, 1 reply; 30+ messages in thread
From: Suparna Bhattacharya @ 2002-09-04  7:25 UTC (permalink / raw)
  To: Linus Torvalds, Jens Axboe, linux-kernel

On Tue, 03 Sep 2002 23:44:59 +0530, Jens Axboe wrote:

> On Tue, Sep 03 2002, Linus Torvalds wrote:
>> 
>> Ok,
>>  I found another major bio-related bug that definitely explains why the
>> floppy driver generated corruption - any partial request completion
>> would be totally messed up by the BIO layer (not the floppy driver).
>> 
>> Any other block device driver that did partial request completion might
>> also be impacted.
>> 
>> Jens, oops. We should not update the counts by how much was left
>> uncompleted, but by how much we successfully completed!
> 
> Yeah oops, the most embarassing thing is that Bart and I have both found
> this but independently months ago but it seems it got lost at my end (or
> your end, but lets not point fingers :-) :-(
> 

Oh yes, even I had this fixed this in the bio traversal patches 
I had posted (had this in the core patch, and mentioned it 
in the description in the note :) ), guess it went unnoticed.

BTW, any plans for including those patches ? So far all feedback
I've received (including Jens, Bart, James, Andrew ) seems to 
say OK to go from what I can tell. If it seems like the right 
thing to do, then I'd rather it go in sooner (at least the core
and comatibility portions), so subsequent driver changes etc 
are aligned with this (I've been chasing several kernel versions 
with this now :( )

I now have a latest version updated to 2.5.33, but dropped the 
IDE portions from it for now, in view of the ide switch and 
ongoing changes ... If Bart upgrades his corresponding patches 
then that would take care of it. Otherwise I could give that 
a go too you think that seems worthwhile and may relook at
whether the "traverse for submission" helpers I have can be
improved upon (maybe given better names too).

A quick recap:

BIO traversal enhancements
--------------------------
- Pre-req for full BIO splitting infrastructure [Splits
  in the middle of a segment can also share the same vec]
- Pre-req for attaching a pre-built vector (not owned by block)
  to a bio (e.g for aio)  [Avoids certain subtle side-effects in
  special cases like partial segment completion]
- Pre-req for IDE mult-count PIO write fixes w/o request copy
  [Ability to traverse a bio partially for i/o submission without
  effecting bio completion]

Introduces some new fields in the bio and request to help maintain
traversal state and handle partial segment bio clones, and comes
with a corresponding set of helpers to enable clean separation 
of traversal for i/o submission vs i/o completion.

Regards
Suparna

> Patch is ofcourse correct. I'm not sure other drivers have been hit (of
> the used ones), since they would have to use old style completions and
> do less than current_nr_sectors in one-go.
>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: One more bio for for floppy users in 2.5.33..
  2002-09-03 21:52   ` Linus Torvalds
@ 2002-09-03 22:33     ` Mikael Pettersson
  0 siblings, 0 replies; 30+ messages in thread
From: Mikael Pettersson @ 2002-09-03 22:33 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Mikael Pettersson, Kernel Mailing List

Linus Torvalds writes:
 > 
 > On Tue, 3 Sep 2002, Mikael Pettersson wrote:
 > > 
 > > Confirmed. 2.5.33 + your two patches still corrupts data on a simple
 > > dd to then from /dev/fd0 test.
 > 
 > Ok, if you don't have BK, then here's the floppy driver end_request() 
 > cleanup as a plain patch.
 > 
 > This passes dd tests for me, but they were by no means very exhaustive.

Success! With this patch and your previous two the floppy driver
passes all tests I've thrown at it, including raw data access,
VFS access with ext2, installing lilo, fsck, and booting the result.

Thanks.

/Mikael

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: One more bio for for floppy users in 2.5.33..
  2002-09-03 20:57 ` Mikael Pettersson
@ 2002-09-03 21:52   ` Linus Torvalds
  2002-09-03 22:33     ` Mikael Pettersson
  0 siblings, 1 reply; 30+ messages in thread
From: Linus Torvalds @ 2002-09-03 21:52 UTC (permalink / raw)
  To: Mikael Pettersson; +Cc: Kernel Mailing List


On Tue, 3 Sep 2002, Mikael Pettersson wrote:
> 
> Confirmed. 2.5.33 + your two patches still corrupts data on a simple
> dd to then from /dev/fd0 test.

Ok, if you don't have BK, then here's the floppy driver end_request() 
cleanup as a plain patch.

This passes dd tests for me, but they were by no means very exhaustive.

		Linus

---
# The following is the BitKeeper ChangeSet Log
# --------------------------------------------
# 02/09/03	torvalds@home.transmeta.com	1.581.4.2
# Fix floppy driver end_request() handling - it used to do insane
# contortions instead of just calling "end_that_request_first()" with
# the proper sector count.
# --------------------------------------------
#
diff -Nru a/drivers/block/floppy.c b/drivers/block/floppy.c
--- a/drivers/block/floppy.c	Tue Sep  3 14:51:09 2002
+++ b/drivers/block/floppy.c	Tue Sep  3 14:51:09 2002
@@ -2295,16 +2295,15 @@
 {
 	kdev_t dev = req->rq_dev;
 
-	if (end_that_request_first(req, uptodate, req->hard_cur_sectors))
+	if (end_that_request_first(req, uptodate, current_count_sectors))
 		return;
 	add_blkdev_randomness(major(dev));
 	floppy_off(DEVICE_NR(dev));
 	blkdev_dequeue_request(req);
 	end_that_request_last(req);
 
-	/* Get the next request */
-	req = elv_next_request(QUEUE);
-	CURRENT = req;
+	/* We're done with the request */
+	CURRENT = NULL;
 }
 
 
@@ -2335,27 +2334,8 @@
 
 		/* unlock chained buffers */
 		spin_lock_irqsave(q->queue_lock, flags);
-		while (current_count_sectors && CURRENT &&
-		       current_count_sectors >= req->current_nr_sectors){
-			current_count_sectors -= req->current_nr_sectors;
-			req->nr_sectors -= req->current_nr_sectors;
-			req->sector += req->current_nr_sectors;
-			end_request(req, 1);
-		}
+		end_request(req, 1);
 		spin_unlock_irqrestore(q->queue_lock, flags);
-
-		if (current_count_sectors && CURRENT) {
-			/* "unlock" last subsector */
-			req->buffer += current_count_sectors <<9;
-			req->current_nr_sectors -= current_count_sectors;
-			req->nr_sectors -= current_count_sectors;
-			req->sector += current_count_sectors;
-			return;
-		}
-
-		if (current_count_sectors && !CURRENT)
-			DPRINT("request list destroyed in floppy request done\n");
-
 	} else {
 		if (rq_data_dir(req) == WRITE) {
 			/* record write error information */


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: One more bio for for floppy users in 2.5.33..
  2002-09-03 18:00 Linus Torvalds
  2002-09-03 18:02 ` Jens Axboe
@ 2002-09-03 20:57 ` Mikael Pettersson
  2002-09-03 21:52   ` Linus Torvalds
  1 sibling, 1 reply; 30+ messages in thread
From: Mikael Pettersson @ 2002-09-03 20:57 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Kernel Mailing List, Jens Axboe

Linus Torvalds writes:
 > 
 > Ok,
 >  I found another major bio-related bug that definitely explains why the 
 > floppy driver generated corruption - any partial request completion would 
 > be totally messed up by the BIO layer (not the floppy driver).
 > 
 > Any other block device driver that did partial request completion might 
 > also be impacted.
 > 
 > I'm still looking at the floppy driver itself - some of the request 
 > completion code is so messed up as to be unreadable, and some of that may 
 > actually be due to trying to work around the bio problem (unsuccessfully, 
 > I may add). So this may not actually fix things for you yet, simply 
 > because the floppy driver itself still does some strange things. 

Confirmed. 2.5.33 + your two patches still corrupts data on a simple
dd to then from /dev/fd0 test.

/Mikael

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: One more bio for for floppy users in 2.5.33..
  2002-09-03 18:00 Linus Torvalds
@ 2002-09-03 18:02 ` Jens Axboe
  2002-09-04  7:25   ` Suparna Bhattacharya
  2002-09-03 20:57 ` Mikael Pettersson
  1 sibling, 1 reply; 30+ messages in thread
From: Jens Axboe @ 2002-09-03 18:02 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Kernel Mailing List

On Tue, Sep 03 2002, Linus Torvalds wrote:
> 
> Ok,
>  I found another major bio-related bug that definitely explains why the 
> floppy driver generated corruption - any partial request completion would 
> be totally messed up by the BIO layer (not the floppy driver).
> 
> Any other block device driver that did partial request completion might 
> also be impacted.
> 
> I'm still looking at the floppy driver itself - some of the request 
> completion code is so messed up as to be unreadable, and some of that may 
> actually be due to trying to work around the bio problem (unsuccessfully, 
> I may add). So this may not actually fix things for you yet, simply 
> because the floppy driver itself still does some strange things. 
> 
> Jens, oops. We should not update the counts by how much was left
> uncompleted, but by how much we successfully completed!

Yeah oops, the most embarassing thing is that Bart and I have both found
this but independently months ago but it seems it got lost at my end (or
your end, but lets not point fingers :-) :-(

Patch is ofcourse correct. I'm not sure other drivers have been hit (of
the used ones), since they would have to use old style completions and
do less than current_nr_sectors in one-go.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 30+ messages in thread

* One more bio for for floppy users in 2.5.33..
@ 2002-09-03 18:00 Linus Torvalds
  2002-09-03 18:02 ` Jens Axboe
  2002-09-03 20:57 ` Mikael Pettersson
  0 siblings, 2 replies; 30+ messages in thread
From: Linus Torvalds @ 2002-09-03 18:00 UTC (permalink / raw)
  To: Kernel Mailing List; +Cc: Jens Axboe


Ok,
 I found another major bio-related bug that definitely explains why the 
floppy driver generated corruption - any partial request completion would 
be totally messed up by the BIO layer (not the floppy driver).

Any other block device driver that did partial request completion might 
also be impacted.

I'm still looking at the floppy driver itself - some of the request 
completion code is so messed up as to be unreadable, and some of that may 
actually be due to trying to work around the bio problem (unsuccessfully, 
I may add). So this may not actually fix things for you yet, simply 
because the floppy driver itself still does some strange things. 

Jens, oops. We should not update the counts by how much was left
uncompleted, but by how much we successfully completed!

		Linus

----
# The following is the BitKeeper ChangeSet Log
# --------------------------------------------
# 02/09/03	torvalds@home.transmeta.com	1.582
# Major partial request completion boo-boo in the bio layer.
# 
# This was _bad_. Major floppy corruption, and possibly the reason
# for other block device corruption for any driver that generated
# partial results for a block device request.
# --------------------------------------------
#
diff -Nru a/drivers/block/ll_rw_blk.c b/drivers/block/ll_rw_blk.c
--- a/drivers/block/ll_rw_blk.c	Tue Sep  3 10:53:59 2002
+++ b/drivers/block/ll_rw_blk.c	Tue Sep  3 10:53:59 2002
@@ -1992,11 +1992,11 @@
 		 * not a complete bvec done
 		 */
 		if (unlikely(nsect > nr_sectors)) {
-			int residual = (nsect - nr_sectors) << 9;
+			int partial = nr_sectors << 9;
 
-			bio->bi_size -= residual;
-			bio_iovec(bio)->bv_offset += residual;
-			bio_iovec(bio)->bv_len -= residual;
+			bio->bi_size -= partial;
+			bio_iovec(bio)->bv_offset += partial;
+			bio_iovec(bio)->bv_len -= partial;
 			blk_recalc_rq_sectors(req, nr_sectors);
 			blk_recalc_rq_segments(req);
 			return 1;


^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2002-09-10  7:41 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2002-09-07 14:43 One more bio for for floppy users in 2.5.33 linux
  -- strict thread matches above, loose matches on Subject: below --
2002-09-03 18:00 Linus Torvalds
2002-09-03 18:02 ` Jens Axboe
2002-09-04  7:25   ` Suparna Bhattacharya
2002-09-04 16:36     ` Linus Torvalds
2002-09-05  7:03       ` Suparna Bhattacharya
2002-09-05 15:06         ` Linus Torvalds
2002-09-05 16:26           ` Andrew Morton
2002-09-05 17:05             ` Linus Torvalds
2002-09-05 18:00               ` Andrew Morton
2002-09-05 18:28                 ` Linus Torvalds
2002-09-05 18:31                   ` Jens Axboe
2002-09-05 18:38                     ` Linus Torvalds
2002-09-05 18:38                       ` Jens Axboe
2002-09-05 19:47                         ` Peter Osterlund
2002-09-05 18:42                   ` Andrew Morton
2002-09-05 19:03                     ` Linus Torvalds
2002-09-05 19:35                       ` Andrew Morton
2002-09-05 20:27                         ` Linus Torvalds
2002-09-05 20:38                           ` Linus Torvalds
2002-09-06  6:47                           ` Helge Hafting
2002-09-06  6:59                             ` Linus Torvalds
2002-09-09 14:08                               ` Bob_Tracy
2002-09-05 20:48                         ` Linus Torvalds
2002-09-10  7:25                       ` Rogier Wolff
2002-09-10  8:01                         ` Andrew Morton
2002-09-07 11:18                     ` Daniel Phillips
2002-09-03 20:57 ` Mikael Pettersson
2002-09-03 21:52   ` Linus Torvalds
2002-09-03 22:33     ` Mikael Pettersson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).