linux-kernel.vger.kernel.org archive mirror
* Block layer question - indicating EOF on block devices
@ 2004-11-30 15:50 Alan Cox
  2004-12-01  2:43 ` Andrew Morton
  0 siblings, 1 reply; 6+ messages in thread
From: Alan Cox @ 2004-11-30 15:50 UTC (permalink / raw)
  To: axboe, akpm, Linux Kernel Mailing List

How is a block device meant to indicate to the block layer that the read
issued is beyond EOF? For the case where the true EOF is known the
capacity information is propagated into the inode and that is used. For
the case where a read exceeds the known EOF the block layer sets BIO_EOF
which appears nowhere else I can find.

I'm trying to sort out the case where the block device has only an
approximate length known in advance. At the low level I've got sense
data so I know precisely when I hit the real EOF on read. I can pull
that out, I can partially complete the request neatly up to the EOF but
I can't find any code anywhere dealing with passing back an EOF.

Nor it turns out is it handleable in user space because a read to the
true EOF causes readahead into the fuzzy zone between the actual EOF and
the end of media.

Currently I see the error, pull the sense data, extract the block number
and complete the request to the point it succeeded then fail the rest,
but this doesn't end the I/O if someone is using something like cp, and
it also fills the log with "I/O error on" spew from the block layer
innards even if REQ_QUIET is magically set.



* Re: Block layer question - indicating EOF on block devices
  2004-11-30 15:50 Block layer question - indicating EOF on block devices Alan Cox
@ 2004-12-01  2:43 ` Andrew Morton
  2004-12-01 14:54   ` Alan Cox
  0 siblings, 1 reply; 6+ messages in thread
From: Andrew Morton @ 2004-12-01  2:43 UTC (permalink / raw)
  To: Alan Cox; +Cc: axboe, linux-kernel

Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:
>
> How is a block device meant to indicate to the block layer that the read
> issued is beyond EOF? For the case where the true EOF is known the
> capacity information is propagated into the inode and that is used. For
> the case where a read exceeds the known EOF the block layer sets BIO_EOF
> which appears nowhere else I can find.
> 
> I'm trying to sort out the case where the block device has only an
> approximate length known in advance. At the low level I've got sense
> data so I know precisely when I hit the real EOF on read. I can pull
> that out, I can partially complete the request neatly up to the EOF but
> I can't find any code anywhere dealing with passing back an EOF.

If the driver simply returns an I/O error, userspace should see a short
read and be happy?

> Nor it turns out is it handleable in user space because a read to the
> true EOF causes readahead into the fuzzy zone between the actual EOF and
> the end of media.

Yup.  You can turn the readahead off with posix_fadvise(POSIX_FADV_RANDOM),
or just read the disk with direct-io.  The latter has the advantage that
you can freely pluck out single 512-byte sectors without pagecache causing
any additional reads.
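
A minimal user-space sketch of the first suggestion, assuming a Linux/POSIX system. The helper name read_until_end is hypothetical and not from this thread. It passes POSIX_FADV_RANDOM as a hint, then reads until EOF or the first failing read(), so a caller sees the bytes that did arrive plus the errno that stopped the loop. posix_fadvise() is advisory only, so its result is deliberately ignored; whether the hint keeps readahead out of the fuzzy zone on a given device is not something this sketch can promise.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: read fd until EOF or the first error.
 * Returns the number of bytes read successfully. *err is 0 on a clean
 * EOF, or the errno of the read() that failed (e.g. EIO).
 */
long long read_until_end(int fd, char *buf, size_t buflen, int *err)
{
    long long total = 0;

    assert(buf != NULL && buflen > 0 && err != NULL);
    *err = 0;

    /* Advisory hint only: it returns an error number, which we ignore. */
    (void)posix_fadvise(fd, 0, 0, POSIX_FADV_RANDOM);

    for (;;) {
        ssize_t n = read(fd, buf, buflen);

        if (n > 0) {
            total += n;        /* full or short read: keep the data */
            continue;
        }
        if (n == 0)
            break;             /* clean EOF */
        if (errno == EINTR)
            continue;          /* interrupted before any data: retry */
        *err = errno;          /* real failure: stop, keep what we have */
        break;
    }
    return total;
}
```

A copy tool built on this would treat "total > 0 with *err == EIO" as a short read rather than retrying indefinitely.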

> Currently I see the error, pull the sense data, extract the block number
> and complete the request to the point it succeeded then fail the rest,
> but this doesn't end the I/O if someone is using something like cp,

hm.  Either cp is being silly or we're not propagating the error back
correctly.  `cp' should see the short read and just handle it.

> and
> it also fills the log with "I/O error on" spew from the block layer
> innards even if REQ_QUIET is magically set.

We'd need to propagate that quietness back up to the buffer_head layer, at
least.


* Re: Block layer question - indicating EOF on block devices
  2004-12-01  2:43 ` Andrew Morton
@ 2004-12-01 14:54   ` Alan Cox
  2004-12-02  8:18     ` Jens Axboe
  0 siblings, 1 reply; 6+ messages in thread
From: Alan Cox @ 2004-12-01 14:54 UTC (permalink / raw)
  To: Andrew Morton; +Cc: axboe, Linux Kernel Mailing List

On Wed, 2004-12-01 at 02:43, Andrew Morton wrote:
> Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:
> If the driver simply returns an I/O error, userspace should see a short
> read and be happy?

And the logs fill with I/O error messages. 

> > Nor it turns out is it handleable in user space because a read to the
> > true EOF causes readahead into the fuzzy zone between the actual EOF and
> > the end of media.
> 
> Yup.  You can turn the readahead off with posix_fadvise(POSIX_FADV_RANDOM),

Can't do this during a mount.

> > Currently I see the error, pull the sense data, extract the block number
> > and complete the request to the point it succeeded then fail the rest,
> > but this doesn't end the I/O if someone is using something like cp,
> 
> hm.  Either cp is being silly or we're not propagating the error back
> correctly.  `cp' should see the short read and just handle it.

I'll strace that and see what else I can find. Now that I'm partially
completing requests when this problem occurs, it does seem somewhat
happier. The original code, when I took it to bits, was just blindly
failing the lot.

> > and
> > it also fills the log with "I/O error on" spew from the block layer
> > innards even if REQ_QUIET is magically set.
> 
> We'd need to propagate that quietness back up to the buffer_head layer, at
> least.

That's what I was assuming from looking at the code. Really the block
layer is broken here. It should not be whining about I/O errors on
readahead blocks; it should just let them go. It has no idea if the
readahead failure is a bad block, a media feature or whatever (or, as
James added on irc, scsi reservations).

Alan



* Re: Block layer question - indicating EOF on block devices
  2004-12-01 14:54   ` Alan Cox
@ 2004-12-02  8:18     ` Jens Axboe
  2004-12-02 13:01       ` Alan Cox
  0 siblings, 1 reply; 6+ messages in thread
From: Jens Axboe @ 2004-12-02  8:18 UTC (permalink / raw)
  To: Alan Cox; +Cc: Andrew Morton, Linux Kernel Mailing List

On Wed, Dec 01 2004, Alan Cox wrote:
> On Wed, 2004-12-01 at 02:43, Andrew Morton wrote:
> > Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:
> > If the driver simply returns an I/O error, userspace should see a short
> > read and be happy?
> 
> And the logs fill with I/O error messages. 

read-ahead should definitely be marked quiet, agree.

> > > and
> > > it also fills the log with "I/O error on" spew from the block layer
> > > innards even if REQ_QUIET is magically set.
> > 
> > We'd need to propagate that quietness back up to the buffer_head layer, at
> > least.
> 
> That's what I was assuming from looking at the code. Really the block
> layer is broken here. It should not be whining about I/O errors on
> readahead blocks; it should just let them go. It has no idea if the
> readahead failure is a bad block, a media feature or whatever (or, as
> James added on irc, scsi reservations).

The upper buffer layer could do something intelligent if EOF is set on
the bio, it really should. The problem is that there's no -EXXX to flag
EOF from the driver, it would be nicest if one could just do:

	end_that_request_chunk(req, 1, good_bytes);
	end_that_request_chunk(req, -EOF, residual);

and be done with it.
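
The good_bytes/residual split in that pair of calls is plain arithmetic once the sense data gives the first unreadable block. Below is a user-space sketch of just that calculation; struct split, split_at_eof and its parameters are hypothetical names for illustration, and nothing here is kernel API. It assumes eof_lba is the first sector that could not be read.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: bytes to complete OK and bytes to fail. */
struct split {
    uint64_t good_bytes;
    uint64_t residual;
};

/*
 * Split a request covering sectors [start_lba, start_lba + nr_sectors)
 * at eof_lba, the first sector the device reported as unreadable.
 */
struct split split_at_eof(uint64_t start_lba, uint32_t nr_sectors,
                          uint64_t eof_lba, uint32_t sector_size)
{
    struct split s;
    uint64_t good_sectors;

    assert(sector_size > 0);

    if (eof_lba <= start_lba)
        good_sectors = 0;                    /* nothing was readable */
    else if (eof_lba - start_lba >= nr_sectors)
        good_sectors = nr_sectors;           /* EOF is past this request */
    else
        good_sectors = eof_lba - start_lba;  /* EOF falls inside it */

    s.good_bytes = good_sectors * sector_size;
    s.residual = (uint64_t)(nr_sectors - good_sectors) * sector_size;
    return s;
}
```

For an 8-sector request starting at sector 100 with EOF reported at sector 103 and 512-byte sectors, this yields 1536 good bytes and a 2560-byte residual.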

-- 
Jens Axboe



* Re: Block layer question - indicating EOF on block devices
  2004-12-02  8:18     ` Jens Axboe
@ 2004-12-02 13:01       ` Alan Cox
  2004-12-02 14:07         ` Jens Axboe
  0 siblings, 1 reply; 6+ messages in thread
From: Alan Cox @ 2004-12-02 13:01 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Andrew Morton, Linux Kernel Mailing List

On Thu, 2004-12-02 at 08:18, Jens Axboe wrote:
> The upper buffer layer could do something intelligent if EOF is set on
> the bio, it really should. The problem is that there's no -EXXX to flag
> EOF from the driver, it would be nicest if one could just do:
> 
> 	end_that_request_chunk(req, 1, good_bytes);
> 	end_that_request_chunk(req, -EOF, residual);
> 

We have a set of internal error codes around -512 for things like
"please use the default ioctl behaviour". The error codes don't seem to
get propagated up through the page cache, however, when I tried using
this (I just "borrowed" -ENOMEDIUM for testing) with the idea of
catching it at the top.




* Re: Block layer question - indicating EOF on block devices
  2004-12-02 13:01       ` Alan Cox
@ 2004-12-02 14:07         ` Jens Axboe
  0 siblings, 0 replies; 6+ messages in thread
From: Jens Axboe @ 2004-12-02 14:07 UTC (permalink / raw)
  To: Alan Cox; +Cc: Andrew Morton, Linux Kernel Mailing List

On Thu, Dec 02 2004, Alan Cox wrote:
> On Thu, 2004-12-02 at 08:18, Jens Axboe wrote:
> > The upper buffer layer could do something intelligent if EOF is set on
> > the bio, it really should. The problem is that there's no -EXXX to flag
> > EOF from the driver, it would be nicest if one could just do:
> > 
> > 	end_that_request_chunk(req, 1, good_bytes);
> > 	end_that_request_chunk(req, -EOF, residual);
> > 
> 
> We have a set of internal error codes around -512 for things like
> "please use the default ioctl behaviour". The error codes don't seem to
> get propagated up through the page cache, however, when I tried using
> this (I just "borrowed" -ENOMEDIUM for testing) with the idea of
> catching it at the top.

It gets passed to the bio->bi_end_io end io handler, but most likely
that doesn't do anything with it (most just treat it as a bool). That's
where the improvement room is, basically :)

-- 
Jens Axboe


