* Conceptual questions about device driver
@ 2013-08-01 20:55 neha naik
  2013-08-02  3:21 ` Greg Freemyer
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: neha naik @ 2013-08-01 20:55 UTC (permalink / raw)
  To: kernelnewbies

Hi,
 I have some conceptual questions about device drivers:

1. Write order fidelity should be maintained when submitting requests from
the device driver to the disk below.
    However, when acknowledging these requests, it is okay if we don't
necessarily maintain that order, right?

2.  Also, I want to understand what the device driver does if, in a
multi-page bio, some of the pages get written
    and some don't. We send the error in the bio. But what about the pages
it has already written? It can't possibly
    do anything about it, right?


Regards,
Neha

* Conceptual questions about device driver
  2013-08-01 20:55 Conceptual questions about device driver neha naik
@ 2013-08-02  3:21 ` Greg Freemyer
  2013-08-02 14:24   ` Valdis.Kletnieks at vt.edu
  2013-08-02  5:32 ` Rajat Sharma
  2013-08-02  5:55 ` Kumar Amit Mehta
  2 siblings, 1 reply; 10+ messages in thread
From: Greg Freemyer @ 2013-08-02  3:21 UTC (permalink / raw)
  To: kernelnewbies



neha naik <nehanaik27@gmail.com> wrote:
>Hi,
> I have some conceptual questions about device driver :
>
>1. Write order fidelity should be maintained when submitting requests
>from
>device driver to disk below.
>    However, acknowledging these requests it is okay if we don't
>necessarily maintain that order, right?

I should know, but I don't think your question makes sense.  Data transfers are acked immediately upon receipt by the drive.  When the drive actually commits the data to stable storage, there is no second ack message.

I believe disk drives can typically cache a handful of tracks at a time.  They can do an elevator sort internally on the tracks, so the write order is not guaranteed, but that has nothing to do with acks back to the driver.

>2.  Also i want to understand what the device driver does say if in a
>multiple paged bio, some of the pages get written
> and some don't, we send the error in the bio. But what about the pages
>it has already written??? It can't possibly
>    do anything about it, right?

I believe the entire bio is failed.  The old data in that range is considered lost and the new data is treated as never written.  Higher levels may retry smaller sections to see how much they can get written.
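
To make the "retry smaller sections" idea concrete, here is a small userspace sketch in Python. It only models the idea and is not how the kernel or any particular filesystem does it. The names `write_with_split_retry` and `make_faulty_writer`, the `write_at` callback, and the 512-byte minimum chunk are all assumptions made up for the illustration.

```python
def make_faulty_writer(store, bad_start, bad_end):
    """Build a writer over a bytearray that fails for any range
    overlapping [bad_start, bad_end). Purely a stand-in for a device."""
    def write_at(offset, data):
        end = offset + len(data)
        if offset < bad_end and end > bad_start:
            raise OSError(5, "simulated I/O error")
        store[offset:end] = data
    return write_at


def write_with_split_retry(write_at, offset, data, min_chunk=512):
    """Try the whole range; on failure, halve it and retry each half.

    Returns the list of (offset, length) ranges that were written.
    Ranges of min_chunk bytes or less that fail are given up on.
    """
    try:
        write_at(offset, data)
        return [(offset, len(data))]
    except OSError:
        if len(data) <= min_chunk:
            return []
        mid = len(data) // 2
        left = write_with_split_retry(write_at, offset, data[:mid], min_chunk)
        right = write_with_split_retry(write_at, offset + mid, data[mid:], min_chunk)
        return left + right


store = bytearray(4096)
writer = make_faulty_writer(store, bad_start=1024, bad_end=1536)
written = write_with_split_retry(writer, 0, b"x" * 4096)
print(written)  # [(0, 1024), (1536, 512), (2048, 2048)]
```

The caller ends up knowing exactly which ranges are good, which is more than a single failed request tells it.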

>
>Regards,
>Neha
>
Greg

-- 
Sent from my Android phone with K-9 Mail. Please excuse my brevity.


* Conceptual questions about device driver
  2013-08-01 20:55 Conceptual questions about device driver neha naik
  2013-08-02  3:21 ` Greg Freemyer
@ 2013-08-02  5:32 ` Rajat Sharma
  2013-08-02 19:56   ` Greg Freemyer
  2013-08-02  5:55 ` Kumar Amit Mehta
  2 siblings, 1 reply; 10+ messages in thread
From: Rajat Sharma @ 2013-08-02  5:32 UTC (permalink / raw)
  To: kernelnewbies

On Fri, Aug 2, 2013 at 2:25 AM, neha naik <nehanaik27@gmail.com> wrote:
> Hi,
>  I have some conceptual questions about device driver :
>
> 1. Write order fidelity should be maintained when submitting requests from
> device driver to disk below.
>     However, acknowledging these requests it is okay if we don't necessarily
> maintain that order, right?
>

Yes, it should not matter, as long as the application can rely on data being
written in order of submission.

> 2.  Also i want to understand what the device driver does say if in a
> multiple paged bio, some of the pages get written
>     and some don't, we send the error in the bio. But what about the pages
> it has already written??? It can't possibly
>     do anything about it, right?
>

Applications generally do not make any assumptions about the state of the
data after a failed I/O. They might retry, and blocks might get overwritten
again with the same data. So it is perfectly fine.

>
> Regards,
> Neha
>

Rajat


* Conceptual questions about device driver
  2013-08-01 20:55 Conceptual questions about device driver neha naik
  2013-08-02  3:21 ` Greg Freemyer
  2013-08-02  5:32 ` Rajat Sharma
@ 2013-08-02  5:55 ` Kumar Amit Mehta
  2 siblings, 0 replies; 10+ messages in thread
From: Kumar Amit Mehta @ 2013-08-02  5:55 UTC (permalink / raw)
  To: kernelnewbies

On Thu, Aug 01, 2013 at 02:55:42PM -0600, neha naik wrote:
> 2.  Also i want to understand what the device driver does say if in a
> multiple paged bio, some of the pages get written
>     and some don't, we send the error in the bio. But what about the pages
> it has already written??? It can't possibly
>     do anything about it, right?

What do you mean by a multi-page bio? A bio, broken down into bio_vecs, is
_always_ a set of pages (or am I missing something?). So, most of the time, you
_will_ have a bio containing multiple pages. What about the other fields of the
bio structure, such as bi_idx and bi_size? Can we use those for partial completion?

!!amit


* Conceptual questions about device driver
  2013-08-02  3:21 ` Greg Freemyer
@ 2013-08-02 14:24   ` Valdis.Kletnieks at vt.edu
  0 siblings, 0 replies; 10+ messages in thread
From: Valdis.Kletnieks at vt.edu @ 2013-08-02 14:24 UTC (permalink / raw)
  To: kernelnewbies

On Thu, 01 Aug 2013 23:21:48 -0400, Greg Freemyer said:

> I should know, but I don't think your question makes sense.  Data transfers
> are acked immediately upon receipt by the drive.  When the drive actually
> commits the data to stable storage, there is no second ack message.

> I believe disk drives can typically cache a handful of tracks at a time.
> They can do an elevator sort internally on the tracks, so the write order is
> not guaranteed, but that has nothing to do with acks back to the driver.

In fact, the way that disk drives will flat-out lie about what they're doing
is a major challenge for filesystem designers.  In every filesystem, there are
places where things have to be committed to disk in a certain order for it
to maintain logical consistency in case of a crash mid-operation. For instance,
it's usually better to de-allocate blocks from a file, then add the blocks to
the free list, because a crash in between the two steps results in blocks you
can't allocate. That's better than the other way around, where after the crash
you can re-allocate blocks off the free list and overwrite existing data...
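
A toy model of that trade-off, as a hedged Python sketch. It is not taken from any real filesystem. The functions `truncate_blocks` and `audit`, the list-based "file" and "free list", and the `crash_between` flag are all invented for the illustration.

```python
def truncate_blocks(file_blocks, free_list, release_first, crash_between=False):
    """Move every block from a file to the free list in two steps.

    release_first=True : drop blocks from the file, then add to the free list.
    release_first=False: add to the free list, then drop from the file.
    crash_between=True stops after the first step, as a crash would.
    """
    blocks = list(file_blocks)
    if release_first:
        file_blocks.clear()
        if crash_between:
            return
        free_list.extend(blocks)
    else:
        free_list.extend(blocks)
        if crash_between:
            return
        file_blocks.clear()


def audit(file_blocks, free_list, all_blocks):
    """Return (leaked, double_owned) block sets after a simulated crash."""
    leaked = set(all_blocks) - set(file_blocks) - set(free_list)
    double_owned = set(file_blocks) & set(free_list)
    return leaked, double_owned


# Order 1: de-allocate from the file first, crash before updating the free list.
f, free = [1, 2], [3]
truncate_blocks(f, free, release_first=True, crash_between=True)
print(audit(f, free, [1, 2, 3]))  # ({1, 2}, set())  -> blocks leaked, no data at risk

# Order 2: update the free list first, crash before de-allocating from the file.
f, free = [1, 2], [3]
truncate_blocks(f, free, release_first=False, crash_between=True)
print(audit(f, free, [1, 2, 3]))  # (set(), {1, 2})  -> blocks both in use and "free"
```

Leaked blocks waste space until a check repairs them. Double-owned blocks can be handed out again and overwritten while the file still points at them.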

However, actual hardware is known to lie about *ALL* of the following:

1) Whether or not it has actually written a given block to disk.
2) What order writes happened in.
3) How large the disk's write buffer is (so you can't spam the drive with
dummy writes to force eviction of a write you care about)
4) Whether the disk's buffer cache is write-through or write-behind.
5) Whether a cache-flush request has completed, or merely been accepted
6) Whether the battery backup on a disk cache is enabled and functional
7) Whether there actually is a battery backup or not
8) Whether a disk's write cache is enabled or not.
9) Whether a disk *has* a write cache or not.

And that's just the hardware screw-ups that I've gotten burned by personally.
There's an even longer list of stupid hardware tricks I've heard about from
others over beers. ;)

This is why good filesystem designers almost always burn out and resort to
heavy drinking at a young age. ;)



* Conceptual questions about device driver
  2013-08-02  5:32 ` Rajat Sharma
@ 2013-08-02 19:56   ` Greg Freemyer
  2013-08-02 21:10     ` neha naik
  0 siblings, 1 reply; 10+ messages in thread
From: Greg Freemyer @ 2013-08-02 19:56 UTC (permalink / raw)
  To: kernelnewbies

On Fri, Aug 2, 2013 at 1:32 AM, Rajat Sharma <fs.rajat@gmail.com> wrote:
> On Fri, Aug 2, 2013 at 2:25 AM, neha naik <nehanaik27@gmail.com> wrote:
>> Hi,
>>  I have some conceptual questions about device driver :
>>
>> 1. Write order fidelity should be maintained when submitting requests from
>> device driver to disk below.
>>     However, acknowledging these requests it is okay if we don't necessarily
>> maintain that order, right?
>>
>
> Yes it should not matter as long as application can rely on data being
> written is in order of submission.

But it can't ..... unless the write cache is turned off and it is
known that the cache is truly off.

There is no guarantee of write order in the block stack.  Not between
the filesystem and the driver.  Not between the driver and the drive.

There are at least 2 elevators shuffling the order of writes to
optimize performance.

Rajat, did you get confused?  Or were you trying to say something else?

Greg


* Conceptual questions about device driver
  2013-08-02 19:56   ` Greg Freemyer
@ 2013-08-02 21:10     ` neha naik
  2013-08-02 22:33       ` Greg Freemyer
  0 siblings, 1 reply; 10+ messages in thread
From: neha naik @ 2013-08-02 21:10 UTC (permalink / raw)
  To: kernelnewbies

Thanks for the responses.  I have one more question for Greg. I come from a
filesystem background and not device drivers, so I may be a bit confused
about write order fidelity. I know that filesystems guarantee it.
From the filesystem perspective, no write is allowed on the same block until
the first write finishes. So, if 'B' is written after 'A', you can always
guarantee that you will see 'B' at the end of the two writes.
  Now imagine not having a filesystem, and doing a write directly on the
device. Do device drivers honour it? Should they? I imagine a device driver
as a kind of queue, so any writes are always queued up one after the other
and it gives write order fidelity whether it wants to or not. Am I missing
something here?

Regards,
Neha



* Conceptual questions about device driver
  2013-08-02 21:10     ` neha naik
@ 2013-08-02 22:33       ` Greg Freemyer
  0 siblings, 0 replies; 10+ messages in thread
From: Greg Freemyer @ 2013-08-02 22:33 UTC (permalink / raw)
  To: kernelnewbies

On Fri, Aug 2, 2013 at 5:10 PM, neha naik <nehanaik27@gmail.com> wrote:
> Thanks for the responses.  I have one more  question for Greg. I come from
> filesystem background and not device driver so i may be a bit confused
> about the write order fidelity. I know that filesystems guarantee that.

> Looking from filesystem perspective, no write will be allowed on the same
> block until
> the first write finishes. So, if 'B' is written after 'A' you can always
> guarantee that you will see 'B' at the end of the two writes.

I see what you are saying and that is totally different from what I
was talking about.

I am saying if userspace does:

lseek(location_A), write(data_A)
lseek(location_B), write(data_B)

There is no guarantee which of those writes will be delivered to the
device driver to perform first.  The block layer has elevator logic
that can optimize the physical write order.
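
As a rough illustration of why submission order and dispatch order can differ, here is a toy one-way elevator in Python. It is not the Linux I/O scheduler. The function name `elevator_order`, the (sector, label) tuples, and the head position are all made up for the example.

```python
def elevator_order(requests, head=0):
    """Return requests in one-way elevator order starting at sector `head`.

    requests: list of (sector, label) tuples in submission order.
    Requests at or beyond the head are served in ascending sector order,
    then the sweep wraps around to the ones behind the head.
    """
    ahead = sorted(r for r in requests if r[0] >= head)
    behind = sorted(r for r in requests if r[0] < head)
    return ahead + behind


submitted = [(900, "A"), (100, "B"), (500, "C")]
print(elevator_order(submitted, head=200))
# [(500, 'C'), (900, 'A'), (100, 'B')]
```

"A" was submitted first but is dispatched second, and nothing in the result records the original order.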

>  Now imagine not having a filesystem, and doing a write directly on the
> device. Do device drivers honour it. Should they? I imagine device driver as
> a kind of
> queue. So any writes are always queued up one after the other so that it
> gives write order fidelity whether it wants to or not. Am i missing
> something here.

You seem to be talking about:

lseek(location_A), write(data_A)
lseek(location_A), write(data_B)

That is a very special case and there is special logic to ensure
data_B is the final write.  So if you are only talking/thinking about
overwrites of the exact same location, then you need to appreciate and
clarify that you are talking about a special case.

In the overwrite case, I believe the first write (data_A) can simply
be consumed by the second write (data_B) and never even make it to the
device driver.  If it does get to the device driver, the disk drive
might consolidate them to a single write of data_B to media.
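
A minimal sketch of that overwrite absorption, written as a Python model. It assumes a simple write-back cache keyed by block number. The class `WriteBackCache` and its `device_log` are hypothetical and stand in for whatever layer does the caching.

```python
class WriteBackCache:
    """Toy write-back cache: only the latest data per block reaches the device."""

    def __init__(self):
        self.dirty = {}        # block number -> most recent data
        self.device_log = []   # writes that actually reach the "device"

    def write(self, block, data):
        # A later write to the same block replaces the earlier one in place.
        self.dirty[block] = data

    def flush(self):
        # Emit one device write per dirty block, in ascending block order.
        for block in sorted(self.dirty):
            self.device_log.append((block, self.dirty[block]))
        self.dirty.clear()


cache = WriteBackCache()
cache.write(7, b"data_A")
cache.write(7, b"data_B")   # same location: absorbs the first write
cache.flush()
print(cache.device_log)     # [(7, b'data_B')]
```

In this model data_A never reaches the device at all, yet the final state is still data_B.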

> Regards,
> Neha

Greg


* Conceptual questions about device driver
@ 2013-08-03  4:39 Rajat Sharma
  0 siblings, 0 replies; 10+ messages in thread
From: Rajat Sharma @ 2013-08-03  4:39 UTC (permalink / raw)
  To: kernelnewbies

BTW, I have not seen filesystem implementations doing write ordering for
direct I/O unless a request merge is possible (different user credentials).
Since you mentioned you are from a filesystem background, do you know of
any such implementation?

Rajat

* Conceptual questions about device driver
@ 2013-08-03  3:55 Rajat Sharma
  0 siblings, 0 replies; 10+ messages in thread
From: Rajat Sharma @ 2013-08-03  3:55 UTC (permalink / raw)
  To: kernelnewbies

If a filesystem has to guarantee write ordering, it has to serialize write
requests. So only when 'A' is written to the block (completion received
from the block driver) can the filesystem dispatch 'B', not before. Remember
that there could be failures too. If 'A' fails while the request for 'B' is
pending, would a filesystem fail 'B' too? I guess not. Usually the page cache
absorbs such overwrites first, then issues a single merged I/O to the block
device, but what about direct I/O? If an application needs a consistency
guarantee against such failures, it should serialize its writes. So, to
summarize, the write ordering effort from each layer:
Block driver: none
Filesystem: best effort
Application: full
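
What "Application: full" could look like in practice, as a hedged userspace sketch in Python. The helper names `write_in_order` and `demo` are invented for the example. It assumes a POSIX system where `os.pwrite` and `os.fsync` are available, and it can only be as reliable as the storage underneath: if the hardware misreports a flush, the application cannot tell.

```python
import os
import tempfile


def write_in_order(fd, writes):
    """Issue each (offset, data) write and wait for it before the next one.

    Stops at the first failure so that a later write is never issued after
    an earlier one has failed. Returns the number of writes completed.
    """
    done = 0
    for offset, data in writes:
        try:
            view = memoryview(data)
            pos = offset
            while len(view) > 0:          # pwrite may write fewer bytes than asked
                n = os.pwrite(fd, view, pos)
                view = view[n:]
                pos += n
            os.fsync(fd)                  # ask the kernel to push it to the device
        except OSError:
            break
        done += 1
    return done


def demo():
    """Write 'A' then 'B' to the same offset of a temp file; return what is read back."""
    fd, path = tempfile.mkstemp()
    try:
        write_in_order(fd, [(0, b"A" * 8), (0, b"B" * 8)])
        return os.pread(fd, 8, 0)
    finally:
        os.close(fd)
        os.remove(path)


print(demo())  # b'BBBBBBBB'
```

The cost is one flush per write, which is why this ends up being the application's decision rather than something the lower layers do by default.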

Rajat

end of thread, other threads:[~2013-08-03  4:39 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-08-01 20:55 Conceptual questions about device driver neha naik
2013-08-02  3:21 ` Greg Freemyer
2013-08-02 14:24   ` Valdis.Kletnieks at vt.edu
2013-08-02  5:32 ` Rajat Sharma
2013-08-02 19:56   ` Greg Freemyer
2013-08-02 21:10     ` neha naik
2013-08-02 22:33       ` Greg Freemyer
2013-08-02  5:55 ` Kumar Amit Mehta
2013-08-03  3:55 Rajat Sharma
2013-08-03  4:39 Rajat Sharma
