* Forum for asking questions related to block device drivers
@ 2013-04-10 20:53 neha naik
  2013-04-11  5:15 ` Rajat Sharma
  2013-04-11  7:47 ` Forum for asking questions related to block device drivers Bjørn Mork
  0 siblings, 2 replies; 13+ messages in thread
From: neha naik @ 2013-04-10 20:53 UTC (permalink / raw)
  To: kernelnewbies

Hi All,
   Nobody has replied to my query here, so I am just wondering if there is
a forum for block device drivers where I can post my query.
Please tell me if there is any such forum.

Thanks,
Neha

---------- Forwarded message ----------
From: neha naik <nehanaik27@gmail.com>
Date: Tue, Apr 9, 2013 at 10:18 AM
Subject: Passthrough device driver performance is low on reads compared to
writes
To: kernelnewbies at kernelnewbies.org


Hi All,
  I have written a passthrough block device driver using the 'make_request'
call. This block device driver simply passes any request that comes to it
down to LVM.

However, the read performance of my passthrough driver is around 65MB/s
(measured with dd) and the write performance is around 140MB/s for a dd
block size of 4096.
The write performance more or less matches LVM's write performance, but
the read performance on LVM itself is around 365MB/s.

I am posting the snippets of code which I think are relevant here:

static int passthrough_make_request(struct request_queue *queue,
                                    struct bio *bio)
{
        passthrough_device_t *passdev = queue->queuedata;

        /* redirect the bio to the backing LVM device and resubmit it */
        bio->bi_bdev = passdev->bdev_backing;
        generic_make_request(bio);
        return 0;
}

For initializing the queue I am using the following:

blk_queue_make_request(passdev->queue, passthrough_make_request);
passdev->queue->queuedata = sbd;
passdev->queue->unplug_fn = NULL;
bdev_backing = passdev->bdev_backing;
blk_queue_stack_limits(passdev->queue, bdev_get_queue(bdev_backing));
if ((bdev_get_queue(bdev_backing))->merge_bvec_fn) {
        blk_queue_merge_bvec(sbd->queue, sbd_merge_bvec_fn);
}

Now, I browsed through the dm code in the kernel to see if there is some
flag or something that I am not using which is causing this huge
performance penalty, but I have not found anything.

If you have any ideas about what I am possibly doing wrong, please tell
me.

Thanks in advance.

Regards,
Neha


* Forum for asking questions related to block device drivers
  2013-04-10 20:53 Forum for asking questions related to block device drivers neha naik
@ 2013-04-11  5:15 ` Rajat Sharma
  2013-04-11 15:09   ` neha naik
  2013-04-11  7:47 ` Forum for asking questions related to block device drivers Bjørn Mork
  1 sibling, 1 reply; 13+ messages in thread
From: Rajat Sharma @ 2013-04-11  5:15 UTC (permalink / raw)
  To: kernelnewbies

Hi,

On Thu, Apr 11, 2013 at 2:23 AM, neha naik <nehanaik27@gmail.com> wrote:
> Hi All,
>    Nobody has replied to my query here. So i am just wondering if there is a
> forum for block device driver where i can post my query.
> Please tell me if there is any such forum.
>
> Thanks,
> Neha
>
> ---------- Forwarded message ----------
> From: neha naik <nehanaik27@gmail.com>
> Date: Tue, Apr 9, 2013 at 10:18 AM
> Subject: Passthrough device driver performance is low on reads compared to
> writes
> To: kernelnewbies at kernelnewbies.org
>
>
> Hi All,
>   I have written a passthrough block device driver using 'make_request'
> call. This block device driver simply passes any request that comes to it
> down to lvm.
>
> However, the read performance for my passthrough driver is around 65MB/s
> (measured through dd) and write performance is around 140MB/s for dd block
> size 4096.
> The write performance matches with lvm's write performance more or less but,
> the read performance on lvm is around 365MB/s.
>
> I am posting snippets of code which i think are relevant here:
>
> static int passthrough_make_request(
> struct request_queue * queue, struct bio * bio)
> {
>
>         passthrough_device_t * passdev = queue->queuedata;
>         bio->bi_bdev = passdev->bdev_backing;
>         generic_make_request(bio);
>         return 0;
> }
>
> For initializing the queue i am using following:
>
> blk_queue_make_request(passdev->queue, passthrough_make_request);
> passdev->queue->queuedata = sbd;
> passdev->queue->unplug_fn = NULL;
> bdev_backing = passdev->bdev_backing;
> blk_queue_stack_limits(passdev->queue, bdev_get_queue(bdev_backing));
> if ((bdev_get_queue(bdev_backing))->merge_bvec_fn) {
>         blk_queue_merge_bvec(sbd->queue, sbd_merge_bvec_fn);
> }
>

What is the implementation of sbd_merge_bvec_fn? Please debug through
it to check whether requests are merging or not; maybe that is the cause
of the lower performance.
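
(For reference, a passthrough merge_bvec_fn is usually just a
remap-and-forward to the backing queue. The following is only a minimal
sketch using the old merge_bvec_fn prototype of that era and the
passdev/bdev_backing names from the snippet above; it is not Neha's actual
sbd_merge_bvec_fn.)

/* Sketch only: defer the merge decision to the backing device's queue.
 * "passthrough_device_t" and "bdev_backing" are the names used in the
 * original snippet; a driver that remaps sectors would also have to
 * adjust bvm->bi_sector here. */
static int sbd_merge_bvec_fn(struct request_queue *q,
                             struct bvec_merge_data *bvm,
                             struct bio_vec *biovec)
{
        passthrough_device_t *passdev = q->queuedata;
        struct request_queue *bq = bdev_get_queue(passdev->bdev_backing);

        /* present the request as if already aimed at the backing device */
        bvm->bi_bdev = passdev->bdev_backing;

        if (!bq->merge_bvec_fn)
                return biovec->bv_len;  /* no extra restriction below us */

        return bq->merge_bvec_fn(bq, bvm, biovec);
}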

> Now, I browsed through dm code in kernel to see if there is some flag or
> something which i am not using which is causing this huge performance
> penalty.
> But, I have not found anything.
>
> If you have any ideas about what i am possibly doing wrong then please tell
> me.
>
> Thanks in advance.
>
> Regards,
> Neha
>

-Rajat

>
> _______________________________________________
> Kernelnewbies mailing list
> Kernelnewbies at kernelnewbies.org
> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
>


* Forum for asking questions related to block device drivers
  2013-04-10 20:53 Forum for asking questions related to block device drivers neha naik
  2013-04-11  5:15 ` Rajat Sharma
@ 2013-04-11  7:47 ` Bjørn Mork
  1 sibling, 0 replies; 13+ messages in thread
From: Bjørn Mork @ 2013-04-11  7:47 UTC (permalink / raw)
  To: kernelnewbies

neha naik <nehanaik27@gmail.com> writes:

>    Nobody has replied to my query here. So i am just wondering if there is
> a forum for block device driver where i can post my query.
> Please tell me if there is any such forum.

The "get_maintainer" script will tell you such things.  Try running for
example

  scripts/get_maintainer.pl -f drivers/block/

from the top level kernel source directory.

(The answer seems to be NO. The only list pointed to by the script is
linux-kernel@vger.kernel.org.)



Bjørn


* Forum for asking questions related to block device drivers
  2013-04-11  5:15 ` Rajat Sharma
@ 2013-04-11 15:09   ` neha naik
  2013-04-11 17:53     ` Rajat Sharma
  0 siblings, 1 reply; 13+ messages in thread
From: neha naik @ 2013-04-11 15:09 UTC (permalink / raw)
  To: kernelnewbies

Hi,
 I am calling the merge function of the block device driver below me (since
mine is only a pass-through). Does this not work?
When I looked at what read requests were coming in, I saw that when I
issue dd with count=1 it retrieves 4 pages,
so I tried the 'direct' flag. But even with direct I/O my read performance
is way lower than my write performance.

Regards,
Neha

On Wed, Apr 10, 2013 at 11:15 PM, Rajat Sharma <fs.rajat@gmail.com> wrote:

> Hi,
>
> On Thu, Apr 11, 2013 at 2:23 AM, neha naik <nehanaik27@gmail.com> wrote:
> > Hi All,
> >    Nobody has replied to my query here. So i am just wondering if there
> is a
> > forum for block device driver where i can post my query.
> > Please tell me if there is any such forum.
> >
> > Thanks,
> > Neha
> >
> > ---------- Forwarded message ----------
> > From: neha naik <nehanaik27@gmail.com>
> > Date: Tue, Apr 9, 2013 at 10:18 AM
> > Subject: Passthrough device driver performance is low on reads compared
> to
> > writes
> > To: kernelnewbies at kernelnewbies.org
> >
> >
> > Hi All,
> >   I have written a passthrough block device driver using 'make_request'
> > call. This block device driver simply passes any request that comes to it
> > down to lvm.
> >
> > However, the read performance for my passthrough driver is around 65MB/s
> > (measured through dd) and write performance is around 140MB/s for dd
> block
> > size 4096.
> > The write performance matches with lvm's write performance more or less
> but,
> > the read performance on lvm is around 365MB/s.
> >
> > I am posting snippets of code which i think are relevant here:
> >
> > static int passthrough_make_request(
> > struct request_queue * queue, struct bio * bio)
> > {
> >
> >         passthrough_device_t * passdev = queue->queuedata;
> >         bio->bi_bdev = passdev->bdev_backing;
> >         generic_make_request(bio);
> >         return 0;
> > }
> >
> > For initializing the queue i am using following:
> >
> > blk_queue_make_request(passdev->queue, passthrough_make_request);
> > passdev->queue->queuedata = sbd;
> > passdev->queue->unplug_fn = NULL;
> > bdev_backing = passdev->bdev_backing;
> > blk_queue_stack_limits(passdev->queue, bdev_get_queue(bdev_backing));
> > if ((bdev_get_queue(bdev_backing))->merge_bvec_fn) {
> >         blk_queue_merge_bvec(sbd->queue, sbd_merge_bvec_fn);
> > }
> >
>
> What is the implementation for sbd_merge_bvec_fn? Please debug through
> it to check requests are merging or not? May be that is the cause of
> lower performance?
>
> > Now, I browsed through dm code in kernel to see if there is some flag or
> > something which i am not using which is causing this huge performance
> > penalty.
> > But, I have not found anything.
> >
> > If you have any ideas about what i am possibly doing wrong then please
> tell
> > me.
> >
> > Thanks in advance.
> >
> > Regards,
> > Neha
> >
>
> -Rajat
>
> >
> > _______________________________________________
> > Kernelnewbies mailing list
> > Kernelnewbies at kernelnewbies.org
> > http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
> >
>


* Forum for asking questions related to block device drivers
  2013-04-11 15:09   ` neha naik
@ 2013-04-11 17:53     ` Rajat Sharma
  2013-04-11 18:50       ` neha naik
  0 siblings, 1 reply; 13+ messages in thread
From: Rajat Sharma @ 2013-04-11 17:53 UTC (permalink / raw)
  To: kernelnewbies

So you mean direct I/O read performance on your passthrough device is
lower than direct I/O read performance on LVM?

On Thu, Apr 11, 2013 at 8:39 PM, neha naik <nehanaik27@gmail.com> wrote:
> Hi,
>  I am calling the merge function of the block device driver below me(since
> mine is only pass through). Does this not work?
> When i tried seeing what read requests were coming then i saw that when i
> issue dd with count=1 it retrieves 4 pages,
> so i tried with 'direct' flag. But even with direct io my read performance
> is way lower than my write performance.
>
> Regards,
> Neha
>
>
> On Wed, Apr 10, 2013 at 11:15 PM, Rajat Sharma <fs.rajat@gmail.com> wrote:
>>
>> Hi,
>>
>> On Thu, Apr 11, 2013 at 2:23 AM, neha naik <nehanaik27@gmail.com> wrote:
>> > Hi All,
>> >    Nobody has replied to my query here. So i am just wondering if there
>> > is a
>> > forum for block device driver where i can post my query.
>> > Please tell me if there is any such forum.
>> >
>> > Thanks,
>> > Neha
>> >
>> > ---------- Forwarded message ----------
>> > From: neha naik <nehanaik27@gmail.com>
>> > Date: Tue, Apr 9, 2013 at 10:18 AM
>> > Subject: Passthrough device driver performance is low on reads compared
>> > to
>> > writes
>> > To: kernelnewbies at kernelnewbies.org
>> >
>> >
>> > Hi All,
>> >   I have written a passthrough block device driver using 'make_request'
>> > call. This block device driver simply passes any request that comes to
>> > it
>> > down to lvm.
>> >
>> > However, the read performance for my passthrough driver is around 65MB/s
>> > (measured through dd) and write performance is around 140MB/s for dd
>> > block
>> > size 4096.
>> > The write performance matches with lvm's write performance more or less
>> > but,
>> > the read performance on lvm is around 365MB/s.
>> >
>> > I am posting snippets of code which i think are relevant here:
>> >
>> > static int passthrough_make_request(
>> > struct request_queue * queue, struct bio * bio)
>> > {
>> >
>> >         passthrough_device_t * passdev = queue->queuedata;
>> >         bio->bi_bdev = passdev->bdev_backing;
>> >         generic_make_request(bio);
>> >         return 0;
>> > }
>> >
>> > For initializing the queue i am using following:
>> >
>> > blk_queue_make_request(passdev->queue, passthrough_make_request);
>> > passdev->queue->queuedata = sbd;
>> > passdev->queue->unplug_fn = NULL;
>> > bdev_backing = passdev->bdev_backing;
>> > blk_queue_stack_limits(passdev->queue, bdev_get_queue(bdev_backing));
>> > if ((bdev_get_queue(bdev_backing))->merge_bvec_fn) {
>> >         blk_queue_merge_bvec(sbd->queue, sbd_merge_bvec_fn);
>> > }
>> >
>>
>> What is the implementation for sbd_merge_bvec_fn? Please debug through
>> it to check requests are merging or not? May be that is the cause of
>> lower performance?
>>
>> > Now, I browsed through dm code in kernel to see if there is some flag or
>> > something which i am not using which is causing this huge performance
>> > penalty.
>> > But, I have not found anything.
>> >
>> > If you have any ideas about what i am possibly doing wrong then please
>> > tell
>> > me.
>> >
>> > Thanks in advance.
>> >
>> > Regards,
>> > Neha
>> >
>>
>> -Rajat
>>
>> >
>> > _______________________________________________
>> > Kernelnewbies mailing list
>> > Kernelnewbies at kernelnewbies.org
>> > http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
>> >
>
>


* Forum for asking questions related to block device drivers
  2013-04-11 17:53     ` Rajat Sharma
@ 2013-04-11 18:50       ` neha naik
  2013-04-11 19:49         ` Greg Freemyer
  2013-04-15  7:02         ` simple question about struct pointer Ben Wu
  0 siblings, 2 replies; 13+ messages in thread
From: neha naik @ 2013-04-11 18:50 UTC (permalink / raw)
  To: kernelnewbies

Yes. Interestingly, my direct write I/O performance is better than my
direct read I/O performance on my passthrough device... and that doesn't
make any kind of sense to me.

pdev0 = passthrough device on top of LVM

root@voffice-base:/home/neha/sbd# time dd if=/dev/pdev0 of=/dev/null
bs=4096 count=1024 iflag=direct
1024+0 records in
1024+0 records out
4194304 bytes (4.2 MB) copied, 4.09488 s, 1.0 MB/s

real    0m4.100s
user    0m0.028s
sys    0m0.000s

root@voffice-base:/home/neha/sbd# time dd if=/dev/shm/image of=/dev/pdev0
bs=4096 count=1024 oflag=direct
1024+0 records in
1024+0 records out
4194304 bytes (4.2 MB) copied, 0.0852398 s, 49.2 MB/s

real    0m0.090s
user    0m0.004s
sys    0m0.012s

Thanks,
Neha

On Thu, Apr 11, 2013 at 11:53 AM, Rajat Sharma <fs.rajat@gmail.com> wrote:

> so you mean direct I/O read of your passthrough device is lower than
> direct I/O read of lvm?
>
> On Thu, Apr 11, 2013 at 8:39 PM, neha naik <nehanaik27@gmail.com> wrote:
> > Hi,
> >  I am calling the merge function of the block device driver below
> me(since
> > mine is only pass through). Does this not work?
> > When i tried seeing what read requests were coming then i saw that when i
> > issue dd with count=1 it retrieves 4 pages,
> > so i tried with 'direct' flag. But even with direct io my read
> performance
> > is way lower than my write performance.
> >
> > Regards,
> > Neha
> >
> >
> > On Wed, Apr 10, 2013 at 11:15 PM, Rajat Sharma <fs.rajat@gmail.com>
> wrote:
> >>
> >> Hi,
> >>
> >> On Thu, Apr 11, 2013 at 2:23 AM, neha naik <nehanaik27@gmail.com>
> wrote:
> >> > Hi All,
> >> >    Nobody has replied to my query here. So i am just wondering if
> there
> >> > is a
> >> > forum for block device driver where i can post my query.
> >> > Please tell me if there is any such forum.
> >> >
> >> > Thanks,
> >> > Neha
> >> >
> >> > ---------- Forwarded message ----------
> >> > From: neha naik <nehanaik27@gmail.com>
> >> > Date: Tue, Apr 9, 2013 at 10:18 AM
> >> > Subject: Passthrough device driver performance is low on reads
> compared
> >> > to
> >> > writes
> >> > To: kernelnewbies at kernelnewbies.org
> >> >
> >> >
> >> > Hi All,
> >> >   I have written a passthrough block device driver using
> 'make_request'
> >> > call. This block device driver simply passes any request that comes to
> >> > it
> >> > down to lvm.
> >> >
> >> > However, the read performance for my passthrough driver is around
> 65MB/s
> >> > (measured through dd) and write performance is around 140MB/s for dd
> >> > block
> >> > size 4096.
> >> > The write performance matches with lvm's write performance more or
> less
> >> > but,
> >> > the read performance on lvm is around 365MB/s.
> >> >
> >> > I am posting snippets of code which i think are relevant here:
> >> >
> >> > static int passthrough_make_request(
> >> > struct request_queue * queue, struct bio * bio)
> >> > {
> >> >
> >> >         passthrough_device_t * passdev = queue->queuedata;
> >> >         bio->bi_bdev = passdev->bdev_backing;
> >> >         generic_make_request(bio);
> >> >         return 0;
> >> > }
> >> >
> >> > For initializing the queue i am using following:
> >> >
> >> > blk_queue_make_request(passdev->queue, passthrough_make_request);
> >> > passdev->queue->queuedata = sbd;
> >> > passdev->queue->unplug_fn = NULL;
> >> > bdev_backing = passdev->bdev_backing;
> >> > blk_queue_stack_limits(passdev->queue, bdev_get_queue(bdev_backing));
> >> > if ((bdev_get_queue(bdev_backing))->merge_bvec_fn) {
> >> >         blk_queue_merge_bvec(sbd->queue, sbd_merge_bvec_fn);
> >> > }
> >> >
> >>
> >> What is the implementation for sbd_merge_bvec_fn? Please debug through
> >> it to check requests are merging or not? May be that is the cause of
> >> lower performance?
> >>
> >> > Now, I browsed through dm code in kernel to see if there is some flag
> or
> >> > something which i am not using which is causing this huge performance
> >> > penalty.
> >> > But, I have not found anything.
> >> >
> >> > If you have any ideas about what i am possibly doing wrong then please
> >> > tell
> >> > me.
> >> >
> >> > Thanks in advance.
> >> >
> >> > Regards,
> >> > Neha
> >> >
> >>
> >> -Rajat
> >>
> >> >
> >> > _______________________________________________
> >> > Kernelnewbies mailing list
> >> > Kernelnewbies at kernelnewbies.org
> >> > http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
> >> >
> >
> >
>


* Forum for asking questions related to block device drivers
  2013-04-11 18:50       ` neha naik
@ 2013-04-11 19:49         ` Greg Freemyer
  2013-04-11 20:48           ` neha naik
  2013-04-15  7:02         ` simple question about struct pointer Ben Wu
  1 sibling, 1 reply; 13+ messages in thread
From: Greg Freemyer @ 2013-04-11 19:49 UTC (permalink / raw)
  To: kernelnewbies

On Thu, Apr 11, 2013 at 2:50 PM, neha naik <nehanaik27@gmail.com> wrote:
> Yes. Interestingly my direct write i/o performance is better than my direct
> read i/o performance for my passthrough device... And that doesn't make any
> kind of sense to me.
>
> pdev0 = pass through device on top of lvm
>
> root at voffice-base:/home/neha/sbd# time dd if=/dev/pdev0 of=/dev/null bs=4096
> count=1024 iflag=direct
> 1024+0 records in
> 1024+0 records out
> 4194304 bytes (4.2 MB) copied, 4.09488 s, 1.0 MB/s
>
> real    0m4.100s
> user    0m0.028s
> sys    0m0.000s
>
> root at voffice-base:/home/neha/sbd# time dd if=/dev/shm/image of=/dev/pdev0
> bs=4096 count=1024 oflag=direct
> 1024+0 records in
> 1024+0 records out
> 4194304 bytes (4.2 MB) copied, 0.0852398 s, 49.2 MB/s
>
> real    0m0.090s
> user    0m0.004s
> sys    0m0.012s
>
> Thanks,
> Neha

I assume your issue is caching somewhere.

If it is in the top levels of the kernel, dd has various fsync,
fdatasync, etc. options that should address that.  I note you aren't
using any of them.

You mention LVM.  It should pass cache flush commands down, but some
flavors of mdraid will not, the last I knew.  I.e., RAID 6 used to
discard cache flush commands, IIRC.  I don't know if that was ever
fixed or not.

If the cache is in hardware, then dd's cache flushing calls may or may
not get propagated all the way to the device.  Some battery backed
caches actually intentionally reply ACK to a cache flush command
without actually doing it.

Further, you're only writing 4MB.  That's not much of a test for most
devices.  A SATA drive will typically have at least 32MB of cache.
One way to ensure that results are not being corrupted by the various
caches up and down the storage stack is to write so much data that you
overwhelm the caches.  That can be a huge amount of data on some
systems, e.g. a server with 128 GB of RAM may use tens of GB for
cache.

As you can see, testing of the write path for performance can take a
significant effort to ensure caches are not biasing your results.

HTH
Greg


* Forum for asking questions related to block device drivers
  2013-04-11 19:49         ` Greg Freemyer
@ 2013-04-11 20:48           ` neha naik
  2013-04-11 22:06             ` Arlie Stephens
  2013-04-11 23:02             ` Greg Freemyer
  0 siblings, 2 replies; 13+ messages in thread
From: neha naik @ 2013-04-11 20:48 UTC (permalink / raw)
  To: kernelnewbies

Hi Greg,
   Thanks a lot. Everything you said made complete sense to me, but when I
tried running with the following options my read is so slow (basically
with direct I/O, at 1MB/s it will just take 32 minutes to read 32MB of
data) yet my write is doing fine. Should I use some other options of dd?
(Though I understand that with direct we bypass all caches, direct doesn't
guarantee that everything is written when the call returns to the user,
for which I am using fdatasync.)

time dd if=/dev/shm/image of=/dev/sbd0 bs=4096 count=262144 oflag=direct
conv=fdatasync
262144+0 records in
262144+0 records out
1073741824 bytes (1.1 GB) copied, 17.7809 s, 60.4 MB/s

real    0m17.785s
user    0m0.152s
sys    0m1.564s


I interrupted the dd for the read because it was taking too much time at
1MB/s:
time dd if=/dev/pdev0 of=/dev/null bs=4096 count=262144 iflag=direct
conv=fdatasync
^C150046+0 records in
150045+0 records out
614584320 bytes (615 MB) copied, 600.197 s, 1.0 MB/s


real    10m0.201s
user    0m2.576s
sys    0m0.000s

Thanks,
Neha

On Thu, Apr 11, 2013 at 1:49 PM, Greg Freemyer <greg.freemyer@gmail.com>wrote:

> On Thu, Apr 11, 2013 at 2:50 PM, neha naik <nehanaik27@gmail.com> wrote:
> > Yes. Interestingly my direct write i/o performance is better than my
> direct
> > read i/o performance for my passthrough device... And that doesn't make
> any
> > kind of sense to me.
> >
> > pdev0 = pass through device on top of lvm
> >
> > root at voffice-base:/home/neha/sbd# time dd if=/dev/pdev0 of=/dev/null
> bs=4096
> > count=1024 iflag=direct
> > 1024+0 records in
> > 1024+0 records out
> > 4194304 bytes (4.2 MB) copied, 4.09488 s, 1.0 MB/s
> >
> > real    0m4.100s
> > user    0m0.028s
> > sys    0m0.000s
> >
> > root at voffice-base:/home/neha/sbd# time dd if=/dev/shm/image
> of=/dev/pdev0
> > bs=4096 count=1024 oflag=direct
> > 1024+0 records in
> > 1024+0 records out
> > 4194304 bytes (4.2 MB) copied, 0.0852398 s, 49.2 MB/s
> >
> > real    0m0.090s
> > user    0m0.004s
> > sys    0m0.012s
> >
> > Thanks,
> > Neha
>
> I assume your issue is caching somewhere.
>
> If in the top levels of the kernel, dd has various fsync, fdatasync,
> etc. options that should address that.  I note you aren't using any of
> them.
>
> You mention LVM.  It should pass cache flush commands down, but some
> flavors of mdraid will not the last I knew.  ie. Raid 6 used to
> discard cache flush commands iirc.  I don't know if that was ever
> fixed or not.
>
> If the cache is in hardware, then dd's cache flushing calls may or may
> not get propagated all the way to the device.  Some battery backed
> caches actually intentionally reply ACK to a cache flush command
> without actually doing it.
>
> Further, you're only writing 4MB.  Not much of a test for most
> devices.  A sata drive will typically have at least 32MB of cache.
> One way to ensure that results are not being corrupted by the various
> caches up and down the storage stack is to write so much data you
> overwhelm the caches.  That can be a huge amount of data in some
> systems.  ie. A server with 128 GB or ram may use 10's of GB for
> cache.
>
> As you can see, testing of the write path for performance can take a
> significant effort to ensure caches are not biasing your results.
>
> HTH
> Greg
>


* Forum for asking questions related to block device drivers
  2013-04-11 20:48           ` neha naik
@ 2013-04-11 22:06             ` Arlie Stephens
  2013-04-11 23:02             ` Greg Freemyer
  1 sibling, 0 replies; 13+ messages in thread
From: Arlie Stephens @ 2013-04-11 22:06 UTC (permalink / raw)
  To: kernelnewbies

Hi Neha,

On Apr 11 2013, neha naik wrote:
> HI Greg,
>    Thanks a lot. Everything you said made complete sense to me but when i
> tried running with following options my read is so slow (basically with
> direct io, that with 1MB/s it will just take 32minutes to read 32MB data)
> yet my write is doing fine. Should i use some other options of dd (though i
> understand that with direct we bypass all caches, but direct doesn't
> guarantee that everything is written when call returns to user for which i
> am using fdatasync). 

I'm no kind of expert, but the last time I found myself timing dd, I
found that the block size was critical, and 4096 bytes is a very small
block size from a dd point of view. On FreeBSD at least, cranking it
up to at least 1MB did great things for its performance. What happens
with "bs=1M"?

> time dd if=/dev/shm/image of=/dev/sbd0 bs=4096 count=262144 oflag=direct
> conv=fdatasync
> time dd if=/dev/pdev0 of=/dev/null bs=4096 count=2621262144+0 records in
> 262144+0 records out
> 1073741824 bytes (1.1 GB) copied, 17.7809 s, 60.4 MB/s
> 
> real    0m17.785s
> user    0m0.152s
> sys    0m1.564s
> 
> 
> I interrupted the dd for read because it was taking too much time with
> 1MB/s :
> time dd if=/dev/pdev0 of=/dev/null bs=4096 count=262144 iflag=direct
> conv=fdatasync
> ^C150046+0 records in
> 150045+0 records out
> 614584320 bytes (615 MB) copied, 600.197 s, 1.0 MB/s
> 
> 
> real    10m0.201s
> user    0m2.576s
> sys    0m0.000s
> 
> Thanks,
> Neha


* Forum for asking questions related to block device drivers
  2013-04-11 20:48           ` neha naik
  2013-04-11 22:06             ` Arlie Stephens
@ 2013-04-11 23:02             ` Greg Freemyer
  2013-04-12 18:01               ` neha naik
  1 sibling, 1 reply; 13+ messages in thread
From: Greg Freemyer @ 2013-04-11 23:02 UTC (permalink / raw)
  To: kernelnewbies

On Thu, Apr 11, 2013 at 4:48 PM, neha naik <nehanaik27@gmail.com> wrote:
> HI Greg,
>    Thanks a lot. Everything you said made complete sense to me but when i
> tried running with following options my read is so slow (basically with
> direct io, that with 1MB/s it will just take 32minutes to read 32MB data)
> yet my write is doing fine. Should i use some other options of dd (though i
> understand that with direct we bypass all caches, but direct doesn't
> guarantee that everything is written when call returns to user for which i
> am using fdatasync).
>
> time dd if=/dev/shm/image of=/dev/sbd0 bs=4096 count=262144 oflag=direct
> conv=fdatasync
> time dd if=/dev/pdev0 of=/dev/null bs=4096 count=2621262144+0 records in
> 262144+0 records out
> 1073741824 bytes (1.1 GB) copied, 17.7809 s, 60.4 MB/s
>
> real    0m17.785s
> user    0m0.152s
> sys    0m1.564s
>
>
> I interrupted the dd for read because it was taking too much time with 1MB/s
> :
> time dd if=/dev/pdev0 of=/dev/null bs=4096 count=262144 iflag=direct
> conv=fdatasync
> ^C150046+0 records in
> 150045+0 records out
> 614584320 bytes (615 MB) copied, 600.197 s, 1.0 MB/s
>
>
> real    10m0.201s
> user    0m2.576s
> sys    0m0.000s

Before reading the below, please note that rotating disks are made of
zones with a constant number of sectors/track.  In the below I discuss
1 track as holding 1MB of data.  I believe that is roughly accurate
for an outer track with near 3" of diameter.  An inner track with
roughly 2" of diameter would only hold about 2/3rds of 1MB of data.  I am
ignoring that for simplicity's sake.  You can worry about it yourself
separately.

====
When you use iflag=direct, you are telling the kernel, I know what I'm
doing, just do it.

So let's do some math and see if we can figure it out.  I assume you
are working with rotating media as your backing store for the LVM
volumes.

A rotating disk with 6000 RPMs takes 10 milliseconds per revolution.
(I'm using this because the math is easy.  Check the specs for your
drives.)

With iflag=direct, you have taken control of interacting with a
rotating disk that can only read data once every rotation. That is
relevant sectors are only below the read head once every 10 msecs.

So, you are saying, give me 4KB every time the data rotates below the
read head.  That happens about 100 times per second, so per my logic
you should be seeing 400KB/sec read rate.
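
(That arithmetic as a trivial sketch, using the 6000 RPM and
4-KiB-per-revolution figures assumed above:)

/* Sketch of the estimate: one 4 KiB block serviced per disk revolution. */
#include <stdio.h>

int main(void)
{
        double rpm = 6000.0;                /* assumed spindle speed     */
        double revs_per_sec = rpm / 60.0;   /* ~100 revolutions/second   */
        double bytes_per_rev = 4096.0;      /* dd bs=4096, one block/rev */

        printf("expected rate: %.0f KB/s\n",
               revs_per_sec * bytes_per_rev / 1024.0);  /* 400 KB/s */
        return 0;
}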

You are actually getting roughly twice that.  Thus my question is: what
is happening in your setup such that you are getting 10KB per rotation
instead of the 4KB you asked for?  (The answer could be that you have
15K RPM drives instead of the 6K RPM drives I calculated for.)

My laptop is giving 20MB/sec with bs=4KB which implies I'm getting 50x
the speed I expect from the above theory.  I have to assume some form
of read-ahead is going on and reading 256KB at a time.  That logic may
be in my laptop's disk and not the kernel. (I don't know for sure).

Arlie recommended 1 MB reads.  That should be a lot faster because a
disk track is roughly 1 MB, so you are telling the disk: as you spin,
when you get to the sector I care about, do a continuous read for a
full rotation (1MB).  By the time you ask for the next 1MB, I would
expect it to be too late to get the very next sector, so the drive
would do a full rotation looking for your sector, then do a continuous
1MB read.

So, if my logic is right the drive itself is doing:

rotation 1: searching for first sector of read
rotation 2: read 1MB continuously
rotation 3: searching for first sector of next read
rotation 4: read 1MB continuously

I just checked my laptop's drive, and with bs=1MB it actually achieves
more or less max transfer rate, so for it at least with 1MB reads the
cpu / drive controller is able to keep up with the rotating disk and
not have the 50% wasted rotations that I would actually expect.

Again it appears something is doing some read ahead.  Let's assume my
laptop's disk does a 256KB readahead every time it gets a read
request.  So when it gets that 1MB request, it actually reads
1MB+256KB, but it returns the first 1MB to the cpu as soon as it has
it.  Thus when the 1MB is returned to the cpu, the drive is still
working on the next 256KB and putting it in on-disk cache.  If 256KB
is 1/4 of a track's data, then it takes the disk about 2.5 msecs to
read that data from the rotating platter to the drive's internal controller
cache.  If during that 2.5 msecs the cpu issues the next 1MB read
request, the disk will just continue reading and not have any dead
time.

If you want to understand exactly what is happening you would need to
monitor exactly what is going back and forth across the SATA bus.  Is
the kernel doing a read-ahead even with direct I/O?  Is the drive doing
some kind of read-ahead?  Etc.

If you are going to work with direct io, hopefully the above gives you
a new way to think about things.
Greg


* Forum for asking questions related to block device drivers
  2013-04-11 23:02             ` Greg Freemyer
@ 2013-04-12 18:01               ` neha naik
  0 siblings, 0 replies; 13+ messages in thread
From: neha naik @ 2013-04-12 18:01 UTC (permalink / raw)
  To: kernelnewbies

Hi Greg,
   I am using an SSD underneath. However, my problem is not exactly related
to the disk cache. I think I should give some more background.

  These are my key points:

   1.  Read on my passthrough driver on top of LVM is slower than read on
   just the LVM (with or without any kind of direct I/O).
   2.  Read on my passthrough driver (on top of LVM) is slower than write
   on my passthrough driver (on top of LVM).
   3.  If I disable LVM readahead (we can do that for all block device
   drivers), then its read performance becomes almost equal to the read
   performance of my passthrough driver. This suggests that LVM readahead
   was helping LVM's performance. But if it helps the LVM performance, then
   it should also help the performance of my passthrough driver (which is
   sitting on top of it). This led me to think that I am doing something in
   my device driver which is possibly either disabling the LVM readahead,
   or that LVM readahead gets switched off when LVM is not interacting with
   the kernel directly.

  Given this, I am thinking there may be some issue with how I have written
my device driver (or rather, how I have used the API). I am using the
'merge_bvec_fn' of the LVM device underneath it, which I think should have
merged the I/Os (since we are doing sequential I/O). But that is clearly
not the case. When I print the pages that come to my driver, I see that
each time the function 'make_request' gets called with one page. Shouldn't
it be merging the I/O using the LVM merge function, or does it not work
like that? That is, should each driver write its own 'merge_bvec_fn' and
not rely on the driver beneath it to take care of that?
   Or is there some problem when I pass the request to LVM (should I be
calling something else or passing some kind of flag)?
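
(One concrete thing to check, related to point 3 above: the stacked queue
has its own readahead window, q->backing_dev_info.ra_pages, and it is not
inherited from the device below, so a larger readahead configured on the
LVM device does not automatically apply to buffered reads of /dev/pdev0. A
minimal sketch of propagating it at queue-init time, assuming the pre-3.x
embedded backing_dev_info and the passdev/bdev_backing names from the
earlier snippet:)

/* Sketch only: give the stacked queue at least the backing queue's
 * readahead window, so buffered reads of the passthrough device see
 * the same readahead as reads of the underlying LVM device. */
struct request_queue *bq = bdev_get_queue(passdev->bdev_backing);

passdev->queue->backing_dev_info.ra_pages =
        max(passdev->queue->backing_dev_info.ra_pages,
            bq->backing_dev_info.ra_pages);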


Regards,
Neha


On Thu, Apr 11, 2013 at 5:02 PM, Greg Freemyer <greg.freemyer@gmail.com>wrote:

> On Thu, Apr 11, 2013 at 4:48 PM, neha naik <nehanaik27@gmail.com> wrote:
> > HI Greg,
> >    Thanks a lot. Everything you said made complete sense to me but when i
> > tried running with following options my read is so slow (basically with
> > direct io, that with 1MB/s it will just take 32minutes to read 32MB data)
> > yet my write is doing fine. Should i use some other options of dd
> (though i
> > understand that with direct we bypass all caches, but direct doesn't
> > guarantee that everything is written when call returns to user for which
> i
> > am using fdatasync).
> >
> > time dd if=/dev/shm/image of=/dev/sbd0 bs=4096 count=262144 oflag=direct
> > conv=fdatasync
> > time dd if=/dev/pdev0 of=/dev/null bs=4096 count=2621262144+0 records in
> > 262144+0 records out
> > 1073741824 bytes (1.1 GB) copied, 17.7809 s, 60.4 MB/s
> >
> > real    0m17.785s
> > user    0m0.152s
> > sys    0m1.564s
> >
> >
> > I interrupted the dd for read because it was taking too much time with
> 1MB/s
> > :
> > time dd if=/dev/pdev0 of=/dev/null bs=4096 count=262144 iflag=direct
> > conv=fdatasync
> > ^C150046+0 records in
> > 150045+0 records out
> > 614584320 bytes (615 MB) copied, 600.197 s, 1.0 MB/s
> >
> >
> > real    10m0.201s
> > user    0m2.576s
> > sys    0m0.000s
>
> Before reading the below, please not the rotating disks are made of
> zones with a constant number of sectors/track.  In the below I discuss
> 1 track as holding 1MB of data.  I believe that is roughly accurate
> for an outer track with near 3" of diameter.  A inner track with
> roughly 2" of diameter, would only have 2/3rds of 1MB of data.  I am
> ignoring that for simplicity sake.  You can worry about it yourself
> separately.
>
> ====
> When you use iflag=direct, you are telling the kernel, I know what I'm
> doing, just do it.
>
> So let's do some math and see if we can figure it out.  I assume you
> are working with rotating media as your backing store for the LVM
> volumes.
>
> A rotating disk with 6000 RPMs takes 10 milliseconds per revolution.
> (I'm using this because the math is easy.  Check the specs for your
> drives.)
>
> With iflag=direct, you have taken control of interacting with a
> rotating disk that can only read data once every rotation. That is
> relevant sectors are only below the read head once every 10 msecs.
>
> So, you are saying, give me 4KB every time the data rotates below the
> read head.  That happens about 100 times per second, so per my logic
> you should be seeing 400KB/sec read rate.
>
> You are actually getting roughly twice that.  Thus my question is what
> is happening in your setup that you are getting 10KB per rotation
> instead of the 4KB you asked for.  (the answer could be that you have
> 15K rpm drives, instead of the 6K rpm drives I calculated for.)
>
> My laptop is giving 20MB/sec with bs=4KB which implies I'm getting 50x
> the speed I expect from the above theory.  I have to assume some form
> of read-ahead is going on and reading 256KB at a time.  That logic may
> be in my laptop's disk and not the kernel. (I don't know for sure).
>
> Arlie recommended 1 MB reads.  That should be a lot faster because a
> disk track is roughly 1 MB, so you are telling the disk: As you spin,
> when you get to the sector I care about, do a continuous read for a
> full rotation (1MB).  By the time you ask for the next 1MB, I would
> expect it will be too late get the very next sector, so the drive
> would do a full rotation looking for your sector, then do a continuous
> 1MB read.
>
> So, if my logic is right the drive itself is doing:
>
> rotation 1: searching for first sector of read
> rotation 2: read 1MB continuously
> rotation 3: searching for first sector of next read
> rotation 4: read 1MB continuously
>
> I just checked my laptop's drive, and with bs=1MB it actually achieves
> more or less max transfer rate, so for it at least with 1MB reads the
> cpu / drive controller is able to keep up with the rotating disk and
> not have the 50% wasted rotations that I would actually expect.
>
> Again it appears something is doing some read ahead.  Let's assume my
> laptop's disk does a 256KB readahead every time it gets a read
> request.  So when it gets that 1MB request, it actually reads
> 1MB+256KB, but it returns the first 1MB to the cpu as soon as it has
> it.  Thus when the 1MB is returned to the cpu, the drive is still
> working on the next 256KB and putting it in on-disk cache.  If 256KB
> is 1/4 of a track's data, then it takes the disk about 2.5 msecs to
> read that data from the rotating platter to drives internal controller
> cache.  If during that 2.5 msecs the cpu issues the next 1MB read
> request, the disk will just continue reading and not have any dead
> time.
>
> If you want to understand exactly what is happening you would need to
> monitor exactly what is going back and forth across the sata bus.  Is
> the kernel doing a read-ahead even with direct io?  Is the drive doing
> some kind of read ahead? etc.
>
> If you are going to work with direct io, hopefully the above gives you
> a new way to think about things.
> Greg
>


* simple question about struct pointer
  2013-04-11 18:50       ` neha naik
  2013-04-11 19:49         ` Greg Freemyer
@ 2013-04-15  7:02         ` Ben Wu
  2013-04-15  9:48           ` arshad hussain
  1 sibling, 1 reply; 13+ messages in thread
From: Ben Wu @ 2013-04-15  7:02 UTC (permalink / raw)
  To: kernelnewbies


Dear All, I'm new to Linux kernel programming, and I find struct pointers difficult to understand. What does "struct s3c_i2sv2_info *i2s = snd_soc_dai_get_drvdata(cpu_dai)" mean? Why is a struct pointer used to assign another struct pointer?


static int s3c2412_i2s_hw_params(struct snd_pcm_substream *substream,
                                 struct snd_pcm_hw_params *params,
                                 struct snd_soc_dai *cpu_dai)
{
        struct s3c_i2sv2_info *i2s = snd_soc_dai_get_drvdata(cpu_dai);
        struct s3c_dma_params *dma_data;
        ....................................................................
}


* simple question about struct pointer
  2013-04-15  7:02         ` simple question about struct pointer Ben Wu
@ 2013-04-15  9:48           ` arshad hussain
  0 siblings, 0 replies; 13+ messages in thread
From: arshad hussain @ 2013-04-15  9:48 UTC (permalink / raw)
  To: kernelnewbies

On Mon, Apr 15, 2013 at 12:32 PM, Ben Wu <crayben@yahoo.cn> wrote:

>
> Dear All ?Im new to linux kernel program, and found struct pointer is
> difficult to understand, that the  struct s3c_i2sv2_info *i2s =
> snd_soc_dai_get_drvdata(cpu_dai) means,? why is use the struct pointer
> assgn another struct pointer??
>
>
> static int s3c2412_i2s_hw_params(struct snd_pcm_substream *substream,
>                  struct snd_pcm_hw_params *params,
>                  struct snd_soc_dai *cpu_dai)
> {
>     struct s3c_i2sv2_info *i2s = snd_soc_dai_get_drvdata(cpu_dai);
>     struct s3c_dma_params *dma_data;
>     ....................................................................
> }
> _______________________________________________
> Kernelnewbies mailing list
> Kernelnewbies at kernelnewbies.org
> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
>
>

Have a look at the utility "cdecl"; it will help you greatly.
However, the statement above means that snd_soc_dai_get_drvdata() is a
function which takes a pointer argument and returns a pointer to
struct s3c_i2sv2_info.

Here the pointer i2s is declared and initialized in one statement.
Otherwise it would do nasty things if you tried to dereference it while
uninitialized.
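
(The same pattern in a tiny standalone example, nothing ASoC-specific; the
names here are made up purely for illustration:)

#include <stdio.h>

struct point { int x, y; };

/* a function that hands back a pointer to an existing struct, much like
 * snd_soc_dai_get_drvdata() hands back the driver's private data */
static struct point *get_origin(void)
{
        static struct point origin = { 0, 0 };
        return &origin;
}

int main(void)
{
        /* declare the pointer and initialize it from the function's return
         * value in one statement, instead of dereferencing it uninitialized */
        struct point *p = get_origin();

        printf("x=%d y=%d\n", p->x, p->y);
        return 0;
}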

Thanks,


end of thread

Thread overview: 13+ messages
2013-04-10 20:53 Forum for asking questions related to block device drivers neha naik
2013-04-11  5:15 ` Rajat Sharma
2013-04-11 15:09   ` neha naik
2013-04-11 17:53     ` Rajat Sharma
2013-04-11 18:50       ` neha naik
2013-04-11 19:49         ` Greg Freemyer
2013-04-11 20:48           ` neha naik
2013-04-11 22:06             ` Arlie Stephens
2013-04-11 23:02             ` Greg Freemyer
2013-04-12 18:01               ` neha naik
2013-04-15  7:02         ` simple question about struct pointer Ben Wu
2013-04-15  9:48           ` arshad hussain
2013-04-11  7:47 ` Forum for asking questions related to block device drivers Bjørn Mork
