linux-kernel.vger.kernel.org archive mirror
* Re: [Lse-tech] Re: Preliminary results of using multiblock raw I/O
@ 2001-10-22 19:08 Shailabh Nagar
  2001-10-23  6:42 ` Jens Axboe
  0 siblings, 1 reply; 10+ messages in thread
From: Shailabh Nagar @ 2001-10-22 19:08 UTC (permalink / raw)
  To: Reto Baettig; +Cc: lse-tech, linux-kernel



Unlike the SGI patch, the multiple block size patch continues to use buffer
heads. So the biggest atomic transfer request that can be seen by a device
driver with the multiblocksize patch is still 1 page.

Getting bigger transfers would require a single buffer head to be able to
point to a multipage buffer, or not use buffer heads at all.
The former would obviously be a major change and suitable only for 2.5
(perhaps as part of the much-awaited rewrite of the block I/O
subsystem). The use of multipage transfers using a single buffer head would
also help non-raw I/O transfers. I don't know if anyone is working along
those lines.

Incidentally, the multiple block size patch doesn't check whether the
device driver can handle large requests - that's on the todo list of
changes.
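
To make the overhead concrete, here is a hedged userspace sketch (not from
any patch in this thread; a 4K page size is an assumption) that just counts
the page-bounded submissions a request of a given size needs:

/* Each buffer-head submission covers at most one page, so a large
 * raw I/O request becomes many page-sized pieces that the request
 * layer must re-merge. Plain arithmetic; no kernel API is used. */
#include <stdio.h>

#define PG 4096UL   /* assumed page size */

int main(void)
{
    unsigned long sizes[] = { 4096UL, 65536UL, 1048576UL };
    unsigned long i, n;

    for (i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++) {
        n = (sizes[i] + PG - 1) / PG;   /* page-sized submissions */
        printf("%8lu bytes -> %4lu submissions (vs. 1 multipage request)\n",
               sizes[i], n);
    }
    return 0;
}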


Shailabh Nagar
Enterprise Linux Group, IBM TJ Watson Research Center
(914) 945 2851, T/L 862 2851


Reto Baettig <baettig@scs.ch> wrote on 10/22/2001 03:50:16 AM:
Hi!

We had 200MB/s on 2.2.18 with the SGI raw patch and approximately 10%
CPU load. On 2.4.3-12, we get 100MB/s with 100% CPU load. Is there a way
of getting even bigger transfers than one page for the aligned part? With
the SGI patch, there was much less waiting for I/O completion because
we could transfer 1MB in one chunk. I'm sorry, but I don't have time at
the moment to test the patch; I will send you our numbers as soon as
we have some time.

Good to see somebody working on it! Thanks!

Reto

Shailabh Nagar wrote:
>
> Following up on the previous mail with patches for doing multiblock raw
> I/O:
>
> Experiments on a 2-way, 850MHz PIII, 256K cache, 256M memory
> Running bonnie (modified to allow specification of the O_DIRECT option,
> target file, etc.)
> Only the block tests (rewrite, read, write) have been run. All tests
> are single threaded.
>
> BW  = bandwidth in kB/s
> cpu = %CPU use
> abs = size of each I/O request
>       (NOT blocksize used by underlying raw I/O mechanism !)
>
> pre2 = using kernel 2.4.13-pre2aa1
> multi = 2.4.13-pre2aa1 kernel with multiblock raw I/O patches applied
>         (both /dev/raw and O_DIRECT)
>
>                   /dev/raw (uses 512 byte blocks)
>                ===============================
>
>          rewrite              write                   read
> ------------------------------------------------------------------
>      pre2      multi       pre2     multi         pre2     multi
> ------------------------------------------------------------------
> abs BW  cpu   BW  cpu     BW  cpu   BW  cpu      BW  cpu   BW  cpu
> ------------------------------------------------------------------
>  4k 884 0.5   882 0.1    1609 0.3  1609 0.2     9841 1.5  9841 0.9
>  6k 884 0.5   882 0.2    1609 0.5  1609 0.1     9841 1.8  9841 1.2
> 16k 884 0.6   882 0.2    1609 0.3  1609 0.0     9841 2.7  9841 1.4
> 18k 884 0.4   882 0.2    1609 0.4  1607 0.1     9841 2.4  9829 1.2
> 64k 883 0.5   882 0.1    1609 0.4  1609 0.3     9841 2.0  9841 0.6
> 66k 883 0.5   882 0.2    1609 0.5  1609 0.2     9829 3.4  9829 1.0
>
>                O_DIRECT : on filesystem with 1K blocksize
>             ===========================================
>
>          rewrite              write                   read
> ------------------------------------------------------------------
>      pre2      multi       pre2     multi         pre2     multi
> ------------------------------------------------------------------
> abs BW  cpu   BW  cpu     BW  cpu   BW  cpu      BW  cpu   BW  cpu
> ------------------------------------------------------------------
>  4k 854 0.8   880 0.4    1527 0.5  1607 0.1     9731 2.5  9780 1.3
>  6k 856 0.4   882 0.3    1527 0.4  1607 0.1     9732 1.6  9780 0.7
> 16k 857 0.4   881 0.1    1527 0.3  1608 0.0     9732 2.2  9780 1.2
> 18k 857 0.3   882 0.2    1527 0.4  1607 0.1     9731 1.9  9780 1.0
> 64k 857 0.3   881 0.1    1526 0.4  1607 0.2     9732 1.6  9780 1.6
> 66k 856 0.4   882 0.2    1527 0.4  1607 0.2     9731 2.7  9780 1.2
>
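
For reference, a minimal sketch of the kind of O_DIRECT access the
modified bonnie relies on. The actual bonnie changes are not shown in this
thread, so the device path and the 64k request size below are illustrative
assumptions only.

/* Open a target with O_DIRECT and issue one aligned read. O_DIRECT
 * requires the buffer, file offset and transfer length to be aligned;
 * a page-aligned buffer satisfies any sector-size rule. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    const char *path = argc > 1 ? argv[1] : "/dev/sdb"; /* assumption */
    void *buf;
    ssize_t n;
    int fd = open(path, O_RDONLY | O_DIRECT);

    if (fd < 0) {
        perror("open");
        return 1;
    }
    if (posix_memalign(&buf, 4096, 65536) != 0)
        return 1;

    n = read(fd, buf, 65536);   /* one "abs"-sized request */
    if (n < 0)
        perror("read");
    else
        printf("read %zd bytes\n", n);

    free(buf);
    close(fd);
    return 0;
}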



* Re: [Lse-tech] Re: Preliminary results of using multiblock raw I/O
@ 2001-10-23 14:05 Shailabh Nagar
  0 siblings, 0 replies; 10+ messages in thread
From: Shailabh Nagar @ 2001-10-23 14:05 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Reto Baettig, lse-tech, linux-kernel



>On Mon, Oct 22 2001, Shailabh Nagar wrote:
>>
>>
>> Unlike the SGI patch, the multiple block size patch continues to use
>> buffer heads. So the biggest atomic transfer request that can be seen
>> by a device driver with the multiblocksize patch is still 1 page.
>
>Not so. Given a 1MB contiguous request broken into 256 pages, even if
>submitted in these chunks it will be merged into the biggest possible
>request the lower level driver can handle. This is typically 127kB; for
>SCSI it can be as much as 512kB currently, and depending on the SCSI
>driver maybe even more.

My mistake - by "device driver" I wasn't referring only to the lowest-level
drivers; I was also including the merging functionality.
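
A quick hedged arithmetic check of the merging point above, treating 127kB
and 512kB as the request-size caps mentioned; illustrative only:

/* A contiguous 1MB request submitted as 256 page-sized pieces still
 * reaches the low-level driver as only a handful of merged requests. */
#include <stdio.h>

int main(void)
{
    unsigned long total = 1048576UL;                /* 1MB request */
    unsigned long caps[] = { 130048UL, 524288UL };  /* 127kB, 512kB */
    unsigned long i;

    printf("%lu bytes = %lu page-sized submissions\n", total, total / 4096);
    for (i = 0; i < 2; i++)
        printf("cap %6lu bytes -> %lu merged requests\n",
               caps[i], (total + caps[i] - 1) / caps[i]);
    return 0;
}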

>
>I haven't seen the SGI rawio patch, but I'm assuming it used kiobufs to
>pass a single unit of 1 meg down at a time. Yes, currently we do incur
>significant overhead compared to that approach.
>
>> Getting bigger transfers would require a single buffer head to be able
>> to point to a multipage buffer, or not use buffer heads at all.
>> The former would obviously be a major change and suitable only for 2.5
>> (perhaps as part of the much-awaited rewrite of the block I/O
>> subsystem).
>
>Ongoing effort.
>
>> The use of multipage transfers using a single buffer head would
>> also help non-raw I/O transfers. I don't know if anyone is working
>> along those lines.
>
>It is being worked on.


Could you give some idea of what is being discussed/proposed?
It would be nice to know some of the details as they are being worked on.

Thanks,
Shailabh Nagar





* Re: [Lse-tech] Re: Preliminary results of using multiblock raw I/O
@ 2001-10-23 14:12 Shailabh Nagar
  2001-10-23 18:10 ` Jens Axboe
  0 siblings, 1 reply; 10+ messages in thread
From: Shailabh Nagar @ 2001-10-23 14:12 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Martin Frey, 'Reto Baettig', lse-tech, linux-kernel




>On Tue, Oct 23 2001, Martin Frey wrote:
>> >I haven't seen the SGI rawio patch, but I'm assuming it used kiobufs to
>> >pass a single unit of 1 meg down at a time. Yes, currently we do incur
>> >significant overhead compared to that approach.
>> >
>> Yes, it used kiobufs to get a gather list, set up a gather DMA out
>> of that list and submitted it to the SCSI layer. Depending on
>> the controller, 1 MB could be transferred with 0 memcopies, 1 DMA,
>> 1 interrupt. 200 MB/s with 10% CPU load was really impressive.
>
>Let me repeat that the only difference between the kiobuf and the
>current approach is the overhead incurred on multiple __make_request
>calls. Given the current short queues, this isn't as bad as it used to
>be. Of course it isn't free, though.

The patch below attempts to address exactly that - reducing the number of
submit_bh()/__make_request() calls made for raw I/O. The basic idea is to
do the major part of the I/O in page-sized blocks.

Comments on the idea?
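
For concreteness, a small hedged demo of the <initial>/<pagealigned>/<final>
split: it mirrors the startpg/endpg computation in the diff below, with
made-up offset and size values and an assumed 4K page.

/* <initial> and <final> are done at sector granularity, <pagealigned>
 * at PAGE_SIZE granularity; startpg/endpg bound the aligned middle. */
#include <stdio.h>

#define PG_SIZE 4096LL
#define PG_MASK (~(PG_SIZE - 1))

int main(void)
{
    long long offp = 1536, size = 70000;    /* illustrative values */
    long long startpg = (offp + (PG_SIZE - 1)) & PG_MASK;
    long long endpg = (offp + size) & PG_MASK;

    if (startpg == endpg) {
        printf("request too small: all I/O at sector granularity\n");
    } else {
        printf("<initial>     [%lld, %lld) in sectors\n", offp, startpg);
        printf("<pagealigned> [%lld, %lld) in pages\n", startpg, endpg);
        printf("<final>       [%lld, %lld) in sectors\n", endpg, offp + size);
    }
    return 0;
}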


diff -Naur linux-2.4.10-v/drivers/char/raw.c linux-2.4.10-rawvar/drivers/char/raw.c
--- linux-2.4.10-v/drivers/char/raw.c    Sat Sep 22 23:35:43 2001
+++ linux-2.4.10-rawvar/drivers/char/raw.c    Wed Oct 17 16:31:43 2001
@@ -283,6 +283,9 @@

     int       sector_size, sector_bits, sector_mask;
     int       max_sectors;
+
+    int             cursector_size, cursector_bits;
+    loff_t          startpg,endpg ;

     /*
      * First, a few checks on device size limits
@@ -304,8 +307,8 @@
     }

     dev = to_kdev_t(raw_devices[minor].binding->bd_dev);
-    sector_size = raw_devices[minor].sector_size;
-    sector_bits = raw_devices[minor].sector_bits;
+    sector_size = cursector_size = raw_devices[minor].sector_size;
+    sector_bits = cursector_bits = raw_devices[minor].sector_bits;
     sector_mask = sector_size- 1;
     max_sectors = KIO_MAX_SECTORS >> (sector_bits - 9);

@@ -325,6 +328,23 @@
     if ((*offp >> sector_bits) >= limit)
          goto out_free;

+    /* Using multiple I/O granularities
+       Divide <size> into <initial> <pagealigned> <final>
+       <initial> and <final> are done at sector_size granularity
+       <pagealigned> is done at PAGE_SIZE granularity
+       startpg, endpg define the boundaries of <pagealigned>.
+       They also serve as flags on whether PAGE_SIZE I/O is
+       done at all (its unnecessary if <size> is sufficiently small)
+    */
+
+    startpg = (*offp + (loff_t)(PAGE_SIZE - 1)) & (loff_t)PAGE_MASK ;
+    endpg = (*offp + (loff_t) size) & (loff_t)PAGE_MASK ;
+
+    if ((startpg == endpg) || (sector_size == PAGE_SIZE))
+         /* PAGE_SIZE I/O either unnecessary or being done anyway */
+         /* impossible values make startpg,endpg act as flags     */
+         startpg = endpg = ~(loff_t)0 ;
+
     /*
      * Split the IO into KIO_MAX_SECTORS chunks, mapping and
      * unmapping the single kiobuf as we go to perform each chunk of
@@ -332,9 +352,23 @@
      */

     transferred = 0;
-    blocknr = *offp >> sector_bits;
     while (size > 0) {
-         blocks = size >> sector_bits;
+
+         if (*offp  == startpg) {
+              cursector_size = PAGE_SIZE ;
+              cursector_bits = PAGE_SHIFT ;
+         }
+         else if (*offp == endpg) {
+              cursector_size = sector_size ;
+              cursector_bits = sector_bits ;
+         }
+
+         blocknr = *offp >> cursector_bits ;
+         max_sectors = KIO_MAX_SECTORS >> (cursector_bits - 9) ;
+         if (limit != INT_MAX)
+              limit = (((loff_t) blk_size[MAJOR(dev)][MINOR(dev)]) << BLOCK_SIZE_BITS) >> cursector_bits ;
+
+         blocks = size >> cursector_bits;
          if (blocks > max_sectors)
               blocks = max_sectors;
          if (blocks > limit - blocknr)
@@ -342,7 +376,7 @@
          if (!blocks)
               break;

-         iosize = blocks << sector_bits;
+         iosize = blocks << cursector_bits;

          err = map_user_kiobuf(rw, iobuf, (unsigned long) buf, iosize);
          if (err)
@@ -351,7 +385,7 @@
          for (i=0; i < blocks; i++)
               iobuf->blocks[i] = blocknr++;

-         err = brw_kiovec(rw, 1, &iobuf, dev, iobuf->blocks, sector_size);
+         err = brw_kiovec(rw, 1, &iobuf, dev, iobuf->blocks, cursector_size);

          if (rw == READ && err > 0)
               mark_dirty_kiobuf(iobuf, err);
@@ -360,6 +394,7 @@
               transferred += err;
               size -= err;
               buf += err;
+              *offp += err ;
          }

          unmap_kiobuf(iobuf);
@@ -369,7 +404,6 @@
     }

     if (transferred) {
-         *offp += transferred;
          err = transferred;
     }





Thread overview: 10+ messages
2001-10-22 19:08 [Lse-tech] Re: Preliminary results of using multiblock raw I/O Shailabh Nagar
2001-10-23  6:42 ` Jens Axboe
2001-10-23  9:59   ` Martin Frey
2001-10-23 10:02     ` Jens Axboe
2001-10-23 16:23   ` Alan Cox
2001-10-23 17:49     ` Jens Axboe
2001-10-23 18:04       ` Alan Cox
2001-10-23 14:05 Shailabh Nagar
2001-10-23 14:12 Shailabh Nagar
2001-10-23 18:10 ` Jens Axboe
