linux-kernel.vger.kernel.org archive mirror
* Disk Performance Measurements
From: Shaun @ 2001-05-02  6:31 UTC
  To: linux-kernel


Hi All,

I've now been battling for several days with the kernel performance stats
for disks and have come to the conclusion I need some assistance from
someone with a little more understanding of the block device support in
the kernel (and the kernel in general).

Initially I've been looking at the relevance of the statistics in the
/proc/stat file (this is on a 2.2.16 kernel). My understanding of these
figures is as follows:
	- There are seven different statistics kept for each of the first
4 IDE disks
	- The relevant fields are disk, disk_rio, disk_wio, disk_rblk,
disk_wblk, disk_pgin and disk_pgout. The columns following the labels
represent hda, hdb, hdc and hdd respectively.
	- disk = disk_rio + disk_wio - the total number of I/Os issued to
the device
	- disk_rio - the total number of read I/Os issued to the device
	- disk_wio - the total number of write I/Os issued to the device
	- disk_rblk - The total number of blocks read from the device
	- disk_wblk - the total number of blocks written to the device
	- disk_pgin - The total number of buffer reads fulfilled
	- disk_pgout - the total number of buffer writes fulfilled
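The field layout above can be read back with a small parser. This is a minimal sketch. It assumes each per-disk line of a 2.2-era /proc/stat looks like "label v_hda v_hdb v_hdc v_hdd", as described in the list. The helper name parse_disk_stats is hypothetical, and the sample text is made up for illustration rather than captured from a real machine.

```python
# Minimal sketch: parse the per-disk lines of a 2.2-era /proc/stat.
# Assumption: each line is "label v_hda v_hdb v_hdc v_hdd" (four columns).
# The sample values are illustrative, not real measurements.
DISKS = ("hda", "hdb", "hdc", "hdd")

def parse_disk_stats(text):
    """Return {label: {disk: value}} for every line whose label starts with 'disk'."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields or not fields[0].startswith("disk"):
            continue  # skip cpu, page, swap, intr, ... lines
        values = [int(v) for v in fields[1:1 + len(DISKS)]]
        stats[fields[0]] = dict(zip(DISKS, values))
    return stats

sample = """\
cpu  1000 0 500 98500
disk 150 0 12 0
disk_rio 100 0 12 0
disk_wio 50 0 0 0
disk_rblk 800 0 96 0
disk_wblk 400 0 0 0
"""

stats = parse_disk_stats(sample)
for d in DISKS:
    # Per the field description, disk should equal disk_rio + disk_wio.
    assert stats["disk"][d] == stats["disk_rio"][d] + stats["disk_wio"][d]
print(stats["disk_rblk"])  # {'hda': 800, 'hdb': 0, 'hdc': 96, 'hdd': 0}
```

On a live system the same function could be fed the contents of /proc/stat. Only lines whose label begins with "disk" are kept.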

These statistics are maintained by code in drivers/block/ll_rw_blk.c. My
problem is that both disk_r/wblk and disk_pgin/out appear to be incorrect.

In regards to disk_r/wblk, drive_stat_acct() increments the number of
sectors/blocks read based on the values in the request being processed by
add_request(). But add_request() is only called for requests that can't be
merged with requests currently on the queue. Thus the counters can't be
updated for sectors that are read by being added to a queued
request. Unless I'm mistaken, this makes disk_r/wblk mostly useless.
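The concern above can be illustrated with a toy model. This is not kernel code. It assumes, as described, that sectors are counted only when a new request is added to the queue and never when a buffer is back-merged into a queued request. The Queue class and its submit() method are hypothetical stand-ins for the real queueing path.

```python
# Toy model (not kernel code): sectors are accounted only when a request is
# added, not when a contiguous buffer is merged into a queued request.
# Queue and submit() are hypothetical names used for illustration only.
class Queue:
    def __init__(self):
        self.requests = []  # each request: [start_sector, nr_sectors]
        self.rblk = 0       # sectors accounted

    def submit(self, start, nr_sectors):
        for req in self.requests:
            if req[0] + req[1] == start:  # back merge: contiguous with req
                req[1] += nr_sectors      # no accounting on this path
                return
        self.requests.append([start, nr_sectors])
        self.rblk += nr_sectors           # accounting only on add

q = Queue()
for i in range(8):  # eight contiguous 1 kB reads (2 sectors each)
    q.submit(i * 2, 2)
print(q.rblk, "sectors accounted of", 8 * 2)  # 2 sectors accounted of 16
```

Under this assumption, a sequential read that merges into one request is counted as a single 1 kB read.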

The disk_pgin/out fields appear to have been added based on a patch by
Sebastian Godard
(http://www.uwsg.indiana.edu/hypermail/linux/kernel/0002.1/1106.html) submitted
on the 13th of Feb 2000. According to his email, the statistics should
record the _kilobytes_ read or written to the disks. His code adds
drive_pg_stat_acct(). This routine increments disk_pgin/out once for each
call to make_request(). Presumably he has assumed every call to
make_request() will always be for 2 sectors/1 kilobyte worth of
data. However, I added printk() statements to try to verify this and found
that the request to the block device need not be 1024 bytes; I frequently
saw 4096-byte requests. In fact, the "correct_size" for the block device
appeared to be changeable from partition to partition on the same
disk. This "correct_size" appears to be related to the block size for the
filesystem on the partition/disk? Following from the above logic, it would
appear that the pgin/pgout statistics are also useless since you don't
know how large the requests were?
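The arithmetic behind this objection is short. A counter that adds one per make_request() call equals kilobytes only if every buffer is exactly 1 kB. The buffer sizes below are illustrative assumptions, not measurements.

```python
# Sketch: one increment per make_request() call vs. the actual volume in kB.
# The buffer sizes are illustrative (a mix of 1 kB and 4 kB buffers).
buffer_sizes = [1024, 4096, 4096, 1024, 4096]  # bytes per make_request() call

per_call_count = len(buffer_sizes)     # what a per-call counter reports
actual_kb = sum(buffer_sizes) // 1024  # what "kilobytes" should be
print(per_call_count, actual_kb)       # 5 14
```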

Is any of my understanding incorrect? If not it looks like these
statistics can't really be used.

In addition to trying to work out the meaning of the /proc/stat fields,
I've been looking at the statistics provided through the 'sard' kernel
patch (which adds stats to the /proc/partitions file). These appear to
be correct; does anyone on this list have any comments regarding this
patch?

Please CC me on any replies, as I am not subscribed to the list.

Thanks,
Shaun



* Re: Disk Performance Measurements
From: Jens Axboe @ 2001-05-02 10:44 UTC
  To: Shaun; +Cc: linux-kernel

On Wed, May 02 2001, Shaun wrote:
> In regards to disk_r/wblk, drive_stat_acct() increments the number of
> sectors/blocks read based on the values in the request being processed by
> add_request(). But add_request() is only called for requests that can't be
> merged with requests currently on the queue. Thus the counters can't be
> updated for sectors that are read by being added to a queued
> request. Unless I'm mistaken, this makes disk_r/wblk mostly useless.

Look again: drive_stat_acct is also called for list merges (just with 0
set for new I/O, of course).
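The behaviour described in this reply can be sketched as a toy model. This is not kernel code. The accounting routine runs on both paths, but a merge passes 0 for "new I/O". Sectors are therefore always counted, while the request count grows only for new requests. The signature drive_stat_acct(nr_sectors, new_io) used here is a hypothetical simplification, not the real kernel function's parameter list.

```python
# Toy model (not kernel code): accounting also runs on merges, with new_io = 0.
# drive_stat_acct(nr_sectors, new_io) is a simplified, hypothetical signature.
class Stats:
    def __init__(self):
        self.rio = 0   # read requests issued
        self.rblk = 0  # sectors read

    def drive_stat_acct(self, nr_sectors, new_io):
        self.rio += new_io       # only new requests bump the I/O count
        self.rblk += nr_sectors  # sectors are counted on every path

def submit(queue, stats, start, nr_sectors):
    for req in queue:
        if req[0] + req[1] == start:  # back merge into a queued request
            req[1] += nr_sectors
            stats.drive_stat_acct(nr_sectors, new_io=0)
            return
    queue.append([start, nr_sectors])
    stats.drive_stat_acct(nr_sectors, new_io=1)

queue, stats = [], Stats()
for i in range(8):  # eight contiguous 1 kB reads (2 sectors each)
    submit(queue, stats, i * 2, 2)
print(stats.rio, stats.rblk)  # 1 16
```

With merge-time accounting, the sector total matches what was actually read, even though only one request was issued.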


> record the _kilobytes_ read or written to the disks. His code adds
> drive_pg_stat_acct(). This routine increments disk_pgin/out once for each
> call to make_request(). Presumably he has assumed every call to
> make_request() will always be for 2 sectors/1 kilobyte worth of
> data. However, I added printk() statements to try to verify this and found
> that the request to the block device need not be 1024 bytes; I frequently
> saw 4096-byte requests. In fact, the "correct_size" for the block device
> appeared to be changeable from partition to partition on the same
> disk. This "correct_size" appears to be related to the block size for the
> filesystem on the partition/disk? Following from the above logic, it would
> appear that the pgin/pgout statistics are also useless since you don't
> know how large the requests were?

The size of requests will typically vary with the block size set by
ext2. So if you have a 1kB block size on your fs, that partition will
receive 1kB buffers; similarly for 4kB. The stats collected in the kernel
are sector based, in units of 512 bytes. The value printed in /proc should,
however, be in kB for pgin/pgout and in 512-byte sectors for rio/rblk
and wio/wblk.
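The unit conversions implied here are simple arithmetic. This sketch assumes 512-byte sectors, as stated in the reply. The helper names sectors_to_kb and buffers_to_sectors are hypothetical.

```python
# Sketch of the unit conversions, assuming 512-byte sectors.
# Helper names are hypothetical.
SECTOR_SIZE = 512  # bytes

def sectors_to_kb(sectors):
    """Convert a 512-byte sector count (e.g. rblk/wblk) to kilobytes."""
    return sectors * SECTOR_SIZE // 1024

def buffers_to_sectors(n_buffers, block_size):
    """Sectors covered by n buffers of a given filesystem block size."""
    return n_buffers * block_size // SECTOR_SIZE

print(buffers_to_sectors(10, 1024), buffers_to_sectors(10, 4096))  # 20 80
print(sectors_to_kb(80))  # 40
```

Ten buffers on a 1 kB-block ext2 partition cover 20 sectors. Ten buffers on a 4 kB-block partition cover 80 sectors. This is why a per-buffer count cannot be read as kB without knowing the block size.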

-- 
Jens Axboe



* Re: Disk Performance Measurements
From: Shaun @ 2001-05-02 21:59 UTC
  To: Jens Axboe; +Cc: linux-kernel


> > In regards to disk_r/wblk, drive_stat_acct() increments the number of
> > sectors/blocks read based on the values in the request being processed by
> > add_request(). But add_request() is only called for requests that can't be
> > merged with requests currently on the queue. Thus the counters can't be
> > updated for sectors that are read by being added to a queued
> > request. Unless I'm mistaken, this makes disk_r/wblk mostly useless.
> 
> Look again: drive_stat_acct is also called for list merges (just with 0
> set for new I/O, of course).

OK, this problem must have been fixed in later versions. In my 2.2.16
kernel:

[shaunc@ufs block]$ grep -irn "drive_stat_acct" ll_rw_blk.c
307:static inline void drive_stat_acct(int cmd, unsigned long nr_sectors,
318:            printk(KERN_ERR "drive_stat_acct: cmd not R/W?\n");
676: * which is important for drive_stat_acct() above.  */
715:            drive_stat_acct(req->cmd, req->nr_sectors, disk_index);
[shaunc@ufs block]$ grep -irn "dk_drive_rblk" ll_rw_blk.c
313:            kstat.dk_drive_rblk[disk_index] += nr_sectors;
 
> > record the _kilobytes_ read or written to the disks. His code adds
> > drive_pg_stat_acct(). This routine increments disk_pgin/out once for each
> > call to make_request(). Presumably he has assumed every call to
> > make_request() will always be for 2 sectors/1 kilobyte worth of
> > data. However, I added printk() statements to try to verify this and found
> > that the request to the block device need not be 1024 bytes; I frequently
> > saw 4096-byte requests. In fact, the "correct_size" for the block device
> > appeared to be changeable from partition to partition on the same
> > disk. This "correct_size" appears to be related to the block size for the
> > filesystem on the partition/disk? Following from the above logic, it would
> > appear that the pgin/pgout statistics are also useless since you don't
> > know how large the requests were?
> 
> The size of requests will typically vary with the block size set by
> ext2. So if you have a 1kB block size on your fs, that partition will
> receive 1kB buffers; similarly for 4kB. The stats collected in the kernel
> are sector based, in units of 512 bytes. The value printed in /proc should,
> however, be in kB for pgin/pgout and in 512-byte sectors for rio/rblk
> and wio/wblk.

Again, this isn't the case in the 2.2.16 kernel I'm working with. Each
call to make_request() causes pgin/pgout to be incremented, and since these
requests can be of different sizes (even for the same disk), I can't see
how a kB value can be deduced.

Just as a question, though: a disk/partition doesn't need to have a
filesystem on it, so why is the "correct_size" for a buffer request on the
block device defined based on the filesystem block size?

Thanks,
Shaun
 



* Re: Disk Performance Measurements
From: Jens Axboe @ 2001-05-03 10:46 UTC
  To: Shaun; +Cc: linux-kernel

On Thu, May 03 2001, Shaun wrote:
> Again, this isn't the case in the 2.2.16 kernel I'm working with. Each
> call to make_request() causes pgin/pgout to be incremented, and since these
> requests can be of different sizes (even for the same disk), I can't see
> how a kB value can be deduced.

Check whether the latest 2.2 is correct, then; 2.4 is.

> Just as a question, though: a disk/partition doesn't need to have a
> filesystem on it, so why is the "correct_size" for a buffer request on the
> block device defined based on the filesystem block size?

It's not, but the fs may set the block size (ext2 does).

-- 
Jens Axboe

