* Add data cache statistics to nfs-iostat.py
@ 2009-04-21  2:03 Kevin Constantine
       [not found] ` <1240279414-30528-1-git-send-email-kevin.constantine-FfNkGbSheRGpB8w63BLUukEOCMrvLtNR@public.gmane.org>
  0 siblings, 1 reply; 17+ messages in thread
From: Kevin Constantine @ 2009-04-21  2:03 UTC (permalink / raw)
  To: linux-nfs; +Cc: chuck.lever

The first patch adds a --data flag, and prints a line about data being read by NFS

Data Read:     From Server      From Cache           Total   Hit %
               5676.7534MB     1862.0452MB     7538.7987MB  24.70%
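The hit percentage in this table falls out of the mountstats byte counters directly. A minimal Python 3 sketch of the arithmetic (the counter names are the ones nfs-iostat.py reads from /proc/self/mountstats; the values here are invented for illustration):

```python
# Arithmetic behind the "Data Read" line above, with invented counter
# values. normalreadbytes/directreadbytes/serverreadbytes are the
# mountstats fields that nfs-iostat.py parses.
MB = 1024.0 * 1024.0

nfs_stats = {
    'normalreadbytes': 7000 * 1024 * 1024,  # read(2) through the page cache
    'directreadbytes': 0,                   # O_DIRECT reads (bypass the cache)
    'serverreadbytes': 5250 * 1024 * 1024,  # bytes actually read over the wire
}

app_bytes_read = float(nfs_stats['normalreadbytes'] + nfs_stats['directreadbytes'])
cached_read_bytes = app_bytes_read - nfs_stats['serverreadbytes']
ratio = (cached_read_bytes * 100) / app_bytes_read

print('Data Read: %10.4fMB from server, %10.4fMB from cache, hit %5.2f%%'
      % (nfs_stats['serverreadbytes'] / MB, cached_read_bytes / MB, ratio))
```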

The second patch adds a line about write data.  I'm not quite sure about this patch: I'm seeing oddities where the number of bytes in "NORMALWRITTENBYTES + DIRECTWRITTENBYTES" does not equal SERVERWRITTENBYTES.  In some cases the amount of data written by the app (normalwrittenbytes + directwrittenbytes) is greater than the amount of data written to the server, and in other cases it's less.  At the moment, I've ascribed that difference to data being written to the local cache and not flushed, but that doesn't make a whole lot of sense.

Data Written:    To Server        To Cache           Total
                  5.2221MB        0.1951MB        5.4172MB
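The oddity described above falls out of the same subtraction on the write side. One plausible (not confirmed in this thread) explanation is writeback timing: the flusher can push pages dirtied before the sampling window, so the wire count exceeds what the application wrote during the sample. A sketch with invented counters:

```python
# Write-side version of the cache arithmetic, with invented counters
# chosen to show the oddity: serverwritebytes larger than what the
# application wrote during the sample.
normalwritebytes = 4 * 1024 * 1024   # written through the page cache
directwritebytes = 0                 # O_DIRECT writes
serverwritebytes = 6 * 1024 * 1024   # wire bytes; may include older dirty pages

bytes_written_by_app = float(normalwritebytes + directwritebytes)
cached_write_bytes = bytes_written_by_app - serverwritebytes

# "To Cache" goes negative whenever more data hit the wire than the app
# wrote in the window -- e.g. writeback flushing previously dirtied pages.
print(cached_write_bytes / 1024.0 / 1024.0)
```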

-kevin




* [PATCH 1/2] nfs-iostat.py: Print Data Cache Statistics
       [not found] ` <1240279414-30528-1-git-send-email-kevin.constantine-FfNkGbSheRGpB8w63BLUukEOCMrvLtNR@public.gmane.org>
@ 2009-04-21  2:03   ` Kevin Constantine
       [not found]     ` <1240279414-30528-2-git-send-email-kevin.constantine-FfNkGbSheRGpB8w63BLUukEOCMrvLtNR@public.gmane.org>
  0 siblings, 1 reply; 17+ messages in thread
From: Kevin Constantine @ 2009-04-21  2:03 UTC (permalink / raw)
  To: linux-nfs; +Cc: chuck.lever, Kevin Constantine

add --data flag to print data cache statistics
print read cache stats from __print_data_cache_stats

Signed-off-by: Kevin Constantine <kevin.constantine-FfNkGbSheRGpB8w63BLUukEOCMrvLtNR@public.gmane.org>
---
 tools/nfs-iostat/nfs-iostat.py |   26 ++++++++++++++++++++------
 1 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/tools/nfs-iostat/nfs-iostat.py b/tools/nfs-iostat/nfs-iostat.py
index 9626d42..d331a72 100644
--- a/tools/nfs-iostat/nfs-iostat.py
+++ b/tools/nfs-iostat/nfs-iostat.py
@@ -220,14 +220,20 @@ class DeviceData:
         """Print the data cache hit rate
         """
         nfs_stats = self.__nfs_data
-        app_bytes_read = float(nfs_stats['normalreadbytes'])
+        app_bytes_read = float(nfs_stats['normalreadbytes'] + nfs_stats['directreadbytes'])
         if app_bytes_read != 0:
-            client_bytes_read = float(nfs_stats['serverreadbytes'] - nfs_stats['directreadbytes'])
-            ratio = ((app_bytes_read - client_bytes_read) * 100) / app_bytes_read
-
+            read_bytes_from_server = float(nfs_stats['serverreadbytes'])
+            directio_bytes_from_server = float(nfs_stats['directreadbytes'])
+            standard_read_bytes_from_server = read_bytes_from_server - directio_bytes_from_server
+            cached_read_bytes = float(app_bytes_read - read_bytes_from_server)
+            ratio = (cached_read_bytes / app_bytes_read) * 100
             print
-            print 'app bytes: %f  client bytes %f' % (app_bytes_read, client_bytes_read)
-            print 'Data cache hit ratio: %4.2f%%' % ratio
+            print '%10s %15s %15s %15s %7s' % ("Data Read:", "From Server", "From Cache", "Total", "Hit %")
+            print '%10s %13.4fMB %13.4fMB %13.4fMB %6.2f%%' % ("", \
+	    					read_bytes_from_server / 1024.0 / 1024.0, \
+						cached_read_bytes / 1024.0 / 1024.0, \
+						app_bytes_read / 1024.0 / 1024.0, \
+						ratio)
 
     def __print_attr_cache_stats(self, sample_time):
         """Print attribute cache efficiency stats
@@ -390,6 +396,10 @@ class DeviceData:
             self.__print_rpc_op_stats('READ', sample_time)
             self.__print_rpc_op_stats('WRITE', sample_time)
             self.__print_page_stats(sample_time)
+        elif which == 4:
+            self.__print_rpc_op_stats('READ', sample_time)
+            self.__print_rpc_op_stats('WRITE', sample_time)
+            self.__print_data_cache_stats()
 
 #
 # Functions
@@ -487,6 +497,10 @@ def iostat_command(name):
         if arg in ['-p', '--page']:
             which = 3
             continue
+	
+	if arg in ['--data']:
+	    which = 4
+	    continue
 
         if arg == sys.argv[0]:
             continue
-- 
1.6.2.1



* [PATCH 2/2] nfs-iostat.py: Added bytes written output
       [not found]     ` <1240279414-30528-2-git-send-email-kevin.constantine-FfNkGbSheRGpB8w63BLUukEOCMrvLtNR@public.gmane.org>
@ 2009-04-21  2:03       ` Kevin Constantine
       [not found]         ` <1240279414-30528-3-git-send-email-kevin.constantine-FfNkGbSheRGpB8w63BLUukEOCMrvLtNR@public.gmane.org>
  2009-04-21 14:26       ` [PATCH 1/2] nfs-iostat.py: Print Data Cache Statistics Chuck Lever
  1 sibling, 1 reply; 17+ messages in thread
From: Kevin Constantine @ 2009-04-21  2:03 UTC (permalink / raw)
  To: linux-nfs; +Cc: chuck.lever, Kevin Constantine

Print statistics about bytes written by NFS

Signed-off-by: Kevin Constantine <kevin.constantine-FfNkGbSheRGpB8w63BLUukEOCMrvLtNR@public.gmane.org>
---
 tools/nfs-iostat/nfs-iostat.py |   10 ++++++++++
 1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/tools/nfs-iostat/nfs-iostat.py b/tools/nfs-iostat/nfs-iostat.py
index d331a72..5f2e232 100644
--- a/tools/nfs-iostat/nfs-iostat.py
+++ b/tools/nfs-iostat/nfs-iostat.py
@@ -221,6 +221,7 @@ class DeviceData:
         """
         nfs_stats = self.__nfs_data
         app_bytes_read = float(nfs_stats['normalreadbytes'] + nfs_stats['directreadbytes'])
+        bytes_written_by_app = float(nfs_stats['normalwritebytes'] + nfs_stats['directwritebytes'])
         if app_bytes_read != 0:
             read_bytes_from_server = float(nfs_stats['serverreadbytes'])
             directio_bytes_from_server = float(nfs_stats['directreadbytes'])
@@ -234,6 +235,15 @@ class DeviceData:
 						cached_read_bytes / 1024.0 / 1024.0, \
 						app_bytes_read / 1024.0 / 1024.0, \
 						ratio)
+	if bytes_written_by_app != 0:
+	    bytes_written_by_nfs = float(nfs_stats['serverwritebytes'])
+	    cached_write_bytes = float(bytes_written_by_app - bytes_written_by_nfs);
+	    print
+            print '%13s %12s %15s %15s' % ("Data Written:", "To Server", "To Cache", "Total")
+	    print '%10s %13.4fMB %13.4fMB %13.4fMB' % ("", \
+	    					bytes_written_by_nfs / 1024.0 / 1024.0, \
+						cached_write_bytes / 1024.0 / 1024.0, \
+						bytes_written_by_app / 1024.0 / 1024.0)
 
     def __print_attr_cache_stats(self, sample_time):
         """Print attribute cache efficiency stats
-- 
1.6.2.1



* Re: [PATCH 2/2] nfs-iostat.py: Added bytes written output
       [not found]         ` <1240279414-30528-3-git-send-email-kevin.constantine-FfNkGbSheRGpB8w63BLUukEOCMrvLtNR@public.gmane.org>
@ 2009-04-21 14:26           ` Chuck Lever
  2009-04-21 17:57             ` Kevin Constantine
  0 siblings, 1 reply; 17+ messages in thread
From: Chuck Lever @ 2009-04-21 14:26 UTC (permalink / raw)
  To: Kevin Constantine; +Cc: linux-nfs, Kevin Constantine

On Apr 20, 2009, at 10:03 PM, Kevin Constantine wrote:
> Print statistics about bytes written by NFS
>
> Signed-off-by: Kevin Constantine <kevin.constantine-FfNkGbSheRGpB8w63BLUukEOCMrvLtNR@public.gmane.org 
> >
> ---
> tools/nfs-iostat/nfs-iostat.py |   10 ++++++++++
> 1 files changed, 10 insertions(+), 0 deletions(-)
>
> diff --git a/tools/nfs-iostat/nfs-iostat.py b/tools/nfs-iostat/nfs- 
> iostat.py
> index d331a72..5f2e232 100644
> --- a/tools/nfs-iostat/nfs-iostat.py
> +++ b/tools/nfs-iostat/nfs-iostat.py
> @@ -221,6 +221,7 @@ class DeviceData:
>         """
>         nfs_stats = self.__nfs_data
>         app_bytes_read = float(nfs_stats['normalreadbytes'] +  
> nfs_stats['directreadbytes'])
> +        bytes_written_by_app = float(nfs_stats['normalwritebytes']  
> + nfs_stats['directwritebytes'])
>         if app_bytes_read != 0:
>             read_bytes_from_server =  
> float(nfs_stats['serverreadbytes'])
>             directio_bytes_from_server =  
> float(nfs_stats['directreadbytes'])
> @@ -234,6 +235,15 @@ class DeviceData:
> 						cached_read_bytes / 1024.0 / 1024.0, \
> 						app_bytes_read / 1024.0 / 1024.0, \
> 						ratio)
> +	if bytes_written_by_app != 0:
> +	    bytes_written_by_nfs = float(nfs_stats['serverwritebytes'])
> +	    cached_write_bytes = float(bytes_written_by_app -  
> bytes_written_by_nfs);

Again, I don't see why this computation is necessary:  The  
"normalwritebytes" statistic is what was written through the client's  
cache, and "directwritebytes" counts the number of bytes that were  
written via O_DIRECT.

> +	    print
> +            print '%13s %12s %15s %15s' % ("Data Written:", "To  
> Server", "To Cache", "Total")
> +	    print '%10s %13.4fMB %13.4fMB %13.4fMB' % ("", \
> +	    					bytes_written_by_nfs / 1024.0 / 1024.0, \
> +						cached_write_bytes / 1024.0 / 1024.0, \
> +						bytes_written_by_app / 1024.0 / 1024.0)
>
>     def __print_attr_cache_stats(self, sample_time):
>         """Print attribute cache efficiency stats
> -- 
> 1.6.2.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs"  
> in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
Chuck Lever
chuck[dot]lever[at]oracle[dot]com






* Re: [PATCH 1/2] nfs-iostat.py: Print Data Cache Statistics
       [not found]     ` <1240279414-30528-2-git-send-email-kevin.constantine-FfNkGbSheRGpB8w63BLUukEOCMrvLtNR@public.gmane.org>
  2009-04-21  2:03       ` [PATCH 2/2] nfs-iostat.py: Added bytes written output Kevin Constantine
@ 2009-04-21 14:26       ` Chuck Lever
  2009-04-21 17:54         ` Kevin Constantine
  1 sibling, 1 reply; 17+ messages in thread
From: Chuck Lever @ 2009-04-21 14:26 UTC (permalink / raw)
  To: Kevin Constantine; +Cc: linux-nfs, Kevin Constantine

On Apr 20, 2009, at 10:03 PM, Kevin Constantine wrote:
> add --data flag to print data cache statistics
> print read cache stats from __print_data_cache_stats

It's been a while since I wrote this code... comments below.

> Signed-off-by: Kevin Constantine <kevin.constantine-FfNkGbSheRGpB8w63BLUukEOCMrvLtNR@public.gmane.org 
> >
> ---
> tools/nfs-iostat/nfs-iostat.py |   26 ++++++++++++++++++++------
> 1 files changed, 20 insertions(+), 6 deletions(-)
>
> diff --git a/tools/nfs-iostat/nfs-iostat.py b/tools/nfs-iostat/nfs- 
> iostat.py
> index 9626d42..d331a72 100644
> --- a/tools/nfs-iostat/nfs-iostat.py
> +++ b/tools/nfs-iostat/nfs-iostat.py
> @@ -220,14 +220,20 @@ class DeviceData:
>         """Print the data cache hit rate
>         """
>         nfs_stats = self.__nfs_data
> -        app_bytes_read = float(nfs_stats['normalreadbytes'])
> +        app_bytes_read = float(nfs_stats['normalreadbytes'] +  
> nfs_stats['directreadbytes'])
>         if app_bytes_read != 0:
> -            client_bytes_read = float(nfs_stats['serverreadbytes']  
> - nfs_stats['directreadbytes'])
> -            ratio = ((app_bytes_read - client_bytes_read) * 100) /  
> app_bytes_read
> -
> +            read_bytes_from_server =  
> float(nfs_stats['serverreadbytes'])
> +            directio_bytes_from_server =  
> float(nfs_stats['directreadbytes'])
> +            standard_read_bytes_from_server =  
> read_bytes_from_server - directio_bytes_from_server
> +            cached_read_bytes = float(app_bytes_read -  
> read_bytes_from_server)

I'm not sure why you are reversing the sense of this computation.   
"directreadbytes" is the count of bytes that applications read with  
O_DIRECT.  These are never cached, but they are counted in  
"serverreadbytes", so my logic subtracts them before computing the hit  
ratio.  Was there some accounting problem you found?  (If there was,  
it's worth stating that explicitly in the patch description).
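To make the two computations concrete, here is a side-by-side with invented counters, assuming (as described above) that serverreadbytes includes the O_DIRECT traffic:

```python
# Invented counters; serverreadbytes is assumed to include the O_DIRECT
# bytes, per the explanation above.
normalreadbytes = 100.0   # application reads through the cache
directreadbytes = 40.0    # application O_DIRECT reads (never cached)
serverreadbytes = 90.0    # all wire reads, O_DIRECT included

# Original logic: subtract O_DIRECT from the wire count, so the ratio
# describes only the cacheable read path.
client_bytes_read = serverreadbytes - directreadbytes             # 50.0
orig_ratio = ((normalreadbytes - client_bytes_read) * 100) / normalreadbytes

# Patch logic: count O_DIRECT on both sides, which dilutes the ratio,
# since those bytes can never be cache hits.
app_bytes_read = normalreadbytes + directreadbytes                # 140.0
new_ratio = ((app_bytes_read - serverreadbytes) * 100) / app_bytes_read

print(orig_ratio, new_ratio)
```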

> +            ratio = (cached_read_bytes / app_bytes_read) * 100

The other functions use (x * 100) / y -- I seem to recall there are  
rounding errors if you do it the way you've done it here, but I don't  
remember exactly what the issue was.  But you are changing this one so  
it is different from all the other similar computations.  Can you  
state your reason?
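One way the ordering can matter is with integer operands: Python 2's `/` truncates on two ints, so dividing before multiplying can throw the whole result away. A sketch using Python 3's `//` to reproduce the Python 2 behavior (with float operands, as in the patch, the difference is only in the last bits of precision):

```python
# Python 2's integer '/' truncates; '//' reproduces that in Python 3.
cached_read_bytes = 1
app_bytes_read = 3

multiply_first = (cached_read_bytes * 100) // app_bytes_read  # keeps two digits
divide_first = (cached_read_bytes // app_bytes_read) * 100    # quotient truncated to zero

print(multiply_first, divide_first)
```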

>             print
> -            print 'app bytes: %f  client bytes %f' %  
> (app_bytes_read, client_bytes_read)
> -            print 'Data cache hit ratio: %4.2f%%' % ratio
> +            print '%10s %15s %15s %15s %7s' % ("Data Read:", "From  
> Server", "From Cache", "Total", "Hit %")
> +            print '%10s %13.4fMB %13.4fMB %13.4fMB %6.2f%%' % ("", \
> +	    					read_bytes_from_server / 1024.0 / 1024.0, \
> +						cached_read_bytes / 1024.0 / 1024.0, \
> +						app_bytes_read / 1024.0 / 1024.0, \
> +						ratio)
>
>     def __print_attr_cache_stats(self, sample_time):
>         """Print attribute cache efficiency stats
> @@ -390,6 +396,10 @@ class DeviceData:
>             self.__print_rpc_op_stats('READ', sample_time)
>             self.__print_rpc_op_stats('WRITE', sample_time)
>             self.__print_page_stats(sample_time)
> +        elif which == 4:
> +            self.__print_rpc_op_stats('READ', sample_time)
> +            self.__print_rpc_op_stats('WRITE', sample_time)
> +            self.__print_data_cache_stats()
>
> #
> # Functions
> @@ -487,6 +497,10 @@ def iostat_command(name):
>         if arg in ['-p', '--page']:
>             which = 3
>             continue
> +	
> +	if arg in ['--data']:
> +	    which = 4
> +	    continue
>
>         if arg == sys.argv[0]:
>             continue

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com


* Re: [PATCH 1/2] nfs-iostat.py: Print Data Cache Statistics
  2009-04-21 14:26       ` [PATCH 1/2] nfs-iostat.py: Print Data Cache Statistics Chuck Lever
@ 2009-04-21 17:54         ` Kevin Constantine
       [not found]           ` <49EE083C.6010105-P5ys19MLBK/QT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 17+ messages in thread
From: Kevin Constantine @ 2009-04-21 17:54 UTC (permalink / raw)
  To: Chuck Lever; +Cc: linux-nfs

Chuck Lever wrote:
> On Apr 20, 2009, at 10:03 PM, Kevin Constantine wrote:
>> add --data flag to print data cache statistics
>> print read cache stats from __print_data_cache_stats
> 
> It's been a while since I wrote this code... comments below.
> 
>> Signed-off-by: Kevin Constantine <kevin.constantine-FfNkGbSheRGpB8w63BLUukEOCMrvLtNR@public.gmane.org>
>> ---
>> tools/nfs-iostat/nfs-iostat.py |   26 ++++++++++++++++++++------
>> 1 files changed, 20 insertions(+), 6 deletions(-)
>>
>> diff --git a/tools/nfs-iostat/nfs-iostat.py 
>> b/tools/nfs-iostat/nfs-iostat.py
>> index 9626d42..d331a72 100644
>> --- a/tools/nfs-iostat/nfs-iostat.py
>> +++ b/tools/nfs-iostat/nfs-iostat.py
>> @@ -220,14 +220,20 @@ class DeviceData:
>>         """Print the data cache hit rate
>>         """
>>         nfs_stats = self.__nfs_data
>> -        app_bytes_read = float(nfs_stats['normalreadbytes'])
>> +        app_bytes_read = float(nfs_stats['normalreadbytes'] + 
>> nfs_stats['directreadbytes'])
>>         if app_bytes_read != 0:
>> -            client_bytes_read = float(nfs_stats['serverreadbytes'] - 
>> nfs_stats['directreadbytes'])
>> -            ratio = ((app_bytes_read - client_bytes_read) * 100) / 
>> app_bytes_read
>> -
>> +            read_bytes_from_server = float(nfs_stats['serverreadbytes'])
>> +            directio_bytes_from_server = 
>> float(nfs_stats['directreadbytes'])
>> +            standard_read_bytes_from_server = read_bytes_from_server 
>> - directio_bytes_from_server
>> +            cached_read_bytes = float(app_bytes_read - 
>> read_bytes_from_server)
> 
> I'm not sure why you are reversing the sense of this computation.  
> "directreadbytes" is the count of bytes that applications read with 
> O_DIRECT.  These are never cached, but they are counted in 
> "serverreadbytes", so my logic subtracts them before computing the hit 
> ratio.  Was there some accounting problem you found?  (If there was, 
> it's worth stating that explicitly in the patch description).

Originally you had app_bytes_read = normalreadbytes, but that's not entirely true, since an app might be reading with 
O_DIRECT, therefore app_bytes_read should probably be normal + direct.

client_bytes_read seemed ambiguous to me: is that bytes being read by the host client from the server, or bytes being read 
by the nfs client process?  read_bytes_from_server seemed more logical, but in the diff block below I've removed that 
altogether and am just left with cached_read_bytes.

> 
>> +            ratio = (cached_read_bytes / app_bytes_read) * 100
> 
> The other functions use (x * 100) / y -- I seem to recall there are 
> rounding errors if you do it the way you've done it here, but I don't 
> remember exactly what the issue was.  But you are changing this one so 
> it is different from all the other similar computations.  Can you 
> state your reason?
> 

Since cached_read_bytes = app_bytes_read - client_bytes_read, I used (cached_read_bytes / app_bytes_read) * 100, but 
that is the same as (cached_read_bytes * 100) / app_bytes_read if you'd rather see that.

New diff block below.  (If it all checks out, I'll send the whole patch, unless you'd rather just see this diff.)

diff --git a/tools/nfs-iostat/nfs-iostat.py b/tools/nfs-iostat/nfs-iostat.py
index 9626d42..7dae659 100644
--- a/tools/nfs-iostat/nfs-iostat.py
+++ b/tools/nfs-iostat/nfs-iostat.py
@@ -220,14 +220,17 @@ class DeviceData:
          """Print the data cache hit rate
          """
          nfs_stats = self.__nfs_data
-        app_bytes_read = float(nfs_stats['normalreadbytes'])
+        app_bytes_read = float(nfs_stats['normalreadbytes'] + nfs_stats['directreadbytes'])
          if app_bytes_read != 0:
-            client_bytes_read = float(nfs_stats['serverreadbytes'] - nfs_stats['directreadbytes'])
-            ratio = ((app_bytes_read - client_bytes_read) * 100) / app_bytes_read
-
+            cached_read_bytes = float(app_bytes_read - float(nfs_stats['serverreadbytes']))
+            ratio = (cached_read_bytes * 100) / app_bytes_read
              print
-            print 'app bytes: %f  client bytes %f' % (app_bytes_read, client_bytes_read)
-            print 'Data cache hit ratio: %4.2f%%' % ratio
+            print '%10s %15s %15s %15s %7s' % ("Data Read:", "From Server", "From Cache", "Total", "Hit %")
+            print '%10s %13.4fMB %13.4fMB %13.4fMB %6.2f%%' % ("", \
+                                               float(nfs_stats['serverreadbytes']) / 1024.0 / 1024.0, \
+                                               cached_read_bytes / 1024.0 / 1024.0, \
+                                               app_bytes_read / 1024.0 / 1024.0, \
+                                               ratio)

      def __print_attr_cache_stats(self, sample_time):
          """Print attribute cache efficiency stats

-kevin

> -- 
> Chuck Lever
> chuck[dot]lever[at]oracle[dot]com


* Re: [PATCH 2/2] nfs-iostat.py: Added bytes written output
  2009-04-21 14:26           ` Chuck Lever
@ 2009-04-21 17:57             ` Kevin Constantine
       [not found]               ` <49EE0901.2040704-P5ys19MLBK/QT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 17+ messages in thread
From: Kevin Constantine @ 2009-04-21 17:57 UTC (permalink / raw)
  To: Chuck Lever; +Cc: linux-nfs

Chuck Lever wrote:
> On Apr 20, 2009, at 10:03 PM, Kevin Constantine wrote:
>> Print statistics about bytes written by NFS
>>
>> Signed-off-by: Kevin Constantine <kevin.constantine-FfNkGbSheRGpB8w63BLUukEOCMrvLtNR@public.gmane.org>
>> ---
>> tools/nfs-iostat/nfs-iostat.py |   10 ++++++++++
>> 1 files changed, 10 insertions(+), 0 deletions(-)
>>
>> diff --git a/tools/nfs-iostat/nfs-iostat.py 
>> b/tools/nfs-iostat/nfs-iostat.py
>> index d331a72..5f2e232 100644
>> --- a/tools/nfs-iostat/nfs-iostat.py
>> +++ b/tools/nfs-iostat/nfs-iostat.py
>> @@ -221,6 +221,7 @@ class DeviceData:
>>         """
>>         nfs_stats = self.__nfs_data
>>         app_bytes_read = float(nfs_stats['normalreadbytes'] + 
>> nfs_stats['directreadbytes'])
>> +        bytes_written_by_app = float(nfs_stats['normalwritebytes'] + 
>> nfs_stats['directwritebytes'])
>>         if app_bytes_read != 0:
>>             read_bytes_from_server = float(nfs_stats['serverreadbytes'])
>>             directio_bytes_from_server = 
>> float(nfs_stats['directreadbytes'])
>> @@ -234,6 +235,15 @@ class DeviceData:
>>                         cached_read_bytes / 1024.0 / 1024.0, \
>>                         app_bytes_read / 1024.0 / 1024.0, \
>>                         ratio)
>> +    if bytes_written_by_app != 0:
>> +        bytes_written_by_nfs = float(nfs_stats['serverwritebytes'])
>> +        cached_write_bytes = float(bytes_written_by_app - 
>> bytes_written_by_nfs);
> 
> Again, I don't see why this computation is necessary:  The 
> "normalwritebytes" statistic is what was written through the client's 
> cache, and "directwritebytes" counts the number of bytes that were 
> written via O_DIRECT.
> 

As I said in the email preceding these patches:
"I'm not quite sure about this patch.  I'm seeing oddities where the number of bytes in "NORMALWRITTENBYTES + 
DIRECTWRITTENBYTES" does not equal SERVERWRITTENBYTES.  In some cases the amount of data written by the app 
(normalwrittenbytes + directwrittenbytes) is greater than the amount of data written to the server, and in other cases 
it's less than the amount of data written to the server.  At the moment, I've ascribed that difference to data being 
written to the local cache and not flushed, but that doesn't make a whole lot of sense."

-kevin

>> +        print
>> +            print '%13s %12s %15s %15s' % ("Data Written:", "To 
>> Server", "To Cache", "Total")
>> +        print '%10s %13.4fMB %13.4fMB %13.4fMB' % ("", \
>> +                            bytes_written_by_nfs / 1024.0 / 1024.0, \
>> +                        cached_write_bytes / 1024.0 / 1024.0, \
>> +                        bytes_written_by_app / 1024.0 / 1024.0)
>>
>>     def __print_attr_cache_stats(self, sample_time):
>>         """Print attribute cache efficiency stats
>> -- 
>> 1.6.2.1
>>
> 

-- 
------------------------------------------------------------
Kevin Constantine

Systems Engineer		t: 818.460.8221
Walt Disney Animation Studios	e: kevin.constantine-P5ys19MLBK/QT0dZR+AlfA@public.gmane.org

Any sufficiently advanced technology is indistinguishable from magic.
     - Arthur C. Clarke


* Re: [PATCH 1/2] nfs-iostat.py: Print Data Cache Statistics
       [not found]           ` <49EE083C.6010105-P5ys19MLBK/QT0dZR+AlfA@public.gmane.org>
@ 2009-04-21 18:58             ` Chuck Lever
  2009-04-22  1:42               ` [PATCH] " Kevin Constantine
  0 siblings, 1 reply; 17+ messages in thread
From: Chuck Lever @ 2009-04-21 18:58 UTC (permalink / raw)
  To: Kevin Constantine; +Cc: linux-nfs

On Apr 21, 2009, at 1:54 PM, Kevin Constantine wrote:
> Chuck Lever wrote:
>> On Apr 20, 2009, at 10:03 PM, Kevin Constantine wrote:
>>> add --data flag to print data cache statistics
>>> print read cache stats from __print_data_cache_stats
>> It's been a while since I wrote this code... comments below.
>>> Signed-off-by: Kevin Constantine <kevin.constantine-FfNkGbSheRGpB8w63BLUukEOCMrvLtNR@public.gmane.org 
>>> >
>>> ---
>>> tools/nfs-iostat/nfs-iostat.py |   26 ++++++++++++++++++++------
>>> 1 files changed, 20 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/tools/nfs-iostat/nfs-iostat.py b/tools/nfs-iostat/nfs- 
>>> iostat.py
>>> index 9626d42..d331a72 100644
>>> --- a/tools/nfs-iostat/nfs-iostat.py
>>> +++ b/tools/nfs-iostat/nfs-iostat.py
>>> @@ -220,14 +220,20 @@ class DeviceData:
>>>        """Print the data cache hit rate
>>>        """
>>>        nfs_stats = self.__nfs_data
>>> -        app_bytes_read = float(nfs_stats['normalreadbytes'])
>>> +        app_bytes_read = float(nfs_stats['normalreadbytes'] +  
>>> nfs_stats['directreadbytes'])
>>>        if app_bytes_read != 0:
>>> -            client_bytes_read =  
>>> float(nfs_stats['serverreadbytes'] - nfs_stats['directreadbytes'])
>>> -            ratio = ((app_bytes_read - client_bytes_read) *  
>>> 100) / app_bytes_read
>>> -
>>> +            read_bytes_from_server =  
>>> float(nfs_stats['serverreadbytes'])
>>> +            directio_bytes_from_server =  
>>> float(nfs_stats['directreadbytes'])
>>> +            standard_read_bytes_from_server =  
>>> read_bytes_from_server - directio_bytes_from_server
>>> +            cached_read_bytes = float(app_bytes_read -  
>>> read_bytes_from_server)
>> I'm not sure why you are reversing the sense of this computation.   
>> "directreadbytes" is the count of bytes that applications read with  
>> O_DIRECT.  These are never cached, but they are counted in  
>> "serverreadbytes", so my logic subtracts them before computing the  
>> hit ratio.  Was there some accounting problem you found?  (If there  
>> was, it's worth stating that explicitly in the patch description).
>
> Originally you had app_bytes_read = normalreadbytes, but that's not  
> entirely true, since an app might be reading with O_DIRECT,  
> therefore app_bytes_read should probably be normal + direct.
>
> client_bytes_read seemed ambiguous to me, is that bytes being read  
> by the host client from the server, bytes being read by the nfs  
> client process?  read_bytes_from_server seemed more logical, but in  
> the diff block below, I've removed that altogether, and am just left  
> with cached_read_bytes.

I'd rather not change the way the ratio is computed, unless you see a  
loss of precision.  The way I had it minimizes the possibility of an  
overflow during the computation.  If you want to rename the variables  
to clarify it that's OK, I guess.

>>> +            ratio = (cached_read_bytes / app_bytes_read) * 100
>> The other functions use (x * 100) / y -- I seem to recall there are  
>> rounding errors if you do it the way you've done it here, but I  
>> don't remember exactly what the issue was.  But you are changing  
>> this one so it is different from all the other similar  
>> computations.  Can you state your reason?
>
> since cached_read_bytes = app_bytes_read - client_bytes_read, i used  
> (cached_read_bytes / app_bytes_read) * 100, but that is the same as  
> (cached_read_bytes * 100) / app_bytes_read if you'd rather see that.

The latter usually gives us better precision in the calculated result.

> new diff block below (if it all checks out, i'll send the whole  
> patch, unless you'd rather just see that.)
>
> diff --git a/tools/nfs-iostat/nfs-iostat.py b/tools/nfs-iostat/nfs- 
> iostat.py
> index 9626d42..7dae659 100644
> --- a/tools/nfs-iostat/nfs-iostat.py
> +++ b/tools/nfs-iostat/nfs-iostat.py
> @@ -220,14 +220,17 @@ class DeviceData:
>         """Print the data cache hit rate
>         """
>         nfs_stats = self.__nfs_data
> -        app_bytes_read = float(nfs_stats['normalreadbytes'])
> +        app_bytes_read = float(nfs_stats['normalreadbytes'] +  
> nfs_stats['directreadbytes'])
>         if app_bytes_read != 0:
> -            client_bytes_read = float(nfs_stats['serverreadbytes']  
> - nfs_stats['directreadbytes'])
> -            ratio = ((app_bytes_read - client_bytes_read) * 100) /  
> app_bytes_read
> -
> +            cached_read_bytes = float(app_bytes_read -  
> float(nfs_stats['serverreadbytes']))
> +            ratio = (cached_read_bytes * 100) / app_bytes_read
>             print
> -            print 'app bytes: %f  client bytes %f' %  
> (app_bytes_read, client_bytes_read)
> -            print 'Data cache hit ratio: %4.2f%%' % ratio
> +            print '%10s %15s %15s %15s %7s' % ("Data Read:", "From  
> Server", "From Cache", "Total", "Hit %")
> +            print '%10s %13.4fMB %13.4fMB %13.4fMB %6.2f%%' % ("", \
> +                                                
> float(nfs_stats['serverreadbytes']) / 1024.0 / 1024.0, \
> +                                               cached_read_bytes /  
> 1024.0 / 1024.0, \
> +                                               app_bytes_read /  
> 1024.0 / 1024.0, \
> +                                               ratio)
>
>     def __print_attr_cache_stats(self, sample_time):
>         """Print attribute cache efficiency stats

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com


* Re: [PATCH 2/2] nfs-iostat.py: Added bytes written output
       [not found]               ` <49EE0901.2040704-P5ys19MLBK/QT0dZR+AlfA@public.gmane.org>
@ 2009-04-21 19:00                 ` Chuck Lever
  2009-04-21 20:39                   ` Kevin Constantine
  0 siblings, 1 reply; 17+ messages in thread
From: Chuck Lever @ 2009-04-21 19:00 UTC (permalink / raw)
  To: Kevin Constantine; +Cc: linux-nfs

On Apr 21, 2009, at 1:57 PM, Kevin Constantine wrote:
> Chuck Lever wrote:
>> On Apr 20, 2009, at 10:03 PM, Kevin Constantine wrote:
>>> Print statistics about bytes written by NFS
>>>
>>> Signed-off-by: Kevin Constantine <kevin.constantine-FfNkGbSheRGpB8w63BLUukEOCMrvLtNR@public.gmane.org 
>>> >
>>> ---
>>> tools/nfs-iostat/nfs-iostat.py |   10 ++++++++++
>>> 1 files changed, 10 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/tools/nfs-iostat/nfs-iostat.py b/tools/nfs-iostat/nfs- 
>>> iostat.py
>>> index d331a72..5f2e232 100644
>>> --- a/tools/nfs-iostat/nfs-iostat.py
>>> +++ b/tools/nfs-iostat/nfs-iostat.py
>>> @@ -221,6 +221,7 @@ class DeviceData:
>>>        """
>>>        nfs_stats = self.__nfs_data
>>>        app_bytes_read = float(nfs_stats['normalreadbytes'] +  
>>> nfs_stats['directreadbytes'])
>>> +        bytes_written_by_app =  
>>> float(nfs_stats['normalwritebytes'] + nfs_stats['directwritebytes'])
>>>        if app_bytes_read != 0:
>>>            read_bytes_from_server =  
>>> float(nfs_stats['serverreadbytes'])
>>>            directio_bytes_from_server =  
>>> float(nfs_stats['directreadbytes'])
>>> @@ -234,6 +235,15 @@ class DeviceData:
>>>                        cached_read_bytes / 1024.0 / 1024.0, \
>>>                        app_bytes_read / 1024.0 / 1024.0, \
>>>                        ratio)
>>> +    if bytes_written_by_app != 0:
>>> +        bytes_written_by_nfs = float(nfs_stats['serverwritebytes'])
>>> +        cached_write_bytes = float(bytes_written_by_app -  
>>> bytes_written_by_nfs);
>> Again, I don't see why this computation is necessary:  The  
>> "normalwritebytes" statistic is what was written through the  
>> client's cache, and "directwritebytes" counts the number of bytes  
>> that were written via O_DIRECT.
>
> As I said in the email preceding these patches:
> "I'm not quite sure about this patch.  I'm seeing oddities where the  
> number of bytes in "NORMALWRITTENBYTES + DIRECTWRITTENBYTES" does  
> not equal SERVERWRITTENBYTES.  In some cases the amount of data  
> written by the app (normalwrittenbytes + directwrittenbytes) is  
> greater than the amount of data written to the server, and in other  
> cases it's less than the amount of data written to the server.  At  
> the moment, I've ascribed that difference to data being written to  
> the local cache and not flushed, but that doesn't make a whole lot  
> of sense."

Our e-mail system may have barfed on my earlier reply, but I would  
like to understand why you need this statistic.  What kind of analysis  
are you looking for on the write side?

>
>>> +        print
>>> +            print '%13s %12s %15s %15s' % ("Data Written:", "To  
>>> Server", "To Cache", "Total")
>>> +        print '%10s %13.4fMB %13.4fMB %13.4fMB' % ("", \
>>> +                            bytes_written_by_nfs / 1024.0 /  
>>> 1024.0, \
>>> +                        cached_write_bytes / 1024.0 / 1024.0, \
>>> +                        bytes_written_by_app / 1024.0 / 1024.0)
>>>
>>>    def __print_attr_cache_stats(self, sample_time):
>>>        """Print attribute cache efficiency stats

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com


* Re: [PATCH 2/2] nfs-iostat.py: Added bytes written output
  2009-04-21 19:00                 ` Chuck Lever
@ 2009-04-21 20:39                   ` Kevin Constantine
  2009-04-21 20:46                     ` Chuck Lever
  0 siblings, 1 reply; 17+ messages in thread
From: Kevin Constantine @ 2009-04-21 20:39 UTC (permalink / raw)
  To: Chuck Lever; +Cc: linux-nfs

Chuck Lever wrote:
> On Apr 21, 2009, at 1:57 PM, Kevin Constantine wrote:
>> Chuck Lever wrote:
>>> On Apr 20, 2009, at 10:03 PM, Kevin Constantine wrote:
>>>> Print statistics about bytes written by NFS
>>>>
>>>> Signed-off-by: Kevin Constantine 
>>>> <kevin.constantine-FfNkGbSheRGpB8w63BLUukEOCMrvLtNR@public.gmane.org>
>>>> ---
>>>> tools/nfs-iostat/nfs-iostat.py |   10 ++++++++++
>>>> 1 files changed, 10 insertions(+), 0 deletions(-)
>>>>
>>>> diff --git a/tools/nfs-iostat/nfs-iostat.py 
>>>> b/tools/nfs-iostat/nfs-iostat.py
>>>> index d331a72..5f2e232 100644
>>>> --- a/tools/nfs-iostat/nfs-iostat.py
>>>> +++ b/tools/nfs-iostat/nfs-iostat.py
>>>> @@ -221,6 +221,7 @@ class DeviceData:
>>>>        """
>>>>        nfs_stats = self.__nfs_data
>>>>        app_bytes_read = float(nfs_stats['normalreadbytes'] + 
>>>> nfs_stats['directreadbytes'])
>>>> +        bytes_written_by_app = float(nfs_stats['normalwritebytes'] 
>>>> + nfs_stats['directwritebytes'])
>>>>        if app_bytes_read != 0:
>>>>            read_bytes_from_server = float(nfs_stats['serverreadbytes'])
>>>>            directio_bytes_from_server = 
>>>> float(nfs_stats['directreadbytes'])
>>>> @@ -234,6 +235,15 @@ class DeviceData:
>>>>                        cached_read_bytes / 1024.0 / 1024.0, \
>>>>                        app_bytes_read / 1024.0 / 1024.0, \
>>>>                        ratio)
>>>> +    if bytes_written_by_app != 0:
>>>> +        bytes_written_by_nfs = float(nfs_stats['serverwritebytes'])
>>>> +        cached_write_bytes = float(bytes_written_by_app - 
>>>> bytes_written_by_nfs);
>>> Again, I don't see why this computation is necessary:  The 
>>> "normalwritebytes" statistic is what was written through the client's 
>>> cache, and "directwritebytes" counts the number of bytes that were 
>>> written via O_DIRECT.
>>
>> As I said in the email preceding these patches:
>> "I'm not quite sure about this patch.  I'm seeing oddities where the 
>> number of bytes in "NORMALWRITTENBYTES + DIRECTWRITTENBYTES" does not 
>> equal SERVERWRITTENBYTES.  In some cases the amount of data written by 
>> the app (normalwrittenbytes + directwrittenbytes) is greater than the 
>> amount of data written to the server, and in other cases it's less 
>> than the amount of data written to the server.  At the moment, I've 
>> ascribed that difference to data being written to the local cache and 
>> not flushed, but that doesn't make a whole lot of sense."
> 
> Our e-mail system may have barfed on my earlier reply, but I would like 
> to understand why you need this statistic.  What kind of analysis are 
> you looking for on the write side?
>

As far as writes go, all I really wanted to print is the MB written to 
the NFS server.  But I noticed along the way that the values don't add 
up, and found that odd.

An application writing data to an NFS server should get counted by 
either normalwritebytes or directwritebytes.  The data sent to the 
server via NFS WRITE requests should be counted by serverwritebytes.  
normal + direct should equal server, but it doesn't, and I'm wondering 
what happened to those bytes.  Sometimes normal + direct > server, 
sometimes normal + direct < server.

What I have in the patch doesn't really matter because I don't 
understand the imbalance in the counters.

-kevin

>>
>>>> +        print
>>>> +            print '%13s %12s %15s %15s' % ("Data Written:", "To 
>>>> Server", "To Cache", "Total")
>>>> +        print '%10s %13.4fMB %13.4fMB %13.4fMB' % ("", \
>>>> +                            bytes_written_by_nfs / 1024.0 / 1024.0, \
>>>> +                        cached_write_bytes / 1024.0 / 1024.0, \
>>>> +                        bytes_written_by_app / 1024.0 / 1024.0)
>>>>
>>>>    def __print_attr_cache_stats(self, sample_time):
>>>>        """Print attribute cache efficiency stats
> 
> -- 
> Chuck Lever
> chuck[dot]lever[at]oracle[dot]com


* Re: [PATCH 2/2] nfs-iostat.py: Added bytes written output
  2009-04-21 20:39                   ` Kevin Constantine
@ 2009-04-21 20:46                     ` Chuck Lever
  0 siblings, 0 replies; 17+ messages in thread
From: Chuck Lever @ 2009-04-21 20:46 UTC (permalink / raw)
  To: Kevin Constantine; +Cc: linux-nfs

On Apr 21, 2009, at 4:39 PM, Kevin Constantine wrote:
> Chuck Lever wrote:
>> On Apr 21, 2009, at 1:57 PM, Kevin Constantine wrote:
>>> Chuck Lever wrote:
>>>> On Apr 20, 2009, at 10:03 PM, Kevin Constantine wrote:
>>>>> Print statistics about bytes written by NFS
>>>>>
>>>>> Signed-off-by: Kevin Constantine <kevin.constantine@disneyanimation.com 
>>>>> >
>>>>> ---
>>>>> tools/nfs-iostat/nfs-iostat.py |   10 ++++++++++
>>>>> 1 files changed, 10 insertions(+), 0 deletions(-)
>>>>>
>>>>> diff --git a/tools/nfs-iostat/nfs-iostat.py b/tools/nfs-iostat/nfs-iostat.py
>>>>> index d331a72..5f2e232 100644
>>>>> --- a/tools/nfs-iostat/nfs-iostat.py
>>>>> +++ b/tools/nfs-iostat/nfs-iostat.py
>>>>> @@ -221,6 +221,7 @@ class DeviceData:
>>>>>       """
>>>>>       nfs_stats = self.__nfs_data
>>>>>       app_bytes_read = float(nfs_stats['normalreadbytes'] +  
>>>>> nfs_stats['directreadbytes'])
>>>>> +        bytes_written_by_app =  
>>>>> float(nfs_stats['normalwritebytes'] +  
>>>>> nfs_stats['directwritebytes'])
>>>>>       if app_bytes_read != 0:
>>>>>           read_bytes_from_server =  
>>>>> float(nfs_stats['serverreadbytes'])
>>>>>           directio_bytes_from_server =  
>>>>> float(nfs_stats['directreadbytes'])
>>>>> @@ -234,6 +235,15 @@ class DeviceData:
>>>>>                       cached_read_bytes / 1024.0 / 1024.0, \
>>>>>                       app_bytes_read / 1024.0 / 1024.0, \
>>>>>                       ratio)
>>>>> +    if bytes_written_by_app != 0:
>>>>> +        bytes_written_by_nfs =  
>>>>> float(nfs_stats['serverwritebytes'])
>>>>> +        cached_write_bytes = float(bytes_written_by_app -  
>>>>> bytes_written_by_nfs);
>>>> Again, I don't see why this computation is necessary:  The  
>>>> "normalwritebytes" statistic is what was written through the  
>>>> client's cache, and "directwritebytes" counts the number of bytes  
>>>> that were written via O_DIRECT.
>>>
>>> As I said in the email preceding these patches:
>>> "I'm not quite sure about this patch.  I'm seeing oddities where  
>>> the number of bytes in "NORMALWRITTENBYTES + DIRECTWRITTENBYTES"  
>>> does not equal SERVERWRITTENBYTES.  In some cases the amount of  
>>> data written by the app (normalwrittenbytes + directwrittenbytes)  
>>> is greater than the amount of data written to the server, and in  
>>> other cases it's less than the amount of data written to the  
>>> server.  At the moment, I've ascribed that difference to data  
>>> being written to the local cache and not flushed, but that doesn't  
>>> make a whole lot of sense."
>> Our e-mail system may have barfed on my earlier reply, but I would  
>> like to understand why you need this statistic.  What kind of  
>> analysis are you looking for on the write side?
>>
>
> As far as writes go, all I really wanted to print is the MB written  
> to the NFS server.  But I noticed along the way that the values  
> don't add up, and found that odd.
>
> An application writing data to an NFS server should get counted by  
> either normalwritebytes or directwritebytes.  The data sent to the  
> server via NFS WRITE requests should be counted by serverwritebytes.  
> normal + direct should equal server, but it doesn't, and I'm  
> wondering what happened to those bytes.  Sometimes normal + direct >  
> server, sometimes normal + direct < server.
>
> What I have in the patch doesn't really matter because I don't  
> understand the imbalance in the counters.

If you want to see how much data is going from the client to the  
server, use serverwrittenbytes.  No user-space computation is required.

The other two measure data coming in from system calls.  The VFS and  
the NFS client do some write caching.  The instantaneous sum of these  
two will not always be the same as what is written on the wire.
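To put numbers on that: the only counter that reflects wire traffic is serverwritebytes, so per-interval write throughput falls out of two samples of it alone.  A minimal sketch (counter names follow nfs-iostat.py; the helper and the sample values are made up):

```python
def wire_write_rate(prev_stats, curr_stats, interval):
    """On-the-wire write throughput (MB/s) between two samples.

    Only serverwritebytes reflects what was actually sent to the
    server.  normalwritebytes + directwritebytes counts system-call
    traffic, which can lead or lag the wire because of client-side
    write caching, so no user-space arithmetic on those two can
    recover the wire number.
    """
    delta = curr_stats['serverwritebytes'] - prev_stats['serverwritebytes']
    return (delta / 1024.0 / 1024.0) / interval

# Hypothetical samples taken 5 seconds apart:
prev = {'serverwritebytes': 10 * 1024 * 1024}
curr = {'serverwritebytes': 30 * 1024 * 1024}
print('%.2f MB/s' % wire_write_rate(prev, curr, 5.0))  # 4.00 MB/s
```

Any gap between that figure and the system-call counters over the same interval is just data still sitting in (or already flushed from) the client's cache, not lost bytes.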

> -kevin
>
>>>
>>>>> +        print
>>>>> +            print '%13s %12s %15s %15s' % ("Data Written:", "To  
>>>>> Server", "To Cache", "Total")
>>>>> +        print '%10s %13.4fMB %13.4fMB %13.4fMB' % ("", \
>>>>> +                            bytes_written_by_nfs / 1024.0 /  
>>>>> 1024.0, \
>>>>> +                        cached_write_bytes / 1024.0 / 1024.0, \
>>>>> +                        bytes_written_by_app / 1024.0 / 1024.0)
>>>>>
>>>>>   def __print_attr_cache_stats(self, sample_time):
>>>>>       """Print attribute cache efficiency stats
>> -- 
>> Chuck Lever
>> chuck[dot]lever[at]oracle[dot]com

-- 
Chuck Lever
chuck[dot]lever[at]oracle[dot]com


* [PATCH] nfs-iostat.py: Print Data Cache Statistics
  2009-04-21 18:58             ` Chuck Lever
@ 2009-04-22  1:42               ` Kevin Constantine
       [not found]                 ` <1240364531-11499-1-git-send-email-kevin.constantine-FfNkGbSheRGpB8w63BLUukEOCMrvLtNR@public.gmane.org>
  0 siblings, 1 reply; 17+ messages in thread
From: Kevin Constantine @ 2009-04-22  1:42 UTC (permalink / raw)
  To: chuck.lever; +Cc: linux-nfs, Kevin Constantine

I rolled the read cache and write patches into a single patch below.

add --data flag to print data cache statistics
print read cache stats from __print_data_cache_stats
print stats about bytes written by NFS

Signed-off-by: Kevin Constantine <kevin.constantine-FfNkGbSheRGpB8w63BLUukEOCMrvLtNR@public.gmane.org>
---
 tools/nfs-iostat/nfs-iostat.py |   28 ++++++++++++++++++++++------
 1 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/tools/nfs-iostat/nfs-iostat.py b/tools/nfs-iostat/nfs-iostat.py
index 9626d42..cb3da59 100644
--- a/tools/nfs-iostat/nfs-iostat.py
+++ b/tools/nfs-iostat/nfs-iostat.py
@@ -220,14 +220,22 @@ class DeviceData:
         """Print the data cache hit rate
         """
         nfs_stats = self.__nfs_data
-        app_bytes_read = float(nfs_stats['normalreadbytes'])
+        app_bytes_read = float(nfs_stats['normalreadbytes'] + nfs_stats['directreadbytes'])
         if app_bytes_read != 0:
-            client_bytes_read = float(nfs_stats['serverreadbytes'] - nfs_stats['directreadbytes'])
-            ratio = ((app_bytes_read - client_bytes_read) * 100) / app_bytes_read
-
+            cached_read_bytes = float(app_bytes_read - float(nfs_stats['serverreadbytes']))
+            ratio = (cached_read_bytes * 100) / app_bytes_read
             print
-            print 'app bytes: %f  client bytes %f' % (app_bytes_read, client_bytes_read)
-            print 'Data cache hit ratio: %4.2f%%' % ratio
+            print '%10s %15s %15s %15s %7s' % ("Data Read:", "From Server", "From Cache", "Total", "Hit %")
+            print '%10s %13.4fMB %13.4fMB %13.4fMB %6.2f%%' % ("", \
+	    					float(nfs_stats['serverreadbytes']) / 1024.0 / 1024.0, \
+						cached_read_bytes / 1024.0 / 1024.0, \
+						app_bytes_read / 1024.0 / 1024.0, \
+						ratio)
+        bytes_written_by_nfs = float(nfs_stats['serverwritebytes'])
+	if bytes_written_by_nfs != 0:
+	    print
+            print '%13s %12s' % ("Data Written:", "To Server")
+	    print '%10s %13.4fMB' % ("", bytes_written_by_nfs / 1024.0 / 1024.0)
 
     def __print_attr_cache_stats(self, sample_time):
         """Print attribute cache efficiency stats
@@ -390,6 +398,10 @@ class DeviceData:
             self.__print_rpc_op_stats('READ', sample_time)
             self.__print_rpc_op_stats('WRITE', sample_time)
             self.__print_page_stats(sample_time)
+        elif which == 4:
+            self.__print_rpc_op_stats('READ', sample_time)
+            self.__print_rpc_op_stats('WRITE', sample_time)
+            self.__print_data_cache_stats()
 
 #
 # Functions
@@ -487,6 +499,10 @@ def iostat_command(name):
         if arg in ['-p', '--page']:
             which = 3
             continue
+	
+	if arg in ['--data']:
+	    which = 4
+	    continue
 
         if arg == sys.argv[0]:
             continue
-- 
1.6.2.1



* Re: [PATCH] nfs-iostat.py: Print Data Cache Statistics
       [not found]                 ` <1240364531-11499-1-git-send-email-kevin.constantine-FfNkGbSheRGpB8w63BLUukEOCMrvLtNR@public.gmane.org>
@ 2009-04-22 18:38                   ` Chuck Lever
  2009-04-25  3:16                     ` Kevin Constantine
  0 siblings, 1 reply; 17+ messages in thread
From: Chuck Lever @ 2009-04-22 18:38 UTC (permalink / raw)
  To: Kevin Constantine; +Cc: Linux NFS Mailing List

Hi Kevin-

On Apr 21, 2009, at 9:42 PM, Kevin Constantine wrote:
> I rolled the read cache and write patches into a single patch below.
>
> add --data flag to print data cache statistics
> print read cache stats from __print_data_cache_stats
> print stats about bytes written by NFS
>
> Signed-off-by: Kevin Constantine <kevin.constantine-FfNkGbSheRGpB8w63BLUukEOCMrvLtNR@public.gmane.org 
> >
> ---
> tools/nfs-iostat/nfs-iostat.py |   28 ++++++++++++++++++++++------
> 1 files changed, 22 insertions(+), 6 deletions(-)
>
> diff --git a/tools/nfs-iostat/nfs-iostat.py b/tools/nfs-iostat/nfs-iostat.py
> index 9626d42..cb3da59 100644
> --- a/tools/nfs-iostat/nfs-iostat.py
> +++ b/tools/nfs-iostat/nfs-iostat.py
> @@ -220,14 +220,22 @@ class DeviceData:
>         """Print the data cache hit rate
>         """
>         nfs_stats = self.__nfs_data
> -        app_bytes_read = float(nfs_stats['normalreadbytes'])
> +        app_bytes_read = float(nfs_stats['normalreadbytes'] +  
> nfs_stats['directreadbytes'])

Again, I object to this particular change.  The reason for the current  
computation is to retain precision and mitigate the chance of an  
overflow.  Reversing the sense of this computation (by introducing the  
addition above) increases the probability of overflow here.  Remember  
that over time, these numbers can grow exceptionally large.

If you think you need this change, you do need to provide some  
justification for the reorganization in the patch description.  Why is  
the current code inadequate?
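For comparison, the current computation keeps each operand as small as possible before any arithmetic; a standalone sketch with made-up counters:

```python
def cache_hit_ratio(nfs_stats):
    """Page cache hit ratio as the current code computes it.

    client_bytes_read subtracts directreadbytes from serverreadbytes
    first, so the intermediate values stay close to the size of the
    individual counters; summing several large counters up front (as
    the patch proposed) widens the operands and makes an eventual
    overflow more likely.
    """
    app_bytes_read = float(nfs_stats['normalreadbytes'])
    if app_bytes_read == 0:
        return 0.0
    client_bytes_read = float(nfs_stats['serverreadbytes'] -
                              nfs_stats['directreadbytes'])
    return ((app_bytes_read - client_bytes_read) * 100) / app_bytes_read

# Made-up counters: 8000 bytes read by the app, 6000 of them
# served over the wire by cached (non-O_DIRECT) reads.
stats = {'normalreadbytes': 8000, 'serverreadbytes': 7000,
         'directreadbytes': 1000}
print('Data cache hit ratio: %4.2f%%' % cache_hit_ratio(stats))  # 25.00%
```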

>         if app_bytes_read != 0:
> -            client_bytes_read = float(nfs_stats['serverreadbytes']  
> - nfs_stats['directreadbytes'])
> -            ratio = ((app_bytes_read - client_bytes_read) * 100) /  
> app_bytes_read
> -
> +            cached_read_bytes = float(app_bytes_read -  
> float(nfs_stats['serverreadbytes']))
> +            ratio = (cached_read_bytes * 100) / app_bytes_read
>             print
> -            print 'app bytes: %f  client bytes %f' %  
> (app_bytes_read, client_bytes_read)
> -            print 'Data cache hit ratio: %4.2f%%' % ratio

In my opinion, displaying O_DIRECT reads right next to the cache hit  
ratio is deceptive.  This is why I did not include the O_DIRECT counts  
here.  They have no impact on the page cache hit ratio, as O_DIRECT  
data is never cached in the kernel.

> +            print '%10s %15s %15s %15s %7s' % ("Data Read:", "From  
> Server", "From Cache", "Total", "Hit %")
> +            print '%10s %13.4fMB %13.4fMB %13.4fMB %6.2f%%' % ("", \
> +	    					float(nfs_stats['serverreadbytes']) / 1024.0 / 1024.0, \
> +						cached_read_bytes / 1024.0 / 1024.0, \
> +						app_bytes_read / 1024.0 / 1024.0, \
> +						ratio)

If app_bytes_read is zero, we should print a line here that displays  
zeroes, as is done in other areas of this code.  I know that nfsstat  
quells output lines containing all zeros, but other tools (like  
iostat) do not.  What ends up happening is that when there is no I/O  
activity, no lines are produced at all.  That is confusing to users  
when nfs-iostat is in the "output one line every N seconds" mode.
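Concretely, something like this (a standalone sketch reusing the existing computation; the helper itself is illustrative) keeps one line per interval even when nothing moved:

```python
def format_read_line(nfs_stats):
    """Format the read-side statistics line, emitting zeroes rather
    than suppressing the line when there has been no read activity,
    so interval mode always produces one line per sample."""
    app = float(nfs_stats['normalreadbytes'])
    client = float(nfs_stats['serverreadbytes'] -
                   nfs_stats['directreadbytes'])
    cached = app - client
    ratio = (cached * 100) / app if app != 0 else 0.0
    return '%13.4fMB %13.4fMB %13.4fMB %6.2f%%' % (
        client / 1024.0 / 1024.0, cached / 1024.0 / 1024.0,
        app / 1024.0 / 1024.0, ratio)

# An idle interval still yields a (zero-filled) line:
print(format_read_line({'normalreadbytes': 0, 'directreadbytes': 0,
                        'serverreadbytes': 0}))
```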

> +        bytes_written_by_nfs = float(nfs_stats['serverwritebytes'])
> +	if bytes_written_by_nfs != 0:
> +	    print
> +            print '%13s %12s' % ("Data Written:", "To Server")
> +	    print '%10s %13.4fMB' % ("", bytes_written_by_nfs / 1024.0 /  
> 1024.0)

As you are not dividing by bytes_written_by_nfs, there is no  
particular reason to gate this output.  I don't see the need for the  
extra variable here either.  You don't have a similar variable in the  
read statistics, for example.

You included application and direct read bytes in your read  
statistics, but not here.  Why not?

>     def __print_attr_cache_stats(self, sample_time):
>         """Print attribute cache efficiency stats
> @@ -390,6 +398,10 @@ class DeviceData:
>             self.__print_rpc_op_stats('READ', sample_time)
>             self.__print_rpc_op_stats('WRITE', sample_time)
>             self.__print_page_stats(sample_time)
> +        elif which == 4:
> +            self.__print_rpc_op_stats('READ', sample_time)
> +            self.__print_rpc_op_stats('WRITE', sample_time)
> +            self.__print_data_cache_stats()
>
> #
> # Functions
> @@ -487,6 +499,10 @@ def iostat_command(name):
>         if arg in ['-p', '--page']:
>             which = 3
>             continue
> +	
> +	if arg in ['--data']:
> +	    which = 4
> +	    continue
>
>         if arg == sys.argv[0]:
>             continue

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com


* Re: [PATCH] nfs-iostat.py: Print Data Cache Statistics
  2009-04-22 18:38                   ` Chuck Lever
@ 2009-04-25  3:16                     ` Kevin Constantine
       [not found]                       ` <49F28096.9060300-P5ys19MLBK/QT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 17+ messages in thread
From: Kevin Constantine @ 2009-04-25  3:16 UTC (permalink / raw)
  To: Chuck Lever; +Cc: Linux NFS Mailing List

Chuck Lever wrote:
> Hi Kevin-
> 
> On Apr 21, 2009, at 9:42 PM, Kevin Constantine wrote:
>> I rolled the read cache and write patches into a single patch below.
>>
>> add --data flag to print data cache statistics
>> print read cache stats from __print_data_cache_stats
>> print stats about bytes written by NFS
>>
>> Signed-off-by: Kevin Constantine <kevin.constantine-FfNkGbSheRGpB8w63BLUukEOCMrvLtNR@public.gmane.org>
>> ---
>> tools/nfs-iostat/nfs-iostat.py |   28 ++++++++++++++++++++++------
>> 1 files changed, 22 insertions(+), 6 deletions(-)
>>
>> diff --git a/tools/nfs-iostat/nfs-iostat.py 
>> b/tools/nfs-iostat/nfs-iostat.py
>> index 9626d42..cb3da59 100644
>> --- a/tools/nfs-iostat/nfs-iostat.py
>> +++ b/tools/nfs-iostat/nfs-iostat.py
>> @@ -220,14 +220,22 @@ class DeviceData:
>>         """Print the data cache hit rate
>>         """
>>         nfs_stats = self.__nfs_data
>> -        app_bytes_read = float(nfs_stats['normalreadbytes'])
>> +        app_bytes_read = float(nfs_stats['normalreadbytes'] + 
>> nfs_stats['directreadbytes'])
> 
> Again, I object to this particular change.  The reason for the current 
> computation is to retain precision and mitigate the chance of an 
> overflow.  Reversing the sense of this computation (by introducing the 
> addition above) increases the probability of overflow here.  Remember 
> that over time, these numbers can grow exceptionally large.
> 
> If you think you need this change, you do need to provide some 
> justification for the reorganization in the patch description.  Why is 
> the current code inadequate?
> 

I've changed my opinion on how the cache hit ratio should be computed and now agree with you.  I was thinking that direct reads should negatively 
impact the ratio, but that doesn't make a whole lot of sense.

>>         if app_bytes_read != 0:
>> -            client_bytes_read = float(nfs_stats['serverreadbytes'] - 
>> nfs_stats['directreadbytes'])
>> -            ratio = ((app_bytes_read - client_bytes_read) * 100) / 
>> app_bytes_read
>> -
>> +            cached_read_bytes = float(app_bytes_read - 
>> float(nfs_stats['serverreadbytes']))
>> +            ratio = (cached_read_bytes * 100) / app_bytes_read
>>             print
>> -            print 'app bytes: %f  client bytes %f' % (app_bytes_read, 
>> client_bytes_read)
>> -            print 'Data cache hit ratio: %4.2f%%' % ratio
> 
> In my opinion, displaying O_DIRECT reads right next to the cache hit 
> ratio is deceptive.  This is why I did not include the O_DIRECT counts 
> here.  They have no impact on the page cache hit ratio, as O_DIRECT data 
> is never cached in the kernel.
> 

I've changed the output considerably this time around.  I think it makes more sense; let me know what you think.

nani:/vol/pamtest mounted on /data/pamtest:

    op/s         rpc bklog
3452.00           13.16
read:             ops/s            kB/s           kB/op         retrans         avg RTT (ms)    avg exe (ms)
                 3452.000        111407.906       32.273        0 (0.0%)           4.633           8.628
write:            ops/s            kB/s           kB/op         retrans         avg RTT (ms)    avg exe (ms)
                   0.000           0.000           0.000        0 (0.0%)           0.000           0.000

            Network                                          Direct I/O             Direct %                  Total
       Read        Write   Cache Reads    Cache Hit %     Read        Write       Read     Write         Read        Write
   107.8750MB     0.0000MB     4.1236MB       3.68%     0.0000MB     0.0000MB     0.00%     0.00%   111.9986MB     0.0000MB


>> +            print '%10s %15s %15s %15s %7s' % ("Data Read:", "From 
>> Server", "From Cache", "Total", "Hit %")
>> +            print '%10s %13.4fMB %13.4fMB %13.4fMB %6.2f%%' % ("", \
>> +                            float(nfs_stats['serverreadbytes']) / 
>> 1024.0 / 1024.0, \
>> +                        cached_read_bytes / 1024.0 / 1024.0, \
>> +                        app_bytes_read / 1024.0 / 1024.0, \
>> +                        ratio)
> 
> If app_bytes_read is zero, we should print a line here that displays 
> zeroes, as is done in other areas of this code.  I know that nfsstat 
> quells output lines containing all zeros, but other tools (like iostat) 
> do not.  What ends up happening is that when there is no I/O activity, 
> no lines are produced at all.  That is confusing to users when 
> nfs-iostat is in the "output one line every N seconds" mode.
> 

Agreed.  It now prints out zeroes.  There are instances where data has
been sent through the NFS stack and counted in the serverreadbytes
counter, but hasn't yet been counted by the system-call-side counters,
so we get a negative cache hit ratio.  In these cases, I'm converting
the cache ratio and cache bytes to zero.
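The clamping amounts to this (standalone sketch of the logic now in the patch):

```python
def clamp_cache_stats(cached_reads_MB, ratio):
    """Bytes can be counted by the NFS layer (serverreadbytes) before
    the matching system-call counters catch up, which briefly makes
    the derived cache figures negative; report zero instead of a
    confusing negative number."""
    if cached_reads_MB < 0:
        cached_reads_MB = 0.0
    if ratio < 0:
        ratio = 0.0
    return cached_reads_MB, ratio

print(clamp_cache_stats(-0.5, -3.2))   # (0.0, 0.0)
print(clamp_cache_stats(4.1, 3.7))     # (4.1, 3.7)
```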

>> +        bytes_written_by_nfs = float(nfs_stats['serverwritebytes'])
>> +    if bytes_written_by_nfs != 0:
>> +        print
>> +            print '%13s %12s' % ("Data Written:", "To Server")
>> +        print '%10s %13.4fMB' % ("", bytes_written_by_nfs / 1024.0 / 
>> 1024.0)
> 
> As you are not dividing by bytes_written_by_nfs, there is no particular 
> reason to gate this output.  I don't see the need for the extra variable 
> here either.  You don't have a similar variable in the read statistics, 
> for example.
> 

I was thinking the gate was there to prevent non-zero values from printing.
This has been fixed.

> You included application and direct read bytes in your read statistics, 
> but not here.  Why not?
> 

The new output prints network reads/writes, direct I/O reads/writes, total
reads/writes, cache reads, the cache hit ratio, and direct I/O ratios for
reads and writes.

New Patch below:

add --data flag to print data cache statistics
--data flag prints network reads/writes, cache reads, cache hit %, direct I/O reads/writes, direct I/O %, and total reads/writes

Signed-off-by: Kevin Constantine <kevin.constantine-FfNkGbSheRGpB8w63BLUukEOCMrvLtNR@public.gmane.org>
---
  tools/nfs-iostat/nfs-iostat.py |   53 +++++++++++++++++++++++++++++++++++++---
  1 files changed, 49 insertions(+), 4 deletions(-)

diff --git a/tools/nfs-iostat/nfs-iostat.py b/tools/nfs-iostat/nfs-iostat.py
index 9626d42..b9f3a96 100644
--- a/tools/nfs-iostat/nfs-iostat.py
+++ b/tools/nfs-iostat/nfs-iostat.py
@@ -220,14 +220,51 @@ class DeviceData:
          """Print the data cache hit rate
          """
          nfs_stats = self.__nfs_data
+	client_bytes_read = 0
+	ratio = 0
+	direct_read_ratio = 0
+	direct_write_ratio = 0
+
          app_bytes_read = float(nfs_stats['normalreadbytes'])
+        client_bytes_read = float(nfs_stats['serverreadbytes'] - nfs_stats['directreadbytes'])
+        direct_read_bytes = float(nfs_stats['directreadbytes'])
+	direct_write_bytes = float(nfs_stats['directwritebytes'])
          if app_bytes_read != 0:
-            client_bytes_read = float(nfs_stats['serverreadbytes'] - nfs_stats['directreadbytes'])
              ratio = ((app_bytes_read - client_bytes_read) * 100) / app_bytes_read
+        if direct_read_bytes != 0:
+            direct_read_ratio = float(nfs_stats['directreadbytes']) * 100 / \
+				(app_bytes_read + direct_read_bytes)
+        if direct_write_bytes != 0:
+            direct_write_ratio = float(nfs_stats['directwritebytes']) * 100 / \
+				(direct_write_bytes + float(nfs_stats['normalwritebytes']))
+
+        network_reads_MB = float(nfs_stats['serverreadbytes']) / 1024.0 / 1024.0
+        network_writes_MB = float(nfs_stats['serverwritebytes']) / 1024.0 / 1024.0
+        cached_reads_MB = float(app_bytes_read - client_bytes_read) / 1024.0 / 1024.0
+        direct_reads_MB = float(nfs_stats['directreadbytes']) / 1024.0 / 1024.0
+        direct_writes_MB = float(nfs_stats['directwritebytes']) / 1024.0 / 1024.0
+        total_reads_MB = float(app_bytes_read + nfs_stats['directreadbytes']) / 1024.0 / 1024.0
+	total_writes_MB = float(nfs_stats['normalwritebytes'] + nfs_stats['directwritebytes']) / \
+			1024.0 / 1024.0
+
+        """ Bytes may have been counted by the NFS stack but haven't been counted by the
+            userspace stack.  When this happens, we see negative cache values.  Set these
+            to zero to avoid confusion.
+        """
+        if cached_reads_MB < 0:
+            cached_reads_MB = 0;
+        if ratio < 0:
+            ratio = 0;

-            print
-            print 'app bytes: %f  client bytes %f' % (app_bytes_read, client_bytes_read)
-            print 'Data cache hit ratio: %4.2f%%' % ratio
+        print
+        print '%18s %51s %20s %22s' % ("Network", "Direct I/O", "Direct %", "Total")
+        print '%10s %12s %13s %14s %8s %12s %10s %9s %12s %12s' % \
+		("Read", "Write", "Cache Reads", "Cache Hit %", "Read", "Write", \
+		"Read", "Write",  "Read", "Write")
+        print '%10.4fMB %10.4fMB %10.4fMB %10.2f%% %10.4fMB %10.4fMB %8.2f%% %8.2f%% %10.4fMB %10.4fMB' % ( \
+		network_reads_MB, network_writes_MB, cached_reads_MB, ratio, direct_reads_MB, \
+	 	direct_writes_MB, direct_read_ratio, direct_write_ratio, total_reads_MB, \
+		total_writes_MB)

      def __print_attr_cache_stats(self, sample_time):
          """Print attribute cache efficiency stats
@@ -390,6 +427,10 @@ class DeviceData:
              self.__print_rpc_op_stats('READ', sample_time)
              self.__print_rpc_op_stats('WRITE', sample_time)
              self.__print_page_stats(sample_time)
+        elif which == 4:
+            self.__print_rpc_op_stats('READ', sample_time)
+            self.__print_rpc_op_stats('WRITE', sample_time)
+            self.__print_data_cache_stats()

  #
  # Functions
@@ -487,6 +528,10 @@ def iostat_command(name):
          if arg in ['-p', '--page']:
              which = 3
              continue
+	
+	if arg in ['--data']:
+	    which = 4
+	    continue

          if arg == sys.argv[0]:
              continue
-- 
1.6.2.1



* Re: [PATCH] nfs-iostat.py: Print Data Cache Statistics
       [not found]                       ` <49F28096.9060300-P5ys19MLBK/QT0dZR+AlfA@public.gmane.org>
@ 2009-04-27 15:33                         ` Chuck Lever
  2009-04-30  1:57                           ` Kevin Constantine
  0 siblings, 1 reply; 17+ messages in thread
From: Chuck Lever @ 2009-04-27 15:33 UTC (permalink / raw)
  To: Kevin Constantine; +Cc: Linux NFS Mailing List

On Apr 24, 2009, at 11:16 PM, Kevin Constantine wrote:
> Chuck Lever wrote:
>> Hi Kevin-
>> On Apr 21, 2009, at 9:42 PM, Kevin Constantine wrote:
>>> I rolled the read cache and write patches into a single patch  
>>> below.
>>>
>>> add --data flag to print data cache statistics
>>> print read cache stats from __print_data_cache_stats
>>> print stats about bytes written by NFS
>>>
>>> Signed-off-by: Kevin Constantine <kevin.constantine-FfNkGbSheRGpB8w63BLUukEOCMrvLtNR@public.gmane.org 
>>> >
>>> ---
>>> tools/nfs-iostat/nfs-iostat.py |   28 ++++++++++++++++++++++------
>>> 1 files changed, 22 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/tools/nfs-iostat/nfs-iostat.py b/tools/nfs-iostat/nfs-iostat.py
>>> index 9626d42..cb3da59 100644
>>> --- a/tools/nfs-iostat/nfs-iostat.py
>>> +++ b/tools/nfs-iostat/nfs-iostat.py
>>> @@ -220,14 +220,22 @@ class DeviceData:
>>>        """Print the data cache hit rate
>>>        """
>>>        nfs_stats = self.__nfs_data
>>> -        app_bytes_read = float(nfs_stats['normalreadbytes'])
>>> +        app_bytes_read = float(nfs_stats['normalreadbytes'] +  
>>> nfs_stats['directreadbytes'])
>> Again, I object to this particular change.  The reason for the  
>> current computation is to retain precision and mitigate the chance  
>> of an overflow.  Reversing the sense of this computation (by  
>> introducing the addition above) increases the probability of  
>> overflow here.  Remember that over time, these numbers can grow  
>> exceptionally large.
>> If you think you need this change, you do need to provide some  
>> justification for the reorganization in the patch description.  Why  
>> is the current code inadequate?
>
> I've changed my opinion on how the cache hit ratio should be  
> computed and now agree with you.  I was thinking that direct reads  
> should negatively impact the ratio, but that doesn't make a whole  
> lot of sense.
>
>>>        if app_bytes_read != 0:
>>> -            client_bytes_read =  
>>> float(nfs_stats['serverreadbytes'] - nfs_stats['directreadbytes'])
>>> -            ratio = ((app_bytes_read - client_bytes_read) *  
>>> 100) / app_bytes_read
>>> -
>>> +            cached_read_bytes = float(app_bytes_read -  
>>> float(nfs_stats['serverreadbytes']))
>>> +            ratio = (cached_read_bytes * 100) / app_bytes_read
>>>            print
>>> -            print 'app bytes: %f  client bytes %f' %  
>>> (app_bytes_read, client_bytes_read)
>>> -            print 'Data cache hit ratio: %4.2f%%' % ratio
>> In my opinion, displaying O_DIRECT reads right next to the cache  
>> hit ratio is deceptive.  This is why I did not include the O_DIRECT  
>> counts here.  They have no impact on the page cache hit ratio, as  
>> O_DIRECT data is never cached in the kernel.
>
> I've changed the output considerably this time around.  I think it  
> makes more sense, let me know what you think.
>
> nani:/vol/pamtest mounted on /data/pamtest:
>
>   op/s         rpc bklog
> 3452.00           13.16
> read:             ops/s            kB/s           kB/op          retrans         avg RTT (ms)    avg exe (ms)
>                3452.000        111407.906       32.273        0 (0.0%)           4.633           8.628
> write:            ops/s            kB/s           kB/op          retrans         avg RTT (ms)    avg exe (ms)
>                  0.000           0.000           0.000        0 (0.0%)           0.000           0.000
>
>           Network                                          Direct I/O             Direct %                  Total
>      Read        Write   Cache Reads    Cache Hit %     Read        Write       Read     Write         Read        Write
>  107.8750MB     0.0000MB     4.1236MB       3.68%     0.0000MB     0.0000MB     0.00%     0.00%   111.9986MB     0.0000MB

This is a lot better, but I still find the placement of direct I/O  
statistics near the cache hit ratio confusing.  Direct reads are a  
totally different beast from normal (cached) reads.  I also think that  
read caching and write caching behavior are different enough that  
reporting them separately might be warranted.

It occurs to me that another important piece of information here is  
how often the NFS client decides it needs to invalidate page cache  
pages.  It doesn't invalidate individual pages; if a file's mtime has  
changed, the client purges all cached data for that file.  So, if the  
client has read the data in a page once already, but the cache was  
invalidated, it will read the same page again (whether data in the  
page has changed on the server or not).

I think it would be informative to provide at least the page cache  
invalidation number with the hit ratio, but it could also be  
interesting to see how often the same page is read multiple times.   
The kernel doesn't keep that stat at the moment.

Distinguishing between "cache missed because client had never read the  
page before" and "cache missed because client noticed the file  
changed" might be important, but it might also be difficult to track.   
For now, maybe the best approach is simply to report the page cache  
invalidation stat along with the computed hit ratio?
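
A minimal sketch of that suggestion, assuming the stats dict uses the key names nfs-iostat.py parses from /proc/self/mountstats (`normalreadbytes`, `serverreadbytes`, `directreadbytes` from the `bytes:` line; `datainvalidates` from the `events:` line); the helper name and sample values are illustrative:

```python
def read_cache_summary(nfs_stats):
    # Hit ratio computed subtract-first, as in the existing code,
    # reported alongside the page cache invalidation count.
    app = float(nfs_stats['normalreadbytes'])
    client = float(nfs_stats['serverreadbytes'] - nfs_stats['directreadbytes'])
    ratio = ((app - client) * 100) / app if app != 0 else 0.0
    return ratio, nfs_stats['datainvalidates']

sample = {'normalreadbytes': 4194304, 'serverreadbytes': 3145728,
          'directreadbytes': 0, 'datainvalidates': 1}
ratio, invalidations = read_cache_summary(sample)
print('Cache Hit %%: %.2f   Invalidations: %d' % (ratio, invalidations))
```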

>>> +            print '%10s %15s %15s %15s %7s' % ("Data Read:", "From Server", "From Cache", "Total", "Hit %")
>>> +            print '%10s %13.4fMB %13.4fMB %13.4fMB %6.2f%%' % ("", \
>>> +                        float(nfs_stats['serverreadbytes']) / 1024.0 / 1024.0, \
>>> +                        cached_read_bytes / 1024.0 / 1024.0, \
>>> +                        app_bytes_read / 1024.0 / 1024.0, \
>>> +                        ratio)
>> If app_bytes_read is zero, we should print a line here that  
>> displays zeroes, as is done in other areas of this code.  I know  
>> that nfsstat quells output lines containing all zeros, but other  
>> tools (like iostat) do not.  What ends up happening is that when  
>> there is no I/O activity, no lines are produced at all.  That is  
>> confusing to users when nfs-iostat is in the "output one line every  
>> N seconds" mode.
>
> Agreed.  It now prints out zeroes.  There are instances where data has been
> sent through the NFS stack and counted in the serverbytes counter, but
> hasn't made it through the userspace stack and hasn't been counted, so we
> get a negative cache hit ratio.  In these cases, I'm converting the cache
> ratio and cache bytes to zero.

I don't see how this can happen.  All of the NFS I/O statistics are  
counted in the kernel; there is no "user space stack".  "normal" and  
"direct" are counted before the NFS client starts any network operation.

So, maybe there's a bug in the kernel accounting or reporting?  Or  
perhaps this behavior is exposed only on SMP?

>>> +        bytes_written_by_nfs = float(nfs_stats['serverwritebytes'])
>>> +    if bytes_written_by_nfs != 0:
>>> +        print
>>> +            print '%13s %12s' % ("Data Written:", "To Server")
>>> +        print '%10s %13.4fMB' % ("", bytes_written_by_nfs / 1024.0 / 1024.0)
>> As you are not dividing by bytes_written_by_nfs, there is no  
>> particular reason to gate this output.  I don't see the need for  
>> the extra variable here either.  You don't have a similar variable  
>> in the read statistics, for example.
>
> I was thinking the gate was there to prevent zero values from printing.
> This has been fixed.
>
>> You included application and direct read bytes in your read  
>> statistics, but not here.  Why not?
>
> The new output prints Network reads/writes, direct reads/writes, total
> reads/writes, cache reads, cache ratio and direct i/o ratios for  
> reads/writes.
>
> New Patch below:
>
> add --data flag to print data cache statistics
> --data flag prints Network reads/writes, cache reads, cache hit%,  
> Direct I/O reads/writes, Direct I/O% and Total reads/Writes
>
> Signed-off-by: Kevin Constantine <kevin.constantine-FfNkGbSheRGpB8w63BLUukEOCMrvLtNR@public.gmane.org>
> ---
> tools/nfs-iostat/nfs-iostat.py |   53 ++++++++++++++++++++++++++++++++++++---
> 1 files changed, 49 insertions(+), 4 deletions(-)
>
> diff --git a/tools/nfs-iostat/nfs-iostat.py b/tools/nfs-iostat/nfs-iostat.py
> index 9626d42..b9f3a96 100644
> --- a/tools/nfs-iostat/nfs-iostat.py
> +++ b/tools/nfs-iostat/nfs-iostat.py
> @@ -220,14 +220,51 @@ class DeviceData:
>         """Print the data cache hit rate
>         """
>         nfs_stats = self.__nfs_data
> +	client_bytes_read = 0
> +	ratio = 0
> +	direct_read_ratio = 0
> +	direct_write_ratio = 0
> +
>         app_bytes_read = float(nfs_stats['normalreadbytes'])
> +        client_bytes_read = float(nfs_stats['serverreadbytes'] - nfs_stats['directreadbytes'])
> +        direct_read_bytes = float(nfs_stats['directreadbytes'])
> +	direct_write_bytes = float(nfs_stats['directwritebytes'])
>         if app_bytes_read != 0:
> -            client_bytes_read = float(nfs_stats['serverreadbytes'] - nfs_stats['directreadbytes'])
>             ratio = ((app_bytes_read - client_bytes_read) * 100) / app_bytes_read
> +        if direct_read_bytes != 0:
> +            direct_read_ratio = float(nfs_stats['directreadbytes']) * 100 / \
> +				(app_bytes_read + direct_read_bytes)
> +        if direct_write_bytes != 0:
> +            direct_write_ratio = float(nfs_stats['directwritebytes']) * 100 / \
> +				(direct_write_bytes + float(nfs_stats['normalwritebytes']))
> +
> +        network_reads_MB = float(nfs_stats['serverreadbytes']) / 1024.0 / 1024.0
> +        network_writes_MB = float(nfs_stats['serverwritebytes']) / 1024.0 / 1024.0
> +        cached_reads_MB = float(app_bytes_read - client_bytes_read) / 1024.0 / 1024.0
> +        direct_reads_MB = float(nfs_stats['directreadbytes']) / 1024.0 / 1024.0
> +        direct_writes_MB = float(nfs_stats['directwritebytes']) / 1024.0 / 1024.0
> +        total_reads_MB = float(app_bytes_read + nfs_stats['directreadbytes']) / 1024.0 / 1024.0
> +	total_writes_MB = float(nfs_stats['normalwritebytes'] + nfs_stats['directwritebytes']) / \
> +			1024.0 / 1024.0

Just a nit: the indentation here is uneven.  Given this is Python,  
this could be a little more than just a cosmetic problem over the long  
run.

I always find it a challenge to avoid using tabs in Python scripts.

> +
> +        """ Bytes may have been counted by the NFS stack but haven't been counted by the
> +            userspace stack.  When this happens, we see negative cache values.  Set these
> +            to zero to avoid confusion.
> +        """
> +        if cached_reads_MB < 0:
> +            cached_reads_MB = 0;
> +        if ratio < 0:
> +            ratio = 0;
>
> -            print
> -            print 'app bytes: %f  client bytes %f' % (app_bytes_read, client_bytes_read)
> -            print 'Data cache hit ratio: %4.2f%%' % ratio
> +        print
> +        print '%18s %51s %20s %22s' % ("Network", "Direct I/O", "Direct %", "Total")
> +        print '%10s %12s %13s %14s %8s %12s %10s %9s %12s %12s' % \
> +		("Read", "Write", "Cache Reads", "Cache Hit %", "Read", "Write", \
> +		"Read", "Write",  "Read", "Write")
> +        print '%10.4fMB %10.4fMB %10.4fMB %10.2f%% %10.4fMB %10.4fMB %8.2f%% %8.2f%% %10.4fMB %10.4fMB' % ( \
> +		network_reads_MB, network_writes_MB, cached_reads_MB, ratio, direct_reads_MB, \
> +	 	direct_writes_MB, direct_read_ratio, direct_write_ratio, total_reads_MB, \
> +		total_writes_MB)
>
>     def __print_attr_cache_stats(self, sample_time):
>         """Print attribute cache efficiency stats
> @@ -390,6 +427,10 @@ class DeviceData:
>             self.__print_rpc_op_stats('READ', sample_time)
>             self.__print_rpc_op_stats('WRITE', sample_time)
>             self.__print_page_stats(sample_time)
> +        elif which == 4:
> +            self.__print_rpc_op_stats('READ', sample_time)
> +            self.__print_rpc_op_stats('WRITE', sample_time)
> +            self.__print_data_cache_stats()
>
> #
> # Functions
> @@ -487,6 +528,10 @@ def iostat_command(name):
>         if arg in ['-p', '--page']:
>             which = 3
>             continue
> +	
> +	if arg in ['--data']:
> +	    which = 4
> +	    continue
>
>         if arg == sys.argv[0]:
>             continue
> -- 
> 1.6.2.1

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] nfs-iostat.py: Print Data Cache Statistics
  2009-04-27 15:33                         ` Chuck Lever
@ 2009-04-30  1:57                           ` Kevin Constantine
  2009-04-30 14:43                             ` Chuck Lever
  0 siblings, 1 reply; 17+ messages in thread
From: Kevin Constantine @ 2009-04-30  1:57 UTC (permalink / raw)
  To: Chuck Lever; +Cc: Linux NFS Mailing List

Chuck Lever wrote:
> On Apr 24, 2009, at 11:16 PM, Kevin Constantine wrote:
>> Chuck Lever wrote:
>>> Hi Kevin-
>>> On Apr 21, 2009, at 9:42 PM, Kevin Constantine wrote:
>>>> I rolled the read cache, and write patches into a single patch below.
>>>>
>>>> add --data flag to print data cache statistics
>>>> print read cache stats from __print_data_cache_stats
>>>> print stats about bytes written by NFS
>>>>
>>>> Signed-off-by: Kevin Constantine 
>>>> <kevin.constantine-FfNkGbSheRGpB8w63BLUukEOCMrvLtNR@public.gmane.org>
>>>> ---
>>>> tools/nfs-iostat/nfs-iostat.py |   28 ++++++++++++++++++++++------
>>>> 1 files changed, 22 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/tools/nfs-iostat/nfs-iostat.py 
>>>> b/tools/nfs-iostat/nfs-iostat.py
>>>> index 9626d42..cb3da59 100644
>>>> --- a/tools/nfs-iostat/nfs-iostat.py
>>>> +++ b/tools/nfs-iostat/nfs-iostat.py
>>>> @@ -220,14 +220,22 @@ class DeviceData:
>>>>        """Print the data cache hit rate
>>>>        """
>>>>        nfs_stats = self.__nfs_data
>>>> -        app_bytes_read = float(nfs_stats['normalreadbytes'])
>>>> +        app_bytes_read = float(nfs_stats['normalreadbytes'] + 
>>>> nfs_stats['directreadbytes'])
>>> Again, I object to this particular change.  The reason for the 
>>> current computation is to retain precision and mitigate the chance of 
>>> an overflow.  Reversing the sense of this computation (by introducing 
>>> the addition above) increases the probability of overflow here.  
>>> Remember that over time, these numbers can grow exceptionally large.
>>> If you think you need this change, you do need to provide some 
>>> justification for the reorganization in the patch description.  Why 
>>> is the current code inadequate?
>>
>> I've changed my opinion on how the cache hit ratio should be computed 
>> and now agree with you.  I was thinking that direct reads should 
>> negatively impact the ratio, but that doesn't make a whole lot of sense.
>>
>>>>        if app_bytes_read != 0:
>>>> -            client_bytes_read = float(nfs_stats['serverreadbytes'] 
>>>> - nfs_stats['directreadbytes'])
>>>> -            ratio = ((app_bytes_read - client_bytes_read) * 100) / 
>>>> app_bytes_read
>>>> -
>>>> +            cached_read_bytes = float(app_bytes_read - 
>>>> float(nfs_stats['serverreadbytes']))
>>>> +            ratio = (cached_read_bytes * 100) / app_bytes_read
>>>>            print
>>>> -            print 'app bytes: %f  client bytes %f' % 
>>>> (app_bytes_read, client_bytes_read)
>>>> -            print 'Data cache hit ratio: %4.2f%%' % ratio
>>> In my opinion, displaying O_DIRECT reads right next to the cache hit 
>>> ratio is deceptive.  This is why I did not include the O_DIRECT 
>>> counts here.  They have no impact on the page cache hit ratio, as 
>>> O_DIRECT data is never cached in the kernel.
>>
>> I've changed the output considerably this time around.  I think it 
>> makes more sense, let me know what you think.
>>
>> nani:/vol/pamtest mounted on /data/pamtest:
>>
>>   op/s         rpc bklog
>> 3452.00           13.16
>> read:             ops/s            kB/s           kB/op          retrans         avg RTT (ms)    avg exe (ms)
>>                3452.000        111407.906       32.273        0 (0.0%)           4.633           8.628
>> write:            ops/s            kB/s           kB/op          retrans         avg RTT (ms)    avg exe (ms)
>>                  0.000           0.000           0.000        0 (0.0%)           0.000           0.000
>>
>>           Network                                          Direct I/O             Direct %                  Total
>>      Read        Write   Cache Reads    Cache Hit %     Read        Write       Read     Write         Read        Write
>>  107.8750MB     0.0000MB     4.1236MB       3.68%     0.0000MB     0.0000MB     0.00%     0.00%   111.9986MB     0.0000MB
> 
> This is a lot better, but I still find the placement of direct I/O 
> statistics near the cache hit ratio confusing.  Direct reads are a 
> totally different beast from normal (cached) reads.  

The goal of this output is to display what kind of I/O the client is
performing, and where that data is coming from (be it over the network,
or from cache).  I'm not particularly attached to keeping the direct
stats in there, I just figured that it's useful to know that, for
example, 100MB of the 112MB can be attributed to direct i/o.  I've
pulled the direct i/o stuff out.

The new output looks like:

            Network                                                                Total
       Read        Write    Cache Reads    Cache Hit %   Invalidations       Read        Write
  3585.4648MB     0.0000MB   606.5352MB         14.47%               1  4192.0000MB     0.0000MB

> I also think that 
> read caching and write caching behavior are different enough that 
> reporting them separately might be warranted.
> 

Are you suggesting that instead of printing out Network reads/writes,
there should be a way to print out just reads, and just writes?  I guess
I'm a little confused by this.

> It occurs to me that another important piece of information here is how 
> often the NFS client decides it needs to invalidate page cache pages.  
> It doesn't invalidate individual pages; if a file's mtime has changed, 
> the client purges all cached data for that file.  So, if the client has 
> read the data in a page once already, but the cache was invalidated, it 
> will read the same page again (whether data in the page has changed on 
> the server or not).
> 
> I think it would be informative to provide at least the page cache 
> invalidation number with the hit ratio, but it could also be interesting 
> to see how often the same page is read multiple times.  The kernel 
> doesn't keep that stat at the moment.
> 
> Distinguishing between "cache missed because client had never read the 
> page before" and "cache missed because client noticed the file changed" 
> might be important, but it might also be difficult to track.  For now,
> maybe the best approach is simply to report the page cache invalidation 
> stat along with the computed hit ratio?
> 

I've printed out the datainvalidate counter, which, as far as I can
tell, is the correct counter.

>>>> +            print '%10s %15s %15s %15s %7s' % ("Data Read:", "From Server", "From Cache", "Total", "Hit %")
>>>> +            print '%10s %13.4fMB %13.4fMB %13.4fMB %6.2f%%' % ("", \
>>>> +                        float(nfs_stats['serverreadbytes']) / 1024.0 / 1024.0, \
>>>> +                        cached_read_bytes / 1024.0 / 1024.0, \
>>>> +                        app_bytes_read / 1024.0 / 1024.0, \
>>>> +                        ratio)
>>> If app_bytes_read is zero, we should print a line here that displays 
>>> zeroes, as is done in other areas of this code.  I know that nfsstat 
>>> quells output lines containing all zeros, but other tools (like 
>>> iostat) do not.  What ends up happening is that when there is no I/O 
>>> activity, no lines are produced at all.  That is confusing to users 
>>> when nfs-iostat is in the "output one line every N seconds" mode.
>>
>> Agreed.  It now prints out zeroes.  There are instances where data has been
>> sent through the NFS stack and counted in the serverbytes counter, but
>> hasn't made it through the userspace stack and hasn't been counted, so we
>> get a negative cache hit ratio.  In these cases, I'm converting the cache
>> ratio and cache bytes to zero.
> 
> I don't see how this can happen.  All of the NFS I/O statistics are 
> counted in the kernel; there is no "user space stack".  "normal" and 
> "direct" are counted before the NFS client starts any network operation.
> 
> So, maybe there's a bug in the kernel accounting or reporting?  Or 
> perhaps this behavior is exposed only on SMP?
> 


As an example: I mounted a volume, and began a sequential read.  The
mountstats proc file bytes counter looks like:

486533058 0 0 0 487096320 0 118920 0

According to this, I've read more from the server than my application
has asked for.  Because we calculate cached_byes as application_bytes -
server_bytes, we end up with a negative value.
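
The arithmetic behind that negative value can be reproduced from the counters above (field order assumed to be the mountstats `bytes:` order that nfs-iostat.py parses: normal read/write, direct read/write, server read/write, pages read/written):

```python
# Raw "bytes:" counters from the example:
# 486533058 0 0 0 487096320 0 118920 0
normal_read = 486533058   # bytes the application requested
direct_read = 0
server_read = 487096320   # bytes actually read from the server

cached = normal_read - (server_read - direct_read)
print(cached)  # -563262: the client has fetched more from the server
               # (likely read-ahead) than the application has consumed
```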

>>>> +        bytes_written_by_nfs = float(nfs_stats['serverwritebytes'])
>>>> +    if bytes_written_by_nfs != 0:
>>>> +        print
>>>> +            print '%13s %12s' % ("Data Written:", "To Server")
>>>> +        print '%10s %13.4fMB' % ("", bytes_written_by_nfs / 1024.0 / 1024.0)
>>> As you are not dividing by bytes_written_by_nfs, there is no 
>>> particular reason to gate this output.  I don't see the need for the 
>>> extra variable here either.  You don't have a similar variable in the 
>>> read statistics, for example.
>>
>> I was thinking the gate was there to prevent zero values from printing.
>> This has been fixed.
>>
>>> You included application and direct read bytes in your read 
>>> statistics, but not here.  Why not?
>>
>> The new output prints Network reads/writes, direct reads/writes, total
>> reads/writes, cache reads, cache ratio and direct i/o ratios for 
>> reads/writes.
>>
>> New Patch below:

[...]

>> nfs_stats['directreadbytes']) / 1024.0 / 1024.0
>> +    total_writes_MB = float(nfs_stats['normalwritebytes'] + nfs_stats['directwritebytes']) / \
>> +            1024.0 / 1024.0
> 
> Just a nit: the indentation here is uneven.  Given this is Python, this 
> could be a little more than just a cosmetic problem over the long run.
> 

fixed

> I always find it a challenge to avoid using tabs in Python scripts.
> 

Out of curiosity, why were nfs-iostat.py and mountstats.py written in
Python instead of C like the rest of nfs-utils?

>> +
>> +        """ Bytes may have been counted by the NFS stack but haven't 
>> been counted by the
[...]

The next iteration of the patch below:


  add --data flag to print data cache statistics

--data flag prints Network reads/writes, cache reads, cache hit%, Cache invalidation count, Total reads/Writes

Signed-off-by: Kevin Constantine <kevin.constantine-FfNkGbSheRGpB8w63BLUukEOCMrvLtNR@public.gmane.org>
---
  tools/nfs-iostat/nfs-iostat.py |   43 ++++++++++++++++++++++++++++++++++++---
  1 files changed, 39 insertions(+), 4 deletions(-)

diff --git a/tools/nfs-iostat/nfs-iostat.py b/tools/nfs-iostat/nfs-iostat.py
index 9626d42..e8bb2fb 100644
--- a/tools/nfs-iostat/nfs-iostat.py
+++ b/tools/nfs-iostat/nfs-iostat.py
@@ -220,14 +220,41 @@ class DeviceData:
          """Print the data cache hit rate
          """
          nfs_stats = self.__nfs_data
+        client_bytes_read = 0
+        ratio = 0
+        direct_read_ratio = 0
+        direct_write_ratio = 0
+
          app_bytes_read = float(nfs_stats['normalreadbytes'])
+        client_bytes_read = float(nfs_stats['serverreadbytes'] - nfs_stats['directreadbytes'])
+        direct_read_bytes = float(nfs_stats['directreadbytes'])
+        direct_write_bytes = float(nfs_stats['directwritebytes'])
          if app_bytes_read != 0:
-            client_bytes_read = float(nfs_stats['serverreadbytes'] - nfs_stats['directreadbytes'])
              ratio = ((app_bytes_read - client_bytes_read) * 100) / app_bytes_read
+        if direct_read_bytes != 0:
+            direct_read_ratio = float(nfs_stats['directreadbytes']) * 100 / \
+				(app_bytes_read + direct_read_bytes)
+        if direct_write_bytes != 0:
+            direct_write_ratio = float(nfs_stats['directwritebytes']) * 100 / \
+				(direct_write_bytes + float(nfs_stats['normalwritebytes']))
+
+        network_reads_MB = float(nfs_stats['serverreadbytes']) / 1024.0 / 1024.0
+        network_writes_MB = float(nfs_stats['serverwritebytes']) / 1024.0 / 1024.0
+        cached_reads_MB = float(app_bytes_read - client_bytes_read) / 1024.0 / 1024.0
+        direct_reads_MB = float(nfs_stats['directreadbytes']) / 1024.0 / 1024.0
+        direct_writes_MB = float(nfs_stats['directwritebytes']) / 1024.0 / 1024.0
+        total_reads_MB = float(app_bytes_read + nfs_stats['directreadbytes']) / 1024.0 / 1024.0
+        total_writes_MB = float(nfs_stats['normalwritebytes'] + nfs_stats['directwritebytes']) / \
+			1024.0 / 1024.0

-            print
-            print 'app bytes: %f  client bytes %f' % (app_bytes_read, client_bytes_read)
-            print 'Data cache hit ratio: %4.2f%%' % ratio
+        print
+        print '%18s %68s' % ("Network", "Total")
+        print '%10s %12s %14s %14s %15s %10s %12s' % \
+		("Read", "Write", "Cache Reads", "Cache Hit %", \
+		"Invalidations", "Read", "Write")
+        print '%10.4fMB %10.4fMB %10.4fMB %13.2f%% %15d %10.4fMB %10.4fMB' % ( \
+		network_reads_MB, network_writes_MB, cached_reads_MB, ratio, \
+	 	nfs_stats['datainvalidates'], total_reads_MB, total_writes_MB)

      def __print_attr_cache_stats(self, sample_time):
          """Print attribute cache efficiency stats
@@ -390,6 +417,10 @@ class DeviceData:
              self.__print_rpc_op_stats('READ', sample_time)
              self.__print_rpc_op_stats('WRITE', sample_time)
              self.__print_page_stats(sample_time)
+        elif which == 4:
+            self.__print_rpc_op_stats('READ', sample_time)
+            self.__print_rpc_op_stats('WRITE', sample_time)
+            self.__print_data_cache_stats()

  #
  # Functions
@@ -487,6 +518,10 @@ def iostat_command(name):
          if arg in ['-p', '--page']:
              which = 3
              continue
+	
+        if arg in ['--data']:
+            which = 4
+            continue

          if arg == sys.argv[0]:
              continue
-- 
1.6.2.1

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH] nfs-iostat.py: Print Data Cache Statistics
  2009-04-30  1:57                           ` Kevin Constantine
@ 2009-04-30 14:43                             ` Chuck Lever
  0 siblings, 0 replies; 17+ messages in thread
From: Chuck Lever @ 2009-04-30 14:43 UTC (permalink / raw)
  To: Kevin Constantine; +Cc: Linux NFS Mailing List

Hi Kevin-

Thanks for sticking with this.

On Apr 29, 2009, at 9:57 PM, Kevin Constantine wrote:
> Chuck Lever wrote:
>> On Apr 24, 2009, at 11:16 PM, Kevin Constantine wrote:
>>> Chuck Lever wrote:
>>>> Hi Kevin-
>>>> On Apr 21, 2009, at 9:42 PM, Kevin Constantine wrote:
>>>>> I rolled the read cache, and write patches into a single patch  
>>>>> below.
>>>>>
>>>>> add --data flag to print data cache statistics
>>>>> print read cache stats from __print_data_cache_stats
>>>>> print stats about bytes written by NFS
>>>>>
>>>>> Signed-off-by: Kevin Constantine <kevin.constantine@disneyanimation.com>
>>>>> ---
>>>>> tools/nfs-iostat/nfs-iostat.py |   28 ++++++++++++++++++++++------
>>>>> 1 files changed, 22 insertions(+), 6 deletions(-)
>>>>>
>>>>> diff --git a/tools/nfs-iostat/nfs-iostat.py b/tools/nfs-iostat/nfs-iostat.py
>>>>> index 9626d42..cb3da59 100644
>>>>> --- a/tools/nfs-iostat/nfs-iostat.py
>>>>> +++ b/tools/nfs-iostat/nfs-iostat.py
>>>>> @@ -220,14 +220,22 @@ class DeviceData:
>>>>>       """Print the data cache hit rate
>>>>>       """
>>>>>       nfs_stats = self.__nfs_data
>>>>> -        app_bytes_read = float(nfs_stats['normalreadbytes'])
>>>>> +        app_bytes_read = float(nfs_stats['normalreadbytes'] + nfs_stats['directreadbytes'])
>>>> Again, I object to this particular change.  The reason for the  
>>>> current computation is to retain precision and mitigate the  
>>>> chance of an overflow.  Reversing the sense of this computation  
>>>> (by introducing the addition above) increases the probability of  
>>>> overflow here.  Remember that over time, these numbers can grow  
>>>> exceptionally large.
>>>> If you think you need this change, you do need to provide some  
>>>> justification for the reorganization in the patch description.   
>>>> Why is the current code inadequate?
>>>
>>> I've changed my opinion on how the cache hit ratio should be  
>>> computed and now agree with you.  I was thinking that direct reads  
>>> should negatively impact the ratio, but that doesn't make a whole  
>>> lot of sense.
>>>
>>>>>       if app_bytes_read != 0:
>>>>> -            client_bytes_read = float(nfs_stats['serverreadbytes'] - nfs_stats['directreadbytes'])
>>>>> -            ratio = ((app_bytes_read - client_bytes_read) * 100) / app_bytes_read
>>>>> -
>>>>> +            cached_read_bytes = float(app_bytes_read - float(nfs_stats['serverreadbytes']))
>>>>> +            ratio = (cached_read_bytes * 100) / app_bytes_read
>>>>>           print
>>>>> -            print 'app bytes: %f  client bytes %f' % (app_bytes_read, client_bytes_read)
>>>>> -            print 'Data cache hit ratio: %4.2f%%' % ratio
>>>> In my opinion, displaying O_DIRECT reads right next to the cache  
>>>> hit ratio is deceptive.  This is why I did not include the  
>>>> O_DIRECT counts here.  They have no impact on the page cache hit  
>>>> ratio, as O_DIRECT data is never cached in the kernel.
>>>
>>> I've changed the output considerably this time around.  I think it  
>>> makes more sense, let me know what you think.
>>>
>>> nani:/vol/pamtest mounted on /data/pamtest:
>>>
>>>  op/s         rpc bklog
>>> 3452.00           13.16
>>> read:             ops/s            kB/s           kB/op          retrans         avg RTT (ms)    avg exe (ms)
>>>               3452.000        111407.906       32.273        0 (0.0%)           4.633           8.628
>>> write:            ops/s            kB/s           kB/op          retrans         avg RTT (ms)    avg exe (ms)
>>>                 0.000           0.000           0.000        0 (0.0%)           0.000           0.000
>>>
>>>          Network                                          Direct I/O             Direct %                  Total
>>>     Read        Write   Cache Reads    Cache Hit %     Read        Write       Read     Write         Read        Write
>>> 107.8750MB     0.0000MB     4.1236MB       3.68%     0.0000MB     0.0000MB     0.00%     0.00%   111.9986MB     0.0000MB
>> This is a lot better, but I still find the placement of direct I/O  
>> statistics near the cache hit ratio confusing.  Direct reads are a  
>> totally different beast from normal (cached) reads.
>
> The goal of this output is to display what kind of I/O the client is
> performing, and where that data is coming from (be it over the  
> network,
> or from cache).  I'm not particularly attached to keeping the direct
> stats in there, I just figured that it's useful to know that, for
> example, 100MB of the 112MB can be attributed to direct i/o.  I've
> pulled the direct i/o stuff out.

I didn't mean for you to pull the direct I/O report out.  I think that  
information is useful, as you do.  But perhaps it could be displayed  
separately (like on a separate line).

> The new output looks like:
>
>             Network                                                                Total
>       Read        Write    Cache Reads    Cache Hit %   Invalidations       Read        Write
>  3585.4648MB     0.0000MB   606.5352MB         14.47%               1  4192.0000MB     0.0000MB
>
>> I also think that read caching and write caching behavior are  
>> different enough that reporting them separately might be warranted.
>
> Are you suggesting that instead of printing out Network reads/writes,
> there should be a way to print out just reads, and just writes?  I guess
> I'm a little confused by this.

Again, I was thinking of separate lines.  I'm just commenting on how  
the report is visually organized.

  o  We have normal reads, which go through the cache.  The  
interesting numbers are cache hit ratio, and data cache  
invalidations.  (And maybe read-ahead... see below).

  o  We have direct reads, which don't touch the page cache at all,  
and go right to the server

  o  We have normal writes, which are often combined by the client,  
and can sit in the page cache for a while.  An interesting statistic  
here would be how often the client is forced to flush the write cache  
via memory pressure, fsync(2) or close(2).

  o  We have direct writes, which go right to the server.

  o  We have the amount of data that actually appeared on the wire at  
some point.

Maybe you can report the raw I/O numbers (network, cached, direct  
reads and writes) on one line, and the page cache effectiveness data  
on a separate line?  Or perhaps reads on one line, and writes on  
another, since read caching is a different process than write  
gathering?  Or, perhaps you could provide a new command line option to  
display the direct I/O report?
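
One possible shape for such a split (purely illustrative; the helper name and column layout are assumptions, while the cache computation mirrors the existing subtract-first code):

```python
def print_split_io_report(s):
    # Reads on one line, writes on another; direct I/O gets its own
    # column instead of sitting next to the cache hit ratio.
    mb = lambda b: float(b) / 1024.0 / 1024.0
    cached = float(s['normalreadbytes']) - \
             float(s['serverreadbytes'] - s['directreadbytes'])
    hit = (cached * 100 / s['normalreadbytes']) if s['normalreadbytes'] else 0.0
    print('%-7s %12s %12s %12s %12s' %
          ('', 'Network MB', 'Direct MB', 'Cached MB', 'Cache Hit %'))
    print('read:  %10.4f   %10.4f   %10.4f   %10.2f%%' %
          (mb(s['serverreadbytes']), mb(s['directreadbytes']), mb(cached), hit))
    print('write: %10.4f   %10.4f' %
          (mb(s['serverwritebytes']), mb(s['directwritebytes'])))

print_split_io_report({'normalreadbytes': 2097152, 'serverreadbytes': 1048576,
                       'directreadbytes': 0, 'serverwritebytes': 0,
                       'directwritebytes': 0})
```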

>> It occurs to me that another important piece of information here is  
>> how often the NFS client decides it needs to invalidate page cache  
>> pages.  It doesn't invalidate individual pages; if a file's mtime  
>> has changed, the client purges all cached data for that file.  So,  
>> if the client has read the data in a page once already, but the  
>> cache was invalidated, it will read the same page again (whether  
>> data in the page has changed on the server or not).
>> I think it would be informative to provide at least the page cache  
>> invalidation number with the hit ratio, but it could also be  
>> interesting to see how often the same page is read multiple times.   
>> The kernel doesn't keep that stat at the moment.
>> Distinguishing between "cache missed because client had never read  
>> the page before" and "cache missed because client noticed the file  
>> changed" might be important, but it might also be difficult to  
>> track.  For now,
>> maybe the best approach is simply to report the page cache  
>> invalidation stat along with the computed hit ratio?
>
> I've printed out the datainvalidate counter, which, as far as I can
> tell, is the correct counter.

Yeah, I think that's correct.
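One way to pair the two numbers is a small helper like the following (a hypothetical sketch in modern Python, not the patch itself; it clamps negative values, which read-ahead can produce, to zero rather than reporting them):

```python
def data_cache_report(stats):
    """Compute a page-cache hit ratio and pair it with the page-cache
    invalidation count.  Sketch only: negative "cached" byte counts,
    which read-ahead can cause, are clamped to zero."""
    app = float(stats['normalreadbytes'])
    wire = float(stats['serverreadbytes'] - stats['directreadbytes'])
    cached = max(app - wire, 0.0)
    ratio = (cached * 100.0 / app) if app else 0.0
    return {
        'hit_pct': ratio,
        'invalidations': stats['datainvalidates'],
    }

sample = {'normalreadbytes': 1000, 'serverreadbytes': 600,
          'directreadbytes': 0, 'datainvalidates': 3}
report = data_cache_report(sample)  # hit_pct == 40.0
```

Printing the invalidation count next to the ratio lets a reader distinguish a cold cache from one being repeatedly purged by mtime changes.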

>>>>> +            print '%10s %15s %15s %15s %7s' % ("Data Read:",  
>>>>> "From Server", "From Cache", "Total", "Hit %")
>>>>> +            print '%10s %13.4fMB %13.4fMB %13.4fMB %6.2f%%' %  
>>>>> ("", \
>>>>> +                             
>>>>> float(nfs_stats['serverreadbytes']) / 1024.0 / 1024.0, \
>>>>> +                        cached_read_bytes / 1024.0 / 1024.0, \
>>>>> +                        app_bytes_read / 1024.0 / 1024.0, \
>>>>> +                        ratio)
>>>> If app_bytes_read is zero, we should print a line here that  
>>>> displays zeroes, as is done in other areas of this code.  I know  
>>>> that nfsstat quells output lines containing all zeros, but other  
>>>> tools (like iostat) do not.  What ends up happening is that when  
>>>> there is no I/O activity, no lines are produced at all.  That is  
>>>> confusing to users when nfs-iostat is in the "output one line  
>>>> every N seconds" mode.
>>>
>>> Agreed.  It now prints out zeroes.  There are instances where data  
>>> has been sent through the NFS stack and counted in the serverbytes  
>>> counter, but hasn't made it through the userspace stack and hasn't  
>>> been counted, so we get a negative cache hit ratio.  In these cases,  
>>> I'm converting the cache ratio and cache bytes to zero.
>> I don't see how this can happen.  All of the NFS I/O statistics are  
>> counted in the kernel; there is no "user space stack".  "normal"  
>> and "direct" are counted before the NFS client starts any network  
>> operation.
>> So, maybe there's a bug in the kernel accounting or reporting?  Or  
>> perhaps this behavior is exposed only on SMP?
>
>
> As an example: I mounted a volume, and began a sequential read.  The
> mountstats proc file bytes counter looks like:
>
> 486533058 0 0 0 487096320 0 118920 0
>
> According to this, I've read more from the server than my application
> has asked for.  Because we calculate cached_bytes as  
> application_bytes -
> server_bytes, we end up with a negative value.

Oh.  That's because of read-ahead.  So, _that_ would be a good reason  
to compute the read stats without subtraction!  :-)
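Kevin's counters make the problem concrete. Plugging the quoted values into the subtraction-based computation shows exactly how read-ahead drives it negative:

```python
# Counters from the sequential-read example quoted above.
normalreadbytes = 486533058   # bytes the application asked for
directreadbytes = 0
serverreadbytes = 487096320   # bytes the client fetched from the server

# nfs-iostat's subtraction: "cached" = what the application read minus
# what came over the wire.  Read-ahead fetches pages the application
# has not asked for yet, so the wire count can exceed the app count.
client_bytes_read = serverreadbytes - directreadbytes
cached_read_bytes = normalreadbytes - client_bytes_read   # -563262

ratio = cached_read_bytes * 100.0 / normalreadbytes       # negative hit %
```

So the "negative cache hits" are simply read-ahead bytes sitting in the page cache that the application has not consumed yet.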

Actually, read-ahead effectiveness is about as interesting as the page  
cache hit ratio.  If the client is reading too much (i.e. not using a  
lot of what was read from the server), that can have an impact on  
server scalability.

We have counters that count the number of bytes that were read and  
written by the VFS (readpages and writepages).  This would be the  
number of bytes that were requested by normal cached reads and writes  
that were not already available in the page cache.  That gets rid of  
that subtraction, but perhaps doesn't get us a more accurate cache hit  
ratio.

I'll have to think about this.  We need additional counters in the  
kernel to sort this, but given the way the VFS is architected I don't  
immediately see an easy direct way to acquire this data.
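In the meantime, read-ahead overshoot can at least be estimated from the existing byte counters. A hypothetical helper (not part of the patch under review) that reports what fraction of wire traffic the application never, or not yet, consumed:

```python
def readahead_overshoot(stats):
    """Estimate the percentage of wire read traffic that was read-ahead
    the application has not (yet) consumed.  Hypothetical sketch; a real
    answer needs the extra kernel counters discussed above."""
    wire = float(stats['serverreadbytes'] - stats['directreadbytes'])
    app = float(stats['normalreadbytes'])
    if wire == 0:
        return 0.0
    return max(wire - app, 0.0) * 100.0 / wire

s = {'serverreadbytes': 487096320, 'directreadbytes': 0,
     'normalreadbytes': 486533058}
# For the sequential-read example above this comes out near 0.12%:
# harmless for sequential workloads, but a large value would suggest
# read-ahead is fetching data the server need not have served.
```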

>>>>> +        bytes_written_by_nfs =  
>>>>> float(nfs_stats['serverwritebytes'])
>>>>> +    if bytes_written_by_nfs != 0:
>>>>> +        print
>>>>> +            print '%13s %12s' % ("Data Written:", "To Server")
>>>>> +        print '%10s %13.4fMB' % ("", bytes_written_by_nfs /  
>>>>> 1024.0 / 1024.0)
>>>> As you are not dividing by bytes_written_by_nfs, there is no  
>>>> particular reason to gate this output.  I don't see the need for  
>>>> the extra variable here either.  You don't have a similar  
>>>> variable in the read statistics, for example.
>>>
>>> I was thinking the gate was there to prevent non-zero values from  
>>> printing.
>>> This has been fixed.
>>>
>>>> You included application and direct read bytes in your read  
>>>> statistics, but not here.  Why not?
>>>
>>> The new output prints Network reads/writes, direct reads/writes,  
>>> total
>>> reads/writes, cache reads, cache ratio and direct i/o ratios for  
>>> reads/writes.
>>>
>>> New Patch below:
>
> [...]
>
>>> nfs_stats['directreadbytes']) / 1024.0 / 1024.0
>>> +    total_writes_MB = float(nfs_stats['normalwritebytes'] +  
>>> nfs_stats['directwritebytes']) / \
>>> +            1024.0 / 1024.0
>> Just a nit: the indentation here is uneven.  Given this is Python,  
>> this could be a little more than just a cosmetic problem over the  
>> long run.
>
> fixed
>
>> I always find it a challenge to avoid using tabs in Python scripts.
>
> Out of curiosity, why were nfs-iostats.py and mountstats.py written in
> python instead of c like the rest of nfs-utils?

I wrote them as a quick prototype for testing while I was developing  
the kernel parts.  Since Red Hat has been providing some system  
utilities in Python for a while, we thought we might keep the  
prototypes and develop them into full-fledged tools for users.  We've  
also considered re-implementing these in C.

No one has had time in a couple of years to look into this, though.  
It's still a work in progress.

>>> +
>>> +        """ Bytes may have been counted by the NFS stack but  
>>> haven't been counted by the
> [...]
>
> The next iteration of the patch below:
>
>
> add --data flag to print data cache statistics
>
> --data flag prints Network reads/writes, cache reads, cache hit%,  
> Cache invalidation count, Total reads/Writes
>
> Signed-off-by: Kevin Constantine <kevin.constantine@disneyanimation.com>
> ---
> tools/nfs-iostat/nfs-iostat.py |   43 ++++++++++++++++++++++++++++++++++++---
> 1 files changed, 39 insertions(+), 4 deletions(-)
>
> diff --git a/tools/nfs-iostat/nfs-iostat.py b/tools/nfs-iostat/nfs-iostat.py
> index 9626d42..e8bb2fb 100644
> --- a/tools/nfs-iostat/nfs-iostat.py
> +++ b/tools/nfs-iostat/nfs-iostat.py
> @@ -220,14 +220,41 @@ class DeviceData:
>         """Print the data cache hit rate
>         """
>         nfs_stats = self.__nfs_data
> +        client_bytes_read = 0
> +        ratio = 0
> +        direct_read_ratio = 0
> +        direct_write_ratio = 0
> +
>         app_bytes_read = float(nfs_stats['normalreadbytes'])
> +        client_bytes_read = float(nfs_stats['serverreadbytes'] - nfs_stats['directreadbytes'])
> +        direct_read_bytes = float(nfs_stats['directreadbytes'])
> +        direct_write_bytes = float(nfs_stats['directwritebytes'])
>         if app_bytes_read != 0:
> -            client_bytes_read = float(nfs_stats['serverreadbytes'] - nfs_stats['directreadbytes'])
>             ratio = ((app_bytes_read - client_bytes_read) * 100) / app_bytes_read
> +        if direct_read_bytes != 0:
> +            direct_read_ratio = float(nfs_stats['directreadbytes']) * 100 / \
> +				(app_bytes_read + direct_read_bytes)
> +        if direct_write_bytes != 0:
> +            direct_write_ratio = float(nfs_stats['directwritebytes']) * 100 / \
> +				(direct_write_bytes + float(nfs_stats['normalwritebytes']))
> +
> +        network_reads_MB = float(nfs_stats['serverreadbytes']) / 1024.0 / 1024.0
> +        network_writes_MB = float(nfs_stats['serverwritebytes']) / 1024.0 / 1024.0
> +        cached_reads_MB = float(app_bytes_read - client_bytes_read) / 1024.0 / 1024.0
> +        direct_reads_MB = float(nfs_stats['directreadbytes']) / 1024.0 / 1024.0
> +        direct_writes_MB = float(nfs_stats['directwritebytes']) / 1024.0 / 1024.0
> +        total_reads_MB = float(app_bytes_read + nfs_stats['directreadbytes']) / 1024.0 / 1024.0
> +        total_writes_MB = float(nfs_stats['normalwritebytes'] + nfs_stats['directwritebytes']) / \
> +			1024.0 / 1024.0
>
> -            print
> -            print 'app bytes: %f  client bytes %f' % (app_bytes_read, client_bytes_read)
> -            print 'Data cache hit ratio: %4.2f%%' % ratio
> +        print
> +        print '%18s %68s' % ("Network", "Total")
> +        print '%10s %12s %14s %14s %15s %10s %12s' % \
> +		("Read", "Write", "Cache Reads", "Cache Hit %", \
> +		"Invalidations", "Read", "Write")
> +        print '%10.4fMB %10.4fMB %10.4fMB %13.2f%% %15d %10.4fMB %10.4fMB' % ( \
> +		network_reads_MB, network_writes_MB, cached_reads_MB, ratio, \
> +	 	nfs_stats['datainvalidates'], total_reads_MB, total_writes_MB)
>
>     def __print_attr_cache_stats(self, sample_time):
>         """Print attribute cache efficiency stats
> @@ -390,6 +417,10 @@ class DeviceData:
>             self.__print_rpc_op_stats('READ', sample_time)
>             self.__print_rpc_op_stats('WRITE', sample_time)
>             self.__print_page_stats(sample_time)
> +        elif which == 4:
> +            self.__print_rpc_op_stats('READ', sample_time)
> +            self.__print_rpc_op_stats('WRITE', sample_time)
> +            self.__print_data_cache_stats()
>
> #
> # Functions
> @@ -487,6 +518,10 @@ def iostat_command(name):
>         if arg in ['-p', '--page']:
>             which = 3
>             continue
> +	
> +        if arg in ['--data']:
> +            which = 4
> +            continue
>
>         if arg == sys.argv[0]:
>             continue
> -- 
> 1.6.2.1

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com


Thread overview: 17+ messages
2009-04-21  2:03 Add data cache statistics to nfs-iostat.py Kevin Constantine
     [not found] ` <1240279414-30528-1-git-send-email-kevin.constantine-FfNkGbSheRGpB8w63BLUukEOCMrvLtNR@public.gmane.org>
2009-04-21  2:03   ` [PATCH 1/2] nfs-iostat.py: Print Data Cache Statistics Kevin Constantine
     [not found]     ` <1240279414-30528-2-git-send-email-kevin.constantine-FfNkGbSheRGpB8w63BLUukEOCMrvLtNR@public.gmane.org>
2009-04-21  2:03       ` [PATCH 2/2] nfs-iostat.py: Added bytes written output Kevin Constantine
     [not found]         ` <1240279414-30528-3-git-send-email-kevin.constantine-FfNkGbSheRGpB8w63BLUukEOCMrvLtNR@public.gmane.org>
2009-04-21 14:26           ` Chuck Lever
2009-04-21 17:57             ` Kevin Constantine
     [not found]               ` <49EE0901.2040704-P5ys19MLBK/QT0dZR+AlfA@public.gmane.org>
2009-04-21 19:00                 ` Chuck Lever
2009-04-21 20:39                   ` Kevin Constantine
2009-04-21 20:46                     ` Chuck Lever
2009-04-21 14:26       ` [PATCH 1/2] nfs-iostat.py: Print Data Cache Statistics Chuck Lever
2009-04-21 17:54         ` Kevin Constantine
     [not found]           ` <49EE083C.6010105-P5ys19MLBK/QT0dZR+AlfA@public.gmane.org>
2009-04-21 18:58             ` Chuck Lever
2009-04-22  1:42               ` [PATCH] " Kevin Constantine
     [not found]                 ` <1240364531-11499-1-git-send-email-kevin.constantine-FfNkGbSheRGpB8w63BLUukEOCMrvLtNR@public.gmane.org>
2009-04-22 18:38                   ` Chuck Lever
2009-04-25  3:16                     ` Kevin Constantine
     [not found]                       ` <49F28096.9060300-P5ys19MLBK/QT0dZR+AlfA@public.gmane.org>
2009-04-27 15:33                         ` Chuck Lever
2009-04-30  1:57                           ` Kevin Constantine
2009-04-30 14:43                             ` Chuck Lever
