From: Kevin Constantine
Subject: Re: [PATCH] nfs-iostat.py: Print Data Cache Statistics
Date: Wed, 29 Apr 2009 18:57:19 -0700
Message-ID: <49F9057F.40105@disney.com>
In-Reply-To: <50DACB0A-59C3-4BCF-A470-AAB5525CEF26@oracle.com>
References: <8EF2000D-C4D6-4AC2-B37C-221594A515B4@oracle.com>
 <1240364531-11499-1-git-send-email-kevin.constantine@disneyanimation.com>
 <49F28096.9060300@disney.com>
 <50DACB0A-59C3-4BCF-A470-AAB5525CEF26@oracle.com>
To: Chuck Lever
Cc: Linux NFS Mailing List

Chuck Lever wrote:
> On Apr 24, 2009, at 11:16 PM, Kevin Constantine wrote:
>> Chuck Lever wrote:
>>> Hi Kevin-
>>>
>>> On Apr 21, 2009, at 9:42 PM, Kevin Constantine wrote:
>>>> I rolled the read cache and write patches into a single patch below.
>>>>
>>>> add --data flag to print data cache statistics
>>>> print read cache stats from __print_data_cache_stats
>>>> print stats about bytes written by NFS
>>>>
>>>> Signed-off-by: Kevin Constantine
>>>>
>>>> ---
>>>>  tools/nfs-iostat/nfs-iostat.py |   28 ++++++++++++++++++++++------
>>>>  1 files changed, 22 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/tools/nfs-iostat/nfs-iostat.py b/tools/nfs-iostat/nfs-iostat.py
>>>> index 9626d42..cb3da59 100644
>>>> --- a/tools/nfs-iostat/nfs-iostat.py
>>>> +++ b/tools/nfs-iostat/nfs-iostat.py
>>>> @@ -220,14 +220,22 @@ class DeviceData:
>>>>          """Print the data cache hit rate
>>>>          """
>>>>          nfs_stats = self.__nfs_data
>>>> -        app_bytes_read = float(nfs_stats['normalreadbytes'])
>>>> +        app_bytes_read = float(nfs_stats['normalreadbytes'] + nfs_stats['directreadbytes'])
>>>
>>> Again, I object to this particular change.  The reason for the
>>> current computation is to retain precision and mitigate the chance
>>> of an overflow.  Reversing the sense of this computation (by
>>> introducing the addition above) increases the probability of
>>> overflow here.  Remember that over time, these numbers can grow
>>> exceptionally large.  If you think you need this change, you do
>>> need to provide some justification for the reorganization in the
>>> patch description.  Why is the current code inadequate?
>>
>> I've changed my opinion on how the cache hit ratio should be computed
>> and now agree with you.  I was thinking that direct reads should
>> negatively impact the ratio, but that doesn't make a whole lot of
>> sense.
>>
>>>>          if app_bytes_read != 0:
>>>> -            client_bytes_read = float(nfs_stats['serverreadbytes'] - nfs_stats['directreadbytes'])
>>>> -            ratio = ((app_bytes_read - client_bytes_read) * 100) / app_bytes_read
>>>> -
>>>> +            cached_read_bytes = float(app_bytes_read - float(nfs_stats['serverreadbytes']))
>>>> +            ratio = (cached_read_bytes * 100) / app_bytes_read
>>>>              print
>>>> -            print 'app bytes: %f  client bytes %f' % (app_bytes_read, client_bytes_read)
>>>> -            print 'Data cache hit ratio: %4.2f%%' % ratio
>>>
>>> In my opinion, displaying O_DIRECT reads right next to the cache hit
>>> ratio is deceptive.  This is why I did not include the O_DIRECT
>>> counts here.  They have no impact on the page cache hit ratio, as
>>> O_DIRECT data is never cached in the kernel.
>>
>> I've changed the output considerably this time around.  I think it
>> makes more sense, let me know what you think.
>>
>> nani:/vol/pamtest mounted on /data/pamtest:
>>
>>    op/s         rpc bklog
>> 3452.00            13.16
>> read:             ops/s           kB/s          kB/op        retrans    avg RTT (ms)    avg exe (ms)
>>                3452.000     111407.906         32.273       0 (0.0%)          4.633          8.628
>> write:            ops/s           kB/s          kB/op        retrans    avg RTT (ms)    avg exe (ms)
>>                   0.000          0.000          0.000       0 (0.0%)          0.000          0.000
>>
>>       Network                                       Direct I/O           Direct %              Total
>>    Read        Write    Cache Reads  Cache Hit %   Read      Write     Read    Write      Read         Write
>> 107.8750MB  0.0000MB     4.1236MB       3.68%    0.0000MB  0.0000MB    0.00%   0.00%  111.9986MB    0.0000MB
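(To read that last table: there is no direct I/O in this run, so "Total"
is just Network + Cache Reads, 107.8750MB + 4.1236MB = 111.9986MB, and
"Cache Hit %" is Cache Reads over Total, 4.1236MB / 111.9986MB ~= 3.68%.)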
> This is a lot better, but I still find the placement of direct I/O
> statistics near the cache hit ratio confusing.  Direct reads are a
> totally different beast from normal (cached) reads.

The goal of this output is to display what kind of I/O the client is
performing, and where that data is coming from (be it over the network,
or from cache).  I'm not particularly attached to keeping the direct
stats in there, I just figured that it's useful to know that, for
example, 100MB of the 112MB can be attributed to direct i/o.

I've pulled the direct i/o stuff out.  The new output looks like:

        Network                                                        Total
    Read         Write     Cache Reads  Cache Hit %  Invalidations     Read         Write
3585.4648MB    0.0000MB    606.5352MB      14.47%          1       4192.0000MB    0.0000MB

> I also think that read caching and write caching behavior are
> different enough that reporting them separately might be warranted.

Are you suggesting that instead of printing out Network Reads/writes,
there is a way to print out just reads, and just writes?  I guess I'm
a little confused by this.

> It occurs to me that another important piece of information here is
> how often the NFS client decides it needs to invalidate page cache
> pages.  It doesn't invalidate individual pages; if a file's mtime has
> changed, the client purges all cached data for that file.  So, if the
> client has read the data in a page once already, but the cache was
> invalidated, it will read the same page again (whether data in the
> page has changed on the server or not).
>
> I think it would be informative to provide at least the page cache
> invalidation number with the hit ratio, but it could also be
> interesting to see how often the same page is read multiple times.
> The kernel doesn't keep that stat at the moment.
>
> Distinguishing between "cache missed because client had never read
> the page before" and "cache missed because client noticed the file
> changed" might be important, but it might also be difficult to track.
> For now, maybe the best approach is simply to report the page cache
> invalidation stat along with the computed hit ratio?

I've printed out the datainvalidate counter, which, as far as I can
tell, is the correct counter.
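For reference, 'datainvalidates' is the third counter on the "events:"
line in /proc/self/mountstats, right after inoderevalidates and
dentryrevalidates, so it is easy to sanity-check by hand.  A rough
sketch (not part of the patch):

    # print the data invalidation count for each mount with event counters
    for line in open('/proc/self/mountstats'):
        words = line.split()
        if len(words) >= 4 and words[0] == 'events:':
            # events: <inoderevalidates> <dentryrevalidates> <datainvalidates> ...
            print 'data invalidations:', words[3]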
>>>> +        print '%10s %15s %15s %15s %7s' % ("Data Read:", "From Server", "From Cache", "Total", "Hit %")
>>>> +        print '%10s %13.4fMB %13.4fMB %13.4fMB %6.2f%%' % ("", \
>>>> +            float(nfs_stats['serverreadbytes']) / 1024.0 / 1024.0, \
>>>> +            cached_read_bytes / 1024.0 / 1024.0, \
>>>> +            app_bytes_read / 1024.0 / 1024.0, \
>>>> +            ratio)
>>>
>>> If app_bytes_read is zero, we should print a line here that displays
>>> zeroes, as is done in other areas of this code.  I know that nfsstat
>>> quells output lines containing all zeros, but other tools (like
>>> iostat) do not.  What ends up happening is that when there is no I/O
>>> activity, no lines are produced at all.  That is confusing to users
>>> when nfs-iostat is in the "output one line every N seconds" mode.
>>
>> Agreed.  It now prints out zeroes.  There are instances where data
>> has been sent through the NFS stack and counted in the serverbytes
>> counter, but hasn't made it through the userspace stack and hasn't
>> been counted, so we get a negative cache hit ratio.  In these cases,
>> I'm converting the cache ratio and cache bytes to zero.
>
> I don't see how this can happen.  All of the NFS I/O statistics are
> counted in the kernel; there is no "user space stack".  "normal" and
> "direct" are counted before the NFS client starts any network
> operation.
>
> So, maybe there's a bug in the kernel accounting or reporting?  Or
> perhaps this behavior is exposed only on SMP?

As an example: I mounted a volume, and began a sequential read.  The
mountstats proc file bytes counter looks like:

486533058 0 0 0 487096320 0 118920 0

According to this, I've read more from the server than my application
has asked for.  Because we calculate cached_bytes as application_bytes
minus server_bytes, we end up with a negative value.
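Decoding that line with the same field order nfs-iostat.py itself uses
when it parses "bytes:" (a sketch; the keys are the counter names
DeviceData stores):

    # map the raw "bytes:" fields to the counter names used in the tool
    fields = ['normalreadbytes', 'normalwritebytes',   # cached application I/O
              'directreadbytes', 'directwritebytes',   # O_DIRECT application I/O
              'serverreadbytes', 'serverwritebytes',   # bytes that crossed the wire
              'readpages', 'writepages']               # whole-page I/O counts
    sample = '486533058 0 0 0 487096320 0 118920 0'
    stats = dict(zip(fields, [int(n) for n in sample.split()]))
    # serverreadbytes (487096320) > normalreadbytes (486533058), so
    # application_bytes - server_bytes goes negative here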
>>>> +        bytes_written_by_nfs = float(nfs_stats['serverwritebytes'])
>>>> +        if bytes_written_by_nfs != 0:
>>>> +            print
>>>> +            print '%13s %12s' % ("Data Written:", "To Server")
>>>> +            print '%10s %13.4fMB' % ("", bytes_written_by_nfs / 1024.0 / 1024.0)
>>>
>>> As you are not dividing by bytes_written_by_nfs, there is no
>>> particular reason to gate this output.  I don't see the need for
>>> the extra variable here either.  You don't have a similar variable
>>> in the read statistics, for example.
>>
>> I was thinking the gate was there to prevent zero values from
>> printing.  This has been fixed.
>>
>>> You included application and direct read bytes in your read
>>> statistics, but not here.  Why not?
>>
>> The new output prints Network reads/writes, direct reads/writes,
>> total reads/writes, cache reads, cache ratio and direct i/o ratios
>> for reads/writes.
>>
>> New Patch below:
>> [...]
>> +        total_reads_MB = float(app_bytes_read + nfs_stats['directreadbytes']) / 1024.0 / 1024.0
>> +        total_writes_MB = float(nfs_stats['normalwritebytes'] + nfs_stats['directwritebytes']) / \
>> +                1024.0 / 1024.0

> Just a nit: the indentation here is uneven.  Given this is Python,
> this could be a little more than just a cosmetic problem over the
> long run.

fixed

> I always find it a challenge to avoid using tabs in Python scripts.

Out of curiosity, why were nfs-iostat.py and mountstats.py written in
Python instead of C like the rest of nfs-utils?

>> +
>> +        """ Bytes may have been counted by the NFS stack but haven't
>> +        been counted by the [...]

The next iteration of the patch below:

add --data flag to print data cache statistics
--data flag prints Network reads/writes, cache reads, cache hit%,
Cache invalidation count, Total reads/Writes

Signed-off-by: Kevin Constantine

---
 tools/nfs-iostat/nfs-iostat.py |   43 ++++++++++++++++++++++++++++++++++++---
 1 files changed, 39 insertions(+), 4 deletions(-)

diff --git a/tools/nfs-iostat/nfs-iostat.py b/tools/nfs-iostat/nfs-iostat.py
index 9626d42..e8bb2fb 100644
--- a/tools/nfs-iostat/nfs-iostat.py
+++ b/tools/nfs-iostat/nfs-iostat.py
@@ -220,14 +220,41 @@ class DeviceData:
         """Print the data cache hit rate
         """
         nfs_stats = self.__nfs_data
+        client_bytes_read = 0
+        ratio = 0
+        direct_read_ratio = 0
+        direct_write_ratio = 0
+
         app_bytes_read = float(nfs_stats['normalreadbytes'])
+        client_bytes_read = float(nfs_stats['serverreadbytes'] - nfs_stats['directreadbytes'])
+        direct_read_bytes = float(nfs_stats['directreadbytes'])
+        direct_write_bytes = float(nfs_stats['directwritebytes'])
         if app_bytes_read != 0:
-            client_bytes_read = float(nfs_stats['serverreadbytes'] - nfs_stats['directreadbytes'])
             ratio = ((app_bytes_read - client_bytes_read) * 100) / app_bytes_read
+        if direct_read_bytes != 0:
+            direct_read_ratio = float(nfs_stats['directreadbytes']) * 100 / \
+                (app_bytes_read + direct_read_bytes)
+        if direct_write_bytes != 0:
+            direct_write_ratio = float(nfs_stats['directwritebytes']) * 100 / \
+                (direct_write_bytes + float(nfs_stats['normalwritebytes']))
+
+        network_reads_MB = float(nfs_stats['serverreadbytes']) / 1024.0 / 1024.0
+        network_writes_MB = float(nfs_stats['serverwritebytes']) / 1024.0 / 1024.0
+        cached_reads_MB = float(app_bytes_read - client_bytes_read) / 1024.0 / 1024.0
+        direct_reads_MB = float(nfs_stats['directreadbytes']) / 1024.0 / 1024.0
+        direct_writes_MB = float(nfs_stats['directwritebytes']) / 1024.0 / 1024.0
+        total_reads_MB = float(app_bytes_read + nfs_stats['directreadbytes']) / 1024.0 / 1024.0
+        total_writes_MB = float(nfs_stats['normalwritebytes'] + nfs_stats['directwritebytes']) / \
+            1024.0 / 1024.0

-        print
-        print 'app bytes: %f  client bytes %f' % (app_bytes_read, client_bytes_read)
-        print 'Data cache hit ratio: %4.2f%%' % ratio
+        print
+        print '%18s %68s' % ("Network", "Total")
+        print '%10s %12s %14s %14s %15s %10s %12s' % \
+            ("Read", "Write", "Cache Reads", "Cache Hit %", \
+            "Invalidations", "Read", "Write")
+        print '%10.4fMB %10.4fMB %10.4fMB %13.2f%% %15d %10.4fMB %10.4fMB' % ( \
+            network_reads_MB, network_writes_MB, cached_reads_MB, ratio, \
+            nfs_stats['datainvalidates'], total_reads_MB, total_writes_MB)

     def __print_attr_cache_stats(self, sample_time):
         """Print attribute cache efficiency stats
@@ -390,6 +417,10 @@ class DeviceData:
             self.__print_rpc_op_stats('READ', sample_time)
             self.__print_rpc_op_stats('WRITE', sample_time)
             self.__print_page_stats(sample_time)
+        elif which == 4:
+            self.__print_rpc_op_stats('READ', sample_time)
+            self.__print_rpc_op_stats('WRITE', sample_time)
+            self.__print_data_cache_stats()

 #
 # Functions
@@ -487,6 +518,10 @@ def iostat_command(name):
         if arg in ['-p', '--page']:
             which = 3
             continue
+
+        if arg in ['--data']:
+            which = 4
+            continue

         if arg == sys.argv[0]:
             continue
-- 
1.6.2.1
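P.S.  The new flag is used like the existing -p/--page option; with the
patch applied, something along the lines of

    $ nfs-iostat.py --data 2 /data/pamtest

should print the READ/WRITE op stats plus the cache table above for
that mount every two seconds.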