All of lore.kernel.org
 help / color / mirror / Atom feed
* Call graph of nfsd_open
@ 2015-12-03 18:31 Chuck Lever
  2015-12-03 19:53 ` Trond Myklebust
  2015-12-04  0:01 ` Jeff Layton
  0 siblings, 2 replies; 7+ messages in thread
From: Chuck Lever @ 2015-12-03 18:31 UTC (permalink / raw)
  To: Linux NFS Mailing List

Hi-

I’m looking into NFS WRITE latency on the Linux NFS server.
With a high speed network fabric and ultra-fast storage (think
persistent memory) we can get the per-WRITE latency under 50us
(as observed on the wire at the server).

One source of the remaining latency appears to be nfsd_open.
Here is an example of a call graph captured during a simple
iozone 4KB direct write workload (v4.4-rc3):

 2)               |            nfsd_open [nfsd]() {
 2)               |              fh_verify [nfsd]() {
 2)               |                rqst_exp_find [nfsd]() {
 2)               |                  exp_find [nfsd]() {
 2)               |                    exp_find_key [nfsd]() {
 2)   0.043 us    |                      svc_expkey_hash [nfsd]();
 2)               |                      sunrpc_cache_lookup [sunrpc]() {
 2)   0.027 us    |                        _raw_read_lock();
 2)               |                        getboottime64() {
 2)   0.022 us    |                          ns_to_timespec();
 2)   0.210 us    |                        }
 2)   0.021 us    |                        get_seconds();
 2)   0.854 us    |                      }
 2)               |                      cache_check [sunrpc]() {
 2)               |                        getboottime64() {
 2)   0.021 us    |                          ns_to_timespec();
 2)   0.206 us    |                        }
 2)   0.022 us    |                        get_seconds();
 2)   0.596 us    |                      }
 2)   2.050 us    |                    }
 2)               |                    exp_get_by_name [nfsd]() {
 2)               |                      svc_export_lookup [nfsd]() {
 2)               |                        sunrpc_cache_lookup [sunrpc]() {
 2)   0.022 us    |                          _raw_read_lock();
 2)   0.024 us    |                          svc_export_match [nfsd]();
 2)               |                          getboottime64() {
 2)   0.023 us    |                            ns_to_timespec();
 2)   0.210 us    |                          }
 2)   0.022 us    |                          get_seconds();
 2)   0.997 us    |                        }
 2)   1.181 us    |                      }
 2)               |                      cache_check [sunrpc]() {
 2)               |                        getboottime64() {
 2)   0.022 us    |                          ns_to_timespec();
 2)   0.210 us    |                        }
 2)   0.022 us    |                        get_seconds();
 2)   0.592 us    |                      }
 2)   2.147 us    |                    }
 2)   4.560 us    |                  }
 2)   4.768 us    |                }
 2)               |                prepare_creds() {
 2)               |                  kmem_cache_alloc() {
 2)   0.021 us    |                    _cond_resched();
 2)   0.233 us    |                  }
 2)               |                  security_prepare_creds() {
 2)               |                    selinux_cred_prepare() {
 2)               |                      kmemdup() {
 2)               |                        __kmalloc_track_caller() {
 2)   0.022 us    |                          kmalloc_slab();
 2)   0.023 us    |                          _cond_resched();
 2)   0.448 us    |                        }
 2)   0.640 us    |                      }
 2)   0.838 us    |                    }
 2)   1.129 us    |                  }
 2)   1.808 us    |                }
 2)   0.022 us    |                override_creds();
 2)               |                exportfs_decode_fh() {
 2)               |                  shmem_fh_to_dentry() {
 2)               |                    ilookup5() {
 2)               |                      ilookup5_nowait() {
 2)   0.026 us    |                        _raw_spin_lock();
 2)               |                        find_inode() {
 2)   0.031 us    |                          shmem_match();
 2)   0.029 us    |                          _raw_spin_lock();
 2)   0.429 us    |                        }
 2)   0.826 us    |                      }
 2)   0.022 us    |                      _cond_resched();
 2)   0.021 us    |                      _cond_resched();
 2)   1.417 us    |                    }
 2)               |                    d_find_alias() {
 2)   0.022 us    |                      _raw_spin_lock();
 2)   0.027 us    |                      _raw_spin_lock();
 2)   0.423 us    |                    }
 2)   0.035 us    |                    iput();
 2)   2.410 us    |                  }
 2)               |                  find_acceptable_alias() {
 2)   0.026 us    |                    nfsd_acceptable [nfsd]();
 2)   0.220 us    |                  }
 2)   3.032 us    |                }
 2)               |                nfsd_setuser_and_check_port [nfsd]() {
 2)   0.023 us    |                  nfsexp_flags [nfsd]();
 2)               |                  nfsd_setuser [nfsd]() {
 2)               |                    revert_creds() {
 2)               |                      __put_cred() {
 2)               |                        call_rcu_sched() {
 2)               |                          __call_rcu() {
 2)   0.049 us    |                            __call_rcu_nocb_enqueue();
 2)   0.265 us    |                          }
 2)   0.446 us    |                        }
 2)   0.661 us    |                      }
 2)   0.854 us    |                    }
 2)               |                    prepare_creds() {
 2)               |                      kmem_cache_alloc() {
 2)   0.022 us    |                        _cond_resched();
 2)   0.219 us    |                      }
 2)               |                      security_prepare_creds() {
 2)               |                        selinux_cred_prepare() {
 2)               |                          kmemdup() {
 2)               |                            __kmalloc_track_caller() {
 2)   0.024 us    |                              kmalloc_slab();
 2)   0.022 us    |                              _cond_resched();
 2)   0.412 us    |                            }
 2)   0.615 us    |                          }
 2)   0.801 us    |                        }
 2)   0.993 us    |                      }
 2)   1.651 us    |                    }
 2)   0.033 us    |                    set_groups();
 2)   0.023 us    |                    override_creds();
 2)   3.323 us    |                  }
 2)   0.029 us    |                  nfserrno [nfsd]();
 2)   3.919 us    |                }
 2)   0.025 us    |                check_nfsd_access [nfsd]();
 2)   0.022 us    |                nfsd_permission [nfsd]();
 2) + 14.904 us   |              }

So, 15us for fh_verify by itself.

 2)               |              dentry_open() {
 2)               |                get_empty_filp() {
 2)               |                  kmem_cache_alloc() {
 2)   0.021 us    |                    _cond_resched();
 2)   0.325 us    |                  }
 2)               |                  security_file_alloc() {
 2)               |                    selinux_file_alloc_security() {
 2)               |                      kmem_cache_alloc() {
 2)   0.022 us    |                        _cond_resched();
 2)   0.251 us    |                      }
 2)   0.441 us    |                    }
 2)   0.633 us    |                  }
 2)   0.022 us    |                  __mutex_init();
 2)   1.572 us    |                }
 2)               |                vfs_open() {
 2)               |                  do_dentry_open() {
 2)               |                    path_get() {
 2)   0.025 us    |                      mntget();
 2)   0.313 us    |                    }
 2)   0.029 us    |                    try_module_get();
 2)               |                    security_file_open() {
 2)               |                      selinux_file_open() {
 2)   0.024 us    |                        avc_policy_seqno();
 2)               |                        inode_has_perm.isra.21() {
 2)   0.148 us    |                          avc_has_perm();
 2)   0.332 us    |                        }
 2)   0.728 us    |                      }
 2)   0.023 us    |                      __fsnotify_parent();
 2)   0.034 us    |                      fsnotify();
 2)   1.359 us    |                    }
 2)   0.045 us    |                    file_ra_state_init();
 2)   2.548 us    |                  }
 2)   2.733 us    |                }
 2)   4.667 us    |              }
 2)               |              ima_file_check() {
 2)   0.028 us    |                process_measurement();
 2)   0.211 us    |              }
 2)   0.023 us    |              nfserrno [nfsd]();
 2) + 20.549 us   |            }

And that’s 20us total for this nfsd_open. Seems like a lot of
work for each NFSv3 WRITE operation, and plenty of opportunities
for lock contention if my server happened to be otherwise busy.

Note these figures are a little inflated because ftrace itself
adds some latency.

Anyway, making fh_verify faster could have a significant
impact on per-op latency with fast storage. The recently-proposed
NFSD open cache by itself could eliminate the need to perform
fh_verify so often, for example.

--
Chuck Lever





^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Call graph of nfsd_open
  2015-12-03 18:31 Call graph of nfsd_open Chuck Lever
@ 2015-12-03 19:53 ` Trond Myklebust
  2015-12-03 19:56   ` Chuck Lever
  2015-12-15  1:34   ` Chuck Lever
  2015-12-04  0:01 ` Jeff Layton
  1 sibling, 2 replies; 7+ messages in thread
From: Trond Myklebust @ 2015-12-03 19:53 UTC (permalink / raw)
  To: Chuck Lever; +Cc: Linux NFS Mailing List

Hi Chuck,

On Thu, Dec 3, 2015 at 10:31 AM, Chuck Lever <chuck.lever@oracle.com> wrote:
> Hi-
>
> I’m looking into NFS WRITE latency on the Linux NFS server.
> With a high speed network fabric and ultra-fast storage (think
> persistent memory) we can get the per-WRITE latency under 50us
> (as observed on the wire at the server).
>
> One source of the remaining latency appears to be nfsd_open.
> Here is an example of a call graph captured during a simple
> iozone 4KB direct write workload (v4.4-rc3):
>
>  2)               |            nfsd_open [nfsd]() {
>  2)               |              fh_verify [nfsd]() {
>  2)               |                rqst_exp_find [nfsd]() {
>  2)               |                  exp_find [nfsd]() {
>  2)               |                    exp_find_key [nfsd]() {
>  2)   0.043 us    |                      svc_expkey_hash [nfsd]();
>  2)               |                      sunrpc_cache_lookup [sunrpc]() {
>  2)   0.027 us    |                        _raw_read_lock();
>  2)               |                        getboottime64() {
>  2)   0.022 us    |                          ns_to_timespec();
>  2)   0.210 us    |                        }
>  2)   0.021 us    |                        get_seconds();
>  2)   0.854 us    |                      }
>  2)               |                      cache_check [sunrpc]() {
>  2)               |                        getboottime64() {
>  2)   0.021 us    |                          ns_to_timespec();
>  2)   0.206 us    |                        }
>  2)   0.022 us    |                        get_seconds();
>  2)   0.596 us    |                      }
>  2)   2.050 us    |                    }
>  2)               |                    exp_get_by_name [nfsd]() {
>  2)               |                      svc_export_lookup [nfsd]() {
>  2)               |                        sunrpc_cache_lookup [sunrpc]() {
>  2)   0.022 us    |                          _raw_read_lock();
>  2)   0.024 us    |                          svc_export_match [nfsd]();
>  2)               |                          getboottime64() {
>  2)   0.023 us    |                            ns_to_timespec();
>  2)   0.210 us    |                          }
>  2)   0.022 us    |                          get_seconds();
>  2)   0.997 us    |                        }
>  2)   1.181 us    |                      }
>  2)               |                      cache_check [sunrpc]() {
>  2)               |                        getboottime64() {
>  2)   0.022 us    |                          ns_to_timespec();
>  2)   0.210 us    |                        }
>  2)   0.022 us    |                        get_seconds();
>  2)   0.592 us    |                      }
>  2)   2.147 us    |                    }
>  2)   4.560 us    |                  }
>  2)   4.768 us    |                }
>  2)               |                prepare_creds() {
>  2)               |                  kmem_cache_alloc() {
>  2)   0.021 us    |                    _cond_resched();
>  2)   0.233 us    |                  }
>  2)               |                  security_prepare_creds() {
>  2)               |                    selinux_cred_prepare() {
>  2)               |                      kmemdup() {
>  2)               |                        __kmalloc_track_caller() {
>  2)   0.022 us    |                          kmalloc_slab();
>  2)   0.023 us    |                          _cond_resched();
>  2)   0.448 us    |                        }
>  2)   0.640 us    |                      }
>  2)   0.838 us    |                    }
>  2)   1.129 us    |                  }
>  2)   1.808 us    |                }
>  2)   0.022 us    |                override_creds();
>  2)               |                exportfs_decode_fh() {
>  2)               |                  shmem_fh_to_dentry() {
>  2)               |                    ilookup5() {
>  2)               |                      ilookup5_nowait() {
>  2)   0.026 us    |                        _raw_spin_lock();
>  2)               |                        find_inode() {
>  2)   0.031 us    |                          shmem_match();
>  2)   0.029 us    |                          _raw_spin_lock();
>  2)   0.429 us    |                        }
>  2)   0.826 us    |                      }
>  2)   0.022 us    |                      _cond_resched();
>  2)   0.021 us    |                      _cond_resched();
>  2)   1.417 us    |                    }
>  2)               |                    d_find_alias() {
>  2)   0.022 us    |                      _raw_spin_lock();
>  2)   0.027 us    |                      _raw_spin_lock();
>  2)   0.423 us    |                    }
>  2)   0.035 us    |                    iput();
>  2)   2.410 us    |                  }
>  2)               |                  find_acceptable_alias() {
>  2)   0.026 us    |                    nfsd_acceptable [nfsd]();
>  2)   0.220 us    |                  }
>  2)   3.032 us    |                }
>  2)               |                nfsd_setuser_and_check_port [nfsd]() {
>  2)   0.023 us    |                  nfsexp_flags [nfsd]();
>  2)               |                  nfsd_setuser [nfsd]() {
>  2)               |                    revert_creds() {
>  2)               |                      __put_cred() {
>  2)               |                        call_rcu_sched() {
>  2)               |                          __call_rcu() {
>  2)   0.049 us    |                            __call_rcu_nocb_enqueue();
>  2)   0.265 us    |                          }
>  2)   0.446 us    |                        }
>  2)   0.661 us    |                      }
>  2)   0.854 us    |                    }
>  2)               |                    prepare_creds() {
>  2)               |                      kmem_cache_alloc() {
>  2)   0.022 us    |                        _cond_resched();
>  2)   0.219 us    |                      }
>  2)               |                      security_prepare_creds() {
>  2)               |                        selinux_cred_prepare() {
>  2)               |                          kmemdup() {
>  2)               |                            __kmalloc_track_caller() {
>  2)   0.024 us    |                              kmalloc_slab();
>  2)   0.022 us    |                              _cond_resched();
>  2)   0.412 us    |                            }
>  2)   0.615 us    |                          }
>  2)   0.801 us    |                        }
>  2)   0.993 us    |                      }
>  2)   1.651 us    |                    }
>  2)   0.033 us    |                    set_groups();
>  2)   0.023 us    |                    override_creds();
>  2)   3.323 us    |                  }
>  2)   0.029 us    |                  nfserrno [nfsd]();
>  2)   3.919 us    |                }
>  2)   0.025 us    |                check_nfsd_access [nfsd]();
>  2)   0.022 us    |                nfsd_permission [nfsd]();
>  2) + 14.904 us   |              }
>
> So, 15us for fh_verify by itself.
>
>  2)               |              dentry_open() {
>  2)               |                get_empty_filp() {
>  2)               |                  kmem_cache_alloc() {
>  2)   0.021 us    |                    _cond_resched();
>  2)   0.325 us    |                  }
>  2)               |                  security_file_alloc() {
>  2)               |                    selinux_file_alloc_security() {
>  2)               |                      kmem_cache_alloc() {
>  2)   0.022 us    |                        _cond_resched();
>  2)   0.251 us    |                      }
>  2)   0.441 us    |                    }
>  2)   0.633 us    |                  }
>  2)   0.022 us    |                  __mutex_init();
>  2)   1.572 us    |                }
>  2)               |                vfs_open() {
>  2)               |                  do_dentry_open() {
>  2)               |                    path_get() {
>  2)   0.025 us    |                      mntget();
>  2)   0.313 us    |                    }
>  2)   0.029 us    |                    try_module_get();
>  2)               |                    security_file_open() {
>  2)               |                      selinux_file_open() {
>  2)   0.024 us    |                        avc_policy_seqno();
>  2)               |                        inode_has_perm.isra.21() {
>  2)   0.148 us    |                          avc_has_perm();
>  2)   0.332 us    |                        }
>  2)   0.728 us    |                      }
>  2)   0.023 us    |                      __fsnotify_parent();
>  2)   0.034 us    |                      fsnotify();
>  2)   1.359 us    |                    }
>  2)   0.045 us    |                    file_ra_state_init();
>  2)   2.548 us    |                  }
>  2)   2.733 us    |                }
>  2)   4.667 us    |              }
>  2)               |              ima_file_check() {
>  2)   0.028 us    |                process_measurement();
>  2)   0.211 us    |              }
>  2)   0.023 us    |              nfserrno [nfsd]();
>  2) + 20.549 us   |            }
>
> And that’s 20us total for this nfsd_open. Seems like a lot of
> work for each NFSv3 WRITE operation, and plenty of opportunities
> for lock contention if my server happened to be otherwise busy.
>
> Note these figures are a little inflated because ftrace itself
> adds some latency.
>
> Anyway, making fh_verify faster could have a significant
> impact on per-op latency with fast storage. The recently-proposed
> NFSD open cache by itself could eliminate the need to perform
> fh_verify so often, for example.
>

Have you compared with Jeff's patchset? It would be very interesting
to see how that affects your numbers.

Cheers
  Trond

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Call graph of nfsd_open
  2015-12-03 19:53 ` Trond Myklebust
@ 2015-12-03 19:56   ` Chuck Lever
  2015-12-03 21:00     ` Trond Myklebust
  2015-12-15  1:34   ` Chuck Lever
  1 sibling, 1 reply; 7+ messages in thread
From: Chuck Lever @ 2015-12-03 19:56 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Linux NFS Mailing List


> On Dec 3, 2015, at 2:53 PM, Trond Myklebust <trond.myklebust@primarydata.com> wrote:
> 
> Hi Chuck,
> 
> On Thu, Dec 3, 2015 at 10:31 AM, Chuck Lever <chuck.lever@oracle.com> wrote:
>> Hi-
>> 
>> I’m looking into NFS WRITE latency on the Linux NFS server.
>> With a high speed network fabric and ultra-fast storage (think
>> persistent memory) we can get the per-WRITE latency under 50us
>> (as observed on the wire at the server).
>> 
>> One source of the remaining latency appears to be nfsd_open.
>> Here is an example of a call graph captured during a simple
>> iozone 4KB direct write workload (v4.4-rc3):
>> 
>> 2)               |            nfsd_open [nfsd]() {
>> 2)               |              fh_verify [nfsd]() {
>> 2)               |                rqst_exp_find [nfsd]() {
>> 2)               |                  exp_find [nfsd]() {
>> 2)               |                    exp_find_key [nfsd]() {
>> 2)   0.043 us    |                      svc_expkey_hash [nfsd]();
>> 2)               |                      sunrpc_cache_lookup [sunrpc]() {
>> 2)   0.027 us    |                        _raw_read_lock();
>> 2)               |                        getboottime64() {
>> 2)   0.022 us    |                          ns_to_timespec();
>> 2)   0.210 us    |                        }
>> 2)   0.021 us    |                        get_seconds();
>> 2)   0.854 us    |                      }
>> 2)               |                      cache_check [sunrpc]() {
>> 2)               |                        getboottime64() {
>> 2)   0.021 us    |                          ns_to_timespec();
>> 2)   0.206 us    |                        }
>> 2)   0.022 us    |                        get_seconds();
>> 2)   0.596 us    |                      }
>> 2)   2.050 us    |                    }
>> 2)               |                    exp_get_by_name [nfsd]() {
>> 2)               |                      svc_export_lookup [nfsd]() {
>> 2)               |                        sunrpc_cache_lookup [sunrpc]() {
>> 2)   0.022 us    |                          _raw_read_lock();
>> 2)   0.024 us    |                          svc_export_match [nfsd]();
>> 2)               |                          getboottime64() {
>> 2)   0.023 us    |                            ns_to_timespec();
>> 2)   0.210 us    |                          }
>> 2)   0.022 us    |                          get_seconds();
>> 2)   0.997 us    |                        }
>> 2)   1.181 us    |                      }
>> 2)               |                      cache_check [sunrpc]() {
>> 2)               |                        getboottime64() {
>> 2)   0.022 us    |                          ns_to_timespec();
>> 2)   0.210 us    |                        }
>> 2)   0.022 us    |                        get_seconds();
>> 2)   0.592 us    |                      }
>> 2)   2.147 us    |                    }
>> 2)   4.560 us    |                  }
>> 2)   4.768 us    |                }
>> 2)               |                prepare_creds() {
>> 2)               |                  kmem_cache_alloc() {
>> 2)   0.021 us    |                    _cond_resched();
>> 2)   0.233 us    |                  }
>> 2)               |                  security_prepare_creds() {
>> 2)               |                    selinux_cred_prepare() {
>> 2)               |                      kmemdup() {
>> 2)               |                        __kmalloc_track_caller() {
>> 2)   0.022 us    |                          kmalloc_slab();
>> 2)   0.023 us    |                          _cond_resched();
>> 2)   0.448 us    |                        }
>> 2)   0.640 us    |                      }
>> 2)   0.838 us    |                    }
>> 2)   1.129 us    |                  }
>> 2)   1.808 us    |                }
>> 2)   0.022 us    |                override_creds();
>> 2)               |                exportfs_decode_fh() {
>> 2)               |                  shmem_fh_to_dentry() {
>> 2)               |                    ilookup5() {
>> 2)               |                      ilookup5_nowait() {
>> 2)   0.026 us    |                        _raw_spin_lock();
>> 2)               |                        find_inode() {
>> 2)   0.031 us    |                          shmem_match();
>> 2)   0.029 us    |                          _raw_spin_lock();
>> 2)   0.429 us    |                        }
>> 2)   0.826 us    |                      }
>> 2)   0.022 us    |                      _cond_resched();
>> 2)   0.021 us    |                      _cond_resched();
>> 2)   1.417 us    |                    }
>> 2)               |                    d_find_alias() {
>> 2)   0.022 us    |                      _raw_spin_lock();
>> 2)   0.027 us    |                      _raw_spin_lock();
>> 2)   0.423 us    |                    }
>> 2)   0.035 us    |                    iput();
>> 2)   2.410 us    |                  }
>> 2)               |                  find_acceptable_alias() {
>> 2)   0.026 us    |                    nfsd_acceptable [nfsd]();
>> 2)   0.220 us    |                  }
>> 2)   3.032 us    |                }
>> 2)               |                nfsd_setuser_and_check_port [nfsd]() {
>> 2)   0.023 us    |                  nfsexp_flags [nfsd]();
>> 2)               |                  nfsd_setuser [nfsd]() {
>> 2)               |                    revert_creds() {
>> 2)               |                      __put_cred() {
>> 2)               |                        call_rcu_sched() {
>> 2)               |                          __call_rcu() {
>> 2)   0.049 us    |                            __call_rcu_nocb_enqueue();
>> 2)   0.265 us    |                          }
>> 2)   0.446 us    |                        }
>> 2)   0.661 us    |                      }
>> 2)   0.854 us    |                    }
>> 2)               |                    prepare_creds() {
>> 2)               |                      kmem_cache_alloc() {
>> 2)   0.022 us    |                        _cond_resched();
>> 2)   0.219 us    |                      }
>> 2)               |                      security_prepare_creds() {
>> 2)               |                        selinux_cred_prepare() {
>> 2)               |                          kmemdup() {
>> 2)               |                            __kmalloc_track_caller() {
>> 2)   0.024 us    |                              kmalloc_slab();
>> 2)   0.022 us    |                              _cond_resched();
>> 2)   0.412 us    |                            }
>> 2)   0.615 us    |                          }
>> 2)   0.801 us    |                        }
>> 2)   0.993 us    |                      }
>> 2)   1.651 us    |                    }
>> 2)   0.033 us    |                    set_groups();
>> 2)   0.023 us    |                    override_creds();
>> 2)   3.323 us    |                  }
>> 2)   0.029 us    |                  nfserrno [nfsd]();
>> 2)   3.919 us    |                }
>> 2)   0.025 us    |                check_nfsd_access [nfsd]();
>> 2)   0.022 us    |                nfsd_permission [nfsd]();
>> 2) + 14.904 us   |              }
>> 
>> So, 15us for fh_verify by itself.
>> 
>> 2)               |              dentry_open() {
>> 2)               |                get_empty_filp() {
>> 2)               |                  kmem_cache_alloc() {
>> 2)   0.021 us    |                    _cond_resched();
>> 2)   0.325 us    |                  }
>> 2)               |                  security_file_alloc() {
>> 2)               |                    selinux_file_alloc_security() {
>> 2)               |                      kmem_cache_alloc() {
>> 2)   0.022 us    |                        _cond_resched();
>> 2)   0.251 us    |                      }
>> 2)   0.441 us    |                    }
>> 2)   0.633 us    |                  }
>> 2)   0.022 us    |                  __mutex_init();
>> 2)   1.572 us    |                }
>> 2)               |                vfs_open() {
>> 2)               |                  do_dentry_open() {
>> 2)               |                    path_get() {
>> 2)   0.025 us    |                      mntget();
>> 2)   0.313 us    |                    }
>> 2)   0.029 us    |                    try_module_get();
>> 2)               |                    security_file_open() {
>> 2)               |                      selinux_file_open() {
>> 2)   0.024 us    |                        avc_policy_seqno();
>> 2)               |                        inode_has_perm.isra.21() {
>> 2)   0.148 us    |                          avc_has_perm();
>> 2)   0.332 us    |                        }
>> 2)   0.728 us    |                      }
>> 2)   0.023 us    |                      __fsnotify_parent();
>> 2)   0.034 us    |                      fsnotify();
>> 2)   1.359 us    |                    }
>> 2)   0.045 us    |                    file_ra_state_init();
>> 2)   2.548 us    |                  }
>> 2)   2.733 us    |                }
>> 2)   4.667 us    |              }
>> 2)               |              ima_file_check() {
>> 2)   0.028 us    |                process_measurement();
>> 2)   0.211 us    |              }
>> 2)   0.023 us    |              nfserrno [nfsd]();
>> 2) + 20.549 us   |            }
>> 
>> And that’s 20us total for this nfsd_open. Seems like a lot of
>> work for each NFSv3 WRITE operation, and plenty of opportunities
>> for lock contention if my server happened to be otherwise busy.
>> 
>> Note these figures are a little inflated because ftrace itself
>> adds some latency.
>> 
>> Anyway, making fh_verify faster could have a significant
>> impact on per-op latency with fast storage. The recently-proposed
>> NFSD open cache by itself could eliminate the need to perform
>> fh_verify so often, for example.
>> 
> 
> Have you compared with Jeff's patchset? It would be very interesting
> to see how that affects your numbers.

Bruce asked me to post what I had so far, to help justify
adding the NFSD open cache, which has been tabled. Do you
know where Jeff keeps this work, and is it merged up to
4.4-rc ?

--
Chuck Lever





^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Call graph of nfsd_open
  2015-12-03 19:56   ` Chuck Lever
@ 2015-12-03 21:00     ` Trond Myklebust
  0 siblings, 0 replies; 7+ messages in thread
From: Trond Myklebust @ 2015-12-03 21:00 UTC (permalink / raw)
  To: Chuck Lever; +Cc: Linux NFS Mailing List

On Thu, Dec 3, 2015 at 11:56 AM, Chuck Lever <chuck.lever@oracle.com> wrote:
> Bruce asked me to post what I had so far, to help justify
> adding the NFSD open cache, which has been tabled. Do you
> know where Jeff keeps this work, and is it merged up to
> 4.4-rc ?
>

IIRC, he posted them to the list a couple of months ago. He's on
vacation right now, but I'll see if I can dig them up for you.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Call graph of nfsd_open
  2015-12-03 18:31 Call graph of nfsd_open Chuck Lever
  2015-12-03 19:53 ` Trond Myklebust
@ 2015-12-04  0:01 ` Jeff Layton
  2015-12-04 14:42   ` Chuck Lever
  1 sibling, 1 reply; 7+ messages in thread
From: Jeff Layton @ 2015-12-04  0:01 UTC (permalink / raw)
  To: Chuck Lever; +Cc: Linux NFS Mailing List, trond.myklebust

On Thu, 3 Dec 2015 13:31:13 -0500
Chuck Lever <chuck.lever@oracle.com> wrote:

> Hi-
> 
> I’m looking into NFS WRITE latency on the Linux NFS server.
> With a high speed network fabric and ultra-fast storage (think
> persistent memory) we can get the per-WRITE latency under 50us
> (as observed on the wire at the server).
> 
> One source of the remaining latency appears to be nfsd_open.
> Here is an example of a call graph captured during a simple
> iozone 4KB direct write workload (v4.4-rc3):
> 
>  2)               |            nfsd_open [nfsd]() {
>  2)               |              fh_verify [nfsd]() {
>  2)               |                rqst_exp_find [nfsd]() {
>  2)               |                  exp_find [nfsd]() {
>  2)               |                    exp_find_key [nfsd]() {
>  2)   0.043 us    |                      svc_expkey_hash [nfsd]();
>  2)               |                      sunrpc_cache_lookup [sunrpc]() {
>  2)   0.027 us    |                        _raw_read_lock();
>  2)               |                        getboottime64() {
>  2)   0.022 us    |                          ns_to_timespec();
>  2)   0.210 us    |                        }
>  2)   0.021 us    |                        get_seconds();
>  2)   0.854 us    |                      }
>  2)               |                      cache_check [sunrpc]() {
>  2)               |                        getboottime64() {
>  2)   0.021 us    |                          ns_to_timespec();
>  2)   0.206 us    |                        }
>  2)   0.022 us    |                        get_seconds();
>  2)   0.596 us    |                      }
>  2)   2.050 us    |                    }
>  2)               |                    exp_get_by_name [nfsd]() {
>  2)               |                      svc_export_lookup [nfsd]() {
>  2)               |                        sunrpc_cache_lookup [sunrpc]() {
>  2)   0.022 us    |                          _raw_read_lock();
>  2)   0.024 us    |                          svc_export_match [nfsd]();
>  2)               |                          getboottime64() {
>  2)   0.023 us    |                            ns_to_timespec();
>  2)   0.210 us    |                          }
>  2)   0.022 us    |                          get_seconds();
>  2)   0.997 us    |                        }
>  2)   1.181 us    |                      }
>  2)               |                      cache_check [sunrpc]() {
>  2)               |                        getboottime64() {
>  2)   0.022 us    |                          ns_to_timespec();
>  2)   0.210 us    |                        }
>  2)   0.022 us    |                        get_seconds();
>  2)   0.592 us    |                      }
>  2)   2.147 us    |                    }
>  2)   4.560 us    |                  }
>  2)   4.768 us    |                }
>  2)               |                prepare_creds() {
>  2)               |                  kmem_cache_alloc() {
>  2)   0.021 us    |                    _cond_resched();
>  2)   0.233 us    |                  }
>  2)               |                  security_prepare_creds() {
>  2)               |                    selinux_cred_prepare() {
>  2)               |                      kmemdup() {
>  2)               |                        __kmalloc_track_caller() {
>  2)   0.022 us    |                          kmalloc_slab();
>  2)   0.023 us    |                          _cond_resched();
>  2)   0.448 us    |                        }
>  2)   0.640 us    |                      }
>  2)   0.838 us    |                    }
>  2)   1.129 us    |                  }
>  2)   1.808 us    |                }
>  2)   0.022 us    |                override_creds();
>  2)               |                exportfs_decode_fh() {
>  2)               |                  shmem_fh_to_dentry() {
>  2)               |                    ilookup5() {
>  2)               |                      ilookup5_nowait() {
>  2)   0.026 us    |                        _raw_spin_lock();
>  2)               |                        find_inode() {
>  2)   0.031 us    |                          shmem_match();
>  2)   0.029 us    |                          _raw_spin_lock();
>  2)   0.429 us    |                        }
>  2)   0.826 us    |                      }
>  2)   0.022 us    |                      _cond_resched();
>  2)   0.021 us    |                      _cond_resched();
>  2)   1.417 us    |                    }
>  2)               |                    d_find_alias() {
>  2)   0.022 us    |                      _raw_spin_lock();
>  2)   0.027 us    |                      _raw_spin_lock();
>  2)   0.423 us    |                    }
>  2)   0.035 us    |                    iput();
>  2)   2.410 us    |                  }
>  2)               |                  find_acceptable_alias() {
>  2)   0.026 us    |                    nfsd_acceptable [nfsd]();
>  2)   0.220 us    |                  }
>  2)   3.032 us    |                }
>  2)               |                nfsd_setuser_and_check_port [nfsd]() {
>  2)   0.023 us    |                  nfsexp_flags [nfsd]();
>  2)               |                  nfsd_setuser [nfsd]() {
>  2)               |                    revert_creds() {
>  2)               |                      __put_cred() {
>  2)               |                        call_rcu_sched() {
>  2)               |                          __call_rcu() {
>  2)   0.049 us    |                            __call_rcu_nocb_enqueue();
>  2)   0.265 us    |                          }
>  2)   0.446 us    |                        }
>  2)   0.661 us    |                      }
>  2)   0.854 us    |                    }
>  2)               |                    prepare_creds() {
>  2)               |                      kmem_cache_alloc() {
>  2)   0.022 us    |                        _cond_resched();
>  2)   0.219 us    |                      }
>  2)               |                      security_prepare_creds() {
>  2)               |                        selinux_cred_prepare() {
>  2)               |                          kmemdup() {
>  2)               |                            __kmalloc_track_caller() {
>  2)   0.024 us    |                              kmalloc_slab();
>  2)   0.022 us    |                              _cond_resched();
>  2)   0.412 us    |                            }
>  2)   0.615 us    |                          }
>  2)   0.801 us    |                        }
>  2)   0.993 us    |                      }
>  2)   1.651 us    |                    }
>  2)   0.033 us    |                    set_groups();
>  2)   0.023 us    |                    override_creds();
>  2)   3.323 us    |                  }
>  2)   0.029 us    |                  nfserrno [nfsd]();
>  2)   3.919 us    |                }
>  2)   0.025 us    |                check_nfsd_access [nfsd]();
>  2)   0.022 us    |                nfsd_permission [nfsd]();
>  2) + 14.904 us   |              }
> 
> So, 15us for fh_verify by itself.
> 
>  2)               |              dentry_open() {
>  2)               |                get_empty_filp() {
>  2)               |                  kmem_cache_alloc() {
>  2)   0.021 us    |                    _cond_resched();
>  2)   0.325 us    |                  }
>  2)               |                  security_file_alloc() {
>  2)               |                    selinux_file_alloc_security() {
>  2)               |                      kmem_cache_alloc() {
>  2)   0.022 us    |                        _cond_resched();
>  2)   0.251 us    |                      }
>  2)   0.441 us    |                    }
>  2)   0.633 us    |                  }
>  2)   0.022 us    |                  __mutex_init();
>  2)   1.572 us    |                }
>  2)               |                vfs_open() {
>  2)               |                  do_dentry_open() {
>  2)               |                    path_get() {
>  2)   0.025 us    |                      mntget();
>  2)   0.313 us    |                    }
>  2)   0.029 us    |                    try_module_get();
>  2)               |                    security_file_open() {
>  2)               |                      selinux_file_open() {
>  2)   0.024 us    |                        avc_policy_seqno();
>  2)               |                        inode_has_perm.isra.21() {
>  2)   0.148 us    |                          avc_has_perm();
>  2)   0.332 us    |                        }
>  2)   0.728 us    |                      }
>  2)   0.023 us    |                      __fsnotify_parent();
>  2)   0.034 us    |                      fsnotify();
>  2)   1.359 us    |                    }
>  2)   0.045 us    |                    file_ra_state_init();
>  2)   2.548 us    |                  }
>  2)   2.733 us    |                }
>  2)   4.667 us    |              }
>  2)               |              ima_file_check() {
>  2)   0.028 us    |                process_measurement();
>  2)   0.211 us    |              }
>  2)   0.023 us    |              nfserrno [nfsd]();
>  2) + 20.549 us   |            }
> 
> And that’s 20us total for this nfsd_open. Seems like a lot of
> work for each NFSv3 WRITE operation, and plenty of opportunities
> for lock contention if my server happened to be otherwise busy.
> 
> Note these figures are a little inflated because ftrace itself
> adds some latency.
> 
> Anyway, making fh_verify faster could have a significant
> impact on per-op latency with fast storage. The recently-proposed
> NFSD open cache by itself could eliminate the need to perform
> fh_verify so often, for example.
> 

No, we still have to do a fh_verify. We still have to do a
nfsd_permission check, etc (in particular). What we _can_ avoid is
calling down into the vfs layer to do the actual open. Maybe it's
possible to optimize away some of the fh_verify processing though?
Personally, I think the fh_* routines do too much...

In any case, the latest set that I have is here:

    https://git.samba.org/?p=jlayton/linux.git;a=shortlog;h=refs/heads/nfsd-4.5

...but note that Trond and Tao have merged some fixes to our internal
repo for this that I have not yet had time to review yet. I'll probably
pick those up next week sometime once I've gotten caught up.

-- 
Jeff Layton <jlayton@poochiereds.net>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Call graph of nfsd_open
  2015-12-04  0:01 ` Jeff Layton
@ 2015-12-04 14:42   ` Chuck Lever
  0 siblings, 0 replies; 7+ messages in thread
From: Chuck Lever @ 2015-12-04 14:42 UTC (permalink / raw)
  To: Jeff Layton; +Cc: Linux NFS Mailing List, Trond Myklebust


> On Dec 3, 2015, at 7:01 PM, Jeff Layton <jlayton@poochiereds.net> wrote:
> 
> On Thu, 3 Dec 2015 13:31:13 -0500
> Chuck Lever <chuck.lever@oracle.com> wrote:
> 
>> Hi-
>> 
>> I’m looking into NFS WRITE latency on the Linux NFS server.
>> With a high speed network fabric and ultra-fast storage (think
>> persistent memory) we can get the per-WRITE latency under 50us
>> (as observed on the wire at the server).
>> 
>> One source of the remaining latency appears to be nfsd_open.
>> Here is an example of a call graph captured during a simple
>> iozone 4KB direct write workload (v4.4-rc3):
>> 
>> 2)               |            nfsd_open [nfsd]() {
>> 2)               |              fh_verify [nfsd]() {
>> 2)               |                rqst_exp_find [nfsd]() {
>> 2)               |                  exp_find [nfsd]() {
>> 2)               |                    exp_find_key [nfsd]() {
>> 2)   0.043 us    |                      svc_expkey_hash [nfsd]();
>> 2)               |                      sunrpc_cache_lookup [sunrpc]() {
>> 2)   0.027 us    |                        _raw_read_lock();
>> 2)               |                        getboottime64() {
>> 2)   0.022 us    |                          ns_to_timespec();
>> 2)   0.210 us    |                        }
>> 2)   0.021 us    |                        get_seconds();
>> 2)   0.854 us    |                      }
>> 2)               |                      cache_check [sunrpc]() {
>> 2)               |                        getboottime64() {
>> 2)   0.021 us    |                          ns_to_timespec();
>> 2)   0.206 us    |                        }
>> 2)   0.022 us    |                        get_seconds();
>> 2)   0.596 us    |                      }
>> 2)   2.050 us    |                    }
>> 2)               |                    exp_get_by_name [nfsd]() {
>> 2)               |                      svc_export_lookup [nfsd]() {
>> 2)               |                        sunrpc_cache_lookup [sunrpc]() {
>> 2)   0.022 us    |                          _raw_read_lock();
>> 2)   0.024 us    |                          svc_export_match [nfsd]();
>> 2)               |                          getboottime64() {
>> 2)   0.023 us    |                            ns_to_timespec();
>> 2)   0.210 us    |                          }
>> 2)   0.022 us    |                          get_seconds();
>> 2)   0.997 us    |                        }
>> 2)   1.181 us    |                      }
>> 2)               |                      cache_check [sunrpc]() {
>> 2)               |                        getboottime64() {
>> 2)   0.022 us    |                          ns_to_timespec();
>> 2)   0.210 us    |                        }
>> 2)   0.022 us    |                        get_seconds();
>> 2)   0.592 us    |                      }
>> 2)   2.147 us    |                    }
>> 2)   4.560 us    |                  }
>> 2)   4.768 us    |                }
>> 2)               |                prepare_creds() {
>> 2)               |                  kmem_cache_alloc() {
>> 2)   0.021 us    |                    _cond_resched();
>> 2)   0.233 us    |                  }
>> 2)               |                  security_prepare_creds() {
>> 2)               |                    selinux_cred_prepare() {
>> 2)               |                      kmemdup() {
>> 2)               |                        __kmalloc_track_caller() {
>> 2)   0.022 us    |                          kmalloc_slab();
>> 2)   0.023 us    |                          _cond_resched();
>> 2)   0.448 us    |                        }
>> 2)   0.640 us    |                      }
>> 2)   0.838 us    |                    }
>> 2)   1.129 us    |                  }
>> 2)   1.808 us    |                }
>> 2)   0.022 us    |                override_creds();
>> 2)               |                exportfs_decode_fh() {
>> 2)               |                  shmem_fh_to_dentry() {
>> 2)               |                    ilookup5() {
>> 2)               |                      ilookup5_nowait() {
>> 2)   0.026 us    |                        _raw_spin_lock();
>> 2)               |                        find_inode() {
>> 2)   0.031 us    |                          shmem_match();
>> 2)   0.029 us    |                          _raw_spin_lock();
>> 2)   0.429 us    |                        }
>> 2)   0.826 us    |                      }
>> 2)   0.022 us    |                      _cond_resched();
>> 2)   0.021 us    |                      _cond_resched();
>> 2)   1.417 us    |                    }
>> 2)               |                    d_find_alias() {
>> 2)   0.022 us    |                      _raw_spin_lock();
>> 2)   0.027 us    |                      _raw_spin_lock();
>> 2)   0.423 us    |                    }
>> 2)   0.035 us    |                    iput();
>> 2)   2.410 us    |                  }
>> 2)               |                  find_acceptable_alias() {
>> 2)   0.026 us    |                    nfsd_acceptable [nfsd]();
>> 2)   0.220 us    |                  }
>> 2)   3.032 us    |                }
>> 2)               |                nfsd_setuser_and_check_port [nfsd]() {
>> 2)   0.023 us    |                  nfsexp_flags [nfsd]();
>> 2)               |                  nfsd_setuser [nfsd]() {
>> 2)               |                    revert_creds() {
>> 2)               |                      __put_cred() {
>> 2)               |                        call_rcu_sched() {
>> 2)               |                          __call_rcu() {
>> 2)   0.049 us    |                            __call_rcu_nocb_enqueue();
>> 2)   0.265 us    |                          }
>> 2)   0.446 us    |                        }
>> 2)   0.661 us    |                      }
>> 2)   0.854 us    |                    }
>> 2)               |                    prepare_creds() {
>> 2)               |                      kmem_cache_alloc() {
>> 2)   0.022 us    |                        _cond_resched();
>> 2)   0.219 us    |                      }
>> 2)               |                      security_prepare_creds() {
>> 2)               |                        selinux_cred_prepare() {
>> 2)               |                          kmemdup() {
>> 2)               |                            __kmalloc_track_caller() {
>> 2)   0.024 us    |                              kmalloc_slab();
>> 2)   0.022 us    |                              _cond_resched();
>> 2)   0.412 us    |                            }
>> 2)   0.615 us    |                          }
>> 2)   0.801 us    |                        }
>> 2)   0.993 us    |                      }
>> 2)   1.651 us    |                    }
>> 2)   0.033 us    |                    set_groups();
>> 2)   0.023 us    |                    override_creds();
>> 2)   3.323 us    |                  }
>> 2)   0.029 us    |                  nfserrno [nfsd]();
>> 2)   3.919 us    |                }
>> 2)   0.025 us    |                check_nfsd_access [nfsd]();
>> 2)   0.022 us    |                nfsd_permission [nfsd]();
>> 2) + 14.904 us   |              }
>> 
>> So, 15us for fh_verify by itself.
>> 
>> 2)               |              dentry_open() {
>> 2)               |                get_empty_filp() {
>> 2)               |                  kmem_cache_alloc() {
>> 2)   0.021 us    |                    _cond_resched();
>> 2)   0.325 us    |                  }
>> 2)               |                  security_file_alloc() {
>> 2)               |                    selinux_file_alloc_security() {
>> 2)               |                      kmem_cache_alloc() {
>> 2)   0.022 us    |                        _cond_resched();
>> 2)   0.251 us    |                      }
>> 2)   0.441 us    |                    }
>> 2)   0.633 us    |                  }
>> 2)   0.022 us    |                  __mutex_init();
>> 2)   1.572 us    |                }
>> 2)               |                vfs_open() {
>> 2)               |                  do_dentry_open() {
>> 2)               |                    path_get() {
>> 2)   0.025 us    |                      mntget();
>> 2)   0.313 us    |                    }
>> 2)   0.029 us    |                    try_module_get();
>> 2)               |                    security_file_open() {
>> 2)               |                      selinux_file_open() {
>> 2)   0.024 us    |                        avc_policy_seqno();
>> 2)               |                        inode_has_perm.isra.21() {
>> 2)   0.148 us    |                          avc_has_perm();
>> 2)   0.332 us    |                        }
>> 2)   0.728 us    |                      }
>> 2)   0.023 us    |                      __fsnotify_parent();
>> 2)   0.034 us    |                      fsnotify();
>> 2)   1.359 us    |                    }
>> 2)   0.045 us    |                    file_ra_state_init();
>> 2)   2.548 us    |                  }
>> 2)   2.733 us    |                }
>> 2)   4.667 us    |              }
>> 2)               |              ima_file_check() {
>> 2)   0.028 us    |                process_measurement();
>> 2)   0.211 us    |              }
>> 2)   0.023 us    |              nfserrno [nfsd]();
>> 2) + 20.549 us   |            }
>> 
>> And that’s 20us total for this nfsd_open. Seems like a lot of
>> work for each NFSv3 WRITE operation, and plenty of opportunities
>> for lock contention if my server happened to be otherwise busy.
>> 
>> Note these figures are a little inflated because ftrace itself
>> adds some latency.
>> 
>> Anyway, making fh_verify faster could have a significant
>> impact on per-op latency with fast storage. The recently-proposed
>> NFSD open cache by itself could eliminate the need to perform
>> fh_verify so often, for example.
>> 
> 
> No, we still have to do a fh_verify. We still have to do a
> nfsd_permission check, etc (in particular). What we _can_ avoid is
> calling down into the vfs layer to do the actual open. Maybe it's
> possible to optimize away some of the fh_verify processing though?
> Personally, I think the fh_* routines do too much

fh_verify turns out to be the expensive part, about three-fourths
of the latency of nfsd_open in my environment. It would be very
nice to reduce or eliminate this expense.

But this is with 4KB NFSv3 WRITE operations. As the payload size
increases, latency is dominated by how long it takes to move
the data on the network and other housekeeping I have yet to
look into. More to come.


> In any case, the latest set that I have is here:
> 
>    https://git.samba.org/?p=jlayton/linux.git;a=shortlog;h=refs/heads/nfsd-4.5
> 
> ...but note that Trond and Tao have merged some fixes to our internal
> repo for this that I have not yet had time to review yet. I'll probably
> pick those up next week sometime once I've gotten caught up.
> 
> -- 
> Jeff Layton <jlayton@poochiereds.net>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
Chuck Lever





^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Call graph of nfsd_open
  2015-12-03 19:53 ` Trond Myklebust
  2015-12-03 19:56   ` Chuck Lever
@ 2015-12-15  1:34   ` Chuck Lever
  1 sibling, 0 replies; 7+ messages in thread
From: Chuck Lever @ 2015-12-15  1:34 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Linux NFS Mailing List


> On Dec 3, 2015, at 2:53 PM, Trond Myklebust <trond.myklebust@primarydata.com> wrote:
> 
> Hi Chuck,
> 
> On Thu, Dec 3, 2015 at 10:31 AM, Chuck Lever <chuck.lever@oracle.com> wrote:
>> Hi-
>> 
>> I’m looking into NFS WRITE latency on the Linux NFS server.
>> With a high speed network fabric and ultra-fast storage (think
>> persistent memory) we can get the per-WRITE latency under 50us
>> (as observed on the wire at the server).
>> 
>> One source of the remaining latency appears to be nfsd_open.
>> Here is an example of a call graph captured during a simple
>> iozone 4KB direct write workload (v4.4-rc3):

[ ... function call graph snipped ... ]

>> And that’s 20us total for this nfsd_open. Seems like a lot of
>> work for each NFSv3 WRITE operation, and plenty of opportunities
>> for lock contention if my server happened to be otherwise busy.
>> 
>> Note these figures are a little inflated because ftrace itself
>> adds some latency.
>> 
>> Anyway, making fh_verify faster could have a significant
>> impact on per-op latency with fast storage. The recently-proposed
>> NFSD open cache by itself could eliminate the need to perform
>> fh_verify so often, for example.
>> 
> 
> Have you compared with Jeff's patchset? It would be very interesting
> to see how that affects your numbers.

I compared the latency of read(2) and write(2) as measured
by iozone on the client.

I pulled Jeff's patches from samba.org, they are based on
4.4-rc4. My patches are my for-4.5 series, also based on
4.4-rc4. I've disabled FRMR on the server for RDMA Reads.
The share is a tmpfs.

The client is 4.4-rc4 with my for-4.5 patches applied.
1MB rsize and wsize.

The fabric is FDR, both systems have CX-3 Pro adapters
and report 56Gbps link speed.

The test is "iozone -az -i0 -i1 -y1k -s128m -I -N". The
results are microseconds per system call. Lower is
better.

With Jeff's NFSD open cache:

              KB  reclen   write rewrite    read    reread
          131072       1      48      48       43       43
          131072       2      49      49       43       43
          131072       4      50      50       43       43
          131072       8      54      52       45       46
          131072      16      59      58       48       48
          131072      32      71      69       56       56
          131072      64      97      92       75       78
          131072     128     147     142      105      105
          131072     256     237     216      171      162
          131072     512     492     457      268      267
          131072    1024     806     748      529      526
          131072    2048    1257    1189      711      696
          131072    4096    1884    1718     1019     1016
          131072    8192    3452    2958     1714     1710
          131072   16384    6835    5416     3132     3134

With my for-4.5 patches (no open cache):

              KB  reclen   write rewrite    read    reread
          131072       1      49      50       43       43
          131072       2      50      49       43       43
          131072       4      51      50       43       43
          131072       8      55      53       45       45
          131072      16      60      58       48       48
          131072      32      70      68       53       54
          131072      64      91      85       69       69
          131072     128     140     130       95       96
          131072     256     214     203      145      147
          131072     512     480     410      253      249
          131072    1024     755     698      508      477
          131072    2048    1206    1049      667      656
          131072    4096    1786    1632      977      977
          131072    8192    3243    2851     1670     1672
          131072   16384    6579    5276     3091     3092

You can see below 64KB I/O size, Jeff's open cache saves
about a microsecond per NFS WRITE. NFS READ is not affected.

Above that, other effects dominate the cost per I/O. I can't
explain yet why my for-4.5 server looks better with larger I/Os.

NFS WRITE takes longer than NFS READ even with a memory
file system. The difference seems to be dealing with all
the pages: on the server, this is done by the local file
system; on the client, there's a palpable O/S cost per page
to set up memory registration (the FAST_REG WR itself is
quick), and lock contention dealing with the NFS WRITE
completions.

Looking at ways to speed up fh_verify would help the smaller
WRITEs, and reducing the cost of page management might be
good for large I/O.


--
Chuck Lever





^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2015-12-15  1:34 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-12-03 18:31 Call graph of nfsd_open Chuck Lever
2015-12-03 19:53 ` Trond Myklebust
2015-12-03 19:56   ` Chuck Lever
2015-12-03 21:00     ` Trond Myklebust
2015-12-15  1:34   ` Chuck Lever
2015-12-04  0:01 ` Jeff Layton
2015-12-04 14:42   ` Chuck Lever

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.