Linux-NFS Archive on lore.kernel.org
From: Alkis Georgopoulos <alkisg@gmail.com>
To: Trond Myklebust <trondmy@gmail.com>,
	Anna Schumaker <Anna.Schumaker@netapp.com>
Cc: linux-nfs@vger.kernel.org
Subject: Re: [PATCH] NFS: Optimise the default readahead size
Date: Mon, 23 Sep 2019 09:36:55 +0300
Message-ID: <a9b7d011-b244-f7d4-4545-d302dd51b5b7@gmail.com>
In-Reply-To: <20190922190749.54156-1-trond.myklebust@hammerspace.com>

Thank you Trond, you're awesome!

I don't know if it's appropriate, but I thought I'd share some recent 
benchmarks related to this:

Netbooting a system over 100 Mbps with tcp,timeo=600,rsize=1M,wsize=1M
mount options, then running:
`rm -rf .mozilla; echo 3 > /proc/sys/vm/drop_caches; firefox`

| Readahead | Boot time (s) | Boot traffic (MB) | Firefox time (s) | Firefox traffic (MB) |
|-----------|---------------|-------------------|------------------|----------------------|
|    4 KB   |      34       |        158        |        27        |         120          |
|  128 KB   |      36       |        355        |        27        |         247          |
|    1 MB   |      83       |       1210        |        60        |         661          |
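
One way to capture the traffic columns is to sample the NIC counters
around each run; a minimal sketch, assuming the client interface is
named eth0 (an illustrative name):

  rx0=$(cat /sys/class/net/eth0/statistics/rx_bytes)
  rm -rf .mozilla; echo 3 > /proc/sys/vm/drop_caches
  time firefox
  rx1=$(cat /sys/class/net/eth0/statistics/rx_bytes)
  echo "$(( (rx1 - rx0) / 1024 / 1024 )) MB received"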

If I understand it correctly, the new default is 128 KB, which feels 
like a great generic default. For a remote / or /home shared by multiple 
clients, 4 KB might be more appropriate, so software that focuses on 
that use case, like LTSP or klibc nfsmount, can adjust readahead via the 
/sys/devices/virtual/bdi interface, as in the sketch below.
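
A minimal sketch of such an adjustment (the mount point /srv/nfs and the
4 KB value are illustrative assumptions; `mountpoint -d` prints the
MAJOR:MINOR id that names the per-mount bdi directory):

  # Find the bdi id of the NFS mount, e.g. "0:53"
  bdi=$(mountpoint -d /srv/nfs)
  # Limit readahead on this mount to 4 KB (needs root)
  echo 4 > /sys/devices/virtual/bdi/$bdi/read_ahead_kb
  # Verify the new value
  cat /sys/devices/virtual/bdi/$bdi/read_ahead_kb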

Thanks again,
Alkis Georgopoulos
LTSP developer


On 9/22/19 10:07 PM, Trond Myklebust wrote:
> In the years since the max readahead size was fixed in NFS, a number of
> things have happened:
> - Users can now set the value directly using /sys/class/bdi
> - NFS max supported block sizes have increased by more than an order of
>    magnitude, from 64K to 1MB.
> - Disk access latencies are orders of magnitude lower due to SSDs and NVMe.
> 
> In particular, note that if the server is advertising 1MB as the optimal
> read size, the current default will set the readahead size to 15MB.
> Let's therefore adjust down, and try to default to VM_READAHEAD_PAGES.
> However, let's inform the VM about our preferred block size so that it
> can choose to round up in cases where that makes sense.
> 
> Reported-by: Alkis Georgopoulos <alkisg@gmail.com>
> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
> ---
>   fs/nfs/internal.h | 8 --------
>   fs/nfs/super.c    | 9 ++++++++-
>   2 files changed, 8 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
> index e64f810223be..447a3c17fa8e 100644
> --- a/fs/nfs/internal.h
> +++ b/fs/nfs/internal.h
> @@ -16,14 +16,6 @@ extern const struct export_operations nfs_export_ops;
>   
>   struct nfs_string;
>   
> -/* Maximum number of readahead requests
> - * FIXME: this should really be a sysctl so that users may tune it to suit
> - *        their needs. People that do NFS over a slow network, might for
> - *        instance want to reduce it to something closer to 1 for improved
> - *        interactive response.
> - */
> -#define NFS_MAX_READAHEAD	(RPC_DEF_SLOT_TABLE - 1)
> -
>   static inline void nfs_attr_check_mountpoint(struct super_block *parent, struct nfs_fattr *fattr)
>   {
>   	if (!nfs_fsid_equal(&NFS_SB(parent)->fsid, &fattr->fsid))
> diff --git a/fs/nfs/super.c b/fs/nfs/super.c
> index 703f595dce90..c96194e28692 100644
> --- a/fs/nfs/super.c
> +++ b/fs/nfs/super.c
> @@ -2627,6 +2627,13 @@ int nfs_clone_sb_security(struct super_block *s, struct dentry *mntroot,
>   }
>   EXPORT_SYMBOL_GPL(nfs_clone_sb_security);
>   
> +static void nfs_set_readahead(struct backing_dev_info *bdi,
> +			      unsigned long iomax_pages)
> +{
> +	bdi->ra_pages = VM_READAHEAD_PAGES;
> +	bdi->io_pages = iomax_pages;
> +}
> +
>   struct dentry *nfs_fs_mount_common(struct nfs_server *server,
>   				   int flags, const char *dev_name,
>   				   struct nfs_mount_info *mount_info,
> @@ -2669,7 +2676,7 @@ struct dentry *nfs_fs_mount_common(struct nfs_server *server,
>   			mntroot = ERR_PTR(error);
>   			goto error_splat_super;
>   		}
> -		s->s_bdi->ra_pages = server->rpages * NFS_MAX_READAHEAD;
> +		nfs_set_readahead(s->s_bdi, server->rpages);
>   		server->super = s;
>   	}
>   
> 
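
For reference, the arithmetic behind the old and new defaults, assuming
4 KB pages and rsize=1M (so rpages = 256), with RPC_DEF_SLOT_TABLE = 16:

  # Old default: ra_pages = rpages * (RPC_DEF_SLOT_TABLE - 1)
  echo "$(( 256 * (16 - 1) * 4 )) KB"   # 15360 KB = 15 MB
  # New default: VM_READAHEAD_PAGES = 128K / PAGE_SIZE = 32 pages
  echo "$(( 32 * 4 )) KB"               # 128 KB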

