Re: [PATCH 2/2] NFSD: limit the number of v4 clients to 4096 per 4GB of system memory

From: Chuck Lever III <chuck.lever@oracle.com>
To: Dai Ngo <dai.ngo@oracle.com>
Cc: Linux NFS Mailing List <linux-nfs@vger.kernel.org>
Subject: Re: [PATCH 2/2] NFSD: limit the number of v4 clients to 4096 per 4GB of system memory
Date: Tue, 12 Jul 2022 21:54:12 +0000	[thread overview]
Message-ID: <D112EBD0-D062-498C-A15F-65A44097AC6B@oracle.com> (raw)
In-Reply-To: <1657656660-16647-3-git-send-email-dai.ngo@oracle.com>

Hello Dai, lovely to see this!

> On Jul 12, 2022, at 4:11 PM, Dai Ngo <dai.ngo@oracle.com> wrote:
> 
> Currently there is no limit on how many v4 clients are supported
> by the system. This can be a problem in systems with small memory
> configuration to function properly when a very large number of
> clients exist that creates memory shortage conditions.
> 
> This patch enforces a limit of 4096 NFSv4 clients, including courtesy
> clients, per 4GB of system memory.  When the number of the clients
> reaches the limit, requests that create new clients are returned
> with NFS4ERR_DELAY. The laundromat detects this condition and removes
> older courtesy clients. Due to the overhead of the upcall to remove
> the client record, the maximun number of clients the laundromat
> removes on each run is limited to 128. This is done to ensure the
> laundromat can still process other tasks in a timely manner.
> 
> Since there is now a limit of the number of clients, the 24-hr
> idle time limit of courtesy client is no longer needed and was
> removed.
> 
> Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
> ---
> fs/nfsd/netns.h     |  1 +
> fs/nfsd/nfs4state.c | 17 +++++++++++++----
> fs/nfsd/nfsctl.c    |  8 ++++++++
> 3 files changed, 22 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h
> index ce864f001a3e..8d72b302a49c 100644
> --- a/fs/nfsd/netns.h
> +++ b/fs/nfsd/netns.h
> @@ -191,6 +191,7 @@ struct nfsd_net {
> 	siphash_key_t		siphash_key;
> 
> 	atomic_t		nfs4_client_count;
> +	unsigned int		nfs4_max_clients;
> };
> 
> /* Simple check to find out if a given net was properly initialized */
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index 30e16d9e8657..e54db346dc00 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -126,6 +126,7 @@ static const struct nfsd4_callback_ops nfsd4_cb_recall_ops;
> static const struct nfsd4_callback_ops nfsd4_cb_notify_lock_ops;
> 
> static struct workqueue_struct *laundry_wq;
> +#define	NFSD_CLIENT_MAX_TRIM_PER_RUN	128

Let's move these #defines to a header file instead of scattering
them in the source code. How about fs/nfsd/nfsd.h ?

> int nfsd4_create_laundry_wq(void)
> {
> @@ -2059,6 +2060,8 @@ static struct nfs4_client *alloc_client(struct xdr_netobj name,
> 	struct nfs4_client *clp;
> 	int i;
> 
> +	if (atomic_read(&nn->nfs4_client_count) >= nn->nfs4_max_clients)
> +		return NULL;

So, NFSD will return NFS4ERR_DELAY if it is asked to establish
a new client and we've hit this limit. The next laundromat run
should knock a few lingering COURTESY clients out of the LRU
to make room for a new client.

Maybe you want to kick the laundromat here to get that process
moving sooner?

> 	clp = kmem_cache_zalloc(client_slab, GFP_KERNEL);
> 	if (clp == NULL)
> 		return NULL;
> @@ -5796,9 +5799,12 @@ static void
> nfs4_get_client_reaplist(struct nfsd_net *nn, struct list_head *reaplist,
> 				struct laundry_time *lt)
> {
> +	unsigned int maxreap = 0, reapcnt = 0;
> 	struct list_head *pos, *next;
> 	struct nfs4_client *clp;
> 
> +	if (atomic_read(&nn->nfs4_client_count) >= nn->nfs4_max_clients)
> +		maxreap = NFSD_CLIENT_MAX_TRIM_PER_RUN;

The idea I guess is "don't reap anything until we exceed the
maximum number of clients". It took me a bit to figure that
out.

> 	INIT_LIST_HEAD(reaplist);
> 	spin_lock(&nn->client_lock);
> 	list_for_each_safe(pos, next, &nn->client_lru) {
> @@ -5809,14 +5815,17 @@ nfs4_get_client_reaplist(struct nfsd_net *nn, struct list_head *reaplist,
> 			break;
> 		if (!atomic_read(&clp->cl_rpc_users))
> 			clp->cl_state = NFSD4_COURTESY;
> -		if (!client_has_state(clp) ||
> -				ktime_get_boottime_seconds() >=
> -				(clp->cl_time + NFSD_COURTESY_CLIENT_TIMEOUT))
> +		if (!client_has_state(clp))
> 			goto exp_client;
> 		if (nfs4_anylock_blockers(clp)) {
> exp_client:
> -			if (!mark_client_expired_locked(clp))
> +			if (!mark_client_expired_locked(clp)) {
> 				list_add(&clp->cl_lru, reaplist);
> +				reapcnt++;
> +			}
> +		} else {
> +			if (reapcnt < maxreap)
> +				goto exp_client;
> 		}
> 	}

Would something like this be more straightforward? I probably
didn't get the logic exactly right.

		if (!nfs4_anylock_blockers(clp))
			if (reapcnt > maxreap)
				continue;
exp_client:
		if (!mark_client_expired_locked(clp)) {
			list_add(&clp->cl_lru, reaplist);
			reapcnt++;
		}
	}

The idea is: once maxreap has been reached, continue walking the
LRU looking for clients to convert from ACTIVE to COURTESY, but
do not reap any more COURTESY clients that might be found.

> 	spin_unlock(&nn->client_lock);
> diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
> index 547f4c4b9668..223659e15af3 100644
> --- a/fs/nfsd/nfsctl.c
> +++ b/fs/nfsd/nfsctl.c
> @@ -96,6 +96,8 @@ static ssize_t (*const write_op[])(struct file *, char *, size_t) = {
> #endif
> };
> 
> +#define	NFS4_MAX_CLIENTS_PER_4GB	4096

No need for "MAX" in this name.

And, ditto the above comment: move this to a header file.

> +
> static ssize_t nfsctl_transaction_write(struct file *file, const char __user *buf, size_t size, loff_t *pos)
> {
> 	ino_t ino =  file_inode(file)->i_ino;
> @@ -1462,6 +1464,8 @@ unsigned int nfsd_net_id;
> static __net_init int nfsd_init_net(struct net *net)
> {
> 	int retval;
> +	unsigned long lowmem;
> +	struct sysinfo si;
> 	struct nfsd_net *nn = net_generic(net, nfsd_net_id);

Nit: I prefer the reverse christmas tree style. Can you add
the new stack variables after "struct nfsd_net *nn ..." ?

> 
> 	retval = nfsd_export_init(net);
> @@ -1488,6 +1492,10 @@ static __net_init int nfsd_init_net(struct net *net)
> 	seqlock_init(&nn->writeverf_lock);
> 
> 	atomic_set(&nn->nfs4_client_count, 0);
> +	si_meminfo(&si);
> +	lowmem = (si.totalram - si.totalhigh) * si.mem_unit;

There's no reason to restrict this to lowmem, since we're not
using a struct nfs4_client as the target of I/O.

> +	nn->nfs4_max_clients = (((lowmem * 100) >> 32) *
> +				NFS4_MAX_CLIENTS_PER_4GB) / 100;

On a platform where "unsigned long" is a 32-bit type, will
the shift-right-by-32 continue to work as you expect?

Let's try to simplify this computation, because it isn't
especially clear what is going on. The math might work a
little better if it were "1024 clients per GB" for example.

> 	return 0;
> 
> -- 
> 2.9.5
> 

--
Chuck Lever