From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 23/27] libxfs: use PSI information to detect memory pressure
Date: Thu, 15 Oct 2020 10:56:11 -0700
Message-ID: <20201015175611.GY9832@magnolia>
In-Reply-To: <20201015072155.1631135-24-david@fromorbit.com>

On Thu, Oct 15, 2020 at 06:21:51PM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> The buffer cache needs to have a reliable trigger for shrinking
> the cache. Modern kernels track and report memory pressure events to
> the userspace via the Pressure Stall Interface (PSI). Create a PSI
> memory pressure monitoring thread to listen for memory pressure
> events and use that to drive buffer cache shrinking interfaces.
> 
> Add the shrinker framework that will allow us to implement LRU
> reclaim of buffers when memory pressure occurs.  We also create a
> low memory detection and reclaim wait mechanism to allow us to
> throttle back new allocations while we are shrinking the buffer
> cache.
> 
> We also include malloc heap trimming callouts so that once the
> shrinker frees the memory, we trim the malloc heap to release the
> freed memory back to the system.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  libxfs/buftarg.c     | 142 ++++++++++++++++++++++++++++++++++++++++++-
>  libxfs/xfs_buftarg.h |   9 +++
>  2 files changed, 150 insertions(+), 1 deletion(-)
> 
> diff --git a/libxfs/buftarg.c b/libxfs/buftarg.c
> index 42806e433715..6c7142d41eb1 100644
> --- a/libxfs/buftarg.c
> +++ b/libxfs/buftarg.c
> @@ -62,6 +62,128 @@ xfs_buftarg_setsize_early(
>  	return xfs_buftarg_setsize(btp, bsize);
>  }
>  
> +/*
> + * Scan a chunk of the buffer cache and drop LRU reference counts. If the
> + * count goes to zero, dispose of the buffer.
> + */
> +static void
> +xfs_buftarg_shrink(
> +	struct xfs_buftarg	*btc)
> +{
> +	/*
> +	 * Make the fact we are in memory reclaim externally visible. This
> +	 * allows buffer cache allocation throttling while we are trying to
> +	 * free memory.
> +	 */
> +	atomic_inc_return(&btc->bt_low_mem);
> +
> +	fprintf(stderr, "Got memory pressure event. Shrinking caches!\n");
> +
> +	/*
> + * Now we've freed a bunch of memory, trim the heap down to release the
> +	 * freed memory back to the kernel and reduce the pressure we are
> +	 * placing on the system.
> +	 */
> +	malloc_trim(0);
> +
> +	/*
> +	 * Done, wake anyone waiting on memory reclaim to complete.
> +	 */
> +	atomic_dec_return(&btc->bt_low_mem);
> +	complete(&btc->bt_low_mem_wait);
> +}
> +
> +static void *
> +xfs_buftarg_shrinker(
> +	void			*args)
> +{
> +	struct xfs_buftarg	*btp = args;
> +	struct pollfd		 fds = {
> +		.fd = btp->bt_psi_fd,
> +		.events = POLLPRI,
> +	};
> +
> +	rcu_register_thread();
> +	while (!btp->bt_exiting) {
> +		int	n;
> +
> +		n = poll(&fds, 1, 100);
> +		if (n == 0)
> +			continue;	/* timeout */
> +		if (n < 0) {
> +			perror("poll(PSI)");
> +			break;
> +		}
> +		if (fds.revents & POLLERR) {
> +			fprintf(stderr,
> +				"poll(psi) POLLERR: event source dead?\n");
> +			break;
> +		}
> +		if (!(fds.revents & POLLPRI)) {
> +			fprintf(stderr,
> +				"poll(psi): unknown event.  Ignoring.\n");
> +			continue;
> +		}
> +
> +		/* run the shrinker here */
> +		xfs_buftarg_shrink(btp);
> +
> +	}
> +	rcu_unregister_thread();
> +	return NULL;
> +}
> +
> +/*
> + * This only picks up on global memory pressure. Maybe in future we can detect
> + * whether we are running inside a container and use the PSI information for the
> + * container.
> + *
> + * We want relatively early notification of memory pressure stalls because
> + * xfs_repair will consume lots of memory. Hence set a low trigger threshold for
> + * reclaim to run - a partial stall of 5ms over a 1s sample period will trigger

The trigger string looks like it's configuring for a partial stall of
10ms over a 1s sample period?
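
If I'm reading the PSI trigger format right ("some <stall us>
<window us>"), a 5ms-over-1s trigger matching the comment would look
more like:

	const char		*trigger = "some 5000 1000000";

so either the comment or the string wants updating.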

> + * reclaim algorithms.
> + */
> +static int
> +xfs_buftarg_mempressue_init(

xfs_buftarg_mempressure_init() ?

> +	struct xfs_buftarg	*btp)
> +{
> +	const char		*fname = "/proc/pressure/memory";
> +	const char		*trigger = "some 10000 1000000";
> +	int			error;
> +
> +	btp->bt_psi_fd = open(fname, O_RDWR | O_NONBLOCK);
> +	if (btp->bt_psi_fd < 0) {
> +		perror("open(PSI)");
> +		return -errno;
> +	}
> +	if (write(btp->bt_psi_fd, trigger, strlen(trigger) + 1) !=
> +						strlen(trigger) + 1) {
> +		perror("write(PSI)");
> +		error = -errno;
> +		goto out_close;
> +	}
> +
> +	atomic_set(&btp->bt_low_mem, 0);
> +	init_completion(&btp->bt_low_mem_wait);
> +
> +	/*
> +	 * Now create the monitoring reclaim thread. This will run until the
> +	 * buftarg is torn down.
> +	 */
> +	error = pthread_create(&btp->bt_psi_tid, NULL,
> +				xfs_buftarg_shrinker, btp);
> +	if (error)
> +		goto out_close;
> +
> +	return 0;
> +
> +out_close:
> +	close(btp->bt_psi_fd);
> +	btp->bt_psi_fd = -1;
> +	return error;
> +}
> +
> +
>  struct xfs_buftarg *
>  xfs_buftarg_alloc(
>  	struct xfs_mount	*mp,
> @@ -74,6 +196,8 @@ xfs_buftarg_alloc(
>  	btp->bt_mount = mp;
>  	btp->bt_fd = libxfs_device_to_fd(bdev);
>  	btp->bt_bdev = bdev;
> +	btp->bt_psi_fd = -1;
> +	btp->bt_exiting = false;
>  
>  	if (xfs_buftarg_setsize_early(btp))
>  		goto error_free;
> @@ -84,8 +208,13 @@ xfs_buftarg_alloc(
>  	if (percpu_counter_init(&btp->bt_io_count, 0, GFP_KERNEL))
>  		goto error_lru;
>  
> +	if (xfs_buftarg_mempressue_init(btp))

So what happens if PSI isn't enabled or procfs isn't mounted yet?
xfs_repair just ... fails?  That seems disappointing, particularly if
the admin is trying to fix a dead root fs from the initramfs premount
shell and /proc isn't set up yet.

Hmm, looks like Debian actually /does/ set up procfs nowadays.  Still,
if we're going to add a hard requirement on CONFIG_PSI=y and
CONFIG_PSI_DEFAULT_DISABLED=n, we need to advertise this kind of loudly.

(Personally, I thought that if there's no pressure stall information,
we'd just fall back to not having a shrinker and daring the system to
OOM us like it does now...)
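
Something like this (completely untested sketch, and the function name
is made up) could probe for PSI support up front:

	/*
	 * Sketch only: returns false if /proc/pressure/memory can't be
	 * opened, i.e. procfs isn't mounted or the kernel has no PSI.
	 */
	static bool
	xfs_buftarg_psi_available(void)
	{
		int	fd = open("/proc/pressure/memory", O_RDWR | O_NONBLOCK);

		if (fd < 0)
			return false;
		close(fd);
		return true;
	}

and then xfs_buftarg_alloc() could warn and skip the shrinker setup
when that returns false, instead of failing outright.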

--D

> +		goto error_pcp;
> +
>  	return btp;
>  
> +error_pcp:
> +	percpu_counter_destroy(&btp->bt_io_count);
>  error_lru:
>  	list_lru_destroy(&btp->bt_lru);
>  error_free:
> @@ -97,6 +226,12 @@ void
>  xfs_buftarg_free(
>  	struct xfs_buftarg	*btp)
>  {
> +	btp->bt_exiting = true;
> +	if (btp->bt_psi_tid)
> +		pthread_join(btp->bt_psi_tid, NULL);
> +	if (btp->bt_psi_fd >= 0)
> +		close(btp->bt_psi_fd);
> +
>  	ASSERT(percpu_counter_sum(&btp->bt_io_count) == 0);
>  	percpu_counter_destroy(&btp->bt_io_count);
>  	platform_flush_device(btp->bt_fd, btp->bt_bdev);
> @@ -121,10 +256,15 @@ xfs_buf_allocate_memory(
>  	struct xfs_buf		*bp,
>  	uint			flags)
>  {
> +	struct xfs_buftarg	*btp = bp->b_target;
>  	size_t			size;
>  
> +	/* Throttle allocation while dealing with low memory events */
> +	while (atomic_read(&btp->bt_low_mem))
> +		wait_for_completion(&btp->bt_low_mem_wait);
> +
>  	size = BBTOB(bp->b_length);
> -	bp->b_addr = memalign(bp->b_target->bt_meta_sectorsize, size);
> +	bp->b_addr = memalign(btp->bt_meta_sectorsize, size);
>  	if (!bp->b_addr)
>  		return -ENOMEM;
>  	return 0;
> diff --git a/libxfs/xfs_buftarg.h b/libxfs/xfs_buftarg.h
> index 798980fdafeb..d2ce47e22545 100644
> --- a/libxfs/xfs_buftarg.h
> +++ b/libxfs/xfs_buftarg.h
> @@ -41,7 +41,16 @@ struct xfs_buftarg {
>  
>  	uint32_t		bt_io_count;
>  	unsigned int		flags;
> +
> +	/*
> +	 * Memory pressure (PSI) and cache reclaim infrastructure
> +	 */
>  	struct list_lru		bt_lru;
> +	int			bt_psi_fd;
> +	pthread_t		bt_psi_tid;
> +	bool			bt_exiting;
> +	bool			bt_low_mem;
> +	struct completion	bt_low_mem_wait;
>  };
>  
>  /* We purged a dirty buffer and lost a write. */
> -- 
> 2.28.0
> 
