From: Hugh Dickins <hughd@google.com>
To: Luis Chamberlain <mcgrof@kernel.org>
Cc: hughd@google.com, akpm@linux-foundation.org, willy@infradead.org,
	brauner@kernel.org, linux-mm@kvack.org, p.raghav@samsung.com,
	da.gomez@samsung.com, a.manzanares@samsung.com,
	dave@stgolabs.net, yosryahmed@google.com, keescook@chromium.org,
	patches@lists.linux.dev, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 6/6] shmem: add support to ignore swap
Date: Mon, 17 Apr 2023 22:50:59 -0700 (PDT)
Message-ID: <79eae9fe-7818-a65c-89c6-138b55d609a@google.com>
In-Reply-To: <20230309230545.2930737-7-mcgrof@kernel.org>

On Thu, 9 Mar 2023, Luis Chamberlain wrote:

> When experimenting with shmem, having the option to avoid swap is a
> useful mechanism. One of the *raves* about brd over shmem is that
> you can avoid swap, but that is not really a good reason to use brd if
> we can use shmem instead. brd has its own good reasons to exist,
> but the fact that "tmpfs" doesn't let you avoid swap is not a great
> reason to avoid tmpfs if we can easily add support for it.
> 
> I don't add support for reconfiguring incompatible options, but if
> we really wanted to we could add support for that.
> 
> To avoid swap we use mapping_set_unevictable() upon inode creation,
> and put a WARN_ON_ONCE() stop-gap in shmem_writepage() for reclaim.

I have one big question here, which betrays my ignorance:
I hope that you or Christian can reassure me on this.

tmpfs has fs_flags FS_USERNS_MOUNT.  I know nothing about namespaces,
nothing; but from overhearings, wonder if an ordinary user in a namespace
might be able to mount their own tmpfs with "noswap", and thereby evade
all accounting of the locked memory.

That would be an absolute no-no for this patch; but I assume that even
if so, it can be easily remedied by inserting an appropriate (unknown
to me!) privilege check where the "noswap" option is validated.
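
To make the shape of such a check concrete, here is a small userspace
model. It is only a sketch under stated assumptions: the struct, its
fields and the function name are hypothetical stand-ins, not kernel
code. In the kernel, the two booleans would presumably correspond to
comparing the mount context's user namespace against the initial one
and to a CAP_SYS_ADMIN capability check, at the point where "noswap"
is parsed.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model of the state a mount-option parser could consult.
 * None of these names exist in the kernel; they only illustrate the idea.
 */
struct model_mount_ctx {
	bool in_init_user_ns;   /* mount requested from the initial user namespace */
	bool has_cap_sys_admin; /* requester holds CAP_SYS_ADMIN */
	bool noswap;            /* result: option accepted */
};

/*
 * Accept "noswap" only for privileged mounts in the initial namespace.
 * Returns NULL on success, or an error message if the option is refused.
 */
const char *model_parse_noswap(struct model_mount_ctx *ctx)
{
	if (!ctx->in_init_user_ns || !ctx->has_cap_sys_admin)
		return "noswap is not permitted for unprivileged tmpfs mounts";

	ctx->noswap = true;
	return NULL;
}
```

In this model, a tmpfs mounted by an ordinary user inside a user
namespace never gets noswap set, so its pages stay swappable.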

I did idly wonder what happens with "noswap" when CONFIG_SWAP is not
enabled, or no swap is enabled; but I think it would be a waste of time
and code to worry over doing anything different from whatever behaviour
falls out trivially.
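
As a rough illustration of what "falls out trivially", the early exits
of shmem_writepage() visible in the quoted hunk can be modelled in
userspace as below. This is a sketch, not the kernel function: the enum
and function names are hypothetical, and the real function performs
further checks after these three that are not modelled here.

```c
#include <stdbool.h>

/* Hypothetical outcome labels for the modelled early checks. */
enum model_wp_result {
	MODEL_WP_REDIRTY_WARN, /* page is redirtied, with a one-time warning */
	MODEL_WP_REDIRTY,      /* page is redirtied quietly */
	MODEL_WP_CONTINUE,     /* falls through to later checks (not modelled) */
};

/*
 * Mirrors the order of the three checks in the quoted hunk:
 * reclaim-only, then VM_LOCKED or noswap, then "is there any swap".
 */
enum model_wp_result model_writepage_early(bool for_reclaim, bool vm_locked,
					   bool noswap,
					   unsigned long total_swap_pages)
{
	if (!for_reclaim)
		return MODEL_WP_REDIRTY_WARN;
	if (vm_locked || noswap)
		return MODEL_WP_REDIRTY_WARN;
	if (!total_swap_pages)
		return MODEL_WP_REDIRTY;
	return MODEL_WP_CONTINUE;
}
```

In this model a noswap mount is redirtied before the swap-availability
check is ever reached, so having no swap configured makes no difference
to it.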

You'll be sending a manpage update to Alejandro in due course, I think.

Thanks,
Hugh

> 
> Acked-by: Christian Brauner <brauner@kernel.org>
> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
> ---
>  Documentation/filesystems/tmpfs.rst  |  9 ++++++---
>  Documentation/mm/unevictable-lru.rst |  2 ++
>  include/linux/shmem_fs.h             |  1 +
>  mm/shmem.c                           | 28 +++++++++++++++++++++++++++-
>  4 files changed, 36 insertions(+), 4 deletions(-)
> 
> diff --git a/Documentation/filesystems/tmpfs.rst b/Documentation/filesystems/tmpfs.rst
> index 1ec9a9f8196b..f18f46be5c0c 100644
> --- a/Documentation/filesystems/tmpfs.rst
> +++ b/Documentation/filesystems/tmpfs.rst
> @@ -13,7 +13,8 @@ everything stored therein is lost.
>  
>  tmpfs puts everything into the kernel internal caches and grows and
>  shrinks to accommodate the files it contains and is able to swap
> -unneeded pages out to swap space, and supports THP.
> +unneeded pages out to swap space, if swap was enabled for the tmpfs
> +mount. tmpfs also supports THP.
>  
>  tmpfs extends ramfs with a few userspace configurable options listed and
>  explained further below, some of which can be reconfigured dynamically on the
> @@ -33,8 +34,8 @@ configured in size at initialization and you cannot dynamically resize them.
>  Contrary to brd ramdisks, tmpfs has its own filesystem, it does not rely on the
>  block layer at all.
>  
> -Since tmpfs lives completely in the page cache and on swap, all tmpfs
> -pages will be shown as "Shmem" in /proc/meminfo and "Shared" in
> +Since tmpfs lives completely in the page cache and optionally on swap,
> +all tmpfs pages will be shown as "Shmem" in /proc/meminfo and "Shared" in
>  free(1). Notice that these counters also include shared memory
>  (shmem, see ipcs(1)). The most reliable way to get the count is
>  using df(1) and du(1).
> @@ -83,6 +84,8 @@ nr_inodes  The maximum number of inodes for this instance. The default
>             is half of the number of your physical RAM pages, or (on a
>             machine with highmem) the number of lowmem RAM pages,
>             whichever is the lower.
> +noswap     Disables swap. Remounts must respect the original settings.
> +           By default swap is enabled.
>  =========  ============================================================
>  
>  These parameters accept a suffix k, m or g for kilo, mega and giga and
> diff --git a/Documentation/mm/unevictable-lru.rst b/Documentation/mm/unevictable-lru.rst
> index 92ac5dca420c..d5ac8511eb67 100644
> --- a/Documentation/mm/unevictable-lru.rst
> +++ b/Documentation/mm/unevictable-lru.rst
> @@ -42,6 +42,8 @@ The unevictable list addresses the following classes of unevictable pages:
>  
>   * Those owned by ramfs.
>  
> + * Those owned by tmpfs with the noswap mount option.
> +
>   * Those mapped into SHM_LOCK'd shared memory regions.
>  
>   * Those mapped into VM_LOCKED [mlock()ed] VMAs.
> diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
> index 103d1000a5a2..50bf82b36995 100644
> --- a/include/linux/shmem_fs.h
> +++ b/include/linux/shmem_fs.h
> @@ -45,6 +45,7 @@ struct shmem_sb_info {
>  	kuid_t uid;		    /* Mount uid for root directory */
>  	kgid_t gid;		    /* Mount gid for root directory */
>  	bool full_inums;	    /* If i_ino should be uint or ino_t */
> +	bool noswap;		    /* ignores VM reclaim / swap requests */
>  	ino_t next_ino;		    /* The next per-sb inode number to use */
>  	ino_t __percpu *ino_batch;  /* The next per-cpu inode number to use */
>  	struct mempolicy *mpol;     /* default memory policy for mappings */
> diff --git a/mm/shmem.c b/mm/shmem.c
> index dfd995da77b4..2e122c72b375 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -119,10 +119,12 @@ struct shmem_options {
>  	bool full_inums;
>  	int huge;
>  	int seen;
> +	bool noswap;
>  #define SHMEM_SEEN_BLOCKS 1
>  #define SHMEM_SEEN_INODES 2
>  #define SHMEM_SEEN_HUGE 4
>  #define SHMEM_SEEN_INUMS 8
> +#define SHMEM_SEEN_NOSWAP 16
>  };
>  
>  #ifdef CONFIG_TMPFS
> @@ -1337,6 +1339,7 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
>  	struct address_space *mapping = folio->mapping;
>  	struct inode *inode = mapping->host;
>  	struct shmem_inode_info *info = SHMEM_I(inode);
> +	struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
>  	swp_entry_t swap;
>  	pgoff_t index;
>  
> @@ -1350,7 +1353,7 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
>  	if (WARN_ON_ONCE(!wbc->for_reclaim))
>  		goto redirty;
>  
> -	if (WARN_ON_ONCE(info->flags & VM_LOCKED))
> +	if (WARN_ON_ONCE((info->flags & VM_LOCKED) || sbinfo->noswap))
>  		goto redirty;
>  
>  	if (!total_swap_pages)
> @@ -2487,6 +2490,8 @@ static struct inode *shmem_get_inode(struct mnt_idmap *idmap, struct super_block
>  			shmem_set_inode_flags(inode, info->fsflags);
>  		INIT_LIST_HEAD(&info->shrinklist);
>  		INIT_LIST_HEAD(&info->swaplist);
> +		if (sbinfo->noswap)
> +			mapping_set_unevictable(inode->i_mapping);
>  		simple_xattrs_init(&info->xattrs);
>  		cache_no_acl(inode);
>  		mapping_set_large_folios(inode->i_mapping);
> @@ -3574,6 +3579,7 @@ enum shmem_param {
>  	Opt_uid,
>  	Opt_inode32,
>  	Opt_inode64,
> +	Opt_noswap,
>  };
>  
>  static const struct constant_table shmem_param_enums_huge[] = {
> @@ -3595,6 +3601,7 @@ const struct fs_parameter_spec shmem_fs_parameters[] = {
>  	fsparam_u32   ("uid",		Opt_uid),
>  	fsparam_flag  ("inode32",	Opt_inode32),
>  	fsparam_flag  ("inode64",	Opt_inode64),
> +	fsparam_flag  ("noswap",	Opt_noswap),
>  	{}
>  };
>  
> @@ -3678,6 +3685,10 @@ static int shmem_parse_one(struct fs_context *fc, struct fs_parameter *param)
>  		ctx->full_inums = true;
>  		ctx->seen |= SHMEM_SEEN_INUMS;
>  		break;
> +	case Opt_noswap:
> +		ctx->noswap = true;
> +		ctx->seen |= SHMEM_SEEN_NOSWAP;
> +		break;
>  	}
>  	return 0;
>  
> @@ -3776,6 +3787,14 @@ static int shmem_reconfigure(struct fs_context *fc)
>  		err = "Current inum too high to switch to 32-bit inums";
>  		goto out;
>  	}
> +	if ((ctx->seen & SHMEM_SEEN_NOSWAP) && ctx->noswap && !sbinfo->noswap) {
> +		err = "Cannot disable swap on remount";
> +		goto out;
> +	}
> +	if (!(ctx->seen & SHMEM_SEEN_NOSWAP) && !ctx->noswap && sbinfo->noswap) {
> +		err = "Cannot enable swap on remount if it was disabled on first mount";
> +		goto out;
> +	}
>  
>  	if (ctx->seen & SHMEM_SEEN_HUGE)
>  		sbinfo->huge = ctx->huge;
> @@ -3796,6 +3815,10 @@ static int shmem_reconfigure(struct fs_context *fc)
>  		sbinfo->mpol = ctx->mpol;	/* transfers initial ref */
>  		ctx->mpol = NULL;
>  	}
> +
> +	if (ctx->noswap)
> +		sbinfo->noswap = true;
> +
>  	raw_spin_unlock(&sbinfo->stat_lock);
>  	mpol_put(mpol);
>  	return 0;
> @@ -3850,6 +3873,8 @@ static int shmem_show_options(struct seq_file *seq, struct dentry *root)
>  		seq_printf(seq, ",huge=%s", shmem_format_huge(sbinfo->huge));
>  #endif
>  	shmem_show_mpol(seq, sbinfo->mpol);
> +	if (sbinfo->noswap)
> +		seq_printf(seq, ",noswap");
>  	return 0;
>  }
>  
> @@ -3893,6 +3918,7 @@ static int shmem_fill_super(struct super_block *sb, struct fs_context *fc)
>  			ctx->inodes = shmem_default_max_inodes();
>  		if (!(ctx->seen & SHMEM_SEEN_INUMS))
>  			ctx->full_inums = IS_ENABLED(CONFIG_TMPFS_INODE64);
> +		sbinfo->noswap = ctx->noswap;
>  	} else {
>  		sb->s_flags |= SB_NOUSER;
>  	}
> -- 
> 2.39.1
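
For reference, the remount rules added to shmem_reconfigure() in the
quoted patch can be summarised by the userspace model below. It is a
sketch with hypothetical names, and it assumes (as in the quoted
shmem_parse_one() hunk) that ctx->noswap and SHMEM_SEEN_NOSWAP are
always set together, so a single boolean stands for both.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Model of the noswap checks on remount.
 * sb_noswap:      noswap state recorded at first mount (sbinfo->noswap)
 * remount_noswap: whether "noswap" was passed on this remount
 * Returns NULL if the remount is allowed, otherwise the error string
 * used in the quoted patch.
 */
const char *model_reconfigure_noswap(bool sb_noswap, bool remount_noswap)
{
	if (remount_noswap && !sb_noswap)
		return "Cannot disable swap on remount";
	if (!remount_noswap && sb_noswap)
		return "Cannot enable swap on remount if it was disabled on first mount";
	return NULL;
}
```

Under that assumption the two checks reduce to one rule: the remount
must state the same noswap choice as the original mount.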


