linux-mm.kvack.org archive mirror
* [RFC 0/2] tmpfs: add the option to disable swap
@ 2023-02-07  2:52 Luis Chamberlain
  2023-02-07  2:52 ` [RFC 1/2] shmem: set shmem_writepage() variables early Luis Chamberlain
  2023-02-07  2:52 ` [RFC 2/2] shmem: add support to ignore swap Luis Chamberlain
  0 siblings, 2 replies; 12+ messages in thread
From: Luis Chamberlain @ 2023-02-07  2:52 UTC (permalink / raw)
  To: hughd, akpm, willy
  Cc: linux-mm, p.raghav, dave, a.manzanares, linux-kernel, Luis Chamberlain

Many folks suggest that using tmpfs is not great because it can use
swap. That's not a good reason to avoid tmpfs; what's missing is just
an option to let you disable swap. This series adds that option, and
also lets users experiment with it.

Reconfiguring is not supported, so whatever swap options you set on
first mount are the ones you keep. Adding support to disable or enable
swap on the fly *is* possible, but I'm not sure we need that yet.

Matthew -- the first patch just makes the second easier to read, but
it does raise a few questions about the semantics of the folio and
the inode after splitting a huge page, so review of that would be
greatly appreciated.

To test I've used an 8 vCPU, 4 GiB libvirt guest on linux-next, set up
with kdevops [0].

I'm doing this work as part of future experimentation with tmpfs and
the page cache, but given that a common complaint about tmpfs is the
inability to work without swap, I figured this might be useful to
others. It should also prove useful to folks who ask for patches from
the future experimentation we are doing.

To see if you hit swap:

mkswap /dev/nvme2n1
swapon /dev/nvme2n1
free -h
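
As an extra cross-check (not part of this series, just a convenience),
the same counters can be read directly from the kernel via
/proc/meminfo:

```shell
# System-wide swap counters as reported by the kernel, in kiB
grep -E '^(SwapTotal|SwapFree):' /proc/meminfo
```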

With swap - what we see today
=============================
mount -t tmpfs            -o size=5G           tmpfs /data-tmpfs/
dd if=/dev/urandom of=/data-tmpfs/5g-rand2 bs=1G count=5
free -h
               total        used        free      shared  buff/cache   available
Mem:           3.7Gi       2.6Gi       1.2Gi       2.2Gi       2.2Gi       1.2Gi
Swap:           99Gi       2.8Gi        97Gi


Without swap
=============

free -h
               total        used        free      shared  buff/cache   available
Mem:           3.7Gi       387Mi       3.4Gi       2.1Mi        57Mi       3.3Gi
Swap:           99Gi          0B        99Gi
mount -t tmpfs            -o size=5G -o noswap tmpfs /data-tmpfs/
dd if=/dev/urandom of=/data-tmpfs/5g-rand2 bs=1G count=5
free -h
               total        used        free      shared  buff/cache   available
Mem:           3.7Gi       2.6Gi       1.2Gi       2.3Gi       2.3Gi       1.1Gi
Swap:           99Gi        21Mi        99Gi


The mix and match remount testing
=================================

# Cannot disable swap after it was first enabled:
mount -t tmpfs            -o size=5G           tmpfs /data-tmpfs/
mount -t tmpfs -o remount -o size=5G -o noswap tmpfs /data-tmpfs/
mount: /data-tmpfs: mount point not mounted or bad option.
       dmesg(1) may have more information after failed mount system call.
dmesg -c
tmpfs: Cannot disable swap on remount

# Remount with the same noswap option is OK:
mount -t tmpfs            -o size=5G -o noswap tmpfs /data-tmpfs/
mount -t tmpfs -o remount -o size=5G -o noswap tmpfs /data-tmpfs/
dmesg -c

# Trying to enable swap with a remount after it was first disabled:
mount -t tmpfs            -o size=5G -o noswap tmpfs /data-tmpfs/
mount -t tmpfs -o remount -o size=5G           tmpfs /data-tmpfs/
mount: /data-tmpfs: mount point not mounted or bad option.
       dmesg(1) may have more information after failed mount system call.
dmesg -c
tmpfs: Cannot enable swap on remount if it was disabled on first mount

[0] https://github.com/linux-kdevops/kdevops

Luis Chamberlain (2):
  shmem: set shmem_writepage() variables early
  shmem: add support to ignore swap

 include/linux/shmem_fs.h |  1 +
 mm/shmem.c               | 33 ++++++++++++++++++++++++++++++---
 2 files changed, 31 insertions(+), 3 deletions(-)

-- 
2.39.0




* [RFC 1/2] shmem: set shmem_writepage() variables early
  2023-02-07  2:52 [RFC 0/2] tmpfs: add the option to disable swap Luis Chamberlain
@ 2023-02-07  2:52 ` Luis Chamberlain
  2023-02-07  3:52   ` Matthew Wilcox
  2023-02-07  2:52 ` [RFC 2/2] shmem: add support to ignore swap Luis Chamberlain
  1 sibling, 1 reply; 12+ messages in thread
From: Luis Chamberlain @ 2023-02-07  2:52 UTC (permalink / raw)
  To: hughd, akpm, willy
  Cc: linux-mm, p.raghav, dave, a.manzanares, linux-kernel, Luis Chamberlain

shmem_writepage() sets up variables typically used *after* a possible
huge page split. However, even if that split does happen, the address
space mapping should not change, so it should be safe to set it up
from the beginning.

The folio should always be locked from the start as well. It was not
clear, however, whether the folio address or the inode can / should
change after a split.

This commit makes no functional changes other than a double check on
the folio locking, which might be superfluous. This change should make
the subsequent patch easier to review.

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 mm/shmem.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 28f3c699c8ce..a2c6aa11aab8 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1332,11 +1332,13 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
 {
 	struct folio *folio = page_folio(page);
 	struct shmem_inode_info *info;
-	struct address_space *mapping;
-	struct inode *inode;
+	struct address_space *mapping = folio->mapping;
+	struct inode *inode = mapping->host;
 	swp_entry_t swap;
 	pgoff_t index;
 
+	BUG_ON(!folio_test_locked(folio));
+
 	/*
 	 * If /sys/kernel/mm/transparent_hugepage/shmem_enabled is "always" or
 	 * "force", drivers/gpu/drm/i915/gem/i915_gem_shmem.c gets huge pages,
@@ -1351,8 +1353,8 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
 		folio_clear_dirty(folio);
 	}
 
+	/* Can the folio or first inode change after a split? */
 	BUG_ON(!folio_test_locked(folio));
-	mapping = folio->mapping;
 	index = folio->index;
 	inode = mapping->host;
 	info = SHMEM_I(inode);
-- 
2.39.0




* [RFC 2/2] shmem: add support to ignore swap
  2023-02-07  2:52 [RFC 0/2] tmpfs: add the option to disable swap Luis Chamberlain
  2023-02-07  2:52 ` [RFC 1/2] shmem: set shmem_writepage() variables early Luis Chamberlain
@ 2023-02-07  2:52 ` Luis Chamberlain
  2023-02-07  4:01   ` Matthew Wilcox
  1 sibling, 1 reply; 12+ messages in thread
From: Luis Chamberlain @ 2023-02-07  2:52 UTC (permalink / raw)
  To: hughd, akpm, willy
  Cc: linux-mm, p.raghav, dave, a.manzanares, linux-kernel, Luis Chamberlain

While doing experiments with shmem, having the option to avoid swap
becomes a useful mechanism. One of the *raves* about brd over shmem
is that you can avoid swap, but that's not really a good reason to use
brd if we can instead use shmem. brd has its own good reasons to
exist, but the fact that "tmpfs" doesn't let you disable swap is not a
great reason to avoid it if we can easily add support for it.

I don't add support for reconfiguring incompatible options, but if
we really wanted to we could add support for that.

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 include/linux/shmem_fs.h |  1 +
 mm/shmem.c               | 25 +++++++++++++++++++++++++
 2 files changed, 26 insertions(+)

diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
index d09d54be4ffd..98a7d53f6cc5 100644
--- a/include/linux/shmem_fs.h
+++ b/include/linux/shmem_fs.h
@@ -45,6 +45,7 @@ struct shmem_sb_info {
 	kuid_t uid;		    /* Mount uid for root directory */
 	kgid_t gid;		    /* Mount gid for root directory */
 	bool full_inums;	    /* If i_ino should be uint or ino_t */
+	bool noswap;	    	    /* ignores VM reclaim / swap requests */
 	ino_t next_ino;		    /* The next per-sb inode number to use */
 	ino_t __percpu *ino_batch;  /* The next per-cpu inode number to use */
 	struct mempolicy *mpol;     /* default memory policy for mappings */
diff --git a/mm/shmem.c b/mm/shmem.c
index a2c6aa11aab8..92aa927cf569 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -116,10 +116,12 @@ struct shmem_options {
 	bool full_inums;
 	int huge;
 	int seen;
+	bool noswap;
 #define SHMEM_SEEN_BLOCKS 1
 #define SHMEM_SEEN_INODES 2
 #define SHMEM_SEEN_HUGE 4
 #define SHMEM_SEEN_INUMS 8
+#define SHMEM_SEEN_NOSWAP 16
 };
 
 #ifdef CONFIG_TMPFS
@@ -1334,11 +1336,15 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
 	struct shmem_inode_info *info;
 	struct address_space *mapping = folio->mapping;
 	struct inode *inode = mapping->host;
+	struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
 	swp_entry_t swap;
 	pgoff_t index;
 
 	BUG_ON(!folio_test_locked(folio));
 
+	if (wbc->for_reclaim && unlikely(sbinfo->noswap))
+		return AOP_WRITEPAGE_ACTIVATE;
+
 	/*
 	 * If /sys/kernel/mm/transparent_hugepage/shmem_enabled is "always" or
 	 * "force", drivers/gpu/drm/i915/gem/i915_gem_shmem.c gets huge pages,
@@ -3465,6 +3471,7 @@ enum shmem_param {
 	Opt_uid,
 	Opt_inode32,
 	Opt_inode64,
+	Opt_noswap,
 };
 
 static const struct constant_table shmem_param_enums_huge[] = {
@@ -3486,6 +3493,7 @@ const struct fs_parameter_spec shmem_fs_parameters[] = {
 	fsparam_u32   ("uid",		Opt_uid),
 	fsparam_flag  ("inode32",	Opt_inode32),
 	fsparam_flag  ("inode64",	Opt_inode64),
+	fsparam_flag  ("noswap",	Opt_noswap),
 	{}
 };
 
@@ -3569,6 +3577,10 @@ static int shmem_parse_one(struct fs_context *fc, struct fs_parameter *param)
 		ctx->full_inums = true;
 		ctx->seen |= SHMEM_SEEN_INUMS;
 		break;
+	case Opt_noswap:
+		ctx->noswap = true;
+		ctx->seen |= SHMEM_SEEN_NOSWAP;
+		break;
 	}
 	return 0;
 
@@ -3667,6 +3679,14 @@ static int shmem_reconfigure(struct fs_context *fc)
 		err = "Current inum too high to switch to 32-bit inums";
 		goto out;
 	}
+	if ((ctx->seen & SHMEM_SEEN_NOSWAP) && ctx->noswap && !sbinfo->noswap) {
+		err = "Cannot disable swap on remount";
+		goto out;
+	}
+	if (!(ctx->seen & SHMEM_SEEN_NOSWAP) && !ctx->noswap && sbinfo->noswap) {
+		err = "Cannot enable swap on remount if it was disabled on first mount";
+		goto out;
+	}
 
 	if (ctx->seen & SHMEM_SEEN_HUGE)
 		sbinfo->huge = ctx->huge;
@@ -3687,6 +3707,10 @@ static int shmem_reconfigure(struct fs_context *fc)
 		sbinfo->mpol = ctx->mpol;	/* transfers initial ref */
 		ctx->mpol = NULL;
 	}
+
+	if (ctx->noswap)
+		sbinfo->noswap = true;
+
 	raw_spin_unlock(&sbinfo->stat_lock);
 	mpol_put(mpol);
 	return 0;
@@ -3784,6 +3808,7 @@ static int shmem_fill_super(struct super_block *sb, struct fs_context *fc)
 			ctx->inodes = shmem_default_max_inodes();
 		if (!(ctx->seen & SHMEM_SEEN_INUMS))
 			ctx->full_inums = IS_ENABLED(CONFIG_TMPFS_INODE64);
+		sbinfo->noswap = ctx->noswap;
 	} else {
 		sb->s_flags |= SB_NOUSER;
 	}
-- 
2.39.0




* Re: [RFC 1/2] shmem: set shmem_writepage() variables early
  2023-02-07  2:52 ` [RFC 1/2] shmem: set shmem_writepage() variables early Luis Chamberlain
@ 2023-02-07  3:52   ` Matthew Wilcox
  2023-02-08 16:08     ` Luis Chamberlain
  0 siblings, 1 reply; 12+ messages in thread
From: Matthew Wilcox @ 2023-02-07  3:52 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: hughd, akpm, linux-mm, p.raghav, dave, a.manzanares, linux-kernel

On Mon, Feb 06, 2023 at 06:52:58PM -0800, Luis Chamberlain wrote:
> shmem_writepage() sets up variables typically used *after* a possible
> huge page split. However even if that does happen the address space
> mapping should not change. So it should be safe to set that from
> the beginning.

Yes, we can get mapping from the folio early.  It doesn't change
on split.

> The folio should always be locked from the start as well. It however
> was not clear if the folio address can / should change, as well as
> the first inode.
> 
> This commit makes no functional changes other a double check on the
> folio locking which might be superflous. This change should help make
> the subsequent patch easier to review.

You don't need to check that the folio's locked, and you don't
need to reassign inode after the split.

> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
> ---
>  mm/shmem.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 28f3c699c8ce..a2c6aa11aab8 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -1332,11 +1332,13 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
>  {
>  	struct folio *folio = page_folio(page);
>  	struct shmem_inode_info *info;
> -	struct address_space *mapping;
> -	struct inode *inode;
> +	struct address_space *mapping = folio->mapping;
> +	struct inode *inode = mapping->host;
>  	swp_entry_t swap;
>  	pgoff_t index;
>  
> +	BUG_ON(!folio_test_locked(folio));
> +
>  	/*
>  	 * If /sys/kernel/mm/transparent_hugepage/shmem_enabled is "always" or
>  	 * "force", drivers/gpu/drm/i915/gem/i915_gem_shmem.c gets huge pages,
> @@ -1351,8 +1353,8 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
>  		folio_clear_dirty(folio);
>  	}
>  
> +	/* Can the folio or first inode change on after a split? */
>  	BUG_ON(!folio_test_locked(folio));
> -	mapping = folio->mapping;
>  	index = folio->index;
>  	inode = mapping->host;
>  	info = SHMEM_I(inode);
> -- 
> 2.39.0
> 



* Re: [RFC 2/2] shmem: add support to ignore swap
  2023-02-07  2:52 ` [RFC 2/2] shmem: add support to ignore swap Luis Chamberlain
@ 2023-02-07  4:01   ` Matthew Wilcox
  2023-02-08 16:01     ` Luis Chamberlain
  0 siblings, 1 reply; 12+ messages in thread
From: Matthew Wilcox @ 2023-02-07  4:01 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: hughd, akpm, linux-mm, p.raghav, dave, a.manzanares, linux-kernel

On Mon, Feb 06, 2023 at 06:52:59PM -0800, Luis Chamberlain wrote:
> @@ -1334,11 +1336,15 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
>  	struct shmem_inode_info *info;
>  	struct address_space *mapping = folio->mapping;
>  	struct inode *inode = mapping->host;
> +	struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
>  	swp_entry_t swap;
>  	pgoff_t index;
>  
>  	BUG_ON(!folio_test_locked(folio));
>  
> +	if (wbc->for_reclaim && unlikely(sbinfo->noswap))
> +		return AOP_WRITEPAGE_ACTIVATE;

Not sure this is the best way to handle this.  We'll still incur the
overhead of tracking shmem pages on the LRU, only to fail to write them
out when the VM thinks we should get rid of them.  We'd be better off
not putting them on the LRU in the first place.



* Re: [RFC 2/2] shmem: add support to ignore swap
  2023-02-07  4:01   ` Matthew Wilcox
@ 2023-02-08 16:01     ` Luis Chamberlain
  2023-02-08 17:45       ` Matthew Wilcox
  0 siblings, 1 reply; 12+ messages in thread
From: Luis Chamberlain @ 2023-02-08 16:01 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: hughd, akpm, linux-mm, p.raghav, dave, a.manzanares, linux-kernel

On Tue, Feb 07, 2023 at 04:01:51AM +0000, Matthew Wilcox wrote:
> On Mon, Feb 06, 2023 at 06:52:59PM -0800, Luis Chamberlain wrote:
> > @@ -1334,11 +1336,15 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
> >  	struct shmem_inode_info *info;
> >  	struct address_space *mapping = folio->mapping;
> >  	struct inode *inode = mapping->host;
> > +	struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
> >  	swp_entry_t swap;
> >  	pgoff_t index;
> >  
> >  	BUG_ON(!folio_test_locked(folio));
> >  
> > +	if (wbc->for_reclaim && unlikely(sbinfo->noswap))
> > +		return AOP_WRITEPAGE_ACTIVATE;
> 
> Not sure this is the best way to handle this.  We'll still incur the
> oevrhead of tracking shmem pages on the LRU, only to fail to write them
> out when the VM thinks we should get rid of them.  We'd be better off
> not putting them on the LRU in the first place.

Ah, makes sense. In effect then, if we do that, on reclaim we should
even be able to WARN_ON(sbinfo->noswap), assuming we did everything
right.

Hrm, we have invalidate_mapping_pages(mapping, 0, -1) but that seems a
bit too late; how about d_mark_dontcache() on shmem_get_inode() instead?

  Luis



* Re: [RFC 1/2] shmem: set shmem_writepage() variables early
  2023-02-07  3:52   ` Matthew Wilcox
@ 2023-02-08 16:08     ` Luis Chamberlain
  0 siblings, 0 replies; 12+ messages in thread
From: Luis Chamberlain @ 2023-02-08 16:08 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: hughd, akpm, linux-mm, p.raghav, dave, a.manzanares, linux-kernel

On Tue, Feb 07, 2023 at 03:52:53AM +0000, Matthew Wilcox wrote:
> On Mon, Feb 06, 2023 at 06:52:58PM -0800, Luis Chamberlain wrote:
> > shmem_writepage() sets up variables typically used *after* a possible
> > huge page split. However even if that does happen the address space
> > mapping should not change. So it should be safe to set that from
> > the beginning.
> 
> Yes, we can get mapping from the folio early.  It doesn't change
> on split.

Great.

> > The folio should always be locked from the start as well. It however
> > was not clear if the folio address can / should change, as well as
> > the first inode.
> > 
> > This commit makes no functional changes other a double check on the
> > folio locking which might be superflous. This change should help make
> > the subsequent patch easier to review.
> 
> You don't need to check that the folio's locked,

That BUG_ON() has been in shmem since linux-history days, so removing
it is probably best done as a separate patch.

> and you don't
> need to reassign inode after the split.

Great! Thanks for this confirmation!

  Luis



* Re: [RFC 2/2] shmem: add support to ignore swap
  2023-02-08 16:01     ` Luis Chamberlain
@ 2023-02-08 17:45       ` Matthew Wilcox
  2023-02-08 20:33         ` Yosry Ahmed
  0 siblings, 1 reply; 12+ messages in thread
From: Matthew Wilcox @ 2023-02-08 17:45 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: hughd, akpm, linux-mm, p.raghav, dave, a.manzanares, linux-kernel

On Wed, Feb 08, 2023 at 08:01:01AM -0800, Luis Chamberlain wrote:
> On Tue, Feb 07, 2023 at 04:01:51AM +0000, Matthew Wilcox wrote:
> > On Mon, Feb 06, 2023 at 06:52:59PM -0800, Luis Chamberlain wrote:
> > > @@ -1334,11 +1336,15 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
> > >  	struct shmem_inode_info *info;
> > >  	struct address_space *mapping = folio->mapping;
> > >  	struct inode *inode = mapping->host;
> > > +	struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
> > >  	swp_entry_t swap;
> > >  	pgoff_t index;
> > >  
> > >  	BUG_ON(!folio_test_locked(folio));
> > >  
> > > +	if (wbc->for_reclaim && unlikely(sbinfo->noswap))
> > > +		return AOP_WRITEPAGE_ACTIVATE;
> > 
> > Not sure this is the best way to handle this.  We'll still incur the
> > oevrhead of tracking shmem pages on the LRU, only to fail to write them
> > out when the VM thinks we should get rid of them.  We'd be better off
> > not putting them on the LRU in the first place.
> 
> Ah, makes sense, so in effect then if we do that then on reclaim
> we should be able to even WARN_ON(sbinfo->noswap) assuming we did
> everthing right.
> 
> Hrm, we have invalidate_mapping_pages(mapping, 0, -1) but that seems a bit
> too late how about d_mark_dontcache() on shmem_get_inode() instead?

I was thinking that the two calls to folio_add_lru() in mm/shmem.c
should be conditional on sbinfo->noswap.
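
For illustration, a rough sketch of that idea. This is a hypothetical,
untested fragment, not a patch from this thread, and whether the call
sites can simply skip the LRU this way is exactly what is being
debated:

```c
/*
 * Hypothetical sketch only: at each of the folio_add_lru() call
 * sites in mm/shmem.c, skip the LRU entirely when the superblock
 * was mounted with noswap, so reclaim never walks these folios.
 */
struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);

if (!sbinfo->noswap)
	folio_add_lru(folio);
```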



* Re: [RFC 2/2] shmem: add support to ignore swap
  2023-02-08 17:45       ` Matthew Wilcox
@ 2023-02-08 20:33         ` Yosry Ahmed
  2023-02-23  0:53           ` Luis Chamberlain
  0 siblings, 1 reply; 12+ messages in thread
From: Yosry Ahmed @ 2023-02-08 20:33 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Luis Chamberlain, hughd, akpm, linux-mm, p.raghav, dave,
	a.manzanares, linux-kernel

On Wed, Feb 8, 2023 at 9:45 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Wed, Feb 08, 2023 at 08:01:01AM -0800, Luis Chamberlain wrote:
> > On Tue, Feb 07, 2023 at 04:01:51AM +0000, Matthew Wilcox wrote:
> > > On Mon, Feb 06, 2023 at 06:52:59PM -0800, Luis Chamberlain wrote:
> > > > @@ -1334,11 +1336,15 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
> > > >   struct shmem_inode_info *info;
> > > >   struct address_space *mapping = folio->mapping;
> > > >   struct inode *inode = mapping->host;
> > > > + struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
> > > >   swp_entry_t swap;
> > > >   pgoff_t index;
> > > >
> > > >   BUG_ON(!folio_test_locked(folio));
> > > >
> > > > + if (wbc->for_reclaim && unlikely(sbinfo->noswap))
> > > > +         return AOP_WRITEPAGE_ACTIVATE;
> > >
> > > Not sure this is the best way to handle this.  We'll still incur the
> > > oevrhead of tracking shmem pages on the LRU, only to fail to write them
> > > out when the VM thinks we should get rid of them.  We'd be better off
> > > not putting them on the LRU in the first place.
> >
> > Ah, makes sense, so in effect then if we do that then on reclaim
> > we should be able to even WARN_ON(sbinfo->noswap) assuming we did
> > everthing right.
> >
> > Hrm, we have invalidate_mapping_pages(mapping, 0, -1) but that seems a bit
> > too late how about d_mark_dontcache() on shmem_get_inode() instead?
>
> I was thinking that the two calls to folio_add_lru() in mm/shmem.c
> should be conditional on sbinfo->noswap.
>

Wouldn't this cause the folio to not show up in any lru lists, even
the unevictable one, which may be a strange discrepancy?

Perhaps we can do something like shmem_lock(), which calls
mapping_set_unevictable(); that will make folio_evictable() return
false and the LRU code will take care of the rest?
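
A minimal sketch of that alternative. Again hypothetical and untested;
the placement at inode creation time in shmem_get_inode() is an
assumption based on the discussion elsewhere in the thread:

```c
/*
 * Hypothetical sketch only: mark the whole mapping unevictable when
 * the superblock was mounted with noswap, so folio_evictable()
 * reports these folios as unevictable and the LRU code keeps them
 * on the unevictable list instead of trying to reclaim them.
 */
if (SHMEM_SB(inode->i_sb)->noswap)
	mapping_set_unevictable(inode->i_mapping);
```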



* Re: [RFC 2/2] shmem: add support to ignore swap
  2023-02-08 20:33         ` Yosry Ahmed
@ 2023-02-23  0:53           ` Luis Chamberlain
  2023-02-23  1:04             ` Yosry Ahmed
  0 siblings, 1 reply; 12+ messages in thread
From: Luis Chamberlain @ 2023-02-23  0:53 UTC (permalink / raw)
  To: Yosry Ahmed, Eric W. Biederman
  Cc: Matthew Wilcox, hughd, akpm, linux-mm, p.raghav, dave,
	a.manzanares, linux-kernel

On Wed, Feb 08, 2023 at 12:33:37PM -0800, Yosry Ahmed wrote:
> On Wed, Feb 8, 2023 at 9:45 AM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > On Wed, Feb 08, 2023 at 08:01:01AM -0800, Luis Chamberlain wrote:
> > > On Tue, Feb 07, 2023 at 04:01:51AM +0000, Matthew Wilcox wrote:
> > > > On Mon, Feb 06, 2023 at 06:52:59PM -0800, Luis Chamberlain wrote:
> > > > > @@ -1334,11 +1336,15 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
> > > > >   struct shmem_inode_info *info;
> > > > >   struct address_space *mapping = folio->mapping;
> > > > >   struct inode *inode = mapping->host;
> > > > > + struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
> > > > >   swp_entry_t swap;
> > > > >   pgoff_t index;
> > > > >
> > > > >   BUG_ON(!folio_test_locked(folio));
> > > > >
> > > > > + if (wbc->for_reclaim && unlikely(sbinfo->noswap))
> > > > > +         return AOP_WRITEPAGE_ACTIVATE;
> > > >
> > > > Not sure this is the best way to handle this.  We'll still incur the
> > > > oevrhead of tracking shmem pages on the LRU, only to fail to write them
> > > > out when the VM thinks we should get rid of them.  We'd be better off
> > > > not putting them on the LRU in the first place.
> > >
> > > Ah, makes sense, so in effect then if we do that then on reclaim
> > > we should be able to even WARN_ON(sbinfo->noswap) assuming we did
> > > everthing right.
> > >
> > > Hrm, we have invalidate_mapping_pages(mapping, 0, -1) but that seems a bit
> > > too late how about d_mark_dontcache() on shmem_get_inode() instead?
> >
> > I was thinking that the two calls to folio_add_lru() in mm/shmem.c
> > should be conditional on sbinfo->noswap.
> >
> 
> Wouldn't this cause the folio to not show up in any lru lists, even
> the unevictable one, which may be a strange discrepancy?
> 
> Perhaps we can do something like shmem_lock(), which calls
> mapping_set_unevictable(), which will make folio_evictable() return
> true and the LRUs code will take care of the rest?

If shmem_lock() should take care of that, is that because writepages()
should not happen, or because we have that info->flags & VM_LOCKED stop
gap on writepages()? If the former, then shouldn't we WARN_ON_ONCE()
if writepages() is called with info->flags & VM_LOCKED?

While I see the value in mapping_set_unevictable(), I am not sure I see
the point in using shmem_lock(). I don't see why we should constrain
the tmpfs noswap option to RLIMIT_MEMLOCK.

Please correct me if I'm wrong, but the limit seems to be designed for
files / IPC / unprivileged perf limits. On the contrary, we'd bump the
count for each new inode. Using shmem_lock() would also complicate
inode allocation on shmem, as we'd have to unwind on failure from
user_shm_lock(). It would also beg the question of when to capture a
ucount for an inode: should we just share one for the superblock at
shmem_fill_super(), or do we really need to capture it at every single
inode creation? In theory we could end up with different limits.

So why not just use mapping_set_unevictable() alone for this use case?

  Luis



* Re: [RFC 2/2] shmem: add support to ignore swap
  2023-02-23  0:53           ` Luis Chamberlain
@ 2023-02-23  1:04             ` Yosry Ahmed
  2023-02-23  1:35               ` Luis Chamberlain
  0 siblings, 1 reply; 12+ messages in thread
From: Yosry Ahmed @ 2023-02-23  1:04 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: Eric W. Biederman, Matthew Wilcox, hughd, akpm, linux-mm,
	p.raghav, dave, a.manzanares, linux-kernel

On Wed, Feb 22, 2023 at 4:53 PM Luis Chamberlain <mcgrof@kernel.org> wrote:
>
> On Wed, Feb 08, 2023 at 12:33:37PM -0800, Yosry Ahmed wrote:
> > On Wed, Feb 8, 2023 at 9:45 AM Matthew Wilcox <willy@infradead.org> wrote:
> > >
> > > On Wed, Feb 08, 2023 at 08:01:01AM -0800, Luis Chamberlain wrote:
> > > > On Tue, Feb 07, 2023 at 04:01:51AM +0000, Matthew Wilcox wrote:
> > > > > On Mon, Feb 06, 2023 at 06:52:59PM -0800, Luis Chamberlain wrote:
> > > > > > @@ -1334,11 +1336,15 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
> > > > > >   struct shmem_inode_info *info;
> > > > > >   struct address_space *mapping = folio->mapping;
> > > > > >   struct inode *inode = mapping->host;
> > > > > > + struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
> > > > > >   swp_entry_t swap;
> > > > > >   pgoff_t index;
> > > > > >
> > > > > >   BUG_ON(!folio_test_locked(folio));
> > > > > >
> > > > > > + if (wbc->for_reclaim && unlikely(sbinfo->noswap))
> > > > > > +         return AOP_WRITEPAGE_ACTIVATE;
> > > > >
> > > > > Not sure this is the best way to handle this.  We'll still incur the
> > > > > oevrhead of tracking shmem pages on the LRU, only to fail to write them
> > > > > out when the VM thinks we should get rid of them.  We'd be better off
> > > > > not putting them on the LRU in the first place.
> > > >
> > > > Ah, makes sense, so in effect then if we do that then on reclaim
> > > > we should be able to even WARN_ON(sbinfo->noswap) assuming we did
> > > > everthing right.
> > > >
> > > > Hrm, we have invalidate_mapping_pages(mapping, 0, -1) but that seems a bit
> > > > too late how about d_mark_dontcache() on shmem_get_inode() instead?
> > >
> > > I was thinking that the two calls to folio_add_lru() in mm/shmem.c
> > > should be conditional on sbinfo->noswap.
> > >
> >
> > Wouldn't this cause the folio to not show up in any lru lists, even
> > the unevictable one, which may be a strange discrepancy?
> >
> > Perhaps we can do something like shmem_lock(), which calls
> > mapping_set_unevictable(), which will make folio_evictable() return
> > true and the LRUs code will take care of the rest?
>
> If shmem_lock() should take care of that is that because writepages()
> should not happen or because we have that info->flags & VM_LOCKED stop
> gap on writepages()? If the earlier then shouldn't we WARN_ON_ONCE()
> if writepages() is called on info->flags & VM_LOCKED?
>
> While I see the value in mapping_set_unevictable() I am not sure I see
> the point in using shmem_lock(). I don't see why we should constrain
> noswap tmpfs option to RLIMIT_MEMLOCK
>
> Please correct me if I'm wrong but the limit seem to be designed for
> files / IPC / unprivileged perf limits. On the contrary, we'd bump the
> count for each new inode. Using shmem_lock() would  also complicate the
> inode allocation on shmem as we'd have to unwind on failure from the
> user_shm_lock(). It would also beg the question of when to capture a
> ucount for an inode, should we just share one for the superblock at
> shmem_fill_super() or do we really need to capture it at every single
> inode creation? In theory we could end up with different limits.
>
> So why not just use mapping_set_unevictable() alone for this use case?

Sorry if I wasn't clear: I did NOT mean that we should use
shmem_lock(); I meant that we should do something similar to what
shmem_lock() does and use mapping_set_unevictable() or similar.

I think we just need to make sure that using mapping_set_unevictable()
does not imply that shmem_lock() was used (i.e. no code assumes that
if the shmem mapping is unevictable then shmem_lock() was used).

Anyway, I am not very knowledgeable here, so take anything I say with
a grain of salt.
Thanks.

>
>   Luis



* Re: [RFC 2/2] shmem: add support to ignore swap
  2023-02-23  1:04             ` Yosry Ahmed
@ 2023-02-23  1:35               ` Luis Chamberlain
  0 siblings, 0 replies; 12+ messages in thread
From: Luis Chamberlain @ 2023-02-23  1:35 UTC (permalink / raw)
  To: Yosry Ahmed
  Cc: Eric W. Biederman, Matthew Wilcox, hughd, akpm, linux-mm,
	p.raghav, dave, a.manzanares, linux-kernel

On Wed, Feb 22, 2023 at 05:04:32PM -0800, Yosry Ahmed wrote:
> On Wed, Feb 22, 2023 at 4:53 PM Luis Chamberlain <mcgrof@kernel.org> wrote:
> >
> > On Wed, Feb 08, 2023 at 12:33:37PM -0800, Yosry Ahmed wrote:
> > > On Wed, Feb 8, 2023 at 9:45 AM Matthew Wilcox <willy@infradead.org> wrote:
> > > >
> > > > On Wed, Feb 08, 2023 at 08:01:01AM -0800, Luis Chamberlain wrote:
> > > > > On Tue, Feb 07, 2023 at 04:01:51AM +0000, Matthew Wilcox wrote:
> > > > > > On Mon, Feb 06, 2023 at 06:52:59PM -0800, Luis Chamberlain wrote:
> > > > > > > @@ -1334,11 +1336,15 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
> > > > > > >   struct shmem_inode_info *info;
> > > > > > >   struct address_space *mapping = folio->mapping;
> > > > > > >   struct inode *inode = mapping->host;
> > > > > > > + struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
> > > > > > >   swp_entry_t swap;
> > > > > > >   pgoff_t index;
> > > > > > >
> > > > > > >   BUG_ON(!folio_test_locked(folio));
> > > > > > >
> > > > > > > + if (wbc->for_reclaim && unlikely(sbinfo->noswap))
> > > > > > > +         return AOP_WRITEPAGE_ACTIVATE;
> > > > > >
> > > > > > Not sure this is the best way to handle this.  We'll still incur the
> > > > > > oevrhead of tracking shmem pages on the LRU, only to fail to write them
> > > > > > out when the VM thinks we should get rid of them.  We'd be better off
> > > > > > not putting them on the LRU in the first place.
> > > > >
> > > > > Ah, makes sense, so in effect then if we do that then on reclaim
> > > > > we should be able to even WARN_ON(sbinfo->noswap) assuming we did
> > > > > everthing right.
> > > > >
> > > > > Hrm, we have invalidate_mapping_pages(mapping, 0, -1) but that seems a bit
> > > > > too late how about d_mark_dontcache() on shmem_get_inode() instead?
> > > >
> > > > I was thinking that the two calls to folio_add_lru() in mm/shmem.c
> > > > should be conditional on sbinfo->noswap.
> > > >
> > >
> > > Wouldn't this cause the folio to not show up in any lru lists, even
> > > the unevictable one, which may be a strange discrepancy?
> > >
> > > Perhaps we can do something like shmem_lock(), which calls
> > > mapping_set_unevictable(), which will make folio_evictable() return
> > > true and the LRUs code will take care of the rest?
> >
> > If shmem_lock() should take care of that is that because writepages()
> > should not happen or because we have that info->flags & VM_LOCKED stop
> > gap on writepages()? If the earlier then shouldn't we WARN_ON_ONCE()
> > if writepages() is called on info->flags & VM_LOCKED?
> >
> > While I see the value in mapping_set_unevictable() I am not sure I see
> > the point in using shmem_lock(). I don't see why we should constrain
> > noswap tmpfs option to RLIMIT_MEMLOCK
> >
> > Please correct me if I'm wrong but the limit seem to be designed for
> > files / IPC / unprivileged perf limits. On the contrary, we'd bump the
> > count for each new inode. Using shmem_lock() would  also complicate the
> > inode allocation on shmem as we'd have to unwind on failure from the
> > user_shm_lock(). It would also beg the question of when to capture a
> > ucount for an inode, should we just share one for the superblock at
> > shmem_fill_super() or do we really need to capture it at every single
> > inode creation? In theory we could end up with different limits.
> >
> > So why not just use mapping_set_unevictable() alone for this use case?
> 
> Sorry if I wasn't clear, I did NOT mean that we should use
> shmem_lock(), I meant that we do something similar to what
> shmem_lock() does and use mapping_set_unevictable() or similar.

Ah OK! Sure, yeah, I reviewed shmem_lock() usage and I don't think it
and its rlimit baggage make sense here, so the only thing to do is
just mapping_set_unevictable().

> I think we just need to make sure that if we use
> mapping_set_unevictable() does not imply that shmem_lock() was used
> (i.e no code assumes that if the shmem mapping is unevictable then
> shmem_lock() was used).

The *other* stuff that shmem_lock() does is rlimit related, tied to
RLIMIT_MEMLOCK. I can't think off hand why we'd confuse the two use
cases at the moment, but I'll give it another good look with this in
mind.

I'll test what I have and post a v2 with the feedback received.

Thanks,

  Luis


