All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/6] tmpfs: add the option to disable swap
@ 2023-03-02 23:27 Luis Chamberlain
  2023-03-02 23:27 ` [PATCH 1/6] shmem: remove check for folio lock on writepage() Luis Chamberlain
                   ` (6 more replies)
  0 siblings, 7 replies; 21+ messages in thread
From: Luis Chamberlain @ 2023-03-02 23:27 UTC (permalink / raw)
  To: hughd, akpm, willy, brauner
  Cc: linux-mm, p.raghav, da.gomez, a.manzanares, dave, yosryahmed,
	keescook, mcgrof, patches, linux-kernel

After a couple of RFCs I think this is ready for PATCH form. Review
is appreciated. Below the changes I also list the series of tests
I performed to verify correctness. In short you either create a fs
with swap or without, but if you can't change that option later.
If we really wanted to, we could work on accepting this change on
reconfigure (remount) but its not clear yet that is desirable so
for now keep things simple.

Changes since last RFCv2:

  o Added Christian Brauner'd Acked-by for the noswap patch (the only
    change in that patch is just the new shmem_show_options() change I
    describe below).
  o Embraced Yosry Ahmed's recommendation to use mapping_set_unevictable()
    to at ensure the folios at least appear in the unevictable LRU.
    Since that is the goal, this accomplishes what we want and the VM
    takes care of things for us. The shem writepage() still uses a stop-gap
    to ensure we don't get called for swap when its shmem uses
    mapping_set_unevictable().
  o I had evaluated using shmem_lock() instead of calling mapping_set_unevictable()
    but upon my review this doesn't make much sense, as shmem_lock() was
    designed to make use of the RLIMIT_MEMLOCK and this was designed for
    files / IPC / unprivileged perf limits. If we were to use
    shmem_lock() we'd bump the count on each new inode. Using
    shmem_lock() would also complicate inode allocation on shmem as
    we'd to unwind on failure from the user_shm_lock(). It would also
    beg the question of when to capture a ucount for an inode, should we
    just share one for the superblock at shmem_fill_super() or do we
    really need to capture it at every single inode creation? In theory
    we could end up with different limits. The simple solution is to
    juse use mapping_set_unevictable() upon inode creation and be done
    with it, as it cannot fail.
  o Update the documentation for tmpfs before / after my patch to
    reflect use cases a bit more clearly between ramfs, tmpfs and brd
    ramdisks.
  o I updated the shmem_show_options() to also reveal the noswap option
    when its used.
  o Address checkpatch style complaint with spaces before tabs on
    shmem_fs.h.

Chances since first RFC:

  o Matthew suggested BUG_ON(!folio_test_locked(folio)) is not needed
    on writepage() callback for shmem so just remove that.
  o Based on Matthew's feedback the inode is set up early as it is not
    reset in case we split the folio. So now we move all the variables
    we can set up really early.
  o shmem writepage() should only be issued on reclaim, so just move
    the WARN_ON_ONCE(!wbc->for_reclaim) early so that the code and
    expectations are easier to read. This also avoid the folio splitting
    in case of that odd case.
  o There are a few cases where the shmem writepage() could possibly
    hit, but in the total_swap_pages we just bail out. We shouldn't be
    splitting the folio then. Likewise for VM_LOCKED case. But for
    a writepage() on a VM_LOCKED case is not expected so we want to
    learn about it so add a WARN_ON_ONCE() on that condition.
  o Based on Yosry Ahmed's feedback the patch which allows tmpfs to
    disable swap now just uses mapping_set_unevictable() on inode
    creation. In that case writepage() should not be called so we
    augment the WARN_ON_ONCE() for writepage() for that case to ensure
    that never happens.

To test I've used kdevops [0] 8 vpcu 4 GiB libvirt guest on linux-next.

I'm doing this work as part of future experimentation with tmpfs and the
page cache, but given a common complaint found about tmpfs is the
innability to work without the page cache I figured this might be useful
to others. It turns out it is -- at least Christian Brauner indicates
systemd uses ramfs for a few use-cases because they don't want to use
swap and so having this option would let them move over to using tmpfs
for those small use cases, see systemd-creds(1).

To see if you hit swap:

mkswap /dev/nvme2n1
swapon /dev/nvme2n1
free -h

With swap - what we see today
=============================
mount -t tmpfs            -o size=5G           tmpfs /data-tmpfs/
dd if=/dev/urandom of=/data-tmpfs/5g-rand2 bs=1G count=5
free -h
               total        used        free      shared  buff/cache   available
Mem:           3.7Gi       2.6Gi       1.2Gi       2.2Gi       2.2Gi       1.2Gi
Swap:           99Gi       2.8Gi        97Gi


Without swap
=============

free -h
               total        used        free      shared  buff/cache   available
Mem:           3.7Gi       387Mi       3.4Gi       2.1Mi        57Mi       3.3Gi
Swap:           99Gi          0B        99Gi
mount -t tmpfs            -o size=5G -o noswap tmpfs /data-tmpfs/
dd if=/dev/urandom of=/data-tmpfs/5g-rand2 bs=1G count=5
free -h
               total        used        free      shared  buff/cache   available
Mem:           3.7Gi       2.6Gi       1.2Gi       2.3Gi       2.3Gi       1.1Gi
Swap:           99Gi        21Mi        99Gi

The mix and match remount testing
=================================

# Cannot disable swap after it was first enabled:
mount -t tmpfs            -o size=5G           tmpfs /data-tmpfs/
mount -t tmpfs -o remount -o size=5G -o noswap tmpfs /data-tmpfs/
mount: /data-tmpfs: mount point not mounted or bad option.
       dmesg(1) may have more information after failed mount system call.
dmesg -c
tmpfs: Cannot disable swap on remount

# Remount with the same noswap option is OK:
mount -t tmpfs            -o size=5G -o noswap tmpfs /data-tmpfs/
mount -t tmpfs -o remount -o size=5G -o noswap tmpfs /data-tmpfs/
dmesg -c

# Trying to enable swap with a remount after it first disabled:
mount -t tmpfs            -o size=5G -o noswap tmpfs /data-tmpfs/
mount -t tmpfs -o remount -o size=5G           tmpfs /data-tmpfs/
mount: /data-tmpfs: mount point not mounted or bad option.
       dmesg(1) may have more information after failed mount system call.
dmesg -c
tmpfs: Cannot enable swap on remount if it was disabled on first mount

[0] https://github.com/linux-kdevops/kdevops

Luis Chamberlain (6):
  shmem: remove check for folio lock on writepage()
  shmem: set shmem_writepage() variables early
  shmem: move reclaim check early on writepages()
  shmem: skip page split if we're not reclaiming
  shmem: update documentation
  shmem: add support to ignore swap

 Documentation/filesystems/tmpfs.rst  | 36 +++++++++-----
 Documentation/mm/unevictable-lru.rst |  2 +
 include/linux/shmem_fs.h             |  1 +
 mm/shmem.c                           | 70 +++++++++++++++++++---------
 4 files changed, 75 insertions(+), 34 deletions(-)

-- 
2.39.1


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 1/6] shmem: remove check for folio lock on writepage()
  2023-03-02 23:27 [PATCH 0/6] tmpfs: add the option to disable swap Luis Chamberlain
@ 2023-03-02 23:27 ` Luis Chamberlain
  2023-03-06 13:59   ` David Hildenbrand
  2023-03-02 23:27 ` [PATCH 2/6] shmem: set shmem_writepage() variables early Luis Chamberlain
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 21+ messages in thread
From: Luis Chamberlain @ 2023-03-02 23:27 UTC (permalink / raw)
  To: hughd, akpm, willy, brauner
  Cc: linux-mm, p.raghav, da.gomez, a.manzanares, dave, yosryahmed,
	keescook, mcgrof, patches, linux-kernel

Matthew notes we should not need to check the folio lock
on the writepage() callback so remove it. This sanity check
has been lingering since linux-history days. We remove this
as we tidy up the writepage() callback to make things a bit
clearer.

Suggested-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 mm/shmem.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 1af85259b6fc..7fff1a3af092 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1354,7 +1354,6 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
 		folio_clear_dirty(folio);
 	}
 
-	BUG_ON(!folio_test_locked(folio));
 	mapping = folio->mapping;
 	index = folio->index;
 	inode = mapping->host;
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 2/6] shmem: set shmem_writepage() variables early
  2023-03-02 23:27 [PATCH 0/6] tmpfs: add the option to disable swap Luis Chamberlain
  2023-03-02 23:27 ` [PATCH 1/6] shmem: remove check for folio lock on writepage() Luis Chamberlain
@ 2023-03-02 23:27 ` Luis Chamberlain
  2023-03-06 13:59   ` David Hildenbrand
  2023-03-06 17:06   ` Christian Brauner
  2023-03-02 23:27 ` [PATCH 3/6] shmem: move reclaim check early on writepages() Luis Chamberlain
                   ` (4 subsequent siblings)
  6 siblings, 2 replies; 21+ messages in thread
From: Luis Chamberlain @ 2023-03-02 23:27 UTC (permalink / raw)
  To: hughd, akpm, willy, brauner
  Cc: linux-mm, p.raghav, da.gomez, a.manzanares, dave, yosryahmed,
	keescook, mcgrof, patches, linux-kernel

shmem_writepage() sets up variables typically used *after* a possible
huge page split. However even if that does happen the address space
mapping should not change, and the inode does not change either. So it
should be safe to set that from the very beginning.

This commit makes no functional changes.

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 mm/shmem.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 7fff1a3af092..2b9ff585a553 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1334,9 +1334,9 @@ int shmem_unuse(unsigned int type)
 static int shmem_writepage(struct page *page, struct writeback_control *wbc)
 {
 	struct folio *folio = page_folio(page);
-	struct shmem_inode_info *info;
-	struct address_space *mapping;
-	struct inode *inode;
+	struct address_space *mapping = folio->mapping;
+	struct inode *inode = mapping->host;
+	struct shmem_inode_info *info = SHMEM_I(inode);
 	swp_entry_t swap;
 	pgoff_t index;
 
@@ -1354,10 +1354,7 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
 		folio_clear_dirty(folio);
 	}
 
-	mapping = folio->mapping;
 	index = folio->index;
-	inode = mapping->host;
-	info = SHMEM_I(inode);
 	if (info->flags & VM_LOCKED)
 		goto redirty;
 	if (!total_swap_pages)
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 3/6] shmem: move reclaim check early on writepages()
  2023-03-02 23:27 [PATCH 0/6] tmpfs: add the option to disable swap Luis Chamberlain
  2023-03-02 23:27 ` [PATCH 1/6] shmem: remove check for folio lock on writepage() Luis Chamberlain
  2023-03-02 23:27 ` [PATCH 2/6] shmem: set shmem_writepage() variables early Luis Chamberlain
@ 2023-03-02 23:27 ` Luis Chamberlain
  2023-03-06 14:01   ` David Hildenbrand
  2023-03-06 19:49   ` Yosry Ahmed
  2023-03-02 23:27 ` [PATCH 4/6] shmem: skip page split if we're not reclaiming Luis Chamberlain
                   ` (3 subsequent siblings)
  6 siblings, 2 replies; 21+ messages in thread
From: Luis Chamberlain @ 2023-03-02 23:27 UTC (permalink / raw)
  To: hughd, akpm, willy, brauner
  Cc: linux-mm, p.raghav, da.gomez, a.manzanares, dave, yosryahmed,
	keescook, mcgrof, patches, linux-kernel

i915_gem requires huge folios to be split when swapping.
However we have  check for usage of writepages() to ensure
it used only for swap purposes later. Avoid the splits if
we're not being called for reclaim, even if they should in
theory not happen.

This makes the conditions easier to follow on shem_writepage().

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 mm/shmem.c | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 2b9ff585a553..a5a6da51087e 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1340,6 +1340,18 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
 	swp_entry_t swap;
 	pgoff_t index;
 
+	/*
+	 * Our capabilities prevent regular writeback or sync from ever calling
+	 * shmem_writepage; but a stacking filesystem might use ->writepage of
+	 * its underlying filesystem, in which case tmpfs should write out to
+	 * swap only in response to memory pressure, and not for the writeback
+	 * threads or sync.
+	 */
+	if (!wbc->for_reclaim) {
+		WARN_ON_ONCE(1);	/* Still happens? Tell us about it! */
+		goto redirty;
+	}
+
 	/*
 	 * If /sys/kernel/mm/transparent_hugepage/shmem_enabled is "always" or
 	 * "force", drivers/gpu/drm/i915/gem/i915_gem_shmem.c gets huge pages,
@@ -1360,18 +1372,6 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
 	if (!total_swap_pages)
 		goto redirty;
 
-	/*
-	 * Our capabilities prevent regular writeback or sync from ever calling
-	 * shmem_writepage; but a stacking filesystem might use ->writepage of
-	 * its underlying filesystem, in which case tmpfs should write out to
-	 * swap only in response to memory pressure, and not for the writeback
-	 * threads or sync.
-	 */
-	if (!wbc->for_reclaim) {
-		WARN_ON_ONCE(1);	/* Still happens? Tell us about it! */
-		goto redirty;
-	}
-
 	/*
 	 * This is somewhat ridiculous, but without plumbing a SWAP_MAP_FALLOC
 	 * value into swapfile.c, the only way we can correctly account for a
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 4/6] shmem: skip page split if we're not reclaiming
  2023-03-02 23:27 [PATCH 0/6] tmpfs: add the option to disable swap Luis Chamberlain
                   ` (2 preceding siblings ...)
  2023-03-02 23:27 ` [PATCH 3/6] shmem: move reclaim check early on writepages() Luis Chamberlain
@ 2023-03-02 23:27 ` Luis Chamberlain
  2023-03-06 14:02   ` David Hildenbrand
  2023-03-06 19:49   ` Yosry Ahmed
  2023-03-02 23:27 ` [PATCH 5/6] shmem: update documentation Luis Chamberlain
                   ` (2 subsequent siblings)
  6 siblings, 2 replies; 21+ messages in thread
From: Luis Chamberlain @ 2023-03-02 23:27 UTC (permalink / raw)
  To: hughd, akpm, willy, brauner
  Cc: linux-mm, p.raghav, da.gomez, a.manzanares, dave, yosryahmed,
	keescook, mcgrof, patches, linux-kernel

In theory when info->flags & VM_LOCKED we should not be getting
shem_writepage() called so we should be verifying this with a
WARN_ON_ONCE(). Since we should not be swapping then best to ensure
we also don't do the folio split earlier too. So just move the check
early to avoid folio splits in case its a dubious call.

We also have a similar early bail when !total_swap_pages so just move
that earlier to avoid the possible folio split in the same situation.

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 mm/shmem.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index a5a6da51087e..6006dbb7dbcb 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1352,6 +1352,12 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
 		goto redirty;
 	}
 
+	if (WARN_ON_ONCE(info->flags & VM_LOCKED))
+		goto redirty;
+
+	if (!total_swap_pages)
+		goto redirty;
+
 	/*
 	 * If /sys/kernel/mm/transparent_hugepage/shmem_enabled is "always" or
 	 * "force", drivers/gpu/drm/i915/gem/i915_gem_shmem.c gets huge pages,
@@ -1367,10 +1373,6 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
 	}
 
 	index = folio->index;
-	if (info->flags & VM_LOCKED)
-		goto redirty;
-	if (!total_swap_pages)
-		goto redirty;
 
 	/*
 	 * This is somewhat ridiculous, but without plumbing a SWAP_MAP_FALLOC
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 5/6] shmem: update documentation
  2023-03-02 23:27 [PATCH 0/6] tmpfs: add the option to disable swap Luis Chamberlain
                   ` (3 preceding siblings ...)
  2023-03-02 23:27 ` [PATCH 4/6] shmem: skip page split if we're not reclaiming Luis Chamberlain
@ 2023-03-02 23:27 ` Luis Chamberlain
  2023-03-06 14:04   ` David Hildenbrand
  2023-03-06 17:05   ` Christian Brauner
  2023-03-02 23:27 ` [PATCH 6/6] shmem: add support to ignore swap Luis Chamberlain
  2023-03-06 17:09 ` [PATCH 0/6] tmpfs: add the option to disable swap Christian Brauner
  6 siblings, 2 replies; 21+ messages in thread
From: Luis Chamberlain @ 2023-03-02 23:27 UTC (permalink / raw)
  To: hughd, akpm, willy, brauner
  Cc: linux-mm, p.raghav, da.gomez, a.manzanares, dave, yosryahmed,
	keescook, mcgrof, patches, linux-kernel

Update the docs to reflect a bit better why some folks prefer tmpfs
over ramfs and clarify a bit more about the difference between brd
ramdisks.

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 Documentation/filesystems/tmpfs.rst | 27 +++++++++++++++++++--------
 1 file changed, 19 insertions(+), 8 deletions(-)

diff --git a/Documentation/filesystems/tmpfs.rst b/Documentation/filesystems/tmpfs.rst
index 0408c245785e..e77ebdacadd0 100644
--- a/Documentation/filesystems/tmpfs.rst
+++ b/Documentation/filesystems/tmpfs.rst
@@ -13,14 +13,25 @@ everything stored therein is lost.
 
 tmpfs puts everything into the kernel internal caches and grows and
 shrinks to accommodate the files it contains and is able to swap
-unneeded pages out to swap space. It has maximum size limits which can
-be adjusted on the fly via 'mount -o remount ...'
-
-If you compare it to ramfs (which was the template to create tmpfs)
-you gain swapping and limit checking. Another similar thing is the RAM
-disk (/dev/ram*), which simulates a fixed size hard disk in physical
-RAM, where you have to create an ordinary filesystem on top. Ramdisks
-cannot swap and you do not have the possibility to resize them.
+unneeded pages out to swap space.
+
+tmpfs extends ramfs with a few userspace configurable options listed and
+explained further below, some of which can be reconfigured dynamically on the
+fly using a remount ('mount -o remount ...') of the filesystem. A tmpfs
+filesystem can be resized but it cannot be resized to a size below its current
+usage. tmpfs also supports POSIX ACLs, and extended attributes for the
+trusted.* and security.* namespaces. ramfs does not use swap and you cannot
+modify any parameter for a ramfs filesystem. The size limit of a ramfs
+filesystem is how much memory you have available, and so care must be taken if
+used so to not run out of memory.
+
+An alternative to tmpfs and ramfs is to use brd to create RAM disks
+(/dev/ram*), which allows you to simulate a block device disk in physical RAM.
+To write data you would just then need to create an regular filesystem on top
+this ramdisk. As with ramfs, brd ramdisks cannot swap. brd ramdisks are also
+configured in size at initialization and you cannot dynamically resize them.
+Contrary to brd ramdisks, tmpfs has its own filesystem, it does not rely on the
+block layer at all.
 
 Since tmpfs lives completely in the page cache and on swap, all tmpfs
 pages will be shown as "Shmem" in /proc/meminfo and "Shared" in
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 6/6] shmem: add support to ignore swap
  2023-03-02 23:27 [PATCH 0/6] tmpfs: add the option to disable swap Luis Chamberlain
                   ` (4 preceding siblings ...)
  2023-03-02 23:27 ` [PATCH 5/6] shmem: update documentation Luis Chamberlain
@ 2023-03-02 23:27 ` Luis Chamberlain
  2023-03-06 20:03   ` Yosry Ahmed
  2023-03-06 17:09 ` [PATCH 0/6] tmpfs: add the option to disable swap Christian Brauner
  6 siblings, 1 reply; 21+ messages in thread
From: Luis Chamberlain @ 2023-03-02 23:27 UTC (permalink / raw)
  To: hughd, akpm, willy, brauner
  Cc: linux-mm, p.raghav, da.gomez, a.manzanares, dave, yosryahmed,
	keescook, mcgrof, patches, linux-kernel

In doing experimentations with shmem having the option to avoid swap
becomes a useful mechanism. One of the *raves* about brd over shmem is
you can avoid swap, but that's not really a good reason to use brd if
we can instead use shmem. Using brd has its own good reasons to exist,
but just because "tmpfs" doesn't let you do that is not a great reason
to avoid it if we can easily add support for it.

I don't add support for reconfiguring incompatible options, but if
we really wanted to we can add support for that.

To avoid swap we use mapping_set_unevictable() upon inode creation,
and put a WARN_ON_ONCE() stop-gap on writepages() for reclaim.

Acked-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 Documentation/filesystems/tmpfs.rst  |  9 ++++++---
 Documentation/mm/unevictable-lru.rst |  2 ++
 include/linux/shmem_fs.h             |  1 +
 mm/shmem.c                           | 28 +++++++++++++++++++++++++++-
 4 files changed, 36 insertions(+), 4 deletions(-)

diff --git a/Documentation/filesystems/tmpfs.rst b/Documentation/filesystems/tmpfs.rst
index e77ebdacadd0..551b621f34d9 100644
--- a/Documentation/filesystems/tmpfs.rst
+++ b/Documentation/filesystems/tmpfs.rst
@@ -13,7 +13,8 @@ everything stored therein is lost.
 
 tmpfs puts everything into the kernel internal caches and grows and
 shrinks to accommodate the files it contains and is able to swap
-unneeded pages out to swap space.
+unneeded pages out to swap space, if swap was enabled for the tmpfs
+filesystem.
 
 tmpfs extends ramfs with a few userspace configurable options listed and
 explained further below, some of which can be reconfigured dynamically on the
@@ -33,8 +34,8 @@ configured in size at initialization and you cannot dynamically resize them.
 Contrary to brd ramdisks, tmpfs has its own filesystem, it does not rely on the
 block layer at all.
 
-Since tmpfs lives completely in the page cache and on swap, all tmpfs
-pages will be shown as "Shmem" in /proc/meminfo and "Shared" in
+Since tmpfs lives completely in the page cache and on optionally on swap,
+all tmpfs pages will be shown as "Shmem" in /proc/meminfo and "Shared" in
 free(1). Notice that these counters also include shared memory
 (shmem, see ipcs(1)). The most reliable way to get the count is
 using df(1) and du(1).
@@ -83,6 +84,8 @@ nr_inodes  The maximum number of inodes for this instance. The default
            is half of the number of your physical RAM pages, or (on a
            machine with highmem) the number of lowmem RAM pages,
            whichever is the lower.
+noswap     Disables swap. Remounts must respect the original settings.
+           By default swap is enabled.
 =========  ============================================================
 
 These parameters accept a suffix k, m or g for kilo, mega and giga and
diff --git a/Documentation/mm/unevictable-lru.rst b/Documentation/mm/unevictable-lru.rst
index 92ac5dca420c..3cdcbb6e00a0 100644
--- a/Documentation/mm/unevictable-lru.rst
+++ b/Documentation/mm/unevictable-lru.rst
@@ -42,6 +42,8 @@ The unevictable list addresses the following classes of unevictable pages:
 
  * Those owned by ramfs.
 
+ * Those owned by tmpfs with the noswap option.
+
  * Those mapped into SHM_LOCK'd shared memory regions.
 
  * Those mapped into VM_LOCKED [mlock()ed] VMAs.
diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
index 103d1000a5a2..21989d4f8cbe 100644
--- a/include/linux/shmem_fs.h
+++ b/include/linux/shmem_fs.h
@@ -45,6 +45,7 @@ struct shmem_sb_info {
 	kuid_t uid;		    /* Mount uid for root directory */
 	kgid_t gid;		    /* Mount gid for root directory */
 	bool full_inums;	    /* If i_ino should be uint or ino_t */
+	bool noswap;		    /* ingores VM relcaim / swap requests */
 	ino_t next_ino;		    /* The next per-sb inode number to use */
 	ino_t __percpu *ino_batch;  /* The next per-cpu inode number to use */
 	struct mempolicy *mpol;     /* default memory policy for mappings */
diff --git a/mm/shmem.c b/mm/shmem.c
index 6006dbb7dbcb..cd36cb3d974c 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -119,10 +119,12 @@ struct shmem_options {
 	bool full_inums;
 	int huge;
 	int seen;
+	bool noswap;
 #define SHMEM_SEEN_BLOCKS 1
 #define SHMEM_SEEN_INODES 2
 #define SHMEM_SEEN_HUGE 4
 #define SHMEM_SEEN_INUMS 8
+#define SHMEM_SEEN_NOSWAP 16
 };
 
 #ifdef CONFIG_TMPFS
@@ -1337,6 +1339,7 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
 	struct address_space *mapping = folio->mapping;
 	struct inode *inode = mapping->host;
 	struct shmem_inode_info *info = SHMEM_I(inode);
+	struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
 	swp_entry_t swap;
 	pgoff_t index;
 
@@ -1352,7 +1355,7 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
 		goto redirty;
 	}
 
-	if (WARN_ON_ONCE(info->flags & VM_LOCKED))
+	if (WARN_ON_ONCE((info->flags & VM_LOCKED) || sbinfo->noswap))
 		goto redirty;
 
 	if (!total_swap_pages)
@@ -2489,6 +2492,8 @@ static struct inode *shmem_get_inode(struct mnt_idmap *idmap, struct super_block
 			shmem_set_inode_flags(inode, info->fsflags);
 		INIT_LIST_HEAD(&info->shrinklist);
 		INIT_LIST_HEAD(&info->swaplist);
+		if (sbinfo->noswap)
+			mapping_set_unevictable(inode->i_mapping);
 		simple_xattrs_init(&info->xattrs);
 		cache_no_acl(inode);
 		mapping_set_large_folios(inode->i_mapping);
@@ -3576,6 +3581,7 @@ enum shmem_param {
 	Opt_uid,
 	Opt_inode32,
 	Opt_inode64,
+	Opt_noswap,
 };
 
 static const struct constant_table shmem_param_enums_huge[] = {
@@ -3597,6 +3603,7 @@ const struct fs_parameter_spec shmem_fs_parameters[] = {
 	fsparam_u32   ("uid",		Opt_uid),
 	fsparam_flag  ("inode32",	Opt_inode32),
 	fsparam_flag  ("inode64",	Opt_inode64),
+	fsparam_flag  ("noswap",	Opt_noswap),
 	{}
 };
 
@@ -3680,6 +3687,10 @@ static int shmem_parse_one(struct fs_context *fc, struct fs_parameter *param)
 		ctx->full_inums = true;
 		ctx->seen |= SHMEM_SEEN_INUMS;
 		break;
+	case Opt_noswap:
+		ctx->noswap = true;
+		ctx->seen |= SHMEM_SEEN_NOSWAP;
+		break;
 	}
 	return 0;
 
@@ -3778,6 +3789,14 @@ static int shmem_reconfigure(struct fs_context *fc)
 		err = "Current inum too high to switch to 32-bit inums";
 		goto out;
 	}
+	if ((ctx->seen & SHMEM_SEEN_NOSWAP) && ctx->noswap && !sbinfo->noswap) {
+		err = "Cannot disable swap on remount";
+		goto out;
+	}
+	if (!(ctx->seen & SHMEM_SEEN_NOSWAP) && !ctx->noswap && sbinfo->noswap) {
+		err = "Cannot enable swap on remount if it was disabled on first mount";
+		goto out;
+	}
 
 	if (ctx->seen & SHMEM_SEEN_HUGE)
 		sbinfo->huge = ctx->huge;
@@ -3798,6 +3817,10 @@ static int shmem_reconfigure(struct fs_context *fc)
 		sbinfo->mpol = ctx->mpol;	/* transfers initial ref */
 		ctx->mpol = NULL;
 	}
+
+	if (ctx->noswap)
+		sbinfo->noswap = true;
+
 	raw_spin_unlock(&sbinfo->stat_lock);
 	mpol_put(mpol);
 	return 0;
@@ -3852,6 +3875,8 @@ static int shmem_show_options(struct seq_file *seq, struct dentry *root)
 		seq_printf(seq, ",huge=%s", shmem_format_huge(sbinfo->huge));
 #endif
 	shmem_show_mpol(seq, sbinfo->mpol);
+	if (sbinfo->noswap)
+		seq_printf(seq, ",noswap");
 	return 0;
 }
 
@@ -3895,6 +3920,7 @@ static int shmem_fill_super(struct super_block *sb, struct fs_context *fc)
 			ctx->inodes = shmem_default_max_inodes();
 		if (!(ctx->seen & SHMEM_SEEN_INUMS))
 			ctx->full_inums = IS_ENABLED(CONFIG_TMPFS_INODE64);
+		sbinfo->noswap = ctx->noswap;
 	} else {
 		sb->s_flags |= SB_NOUSER;
 	}
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/6] shmem: remove check for folio lock on writepage()
  2023-03-02 23:27 ` [PATCH 1/6] shmem: remove check for folio lock on writepage() Luis Chamberlain
@ 2023-03-06 13:59   ` David Hildenbrand
  0 siblings, 0 replies; 21+ messages in thread
From: David Hildenbrand @ 2023-03-06 13:59 UTC (permalink / raw)
  To: Luis Chamberlain, hughd, akpm, willy, brauner
  Cc: linux-mm, p.raghav, da.gomez, a.manzanares, dave, yosryahmed,
	keescook, patches, linux-kernel

On 03.03.23 00:27, Luis Chamberlain wrote:
> Matthew notes we should not need to check the folio lock
> on the writepage() callback so remove it. This sanity check
> has been lingering since linux-history days. We remove this
> as we tidy up the writepage() callback to make things a bit
> clearer.
> 
> Suggested-by: Matthew Wilcox <willy@infradead.org>
> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
> ---
>   mm/shmem.c | 1 -
>   1 file changed, 1 deletion(-)
> 
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 1af85259b6fc..7fff1a3af092 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -1354,7 +1354,6 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
>   		folio_clear_dirty(folio);
>   	}
>   
> -	BUG_ON(!folio_test_locked(folio));
>   	mapping = folio->mapping;
>   	index = folio->index;
>   	inode = mapping->host;


It's still required IIUC. At least for split_huge_page() and 
setting/clearing some page flags in there.

At least split_huge_page() also contains a
	VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);

So it's probably reasonable to remove this unconditional sanity check 
here; removing BUG_ON's is always nice.

Acked-by: David Hildenbrand <david@redhat.com>

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/6] shmem: set shmem_writepage() variables early
  2023-03-02 23:27 ` [PATCH 2/6] shmem: set shmem_writepage() variables early Luis Chamberlain
@ 2023-03-06 13:59   ` David Hildenbrand
  2023-03-06 17:06   ` Christian Brauner
  1 sibling, 0 replies; 21+ messages in thread
From: David Hildenbrand @ 2023-03-06 13:59 UTC (permalink / raw)
  To: Luis Chamberlain, hughd, akpm, willy, brauner
  Cc: linux-mm, p.raghav, da.gomez, a.manzanares, dave, yosryahmed,
	keescook, patches, linux-kernel

On 03.03.23 00:27, Luis Chamberlain wrote:
> shmem_writepage() sets up variables typically used *after* a possible
> huge page split. However even if that does happen the address space
> mapping should not change, and the inode does not change either. So it
> should be safe to set that from the very beginning.
> 
> This commit makes no functional changes.
> 
> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
> ---

Acked-by: David Hildenbrand <david@redhat.com>

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 3/6] shmem: move reclaim check early on writepages()
  2023-03-02 23:27 ` [PATCH 3/6] shmem: move reclaim check early on writepages() Luis Chamberlain
@ 2023-03-06 14:01   ` David Hildenbrand
  2023-03-08 21:30     ` Luis Chamberlain
  2023-03-06 19:49   ` Yosry Ahmed
  1 sibling, 1 reply; 21+ messages in thread
From: David Hildenbrand @ 2023-03-06 14:01 UTC (permalink / raw)
  To: Luis Chamberlain, hughd, akpm, willy, brauner
  Cc: linux-mm, p.raghav, da.gomez, a.manzanares, dave, yosryahmed,
	keescook, patches, linux-kernel

On 03.03.23 00:27, Luis Chamberlain wrote:
> i915_gem requires huge folios to be split when swapping.
> However we have  check for usage of writepages() to ensure
> it used only for swap purposes later. Avoid the splits if
> we're not being called for reclaim, even if they should in
> theory not happen.
> 
> This makes the conditions easier to follow on shem_writepage().
> 
> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
> ---
>   mm/shmem.c | 24 ++++++++++++------------
>   1 file changed, 12 insertions(+), 12 deletions(-)
> 
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 2b9ff585a553..a5a6da51087e 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -1340,6 +1340,18 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
>   	swp_entry_t swap;
>   	pgoff_t index;
>   
> +	/*
> +	 * Our capabilities prevent regular writeback or sync from ever calling
> +	 * shmem_writepage; but a stacking filesystem might use ->writepage of
> +	 * its underlying filesystem, in which case tmpfs should write out to
> +	 * swap only in response to memory pressure, and not for the writeback
> +	 * threads or sync.
> +	 */
> +	if (!wbc->for_reclaim) {

if (WARN_ON_ONCE(!wbc->for_reclaim))

> +		WARN_ON_ONCE(1);	/* Still happens? Tell us about it! */

And drop the comment :) That's what WARN_ON_ONCE is all about.

> +		goto redirty;
> +	}
> +
>   	/*
>   	 * If /sys/kernel/mm/transparent_hugepage/shmem_enabled is "always" or
>   	 * "force", drivers/gpu/drm/i915/gem/i915_gem_shmem.c gets huge pages,
> @@ -1360,18 +1372,6 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
>   	if (!total_swap_pages)
>   		goto redirty;
>   
> -	/*
> -	 * Our capabilities prevent regular writeback or sync from ever calling
> -	 * shmem_writepage; but a stacking filesystem might use ->writepage of
> -	 * its underlying filesystem, in which case tmpfs should write out to
> -	 * swap only in response to memory pressure, and not for the writeback
> -	 * threads or sync.
> -	 */
> -	if (!wbc->for_reclaim) {
> -		WARN_ON_ONCE(1);	/* Still happens? Tell us about it! */
> -		goto redirty;
> -	}
> -
>   	/*
>   	 * This is somewhat ridiculous, but without plumbing a SWAP_MAP_FALLOC
>   	 * value into swapfile.c, the only way we can correctly account for a

LGTM

Acked-by: David Hildenbrand <david@redhat.com>

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/6] shmem: skip page split if we're not reclaiming
  2023-03-02 23:27 ` [PATCH 4/6] shmem: skip page split if we're not reclaiming Luis Chamberlain
@ 2023-03-06 14:02   ` David Hildenbrand
  2023-03-06 19:49   ` Yosry Ahmed
  1 sibling, 0 replies; 21+ messages in thread
From: David Hildenbrand @ 2023-03-06 14:02 UTC (permalink / raw)
  To: Luis Chamberlain, hughd, akpm, willy, brauner
  Cc: linux-mm, p.raghav, da.gomez, a.manzanares, dave, yosryahmed,
	keescook, patches, linux-kernel

On 03.03.23 00:27, Luis Chamberlain wrote:
> In theory when info->flags & VM_LOCKED we should not be getting
> shem_writepage() called so we should be verifying this with a
> WARN_ON_ONCE(). Since we should not be swapping then best to ensure
> we also don't do the folio split earlier too. So just move the check
> early to avoid folio splits in case its a dubious call.
> 
> We also have a similar early bail when !total_swap_pages so just move
> that earlier to avoid the possible folio split in the same situation.
> 
> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
> ---
>   mm/shmem.c | 10 ++++++----
>   1 file changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/mm/shmem.c b/mm/shmem.c
> index a5a6da51087e..6006dbb7dbcb 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -1352,6 +1352,12 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
>   		goto redirty;
>   	}
>   
> +	if (WARN_ON_ONCE(info->flags & VM_LOCKED))
> +		goto redirty;
> +
> +	if (!total_swap_pages)
> +		goto redirty;
> +
>   	/*
>   	 * If /sys/kernel/mm/transparent_hugepage/shmem_enabled is "always" or
>   	 * "force", drivers/gpu/drm/i915/gem/i915_gem_shmem.c gets huge pages,
> @@ -1367,10 +1373,6 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
>   	}
>   
>   	index = folio->index;
> -	if (info->flags & VM_LOCKED)
> -		goto redirty;
> -	if (!total_swap_pages)
> -		goto redirty;
>   
>   	/*
>   	 * This is somewhat ridiculous, but without plumbing a SWAP_MAP_FALLOC

Acked-by: David Hildenbrand <david@redhat.com>

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 5/6] shmem: update documentation
  2023-03-02 23:27 ` [PATCH 5/6] shmem: update documentation Luis Chamberlain
@ 2023-03-06 14:04   ` David Hildenbrand
  2023-03-08 22:16     ` Luis Chamberlain
  2023-03-06 17:05   ` Christian Brauner
  1 sibling, 1 reply; 21+ messages in thread
From: David Hildenbrand @ 2023-03-06 14:04 UTC (permalink / raw)
  To: Luis Chamberlain, hughd, akpm, willy, brauner
  Cc: linux-mm, p.raghav, da.gomez, a.manzanares, dave, yosryahmed,
	keescook, patches, linux-kernel

On 03.03.23 00:27, Luis Chamberlain wrote:
> Update the docs to reflect a bit better why some folks prefer tmpfs
> over ramfs and clarify a bit more about the difference between brd
> ramdisks.
> 
> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
> ---
>   Documentation/filesystems/tmpfs.rst | 27 +++++++++++++++++++--------
>   1 file changed, 19 insertions(+), 8 deletions(-)
> 
> diff --git a/Documentation/filesystems/tmpfs.rst b/Documentation/filesystems/tmpfs.rst
> index 0408c245785e..e77ebdacadd0 100644
> --- a/Documentation/filesystems/tmpfs.rst
> +++ b/Documentation/filesystems/tmpfs.rst
> @@ -13,14 +13,25 @@ everything stored therein is lost.
>   
>   tmpfs puts everything into the kernel internal caches and grows and
>   shrinks to accommodate the files it contains and is able to swap
> -unneeded pages out to swap space. It has maximum size limits which can
> -be adjusted on the fly via 'mount -o remount ...'
> -
> -If you compare it to ramfs (which was the template to create tmpfs)
> -you gain swapping and limit checking. Another similar thing is the RAM
> -disk (/dev/ram*), which simulates a fixed size hard disk in physical
> -RAM, where you have to create an ordinary filesystem on top. Ramdisks
> -cannot swap and you do not have the possibility to resize them.
> +unneeded pages out to swap space.

I suppose, in contrast to ramfs, tmpfs also supports THP. Maybe worth 
adding as well.

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 5/6] shmem: update documentation
  2023-03-02 23:27 ` [PATCH 5/6] shmem: update documentation Luis Chamberlain
  2023-03-06 14:04   ` David Hildenbrand
@ 2023-03-06 17:05   ` Christian Brauner
  1 sibling, 0 replies; 21+ messages in thread
From: Christian Brauner @ 2023-03-06 17:05 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: hughd, akpm, willy, linux-mm, p.raghav, da.gomez, a.manzanares,
	dave, yosryahmed, keescook, patches, linux-kernel

On Thu, Mar 02, 2023 at 03:27:57PM -0800, Luis Chamberlain wrote:
> Update the docs to reflect a bit better why some folks prefer tmpfs
> over ramfs and clarify a bit more about the difference between brd
> ramdisks.
> 
> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
> ---

Looks good,
Reviewed-by: Christian Brauner <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/6] shmem: set shmem_writepage() variables early
  2023-03-02 23:27 ` [PATCH 2/6] shmem: set shmem_writepage() variables early Luis Chamberlain
  2023-03-06 13:59   ` David Hildenbrand
@ 2023-03-06 17:06   ` Christian Brauner
  1 sibling, 0 replies; 21+ messages in thread
From: Christian Brauner @ 2023-03-06 17:06 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: hughd, akpm, willy, linux-mm, p.raghav, da.gomez, a.manzanares,
	dave, yosryahmed, keescook, patches, linux-kernel

On Thu, Mar 02, 2023 at 03:27:54PM -0800, Luis Chamberlain wrote:
> shmem_writepage() sets up variables typically used *after* a possible
> huge page split. However even if that does happen the address space
> mapping should not change, and the inode does not change either. So it
> should be safe to set that from the very beginning.
> 
> This commit makes no functional changes.
> 
> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
> ---

Looks good,
Reviewed-by: Christian Brauner <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 0/6] tmpfs: add the option to disable swap
  2023-03-02 23:27 [PATCH 0/6] tmpfs: add the option to disable swap Luis Chamberlain
                   ` (5 preceding siblings ...)
  2023-03-02 23:27 ` [PATCH 6/6] shmem: add support to ignore swap Luis Chamberlain
@ 2023-03-06 17:09 ` Christian Brauner
  6 siblings, 0 replies; 21+ messages in thread
From: Christian Brauner @ 2023-03-06 17:09 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: hughd, akpm, willy, linux-mm, p.raghav, da.gomez, a.manzanares,
	dave, yosryahmed, keescook, patches, linux-kernel

On Thu, Mar 02, 2023 at 03:27:52PM -0800, Luis Chamberlain wrote:
> After a couple of RFCs I think this is ready for PATCH form. Review
> is appreciated. Below the changes I also list the series of tests
> I performed to verify correctness. In short you either create a fs
> with swap or without, but if you can't change that option later.
> If we really wanted to, we could work on accepting this change on
> reconfigure (remount) but its not clear yet that is desirable so
> for now keep things simple.
> 
> Changes since last RFCv2:
> 
>   o Added Christian Brauner'd Acked-by for the noswap patch (the only
>     change in that patch is just the new shmem_show_options() change I
>     describe below).

Sorry, didn't get around to the rest of the patches earlier. Not an mm
expert but even the mm changes look straightforward to me so,

Reviewed-by: Christian Brauner <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 3/6] shmem: move reclaim check early on writepages()
  2023-03-02 23:27 ` [PATCH 3/6] shmem: move reclaim check early on writepages() Luis Chamberlain
  2023-03-06 14:01   ` David Hildenbrand
@ 2023-03-06 19:49   ` Yosry Ahmed
  1 sibling, 0 replies; 21+ messages in thread
From: Yosry Ahmed @ 2023-03-06 19:49 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: hughd, akpm, willy, brauner, linux-mm, p.raghav, da.gomez,
	a.manzanares, dave, keescook, patches, linux-kernel

On Thu, Mar 2, 2023 at 3:28 PM Luis Chamberlain <mcgrof@kernel.org> wrote:
>
> i915_gem requires huge folios to be split when swapping.
> However we have  check for usage of writepages() to ensure
> it used only for swap purposes later. Avoid the splits if
> we're not being called for reclaim, even if they should in
> theory not happen.
>
> This makes the conditions easier to follow on shem_writepage().
>
> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>

Nice cleanup.

Reviewed-by: Yosry Ahmed <yosryahmed@google.com>

> ---
>  mm/shmem.c | 24 ++++++++++++------------
>  1 file changed, 12 insertions(+), 12 deletions(-)
>
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 2b9ff585a553..a5a6da51087e 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -1340,6 +1340,18 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
>         swp_entry_t swap;
>         pgoff_t index;
>
> +       /*
> +        * Our capabilities prevent regular writeback or sync from ever calling
> +        * shmem_writepage; but a stacking filesystem might use ->writepage of
> +        * its underlying filesystem, in which case tmpfs should write out to
> +        * swap only in response to memory pressure, and not for the writeback
> +        * threads or sync.
> +        */
> +       if (!wbc->for_reclaim) {
> +               WARN_ON_ONCE(1);        /* Still happens? Tell us about it! */
> +               goto redirty;
> +       }
> +
>         /*
>          * If /sys/kernel/mm/transparent_hugepage/shmem_enabled is "always" or
>          * "force", drivers/gpu/drm/i915/gem/i915_gem_shmem.c gets huge pages,
> @@ -1360,18 +1372,6 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
>         if (!total_swap_pages)
>                 goto redirty;
>
> -       /*
> -        * Our capabilities prevent regular writeback or sync from ever calling
> -        * shmem_writepage; but a stacking filesystem might use ->writepage of
> -        * its underlying filesystem, in which case tmpfs should write out to
> -        * swap only in response to memory pressure, and not for the writeback
> -        * threads or sync.
> -        */
> -       if (!wbc->for_reclaim) {
> -               WARN_ON_ONCE(1);        /* Still happens? Tell us about it! */
> -               goto redirty;
> -       }
> -
>         /*
>          * This is somewhat ridiculous, but without plumbing a SWAP_MAP_FALLOC
>          * value into swapfile.c, the only way we can correctly account for a
> --
> 2.39.1
>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/6] shmem: skip page split if we're not reclaiming
  2023-03-02 23:27 ` [PATCH 4/6] shmem: skip page split if we're not reclaiming Luis Chamberlain
  2023-03-06 14:02   ` David Hildenbrand
@ 2023-03-06 19:49   ` Yosry Ahmed
  1 sibling, 0 replies; 21+ messages in thread
From: Yosry Ahmed @ 2023-03-06 19:49 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: hughd, akpm, willy, brauner, linux-mm, p.raghav, da.gomez,
	a.manzanares, dave, keescook, patches, linux-kernel

On Thu, Mar 2, 2023 at 3:28 PM Luis Chamberlain <mcgrof@kernel.org> wrote:
>
> In theory when info->flags & VM_LOCKED we should not be getting
> shem_writepage() called so we should be verifying this with a
> WARN_ON_ONCE(). Since we should not be swapping then best to ensure
> we also don't do the folio split earlier too. So just move the check
> early to avoid folio splits in case its a dubious call.
>
> We also have a similar early bail when !total_swap_pages so just move
> that earlier to avoid the possible folio split in the same situation.
>
> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>

Reviewed-by: Yosry Ahmed <yosryahmed@google.com>

> ---
>  mm/shmem.c | 10 ++++++----
>  1 file changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/mm/shmem.c b/mm/shmem.c
> index a5a6da51087e..6006dbb7dbcb 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -1352,6 +1352,12 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
>                 goto redirty;
>         }
>
> +       if (WARN_ON_ONCE(info->flags & VM_LOCKED))
> +               goto redirty;
> +
> +       if (!total_swap_pages)
> +               goto redirty;
> +
>         /*
>          * If /sys/kernel/mm/transparent_hugepage/shmem_enabled is "always" or
>          * "force", drivers/gpu/drm/i915/gem/i915_gem_shmem.c gets huge pages,
> @@ -1367,10 +1373,6 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
>         }
>
>         index = folio->index;
> -       if (info->flags & VM_LOCKED)
> -               goto redirty;
> -       if (!total_swap_pages)
> -               goto redirty;
>
>         /*
>          * This is somewhat ridiculous, but without plumbing a SWAP_MAP_FALLOC
> --
> 2.39.1
>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 6/6] shmem: add support to ignore swap
  2023-03-02 23:27 ` [PATCH 6/6] shmem: add support to ignore swap Luis Chamberlain
@ 2023-03-06 20:03   ` Yosry Ahmed
  2023-03-08 22:21     ` Luis Chamberlain
  0 siblings, 1 reply; 21+ messages in thread
From: Yosry Ahmed @ 2023-03-06 20:03 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: hughd, akpm, willy, brauner, linux-mm, p.raghav, da.gomez,
	a.manzanares, dave, keescook, patches, linux-kernel

On Thu, Mar 2, 2023 at 3:28 PM Luis Chamberlain <mcgrof@kernel.org> wrote:
>
> In doing experimentations with shmem having the option to avoid swap
> becomes a useful mechanism. One of the *raves* about brd over shmem is
> you can avoid swap, but that's not really a good reason to use brd if
> we can instead use shmem. Using brd has its own good reasons to exist,
> but just because "tmpfs" doesn't let you do that is not a great reason
> to avoid it if we can easily add support for it.
>
> I don't add support for reconfiguring incompatible options, but if
> we really wanted to we can add support for that.
>
> To avoid swap we use mapping_set_unevictable() upon inode creation,
> and put a WARN_ON_ONCE() stop-gap on writepages() for reclaim.
>
> Acked-by: Christian Brauner <brauner@kernel.org>
> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
> ---
>  Documentation/filesystems/tmpfs.rst  |  9 ++++++---
>  Documentation/mm/unevictable-lru.rst |  2 ++
>  include/linux/shmem_fs.h             |  1 +
>  mm/shmem.c                           | 28 +++++++++++++++++++++++++++-
>  4 files changed, 36 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/filesystems/tmpfs.rst b/Documentation/filesystems/tmpfs.rst
> index e77ebdacadd0..551b621f34d9 100644
> --- a/Documentation/filesystems/tmpfs.rst
> +++ b/Documentation/filesystems/tmpfs.rst
> @@ -13,7 +13,8 @@ everything stored therein is lost.
>
>  tmpfs puts everything into the kernel internal caches and grows and
>  shrinks to accommodate the files it contains and is able to swap
> -unneeded pages out to swap space.
> +unneeded pages out to swap space, if swap was enabled for the tmpfs
> +filesystem.

Nit: s/filesystem/mount to make it clear it's a per-mount setting?

>
>  tmpfs extends ramfs with a few userspace configurable options listed and
>  explained further below, some of which can be reconfigured dynamically on the
> @@ -33,8 +34,8 @@ configured in size at initialization and you cannot dynamically resize them.
>  Contrary to brd ramdisks, tmpfs has its own filesystem, it does not rely on the
>  block layer at all.
>
> -Since tmpfs lives completely in the page cache and on swap, all tmpfs
> -pages will be shown as "Shmem" in /proc/meminfo and "Shared" in
> +Since tmpfs lives completely in the page cache and on optionally on swap,

Nit: s/on optionally on swap/optionally on swap
(extra on)

> +all tmpfs pages will be shown as "Shmem" in /proc/meminfo and "Shared" in
>  free(1). Notice that these counters also include shared memory
>  (shmem, see ipcs(1)). The most reliable way to get the count is
>  using df(1) and du(1).
> @@ -83,6 +84,8 @@ nr_inodes  The maximum number of inodes for this instance. The default
>             is half of the number of your physical RAM pages, or (on a
>             machine with highmem) the number of lowmem RAM pages,
>             whichever is the lower.
> +noswap     Disables swap. Remounts must respect the original settings.
> +           By default swap is enabled.
>  =========  ============================================================
>
>  These parameters accept a suffix k, m or g for kilo, mega and giga and
> diff --git a/Documentation/mm/unevictable-lru.rst b/Documentation/mm/unevictable-lru.rst
> index 92ac5dca420c..3cdcbb6e00a0 100644
> --- a/Documentation/mm/unevictable-lru.rst
> +++ b/Documentation/mm/unevictable-lru.rst
> @@ -42,6 +42,8 @@ The unevictable list addresses the following classes of unevictable pages:
>
>   * Those owned by ramfs.
>
> + * Those owned by tmpfs with the noswap option.
> +

Nit: s/noswap option/noswap mount option

to make it easier to understand what noswap is

>   * Those mapped into SHM_LOCK'd shared memory regions.
>
>   * Those mapped into VM_LOCKED [mlock()ed] VMAs.
> diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
> index 103d1000a5a2..21989d4f8cbe 100644
> --- a/include/linux/shmem_fs.h
> +++ b/include/linux/shmem_fs.h
> @@ -45,6 +45,7 @@ struct shmem_sb_info {
>         kuid_t uid;                 /* Mount uid for root directory */
>         kgid_t gid;                 /* Mount gid for root directory */
>         bool full_inums;            /* If i_ino should be uint or ino_t */
> +       bool noswap;                /* ingores VM relcaim / swap requests */

Nit: typos
s/ingores/ignores
s/relcaim/reclaim

>         ino_t next_ino;             /* The next per-sb inode number to use */
>         ino_t __percpu *ino_batch;  /* The next per-cpu inode number to use */
>         struct mempolicy *mpol;     /* default memory policy for mappings */
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 6006dbb7dbcb..cd36cb3d974c 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -119,10 +119,12 @@ struct shmem_options {
>         bool full_inums;
>         int huge;
>         int seen;
> +       bool noswap;
>  #define SHMEM_SEEN_BLOCKS 1
>  #define SHMEM_SEEN_INODES 2
>  #define SHMEM_SEEN_HUGE 4
>  #define SHMEM_SEEN_INUMS 8
> +#define SHMEM_SEEN_NOSWAP 16
>  };
>
>  #ifdef CONFIG_TMPFS
> @@ -1337,6 +1339,7 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
>         struct address_space *mapping = folio->mapping;
>         struct inode *inode = mapping->host;
>         struct shmem_inode_info *info = SHMEM_I(inode);
> +       struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
>         swp_entry_t swap;
>         pgoff_t index;
>
> @@ -1352,7 +1355,7 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
>                 goto redirty;
>         }
>
> -       if (WARN_ON_ONCE(info->flags & VM_LOCKED))
> +       if (WARN_ON_ONCE((info->flags & VM_LOCKED) || sbinfo->noswap))
>                 goto redirty;
>
>         if (!total_swap_pages)
> @@ -2489,6 +2492,8 @@ static struct inode *shmem_get_inode(struct mnt_idmap *idmap, struct super_block
>                         shmem_set_inode_flags(inode, info->fsflags);
>                 INIT_LIST_HEAD(&info->shrinklist);
>                 INIT_LIST_HEAD(&info->swaplist);
> +               if (sbinfo->noswap)
> +                       mapping_set_unevictable(inode->i_mapping);
>                 simple_xattrs_init(&info->xattrs);
>                 cache_no_acl(inode);
>                 mapping_set_large_folios(inode->i_mapping);
> @@ -3576,6 +3581,7 @@ enum shmem_param {
>         Opt_uid,
>         Opt_inode32,
>         Opt_inode64,
> +       Opt_noswap,
>  };
>
>  static const struct constant_table shmem_param_enums_huge[] = {
> @@ -3597,6 +3603,7 @@ const struct fs_parameter_spec shmem_fs_parameters[] = {
>         fsparam_u32   ("uid",           Opt_uid),
>         fsparam_flag  ("inode32",       Opt_inode32),
>         fsparam_flag  ("inode64",       Opt_inode64),
> +       fsparam_flag  ("noswap",        Opt_noswap),
>         {}
>  };
>
> @@ -3680,6 +3687,10 @@ static int shmem_parse_one(struct fs_context *fc, struct fs_parameter *param)
>                 ctx->full_inums = true;
>                 ctx->seen |= SHMEM_SEEN_INUMS;
>                 break;
> +       case Opt_noswap:
> +               ctx->noswap = true;
> +               ctx->seen |= SHMEM_SEEN_NOSWAP;
> +               break;
>         }
>         return 0;
>
> @@ -3778,6 +3789,14 @@ static int shmem_reconfigure(struct fs_context *fc)
>                 err = "Current inum too high to switch to 32-bit inums";
>                 goto out;
>         }
> +       if ((ctx->seen & SHMEM_SEEN_NOSWAP) && ctx->noswap && !sbinfo->noswap) {
> +               err = "Cannot disable swap on remount";
> +               goto out;
> +       }
> +       if (!(ctx->seen & SHMEM_SEEN_NOSWAP) && !ctx->noswap && sbinfo->noswap) {
> +               err = "Cannot enable swap on remount if it was disabled on first mount";
> +               goto out;
> +       }
>
>         if (ctx->seen & SHMEM_SEEN_HUGE)
>                 sbinfo->huge = ctx->huge;
> @@ -3798,6 +3817,10 @@ static int shmem_reconfigure(struct fs_context *fc)
>                 sbinfo->mpol = ctx->mpol;       /* transfers initial ref */
>                 ctx->mpol = NULL;
>         }
> +
> +       if (ctx->noswap)
> +               sbinfo->noswap = true;
> +
>         raw_spin_unlock(&sbinfo->stat_lock);
>         mpol_put(mpol);
>         return 0;
> @@ -3852,6 +3875,8 @@ static int shmem_show_options(struct seq_file *seq, struct dentry *root)
>                 seq_printf(seq, ",huge=%s", shmem_format_huge(sbinfo->huge));
>  #endif
>         shmem_show_mpol(seq, sbinfo->mpol);
> +       if (sbinfo->noswap)
> +               seq_printf(seq, ",noswap");
>         return 0;
>  }
>
> @@ -3895,6 +3920,7 @@ static int shmem_fill_super(struct super_block *sb, struct fs_context *fc)
>                         ctx->inodes = shmem_default_max_inodes();
>                 if (!(ctx->seen & SHMEM_SEEN_INUMS))
>                         ctx->full_inums = IS_ENABLED(CONFIG_TMPFS_INODE64);
> +               sbinfo->noswap = ctx->noswap;
>         } else {
>                 sb->s_flags |= SB_NOUSER;
>         }
> --
> 2.39.1
>

The change logically makes sense to me but I am not at all familiar
with shmem and fs stuff, so I would rather someone else take a look.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 3/6] shmem: move reclaim check early on writepages()
  2023-03-06 14:01   ` David Hildenbrand
@ 2023-03-08 21:30     ` Luis Chamberlain
  0 siblings, 0 replies; 21+ messages in thread
From: Luis Chamberlain @ 2023-03-08 21:30 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: hughd, akpm, willy, brauner, linux-mm, p.raghav, da.gomez,
	a.manzanares, dave, yosryahmed, keescook, patches, linux-kernel

On Mon, Mar 06, 2023 at 03:01:52PM +0100, David Hildenbrand wrote:
> On 03.03.23 00:27, Luis Chamberlain wrote:
> > @@ -1340,6 +1340,18 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
> >   	swp_entry_t swap;
> >   	pgoff_t index;
> > +	/*
> > +	 * Our capabilities prevent regular writeback or sync from ever calling
> > +	 * shmem_writepage; but a stacking filesystem might use ->writepage of
> > +	 * its underlying filesystem, in which case tmpfs should write out to
> > +	 * swap only in response to memory pressure, and not for the writeback
> > +	 * threads or sync.
> > +	 */
> > +	if (!wbc->for_reclaim) {
> 
> if (WARN_ON_ONCE(!wbc->for_reclaim))
> 
> > +		WARN_ON_ONCE(1);	/* Still happens? Tell us about it! */
> 
> And drop the comment :) That's what WARN_ON_ONCE is all about.

Good call, will add that to v2.

> Acked-by: David Hildenbrand <david@redhat.com>

Great thanks,

  Luis

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 5/6] shmem: update documentation
  2023-03-06 14:04   ` David Hildenbrand
@ 2023-03-08 22:16     ` Luis Chamberlain
  0 siblings, 0 replies; 21+ messages in thread
From: Luis Chamberlain @ 2023-03-08 22:16 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: hughd, akpm, willy, brauner, linux-mm, p.raghav, da.gomez,
	a.manzanares, dave, yosryahmed, keescook, patches, linux-kernel

On Mon, Mar 06, 2023 at 03:04:57PM +0100, David Hildenbrand wrote:
> On 03.03.23 00:27, Luis Chamberlain wrote:
> > Update the docs to reflect a bit better why some folks prefer tmpfs
> > over ramfs and clarify a bit more about the difference between brd
> > ramdisks.
> > 
> > Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
> > ---
> >   Documentation/filesystems/tmpfs.rst | 27 +++++++++++++++++++--------
> >   1 file changed, 19 insertions(+), 8 deletions(-)
> > 
> > diff --git a/Documentation/filesystems/tmpfs.rst b/Documentation/filesystems/tmpfs.rst
> > index 0408c245785e..e77ebdacadd0 100644
> > --- a/Documentation/filesystems/tmpfs.rst
> > +++ b/Documentation/filesystems/tmpfs.rst
> > @@ -13,14 +13,25 @@ everything stored therein is lost.
> >   tmpfs puts everything into the kernel internal caches and grows and
> >   shrinks to accommodate the files it contains and is able to swap
> > -unneeded pages out to swap space. It has maximum size limits which can
> > -be adjusted on the fly via 'mount -o remount ...'
> > -
> > -If you compare it to ramfs (which was the template to create tmpfs)
> > -you gain swapping and limit checking. Another similar thing is the RAM
> > -disk (/dev/ram*), which simulates a fixed size hard disk in physical
> > -RAM, where you have to create an ordinary filesystem on top. Ramdisks
> > -cannot swap and you do not have the possibility to resize them.
> > +unneeded pages out to swap space.
> 
> I suppose, in contrast to ramfs, tmpfs also supports THP. Maybe worth adding
> as well.

Good call, both the mount and the sysfs file (which is registered by
THP) lacks any documentation so I'll just add docs for both.

  Luis

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 6/6] shmem: add support to ignore swap
  2023-03-06 20:03   ` Yosry Ahmed
@ 2023-03-08 22:21     ` Luis Chamberlain
  0 siblings, 0 replies; 21+ messages in thread
From: Luis Chamberlain @ 2023-03-08 22:21 UTC (permalink / raw)
  To: Yosry Ahmed
  Cc: hughd, akpm, willy, brauner, linux-mm, p.raghav, da.gomez,
	a.manzanares, dave, keescook, patches, linux-kernel

On Mon, Mar 06, 2023 at 12:03:57PM -0800, Yosry Ahmed wrote:
> On Thu, Mar 2, 2023 at 3:28 PM Luis Chamberlain <mcgrof@kernel.org> wrote:
> > diff --git a/Documentation/filesystems/tmpfs.rst b/Documentation/filesystems/tmpfs.rst
> > index e77ebdacadd0..551b621f34d9 100644
> > --- a/Documentation/filesystems/tmpfs.rst
> > +++ b/Documentation/filesystems/tmpfs.rst
> > @@ -13,7 +13,8 @@ everything stored therein is lost.
> >
> >  tmpfs puts everything into the kernel internal caches and grows and
> >  shrinks to accommodate the files it contains and is able to swap
> > -unneeded pages out to swap space.
> > +unneeded pages out to swap space, if swap was enabled for the tmpfs
> > +filesystem.
> 
> Nit: s/filesystem/mount to make it clear it's a per-mount setting?

Fixed this and all the other typos you addressed, thanks!

  Luis

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2023-03-08 22:21 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-02 23:27 [PATCH 0/6] tmpfs: add the option to disable swap Luis Chamberlain
2023-03-02 23:27 ` [PATCH 1/6] shmem: remove check for folio lock on writepage() Luis Chamberlain
2023-03-06 13:59   ` David Hildenbrand
2023-03-02 23:27 ` [PATCH 2/6] shmem: set shmem_writepage() variables early Luis Chamberlain
2023-03-06 13:59   ` David Hildenbrand
2023-03-06 17:06   ` Christian Brauner
2023-03-02 23:27 ` [PATCH 3/6] shmem: move reclaim check early on writepages() Luis Chamberlain
2023-03-06 14:01   ` David Hildenbrand
2023-03-08 21:30     ` Luis Chamberlain
2023-03-06 19:49   ` Yosry Ahmed
2023-03-02 23:27 ` [PATCH 4/6] shmem: skip page split if we're not reclaiming Luis Chamberlain
2023-03-06 14:02   ` David Hildenbrand
2023-03-06 19:49   ` Yosry Ahmed
2023-03-02 23:27 ` [PATCH 5/6] shmem: update documentation Luis Chamberlain
2023-03-06 14:04   ` David Hildenbrand
2023-03-08 22:16     ` Luis Chamberlain
2023-03-06 17:05   ` Christian Brauner
2023-03-02 23:27 ` [PATCH 6/6] shmem: add support to ignore swap Luis Chamberlain
2023-03-06 20:03   ` Yosry Ahmed
2023-03-08 22:21     ` Luis Chamberlain
2023-03-06 17:09 ` [PATCH 0/6] tmpfs: add the option to disable swap Christian Brauner

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.