From: haoxin <xhao@linux.alibaba.com>
To: Luis Chamberlain <mcgrof@kernel.org>,
hughd@google.com, akpm@linux-foundation.org, willy@infradead.org,
brauner@kernel.org
Cc: linux-mm@kvack.org, p.raghav@samsung.com, da.gomez@samsung.com,
a.manzanares@samsung.com, dave@stgolabs.net,
yosryahmed@google.com, keescook@chromium.org,
patches@lists.linux.dev, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 0/6] tmpfs: add the option to disable swap
Date: Tue, 14 Mar 2023 10:46:28 +0800
Message-ID: <e1de614b-25e1-5c21-933a-880412ac7421@linux.alibaba.com>
In-Reply-To: <20230309230545.2930737-1-mcgrof@kernel.org>
This whole series looks good to me. I ran some tests on my virtual
machine and it works well, so please add
Tested-by: Xin Hao <xhao@linux.alibaba.com>
Just one question: if the tmpfs page cache occupies a large amount of memory,
how can we ensure successful memory reclamation when memory runs short?
On 2023/3/10 at 7:05 AM, Luis Chamberlain wrote:
> Changes on this v2 PATCH series:
>
> o Added all respective tags for Reviewed-by, Acked-by's
> o David Hildenbrand suggested on the update-docs patch to mention THP.
> It turns out tmpfs.rst makes absolutely no mention of THP at all
> so I added all the relevant options to the docs including the
> system wide sysfs file. All that should hopefully demystify that
> and make it clearer.
> o Yosry Ahmed spell checked my patch "shmem: add support to ignore swap"
>
> Changes since RFCv2 to the first real PATCH series:
>
> o Added Christian Brauner's Acked-by for the noswap patch (the only
> change in that patch is just the new shmem_show_options() change I
> describe below).
> o Embraced Yosry Ahmed's recommendation to use mapping_set_unevictable()
> to ensure the folios at least appear in the unevictable LRU.
> Since that is the goal, this accomplishes what we want and the VM
> takes care of things for us. The shmem writepage() still uses a stop-gap
> to ensure we don't get called for swap when shmem uses
> mapping_set_unevictable().
> o I had evaluated using shmem_lock() instead of calling mapping_set_unevictable()
> but upon my review this doesn't make much sense, as shmem_lock() was
> designed to make use of the RLIMIT_MEMLOCK and this was designed for
> files / IPC / unprivileged perf limits. If we were to use
> shmem_lock() we'd bump the count on each new inode. Using
> shmem_lock() would also complicate inode allocation on shmem as
> we'd have to unwind on failure from the user_shm_lock(). It would also
> beg the question of when to capture a ucount for an inode, should we
> just share one for the superblock at shmem_fill_super() or do we
> really need to capture it at every single inode creation? In theory
> we could end up with different limits. The simple solution is to
> just use mapping_set_unevictable() upon inode creation and be done
> with it, as it cannot fail.
> o Update the documentation for tmpfs before / after my patch to
> reflect use cases a bit more clearly between ramfs, tmpfs and brd
> ramdisks.
> o I updated the shmem_show_options() to also reveal the noswap option
> when it's used.
> o Address checkpatch style complaint with spaces before tabs on
> shmem_fs.h.
>
> Changes since first RFC:
>
> o Matthew suggested BUG_ON(!folio_test_locked(folio)) is not needed
> on writepage() callback for shmem so just remove that.
> o Based on Matthew's feedback the inode is set up early as it is not
> reset in case we split the folio. So now we move all the variables
> we can set up really early.
> o shmem writepage() should only be issued on reclaim, so just move
> the WARN_ON_ONCE(!wbc->for_reclaim) early so that the code and
> expectations are easier to read. This also avoids the folio splitting
> in that odd case.
> o There are a few cases where the shmem writepage() could possibly
> hit, but in the total_swap_pages case we just bail out. We shouldn't be
> splitting the folio then. Likewise for the VM_LOCKED case. But
> a writepage() in the VM_LOCKED case is not expected, so we want to
> learn about it: add a WARN_ON_ONCE() on that condition.
> o Based on Yosry Ahmed's feedback the patch which allows tmpfs to
> disable swap now just uses mapping_set_unevictable() on inode
> creation. In that case writepage() should not be called so we
> augment the WARN_ON_ONCE() for writepage() for that case to ensure
> that never happens.
>
> To test I've used a kdevops [0] 8 vCPU, 4 GiB libvirt guest on linux-next.
>
> I'm doing this work as part of future experimentation with tmpfs and the
> page cache, but given a common complaint found about tmpfs is the
> inability to work without the page cache I figured this might be useful
> to others. It turns out it is -- at least Christian Brauner indicates
> systemd uses ramfs for a few use-cases because they don't want to use
> swap and so having this option would let them move over to using tmpfs
> for those small use cases, see systemd-creds(1).
>
> To see if you hit swap:
>
> mkswap /dev/nvme2n1
> swapon /dev/nvme2n1
> free -h
>
> With swap - what we see today
> =============================
> mount -t tmpfs -o size=5G tmpfs /data-tmpfs/
> dd if=/dev/urandom of=/data-tmpfs/5g-rand2 bs=1G count=5
> free -h
> total used free shared buff/cache available
> Mem: 3.7Gi 2.6Gi 1.2Gi 2.2Gi 2.2Gi 1.2Gi
> Swap: 99Gi 2.8Gi 97Gi
>
>
> Without swap
> =============
>
> free -h
> total used free shared buff/cache available
> Mem: 3.7Gi 387Mi 3.4Gi 2.1Mi 57Mi 3.3Gi
> Swap: 99Gi 0B 99Gi
> mount -t tmpfs -o size=5G -o noswap tmpfs /data-tmpfs/
> dd if=/dev/urandom of=/data-tmpfs/5g-rand2 bs=1G count=5
> free -h
> total used free shared buff/cache available
> Mem: 3.7Gi 2.6Gi 1.2Gi 2.3Gi 2.3Gi 1.1Gi
> Swap: 99Gi 21Mi 99Gi
>
> The mix and match remount testing
> =================================
>
> # Cannot disable swap after it was first enabled:
> mount -t tmpfs -o size=5G tmpfs /data-tmpfs/
> mount -t tmpfs -o remount -o size=5G -o noswap tmpfs /data-tmpfs/
> mount: /data-tmpfs: mount point not mounted or bad option.
> dmesg(1) may have more information after failed mount system call.
> dmesg -c
> tmpfs: Cannot disable swap on remount
>
> # Remount with the same noswap option is OK:
> mount -t tmpfs -o size=5G -o noswap tmpfs /data-tmpfs/
> mount -t tmpfs -o remount -o size=5G -o noswap tmpfs /data-tmpfs/
> dmesg -c
>
> # Trying to enable swap with a remount after it first disabled:
> mount -t tmpfs -o size=5G -o noswap tmpfs /data-tmpfs/
> mount -t tmpfs -o remount -o size=5G tmpfs /data-tmpfs/
> mount: /data-tmpfs: mount point not mounted or bad option.
> dmesg(1) may have more information after failed mount system call.
> dmesg -c
> tmpfs: Cannot enable swap on remount if it was disabled on first mount
>
> [0] https://github.com/linux-kdevops/kdevops
>
> Luis Chamberlain (6):
> shmem: remove check for folio lock on writepage()
> shmem: set shmem_writepage() variables early
> shmem: move reclaim check early on writepages()
> shmem: skip page split if we're not reclaiming
> shmem: update documentation
> shmem: add support to ignore swap
>
> Documentation/filesystems/tmpfs.rst | 66 ++++++++++++++++++++++-----
> Documentation/mm/unevictable-lru.rst | 2 +
> include/linux/shmem_fs.h | 1 +
> mm/shmem.c | 68 ++++++++++++++++++----------
> 4 files changed, 103 insertions(+), 34 deletions(-)
>
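For anyone repeating the swap checks from the cover letter above without
comparing "free -h" output by eye, here is a minimal sketch (not part of the
series). The helper names swap_used_kib and mount_has_noswap, and their
optional file arguments, are hypothetical names of my own. It assumes that
noswap is listed in /proc/mounts, which is what the shmem_show_options()
change described above suggests.

```shell
#!/bin/sh
# Sketch only: helper names and the optional file arguments are hypothetical.

# Print the amount of swap in use, in KiB, computed as SwapTotal - SwapFree
# from a meminfo-format file (default: /proc/meminfo).
swap_used_kib() {
    awk '$1 == "SwapTotal:" { total = $2 }
         $1 == "SwapFree:"  { unused = $2 }
         END { print total - unused }' "${1:-/proc/meminfo}"
}

# Succeed (exit 0) if the tmpfs mounted at $1 lists "noswap" among its
# mount options, reading a mounts-format file (default: /proc/mounts).
# Assumption: the kernel reports the option there as the literal "noswap".
mount_has_noswap() {
    awk -v mp="$1" '
        $2 == mp && $3 == "tmpfs" {
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == "noswap") found = 1
        }
        END { exit !found }' "${2:-/proc/mounts}"
}
```

Calling swap_used_kib before and after the dd run and comparing the two
numbers shows whether the write pushed anything to swap. Note that
/proc/mounts lists the mount point without a trailing slash, so pass
/data-tmpfs rather than /data-tmpfs/ to mount_has_noswap.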
Thread overview: 30+ messages
2023-03-09 23:05 [PATCH v2 0/6] tmpfs: add the option to disable swap Luis Chamberlain
2023-03-09 23:05 ` [PATCH v2 1/6] shmem: remove check for folio lock on writepage() Luis Chamberlain
2023-03-09 23:05 ` [PATCH v2 2/6] shmem: set shmem_writepage() variables early Luis Chamberlain
2023-03-09 23:05 ` [PATCH v2 3/6] shmem: move reclaim check early on writepages() Luis Chamberlain
2023-03-09 23:05 ` [PATCH v2 4/6] shmem: skip page split if we're not reclaiming Luis Chamberlain
2023-03-09 23:09 ` Yosry Ahmed
2023-04-18 4:41 ` Hugh Dickins
2023-04-18 21:11 ` Luis Chamberlain
2023-04-18 21:20 ` Hugh Dickins
2023-03-09 23:05 ` [PATCH v2 5/6] shmem: update documentation Luis Chamberlain
2023-04-18 5:29 ` Hugh Dickins
2023-04-18 21:20 ` Luis Chamberlain
2023-04-18 21:41 ` Hugh Dickins
2023-04-18 21:49 ` Luis Chamberlain
2023-03-09 23:05 ` [PATCH v2 6/6] shmem: add support to ignore swap Luis Chamberlain
2023-04-18 5:50 ` Hugh Dickins
2023-04-18 7:38 ` Christian Brauner
2023-04-18 21:51 ` Luis Chamberlain
2023-04-20 8:57 ` [PATCH] shmem: restrict noswap option to initial user namespace Christian Brauner
2023-04-20 19:18 ` Luis Chamberlain
2023-04-18 21:22 ` [PATCH v2 6/6] shmem: add support to ignore swap Luis Chamberlain
2023-04-18 21:30 ` Randy Dunlap
2023-03-14 1:21 ` [PATCH v2 0/6] tmpfs: add the option to disable swap Davidlohr Bueso
2023-03-14 2:46 ` haoxin [this message]
2023-03-19 20:32 ` Luis Chamberlain
2023-03-20 11:14 ` haoxin
2023-03-20 21:36 ` Luis Chamberlain
2023-03-21 11:37 ` haoxin
2023-04-18 4:31 ` Hugh Dickins
2023-04-18 20:55 ` Luis Chamberlain