From: Amir Goldstein <amir73il@gmail.com>
To: Chris Down <chris@chrisdown.name>
Cc: Matthew Wilcox <willy@infradead.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Jeff Layton <jlayton@kernel.org>,
	Johannes Weiner <hannes@cmpxchg.org>, Tejun Heo <tj@kernel.org>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	kernel-team@fb.com, Hugh Dickins <hughd@google.com>,
	Miklos Szeredi <miklos@szeredi.hu>,
	"zhengbin (A)" <zhengbin13@huawei.com>,
	Roman Gushchin <guro@fb.com>
Subject: Re: [PATCH] fs: inode: Reduce volatile inode wraparound risk when ino_t is 64 bit
Date: Tue, 24 Dec 2019 05:04:11 +0200
Message-ID: <CAOQ4uxjm5JMvfbi4xa3yaDwuM+XpNOSDrbVsHvJtkms00ZBnAg@mail.gmail.com>
In-Reply-To: <20191223204551.GA272672@chrisdown.name>

> The slab i_ino recycling approach works somewhat, but is unfortunately neutered
> quite a lot by the fact that slab recycling is per-memcg. That is, replacing
> with recycle_or_get_next_ino(old_ino)[0] for shmfs and a few other trivial
> callsites only leads to about 10% slab reuse, which doesn't really stem the
> bleeding of 32-bit inums on an affected workload:
>
>      # tail -5000 /sys/kernel/debug/tracing/trace | grep -o 'recycle_or_get_next_ino:.*' | sort | uniq -c
>          4454 recycle_or_get_next_ino: not recycled
>           546 recycle_or_get_next_ino: recycled
>

Too bad.
Maybe inode number recycling should be implemented all the same, since it
is simple and may improve workloads that are not so memcg-intensive.
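
To be clear about what I mean, here is my guess at the shape of the
recycling helper -- a sketch only, not the actual code behind your [0]
footnote:

#include <linux/fs.h>	/* for get_next_ino() */

/* Sketch only: reuse the i_ino left behind in the slab object, if any. */
static unsigned int recycle_or_get_next_ino(unsigned int old_ino)
{
        /*
         * If the inode cache does not zero objects on free, a nonzero
         * old_ino means this object had a previous life and its number
         * can be handed straight back out.
         */
        if (old_ino)
                return old_ino;

        return get_next_ino();
}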

> Roman (who I've just added to cc) tells me that currently we only have
> per-memcg slab reuse instead of global when using CONFIG_MEMCG. This
> contributes fairly significantly here since there are multiple tasks across
> multiple cgroups which are contributing to the get_next_ino() thrash.
>
> I think this is a good start, but we need something of a different magnitude in
> order to actually solve this problem with the current slab infrastructure. How
> about something like the following?
>
> 1. Add get_next_ino_full, which uses whatever the full width of ino_t is
> 2. Use get_next_ino_full in tmpfs (et al.)

I would prefer that filesystems making heavy use of get_next_ino() be
converted to use a private ino pool per sb:

ino_pool_create()
ino_pool_get_next()

Flags to ino_pool_create() could determine the desired ino range.
Does the Facebook use case involve a single large tmpfs or many
small ones? I would guess the latter, and in that case we are trying to
solve a problem that nobody really needs solved (i.e. an efficient
global ino pool).
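
To make the proposal concrete, here is a bare sketch of the API I have
in mind. Everything below (names, flags, struct layout) is made up for
illustration; nothing like it exists in the tree:

#include <linux/atomic.h>
#include <linux/err.h>
#include <linux/slab.h>
#include <linux/types.h>

#define INO_POOL_32BIT	(1 << 0)	/* inos must fit in 32 bits */

struct ino_pool {
        atomic64_t      next_ino;
        unsigned int    flags;
};

struct ino_pool *ino_pool_create(unsigned int flags)
{
        struct ino_pool *pool = kzalloc(sizeof(*pool), GFP_KERNEL);

        if (!pool)
                return ERR_PTR(-ENOMEM);
        pool->flags = flags;
        atomic64_set(&pool->next_ino, 0);
        return pool;
}

ino_t ino_pool_get_next(struct ino_pool *pool)
{
        u64 ino;

        do {
                ino = atomic64_inc_return(&pool->next_ino);
                if (pool->flags & INO_POOL_32BIT)
                        ino = (u32)ino;
        } while (!ino);	/* 0 is reserved; skip it on wraparound */

        return ino;
}

The point is that with a pool per sb, a 32-bit range only wraps when a
single instance churns through 2^32 inodes, instead of every tmpfs on
the machine sharing one counter.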

> 3. Add a mount option to tmpfs (et al.), say `32bit-inums`, which people can
>     pass if they want the 32-bit inode numbers back. This would still allow
>     people who want to make this tradeoff to use xino.

inode32|inode64 (see man xfs(5)).
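
i.e. something like this (hypothetical for tmpfs, just mirroring the
XFS semantics):

        mount -t tmpfs -o inode64 tmpfs /mnt/scratch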

> 4. (If you like) Also add a CONFIG option to disable this at compile time.
>

I don't know about disabling it at compile time, but the default mode for
tmpfs (inode32|inode64) might be best determined by a CONFIG option, so
distro builders could decide whether to take the risk of breaking
applications on tmpfs.
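
A minimal sketch of what I mean, assuming a hypothetical
CONFIG_TMPFS_INODE64 knob (neither the symbol nor the macro below
exists today):

/* Compile-time default; inode32|inode64 mount options override per sb. */
#ifdef CONFIG_TMPFS_INODE64
#define SHMEM_INODE64_DEFAULT	true
#else
#define SHMEM_INODE64_DEFAULT	false
#endif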

But if you implement a per-sb ino pool, maybe inode64 will no longer be
required for your use case?

Thanks,
Amir.

Thread overview: 19+ messages
2019-12-20  2:49 [PATCH] fs: inode: Reduce volatile inode wraparound risk when ino_t is 64 bit Chris Down
2019-12-20  3:05 ` zhengbin (A)
2019-12-20  8:32 ` Amir Goldstein
2019-12-20 12:16   ` Chris Down
2019-12-20 13:41     ` Amir Goldstein
2019-12-20 16:46       ` Matthew Wilcox
2019-12-20 17:35         ` Amir Goldstein
2019-12-20 19:50           ` Matthew Wilcox
2019-12-23 20:45             ` Chris Down
2019-12-24  3:04               ` Amir Goldstein [this message]
2019-12-25 12:54                 ` Chris Down
2019-12-26  1:40                   ` zhengbin (A)
2019-12-20 21:30 ` Darrick J. Wong
2019-12-21  8:43   ` Amir Goldstein
2019-12-21 18:05     ` Darrick J. Wong
2019-12-21 10:16   ` Chris Down
2020-01-07 17:35     ` J. Bruce Fields
2020-01-07 17:44       ` Chris Down
2020-01-08  3:00         ` J. Bruce Fields
