linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: "zhengbin (A)" <zhengbin13@huawei.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: Hugh Dickins <hughd@google.com>, <viro@zeniv.linux.org.uk>,
	<linux-mm@kvack.org>, <linux-kernel@vger.kernel.org>,
	<houtao1@huawei.com>, <yi.zhang@huawei.com>,
	"J. R. Okajima" <hooanon05g@gmail.com>
Subject: Re: [PATCH] tmpfs: use ida to get inode number
Date: Sun, 1 Dec 2019 16:44:27 +0800	[thread overview]
Message-ID: <f452bd64-c8ec-d7be-44e1-8585e146b5e1@huawei.com> (raw)
In-Reply-To: <20191122221327.GW20752@bombadil.infradead.org>


On 2019/11/23 6:13, Matthew Wilcox wrote:
> On Fri, Nov 22, 2019 at 09:23:30AM +0800, zhengbin (A) wrote:
>> On 2019/11/22 3:53, Hugh Dickins wrote:
>>> On Thu, 21 Nov 2019, zhengbin (A) wrote:
>>>> On 2019/11/21 12:52, Hugh Dickins wrote:
>>>>> Just a rushed FYI without looking at your patch or comments.
>>>>>
>>>>> Internally (in Google) we do rely on good tmpfs inode numbers more
>>>>> than on those of other get_next_ino() filesystems, and carry a patch
>>>>> to mm/shmem.c for it to use 64-bit inode numbers (and separate inode
>>>>> number space for each superblock) - essentially,
>>>>>
>>>>> 	ino = sbinfo->next_ino++;
>>>>> 	/* Avoid 0 in the low 32 bits: might appear deleted */
>>>>> 	if (unlikely((unsigned int)ino == 0))
>>>>> 		ino = sbinfo->next_ino++;
>>>>>
>>>>> Which I think would be faster, and need less memory, than IDA.
>>>>> But whether that is of general interest, or of interest to you,
>>>>> depends upon how prevalent 32-bit executables built without
>>>>> __FILE_OFFSET_BITS=64 still are these days.
>>>> So how google think about this? inode number > 32-bit, but 32-bit executables
>>>> cat not handle this?
>>> Google is free to limit what executables are run on its machines,
>>> and how they are built, so little problem here.
>>>
>>> A general-purpose 32-bit Linux distribution does not have that freedom,
>>> does not want to limit what the user runs.  But I thought that by now
>>> they (and all serious users of 32-bit systems) were building their own
>>> executables with _FILE_OFFSET_BITS=64 (I was too generous with the
>>> underscores yesterday); and I thought that defined __USE_FILE_OFFSET64,
>>> and that typedef'd ino_t to be __ino64_t.  And the 32-bit kernel would
>>> have __ARCH_WANT_STAT64, which delivers st_ino as unsigned long long.
>>>
>>> So I thought that a modern, professional 32-bit executable would be
>>> dealing in 64-bit inode numbers anyway.  But I am not a system builder,
>>> so perhaps I'm being naive.  And of course some users may have to support
>>> some old userspace, or apps that assign inode numbers to "int" or "long"
>>> or whatever.  I have no insight into the extent of that problem.
>> So how to solve this problem?
>>
>> 1. tmpfs use ida or other data structure
>>
>> 2. tmpfs use 64-bit, each superblock a inode number space
>>
>> 3. do not do anything, If somebody hits this bug, let them solve for themselves
>>
>> 4. (last_ino change to 64-bit)get_next_ino -->other filesystems will be ok, but it was rejected before
> 5. Extend the sbitmap API to allow for growing the bitmap.  I had a
> look at doing that, and it looks hard.  There are a lot of things which
> are set up at initialisation and changing them mid-use seems tricky.
> Ccing Jens in case he has an opinion.
>
> 6. Creating a percpu IDA.  This doesn't seem too hard.  We need a percpu
> pointer to an IDA leaf (128 bytes), and a percpu integer which is the
> current base for this CPU.  At allocation time, find and set the first
> free bit in the leaf, and add on the current base.
>
> If the percpu leaf is full, set the XA_MARK_1 bit on the entry in
> the XArray.  Then look for any leaves which have both the XA_MARK_0
> and XA_MARK_1 bits set; if there is one, claim it by clearing the
> XA_MARK_1 bit.  If not, kzalloc a new one and find a free spot for it
> in the underlying XArray.
>
> Freeing an ID is simply ida_free().  That will involve changing the
> users of get_next_ino() to call put_ino(), or something.
>
> This should generally result in similar contention between threads as
> the current scheme -- accessing a shared resource every 1024 allocations.
> Maybe more often as we try to avoid leaving gaps in the data structure,
> or maybe less as we reuse IDs.
>
> (I've tried to explain what I want here, but appreciate it may be
> inscrutable.  I can try to explain more, or maybe I should just write
> the code myself)
Hi willy, do you write this?
>
> .
>



  parent reply	other threads:[~2019-12-01  8:44 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-11-20 14:23 [PATCH] tmpfs: use ida to get inode number zhengbin
2019-11-20 15:45 ` Matthew Wilcox
2019-11-21  2:36   ` zhengbin (A)
2019-11-21  4:52     ` Hugh Dickins
2019-11-21  6:45       ` zhengbin (A)
2019-11-21 19:53         ` Hugh Dickins
2019-11-22  1:23           ` zhengbin (A)
2019-11-22 22:13             ` Matthew Wilcox
2019-11-23  2:16               ` zhengbin (A)
2019-11-23  2:33                 ` Matthew Wilcox
2019-11-23  4:54                   ` Al Viro
2019-12-01  8:44               ` zhengbin (A) [this message]
2019-11-21 11:40       ` J. R. Okajima
2019-11-21 20:07         ` Hugh Dickins
2019-11-21  4:31   ` Al Viro

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f452bd64-c8ec-d7be-44e1-8585e146b5e1@huawei.com \
    --to=zhengbin13@huawei.com \
    --cc=hooanon05g@gmail.com \
    --cc=houtao1@huawei.com \
    --cc=hughd@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=viro@zeniv.linux.org.uk \
    --cc=willy@infradead.org \
    --cc=yi.zhang@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).