Subject: Re: [PATCH] tmpfs: use ida to get inode number
From: "zhengbin (A)"
To: Hugh Dickins
CC: Matthew Wilcox, "J. R. Okajima"
Date: Thu, 21 Nov 2019 14:45:00 +0800
References: <1574259798-144561-1-git-send-email-zhengbin13@huawei.com>
 <20191120154552.GS20752@bombadil.infradead.org>
 <1c64e7c2-6460-49cf-6db0-ec5f5f7e09c4@huawei.com>

On 2019/11/21 12:52, Hugh Dickins wrote:
> On Thu, 21 Nov 2019, zhengbin (A) wrote:
>> On 2019/11/20 23:45, Matthew Wilcox wrote:
>>> On Wed, Nov 20, 2019 at 10:23:18PM +0800, zhengbin wrote:
>>>> I have tried to change last_ino type to unsigned long, while this was
>>>> rejected, see details on https://patchwork.kernel.org/patch/11023915.
>>> Did you end up trying sbitmap?
>> Maybe sbitmap is not a good solution: max_inodes of tmpfs is controlled
>> by the nr_inodes mount option, which can be changed (raised or lowered)
>> on remount. As the comment on sbitmap_resize() says:
>>
>>  * Doesn't reallocate anything. It's up to the caller to ensure that the new
>>  * depth doesn't exceed the depth that the sb was initialized with.
>>
>> We could modify sbitmap to support growing, but these questions would
>> remain:
>>
>> 1. tmpfs is a RAM filesystem, so we would need to allocate sbitmap
>>    memory for sbinfo->max_inodes (which may be huge).
>>
>> 2. If a remount changes max_inodes, we have to deal with it, and this
>>    may take a long time:
>>    - bigger: we need to allocate new sbitmap memory, copy the old
>>      sbitmap into it, and free the old memory;
>>    - smaller: how do we deal with it? For example, we use
>>      sb->map[inode number/8] to find the sbitmap; do we need to change
>>      the existing inode numbers? They may already be in use by
>>      userspace applications.
>>
>>> What I think is fundamentally wrong with this patch is that you've found a
>>> problem in get_next_ino() and decided to use a different scheme for this
>>> one filesystem, leaving every other filesystem which uses get_next_ino()
>>> facing the same problem.
>>>
>>> That could be acceptable if you explained why tmpfs is fundamentally
>>> different from all the other filesystems that use get_next_ino(), but
>>> you haven't (and I don't think there is such a difference; e.g. pipes,
>>> autofs and ipc mqueue could all have the same problem).
>> tmpfs is the same as all the other filesystems that use get_next_ino(),
>> but we need to solve this problem one filesystem at a time. If the
>> tmpfs change is OK, we can modify the other filesystems too. Besides,
>> I do not recommend that all filesystems share the same global variable,
>> because of the performance impact.
>>
>>> There are some other problems I noticed, but they're not worth bringing
>>> up until this fundamental design choice is justified.
>> Agree, thanks.
> Just a rushed FYI without looking at your patch or comments.
>
> Internally (in Google) we do rely on good tmpfs inode numbers more
> than on those of other get_next_ino() filesystems, and carry a patch
> to mm/shmem.c for it to use 64-bit inode numbers (and separate inode
> number space for each superblock) - essentially,
>
>         ino = sbinfo->next_ino++;
>         /* Avoid 0 in the low 32 bits: might appear deleted */
>         if (unlikely((unsigned int)ino == 0))
>                 ino = sbinfo->next_ino++;
>
> Which I think would be faster, and need less memory, than IDA.
> But whether that is of general interest, or of interest to you,
> depends upon how prevalent 32-bit executables built without
> _FILE_OFFSET_BITS=64 still are these days.

So how does Google think about this? Once an inode number exceeds 32 bits,
32-bit executables cannot handle it, right? A "separate inode number space
for each superblock" reduces the probability of hitting that, but still
cannot solve it.

>
> Hugh
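
For illustration only, the scheme quoted above can be sketched in user
space roughly as follows. This is not the actual mm/shmem.c patch:
struct sb_info, alloc_ino() and fits_in_32bit() are hypothetical names
made up for this sketch, and the kernel's unlikely() hint is left out.
It shows a per-superblock 64-bit counter that skips values whose low
32 bits are zero, plus a check for whether a number would still be
representable in a 32-bit ino_t.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the per-superblock state; the real
 * structure in mm/shmem.c has many more fields.
 */
struct sb_info {
	uint64_t next_ino;	/* next inode number to hand out */
};

/*
 * Hand out the next inode number for this superblock only.
 * Numbers whose low 32 bits are all zero are skipped, because a
 * 32-bit view of such a number would be 0 and might look deleted.
 */
static uint64_t alloc_ino(struct sb_info *sbinfo)
{
	uint64_t ino = sbinfo->next_ino++;

	if ((uint32_t)ino == 0)
		ino = sbinfo->next_ino++;
	return ino;
}

/* True if the number can be stored in a 32-bit ino_t without loss. */
static bool fits_in_32bit(uint64_t ino)
{
	return ino <= UINT32_MAX;
}
```

Starting from next_ino = 0, the first number handed out is 1. When the
counter reaches 0x100000000 the allocator skips to 0x100000001. Numbers
above 0xFFFFFFFF are the ones for which stat() in a 32-bit program
built without _FILE_OFFSET_BITS=64 is expected to fail with EOVERFLOW,
so a separate counter per superblock only delays that point; it does
not remove it.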