From: "Theodore Y. Ts'o" <tytso@mit.edu>
To: Daniel Phillips <daniel@phunq.net>
Cc: linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Subject: Re: [RFC] Thing 1: Shardmap fox Ext4
Date: Wed, 27 Nov 2019 09:25:08 -0500
Message-ID: <20191127142508.GB5143@mit.edu>
In-Reply-To: <176a1773-f5ea-e686-ec7b-5f0a46c6f731@phunq.net>

A couple of quick observations about Shardmap.

(1) It's licensed[1] under the GPLv3, so it's not compatible with the kernel license. That doesn't matter much for ext4, because...

[1] https://github.com/danielbot/Shardmap/blob/master/LICENSE

(2) It's implemented as userspace code (e.g., it uses open(2), mmap(2), et al.) and written in C++, so it would need to be reimplemented from scratch for use in the kernel.

(3) It's not particularly well documented, which makes the above more challenging, but it appears to be a variation of an extensible hashing scheme, which was used by dbm and Berkeley DB.

(4) Because of (2), we won't be able to do any actual benchmarks for a while. I just checked the latest version of Tux3[2], and it appears to still be using a linear search scheme for its directory, i.e., an O(n) lookup à la ext2. So I'm guessing Shardmap may have been *designed* for Tux3, but it has not yet been *implemented* for Tux3?

[2] https://github.com/OGAWAHirofumi/linux-tux3/blob/hirofumi/fs/tux3/dir.c#L283

(5) The claim is made that readdir() accesses files sequentially, but Shardmap also mentions compressing shards (i.e., rewriting them) to squeeze out deleted and tombstone entries. This pretty much guarantees that it will not be possible to satisfy the POSIX requirements of telldir(3)/seekdir(3) (which use a 32-bit or 64-bit cookie), NFS (which also requires a 32-bit or 64-bit cookie during a readdir scan), or readdir() semantics when directory entries are inserted into or removed from the directory.
(To be specific, POSIX requires that readdir return each entry in a directory once and only once, and that a directory entry which is removed or inserted during the scan be returned either zero times or exactly once. This holds even if telldir(3) or seekdir(3) is used to memoize a particular location in the directory, which means a 32-bit or 64-bit cookie must be able to identify a particular location in the readdir(3) stream. If the file system wants to be exportable via NFS, it must meet similar requirements, except that the 32-bit or 64-bit cookie MUST survive a reboot.)

Regards,

- Ted