From: Theodore Ts'o <tytso@mit.edu>
To: Andreas Dilger <adilger@dilger.ca>
Cc: Alexey Lyashkov <alexey.lyashkov@gmail.com>,
	Artem Blagodarenko <artem.blagodarenko@gmail.com>,
	linux-ext4 <linux-ext4@vger.kernel.org>,
	Yang Sheng <yang.sheng@intel.com>,
	Zhen Liang <liang.zhen@intel.com>,
	Artem Blagodarenko <artem.blagodarenko@seagate.com>
Subject: Re: [PATCH] Add largedir feature
Date: Mon, 20 Mar 2017 07:42:01 -0400
Message-ID: <20170320114201.icgvngqty52q6wf3@thunk.org>
In-Reply-To: <2F91584E-6351-4523-9821-54AD6A7CD889@dilger.ca>

On Sun, Mar 19, 2017 at 07:54:40PM -0400, Andreas Dilger wrote:
> 
> No, the directory tree for the Lustre MDS is just a regular directory
> tree (under "ROOT/" so we can have other files outside the visible
> namespace) with regular filenames as with local ext4.  The one difference
> is that there are also 128-bit FIDs stored in the dirents to allow readdir
> to work efficiently, but the majority of the other Lustre attributes
> are stored in xattrs on the inode.

OK, so let's summarize.

1.  This is only going to be an issue for Lustre users who are
creating truly insanely large directories, and who aren't willing to
use multi-level directories (e.g., users/t/y/tytso) for whatever reason.

2.  Currently the proposal is to upstream largedir, and not
necessarily the other file system features that are Lustre MDS
specific.

3.  I can therefore assume that Artem is interested in getting
largedir upstream for use cases and users that go beyond Lustre ---
and these users will probably be using non-zero length inodes, in
which case my observation about the slowdown caused by having to
spread out the inodes to place them close to their data blocks will
apply.

4.  Alexey's concerns, which seem to be based around Lustre users for
whom (1) is true, could potentially be addressed by additional file
system changes, which could either continue to be Lustre MDS specific
and not upstreamed, or could be upstreamed at some future point ---
but which are fairly orthogonal to this discussion.

Does that seem fair?

					- Ted

P.S.  I could imagine some changes that involve using 64-bit inode
numbers where the low log2(inode_size) bits are used for the location
of the inode in the block, and the rest of the inode number is used to
identify the block number where the inode can be found --- and
abandoning the use of an "inode table" completely.  The inode
allocation bitmap block could be used instead to tell us which blocks
in the block group contain inodes for e2fsck pass 1 scanning.  Things
get a bit more complicated in e2fsck if it turns out that bitmap block
is corrupt, but that's a subject for another day, and I suspect it's
something that would only make sense if the Lustre community is
willing to put in the investment to work on it.
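
To make the P.S. concrete, here is a minimal sketch of how such an
inode number might be packed and unpacked.  It is an illustration
only, not existing ext4 code: the function names, the reading of "the
location of the inode in the block" as a slot index, and the bit order
assumed for the bitmap (bit i lives in byte i // 8, at bit i % 8) are
all assumptions.

```python
def slot_bits(inode_size: int) -> int:
    """Low bits reserved for the in-block location: log2(inode_size)."""
    if inode_size <= 0 or inode_size & (inode_size - 1):
        raise ValueError("inode_size must be a power of two")
    return inode_size.bit_length() - 1


def encode_ino(block_nr: int, slot: int, inode_size: int) -> int:
    """Pack (block number, slot within the block) into a 64-bit inode number."""
    bits = slot_bits(inode_size)
    if not 0 <= slot < (1 << bits):
        raise ValueError("slot does not fit in the low bits")
    ino = (block_nr << bits) | slot
    if block_nr < 0 or ino >= 1 << 64:
        raise ValueError("inode number does not fit in 64 bits")
    return ino


def decode_ino(ino: int, inode_size: int) -> tuple[int, int]:
    """Recover (block number, slot) from a packed inode number."""
    bits = slot_bits(inode_size)
    return ino >> bits, ino & ((1 << bits) - 1)


def blocks_with_inodes(bitmap: bytes, group_first_block: int) -> list[int]:
    """List the blocks in a group whose bitmap bit is set (hypothetical layout).

    Assumes one bit per block in the group, where a set bit means
    "this block holds inodes", which is what a pass 1 scan would need.
    """
    found = []
    for byte_index, byte in enumerate(bitmap):
        for bit in range(8):
            if byte & (1 << bit):
                found.append(group_first_block + byte_index * 8 + bit)
    return found
```

With 256-byte inodes, encode_ino(1000, 3, 256) reserves the low 8 bits
for the slot, and decode_ino() of the result gives back (1000, 3); the
byte offset inside the block would then be slot * inode_size.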
