linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jon Bernard <jbernard@tuxion.com>
To: Theodore Ts'o <tytso@mit.edu>
Cc: Dmitry Monakhov <dmonakhov@openvz.org>, linux-ext4@vger.kernel.org
Subject: Re: kernel bug at fs/ext4/resize.c:409
Date: Fri, 14 Feb 2014 15:19:05 -0500	[thread overview]
Message-ID: <20140214201905.GA26292@helmut> (raw)
In-Reply-To: <20140213211831.GA11480@thunk.org>

* Theodore Ts'o <tytso@mit.edu> wrote:
> On Thu, Feb 13, 2014 at 09:53:23AM -0500, Jon Bernard wrote:
> > The image should be available here:
> > 
> > http://c5a6e06e970802d5126f-8c6b900f6923cc24b844c506080778ec.r72.cf1.rackcdn.com/fedora_resize_fails.qcow2
> 
> Thanks for the image.  I've been able to reproduce the problem, and
> it's caused by the fact that the inode table is so large that it's
> overflowing into a subsequent block group, and the resize code isn't
> handling this.  Fixing this may be a bit tricky, since the flex_bg
> online resize code is a big ugly at the moment, and needs some clean
> up so this can be fixed properly.
> 
> Until that can be done --- one question: was there a deliberate reason
> why the file system was created with parameters which allocate 32,752
> inodes per block group?  That means that a bit over 8 megabytes of
> inode table are being reserved for every 128 megabyte (32768 4k
> blocks) block group, and that you have more inodes reserved than could
> be used if the average file size is 4k or less.  In fact, the only way
> you could run out of inodes is if you had huge numbers of devices,
> sockets, small symlinks, or zero-length files in your file system.
> This seems to be a bit of a waste of space, in all liklihood.

Ahh, I see.  Here's where this comes from: the particular usecase is
provisioning of new cloud instances whose root volume is of unknown
size.  The filesystem and its contents are created and bundled
before-hand into the smallest filesystem possible.  The instance is PXE
booted for provisioning and the root filesystem is then copied onto the
disk - and then resized to take advantage of the total amount of space.

In order to support very large partitions, the filesystem is created
with an abnormally large inode table so that large resizes would be
possible.  I traced it to this commit as best I can tell:

    https://github.com/openstack/diskimage-builder/commit/fb246a02eb2ed330d3cc37f5795b3ed026aabe07

I assumed that additional inodes would be allocated along with block
groups during an online resize, but that commit contradicts my current
understanding. 

I suggested that the filesystem be created during the time of
provisioning to allow a more optimal on-disk layout, and I believe this
is being considered now.

> Don't get me wrong; we should be able to handle this case correctly,
> and not trigger a BUG_ON, but this is why most people aren't seeing
> this particular fault --- it requires a far greater number of inodes
> than mke2fs would ever create by default, or that most system
> administrators would try to deliberately specify, when creating the
> file system.

Thank you for taking the time to look into this, it is very much
appreciated.

> I'll look and see what's the best way to fix up fs/ext4/resize.c in
> the kernel.

If it turns out to be not terribly complicated and there is not an
immediate time constraint, I would love to try to help with this or at
least test patches.

Cheers,

-- 
Jon

  parent reply	other threads:[~2014-02-14 20:28 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-02-03 18:26 kernel bug at fs/ext4/resize.c:409 Jon Bernard
2014-02-03 18:56 ` Theodore Ts'o
2014-02-06 21:08   ` Jon Bernard
2014-02-13 13:24     ` Dmitry Monakhov
2014-02-13 14:53       ` Jon Bernard
2014-02-13 21:18         ` Theodore Ts'o
2014-02-13 21:27           ` Theodore Ts'o
2014-02-14  3:13           ` Andreas Dilger
2014-02-14 20:19           ` Jon Bernard [this message]
2014-02-14 23:46             ` Theodore Ts'o
2014-02-15  3:16               ` Darrick J. Wong
2014-02-15 15:34                 ` Theodore Ts'o
2014-02-16  2:35 ` [PATCH] ext4: fix online resize with very large inode tables Theodore Ts'o

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140214201905.GA26292@helmut \
    --to=jbernard@tuxion.com \
    --cc=dmonakhov@openvz.org \
    --cc=linux-ext4@vger.kernel.org \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).