All of lore.kernel.org
 help / color / mirror / Atom feed
From: Greg Harmon <gharm@google.com>
To: Dmitry Monakhov <dmonakhov@openvz.org>
Cc: Lukas Czerner <lczerner@redhat.com>,
	"linux-ext4@vger.kernel.org" <linux-ext4@vger.kernel.org>,
	"Theodore Ts'o" <tytso@mit.edu>,
	Curt Wohlgemuth <curtw@google.com>
Subject: Re: [PATCH] ext4: Do not normalize request from fallocate
Date: Fri, 22 Mar 2013 10:10:46 -0700	[thread overview]
Message-ID: <CABFC-fO3HrWFXJhyPMPKRy6Up3vGp_JUQs_VJGRyXwph1M95Cw@mail.gmail.com> (raw)
In-Reply-To: <877gl0yc2k.fsf@openvz.org>

[Resending without html.]
I haven't looked at ext4 in awhile now. CC'ing Ted and Curt in case
they want to chime in. See comment below as well.

On Thu, Mar 21, 2013 at 9:03 AM, Dmitry Monakhov <dmonakhov@openvz.org> wrote:
> On Thu, 21 Mar 2013 16:50:45 +0100, Lukas Czerner <lczerner@redhat.com> wrote:
>> Block requests from fallocate has been normalized originally. Then it was
>> changed by 556b27abf73833923d5cd4be80006292e1b31662 not to normalize it.
>> And then it was changed by 3c6fe77017bc6ce489f231c35fed3220b6691836
>> again to normalize the request.
>>
>> The fact is that we _never_ want to normalize the request from
>> fallocate. We know exactly how much space we're going to use and we do
>> not want anyone to mess with the request and there is no point in doing
>> so.
> Looks reasonable.
> Reviewed-by:Dmitry Monakhov <dmonakhov@openvz.org>
>>
>> Commit 3c6fe77017bc6ce489f231c35fed3220b6691836 mentioned that

Is this a publicly available commit? I did not find it with a quick
google search.
Anyway, I don't see a problem with this commit off-hand, but I'll
defer to Ted or Curt.

-Greg Harmon

>> large fallocate requests were not physically contiguous. However it is
>> important to see why that is the case. Because the request is so big the
>> allocator will try to find free group to allocate from skipping block
>> groups which are used, which is fine. However it will only allocate
>> extents of 2^15-1 block (limitation of uninitialized extent size)
>> which will leave one block in each block group free which will make the
>> extent tree physically non-contiguous, however _only_ by one block which
>> is perfectly fine.
>>
>> This will never happen when we normalize the request because for some
>> reason (maybe bug) it will be normalized to much smaller request (2048
>> blocks) and those extents will then be merged together not leaving any
>> free block in between - hence physically contiguous. However the fact
>> that we're splitting huge requests into ton of smaller ones and then
>> merging extents together is very _very_ bad for fallocate performance.
>>
>> The situation is even worst since with commit
>> ec22ba8edb507395c95fbc617eea26a6b2d98797 we no longer merge
>> uninitialized extents so we end up with absolutely _huge_ extent tree
>> for bigger fallocate requests which is also bad for performance but not
>> only when fallocate itself, but even when working with the file
>> later on.
>>
>> Fix this by disabling normalization for fallocate. From my simple testing
>> with this commit fallocate is much faster on non fragmented file
>> system. On my system fallocate 15T is almost 3x faster with this patch
>> and removing this file is almost 2x faster - tested on real hardware.
>>
>> Signed-off-by: Lukas Czerner <lczerner@redhat.com>
>> ---
>>  fs/ext4/extents.c |   18 ++++++++++--------
>>  1 files changed, 10 insertions(+), 8 deletions(-)
>>
>> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
>> index e2bb929..a40a602 100644
>> --- a/fs/ext4/extents.c
>> +++ b/fs/ext4/extents.c
>> @@ -4422,16 +4422,18 @@ long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
>>               trace_ext4_fallocate_exit(inode, offset, max_blocks, ret);
>>               return ret;
>>       }
>> -     flags = EXT4_GET_BLOCKS_CREATE_UNINIT_EXT;
>> -     if (mode & FALLOC_FL_KEEP_SIZE)
>> -             flags |= EXT4_GET_BLOCKS_KEEP_SIZE;
>> +
>>       /*
>> -      * Don't normalize the request if it can fit in one extent so
>> -      * that it doesn't get unnecessarily split into multiple
>> -      * extents.
>> +      * We do NOT want the requests from fallocate to be normalized
>> +      * ever!. We know exactly how much we want to allocate and
>> +      * we do not need to do any mumbo-jumbo with it. Requests bigger
>> +      * than uninit extent size, will be divided automatically into
>> +      * biggest possible extents.
>>        */
>> -     if (len <= EXT_UNINIT_MAX_LEN << blkbits)
>> -             flags |= EXT4_GET_BLOCKS_NO_NORMALIZE;
>> +     flags = EXT4_GET_BLOCKS_CREATE_UNINIT_EXT |
>> +             EXT4_GET_BLOCKS_NO_NORMALIZE;
>> +     if (mode & FALLOC_FL_KEEP_SIZE)
>> +             flags |= EXT4_GET_BLOCKS_KEEP_SIZE;
>>
>>  retry:
>>       while (ret >= 0 && ret < max_blocks) {
>> --
>> 1.7.7.6
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html

  reply	other threads:[~2013-03-22 17:11 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-03-21 15:50 [PATCH] ext4: Do not normalize request from fallocate Lukas Czerner
2013-03-21 16:03 ` Dmitry Monakhov
2013-03-22 17:10   ` Greg Harmon [this message]
2013-03-22 19:36     ` Theodore Ts'o
     [not found]     ` <514cb91d.8a48340a.33fd.ffff9fa3SMTPIN_ADDED_BROKEN@mx.google.com>
2013-03-22 22:19       ` Greg Harmon
2013-03-24  0:11 ` Theodore Ts'o
2013-03-24  2:42   ` Andreas Dilger
2013-03-25 10:09   ` Lukáš Czerner
2013-03-25 12:53     ` Theodore Ts'o
2013-03-25 13:26       ` Lukáš Czerner
2013-03-25 14:44         ` Theodore Ts'o

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CABFC-fO3HrWFXJhyPMPKRy6Up3vGp_JUQs_VJGRyXwph1M95Cw@mail.gmail.com \
    --to=gharm@google.com \
    --cc=curtw@google.com \
    --cc=dmonakhov@openvz.org \
    --cc=lczerner@redhat.com \
    --cc=linux-ext4@vger.kernel.org \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.