From: Greg Freemyer
Subject: Re: [PATCH] e2fsprogs: Fix the overflow in e4defrag with 2GB over file
Date: Tue, 30 Mar 2010 12:14:47 -0400
Message-ID: <87f94c371003300914h16a0ed4l38d27e1313c6383c@mail.gmail.com>
References: <4BB19BBB.9010509@rs.jp.nec.com>
In-Reply-To: <4BB19BBB.9010509@rs.jp.nec.com>
To: Akira Fujita
Cc: Theodore Tso, ext4 development

On Tue, Mar 30, 2010 at 2:35 AM, Akira Fujita wrote:
> e2fsprogs: Fix the overflow in e4defrag with 2GB over file
>
> From: Akira Fujita
>
> In e4defrag, we use locally defined posix_fallocate interface.
> And its "offset" and "len" are defined as off_t (long) type,
> their upper limit is 2GB -1 byte.
> Thus if we run e4defrag to the file whose size is 2GB over,
> the overflow occurs at calling fallocate syscall.
>
> To fix this issue, I add new define _FILE_OFFSET_BITS 64 to use
> 64bit offset for filesystem related syscalls in e4defrag.c.
> (Also this patch includes open mode fix which has been
> released but not been merged e2fsprogs git tree yet.
> http://lists.openwall.net/linux-ext4/2010/01/19/3)
>
> Reported-by: David Calinski
> Signed-off-by: Akira Fujita
> ---

Akira,

I haven't looked at the e4defrag code since September, but does it still defrag large files in one huge effort? That is, is a 100 GB sparse file used to hold a VM's virtual disk defragged all at once? And worse, when data is written into one of the holes in the sparse file, does the entire file have to be defragged again?
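The overflow in the quoted patch comes down to the width of off_t. A minimal sketch, assuming a glibc-based Linux system, where defining _FILE_OFFSET_BITS as 64 before any system header makes off_t 64 bits wide even on 32-bit builds. The helpers range_fits_off32() and off_t_bits() and the macro OFF32_MAX are hypothetical illustrations, not part of e4defrag.

```c
/* Must appear before any #include so glibc selects a 64-bit off_t. */
#define _FILE_OFFSET_BITS 64

#include <stdint.h>
#include <sys/types.h>

/* Largest value a signed 32-bit offset can hold: 2^31 - 1, i.e. 2 GB - 1 byte. */
#define OFF32_MAX ((long long)INT32_MAX)

/* Hypothetical helper: nonzero if offset, len and offset + len all fit in a
 * signed 32-bit value, i.e. the range could be passed through a 32-bit off_t
 * (as in the old fallocate wrapper) without overflowing. */
static int range_fits_off32(long long offset, long long len)
{
    if (offset < 0 || len < 0)
        return 0;
    return offset <= OFF32_MAX && len <= OFF32_MAX - offset;
}

/* Hypothetical helper: width of off_t in bits as this file was compiled. */
static unsigned off_t_bits(void)
{
    return (unsigned)(sizeof(off_t) * 8);
}
```

On 64-bit Linux, long and off_t are already 64 bits wide, so the define mainly matters for 32-bit builds, where off_t is otherwise a 32-bit long and any file of 2 GB or more fails the range_fits_off32() check.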
If so, I think that is a broken design, and e4defrag should simply skip these large files for now. The proper fix would be to defrag one "donor extent" at a time, i.e. attempt to allocate a full 128 MB extent for the donor file. If that succeeds, replace the first partial extent in the target file with the donor extent. Repeat until done.

That approach has a few advantages:

1) You never need more than one free extent to work with.

2) Once you defrag the beginning of a file, you never have to defrag it again. Thus when a sparse file gets new blocks/extents allocated, only the areas of the file that are truly fragmented have to be defragmented.

The one negative I can see is that the extents may not be well localized with this approach. Is that a major concern? Is there a way to place the new donor extent near the extent it will logically follow?

For that last issue, I think you've been working on an mballoc patch that would give e4defrag the ability to control mballoc on a per-inode basis. If not, the ohsm project has a patch for something similar. I haven't worked with the ohsm mballoc patch, so I'm not sure how it works.

Greg
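Greg's one-donor-extent-at-a-time proposal can be sketched as a small in-memory simulation in C. This is a hypothetical model, not e4defrag code: struct extent, struct file_map, struct allocator, alloc_donor() (a bump allocator standing in for mballoc), defrag_window() and defrag_file() are illustrative names. The 32768-block window assumes 4 KB blocks, so one window is the 128 MB mentioned in the proposal. The data copy and extent swap, which e4defrag performs in the kernel via the EXT4_IOC_MOVE_EXT ioctl, are elided. The sketch also assumes no extent straddles a window boundary and simply stops merging at a hole inside a window.

```c
#include <assert.h>
#include <string.h>

#define BLOCKS_PER_DONOR 32768u  /* 128 MB window, assuming 4 KB blocks */
#define MAX_EXTENTS 64

/* One mapped run: logical start block, length, physical start block. */
struct extent { unsigned lblk, len, pblk; };

/* Extents sorted by logical block; holes are simply unmapped ranges. */
struct file_map { struct extent ext[MAX_EXTENTS]; unsigned n; };

/* Hypothetical bump allocator standing in for mballoc. */
struct allocator { unsigned next, end; };

/* Hand out one contiguous run of len blocks; return 0 if there is no room. */
static int alloc_donor(struct allocator *a, unsigned len, unsigned *pblk)
{
    if (a->end - a->next < len)
        return 0;
    *pblk = a->next;
    a->next += len;
    return 1;
}

/* Defragment the run of logically contiguous extents starting at index i
 * within one window; returns the index of the next extent to examine. */
static unsigned defrag_window(struct file_map *f, unsigned i,
                              struct allocator *a, unsigned *moved)
{
    assert(i < f->n);
    unsigned win = f->ext[i].lblk / BLOCKS_PER_DONOR;
    unsigned j = i + 1, total = f->ext[i].len, pblk;

    /* Gather extents in the same window, stopping at a hole. */
    while (j < f->n && f->ext[j].lblk / BLOCKS_PER_DONOR == win &&
           f->ext[j].lblk == f->ext[j - 1].lblk + f->ext[j - 1].len) {
        total += f->ext[j].len;
        j++;
    }

    /* Already a single extent, or no free extent available: leave as-is. */
    if (j - i == 1 || !alloc_donor(a, total, &pblk))
        return j;

    /* Swap in the donor: extents i..j-1 collapse into one (data copy elided). */
    f->ext[i].len = total;
    f->ext[i].pblk = pblk;
    memmove(&f->ext[i + 1], &f->ext[j], (f->n - j) * sizeof f->ext[0]);
    f->n -= j - i - 1;
    *moved += total;
    return i + 1;
}

/* Walk the file one window at a time; returns the number of blocks moved. */
static unsigned defrag_file(struct file_map *f, struct allocator *a)
{
    unsigned moved = 0;
    for (unsigned i = 0; i < f->n; )
        i = defrag_window(f, i, a, &moved);
    return moved;
}
```

Only one donor allocation is outstanding at a time, and a second run of defrag_file() moves nothing. If a write later fills a hole with fragmented blocks, only that window is copied, which mirrors the two advantages listed in the proposal.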