linux-fsdevel.vger.kernel.org archive mirror
From: Andreas Dilger <adilger@dilger.ca>
To: Dave Chinner <david@fromorbit.com>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>,
	Christoph Hellwig <hch@infradead.org>,
	Theodore Ts'o <tytso@mit.edu>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Mike Snitzer <snitzer@redhat.com>, Jan Kara <jack@suse.cz>,
	Eric Biggers <ebiggers@google.com>,
	riteshh@linux.ibm.com, krisman@collabora.com, surajjs@amazon.com,
	dmonakhov@gmail.com, mbobrowski@mbobrowski.org,
	Eric Whitney <enwlinux@gmail.com>,
	sblbir@amazon.com, Khazhismel Kumykov <khazhy@google.com>,
	linux-ext4 <linux-ext4@vger.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Linux FS Devel <linux-fsdevel@vger.kernel.org>
Subject: Re: [PATCH RFC 5/5] ext4: Add fallocate2() support
Date: Sat, 29 Feb 2020 13:12:52 -0700
Message-ID: <F2CA6010-F7E5-4891-A337-FA1FEB32B935@dilger.ca>
In-Reply-To: <20200228211610.GQ10737@dread.disaster.area>

On Feb 28, 2020, at 2:16 PM, Dave Chinner <david@fromorbit.com> wrote:
> 
> On Fri, Feb 28, 2020 at 08:35:19AM -0700, Andreas Dilger wrote:
>> On Feb 27, 2020, at 5:24 AM, Kirill Tkhai <ktkhai@virtuozzo.com> wrote:
>>> 
>>> So, this interface is 3-in-1:
>>> 
>>> 1)finds a placement for inodes extents;
>> 
>> The target allocation size would be sum(size of inodes), which should
>> be relatively small in your case.
>> 
>>> 2)assigns this space to some temporary donor inode;
>> 
>> Maybe yes, or just reserves that space from being allocated by anyone.
>> 
>>> 3)calls ext4_move_extents() for each of them.
>> 
>> ... using the target space that was reserved earlier
>> 
>>> Do I understand you right?
>> 
>> Correct.  That is my "5 minutes thinking about an interface for grouping
>> small files together without exposing kernel internals" proposal for this.
> 
> You don't need any special kernel interface with XFS for this. It is
> simply:
> 
> 	mkdir tmpdir
> 	create O_TMPFILEs in tmpdir
> 
> Now all the tmpfiles you create and their data will be co-located
> around the location of the tmpdir inode. This is the natural
> placement policy of the filesystem, i.e. the filesystem assumes that
> files in the same directory are all related, so will be accessed
> together and so should be located in relatively close proximity to
> each other.

Sure, this will likely get inodes allocated _close_ to each other on
ext4 as well (the new directory will preferentially be located in a
group that has free space), but it doesn't necessarily result in
all of the files being packed densely.  Files of 1MB+4KB and 1MB-4KB,
for example, will still prefer to be aligned on 1MB boundaries rather
than packed together.
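
As a rough illustration of the difference (a sketch only: the 1MB
alignment and the two file sizes come from the paragraph above, while
the helper names and the simple round-up model are assumptions and not
what mballoc actually does), compare the span covered when every file
starts on a 1MB boundary with the span when files are laid out back to
back:

```c
#include <stddef.h>
#include <stdint.h>

#define KB 1024ULL
#define MB (1024ULL * KB)

/* Round x up to the next multiple of align (align must be non-zero). */
static uint64_t round_up(uint64_t x, uint64_t align)
{
	return (x + align - 1) / align * align;
}

/* Bytes spanned when the files are laid out back to back. */
static uint64_t packed_span(const uint64_t *sizes, size_t n)
{
	uint64_t end = 0;

	for (size_t i = 0; i < n; i++)
		end += sizes[i];
	return end;
}

/* Bytes spanned when every file starts on an 'align' boundary. */
static uint64_t aligned_span(const uint64_t *sizes, size_t n, uint64_t align)
{
	uint64_t end = 0;

	for (size_t i = 0; i < n; i++)
		end = round_up(end, align) + sizes[i];
	return end;
}
```

In this model, a 1MB+4KB file followed by a 1MB-4KB file spans exactly
2MB when packed, but 3MB-4KB when aligned, leaving a hole of 1MB-4KB
between the two files.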

>>> Can we introduce a flag, that some of inode is unmovable?
>> 
>> There are very few flags left in the ext4_inode->i_flags for use.
>> You could use "IMMUTABLE" or "APPEND_ONLY" to mean that, but they
>> also have other semantics.  The EXT4_NOTAIL_FL is for not merging the
>> tail of a file, but ext4 doesn't have tails (that was in Reiserfs),
>> so we might consider it a generic "do not merge" flag if set?
> 
> Indeed, thanks to XFS, ext4 already has an interface that can be
> used to set/clear a "no defrag" flag such as you are asking for.
> It's the FS_XFLAG_NODEFRAG bit in the FS_IOC_FS[GS]ETXATTR ioctl.
> In XFS, that manages the XFS_DIFLAG_NODEFRAG on-disk inode flag,
> and it has special meaning for directories. From the 'man 3 xfsctl'
> man page where this interface came from:
> 
>      Bit 13 (0x2000) - XFS_XFLAG_NODEFRAG
> 	No defragment file bit - the file should be skipped during a
> 	defragmentation operation. When applied to  a directory,
> 	new files and directories created will inherit the no-defrag
> 	bit.

The interface is not the limiting factor here, but rather the number
of flags available in the inode.  Since chattr/lsattr from e2fsprogs
was used as "common ground" for a few years, there are a number of
flags in the namespace that don't actually have any meaning for ext4.

One of those flags is:

#define EXT4_NOTAIL_FL    0x00008000 /* file tail should not be merged */

This was added for Reiserfs, but it is not used by any other filesystem,
so generalizing it slightly to mean "no migrate" is reasonable.  That
doesn't affect Reiserfs in any way, and it would still be possible to
also wire up the XFS_XFLAG_NODEFRAG bit to be stored as that flag.
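
A minimal sketch of what that wiring might look like, for illustration
only: the two flag values are the ones quoted in this thread, but the
helper names are hypothetical and none of this is existing ext4 code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values as quoted in this thread. */
#define EXT4_NOTAIL_FL    0x00008000U /* file tail should not be merged */
#define FS_XFLAG_NODEFRAG 0x00002000U /* bit 13: skip file during defrag */

/*
 * Hypothetical helper: fold the NODEFRAG bit from an fsxattr-style
 * xflags word into the inode flags, reusing NOTAIL as "no migrate".
 * All other inode flags are left untouched.
 */
static uint32_t nodefrag_xflags_to_iflags(uint32_t i_flags, uint32_t xflags)
{
	if (xflags & FS_XFLAG_NODEFRAG)
		i_flags |= EXT4_NOTAIL_FL;
	else
		i_flags &= ~EXT4_NOTAIL_FL;
	return i_flags;
}

/* Hypothetical helper: report the stored flag back as NODEFRAG. */
static uint32_t iflags_to_nodefrag_xflags(uint32_t i_flags)
{
	return (i_flags & EXT4_NOTAIL_FL) ? FS_XFLAG_NODEFRAG : 0;
}

/* Hypothetical check a defragmenter could make before moving extents. */
static bool inode_is_unmovable(uint32_t i_flags)
{
	return (i_flags & EXT4_NOTAIL_FL) != 0;
}
```

The point of the sketch is only that both user-visible interfaces
(chattr's flag and the xflags bit) could map onto the same stored bit,
so no new inode flag would have to be consumed.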

It wouldn't be any issue at all to choose an arbitrary unused flag to
store this in the ext4 inode internally, except that chattr/lsattr are
used by a variety of different filesystems, so whatever flag is chosen
will immediately also apply to any other filesystem that users use those
tools on.

Cheers, Andreas







