From: Jan Kara <jack@suse.cz>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Andreas Dilger <adilger@clusterfs.com>,
	sho@tnes.nec.co.jp, linux-ext4@vger.kernel.org,
	linux-fsdevel@vger.kernel.org
Subject: Re: [RFC][PATCH 2/3] Move the file data to the new blocks
Date: Thu, 8 Feb 2007 11:47:39 +0100
Message-ID: <20070208104739.GA3674@duck.suse.cz>
In-Reply-To: <20070208023213.902eed32.akpm@linux-foundation.org>

On Thu 08-02-07 02:32:13, Andrew Morton wrote:
> On Thu, 8 Feb 2007 11:21:02 +0100 Jan Kara <jack@suse.cz> wrote:
> 
> > On Thu 08-02-07 01:45:29, Andrew Morton wrote:
> >  <snip>
> > > > >   I thought Andreas meant "any write changes" - i.e. you check that no one
> > > > > has an open file descriptor for writing and block any new open for writing.
> > > > > That can be done quite easily.
> > > > >   Anyway, I agree with you that a userspace solution to possible page
> > > > > cache pollution is preferable after thinking about it for a while.
> > > > As I've been thinking about it, we could actually do the copying
> > > > from user space. We could do something like:
> > > >   block any writes to file (as I described above)
> > > >   craft new inode with blocks allocated as we want (using preallocation,
> > > >     we should mostly have the kernel infrastructure we need)
> > > >   copy data using splice syscall
> > > >   call the kernel to switch data
> > > > 
> > > 
> > > I don't think we need to block any writes to any file or anything.
> > > 
> > > To move a page within a file:
> > > 
> > > 	fd = open(file);
> > > 	p = mmap(fd);
> > > 	the_page_was_in_core = mincore(p, offset);
> > > 	munmap(p);
> > > 	ioctl(fd, ..., new_block);
> > > 
> > > 			<kernel>
> > > 			read_cache_page(inode, offset);
> > > 			lock_page(page);
> > > 			if (try_to_free_buffers(page)) {
> > > 				<relocate the page>
> > > 				set_page_dirty(page);
> > > 			}
> > > 			unlock_page(page);
> > > 
> > > 	if (the_page_was_in_core) {
> > > > 		sync_file_range(fd, offset, SYNC_FILE_RANGE_WAIT_BEFORE|
> > > 						SYNC_FILE_RANGE_WRITE|
> > > 						SYNC_FILE_RANGE_WAIT_AFTER);
> > > 		fadvise(fd, offset, FADV_DONTNEED);
> > > 	}
> > > 
> > > completely coherent with pagecache, quite safe in the presence of mmap,
> > > mlock, O_DIRECT, everything else.  Also fully journallable in-kernel.
> >   Yes, this is the simple way. But I see two disadvantages:
> > 1) You'd like to relocate metadata (indirect blocks) too.
> 
> Well.  Do we really?  Are we looking for a 100% solution here, or a 90% one?
  Umm, I think that for ext3, having data on one end of the disk and
indirect blocks on the other end of the disk does not really help (not to
mention that it can create bad free-space fragmentation over time).
I have not measured it, but I'd guess that it would erase the effect of
moving data closer together. At least for sequential reads.

> Relocating data is the main thing.  After that, yeah, relocating metadata,
> inodes and directories is probably a second-order thing.
> 
> > For that you need
> >    a different mechanism.
> 
> I suspect a similar approach will work there: load and lock the
> buffer_heads (or maybe just the top-level buffer_head) and then alter their
> contents.  It could be that verify_chain() will just magically do the right
> thing there, but some changes might be needed.
  Yes, it could be done. I just wanted to point out that things may
not be as simple in your solution either...

									Honza
-- 
Jan Kara <jack@suse.cz>
SuSE CR Labs
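
The userspace half of the pseudocode quoted above (check residency with
mincore, then write back and drop the page) can be sketched in C roughly
as follows. This is only an illustration under stated assumptions: the
helper names page_in_core() and flush_and_drop_page() are hypothetical,
the offset is assumed to be page-aligned, and the relocation ioctl itself
is left out because the thread does not name it.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: report whether the page at page-aligned `offset`
 * of `fd` is resident in the page cache.
 * Returns 1 if resident, 0 if not, -1 on error.
 */
int page_in_core(int fd, off_t offset)
{
	long page_size = sysconf(_SC_PAGESIZE);
	unsigned char vec = 0;
	void *p;
	int rc;

	/* Map just the one page; mapping alone does not fault it in. */
	p = mmap(NULL, page_size, PROT_READ, MAP_SHARED, fd, offset);
	if (p == MAP_FAILED)
		return -1;

	/* The low bit of the vector entry is set if the page is resident. */
	rc = mincore(p, page_size, &vec);
	munmap(p, page_size);

	return rc < 0 ? -1 : (vec & 1);
}

/*
 * Hypothetical helper: write back one page and wait for it, then ask
 * the kernel to drop it from the page cache.
 * Returns 0 on success, -1 on error.
 */
int flush_and_drop_page(int fd, off_t offset)
{
	long page_size = sysconf(_SC_PAGESIZE);

	if (sync_file_range(fd, offset, page_size,
			    SYNC_FILE_RANGE_WAIT_BEFORE |
			    SYNC_FILE_RANGE_WRITE |
			    SYNC_FILE_RANGE_WAIT_AFTER) < 0)
		return -1;

	/* posix_fadvise() returns an error number rather than -1. */
	return posix_fadvise(fd, offset, page_size, POSIX_FADV_DONTNEED) == 0
		? 0 : -1;
}
```

A caller would record page_in_core() before issuing the relocation
request and decide afterwards, as in the quoted pseudocode, whether to
call flush_and_drop_page() for that offset.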
