* [RFC] Ext4 snapshots design challenges
@ 2010-10-25 12:34 Amir G.
  2010-10-25 15:24 ` Greg Freemyer
  0 siblings, 1 reply; 5+ messages in thread
From: Amir G. @ 2010-10-25 12:34 UTC
  To: Ext4 Developers List; +Cc: next3-devel, Theodore Tso

Hi All,

In April this year, I introduced the Next3 snapshots feature on this list:
http://lwn.net/Articles/383934/

The main criticism I got was concerning the choice of forking from Ext3,
rather than developing for Ext4. To those critics I replied, that the Ext4
merge is on the roadmap, but it may take a while before we get there.

Meanwhile, Ted has been very supportive and has already merged
the (minor) on-disk changes of the snapshot feature to mainline and libext2.

These days, a group of 4 students is preparing to start porting
the Next3 snapshots feature to Ext4, with my assistance and following
some guidelines that were drawn up by Ted.

I will be attending the Linux Plumbers Conference and will try to initiate a
discussion around some design issues regarding Ext4 snapshots:
http://www.linuxplumbersconf.org/2010/ocw/proposals/1191

If you are attending LPC, you are most welcome to join the discussion
(Wed, Nov 3, 17:30) and contribute to the Ext4 snapshots design.

For those of you who didn't get the chance to catch up with Next3 snapshots
design, I have prepared this 'quick' overview:
http://sf.net/apps/mediawiki/next3/index.php?title=Technical_overview

A draft of design challenges and proposed solutions can be found here:
http://sf.net/apps/mediawiki/next3/index.php?title=Ext4_snapshots_TODO

Here inlined, for your convenience, is the first and biggest challenge of
the merge - the implementation of extent mapped file data block re-write.

Your comments will be appreciated,
Amir.


https://sourceforge.net/apps/mediawiki/next3/index.php?title=Ext4_snapshots_TODO#Ext4_snapshots_design_challenges:

=                     Ext4 snapshots design challenges                     =

The following issues require special attention when merging the snapshots
feature to ext4.

Ext4 developers are encouraged to comment on these issues and suggest
solutions other than the ones proposed here.

== Extent mapped file data block re-write ==

The term re-write refers to any write to a file's data block other than the first.
The first write allocates a new block for that file and requires no special
snapshot block operations. If a snapshot was taken after a block was
allocated, that block is protected by the snapshot's COW bitmap. Any attempt
to re-write that block should result in a snapshot block operation, which
either copies the original data to the snapshot file or moves the original
block to the snapshot file and allocates a new block for the new data.
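A minimal user-space sketch of that decision, assuming a simplified in-memory COW bitmap in which a set bit means the block was in use when the snapshot was taken. A block allocated later was free at snapshot time, so its bit is clear and its first write needs no snapshot operation. The type and function names are hypothetical and are not taken from the Next3 or ext4 sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NBLOCKS 64 /* hypothetical: a tiny block group */

/* Hypothetical COW bitmap: bit set = block was in use when the snapshot
 * was taken, so its current contents belong to the snapshot. */
typedef struct {
    uint8_t bits[NBLOCKS / 8];
} cow_bitmap;

static void cow_bitmap_set(cow_bitmap *bm, unsigned blk)
{
    assert(blk < NBLOCKS);
    bm->bits[blk / 8] |= (uint8_t)(1u << (blk % 8));
}

static bool cow_bitmap_test(const cow_bitmap *bm, unsigned blk)
{
    assert(blk < NBLOCKS);
    return (bm->bits[blk / 8] >> (blk % 8)) & 1u;
}

/* A write needs a snapshot operation (copy or move of the old data) only
 * if the snapshot protects the block.  Simplification: a real
 * implementation would also skip blocks the snapshot already holds. */
static bool write_needs_snapshot_op(const cow_bitmap *bm, unsigned blk)
{
    return cow_bitmap_test(bm, blk);
}
```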

The current implementation moves data blocks of indirect mapped files to
snapshot on re-write. The move-on-write method is more efficient than the
copy-on-write method, but it may cause a file to get fragmented in a use
case of re-writes to many random locations.
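To make the trade-off concrete, here is a toy model (hypothetical names, no real file system structures) of a single re-write done both ways: move-on-write needs no read but breaks the file's physical contiguity, while copy-on-write keeps the layout at the cost of an extra read of the old block.

```c
#include <assert.h>
#include <stdbool.h>

#define DISK_BLOCKS 32
#define FILE_BLOCKS 4

/* Toy model, not ext4 code: block contents, a logical->physical file map,
 * the blocks owned by the snapshot, and a bump allocator. */
typedef struct {
    int data[DISK_BLOCKS];
    unsigned next_free;
    unsigned map[FILE_BLOCKS];
    unsigned snap_blocks[DISK_BLOCKS];
    unsigned nsnap;
    unsigned reads;            /* read I/Os caused by snapshot operations */
} toy_fs;

static unsigned alloc_block(toy_fs *fs)
{
    assert(fs->next_free < DISK_BLOCKS);
    return fs->next_free++;
}

/* Lay out a FILE_BLOCKS-long file contiguously from block 0. */
static void toy_init(toy_fs *fs)
{
    *fs = (toy_fs){0};
    for (unsigned i = 0; i < FILE_BLOCKS; i++) {
        fs->map[i] = alloc_block(fs);
        fs->data[fs->map[i]] = (int)i;
    }
}

/* Move-on-write: give the old block to the snapshot, remap the file. */
static void rewrite_move(toy_fs *fs, unsigned lblk, int new_data)
{
    fs->snap_blocks[fs->nsnap++] = fs->map[lblk];
    fs->map[lblk] = alloc_block(fs);
    fs->data[fs->map[lblk]] = new_data;
}

/* Copy-on-write: read old data, write a snapshot copy, overwrite in place. */
static void rewrite_copy(toy_fs *fs, unsigned lblk, int new_data)
{
    unsigned blk = fs->map[lblk];
    unsigned copy = alloc_block(fs);

    fs->reads++;
    fs->data[copy] = fs->data[blk];
    fs->snap_blocks[fs->nsnap++] = copy;
    fs->data[blk] = new_data;
}

static bool file_contiguous(const toy_fs *fs)
{
    for (unsigned i = 1; i < FILE_BLOCKS; i++)
        if (fs->map[i] != fs->map[i - 1] + 1)
            return false;
    return true;
}
```

After `rewrite_move` on logical block 1 the file maps to physical blocks 0, 4, 2, 3; after `rewrite_copy` it still maps to 0..3, but one extra read was issued.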

For extent mapped files re-write, there are 2 possible solutions.
Ted Ts'o has written about this choice:

''Technically speaking, it's possible to do it both ways, yes?''
''I'm not sure why you consider this such an important design decision.''
''We can even play games where for some files we might do copy-on-write,''
''and for some files, we do move-on-write. It's always possible to check''
''the COW bitmaps to decide what had happened.''

=== Move-on-write ===

Besides the mentioned file fragmentation problem, every move-on-write
operation may need to split up a data extent into 2 extents of existing
blocks and a third extent for blocks allocated for the new data.
The metadata overhead of such a split operation is more significant than
that of an indirect mapped file move-on-write operation, and these extra
metadata updates will have to be accounted for in advance when starting a
block re-write transaction. Extent splitting may also degrade re-write
performance for extent mapped files.
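The split itself can be sketched on a simplified extent type (not the on-disk `ext4_extent`; `struct toy_extent` and `split_for_move_on_write` are hypothetical). A re-write in the middle of an extent turns one extent into three, which is where the extra metadata updates come from. Merging the new extent with a physically adjacent neighbour is deliberately not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified extent: len logical blocks from lblk map to physical
 * blocks starting at pblk. */
struct toy_extent {
    uint32_t lblk;
    uint64_t pblk;
    uint32_t len;
};

/* Split ext around the re-written range [start, start + count), which is
 * remapped to newly allocated blocks at new_pblk.  Writes up to three
 * extents into out[] and returns how many were produced. */
static int split_for_move_on_write(struct toy_extent ext, uint32_t start,
                                   uint32_t count, uint64_t new_pblk,
                                   struct toy_extent out[3])
{
    int n = 0;
    uint32_t end = start + count;            /* exclusive */
    uint32_t ext_end = ext.lblk + ext.len;   /* exclusive */

    assert(count > 0 && start >= ext.lblk && end <= ext_end);

    if (start > ext.lblk)                    /* untouched head, old blocks */
        out[n++] = (struct toy_extent){ ext.lblk, ext.pblk,
                                        start - ext.lblk };
    out[n++] = (struct toy_extent){ start, new_pblk, count };
    if (end < ext_end)                       /* untouched tail, old blocks */
        out[n++] = (struct toy_extent){ end, ext.pblk + (end - ext.lblk),
                                        ext_end - end };
    return n;
}
```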

In general, delayed allocation, or delayed move-on-write for our purpose,
should be used to avoid extent splitting as much as possible.
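One way to picture delayed move-on-write, assuming re-written blocks are only collected at writeback time: sort and coalesce the dirty logical blocks into contiguous runs, so that each run costs one extent split instead of one split per block. The helper below is a hypothetical illustration, not existing ext4 delayed allocation code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

static int cmp_u32(const void *a, const void *b)
{
    uint32_t x = *(const uint32_t *)a, y = *(const uint32_t *)b;
    return (x > y) - (x < y);
}

/* Sorts the dirty logical blocks in place and returns the number of
 * contiguous runs; each run would need a single move-on-write split. */
static size_t coalesce_dirty_runs(uint32_t *blocks, size_t n)
{
    size_t runs = 0;

    qsort(blocks, n, sizeof(*blocks), cmp_u32);
    for (size_t i = 0; i < n; i++) {
        /* New run unless this block extends or repeats the previous one. */
        if (i == 0 || blocks[i] > blocks[i - 1] + 1)
            runs++;
    }
    return runs;
}
```

With the dirty blocks {12, 10, 11, 40, 13, 41}, moving each block as soon as it is dirtied could trigger up to 6 splits, while the coalesced version needs 2.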

Perhaps the file fragmentation problem can be solved by online
de-fragmentation. After all, the original file's blocks are kept safely
inside the snapshot file, so a background task can simply copy the snapshot
moved blocks to new locations and then copy the file's new data into its
original blocks and map them back into the file.
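The background de-fragmentation steps above can be sketched on a toy disk array; all names are hypothetical. Note that step 1 is itself subject to the same ordering rule discussed below for copy-on-write: the relocated snapshot copy must be safe in storage before the original block is overwritten.

```c
#include <assert.h>

#define NDISK 16

/* Toy disk and first-fit allocator (hypothetical, illustration only). */
static int disk[NDISK];
static int used[NDISK];

static int alloc_blk(void)
{
    for (int b = 0; b < NDISK; b++)
        if (!used[b]) {
            used[b] = 1;
            return b;
        }
    return -1; /* a real file system would return ENOSPC */
}

/* Undo one move-on-write.  *snap_map is the file's original block, now
 * owned by the snapshot; *file_map is the block holding the new data.
 *   1. copy the snapshot's block to a new location and remap the snapshot;
 *   2. copy the file's new data back into its original block;
 *   3. map the original block back into the file, free the moved block.
 * In a real implementation step 1 must be durable before step 2. */
static int defrag_swap_back(int *file_map, int *snap_map)
{
    int orig = *snap_map;
    int moved = *file_map;
    int dst = alloc_blk();

    if (dst < 0)
        return -1;
    disk[dst] = disk[orig];     /* 1 */
    *snap_map = dst;
    disk[orig] = disk[moved];   /* 2 */
    *file_map = orig;           /* 3 */
    used[moved] = 0;
    return 0;
}
```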

=== Copy-on-write ===

Copying the re-written block to snapshot may seem like the "easy way out"
of the file fragmentation problem, but the problems it causes in return
are not to be disregarded.

The first and obvious problem is write performance, because every data
block re-write involves reading the content of the existing block from
storage, before proceeding with the re-write. This read I/O can be avoided
when using the move-on-write method. Though the write performance penalty
seems like a big limitation, it can be viewed as a trade-off between random
write performance and sequential read performance, and the choice can be
left in the hands of the user.

The second issue with data block copy-on-write is the snapshot reserved
blocks count. When a snapshot is taken, the file system reserves a certain
number of blocks for snapshot use. The reservation is calculated from the
estimated count of metadata blocks that may need to be copied to the
snapshot at some point in the future. Move-on-write uses far fewer snapshot
reserved blocks than copy-on-write, so with move-on-write the data blocks
count doesn't need to be accounted for. When choosing to do copy-on-write on
data block re-write, the re-write operation should first verify that there
is enough disk space for allocating the snapshot copied data blocks without
using snapshot reserved blocks.
If there is not enough disk space, the operation should return ENOSPC.
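A sketch of that check, assuming a hypothetical pair of counters for free blocks and snapshot reserved blocks; ext4's real free-space accounting is different and is not reproduced here.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical free-space counters. */
struct space_info {
    uint64_t free_blocks;        /* all free blocks */
    uint64_t snapshot_reserved;  /* reserved for metadata COW at snapshot take */
};

/* Before a copy-on-write re-write of nr_blocks data blocks, make sure the
 * snapshot copies can be allocated without touching the reserved blocks.
 * Returns 0 and charges the blocks, or -ENOSPC. */
static int reserve_data_cow_blocks(struct space_info *si, uint64_t nr_blocks)
{
    uint64_t avail = si->free_blocks > si->snapshot_reserved ?
                     si->free_blocks - si->snapshot_reserved : 0;

    if (nr_blocks > avail)
        return -ENOSPC;
    si->free_blocks -= nr_blocks;
    return 0;
}
```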

The last and most challenging issue has to do with I/O ordering within a
single snapshot COW operation. The rule is very simple: to keep the
snapshot data safe, the snapshot copy has to be secured in storage before
the new data is allowed to be written to storage.

With metadata copy-on-write, this ordering is provided as a by-product of
the journaling sub-system. All snapshot COW'ed blocks are marked as ordered
data, which is always written to storage before transaction commit starts,
while metadata blocks are always written to storage during transaction commit.

When COW'ing a data block, which may be "ordered" or "writeback", there is
no mechanism in place to help order the async writes of the snapshot COW'ed
blocks before the async writes of the re-written data blocks. Even worse,
when COW'ing an "ordered" data block, the journal will force it to storage
before transaction commit starts, while the mapping of the snapshot COW'ed
block into the snapshot file will only be written during transaction commit.

One possible solution is to implement a "holdback" list of blocks that
should not be written before the current transaction commits. Naturally, a
block must not be on both the "ordered" and "holdback" lists, but when
re-writing an allocated data block, there is no sense in making this block
"ordered", because this kind of data modification is not directly related to
any metadata modification (except a change of the inode's mtime, but who cares).
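A minimal model of the proposed "holdback" list, assuming a hypothetical per-buffer state and a simplified transaction ID with no wrap-around handling. This is not an existing jbd2 interface; it only illustrates the intended rule: the snapshot copy is "ordered" and may be written before commit, while the re-written data block is held back until the transaction that maps the copy into the snapshot file has committed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical journal lists a buffer can be on. */
enum jlist { J_NONE, J_ORDERED, J_HOLDBACK };

struct toy_buf {
    enum jlist list;
    unsigned tid;            /* transaction that filed the buffer */
};

struct toy_journal {
    unsigned running_tid;    /* transaction currently open */
    unsigned committed_tid;  /* last fully committed transaction */
};

/* A block must never be both "ordered" and "holdback". */
static void file_buffer(struct toy_buf *bh, enum jlist list,
                        const struct toy_journal *j)
{
    assert(bh->list == J_NONE || bh->list == list);
    bh->list = list;
    bh->tid = j->running_tid;
}

/* May writeback submit this buffer now?  Held-back buffers wait until the
 * transaction that filed them has committed; others may go at any time. */
static bool may_write(const struct toy_buf *bh, const struct toy_journal *j)
{
    if (bh->list == J_HOLDBACK)
        return j->committed_tid >= bh->tid;
    return true;
}
```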

....


* Re: [RFC] Ext4 snapshots design challenges
  2010-10-25 12:34 [RFC] Ext4 snapshots design challenges Amir G.
@ 2010-10-25 15:24 ` Greg Freemyer
  2010-10-25 16:05   ` Amir G.
  0 siblings, 1 reply; 5+ messages in thread
From: Greg Freemyer @ 2010-10-25 15:24 UTC
  To: Amir G.; +Cc: Ext4 Developers List, next3-devel

Amir,

I recently saw an announcement for X-Ways Forensics
(http://www.x-ways.net/) that they now support next3 as a filesystem
to analyze.  See Oct. 10 msg under topic "Announcements: X-Ways
Forensics 15.8" at http://www.winhex.net/  (I think that is a public
posting board.)

I was surprised to see that, but assuming it was indeed your project
they added support for, I congratulate you on the above.

I'm curious what level of support they offer.  In particular, they
only offer limited support for NTFS shadow copies, so I'm curious if
the next3 support is similarly limited.

Or since next3 is GPL they may have been able to do a more
comprehensive job with it than with ntfs shadow copies.

Any info you have would be appreciated.
Greg


* Re: [RFC] Ext4 snapshots design challenges
  2010-10-25 15:24 ` Greg Freemyer
@ 2010-10-25 16:05   ` Amir G.
  2010-10-27  0:13     ` Greg Freemyer
  0 siblings, 1 reply; 5+ messages in thread
From: Amir G. @ 2010-10-25 16:05 UTC
  To: Greg Freemyer; +Cc: Ext4 Developers List, next3-devel

On Mon, Oct 25, 2010 at 5:24 PM, Greg Freemyer <greg.freemyer@gmail.com> wrote:
> Amir,
>
> I recently saw an announcement for X-Ways Forensics
> (http://www.x-ways.net/) that they now support next3 as a filesystem
> to analyze.  See Oct. 10 msg under topic "Announcements: X-Ways
> Forensics 15.8" at http://www.winhex.net/  (I think that is a public
> posting board.)
>
> I was surprised to see that, but assuming it was indeed your project
> they added support for, I congratulate you on the above.
>

Thanks! I guess :-)
I am pretty clueless with regards to the big players in the storage market.
I do not know X-Ways, but it looks like they are a big player.

> I'm curious what level of support they offer.  In particular, they
> only offer limited support for NTFS shadow copies, so I'm curious if
> the next3 support is similarly limited.
>
> Or since next3 is GPL they may have been able to do a more
> comprehensive job with it than with ntfs shadow copies.
>
> Any info you have would be appreciated.
> Greg
>

As you can figure out, I was not involved or notified about this move.
Judging from their release notes, I would say that the added support is
mostly adding some information tags and verifying the correctness of the
exclude bitmap:

* Support for the Linux file system next3. The exclude bitmap inode
  will be evaluated, and snapshot files are marked with (SF) in the
  Attribute column. Specialist license or higher required.

You shouldn't be too surprised to learn that the only file system
integrity test that I have added in my e2fsprogs patches is verifying
the correctness of the exclude bitmap ;-)

Thanks for the info and sorry if your post was rejected from next3-devel.
I fixed the permissions for posts from non-members.

Amir.
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [RFC] Ext4 snapshots design challenges
  2010-10-25 16:05   ` Amir G.
@ 2010-10-27  0:13     ` Greg Freemyer
  2010-10-27  2:05       ` Amir G.
  0 siblings, 1 reply; 5+ messages in thread
From: Greg Freemyer @ 2010-10-27  0:13 UTC
  To: Amir G.; +Cc: Ext4 Developers List, next3-devel

On Mon, Oct 25, 2010 at 12:05 PM, Amir G.
<amir73il@users.sourceforge.net> wrote:
> On Mon, Oct 25, 2010 at 5:24 PM, Greg Freemyer <greg.freemyer@gmail.com> wrote:
>> Amir,
>>
>> I recently saw an announcement for X-Ways Forensics
>> (http://www.x-ways.net/) that they now support next3 as a filesystem
>> to analyze.  See Oct. 10 msg under topic "Announcements: X-Ways
>> Forensics 15.8" at http://www.winhex.net/  (I think that is a public
>> posting board.)
>>
>> I was surprised to see that, but assuming it was indeed your project
>> they added support for, I congratulate you on the above.
>>
>
> Thanks! I guess :-)
> I am pretty clueless with regards to the big players in the storage market.
> I do not know X-Ways, but it looks like they are a big player.


X-Ways is a computer forensic tool.  It is used to find evidence on
computers.  (You might want to check my sig below.)  X-Ways is one of
the 3 biggest forensic suite vendors and their forensic app sells for
about $1K.  (My company has 3 licenses.)

A perfect situation for analysis of a next3 based filesystem would be
if a contract had been fraudulently updated after it was signed and
X-Ways was able to pull up older versions of the contract and prove
the fraud.

The fact that they took the time to recover documents out of a next3
filesystem implies they thought next3 was deployed widely enough to be
worth the effort.

I know they also add features for specific large customers, so it
could simply be that a large client of theirs asked them to add next3
support for some internal reason.

>> I'm curious what level of support they offer.  In particular, they
>> only offer limited support for NTFS shadow copies, so I'm curious if
>> the next3 support is similarly limited.
>>
>> Or since next3 is GPL they may have been able to do a more
>> comprehensive job with it than with ntfs shadow copies.
>>
>> Any info you have would be appreciated.
>> Greg
>>
>
> As you can figure out, I was not involved or notified about this move.
> Judging from their release notes, I would say that the added support is
> mostly adding some information tags and verifying the correctness of the
> exclude bitmap:
>
> * Support for the Linux file system next3. The exclude bitmap inode
>   will be evaluated, and snapshot files are marked with (SF) in the
>   Attribute column. Specialist license or higher required.

But the ability to pull out snapshot files in an orderly fashion is
the core functionality they could add from their perspective.  So
while you may think this is basic, it means they took the time to
decode your filesystem structure and pull out snapshot files.  Since
they don't actually use any of the GPL code (or at least I hope they
don't), that means they had to develop the fs analyser just for next3.
Not something I suspect can be done with limited effort.

They do the same for NTFS shadow volumes, but even now the
functionality is not complete enough for them to call it supported.

> You shouldn't be too surprised to learn that the only file system
> integrity test that I have added in my e2fsprogs patches is verifying
> the correctness of the exclude bitmap ;-)
>
> Thanks for the info and sorry if your post was rejected from next3-devel.
> I fixed the permissions for posts from non-members.

No problem

> Amir.
>

Greg
-- 
Greg Freemyer
Head of EDD Tape Extraction and Processing team
Litigation Triage Solutions Specialist
http://www.linkedin.com/in/gregfreemyer
CNN/TruTV Aired Forensic Imaging Demo -
   http://insession.blogs.cnn.com/2010/03/23/how-computer-evidence-gets-retrieved/

The Norcross Group
The Intersection of Evidence & Technology
http://www.norcrossgroup.com


* Re: [RFC] Ext4 snapshots design challenges
  2010-10-27  0:13     ` Greg Freemyer
@ 2010-10-27  2:05       ` Amir G.
  0 siblings, 0 replies; 5+ messages in thread
From: Amir G. @ 2010-10-27  2:05 UTC
  To: Greg Freemyer; +Cc: Ext4 Developers List, next3-devel

On Wed, Oct 27, 2010 at 2:13 AM, Greg Freemyer <greg.freemyer@gmail.com> wrote:
> On Mon, Oct 25, 2010 at 12:05 PM, Amir G.
> <amir73il@users.sourceforge.net> wrote:
>> On Mon, Oct 25, 2010 at 5:24 PM, Greg Freemyer <greg.freemyer@gmail.com> wrote:
>>> Amir,
>>>
>>> I recently saw an announcement for X-Ways Forensics
>>> (http://www.x-ways.net/) that they now support next3 as a filesystem
>>> to analyze.  See Oct. 10 msg under topic "Announcements: X-Ways
>>> Forensics 15.8" at http://www.winhex.net/  (I think that is a public
>>> posting board.)
>>>
>>> I was surprised to see that, but assuming it was indeed your project
>>> they added support for, I congratulate you on the above.
>>>
>>
>> Thanks! I guess :-)
>> I am pretty clueless with regards to the big players in the storage market.
>> I do not know X-Ways, but it looks like they are a big player.
>
>
> X-Ways is a computer forensic tool.  It is used to find evidence on
> computers.  (You might want to check my sig below.)  X-Ways is one of
> the 3 biggest forensic suite vendors and their forensic app sells for
> about $1K.  (My company has 3 licenses.)
>
> A perfect situation for analysis of a next3 based filesystem would be
> if a contract had been fraudulently updated after it was signed and
> X-Ways was able to pull up older versions of the contract and prove
> the fraud.
>
> The fact that they took the time to recover documents out of a next3
> filesystem implies they thought next3 was deployed widely enough to be
> worth the effort.
>
> I know they also add features for specific large customers, so it
> could simply be that a large client of theirs asked them to add next3
> support for some internal reason.
>

That's very interesting. I sure hope that next3 (or better yet ext4 snapshots)
will be widely deployed, but I am guessing that X-Ways are trying to stay
in sync with the latest libext2, so when Ted accepted the on-disk format
changes to libext2 a few months ago, they must have updated their library
as well.

>>> I'm curious what level of support they offer.  In particular, they
>>> only offer limited support for NTFS shadow copies, so I'm curious if
>>> the next3 support is similarly limited.
>>>
>>> Or since next3 is GPL they may have been able to do a more
>>> comprehensive job with it than with ntfs shadow copies.
>>>
>>> Any info you have would be appreciated.
>>> Greg
>>>
>>
>> As you can figure out, I was not involved or notified about this move.
>> Judging from their release notes, I would say that the added support is
>> mostly adding some information tags and verifying the correctness of the
>> exclude bitmap:
>>
>> * Support for the Linux file system next3. The exclude bitmap inode
>>   will be evaluated, and snapshot files are marked with (SF) in the
>>   Attribute column. Specialist license or higher required.
>
> But the ability to pull out snapshot files in an orderly fashion is
> the core functionality they could add from their perspective.  So
> while you may think this is basic, it means they took the time to
> decode your filesystem structure and pull out snapshot files.  Since
> they don't actually use any of the GPL code (or at least I hope they
> don't), that means they had to develop the fs analyser just for next3.
> Not something I suspect can be done with limited effort.
>

The changes that next3 made to on-disk format of ext3 are minor:
http://sourceforge.net/apps/mediawiki/next3/index.php?title=On-disk_format
(and have already been pushed to mainline)

So if you have code that decodes ext3 structures, be it GPL or not,
the effort required to decode next3 is very limited, and it looks to me
like they have only invested that limited effort so far.

However, if any of you forensic developers out there hear me,
you should know that extracting a full snapshot image, or a snapshot
files report, should be a trivial task if you have all the snapshot
file structures decoded.

I was planning to implement something like e2image -r /dev/sda1@1,
but I am probably not going to get around to that in the near future.

> They do the same for NTFS shadow volumes, but even now the
> functionality is not complete enough for them to call it supported.
>
>> You shouldn't be too surprised to learn that the only file system
>> integrity test that I have added in my e2fsprogs patches is verifying
>> the correctness of the exclude bitmap ;-)
>>
>> Thanks for the info and sorry if your post was rejected from next3-devel.
>> I fixed the permissions for posts from non-members.
>
> No problem
>
>> Amir.
>>
>
> Greg
> --
> Greg Freemyer
> Head of EDD Tape Extraction and Processing team
> Litigation Triage Solutions Specialist
> http://www.linkedin.com/in/gregfreemyer
> CNN/TruTV Aired Forensic Imaging Demo -
>    http://insession.blogs.cnn.com/2010/03/23/how-computer-evidence-gets-retrieved/
>
> The Norcross Group
> The Intersection of Evidence & Technology
> http://www.norcrossgroup.com
>


Thread overview: 5+ messages
2010-10-25 12:34 [RFC] Ext4 snapshots design challenges Amir G.
2010-10-25 15:24 ` Greg Freemyer
2010-10-25 16:05   ` Amir G.
2010-10-27  0:13     ` Greg Freemyer
2010-10-27  2:05       ` Amir G.
