* [PATCH 00/35] e2fsprogs April 2015 patchbomb
@ 2015-04-02  2:34 Darrick J. Wong
  2015-04-02  2:34 ` [PATCH 01/35] e2fuzz: fuzz harder Darrick J. Wong
                   ` (33 more replies)
  0 siblings, 34 replies; 70+ messages in thread
From: Darrick J. Wong @ 2015-04-02  2:34 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

April Fools!

It's been a couple of months; here's a revised patchbomb for 1.43.
There are a few fixes for minor bugs and bitrot I've encountered since
the last patchbomb, but aside from resolving merge conflicts, I
haven't changed a thing.

Patch 1 makes e2fuzz try harder to screw things up, by remounting the
supposedly fixed filesystem and continuing to modify it.  This helps
us to find discrepancies between what the kernel complains about and
what e2fsck knows to fix.

Patch 2 fixes a bug wherein an inlinedata symbolic link longer than 60
characters but missing the extended attribute portion was not
correctly truncated; the kernel apparently expects a symlink with the
inlinedata flag set to have the xattr part even if the target fits
inside i_block[].

Patch 7 fixes a bug where the kernel refuses to allocate clusters to
non-extent files on bigalloc filesystems by converting all non-extent
files to use extents.

Patch 9 teaches e2fsck to complain loudly when someone attempts to
read an obviously invalid block number.  There are a few places where
this is acceptable (trying to resolve inodes to pathnames for
reporting purposes, and fixing crosslinked file messes) but otherwise
we really shouldn't be playing with garbage data.

All other patches (3-6, 8, and 10-35) have not changed since last time:

Patches 3-4 are the e2fsck metadata readahead patches, unchanged
from September 2014.

Patch 5 changes e2fsck to use a bitmap instead of a u32 list when
building the list of directories to rehash.  This enables some code
cleanup and makes it so we can free the dirinfo structure earlier.
No changes from December 2014.

Patches 6-8 rebuild extent trees.  This can be used to convert block
mapped files to extent files (-E bmap2extent), and it can also detect
sparse extent trees that could be reduced in size by either a full ETB
block or a full level.  The code is now smart enough to put off
detecting and rebuilding the extent trees of directories that are
going to be rehashed in pass 3A until after the rehash because the
rehash process can shrink a directory enough to trigger the rebuilder
during the next e2fsck run.  No changes from December 2014 aside from
the new patch 7, discussed above.

Patches 10-13 prepare the undo IO manager and e2undo for heavier use
by adding discard, zeroout, and readahead call pass through support;
allow user programs to provide an undo IO file block size that differs
from the filesystem; and speed up block writeout considerably by
tracking which blocks we've already written in a bitmap (instead of
repeatedly bashing on the tdb keystore).  No change from December
2014 for any of the e2undo patches.
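
To illustrate the written-block tracking described above, here is a
minimal sketch, not the code from the patches.  It assumes one bit per
filesystem block; struct written_map and its helpers are hypothetical
names invented for this example.

```c
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical tracker: one bit per filesystem block. */
struct written_map {
	uint64_t nblocks;
	unsigned char *bits;
};

/* Allocate a zeroed bitmap covering nblocks blocks; returns 0 on success. */
int written_map_init(struct written_map *m, uint64_t nblocks)
{
	m->nblocks = nblocks;
	m->bits = calloc((size_t)(nblocks / CHAR_BIT + 1), 1);
	return m->bits ? 0 : -1;
}

/*
 * Returns 1 the first time a block is seen (the caller should save the
 * old contents to the undo file), and 0 if the block was already saved
 * or lies outside the map.
 */
int written_map_test_and_set(struct written_map *m, uint64_t blk)
{
	unsigned char mask;

	if (blk >= m->nblocks)
		return 0;
	mask = (unsigned char)(1u << (blk % CHAR_BIT));
	if (m->bits[blk / CHAR_BIT] & mask)
		return 0;
	m->bits[blk / CHAR_BIT] |= mask;
	return 1;
}

void written_map_free(struct written_map *m)
{
	free(m->bits);
	m->bits = NULL;
}
```

The point of the design is that the "have we saved this block yet?"
question becomes a single bit test in memory rather than a keystore
lookup for every write.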

Patch 14 replaces e2undo's tdb file with a dumb flat file format,
which greatly reduces the insane performance losses when using undo
files while fixing a lot of endianness bugs, database size
limitations, and the totally broken detection of undo files that
should not be applied to the filesystem.

Patch 15 implements atexit() handlers so that the undo IO manager has
a chance to finish writing the undo file if the program exits without
explicitly cleaning up the IO managers.

Patches 16-22 enable the creation of e2undo files for all modern
e2fsprogs utilities, add simple test cases for e2undo, and supply
test cases for the new undo features.

Patches 23-27 fix some bugs in the copy-in support for mke2fs and
change the file copy-in algorithm to use SEEK_DATA and SEEK_HOLE to
skip pointless reads on sparse files.  Rudimentary feature testing is
provided, and I added a contrib/ script to generate the minimum-sized
ext4 image of a particular directory.  No change since December 2014.

Patches 28-34 are new API calls in the library, primarily to support
the new fallocate feature in patch 31.  None of these patches have
changed since July 2014.

Patch 35 implements fuse2fs, a FUSE server based on libext2fs.
Primarily I've been using it to shake out bugs in the library via
xfstests and the metadata checksumming test program.  It can also be
used to mount ext4 on any OS supporting FUSE, and it can also mount
64k-block filesystems on x86, though I'd be wary of using rw mode.
fuse2fs depends on these new APIs: xattr editing, uninit extent
handling, and the new fallocate call.  No changes since July 2014.

I've tested these e2fsprogs changes against the -next branch as of
3/28, though the patches have been rebased to reflect the minor
changes in this morning's -next.  The patches have been tested against
the 'make check' suite and a week's worth of e2fuzz testing on x86_64,
ppc64, armv7l, i386, and aarch64.  Github, for crazy testers:
https://github.com/djwong/e2fsprogs/commits/next

Comments and questions are, as always, welcome.

--D


* [PATCH 01/35] e2fuzz: fuzz harder
  2015-04-02  2:34 [PATCH 00/35] e2fsprogs April 2015 patchbomb Darrick J. Wong
@ 2015-04-02  2:34 ` Darrick J. Wong
  2015-04-21  1:47   ` Theodore Ts'o
  2015-04-02  2:34 ` [PATCH 02/35] e2fsck: turn inline data symlink into a fast symlink when possible Darrick J. Wong
                   ` (32 subsequent siblings)
  33 siblings, 1 reply; 70+ messages in thread
From: Darrick J. Wong @ 2015-04-02  2:34 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

Once we've "fixed" the filesystem, try mounting and modifying it to see
if we can break the kernel.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 misc/e2fuzz.sh |   60 +++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 59 insertions(+), 1 deletion(-)


diff --git a/misc/e2fuzz.sh b/misc/e2fuzz.sh
index d8d9a82..389f2ca 100755
--- a/misc/e2fuzz.sh
+++ b/misc/e2fuzz.sh
@@ -139,7 +139,7 @@ if [ $? -ne 0 ]; then
 fi
 SRC_SZ="$(du -ks "${SRCDIR}" | awk '{print $1}')"
 FS_SZ="$(( $(stat -f "${TESTMNT}" -c '%a * %S') / 1024 ))"
-NR="$(( (FS_SZ * 6 / 10) / SRC_SZ ))"
+NR="$(( (FS_SZ * 4 / 10) / SRC_SZ ))"
 if [ "${NR}" -lt 1 ]; then
 	NR=1
 fi
@@ -263,6 +263,64 @@ seq 1 "${PASSES}" | while read pass; do
 			break;
 		fi
 	fi
+
+	echo "+++ check fs for round 2"
+	FSCK_LOG="${TESTDIR}/e2fuzz-${pass}-round2.log"
+	e2fsck -fn "${FSCK_IMG}" ${EXTENDED_FSCK_OPTS} >> "${FSCK_LOG}" 2>&1
+	res=$?
+	if [ "${res}" -ne 0 ]; then
+		echo "++++ fsck failed."
+		exit 1
+	fi
+
+	echo "++ mount image (2)"
+	mount "${FSCK_IMG}" "${TESTMNT}" -o loop
+	res=$?
+
+	if [ "${res}" -eq 0 ]; then
+		echo "+++ ls -laR (2)"
+		ls -laR "${TESTMNT}/test.1/" > /dev/null 2> "${OPS_LOG}"
+
+		echo "+++ cat files (2)"
+		find "${TESTMNT}/test.1/" -type f -size -1048576k -print0 | xargs -0 cat > /dev/null 2>> "${OPS_LOG}"
+
+		echo "+++ expand (2)"
+		find "${TESTMNT}/" -type f 2> /dev/null | head -n 50000 | while read f; do
+			attr -l "$f" > /dev/null 2>> "${OPS_LOG}"
+			if [ -f "$f" -a -w "$f" ]; then
+				dd if=/dev/zero bs="${BLK_SZ}" count=1 >> "$f" 2>> "${OPS_LOG}"
+			fi
+			mv "$f" "$f.longer" > /dev/null 2>> "${OPS_LOG}"
+		done
+		sync
+
+		echo "+++ create files (2)"
+		cp -pRdu "${SRCDIR}" "${TESTMNT}/test.moo" 2>> "${OPS_LOG}"
+		sync
+
+		echo "+++ remove files (2)"
+		rm -rf "${TESTMNT}/test.moo" 2>> "${OPS_LOG}"
+
+		umount "${TESTMNT}"
+		res=$?
+		if [ "${res}" -ne 0 ]; then
+			ret=1
+			break
+		fi
+		sync
+		test "${USE_FUSE2FS}" -gt 0 && sleep 2
+
+		echo "+++ check fs (2)"
+		e2fsck -fn "${FSCK_IMG}" >> "${FSCK_LOG}" 2>&1
+		res=$?
+		if [ "${res}" -ne 0 ]; then
+			echo "++ fsck failed."
+			exit 1
+		fi
+	else
+		echo "++ mount(2) failed with ${res}"
+		exit 1
+	fi
 	rm -rf "${FSCK_IMG}" "${PASS_IMG}" "${FUZZ_LOG}" "${TESTDIR}"/e2fuzz*.log
 done
 



* [PATCH 02/35] e2fsck: turn inline data symlink into a fast symlink when possible
  2015-04-02  2:34 [PATCH 00/35] e2fsprogs April 2015 patchbomb Darrick J. Wong
  2015-04-02  2:34 ` [PATCH 01/35] e2fuzz: fuzz harder Darrick J. Wong
@ 2015-04-02  2:34 ` Darrick J. Wong
  2015-04-21  1:47   ` Theodore Ts'o
  2015-04-02  2:34 ` [PATCH 03/35] libext2fs/e2fsck: provide routines to read-ahead metadata Darrick J. Wong
                   ` (31 subsequent siblings)
  33 siblings, 1 reply; 70+ messages in thread
From: Darrick J. Wong @ 2015-04-02  2:34 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

When there's a problem accessing the EA part of an inline data symlink
and we want to truncate the symlink back to 60 characters (hoping the
user can re-establish the link later on, apparently) be sure to turn
off the inline data flag to convert the symlink back to a regular fast
symlink.
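
The flag handling can be sketched in isolation as follows.  This is an
illustration rather than the e2fsck code: the SK_* constants are local
stand-ins that I assume match LINUX_S_IFMT, LINUX_S_IFLNK, and
EXT4_INLINE_DATA_FL, and the helper name is hypothetical.

```c
#include <stdint.h>

/* Local stand-ins; assumed to match the ext2fs header values. */
#define SK_S_IFMT		0170000
#define SK_S_IFLNK		0120000
#define SK_INLINE_DATA_FL	0x10000000u

/*
 * Hypothetical helper: once a symlink target has been truncated so that
 * it fits in i_block[], drop the inline data flag so the inode reads as
 * a plain fast symlink.  Other file types keep their flags untouched.
 */
uint32_t clear_inline_flag_if_symlink(uint16_t mode, uint32_t flags)
{
	if ((mode & SK_S_IFMT) == SK_S_IFLNK)
		flags &= ~SK_INLINE_DATA_FL;
	return flags;
}
```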

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 e2fsck/pass1.c                     |    2 ++
 tests/f_inlinedata_repair/expect.1 |    5 ++++-
 tests/f_inlinedata_repair/expect.2 |    2 +-
 3 files changed, 7 insertions(+), 2 deletions(-)


diff --git a/e2fsck/pass1.c b/e2fsck/pass1.c
index 791817b..bf95ae1 100644
--- a/e2fsck/pass1.c
+++ b/e2fsck/pass1.c
@@ -1251,6 +1251,8 @@ void e2fsck_pass1(e2fsck_t ctx)
 						ctx->flags |= E2F_FLAG_ABORT;
 						goto endit;
 					}
+					if (LINUX_S_ISLNK(inode->i_mode))
+						inode->i_flags &= ~EXT4_INLINE_DATA_FL;
 					e2fsck_write_inode(ctx, ino, inode,
 							   "pass1");
 					failed_csum = 0;
diff --git a/tests/f_inlinedata_repair/expect.1 b/tests/f_inlinedata_repair/expect.1
index cc220ba..9c84b14 100644
--- a/tests/f_inlinedata_repair/expect.1
+++ b/tests/f_inlinedata_repair/expect.1
@@ -21,6 +21,9 @@ Salvage? yes
 Directory inode 32, block #0, offset 4: directory corrupted
 Salvage? yes
 
+Symlink /1 (inode #12) is invalid.
+Clear? yes
+
 Symlink /3 (inode #14) is invalid.
 Clear? yes
 
@@ -51,5 +54,5 @@ Unattached zero-length inode 35.  Clear? yes
 Pass 5: Checking group summary information
 
 test_filesys: ***** FILE SYSTEM WAS MODIFIED *****
-test_filesys: 27/128 files (0.0% non-contiguous), 18/512 blocks
+test_filesys: 26/128 files (0.0% non-contiguous), 18/512 blocks
 Exit status is 1
diff --git a/tests/f_inlinedata_repair/expect.2 b/tests/f_inlinedata_repair/expect.2
index 2c400a5..69d874e 100644
--- a/tests/f_inlinedata_repair/expect.2
+++ b/tests/f_inlinedata_repair/expect.2
@@ -3,5 +3,5 @@ Pass 2: Checking directory structure
 Pass 3: Checking directory connectivity
 Pass 4: Checking reference counts
 Pass 5: Checking group summary information
-test_filesys: 27/128 files (0.0% non-contiguous), 18/512 blocks
+test_filesys: 26/128 files (0.0% non-contiguous), 18/512 blocks
 Exit status is 0



* [PATCH 03/35] libext2fs/e2fsck: provide routines to read-ahead metadata
  2015-04-02  2:34 [PATCH 00/35] e2fsprogs April 2015 patchbomb Darrick J. Wong
  2015-04-02  2:34 ` [PATCH 01/35] e2fuzz: fuzz harder Darrick J. Wong
  2015-04-02  2:34 ` [PATCH 02/35] e2fsck: turn inline data symlink into a fast symlink when possible Darrick J. Wong
@ 2015-04-02  2:34 ` Darrick J. Wong
  2015-04-21  3:03   ` Theodore Ts'o
  2015-04-02  2:34 ` [PATCH 04/35] e2fsck: read-ahead metadata during passes 1, 2, and 4 Darrick J. Wong
                   ` (30 subsequent siblings)
  33 siblings, 1 reply; 70+ messages in thread
From: Darrick J. Wong @ 2015-04-02  2:34 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

This patch adds to e2fsck the ability to pre-fetch metadata into the
page cache in the hopes of speeding up fsck runs.  There are two new
functions -- the first allows a caller to readahead a list of blocks,
and the second is a helper function that uses that first mechanism to
load group data (bitmaps, inode tables).
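
The block-list readahead works by merging consecutive block numbers
into runs so that each run becomes one readahead request.  The sketch
below shows that merging step on its own, under some assumptions: the
input is a sorted array rather than a dblist, every entry counts as one
block, and struct block_run and coalesce_runs() are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* One contiguous run of blocks: [start, start + len). */
struct block_run {
	uint64_t start;
	uint64_t len;
};

/*
 * Merge a sorted list of block numbers into contiguous runs.  At most
 * max_out runs are stored in out[]; the number stored is returned.  A
 * block that does not directly follow the current run starts a new one.
 */
size_t coalesce_runs(const uint64_t *blocks, size_t n,
		     struct block_run *out, size_t max_out)
{
	size_t nruns = 0, i;

	for (i = 0; i < n; i++) {
		if (nruns &&
		    blocks[i] == out[nruns - 1].start + out[nruns - 1].len) {
			out[nruns - 1].len++;	/* extends the current run */
			continue;
		}
		if (nruns == max_out)
			break;			/* no room for another run */
		out[nruns].start = blocks[i];
		out[nruns].len = 1;
		nruns++;
	}
	return nruns;
}
```

For the input 10, 11, 12, 20, 21, 40 this yields three runs, so a
caller would issue three readahead requests instead of six.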

These new e2fsck routines require the addition of a dblist API to
allow us to iterate a subset of a dblist.  This will enable
incremental directory block readahead in e2fsck pass 2.

There's also a function to estimate a suitable readahead size for a given FS.

v2: Add an API to create a dblist with a given number of list elements
pre-allocated.  This enables us to save ~2ms per call to
e2fsck_readahead() (assuming a 2MB RA buffer) by not having to
repeatedly call ext2_resize_mem as we add blocks to the list.

v3: Instead of creating dblists of arbitrary size, change the dblist
iterator to allow iterating a sub-range.  This eliminates a lot of
unnecessary list copying during e2fsck pass 2.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 configure           |    2 
 configure.in        |    1 
 e2fsck/Makefile.in  |    9 +-
 e2fsck/e2fsck.h     |   18 ++++
 e2fsck/readahead.c  |  252 +++++++++++++++++++++++++++++++++++++++++++++++++++
 e2fsck/util.c       |   51 ++++++++++
 lib/config.h.in     |    3 +
 lib/ext2fs/dblist.c |   21 ++++
 lib/ext2fs/ext2fs.h |   10 ++
 9 files changed, 359 insertions(+), 8 deletions(-)
 create mode 100644 e2fsck/readahead.c


diff --git a/configure b/configure
index f59d232..fdc93c0 100755
--- a/configure
+++ b/configure
@@ -12414,7 +12414,7 @@ fi
 done
 
 fi
-for ac_header in  	dirent.h 	errno.h 	execinfo.h 	getopt.h 	malloc.h 	mntent.h 	paths.h 	semaphore.h 	setjmp.h 	signal.h 	stdarg.h 	stdint.h 	stdlib.h 	termios.h 	termio.h 	unistd.h 	utime.h 	attr/xattr.h 	linux/falloc.h 	linux/fd.h 	linux/major.h 	linux/loop.h 	net/if_dl.h 	netinet/in.h 	sys/disklabel.h 	sys/disk.h 	sys/file.h 	sys/ioctl.h 	sys/mkdev.h 	sys/mman.h 	sys/mount.h 	sys/prctl.h 	sys/resource.h 	sys/select.h 	sys/socket.h 	sys/sockio.h 	sys/stat.h 	sys/syscall.h 	sys/sysmacros.h 	sys/time.h 	sys/types.h 	sys/un.h 	sys/wait.h
+for ac_header in  	dirent.h 	errno.h 	execinfo.h 	getopt.h 	malloc.h 	mntent.h 	paths.h 	semaphore.h 	setjmp.h 	signal.h 	stdarg.h 	stdint.h 	stdlib.h 	termios.h 	termio.h 	unistd.h 	utime.h 	attr/xattr.h 	linux/falloc.h 	linux/fd.h 	linux/major.h 	linux/loop.h 	net/if_dl.h 	netinet/in.h 	sys/disklabel.h 	sys/disk.h 	sys/file.h 	sys/ioctl.h 	sys/mkdev.h 	sys/mman.h 	sys/mount.h 	sys/prctl.h 	sys/resource.h 	sys/select.h 	sys/socket.h 	sys/sockio.h 	sys/stat.h 	sys/syscall.h 	sys/sysctl.h 	sys/sysmacros.h 	sys/time.h 	sys/types.h 	sys/un.h 	sys/wait.h
 do :
   as_ac_Header=`$as_echo "ac_cv_header_$ac_header" | $as_tr_sh`
 ac_fn_c_check_header_mongrel "$LINENO" "$ac_header" "$as_ac_Header" "$ac_includes_default"
diff --git a/configure.in b/configure.in
index 9069234..73cfeb4 100644
--- a/configure.in
+++ b/configure.in
@@ -949,6 +949,7 @@ AC_CHECK_HEADERS(m4_flatten([
 	sys/sockio.h
 	sys/stat.h
 	sys/syscall.h
+	sys/sysctl.h
 	sys/sysmacros.h
 	sys/time.h
 	sys/types.h
diff --git a/e2fsck/Makefile.in b/e2fsck/Makefile.in
index d0e64eb..e40e51b 100644
--- a/e2fsck/Makefile.in
+++ b/e2fsck/Makefile.in
@@ -62,7 +62,7 @@ OBJS= dict.o unix.o e2fsck.o super.o pass1.o pass1b.o pass2.o \
 	pass3.o pass4.o pass5.o journal.o badblocks.o util.o dirinfo.o \
 	dx_dirinfo.o ehandler.o problem.o message.o quota.o recovery.o \
 	region.o revoke.o ea_refcount.o rehash.o profile.o prof_err.o \
-	logfile.o sigcatcher.o $(MTRACE_OBJ) plausible.o
+	logfile.o sigcatcher.o $(MTRACE_OBJ) plausible.o readahead.o
 
 PROFILED_OBJS= profiled/dict.o profiled/unix.o profiled/e2fsck.o \
 	profiled/super.o profiled/pass1.o profiled/pass1b.o \
@@ -73,7 +73,8 @@ PROFILED_OBJS= profiled/dict.o profiled/unix.o profiled/e2fsck.o \
 	profiled/recovery.o profiled/region.o profiled/revoke.o \
 	profiled/ea_refcount.o profiled/rehash.o profiled/profile.o \
 	profiled/prof_err.o profiled/logfile.o \
-	profiled/sigcatcher.o profiled/plausible.o
+	profiled/sigcatcher.o profiled/plausible.o \
+	profiled/sigcatcher.o profiled/readahead.o
 
 SRCS= $(srcdir)/e2fsck.c \
 	$(srcdir)/dict.c \
@@ -97,6 +98,7 @@ SRCS= $(srcdir)/e2fsck.c \
 	$(srcdir)/message.c \
 	$(srcdir)/ea_refcount.c \
 	$(srcdir)/rehash.c \
+	$(srcdir)/readahead.c \
 	$(srcdir)/region.c \
 	$(srcdir)/profile.c \
 	$(srcdir)/sigcatcher.c \
@@ -541,3 +543,6 @@ plausible.o: $(srcdir)/../misc/plausible.c $(top_builddir)/lib/config.h \
  $(top_builddir)/lib/ext2fs/ext2_err.h \
  $(top_srcdir)/lib/ext2fs/ext2_ext_attr.h $(top_srcdir)/lib/ext2fs/bitops.h \
  $(srcdir)/../misc/nls-enable.h $(srcdir)/../misc/plausible.h
+readahead.o: $(srcdir)/readahead.c $(top_builddir)/lib/config.h \
+ $(top_srcdir)/lib/ext2fs/ext2fs.h $(top_srcdir)/lib/ext2fs/ext2_fs.h \
+ $(top_builddir)/lib/ext2fs/ext2_err.h $(srcdir)/e2fsck.h prof_err.h
diff --git a/e2fsck/e2fsck.h b/e2fsck/e2fsck.h
index e0a9239..b8795a8 100644
--- a/e2fsck/e2fsck.h
+++ b/e2fsck/e2fsck.h
@@ -495,6 +495,23 @@ extern ext2_ino_t e2fsck_get_lost_and_found(e2fsck_t ctx, int fix);
 extern errcode_t e2fsck_adjust_inode_count(e2fsck_t ctx, ext2_ino_t ino,
 					   int adj);
 
+/* readahead.c */
+#define E2FSCK_READA_SUPER	(0x01)
+#define E2FSCK_READA_GDT	(0x02)
+#define E2FSCK_READA_BBITMAP	(0x04)
+#define E2FSCK_READA_IBITMAP	(0x08)
+#define E2FSCK_READA_ITABLE	(0x10)
+#define E2FSCK_READA_ALL_FLAGS	(0x1F)
+errcode_t e2fsck_readahead(ext2_filsys fs, int flags, dgrp_t start,
+			   dgrp_t ngroups);
+#define E2FSCK_RA_DBLIST_IGNORE_BLOCKCNT	(0x01)
+#define E2FSCK_RA_DBLIST_ALL_FLAGS		(0x01)
+errcode_t e2fsck_readahead_dblist(ext2_filsys fs, int flags,
+				  ext2_dblist dblist,
+				  unsigned long long start,
+				  unsigned long long count);
+int e2fsck_can_readahead(ext2_filsys fs);
+unsigned long long e2fsck_guess_readahead(ext2_filsys fs);
 
 /* region.c */
 extern region_t region_create(region_addr_t min, region_addr_t max);
@@ -582,6 +599,7 @@ extern errcode_t e2fsck_allocate_subcluster_bitmap(ext2_filsys fs,
 						   int default_type,
 						   const char *profile_name,
 						   ext2fs_block_bitmap *ret);
+unsigned long long get_memory_size(void);
 
 /* unix.c */
 extern void e2fsck_clear_progbar(e2fsck_t ctx);
diff --git a/e2fsck/readahead.c b/e2fsck/readahead.c
new file mode 100644
index 0000000..8190d1f
--- /dev/null
+++ b/e2fsck/readahead.c
@@ -0,0 +1,252 @@
+/*
+ * readahead.c -- Prefetch filesystem metadata to speed up fsck.
+ *
+ * Copyright (C) 2014 Oracle.
+ *
+ * %Begin-Header%
+ * This file may be redistributed under the terms of the GNU Library
+ * General Public License, version 2.
+ * %End-Header%
+ */
+
+#include "config.h"
+#include <string.h>
+
+#include "e2fsck.h"
+
+#undef DEBUG
+
+#ifdef DEBUG
+# define dbg_printf(f, a...)  do {printf(f, ## a); fflush(stdout); } while (0)
+#else
+# define dbg_printf(f, a...)
+#endif
+
+struct read_dblist {
+	errcode_t err;
+	blk64_t run_start;
+	blk64_t run_len;
+	int flags;
+};
+
+static int readahead_dir_block(ext2_filsys fs, struct ext2_db_entry2 *db,
+			       void *priv_data)
+{
+	struct read_dblist *pr = priv_data;
+	e2_blkcnt_t count = (pr->flags & E2FSCK_RA_DBLIST_IGNORE_BLOCKCNT ?
+			     1 : db->blockcnt);
+
+	if (!pr->run_len || db->blk != pr->run_start + pr->run_len) {
+		if (pr->run_len) {
+			pr->err = io_channel_cache_readahead(fs->io,
+							     pr->run_start,
+							     pr->run_len);
+			dbg_printf("readahead start=%llu len=%llu err=%d\n",
+				   pr->run_start, pr->run_len,
+				   (int)pr->err);
+		}
+		pr->run_start = db->blk;
+		pr->run_len = 0;
+	}
+	pr->run_len += count;
+
+	return pr->err ? DBLIST_ABORT : 0;
+}
+
+errcode_t e2fsck_readahead_dblist(ext2_filsys fs, int flags,
+				  ext2_dblist dblist,
+				  unsigned long long start,
+				  unsigned long long count)
+{
+	errcode_t err;
+	struct read_dblist pr;
+
+	dbg_printf("%s: flags=0x%x\n", __func__, flags);
+	if (flags & ~E2FSCK_RA_DBLIST_ALL_FLAGS)
+		return EXT2_ET_INVALID_ARGUMENT;
+
+	memset(&pr, 0, sizeof(pr));
+	pr.flags = flags;
+	err = ext2fs_dblist_iterate3(dblist, readahead_dir_block, start,
+				     count, &pr);
+	if (pr.err)
+		return pr.err;
+	if (err)
+		return err;
+
+	if (pr.run_len)
+		err = io_channel_cache_readahead(fs->io, pr.run_start,
+						 pr.run_len);
+
+	return err;
+}
+
+static errcode_t e2fsck_readahead_bitmap(ext2_filsys fs,
+					 ext2fs_block_bitmap ra_map)
+{
+	blk64_t start, end, out;
+	errcode_t err;
+
+	start = 1;
+	end = ext2fs_blocks_count(fs->super) - 1;
+
+	err = ext2fs_find_first_set_block_bitmap2(ra_map, start, end, &out);
+	while (err == 0) {
+		start = out;
+		err = ext2fs_find_first_zero_block_bitmap2(ra_map, start, end,
+							   &out);
+		if (err == ENOENT) {
+			out = end;
+			err = 0;
+		} else if (err)
+			break;
+
+		err = io_channel_cache_readahead(fs->io, start, out - start);
+		if (err)
+			break;
+		start = out;
+		err = ext2fs_find_first_set_block_bitmap2(ra_map, start, end,
+							  &out);
+	}
+
+	if (err == ENOENT)
+		err = 0;
+
+	return err;
+}
+
+/* Try not to spew bitmap range errors for readahead */
+static errcode_t mark_bmap_range(ext2_filsys fs, ext2fs_block_bitmap map,
+				 blk64_t blk, unsigned int num)
+{
+	if (blk >= ext2fs_get_generic_bmap_start(map) &&
+	    blk + num <= ext2fs_get_generic_bmap_end(map))
+		ext2fs_mark_block_bitmap_range2(map, blk, num);
+	else
+		return EXT2_ET_INVALID_ARGUMENT;
+	return 0;
+}
+
+static errcode_t mark_bmap(ext2_filsys fs, ext2fs_block_bitmap map, blk64_t blk)
+{
+	if (blk >= ext2fs_get_generic_bmap_start(map) &&
+	    blk <= ext2fs_get_generic_bmap_end(map))
+		ext2fs_mark_block_bitmap2(map, blk);
+	else
+		return EXT2_ET_INVALID_ARGUMENT;
+	return 0;
+}
+
+errcode_t e2fsck_readahead(ext2_filsys fs, int flags, dgrp_t start,
+			   dgrp_t ngroups)
+{
+	blk64_t		super, old_gdt, new_gdt;
+	blk_t		blocks;
+	dgrp_t		i;
+	ext2fs_block_bitmap		ra_map = NULL;
+	dgrp_t		end = start + ngroups;
+	errcode_t	err = 0;
+
+	dbg_printf("%s: flags=0x%x start=%d groups=%d\n", __func__, flags,
+		   start, ngroups);
+	if (flags & ~E2FSCK_READA_ALL_FLAGS)
+		return EXT2_ET_INVALID_ARGUMENT;
+
+	if (end > fs->group_desc_count)
+		end = fs->group_desc_count;
+
+	if (flags == 0)
+		return 0;
+
+	err = ext2fs_allocate_block_bitmap(fs, "readahead bitmap",
+					   &ra_map);
+	if (err)
+		return err;
+
+	for (i = start; i < end; i++) {
+		err = ext2fs_super_and_bgd_loc2(fs, i, &super, &old_gdt,
+						&new_gdt, &blocks);
+		if (err)
+			break;
+
+		if (flags & E2FSCK_READA_SUPER) {
+			err = mark_bmap(fs, ra_map, super);
+			if (err)
+				break;
+		}
+
+		if (flags & E2FSCK_READA_GDT) {
+			err = mark_bmap_range(fs, ra_map,
+					      old_gdt ? old_gdt : new_gdt,
+					      blocks);
+			if (err)
+				break;
+		}
+
+		if ((flags & E2FSCK_READA_BBITMAP) &&
+		    !ext2fs_bg_flags_test(fs, i, EXT2_BG_BLOCK_UNINIT) &&
+		    ext2fs_bg_free_blocks_count(fs, i) <
+				fs->super->s_blocks_per_group) {
+			super = ext2fs_block_bitmap_loc(fs, i);
+			err = mark_bmap(fs, ra_map, super);
+			if (err)
+				break;
+		}
+
+		if ((flags & E2FSCK_READA_IBITMAP) &&
+		    !ext2fs_bg_flags_test(fs, i, EXT2_BG_INODE_UNINIT) &&
+		    ext2fs_bg_free_inodes_count(fs, i) <
+				fs->super->s_inodes_per_group) {
+			super = ext2fs_inode_bitmap_loc(fs, i);
+			err = mark_bmap(fs, ra_map, super);
+			if (err)
+				break;
+		}
+
+		if ((flags & E2FSCK_READA_ITABLE) &&
+		    ext2fs_bg_free_inodes_count(fs, i) <
+				fs->super->s_inodes_per_group) {
+			super = ext2fs_inode_table_loc(fs, i);
+			blocks = fs->inode_blocks_per_group -
+				 (ext2fs_bg_itable_unused(fs, i) *
+				  EXT2_INODE_SIZE(fs->super) / fs->blocksize);
+			err = mark_bmap_range(fs, ra_map, super, blocks);
+			if (err)
+				break;
+		}
+	}
+
+	if (!err)
+		err = e2fsck_readahead_bitmap(fs, ra_map);
+
+	ext2fs_free_block_bitmap(ra_map);
+	return err;
+}
+
+int e2fsck_can_readahead(ext2_filsys fs)
+{
+	errcode_t err;
+
+	err = io_channel_cache_readahead(fs->io, 0, 1);
+	dbg_printf("%s: supp=%d\n", __func__, err != EXT2_ET_OP_NOT_SUPPORTED);
+	return err != EXT2_ET_OP_NOT_SUPPORTED;
+}
+
+unsigned long long e2fsck_guess_readahead(ext2_filsys fs)
+{
+	unsigned long long guess;
+
+	/*
+	 * The optimal readahead sizes were experimentally determined by
+	 * djwong in August 2014.  Setting the RA size to two block groups'
+	 * worth of inode table blocks seems to yield the largest reductions
+	 * in e2fsck runtime.
+	 */
+	guess = 2 * fs->blocksize * fs->inode_blocks_per_group;
+
+	/* Disable RA if it'd use more than 1/50th of RAM. */
+	if (get_memory_size() > (guess * 50))
+		return guess / 1024;
+
+	return 0;
+}
diff --git a/e2fsck/util.c b/e2fsck/util.c
index e2fb982..9e217e6 100644
--- a/e2fsck/util.c
+++ b/e2fsck/util.c
@@ -37,6 +37,10 @@
 #include <errno.h>
 #endif
 
+#ifdef HAVE_SYS_SYSCTL_H
+#include <sys/sysctl.h>
+#endif
+
 #include "e2fsck.h"
 
 extern e2fsck_t e2fsck_global_ctx;   /* Try your very best not to use this! */
@@ -819,3 +823,50 @@ errcode_t e2fsck_allocate_subcluster_bitmap(ext2_filsys fs, const char *descr,
 	fs->default_bitmap_type = save_type;
 	return retval;
 }
+
+/* Return memory size in bytes */
+unsigned long long get_memory_size(void)
+{
+#if defined(_SC_PHYS_PAGES)
+# if defined(_SC_PAGESIZE)
+	return (unsigned long long)sysconf(_SC_PHYS_PAGES) *
+	       (unsigned long long)sysconf(_SC_PAGESIZE);
+# elif defined(_SC_PAGE_SIZE)
+	return (unsigned long long)sysconf(_SC_PHYS_PAGES) *
+	       (unsigned long long)sysconf(_SC_PAGE_SIZE);
+# endif
+#elif defined(CTL_HW)
+# if (defined(HW_MEMSIZE) || defined(HW_PHYSMEM64))
+#  define CTL_HW_INT64
+# elif (defined(HW_PHYSMEM) || defined(HW_REALMEM))
+#  define CTL_HW_UINT
+# endif
+	int mib[2];
+
+	mib[0] = CTL_HW;
+# if defined(HW_MEMSIZE)
+	mib[1] = HW_MEMSIZE;
+# elif defined(HW_PHYSMEM64)
+	mib[1] = HW_PHYSMEM64;
+# elif defined(HW_REALMEM)
+	mib[1] = HW_REALMEM;
+# elif defined(HW_PHYSMEM)
+	mib[1] = HW_PHYSMEM;
+# endif
+# if defined(CTL_HW_INT64)
+	unsigned long long size = 0;
+# elif defined(CTL_HW_UINT)
+	unsigned int size = 0;
+# endif
+# if defined(CTL_HW_INT64) || defined(CTL_HW_UINT)
+	size_t len = sizeof(size);
+
+	if (sysctl(mib, 2, &size, &len, NULL, 0) == 0)
+		return (unsigned long long)size;
+# endif
+	return 0;
+#else
+# warning "Don't know how to detect memory on your platform?"
+	return 0;
+#endif
+}
diff --git a/lib/config.h.in b/lib/config.h.in
index 0db010f..cd7ec90 100644
--- a/lib/config.h.in
+++ b/lib/config.h.in
@@ -509,6 +509,9 @@
 /* Define to 1 if you have the <sys/syscall.h> header file. */
 #undef HAVE_SYS_SYSCALL_H
 
+/* Define to 1 if you have the <sys/sysctl.h> header file. */
+#undef HAVE_SYS_SYSCTL_H
+
 /* Define to 1 if you have the <sys/sysmacros.h> header file. */
 #undef HAVE_SYS_SYSMACROS_H
 
diff --git a/lib/ext2fs/dblist.c b/lib/ext2fs/dblist.c
index 942c4f0..bbdb221 100644
--- a/lib/ext2fs/dblist.c
+++ b/lib/ext2fs/dblist.c
@@ -194,20 +194,25 @@ void ext2fs_dblist_sort2(ext2_dblist dblist,
 /*
  * This function iterates over the directory block list
  */
-errcode_t ext2fs_dblist_iterate2(ext2_dblist dblist,
+errcode_t ext2fs_dblist_iterate3(ext2_dblist dblist,
 				 int (*func)(ext2_filsys fs,
 					     struct ext2_db_entry2 *db_info,
 					     void	*priv_data),
+				 unsigned long long start,
+				 unsigned long long count,
 				 void *priv_data)
 {
-	unsigned long long	i;
+	unsigned long long	i, end;
 	int		ret;
 
 	EXT2_CHECK_MAGIC(dblist, EXT2_ET_MAGIC_DBLIST);
 
+	end = start + count;
 	if (!dblist->sorted)
 		ext2fs_dblist_sort2(dblist, 0);
-	for (i=0; i < dblist->count; i++) {
+	if (end > dblist->count)
+		end = dblist->count;
+	for (i = start; i < end; i++) {
 		ret = (*func)(dblist->fs, &dblist->list[i], priv_data);
 		if (ret & DBLIST_ABORT)
 			return 0;
@@ -215,6 +220,16 @@ errcode_t ext2fs_dblist_iterate2(ext2_dblist dblist,
 	return 0;
 }
 
+errcode_t ext2fs_dblist_iterate2(ext2_dblist dblist,
+				 int (*func)(ext2_filsys fs,
+					     struct ext2_db_entry2 *db_info,
+					     void	*priv_data),
+				 void *priv_data)
+{
+	return ext2fs_dblist_iterate3(dblist, func, 0, dblist->count,
+				      priv_data);
+}
+
 static EXT2_QSORT_TYPE dir_block_cmp2(const void *a, const void *b)
 {
 	const struct ext2_db_entry2 *db_a =
diff --git a/lib/ext2fs/ext2fs.h b/lib/ext2fs/ext2fs.h
index d75dd76..5084d88 100644
--- a/lib/ext2fs/ext2fs.h
+++ b/lib/ext2fs/ext2fs.h
@@ -1053,11 +1053,17 @@ extern void ext2fs_dblist_sort2(ext2_dblist dblist,
 extern errcode_t ext2fs_dblist_iterate(ext2_dblist dblist,
 	int (*func)(ext2_filsys fs, struct ext2_db_entry *db_info,
 		    void	*priv_data),
-       void *priv_data);
+	void *priv_data);
 extern errcode_t ext2fs_dblist_iterate2(ext2_dblist dblist,
 	int (*func)(ext2_filsys fs, struct ext2_db_entry2 *db_info,
 		    void	*priv_data),
-       void *priv_data);
+	void *priv_data);
+extern errcode_t ext2fs_dblist_iterate3(ext2_dblist dblist,
+	int (*func)(ext2_filsys fs, struct ext2_db_entry2 *db_info,
+		    void	*priv_data),
+	unsigned long long start,
+	unsigned long long count,
+	void *priv_data);
 extern errcode_t ext2fs_set_dir_block(ext2_dblist dblist, ext2_ino_t ino,
 				      blk_t blk, int blockcnt);
 extern errcode_t ext2fs_set_dir_block2(ext2_dblist dblist, ext2_ino_t ino,



* [PATCH 04/35] e2fsck: read-ahead metadata during passes 1, 2, and 4
  2015-04-02  2:34 [PATCH 00/35] e2fsprogs April 2015 patchbomb Darrick J. Wong
                   ` (2 preceding siblings ...)
  2015-04-02  2:34 ` [PATCH 03/35] libext2fs/e2fsck: provide routines to read-ahead metadata Darrick J. Wong
@ 2015-04-02  2:34 ` Darrick J. Wong
  2015-04-21  3:03   ` Theodore Ts'o
  2015-04-02  2:34 ` [PATCH 05/35] e2fsck: track directories to be rehashed with a bitmap Darrick J. Wong
                   ` (29 subsequent siblings)
  33 siblings, 1 reply; 70+ messages in thread
From: Darrick J. Wong @ 2015-04-02  2:34 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

e2fsck pass1 is modified to use the block group data prefetch function
to try to fetch the inode tables into the pagecache before they are
needed.  We iterate through the blockgroups until we have enough inode
tables that need reading such that we can issue readahead; then we sit
and wait until the last inode table block read of the last group to
start fetching the next bunch.

pass2 is modified to use the dirblock prefetching function to prefetch
the list of directory blocks that are assembled in pass1.  We use the
"iterate a subset of a dblist" function and avoid copying the dblist.  Directory
blocks are fetched incrementally as we walk through the directory
block list.  In previous iterations of this patch we would free the
directory blocks after processing, but the performance hit to e2fsck
itself wasn't worth it.  Furthermore, it is anticipated that most
users will then mount the FS and start using the directories, so they
may as well remain in the page cache.
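
Walking a window of the list without copying comes down to clamping a
(start, count) request to the list length.  Here is a sketch of that
clamping, assuming a list of nr entries; struct range and clamp_range()
are hypothetical names, and unlike the library iterator this variant
also guards against start + count overflowing.

```c
/* Half-open index range [start, end) into a list. */
struct range {
	unsigned long long start;
	unsigned long long end;
};

/*
 * Clamp a (start, count) request to a list of nr entries.  A start
 * beyond the list yields an empty range; an oversized count is cut off
 * at the end of the list.
 */
struct range clamp_range(unsigned long long nr, unsigned long long start,
			 unsigned long long count)
{
	struct range r;

	r.start = start < nr ? start : nr;
	r.end = (count > nr - r.start) ? nr : r.start + count;
	return r;
}
```

A caller can then advance start by the window size on each step and
stop once the returned range is empty.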

pass4 is modified to prefetch the block and inode bitmaps in
anticipation of pass 5, because pass4 is entirely CPU bound.

In general, these mechanisms can decrease fsck time by 10-40%, if the
host system has sufficient memory and the storage system can provide a
lot of IOPs.  Pretty much any storage system capable of handling
multiple IOs in-flight at any time will see a fairly large performance
boost.  (Single-issue USB mass storage disks seem to suffer badly.)

By default, the readahead buffer size will be set to the size of two
block groups' inode tables (typically 4MiB for a regular ext4 FS).  The -E
readahead_kb= option can be given to specify the amount of memory to
use for readahead or zero to disable it entirely; or an option can be
given in e2fsck.conf.
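
As a worked example of the default, the sketch below mirrors the
heuristic in e2fsck_guess_readahead() from patch 3: two block groups'
worth of inode table blocks, disabled unless RAM is more than 50 times
that size.  The function name and the explicit parameters are
hypothetical; the real function reads these values from the filesystem
handle and from get_memory_size().

```c
#include <stdint.h>

/*
 * Returns the readahead buffer size in KiB, or 0 to disable readahead.
 * All sizes are passed in explicitly so the arithmetic is easy to check.
 */
uint64_t guess_readahead_kb(uint64_t blocksize,
			    uint64_t inode_blocks_per_group,
			    uint64_t mem_bytes)
{
	/* Two block groups' worth of inode table blocks, in bytes. */
	uint64_t guess = 2 * blocksize * inode_blocks_per_group;

	/* Only use readahead if RAM is more than 50x the buffer. */
	if (mem_bytes > guess * 50)
		return guess / 1024;
	return 0;
}
```

Assuming 4096-byte blocks and 512 inode table blocks per group (a 2MiB
inode table), the guess is 2 * 4096 * 512 = 4194304 bytes, or 4096KiB.
Readahead then stays enabled only on machines with more than 200MiB of
RAM.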

v2: Fix an off-by-one error in the pass1 readahead which made the
readahead trigger one inode too late if the block groups are full.

v3: Use the dblist partial iterator function to read ahead parts
of the directory block list in pass 2, instead of making sublists.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 e2fsck/e2fsck.8.in      |    7 +++++
 e2fsck/e2fsck.conf.5.in |   15 +++++++++++
 e2fsck/e2fsck.h         |    3 ++
 e2fsck/pass1.c          |   65 +++++++++++++++++++++++++++++++++++++++++++++++
 e2fsck/pass2.c          |   38 +++++++++++++++++++++++++++
 e2fsck/pass4.c          |    9 +++++++
 e2fsck/unix.c           |   28 ++++++++++++++++++++
 lib/ext2fs/ext2fs.h     |    1 +
 lib/ext2fs/inode.c      |    3 +-
 9 files changed, 167 insertions(+), 2 deletions(-)


diff --git a/e2fsck/e2fsck.8.in b/e2fsck/e2fsck.8.in
index 3367f4f..270727a 100644
--- a/e2fsck/e2fsck.8.in
+++ b/e2fsck/e2fsck.8.in
@@ -220,6 +220,13 @@ option may prevent you from further manual data recovery.
 .BI nodiscard
 Do not attempt to discard free blocks and unused inode blocks. This option is
 exactly the opposite of discard option. This is set as default.
+.TP
+.BI readahead_kb
+Use this many KiB of memory to pre-fetch metadata in the hopes of reducing
+e2fsck runtime.  By default, this is set to the size of two block groups' inode
+tables (typically 4MiB on a regular ext4 filesystem); if this amount is more
+than 1/50th of total physical memory, readahead is disabled.  Set this to zero
+to disable readahead entirely.
 .RE
 .TP
 .B \-f
diff --git a/e2fsck/e2fsck.conf.5.in b/e2fsck/e2fsck.conf.5.in
index 9ebfbbf..ab83180 100644
--- a/e2fsck/e2fsck.conf.5.in
+++ b/e2fsck/e2fsck.conf.5.in
@@ -205,6 +205,21 @@ of that type are squelched.  This can be useful if the console is slow
 (i.e., connected to a serial port) and so a large amount of output could
 end up delaying the boot process for a long time (potentially hours).
 .TP
+.I readahead_mem_pct
+Use this percentage of memory to try to read in metadata blocks ahead of the
+main e2fsck thread.  This should reduce run times, depending on the speed of
+the underlying storage and the amount of free memory.  There is no default, but
+see
+.B readahead_kb
+for more details.
+.TP
+.I readahead_kb
+Use this amount of memory to read in metadata blocks ahead of the main checking
+thread.  Setting this value to zero disables readahead entirely.  By default,
+this is set to the size of two block groups' inode tables (typically 4MiB on a
+regular ext4 filesystem); if this amount is more than 1/50th of total physical
+memory, readahead is disabled.
+.TP
 .I report_features
 If this boolean relation is true, e2fsck will print the file system
 features as part of its verbose reporting (i.e., if the
diff --git a/e2fsck/e2fsck.h b/e2fsck/e2fsck.h
index b8795a8..a0e03e3 100644
--- a/e2fsck/e2fsck.h
+++ b/e2fsck/e2fsck.h
@@ -379,6 +379,9 @@ struct e2fsck_struct {
 	 */
 	void *priv_data;
 	ext2fs_block_bitmap block_metadata_map; /* Metadata blocks */
+
+	/* How much are we allowed to readahead? */
+	unsigned long long readahead_kb;
 };
 
 /* Used by the region allocation code */
diff --git a/e2fsck/pass1.c b/e2fsck/pass1.c
index bf95ae1..993aedd 100644
--- a/e2fsck/pass1.c
+++ b/e2fsck/pass1.c
@@ -898,6 +898,60 @@ out:
 	return 0;
 }
 
+static void pass1_readahead(e2fsck_t ctx, dgrp_t *group, ext2_ino_t *next_ino)
+{
+	ext2_ino_t inodes_in_group = 0, inodes_per_block, inodes_per_buffer;
+	dgrp_t start = *group, grp;
+	blk64_t blocks_to_read = 0;
+	errcode_t err = EXT2_ET_INVALID_ARGUMENT;
+
+	if (ctx->readahead_kb == 0)
+		goto out;
+
+	/* Keep iterating groups until we have enough to readahead */
+	inodes_per_block = EXT2_INODES_PER_BLOCK(ctx->fs->super);
+	for (grp = start; grp < ctx->fs->group_desc_count; grp++) {
+		if (ext2fs_bg_flags_test(ctx->fs, grp, EXT2_BG_INODE_UNINIT))
+			continue;
+		inodes_in_group = ctx->fs->super->s_inodes_per_group -
+					ext2fs_bg_itable_unused(ctx->fs, grp);
+		blocks_to_read += (inodes_in_group + inodes_per_block - 1) /
+					inodes_per_block;
+		if (blocks_to_read * ctx->fs->blocksize >
+		    ctx->readahead_kb * 1024)
+			break;
+	}
+
+	err = e2fsck_readahead(ctx->fs, E2FSCK_READA_ITABLE, start,
+			       grp - start + 1);
+	if (err == EAGAIN) {
+		ctx->readahead_kb /= 2;
+		err = 0;
+	}
+
+out:
+	if (err) {
+		/* Error; disable itable readahead */
+		*group = ctx->fs->group_desc_count;
+		*next_ino = ctx->fs->super->s_inodes_count;
+	} else {
+		/*
+		 * Don't do more readahead until we've reached the first inode
+		 * of the last inode scan buffer block for the last group.
+		 */
+		*group = grp + 1;
+		inodes_per_buffer = (ctx->inode_buffer_blocks ?
+				     ctx->inode_buffer_blocks :
+				     EXT2_INODE_SCAN_DEFAULT_BUFFER_BLOCKS) *
+				    ctx->fs->blocksize /
+				    EXT2_INODE_SIZE(ctx->fs->super);
+		inodes_in_group--;
+		*next_ino = inodes_in_group -
+			    (inodes_in_group % inodes_per_buffer) + 1 +
+			    (grp * ctx->fs->super->s_inodes_per_group);
+	}
+}
+
 void e2fsck_pass1(e2fsck_t ctx)
 {
 	int	i;
@@ -920,10 +974,19 @@ void e2fsck_pass1(e2fsck_t ctx)
 	int		low_dtime_check = 1;
 	int		inode_size;
 	int		failed_csum = 0;
+	ext2_ino_t	ino_threshold = 0;
+	dgrp_t		ra_group = 0;
 
 	init_resource_track(&rtrack, ctx->fs->io);
 	clear_problem_context(&pctx);
 
+	/* If we can do readahead, figure out how many groups to pull in. */
+	if (!e2fsck_can_readahead(ctx->fs))
+		ctx->readahead_kb = 0;
+	else if (ctx->readahead_kb == ~0ULL)
+		ctx->readahead_kb = e2fsck_guess_readahead(ctx->fs);
+	pass1_readahead(ctx, &ra_group, &ino_threshold);
+
 	if (!(ctx->options & E2F_OPT_PREEN))
 		fix_problem(ctx, PR_1_PASS_HEADER, &pctx);
 
@@ -1103,6 +1166,8 @@ void e2fsck_pass1(e2fsck_t ctx)
 		old_op = ehandler_operation(_("getting next inode from scan"));
 		pctx.errcode = ext2fs_get_next_inode_full(scan, &ino,
 							  inode, inode_size);
+		if (ino > ino_threshold)
+			pass1_readahead(ctx, &ra_group, &ino_threshold);
 		ehandler_operation(old_op);
 		if (ctx->flags & E2F_FLAG_SIGNAL_MASK)
 			return;
diff --git a/e2fsck/pass2.c b/e2fsck/pass2.c
index 94665c6..120f611 100644
--- a/e2fsck/pass2.c
+++ b/e2fsck/pass2.c
@@ -61,6 +61,9 @@
  * Keeps track of how many times an inode is referenced.
  */
 static void deallocate_inode(e2fsck_t ctx, ext2_ino_t ino, char* block_buf);
+static int check_dir_block2(ext2_filsys fs,
+			   struct ext2_db_entry2 *dir_blocks_info,
+			   void *priv_data);
 static int check_dir_block(ext2_filsys fs,
 			   struct ext2_db_entry2 *dir_blocks_info,
 			   void *priv_data);
@@ -77,6 +80,9 @@ struct check_dir_struct {
 	struct problem_context	pctx;
 	int	count, max;
 	e2fsck_t ctx;
+	unsigned long long list_offset;
+	unsigned long long ra_entries;
+	unsigned long long next_ra_off;
 };
 
 void e2fsck_pass2(e2fsck_t ctx)
@@ -96,6 +102,9 @@ void e2fsck_pass2(e2fsck_t ctx)
 	int			i, depth;
 	problem_t		code;
 	int			bad_dir;
+	int (*check_dir_func)(ext2_filsys fs,
+			      struct ext2_db_entry2 *dir_blocks_info,
+			      void *priv_data);
 
 	init_resource_track(&rtrack, ctx->fs->io);
 	clear_problem_context(&cd.pctx);
@@ -139,6 +148,9 @@ void e2fsck_pass2(e2fsck_t ctx)
 	cd.ctx = ctx;
 	cd.count = 1;
 	cd.max = ext2fs_dblist_count2(fs->dblist);
+	cd.list_offset = 0;
+	cd.ra_entries = ctx->readahead_kb * 1024 / ctx->fs->blocksize;
+	cd.next_ra_off = 0;
 
 	if (ctx->progress)
 		(void) (ctx->progress)(ctx, 2, 0, cd.max);
@@ -146,7 +158,8 @@ void e2fsck_pass2(e2fsck_t ctx)
 	if (fs->super->s_feature_compat & EXT2_FEATURE_COMPAT_DIR_INDEX)
 		ext2fs_dblist_sort2(fs->dblist, special_dir_block_cmp);
 
-	cd.pctx.errcode = ext2fs_dblist_iterate2(fs->dblist, check_dir_block,
+	check_dir_func = cd.ra_entries ? check_dir_block2 : check_dir_block;
+	cd.pctx.errcode = ext2fs_dblist_iterate2(fs->dblist, check_dir_func,
 						 &cd);
 	if (ctx->flags & E2F_FLAG_SIGNAL_MASK || ctx->flags & E2F_FLAG_RESTART)
 		return;
@@ -868,6 +881,29 @@ int get_filename_hash(ext2_filsys fs, int encrypted, int version,
 			      ret_hash, ret_minor_hash);
 }
 
+static int check_dir_block2(ext2_filsys fs,
+			   struct ext2_db_entry2 *db,
+			   void *priv_data)
+{
+	int err;
+	struct check_dir_struct *cd = priv_data;
+
+	if (cd->ra_entries && cd->list_offset >= cd->next_ra_off) {
+		err = e2fsck_readahead_dblist(fs,
+					E2FSCK_RA_DBLIST_IGNORE_BLOCKCNT,
+					fs->dblist,
+					cd->list_offset + cd->ra_entries / 8,
+					cd->ra_entries);
+		if (err)
+			cd->ra_entries = 0;
+		cd->next_ra_off = cd->list_offset + (cd->ra_entries * 7 / 8);
+	}
+
+	err = check_dir_block(fs, db, priv_data);
+	cd->list_offset++;
+	return err;
+}
+
 static int check_dir_block(ext2_filsys fs,
 			   struct ext2_db_entry2 *db,
 			   void *priv_data)
diff --git a/e2fsck/pass4.c b/e2fsck/pass4.c
index 21d93f0..bc9a2c4 100644
--- a/e2fsck/pass4.c
+++ b/e2fsck/pass4.c
@@ -106,6 +106,15 @@ void e2fsck_pass4(e2fsck_t ctx)
 #ifdef MTRACE
 	mtrace_print("Pass 4");
 #endif
+	/*
+	 * Since pass4 is mostly CPU bound, start readahead of bitmaps
+	 * ahead of pass 5 if we haven't already loaded them.
+	 */
+	if (ctx->readahead_kb &&
+	    (fs->block_map == NULL || fs->inode_map == NULL))
+		e2fsck_readahead(fs, E2FSCK_READA_BBITMAP |
+				     E2FSCK_READA_IBITMAP,
+				 0, fs->group_desc_count);
 
 	clear_problem_context(&pctx);
 
diff --git a/e2fsck/unix.c b/e2fsck/unix.c
index e629136..f45a903 100644
--- a/e2fsck/unix.c
+++ b/e2fsck/unix.c
@@ -650,6 +650,7 @@ static void parse_extended_opts(e2fsck_t ctx, const char *opts)
 	char	*buf, *token, *next, *p, *arg;
 	int	ea_ver;
 	int	extended_usage = 0;
+	unsigned long long reada_kb;
 
 	buf = string_copy(ctx, opts, 0);
 	for (token = buf; token && *token; token = next) {
@@ -678,6 +679,15 @@ static void parse_extended_opts(e2fsck_t ctx, const char *opts)
 				continue;
 			}
 			ctx->ext_attr_ver = ea_ver;
+		} else if (strcmp(token, "readahead_kb") == 0) {
+			reada_kb = strtoull(arg, &p, 0);
+			if (*p) {
+				fprintf(stderr, "%s",
+					_("Invalid readahead buffer size.\n"));
+				extended_usage++;
+				continue;
+			}
+			ctx->readahead_kb = reada_kb;
 		} else if (strcmp(token, "fragcheck") == 0) {
 			ctx->options |= E2F_OPT_FRAGCHECK;
 			continue;
@@ -717,6 +727,7 @@ static void parse_extended_opts(e2fsck_t ctx, const char *opts)
 		fputs(("\tjournal_only\n"), stderr);
 		fputs(("\tdiscard\n"), stderr);
 		fputs(("\tnodiscard\n"), stderr);
+		fputs(("\treadahead_kb=<buffer size>\n"), stderr);
 		fputc('\n', stderr);
 		exit(1);
 	}
@@ -750,6 +761,7 @@ static errcode_t PRS(int argc, char *argv[], e2fsck_t *ret_ctx)
 #ifdef CONFIG_JBD_DEBUG
 	char 		*jbd_debug;
 #endif
+	unsigned long long phys_mem_kb;
 
 	retval = e2fsck_allocate_context(&ctx);
 	if (retval)
@@ -777,6 +789,8 @@ static errcode_t PRS(int argc, char *argv[], e2fsck_t *ret_ctx)
 	else
 		ctx->program_name = "e2fsck";
 
+	phys_mem_kb = get_memory_size() / 1024;
+	ctx->readahead_kb = ~0ULL;
 	while ((c = getopt (argc, argv, "panyrcC:B:dE:fvtFVM:b:I:j:P:l:L:N:SsDk")) != EOF)
 		switch (c) {
 		case 'C':
@@ -961,6 +975,20 @@ static errcode_t PRS(int argc, char *argv[], e2fsck_t *ret_ctx)
 	if (c)
 		verbose = 1;
 
+	if (ctx->readahead_kb == ~0ULL) {
+		profile_get_integer(ctx->profile, "options",
+				    "readahead_mem_pct", 0, -1, &c);
+		if (c >= 0 && c <= 100)
+			ctx->readahead_kb = phys_mem_kb * c / 100;
+		profile_get_integer(ctx->profile, "options",
+				    "readahead_kb", 0, -1, &c);
+		if (c >= 0)
+			ctx->readahead_kb = c;
+		if (ctx->readahead_kb != ~0ULL &&
+		    ctx->readahead_kb > phys_mem_kb)
+			ctx->readahead_kb = phys_mem_kb;
+	}
+
 	/* Turn off discard in read-only mode */
 	if ((ctx->options & E2F_OPT_NO) &&
 	    (ctx->options & E2F_OPT_DISCARD))
diff --git a/lib/ext2fs/ext2fs.h b/lib/ext2fs/ext2fs.h
index 5084d88..d4f6c8e 100644
--- a/lib/ext2fs/ext2fs.h
+++ b/lib/ext2fs/ext2fs.h
@@ -1424,6 +1424,7 @@ extern errcode_t ext2fs_get_next_inode_full(ext2_inode_scan scan,
 					    ext2_ino_t *ino,
 					    struct ext2_inode *inode,
 					    int bufsize);
+#define EXT2_INODE_SCAN_DEFAULT_BUFFER_BLOCKS	8
 extern errcode_t ext2fs_open_inode_scan(ext2_filsys fs, int buffer_blocks,
 				  ext2_inode_scan *ret_scan);
 extern void ext2fs_close_inode_scan(ext2_inode_scan scan);
diff --git a/lib/ext2fs/inode.c b/lib/ext2fs/inode.c
index 8cc0eb8..58389a8 100644
--- a/lib/ext2fs/inode.c
+++ b/lib/ext2fs/inode.c
@@ -175,7 +175,8 @@ errcode_t ext2fs_open_inode_scan(ext2_filsys fs, int buffer_blocks,
 	scan->bytes_left = 0;
 	scan->current_group = 0;
 	scan->groups_left = fs->group_desc_count - 1;
-	scan->inode_buffer_blocks = buffer_blocks ? buffer_blocks : 8;
+	scan->inode_buffer_blocks = buffer_blocks ? buffer_blocks :
+				    EXT2_INODE_SCAN_DEFAULT_BUFFER_BLOCKS;
 	scan->current_block = ext2fs_inode_table_loc(scan->fs,
 						     scan->current_group);
 	scan->inodes_left = EXT2_INODES_PER_GROUP(scan->fs->super);



* [PATCH 05/35] e2fsck: track directories to be rehashed with a bitmap
  2015-04-02  2:34 [PATCH 00/35] e2fsprogs April 2015 patchbomb Darrick J. Wong
                   ` (3 preceding siblings ...)
  2015-04-02  2:34 ` [PATCH 04/35] e2fsck: read-ahead metadata during passes 1, 2, and 4 Darrick J. Wong
@ 2015-04-02  2:34 ` Darrick J. Wong
  2015-04-21  2:26   ` Theodore Ts'o
  2015-04-02  2:34 ` [PATCH 06/35] e2fsck: rebuild sparse extent trees/convert non-extent ext3 files Darrick J. Wong
                   ` (28 subsequent siblings)
  33 siblings, 1 reply; 70+ messages in thread
From: Darrick J. Wong @ 2015-04-02  2:34 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

Use a bitmap to track which directories we want to rehash, since
bitmaps will use less memory.  This enables us to clean up the
rehash-all case to use inode_dir_map, and we can free the dirinfo
memory sooner.
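
To illustrate the iteration pattern this enables, here is a minimal
sketch of a flat inode bitmap walked with a find-first-set loop, the same
shape as the new loop in e2fsck_rehash_directories().  The ino_bitmap
type and its helpers are hypothetical and deliberately simple; the patch
itself uses the libext2fs inode bitmap functions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical flat bitmap; valid inode numbers are 1..nr_inodes. */
struct ino_bitmap {
	uint32_t nr_inodes;
	uint8_t *bits;
};

static int ino_bitmap_init(struct ino_bitmap *bm, uint32_t nr_inodes)
{
	bm->nr_inodes = nr_inodes;
	bm->bits = calloc((size_t)nr_inodes / 8 + 1, 1);
	return bm->bits ? 0 : -1;
}

static void ino_bitmap_mark(struct ino_bitmap *bm, uint32_t ino)
{
	assert(ino <= bm->nr_inodes);
	bm->bits[ino / 8] |= (uint8_t)(1u << (ino % 8));
}

static int ino_bitmap_test(const struct ino_bitmap *bm, uint32_t ino)
{
	return (bm->bits[ino / 8] >> (ino % 8)) & 1;
}

/* Store the first marked inode in [start, end] in *out; return -1 if none. */
static int ino_bitmap_find_first_set(const struct ino_bitmap *bm,
				     uint32_t start, uint32_t end,
				     uint32_t *out)
{
	uint64_t ino;	/* 64-bit counter so the loop cannot wrap */

	for (ino = start; ino <= end && ino <= bm->nr_inodes; ino++) {
		if (ino_bitmap_test(bm, (uint32_t)ino)) {
			*out = (uint32_t)ino;
			return 0;
		}
	}
	return -1;
}
```

A caller starts with ino = 0 and repeatedly searches from ino + 1, so
directories are visited in ascending inode order and each one only once,
however many times it was marked.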

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 e2fsck/e2fsck.c |    4 ++--
 e2fsck/e2fsck.h |    2 +-
 e2fsck/pass1.c  |    8 ++++++-
 e2fsck/pass2.c  |    4 ++--
 e2fsck/pass3.c  |    4 ++++
 e2fsck/rehash.c |   60 ++++++++++++++++++-------------------------------------
 6 files changed, 35 insertions(+), 47 deletions(-)


diff --git a/e2fsck/e2fsck.c b/e2fsck/e2fsck.c
index cf43a8c..4273060 100644
--- a/e2fsck/e2fsck.c
+++ b/e2fsck/e2fsck.c
@@ -125,8 +125,8 @@ errcode_t e2fsck_reset_context(e2fsck_t ctx)
 		ctx->inode_imagic_map = 0;
 	}
 	if (ctx->dirs_to_hash) {
-		ext2fs_u32_list_free(ctx->dirs_to_hash);
-		ctx->dirs_to_hash = 0;
+		ext2fs_free_inode_bitmap(ctx->dirs_to_hash);
+		ctx->dirs_to_hash = NULL;
 	}
 
 	/*
diff --git a/e2fsck/e2fsck.h b/e2fsck/e2fsck.h
index a0e03e3..6f96e55 100644
--- a/e2fsck/e2fsck.h
+++ b/e2fsck/e2fsck.h
@@ -304,7 +304,7 @@ struct e2fsck_struct {
 	/*
 	 * Directories to hash
 	 */
-	ext2_u32_list	dirs_to_hash;
+	ext2fs_inode_bitmap dirs_to_hash;
 
 	/*
 	 * Tuning parameters
diff --git a/e2fsck/pass1.c b/e2fsck/pass1.c
index 993aedd..938796f 100644
--- a/e2fsck/pass1.c
+++ b/e2fsck/pass1.c
@@ -992,8 +992,12 @@ void e2fsck_pass1(e2fsck_t ctx)
 
 	if ((fs->super->s_feature_compat & EXT2_FEATURE_COMPAT_DIR_INDEX) &&
 	    !(ctx->options & E2F_OPT_NO)) {
-		if (ext2fs_u32_list_create(&ctx->dirs_to_hash, 50))
-			ctx->dirs_to_hash = 0;
+		if (e2fsck_allocate_inode_bitmap(fs,
+						 _("directories to rehash"),
+						 EXT2FS_BMAP64_AUTODIR,
+						 "dirs_to_hash",
+						 &ctx->dirs_to_hash))
+			ctx->dirs_to_hash = NULL;
 	}
 
 #ifdef MTRACE
diff --git a/e2fsck/pass2.c b/e2fsck/pass2.c
index 120f611..81b8c5f 100644
--- a/e2fsck/pass2.c
+++ b/e2fsck/pass2.c
@@ -995,7 +995,7 @@ static int check_dir_block(ext2_filsys fs,
 		dot_state = 0;
 
 	if (ctx->dirs_to_hash &&
-	    ext2fs_u32_list_test(ctx->dirs_to_hash, ino))
+	    ext2fs_test_inode_bitmap2(ctx->dirs_to_hash, ino))
 		dups_found++;
 
 #if 0
@@ -1718,7 +1718,7 @@ static void clear_htree(e2fsck_t ctx, ext2_ino_t ino)
 	inode.i_flags = inode.i_flags & ~EXT2_INDEX_FL;
 	e2fsck_write_inode(ctx, ino, &inode, "clear_htree");
 	if (ctx->dirs_to_hash)
-		ext2fs_u32_list_add(ctx->dirs_to_hash, ino);
+		ext2fs_mark_inode_bitmap2(ctx->dirs_to_hash, ino);
 }
 
 
diff --git a/e2fsck/pass3.c b/e2fsck/pass3.c
index 1d5255f..c331b98 100644
--- a/e2fsck/pass3.c
+++ b/e2fsck/pass3.c
@@ -119,6 +119,10 @@ void e2fsck_pass3(e2fsck_t ctx)
 	 * If there are any directories that need to be indexed or
 	 * optimized, do it here.
 	 */
+	if (iter)
+		e2fsck_dir_info_iter_end(ctx, iter);
+	iter = NULL;
+	e2fsck_free_dir_info(ctx);
 	e2fsck_rehash_directories(ctx);
 
 abort_exit:
diff --git a/e2fsck/rehash.c b/e2fsck/rehash.c
index 66e6786..1d720dc 100644
--- a/e2fsck/rehash.c
+++ b/e2fsck/rehash.c
@@ -56,9 +56,13 @@
 void e2fsck_rehash_dir_later(e2fsck_t ctx, ext2_ino_t ino)
 {
 	if (!ctx->dirs_to_hash)
-		ext2fs_u32_list_create(&ctx->dirs_to_hash, 50);
+		e2fsck_allocate_inode_bitmap(ctx->fs,
+					     _("directories to rehash"),
+					     EXT2FS_BMAP64_AUTODIR,
+					     "dirs_to_hash",
+					     &ctx->dirs_to_hash);
 	if (ctx->dirs_to_hash)
-		ext2fs_u32_list_add(ctx->dirs_to_hash, ino);
+		ext2fs_mark_inode_bitmap2(ctx->dirs_to_hash, ino);
 }
 
 /* Ask if a dir will be rebuilt during pass 3A. */
@@ -68,7 +72,7 @@ int e2fsck_dir_will_be_rehashed(e2fsck_t ctx, ext2_ino_t ino)
 		return 1;
 	if (!ctx->dirs_to_hash)
 		return 0;
-	return ext2fs_u32_list_test(ctx->dirs_to_hash, ino);
+	return ext2fs_test_inode_bitmap2(ctx->dirs_to_hash, ino);
 }
 
 struct fill_dir_struct {
@@ -929,12 +933,9 @@ void e2fsck_rehash_directories(e2fsck_t ctx)
 #ifdef RESOURCE_TRACK
 	struct resource_track	rtrack;
 #endif
-	struct dir_info		*dir;
-	ext2_u32_iterate 	iter;
-	struct dir_info_iter *	dirinfo_iter = 0;
-	ext2_ino_t		ino;
-	errcode_t		retval;
-	int			cur, max, all_dirs, first = 1;
+	ext2_ino_t		ino = 0;
+	int			all_dirs, first = 1;
+	ext2fs_inode_bitmap	hmap;
 
 	init_resource_track(&rtrack, ctx->fs->io);
 	all_dirs = ctx->options & E2F_OPT_COMPRESS_DIRS;
@@ -946,30 +947,12 @@ void e2fsck_rehash_directories(e2fsck_t ctx)
 
 	clear_problem_context(&pctx);
 
-	cur = 0;
-	if (all_dirs) {
-		dirinfo_iter = e2fsck_dir_info_iter_begin(ctx);
-		max = e2fsck_get_num_dirinfo(ctx);
-	} else {
-		retval = ext2fs_u32_list_iterate_begin(ctx->dirs_to_hash,
-						       &iter);
-		if (retval) {
-			pctx.errcode = retval;
-			fix_problem(ctx, PR_3A_OPTIMIZE_ITER, &pctx);
-			return;
-		}
-		max = ext2fs_u32_list_count(ctx->dirs_to_hash);
-	}
+	hmap = (all_dirs ? ctx->inode_dir_map : ctx->dirs_to_hash);
 	while (1) {
-		if (all_dirs) {
-			if ((dir = e2fsck_dir_info_iter(ctx,
-							dirinfo_iter)) == 0)
-				break;
-			ino = dir->ino;
-		} else {
-			if (!ext2fs_u32_list_iterate(iter, &ino))
-				break;
-		}
+		if (ext2fs_find_first_set_inode_bitmap2(
+				hmap, ino + 1,
+				ctx->fs->super->s_inodes_count, &ino))
+			break;
 
 		pctx.dir = ino;
 		if (first) {
@@ -986,17 +969,14 @@ void e2fsck_rehash_directories(e2fsck_t ctx)
 		}
 		if (ctx->progress && !ctx->progress_fd)
 			e2fsck_simple_progress(ctx, "Rebuilding directory",
-			       100.0 * (float) (++cur) / (float) max, ino);
+					100.0 * (float) ino /
+					(float) ctx->fs->super->s_inodes_count,
+					ino);
 	}
 	end_problem_latch(ctx, PR_LATCH_OPTIMIZE_DIR);
-	if (all_dirs)
-		e2fsck_dir_info_iter_end(ctx, dirinfo_iter);
-	else
-		ext2fs_u32_list_iterate_end(iter);


* [PATCH 06/35] e2fsck: rebuild sparse extent trees/convert non-extent ext3 files
  2015-04-02  2:34 [PATCH 00/35] e2fsprogs April 2015 patchbomb Darrick J. Wong
                   ` (4 preceding siblings ...)
  2015-04-02  2:34 ` [PATCH 05/35] e2fsck: track directories to be rehashed with a bitmap Darrick J. Wong
@ 2015-04-02  2:34 ` Darrick J. Wong
  2015-04-21 16:33   ` Theodore Ts'o
  2015-04-02  2:34 ` [PATCH 07/35] e2fsck: convert block-mapped files to extents on bigalloc fs Darrick J. Wong
                   ` (27 subsequent siblings)
  33 siblings, 1 reply; 70+ messages in thread
From: Darrick J. Wong @ 2015-04-02  2:34 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

Teach e2fsck to (re)construct extent trees.  This enables us to do
either of the following: compress a highly sparse extent tree into
fewer ETB blocks; or convert an ext3-style block-mapped file to an
extent file.  The reconstruction is performed during pass 1E or 3A,
as detailed below.

For files that are already extent based, this algorithm will
automatically run (pending user approval) if pass1 determines (1) that
a whole level of the extent tree will fit into a higher level of the
tree; (2) that the size of any level can be reduced by at least one
ETB block; or (3) that the extent tree is unnecessarily deep.  It will
not run at all if errors are found and the user declines to fix
them.

The option "-E bmap2extent" can be used to force e2fsck to convert all
block-mapped files to extent trees, and to rebuild all extent files'
extent trees.  After conversion, files larger than 12 blocks should be
defragmented to eliminate the holes left where the (now freed) indirect
mapping blocks used to live.

The extent tree constructor is pretty dumb -- it creates a list of
leaf extents (adjacent extents are collapsed), marks all indirect
blocks / ETB blocks free, installs a new extent tree root in the
inode, then loads the leaf extents into the tree.
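
The collapsing of adjacent extents can be sketched as follows.  The merge
test follows the one in load_extents() in this patch (physically and
logically contiguous, same uninitialized state, combined length below
2^32); struct sketch_extent and append_extent() are hypothetical,
simplified stand-ins for struct ext2fs_extent and the list handling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified leaf extent record. */
struct sketch_extent {
	uint64_t lblk;		/* first logical block */
	uint64_t pblk;		/* first physical block */
	uint32_t len;		/* length in blocks */
	int uninit;		/* nonzero if the extent is uninitialized */
};

/*
 * Append *e to list[0..*count), merging it into the previous entry when
 * the two are contiguous.  Returns 0 on success or -1 if the list is full.
 */
static int append_extent(struct sketch_extent *list, size_t *count,
			 size_t cap, const struct sketch_extent *e)
{
	assert(list != NULL && count != NULL && e != NULL);

	if (*count > 0) {
		struct sketch_extent *last = &list[*count - 1];
		uint64_t end = (uint64_t)last->len + e->len;

		if (last->pblk + last->len == e->pblk &&
		    last->lblk + last->len == e->lblk &&
		    last->uninit == e->uninit &&
		    end < (1ULL << 32)) {
			last->len += e->len;
			return 0;
		}
	}

	if (*count == cap)
		return -1;
	list[(*count)++] = *e;
	return 0;
}
```

Two extents that touch but differ in their uninitialized state stay
separate, since merging them would change which blocks read back as
zeroes.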

v2: Account for extent tree block slack that we create when splitting
a block, so that we don't repeatedly annoy the user to rebuild a tree
that we can't optimize further.

v3: For any directory being rebuilt during pass 3A, defer any extent
tree rebuilding until after the rehash.  It's quite possible that the
act of compressing an aged directory will cause it to shrink far
enough to enable us to knock a level off the dir's extent tree.

v4: Add a fixes_only option (and a E2FSCK_FIXES_ONLY environment
variable) that disables optimization activities unless they are
required to make the filesystem consistent.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 e2fsck/Makefile.in                     |   16 +
 e2fsck/e2fsck.8.in                     |    8 
 e2fsck/e2fsck.c                        |    4 
 e2fsck/e2fsck.h                        |   36 ++
 e2fsck/extents.c                       |  536 ++++++++++++++++++++++++++++++++
 e2fsck/pass1.c                         |   63 ++++
 e2fsck/problem.c                       |   48 +++
 e2fsck/problem.h                       |   33 ++
 e2fsck/rehash.c                        |   27 +-
 e2fsck/super.c                         |    7 
 e2fsck/unix.c                          |   23 +
 tests/f_extent_bad_node/expect.1       |   11 -
 tests/f_extent_bad_node/expect.2       |    2 
 tests/f_extent_int_bad_magic/expect.1  |    5 
 tests/f_extent_leaf_bad_magic/expect.1 |    5 
 tests/f_extent_oobounds/expect.1       |   11 -
 tests/f_extent_oobounds/expect.2       |    2 
 tests/f_extents/expect.1               |    9 +
 18 files changed, 819 insertions(+), 27 deletions(-)
 create mode 100644 e2fsck/extents.c


diff --git a/e2fsck/Makefile.in b/e2fsck/Makefile.in
index e40e51b..a4413d9 100644
--- a/e2fsck/Makefile.in
+++ b/e2fsck/Makefile.in
@@ -62,7 +62,8 @@ OBJS= dict.o unix.o e2fsck.o super.o pass1.o pass1b.o pass2.o \
 	pass3.o pass4.o pass5.o journal.o badblocks.o util.o dirinfo.o \
 	dx_dirinfo.o ehandler.o problem.o message.o quota.o recovery.o \
 	region.o revoke.o ea_refcount.o rehash.o profile.o prof_err.o \
-	logfile.o sigcatcher.o $(MTRACE_OBJ) plausible.o readahead.o
+	logfile.o sigcatcher.o $(MTRACE_OBJ) plausible.o readahead.o \
+	extents.o
 
 PROFILED_OBJS= profiled/dict.o profiled/unix.o profiled/e2fsck.o \
 	profiled/super.o profiled/pass1.o profiled/pass1b.o \
@@ -74,7 +75,7 @@ PROFILED_OBJS= profiled/dict.o profiled/unix.o profiled/e2fsck.o \
 	profiled/ea_refcount.o profiled/rehash.o profiled/profile.o \
 	profiled/prof_err.o profiled/logfile.o \
 	profiled/sigcatcher.o profiled/plausible.o \
-	profiled/sigcatcher.o profiled/readahead.o
+	profiled/sigcatcher.o profiled/readahead.o profiled/extents.o
 
 SRCS= $(srcdir)/e2fsck.c \
 	$(srcdir)/dict.c \
@@ -106,6 +107,7 @@ SRCS= $(srcdir)/e2fsck.c \
 	prof_err.c \
 	$(srcdir)/quota.c \
 	$(srcdir)/../misc/plausible.c \
+	$(srcdir)/extents.c \
 	$(MTRACE_SRC)
 
 all:: profiled $(PROGS) e2fsck $(MANPAGES) $(FMANPAGES)
@@ -308,6 +310,16 @@ pass1.o: $(srcdir)/pass1.c $(top_builddir)/lib/config.h \
  $(srcdir)/profile.h prof_err.h $(top_srcdir)/lib/quota/quotaio.h \
  $(top_srcdir)/lib/quota/dqblk_v2.h $(top_srcdir)/lib/quota/quotaio_tree.h \
  $(top_srcdir)/lib/../e2fsck/dict.h $(srcdir)/problem.h
+extents.o: $(srcdir)/extents.c $(top_builddir)/lib/config.h \
+ $(top_builddir)/lib/dirpaths.h $(top_srcdir)/lib/et/com_err.h \
+ $(srcdir)/e2fsck.h $(top_srcdir)/lib/ext2fs/ext2_fs.h \
+ $(top_builddir)/lib/ext2fs/ext2_types.h $(top_srcdir)/lib/ext2fs/ext2fs.h \
+ $(top_srcdir)/lib/ext2fs/ext3_extents.h $(top_srcdir)/lib/ext2fs/ext2_io.h \
+ $(top_builddir)/lib/ext2fs/ext2_err.h \
+ $(top_srcdir)/lib/ext2fs/ext2_ext_attr.h $(top_srcdir)/lib/ext2fs/bitops.h \
+ $(srcdir)/profile.h prof_err.h $(top_srcdir)/lib/quota/quotaio.h \
+ $(top_srcdir)/lib/quota/dqblk_v2.h $(top_srcdir)/lib/quota/quotaio_tree.h \
+ $(top_srcdir)/lib/../e2fsck/dict.h $(srcdir)/problem.h $(srcdir)/dict.h
 pass1b.o: $(srcdir)/pass1b.c $(top_builddir)/lib/config.h \
  $(top_builddir)/lib/dirpaths.h $(top_srcdir)/lib/et/com_err.h \
  $(srcdir)/e2fsck.h $(top_srcdir)/lib/ext2fs/ext2_fs.h \
diff --git a/e2fsck/e2fsck.8.in b/e2fsck/e2fsck.8.in
index 270727a..e1bbd27 100644
--- a/e2fsck/e2fsck.8.in
+++ b/e2fsck/e2fsck.8.in
@@ -227,6 +227,14 @@ e2fsck runtime.  By default, this is set to the size of two block groups' inode
 tables (typically 4MiB on a regular ext4 filesystem); if this amount is more
 than 1/50th of total physical memory, readahead is disabled.  Set this to zero
 to disable readahead entirely.
+.TP
+.BI bmap2extent
+Convert block-mapped files to extent-mapped files.
+.TP
+.BI fixes_only
+Only fix damaged metadata; do not optimize htree directories or compress
+extent trees.  This option is incompatible with the -D and -E bmap2extent
+options.
 .RE
 .TP
 .B \-f
diff --git a/e2fsck/e2fsck.c b/e2fsck/e2fsck.c
index 4273060..d8db925 100644
--- a/e2fsck/e2fsck.c
+++ b/e2fsck/e2fsck.c
@@ -208,8 +208,8 @@ void e2fsck_free_context(e2fsck_t ctx)
 typedef void (*pass_t)(e2fsck_t ctx);
 
 static pass_t e2fsck_passes[] = {
-	e2fsck_pass1, e2fsck_pass2, e2fsck_pass3, e2fsck_pass4,
-	e2fsck_pass5, 0 };
+	e2fsck_pass1, e2fsck_pass1e, e2fsck_pass2, e2fsck_pass3,
+	e2fsck_pass4, e2fsck_pass5, 0 };
 
 #define E2F_FLAG_RUN_RETURN	(E2F_FLAG_SIGNAL_MASK|E2F_FLAG_RESTART)
 
diff --git a/e2fsck/e2fsck.h b/e2fsck/e2fsck.h
index 6f96e55..5fda863 100644
--- a/e2fsck/e2fsck.h
+++ b/e2fsck/e2fsck.h
@@ -167,6 +167,8 @@ struct resource_track {
 #define E2F_OPT_FRAGCHECK	0x0800
 #define E2F_OPT_JOURNAL_ONLY	0x1000 /* only replay the journal */
 #define E2F_OPT_DISCARD		0x2000
+#define E2F_OPT_CONVERT_BMAP	0x4000 /* convert blockmap to extent */
+#define E2F_OPT_FIXES_ONLY	0x8000 /* skip all optimizations */
 
 /*
  * E2fsck flags
@@ -190,6 +192,7 @@ struct resource_track {
 #define E2F_FLAG_EXITING	0x1000 /* E2fsck exiting due to errors */
 #define E2F_FLAG_TIME_INSANE	0x2000 /* Time is insane */
 #define E2F_FLAG_PROBLEMS_FIXED	0x4000 /* At least one problem was fixed */
+#define E2F_FLAG_ALLOC_OK	0x8000 /* Can we allocate blocks? */
 
 #define E2F_RESET_FLAGS (E2F_FLAG_TIME_INSANE | E2F_FLAG_PROBLEMS_FIXED)
 
@@ -382,6 +385,23 @@ struct e2fsck_struct {
 
 	/* How much are we allowed to readahead? */
 	unsigned long long readahead_kb;
+
+	/*
+	 * Inodes to rebuild extent trees
+	 */
+	ext2fs_inode_bitmap inodes_to_rebuild;
+};
+
+/* Data structures to evaluate whether an extent tree needs rebuilding. */
+struct extent_tree_level {
+	unsigned int	num_extents;
+	unsigned int	max_extents;
+};
+
+struct extent_tree_info {
+	int force_rebuild:1;
+	ext2_ino_t ino;
+	struct extent_tree_level	ext_info[MAX_EXTENT_DEPTH_COUNT];
 };
 
 /* Used by the region allocation code */
@@ -457,6 +477,19 @@ extern blk64_t ea_refcount_intr_next(ext2_refcount_t refcount, int *ret);
 extern const char *ehandler_operation(const char *op);
 extern void ehandler_init(io_channel channel);
 
+/* extents.c */
+struct problem_context;
+errcode_t e2fsck_rebuild_extents_later(e2fsck_t ctx, ext2_ino_t ino);
+int e2fsck_ino_will_be_rebuilt(e2fsck_t ctx, ext2_ino_t ino);
+void e2fsck_pass1e(e2fsck_t ctx);
+errcode_t e2fsck_check_rebuild_extents(e2fsck_t ctx, ext2_ino_t ino,
+				       struct ext2_inode *inode,
+				       struct problem_context *pctx);
+errcode_t e2fsck_should_rebuild_extents(e2fsck_t ctx,
+					struct problem_context *pctx,
+					struct extent_tree_info *eti,
+					struct ext2_extent_info *info);
+
 /* journal.c */
 extern errcode_t e2fsck_check_ext3_journal(e2fsck_t ctx);
 extern errcode_t e2fsck_run_ext3_journal(e2fsck_t ctx);
@@ -524,7 +557,8 @@ extern int region_allocate(region_t region, region_addr_t start, int n);
 /* rehash.c */
 void e2fsck_rehash_dir_later(e2fsck_t ctx, ext2_ino_t ino);
 int e2fsck_dir_will_be_rehashed(e2fsck_t ctx, ext2_ino_t ino);
-errcode_t e2fsck_rehash_dir(e2fsck_t ctx, ext2_ino_t ino);
+errcode_t e2fsck_rehash_dir(e2fsck_t ctx, ext2_ino_t ino,
+			    struct problem_context *pctx);
 void e2fsck_rehash_directories(e2fsck_t ctx);
 
 /* sigcatcher.c */
diff --git a/e2fsck/extents.c b/e2fsck/extents.c
new file mode 100644
index 0000000..8465299
--- /dev/null
+++ b/e2fsck/extents.c
@@ -0,0 +1,536 @@
+/*
+ * extents.c --- rebuild extent tree
+ *
+ * Copyright (C) 2014 Oracle.
+ *
+ * %Begin-Header%
+ * This file may be redistributed under the terms of the GNU Public
+ * License, version 2.
+ * %End-Header%
+ */
+
+#include "config.h"
+#include <string.h>
+#include <ctype.h>
+#include <errno.h>
+#include "e2fsck.h"
+#include "problem.h"
+
+#undef DEBUG
+#undef DEBUG_SUMMARY
+#undef DEBUG_FREE
+
+#define NUM_EXTENTS	341	/* about one ETB's worth of extents */
+
+static errcode_t e2fsck_rebuild_extents(e2fsck_t ctx, ext2_ino_t ino);
+
+/* Schedule an inode to have its extent tree rebuilt during pass 1E. */
+errcode_t e2fsck_rebuild_extents_later(e2fsck_t ctx, ext2_ino_t ino)
+{
+	if (!EXT2_HAS_INCOMPAT_FEATURE(ctx->fs->super,
+				       EXT3_FEATURE_INCOMPAT_EXTENTS) ||
+	    (ctx->options & E2F_OPT_NO) ||
+	    (ino != EXT2_ROOT_INO && ino < ctx->fs->super->s_first_ino))
+		return 0;
+
+	if (ctx->flags & E2F_FLAG_ALLOC_OK)
+		return e2fsck_rebuild_extents(ctx, ino);
+
+	if (!ctx->inodes_to_rebuild)
+		e2fsck_allocate_inode_bitmap(ctx->fs,
+					     _("extent rebuild inode map"),
+					     EXT2FS_BMAP64_RBTREE,
+					     "inodes_to_rebuild",
+					     &ctx->inodes_to_rebuild);
+	if (ctx->inodes_to_rebuild)
+		ext2fs_mark_inode_bitmap2(ctx->inodes_to_rebuild, ino);
+	return 0;
+}
+
+/* Ask if an inode will have its extents rebuilt during pass 1E. */
+int e2fsck_ino_will_be_rebuilt(e2fsck_t ctx, ext2_ino_t ino)
+{
+	if (!ctx->inodes_to_rebuild)
+		return 0;
+	return ext2fs_test_inode_bitmap2(ctx->inodes_to_rebuild, ino);
+}
+
+struct extent_list {
+	blk64_t blocks_freed;
+	struct ext2fs_extent *extents;
+	unsigned int count;
+	unsigned int size;
+	unsigned int ext_read;
+	errcode_t retval;
+	ext2_ino_t ino;
+};
+
+static errcode_t load_extents(e2fsck_t ctx, struct extent_list *list)
+{
+	ext2_filsys		fs = ctx->fs;
+	ext2_extent_handle_t	handle;
+	struct ext2fs_extent	extent;
+	errcode_t		retval;
+
+	retval = ext2fs_extent_open(fs, list->ino, &handle);
+	if (retval)
+		return retval;
+
+	retval = ext2fs_extent_get(handle, EXT2_EXTENT_ROOT, &extent);
+	if (retval)
+		goto out;
+
+	do {
+		if (extent.e_flags & EXT2_EXTENT_FLAGS_SECOND_VISIT)
+			goto next;
+
+		/* Internal node; free it and we'll re-allocate it later */
+		if (!(extent.e_flags & EXT2_EXTENT_FLAGS_LEAF)) {
+#if defined(DEBUG) || defined(DEBUG_FREE)
+			printf("ino=%d free=%llu bf=%llu\n", list->ino,
+					extent.e_pblk, list->blocks_freed + 1);
+#endif
+			list->blocks_freed++;
+			ext2fs_block_alloc_stats2(fs, extent.e_pblk, -1);
+			goto next;
+		}
+
+		list->ext_read++;
+		/* Can we attach it to the previous extent? */
+		if (list->count) {
+			struct ext2fs_extent *last = list->extents +
+						     list->count - 1;
+			blk64_t end = last->e_len + extent.e_len;
+
+			if (last->e_pblk + last->e_len == extent.e_pblk &&
+			    last->e_lblk + last->e_len == extent.e_lblk &&
+			    (last->e_flags & EXT2_EXTENT_FLAGS_UNINIT) ==
+			    (extent.e_flags & EXT2_EXTENT_FLAGS_UNINIT) &&
+			    end < (1ULL << 32)) {
+				last->e_len += extent.e_len;
+#ifdef DEBUG
+				printf("R: ino=%d len=%u\n", list->ino,
+						last->e_len);
+#endif
+				goto next;
+			}
+		}
+
+		/* Do we need to expand? */
+		if (list->count == list->size) {
+			unsigned int new_size = (list->size + NUM_EXTENTS) *
+						sizeof(struct ext2fs_extent);
+			retval = ext2fs_resize_mem(0, new_size, &list->extents);
+			if (retval)
+				goto out;
+			list->size += NUM_EXTENTS;
+		}
+
+		/* Add a new extent */
+		memcpy(list->extents + list->count, &extent, sizeof(extent));
+#ifdef DEBUG
+		printf("R: ino=%d pblk=%llu lblk=%llu len=%u\n", list->ino,
+				extent.e_pblk, extent.e_lblk, extent.e_len);
+#endif
+		list->count++;
+next:
+		retval = ext2fs_extent_get(handle, EXT2_EXTENT_NEXT, &extent);
+	} while (retval == 0);
+
+out:
+	/* Ok if we run off the end */
+	if (retval == EXT2_ET_EXTENT_NO_NEXT)
+		retval = 0;
+	ext2fs_extent_free(handle);
+	return retval;
+}
+
+static int find_blocks(ext2_filsys fs, blk64_t *blocknr, e2_blkcnt_t blockcnt,
+		       blk64_t ref_blk, int ref_offset, void *priv_data)
+{
+	struct extent_list *list = priv_data;
+
+	/* Internal node? */
+	if (blockcnt < 0) {
+#if defined(DEBUG) || defined(DEBUG_FREE)
+		printf("ino=%d free=%llu bf=%llu\n", list->ino, *blocknr,
+				list->blocks_freed + 1);
+#endif
+		list->blocks_freed++;
+		ext2fs_block_alloc_stats2(fs, *blocknr, -1);
+		return 0;
+	}
+
+	/* Can we attach it to the previous extent? */
+	if (list->count) {
+		struct ext2fs_extent *last = list->extents +
+					     list->count - 1;
+		blk64_t end = (blk64_t)last->e_len + 1;
+
+		if (last->e_pblk + last->e_len == *blocknr &&
+		    end < (1ULL << 32)) {
+			last->e_len++;
+#ifdef DEBUG
+			printf("R: ino=%d len=%u\n", list->ino, last->e_len);
+#endif
+			return 0;
+		}
+	}
+
+	/* Do we need to expand? */
+	if (list->count == list->size) {
+		unsigned int new_size = (list->size + NUM_EXTENTS) *
+					sizeof(struct ext2fs_extent);
+		list->retval = ext2fs_resize_mem(0, new_size, &list->extents);
+		if (list->retval)
+			return BLOCK_ABORT;
+		list->size += NUM_EXTENTS;
+	}
+
+	/* Add a new extent */
+	list->extents[list->count].e_pblk = *blocknr;
+	list->extents[list->count].e_lblk = blockcnt;
+	list->extents[list->count].e_len = 1;
+	list->extents[list->count].e_flags = 0;
+#ifdef DEBUG
+	printf("R: ino=%d pblk=%llu lblk=%llu len=%u\n", list->ino, *blocknr,
+			blockcnt, 1);
+#endif
+	list->count++;
+
+	return 0;
+}
+
+static errcode_t rebuild_extent_tree(e2fsck_t ctx, struct extent_list *list,
+				     ext2_ino_t ino)
+{
+	struct ext2_inode	inode;
+	errcode_t		retval;
+	ext2_extent_handle_t	handle;
+	unsigned int		i, ext_written;
+	struct ext2fs_extent	*ex, extent;
+
+	list->count = 0;
+	list->blocks_freed = 0;
+	list->ino = ino;
+	list->ext_read = 0;
+	e2fsck_read_inode(ctx, ino, &inode, "rebuild_extents");
+
+	/* Skip deleted inodes and inline data files */
+	if (inode.i_links_count == 0 ||
+	    inode.i_flags & EXT4_INLINE_DATA_FL)
+		return 0;
+
+	/* Collect lblk->pblk mappings */
+	if (inode.i_flags & EXT4_EXTENTS_FL) {
+		retval = load_extents(ctx, list);
+		goto extents_loaded;
+	}
+
+	retval = ext2fs_block_iterate3(ctx->fs, ino, BLOCK_FLAG_READ_ONLY, 0,
+				       find_blocks, list);
+	if (retval)
+		goto err;
+	if (list->retval) {
+		retval = list->retval;
+		goto err;
+	}
+
+extents_loaded:
+	/* Reset extent tree */
+	inode.i_flags &= ~EXT4_EXTENTS_FL;
+	memset(inode.i_block, 0, sizeof(inode.i_block));
+
+	/* Make a note of freed blocks */
+	retval = ext2fs_iblk_sub_blocks(ctx->fs, &inode, list->blocks_freed);
+	if (retval)
+		goto err;
+
+	/* Now stuff extents into the file */
+	retval = ext2fs_extent_open2(ctx->fs, ino, &inode, &handle);
+	if (retval)
+		goto err;
+
+	ext_written = 0;
+	for (i = 0, ex = list->extents; i < list->count; i++, ex++) {
+		memcpy(&extent, ex, sizeof(struct ext2fs_extent));
+		extent.e_flags &= EXT2_EXTENT_FLAGS_UNINIT;
+		if (extent.e_flags & EXT2_EXTENT_FLAGS_UNINIT) {
+			if (extent.e_len > EXT_UNINIT_MAX_LEN) {
+				extent.e_len = EXT_UNINIT_MAX_LEN;
+				ex->e_pblk += EXT_UNINIT_MAX_LEN;
+				ex->e_lblk += EXT_UNINIT_MAX_LEN;
+				ex->e_len -= EXT_UNINIT_MAX_LEN;
+				ex--;
+				i--;
+			}
+		} else {
+			if (extent.e_len > EXT_INIT_MAX_LEN) {
+				extent.e_len = EXT_INIT_MAX_LEN;
+				ex->e_pblk += EXT_INIT_MAX_LEN;
+				ex->e_lblk += EXT_INIT_MAX_LEN;
+				ex->e_len -= EXT_INIT_MAX_LEN;
+				ex--;
+				i--;
+			}
+		}
+
+#ifdef DEBUG
+		printf("W: ino=%d pblk=%llu lblk=%llu len=%u\n", ino,
+				extent.e_pblk, extent.e_lblk, extent.e_len);
+#endif
+		retval = ext2fs_extent_insert(handle, EXT2_EXTENT_INSERT_AFTER,
+					      &extent);
+		if (retval)
+			goto err2;
+		retval = ext2fs_extent_fix_parents(handle);
+		if (retval)
+			goto err2;
+		ext_written++;
+	}
+
+#if defined(DEBUG) || defined(DEBUG_SUMMARY)
+	printf("rebuild: ino=%d extents=%d->%d\n", ino, list->ext_read,
+	       ext_written);
+#endif
+	e2fsck_write_inode(ctx, ino, &inode, "rebuild_extents");
+
+err2:
+	ext2fs_extent_free(handle);
+err:
+	return retval;
+}
+
+/* Rebuild the extents immediately */
+static errcode_t e2fsck_rebuild_extents(e2fsck_t ctx, ext2_ino_t ino)
+{
+	struct extent_list	list;
+	errcode_t err;
+
+	if (!EXT2_HAS_INCOMPAT_FEATURE(ctx->fs->super,
+				       EXT3_FEATURE_INCOMPAT_EXTENTS) ||
+	    (ctx->options & E2F_OPT_NO) ||
+	    (ino != EXT2_ROOT_INO && ino < ctx->fs->super->s_first_ino))
+		return 0;
+
+	e2fsck_read_bitmaps(ctx);
+	memset(&list, 0, sizeof(list));
+	err = ext2fs_get_mem(sizeof(struct ext2fs_extent) * NUM_EXTENTS,
+				&list.extents);
+	if (err)
+		return err;
+	list.size = NUM_EXTENTS;
+	err = rebuild_extent_tree(ctx, &list, ino);
+	ext2fs_free_mem(&list.extents);
+
+	return err;
+}
+
+static void rebuild_extents(e2fsck_t ctx, const char *pass_name, int pr_header)
+{
+	struct problem_context	pctx;
+#ifdef RESOURCE_TRACK
+	struct resource_track	rtrack;
+#endif
+	struct extent_list	list;
+	int			first = 1;
+	ext2_ino_t		ino = 0;
+	errcode_t		retval;
+
+	if (!EXT2_HAS_INCOMPAT_FEATURE(ctx->fs->super,
+				       EXT3_FEATURE_INCOMPAT_EXTENTS) ||
+	    !ext2fs_test_valid(ctx->fs) ||
+	    ctx->invalid_bitmaps) {
+		if (ctx->inodes_to_rebuild)
+			ext2fs_free_inode_bitmap(ctx->inodes_to_rebuild);
+		ctx->inodes_to_rebuild = NULL;
+	}
+
+	if (ctx->inodes_to_rebuild == NULL)
+		return;
+
+	init_resource_track(&rtrack, ctx->fs->io);
+	clear_problem_context(&pctx);
+	e2fsck_read_bitmaps(ctx);
+
+	memset(&list, 0, sizeof(list));
+	retval = ext2fs_get_mem(sizeof(struct ext2fs_extent) * NUM_EXTENTS,
+				&list.extents);
+	list.size = NUM_EXTENTS;
+	while (1) {
+		retval = ext2fs_find_first_set_inode_bitmap2(
+				ctx->inodes_to_rebuild, ino + 1,
+				ctx->fs->super->s_inodes_count, &ino);
+		if (retval)
+			break;
+		pctx.ino = ino;
+		if (first) {
+			fix_problem(ctx, pr_header, &pctx);
+			first = 0;
+		}
+		pctx.errcode = rebuild_extent_tree(ctx, &list, ino);
+		if (pctx.errcode) {
+			end_problem_latch(ctx, PR_LATCH_OPTIMIZE_EXT);
+			fix_problem(ctx, PR_1E_OPTIMIZE_EXT_ERR, &pctx);
+		}
+		if (ctx->progress && !ctx->progress_fd)
+			e2fsck_simple_progress(ctx, "Rebuilding extents",
+					100.0 * (float) ino /
+					(float) ctx->fs->super->s_inodes_count,
+					ino);
+	}
+	end_problem_latch(ctx, PR_LATCH_OPTIMIZE_EXT);
+
+	ext2fs_free_inode_bitmap(ctx->inodes_to_rebuild);
+	ctx->inodes_to_rebuild = NULL;
+	ext2fs_free_mem(&list.extents);
+
+	print_resource_track(ctx, pass_name, &rtrack, ctx->fs->io);
+}
+
+/* Scan a file to see if we should rebuild its extent tree */
+errcode_t e2fsck_check_rebuild_extents(e2fsck_t ctx, ext2_ino_t ino,
+				  struct ext2_inode *inode,
+				  struct problem_context *pctx)
+{
+	struct extent_tree_info	eti;
+	struct ext2_extent_info	info, top_info;
+	struct ext2fs_extent	extent;
+	ext2_extent_handle_t	ehandle;
+	ext2_filsys		fs = ctx->fs;
+	errcode_t		retval;
+
+	/* block map file and we want extent conversion */
+	if (!(inode->i_flags & EXT4_EXTENTS_FL) &&
+	    !(inode->i_flags & EXT4_INLINE_DATA_FL) &&
+	    (ctx->options & E2F_OPT_CONVERT_BMAP)) {
+		return e2fsck_rebuild_extents_later(ctx, ino);
+	}
+
+	if (!(inode->i_flags & EXT4_EXTENTS_FL))
+		return 0;
+	memset(&eti, 0, sizeof(eti));
+	eti.ino = ino;
+
+	/* Otherwise, go scan the extent tree... */
+	retval = ext2fs_extent_open2(fs, ino, inode, &ehandle);
+	if (retval)
+		return 0;
+
+	retval = ext2fs_extent_get_info(ehandle, &top_info);
+	if (retval)
+		goto out;
+
+	/* Check maximum extent depth */
+	pctx->ino = ino;
+	pctx->blk = top_info.max_depth;
+	pctx->blk2 = ext2fs_max_extent_depth(ehandle);
+	if (pctx->blk2 < pctx->blk &&
+	    fix_problem(ctx, PR_1_EXTENT_BAD_MAX_DEPTH, pctx))
+		eti.force_rebuild = 1;
+
+	/* Can we collect extent tree level stats? */
+	pctx->blk = MAX_EXTENT_DEPTH_COUNT;
+	if (pctx->blk2 > pctx->blk)
+		fix_problem(ctx, PR_1E_MAX_EXTENT_TREE_DEPTH, pctx);
+
+	/* We need to fix tree depth problems, but the scan isn't a fix. */
+	if (ctx->options & E2F_OPT_FIXES_ONLY)
+		goto out;
+
+	retval = ext2fs_extent_get(ehandle, EXT2_EXTENT_ROOT, &extent);
+	if (retval)
+		goto out;
+
+	do {
+		retval = ext2fs_extent_get_info(ehandle, &info);
+		if (retval)
+			break;
+
+		/*
+		 * If this is the first extent in an extent block that we
+		 * haven't visited, collect stats on the block.
+		 */
+		if (info.curr_entry == 1 &&
+		    !(extent.e_flags & EXT2_EXTENT_FLAGS_SECOND_VISIT) &&
+		    !eti.force_rebuild) {
+			struct extent_tree_level *etl;
+
+			etl = eti.ext_info + info.curr_level;
+			etl->num_extents += info.num_entries;
+			etl->max_extents += info.max_entries;
+			/*
+			 * Implementation wart: Splitting extent blocks when
+			 * appending will leave the old block with one free
+			 * entry.  Therefore unless the node is totally full,
+			 * pretend that a non-root extent block can hold one
+			 * fewer entry than it actually does, so that we don't
+			 * repeatedly rebuild the extent tree.
+			 */
+			if (info.curr_level &&
+			    info.num_entries < info.max_entries)
+				etl->max_extents--;
+		}
+
+		/* Skip to the end of a block of leaf nodes */
+		if (extent.e_flags & EXT2_EXTENT_FLAGS_LEAF) {
+			retval = ext2fs_extent_get(ehandle,
+						    EXT2_EXTENT_LAST_SIB,
+						    &extent);
+			if (retval)
+				break;
+		}
+
+		retval = ext2fs_extent_get(ehandle, EXT2_EXTENT_NEXT, &extent);
+	} while (retval == 0);
+out:
+	ext2fs_extent_free(ehandle);
+	return e2fsck_should_rebuild_extents(ctx, pctx, &eti, &top_info);
+}
+
+/* Having scanned a file's extent tree, decide if we should rebuild it */
+errcode_t e2fsck_should_rebuild_extents(e2fsck_t ctx,
+				   struct problem_context *pctx,
+				   struct extent_tree_info *eti,
+				   struct ext2_extent_info *info)
+{
+	struct extent_tree_level *ei;
+	int i, j, op;
+	unsigned int extents_per_block;
+
+	if (eti->force_rebuild)
+		goto rebuild;
+
+	extents_per_block = (ctx->fs->blocksize -
+			     sizeof(struct ext3_extent_header)) /
+			    sizeof(struct ext3_extent);
+	/*
+	 * If we can consolidate a level or shorten the tree, schedule the
+	 * extent tree to be rebuilt.
+	 */
+	for (i = 0, ei = eti->ext_info; i < info->max_depth + 1; i++, ei++) {
+		if (ei->max_extents - ei->num_extents > extents_per_block) {
+			pctx->blk = i;
+			op = PR_1E_CAN_NARROW_EXTENT_TREE;
+			goto rebuild;
+		}
+		for (j = 0; j < i; j++) {
+			if (ei->num_extents < eti->ext_info[j].max_extents) {
+				pctx->blk = i;
+				op = PR_1E_CAN_COLLAPSE_EXTENT_TREE;
+				goto rebuild;
+			}
+		}
+	}
+	return 0;
+
+rebuild:
+	if (eti->force_rebuild || fix_problem(ctx, op, pctx))
+		return e2fsck_rebuild_extents_later(ctx, eti->ino);
+	return 0;
+}
+
+void e2fsck_pass1e(e2fsck_t ctx)
+{
+	rebuild_extents(ctx, "Pass 1E", PR_1E_PASS_HEADER);
+}
diff --git a/e2fsck/pass1.c b/e2fsck/pass1.c
index 938796f..524314f 100644
--- a/e2fsck/pass1.c
+++ b/e2fsck/pass1.c
@@ -56,6 +56,8 @@
 #define _INLINE_ inline
 #endif
 
+#undef DEBUG
+
 static int process_block(ext2_filsys fs, blk64_t	*blocknr,
 			 e2_blkcnt_t blockcnt, blk64_t ref_blk,
 			 int ref_offset, void *priv_data);
@@ -95,6 +97,7 @@ struct process_block_struct {
 	ext2fs_block_bitmap fs_meta_blocks;
 	e2fsck_t	ctx;
 	region_t	region;
+	struct extent_tree_info	eti;
 };
 
 struct process_inode_block {
@@ -1839,6 +1842,7 @@ void e2fsck_pass1(e2fsck_t ctx)
 		}
 		e2fsck_pass1_dupblocks(ctx, block_buf);
 	}
+	ctx->flags |= E2F_FLAG_ALLOC_OK;
 	ext2fs_free_mem(&inodes_to_process);
 endit:
 	e2fsck_use_inode_shortcuts(ctx, 0);
@@ -2490,6 +2494,23 @@ static void scan_extent_node(e2fsck_t ctx, struct problem_context *pctx,
 	pctx->errcode = ext2fs_extent_get_info(ehandle, &info);
 	if (pctx->errcode)
 		return;
+	if (!(ctx->options & E2F_OPT_FIXES_ONLY) &&
+	    !pb->eti.force_rebuild) {
+		struct extent_tree_level *etl;
+
+		etl = pb->eti.ext_info + info.curr_level;
+		etl->num_extents += info.num_entries;
+		etl->max_extents += info.max_entries;
+		/*
+		 * Implementation wart: Splitting extent blocks when appending
+		 * will leave the old block with one free entry.  Therefore
+		 * unless the node is totally full, pretend that a non-root
+		 * extent block can hold one fewer entry than it actually does,
+		 * so that we don't repeatedly rebuild the extent tree.
+		 */
+		if (info.curr_level && info.num_entries < info.max_entries)
+			etl->max_extents--;
+	}
 
 	pctx->errcode = ext2fs_extent_get(ehandle, EXT2_EXTENT_FIRST_SIB,
 					  &extent);
@@ -2826,11 +2847,27 @@ static void check_blocks_extents(e2fsck_t ctx, struct problem_context *pctx,
 
 	retval = ext2fs_extent_get_info(ehandle, &info);
 	if (retval == 0) {
-		if (info.max_depth >= MAX_EXTENT_DEPTH_COUNT)
-			info.max_depth = MAX_EXTENT_DEPTH_COUNT-1;
-		ctx->extent_depth_count[info.max_depth]++;
+		int max_depth = info.max_depth;
+
+		if (max_depth >= MAX_EXTENT_DEPTH_COUNT)
+			max_depth = MAX_EXTENT_DEPTH_COUNT-1;
+		ctx->extent_depth_count[max_depth]++;
 	}
 
+	/* Check maximum extent depth */
+	pctx->blk = info.max_depth;
+	pctx->blk2 = ext2fs_max_extent_depth(ehandle);
+	if (pctx->blk2 < pctx->blk &&
+	    fix_problem(ctx, PR_1_EXTENT_BAD_MAX_DEPTH, pctx))
+		pb->eti.force_rebuild = 1;
+
+	/* Can we collect extent tree level stats? */
+	pctx->blk = MAX_EXTENT_DEPTH_COUNT;
+	if (pctx->blk2 > pctx->blk)
+		fix_problem(ctx, PR_1E_MAX_EXTENT_TREE_DEPTH, pctx);
+	memset(pb->eti.ext_info, 0, sizeof(pb->eti.ext_info));
+	pb->eti.ino = pb->ino;
+
 	pb->region = region_create(0, info.max_lblk);
 	if (!pb->region) {
 		ext2fs_extent_free(ehandle);
@@ -2853,6 +2890,16 @@ static void check_blocks_extents(e2fsck_t ctx, struct problem_context *pctx,
 	region_free(pb->region);
 	pb->region = NULL;
 	ext2fs_extent_free(ehandle);
+
+	/* Rebuild unless it's a dir and we're rehashing it */
+	if (LINUX_S_ISDIR(inode->i_mode) &&
+	    e2fsck_dir_will_be_rehashed(ctx, ino))
+		return;
+
+	if (ctx->options & E2F_OPT_CONVERT_BMAP)
+		e2fsck_rebuild_extents_later(ctx, ino);
+	else
+		e2fsck_should_rebuild_extents(ctx, pctx, &pb->eti, &info);
 }
 
 /*
@@ -2937,6 +2984,7 @@ static void check_blocks(e2fsck_t ctx, struct problem_context *pctx,
 	pb.pctx = pctx;
 	pb.ctx = ctx;
 	pb.inode_modified = 0;
+	pb.eti.force_rebuild = 0;
 	pctx->ino = ino;
 	pctx->errcode = 0;
 
@@ -3000,6 +3048,15 @@ static void check_blocks(e2fsck_t ctx, struct problem_context *pctx,
 						  "check_blocks");
 			fs->flags = (flags & EXT2_FLAG_IGNORE_CSUM_ERRORS) |
 				    (fs->flags & ~EXT2_FLAG_IGNORE_CSUM_ERRORS);
+
+			if (ctx->options & E2F_OPT_CONVERT_BMAP) {
+#ifdef DEBUG
+				printf("bmap rebuild ino=%d\n", ino);
+#endif
+				if (!LINUX_S_ISDIR(inode->i_mode) ||
+				    !e2fsck_dir_will_be_rehashed(ctx, ino))
+					e2fsck_rebuild_extents_later(ctx, ino);
+			}
 		}
 	}
 	end_problem_latch(ctx, PR_LATCH_BLOCK);
diff --git a/e2fsck/problem.c b/e2fsck/problem.c
index 960fb07..7af0b76 100644
--- a/e2fsck/problem.c
+++ b/e2fsck/problem.c
@@ -1106,6 +1106,11 @@ static struct e2fsck_problem problem_table[] = {
 	  N_("@A memory for encrypted @d list\n"),
 	  PROMPT_NONE, PR_FATAL },
 
+	/* Inode extent tree could be more shallow */
+	{ PR_1_EXTENT_BAD_MAX_DEPTH,
+	  N_("@i %i @x tree could be more shallow (%b; could be <= %c)\n"),
+	  PROMPT_FIX, PR_NO_OK | PR_PREEN_NO | PR_PREEN_OK },
+
 	/* Pass 1b errors */
 
 	/* Pass 1B: Rescan for duplicate/bad blocks */
@@ -1203,6 +1208,48 @@ static struct e2fsck_problem problem_table[] = {
 	{ PR_1D_CLONE_ERROR,
 	  N_("Couldn't clone file: %m\n"), PROMPT_NONE, 0 },
 
+	/* Pass 1E Extent tree optimization	*/
+
+	/* Pass 1E: Optimizing extent trees */
+	{ PR_1E_PASS_HEADER,
+	  N_("Pass 1E: Optimizing @x trees\n"),
+	  PROMPT_NONE, PR_PREEN_NOMSG },
+
+	/* Failed to optimize extent tree */
+	{ PR_1E_OPTIMIZE_EXT_ERR,
+	  N_("Failed to optimize @x tree %p (%i): %m\n"),
+	  PROMPT_NONE, 0 },
+
+	/* Optimizing extent trees */
+	{ PR_1E_OPTIMIZE_EXT_HEADER,
+	  N_("Optimizing @x trees: "),
+	  PROMPT_NONE, PR_MSG_ONLY },
+
+	/* Rebuilding extent tree %d */
+	{ PR_1E_OPTIMIZE_EXT,
+	  " %i",
+	  PROMPT_NONE, PR_LATCH_OPTIMIZE_EXT | PR_PREEN_NOHDR},
+
+	/* Rebuilding extent tree end */
+	{ PR_1E_OPTIMIZE_EXT_END,
+	  "\n",
+	  PROMPT_NONE, PR_PREEN_NOHDR },
+
+	/* Internal error: extent tree depth too large */
+	{ PR_1E_MAX_EXTENT_TREE_DEPTH,
+	  N_("Internal error: max extent tree depth too large (%b; expected=%c).\n"),
+	  PROMPT_NONE, PR_FATAL },
+
+	/* Inode extent tree could be shorter */
+	{ PR_1E_CAN_COLLAPSE_EXTENT_TREE,
+	  N_("@i %i @x tree could be shorter.\n\t(level %b is unnecessary)\n"),
+	  PROMPT_FIX, PR_NO_OK | PR_PREEN_NO | PR_PREEN_OK },
+
+	/* Inode extent tree could be narrower */
+	{ PR_1E_CAN_NARROW_EXTENT_TREE,
+	  N_("@i %i @x tree could be narrower.\n\t(level %b has unnecessary nodes)\n"),
+	  PROMPT_FIX, PR_NO_OK | PR_PREEN_NO | PR_PREEN_OK },
+
 	/* Pass 2 errors */
 
 	/* Pass 2: Checking directory structure */
@@ -1951,6 +1998,7 @@ static struct latch_descr pr_latch_info[] = {
 	{ PR_LATCH_TOOBIG, PR_1_INODE_TOOBIG, 0 },
 	{ PR_LATCH_OPTIMIZE_DIR, PR_3A_OPTIMIZE_DIR_HEADER, PR_3A_OPTIMIZE_DIR_END },
 	{ PR_LATCH_BG_CHECKSUM, PR_0_GDT_CSUM_LATCH, 0 },
+	{ PR_LATCH_OPTIMIZE_EXT, PR_1E_OPTIMIZE_EXT_HEADER, PR_1E_OPTIMIZE_EXT_END },
 	{ -1, 0, 0 },
 };
 
diff --git a/e2fsck/problem.h b/e2fsck/problem.h
index 19b2301..ace8d2f 100644
--- a/e2fsck/problem.h
+++ b/e2fsck/problem.h
@@ -40,6 +40,7 @@ struct problem_context {
 #define PR_LATCH_TOOBIG	0x0080	/* Latch for file to big errors */
 #define PR_LATCH_OPTIMIZE_DIR 0x0090 /* Latch for optimize directories */
 #define PR_LATCH_BG_CHECKSUM 0x00A0  /* Latch for block group checksums */
+#define PR_LATCH_OPTIMIZE_EXT 0x00B0  /* Latch for rebuild extents */
 
 #define PR_LATCH(x)	((((x) & PR_LATCH_MASK) >> 4) - 1)
 
@@ -644,6 +645,9 @@ struct problem_context {
 /* Error allocating memory for encrypted directory list */
 #define PR_1_ALLOCATE_ENCRYPTED_DIRLIST		0x01007E
 
+/* extent tree max depth too big */
+#define PR_1_EXTENT_BAD_MAX_DEPTH		0x01007F
+
 /*
  * Pass 1b errors
  */
@@ -707,6 +711,33 @@ struct problem_context {
 #define PR_1D_CLONE_ERROR	0x013008
 
 /*
+ * Pass 1e --- rebuilding extent trees
+ */
+/* Pass 1e: Rebuilding extent trees */
+#define PR_1E_PASS_HEADER		0x014000
+
+/* Error rehash directory */
+#define PR_1E_OPTIMIZE_EXT_ERR		0x014001
+
+/* Rebuilding extent trees */
+#define PR_1E_OPTIMIZE_EXT_HEADER	0x014002
+
+/* Rebuilding extent tree %d */
+#define PR_1E_OPTIMIZE_EXT		0x014003
+
+/* Rebuilding extent tree end */
+#define PR_1E_OPTIMIZE_EXT_END		0x014004
+
+/* Internal error: extent tree depth too large */
+#define PR_1E_MAX_EXTENT_TREE_DEPTH	0x014005
+
+/* Inode extent tree could be shorter */
+#define PR_1E_CAN_COLLAPSE_EXTENT_TREE	0x014006
+
+/* Inode extent tree could be narrower */
+#define PR_1E_CAN_NARROW_EXTENT_TREE	0x014007
+
+/*
  * Pass 2 errors
  */
 
@@ -1035,6 +1066,8 @@ struct problem_context {
 /* Rehashing dir end */
 #define PR_3A_OPTIMIZE_DIR_END		0x031005
 
+/* Pass 3B is really just 1E */
+
 /*
  * Pass 4 errors
  */
diff --git a/e2fsck/rehash.c b/e2fsck/rehash.c
index 1d720dc..c9da5d4 100644
--- a/e2fsck/rehash.c
+++ b/e2fsck/rehash.c
@@ -766,11 +766,11 @@ static int write_dir_block(ext2_filsys fs,
 
 static errcode_t write_directory(e2fsck_t ctx, ext2_filsys fs,
 				 struct out_dir *outdir,
-				 ext2_ino_t ino, int compress)
+				 ext2_ino_t ino, struct ext2_inode *inode,
+				 int compress)
 {
 	struct write_dir_struct wd;
 	errcode_t	retval;
-	struct ext2_inode 	inode;
 
 	retval = e2fsck_expand_directory(ctx, ino, -1, outdir->num);
 	if (retval)
@@ -789,22 +789,23 @@ static errcode_t write_directory(e2fsck_t ctx, ext2_filsys fs,
 	if (wd.err)
 		return wd.err;
 
-	e2fsck_read_inode(ctx, ino, &inode, "rehash_dir");
+	e2fsck_read_inode(ctx, ino, inode, "rehash_dir");
 	if (compress)
-		inode.i_flags &= ~EXT2_INDEX_FL;
+		inode->i_flags &= ~EXT2_INDEX_FL;
 	else
-		inode.i_flags |= EXT2_INDEX_FL;
-	retval = ext2fs_inode_size_set(fs, &inode,
+		inode->i_flags |= EXT2_INDEX_FL;
+	retval = ext2fs_inode_size_set(fs, inode,
 				       outdir->num * fs->blocksize);
 	if (retval)
 		return retval;
-	ext2fs_iblk_sub_blocks(fs, &inode, wd.cleared);
-	e2fsck_write_inode(ctx, ino, &inode, "rehash_dir");
+	ext2fs_iblk_sub_blocks(fs, inode, wd.cleared);
+	e2fsck_write_inode(ctx, ino, inode, "rehash_dir");
 
 	return 0;
 }
 
-errcode_t e2fsck_rehash_dir(e2fsck_t ctx, ext2_ino_t ino)
+errcode_t e2fsck_rehash_dir(e2fsck_t ctx, ext2_ino_t ino,
+			    struct problem_context *pctx)
 {
 	ext2_filsys 		fs = ctx->fs;
 	errcode_t		retval;
@@ -915,10 +916,14 @@ resort:
 			goto errout;
 	}
 
-	retval = write_directory(ctx, fs, &outdir, ino, fd.compress);
+	retval = write_directory(ctx, fs, &outdir, ino, &inode, fd.compress);
 	if (retval)
 		goto errout;
 
+	if (ctx->options & E2F_OPT_CONVERT_BMAP)
+		retval = e2fsck_rebuild_extents_later(ctx, ino);
+	else
+		retval = e2fsck_check_rebuild_extents(ctx, ino, &inode, pctx);
 errout:
 	free(dir_buf);
 	free(fd.harray);
@@ -962,7 +967,7 @@ void e2fsck_rehash_directories(e2fsck_t ctx)
 #if 0
 		fix_problem(ctx, PR_3A_OPTIMIZE_DIR, &pctx);
 #endif
-		pctx.errcode = e2fsck_rehash_dir(ctx, ino);
+		pctx.errcode = e2fsck_rehash_dir(ctx, ino, &pctx);
 		if (pctx.errcode) {
 			end_problem_latch(ctx, PR_LATCH_OPTIMIZE_DIR);
 			fix_problem(ctx, PR_3A_OPTIMIZE_DIR_ERR, &pctx);
diff --git a/e2fsck/super.c b/e2fsck/super.c
index 1e7e749..e64262a 100644
--- a/e2fsck/super.c
+++ b/e2fsck/super.c
@@ -606,6 +606,13 @@ void check_super_block(e2fsck_t ctx)
 		ext2fs_mark_super_dirty(fs);
 	}
 
+	/* Did user ask us to convert files to extents? */
+	if (ctx->options & E2F_OPT_CONVERT_BMAP) {
+		fs->super->s_feature_incompat |=
+			EXT3_FEATURE_INCOMPAT_EXTENTS;
+		ext2fs_mark_super_dirty(fs);
+	}
+
 	if ((fs->super->s_feature_incompat & EXT2_FEATURE_INCOMPAT_META_BG) &&
 	    (fs->super->s_first_meta_bg > fs->desc_blocks)) {
 		pctx.group = fs->desc_blocks;
diff --git a/e2fsck/unix.c b/e2fsck/unix.c
index f45a903..f8d088e 100644
--- a/e2fsck/unix.c
+++ b/e2fsck/unix.c
@@ -709,6 +709,12 @@ static void parse_extended_opts(e2fsck_t ctx, const char *opts)
 			else
 				ctx->log_fn = string_copy(ctx, arg, 0);
 			continue;
+		} else if (strcmp(token, "bmap2extent") == 0) {
+			ctx->options |= E2F_OPT_CONVERT_BMAP;
+			continue;
+		} else if (strcmp(token, "fixes_only") == 0) {
+			ctx->options |= E2F_OPT_FIXES_ONLY;
+			continue;
 		} else {
 			fprintf(stderr, _("Unknown extended option: %s\n"),
 				token);
@@ -728,6 +734,7 @@ static void parse_extended_opts(e2fsck_t ctx, const char *opts)
 		fputs(("\tdiscard\n"), stderr);
 		fputs(("\tnodiscard\n"), stderr);
 		fputs(("\treadahead_kb=<buffer size>\n"), stderr);
+		fputs(("\tbmap2extent\n"), stderr);
 		fputc('\n', stderr);
 		exit(1);
 	}
@@ -961,6 +968,22 @@ static errcode_t PRS(int argc, char *argv[], e2fsck_t *ret_ctx)
 	if (extended_opts)
 		parse_extended_opts(ctx, extended_opts);
 
+	/* Complain about mutually exclusive rebuilding activities */
+	if (getenv("E2FSCK_FIXES_ONLY"))
+		ctx->options |= E2F_OPT_FIXES_ONLY;
+	if ((ctx->options & E2F_OPT_COMPRESS_DIRS) &&
+	    (ctx->options & E2F_OPT_FIXES_ONLY)) {
+		com_err(ctx->program_name, 0, "%s",
+			_("The -D and -E fixes_only options are incompatible."));
+		fatal_error(ctx, 0);
+	}
+	if ((ctx->options & E2F_OPT_CONVERT_BMAP) &&
+	    (ctx->options & E2F_OPT_FIXES_ONLY)) {
+		com_err(ctx->program_name, 0, "%s",
+			_("The -E bmap2extent and fixes_only options are incompatible."));
+		fatal_error(ctx, 0);
+	}
+
 	if ((cp = getenv("E2FSCK_CONFIG")) != NULL)
 		config_fn[0] = cp;
 	profile_set_syntax_err_cb(syntax_err_report);
diff --git a/tests/f_extent_bad_node/expect.1 b/tests/f_extent_bad_node/expect.1
index 0c0bc28..c9643a1 100644
--- a/tests/f_extent_bad_node/expect.1
+++ b/tests/f_extent_bad_node/expect.1
@@ -2,8 +2,13 @@ Pass 1: Checking inodes, blocks, and sizes
 Inode 12 has an invalid extent node (blk 22, lblk 0)
 Clear? yes
 
+Inode 12 extent tree could be shorter.
+	(level 1 is unnecessary)
+Fix? yes
+
 Inode 12, i_blocks is 16, should be 8.  Fix? yes
 
+Pass 1E: Optimizing extent trees
 Pass 2: Checking directory structure
 Pass 3: Checking directory connectivity
 Pass 4: Checking reference counts
@@ -11,13 +16,13 @@ Pass 5: Checking group summary information
 Block bitmap differences:  -(21--23) -25
 Fix? yes
 
-Free blocks count wrong for group #0 (71, counted=75).
+Free blocks count wrong for group #0 (73, counted=77).
 Fix? yes
 
-Free blocks count wrong (71, counted=75).
+Free blocks count wrong (73, counted=77).
 Fix? yes
 
 
 test_filesys: ***** FILE SYSTEM WAS MODIFIED *****
-test_filesys: 12/16 files (0.0% non-contiguous), 25/100 blocks
+test_filesys: 12/16 files (0.0% non-contiguous), 23/100 blocks
 Exit status is 1
diff --git a/tests/f_extent_bad_node/expect.2 b/tests/f_extent_bad_node/expect.2
index 568c792..b78b193 100644
--- a/tests/f_extent_bad_node/expect.2
+++ b/tests/f_extent_bad_node/expect.2
@@ -3,5 +3,5 @@ Pass 2: Checking directory structure
 Pass 3: Checking directory connectivity
 Pass 4: Checking reference counts
 Pass 5: Checking group summary information
-test_filesys: 12/16 files (0.0% non-contiguous), 25/100 blocks
+test_filesys: 12/16 files (0.0% non-contiguous), 23/100 blocks
 Exit status is 0
diff --git a/tests/f_extent_int_bad_magic/expect.1 b/tests/f_extent_int_bad_magic/expect.1
index 0e82e2b..3529636 100644
--- a/tests/f_extent_int_bad_magic/expect.1
+++ b/tests/f_extent_int_bad_magic/expect.1
@@ -2,8 +2,13 @@ Pass 1: Checking inodes, blocks, and sizes
 Inode 12 has an invalid extent node (blk 1295, lblk 0)
 Clear? yes
 
+Inode 12 extent tree could be shorter.
+	(level 1 is unnecessary)
+Fix? yes
+
 Inode 12, i_blocks is 712, should be 0.  Fix? yes
 
+Pass 1E: Optimizing extent trees
 Pass 2: Checking directory structure
 Pass 3: Checking directory connectivity
 Pass 4: Checking reference counts
diff --git a/tests/f_extent_leaf_bad_magic/expect.1 b/tests/f_extent_leaf_bad_magic/expect.1
index 7b6dbf1..ae27ecc 100644
--- a/tests/f_extent_leaf_bad_magic/expect.1
+++ b/tests/f_extent_leaf_bad_magic/expect.1
@@ -2,8 +2,13 @@ Pass 1: Checking inodes, blocks, and sizes
 Inode 12 has an invalid extent node (blk 1604, lblk 0)
 Clear? yes
 
+Inode 12 extent tree could be shorter.
+	(level 1 is unnecessary)
+Fix? yes
+
 Inode 12, i_blocks is 18, should be 0.  Fix? yes
 
+Pass 1E: Optimizing extent trees
 Pass 2: Checking directory structure
 Pass 3: Checking directory connectivity
 Pass 4: Checking reference counts
diff --git a/tests/f_extent_oobounds/expect.1 b/tests/f_extent_oobounds/expect.1
index 3164ea0..f0e282e 100644
--- a/tests/f_extent_oobounds/expect.1
+++ b/tests/f_extent_oobounds/expect.1
@@ -3,8 +3,13 @@ Inode 12, end of extent exceeds allowed value
 	(logical block 15, physical block 200, len 30)
 Clear? yes
 
+Inode 12 extent tree could be narrower.
+	(level 1 has unnecessary nodes)
+Fix? yes
+
 Inode 12, i_blocks is 154, should be 94.  Fix? yes
 
+Pass 1E: Optimizing extent trees
 Pass 2: Checking directory structure
 Pass 3: Checking directory connectivity
 Pass 4: Checking reference counts
@@ -12,13 +17,13 @@ Pass 5: Checking group summary information
 Block bitmap differences:  -(200--229)
 Fix? yes
 
-Free blocks count wrong for group #0 (156, counted=186).
+Free blocks count wrong for group #0 (158, counted=188).
 Fix? yes
 
-Free blocks count wrong (156, counted=186).
+Free blocks count wrong (158, counted=188).
 Fix? yes
 
 
 test_filesys: ***** FILE SYSTEM WAS MODIFIED *****
-test_filesys: 12/32 files (8.3% non-contiguous), 70/256 blocks
+test_filesys: 12/32 files (8.3% non-contiguous), 68/256 blocks
 Exit status is 1
diff --git a/tests/f_extent_oobounds/expect.2 b/tests/f_extent_oobounds/expect.2
index 22c4f2c..0729283 100644
--- a/tests/f_extent_oobounds/expect.2
+++ b/tests/f_extent_oobounds/expect.2
@@ -3,5 +3,5 @@ Pass 2: Checking directory structure
 Pass 3: Checking directory connectivity
 Pass 4: Checking reference counts
 Pass 5: Checking group summary information
-test_filesys: 12/32 files (8.3% non-contiguous), 70/256 blocks
+test_filesys: 12/32 files (8.3% non-contiguous), 68/256 blocks
 Exit status is 0
diff --git a/tests/f_extents/expect.1 b/tests/f_extents/expect.1
index aeebc7b..2751eb9 100644
--- a/tests/f_extents/expect.1
+++ b/tests/f_extents/expect.1
@@ -6,6 +6,10 @@ Inode 12 has an invalid extent
 	(logical block 0, invalid physical block 21994527527949, len 17)
 Clear? yes
 
+Inode 12 extent tree could be shorter.
+	(level 1 is unnecessary)
+Fix? yes
+
 Inode 12, i_blocks is 34, should be 0.  Fix? yes
 
 Inode 13 missing EXTENT_FL, but is in extents format
@@ -21,6 +25,10 @@ Inode 17 has an invalid extent
 	(logical block 0, invalid physical block 22011707397135, len 15)
 Clear? yes
 
+Inode 17 extent tree could be shorter.
+	(level 1 is unnecessary)
+Fix? yes
+
 Inode 17, i_blocks is 32, should be 0.  Fix? yes
 
 Error while reading over extent tree in inode 18: Corrupt extent header
@@ -31,6 +39,7 @@ Inode 18, i_blocks is 2, should be 0.  Fix? yes
 Special (device/socket/fifo) file (inode 19) has extents
 or inline-data flag set.  Clear? yes
 
+Pass 1E: Optimizing extent trees
 Pass 2: Checking directory structure
 Entry 'fbad-flag' in / (2) has deleted/unused inode 18.  Clear? yes
 



* [PATCH 07/35] e2fsck: convert block-mapped files to extents on bigalloc fs
  2015-04-02  2:34 [PATCH 00/35] e2fsprogs April 2015 patchbomb Darrick J. Wong
                   ` (5 preceding siblings ...)
  2015-04-02  2:34 ` [PATCH 06/35] e2fsck: rebuild sparse extent trees/convert non-extent ext3 files Darrick J. Wong
@ 2015-04-02  2:34 ` Darrick J. Wong
  2015-04-21 14:36   ` Theodore Ts'o
  2015-04-02  2:34 ` [PATCH 08/35] tests: verify proper rebuilding of sparse extent trees and block map file conversion Darrick J. Wong
                   ` (26 subsequent siblings)
  33 siblings, 1 reply; 70+ messages in thread
From: Darrick J. Wong @ 2015-04-02  2:34 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

As of v4.0, the Linux kernel won't add blocks to a block-mapped file
on a bigalloc filesystem.  Therefore, convert any such files or
directories we find, to prevent fs errors later on.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 e2fsck/pass1.c            |   17 +++++++++++++++++
 e2fsck/problem.c          |    5 +++++
 e2fsck/problem.h          |    3 +++
 tests/f_badcluster/expect |   26 +++++++++++++-------------
 4 files changed, 38 insertions(+), 13 deletions(-)


diff --git a/e2fsck/pass1.c b/e2fsck/pass1.c
index 524314f..308a95a 100644
--- a/e2fsck/pass1.c
+++ b/e2fsck/pass1.c
@@ -3203,6 +3203,23 @@ static void check_blocks(e2fsck_t ctx, struct problem_context *pctx,
 		pctx->num = 0;
 	}
 
+	/*
+	 * The kernel gets mad if we ask it to allocate bigalloc clusters to
+	 * a block mapped file, so rebuild it as an extent file.  We can skip
+	 * symlinks because they're never rewritten.
+	 */
+	if (EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
+			EXT4_FEATURE_RO_COMPAT_BIGALLOC) &&
+	    (LINUX_S_ISREG(inode->i_mode) || LINUX_S_ISDIR(inode->i_mode)) &&
+	    ext2fs_inode_data_blocks2(fs, inode) > 0 &&
+	    (ino == EXT2_ROOT_INO || ino >= EXT2_FIRST_INO(fs->super)) &&
+	    !(inode->i_flags & (EXT4_EXTENTS_FL | EXT4_INLINE_DATA_FL)) &&
+	    fix_problem(ctx, PR_1_NO_BIGALLOC_BLOCKMAP_FILES, pctx)) {
+		pctx->errcode = e2fsck_rebuild_extents_later(ctx, ino);
+		if (pctx->errcode)
+			goto out;
+	}
+
 	if (ctx->dirs_to_hash && pb.is_dir &&
 	    !(ctx->lost_and_found && ctx->lost_and_found == ino) &&
 	    !(inode->i_flags & EXT2_INDEX_FL) &&
diff --git a/e2fsck/problem.c b/e2fsck/problem.c
index 7af0b76..df919da 100644
--- a/e2fsck/problem.c
+++ b/e2fsck/problem.c
@@ -1111,6 +1111,11 @@ static struct e2fsck_problem problem_table[] = {
 	  N_("@i %i @x tree could be more shallow (%b; could be <= %c)\n"),
 	  PROMPT_FIX, PR_NO_OK | PR_PREEN_NO | PR_PREEN_OK },
 
+	/* bigalloc fs cannot have blockmap files */
+	{ PR_1_NO_BIGALLOC_BLOCKMAP_FILES,
+	  N_("@i %i on bigalloc @f cannot be @b mapped.  "),
+	  PROMPT_FIX, 0 },
+
 	/* Pass 1b errors */
 
 	/* Pass 1B: Rescan for duplicate/bad blocks */
diff --git a/e2fsck/problem.h b/e2fsck/problem.h
index ace8d2f..dc0c18a 100644
--- a/e2fsck/problem.h
+++ b/e2fsck/problem.h
@@ -648,6 +648,9 @@ struct problem_context {
 /* extent tree max depth too big */
 #define PR_1_EXTENT_BAD_MAX_DEPTH		0x01007F
 
+/* bigalloc fs cannot have blockmap files */
+#define PR_1_NO_BIGALLOC_BLOCKMAP_FILES		0x010080
+
 /*
  * Pass 1b errors
  */
diff --git a/tests/f_badcluster/expect b/tests/f_badcluster/expect
index b8ce19d..65a1641 100644
--- a/tests/f_badcluster/expect
+++ b/tests/f_badcluster/expect
@@ -19,6 +19,8 @@ Inode 18 logical block 3 (physical block 1201) violates cluster allocation rules
 Will fix in pass 1B.
 Inode 18, i_blocks is 32, should be 64.  Fix? yes
 
+Inode 15 on bigalloc filesystem cannot be block mapped.  Fix? yes
+
 
 Running additional passes to resolve blocks claimed by more than one inode...
 Pass 1B: Rescanning for multiply-claimed blocks
@@ -65,19 +67,20 @@ File /g (inode #18, mod time Tue Jun 17 08:00:50 2014)
   has 1 multiply-claimed block(s), shared with 0 file(s):
 Clone multiply-claimed blocks? yes
 
+Pass 1E: Optimizing extent trees
 Pass 2: Checking directory structure
 Pass 3: Checking directory connectivity
 Pass 4: Checking reference counts
 Pass 5: Checking group summary information
-Free blocks count wrong for group #0 (50, counted=47).
+Free blocks count wrong for group #0 (51, counted=48).
 Fix? yes
 
-Free blocks count wrong (800, counted=752).
+Free blocks count wrong (816, counted=768).
 Fix? yes
 
 
 test_fs: ***** FILE SYSTEM WAS MODIFIED *****
-test_fs: 18/128 files (22.2% non-contiguous), 1296/2048 blocks
+test_fs: 18/128 files (22.2% non-contiguous), 1280/2048 blocks
 Pass 1: Checking inodes, blocks, and sizes
 Inode 12, i_blocks is 64, should be 32.  Fix? yes
 
@@ -94,21 +97,21 @@ Pass 5: Checking group summary information
 Block bitmap differences:  -(1168--1200)
 Fix? yes
 
-Free blocks count wrong for group #0 (47, counted=50).
+Free blocks count wrong for group #0 (48, counted=51).
 Fix? yes
 
-Free blocks count wrong (752, counted=800).
+Free blocks count wrong (768, counted=816).
 Fix? yes
 
 
 test_fs: ***** FILE SYSTEM WAS MODIFIED *****
-test_fs: 18/128 files (5.6% non-contiguous), 1248/2048 blocks
+test_fs: 18/128 files (5.6% non-contiguous), 1232/2048 blocks
 Pass 1: Checking inodes, blocks, and sizes
 Pass 2: Checking directory structure
 Pass 3: Checking directory connectivity
 Pass 4: Checking reference counts
 Pass 5: Checking group summary information
-test_fs: 18/128 files (5.6% non-contiguous), 1248/2048 blocks
+test_fs: 18/128 files (5.6% non-contiguous), 1232/2048 blocks
 debugfs: stat /a
 Inode: 12   Type: regular    Mode:  0644   Flags: 0x80000
 Generation: 1117152157    Version: 0x00000001
@@ -146,19 +149,16 @@ mtime: 0x539ff5b2 -- Tue Jun 17 08:00:50 2014
 EXTENTS:
 (0-1):1216-1217, (2):1218
 debugfs: stat /d
-Inode: 15   Type: regular    Mode:  0644   Flags: 0x0
+Inode: 15   Type: regular    Mode:  0644   Flags: 0x80000
 Generation: 1117152160    Version: 0x00000001
 User:     0   Group:     0   Size: 3072
 File ACL: 0    Directory ACL: 0
-Links: 1   Blockcount: 32
+Links: 1   Blockcount: 0
 Fragment:  Address: 0    Number: 0    Size: 0
 ctime: 0x539ff5b2 -- Tue Jun 17 08:00:50 2014
 atime: 0x539ff5b2 -- Tue Jun 17 08:00:50 2014
 mtime: 0x539ff5b2 -- Tue Jun 17 08:00:50 2014
-BLOCKS:
-(TIND):1650
-TOTAL: 1


* [PATCH 08/35] tests: verify proper rebuilding of sparse extent trees and block map file conversion
  2015-04-02  2:34 [PATCH 00/35] e2fsprogs April 2015 patchbomb Darrick J. Wong
                   ` (6 preceding siblings ...)
  2015-04-02  2:34 ` [PATCH 07/35] e2fsck: convert block-mapped files to extents on bigalloc fs Darrick J. Wong
@ 2015-04-02  2:34 ` Darrick J. Wong
  2015-04-21 14:47   ` Theodore Ts'o
  2015-04-02  2:35 ` [PATCH 09/35] e2fsck: abort on read error beyond end of FS Darrick J. Wong
                   ` (25 subsequent siblings)
  33 siblings, 1 reply; 70+ messages in thread
From: Darrick J. Wong @ 2015-04-02  2:34 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 tests/f_collapse_extent_tree/expect.1       |   18 ++++
 tests/f_collapse_extent_tree/expect.2       |   10 ++
 tests/f_collapse_extent_tree/image.gz       |  Bin
 tests/f_collapse_extent_tree/name           |    1 
 tests/f_collapse_extent_tree/script         |  118 +++++++++++++++++++++++++++
 tests/f_compress_extent_tree_level/expect.1 |   25 ++++++
 tests/f_compress_extent_tree_level/expect.2 |   17 ++++
 tests/f_compress_extent_tree_level/image.gz |  Bin
 tests/f_compress_extent_tree_level/name     |    1 
 tests/f_compress_extent_tree_level/script   |  118 +++++++++++++++++++++++++++
 tests/f_convert_bmap/expect.1               |   26 ++++++
 tests/f_convert_bmap/expect.2               |   10 ++
 tests/f_convert_bmap/image.gz               |  Bin
 tests/f_convert_bmap/name                   |    1 
 tests/f_convert_bmap/script                 |  117 +++++++++++++++++++++++++++
 tests/f_convert_bmap_and_extent/expect.1    |   33 +++++++
 tests/f_convert_bmap_and_extent/expect.2    |   16 ++++
 tests/f_convert_bmap_and_extent/image.gz    |  Bin
 tests/f_convert_bmap_and_extent/name        |    1 
 tests/f_convert_bmap_and_extent/script      |  119 +++++++++++++++++++++++++++
 tests/f_extent_too_deep/expect.1            |   23 +++++
 tests/f_extent_too_deep/expect.2            |   10 ++
 tests/f_extent_too_deep/image.gz            |  Bin
 tests/f_extent_too_deep/name                |    1 
 tests/f_extent_too_deep/script              |  118 +++++++++++++++++++++++++++
 tests/f_opt_extent/expect                   |   55 ++++++++++++
 tests/f_opt_extent/name                     |    1 
 tests/f_opt_extent/script                   |   64 +++++++++++++++
 tests/f_opt_extent_ext3/expect              |   44 ++++++++++
 tests/f_opt_extent_ext3/name                |    1 
 tests/f_opt_extent_ext3/script              |   65 +++++++++++++++
 31 files changed, 1013 insertions(+)
 create mode 100644 tests/f_collapse_extent_tree/expect.1
 create mode 100644 tests/f_collapse_extent_tree/expect.2
 create mode 100644 tests/f_collapse_extent_tree/image.gz
 create mode 100644 tests/f_collapse_extent_tree/name
 create mode 100644 tests/f_collapse_extent_tree/script
 create mode 100644 tests/f_compress_extent_tree_level/expect.1
 create mode 100644 tests/f_compress_extent_tree_level/expect.2
 create mode 100644 tests/f_compress_extent_tree_level/image.gz
 create mode 100644 tests/f_compress_extent_tree_level/name
 create mode 100644 tests/f_compress_extent_tree_level/script
 create mode 100644 tests/f_convert_bmap/expect.1
 create mode 100644 tests/f_convert_bmap/expect.2
 create mode 100644 tests/f_convert_bmap/image.gz
 create mode 100644 tests/f_convert_bmap/name
 create mode 100644 tests/f_convert_bmap/script
 create mode 100644 tests/f_convert_bmap_and_extent/expect.1
 create mode 100644 tests/f_convert_bmap_and_extent/expect.2
 create mode 100644 tests/f_convert_bmap_and_extent/image.gz
 create mode 100644 tests/f_convert_bmap_and_extent/name
 create mode 100644 tests/f_convert_bmap_and_extent/script
 create mode 100644 tests/f_extent_too_deep/expect.1
 create mode 100644 tests/f_extent_too_deep/expect.2
 create mode 100644 tests/f_extent_too_deep/image.gz
 create mode 100644 tests/f_extent_too_deep/name
 create mode 100644 tests/f_extent_too_deep/script
 create mode 100644 tests/f_opt_extent/expect
 create mode 100644 tests/f_opt_extent/name
 create mode 100644 tests/f_opt_extent/script
 create mode 100644 tests/f_opt_extent_ext3/expect
 create mode 100644 tests/f_opt_extent_ext3/name
 create mode 100644 tests/f_opt_extent_ext3/script


diff --git a/tests/f_collapse_extent_tree/expect.1 b/tests/f_collapse_extent_tree/expect.1
new file mode 100644
index 0000000..d76880c
--- /dev/null
+++ b/tests/f_collapse_extent_tree/expect.1
@@ -0,0 +1,18 @@
+debugfs: ex /a
+Level Entries       Logical      Physical Length Flags
+ 0/ 1   1/  1     0 -     0     9              1
+ 1/ 1   1/  1     0 -     0    10 -    10      1 
+Pass 1: Checking inodes, blocks, and sizes
+Inode 12 extent tree could be shorter.
+	(level 1 is unnecessary)
+Fix? yes
+
+Pass 1E: Optimizing extent trees
+Pass 2: Checking directory structure
+Pass 3: Checking directory connectivity
+Pass 4: Checking reference counts
+Pass 5: Checking group summary information
+
+test_filesys: ***** FILE SYSTEM WAS MODIFIED *****
+test_filesys: 12/128 files (0.0% non-contiguous), 19/512 blocks
+Exit status is 1
diff --git a/tests/f_collapse_extent_tree/expect.2 b/tests/f_collapse_extent_tree/expect.2
new file mode 100644
index 0000000..a1d28b1
--- /dev/null
+++ b/tests/f_collapse_extent_tree/expect.2
@@ -0,0 +1,10 @@
+Pass 1: Checking inodes, blocks, and sizes
+Pass 2: Checking directory structure
+Pass 3: Checking directory connectivity
+Pass 4: Checking reference counts
+Pass 5: Checking group summary information
+test_filesys: 12/128 files (0.0% non-contiguous), 19/512 blocks
+Exit status is 0
+debugfs: ex /a
+Level Entries       Logical      Physical Length Flags
+ 0/ 0   1/  1     0 -     0    10 -    10      1 
diff --git a/tests/f_collapse_extent_tree/image.gz b/tests/f_collapse_extent_tree/image.gz
new file mode 100644
index 0000000000000000000000000000000000000000..97036cc597c5d2bcf6da6ffa68039b743b46aa20
GIT binary patch
literal 2537
zcmb2|=3r2~7aqdI{Pxc7?BGBdh6lxY##=IfImEQ^-3nbX`9VZ0->#??lNCe@FW4FQ
z#YVL}TII*Z*HrHjAfO&wvUze}my@ja$2)hVrIT2)v*+IzulqiySl;uUviknXEiDW(
zp{IYUa6fd>vO6yD>gk^~J=eHSyKdZJX8c#hmFMiUr|S)uTs`cRwf(*Jw5JI-Ev9#g
zzx10H^z+x-+1&PezM<j%wN-x#UY^?f?s?&}Z!v%C-)BF5eDl?n_HOO-adCV1e|+^f
zi)Zq}X`-jCz9%p5SsMMuaQ~#F154Q%4#=Ijd_P(9SoF_&+pXIBr(Hc<aafRvfuW&!
zy=4ribnWdMatsU%H@<&*|36zXO?crCT?gHin@3zeO5c0DaMs;f`-AOklmETD^R*_D
z<Gowbo@dFzK=l>@Y`c&Bzwh4l(H_VvIMHLp0;CiE@B>MS|I9!V1gcl=k~>^-C+GRq
zERBES2J6qaJ*xkI`|HBKUj^>6K=~WTAJ?~oRpKF>ikGbB1<K1<*)cNWQAa>3L0jxv
z+Ope)Q@;H^{nJEJ_gnqOo+Qus7x`2FDlW?`wQch{Ubb=Tv2VRgcl<QbdwS81f1O)`
zxz73}p2?fs|I45L_x?U_+1AJMzxKcTu9(hR=B|9cUak6ReWXR?Gy8?^|AIA>)j!SO
zTl3dGUh~(|qxV1UpRv3(_UHXHezOY~k3TI=558jc%31d7spkCte`^0vE7$)i)8ioE
m=uzp>5Eu=C(GVC7fzc2c4FPgP;J~%NEDm#Conv57U;qF<XAwLA

literal 0
HcmV?d00001

diff --git a/tests/f_collapse_extent_tree/name b/tests/f_collapse_extent_tree/name
new file mode 100644
index 0000000..83e506f
--- /dev/null
+++ b/tests/f_collapse_extent_tree/name
@@ -0,0 +1 @@
+extent tree can be collapsed one level
diff --git a/tests/f_collapse_extent_tree/script b/tests/f_collapse_extent_tree/script
new file mode 100644
index 0000000..ee18438
--- /dev/null
+++ b/tests/f_collapse_extent_tree/script
@@ -0,0 +1,118 @@
+if [ "$DESCRIPTION"x != x ]; then
+	test_description="$DESCRIPTION"
+fi
+if [ "$IMAGE"x = x ]; then
+	IMAGE=$test_dir/image.gz
+fi
+
+if [ "$FSCK_OPT"x = x ]; then
+	FSCK_OPT=-yf
+fi
+
+if [ "$SECOND_FSCK_OPT"x = x ]; then
+	SECOND_FSCK_OPT=-yf
+fi
+
+if [ "$OUT1"x = x ]; then
+	OUT1=$test_name.1.log
+fi
+
+if [ "$OUT2"x = x ]; then
+	OUT2=$test_name.2.log
+fi
+
+if [ "$EXP1"x = x ]; then
+	if [ -f $test_dir/expect.1.gz ]; then
+		EXP1=$test_name.1.tmp
+		gunzip < $test_dir/expect.1.gz > $EXP1
+	else
+		EXP1=$test_dir/expect.1
+	fi
+fi
+
+if [ "$EXP2"x = x ]; then
+	if [ -f $test_dir/expect.2.gz ]; then
+		EXP2=$test_name.2.tmp
+		gunzip < $test_dir/expect.2.gz > $EXP2
+	else
+		EXP2=$test_dir/expect.2
+	fi
+fi
+
+if [ "$SKIP_GUNZIP" != "true" ] ; then
+	gunzip < $IMAGE > $TMPFILE
+fi
+
+cp /dev/null $OUT1
+
+eval $PREP_CMD
+
+echo 'ex /a' > $TMPFILE.cmd
+$DEBUGFS -f $TMPFILE.cmd $TMPFILE > $OUT1.new 2>&1
+rm -rf $TMPFILE.cmd
+$FSCK $FSCK_OPT  -N test_filesys $TMPFILE >> $OUT1.new 2>&1
+status=$?
+echo Exit status is $status >> $OUT1.new
+sed -f $cmd_dir/filter.sed -e "s;$TMPFILE;test.img;" $OUT1.new >> $OUT1
+rm -f $OUT1.new
+
+if [ "$ONE_PASS_ONLY" != "true" ]; then
+	$FSCK $SECOND_FSCK_OPT -N test_filesys $TMPFILE > $OUT2.new 2>&1 
+	status=$?
+	echo Exit status is $status >> $OUT2.new
+	echo 'ex /a' > $TMPFILE.cmd
+	$DEBUGFS -f $TMPFILE.cmd $TMPFILE >> $OUT2.new 2>&1
+	rm -rf $TMPFILE.cmd
+	sed -f $cmd_dir/filter.sed -e "s;$TMPFILE;test.img;" $OUT2.new > $OUT2
+	rm -f $OUT2.new
+fi
+
+eval $AFTER_CMD
+
+if [ "$SKIP_VERIFY" != "true" ] ; then
+	rm -f $test_name.ok $test_name.failed
+	cmp -s $OUT1 $EXP1
+	status1=$?
+	if [ "$ONE_PASS_ONLY" != "true" ]; then
+		cmp -s $OUT2 $EXP2
+		status2=$?
+	else
+		status2=0
+	fi
+	if [ "$PASS_ZERO" = "true" ]; then
+		cmp -s $test_name.0.log	$test_dir/expect.0
+		status3=$?
+	else
+		status3=0
+	fi
+
+	if [ -z "$test_description" ] ; then
+		description="$test_name"
+	else
+		description="$test_name: $test_description"
+	fi
+
+	if [ "$status1" -eq 0 -a "$status2" -eq 0 -a "$status3" -eq 0 ] ; then
+		echo "$description: ok"
+		touch $test_name.ok
+	else
+		echo "$description: failed"
+		rm -f $test_name.failed
+		if [ "$PASS_ZERO" = "true" ]; then
+			diff $DIFF_OPTS $test_dir/expect.0 \
+				$test_name.0.log >> $test_name.failed
+		fi
+		diff $DIFF_OPTS $EXP1 $OUT1 >> $test_name.failed
+		if [ "$ONE_PASS_ONLY" != "true" ]; then
+			diff $DIFF_OPTS $EXP2 $OUT2 >> $test_name.failed
+		fi
+	fi
+	rm -f tmp_expect
+fi
+
+if [ "$SKIP_CLEANUP" != "true" ] ; then
+	unset IMAGE FSCK_OPT SECOND_FSCK_OPT OUT1 OUT2 EXP1 EXP2 
+	unset SKIP_VERIFY SKIP_CLEANUP SKIP_GUNZIP ONE_PASS_ONLY PREP_CMD
+	unset DESCRIPTION SKIP_UNLINK AFTER_CMD PASS_ZERO
+fi
+
diff --git a/tests/f_compress_extent_tree_level/expect.1 b/tests/f_compress_extent_tree_level/expect.1
new file mode 100644
index 0000000..45dcb39
--- /dev/null
+++ b/tests/f_compress_extent_tree_level/expect.1
@@ -0,0 +1,25 @@
+debugfs: ex /a
+Level Entries       Logical      Physical Length Flags
+ 0/ 1   1/  2     0 -    16     9             17
+ 1/ 1   1/  4     0 -     0    10 -    10      1 
+ 1/ 1   2/  4    11 -    11   100 -   100      1 
+ 1/ 1   3/  4    13 -    13   101 -   101      1 
+ 1/ 1   4/  4    15 -    15   102 -   102      1 
+ 0/ 1   2/  2    17 -    21    12              5
+ 1/ 1   1/  3    17 -    17   103 -   103      1 
+ 1/ 1   2/  3    19 -    19   104 -   104      1 
+ 1/ 1   3/  3    21 -    21   105 -   105      1 
+Pass 1: Checking inodes, blocks, and sizes
+Inode 12 extent tree could be narrower.
+	(level 1 has unnecessary nodes)
+Fix? yes
+
+Pass 1E: Optimizing extent trees
+Pass 2: Checking directory structure
+Pass 3: Checking directory connectivity
+Pass 4: Checking reference counts
+Pass 5: Checking group summary information
+
+test_filesys: ***** FILE SYSTEM WAS MODIFIED *****
+test_filesys: 12/128 files (8.3% non-contiguous), 26/512 blocks
+Exit status is 1
diff --git a/tests/f_compress_extent_tree_level/expect.2 b/tests/f_compress_extent_tree_level/expect.2
new file mode 100644
index 0000000..07d1082
--- /dev/null
+++ b/tests/f_compress_extent_tree_level/expect.2
@@ -0,0 +1,17 @@
+Pass 1: Checking inodes, blocks, and sizes
+Pass 2: Checking directory structure
+Pass 3: Checking directory connectivity
+Pass 4: Checking reference counts
+Pass 5: Checking group summary information
+test_filesys: 12/128 files (8.3% non-contiguous), 26/512 blocks
+Exit status is 0
+debugfs: ex /a
+Level Entries       Logical      Physical Length Flags
+ 0/ 1   1/  1     0 -    21     9             22
+ 1/ 1   1/  7     0 -     0    10 -    10      1 
+ 1/ 1   2/  7    11 -    11   100 -   100      1 
+ 1/ 1   3/  7    13 -    13   101 -   101      1 
+ 1/ 1   4/  7    15 -    15   102 -   102      1 
+ 1/ 1   5/  7    17 -    17   103 -   103      1 
+ 1/ 1   6/  7    19 -    19   104 -   104      1 
+ 1/ 1   7/  7    21 -    21   105 -   105      1 
diff --git a/tests/f_compress_extent_tree_level/image.gz b/tests/f_compress_extent_tree_level/image.gz
new file mode 100644
index 0000000000000000000000000000000000000000..a552a586ce20e4fbc01d6f332ec800c5c5b6011f
GIT binary patch
literal 2581
zcmb2|=3rR!C_IFT`RyHR9}!0hh6gkEYCmX@`O&3zM1osdIA^Jlw(gQiEytLI<J9c9
z&L3%E^Y!W3JGZlUZ`ds<>4n?Hq`jLtg`|b&q_WGat95BG-FN>tx4!H<U)lS-AJ6lO
z6&ZG4&Dp8pWHDvdcEgsixcDa#cN4c|Jrj3xXS?6icVxq+qB_I2E!8u;UtfEt{cM|d
zx!<y9uDd4Byy90XBb_r>eqZ_jr)RIH>*?If{BP|Sv-8%k+sAkPd2;YL_u=Paa{ab8
z7R65wT$X5aDi6&4{msJ6yZ%?_>ox0_zxZ;6nIVCD%G3ArMBKw){||e0eSc1N>&EnR
z0t^fcIrZA@uO9}lJ$3)OSwS^W;KBazx_|d&lMDq`Wi#$_4&>npx&GlDckiQ5TVJnz
zJo$QFjI93l)(@*BAG_wy2C6@>?b21V|Nk$ndG;T~<GcJ?1V}ghv<H$2|M-C<2;?k1
z7G&FVWc^>G+?H3>jHhdCT>a0f2bmPx&0M7aJXrJFvl^?T|K`LORWSmc@PEdCYgULx
zh7E5`r)9;j`(JJgG7%fdIpei?Ey$ub7WWu|+Q|hE%FpQb^sPCXXZ5ap`Q=61t^WO#
zxjHTCMMSQ(lh5g6K|Ejc^Tfly|0w>exlI4=-q>Te9zVFr@rQf!_Dhd`Ua)#xf9CQ1
zW2IZaKK4Jh|J#3OInf{H^QM12f1&DYykOsxU+Qau|6eK2ar?V@>iYlgPk)$;9sa}r
zHJ&GZ|LXr;C41uCH=MikZRv}1mHu*%)xaV1=-A3zpVnWgPo1*Aeytb_x&9i}HyQ$?
fAut*OqaiRF0;3^7AOs$KuVp*dzh0JsL4g4PCSfgc

literal 0
HcmV?d00001

diff --git a/tests/f_compress_extent_tree_level/name b/tests/f_compress_extent_tree_level/name
new file mode 100644
index 0000000..fde4f4a
--- /dev/null
+++ b/tests/f_compress_extent_tree_level/name
@@ -0,0 +1 @@
+compress an extent tree level
diff --git a/tests/f_compress_extent_tree_level/script b/tests/f_compress_extent_tree_level/script
new file mode 100644
index 0000000..ee18438
--- /dev/null
+++ b/tests/f_compress_extent_tree_level/script
@@ -0,0 +1,118 @@
+if [ "$DESCRIPTION"x != x ]; then
+	test_description="$DESCRIPTION"
+fi
+if [ "$IMAGE"x = x ]; then
+	IMAGE=$test_dir/image.gz
+fi
+
+if [ "$FSCK_OPT"x = x ]; then
+	FSCK_OPT=-yf
+fi
+
+if [ "$SECOND_FSCK_OPT"x = x ]; then
+	SECOND_FSCK_OPT=-yf
+fi
+
+if [ "$OUT1"x = x ]; then
+	OUT1=$test_name.1.log
+fi
+
+if [ "$OUT2"x = x ]; then
+	OUT2=$test_name.2.log
+fi
+
+if [ "$EXP1"x = x ]; then
+	if [ -f $test_dir/expect.1.gz ]; then
+		EXP1=$test_name.1.tmp
+		gunzip < $test_dir/expect.1.gz > $EXP1
+	else
+		EXP1=$test_dir/expect.1
+	fi
+fi
+
+if [ "$EXP2"x = x ]; then
+	if [ -f $test_dir/expect.2.gz ]; then
+		EXP2=$test_name.2.tmp
+		gunzip < $test_dir/expect.2.gz > $EXP2
+	else
+		EXP2=$test_dir/expect.2
+	fi
+fi
+
+if [ "$SKIP_GUNZIP" != "true" ] ; then
+	gunzip < $IMAGE > $TMPFILE
+fi
+
+cp /dev/null $OUT1
+
+eval $PREP_CMD
+
+echo 'ex /a' > $TMPFILE.cmd
+$DEBUGFS -f $TMPFILE.cmd $TMPFILE > $OUT1.new 2>&1
+rm -rf $TMPFILE.cmd
+$FSCK $FSCK_OPT  -N test_filesys $TMPFILE >> $OUT1.new 2>&1
+status=$?
+echo Exit status is $status >> $OUT1.new
+sed -f $cmd_dir/filter.sed -e "s;$TMPFILE;test.img;" $OUT1.new >> $OUT1
+rm -f $OUT1.new
+
+if [ "$ONE_PASS_ONLY" != "true" ]; then
+	$FSCK $SECOND_FSCK_OPT -N test_filesys $TMPFILE > $OUT2.new 2>&1 
+	status=$?
+	echo Exit status is $status >> $OUT2.new
+	echo 'ex /a' > $TMPFILE.cmd
+	$DEBUGFS -f $TMPFILE.cmd $TMPFILE >> $OUT2.new 2>&1
+	rm -rf $TMPFILE.cmd
+	sed -f $cmd_dir/filter.sed -e "s;$TMPFILE;test.img;" $OUT2.new > $OUT2
+	rm -f $OUT2.new
+fi
+
+eval $AFTER_CMD
+
+if [ "$SKIP_VERIFY" != "true" ] ; then
+	rm -f $test_name.ok $test_name.failed
+	cmp -s $OUT1 $EXP1
+	status1=$?
+	if [ "$ONE_PASS_ONLY" != "true" ]; then
+		cmp -s $OUT2 $EXP2
+		status2=$?
+	else
+		status2=0
+	fi
+	if [ "$PASS_ZERO" = "true" ]; then
+		cmp -s $test_name.0.log	$test_dir/expect.0
+		status3=$?
+	else
+		status3=0
+	fi
+
+	if [ -z "$test_description" ] ; then
+		description="$test_name"
+	else
+		description="$test_name: $test_description"
+	fi
+
+	if [ "$status1" -eq 0 -a "$status2" -eq 0 -a "$status3" -eq 0 ] ; then
+		echo "$description: ok"
+		touch $test_name.ok
+	else
+		echo "$description: failed"
+		rm -f $test_name.failed
+		if [ "$PASS_ZERO" = "true" ]; then
+			diff $DIFF_OPTS $test_dir/expect.0 \
+				$test_name.0.log >> $test_name.failed
+		fi
+		diff $DIFF_OPTS $EXP1 $OUT1 >> $test_name.failed
+		if [ "$ONE_PASS_ONLY" != "true" ]; then
+			diff $DIFF_OPTS $EXP2 $OUT2 >> $test_name.failed
+		fi
+	fi
+	rm -f tmp_expect
+fi
+
+if [ "$SKIP_CLEANUP" != "true" ] ; then
+	unset IMAGE FSCK_OPT SECOND_FSCK_OPT OUT1 OUT2 EXP1 EXP2 
+	unset SKIP_VERIFY SKIP_CLEANUP SKIP_GUNZIP ONE_PASS_ONLY PREP_CMD
+	unset DESCRIPTION SKIP_UNLINK AFTER_CMD PASS_ZERO
+fi
+
diff --git a/tests/f_convert_bmap/expect.1 b/tests/f_convert_bmap/expect.1
new file mode 100644
index 0000000..7d2ca86
--- /dev/null
+++ b/tests/f_convert_bmap/expect.1
@@ -0,0 +1,26 @@
+debugfs: stat /a
+Inode: 12   Type: regular    Mode:  0644   Flags: 0x0
+Generation: 1573716129    Version: 0x00000000:00000001
+User:     0   Group:     0   Size: 524288
+File ACL: 0    Directory ACL: 0
+Links: 1   Blockcount: 1030
+Fragment:  Address: 0    Number: 0    Size: 0
+ ctime: 0x5457f87a:62ae2980 -- Mon Nov  3 21:49:46 2014
+ atime: 0x5457f87a:61ba0598 -- Mon Nov  3 21:49:46 2014
+ mtime: 0x5457f87a:62ae2980 -- Mon Nov  3 21:49:46 2014
+crtime: 0x5457f87a:61ba0598 -- Mon Nov  3 21:49:46 2014
+Size of extra inode fields: 28
+BLOCKS:
+(0-11):1025-1036, (IND):24, (12-267):1037-1292, (DIND):25, (IND):41, (268-511):1293-1536
+TOTAL: 515
+
+Pass 1: Checking inodes, blocks, and sizes
+Pass 1E: Optimizing extent trees
+Pass 2: Checking directory structure
+Pass 3: Checking directory connectivity
+Pass 4: Checking reference counts
+Pass 5: Checking group summary information
+
+test_filesys: ***** FILE SYSTEM WAS MODIFIED *****
+test_filesys: 12/128 files (8.3% non-contiguous), 570/2048 blocks
+Exit status is 1
diff --git a/tests/f_convert_bmap/expect.2 b/tests/f_convert_bmap/expect.2
new file mode 100644
index 0000000..632d411
--- /dev/null
+++ b/tests/f_convert_bmap/expect.2
@@ -0,0 +1,10 @@
+Pass 1: Checking inodes, blocks, and sizes
+Pass 2: Checking directory structure
+Pass 3: Checking directory connectivity
+Pass 4: Checking reference counts
+Pass 5: Checking group summary information
+test_filesys: 12/128 files (0.0% non-contiguous), 570/2048 blocks
+Exit status is 0
+debugfs: ex /a
+Level Entries       Logical      Physical Length Flags
+ 0/ 0   1/  1     0 -   511  1025 -  1536    512 
diff --git a/tests/f_convert_bmap/image.gz b/tests/f_convert_bmap/image.gz
new file mode 100644
index 0000000000000000000000000000000000000000..7c22532397ba8d42e928f75190dc545cae2140cf
GIT binary patch
literal 3548
zcmeH{TTD|27{|LAMF+@(0Tox+B<ivR0l73*3mPNpq{>AKJy;kpC>$)zXsxs>&77h{
z5d{R1Tf|{+oO3i%(L%x52w{Q@ZBMl=7l%+hhteTCTuLv?;vP1$_wAwI!}sN%d{4js
zKY#y=D@kV;l8$`5%sjNJl`~FyG$XVKkeS>DH}6f}NR0H7XZeLMHZ~n7S%IBv`1#=N
zvthAe6*sSJFDc(vQMvN!_C4jFe<j65!*^OUm={)9=G$@69F1o=F+fw#*9&#jyvB6W
zL}KW^E$$xb{J^bs+dN*RTpr^8kjSR!^SeC_oE_k?>(^gS=<hNObc@YMH>Kvk-p-E5
z35ksKaBDTb(PT!@CVwq0qb~h&i*n<*7OXbTS(15(N<89733W!X1W$_EjVBkiwdO+O
zgdm1$#XPIm+f@lYY^0qqFBCAP$X(HIv%QKlVn5zT?)`;ZVOpN;uzxq3w!?I(<aK$X
zaS!SJe5<2`goNateEzncYj1mM?y%OZ3HI8cvU9NDh)8IB%F@##rq|m2I-85nb<Z2w
zU2pNJ`vVSvg$htSs_DKP485(ZoNIkL`xm<MpHIzr^#yEhcYn0tc1iSaYjk?luhhO5
z90Xc?kWFAOAghA2z*hiVg?NKc04W0&pkFFLh%qk{=rBd}kQi-QLIK4oC}Cc{q-zD*
zP`R6C8NQns<?n~mydY0dt(W*QypVmG06dPE)IW%j((~@|V>qd0P#|_(kwxolh)vP1
z!4D|7w1*Ax2F-m!!v{D?hY_}Bj_Bhv%&`NoNpo3|U_RQefeE7?I)#|DeuRy`+sqVb
zhZToacT$fmp+(`UIb^NwAlA&?)i0^me$S019}PlkxcgO2tDz)Rj%kS-d=8m`$kjMO
z6!SC5aRssfTtZb|mQr*n*h?xr3>3)6@Uzsrhh!CaB~>w;YLUFa>Is<7Q;82Dp_q#3
z<csQ={t_?rJf<eDT62Ug&tzq~S40U_9La2yoxnS+C+pK1rS8~c>oFl5nU*FALaVW-
z#5F6(I+7;8h~wM?EMP?P6ssj5Wk)A#L~23336-^o#f*~pqh(kRQDM!sx4^PST@Y4H
z$d$?>mQGWE>8%_)T$Cn~;8+<UR(9q~qI9L)1bcHnoC0S2?i-o4t{d#wtbKDMepYK!
z><UbF*FFwcpxc%|-kFy%XsC0WSf9P?uV}qL5AcUQmFdYp`an;>8K9}Er5St++^&Mg
z!A0OEqvaYnu4F^;aunnuN*<J&rxAkvO3UQwWiHGAd&`@O7ag;OUt9kyt+v8kM|<{I
zIyy5D-6rU+%SLL#@YyT-x{F^mR@l1kJP-CwGEIen(_^EFs~!aO?fhmP`84O<$bd6b
zozXVSD_9JACTbIs-$@>k$83X+0lqWGS@2!*G{ZnA+mYmpn#n71w;rbdEhKizylA>_
zMx@ViZ17o^Z5wioybK<B(`9y+686({S#NlAG;TtcgFgWhA6R-TGwW2%xanw*RK0hH
z?Vcg(1Qg2i3)j1S%5oZoyZ;f(55(v*iZt4vOKqNzFP<OV(zIHAX4?0IAD46ya1n43
aa1n43a1r?55SY$(e6qN4MIedv8R-w=i-Ai3

literal 0
HcmV?d00001

diff --git a/tests/f_convert_bmap/name b/tests/f_convert_bmap/name
new file mode 100644
index 0000000..67e0d47
--- /dev/null
+++ b/tests/f_convert_bmap/name
@@ -0,0 +1 @@
+convert blockmap file to extents file
diff --git a/tests/f_convert_bmap/script b/tests/f_convert_bmap/script
new file mode 100644
index 0000000..f6b6f62
--- /dev/null
+++ b/tests/f_convert_bmap/script
@@ -0,0 +1,117 @@
+if [ "$DESCRIPTION"x != x ]; then
+	test_description="$DESCRIPTION"
+fi
+if [ "$IMAGE"x = x ]; then
+	IMAGE=$test_dir/image.gz
+fi
+
+if [ "$FSCK_OPT"x = x ]; then
+	FSCK_OPT=-yf
+fi
+
+if [ "$SECOND_FSCK_OPT"x = x ]; then
+	SECOND_FSCK_OPT=-yf
+fi
+
+if [ "$OUT1"x = x ]; then
+	OUT1=$test_name.1.log
+fi
+
+if [ "$OUT2"x = x ]; then
+	OUT2=$test_name.2.log
+fi
+
+if [ "$EXP1"x = x ]; then
+	if [ -f $test_dir/expect.1.gz ]; then
+		EXP1=$test_name.1.tmp
+		gunzip < $test_dir/expect.1.gz > $EXP1
+	else
+		EXP1=$test_dir/expect.1
+	fi
+fi
+
+if [ "$EXP2"x = x ]; then
+	if [ -f $test_dir/expect.2.gz ]; then
+		EXP2=$test_name.2.tmp
+		gunzip < $test_dir/expect.2.gz > $EXP2
+	else
+		EXP2=$test_dir/expect.2
+	fi
+fi
+
+if [ "$SKIP_GUNZIP" != "true" ] ; then
+	gunzip < $IMAGE > $TMPFILE
+fi
+
+cp /dev/null $OUT1
+
+eval $PREP_CMD
+
+echo 'stat /a' > $TMPFILE.cmd
+$DEBUGFS -f $TMPFILE.cmd $TMPFILE > $OUT1.new 2>&1
+rm -rf $TMPFILE.cmd
+$TUNE2FS -O extent $TMPFILE >> $OUT1.new 2>&1
+$FSCK $FSCK_OPT -E bmap2extent -N test_filesys $TMPFILE >> $OUT1.new 2>&1
+status=$?
+echo Exit status is $status >> $OUT1.new
+sed -f $cmd_dir/filter.sed -e "s;$TMPFILE;test.img;" $OUT1.new >> $OUT1
+rm -f $OUT1.new
+
+$FSCK $SECOND_FSCK_OPT -N test_filesys $TMPFILE > $OUT2.new 2>&1 
+status=$?
+echo Exit status is $status >> $OUT2.new
+echo 'ex /a' > $TMPFILE.cmd
+$DEBUGFS -f $TMPFILE.cmd $TMPFILE >> $OUT2.new 2>&1
+rm -rf $TMPFILE.cmd
+sed -f $cmd_dir/filter.sed -e "s;$TMPFILE;test.img;" $OUT2.new > $OUT2
+rm -f $OUT2.new
+
+eval $AFTER_CMD
+
+if [ "$SKIP_VERIFY" != "true" ] ; then
+	rm -f $test_name.ok $test_name.failed
+	cmp -s $OUT1 $EXP1
+	status1=$?
+	if [ "$ONE_PASS_ONLY" != "true" ]; then
+		cmp -s $OUT2 $EXP2
+		status2=$?
+	else
+		status2=0
+	fi
+	if [ "$PASS_ZERO" = "true" ]; then
+		cmp -s $test_name.0.log	$test_dir/expect.0
+		status3=$?
+	else
+		status3=0
+	fi
+
+	if [ -z "$test_description" ] ; then
+		description="$test_name"
+	else
+		description="$test_name: $test_description"
+	fi
+
+	if [ "$status1" -eq 0 -a "$status2" -eq 0 -a "$status3" -eq 0 ] ; then
+		echo "$description: ok"
+		touch $test_name.ok
+	else
+		echo "$description: failed"
+		rm -f $test_name.failed
+		if [ "$PASS_ZERO" = "true" ]; then
+			diff $DIFF_OPTS $test_dir/expect.0 \
+				$test_name.0.log >> $test_name.failed
+		fi
+		diff $DIFF_OPTS $EXP1 $OUT1 >> $test_name.failed
+		if [ "$ONE_PASS_ONLY" != "true" ]; then
+			diff $DIFF_OPTS $EXP2 $OUT2 >> $test_name.failed
+		fi
+	fi
+	rm -f tmp_expect
+fi
+
+if [ "$SKIP_CLEANUP" != "true" ] ; then
+	unset IMAGE FSCK_OPT SECOND_FSCK_OPT OUT1 OUT2 EXP1 EXP2 
+	unset SKIP_VERIFY SKIP_CLEANUP SKIP_GUNZIP ONE_PASS_ONLY PREP_CMD
+	unset DESCRIPTION SKIP_UNLINK AFTER_CMD PASS_ZERO
+fi
+
diff --git a/tests/f_convert_bmap_and_extent/expect.1 b/tests/f_convert_bmap_and_extent/expect.1
new file mode 100644
index 0000000..7af91aa
--- /dev/null
+++ b/tests/f_convert_bmap_and_extent/expect.1
@@ -0,0 +1,33 @@
+debugfs: stat /a
+Inode: 12   Type: regular    Mode:  0644   Flags: 0x0
+Generation: 1573716129    Version: 0x00000000:00000001
+User:     0   Group:     0   Size: 524288
+File ACL: 0    Directory ACL: 0
+Links: 1   Blockcount: 1030
+Fragment:  Address: 0    Number: 0    Size: 0
+ ctime: 0x5457f87a:62ae2980 -- Mon Nov  3 21:49:46 2014
+ atime: 0x5457f87a:61ba0598 -- Mon Nov  3 21:49:46 2014
+ mtime: 0x5457f87a:62ae2980 -- Mon Nov  3 21:49:46 2014
+crtime: 0x5457f87a:61ba0598 -- Mon Nov  3 21:49:46 2014
+Size of extra inode fields: 28
+BLOCKS:
+(0-11):1025-1036, (IND):24, (12-267):1037-1292, (DIND):25, (IND):41, (268-511):1293-1536
+TOTAL: 515
+
+debugfs: ex /zero
+Level Entries       Logical      Physical Length Flags
+ 0/ 1   1/  1     0 -     8    28              9
+ 1/ 1   1/  4     0 -     0    27 -    27      1 
+ 1/ 1   2/  4     2 -     2    29 -    29      1 
+ 1/ 1   3/  4     4 -     4    31 -    31      1 
+ 1/ 1   4/  4     6 -     6    33 -    33      1 
+Pass 1: Checking inodes, blocks, and sizes
+Pass 1E: Optimizing extent trees
+Pass 2: Checking directory structure
+Pass 3: Checking directory connectivity
+Pass 4: Checking reference counts
+Pass 5: Checking group summary information
+
+test_filesys: ***** FILE SYSTEM WAS MODIFIED *****
+test_filesys: 13/128 files (15.4% non-contiguous), 574/2048 blocks
+Exit status is 1
diff --git a/tests/f_convert_bmap_and_extent/expect.2 b/tests/f_convert_bmap_and_extent/expect.2
new file mode 100644
index 0000000..73765ea
--- /dev/null
+++ b/tests/f_convert_bmap_and_extent/expect.2
@@ -0,0 +1,16 @@
+Pass 1: Checking inodes, blocks, and sizes
+Pass 2: Checking directory structure
+Pass 3: Checking directory connectivity
+Pass 4: Checking reference counts
+Pass 5: Checking group summary information
+test_filesys: 13/128 files (7.7% non-contiguous), 574/2048 blocks
+Exit status is 0
+debugfs: ex /a
+Level Entries       Logical      Physical Length Flags
+ 0/ 0   1/  1     0 -   511  1025 -  1536    512 
+debugfs: ex /zero
+Level Entries       Logical      Physical Length Flags
+ 0/ 0   1/  4     0 -     0    27 -    27      1 
+ 0/ 0   2/  4     2 -     2    29 -    29      1 
+ 0/ 0   3/  4     4 -     4    31 -    31      1 
+ 0/ 0   4/  4     6 -     6    33 -    33      1 
diff --git a/tests/f_convert_bmap_and_extent/image.gz b/tests/f_convert_bmap_and_extent/image.gz
new file mode 100644
index 0000000000000000000000000000000000000000..916b493c710843030b453b548be0b6adcd12c48d
GIT binary patch
literal 3657
zcmeIyTToMX9tUvLscfwayK5@~0__Zy*%bt#ks1VQS!7xUxg-RLBtct56D}f#m|%je
z4(XO+RzbXANW}$m;UwG+mq0*^Ra<Uj&B>8)$s$(poDBi>PvC?kJG0$K-}i;z!}sAg
zzo*Z9=jT`PI~P@THFe`A^VGW?Il$%alCvc2Nobha)|k?9a!YJ>-XH&z5>s=9R_{gH
z=JkvJRiWZw+oAU_2l@M#P_IDM(l-Cwf0$Wym}>Wve?8-$k`pE4^;@Q`4(o`}>MR|w
z4HS&pOc6U=DwO;|Ui{Y0uAl7Z{hSrj`|?qOM|w{E%;}<nr@|MWujj^gdVW`FE;c_}
zIbB3d{t)XSn(BDRFBkc`_wl`deKYf>dUe8&ztkCUOE~=ZiD4akDn8#K%q?F0s3_lQ
z2o$Nhn0rf1-oZM7_`+f1(%mltm^$$DpoK@w-_Hx5y5y`Om3Xgu$Yh!L>f_nNS!{^u
zEs;3KFFan7jlSFNs`8ol-RSb)IUh05+S;l+BnaR4(%I>l__bIqn%)ukLwTuVpD-kp
zeCOr;wfXt)Jhzn&c0J3&<`1?X9~*yuK11x+W?JlA%8pyk*HWd?Ctp39w7b`42XEme
zJjvhZ(8xa7>nMx8|21#J4(ItL_qWC52Z6{_mbONb&FP`{)8rp;8t6SVMJflT>kM)d
z2htva7`R@`wbC^PO^H5|bHUiMPg%*S(x&Cn!`SV{z6FL=VKZ!pv!^r*tRsj4PKf;8
zH{D(7ipF5K;k>-G3Dga=7+Zar5_z2AFpwL2d>InOL&F}>ZG?+eH^7SEb2BrsIRL4`
z{|v{OGZG;kb`s{9DK>)$F7-)D)a2;Pz_5VkLv=+zBB|;aV#85V?+g75xFyJs)LfGH
zfHlEEpN!z<J8Y&8BUt$r_9HjfC+#5`k2Qcc0bf_o%+M&)P!0xjKbNMz#<_qH#7L<H
z%~f~;cp-?_!PB$?4F&T7qhxqG)uMiay%8kpx<=(z#D)(Op9x!{sD5gRJ_9@^(uIAa
z>O}p1@Do85YDb|lJs*q{nZo}5Bvx}Uat&W2z7)cH<nQ721f7w|(!^ulV1%Fvd;2NZ
z4B22x+ozUW5G(Eq28m>Qw<skVk|9^QsrEim@<1~SK>)686j4w^0nP(9ty-;#*8Aa!
z+!%Y0BPHCx!V7@a%R5S(=(+U}yJHvf{ANl8@(liKPzHrVPta8Ud*ji|q|W7$sLk6w
zm(RD%-!uk$h3giPML{ZO>#osvLT<4*_e$N}=EUMH6+zcoyb#fN%Mu@L`(Gh{Yo)WY
zp>gZxcr+Bw4N4nAx4_3B>TdKeFdL$&P%rp6l*&`V94WnPfr%poSE9|yOY(Lb3s3S!
zcX2hwj^p~{Fn_9dfntS%*h5;@)Ig^a;?xR0i)`_wx~s+d7|w)I-Kp`xj>2C>_W34D
znx$AWTokGGrAQzKRtR&<v@m=LTr}$H(i4>dD1|LCw%5^1kSA=5v1u8yvz_Gc>zHHm
zP52GaOqRQ;W7xFFUN>lvT|A-~#P@?q0n;bB2ww$%BeyZr<B&eGmPvi6ufU&@)t5Ba
z^znE$cqNF{wJ}pQYE;j`FOpkdpg0CVIax8C9)Q+jBcMjW(LvM9CVd1Z<+63EX+{(B
z64VI_q{$YQ6}<t<h%BKzD$`HVj@<#R#91jrCNILOfPy$9Jt9-Q#@ay-aZY%<Kb@rv
zMpwaAg3U<sFvJ2_+uhH+rVqzz0YnrFoAy9MSTC2)NH(b~s5>SGcZfp!KSlIt)oxr#
z@a^@Yi~*$zN4W|1Mm1zISTHM>!C;ljqYW~woG7w4<Tno(Oqhj4n++bdxsUW5wcdE|
zNcD0lnj5jBv~2cN@7$gf&uYGy)s*e@>VHsZbe^zt+{$OMTRL&jxt#}gZ{S_z$GMjW
zFl)~Z*`1HF#wS(_EN==HocWFe&y&j{9`FZeWg|{x$uysZ)CPtNS9*^HF^2S~Of8u~
ztlj!=;S?xSHThsx=A2uJ=5{;11NUApshAzOeD7G5lfM@J%i4(q<xixB@~&WJy#5La
zZbaV9^n}`OBHNp1tMA2@OdFTSCgz8Q<-eC~5aX<~foJ1B<#gL|tL^#PYM4joDOHo(
zu7%p$3bgXf^#i9X9-0lK+d@x<6xRJ?;?;_?$o|_?eS4mo3+X#Nw`i17Hg7}Cndd?Z
pk+F}bNN+!0->wU+3#<#Q3#<#Q3;h2DCVyPpWW8%MxVU`i@*iE}y4(N&

literal 0
HcmV?d00001

diff --git a/tests/f_convert_bmap_and_extent/name b/tests/f_convert_bmap_and_extent/name
new file mode 100644
index 0000000..c9394c6
--- /dev/null
+++ b/tests/f_convert_bmap_and_extent/name
@@ -0,0 +1 @@
+convert blockmap and extents files to extents files
diff --git a/tests/f_convert_bmap_and_extent/script b/tests/f_convert_bmap_and_extent/script
new file mode 100644
index 0000000..203ab25
--- /dev/null
+++ b/tests/f_convert_bmap_and_extent/script
@@ -0,0 +1,119 @@
+if [ "$DESCRIPTION"x != x ]; then
+	test_description="$DESCRIPTION"
+fi
+if [ "$IMAGE"x = x ]; then
+	IMAGE=$test_dir/image.gz
+fi
+
+if [ "$FSCK_OPT"x = x ]; then
+	FSCK_OPT=-yf
+fi
+
+if [ "$SECOND_FSCK_OPT"x = x ]; then
+	SECOND_FSCK_OPT=-yf
+fi
+
+if [ "$OUT1"x = x ]; then
+	OUT1=$test_name.1.log
+fi
+
+if [ "$OUT2"x = x ]; then
+	OUT2=$test_name.2.log
+fi
+
+if [ "$EXP1"x = x ]; then
+	if [ -f $test_dir/expect.1.gz ]; then
+		EXP1=$test_name.1.tmp
+		gunzip < $test_dir/expect.1.gz > $EXP1
+	else
+		EXP1=$test_dir/expect.1
+	fi
+fi
+
+if [ "$EXP2"x = x ]; then
+	if [ -f $test_dir/expect.2.gz ]; then
+		EXP2=$test_name.2.tmp
+		gunzip < $test_dir/expect.2.gz > $EXP2
+	else
+		EXP2=$test_dir/expect.2
+	fi
+fi
+
+if [ "$SKIP_GUNZIP" != "true" ] ; then
+	gunzip < $IMAGE > $TMPFILE
+fi
+
+cp /dev/null $OUT1
+
+eval $PREP_CMD
+
+echo 'stat /a' > $TMPFILE.cmd
+echo 'ex /zero' >> $TMPFILE.cmd
+$DEBUGFS -f $TMPFILE.cmd $TMPFILE > $OUT1.new 2>&1
+rm -rf $TMPFILE.cmd
+$TUNE2FS -O extent $TMPFILE >> $OUT1.new 2>&1
+$FSCK $FSCK_OPT -E bmap2extent -N test_filesys $TMPFILE >> $OUT1.new 2>&1
+status=$?
+echo Exit status is $status >> $OUT1.new
+sed -f $cmd_dir/filter.sed -e "s;$TMPFILE;test.img;" $OUT1.new >> $OUT1
+rm -f $OUT1.new
+
+$FSCK $SECOND_FSCK_OPT -N test_filesys $TMPFILE > $OUT2.new 2>&1 
+status=$?
+echo Exit status is $status >> $OUT2.new
+echo 'ex /a' > $TMPFILE.cmd
+echo 'ex /zero' >> $TMPFILE.cmd
+$DEBUGFS -f $TMPFILE.cmd $TMPFILE >> $OUT2.new 2>&1
+rm -rf $TMPFILE.cmd
+sed -f $cmd_dir/filter.sed -e "s;$TMPFILE;test.img;" $OUT2.new > $OUT2
+rm -f $OUT2.new
+
+eval $AFTER_CMD
+
+if [ "$SKIP_VERIFY" != "true" ] ; then
+	rm -f $test_name.ok $test_name.failed
+	cmp -s $OUT1 $EXP1
+	status1=$?
+	if [ "$ONE_PASS_ONLY" != "true" ]; then
+		cmp -s $OUT2 $EXP2
+		status2=$?
+	else
+		status2=0
+	fi
+	if [ "$PASS_ZERO" = "true" ]; then
+		cmp -s $test_name.0.log	$test_dir/expect.0
+		status3=$?
+	else
+		status3=0
+	fi
+
+	if [ -z "$test_description" ] ; then
+		description="$test_name"
+	else
+		description="$test_name: $test_description"
+	fi
+
+	if [ "$status1" -eq 0 -a "$status2" -eq 0 -a "$status3" -eq 0 ] ; then
+		echo "$description: ok"
+		touch $test_name.ok
+	else
+		echo "$description: failed"
+		rm -f $test_name.failed
+		if [ "$PASS_ZERO" = "true" ]; then
+			diff $DIFF_OPTS $test_dir/expect.0 \
+				$test_name.0.log >> $test_name.failed
+		fi
+		diff $DIFF_OPTS $EXP1 $OUT1 >> $test_name.failed
+		if [ "$ONE_PASS_ONLY" != "true" ]; then
+			diff $DIFF_OPTS $EXP2 $OUT2 >> $test_name.failed
+		fi
+	fi
+	rm -f tmp_expect
+fi
+
+if [ "$SKIP_CLEANUP" != "true" ] ; then
+	unset IMAGE FSCK_OPT SECOND_FSCK_OPT OUT1 OUT2 EXP1 EXP2 
+	unset SKIP_VERIFY SKIP_CLEANUP SKIP_GUNZIP ONE_PASS_ONLY PREP_CMD
+	unset DESCRIPTION SKIP_UNLINK AFTER_CMD PASS_ZERO
+fi
+
diff --git a/tests/f_extent_too_deep/expect.1 b/tests/f_extent_too_deep/expect.1
new file mode 100644
index 0000000..a595482
--- /dev/null
+++ b/tests/f_extent_too_deep/expect.1
@@ -0,0 +1,23 @@
+debugfs: ex /a
+Level Entries       Logical      Physical Length Flags
+ 0/ 7   1/  1     0 -     0    12              1
+ 1/ 7   1/  1     0 -     0    13              1
+ 2/ 7   1/  1     0 -     0    14              1
+ 3/ 7   1/  1     0 -     0    15              1
+ 4/ 7   1/  1     0 -     0    16              1
+ 5/ 7   1/  1     0 -     0    17              1
+ 6/ 7   1/  1     0 -     0     9              1
+ 7/ 7   1/  1     0 -     0    10 -    10      1 
+Pass 1: Checking inodes, blocks, and sizes
+Inode 12 extent tree could be more shallow (7; could be <= 4)
+Fix? yes
+
+Pass 1E: Optimizing extent trees
+Pass 2: Checking directory structure
+Pass 3: Checking directory connectivity
+Pass 4: Checking reference counts
+Pass 5: Checking group summary information
+
+test_filesys: ***** FILE SYSTEM WAS MODIFIED *****
+test_filesys: 12/128 files (0.0% non-contiguous), 19/512 blocks
+Exit status is 1
diff --git a/tests/f_extent_too_deep/expect.2 b/tests/f_extent_too_deep/expect.2
new file mode 100644
index 0000000..a1d28b1
--- /dev/null
+++ b/tests/f_extent_too_deep/expect.2
@@ -0,0 +1,10 @@
+Pass 1: Checking inodes, blocks, and sizes
+Pass 2: Checking directory structure
+Pass 3: Checking directory connectivity
+Pass 4: Checking reference counts
+Pass 5: Checking group summary information
+test_filesys: 12/128 files (0.0% non-contiguous), 19/512 blocks
+Exit status is 0
+debugfs: ex /a
+Level Entries       Logical      Physical Length Flags
+ 0/ 0   1/  1     0 -     0    10 -    10      1 
diff --git a/tests/f_extent_too_deep/image.gz b/tests/f_extent_too_deep/image.gz
new file mode 100644
index 0000000000000000000000000000000000000000..0f5adff562c7f45f275a4401344e784b6c61ef0b
GIT binary patch
literal 2592
zcmb2|=3wx?6duCF{PymCf8jtGh7aG@g)R_rT-?mre9>1}CPu>N$ctAhoXss6N_=iF
z5@Jgn)7j0{g?;~U$MjrWSXgN2BJ`haZVz)`*522$TUX9#P_RGs-1*+#-}m|t)ke$R
z5APFT+7Par(yh|6IP<$^TY&yHb=Pd!4HL>YcLz^?KYf{6a)z0HeXpXe&6y>)!{oHZ
z{w4_aEWdmEi{rbce>>ZI-7<e&n#^wgT>k9*+V6{hzqYV7kN<!F`meNSFQ+Q6*V|Y3
z^T*9EtryRSC(q!xs&fB&rR?lflfNCSbM%+z9`NR8Xvm!~b$|P%vOS^yuV3DA=UzC!
zdV2~70|P_Z{%7a+eb~FD@iQX>LxZgT@4xk?M^bd2d}j)Ap48L(^!o=nzNt^b-j@Bn
z(tB&}&(}Bi3E1%+uQ+>K4XAm;YM!#Q|M&Ag|6C8`EjZre0W^eR#(!oY`RqTC0D%K#
zt8|VVNY)>}nWdV0b3e-(v5!&LK$gwTUKeJq&J9t;uwnPp@BiQDA9oI70*csL0e#oU
z@4*RVe|hm||DC?}r7S>pRms2jtmDcsHU76gI%<N<lAHVgc$tlYDp1LV;#slvAMS%y
z_mw*UMaTsa{WFiQJUMmS^mV^a|12?_6#qZx;wRgs7w*6M?d&ss>QjkRvd@oR6O=Ze
zwT(YrEiNVZY2}A3mTiX){VlmTBO>)r|Ir`c-?Qz0^ke$R^}nA_NN3&Vro6gdEo}Aw
zNd;?G)F0CN>c2B%cHsY<=imQ-{{3@@$#3_+;+waWT&=Ht`+1dYa@5urll#5~9k0rL
z`R+&5tNm+I=hrV419@x|jE2By2#kinXb6mkz-S22A_Nlt)-hg8%bdf&puhkC@B%oQ

literal 0
HcmV?d00001

diff --git a/tests/f_extent_too_deep/name b/tests/f_extent_too_deep/name
new file mode 100644
index 0000000..7e8654a
--- /dev/null
+++ b/tests/f_extent_too_deep/name
@@ -0,0 +1 @@
+extent tree is deeper than it needs to be
diff --git a/tests/f_extent_too_deep/script b/tests/f_extent_too_deep/script
new file mode 100644
index 0000000..ee18438
--- /dev/null
+++ b/tests/f_extent_too_deep/script
@@ -0,0 +1,118 @@
+if [ "$DESCRIPTION"x != x ]; then
+	test_description="$DESCRIPTION"
+fi
+if [ "$IMAGE"x = x ]; then
+	IMAGE=$test_dir/image.gz
+fi
+
+if [ "$FSCK_OPT"x = x ]; then
+	FSCK_OPT=-yf
+fi
+
+if [ "$SECOND_FSCK_OPT"x = x ]; then
+	SECOND_FSCK_OPT=-yf
+fi
+
+if [ "$OUT1"x = x ]; then
+	OUT1=$test_name.1.log
+fi
+
+if [ "$OUT2"x = x ]; then
+	OUT2=$test_name.2.log
+fi
+
+if [ "$EXP1"x = x ]; then
+	if [ -f $test_dir/expect.1.gz ]; then
+		EXP1=$test_name.1.tmp
+		gunzip < $test_dir/expect.1.gz > $EXP1
+	else
+		EXP1=$test_dir/expect.1
+	fi
+fi
+
+if [ "$EXP2"x = x ]; then
+	if [ -f $test_dir/expect.2.gz ]; then
+		EXP2=$test_name.2.tmp
+		gunzip < $test_dir/expect.2.gz > $EXP2
+	else
+		EXP2=$test_dir/expect.2
+	fi
+fi
+
+if [ "$SKIP_GUNZIP" != "true" ] ; then
+	gunzip < $IMAGE > $TMPFILE
+fi
+
+cp /dev/null $OUT1
+
+eval $PREP_CMD
+
+echo 'ex /a' > $TMPFILE.cmd
+$DEBUGFS -f $TMPFILE.cmd $TMPFILE > $OUT1.new 2>&1
+rm -rf $TMPFILE.cmd
+$FSCK $FSCK_OPT  -N test_filesys $TMPFILE >> $OUT1.new 2>&1
+status=$?
+echo Exit status is $status >> $OUT1.new
+sed -f $cmd_dir/filter.sed -e "s;$TMPFILE;test.img;" $OUT1.new >> $OUT1
+rm -f $OUT1.new
+
+if [ "$ONE_PASS_ONLY" != "true" ]; then
+	$FSCK $SECOND_FSCK_OPT -N test_filesys $TMPFILE > $OUT2.new 2>&1 
+	status=$?
+	echo Exit status is $status >> $OUT2.new
+	echo 'ex /a' > $TMPFILE.cmd
+	$DEBUGFS -f $TMPFILE.cmd $TMPFILE >> $OUT2.new 2>&1
+	rm -rf $TMPFILE.cmd
+	sed -f $cmd_dir/filter.sed -e "s;$TMPFILE;test.img;" $OUT2.new > $OUT2
+	rm -f $OUT2.new
+fi
+
+eval $AFTER_CMD
+
+if [ "$SKIP_VERIFY" != "true" ] ; then
+	rm -f $test_name.ok $test_name.failed
+	cmp -s $OUT1 $EXP1
+	status1=$?
+	if [ "$ONE_PASS_ONLY" != "true" ]; then
+		cmp -s $OUT2 $EXP2
+		status2=$?
+	else
+		status2=0
+	fi
+	if [ "$PASS_ZERO" = "true" ]; then
+		cmp -s $test_name.0.log	$test_dir/expect.0
+		status3=$?
+	else
+		status3=0
+	fi
+
+	if [ -z "$test_description" ] ; then
+		description="$test_name"
+	else
+		description="$test_name: $test_description"
+	fi
+
+	if [ "$status1" -eq 0 -a "$status2" -eq 0 -a "$status3" -eq 0 ] ; then
+		echo "$description: ok"
+		touch $test_name.ok
+	else
+		echo "$description: failed"
+		rm -f $test_name.failed
+		if [ "$PASS_ZERO" = "true" ]; then
+			diff $DIFF_OPTS $test_dir/expect.0 \
+				$test_name.0.log >> $test_name.failed
+		fi
+		diff $DIFF_OPTS $EXP1 $OUT1 >> $test_name.failed
+		if [ "$ONE_PASS_ONLY" != "true" ]; then
+			diff $DIFF_OPTS $EXP2 $OUT2 >> $test_name.failed
+		fi
+	fi
+	rm -f tmp_expect
+fi
+
+if [ "$SKIP_CLEANUP" != "true" ] ; then
+	unset IMAGE FSCK_OPT SECOND_FSCK_OPT OUT1 OUT2 EXP1 EXP2 
+	unset SKIP_VERIFY SKIP_CLEANUP SKIP_GUNZIP ONE_PASS_ONLY PREP_CMD
+	unset DESCRIPTION SKIP_UNLINK AFTER_CMD PASS_ZERO
+fi
+
diff --git a/tests/f_opt_extent/expect b/tests/f_opt_extent/expect
new file mode 100644
index 0000000..6d4863b
--- /dev/null
+++ b/tests/f_opt_extent/expect
@@ -0,0 +1,55 @@
+tune2fs metadata_csum test
+Creating filesystem with 524288 1k blocks and 65536 inodes
+Superblock backups stored on blocks: 
+	8193, 24577, 40961, 57345, 73729, 204801, 221185, 401409
+
+Allocating group tables:      \b\b\b\b\bdone                            
+Writing inode tables:      \b\b\b\b\bdone                            
+Creating journal (16384 blocks): done
+Creating 477 huge file(s) with 1024 blocks each: done
+Writing superblocks and filesystem accounting information:      \b\b\b\b\bdone
+
+Pass 1: Checking inodes, blocks, and sizes
+Pass 2: Checking directory structure
+Pass 3: Checking directory connectivity
+Pass 4: Checking reference counts
+Pass 5: Checking group summary information
+
+Exit status is 0
+Pass 1: Checking inodes, blocks, and sizes
+Pass 2: Checking directory structure
+Pass 3: Checking directory connectivity
+Pass 3A: Optimizing directories
+Pass 4: Checking reference counts
+Pass 5: Checking group summary information
+
+
+
+Change in FS metadata:
+@@ -10,7 +10,7 @@
+ Inode count:              65536
+ Block count:              524288
+ Reserved block count:     26214
+-Free blocks:              570
++Free blocks:              567
+ Free inodes:              65047
+ First block:              1
+ Block size:               1024
+@@ -47,8 +47,8 @@
+   Block bitmap at 262 (+261)
+   Inode bitmap at 278 (+277)
+   Inode table at 294-549 (+293)
+-  21 free blocks, 535 free inodes, 3 directories, 535 unused inodes
+-  Free blocks: 4414-4434
++  18 free blocks, 535 free inodes, 3 directories, 535 unused inodes
++  Free blocks: 4417-4434
+   Free inodes: 490-1024
+ Group 1: (Blocks 8193-16384) [INODE_UNINIT]
+   Backup superblock at 8193, Group descriptors at 8194-8197
+Pass 1: Checking inodes, blocks, and sizes
+Pass 2: Checking directory structure
+Pass 3: Checking directory connectivity
+Pass 4: Checking reference counts
+Pass 5: Checking group summary information
+
+Exit status is 0
diff --git a/tests/f_opt_extent/name b/tests/f_opt_extent/name
new file mode 100644
index 0000000..7d4389c
--- /dev/null
+++ b/tests/f_opt_extent/name
@@ -0,0 +1 @@
+optimize extent tree
diff --git a/tests/f_opt_extent/script b/tests/f_opt_extent/script
new file mode 100644
index 0000000..2da5e91
--- /dev/null
+++ b/tests/f_opt_extent/script
@@ -0,0 +1,64 @@
+FSCK_OPT=-fn
+OUT=$test_name.log
+EXP=$test_dir/expect
+CONF=$TMPFILE.conf
+
+cat > $CONF << ENDL
+[fs_types]
+	ext4h = {
+		features = has_journal,extent,huge_file,flex_bg,uninit_bg,dir_nlink,extra_isize,sparse_super,filetype,dir_index,ext_attr,resize_inode,64bit,metadata_csum
+		blocksize = 1024
+		inode_size = 256
+		make_hugefiles = true
+		hugefiles_dir = /xyz
+		hugefiles_slack = 0
+		hugefiles_name = aaaaa
+		hugefiles_digits = 4
+		hugefiles_size = 1M
+		zero_hugefiles = false
+	}
+ENDL
+
+echo "tune2fs metadata_csum test" > $OUT
+
+MKE2FS_CONFIG=$CONF $MKE2FS -F -T ext4h $TMPFILE 524288 >> $OUT 2>&1
+rm -rf $CONF
+
+# dump and check
+$DUMPE2FS $TMPFILE 2> /dev/null | grep '^Group 0:' -B99 -A20 | sed -f $cmd_dir/filter.sed > $OUT.before
+$FSCK $FSCK_OPT -N test_filesys $TMPFILE >> $OUT 2>&1
+status=$?
+echo Exit status is $status >> $OUT
+
+# check
+$FSCK -fyD -N test_filesys $TMPFILE >> $OUT 2>&1
+
+# dump and check
+$DUMPE2FS $TMPFILE 2> /dev/null | grep '^Group 0:' -B99 -A20 | sed -f $cmd_dir/filter.sed > $OUT.after
+echo "Change in FS metadata:" >> $OUT
+diff -u $OUT.before $OUT.after | tail -n +3 >> $OUT
+$FSCK $FSCK_OPT -N test_filesys $TMPFILE >> $OUT 2>&1
+status=$?
+echo Exit status is $status >> $OUT
+
+rm $TMPFILE $OUT.before $OUT.after
+
+#
+# Do the verification
+#
+
+sed -f $cmd_dir/filter.sed -e "s;$TMPFILE;test.img;" -e 's/test_filesys:.*//g' < $OUT > $OUT.new
+mv $OUT.new $OUT
+
+cmp -s $OUT $EXP
+status=$?
+
+if [ "$status" = 0 ] ; then
+	echo "$test_name: $test_description: ok"
+	touch $test_name.ok
+else
+	echo "$test_name: $test_description: failed"
+	diff $DIFF_OPTS $EXP $OUT > $test_name.failed
+fi
+
+unset IMAGE FSCK_OPT OUT EXP CONF
diff --git a/tests/f_opt_extent_ext3/expect b/tests/f_opt_extent_ext3/expect
new file mode 100644
index 0000000..1761471
--- /dev/null
+++ b/tests/f_opt_extent_ext3/expect
@@ -0,0 +1,44 @@
+rebuild extent metadata_csum test
+Creating filesystem with 524288 1k blocks and 65536 inodes
+Superblock backups stored on blocks: 
+	8193, 24577, 40961, 57345, 73729, 204801, 221185, 401409
+
+Allocating group tables:      \b\b\b\b\bdone                            
+Writing inode tables:      \b\b\b\b\bdone                            
+Creating journal (16384 blocks): done
+mke2fs: Operation not supported for inodes containing extents while creating huge files
+Writing superblocks and filesystem accounting information:      \b\b\b\b\bdone
+
+Pass 1: Checking inodes, blocks, and sizes
+Pass 2: Checking directory structure
+Pass 3: Checking directory connectivity
+Pass 4: Checking reference counts
+Pass 5: Checking group summary information
+
+Exit status is 0
+Pass 1: Checking inodes, blocks, and sizes
+Pass 2: Checking directory structure
+Pass 3: Checking directory connectivity
+Pass 3A: Optimizing directories
+Pass 4: Checking reference counts
+Pass 5: Checking group summary information
+
+
+
+Change in FS metadata:
+@@ -2,7 +2,7 @@
+ Last mounted on:          <not available>
+ Filesystem magic number:  0xEF53
+ Filesystem revision #:    1 (dynamic)
+-Filesystem features:      has_journal ext_attr resize_inode dir_index filetype sparse_super large_file huge_file dir_nlink
++Filesystem features:      has_journal ext_attr resize_inode dir_index filetype extent sparse_super large_file huge_file dir_nlink
+ Default mount options:    user_xattr acl
+ Filesystem state:         clean
+ Errors behavior:          Continue
+Pass 1: Checking inodes, blocks, and sizes
+Pass 2: Checking directory structure
+Pass 3: Checking directory connectivity
+Pass 4: Checking reference counts
+Pass 5: Checking group summary information
+
+Exit status is 0
diff --git a/tests/f_opt_extent_ext3/name b/tests/f_opt_extent_ext3/name
new file mode 100644
index 0000000..b369685
--- /dev/null
+++ b/tests/f_opt_extent_ext3/name
@@ -0,0 +1 @@
+convert ext3 to extent tree
diff --git a/tests/f_opt_extent_ext3/script b/tests/f_opt_extent_ext3/script
new file mode 100644
index 0000000..931eae7
--- /dev/null
+++ b/tests/f_opt_extent_ext3/script
@@ -0,0 +1,65 @@
+FSCK_OPT=-fn
+OUT=$test_name.log
+EXP=$test_dir/expect
+CONF=$TMPFILE.conf
+
+cat > $CONF << ENDL
+[fs_types]
+	ext4h = {
+		features = has_journal,^extent,huge_file,^flex_bg,^uninit_bg,dir_nlink,^extra_isize,sparse_super,filetype,dir_index,ext_attr,resize_inode,^64bit,^metadata_csum
+		blocksize = 1024
+		inode_size = 256
+		make_hugefiles = true
+		hugefiles_dir = /
+		num_hugefiles = 100
+		hugefiles_slack = 0
+		hugefiles_name = aaaaa
+		hugefiles_digits = 4
+		hugefiles_size = 1M
+		zero_hugefiles = false
+	}
+ENDL
+
+echo "rebuild extent metadata_csum test" > $OUT
+
+MKE2FS_CONFIG=$CONF $MKE2FS -F -T ext4h $TMPFILE 524288 >> $OUT 2>&1
+rm -rf $CONF
+
+# dump and check
+$DUMPE2FS $TMPFILE 2> /dev/null | grep '^Group 0:' -B99 -A20 | sed -f $cmd_dir/filter.sed > $OUT.before
+$FSCK $FSCK_OPT -N test_filesys $TMPFILE >> $OUT 2>&1
+status=$?
+echo Exit status is $status >> $OUT
+
+# check
+$FSCK -fyD -N test_filesys -E bmap2extent $TMPFILE >> $OUT 2>&1
+
+# dump and check
+$DUMPE2FS $TMPFILE 2> /dev/null | grep '^Group 0:' -B99 -A20 | sed -f $cmd_dir/filter.sed > $OUT.after
+echo "Change in FS metadata:" >> $OUT
+diff -u $OUT.before $OUT.after | tail -n +3 >> $OUT
+$FSCK $FSCK_OPT -N test_filesys $TMPFILE >> $OUT 2>&1
+status=$?
+echo Exit status is $status >> $OUT
+
+rm $TMPFILE $OUT.before $OUT.after
+
+#
+# Do the verification
+#
+
+sed -f $cmd_dir/filter.sed -e "s;$TMPFILE;test.img;" -e 's/test_filesys:.*//g' < $OUT > $OUT.new
+mv $OUT.new $OUT
+
+cmp -s $OUT $EXP
+status=$?
+
+if [ "$status" = 0 ] ; then
+	echo "$test_name: $test_description: ok"
+	touch $test_name.ok
+else
+	echo "$test_name: $test_description: failed"
+	diff $DIFF_OPTS $EXP $OUT > $test_name.failed
+fi
+
+unset IMAGE FSCK_OPT OUT EXP CONF



* [PATCH 09/35] e2fsck: abort on read error beyond end of FS
  2015-04-02  2:34 [PATCH 00/35] e2fsprogs April 2015 patchbomb Darrick J. Wong
                   ` (7 preceding siblings ...)
  2015-04-02  2:34 ` [PATCH 08/35] tests: verify proper rebuilding of sparse extent trees and block map file conversion Darrick J. Wong
@ 2015-04-02  2:35 ` Darrick J. Wong
  2015-04-02  4:10   ` Andreas Dilger
  2015-04-02  2:35 ` [PATCH 10/35] undo-io: add new calls to and speed up the undo io manager Darrick J. Wong
                   ` (24 subsequent siblings)
  33 siblings, 1 reply; 70+ messages in thread
From: Darrick J. Wong @ 2015-04-02  2:35 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

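Abort if we fail to read a block that's past the end of the FS.  Add a
flag, E2F_FLAG_IGNORE_READ_ERROR, that disables the abort for selected
parts of the fsck run (journal loading, pathname lookup for messages,
check_is_really_dir, and passes 1b and 1d), so that we don't fail on a
busted object prior to fixing it.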

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 e2fsck/e2fsck.h   |    1 +
 e2fsck/ehandler.c |    7 +++++--
 e2fsck/extents.c  |    2 ++
 e2fsck/journal.c  |    3 +++
 e2fsck/message.c  |   12 +++++++++++-
 e2fsck/pass1.c    |   28 ++++++++++++++++------------
 e2fsck/pass1b.c   |    4 ++++
 7 files changed, 42 insertions(+), 15 deletions(-)
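
For readers skimming the diff below, the decision made in the read error
handler and the flag save/restore idiom used in print_pathname() can be
sketched in isolation.  This is only an illustrative model, not e2fsck
code: IGNORE_READ_ERROR, classify_read_error(), and restore_ignore_bit()
are hypothetical names, and the real handler calls abort() and prompts
the user rather than returning an enum.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the new e2fsck context flag. */
#define IGNORE_READ_ERROR 0x10000U

enum read_error_action {
	ACTION_ASK_USER,	/* block is inside the FS: normal prompt path */
	ACTION_IGNORE,		/* past end of FS, caller opted out of abort */
	ACTION_ABORT		/* past end of FS, nobody opted out */
};

/* Decide what to do about a failed read of `block`. */
enum read_error_action classify_read_error(unsigned int flags,
					   uint64_t block,
					   uint64_t blocks_count)
{
	if (block < blocks_count)
		return ACTION_ASK_USER;
	if (flags & IGNORE_READ_ERROR)
		return ACTION_IGNORE;
	return ACTION_ABORT;
}

/*
 * Put only the IGNORE_READ_ERROR bit back to its saved state and leave
 * every other bit of `current` untouched.  The mask is all ones when the
 * bit was previously set, and ~IGNORE_READ_ERROR when it was not.
 */
unsigned int restore_ignore_bit(unsigned int current, unsigned int saved)
{
	return current & (~IGNORE_READ_ERROR | (saved & IGNORE_READ_ERROR));
}
```

Restoring a single bit this way, instead of assigning the saved flags
back wholesale, keeps any other flag that the callee set in the meantime.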


diff --git a/e2fsck/e2fsck.h b/e2fsck/e2fsck.h
index 5fda863..453b552 100644
--- a/e2fsck/e2fsck.h
+++ b/e2fsck/e2fsck.h
@@ -193,6 +193,7 @@ struct resource_track {
 #define E2F_FLAG_TIME_INSANE	0x2000 /* Time is insane */
 #define E2F_FLAG_PROBLEMS_FIXED	0x4000 /* At least one problem was fixed */
 #define E2F_FLAG_ALLOC_OK	0x8000 /* Can we allocate blocks? */
+#define E2F_FLAG_IGNORE_READ_ERROR	0x10000 /* Don't abort on read errors past end of FS */
 
 #define E2F_RESET_FLAGS (E2F_FLAG_TIME_INSANE | E2F_FLAG_PROBLEMS_FIXED)
 
diff --git a/e2fsck/ehandler.c b/e2fsck/ehandler.c
index 71ca301..847f8e5 100644
--- a/e2fsck/ehandler.c
+++ b/e2fsck/ehandler.c
@@ -60,8 +60,11 @@ static errcode_t e2fsck_handle_read_error(io_channel channel,
 	preenhalt(ctx);
 
 	/* Don't rewrite a block past the end of the FS. */
-	if (block >= ext2fs_blocks_count(fs->super))
-		return 0;
+	if (block >= ext2fs_blocks_count(fs->super)) {
+		if (ctx->flags & E2F_FLAG_IGNORE_READ_ERROR)
+			return 0;
+		abort();
+	}
 
 	if (ask(ctx, _("Ignore error"), 1)) {
 		if (ask(ctx, _("Force rewrite"), 1))
diff --git a/e2fsck/extents.c b/e2fsck/extents.c
index 8465299..cff265a 100644
--- a/e2fsck/extents.c
+++ b/e2fsck/extents.c
@@ -29,6 +29,7 @@ errcode_t e2fsck_rebuild_extents_later(e2fsck_t ctx, ext2_ino_t ino)
 {
 	if (!EXT2_HAS_INCOMPAT_FEATURE(ctx->fs->super,
 				       EXT3_FEATURE_INCOMPAT_EXTENTS) ||
+	    (ctx->flags & (E2F_FLAG_RESTART_LATER | E2F_FLAG_RESTART)) ||
 	    (ctx->options & E2F_OPT_NO) ||
 	    (ino != EXT2_ROOT_INO && ino < ctx->fs->super->s_first_ino))
 		return 0;
@@ -339,6 +340,7 @@ static void rebuild_extents(e2fsck_t ctx, const char *pass_name, int pr_header)
 
 	if (!EXT2_HAS_INCOMPAT_FEATURE(ctx->fs->super,
 				       EXT3_FEATURE_INCOMPAT_EXTENTS) ||
+	    (ctx->flags & (E2F_FLAG_RESTART_LATER | E2F_FLAG_RESTART)) ||
 	    !ext2fs_test_valid(ctx->fs) ||
 	    ctx->invalid_bitmaps) {
 		if (ctx->inodes_to_rebuild)
diff --git a/e2fsck/journal.c b/e2fsck/journal.c
index 9f32095..c195797 100644
--- a/e2fsck/journal.c
+++ b/e2fsck/journal.c
@@ -315,6 +315,7 @@ static errcode_t e2fsck_get_journal(e2fsck_t ctx, journal_t **ret_journal)
 	journal->j_inode = NULL;
 	journal->j_blocksize = ctx->fs->blocksize;
 
+	ctx->flags |= E2F_FLAG_IGNORE_READ_ERROR;
 	if (uuid_is_null(sb->s_journal_uuid)) {
 		if (!sb->s_journal_inum) {
 			retval = EXT2_ET_BAD_INODE_NUM;
@@ -518,9 +519,11 @@ static errcode_t e2fsck_get_journal(e2fsck_t ctx, journal_t **ret_journal)
 
 	*ret_journal = journal;
 	e2fsck_use_inode_shortcuts(ctx, 0);
+	ctx->flags &= ~E2F_FLAG_IGNORE_READ_ERROR;
 	return 0;
 
 errout:
+	ctx->flags &= ~E2F_FLAG_IGNORE_READ_ERROR;
 	e2fsck_use_inode_shortcuts(ctx, 0);
 	if (dev_fs)
 		ext2fs_free_mem(&dev_fs);
diff --git a/e2fsck/message.c b/e2fsck/message.c
index 9c1433f..510f291 100644
--- a/e2fsck/message.c
+++ b/e2fsck/message.c
@@ -199,14 +199,24 @@ static void print_pathname(FILE *f, ext2_filsys fs, ext2_ino_t dir,
 {
 	errcode_t	retval = 0;
 	char		*path;
+	e2fsck_t	ctx = fs ? (e2fsck_t) fs->priv_data : NULL;
+	int		flags;
 
 	if (!dir && (ino < num_special_inodes)) {
 		fputs(_(special_inode_name[ino]), f);
 		return;
 	}
 
-	if (fs)
+	if (fs) {
+		if (ctx) {
+			flags = ctx->flags;
+			ctx->flags |= E2F_FLAG_IGNORE_READ_ERROR;
+		}
 		retval = ext2fs_get_pathname(fs, dir, ino, &path);
+		if (ctx)
+			ctx->flags &= ~E2F_FLAG_IGNORE_READ_ERROR |
+					(flags & E2F_FLAG_IGNORE_READ_ERROR);
+	}
 	if (!fs || retval)
 		fputs("???", f);
 	else {
diff --git a/e2fsck/pass1.c b/e2fsck/pass1.c
index 308a95a..760fbde 100644
--- a/e2fsck/pass1.c
+++ b/e2fsck/pass1.c
@@ -510,6 +510,7 @@ static void check_is_really_dir(e2fsck_t ctx, struct problem_context *pctx,
 	int			extent_fs;
 	int			inlinedata_fs;
 
+	ctx->flags |= E2F_FLAG_IGNORE_READ_ERROR;
 	/*
 	 * If the mode looks OK, we believe it.  If the first block in
 	 * the i_block array is 0, this cannot be a directory. If the
@@ -519,7 +520,7 @@ static void check_is_really_dir(e2fsck_t ctx, struct problem_context *pctx,
 	 */
 	if (LINUX_S_ISDIR(inode->i_mode) || LINUX_S_ISREG(inode->i_mode) ||
 	    LINUX_S_ISLNK(inode->i_mode) || inode->i_block[0] == 0)
-		return;
+		goto out;
 
 	/* 
 	 * Check the block numbers in the i_block array for validity:
@@ -552,13 +553,13 @@ static void check_is_really_dir(e2fsck_t ctx, struct problem_context *pctx,
 		struct ext2_dir_entry de;
 
 		if (ext2fs_inline_data_size(ctx->fs, pctx->ino, &size))
-			return;
+			goto out;
 		/*
 		 * If the size isn't a multiple of 4, it's probably not a
 		 * directory??
 		 */
 		if (size & 3)
-			return;
+			goto out;
 		/*
 		 * If the first 10 bytes don't look like a directory entry,
 		 * it's probably not a directory.
@@ -578,14 +579,14 @@ static void check_is_really_dir(e2fsck_t ctx, struct problem_context *pctx,
 		     de.inode != 0) ||
 		    rec_len > EXT4_MIN_INLINE_DATA_SIZE -
 			      EXT4_INLINE_DATA_DOTDOT_SIZE)
-			return;
+			goto out;
 		/* device files never have a "system.data" entry */
 		goto isdir;
 	} else if (extent_fs && (inode->i_flags & EXT4_EXTENTS_FL)) {
 		/* extent mapped */
 		if  (ext2fs_bmap2(ctx->fs, pctx->ino, inode, 0, 0, 0, 0,
 				 &blk))
-			return;
+			goto out;
 		/* device files are never extent mapped */
 		not_device++;
 	} else {
@@ -600,7 +601,7 @@ static void check_is_really_dir(e2fsck_t ctx, struct problem_context *pctx,
 			    blk >= ext2fs_blocks_count(ctx->fs->super) ||
 			    ext2fs_fast_test_block_bitmap2(ctx->block_found_map,
 							   blk))
-				return;	/* Invalid block, can't be dir */
+				goto out;	/* Invalid block, can't be dir */
 		}
 		blk = inode->i_block[0];
 	}
@@ -612,45 +613,48 @@ static void check_is_really_dir(e2fsck_t ctx, struct problem_context *pctx,
 	 */
 	if ((LINUX_S_ISCHR(inode->i_mode) || LINUX_S_ISBLK(inode->i_mode)) &&
 	    (inode->i_links_count == 1) && !not_device)
-		return;
+		goto out;
 
 	/* read the first block */
 	ehandler_operation(_("reading directory block"));
 	retval = ext2fs_read_dir_block4(ctx->fs, blk, buf, 0, pctx->ino);
 	ehandler_operation(0);
 	if (retval)
-		return;
+		goto out;
 
 	dirent = (struct ext2_dir_entry *) buf;
 	retval = ext2fs_get_rec_len(ctx->fs, dirent, &rec_len);
 	if (retval)
-		return;
+		goto out;
 	if ((ext2fs_dirent_name_len(dirent) != 1) ||
 	    (dirent->name[0] != '.') ||
 	    (dirent->inode != pctx->ino) ||
 	    (rec_len < 12) ||
 	    (rec_len % 4) ||
 	    (rec_len >= ctx->fs->blocksize - 12))
-		return;
+		goto out;
 
 	dirent = (struct ext2_dir_entry *) (buf + rec_len);
 	retval = ext2fs_get_rec_len(ctx->fs, dirent, &rec_len);
 	if (retval)
-		return;
+		goto out;
 	if ((ext2fs_dirent_name_len(dirent) != 2) ||
 	    (dirent->name[0] != '.') ||
 	    (dirent->name[1] != '.') ||
 	    (rec_len < 12) ||
 	    (rec_len % 4))
-		return;
+		goto out;
 
 isdir:
+	ctx->flags &= ~E2F_FLAG_IGNORE_READ_ERROR;
 	if (fix_problem(ctx, PR_1_TREAT_AS_DIRECTORY, pctx)) {
 		inode->i_mode = (inode->i_mode & 07777) | LINUX_S_IFDIR;
 		e2fsck_write_inode_full(ctx, pctx->ino, inode,
 					EXT2_INODE_SIZE(ctx->fs->super),
 					"check_is_really_dir");
 	}
+out:
+	ctx->flags &= ~E2F_FLAG_IGNORE_READ_ERROR;
 }
 
 void e2fsck_setup_tdb_icount(e2fsck_t ctx, int flags,
diff --git a/e2fsck/pass1b.c b/e2fsck/pass1b.c
index cd967f4..10136a6 100644
--- a/e2fsck/pass1b.c
+++ b/e2fsck/pass1b.c
@@ -234,7 +234,9 @@ void e2fsck_pass1_dupblocks(e2fsck_t ctx, char *block_buf)
 	dict_set_allocator(&clstr_dict, NULL, cluster_dnode_free, NULL);
 
 	init_resource_track(&rtrack, ctx->fs->io);
+	ctx->flags |= E2F_FLAG_IGNORE_READ_ERROR;
 	pass1b(ctx, block_buf);
+	ctx->flags &= ~E2F_FLAG_IGNORE_READ_ERROR;
 	print_resource_track(ctx, "Pass 1b", &rtrack, ctx->fs->io);
 
 	init_resource_track(&rtrack, ctx->fs->io);
@@ -242,7 +244,9 @@ void e2fsck_pass1_dupblocks(e2fsck_t ctx, char *block_buf)
 	print_resource_track(ctx, "Pass 1c", &rtrack, ctx->fs->io);
 
 	init_resource_track(&rtrack, ctx->fs->io);
+	ctx->flags |= E2F_FLAG_IGNORE_READ_ERROR;
 	pass1d(ctx, block_buf);
+	ctx->flags &= ~E2F_FLAG_IGNORE_READ_ERROR;
 	print_resource_track(ctx, "Pass 1d", &rtrack, ctx->fs->io);
 
 	/*



* [PATCH 10/35] undo-io: add new calls to and speed up the undo io manager
  2015-04-02  2:34 [PATCH 00/35] e2fsprogs April 2015 patchbomb Darrick J. Wong
                   ` (8 preceding siblings ...)
  2015-04-02  2:35 ` [PATCH 09/35] e2fsck: abort on read error beyond end of FS Darrick J. Wong
@ 2015-04-02  2:35 ` Darrick J. Wong
  2015-04-02  4:06   ` Andreas Dilger
  2015-05-05 14:20   ` Theodore Ts'o
  2015-04-02  2:35 ` [PATCH 11/35] undo-io: be more flexible about setting block size Darrick J. Wong
                   ` (23 subsequent siblings)
  33 siblings, 2 replies; 70+ messages in thread
From: Darrick J. Wong @ 2015-04-02  2:35 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

Implement pass-through calls for discard, zero-out, and readahead in
the undo IO manager so that we can take advantage of any underlying
support.

Furthermore, improve tdb write-out speed by disabling locking and only
fsyncing at the end.  We don't care about locking because having
multiple writers to the undo file will produce an undo database full
of garbage blocks anyway, and we only need to fsync at the end because
if we fail before the end, our undo file will lack the necessary
superblock data that e2undo requires to do replay safely.  Without this
change, we call fsync four times per tdb update(!).  With it, the
overhead of using undo_io while converting a 2TB FS to metadata_csum
drops from 3+ hours to 55 minutes.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 lib/ext2fs/tdb.c     |   10 ++++++
 lib/ext2fs/tdb.h     |    2 +
 lib/ext2fs/undo_io.c |   87 +++++++++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 97 insertions(+), 2 deletions(-)
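
The ordering that the new discard and zero-out calls rely on (save the
old contents to the undo log first, then forward the request) can be
shown with a small stand-alone model.  This is a sketch under stated
assumptions, not the undo_io.c implementation: struct toy_undo,
toy_undo_discard(), and the fixed-size log are hypothetical, and the
real code returns errcode_t values such as EXT2_ET_UNIMPLEMENTED.

```c
#include <assert.h>
#include <limits.h>

#define MAX_LOG 16

/* Toy stand-in for the undo channel's private data (names hypothetical). */
struct toy_undo {
	unsigned long long saved_block[MAX_LOG];	/* start of each saved range */
	int saved_count[MAX_LOG];			/* length of each saved range */
	int nr_saved;					/* entries used in the log */
	int has_real;					/* is there a lower channel? */
	int real_discards;				/* discards forwarded so far */
};

/* Returns 0 on success, -1 if the range is too large or the log is full. */
int toy_undo_discard(struct toy_undo *u, unsigned long long block,
		     unsigned long long count)
{
	/* The save step takes an int count, so refuse larger ranges. */
	if (count > INT_MAX)
		return -1;
	if (u->nr_saved >= MAX_LOG)
		return -1;

	/* Step 1: record the old contents before they can be destroyed. */
	u->saved_block[u->nr_saved] = block;
	u->saved_count[u->nr_saved] = (int) count;
	u->nr_saved++;

	/* Step 2: only then pass the request down, if a lower layer exists. */
	if (u->has_real)
		u->real_discards++;
	return 0;
}
```

A rejected request touches neither the log nor the lower layer, so a
caller can fall back to another method without leaving a half-recorded
undo entry behind.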


diff --git a/lib/ext2fs/tdb.c b/lib/ext2fs/tdb.c
index 1d97685..7317288 100644
--- a/lib/ext2fs/tdb.c
+++ b/lib/ext2fs/tdb.c
@@ -4142,3 +4142,13 @@ int tdb_reopen_all(int parent_longlived)
 
 	return 0;
 }
+
+/**
+ * Flush a database file from the page cache.
+ **/
+int tdb_flush(struct tdb_context *tdb)
+{
+	if (tdb->fd != -1)
+		return fsync(tdb->fd);
+	return 0;
+}
diff --git a/lib/ext2fs/tdb.h b/lib/ext2fs/tdb.h
index 732ef0e..6a4086c 100644
--- a/lib/ext2fs/tdb.h
+++ b/lib/ext2fs/tdb.h
@@ -129,6 +129,7 @@ typedef struct TDB_DATA {
 #define tdb_lockall_nonblock ext2fs_tdb_lockall_nonblock
 #define tdb_lockall_read_nonblock ext2fs_tdb_lockall_read_nonblock
 #define tdb_lockall_unmark ext2fs_tdb_lockall_unmark
+#define tdb_flush ext2fs_tdb_flush
 
 /* this is the context structure that is returned from a db open */
 typedef struct tdb_context TDB_CONTEXT;
@@ -191,6 +192,7 @@ size_t tdb_map_size(struct tdb_context *tdb);
 int tdb_get_flags(struct tdb_context *tdb);
 void tdb_enable_seqnum(struct tdb_context *tdb);
 void tdb_increment_seqnum_nonblock(struct tdb_context *tdb);
+int tdb_flush(struct tdb_context *tdb);
 
 /* Low level locking functions: use with care */
 int tdb_chainlock(struct tdb_context *tdb, TDB_DATA key);
diff --git a/lib/ext2fs/undo_io.c b/lib/ext2fs/undo_io.c
index d6beb02..94317cb 100644
--- a/lib/ext2fs/undo_io.c
+++ b/lib/ext2fs/undo_io.c
@@ -37,6 +37,7 @@
 #if HAVE_SYS_RESOURCE_H
 #include <sys/resource.h>
 #endif
+#include <limits.h>
 
 #include "tdb.h"
 
@@ -354,8 +355,12 @@ static errcode_t undo_open(const char *name, int flags, io_channel *channel)
 		data->real = 0;
 	}
 
+	if (data->real)
+		io->flags = (io->flags & ~CHANNEL_FLAGS_DISCARD_ZEROES) |
+			    (data->real->flags & CHANNEL_FLAGS_DISCARD_ZEROES);
+
 	/* setup the tdb file */
-	data->tdb = tdb_open(tdb_file, 0, TDB_CLEAR_IF_FIRST,
+	data->tdb = tdb_open(tdb_file, 0, TDB_CLEAR_IF_FIRST | TDB_NOLOCK | TDB_NOSYNC,
 			     O_RDWR | O_CREAT | O_TRUNC | O_EXCL, 0600);
 	if (!data->tdb) {
 		retval = errno;
@@ -399,8 +404,10 @@ static errcode_t undo_close(io_channel channel)
 		return retval;
 	if (data->real)
 		retval = io_channel_close(data->real);
-	if (data->tdb)
+	if (data->tdb) {
+		tdb_flush(data->tdb);
 		tdb_close(data->tdb);
+	}
 	ext2fs_free_mem(&channel->private_data);
 	if (channel->name)
 		ext2fs_free_mem(&channel->name);
@@ -510,6 +517,77 @@ static errcode_t undo_write_byte(io_channel channel, unsigned long offset,
 	return retval;
 }
 
+static errcode_t undo_discard(io_channel channel, unsigned long long block,
+			      unsigned long long count)
+{
+	struct undo_private_data *data;
+	errcode_t	retval = 0;
+	int icount;
+
+	EXT2_CHECK_MAGIC(channel, EXT2_ET_MAGIC_IO_CHANNEL);
+	data = (struct undo_private_data *) channel->private_data;
+	EXT2_CHECK_MAGIC(data, EXT2_ET_MAGIC_UNIX_IO_CHANNEL);
+
+	if (count > INT_MAX)
+		return EXT2_ET_UNIMPLEMENTED;
+	icount = count;
+
+	/*
+	 * First write the existing content into database
+	 */
+	retval = undo_write_tdb(channel, block, icount);
+	if (retval)
+		return retval;
+	if (data->real)
+		retval = io_channel_discard(data->real, block, count);
+
+	return retval;
+}
+
+static errcode_t undo_zeroout(io_channel channel, unsigned long long block,
+			      unsigned long long count)
+{
+	struct undo_private_data *data;
+	errcode_t	retval = 0;
+	int icount;
+
+	EXT2_CHECK_MAGIC(channel, EXT2_ET_MAGIC_IO_CHANNEL);
+	data = (struct undo_private_data *) channel->private_data;
+	EXT2_CHECK_MAGIC(data, EXT2_ET_MAGIC_UNIX_IO_CHANNEL);
+
+	if (count > INT_MAX)
+		return EXT2_ET_UNIMPLEMENTED;
+	icount = count;
+
+	/*
+	 * First write the existing content into database
+	 */
+	retval = undo_write_tdb(channel, block, icount);
+	if (retval)
+		return retval;
+	if (data->real)
+		retval = io_channel_zeroout(data->real, block, count);
+
+	return retval;
+}
+
+static errcode_t undo_cache_readahead(io_channel channel,
+				      unsigned long long block,
+				      unsigned long long count)
+{
+	struct undo_private_data *data;
+	errcode_t	retval = 0;
+
+	EXT2_CHECK_MAGIC(channel, EXT2_ET_MAGIC_IO_CHANNEL);
+	data = (struct undo_private_data *) channel->private_data;
+	EXT2_CHECK_MAGIC(data, EXT2_ET_MAGIC_UNIX_IO_CHANNEL);
+
+	if (data->real)
+		retval = io_channel_cache_readahead(data->real, block, count);
+
+	return retval;
+}
+
 /*
  * Flush data buffers to disk.
  */
@@ -522,6 +600,8 @@ static errcode_t undo_flush(io_channel channel)
 	data = (struct undo_private_data *) channel->private_data;
 	EXT2_CHECK_MAGIC(data, EXT2_ET_MAGIC_UNIX_IO_CHANNEL);
 
+	if (data->tdb)
+		tdb_flush(data->tdb);
 	if (data->real)
 		retval = io_channel_flush(data->real);
 
@@ -601,6 +681,9 @@ static struct struct_io_manager struct_undo_manager = {
 	.get_stats	= undo_get_stats,
 	.read_blk64	= undo_read_blk64,
 	.write_blk64	= undo_write_blk64,
+	.discard	= undo_discard,
+	.zeroout	= undo_zeroout,
+	.cache_readahead	= undo_cache_readahead,
 };
 
 io_manager undo_io_manager = &struct_undo_manager;



* [PATCH 11/35] undo-io: be more flexible about setting block size
  2015-04-02  2:34 [PATCH 00/35] e2fsprogs April 2015 patchbomb Darrick J. Wong
                   ` (9 preceding siblings ...)
  2015-04-02  2:35 ` [PATCH 10/35] undo-io: add new calls to and speed up the undo io manager Darrick J. Wong
@ 2015-04-02  2:35 ` Darrick J. Wong
  2015-05-05 14:21   ` Theodore Ts'o
  2015-04-02  2:35 ` [PATCH 12/35] undo-io: use a bitmap to track what we've already written Darrick J. Wong
                   ` (22 subsequent siblings)
  33 siblings, 1 reply; 70+ messages in thread
From: Darrick J. Wong @ 2015-04-02  2:35 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

Most of the e2fsprogs utilities set the IO block size multiple times
(once to 1k to read the superblock, then again to set the real block
size if we find a real superblock).  Unfortunately, the undo IO
manager only lets the block size be set once.  For the non-mke2fs
utilities we'd rather catch the real block size and use that.  mke2fs
of course wants to use a really large block size since it's probably
writing a lot of data.

Therefore, if we haven't written any blocks to the undo file, it's
perfectly fine to allow block size changes.  For mke2fs, we'll modify
the IO channel option that lets us set the huge size to lock that
in place.  This greatly reduces index overhead for undo files for
e2fsck/tune2fs/resize2fs while continuing the practice of reducing
it even more for mke2fs.
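
As a rough illustration (not part of the patch), the three states that
tdb_written takes on after this change can be modelled in isolation.
The struct and the sketch_* helpers below are hypothetical stand-ins
for the two fields of struct undo_private_data and the three hunks
that touch them; only the conditions are copied from the diff.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the two fields this patch manipulates. */
struct blksize_state {
	int tdb_data_size;	/* block size used for undo records */
	int tdb_written;	/* 0: untouched, -1: size locked by option, 1: data written */
};

/* Same condition as the undo_set_blksize() hunk: the size may keep
 * changing until it has been locked or a block has been written. */
static void sketch_set_blksize(struct blksize_state *s, int blksize)
{
	assert(s != NULL);
	if (!s->tdb_data_size || !s->tdb_written)
		s->tdb_data_size = blksize;
}

/* Same condition as the undo_set_option() hunk: the caller (mke2fs)
 * picks a size and locks it in by setting tdb_written to -1. */
static void sketch_set_option(struct blksize_state *s, int size)
{
	assert(s != NULL);
	if (!s->tdb_data_size || !s->tdb_written) {
		s->tdb_written = -1;
		s->tdb_data_size = size;
	}
}

/* Same test as the undo_write_tdb() hunk: the first write moves the
 * state to 1 whether it was 0 or -1 before. */
static void sketch_first_write(struct blksize_state *s)
{
	assert(s != NULL);
	if (s->tdb_written != 1)
		s->tdb_written = 1;
}
```

With this model, a tool that sets 1k and then 4k before writing ends
up with 4k, while a size set through the option path survives later
sketch_set_blksize() calls.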

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 lib/ext2fs/undo_io.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)


diff --git a/lib/ext2fs/undo_io.c b/lib/ext2fs/undo_io.c
index 94317cb..70b90d5 100644
--- a/lib/ext2fs/undo_io.c
+++ b/lib/ext2fs/undo_io.c
@@ -265,7 +265,7 @@ static errcode_t undo_write_tdb(io_channel channel,
 		       tdb_data.dptr,
 		       tdb_data.dsize);
 #endif
-		if (!data->tdb_written) {
+		if (data->tdb_written != 1) {
 			data->tdb_written = 1;
 			/* Write the blocksize to tdb file */
 			retval = write_block_size(data->tdb,
@@ -430,9 +430,8 @@ static errcode_t undo_set_blksize(io_channel channel, int blksize)
 	/*
 	 * Set the block size used for tdb
 	 */
-	if (!data->tdb_data_size) {
+	if (!data->tdb_data_size || !data->tdb_written)
 		data->tdb_data_size = blksize;
-	}
 	channel->block_size = blksize;
 	return retval;
 }
@@ -628,6 +627,7 @@ static errcode_t undo_set_option(io_channel channel, const char *option,
 		if (*end)
 			return EXT2_ET_INVALID_ARGUMENT;
 		if (!data->tdb_data_size || !data->tdb_written) {
+			data->tdb_written = -1;
 			data->tdb_data_size = tmp;
 		}
 		return 0;



* [PATCH 12/35] undo-io: use a bitmap to track what we've already written
  2015-04-02  2:34 [PATCH 00/35] e2fsprogs April 2015 patchbomb Darrick J. Wong
                   ` (10 preceding siblings ...)
  2015-04-02  2:35 ` [PATCH 11/35] undo-io: be more flexible about setting block size Darrick J. Wong
@ 2015-04-02  2:35 ` Darrick J. Wong
  2015-05-05 14:21   ` Theodore Ts'o
  2015-04-02  2:35 ` [PATCH 13/35] e2undo: fix memory leaks and tweak the error messages somewhat Darrick J. Wong
                   ` (21 subsequent siblings)
  33 siblings, 1 reply; 70+ messages in thread
From: Darrick J. Wong @ 2015-04-02  2:35 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

It's really inefficient to (ab)use the TDB key store as a bitmap to
find out if we've already written a block to the undo file, because
the tdb code reads the database key btree disk blocks for *every*
query.  Changing that logic to a bitmap reduces overhead by a large
margin -- the overhead of using undo_io while converting a 2TB FS to
metadata_csum is reduced from 55 minutes to 45.
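
For illustration only, the "have we saved this block yet?" test can be
sketched with a flat in-memory bitmap.  The patch itself uses an
rbtree-backed ext2fs block bitmap; the written_map type and its
helpers below are hypothetical and assume a known upper bound on the
block number.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical flat bitmap: one bit per undo block. */
struct written_map {
	uint64_t nblocks;
	unsigned char *bits;
};

/* Allocate a zeroed bitmap covering blocks 0..nblocks-1. */
static int written_map_init(struct written_map *m, uint64_t nblocks)
{
	m->nblocks = nblocks;
	m->bits = calloc((size_t)((nblocks + CHAR_BIT - 1) / CHAR_BIT), 1);
	return m->bits ? 0 : -1;
}

/* Return 1 if blk was already recorded; otherwise record it and
 * return 0.  No disk access is involved, unlike a key-store query. */
static int written_map_test_and_set(struct written_map *m, uint64_t blk)
{
	unsigned char mask = (unsigned char)(1u << (blk % CHAR_BIT));
	unsigned char *byte;

	assert(blk < m->nblocks);
	byte = &m->bits[blk / CHAR_BIT];
	if (*byte & mask)
		return 1;
	*byte |= mask;
	return 0;
}

static void written_map_free(struct written_map *m)
{
	free(m->bits);
	m->bits = NULL;
}
```

The write path would call written_map_test_and_set() once per block
and skip the copy-to-undo-file step whenever it returns 1.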

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 lib/ext2fs/undo_io.c |   69 +++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 51 insertions(+), 18 deletions(-)


diff --git a/lib/ext2fs/undo_io.c b/lib/ext2fs/undo_io.c
index 70b90d5..9a01e30 100644
--- a/lib/ext2fs/undo_io.c
+++ b/lib/ext2fs/undo_io.c
@@ -70,6 +70,9 @@ struct undo_private_data {
 
 	/* to support offset in unix I/O manager */
 	ext2_loff_t offset;
+
+	ext2fs_block_bitmap written_block_map;
+	struct struct_ext2_filsys fake_fs;
 };
 
 static io_manager undo_io_backing_manager;
@@ -164,6 +167,38 @@ static errcode_t write_block_size(TDB_CONTEXT *tdb, int block_size)
 	return retval;
 }
 
+static errcode_t undo_setup_tdb(struct undo_private_data *data)
+{
+	errcode_t retval;
+
+	if (data->tdb_written == 1)
+		return 0;
+
+	data->tdb_written = 1;
+
+	/* Make a bitmap to track what we've written */
+	memset(&data->fake_fs, 0, sizeof(data->fake_fs));
+	data->fake_fs.blocksize = data->tdb_data_size;
+	retval = ext2fs_alloc_generic_bmap(&data->fake_fs,
+				EXT2_ET_MAGIC_BLOCK_BITMAP64,
+				EXT2FS_BMAP64_RBTREE,
+				0, ~1ULL, ~1ULL,
+				"undo block map", &data->written_block_map);
+	if (retval)
+		return retval;
+
+	/* Write the blocksize to tdb file */
+	tdb_transaction_start(data->tdb);
+	retval = write_block_size(data->tdb,
+				  data->tdb_data_size);
+	if (retval) {
+		tdb_transaction_cancel(data->tdb);
+		return EXT2_ET_TDB_ERR_IO;
+	}
+	tdb_transaction_commit(data->tdb);
+	return 0;
+}
+
 static errcode_t undo_write_tdb(io_channel channel,
 				unsigned long long block, int count)
 
@@ -194,6 +229,10 @@ static errcode_t undo_write_tdb(io_channel channel,
 		else
 			size = count * channel->block_size;
 	}
+
+	retval = undo_setup_tdb(data);
+	if (retval)
+		return retval;
 	/*
 	 * Data is stored in tdb database as blocks of tdb_data_size size
 	 * This helps in efficient lookup further.
@@ -212,11 +251,14 @@ static errcode_t undo_write_tdb(io_channel channel,
 		/*
 		 * Check if we have the record already
 		 */
-		if (tdb_exists(data->tdb, tdb_key)) {
+		if (ext2fs_test_block_bitmap2(data->written_block_map,
+						   block_num)) {
 			/* Try the next block */
 			block_num++;
 			continue;
 		}
+		ext2fs_mark_block_bitmap2(data->written_block_map, block_num);
+
 		/*
 		 * Read one block using the backing I/O manager
 		 * The backing I/O manager block size may be
@@ -265,19 +307,7 @@ static errcode_t undo_write_tdb(io_channel channel,
 		       tdb_data.dptr,
 		       tdb_data.dsize);
 #endif
-		if (data->tdb_written != 1) {
-			data->tdb_written = 1;
-			/* Write the blocksize to tdb file */
-			retval = write_block_size(data->tdb,
-						  data->tdb_data_size);
-			if (retval) {
-				tdb_transaction_cancel(data->tdb);
-				retval = EXT2_ET_TDB_ERR_IO;
-				free(read_ptr);
-				return retval;
-			}
-		}
-		retval = tdb_store(data->tdb, tdb_key, tdb_data, TDB_INSERT);
+		retval = tdb_store(data->tdb, tdb_key, tdb_data, TDB_REPLACE);
 		if (retval == -1) {
 			/*
 			 * TDB_ERR_EXISTS cannot happen because we
@@ -345,6 +375,7 @@ static errcode_t undo_open(const char *name, int flags, io_channel *channel)
 
 	memset(data, 0, sizeof(struct undo_private_data));
 	data->magic = EXT2_ET_MAGIC_UNIX_IO_CHANNEL;
+	data->written_block_map = NULL;
 
 	if (undo_io_backing_manager) {
 		retval = undo_io_backing_manager->open(name, flags,
@@ -390,7 +421,7 @@ cleanup:
 static errcode_t undo_close(io_channel channel)
 {
 	struct undo_private_data *data;
-	errcode_t	retval = 0;
+	errcode_t	err, retval = 0;
 
 	EXT2_CHECK_MAGIC(channel, EXT2_ET_MAGIC_IO_CHANNEL);
 	data = (struct undo_private_data *) channel->private_data;
@@ -399,20 +430,22 @@ static errcode_t undo_close(io_channel channel)
 	if (--channel->refcount > 0)
 		return 0;
 	/* Before closing write the file system identity */
-	retval = write_file_system_identity(channel, data->tdb);
-	if (retval)
-		return retval;
+	err = write_file_system_identity(channel, data->tdb);
 	if (data->real)
 		retval = io_channel_close(data->real);
 	if (data->tdb) {
 		tdb_flush(data->tdb);
 		tdb_close(data->tdb);
 	}
+	if (data->written_block_map)
+		ext2fs_free_generic_bitmap(data->written_block_map);
 	ext2fs_free_mem(&channel->private_data);
 	if (channel->name)
 		ext2fs_free_mem(&channel->name);
 	ext2fs_free_mem(&channel);
 
+	if (err)
+		return err;
 	return retval;
 }
 



* [PATCH 13/35] e2undo: fix memory leaks and tweak the error messages somewhat
  2015-04-02  2:34 [PATCH 00/35] e2fsprogs April 2015 patchbomb Darrick J. Wong
                   ` (11 preceding siblings ...)
  2015-04-02  2:35 ` [PATCH 12/35] undo-io: use a bitmap to track what we've already written Darrick J. Wong
@ 2015-04-02  2:35 ` Darrick J. Wong
  2015-05-05 14:22   ` Theodore Ts'o
  2015-04-02  2:35 ` [PATCH 14/35] e2undo: ditch tdb file, write everything to a flat file Darrick J. Wong
                   ` (20 subsequent siblings)
  33 siblings, 1 reply; 70+ messages in thread
From: Darrick J. Wong @ 2015-04-02  2:35 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

Fix memory leaks and improve the error messages to make it easier
to figure out why e2undo went wrong.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 misc/e2undo.c |   43 ++++++++++++++++++++++++-------------------
 1 file changed, 24 insertions(+), 19 deletions(-)


diff --git a/misc/e2undo.c b/misc/e2undo.c
index a43c26f..d828d3b 100644
--- a/misc/e2undo.c
+++ b/misc/e2undo.c
@@ -49,7 +49,7 @@ static int check_filesystem(TDB_CONTEXT *tdb, io_channel channel)
 	retval = io_channel_read_blk64(channel, 1, -SUPERBLOCK_SIZE, &super);
 	if (retval) {
 		com_err(prg_name, retval,
-			"%s", _("Failed to read the file system data \n"));
+			"%s", _("while reading filesystem superblock."));
 		return retval;
 	}
 
@@ -58,16 +58,16 @@ static int check_filesystem(TDB_CONTEXT *tdb, io_channel channel)
 	tdb_data = tdb_fetch(tdb, tdb_key);
 	if (!tdb_data.dptr) {
 		retval = EXT2_ET_TDB_SUCCESS + tdb_error(tdb);
-		com_err(prg_name, retval,
-			_("Failed tdb_fetch %s\n"), tdb_errorstr(tdb));
+		com_err(prg_name, retval, "%s",
+			_("while fetching last mount time."));
 		return retval;
 	}
 
 	s_mtime = *(__u32 *)tdb_data.dptr;
+	free(tdb_data.dptr);
 	if (super.s_mtime != s_mtime) {
-
 		com_err(prg_name, 0,
-			_("The file system Mount time didn't match %u\n"),
+			_("The filesystem last mount time didn't match %u."),
 			s_mtime);
 
 		return  -1;
@@ -79,14 +79,14 @@ static int check_filesystem(TDB_CONTEXT *tdb, io_channel channel)
 	tdb_data = tdb_fetch(tdb, tdb_key);
 	if (!tdb_data.dptr) {
 		retval = EXT2_ET_TDB_SUCCESS + tdb_error(tdb);
-		com_err(prg_name, retval,
-			_("Failed tdb_fetch %s\n"), tdb_errorstr(tdb));
+		com_err(prg_name, retval, "%s", _("while fetching UUID"));
 		return retval;
 	}
 	memcpy(s_uuid, tdb_data.dptr, sizeof(s_uuid));
+	free(tdb_data.dptr);
 	if (memcmp(s_uuid, super.s_uuid, sizeof(s_uuid))) {
 		com_err(prg_name, 0, "%s",
-			_("The file system UUID didn't match \n"));
+			_("The filesystem UUID didn't match."));
 		return -1;
 	}
 
@@ -104,12 +104,12 @@ static int set_blk_size(TDB_CONTEXT *tdb, io_channel channel)
 	tdb_data = tdb_fetch(tdb, tdb_key);
 	if (!tdb_data.dptr) {
 		retval = EXT2_ET_TDB_SUCCESS + tdb_error(tdb);
-		com_err(prg_name, retval,
-			_("Failed tdb_fetch %s\n"), tdb_errorstr(tdb));
+		com_err(prg_name, retval, "%s", _("while fetching block size"));
 		return retval;
 	}
 
 	block_size = *(int *)tdb_data.dptr;
+	free(tdb_data.dptr);
 #ifdef DEBUG
 	printf("Block size %d\n", block_size);
 #endif
@@ -129,6 +129,7 @@ int main(int argc, char *argv[])
 	blk64_t  blk_num;
 	char *device_name, *tdb_file;
 	io_manager manager = unix_io_manager;
+	void *old_dptr = NULL;
 
 #ifdef ENABLE_NLS
 	setlocale(LC_MESSAGES, "");
@@ -160,20 +161,20 @@ int main(int argc, char *argv[])
 
 	if (!tdb) {
 		com_err(prg_name, errno,
-				_("Failed tdb_open %s\n"), tdb_file);
+				_("while opening undo file `%s'\n"), tdb_file);
 		exit(1);
 	}
 
 	retval = ext2fs_check_if_mounted(device_name, &mount_flags);
 	if (retval) {
 		com_err(prg_name, retval, _("Error while determining whether "
-				"%s is mounted.\n"), device_name);
+				"%s is mounted."), device_name);
 		exit(1);
 	}
 
 	if (mount_flags & EXT2_MF_MOUNTED) {
 		com_err(prg_name, retval, "%s", _("e2undo should only be run "
-						"on unmounted file system\n"));
+						"on unmounted filesystems"));
 		exit(1);
 	}
 
@@ -181,7 +182,7 @@ int main(int argc, char *argv[])
 				IO_FLAG_EXCLUSIVE | IO_FLAG_RW,  &channel);
 	if (retval) {
 		com_err(prg_name, retval,
-				_("Failed to open %s\n"), device_name);
+				_("while opening `%s'"), device_name);
 		exit(1);
 	}
 
@@ -194,30 +195,34 @@ int main(int argc, char *argv[])
 	}
 
 	for (key = tdb_firstkey(tdb); key.dptr; key = tdb_nextkey(tdb, key)) {
+		free(old_dptr);
+		old_dptr = key.dptr;
 		if (!strcmp((char *) key.dptr, (char *) mtime_key) ||
 		    !strcmp((char *) key.dptr, (char *) uuid_key) ||
 		    !strcmp((char *) key.dptr, (char *) blksize_key)) {
 			continue;
 		}
 
+		blk_num = *(blk64_t *)key.dptr;
 		data = tdb_fetch(tdb, key);
 		if (!data.dptr) {
-			com_err(prg_name, 0,
-				_("Failed tdb_fetch %s\n"), tdb_errorstr(tdb));
+			retval = EXT2_ET_TDB_SUCCESS + tdb_error(tdb);
+			com_err(prg_name, retval,
+				_("while fetching block %llu."), blk_num);
 			exit(1);
 		}
-		blk_num = *(blk64_t *)key.dptr;
 		printf(_("Replayed transaction of size %zd at location %llu\n"),
 							data.dsize, blk_num);
 		retval = io_channel_write_blk64(channel, blk_num,
 						-data.dsize, data.dptr);
+		free(data.dptr);
 		if (retval == -1) {
 			com_err(prg_name, retval,
-					_("Failed write %s\n"),
-					strerror(errno));
+				_("while writing block %llu."), blk_num);
 			exit(1);
 		}
 	}
+	free(old_dptr);
 	io_channel_close(channel);
 	tdb_close(tdb);
 



* [PATCH 14/35] e2undo: ditch tdb file, write everything to a flat file
  2015-04-02  2:34 [PATCH 00/35] e2fsprogs April 2015 patchbomb Darrick J. Wong
                   ` (12 preceding siblings ...)
  2015-04-02  2:35 ` [PATCH 13/35] e2undo: fix memory leaks and tweak the error messages somewhat Darrick J. Wong
@ 2015-04-02  2:35 ` Darrick J. Wong
  2015-05-05 14:24   ` Theodore Ts'o
  2015-04-02  2:35 ` [PATCH 15/35] libext2fs: support atexit cleanups Darrick J. Wong
                   ` (19 subsequent siblings)
  33 siblings, 1 reply; 70+ messages in thread
From: Darrick J. Wong @ 2015-04-02  2:35 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

The existing undo file format (which is based on tdb) has many
problems.  First, its comparison of superblock fields is ineffective,
since the last mount time is only written by the kernel, not the
tools, which means that undo files can be applied out of order, thus
corrupting the filesystem.  Second, block numbers are written in CPU
byte order, which will cause silent failures if an undo file is moved
from one type of system to another.  Third, using the tdb database
costs us an enormous amount of CPU overhead to maintain the key data
structure.  Finally, the tdb database is unable to deal with databases
larger than 2GB.  (Upstream tdb 1.2.12 can handle 4GB, but upgrading a
2TB FS to 64bit,metadata_csum easily produces 2.9GB of undo files, so
we might as well move off of tdb now.)
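
To make the byte-order point concrete, here is a small sketch of
storing a 64-bit block number in a fixed little-endian layout, so that
the bytes on disk do not depend on the CPU that wrote them.  The patch
itself uses the ext2fs_cpu_to_le64() family for this; put_le64() and
get_le64() are hypothetical helpers written out by hand so the example
stands alone.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store v into out[] least-significant byte first, regardless of the
 * host's native byte order. */
static void put_le64(unsigned char out[8], uint64_t v)
{
	int i;

	assert(out != NULL);
	for (i = 0; i < 8; i++)
		out[i] = (unsigned char)(v >> (8 * i));
}

/* Reassemble a value written by put_le64() on any host. */
static uint64_t get_le64(const unsigned char in[8])
{
	uint64_t v = 0;
	int i;

	assert(in != NULL);
	for (i = 0; i < 8; i++)
		v |= (uint64_t)in[i] << (8 * i);
	return v;
}
```

A raw memcpy() of the integer, which is effectively what the tdb keys
did, would produce different bytes on big-endian and little-endian
machines.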

The last problem is fatal if you want to use tune2fs to turn on
metadata checksumming, since that rewrites every block on the
filesystem.  This can easily produce a many-gigabyte undo file, which
of course is unreadable, and therefore the operation cannot be undone.

Therefore, rip all of that out in favor of writing to a flat file.
Old blocks are appended to a file and the index is written to the end
when we're done.  This implementation is much faster than maintaining
a hash index: the runtime overhead of tune2fs -O metadata_csum drops
from ~45min to ~20 seconds on a 2TB filesystem.
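
As a sketch of the on-disk records involved, the layout can be written
with plain fixed-width types.  The field names and order follow the
undo_header and undo_key structures added to undo_io.c by this patch;
the sketch_ prefixed names, and the use of uint64_t/uint32_t in place
of __le64/__le32, are assumptions made so that the example compiles on
its own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct undo_header; all integers are little-endian
 * on disk. */
struct sketch_undo_header {
	char magic[8];		/* "E2UNDO02" */
	uint64_t num_keys;	/* how many keys? */
	uint64_t super_offset;	/* location of the superblock copy */
	uint64_t key_offset;	/* where the key/data chunks start */
	uint32_t block_size;	/* block size of the undo file */
	uint32_t fs_block_size;	/* block size of the target device */
	uint32_t sb_crc;	/* crc32c of the superblock */
	uint32_t state;		/* state flags */
	uint32_t f_compat;
	uint32_t f_incompat;
	uint32_t f_rocompat;
	uint8_t padding[448];
	uint32_t header_crc;	/* crc32c of everything before it */
};

/* Stand-in for struct undo_key: one run of saved data. */
struct sketch_undo_key {
	uint64_t fsblk;		/* where in the fs the data goes */
	uint32_t blk_crc;	/* crc32c of the data */
	uint32_t size;		/* bytes covered by this key */
};

/* Same arithmetic as KEYS_PER_BLOCK(): one key-sized slot is given up
 * for the 16-byte key block header (magic, crc, reserved). */
static size_t sketch_keys_per_block(size_t undo_block_size)
{
	assert(undo_block_size >= sizeof(struct sketch_undo_key));
	return undo_block_size / sizeof(struct sketch_undo_key) - 1;
}
```

On common ABIs the header comes to 512 bytes with header_crc at
offset 508, and a 4096-byte undo block holds 255 keys.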

Several reasons factored into my decision not to repurpose the jbd2
file format for undo files.  First, undo files in jbd2 format would be
limited to 2^32 blocks (16TB), which some day might not serve us well.
Second,
the journal block size is tied to the file system block size, but
mke2fs wants to be able to back up big chunks of old device contents.
This would require large changes to the e2fsck journal replay code,
which itself is derived from the kernel jbd2 driver, which I'd rather
not destabilize.  Third, I want to require undo files to store the FS
superblock at the end of undo file creation so that e2undo can be
reasonably sure that an undo file is supposed to apply against the
given block device, and doing so would require changes to the jbd2
format.  Fourth, it didn't seem like a good idea that external
journals should resemble undo files so closely.

v2: Provide a state bit that is only set when the undo channel is
closed correctly so we can warn the user about potentially incomplete
undo files.  Straighten out the superblock handling so that undo files
won't be confused for real ext* FS images.  Record multi-block runs in
each block key to reduce overhead even further.  Support reopening an
undo file so that we can combine multiple FS operations into one
(overall smaller) transaction file, which will be easier to manage.
Flush the undo index data if the program should terminate
unexpectedly.  Update the ext4 superblock bits if errors are found or
-f is given, to encourage fsck to do a full run the next time it's
invoked.
Enable undoing the undo.
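
The multi-block run idea can be sketched roughly as follows.  This is
a simplified model, not the code in undo_write_tdb(): sketch_run and
sketch_try_extend() are hypothetical names, sizes are tracked in bytes
only, and the CRC chaining that the patch also performs is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_EXTENT_BLOCKS 512	/* same cap as E2UNDO_MAX_EXTENT_BLOCKS */

/* Hypothetical in-memory form of one key: a run of saved bytes
 * starting at filesystem block fsblk. */
struct sketch_run {
	uint64_t fsblk;
	uint32_t size;		/* bytes in the run */
};

/* Try to append nbytes of data for block blk to the previous run.
 * Returns 1 if the run was extended, 0 if a new key is needed. */
static int sketch_try_extend(struct sketch_run *last, uint64_t blk,
			     uint32_t blksz, uint32_t nbytes)
{
	uint64_t blocks_in_run;

	assert(blksz > 0);
	if (last == NULL)
		return 0;	/* no previous key in this key block */

	/* The new block must directly follow the end of the run. */
	blocks_in_run = ((uint64_t)last->size + blksz - 1) / blksz;
	if (last->fsblk + blocks_in_run != blk)
		return 0;

	/* Keep the run below the extent cap. */
	if ((uint64_t)last->size + nbytes >=
	    (uint64_t)SKETCH_MAX_EXTENT_BLOCKS * blksz)
		return 0;

	last->size += nbytes;
	return 1;
}
```

When sketch_try_extend() returns 0 the caller would start a new key
for blk, so sequential rewrites of a filesystem cost one key per run
rather than one key per block.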

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 lib/ext2fs/ext2_err.et.in |    6 
 lib/ext2fs/undo_io.c      |  550 ++++++++++++++++++++++++++++++++++++--------
 misc/e2undo.8.in          |   17 +
 misc/e2undo.c             |  560 +++++++++++++++++++++++++++++++++++++--------
 4 files changed, 923 insertions(+), 210 deletions(-)


diff --git a/lib/ext2fs/ext2_err.et.in b/lib/ext2fs/ext2_err.et.in
index 790d135..894789e 100644
--- a/lib/ext2fs/ext2_err.et.in
+++ b/lib/ext2fs/ext2_err.et.in
@@ -524,4 +524,10 @@ ec	EXT2_ET_EA_BAD_VALUE_OFFSET,
 ec	EXT2_ET_JOURNAL_FLAGS_WRONG,
 	"Journal flags inconsistent"
 
+ec	EXT2_ET_UNDO_FILE_CORRUPT,
+	"Undo file corrupt"
+
+ec	EXT2_ET_UNDO_FILE_WRONG,
+	"Wrong undo file for this filesystem"
+
 	end
diff --git a/lib/ext2fs/undo_io.c b/lib/ext2fs/undo_io.c
index 9a01e30..f1c107a 100644
--- a/lib/ext2fs/undo_io.c
+++ b/lib/ext2fs/undo_io.c
@@ -39,8 +39,6 @@
 #endif
 #include <limits.h>
 
-#include "tdb.h"
-
 #include "ext2_fs.h"
 #include "ext2fs.h"
 
@@ -50,22 +48,86 @@
 #define ATTR(x)
 #endif
 
+#undef DEBUG
+
+#ifdef DEBUG
+# define dbg_printf(f, a...)  do {printf(f, ## a); fflush(stdout); } while (0)
+#else
+# define dbg_printf(f, a...)
+#endif
+
 /*
  * For checking structure magic numbers...
  */
 
 #define EXT2_CHECK_MAGIC(struct, code) \
 	  if ((struct)->magic != (code)) return (code)
+/*
+ * Undo file format: The file is cut up into undo_header.block_size blocks.
+ * The first block contains the header.
+ * The second block contains the superblock.
+ * There is then a repeating series of blocks as follows:
+ *   A key block, which contains undo_keys to map the following data blocks.
+ *   Data blocks
+ * (Note that there are pointers to the first key block and the sb, so this
+ * order isn't strictly necessary.)
+ */
+#define E2UNDO_MAGIC "E2UNDO02"
+#define KEYBLOCK_MAGIC 0xCADECADE
+
+#define E2UNDO_STATE_FINISHED	0x1	/* undo file is complete */
+
+#define E2UNDO_MIN_BLOCK_SIZE	1024	/* undo blocks are no less than 1KB */
+#define E2UNDO_MAX_BLOCK_SIZE	1048576	/* undo blocks are no more than 1MB */
+
+struct undo_header {
+	char magic[8];		/* "E2UNDO02" */
+	__le64 num_keys;	/* how many keys? */
+	__le64 super_offset;	/* where in the file is the superblock copy? */
+	__le64 key_offset;	/* where do the key/data block chunks start? */
+	__le32 block_size;	/* block size of the undo file */
+	__le32 fs_block_size;	/* block size of the target device */
+	__le32 sb_crc;		/* crc32c of the superblock */
+	__le32 state;		/* e2undo state flags */
+	__le32 f_compat;	/* compatible features (none so far) */
+	__le32 f_incompat;	/* incompatible features (none so far) */
+	__le32 f_rocompat;	/* ro compatible features (none so far) */
+	__u8 padding[448];	/* padding */
+	__le32 header_crc;	/* crc32c of this header (but not this field) */
+};
+
+#define E2UNDO_MAX_EXTENT_BLOCKS	512	/* max extent size, in blocks */
+
+struct undo_key {
+	__le64 fsblk;		/* where in the fs does the block go */
+	__le32 blk_crc;		/* crc32c of the block */
+	__le32 size;		/* how many bytes in this block? */
+};
+
+struct undo_key_block {
+	__le32 magic;		/* KEYBLOCK_MAGIC number */
+	__le32 crc;		/* block checksum */
+	__le64 reserved;	/* zero */
+
+	struct undo_key keys[0];	/* keys, which come immediately after */
+};
 
 struct undo_private_data {
 	int	magic;
-	TDB_CONTEXT *tdb;
-	char *tdb_file;
+
+	/* the undo file io channel */
+	io_channel undo_file;
+	blk64_t undo_blk_num;			/* next free block */
+	blk64_t key_blk_num;			/* current key block location */
+	blk64_t super_blk_num;			/* superblock location */
+	blk64_t first_key_blk;			/* first key block location */
+	struct undo_key_block *keyb;
+	size_t num_keys, keys_in_block;
 
 	/* The backing io channel */
 	io_channel real;
 
-	int tdb_data_size;
+	unsigned long long tdb_data_size;
 	int tdb_written;
 
 	/* to support offset in unix I/O manager */
@@ -73,16 +135,15 @@ struct undo_private_data {
 
 	ext2fs_block_bitmap written_block_map;
 	struct struct_ext2_filsys fake_fs;
+
+	struct undo_header hdr;
 };
+#define KEYS_PER_BLOCK(d) (((d)->tdb_data_size / sizeof(struct undo_key)) - 1)
 
 static io_manager undo_io_backing_manager;
 static char *tdb_file;
 static int actual_size;
 
-static unsigned char mtime_key[] = "filesystem MTIME";
-static unsigned char blksize_key[] = "filesystem BLKSIZE";
-static unsigned char uuid_key[] = "filesystem UUID";
-
 errcode_t set_undo_io_backing_manager(io_manager manager)
 {
 	/*
@@ -103,17 +164,34 @@ errcode_t set_undo_io_backup_file(char *file_name)
 	return 0;
 }
 
-static errcode_t write_file_system_identity(io_channel undo_channel,
-							TDB_CONTEXT *tdb)
+static errcode_t write_undo_indexes(struct undo_private_data *data)
 {
 	errcode_t retval;
 	struct ext2_super_block super;
-	TDB_DATA tdb_key, tdb_data;
-	struct undo_private_data *data;
 	io_channel channel;
-	int block_size ;
+	int block_size;
+	__u32 sb_crc, hdr_crc;
+
+	/* Spit out a key block, if there's any data */
+	if (data->keys_in_block) {
+		data->keyb->magic = ext2fs_cpu_to_le32(KEYBLOCK_MAGIC);
+		data->keyb->crc = 0;
+		data->keyb->crc = ext2fs_cpu_to_le32(
+					 ext2fs_crc32c_le(~0,
+					 (unsigned char *)data->keyb,
+					 data->tdb_data_size));
+		dbg_printf("Writing keyblock to blk %llu\n", data->key_blk_num);
+		retval = io_channel_write_blk64(data->undo_file,
+						data->key_blk_num,
+						1, data->keyb);
+		if (retval)
+			return retval;
+		memset(data->keyb, 0, data->tdb_data_size);
+		data->keys_in_block = 0;
+		data->key_blk_num = data->undo_blk_num;
+	}
 
-	data = (struct undo_private_data *) undo_channel->private_data;
+	/* Prepare superblock for write */
 	channel = data->real;
 	block_size = channel->block_size;
 
@@ -121,54 +199,45 @@ static errcode_t write_file_system_identity(io_channel undo_channel,
 	retval = io_channel_read_blk64(channel, 1, -SUPERBLOCK_SIZE, &super);
 	if (retval)
 		goto err_out;
-
-	/* Write to tdb file in the file system byte order */
-	tdb_key.dptr = mtime_key;
-	tdb_key.dsize = sizeof(mtime_key);
-	tdb_data.dptr = (unsigned char *) &(super.s_mtime);
-	tdb_data.dsize = sizeof(super.s_mtime);
-
-	retval = tdb_store(tdb, tdb_key, tdb_data, TDB_INSERT);
-	if (retval == -1) {
-		retval = EXT2_ET_TDB_SUCCESS + tdb_error(tdb);
+	sb_crc = ext2fs_crc32c_le(~0, (unsigned char *)&super, SUPERBLOCK_SIZE);
+	super.s_magic = ~super.s_magic;
+
+	/* Write the undo header to disk. */
+	memcpy(data->hdr.magic, E2UNDO_MAGIC, sizeof(data->hdr.magic));
+	data->hdr.num_keys = ext2fs_cpu_to_le64(data->num_keys);
+	data->hdr.super_offset = ext2fs_cpu_to_le64(data->super_blk_num);
+	data->hdr.key_offset = ext2fs_cpu_to_le64(data->first_key_blk);
+	data->hdr.fs_block_size = ext2fs_cpu_to_le32(block_size);
+	data->hdr.sb_crc = ext2fs_cpu_to_le32(sb_crc);
+	hdr_crc = ext2fs_crc32c_le(~0, (unsigned char *)&data->hdr,
+				   sizeof(data->hdr) -
+				   sizeof(data->hdr.header_crc));
+	data->hdr.header_crc = ext2fs_cpu_to_le32(hdr_crc);
+	retval = io_channel_write_blk64(data->undo_file, 0,
+					-(int)sizeof(data->hdr),
+					&data->hdr);
+	if (retval)
 		goto err_out;
-	}
 
-	tdb_key.dptr = uuid_key;
-	tdb_key.dsize = sizeof(uuid_key);
-	tdb_data.dptr = (unsigned char *)&(super.s_uuid);
-	tdb_data.dsize = sizeof(super.s_uuid);
-
-	retval = tdb_store(tdb, tdb_key, tdb_data, TDB_INSERT);
-	if (retval == -1) {
-		retval = EXT2_ET_TDB_SUCCESS + tdb_error(tdb);
-	}
+	/*
+	 * Record the entire superblock (in FS byte order) so that we can't
+	 * apply e2undo files to the wrong FS or out of order.
+	 */
+	dbg_printf("Writing superblock to block %llu\n", data->super_blk_num);
+	retval = io_channel_write_blk64(data->undo_file, data->super_blk_num,
+					-SUPERBLOCK_SIZE, &super);
+	if (retval)
+		goto err_out;
 
+	retval = io_channel_flush(data->undo_file);
 err_out:
 	io_channel_set_blksize(channel, block_size);
 	return retval;
 }
 
-static errcode_t write_block_size(TDB_CONTEXT *tdb, int block_size)
-{
-	errcode_t retval;
-	TDB_DATA tdb_key, tdb_data;
-
-	tdb_key.dptr = blksize_key;
-	tdb_key.dsize = sizeof(blksize_key);
-	tdb_data.dptr = (unsigned char *)&(block_size);
-	tdb_data.dsize = sizeof(block_size);
-
-	retval = tdb_store(tdb, tdb_key, tdb_data, TDB_INSERT);
-	if (retval == -1) {
-		retval = EXT2_ET_TDB_SUCCESS + tdb_error(tdb);
-	}
-
-	return retval;
-}
-
 static errcode_t undo_setup_tdb(struct undo_private_data *data)
 {
+	int i;
 	errcode_t retval;
 
 	if (data->tdb_written == 1)
@@ -187,15 +256,33 @@ static errcode_t undo_setup_tdb(struct undo_private_data *data)
 	if (retval)
 		return retval;
 
-	/* Write the blocksize to tdb file */
-	tdb_transaction_start(data->tdb);
-	retval = write_block_size(data->tdb,
-				  data->tdb_data_size);
-	if (retval) {
-		tdb_transaction_cancel(data->tdb);
-		return EXT2_ET_TDB_ERR_IO;
+	/* Allocate key block */
+	retval = ext2fs_get_mem(data->tdb_data_size, &data->keyb);
+	if (retval)
+		return retval;
+	data->key_blk_num = data->undo_blk_num;
+
+	/* Record block size */
+	dbg_printf("Undo block size %llu\n", data->tdb_data_size);
+	dbg_printf("Keys per block %llu\n", KEYS_PER_BLOCK(data));
+	data->hdr.block_size = ext2fs_cpu_to_le32(data->tdb_data_size);
+	io_channel_set_blksize(data->undo_file, data->tdb_data_size);
+
+	/* Ensure that we have space for header blocks */
+	for (i = 0; i <= 2; i++) {
+		retval = io_channel_read_blk64(data->undo_file, i, 1,
+					       data->keyb);
+		if (retval)
+			memset(data->keyb, 0, data->tdb_data_size);
+		retval = io_channel_write_blk64(data->undo_file, i, 1,
+						data->keyb);
+		if (retval)
+			return retval;
+		retval = io_channel_flush(data->undo_file);
+		if (retval)
+			return retval;
 	}
-	tdb_transaction_commit(data->tdb);
+	memset(data->keyb, 0, data->tdb_data_size);
 	return 0;
 }
 
@@ -208,13 +295,16 @@ static errcode_t undo_write_tdb(io_channel channel,
 	errcode_t retval = 0;
 	ext2_loff_t offset;
 	struct undo_private_data *data;
-	TDB_DATA tdb_key, tdb_data;
 	unsigned char *read_ptr;
 	unsigned long long end_block;
+	unsigned long long data_size;
+	void *data_ptr;
+	struct undo_key *key;
+	__u32 blk_crc;
 
 	data = (struct undo_private_data *) channel->private_data;
 
-	if (data->tdb == NULL) {
+	if (data->undo_file == NULL) {
 		/*
 		 * Transaction database not initialized
 		 */
@@ -241,13 +331,11 @@ static errcode_t undo_write_tdb(io_channel channel,
 	 */
 	offset = (block * channel->block_size) + data->offset ;
 	block_num = offset / data->tdb_data_size;
-	end_block = (offset + size) / data->tdb_data_size;
+	end_block = (offset + size - 1) / data->tdb_data_size;
 
-	tdb_transaction_start(data->tdb);
-	while (block_num <= end_block ) {
+	while (block_num <= end_block) {
+		__u32 keysz;
 
-		tdb_key.dptr = (unsigned char *)&block_num;
-		tdb_key.dsize = sizeof(block_num);
 		/*
 		 * Check if we have the record already
 		 */
@@ -259,6 +347,22 @@ static errcode_t undo_write_tdb(io_channel channel,
 		}
 		ext2fs_mark_block_bitmap2(data->written_block_map, block_num);
 
+		/* Spit out a key block */
+		if (data->keys_in_block == KEYS_PER_BLOCK(data)) {
+			retval = write_undo_indexes(data);
+			if (retval)
+				return retval;
+			retval = io_channel_write_blk64(data->undo_file,
+							data->key_blk_num, 1,
+							data->keyb);
+			if (retval)
+				return retval;
+		}
+
+		/* Allocate new key block */
+		if (data->keys_in_block == 0)
+			data->undo_blk_num++;
+
 		/*
 		 * Read one block using the backing I/O manager
 		 * The backing I/O manager block size may be
@@ -273,7 +377,6 @@ static errcode_t undo_write_tdb(io_channel channel,
 				((offset - data->offset) % channel->block_size);
 		retval = ext2fs_get_mem(count, &read_ptr);
 		if (retval) {
-			tdb_transaction_cancel(data->tdb);
 			return retval;
 		}
 
@@ -288,41 +391,75 @@ static errcode_t undo_write_tdb(io_channel channel,
 		if (retval) {
 			if (retval != EXT2_ET_SHORT_READ) {
 				free(read_ptr);
-				tdb_transaction_cancel(data->tdb);
 				return retval;
 			}
 			/*
 			 * short read so update the record size
 			 * accordingly
 			 */
-			tdb_data.dsize = actual_size;
+			data_size = actual_size;
 		} else {
-			tdb_data.dsize = data->tdb_data_size;
+			data_size = data->tdb_data_size;
 		}
-		tdb_data.dptr = read_ptr +
-				((offset - data->offset) % channel->block_size);
-#ifdef DEBUG
-		printf("Printing with key %lld data %x and size %d\n",
+		if (data_size == 0) {
+			free(read_ptr);
+			block_num++;
+			continue;
+		}
+		dbg_printf("Read %llu bytes from FS block %llu (blk=%llu cnt=%u)\n",
+		       data_size, backing_blk_num, block, count);
+		if ((data_size % data->undo_file->block_size) == 0)
+			sz = data_size / data->undo_file->block_size;
+		else
+			sz = -actual_size;
+		data_ptr = read_ptr + ((offset - data->offset) %
+				       data->undo_file->block_size);
+		/* extend this key? */
+		if (data->keys_in_block) {
+			key = data->keyb->keys + data->keys_in_block - 1;
+			keysz = ext2fs_le32_to_cpu(key->size);
+		} else {
+			key = NULL;
+			keysz = 0;
+		}
+		if (key != NULL &&
+		    ext2fs_le64_to_cpu(key->fsblk) +
+		    ((keysz + data->tdb_data_size - 1) /
+		     data->tdb_data_size) == backing_blk_num &&
+		    E2UNDO_MAX_EXTENT_BLOCKS * data->tdb_data_size >
+		    keysz + data_size) {
+			blk_crc = ext2fs_le32_to_cpu(key->blk_crc);
+			blk_crc = ext2fs_crc32c_le(blk_crc,
+						   (unsigned char *)data_ptr,
+						   data_size);
+			key->blk_crc = ext2fs_cpu_to_le32(blk_crc);
+			key->size = ext2fs_cpu_to_le32(keysz + data_size);
+		} else {
+			data->num_keys++;
+			key = data->keyb->keys + data->keys_in_block;
+			data->keys_in_block++;
+			key->fsblk = ext2fs_cpu_to_le64(backing_blk_num);
+			blk_crc = ext2fs_crc32c_le(~0,
+						   (unsigned char *)data_ptr,
+						   data_size);
+			key->blk_crc = ext2fs_cpu_to_le32(blk_crc);
+			key->size = ext2fs_cpu_to_le32(data_size);
+		}
+		dbg_printf("Writing block %llu to offset %llu size %d key %zu\n",
 		       block_num,
-		       tdb_data.dptr,
-		       tdb_data.dsize);
-#endif
-		retval = tdb_store(data->tdb, tdb_key, tdb_data, TDB_REPLACE);
-		if (retval == -1) {
-			/*
-			 * TDB_ERR_EXISTS cannot happen because we
-			 * have already verified it doesn't exist
-			 */
-			tdb_transaction_cancel(data->tdb);
-			retval = EXT2_ET_TDB_ERR_IO;
+		       data->undo_blk_num,
+		       sz, data->num_keys - 1);
+		retval = io_channel_write_blk64(data->undo_file,
+					data->undo_blk_num, sz, data_ptr);
+		if (retval) {
 			free(read_ptr);
 			return retval;
 		}
+		data->undo_blk_num++;
 		free(read_ptr);
 		/* Next block */
 		block_num++;
 	}
-	tdb_transaction_commit(data->tdb);
 
 	return retval;
 }
@@ -344,10 +481,192 @@ static void undo_err_handler_init(io_channel channel)
 	channel->read_error = undo_io_read_error;
 }
 
+static int check_filesystem(struct undo_header *hdr, io_channel undo_file,
+			    unsigned int blocksize, blk64_t super_block,
+			    io_channel channel)
+{
+	struct ext2_super_block super, *sb;
+	char *buf;
+	__u32 sb_crc;
+	errcode_t retval;
+
+	io_channel_set_blksize(channel, SUPERBLOCK_OFFSET);
+	retval = io_channel_read_blk64(channel, 1, -SUPERBLOCK_SIZE, &super);
+	if (retval)
+		return retval;
+
+	/*
+	 * Compare the FS and the undo file superblock so that we don't
+	 * append to something that doesn't match this FS.
+	 */
+	retval = ext2fs_get_mem(blocksize, &buf);
+	if (retval)
+		return retval;
+	retval = io_channel_read_blk64(undo_file, super_block,
+				       -SUPERBLOCK_SIZE, buf);
+	if (retval)
+		goto out;
+	sb = (struct ext2_super_block *)buf;
+	sb->s_magic = ~sb->s_magic;
+	if (memcmp(&super, buf, sizeof(super))) {
+		retval = -1;
+		goto out;
+	}
+	sb_crc = ext2fs_crc32c_le(~0, (unsigned char *)buf, SUPERBLOCK_SIZE);
+	if (ext2fs_le32_to_cpu(hdr->sb_crc) != sb_crc) {
+		retval = -1;
+		goto out;
+	}
+
+out:
+	ext2fs_free_mem(&buf);
+	return retval;
+}
+
+/*
+ * Try to re-open the undo file, so that we can resume where we left off.
+ * That way, the user can pass the same undo file to various programs as
+ * part of an FS upgrade instead of having to create multiple files and
+ * then apply them in correct order.
+ */
+static errcode_t try_reopen_undo_file(int undo_fd,
+				      struct undo_private_data *data)
+{
+	struct undo_header hdr;
+	struct undo_key *dkey;
+	ext2fs_struct_stat statbuf;
+	unsigned int blocksize, fs_blocksize;
+	blk64_t super_block, lblk;
+	size_t num_keys, keys_per_block, i;
+	__u32 hdr_crc, key_crc;
+	errcode_t retval;
+
+	/* Zero size already? */
+	retval = ext2fs_fstat(undo_fd, &statbuf);
+	if (retval)
+		goto bad_file;
+	if (statbuf.st_size == 0)
+		goto out;
+
+	/* check the file header */
+	retval = io_channel_read_blk64(data->undo_file, 0, -(int)sizeof(hdr),
+				       &hdr);
+	if (retval)
+		goto bad_file;
+
+	if (memcmp(hdr.magic, E2UNDO_MAGIC,
+		    sizeof(hdr.magic)))
+		goto bad_file;
+	hdr_crc = ext2fs_crc32c_le(~0, (unsigned char *)&hdr,
+				   sizeof(struct undo_header) -
+				   sizeof(__u32));
+	if (ext2fs_le32_to_cpu(hdr.header_crc) != hdr_crc)
+		goto bad_file;
+	blocksize = ext2fs_le32_to_cpu(hdr.block_size);
+	fs_blocksize = ext2fs_le32_to_cpu(hdr.fs_block_size);
+	if (blocksize > E2UNDO_MAX_BLOCK_SIZE ||
+	    blocksize < E2UNDO_MIN_BLOCK_SIZE ||
+	    !blocksize || !fs_blocksize)
+		goto bad_file;
+	super_block = ext2fs_le64_to_cpu(hdr.super_offset);
+	num_keys = ext2fs_le64_to_cpu(hdr.num_keys);
+	io_channel_set_blksize(data->undo_file, blocksize);
+	if (hdr.f_compat || hdr.f_incompat || hdr.f_rocompat)
+		goto bad_file;
+
+	/* Superblock matches this FS? */
+	if (check_filesystem(&hdr, data->undo_file, blocksize, super_block,
+			     data->real) != 0) {
+		retval = EXT2_ET_UNDO_FILE_WRONG;
+		goto out;
+	}
+
+	/* Try to set ourselves up */
+	data->tdb_data_size = blocksize;
+	retval = undo_setup_tdb(data);
+	if (retval)
+		goto bad_file;
+	data->num_keys = num_keys;
+	data->super_blk_num = super_block;
+	data->first_key_blk = ext2fs_le64_to_cpu(hdr.key_offset);
+
+	/* load the written block map */
+	keys_per_block = KEYS_PER_BLOCK(data);
+	lblk = data->first_key_blk;
+	dbg_printf("nr_keys=%zu, kpb=%zu, blksz=%u\n",
+		   num_keys, keys_per_block, blocksize);
+	for (i = 0; i < num_keys; i += keys_per_block) {
+		size_t j, max_j;
+		__le32 crc;
+
+		data->key_blk_num = lblk;
+		retval = io_channel_read_blk64(data->undo_file,
+					       lblk, 1, data->keyb);
+		if (retval)
+			goto bad_key_replay;
+
+		/* check keys */
+		if (ext2fs_le32_to_cpu(data->keyb->magic) != KEYBLOCK_MAGIC) {
+			retval = EXT2_ET_UNDO_FILE_CORRUPT;
+			goto bad_key_replay;
+		}
+		crc = data->keyb->crc;
+		data->keyb->crc = 0;
+		key_crc = ext2fs_crc32c_le(~0, (unsigned char *)data->keyb,
+					   blocksize);
+		if (ext2fs_le32_to_cpu(crc) != key_crc) {
+			retval = EXT2_ET_UNDO_FILE_CORRUPT;
+			goto bad_key_replay;
+		}
+
+		/* load keys from key block */
+		lblk++;
+		max_j = data->num_keys - i;
+		if (max_j > keys_per_block)
+			max_j = keys_per_block;
+		for (j = 0, dkey = data->keyb->keys;
+		     j < max_j;
+		     j++, dkey++) {
+			blk64_t fsblk = ext2fs_le64_to_cpu(dkey->fsblk);
+			blk64_t undo_blk = fsblk * fs_blocksize / blocksize;
+			size_t size = ext2fs_le32_to_cpu(dkey->size);
+
+			ext2fs_mark_block_bitmap_range2(data->written_block_map,
+					 undo_blk,
+					(size + blocksize - 1) / blocksize);
+			lblk += (size + blocksize - 1) / blocksize;
+			data->undo_blk_num = lblk;
+			data->keys_in_block = j + 1;
+		}
+	}
+	dbg_printf("Reopen undo, keyblk=%llu undoblk=%llu nrkeys=%zu kib=%zu\n",
+		   data->key_blk_num, data->undo_blk_num, data->num_keys,
+		   data->keys_in_block);
+
+	data->hdr.state = hdr.state & ~ext2fs_cpu_to_le32(E2UNDO_STATE_FINISHED);
+	data->hdr.f_compat = hdr.f_compat;
+	data->hdr.f_incompat = hdr.f_incompat;
+	data->hdr.f_rocompat = hdr.f_rocompat;
+	return retval;
+
+bad_key_replay:
+	data->key_blk_num = data->undo_blk_num = 0;
+	data->keys_in_block = 0;
+	ext2fs_free_mem(&data->keyb);
+	ext2fs_free_generic_bitmap(data->written_block_map);
+	data->tdb_written = 0;
+	goto out;
+bad_file:
+	retval = EXT2_ET_UNDO_FILE_CORRUPT;
+out:
+	return retval;
+}
+
 static errcode_t undo_open(const char *name, int flags, io_channel *channel)
 {
 	io_channel	io = NULL;
 	struct undo_private_data *data = NULL;
+	int		undo_fd = -1;
 	errcode_t	retval;
 
 	if (name == 0)
@@ -375,29 +694,32 @@ static errcode_t undo_open(const char *name, int flags, io_channel *channel)
 
 	memset(data, 0, sizeof(struct undo_private_data));
 	data->magic = EXT2_ET_MAGIC_UNIX_IO_CHANNEL;
-	data->written_block_map = NULL;
+	data->super_blk_num = 1;
+	data->undo_blk_num = data->first_key_blk = 2;
 
 	if (undo_io_backing_manager) {
 		retval = undo_io_backing_manager->open(name, flags,
 						       &data->real);
 		if (retval)
 			goto cleanup;
+
+		undo_fd = ext2fs_open_file(tdb_file, O_RDWR | O_CREAT, 0600);
+		if (undo_fd < 0)
+			goto cleanup;
+
+		retval = undo_io_backing_manager->open(tdb_file, IO_FLAG_RW,
+						       &data->undo_file);
+		if (retval)
+			goto cleanup;
 	} else {
-		data->real = 0;
+		data->real = NULL;
+		data->undo_file = NULL;
 	}
 
 	if (data->real)
 		io->flags = (io->flags & ~CHANNEL_FLAGS_DISCARD_ZEROES) |
 			    (data->real->flags & CHANNEL_FLAGS_DISCARD_ZEROES);
 
-	/* setup the tdb file */
-	data->tdb = tdb_open(tdb_file, 0, TDB_CLEAR_IF_FIRST | TDB_NOLOCK | TDB_NOSYNC,
-			     O_RDWR | O_CREAT | O_TRUNC | O_EXCL, 0600);
-	if (!data->tdb) {
-		retval = errno;
-		goto cleanup;
-	}
-
 	/*
 	 * setup err handler for read so that we know
 	 * when the backing manager fails do short read
@@ -405,10 +727,22 @@ static errcode_t undo_open(const char *name, int flags, io_channel *channel)
 	if (data->real)
 		undo_err_handler_init(data->real);
 
+	if (data->undo_file) {
+		retval = try_reopen_undo_file(undo_fd, data);
+		if (retval)
+			goto cleanup;
+	}
+
 	*channel = io;
-	return 0;
+	if (undo_fd >= 0)
+		close(undo_fd);
+	return retval;
 
 cleanup:
+	if (undo_fd >= 0)
+		close(undo_fd);
+	if (data && data->undo_file)
+		io_channel_close(data->undo_file);
 	if (data && data->real)
 		io_channel_close(data->real);
 	if (data)
@@ -430,13 +764,14 @@ static errcode_t undo_close(io_channel channel)
 	if (--channel->refcount > 0)
 		return 0;
 	/* Before closing write the file system identity */
-	err = write_file_system_identity(channel, data->tdb);
+	if (!getenv("UNDO_IO_SIMULATE_UNFINISHED"))
+		data->hdr.state = ext2fs_cpu_to_le32(E2UNDO_STATE_FINISHED);
+	err = write_undo_indexes(data);
 	if (data->real)
 		retval = io_channel_close(data->real);
-	if (data->tdb) {
-		tdb_flush(data->tdb);
-		tdb_close(data->tdb);
-	}
+	if (data->undo_file)
+		io_channel_close(data->undo_file);
+	ext2fs_free_mem(&data->keyb);
 	if (data->written_block_map)
 		ext2fs_free_generic_bitmap(data->written_block_map);
 	ext2fs_free_mem(&channel->private_data);
@@ -458,6 +793,9 @@ static errcode_t undo_set_blksize(io_channel channel, int blksize)
 	data = (struct undo_private_data *) channel->private_data;
 	EXT2_CHECK_MAGIC(data, EXT2_ET_MAGIC_UNIX_IO_CHANNEL);
 
+	if (blksize > E2UNDO_MAX_BLOCK_SIZE || blksize < E2UNDO_MIN_BLOCK_SIZE)
+		return EXT2_ET_INVALID_ARGUMENT;
+
 	if (data->real)
 		retval = io_channel_set_blksize(data->real, blksize);
 	/*
@@ -632,8 +970,6 @@ static errcode_t undo_flush(io_channel channel)
 	data = (struct undo_private_data *) channel->private_data;
 	EXT2_CHECK_MAGIC(data, EXT2_ET_MAGIC_UNIX_IO_CHANNEL);
 
-	if (data->tdb)
-		tdb_flush(data->tdb);
 	if (data->real)
 		retval = io_channel_flush(data->real);
 
@@ -659,6 +995,8 @@ static errcode_t undo_set_option(io_channel channel, const char *option,
 		tmp = strtoul(arg, &end, 0);
 		if (*end)
 			return EXT2_ET_INVALID_ARGUMENT;
+		if (tmp > E2UNDO_MAX_BLOCK_SIZE || tmp < E2UNDO_MIN_BLOCK_SIZE)
+			return EXT2_ET_INVALID_ARGUMENT;
 		if (!data->tdb_data_size || !data->tdb_written) {
 			data->tdb_written = -1;
 			data->tdb_data_size = tmp;
diff --git a/misc/e2undo.8.in b/misc/e2undo.8.in
index 4bf0798..71e8a7b 100644
--- a/misc/e2undo.8.in
+++ b/misc/e2undo.8.in
@@ -10,6 +10,12 @@ e2undo \- Replay an undo log for an ext2/ext3/ext4 filesystem
 [
 .B \-f
 ]
+[
+.B \-n
+]
+[
+.B \-v
+]
 .I undo_log device
 .SH DESCRIPTION
 .B e2undo
@@ -24,13 +30,18 @@ used to undo a failed operation by an e2fsprogs program.
 .B \-f
 Normally,
 .B e2undo
-will check the filesystem UUID and last modified time to make sure the
-undo log matches with the filesystem on the device.  If they do not
-match,
+will check the filesystem superblock to make sure the undo log matches
+with the filesystem on the device.  If they do not match,
 .B e2undo
 will refuse to apply the undo log as a safety mechanism.  The
 .B \-f
 option disables this safety mechanism.
+.TP
+.B \-n
+Dry-run; do not actually write blocks back to the filesystem.
+.TP
+.B \-v
+Report which block we're currently replaying.
 .SH AUTHOR
 .B e2undo
 was written by Aneesh Kumar K.V. (aneesh.kumar@linux.vnet.ibm.com)
diff --git a/misc/e2undo.c b/misc/e2undo.c
index d828d3b..3f312c6 100644
--- a/misc/e2undo.c
+++ b/misc/e2undo.c
@@ -20,30 +20,132 @@
 #if HAVE_ERRNO_H
 #include <errno.h>
 #endif
-#include "ext2fs/tdb.h"
+#include <unistd.h>
 #include "ext2fs/ext2fs.h"
 #include "nls-enable.h"
 
-static unsigned char mtime_key[] = "filesystem MTIME";
-static unsigned char uuid_key[] = "filesystem UUID";
-static unsigned char blksize_key[] = "filesystem BLKSIZE";
+#undef DEBUG
+
+#ifdef DEBUG
+# define dbg_printf(f, a...)  do {printf(f, ## a); fflush(stdout); } while (0)
+#else
+# define dbg_printf(f, a...)
+#endif
+
+/*
+ * Undo file format: The file is cut up into undo_header.block_size blocks.
+ * The first block contains the header.
+ * The second block contains the superblock.
+ * There is then a repeating series of blocks as follows:
+ *   A key block, which contains undo_keys to map the following data blocks.
+ *   Data blocks
+ * (Note that there are pointers to the first key block and the sb, so this
+ * order isn't strictly necessary.)
+ */
+#define E2UNDO_MAGIC "E2UNDO02"
+#define KEYBLOCK_MAGIC 0xCADECADE
+
+#define E2UNDO_STATE_FINISHED	0x1	/* undo file is complete */
+
+#define E2UNDO_MIN_BLOCK_SIZE	1024	/* undo blocks are no less than 1KB */
+#define E2UNDO_MAX_BLOCK_SIZE	1048576	/* undo blocks are no more than 1MB */
+
+struct undo_header {
+	char magic[8];		/* "E2UNDO02" */
+	__le64 num_keys;	/* how many keys? */
+	__le64 super_offset;	/* where in the file is the superblock copy? */
+	__le64 key_offset;	/* where do the key/data block chunks start? */
+	__le32 block_size;	/* block size of the undo file */
+	__le32 fs_block_size;	/* block size of the target device */
+	__le32 sb_crc;		/* crc32c of the superblock */
+	__le32 state;		/* e2undo state flags */
+	__le32 f_compat;	/* compatible features (none so far) */
+	__le32 f_incompat;	/* incompatible features (none so far) */
+	__le32 f_rocompat;	/* ro compatible features (none so far) */
+	__u8 padding[448];	/* padding */
+	__le32 header_crc;	/* crc32c of the header (but not this field) */
+};
+
+#define E2UNDO_MAX_EXTENT_BLOCKS	512	/* max extent size, in blocks */
+
+struct undo_key {
+	__le64 fsblk;		/* where in the fs does the block go */
+	__le32 blk_crc;		/* crc32c of the block */
+	__le32 size;		/* how many bytes in this block? */
+};
+
+struct undo_key_block {
+	__le32 magic;		/* KEYBLOCK_MAGIC number */
+	__le32 crc;		/* block checksum */
+	__le64 reserved;	/* zero */
+
+	struct undo_key keys[0];	/* keys, which come immediately after */
+};
+
+struct undo_key_info {
+	blk64_t fsblk;
+	blk64_t fileblk;
+	__u32 blk_crc;
+	unsigned int size;
+};
+
+struct undo_context {
+	struct undo_header hdr;
+	io_channel undo_file;
+	unsigned int blocksize, fs_blocksize;
+	blk64_t super_block;
+	size_t num_keys;
+	struct undo_key_info *keys;
+};
+#define KEYS_PER_BLOCK(d) (((d)->blocksize / sizeof(struct undo_key)) - 1)
 
 static char *prg_name;
+static char *undo_file;
 
 static void usage(void)
 {
 	fprintf(stderr,
-		_("Usage: %s <transaction file> <filesystem>\n"), prg_name);
+		_("Usage: %s [-f] [-h] [-n] [-v] [-z undo_file] <transaction file> <filesystem>\n"), prg_name);
 	exit(1);
 }
 
-static int check_filesystem(TDB_CONTEXT *tdb, io_channel channel)
+static void dump_header(struct undo_header *hdr)
+{
+	printf("nr keys:\t%llu\n", ext2fs_le64_to_cpu(hdr->num_keys));
+	printf("super block:\t%llu\n", ext2fs_le64_to_cpu(hdr->super_offset));
+	printf("key block:\t%llu\n", ext2fs_le64_to_cpu(hdr->key_offset));
+	printf("block size:\t%u\n", ext2fs_le32_to_cpu(hdr->block_size));
+	printf("fs block size:\t%u\n", ext2fs_le32_to_cpu(hdr->fs_block_size));
+	printf("super crc:\t0x%x\n", ext2fs_le32_to_cpu(hdr->sb_crc));
+	printf("state:\t\t0x%x\n", ext2fs_le32_to_cpu(hdr->state));
+	printf("compat:\t\t0x%x\n", ext2fs_le32_to_cpu(hdr->f_compat));
+	printf("incompat:\t0x%x\n", ext2fs_le32_to_cpu(hdr->f_incompat));
+	printf("rocompat:\t0x%x\n", ext2fs_le32_to_cpu(hdr->f_rocompat));
+	printf("header crc:\t0x%x\n", ext2fs_le32_to_cpu(hdr->header_crc));
+}
+
+static void print_undo_mismatch(struct ext2_super_block *fs_super,
+				struct ext2_super_block *undo_super)
+{
+	printf("%s",
+	       _("The file system superblock doesn't match the undo file.\n"));
+	if (memcmp(fs_super->s_uuid, undo_super->s_uuid,
+		   sizeof(fs_super->s_uuid)))
+		printf("%s", _("UUID does not match.\n"));
+	if (fs_super->s_mtime != undo_super->s_mtime)
+		printf("%s", _("Last mount time does not match.\n"));
+	if (fs_super->s_wtime != undo_super->s_wtime)
+		printf("%s", _("Last write time does not match.\n"));
+	if (fs_super->s_kbytes_written != undo_super->s_kbytes_written)
+		printf("%s", _("Lifetime write counter does not match.\n"));
+}
+
+static int check_filesystem(struct undo_context *ctx, io_channel channel)
 {
-	__u32   s_mtime;
-	__u8    s_uuid[16];
+	struct ext2_super_block super, *sb;
+	char *buf;
+	__u32 sb_crc;
 	errcode_t retval;
-	TDB_DATA tdb_key, tdb_data;
-	struct ext2_super_block super;
 
 	io_channel_set_blksize(channel, SUPERBLOCK_OFFSET);
 	retval = io_channel_read_blk64(channel, 1, -SUPERBLOCK_SIZE, &super);
@@ -53,83 +155,127 @@ static int check_filesystem(TDB_CONTEXT *tdb, io_channel channel)
 		return retval;
 	}
 
-	tdb_key.dptr = mtime_key;
-	tdb_key.dsize = sizeof(mtime_key);
-	tdb_data = tdb_fetch(tdb, tdb_key);
-	if (!tdb_data.dptr) {
-		retval = EXT2_ET_TDB_SUCCESS + tdb_error(tdb);
-		com_err(prg_name, retval, "%s",
-			_("while fetching last mount time."));
+	/*
+	 * Compare the FS and the undo file superblock so that we can't apply
+	 * e2undo "patches" out of order.
+	 */
+	retval = ext2fs_get_mem(ctx->blocksize, &buf);
+	if (retval) {
+		com_err(prg_name, retval, "%s", _("while allocating memory"));
 		return retval;
 	}
+	retval = io_channel_read_blk64(ctx->undo_file, ctx->super_block,
+				       -SUPERBLOCK_SIZE, buf);
+	if (retval) {
+		com_err(prg_name, retval, "%s", _("while fetching superblock"));
+		goto out;
+	}
+	sb = (struct ext2_super_block *)buf;
+	sb->s_magic = ~sb->s_magic;
+	if (memcmp(&super, buf, sizeof(super))) {
+		print_undo_mismatch(&super, (struct ext2_super_block *)buf);
+		retval = -1;
+		goto out;
+	}
+	sb_crc = ext2fs_crc32c_le(~0, (unsigned char *)buf, SUPERBLOCK_SIZE);
+	if (ext2fs_le32_to_cpu(ctx->hdr.sb_crc) != sb_crc) {
+		fprintf(stderr,
+			_("Undo file superblock checksum doesn't match.\n"));
+		retval = -1;
+		goto out;
+	}
 
-	s_mtime = *(__u32 *)tdb_data.dptr;
-	free(tdb_data.dptr);
-	if (super.s_mtime != s_mtime) {
-		com_err(prg_name, 0,
-			_("The filesystem last mount time didn't match %u."),
-			s_mtime);
+out:
+	ext2fs_free_mem(&buf);
+	return retval;
+}
 
-		return  -1;
-	}
+static int key_compare(const void *a, const void *b)
+{
+	const struct undo_key_info *ka, *kb;
 
+	ka = a;
+	kb = b;
+	/* fsblk is CPU-endian here; don't truncate a 64-bit difference */
+	return (ka->fsblk > kb->fsblk) - (ka->fsblk < kb->fsblk);
+}
+
+static int e2undo_setup_tdb(const char *name, io_manager *io_ptr)
+{
+	errcode_t retval = 0;
+	const char *tdb_dir;
+	char *tdb_file;
+	char *dev_name, *tmp_name;
 
-	tdb_key.dptr = uuid_key;
-	tdb_key.dsize = sizeof(uuid_key);
-	tdb_data = tdb_fetch(tdb, tdb_key);
-	if (!tdb_data.dptr) {
-		retval = EXT2_ET_TDB_SUCCESS + tdb_error(tdb);
-		com_err(prg_name, retval, "%s", _("while fetching UUID"));
+	/* (re)open a specific undo file */
+	if (undo_file && undo_file[0] != 0) {
+		set_undo_io_backing_manager(*io_ptr);
+		*io_ptr = undo_io_manager;
+		set_undo_io_backup_file(undo_file);
+		printf(_("To undo the e2undo operation please run "
+			 "the command\n    e2undo %s %s\n\n"),
+			 undo_file, name);
 		return retval;
 	}
-	memcpy(s_uuid, tdb_data.dptr, sizeof(s_uuid));
-	free(tdb_data.dptr);
-	if (memcmp(s_uuid, super.s_uuid, sizeof(s_uuid))) {
-		com_err(prg_name, 0, "%s",
-			_("The filesystem UUID didn't match."));
-		return -1;
+
+	tmp_name = strdup(name);
+	if (!tmp_name) {
+	alloc_fn_fail:
+		com_err(prg_name, ENOMEM, "%s",
+			_("Couldn't allocate memory for tdb filename\n"));
+		return ENOMEM;
 	}
+	dev_name = basename(tmp_name);
 
-	return 0;
-}
+	tdb_dir = getenv("E2FSPROGS_UNDO_DIR");
+	if (!tdb_dir)
+		tdb_dir = "/var/lib/e2fsprogs";
 
-static int set_blk_size(TDB_CONTEXT *tdb, io_channel channel)
-{
-	int block_size;
-	errcode_t retval;
-	TDB_DATA tdb_key, tdb_data;
-
-	tdb_key.dptr = blksize_key;
-	tdb_key.dsize = sizeof(blksize_key);
-	tdb_data = tdb_fetch(tdb, tdb_key);
-	if (!tdb_data.dptr) {
-		retval = EXT2_ET_TDB_SUCCESS + tdb_error(tdb);
-		com_err(prg_name, retval, "%s", _("while fetching block size"));
+	if (!strcmp(tdb_dir, "none") || (tdb_dir[0] == 0) ||
+	    access(tdb_dir, W_OK))
+		return 0;
+
+	tdb_file = malloc(strlen(tdb_dir) + 9 + strlen(dev_name) + 7 + 1);
+	if (!tdb_file)
+		goto alloc_fn_fail;
+	sprintf(tdb_file, "%s/e2undo-%s.e2undo", tdb_dir, dev_name);
+
+	if ((unlink(tdb_file) < 0) && (errno != ENOENT)) {
+		retval = errno;
+		com_err(prg_name, retval,
+			_("while trying to delete %s"), tdb_file);
+		free(tdb_file);
 		return retval;
 	}
 
-	block_size = *(int *)tdb_data.dptr;
-	free(tdb_data.dptr);
-#ifdef DEBUG
-	printf("Block size %d\n", block_size);
-#endif
-	io_channel_set_blksize(channel, block_size);
-
-	return 0;
+	set_undo_io_backing_manager(*io_ptr);
+	*io_ptr = undo_io_manager;
+	set_undo_io_backup_file(tdb_file);
+	printf(_("To undo the e2undo operation please run "
+		 "the command\n    e2undo %s %s\n\n"),
+		 tdb_file, name);
+	free(tdb_file);
+	free(tmp_name);
+	return retval;
 }
 
 int main(int argc, char *argv[])
 {
-	int c,force = 0;
-	TDB_CONTEXT *tdb;
-	TDB_DATA key, data;
+	int c, force = 0, dry_run = 0, verbose = 0, dump = 0;
 	io_channel channel;
 	errcode_t retval;
-	int  mount_flags;
-	blk64_t  blk_num;
+	int mount_flags, csum_error = 0, io_error = 0;
+	size_t i, keys_per_block;
 	char *device_name, *tdb_file;
 	io_manager manager = unix_io_manager;
-	void *old_dptr = NULL;
+	struct undo_context undo_ctx;
+	char *buf;
+	struct undo_key_block *keyb;
+	struct undo_key *dkey;
+	struct undo_key_info *ikey;
+	__u32 key_crc, blk_crc, hdr_crc;
+	blk64_t lblk;
+	ext2_filsys fs;
 
 #ifdef ENABLE_NLS
 	setlocale(LC_MESSAGES, "");
@@ -141,13 +287,25 @@ int main(int argc, char *argv[])
 	add_error_table(&et_ext2_error_table);
 
 	prg_name = argv[0];
-	while((c = getopt(argc, argv, "f")) != EOF) {
+	while ((c = getopt(argc, argv, "fhnvz:")) != EOF) {
 		switch (c) {
-			case 'f':
-				force = 1;
-				break;
-			default:
-				usage();
+		case 'f':
+			force = 1;
+			break;
+		case 'h':
+			dump = 1;
+			break;
+		case 'n':
+			dry_run = 1;
+			break;
+		case 'v':
+			verbose = 1;
+			break;
+		case 'z':
+			undo_file = optarg;
+			break;
+		default:
+			usage();
 		}
 	}
 
@@ -157,14 +315,70 @@ int main(int argc, char *argv[])
 	tdb_file = argv[optind];
 	device_name = argv[optind+1];
 
-	tdb = tdb_open(tdb_file, 0, 0, O_RDONLY, 0600);
+	if (undo_file && strcmp(tdb_file, undo_file) == 0) {
+		printf(_("Will not write to an undo file while replaying it.\n"));
+		exit(1);
+	}
 
-	if (!tdb) {
+	/* Interpret the undo file */
+	retval = manager->open(tdb_file, IO_FLAG_EXCLUSIVE,
+			       &undo_ctx.undo_file);
+	if (retval) {
 		com_err(prg_name, errno,
 				_("while opening undo file `%s'\n"), tdb_file);
 		exit(1);
 	}
+	retval = io_channel_read_blk64(undo_ctx.undo_file, 0,
+				       -(int)sizeof(undo_ctx.hdr),
+				       &undo_ctx.hdr);
+	if (retval) {
+		com_err(prg_name, retval, _("while reading undo file"));
+		exit(1);
+	}
+	if (memcmp(undo_ctx.hdr.magic, E2UNDO_MAGIC,
+		    sizeof(undo_ctx.hdr.magic))) {
+		fprintf(stderr, _("%s: Not an undo file.\n"), tdb_file);
+		exit(1);
+	}
+	if (dump) {
+		dump_header(&undo_ctx.hdr);
+		exit(1);
+	}
+	hdr_crc = ext2fs_crc32c_le(~0, (unsigned char *)&undo_ctx.hdr,
+				   sizeof(struct undo_header) -
+				   sizeof(__u32));
+	if (!force && ext2fs_le32_to_cpu(undo_ctx.hdr.header_crc) != hdr_crc) {
+		fprintf(stderr, _("%s: Header checksum doesn't match.\n"),
+			tdb_file);
+		exit(1);
+	}
+	undo_ctx.blocksize = ext2fs_le32_to_cpu(undo_ctx.hdr.block_size);
+	undo_ctx.fs_blocksize = ext2fs_le32_to_cpu(undo_ctx.hdr.fs_block_size);
+	if (undo_ctx.blocksize == 0 || undo_ctx.fs_blocksize == 0) {
+		fprintf(stderr, _("%s: Corrupt undo file header.\n"), tdb_file);
+		exit(1);
+	}
+	if (!force && undo_ctx.blocksize > E2UNDO_MAX_BLOCK_SIZE) {
+		fprintf(stderr, _("%s: Undo block size too large.\n"),
+			tdb_file);
+		exit(1);
+	}
+	if (!force && undo_ctx.blocksize < E2UNDO_MIN_BLOCK_SIZE) {
+		fprintf(stderr, _("%s: Undo block size too small.\n"),
+			tdb_file);
+		exit(1);
+	}
+	undo_ctx.super_block = ext2fs_le64_to_cpu(undo_ctx.hdr.super_offset);
+	undo_ctx.num_keys = ext2fs_le64_to_cpu(undo_ctx.hdr.num_keys);
+	io_channel_set_blksize(undo_ctx.undo_file, undo_ctx.blocksize);
+	if (!force && (undo_ctx.hdr.f_compat || undo_ctx.hdr.f_incompat ||
+		       undo_ctx.hdr.f_rocompat)) {
+		fprintf(stderr, _("%s: Unknown undo file feature set.\n"),
+			tdb_file);
+		exit(1);
+	}
 
+	/* open the fs */
 	retval = ext2fs_check_if_mounted(device_name, &mount_flags);
 	if (retval) {
 		com_err(prg_name, retval, _("Error while determining whether "
@@ -178,53 +392,197 @@ int main(int argc, char *argv[])
 		exit(1);
 	}
 
+	if (undo_file) {
+		retval = e2undo_setup_tdb(device_name, &manager);
+		if (retval)
+			exit(1);
+	}
+
 	retval = manager->open(device_name,
-				IO_FLAG_EXCLUSIVE | IO_FLAG_RW,  &channel);
+			       IO_FLAG_EXCLUSIVE | (dry_run ? 0 : IO_FLAG_RW),
+			       &channel);
 	if (retval) {
 		com_err(prg_name, retval,
 				_("while opening `%s'"), device_name);
 		exit(1);
 	}
 
-	if (!force && check_filesystem(tdb, channel)) {
+	if (!force && check_filesystem(&undo_ctx, channel))
 		exit(1);
-	}
 
-	if (set_blk_size(tdb, channel)) {
+	/* prepare to read keys */
+	retval = ext2fs_get_mem(sizeof(struct undo_key_info) * undo_ctx.num_keys,
+				&undo_ctx.keys);
+	if (retval) {
+		com_err(prg_name, retval, "%s", _("while allocating memory"));
+		exit(1);
+	}
+	ikey = undo_ctx.keys;
+	retval = ext2fs_get_mem(undo_ctx.blocksize, &keyb);
+	if (retval) {
+		com_err(prg_name, retval, "%s", _("while allocating memory"));
+		exit(1);
+	}
+	retval = ext2fs_get_mem(E2UNDO_MAX_EXTENT_BLOCKS * undo_ctx.blocksize,
+				&buf);
+	if (retval) {
+		com_err(prg_name, retval, "%s", _("while allocating memory"));
 		exit(1);
 	}
 
-	for (key = tdb_firstkey(tdb); key.dptr; key = tdb_nextkey(tdb, key)) {
-		free(old_dptr);
-		old_dptr = key.dptr;
-		if (!strcmp((char *) key.dptr, (char *) mtime_key) ||
-		    !strcmp((char *) key.dptr, (char *) uuid_key) ||
-		    !strcmp((char *) key.dptr, (char *) blksize_key)) {
-			continue;
+	/* load keys */
+	keys_per_block = KEYS_PER_BLOCK(&undo_ctx);
+	lblk = ext2fs_le64_to_cpu(undo_ctx.hdr.key_offset);
+	dbg_printf("nr_keys=%zu, kpb=%zu, blksz=%u\n",
+		   undo_ctx.num_keys, keys_per_block, undo_ctx.blocksize);
+	for (i = 0; i < undo_ctx.num_keys; i += keys_per_block) {
+		size_t j, max_j;
+		__le32 crc;
+
+		retval = io_channel_read_blk64(undo_ctx.undo_file,
+					       lblk, 1, keyb);
+		if (retval) {
+			com_err(prg_name, retval, "%s", _("while reading keys"));
+			if (force) {
+				io_error = 1;
+				undo_ctx.num_keys = i;
+				break;
+			}
+			exit(1);
 		}
 
-		blk_num = *(blk64_t *)key.dptr;
-		data = tdb_fetch(tdb, key);
-		if (!data.dptr) {
-			retval = EXT2_ET_TDB_SUCCESS + tdb_error(tdb);
-			com_err(prg_name, retval,
-				_("while fetching block %llu."), blk_num);
+		/* check keys */
+		if (!force &&
+		    ext2fs_le32_to_cpu(keyb->magic) != KEYBLOCK_MAGIC) {
+			fprintf(stderr, _("%s: wrong key magic at %llu\n"),
+				tdb_file, lblk);
 			exit(1);
 		}
-		printf(_("Replayed transaction of size %zd at location %llu\n"),
-							data.dsize, blk_num);
-		retval = io_channel_write_blk64(channel, blk_num,
-						-data.dsize, data.dptr);
-		free(data.dptr);
-		if (retval == -1) {
-			com_err(prg_name, retval,
-				_("while writing block %llu."), blk_num);
+		crc = keyb->crc;
+		keyb->crc = 0;
+		key_crc = ext2fs_crc32c_le(~0, (unsigned char *)keyb,
+					   undo_ctx.blocksize);
+		if (!force && ext2fs_le32_to_cpu(crc) != key_crc) {
+			fprintf(stderr,
+				_("%s: key block checksum error at %llu.\n"),
+				tdb_file, lblk);
 			exit(1);
 		}
+
+		/* load keys from key block */
+		lblk++;
+		max_j = undo_ctx.num_keys - i;
+		if (max_j > keys_per_block)
+			max_j = keys_per_block;
+		for (j = 0, dkey = keyb->keys;
+		     j < max_j;
+		     j++, ikey++, dkey++) {
+			ikey->fsblk = ext2fs_le64_to_cpu(dkey->fsblk);
+			ikey->fileblk = lblk;
+			ikey->blk_crc = ext2fs_le32_to_cpu(dkey->blk_crc);
+			ikey->size = ext2fs_le32_to_cpu(dkey->size);
+			lblk += (ikey->size + undo_ctx.blocksize - 1) /
+				undo_ctx.blocksize;
+
+			if (E2UNDO_MAX_EXTENT_BLOCKS * undo_ctx.blocksize <
+			    ikey->size) {
+				com_err(prg_name, retval,
+					_("%s: block %llu is too long."),
+					tdb_file, ikey->fsblk);
+				exit(1);
+			}
+
+			/* check each block's crc */
+			retval = io_channel_read_blk64(undo_ctx.undo_file,
+						       ikey->fileblk,
+						       -(int)ikey->size,
+						       buf);
+			if (retval) {
+				com_err(prg_name, retval,
+					_("while fetching block %llu."),
+					ikey->fileblk);
+				if (!force)
+					exit(1);
+				io_error = 1;
+				continue;
+			}
+
+			blk_crc = ext2fs_crc32c_le(~0, (unsigned char *)buf,
+						   ikey->size);
+			if (blk_crc != ikey->blk_crc) {
+				fprintf(stderr,
+					_("checksum error in filesystem block "
+					  "%llu (undo blk %llu)\n"),
+					ikey->fsblk, ikey->fileblk);
+				if (!force)
+					exit(1);
+				csum_error = 1;
+			}
+		}
 	}
-	free(old_dptr);
+	ext2fs_free_mem(&keyb);
+
+	/* sort keys in fs block order */
+	qsort(undo_ctx.keys, undo_ctx.num_keys, sizeof(struct undo_key_info),
+	      key_compare);
+
+	/* replay */
+	io_channel_set_blksize(channel, undo_ctx.fs_blocksize);
+	for (i = 0, ikey = undo_ctx.keys; i < undo_ctx.num_keys; i++, ikey++) {
+		retval = io_channel_read_blk64(undo_ctx.undo_file,
+					       ikey->fileblk,
+					       -(int)ikey->size,
+					       buf);
+		if (retval) {
+			com_err(prg_name, retval,
+				_("while fetching block %llu."),
+				ikey->fileblk);
+			io_error = 1;
+			continue;
+		}
+
+		if (verbose)
+			printf("Replayed block of size %u from %llu to %llu\n",
+				ikey->size, ikey->fileblk, ikey->fsblk);
+		if (dry_run)
+			continue;
+		retval = io_channel_write_blk64(channel, ikey->fsblk,
+						-(int)ikey->size, buf);
+		if (retval) {
+			com_err(prg_name, retval,
+				_("while writing block %llu."), ikey->fsblk);
+			io_error = 1;
+		}
+	}
+
+	if (csum_error)
+		fprintf(stderr, _("Undo file corruption; run e2fsck NOW!\n"));
+	if (io_error)
+		fprintf(stderr, _("IO error during replay; run e2fsck NOW!\n"));
+	if (!(ext2fs_le32_to_cpu(undo_ctx.hdr.state) & E2UNDO_STATE_FINISHED)) {
+		force = 1;
+		fprintf(stderr, _("Incomplete undo record; run e2fsck.\n"));
+	}
+	ext2fs_free_mem(&buf);
+	ext2fs_free_mem(&undo_ctx.keys);
 	io_channel_close(channel);
-	tdb_close(tdb);
 
-	return 0;
+	/* If there were problems, try to force a fsck */
+	if (!dry_run && (force || csum_error || io_error)) {
+		retval = ext2fs_open2(device_name, NULL,
+				   EXT2_FLAG_RW | EXT2_FLAG_64BITS, 0, 0,
+				   manager, &fs);
+		if (retval)
+			goto out;
+		fs->super->s_state &= ~EXT2_VALID_FS;
+		if (csum_error || io_error)
+			fs->super->s_state |= EXT2_ERROR_FS;
+		ext2fs_mark_super_dirty(fs);
+		ext2fs_close_free(&fs);
+	}
+
+out:
+	io_channel_close(undo_ctx.undo_file);
+
+	return csum_error;
 }
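
Reader's note on the replay order: the loop above sorts the keys into filesystem block order with qsort() before writing anything back. The body of key_compare() is not visible in this part of the patch, so the comparator below is only a sketch of what such an ordering could look like. struct replay_key, replay_key_compare() and sort_replay_keys() are hypothetical names, not the patch's own. It compares explicitly instead of subtracting, since the difference of two 64-bit block numbers does not fit in an int.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the patch's struct undo_key_info. */
struct replay_key {
	unsigned long long fsblk;	/* destination block on the filesystem */
	unsigned long long fileblk;	/* source block inside the undo file */
	unsigned int size;		/* payload size in bytes */
};

/* Order keys by destination block, then by position in the undo file. */
static int replay_key_compare(const void *a, const void *b)
{
	const struct replay_key *ka = a, *kb = b;

	if (ka->fsblk != kb->fsblk)
		return ka->fsblk < kb->fsblk ? -1 : 1;
	if (ka->fileblk != kb->fileblk)
		return ka->fileblk < kb->fileblk ? -1 : 1;
	return 0;
}

/* Sort an array of keys so that replay writes proceed in ascending order. */
void sort_replay_keys(struct replay_key *keys, size_t nr)
{
	qsort(keys, nr, sizeof(*keys), replay_key_compare);
}
```

Sorting first means the writes to the target device go out in ascending block order, which is presumably the point of the "sort keys in fs block order" step.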


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH 15/35] libext2fs: support atexit cleanups
  2015-04-02  2:34 [PATCH 00/35] e2fsprogs April 2015 patchbomb Darrick J. Wong
                   ` (13 preceding siblings ...)
  2015-04-02  2:35 ` [PATCH 14/35] e2undo: ditch tdb file, write everything to a flat file Darrick J. Wong
@ 2015-04-02  2:35 ` Darrick J. Wong
  2015-05-05 14:31   ` Theodore Ts'o
  2015-04-02  2:35 ` [PATCH 16/35] e2fsck: optionally create an undo file Darrick J. Wong
                   ` (18 subsequent siblings)
  33 siblings, 1 reply; 70+ messages in thread
From: Darrick J. Wong @ 2015-04-02  2:35 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

Use the atexit() function to provide a means for the library to clean
itself up on program exit.  This will be used by the undo IO manager
to flush the undo file state to disk if the program should terminate
without closing the io channel, since most e2fsprogs clients will
simply exit() when they hit errors.

This won't help for signal termination; client programs must set
up signal handlers.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
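Reader's note on the signal caveat: handlers registered through ext2fs_add_exit_fn() only run on a normal exit(), so a client that wants the undo file flushed on SIGINT or SIGTERM has to turn the signal into an exit() call itself. The following is a minimal sketch of one way to do that, not code from this series. install_exit_on_signal(), pending_signal() and exit_if_signalled() are hypothetical names. The handler only records the signal, because exit() is not async-signal-safe; the program is assumed to poll from its main loop.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helpers; none of these are part of libext2fs. */
static volatile sig_atomic_t got_signal;

/* Signal handler: record the signal number and return immediately. */
static void note_signal(int signo)
{
	got_signal = signo;
}

/* Install the handler for SIGINT and SIGTERM; returns 0 on success. */
int install_exit_on_signal(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = note_signal;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGINT, &sa, NULL) < 0)
		return -1;
	if (sigaction(SIGTERM, &sa, NULL) < 0)
		return -1;
	return 0;
}

/* Returns the last signal seen, or 0 if none has arrived. */
int pending_signal(void)
{
	return got_signal;
}

/*
 * Call this from the main loop.  exit() runs the atexit() handlers,
 * which is where the library's cleanup list would be processed.
 */
void exit_if_signalled(void)
{
	if (got_signal)
		exit(128 + got_signal);
}
```

A client would call install_exit_on_signal() once at startup and exit_if_signalled() between units of work. Whether polling is acceptable depends on how long the client can block inside a single library call.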
 lib/ext2fs/Makefile.in |    8 +++
 lib/ext2fs/atexit.c    |  112 ++++++++++++++++++++++++++++++++++++++++++++++++
 lib/ext2fs/ext2fsP.h   |    5 ++
 lib/ext2fs/undo_io.c   |   32 ++++++++++++--
 4 files changed, 154 insertions(+), 3 deletions(-)
 create mode 100644 lib/ext2fs/atexit.c


diff --git a/lib/ext2fs/Makefile.in b/lib/ext2fs/Makefile.in
index 367f440..e717ae0 100644
--- a/lib/ext2fs/Makefile.in
+++ b/lib/ext2fs/Makefile.in
@@ -57,6 +57,7 @@ OBJS= $(DEBUGFS_LIB_OBJS) $(RESIZE_LIB_OBJS) $(E2IMAGE_LIB_OBJS) \
 	alloc_sb.o \
 	alloc_stats.o \
 	alloc_tables.o \
+	atexit.o \
 	badblocks.o \
 	bb_inode.o \
 	bitmaps.o \
@@ -133,6 +134,7 @@ SRCS= ext2_err.c \
 	$(srcdir)/alloc_sb.c \
 	$(srcdir)/alloc_stats.c \
 	$(srcdir)/alloc_tables.c \
+	$(srcdir)/atexit.c \
 	$(srcdir)/badblocks.c \
 	$(srcdir)/bb_compat.c \
 	$(srcdir)/bb_inode.c \
@@ -639,6 +641,12 @@ alloc_tables.o: $(srcdir)/alloc_tables.c $(top_builddir)/lib/config.h \
  $(srcdir)/ext2_fs.h $(srcdir)/ext3_extents.h $(top_srcdir)/lib/et/com_err.h \
  $(srcdir)/ext2_io.h $(top_builddir)/lib/ext2fs/ext2_err.h \
  $(srcdir)/ext2_ext_attr.h $(srcdir)/bitops.h $(srcdir)/ext2fsP.h
+atexit.o: $(srcdir)/atexit.c $(top_builddir)/lib/config.h \
+ $(top_builddir)/lib/dirpaths.h $(srcdir)/ext2_fs.h \
+ $(top_builddir)/lib/ext2fs/ext2_types.h $(srcdir)/ext2fs.h \
+ $(srcdir)/ext2_fs.h $(srcdir)/ext3_extents.h $(top_srcdir)/lib/et/com_err.h \
+ $(srcdir)/ext2_io.h $(top_builddir)/lib/ext2fs/ext2_err.h \
+ $(srcdir)/ext2_ext_attr.h $(srcdir)/bitops.h $(srcdir)/ext2fsP.h
 badblocks.o: $(srcdir)/badblocks.c $(top_builddir)/lib/config.h \
  $(top_builddir)/lib/dirpaths.h $(srcdir)/ext2_fs.h \
  $(top_builddir)/lib/ext2fs/ext2_types.h $(srcdir)/ext2fsP.h \
diff --git a/lib/ext2fs/atexit.c b/lib/ext2fs/atexit.c
new file mode 100644
index 0000000..5eba993
--- /dev/null
+++ b/lib/ext2fs/atexit.c
@@ -0,0 +1,112 @@
+/*
+ * atexit.c --- Clean things up when we exit normally.
+ *
+ * Copyright Oracle, 2014
+ * Author Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * %Begin-Header%
+ * This file may be redistributed under the terms of the GNU Library
+ * General Public License, version 2.
+ * %End-Header%
+ */
+
+#define _LARGEFILE_SOURCE
+#define _LARGEFILE64_SOURCE
+
+#include "config.h"
+#include <stdlib.h>
+
+#include "ext2_fs.h"
+#include "ext2fs.h"
+#include "ext2fsP.h"
+
+struct exit_data {
+	ext2_exit_fn func;
+	void *data;
+};
+
+static struct exit_data *items;
+static size_t nr_items;
+
+static void handle_exit(void)
+{
+	struct exit_data *ed;
+
+	for (ed = items + nr_items - 1; ed >= items; ed--) {
+		if (ed->func == NULL)
+			continue;
+		ed->func(ed->data);
+	}
+
+	ext2fs_free_mem(&items);
+	nr_items = 0;
+}
+
+/*
+ * Schedule a function to be called at (normal) program termination.
+ * If you want this to be called during a signal exit, you must capture
+ * the signal and call exit() yourself!
+ */
+errcode_t ext2fs_add_exit_fn(ext2_exit_fn func, void *data)
+{
+	struct exit_data *ed, *free_ed = NULL;
+	size_t x;
+	errcode_t ret;
+
+	if (func == NULL)
+		return EXT2_ET_INVALID_ARGUMENT;
+
+	for (x = 0, ed = items; x < nr_items; x++, ed++) {
+		if (ed->func == func && ed->data == data)
+			return EXT2_ET_FILE_EXISTS;
+		if (ed->func == NULL)
+			free_ed = ed;
+	}
+
+	if (free_ed) {
+		free_ed->func = func;
+		free_ed->data = data;
+		return 0;
+	}
+
+	if (nr_items == 0) {
+		ret = atexit(handle_exit);
+		if (ret)
+			return ret;
+	}
+
+	ret = ext2fs_resize_mem(0, (nr_items + 1) * sizeof(struct exit_data),
+				&items);
+	if (ret)
+		return ret;
+
+	items[nr_items].func = func;
+	items[nr_items].data = data;
+	nr_items++;
+
+	return 0;
+}
+
+/* Remove a function from the exit cleanup list. */
+errcode_t ext2fs_remove_exit_fn(ext2_exit_fn func, void *data)
+{
+	struct exit_data *ed;
+	size_t x;
+
+	if (func == NULL)
+		return EXT2_ET_INVALID_ARGUMENT;
+
+	for (x = 0, ed = items; x < nr_items; x++, ed++) {
+		if (ed->func == NULL)
+			return 0;
+		if (ed->func == func && ed->data == data) {
+			size_t sz = (nr_items - (x + 1)) *
+				    sizeof(struct exit_data);
+			memmove(ed, ed + 1, sz);
+			memset(items + nr_items - 1, 0,
+			       sizeof(struct exit_data));
+		}
+	}
+
+	return 0;
+}
diff --git a/lib/ext2fs/ext2fsP.h b/lib/ext2fs/ext2fsP.h
index f8c61e6..8de9d33 100644
--- a/lib/ext2fs/ext2fsP.h
+++ b/lib/ext2fs/ext2fsP.h
@@ -169,3 +169,8 @@ extern int ext2fs_mem_is_zero(const char *mem, size_t len);
 extern int ext2fs_file_block_offset_too_big(ext2_filsys fs,
 					    struct ext2_inode *inode,
 					    blk64_t offset);
+
+/* atexit support */
+typedef void (*ext2_exit_fn)(void *);
+errcode_t ext2fs_add_exit_fn(ext2_exit_fn fn, void *data);
+errcode_t ext2fs_remove_exit_fn(ext2_exit_fn fn, void *data);
diff --git a/lib/ext2fs/undo_io.c b/lib/ext2fs/undo_io.c
index f1c107a..df26e3e 100644
--- a/lib/ext2fs/undo_io.c
+++ b/lib/ext2fs/undo_io.c
@@ -41,6 +41,7 @@
 
 #include "ext2_fs.h"
 #include "ext2fs.h"
+#include "ext2fsP.h"
 
 #ifdef __GNUC__
 #define ATTR(x) __attribute__(x)
@@ -135,7 +136,7 @@ struct undo_private_data {
 
 	ext2fs_block_bitmap written_block_map;
 	struct struct_ext2_filsys fake_fs;
-
+	char *tdb_file;
 	struct undo_header hdr;
 };
 #define KEYS_PER_BLOCK(d) (((d)->tdb_data_size / sizeof(struct undo_key)) - 1)
@@ -662,6 +663,17 @@ out:
 	return retval;
 }
 
+static void undo_atexit(void *p)
+{
+	struct undo_private_data *data = p;
+	errcode_t err;
+
+	err = write_undo_indexes(data);
+	io_channel_close(data->undo_file);
+
+	com_err(data->tdb_file, err, "while force-closing undo file");
+}
+
 static errcode_t undo_open(const char *name, int flags, io_channel *channel)
 {
 	io_channel	io = NULL;
@@ -703,11 +715,16 @@ static errcode_t undo_open(const char *name, int flags, io_channel *channel)
 		if (retval)
 			goto cleanup;
 
-		undo_fd = ext2fs_open_file(tdb_file, O_RDWR | O_CREAT, 0600);
+		data->tdb_file = strdup(tdb_file);
+		if (data->tdb_file == NULL)
+			goto cleanup;
+		undo_fd = ext2fs_open_file(data->tdb_file, O_RDWR | O_CREAT,
+					   0600);
 		if (undo_fd < 0)
 			goto cleanup;
 
-		retval = undo_io_backing_manager->open(tdb_file, IO_FLAG_RW,
+		retval = undo_io_backing_manager->open(data->tdb_file,
+						       IO_FLAG_RW,
 						       &data->undo_file);
 		if (retval)
 			goto cleanup;
@@ -732,6 +749,9 @@ static errcode_t undo_open(const char *name, int flags, io_channel *channel)
 		if (retval)
 			goto cleanup;
 	}
+	retval = ext2fs_add_exit_fn(undo_atexit, data);
+	if (retval)
+		goto cleanup;
 
 	*channel = io;
 	if (undo_fd >= 0)
@@ -739,10 +759,13 @@ static errcode_t undo_open(const char *name, int flags, io_channel *channel)
 	return retval;
 
 cleanup:
+	ext2fs_remove_exit_fn(undo_atexit, data);
 	if (undo_fd >= 0)
 		close(undo_fd);
 	if (data && data->undo_file)
 		io_channel_close(data->undo_file);
+	if (data && data->tdb_file)
+		free(data->tdb_file);
 	if (data && data->real)
 		io_channel_close(data->real);
 	if (data)
@@ -769,11 +792,14 @@ static errcode_t undo_close(io_channel channel)
 	err = write_undo_indexes(data);
 	if (data->real)
 		retval = io_channel_close(data->real);
+	if (data->tdb_file)
+		free(data->tdb_file);
 	if (data->undo_file)
 		io_channel_close(data->undo_file);
 	ext2fs_free_mem(&data->keyb);
 	if (data->written_block_map)
 		ext2fs_free_generic_bitmap(data->written_block_map);
+	ext2fs_remove_exit_fn(undo_atexit, data);
 	ext2fs_free_mem(&channel->private_data);
 	if (channel->name)
 		ext2fs_free_mem(&channel->name);


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH 16/35] e2fsck: optionally create an undo file
  2015-04-02  2:34 [PATCH 00/35] e2fsprogs April 2015 patchbomb Darrick J. Wong
                   ` (14 preceding siblings ...)
  2015-04-02  2:35 ` [PATCH 15/35] libext2fs: support atexit cleanups Darrick J. Wong
@ 2015-04-02  2:35 ` Darrick J. Wong
  2015-05-05 14:07   ` Theodore Ts'o
  2015-04-02  2:35 ` [PATCH 17/35] resize2fs: optionally create " Darrick J. Wong
                   ` (17 subsequent siblings)
  33 siblings, 1 reply; 70+ messages in thread
From: Darrick J. Wong @ 2015-04-02  2:35 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

Provide the user with an option to create an undo file so that they
can roll back a failed repair operation.

v2: Support reopening undo files.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
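Reader's note on the default file name: when -z is given an empty string, the undo file is named <undo_dir>/e2fsck-<device>.e2undo. The patch builds that string with a hand-counted malloc() followed by sprintf(). A sketch of an alternative that lets snprintf() compute the length is shown below. build_undo_path() is a hypothetical helper, not something in e2fsprogs. It takes the last path component with strrchr() instead of basename(), so a device name with a trailing slash would behave differently.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, not in e2fsprogs: build
 * "<dir>/<prog>-<last component of device>.e2undo".
 * Returns a malloc()ed string the caller must free(), or NULL on failure.
 */
char *build_undo_path(const char *dir, const char *prog, const char *device)
{
	const char *slash = strrchr(device, '/');
	const char *dev_name = slash ? slash + 1 : device;
	int len;
	char *path;

	/* First pass: ask snprintf() how many bytes the result needs. */
	len = snprintf(NULL, 0, "%s/%s-%s.e2undo", dir, prog, dev_name);
	if (len < 0)
		return NULL;

	path = malloc((size_t)len + 1);
	if (!path)
		return NULL;

	/* Second pass: format into a buffer of exactly the right size. */
	snprintf(path, (size_t)len + 1, "%s/%s-%s.e2undo", dir, prog, dev_name);
	return path;
}
```

With this shape the same helper could serve every tool by passing a different prog string, and no per-tool length constant has to be kept in sync with the format string.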
 e2fsck/e2fsck.8.in |   10 +++++
 e2fsck/e2fsck.h    |    3 ++
 e2fsck/unix.c      |   96 +++++++++++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 107 insertions(+), 2 deletions(-)


diff --git a/e2fsck/e2fsck.8.in b/e2fsck/e2fsck.8.in
index e1bbd27..f6d8436 100644
--- a/e2fsck/e2fsck.8.in
+++ b/e2fsck/e2fsck.8.in
@@ -339,6 +339,16 @@ may not be specified at the same time as the
 or
 .B \-p
 options.
+.TP
+.BI \-z " undo_file"
+Before overwriting a file system block, write the old contents of the block to
+an undo file.  This undo file can be used with e2undo(8) to restore the old
+contents of the file system should something go wrong.  If the empty string is
+passed as the undo_file argument, the undo file will be written to a file named
+e2fsck-\fIdevice\fR.e2undo in the directory specified via the
+\fIE2FSPROGS_UNDO_DIR\fR environment variable.
+
+WARNING: The undo file cannot be used to recover from a power or system crash.
 .SH EXIT CODE
 The exit code returned by
 .B e2fsck
diff --git a/e2fsck/e2fsck.h b/e2fsck/e2fsck.h
index 453b552..c87f00e 100644
--- a/e2fsck/e2fsck.h
+++ b/e2fsck/e2fsck.h
@@ -391,6 +391,9 @@ struct e2fsck_struct {
 	 * Inodes to rebuild extent trees
 	 */
 	ext2fs_inode_bitmap inodes_to_rebuild;
+
+	/* Undo file */
+	char *undo_file;
 };
 
 /* Data structures to evaluate whether an extent tree needs rebuilding. */
diff --git a/e2fsck/unix.c b/e2fsck/unix.c
index f8d088e..8009846 100644
--- a/e2fsck/unix.c
+++ b/e2fsck/unix.c
@@ -45,6 +45,7 @@ extern int optind;
 #ifdef HAVE_DIRENT_H
 #include <dirent.h>
 #endif
+#include <libgen.h>
 
 #include "e2p/e2p.h"
 #include "et/com_err.h"
@@ -75,7 +76,7 @@ static void usage(e2fsck_t ctx)
 		_("Usage: %s [-panyrcdfvtDFV] [-b superblock] [-B blocksize]\n"
 		"\t\t[-I inode_buffer_blocks] [-P process_inode_size]\n"
 		"\t\t[-l|-L bad_blocks_file] [-C fd] [-j external_journal]\n"
-		"\t\t[-E extended-options] device\n"),
+		"\t\t[-E extended-options] [-z undo_file] device\n"),
 		ctx->program_name);
 
 	fprintf(stderr, "%s", _("\nEmergency help:\n"
@@ -91,6 +92,7 @@ static void usage(e2fsck_t ctx)
 		" -j external_journal  Set location of the external journal\n"
 		" -l bad_blocks_file   Add to badblocks list\n"
 		" -L bad_blocks_file   Set badblocks list\n"
+		" -z undo_file         Create an undo file\n"
 		));
 
 	exit(FSCK_USAGE);
@@ -798,7 +800,7 @@ static errcode_t PRS(int argc, char *argv[], e2fsck_t *ret_ctx)
 
 	phys_mem_kb = get_memory_size() / 1024;
 	ctx->readahead_kb = ~0ULL;
-	while ((c = getopt (argc, argv, "panyrcC:B:dE:fvtFVM:b:I:j:P:l:L:N:SsDk")) != EOF)
+	while ((c = getopt(argc, argv, "panyrcC:B:dE:fvtFVM:b:I:j:P:l:L:N:SsDkz:")) != EOF)
 		switch (c) {
 		case 'C':
 			ctx->progress = e2fsck_update_progress;
@@ -930,6 +932,9 @@ static errcode_t PRS(int argc, char *argv[], e2fsck_t *ret_ctx)
 		case 'k':
 			keep_bad_blocks++;
 			break;
+		case 'z':
+			ctx->undo_file = optarg;
+			break;
 		default:
 			usage(ctx);
 		}
@@ -1224,6 +1229,87 @@ check_error:
 	return retval;
 }
 
+static int e2fsck_setup_tdb(e2fsck_t ctx, io_manager *io_ptr)
+{
+	errcode_t retval = ENOMEM;
+	char *tdb_dir = NULL, *tdb_file = NULL;
+	char *dev_name, *tmp_name;
+	int free_tdb_dir = 0;
+
+	/* (re)open a specific undo file */
+	if (ctx->undo_file && ctx->undo_file[0] != 0) {
+		set_undo_io_backing_manager(*io_ptr);
+		*io_ptr = undo_io_manager;
+		retval = set_undo_io_backup_file(ctx->undo_file);
+		if (retval)
+			goto err;
+		printf(_("Overwriting existing filesystem; this can be undone "
+			 "using the command:\n"
+			 "    e2undo %s %s\n\n"),
+			ctx->undo_file, ctx->filesystem_name);
+		return 0;
+	}
+
+	/*
+	 * Configuration via a conf file would be
+	 * nice
+	 */
+	tdb_dir = getenv("E2FSPROGS_UNDO_DIR");
+	if (!tdb_dir) {
+		profile_get_string(ctx->profile, "defaults",
+				   "undo_dir", 0, "/var/lib/e2fsprogs",
+				   &tdb_dir);
+		free_tdb_dir = 1;
+	}
+
+	if (!strcmp(tdb_dir, "none") || (tdb_dir[0] == 0) ||
+	    access(tdb_dir, W_OK)) {
+		if (free_tdb_dir)
+			free(tdb_dir);
+		return 0;
+	}
+
+	tmp_name = strdup(ctx->filesystem_name);
+	if (!tmp_name)
+		goto errout;
+	dev_name = basename(tmp_name);
+	tdb_file = malloc(strlen(tdb_dir) + 8 + strlen(dev_name) + 7 + 1);
+	if (!tdb_file) {
+		free(tmp_name);
+		goto errout;
+	}
+	sprintf(tdb_file, "%s/e2fsck-%s.e2undo", tdb_dir, dev_name);
+	free(tmp_name);
+
+	if ((unlink(tdb_file) < 0) && (errno != ENOENT)) {
+		retval = errno;
+		goto errout;
+	}
+
+	set_undo_io_backing_manager(*io_ptr);
+	*io_ptr = undo_io_manager;
+	retval = set_undo_io_backup_file(tdb_file);
+	if (retval)
+		goto errout;
+	printf(_("Overwriting existing filesystem; this can be undone "
+		 "using the command:\n"
+		 "    e2undo %s %s\n\n"), tdb_file, ctx->filesystem_name);
+
+	if (free_tdb_dir)
+		free(tdb_dir);
+	free(tdb_file);
+	return 0;
+
+errout:
+	if (free_tdb_dir)
+		free(tdb_dir);
+	free(tdb_file);
+err:
+	com_err(ctx->program_name, retval, "%s",
+		_("while trying to setup undo file\n"));
+	return retval;
+}
+
 int main (int argc, char *argv[])
 {
 	errcode_t	retval = 0, retval2 = 0, orig_retval = 0;
@@ -1333,6 +1419,12 @@ restart:
 			flags &= ~EXT2_FLAG_EXCLUSIVE;
 	}
 
+	if (ctx->undo_file) {
+		retval = e2fsck_setup_tdb(ctx, &io_ptr);
+		if (retval)
+			exit(FSCK_ERROR);
+	}
+
 	ctx->openfs_flags = flags;
 	retval = try_open_fs(ctx, flags, io_ptr, &fs);
 


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH 17/35] resize2fs: optionally create undo file
  2015-04-02  2:34 [PATCH 00/35] e2fsprogs April 2015 patchbomb Darrick J. Wong
                   ` (15 preceding siblings ...)
  2015-04-02  2:35 ` [PATCH 16/35] e2fsck: optionally create an undo file Darrick J. Wong
@ 2015-04-02  2:35 ` Darrick J. Wong
  2015-05-05 14:36   ` Theodore Ts'o
  2015-04-02  2:35 ` [PATCH 18/35] tune2fs: " Darrick J. Wong
                   ` (16 subsequent siblings)
  33 siblings, 1 reply; 70+ messages in thread
From: Darrick J. Wong @ 2015-04-02  2:35 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

Provide the user with an option to create an undo file so that they
can roll back a failed resize operation.

v2: Allow reopening of undo files.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
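Reader's note on the directory check: with an empty -z argument, resize2fs_setup_tdb() quietly skips undo file creation when E2FSPROGS_UNDO_DIR is unset, empty, set to "none", or not writable. That decision can be read as a small predicate, sketched below. undo_dir_usable() is a hypothetical name used only for illustration.

```c
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 if an automatically named undo file
 * should be created in dir, 0 if undo should be skipped.
 */
int undo_dir_usable(const char *dir)
{
	if (dir == NULL || dir[0] == '\0')
		return 0;		/* unset or empty: no undo file */
	if (strcmp(dir, "none") == 0)
		return 0;		/* explicit opt-out */
	return access(dir, W_OK) == 0;	/* must be writable by the caller */
}
```

A caller would pass getenv("E2FSPROGS_UNDO_DIR") directly. Note that the skip is silent, so a user who passes -z "" without a usable directory gets no undo file and no warning.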
 resize/main.c         |   93 +++++++++++++++++++++++++++++++++++++++++++++++--
 resize/resize2fs.8.in |   14 +++++++
 2 files changed, 103 insertions(+), 4 deletions(-)


diff --git a/resize/main.c b/resize/main.c
index c25de61..a61943e 100644
--- a/resize/main.c
+++ b/resize/main.c
@@ -29,6 +29,7 @@ extern int optind;
 #include <sys/types.h>
 #include <sys/stat.h>
 #include <fcntl.h>
+#include <libgen.h>
 
 #include "e2p/e2p.h"
 
@@ -42,7 +43,8 @@ static char *device_name, *io_options;
 static void usage (char *prog)
 {
 	fprintf (stderr, _("Usage: %s [-d debug_flags] [-f] [-F] [-M] [-P] "
-			   "[-p] device [-b|-s|new_size]\n\n"), prog);
+			   "[-p] device [-b|-s|new_size] [-z undo_file]\n\n"),
+		 prog);
 
 	exit (1);
 }
@@ -162,6 +164,82 @@ static void bigalloc_check(ext2_filsys fs, int force)
 	}
 }
 
+static int resize2fs_setup_tdb(const char *device_name, char *undo_file,
+			       io_manager *io_ptr)
+{
+	errcode_t retval = ENOMEM;
+	char *tdb_dir = NULL, *tdb_file = NULL;
+	char *dev_name, *tmp_name;
+	int free_tdb_dir = 0;
+
+	/* (re)open a specific undo file */
+	if (undo_file && undo_file[0] != 0) {
+		set_undo_io_backing_manager(*io_ptr);
+		*io_ptr = undo_io_manager;
+		retval = set_undo_io_backup_file(undo_file);
+		if (retval)
+			goto err;
+		printf(_("Overwriting existing filesystem; this can be undone "
+			 "using the command:\n"
+			 "    e2undo %s %s\n\n"),
+			undo_file, device_name);
+		return 0;
+	}
+
+	/*
+	 * Configuration via a conf file would be
+	 * nice
+	 */
+	tdb_dir = getenv("E2FSPROGS_UNDO_DIR");
+
+	if (tdb_dir == NULL || !strcmp(tdb_dir, "none") || (tdb_dir[0] == 0) ||
+	    access(tdb_dir, W_OK)) {
+		if (free_tdb_dir)
+			free(tdb_dir);
+		return 0;
+	}
+
+	tmp_name = strdup(device_name);
+	if (!tmp_name)
+		goto errout;
+	dev_name = basename(tmp_name);
+		tdb_file = malloc(strlen(tdb_dir) + 11 + strlen(dev_name) + 7 + 1);
+	if (!tdb_file) {
+		free(tmp_name);
+		goto errout;
+	}
+	sprintf(tdb_file, "%s/resize2fs-%s.e2undo", tdb_dir, dev_name);
+	free(tmp_name);
+
+	if ((unlink(tdb_file) < 0) && (errno != ENOENT)) {
+		retval = errno;
+		goto errout;
+	}
+
+	set_undo_io_backing_manager(*io_ptr);
+	*io_ptr = undo_io_manager;
+	retval = set_undo_io_backup_file(tdb_file);
+	if (retval)
+		goto errout;
+	printf(_("Overwriting existing filesystem; this can be undone "
+		 "using the command:\n"
+		 "    e2undo %s %s\n\n"), tdb_file, device_name);
+
+	if (free_tdb_dir)
+		free(tdb_dir);
+	free(tdb_file);
+	return 0;
+
+errout:
+	if (free_tdb_dir)
+		free(tdb_dir);
+	free(tdb_file);
+err:
+	com_err(program_name, retval, "%s",
+		_("while trying to setup undo file\n"));
+	return retval;
+}
+
 int main (int argc, char ** argv)
 {
 	errcode_t	retval;
@@ -186,7 +264,7 @@ int main (int argc, char ** argv)
 	unsigned int	blocksize;
 	long		sysval;
 	int		len, mount_flags;
-	char		*mtpt;
+	char		*mtpt, *undo_file = NULL;
 
 #ifdef ENABLE_NLS
 	setlocale(LC_MESSAGES, "");
@@ -203,7 +281,7 @@ int main (int argc, char ** argv)
 	if (argc && *argv)
 		program_name = *argv;
 
-	while ((c = getopt(argc, argv, "d:fFhMPpS:bs")) != EOF) {
+	while ((c = getopt(argc, argv, "d:fFhMPpS:bsz:")) != EOF) {
 		switch (c) {
 		case 'h':
 			usage(program_name);
@@ -235,6 +313,9 @@ int main (int argc, char ** argv)
 		case 's':
 			flags |= RESIZE_DISABLE_64BIT;
 			break;
+		case 'z':
+			undo_file = optarg;
+			break;
 		default:
 			usage(program_name);
 		}
@@ -318,7 +399,11 @@ int main (int argc, char ** argv)
 		io_flags = EXT2_FLAG_RW | EXT2_FLAG_EXCLUSIVE;
 
 	io_flags |= EXT2_FLAG_64BITS;
-
+	if (undo_file) {
+		retval = resize2fs_setup_tdb(device_name, undo_file, &io_ptr);
+		if (retval)
+			exit(1);
+	}
 	retval = ext2fs_open2(device_name, io_options, io_flags,
 			      0, 0, io_ptr, &fs);
 	if (retval) {
diff --git a/resize/resize2fs.8.in b/resize/resize2fs.8.in
index 0129bfc..d2738e9 100644
--- a/resize/resize2fs.8.in
+++ b/resize/resize2fs.8.in
@@ -18,6 +18,10 @@ resize2fs \- ext2/ext3/ext4 file system resizer
 .B \-S
 .I RAID-stride
 ]
+[
+.B \-z
+.I undo_file
+]
 .I device
 [
 .I size
@@ -149,6 +153,16 @@ The
 program will heuristically determine the RAID stride that was specified
 when the filesystem was created.  This option allows the user to
 explicitly specify a RAID stride setting to be used by resize2fs instead.
+.TP
+.BI \-z " undo_file"
+Before overwriting a file system block, write the old contents of the block to
+an undo file.  This undo file can be used with e2undo(8) to restore the old
+contents of the file system should something go wrong.  If the empty string is
+passed as the undo_file argument, the undo file will be written to a file named
+resize2fs-\fIdevice\fR.e2undo in the directory specified via the
+\fIE2FSPROGS_UNDO_DIR\fR environment variable.
+
+WARNING: The undo file cannot be used to recover from a power or system crash.
 .SH KNOWN BUGS
 The minimum size of the filesystem as estimated by resize2fs may be
 incorrect, especially for filesystems with 1k and 2k blocksizes.


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH 18/35] tune2fs: optionally create undo file
  2015-04-02  2:34 [PATCH 00/35] e2fsprogs April 2015 patchbomb Darrick J. Wong
                   ` (16 preceding siblings ...)
  2015-04-02  2:35 ` [PATCH 17/35] resize2fs: optionally create " Darrick J. Wong
@ 2015-04-02  2:35 ` Darrick J. Wong
  2015-05-05 14:36   ` Theodore Ts'o
  2015-04-02  2:36 ` [PATCH 19/35] mke2fs: " Darrick J. Wong
                   ` (15 subsequent siblings)
  33 siblings, 1 reply; 70+ messages in thread
From: Darrick J. Wong @ 2015-04-02  2:35 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

Provide the user with an option to create an undo file so that they
can roll back a failed tuning operation.  Previously, one would be
created for inode resize if a bunch of (undocumented) conditions were
met.

v2: Enable re-opening of undo files.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 misc/tune2fs.8.in |   14 ++++++++++++++
 misc/tune2fs.c    |   30 ++++++++++++++++++++++++++----
 2 files changed, 40 insertions(+), 4 deletions(-)


diff --git a/misc/tune2fs.8.in b/misc/tune2fs.8.in
index 9d1df82..4373fc4 100644
--- a/misc/tune2fs.8.in
+++ b/misc/tune2fs.8.in
@@ -88,6 +88,10 @@ tune2fs \- adjust tunable filesystem parameters on ext2/ext3/ext4 filesystems
 .B \-U
 .I UUID
 ]
+[
+.B \-z
+.I undo_file
+]
 device
 .SH DESCRIPTION
 .BI tune2fs
@@ -687,6 +691,16 @@ or
 .IR /dev/urandom ,
 .B tune2fs
 will automatically use a time-based UUID instead of a randomly-generated UUID.
+.TP
+.BI \-z " undo_file"
+Before overwriting a file system block, write the old contents of the block to
+an undo file.  This undo file can be used with e2undo(8) to restore the old
+contents of the file system should something go wrong.  If the empty string is
+passed as the undo_file argument, the undo file will be written to a file named
+tune2fs-\fIdevice\fR.e2undo in the directory specified via the
+\fIE2FSPROGS_UNDO_DIR\fR environment variable.
+
+WARNING: The undo file cannot be used to recover from a power or system crash.
 .SH BUGS
 We haven't found any bugs yet.  That doesn't mean there aren't any...
 .SH AUTHOR
diff --git a/misc/tune2fs.c b/misc/tune2fs.c
index 550932d..ddaad59 100644
--- a/misc/tune2fs.c
+++ b/misc/tune2fs.c
@@ -99,6 +99,7 @@ static int usrquota, grpquota;
 static int rewrite_checksums;
 static int feature_64bit;
 static int fsck_requested;
+static char *undo_file;
 
 int journal_size, journal_flags;
 char *journal_device;
@@ -136,7 +137,8 @@ static void usage(void)
 		  "\t[-Q quota_options]\n"
 #endif
 		  "\t[-E extended-option[,...]] [-T last_check_time] "
-		  "[-U UUID]\n\t[ -I new_inode_size ] device\n"), program_name);
+		  "[-U UUID]\n\t[-I new_inode_size] [-z undo_file] device\n"),
+		program_name);
 	exit(1);
 }
 
@@ -465,6 +467,8 @@ static void convert_64bit(ext2_filsys fs, int direction)
 		fprintf(stderr, _("Please run `resize2fs %s %s"),
 			direction > 0 ? "-b" : "-s", fs->device_name);
 
+	if (undo_file)
+		fprintf(stderr, _(" -z \"%s\""), undo_file);
 	if (direction > 0)
 		fprintf(stderr, _("' to enable 64-bit mode.\n"));
 	else
@@ -1563,7 +1567,7 @@ static void parse_tune2fs_options(int argc, char **argv)
 	char *tmp;
 	struct group *gr;
 	struct passwd *pw;
-	char optstring[100] = "c:e:fg:i:jlm:o:r:s:u:C:E:I:J:L:M:O:T:U:";
+	char optstring[100] = "c:e:fg:i:jlm:o:r:s:u:C:E:I:J:L:M:O:T:U:z:";
 
 #ifdef CONFIG_QUOTA
 	strcat(optstring, "Q:");
@@ -1797,6 +1801,9 @@ static void parse_tune2fs_options(int argc, char **argv)
 			open_flag = EXT2_FLAG_RW;
 			I_flag = 1;
 			break;
+		case 'z':
+			undo_file = optarg;
+			break;
 		default:
 			usage();
 		}
@@ -2517,6 +2524,17 @@ static int tune2fs_setup_tdb(const char *name, io_manager *io_ptr)
 	char *tdb_file;
 	char *dev_name, *tmp_name;
 
+	/* (re)open a specific undo file */
+	if (undo_file && undo_file[0] != 0) {
+		set_undo_io_backing_manager(*io_ptr);
+		*io_ptr = undo_io_manager;
+		set_undo_io_backup_file(undo_file);
+		printf(_("To undo the tune2fs operation please run "
+			 "the command\n    e2undo %s %s\n\n"),
+			 undo_file, name);
+		return retval;
+	}
+
 #if 0 /* FIXME!! */
 	/*
 	 * Configuration via a conf file would be
@@ -2712,7 +2730,7 @@ retry_open:
 	}
 	fs->default_bitmap_type = EXT2FS_BMAP64_RBTREE;
 
-	if (I_flag && !io_ptr_orig) {
+	if (I_flag) {
 		/*
 		 * Check the inode size is right so we can issue an
 		 * error message and bail before setting up the tdb
@@ -2736,11 +2754,15 @@ retry_open:
 			rc = 1;
 			goto closefs;
 		}

^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH 19/35] mke2fs: optionally create undo file
  2015-04-02  2:34 [PATCH 00/35] e2fsprogs April 2015 patchbomb Darrick J. Wong
                   ` (17 preceding siblings ...)
  2015-04-02  2:35 ` [PATCH 18/35] tune2fs: " Darrick J. Wong
@ 2015-04-02  2:36 ` Darrick J. Wong
  2015-05-05 14:37   ` Theodore Ts'o
  2015-04-02  2:36 ` [PATCH 20/35] debugfs: " Darrick J. Wong
                   ` (14 subsequent siblings)
  33 siblings, 1 reply; 70+ messages in thread
From: Darrick J. Wong @ 2015-04-02  2:36 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

Provide the user with an option to create an undo file so that they
can roll back a failed mke2fs operation.  Previously, one would be
created if force_undo was set in the configuration file and a bunch of
(undocumented) conditions were met.

v2: Support reopening undo files.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 misc/mke2fs.8.in |   15 +++++++++++++++
 misc/mke2fs.c    |   25 ++++++++++++++++++++++---
 2 files changed, 37 insertions(+), 3 deletions(-)


diff --git a/misc/mke2fs.8.in b/misc/mke2fs.8.in
index aeb5caf..3230f65 100644
--- a/misc/mke2fs.8.in
+++ b/misc/mke2fs.8.in
@@ -117,6 +117,10 @@ mke2fs \- create an ext2/ext3/ext4 filesystem
 .B \-e
 .I errors-behavior
 ]
+[
+.B \-z
+.I undo_file
+]
 .I device
 [
 .I fs-size
@@ -738,6 +742,17 @@ Verbose execution.
 Print the version number of
 .B mke2fs
 and exit.
+.TP
+.BI \-z " undo_file"
+Before overwriting a file system block, write the old contents of the block to
+an undo file.  This undo file can be used with e2undo(8) to restore the old
+contents of the file system should something go wrong.  If the empty string is
+passed as the undo_file argument, the undo file will be written to a file named
+mke2fs-\fIdevice\fR.e2undo in the directory specified via the
+\fIE2FSPROGS_UNDO_DIR\fR environment variable or the \fIundo_dir\fR directive
+in the configuration file.
+
+WARNING: The undo file cannot be used to recover from a power or system crash.
 .SH ENVIRONMENT
 .TP
 .BI MKE2FS_SYNC
diff --git a/misc/mke2fs.c b/misc/mke2fs.c
index ec450ad..f5ef703 100644
--- a/misc/mke2fs.c
+++ b/misc/mke2fs.c
@@ -110,6 +110,7 @@ char *journal_device;
 static int sync_kludge;	/* Set using the MKE2FS_SYNC env. option */
 char **fs_types;
 const char *root_dir;  /* Copy files from the specified directory */
+static char *undo_file;
 
 static profile_t	profile;
 
@@ -129,7 +130,8 @@ static void usage(void)
 	"[-M last-mounted-directory]\n\t[-O feature[,...]] "
 	"[-r fs-revision] [-E extended-option[,...]]\n"
 	"\t[-t fs-type] [-T usage-type ] [-U UUID] [-e errors_behavior]"
-	"[-jnqvDFKSV] device [blocks-count]\n"),
+	" [-z undo_file]\n"
+	"\t[-jnqvDFKSV] device [blocks-count]\n"),
 		program_name);
 	exit(1);
 }
@@ -1552,7 +1554,7 @@ profile_error:
 	}
 
 	while ((c = getopt (argc, argv,
-		    "b:ce:g:i:jl:m:no:qr:s:t:d:vC:DE:FG:I:J:KL:M:N:O:R:ST:U:V")) != EOF) {
+		    "b:ce:g:i:jl:m:no:qr:s:t:d:vC:DE:FG:I:J:KL:M:N:O:R:ST:U:Vz:")) != EOF) {
 		switch (c) {
 		case 'b':
 			blocksize = parse_num_blocks2(optarg, -1);
@@ -1775,6 +1777,9 @@ profile_error:
 			/* Print version number and exit */
 			show_version_only++;
 			break;
+		case 'z':
+			undo_file = optarg;
+			break;
 		default:
 			usage();
 		}
@@ -2493,6 +2498,19 @@ static int mke2fs_setup_tdb(const char *name, io_manager *io_ptr)
 	char *dev_name, *tmp_name;
 	int free_tdb_dir = 0;
 
+	/* (re)open a specific undo file */
+	if (undo_file && undo_file[0] != 0) {
+		set_undo_io_backing_manager(*io_ptr);
+		*io_ptr = undo_io_manager;
+		retval = set_undo_io_backup_file(undo_file);
+		if (retval)
+			goto err;
+		printf(_("Overwriting existing filesystem; this can be undone "
+			 "using the command:\n"
+			 "    e2undo %s %s\n\n"), undo_file, name);
+		return 0;
+	}
+
 	/*
 	 * Configuration via a conf file would be
 	 * nice
@@ -2547,6 +2565,7 @@ errout:
 	if (free_tdb_dir)
 		free(tdb_dir);
 	free(tdb_file);
+err:
 	com_err(program_name, retval, "%s",
 		_("while trying to setup undo file\n"));
 	return retval;
@@ -2718,7 +2737,7 @@ int main (int argc, char *argv[])
 #endif
 		io_ptr = unix_io_manager;
 
-	if (should_do_undo(device_name)) {
+	if (undo_file != NULL || should_do_undo(device_name)) {
 		retval = mke2fs_setup_tdb(device_name, &io_ptr);
 		if (retval)
 			exit(1);


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH 20/35] debugfs: optionally create undo file
  2015-04-02  2:34 [PATCH 00/35] e2fsprogs April 2015 patchbomb Darrick J. Wong
                   ` (18 preceding siblings ...)
  2015-04-02  2:36 ` [PATCH 19/35] mke2fs: " Darrick J. Wong
@ 2015-04-02  2:36 ` Darrick J. Wong
  2015-05-05 14:43   ` Theodore Ts'o
  2015-04-02  2:36 ` [PATCH 21/35] tests: test undo file creation in e2fsck/resize2fs/tune2fs/mke2fs Darrick J. Wong
                   ` (13 subsequent siblings)
  33 siblings, 1 reply; 70+ messages in thread
From: Darrick J. Wong @ 2015-04-02  2:36 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

Provide the user with an option to create an undo file so that they
can roll back a failed debugfs expedition.

v2: Support reopening undo files.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 debugfs/debugfs.8.in |   16 +++++++-
 debugfs/debugfs.c    |  105 +++++++++++++++++++++++++++++++++++++++++++++++---
 2 files changed, 113 insertions(+), 8 deletions(-)


diff --git a/debugfs/debugfs.8.in b/debugfs/debugfs.8.in
index ab6adc4..9a09cbf 100644
--- a/debugfs/debugfs.8.in
+++ b/debugfs/debugfs.8.in
@@ -31,6 +31,10 @@ request
 data_source_device
 ]
 [
+.B \-z
+.I undo_file
+]
+[
 device
 ]
 .SH DESCRIPTION
@@ -130,6 +134,16 @@ and then exit.
 print the version number of
 .B debugfs
 and exit.
+.TP
+.BI \-z " undo_file"
+Before overwriting a file system block, write the old contents of the block to
+an undo file.  This undo file can be used with e2undo(8) to restore the old
+contents of the file system should something go wrong.  If the empty string is
+passed as the undo_file argument, the undo file will be written to a file named
+debugfs-\fIdevice\fR.e2undo in the directory specified via the
+\fIE2FSPROGS_UNDO_DIR\fR environment variable.
+
+WARNING: The undo file cannot be used to recover from a power failure or system crash.
 .SH SPECIFYING FILES
 Many
 .B debugfs
@@ -535,7 +549,7 @@ to those inodes.  The
 flag will enable checking the file type information in the directory
 entry to make sure it matches the inode's type.
 .TP
-.BI open " [-weficD] [-b blocksize] [-s superblock] device"
+.BI open " [-weficD] [-b blocksize] [-s superblock] [-z undo_file] device"
 Open a filesystem for editing.  The
 .I -f
 flag forces the filesystem to be opened even if there are some unknown
diff --git a/debugfs/debugfs.c b/debugfs/debugfs.c
index fe57366..4b88f73 100644
--- a/debugfs/debugfs.c
+++ b/debugfs/debugfs.c
@@ -15,6 +15,7 @@
 #include <ctype.h>
 #include <string.h>
 #include <time.h>
+#include <libgen.h>
 #ifdef HAVE_GETOPT_H
 #include <getopt.h>
 #else
@@ -48,12 +49,88 @@ ext2_filsys	current_fs;
 quota_ctx_t	current_qctx;
 ext2_ino_t	root, cwd;
 
+static int debugfs_setup_tdb(const char *device_name, char *undo_file,
+			     io_manager *io_ptr)
+{
+	errcode_t retval = ENOMEM;
+	char *tdb_dir = NULL, *tdb_file = NULL;
+	char *dev_name, *tmp_name;
+	int free_tdb_dir = 0;
+
+	/* (re)open a specific undo file */
+	if (undo_file && undo_file[0] != 0) {
+		set_undo_io_backing_manager(*io_ptr);
+		*io_ptr = undo_io_manager;
+		retval = set_undo_io_backup_file(undo_file);
+		if (retval)
+			goto err;
+		printf("Overwriting existing filesystem; this can be undone "
+			"using the command:\n"
+			"    e2undo %s %s\n\n",
+			undo_file, device_name);
+		return 0;
+	}
+
+	/*
+	 * Configuration via a conf file would be
+	 * nice
+	 */
+	tdb_dir = getenv("E2FSPROGS_UNDO_DIR");
+
+	if (tdb_dir == NULL || !strcmp(tdb_dir, "none") || (tdb_dir[0] == 0) ||
+	    access(tdb_dir, W_OK)) {
+		if (free_tdb_dir)
+			free(tdb_dir);
+		return 0;
+	}
+
+	tmp_name = strdup(device_name);
+	if (!tmp_name)
+		goto errout;
+	dev_name = basename(tmp_name);
+	tdb_file = malloc(strlen(tdb_dir) + 9 + strlen(dev_name) + 7 + 1);
+	if (!tdb_file) {
+		free(tmp_name);
+		goto errout;
+	}
+	sprintf(tdb_file, "%s/debugfs-%s.e2undo", tdb_dir, dev_name);
+	free(tmp_name);
+
+	if ((unlink(tdb_file) < 0) && (errno != ENOENT)) {
+		retval = errno;
+		goto errout;
+	}
+
+	set_undo_io_backing_manager(*io_ptr);
+	*io_ptr = undo_io_manager;
+	retval = set_undo_io_backup_file(tdb_file);
+	if (retval)
+		goto errout;
+	printf("Overwriting existing filesystem; this can be undone "
+		"using the command:\n"
+		"    e2undo %s %s\n\n", tdb_file, device_name);
+
+	if (free_tdb_dir)
+		free(tdb_dir);
+	free(tdb_file);
+	return 0;
+
+errout:
+	if (free_tdb_dir)
+		free(tdb_dir);
+	free(tdb_file);
+err:
+	com_err("debugfs", retval, "while trying to setup undo file\n");
+	return retval;
+}
+
 static void open_filesystem(char *device, int open_flags, blk64_t superblock,
 			    blk64_t blocksize, int catastrophic,
-			    char *data_filename)
+			    char *data_filename, char *undo_file)
 {
 	int	retval;
 	io_channel data_io = 0;
+	io_manager io_ptr = unix_io_manager;
 
 	if (superblock != 0 && blocksize == 0) {
 		com_err(device, 0, "if you specify the superblock, you must also specify the block size");
@@ -84,8 +161,14 @@ static void open_filesystem(char *device, int open_flags, blk64_t superblock,
 	if (catastrophic)
 		open_flags |= EXT2_FLAG_SKIP_MMP;
 
+	if (undo_file) {
+		retval = debugfs_setup_tdb(device, undo_file, &io_ptr);
+		if (retval)
+			exit(1);
+	}
+
 	retval = ext2fs_open(device, open_flags, superblock, blocksize,
-			     unix_io_manager, &current_fs);
+			     io_ptr, &current_fs);
 	if (retval) {
 		com_err(device, retval, "while opening filesystem");
 		if (retval == EXT2_ET_BAD_MAGIC)
@@ -136,9 +219,10 @@ void do_open_filesys(int argc, char **argv)
 	blk64_t	blocksize = 0;
 	int	open_flags = EXT2_FLAG_SOFTSUPP_FEATURES | EXT2_FLAG_64BITS; 
 	char	*data_filename = 0;
+	char	*undo_file = NULL;
 
 	reset_getopt();
-	while ((c = getopt (argc, argv, "iwfecb:s:d:D")) != EOF) {
+	while ((c = getopt(argc, argv, "iwfecb:s:d:Dz:")) != EOF) {
 		switch (c) {
 		case 'i':
 			open_flags |= EXT2_FLAG_IMAGE_FILE;
@@ -177,6 +261,9 @@ void do_open_filesys(int argc, char **argv)
 			if (err)
 				return;
 			break;
+		case 'z':
+			undo_file = optarg;
+			break;
 		default:
 			goto print_usage;
 		}
@@ -188,7 +275,7 @@ void do_open_filesys(int argc, char **argv)
 		return;
 	open_filesystem(argv[optind], open_flags,
 			superblock, blocksize, catastrophic,
-			data_filename);
+			data_filename, undo_file);
 	return;
 
 print_usage:
@@ -2219,7 +2306,7 @@ int main(int argc, char **argv)
 		"Usage: %s [-b blocksize] [-s superblock] [-f cmd_file] "
 		"[-R request] [-V] ["
 #ifndef READ_ONLY
-		"[-w] "
+		"[-w] [-z undo_file] "
 #endif
 		"[-c] device]";
 	int		c;
@@ -2234,7 +2321,8 @@ int main(int argc, char **argv)
 #ifdef READ_ONLY
 	const char	*opt_string = "nicR:f:b:s:Vd:D";
 #else
-	const char	*opt_string = "niwcR:f:b:s:Vd:D";
+	const char	*opt_string = "niwcR:f:b:s:Vd:Dz:";
+	char		*undo_file = NULL;
 #endif
 
 	if (debug_prog_name == 0)
@@ -2291,6 +2379,9 @@ int main(int argc, char **argv)
 			fprintf(stderr, "\tUsing %s\n",
 				error_message(EXT2_ET_BASE));
 			exit(0);
+		case 'z':
+			undo_file = optarg;
+			break;
 		default:
 			com_err(argv[0], 0, usage, debug_prog_name);
 			return 1;
@@ -2299,7 +2390,7 @@ int main(int argc, char **argv)
 	if (optind < argc)
 		open_filesystem(argv[optind], open_flags,
 				superblock, blocksize, catastrophic,
-				data_filename);
+				data_filename, undo_file);
 
 	sci_idx = ss_create_invocation(debug_prog_name, "0.0", (char *) NULL,
 				       &debug_cmds, &retval);



* [PATCH 21/35] tests: test undo file creation in e2fsck/resize2fs/tune2fs/mke2fs
  2015-04-02  2:34 [PATCH 00/35] e2fsprogs April 2015 patchbomb Darrick J. Wong
                   ` (19 preceding siblings ...)
  2015-04-02  2:36 ` [PATCH 20/35] debugfs: " Darrick J. Wong
@ 2015-04-02  2:36 ` Darrick J. Wong
  2015-05-05 14:43   ` Theodore Ts'o
  2015-04-02  2:36 ` [PATCH 22/35] tests: test various features of the new e2undo format Darrick J. Wong
                   ` (12 subsequent siblings)
  33 siblings, 1 reply; 70+ messages in thread
From: Darrick J. Wong @ 2015-04-02  2:36 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

Regression tests to ensure that we can create undo files and roll
things back if need be.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 tests/test_config                           |    1 
 tests/u_compound_rollback/script            |   62 ++++++++++++
 tests/u_debugfs_opt/script                  |   32 ++++++
 tests/u_e2fsck_opt/script                   |   32 ++++++
 tests/u_mke2fs/script                       |    4 -
 tests/u_mke2fs_opt/script                   |   32 ++++++
 tests/u_mke2fs_opt_oddsize/script           |   31 ++++++
 tests/u_resize2fs_opt/script                |   32 ++++++
 tests/u_revert_upgrade_to_64bitmcsum/script |  136 +++++++++++++++++++++++++++
 tests/u_tune2fs/script                      |    4 -
 tests/u_tune2fs_opt/script                  |   32 ++++++
 11 files changed, 394 insertions(+), 4 deletions(-)
 create mode 100644 tests/u_compound_rollback/script
 create mode 100644 tests/u_debugfs_opt/script
 create mode 100644 tests/u_e2fsck_opt/script
 create mode 100644 tests/u_mke2fs_opt/script
 create mode 100644 tests/u_mke2fs_opt_oddsize/script
 create mode 100644 tests/u_resize2fs_opt/script
 create mode 100644 tests/u_revert_upgrade_to_64bitmcsum/script
 create mode 100644 tests/u_tune2fs_opt/script
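
For illustration only (not part of the patch): the single-tool tests below
all apply the same pass criterion, and the 64bit/metadata_csum test keeps
re-checking the feature line.  A minimal sketch of both checks follows.
The helper names rollback_ok and has_feature are hypothetical, and
has_feature matches whole words, which is slightly stricter than the
"grep -c" used in the scripts.

```shell
# Hypothetical helper: succeeds only if e2undo restored the original image
# (crc0 = crc2) and the tool under test really changed it (crc1 != crc2).
# $1 = crc before, $2 = crc after the tool ran, $3 = crc after e2undo.
rollback_ok() {
	[ "$1" = "$3" ] && [ "$2" != "$3" ]
}

# Hypothetical helper: succeeds if feature $2 appears as a whole word in
# the "Filesystem features:" line $1 (as printed by dumpe2fs -h).
has_feature() {
	case " $1 " in
	*" $2 "*) return 0 ;;
	esac
	return 1
}
```

In a test script this would read, for example:
rollback_ok "$crc0" "$crc1" "$crc2" && touch $test_name.ok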


diff --git a/tests/test_config b/tests/test_config
index 2e3af6b..7f39157 100644
--- a/tests/test_config
+++ b/tests/test_config
@@ -17,6 +17,7 @@ TEST_BITS="../debugfs/debugfs"
 RESIZE2FS_EXE="../resize/resize2fs"
 RESIZE2FS="$USE_VALGRIND $RESIZE2FS_EXE"
 E2UNDO_EXE="../misc/e2undo"
+E2UNDO="$USE_VALGRIND $E2UNDO_EXE"
 TEST_REL=../tests/progs/test_rel
 TEST_ICOUNT=../tests/progs/test_icount
 CRCSUM=../tests/progs/crcsum
diff --git a/tests/u_compound_rollback/script b/tests/u_compound_rollback/script
new file mode 100644
index 0000000..0c1fbcc
--- /dev/null
+++ b/tests/u_compound_rollback/script
@@ -0,0 +1,62 @@
+test_description="e2undo with mke2fs/tune2fs/resize2fs/e2fsck -z"
+if test -x $RESIZE2FS_EXE -a -x $E2UNDO_EXE; then
+
+TDB_FILE=${TMPDIR:-/tmp}/resize2fs-$(basename $TMPFILE).e2undo
+OUT=$test_name.log
+rm -f $TDB_FILE >/dev/null 2>&1
+
+echo compound e2undo rollback test > $OUT
+
+dd if=/dev/zero of=$TMPFILE bs=1k count=512 > /dev/null 2>&1
+crc0=`$CRCSUM $TMPFILE`
+echo $CRCSUM before mke2fs $crc0 >> $OUT
+
+echo mke2fs -q -F -o Linux -T ext4 -O ^metadata_csum,64bit -E lazy_itable_init=1 -b 1024 -z $TDB_FILE.0 $TMPFILE 256 >> $OUT
+$MKE2FS -q -F -o Linux -T ext4 -O ^metadata_csum,64bit -E lazy_itable_init=1 -b 1024 -z $TDB_FILE.0 $TMPFILE 256 >> $OUT 2>&1
+crc1=`$CRCSUM $TMPFILE`
+echo $CRCSUM after mke2fs $crc1 >> $OUT
+
+echo using tune2fs to test e2undo >> $OUT
+$TUNE2FS -O metadata_csum -z $TDB_FILE.1 $TMPFILE >> $OUT 2>&1
+crc2=`$CRCSUM $TMPFILE`
+echo $CRCSUM after tune2fs $crc2 >> $OUT
+
+echo using resize2fs to test e2undo >> $OUT
+$RESIZE2FS -z $TDB_FILE.2 $TMPFILE 512 >> $OUT 2>&1
+crc3=`$CRCSUM $TMPFILE`
+echo $CRCSUM after resize2fs $crc3 >> $OUT
+
+echo using e2fsck to test e2undo >> $OUT
+$FSCK -f -y -D -z $TDB_FILE.3 $TMPFILE >> $OUT 2>&1
+crc4=`$CRCSUM $TMPFILE`
+echo $CRCSUM after e2fsck $crc4 >> $OUT
+
+echo roll back e2fsck >> $OUT
+$E2UNDO $TDB_FILE.3 $TMPFILE  >> $OUT 2>&1
+crc3_2=`$CRCSUM $TMPFILE`
+echo $CRCSUM after e2undo e2fsck $crc3_2 >> $OUT
+
+echo roll back resize2fs >> $OUT
+$E2UNDO $TDB_FILE.2 $TMPFILE  >> $OUT 2>&1
+crc2_2=`$CRCSUM $TMPFILE`
+echo $CRCSUM after e2undo resize2fs $crc2_2 >> $OUT
+
+echo roll back tune2fs >> $OUT
+$E2UNDO $TDB_FILE.1 $TMPFILE  >> $OUT 2>&1
+crc1_2=`$CRCSUM $TMPFILE`
+echo $CRCSUM after e2undo tune2fs $crc1_2 >> $OUT
+
+echo roll back mke2fs >> $OUT
+$E2UNDO $TDB_FILE.0 $TMPFILE  >> $OUT 2>&1
+crc0_2=`$CRCSUM $TMPFILE`
+echo $CRCSUM after e2undo mke2fs $crc0_2 >> $OUT
+
+if [ $crc0 = $crc0_2 ] && [ $crc1 = $crc1_2 ] && [ $crc2 = $crc2_2 ] && [ $crc3 = $crc3_2 ]; then
+	echo "$test_name: $test_description: ok"
+	touch $test_name.ok
+else
+	ln -f $test_name.log $test_name.failed
+	echo "$test_name: $test_description: failed"
+fi
+rm -f $TDB_FILE.0 $TDB_FILE.1 $TDB_FILE.2 $TDB_FILE.3 $TMPFILE
+fi
diff --git a/tests/u_debugfs_opt/script b/tests/u_debugfs_opt/script
new file mode 100644
index 0000000..bb93917
--- /dev/null
+++ b/tests/u_debugfs_opt/script
@@ -0,0 +1,32 @@
+test_description="e2undo with debugfs -z"
+if test -x $E2UNDO_EXE -a -x $DEBUGFS_EXE; then
+
+TDB_FILE=${TMPDIR:-/tmp}/debugfs-$(basename $TMPFILE).e2undo
+OUT=$test_name.log
+rm -f $TDB_FILE >/dev/null 2>&1
+
+dd if=/dev/zero of=$TMPFILE bs=1k count=512 > /dev/null 2>&1
+
+echo mke2fs -q -F -o Linux -T ext4 -O ^metadata_csum,64bit -E lazy_itable_init=1 -b 1024 $TMPFILE  > $OUT
+$MKE2FS -q -F -o Linux -T ext4 -O ^metadata_csum,64bit -E lazy_itable_init=1 -b 1024 $TMPFILE  >> $OUT 2>&1
+crc0=`$CRCSUM $TMPFILE`
+echo $CRCSUM before debugfs $crc0 >> $OUT
+
+echo using debugfs to test e2undo >> $OUT
+$DEBUGFS -w -z $TDB_FILE -R 'zap -p 0x55 0' $TMPFILE >> $OUT 2>&1
+crc1=`$CRCSUM $TMPFILE`
+echo $CRCSUM after debugfs $crc1 >> $OUT
+
+$E2UNDO  $TDB_FILE $TMPFILE  >> $OUT 2>&1
+crc2=`$CRCSUM $TMPFILE`
+echo $CRCSUM after e2undo $crc2 >> $OUT
+
+if [ $crc0 = $crc2 ] && [ $crc1 != $crc2 ]; then
+	echo "$test_name: $test_description: ok"
+	touch $test_name.ok
+else
+	ln -f $test_name.log $test_name.failed
+	echo "$test_name: $test_description: failed"
+fi
+rm -f $TDB_FILE $TMPFILE
+fi
diff --git a/tests/u_e2fsck_opt/script b/tests/u_e2fsck_opt/script
new file mode 100644
index 0000000..d61cd2b
--- /dev/null
+++ b/tests/u_e2fsck_opt/script
@@ -0,0 +1,32 @@
+test_description="e2undo with e2fsck -z"
+if test -x $E2UNDO_EXE; then
+
+TDB_FILE=${TMPDIR:-/tmp}/e2fsck-$(basename $TMPFILE).e2undo
+OUT=$test_name.log
+rm -f $TDB_FILE >/dev/null 2>&1
+
+dd if=/dev/zero of=$TMPFILE bs=1k count=512 > /dev/null 2>&1
+
+echo mke2fs -q -F -o Linux -T ext4 -O ^metadata_csum,64bit -E lazy_itable_init=1 -b 1024 $TMPFILE  > $OUT
+$MKE2FS -q -F -o Linux -T ext4 -O ^metadata_csum,64bit -E lazy_itable_init=1 -b 1024 $TMPFILE  >> $OUT 2>&1
+crc0=`$CRCSUM $TMPFILE`
+echo $CRCSUM before e2fsck $crc0 >> $OUT
+
+echo using e2fsck to test e2undo >> $OUT
+$FSCK -f -y -z $TDB_FILE $TMPFILE >> $OUT 2>&1
+crc1=`$CRCSUM $TMPFILE`
+echo $CRCSUM after e2fsck $crc1 >> $OUT
+
+$E2UNDO  $TDB_FILE $TMPFILE  >> $OUT 2>&1
+crc2=`$CRCSUM $TMPFILE`
+echo $CRCSUM after e2undo $crc2 >> $OUT
+
+if [ $crc0 = $crc2 ] && [ $crc1 != $crc2 ]; then
+	echo "$test_name: $test_description: ok"
+	touch $test_name.ok
+else
+	ln -f $test_name.log $test_name.failed
+	echo "$test_name: $test_description: failed"
+fi
+rm -f $TDB_FILE $TMPFILE
+fi
diff --git a/tests/u_mke2fs/script b/tests/u_mke2fs/script
index d249ddd..f1041a9 100644
--- a/tests/u_mke2fs/script
+++ b/tests/u_mke2fs/script
@@ -1,7 +1,7 @@
 test_description="e2undo with mke2fs"
 if test -x $E2UNDO_EXE; then
 
-E2FSPROGS_UNDO_DIR=/tmp
+E2FSPROGS_UNDO_DIR=${TMPDIR:-/tmp}
 export E2FSPROGS_UNDO_DIR
 TDB_FILE=$E2FSPROGS_UNDO_DIR/mke2fs-$(basename $TMPFILE).e2undo
 OUT=$test_name.log
@@ -19,7 +19,7 @@ $MKE2FS -q -F -o Linux -I 256 -O uninit_bg -E lazy_itable_init=1 -b 1024 $TMPFIL
 new_crc=`$CRCSUM $TMPFILE`
 echo $CRCSUM after mke2fs $new_crc >> $OUT
 
-$E2UNDO_EXE  $TDB_FILE $TMPFILE  >> $OUT 2>&1
+$E2UNDO  $TDB_FILE $TMPFILE  >> $OUT 2>&1
 new_crc=`$CRCSUM $TMPFILE`
 echo $CRCSUM after e2undo $new_crc >> $OUT
 
diff --git a/tests/u_mke2fs_opt/script b/tests/u_mke2fs_opt/script
new file mode 100644
index 0000000..db62ab2
--- /dev/null
+++ b/tests/u_mke2fs_opt/script
@@ -0,0 +1,32 @@
+test_description="e2undo with mke2fs -z"
+if test -x $E2UNDO_EXE; then
+
+TDB_FILE=${TMPDIR:-/tmp}/mke2fs-$(basename $TMPFILE).e2undo
+OUT=$test_name.log
+rm -f $TDB_FILE >/dev/null 2>&1
+
+dd if=/dev/zero of=$TMPFILE bs=1k count=512 > /dev/null 2>&1
+
+echo mke2fs -F -o Linux -I 128 -b 1024 test.img  > $OUT
+$MKE2FS -F -o Linux -I 128 -b 1024 $TMPFILE  >> $OUT 2>&1
+crc0=`$CRCSUM $TMPFILE`
+echo $CRCSUM before mke2fs $crc0 >> $OUT
+
+echo using mke2fs to test e2undo >> $OUT
+$MKE2FS -q -F -o Linux -T ext4 -E lazy_itable_init=1 -b 1024 -z $TDB_FILE $TMPFILE  >> $OUT 2>&1
+crc1=`$CRCSUM $TMPFILE`
+echo $CRCSUM after mke2fs $crc1 >> $OUT
+
+$E2UNDO  $TDB_FILE $TMPFILE  >> $OUT 2>&1
+crc2=`$CRCSUM $TMPFILE`
+echo $CRCSUM after e2undo $crc2 >> $OUT
+
+if [ $crc0 = $crc2 ] && [ $crc1 != $crc2 ]; then
+	echo "$test_name: $test_description: ok"
+	touch $test_name.ok
+else
+	ln -f $test_name.log $test_name.failed
+	echo "$test_name: $test_description: failed"
+fi
+rm -f $TDB_FILE $TMPFILE
+fi
diff --git a/tests/u_mke2fs_opt_oddsize/script b/tests/u_mke2fs_opt_oddsize/script
new file mode 100644
index 0000000..23e0b9e
--- /dev/null
+++ b/tests/u_mke2fs_opt_oddsize/script
@@ -0,0 +1,31 @@
+test_description="e2undo with mke2fs -z and non-32k-aligned bdev size"
+if test -x $E2UNDO_EXE; then
+
+TDB_FILE=${TMPDIR:-/tmp}/mke2fs-$(basename $TMPFILE).e2undo
+OUT=$test_name.log
+rm -f $TDB_FILE >/dev/null 2>&1
+
+dd if=/dev/zero of=$TMPFILE bs=1k count=512 > /dev/null 2>&1
+yes "abc123abc123abc" | dd bs=1k count=8 >> $TMPFILE 2> /dev/null
+
+crc0=`$CRCSUM $TMPFILE`
+echo $CRCSUM before mke2fs $crc0 > $OUT
+
+echo using mke2fs to test e2undo >> $OUT
+$MKE2FS -q -F -o Linux -T ext4 -E lazy_itable_init=1 -b 1024 -z $TDB_FILE $TMPFILE  >> $OUT 2>&1
+crc1=`$CRCSUM $TMPFILE`
+echo $CRCSUM after mke2fs $crc1 >> $OUT
+
+$E2UNDO  $TDB_FILE $TMPFILE  >> $OUT 2>&1
+crc2=`$CRCSUM $TMPFILE`
+echo $CRCSUM after e2undo $crc2 >> $OUT
+
+if [ $crc0 = $crc2 ] && [ $crc1 != $crc2 ]; then
+	echo "$test_name: $test_description: ok"
+	touch $test_name.ok
+else
+	ln -f $test_name.log $test_name.failed
+	echo "$test_name: $test_description: failed"
+fi
+rm -f $TDB_FILE $TMPFILE
+fi
diff --git a/tests/u_resize2fs_opt/script b/tests/u_resize2fs_opt/script
new file mode 100644
index 0000000..fe1e04d
--- /dev/null
+++ b/tests/u_resize2fs_opt/script
@@ -0,0 +1,32 @@
+test_description="e2undo with resize2fs -z"
+if test -x $RESIZE2FS_EXE -a -x $E2UNDO_EXE; then
+
+TDB_FILE=${TMPDIR:-/tmp}/resize2fs-$(basename $TMPFILE).e2undo
+OUT=$test_name.log
+rm -f $TDB_FILE >/dev/null 2>&1
+
+dd if=/dev/zero of=$TMPFILE bs=1k count=512 > /dev/null 2>&1
+
+echo mke2fs -q -F -o Linux -T ext4 -O ^metadata_csum,64bit -E lazy_itable_init=1 -b 1024 $TMPFILE 256 > $OUT
+$MKE2FS -q -F -o Linux -T ext4 -O ^metadata_csum,64bit -E lazy_itable_init=1 -b 1024 $TMPFILE 256 >> $OUT 2>&1
+crc0=`$CRCSUM $TMPFILE`
+echo $CRCSUM before resize2fs $crc0 >> $OUT
+
+echo using resize2fs to test e2undo >> $OUT
+$RESIZE2FS -z $TDB_FILE $TMPFILE 512 >> $OUT 2>&1
+crc1=`$CRCSUM $TMPFILE`
+echo $CRCSUM after resize2fs $crc1 >> $OUT
+
+$E2UNDO  $TDB_FILE $TMPFILE  >> $OUT 2>&1
+crc2=`$CRCSUM $TMPFILE`
+echo $CRCSUM after e2undo $crc2 >> $OUT
+
+if [ $crc0 = $crc2 ] && [ $crc1 != $crc2 ]; then
+	echo "$test_name: $test_description: ok"
+	touch $test_name.ok
+else
+	ln -f $test_name.log $test_name.failed
+	echo "$test_name: $test_description: failed"
+fi
+rm -f $TDB_FILE $TMPFILE
+fi
diff --git a/tests/u_revert_upgrade_to_64bitmcsum/script b/tests/u_revert_upgrade_to_64bitmcsum/script
new file mode 100644
index 0000000..6120d00
--- /dev/null
+++ b/tests/u_revert_upgrade_to_64bitmcsum/script
@@ -0,0 +1,136 @@
+test_description="convert fs to 64bit,metadata_csum and revert both changes"
+if test -x $RESIZE2FS_EXE -a -x $E2UNDO_EXE; then
+
+TDB_FILE=${TMPDIR:-/tmp}/resize2fs-$(basename $TMPFILE).e2undo
+OUT=$test_name.log
+rm -f $TDB_FILE >/dev/null 2>&1
+fail=0
+
+echo convert fs to 64bit,metadata_csum and revert both changes > $OUT
+
+dd if=/dev/zero of=$TMPFILE bs=1k count=512 > /dev/null 2>&1
+crc0=`$CRCSUM $TMPFILE`
+echo $CRCSUM before mke2fs $crc0 >> $OUT
+
+CONF=$TMPFILE.conf
+cat > $CONF << ENDL
+[fs_types]
+	ext4h = {
+		features = has_journal,extent,huge_file,flex_bg,uninit_bg,dir_nlink,extra_isize,sparse_super,filetype,dir_index,ext_attr,resize_inode
+		blocksize = 4096
+		inode_size = 256
+		make_hugefiles = true
+		hugefiles_dir = /
+		hugefiles_slack = 0
+		hugefiles_name = aaaaa
+		hugefiles_digits = 4
+		hugefiles_size = 1M
+		zero_hugefiles = false
+	}
+ENDL
+
+echo mke2fs -q -F -o Linux -T ext4h -O ^metadata_csum,^64bit -E lazy_itable_init=1 -b 4096 -z $TDB_FILE.0 $TMPFILE 524288 >> $OUT
+MKE2FS_CONFIG=$CONF $MKE2FS -q -F -o Linux -T ext4h -O ^metadata_csum,^64bit -E lazy_itable_init=1 -b 4096 -z $TDB_FILE.0 $TMPFILE >> $OUT 2>&1
+crc1=`$CRCSUM $TMPFILE`
+echo $CRCSUM after mke2fs $crc1 >> $OUT
+features="$($DUMPE2FS -h $TMPFILE 2> /dev/null | grep 'Filesystem features:')"
+if [ "$(echo "${features}" | grep "metadata_csum" -c)" -gt 0 ] || [ "$(echo "${features}" | grep 64bit -c)" -gt 0 ]; then
+	echo "FS features: ${features}" >> $OUT
+	echo "Should not have 64bit or metadata_csum set" >> $OUT
+	fail=1
+fi
+$FSCK -f -n $TMPFILE >> $OUT 2>&1 || fail=1
+
+echo using resize2fs to test e2undo >> $OUT
+$RESIZE2FS -z $TDB_FILE.1 -b $TMPFILE >> $OUT 2>&1
+crc2=`$CRCSUM $TMPFILE`
+echo $CRCSUM after resize2fs $crc2 >> $OUT
+features="$($DUMPE2FS -h $TMPFILE 2> /dev/null | grep 'Filesystem features:')"
+if [ "$(echo "${features}" | grep "metadata_csum" -c)" -gt 0 ] || [ "$(echo "${features}" | grep 64bit -c)" -lt 1 ]; then
+	echo "FS features: ${features}" >> $OUT
+	echo "Should have 64bit but not metadata_csum set" >> $OUT
+	fail=1
+fi
+$FSCK -f -n $TMPFILE >> $OUT 2>&1 || fail=1
+
+echo using tune2fs to test e2undo >> $OUT
+$TUNE2FS -O metadata_csum -z $TDB_FILE.2 $TMPFILE >> $OUT 2>&1
+crc3=`$CRCSUM $TMPFILE`
+echo $CRCSUM after tune2fs $crc3 >> $OUT
+features="$($DUMPE2FS -h $TMPFILE 2> /dev/null | grep 'Filesystem features:')"
+if [ "$(echo "${features}" | grep "metadata_csum" -c)" -lt 1 ] || [ "$(echo "${features}" | grep 64bit -c)" -lt 1 ]; then
+	echo "FS features: ${features}" >> $OUT
+	echo "Should have 64bit and metadata_csum set" >> $OUT
+	fail=1
+fi
+$FSCK -f -n $TMPFILE >> $OUT 2>&1 || fail=1
+
+echo using e2fsck to test e2undo >> $OUT
+$FSCK -f -y -D -z $TDB_FILE.3 $TMPFILE >> $OUT 2>&1
+crc4=`$CRCSUM $TMPFILE`
+echo $CRCSUM after e2fsck $crc4 >> $OUT
+features="$($DUMPE2FS -h $TMPFILE 2> /dev/null | grep 'Filesystem features:')"
+if [ "$(echo "${features}" | grep "metadata_csum" -c)" -lt 1 ] || [ "$(echo "${features}" | grep 64bit -c)" -lt 1 ]; then
+	echo "FS features: ${features}" >> $OUT
+	echo "Should have 64bit and metadata_csum set" >> $OUT
+	fail=1
+fi
+$FSCK -f -n $TMPFILE >> $OUT 2>&1 || fail=1
+
+echo roll back e2fsck >> $OUT
+$E2UNDO $TDB_FILE.3 $TMPFILE  >> $OUT 2>&1
+crc3_2=`$CRCSUM $TMPFILE`
+echo $CRCSUM after e2undo e2fsck $crc3_2 >> $OUT
+features="$($DUMPE2FS -h $TMPFILE 2> /dev/null | grep 'Filesystem features:')"
+if [ "$(echo "${features}" | grep "metadata_csum" -c)" -lt 1 ] || [ "$(echo "${features}" | grep 64bit -c)" -lt 1 ]; then
+	echo "FS features: ${features}" >> $OUT
+	echo "Should have 64bit and metadata_csum set" >> $OUT
+	fail=1
+fi
+$FSCK -f -n $TMPFILE >> $OUT 2>&1 || fail=1
+
+echo roll back tune2fs >> $OUT
+$E2UNDO $TDB_FILE.2 $TMPFILE  >> $OUT 2>&1
+crc2_2=`$CRCSUM $TMPFILE`
+echo $CRCSUM after e2undo tune2fs $crc2_2 >> $OUT
+features="$($DUMPE2FS -h $TMPFILE 2> /dev/null | grep 'Filesystem features:')"
+if [ "$(echo "${features}" | grep "metadata_csum" -c)" -gt 0 ] || [ "$(echo "${features}" | grep 64bit -c)" -lt 1 ]; then
+	echo "FS features: ${features}" >> $OUT
+	echo "Should have 64bit but not metadata_csum set" >> $OUT
+	fail=1
+fi
+$FSCK -f -n $TMPFILE >> $OUT 2>&1 || fail=1
+
+echo roll back resize2fs >> $OUT
+$E2UNDO $TDB_FILE.1 $TMPFILE  >> $OUT 2>&1
+crc1_2=`$CRCSUM $TMPFILE`
+echo $CRCSUM after e2undo resize2fs $crc1_2 >> $OUT
+features="$($DUMPE2FS -h $TMPFILE 2> /dev/null | grep 'Filesystem features:')"
+if [ "$(echo "${features}" | grep "metadata_csum" -c)" -gt 0 ] || [ "$(echo "${features}" | grep 64bit -c)" -gt 0 ]; then
+	echo "FS features: ${features}" >> $OUT
+	echo "Should not have 64bit or metadata_csum set" >> $OUT
+	fail=1
+fi
+$FSCK -f -n $TMPFILE >> $OUT 2>&1 || fail=1
+
+echo roll back mke2fs >> $OUT
+$E2UNDO $TDB_FILE.0 $TMPFILE  >> $OUT 2>&1
+crc0_2=`$CRCSUM $TMPFILE`
+echo $CRCSUM after e2undo mke2fs $crc0_2 >> $OUT
+features="$($DUMPE2FS -h $TMPFILE 2> /dev/null | grep 'Filesystem features:')"
+if [ -n "${features}" ]; then
+	echo "FS features: ${features}" >> $OUT
+	echo "Should not have any features set" >> $OUT
+	fail=1
+fi
+$FSCK -f -n $TMPFILE >> $OUT 2>&1 && fail=1
+
+if [ $fail -eq 0 ] && [ $crc0 = $crc0_2 ] && [ $crc1 = $crc1_2 ] && [ $crc2 = $crc2_2 ] && [ $crc3 = $crc3_2 ]; then
+	echo "$test_name: $test_description: ok"
+	touch $test_name.ok
+else
+	ln -f $test_name.log $test_name.failed
+	echo "$test_name: $test_description: failed"
+fi
+rm -f $TDB_FILE.0 $TDB_FILE.1 $TDB_FILE.2 $TDB_FILE.3 $TMPFILE $CONF
+fi
diff --git a/tests/u_tune2fs/script b/tests/u_tune2fs/script
index a443f5a..aa5f379 100644
--- a/tests/u_tune2fs/script
+++ b/tests/u_tune2fs/script
@@ -1,7 +1,7 @@
 test_description="e2undo with tune2fs"
 if test -x $E2UNDO_EXE; then
 
-E2FSPROGS_UNDO_DIR=/tmp
+E2FSPROGS_UNDO_DIR=${TMPDIR:-/tmp}
 export E2FSPROGS_UNDO_DIR
 TDB_FILE=$E2FSPROGS_UNDO_DIR/tune2fs-$(basename $TMPFILE).e2undo
 OUT=$test_name.log
@@ -19,7 +19,7 @@ $TUNE2FS -I 256 $TMPFILE  >> $OUT 2>&1
 new_crc=`$CRCSUM $TMPFILE`
 echo $CRCSUM after tune2fs $new_crc >> $OUT
 
-$E2UNDO_EXE  $TDB_FILE $TMPFILE  >> $OUT 2>&1
+$E2UNDO  $TDB_FILE $TMPFILE  >> $OUT 2>&1
 new_crc=`$CRCSUM $TMPFILE`
 echo $CRCSUM after e2undo $new_crc >> $OUT
 
diff --git a/tests/u_tune2fs_opt/script b/tests/u_tune2fs_opt/script
new file mode 100644
index 0000000..c4810b9
--- /dev/null
+++ b/tests/u_tune2fs_opt/script
@@ -0,0 +1,32 @@
+test_description="e2undo with tune2fs -z"
+if test -x $E2UNDO_EXE; then
+
+TDB_FILE=${TMPDIR:-/tmp}/tune2fs-$(basename $TMPFILE).e2undo
+OUT=$test_name.log
+rm -f $TDB_FILE >/dev/null 2>&1
+
+dd if=/dev/zero of=$TMPFILE bs=1k count=512 > /dev/null 2>&1
+
+echo mke2fs -q -F -o Linux -T ext4 -O ^metadata_csum,64bit -E lazy_itable_init=1 -b 1024 $TMPFILE  > $OUT
+$MKE2FS -q -F -o Linux -T ext4 -O ^metadata_csum,64bit -E lazy_itable_init=1 -b 1024 $TMPFILE  >> $OUT 2>&1
+crc0=`$CRCSUM $TMPFILE`
+echo $CRCSUM before tune2fs $crc0 >> $OUT
+
+echo using tune2fs to test e2undo >> $OUT
+$TUNE2FS -O metadata_csum -z $TDB_FILE $TMPFILE >> $OUT 2>&1
+crc1=`$CRCSUM $TMPFILE`
+echo $CRCSUM after tune2fs $crc1 >> $OUT
+
+$E2UNDO  $TDB_FILE $TMPFILE  >> $OUT 2>&1
+crc2=`$CRCSUM $TMPFILE`
+echo $CRCSUM after e2undo $crc2 >> $OUT
+
+if [ $crc0 = $crc2 ] && [ $crc1 != $crc2 ]; then
+	echo "$test_name: $test_description: ok"
+	touch $test_name.ok
+else
+	ln -f $test_name.log $test_name.failed
+	echo "$test_name: $test_description: failed"
+fi
+rm -f $TDB_FILE $TMPFILE
+fi



* [PATCH 22/35] tests: test various features of the new e2undo format
  2015-04-02  2:34 [PATCH 00/35] e2fsprogs April 2015 patchbomb Darrick J. Wong
                   ` (20 preceding siblings ...)
  2015-04-02  2:36 ` [PATCH 21/35] tests: test undo file creation in e2fsck/resize2fs/tune2fs/mke2fs Darrick J. Wong
@ 2015-04-02  2:36 ` Darrick J. Wong
  2015-05-05 14:44   ` Theodore Ts'o
  2015-04-02  2:36 ` [PATCH 23/35] copy-in: create hardlinks with the correct directory filetype Darrick J. Wong
                   ` (11 subsequent siblings)
  33 siblings, 1 reply; 70+ messages in thread
From: Darrick J. Wong @ 2015-04-02  2:36 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

Verify that the header, checksum, and wrong-order rollback detection
features of the new e2undo actually work.

v2: Collect more tests for the v2 of the e2undo flat file patch.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 tests/u_compound_bad_rollback/script     |   62 ++++++++++++++++
 tests/u_corrupt_blk_csum/script          |   38 ++++++++++
 tests/u_corrupt_blk_csum_force/script    |   38 ++++++++++
 tests/u_corrupt_hdr_csum/script          |   37 ++++++++++
 tests/u_corrupt_key_csum/script          |   37 ++++++++++
 tests/u_dryrun/script                    |   32 ++++++++
 tests/u_errorout/script                  |   49 +++++++++++++
 tests/u_force/script                     |   40 ++++++++++
 tests/u_force_dryrun/script              |   38 ++++++++++
 tests/u_incomplete/script                |   38 ++++++++++
 tests/u_not_undo/script                  |   28 +++++++
 tests/u_onefile_bad/script               |  115 ++++++++++++++++++++++++++++++
 tests/u_revert_64bitmcsum_onefile/script |  112 +++++++++++++++++++++++++++++
 tests/u_revert_all_onefile/script        |  100 ++++++++++++++++++++++++++
 tests/u_undo_undo/script                 |   54 ++++++++++++++
 tests/u_wrong_fs/script                  |   36 +++++++++
 16 files changed, 854 insertions(+)
 create mode 100644 tests/u_compound_bad_rollback/script
 create mode 100644 tests/u_corrupt_blk_csum/script
 create mode 100644 tests/u_corrupt_blk_csum_force/script
 create mode 100644 tests/u_corrupt_hdr_csum/script
 create mode 100644 tests/u_corrupt_key_csum/script
 create mode 100644 tests/u_dryrun/script
 create mode 100644 tests/u_errorout/script
 create mode 100644 tests/u_force/script
 create mode 100644 tests/u_force_dryrun/script
 create mode 100644 tests/u_incomplete/script
 create mode 100644 tests/u_not_undo/script
 create mode 100644 tests/u_onefile_bad/script
 create mode 100644 tests/u_revert_64bitmcsum_onefile/script
 create mode 100644 tests/u_revert_all_onefile/script
 create mode 100644 tests/u_undo_undo/script
 create mode 100644 tests/u_wrong_fs/script
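
For illustration only (not part of the patch): the corruption tests below
damage the undo file by zeroing one 1 KiB block near its end, using a stat
call that works with both the GNU and BSD spellings.  A minimal sketch of
that step follows.  The helper names file_size and zero_block are
hypothetical; the stat and dd invocations are the ones the scripts use.

```shell
# Hypothetical helper: print the size of file $1 in bytes.
file_size() {
	# GNU stat first, then the BSD spelling
	stat -c '%s' "$1" 2>/dev/null || stat -f '%z' "$1" 2>/dev/null
}

# Hypothetical helper: overwrite one 1 KiB block of file $1 with zeroes.
# $2 is the zero-based block index; conv=notrunc keeps the rest of the
# file (and its length) intact.
zero_block() {
	dd if=/dev/zero of="$1" bs=1024 count=1 seek="$2" conv=notrunc \
		>/dev/null 2>&1
}
```

The scripts pick the block two before the end of the undo file, which in
terms of these helpers would be:
undo_blks=$(( $(file_size $TDB_FILE) / 1024 ))
zero_block $TDB_FILE $((undo_blks - 2))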


diff --git a/tests/u_compound_bad_rollback/script b/tests/u_compound_bad_rollback/script
new file mode 100644
index 0000000..f54da7f
--- /dev/null
+++ b/tests/u_compound_bad_rollback/script
@@ -0,0 +1,62 @@
+test_description="e2undo with mke2fs/tune2fs/resize2fs/e2fsck -z in the wrong order"
+if test -x $RESIZE2FS_EXE -a -x $E2UNDO_EXE; then
+
+TDB_FILE=${TMPDIR:-/tmp}/resize2fs-$(basename $TMPFILE).e2undo
+OUT=$test_name.log
+rm -f $TDB_FILE >/dev/null 2>&1
+
+echo compound e2undo rollback test > $OUT
+
+dd if=/dev/zero of=$TMPFILE bs=1k count=512 > /dev/null 2>&1
+crc0=`$CRCSUM $TMPFILE`
+echo $CRCSUM before mke2fs $crc0 >> $OUT
+
+echo mke2fs -q -F -o Linux -T ext4 -O ^metadata_csum,64bit -E lazy_itable_init=1 -b 1024 -z $TDB_FILE.0 $TMPFILE 256 >> $OUT
+$MKE2FS -q -F -o Linux -T ext4 -O ^metadata_csum,64bit -E lazy_itable_init=1 -b 1024 -z $TDB_FILE.0 $TMPFILE 256 >> $OUT 2>&1
+crc1=`$CRCSUM $TMPFILE`
+echo $CRCSUM after mke2fs $crc1 >> $OUT
+
+echo using tune2fs to test e2undo >> $OUT
+$TUNE2FS -O metadata_csum -z $TDB_FILE.1 $TMPFILE >> $OUT 2>&1
+crc2=`$CRCSUM $TMPFILE`
+echo $CRCSUM after tune2fs $crc2 >> $OUT
+
+echo using resize2fs to test e2undo >> $OUT
+$RESIZE2FS -z $TDB_FILE.2 $TMPFILE 512 >> $OUT 2>&1
+crc3=`$CRCSUM $TMPFILE`
+echo $CRCSUM after resize2fs $crc3 >> $OUT
+
+echo using e2fsck to test e2undo >> $OUT
+$FSCK -f -y -D -z $TDB_FILE.3 $TMPFILE >> $OUT 2>&1
+crc4=`$CRCSUM $TMPFILE`
+echo $CRCSUM after e2fsck $crc4 >> $OUT
+
+echo roll back mke2fs >> $OUT
+$E2UNDO $TDB_FILE.0 $TMPFILE  >> $OUT 2>&1
+crc0_2=`$CRCSUM $TMPFILE`
+echo $CRCSUM after e2undo mke2fs $crc0_2 >> $OUT
+
+echo roll back tune2fs >> $OUT
+$E2UNDO $TDB_FILE.1 $TMPFILE  >> $OUT 2>&1
+crc1_2=`$CRCSUM $TMPFILE`
+echo $CRCSUM after e2undo tune2fs $crc1_2 >> $OUT
+
+echo roll back resize2fs >> $OUT
+$E2UNDO $TDB_FILE.2 $TMPFILE  >> $OUT 2>&1
+crc2_2=`$CRCSUM $TMPFILE`
+echo $CRCSUM after e2undo resize2fs $crc2_2 >> $OUT
+
+echo roll back e2fsck >> $OUT
+$E2UNDO $TDB_FILE.3 $TMPFILE  >> $OUT 2>&1
+crc3_2=`$CRCSUM $TMPFILE`
+echo $CRCSUM after e2undo e2fsck $crc3_2 >> $OUT
+
+if [ $crc4 = $crc0_2 ] && [ $crc4 = $crc1_2 ] && [ $crc4 = $crc2_2 ] && [ $crc3 = $crc3_2 ]; then
+	echo "$test_name: $test_description: ok"
+	touch $test_name.ok
+else
+	ln -f $test_name.log $test_name.failed
+	echo "$test_name: $test_description: failed"
+fi
+rm -f $TDB_FILE.0 $TDB_FILE.1 $TDB_FILE.2 $TDB_FILE.3 $TMPFILE
+fi
diff --git a/tests/u_corrupt_blk_csum/script b/tests/u_corrupt_blk_csum/script
new file mode 100644
index 0000000..ee16552
--- /dev/null
+++ b/tests/u_corrupt_blk_csum/script
@@ -0,0 +1,38 @@
+test_description="corrupt e2undo block data"
+if test -x $E2UNDO_EXE; then
+
+E2FSPROGS_UNDO_DIR=${TMPDIR:-/tmp}
+export E2FSPROGS_UNDO_DIR
+TDB_FILE=$E2FSPROGS_UNDO_DIR/tune2fs-$(basename $TMPFILE).e2undo
+OUT=$test_name.log
+rm -f $TDB_FILE >/dev/null 2>&1
+
+dd if=/dev/zero of=$TMPFILE bs=1k count=512 > /dev/null 2>&1
+
+echo mke2fs -q -F -o Linux -I 128 -b 1024 $TMPFILE  > $OUT
+$MKE2FS -q -F -o Linux -I 128 -b 1024 $TMPFILE  >> $OUT 2>&1
+crc0=`$CRCSUM $TMPFILE`
+echo $CRCSUM before tune2fs $crc0 >> $OUT
+
+echo using tune2fs to test e2undo >> $OUT
+$TUNE2FS -I 256 $TMPFILE  >> $OUT 2>&1
+crc1=`$CRCSUM $TMPFILE`
+echo $CRCSUM after tune2fs $crc1 >> $OUT
+
+undo_blks=$(( $(stat -c '%s' $TDB_FILE 2>/dev/null || stat -f '%z' $TDB_FILE 2>/dev/null) / 1024 ))
+dd if=/dev/zero of=$TDB_FILE bs=1024 count=1 seek=$((undo_blks - 2)) conv=notrunc > /dev/null 2>&1
+
+$E2UNDO  $TDB_FILE $TMPFILE  >> $OUT 2>&1
+res=$?
+crc2=`$CRCSUM $TMPFILE`
+echo $CRCSUM after e2undo $crc2 >> $OUT
+
+if [ $res -ne 0 ] && [ $crc2 = $crc1 ] && [ $crc2 != $crc0 ]; then
+	echo "$test_name: $test_description: ok"
+	touch $test_name.ok
+else
+	ln -f $test_name.log $test_name.failed
+	echo "$test_name: $test_description: failed"
+fi
+rm -f $TDB_FILE $TMPFILE
+fi
diff --git a/tests/u_corrupt_blk_csum_force/script b/tests/u_corrupt_blk_csum_force/script
new file mode 100644
index 0000000..ba82726
--- /dev/null
+++ b/tests/u_corrupt_blk_csum_force/script
@@ -0,0 +1,38 @@
+test_description="force replay of corrupt e2undo block data"
+if test -x $E2UNDO_EXE; then
+
+E2FSPROGS_UNDO_DIR=${TMPDIR:-/tmp}
+export E2FSPROGS_UNDO_DIR
+TDB_FILE=$E2FSPROGS_UNDO_DIR/tune2fs-$(basename $TMPFILE).e2undo
+OUT=$test_name.log
+rm -f $TDB_FILE >/dev/null 2>&1
+
+dd if=/dev/zero of=$TMPFILE bs=1k count=512 > /dev/null 2>&1
+
+echo mke2fs -q -F -o Linux -b 1024 $TMPFILE  > $OUT
+$MKE2FS -q -F -o Linux -I 128 -b 1024 $TMPFILE  >> $OUT 2>&1
+crc0=`$CRCSUM $TMPFILE`
+echo $CRCSUM before tune2fs $crc0 >> $OUT
+
+echo using tune2fs to test e2undo >> $OUT
+$TUNE2FS -I 256 $TMPFILE  >> $OUT 2>&1
+crc1=`$CRCSUM $TMPFILE`
+echo $CRCSUM after tune2fs $crc1 >> $OUT
+
+undo_blks=$(( $(stat -c '%s' $TDB_FILE 2>/dev/null || stat -f '%z' $TDB_FILE 2>/dev/null) / 1024 ))
+dd if=/dev/zero of=$TDB_FILE bs=1024 count=1 seek=$((undo_blks - 2)) conv=notrunc > /dev/null 2>&1
+
+$E2UNDO -f $TDB_FILE $TMPFILE  >> $OUT 2>&1
+res=$?
+crc2=`$CRCSUM $TMPFILE`
+echo $CRCSUM after e2undo $crc2 >> $OUT
+
+if [ $crc2 != $crc1 ] && [ $crc2 != $crc0 ]; then
+	echo "$test_name: $test_description: ok"
+	touch $test_name.ok
+else
+	ln -f $test_name.log $test_name.failed
+	echo "$test_name: $test_description: failed"
+fi
+rm -f $TDB_FILE $TMPFILE
+fi
diff --git a/tests/u_corrupt_hdr_csum/script b/tests/u_corrupt_hdr_csum/script
new file mode 100644
index 0000000..32c38c8
--- /dev/null
+++ b/tests/u_corrupt_hdr_csum/script
@@ -0,0 +1,37 @@
+test_description="corrupt e2undo header"
+if test -x $E2UNDO_EXE; then
+
+E2FSPROGS_UNDO_DIR=${TMPDIR:-/tmp}
+export E2FSPROGS_UNDO_DIR
+TDB_FILE=$E2FSPROGS_UNDO_DIR/tune2fs-$(basename $TMPFILE).e2undo
+OUT=$test_name.log
+rm -f $TDB_FILE >/dev/null 2>&1
+
+dd if=/dev/zero of=$TMPFILE bs=1k count=512 > /dev/null 2>&1
+
+echo mke2fs -q -F -o Linux -b 1024 $TMPFILE  > $OUT
+$MKE2FS -q -F -o Linux -I 128 -b 1024 $TMPFILE  >> $OUT 2>&1
+crc0=`$CRCSUM $TMPFILE`
+echo $CRCSUM before tune2fs $crc0 >> $OUT
+
+echo using tune2fs to test e2undo >> $OUT
+$TUNE2FS -I 256 $TMPFILE  >> $OUT 2>&1
+crc1=`$CRCSUM $TMPFILE`
+echo $CRCSUM after tune2fs $crc1 >> $OUT
+
+dd if=/dev/zero of=$TDB_FILE bs=256 count=1 seek=1 conv=notrunc > /dev/null 2>&1
+
+$E2UNDO  $TDB_FILE $TMPFILE  >> $OUT 2>&1
+res=$?
+crc2=`$CRCSUM $TMPFILE`
+echo $CRCSUM after e2undo $crc2 >> $OUT
+
+if [ $res -ne 0 ]; then
+	echo "$test_name: $test_description: ok"
+	touch $test_name.ok
+else
+	ln -f $test_name.log $test_name.failed
+	echo "$test_name: $test_description: failed"
+fi
+rm -f $TDB_FILE $TMPFILE
+fi
diff --git a/tests/u_corrupt_key_csum/script b/tests/u_corrupt_key_csum/script
new file mode 100644
index 0000000..d07556b
--- /dev/null
+++ b/tests/u_corrupt_key_csum/script
@@ -0,0 +1,37 @@
+test_description="corrupt e2undo key data"
+if test -x $E2UNDO_EXE; then
+
+E2FSPROGS_UNDO_DIR=${TMPDIR:-/tmp}
+export E2FSPROGS_UNDO_DIR
+TDB_FILE=$E2FSPROGS_UNDO_DIR/tune2fs-$(basename $TMPFILE).e2undo
+OUT=$test_name.log
+rm -f $TDB_FILE >/dev/null 2>&1
+
+dd if=/dev/zero of=$TMPFILE bs=1k count=512 > /dev/null 2>&1
+
+echo mke2fs -q -F -o Linux -b 1024 $TMPFILE  > $OUT
+$MKE2FS -q -F -o Linux -I 128 -b 1024 $TMPFILE  >> $OUT 2>&1
+crc0=`$CRCSUM $TMPFILE`
+echo $CRCSUM before tune2fs $crc0 >> $OUT
+
+echo using tune2fs to test e2undo >> $OUT
+$TUNE2FS -I 256 $TMPFILE  >> $OUT 2>&1
+crc1=`$CRCSUM $TMPFILE`
+echo $CRCSUM after tune2fs $crc1 >> $OUT
+
+undo_blks=$(( $(stat -c '%s' $TDB_FILE 2>/dev/null || stat -f '%z' $TDB_FILE 2>/dev/null) / 1024 ))
+dd if=/dev/zero of=$TDB_FILE bs=1024 count=1 seek=$((undo_blks - 1)) conv=notrunc > /dev/null 2>&1
+
+$E2UNDO  $TDB_FILE $TMPFILE  >> $OUT 2>&1
+crc2=`$CRCSUM $TMPFILE`
+echo $CRCSUM after e2undo $crc2 >> $OUT
+
+if [ $crc0 != $crc1 ] && [ $crc1 = $crc2 ]; then
+	echo "$test_name: $test_description: ok"
+	touch $test_name.ok
+else
+	ln -f $test_name.log $test_name.failed
+	echo "$test_name: $test_description: failed"
+fi
+rm -f $TDB_FILE $TMPFILE
+fi
diff --git a/tests/u_dryrun/script b/tests/u_dryrun/script
new file mode 100644
index 0000000..b90ef47
--- /dev/null
+++ b/tests/u_dryrun/script
@@ -0,0 +1,32 @@
+test_description="e2undo dry run"
+if test -x $E2UNDO_EXE; then
+
+TDB_FILE=${TMPDIR:-/tmp}/tune2fs-$(basename $TMPFILE).e2undo
+OUT=$test_name.log
+rm -f $TDB_FILE >/dev/null 2>&1
+
+dd if=/dev/zero of=$TMPFILE bs=1k count=512 > /dev/null 2>&1
+
+echo mke2fs -q -F -o Linux -T ext4 -O ^metadata_csum,64bit -E lazy_itable_init=1 -b 1024 $TMPFILE  > $OUT
+$MKE2FS -q -F -o Linux -T ext4 -O ^metadata_csum,64bit -E lazy_itable_init=1 -b 1024 $TMPFILE  >> $OUT 2>&1
+crc0=`$CRCSUM $TMPFILE`
+echo $CRCSUM before tune2fs $crc0 >> $OUT
+
+echo using tune2fs to test e2undo >> $OUT
+$TUNE2FS -O metadata_csum -z $TDB_FILE $TMPFILE >> $OUT 2>&1
+crc1=`$CRCSUM $TMPFILE`
+echo $CRCSUM after tune2fs $crc1 >> $OUT
+
+$E2UNDO -n $TDB_FILE $TMPFILE  >> $OUT 2>&1
+crc2=`$CRCSUM $TMPFILE`
+echo $CRCSUM after e2undo $crc2 >> $OUT
+
+if [ $crc1 = $crc2 ] && [ $crc1 != $crc0 ]; then
+	echo "$test_name: $test_description: ok"
+	touch $test_name.ok
+else
+	ln -f $test_name.log $test_name.failed
+	echo "$test_name: $test_description: failed"
+fi
+rm -f $TDB_FILE $TMPFILE
+fi
diff --git a/tests/u_errorout/script b/tests/u_errorout/script
new file mode 100644
index 0000000..20c53de
--- /dev/null
+++ b/tests/u_errorout/script
@@ -0,0 +1,49 @@
+test_description="e2undo a failed command"
+if test -x $RESIZE2FS_EXE -a -x $E2UNDO_EXE; then
+
+TDB_FILE=${TMPDIR:-/tmp}/resize2fs-$(basename $TMPFILE).e2undo
+OUT=$test_name.log
+rm -f $TDB_FILE >/dev/null 2>&1
+
+echo e2undo a failed command > $OUT
+
+dd if=/dev/zero of=$TMPFILE bs=1k count=512 > /dev/null 2>&1
+crc0=`$CRCSUM $TMPFILE`
+echo $CRCSUM before mke2fs $crc0 >> $OUT
+
+CONF=$TMPFILE.conf
+cat > $CONF << ENDL
+[fs_types]
+	ext4h = {
+		features = ^has_journal,extent,huge_file,flex_bg,uninit_bg,dir_nlink,extra_isize,sparse_super,filetype,dir_index,ext_attr,resize_inode
+		blocksize = 4096
+		inode_size = 256
+		make_hugefiles = true
+		hugefiles_dir = /
+		hugefiles_slack = 0
+		hugefiles_name = aaaaa
+		hugefiles_digits = 4
+		hugefiles_size = 1K
+		zero_hugefiles = false
+	}
+ENDL
+
+echo mke2fs -q -F -o Linux -T ext4h -O ^metadata_csum,^64bit -E lazy_itable_init=1 -b 1024 -z $TDB_FILE.0 -d /etc/ $TMPFILE >> $OUT
+MKE2FS_CONFIG=$CONF $MKE2FS -q -F -o Linux -T ext4h -O ^metadata_csum,^64bit -E lazy_itable_init=1 -b 1024 -z $TDB_FILE.0 -d /etc/ $TMPFILE >> $OUT 2>&1
+crc1=`$CRCSUM $TMPFILE`
+echo $CRCSUM after mke2fs $crc1 >> $OUT
+
+echo roll back mke2fs >> $OUT
+$E2UNDO $TDB_FILE.0 $TMPFILE  >> $OUT 2>&1
+crc2=`$CRCSUM $TMPFILE`
+echo $CRCSUM after e2undo mke2fs $crc2 >> $OUT
+
+if [ $crc0 != $crc1 ] && [ $crc1 != $crc2 ] && [ $crc0 = $crc2 ]; then
+	echo "$test_name: $test_description: ok"
+	touch $test_name.ok
+else
+	ln -f $test_name.log $test_name.failed
+	echo "$test_name: $test_description: failed"
+fi
+rm -f $TDB_FILE.0 $TMPFILE $CONF
+fi
diff --git a/tests/u_force/script b/tests/u_force/script
new file mode 100644
index 0000000..ef39e24
--- /dev/null
+++ b/tests/u_force/script
@@ -0,0 +1,40 @@
+test_description="e2undo force"
+if test -x $E2UNDO_EXE; then
+
+TDB_FILE=${TMPDIR:-/tmp}/tune2fs-$(basename $TMPFILE).e2undo
+OUT=$test_name.log
+rm -f $TDB_FILE >/dev/null 2>&1
+
+dd if=/dev/zero of=$TMPFILE bs=1k count=512 > /dev/null 2>&1
+
+echo mke2fs -q -F -o Linux -T ext4 -O ^metadata_csum,64bit -E lazy_itable_init=1 -b 1024 $TMPFILE  > $OUT
+$MKE2FS -q -F -o Linux -T ext4 -O ^metadata_csum,64bit -E lazy_itable_init=1 -b 1024 $TMPFILE  >> $OUT 2>&1
+crc0=`$CRCSUM $TMPFILE`
+echo $CRCSUM before tune2fs $crc0 >> $OUT
+
+echo using tune2fs to test e2undo >> $OUT
+$TUNE2FS -O metadata_csum -z $TDB_FILE $TMPFILE >> $OUT 2>&1
+crc1=`$CRCSUM $TMPFILE`
+echo $CRCSUM after tune2fs $crc1 >> $OUT
+
+dd if=/dev/zero of=$TDB_FILE bs=4 count=1 seek=127 conv=notrunc 2> /dev/null
+
+$E2UNDO $TDB_FILE $TMPFILE  >> $OUT 2>&1
+crc2=`$CRCSUM $TMPFILE`
+echo $CRCSUM after e2undo $crc2 >> $OUT
+
+$E2UNDO -f $TDB_FILE $TMPFILE  >> $OUT 2>&1
+crc3=`$CRCSUM $TMPFILE`
+echo $CRCSUM after e2undo -f $crc3 >> $OUT
+
+MUST_FSCK=$($DUMPE2FS $TMPFILE 2> /dev/null | grep 'Filesystem state:.*not clean' -c )
+
+if [ $MUST_FSCK -eq 1 ] && [ $crc0 != $crc3 ] && [ $crc1 = $crc2 ] && [ $crc2 != $crc0 ]; then
+	echo "$test_name: $test_description: ok"
+	touch $test_name.ok
+else
+	ln -f $test_name.log $test_name.failed
+	echo "$test_name: $test_description: failed"
+fi
+rm -f $TDB_FILE $TMPFILE
+fi
diff --git a/tests/u_force_dryrun/script b/tests/u_force_dryrun/script
new file mode 100644
index 0000000..92d7624
--- /dev/null
+++ b/tests/u_force_dryrun/script
@@ -0,0 +1,38 @@
+test_description="force dry-run replay of corrupt e2undo block data"
+if test -x $E2UNDO_EXE; then
+
+E2FSPROGS_UNDO_DIR=${TMPDIR:-/tmp}
+export E2FSPROGS_UNDO_DIR
+TDB_FILE=$E2FSPROGS_UNDO_DIR/tune2fs-$(basename $TMPFILE).e2undo
+OUT=$test_name.log
+rm -f $TDB_FILE >/dev/null 2>&1
+
+dd if=/dev/zero of=$TMPFILE bs=1k count=512 > /dev/null 2>&1
+
+echo mke2fs -q -F -o Linux -b 1024 $TMPFILE  > $OUT
+$MKE2FS -q -F -o Linux -I 128 -b 1024 $TMPFILE  >> $OUT 2>&1
+crc0=`$CRCSUM $TMPFILE`
+echo $CRCSUM before tune2fs $crc0 >> $OUT
+
+echo using tune2fs to test e2undo >> $OUT
+$TUNE2FS -I 256 $TMPFILE  >> $OUT 2>&1
+crc1=`$CRCSUM $TMPFILE`
+echo $CRCSUM after tune2fs $crc1 >> $OUT
+
+undo_blks=$(( $(stat -c '%s' $TDB_FILE 2>/dev/null || stat -f '%z' $TDB_FILE 2>/dev/null) / 1024 ))
+dd if=/dev/zero of=$TDB_FILE bs=1024 count=1 seek=$((undo_blks - 2)) conv=notrunc > /dev/null 2>&1
+
+$E2UNDO -f -n $TDB_FILE $TMPFILE  >> $OUT 2>&1
+res=$?
+crc2=`$CRCSUM $TMPFILE`
+echo $CRCSUM after e2undo $crc2 >> $OUT
+
+if [ $crc2 = $crc1 ] && [ $crc2 != $crc0 ]; then
+	echo "$test_name: $test_description: ok"
+	touch $test_name.ok
+else
+	ln -f $test_name.log $test_name.failed
+	echo "$test_name: $test_description: failed"
+fi
+rm -f $TDB_FILE $TMPFILE
+fi
diff --git a/tests/u_incomplete/script b/tests/u_incomplete/script
new file mode 100644
index 0000000..7bc7858
--- /dev/null
+++ b/tests/u_incomplete/script
@@ -0,0 +1,38 @@
+test_description="e2undo with incomplete undo file"
+if test -x $E2UNDO_EXE; then
+
+TDB_FILE=${TMPDIR:-/tmp}/tune2fs-$(basename $TMPFILE).e2undo
+OUT=$test_name.log
+rm -f $TDB_FILE >/dev/null 2>&1
+
+dd if=/dev/zero of=$TMPFILE bs=1k count=512 > /dev/null 2>&1
+
+echo mke2fs -q -F -o Linux -T ext4 -O ^metadata_csum,64bit -E lazy_itable_init=1 -b 1024 $TMPFILE  > $OUT
+$MKE2FS -q -F -o Linux -T ext4 -O ^metadata_csum,64bit -E lazy_itable_init=1 -b 1024 $TMPFILE  >> $OUT 2>&1
+crc0=`$CRCSUM $TMPFILE`
+echo $CRCSUM before tune2fs $crc0 >> $OUT
+
+echo using tune2fs to test e2undo >> $OUT
+UNDO_IO_SIMULATE_UNFINISHED=1 $TUNE2FS -O metadata_csum -z $TDB_FILE $TMPFILE >> $OUT 2>&1
+crc1=`$CRCSUM $TMPFILE`
+echo $CRCSUM after tune2fs $crc1 >> $OUT
+
+$E2UNDO  $TDB_FILE $TMPFILE  >> $OUT 2>&1
+crc2=`$CRCSUM $TMPFILE`
+echo $CRCSUM after e2undo $crc2 >> $OUT
+
+$FSCK -y $TMPFILE >> $OUT 2>&1
+fsck_res=$?
+crc3=`$CRCSUM $TMPFILE`
+echo $CRCSUM after fsck $crc3 >> $OUT
+echo fsck result $fsck_res >> $OUT
+
+if [ $crc0 != $crc2 ] && [ $crc1 != $crc2 ] && [ $crc0 != $crc1 ] && [ $fsck_res -eq 1 ]; then
+	echo "$test_name: $test_description: ok"
+	touch $test_name.ok
+else
+	ln -f $test_name.log $test_name.failed
+	echo "$test_name: $test_description: failed"
+fi
+rm -f $TDB_FILE $TMPFILE
+fi
diff --git a/tests/u_not_undo/script b/tests/u_not_undo/script
new file mode 100644
index 0000000..2f07d1b
--- /dev/null
+++ b/tests/u_not_undo/script
@@ -0,0 +1,28 @@
+test_description="e2undo a non-undo file"
+if test -x $E2UNDO_EXE; then
+
+TDB_FILE=${TMPDIR:-/tmp}/tune2fs-$(basename $TMPFILE).e2undo
+OUT=$test_name.log
+rm -f $TDB_FILE >/dev/null 2>&1
+
+dd if=/dev/zero of=$TMPFILE bs=1k count=512 > /dev/null 2>&1
+dd if=/dev/zero of=$TDB_FILE bs=1k count=512 > /dev/null 2>&1
+
+crc0=`$CRCSUM $TMPFILE`
+echo $CRCSUM before e2undo $crc0 > $OUT
+
+od -tx1 -Ad -c < $TDB_FILE >> $OUT
+
+$E2UNDO $TDB_FILE $TMPFILE  >> $OUT 2>&1
+crc3=`$CRCSUM $TMPFILE`
+echo $CRCSUM after e2undo $crc3 >> $OUT
+
+if [ $crc3 = $crc0 ]; then
+	echo "$test_name: $test_description: ok"
+	touch $test_name.ok
+else
+	ln -f $test_name.log $test_name.failed
+	echo "$test_name: $test_description: failed"
+fi
+rm -f $TDB_FILE $TMPFILE
+fi
diff --git a/tests/u_onefile_bad/script b/tests/u_onefile_bad/script
new file mode 100644
index 0000000..60c9a2e
--- /dev/null
+++ b/tests/u_onefile_bad/script
@@ -0,0 +1,115 @@
+test_description="check that we cant append a bad undo file"
+if test -x $RESIZE2FS_EXE -a -x $E2UNDO_EXE; then
+
+TDB_FILE=${TMPDIR:-/tmp}/resize2fs-$(basename $TMPFILE).e2undo
+OUT=$test_name.log
+rm -f $TDB_FILE >/dev/null 2>&1
+fail=0
+
+echo check that we cant append a bad undo file > $OUT
+
+dd if=/dev/zero of=$TMPFILE bs=1k count=512 > /dev/null 2>&1
+crc0=`$CRCSUM $TMPFILE`
+echo $CRCSUM before mke2fs $crc0 >> $OUT
+
+CONF=$TMPFILE.conf
+cat > $CONF << ENDL
+[fs_types]
+	ext4h = {
+		features = has_journal,extent,huge_file,flex_bg,uninit_bg,dir_nlink,extra_isize,sparse_super,filetype,dir_index,ext_attr,resize_inode
+		blocksize = 4096
+		inode_size = 256
+		make_hugefiles = true
+		hugefiles_dir = /
+		hugefiles_slack = 0
+		hugefiles_name = aaaaa
+		hugefiles_digits = 4
+		hugefiles_size = 1M
+		zero_hugefiles = false
+	}
+ENDL
+
+echo mke2fs -q -F -o Linux -T ext4h -O ^metadata_csum,^64bit -E lazy_itable_init=1 -b 4096 -z $TDB_FILE.0 $TMPFILE 524288 >> $OUT
+MKE2FS_CONFIG=$CONF $MKE2FS -q -F -o Linux -T ext4h -O ^metadata_csum,^64bit -E lazy_itable_init=1 -b 4096 -z $TDB_FILE.0 $TMPFILE >> $OUT 2>&1
+crc1=`$CRCSUM $TMPFILE`
+echo $CRCSUM after mke2fs $crc1 >> $OUT
+features="$($DUMPE2FS -h $TMPFILE 2> /dev/null | grep 'Filesystem features:')"
+if [ "$(echo "${features}" | grep "metadata_csum" -c)" -gt 0 ] || [ "$(echo "${features}" | grep 64bit -c)" -gt 0 ]; then
+	echo "FS features: ${features}" >> $OUT
+	echo "Should not have 64bit or metadata_csum set" >> $OUT
+	fail=1
+fi
+$FSCK -f -n $TMPFILE >> $OUT 2>&1 || fail=1
+
+echo using resize2fs to test e2undo >> $OUT
+$RESIZE2FS -z $TDB_FILE.1 -b $TMPFILE >> $OUT 2>&1
+crc2=`$CRCSUM $TMPFILE`
+echo $CRCSUM after resize2fs $crc2 >> $OUT
+features="$($DUMPE2FS -h $TMPFILE 2> /dev/null | grep 'Filesystem features:')"
+if [ "$(echo "${features}" | grep "metadata_csum" -c)" -gt 0 ] || [ "$(echo "${features}" | grep 64bit -c)" -lt 1 ]; then
+	echo "FS features: ${features}" >> $OUT
+	echo "Should have 64bit but not metadata_csum set" >> $OUT
+	fail=1
+fi
+$FSCK -f -n $TMPFILE >> $OUT 2>&1 || fail=1
+
+echo corrupt $TDB_FILE.1 >> $OUT
+dd if=/dev/zero of=$TDB_FILE.1 bs=4096 count=1 skip=1 2> /dev/null
+
+echo using tune2fs to test e2undo >> $OUT
+$TUNE2FS -O metadata_csum -z $TDB_FILE.1 $TMPFILE >> $OUT 2>&1
+crc3=`$CRCSUM $TMPFILE`
+echo $CRCSUM after tune2fs $crc3 >> $OUT
+features="$($DUMPE2FS -h $TMPFILE 2> /dev/null | grep 'Filesystem features:')"
+if [ "$(echo "${features}" | grep "metadata_csum" -c)" -gt 0 ] || [ "$(echo "${features}" | grep 64bit -c)" -lt 1 ]; then
+	echo "FS features: ${features}" >> $OUT
+	echo "Should have 64bit but not metadata_csum set" >> $OUT
+	fail=1
+fi
+$FSCK -f -n $TMPFILE >> $OUT 2>&1 || fail=1
+
+echo using e2fsck to test e2undo >> $OUT
+$FSCK -f -y -D -z $TDB_FILE.1 $TMPFILE >> $OUT 2>&1
+crc4=`$CRCSUM $TMPFILE`
+echo $CRCSUM after e2fsck $crc4 >> $OUT
+features="$($DUMPE2FS -h $TMPFILE 2> /dev/null | grep 'Filesystem features:')"
+if [ "$(echo "${features}" | grep "metadata_csum" -c)" -gt 0 ] || [ "$(echo "${features}" | grep 64bit -c)" -lt 1 ]; then
+	echo "FS features: ${features}" >> $OUT
+	echo "Should have 64bit but not metadata_csum set" >> $OUT
+	fail=1
+fi
+$FSCK -f -n $TMPFILE >> $OUT 2>&1 || fail=1
+
+echo roll back e2fsck/tune2fs/resize2fs >> $OUT
+$E2UNDO $TDB_FILE.1 $TMPFILE  >> $OUT 2>&1
+crc1_2=`$CRCSUM $TMPFILE`
+echo $CRCSUM after e2undo e2fsck/tune2fs/resize2fs $crc1_2 >> $OUT
+features="$($DUMPE2FS -h $TMPFILE 2> /dev/null | grep 'Filesystem features:')"
+if [ "$(echo "${features}" | grep "metadata_csum" -c)" -gt 0 ] || [ "$(echo "${features}" | grep 64bit -c)" -lt 1 ]; then
+	echo "FS features: ${features}" >> $OUT
+	echo "Should have 64bit but not metadata_csum set" >> $OUT
+	fail=1
+fi
+$FSCK -f -n $TMPFILE >> $OUT 2>&1 || fail=1
+
+echo roll back mke2fs >> $OUT
+$E2UNDO $TDB_FILE.0 $TMPFILE  >> $OUT 2>&1
+crc0_2=`$CRCSUM $TMPFILE`
+echo $CRCSUM after e2undo mke2fs $crc0_2 >> $OUT
+features="$($DUMPE2FS -h $TMPFILE 2> /dev/null | grep 'Filesystem features:')"
+if [ "$(echo "${features}" | grep "metadata_csum" -c)" -gt 0 ] || [ "$(echo "${features}" | grep 64bit -c)" -lt 1 ]; then
+	echo "FS features: ${features}" >> $OUT
+	echo "Should have 64bit but not metadata_csum set" >> $OUT
+	fail=1
+fi
+$FSCK -f -n $TMPFILE >> $OUT 2>&1 || fail=1
+
+if [ $fail -eq 0 ] && [ $crc0 != $crc1 ] && [ $crc1 != $crc2 ] && [ $crc2 = $crc3 ] && [ $crc3 = $crc4 ] && [ $crc1_2 = $crc2 ] && [ $crc0_2 = $crc2 ]; then
+	echo "$test_name: $test_description: ok"
+	touch $test_name.ok
+else
+	ln -f $test_name.log $test_name.failed
+	echo "$test_name: $test_description: failed"
+fi
+rm -f $TDB_FILE.0 $TDB_FILE.1 $TMPFILE $CONF
+fi
diff --git a/tests/u_revert_64bitmcsum_onefile/script b/tests/u_revert_64bitmcsum_onefile/script
new file mode 100644
index 0000000..f1d7c2b
--- /dev/null
+++ b/tests/u_revert_64bitmcsum_onefile/script
@@ -0,0 +1,112 @@
+test_description="convert fs to 64bit,metadata_csum and revert as one undo file"
+if test -x $RESIZE2FS_EXE -a -x $E2UNDO_EXE; then
+
+TDB_FILE=${TMPDIR:-/tmp}/resize2fs-$(basename $TMPFILE).e2undo
+OUT=$test_name.log
+rm -f $TDB_FILE >/dev/null 2>&1
+fail=0
+
+echo convert fs to 64bit,metadata_csum and revert both changes as one undo file > $OUT
+
+dd if=/dev/zero of=$TMPFILE bs=1k count=512 > /dev/null 2>&1
+crc0=`$CRCSUM $TMPFILE`
+echo $CRCSUM before mke2fs $crc0 >> $OUT
+
+CONF=$TMPFILE.conf
+cat > $CONF << ENDL
+[fs_types]
+	ext4h = {
+		features = has_journal,extent,huge_file,flex_bg,uninit_bg,dir_nlink,extra_isize,sparse_super,filetype,dir_index,ext_attr,resize_inode
+		blocksize = 4096
+		inode_size = 256
+		make_hugefiles = true
+		hugefiles_dir = /
+		hugefiles_slack = 0
+		hugefiles_name = aaaaa
+		hugefiles_digits = 4
+		hugefiles_size = 1M
+		zero_hugefiles = false
+	}
+ENDL
+
+echo mke2fs -q -F -o Linux -T ext4h -O ^metadata_csum,^64bit -E lazy_itable_init=1 -b 4096 -z $TDB_FILE.0 $TMPFILE 524288 >> $OUT
+MKE2FS_CONFIG=$CONF $MKE2FS -q -F -o Linux -T ext4h -O ^metadata_csum,^64bit -E lazy_itable_init=1 -b 4096 -z $TDB_FILE.0 $TMPFILE >> $OUT 2>&1
+crc1=`$CRCSUM $TMPFILE`
+echo $CRCSUM after mke2fs $crc1 >> $OUT
+features="$($DUMPE2FS -h $TMPFILE 2> /dev/null | grep 'Filesystem features:')"
+if [ "$(echo "${features}" | grep "metadata_csum" -c)" -gt 0 ] || [ "$(echo "${features}" | grep 64bit -c)" -gt 0 ]; then
+	echo "FS features: ${features}" >> $OUT
+	echo "Should not have 64bit or metadata_csum set" >> $OUT
+	fail=1
+fi
+$FSCK -f -n $TMPFILE >> $OUT 2>&1 || fail=1
+
+echo using resize2fs to test e2undo >> $OUT
+$RESIZE2FS -z $TDB_FILE.1 -b $TMPFILE >> $OUT 2>&1
+crc2=`$CRCSUM $TMPFILE`
+echo $CRCSUM after resize2fs $crc2 >> $OUT
+features="$($DUMPE2FS -h $TMPFILE 2> /dev/null | grep 'Filesystem features:')"
+if [ "$(echo "${features}" | grep "metadata_csum" -c)" -gt 0 ] || [ "$(echo "${features}" | grep 64bit -c)" -lt 1 ]; then
+	echo "FS features: ${features}" >> $OUT
+	echo "Should have 64bit but not metadata_csum set" >> $OUT
+	fail=1
+fi
+$FSCK -f -n $TMPFILE >> $OUT 2>&1 || fail=1
+
+echo using tune2fs to test e2undo >> $OUT
+$TUNE2FS -O metadata_csum -z $TDB_FILE.1 $TMPFILE >> $OUT 2>&1
+crc3=`$CRCSUM $TMPFILE`
+echo $CRCSUM after tune2fs $crc3 >> $OUT
+features="$($DUMPE2FS -h $TMPFILE 2> /dev/null | grep 'Filesystem features:')"
+if [ "$(echo "${features}" | grep "metadata_csum" -c)" -lt 1 ] || [ "$(echo "${features}" | grep 64bit -c)" -lt 1 ]; then
+	echo "FS features: ${features}" >> $OUT
+	echo "Should have 64bit and metadata_csum set" >> $OUT
+	fail=1
+fi
+$FSCK -f -n $TMPFILE >> $OUT 2>&1 || fail=1
+
+echo using e2fsck to test e2undo >> $OUT
+$FSCK -f -y -D -z $TDB_FILE.1 $TMPFILE >> $OUT 2>&1
+crc4=`$CRCSUM $TMPFILE`
+echo $CRCSUM after e2fsck $crc4 >> $OUT
+features="$($DUMPE2FS -h $TMPFILE 2> /dev/null | grep 'Filesystem features:')"
+if [ "$(echo "${features}" | grep "metadata_csum" -c)" -lt 1 ] || [ "$(echo "${features}" | grep 64bit -c)" -lt 1 ]; then
+	echo "FS features: ${features}" >> $OUT
+	echo "Should have 64bit and metadata_csum set" >> $OUT
+	fail=1
+fi
+$FSCK -f -n $TMPFILE >> $OUT 2>&1 || fail=1
+
+echo roll back e2fsck/tune2fs/resize2fs >> $OUT
+$E2UNDO $TDB_FILE.1 $TMPFILE  >> $OUT 2>&1
+crc1_2=`$CRCSUM $TMPFILE`
+echo $CRCSUM after e2undo e2fsck/tune2fs/resize2fs $crc1_2 >> $OUT
+features="$($DUMPE2FS -h $TMPFILE 2> /dev/null | grep 'Filesystem features:')"
+if [ "$(echo "${features}" | grep "metadata_csum" -c)" -gt 0 ] || [ "$(echo "${features}" | grep 64bit -c)" -gt 0 ]; then
+	echo "FS features: ${features}" >> $OUT
+	echo "Should not have 64bit or metadata_csum set" >> $OUT
+	fail=1
+fi
+$FSCK -f -n $TMPFILE >> $OUT 2>&1 || fail=1
+
+echo roll back mke2fs >> $OUT
+$E2UNDO $TDB_FILE.0 $TMPFILE  >> $OUT 2>&1
+crc0_2=`$CRCSUM $TMPFILE`
+echo $CRCSUM after e2undo mke2fs $crc0_2 >> $OUT
+features="$($DUMPE2FS -h $TMPFILE 2> /dev/null | grep 'Filesystem features:')"
+if [ -n "${features}" ]; then
+	echo "FS features: ${features}" >> $OUT
+	echo "Should not have any features set" >> $OUT
+	fail=1
+fi
+$FSCK -f -n $TMPFILE >> $OUT 2>&1 && fail=1
+
+if [ $fail -eq 0 ] && [ $crc0 = $crc0_2 ] && [ $crc1 = $crc1_2 ]; then
+	echo "$test_name: $test_description: ok"
+	touch $test_name.ok
+else
+	ln -f $test_name.log $test_name.failed
+	echo "$test_name: $test_description: failed"
+fi
+rm -f $TDB_FILE.0 $TDB_FILE.1 $TMPFILE $CONF
+fi
diff --git a/tests/u_revert_all_onefile/script b/tests/u_revert_all_onefile/script
new file mode 100644
index 0000000..27b3b23
--- /dev/null
+++ b/tests/u_revert_all_onefile/script
@@ -0,0 +1,100 @@
+test_description="convert fs to 64bit,metadata_csum and revert as one undo file"
+if test -x $RESIZE2FS_EXE -a -x $E2UNDO_EXE; then
+
+TDB_FILE=${TMPDIR:-/tmp}/resize2fs-$(basename $TMPFILE).e2undo
+OUT=$test_name.log
+rm -f $TDB_FILE >/dev/null 2>&1
+fail=0
+
+echo convert fs to 64bit,metadata_csum and revert both changes as one undo file > $OUT
+
+dd if=/dev/zero of=$TMPFILE bs=1k count=512 > /dev/null 2>&1
+crc0=`$CRCSUM $TMPFILE`
+echo $CRCSUM before mke2fs $crc0 >> $OUT
+
+CONF=$TMPFILE.conf
+cat > $CONF << ENDL
+[fs_types]
+	ext4h = {
+		features = has_journal,extent,huge_file,flex_bg,uninit_bg,dir_nlink,extra_isize,sparse_super,filetype,dir_index,ext_attr,resize_inode
+		blocksize = 4096
+		inode_size = 256
+		make_hugefiles = true
+		hugefiles_dir = /
+		hugefiles_slack = 0
+		hugefiles_name = aaaaa
+		hugefiles_digits = 4
+		hugefiles_size = 1M
+		zero_hugefiles = false
+	}
+ENDL
+
+echo mke2fs -q -F -o Linux -T ext4h -O ^metadata_csum,^64bit -E lazy_itable_init=1 -b 4096 -z $TDB_FILE.0 $TMPFILE 524288 >> $OUT
+MKE2FS_CONFIG=$CONF $MKE2FS -q -F -o Linux -T ext4h -O ^metadata_csum,^64bit -E lazy_itable_init=1 -b 4096 -z $TDB_FILE.0 $TMPFILE >> $OUT 2>&1
+crc1=`$CRCSUM $TMPFILE`
+echo $CRCSUM after mke2fs $crc1 >> $OUT
+features="$($DUMPE2FS -h $TMPFILE 2> /dev/null | grep 'Filesystem features:')"
+if [ "$(echo "${features}" | grep "metadata_csum" -c)" -gt 0 ] || [ "$(echo "${features}" | grep 64bit -c)" -gt 0 ]; then
+	echo "FS features: ${features}" >> $OUT
+	echo "Should not have 64bit or metadata_csum set" >> $OUT
+	fail=1
+fi
+$FSCK -f -n $TMPFILE >> $OUT 2>&1 || fail=1
+
+echo using resize2fs to test e2undo >> $OUT
+$RESIZE2FS -z $TDB_FILE.0 -b $TMPFILE >> $OUT 2>&1
+crc2=`$CRCSUM $TMPFILE`
+echo $CRCSUM after resize2fs $crc2 >> $OUT
+features="$($DUMPE2FS -h $TMPFILE 2> /dev/null | grep 'Filesystem features:')"
+if [ "$(echo "${features}" | grep "metadata_csum" -c)" -gt 0 ] || [ "$(echo "${features}" | grep 64bit -c)" -lt 1 ]; then
+	echo "FS features: ${features}" >> $OUT
+	echo "Should have 64bit but not metadata_csum set" >> $OUT
+	fail=1
+fi
+$FSCK -f -n $TMPFILE >> $OUT 2>&1 || fail=1
+
+echo using tune2fs to test e2undo >> $OUT
+$TUNE2FS -O metadata_csum -z $TDB_FILE.0 $TMPFILE >> $OUT 2>&1
+crc3=`$CRCSUM $TMPFILE`
+echo $CRCSUM after tune2fs $crc3 >> $OUT
+features="$($DUMPE2FS -h $TMPFILE 2> /dev/null | grep 'Filesystem features:')"
+if [ "$(echo "${features}" | grep "metadata_csum" -c)" -lt 1 ] || [ "$(echo "${features}" | grep 64bit -c)" -lt 1 ]; then
+	echo "FS features: ${features}" >> $OUT
+	echo "Should have 64bit and metadata_csum set" >> $OUT
+	fail=1
+fi
+$FSCK -f -n $TMPFILE >> $OUT 2>&1 || fail=1
+
+echo using e2fsck to test e2undo >> $OUT
+$FSCK -f -y -D -z $TDB_FILE.0 $TMPFILE >> $OUT 2>&1
+crc4=`$CRCSUM $TMPFILE`
+echo $CRCSUM after e2fsck $crc4 >> $OUT
+features="$($DUMPE2FS -h $TMPFILE 2> /dev/null | grep 'Filesystem features:')"
+if [ "$(echo "${features}" | grep "metadata_csum" -c)" -lt 1 ] || [ "$(echo "${features}" | grep 64bit -c)" -lt 1 ]; then
+	echo "FS features: ${features}" >> $OUT
+	echo "Should have 64bit and metadata_csum set" >> $OUT
+	fail=1
+fi
+$FSCK -f -n $TMPFILE >> $OUT 2>&1 || fail=1
+
+echo roll back e2fsck/tune2fs/resize2fs/mke2fs >> $OUT
+$E2UNDO $TDB_FILE.0 $TMPFILE  >> $OUT 2>&1
+crc0_2=`$CRCSUM $TMPFILE`
+echo $CRCSUM after e2undo e2fsck/tune2fs/resize2fs/mke2fs $crc0_2 >> $OUT
+features="$($DUMPE2FS -h $TMPFILE 2> /dev/null | grep 'Filesystem features:')"
+if [ -n "${features}" ]; then
+	echo "FS features: ${features}" >> $OUT
+	echo "Should not have any features set" >> $OUT
+	fail=1
+fi
+$FSCK -f -n $TMPFILE >> $OUT 2>&1 && fail=1
+
+if [ $fail -eq 0 ] && [ $crc0 = $crc0_2 ]; then
+	echo "$test_name: $test_description: ok"
+	touch $test_name.ok
+else
+	ln -f $test_name.log $test_name.failed
+	echo "$test_name: $test_description: failed"
+fi
+rm -f $TDB_FILE.0 $TDB_FILE.1 $TMPFILE $CONF
+fi
diff --git a/tests/u_undo_undo/script b/tests/u_undo_undo/script
new file mode 100644
index 0000000..726a453
--- /dev/null
+++ b/tests/u_undo_undo/script
@@ -0,0 +1,54 @@
+test_description="undo e2undo"
+if test -x $E2UNDO_EXE; then
+
+TDB_FILE=${TMPDIR:-/tmp}/e2fsck-$(basename $TMPFILE).e2undo
+TDB_FILE2=${TMPDIR:-/tmp}/e2undo-$(basename $TMPFILE).e2undo
+OUT=$test_name.log
+rm -f $TDB_FILE $TDB_FILE2 >/dev/null 2>&1
+
+dd if=/dev/zero of=$TMPFILE bs=1k count=512 > /dev/null 2>&1
+
+echo mke2fs -q -F -o Linux -T ext4 -O ^metadata_csum,64bit -E lazy_itable_init=1 -b 1024 $TMPFILE  > $OUT
+$MKE2FS -q -F -o Linux -T ext4 -O ^metadata_csum,64bit -E lazy_itable_init=1 -b 1024 $TMPFILE  >> $OUT 2>&1
+crc0=`$CRCSUM $TMPFILE`
+echo $CRCSUM before e2fsck $crc0 >> $OUT
+
+echo using e2fsck to test e2undo >> $OUT
+$FSCK -f -y -D -z $TDB_FILE $TMPFILE >> $OUT 2>&1
+crc1=`$CRCSUM $TMPFILE`
+echo $CRCSUM after e2fsck $crc1 >> $OUT
+
+echo e2undo the e2fsck >> $OUT
+$E2UNDO -z $TDB_FILE2 $TDB_FILE $TMPFILE  >> $OUT 2>&1
+crc2=`$CRCSUM $TMPFILE`
+echo $CRCSUM after e2undo $crc2 >> $OUT
+
+echo e2undo the e2undo >> $OUT
+$E2UNDO $TDB_FILE2 $TMPFILE  >> $OUT 2>&1
+crc3=`$CRCSUM $TMPFILE`
+echo $CRCSUM after e2undo $crc3 >> $OUT
+
+echo e2undo the e2undo the e2undo >> $OUT
+$E2UNDO $TDB_FILE $TMPFILE  >> $OUT 2>&1
+crc4=`$CRCSUM $TMPFILE`
+echo $CRCSUM after e2undo $crc4 >> $OUT
+
+$E2UNDO -h $TDB_FILE $TMPFILE >> $OUT 2>&1
+$E2UNDO -h $TDB_FILE2 $TMPFILE >> $OUT 2>&1
+
+$E2UNDO -z $TDB_FILE2 $TDB_FILE2 $TMPFILE >> $OUT 2>&1
+
+crc5=`$CRCSUM $TMPFILE`
+echo $CRCSUM after failed e2undo $crc5 >> $OUT
+
+echo $crc0 $crc1 $crc2 $crc3 $crc4 $crc5 >> $OUT
+
+if [ $crc0 = $crc2 ] && [ $crc2 = $crc4 ] && [ $crc5 = $crc4 ] && [ $crc1 = $crc3 ] && [ $crc1 != $crc2 ]; then
+	echo "$test_name: $test_description: ok"
+	touch $test_name.ok
+else
+	ln -f $test_name.log $test_name.failed
+	echo "$test_name: $test_description: failed"
+fi
+rm -f $TDB_FILE $TDB_FILE2 $TMPFILE
+fi
diff --git a/tests/u_wrong_fs/script b/tests/u_wrong_fs/script
new file mode 100644
index 0000000..dbf0d6b
--- /dev/null
+++ b/tests/u_wrong_fs/script
@@ -0,0 +1,36 @@
+test_description="e2undo on the wrong fs"
+if test -x $E2UNDO_EXE; then
+
+TDB_FILE=${TMPDIR:-/tmp}/tune2fs-$(basename $TMPFILE).e2undo
+OUT=$test_name.log
+rm -f $TDB_FILE >/dev/null 2>&1
+
+dd if=/dev/zero of=$TMPFILE bs=1k count=512 > /dev/null 2>&1
+
+echo mke2fs -q -F -o Linux -T ext4 -O ^metadata_csum,64bit -E lazy_itable_init=1 -b 1024 $TMPFILE  > $OUT
+$MKE2FS -q -F -o Linux -T ext4 -O ^metadata_csum,64bit -E lazy_itable_init=1 -b 1024 $TMPFILE  >> $OUT 2>&1
+crc0=`$CRCSUM $TMPFILE`
+echo $CRCSUM before tune2fs $crc0 >> $OUT
+
+echo using tune2fs to test e2undo >> $OUT
+$TUNE2FS -O metadata_csum -z $TDB_FILE $TMPFILE >> $OUT 2>&1
+crc1=`$CRCSUM $TMPFILE`
+echo $CRCSUM after tune2fs $crc1 >> $OUT
+
+$MKE2FS -q -F -o Linux -T ext4 -O ^metadata_csum,64bit -E lazy_itable_init=1 -b 1024 $TMPFILE  >> $OUT 2>&1
+crc2=`$CRCSUM $TMPFILE`
+echo $CRCSUM after re-mke2fs $crc2 >> $OUT
+
+$E2UNDO $TDB_FILE $TMPFILE  >> $OUT 2>&1
+crc3=`$CRCSUM $TMPFILE`
+echo $CRCSUM after e2undo $crc3 >> $OUT
+
+if [ $crc3 = $crc2 ] && [ $crc2 != $crc1 ]; then
+	echo "$test_name: $test_description: ok"
+	touch $test_name.ok
+else
+	ln -f $test_name.log $test_name.failed
+	echo "$test_name: $test_description: failed"
+fi
+rm -f $TDB_FILE $TMPFILE
+fi



* [PATCH 23/35] copy-in: create hardlinks with the correct directory filetype
  2015-04-02  2:34 [PATCH 00/35] e2fsprogs April 2015 patchbomb Darrick J. Wong
                   ` (21 preceding siblings ...)
  2015-04-02  2:36 ` [PATCH 22/35] tests: test various features of the new e2undo format Darrick J. Wong
@ 2015-04-02  2:36 ` Darrick J. Wong
  2015-05-05 14:46   ` Theodore Ts'o
  2015-04-02  2:36 ` [PATCH 24/35] copy-in: for files, only iterate file blocks that are mapped Darrick J. Wong
                   ` (10 subsequent siblings)
  33 siblings, 1 reply; 70+ messages in thread
From: Darrick J. Wong @ 2015-04-02  2:36 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

When we're creating hard links via ext2fs_link, the (misnamed?) flags
argument specifies the filetype for the directory entry.  This is
*derived* from i_mode, so provide a translator.  Otherwise, fsck will
complain about unset file types.
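
To illustrate the translation, here is a minimal standalone sketch. It
is not the patch itself: MODE_* and DEMO_FT_* are local stand-ins that
are assumed to mirror the LINUX_S_IF* and EXT2_FT_* values in
ext2_fs.h, and demo_file_type() is a hypothetical name.

```c
/* Local stand-ins for the file type bits of i_mode (assumed to match
 * the LINUX_S_IF* values in ext2_fs.h). */
#define MODE_FMT	0170000
#define MODE_SOCK	0140000
#define MODE_LNK	0120000
#define MODE_REG	0100000
#define MODE_BLK	0060000
#define MODE_DIR	0040000
#define MODE_CHR	0020000
#define MODE_FIFO	0010000

/* Local stand-ins for the directory entry file types (assumed to match
 * the EXT2_FT_* values). */
enum demo_ft {
	DEMO_FT_UNKNOWN  = 0,
	DEMO_FT_REG_FILE = 1,
	DEMO_FT_DIR      = 2,
	DEMO_FT_CHRDEV   = 3,
	DEMO_FT_BLKDEV   = 4,
	DEMO_FT_FIFO     = 5,
	DEMO_FT_SOCK     = 6,
	DEMO_FT_SYMLINK  = 7
};

/* Derive the dirent file type from the type bits of an inode mode. */
int demo_file_type(unsigned int mode)
{
	switch (mode & MODE_FMT) {
	case MODE_REG:	return DEMO_FT_REG_FILE;
	case MODE_DIR:	return DEMO_FT_DIR;
	case MODE_CHR:	return DEMO_FT_CHRDEV;
	case MODE_BLK:	return DEMO_FT_BLKDEV;
	case MODE_LNK:	return DEMO_FT_SYMLINK;
	case MODE_FIFO:	return DEMO_FT_FIFO;
	case MODE_SOCK:	return DEMO_FT_SOCK;
	default:	return DEMO_FT_UNKNOWN;
	}
}
```

With these values, demo_file_type(0100644) yields DEMO_FT_REG_FILE, and
a mode with no type bits set yields 0, the same fallthrough the patch
uses for unrecognized modes.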

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 misc/create_inode.c |   32 ++++++++++++++++++++++++++++++--
 1 file changed, 30 insertions(+), 2 deletions(-)


diff --git a/misc/create_inode.c b/misc/create_inode.c
index a024d1c..3bc0515 100644
--- a/misc/create_inode.c
+++ b/misc/create_inode.c
@@ -37,6 +37,32 @@
 #define S_BLKSIZE 512
 #endif
 
+static int ext2_file_type(unsigned int mode)
+{
+	if (LINUX_S_ISREG(mode))
+		return EXT2_FT_REG_FILE;
+
+	if (LINUX_S_ISDIR(mode))
+		return EXT2_FT_DIR;
+
+	if (LINUX_S_ISCHR(mode))
+		return EXT2_FT_CHRDEV;
+
+	if (LINUX_S_ISBLK(mode))
+		return EXT2_FT_BLKDEV;
+
+	if (LINUX_S_ISLNK(mode))
+		return EXT2_FT_SYMLINK;
+
+	if (LINUX_S_ISFIFO(mode))
+		return EXT2_FT_FIFO;
+
+	if (LINUX_S_ISSOCK(mode))
+		return EXT2_FT_SOCK;
+
+	return 0;
+}
+
 /* Link an inode number to a directory */
 static errcode_t add_link(ext2_filsys fs, ext2_ino_t parent_ino,
 			  ext2_ino_t ino, const char *name)
@@ -50,14 +76,16 @@ static errcode_t add_link(ext2_filsys fs, ext2_ino_t parent_ino,
 		return retval;
 	}
 
-	retval = ext2fs_link(fs, parent_ino, name, ino, inode.i_flags);
+	retval = ext2fs_link(fs, parent_ino, name, ino,
+			     ext2_file_type(inode.i_mode));
 	if (retval == EXT2_ET_DIR_NO_SPACE) {
 		retval = ext2fs_expand_dir(fs, parent_ino);
 		if (retval) {
 			com_err(__func__, retval, "while expanding directory");
 			return retval;
 		}
-		retval = ext2fs_link(fs, parent_ino, name, ino, inode.i_flags);
+		retval = ext2fs_link(fs, parent_ino, name, ino,
+				     ext2_file_type(inode.i_mode));
 	}
 	if (retval) {
 		com_err(__func__, retval, "while linking %s", name);



* [PATCH 24/35] copy-in: for files, only iterate file blocks that are mapped
  2015-04-02  2:34 [PATCH 00/35] e2fsprogs April 2015 patchbomb Darrick J. Wong
                   ` (22 preceding siblings ...)
  2015-04-02  2:36 ` [PATCH 23/35] copy-in: create hardlinks with the correct directory filetype Darrick J. Wong
@ 2015-04-02  2:36 ` Darrick J. Wong
  2015-05-05 14:49   ` Theodore Ts'o
  2015-04-02  2:36 ` [PATCH 25/35] copyin: fix error handling Darrick J. Wong
                   ` (9 subsequent siblings)
  33 siblings, 1 reply; 70+ messages in thread
From: Darrick J. Wong @ 2015-04-02  2:36 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

Rewrite the file copy-in algorithm to detect smaller holes in the
files we're copying in.  Use SEEK_DATA/SEEK_HOLE/FIEMAP when available
to skip known empty parts.  This fixes the particular bug where zeroed
blocks on a system with 64k pages are needlessly copied into a
4k-block filesystem.  It also saves time by skipping parts we know to
be zeroed.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 misc/create_inode.c |  280 ++++++++++++++++++++++++++++++++++++---------------
 1 file changed, 196 insertions(+), 84 deletions(-)
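
A rough standalone sketch of the two building blocks this patch relies
on: rounding a data/hole byte range outward to filesystem block
boundaries, and deciding per block whether anything needs to be written.
The helper names (block_floor, block_ceil, count_blocks_to_write) are
hypothetical and do not appear in the patch; the sketch assumes a
power-of-two block size and a 64-bit off_t.

```c
#define _FILE_OFFSET_BITS 64
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Round a byte offset down to the start of the block containing it. */
off_t block_floor(off_t pos, unsigned int blocksize)
{
	assert(blocksize && (blocksize & (blocksize - 1)) == 0);
	/* Widen to off_t before inverting so the mask keeps the high bits. */
	return pos & ~((off_t)blocksize - 1);
}

/* Round a byte offset up to the next block boundary (no-op if aligned). */
off_t block_ceil(off_t pos, unsigned int blocksize)
{
	assert(blocksize && (blocksize & (blocksize - 1)) == 0);
	return (pos + (off_t)blocksize - 1) & ~((off_t)blocksize - 1);
}

/*
 * Count the blocks in buf[0..len) that hold at least one nonzero byte,
 * i.e. the blocks a sparse-aware copy would actually write.  zerobuf
 * must contain at least blocksize zero bytes.  A short trailing block
 * is compared only over the bytes that are present.
 */
size_t count_blocks_to_write(const char *buf, size_t len,
			     const char *zerobuf, size_t blocksize)
{
	size_t pos, blen, n = 0;

	for (pos = 0; pos < len; pos += blocksize) {
		blen = blocksize;
		if (blen > len - pos)
			blen = len - pos;
		if (memcmp(buf + pos, zerobuf, blen) != 0)
			n++;
	}
	return n;
}
```

With 4096-byte blocks, a data extent reported as [5000, 9000) would be
widened to [4096, 12288) before copying, and any all-zero block inside
that range is then skipped rather than written.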


diff --git a/misc/create_inode.c b/misc/create_inode.c
index 3bc0515..8ab546e 100644
--- a/misc/create_inode.c
+++ b/misc/create_inode.c
@@ -9,12 +9,22 @@
  * %End-Header%
  */
 
+#define _FILE_OFFSET_BITS       64
+#define _LARGEFILE64_SOURCE     1
+#define _GNU_SOURCE		1
+
+#include "config.h"
 #include <time.h>
+#include <sys/types.h>
 #include <unistd.h>
 #include <limits.h> /* for PATH_MAX */
 #ifdef HAVE_ATTR_XATTR_H
 #include <attr/xattr.h>
 #endif
+#include <sys/ioctl.h>
+#include <ext2fs/ext2fs.h>
+#include <ext2fs/ext2_types.h>
+#include <ext2fs/fiemap.h>
 
 #include "create_inode.h"
 #include "nls-enable.h"
@@ -28,14 +38,7 @@
 #endif
 
 /* 64KiB is the minimium blksize to best minimize system call overhead. */
-#ifndef IO_BUFSIZE
-#define IO_BUFSIZE 64*1024
-#endif
-
-/* Block size for `st_blocks' */
-#ifndef S_BLKSIZE
-#define S_BLKSIZE 512
-#endif
+#define COPY_FILE_BUFLEN	65536
 
 static int ext2_file_type(unsigned int mode)
 {
@@ -139,10 +142,9 @@ static errcode_t set_inode_xattr(ext2_filsys fs, ext2_ino_t ino, const char *fil
 {
 #ifdef HAVE_LLISTXATTR
 	errcode_t			retval, close_retval;
-	struct ext2_inode		inode;
 	struct ext2_xattr_handle	*handle;
 	ssize_t				size, value_size;
-	char				*list;
+	char				*list = NULL;
 	int				i;
 
 	size = llistxattr(filename, NULL, 0);
@@ -382,82 +384,202 @@ try_again:
 	return retval;
 }
 
-static errcode_t copy_file(ext2_filsys fs, int fd, ext2_ino_t newfile,
-			   int bufsize, int make_holes)
+#if !defined HAVE_PREAD64 && !defined HAVE_PREAD
+static ssize_t my_pread(int fd, void *buf, size_t count, off_t offset)
 {
-	ext2_file_t	e2_file;
-	errcode_t	retval, close_ret;
-	int		got;
-	unsigned int	written;
-	char		*buf;
-	char		*ptr;
-	char		*zero_buf;
-	int		cmp;
-
-	retval = ext2fs_file_open(fs, newfile,
-				  EXT2_FILE_WRITE, &e2_file);
-	if (retval)
-		return retval;
-
-	retval = ext2fs_get_mem(bufsize, &buf);
-	if (retval) {
-		com_err("copy_file", retval, "can't allocate buffer\n");
-		goto out_close;
-	}
+	if (lseek(fd, offset, SEEK_SET) < 0)
+		return -1;
 
-	/* This is used for checking whether the whole block is zero */
-	retval = ext2fs_get_memzero(bufsize, &zero_buf);
-	if (retval) {
-		com_err("copy_file", retval, "can't allocate zero buffer\n");
-		goto out_free_buf;
-	}
+	return read(fd, buf, count);
+}
+#endif /* !defined HAVE_PREAD64 && !defined HAVE_PREAD */
 
-	while (1) {
-		got = read(fd, buf, bufsize);
-		if (got == 0)
-			break;
+static errcode_t copy_file_range(ext2_filsys fs, int fd, ext2_file_t e2_file,
+				 off_t start, off_t end, char *buf,
+				 char *zerobuf)
+{
+	off_t off, bpos;
+	ssize_t got, blen;
+	unsigned int written;
+	char *ptr;
+	errcode_t err = 0;
+
+	for (off = start; off < end; off += COPY_FILE_BUFLEN) {
+#ifdef HAVE_PREAD64
+		got = pread64(fd, buf, COPY_FILE_BUFLEN, off);
+#elif defined(HAVE_PREAD)
+		got = pread(fd, buf, COPY_FILE_BUFLEN, off);
+#else
+		got = my_pread(fd, buf, COPY_FILE_BUFLEN, off);
+#endif
 		if (got < 0) {
-			retval = errno;
+			err = errno;
 			goto fail;
 		}
-		ptr = buf;
-
-		/* Sparse copy */
-		if (make_holes) {
-			/* Check whether all is zero */
-			cmp = memcmp(ptr, zero_buf, got);
-			if (cmp == 0) {
-				 /* The whole block is zero, make a hole */
-				retval = ext2fs_file_lseek(e2_file, got,
-							   EXT2_SEEK_CUR,
-							   NULL);
-				if (retval)
+		for (bpos = 0, ptr = buf; bpos < got; bpos += fs->blocksize) {
+			blen = fs->blocksize;
+			if (blen > got - bpos)
+				blen = got - bpos;
+			if (memcmp(ptr, zerobuf, blen) == 0) {
+				ptr += blen;
+				continue;
+			}
+			err = ext2fs_file_lseek(e2_file, off + bpos,
+						EXT2_SEEK_SET, NULL);
+			if (err)
+				goto fail;
+			while (blen > 0) {
+				err = ext2fs_file_write(e2_file, ptr, blen,
+							&written);
+				if (err)
+					goto fail;
+				if (written == 0) {
+					err = EIO;
 					goto fail;
-				got = 0;
+				}
+				blen -= written;
+				ptr += written;
 			}
 		}
+	}
+fail:
+	return err;
+}
 
-		/* Normal copy */
-		while (got > 0) {
-			retval = ext2fs_file_write(e2_file, ptr,
-						   got, &written);
-			if (retval)
-				goto fail;
-
-			got -= written;
-			ptr += written;
+static errcode_t try_lseek_copy(ext2_filsys fs, int fd, struct stat *statbuf,
+				ext2_file_t e2_file, char *buf, char *zerobuf)
+{
+#if defined(SEEK_DATA) && defined(SEEK_HOLE)
+	off_t data = 0, hole;
+	off_t data_blk, hole_blk;
+	errcode_t err = 0;
+
+	/* Try to use SEEK_DATA and SEEK_HOLE */
+	while (data < statbuf->st_size) {
+		data = lseek(fd, data, SEEK_DATA);
+		if (data < 0) {
+			if (errno == ENXIO)
+				break;
+			return EXT2_ET_UNIMPLEMENTED;
 		}
+		hole = lseek(fd, data, SEEK_HOLE);
+		if (hole < 0)
+			return EXT2_ET_UNIMPLEMENTED;
+
+		data_blk = data & ~(off_t)(fs->blocksize - 1);
+		hole_blk = (hole + (fs->blocksize - 1)) & ~(off_t)(fs->blocksize - 1);
+		err = copy_file_range(fs, fd, e2_file, data_blk, hole_blk, buf,
+				      zerobuf);
+		if (err)
+			return err;
+
+		data = hole;
 	}
 
-fail:
-	ext2fs_free_mem(&zero_buf);
-out_free_buf:
+	return err;
+#else
+	return EXT2_ET_UNIMPLEMENTED;
+#endif /* SEEK_DATA and SEEK_HOLE */
+}
+
+static errcode_t try_fiemap_copy(ext2_filsys fs, int fd, ext2_file_t e2_file,
+				 char *buf, char *zerobuf)
+{
+#if defined(FS_IOC_FIEMAP)
+#define EXTENT_MAX_COUNT 512
+	struct fiemap *fiemap_buf;
+	struct fiemap_extent *ext_buf, *ext;
+	int ext_buf_size, fie_buf_size;
+	off_t pos = 0;
+	unsigned int i;
+	errcode_t err;
+
+	ext_buf_size = EXTENT_MAX_COUNT * sizeof(struct fiemap_extent);
+	fie_buf_size = sizeof(struct fiemap) + ext_buf_size;
+
+	err = ext2fs_get_memzero(fie_buf_size, &fiemap_buf);
+	if (err)
+		return err;
+
+	ext_buf = fiemap_buf->fm_extents;
+	memset(fiemap_buf, 0, fie_buf_size);
+	fiemap_buf->fm_length = FIEMAP_MAX_OFFSET;
+	fiemap_buf->fm_flags |= FIEMAP_FLAG_SYNC;
+	fiemap_buf->fm_extent_count = EXTENT_MAX_COUNT;
+
+	do {
+		fiemap_buf->fm_start = pos;
+		memset(ext_buf, 0, ext_buf_size);
+		err = ioctl(fd, FS_IOC_FIEMAP, fiemap_buf);
+		if (err < 0 && (errno == EOPNOTSUPP || errno == ENOTTY)) {
+			err = EXT2_ET_UNIMPLEMENTED;
+			goto out;
+		} else if (err < 0 || fiemap_buf->fm_mapped_extents == 0) {
+			err = (err < 0) ? errno : 0;
+			goto out;
+		}
+		for (i = 0, ext = ext_buf; i < fiemap_buf->fm_mapped_extents;
+		     i++, ext++) {
+			err = copy_file_range(fs, fd, e2_file, ext->fe_logical,
+					      ext->fe_logical + ext->fe_length,
+					      buf, zerobuf);
+			if (err)
+				goto out;
+		}
+
+		ext--;
+		/* Record file's logical offset this time */
+		pos = ext->fe_logical + ext->fe_length;
+		/*
+		 * If fm_extents array has been filled and
+		 * there are extents left, continue to cycle.
+		 */
+	} while (fiemap_buf->fm_mapped_extents == EXTENT_MAX_COUNT &&
+		 !(ext->fe_flags & FIEMAP_EXTENT_LAST));
+out:
+	ext2fs_free_mem(&fiemap_buf);
+	return err;
+#else
+	return EXT2_ET_UNIMPLEMENTED;
+#endif /* FS_IOC_FIEMAP */
+}
+
+static errcode_t copy_file(ext2_filsys fs, int fd, struct stat *statbuf,
+			   ext2_ino_t ino)
+{
+	ext2_file_t e2_file;
+	char *buf = NULL, *zerobuf = NULL;
+	errcode_t err, close_err;
+
+	err = ext2fs_file_open(fs, ino, EXT2_FILE_WRITE, &e2_file);
+	if (err)
+		return err;
+
+	err = ext2fs_get_mem(COPY_FILE_BUFLEN, &buf);
+	if (err)
+		goto out;
+
+	err = ext2fs_get_memzero(fs->blocksize, &zerobuf);
+	if (err)
+		goto out;
+
+	err = try_lseek_copy(fs, fd, statbuf, e2_file, buf, zerobuf);
+	if (err != EXT2_ET_UNIMPLEMENTED)
+		goto out;
+
+	err = try_fiemap_copy(fs, fd, e2_file, buf, zerobuf);
+	if (err != EXT2_ET_UNIMPLEMENTED)
+		goto out;
+
+	err = copy_file_range(fs, fd, e2_file, 0, statbuf->st_size, buf,
+			      zerobuf);
+out:
+	ext2fs_free_mem(&zerobuf);
 	ext2fs_free_mem(&buf);
-out_close:
-	close_ret = ext2fs_file_close(e2_file);
-	if (retval == 0)
-		retval = close_ret;
-	return retval;
+	close_err = ext2fs_file_close(e2_file);
+	if (err == 0)
+		err = close_err;
+	return err;
 }
 
 static int is_hardlink(struct hdlinks_s *hdlinks, dev_t dev, ino_t ino)
@@ -481,8 +603,6 @@ errcode_t do_write_internal(ext2_filsys fs, ext2_ino_t cwd, const char *src,
 	ext2_ino_t	newfile;
 	errcode_t	retval;
 	struct ext2_inode inode;
-	int		bufsize = IO_BUFSIZE;
-	int		make_holes = 0;
 
 	fd = ext2fs_open_file(src, O_RDONLY, 0);
 	if (fd < 0) {
@@ -570,17 +690,9 @@ errcode_t do_write_internal(ext2_filsys fs, ext2_ino_t cwd, const char *src,
 		}
 	}
 	if (LINUX_S_ISREG(inode.i_mode)) {
-		if (statbuf.st_blocks < statbuf.st_size / S_BLKSIZE) {
-			make_holes = 1;
-			/*
-			 * Use I/O blocksize as buffer size when
-			 * copying sparse files.
-			 */
-			bufsize = statbuf.st_blksize;
-		}
-		retval = copy_file(fs, fd, newfile, bufsize, make_holes);
+		retval = copy_file(fs, fd, &statbuf, newfile);
 		if (retval)
-			com_err("copy_file", retval, 0);
+			com_err("copy_file", retval, _("while copying %s"), src);
 	}
 	close(fd);
 



* [PATCH 25/35] copyin: fix error handling
  2015-04-02  2:34 [PATCH 00/35] e2fsprogs April 2015 patchbomb Darrick J. Wong
                   ` (23 preceding siblings ...)
  2015-04-02  2:36 ` [PATCH 24/35] copy-in: for files, only iterate file blocks that are mapped Darrick J. Wong
@ 2015-04-02  2:36 ` Darrick J. Wong
  2015-05-05 14:51   ` Theodore Ts'o
  2015-04-02  2:36 ` [PATCH 26/35] mke2fs: add simple tests and re-alphabetize mke2fs manpage options Darrick J. Wong
                   ` (8 subsequent siblings)
  33 siblings, 1 reply; 70+ messages in thread
From: Darrick J. Wong @ 2015-04-02  2:36 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

Save errno (in retval) before doing anything else, because the
"anything else" (usually com_err()) can call library functions, which
can overwrite errno.

Fix the error messages to use the message catalog, and don't _ever_
print an error without providing context.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 misc/create_inode.c            |  212 ++++++++++++++++++++--------------------
 misc/mke2fs.c                  |    2 
 tests/f_create_symlinks/expect |    4 -
 3 files changed, 107 insertions(+), 111 deletions(-)
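
To illustrate the save-errno-first pattern outside of e2fsprogs, here is
a small sketch.  It is not code from this patch: open_for_copy() is a
hypothetical helper, and fprintf()/strerror() stand in for com_err().

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

/*
 * Open path read-only.  On success, store the descriptor in *fd_out and
 * return 0.  On failure, report the error with context and return the
 * errno value that open() set; *fd_out is left untouched.
 */
int open_for_copy(const char *path, int *fd_out)
{
	int fd, saved;

	fd = open(path, O_RDONLY);
	if (fd < 0) {
		/* Capture errno before calling anything else. */
		saved = errno;
		fprintf(stderr, "%s: %s while opening \"%s\" to copy\n",
			__func__, strerror(saved), path);
		/* Return the saved copy; fprintf() may have changed errno. */
		return saved;
	}
	*fd_out = fd;
	return 0;
}
```

Returning the saved value rather than re-reading errno is the whole
fix: the reporting call sits between the failure and the return, so
anything it does to errno can no longer leak into the caller's result.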


diff --git a/misc/create_inode.c b/misc/create_inode.c
index 8ab546e..7d3c9e6 100644
--- a/misc/create_inode.c
+++ b/misc/create_inode.c
@@ -75,7 +75,7 @@ static errcode_t add_link(ext2_filsys fs, ext2_ino_t parent_ino,
 
 	retval = ext2fs_read_inode(fs, ino, &inode);
         if (retval) {
-		com_err(__func__, retval, "while reading inode %u", ino);
+		com_err(__func__, retval, _("while reading inode %u"), ino);
 		return retval;
 	}
 
@@ -84,14 +84,15 @@ static errcode_t add_link(ext2_filsys fs, ext2_ino_t parent_ino,
 	if (retval == EXT2_ET_DIR_NO_SPACE) {
 		retval = ext2fs_expand_dir(fs, parent_ino);
 		if (retval) {
-			com_err(__func__, retval, "while expanding directory");
+			com_err(__func__, retval,
+				_("while expanding directory"));
 			return retval;
 		}
 		retval = ext2fs_link(fs, parent_ino, name, ino,
 				     ext2_file_type(inode.i_mode));
 	}
 	if (retval) {
-		com_err(__func__, retval, "while linking %s", name);
+		com_err(__func__, retval, _("while linking \"%s\""), name);
 		return retval;
 	}
 
@@ -99,24 +100,11 @@ static errcode_t add_link(ext2_filsys fs, ext2_ino_t parent_ino,
 
 	retval = ext2fs_write_inode(fs, ino, &inode);
 	if (retval)
-		com_err(__func__, retval, "while writing inode %u", ino);
+		com_err(__func__, retval, _("while writing inode %u"), ino);
 
 	return retval;
 }
 
-/* Fill the uid, gid, mode and time for the inode */
-static void fill_inode(struct ext2_inode *inode, struct stat *st)
-{
-	if (st != NULL) {
-		inode->i_uid = st->st_uid;
-		inode->i_gid = st->st_gid;
-		inode->i_mode |= st->st_mode;
-		inode->i_atime = st->st_atime;
-		inode->i_mtime = st->st_mtime;
-		inode->i_ctime = st->st_ctime;
-	}
-}
-
 /* Set the uid, gid, mode and time for the inode */
 static errcode_t set_inode_extra(ext2_filsys fs, ext2_ino_t cwd,
 				 ext2_ino_t ino, struct stat *st)
@@ -126,19 +114,25 @@ static errcode_t set_inode_extra(ext2_filsys fs, ext2_ino_t cwd,
 
 	retval = ext2fs_read_inode(fs, ino, &inode);
         if (retval) {
-		com_err(__func__, retval, "while reading inode %u", ino);
+		com_err(__func__, retval, _("while reading inode %u"), ino);
 		return retval;
 	}
 
-	fill_inode(&inode, st);
+	inode.i_uid = st->st_uid;
+	inode.i_gid = st->st_gid;
+	inode.i_mode |= st->st_mode;
+	inode.i_atime = st->st_atime;
+	inode.i_mtime = st->st_mtime;
+	inode.i_ctime = st->st_ctime;
 
 	retval = ext2fs_write_inode(fs, ino, &inode);
 	if (retval)
-		com_err(__func__, retval, "while writing inode %u", ino);
+		com_err(__func__, retval, _("while writing inode %u"), ino);
 	return retval;
 }
 
-static errcode_t set_inode_xattr(ext2_filsys fs, ext2_ino_t ino, const char *filename)
+static errcode_t set_inode_xattr(ext2_filsys fs, ext2_ino_t ino,
+				 const char *filename)
 {
 #ifdef HAVE_LLISTXATTR
 	errcode_t			retval, close_retval;
@@ -149,8 +143,10 @@ static errcode_t set_inode_xattr(ext2_filsys fs, ext2_ino_t ino, const char *fil
 
 	size = llistxattr(filename, NULL, 0);
 	if (size == -1) {
-		com_err(__func__, errno, "llistxattr failed on %s", filename);
-		return errno;
+		retval = errno;
+		com_err(__func__, retval, _("while listing attributes of \"%s\""),
+			filename);
+		return retval;
 	} else if (size == 0) {
 		return 0;
 	}
@@ -159,20 +155,21 @@ static errcode_t set_inode_xattr(ext2_filsys fs, ext2_ino_t ino, const char *fil
 	if (retval) {
 		if (retval == EXT2_ET_MISSING_EA_FEATURE)
 			return 0;
-		com_err(__func__, retval, "while opening inode %u", ino);
+		com_err(__func__, retval, _("while opening inode %u"), ino);
 		return retval;
 	}
 
 	retval = ext2fs_get_mem(size, &list);
 	if (retval) {
-		com_err(__func__, retval, "whilst allocating memory");
+		com_err(__func__, retval, _("while allocating memory"));
 		goto out;
 	}
 
 	size = llistxattr(filename, list, size);
 	if (size == -1) {
-		com_err(__func__, errno, "llistxattr failed on %s", filename);
 		retval = errno;
+		com_err(__func__, retval, _("while listing attributes of \"%s\""),
+			filename);
 		goto out;
         }
 
@@ -182,24 +179,26 @@ static errcode_t set_inode_xattr(ext2_filsys fs, ext2_ino_t ino, const char *fil
 
 		value_size = getxattr(filename, name, NULL, 0);
 		if (value_size == -1) {
-			com_err(__func__, errno, "getxattr failed on %s",
-				filename);
 			retval = errno;
+			com_err(__func__, retval,
+				_("while reading attribute \"%s\" of \"%s\""),
+				name, filename);
 			break;
 		}
 
 		retval = ext2fs_get_mem(value_size, &value);
 		if (retval) {
-			com_err(__func__, retval, "whilst allocating memory");
+			com_err(__func__, retval, _("while allocating memory"));
 			break;
 		}
 
 		value_size = getxattr(filename, name, value, value_size);
 		if (value_size == -1) {
 			ext2fs_free_mem(&value);
-			com_err(__func__, errno, "getxattr failed on %s",
-				filename);
 			retval = errno;
+			com_err(__func__, retval,
+				_("while reading attribute \"%s\" of \"%s\""),
+				name, filename);
 			break;
 		}
 
@@ -207,7 +206,8 @@ static errcode_t set_inode_xattr(ext2_filsys fs, ext2_ino_t ino, const char *fil
 		ext2fs_free_mem(&value);
 		if (retval) {
 			com_err(__func__, retval,
-				"while writing xattr %u", ino);
+				_("while writing attribute \"%s\" to inode %u"),
+				name, ino);
 			break;
 		}
 
@@ -216,7 +216,7 @@ static errcode_t set_inode_xattr(ext2_filsys fs, ext2_ino_t ino, const char *fil
 	ext2fs_free_mem(&list);
 	close_retval = ext2fs_xattrs_close(&handle);
 	if (close_retval) {
-		com_err(__func__, retval, "while closing inode %u", ino);
+		com_err(__func__, close_retval, _("while closing inode %u"), ino);
 		retval = retval ? retval : close_retval;
 	}
 	return retval;
@@ -256,13 +256,10 @@ errcode_t do_mknod_internal(ext2_filsys fs, ext2_ino_t cwd, const char *name,
 		return EXT2_ET_INVALID_ARGUMENT;
 	}
 
-	if (!(fs->flags & EXT2_FLAG_RW)) {
-		com_err(__func__, 0, "Filesystem opened read/only");
-		return EROFS;
-	}
 	retval = ext2fs_new_inode(fs, cwd, 010755, 0, &ino);
 	if (retval) {
-		com_err(__func__, retval, 0);
+		com_err(__func__, retval, _("while allocating inode \"%s\""),
+			name);
 		return retval;
 	}
 
@@ -273,13 +270,14 @@ errcode_t do_mknod_internal(ext2_filsys fs, ext2_ino_t cwd, const char *name,
 	if (retval == EXT2_ET_DIR_NO_SPACE) {
 		retval = ext2fs_expand_dir(fs, cwd);
 		if (retval) {
-			com_err(__func__, retval, "while expanding directory");
+			com_err(__func__, retval,
+				_("while expanding directory"));
 			return retval;
 		}
 		retval = ext2fs_link(fs, cwd, name, ino, filetype);
 	}
 	if (retval) {
-		com_err(name, retval, 0);
+		com_err(name, retval, _("while creating inode \"%s\""), name);
 		return retval;
 	}
 	if (ext2fs_test_inode_bitmap2(fs->inode_map, ino))
@@ -307,7 +305,7 @@ errcode_t do_mknod_internal(ext2_filsys fs, ext2_ino_t cwd, const char *name,
 
 	retval = ext2fs_write_new_inode(fs, ino, &inode);
 	if (retval)
-		com_err(__func__, retval, "while creating inode %u", ino);
+		com_err(__func__, retval, _("while writing inode %u"), ino);
 
 	return retval;
 }
@@ -332,19 +330,19 @@ errcode_t do_symlink_internal(ext2_filsys fs, ext2_ino_t cwd, const char *name,
 	} else
 		parent_ino = cwd;
 
-try_again:
 	retval = ext2fs_symlink(fs, parent_ino, 0, name, target);
 	if (retval == EXT2_ET_DIR_NO_SPACE) {
 		retval = ext2fs_expand_dir(fs, parent_ino);
 		if (retval) {
 			com_err("do_symlink_internal", retval,
-				"while expanding directory");
+				_("while expanding directory"));
 			return retval;
 		}
-		goto try_again;
+		retval = ext2fs_symlink(fs, parent_ino, 0, name, target);
 	}
 	if (retval)
-		com_err("ext2fs_symlink", retval, 0);
+		com_err("ext2fs_symlink", retval,
+			_("while creating symlink \"%s\""), name);
 	return retval;
 }
 
@@ -362,25 +360,27 @@ errcode_t do_mkdir_internal(ext2_filsys fs, ext2_ino_t cwd, const char *name,
 		*cp = 0;
 		retval = ext2fs_namei(fs, root, cwd, name, &parent_ino);
 		if (retval) {
-			com_err(name, retval, 0);
+			com_err(name, retval, _("while looking up \"%s\""),
+				name);
 			return retval;
 		}
 		name = cp+1;
 	} else
 		parent_ino = cwd;
 
-try_again:
 	retval = ext2fs_mkdir(fs, parent_ino, 0, name);
 	if (retval == EXT2_ET_DIR_NO_SPACE) {
 		retval = ext2fs_expand_dir(fs, parent_ino);
 		if (retval) {
-			com_err(__func__, retval, "while expanding directory");
+			com_err(__func__, retval,
+				_("while expanding directory"));
 			return retval;
 		}
-		goto try_again;
+		retval = ext2fs_mkdir(fs, parent_ino, 0, name);
 	}
 	if (retval)
-		com_err("ext2fs_mkdir", retval, 0);
+		com_err("ext2fs_mkdir", retval,
+			_("while creating directory \"%s\""), name);
 	return retval;
 }
 
@@ -606,27 +606,25 @@ errcode_t do_write_internal(ext2_filsys fs, ext2_ino_t cwd, const char *src,
 
 	fd = ext2fs_open_file(src, O_RDONLY, 0);
 	if (fd < 0) {
-		com_err(src, errno, 0);
-		return errno;
+		retval = errno;
+		com_err(__func__, retval, _("while opening \"%s\" to copy"),
+			src);
+		return retval;
 	}
 	if (fstat(fd, &statbuf) < 0) {
-		com_err(src, errno, 0);
-		close(fd);
-		return errno;
+		retval = errno;
+		goto out;
 	}
 
 	retval = ext2fs_namei(fs, root, cwd, dest, &newfile);
 	if (retval == 0) {
-		close(fd);
-		return EXT2_ET_FILE_EXISTS;
+		retval = EXT2_ET_FILE_EXISTS;
+		goto out;
 	}
 
 	retval = ext2fs_new_inode(fs, cwd, 010755, 0, &newfile);
-	if (retval) {
-		com_err(__func__, retval, 0);
-		close(fd);
-		return retval;
-	}
+	if (retval)
+		goto out;
 #ifdef DEBUGFS
 	printf("Allocated inode: %u\n", newfile);
 #endif
@@ -634,19 +632,13 @@ errcode_t do_write_internal(ext2_filsys fs, ext2_ino_t cwd, const char *src,
 				EXT2_FT_REG_FILE);
 	if (retval == EXT2_ET_DIR_NO_SPACE) {
 		retval = ext2fs_expand_dir(fs, cwd);
-		if (retval) {
-			com_err(__func__, retval, "while expanding directory");
-			close(fd);
-			return retval;
-		}
+		if (retval)
+			goto out;
 		retval = ext2fs_link(fs, cwd, dest, newfile,
 					EXT2_FT_REG_FILE);
 	}
-	if (retval) {
-		com_err(dest, retval, 0);
-		close(fd);
-		return errno;
-	}
+	if (retval)
+		goto out;
 	if (ext2fs_test_inode_bitmap2(fs->inode_map, newfile))
 		com_err(__func__, 0, "Warning: inode already set");
 	ext2fs_inode_alloc_stats2(fs, newfile, +1, 0);
@@ -656,11 +648,8 @@ errcode_t do_write_internal(ext2_filsys fs, ext2_ino_t cwd, const char *src,
 		fs->now ? fs->now : time(0);
 	inode.i_links_count = 1;
 	retval = ext2fs_inode_size_set(fs, &inode, statbuf.st_size);
-	if (retval) {
-		com_err(dest, retval, 0);
-		close(fd);
-		return retval;
-	}
+	if (retval)
+		goto out;
 	if (EXT2_HAS_INCOMPAT_FEATURE(fs->super,
 				      EXT4_FEATURE_INCOMPAT_INLINE_DATA)) {
 		inode.i_flags |= EXT4_INLINE_DATA_FL;
@@ -671,31 +660,25 @@ errcode_t do_write_internal(ext2_filsys fs, ext2_ino_t cwd, const char *src,
 		inode.i_flags &= ~EXT4_EXTENTS_FL;
 		retval = ext2fs_extent_open2(fs, newfile, &inode, &handle);
 		if (retval)
-			return retval;
+			goto out;
 		ext2fs_extent_free(handle);
 	}
 
 	retval = ext2fs_write_new_inode(fs, newfile, &inode);
-	if (retval) {
-		com_err(__func__, retval, "while creating inode %u", newfile);
-		close(fd);
-		return retval;
-	}
+	if (retval)
+		goto out;
 	if (inode.i_flags & EXT4_INLINE_DATA_FL) {
 		retval = ext2fs_inline_data_init(fs, newfile);
-		if (retval) {
-			com_err("copy_file", retval, 0);
-			close(fd);
-			return retval;
-		}
+		if (retval)
+			goto out;
 	}
 	if (LINUX_S_ISREG(inode.i_mode)) {
 		retval = copy_file(fs, fd, &statbuf, newfile);
 		if (retval)
-			com_err("copy_file", retval, _("while copying %s"), src);
+			goto out;
 	}
+out:
 	close(fd);
-
 	return retval;
 }
 
@@ -716,16 +699,18 @@ static errcode_t __populate_fs(ext2_filsys fs, ext2_ino_t parent_ino,
 	int		hdlink;
 
 	if (chdir(source_dir) < 0) {
-		com_err(__func__, errno,
+		retval = errno;
+		com_err(__func__, retval,
 			_("while changing working directory to \"%s\""),
 			source_dir);
-		return errno;
+		return retval;
 	}
 
 	if (!(dh = opendir("."))) {
-		com_err(__func__, errno,
+		retval = errno;
+		com_err(__func__, retval,
 			_("while opening directory \"%s\""), source_dir);
-		return errno;
+		return retval;
 	}
 
 	while ((dent = readdir(dh))) {
@@ -733,7 +718,8 @@ static errcode_t __populate_fs(ext2_filsys fs, ext2_ino_t parent_ino,
 		    (!strcmp(dent->d_name, "..")))
 			continue;
 		if (lstat(dent->d_name, &st)) {
-			com_err(__func__, errno, _("while lstat \"%s\""),
+			retval = errno;
+			com_err(__func__, retval, _("while lstat \"%s\""),
 				dent->d_name);
 			goto out;
 		}
@@ -775,10 +761,10 @@ static errcode_t __populate_fs(ext2_filsys fs, ext2_ino_t parent_ino,
 			read_cnt = readlink(name, ln_target,
 					    sizeof(ln_target) - 1);
 			if (read_cnt == -1) {
-				com_err(__func__, errno,
-					_("while trying to readlink \"%s\""),
-					name);
 				retval = errno;
+				com_err(__func__, retval,
+					_("while trying to read link \"%s\""),
+					name);
 				goto out;
 			}
 			ln_target[read_cnt] = '\0';
@@ -801,6 +787,10 @@ static errcode_t __populate_fs(ext2_filsys fs, ext2_ino_t parent_ino,
 			}
 			break;
 		case S_IFDIR:
+			/* Don't choke on /lost+found */
+			if (parent_ino == EXT2_ROOT_INO &&
+			    strcmp(name, "lost+found") == 0)
+				goto find_lnf;
 			retval = do_mkdir_internal(fs, parent_ino, name, &st,
 						   root);
 			if (retval) {
@@ -808,6 +798,7 @@ static errcode_t __populate_fs(ext2_filsys fs, ext2_ino_t parent_ino,
 					_("while making dir \"%s\""), name);
 				goto out;
 			}
+find_lnf:
 			retval = ext2fs_namei(fs, root, parent_ino,
 					      name, &ino);
 			if (retval) {
@@ -816,14 +807,12 @@ static errcode_t __populate_fs(ext2_filsys fs, ext2_ino_t parent_ino,
 			}
 			/* Populate the dir recursively*/
 			retval = __populate_fs(fs, ino, name, root, hdlinks);
-			if (retval) {
-				com_err(__func__, retval,
-					_("while adding dir \"%s\""), name);
+			if (retval)
 				goto out;
-			}
 			if (chdir("..")) {
-				com_err(__func__, errno, _("during cd .."));
 				retval = errno;
+				com_err(__func__, retval,
+					_("while changing directory"));
 				goto out;
 			}
 			break;
@@ -834,7 +823,8 @@ static errcode_t __populate_fs(ext2_filsys fs, ext2_ino_t parent_ino,
 
 		retval =  ext2fs_namei(fs, root, parent_ino, name, &ino);
 		if (retval) {
-			com_err(name, retval, 0);
+			com_err(name, retval, _("while looking up \"%s\""),
+				name);
 			goto out;
 		}
 
@@ -864,9 +854,9 @@ static errcode_t __populate_fs(ext2_filsys fs, ext2_ino_t parent_ino,
 						(hdlinks->size + HDLINK_CNT) *
 						sizeof(struct hdlink_s));
 				if (p == NULL) {
-					com_err(name, errno,
-						_("Not enough memory"));
 					retval = EXT2_ET_NO_MEMORY;
+					com_err(name, retval,
+						_("while saving inode data"));
 					goto out;
 				}
 				hdlinks->hdl = p;
@@ -890,12 +880,18 @@ errcode_t populate_fs(ext2_filsys fs, ext2_ino_t parent_ino,
 	struct hdlinks_s hdlinks;
 	errcode_t retval;
 
+	if (!(fs->flags & EXT2_FLAG_RW)) {
+		com_err(__func__, 0, _("Filesystem opened readonly"));
+		return EROFS;
+	}
+
 	hdlinks.count = 0;
 	hdlinks.size = HDLINK_CNT;
 	hdlinks.hdl = realloc(NULL, hdlinks.size * sizeof(struct hdlink_s));
 	if (hdlinks.hdl == NULL) {
-		com_err(__func__, errno, "Not enough memory");
-		return errno;
+		retval = errno;
+		com_err(__func__, retval, _("while allocating memory"));
+		return retval;
 	}
 
 	retval = __populate_fs(fs, parent_ino, source_dir, root, &hdlinks);
diff --git a/misc/mke2fs.c b/misc/mke2fs.c
index f5ef703..8928d23 100644
--- a/misc/mke2fs.c
+++ b/misc/mke2fs.c
@@ -3141,7 +3141,7 @@ no_journal:
 				     EXT2_ROOT_INO);
 		if (retval) {
 			com_err(program_name, retval, "%s",
-				_("\nError while populating file system\n"));
+				_("while populating file system"));
 			exit(1);
 		} else if (!quiet)
 			printf("%s", _("done\n"));
diff --git a/tests/f_create_symlinks/expect b/tests/f_create_symlinks/expect
index 096495c..47fa468 100644
--- a/tests/f_create_symlinks/expect
+++ b/tests/f_create_symlinks/expect
@@ -11,10 +11,10 @@ debugfs -R "symlink /l_70 /xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
 debugfs -R "symlink /l_500 /xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" test.img
 debugfs -R "symlink /l_1023 /xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx!
 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" test.img
 debugfs -R "symlink /l_1024 /xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx!
 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" test.img
-ext2fs_symlink: Invalid argument passed to ext2 library 
+ext2fs_symlink: Invalid argument passed to ext2 library while creating symlink "l_1024"
 symlink: Invalid argument passed to ext2 library 
 debugfs -R "symlink /l_1500 /xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx!
 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" test.img
-ext2fs_symlink: Invalid argument passed to ext2 library 
+ext2fs_symlink: Invalid argument passed to ext2 library while creating symlink "l_1500"
 symlink: Invalid argument passed to ext2 library 
 debugfs -R "stat /l_30" test.img
 Inode: 12   Type: symlink    Mode:  0777   Flags: 0x0



* [PATCH 26/35] mke2fs: add simple tests and re-alphabetize mke2fs manpage options
  2015-04-02  2:34 [PATCH 00/35] e2fsprogs April 2015 patchbomb Darrick J. Wong
                   ` (24 preceding siblings ...)
  2015-04-02  2:36 ` [PATCH 25/35] copyin: fix error handling Darrick J. Wong
@ 2015-04-02  2:36 ` Darrick J. Wong
  2015-05-05 14:52   ` Theodore Ts'o
  2015-04-02  2:37 ` [PATCH 27/35] contrib: script to create minified ext4 image from a directory Darrick J. Wong
                   ` (7 subsequent siblings)
  33 siblings, 1 reply; 70+ messages in thread
From: Darrick J. Wong @ 2015-04-02  2:36 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

Add some simple tests for mke2fs -d (create image from dir) and put
the manpage options in alphabetical order.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 misc/mke2fs.8.in              |   15 ++-
 tests/m_devdir/script         |   33 ++++++
 tests/m_minrootdir/expect     |  216 +++++++++++++++++++++++++++++++++++++++++
 tests/m_minrootdir/output.sed |    5 +
 tests/m_minrootdir/script     |   81 +++++++++++++++
 tests/m_rootdir/expect        |  117 ++++++++++++++++++++++
 tests/m_rootdir/output.sed    |    5 +
 tests/m_rootdir/script        |   71 +++++++++++++
 8 files changed, 536 insertions(+), 7 deletions(-)
 create mode 100644 tests/m_devdir/script
 create mode 100644 tests/m_minrootdir/expect
 create mode 100644 tests/m_minrootdir/output.sed
 create mode 100644 tests/m_minrootdir/script
 create mode 100644 tests/m_rootdir/expect
 create mode 100644 tests/m_rootdir/output.sed
 create mode 100644 tests/m_rootdir/script


diff --git a/misc/mke2fs.8.in b/misc/mke2fs.8.in
index 3230f65..40c40d3 100644
--- a/misc/mke2fs.8.in
+++ b/misc/mke2fs.8.in
@@ -18,6 +18,10 @@ mke2fs \- create an ext2/ext3/ext4 filesystem
 .I block-size
 ]
 [
+.B \-d
+.I root-directory
+]
+[
 .B \-D
 ]
 [
@@ -52,10 +56,6 @@ mke2fs \- create an ext2/ext3/ext4 filesystem
 .I number-of-inodes
 ]
 [
-.B \-d
-.I root-directory
-]
-[
 .B \-n
 ]
 [
@@ -237,6 +237,10 @@ enabled.  (See the
 man page for more details about bigalloc.)   The default cluster size if
 bigalloc is enabled is 16 times the block size.
 .TP
+.BI \-d " root-directory"
+Copy the contents of the given directory into the root directory of the
+filesystem.
+.TP
 .B \-D
 Use direct I/O when writing to the disk.  This avoids mke2fs dirtying a
 lot of buffer cache memory, which may impact other applications running
@@ -589,9 +593,6 @@ the
 ratio).  This allows the user to specify the number
 of desired inodes directly.
 .TP
-.BI \-d " root-directory"
-Add the files from the root-directory to the filesystem.
-.TP
 .BI \-o " creator-os"
 Overrides the default value of the "creator operating system" field of the
 filesystem.  The creator field is set by default to the name of the OS the
diff --git a/tests/m_devdir/script b/tests/m_devdir/script
new file mode 100644
index 0000000..5f26699
--- /dev/null
+++ b/tests/m_devdir/script
@@ -0,0 +1,33 @@
+if test -x $DEBUGFS_EXE; then
+
+test_description="create fs image from /dev"
+MKFS_DIR=/dev
+OUT=$test_name.log
+
+$MKE2FS -q -F -o Linux -T ext4 -O metadata_csum,64bit -E lazy_itable_init=1 -b 1024 -d $MKFS_DIR $TMPFILE 16384 > $OUT 2>&1
+mkfs_status=$?
+
+$DUMPE2FS $TMPFILE >> $OUT 2>&1
+$DEBUGFS -R 'ls /' $TMPFILE >> $OUT 2>&1
+
+$FSCK -f -n $TMPFILE >> $OUT 2>&1
+fsck_status=$?
+
+sed -f $cmd_dir/filter.sed -e "s;$TMPFILE;test.img;" < $OUT > $OUT.tmp
+mv $OUT.tmp $OUT
+
+if [ $mkfs_status -ne 0 ]; then
+	echo "$test_name: $test_description: skipped"
+elif [ $mkfs_status -eq 0 ] && [ $fsck_status -eq 0 ]; then
+	echo "$test_name: $test_description: ok"
+	touch $test_name.ok
+else
+        echo "$test_name: $test_description: failed"
+fi
+
+rm -rf $TMPFILE.cmd $OUT.sed
+unset MKFS_DIR OUT EXP
+
+else #if test -x $DEBUGFS_EXE; then
+	echo "$test_name: $test_description: skipped"
+fi
diff --git a/tests/m_minrootdir/expect b/tests/m_minrootdir/expect
new file mode 100644
index 0000000..41a713f
--- /dev/null
+++ b/tests/m_minrootdir/expect
@@ -0,0 +1,216 @@
+create fs
+Filesystem volume name:   <none>
+Last mounted on:          <not available>
+Filesystem magic number:  0xEF53
+Filesystem revision #:    1 (dynamic)
+Filesystem features:      ext_attr dir_index filetype extent 64bit flex_bg sparse_super huge_file dir_nlink extra_isize metadata_csum
+Default mount options:    (none)
+Filesystem state:         clean
+Errors behavior:          Continue
+Filesystem OS type:       Linux
+Inode count:              1024
+Block count:              16384
+Reserved block count:     819
+Free blocks:              16065
+Free inodes:              1006
+First block:              1
+Block size:               1024
+Fragment size:            1024
+Group descriptor size:    64
+Blocks per group:         8192
+Fragments per group:      8192
+Inodes per group:         512
+Inode blocks per group:   128
+Flex block group size:    16
+Mount count:              0
+Check interval:           15552000 (6 months)
+Reserved blocks uid:      0
+Reserved blocks gid:      0
+First inode:              11
+Inode size:	          256
+Required extra isize:     28
+Desired extra isize:      28
+Default directory hash:   half_md4
+Checksum type:            crc32c
+
+
+Group 0: (Blocks 1-8192)
+  Primary superblock at 1, Group descriptors at 2-2
+  Block bitmap at 3 (+2)
+  Inode bitmap at 5 (+4)
+  Inode table at 7-134 (+6)
+  7876 free blocks, 494 free inodes, 4 directories, 494 unused inodes
+  Free blocks: 317-8192
+  Free inodes: 19-512
+Group 1: (Blocks 8193-16383) [INODE_UNINIT]
+  Backup superblock at 8193, Group descriptors at 8194-8194
+  Block bitmap at 4 (bg #0 + 3)
+  Inode bitmap at 6 (bg #0 + 5)
+  Inode table at 135-262 (bg #0 + 134)
+  8189 free blocks, 512 free inodes, 0 directories, 512 unused inodes
+  Free blocks: 8195-16383
+  Free inodes: 513-1024
+debugfs: stat /emptyfile
+Inode: III   Type: regular    
+Size: 0
+Fragment:  Address: 0    Number: 0    Size: 0
+debugfs: stat /bigfile
+Inode: III   Type: regular    
+Size: 32768
+Fragment:  Address: 0    Number: 0    Size: 0
+debugfs: stat /sparsefile
+Inode: III   Type: regular    
+Size: 1073741825
+Fragment:  Address: 0    Number: 0    Size: 0
+debugfs: stat /bigzerofile
+Inode: III   Type: regular    
+Size: 1073741825
+Fragment:  Address: 0    Number: 0    Size: 0
+debugfs: stat /fifo
+debugfs: stat /emptydir
+Inode: III   Type: directory    
+Size: 1024
+Fragment:  Address: 0    Number: 0    Size: 0
+debugfs: stat /dir
+Inode: III   Type: directory    
+Size: 1024
+Fragment:  Address: 0    Number: 0    Size: 0
+debugfs: stat /dir/file
+Inode: III   Type: regular    
+Size: 8
+Fragment:  Address: 0    Number: 0    Size: 0
+debugfs: ex /emptyfile
+Level Entries       Logical      Physical Length Flags
+debugfs: ex /bigfile
+Level Entries       Logical      Physical Length Flags
+X 0/0 1/1 0-31 AAA-BBB 32 
+debugfs: ex /sparsefile
+Level Entries           Logical      Physical Length Flags
+Y 0/1 1/1 1-1048576 AAA 1048576
+X 1/1 1/5 1-1 AAA-BBB 1 
+X 1/1 2/5 512-512 AAA-BBB 1 
+X 1/1 3/5 1024-1024 AAA-BBB 1 
+X 1/1 4/5 524288-524288 AAA-BBB 1 
+X 1/1 5/5 1048576-1048576 AAA-BBB 1 
+debugfs: ex /bigzerofile
+Level Entries           Logical      Physical Length Flags
+debugfs: ex /dir
+Level Entries       Logical      Physical Length Flags
+X 0/0 1/1 0-0 AAA-BBB 1 
+debugfs: ex /dir/file
+Level Entries       Logical      Physical Length Flags
+X 0/0 1/1 0-0 AAA-BBB 1 
+Pass 1: Checking inodes, blocks, and sizes
+Pass 2: Checking directory structure
+Pass 3: Checking directory connectivity
+Pass 4: Checking reference counts
+Pass 5: Checking group summary information
+test.img: 18/1024 files (0.0% non-contiguous), 319/16384 blocks
+minify fs
+Setting reserved blocks percentage to 0% (0 blocks)
+Resizing the filesystem on test.img to 338 (1k) blocks.
+The filesystem on test.img is now 338 (1k) blocks long.
+
+Filesystem volume name:   <none>
+Last mounted on:          <not available>
+Filesystem magic number:  0xEF53
+Filesystem revision #:    1 (dynamic)
+Filesystem features:      ext_attr dir_index filetype extent 64bit flex_bg sparse_super huge_file dir_nlink extra_isize metadata_csum
+Default mount options:    (none)
+Filesystem state:         clean
+Errors behavior:          Continue
+Filesystem OS type:       Linux
+Inode count:              512
+Block count:              338
+Reserved block count:     0
+Free blocks:              151
+Free inodes:              494
+First block:              1
+Block size:               1024
+Fragment size:            1024
+Group descriptor size:    64
+Blocks per group:         8192
+Fragments per group:      8192
+Inodes per group:         512
+Inode blocks per group:   128
+Flex block group size:    16
+Mount count:              0
+Check interval:           15552000 (6 months)
+Reserved blocks uid:      0
+Reserved blocks gid:      0
+First inode:              11
+Inode size:	          256
+Required extra isize:     28
+Desired extra isize:      28
+Default directory hash:   half_md4
+Checksum type:            crc32c
+
+
+Group 0: (Blocks 1-337)
+  Primary superblock at 1, Group descriptors at 2-2
+  Block bitmap at 3 (+2)
+  Inode bitmap at 5 (+4)
+  Inode table at 7-134 (+6)
+  151 free blocks, 494 free inodes, 4 directories, 494 unused inodes
+  Free blocks: 4, 6, 135-262, 317-337
+  Free inodes: 19-512
+Pass 1: Checking inodes, blocks, and sizes
+Pass 2: Checking directory structure
+Pass 3: Checking directory connectivity
+Pass 4: Checking reference counts
+Pass 5: Checking group summary information
+test.img: 18/512 files (0.0% non-contiguous), 187/338 blocks
+minify fs (2)
+Setting reserved blocks percentage to 0% (0 blocks)
+Resizing the filesystem on test.img to 188 (1k) blocks.
+The filesystem on test.img is now 188 (1k) blocks long.
+
+Filesystem volume name:   <none>
+Last mounted on:          <not available>
+Filesystem magic number:  0xEF53
+Filesystem revision #:    1 (dynamic)
+Filesystem features:      ext_attr dir_index filetype extent 64bit flex_bg sparse_super huge_file dir_nlink extra_isize metadata_csum
+Default mount options:    (none)
+Filesystem state:         clean
+Errors behavior:          Continue
+Filesystem OS type:       Linux
+Inode count:              512
+Block count:              188
+Reserved block count:     0
+Free blocks:              1
+Free inodes:              494
+First block:              1
+Block size:               1024
+Fragment size:            1024
+Group descriptor size:    64
+Blocks per group:         8192
+Fragments per group:      8192
+Inodes per group:         512
+Inode blocks per group:   128
+Flex block group size:    16
+Mount count:              0
+Check interval:           15552000 (6 months)
+Reserved blocks uid:      0
+Reserved blocks gid:      0
+First inode:              11
+Inode size:	          256
+Required extra isize:     28
+Desired extra isize:      28
+Default directory hash:   half_md4
+Checksum type:            crc32c
+
+
+Group 0: (Blocks 1-187)
+  Primary superblock at 1, Group descriptors at 2-2
+  Block bitmap at 3 (+2)
+  Inode bitmap at 5 (+4)
+  Inode table at 7-134 (+6)
+  1 free blocks, 494 free inodes, 4 directories, 494 unused inodes
+  Free blocks: 187
+  Free inodes: 19-512
+Pass 1: Checking inodes, blocks, and sizes
+Pass 2: Checking directory structure
+Pass 3: Checking directory connectivity
+Pass 4: Checking reference counts
+Pass 5: Checking group summary information
+test.img: 18/512 files (5.6% non-contiguous), 187/188 blocks
diff --git a/tests/m_minrootdir/output.sed b/tests/m_minrootdir/output.sed
new file mode 100644
index 0000000..2e76967
--- /dev/null
+++ b/tests/m_minrootdir/output.sed
@@ -0,0 +1,5 @@
+s/^[[:space:]]*\([0-9]*\)\/[[:space:]]*\([0-9]*\)[[:space:]]*\([0-9]*\)\/[[:space:]]*\([0-9]*\)[[:space:]]*\([0-9]*\)[[:space:]]*-[[:space:]]*\([0-9]*\)[[:space:]]*[0-9]*[[:space:]]*-[[:space:]]*[0-9]*[[:space:]]*\([0-9]*\)/X \1\/\2 \3\/\4 \5-\6 AAA-BBB \7/g
+s/^[[:space:]]*\([0-9]*\)\/[[:space:]]*\([0-9]*\)[[:space:]]*\([0-9]*\)\/[[:space:]]*\([0-9]*\)[[:space:]]*\([0-9]*\)[[:space:]]*-[[:space:]]*\([0-9]*\)[[:space:]]*[0-9]*[[:space:]]*\([0-9]*\)/Y \1\/\2 \3\/\4 \5-\6 AAA \7/g
+s/Mode:.*$//g
+s/User:.*Size:/Size:/g
+s/^Inode: [0-9]*/Inode: III/g
diff --git a/tests/m_minrootdir/script b/tests/m_minrootdir/script
new file mode 100644
index 0000000..662e76f
--- /dev/null
+++ b/tests/m_minrootdir/script
@@ -0,0 +1,81 @@
+if test -x $DEBUGFS_EXE -a -x $RESIZE2FS_EXE; then
+
+test_description="create fs image from dir, then minimize it"
+MKFS_DIR=$TMPFILE.dir
+OUT=$test_name.log
+EXP=$test_dir/expect
+
+rm -rf $MKFS_DIR
+mkdir -p $MKFS_DIR
+mkdir $MKFS_DIR/dir
+mkdir $MKFS_DIR/emptydir
+dd if=/dev/zero of=$MKFS_DIR/bigzerofile bs=1 count=1 seek=1073741824 2> /dev/null
+echo "M" | dd of=$MKFS_DIR/sparsefile bs=1 count=1 seek=1024 2> /dev/null
+echo "M" | dd of=$MKFS_DIR/sparsefile bs=1 count=1 seek=524288 conv=notrunc 2> /dev/null
+echo "M" | dd of=$MKFS_DIR/sparsefile bs=1 count=1 seek=1048576 conv=notrunc 2> /dev/null
+echo "M" | dd of=$MKFS_DIR/sparsefile bs=1 count=1 seek=536870912 conv=notrunc 2> /dev/null
+echo "M" | dd of=$MKFS_DIR/sparsefile bs=1 count=1 seek=1073741824 conv=notrunc 2> /dev/null
+dd if=/dev/zero bs=1024 count=32 2> /dev/null | tr '\0' 'a' > $MKFS_DIR/bigfile
+touch $MKFS_DIR/emptyfile
+echo "Test me" > $MKFS_DIR/dir/file
+
+echo "create fs" > $OUT
+$MKE2FS -q -F -o Linux -T ext4 -O ^has_journal,metadata_csum,64bit,^resize_inode -E lazy_itable_init=1 -b 1024 -d $MKFS_DIR $TMPFILE 16384 >> $OUT 2>&1
+
+$DUMPE2FS $TMPFILE >> $OUT 2>&1
+cat > $TMPFILE.cmd << ENDL
+stat /emptyfile
+stat /bigfile
+stat /sparsefile
+stat /bigzerofile
+stat /fifo
+stat /emptydir
+stat /dir
+stat /dir/file
+ENDL
+$DEBUGFS -f $TMPFILE.cmd $TMPFILE 2>&1 | egrep "(stat|Size:|Type:)" | sed -f $test_dir/output.sed >> $OUT
+
+cat > $TMPFILE.cmd << ENDL
+ex /emptyfile
+ex /bigfile
+ex /sparsefile
+ex /bigzerofile
+ex /dir
+ex /dir/file
+ENDL
+$DEBUGFS -f $TMPFILE.cmd $TMPFILE 2>&1 | sed -f $test_dir/output.sed >> $OUT
+$FSCK -f -n $TMPFILE >> $OUT 2>&1
+
+echo "minify fs" >> $OUT
+$TUNE2FS -m 0 $TMPFILE >> $OUT 2>&1
+$RESIZE2FS -M $TMPFILE >> $OUT 2>&1
+$DUMPE2FS $TMPFILE >> $OUT 2>&1
+$FSCK -f -n $TMPFILE >> $OUT 2>&1
+
+echo "minify fs (2)" >> $OUT
+$TUNE2FS -m 0 $TMPFILE >> $OUT 2>&1
+$RESIZE2FS -M $TMPFILE >> $OUT 2>&1
+$DUMPE2FS $TMPFILE >> $OUT 2>&1
+$FSCK -f -n $TMPFILE >> $OUT 2>&1
+
+sed -f $cmd_dir/filter.sed -e "s;$TMPFILE;test.img;" < $OUT > $OUT.tmp
+mv $OUT.tmp $OUT
+
+# Do the verification
+cmp -s $OUT $EXP
+status=$?
+
+if [ "$status" = 0 ] ; then
+	echo "$test_name: $test_description: ok"
+	touch $test_name.ok
+else
+        echo "$test_name: $test_description: failed"
+        diff $DIFF_OPTS $EXP $OUT > $test_name.failed
+fi
+
+rm -rf $TMPFILE.cmd $MKFS_DIR $OUT.sed
+unset MKFS_DIR OUT EXP
+
+else #if test -x $DEBUGFS_EXE -a -x $RESIZE2FS_EXE; then
+	echo "$test_name: $test_description: skipped"
+fi
diff --git a/tests/m_rootdir/expect b/tests/m_rootdir/expect
new file mode 100644
index 0000000..a5314f1
--- /dev/null
+++ b/tests/m_rootdir/expect
@@ -0,0 +1,117 @@
+Filesystem volume name:   <none>
+Last mounted on:          <not available>
+Filesystem magic number:  0xEF53
+Filesystem revision #:    1 (dynamic)
+Filesystem features:      has_journal ext_attr resize_inode dir_index filetype extent 64bit flex_bg sparse_super huge_file dir_nlink extra_isize metadata_csum
+Default mount options:    (none)
+Filesystem state:         clean
+Errors behavior:          Continue
+Filesystem OS type:       Linux
+Inode count:              1024
+Block count:              16384
+Reserved block count:     819
+Free blocks:              14786
+Free inodes:              1005
+First block:              1
+Block size:               1024
+Fragment size:            1024
+Group descriptor size:    64
+Reserved GDT blocks:      127
+Blocks per group:         8192
+Fragments per group:      8192
+Inodes per group:         512
+Inode blocks per group:   128
+Flex block group size:    16
+Mount count:              0
+Check interval:           15552000 (6 months)
+Reserved blocks uid:      0
+Reserved blocks gid:      0
+First inode:              11
+Inode size:	          256
+Required extra isize:     28
+Desired extra isize:      28
+Journal inode:            8
+Default directory hash:   half_md4
+Journal backup:           inode blocks
+Checksum type:            crc32c
+Journal features:         (none)
+Journal size:             1024k
+Journal length:           1024
+Journal sequence:         0x00000001
+Journal start:            0
+
+
+Group 0: (Blocks 1-8192)
+  Primary superblock at 1, Group descriptors at 2-2
+  Reserved GDT blocks at 3-129
+  Block bitmap at 130 (+129)
+  Inode bitmap at 132 (+131)
+  Inode table at 134-261 (+133)
+  7748 free blocks, 493 free inodes, 4 directories, 493 unused inodes
+  Free blocks: 445-8192
+  Free inodes: 20-512
+Group 1: (Blocks 8193-16383) [INODE_UNINIT]
+  Backup superblock at 8193, Group descriptors at 8194-8194
+  Reserved GDT blocks at 8195-8321
+  Block bitmap at 131 (bg #0 + 130)
+  Inode bitmap at 133 (bg #0 + 132)
+  Inode table at 262-389 (bg #0 + 261)
+  7038 free blocks, 512 free inodes, 0 directories, 512 unused inodes
+  Free blocks: 9346-16383
+  Free inodes: 513-1024
+debugfs: stat /emptyfile
+Inode: III   Type: regular    
+Size: 0
+Fragment:  Address: 0    Number: 0    Size: 0
+debugfs: stat /bigfile
+Inode: III   Type: regular    
+Size: 32768
+Fragment:  Address: 0    Number: 0    Size: 0
+debugfs: stat /sparsefile
+Inode: III   Type: regular    
+Size: 1073741825
+Fragment:  Address: 0    Number: 0    Size: 0
+debugfs: stat /bigzerofile
+Inode: III   Type: regular    
+Size: 1073741825
+Fragment:  Address: 0    Number: 0    Size: 0
+debugfs: stat /fifo
+debugfs: stat /emptydir
+Inode: III   Type: directory    
+Size: 1024
+Fragment:  Address: 0    Number: 0    Size: 0
+debugfs: stat /dir
+Inode: III   Type: directory    
+Size: 1024
+Fragment:  Address: 0    Number: 0    Size: 0
+debugfs: stat /dir/file
+Inode: III   Type: regular    
+Size: 8
+Fragment:  Address: 0    Number: 0    Size: 0
+debugfs: ex /emptyfile
+Level Entries       Logical      Physical Length Flags
+debugfs: ex /bigfile
+Level Entries       Logical      Physical Length Flags
+X 0/0 1/1 0-31 AAA-BBB 32 
+debugfs: ex /sparsefile
+Level Entries           Logical      Physical Length Flags
+Y 0/1 1/1 1-1048576 AAA 1048576
+X 1/1 1/5 1-1 AAA-BBB 1 
+X 1/1 2/5 512-512 AAA-BBB 1 
+X 1/1 3/5 1024-1024 AAA-BBB 1 
+X 1/1 4/5 524288-524288 AAA-BBB 1 
+X 1/1 5/5 1048576-1048576 AAA-BBB 1 
+debugfs: ex /bigzerofile
+Level Entries           Logical      Physical Length Flags
+debugfs: ex /dir
+Level Entries       Logical      Physical Length Flags
+X 0/0 1/1 0-0 AAA-BBB 1 
+debugfs: ex /dir/file
+Level Entries       Logical      Physical Length Flags
+X 0/0 1/1 0-0 AAA-BBB 1 
+Pass 1: Checking inodes, blocks, and sizes
+Pass 2: Checking directory structure
+Pass 3: Checking directory connectivity
+Pass 4: Checking reference counts
+Pass 5: Checking group summary information
+test.img: 19/1024 files (0.0% non-contiguous), 1598/16384 blocks
diff --git a/tests/m_rootdir/output.sed b/tests/m_rootdir/output.sed
new file mode 100644
index 0000000..2e76967
--- /dev/null
+++ b/tests/m_rootdir/output.sed
@@ -0,0 +1,5 @@
+s/^[[:space:]]*\([0-9]*\)\/[[:space:]]*\([0-9]*\)[[:space:]]*\([0-9]*\)\/[[:space:]]*\([0-9]*\)[[:space:]]*\([0-9]*\)[[:space:]]*-[[:space:]]*\([0-9]*\)[[:space:]]*[0-9]*[[:space:]]*-[[:space:]]*[0-9]*[[:space:]]*\([0-9]*\)/X \1\/\2 \3\/\4 \5-\6 AAA-BBB \7/g
+s/^[[:space:]]*\([0-9]*\)\/[[:space:]]*\([0-9]*\)[[:space:]]*\([0-9]*\)\/[[:space:]]*\([0-9]*\)[[:space:]]*\([0-9]*\)[[:space:]]*-[[:space:]]*\([0-9]*\)[[:space:]]*[0-9]*[[:space:]]*\([0-9]*\)/Y \1\/\2 \3\/\4 \5-\6 AAA \7/g
+s/Mode:.*$//g
+s/User:.*Size:/Size:/g
+s/^Inode: [0-9]*/Inode: III/g
diff --git a/tests/m_rootdir/script b/tests/m_rootdir/script
new file mode 100644
index 0000000..fbe1b31
--- /dev/null
+++ b/tests/m_rootdir/script
@@ -0,0 +1,71 @@
+if test -x $DEBUGFS_EXE; then
+
+test_description="create fs image from dir"
+MKFS_DIR=$TMPFILE.dir
+OUT=$test_name.log
+EXP=$test_dir/expect
+
+rm -rf $MKFS_DIR
+mkdir -p $MKFS_DIR
+touch $MKFS_DIR/emptyfile
+dd if=/dev/zero bs=1024 count=32 2> /dev/null | tr '\0' 'a' > $MKFS_DIR/bigfile
+echo "M" | dd of=$MKFS_DIR/sparsefile bs=1 count=1 seek=1024 2> /dev/null
+echo "M" | dd of=$MKFS_DIR/sparsefile bs=1 count=1 seek=524288 conv=notrunc 2> /dev/null
+echo "M" | dd of=$MKFS_DIR/sparsefile bs=1 count=1 seek=1048576 conv=notrunc 2> /dev/null
+echo "M" | dd of=$MKFS_DIR/sparsefile bs=1 count=1 seek=536870912 conv=notrunc 2> /dev/null
+echo "M" | dd of=$MKFS_DIR/sparsefile bs=1 count=1 seek=1073741824 conv=notrunc 2> /dev/null
+dd if=/dev/zero of=$MKFS_DIR/bigzerofile bs=1 count=1 seek=1073741824 2> /dev/null
+ln $MKFS_DIR/bigzerofile $MKFS_DIR/bigzerofile_hardlink
+ln -s /silly_bs_link $MKFS_DIR/silly_bs_link
+mkdir $MKFS_DIR/emptydir
+mkdir $MKFS_DIR/dir
+echo "Test me" > $MKFS_DIR/dir/file
+
+$MKE2FS -q -F -o Linux -T ext4 -O metadata_csum,64bit -E lazy_itable_init=1 -b 1024 -d $MKFS_DIR $TMPFILE 16384 > $OUT 2>&1
+
+$DUMPE2FS $TMPFILE >> $OUT 2>&1
+cat > $TMPFILE.cmd << ENDL
+stat /emptyfile
+stat /bigfile
+stat /sparsefile
+stat /bigzerofile
+stat /fifo
+stat /emptydir
+stat /dir
+stat /dir/file
+ENDL
+$DEBUGFS -f $TMPFILE.cmd $TMPFILE 2>&1 | egrep "(stat|Size:|Type:)" | sed -f $test_dir/output.sed >> $OUT
+
+cat > $TMPFILE.cmd << ENDL
+ex /emptyfile
+ex /bigfile
+ex /sparsefile
+ex /bigzerofile
+ex /dir
+ex /dir/file
+ENDL
+$DEBUGFS -f $TMPFILE.cmd $TMPFILE 2>&1 | sed -f $test_dir/output.sed >> $OUT 2>&1
+
+$FSCK -f -n $TMPFILE >> $OUT 2>&1
+
+sed -f $cmd_dir/filter.sed -e "s;$TMPFILE;test.img;" < $OUT > $OUT.tmp
+mv $OUT.tmp $OUT
+
+# Do the verification
+cmp -s $OUT $EXP
+status=$?
+
+if [ "$status" = 0 ] ; then
+	echo "$test_name: $test_description: ok"
+	touch $test_name.ok
+else
+        echo "$test_name: $test_description: failed"
+        diff $DIFF_OPTS $EXP $OUT > $test_name.failed
+fi
+
+rm -rf $TMPFILE.cmd $MKFS_DIR $OUT.sed
+unset MKFS_DIR OUT EXP
+
+else #if test -x $DEBUGFS_EXE; then
+	echo "$test_name: $test_description: skipped"
+fi



* [PATCH 27/35] contrib: script to create minified ext4 image from a directory
  2015-04-02  2:34 [PATCH 00/35] e2fsprogs April 2015 patchbomb Darrick J. Wong
                   ` (25 preceding siblings ...)
  2015-04-02  2:36 ` [PATCH 26/35] mke2fs: add simple tests and re-alphabetize mke2fs manpage options Darrick J. Wong
@ 2015-04-02  2:37 ` Darrick J. Wong
  2015-05-05 14:52   ` Theodore Ts'o
  2015-04-02  2:37 ` [PATCH 28/35] libext2fs: support allocating uninit blocks in bmap2() Darrick J. Wong
                   ` (6 subsequent siblings)
  33 siblings, 1 reply; 70+ messages in thread
From: Darrick J. Wong @ 2015-04-02  2:37 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

The dir2fs script converts a directory into a minimized ext4 filesystem.
FS creation parameters are tweaked to reduce as much FS overhead as
possible, and to leave as few unused blocks and inodes as possible.
Given that mke2fs -d lays out files linearly from the beginning of the
FS, using resize2fs -M is not as horrible as it usually is.
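
For readers who want to check the sizing math by hand, here is a rough
sketch of the same two-pass estimate, restated in C.  The constants are
the ones hard-coded in the script; the struct and function names are
made up for illustration and are not part of e2fsprogs.

```c
#include <assert.h>
#include <stdint.h>

/* Constants copied from the dir2fs script. */
#define BLOCK_SZ	4096ULL
#define INODE_SZ	256ULL
#define SB_OVERHEAD	4096ULL

/* Hypothetical result type; not part of e2fsprogs. */
struct image_estimate {
	uint64_t nr_groups;	/* block groups after the second pass */
	uint64_t ratio;		/* bytes per inode, as passed to mke2fs -i */
	uint64_t mkfs_blocks;	/* initial block count handed to mke2fs */
};

/* Integer division, rounding up. */
uint64_t div_round_up(uint64_t n, uint64_t d)
{
	return (n + d - 1) / d;
}

/* bytes: space used by the directory; inodes: unique inodes in it. */
struct image_estimate estimate_image(uint64_t bytes, uint64_t inodes)
{
	struct image_estimate est;
	uint64_t blocks_per_group = BLOCK_SZ * 8;
	uint64_t bytes_per_group = blocks_per_group * BLOCK_SZ;
	uint64_t inode_bytes = inodes * INODE_SZ;
	uint64_t nr_groups, inode_blocks_per_group;
	uint64_t per_grp_overhead, overhead, used_bytes;

	assert(inodes > 0);

	/* First pass: minimum number of groups, used to guess overhead. */
	nr_groups = div_round_up(bytes + inode_bytes, bytes_per_group);
	inode_blocks_per_group = div_round_up(inode_bytes / nr_groups,
					      BLOCK_SZ);
	per_grp_overhead = (3 + inode_blocks_per_group) * BLOCK_SZ + 64;
	overhead = SB_OVERHEAD + per_grp_overhead * nr_groups;
	used_bytes = bytes + overhead;

	/* Second pass: group count once the overhead is included. */
	est.nr_groups = div_round_up(used_bytes, bytes_per_group);
	est.ratio = bytes / inodes;
	/* The script asks mke2fs for a third more blocks than estimated. */
	est.mkfs_blocks = est.nr_groups * blocks_per_group * 4 / 3;
	return est;
}
```

With a 1 MiB directory holding 100 inodes, this works out to one block
group, an inode ratio of 10485 bytes, and 43690 blocks passed to mke2fs
before resize2fs -M shrinks the image.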

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 contrib/dir2fs |   66 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 66 insertions(+)
 create mode 100755 contrib/dir2fs


diff --git a/contrib/dir2fs b/contrib/dir2fs
new file mode 100755
index 0000000..abcecb3
--- /dev/null
+++ b/contrib/dir2fs
@@ -0,0 +1,66 @@
+#!/bin/sh
+
+dir="$1"
+dev="$2"
+
+if [ "$1" = "--help" ] || [ ! -d "${dir}" ]; then
+	echo "Usage: $0 dir [mke2fs args] dev"
+	exit 1
+fi
+
+shift
+
+# Goal: Put all the files at the beginning (which mke2fs does) and minimize
+# the number of free inodes given the minimum number of blocks required.
+# Hence all this math to get the inode ratio just right.
+
+bytes="$(du -ks "${dir}" | awk '{print $1}')"
+bytes="$((bytes * 1024))"
+inodes="$(find "${dir}" -print0 | xargs -0 stat -c '%i' | sort -g | uniq | wc -l)"
+block_sz=4096
+inode_sz=256
+sb_overhead=4096
+blocks_per_group="$((block_sz * 8))"
+bytes_per_group="$((blocks_per_group * block_sz))"
+inode_bytes="$((inodes * inode_sz))"
+
+# Estimate overhead with the minimum number of groups...
+nr_groups="$(( (bytes + inode_bytes + bytes_per_group - 1) / bytes_per_group))"
+inode_bytes_per_group="$((inode_bytes / nr_groups))"
+inode_blocks_per_group="$(( (inode_bytes_per_group + (block_sz - 1)) / block_sz ))"
+per_grp_overhead="$(( ((3 + inode_blocks_per_group) * block_sz) + 64 ))"
+overhead="$(( sb_overhead + (per_grp_overhead * nr_groups) ))"
+used_bytes="$((bytes + overhead))"
+
+# Then do it again with the real number of groups.
+nr_groups="$(( (used_bytes + (bytes_per_group - 1)) / bytes_per_group))"
+tot_blocks="$((nr_groups * blocks_per_group))"
+tot_bytes="$((tot_blocks * block_sz))"
+
+ratio="$((bytes / inodes))"
+mkfs_blocks="$((tot_blocks * 4 / 3))"
+
+mke2fs -i "${ratio}" -T ext4 -d "${dir}" -O ^resize_inode,sparse_super2,metadata_csum,64bit,^has_journal -E packed_meta_blocks=1,num_backup_sb=0 -b "${block_sz}" -I "${inode_sz}" -F "${dev}" "${mkfs_blocks}" || exit
+
+e2fsck -fyD "${dev}"
+
+blocks="$(dumpe2fs -h "${dev}" 2>&1 | grep 'Block count:' | awk '{print $3}')"
+while resize2fs -f -M "${dev}"; do
+	new_blocks="$(dumpe2fs -h "${dev}" 2>&1 | grep 'Block count:' | awk '{print $3}')"
+	if [ "${new_blocks}" -eq "${blocks}" ]; then
+		break;
+	fi
+	blocks="${new_blocks}"
+done
+
+if [ ! -b "${dev}" ]; then
+    truncate -s "$((blocks * block_sz))" "${dev}" || (e2image -ar "${dev}" "${dev}.min"; mv "${dev}.min" "${dev}")
+fi
+
+e2fsck -fy "${dev}"
+
+dir_blocks="$((bytes / block_sz))"
+overhead="$((blocks - dir_blocks))"
+echo "Minimized image overhead: $((100 * overhead / dir_blocks))%"
+
+exit 0



* [PATCH 28/35] libext2fs: support allocating uninit blocks in bmap2()
  2015-04-02  2:34 [PATCH 00/35] e2fsprogs April 2015 patchbomb Darrick J. Wong
                   ` (26 preceding siblings ...)
  2015-04-02  2:37 ` [PATCH 27/35] contrib: script to create minified ext4 image from a directory Darrick J. Wong
@ 2015-04-02  2:37 ` Darrick J. Wong
  2015-04-02  2:37 ` [PATCH 29/35] libext2fs: find/alloc a range of empty blocks Darrick J. Wong
                   ` (5 subsequent siblings)
  33 siblings, 0 replies; 70+ messages in thread
From: Darrick J. Wong @ 2015-04-02  2:37 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

As part of supporting fallocate-like functionality, extend
ext2fs_bmap2() with two flags, BMAP_UNINIT and BMAP_ZERO.  The first
causes the block to be marked uninitialized if it is part of an
extent-based file.  For a block-mapped file, the mapping is put in,
but there is no way to remember the uninitialized status.  The second
flag causes the block to be zeroed, which supports emulating
uninitialized blocks on a block-mapped file by zeroing them.

Eventually fallocate or fuse2fs or somebody will use these.
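
As a rough illustration of how the two flags are consumed, the sketch
below mirrors the two decisions the patch adds: translating BMAP_UNINIT
into a flag for the extent code, and deciding whether to zero the block
once the mapping has succeeded.  The BMAP_* values are the ones from the
ext2fs.h hunk; TOY_SET_BMAP_UNINIT, BMAP_TOY_ALL and both function names
are hypothetical stand-ins, not library API.

```c
#include <assert.h>

/* Flag values as defined in the ext2fs.h hunk of this patch. */
#define BMAP_ALLOC	0x0001
#define BMAP_SET	0x0002
#define BMAP_UNINIT	0x0004
#define BMAP_ZERO	0x0008

/* Hypothetical helpers for this sketch only. */
#define BMAP_TOY_ALL	(BMAP_ALLOC | BMAP_SET | BMAP_UNINIT | BMAP_ZERO)
#define TOY_SET_BMAP_UNINIT	0x0001	/* stands in for EXT2_EXTENT_SET_BMAP_UNINIT */

/* Flags to hand to the extent mapping code for an extent-based file. */
int toy_extent_set_flags(int bmap_flags)
{
	assert((bmap_flags & ~BMAP_TOY_ALL) == 0);
	return (bmap_flags & BMAP_UNINIT) ? TOY_SET_BMAP_UNINIT : 0;
}

/*
 * Zero the block only if a block was actually mapped, the mapping
 * succeeded, and the caller asked for BMAP_ZERO.
 */
int toy_should_zero(unsigned long long phys_blk, long retval, int bmap_flags)
{
	assert((bmap_flags & ~BMAP_TOY_ALL) == 0);
	return phys_blk != 0 && retval == 0 && (bmap_flags & BMAP_ZERO) != 0;
}
```

A caller emulating uninitialized blocks on a block-mapped file would
presumably pass BMAP_ALLOC | BMAP_ZERO, since BMAP_UNINIT has no effect
there.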

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 lib/ext2fs/bmap.c   |    9 +++++++--
 lib/ext2fs/ext2fs.h |    2 ++
 2 files changed, 9 insertions(+), 2 deletions(-)


diff --git a/lib/ext2fs/bmap.c b/lib/ext2fs/bmap.c
index cb3f5a1..c18f742 100644
--- a/lib/ext2fs/bmap.c
+++ b/lib/ext2fs/bmap.c
@@ -214,10 +214,13 @@ static errcode_t extent_bmap(ext2_filsys fs, ext2_ino_t ino,
 	errcode_t		retval = 0;
 	blk64_t			blk64 = 0;
 	int			alloc = 0;
+	int			set_flags;
+
+	set_flags = bmap_flags & BMAP_UNINIT ? EXT2_EXTENT_SET_BMAP_UNINIT : 0;
 
 	if (bmap_flags & BMAP_SET) {
 		retval = ext2fs_extent_set_bmap(handle, block,
-						*phys_blk, 0);
+						*phys_blk, set_flags);
 		return retval;
 	}
 	retval = ext2fs_extent_goto(handle, block);
@@ -254,7 +257,7 @@ got_block:
 		alloc++;
 	set_extent:
 		retval = ext2fs_extent_set_bmap(handle, block,
-						blk64, 0);
+						blk64, set_flags);
 		if (retval) {
 			ext2fs_block_alloc_stats2(fs, blk64, -1);
 			return retval;
@@ -441,6 +444,8 @@ errcode_t ext2fs_bmap2(ext2_filsys fs, ext2_ino_t ino, struct ext2_inode *inode,
 	if (retval == 0)
 		*phys_blk = blk32;
 done:
+	if (*phys_blk && retval == 0 && (bmap_flags & BMAP_ZERO))
+		retval = ext2fs_zero_blocks2(fs, *phys_blk, 1, NULL, NULL);
 	if (buf)
 		ext2fs_free_mem(&buf);
 	if (handle)
diff --git a/lib/ext2fs/ext2fs.h b/lib/ext2fs/ext2fs.h
index d4f6c8e..5b3c6c1 100644
--- a/lib/ext2fs/ext2fs.h
+++ b/lib/ext2fs/ext2fs.h
@@ -527,6 +527,8 @@ typedef struct ext2_icount *ext2_icount_t;
  */
 #define BMAP_ALLOC	0x0001
 #define BMAP_SET	0x0002
+#define BMAP_UNINIT	0x0004
+#define BMAP_ZERO	0x0008
 
 /*
  * Returned flags from ext2fs_bmap



* [PATCH 29/35] libext2fs: find/alloc a range of empty blocks
  2015-04-02  2:34 [PATCH 00/35] e2fsprogs April 2015 patchbomb Darrick J. Wong
                   ` (27 preceding siblings ...)
  2015-04-02  2:37 ` [PATCH 28/35] libext2fs: support allocating uninit blocks in bmap2() Darrick J. Wong
@ 2015-04-02  2:37 ` Darrick J. Wong
  2015-04-02  2:37 ` [PATCH 30/35] libext2fs: add new hooks to support large allocations Darrick J. Wong
                   ` (4 subsequent siblings)
  33 siblings, 0 replies; 70+ messages in thread
From: Darrick J. Wong @ 2015-04-02  2:37 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

Provide a function that, given a goal pblk and a range, will try to
find a run of free blocks to satisfy the allocation.  By default the
function will look anywhere in the filesystem for the run, though this
can be constrained with optional flags.  One flag indicates that the
range must start at the goal block; the other flag indicates that we
should not return a range shorter than len.

v2: Add a second function to allocate a range of blocks.
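
To make the flag semantics concrete, here is a toy model of the search
over a plain byte array instead of a libext2fs bitmap.  It is only a
sketch: the names (toy_new_range, TOY_FIXED_GOAL, TOY_MIN_LENGTH) are
invented for illustration, it probes block by block rather than using
the bitmap find-first primitives, and it never returns more than len
blocks.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_FIXED_GOAL	0x1	/* run must start exactly at goal */
#define TOY_MIN_LENGTH	0x2	/* run must be at least len blocks long */

/*
 * used[i] != 0 means block i is allocated.  Returns 0 and fills
 * *pblk / *plen on success, or -1 if no suitable run exists.
 */
int toy_new_range(const unsigned char *used, size_t nblocks, int flags,
		  size_t goal, size_t len, size_t *pblk, size_t *plen)
{
	size_t i, start, end;

	assert(used && pblk && plen);
	if (len == 0 || nblocks == 0)
		return -1;
	if (goal >= nblocks)
		goal = 0;

	/* Visit every block once, starting at goal and wrapping around. */
	for (i = 0; i < nblocks; i++) {
		start = (goal + i) % nblocks;
		if (used[start]) {
			if (flags & TOY_FIXED_GOAL)
				return -1;
			continue;
		}

		/* Measure the free run at start, capped at len blocks. */
		end = start;
		while (end < nblocks && end - start < len && !used[end])
			end++;

		if (!(flags & TOY_MIN_LENGTH) || end - start >= len) {
			*pblk = start;
			*plen = end - start;
			return 0;
		}
		if (flags & TOY_FIXED_GOAL)
			return -1;
	}
	return -1;
}
```

Without TOY_MIN_LENGTH the first free block wins even if the run behind
it is short; with it, short runs are skipped until one of at least len
blocks turns up or the whole map has been visited.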

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 lib/ext2fs/alloc.c  |  141 +++++++++++++++++++++++++++++++++++++++++++++++++++
 lib/ext2fs/ext2fs.h |   11 ++++
 2 files changed, 152 insertions(+)


diff --git a/lib/ext2fs/alloc.c b/lib/ext2fs/alloc.c
index 9901ca5..4c3b620 100644
--- a/lib/ext2fs/alloc.c
+++ b/lib/ext2fs/alloc.c
@@ -26,6 +26,16 @@
 #include "ext2_fs.h"
 #include "ext2fs.h"
 
+#define min(a, b) ((a) < (b) ? (a) : (b))
+
+#undef DEBUG
+
+#ifdef DEBUG
+# define dbg_printf(f, a...)  do {printf(f, ## a); fflush(stdout); } while (0)
+#else
+# define dbg_printf(f, a...)
+#endif
+
 /*
  * Clear the uninit block bitmap flag if necessary
  */
@@ -346,3 +356,134 @@ no_blocks:
 		group = group & ~((1 << (log_flex)) - 1);
 	return ext2fs_group_first_block2(fs, group);
 }
+
+/*
+ * Starting at _goal_, scan around the filesystem to find a run of free blocks
+ * that's at least _len_ blocks long.  Possible flags:
+ * - EXT2_NEWRANGE_FIXED_GOAL: The range of blocks must start at _goal_.
+ * - EXT2_NEWRANGE_MIN_LENGTH: Do not return an allocation shorter than _len_.
+ * - EXT2_NEWRANGE_ZERO_BLOCKS: Zero blocks pblk to pblk+plen before returning.
+ *
+ * The starting block is returned in _pblk_ and the length is returned via
+ * _plen_.  The blocks are not marked in the bitmap; the caller must mark
+ * however much of the returned run they actually use, hopefully via
+ * ext2fs_block_alloc_stats_range().
+ *
+ * This function can return a range that is longer than what was requested.
+ */
+errcode_t ext2fs_new_range(ext2_filsys fs, int flags, blk64_t goal,
+			   blk64_t len, ext2fs_block_bitmap map, blk64_t *pblk,
+			   blk64_t *plen)
+{
+	errcode_t retval;
+	blk64_t start, end, b;
+	int looped = 0;
+	blk64_t max_blocks = ext2fs_blocks_count(fs->super);
+
+	dbg_printf("%s: flags=0x%x goal=%llu len=%llu\n", __func__, flags,
+		   goal, len);
+	EXT2_CHECK_MAGIC(fs, EXT2_ET_MAGIC_EXT2FS_FILSYS);
+	if (len == 0 || (flags & ~EXT2_NEWRANGE_ALL_FLAGS))
+		return EXT2_ET_INVALID_ARGUMENT;
+	if (!map)
+		map = fs->block_map;
+	if (!map)
+		return EXT2_ET_NO_BLOCK_BITMAP;
+	if (!goal || goal >= ext2fs_blocks_count(fs->super))
+		goal = fs->super->s_first_data_block;
+
+	start = goal;
+	while (!looped || start <= goal) {
+		retval = ext2fs_find_first_zero_block_bitmap2(map, start,
+							      max_blocks - 1,
+							      &start);
+		if (retval == ENOENT) {
+			/*
+			 * If there are no free blocks beyond the starting
+			 * point, try scanning the whole filesystem, unless the
+			 * caller told us to allocate only at _goal_ or we are
+			 * already scanning the whole filesystem.
+			 */
+			if (flags & EXT2_NEWRANGE_FIXED_GOAL ||
+			    start == fs->super->s_first_data_block)
+				goto fail;
+			start = fs->super->s_first_data_block;
+			continue;
+		} else if (retval)
+			goto errout;
+
+		if (flags & EXT2_NEWRANGE_FIXED_GOAL && start != goal)
+			goto fail;
+
+		b = min(start + len - 1, max_blocks - 1);
+		retval =  ext2fs_find_first_set_block_bitmap2(map, start, b,
+							      &end);
+		if (retval == ENOENT)
+			end = b + 1;
+		else if (retval)
+			goto errout;
+
+		if (!(flags & EXT2_NEWRANGE_MIN_LENGTH) ||
+		    (end - start) >= len) {
+			/* Success! */
+			*pblk = start;
+			*plen = end - start;
+			dbg_printf("%s: new_range goal=%llu--%llu "
+				   "blk=%llu--%llu %llu\n",
+				   __func__, goal, goal + len - 1,
+				   *pblk, *pblk + *plen - 1, *plen);
+
+			for (b = start; b < end;
+			     b += fs->super->s_blocks_per_group)
+				clear_block_uninit(fs,
+						ext2fs_group_of_blk2(fs, b));
+			return 0;
+		}
+
+		if (flags & EXT2_NEWRANGE_FIXED_GOAL)
+			goto fail;
+		start = end;
+		if (start >= max_blocks) {
+			if (looped)
+				goto fail;
+			looped = 1;
+			start = fs->super->s_first_data_block;
+		}
+	}
+
+fail:
+	retval = EXT2_ET_BLOCK_ALLOC_FAIL;
+errout:
+	return retval;
+}
+
+errcode_t ext2fs_alloc_range(ext2_filsys fs, int flags, blk64_t goal,
+			     blk_t len, blk64_t *ret)
+{
+	int newr_flags = EXT2_NEWRANGE_MIN_LENGTH;
+	errcode_t retval;
+	blk64_t plen;
+
+	EXT2_CHECK_MAGIC(fs, EXT2_ET_MAGIC_EXT2FS_FILSYS);
+	if (len == 0 || (flags & ~EXT2_ALLOCRANGE_ALL_FLAGS))
+		return EXT2_ET_INVALID_ARGUMENT;
+
+	if (flags & EXT2_ALLOCRANGE_FIXED_GOAL)
+		newr_flags |= EXT2_NEWRANGE_FIXED_GOAL;
+
+	retval = ext2fs_new_range(fs, newr_flags, goal, len, NULL, ret, &plen);
+	if (retval)
+		return retval;
+
+	if (plen < len)
+		return EXT2_ET_BLOCK_ALLOC_FAIL;
+
+	if (flags & EXT2_ALLOCRANGE_ZERO_BLOCKS) {
+		retval = ext2fs_zero_blocks2(fs, *ret, len, NULL, NULL);
+		if (retval)
+			return retval;
+	}
+
+	ext2fs_block_alloc_stats_range(fs, *ret, len, +1);
+	return retval;
+}
diff --git a/lib/ext2fs/ext2fs.h b/lib/ext2fs/ext2fs.h
index 5b3c6c1..f5306fa 100644
--- a/lib/ext2fs/ext2fs.h
+++ b/lib/ext2fs/ext2fs.h
@@ -695,6 +695,17 @@ extern void ext2fs_set_alloc_block_callback(ext2_filsys fs,
 							      blk64_t *ret));
 blk64_t ext2fs_find_inode_goal(ext2_filsys fs, ext2_ino_t ino,
 			       struct ext2_inode *inode, blk64_t lblk);
+#define EXT2_NEWRANGE_FIXED_GOAL	(0x1)
+#define EXT2_NEWRANGE_MIN_LENGTH	(0x2)
+#define EXT2_NEWRANGE_ALL_FLAGS		(0x3)
+errcode_t ext2fs_new_range(ext2_filsys fs, int flags, blk64_t goal,
+			   blk64_t len, ext2fs_block_bitmap map, blk64_t *pblk,
+			   blk64_t *plen);
+#define EXT2_ALLOCRANGE_FIXED_GOAL	(0x1)
+#define EXT2_ALLOCRANGE_ZERO_BLOCKS	(0x2)
+#define EXT2_ALLOCRANGE_ALL_FLAGS	(0x3)
+errcode_t ext2fs_alloc_range(ext2_filsys fs, int flags, blk64_t goal,
+			     blk_t len, blk64_t *ret);
 
 /* alloc_sb.c */
 extern int ext2fs_reserve_super_and_bgd(ext2_filsys fs,



* [PATCH 30/35] libext2fs: add new hooks to support large allocations
  2015-04-02  2:34 [PATCH 00/35] e2fsprogs April 2015 patchbomb Darrick J. Wong
                   ` (28 preceding siblings ...)
  2015-04-02  2:37 ` [PATCH 29/35] libext2fs: find/alloc a range of empty blocks Darrick J. Wong
@ 2015-04-02  2:37 ` Darrick J. Wong
  2015-04-02  2:37 ` [PATCH 31/35] libext2fs: implement fallocate Darrick J. Wong
                   ` (3 subsequent siblings)
  33 siblings, 0 replies; 70+ messages in thread
From: Darrick J. Wong @ 2015-04-02  2:37 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

Add a new_range hook and a block_alloc_stats_range hook so that
e2fsck can route allocation requests spanning more than one block
through its block_found_map.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 e2fsck/pass1.c           |   45 +++++++++++++++++++++++++++++++++++++++++++++
 lib/ext2fs/alloc.c       |   37 ++++++++++++++++++++++++++++++++++++-
 lib/ext2fs/alloc_stats.c |   16 ++++++++++++++++
 lib/ext2fs/ext2fs.h      |   16 ++++++++++++++++
 4 files changed, 113 insertions(+), 1 deletion(-)
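
The re-entrancy guard used when calling the new_range hook can be sketched on its own. This is a simplified model under stated assumptions, not the patch's code: `struct toy_fs`, `toy_range_hook`, `toy_new_range` and `toy_reentrant_hook` are hypothetical names, and the real function only consults the hook when no block map is passed in. The point is just the save/clear/restore dance around the callback, so that a hook which calls back into the allocator hits the default path instead of recursing forever.

```c
#include <assert.h>
#include <stddef.h>

struct toy_fs;
typedef int (*toy_range_hook)(struct toy_fs *fs, unsigned long goal,
			      unsigned long *pblk);

struct toy_fs {
	toy_range_hook new_range;	/* optional override, may be NULL */
	int hook_calls;			/* how often the hook ran */
	int default_calls;		/* how often the default path ran */
};

static int toy_new_range(struct toy_fs *fs, unsigned long goal,
			 unsigned long *pblk)
{
	assert(fs && pblk);
	if (fs->new_range) {
		/* Hide the hook while it runs so re-entry cannot loop. */
		toy_range_hook saved = fs->new_range;
		int ret;

		fs->new_range = NULL;
		ret = saved(fs, goal, pblk);
		fs->new_range = saved;
		return ret;
	}

	/* Default path: pretend the goal block itself is free. */
	fs->default_calls++;
	*pblk = goal;
	return 0;
}

/* A hook that re-enters the allocator, as a careless client might. */
static int toy_reentrant_hook(struct toy_fs *fs, unsigned long goal,
			      unsigned long *pblk)
{
	fs->hook_calls++;
	return toy_new_range(fs, goal + 100, pblk);
}
```

With `toy_reentrant_hook` installed, one call to `toy_new_range()` runs the hook once and the default path once, and the hook pointer is back in place afterwards.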


diff --git a/e2fsck/pass1.c b/e2fsck/pass1.c
index 760fbde..8c66c6d 100644
--- a/e2fsck/pass1.c
+++ b/e2fsck/pass1.c
@@ -3981,6 +3981,26 @@ static errcode_t e2fsck_get_alloc_block(ext2_filsys fs, blk64_t goal,
 	return (0);
 }
 
+static errcode_t e2fsck_new_range(ext2_filsys fs, int flags, blk64_t goal,
+				  blk64_t len, blk64_t *pblk, blk64_t *plen)
+{
+	e2fsck_t ctx = (e2fsck_t) fs->priv_data;
+	errcode_t	retval;
+
+	if (ctx->block_found_map)
+		return ext2fs_new_range(fs, flags, goal, len,
+					ctx->block_found_map, pblk, plen);
+
+	if (!fs->block_map) {
+		retval = ext2fs_read_block_bitmap(fs);
+		if (retval)
+			return retval;
+	}
+
+	return ext2fs_new_range(fs, flags, goal, len, fs->block_map,
+				pblk, plen);
+}
+
 static void e2fsck_block_alloc_stats(ext2_filsys fs, blk64_t blk, int inuse)
 {
 	e2fsck_t ctx = (e2fsck_t) fs->priv_data;
@@ -4000,6 +4020,28 @@ static void e2fsck_block_alloc_stats(ext2_filsys fs, blk64_t blk, int inuse)
 	}
 }
 
+static void e2fsck_block_alloc_stats_range(ext2_filsys fs, blk64_t blk,
+					   blk_t num, int inuse)
+{
+	e2fsck_t ctx = (e2fsck_t) fs->priv_data;
+
+	/* Never free a critical metadata block */
+	if (ctx->block_found_map &&
+	    ctx->block_metadata_map &&
+	    inuse < 0 &&
+	    ext2fs_test_block_bitmap_range2(ctx->block_metadata_map, blk, num))
+		return;
+
+	if (ctx->block_found_map) {
+		if (inuse > 0)
+			ext2fs_mark_block_bitmap_range2(ctx->block_found_map,
+							blk, num);
+		else
+			ext2fs_unmark_block_bitmap_range2(ctx->block_found_map,
+							blk, num);
+	}
+}
+
 void e2fsck_use_inode_shortcuts(e2fsck_t ctx, int use_shortcuts)
 {
 	ext2_filsys fs = ctx->fs;
@@ -4023,4 +4065,7 @@ void e2fsck_intercept_block_allocations(e2fsck_t ctx)
 	ext2fs_set_alloc_block_callback(ctx->fs, e2fsck_get_alloc_block, 0);
 	ext2fs_set_block_alloc_stats_callback(ctx->fs,
 						e2fsck_block_alloc_stats, 0);
+	ext2fs_set_new_range_callback(ctx->fs, e2fsck_new_range, NULL);
+	ext2fs_set_block_alloc_stats_range_callback(ctx->fs,
+					e2fsck_block_alloc_stats_range, NULL);
 }
diff --git a/lib/ext2fs/alloc.c b/lib/ext2fs/alloc.c
index 4c3b620..86e7f99 100644
--- a/lib/ext2fs/alloc.c
+++ b/lib/ext2fs/alloc.c
@@ -379,12 +379,32 @@ errcode_t ext2fs_new_range(ext2_filsys fs, int flags, blk64_t goal,
 	blk64_t start, end, b;
 	int looped = 0;
 	blk64_t max_blocks = ext2fs_blocks_count(fs->super);
+	errcode_t (*nrf)(ext2_filsys fs, int flags, blk64_t goal,
+			 blk64_t len, blk64_t *pblk, blk64_t *plen);
 
 	dbg_printf("%s: flags=0x%x goal=%llu len=%llu\n", __func__, flags,
 		   goal, len);
 	EXT2_CHECK_MAGIC(fs, EXT2_ET_MAGIC_EXT2FS_FILSYS);
 	if (len == 0 || (flags & ~EXT2_NEWRANGE_ALL_FLAGS))
 		return EXT2_ET_INVALID_ARGUMENT;
+
+	if (!map && fs->new_range) {
+		/*
+		 * In case there are clients out there whose new_range
+		 * handlers call ext2fs_new_range with a NULL block map,
+		 * temporarily swap out the function pointer so that we don't
+		 * end up in an infinite loop.
+		 */
+		nrf = fs->new_range;
+		fs->new_range = NULL;
+		retval = nrf(fs, flags, goal, len, pblk, plen);
+		fs->new_range = nrf;
+		if (retval)
+			return retval;
+		start = *pblk;
+		end = *pblk + *plen;
+		goto allocated;
+	}
 	if (!map)
 		map = fs->block_map;
 	if (!map)
@@ -432,7 +452,7 @@ errcode_t ext2fs_new_range(ext2_filsys fs, int flags, blk64_t goal,
 				   "blk=%llu--%llu %llu\n",
 				   __func__, goal, goal + len - 1,
 				   *pblk, *pblk + *plen - 1, *plen);
-
+allocated:
 			for (b = start; b < end;
 			     b += fs->super->s_blocks_per_group)
 				clear_block_uninit(fs,
@@ -457,6 +477,21 @@ errout:
 	return retval;
 }
 
+void ext2fs_set_new_range_callback(ext2_filsys fs,
+	errcode_t (*func)(ext2_filsys fs, int flags, blk64_t goal,
+			       blk64_t len, blk64_t *pblk, blk64_t *plen),
+	errcode_t (**old)(ext2_filsys fs, int flags, blk64_t goal,
+			       blk64_t len, blk64_t *pblk, blk64_t *plen))
+{
+	if (!fs || fs->magic != EXT2_ET_MAGIC_EXT2FS_FILSYS)
+		return;
+
+	if (old)
+		*old = fs->new_range;
+
+	fs->new_range = func;
+}
+
 errcode_t ext2fs_alloc_range(ext2_filsys fs, int flags, blk64_t goal,
 			     blk_t len, blk64_t *ret)
 {
diff --git a/lib/ext2fs/alloc_stats.c b/lib/ext2fs/alloc_stats.c
index aca5004..3949f61 100644
--- a/lib/ext2fs/alloc_stats.c
+++ b/lib/ext2fs/alloc_stats.c
@@ -145,4 +145,20 @@ void ext2fs_block_alloc_stats_range(ext2_filsys fs, blk64_t blk,
 	}
 	ext2fs_mark_super_dirty(fs);
 	ext2fs_mark_bb_dirty(fs);
+	if (fs->block_alloc_stats_range)
+		(fs->block_alloc_stats_range)(fs, blk, num, inuse);
+}
+
+void ext2fs_set_block_alloc_stats_range_callback(ext2_filsys fs,
+	void (*func)(ext2_filsys fs, blk64_t blk,
+				    blk_t num, int inuse),
+	void (**old)(ext2_filsys fs, blk64_t blk,
+				    blk_t num, int inuse))
+{
+	if (!fs || fs->magic != EXT2_ET_MAGIC_EXT2FS_FILSYS)
+		return;
+	if (old)
+		*old = fs->block_alloc_stats_range;
+
+	fs->block_alloc_stats_range = func;
 }
diff --git a/lib/ext2fs/ext2fs.h b/lib/ext2fs/ext2fs.h
index f5306fa..4ffc9ea 100644
--- a/lib/ext2fs/ext2fs.h
+++ b/lib/ext2fs/ext2fs.h
@@ -279,6 +279,12 @@ struct struct_ext2_filsys {
 
 	io_channel			journal_io;
 	char				*journal_name;
+
+	/* New block range allocation hooks */
+	errcode_t (*new_range)(ext2_filsys fs, int flags, blk64_t goal,
+			       blk64_t len, blk64_t *pblk, blk64_t *plen);
+	void (*block_alloc_stats_range)(ext2_filsys fs, blk64_t blk, blk_t num,
+					int inuse);
 };
 
 #if EXT2_FLAT_INCLUDES
@@ -695,6 +701,16 @@ extern void ext2fs_set_alloc_block_callback(ext2_filsys fs,
 							      blk64_t *ret));
 blk64_t ext2fs_find_inode_goal(ext2_filsys fs, ext2_ino_t ino,
 			       struct ext2_inode *inode, blk64_t lblk);
+extern void ext2fs_set_new_range_callback(ext2_filsys fs,
+	errcode_t (*func)(ext2_filsys fs, int flags, blk64_t goal,
+			       blk64_t len, blk64_t *pblk, blk64_t *plen),
+	errcode_t (**old)(ext2_filsys fs, int flags, blk64_t goal,
+			       blk64_t len, blk64_t *pblk, blk64_t *plen));
+extern void ext2fs_set_block_alloc_stats_range_callback(ext2_filsys fs,
+	void (*func)(ext2_filsys fs, blk64_t blk,
+				    blk_t num, int inuse),
+	void (**old)(ext2_filsys fs, blk64_t blk,
+				    blk_t num, int inuse));
 #define EXT2_NEWRANGE_FIXED_GOAL	(0x1)
 #define EXT2_NEWRANGE_MIN_LENGTH	(0x2)
 #define EXT2_NEWRANGE_ALL_FLAGS		(0x3)



* [PATCH 31/35] libext2fs: implement fallocate
  2015-04-02  2:34 [PATCH 00/35] e2fsprogs April 2015 patchbomb Darrick J. Wong
                   ` (29 preceding siblings ...)
  2015-04-02  2:37 ` [PATCH 30/35] libext2fs: add new hooks to support large allocations Darrick J. Wong
@ 2015-04-02  2:37 ` Darrick J. Wong
  2015-04-02  2:37 ` [PATCH 32/35] libext2fs: use fallocate for creating journals and hugefiles Darrick J. Wong
                   ` (2 subsequent siblings)
  33 siblings, 0 replies; 70+ messages in thread
From: Darrick J. Wong @ 2015-04-02  2:37 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

Create a library function to perform fallocation on arbitrary files,
and wire up a few users for this function.  This is somewhat more
involved than Ted's original mk_hugefiles implementation because we
have to honor any blocks that may already be allocated to the file.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 lib/ext2fs/Makefile.in |    8 
 lib/ext2fs/ext2fs.h    |   10 +
 lib/ext2fs/fallocate.c |  853 ++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 871 insertions(+)
 create mode 100644 lib/ext2fs/fallocate.c
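
Much of the bigalloc handling in this patch comes down to three small pieces of cluster arithmetic, sketched here in standalone form. The helper names (`toy_cluster_fill`, `toy_align_goal`, `toy_blocks_to_clusters`) are hypothetical and `ratio` stands in for the blocks-per-cluster value, assumed to be a power of two; the expressions themselves mirror the ones used in the patch.

```c
#include <assert.h>

/*
 * Blocks from lblk up to the next cluster boundary, or 0 if lblk is
 * already aligned.  ratio is blocks per cluster (a power of two).
 */
static unsigned long long toy_cluster_fill(unsigned long long lblk,
					   unsigned int ratio)
{
	unsigned long long mask = ratio - 1;

	assert(ratio && !(ratio & (ratio - 1)));
	return (ratio - (lblk & mask)) & mask;
}

/* Move goal to the same offset within its cluster that lblk has. */
static unsigned long long toy_align_goal(unsigned long long goal,
					 unsigned long long lblk,
					 unsigned int ratio)
{
	unsigned long long mask = ratio - 1;

	assert(ratio && !(ratio & (ratio - 1)));
	return (goal & ~mask) | (lblk & mask);
}

/* Round a block count up to whole clusters. */
static unsigned long long toy_blocks_to_clusters(unsigned long long len,
						 unsigned int ratio)
{
	assert(ratio);
	return (len + ratio - 1) / ratio;
}
```

For example, with 16 blocks per cluster, logical block 35 sits 3 blocks into its cluster, so 13 more blocks reach the next boundary, and a goal of 1000 is shifted to 995 so that the physical and logical offsets within the cluster agree.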


diff --git a/lib/ext2fs/Makefile.in b/lib/ext2fs/Makefile.in
index e717ae0..2077dab 100644
--- a/lib/ext2fs/Makefile.in
+++ b/lib/ext2fs/Makefile.in
@@ -81,6 +81,7 @@ OBJS= $(DEBUGFS_LIB_OBJS) $(RESIZE_LIB_OBJS) $(E2IMAGE_LIB_OBJS) \
 	expanddir.o \
 	ext_attr.o \
 	extent.o \
+	fallocate.o \
 	fileio.o \
 	finddev.o \
 	flushb.o \
@@ -801,6 +802,13 @@ extent.o: $(srcdir)/extent.c $(top_builddir)/lib/config.h \
  $(top_srcdir)/lib/et/com_err.h $(srcdir)/ext2_io.h \
  $(top_builddir)/lib/ext2fs/ext2_err.h $(srcdir)/ext2_ext_attr.h \
  $(srcdir)/bitops.h $(srcdir)/e2image.h
+fallocate.o: $(srcdir)/fallocate.c $(top_builddir)/lib/config.h \
+ $(top_builddir)/lib/dirpaths.h $(srcdir)/ext2_fs.h \
+ $(top_builddir)/lib/ext2fs/ext2_types.h $(srcdir)/ext2fsP.h \
+ $(srcdir)/ext2fs.h $(srcdir)/ext2_fs.h $(srcdir)/ext3_extents.h \
+ $(top_srcdir)/lib/et/com_err.h $(srcdir)/ext2_io.h \
+ $(top_builddir)/lib/ext2fs/ext2_err.h $(srcdir)/ext2_ext_attr.h \
+ $(srcdir)/bitops.h $(srcdir)/e2image.h
 fileio.o: $(srcdir)/fileio.c $(top_builddir)/lib/config.h \
  $(top_builddir)/lib/dirpaths.h $(srcdir)/ext2_fs.h \
  $(top_builddir)/lib/ext2fs/ext2_types.h $(srcdir)/ext2fs.h \
diff --git a/lib/ext2fs/ext2fs.h b/lib/ext2fs/ext2fs.h
index 4ffc9ea..4545e8a 100644
--- a/lib/ext2fs/ext2fs.h
+++ b/lib/ext2fs/ext2fs.h
@@ -1261,6 +1261,16 @@ extern errcode_t ext2fs_extent_goto2(ext2_extent_handle_t handle,
 extern errcode_t ext2fs_extent_fix_parents(ext2_extent_handle_t handle);
 size_t ext2fs_max_extent_depth(ext2_extent_handle_t handle);
 
+/* fallocate.c */
+#define EXT2_FALLOCATE_ZERO_BLOCKS	(0x1)
+#define EXT2_FALLOCATE_FORCE_INIT	(0x2)
+#define EXT2_FALLOCATE_FORCE_UNINIT	(0x4)
+#define EXT2_FALLOCATE_INIT_BEYOND_EOF	(0x8)
+#define EXT2_FALLOCATE_ALL_FLAGS	(0xF)
+errcode_t ext2fs_fallocate(ext2_filsys fs, int flags, ext2_ino_t ino,
+			   struct ext2_inode *inode, blk64_t goal,
+			   blk64_t start, blk64_t len);
+
 /* fileio.c */
 extern errcode_t ext2fs_file_open2(ext2_filsys fs, ext2_ino_t ino,
 				   struct ext2_inode *inode,
diff --git a/lib/ext2fs/fallocate.c b/lib/ext2fs/fallocate.c
new file mode 100644
index 0000000..8b502d5
--- /dev/null
+++ b/lib/ext2fs/fallocate.c
@@ -0,0 +1,853 @@
+/*
+ * fallocate.c -- Allocate large chunks of file.
+ *
+ * Copyright (C) 2014 Oracle.
+ *
+ * %Begin-Header%
+ * This file may be redistributed under the terms of the GNU Library
+ * General Public License, version 2.
+ * %End-Header%
+ */
+
+#include "config.h"
+
+#include "ext2_fs.h"
+#include "ext2fs.h"
+#define min(a, b) ((a) < (b) ? (a) : (b))
+
+#undef DEBUG
+
+#ifdef DEBUG
+# define dbg_printf(f, a...)  do {printf(f, ## a); fflush(stdout); } while (0)
+#else
+# define dbg_printf(f, a...)
+#endif
+
+/*
+ * Extent-based fallocate code.
+ *
+ * Find runs of unmapped logical blocks by starting at start and walking the
+ * extents until we reach the end of the range we want.
+ *
+ * For each run of unmapped blocks, try to find the extents on either side of
+ * the range.  If there's a left extent that can grow by at least a cluster and
+ * there are lblocks between start and the next lcluster after start, see if
+ * there's an implied cluster allocation; if so, zero the blocks (if the left
+ * extent is initialized) and adjust the extent.  Ditto for the blocks between
+ * the end of the last full lcluster and end, if there's a right extent.
+ *
+ * Try to attach as much as we can to the left extent, then try to attach as
+ * much as we can to the right extent.  For the remainder, try to allocate the
+ * whole range; map in whatever we get; and repeat until we're done.
+ *
+ * To attach to a left extent, figure out the maximum amount we can add to the
+ * extent and try to allocate that much, and append if successful.  To attach
+ * to a right extent, figure out the max we can add to the extent, try to
+ * allocate that much, and prepend if successful.
+ *
+ * We need an alloc_range function that tells us how much we can allocate given
+ * a maximum length and one of a suggested start, a fixed start, or a fixed end
+ * point.
+ *
+ * Every time we modify the extent tree we also need to update the block stats.
+ *
+ * At the end, update i_blocks and i_size appropriately.
+ */
+
+static void dbg_print_extent(char *desc, struct ext2fs_extent *extent)
+{
+#ifdef DEBUG
+	if (desc)
+		printf("%s: ", desc);
+	printf("extent: lblk %llu--%llu, len %u, pblk %llu, flags: ",
+	       extent->e_lblk, extent->e_lblk + extent->e_len - 1,
+	       extent->e_len, extent->e_pblk);
+	if (extent->e_flags & EXT2_EXTENT_FLAGS_LEAF)
+		fputs("LEAF ", stdout);
+	if (extent->e_flags & EXT2_EXTENT_FLAGS_UNINIT)
+		fputs("UNINIT ", stdout);
+	if (extent->e_flags & EXT2_EXTENT_FLAGS_SECOND_VISIT)
+		fputs("2ND_VISIT ", stdout);
+	if (!extent->e_flags)
+		fputs("(none)", stdout);
+	fputc('\n', stdout);
+	fflush(stdout);
+#endif
+}
+
+static errcode_t claim_range(ext2_filsys fs, struct ext2_inode *inode,
+			     blk64_t blk, blk64_t len)
+{
+	blk64_t	clusters;
+
+	clusters = (len + EXT2FS_CLUSTER_RATIO(fs) - 1) /
+		   EXT2FS_CLUSTER_RATIO(fs);
+	ext2fs_block_alloc_stats_range(fs, blk,
+			clusters * EXT2FS_CLUSTER_RATIO(fs), +1);
+	return ext2fs_iblk_add_blocks(fs, inode, clusters);
+}
+
+static errcode_t ext_falloc_helper(ext2_filsys fs,
+				   int flags,
+				   ext2_ino_t ino,
+				   struct ext2_inode *inode,
+				   ext2_extent_handle_t handle,
+				   struct ext2fs_extent *left_ext,
+				   struct ext2fs_extent *right_ext,
+				   blk64_t range_start, blk64_t range_len,
+				   blk64_t alloc_goal)
+{
+	struct ext2fs_extent	newex, ex;
+	int			op;
+	blk64_t			fillable, pblk, plen, x, y;
+	blk64_t			eof_blk = 0, cluster_fill = 0;
+	errcode_t		err;
+	blk_t			max_extent_len, max_uninit_len, max_init_len;
+
+#ifdef DEBUG
+	printf("%s: ", __func__);
+	if (left_ext)
+		printf("left_ext=%llu--%llu, ", left_ext->e_lblk,
+		       left_ext->e_lblk + left_ext->e_len - 1);
+	if (right_ext)
+		printf("right_ext=%llu--%llu, ", right_ext->e_lblk,
+		       right_ext->e_lblk + right_ext->e_len - 1);
+	printf("start=%llu len=%llu, goal=%llu\n", range_start, range_len,
+	       alloc_goal);
+	fflush(stdout);
+#endif
+	/* Can't create initialized extents past EOF? */
+	if (!(flags & EXT2_FALLOCATE_INIT_BEYOND_EOF))
+		eof_blk = EXT2_I_SIZE(inode) / fs->blocksize;
+
+	/* The allocation goal must be as far into a cluster as range_start. */
+	alloc_goal = (alloc_goal & ~EXT2FS_CLUSTER_MASK(fs)) |
+		     (range_start & EXT2FS_CLUSTER_MASK(fs));
+
+	max_uninit_len = EXT_UNINIT_MAX_LEN & ~EXT2FS_CLUSTER_MASK(fs);
+	max_init_len = EXT_INIT_MAX_LEN & ~EXT2FS_CLUSTER_MASK(fs);
+
+	/* We must lengthen the left extent to the end of the cluster */
+	if (left_ext && EXT2FS_CLUSTER_RATIO(fs) > 1) {
+		/* How many more blocks can be attached to left_ext? */
+		if (left_ext->e_flags & EXT2_EXTENT_FLAGS_UNINIT)
+			fillable = max_uninit_len - left_ext->e_len;
+		else
+			fillable = max_init_len - left_ext->e_len;
+
+		if (fillable > range_len)
+			fillable = range_len;
+		if (fillable == 0)
+			goto expand_right;
+
+		/*
+		 * If range_start isn't on a cluster boundary, try an
+		 * implied cluster allocation for left_ext.
+		 */
+		cluster_fill = EXT2FS_CLUSTER_RATIO(fs) -
+			       (range_start & EXT2FS_CLUSTER_MASK(fs));
+		cluster_fill &= EXT2FS_CLUSTER_MASK(fs);
+		if (cluster_fill == 0)
+			goto expand_right;
+
+		if (cluster_fill > fillable)
+			cluster_fill = fillable;
+
+		/* Don't expand an initialized left_ext beyond EOF */
+		if (!(flags & EXT2_FALLOCATE_INIT_BEYOND_EOF)) {
+			x = left_ext->e_lblk + left_ext->e_len - 1;
+			dbg_printf("%s: lend=%llu newlend=%llu eofblk=%llu\n",
+				   __func__, x, x + cluster_fill, eof_blk);
+			if (eof_blk >= x && eof_blk <= x + cluster_fill)
+				cluster_fill = eof_blk - x;
+			if (cluster_fill == 0)
+				goto expand_right;
+		}
+
+		err = ext2fs_extent_goto(handle, left_ext->e_lblk);
+		if (err)
+			goto expand_right;
+		left_ext->e_len += cluster_fill;
+		range_start += cluster_fill;
+		range_len -= cluster_fill;
+		alloc_goal += cluster_fill;
+
+		dbg_print_extent("ext_falloc clus left+", left_ext);
+		err = ext2fs_extent_replace(handle, 0, left_ext);
+		if (err)
+			goto out;
+		err = ext2fs_extent_fix_parents(handle);
+		if (err)
+			goto out;
+
+		/* Zero blocks */
+		if (!(left_ext->e_flags & EXT2_EXTENT_FLAGS_UNINIT)) {
+			err = ext2fs_zero_blocks2(fs, left_ext->e_pblk +
+						  left_ext->e_len -
+						  cluster_fill, cluster_fill,
+						  NULL, NULL);
+			if (err)
+				goto out;
+		}
+	}
+
+expand_right:
+	/* We must lengthen the right extent to the beginning of the cluster */
+	if (right_ext && EXT2FS_CLUSTER_RATIO(fs) > 1) {
+		/* How much can we attach to right_ext? */
+		if (right_ext->e_flags & EXT2_EXTENT_FLAGS_UNINIT)
+			fillable = max_uninit_len - right_ext->e_len;
+		else
+			fillable = max_init_len - right_ext->e_len;
+
+		if (fillable > range_len)
+			fillable = range_len;
+		if (fillable == 0)
+			goto try_merge;
+
+		/*
+		 * If range_end isn't on a cluster boundary, try an implied
+		 * cluster allocation for right_ext.
+		 */
+		cluster_fill = right_ext->e_lblk & EXT2FS_CLUSTER_MASK(fs);
+		if (cluster_fill == 0)
+			goto try_merge;
+
+		err = ext2fs_extent_goto(handle, right_ext->e_lblk);
+		if (err)
+			goto out;
+
+		if (cluster_fill > fillable)
+			cluster_fill = fillable;
+		right_ext->e_lblk -= cluster_fill;
+		right_ext->e_pblk -= cluster_fill;
+		right_ext->e_len += cluster_fill;
+		range_len -= cluster_fill;
+
+		dbg_print_extent("ext_falloc clus right+", right_ext);
+		err = ext2fs_extent_replace(handle, 0, right_ext);
+		if (err)
+			goto out;
+		err = ext2fs_extent_fix_parents(handle);
+		if (err)
+			goto out;
+
+		/* Zero blocks if necessary */
+		if (!(right_ext->e_flags & EXT2_EXTENT_FLAGS_UNINIT)) {
+			err = ext2fs_zero_blocks2(fs, right_ext->e_pblk,
+						  cluster_fill, NULL, NULL);
+			if (err)
+				goto out;
+		}
+	}
+
+try_merge:
+	/* Merge both extents together, perhaps? */
+	if (left_ext && right_ext) {
+		/* Are the two extents mergeable? */
+		if ((left_ext->e_flags & EXT2_EXTENT_FLAGS_UNINIT) !=
+		    (right_ext->e_flags & EXT2_EXTENT_FLAGS_UNINIT))
+			goto try_left;
+
+		/* User requires init/uninit but extent is uninit/init. */
+		if (((flags & EXT2_FALLOCATE_FORCE_INIT) &&
+		     (left_ext->e_flags & EXT2_EXTENT_FLAGS_UNINIT)) ||
+		    ((flags & EXT2_FALLOCATE_FORCE_UNINIT) &&
+		     !(left_ext->e_flags & EXT2_EXTENT_FLAGS_UNINIT)))
+			goto try_left;
+
+		/*
+		 * Skip initialized extent unless user wants to zero blocks
+		 * or requires init extent.
+		 */
+		if (!(left_ext->e_flags & EXT2_EXTENT_FLAGS_UNINIT) &&
+		    (!(flags & EXT2_FALLOCATE_ZERO_BLOCKS) ||
+		     !(flags & EXT2_FALLOCATE_FORCE_INIT)))
+			goto try_left;
+
+		/* Will it even fit? */
+		x = left_ext->e_len + range_len + right_ext->e_len;
+		if (x > (left_ext->e_flags & EXT2_EXTENT_FLAGS_UNINIT ?
+				max_uninit_len : max_init_len))
+			goto try_left;
+
+		err = ext2fs_extent_goto(handle, left_ext->e_lblk);
+		if (err)
+			goto try_left;
+
+		/* Allocate blocks */
+		y = left_ext->e_pblk + left_ext->e_len;
+		err = ext2fs_new_range(fs, EXT2_NEWRANGE_FIXED_GOAL |
+				       EXT2_NEWRANGE_MIN_LENGTH, y,
+				       right_ext->e_pblk - y + 1, NULL,
+				       &pblk, &plen);
+		if (err)
+			goto try_left;
+		if (pblk + plen != right_ext->e_pblk)
+			goto try_left;
+		err = claim_range(fs, inode, pblk, plen);
+		if (err)
+			goto out;
+
+		/* Modify extents */
+		left_ext->e_len = x;
+		dbg_print_extent("ext_falloc merge", left_ext);
+		err = ext2fs_extent_replace(handle, 0, left_ext);
+		if (err)
+			goto out;
+		err = ext2fs_extent_fix_parents(handle);
+		if (err)
+			goto out;
+		err = ext2fs_extent_get(handle, EXT2_EXTENT_NEXT_LEAF, &newex);
+		if (err)
+			goto out;
+		err = ext2fs_extent_delete(handle, 0);
+		if (err)
+			goto out;
+		err = ext2fs_extent_fix_parents(handle);
+		if (err)
+			goto out;
+		*right_ext = *left_ext;
+
+		/* Zero blocks */
+		if (!(left_ext->e_flags & EXT2_EXTENT_FLAGS_UNINIT) &&
+		    (flags & EXT2_FALLOCATE_ZERO_BLOCKS)) {
+			err = ext2fs_zero_blocks2(fs, range_start, range_len,
+						  NULL, NULL);
+			if (err)
+				goto out;
+		}
+
+		return 0;
+	}
+
+try_left:
+	/* Extend the left extent */
+	if (left_ext) {
+		/* How many more blocks can be attached to left_ext? */
+		if (left_ext->e_flags & EXT2_EXTENT_FLAGS_UNINIT)
+			fillable = max_uninit_len - left_ext->e_len;
+		else if (flags & EXT2_FALLOCATE_ZERO_BLOCKS)
+			fillable = max_init_len - left_ext->e_len;
+		else
+			fillable = 0;
+
+		/* User requires init/uninit but extent is uninit/init. */
+		if (((flags & EXT2_FALLOCATE_FORCE_INIT) &&
+		     (left_ext->e_flags & EXT2_EXTENT_FLAGS_UNINIT)) ||
+		    ((flags & EXT2_FALLOCATE_FORCE_UNINIT) &&
+		     !(left_ext->e_flags & EXT2_EXTENT_FLAGS_UNINIT)))
+			goto try_right;
+
+		if (fillable > range_len)
+			fillable = range_len;
+
+		/* Don't expand an initialized left_ext beyond EOF */
+		x = left_ext->e_lblk + left_ext->e_len - 1;
+		if (!(flags & EXT2_FALLOCATE_INIT_BEYOND_EOF)) {
+			dbg_printf("%s: lend=%llu newlend=%llu eofblk=%llu\n",
+				   __func__, x, x + fillable, eof_blk);
+			if (eof_blk >= x && eof_blk <= x + fillable)
+				fillable = eof_blk - x;
+		}
+
+		if (fillable == 0)
+			goto try_right;
+
+		/* Test if the right edge of the range is already mapped? */
+		if (EXT2FS_CLUSTER_RATIO(fs) > 1) {
+			err = ext2fs_map_cluster_block(fs, ino, inode,
+					x + fillable, &pblk);
+			if (err)
+				goto out;
+			if (pblk)
+				fillable -= 1 + ((x + fillable)
+						 & EXT2FS_CLUSTER_MASK(fs));
+			if (fillable == 0)
+				goto try_right;
+		}
+
+		/* Allocate range of blocks */
+		x = left_ext->e_pblk + left_ext->e_len;
+		err = ext2fs_new_range(fs, EXT2_NEWRANGE_FIXED_GOAL |
+				EXT2_NEWRANGE_MIN_LENGTH,
+				x, fillable, NULL, &pblk, &plen);
+		if (err)
+			goto try_right;
+		err = claim_range(fs, inode, pblk, plen);
+		if (err)
+			goto out;
+
+		/* Modify left_ext */
+		err = ext2fs_extent_goto(handle, left_ext->e_lblk);
+		if (err)
+			goto out;
+		range_start += plen;
+		range_len -= plen;
+		left_ext->e_len += plen;
+		dbg_print_extent("ext_falloc left+", left_ext);
+		err = ext2fs_extent_replace(handle, 0, left_ext);
+		if (err)
+			goto out;
+		err = ext2fs_extent_fix_parents(handle);
+		if (err)
+			goto out;
+
+		/* Zero blocks if necessary */
+		if (!(left_ext->e_flags & EXT2_EXTENT_FLAGS_UNINIT) &&
+		    (flags & EXT2_FALLOCATE_ZERO_BLOCKS)) {
+			err = ext2fs_zero_blocks2(fs, pblk, plen, NULL, NULL);
+			if (err)
+				goto out;
+		}
+	}
+
+try_right:
+	/* Extend the right extent */
+	if (right_ext) {
+		/* How much can we attach to right_ext? */
+		if (right_ext->e_flags & EXT2_EXTENT_FLAGS_UNINIT)
+			fillable = max_uninit_len - right_ext->e_len;
+		else if (flags & EXT2_FALLOCATE_ZERO_BLOCKS)
+			fillable = max_init_len - right_ext->e_len;
+		else
+			fillable = 0;
+
+		/* User requires init/uninit but extent is uninit/init. */
+		if (((flags & EXT2_FALLOCATE_FORCE_INIT) &&
+		     (right_ext->e_flags & EXT2_EXTENT_FLAGS_UNINIT)) ||
+		    ((flags & EXT2_FALLOCATE_FORCE_UNINIT) &&
+		     !(right_ext->e_flags & EXT2_EXTENT_FLAGS_UNINIT)))
+			goto try_anywhere;
+
+		if (fillable > range_len)
+			fillable = range_len;
+		if (fillable == 0)
+			goto try_anywhere;
+
+		/* Test if the left edge of the range is already mapped? */
+		if (EXT2FS_CLUSTER_RATIO(fs) > 1) {
+			err = ext2fs_map_cluster_block(fs, ino, inode,
+					right_ext->e_lblk - fillable, &pblk);
+			if (err)
+				goto out;
+			if (pblk)
+				fillable -= EXT2FS_CLUSTER_RATIO(fs) -
+						((right_ext->e_lblk - fillable)
+						 & EXT2FS_CLUSTER_MASK(fs));
+			if (fillable == 0)
+				goto try_anywhere;
+		}
+
+		/*
+		 * FIXME: It would be nice if we could handle allocating a
+		 * variable range from a fixed end point instead of just
+		 * skipping to the general allocator if the whole range is
+		 * unavailable.
+		 */
+		err = ext2fs_new_range(fs, EXT2_NEWRANGE_FIXED_GOAL |
+				EXT2_NEWRANGE_MIN_LENGTH,
+				right_ext->e_pblk - fillable,
+				fillable, NULL, &pblk, &plen);
+		if (err)
+			goto try_anywhere;
+		err = claim_range(fs, inode,
+			      pblk & ~EXT2FS_CLUSTER_MASK(fs),
+			      plen + (pblk & EXT2FS_CLUSTER_MASK(fs)));
+		if (err)
+			goto out;
+
+		/* Modify right_ext */
+		err = ext2fs_extent_goto(handle, right_ext->e_lblk);
+		if (err)
+			goto out;
+		range_len -= plen;
+		right_ext->e_lblk -= plen;
+		right_ext->e_pblk -= plen;
+		right_ext->e_len += plen;
+		dbg_print_extent("ext_falloc right+", right_ext);
+		err = ext2fs_extent_replace(handle, 0, right_ext);
+		if (err)
+			goto out;
+		err = ext2fs_extent_fix_parents(handle);
+		if (err)
+			goto out;
+
+		/* Zero blocks if necessary */
+		if (!(right_ext->e_flags & EXT2_EXTENT_FLAGS_UNINIT) &&
+		    (flags & EXT2_FALLOCATE_ZERO_BLOCKS)) {
+			err = ext2fs_zero_blocks2(fs, pblk,
+					plen + cluster_fill, NULL, NULL);
+			if (err)
+				goto out;
+		}
+	}
+
+try_anywhere:
+	/* Try implied cluster alloc on the left and right ends */
+	if (range_len > 0 && (range_start & EXT2FS_CLUSTER_MASK(fs))) {
+		cluster_fill = EXT2FS_CLUSTER_RATIO(fs) -
+			       (range_start & EXT2FS_CLUSTER_MASK(fs));
+		cluster_fill &= EXT2FS_CLUSTER_MASK(fs);
+		if (cluster_fill > range_len)
+			cluster_fill = range_len;
+		newex.e_lblk = range_start;
+		err = ext2fs_map_cluster_block(fs, ino, inode, newex.e_lblk,
+					       &pblk);
+		if (err)
+			goto out;
+		if (pblk == 0)
+			goto try_right_implied;
+		newex.e_pblk = pblk;
+		newex.e_len = cluster_fill;
+		newex.e_flags = (flags & EXT2_FALLOCATE_FORCE_INIT ? 0 :
+				 EXT2_EXTENT_FLAGS_UNINIT);
+		dbg_print_extent("ext_falloc iclus left+", &newex);
+		ext2fs_extent_goto(handle, newex.e_lblk);
+		err = ext2fs_extent_get(handle, EXT2_EXTENT_CURRENT,
+					&ex);
+		if (err == EXT2_ET_NO_CURRENT_NODE)
+			ex.e_lblk = 0;
+		else if (err)
+			goto out;
+
+		if (ex.e_lblk > newex.e_lblk)
+			op = 0; /* insert before */
+		else
+			op = EXT2_EXTENT_INSERT_AFTER;
+		dbg_printf("%s: inserting %s lblk %llu newex=%llu\n",
+			   __func__, op ? "after" : "before", ex.e_lblk,
+			   newex.e_lblk);
+		err = ext2fs_extent_insert(handle, op, &newex);
+		if (err)
+			goto out;
+		err = ext2fs_extent_fix_parents(handle);
+		if (err)
+			goto out;
+
+		if (!(newex.e_flags & EXT2_EXTENT_FLAGS_UNINIT) &&
+		    (flags & EXT2_FALLOCATE_ZERO_BLOCKS)) {
+			err = ext2fs_zero_blocks2(fs, newex.e_pblk,
+						  newex.e_len, NULL, NULL);
+			if (err)
+				goto out;
+		}
+
+		range_start += cluster_fill;
+		range_len -= cluster_fill;
+	}
+
+try_right_implied:
+	y = range_start + range_len;
+	if (range_len > 0 && (y & EXT2FS_CLUSTER_MASK(fs))) {
+		cluster_fill = y & EXT2FS_CLUSTER_MASK(fs);
+		if (cluster_fill > range_len)
+			cluster_fill = range_len;
+		newex.e_lblk = y & ~EXT2FS_CLUSTER_MASK(fs);
+		err = ext2fs_map_cluster_block(fs, ino, inode, newex.e_lblk,
+					       &pblk);
+		if (err)
+			goto out;
+		if (pblk == 0)
+			goto no_implied;
+		newex.e_pblk = pblk;
+		newex.e_len = cluster_fill;
+		newex.e_flags = (flags & EXT2_FALLOCATE_FORCE_INIT ? 0 :
+				 EXT2_EXTENT_FLAGS_UNINIT);
+		dbg_print_extent("ext_falloc iclus right+", &newex);
+		ext2fs_extent_goto(handle, newex.e_lblk);
+		err = ext2fs_extent_get(handle, EXT2_EXTENT_CURRENT,
+					&ex);
+		if (err == EXT2_ET_NO_CURRENT_NODE)
+			ex.e_lblk = 0;
+		else if (err)
+			goto out;
+
+		if (ex.e_lblk > newex.e_lblk)
+			op = 0; /* insert before */
+		else
+			op = EXT2_EXTENT_INSERT_AFTER;
+		dbg_printf("%s: inserting %s lblk %llu newex=%llu\n",
+			   __func__, op ? "after" : "before", ex.e_lblk,
+			   newex.e_lblk);
+		err = ext2fs_extent_insert(handle, op, &newex);
+		if (err)
+			goto out;
+		err = ext2fs_extent_fix_parents(handle);
+		if (err)
+			goto out;
+
+		if (!(newex.e_flags & EXT2_EXTENT_FLAGS_UNINIT) &&
+		    (flags & EXT2_FALLOCATE_ZERO_BLOCKS)) {
+			err = ext2fs_zero_blocks2(fs, newex.e_pblk,
+						  newex.e_len, NULL, NULL);
+			if (err)
+				goto out;
+		}
+
+		range_len -= cluster_fill;
+	}
+
+no_implied:
+	if (range_len == 0)
+		return 0;
+
+	newex.e_lblk = range_start;
+	if (flags & EXT2_FALLOCATE_FORCE_INIT) {
+		max_extent_len = max_init_len;
+		newex.e_flags = 0;
+	} else {
+		max_extent_len = max_uninit_len;
+		newex.e_flags = EXT2_EXTENT_FLAGS_UNINIT;
+	}
+	pblk = alloc_goal;
+	y = range_len;
+	for (x = 0; x < y;) {
+		cluster_fill = newex.e_lblk & EXT2FS_CLUSTER_MASK(fs);
+		fillable = min(range_len + cluster_fill, max_extent_len);
+		err = ext2fs_new_range(fs, 0, pblk & ~EXT2FS_CLUSTER_MASK(fs),
+				       fillable,
+				       NULL, &pblk, &plen);
+		if (err)
+			goto out;
+		err = claim_range(fs, inode, pblk, plen);
+		if (err)
+			goto out;
+
+		/* Create extent */
+		newex.e_pblk = pblk + cluster_fill;
+		newex.e_len = plen - cluster_fill;
+		dbg_print_extent("ext_falloc create", &newex);
+		ext2fs_extent_goto(handle, newex.e_lblk);
+		err = ext2fs_extent_get(handle, EXT2_EXTENT_CURRENT,
+					&ex);
+		if (err == EXT2_ET_NO_CURRENT_NODE)
+			ex.e_lblk = 0;
+		else if (err)
+			goto out;
+
+		if (ex.e_lblk > newex.e_lblk)
+			op = 0; /* insert before */
+		else
+			op = EXT2_EXTENT_INSERT_AFTER;
+		dbg_printf("%s: inserting %s lblk %llu newex=%llu\n",
+			   __func__, op ? "after" : "before", ex.e_lblk,
+			   newex.e_lblk);
+		err = ext2fs_extent_insert(handle, op, &newex);
+		if (err)
+			goto out;
+		err = ext2fs_extent_fix_parents(handle);
+		if (err)
+			goto out;
+
+		if (!(newex.e_flags & EXT2_EXTENT_FLAGS_UNINIT) &&
+		    (flags & EXT2_FALLOCATE_ZERO_BLOCKS)) {
+			err = ext2fs_zero_blocks2(fs, pblk, plen, NULL, NULL);
+			if (err)
+				goto out;
+		}
+
+		/* Update variables at end of loop */
+		x += plen - cluster_fill;
+		range_len -= plen - cluster_fill;
+		newex.e_lblk += plen - cluster_fill;
+		pblk += plen - cluster_fill;
+		if (pblk >= ext2fs_blocks_count(fs->super))
+			pblk = fs->super->s_first_data_block;
+	}
+
+out:
+	return err;
+}
+
+static errcode_t extent_fallocate(ext2_filsys fs, int flags, ext2_ino_t ino,
+				      struct ext2_inode *inode, blk64_t goal,
+				      blk64_t start, blk64_t len)
+{
+	ext2_extent_handle_t	handle;
+	struct ext2fs_extent	left_extent, right_extent;
+	struct ext2fs_extent	*left_adjacent, *right_adjacent;
+	errcode_t		err;
+	blk64_t			range_start, range_end = 0, end, next;
+	blk64_t			count, goal_distance;
+
+	end = start + len - 1;
+	err = ext2fs_extent_open2(fs, ino, inode, &handle);
+	if (err)
+		return err;
+
+	/*
+	 * Find the extent closest to the start of the alloc range.  We don't
+	 * check the return value because _goto() sets the current node to the
+	 * next-lowest extent if 'start' is in a hole; or the next-highest
+	 * extent if there aren't any lower ones; or doesn't set a current node
+	 * if there was a real error reading the extent tree.  In that case,
+	 * _get() will error out.
+	 */
+start_again:
+	ext2fs_extent_goto(handle, start);
+	err = ext2fs_extent_get(handle, EXT2_EXTENT_CURRENT, &left_extent);
+	if (err == EXT2_ET_NO_CURRENT_NODE) {
+		blk64_t max_blocks = ext2fs_blocks_count(fs->super);
+
+		if (goal == ~0ULL)
+			goal = ext2fs_find_inode_goal(fs, ino, inode, start);
+		err = ext2fs_find_first_zero_block_bitmap2(fs->block_map,
+						goal, max_blocks - 1, &goal);
+		goal += start;
+		err = ext_falloc_helper(fs, flags, ino, inode, handle, NULL,
+					NULL, start, len, goal);
+		goto errout;
+	} else if (err)
+		goto errout;
+
+	dbg_print_extent("ext_falloc initial", &left_extent);
+	next = left_extent.e_lblk + left_extent.e_len;
+	if (left_extent.e_lblk > start) {
+		/* The nearest extent we found was beyond start??? */
+		goal = left_extent.e_pblk - (left_extent.e_lblk - start);
+		err = ext_falloc_helper(fs, flags, ino, inode, handle, NULL,
+					&left_extent, start,
+					left_extent.e_lblk - start, goal);
+		if (err)
+			goto errout;
+
+		goto start_again;
+	} else if (next >= start) {
+		range_start = next;
+		left_adjacent = &left_extent;
+	} else {
+		range_start = start;
+		left_adjacent = NULL;
+	}
+	goal = left_extent.e_pblk + (range_start - left_extent.e_lblk);
+	goal_distance = range_start - next;
+
+	do {
+		err = ext2fs_extent_get(handle, EXT2_EXTENT_NEXT_LEAF,
+					   &right_extent);
+		dbg_printf("%s: ino=%d get next =%d\n", __func__, ino,
+			   (int)err);
+		dbg_print_extent("ext_falloc next", &right_extent);
+		/* Stop if we've seen this extent before */
+		if (!err && right_extent.e_lblk <= left_extent.e_lblk)
+			err = EXT2_ET_EXTENT_NO_NEXT;
+
+		if (err && err != EXT2_ET_EXTENT_NO_NEXT)
+			goto errout;
+		if (err == EXT2_ET_EXTENT_NO_NEXT ||
+		    right_extent.e_lblk > end + 1) {
+			range_end = end;
+			right_adjacent = NULL;
+		} else {
+			/* Handle right_extent.e_lblk <= end */
+			range_end = right_extent.e_lblk - 1;
+			right_adjacent = &right_extent;
+		}
+		if (err != EXT2_ET_EXTENT_NO_NEXT &&
+		    goal_distance > (range_end - right_extent.e_lblk)) {
+			goal = right_extent.e_pblk -
+					(right_extent.e_lblk - range_start);
+			goal_distance = range_end - right_extent.e_lblk;
+		}
+
+		dbg_printf("%s: ino=%d rstart=%llu rend=%llu\n", __func__, ino,
+			   range_start, range_end);
+		err = 0;
+		if (range_start <= range_end) {
+			count = range_end - range_start + 1;
+			err = ext_falloc_helper(fs, flags, ino, inode, handle,
+						left_adjacent, right_adjacent,
+						range_start, count, goal);
+			if (err)
+				goto errout;
+		}
+
+		if (range_end == end)
+			break;
+
+		err = ext2fs_extent_goto(handle, right_extent.e_lblk);
+		if (err)
+			goto errout;
+		next = right_extent.e_lblk + right_extent.e_len;
+		left_extent = right_extent;
+		left_adjacent = &left_extent;
+		range_start = next;
+		goal = left_extent.e_pblk + (range_start - left_extent.e_lblk);
+		goal_distance = range_start - next;
+	} while (range_end < end);
+
+errout:
+	ext2fs_extent_free(handle);
+	return err;
+}
+
+/*
+ * Map physical blocks to a range of logical blocks within a file.  The range
+ * of logical blocks is [start, start + len).  If there are already extents,
+ * the allocator will try to extend them; otherwise, it will try to map
+ * start as if logical block 0 points to goal.  If goal is ~0ULL, then the goal
+ * is calculated based on the inode group.
+ *
+ * Flags:
+ * - EXT2_FALLOCATE_ZERO_BLOCKS: Zero the blocks that are allocated.
+ * - EXT2_FALLOCATE_FORCE_INIT: Create only initialized extents.
+ * - EXT2_FALLOCATE_FORCE_UNINIT: Create only uninitialized extents.
+ * - EXT2_FALLOCATE_INIT_BEYOND_EOF: Create extents beyond EOF.
+ *
+ * If neither FORCE_INIT nor FORCE_UNINIT is specified, this function will
+ * try to expand any extents it finds, zeroing blocks as necessary.
+ */
+errcode_t ext2fs_fallocate(ext2_filsys fs, int flags, ext2_ino_t ino,
+			   struct ext2_inode *inode, blk64_t goal,
+			   blk64_t start, blk64_t len)
+{
+	struct ext2_inode	inode_buf;
+	blk64_t			blk, x;
+	errcode_t		err;
+
+	if (((flags & EXT2_FALLOCATE_FORCE_INIT) &&
+	    (flags & EXT2_FALLOCATE_FORCE_UNINIT)) ||
+	   (flags & ~EXT2_FALLOCATE_ALL_FLAGS))
+		return EXT2_ET_INVALID_ARGUMENT;
+
+	if (len > ext2fs_blocks_count(fs->super))
+		return EXT2_ET_BLOCK_ALLOC_FAIL;
+	else if (len == 0)
+		return 0;
+
+	/* Read inode structure if necessary */
+	if (!inode) {
+		err = ext2fs_read_inode(fs, ino, &inode_buf);
+		if (err)
+			return err;
+		inode = &inode_buf;
+	}
+	dbg_printf("%s: ino=%d start=%llu len=%llu goal=%llu\n", __func__, ino,
+		   start, len, goal);
+
+	if (inode->i_flags & EXT4_EXTENTS_FL) {
+		err = extent_fallocate(fs, flags, ino, inode, goal, start, len);
+		goto out;
+	}
+
+	/* XXX: Allocate a bunch of blocks the slow way */
+	for (blk = start; blk < start + len; blk++) {
+		err = ext2fs_bmap2(fs, ino, inode, NULL, 0, blk, 0, &x);
+		if (err)
+			return err;
+		if (x)
+			continue;
+
+		err = ext2fs_bmap2(fs, ino, inode, NULL,
+				   BMAP_ALLOC | BMAP_UNINIT | BMAP_ZERO, blk,
+				   0, &x);
+		if (err)
+			return err;
+	}
+
+out:
+	if (inode == &inode_buf)
+		ext2fs_write_inode(fs, ino, inode);
+	return err;
+}
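
The implied-cluster handling in ext_falloc_helper() above relies on two small pieces of mask arithmetic. A minimal sketch follows, assuming a power-of-two cluster ratio; sketch_cluster_mask(), sketch_left_fill() and sketch_right_fill() are hypothetical names that stand in for EXT2FS_CLUSTER_MASK(fs) and the inline computations, and are not part of libext2fs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for EXT2FS_CLUSTER_MASK(fs): ratio - 1. */
static uint64_t sketch_cluster_mask(uint64_t ratio)
{
	/* The masking trick only works for a power-of-two ratio. */
	assert(ratio != 0 && (ratio & (ratio - 1)) == 0);
	return ratio - 1;
}

/*
 * Blocks between range_start and the end of the cluster it falls in,
 * capped at range_len.  Yields 0 when range_start is cluster-aligned.
 */
static uint64_t sketch_left_fill(uint64_t ratio, uint64_t range_start,
				 uint64_t range_len)
{
	uint64_t mask = sketch_cluster_mask(ratio);
	uint64_t fill = ratio - (range_start & mask);

	fill &= mask;		/* aligned start: ratio & mask == 0 */
	if (fill > range_len)
		fill = range_len;
	return fill;
}

/*
 * Blocks between the start of the last cluster and the end of the
 * range, capped at range_len.  Yields 0 when the end is aligned.
 */
static uint64_t sketch_right_fill(uint64_t ratio, uint64_t range_start,
				  uint64_t range_len)
{
	uint64_t mask = sketch_cluster_mask(ratio);
	uint64_t y = range_start + range_len;
	uint64_t fill = y & mask;

	if (fill > range_len)
		fill = range_len;
	return fill;
}
```

With a ratio of 16, a range starting at logical block 5 has 11 blocks left in its first cluster, and a range ending at block 20 spills 4 blocks into its last cluster. The patch only acts on these counts when ext2fs_map_cluster_block() reports a physical block for the cluster.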



* [PATCH 32/35] libext2fs: use fallocate for creating journals and hugefiles
  2015-04-02  2:34 [PATCH 00/35] e2fsprogs April 2015 patchbomb Darrick J. Wong
                   ` (30 preceding siblings ...)
  2015-04-02  2:37 ` [PATCH 31/35] libext2fs: implement fallocate Darrick J. Wong
@ 2015-04-02  2:37 ` Darrick J. Wong
  2015-04-02  2:37 ` [PATCH 33/35] debugfs: implement fallocate Darrick J. Wong
  2015-04-02  2:37 ` [PATCH 34/35] tests: test debugfs punch command Darrick J. Wong
  33 siblings, 0 replies; 70+ messages in thread
From: Darrick J. Wong @ 2015-04-02  2:37 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

Use the new fallocate API for creating the journal and the mk_hugefile
feature.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 lib/ext2fs/mkjournal.c                |  134 +++++----------------------------
 misc/mk_hugefiles.c                   |   96 ++----------------------
 tests/f_opt_extent/expect             |   15 ----
 tests/r_32to64bit_meta/expect         |    4 -
 tests/r_32to64bit_move_itable/expect  |    4 -
 tests/r_64to32bit/expect              |    4 -
 tests/r_64to32bit_meta/expect         |    4 -
 tests/t_disable_mcsum_noinitbg/expect |    6 +
 tests/t_enable_mcsum/expect           |   15 ----
 tests/t_enable_mcsum_ext3/expect      |    8 +-
 tests/t_enable_mcsum_initbg/expect    |   11 +--
 tests/t_iexpand_full/expect           |    4 -
 12 files changed, 56 insertions(+), 249 deletions(-)
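
For reviewers, the flag handling this patch depends on can be sketched in isolation. The SKETCH_* values below are assumed stand-ins (the real definitions live in ext2fs.h and may differ), and both function names are hypothetical. The first function mirrors how write_journal_inode() derives its fallocate flags; the second mirrors the argument check at the top of ext2fs_fallocate().

```c
#include <stdint.h>

/* Assumed stand-in values; see ext2fs.h for the real definitions. */
#define SKETCH_MKJOURNAL_LAZYINIT	0x2
#define SKETCH_FALLOC_ZERO_BLOCKS	0x1
#define SKETCH_FALLOC_FORCE_INIT	0x2
#define SKETCH_FALLOC_FORCE_UNINIT	0x4
#define SKETCH_FALLOC_INIT_BEYOND_EOF	0x8
#define SKETCH_FALLOC_ALL_FLAGS		0xF

/* Journals always get initialized extents; zero them unless lazy init. */
static int sketch_journal_falloc_flags(int mkjournal_flags)
{
	int falloc_flags = SKETCH_FALLOC_FORCE_INIT;

	if (!(mkjournal_flags & SKETCH_MKJOURNAL_LAZYINIT))
		falloc_flags |= SKETCH_FALLOC_ZERO_BLOCKS;
	return falloc_flags;
}

/* Returns 1 if the flag combination would pass the argument check. */
static int sketch_falloc_flags_valid(int flags)
{
	/* FORCE_INIT and FORCE_UNINIT are mutually exclusive. */
	if ((flags & SKETCH_FALLOC_FORCE_INIT) &&
	    (flags & SKETCH_FALLOC_FORCE_UNINIT))
		return 0;
	/* Unknown bits are rejected. */
	if (flags & ~SKETCH_FALLOC_ALL_FLAGS)
		return 0;
	return 1;
}
```

The hugefile path passes FORCE_INIT | ZERO_BLOCKS unconditionally, which corresponds to sketch_journal_falloc_flags(0) here.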


diff --git a/lib/ext2fs/mkjournal.c b/lib/ext2fs/mkjournal.c
index c42cb98..02a65cb 100644
--- a/lib/ext2fs/mkjournal.c
+++ b/lib/ext2fs/mkjournal.c
@@ -227,89 +227,6 @@ errcode_t ext2fs_zero_blocks(ext2_filsys fs, blk_t blk, int num,
 }
 
 /*
- * Helper function for creating the journal using direct I/O routines
- */
-struct mkjournal_struct {
-	int		num_blocks;
-	int		newblocks;
-	blk64_t		goal;
-	blk64_t		blk_to_zero;
-	int		zero_count;
-	int		flags;
-	char		*buf;
-	errcode_t	err;
-};
-
-static int mkjournal_proc(ext2_filsys	fs,
-			  blk64_t	*blocknr,
-			  e2_blkcnt_t	blockcnt,
-			  blk64_t	ref_block EXT2FS_ATTR((unused)),
-			  int		ref_offset EXT2FS_ATTR((unused)),
-			  void		*priv_data)
-{
-	struct mkjournal_struct *es = (struct mkjournal_struct *) priv_data;
-	blk64_t	new_blk;
-	errcode_t	retval;
-
-	if (*blocknr) {
-		es->goal = *blocknr;
-		return 0;
-	}
-	if (blockcnt &&
-	    (EXT2FS_B2C(fs, es->goal) == EXT2FS_B2C(fs, es->goal+1)))
-		new_blk = es->goal+1;
-	else {
-		es->goal &= ~EXT2FS_CLUSTER_MASK(fs);
-		retval = ext2fs_new_block2(fs, es->goal, 0, &new_blk);
-		if (retval) {
-			es->err = retval;
-			return BLOCK_ABORT;
-		}
-		ext2fs_block_alloc_stats2(fs, new_blk, +1);
-		es->newblocks++;
-	}
-	if (blockcnt >= 0)
-		es->num_blocks--;
-
-	retval = 0;
-	if (blockcnt <= 0)
-		retval = io_channel_write_blk64(fs->io, new_blk, 1, es->buf);
-	else if (!(es->flags & EXT2_MKJOURNAL_LAZYINIT)) {
-		if (es->zero_count) {
-			if ((es->blk_to_zero + es->zero_count == new_blk) &&
-			    (es->zero_count < 1024))
-				es->zero_count++;
-			else {
-				retval = ext2fs_zero_blocks2(fs,
-							     es->blk_to_zero,
-							     es->zero_count,
-							     0, 0);
-				es->zero_count = 0;
-			}
-		}
-		if (es->zero_count == 0) {
-			es->blk_to_zero = new_blk;
-			es->zero_count = 1;
-		}
-	}
-
-	if (blockcnt == 0)
-		memset(es->buf, 0, fs->blocksize);
-
-	if (retval) {
-		es->err = retval;
-		return BLOCK_ABORT;
-	}
-	*blocknr = es->goal = new_blk;
-
-	if (es->num_blocks == 0)
-		return (BLOCK_CHANGED | BLOCK_ABORT);
-	else
-		return BLOCK_CHANGED;
-
-}
-
-/*
  * Calculate the initial goal block to be roughly at the middle of the
  * filesystem.  Pick a group that has the largest number of free
  * blocks.
@@ -350,7 +267,8 @@ static errcode_t write_journal_inode(ext2_filsys fs, ext2_ino_t journal_ino,
 	errcode_t		retval;
 	struct ext2_inode	inode;
 	unsigned long long	inode_size;
-	struct mkjournal_struct	es;
+	int			falloc_flags = EXT2_FALLOCATE_FORCE_INIT;
+	blk64_t			zblk;
 
 	if ((retval = ext2fs_create_journal_superblock(fs, num_blocks, flags,
 						       &buf)))
@@ -367,40 +285,16 @@ static errcode_t write_journal_inode(ext2_filsys fs, ext2_ino_t journal_ino,
 		goto out2;
 	}
 
-	es.num_blocks = num_blocks;
-	es.newblocks = 0;
-	es.buf = buf;
-	es.err = 0;
-	es.flags = flags;
-	es.zero_count = 0;
-	es.goal = (goal != ~0ULL) ? goal : get_midpoint_journal_block(fs);
+	if (goal == ~0ULL)
+		goal = get_midpoint_journal_block(fs);
 
-	if (fs->super->s_feature_incompat & EXT3_FEATURE_INCOMPAT_EXTENTS) {
+	if (fs->super->s_feature_incompat & EXT3_FEATURE_INCOMPAT_EXTENTS)
 		inode.i_flags |= EXT4_EXTENTS_FL;
-		if ((retval = ext2fs_write_inode(fs, journal_ino, &inode)))
-			goto out2;
-	}
 
-	retval = ext2fs_block_iterate3(fs, journal_ino, BLOCK_FLAG_APPEND,
-				       0, mkjournal_proc, &es);
-	if (retval)
-		goto out2;
-	if (es.err) {
-		retval = es.err;
-		goto out2;
-	}
-	if (es.zero_count) {
-		retval = ext2fs_zero_blocks2(fs, es.blk_to_zero,
-					    es.zero_count, 0, 0);
-		if (retval)
-			goto out2;
-	}
-
-	if ((retval = ext2fs_read_inode(fs, journal_ino, &inode)))
-		goto out2;
+	if (!(flags & EXT2_MKJOURNAL_LAZYINIT))
+		falloc_flags |= EXT2_FALLOCATE_ZERO_BLOCKS;
 
 	inode_size = (unsigned long long)fs->blocksize * num_blocks;
-	ext2fs_iblk_add_blocks(fs, &inode, es.newblocks);
 	inode.i_mtime = inode.i_ctime = fs->now ? fs->now : time(0);
 	inode.i_links_count = 1;
 	inode.i_mode = LINUX_S_IFREG | 0600;
@@ -408,9 +302,21 @@ static errcode_t write_journal_inode(ext2_filsys fs, ext2_ino_t journal_ino,
 	if (retval)
 		goto out2;
 
+	retval = ext2fs_fallocate(fs, falloc_flags, journal_ino,
+				  &inode, goal, 0, num_blocks);
+	if (retval)
+		goto out2;
+
 	if ((retval = ext2fs_write_new_inode(fs, journal_ino, &inode)))
 		goto out2;
-	retval = 0;
+
+	retval = ext2fs_bmap2(fs, journal_ino, &inode, NULL, 0, 0, NULL, &zblk);
+	if (retval)
+		goto out2;
+
+	retval = io_channel_write_blk64(fs->io, zblk, 1, buf);
+	if (retval)
+		goto out2;
 
 	memcpy(fs->super->s_jnl_blocks, inode.i_block, EXT2_N_BLOCKS*4);
 	fs->super->s_jnl_blocks[15] = inode.i_size_high;
diff --git a/misc/mk_hugefiles.c b/misc/mk_hugefiles.c
index e42c0b9..0978d55 100644
--- a/misc/mk_hugefiles.c
+++ b/misc/mk_hugefiles.c
@@ -258,12 +258,7 @@ static errcode_t mk_hugefile(ext2_filsys fs, blk64_t num,
 
 {
 	errcode_t		retval;
-	blk64_t			lblk, bend = 0;
-	__u64			size;
-	blk64_t			left;
-	blk64_t			count = 0;
 	struct ext2_inode	inode;
-	ext2_extent_handle_t	handle;
 
 	retval = ext2fs_new_inode(fs, 0, LINUX_S_IFREG, NULL, ino);
 	if (retval)
@@ -283,85 +278,20 @@ static errcode_t mk_hugefile(ext2_filsys fs, blk64_t num,
 
 	ext2fs_inode_alloc_stats2(fs, *ino, +1, 0);
 
-	retval = ext2fs_extent_open2(fs, *ino, &inode, &handle);
+	if (EXT2_HAS_INCOMPAT_FEATURE(fs->super,
+				      EXT3_FEATURE_INCOMPAT_EXTENTS))
+		inode.i_flags |= EXT4_EXTENTS_FL;
+	retval = ext2fs_fallocate(fs,
+				  EXT2_FALLOCATE_FORCE_INIT |
+				  EXT2_FALLOCATE_ZERO_BLOCKS,
+				  *ino, &inode, ~0ULL, 0, num);
 	if (retval)
 		return retval;
-
-	lblk = 0;
-	left = num ? num : 1;
-	while (left) {
-		blk64_t pblk, end;
-		blk64_t n = left;
-
-		retval =  ext2fs_find_first_zero_block_bitmap2(fs->block_map,
-			goal, ext2fs_blocks_count(fs->super) - 1, &end);
-		if (retval)
-			goto errout;
-		goal = end;
-
-		retval =  ext2fs_find_first_set_block_bitmap2(fs->block_map, goal,
-			       ext2fs_blocks_count(fs->super) - 1, &bend);
-		if (retval == ENOENT) {
-			bend = ext2fs_blocks_count(fs->super);
-			if (num == 0)
-				left = 0;
-		}
-		if (!num || bend - goal < left)
-			n = bend - goal;
-		pblk = goal;
-		if (num)
-			left -= n;
-		goal += n;
-		count += n;
-		ext2fs_block_alloc_stats_range(fs, pblk, n, +1);
-
-		if (zero_hugefile) {
-			blk64_t ret_blk;
-			retval = ext2fs_zero_blocks2(fs, pblk, n,
-						     &ret_blk, NULL);
-
-			if (retval)
-				com_err(program_name, retval,
-					_("while zeroing block %llu "
-					  "for hugefile"), ret_blk);
-		}
-
-		while (n) {
-			blk64_t l = n;
-			struct ext2fs_extent newextent;
-
-			if (l > EXT_INIT_MAX_LEN)
-				l = EXT_INIT_MAX_LEN;
-
-			newextent.e_len = l;
-			newextent.e_pblk = pblk;
-			newextent.e_lblk = lblk;
-			newextent.e_flags = 0;
-
-			retval = ext2fs_extent_insert(handle,
-					EXT2_EXTENT_INSERT_AFTER, &newextent);
-			if (retval)
-				return retval;
-			pblk += l;
-			lblk += l;
-			n -= l;
-		}
-	}
-
-	retval = ext2fs_read_inode(fs, *ino, &inode);
-	if (retval)
-		goto errout;
-
-	retval = ext2fs_iblk_add_blocks(fs, &inode,
-					count / EXT2FS_CLUSTER_RATIO(fs));
-	if (retval)
-		goto errout;
-	size = (__u64) count * fs->blocksize;
-	retval = ext2fs_inode_size_set(fs, &inode, size);
+	retval = ext2fs_inode_size_set(fs, &inode, num * fs->blocksize);
 	if (retval)
-		goto errout;
+		return retval;
 
-	retval = ext2fs_write_new_inode(fs, *ino, &inode);
+	retval = ext2fs_write_inode(fs, *ino, &inode);
 	if (retval)
 		goto errout;
 
@@ -379,13 +309,7 @@ retry:
 		goto retry;
 	}
 
-	if (retval)
-		goto errout;
-
 errout:
-	if (handle)
-		ext2fs_extent_free(handle);
-
 	return retval;
 }
 
diff --git a/tests/f_opt_extent/expect b/tests/f_opt_extent/expect
index 6d4863b..f4ed7ff 100644
--- a/tests/f_opt_extent/expect
+++ b/tests/f_opt_extent/expect
@@ -30,22 +30,11 @@ Change in FS metadata:
  Inode count:              65536
  Block count:              524288
  Reserved block count:     26214
--Free blocks:              570
-+Free blocks:              567
+-Free blocks:              569
++Free blocks:              566
  Free inodes:              65047
  First block:              1
  Block size:               1024
-@@ -47,8 +47,8 @@
-   Block bitmap at 262 (+261)
-   Inode bitmap at 278 (+277)
-   Inode table at 294-549 (+293)
--  21 free blocks, 535 free inodes, 3 directories, 535 unused inodes
--  Free blocks: 4414-4434
-+  18 free blocks, 535 free inodes, 3 directories, 535 unused inodes
-+  Free blocks: 4417-4434
-   Free inodes: 490-1024
- Group 1: (Blocks 8193-16384) [INODE_UNINIT]
-   Backup superblock at 8193, Group descriptors at 8194-8197
 Pass 1: Checking inodes, blocks, and sizes
 Pass 2: Checking directory structure
 Pass 3: Checking directory connectivity
diff --git a/tests/r_32to64bit_meta/expect b/tests/r_32to64bit_meta/expect
index 0eacd45..8796503 100644
--- a/tests/r_32to64bit_meta/expect
+++ b/tests/r_32to64bit_meta/expect
@@ -35,8 +35,8 @@ Change in FS metadata:
  Inode count:              65536
  Block count:              524288
  Reserved block count:     26214
--Free blocks:              858
-+Free blocks:              852
+-Free blocks:              857
++Free blocks:              851
  Free inodes:              65046
  First block:              1
  Block size:               1024
diff --git a/tests/r_32to64bit_move_itable/expect b/tests/r_32to64bit_move_itable/expect
index b51663d..999bb8d 100644
--- a/tests/r_32to64bit_move_itable/expect
+++ b/tests/r_32to64bit_move_itable/expect
@@ -35,8 +35,8 @@ Change in FS metadata:
  Inode count:              98304
  Block count:              786432
  Reserved block count:     39321
--Free blocks:              764
-+Free blocks:              734
+-Free blocks:              763
++Free blocks:              733
  Free inodes:              97566
  First block:              1
  Block size:               1024
diff --git a/tests/r_64to32bit/expect b/tests/r_64to32bit/expect
index 13e94a2..5d2ea4b 100644
--- a/tests/r_64to32bit/expect
+++ b/tests/r_64to32bit/expect
@@ -35,8 +35,8 @@ Change in FS metadata:
  Inode count:              65536
  Block count:              524288
  Reserved block count:     26214
--Free blocks:              571
-+Free blocks:              589
+-Free blocks:              570
++Free blocks:              588
  Free inodes:              65048
  First block:              1
  Block size:               1024
diff --git a/tests/r_64to32bit_meta/expect b/tests/r_64to32bit_meta/expect
index d6e2dcc..1400c6b 100644
--- a/tests/r_64to32bit_meta/expect
+++ b/tests/r_64to32bit_meta/expect
@@ -35,8 +35,8 @@ Change in FS metadata:
  Inode count:              65536
  Block count:              524288
  Reserved block count:     26214
--Free blocks:              852
-+Free blocks:              858
+-Free blocks:              851
++Free blocks:              857
  Free inodes:              65046
  First block:              1
  Block size:               1024
diff --git a/tests/t_disable_mcsum_noinitbg/expect b/tests/t_disable_mcsum_noinitbg/expect
index a022631..09e4ff1 100644
--- a/tests/t_disable_mcsum_noinitbg/expect
+++ b/tests/t_disable_mcsum_noinitbg/expect
@@ -40,9 +40,9 @@ Change in FS metadata:
    Block bitmap at 262 (+261)
    Inode bitmap at 278 (+277)
    Inode table at 294-549 (+293)
--  21 free blocks, 536 free inodes, 2 directories, 536 unused inodes
-+  21 free blocks, 536 free inodes, 2 directories
-   Free blocks: 4413-4433
+-  0 free blocks, 536 free inodes, 2 directories, 536 unused inodes
++  0 free blocks, 536 free inodes, 2 directories
+   Free blocks: 
    Free inodes: 489-1024
 -Group 1: (Blocks 8193-16384) [INODE_UNINIT]
 +Group 1: (Blocks 8193-16384)
diff --git a/tests/t_enable_mcsum/expect b/tests/t_enable_mcsum/expect
index 2ee3c27..81e1125 100644
--- a/tests/t_enable_mcsum/expect
+++ b/tests/t_enable_mcsum/expect
@@ -45,8 +45,8 @@ Change in FS metadata:
  Inode count:              65536
  Block count:              524288
  Reserved block count:     26214
--Free blocks:              571
-+Free blocks:              568
+-Free blocks:              570
++Free blocks:              567
  Free inodes:              65048
  First block:              1
  Block size:               1024
@@ -58,17 +58,6 @@ Change in FS metadata:
  Journal features:         (none)
  Journal size:             16M
  Journal length:           16384
-@@ -46,8 +47,8 @@
-   Block bitmap at 262 (+261)
-   Inode bitmap at 278 (+277)
-   Inode table at 294-549 (+293)
--  21 free blocks, 536 free inodes, 2 directories, 536 unused inodes
--  Free blocks: 4413-4433
-+  18 free blocks, 536 free inodes, 2 directories, 536 unused inodes
-+  Free blocks: 4413, 4417-4433
-   Free inodes: 489-1024
- Group 1: (Blocks 8193-16384) [INODE_UNINIT]
-   Backup superblock at 8193, Group descriptors at 8194-8197
 Pass 1: Checking inodes, blocks, and sizes
 Pass 2: Checking directory structure
 Pass 3: Checking directory connectivity
diff --git a/tests/t_enable_mcsum_ext3/expect b/tests/t_enable_mcsum_ext3/expect
index 5460482..0f761a9 100644
--- a/tests/t_enable_mcsum_ext3/expect
+++ b/tests/t_enable_mcsum_ext3/expect
@@ -49,8 +49,8 @@ Change in FS metadata:
    Reserved GDT blocks at 4-259
    Block bitmap at 260 (+259)
 @@ -45,7 +46,7 @@
-   7789 free blocks, 1013 free inodes, 2 directories
-   Free blocks: 404-8192
+   0 free blocks, 1013 free inodes, 2 directories
+   Free blocks: 
    Free inodes: 12-1024
 -Group 1: (Blocks 8193-16384)
 +Group 1: (Blocks 8193-16384) [ITABLE_ZEROED]
@@ -58,8 +58,8 @@ Change in FS metadata:
    Reserved GDT blocks at 8196-8451
    Block bitmap at 8452 (+259)
 @@ -54,6 +55,6 @@
-   7803 free blocks, 1024 free inodes, 0 directories
-   Free blocks: 8582-16384
+   0 free blocks, 1024 free inodes, 0 directories
+   Free blocks: 
    Free inodes: 1025-2048
 -Group 2: (Blocks 16385-24576)
 +Group 2: (Blocks 16385-24576) [ITABLE_ZEROED]
diff --git a/tests/t_enable_mcsum_initbg/expect b/tests/t_enable_mcsum_initbg/expect
index d3b4444..3cbb98f 100644
--- a/tests/t_enable_mcsum_initbg/expect
+++ b/tests/t_enable_mcsum_initbg/expect
@@ -45,8 +45,8 @@ Change in FS metadata:
  Inode count:              65536
  Block count:              524288
  Reserved block count:     26214
--Free blocks:              571
-+Free blocks:              568
+-Free blocks:              570
++Free blocks:              567
  Free inodes:              65048
  First block:              1
  Block size:               1024
@@ -69,10 +69,9 @@ Change in FS metadata:
    Block bitmap at 262 (+261)
    Inode bitmap at 278 (+277)
    Inode table at 294-549 (+293)
--  21 free blocks, 536 free inodes, 2 directories
--  Free blocks: 4413-4433
-+  18 free blocks, 536 free inodes, 2 directories, 536 unused inodes
-+  Free blocks: 4413, 4417-4433
+-  0 free blocks, 536 free inodes, 2 directories
++  0 free blocks, 536 free inodes, 2 directories, 536 unused inodes
+   Free blocks: 
    Free inodes: 489-1024
 -Group 1: (Blocks 8193-16384)
 +Group 1: (Blocks 8193-16384) [INODE_UNINIT, ITABLE_ZEROED]
diff --git a/tests/t_iexpand_full/expect b/tests/t_iexpand_full/expect
index 3eb1715..0474827 100644
--- a/tests/t_iexpand_full/expect
+++ b/tests/t_iexpand_full/expect
@@ -21,8 +21,8 @@ Setting inode size 256
 Exit status is 0
 Change in FS metadata:
 @@ -13 +13 @@
--Free blocks:              12301
-+Free blocks:              12
+-Free blocks:              12299
++Free blocks:              10
 @@ -22 +22 @@
 -Inode blocks per group:   128
 +Inode blocks per group:   256



* [PATCH 33/35] debugfs: implement fallocate
  2015-04-02  2:34 [PATCH 00/35] e2fsprogs April 2015 patchbomb Darrick J. Wong
                   ` (31 preceding siblings ...)
  2015-04-02  2:37 ` [PATCH 32/35] libext2fs: use fallocate for creating journals and hugefiles Darrick J. Wong
@ 2015-04-02  2:37 ` Darrick J. Wong
  2015-04-02  2:37 ` [PATCH 34/35] tests: test debugfs punch command Darrick J. Wong
  33 siblings, 0 replies; 70+ messages in thread
From: Darrick J. Wong @ 2015-04-02  2:37 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

Implement a fallocate function for debugfs, and add some tests to
demonstrate that it works (more or less).

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 debugfs/debug_cmds.ct                |    3 +
 debugfs/debugfs.8.in                 |    7 +
 debugfs/debugfs.c                    |   36 +++++++
 debugfs/debugfs.h                    |    1 
 tests/d_fallocate/expect.gz          |  Bin
 tests/d_fallocate/name               |    1 
 tests/d_fallocate/script             |  175 ++++++++++++++++++++++++++++++++++
 tests/d_fallocate_bigalloc/expect.gz |  Bin
 tests/d_fallocate_bigalloc/name      |    1 
 tests/d_fallocate_bigalloc/script    |  176 ++++++++++++++++++++++++++++++++++
 tests/d_fallocate_blkmap/expect      |   58 +++++++++++
 tests/d_fallocate_blkmap/name        |    1 
 tests/d_fallocate_blkmap/script      |   85 ++++++++++++++++
 13 files changed, 544 insertions(+)
 create mode 100644 tests/d_fallocate/expect.gz
 create mode 100644 tests/d_fallocate/name
 create mode 100644 tests/d_fallocate/script
 create mode 100644 tests/d_fallocate_bigalloc/expect.gz
 create mode 100644 tests/d_fallocate_bigalloc/name
 create mode 100644 tests/d_fallocate_bigalloc/script
 create mode 100644 tests/d_fallocate_blkmap/expect
 create mode 100644 tests/d_fallocate_blkmap/name
 create mode 100644 tests/d_fallocate_blkmap/script
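
The debugfs command takes an inclusive block range, while ext2fs_fallocate() takes a start and a length. A minimal sketch of that conversion and of the early length checks in ext2fs_fallocate() follows; sketch_range_len(), sketch_precheck() and the enum are hypothetical names used only for illustration, and the sketch says nothing about what happens after the checks pass.

```c
#include <assert.h>
#include <stdint.h>

enum sketch_result { SKETCH_NOOP, SKETCH_TOO_BIG, SKETCH_PROCEED };

/* Convert an inclusive [start, end] range to a block count. */
static uint64_t sketch_range_len(uint64_t start, uint64_t end)
{
	assert(start <= end);
	/* Unsigned arithmetic: wraps to 0 for start == 0, end == UINT64_MAX. */
	return end - start + 1;
}

/* Mirror of the length checks made before any allocation is attempted. */
static enum sketch_result sketch_precheck(uint64_t len, uint64_t blocks_count)
{
	if (len > blocks_count)
		return SKETCH_TOO_BIG;	/* EXT2_ET_BLOCK_ALLOC_FAIL in the patch */
	if (len == 0)
		return SKETCH_NOOP;	/* returns 0 without allocating */
	return SKETCH_PROCEED;
}
```

For example, a range of 0 to 39 gives a length of 40 blocks. When the end block is omitted the command uses an all-ones end value, so the computed length is either very large or, for a start of 0, wraps to 0; the sketch makes it easy to see which of the two early checks such a length would hit.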


diff --git a/debugfs/debug_cmds.ct b/debugfs/debug_cmds.ct
index c6f6d6c..34dad9e 100644
--- a/debugfs/debug_cmds.ct
+++ b/debugfs/debug_cmds.ct
@@ -157,6 +157,9 @@ request do_dirsearch, "Search a directory for a particular filename",
 request do_bmap, "Calculate the logical->physical block mapping for an inode",
 	bmap;
 
+request do_fallocate, "Allocate uninitialized blocks to an inode",
+	fallocate;
+
 request do_punch, "Punch (or truncate) blocks from an inode by deallocating them",
 	punch, truncate;
 
diff --git a/debugfs/debugfs.8.in b/debugfs/debugfs.8.in
index 9a09cbf..a463c73 100644
--- a/debugfs/debugfs.8.in
+++ b/debugfs/debugfs.8.in
@@ -306,6 +306,13 @@ from the file \fIfilespec\fR.
 Expand the directory
 .IR filespec .
 .TP
+.BI fallocate " filespec start_block [end_block]"
+Allocate and map uninitialized blocks into \fIfilespec\fR between
+logical block \fIstart_block\fR and \fIend_block\fR, inclusive.  If
+\fIend_block\fR is not supplied, this function maps until it runs out
+of free disk blocks or the maximum file size is reached.  Existing
+mappings are left alone.
+.TP
 .BI feature " [fs_feature] [-fs_feature] ..."
 Set or clear various filesystem features in the superblock.  After setting
 or clearing any filesystem features that were requested, print the current
diff --git a/debugfs/debugfs.c b/debugfs/debugfs.c
index 4b88f73..2af6b71 100644
--- a/debugfs/debugfs.c
+++ b/debugfs/debugfs.c
@@ -2196,6 +2196,42 @@ void do_punch(int argc, char *argv[])
 		return;
 	}
 }
+
+void do_fallocate(int argc, char *argv[])
+{
+	ext2_ino_t	ino;
+	blk64_t		start, end;
+	int		err;
+	errcode_t	errcode;
+
+	if (common_args_process(argc, argv, 3, 4, argv[0],
+				"<file> start_blk [end_blk]",
+				CHECK_FS_RW | CHECK_FS_BITMAPS))
+		return;
+
+	ino = string_to_inode(argv[1]);
+	if (!ino)
+		return;
+	err = strtoblk(argv[0], argv[2], "logical block", &start);
+	if (err)
+		return;
+	if (argc == 4) {
+		err = strtoblk(argv[0], argv[3], "logical block", &end);
+		if (err)
+			return;
+	} else
+		end = ~0;
+
+	errcode = ext2fs_fallocate(current_fs, EXT2_FALLOCATE_INIT_BEYOND_EOF,
+				   ino, NULL, ~0ULL, start, end - start + 1);
+
+	if (errcode) {
+		com_err(argv[0], errcode,
+			"while fallocating inode %u from %llu to %llu\n", ino,
+			(unsigned long long) start, (unsigned long long) end);
+		return;
+	}
+}
 #endif /* READ_ONLY */
 
 void do_symlink(int argc, char *argv[])
diff --git a/debugfs/debugfs.h b/debugfs/debugfs.h
index e163d0a..76bb22c 100644
--- a/debugfs/debugfs.h
+++ b/debugfs/debugfs.h
@@ -166,6 +166,7 @@ extern void do_imap(int argc, char **argv);
 extern void do_set_current_time(int argc, char **argv);
 extern void do_supported_features(int argc, char **argv);
 extern void do_punch(int argc, char **argv);
+extern void do_fallocate(int argc, char **argv);
 extern void do_symlink(int argc, char **argv);
 
 extern void do_dump_mmp(int argc, char **argv);
diff --git a/tests/d_fallocate/expect.gz b/tests/d_fallocate/expect.gz
new file mode 100644
index 0000000000000000000000000000000000000000..3e6ffc38595b7c451e3de0df3a16f795cf33421b
GIT binary patch
literal 3770
zcmZ9Nc{~$-{KqA}=E^ZmjxiFcMP=1u5+X-hxpK>W6gKA^k(*pYteD9aQf!j@-ZF9@
zIa|z%<Svs4zxDn8{`ft9|GXaW_xt(29`DEJ@%Tt50odT_7N+b+X*TO=5_LlRIp3=+
zUYwp2x!ot`yF<Qymm`s%12x3vEwp8VZp|Eg<<)~-o3D>}6x37KoE6w$b-7eoz5PW1
zRlzvTCtl@Ca_ekW)!OBmnT6E_^6ti0RNcX3*ym3z4_+tK^0)YW4x6-X(}3NZ$?Yn;
zpW7c8^!D0oyV(e<b;fMYm3tHG>l)x3GR?cK)g#dQKH*A*28L(XgRF}W72n*d5h_i2
zZ9e<RJ+!35=U3AnvY+-bH}KeC?)G9uRcqsv*F=Bc;CI}<d=w+*;NFHzMDmsp7tF@i
zetoU(#QduFAt`U!TTRm8-R|%2DEDu{KlhuM%ss?&vdQ@kSnHz0&-`rSWEPjL&*eR4
zfZ-G_yMBM4N!t(4Q(fx0J<+sUyT86L(NDvT4f5@-PWVjhF}Fi#6n!VS*p{B8^qNL^
zv*)bWMxuGpGMU!4(UE`U#^Agv)gFc%tNV`Z53JjN2|o1p3vFxC2%{;vH$@F*+!#@?
zZQ1ein+V5NH0^1>y$bUTcpL5)v9~=G=Kb8RG-x^ESF`2eIHJj_#p<x}V5oWW^Xiq=
zJ4=fn*8=?>F=bkIe$#N;VNB1K7B8LOKgb_f9X^)1rFb413{FRo@;zIZ#KF_{?jy9b
zBb}Q*x|a?oy=!xu7{dbhq%|H{Xmp9(nn*bxTkCbB<Q^$SV+>h9MR$43`L9$(j&kDH
z&<}x?UNZ+vW)Dlz@(gyA%(g&#Ebx<%na`i1;W12@jrVioSD_zkpPIi)e+wf~4piyW
z_jVYTIbkb4kv}~UkFsBy-?cr&qEv@(aOQL}hDG9FVtfLcb+*~VaxZ$!N1Qb?-xT*a
z61)lW#%YIRI)o?#2A5J_Q@e>CBK;L3@hN9zs)BK%LZJ#`*aah-hmp0V0zEdI&jeqb
z>oNVoLCeI`6O|p_&!E8=-^Dj@fnZ-eN(%8j77@Jh<!tbeIYm5O-#$*^B?YKFyDE$1
zQyIu|#XEF0j3le{!SgRLoRuh<j}^+la0lYguENQwekEYHN%S>0>Q^m9M{3Zj?|S*N
z0HM=zJ<++_1e-8QWH1&3auoR0Di>FGE@==5IR)SeJZ6N*sA*5loz&PYHVPTs?gwF>
z+eYx(q~NBQTu9gP-e%VZK_m##@D-uIB0%`_V^Qe`huY_NX8+EBBDXIh#Ii6KO{tlY
z;43!&N5kS2f@BDMm<$8LeX3C`IEbWysa7(|5&gIPE#f!t=9dnWGS>PwXZF#N8cMzU
zo86i1ti_>7`b7lyO8qxg0>^(4ENapNT&c&C7*@6k3X|ya8V0+XQftOvbAxKe+AHKR
zmiO<StBH)-kyPtRwDnTjU!|v+tmI>=b8e$<B4`LHL)Y$%3m#hsrl+9J+~%LmLl_{I
zVi;(I1$ub|lalMa6qM&YvT;6ZteyI<ML8;edulCYn@#NMI%uE?thU~c?&8l2;w|!_
z4yd7286ibay9~GB0lP#de3X%j7l`jKd{9cwOqLMWEZyC7O_`gyX`Om7ye2oV4Yx~;
zi|Z|5DniQKd5x`Yj?JBx%02tlF~Y#jgI2NCyXor>H;AvNZAPcKWx=DDe`TL1Npjy*
zqR;!D-r?a1IeCfYQriRSOr8*fgmI`^s>j?dii5I@b!UT=TL?ZWQy$NKHxikTp3_m4
zB+I`&lOh^|->N^61f)NVE&XcriuxWYlAlniF&=5;fGCt&eC#JOD1M&sw2P6<i1{fL
z&t#13-W^X`M=QEgF$FNY@uVy5&nIWu++I#ZR@*hRi8`r9TS#9UJ%M8i2OU-hklKz8
zh%P*5&XdAr=1~hpB^1MP!*6Is5Czi;v#Sy{K*9q`KX^qvvj_7q&{d+ca|w>nuYxAW
zyPQ7M*Mw7q;@>(K88Vn5GHd7ymx)rO#C$uhaWP!w&&JX3Ycdo}C94b0--=c%zncm#
z5;;Q;X~7=3EqiQPp@hk>Q<iJDd?KLS!NoTkhz)JQaw;o%N2}(6t&5O@u8)_QT1x5y
zv~u(!xV`DI-9L8RC`;wfM?_nM`$n?~8hK3Nmg!o4kEXa^B7c@Xf?4p$Prhs75}n(M
zoHpZzdW?YVx6A8~&wexc6+(w=O`lOK0!<8-J{xKfl&@R1KE&+38(cF{3n+B@l^DNw
z>7go{5eJ1<^FX^Jk1llu)zlbi+NqB35D4;w002+a#Q;hx&hiZSUcYU`QM{ts?RIrI
zCfRUij4&|d8q>4*)E23qa80TpQC*d8Q62x88PwkJLyG@Xv|)F>)a}(JK5gClu#4nP
z+20a1qfLn);Q3CGwQXIVOHuc9Ex|_@GtokM)0tw%<FV%2M?4JxMYmTbNC#xbk_KNq
z4!A!Wb3ndx_)k8{zYC>%$gb-`I^f*J8SL|JDQE49ezONH+rNmgE|1{dpE20C{x+*L
z-WGe{RdN6}?=|~As|5}u9e}si7nz59zQP8ZDvOwxf*N}y?9noACmBE8p*z?k(xjSL
zYWGwcrx>LBJgB$_pdw<fH!_$gj5w~cmrgV;yy2-(ED$3Bh<SRnkVp#};qpF>eQK3H
zJ(?bt1{Vf6FpATiFtUP;$=7$=14!2ap<GKP>@q7yuFg6YT+iUjn|5KB;g5V~5P1SX
zfr1huuiq-UPHe+kLIqc#a&8=*qQVJpjsudHAzm(85c0Wmr(PG=hHYP#5#)-jf6tB<
zOsEn`h`;{FF%koy+~c8WL&8bd?~@eqLJ6&`kta){u7A>l@MThA=LD}<xyP*!09%^|
z66jhgg)#UUm0;+yE^q%vM{p+>F8PEKj6AbA(xW4iAaM>x#x0PP!6kGSCpNzxgw`|+
zY2{AG#w|HoHV>hUEkVk#Pjo!K#=n66u-cXtdf=xzIuaW<+8wy5MUJXgdpqP?5WD@3
zmmf%RU`2j2&esi*%Vb^j(Gu0!h3QMN<EVogtHvSmfqe4--Z%6=9`F1ME<p<jK?k2n
z!CT9CSnIW8Y5kPXgepJT#~0}H0bIyk4)K?U%RL)A6p!M`%*bclX$oQ7{G!AMy}vx2
z?6t^KV^+AJouUnk2Y`{j7%H$eChl#y0^Uet3Fr<_iRP#^iR_N$n0MtA?qVldxT`5(
z&w~KB2!W&Q+)k109%J6OAr4T%`p34`AiWd390M=^BX9s^kXtT!01309R-Pl|vjxdP
z;pNjvmW5_nxzWjOCJs}*fZjVLL}TCk6XV>y?G$mVGtlU05AA@?i>RyoA6cP>OEsnz
z|8R}QEyuXby%MhkW=EPl#FP{H@H3iEzLXO`#m1cS>x=@V#<)NPRZhN+Q3HYS)+X>x
zGbaZ+WulL;^NK{Ao6ng7+FxQ3kN;1^bN`p1{*oe~eKgQsC3L=`D=G$fty@R^cySvR
zLVgYLfa(c2{JyF=Vu3|<8a)HarTqK674*b;g%^qPecTzWicz1=n!lOw{lWzf{gawJ
zHkiA9QN`&H&W@-u56uxhY!vS(eoIYJPux}$uzjka;VeL$+fvY(FC!KiE2(tSc{oc8
z#KGGc@K+0QJeGpcgi2sF#6r+k!C8|C2`R_2T0Jw6Vc;7cRKz4hwEG-oBmh&UJYst<
z^h&ToV|o|LhQDiR5%@jlm|$)(&6GaQOF&#_)C=8DBZdx|cTQHW8CwXBB~&ztfH*GA
zXWQ}cMu~3*{F@O*D<MM4gxa=@Ef6s)%Sy4Fd!;drtSZ4%;XNg&F;4GWr-p2ft*aT~
zg4v*E*VsTEII{qO$M5CcMBO+oIqYjMp6!<1sm`+0WD#5Zac7~hsSmFqNprpCu77zT
z+8nE6j(yX|Q#~2v<-&4^ZlW%n@6EbWh+rB(TAeMIn%O4y;Si?SeY<;>@2d}_GQ6Ui
z_iNglVWrzV)c_$c7h|kWg8SLbD5%zrCb(j^?2Vc%eNB)|;hS53`RaTzBuQ}CT%OZ%
zWxVzxi`Ud7Wkkt}>hXNfa7|<S%N<78F9eoW7suT-CX+LArX0m}FYpT_dARnvpKyWM
zSn85b7ttNW`X3(QFM1po&k3W}v_*U|y5ho~|8&5g^TN#%l`vyf?bCS$4oiJ^hxzhF
z`URi3?%W#YlRn;0X+ATm;4z=LX`-W&ESqZz1FF=A<$p5o)K7B~L4JF-9@&nqH!XGw
zmCX3oBzEzMYh!JUpZdUyZgTM-VL3=Ys9#R93;5(YfBm>=$);8sFj+y-HWFKT{vIq7
zHTT}P`92j>GJmR2%IyO3!8{n=m3uBnA1+c<4@cr*B;^zwGs-#Q>xdcjKk)7Sd$l0E
zM87Dj?Gu7cGge2?Mk%}}1bswDMJRs1&gM@Bb3W9j5wuwkQe1^tDr<f%6w}@0#U<I7
zP4sFH-$FD*lU)stNmL}C6zRVqDOu~gY54c|E+k!=T>78`ZLw>V&5cP)Kju#pB_$^r
zgtQdPbyOPN<Y?FDcDr~?M@qgc9G>5&0}X%WE<b-q?xwIho8;gZSy0ZIoJ6=tQ-HoV
zEu2rfMA?yEUidsgd)UhD?g<^Jjgq57KKc>KXv=E@A(CVPUw+TjQ4y(A>I9jeaqJ1v
zBuNB=_tfSLbfA{X<&#E@{(-tehQGi$H2I2_0M_j#_cTP>YRsYFkl@7t|7*AGejO%N
zy$NfuuiEobNcpz#GkoGv=90I-n6%kXN=OmmWm}ccfn!UpeMD18!|&Cqe%$uxxnQ8@
zY|Ii}ow74sH@A>zvA+S5FVpeaq@1XYUXmEKBAoovg9|!n*z>SE*!iyWb2q>0@L;}l
y0k>{<MPq+8UfTX(XFh&w5ttwuF<UycB^qw5mO16~r}$&Uk<ipHmv5P~vHcHq|7W-W

literal 0
HcmV?d00001

diff --git a/tests/d_fallocate/name b/tests/d_fallocate/name
new file mode 100644
index 0000000..72d0ed3
--- /dev/null
+++ b/tests/d_fallocate/name
@@ -0,0 +1 @@
+fallocate sparse files and big files
diff --git a/tests/d_fallocate/script b/tests/d_fallocate/script
new file mode 100644
index 0000000..ae8956e
--- /dev/null
+++ b/tests/d_fallocate/script
@@ -0,0 +1,175 @@
+if test -x $DEBUGFS_EXE; then
+
+FSCK_OPT=-fy
+OUT=$test_name.log
+if [ -f $test_dir/expect.gz ]; then
+	EXP=$test_name.tmp
+	gunzip < $test_dir/expect.gz > $EXP
+else
+	EXP=$test_dir/expect
+fi
+
+cp /dev/null $OUT
+
+cat > $TMPFILE.conf << ENDL
+[fs_types]
+ext4 = {
+        base_features = sparse_super,filetype,resize_inode,dir_index,ext_attr,^has_journal,extent,huge_file,flex_bg,uninit_bg,dir_nlink,extra_isize,64bit
+        blocksize = 1024
+        inode_size = 256
+        inode_ratio = 16384
+}
+ENDL
+MKE2FS_CONFIG=$TMPFILE.conf $MKE2FS -F -o Linux -b 1024 -O ^bigalloc -T ext4 $TMPFILE 65536 2>&1 | sed -f $cmd_dir/filter.sed >> $OUT 2>&1
+rm -rf $TMPFILE.conf
+
+$FSCK -fy -N test_filesys $TMPFILE > $OUT.new 2>&1
+status=$?
+echo Exit status is $status >> $OUT.new
+sed -f $cmd_dir/filter.sed -e "s;$TMPFILE;test.img;" $OUT.new >> $OUT
+rm -f $OUT.new
+
+echo "debugfs write files" >> $OUT
+make_file() {
+	name="$1"
+	start="$2"
+	flag="$3"
+
+	cat << ENDL
+write /dev/null $name
+sif /$name size 40960
+eo /$name
+set_bmap $flag 10 $((start + 10))
+set_bmap $flag 13 $((start + 13))
+set_bmap $flag 26 $((start + 26))
+set_bmap $flag 29 $((start + 29))
+ec
+sif /$name blocks 8
+setb $((start + 10))
+setb $((start + 13))
+setb $((start + 26))
+setb $((start + 29))
+ENDL
+}
+
+#Files we create:
+# a: fallocate a 40k file
+# b*: falloc sparse file starting at b*
+# c*: falloc sparse file ending at c*
+# d: midcluster to midcluster, surrounding sparse
+# e: partial middle cluster alloc
+# f: one big file
+# g*: falloc sparse init file starting at g*
+# h*: falloc sparse init file ending at h*
+# i: midcluster to midcluster, surrounding sparse init
+# j: partial middle cluster alloc
+# k: one big init file
+base=5000
+cat > $TMPFILE.cmd << ENDL
+write /dev/null a
+sif /a size 40960
+fallocate /a 0 39
+ENDL
+echo "ex /a" >> $TMPFILE.cmd2
+
+make_file sample $base --uninit >> $TMPFILE.cmd
+echo "ex /sample" >> $TMPFILE.cmd2
+base=10000
+
+for i in 8 9 10 11 12 13 14 15; do
+	make_file b$i $(($base + (40 * ($i - 8)))) --uninit >> $TMPFILE.cmd
+	echo "fallocate /b$i $i 39" >> $TMPFILE.cmd
+	echo "ex /b$i" >> $TMPFILE.cmd2
+done
+
+for i in 24 25 26 27 28 29 30 31; do
+	make_file c$i $(($base + 320 + (40 * ($i - 24)))) --uninit >> $TMPFILE.cmd
+	echo "fallocate /c$i 0 $i" >> $TMPFILE.cmd
+	echo "ex /c$i" >> $TMPFILE.cmd2
+done
+
+make_file d $(($base + 640)) --uninit >> $TMPFILE.cmd
+echo "fallocate /d 4 35" >> $TMPFILE.cmd
+echo "ex /d" >> $TMPFILE.cmd2
+
+make_file e $(($base + 680)) --uninit >> $TMPFILE.cmd
+echo "fallocate /e 19 20" >> $TMPFILE.cmd
+echo "ex /e" >> $TMPFILE.cmd2
+
+cat >> $TMPFILE.cmd << ENDL
+write /dev/null f
+sif /f size 1024
+eo /f
+set_bmap --uninit 0 9000
+ec
+sif /f blocks 2
+setb 9000
+fallocate /f 0 8999
+ENDL
+echo "ex /f" >> $TMPFILE.cmd2
+
+# Now do it again, but with initialized blocks
+base=20000
+for i in 8 9 10 11 12 13 14 15; do
+	make_file g$i $(($base + (40 * ($i - 8)))) >> $TMPFILE.cmd
+	echo "fallocate /g$i $i 39" >> $TMPFILE.cmd
+	echo "ex /g$i" >> $TMPFILE.cmd2
+done
+
+for i in 24 25 26 27 28 29 30 31; do
+	make_file h$i $(($base + 320 + (40 * ($i - 24)))) >> $TMPFILE.cmd
+	echo "fallocate /h$i 0 $i" >> $TMPFILE.cmd
+	echo "ex /h$i" >> $TMPFILE.cmd2
+done
+
+make_file i $(($base + 640)) >> $TMPFILE.cmd
+echo "fallocate /i 4 35" >> $TMPFILE.cmd
+echo "ex /i" >> $TMPFILE.cmd2
+
+make_file j $(($base + 680)) >> $TMPFILE.cmd
+echo "fallocate /j 19 20" >> $TMPFILE.cmd
+echo "ex /j" >> $TMPFILE.cmd2
+
+cat >> $TMPFILE.cmd << ENDL
+write /dev/null k
+sif /k size 1024
+eo /k
+set_bmap 0 19000
+ec
+sif /k blocks 2
+setb 19000
+fallocate /k 0 8999
+sif /k size 9216000
+ENDL
+echo "ex /k" >> $TMPFILE.cmd2
+
+$DEBUGFS_EXE -w -f $TMPFILE.cmd $TMPFILE > /dev/null 2>&1
+$DEBUGFS_EXE -f $TMPFILE.cmd2 $TMPFILE >> $OUT.new 2>&1
+sed -f $cmd_dir/filter.sed < $OUT.new >> $OUT
+rm -rf $OUT.new $TMPFILE.cmd $TMPFILE.cmd2
+
+$FSCK -fy -N test_filesys $TMPFILE > $OUT.new 2>&1
+status=$?
+echo Exit status is $status >> $OUT.new
+sed -f $cmd_dir/filter.sed -e "s;$TMPFILE;test.img;" $OUT.new >> $OUT
+rm -f $OUT.new
+
+rm -f $TMPFILE
+
+cmp -s $OUT $EXP
+status=$?
+
+if [ "$status" = 0 ] ; then
+	echo "$test_name: $test_description: ok"
+	touch $test_name.ok
+else
+	echo "$test_name: $test_description: failed"
+	diff $DIFF_OPTS $EXP $OUT > $test_name.failed
+	rm -f $test_name.tmp
+fi
+
+unset IMAGE FSCK_OPT OUT EXP
+
+else #if test -x $DEBUGFS_EXE; then
+	echo "$test_name: $test_description: skipped"
+fi
diff --git a/tests/d_fallocate_bigalloc/expect.gz b/tests/d_fallocate_bigalloc/expect.gz
new file mode 100644
index 0000000000000000000000000000000000000000..8640bc29dc71b09810f110d07d84d883b1109b88
GIT binary patch
literal 2673
zcmV-%3Xb(3iwFQXkyum$1KnF)Z``;QeGTSU5GV@l7Fk=o<fTLdMX}vYfB;!^fwTzv
zR7`BeR&CE1GviItUtftl8dDm5kloQs2MB0octq(O>GE7&&g$cOx2ZO_SK@G2ici(;
z`r%=FEk2j)!}G2b)n4om)x(2$-rSTsaZ~<QK5U;J%gtf+p)AGS;qbJ-y1XpE9`wbR
z>b|<TFL#^r;bObHy{u};*NdmSr^~+{PrKcT$8B@oP5tNN`cQ2*tJP1tdbIi)#OLav
z+<)62%17~~I@}4vtc;NN;!_R2-;4F;M(E6%8os?L_p8;9%~?J6cDH?g5{LDt`V?29
z{=9ni>Snts#l&azaaTF8JKz5b-akK;yWk3gyI5aeZ=W|9;&{j52&>hv>-}EHEAi7^
zd3}GpnEh+q^lNv{ef6LEeqC1zy53Z~^7^pdeXCX4JzqDH`A%xE)Ae?<ssF5gs}A3M
zKOOYjm7mL9xw$TdyJ!r@K@ZRI{`v87U7sz~@KEj#|H6%52`Mj~OzuC#clJX5QEaxG
zAL{WB)$Q~4dH?<Ex|7zIUSWTyt2bY(LwzIllecdkyliz-etN$Bycb_;rb~ae_W!O#
z`Bhx5SMQrAQoPw5c2zx;|GeMcR@dtXfAH(wx4k=hUv6&eXX@?4`gXq(*`>(pZ{=nE
z&-Kg14^5-2a97qGNm(fOozBER>X)QCbVuK>AD<q|b6kRMx=CC@-3re9k!#&YwwKW2
z=!jFmDc4^Cr$%t9#gXy2x~=>BBi9b&^bu!)v#!4d&h~H}sy-EGqe&FFU8<SNvgW96
z>m&E!S(-?>{>tOX(<CQo(!dq2zXq-t8SjLq`iW}#TaQz=H(r+}dp??MNR!6!44QN#
zO$?e$B25M~nM9flt}~yeNoMDN>l{xu^W$6aq#}8;o+p{5@gy@1TpCX@^E^p8!;@Tw
zC#gJ7Ql;@E6?oE#JZbQx6M4dOFL{P1S|W<q<L2NgpO4}c@|2UuVFIeZjN_>QPeme6
z1$eSEJgH@PBI$Xe_}R|!L<r-Q7{&=<oD#!0A&k>Z7?*k>o)|oFB2S3plo-beahww4
zI3bSHTpZ`$!;=gRoXo=Ui>%|-Rkmt+wg5gQ7J<KMd&B}qN8F3uYWjm%t446@h-2W`
z^=IIC1gDQU6AEID!zoM{QuT4%;yQvTOc-i<!nh`VHbF!vh-!@oK6*e95za7*ydVlQ
z%;}Au2;!b}g&9Vj&M<vJj!(LF{WWlUPLRMiH~k@qHY2zbLC6t;nW<?feDgK&`5a+~
zUju&g6MiQniORum5g$n$PnelhVMZLoP=3+$2T#QePi7gObmn=|GL0vVnMqR`PZ~3m
zrZk>3W+qKDJmm}Vq`?y>@`Ra5C(KN_>kpnxK9bRl@PwJEX(xFq7UHP@Pd1S!Of;OB
zVK6cbj?XY0PmBqpX(xHI%kadA|E$t@V#I$=%rF@7pLHTnh~qpH$5plvPl)537{?iL
zoQp)B5XU(o!^m8J@YEVC8sSMU!;?WAH;HlFAdZ{F48tIfn}iHQy8fOgqhv4scX+xv
zA4@u932%XE?V62b2~pf6<`)K0+$7`|?D~VH9QHXMWT{*xyD+KQg+X><60-|~pl>3w
zi#90v@jN|w)_>Wi&1{|-;L^-&p0(pm2Eor~&n=oqq_DP|TL>A-Ax#HRy9JxMSkSca
z4#j{Rz`4a{>3Z7ZVza|F?SLE&PWqm1;mNJOVpUO(ji+0jPxhJx26qzRiZL9wIR92P
z`CW4o%$<_PaHhp60?w@hVJ?wo441b!cSlXzcrik3rBObMBM!W<OM<}*jq+k!9D^6*
zauDlR#}rnpPH-}7aR$8P4KG@GoK|Bv+2V5Wk~`0c)i|Ax;i$zG;H7AIVd-%!$8g8z
zr2sFM9XY`981DGISkFtd?%d3+z>V_KJ}-XVxo)MKbHIxk<)wXIq+g}1Upe>M`}-^W
zczID2gRlK<alZGyIELZ+q04$<m^xwD^om7<ViASKqD}xdodC?Prvif8*|-yWO=p2d
zxt_)&;1>qpb&m^XLPBAQ!opG~=$f8WAe5Bq_QffL@3%)N3JG&ZbSy%N5xwmB^s?db
zx~G>M(94E;Y24vU)oM;JhF%=47eg<TsF#6WCR#6pyUC;V%Ar?2t=I3XhK>0ga`VrG
z&oRBS5TSfVVfk!MFblzKv|!c?)|T301@j?~!cyCeV7|y!w=z~RA0FvgmKzmJE+beo
z@h1u9>v$B_@n!_`6}+a66-+7!rV|Cz5KKo4hIk=4S}=xSJR=w_B$!DMENcKc1bZnc
z4hTk}U<e|zO_lVlU^xUU5(O(DSP?B)0l{pvU>0{{X9QEr2u4|=V7{_RVP$jX4I_9C
ziuN1?&q0ZvgWx$R+H(**2Ss}hg6E(a&mr|fg6UMj7=m%MU@#pNZ8`|1gAz>#!E{ix
z=^&U6iZ&gDctLZfgBKEP+H~Zng28kIrQ42Rd2d;TP_(76XxkAi@0Du_<yx94*XE}}
z!>p)Lv%)Q|Bea59Q3Gbh91)?=3T8zOn3cRmbc9wgD{9)T+$~cQuE$koswABEmTd?n
z;ijEfwgE0o`xN=KuPr5qWU7+~gk4Vs#OWJ7EVP1=P-7|kfY7?>50f)qc7BojWu=Z#
zcBZiGd@6UaMn@<(Q&@2B3(X4&olgpFPK7oQ+Dr+3n%^q}q4Qy(PeZq_W;gAy(5InW
z!GMgeJq3YShl0Tz1@-UKK-?7!<|wFq_XRV{2&QGCV7^G+v}uBAEXZgXE0~5k(lS;s
z4FfW|=r<ymhB*of@iT&Hn4_Qs-xn-jNH9(nOhYi8D42#|94#2kk>+T@U_eG!TL%Pl
z_XKkk><E|>4CW}<3(yy=SV%CNDp&!*ibTN*2xg-NgE`VRRxpN<7+vNb5Uepd43onn
zCg)TzhRF#wCG-We%Lv9QQ80#CVM-H>VOChh3dS%itfB>jSz%5T3}%IOv|unDJZd^r
zwvb?{rh{QRIMH-4Ob17s4u<LAXw$(k9c&W?gX!QRLNJxNo)GM1n?_$Sxr|^Y)pQt`
z4wGm)3`~cKHXR10!$g}71JhwtnqUT|!%&Q1()F|$=_T*$2v(g>E*;9%vmD9{WN$i@
z3&F$alMQ>5*_(moFdUn`8CVX(vDuq}<uJKTO4AK2hcU5fx`E{|c}$w_?glfzX;)+Q
zV4D8#WmaQyNhw=h2wPn!-0Cv9$%U}Vg~ClPliOPe+gm7cdkbNE3x(TTCO5VaHnvc>
zv87A<{W<nwsa|Z8*r6iB9V(q2SSol;TrIxcn_{h>BAPuh13R!3wym@^{QAzkPbqR|
z-ilyP%m2milHcyivi<eALyIrFZSza>?GC6vWa2x{GkWd6FK_<LhRpXDtGCtHzld+m
zugu3z@?D*GN%I2KT&vJQ_I?-rr{*KxzI*>heE8>wzrXoKeEji)_~mauzkB=c&CmYp
f9)!C;rDnRY1HYwqzH#!$)N=J-JwLQM>OTMg)z2kH

literal 0
HcmV?d00001

diff --git a/tests/d_fallocate_bigalloc/name b/tests/d_fallocate_bigalloc/name
new file mode 100644
index 0000000..915645c
--- /dev/null
+++ b/tests/d_fallocate_bigalloc/name
@@ -0,0 +1 @@
+fallocate sparse files and big files with bigalloc
diff --git a/tests/d_fallocate_bigalloc/script b/tests/d_fallocate_bigalloc/script
new file mode 100644
index 0000000..6b6bf97
--- /dev/null
+++ b/tests/d_fallocate_bigalloc/script
@@ -0,0 +1,176 @@
+if test -x $DEBUGFS_EXE; then
+
+FSCK_OPT=-fy
+OUT=$test_name.log
+if [ -f $test_dir/expect.gz ]; then
+	EXP=$test_name.tmp
+	gunzip < $test_dir/expect.gz > $EXP
+else
+	EXP=$test_dir/expect
+fi
+
+cp /dev/null $OUT
+
+cat > $TMPFILE.conf << ENDL
+[fs_types]
+ext4 = {
+	cluster_size = 8192
+        base_features = sparse_super,filetype,resize_inode,dir_index,ext_attr,^has_journal,extent,huge_file,flex_bg,uninit_bg,dir_nlink,extra_isize,64bit
+        blocksize = 1024
+        inode_size = 256
+        inode_ratio = 16384
+}
+ENDL
+MKE2FS_CONFIG=$TMPFILE.conf $MKE2FS -F -o Linux -b 1024 -O bigalloc -T ext4 $TMPFILE 65536 2>&1 | sed -f $cmd_dir/filter.sed >> $OUT 2>&1
+rm -rf $TMPFILE.conf
+
+$FSCK -fy -N test_filesys $TMPFILE > $OUT.new 2>&1
+status=$?
+echo Exit status is $status >> $OUT.new
+sed -f $cmd_dir/filter.sed -e "s;$TMPFILE;test.img;" $OUT.new >> $OUT
+rm -f $OUT.new
+
+echo "debugfs write files" >> $OUT
+make_file() {
+	name="$1"
+	start="$2"
+	flag="$3"
+
+	cat << ENDL
+write /dev/null $name
+sif /$name size 40960
+eo /$name
+set_bmap $flag 10 $((start + 10))
+set_bmap $flag 13 $((start + 13))
+set_bmap $flag 26 $((start + 26))
+set_bmap $flag 29 $((start + 29))
+ec
+sif /$name blocks 32
+setb $((start + 10))
+setb $((start + 13))
+setb $((start + 26))
+setb $((start + 29))
+ENDL
+}
+
+#Files we create:
+# a: fallocate a 40k file
+# b*: falloc sparse file starting at b*
+# c*: falloc sparse file ending at c*
+# d: midcluster to midcluster, surrounding sparse
+# e: partial middle cluster alloc
+# f: one big file
+# g*: falloc sparse init file starting at g*
+# h*: falloc sparse init file ending at h*
+# i: midcluster to midcluster, surrounding sparse init
+# j: partial middle cluster alloc
+# k: one big init file
+base=5000
+cat > $TMPFILE.cmd << ENDL
+write /dev/null a
+sif /a size 40960
+fallocate /a 0 39
+ENDL
+echo "ex /a" >> $TMPFILE.cmd2
+
+make_file sample $base --uninit >> $TMPFILE.cmd
+echo "ex /sample" >> $TMPFILE.cmd2
+base=10000
+
+for i in 8 9 10 11 12 13 14 15; do
+	make_file b$i $(($base + (40 * ($i - 8)))) --uninit >> $TMPFILE.cmd
+	echo "fallocate /b$i $i 39" >> $TMPFILE.cmd
+	echo "ex /b$i" >> $TMPFILE.cmd2
+done
+
+for i in 24 25 26 27 28 29 30 31; do
+	make_file c$i $(($base + 320 + (40 * ($i - 24)))) --uninit >> $TMPFILE.cmd
+	echo "fallocate /c$i 0 $i" >> $TMPFILE.cmd
+	echo "ex /c$i" >> $TMPFILE.cmd2
+done
+
+make_file d $(($base + 640)) --uninit >> $TMPFILE.cmd
+echo "fallocate /d 4 35" >> $TMPFILE.cmd
+echo "ex /d" >> $TMPFILE.cmd2
+
+make_file e $(($base + 680)) --uninit >> $TMPFILE.cmd
+echo "fallocate /e 19 20" >> $TMPFILE.cmd
+echo "ex /e" >> $TMPFILE.cmd2
+
+cat >> $TMPFILE.cmd << ENDL
+write /dev/null f
+sif /f size 1024
+eo /f
+set_bmap --uninit 0 9000
+ec
+sif /f blocks 16
+setb 9000
+fallocate /f 0 8999
+ENDL
+echo "ex /f" >> $TMPFILE.cmd2
+
+# Now do it again, but with initialized blocks
+base=20000
+for i in 8 9 10 11 12 13 14 15; do
+	make_file g$i $(($base + (40 * ($i - 8)))) >> $TMPFILE.cmd
+	echo "fallocate /g$i $i 39" >> $TMPFILE.cmd
+	echo "ex /g$i" >> $TMPFILE.cmd2
+done
+
+for i in 24 25 26 27 28 29 30 31; do
+	make_file h$i $(($base + 320 + (40 * ($i - 24)))) >> $TMPFILE.cmd
+	echo "fallocate /h$i 0 $i" >> $TMPFILE.cmd
+	echo "ex /h$i" >> $TMPFILE.cmd2
+done
+
+make_file i $(($base + 640)) >> $TMPFILE.cmd
+echo "fallocate /i 4 35" >> $TMPFILE.cmd
+echo "ex /i" >> $TMPFILE.cmd2
+
+make_file j $(($base + 680)) >> $TMPFILE.cmd
+echo "fallocate /j 19 20" >> $TMPFILE.cmd
+echo "ex /j" >> $TMPFILE.cmd2
+
+cat >> $TMPFILE.cmd << ENDL
+write /dev/null k
+sif /k size 1024
+eo /k
+set_bmap 0 19000
+ec
+sif /k blocks 16
+setb 19000
+fallocate /k 0 8999
+sif /k size 9216000
+ENDL
+echo "ex /k" >> $TMPFILE.cmd2
+
+$DEBUGFS_EXE -w -f $TMPFILE.cmd $TMPFILE > /dev/null 2>&1
+$DEBUGFS_EXE -f $TMPFILE.cmd2 $TMPFILE >> $OUT.new 2>&1
+sed -f $cmd_dir/filter.sed < $OUT.new >> $OUT
+rm -rf $OUT.new $TMPFILE.cmd $TMPFILE.cmd2
+
+$FSCK -fy -N test_filesys $TMPFILE > $OUT.new 2>&1
+status=$?
+echo Exit status is $status >> $OUT.new
+sed -f $cmd_dir/filter.sed -e "s;$TMPFILE;test.img;" $OUT.new >> $OUT
+rm -f $OUT.new
+
+rm -f $TMPFILE
+
+cmp -s $OUT $EXP
+status=$?
+
+if [ "$status" = 0 ] ; then
+	echo "$test_name: $test_description: ok"
+	touch $test_name.ok
+else
+	echo "$test_name: $test_description: failed"
+	diff $DIFF_OPTS $EXP $OUT > $test_name.failed
+	rm -f $test_name.tmp
+fi
+
+unset IMAGE FSCK_OPT OUT EXP
+
+else #if test -x $DEBUGFS_EXE; then
+	echo "$test_name: $test_description: skipped"
+fi
diff --git a/tests/d_fallocate_blkmap/expect b/tests/d_fallocate_blkmap/expect
new file mode 100644
index 0000000..f7ae606
--- /dev/null
+++ b/tests/d_fallocate_blkmap/expect
@@ -0,0 +1,58 @@
+Creating filesystem with 65536 1k blocks and 4096 inodes
+Superblock backups stored on blocks: 
+	8193, 24577, 40961, 57345
+
+Allocating group tables:    \b\b\bdone                            
+Writing inode tables:    \b\b\bdone                            
+Writing superblocks and filesystem accounting information:    \b\b\bdone
+
+Pass 1: Checking inodes, blocks, and sizes
+Pass 2: Checking directory structure
+Pass 3: Checking directory connectivity
+Pass 4: Checking reference counts
+Pass 5: Checking group summary information
+test_filesys: 11/4096 files (0.0% non-contiguous), 2340/65536 blocks
+Exit status is 0
+debugfs write files
+debugfs: stat /a
+Inode: 12   Type: regular    Mode:  0666   Flags: 0x0
+Generation: 0    Version: 0x00000000:00000000
+User:     0   Group:     0   Size: 40960
+File ACL: 0    Directory ACL: 0
+Links: 1   Blockcount: 82
+Fragment:  Address: 0    Number: 0    Size: 0
+Size of extra inode fields: 28
+BLOCKS:
+(0-1):1312-1313, (2-11):8000-8009, (IND):8010, (12-39):8011-8038
+TOTAL: 41
+
+debugfs: stat /b
+Inode: 13   Type: regular    Mode:  0666   Flags: 0x0
+Generation: 0    Version: 0x00000000:00000000
+User:     0   Group:     0   Size: 10240000
+File ACL: 0    Directory ACL: 0
+Links: 1   Blockcount: 20082
+Fragment:  Address: 0    Number: 0    Size: 0
+Size of extra inode fields: 28
+BLOCKS:
+(0-11):10000-10011, (IND):10012, (12-267):10013-10268, (DIND):10269, (IND):10270, (268-523):10271-10526, (IND):10527, (524-779):10528-10783, (IND):10784, (780-1035):10785-11040, (IND):11041, (1036-1291):11042-11297, (IND):11298, (1292-1547):11299-11554, (IND):11555, (1548-1803):11556-11811, (IND):11812, (1804-2059):11813-12068, (IND):12069, (2060-2315):12070-12325, (IND):12326, (2316-2571):12327-12582, (IND):12583, (2572-2827):12584-12839, (IND):12840, (2828-3083):12841-13096, (IND):13097, (3084-3339):13098-13353, (IND):13354, (3340-3595):13355-13610, (IND):13611, (3596-3851):13612-13867, (IND):13868, (3852-4107):13869-14124, (IND):14125, (4108-4363):14126-14381, (IND):14382, (4364-4619):14383-14638, (IND):14639, (4620-4875):14640-14895, (IND):14896, (4876-5131):14897-15152, (IND):15153, (5132-5387):15154-15409, (IND):15410, (5388-5643):15411-15666, (IND):15667, (5644-5899):15668-15923, (IND):15924, (5900-6155):15925-16180, (IND):16181, (6156-6411):16182-16437, (IND):16438, (6412-6667):16439-16694, (IND):16695, (6668-6923):16696-16951, (IND):16952, (6924-7179):16953-17208, (IND):17209, (7180-7435):17210-17465, (IND):17466, (7436-7691):17467-17722, (IND):17723, (7692-7947):17724-17979, (IND):17980, (7948-8203):17981-18236, (IND):18237, (8204-8459):18238-18493, (IND):18494, (8460-8715):18495-18750, (IND):18751, (8716-8971):18752-19007, (IND):19008, (8972-9227):19009-19264, (IND):19265, (9228-9483):19266-19521, (IND):19522, (9484-9739):19523-19778, (IND):19779, (9740-9995):19780-20035, (IND):20036, (9996-9999):20037-20040
+TOTAL: 10041
+
+Pass 1: Checking inodes, blocks, and sizes
+Pass 2: Checking directory structure
+Pass 3: Checking directory connectivity
+Pass 4: Checking reference counts
+Pass 5: Checking group summary information
+Free blocks count wrong for group #0 (6841, counted=6840).
+Fix? yes
+
+Free blocks count wrong for group #1 (1551, counted=1550).
+Fix? yes
+
+Free blocks count wrong (53116, counted=53114).
+Fix? yes
+
+
+test_filesys: ***** FILE SYSTEM WAS MODIFIED *****
+test_filesys: 13/4096 files (7.7% non-contiguous), 12422/65536 blocks
+Exit status is 1
diff --git a/tests/d_fallocate_blkmap/name b/tests/d_fallocate_blkmap/name
new file mode 100644
index 0000000..ba2b61d
--- /dev/null
+++ b/tests/d_fallocate_blkmap/name
@@ -0,0 +1 @@
+fallocate sparse files and big files on a blockmap fs
diff --git a/tests/d_fallocate_blkmap/script b/tests/d_fallocate_blkmap/script
new file mode 100644
index 0000000..9c48cbc
--- /dev/null
+++ b/tests/d_fallocate_blkmap/script
@@ -0,0 +1,85 @@
+if test -x $DEBUGFS_EXE; then
+
+FSCK_OPT=-fy
+OUT=$test_name.log
+if [ -f $test_dir/expect.gz ]; then
+	EXP=$test_name.tmp
+	gunzip < $test_dir/expect.gz > $EXP
+else
+	EXP=$test_dir/expect
+fi
+
+cp /dev/null $OUT
+
+cat > $TMPFILE.conf << ENDL
+[fs_types]
+ext4 = {
+        base_features = sparse_super,filetype,resize_inode,dir_index,ext_attr,^has_journal,^extent,huge_file,flex_bg,uninit_bg,dir_nlink,extra_isize,^64bit
+        blocksize = 1024
+        inode_size = 256
+        inode_ratio = 16384
+}
+ENDL
+MKE2FS_CONFIG=$TMPFILE.conf $MKE2FS -F -o Linux -b 1024 -O ^bigalloc -T ext4 $TMPFILE 65536 2>&1 | sed -f $cmd_dir/filter.sed >> $OUT 2>&1
+rm -rf $TMPFILE.conf
+
+$FSCK -fy -N test_filesys $TMPFILE > $OUT.new 2>&1
+status=$?
+echo Exit status is $status >> $OUT.new
+sed -f $cmd_dir/filter.sed -e "s;$TMPFILE;test.img;" $OUT.new >> $OUT
+rm -f $OUT.new
+
+echo "debugfs write files" >> $OUT
+
+#Files we create:
+# a: fallocate a 40k file
+# b: one big file
+base=5000
+cat > $TMPFILE.cmd << ENDL
+write /dev/null a
+sif /a bmap[2] 8000
+sif /a size 40960
+sif /a i_blocks 2
+setb 8000
+fallocate /a 0 39
+
+write /dev/null b
+sif /b size 10240000
+sif /b bmap[0] 10000
+sif /b i_blocks 2
+setb 10000
+fallocate /b 0 9999
+ENDL
+echo "stat /a" >> $TMPFILE.cmd2
+echo "stat /b" >> $TMPFILE.cmd2
+
+$DEBUGFS_EXE -w -f $TMPFILE.cmd $TMPFILE > /dev/null 2>&1
+$DEBUGFS_EXE -f $TMPFILE.cmd2 $TMPFILE >> $OUT.new 2>&1
+sed -f $cmd_dir/filter.sed -e '/^.*time:.*$/d' < $OUT.new >> $OUT
+rm -rf $OUT.new $TMPFILE.cmd $TMPFILE.cmd2
+
+$FSCK -fy -N test_filesys $TMPFILE > $OUT.new 2>&1
+status=$?
+echo Exit status is $status >> $OUT.new
+sed -f $cmd_dir/filter.sed -e "s;$TMPFILE;test.img;" $OUT.new >> $OUT
+rm -f $OUT.new
+
+rm -f $TMPFILE
+
+cmp -s $OUT $EXP
+status=$?
+
+if [ "$status" = 0 ] ; then
+	echo "$test_name: $test_description: ok"
+	touch $test_name.ok
+else
+	echo "$test_name: $test_description: failed"
+	diff $DIFF_OPTS $EXP $OUT > $test_name.failed
+	rm -f $test_name.tmp
+fi
+
+unset IMAGE FSCK_OPT OUT EXP
+
+else #if test -x $DEBUGFS_EXE; then
+	echo "$test_name: $test_description: skipped"
+fi
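
For anyone reading the d_fallocate scripts above, the block arithmetic behind make_file can be sketched on its own. Each b$i file gets a 40-block slot starting at $(($base + (40 * ($i - 8)))), and make_file pre-maps logical blocks 10, 13, 26 and 29 to start plus the same offset. The helper names b_start and premapped_blocks below are hypothetical and not part of the patch; they only restate the expressions used in the script.

```shell
#!/bin/sh
# Hypothetical helper (not in the patch): physical start block for file b$i,
# mirroring $(($base + (40 * ($i - 8)))) with base=10000 from the test script.
b_start() {
	base=10000
	echo $((base + 40 * ($1 - 8)))
}

# Print the four physical blocks that make_file pre-maps for a given start,
# one per line, using the logical offsets 10, 13, 26 and 29 from set_bmap.
premapped_blocks() {
	start="$1"
	for off in 10 13 26 29; do
		echo $((start + off))
	done
}

# Show the layout for a few of the b* files.
for i in 8 9 15; do
	s=$(b_start "$i")
	echo "b$i: start=$s premapped=$(premapped_blocks "$s" | tr '\n' ' ')"
done
```

Because every file owns a disjoint 40-block slot, the pre-mapped blocks of one file should never collide with those of another, which keeps the later fsck pass free of cross-linked block complaints.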


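The TOTAL and Blockcount figures in the d_fallocate_blkmap expect file can be cross-checked by hand. The sketch below is not part of the patch. It assumes 1024-byte blocks with 4-byte block pointers (256 pointers per indirect block), a file small enough to fit under a single double-indirect block, and a Blockcount expressed in 512-byte units. The function name total_blocks is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: total blocks (data plus mapping metadata) for a fully
# mapped block-mapped file. Assumes 12 direct pointers and 256 pointers per
# indirect block, and that no triple-indirect block is needed.
total_blocks() {
	data="$1"
	ptrs=256
	direct=12
	meta=0
	if [ "$data" -gt "$direct" ]; then
		# One single-indirect (IND) block covers logical blocks 12-267.
		meta=$((meta + 1))
	fi
	if [ "$data" -gt $((direct + ptrs)) ]; then
		# One DIND block plus one IND child per 256 remaining data blocks.
		rest=$((data - direct - ptrs))
		meta=$((meta + 1 + (rest + ptrs - 1) / ptrs))
	fi
	echo $((data + meta))
}

for n in 40 10000; do
	t=$(total_blocks "$n")
	echo "$n data blocks -> TOTAL $t, Blockcount $((t * 2))"
done
```

Under those assumptions the 40-block file /a comes to 41 blocks and the 10000-block file /b to 10041, which agrees with the TOTAL lines in the expect output; doubling them gives the Blockcount values 82 and 20082.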

* [PATCH 34/35] tests: test debugfs punch command
  2015-04-02  2:34 [PATCH 00/35] e2fsprogs April 2015 patchbomb Darrick J. Wong
                   ` (32 preceding siblings ...)
  2015-04-02  2:37 ` [PATCH 33/35] debugfs: implement fallocate Darrick J. Wong
@ 2015-04-02  2:37 ` Darrick J. Wong
  33 siblings, 0 replies; 70+ messages in thread
From: Darrick J. Wong @ 2015-04-02  2:37 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

Test punching out various parts of sparse files.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 tests/d_punch/expect          |  208 +++++++++++++++++++++++++++++++++++++++++
 tests/d_punch/name            |    1 
 tests/d_punch/script          |  129 +++++++++++++++++++++++++
 tests/d_punch_bigalloc/expect |  207 +++++++++++++++++++++++++++++++++++++++++
 tests/d_punch_bigalloc/name   |    1 
 tests/d_punch_bigalloc/script |  130 ++++++++++++++++++++++++++
 6 files changed, 676 insertions(+)
 create mode 100644 tests/d_punch/expect
 create mode 100644 tests/d_punch/name
 create mode 100644 tests/d_punch/script
 create mode 100644 tests/d_punch_bigalloc/expect
 create mode 100644 tests/d_punch_bigalloc/name
 create mode 100644 tests/d_punch_bigalloc/script


diff --git a/tests/d_punch/expect b/tests/d_punch/expect
new file mode 100644
index 0000000..764715e
--- /dev/null
+++ b/tests/d_punch/expect
@@ -0,0 +1,208 @@
+Creating filesystem with 65536 1k blocks and 4096 inodes
+Superblock backups stored on blocks: 
+	8193, 24577, 40961, 57345
+
+Allocating group tables:    \b\b\bdone                            
+Writing inode tables:    \b\b\bdone                            
+Writing superblocks and filesystem accounting information:    \b\b\bdone
+
+Pass 1: Checking inodes, blocks, and sizes
+Pass 2: Checking directory structure
+Pass 3: Checking directory connectivity
+Pass 4: Checking reference counts
+Pass 5: Checking group summary information
+test_filesys: 11/4096 files (0.0% non-contiguous), 2345/65536 blocks
+Exit status is 0
+debugfs write files
+debugfs: ex /a
+Level Entries       Logical      Physical Length Flags
+debugfs: ex /sample
+Level Entries       Logical      Physical Length Flags
+ 0/ 1   1/  1     0 - 4294967295  1323              0
+ 1/ 1   1/  5     0 -     9  1313 -  1322     10 Uninit
+ 1/ 1   2/  5    11 -    12  1324 -  1325      2 Uninit
+ 1/ 1   3/  5    14 -    25  1327 -  1338     12 Uninit
+ 1/ 1   4/  5    27 -    28  1340 -  1341      2 Uninit
+ 1/ 1   5/  5    30 -    39  1343 -  1352     10 Uninit
+debugfs: ex /b8
+Level Entries       Logical      Physical Length Flags
+ 0/ 1   1/  1     0 - 4294967295  1390              0
+ 1/ 1   1/  4     0 -     0  1326 -  1326      1 Uninit
+ 1/ 1   2/  4     1 -     1  1339 -  1339      1 Uninit
+ 1/ 1   3/  4     2 -     2  1342 -  1342      1 Uninit
+ 1/ 1   4/  4     3 -     7  1353 -  1357      5 Uninit
+debugfs: ex /b9
+Level Entries       Logical      Physical Length Flags
+ 0/ 1   1/  1     0 - 4294967295  1368              0
+ 1/ 1   1/  1     0 -     8  1358 -  1366      9 Uninit
+debugfs: ex /b10
+Level Entries       Logical      Physical Length Flags
+ 0/ 1   1/  1     0 - 4294967295  1378              0
+ 1/ 1   1/  2     0 -     0  1367 -  1367      1 Uninit
+ 1/ 1   2/  2     1 -     9  1369 -  1377      9 Uninit
+debugfs: ex /b11
+Level Entries       Logical      Physical Length Flags
+ 0/ 1   1/  1     0 - 4294967295  1389              0
+ 1/ 1   1/  1     0 -     9  1379 -  1388     10 Uninit
+debugfs: ex /b12
+Level Entries       Logical      Physical Length Flags
+ 0/ 1   1/  1     0 - 4294967295  1401              0
+ 1/ 1   1/  2     0 -     9  1391 -  1400     10 Uninit
+ 1/ 1   2/  2    11 -    11  1402 -  1402      1 Uninit
+debugfs: ex /b13
+Level Entries       Logical      Physical Length Flags
+ 0/ 1   1/  1     0 - 4294967295  1413              0
+ 1/ 1   1/  2     0 -     9  1403 -  1412     10 Uninit
+ 1/ 1   2/  2    11 -    12  1414 -  1415      2 Uninit
+debugfs: ex /b14
+Level Entries       Logical      Physical Length Flags
+ 0/ 1   1/  1     0 - 4294967295  1426              0
+ 1/ 1   1/  2     0 -     9  1416 -  1425     10 Uninit
+ 1/ 1   2/  2    11 -    12  1427 -  1428      2 Uninit
+debugfs: ex /b15
+Level Entries       Logical      Physical Length Flags
+ 0/ 1   1/  1     0 - 4294967295  1439              0
+ 1/ 1   1/  3     0 -     9  1429 -  1438     10 Uninit
+ 1/ 1   2/  3    11 -    12  1440 -  1441      2 Uninit
+ 1/ 1   3/  3    14 -    14  1443 -  1443      1 Uninit
+debugfs: ex /c24
+Level Entries       Logical      Physical Length Flags
+ 0/ 1   1/  1    25 - 4294967295  1453         4294967271
+ 1/ 1   1/  3    25 -    25  1468 -  1468      1 Uninit
+ 1/ 1   2/  3    27 -    28  1470 -  1471      2 Uninit
+ 1/ 1   3/  3    30 -    39  1473 -  1482     10 Uninit
+debugfs: ex /c25
+Level Entries       Logical      Physical Length Flags
+ 0/ 1   1/  1    27 - 4294967295  1483         4294967269
+ 1/ 1   1/  2    27 -    28  1485 -  1486      2 Uninit
+ 1/ 1   2/  2    30 -    39  1488 -  1497     10 Uninit
+debugfs: ex /c26
+Level Entries       Logical      Physical Length Flags
+ 0/ 1   1/  1    27 - 4294967295  1484         4294967269
+ 1/ 1   1/  2    27 -    28  1498 -  1499      2 Uninit
+ 1/ 1   2/  2    30 -    39  1501 -  1510     10 Uninit
+debugfs: ex /c27
+Level Entries       Logical      Physical Length Flags
+ 0/ 1   1/  1    28 - 4294967295  1487         4294967268
+ 1/ 1   1/  2    28 -    28  1512 -  1512      1 Uninit
+ 1/ 1   2/  2    30 -    39  1514 -  1523     10 Uninit
+debugfs: ex /c28
+Level Entries       Logical      Physical Length Flags
+ 0/ 1   1/  1    30 - 4294967295  1500         4294967266
+ 1/ 1   1/  1    30 -    39  1526 -  1535     10 Uninit
+debugfs: ex /c29
+Level Entries       Logical      Physical Length Flags
+ 0/ 1   1/  1    30 - 4294967295  1511         4294967266
+ 1/ 1   1/  1    30 -    39  1537 -  1546     10 Uninit
+debugfs: ex /c30
+Level Entries       Logical      Physical Length Flags
+ 0/ 1   1/  1    31 - 4294967295  1513         4294967265
+ 1/ 1   1/  1    31 -    39  1549 -  1557      9 Uninit
+debugfs: ex /c31
+Level Entries       Logical      Physical Length Flags
+ 0/ 1   1/  1    32 - 4294967295  1524         4294967264
+ 1/ 1   1/  1    32 -    39  1560 -  1567      8 Uninit
+debugfs: ex /d
+Level Entries       Logical      Physical Length Flags
+ 0/ 1   1/  1     0 - 4294967295  1525              0
+ 1/ 1   1/  3     0 -     0  1442 -  1442      1 Uninit
+ 1/ 1   2/  3     1 -     3  1444 -  1446      3 Uninit
+ 1/ 1   3/  3    36 -    39  1573 -  1576      4 Uninit
+debugfs: ex /e
+Level Entries       Logical      Physical Length Flags
+ 0/ 1   1/  1     0 - 4294967295  1547              0
+ 1/ 1   1/ 11     0 -     5  1447 -  1452      6 Uninit
+ 1/ 1   2/ 11     6 -     9  1454 -  1457      4 Uninit
+ 1/ 1   3/ 11    11 -    12  1459 -  1460      2 Uninit
+ 1/ 1   4/ 11    14 -    18  1462 -  1466      5 Uninit
+ 1/ 1   5/ 11    21 -    21  1472 -  1472      1 Uninit
+ 1/ 1   6/ 11    22 -    22  1536 -  1536      1 Uninit
+ 1/ 1   7/ 11    23 -    23  1548 -  1548      1 Uninit
+ 1/ 1   8/ 11    24 -    25  1558 -  1559      2 Uninit
+ 1/ 1   9/ 11    27 -    28  1569 -  1570      2 Uninit
+ 1/ 1  10/ 11    30 -    30  1572 -  1572      1 Uninit
+ 1/ 1  11/ 11    31 -    39  1577 -  1585      9 Uninit
+debugfs: ex /f
+Level Entries       Logical      Physical Length Flags
+ 0/ 0   1/  2     0 -     0  9000 -  9000      1 Uninit
+ 0/ 0   2/  2  8999 -  8999 17999 - 17999      1 Uninit
+Pass 1: Checking inodes, blocks, and sizes
+Inode 15 extent tree could be shorter.
+	(level 1 is unnecessary)
+Fix? yes
+
+Inode 16 extent tree could be shorter.
+	(level 1 is unnecessary)
+Fix? yes
+
+Inode 17 extent tree could be shorter.
+	(level 1 is unnecessary)
+Fix? yes
+
+Inode 18 extent tree could be shorter.
+	(level 1 is unnecessary)
+Fix? yes
+
+Inode 19 extent tree could be shorter.
+	(level 1 is unnecessary)
+Fix? yes
+
+Inode 20 extent tree could be shorter.
+	(level 1 is unnecessary)
+Fix? yes
+
+Inode 21 extent tree could be shorter.
+	(level 1 is unnecessary)
+Fix? yes
+
+Inode 22 extent tree could be shorter.
+	(level 1 is unnecessary)
+Fix? yes
+
+Inode 23 extent tree could be shorter.
+	(level 1 is unnecessary)
+Fix? yes
+
+Inode 24 extent tree could be shorter.
+	(level 1 is unnecessary)
+Fix? yes
+
+Inode 25 extent tree could be shorter.
+	(level 1 is unnecessary)
+Fix? yes
+
+Inode 26 extent tree could be shorter.
+	(level 1 is unnecessary)
+Fix? yes
+
+Inode 27 extent tree could be shorter.
+	(level 1 is unnecessary)
+Fix? yes
+
+Inode 28 extent tree could be shorter.
+	(level 1 is unnecessary)
+Fix? yes
+
+Inode 29 extent tree could be shorter.
+	(level 1 is unnecessary)
+Fix? yes
+
+Inode 30 extent tree could be shorter.
+	(level 1 is unnecessary)
+Fix? yes
+
+Pass 1E: Optimizing extent trees
+Pass 2: Checking directory structure
+Pass 3: Checking directory connectivity
+Pass 4: Checking reference counts
+Pass 5: Checking group summary information
+Free blocks count wrong for group #1 (7934, counted=7933).
+Fix? yes
+
+Free blocks count wrong (62939, counted=62938).
+Fix? yes
+
+
+test_filesys: ***** FILE SYSTEM WAS MODIFIED *****
+test_filesys: 32/4096 files (43.8% non-contiguous), 2598/65536 blocks
+Exit status is 1
diff --git a/tests/d_punch/name b/tests/d_punch/name
new file mode 100644
index 0000000..724639f
--- /dev/null
+++ b/tests/d_punch/name
@@ -0,0 +1 @@
+punch sparse files and big files
diff --git a/tests/d_punch/script b/tests/d_punch/script
new file mode 100644
index 0000000..7a77c69
--- /dev/null
+++ b/tests/d_punch/script
@@ -0,0 +1,129 @@
+if test -x $DEBUGFS_EXE; then
+
+FSCK_OPT=-fy
+OUT=$test_name.log
+if [ -f $test_dir/expect.gz ]; then
+	EXP=$test_name.tmp
+	gunzip < $test_dir/expect.gz > $EXP
+else
+	EXP=$test_dir/expect
+fi
+
+cp /dev/null $OUT
+
+cat > $TMPFILE.conf << ENDL
+[fs_types]
+ext4 = {
+        base_features = sparse_super,filetype,resize_inode,dir_index,ext_attr,^has_journal,extent,huge_file,flex_bg,uninit_bg,dir_nlink,extra_isize,64bit
+        blocksize = 1024
+        inode_size = 256
+        inode_ratio = 16384
+}
+ENDL
+MKE2FS_CONFIG=$TMPFILE.conf $MKE2FS -F -o Linux -b 1024 -O ^bigalloc -T ext4 $TMPFILE 65536 2>&1 | sed -f $cmd_dir/filter.sed >> $OUT 2>&1
+rm -rf $TMPFILE.conf
+
+$FSCK -fy -N test_filesys $TMPFILE > $OUT.new 2>&1
+status=$?
+echo Exit status is $status >> $OUT.new
+sed -f $cmd_dir/filter.sed -e "s;$TMPFILE;test.img;" $OUT.new >> $OUT
+rm -f $OUT.new
+
+echo "debugfs write files" >> $OUT
+make_file() {
+	name="$1"
+	start="$2"
+	flag="$3"
+
+	cat << ENDL
+write /dev/null $name
+fallocate /$name 0 39
+punch /$name 10 10
+punch /$name 13 13
+punch /$name 26 26
+punch /$name 29 29
+ENDL
+}
+
+#Files we create:
+# a: punch a 40k file
+# b*: punch sparse file starting at b*
+# c*: punch sparse file ending at c*
+# d: midcluster to midcluster, surrounding sparse
+# e: partial middle cluster alloc
+# f: one big file
+base=5000
+cat > $TMPFILE.cmd << ENDL
+write /dev/null a
+fallocate /a 0 39
+punch /a 0 39
+ENDL
+echo "ex /a" >> $TMPFILE.cmd2
+
+make_file sample $base --uninit >> $TMPFILE.cmd
+echo "ex /sample" >> $TMPFILE.cmd2
+base=10000
+
+for i in 8 9 10 11 12 13 14 15; do
+	make_file b$i $(($base + (40 * ($i - 8)))) --uninit >> $TMPFILE.cmd
+	echo "punch /b$i $i 39" >> $TMPFILE.cmd
+	echo "ex /b$i" >> $TMPFILE.cmd2
+done
+
+for i in 24 25 26 27 28 29 30 31; do
+	make_file c$i $(($base + 320 + (40 * ($i - 24)))) --uninit >> $TMPFILE.cmd
+	echo "punch /c$i 0 $i" >> $TMPFILE.cmd
+	echo "ex /c$i" >> $TMPFILE.cmd2
+done
+
+make_file d $(($base + 640)) --uninit >> $TMPFILE.cmd
+echo "punch /d 4 35" >> $TMPFILE.cmd
+echo "ex /d" >> $TMPFILE.cmd2
+
+make_file e $(($base + 680)) --uninit >> $TMPFILE.cmd
+echo "punch /e 19 20" >> $TMPFILE.cmd
+echo "ex /e" >> $TMPFILE.cmd2
+
+cat >> $TMPFILE.cmd << ENDL
+write /dev/null f
+sif /f size 1024
+eo /f
+set_bmap --uninit 0 9000
+ec
+sif /f blocks 2
+setb 9000
+fallocate /f 0 8999
+punch /f 1 8998
+ENDL
+echo "ex /f" >> $TMPFILE.cmd2
+
+$DEBUGFS_EXE -w -f $TMPFILE.cmd $TMPFILE > /dev/null 2>&1
+$DEBUGFS_EXE -f $TMPFILE.cmd2 $TMPFILE >> $OUT.new 2>&1
+sed -f $cmd_dir/filter.sed < $OUT.new >> $OUT
+rm -rf $OUT.new $TMPFILE.cmd $TMPFILE.cmd2
+
+$FSCK -fy -N test_filesys $TMPFILE > $OUT.new 2>&1
+status=$?
+echo Exit status is $status >> $OUT.new
+sed -f $cmd_dir/filter.sed -e "s;$TMPFILE;test.img;" $OUT.new >> $OUT
+rm -f $OUT.new
+
+rm -f $TMPFILE
+
+cmp -s $OUT $EXP
+status=$?
+
+if [ "$status" = 0 ] ; then
+	echo "$test_name: $test_description: ok"
+	touch $test_name.ok
+else
+	echo "$test_name: $test_description: failed"
+	diff $DIFF_OPTS $EXP $OUT > $test_name.failed
+	rm -f $test_name.tmp
+fi
+
+unset IMAGE FSCK_OPT OUT EXP
+
+else #if test -x $DEBUGFS_EXE; then
+	echo "$test_name: $test_description: skipped"
+fi
diff --git a/tests/d_punch_bigalloc/expect b/tests/d_punch_bigalloc/expect
new file mode 100644
index 0000000..21427d5
--- /dev/null
+++ b/tests/d_punch_bigalloc/expect
@@ -0,0 +1,207 @@
+
+Warning: the bigalloc feature is still under development
+See https://ext4.wiki.kernel.org/index.php/Bigalloc for more information
+
+Creating filesystem with 65536 1k blocks and 4096 inodes
+
+Allocating group tables:    \b\b\bdone                            
+Writing inode tables:    \b\b\bdone                            
+Writing superblocks and filesystem accounting information:    \b\b\bdone
+
+Pass 1: Checking inodes, blocks, and sizes
+Pass 2: Checking directory structure
+Pass 3: Checking directory connectivity
+Pass 4: Checking reference counts
+Pass 5: Checking group summary information
+test_filesys: 11/4096 files (9.1% non-contiguous), 1144/65536 blocks
+Exit status is 0
+debugfs write files
+debugfs: ex /a
+Level Entries       Logical      Physical Length Flags
+debugfs: ex /sample
+Level Entries       Logical      Physical Length Flags
+ 0/ 1   1/  1     0 - 4294967295  1184              0
+ 1/ 1   1/  5     0 -     9  1144 -  1153     10 Uninit
+ 1/ 1   2/  5    11 -    12  1155 -  1156      2 Uninit
+ 1/ 1   3/  5    14 -    25  1158 -  1169     12 Uninit
+ 1/ 1   4/  5    27 -    28  1171 -  1172      2 Uninit
+ 1/ 1   5/  5    30 -    39  1174 -  1183     10 Uninit
+debugfs: ex /b8
+Level Entries       Logical      Physical Length Flags
+ 0/ 1   1/  1     0 - 4294967295  1232              0
+ 1/ 1   1/  1     0 -     7  1192 -  1199      8 Uninit
+debugfs: ex /b9
+Level Entries       Logical      Physical Length Flags
+ 0/ 1   1/  1     0 - 4294967295  1248              0
+ 1/ 1   1/  1     0 -     8  1200 -  1208      9 Uninit
+debugfs: ex /b10
+Level Entries       Logical      Physical Length Flags
+ 0/ 1   1/  1     0 - 4294967295  1272              0
+ 1/ 1   1/  1     0 -     9  1216 -  1225     10 Uninit
+debugfs: ex /b11
+Level Entries       Logical      Physical Length Flags
+ 0/ 1   1/  1     0 - 4294967295  1296              0
+ 1/ 1   1/  2     0 -     7  1240 -  1247      8 Uninit
+ 1/ 1   2/  2     8 -     9  1256 -  1257      2 Uninit
+debugfs: ex /b12
+Level Entries       Logical      Physical Length Flags
+ 0/ 1   1/  1     0 - 4294967295  1320              0
+ 1/ 1   1/  3     0 -     7  1264 -  1271      8 Uninit
+ 1/ 1   2/  3     8 -     9  1280 -  1281      2 Uninit
+ 1/ 1   3/  3    11 -    11  1283 -  1283      1 Uninit
+debugfs: ex /b13
+Level Entries       Logical      Physical Length Flags
+ 0/ 1   1/  1     0 - 4294967295  1344              0
+ 1/ 1   1/  3     0 -     7  1288 -  1295      8 Uninit
+ 1/ 1   2/  3     8 -     9  1304 -  1305      2 Uninit
+ 1/ 1   3/  3    11 -    12  1307 -  1308      2 Uninit
+debugfs: ex /b14
+Level Entries       Logical      Physical Length Flags
+ 0/ 1   1/  1     0 - 4294967295  1368              0
+ 1/ 1   1/  3     0 -     7  1312 -  1319      8 Uninit
+ 1/ 1   2/  3     8 -     9  1328 -  1329      2 Uninit
+ 1/ 1   3/  3    11 -    12  1331 -  1332      2 Uninit
+debugfs: ex /b15
+Level Entries       Logical      Physical Length Flags
+ 0/ 1   1/  1     0 - 4294967295  1392              0
+ 1/ 1   1/  4     0 -     7  1336 -  1343      8 Uninit
+ 1/ 1   2/  4     8 -     9  1352 -  1353      2 Uninit
+ 1/ 1   3/  4    11 -    12  1355 -  1356      2 Uninit
+ 1/ 1   4/  4    14 -    14  1358 -  1358      1 Uninit
+debugfs: ex /c24
+Level Entries       Logical      Physical Length Flags
+ 0/ 1   1/  1    25 - 4294967295  1416         4294967271
+ 1/ 1   1/  3    25 -    25  1401 -  1401      1 Uninit
+ 1/ 1   2/  3    27 -    28  1403 -  1404      2 Uninit
+ 1/ 1   3/  3    30 -    39  1406 -  1415     10 Uninit
+debugfs: ex /c25
+Level Entries       Logical      Physical Length Flags
+ 0/ 1   1/  1    27 - 4294967295  1440         4294967269
+ 1/ 1   1/  2    27 -    28  1427 -  1428      2 Uninit
+ 1/ 1   2/  2    30 -    39  1430 -  1439     10 Uninit
+debugfs: ex /c26
+Level Entries       Logical      Physical Length Flags
+ 0/ 1   1/  1    27 - 4294967295  1464         4294967269
+ 1/ 1   1/  2    27 -    28  1451 -  1452      2 Uninit
+ 1/ 1   2/  2    30 -    39  1454 -  1463     10 Uninit
+debugfs: ex /c27
+Level Entries       Logical      Physical Length Flags
+ 0/ 1   1/  1    28 - 4294967295  1488         4294967268
+ 1/ 1   1/  2    28 -    28  1476 -  1476      1 Uninit
+ 1/ 1   2/  2    30 -    39  1478 -  1487     10 Uninit
+debugfs: ex /c28
+Level Entries       Logical      Physical Length Flags
+ 0/ 1   1/  1    30 - 4294967295  1512         4294967266
+ 1/ 1   1/  1    30 -    39  1502 -  1511     10 Uninit
+debugfs: ex /c29
+Level Entries       Logical      Physical Length Flags
+ 0/ 1   1/  1    30 - 4294967295  1536         4294967266
+ 1/ 1   1/  1    30 -    39  1526 -  1535     10 Uninit
+debugfs: ex /c30
+Level Entries       Logical      Physical Length Flags
+ 0/ 1   1/  1    31 - 4294967295  1560         4294967265
+ 1/ 1   1/  1    31 -    39  1551 -  1559      9 Uninit
+debugfs: ex /c31
+Level Entries       Logical      Physical Length Flags
+ 0/ 1   1/  1    32 - 4294967295  1584         4294967264
+ 1/ 1   1/  1    32 -    39  1576 -  1583      8 Uninit
+debugfs: ex /d
+Level Entries       Logical      Physical Length Flags
+ 0/ 1   1/  1     0 - 4294967295  1600              0
+ 1/ 1   1/  2     0 -     3  1360 -  1363      4 Uninit
+ 1/ 1   2/  2    36 -    39  1596 -  1599      4 Uninit
+debugfs: ex /e
+Level Entries       Logical      Physical Length Flags
+ 0/ 1   1/  1     0 - 4294967295  1624              0
+ 1/ 1   1/  8     0 -     9  1376 -  1385     10 Uninit
+ 1/ 1   2/  8    11 -    12  1387 -  1388      2 Uninit
+ 1/ 1   3/  8    14 -    15  1390 -  1391      2 Uninit
+ 1/ 1   4/  8    16 -    18  1568 -  1570      3 Uninit
+ 1/ 1   5/  8    21 -    23  1573 -  1575      3 Uninit
+ 1/ 1   6/  8    24 -    25  1608 -  1609      2 Uninit
+ 1/ 1   7/  8    27 -    28  1611 -  1612      2 Uninit
+ 1/ 1   8/  8    30 -    39  1614 -  1623     10 Uninit
+debugfs: ex /f
+Level Entries       Logical      Physical Length Flags
+ 0/ 0   1/  2     0 -     0  9000 -  9000      1 Uninit
+ 0/ 0   2/  2  8999 -  8999 17999 - 17999      1 Uninit
+Pass 1: Checking inodes, blocks, and sizes
+Inode 14 extent tree could be shorter.
+	(level 1 is unnecessary)
+Fix? yes
+
+Inode 15 extent tree could be shorter.
+	(level 1 is unnecessary)
+Fix? yes
+
+Inode 16 extent tree could be shorter.
+	(level 1 is unnecessary)
+Fix? yes
+
+Inode 17 extent tree could be shorter.
+	(level 1 is unnecessary)
+Fix? yes
+
+Inode 18 extent tree could be shorter.
+	(level 1 is unnecessary)
+Fix? yes
+
+Inode 19 extent tree could be shorter.
+	(level 1 is unnecessary)
+Fix? yes
+
+Inode 20 extent tree could be shorter.
+	(level 1 is unnecessary)
+Fix? yes
+
+Inode 22 extent tree could be shorter.
+	(level 1 is unnecessary)
+Fix? yes
+
+Inode 23 extent tree could be shorter.
+	(level 1 is unnecessary)
+Fix? yes
+
+Inode 24 extent tree could be shorter.
+	(level 1 is unnecessary)
+Fix? yes
+
+Inode 25 extent tree could be shorter.
+	(level 1 is unnecessary)
+Fix? yes
+
+Inode 26 extent tree could be shorter.
+	(level 1 is unnecessary)
+Fix? yes
+
+Inode 27 extent tree could be shorter.
+	(level 1 is unnecessary)
+Fix? yes
+
+Inode 28 extent tree could be shorter.
+	(level 1 is unnecessary)
+Fix? yes
+
+Inode 29 extent tree could be shorter.
+	(level 1 is unnecessary)
+Fix? yes
+
+Inode 30 extent tree could be shorter.
+	(level 1 is unnecessary)
+Fix? yes
+
+Pass 1E: Optimizing extent trees
+Pass 2: Checking directory structure
+Pass 3: Checking directory connectivity
+Pass 4: Checking reference counts
+Pass 5: Checking group summary information
+Free blocks count wrong for group #0 (8003, counted=8002).
+Fix? yes
+
+Free blocks count wrong (64024, counted=64016).
+Fix? yes
+
+
+test_filesys: ***** FILE SYSTEM WAS MODIFIED *****
+test_filesys: 32/4096 files (43.8% non-contiguous), 1520/65536 blocks
+Exit status is 1
diff --git a/tests/d_punch_bigalloc/name b/tests/d_punch_bigalloc/name
new file mode 100644
index 0000000..6d61ebe
--- /dev/null
+++ b/tests/d_punch_bigalloc/name
@@ -0,0 +1 @@
+punch sparse files and big files with bigalloc
diff --git a/tests/d_punch_bigalloc/script b/tests/d_punch_bigalloc/script
new file mode 100644
index 0000000..6eb0571
--- /dev/null
+++ b/tests/d_punch_bigalloc/script
@@ -0,0 +1,130 @@
+if test -x $DEBUGFS_EXE; then
+
+FSCK_OPT=-fy
+OUT=$test_name.log
+if [ -f $test_dir/expect.gz ]; then
+	EXP=$test_name.tmp
+	gunzip < $test_dir/expect.gz > $EXP
+else
+	EXP=$test_dir/expect
+fi
+
+cp /dev/null $OUT
+
+cat > $TMPFILE.conf << ENDL
+[fs_types]
+ext4 = {
+	cluster_size = 8192
+        base_features = sparse_super,filetype,resize_inode,dir_index,ext_attr,^has_journal,extent,huge_file,flex_bg,uninit_bg,dir_nlink,extra_isize,64bit
+        blocksize = 1024
+        inode_size = 256
+        inode_ratio = 16384
+}
+ENDL
+MKE2FS_CONFIG=$TMPFILE.conf $MKE2FS -F -o Linux -b 1024 -O bigalloc -T ext4 $TMPFILE 65536 2>&1 | sed -f $cmd_dir/filter.sed >> $OUT 2>&1
+rm -rf $TMPFILE.conf
+
+$FSCK -fy -N test_filesys $TMPFILE > $OUT.new 2>&1
+status=$?
+echo Exit status is $status >> $OUT.new
+sed -f $cmd_dir/filter.sed -e "s;$TMPFILE;test.img;" $OUT.new >> $OUT
+rm -f $OUT.new
+
+echo "debugfs write files" >> $OUT
+make_file() {
+	name="$1"
+	start="$2"
+	flag="$3"
+
+	cat << ENDL
+write /dev/null $name
+fallocate /$name 0 39
+punch /$name 10 10
+punch /$name 13 13
+punch /$name 26 26
+punch /$name 29 29
+ENDL
+}
+
+#Files we create:
+# a: punch a 40k file
+# b*: punch sparse file starting at b*
+# c*: punch sparse file ending at c*
+# d: midcluster to midcluster, surrounding sparse
+# e: partial middle cluster alloc
+# f: one big file
+base=5000
+cat > $TMPFILE.cmd << ENDL
+write /dev/null a
+fallocate /a 0 39
+punch /a 0 39
+ENDL
+echo "ex /a" >> $TMPFILE.cmd2
+
+make_file sample $base --uninit >> $TMPFILE.cmd
+echo "ex /sample" >> $TMPFILE.cmd2
+base=10000
+
+for i in 8 9 10 11 12 13 14 15; do
+	make_file b$i $(($base + (40 * ($i - 8)))) --uninit >> $TMPFILE.cmd
+	echo "punch /b$i $i 39" >> $TMPFILE.cmd
+	echo "ex /b$i" >> $TMPFILE.cmd2
+done
+
+for i in 24 25 26 27 28 29 30 31; do
+	make_file c$i $(($base + 320 + (40 * ($i - 24)))) --uninit >> $TMPFILE.cmd
+	echo "punch /c$i 0 $i" >> $TMPFILE.cmd
+	echo "ex /c$i" >> $TMPFILE.cmd2
+done
+
+make_file d $(($base + 640)) --uninit >> $TMPFILE.cmd
+echo "punch /d 4 35" >> $TMPFILE.cmd
+echo "ex /d" >> $TMPFILE.cmd2
+
+make_file e $(($base + 680)) --uninit >> $TMPFILE.cmd
+echo "punch /e 19 20" >> $TMPFILE.cmd
+echo "ex /e" >> $TMPFILE.cmd2
+
+cat >> $TMPFILE.cmd << ENDL
+write /dev/null f
+sif /f size 1024
+eo /f
+set_bmap --uninit 0 9000
+ec
+sif /f blocks 16
+setb 9000
+fallocate /f 0 8999
+punch /f 1 8998
+ENDL
+echo "ex /f" >> $TMPFILE.cmd2
+
+$DEBUGFS_EXE -w -f $TMPFILE.cmd $TMPFILE > /dev/null 2>&1
+$DEBUGFS_EXE -f $TMPFILE.cmd2 $TMPFILE >> $OUT.new 2>&1
+sed -f $cmd_dir/filter.sed < $OUT.new >> $OUT
+rm -rf $OUT.new $TMPFILE.cmd $TMPFILE.cmd2
+
+$FSCK -fy -N test_filesys $TMPFILE > $OUT.new 2>&1
+status=$?
+echo Exit status is $status >> $OUT.new
+sed -f $cmd_dir/filter.sed -e "s;$TMPFILE;test.img;" $OUT.new >> $OUT
+rm -f $OUT.new
+
+rm -f $TMPFILE
+
+cmp -s $OUT $EXP
+status=$?
+
+if [ "$status" = 0 ] ; then
+	echo "$test_name: $test_description: ok"
+	touch $test_name.ok
+else
+	echo "$test_name: $test_description: failed"
+	diff $DIFF_OPTS $EXP $OUT > $test_name.failed
+	rm -f $test_name.tmp
+fi
+
+unset IMAGE FSCK_OPT OUT EXP
+
+else #if test -x $DEBUGFS_EXE; then
+	echo "$test_name: $test_description: skipped"
+fi
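
The logical ranges in the expect files above can be cross-checked by
hand.  As a rough sketch (plain C, not e2fsprogs code; `struct range`
and `punch_range()` are hypothetical names introduced only for this
illustration), punching an inclusive block range out of a list of
extents might look like the following.  It models logical block numbers
only; physical allocation, which can split extents further (see /e in
the bigalloc output), is ignored.

```c
#include <stddef.h>

/* Inclusive logical block range, like the arguments to debugfs "punch". */
struct range {
	unsigned start;
	unsigned end;
};

/*
 * Remove the blocks in "hole" from the n disjoint, sorted extents in
 * "in" and store the survivors in "out".  Returns the new extent count.
 * "out" needs room for n + 1 entries, because at most one extent can
 * contain the whole hole and be split in two.
 */
size_t punch_range(const struct range *in, size_t n, struct range hole,
		   struct range *out)
{
	size_t m = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		struct range e = in[i];

		/* No overlap: keep the extent unchanged. */
		if (hole.end < e.start || hole.start > e.end) {
			out[m++] = e;
			continue;
		}
		/* Keep whatever lies to the left of the hole. */
		if (e.start < hole.start)
			out[m++] = (struct range){ e.start, hole.start - 1 };
		/* Keep whatever lies to the right of the hole. */
		if (e.end > hole.end)
			out[m++] = (struct range){ hole.end + 1, e.end };
	}
	return m;
}
```

Starting from 0-39 and punching blocks 10, 13, 26 and 29, as make_file
does, leaves 0-9, 11-12, 14-25, 27-28 and 30-39, which agrees with the
Logical column for /sample.  A further "punch /b8 8 39" leaves only
0-7, and "punch /c24 0 24" leaves 25-25, 27-28 and 30-39.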



* Re: [PATCH 10/35] undo-io: add new calls to and speed up the undo io manager
  2015-04-02  2:35 ` [PATCH 10/35] undo-io: add new calls to and speed up the undo io manager Darrick J. Wong
@ 2015-04-02  4:06   ` Andreas Dilger
  2015-04-21 15:00     ` Theodore Ts'o
  2015-05-05 14:20   ` Theodore Ts'o
  1 sibling, 1 reply; 70+ messages in thread
From: Andreas Dilger @ 2015-04-02  4:06 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: tytso, linux-ext4

Doesn't it kind of make e2undo useless if it doesn't work unless
the overwriting operation completed successfully?

Wouldn't it be better to save the superblock at the start, so that
it is available if the overwriting operation is interrupted?  It seems
like e2undo would be most useful if e.g. resize2fs was interrupted in
the middle of some otherwise-corrupting change to the
filesystem.

While speeding up undo logging is nice, being able to recover your
filesystem in case of a problem is the primary goal, and that shouldn't be
forgotten.  Otherwise you may as well not use the undo manager at all. 

Cheers, Andreas

> On Apr 1, 2015, at 21:35, Darrick J. Wong <darrick.wong@oracle.com> wrote:
> 
> Implement pass-through calls for discard, zero-out, and readahead in
> the IO manager so that we can take advantage of any underlying
> support.
> 
> Furthermore, improve tdb write-out speed by disabling locking and only
> fsyncing at the end -- we don't care about locking because having
> multiple writers to the undo file will produce an undo database full
> of garbage blocks; and we only need to fsync at the end because if we
> fail before the end, our undo file will lack the necessary superblock
> data that e2undo requires to do replay safely.  Without this, we call
> fsync four times per tdb update(!)  This reduces the overhead of using
> undo_io while converting a 2TB FS to metadata_csum from 3+ hours to 55
> minutes.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
> lib/ext2fs/tdb.c     |   10 ++++++
> lib/ext2fs/tdb.h     |    2 +
> lib/ext2fs/undo_io.c |   87 +++++++++++++++++++++++++++++++++++++++++++++++++-
> 3 files changed, 97 insertions(+), 2 deletions(-)
> 
> 
> diff --git a/lib/ext2fs/tdb.c b/lib/ext2fs/tdb.c
> index 1d97685..7317288 100644
> --- a/lib/ext2fs/tdb.c
> +++ b/lib/ext2fs/tdb.c
> @@ -4142,3 +4142,13 @@ int tdb_reopen_all(int parent_longlived)
> 
>    return 0;
> }
> +
> +/**
> + * Flush a database file from the page cache.
> + **/
> +int tdb_flush(struct tdb_context *tdb)
> +{
> +    if (tdb->fd != -1)
> +        return fsync(tdb->fd);
> +    return 0;
> +}
> diff --git a/lib/ext2fs/tdb.h b/lib/ext2fs/tdb.h
> index 732ef0e..6a4086c 100644
> --- a/lib/ext2fs/tdb.h
> +++ b/lib/ext2fs/tdb.h
> @@ -129,6 +129,7 @@ typedef struct TDB_DATA {
> #define tdb_lockall_nonblock ext2fs_tdb_lockall_nonblock
> #define tdb_lockall_read_nonblock ext2fs_tdb_lockall_read_nonblock
> #define tdb_lockall_unmark ext2fs_tdb_lockall_unmark
> +#define tdb_flush ext2fs_tdb_flush
> 
> /* this is the context structure that is returned from a db open */
> typedef struct tdb_context TDB_CONTEXT;
> @@ -191,6 +192,7 @@ size_t tdb_map_size(struct tdb_context *tdb);
> int tdb_get_flags(struct tdb_context *tdb);
> void tdb_enable_seqnum(struct tdb_context *tdb);
> void tdb_increment_seqnum_nonblock(struct tdb_context *tdb);
> +int tdb_flush(struct tdb_context *tdb);
> 
> /* Low level locking functions: use with care */
> int tdb_chainlock(struct tdb_context *tdb, TDB_DATA key);
> diff --git a/lib/ext2fs/undo_io.c b/lib/ext2fs/undo_io.c
> index d6beb02..94317cb 100644
> --- a/lib/ext2fs/undo_io.c
> +++ b/lib/ext2fs/undo_io.c
> @@ -37,6 +37,7 @@
> #if HAVE_SYS_RESOURCE_H
> #include <sys/resource.h>
> #endif
> +#include <limits.h>
> 
> #include "tdb.h"
> 
> @@ -354,8 +355,12 @@ static errcode_t undo_open(const char *name, int flags, io_channel *channel)
>        data->real = 0;
>    }
> 
> +    if (data->real)
> +        io->flags = (io->flags & ~CHANNEL_FLAGS_DISCARD_ZEROES) |
> +                (data->real->flags & CHANNEL_FLAGS_DISCARD_ZEROES);
> +
>    /* setup the tdb file */
> -    data->tdb = tdb_open(tdb_file, 0, TDB_CLEAR_IF_FIRST,
> +    data->tdb = tdb_open(tdb_file, 0, TDB_CLEAR_IF_FIRST | TDB_NOLOCK | TDB_NOSYNC,
>                 O_RDWR | O_CREAT | O_TRUNC | O_EXCL, 0600);
>    if (!data->tdb) {
>        retval = errno;
> @@ -399,8 +404,10 @@ static errcode_t undo_close(io_channel channel)
>        return retval;
>    if (data->real)
>        retval = io_channel_close(data->real);
> -    if (data->tdb)
> +    if (data->tdb) {
> +        tdb_flush(data->tdb);
>        tdb_close(data->tdb);
> +    }
>    ext2fs_free_mem(&channel->private_data);
>    if (channel->name)
>        ext2fs_free_mem(&channel->name);
> @@ -510,6 +517,77 @@ static errcode_t undo_write_byte(io_channel channel, unsigned long offset,
>    return retval;
> }
> 
> +static errcode_t undo_discard(io_channel channel, unsigned long long block,
> +                  unsigned long long count)
> +{
> +    struct undo_private_data *data;
> +    errcode_t    retval = 0;
> +    int icount;
> +
> +    EXT2_CHECK_MAGIC(channel, EXT2_ET_MAGIC_IO_CHANNEL);
> +    data = (struct undo_private_data *) channel->private_data;
> +    EXT2_CHECK_MAGIC(data, EXT2_ET_MAGIC_UNIX_IO_CHANNEL);
> +
> +    if (count > INT_MAX)
> +        return EXT2_ET_UNIMPLEMENTED;
> +    icount = count;
> +
> +    /*
> +     * First write the existing content into database
> +     */
> +    retval = undo_write_tdb(channel, block, icount);
> +    if (retval)
> +        return retval;
> +    if (data->real)
> +        retval = io_channel_discard(data->real, block, count);
> +
> +    return retval;
> +}
> +
> +static errcode_t undo_zeroout(io_channel channel, unsigned long long block,
> +                  unsigned long long count)
> +{
> +    struct undo_private_data *data;
> +    errcode_t    retval = 0;
> +    int icount;
> +
> +    EXT2_CHECK_MAGIC(channel, EXT2_ET_MAGIC_IO_CHANNEL);
> +    data = (struct undo_private_data *) channel->private_data;
> +    EXT2_CHECK_MAGIC(data, EXT2_ET_MAGIC_UNIX_IO_CHANNEL);
> +
> +    if (count > INT_MAX)
> +        return EXT2_ET_UNIMPLEMENTED;
> +    icount = count;
> +
> +    /*
> +     * First write the existing content into database
> +     */
> +    retval = undo_write_tdb(channel, block, icount);
> +    if (retval)
> +        return retval;
> +    if (data->real)
> +        retval = io_channel_zeroout(data->real, block, count);
> +
> +    return retval;
> +}
> +
> +static errcode_t undo_cache_readahead(io_channel channel,
> +                      unsigned long long block,
> +                      unsigned long long count)
> +{
> +    struct undo_private_data *data;
> +    errcode_t    retval = 0;
> +
> +    EXT2_CHECK_MAGIC(channel, EXT2_ET_MAGIC_IO_CHANNEL);
> +    data = (struct undo_private_data *) channel->private_data;
> +    EXT2_CHECK_MAGIC(data, EXT2_ET_MAGIC_UNIX_IO_CHANNEL);
> +
> +    if (data->real)
> +        retval = io_channel_cache_readahead(data->real, block, count);
> +
> +    return retval;
> +}
> +
> /*
>  * Flush data buffers to disk.
>  */
> @@ -522,6 +600,8 @@ static errcode_t undo_flush(io_channel channel)
>    data = (struct undo_private_data *) channel->private_data;
>    EXT2_CHECK_MAGIC(data, EXT2_ET_MAGIC_UNIX_IO_CHANNEL);
> 
> +    if (data->tdb)
> +        tdb_flush(data->tdb);
>    if (data->real)
>        retval = io_channel_flush(data->real);
> 
> @@ -601,6 +681,9 @@ static struct struct_io_manager struct_undo_manager = {
>    .get_stats    = undo_get_stats,
>    .read_blk64    = undo_read_blk64,
>    .write_blk64    = undo_write_blk64,
> +    .discard    = undo_discard,
> +    .zeroout    = undo_zeroout,
> +    .cache_readahead    = undo_cache_readahead,
> };
> 
> io_manager undo_io_manager = &struct_undo_manager;
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
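
The trade-off discussed in this message (sync after every undo record
versus sync only at the end, with or without an early sync of the
superblock copy) can be pictured with a toy model.  This is only a
sketch under stated assumptions: it does no real I/O, `struct
undo_model` and its functions are hypothetical names, and it is not the
tdb or undo_io API.  It merely counts sync calls and how many records
would be known durable if the process died.

```c
#include <stddef.h>

/* Toy model only: counts records and sync calls, performs no real I/O. */
struct undo_model {
	size_t written;       /* records appended to the undo file */
	size_t durable;       /* records known to be on stable storage */
	unsigned sync_calls;  /* number of fsync-like calls issued */
	int sync_each_write;  /* nonzero: sync after every record */
};

void undo_model_init(struct undo_model *m, int sync_each_write)
{
	m->written = 0;
	m->durable = 0;
	m->sync_calls = 0;
	m->sync_each_write = sync_each_write;
}

/* Everything written so far becomes durable; costs one sync. */
void undo_model_sync(struct undo_model *m)
{
	m->durable = m->written;
	m->sync_calls++;
}

/* Append one record, syncing immediately only in sync_each_write mode. */
void undo_model_append(struct undo_model *m)
{
	m->written++;
	if (m->sync_each_write)
		undo_model_sync(m);
}

/* Records a replay tool could rely on if the process died right now. */
size_t undo_model_survivors(const struct undo_model *m)
{
	return m->durable;
}
```

In this model, appending the superblock copy first and syncing once
before the bulk of the writes costs one extra sync and leaves that one
record durable if the run is interrupted.  Whether that alone would be
enough for e2undo to replay safely is a separate question that the
model does not answer.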


* Re: [PATCH 09/35] e2fsck: abort on read error beyond end of FS
  2015-04-02  2:35 ` [PATCH 09/35] e2fsck: abort on read error beyond end of FS Darrick J. Wong
@ 2015-04-02  4:10   ` Andreas Dilger
       [not found]     ` <20150402060021.GP11031@birch.djwong.org>
  0 siblings, 1 reply; 70+ messages in thread
From: Andreas Dilger @ 2015-04-02  4:10 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: tytso, linux-ext4

It isn't clear what the benefit of this patch is.  There are times (e.g. if
a partition table is broken or if a single-disk filesystem is changed
to an MD RAID device) that it is useful to run e2fsck on such a filesystem.

With this patch, it's essentially turning a small error into a fatal one, but
what is the benefit?

Cheers, Andreas
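
For readers following the quoted patch below, its two moving parts are
the decision made in the read-error handler and the save/restore of the
ignore flag around code that may legitimately read garbage.  The
following is a minimal standalone sketch of both, assuming only the
flag value shown in the e2fsck.h hunk; `classify_read_error()` and the
begin/end helpers are hypothetical names, not functions in e2fsck.

```c
#include <stdint.h>

/* Value copied from the quoted e2fsck.h hunk; the rest is a model. */
#define IGNORE_READ_ERROR 0x10000

enum read_error_action {
	READ_ERR_ASK,     /* block is inside the FS: use the usual prompts */
	READ_ERR_IGNORE,  /* past the end, but the caller opted in */
	READ_ERR_ABORT    /* past the end and nobody opted in */
};

/* Mirrors the branch structure of the quoted ehandler.c hunk. */
enum read_error_action classify_read_error(uint64_t block,
					   uint64_t blocks_count, int flags)
{
	if (block < blocks_count)
		return READ_ERR_ASK;
	return (flags & IGNORE_READ_ERROR) ? READ_ERR_IGNORE : READ_ERR_ABORT;
}

/* Set the flag and hand back the previous flags word. */
int ignore_read_errors_begin(int *flags)
{
	int saved = *flags;

	*flags |= IGNORE_READ_ERROR;
	return saved;
}

/*
 * Restore only the ignore bit.  The mask is all ones if the bit was set
 * on entry (flags unchanged), or all ones except that bit otherwise
 * (bit cleared).  "~" binds tighter than "|", and "&=" applies last.
 */
void ignore_read_errors_end(int *flags, int saved)
{
	*flags &= ~IGNORE_READ_ERROR | (saved & IGNORE_READ_ERROR);
}
```

The restore expression has the same shape as the one in the message.c
hunk, so a nested caller that already had the flag set keeps it, while
an outer caller gets it cleared again.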

> On Apr 1, 2015, at 21:35, Darrick J. Wong <darrick.wong@oracle.com> wrote:
> 
> Abort if we fail to read a block that's past the end of the FS.
> Includes a flag to disable the abort behavior for selected parts of
> the fsck run, so that we don't fail on a busted object prior to fixing
> it.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
> e2fsck/e2fsck.h   |    1 +
> e2fsck/ehandler.c |    7 +++++--
> e2fsck/extents.c  |    2 ++
> e2fsck/journal.c  |    3 +++
> e2fsck/message.c  |   12 +++++++++++-
> e2fsck/pass1.c    |   28 ++++++++++++++++------------
> e2fsck/pass1b.c   |    4 ++++
> 7 files changed, 42 insertions(+), 15 deletions(-)
> 
> 
> diff --git a/e2fsck/e2fsck.h b/e2fsck/e2fsck.h
> index 5fda863..453b552 100644
> --- a/e2fsck/e2fsck.h
> +++ b/e2fsck/e2fsck.h
> @@ -193,6 +193,7 @@ struct resource_track {
> #define E2F_FLAG_TIME_INSANE    0x2000 /* Time is insane */
> #define E2F_FLAG_PROBLEMS_FIXED    0x4000 /* At least one problem was fixed */
> #define E2F_FLAG_ALLOC_OK    0x8000 /* Can we allocate blocks? */
> +#define E2F_FLAG_IGNORE_READ_ERROR 0x10000 /* Don't rewrite read error blocks */
> 
> #define E2F_RESET_FLAGS (E2F_FLAG_TIME_INSANE | E2F_FLAG_PROBLEMS_FIXED)
> 
> diff --git a/e2fsck/ehandler.c b/e2fsck/ehandler.c
> index 71ca301..847f8e5 100644
> --- a/e2fsck/ehandler.c
> +++ b/e2fsck/ehandler.c
> @@ -60,8 +60,11 @@ static errcode_t e2fsck_handle_read_error(io_channel channel,
>    preenhalt(ctx);
> 
>    /* Don't rewrite a block past the end of the FS. */
> -    if (block >= ext2fs_blocks_count(fs->super))
> -        return 0;
> +    if (block >= ext2fs_blocks_count(fs->super)) {
> +        if (ctx->flags & E2F_FLAG_IGNORE_READ_ERROR)
> +            return 0;
> +        abort();
> +    }
> 
>    if (ask(ctx, _("Ignore error"), 1)) {
>        if (ask(ctx, _("Force rewrite"), 1))
> diff --git a/e2fsck/extents.c b/e2fsck/extents.c
> index 8465299..cff265a 100644
> --- a/e2fsck/extents.c
> +++ b/e2fsck/extents.c
> @@ -29,6 +29,7 @@ errcode_t e2fsck_rebuild_extents_later(e2fsck_t ctx, ext2_ino_t ino)
> {
>    if (!EXT2_HAS_INCOMPAT_FEATURE(ctx->fs->super,
>                       EXT3_FEATURE_INCOMPAT_EXTENTS) ||
> +        (ctx->flags & (E2F_FLAG_RESTART_LATER | E2F_FLAG_RESTART)) ||
>        (ctx->options & E2F_OPT_NO) ||
>        (ino != EXT2_ROOT_INO && ino < ctx->fs->super->s_first_ino))
>        return 0;
> @@ -339,6 +340,7 @@ static void rebuild_extents(e2fsck_t ctx, const char *pass_name, int pr_header)
> 
>    if (!EXT2_HAS_INCOMPAT_FEATURE(ctx->fs->super,
>                       EXT3_FEATURE_INCOMPAT_EXTENTS) ||
> +        (ctx->flags & (E2F_FLAG_RESTART_LATER | E2F_FLAG_RESTART)) ||
>        !ext2fs_test_valid(ctx->fs) ||
>        ctx->invalid_bitmaps) {
>        if (ctx->inodes_to_rebuild)
> diff --git a/e2fsck/journal.c b/e2fsck/journal.c
> index 9f32095..c195797 100644
> --- a/e2fsck/journal.c
> +++ b/e2fsck/journal.c
> @@ -315,6 +315,7 @@ static errcode_t e2fsck_get_journal(e2fsck_t ctx, journal_t **ret_journal)
>    journal->j_inode = NULL;
>    journal->j_blocksize = ctx->fs->blocksize;
> 
> +    ctx->flags |= E2F_FLAG_IGNORE_READ_ERROR;
>    if (uuid_is_null(sb->s_journal_uuid)) {
>        if (!sb->s_journal_inum) {
>            retval = EXT2_ET_BAD_INODE_NUM;
> @@ -518,9 +519,11 @@ static errcode_t e2fsck_get_journal(e2fsck_t ctx, journal_t **ret_journal)
> 
>    *ret_journal = journal;
>    e2fsck_use_inode_shortcuts(ctx, 0);
> +    ctx->flags &= ~E2F_FLAG_IGNORE_READ_ERROR;
>    return 0;
> 
> errout:
> +    ctx->flags &= ~E2F_FLAG_IGNORE_READ_ERROR;
>    e2fsck_use_inode_shortcuts(ctx, 0);
>    if (dev_fs)
>        ext2fs_free_mem(&dev_fs);
> diff --git a/e2fsck/message.c b/e2fsck/message.c
> index 9c1433f..510f291 100644
> --- a/e2fsck/message.c
> +++ b/e2fsck/message.c
> @@ -199,14 +199,24 @@ static void print_pathname(FILE *f, ext2_filsys fs, ext2_ino_t dir,
> {
>    errcode_t    retval = 0;
>    char        *path;
> +    e2fsck_t    ctx = fs ? (e2fsck_t) fs->priv_data : NULL;
> +    int        flags;
> 
>    if (!dir && (ino < num_special_inodes)) {
>        fputs(_(special_inode_name[ino]), f);
>        return;
>    }
> 
> -    if (fs)
> +    if (fs) {
> +        if (ctx) {
> +            flags = ctx->flags;
> +            ctx->flags |= E2F_FLAG_IGNORE_READ_ERROR;
> +        }
>        retval = ext2fs_get_pathname(fs, dir, ino, &path);
> +        if (ctx)
> +            ctx->flags &= ~E2F_FLAG_IGNORE_READ_ERROR |
> +                    (flags & E2F_FLAG_IGNORE_READ_ERROR);
> +    }
>    if (!fs || retval)
>        fputs("???", f);
>    else {
> diff --git a/e2fsck/pass1.c b/e2fsck/pass1.c
> index 308a95a..760fbde 100644
> --- a/e2fsck/pass1.c
> +++ b/e2fsck/pass1.c
> @@ -510,6 +510,7 @@ static void check_is_really_dir(e2fsck_t ctx, struct problem_context *pctx,
>    int            extent_fs;
>    int            inlinedata_fs;
> 
> +    ctx->flags |= E2F_FLAG_IGNORE_READ_ERROR;
>    /*
>     * If the mode looks OK, we believe it.  If the first block in
>     * the i_block array is 0, this cannot be a directory. If the
> @@ -519,7 +520,7 @@ static void check_is_really_dir(e2fsck_t ctx, struct problem_context *pctx,
>     */
>    if (LINUX_S_ISDIR(inode->i_mode) || LINUX_S_ISREG(inode->i_mode) ||
>        LINUX_S_ISLNK(inode->i_mode) || inode->i_block[0] == 0)
> -        return;
> +        goto out;
> 
>    /* 
>     * Check the block numbers in the i_block array for validity:
> @@ -552,13 +553,13 @@ static void check_is_really_dir(e2fsck_t ctx, struct problem_context *pctx,
>        struct ext2_dir_entry de;
> 
>        if (ext2fs_inline_data_size(ctx->fs, pctx->ino, &size))
> -            return;
> +            goto out;
>        /*
>         * If the size isn't a multiple of 4, it's probably not a
>         * directory??
>         */
>        if (size & 3)
> -            return;
> +            goto out;
>        /*
>         * If the first 10 bytes don't look like a directory entry,
>         * it's probably not a directory.
> @@ -578,14 +579,14 @@ static void check_is_really_dir(e2fsck_t ctx, struct problem_context *pctx,
>             de.inode != 0) ||
>            rec_len > EXT4_MIN_INLINE_DATA_SIZE -
>                  EXT4_INLINE_DATA_DOTDOT_SIZE)
> -            return;
> +            goto out;
>        /* device files never have a "system.data" entry */
>        goto isdir;
>    } else if (extent_fs && (inode->i_flags & EXT4_EXTENTS_FL)) {
>        /* extent mapped */
>        if  (ext2fs_bmap2(ctx->fs, pctx->ino, inode, 0, 0, 0, 0,
>                 &blk))
> -            return;
> +            goto out;
>        /* device files are never extent mapped */
>        not_device++;
>    } else {
> @@ -600,7 +601,7 @@ static void check_is_really_dir(e2fsck_t ctx, struct problem_context *pctx,
>                blk >= ext2fs_blocks_count(ctx->fs->super) ||
>                ext2fs_fast_test_block_bitmap2(ctx->block_found_map,
>                               blk))
> -                return;    /* Invalid block, can't be dir */
> +                goto out;    /* Invalid block, can't be dir */
>        }
>        blk = inode->i_block[0];
>    }
> @@ -612,45 +613,48 @@ static void check_is_really_dir(e2fsck_t ctx, struct problem_context *pctx,
>     */
>    if ((LINUX_S_ISCHR(inode->i_mode) || LINUX_S_ISBLK(inode->i_mode)) &&
>        (inode->i_links_count == 1) && !not_device)
> -        return;
> +        goto out;
> 
>    /* read the first block */
>    ehandler_operation(_("reading directory block"));
>    retval = ext2fs_read_dir_block4(ctx->fs, blk, buf, 0, pctx->ino);
>    ehandler_operation(0);
>    if (retval)
> -        return;
> +        goto out;
> 
>    dirent = (struct ext2_dir_entry *) buf;
>    retval = ext2fs_get_rec_len(ctx->fs, dirent, &rec_len);
>    if (retval)
> -        return;
> +        goto out;
>    if ((ext2fs_dirent_name_len(dirent) != 1) ||
>        (dirent->name[0] != '.') ||
>        (dirent->inode != pctx->ino) ||
>        (rec_len < 12) ||
>        (rec_len % 4) ||
>        (rec_len >= ctx->fs->blocksize - 12))
> -        return;
> +        goto out;
> 
>    dirent = (struct ext2_dir_entry *) (buf + rec_len);
>    retval = ext2fs_get_rec_len(ctx->fs, dirent, &rec_len);
>    if (retval)
> -        return;
> +        goto out;
>    if ((ext2fs_dirent_name_len(dirent) != 2) ||
>        (dirent->name[0] != '.') ||
>        (dirent->name[1] != '.') ||
>        (rec_len < 12) ||
>        (rec_len % 4))
> -        return;
> +        goto out;
> 
> isdir:
> +    ctx->flags &= ~E2F_FLAG_IGNORE_READ_ERROR;
>    if (fix_problem(ctx, PR_1_TREAT_AS_DIRECTORY, pctx)) {
>        inode->i_mode = (inode->i_mode & 07777) | LINUX_S_IFDIR;
>        e2fsck_write_inode_full(ctx, pctx->ino, inode,
>                    EXT2_INODE_SIZE(ctx->fs->super),
>                    "check_is_really_dir");
>    }
> +out:
> +    ctx->flags &= ~E2F_FLAG_IGNORE_READ_ERROR;
> }
> 
> void e2fsck_setup_tdb_icount(e2fsck_t ctx, int flags,
> diff --git a/e2fsck/pass1b.c b/e2fsck/pass1b.c
> index cd967f4..10136a6 100644
> --- a/e2fsck/pass1b.c
> +++ b/e2fsck/pass1b.c
> @@ -234,7 +234,9 @@ void e2fsck_pass1_dupblocks(e2fsck_t ctx, char *block_buf)
>    dict_set_allocator(&clstr_dict, NULL, cluster_dnode_free, NULL);
> 
>    init_resource_track(&rtrack, ctx->fs->io);
> +    ctx->flags |= E2F_FLAG_IGNORE_READ_ERROR;
>    pass1b(ctx, block_buf);
> +    ctx->flags &= ~E2F_FLAG_IGNORE_READ_ERROR;
>    print_resource_track(ctx, "Pass 1b", &rtrack, ctx->fs->io);
> 
>    init_resource_track(&rtrack, ctx->fs->io);
> @@ -242,7 +244,9 @@ void e2fsck_pass1_dupblocks(e2fsck_t ctx, char *block_buf)
>    print_resource_track(ctx, "Pass 1c", &rtrack, ctx->fs->io);
> 
>    init_resource_track(&rtrack, ctx->fs->io);
> +    ctx->flags |= E2F_FLAG_IGNORE_READ_ERROR;
>    pass1d(ctx, block_buf);
> +    ctx->flags &= ~E2F_FLAG_IGNORE_READ_ERROR;
>    print_resource_track(ctx, "Pass 1d", &rtrack, ctx->fs->io);
> 
>    /*
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 09/35] e2fsck: abort on read error beyond end of FS
       [not found]       ` <10D33B1F-52B7-4242-9A67-FB9E1CE75296@dilger.ca>
@ 2015-04-06 18:57         ` Darrick J. Wong
  0 siblings, 0 replies; 70+ messages in thread
From: Darrick J. Wong @ 2015-04-06 18:57 UTC (permalink / raw)
  To: Andreas Dilger; +Cc: linux-ext4

On Fri, Apr 03, 2015 at 03:11:39PM -0600, Andreas Dilger wrote:
> On Apr 2, 2015, at 12:00 AM, Darrick J. Wong <darrick.wong@oracle.com> wrote:
> > On Wed, Apr 01, 2015 at 11:10:48PM -0500, Andreas Dilger wrote:
> >> It isn't clear what the benefit of this patch is?  There are times (e.g.
> >> if a partition table is broken or if a single-disk filesystem is changed
> >> to an MD RAID device) that it is useful to run e2fsck on such a
> >> filesystem.
> >> 
> >> With this patch, it's essentially turning a small error into a fatal one,
> >> but what is the benefit?
> > 
> > This came from a debugging patch I was using to see if I could trick fsck
> > into using garbage values when it tries to "fix" things.
> > 
> > I'm not really sure what the usage scenario is for running fsck on a FS
> > that's too big for the device containing it -- wouldn't you want fsck to
> > stop immediately?
> 
> As mentioned above, this can happen in some cases, and e2fsck already has
> a check for it and it will prompt to abort:
> 
>         { PR_0_FS_SIZE_WRONG,
>           N_("The @f size (according to the @S) is %b @bs\n"
>           "The physical size of the @v is %c @bs\n"
>           "Either the @S or the partition table is likely to be corrupt!\n"),
>           PROMPT_ABORT, 0 },
> 
> However, it doesn't _force_ e2fsck to abort, because someone might want to
> be able to recover from such a situation (e.g. lose a few files that use
> blocks beyond the end of the device) rather than have an unusable filesystem.

Yeah, you're probably right, this is at best a debugging patch.  Let's drop it.

--D

> 
> Cheers, Andreas
> 
> > --D
> > 
> >> 
> >> Cheers, Andreas
> >> 
> >>> On Apr 1, 2015, at 21:35, Darrick J. Wong <darrick.wong@oracle.com> wrote:
> >>> 
> >>> Abort if we fail to read a block that's past the end of the FS.
> >>> Includes a flag to disable the abort behavior for selected parts of
> >>> the fsck run, so that we don't fail on a busted object prior to fixing
> >>> it.
> >>> 
> >>> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> >>> ---
> >>> e2fsck/e2fsck.h   |    1 +
> >>> e2fsck/ehandler.c |    7 +++++--
> >>> e2fsck/extents.c  |    2 ++
> >>> e2fsck/journal.c  |    3 +++
> >>> e2fsck/message.c  |   12 +++++++++++-
> >>> e2fsck/pass1.c    |   28 ++++++++++++++++------------
> >>> e2fsck/pass1b.c   |    4 ++++
> >>> 7 files changed, 42 insertions(+), 15 deletions(-)
> >>> 
> >>> 
> >>> diff --git a/e2fsck/e2fsck.h b/e2fsck/e2fsck.h
> >>> index 5fda863..453b552 100644
> >>> --- a/e2fsck/e2fsck.h
> >>> +++ b/e2fsck/e2fsck.h
> >>> @@ -193,6 +193,7 @@ struct resource_track {
> >>> #define E2F_FLAG_TIME_INSANE    0x2000 /* Time is insane */
> >>> #define E2F_FLAG_PROBLEMS_FIXED    0x4000 /* At least one problem was fixed */
> >>> #define E2F_FLAG_ALLOC_OK    0x8000 /* Can we allocate blocks? */
> >>> +#define E2F_FLAG_IGNORE_READ_ERROR 0x10000 /* Don't rewrite read error blocks */
> >>> 
> >>> #define E2F_RESET_FLAGS (E2F_FLAG_TIME_INSANE | E2F_FLAG_PROBLEMS_FIXED)
> >>> 
> >>> diff --git a/e2fsck/ehandler.c b/e2fsck/ehandler.c
> >>> index 71ca301..847f8e5 100644
> >>> --- a/e2fsck/ehandler.c
> >>> +++ b/e2fsck/ehandler.c
> >>> @@ -60,8 +60,11 @@ static errcode_t e2fsck_handle_read_error(io_channel channel,
> >>>   preenhalt(ctx);
> >>> 
> >>>   /* Don't rewrite a block past the end of the FS. */
> >>> -    if (block >= ext2fs_blocks_count(fs->super))
> >>> -        return 0;
> >>> +    if (block >= ext2fs_blocks_count(fs->super)) {
> >>> +        if (ctx->flags & E2F_FLAG_IGNORE_READ_ERROR)
> >>> +            return 0;
> >>> +        abort();
> >>> +    }
> >>> 
> >>>   if (ask(ctx, _("Ignore error"), 1)) {
> >>>       if (ask(ctx, _("Force rewrite"), 1))
> >>> diff --git a/e2fsck/extents.c b/e2fsck/extents.c
> >>> index 8465299..cff265a 100644
> >>> --- a/e2fsck/extents.c
> >>> +++ b/e2fsck/extents.c
> >>> @@ -29,6 +29,7 @@ errcode_t e2fsck_rebuild_extents_later(e2fsck_t ctx, ext2_ino_t ino)
> >>> {
> >>>   if (!EXT2_HAS_INCOMPAT_FEATURE(ctx->fs->super,
> >>>                      EXT3_FEATURE_INCOMPAT_EXTENTS) ||
> >>> +        (ctx->flags & (E2F_FLAG_RESTART_LATER | E2F_FLAG_RESTART)) ||
> >>>       (ctx->options & E2F_OPT_NO) ||
> >>>       (ino != EXT2_ROOT_INO && ino < ctx->fs->super->s_first_ino))
> >>>       return 0;
> >>> @@ -339,6 +340,7 @@ static void rebuild_extents(e2fsck_t ctx, const char *pass_name, int pr_header)
> >>> 
> >>>   if (!EXT2_HAS_INCOMPAT_FEATURE(ctx->fs->super,
> >>>                      EXT3_FEATURE_INCOMPAT_EXTENTS) ||
> >>> +        (ctx->flags & (E2F_FLAG_RESTART_LATER | E2F_FLAG_RESTART)) ||
> >>>       !ext2fs_test_valid(ctx->fs) ||
> >>>       ctx->invalid_bitmaps) {
> >>>       if (ctx->inodes_to_rebuild)
> >>> diff --git a/e2fsck/journal.c b/e2fsck/journal.c
> >>> index 9f32095..c195797 100644
> >>> --- a/e2fsck/journal.c
> >>> +++ b/e2fsck/journal.c
> >>> @@ -315,6 +315,7 @@ static errcode_t e2fsck_get_journal(e2fsck_t ctx, journal_t **ret_journal)
> >>>   journal->j_inode = NULL;
> >>>   journal->j_blocksize = ctx->fs->blocksize;
> >>> 
> >>> +    ctx->flags |= E2F_FLAG_IGNORE_READ_ERROR;
> >>>   if (uuid_is_null(sb->s_journal_uuid)) {
> >>>       if (!sb->s_journal_inum) {
> >>>           retval = EXT2_ET_BAD_INODE_NUM;
> >>> @@ -518,9 +519,11 @@ static errcode_t e2fsck_get_journal(e2fsck_t ctx, journal_t **ret_journal)
> >>> 
> >>>   *ret_journal = journal;
> >>>   e2fsck_use_inode_shortcuts(ctx, 0);
> >>> +    ctx->flags &= ~E2F_FLAG_IGNORE_READ_ERROR;
> >>>   return 0;
> >>> 
> >>> errout:
> >>> +    ctx->flags &= ~E2F_FLAG_IGNORE_READ_ERROR;
> >>>   e2fsck_use_inode_shortcuts(ctx, 0);
> >>>   if (dev_fs)
> >>>       ext2fs_free_mem(&dev_fs);
> >>> diff --git a/e2fsck/message.c b/e2fsck/message.c
> >>> index 9c1433f..510f291 100644
> >>> --- a/e2fsck/message.c
> >>> +++ b/e2fsck/message.c
> >>> @@ -199,14 +199,24 @@ static void print_pathname(FILE *f, ext2_filsys fs, ext2_ino_t dir,
> >>> {
> >>>   errcode_t    retval = 0;
> >>>   char        *path;
> >>> +    e2fsck_t    ctx = fs ? (e2fsck_t) fs->priv_data : NULL;
> >>> +    int        flags;
> >>> 
> >>>   if (!dir && (ino < num_special_inodes)) {
> >>>       fputs(_(special_inode_name[ino]), f);
> >>>       return;
> >>>   }
> >>> 
> >>> -    if (fs)
> >>> +    if (fs) {
> >>> +        if (ctx) {
> >>> +            flags = ctx->flags;
> >>> +            ctx->flags |= E2F_FLAG_IGNORE_READ_ERROR;
> >>> +        }
> >>>       retval = ext2fs_get_pathname(fs, dir, ino, &path);
> >>> +        if (ctx)
> >>> +            ctx->flags &= ~E2F_FLAG_IGNORE_READ_ERROR |
> >>> +                    (flags & E2F_FLAG_IGNORE_READ_ERROR);
> >>> +    }
> >>>   if (!fs || retval)
> >>>       fputs("???", f);
> >>>   else {
> >>> diff --git a/e2fsck/pass1.c b/e2fsck/pass1.c
> >>> index 308a95a..760fbde 100644
> >>> --- a/e2fsck/pass1.c
> >>> +++ b/e2fsck/pass1.c
> >>> @@ -510,6 +510,7 @@ static void check_is_really_dir(e2fsck_t ctx, struct problem_context *pctx,
> >>>   int            extent_fs;
> >>>   int            inlinedata_fs;
> >>> 
> >>> +    ctx->flags |= E2F_FLAG_IGNORE_READ_ERROR;
> >>>   /*
> >>>    * If the mode looks OK, we believe it.  If the first block in
> >>>    * the i_block array is 0, this cannot be a directory. If the
> >>> @@ -519,7 +520,7 @@ static void check_is_really_dir(e2fsck_t ctx, struct problem_context *pctx,
> >>>    */
> >>>   if (LINUX_S_ISDIR(inode->i_mode) || LINUX_S_ISREG(inode->i_mode) ||
> >>>       LINUX_S_ISLNK(inode->i_mode) || inode->i_block[0] == 0)
> >>> -        return;
> >>> +        goto out;
> >>> 
> >>>   /* 
> >>>    * Check the block numbers in the i_block array for validity:
> >>> @@ -552,13 +553,13 @@ static void check_is_really_dir(e2fsck_t ctx, struct problem_context *pctx,
> >>>       struct ext2_dir_entry de;
> >>> 
> >>>       if (ext2fs_inline_data_size(ctx->fs, pctx->ino, &size))
> >>> -            return;
> >>> +            goto out;
> >>>       /*
> >>>        * If the size isn't a multiple of 4, it's probably not a
> >>>        * directory??
> >>>        */
> >>>       if (size & 3)
> >>> -            return;
> >>> +            goto out;
> >>>       /*
> >>>        * If the first 10 bytes don't look like a directory entry,
> >>>        * it's probably not a directory.
> >>> @@ -578,14 +579,14 @@ static void check_is_really_dir(e2fsck_t ctx, struct problem_context *pctx,
> >>>            de.inode != 0) ||
> >>>           rec_len > EXT4_MIN_INLINE_DATA_SIZE -
> >>>                 EXT4_INLINE_DATA_DOTDOT_SIZE)
> >>> -            return;
> >>> +            goto out;
> >>>       /* device files never have a "system.data" entry */
> >>>       goto isdir;
> >>>   } else if (extent_fs && (inode->i_flags & EXT4_EXTENTS_FL)) {
> >>>       /* extent mapped */
> >>>       if  (ext2fs_bmap2(ctx->fs, pctx->ino, inode, 0, 0, 0, 0,
> >>>                &blk))
> >>> -            return;
> >>> +            goto out;
> >>>       /* device files are never extent mapped */
> >>>       not_device++;
> >>>   } else {
> >>> @@ -600,7 +601,7 @@ static void check_is_really_dir(e2fsck_t ctx, struct problem_context *pctx,
> >>>               blk >= ext2fs_blocks_count(ctx->fs->super) ||
> >>>               ext2fs_fast_test_block_bitmap2(ctx->block_found_map,
> >>>                              blk))
> >>> -                return;    /* Invalid block, can't be dir */
> >>> +                goto out;    /* Invalid block, can't be dir */
> >>>       }
> >>>       blk = inode->i_block[0];
> >>>   }
> >>> @@ -612,45 +613,48 @@ static void check_is_really_dir(e2fsck_t ctx, struct problem_context *pctx,
> >>>    */
> >>>   if ((LINUX_S_ISCHR(inode->i_mode) || LINUX_S_ISBLK(inode->i_mode)) &&
> >>>       (inode->i_links_count == 1) && !not_device)
> >>> -        return;
> >>> +        goto out;
> >>> 
> >>>   /* read the first block */
> >>>   ehandler_operation(_("reading directory block"));
> >>>   retval = ext2fs_read_dir_block4(ctx->fs, blk, buf, 0, pctx->ino);
> >>>   ehandler_operation(0);
> >>>   if (retval)
> >>> -        return;
> >>> +        goto out;
> >>> 
> >>>   dirent = (struct ext2_dir_entry *) buf;
> >>>   retval = ext2fs_get_rec_len(ctx->fs, dirent, &rec_len);
> >>>   if (retval)
> >>> -        return;
> >>> +        goto out;
> >>>   if ((ext2fs_dirent_name_len(dirent) != 1) ||
> >>>       (dirent->name[0] != '.') ||
> >>>       (dirent->inode != pctx->ino) ||
> >>>       (rec_len < 12) ||
> >>>       (rec_len % 4) ||
> >>>       (rec_len >= ctx->fs->blocksize - 12))
> >>> -        return;
> >>> +        goto out;
> >>> 
> >>>   dirent = (struct ext2_dir_entry *) (buf + rec_len);
> >>>   retval = ext2fs_get_rec_len(ctx->fs, dirent, &rec_len);
> >>>   if (retval)
> >>> -        return;
> >>> +        goto out;
> >>>   if ((ext2fs_dirent_name_len(dirent) != 2) ||
> >>>       (dirent->name[0] != '.') ||
> >>>       (dirent->name[1] != '.') ||
> >>>       (rec_len < 12) ||
> >>>       (rec_len % 4))
> >>> -        return;
> >>> +        goto out;
> >>> 
> >>> isdir:
> >>> +    ctx->flags &= ~E2F_FLAG_IGNORE_READ_ERROR;
> >>>   if (fix_problem(ctx, PR_1_TREAT_AS_DIRECTORY, pctx)) {
> >>>       inode->i_mode = (inode->i_mode & 07777) | LINUX_S_IFDIR;
> >>>       e2fsck_write_inode_full(ctx, pctx->ino, inode,
> >>>                   EXT2_INODE_SIZE(ctx->fs->super),
> >>>                   "check_is_really_dir");
> >>>   }
> >>> +out:
> >>> +    ctx->flags &= ~E2F_FLAG_IGNORE_READ_ERROR;
> >>> }
> >>> 
> >>> void e2fsck_setup_tdb_icount(e2fsck_t ctx, int flags,
> >>> diff --git a/e2fsck/pass1b.c b/e2fsck/pass1b.c
> >>> index cd967f4..10136a6 100644
> >>> --- a/e2fsck/pass1b.c
> >>> +++ b/e2fsck/pass1b.c
> >>> @@ -234,7 +234,9 @@ void e2fsck_pass1_dupblocks(e2fsck_t ctx, char *block_buf)
> >>>   dict_set_allocator(&clstr_dict, NULL, cluster_dnode_free, NULL);
> >>> 
> >>>   init_resource_track(&rtrack, ctx->fs->io);
> >>> +    ctx->flags |= E2F_FLAG_IGNORE_READ_ERROR;
> >>>   pass1b(ctx, block_buf);
> >>> +    ctx->flags &= ~E2F_FLAG_IGNORE_READ_ERROR;
> >>>   print_resource_track(ctx, "Pass 1b", &rtrack, ctx->fs->io);
> >>> 
> >>>   init_resource_track(&rtrack, ctx->fs->io);
> >>> @@ -242,7 +244,9 @@ void e2fsck_pass1_dupblocks(e2fsck_t ctx, char *block_buf)
> >>>   print_resource_track(ctx, "Pass 1c", &rtrack, ctx->fs->io);
> >>> 
> >>>   init_resource_track(&rtrack, ctx->fs->io);
> >>> +    ctx->flags |= E2F_FLAG_IGNORE_READ_ERROR;
> >>>   pass1d(ctx, block_buf);
> >>> +    ctx->flags &= ~E2F_FLAG_IGNORE_READ_ERROR;
> >>>   print_resource_track(ctx, "Pass 1d", &rtrack, ctx->fs->io);
> >>> 
> >>>   /*
> >>> 
> 
> 
> Cheers, Andreas

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 02/35] e2fsck: turn inline data symlink into a fast symlink when possible
  2015-04-02  2:34 ` [PATCH 02/35] e2fsck: turn inline data symlink into a fast symlink when possible Darrick J. Wong
@ 2015-04-21  1:47   ` Theodore Ts'o
  0 siblings, 0 replies; 70+ messages in thread
From: Theodore Ts'o @ 2015-04-21  1:47 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-ext4

On Wed, Apr 01, 2015 at 07:34:13PM -0700, Darrick J. Wong wrote:
> When there's a problem accessing the EA part of an inline data symlink
> and we want to truncate the symlink back to 60 characters (hoping the
> user can re-establish the link later on, apparently) be sure to turn
> off the inline data flag to convert the symlink back to a regular fast
> symlink.
>
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Thanks, applied.

					- Ted

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 01/35] e2fuzz: fuzz harder
  2015-04-02  2:34 ` [PATCH 01/35] e2fuzz: fuzz harder Darrick J. Wong
@ 2015-04-21  1:47   ` Theodore Ts'o
  0 siblings, 0 replies; 70+ messages in thread
From: Theodore Ts'o @ 2015-04-21  1:47 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-ext4

On Wed, Apr 01, 2015 at 07:34:06PM -0700, Darrick J. Wong wrote:
> Once we've "fixed" the filesystem, try mounting and modifying it to see
> if we can break the kernel.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Thanks, applied.

					- Ted

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 05/35] e2fsck: track directories to be rehashed with a bitmap
  2015-04-02  2:34 ` [PATCH 05/35] e2fsck: track directories to be rehashed with a bitmap Darrick J. Wong
@ 2015-04-21  2:26   ` Theodore Ts'o
  2015-04-21  4:43     ` Darrick J. Wong
  0 siblings, 1 reply; 70+ messages in thread
From: Theodore Ts'o @ 2015-04-21  2:26 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-ext4

On Wed, Apr 01, 2015 at 07:34:33PM -0700, Darrick J. Wong wrote:
> Use a bitmap to track which directories we want to rehash, since
> bitmaps will use less memory.  This enables us to clean up the
> rehash-all case to use inode_dir_map, and we can free the dirinfo
> memory sooner.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Um, how is it that bitmaps will use less memory?  Directories
generally don't use contiguous inode numbers (i.e., it's not that
often that inodes N-1, N, and N+1 will all be directories), and
the rbtree data structure is going to have more pointer overhead
compared with the u32 list.

In the case of the bitarray representation, the memory usage is
nr_inodes / 8 in bytes.  The memory usage of the u32 list is (nr_dirs
* 4) bytes.  Given that the number of inodes is generally something
that we've massively over-provisioned, it's not all that likely that
the bitarray comes out smaller.

Looking at some file systems I have handy, it's no contest:

Filesystem      nr_inodes / 8   nr_dirs * 4
/dev/sda3           1,176,576       382,424
/dev/heap/u1          655,360        63,384
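
As a rough sketch of the arithmetic behind this table: the helper names
below are made up for illustration, and only the flat bitarray case is
modelled (not the rbtree-backed bitmap).  The inode and directory counts
used in the usage note are back-derived from the two columns, not
measured.

```c
#include <stdint.h>

/* Bytes used by a flat bit array holding one bit per inode. */
static uint64_t bitarray_bytes(uint64_t nr_inodes)
{
	return (nr_inodes + 7) / 8;
}

/* Bytes used by a list holding one 32-bit inode number per directory. */
static uint64_t u32_list_bytes(uint64_t nr_dirs)
{
	return nr_dirs * sizeof(uint32_t);
}

/* Nonzero only when the bit array is strictly smaller than the list. */
static int bitarray_is_smaller(uint64_t nr_inodes, uint64_t nr_dirs)
{
	return bitarray_bytes(nr_inodes) < u32_list_bytes(nr_dirs);
}
```

Feeding in 9,412,608 inodes and 95,606 directories reproduces the
/dev/sda3 row.  Under this model the bitarray only wins once more than
one inode in 32 is a directory queued for rehashing.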


Using inode_dir_map for the rehash-all case is a good idea, but I'm
not sure it follows that we should use a bitmap for the non-rehash-all
case.

					- Ted

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 03/35] libext2fs/e2fsck: provide routines to read-ahead metadata
  2015-04-02  2:34 ` [PATCH 03/35] libext2fs/e2fsck: provide routines to read-ahead metadata Darrick J. Wong
@ 2015-04-21  3:03   ` Theodore Ts'o
  0 siblings, 0 replies; 70+ messages in thread
From: Theodore Ts'o @ 2015-04-21  3:03 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-ext4

On Wed, Apr 01, 2015 at 07:34:19PM -0700, Darrick J. Wong wrote:
> This patch adds to e2fsck the ability to pre-fetch metadata into the
> page cache in the hopes of speeding up fsck runs.  There are two new
> functions -- the first allows a caller to readahead a list of blocks,
> and the second is a helper function that uses that first mechanism to
> load group data (bitmaps, inode tables).
> 
> These new e2fsck routines require the addition of a dblist API to
> allow us to iterate a subset of a dblist.  This will enable
> incremental directory block readahead in e2fsck pass 2.
> 
> There's also a function to estimate the readahead given a FS.
> 
> v2: Add an API to create a dblist with a given number of list elements
> pre-allocated.  This enables us to save ~2ms per call to
> e2fsck_readahead() (assuming a 2MB RA buffer) by not having to
> repeatedly call ext2_resize_mem as we add blocks to the list.
> 
> v3: Instead of creating dblists of arbitrary size, change the dblist
> iterator to allow iterating a sub-range.  This eliminates a lot of
> unnecessary list copying during e2fsck pass 2.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Thanks, applied.

						- Ted
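
The v3 note above (iterate a sub-range of the block list instead of
copying it into a smaller list) can be sketched roughly as follows.
This is not the libext2fs API; the struct, the callback type and both
function names are hypothetical stand-ins chosen for the example.

```c
#include <stddef.h>

/* Hypothetical stand-in for one directory block list entry. */
struct db_entry {
	unsigned int ino;		/* owning directory inode */
	unsigned long long blk;		/* physical block number */
};

typedef int (*db_iter_fn)(const struct db_entry *ent, void *priv);

/*
 * Call fn on entries [start, start + count) of list, clamped to the
 * list length nr.  Stops early if fn returns nonzero.  Returns the
 * number of entries visited.  No entries are copied.
 */
static size_t db_iterate_range(const struct db_entry *list, size_t nr,
			       size_t start, size_t count,
			       db_iter_fn fn, void *priv)
{
	size_t i, end, visited = 0;

	if (start >= nr)
		return 0;
	end = (count > nr - start) ? nr : start + count;
	for (i = start; i < end; i++) {
		visited++;
		if (fn(&list[i], priv))
			break;
	}
	return visited;
}

/* Example callback: add up block numbers, standing in for readahead. */
static int sum_blocks(const struct db_entry *ent, void *priv)
{
	unsigned long long *sum = priv;

	*sum += ent->blk;
	return 0;
}
```

A caller walking the list in windows would advance start by the
returned count on each call, so the list itself is never duplicated.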

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 04/35] e2fsck: read-ahead metadata during passes 1, 2, and 4
  2015-04-02  2:34 ` [PATCH 04/35] e2fsck: read-ahead metadata during passes 1, 2, and 4 Darrick J. Wong
@ 2015-04-21  3:03   ` Theodore Ts'o
  0 siblings, 0 replies; 70+ messages in thread
From: Theodore Ts'o @ 2015-04-21  3:03 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-ext4

On Wed, Apr 01, 2015 at 07:34:27PM -0700, Darrick J. Wong wrote:
> e2fsck pass1 is modified to use the block group data prefetch function
> to try to fetch the inode tables into the pagecache before they are
> needed.  We iterate through the blockgroups until we have enough inode
> tables that need reading such that we can issue readahead; then we sit
> and wait until the last inode table block read of the last group to
> start fetching the next bunch.
> 
> pass2 is modified to use the dirblock prefetching function to prefetch
> the list of directory blocks that are assembled in pass1.  We use the
> "iterate a subset of a dblist" and avoid copying the dblist.  Directory
> blocks are fetched incrementally as we walk through the directory
> block list.  In previous iterations of this patch we would free the
> directory blocks after processing, but the performance hit to e2fsck
> itself wasn't worth it.  Furthermore, it is anticipated that most
> users will then mount the FS and start using the directories, so they
> may as well remain in the page cache.
> 
> pass4 is modified to prefetch the block and inode bitmaps in
> anticipation of pass 5, because pass4 is entirely CPU bound.
> 
> In general, these mechanisms can decrease fsck time by 10-40%, if the
> host system has sufficient memory and the storage system can provide a
> lot of IOPs.  Pretty much any storage system capable of handling
> multiple IOs in-flight at any time will see a fairly large performance
> boost.  (Single-issue USB mass storage disks seem to suffer badly.)
> 
> By default, the readahead buffer size will be set to the size of a block
> group's inode table (which is 2MiB for a regular ext4 FS).  The -E
> readahead_kb= option can be given to specify the amount of memory to
> use for readahead or zero to disable it entirely; or an option can be
> given in e2fsck.conf.
> 
> v2: Fix an off-by-one error in the pass1 readahead which made the
> readahead trigger one inode too late if the block groups are full.
> 
> v3: Use the dblist partial iterator function to read ahead parts
> of the directory block list in pass 2, instead of making sublists.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Thanks, applied.

						- Ted

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 05/35] e2fsck: track directories to be rehashed with a bitmap
  2015-04-21  2:26   ` Theodore Ts'o
@ 2015-04-21  4:43     ` Darrick J. Wong
  2015-04-21 14:06       ` Theodore Ts'o
  0 siblings, 1 reply; 70+ messages in thread
From: Darrick J. Wong @ 2015-04-21  4:43 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: linux-ext4

On Mon, Apr 20, 2015 at 10:26:45PM -0400, Theodore Ts'o wrote:
> On Wed, Apr 01, 2015 at 07:34:33PM -0700, Darrick J. Wong wrote:
> > Use a bitmap to track which directories we want to rehash, since
> > bitmaps will use less memory.  This enables us to clean up the
> > rehash-all case to use inode_dir_map, and we can free the dirinfo
> > memory sooner.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Um, how is it that bitmaps will use less memory?  Directories
> generally don't use contiguous inode numbers (i.e., it's not that
> often that inodes N-1, N, and N+1 will all be directories), and
> the rbtree data structure is going to have more pointer overhead
> compared with the u32 list.
> 
> In the case of the bitarray representation, the memory usage is
> nr_inodes / 8 in bytes.  The memory usage of the u32 list is (nr_dirs
> * 4) bytes.  Given that the number of inodes is generally something
> that we've massively over-provisioned, it's not all that likely that
> the bitarray comes out smaller.
> 
> Looking at some file systems I have handy, it's no contest:
> 
> Filesystem      nr_inodes / 8   nr_dirs * 4
> /dev/sda3           1,176,576       382,424
> /dev/heap/u1          655,360        63,384
> 
> 
> Using inode_dir_map for the rehash-all case is a good idea, but I'm
> not sure it follows that we should use a bitmap for the non-rehash-all
> case.

Eh, you're right, let's drop this one.  Honestly it's been so long I don't
remember my motivation for writing this up in the first place.  Thanks for
pulling in the e2fsck readahead pieces, though!

--D

> 
> 					- Ted

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 05/35] e2fsck: track directories to be rehashed with a bitmap
  2015-04-21  4:43     ` Darrick J. Wong
@ 2015-04-21 14:06       ` Theodore Ts'o
  0 siblings, 0 replies; 70+ messages in thread
From: Theodore Ts'o @ 2015-04-21 14:06 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-ext4

On Mon, Apr 20, 2015 at 09:43:45PM -0700, Darrick J. Wong wrote:
> On Mon, Apr 20, 2015 at 10:26:45PM -0400, Theodore Ts'o wrote:
> > On Wed, Apr 01, 2015 at 07:34:33PM -0700, Darrick J. Wong wrote:
> > > Use a bitmap to track which directories we want to rehash, since
> > > bitmaps will use less memory.  This enables us to clean up the
> > > rehash-all case to use inode_dir_map, and we can free the dirinfo
> > > memory sooner.
> > 
> > Using inode_dir_map for the rehash-all case is a good idea, but I'm
> > not sure it follows that we should use a bitmap for the non-rehash-all
> > case.
> 
> Eh, you're right, let's drop this one.  Honestly it's been so long I don't
> remember my motivation for writing this up in the first place.  Thanks for
> pulling in the e2fsck readahead pieces, though!

I think there still is value in using inode_dir_map to iterate over
all of the directories, so we can free the dirinfo memory sooner, but
the value is not as great, I agree.

					- Ted

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 07/35] e2fsck: convert block-mapped files to extents on bigalloc fs
  2015-04-02  2:34 ` [PATCH 07/35] e2fsck: convert block-mapped files to extents on bigalloc fs Darrick J. Wong
@ 2015-04-21 14:36   ` Theodore Ts'o
  2015-05-05 22:45     ` Darrick J. Wong
  0 siblings, 1 reply; 70+ messages in thread
From: Theodore Ts'o @ 2015-04-21 14:36 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-ext4

On Wed, Apr 01, 2015 at 07:34:46PM -0700, Darrick J. Wong wrote:
> As of v4.0, the Linux kernel won't add blocks to a block-mapped file
> on a bigalloc filesystem.  Therefore, convert any such files or
> directories we find, to prevent fs errors later on.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Thanks, applied.  I adjusted the e2fsck problem messages a little to
compress vertical space, and to remove some gcc -Wall warnings.

						- Ted

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 08/35] tests: verify proper rebuilding of sparse extent trees and block map file conversion
  2015-04-02  2:34 ` [PATCH 08/35] tests: verify proper rebuilding of sparse extent trees and block map file conversion Darrick J. Wong
@ 2015-04-21 14:47   ` Theodore Ts'o
  0 siblings, 0 replies; 70+ messages in thread
From: Theodore Ts'o @ 2015-04-21 14:47 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-ext4

On Wed, Apr 01, 2015 at 07:34:53PM -0700, Darrick J. Wong wrote:
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Thanks, applied.

						- Ted

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 10/35] undo-io: add new calls to and speed up the undo io manager
  2015-04-02  4:06   ` Andreas Dilger
@ 2015-04-21 15:00     ` Theodore Ts'o
  2015-04-21 16:48       ` Theodore Ts'o
  0 siblings, 1 reply; 70+ messages in thread
From: Theodore Ts'o @ 2015-04-21 15:00 UTC (permalink / raw)
  To: Andreas Dilger; +Cc: Darrick J. Wong, linux-ext4

On Wed, Apr 01, 2015 at 11:06:11PM -0500, Andreas Dilger wrote:
> Doesn't it kind of make e2undo useless if it doesn't work unless
> the overwriting operation completed successfully?
> 
> Wouldn't it be better to save the superblock at the start, so that
> it is available if the overwriting operation is interrupted?  It seems
> like e2undo would be most useful if e.g. resize2fs was interrupted in
> the middle of some otherwise-corrupting change to the
> filesystem.

It would be nice if e2fsck's undo log worked correctly after a
power failure, but having to constantly call fsync to keep the undo log
consistent probably isn't worth it.

However, if the user types ^C, or e2fsck crashes out with a call to
fatal_error(), we *should* make sure the undo log is in a proper state
so it can be replayed.

Alternatively, what we *could* do is to implement a write-ahead log
where all of the modified blocks go into a separate file, and then the
file system only gets modified at the end, if e2fsck finishes
correctly (or if the user types ^C, we can ask the user if he/she
wants to apply the changes made so far).  I could imagine this being
useful in some cases, but I'm not entirely clear it's worth the effort
to implement.  (And we can always do that later, we shouldn't let the
perfect be the enemy of the good.)
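
A minimal sketch of that write-ahead idea, assuming an in-memory buffer
stands in for the device and a small fixed array stands in for the
separate log file.  Every name here is hypothetical; none of it is
e2fsprogs code, and the caller is trusted to pass block numbers that
fit the buffer.

```c
#include <string.h>

#define WAL_BLOCK_SIZE	16	/* tiny blocks keep the sketch short */
#define WAL_MAX_STAGED	8

/* One staged write: target block number plus its new contents. */
struct staged_block {
	unsigned long blk;
	unsigned char data[WAL_BLOCK_SIZE];
};

struct wal {
	struct staged_block entries[WAL_MAX_STAGED];
	int nr;
};

/* Stage a write; the device is untouched.  Returns -1 if the log is full. */
static int wal_write(struct wal *w, unsigned long blk, const void *data)
{
	int i;

	/* A later write to the same block replaces the staged copy. */
	for (i = 0; i < w->nr; i++) {
		if (w->entries[i].blk == blk) {
			memcpy(w->entries[i].data, data, WAL_BLOCK_SIZE);
			return 0;
		}
	}
	if (w->nr == WAL_MAX_STAGED)
		return -1;
	w->entries[w->nr].blk = blk;
	memcpy(w->entries[w->nr].data, data, WAL_BLOCK_SIZE);
	w->nr++;
	return 0;
}

/* Read through the log: a staged copy wins over the device contents. */
static void wal_read(const struct wal *w, const unsigned char *dev,
		     unsigned long blk, void *out)
{
	int i;

	for (i = 0; i < w->nr; i++) {
		if (w->entries[i].blk == blk) {
			memcpy(out, w->entries[i].data, WAL_BLOCK_SIZE);
			return;
		}
	}
	memcpy(out, dev + blk * WAL_BLOCK_SIZE, WAL_BLOCK_SIZE);
}

/* Apply every staged block to the device, then empty the log. */
static void wal_commit(struct wal *w, unsigned char *dev)
{
	int i;

	for (i = 0; i < w->nr; i++)
		memcpy(dev + w->entries[i].blk * WAL_BLOCK_SIZE,
		       w->entries[i].data, WAL_BLOCK_SIZE);
	w->nr = 0;
}

/* Drop the staged blocks; the device stays exactly as it was. */
static void wal_discard(struct wal *w)
{
	w->nr = 0;
}
```

On ^C the caller would pick between wal_commit() and wal_discard()
based on the user's answer.  A real version would also have to bound
the log size and decide when the log file itself gets synced.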

						- Ted


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 06/35] e2fsck: rebuild sparse extent trees/convert non-extent ext3 files
  2015-04-02  2:34 ` [PATCH 06/35] e2fsck: rebuild sparse extent trees/convert non-extent ext3 files Darrick J. Wong
@ 2015-04-21 16:33   ` Theodore Ts'o
  0 siblings, 0 replies; 70+ messages in thread
From: Theodore Ts'o @ 2015-04-21 16:33 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-ext4

On Wed, Apr 01, 2015 at 07:34:40PM -0700, Darrick J. Wong wrote:
> Teach e2fsck to (re)construct extent trees.  This enables us to do
> either of the following: compress a highly sparse extent tree into
> fewer ETB blocks; or convert an ext3-style block mapped file to an
> extent file.  The reconstruction is performed during pass 1E or 3A,
> as detailed below.
> 
> For files that are already extent based, this algorithm will
> automatically run (pending user approval) if pass1 determines either
> (1) that a whole level of extent tree will fit into a higher level of
> the tree; (2) that the size of any level can be reduced by at least
> one ETB block; or (3) the extent tree is unnecessarily deep.  It will
> not run at all if errors are found and the user declines to fix the
> errors.
> 
> The option "-E bmap2extent" can be used to force e2fsck to convert all
> block map files to extent trees, and to rebuild all extent files'
> extent trees.  After conversion, files larger than 12 blocks should be
> defragmented to eliminate empty holes where a block lives.
> 
> The extent tree constructor is pretty dumb -- it creates a list of
> leaf extents (adjacent extents are collapsed), marks all indirect
> blocks / ETB blocks free, installs a new extent tree root in the
> inode, then loads the leaf extents into the tree.
> 
> v2: Account for extent tree block slack that we create when splitting
> a block, so that we don't repeatedly annoy the user to rebuild a tree
> that we can't optimize further.
> 
> v3: For any directory being rebuilt during pass 3A, defer any extent
> tree rebuilding until after the rehash.  It's quite possible that the
> act of compressing an aged directory will cause it to shrink far
> enough to enable us to knock a level off the dir's extent tree.
> 
> v4: Add a fixes_only option (and a E2FSCK_FIXES_ONLY environment
> variable) that disables optimization activities unless they are
> required to make the filesystem consistent.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Thanks, applied.

						- Ted
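
The "list of leaf extents (adjacent extents are collapsed)" step in the patch description can be pictured with a small sketch. This is not the e2fsck code: the struct and function names below are hypothetical, and the sketch ignores the on-disk limit on extent length and the initialized/uninitialized distinction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified leaf extent: a run of logical blocks mapped
 * to a run of physical blocks.  Not the libext2fs extent structure. */
struct sketch_extent {
	uint64_t lblk;	/* first logical block */
	uint64_t pblk;	/* first physical block */
	uint32_t len;	/* number of blocks in the run */
};

/* Append one mapping to a list that is sorted by logical block.  If the
 * new mapping continues the last entry both logically and physically,
 * extend that entry instead of adding a new one.  Returns the new count. */
static size_t sketch_add_extent(struct sketch_extent *list, size_t count,
				size_t capacity, struct sketch_extent ex)
{
	if (count > 0) {
		struct sketch_extent *last = &list[count - 1];

		if (last->lblk + last->len == ex.lblk &&
		    last->pblk + last->len == ex.pblk) {
			last->len += ex.len;
			return count;
		}
	}
	assert(count < capacity);	/* caller must provide enough room */
	list[count] = ex;
	return count + 1;
}
```

Feeding the mappings in logical order means only the last entry ever needs to be compared, which is what keeps such a constructor "pretty dumb" yet linear in the number of mappings.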

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 10/35] undo-io: add new calls to and speed up the undo io manager
  2015-04-21 15:00     ` Theodore Ts'o
@ 2015-04-21 16:48       ` Theodore Ts'o
  2015-04-22  2:47         ` Darrick J. Wong
  0 siblings, 1 reply; 70+ messages in thread
From: Theodore Ts'o @ 2015-04-21 16:48 UTC (permalink / raw)
  To: Andreas Dilger; +Cc: Darrick J. Wong, linux-ext4

On Tue, Apr 21, 2015 at 11:00:12AM -0400, Theodore Ts'o wrote:
> However, if the user types ^C, or e2fsck crashes out with a call to
> fatal_error(), we *should* make sure the undo log is in a proper state
> so it can be replayed.

It looks like the current set of patches are registering an atexit()
cleanup handler, but there aren't changes to add signal handlers; is
this correct?

In the case of e2fsck, we have signal handlers already, but many of
the other e2fsprogs programs don't have signal handlers and unless I
missed them when I did a quick scan, it looks like this patch series
doesn't add any.

					- Ted

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 10/35] undo-io: add new calls to and speed up the undo io manager
  2015-04-21 16:48       ` Theodore Ts'o
@ 2015-04-22  2:47         ` Darrick J. Wong
  0 siblings, 0 replies; 70+ messages in thread
From: Darrick J. Wong @ 2015-04-22  2:47 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: Andreas Dilger, linux-ext4

On Tue, Apr 21, 2015 at 12:48:02PM -0400, Theodore Ts'o wrote:
> On Tue, Apr 21, 2015 at 11:00:12AM -0400, Theodore Ts'o wrote:
> > However, if the user types ^C, or e2fsck crashes out with a call to
> > fatal_error(), we *should* make sure the undo log is in a proper state
> > so it can be replayed.
> 
> It looks like the current set of patches are registering an atexit()
> cleanup handler, but there aren't changes to add signal handlers; is
> this correct?
> 
> In the case of e2fsck, we have signal handlers already, but many of
> the other e2fsprogs programs don't have signal handlers and unless I
> missed them when I did a quick scan, it looks like this patch series
> doesn't add any.

Correct, it does not.  I hadn't made up my mind if I wanted to continue writing
stuff out if one of the bad signals comes in, but for the specific case of ^C
it does seem warranted.  I'm also not quite sure when's a good time to install
our own "default" handler ... I guess each undo io manager could install
itself via sigaction and store the old pointer for calling later?

WAL could be useful too, but I wouldn't want undo_io and wal_io banging around
inside libext2fs together.

--D


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 16/35] e2fsck: optionally create an undo file
  2015-04-02  2:35 ` [PATCH 16/35] e2fsck: optionally create an undo file Darrick J. Wong
@ 2015-05-05 14:07   ` Theodore Ts'o
  0 siblings, 0 replies; 70+ messages in thread
From: Theodore Ts'o @ 2015-05-05 14:07 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-ext4

On Wed, Apr 01, 2015 at 07:35:45PM -0700, Darrick J. Wong wrote:
> Provide the user with an option to create an undo file so that they
> can roll back a failed repair operation.
> 
> v2: Support reopening undo files.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Thanks, applied.

					- Ted

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 10/35] undo-io: add new calls to and speed up the undo io manager
  2015-04-02  2:35 ` [PATCH 10/35] undo-io: add new calls to and speed up the undo io manager Darrick J. Wong
  2015-04-02  4:06   ` Andreas Dilger
@ 2015-05-05 14:20   ` Theodore Ts'o
  1 sibling, 0 replies; 70+ messages in thread
From: Theodore Ts'o @ 2015-05-05 14:20 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-ext4

On Wed, Apr 01, 2015 at 07:35:06PM -0700, Darrick J. Wong wrote:
> Implement pass-through calls for discard, zero-out, and readahead in
> the IO manager so that we can take advantage of any underlying
> support.
> 
> Furthermore, improve tdb write-out speed by disabling locking and only
> fsyncing at the end -- we don't care about locking because having
> multiple writers to the undo file will produce an undo database full
> of garbage blocks; and we only need to fsync at the end because if we
> fail before the end, our undo file will lack the necessary superblock
> data that e2undo requires to do replay safely.  Without this, we call
> fsync four times per tdb update(!)  This reduces the overhead of using
> undo_io while converting a 2TB FS to metadata_csum from 3+ hours to 55
> minutes.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Thanks, applied.

					- Ted

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 11/35] undo-io: be more flexible about setting block size
  2015-04-02  2:35 ` [PATCH 11/35] undo-io: be more flexible about setting block size Darrick J. Wong
@ 2015-05-05 14:21   ` Theodore Ts'o
  0 siblings, 0 replies; 70+ messages in thread
From: Theodore Ts'o @ 2015-05-05 14:21 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-ext4

On Wed, Apr 01, 2015 at 07:35:13PM -0700, Darrick J. Wong wrote:
> Most of the e2fsprogs utilities set the IO block size multiple times
> (once to 1k to read the superblock, then again to set the real block
> size if we find a real superblock).  Unfortunately, the undo IO
> manager only lets the block size be set once.  For the non-mke2fs
> utilities we'd rather catch the real block size and use that.  mke2fs
> of course wants to use a really large block size since it's probably
> writing a lot of data.
> 
> Therefore, if we haven't written any blocks to the undo file, it's
> perfectly fine to allow block size changes.  For mke2fs, we'll modify
> the IO channel option that lets us set the huge size to lock that
> in place.  This greatly reduces index overhead for undo files for
> e2fsck/tune2fs/resize2fs while continuing the practice of reducing
> it even more for mke2fs.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Applied, thanks.

					- Ted

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 12/35] undo-io: use a bitmap to track what we've already written
  2015-04-02  2:35 ` [PATCH 12/35] undo-io: use a bitmap to track what we've already written Darrick J. Wong
@ 2015-05-05 14:21   ` Theodore Ts'o
  0 siblings, 0 replies; 70+ messages in thread
From: Theodore Ts'o @ 2015-05-05 14:21 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-ext4

On Wed, Apr 01, 2015 at 07:35:19PM -0700, Darrick J. Wong wrote:
> It's really inefficient to (ab)use the TDB key store as a bitmap to
> find out if we've already written a block to the undo file, because
> the tdb code reads the database key btree disk blocks for *every*
> query.  Changing that logic to a bitmap reduces overhead by a large
> margin -- the overhead of using undo_io while converting a 2TB FS to
> metadata_csum is reduced from 55 minutes to 45.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Thanks, applied.

					- Ted
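
To illustrate why a bitmap answers "have we already saved this block?" so cheaply, here is a sketch of an in-memory test-and-mark. The names are hypothetical; the patch itself uses a libext2fs generic bitmap (see the written_block_map field in the undo_close() hunk quoted later in the thread) rather than anything like this hand-rolled version.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical one-bit-per-block map of blocks already saved to the undo file. */
struct sketch_bitmap {
	uint64_t nbits;
	unsigned char *bits;
};

/* Allocate a zeroed map covering nbits blocks.  Returns 0 or -1. */
static int sketch_bitmap_init(struct sketch_bitmap *bm, uint64_t nbits)
{
	bm->bits = calloc((nbits + CHAR_BIT - 1) / CHAR_BIT, 1);
	bm->nbits = bm->bits ? nbits : 0;
	return bm->bits ? 0 : -1;
}

/* Mark blk as saved and report whether it was already marked.
 * A return of 0 means the caller still has to copy the old contents
 * of the block into the undo file before overwriting it. */
static int sketch_test_and_mark(struct sketch_bitmap *bm, uint64_t blk)
{
	unsigned char mask = (unsigned char)(1u << (blk % CHAR_BIT));
	unsigned char *byte;
	int was_set;

	assert(blk < bm->nbits);
	byte = &bm->bits[blk / CHAR_BIT];
	was_set = (*byte & mask) != 0;
	*byte |= mask;
	return was_set;
}
```

Each query is one memory access and a mask, with no disk reads at all, in contrast to the per-query key lookups described in the patch message.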

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 13/35] e2undo: fix memory leaks and tweak the error messages somewhat
  2015-04-02  2:35 ` [PATCH 13/35] e2undo: fix memory leaks and tweak the error messages somewhat Darrick J. Wong
@ 2015-05-05 14:22   ` Theodore Ts'o
  0 siblings, 0 replies; 70+ messages in thread
From: Theodore Ts'o @ 2015-05-05 14:22 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-ext4

On Wed, Apr 01, 2015 at 07:35:26PM -0700, Darrick J. Wong wrote:
> Fix memory leaks and improve the error messages to make it easier
> to figure out why e2undo went wrong.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Thanks, applied.

					- Ted

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 14/35] e2undo: ditch tdb file, write everything to a flat file
  2015-04-02  2:35 ` [PATCH 14/35] e2undo: ditch tdb file, write everything to a flat file Darrick J. Wong
@ 2015-05-05 14:24   ` Theodore Ts'o
  0 siblings, 0 replies; 70+ messages in thread
From: Theodore Ts'o @ 2015-05-05 14:24 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-ext4

On Wed, Apr 01, 2015 at 07:35:32PM -0700, Darrick J. Wong wrote:
> The existing undo file format (which is based on tdb) has many
> problems.  First, its comparison of superblock fields is ineffective,
> since the last mount time is only written by the kernel, not the tools
> (which means that undo files can be applied out of order, thus
> corrupting the filesystem); block numbers are written in CPU byte
> order, which will cause silent failures if an undo file is moved from
> one type of system to another; using the tdb database costs us an
> enormous amount of CPU overhead to maintain the key data structure,
> and finally, the tdb database is unable to deal with databases larger
> than 2GB.  (Upstream tdb 1.2.12 can handle 4GB, but upgrading a 2TB FS
> to 64bit,metadata_csum easily produces 2.9GB of undo files, so we
> might as well move off of tdb now.)
> 
> The last problem is fatal if you want to use tune2fs to turn on
> metadata checksumming, since that rewrites every block on the
> filesystem, which can easily produce a many-gigabyte undo file, which
> of course is unreadable and therefore the operation cannot be undone.
> 
> Therefore, rip all of that out in favor of writing to a flat file.
> Old blocks are appended to a file and the index is written to the end
> when we're done.  This implementation is much faster than wasting a
> considerable amount of time trying to maintain a hash index, which
> drops the runtime overhead of tune2fs -O metadata_csum from ~45min
> to ~20 seconds on a 2TB filesystem.
> 
> I have a few reasons that factored in my decision not to repurpose the
> jbd2 file format for undo files.  First, undo files are limited to
> 2^32 blocks (16TB) which some day might not serve us well.  Second,
> the journal block size is tied to the file system block size, but
> mke2fs wants to be able to back up big chunks of old device contents.
> This would require large changes to the e2fsck journal replay code,
> which itself is derived from the kernel jbd2 driver, which I'd rather
> not destabilize.  Third, I want to require undo files to store the FS
> superblock at the end of undo file creation so that e2undo can be
> reasonably sure that an undo file is supposed to apply against the
> given block device, and doing so would require changes to the jbd2
> format.  Fourth, it didn't seem like a good idea that external
> journals should resemble undo files so closely.
> 
> v2: Provide a state bit that is only set when the undo channel is
> closed correctly so we can warn the user about potentially incomplete
> undo files.  Straighten out the superblock handling so that undo files
> won't be confused for real ext* FS images.  Record multi-block runs in
> each block key to reduce overhead even further.  Support reopening an
> undo file so that we can combine multiple FS operations into one
> (overall smaller) transaction file, which will be easier to manage.
> Flush the undo index data if the program should terminate
> unexpectedly.  Update the ext4 superblock bits if errors or -f is
> found to encourage fsck to do a full run the next time it's invoked.
> Enable undoing the undo.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Applied, thanks.

					- Ted
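
The byte-order complaint about the old format (block numbers stored in CPU byte order) is usually avoided by serializing each number one byte at a time in a fixed order. The sketch below assumes little-endian purely for illustration; the message does not say which order the new format uses, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Store a 64-bit block number in a fixed (here: little-endian) byte order,
 * independent of the byte order of the CPU doing the writing. */
static void sketch_put_le64(unsigned char out[8], uint64_t v)
{
	int i;

	for (i = 0; i < 8; i++)
		out[i] = (unsigned char)(v >> (8 * i));
}

/* Read the number back; gives the same value on any host. */
static uint64_t sketch_get_le64(const unsigned char in[8])
{
	uint64_t v = 0;
	int i;

	for (i = 0; i < 8; i++)
		v |= (uint64_t)in[i] << (8 * i);
	return v;
}
```

An index written this way reads back identically when the undo file is moved between big-endian and little-endian machines, which is the failure mode the patch description calls out.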

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 15/35] libext2fs: support atexit cleanups
  2015-04-02  2:35 ` [PATCH 15/35] libext2fs: support atexit cleanups Darrick J. Wong
@ 2015-05-05 14:31   ` Theodore Ts'o
  0 siblings, 0 replies; 70+ messages in thread
From: Theodore Ts'o @ 2015-05-05 14:31 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-ext4

On Wed, Apr 01, 2015 at 07:35:39PM -0700, Darrick J. Wong wrote:
> Use the atexit() function to provide a means for the library to clean
> itself up on program exit.  This will be used by the undo IO manager
> to flush the undo file state to disk if the program should terminate
> without closing the io channel, since most e2fsprogs clients will
> simply exit() when they hit errors.
> 
> This won't help for signal termination; client programs must set
> up signal handlers.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Applied, but in undo_io.c:

> @@ -769,11 +792,14 @@ static errcode_t undo_close(io_channel channel)
>  	err = write_undo_indexes(data);
>  	if (data->real)
>  		retval = io_channel_close(data->real);
> +	if (data->tdb_file)
> +		free(data->tdb_file);
>  	if (data->undo_file)
>  		io_channel_close(data->undo_file);
>  	ext2fs_free_mem(&data->keyb);
>  	if (data->written_block_map)
>  		ext2fs_free_generic_bitmap(data->written_block_map);
> +	ext2fs_remove_exit_fn(undo_atexit, data);
>  	ext2fs_free_mem(&channel->private_data);
>  	if (channel->name)
>  		ext2fs_free_mem(&channel->name);
> 

I've moved the call to ext2fs_remove_exit_fn() to right after the call
to write_undo_indexes().  This avoids a write if a signal kills the
process right after the call to free data->keyb and before the call to
ext2fs_remove_exit_fn().

						- Ted
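
The ordering concern can be shown with a toy exit-function registry. Everything below is a hypothetical stand-in (including the sketch_* registry itself, which is not the libext2fs implementation); the only point is that the close path deregisters its hook before freeing anything the hook would touch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef void (*sketch_exit_fn)(void *);

struct sketch_exit_entry {
	sketch_exit_fn fn;
	void *data;
};

#define SKETCH_MAX_EXIT 8
static struct sketch_exit_entry sketch_exits[SKETCH_MAX_EXIT];
static int sketch_atexit_registered;

/* Run and clear every registered hook, most recently added slot first. */
static void sketch_run_exits(void)
{
	int i;

	for (i = SKETCH_MAX_EXIT - 1; i >= 0; i--) {
		sketch_exit_fn fn = sketch_exits[i].fn;

		if (fn) {
			sketch_exits[i].fn = NULL;
			fn(sketch_exits[i].data);
		}
	}
}

/* Register a hook; the first registration also arms atexit(). */
static int sketch_add_exit_fn(sketch_exit_fn fn, void *data)
{
	int i;

	assert(fn != NULL);
	if (!sketch_atexit_registered) {
		if (atexit(sketch_run_exits) != 0)
			return -1;
		sketch_atexit_registered = 1;
	}
	for (i = 0; i < SKETCH_MAX_EXIT; i++) {
		if (!sketch_exits[i].fn) {
			sketch_exits[i].fn = fn;
			sketch_exits[i].data = data;
			return 0;
		}
	}
	return -1;	/* table full */
}

/* Remove a hook; returns -1 if it was not registered. */
static int sketch_remove_exit_fn(sketch_exit_fn fn, void *data)
{
	int i;

	for (i = 0; i < SKETCH_MAX_EXIT; i++) {
		if (sketch_exits[i].fn == fn && sketch_exits[i].data == data) {
			sketch_exits[i].fn = NULL;
			return 0;
		}
	}
	return -1;
}

/* Toy undo channel: counts index writes instead of doing I/O. */
struct sketch_undo {
	int index_writes;
	char *keyb;
};

static void sketch_write_indexes(struct sketch_undo *u)
{
	u->index_writes++;
}

static void sketch_undo_atexit(void *p)
{
	sketch_write_indexes(p);
}

static void sketch_undo_close(struct sketch_undo *u)
{
	sketch_write_indexes(u);
	/* Deregister first, so the hook can never see freed state. */
	sketch_remove_exit_fn(sketch_undo_atexit, u);
	free(u->keyb);
	u->keyb = NULL;
}
```

Once the indexes are written and the hook is removed, any later exit path finds nothing to run for this channel, so the index is written exactly once.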

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 17/35] resize2fs: optionally create undo file
  2015-04-02  2:35 ` [PATCH 17/35] resize2fs: optionally create " Darrick J. Wong
@ 2015-05-05 14:36   ` Theodore Ts'o
  0 siblings, 0 replies; 70+ messages in thread
From: Theodore Ts'o @ 2015-05-05 14:36 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-ext4

On Wed, Apr 01, 2015 at 07:35:52PM -0700, Darrick J. Wong wrote:
> Provide the user with an option to create an undo file so that they
> can roll back a failed resize operation.
> 
> v2: Allow reopening of undo files.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Thanks, applied.

				- Ted

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 18/35] tune2fs: optionally create undo file
  2015-04-02  2:35 ` [PATCH 18/35] tune2fs: " Darrick J. Wong
@ 2015-05-05 14:36   ` Theodore Ts'o
  0 siblings, 0 replies; 70+ messages in thread
From: Theodore Ts'o @ 2015-05-05 14:36 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-ext4

On Wed, Apr 01, 2015 at 07:35:58PM -0700, Darrick J. Wong wrote:
> Provide the user with an option to create an undo file so that they
> can roll back a failed tuning operation.  Previously, one would be
> created for inode resize if a bunch of (undocumented) conditions were
> met.
> 
> v2: Enable re-opening of undo files.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Thanks, applied.

					- Ted

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 19/35] mke2fs: optionally create undo file
  2015-04-02  2:36 ` [PATCH 19/35] mke2fs: " Darrick J. Wong
@ 2015-05-05 14:37   ` Theodore Ts'o
  0 siblings, 0 replies; 70+ messages in thread
From: Theodore Ts'o @ 2015-05-05 14:37 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-ext4

On Wed, Apr 01, 2015 at 07:36:04PM -0700, Darrick J. Wong wrote:
> Provide the user with an option to create an undo file so that they
> can roll back a failed tuning operation.  Previously, one would be
> created if force_undo was set in the configuration file and a bunch of
> (undocumented) conditions were met.
> 
> v2: Support reopening undo files.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Thanks, applied.

						- Ted

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 20/35] debugfs: optionally create undo file
  2015-04-02  2:36 ` [PATCH 20/35] debugfs: " Darrick J. Wong
@ 2015-05-05 14:43   ` Theodore Ts'o
  0 siblings, 0 replies; 70+ messages in thread
From: Theodore Ts'o @ 2015-05-05 14:43 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-ext4

On Wed, Apr 01, 2015 at 07:36:11PM -0700, Darrick J. Wong wrote:
> Provide the user with an option to create an undo file so that they
> can roll back a failed debugfs expedition.
> 
> v2: Support reopening undo files.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Thanks, applied.

					- Ted

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 21/35] tests: test undo file creation in e2fsck/resize2fs/tune2fs/mke2fs
  2015-04-02  2:36 ` [PATCH 21/35] tests: test undo file creation in e2fsck/resize2fs/tune2fs/mke2fs Darrick J. Wong
@ 2015-05-05 14:43   ` Theodore Ts'o
  0 siblings, 0 replies; 70+ messages in thread
From: Theodore Ts'o @ 2015-05-05 14:43 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-ext4

On Wed, Apr 01, 2015 at 07:36:17PM -0700, Darrick J. Wong wrote:
> Regression tests to ensure that we can create undo files and roll
> things back if need be.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Thanks, applied.

					- Ted

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 22/35] tests: test various features of the new e2undo format
  2015-04-02  2:36 ` [PATCH 22/35] tests: test various features of the new e2undo format Darrick J. Wong
@ 2015-05-05 14:44   ` Theodore Ts'o
  0 siblings, 0 replies; 70+ messages in thread
From: Theodore Ts'o @ 2015-05-05 14:44 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-ext4

On Wed, Apr 01, 2015 at 07:36:24PM -0700, Darrick J. Wong wrote:
> Verify that the header, checksum, and wrong-order rollback detection
> features of the new e2undo actually work.
> 
> v2: Collect more tests for the v2 of the e2undo flat file patch.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Thanks, applied.

				- Ted

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 23/35] copy-in: create hardlinks with the correct directory filetype
  2015-04-02  2:36 ` [PATCH 23/35] copy-in: create hardlinks with the correct directory filetype Darrick J. Wong
@ 2015-05-05 14:46   ` Theodore Ts'o
  0 siblings, 0 replies; 70+ messages in thread
From: Theodore Ts'o @ 2015-05-05 14:46 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-ext4

On Wed, Apr 01, 2015 at 07:36:30PM -0700, Darrick J. Wong wrote:
> When we're creating hard links via ext2fs_link, the (misnamed?) flags
> argument specifies the filetype for the directory entry.  This is
> *derived* from i_mode, so provide a translator.  Otherwise, fsck will
> complain about unset file types.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Thanks, applied.

					- Ted

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 24/35] copy-in: for files, only iterate file blocks that are mapped
  2015-04-02  2:36 ` [PATCH 24/35] copy-in: for files, only iterate file blocks that are mapped Darrick J. Wong
@ 2015-05-05 14:49   ` Theodore Ts'o
  0 siblings, 0 replies; 70+ messages in thread
From: Theodore Ts'o @ 2015-05-05 14:49 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-ext4

On Wed, Apr 01, 2015 at 07:36:37PM -0700, Darrick J. Wong wrote:
> Rewrite the file copy-in algorithm to detect smaller holes in the
> files we're copying in.  Use SEEK_DATA/SEEK_HOLE/FIEMAP when available
> to skip known empty parts.  This fixes the particular bug where zeroed
> blocks on a system with 64k pages are needlessly copied into a
> 4k-block filesystem.  It also saves time by skipping parts we know to
> be zeroed.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Thanks, applied.

					- Ted
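
A sketch of hole skipping with lseek(), assuming a platform that defines SEEK_DATA and SEEK_HOLE; it falls back to treating the whole file as data otherwise. The function and callback names are hypothetical, FIEMAP is not covered, and this is not the copy-in code from the patch.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE	/* glibc exposes SEEK_DATA/SEEK_HOLE under this macro */
#endif
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

typedef int (*sketch_segment_fn)(off_t start, off_t end, void *priv);

/* Call fn once for each data segment [start, end) of the first `size`
 * bytes of fd, skipping holes.  Returns 0 on success, -1 on an lseek
 * error, or the first nonzero value returned by fn. */
static int sketch_for_each_data_segment(int fd, off_t size,
					sketch_segment_fn fn, void *priv)
{
	off_t pos = 0;
	int ret;

	assert(fn != NULL);
	while (pos < size) {
		off_t data = pos, hole = size;

#if defined(SEEK_DATA) && defined(SEEK_HOLE)
		data = lseek(fd, pos, SEEK_DATA);
		if (data < 0) {
			if (errno == ENXIO)
				return 0;	/* nothing but a hole remains */
			if (errno != EINVAL)
				return -1;
			data = pos;		/* unsupported: rest is data */
		} else {
			hole = lseek(fd, data, SEEK_HOLE);
			if (hole < 0)
				return -1;
		}
#else
		(void)fd;
#endif
		if (data >= size)
			return 0;
		if (hole > size)
			hole = size;
		ret = fn(data, hole, priv);
		if (ret)
			return ret;
		pos = hole;
	}
	return 0;
}

/* Example callback: print the segment and add its length to *priv. */
static int sketch_report_segment(off_t start, off_t end, void *priv)
{
	off_t *total = priv;

	printf("data: [%lld, %lld)\n", (long long)start, (long long)end);
	*total += end - start;
	return 0;
}
```

A copy loop built on this would read and write only inside the reported segments, so sparse regions of the source never turn into allocated zero blocks in the target filesystem.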

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 25/35] copyin: fix error handling
  2015-04-02  2:36 ` [PATCH 25/35] copyin: fix error handling Darrick J. Wong
@ 2015-05-05 14:51   ` Theodore Ts'o
  0 siblings, 0 replies; 70+ messages in thread
From: Theodore Ts'o @ 2015-05-05 14:51 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-ext4

On Wed, Apr 01, 2015 at 07:36:46PM -0700, Darrick J. Wong wrote:
> Save errno (in retval) before doing anything else, because the
> "anything else" (usually com_err()) can call library functions, which
> will reset errno.
> 
> Fix the error messages to use the message catalog, and don't _ever_
> print an error without providing context.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Thanks, applied.

					- Ted
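
The "save errno first" rule looks like this in practice. The sketch uses plain fprintf() and strerror() instead of com_err() and the message catalog, and the function name is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Open a source file for copying.  On failure, return the errno value
 * from open() itself and report the error with context. */
static int sketch_open_for_copy(const char *path, int *fd_out)
{
	int fd = open(path, O_RDONLY);

	if (fd < 0) {
		/* Save errno before anything else: the reporting call
		 * below may invoke library functions that overwrite it. */
		int retval = errno;

		fprintf(stderr, "while opening \"%s\" to copy: %s\n",
			path, strerror(retval));
		return retval;
	}
	*fd_out = fd;
	return 0;
}
```

Returning the saved value, rather than rereading errno after the message is printed, is what guarantees the caller sees the original failure.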

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 26/35] mke2fs: add simple tests and re-alphabetize mke2fs manpage options
  2015-04-02  2:36 ` [PATCH 26/35] mke2fs: add simple tests and re-alphabetize mke2fs manpage options Darrick J. Wong
@ 2015-05-05 14:52   ` Theodore Ts'o
  0 siblings, 0 replies; 70+ messages in thread
From: Theodore Ts'o @ 2015-05-05 14:52 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-ext4

On Wed, Apr 01, 2015 at 07:36:53PM -0700, Darrick J. Wong wrote:
> Add some simple tests for mke2fs -d (create image from dir) and make
> the manpage options appear in alphabetic order.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Thanks, applied.

					- Ted

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 27/35] contrib: script to create minified ext4 image from a directory
  2015-04-02  2:37 ` [PATCH 27/35] contrib: script to create minified ext4 image from a directory Darrick J. Wong
@ 2015-05-05 14:52   ` Theodore Ts'o
  0 siblings, 0 replies; 70+ messages in thread
From: Theodore Ts'o @ 2015-05-05 14:52 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-ext4

On Wed, Apr 01, 2015 at 07:37:00PM -0700, Darrick J. Wong wrote:
> The dir2fs script converts a directory into a minimized ext4 filesystem.
> FS creation parameters are tweaked to reduce as much FS overhead as
> possible, and to leave as few unused blocks and inodes as possible.
> Given that mke2fs -d lays out files linearly from the beginning of the
> FS, using resize2fs -M is not as horrible as it usually is.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Thanks, applied.

					- Ted

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 07/35] e2fsck: convert block-mapped files to extents on bigalloc fs
  2015-04-21 14:36   ` Theodore Ts'o
@ 2015-05-05 22:45     ` Darrick J. Wong
  0 siblings, 0 replies; 70+ messages in thread
From: Darrick J. Wong @ 2015-05-05 22:45 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: linux-ext4

On Tue, Apr 21, 2015 at 10:36:55AM -0400, Theodore Ts'o wrote:
> On Wed, Apr 01, 2015 at 07:34:46PM -0700, Darrick J. Wong wrote:
> > As of v4.0, the Linux kernel won't add blocks to a block-mapped file
> > on a bigalloc filesystem.  Therefore, convert any such files or
> > directories we find, to prevent fs errors later on.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Thanks, applied.  I adjusted the e2fsck problem messages a little to
> compress vertical space, and to remove some gcc-wall warnings.

Hmm.  Did this patch fall out somewhere?  I don't see it in this morning's
-next branch.

--D

^ permalink raw reply	[flat|nested] 70+ messages in thread

end of thread, other threads:[~2015-05-05 22:45 UTC | newest]

Thread overview: 70+ messages
2015-04-02  2:34 [PATCH 00/35] e2fsprogs April 2015 patchbomb Darrick J. Wong
2015-04-02  2:34 ` [PATCH 01/35] e2fuzz: fuzz harder Darrick J. Wong
2015-04-21  1:47   ` Theodore Ts'o
2015-04-02  2:34 ` [PATCH 02/35] e2fsck: turn inline data symlink into a fast symlink when possible Darrick J. Wong
2015-04-21  1:47   ` Theodore Ts'o
2015-04-02  2:34 ` [PATCH 03/35] libext2fs/e2fsck: provide routines to read-ahead metadata Darrick J. Wong
2015-04-21  3:03   ` Theodore Ts'o
2015-04-02  2:34 ` [PATCH 04/35] e2fsck: read-ahead metadata during passes 1, 2, and 4 Darrick J. Wong
2015-04-21  3:03   ` Theodore Ts'o
2015-04-02  2:34 ` [PATCH 05/35] e2fsck: track directories to be rehashed with a bitmap Darrick J. Wong
2015-04-21  2:26   ` Theodore Ts'o
2015-04-21  4:43     ` Darrick J. Wong
2015-04-21 14:06       ` Theodore Ts'o
2015-04-02  2:34 ` [PATCH 06/35] e2fsck: rebuild sparse extent trees/convert non-extent ext3 files Darrick J. Wong
2015-04-21 16:33   ` Theodore Ts'o
2015-04-02  2:34 ` [PATCH 07/35] e2fsck: convert block-mapped files to extents on bigalloc fs Darrick J. Wong
2015-04-21 14:36   ` Theodore Ts'o
2015-05-05 22:45     ` Darrick J. Wong
2015-04-02  2:34 ` [PATCH 08/35] tests: verify proper rebuilding of sparse extent trees and block map file conversion Darrick J. Wong
2015-04-21 14:47   ` Theodore Ts'o
2015-04-02  2:35 ` [PATCH 09/35] e2fsck: abort on read error beyond end of FS Darrick J. Wong
2015-04-02  4:10   ` Andreas Dilger
     [not found]     ` <20150402060021.GP11031@birch.djwong.org>
     [not found]       ` <10D33B1F-52B7-4242-9A67-FB9E1CE75296@dilger.ca>
2015-04-06 18:57         ` Darrick J. Wong
2015-04-02  2:35 ` [PATCH 10/35] undo-io: add new calls to and speed up the undo io manager Darrick J. Wong
2015-04-02  4:06   ` Andreas Dilger
2015-04-21 15:00     ` Theodore Ts'o
2015-04-21 16:48       ` Theodore Ts'o
2015-04-22  2:47         ` Darrick J. Wong
2015-05-05 14:20   ` Theodore Ts'o
2015-04-02  2:35 ` [PATCH 11/35] undo-io: be more flexible about setting block size Darrick J. Wong
2015-05-05 14:21   ` Theodore Ts'o
2015-04-02  2:35 ` [PATCH 12/35] undo-io: use a bitmap to track what we've already written Darrick J. Wong
2015-05-05 14:21   ` Theodore Ts'o
2015-04-02  2:35 ` [PATCH 13/35] e2undo: fix memory leaks and tweak the error messages somewhat Darrick J. Wong
2015-05-05 14:22   ` Theodore Ts'o
2015-04-02  2:35 ` [PATCH 14/35] e2undo: ditch tdb file, write everything to a flat file Darrick J. Wong
2015-05-05 14:24   ` Theodore Ts'o
2015-04-02  2:35 ` [PATCH 15/35] libext2fs: support atexit cleanups Darrick J. Wong
2015-05-05 14:31   ` Theodore Ts'o
2015-04-02  2:35 ` [PATCH 16/35] e2fsck: optionally create an undo file Darrick J. Wong
2015-05-05 14:07   ` Theodore Ts'o
2015-04-02  2:35 ` [PATCH 17/35] resize2fs: optionally create " Darrick J. Wong
2015-05-05 14:36   ` Theodore Ts'o
2015-04-02  2:35 ` [PATCH 18/35] tune2fs: " Darrick J. Wong
2015-05-05 14:36   ` Theodore Ts'o
2015-04-02  2:36 ` [PATCH 19/35] mke2fs: " Darrick J. Wong
2015-05-05 14:37   ` Theodore Ts'o
2015-04-02  2:36 ` [PATCH 20/35] debugfs: " Darrick J. Wong
2015-05-05 14:43   ` Theodore Ts'o
2015-04-02  2:36 ` [PATCH 21/35] tests: test undo file creation in e2fsck/resize2fs/tune2fs/mke2fs Darrick J. Wong
2015-05-05 14:43   ` Theodore Ts'o
2015-04-02  2:36 ` [PATCH 22/35] tests: test various features of the new e2undo format Darrick J. Wong
2015-05-05 14:44   ` Theodore Ts'o
2015-04-02  2:36 ` [PATCH 23/35] copy-in: create hardlinks with the correct directory filetype Darrick J. Wong
2015-05-05 14:46   ` Theodore Ts'o
2015-04-02  2:36 ` [PATCH 24/35] copy-in: for files, only iterate file blocks that are mapped Darrick J. Wong
2015-05-05 14:49   ` Theodore Ts'o
2015-04-02  2:36 ` [PATCH 25/35] copyin: fix error handling Darrick J. Wong
2015-05-05 14:51   ` Theodore Ts'o
2015-04-02  2:36 ` [PATCH 26/35] mke2fs: add simple tests and re-alphabetize mke2fs manpage options Darrick J. Wong
2015-05-05 14:52   ` Theodore Ts'o
2015-04-02  2:37 ` [PATCH 27/35] contrib: script to create minified ext4 image from a directory Darrick J. Wong
2015-05-05 14:52   ` Theodore Ts'o
2015-04-02  2:37 ` [PATCH 28/35] libext2fs: support allocating uninit blocks in bmap2() Darrick J. Wong
2015-04-02  2:37 ` [PATCH 29/35] libext2fs: find/alloc a range of empty blocks Darrick J. Wong
2015-04-02  2:37 ` [PATCH 30/35] libext2fs: add new hooks to support large allocations Darrick J. Wong
2015-04-02  2:37 ` [PATCH 31/35] libext2fs: implement fallocate Darrick J. Wong
2015-04-02  2:37 ` [PATCH 32/35] libext2fs: use fallocate for creating journals and hugefiles Darrick J. Wong
2015-04-02  2:37 ` [PATCH 33/35] debugfs: implement fallocate Darrick J. Wong
2015-04-02  2:37 ` [PATCH 34/35] tests: test debugfs punch command Darrick J. Wong
