* [PATCH 00/37] e2fsprogs patchbomb 5/14
@ 2014-05-01 23:12 Darrick J. Wong
  2014-05-01 23:12 ` [PATCH 01/37] misc: create better-packaged static analysis reports Darrick J. Wong
                   ` (34 more replies)
  0 siblings, 35 replies; 91+ messages in thread
From: Darrick J. Wong @ 2014-05-01 23:12 UTC
  To: tytso, darrick.wong; +Cc: linux-ext4

Welp, time for another patchbomb!  As usual, the patchbomb starts with
8 minor bug fixes.  The first two add some extra reporting and fix
Coverity bugs.  Patch 3 adds the ability to create sockets when
performing mke2fs -D.  Patch 4 changes mke2fs to always complain when
formatting inline_data with 128-byte inodes.  Patches 5-6 fix some
problems when dumping journals in debugfs.  Patch 7 fixes a bug in
resize2fs when sparse_super2 is enabled.  Patch 8 fixes a bug in mke2fs where
block group checksums weren't being set when the user specifies packed
metadata blocks.  Patch 9 adds an extended option to mke2fs so that
users can set the error behavior at format time.

Patches 10-14 make some alterations to metadata checksumming support;
by default, e2fsck will now check the inode before verifying the
checksum.  There's a command line option to restore the "just scrape
it off the system" behavior for heavily damaged filesystems.  There
are a couple of patches to fix erroneous behavior and crashes when
e2fsck has to rebuild the root directory.  The final patch in this
clump adds a command line option to dumpe2fs to ignore checksum
failures.

Patch 15 enables block_validity for new filesystems.  As noted here
previously, the overhead of enabling this option seems to be at most a
1% performance hit when performing a lot of small allocations, and
negligible otherwise.  On the plus side, the filesystem is smarter
about noticing erroneous allocations out of metadata areas (i.e. block
bitmap corruption) and shutting itself down to prevent damage.

Patches 16-17 enhance ext2fs_bmap2() to allow the creation of
uninitialized extents.  The functionality is already there; really it
just adds a flag to indicate uninitialized.  There's also a patch to
the fileio routines to handle uninitialized extents.  These patches
are unchanged from December.

Patches 18-20 add to resize2fs the ability to convert a filesystem to
and from 64bit mode.  These patches are unchanged from December.

Patches 21-24 implement readahead for e2fsck.  The first patch tries
to reduce system call overhead by using pread/pwrite if available.
The next two patches plumb in the IO manager and library changes
necessary to read metadata blocks into the page cache (on Linux).  The
final patch teaches e2fsck to use the library readahead functions in a
separate thread.
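
As a rough illustration of the "read metadata blocks into the page
cache" step (a sketch only, not the interface these patches add), one
way to do this on Linux is to hint the kernel with posix_fadvise().
The helper names block_offset() and readahead_blocks() are made up for
this example.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>

/* Byte offset of block `blk`; kept separate so it is easy to check. */
static off_t block_offset(unsigned long long blk, unsigned int blocksize)
{
	return (off_t)blk * (off_t)blocksize;
}

/*
 * Ask the kernel to start pulling `count` blocks beginning at `blk`
 * into the page cache.  Returns 0 or an error number, exactly as
 * posix_fadvise() does.  Hypothetical helper, not the patch's API.
 */
static int readahead_blocks(int fd, unsigned long long blk,
			    unsigned long long count, unsigned int blocksize)
{
	assert(blocksize != 0);
	return posix_fadvise(fd, block_offset(blk, blocksize),
			     (off_t)count * (off_t)blocksize,
			     POSIX_FADV_WILLNEED);
}
```

The call only queues the reads; a later pread() of the same range is
what actually consumes the cached pages.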

Crude testing has been done via:
# echo 3 > /proc/sys/vm/drop_caches
# e2fsck -Fnfvtt /dev/XXX

So far in my crude testing on a cold system, I've seen roughly a 20%
speedup on an SSD, a 40% speedup on a 3x RAID1 SATA array, and a 10%
speedup on a single-spindle SATA disk.  On a single-queue USB
HDD, performance doesn't change much.  It looks as though low end
storage like USB HDDs will not benefit, which doesn't surprise me.
There's around a 2% regression for USB HDDs, though it doesn't seem
statistically significant.  The SSD numbers are harder to quantify
since they're already fast.  Somewhat unexpectedly, the readahead code
speeds up e2fsck even when the page cache has already been warmed up.

This third version of the readahead patches tries to prevent page cache
thrashing by limiting the amount of (user-configurable) readahead to a
default of half of physical memory.  It also tries to release some of
the memory pages if it can conclude that it's totally done with a
block, and it can now detect very slow readahead and disable it.
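
The default described above could be computed along these lines.  This
is a minimal sketch under stated assumptions: the function names are
invented here, a configured value of 0 is assumed to mean "not set",
and sysconf(_SC_PHYS_PAGES) is a glibc extension rather than POSIX.

```c
#include <assert.h>
#include <unistd.h>

/*
 * Pick a readahead budget in KiB.  `user_kb` is the configured value,
 * with 0 meaning "not set"; `phys_kb` is the physical memory size.
 */
static unsigned long long readahead_budget_kb(unsigned long long phys_kb,
					      unsigned long long user_kb)
{
	if (user_kb != 0)
		return user_kb;		/* an explicit setting wins */
	return phys_kb / 2;		/* default: half of physical memory */
}

/* Convert a KiB budget into a count of filesystem blocks. */
static unsigned long long budget_in_blocks(unsigned long long budget_kb,
					   unsigned int blocksize)
{
	assert(blocksize >= 1024);	/* ext2/3/4 blocks are at least 1 KiB */
	return budget_kb / (blocksize / 1024);
}

/* Physical memory in KiB, or 0 if the platform cannot report it. */
static unsigned long long physical_memory_kb(void)
{
#if defined(_SC_PHYS_PAGES) && defined(_SC_PAGESIZE)
	long pages = sysconf(_SC_PHYS_PAGES);
	long page_size = sysconf(_SC_PAGESIZE);

	if (pages > 0 && page_size > 0)
		return (unsigned long long)pages *
		       (unsigned long long)page_size / 1024;
#endif
	return 0;
}
```

A caller would combine the pieces as
budget_in_blocks(readahead_budget_kb(physical_memory_kb(), user_kb),
blocksize).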

Patches 25-29 implement fallocate for e2fsprogs, and modify Ted's
mk_hugefiles functionality to use it.  The general fallocate API call
is (regrettably) much more complex than Ted's, since it must grapple
with the possibility that the file already has mapped blocks.  There
were also a lot of bigalloc related subtleties.

Patches 30-33 implement fuse2fs, a FUSE server based on libext2fs.
Primarily I've been using it to shake out bugs in the library via
xfstests and the metadata checksumming test program.  It can also be
used to mount ext4 on any OS supporting FUSE, and it can also mount
64k-block filesystems on x86, though I'd be wary of using rw mode.
fuse2fs depends on these new APIs: xattr editing, uninit extent
handling, and the new fallocate call.

Patches 34-36 provide the metadata checksumming test script.  Its
primary advantage over 'make check' is that it allows one to specify a 
variety of different mkfs and mount options.  It's also growing more
tests as a result of fuse2fs exercise.

Patch 37 introduces ext5, which reduces our testing matrix by
requiring a fairly large set of features and eliminating most mount
options.  True, not all the features are stable or ready for
production yet, but six years after ext4 we have a bunch of new
features ready for wider testing.

I've tested these e2fsprogs changes against the -next branch as of
4/17.  These days, I use several VMs, each with 8GB ramdisks to test
with; the test process is checkpatch > make C=1 > make check >
metadata checksum tests > fuse + xfstests.

Comments and questions are, as always, welcome.

--D


* [PATCH 01/37] misc: create better-packaged static analysis reports
  2014-05-01 23:12 [PATCH 00/37] e2fsprogs patchbomb 5/14 Darrick J. Wong
@ 2014-05-01 23:12 ` Darrick J. Wong
  2014-05-11 22:33   ` Theodore Ts'o
  2014-05-01 23:12 ` [PATCH 02/37] misc: coverity fixes Darrick J. Wong
                   ` (33 subsequent siblings)
  34 siblings, 1 reply; 91+ messages in thread
From: Darrick J. Wong @ 2014-05-01 23:12 UTC
  To: tytso, darrick.wong; +Cc: linux-ext4

Fix some minor bugs relating to passing CFLAGS to cppcheck, and
package the cppcheck output into nicer looking reports.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 MCONFIG.in                   |   17 ++++++++++++-----
 debugfs/Makefile.in          |    2 +-
 e2fsck/Makefile.in           |    2 +-
 ext2ed/Makefile.in           |    2 +-
 intl/Makefile.in             |    4 ++--
 lib/blkid/Makefile.in        |    2 +-
 lib/e2p/Makefile.in          |    2 +-
 lib/et/Makefile.in           |    2 +-
 lib/ext2fs/Makefile.in       |    2 +-
 lib/quota/Makefile.in        |    2 +-
 lib/ss/Makefile.in           |    2 +-
 lib/uuid/Makefile.in         |    2 +-
 misc/Makefile.in             |    2 +-
 resize/Makefile.in           |    2 +-
 tests/progs/Makefile.in      |    2 +-
 util/Makefile.in             |    2 +-
 util/static-analysis-cleanup |   20 ++++++++++++++++++++
 17 files changed, 48 insertions(+), 21 deletions(-)
 create mode 100644 util/static-analysis-cleanup


diff --git a/MCONFIG.in b/MCONFIG.in
index 9b411d6..7e520be 100644
--- a/MCONFIG.in
+++ b/MCONFIG.in
@@ -53,7 +53,7 @@ datadir = @datadir@
 @ifGNUmake@ CHECK=sparse
 @ifGNUmake@ CHECK_OPTS=-Wsparse-all -Wno-transparent-union -Wno-return-void -Wno-undef -Wno-non-pointer-null
 @ifGNUmake@ CPPCHECK=cppcheck
-@ifGNUmake@ CPPCHECK_OPTS=--force --enable=all
+@ifGNUmake@ CPPCHECK_OPTS=--force --enable=all --quiet
 @ifGNUmake@ ifeq ("$(C)", "2")
 @ifGNUmake@   CHECK_CMD=$(CHECK) $(CHECK_OPTS) -Wbitwise -D__CHECK_ENDIAN__
 @ifGNUmake@   CPPCHECK_CMD=$(CPPCHECK) $(CPPCHECK_OPTS)
@@ -183,7 +183,7 @@ DEP_INSTALL_SYMLINK = $(top_builddir)/util/install-symlink \
 # Run make gcc-wall to do a build with warning messages.
 #
 #
-WFLAGS=		-std=c99 -D_XOPEN_SOURCE=600 -D_GNU_SOURCE \
+WFLAGS=		-std=gnu99 -D_XOPEN_SOURCE=600 -D_GNU_SOURCE \
 			-pedantic $(WFLAGS_EXTRA) \
 			-Wall -W -Wwrite-strings -Wpointer-arith \
 			-Wcast-qual -Wcast-align -Wno-variadic-macros \
@@ -194,11 +194,18 @@ WFLAGS=		-std=c99 -D_XOPEN_SOURCE=600 -D_GNU_SOURCE \
 			-UENABLE_NLS
 
 gcc-wall-new:
-	(make CFLAGS="@CFLAGS@ $(WFLAGS)" > /dev/null) 2>&1 | sed -f $(top_srcdir)/util/gcc-wall-cleanup 
+	($(MAKE) CFLAGS="@CFLAGS@ $(WFLAGS)" > /dev/null) 2>&1 | sed -f $(top_srcdir)/util/gcc-wall-cleanup
 
 gcc-wall:
-	make clean > /dev/null
-	make gcc-wall-new
+	$(MAKE) clean > /dev/null
+	$(MAKE) gcc-wall-new
+
+static-check:
+	($(MAKE) C=1 V=1 CFLAGS="@CFLAGS@ $(WFLAGS)") 2>&1 | sed -f $(top_srcdir)/util/static-analysis-cleanup
+
+static-check-all:
+	$(MAKE) clean > /dev/null
+	$(MAKE) static-check
 
 #
 # Installation user and groups
diff --git a/debugfs/Makefile.in b/debugfs/Makefile.in
index 34cdac1..5097749 100644
--- a/debugfs/Makefile.in
+++ b/debugfs/Makefile.in
@@ -46,7 +46,7 @@ STATIC_DEPLIBS= $(STATIC_LIBEXT2FS) $(DEPSTATIC_LIBSS) \
 	$(E) "	CC $<"
 	$(Q) $(CC) -c $(ALL_CFLAGS) $< -o $@
 	$(Q) $(CHECK_CMD) $(ALL_CFLAGS) $<
-	$(Q) $(CPPCHECK_CMD) $<
+	$(Q) $(CPPCHECK_CMD) $(CPPFLAGS) $<
 
 all:: $(PROGS) $(MANPAGES)
 
diff --git a/e2fsck/Makefile.in b/e2fsck/Makefile.in
index 5c8ce39..5a6883a 100644
--- a/e2fsck/Makefile.in
+++ b/e2fsck/Makefile.in
@@ -40,7 +40,7 @@ COMPILE_ET=$(top_builddir)/lib/et/compile_et --build-tree
 	$(E) "	CC $<"
 	$(Q) $(CC) -c $(ALL_CFLAGS) $< -o $@
 	$(Q) $(CHECK_CMD) $(ALL_CFLAGS) $<
-	$(Q) $(CPPCHECK_CMD) $<
+	$(Q) $(CPPCHECK_CMD) $(CPPFLAGS) $<
 @PROFILE_CMT@	$(Q) $(CC) $(ALL_CFLAGS) -g -pg -o profiled/$*.o -c $<
 
 #
diff --git a/ext2ed/Makefile.in b/ext2ed/Makefile.in
index f05a562..0697431 100644
--- a/ext2ed/Makefile.in
+++ b/ext2ed/Makefile.in
@@ -34,7 +34,7 @@ DOCS=   doc/ext2ed-design.pdf doc/user-guide.pdf doc/ext2fs-overview.pdf \
 .c.o:
 	$(CC) -c $(ALL_CFLAGS) $< -o $@
 	$(CHECK_CMD) $(ALL_CFLAGS) $<
-	$(CPPCHECK_CMD) $<
+	$(CPPCHECK_CMD) $(CPPFLAGS) $<
 
 .SUFFIXES: .sgml .ps .pdf .html
 
diff --git a/intl/Makefile.in b/intl/Makefile.in
index 07700c8..db6d7d7 100644
--- a/intl/Makefile.in
+++ b/intl/Makefile.in
@@ -62,7 +62,7 @@ mkinstalldirs = $(SHELL) $(MKINSTALLDIRS)
 @ifGNUmake@ CHECK=sparse
 @ifGNUmake@ CHECK_OPTS=-Wsparse-all -Wno-transparent-union -Wno-return-void -Wno-undef -Wno-non-pointer-null
 @ifGNUmake@ CPPCHECK=cppcheck
-@ifGNUmake@ CPPCHECK_OPTS=--force --enable=all
+@ifGNUmake@ CPPCHECK_OPTS=--force --enable=all --quiet --check-config
 @ifGNUmake@ ifeq ("$(C)", "2")
 @ifGNUmake@   CHECK_CMD=$(CHECK) $(CHECK_OPTS) -Wbitwise -D__CHECK_ENDIAN__
 @ifGNUmake@   CPPCHECK_CMD=$(CPPCHECK) $(CPPCHECK_OPTS)
@@ -212,7 +212,7 @@ LTV_AGE=4
 	$(E) "	CC $<"
 	$(Q) $(COMPILE) $<
 	$(Q) $(CHECK_CMD) $(ALL_CFLAGS) $<
-	$(Q) $(CPPCHECK_CMD) $<
+	$(Q) $(CPPCHECK_CMD) $(CPPFLAGS) $<
 
 .y.c:
 	$(YACC) $(YFLAGS) --output $@ $<
diff --git a/lib/blkid/Makefile.in b/lib/blkid/Makefile.in
index 69b5b4c..ecd0e8f 100644
--- a/lib/blkid/Makefile.in
+++ b/lib/blkid/Makefile.in
@@ -56,7 +56,7 @@ DEPLIBS_BLKID=	$(DEPSTATIC_LIBBLKID) $(DEPSTATIC_LIBUUID)
 	$(E) "	CC $<"
 	$(Q) $(CC) $(ALL_CFLAGS) -c $< -o $@
 	$(Q) $(CHECK_CMD) $(ALL_CFLAGS) $<
-	$(Q) $(CPPCHECK_CMD) $<
+	$(Q) $(CPPCHECK_CMD) $(CPPFLAGS) $<
 @PROFILE_CMT@	$(Q) $(CC) $(ALL_CFLAGS) -g -pg -o profiled/$*.o -c $<
 @CHECKER_CMT@	$(Q) $(CC) $(ALL_CFLAGS) -checker -g -o checker/$*.o -c $<
 @ELF_CMT@	$(Q) $(CC) $(ALL_CFLAGS) -fPIC -o elfshared/$*.o -c $<
diff --git a/lib/e2p/Makefile.in b/lib/e2p/Makefile.in
index 761ac48..60920dd 100644
--- a/lib/e2p/Makefile.in
+++ b/lib/e2p/Makefile.in
@@ -56,7 +56,7 @@ BSDLIB_INSTALL_DIR = $(root_libdir)
 	$(E) "	CC $<"
 	$(Q) $(CC) $(ALL_CFLAGS) -c $< -o $@
 	$(Q) $(CHECK_CMD) $(ALL_CFLAGS) $<
-	$(Q) $(CPPCHECK_CMD) $<
+	$(Q) $(CPPCHECK_CMD) $(CPPFLAGS) $<
 @PROFILE_CMT@	$(Q) $(CC) $(ALL_CFLAGS) -g -pg -o profiled/$*.o -c $<
 @CHECKER_CMT@	$(Q) $(CC) $(ALL_CFLAGS) -checker -g -o checker/$*.o -c $<
 @ELF_CMT@	$(Q) $(CC) $(ALL_CFLAGS) -fPIC -o elfshared/$*.o -c $<
diff --git a/lib/et/Makefile.in b/lib/et/Makefile.in
index 4f2d31f..db2e056 100644
--- a/lib/et/Makefile.in
+++ b/lib/et/Makefile.in
@@ -44,7 +44,7 @@ BSDLIB_INSTALL_DIR = $(root_libdir)
 	$(E) "	CC $<"
 	$(Q) $(CC) $(ALL_CFLAGS) -c $< -o $@
 	$(Q) $(CHECK_CMD) $(ALL_CFLAGS) $<
-	$(Q) $(CPPCHECK_CMD) $<
+	$(Q) $(CPPCHECK_CMD) $(CPPFLAGS) $<
 @PROFILE_CMT@	$(Q) $(CC) $(ALL_CFLAGS) -g -pg -o profiled/$*.o -c $<
 @CHECKER_CMT@	$(Q) $(CC) $(ALL_CFLAGS) -checker -g -o checker/$*.o -c $<
 @ELF_CMT@	$(Q) $(CC) $(ALL_CFLAGS) -fPIC -o elfshared/$*.o -c $<
diff --git a/lib/ext2fs/Makefile.in b/lib/ext2fs/Makefile.in
index 0c880c7..f287a57 100644
--- a/lib/ext2fs/Makefile.in
+++ b/lib/ext2fs/Makefile.in
@@ -205,7 +205,7 @@ all:: ext2fs.pc
 	$(E) "	CC $<"
 	$(Q) $(CC) $(ALL_CFLAGS) -c $< -o $@
 	$(Q) $(CHECK_CMD) $(ALL_CFLAGS) $<
-	$(Q) $(CPPCHECK_CMD) $<
+	$(Q) $(CPPCHECK_CMD) $(CPPFLAGS) $<
 @PROFILE_CMT@	$(Q) $(CC) $(ALL_CFLAGS) -g -pg -o profiled/$*.o -c $<
 @CHECKER_CMT@	$(Q) $(CC) $(ALL_CFLAGS) -checker -g -o checker/$*.o -c $<
 @ELF_CMT@	$(Q) $(CC) $(ALL_CFLAGS) -fPIC -o elfshared/$*.o -c $<
diff --git a/lib/quota/Makefile.in b/lib/quota/Makefile.in
index 0344d09..aa513a0 100644
--- a/lib/quota/Makefile.in
+++ b/lib/quota/Makefile.in
@@ -48,7 +48,7 @@ LIBDIR= quota
 	$(E) "	CC $<"
 	$(Q) $(CC) $(ALL_CFLAGS) -c $< -o $@
 	$(Q) $(CHECK_CMD) $(ALL_CFLAGS) $<
-	$(Q) $(CPPCHECK_CMD) $<
+	$(Q) $(CPPCHECK_CMD) $(CPPFLAGS) $<
 @PROFILE_CMT@	$(Q) $(CC) $(ALL_CFLAGS) -g -pg -o profiled/$*.o -c $<
 @CHECKER_CMT@	$(Q) $(CC) $(ALL_CFLAGS) -checker -g -o checker/$*.o -c $<
 #ELF_CMT#	$(Q) $(CC) $(ALL_CFLAGS) -fPIC -o elfshared/$*.o -c $<
diff --git a/lib/ss/Makefile.in b/lib/ss/Makefile.in
index 4c1ef8f..a94de79 100644
--- a/lib/ss/Makefile.in
+++ b/lib/ss/Makefile.in
@@ -35,7 +35,7 @@ MK_CMDS=_SS_DIR_OVERRIDE=. ./mk_cmds
 	$(E) "	CC $<"
 	$(Q) $(CC) $(ALL_CFLAGS) -c $<
 	$(Q) $(CHECK_CMD) $(ALL_CFLAGS) $<
-	$(Q) $(CPPCHECK_CMD) $<
+	$(Q) $(CPPCHECK_CMD) $(CPPFLAGS) $<
 @PROFILE_CMT@	$(Q) $(CC) $(ALL_CFLAGS) -g -pg -o profiled/$*.o -c $<
 @CHECKER_CMT@	$(Q) $(CC) $(ALL_CFLAGS) -checker -g -o checker/$*.o -c $<
 @ELF_CMT@	$(Q) $(CC) $(ALL_CFLAGS) -DSHARED_ELF_LIB -fPIC -o elfshared/$*.o -c $<
diff --git a/lib/uuid/Makefile.in b/lib/uuid/Makefile.in
index f5b767e..f436b36 100644
--- a/lib/uuid/Makefile.in
+++ b/lib/uuid/Makefile.in
@@ -63,7 +63,7 @@ BSDLIB_INSTALL_DIR = $(root_libdir)
 	$(E) "	CC $<"
 	$(Q) $(CC) $(ALL_CFLAGS) -c $< -o $@
 	$(Q) $(CHECK_CMD) $(ALL_CFLAGS) $<
-	$(Q) $(CPPCHECK_CMD) $<
+	$(Q) $(CPPCHECK_CMD) $(CPPFLAGS) $<
 @PROFILE_CMT@	$(Q) $(CC) $(ALL_CFLAGS) -g -pg -o profiled/$*.o -c $<
 @CHECKER_CMT@	$(Q) $(CC) $(ALL_CFLAGS) -checker -g -o checker/$*.o -c $<
 @ELF_CMT@	$(Q) $(CC) $(ALL_CFLAGS) -fPIC -o elfshared/$*.o -c $<
diff --git a/misc/Makefile.in b/misc/Makefile.in
index 18a8a2f..7bd9b5f 100644
--- a/misc/Makefile.in
+++ b/misc/Makefile.in
@@ -103,7 +103,7 @@ COMPILE_ET=$(top_builddir)/lib/et/compile_et --build-tree
 	$(E) "	CC $<"
 	$(Q) $(CC) -c $(ALL_CFLAGS) $< -o $@
 	$(Q) $(CHECK_CMD) $(ALL_CFLAGS) $<
-	$(Q) $(CPPCHECK_CMD) $<
+	$(Q) $(CPPCHECK_CMD) $(CPPFLAGS) $<
 @PROFILE_CMT@	$(Q) $(CC) $(ALL_CFLAGS) -g -pg -o profiled/$*.o -c $<
 
 all:: profiled $(SPROGS) $(UPROGS) $(USPROGS) $(SMANPAGES) $(UMANPAGES) \
diff --git a/resize/Makefile.in b/resize/Makefile.in
index 16f2a95..201e268 100644
--- a/resize/Makefile.in
+++ b/resize/Makefile.in
@@ -39,7 +39,7 @@ DEPSTATIC_LIBS= $(STATIC_LIBE2P) $(STATIC_LIBEXT2FS) $(DEPSTATIC_LIBCOM_ERR)
 	$(E) "	CC $<"
 	$(Q) $(CC) -c $(ALL_CFLAGS) $< -o $@
 	$(Q) $(CHECK_CMD) $(ALL_CFLAGS) $<
-	$(Q) $(CPPCHECK_CMD) $<
+	$(Q) $(CPPCHECK_CMD) $(CPPFLAGS) $<
 
 all:: $(PROGS) $(TEST_PROGS) $(MANPAGES) 
 
diff --git a/tests/progs/Makefile.in b/tests/progs/Makefile.in
index 6c986e4..22d9417 100644
--- a/tests/progs/Makefile.in
+++ b/tests/progs/Makefile.in
@@ -28,7 +28,7 @@ DEPLIBS= $(LIBEXT2FS) $(DEPLIBSS) $(DEPLIBCOM_ERR)
 	$(E) "	CC $<"
 	$(Q) $(CC) -c $(ALL_CFLAGS) $< -o $@
 	$(Q) $(CHECK_CMD) $(ALL_CFLAGS) $<
-	$(Q) $(CPPCHECK_CMD) $<
+	$(Q) $(CPPCHECK_CMD) $(CPPFLAGS) $<
 
 all:: $(PROGS)
 
diff --git a/util/Makefile.in b/util/Makefile.in
index 2375e17..5171c1c 100644
--- a/util/Makefile.in
+++ b/util/Makefile.in
@@ -17,7 +17,7 @@ SRCS = $(srcdir)/subst.c
 	$(E) "	CC $<"
 	$(Q) $(BUILD_CC) -c $(BUILD_CFLAGS) $< -o $@
 	$(Q) $(CHECK_CMD) $(ALL_CFLAGS) $<
-	$(Q) $(CPPCHECK_CMD) $<
+	$(Q) $(CPPCHECK_CMD) $(CPPFLAGS) $<
 
 PROGS=		subst symlinks
 
diff --git a/util/static-analysis-cleanup b/util/static-analysis-cleanup
new file mode 100644
index 0000000..6749259
--- /dev/null
+++ b/util/static-analysis-cleanup
@@ -0,0 +1,20 @@
+#!/bin/sed -f
+#
+# This script filters out gcc-wall crud that we're not interested in seeing.
+#
+/^cc /d
+/^kcc /d
+/^gcc /d
+/does not support `long long'/d
+/forbids long long integer constants/d
+/does not support the `ll' length modifier/d
+/does not support the `ll' printf length modifier/d
+/ANSI C forbids long long integer constants/d
+/traditional C rejects string concatenation/d
+/integer constant is unsigned in ANSI C, signed with -traditional/d
+/warning: missing initializer/d
+/warning: (near initialization for/d
+/^[ 	]*from/d
+/unused parameter/d
+/e2_types.h" not found.$/d
+/e2_bitops.h" not found.$/d



* [PATCH 02/37] misc: coverity fixes
  2014-05-01 23:12 [PATCH 00/37] e2fsprogs patchbomb 5/14 Darrick J. Wong
  2014-05-01 23:12 ` [PATCH 01/37] misc: create better-packaged static analysis reports Darrick J. Wong
@ 2014-05-01 23:12 ` Darrick J. Wong
  2014-05-02 11:17   ` Lukáš Czerner
  2014-05-01 23:12 ` [PATCH 03/37] libext2fs: create sockets when populating filesystem Darrick J. Wong
                   ` (32 subsequent siblings)
  34 siblings, 1 reply; 91+ messages in thread
From: Darrick J. Wong @ 2014-05-01 23:12 UTC
  To: tytso, darrick.wong; +Cc: linux-ext4

Fix various small resource leaks and error code handling issues that
Coverity pointed out.
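
For readers unfamiliar with the pattern, the leaks are closed by
routing every failure through a cleanup label instead of returning
early.  The standalone sketch below shows the idea; sum_file_bytes()
is an invented example and is not part of e2fsprogs.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Add up the bytes of a file.  Every failure after a resource has
 * been acquired jumps to a label that releases what is held so far.
 */
static int sum_file_bytes(const char *path, unsigned long *sum)
{
	FILE *fp;
	unsigned char *buf;
	size_t n, i;
	int err = 0;

	assert(path != NULL && sum != NULL);
	*sum = 0;

	fp = fopen(path, "rb");
	if (!fp)
		return errno;		/* nothing to clean up yet */

	buf = malloc(4096);
	if (!buf) {
		err = ENOMEM;
		goto out_close;		/* fp is open, buf is not */
	}

	while ((n = fread(buf, 1, 4096, fp)) > 0)
		for (i = 0; i < n; i++)
			*sum += buf[i];
	if (ferror(fp))
		err = EIO;

	free(buf);
out_close:
	fclose(fp);
	return err;
}
```

The labels are ordered so that falling through from the success path
releases everything, while an early jump skips only the resources that
were never acquired.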

Fixes-Coverity-Bugs: 11919{39-45}, 1174118, 1049160, 1049144
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 debugfs/xattrs.c    |   38 ++++++++++++++++++++------------------
 lib/ext2fs/extent.c |    7 ++++---
 lib/ext2fs/punch.c  |    2 +-
 misc/create_inode.c |   34 ++++++++++++++++++++--------------
 4 files changed, 45 insertions(+), 36 deletions(-)


diff --git a/debugfs/xattrs.c b/debugfs/xattrs.c
index 0a29521..7109719 100644
--- a/debugfs/xattrs.c
+++ b/debugfs/xattrs.c
@@ -122,26 +122,26 @@ void do_get_xattr(int argc, char **argv)
 		default:
 			printf("%s: Usage: %s <file> <attr> [-f outfile]\n",
 			       argv[0], argv[0]);
-			return;
+			goto out2;
 		}
 	}
 
 	if (optind != argc - 2) {
 		printf("%s: Usage: %s <file> <attr> [-f outfile]\n", argv[0],
 		       argv[0]);
-		return;
+		goto out2;
 	}
 
 	if (check_fs_open(argv[0]))
-		return;
+		goto out2;
 
 	ino = string_to_inode(argv[optind]);
 	if (!ino)
-		return;
+		goto out2;
 
 	err = ext2fs_xattrs_open(current_fs, ino, &h);
 	if (err)
-		return;
+		goto out2;
 
 	err = ext2fs_xattrs_read(h);
 	if (err)
@@ -153,18 +153,19 @@ void do_get_xattr(int argc, char **argv)
 
 	if (fp) {
 		fwrite(buf, buflen, 1, fp);
-		fclose(fp);
 	} else {
 		dump_xattr_string(stdout, buf, buflen);
 		printf("\n");
 	}
 
-	if (buf)
-		ext2fs_free_mem(&buf);
+	ext2fs_free_mem(&buf);
 out:
 	ext2fs_xattrs_close(&h);
 	if (err)
 		com_err(argv[0], err, "while getting extended attribute");
+out2:
+	if (fp)
+		fclose(fp);
 }
 
 void do_set_xattr(int argc, char **argv)
@@ -190,30 +191,30 @@ void do_set_xattr(int argc, char **argv)
 		default:
 			printf("%s: Usage: %s <file> <attr> [-f infile | "
 			       "value]\n", argv[0], argv[0]);
-			return;
+			goto out2;
 		}
 	}
 
 	if (optind != argc - 2 && optind != argc - 3) {
 		printf("%s: Usage: %s <file> <attr> [-f infile | value>]\n",
 		       argv[0], argv[0]);
-		return;
+		goto out2;
 	}
 
 	if (check_fs_open(argv[0]))
-		return;
+		goto out2;
 	if (check_fs_read_write(argv[0]))
-		return;
+		goto out2;
 	if (check_fs_bitmaps(argv[0]))
-		return;
+		goto out2;
 
 	ino = string_to_inode(argv[optind]);
 	if (!ino)
-		return;
+		goto out2;
 
 	err = ext2fs_xattrs_open(current_fs, ino, &h);
 	if (err)
-		return;
+		goto out2;
 
 	err = ext2fs_xattrs_read(h);
 	if (err)
@@ -238,13 +239,14 @@ void do_set_xattr(int argc, char **argv)
 		goto out;
 
 out:
+	ext2fs_xattrs_close(&h);
+	if (err)
+		com_err(argv[0], err, "while setting extended attribute");
+out2:
 	if (fp) {
 		fclose(fp);
 		ext2fs_free_mem(&buf);
 	}
-	ext2fs_xattrs_close(&h);
-	if (err)
-		com_err(argv[0], err, "while setting extended attribute");
 }
 
 void do_rm_xattr(int argc, char **argv)
diff --git a/lib/ext2fs/extent.c b/lib/ext2fs/extent.c
index 80ce88f..30673b5 100644
--- a/lib/ext2fs/extent.c
+++ b/lib/ext2fs/extent.c
@@ -1482,7 +1482,7 @@ errcode_t ext2fs_extent_set_bmap(ext2_extent_handle_t handle,
 			if (retval) {
 				r2 = ext2fs_extent_goto(handle, orig_lblk);
 				if (r2 == 0)
-					ext2fs_extent_replace(handle, 0,
+					(void)ext2fs_extent_replace(handle, 0,
 							      &orig_extent);
 				goto done;
 			}
@@ -1498,11 +1498,12 @@ errcode_t ext2fs_extent_set_bmap(ext2_extent_handle_t handle,
 				r2 = ext2fs_extent_goto(handle,
 							newextent.e_lblk);
 				if (r2 == 0)
-					ext2fs_extent_delete(handle, 0);
+					(void)ext2fs_extent_delete(handle, 0);
 			}
 			r2 = ext2fs_extent_goto(handle, orig_lblk);
 			if (r2 == 0)
-				ext2fs_extent_replace(handle, 0, &orig_extent);
+				(void)ext2fs_extent_replace(handle, 0,
+							    &orig_extent);
 			goto done;
 		}
 	}
diff --git a/lib/ext2fs/punch.c b/lib/ext2fs/punch.c
index 60cd2a3..c9250cd 100644
--- a/lib/ext2fs/punch.c
+++ b/lib/ext2fs/punch.c
@@ -403,7 +403,7 @@ static errcode_t ext2fs_punch_extent(ext2_filsys fs, ext2_ino_t ino,
 			retval = 0;
 
 			/* Jump forward to the next extent. */
-			ext2fs_extent_goto(handle, next_lblk);
+			(void)ext2fs_extent_goto(handle, next_lblk);
 			op = EXT2_EXTENT_CURRENT;
 		}
 		if (retval)
diff --git a/misc/create_inode.c b/misc/create_inode.c
index 964c66a..4bb5e5b 100644
--- a/misc/create_inode.c
+++ b/misc/create_inode.c
@@ -465,7 +465,7 @@ static errcode_t __populate_fs(ext2_filsys fs, ext2_ino_t parent_ino,
 	char		ln_target[PATH_MAX];
 	unsigned int	save_inode;
 	ext2_ino_t	ino;
-	errcode_t	retval;
+	errcode_t	retval = 0;
 	int		read_cnt;
 	int		hdlink;
 
@@ -486,7 +486,11 @@ static errcode_t __populate_fs(ext2_filsys fs, ext2_ino_t parent_ino,
 		if ((!strcmp(dent->d_name, ".")) ||
 		    (!strcmp(dent->d_name, "..")))
 			continue;
-		lstat(dent->d_name, &st);
+		if (lstat(dent->d_name, &st)) {
+			com_err(__func__, errno, _("while lstat \"%s\""),
+				dent->d_name);
+			goto out;
+		}
 		name = dent->d_name;
 
 		/* Check for hardlinks */
@@ -501,7 +505,7 @@ static errcode_t __populate_fs(ext2_filsys fs, ext2_ino_t parent_ino,
 				if (retval) {
 					com_err(__func__, retval,
 						"while linking %s", name);
-					return retval;
+					goto out;
 				}
 				continue;
 			} else
@@ -517,7 +521,7 @@ static errcode_t __populate_fs(ext2_filsys fs, ext2_ino_t parent_ino,
 				com_err(__func__, retval,
 					_("while creating special file "
 					  "\"%s\""), name);
-				return retval;
+				goto out;
 			}
 			break;
 		case S_IFSOCK:
@@ -527,7 +531,7 @@ static errcode_t __populate_fs(ext2_filsys fs, ext2_ino_t parent_ino,
 			continue;
 		case S_IFLNK:
 			read_cnt = readlink(name, ln_target,
-					    sizeof(ln_target));
+					    sizeof(ln_target) - 1);
 			if (read_cnt == -1) {
 				com_err(__func__, errno,
 					_("while trying to readlink \"%s\""),
@@ -541,7 +545,7 @@ static errcode_t __populate_fs(ext2_filsys fs, ext2_ino_t parent_ino,
 				com_err(__func__, retval,
 					_("while writing symlink\"%s\""),
 					name);
-				return retval;
+				goto out;
 			}
 			break;
 		case S_IFREG:
@@ -550,7 +554,7 @@ static errcode_t __populate_fs(ext2_filsys fs, ext2_ino_t parent_ino,
 			if (retval) {
 				com_err(__func__, retval,
 					_("while writing file \"%s\""), name);
-				return retval;
+				goto out;
 			}
 			break;
 		case S_IFDIR:
@@ -559,25 +563,25 @@ static errcode_t __populate_fs(ext2_filsys fs, ext2_ino_t parent_ino,
 			if (retval) {
 				com_err(__func__, retval,
 					_("while making dir \"%s\""), name);
-				return retval;
+				goto out;
 			}
 			retval = ext2fs_namei(fs, root, parent_ino,
 					      name, &ino);
 			if (retval) {
 				com_err(name, retval, 0);
-					return retval;
+					goto out;
 			}
 			/* Populate the dir recursively*/
 			retval = __populate_fs(fs, ino, name, root, hdlinks);
 			if (retval) {
 				com_err(__func__, retval,
 					_("while adding dir \"%s\""), name);
-				return retval;
+				goto out;
 			}
 			if (chdir("..")) {
 				com_err(__func__, errno,
 					_("during cd .."));
-				return errno;
+				goto out;
 			}
 			break;
 		default:
@@ -588,14 +592,14 @@ static errcode_t __populate_fs(ext2_filsys fs, ext2_ino_t parent_ino,
 		retval =  ext2fs_namei(fs, root, parent_ino, name, &ino);
 		if (retval) {
 			com_err(name, retval, 0);
-			return retval;
+			goto out;
 		}
 
 		retval = set_inode_extra(fs, parent_ino, ino, &st);
 		if (retval) {
 			com_err(__func__, retval,
 				_("while setting inode for \"%s\""), name);
-			return retval;
+			goto out;
 		}
 
 		/* Save the hardlink ino */
@@ -612,7 +616,7 @@ static errcode_t __populate_fs(ext2_filsys fs, ext2_ino_t parent_ino,
 				if (p == NULL) {
 					com_err(name, errno,
 						_("Not enough memory"));
-					return errno;
+					goto out;
 				}
 				hdlinks->hdl = p;
 				hdlinks->size += HDLINK_CNT;
@@ -623,6 +627,8 @@ static errcode_t __populate_fs(ext2_filsys fs, ext2_ino_t parent_ino,
 			hdlinks->count++;
 		}
 	}
+
+out:
 	closedir(dh);
 	return retval;
 }



* [PATCH 03/37] libext2fs: create sockets when populating filesystem
  2014-05-01 23:12 [PATCH 00/37] e2fsprogs patchbomb 5/14 Darrick J. Wong
  2014-05-01 23:12 ` [PATCH 01/37] misc: create better-packaged static analysis reports Darrick J. Wong
  2014-05-01 23:12 ` [PATCH 02/37] misc: coverity fixes Darrick J. Wong
@ 2014-05-01 23:12 ` Darrick J. Wong
  2014-05-02 11:22   ` Lukáš Czerner
  2014-05-01 23:12 ` [PATCH 04/37] mke2fs: always warn if 128-byte inode and inline_data Darrick J. Wong
                   ` (31 subsequent siblings)
  34 siblings, 1 reply; 91+ messages in thread
From: Darrick J. Wong @ 2014-05-01 23:12 UTC
  To: tytso, darrick.wong; +Cc: linux-ext4

Since the code to copy-in a socket when creating a filesystem is
fairly simple, just do it here.
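
The mapping involved is a switch on the type bits of st_mode.  The
sketch below is illustrative only: the DEMO_FT_* codes are
placeholders assumed to mirror the ext2 directory-entry file types,
and special_file_type() is not a libext2fs function.  Note that each
case ends in a break so that a socket does not fall through to the
default branch.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 600
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>

/* Illustrative type codes; assumed to mirror ext2 dirent file types. */
enum demo_ft {
	DEMO_FT_UNKNOWN = 0,
	DEMO_FT_CHRDEV = 3,
	DEMO_FT_BLKDEV = 4,
	DEMO_FT_FIFO = 5,
	DEMO_FT_SOCK = 6
};

/* Map the type bits of a host stat result to a dirent type code. */
static enum demo_ft special_file_type(const struct stat *st)
{
	enum demo_ft ft = DEMO_FT_UNKNOWN;

	assert(st != NULL);
	switch (st->st_mode & S_IFMT) {
	case S_IFCHR:
		ft = DEMO_FT_CHRDEV;
		break;
	case S_IFBLK:
		ft = DEMO_FT_BLKDEV;
		break;
	case S_IFIFO:
		ft = DEMO_FT_FIFO;
		break;
	case S_IFSOCK:
		ft = DEMO_FT_SOCK;
		break;		/* keep sockets out of the default case */
	default:
		break;		/* not a special file; caller decides */
	}
	return ft;
}
```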

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 misc/create_inode.c |    9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)


diff --git a/misc/create_inode.c b/misc/create_inode.c
index 4bb5e5b..e7faab1 100644
--- a/misc/create_inode.c
+++ b/misc/create_inode.c
@@ -114,6 +114,9 @@ errcode_t do_mknod_internal(ext2_filsys fs, ext2_ino_t cwd, const char *name,
 		mode = LINUX_S_IFIFO;
 		filetype = EXT2_FT_FIFO;
 		break;
+	case S_IFSOCK:
+		mode = LINUX_S_IFSOCK;
+		filetype = EXT2_FT_SOCK;
 	default:
 		abort();
 		/* NOTREACHED */
@@ -516,6 +519,7 @@ static errcode_t __populate_fs(ext2_filsys fs, ext2_ino_t parent_ino,
 		case S_IFCHR:
 		case S_IFBLK:
 		case S_IFIFO:
+		case S_IFSOCK:
 			retval = do_mknod_internal(fs, parent_ino, name, &st);
 			if (retval) {
 				com_err(__func__, retval,
@@ -524,11 +528,6 @@ static errcode_t __populate_fs(ext2_filsys fs, ext2_ino_t parent_ino,
 				goto out;
 			}
 			break;
-		case S_IFSOCK:
-			/* FIXME: there is no make socket function atm. */
-			com_err(__func__, 0,
-				_("ignoring socket file \"%s\""), name);
-			continue;
 		case S_IFLNK:
 			read_cnt = readlink(name, ln_target,
 					    sizeof(ln_target) - 1);



* [PATCH 04/37] mke2fs: always warn if 128-byte inode and inline_data
  2014-05-01 23:12 [PATCH 00/37] e2fsprogs patchbomb 5/14 Darrick J. Wong
                   ` (2 preceding siblings ...)
  2014-05-01 23:12 ` [PATCH 03/37] libext2fs: create sockets when populating filesystem Darrick J. Wong
@ 2014-05-01 23:12 ` Darrick J. Wong
  2014-05-02 11:27   ` Lukáš Czerner
  2014-05-01 23:12 ` [PATCH 05/37] debugfs: teach logdump to deal with 64bit revoke tables Darrick J. Wong
                   ` (30 subsequent siblings)
  34 siblings, 1 reply; 91+ messages in thread
From: Darrick J. Wong @ 2014-05-01 23:12 UTC
  To: tytso, darrick.wong; +Cc: linux-ext4

The combination of 128-byte inodes and inline_data is silly, since
there's no room in the inode table.  Unfortunately, if neither
mke2fs.conf nor the mkfs command line options specify an inode size,
the default inode size is set to 128 bytes (by libext2fs) and the
warning isn't printed.  Therefore, always do the check-and-warning.
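
To make the ordering problem concrete, here is a small sketch of
checking the effective inode size rather than the user-supplied one.
The struct, the DEMO_* constants and both function names are
hypothetical stand-ins, not the mke2fs code.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_INCOMPAT_INLINE_DATA	0x8000u	/* illustrative feature bit */
#define DEMO_OLD_INODE_SIZE		128u	/* original ext2 inode size */

struct demo_params {
	unsigned int feature_incompat;	/* requested incompat features */
	unsigned int inode_size;	/* 0 means "not specified" */
};

/* Inode size the filesystem really gets once defaults are applied. */
static unsigned int effective_inode_size(const struct demo_params *p)
{
	return p->inode_size ? p->inode_size : DEMO_OLD_INODE_SIZE;
}

/* Nonzero if inline_data was requested but the inodes cannot hold it. */
static int inline_data_is_useless(const struct demo_params *p)
{
	assert(p != NULL);
	return (p->feature_incompat & DEMO_INCOMPAT_INLINE_DATA) &&
	       effective_inode_size(p) == DEMO_OLD_INODE_SIZE;
}
```

With this shape the check fires both when the user asks for 128-byte
inodes explicitly and when the size is left to the default.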

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 misc/mke2fs.c |   25 +++++++++++++------------
 1 file changed, 13 insertions(+), 12 deletions(-)


diff --git a/misc/mke2fs.c b/misc/mke2fs.c
index aecd5d5..6507d0d 100644
--- a/misc/mke2fs.c
+++ b/misc/mke2fs.c
@@ -2282,21 +2282,22 @@ profile_error:
 				blocksize);
 			exit(1);
 		}
-		/*
-		 * If inode size is 128 and inline data is enabled, we need
-		 * to notify users that inline data will never be useful.
-		 */
-		if ((fs_param.s_feature_incompat &
-		     EXT4_FEATURE_INCOMPAT_INLINE_DATA) &&
-		    inode_size == EXT2_GOOD_OLD_INODE_SIZE) {
-			com_err(program_name, 0,
-				_("inode size is %d, inline data is useless"),
-				inode_size);
-			exit(1);
-		}
 		fs_param.s_inode_size = inode_size;
 	}
 
+	/*
+	 * If inode size is 128 and inline data is enabled, we need
+	 * to notify users that inline data will never be useful.
+	 */
+	if ((fs_param.s_feature_incompat &
+	     EXT4_FEATURE_INCOMPAT_INLINE_DATA) &&
+	    fs_param.s_inode_size == EXT2_GOOD_OLD_INODE_SIZE) {
+		com_err(program_name, 0,
+			_("inode size is %d, inline data is useless"),
+			inode_size);
+		exit(1);
+	}
+
 	/* Make sure number of inodes specified will fit in 32 bits */
 	if (num_inodes == 0) {
 		unsigned long long n;



* [PATCH 05/37] debugfs: teach logdump to deal with 64bit revoke tables
  2014-05-01 23:12 [PATCH 00/37] e2fsprogs patchbomb 5/14 Darrick J. Wong
                   ` (3 preceding siblings ...)
  2014-05-01 23:12 ` [PATCH 04/37] mke2fs: always warn if 128-byte inode and inline_data Darrick J. Wong
@ 2014-05-01 23:12 ` Darrick J. Wong
  2014-05-02 11:38   ` Lukáš Czerner
  2014-05-01 23:13 ` [PATCH 06/37] debugfs: force logdump to display (old) journal contents Darrick J. Wong
                   ` (29 subsequent siblings)
  34 siblings, 1 reply; 91+ messages in thread
From: Darrick J. Wong @ 2014-05-01 23:12 UTC
  To: tytso, darrick.wong; +Cc: linux-ext4

The logdump command doesn't know how to deal with revoke tables in
64bit journals, so teach it to do this.
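
The difference between the two layouts can be sketched in isolation.
The code below is a standalone illustration, not the debugfs code: it
assumes a 16-byte revoke header whose last four bytes hold r_count
(the number of bytes in use, header included), followed by big-endian
block numbers that are 4 bytes wide, or 8 bytes wide in a 64bit
journal.  parse_revoke_block() and read_be() are invented names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define REVOKE_HDR_SIZE	16	/* assumed: 12-byte header + r_count */

/* Read a big-endian integer of `width` bytes (4 or 8) byte by byte. */
static uint64_t read_be(const unsigned char *p, size_t width)
{
	uint64_t v = 0;
	size_t i;

	for (i = 0; i < width; i++)
		v = (v << 8) | p[i];
	return v;
}

/*
 * Decode the revoked block numbers in one revoke block.  Returns the
 * number of entries copied to out[] (at most max_out).
 */
static size_t parse_revoke_block(const unsigned char *buf, size_t buflen,
				 int is_64bit, uint64_t *out, size_t max_out)
{
	size_t width = is_64bit ? 8 : 4;
	size_t offset = REVOKE_HDR_SIZE;
	size_t used, n = 0;

	assert(buf != NULL && out != NULL);
	if (buflen < REVOKE_HDR_SIZE)
		return 0;
	used = (size_t)read_be(buf + 12, 4);
	if (used > buflen)
		used = buflen;		/* never read past the buffer */
	while (offset + width <= used && n < max_out) {
		out[n++] = read_be(buf + offset, width);
		offset += width;
	}
	return n;
}
```

Fed a block with 8-byte entries, the 4-byte path returns every block
number preceded by a zero entry, which is the pattern being removed
from tests/f_jnl_64bit/expect.0 in this patch.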

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 debugfs/logdump.c          |   20 ++++-
 tests/f_jnl_64bit/expect.0 |  171 --------------------------------------------
 2 files changed, 15 insertions(+), 176 deletions(-)


diff --git a/debugfs/logdump.c b/debugfs/logdump.c
index 2d0efaf..8b9dc5b 100644
--- a/debugfs/logdump.c
+++ b/debugfs/logdump.c
@@ -526,28 +526,38 @@ static void dump_revoke_block(FILE *out_file, char *buf,
 {
 	int			offset, max;
 	journal_revoke_header_t *header;
-	unsigned int		*entry, rblock;
+	unsigned int		*entry;
+	unsigned long long	*bentry, rblock;
+	int			tag_size = sizeof(*entry);
 
 	if (dump_all)
 		fprintf(out_file, "Dumping revoke block, sequence %u, at "
 			"block %u:\n", transaction, blocknr);
 
+	if (be32_to_cpu(jsb->s_feature_incompat) & JFS_FEATURE_INCOMPAT_64BIT)
+		tag_size = sizeof(*bentry);
+
 	header = (journal_revoke_header_t *) buf;
 	offset = sizeof(journal_revoke_header_t);
 	max = be32_to_cpu(header->r_count);
 
 	while (offset < max) {
-		entry = (unsigned int *) (buf + offset);
-		rblock = be32_to_cpu(*entry);
+		if (tag_size == sizeof(*entry)) {
+			entry = (unsigned int *) (buf + offset);
+			rblock = be32_to_cpu(*entry);
+		} else {
+			bentry = (unsigned long long *)(buf + offset);
+			rblock = ext2fs_be64_to_cpu(*bentry);
+		}
 		if (dump_all || rblock == block_to_dump) {
-			fprintf(out_file, "  Revoke FS block %u", rblock);
+			fprintf(out_file, "  Revoke FS block %llu", rblock);
 			if (dump_all)
 				fprintf(out_file, "\n");
 			else
 				fprintf(out_file," at block %u, sequence %u\n",
 					blocknr, transaction);
 		}
-		offset += 4;
+		offset += tag_size;
 	}
 }
 
diff --git a/tests/f_jnl_64bit/expect.0 b/tests/f_jnl_64bit/expect.0
index 2007f03..5cef2d8 100644
--- a/tests/f_jnl_64bit/expect.0
+++ b/tests/f_jnl_64bit/expect.0
@@ -1,189 +1,97 @@
 Journal starts at block 67, transaction 32
 Found expected sequence 32, type 5 (revoke table) at block 67
 Dumping revoke block, sequence 32, at block 67:
-  Revoke FS block 0
   Revoke FS block 1536
-  Revoke FS block 0
   Revoke FS block 1472
-  Revoke FS block 0
   Revoke FS block 1473
-  Revoke FS block 0
   Revoke FS block 1474
-  Revoke FS block 0
   Revoke FS block 1475
-  Revoke FS block 0
   Revoke FS block 1476
-  Revoke FS block 0
   Revoke FS block 1541
-  Revoke FS block 0
   Revoke FS block 1477
-  Revoke FS block 0
   Revoke FS block 1478
-  Revoke FS block 0
   Revoke FS block 1479
-  Revoke FS block 0
   Revoke FS block 1480
-  Revoke FS block 0
   Revoke FS block 1481
-  Revoke FS block 0
   Revoke FS block 1482
-  Revoke FS block 0
   Revoke FS block 1483
-  Revoke FS block 0
   Revoke FS block 1484
-  Revoke FS block 0
   Revoke FS block 1485
-  Revoke FS block 0
   Revoke FS block 1486
-  Revoke FS block 0
   Revoke FS block 1487
-  Revoke FS block 0
   Revoke FS block 1488
-  Revoke FS block 0
   Revoke FS block 1489
-  Revoke FS block 0
   Revoke FS block 1490
-  Revoke FS block 0
   Revoke FS block 1491
-  Revoke FS block 0
   Revoke FS block 1556
-  Revoke FS block 0
   Revoke FS block 1492
-  Revoke FS block 0
   Revoke FS block 1493
-  Revoke FS block 0
   Revoke FS block 1429
-  Revoke FS block 0
   Revoke FS block 1494
-  Revoke FS block 0
   Revoke FS block 1495
-  Revoke FS block 0
   Revoke FS block 1496
-  Revoke FS block 0
   Revoke FS block 1432
-  Revoke FS block 0
   Revoke FS block 1497
-  Revoke FS block 0
   Revoke FS block 1498
-  Revoke FS block 0
   Revoke FS block 1434
-  Revoke FS block 0
   Revoke FS block 1499
-  Revoke FS block 0
   Revoke FS block 1435
-  Revoke FS block 0
   Revoke FS block 1500
-  Revoke FS block 0
   Revoke FS block 1501
-  Revoke FS block 0
   Revoke FS block 1502
-  Revoke FS block 0
   Revoke FS block 1503
-  Revoke FS block 0
   Revoke FS block 1504
-  Revoke FS block 0
   Revoke FS block 1505
-  Revoke FS block 0
   Revoke FS block 1506
-  Revoke FS block 0
   Revoke FS block 1442
-  Revoke FS block 0
   Revoke FS block 1507
-  Revoke FS block 0
   Revoke FS block 1508
-  Revoke FS block 0
   Revoke FS block 1444
-  Revoke FS block 0
   Revoke FS block 1509
-  Revoke FS block 0
   Revoke FS block 1445
-  Revoke FS block 0
   Revoke FS block 1510
-  Revoke FS block 0
   Revoke FS block 1511
-  Revoke FS block 0
   Revoke FS block 1512
-  Revoke FS block 0
   Revoke FS block 1513
-  Revoke FS block 0
   Revoke FS block 1449
-  Revoke FS block 0
   Revoke FS block 1514
-  Revoke FS block 0
   Revoke FS block 1515
-  Revoke FS block 0
   Revoke FS block 1516
-  Revoke FS block 0
   Revoke FS block 1517
-  Revoke FS block 0
   Revoke FS block 1453
-  Revoke FS block 0
   Revoke FS block 1518
-  Revoke FS block 0
   Revoke FS block 1519
-  Revoke FS block 0
   Revoke FS block 1520
-  Revoke FS block 0
   Revoke FS block 1456
-  Revoke FS block 0
   Revoke FS block 1521
-  Revoke FS block 0
   Revoke FS block 1457
-  Revoke FS block 0
   Revoke FS block 1522
-  Revoke FS block 0
   Revoke FS block 1458
-  Revoke FS block 0
   Revoke FS block 1523
-  Revoke FS block 0
   Revoke FS block 1459
-  Revoke FS block 0
   Revoke FS block 1524
-  Revoke FS block 0
   Revoke FS block 1460
-  Revoke FS block 0
   Revoke FS block 1525
-  Revoke FS block 0
   Revoke FS block 1461
-  Revoke FS block 0
   Revoke FS block 1526
-  Revoke FS block 0
   Revoke FS block 1462
-  Revoke FS block 0
   Revoke FS block 1527
-  Revoke FS block 0
   Revoke FS block 1463
-  Revoke FS block 0
   Revoke FS block 1528
-  Revoke FS block 0
   Revoke FS block 1464
-  Revoke FS block 0
   Revoke FS block 1529
-  Revoke FS block 0
   Revoke FS block 1465
-  Revoke FS block 0
   Revoke FS block 1530
-  Revoke FS block 0
   Revoke FS block 1466
-  Revoke FS block 0
   Revoke FS block 1531
-  Revoke FS block 0
   Revoke FS block 1467
-  Revoke FS block 0
   Revoke FS block 1532
-  Revoke FS block 0
   Revoke FS block 1468
-  Revoke FS block 0
   Revoke FS block 1533
-  Revoke FS block 0
   Revoke FS block 1469
-  Revoke FS block 0
   Revoke FS block 1534
-  Revoke FS block 0
   Revoke FS block 1470
-  Revoke FS block 0
   Revoke FS block 1535
-  Revoke FS block 0
   Revoke FS block 1471
 Found expected sequence 32, type 1 (descriptor block) at block 68
 Dumping descriptor block, sequence 32, at block 68:
@@ -323,163 +231,84 @@ Dumping descriptor block, sequence 32, at block 150:
 Found expected sequence 32, type 2 (commit block) at block 201
 Found expected sequence 33, type 5 (revoke table) at block 202
 Dumping revoke block, sequence 33, at block 202:
-  Revoke FS block 0
   Revoke FS block 1600
-  Revoke FS block 0
   Revoke FS block 1601
-  Revoke FS block 0
   Revoke FS block 1537
-  Revoke FS block 0
   Revoke FS block 1602
-  Revoke FS block 0
   Revoke FS block 1538
-  Revoke FS block 0
   Revoke FS block 1603
-  Revoke FS block 0
   Revoke FS block 1539
-  Revoke FS block 0
   Revoke FS block 1604
-  Revoke FS block 0
   Revoke FS block 1540
-  Revoke FS block 0
   Revoke FS block 1605
-  Revoke FS block 0
   Revoke FS block 1606
-  Revoke FS block 0
   Revoke FS block 1542
-  Revoke FS block 0
   Revoke FS block 1607
-  Revoke FS block 0
   Revoke FS block 1543
-  Revoke FS block 0
   Revoke FS block 1608
-  Revoke FS block 0
   Revoke FS block 1544
-  Revoke FS block 0
   Revoke FS block 1609
-  Revoke FS block 0
   Revoke FS block 1545
-  Revoke FS block 0
   Revoke FS block 1610
-  Revoke FS block 0
   Revoke FS block 1546
-  Revoke FS block 0
   Revoke FS block 1611
-  Revoke FS block 0
   Revoke FS block 1547
-  Revoke FS block 0
   Revoke FS block 1612
-  Revoke FS block 0
   Revoke FS block 1548
-  Revoke FS block 0
   Revoke FS block 1613
-  Revoke FS block 0
   Revoke FS block 1549
-  Revoke FS block 0
   Revoke FS block 1614
-  Revoke FS block 0
   Revoke FS block 1550
-  Revoke FS block 0
   Revoke FS block 1615
-  Revoke FS block 0
   Revoke FS block 1551
-  Revoke FS block 0
   Revoke FS block 1616
-  Revoke FS block 0
   Revoke FS block 1552
-  Revoke FS block 0
   Revoke FS block 1617
-  Revoke FS block 0
   Revoke FS block 1553
-  Revoke FS block 0
   Revoke FS block 1554
-  Revoke FS block 0
   Revoke FS block 1555
-  Revoke FS block 0
   Revoke FS block 1557
-  Revoke FS block 0
   Revoke FS block 1558
-  Revoke FS block 0
   Revoke FS block 1559
-  Revoke FS block 0
   Revoke FS block 1560
-  Revoke FS block 0
   Revoke FS block 1561
-  Revoke FS block 0
   Revoke FS block 1562
-  Revoke FS block 0
   Revoke FS block 1563
-  Revoke FS block 0
   Revoke FS block 1564
-  Revoke FS block 0
   Revoke FS block 1565
-  Revoke FS block 0
   Revoke FS block 1566
-  Revoke FS block 0
   Revoke FS block 1567
-  Revoke FS block 0
   Revoke FS block 1568
-  Revoke FS block 0
   Revoke FS block 1569
-  Revoke FS block 0
   Revoke FS block 1570
-  Revoke FS block 0
   Revoke FS block 1571
-  Revoke FS block 0
   Revoke FS block 1572
-  Revoke FS block 0
   Revoke FS block 1573
-  Revoke FS block 0
   Revoke FS block 1574
-  Revoke FS block 0
   Revoke FS block 1575
-  Revoke FS block 0
   Revoke FS block 1576
-  Revoke FS block 0
   Revoke FS block 1577
-  Revoke FS block 0
   Revoke FS block 1578
-  Revoke FS block 0
   Revoke FS block 1579
-  Revoke FS block 0
   Revoke FS block 1580
-  Revoke FS block 0
   Revoke FS block 1581
-  Revoke FS block 0
   Revoke FS block 1582
-  Revoke FS block 0
   Revoke FS block 1583
-  Revoke FS block 0
   Revoke FS block 1584
-  Revoke FS block 0
   Revoke FS block 1585
-  Revoke FS block 0
   Revoke FS block 1586
-  Revoke FS block 0
   Revoke FS block 1587
-  Revoke FS block 0
   Revoke FS block 1588
-  Revoke FS block 0
   Revoke FS block 1589
-  Revoke FS block 0
   Revoke FS block 1590
-  Revoke FS block 0
   Revoke FS block 1591
-  Revoke FS block 0
   Revoke FS block 1592
-  Revoke FS block 0
   Revoke FS block 1593
-  Revoke FS block 0
   Revoke FS block 1594
-  Revoke FS block 0
   Revoke FS block 1595
-  Revoke FS block 0
   Revoke FS block 1596
-  Revoke FS block 0
   Revoke FS block 1597
-  Revoke FS block 0
   Revoke FS block 1598
-  Revoke FS block 0
   Revoke FS block 1599
 Found expected sequence 33, type 1 (descriptor block) at block 203
 Dumping descriptor block, sequence 33, at block 203:



* [PATCH 06/37] debugfs: force logdump to display (old) journal contents
  2014-05-01 23:12 [PATCH 00/37] e2fsprogs patchbomb 5/14 Darrick J. Wong
                   ` (4 preceding siblings ...)
  2014-05-01 23:12 ` [PATCH 05/37] debugfs: teach logdump to deal with 64bit revoke tables Darrick J. Wong
@ 2014-05-01 23:13 ` Darrick J. Wong
  2014-05-02 11:49   ` Lukáš Czerner
  2014-05-01 23:13 ` [PATCH 07/37] resize2fs: fix check for collision between old GDT and superblock on sparse_super2 fs Darrick J. Wong
                   ` (28 subsequent siblings)
  34 siblings, 1 reply; 91+ messages in thread
From: Darrick J. Wong @ 2014-05-01 23:13 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

If the user passes -a more than once to logdump, try to dump old log
contents.  This can be used to try to track down journal problems even
after recovery.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 debugfs/logdump.c |   11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)
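
As a rough illustration of the first hunk only, the choice of starting
block can be modelled as below.  This is a sketch under the assumption
that a start block of 0 in the journal superblock means "empty journal";
dump_start_block is a hypothetical name, and the value 1 is simply the
block the patch falls back to.

```c
#include <assert.h>

/*
 * Pick the journal block at which a dump should start.
 * sb_start is the start block recorded in the journal superblock and
 * dump_all counts how many times -a was given.
 * Returns 0 when there is nothing to dump.
 */
static unsigned int dump_start_block(unsigned int sb_start, int dump_all)
{
	assert(dump_all >= 0);

	if (sb_start != 0)
		return sb_start;	/* journal in use: start where it says */
	if (dump_all < 2)
		return 0;		/* empty journal, no forced dump */
	return 1;			/* forced dump: fall back to block 1 */
}
```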


diff --git a/debugfs/logdump.c b/debugfs/logdump.c
index 8b9dc5b..bf4bef5 100644
--- a/debugfs/logdump.c
+++ b/debugfs/logdump.c
@@ -393,9 +393,13 @@ static void dump_journal(char *cmdname, FILE *out_file,
 	fprintf(out_file, "Journal starts at block %u, transaction %u\n",
 		blocknr, transaction);
 
-	if (!blocknr)
+	if (!blocknr) {
 		/* Empty journal, nothing to do. */
-		return;
+		if (dump_all < 2)
+			return;
+		else
+			blocknr = 1;
+	}
 
 	while (1) {
 		retval = read_journal_block(cmdname, source,
@@ -420,7 +424,8 @@ static void dump_journal(char *cmdname, FILE *out_file,
 			fprintf (out_file, "Found sequence %u (not %u) at "
 				 "block %u: end of journal.\n",
 				 sequence, transaction, blocknr);
-			return;
+			if (dump_all < 2)
+				return;
 		}
 
 		if (dump_descriptors) {



* [PATCH 07/37] resize2fs: fix check for collision between old GDT and superblock on sparse_super2 fs
  2014-05-01 23:12 [PATCH 00/37] e2fsprogs patchbomb 5/14 Darrick J. Wong
                   ` (5 preceding siblings ...)
  2014-05-01 23:13 ` [PATCH 06/37] debugfs: force logdump to display (old) journal contents Darrick J. Wong
@ 2014-05-01 23:13 ` Darrick J. Wong
  2014-05-12  3:35   ` Theodore Ts'o
  2014-05-01 23:13 ` [PATCH 08/37] mke2fs: set gdt csum when creating packed fs Darrick J. Wong
                   ` (27 subsequent siblings)
  34 siblings, 1 reply; 91+ messages in thread
From: Darrick J. Wong @ 2014-05-01 23:13 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

In reserve_sparse_super2_last_group, the old_desc check should only be
performed if ext2fs_super_and_bgd_loc2() gave us a location -- a
return value of 0 means that there is no old-style GDT block.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 resize/resize2fs.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
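
The corrected condition can be read as a small predicate.  The sketch
below is illustrative only: old_desc_is_unexpected is a hypothetical
name, plain 64-bit integers stand in for the library's block number
type, and the assert reflects the assumption that the caller has
already rejected a missing superblock.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return nonzero if an old-style GDT location was reported and it does
 * not immediately follow the backup superblock.  An old_desc of 0 means
 * "no old-style GDT block" and is therefore never an error.
 */
static int old_desc_is_unexpected(uint64_t sb, uint64_t old_desc)
{
	assert(sb != 0);	/* assumed: checked by the caller beforehand */

	return old_desc != 0 && old_desc != sb + 1;
}
```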


diff --git a/resize/resize2fs.c b/resize/resize2fs.c
index f5f1337..a81a1c3 100644
--- a/resize/resize2fs.c
+++ b/resize/resize2fs.c
@@ -2121,7 +2121,7 @@ static errcode_t reserve_sparse_super2_last_group(ext2_resize_t rfs,
 		      stderr);
 		exit(1);
 	}
-	if (old_desc != sb+1) {
+	if (old_desc && old_desc != sb+1) {
 		fputs(_("Should never happen!  Unexpected old_desc in "
 			"super_sparse bg?\n"),
 		      stderr);



* [PATCH 08/37] mke2fs: set gdt csum when creating packed fs
  2014-05-01 23:12 [PATCH 00/37] e2fsprogs patchbomb 5/14 Darrick J. Wong
                   ` (6 preceding siblings ...)
  2014-05-01 23:13 ` [PATCH 07/37] resize2fs: fix check for collision between old GDT and superblock on sparse_super2 fs Darrick J. Wong
@ 2014-05-01 23:13 ` Darrick J. Wong
  2014-05-02 11:55   ` Lukáš Czerner
  2014-05-01 23:13 ` [PATCH 09/37] mke2fs: set error behavior at initialization time Darrick J. Wong
                   ` (26 subsequent siblings)
  34 siblings, 1 reply; 91+ messages in thread
From: Darrick J. Wong @ 2014-05-01 23:13 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

When we're creating a fs with metadata blocks packed at the beginning
(packed_meta_blocks=1 in mke2fs.conf), set the group descriptor
checksum or else we create DOA filesystems with checksum errors.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 misc/mke2fs.c |    1 +
 1 file changed, 1 insertion(+)


diff --git a/misc/mke2fs.c b/misc/mke2fs.c
index 6507d0d..fd6259d 100644
--- a/misc/mke2fs.c
+++ b/misc/mke2fs.c
@@ -383,6 +383,7 @@ static errcode_t packed_allocate_tables(ext2_filsys fs)
 		ext2fs_block_alloc_stats_range(fs, goal,
 					       fs->inode_blocks_per_group, +1);
 		ext2fs_inode_table_loc_set(fs, i, goal);
+		ext2fs_group_desc_csum_set(fs, i);
 	}
 	return 0;
 }



* [PATCH 09/37] mke2fs: set error behavior at initialization time
  2014-05-01 23:12 [PATCH 00/37] e2fsprogs patchbomb 5/14 Darrick J. Wong
                   ` (7 preceding siblings ...)
  2014-05-01 23:13 ` [PATCH 08/37] mke2fs: set gdt csum when creating packed fs Darrick J. Wong
@ 2014-05-01 23:13 ` Darrick J. Wong
  2014-05-02 12:13   ` Lukáš Czerner
  2014-05-01 23:13 ` [PATCH 10/37] e2fsck: verify checksums after checking everything else Darrick J. Wong
                   ` (25 subsequent siblings)
  34 siblings, 1 reply; 91+ messages in thread
From: Darrick J. Wong @ 2014-05-01 23:13 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

Port tune2fs' -e flag to mke2fs so that we can set error behavior at
format time, and introduce the equivalent errors= setting into
mke2fs.conf.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 misc/mke2fs.8.in             |   23 +++++++++
 misc/mke2fs.c                |   57 +++++++++++++++++++++-
 misc/mke2fs.conf.5.in        |   19 +++++++
 tests/t_mke2fs_errors/expect |   24 +++++++++
 tests/t_mke2fs_errors/script |  110 ++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 231 insertions(+), 2 deletions(-)
 create mode 100644 tests/t_mke2fs_errors/expect
 create mode 100755 tests/t_mke2fs_errors/script
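
The precedence this patch sets up (command line over mke2fs.conf over
the existing superblock value) can be sketched in isolation as follows.
This is not the mke2fs code: the enum and both function names are
invented for the example, and the numeric values are assumed to match
the EXT2_ERRORS_* constants.

```c
#include <assert.h>
#include <string.h>

/* Stand-ins for EXT2_ERRORS_*; 0 is used here to mean "not given". */
enum {
	ERRORS_UNSET = 0,
	ERRORS_CONTINUE = 1,
	ERRORS_RO = 2,
	ERRORS_PANIC = 3
};

/* Map an option string to an ERRORS_* value, or -1 if unrecognised. */
static int parse_errors_behavior(const char *s)
{
	assert(s != NULL);

	if (strcmp(s, "continue") == 0)
		return ERRORS_CONTINUE;
	if (strcmp(s, "remount-ro") == 0)
		return ERRORS_RO;
	if (strcmp(s, "panic") == 0)
		return ERRORS_PANIC;
	return -1;
}

/*
 * Resolve the final setting.  profile_value may be NULL when the
 * profile has no errors= relation; cmdline_value is ERRORS_UNSET when
 * -e was not given.  A bad profile string is an error (-1) even if
 * -e was given, matching the order of checks in the patch.
 */
static int resolve_errors_behavior(int current, const char *profile_value,
				   int cmdline_value)
{
	int errors = current;

	if (profile_value != NULL) {
		errors = parse_errors_behavior(profile_value);
		if (errors < 0)
			return -1;
	}
	if (cmdline_value != ERRORS_UNSET)
		errors = cmdline_value;
	return errors;
}
```

The last case in the test script (fs_types says remount-ro, -e says
panic) corresponds to resolve_errors_behavior(current, "remount-ro",
ERRORS_PANIC), which yields the panic setting.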


diff --git a/misc/mke2fs.8.in b/misc/mke2fs.8.in
index bf17eae..bad76bb 100644
--- a/misc/mke2fs.8.in
+++ b/misc/mke2fs.8.in
@@ -113,6 +113,10 @@ mke2fs \- create an ext2/ext3/ext4 filesystem
 [
 .B \-V
 ]
+[
+.B \-e
+.I error-behavior
+]
 .I device
 [
 .I blocks-count
@@ -206,6 +210,25 @@ lot of buffer cache memory, which may impact other applications running
 on a busy server.  This option will cause mke2fs to run much more
 slowly, however, so there is a tradeoff to using direct I/O.
 .TP
+.BI \-e " error-behavior"
+Change the behavior of the kernel code when errors are detected.
+In all cases, a filesystem error will cause
+.BR e2fsck (8)
+to check the filesystem on the next boot.
+.I error-behavior
+can be one of the following:
+.RS 1.2i
+.TP 1.2i
+.B continue
+Continue normal execution.
+.TP
+.B remount-ro
+Remount filesystem read-only.
+.TP
+.B panic
+Cause a kernel panic.
+.RE
+.TP
 .BI \-E " extended-options"
 Set extended options for the filesystem.  Extended options are comma
 separated, and may take an argument using the equals ('=') sign.  The
diff --git a/misc/mke2fs.c b/misc/mke2fs.c
index fd6259d..a794689 100644
--- a/misc/mke2fs.c
+++ b/misc/mke2fs.c
@@ -112,6 +112,8 @@ static profile_t	profile;
 static int sys_page_size = 4096;
 static int linux_version_code = 0;
 
+static int errors_behavior = 0;
+
 static void usage(void)
 {
 	fprintf(stderr, _("Usage: %s [-c|-l filename] [-b block-size] "
@@ -123,7 +125,7 @@ static void usage(void)
 	"\t[-g blocks-per-group] [-L volume-label] "
 	"[-M last-mounted-directory]\n\t[-O feature[,...]] "
 	"[-r fs-revision] [-E extended-option[,...]]\n"
-	"\t[-t fs-type] [-T usage-type ] [-U UUID] "
+	"\t[-t fs-type] [-T usage-type ] [-U UUID] [-e errors_behavior] "
 	"[-jnqvDFKSV] device [blocks-count]\n"),
 		program_name);
 	exit(1);
@@ -1524,7 +1526,7 @@ profile_error:
 	}
 
 	while ((c = getopt (argc, argv,
-		    "b:cg:i:jl:m:no:qr:s:t:d:vC:DE:FG:I:J:KL:M:N:O:R:ST:U:V")) != EOF) {
+		    "b:ce:g:i:jl:m:no:qr:s:t:d:vC:DE:FG:I:J:KL:M:N:O:R:ST:U:V")) != EOF) {
 		switch (c) {
 		case 'b':
 			blocksize = parse_num_blocks2(optarg, -1);
@@ -1567,6 +1569,20 @@ profile_error:
 		case 'E':
 			extended_opts = optarg;
 			break;
+		case 'e':
+			if (strcmp(optarg, "continue") == 0)
+				errors_behavior = EXT2_ERRORS_CONTINUE;
+			else if (strcmp(optarg, "remount-ro") == 0)
+				errors_behavior = EXT2_ERRORS_RO;
+			else if (strcmp(optarg, "panic") == 0)
+				errors_behavior = EXT2_ERRORS_PANIC;
+			else {
+				com_err(program_name, 0,
+					_("bad error behavior - %s"),
+					optarg);
+				usage();
+			}
+			break;
 		case 'F':
 			force++;
 			break;
@@ -2577,6 +2593,38 @@ static int create_quota_inodes(ext2_filsys fs)
 	return 0;
 }
 
+static errcode_t set_error_behavior(ext2_filsys fs)
+{
+	char	*arg = NULL;
+	short	errors = fs->super->s_errors;
+
+	arg = get_string_from_profile(fs_types, "errors", NULL);
+	if (arg == NULL)
+		goto try_user;
+
+	if (strcmp(arg, "continue") == 0)
+		errors = EXT2_ERRORS_CONTINUE;
+	else if (strcmp(arg, "remount-ro") == 0)
+		errors = EXT2_ERRORS_RO;
+	else if (strcmp(arg, "panic") == 0)
+		errors = EXT2_ERRORS_PANIC;
+	else {
+		com_err(program_name, 0,
+			_("bad error behavior in profile - %s"),
+			arg);
+		free(arg);
+		return EXT2_ET_INVALID_ARGUMENT;
+	}
+	free(arg);
+
+try_user:
+	if (errors_behavior)
+		errors = errors_behavior;
+
+	fs->super->s_errors = errors;
+	return 0;
+}
+
 int main (int argc, char *argv[])
 {
 	errcode_t	retval = 0;
@@ -2641,6 +2689,11 @@ int main (int argc, char *argv[])
 	}
 	fs->progress_ops = &ext2fs_numeric_progress_ops;
 
+	/* Set the error behavior */
+	retval = set_error_behavior(fs);
+	if (retval)
+		usage();
+
 	/* Check the user's mkfs options for metadata checksumming */
 	if (!quiet &&
 	    EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
diff --git a/misc/mke2fs.conf.5.in b/misc/mke2fs.conf.5.in
index 02efdce..18a003a 100644
--- a/misc/mke2fs.conf.5.in
+++ b/misc/mke2fs.conf.5.in
@@ -302,6 +302,25 @@ whose subsections define the
 relation, only the last will be used by
 .BR mke2fs (8).
 .TP
+.I errors
+Change the behavior of the kernel code when errors are detected.
+In all cases, a filesystem error will cause
+.BR e2fsck (8)
+to check the filesystem on the next boot.
+.I errors
+can be one of the following:
+.RS 1.2i
+.TP 1.2i
+.B continue
+Continue normal execution.
+.TP
+.B remount-ro
+Remount filesystem read-only.
+.TP
+.B panic
+Cause a kernel panic.
+.RE
+.TP
 .I features
 This relation specifies a comma-separated list of features edit
 requests which modify the feature set
diff --git a/tests/t_mke2fs_errors/expect b/tests/t_mke2fs_errors/expect
new file mode 100644
index 0000000..78514bd
--- /dev/null
+++ b/tests/t_mke2fs_errors/expect
@@ -0,0 +1,24 @@
+error default
+Errors behavior:          Continue
+error continue
+Errors behavior:          Continue
+error panic
+Errors behavior:          Panic
+error remount-ro
+Errors behavior:          Remount read-only
+error garbage
+error default profile continue
+Errors behavior:          Continue
+error default profile panic
+Errors behavior:          Panic
+error default profile remount-ro
+Errors behavior:          Remount read-only
+error default profile broken
+error fs_types profile continue
+Errors behavior:          Continue
+error fs_types profile panic
+Errors behavior:          Panic
+error fs_types profile remount-ro
+Errors behavior:          Remount read-only
+error fs_types profile remount-ro
+Errors behavior:          Panic
diff --git a/tests/t_mke2fs_errors/script b/tests/t_mke2fs_errors/script
new file mode 100755
index 0000000..d09e926
--- /dev/null
+++ b/tests/t_mke2fs_errors/script
@@ -0,0 +1,110 @@
+test_description="mke2fs with error behavior"
+
+conf=$TMPFILE.conf
+write_defaults_conf()
+{
+	errors="$1"
+	cat > $conf << ENDL
+[defaults]
+	errors = $errors
+ENDL
+}
+
+write_section_conf()
+{
+	errors="$1"
+	cat > $conf << ENDL
+[defaults]
+	errors = broken
+
+[fs_types]
+	test_suite = {
+		errors = $errors
+	}
+ENDL
+}
+
+trap "rm -rf $TMPFILE $TMPFILE.conf" EXIT INT QUIT
+dd if=/dev/zero of=$TMPFILE bs=1k count=512 > /dev/null 2>&1
+OUT=$test_name.log
+EXP=$test_dir/expect
+rm -rf $OUT
+
+# Test command line option
+echo "error default" >> $OUT
+$MKE2FS -F $TMPFILE > /dev/null 2>&1
+$DUMPE2FS $TMPFILE 2>&1 | grep 'Errors behavior' >> $OUT
+
+echo "error continue" >> $OUT
+$MKE2FS -e continue -F $TMPFILE > /dev/null 2>&1
+$DUMPE2FS $TMPFILE 2>&1 | grep 'Errors behavior' >> $OUT
+
+echo "error panic" >> $OUT
+$MKE2FS -e panic -F $TMPFILE > /dev/null 2>&1
+$DUMPE2FS $TMPFILE 2>&1 | grep 'Errors behavior' >> $OUT
+
+echo "error remount-ro" >> $OUT
+$MKE2FS -e remount-ro -F $TMPFILE > /dev/null 2>&1
+$DUMPE2FS $TMPFILE 2>&1 | grep 'Errors behavior' >> $OUT
+
+echo "error garbage" >> $OUT
+dd if=/dev/zero of=$TMPFILE bs=1k count=512 > /dev/null 2>&1
+$MKE2FS -e broken -F $TMPFILE > /dev/null 2>&1
+$DUMPE2FS $TMPFILE 2>&1 | grep 'Errors behavior' >> $OUT
+
+# Test errors= in default
+echo "error default profile continue" >> $OUT
+write_defaults_conf continue
+MKE2FS_CONFIG=$conf $MKE2FS -F $TMPFILE > /dev/null 2>&1
+$DUMPE2FS $TMPFILE 2>&1 | grep 'Errors behavior' >> $OUT
+
+echo "error default profile panic" >> $OUT
+write_defaults_conf panic
+MKE2FS_CONFIG=$conf $MKE2FS -F $TMPFILE > /dev/null 2>&1
+$DUMPE2FS $TMPFILE 2>&1 | grep 'Errors behavior' >> $OUT
+
+echo "error default profile remount-ro" >> $OUT
+write_defaults_conf remount-ro
+MKE2FS_CONFIG=$conf $MKE2FS -F $TMPFILE > /dev/null 2>&1
+$DUMPE2FS $TMPFILE 2>&1 | grep 'Errors behavior' >> $OUT
+
+echo "error default profile broken" >> $OUT
+write_defaults_conf broken
+dd if=/dev/zero of=$TMPFILE bs=1k count=512 > /dev/null 2>&1
+MKE2FS_CONFIG=$conf $MKE2FS -F $TMPFILE > /dev/null 2>&1
+$DUMPE2FS $TMPFILE 2>&1 | grep 'Errors behavior' >> $OUT
+
+# Test errors= in a fs type
+echo "error fs_types profile continue" >> $OUT
+write_section_conf continue
+MKE2FS_CONFIG=$conf $MKE2FS -F $TMPFILE -T test_suite > /dev/null 2>&1
+$DUMPE2FS $TMPFILE 2>&1 | grep 'Errors behavior' >> $OUT
+
+echo "error fs_types profile panic" >> $OUT
+write_section_conf panic
+MKE2FS_CONFIG=$conf $MKE2FS -F $TMPFILE -T test_suite > /dev/null 2>&1
+$DUMPE2FS $TMPFILE 2>&1 | grep 'Errors behavior' >> $OUT
+
+echo "error fs_types profile remount-ro" >> $OUT
+write_section_conf remount-ro
+MKE2FS_CONFIG=$conf $MKE2FS -F $TMPFILE -T test_suite > /dev/null 2>&1
+$DUMPE2FS $TMPFILE 2>&1 | grep 'Errors behavior' >> $OUT
+
+# Test command line override
+echo "error fs_types profile remount-ro" >> $OUT
+write_section_conf remount-ro
+MKE2FS_CONFIG=$conf $MKE2FS -F $TMPFILE -T test_suite -e panic > /dev/null 2>&1
+$DUMPE2FS $TMPFILE 2>&1 | grep 'Errors behavior' >> $OUT
+
+cmp -s $OUT $EXP
+status=$?
+
+if [ "$status" = 0 ] ; then
+	echo "$test_name: $test_description: ok"
+	touch $test_name.ok
+else
+	echo "$test_name: $test_description: failed"
+	diff $DIFF_OPTS $EXP $OUT > $test_name.failed
+	rm -f $test_name.tmp
+fi
+



* [PATCH 10/37] e2fsck: verify checksums after checking everything else
  2014-05-01 23:12 [PATCH 00/37] e2fsprogs patchbomb 5/14 Darrick J. Wong
                   ` (8 preceding siblings ...)
  2014-05-01 23:13 ` [PATCH 09/37] mke2fs: set error behavior at initialization time Darrick J. Wong
@ 2014-05-01 23:13 ` Darrick J. Wong
  2014-05-02 12:32   ` Lukáš Czerner
  2014-05-01 23:13 ` [PATCH 11/37] e2fsck: fix the extended attribute checksum error message Darrick J. Wong
                   ` (24 subsequent siblings)
  34 siblings, 1 reply; 91+ messages in thread
From: Darrick J. Wong @ 2014-05-01 23:13 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

There's a particular problem with e2fsck's user interface where
checksum errors are concerned:  Fixing the first complaint about
a checksum problem results in the inode being cleared even if e2fsck
could otherwise have recovered it.  While this mode is useful for
cleaning the remaining broken crud off the filesystem, we could at
least default to checking everything /else/ and only complaining about
the incorrect checksum if fsck finds nothing else wrong.

So, plumb in a config option.  We default to "verify and checksum"
unless the user tells us otherwise.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 e2fsck/e2fsck.8.in      |   12 ++++++++++++
 e2fsck/e2fsck.conf.5.in |   20 ++++++++++++++++++++
 e2fsck/e2fsck.h         |    1 +
 e2fsck/problem.c        |   18 ++++++++++++++----
 e2fsck/problemP.h       |    1 +
 e2fsck/unix.c           |   11 +++++++++++
 6 files changed, 59 insertions(+), 4 deletions(-)
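
The gating logic added to fix_problem() boils down to two bit tests.
The sketch below restates it outside e2fsck; the SKETCH_* macros copy
the flag values introduced by this patch, and the function names are
made up for the example.

```c
#include <assert.h>

#define SKETCH_OPT_CSUM_FIRST	0x4000u		/* as E2F_OPT_CSUM_FIRST */
#define SKETCH_PR_INITIAL_CSUM	0x200000u	/* as PR_INITIAL_CSUM */

/*
 * Return nonzero when a problem should be silently skipped: it is an
 * "initial checksum" problem and the user did not ask for checksums to
 * be verified first.
 */
static int initial_csum_problem_is_skipped(unsigned int problem_flags,
					   unsigned int options)
{
	return (problem_flags & SKETCH_PR_INITIAL_CSUM) &&
	       !(options & SKETCH_OPT_CSUM_FIRST);
}

/* Model of -E strict_csums (enable=1) and -E no_strict_csums (enable=0). */
static unsigned int set_strict_csums(unsigned int options, int enable)
{
	assert(enable == 0 || enable == 1);

	return enable ? (options | SKETCH_OPT_CSUM_FIRST)
		      : (options & ~SKETCH_OPT_CSUM_FIRST);
}
```

With the default options the initial checksum complaint is suppressed,
so the regular checks run first and only the "passes checks, but
checksum does not match" problems can fire afterwards.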


diff --git a/e2fsck/e2fsck.8.in b/e2fsck/e2fsck.8.in
index f5ed758..43ee063 100644
--- a/e2fsck/e2fsck.8.in
+++ b/e2fsck/e2fsck.8.in
@@ -207,6 +207,18 @@ option may prevent you from further manual data recovery.
 .BI nodiscard
 Do not attempt to discard free blocks and unused inode blocks. This option is
 exactly the opposite of discard option. This is set as default.
+.TP
+.BI strict_csums
+Verify each metadata object's checksum before checking any other fields
+in the metadata object.  If the verification fails, offer to clear the item,
+also before checking any of the other fields.  This option causes e2fsck to
+favor throwing away broken objects over trying to salvage them.
+.TP
+.BI no_strict_csums
+Perform all regular checks of a metadata object and only verify the checksum if
+no problems were found.  This option causes e2fsck to try to salvage slightly
+damaged metadata objects, at the cost of spending processing time on recovering
+data.  This is set as the default.
 .RE
 .TP
 .B \-f
diff --git a/e2fsck/e2fsck.conf.5.in b/e2fsck/e2fsck.conf.5.in
index 9ebfbbf..a8219a8 100644
--- a/e2fsck/e2fsck.conf.5.in
+++ b/e2fsck/e2fsck.conf.5.in
@@ -222,6 +222,26 @@ If this boolean relation is true, e2fsck will run as if the option
 .B -v
 is always specified.  This will cause e2fsck to print some additional
 information at the end of each full file system check.
+.TP
+.I strict_csums
+If this boolean relation is true, e2fsck will run as if
+.B -E strict_csums
+is set.  This causes e2fsck to verify each metadata object's checksum before
+checking any other fields in the metadata object.  If the verification
+fails, offer to clear the item, also before checking any of the other fields.
+This option causes e2fsck to favor throwing away broken objects over trying to
+salvage them.
+.IP
+If the boolean relation is false, e2fsck will run as if
+.B -E no_strict_csums
+is set.  In this case, e2fsck will perform all regular checks of a metadata
+object and only verify the checksum if no problems were found.  This option
+causes e2fsck to try to salvage slightly damaged metadata objects, at the cost
+of spending processing time on recovering data.
+.IP
+The default is for e2fsck to behave as if
+.B -E no_strict_csums
+is set.
 .SH THE [problems] STANZA
 Each tag in the
 .I [problems] 
diff --git a/e2fsck/e2fsck.h b/e2fsck/e2fsck.h
index dbd6ea8..d7a7be9 100644
--- a/e2fsck/e2fsck.h
+++ b/e2fsck/e2fsck.h
@@ -167,6 +167,7 @@ struct resource_track {
 #define E2F_OPT_FRAGCHECK	0x0800
 #define E2F_OPT_JOURNAL_ONLY	0x1000 /* only replay the journal */
 #define E2F_OPT_DISCARD		0x2000
+#define E2F_OPT_CSUM_FIRST	0x4000
 
 /*
  * E2fsck flags
diff --git a/e2fsck/problem.c b/e2fsck/problem.c
index 7f0ad6c..0999399 100644
--- a/e2fsck/problem.c
+++ b/e2fsck/problem.c
@@ -970,7 +970,7 @@ static struct e2fsck_problem problem_table[] = {
 	/* inode checksum does not match inode */
 	{ PR_1_INODE_CSUM_INVALID,
 	  N_("@i %i checksum does not match @i.  "),
-	  PROMPT_CLEAR, PR_PREEN_OK },
+	  PROMPT_CLEAR, PR_PREEN_OK | PR_INITIAL_CSUM },
 
 	/* inode passes checks, but checksum does not match inode */
 	{ PR_1_INODE_ONLY_CSUM_INVALID,
@@ -981,7 +981,7 @@ static struct e2fsck_problem problem_table[] = {
 	{ PR_1_EXTENT_CSUM_INVALID,
 	  N_("@i %i extent block checksum does not match extent\n\t(logical @b "
 	     "%c, @n physical @b %b, len %N)\n"),
-	  PROMPT_CLEAR, 0 },
+	  PROMPT_CLEAR, PR_INITIAL_CSUM },
 
 	/*
 	 * Inode extent block passes checks, but checksum does not match
@@ -996,7 +996,7 @@ static struct e2fsck_problem problem_table[] = {
 	{ PR_1_EA_BLOCK_CSUM_INVALID,
 	  N_("Extended attribute @a @b %b checksum for @i %i does not "
 	     "match.  "),
-	  PROMPT_CLEAR, 0 },
+	  PROMPT_CLEAR, PR_INITIAL_CSUM },
 
 	/*
 	 * Extended attribute block passes checks, but checksum for inode does
@@ -1470,7 +1470,7 @@ static struct e2fsck_problem problem_table[] = {
 	/* leaf node fails checksum */
 	{ PR_2_LEAF_NODE_CSUM_INVALID,
 	  N_("@d @i %i, %B, offset %N: @d fails checksum\n"),
-	  PROMPT_SALVAGE, PR_PREEN_OK },
+	  PROMPT_SALVAGE, PR_PREEN_OK | PR_INITIAL_CSUM },
 
 	/* leaf node has no checksum */
 	{ PR_2_LEAF_NODE_MISSING_CSUM,
@@ -1944,6 +1944,16 @@ int fix_problem(e2fsck_t ctx, problem_t code, struct problem_context *pctx)
 		printf(_("Unhandled error code (0x%x)!\n"), code);
 		return 0;
 	}
+
+	/*
+	 * If there is a problem with the initial csum verification and the
+	 * user told e2fsck to verify csums /after/ checking everything else,
+	 * then don't "fix" anything.
+	 */
+	if ((ptr->flags & PR_INITIAL_CSUM) &&
+	    !(ctx->options & E2F_OPT_CSUM_FIRST))
+		return 0;
+
 	if (!(ptr->flags & PR_CONFIG)) {
 		char	key[9], *new_desc = NULL;
 
diff --git a/e2fsck/problemP.h b/e2fsck/problemP.h
index 7944cd6..a983598 100644
--- a/e2fsck/problemP.h
+++ b/e2fsck/problemP.h
@@ -44,3 +44,4 @@ struct latch_descr {
 #define PR_CONFIG	0x080000 /* This problem has been customized
 				    from the config file */
 #define PR_FORCE_NO	0x100000 /* Force the answer to be no */
+#define PR_INITIAL_CSUM	0x200000 /* User can ignore initial csum check */
diff --git a/e2fsck/unix.c b/e2fsck/unix.c
index b39383d..c6cdb49 100644
--- a/e2fsck/unix.c
+++ b/e2fsck/unix.c
@@ -692,6 +692,10 @@ static void parse_extended_opts(e2fsck_t ctx, const char *opts)
 			else
 				ctx->log_fn = string_copy(ctx, arg, 0);
 			continue;
+		} else if (strcmp(token, "strict_csums") == 0) {
+			ctx->options |= E2F_OPT_CSUM_FIRST;
+		} else if (strcmp(token, "no_strict_csums") == 0) {
+			ctx->options &= ~E2F_OPT_CSUM_FIRST;
 		} else {
 			fprintf(stderr, _("Unknown extended option: %s\n"),
 				token);
@@ -710,6 +714,8 @@ static void parse_extended_opts(e2fsck_t ctx, const char *opts)
 		fputs(("\tjournal_only\n"), stderr);
 		fputs(("\tdiscard\n"), stderr);
 		fputs(("\tnodiscard\n"), stderr);
+		fputs(("\tstrict_csums\n"), stderr);
+		fputs(("\tno_strict_csums\n"), stderr);
 		fputc('\n', stderr);
 		exit(1);
 	}
@@ -945,6 +951,11 @@ static errcode_t PRS(int argc, char *argv[], e2fsck_t *ret_ctx)
 	profile_set_syntax_err_cb(syntax_err_report);
 	profile_init(config_fn, &ctx->profile);
 
+	profile_get_boolean(ctx->profile, "options", "strict_csums", NULL,
+			    0, &c);
+	if (c)
+		ctx->options |= E2F_OPT_CSUM_FIRST;
+
 	profile_get_boolean(ctx->profile, "options", "report_time", 0, 0,
 			    &c);
 	if (c)
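
For anyone reading the fix_problem() hunk above in isolation, the new
gate can be modeled on its own.  This is only a sketch: the value of
PR_INITIAL_CSUM is the one from the problemP.h hunk, but the value of
E2F_OPT_CSUM_FIRST, the two stand-in structs and should_report() are
made up for illustration and are not part of e2fsck.

```c
#include <assert.h>

#define PR_INITIAL_CSUM    0x200000 /* value taken from the problemP.h hunk */
#define E2F_OPT_CSUM_FIRST 0x0001   /* hypothetical value, sketch only */

struct problem { int flags; };   /* stand-in for struct e2fsck_problem */
struct context { int options; }; /* stand-in for e2fsck_t */

/*
 * Returns 1 if the problem should continue to the normal prompt/fix
 * path, 0 if it is an initial checksum complaint that is suppressed
 * because the user did not ask for checksums to be verified first.
 */
static int should_report(const struct context *ctx, const struct problem *ptr)
{
	assert(ctx && ptr);
	if ((ptr->flags & PR_INITIAL_CSUM) &&
	    !(ctx->options & E2F_OPT_CSUM_FIRST))
		return 0;
	return 1;
}
```

Only problems tagged PR_INITIAL_CSUM are affected; everything else
reaches the prompt logic regardless of the strict_csums setting.
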



* [PATCH 11/37] e2fsck: fix the extended attribute checksum error message
  2014-05-01 23:12 [PATCH 00/37] e2fsprogs patchbomb 5/14 Darrick J. Wong
                   ` (9 preceding siblings ...)
  2014-05-01 23:13 ` [PATCH 10/37] e2fsck: verify checksums after checking everything else Darrick J. Wong
@ 2014-05-01 23:13 ` Darrick J. Wong
  2014-05-02 12:46   ` Lukáš Czerner
  2014-05-01 23:13 ` [PATCH 12/37] e2fsck: insert a missing dirent tail for checksums if possible Darrick J. Wong
                   ` (23 subsequent siblings)
  34 siblings, 1 reply; 91+ messages in thread
From: Darrick J. Wong @ 2014-05-01 23:13 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

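Make the EA block checksum messages less strange.  Both began with a
literal "Extended attribute" followed by @a, which already expands to
"extended attribute", so the phrase was printed twice.  Also drop the
inode reference from the "passes checks but fails checksum" message.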

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 e2fsck/problem.c |   12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)


diff --git a/e2fsck/problem.c b/e2fsck/problem.c
index 0999399..ec20bd1 100644
--- a/e2fsck/problem.c
+++ b/e2fsck/problem.c
@@ -992,19 +992,17 @@ static struct e2fsck_problem problem_table[] = {
 	     "extent\n\t(logical @b %c, @n physical @b %b, len %N)\n"),
 	  PROMPT_FIX, 0 },
 
-	/* Extended attribute block checksum for inode does not match. */
+	/* Extended attribute block checksum does not match. */
 	{ PR_1_EA_BLOCK_CSUM_INVALID,
-	  N_("Extended attribute @a @b %b checksum for @i %i does not "
-	     "match.  "),
+	  N_("@a @b %b checksum for @i %i does not match.  "),
 	  PROMPT_CLEAR, PR_INITIAL_CSUM },
 
 	/*
-	 * Extended attribute block passes checks, but checksum for inode does
-	 * not match.
+	 * Extended attribute block passes checks, but checksum does not
+	 * match.
 	 */
 	{ PR_1_EA_BLOCK_ONLY_CSUM_INVALID,
-	  N_("Extended attribute @a @b %b passes checks, but checksum for "
-	     "@i %i does not match.  "),
+	  N_("@a @b %b passes checks, but checksum does not match.  "),
 	  PROMPT_FIX, 0 },
 
 	/*



* [PATCH 12/37] e2fsck: insert a missing dirent tail for checksums if possible
  2014-05-01 23:12 [PATCH 00/37] e2fsprogs patchbomb 5/14 Darrick J. Wong
                   ` (10 preceding siblings ...)
  2014-05-01 23:13 ` [PATCH 11/37] e2fsck: fix the extended attribute checksum error message Darrick J. Wong
@ 2014-05-01 23:13 ` Darrick J. Wong
  2014-05-02 12:54   ` Lukáš Czerner
  2014-05-01 23:13 ` [PATCH 13/37] e2fsck: write dir blocks after new inode when reconstructing root/lost+found Darrick J. Wong
                   ` (22 subsequent siblings)
  34 siblings, 1 reply; 91+ messages in thread
From: Darrick J. Wong @ 2014-05-01 23:13 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

If e2fsck is writing a block of directory entries to disk, it should
adjust the dirents to add the dirent tail if one is missing.  It's not
a big deal if there's no space to do this since rehash (pass 3A) will
reconstruct directories for us.  However, we may as well avoid
unnecessary work.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
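
The tail insertion described above can be sketched on a toy directory
block.  This is not the libext2fs code: the structs, BLOCK_SIZE,
TOY_MARKER and toy_insert_tail() are all hypothetical, fields are in
host byte order rather than the on-disk format, and the sketch assumes
the block does not already carry a tail.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 64u     /* toy block size; real blocks are 1 KiB or larger */
#define TAIL_SIZE  12u     /* size of the toy tail record below */
#define TOY_MARKER 0xDE00u /* hypothetical tail marker for this sketch */

struct toy_dirent {        /* simplified entry header; name bytes follow */
	uint32_t inode;
	uint16_t rec_len;  /* distance to the next entry */
	uint8_t  name_len;
	uint8_t  file_type;
};

struct toy_tail {
	uint32_t reserved_zero;
	uint16_t rec_len;  /* always TAIL_SIZE */
	uint16_t marker;
	uint32_t checksum;
};

/* Smallest 4-byte-aligned rec_len that can hold a name of this length. */
static unsigned min_rec_len(unsigned name_len)
{
	return ((unsigned)sizeof(struct toy_dirent) + name_len + 3u) & ~3u;
}

/* Returns 0 if a tail was carved out of the last entry, -1 otherwise. */
static int toy_insert_tail(unsigned char *block)
{
	struct toy_dirent d;
	struct toy_tail t;
	unsigned off = 0;

	/* Walk the rec_len chain to the entry that ends the block. */
	for (;;) {
		memcpy(&d, block + off, sizeof(d));
		if (d.rec_len < sizeof(d) || (d.rec_len & 3u) ||
		    off + d.rec_len > BLOCK_SIZE)
			return -1;	/* corrupt chain: leave it to a rebuild */
		if (off + d.rec_len == BLOCK_SIZE)
			break;
		off += d.rec_len;
	}

	/* The last entry must still fit its name after giving up the tail. */
	if (d.rec_len < min_rec_len(d.name_len) + TAIL_SIZE)
		return -1;		/* no room: fall back to rehash */

	d.rec_len -= TAIL_SIZE;
	memcpy(block + off, &d, sizeof(d));

	memset(&t, 0, sizeof(t));
	t.rec_len = TAIL_SIZE;
	t.marker = TOY_MARKER;
	memcpy(block + BLOCK_SIZE - TAIL_SIZE, &t, sizeof(t));
	return 0;
}
```

A -1 return corresponds to the "no space" case in the commit message,
where the directory is simply left for pass 3A to rebuild.
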
 e2fsck/pass2.c |   40 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 40 insertions(+)


diff --git a/e2fsck/pass2.c b/e2fsck/pass2.c
index 5488c73..95f51b7 100644
--- a/e2fsck/pass2.c
+++ b/e2fsck/pass2.c
@@ -739,6 +739,41 @@ static int is_last_entry(ext2_filsys fs, int inline_data_size,
 		return (offset < fs->blocksize - csum_size);
 }
 
+static errcode_t insert_dirent_tail(ext2_filsys fs, void *dirbuf)
+{
+	struct ext2_dir_entry *d;
+	void *top;
+	struct ext2_dir_entry_tail *t;
+	unsigned int rec_len;
+
+	d = dirbuf;
+	top = EXT2_DIRENT_TAIL(dirbuf, fs->blocksize);
+
+	rec_len = d->rec_len;
+	while (rec_len && !(rec_len & 0x3)) {
+		d = (struct ext2_dir_entry *)(((char *)d) + rec_len);
+		if (((void *)d) + d->rec_len >= top)
+			break;
+		rec_len = d->rec_len;
+	}
+
+	if (d != top) {
+		size_t min_size = EXT2_DIR_REC_LEN(
+				ext2fs_dirent_name_len(dirbuf));
+		if (min_size > d->rec_len - sizeof(struct ext2_dir_entry_tail))
+			return EXT2_ET_DIR_NO_SPACE_FOR_CSUM;
+		d->rec_len -= sizeof(struct ext2_dir_entry_tail);
+	}
+
+	t = (struct ext2_dir_entry_tail *)top;
+	if (t->det_reserved_zero1 ||
+	    t->det_rec_len != sizeof(struct ext2_dir_entry_tail) ||
+	    t->det_reserved_name_len != EXT2_DIR_NAME_LEN_CSUM)
+		ext2fs_initialize_dirent_tail(fs, t);
+
+	return 0;
+}
+
 static int check_dir_block(ext2_filsys fs,
 			   struct ext2_db_entry2 *db,
 			   void *priv_data)
@@ -1275,8 +1310,13 @@ skip_checksum:
 		if (EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
 				EXT4_FEATURE_RO_COMPAT_METADATA_CSUM) &&
 		    is_leaf &&
+		    !inline_data_size &&
 		    !ext2fs_dirent_has_tail(fs, (struct ext2_dir_entry *)buf))
+		{
+			if (insert_dirent_tail(fs, buf) == 0)
+				goto write_and_fix;
 			e2fsck_rehash_dir_later(ctx, ino);
+		}
 
 write_and_fix:
 		if (e2fsck_dir_will_be_rehashed(ctx, ino))



* [PATCH 13/37] e2fsck: write dir blocks after new inode when reconstructing root/lost+found
  2014-05-01 23:12 [PATCH 00/37] e2fsprogs patchbomb 5/14 Darrick J. Wong
                   ` (11 preceding siblings ...)
  2014-05-01 23:13 ` [PATCH 12/37] e2fsck: insert a missing dirent tail for checksums if possible Darrick J. Wong
@ 2014-05-01 23:13 ` Darrick J. Wong
  2014-05-05 17:13   ` Lukáš Czerner
  2014-05-01 23:13 ` [PATCH 14/37] dumpe2fs: add switch to disable checksum verification Darrick J. Wong
                   ` (21 subsequent siblings)
  34 siblings, 1 reply; 91+ messages in thread
From: Darrick J. Wong @ 2014-05-01 23:13 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

If we trash the root directory block, e2fsck will find inode 11 (the
old lost+found) and try to attach it to lost+found.  The lost+found
checker also fails to find lost+found and tries to add one to the root
directory.  The root directory is not found either, so it is recreated,
but with incorrect checksums; linking in the new lost+found therefore
fails and its '..' entry is never set.  Since both directories now fail
checksum verification, both are queued for rehash, and rehash then
crashes because lost+found has fewer than two entries.

On a checksumming filesystem, the routines in e2fsck that recreate
/lost+found and / must write the new directory block *after* the inode
has been written to disk because the checksum depends on i_generation.
Add a regression test while we're at it.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
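
The ordering constraint described above can be shown with a toy model.
Everything here is an assumption made for illustration: the structs
and toy_* helpers are hypothetical, and an FNV-1a hash stands in for
the real checksum.  The only property borrowed from the commit message
is that the directory block checksum is seeded with the inode's
generation as it currently exists on disk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct toy_inode { uint32_t generation; };

struct toy_disk {
	struct toy_inode inode;    /* the one on-disk inode we care about */
	unsigned char    dir[16];  /* its directory block */
	uint32_t         dir_csum; /* checksum stored with the block */
};

/* FNV-1a stand-in for the real checksum, seeded with ino and generation. */
static uint32_t toy_csum(uint32_t ino, uint32_t gen,
			 const unsigned char *buf, size_t len)
{
	uint32_t h = 2166136261u ^ ino;
	size_t i;

	h = (h ^ gen) * 16777619u;
	for (i = 0; i < len; i++)
		h = (h ^ buf[i]) * 16777619u;
	return h;
}

/* Writing a dir block stamps it with whatever generation is on disk now. */
static void toy_write_dir_block(struct toy_disk *disk, uint32_t ino)
{
	disk->dir_csum = toy_csum(ino, disk->inode.generation,
				  disk->dir, sizeof(disk->dir));
}

static void toy_write_inode(struct toy_disk *disk, uint32_t gen)
{
	disk->inode.generation = gen;
}

static int toy_verify(const struct toy_disk *disk, uint32_t ino)
{
	return disk->dir_csum == toy_csum(ino, disk->inode.generation,
					  disk->dir, sizeof(disk->dir));
}
```

Calling toy_write_dir_block() before toy_write_inode() leaves a
checksum computed from the stale generation, so toy_verify() fails;
swapping the two calls makes it pass.
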
 e2fsck/pass3.c                        |   85 +++++----
 tests/f_rebuild_csum_rootdir/expect.1 |  311 +++++++++++++++++++++++++++++++++
 tests/f_rebuild_csum_rootdir/expect.2 |    7 +
 tests/f_rebuild_csum_rootdir/image.gz |  Bin
 tests/f_rebuild_csum_rootdir/name     |    1 
 5 files changed, 364 insertions(+), 40 deletions(-)
 create mode 100644 tests/f_rebuild_csum_rootdir/expect.1
 create mode 100644 tests/f_rebuild_csum_rootdir/expect.2
 create mode 100644 tests/f_rebuild_csum_rootdir/image.gz
 create mode 100644 tests/f_rebuild_csum_rootdir/name


diff --git a/e2fsck/pass3.c b/e2fsck/pass3.c
index 6f7f855..efc0d49 100644
--- a/e2fsck/pass3.c
+++ b/e2fsck/pass3.c
@@ -188,28 +188,6 @@ static void check_root(e2fsck_t ctx)
 	ext2fs_mark_bb_dirty(fs);
 
 	/*
-	 * Now let's create the actual data block for the inode
-	 */
-	pctx.errcode = ext2fs_new_dir_block(fs, EXT2_ROOT_INO, EXT2_ROOT_INO,
-					    &block);
-	if (pctx.errcode) {
-		pctx.str = "ext2fs_new_dir_block";
-		fix_problem(ctx, PR_3_CREATE_ROOT_ERROR, &pctx);
-		ctx->flags |= E2F_FLAG_ABORT;
-		return;
-	}
-
-	pctx.errcode = ext2fs_write_dir_block4(fs, blk, block, 0,
-					       EXT2_ROOT_INO);
-	if (pctx.errcode) {
-		pctx.str = "ext2fs_write_dir_block4";
-		fix_problem(ctx, PR_3_CREATE_ROOT_ERROR, &pctx);
-		ctx->flags |= E2F_FLAG_ABORT;
-		return;
-	}
-	ext2fs_free_mem(&block);
-
-	/*
 	 * Set up the inode structure
 	 */
 	memset(&inode, 0, sizeof(inode));
@@ -232,6 +210,30 @@ static void check_root(e2fsck_t ctx)
 	}
 
 	/*
+	 * Now let's create the actual data block for the inode.
+	 * Due to metadata_csum, we must write the dir blocks AFTER
+	 * the inode has been written to disk!
+	 */
+	pctx.errcode = ext2fs_new_dir_block(fs, EXT2_ROOT_INO, EXT2_ROOT_INO,
+					    &block);
+	if (pctx.errcode) {
+		pctx.str = "ext2fs_new_dir_block";
+		fix_problem(ctx, PR_3_CREATE_ROOT_ERROR, &pctx);
+		ctx->flags |= E2F_FLAG_ABORT;
+		return;
+	}
+
+	pctx.errcode = ext2fs_write_dir_block4(fs, blk, block, 0,
+					       EXT2_ROOT_INO);
+	ext2fs_free_mem(&block);
+	if (pctx.errcode) {
+		pctx.str = "ext2fs_write_dir_block4";
+		fix_problem(ctx, PR_3_CREATE_ROOT_ERROR, &pctx);
+		ctx->flags |= E2F_FLAG_ABORT;
+		return;
+	}
+
+	/*
 	 * Miscellaneous bookkeeping...
 	 */
 	e2fsck_add_dir_info(ctx, EXT2_ROOT_INO, EXT2_ROOT_INO);
@@ -449,24 +451,6 @@ unlink:
 	ext2fs_inode_alloc_stats2(fs, ino, +1, 1);
 
 	/*
-	 * Now let's create the actual data block for the inode
-	 */
-	retval = ext2fs_new_dir_block(fs, ino, EXT2_ROOT_INO, &block);
-	if (retval) {
-		pctx.errcode = retval;
-		fix_problem(ctx, PR_3_ERR_LPF_NEW_DIR_BLOCK, &pctx);
-		return 0;
-	}
-
-	retval = ext2fs_write_dir_block4(fs, blk, block, 0, ino);
-	ext2fs_free_mem(&block);
-	if (retval) {
-		pctx.errcode = retval;
-		fix_problem(ctx, PR_3_ERR_LPF_WRITE_BLOCK, &pctx);
-		return 0;
-	}
-
-	/*
 	 * Set up the inode structure
 	 */
 	memset(&inode, 0, sizeof(inode));
@@ -486,6 +470,27 @@ unlink:
 		fix_problem(ctx, PR_3_CREATE_LPF_ERROR, &pctx);
 		return 0;
 	}
+
+	/*
+	 * Now let's create the actual data block for the inode.
+	 * Due to metadata_csum, the directory block MUST be written
+	 * after the inode is written to disk!
+	 */
+	retval = ext2fs_new_dir_block(fs, ino, EXT2_ROOT_INO, &block);
+	if (retval) {
+		pctx.errcode = retval;
+		fix_problem(ctx, PR_3_ERR_LPF_NEW_DIR_BLOCK, &pctx);
+		return 0;
+	}
+
+	retval = ext2fs_write_dir_block4(fs, blk, block, 0, ino);
+	ext2fs_free_mem(&block);
+	if (retval) {
+		pctx.errcode = retval;
+		fix_problem(ctx, PR_3_ERR_LPF_WRITE_BLOCK, &pctx);
+		return 0;
+	}
+
 	/*
 	 * Finally, create the directory link
 	 */
diff --git a/tests/f_rebuild_csum_rootdir/expect.1 b/tests/f_rebuild_csum_rootdir/expect.1
new file mode 100644
index 0000000..6b5c47b
--- /dev/null
+++ b/tests/f_rebuild_csum_rootdir/expect.1
@@ -0,0 +1,311 @@
+Pass 1: Checking inodes, blocks, and sizes
+Pass 2: Checking directory structure
+Directory inode 2, block #0, offset 0: directory has no checksum
+Fix? yes
+
+Directory inode 2, block #0, offset 0: directory corrupted
+Salvage? yes
+
+Missing '.' in directory inode 2.
+Fix? yes
+
+Setting filetype for entry '.' in ??? (2) to 2.
+Missing '..' in directory inode 2.
+Fix? yes
+
+Setting filetype for entry '..' in ??? (2) to 2.
+Pass 3: Checking directory connectivity
+'..' in / (2) is <The NULL inode> (0), should be / (2).
+Fix? yes
+
+Unconnected directory inode 11 (/???)
+Connect to /lost+found? yes
+
+/lost+found not found.  Create? yes
+
+Pass 3A: Optimizing directories
+Pass 4: Checking reference counts
+Inode 11 ref count is 3, should be 2.  Fix? yes
+
+Unattached inode 12
+Connect to /lost+found? yes
+
+Inode 12 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 13
+Connect to /lost+found? yes
+
+Inode 13 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 14
+Connect to /lost+found? yes
+
+Inode 14 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 15
+Connect to /lost+found? yes
+
+Inode 15 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 16
+Connect to /lost+found? yes
+
+Inode 16 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 17
+Connect to /lost+found? yes
+
+Inode 17 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 18
+Connect to /lost+found? yes
+
+Inode 18 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 19
+Connect to /lost+found? yes
+
+Inode 19 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 20
+Connect to /lost+found? yes
+
+Inode 20 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 21
+Connect to /lost+found? yes
+
+Inode 21 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 22
+Connect to /lost+found? yes
+
+Inode 22 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 23
+Connect to /lost+found? yes
+
+Inode 23 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 24
+Connect to /lost+found? yes
+
+Inode 24 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 25
+Connect to /lost+found? yes
+
+Inode 25 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 26
+Connect to /lost+found? yes
+
+Inode 26 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 27
+Connect to /lost+found? yes
+
+Inode 27 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 28
+Connect to /lost+found? yes
+
+Inode 28 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 29
+Connect to /lost+found? yes
+
+Inode 29 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 30
+Connect to /lost+found? yes
+
+Inode 30 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 31
+Connect to /lost+found? yes
+
+Inode 31 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 32
+Connect to /lost+found? yes
+
+Inode 32 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 33
+Connect to /lost+found? yes
+
+Inode 33 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 34
+Connect to /lost+found? yes
+
+Inode 34 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 35
+Connect to /lost+found? yes
+
+Inode 35 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 36
+Connect to /lost+found? yes
+
+Inode 36 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 37
+Connect to /lost+found? yes
+
+Inode 37 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 38
+Connect to /lost+found? yes
+
+Inode 38 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 39
+Connect to /lost+found? yes
+
+Inode 39 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 40
+Connect to /lost+found? yes
+
+Inode 40 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 41
+Connect to /lost+found? yes
+
+Inode 41 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 42
+Connect to /lost+found? yes
+
+Inode 42 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 43
+Connect to /lost+found? yes
+
+Inode 43 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 44
+Connect to /lost+found? yes
+
+Inode 44 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 45
+Connect to /lost+found? yes
+
+Inode 45 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 46
+Connect to /lost+found? yes
+
+Inode 46 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 47
+Connect to /lost+found? yes
+
+Inode 47 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 48
+Connect to /lost+found? yes
+
+Inode 48 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 49
+Connect to /lost+found? yes
+
+Inode 49 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 50
+Connect to /lost+found? yes
+
+Inode 50 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 51
+Connect to /lost+found? yes
+
+Inode 51 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 52
+Connect to /lost+found? yes
+
+Inode 52 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 53
+Connect to /lost+found? yes
+
+Inode 53 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 54
+Connect to /lost+found? yes
+
+Inode 54 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 55
+Connect to /lost+found? yes
+
+Inode 55 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 56
+Connect to /lost+found? yes
+
+Inode 56 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 57
+Connect to /lost+found? yes
+
+Inode 57 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 58
+Connect to /lost+found? yes
+
+Inode 58 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 59
+Connect to /lost+found? yes
+
+Inode 59 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 60
+Connect to /lost+found? yes
+
+Inode 60 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 61
+Connect to /lost+found? yes
+
+Inode 61 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 62
+Connect to /lost+found? yes
+
+Inode 62 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 63
+Connect to /lost+found? yes
+
+Inode 63 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 64
+Connect to /lost+found? yes
+
+Inode 64 ref count is 2, should be 1.  Fix? yes
+
+Unattached zero-length inode 65.  Clear? yes
+
+Unattached inode 66
+Connect to /lost+found? yes
+
+Inode 66 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 67
+Connect to /lost+found? yes
+
+Inode 67 ref count is 2, should be 1.  Fix? yes
+
+Pass 5: Checking group summary information
+
+test_filesys: ***** FILE SYSTEM WAS MODIFIED *****
+test_filesys: 67/512 files (1.5% non-contiguous), 1127/2048 blocks
+Exit status is 1
diff --git a/tests/f_rebuild_csum_rootdir/expect.2 b/tests/f_rebuild_csum_rootdir/expect.2
new file mode 100644
index 0000000..033f1bf
--- /dev/null
+++ b/tests/f_rebuild_csum_rootdir/expect.2
@@ -0,0 +1,7 @@
+Pass 1: Checking inodes, blocks, and sizes
+Pass 2: Checking directory structure
+Pass 3: Checking directory connectivity
+Pass 4: Checking reference counts
+Pass 5: Checking group summary information
+test_filesys: 67/512 files (1.5% non-contiguous), 1127/2048 blocks
+Exit status is 0
diff --git a/tests/f_rebuild_csum_rootdir/image.gz b/tests/f_rebuild_csum_rootdir/image.gz
new file mode 100644
index 0000000000000000000000000000000000000000..a32fd4431a44560b20033d43836000ef22ce977f
GIT binary patch
literal 12476
zcmeI2c~leUyT@t$QMaFB>q3zwwrWuk5Ktr{q?HOkRKy~R2oeP}Dq@f&VRc#;a6z#s
zrpl69L{tz2#3&I4TtEa8ma;FSr9dEopd<uln0fBld(XM|o_o$czd!p&@=tP}yz|cc
zeBS5%exEsK7#C;g9HESNemYIjJ@bmu!CU3;GrDISaQ)YkC7*op=Y<*7pKbba-qmkE
z{p=r`FV7KVy-rRy3bf?&zWaXpYvZqH{p0*kI-Bftzdp6>+ba_$YR?N_+<)}zf`To5
z=hb$P-kkHF`HVWm`KluLK6~u-bcIqdR9ibDd9NH992~rn;rhk?^7~s8DqO8i9iH5G
zbTwI3I@!Bs+3X;vX#e`4%+zo0IIWB#)puBe<k4)1jR@a~ZElv_JmzO_wa#H8NfkT$
zv*p#67WPB&=KV$mH4)-VMq?elNFE}uYs-01N_4+AeAe?m#pQuxWM!6(s7&6qu=g8u
zDwWuhHCTP6qz5~DIUa2>J>5bO@g0*d-aQQe^5?moSuer{AFys?L8)aG_kI{eJy^)P
zyT8XU=k4shrP+9NwN81tFNv)(=H1H!mgoJP2cHO%ch#7zGPbpr$L9Nrqg?{@baZrj
z7=DQ5$~mJa>EDrpV>9Wm*tcinA`+HmmArRVmgyjicYCcr`AoB!{-=C;#g%1~!(FiP
zSW-xA_nO{mbkOmx6f7yFD<_KjIAI2S3@Tv&g0i>y-OR|18&;2&WoE5M#^Z<9AJmLF
z^ehjUPY>zoJSv5)#RD3j(mLTd>GrTDx=P+^pOMRdcD%G#oX3w9sV^saD<@o<ZdEM!
zL`UaGK;p1zPX_Hbvgpf4dzv-BA(7v~eLP`HM6rnc@ARU;tTnHvq2=RUB9X~~oXk`_
z{!&CbimW=RxoDF-RAmyYDsFHKIO?S24qdHieo;~V$so)@%!5aNnvr#zUzNc3?=wvf
z^!(8Fa}e;J=dMXxAJCk9iAe>du06UR8_0+X*{@-rkd%)I-$zb$K3ucyba4~mO$m<H
zpnuFgSu-d)xuf=Xo97?6Z~hQD&J9*kw}A@G?&(83_ED|ehg00U<Tpm~NmAqg+R4+;
zzhdSS$+Ml{H}in;DwS&5G=f6gOD04Ih7*Srm~@KWYEPh6z<om5teDj3ox{-%dDa>b
ze%Mw!iENGr)XL3hj4Ma7*0;bG@&TA{OLhpm{9_oyr0_RXsI)Ux!BCs2U)#yu*t|TJ
z(eYYp`J7jHRlkB|nF-l1oPgQaM$UKi%td@9)R+yfzt0F~44&&|=UKL3zZAi#M9Pi&
z63@PLxf?xRLlFt68?vv1<;a1v2ISU7XQ^FDl?uIl&~x1q(Tmynd98PJD2R%VmEXRV
zSDe}Cz}xQP9<x<#Gp$KDXQld4R<^ivz_Ze;f6fCk7tg%@;hsa-fal}8ev)4spY^wl
z94Hm+JQ^I7d#%&Vcwb77yMa;0KP}erdA9uGi11)<2lAGOS6I#9V6!Kdv}R-bBTr_L
z$KdgiygZ+?sPZs(rD&Z<bZnnPsq4L*tWpnG=|i_+ThBvY{d|7-qxeqsyAHL*3VyM`
z$9}_xCVBT`9y6O49xme51{-+WcXu~8>|Dv;^*ARzTa<4`<}7P3!;9*+>>Ca<%n_t#
z^xWc+Rg$##=a+KgkF>?a4DcV<S!U~I2BF;4bic8#a3))vs5|E4V=vxtD4}NJU&Gw(
z!$*fl9E<q7Y!Xjz35#JSo=ztg*JSVJR#_Bx-n{(s`kSezDWgBl%1k!jI_P;=5xlL6
z+d9~ncUv=U(Q}MfXM%NDiJwMn8!qhJd()>QxkqpB3{*LswzIu7+S^C4s@pjIQryz)
z0&6txc+g}(@#BoV@R$5Y%VRu!Y%S}ChnU4DcrHFZE-{6i;Sc@kRvbzYy}>LXW*}3A
zHysR&jt#VawpJ{!m5f}UOwmoL3=Q*=-?&uV$sSxS;7EhEAx-UF4l#b}@ud6=&f$BF
zEnUs&e&1n{SdMMnhM2u(ef=5NCyiYT`Pd^pQE65%FD%+%{{2?A&=1*J=ssN7(J$Rz
z73;D!YDzKEqT<XCLYjKIZzy~WTT5TL7P5z}?Kjj6H+Oay?5C*WN{K{ry)wKk)zcoy
zvd%AwThWy_9;z`g-Z#5(t7~#~bJ^!v8RPvUBjGnwd=|%S7`PJ_`7}B98F#P63ek13
zeMk21l?{et+yIUjGd&@@#X?heLa?j}A2Ca@>1=99HQFQ2-xa-g!`%CRl824@HNg&-
z2Gg^}T`dz?)5i63_{}wwjb~=YU0r?Sht&RFpPH~#j?g<@VZ6`0FHhCN@J;O>yXkHz
zbsMN0jqN{QmvGz;X$zc1i=N@TF<?CTW6CUX=9lHqozcH6BeKxZmD(LWR@JQ6@4Lsa
zasFvP!|ny~9v^VI8)C6bbTJR2-7vX0wRQWMVSPx+;8y&45?voSEvC4bj~YhF(fHC2
zcblh6pM>|f@YBWqU4IIYF!sH*4h9~Cs-gb9gx#C|pB_M!86DhlPZPT2PJ_@~YP(2h
z|Jx|{xG%pu@}q_p<#O>d1%!lEkiv1vQ+To2fMEGfG><NBKR7=?2`65E^9ncpQ-RuL
z0)^Kj@`?gNJZl7s+$Gdz1JWJS$05W=L3vaRvaC3;+<+7dHX%-43QD-J3%FPsdC>;f
z*sWEuLZVe9W^3MpvLzHZP=!5Nt<vkT;K3ht7t~5&5i}-JS)K8e6MQa4M5i?1$|TSm
z4anHJAqw=WM!49sJ@5$~OKAYb?Rt2Z?XO|^j8ZHn2M!vL`bPS=vTY>~ymghpkhUXH
z&TwL@6Vx{SW(l($DYxo&WV180TIiAT;}^P-iAAsQ?0yA1xdC~J<;Yp$mm`e5Apl31
zM1HM|JvLSkqZtK6(&~vRWqBHjNo2@iUp@AD9H|V@6(H056zHl<5cSow<hgAmr*)xR
zm{EiraR&GMLQuVV(?+6(eu^>1p+6O{-5J{YZo#-rh!h^m@h21hRIsl%AOj=KYOJq-
z;9EQ)0}e^392;EZJr9@0t>wW^-Bv`et{ri6h6@)rfvf4O5!hOxMqh-ETS{fzz4P%T
z^cx-+oW7t&tT|9@K+fmSAVrBu-Ozh#4BpXM)X4N0qgJb!C-y3+fLYsNY3Mjx*Z|Mp
z)FUUH6x0A6a_>{Xac7Wg&QSZ11!Aa3CzOmyTT*=lvctx)D-G=EjjH?}dmeP1N&*);
zXRAzTaoYrsr@tBzdW=qD4NSr-P^FiDlIpGL_19nykM@8Sli<GD3qHnps>%6o#IIry
zIVPb}ZE$|6jS3+ZZKUO*XRw~0ZMj^HvJ<FkqEexZVU*udj!x0uTpmM^;NV&?%qR*y
z4o}tv5H8ZO14PwxB7h6b+_&TTy`kgS^#(MUBSX~VZDgl@EZIAeMv6EpESxJtN0Qq}
z;mQe0Fh6u0vCKq^I#5|g8|l6%mQ=n>Bl&?UW!QchYx1%TgPkF`@LCZ_5-BsO9w`+v
z#K`v*N_3K5vAFOt!ulMC@b55<F#aoF#IEaZC(JH`-Y`1rLn_F@scbVBx^Q-nQZPvu
ziByK(DVeh8p!Cfa!~#~$B$RaFsE39K#3w+*W`~di*V0JVYZ<Gc1bV#|`)e%CCUAC`
zgo?4og;SelDE_F8447>J?)2t!ouRu~$;z3s=D|&R1I(EO@=F8y!s|Yn<F65Vgg`>5
zgdz%iAk0aIh96Z>O|u*+X>~hYn#kUWNu;*4fVi4C0M(z7uTW(~gfskMd4@{Lz1Ypl
zdP_0F!Y*GSY;ZKzM~10nZRFKi7SP|PK$c`8IrO1>alJ}0T}S836)oFuJX`Cnl$U)t
zyqN503pOF@+$v4zoqCp^0SSW07yQs9SKFyK$LP@8Y2_4*g9uO(hJgzR0iA$-3`y<D
zVGt@eC;)meKzz@Tkko0eVv)XV;<crT)$bn%VZV^_Qa~M*Q8%4Y+XJapdBE^-r|125
zK9TwVr>Sd%c++YZwqf_$JGLt7j-!FEgHf8%6$%?|^{_r%I+k)Dc`wPG`aU^vi-~>c
zFt+`dh4EvDa`zTmRUyZ&I?quMT{e1i6_lC8ppCRKNkR-4#UFXdi)Ph;NARRqR|S{3
zFS=&%9_LIScI_26&Zt91x&m3fy|e~ymrUhj`zNs-gqxkw^|?zM4~WAfE#h|AEgOhn
zhaQlzA04cve#tnwe?JE6ee}ULH~oinlDGl#Xb!CJhdHgtXU*|^|9rvoDddff)c`CU
zB%yNciCXjlr1b0oGqs2kt#KW%WqvA&i+p{7I$746Ru4G=!pH-JcbwA`D&&b3Ay~UW
z&PXkSXNi<>t{z>00UGy-9R<`0CLxu|*x@H+$nB*(<Z0O+BtSyVw#L^_2FQfA8+cG+
zPRjvhOGX?4`gG|ZQdxR0z`Yz6i=qEi4O$vD&c4=wyz_Db+9@=QLe~OnCX=v~$WYW#
zg++%bs61N6<7mZtBSY}AdPwN>*9d6YIZuzOmSLp}aKjRR4dlM<g)%0g&rvB)9F`$(
zRk9=`b&(viq$ybAT;E0b^k984^8U~(Ttf>xt+9>4PH+p(b)!BN1l*<%Uabs``d&Q=
zFA9jx^_DHjx3t9^RI%(=s*s6$ZKRae8+S=N;_VE&Rc9%C9&G@-=}W8V6fzo+uEu)S
z*dtjDoJvA=(kl6L0~FUdK}o8=hEcY;fV#&d1c55Xd(99owJ0DG7R6F5a}J~!kU=L+
za7K>}4}R7af-&6=cN&nErAD~@Z}||H`}TwK*EG`Wpb8!v;hgk^Vo;R$lTBgcmUS|4
zRjY+wyDLYad(^)Z`Cm5C$!a~eD;KV$ku}!1va7U&5^v;zL|25=*0v+`DYl)x37$5H
zz+qPisi{vRTLV@3Pgbae;T{SKUEmA$t2yu?k-D?)7EA=uN?~<(517)qrZ%unB-A4{
z6Ky2R!VVZ2MNpDRMeTHg^G$TFv=%acE&@D}(%&6VT{NT()++=@R&juDgR@anl@iZv
zBlSkQ5O^&b%y%bP=|vEqNHzA}LexD4#K5B1DHc8Yk^w0?{sk^_Siys9r=)OM(vE(e
ziA1VNhGo*lZbVb>Pgc<_#mXR(8zJLHFXKU!ZZ;Iqz9EreYDqn)ivl$0o>2;;rTKbp
zE1WdOO_hdX)O2MF%ZvjL3`hrQ0(DkXHNwh$JN5C=q|+J~4gZtO?=cZ0#3?zeYCwTH
zWWt>oCx{Td9D!{W!)&^Z0Y7JiQ;ak?6cEDIPbkD+LfKp68s-|6FxEw-frV+5DQ!$P
zneb-J3C0GhMi@Vgl3XFJ#$GMG*!)}IuCE$~1@R%|yO2yrO23O5dzwa?+2F9Nn~Lf?
z*G9gW)dMWLpCJs$R!;#kpwJh?`BR;L6(HXka=g6Q5Ok><iK|cx>1u-|necLmV4#ij
z#E5r=8v4^nO`wW3d9fZ*A1NTJ7Wq=bn?=x*NQrZlNZwy&pn8w^{xpas7eRuz5^4V*
zD!SGgR_bD~gRXXV%Q$Md39@zuxWoTr3>wYe^agZd_AOW{_t&`7zM1rt>GGO1AlYT+
zDsFVRf^wy+ySAnszIBG(%^J}2F)3pFJA$koqa~mvpe3Lspe3Lspe3Lspe3Lspe3Ls
zpe3Lspe3Lspe3Lspe3Lspe3Lspe3Lspe3Lspe67(B|wQB?3U1P9P7+eL4^IM3;(9e
W)Gq69OyI|Sozg9bf2Chr*ZB{=uW}{;

literal 0
HcmV?d00001

diff --git a/tests/f_rebuild_csum_rootdir/name b/tests/f_rebuild_csum_rootdir/name
new file mode 100644
index 0000000..b246f48
--- /dev/null
+++ b/tests/f_rebuild_csum_rootdir/name
@@ -0,0 +1 @@
+force fsck to rebuild a corrupted rootdir w/ metadata_csum



* [PATCH 14/37] dumpe2fs: add switch to disable checksum verification
  2014-05-01 23:12 [PATCH 00/37] e2fsprogs patchbomb 5/14 Darrick J. Wong
                   ` (12 preceding siblings ...)
  2014-05-01 23:13 ` [PATCH 13/37] e2fsck: write dir blocks after new inode when reconstructing root/lost+found Darrick J. Wong
@ 2014-05-01 23:13 ` Darrick J. Wong
  2014-05-05 17:20   ` Lukáš Czerner
  2014-05-01 23:14 ` [PATCH 15/37] mke2fs: set block_validity as a default mount option Darrick J. Wong
                   ` (20 subsequent siblings)
  34 siblings, 1 reply; 91+ messages in thread
From: Darrick J. Wong @ 2014-05-01 23:13 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

Add a -n switch to turn off checksum verification.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 misc/dumpe2fs.8.in |    3 +++
 misc/dumpe2fs.c    |   10 +++++++---
 2 files changed, 10 insertions(+), 3 deletions(-)


diff --git a/misc/dumpe2fs.8.in b/misc/dumpe2fs.8.in
index befaf94..51614db 100644
--- a/misc/dumpe2fs.8.in
+++ b/misc/dumpe2fs.8.in
@@ -61,6 +61,9 @@ using
 .I device
 as the pathname to the image file.
 .TP
+.B \-n
+Don't verify checksums when dumping the filesystem.
+.TP
 .B \-x
 print the detailed group information block numbers in hexadecimal format
 .TP
diff --git a/misc/dumpe2fs.c b/misc/dumpe2fs.c
index ae54f8a..3a3684b 100644
--- a/misc/dumpe2fs.c
+++ b/misc/dumpe2fs.c
@@ -52,7 +52,7 @@ static int blocks64 = 0;
 
 static void usage(void)
 {
-	fprintf (stderr, _("Usage: %s [-bfhixV] [-o superblock=<num>] "
+	fprintf(stderr, _("Usage: %s [-bfhinxV] [-o superblock=<num>] "
 		 "[-o blocksize=<num>] device\n"), program_name);
 	exit (1);
 }
@@ -582,7 +582,9 @@ int main (int argc, char ** argv)
 	if (argc && *argv)
 		program_name = *argv;
 
-	while ((c = getopt (argc, argv, "bfhixVo:")) != EOF) {
+	flags = EXT2_FLAG_JOURNAL_DEV_OK | EXT2_FLAG_SOFTSUPP_FEATURES |
+		EXT2_FLAG_64BITS;
+	while ((c = getopt(argc, argv, "bfhixVo:n")) != EOF) {
 		switch (c) {
 		case 'b':
 			print_badblocks++;
@@ -608,6 +610,9 @@ int main (int argc, char ** argv)
 		case 'x':
 			hex_format++;
 			break;
+		case 'n':
+			flags |= EXT2_FLAG_IGNORE_CSUM_ERRORS;
+			break;
 		default:
 			usage();
 		}
@@ -615,7 +620,6 @@ int main (int argc, char ** argv)
 	if (optind > argc - 1)
 		usage();
 	device_name = argv[optind++];
-	flags = EXT2_FLAG_JOURNAL_DEV_OK | EXT2_FLAG_SOFTSUPP_FEATURES | EXT2_FLAG_64BITS;
 	if (force)
 		flags |= EXT2_FLAG_FORCE;
 	if (image_dump)



* [PATCH 15/37] mke2fs: set block_validity as a default mount option
  2014-05-01 23:12 [PATCH 00/37] e2fsprogs patchbomb 5/14 Darrick J. Wong
                   ` (13 preceding siblings ...)
  2014-05-01 23:13 ` [PATCH 14/37] dumpe2fs: add switch to disable checksum verification Darrick J. Wong
@ 2014-05-01 23:14 ` Darrick J. Wong
  2014-05-05 17:24   ` Lukáš Czerner
  2014-05-01 23:14 ` [PATCH 16/37] libext2fs: support allocating uninit blocks in bmap2() Darrick J. Wong
                   ` (19 subsequent siblings)
  34 siblings, 1 reply; 91+ messages in thread
From: Darrick J. Wong @ 2014-05-01 23:14 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

The block_validity mount option spot-checks block allocations against
a bitmap of known group metadata blocks.  This helps us to prevent
self-inflicted catastrophic failures such as trying to "share"
critical metadata (think bitmaps) with file data, which usually
results in filesystem destruction.

In order to test the overhead of the mount option, I re-used the speed
tests in the metadata checksum testing script.  In short, the program
creates what looks like 15 copies of a kernel source tree, except that
it uses fallocate to strip out the overhead of writing the file data
so that we can focus on metadata overhead.  On a 64G RAM disk, the
overhead was generally about 0.9% and at most 1.6%.  On a 160G USB
disk, the overhead was about 0.8% and peaked at 1.2%.

When I changed the test to write out files instead of merely
fallocating space, the overhead was negligible.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
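
To make the spot check described above concrete, here is a rough
sketch of the idea: reject any allocation that overlaps a known
metadata range.  It is not the kernel's implementation; struct
toy_range and toy_blocks_valid() are hypothetical, and a linear scan
stands in for whatever structure the kernel actually uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct toy_range {
	uint64_t start;	/* first metadata block */
	uint64_t count;	/* number of blocks in the range */
};

/* Returns 1 if [blk, blk + count) stays clear of every metadata range. */
static int toy_blocks_valid(const struct toy_range *meta, size_t nmeta,
			    uint64_t blk, uint64_t count)
{
	size_t i;

	assert(meta || nmeta == 0);
	for (i = 0; i < nmeta; i++) {
		uint64_t mstart = meta[i].start;
		uint64_t mend = mstart + meta[i].count;	/* exclusive */

		if (blk < mend && mstart < blk + count)
			return 0;	/* overlaps group metadata: reject */
	}
	return 1;
}
```

A caller would run this on each extent about to be handed to file
data and shut the filesystem down on a 0 return instead of writing
over a bitmap or inode table.
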
 misc/mke2fs.conf.in |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


diff --git a/misc/mke2fs.conf.in b/misc/mke2fs.conf.in
index 4c5dba7..de0250d 100644
--- a/misc/mke2fs.conf.in
+++ b/misc/mke2fs.conf.in
@@ -1,6 +1,6 @@
 [defaults]
 	base_features = sparse_super,filetype,resize_inode,dir_index,ext_attr
-	default_mntopts = acl,user_xattr
+	default_mntopts = acl,user_xattr,block_validity
 	enable_periodic_fsck = 0
 	blocksize = 4096
 	inode_size = 256



* [PATCH 16/37] libext2fs: support allocating uninit blocks in bmap2()
  2014-05-01 23:12 [PATCH 00/37] e2fsprogs patchbomb 5/14 Darrick J. Wong
                   ` (14 preceding siblings ...)
  2014-05-01 23:14 ` [PATCH 15/37] mke2fs: set block_validity as a default mount option Darrick J. Wong
@ 2014-05-01 23:14 ` Darrick J. Wong
  2014-05-06 15:45   ` Lukáš Czerner
  2014-05-01 23:14 ` [PATCH 17/37] libext2fs: file IO routines should handle uninit blocks Darrick J. Wong
                   ` (18 subsequent siblings)
  34 siblings, 1 reply; 91+ messages in thread
From: Darrick J. Wong @ 2014-05-01 23:14 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

In order to support fallocate, we need to be able to have
ext2fs_bmap2() allocate blocks and put them into uninitialized
extents.  There's a flag to do this in the extent code, but it's not
exposed to the bmap2 interface, so plumb that in.  Eventually fuse2fs
or somebody will use it.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
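
The mkjournal.c hunk below also teaches ext2fs_zero_blocks2() to try
discard first and fall back to writing zeroes if the device turns out
not to zero discarded blocks.  The control flow can be sketched with a
mock device; struct toy_dev and the toy_* functions are hypothetical
and do not correspond to the io_channel API.

```c
#include <assert.h>
#include <string.h>

#define TOY_BLKSZ   8
#define TOY_NBLOCKS 4

struct toy_dev {
	unsigned char data[TOY_NBLOCKS][TOY_BLKSZ];
	int claims_discard_zeroes; /* what the device advertises */
	int really_zeroes;         /* what discard actually does */
	int writes;                /* explicit zero writes issued */
};

static void toy_discard(struct toy_dev *dev, int blk, int num)
{
	int i;

	if (!dev->really_zeroes)
		return;		/* a lying device leaves the data alone */
	for (i = 0; i < num; i++)
		memset(dev->data[blk + i], 0, TOY_BLKSZ);
}

/* Zero num blocks at blk, preferring discard while it can be trusted. */
static void toy_zero_blocks(struct toy_dev *dev, int blk, int num)
{
	static const unsigned char zeroes[TOY_BLKSZ];
	int i;

	assert(blk >= 0 && num > 0 && blk + num <= TOY_NBLOCKS);
	if (dev->claims_discard_zeroes) {
		toy_discard(dev, blk, num);
		/* Spot-check the first block only, as the patch does. */
		if (memcmp(dev->data[blk], zeroes, TOY_BLKSZ) == 0)
			return;
		dev->claims_discard_zeroes = 0;	/* stop trusting the claim */
	}
	for (i = 0; i < num; i++) {
		memset(dev->data[blk + i], 0, TOY_BLKSZ);
		dev->writes++;
	}
}
```

Like the patch, the sketch reads back only the first block, so a
block that already held zeroes before the discard would pass the
check even on a device that does not zero.
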
 lib/ext2fs/bmap.c      |   24 ++++++++++++++++++++++--
 lib/ext2fs/ext2fs.h    |    1 +
 lib/ext2fs/mkjournal.c |   17 +++++++++++++++++
 3 files changed, 40 insertions(+), 2 deletions(-)


diff --git a/lib/ext2fs/bmap.c b/lib/ext2fs/bmap.c
index c1d0e6f..a4dc8ef 100644
--- a/lib/ext2fs/bmap.c
+++ b/lib/ext2fs/bmap.c
@@ -72,6 +72,11 @@ static _BMAP_INLINE_ errcode_t block_ind_bmap(ext2_filsys fs, int flags,
 					    block_buf + fs->blocksize, &b);
 		if (retval)
 			return retval;
+		if (flags & BMAP_UNINIT) {
+			retval = ext2fs_zero_blocks2(fs, b, 1, NULL, NULL);
+			if (retval)
+				return retval;
+		}
 
 #ifdef WORDS_BIGENDIAN
 		((blk_t *) block_buf)[nr] = ext2fs_swab32(b);
@@ -214,10 +219,13 @@ static errcode_t extent_bmap(ext2_filsys fs, ext2_ino_t ino,
 	errcode_t		retval = 0;
 	blk64_t			blk64 = 0;
 	int			alloc = 0;
+	int			set_flags;
+
+	set_flags = bmap_flags & BMAP_UNINIT ? EXT2_EXTENT_SET_BMAP_UNINIT : 0;
 
 	if (bmap_flags & BMAP_SET) {
 		retval = ext2fs_extent_set_bmap(handle, block,
-						*phys_blk, 0);
+						*phys_blk, set_flags);
 		return retval;
 	}
 	retval = ext2fs_extent_goto(handle, block);
@@ -254,7 +262,7 @@ got_block:
 		alloc++;
 	set_extent:
 		retval = ext2fs_extent_set_bmap(handle, block,
-						blk64, 0);
+						blk64, set_flags);
 		if (retval) {
 			ext2fs_block_alloc_stats2(fs, blk64, -1);
 			return retval;
@@ -345,6 +353,12 @@ errcode_t ext2fs_bmap2(ext2_filsys fs, ext2_ino_t ino, struct ext2_inode *inode,
 		goto done;
 	}
 
+	if ((bmap_flags & BMAP_SET) && (bmap_flags & BMAP_UNINIT)) {
+		retval = ext2fs_zero_blocks2(fs, *phys_blk, 1, NULL, NULL);
+		if (retval)
+			goto done;
+	}
+
 	if (block < EXT2_NDIR_BLOCKS) {
 		if (bmap_flags & BMAP_SET) {
 			b = *phys_blk;
@@ -360,6 +374,12 @@ errcode_t ext2fs_bmap2(ext2_filsys fs, ext2_ino_t ino, struct ext2_inode *inode,
 			retval = ext2fs_alloc_block(fs, b, block_buf, &b);
 			if (retval)
 				goto done;
+			if (bmap_flags & BMAP_UNINIT) {
+				retval = ext2fs_zero_blocks2(fs, b, 1, NULL,
+							     NULL);
+				if (retval)
+					goto done;
+			}
 			inode_bmap(inode, block) = b;
 			blocks_alloc++;
 			*phys_blk = b;
diff --git a/lib/ext2fs/ext2fs.h b/lib/ext2fs/ext2fs.h
index 599c972..819a14a 100644
--- a/lib/ext2fs/ext2fs.h
+++ b/lib/ext2fs/ext2fs.h
@@ -527,6 +527,7 @@ typedef struct ext2_icount *ext2_icount_t;
  */
 #define BMAP_ALLOC	0x0001
 #define BMAP_SET	0x0002
+#define BMAP_UNINIT	0x0004
 
 /*
  * Returned flags from ext2fs_bmap
diff --git a/lib/ext2fs/mkjournal.c b/lib/ext2fs/mkjournal.c
index 884d9c0..ecc3912 100644
--- a/lib/ext2fs/mkjournal.c
+++ b/lib/ext2fs/mkjournal.c
@@ -174,6 +174,23 @@ errcode_t ext2fs_zero_blocks2(ext2_filsys fs, blk64_t blk, int num,
 			return ENOMEM;
 		memset(buf, 0, fs->blocksize * STRIDE_LENGTH);
 	}
+
+	/* Try discard, if it zeroes data... */
+	if (io_channel_discard_zeroes_data(fs->io)) {
+		memset(buf + fs->blocksize, 0, fs->blocksize);
+		retval = io_channel_discard(fs->io, blk, num);
+		if (retval)
+			goto skip_discard;
+		retval = io_channel_read_blk64(fs->io, blk, 1, buf);
+		if (retval)
+			goto skip_discard;
+		if (memcmp(buf, buf + fs->blocksize, fs->blocksize) == 0)
+			return 0;
+		/* Hah!  Discard doesn't zero! */
+		fs->io->flags &= ~CHANNEL_FLAGS_DISCARD_ZEROES;
+	}
+skip_discard:
+
 	/* OK, do the write loop */
 	j=0;
 	while (j < num) {



* [PATCH 17/37] libext2fs: file IO routines should handle uninit blocks
  2014-05-01 23:12 [PATCH 00/37] e2fsprogs patchbomb 5/14 Darrick J. Wong
                   ` (15 preceding siblings ...)
  2014-05-01 23:14 ` [PATCH 16/37] libext2fs: support allocating uninit blocks in bmap2() Darrick J. Wong
@ 2014-05-01 23:14 ` Darrick J. Wong
  2014-05-01 23:14 ` [PATCH 18/37] resize2fs: convert fs to and from 64bit mode Darrick J. Wong
                   ` (17 subsequent siblings)
  34 siblings, 0 replies; 91+ messages in thread
From: Darrick J. Wong @ 2014-05-01 23:14 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

The file IO routines do not handle uninit blocks at all.  The read
method should check for the uninit flag and return a buffer of zeroes,
and the write routine should convert unwritten extents.
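
To illustrate the read-side rule only, here is a minimal standalone sketch.  It is not the libext2fs code: `toy_mapping`, `toy_load_block()`, `TOY_BLOCKSIZE` and `TOY_RET_UNINIT` are hypothetical names invented for this example, and a flat byte array stands in for the block device.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_BLOCKSIZE	16
#define TOY_RET_UNINIT	0x1	/* hypothetical stand-in for BMAP_RET_UNINIT */

/* Hypothetical result of a logical-to-physical block lookup. */
struct toy_mapping {
	unsigned long	physblock;	/* 0 means "hole, nothing mapped" */
	int		ret_flags;	/* TOY_RET_UNINIT if the extent is unwritten */
};

/*
 * Fill buf with what a reader should see for this mapping: the on-disk
 * contents only when the block is mapped AND initialized, zeroes otherwise.
 * "disk" is a flat array standing in for the block device.
 */
static void toy_load_block(const unsigned char *disk,
			   const struct toy_mapping *map,
			   unsigned char *buf)
{
	if (map->physblock && !(map->ret_flags & TOY_RET_UNINIT))
		memcpy(buf, disk + map->physblock * TOY_BLOCKSIZE,
		       TOY_BLOCKSIZE);
	else
		memset(buf, 0, TOY_BLOCKSIZE);
}
```

The point of the sketch is that a stale on-disk block behind an unwritten extent never reaches the caller; a hole and an uninit block look the same to the reader.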

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 lib/ext2fs/fileio.c |   24 ++++++++++++++++++++++--
 1 file changed, 22 insertions(+), 2 deletions(-)


diff --git a/lib/ext2fs/fileio.c b/lib/ext2fs/fileio.c
index 14eaed3..1e386f8 100644
--- a/lib/ext2fs/fileio.c
+++ b/lib/ext2fs/fileio.c
@@ -123,6 +123,8 @@ errcode_t ext2fs_file_flush(ext2_file_t file)
 {
 	errcode_t	retval;
 	ext2_filsys fs;
+	int		ret_flags;
+	blk64_t		dontcare;
 
 	EXT2_CHECK_MAGIC(file, EXT2_ET_MAGIC_EXT2_FILE);
 	fs = file->fs;
@@ -131,6 +133,22 @@ errcode_t ext2fs_file_flush(ext2_file_t file)
 	    !(file->flags & EXT2_FILE_BUF_DIRTY))
 		return 0;
 
+	/* Is this an uninit block? */
+	if (file->physblock && file->inode.i_flags & EXT4_EXTENTS_FL) {
+		retval = ext2fs_bmap2(fs, file->ino, &file->inode, BMAP_BUFFER,
+				      0, file->blockno, &ret_flags, &dontcare);
+		if (retval)
+			return retval;
+		if (ret_flags & BMAP_RET_UNINIT) {
+			retval = ext2fs_bmap2(fs, file->ino, &file->inode,
+					      BMAP_BUFFER, BMAP_SET,
+					      file->blockno, 0,
+					      &file->physblock);
+			if (retval)
+				return retval;
+		}
+	}
+
 	/*
 	 * OK, the physical block hasn't been allocated yet.
 	 * Allocate it.
@@ -185,15 +203,17 @@ static errcode_t load_buffer(ext2_file_t file, int dontfill)
 {
 	ext2_filsys	fs = file->fs;
 	errcode_t	retval;
+	int		ret_flags;
 
 	if (!(file->flags & EXT2_FILE_BUF_VALID)) {
 		retval = ext2fs_bmap2(fs, file->ino, &file->inode,
-				     BMAP_BUFFER, 0, file->blockno, 0,
+				     BMAP_BUFFER, 0, file->blockno, &ret_flags,
 				     &file->physblock);
 		if (retval)
 			return retval;
 		if (!dontfill) {
-			if (file->physblock) {
+			if (file->physblock &&
+			    !(ret_flags & BMAP_RET_UNINIT)) {
 				retval = io_channel_read_blk64(fs->io,
 							       file->physblock,
 							       1, file->buf);



* [PATCH 18/37] resize2fs: convert fs to and from 64bit mode
  2014-05-01 23:12 [PATCH 00/37] e2fsprogs patchbomb 5/14 Darrick J. Wong
                   ` (16 preceding siblings ...)
  2014-05-01 23:14 ` [PATCH 17/37] libext2fs: file IO routines should handle uninit blocks Darrick J. Wong
@ 2014-05-01 23:14 ` Darrick J. Wong
  2014-05-01 23:14 ` [PATCH 19/37] resize2fs: when toggling 64bit, don't free in-use bg data clusters Darrick J. Wong
                   ` (16 subsequent siblings)
  34 siblings, 0 replies; 91+ messages in thread
From: Darrick J. Wong @ 2014-05-01 23:14 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

resize2fs does its magic by loading a filesystem, duplicating the
in-memory image of that fs, moving relevant blocks out of the way of
whatever new metadata get created, and finally writing everything back
out to disk.  Enabling 64bit mode enlarges the group descriptors,
which makes resize2fs a reasonable vehicle for taking care of the rest
of the bookkeeping requirements, so add to resize2fs the ability to
convert a filesystem to 64bit mode and back.
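
As a rough sketch of what "enlarging the group descriptors" means for the in-memory array, the standalone helper below copies fixed-size records from one stride to another.  `toy_restride()` is a hypothetical name used only for this illustration; it uses `unsigned char *` arithmetic rather than arithmetic on `void *`, and it assumes `count` is nonzero.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Copy "count" fixed-size records from an array with stride old_size into a
 * freshly zeroed array with stride new_size.  Only the bytes common to both
 * layouts are copied: growing leaves the new tail zeroed, shrinking drops
 * the old tail.  Returns NULL on allocation failure; the caller frees.
 */
static void *toy_restride(const void *old, size_t count,
			  size_t old_size, size_t new_size)
{
	size_t copy_size = old_size < new_size ? old_size : new_size;
	const unsigned char *o = old;
	unsigned char *n, *new_array;
	size_t i;

	new_array = calloc(count, new_size);
	if (!new_array)
		return NULL;

	n = new_array;
	for (i = 0; i < count; i++) {
		memcpy(n, o, copy_size);
		n += new_size;
		o += old_size;
	}
	return new_array;
}
```

With strides of 32 and 64 bytes (the values of EXT2_MIN_DESC_SIZE and EXT2_MIN_DESC_SIZE_64BIT), growing and then shrinking again should give back the original records, since the upper halves are never populated in between.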

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 resize/main.c         |   40 ++++++-
 resize/resize2fs.8.in |   18 +++
 resize/resize2fs.c    |  282 ++++++++++++++++++++++++++++++++++++++++++++++++-
 resize/resize2fs.h    |    3 +
 4 files changed, 336 insertions(+), 7 deletions(-)


diff --git a/resize/main.c b/resize/main.c
index 2b7abff..e37521a 100644
--- a/resize/main.c
+++ b/resize/main.c
@@ -42,7 +42,7 @@ static char *device_name, *io_options;
 static void usage (char *prog)
 {
 	fprintf (stderr, _("Usage: %s [-d debug_flags] [-f] [-F] [-M] [-P] "
-			   "[-p] device [new_size]\n\n"), prog);
+			   "[-p] device [-b|-s|new_size]\n\n"), prog);
 
 	exit (1);
 }
@@ -200,7 +200,7 @@ int main (int argc, char ** argv)
 	if (argc && *argv)
 		program_name = *argv;
 
-	while ((c = getopt (argc, argv, "d:fFhMPpS:")) != EOF) {
+	while ((c = getopt(argc, argv, "d:fFhMPpS:bs")) != EOF) {
 		switch (c) {
 		case 'h':
 			usage(program_name);
@@ -226,6 +226,12 @@ int main (int argc, char ** argv)
 		case 'S':
 			use_stride = atoi(optarg);
 			break;
+		case 'b':
+			flags |= RESIZE_ENABLE_64BIT;
+			break;
+		case 's':
+			flags |= RESIZE_DISABLE_64BIT;
+			break;
 		default:
 			usage(program_name);
 		}
@@ -384,6 +390,10 @@ int main (int argc, char ** argv)
 		if (sys_page_size > fs->blocksize)
 			new_size &= ~((sys_page_size / fs->blocksize)-1);
 	}
+	/* If changing 64bit, don't change the filesystem size. */
+	if (flags & (RESIZE_DISABLE_64BIT | RESIZE_ENABLE_64BIT)) {
+		new_size = ext2fs_blocks_count(fs->super);
+	}
 	if (!EXT2_HAS_INCOMPAT_FEATURE(fs->super,
 				       EXT4_FEATURE_INCOMPAT_64BIT)) {
 		/* Take 16T down to 2^32-1 blocks */
@@ -435,7 +445,31 @@ int main (int argc, char ** argv)
 			fs->blocksize / 1024, new_size);
 		exit(1);
 	}
-	if (new_size == ext2fs_blocks_count(fs->super)) {
+	if ((flags & RESIZE_DISABLE_64BIT) && (flags & RESIZE_ENABLE_64BIT)) {
+		fprintf(stderr, _("Cannot set and unset 64bit feature.\n"));
+		exit(1);
+	} else if (flags & (RESIZE_DISABLE_64BIT | RESIZE_ENABLE_64BIT)) {
+		new_size = ext2fs_blocks_count(fs->super);
+		if (new_size >= (1ULL << 32)) {
+			fprintf(stderr, _("Cannot change the 64bit feature "
+				"on a filesystem that is larger than "
+				"2^32 blocks.\n"));
+			exit(1);
+		}
+		if (mount_flags & EXT2_MF_MOUNTED) {
+			fprintf(stderr, _("Cannot change the 64bit feature "
+				"while the filesystem is mounted.\n"));
+			exit(1);
+		}
+		if (flags & RESIZE_ENABLE_64BIT &&
+		    !EXT2_HAS_INCOMPAT_FEATURE(fs->super,
+				EXT3_FEATURE_INCOMPAT_EXTENTS)) {
+			fprintf(stderr, _("Please enable the extents feature "
+				"with tune2fs before enabling the 64bit "
+				"feature.\n"));
+			exit(1);
+		}
+	} else if (new_size == ext2fs_blocks_count(fs->super)) {
 		fprintf(stderr, _("The filesystem is already %llu blocks "
 			"long.  Nothing to do!\n\n"), new_size);
 		exit(0);
diff --git a/resize/resize2fs.8.in b/resize/resize2fs.8.in
index a1f3099..1c75816 100644
--- a/resize/resize2fs.8.in
+++ b/resize/resize2fs.8.in
@@ -8,7 +8,7 @@ resize2fs \- ext2/ext3/ext4 file system resizer
 .SH SYNOPSIS
 .B resize2fs
 [
-.B \-fFpPM
+.B \-fFpPMbs
 ]
 [
 .B \-d
@@ -85,8 +85,21 @@ to shrink the size of filesystem.  Then you may use
 to shrink the size of the partition.  When shrinking the size of
 the partition, make sure you do not make it smaller than the new size
 of the ext2 filesystem!
+.PP
+The
+.B \-b
+and
+.B \-s
+options enable and disable the 64bit feature, respectively.  The resize2fs
+program will, of course, take care of resizing the block group descriptors
+and moving other data blocks out of the way, as needed.  It is not possible
+to resize the filesystem concurrent with changing the 64bit status.
 .SH OPTIONS
 .TP
+.B \-b
+Turns on the 64bit feature, resizes the group descriptors as necessary, and
+moves other metadata out of the way.
+.TP
 .B \-d \fIdebug-flags
 Turns on various resize2fs debugging features, if they have been compiled
 into the binary.
@@ -126,6 +139,9 @@ of what the program is doing.
 .B \-P
 Print the minimum size of the filesystem and exit.
 .TP
+.B \-s
+Turns off the 64bit feature and frees blocks that are no longer in use.
+.TP
 .B \-S \fIRAID-stride
 The
 .B resize2fs
diff --git a/resize/resize2fs.c b/resize/resize2fs.c
index a81a1c3..e945fef 100644
--- a/resize/resize2fs.c
+++ b/resize/resize2fs.c
@@ -56,6 +56,9 @@ static errcode_t mark_table_blocks(ext2_filsys fs,
 static errcode_t clear_sparse_super2_last_group(ext2_resize_t rfs);
 static errcode_t reserve_sparse_super2_last_group(ext2_resize_t rfs,
 						 ext2fs_block_bitmap meta_bmap);
+static errcode_t resize_group_descriptors(ext2_resize_t rfs, blk64_t new_size);
+static errcode_t move_bg_metadata(ext2_resize_t rfs);
+static errcode_t zero_high_bits_in_inodes(ext2_resize_t rfs);
 
 /*
  * Some helper CPP macros
@@ -122,13 +125,30 @@ errcode_t resize_fs(ext2_filsys fs, blk64_t *new_size, int flags,
 	if (retval)
 		goto errout;
 
+	init_resource_track(&rtrack, "resize_group_descriptors", fs->io);
+	retval = resize_group_descriptors(rfs, *new_size);
+	if (retval)
+		goto errout;
+	print_resource_track(rfs, &rtrack, fs->io);
+
+	init_resource_track(&rtrack, "move_bg_metadata", fs->io);
+	retval = move_bg_metadata(rfs);
+	if (retval)
+		goto errout;
+	print_resource_track(rfs, &rtrack, fs->io);
+
+	init_resource_track(&rtrack, "zero_high_bits_in_metadata", fs->io);
+	retval = zero_high_bits_in_inodes(rfs);
+	if (retval)
+		goto errout;
+	print_resource_track(rfs, &rtrack, fs->io);
+
 	init_resource_track(&rtrack, "adjust_superblock", fs->io);
 	retval = adjust_superblock(rfs, *new_size);
 	if (retval)
 		goto errout;
 	print_resource_track(rfs, &rtrack, fs->io);
 
-
 	init_resource_track(&rtrack, "fix_uninit_block_bitmaps 2", fs->io);
 	fix_uninit_block_bitmaps(rfs->new_fs);
 	print_resource_track(rfs, &rtrack, fs->io);
@@ -231,6 +251,259 @@ errout:
 	return retval;
 }
 
+/* Toggle 64bit mode */
+static errcode_t resize_group_descriptors(ext2_resize_t rfs, blk64_t new_size)
+{
+	void *o, *n, *new_group_desc;
+	dgrp_t i;
+	int copy_size;
+	errcode_t retval;
+
+	if (!(rfs->flags & (RESIZE_DISABLE_64BIT | RESIZE_ENABLE_64BIT)))
+		return 0;
+
+	if (new_size != ext2fs_blocks_count(rfs->new_fs->super) ||
+	    ext2fs_blocks_count(rfs->new_fs->super) >= (1ULL << 32) ||
+	    (rfs->flags & RESIZE_DISABLE_64BIT &&
+	     rfs->flags & RESIZE_ENABLE_64BIT))
+		return EXT2_ET_INVALID_ARGUMENT;
+
+	if (rfs->flags & RESIZE_DISABLE_64BIT) {
+		rfs->new_fs->super->s_feature_incompat &=
+				~EXT4_FEATURE_INCOMPAT_64BIT;
+		rfs->new_fs->super->s_desc_size = EXT2_MIN_DESC_SIZE;
+	} else if (rfs->flags & RESIZE_ENABLE_64BIT) {
+		rfs->new_fs->super->s_feature_incompat |=
+				EXT4_FEATURE_INCOMPAT_64BIT;
+		rfs->new_fs->super->s_desc_size = EXT2_MIN_DESC_SIZE_64BIT;
+	}
+
+	if (EXT2_DESC_SIZE(rfs->old_fs->super) ==
+	    EXT2_DESC_SIZE(rfs->new_fs->super))
+		return 0;
+
+	o = rfs->new_fs->group_desc;
+	rfs->new_fs->desc_blocks = ext2fs_div_ceil(
+			rfs->old_fs->group_desc_count,
+			EXT2_DESC_PER_BLOCK(rfs->new_fs->super));
+	retval = ext2fs_get_arrayzero(rfs->new_fs->desc_blocks,
+				      rfs->old_fs->blocksize, &new_group_desc);
+	if (retval)
+		return retval;
+
+	n = new_group_desc;
+
+	if (EXT2_DESC_SIZE(rfs->old_fs->super) <=
+	    EXT2_DESC_SIZE(rfs->new_fs->super))
+		copy_size = EXT2_DESC_SIZE(rfs->old_fs->super);
+	else
+		copy_size = EXT2_DESC_SIZE(rfs->new_fs->super);
+	for (i = 0; i < rfs->old_fs->group_desc_count; i++) {
+		memcpy(n, o, copy_size);
+		n += EXT2_DESC_SIZE(rfs->new_fs->super);
+		o += EXT2_DESC_SIZE(rfs->old_fs->super);
+	}
+
+	ext2fs_free_mem(&rfs->new_fs->group_desc);
+	rfs->new_fs->group_desc = new_group_desc;
+
+	for (i = 0; i < rfs->old_fs->group_desc_count; i++)
+		ext2fs_group_desc_csum_set(rfs->new_fs, i);
+
+	return 0;
+}
+
+/* Move bitmaps/inode tables out of the way. */
+static errcode_t move_bg_metadata(ext2_resize_t rfs)
+{
+	dgrp_t i;
+	blk64_t b, c, d;
+	ext2fs_block_bitmap old_map, new_map;
+	int old, new;
+	errcode_t retval;
+	int zero = 0, one = 1;
+
+	if (!(rfs->flags & (RESIZE_DISABLE_64BIT | RESIZE_ENABLE_64BIT)))
+		return 0;
+
+	retval = ext2fs_allocate_block_bitmap(rfs->old_fs, "oldfs", &old_map);
+	if (retval)
+		return retval;
+
+	retval = ext2fs_allocate_block_bitmap(rfs->new_fs, "newfs", &new_map);
+	if (retval)
+		goto out;
+
+	/* Construct bitmaps of super/descriptor blocks in old and new fs */
+	for (i = 0; i < rfs->old_fs->group_desc_count; i++) {
+		retval = ext2fs_super_and_bgd_loc2(rfs->old_fs, i, &b, &c, &d,
+						   NULL);
+		if (retval)
+			goto out;
+		ext2fs_mark_block_bitmap2(old_map, b);
+		ext2fs_mark_block_bitmap2(old_map, c);
+		ext2fs_mark_block_bitmap2(old_map, d);
+
+		retval = ext2fs_super_and_bgd_loc2(rfs->new_fs, i, &b, &c, &d,
+						   NULL);
+		if (retval)
+			goto out;
+		ext2fs_mark_block_bitmap2(new_map, b);
+		ext2fs_mark_block_bitmap2(new_map, c);
+		ext2fs_mark_block_bitmap2(new_map, d);
+	}
+
+	/* Find changes in block allocations for bg metadata */
+	for (b = 0;
+	     b < ext2fs_blocks_count(rfs->new_fs->super);
+	     b += EXT2FS_CLUSTER_RATIO(rfs->new_fs)) {
+		old = ext2fs_test_block_bitmap2(old_map, b);
+		new = ext2fs_test_block_bitmap2(new_map, b);
+
+		if (old && !new)
+			ext2fs_unmark_block_bitmap2(rfs->new_fs->block_map, b);
+		else if (!old && new)
+			; /* empty ext2fs_mark_block_bitmap2(new_map, b); */
+		else
+			ext2fs_unmark_block_bitmap2(new_map, b);
+	}
+	/* new_map now shows blocks that have been newly allocated. */
+
+	/* Move any conflicting bitmaps and inode tables */
+	for (i = 0; i < rfs->old_fs->group_desc_count; i++) {
+		b = ext2fs_block_bitmap_loc(rfs->new_fs, i);
+		if (ext2fs_test_block_bitmap2(new_map, b))
+			ext2fs_block_bitmap_loc_set(rfs->new_fs, i, 0);
+
+		b = ext2fs_inode_bitmap_loc(rfs->new_fs, i);
+		if (ext2fs_test_block_bitmap2(new_map, b))
+			ext2fs_inode_bitmap_loc_set(rfs->new_fs, i, 0);
+
+		c = ext2fs_inode_table_loc(rfs->new_fs, i);
+		for (b = 0; b < rfs->new_fs->inode_blocks_per_group; b++) {
+			if (ext2fs_test_block_bitmap2(new_map, b + c)) {
+				ext2fs_inode_table_loc_set(rfs->new_fs, i, 0);
+				break;
+			}
+		}
+	}
+
+out:
+	if (old_map)
+		ext2fs_free_block_bitmap(old_map);
+	if (new_map)
+		ext2fs_free_block_bitmap(new_map);
+	return retval;
+}
+
+/* Zero out the high bits of extent fields */
+static errcode_t zero_high_bits_in_extents(ext2_filsys fs, ext2_ino_t ino,
+				 struct ext2_inode *inode)
+{
+	ext2_extent_handle_t	handle;
+	struct ext2fs_extent	extent;
+	int			op = EXT2_EXTENT_ROOT;
+	errcode_t		errcode;
+
+	if (!(inode->i_flags & EXT4_EXTENTS_FL))
+		return 0;
+
+	errcode = ext2fs_extent_open(fs, ino, &handle);
+	if (errcode)
+		return errcode;
+
+	while (1) {
+		errcode = ext2fs_extent_get(handle, op, &extent);
+		if (errcode)
+			break;
+
+		op = EXT2_EXTENT_NEXT_SIB;
+
+		if (extent.e_pblk > (1ULL << 32)) {
+			extent.e_pblk &= (1ULL << 32) - 1;
+			errcode = ext2fs_extent_replace(handle, 0, &extent);
+			if (errcode)
+				break;
+		}
+	}
+
+	/* Ok if we run off the end */
+	if (errcode == EXT2_ET_EXTENT_NO_NEXT)
+		errcode = 0;
+	return errcode;
+}
+
+/* Zero out the high bits of inodes. */
+static errcode_t zero_high_bits_in_inodes(ext2_resize_t rfs)
+{
+	ext2_filsys	fs = rfs->new_fs;
+	int length = EXT2_INODE_SIZE(fs->super);
+	struct ext2_inode *inode = NULL;
+	ext2_inode_scan	scan = NULL;
+	errcode_t	retval;
+	ext2_ino_t	ino;
+	blk64_t		file_acl_block;
+	int		inode_dirty;
+
+	if (!(rfs->flags & (RESIZE_DISABLE_64BIT | RESIZE_ENABLE_64BIT)))
+		return 0;
+
+	if (fs->super->s_creator_os != EXT2_OS_LINUX)
+		return 0;
+
+	retval = ext2fs_open_inode_scan(fs, 0, &scan);
+	if (retval)
+		return retval;
+
+	retval = ext2fs_get_mem(length, &inode);
+	if (retval)
+		goto out;
+
+	do {
+		retval = ext2fs_get_next_inode_full(scan, &ino, inode, length);
+		if (retval)
+			goto out;
+		if (!ino)
+			break;
+		if (!ext2fs_test_inode_bitmap2(fs->inode_map, ino))
+			continue;
+
+		/*
+		 * Here's how we deal with high block number fields:
+		 *
+		 *  - i_size_high has been written out with i_size_lo
+		 *    since the ext2 days, so no conversion is needed.
+		 *
+		 *  - i_blocks_hi is guarded by both the huge_file feature and
+		 *    inode flags and has always been written out with
+		 *    i_blocks_lo if the feature is set.  The field is only
+		 *    ever read if both feature and inode flag are set, so
+		 *    we don't need to zero it now.
+		 *
+		 *  - i_file_acl_high can be uninitialized, so zero it if
+		 *    it isn't already.
+		 */
+		if (inode->osd2.linux2.l_i_file_acl_high) {
+			inode->osd2.linux2.l_i_file_acl_high = 0;
+			retval = ext2fs_write_inode_full(fs, ino, inode,
+							 length);
+			if (retval)
+				goto out;
+		}
+
+		retval = zero_high_bits_in_extents(fs, ino, inode);
+		if (retval)
+			goto out;
+	} while (ino);
+
+out:
+	if (inode)
+		ext2fs_free_mem(&inode);
+	if (scan)
+		ext2fs_close_inode_scan(scan);
+	return retval;
+}
+
 /*
  * Clean up the bitmaps for unitialized bitmaps
  */
@@ -455,7 +728,8 @@ retry:
 	/*
 	 * Reallocate the group descriptors as necessary.
 	 */
-	if (old_fs->desc_blocks != fs->desc_blocks) {
+	if (EXT2_DESC_SIZE(old_fs->super) == EXT2_DESC_SIZE(fs->super) &&
+	    old_fs->desc_blocks != fs->desc_blocks) {
 		retval = ext2fs_resize_mem(old_fs->desc_blocks *
 					   fs->blocksize,
 					   fs->desc_blocks * fs->blocksize,
@@ -1006,7 +1280,9 @@ static errcode_t blocks_to_move(ext2_resize_t rfs)
 	if (retval)
 		goto errout;
 
-	if (old_blocks == new_blocks) {
+	if (EXT2_DESC_SIZE(rfs->old_fs->super) ==
+	    EXT2_DESC_SIZE(rfs->new_fs->super) &&
+	    old_blocks == new_blocks) {
 		retval = 0;
 		goto errout;
 	}
diff --git a/resize/resize2fs.h b/resize/resize2fs.h
index 7aeab91..829fcd8 100644
--- a/resize/resize2fs.h
+++ b/resize/resize2fs.h
@@ -82,6 +82,9 @@ typedef struct ext2_sim_progress *ext2_sim_progmeter;
 #define RESIZE_PERCENT_COMPLETE		0x0100
 #define RESIZE_VERBOSE			0x0200
 
+#define RESIZE_ENABLE_64BIT		0x0400
+#define RESIZE_DISABLE_64BIT		0x0800
+
 /*
  * This structure is used for keeping track of how much resources have
  * been used for a particular resize2fs pass.



* [PATCH 19/37] resize2fs: when toggling 64bit, don't free in-use bg data clusters
  2014-05-01 23:12 [PATCH 00/37] e2fsprogs patchbomb 5/14 Darrick J. Wong
                   ` (17 preceding siblings ...)
  2014-05-01 23:14 ` [PATCH 18/37] resize2fs: convert fs to and from 64bit mode Darrick J. Wong
@ 2014-05-01 23:14 ` Darrick J. Wong
  2014-05-01 23:14 ` [PATCH 20/37] resize2fs: adjust reserved_gdt_blocks when changing group descriptor size Darrick J. Wong
                   ` (15 subsequent siblings)
  34 siblings, 0 replies; 91+ messages in thread
From: Darrick J. Wong @ 2014-05-01 23:14 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

Currently, move_bg_metadata() assumes that if a block containing a
superblock or a group descriptor is no longer needed, then it is safe
to free the whole cluster.  This of course isn't true, for bitmaps and
inode tables can share these clusters.  Therefore, check a little more
carefully before freeing clusters.
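
A simplified model of the check, for illustration only: the sketch below uses plain per-block `bool` arrays instead of libext2fs bitmaps, and `toy_cluster_freeable()` is a hypothetical helper, not a function in this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Decide whether the cluster starting at block "start" may be freed.
 * freed[b] is true for blocks whose superblock/descriptor copy went away;
 * in_use[b] is true for blocks still holding a bitmap or inode table.
 * A cluster is freeable only if it lost metadata and nothing else lives
 * in it.
 */
static bool toy_cluster_freeable(const bool *freed, const bool *in_use,
				 size_t start, size_t cluster_ratio)
{
	bool any_freed = false;
	size_t b;

	for (b = start; b < start + cluster_ratio; b++) {
		if (in_use[b])
			return false;
		if (freed[b])
			any_freed = true;
	}
	return any_freed;
}
```

The patch itself reaches the same outcome differently: it clears bits in old_map for every bitmap and inode table block, then frees whatever clusters remain marked.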

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 resize/resize2fs.c |   71 ++++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 55 insertions(+), 16 deletions(-)


diff --git a/resize/resize2fs.c b/resize/resize2fs.c
index e945fef..7eb025e 100644
--- a/resize/resize2fs.c
+++ b/resize/resize2fs.c
@@ -317,11 +317,11 @@ static errcode_t resize_group_descriptors(ext2_resize_t rfs, blk64_t new_size)
 static errcode_t move_bg_metadata(ext2_resize_t rfs)
 {
 	dgrp_t i;
-	blk64_t b, c, d;
+	blk64_t b, c, d, old_desc_blocks, new_desc_blocks, j;
 	ext2fs_block_bitmap old_map, new_map;
 	int old, new;
 	errcode_t retval;
-	int zero = 0, one = 1;
+	int zero = 0, one = 1, cluster_ratio;
 
 	if (!(rfs->flags & (RESIZE_DISABLE_64BIT | RESIZE_ENABLE_64BIT)))
 		return 0;
@@ -334,6 +334,17 @@ static errcode_t move_bg_metadata(ext2_resize_t rfs)
 	if (retval)
 		goto out;
 
+	if (EXT2_HAS_INCOMPAT_FEATURE(rfs->old_fs->super,
+				      EXT2_FEATURE_INCOMPAT_META_BG)) {
+		old_desc_blocks = rfs->old_fs->super->s_first_meta_bg;
+		new_desc_blocks = rfs->new_fs->super->s_first_meta_bg;
+	} else {
+		old_desc_blocks = rfs->old_fs->desc_blocks +
+				rfs->old_fs->super->s_reserved_gdt_blocks;
+		new_desc_blocks = rfs->new_fs->desc_blocks +
+				rfs->new_fs->super->s_reserved_gdt_blocks;
+	}
+
 	/* Construct bitmaps of super/descriptor blocks in old and new fs */
 	for (i = 0; i < rfs->old_fs->group_desc_count; i++) {
 		retval = ext2fs_super_and_bgd_loc2(rfs->old_fs, i, &b, &c, &d,
@@ -341,7 +352,8 @@ static errcode_t move_bg_metadata(ext2_resize_t rfs)
 		if (retval)
 			goto out;
 		ext2fs_mark_block_bitmap2(old_map, b);
-		ext2fs_mark_block_bitmap2(old_map, c);
+		for (j = 0; c != 0 && j < old_desc_blocks; j++)
+			ext2fs_mark_block_bitmap2(old_map, c + j);
 		ext2fs_mark_block_bitmap2(old_map, d);
 
 		retval = ext2fs_super_and_bgd_loc2(rfs->new_fs, i, &b, &c, &d,
@@ -349,45 +361,72 @@ static errcode_t move_bg_metadata(ext2_resize_t rfs)
 		if (retval)
 			goto out;
 		ext2fs_mark_block_bitmap2(new_map, b);
-		ext2fs_mark_block_bitmap2(new_map, c);
+		for (j = 0; c != 0 && j < new_desc_blocks; j++)
+			ext2fs_mark_block_bitmap2(new_map, c + j);
 		ext2fs_mark_block_bitmap2(new_map, d);
 	}
 
+	cluster_ratio = EXT2FS_CLUSTER_RATIO(rfs->new_fs);
+
 	/* Find changes in block allocations for bg metadata */
 	for (b = 0;
 	     b < ext2fs_blocks_count(rfs->new_fs->super);
-	     b += EXT2FS_CLUSTER_RATIO(rfs->new_fs)) {
+	     b += cluster_ratio) {
 		old = ext2fs_test_block_bitmap2(old_map, b);
 		new = ext2fs_test_block_bitmap2(new_map, b);
 
-		if (old && !new)
-			ext2fs_unmark_block_bitmap2(rfs->new_fs->block_map, b);
-		else if (!old && new)
-			; /* empty ext2fs_mark_block_bitmap2(new_map, b); */
-		else
+		if (old && !new) {
+			/* mark old_map, unmark new_map */
+			if (cluster_ratio == 1)
+				ext2fs_unmark_block_bitmap2(
+						rfs->new_fs->block_map, b);
+		} else if (!old && new)
+			; /* unmark old_map, mark new_map */
+		else {
+			ext2fs_unmark_block_bitmap2(old_map, b);
 			ext2fs_unmark_block_bitmap2(new_map, b);
+		}
 	}
-	/* new_map now shows blocks that have been newly allocated. */
 
-	/* Move any conflicting bitmaps and inode tables */
+	/*
+	 * new_map now shows blocks that have been newly allocated.
+	 * old_map now shows blocks that have been newly freed.
+	 */
+
+	/*
+	 * Move any conflicting bitmaps and inode tables.  Ensure that we
+	 * don't try to free clusters associated with bitmaps or tables.
+	 */
 	for (i = 0; i < rfs->old_fs->group_desc_count; i++) {
 		b = ext2fs_block_bitmap_loc(rfs->new_fs, i);
 		if (ext2fs_test_block_bitmap2(new_map, b))
 			ext2fs_block_bitmap_loc_set(rfs->new_fs, i, 0);
+		else if (ext2fs_test_block_bitmap2(old_map, b))
+			ext2fs_unmark_block_bitmap2(old_map, b);
 
 		b = ext2fs_inode_bitmap_loc(rfs->new_fs, i);
 		if (ext2fs_test_block_bitmap2(new_map, b))
 			ext2fs_inode_bitmap_loc_set(rfs->new_fs, i, 0);
+		else if (ext2fs_test_block_bitmap2(old_map, b))
+			ext2fs_unmark_block_bitmap2(old_map, b);
 
 		c = ext2fs_inode_table_loc(rfs->new_fs, i);
-		for (b = 0; b < rfs->new_fs->inode_blocks_per_group; b++) {
-			if (ext2fs_test_block_bitmap2(new_map, b + c)) {
+		for (b = 0;
+		     b < rfs->new_fs->inode_blocks_per_group;
+		     b++) {
+			if (ext2fs_test_block_bitmap2(new_map, b + c))
 				ext2fs_inode_table_loc_set(rfs->new_fs, i, 0);
-				break;
-			}
+			else if (ext2fs_test_block_bitmap2(old_map, b + c))
+				ext2fs_unmark_block_bitmap2(old_map, b + c);
 		}
 	}
 
+	/* Free unused clusters */
+	for (b = 0;
+	     cluster_ratio > 1 && b < ext2fs_blocks_count(rfs->new_fs->super);
+	     b += cluster_ratio)
+		if (ext2fs_test_block_bitmap2(old_map, b))
+			ext2fs_unmark_block_bitmap2(rfs->new_fs->block_map, b);
 out:
 	if (old_map)
 		ext2fs_free_block_bitmap(old_map);



* [PATCH 20/37] resize2fs: adjust reserved_gdt_blocks when changing group descriptor size
  2014-05-01 23:12 [PATCH 00/37] e2fsprogs patchbomb 5/14 Darrick J. Wong
                   ` (18 preceding siblings ...)
  2014-05-01 23:14 ` [PATCH 19/37] resize2fs: when toggling 64bit, don't free in-use bg data clusters Darrick J. Wong
@ 2014-05-01 23:14 ` Darrick J. Wong
  2014-05-01 23:14 ` [PATCH 21/37] libext2fs: have UNIX IO manager use pread/pwrite Darrick J. Wong
                   ` (14 subsequent siblings)
  34 siblings, 0 replies; 91+ messages in thread
From: Darrick J. Wong @ 2014-05-01 23:14 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

Since we're constructing the fantasy that new_fs has always been a
64bit fs, we need to adjust reserved_gdt_blocks when we start resizing
the metadata so that the size of the gdt space in the new fs reflects
the fantasy throughout the resize process.
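
The arithmetic is small enough to show in isolation.  The sketch below restates it as a pure function on plain integers; `toy_adjust_reserved_gdt()` is a hypothetical name, and the resize_inode feature check is left out.

```c
#include <assert.h>

/*
 * Keep (descriptor blocks + reserved GDT blocks) constant: whatever the
 * descriptor table gains, the reserved area gives up, and vice versa.
 * The result is clamped to [0, blocksize / 4].
 */
static int toy_adjust_reserved_gdt(int reserved, int old_desc_blocks,
				   int new_desc_blocks, int blocksize)
{
	int adjusted = reserved + (old_desc_blocks - new_desc_blocks);

	if (adjusted < 0)
		adjusted = 0;
	if (adjusted > blocksize / 4)
		adjusted = blocksize / 4;
	return adjusted;
}
```

For example, with 4096-byte blocks and 256 reserved blocks, growing the descriptor table from 1 block to 2 leaves 255 reserved blocks, so the inode tables that follow do not have to move.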

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 resize/resize2fs.c |   37 ++++++++++++++++++++++++-------------
 1 file changed, 24 insertions(+), 13 deletions(-)


diff --git a/resize/resize2fs.c b/resize/resize2fs.c
index 7eb025e..8227e81 100644
--- a/resize/resize2fs.c
+++ b/resize/resize2fs.c
@@ -251,6 +251,24 @@ errout:
 	return retval;
 }
 
+/* Keep the size of the group descriptor region constant */
+static void adjust_reserved_gdt_blocks(ext2_filsys old_fs, ext2_filsys fs)
+{
+	if ((fs->super->s_feature_compat &
+	     EXT2_FEATURE_COMPAT_RESIZE_INODE) &&
+	    (old_fs->desc_blocks != fs->desc_blocks)) {
+		int new;
+
+		new = ((int) fs->super->s_reserved_gdt_blocks) +
+			(old_fs->desc_blocks - fs->desc_blocks);
+		if (new < 0)
+			new = 0;
+		if (new > (int) fs->blocksize/4)
+			new = fs->blocksize/4;
+		fs->super->s_reserved_gdt_blocks = new;
+	}
+}
+
 /* Toggle 64bit mode */
 static errcode_t resize_group_descriptors(ext2_resize_t rfs, blk64_t new_size)
 {
@@ -310,6 +328,8 @@ static errcode_t resize_group_descriptors(ext2_resize_t rfs, blk64_t new_size)
 	for (i = 0; i < rfs->old_fs->group_desc_count; i++)
 		ext2fs_group_desc_csum_set(rfs->new_fs, i);
 
+	adjust_reserved_gdt_blocks(rfs->old_fs, rfs->new_fs);
+
 	return 0;
 }
 
@@ -787,20 +807,11 @@ retry:
 	 * number of descriptor blocks, then adjust
 	 * s_reserved_gdt_blocks if possible to avoid needing to move
 	 * the inode table either now or in the future.
+	 *
+	 * Note: If we're converting to 64bit mode, we did this earlier.
 	 */
-	if ((fs->super->s_feature_compat &
-	     EXT2_FEATURE_COMPAT_RESIZE_INODE) &&
-	    (old_fs->desc_blocks != fs->desc_blocks)) {
-		int new;


* [PATCH 21/37] libext2fs: have UNIX IO manager use pread/pwrite
  2014-05-01 23:12 [PATCH 00/37] e2fsprogs patchbomb 5/14 Darrick J. Wong
                   ` (19 preceding siblings ...)
  2014-05-01 23:14 ` [PATCH 20/37] resize2fs: adjust reserved_gdt_blocks when changing group descriptor size Darrick J. Wong
@ 2014-05-01 23:14 ` Darrick J. Wong
  2014-08-02 23:16   ` Theodore Ts'o
  2014-05-01 23:14 ` [PATCH 22/37] ext2fs: add readahead method to improve scanning Darrick J. Wong
                   ` (13 subsequent siblings)
  34 siblings, 1 reply; 91+ messages in thread
From: Darrick J. Wong @ 2014-05-01 23:14 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

If pread/pwrite are present, have the UNIX IO manager use them for
aligned IOs (instead of the current seek -> read/write), thereby
saving us a (minor) amount of system call overhead.
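
A minimal sketch of the try-pread-then-fall-back pattern, outside of the IO manager: `toy_read_at()` is a hypothetical helper, it ignores the alignment checks the real code needs, and it reports a short read as EIO purely for the sake of the example.

```c
#define _XOPEN_SOURCE 700	/* for pread() under strict -std= modes */
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read exactly "size" bytes at byte offset "location".  Try a single
 * positioned read first; if it fails or comes up short, fall back to the
 * older seek-then-read sequence.  Returns 0 on success, else an errno
 * value (EIO for a short read).
 */
static int toy_read_at(int fd, void *buf, size_t size, off_t location)
{
	ssize_t actual;

	actual = pread(fd, buf, size, location);
	if (actual >= 0 && (size_t) actual == size)
		return 0;

	if (lseek(fd, location, SEEK_SET) != location)
		return errno ? errno : EIO;
	actual = read(fd, buf, size);
	if (actual < 0)
		return errno;
	return ((size_t) actual == size) ? 0 : EIO;
}
```

On the fast path this is one system call instead of two, and pread() does not disturb the file offset.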

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 configure            |    2 +-
 configure.in         |    2 ++
 lib/config.h.in      |    6 ++++++
 lib/ext2fs/unix_io.c |   24 ++++++++++++++++++++++++
 4 files changed, 33 insertions(+), 1 deletion(-)


diff --git a/configure b/configure
index 6449f59..7b0a0d1 100755
--- a/configure
+++ b/configure
@@ -11155,7 +11155,7 @@ if test "$ac_res" != no; then :
 fi
 
 fi
-for ac_func in  	__secure_getenv 	backtrace 	blkid_probe_get_topology 	chflags 	fadvise64 	fallocate 	fallocate64 	fchown 	fdatasync 	fstat64 	ftruncate64 	futimes 	getcwd 	getdtablesize 	getmntinfo 	getpwuid_r 	getrlimit 	getrusage 	jrand48 	llseek 	lseek64 	mallinfo 	mbstowcs 	memalign 	mempcpy 	mmap 	msync 	nanosleep 	open64 	pathconf 	posix_fadvise 	posix_fadvise64 	posix_memalign 	prctl 	secure_getenv 	setmntent 	setresgid 	setresuid 	srandom 	stpcpy 	strcasecmp 	strdup 	strnlen 	strptime 	strtoull 	sync_file_range 	sysconf 	usleep 	utime 	valloc
+for ac_func in  	__secure_getenv 	backtrace 	blkid_probe_get_topology 	chflags 	fadvise64 	fallocate 	fallocate64 	fchown 	fdatasync 	fstat64 	ftruncate64 	futimes 	getcwd 	getdtablesize 	getmntinfo 	getpwuid_r 	getrlimit 	getrusage 	jrand48 	llseek 	lseek64 	mallinfo 	mbstowcs 	memalign 	mempcpy 	mmap 	msync 	nanosleep 	open64 	pathconf 	posix_fadvise 	posix_fadvise64 	posix_memalign 	prctl 	pread 	pwrite 	secure_getenv 	setmntent 	setresgid 	setresuid 	srandom 	stpcpy 	strcasecmp 	strdup 	strnlen 	strptime 	strtoull 	sync_file_range 	sysconf 	usleep 	utime 	valloc
 do :
   as_ac_var=`$as_echo "ac_cv_func_$ac_func" | $as_tr_sh`
 ac_fn_c_check_func "$LINENO" "$ac_func" "$as_ac_var"
diff --git a/configure.in b/configure.in
index 8a033b0..f28bd46 100644
--- a/configure.in
+++ b/configure.in
@@ -1135,6 +1135,8 @@ AC_CHECK_FUNCS(m4_flatten([
 	posix_fadvise64
 	posix_memalign
 	prctl
+	pread
+	pwrite
 	secure_getenv
 	setmntent
 	setresgid
diff --git a/lib/config.h.in b/lib/config.h.in
index 12ac1e0..e0384ee 100644
--- a/lib/config.h.in
+++ b/lib/config.h.in
@@ -311,9 +311,15 @@
 /* Define to 1 if you have the `prctl' function. */
 #undef HAVE_PRCTL
 
+/* Define to 1 if you have the `pread' function. */
+#undef HAVE_PREAD
+
 /* Define to 1 if you have the `putenv' function. */
 #undef HAVE_PUTENV
 
+/* Define to 1 if you have the `pwrite' function. */
+#undef HAVE_PWRITE
+
 /* Define to 1 if dirent has d_reclen */
 #undef HAVE_RECLEN_DIRENT
 
diff --git a/lib/ext2fs/unix_io.c b/lib/ext2fs/unix_io.c
index c3185b6..a818c13 100644
--- a/lib/ext2fs/unix_io.c
+++ b/lib/ext2fs/unix_io.c
@@ -130,6 +130,18 @@ static errcode_t raw_read_blk(io_channel channel,
 	size = (count < 0) ? -count : count * channel->block_size;
 	data->io_stats.bytes_read += size;
 	location = ((ext2_loff_t) block * channel->block_size) + data->offset;
+
+#ifdef HAVE_PREAD
+	/* Try an aligned pread */
+	if ((channel->align == 0) ||
+	    (IS_ALIGNED(buf, channel->align) &&
+	     IS_ALIGNED(size, channel->align))) {
+		actual = pread(data->dev, buf, size, location);
+		if (actual == size)
+			return 0;
+	}
+#endif /* HAVE_PREAD */
+
 	if (ext2fs_llseek(data->dev, location, SEEK_SET) != location) {
 		retval = errno ? errno : EXT2_ET_LLSEEK_FAILED;
 		goto error_out;
@@ -200,6 +212,18 @@ static errcode_t raw_write_blk(io_channel channel,
 	data->io_stats.bytes_written += size;
 
 	location = ((ext2_loff_t) block * channel->block_size) + data->offset;
+
+#ifdef HAVE_PWRITE
+	/* Try an aligned pwrite */
+	if ((channel->align == 0) ||
+	    (IS_ALIGNED(buf, channel->align) &&
+	     IS_ALIGNED(size, channel->align))) {
+		actual = pwrite(data->dev, buf, size, location);
+		if (actual == size)
+			return 0;
+	}
+#endif /* HAVE_PWRITE */
+
 	if (ext2fs_llseek(data->dev, location, SEEK_SET) != location) {
 		retval = errno ? errno : EXT2_ET_LLSEEK_FAILED;
 		goto error_out;



* [PATCH 22/37] ext2fs: add readahead method to improve scanning
  2014-05-01 23:12 [PATCH 00/37] e2fsprogs patchbomb 5/14 Darrick J. Wong
                   ` (20 preceding siblings ...)
  2014-05-01 23:14 ` [PATCH 21/37] libext2fs: have UNIX IO manager use pread/pwrite Darrick J. Wong
@ 2014-05-01 23:14 ` Darrick J. Wong
  2014-05-01 23:14 ` [PATCH 23/37] e2fsck: provide routines to read-ahead metadata Darrick J. Wong
                   ` (12 subsequent siblings)
  34 siblings, 0 replies; 91+ messages in thread
From: Darrick J. Wong @ 2014-05-01 23:14 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4, Andreas Dilger

From: Andreas Dilger <adilger@whamcloud.com>

Add a readahead method for prefetching ranges of disk blocks.  This is
useful for inode table scanning, and other large contiguous ranges of
blocks, and may also prove useful for random block prefetch, since it
will allow reordering of the IO without waiting synchronously for the
reads to complete.

It currently uses the posix_fadvise(POSIX_FADV_WILLNEED)
interface, as this proved most efficient during our testing.

[darrick.wong@oracle.com]
Add a cache_release method for advising the pagecache to discard disk
cache blocks.  Make the arguments to the readahead function take the
same ULL values as the other IO functions, and return an appropriate
error code when fadvise isn't available.
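
As a rough illustration of the two hints described above, here is a minimal
standalone sketch.  The helper name advise_blocks() and its parameters are
hypothetical and are not part of libext2fs; only posix_fadvise() and the
POSIX_FADV_* constants are real.  It assumes block numbers are converted to
byte offsets the same way the patch does it.

```c
#define _XOPEN_SOURCE 600
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not part of libext2fs): hint the kernel about a
 * run of blocks on fd.  willneed != 0 asks for prefetch, willneed == 0
 * asks the page cache to drop the range.  Returns 0 or an errno value.
 */
static int advise_blocks(int fd, unsigned long long block,
			 unsigned long long count, unsigned int block_size,
			 int willneed)
{
#if defined(POSIX_FADV_WILLNEED) && defined(POSIX_FADV_DONTNEED)
	off_t offset = (off_t)block * block_size;
	off_t len = (off_t)count * block_size;

	/* posix_fadvise() returns the error number; it does not set errno. */
	return posix_fadvise(fd, offset, len,
			     willneed ? POSIX_FADV_WILLNEED :
					POSIX_FADV_DONTNEED);
#else
	/* No fadvise on this platform: report "not supported". */
	return ENOTSUP;
#endif
}
```

A caller would treat any nonzero return as advisory only, since a failed
hint never affects the correctness of later reads.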

Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 lib/ext2fs/ext2_io.h    |   12 ++++++++++++
 lib/ext2fs/io_manager.c |   18 ++++++++++++++++++
 lib/ext2fs/unix_io.c    |   46 +++++++++++++++++++++++++++++++++++++++++++---
 3 files changed, 73 insertions(+), 3 deletions(-)


diff --git a/lib/ext2fs/ext2_io.h b/lib/ext2fs/ext2_io.h
index 1894fb8..636f797 100644
--- a/lib/ext2fs/ext2_io.h
+++ b/lib/ext2fs/ext2_io.h
@@ -90,6 +90,12 @@ struct struct_io_manager {
 					int count, const void *data);
 	errcode_t (*discard)(io_channel channel, unsigned long long block,
 			     unsigned long long count);
+	errcode_t (*cache_readahead)(io_channel channel,
+				     unsigned long long block,
+				     unsigned long long count);
+	errcode_t (*cache_release)(io_channel channel,
+				   unsigned long long block,
+				   unsigned long long count);
 	long	reserved[16];
 };
 
@@ -124,6 +130,12 @@ extern errcode_t io_channel_discard(io_channel channel,
 				    unsigned long long count);
 extern errcode_t io_channel_alloc_buf(io_channel channel,
 				      int count, void *ptr);
+extern errcode_t io_channel_cache_readahead(io_channel io,
+					    unsigned long long block,
+					    unsigned long long count);
+extern errcode_t io_channel_cache_release(io_channel io,
+					  unsigned long long block,
+					  unsigned long long count);
 
 /* unix_io.c */
 extern io_manager unix_io_manager;
diff --git a/lib/ext2fs/io_manager.c b/lib/ext2fs/io_manager.c
index 34e4859..a1258c4 100644
--- a/lib/ext2fs/io_manager.c
+++ b/lib/ext2fs/io_manager.c
@@ -128,3 +128,21 @@ errcode_t io_channel_alloc_buf(io_channel io, int count, void *ptr)
 	else
 		return ext2fs_get_mem(size, ptr);
 }
+
+errcode_t io_channel_cache_readahead(io_channel io, unsigned long long block,
+				     unsigned long long count)
+{
+	if (!io->manager->cache_readahead)
+		return EXT2_ET_OP_NOT_SUPPORTED;
+
+	return io->manager->cache_readahead(io, block, count);
+}
+
+errcode_t io_channel_cache_release(io_channel io, unsigned long long block,
+				   unsigned long long count)
+{
+	if (!io->manager->cache_release)
+		return EXT2_ET_OP_NOT_SUPPORTED;
+
+	return io->manager->cache_release(io, block, count);
+}
diff --git a/lib/ext2fs/unix_io.c b/lib/ext2fs/unix_io.c
index a818c13..a95e289 100644
--- a/lib/ext2fs/unix_io.c
+++ b/lib/ext2fs/unix_io.c
@@ -15,6 +15,9 @@
  * %End-Header%
  */
 
+#define _XOPEN_SOURCE 600
+#define _DARWIN_C_SOURCE
+#define _FILE_OFFSET_BITS 64
 #define _LARGEFILE_SOURCE
 #define _LARGEFILE64_SOURCE
 #ifndef _GNU_SOURCE
@@ -35,6 +38,9 @@
 #ifdef __linux__
 #include <sys/utsname.h>
 #endif
+#if HAVE_SYS_TYPES_H
+#include <sys/types.h>
+#endif
 #ifdef HAVE_SYS_IOCTL_H
 #include <sys/ioctl.h>
 #endif
@@ -44,9 +50,6 @@
 #if HAVE_SYS_STAT_H
 #include <sys/stat.h>
 #endif
-#if HAVE_SYS_TYPES_H
-#include <sys/types.h>
-#endif
 #if HAVE_SYS_RESOURCE_H
 #include <sys/resource.h>
 #endif
@@ -97,6 +100,7 @@ struct unix_private_data {
 #define IS_ALIGNED(n, align) ((((unsigned long) n) & \
 			       ((unsigned long) ((align)-1))) == 0)
 
+
 static errcode_t unix_get_stats(io_channel channel, io_stats *stats)
 {
 	errcode_t	retval = 0;
@@ -810,6 +814,40 @@ static errcode_t unix_write_blk64(io_channel channel, unsigned long long block,
 #endif /* NO_IO_CACHE */
 }
 
+static errcode_t unix_cache_readahead(io_channel channel,
+				      unsigned long long block,
+				      unsigned long long count)
+{
+#ifdef POSIX_FADV_WILLNEED
+	struct unix_private_data *data;
+
+	data = (struct unix_private_data *)channel->private_data;
+	return posix_fadvise(data->dev,
+			     (ext2_loff_t)block * channel->block_size,
+			     (ext2_loff_t)count * channel->block_size,
+			     POSIX_FADV_WILLNEED);
+#else
+	return EXT2_ET_OP_NOT_SUPPORTED;
+#endif
+}
+
+static errcode_t unix_cache_release(io_channel channel,
+				    unsigned long long block,
+				    unsigned long long count)
+{
+#ifdef POSIX_FADV_DONTNEED
+	struct unix_private_data *data;
+
+	data = (struct unix_private_data *)channel->private_data;
+	return posix_fadvise(data->dev,
+			     (ext2_loff_t)block * channel->block_size,
+			     (ext2_loff_t)count * channel->block_size,
+			     POSIX_FADV_DONTNEED);
+#else
+	return EXT2_ET_OP_NOT_SUPPORTED;
+#endif
+}
+
 static errcode_t unix_write_blk(io_channel channel, unsigned long block,
 				int count, const void *buf)
 {
@@ -961,6 +999,8 @@ static struct struct_io_manager struct_unix_manager = {
 	unix_read_blk64,
 	unix_write_blk64,
 	unix_discard,
+	unix_cache_readahead,
+	unix_cache_release,
 };
 
 io_manager unix_io_manager = &struct_unix_manager;



* [PATCH 23/37] e2fsck: provide routines to read-ahead metadata
  2014-05-01 23:12 [PATCH 00/37] e2fsprogs patchbomb 5/14 Darrick J. Wong
                   ` (21 preceding siblings ...)
  2014-05-01 23:14 ` [PATCH 22/37] ext2fs: add readahead method to improve scanning Darrick J. Wong
@ 2014-05-01 23:14 ` Darrick J. Wong
  2014-05-01 23:14 ` [PATCH 24/37] e2fsck: read-ahead metadata during passes 1, 2, and 4 Darrick J. Wong
                   ` (11 subsequent siblings)
  34 siblings, 0 replies; 91+ messages in thread
From: Darrick J. Wong @ 2014-05-01 23:14 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

This patch adds to e2fsck the ability to pre-fetch metadata into the
page cache in the hopes of speeding up fsck runs.  There are two new
functions -- the first allows a caller to readahead a list of blocks,
and the second is a helper function that uses that first mechanism to
load group data (bitmaps, inode tables).

e2fsck will employ both of these methods to speed itself up.
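
The core idea behind the block-list function is to sort the blocks and
merge neighbours into contiguous runs, so that each run becomes a single
readahead request.  The sketch below shows that idea in isolation; it is a
simplification that works on a plain array of block numbers rather than an
ext2_dblist, and the names blk_run, cmp_blk and coalesce_runs are
hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

/* One contiguous run of blocks: [start, start + len). */
struct blk_run {
	unsigned long long start;
	unsigned long long len;
};

/* Three-way compare that cannot overflow or truncate. */
static int cmp_blk(const void *a, const void *b)
{
	unsigned long long x = *(const unsigned long long *)a;
	unsigned long long y = *(const unsigned long long *)b;

	return (x > y) - (x < y);
}

/*
 * Sort blocks[] in place and merge adjacent block numbers into runs.
 * Duplicate block numbers are skipped.  Stops once max_runs runs have
 * been written; returns the number of runs stored in runs[].
 */
static size_t coalesce_runs(unsigned long long *blocks, size_t n,
			    struct blk_run *runs, size_t max_runs)
{
	size_t i, nruns = 0;

	qsort(blocks, n, sizeof(*blocks), cmp_blk);
	for (i = 0; i < n; i++) {
		if (nruns) {
			struct blk_run *last = &runs[nruns - 1];

			if (blocks[i] < last->start + last->len)
				continue;	/* already covered */
			if (blocks[i] == last->start + last->len) {
				last->len++;	/* extends the run */
				continue;
			}
		}
		if (nruns == max_runs)
			break;
		runs[nruns].start = blocks[i];
		runs[nruns].len = 1;
		nruns++;
	}
	return nruns;
}
```

Each resulting run would then be handed to the readahead call as one
(start, length) pair.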

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 e2fsck/Makefile.in |    8 ++
 e2fsck/e2fsck.h    |   12 +++
 e2fsck/readahead.c |  187 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 205 insertions(+), 2 deletions(-)
 create mode 100644 e2fsck/readahead.c


diff --git a/e2fsck/Makefile.in b/e2fsck/Makefile.in
index 5a6883a..2e08982 100644
--- a/e2fsck/Makefile.in
+++ b/e2fsck/Makefile.in
@@ -71,7 +71,7 @@ OBJS= dict.o unix.o e2fsck.o super.o pass1.o pass1b.o pass2.o \
 	pass3.o pass4.o pass5.o journal.o badblocks.o util.o dirinfo.o \
 	dx_dirinfo.o ehandler.o problem.o message.o quota.o recovery.o \
 	region.o revoke.o ea_refcount.o rehash.o profile.o prof_err.o \
-	logfile.o sigcatcher.o $(MTRACE_OBJ)
+	logfile.o sigcatcher.o readahead.o $(MTRACE_OBJ)
 
 PROFILED_OBJS= profiled/dict.o profiled/unix.o profiled/e2fsck.o \
 	profiled/super.o profiled/pass1.o profiled/pass1b.o \
@@ -82,7 +82,7 @@ PROFILED_OBJS= profiled/dict.o profiled/unix.o profiled/e2fsck.o \
 	profiled/recovery.o profiled/region.o profiled/revoke.o \
 	profiled/ea_refcount.o profiled/rehash.o profiled/profile.o \
 	profiled/prof_err.o profiled/logfile.o \
-	profiled/sigcatcher.o
+	profiled/sigcatcher.o profiled/readahead.o
 
 SRCS= $(srcdir)/e2fsck.c \
 	$(srcdir)/dict.c \
@@ -106,6 +106,7 @@ SRCS= $(srcdir)/e2fsck.c \
 	$(srcdir)/message.c \
 	$(srcdir)/ea_refcount.c \
 	$(srcdir)/rehash.c \
+	$(srcdir)/readahead.c \
 	$(srcdir)/region.c \
 	$(srcdir)/profile.c \
 	$(srcdir)/sigcatcher.c \
@@ -550,3 +551,6 @@ quota.o: $(srcdir)/quota.c $(top_builddir)/lib/config.h \
  $(top_srcdir)/lib/quota/quotaio.h $(top_srcdir)/lib/quota/dqblk_v2.h \
  $(top_srcdir)/lib/quota/quotaio_tree.h $(top_srcdir)/lib/../e2fsck/dict.h \
  $(srcdir)/problem.h $(top_srcdir)/lib/quota/quotaio.h
+readahead.o: $(srcdir)/readahead.c $(top_builddir)/lib/config.h \
+ $(top_srcdir)/lib/ext2fs/ext2fs.h $(top_srcdir)/lib/ext2fs/ext2_fs.h \
+ $(top_builddir)/lib/ext2fs/ext2_err.h $(srcdir)/e2fsck.h
diff --git a/e2fsck/e2fsck.h b/e2fsck/e2fsck.h
index d7a7be9..c739329 100644
--- a/e2fsck/e2fsck.h
+++ b/e2fsck/e2fsck.h
@@ -484,6 +484,18 @@ extern ext2_ino_t e2fsck_get_lost_and_found(e2fsck_t ctx, int fix);
 extern errcode_t e2fsck_adjust_inode_count(e2fsck_t ctx, ext2_ino_t ino,
 					   int adj);
 
+/* readahead.c */
+#define E2FSCK_READA_SUPER	(0x01)
+#define E2FSCK_READA_GDT	(0x02)
+#define E2FSCK_READA_BBITMAP	(0x04)
+#define E2FSCK_READA_IBITMAP	(0x08)
+#define E2FSCK_READA_ITABLE	(0x10)
+#define E2FSCK_READA_ALL_FLAGS	(0x1F)
+errcode_t e2fsck_readahead(ext2_filsys fs, int flags, dgrp_t start,
+			   dgrp_t ngroups);
+errcode_t e2fsck_readahead_dblist(ext2_filsys fs, int flags,
+				  ext2_dblist dblist);
+int e2fsck_can_readahead(ext2_filsys fs);
 
 /* region.c */
 extern region_t region_create(region_addr_t min, region_addr_t max);
diff --git a/e2fsck/readahead.c b/e2fsck/readahead.c
new file mode 100644
index 0000000..79608af
--- /dev/null
+++ b/e2fsck/readahead.c
@@ -0,0 +1,187 @@
+/*
+ * readahead.c -- Prefetch filesystem metadata to speed up fsck.
+ *
+ * Copyright (C) 2014 Oracle.
+ *
+ * %Begin-Header%
+ * This file may be redistributed under the terms of the GNU Library
+ * General Public License, version 2.
+ * %End-Header%
+ */
+
+#include "config.h"
+#include <string.h>
+
+#include "e2fsck.h"
+
+#undef DEBUG
+
+#ifdef DEBUG
+# define dbg_printf(f, a...)  do {printf(f, ## a); fflush(stdout); } while (0)
+#else
+# define dbg_printf(f, a...)
+#endif
+
+struct read_dblist {
+	errcode_t err;
+	blk64_t run_start;
+	blk64_t run_len;
+};
+
+static EXT2_QSORT_TYPE readahead_dir_block_cmp(const void *a, const void *b)
+{
+	const struct ext2_db_entry2 *db_a =
+		(const struct ext2_db_entry2 *) a;
+	const struct ext2_db_entry2 *db_b =
+		(const struct ext2_db_entry2 *) b;
+
+	return (db_a->blk > db_b->blk) - (db_a->blk < db_b->blk);
+}
+
+static int readahead_dir_block(ext2_filsys fs, struct ext2_db_entry2 *db,
+			       void *priv_data)
+{
+	errcode_t err = 0;
+	struct read_dblist *pr = priv_data;
+
+	if (!pr->run_len || db->blk != pr->run_start + pr->run_len) {
+		if (pr->run_len) {
+			pr->err = io_channel_cache_readahead(fs->io,
+							     pr->run_start,
+							     pr->run_len);
+			dbg_printf("readahead start=%llu len=%llu err=%d\n",
+				   pr->run_start, pr->run_len,
+				   (int)pr->err);
+		}
+		pr->run_start = db->blk;
+		pr->run_len = 0;
+	}
+	pr->run_len += db->blockcnt;
+
+	return pr->err ? DBLIST_ABORT : 0;
+}
+
+errcode_t e2fsck_readahead_dblist(ext2_filsys fs, int flags,
+				  ext2_dblist dblist)
+{
+	errcode_t err;
+	struct read_dblist pr;
+
+	dbg_printf("%s: flags=0x%x\n", __func__, flags);
+	if (flags)
+		return EXT2_ET_INVALID_ARGUMENT;
+
+	ext2fs_dblist_sort2(dblist, readahead_dir_block_cmp);
+
+	memset(&pr, 0, sizeof(pr));
+	err = ext2fs_dblist_iterate2(dblist, readahead_dir_block, &pr);
+	if (pr.err)
+		return pr.err;
+	if (err)
+		return err;
+
+	if (pr.run_len)
+		err = io_channel_cache_readahead(fs->io, pr.run_start,
+						 pr.run_len);
+
+	return err;
+}
+
+errcode_t e2fsck_readahead(ext2_filsys fs, int flags, dgrp_t start,
+			   dgrp_t ngroups)
+{
+	blk64_t		super, old_gdt, new_gdt;
+	blk_t		blocks;
+	dgrp_t		i;
+	ext2_dblist	dblist;
+	dgrp_t		end = start + ngroups;
+	errcode_t	err = 0;
+
+	dbg_printf("%s: flags=0x%x start=%d groups=%d\n", __func__, flags,
+		   start, ngroups);
+	if (flags & ~E2FSCK_READA_ALL_FLAGS)
+		return EXT2_ET_INVALID_ARGUMENT;
+
+	if (end > fs->group_desc_count)
+		end = fs->group_desc_count;
+
+	if (flags == 0)
+		return 0;
+
+	err = ext2fs_init_dblist(fs, &dblist);
+	if (err)
+		return err;
+
+	for (i = start; i < end; i++) {
+		err = ext2fs_super_and_bgd_loc2(fs, i, &super, &old_gdt,
+						&new_gdt, &blocks);
+		if (err)
+			break;
+
+		if (flags & E2FSCK_READA_SUPER) {
+			err = ext2fs_add_dir_block2(dblist, 0, super, 0);
+			if (err)
+				break;
+		}
+
+		if (flags & E2FSCK_READA_GDT) {
+			if (old_gdt)
+				err = ext2fs_add_dir_block2(dblist, 0, old_gdt,
+							    blocks);
+			else if (new_gdt)
+				err = ext2fs_add_dir_block2(dblist, 0, new_gdt,
+							    blocks);
+			else
+				err = 0;
+			if (err)
+				break;
+		}
+
+		if ((flags & E2FSCK_READA_BBITMAP) &&
+		    !ext2fs_bg_flags_test(fs, i, EXT2_BG_BLOCK_UNINIT) &&
+		    ext2fs_bg_free_blocks_count(fs, i) <
+				fs->super->s_blocks_per_group) {
+			super = ext2fs_block_bitmap_loc(fs, i);
+			err = ext2fs_add_dir_block2(dblist, 0, super, 1);
+			if (err)
+				break;
+		}
+
+		if ((flags & E2FSCK_READA_IBITMAP) &&
+		    !ext2fs_bg_flags_test(fs, i, EXT2_BG_INODE_UNINIT) &&
+		    ext2fs_bg_free_inodes_count(fs, i) <
+				fs->super->s_inodes_per_group) {
+			super = ext2fs_inode_bitmap_loc(fs, i);
+			err = ext2fs_add_dir_block2(dblist, 0, super, 1);
+			if (err)
+				break;
+		}
+
+		if ((flags & E2FSCK_READA_ITABLE) &&
+		    ext2fs_bg_free_inodes_count(fs, i) <
+				fs->super->s_inodes_per_group) {
+			super = ext2fs_inode_table_loc(fs, i);
+			blocks = fs->inode_blocks_per_group -
+				 (ext2fs_bg_itable_unused(fs, i) *
+				  EXT2_INODE_SIZE(fs->super) / fs->blocksize);
+			err = ext2fs_add_dir_block2(dblist, 0, super, blocks);
+			if (err)
+				break;
+		}
+	}
+
+	if (!err)
+		err = e2fsck_readahead_dblist(fs, 0, dblist);
+
+	ext2fs_free_dblist(dblist);
+	return err;
+}
+
+int e2fsck_can_readahead(ext2_filsys fs)
+{
+	errcode_t err;
+
+	err = io_channel_cache_readahead(fs->io, 0, 1);
+	dbg_printf("%s: supp=%d\n", __func__, err != EXT2_ET_OP_NOT_SUPPORTED);
+	return err != EXT2_ET_OP_NOT_SUPPORTED;
+}



* [PATCH 24/37] e2fsck: read-ahead metadata during passes 1, 2, and 4
  2014-05-01 23:12 [PATCH 00/37] e2fsprogs patchbomb 5/14 Darrick J. Wong
                   ` (22 preceding siblings ...)
  2014-05-01 23:14 ` [PATCH 23/37] e2fsck: provide routines to read-ahead metadata Darrick J. Wong
@ 2014-05-01 23:14 ` Darrick J. Wong
  2014-07-28 22:25   ` Darrick J. Wong
  2014-05-01 23:15 ` [PATCH 25/37] libext2fs: when appending to a file, don't split an index block in equal halves Darrick J. Wong
                   ` (10 subsequent siblings)
  34 siblings, 1 reply; 91+ messages in thread
From: Darrick J. Wong @ 2014-05-01 23:14 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

e2fsck pass1 is modified to use the block group data prefetch function
to try to fetch the inode tables into the pagecache before they are
needed.  In order to avoid cache thrashing, we limit ourselves to
prefetching at most half the available memory.
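
To make the sizing concrete, the following sketch mirrors the arithmetic in
the pass1.c hunk of this patch: divide the memory budget by the size of one
group's inode table, cap the result at the number of groups, and give up on
readahead when fewer than 16 groups fit.  The function name ra_group_count
and its parameter names are hypothetical; the threshold of 16 is taken from
the diff.

```c
/*
 * Hypothetical helper: how many block groups' inode tables fit in a
 * readahead budget of mem_kb KiB.  Returns 0 when readahead is not
 * worthwhile (no budget, or fewer than 16 groups would fit).
 */
static unsigned long long ra_group_count(unsigned long long mem_kb,
					 unsigned int inode_blocks_per_group,
					 unsigned int blocksize,
					 unsigned long long group_count)
{
	/* Size of one group's inode table in KiB. */
	unsigned long long itable_kb =
		(unsigned long long)inode_blocks_per_group * blocksize / 1024;
	unsigned long long groups;

	if (!mem_kb || !itable_kb)
		return 0;

	groups = mem_kb / itable_kb;
	if (groups > group_count)
		groups = group_count;
	if (groups < 16)
		groups = 0;
	return groups;
}
```

For example, with 512 inode table blocks per group and 4 KiB blocks, each
group needs 2 MiB, so a 1 GiB budget covers 512 groups.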

pass2 is modified to use the dirblock prefetching function to prefetch
the list of directory blocks that are assembled in pass1.  So long as
we don't anticipate rehashing the dirs (pass 3a), we can release the
dirblocks as soon as we're done checking them.

pass4 is modified to prefetch the block and inode bitmaps in
anticipation of pass 5, because pass4 is entirely CPU bound.

In general, these mechanisms can halve fsck time, if the host system
has sufficient memory and the storage system can provide a lot of
IOPS.  SSDs and multi-spindle RAIDs see the most speedup; single disks
experience a modest speedup, and single-spindle USB mass storage
devices see hardly any benefit.

By default, readahead will try to fill half the physical memory in the
system.  The -E readahead_mem_kb= option specifies the amount of
memory to use for readahead; setting it to zero disables readahead
entirely.  The same limit can also be set in e2fsck.conf.
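
One way such a default budget could be derived is sketched below.  This is
an assumption for illustration only, not the patch's get_memory_size()
implementation: it relies on sysconf(_SC_PHYS_PAGES), which is a common
extension rather than strict POSIX, and the name default_readahead_kb is
hypothetical.

```c
#include <unistd.h>

/*
 * Hypothetical helper: return pct percent of physical memory in KiB,
 * or 0 if the system does not report its memory size.
 */
static unsigned long long default_readahead_kb(unsigned int pct)
{
	long pages = sysconf(_SC_PHYS_PAGES);	/* extension, may fail */
	long page_size = sysconf(_SC_PAGESIZE);

	if (pages <= 0 || page_size <= 0)
		return 0;

	/* Convert to KiB first to keep the intermediate values small. */
	return (unsigned long long)pages * page_size / 1024 * pct / 100;
}
```

Calling default_readahead_kb(50) would correspond to the "half the
physical memory" default described above.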

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 MCONFIG.in              |    1 
 configure               |   49 +++++++++++++++++
 configure.in            |    6 ++
 e2fsck/Makefile.in      |    4 +
 e2fsck/e2fsck.8.in      |    5 ++
 e2fsck/e2fsck.c         |  136 +++++++++++++++++++++++++++++++++++++++++++++++
 e2fsck/e2fsck.conf.5.in |   13 ++++
 e2fsck/e2fsck.h         |   25 +++++++++
 e2fsck/pass1.c          |  106 ++++++++++++++++++++++++++++++++++++-
 e2fsck/pass2.c          |   95 ++++++++++++++++++++++++++++++++-
 e2fsck/pass4.c          |   22 ++++++++
 e2fsck/prof_err.et      |    1 
 e2fsck/rehash.c         |   10 +++
 e2fsck/unix.c           |   35 ++++++++++++
 e2fsck/util.c           |   51 ++++++++++++++++++
 lib/config.h.in         |    9 +++
 16 files changed, 563 insertions(+), 5 deletions(-)


diff --git a/MCONFIG.in b/MCONFIG.in
index 7e520be..352c133 100644
--- a/MCONFIG.in
+++ b/MCONFIG.in
@@ -116,6 +116,7 @@ LIBUUID = @LIBUUID@ @SOCKET_LIB@
 LIBQUOTA = @STATIC_LIBQUOTA@
 LIBBLKID = @LIBBLKID@ @PRIVATE_LIBS_CMT@ $(LIBUUID)
 LIBINTL = @LIBINTL@
+LIBPTHREADS = @PTHREADS_LIB@
 SYSLIBS = @LIBS@
 DEPLIBSS = $(LIB)/libss@LIB_EXT@
 DEPLIBCOM_ERR = $(LIB)/libcom_err@LIB_EXT@
diff --git a/configure b/configure
index 7b0a0d1..5b89229 100755
--- a/configure
+++ b/configure
@@ -639,6 +639,7 @@ CYGWIN_CMT
 LINUX_CMT
 UNI_DIFF_OPTS
 SEM_INIT_LIB
+PTHREADS_LIB
 SOCKET_LIB
 SIZEOF_OFF_T
 SIZEOF_LONG_LONG
@@ -10474,7 +10475,7 @@ fi
 done
 
 fi
-for ac_header in  	dirent.h 	errno.h 	execinfo.h 	getopt.h 	malloc.h 	mntent.h 	paths.h 	semaphore.h 	setjmp.h 	signal.h 	stdarg.h 	stdint.h 	stdlib.h 	termios.h 	termio.h 	unistd.h 	utime.h 	linux/falloc.h 	linux/fd.h 	linux/major.h 	linux/loop.h 	net/if_dl.h 	netinet/in.h 	sys/disklabel.h 	sys/file.h 	sys/ioctl.h 	sys/mkdev.h 	sys/mman.h 	sys/prctl.h 	sys/queue.h 	sys/resource.h 	sys/select.h 	sys/socket.h 	sys/sockio.h 	sys/stat.h 	sys/syscall.h 	sys/sysmacros.h 	sys/time.h 	sys/types.h 	sys/un.h 	sys/wait.h
+for ac_header in  	dirent.h 	errno.h 	execinfo.h 	getopt.h 	malloc.h 	mntent.h 	paths.h 	semaphore.h 	setjmp.h 	signal.h 	stdarg.h 	stdint.h 	stdlib.h 	termios.h 	termio.h 	unistd.h 	utime.h 	linux/falloc.h 	linux/fd.h 	linux/major.h 	linux/loop.h 	net/if_dl.h 	netinet/in.h 	sys/disklabel.h 	sys/file.h 	sys/ioctl.h 	sys/mkdev.h 	sys/mman.h 	sys/prctl.h 	sys/queue.h 	sys/resource.h 	sys/select.h 	sys/socket.h 	sys/sockio.h 	sys/stat.h 	sys/syscall.h 	sys/sysctl.h 	sys/sysmacros.h 	sys/time.h 	sys/types.h 	sys/un.h 	sys/wait.h
 do :
   as_ac_Header=`$as_echo "ac_cv_header_$ac_header" | $as_tr_sh`
 ac_fn_c_check_header_mongrel "$LINENO" "$ac_header" "$as_ac_Header" "$ac_includes_default"
@@ -11235,6 +11236,52 @@ if test $ac_cv_have_optreset = yes; then
 $as_echo "#define HAVE_OPTRESET 1" >>confdefs.h
 
 fi
+PTHREADS_LIB='-lpthread'
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for pthread_create in -lpthread" >&5
+$as_echo_n "checking for pthread_create in -lpthread... " >&6; }
+if ${ac_cv_lib_pthread_pthread_create+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+  ac_check_lib_save_LIBS=$LIBS
+LIBS="-lpthread  $LIBS"
+cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+
+/* Override any GCC internal prototype to avoid an error.
+   Use char because int might match the return type of a GCC
+   builtin and then its argument prototype would still apply.  */
+#ifdef __cplusplus
+extern "C"
+#endif
+char pthread_create ();
+int
+main ()
+{
+return pthread_create ();
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_link "$LINENO"; then :
+  ac_cv_lib_pthread_pthread_create=yes
+else
+  ac_cv_lib_pthread_pthread_create=no
+fi
+rm -f core conftest.err conftest.$ac_objext \
+    conftest$ac_exeext conftest.$ac_ext
+LIBS=$ac_check_lib_save_LIBS
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_lib_pthread_pthread_create" >&5
+$as_echo "$ac_cv_lib_pthread_pthread_create" >&6; }
+if test "x$ac_cv_lib_pthread_pthread_create" = xyes; then :
+  cat >>confdefs.h <<_ACEOF
+#define HAVE_LIBPTHREAD 1
+_ACEOF
+
+  LIBS="-lpthread $LIBS"
+
+fi
+
 
 SEM_INIT_LIB=''
 ac_fn_c_check_func "$LINENO" "sem_init" "ac_cv_func_sem_init"
diff --git a/configure.in b/configure.in
index f28bd46..d2cfe41 100644
--- a/configure.in
+++ b/configure.in
@@ -961,6 +961,7 @@ AC_CHECK_HEADERS(m4_flatten([
 	sys/sockio.h
 	sys/stat.h
 	sys/syscall.h
+	sys/sysctl.h
 	sys/sysmacros.h
 	sys/time.h
 	sys/types.h
@@ -1173,6 +1174,11 @@ if test $ac_cv_have_optreset = yes; then
   AC_DEFINE(HAVE_OPTRESET, 1, [Define to 1 if optreset for getopt is present])
 fi
 dnl
+dnl Test for pthread_create in -lpthread
+dnl
+PTHREADS_LIB='-lpthread'
+AC_CHECK_LIB(pthread, pthread_create, AC_SUBST(PTHREADS_LIB))
+dnl
 dnl Test for sem_init, and which library it might require:
 dnl
 AH_TEMPLATE([HAVE_SEM_INIT], [Define to 1 if sem_init() exists])
diff --git a/e2fsck/Makefile.in b/e2fsck/Makefile.in
index 2e08982..548df9c 100644
--- a/e2fsck/Makefile.in
+++ b/e2fsck/Makefile.in
@@ -16,13 +16,13 @@ MANPAGES=	e2fsck.8
 FMANPAGES=	e2fsck.conf.5
 
 LIBS= $(LIBQUOTA) $(LIBEXT2FS) $(LIBCOM_ERR) $(LIBBLKID) $(LIBUUID) \
-	$(LIBINTL) $(LIBE2P) $(SYSLIBS)
+	$(LIBINTL) $(LIBE2P) $(SYSLIBS) $(LIBPTHREADS)
 DEPLIBS= $(DEPLIBQUOTA) $(LIBEXT2FS) $(DEPLIBCOM_ERR) $(DEPLIBBLKID) \
 	 $(DEPLIBUUID) $(DEPLIBE2P)
 
 STATIC_LIBS= $(STATIC_LIBQUOTA) $(STATIC_LIBEXT2FS) $(STATIC_LIBCOM_ERR) \
 	     $(STATIC_LIBBLKID) $(STATIC_LIBUUID) $(LIBINTL) $(STATIC_LIBE2P) \
-	     $(SYSLIBS)
+	     $(SYSLIBS) $(LIBPTHREADS)
 STATIC_DEPLIBS= $(DEPSTATIC_LIBQUOTA) $(STATIC_LIBEXT2FS) \
 		$(DEPSTATIC_LIBCOM_ERR) $(DEPSTATIC_LIBBLKID) \
 		$(DEPSTATIC_LIBUUID) $(DEPSTATIC_LIBE2P)
diff --git a/e2fsck/e2fsck.8.in b/e2fsck/e2fsck.8.in
index 43ee063..820281d 100644
--- a/e2fsck/e2fsck.8.in
+++ b/e2fsck/e2fsck.8.in
@@ -208,6 +208,11 @@ option may prevent you from further manual data recovery.
 Do not attempt to discard free blocks and unused inode blocks. This option is
 exactly the opposite of discard option. This is set as default.
 .TP
+.BI readahead_mem_kb
+Use at most this many KiB to pre-fetch metadata in the hopes of reducing
+e2fsck runtime.  By default, this uses half the physical memory in the
+system; setting this value to zero disables readahead entirely.
+.TP
 .BI strict_csums
 Verify each metadata object's checksum before checking anything other fields
 in the metadata object.  If the verification fails, offer to clear the item,
diff --git a/e2fsck/e2fsck.c b/e2fsck/e2fsck.c
index 0ec1540..c5d823c 100644
--- a/e2fsck/e2fsck.c
+++ b/e2fsck/e2fsck.c
@@ -15,6 +15,10 @@
 #include "e2fsck.h"
 #include "problem.h"
 
+#ifdef HAVE_PTHREAD_H
+#include <pthread.h>
+#endif
+
 /*
  * This function allocates an e2fsck context
  */
@@ -44,6 +48,8 @@ errcode_t e2fsck_allocate_context(e2fsck_t *ret)
 			context->flags |= E2F_FLAG_TIME_INSANE;
 	}
 
+	e2fsck_init_thread(&context->ra_thread);
+
 	*ret = context;
 	return 0;
 }
@@ -209,6 +215,7 @@ int e2fsck_run(e2fsck_t ctx)
 {
 	int	i;
 	pass_t	e2fsck_pass;
+	errcode_t	err;
 
 #ifdef HAVE_SETJMP_H
 	if (setjmp(ctx->abort_loc)) {
@@ -226,6 +233,10 @@ int e2fsck_run(e2fsck_t ctx)
 		e2fsck_pass(ctx);
 		if (ctx->progress)
 			(void) (ctx->progress)(ctx, 0, 0, 0);
+		err = e2fsck_stop_thread(&ctx->ra_thread, NULL);
+		if (err)
+			com_err(ctx->program_name, err, "%s",
+				_("while stopping readahead"));
 	}
 	ctx->flags &= ~E2F_FLAG_SETJMP_OK;
 
@@ -233,3 +244,128 @@ int e2fsck_run(e2fsck_t ctx)
 		return (ctx->flags & E2F_FLAG_RUN_RETURN);
 	return 0;
 }
+
+#ifdef HAVE_PTHREAD_H
+struct run_threaded {
+	struct e2fsck_thread *thread;
+	void * (*func)(void *);
+	void (*cleanup)(void *);
+	void *arg;
+};
+
+static void run_threaded_cleanup(void *p)
+{
+	struct run_threaded *rt = p;
+
+	if (rt->cleanup)
+		rt->cleanup(rt->arg);
+	pthread_mutex_lock(&rt->thread->lock);
+	rt->thread->running = 0;
+	pthread_mutex_unlock(&rt->thread->lock);
+	ext2fs_free_mem(&rt);
+}
+
+static void *run_threaded_helper(void *p)
+{
+	int old;
+	struct run_threaded *rt = p;
+	void *ret;
+
+	pthread_cleanup_push(run_threaded_cleanup, rt);
+	pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS, &old);
+	ret = rt->func(rt->arg);
+	pthread_setcanceltype(old, NULL);
+	pthread_cleanup_pop(1);
+	pthread_exit(ret);
+	return NULL;
+}
+#endif /* HAVE_PTHREAD_H */
+
+errcode_t e2fsck_init_thread(struct e2fsck_thread *thread)
+{
+	errcode_t err = 0;
+
+	thread->magic = E2FSCK_ET_MAGIC_RUN_THREAD;
+#ifdef HAVE_PTHREAD_H
+	err = pthread_mutex_init(&thread->lock, NULL);
+#endif /* HAVE_PTHREAD_H */
+
+	return err;
+}
+
+errcode_t e2fsck_run_thread(struct e2fsck_thread *thread,
+			    void * (*func)(void *), void (*cleanup)(void *),
+			    void *arg)
+{
+#ifdef HAVE_PTHREAD_H
+	struct run_threaded *rt;
+#endif
+	errcode_t err = 0, err2;
+
+	EXT2_CHECK_MAGIC(thread, E2FSCK_ET_MAGIC_RUN_THREAD);
+#ifdef HAVE_PTHREAD_H
+	err = pthread_mutex_lock(&thread->lock);
+	if (err)
+		return err;
+
+	if (thread->running) {
+		err = EAGAIN;
+		goto out;
+	}
+
+	err = pthread_join(thread->tid, NULL);
+	if (err && err != ESRCH)
+		goto out;
+
+	err = ext2fs_get_mem(sizeof(*rt), &rt);
+	if (err)
+		goto out;
+
+	rt->thread = thread;
+	rt->func = func;
+	rt->cleanup = cleanup;
+	rt->arg = arg;
+
+	err = pthread_create(&thread->tid, NULL, run_threaded_helper, rt);
+	if (err)
+		ext2fs_free_mem(&rt);
+	else
+		thread->running = 1;
+out:
+	pthread_mutex_unlock(&thread->lock);
+#else
+	thread->ret = func(arg);
+	if (cleanup)
+		cleanup(arg);
+#endif /* HAVE_PTHREAD_H */
+
+	return err;
+}
+
+errcode_t e2fsck_stop_thread(struct e2fsck_thread *thread, void **ret)
+{
+	errcode_t err = 0, err2;
+
+	EXT2_CHECK_MAGIC(thread, E2FSCK_ET_MAGIC_RUN_THREAD);
+
+#ifdef HAVE_PTHREAD_H
+	err = pthread_mutex_lock(&thread->lock);
+	if (err)
+		return err;
+	if (thread->running)
+		err = pthread_cancel(thread->tid);
+	if (err == ESRCH)
+		err = 0;
+	err2 = pthread_mutex_unlock(&thread->lock);
+	if (!err && err2)
+		err = err2;
+	if (!err)
+		err = pthread_join(thread->tid, ret);
+	if (err == ESRCH)
+		err = 0;
+#else
+	if (ret)
+		*ret = thread->ret;
+#endif
+	return err;
+}
diff --git a/e2fsck/e2fsck.conf.5.in b/e2fsck/e2fsck.conf.5.in
index a8219a8..fcda392 100644
--- a/e2fsck/e2fsck.conf.5.in
+++ b/e2fsck/e2fsck.conf.5.in
@@ -205,6 +205,19 @@ of that type are squelched.  This can be useful if the console is slow
 (i.e., connected to a serial port) and so a large amount of output could
 end up delaying the boot process for a long time (potentially hours).
 .TP
+.I readahead_mem_pct
+Use no more than this percentage of memory to try to read in metadata blocks
+ahead of the main e2fsck thread.  This should reduce run times, depending on
+the speed of the underlying storage and the amount of free memory.  By default,
+this is set to 50%.
+.TP
+.I readahead_mem_kb
+Use no more than this amount of memory to read in metadata blocks ahead of the
+main checking thread.  Setting this value to zero disables readahead entirely.
+There is no default, but see
+.B readahead_mem_pct
+for more details.
+.TP
 .I report_features
 If this boolean relation is true, e2fsck will print the file system
 features as part of its verbose reporting (i.e., if the
diff --git a/e2fsck/e2fsck.h b/e2fsck/e2fsck.h
index c739329..59045bc 100644
--- a/e2fsck/e2fsck.h
+++ b/e2fsck/e2fsck.h
@@ -11,6 +11,7 @@
 
 #include <stdio.h>
 #include <string.h>
+#include <stdint.h>
 #ifdef HAVE_UNISTD_H
 #include <unistd.h>
 #endif
@@ -69,6 +70,24 @@
 
 #include "quota/mkquota.h"
 
+/* Functions to run something asynchronously */
+struct e2fsck_thread {
+	int magic;
+#ifdef HAVE_PTHREAD_H
+	int running;
+	pthread_t tid;
+	pthread_mutex_t lock;
+#else
+	void *ret;
+#endif /* HAVE_PTHREAD_H */
+};
+
+errcode_t e2fsck_init_thread(struct e2fsck_thread *thread);
+errcode_t e2fsck_run_thread(struct e2fsck_thread *thread,
+			    void * (*func)(void *), void (*cleanup)(void *),
+			    void *arg);
+errcode_t e2fsck_stop_thread(struct e2fsck_thread *thread, void **ret);
+
 /*
  * Exit codes used by fsck-type programs
  */
@@ -373,6 +392,10 @@ struct e2fsck_struct {
 	 * e2fsck functions themselves.
 	 */
 	void *priv_data;
+
+	/* How much are we allowed to readahead? */
+	unsigned long long readahead_mem_kb;
+	struct e2fsck_thread ra_thread;
 };
 
 /* Used by the region allocation code */
@@ -507,6 +530,7 @@ void e2fsck_rehash_dir_later(e2fsck_t ctx, ext2_ino_t ino);
 int e2fsck_dir_will_be_rehashed(e2fsck_t ctx, ext2_ino_t ino);
 errcode_t e2fsck_rehash_dir(e2fsck_t ctx, ext2_ino_t ino);
 void e2fsck_rehash_directories(e2fsck_t ctx);
+int e2fsck_will_rehash_dirs(e2fsck_t ctx);
 
 /* sigcatcher.c */
 void sigcatcher_setup(void);
@@ -585,6 +609,7 @@ extern errcode_t e2fsck_allocate_subcluster_bitmap(ext2_filsys fs,
 						   int default_type,
 						   const char *profile_name,
 						   ext2fs_block_bitmap *ret);
+int64_t get_memory_size(void);
 
 /* unix.c */
 extern void e2fsck_clear_progbar(e2fsck_t ctx);
diff --git a/e2fsck/pass1.c b/e2fsck/pass1.c
index eb9497c..376ee23 100644
--- a/e2fsck/pass1.c
+++ b/e2fsck/pass1.c
@@ -589,6 +589,67 @@ static errcode_t recheck_bad_inode_checksum(ext2_filsys fs, ext2_ino_t ino,
 	return 0;
 }
 
+struct pass1ra_ctx {
+	ext2_filsys fs;
+	dgrp_t group;
+	dgrp_t ngroups;
+};
+
+static void pass1_readahead_cleanup(void *p)
+{
+	struct pass1ra_ctx *c = p;
+
+	ext2fs_free_mem(&p);
+}
+
+static void *pass1_readahead(void *p)
+{
+	struct pass1ra_ctx *c = p;
+	errcode_t err;
+
+	e2fsck_readahead(c->fs, E2FSCK_READA_ITABLE, c->group, c->ngroups);
+	return NULL;
+}
+
+static errcode_t initiate_readahead(e2fsck_t ctx, dgrp_t group, dgrp_t ngroups)
+{
+	struct pass1ra_ctx *ractx;
+	errcode_t err;
+
+	err = ext2fs_get_mem(sizeof(*ractx), &ractx);
+	if (err)
+		return err;
+
+	ractx->fs = ctx->fs;
+	ractx->group = group;
+	ractx->ngroups = ngroups;
+
+	err = e2fsck_run_thread(&ctx->ra_thread, pass1_readahead,
+				pass1_readahead_cleanup, ractx);
+	if (err)
+		ext2fs_free_mem(&ractx);
+
+	return err;
+}
+
+static ext2_ino_t estimate_next_ra_inode(ext2_filsys fs, dgrp_t start,
+					 dgrp_t end)
+{
+	ext2_ino_t inodes_per_group = fs->super->s_inodes_per_group;
+	dgrp_t grp;
+
+	if (end >= fs->group_desc_count)
+		end = fs->group_desc_count - 1;
+
+	for (grp = end; grp >= start; grp--) {
+		if (ext2fs_bg_flags_test(fs, grp, EXT2_BG_INODE_UNINIT))
+			continue;
+		return grp * inodes_per_group;
+	}
+
+	return end * inodes_per_group;
+}
+
 void e2fsck_pass1(e2fsck_t ctx)
 {
 	int	i;
@@ -611,10 +672,40 @@ void e2fsck_pass1(e2fsck_t ctx)
 	int		busted_fs_time = 0;
 	int		inode_size;
 	int		failed_csum = 0;
+	dgrp_t		grp = 0;
+	ext2_ino_t	ra_threshold = 0, ino_threshold;
+	dgrp_t		ra_groups = 0;
+	ext2_ino_t	inodes_per_group = fs->super->s_inodes_per_group;
+	errcode_t	err;
 
 	init_resource_track(&rtrack, ctx->fs->io);
 	clear_problem_context(&pctx);
 
+	/* If we can do readahead, figure out how many groups to pull in. */
+	if (!e2fsck_can_readahead(ctx->fs))
+		ctx->readahead_mem_kb = 0;
+	if (ctx->readahead_mem_kb) {
+		ra_groups = ctx->readahead_mem_kb /
+			    (fs->inode_blocks_per_group * fs->blocksize /
+			     1024);
+		if (ra_groups > fs->group_desc_count)
+			ra_groups = fs->group_desc_count;
+		if (ra_groups < 16)
+			ra_groups = 0;
+		if (ra_groups) {
+			err = initiate_readahead(ctx, grp, ra_groups);
+			if (err) {
+				com_err(ctx->program_name, err, "%s",
+					_("while starting pass1 readahead"));
+				ra_groups = 0;
+			}
+			ra_threshold = ra_groups *
+				       inodes_per_group;
+			ino_threshold = estimate_next_ra_inode(fs, 0,
+					ra_groups * 9 / 10);
+		}
+	}
+
 	if (!(ctx->options & E2F_OPT_PREEN))
 		fix_problem(ctx, PR_1_PASS_HEADER, &pctx);
 
@@ -774,10 +865,23 @@ void e2fsck_pass1(e2fsck_t ctx)
 	(void) e2fsck_get_lost_and_found(ctx, 0);
 
 	while (1) {
-		if (ino % (fs->super->s_inodes_per_group * 4) == 1) {
+
+		if (ino % (inodes_per_group * 4) == 1) {
 			if (e2fsck_mmp_update(fs))
 				fatal_error(ctx, 0);
 		}
+		if (ra_groups > 0 && ino > ino_threshold) {
+			grp = (ra_threshold - 1) / inodes_per_group;
+			err = initiate_readahead(ctx, grp, ra_groups);
+			if (err == EAGAIN)
+				ra_groups /= 2;
+			else if (err)
+				com_err(ctx->program_name, err, "%s",
+					_("while starting pass1 readahead"));
+			ra_threshold += ra_groups * inodes_per_group;
+			ino_threshold = estimate_next_ra_inode(fs, grp,
+					grp + (ra_groups * 9 / 10));
+		}
 		old_op = ehandler_operation(_("getting next inode from scan"));
 		pctx.errcode = ext2fs_get_next_inode_full(scan, &ino,
 							  inode, inode_size);
diff --git a/e2fsck/pass2.c b/e2fsck/pass2.c
index 95f51b7..1667292 100644
--- a/e2fsck/pass2.c
+++ b/e2fsck/pass2.c
@@ -61,6 +61,9 @@
  * Keeps track of how many times an inode is referenced.
  */
 static void deallocate_inode(e2fsck_t ctx, ext2_ino_t ino, char* block_buf);
+static int check_dir_block2(ext2_filsys fs,
+			   struct ext2_db_entry2 *dir_blocks_info,
+			   void *priv_data);
 static int check_dir_block(ext2_filsys fs,
 			   struct ext2_db_entry2 *dir_blocks_info,
 			   void *priv_data);
@@ -77,8 +80,67 @@ struct check_dir_struct {
 	struct problem_context	pctx;
 	int	count, max;
 	e2fsck_t ctx;
+	int	save_readahead;
+};
+
+struct pass2_readahead_data {
+	ext2_filsys fs;
+	ext2_dblist dblist;
 };
 
+static int readahead_dir_block(ext2_filsys fs, struct ext2_db_entry2 *db,
+			       void *priv_data)
+{
+	db->blockcnt = 1;
+	return 0;
+}
+
+static void pass2_readahead_cleanup(void *p)
+{
+	struct pass2_readahead_data *pr = p;
+
+	ext2fs_free_dblist(pr->dblist);
+	ext2fs_free_mem(&pr);
+}
+
+static void *pass2_readahead(void *p)
+{
+	struct pass2_readahead_data *pr = p;
+
+	e2fsck_readahead_dblist(pr->fs, 0, pr->dblist);
+	return NULL;
+}
+
+static errcode_t initiate_readahead(e2fsck_t ctx)
+{
+	struct pass2_readahead_data *pr;
+	errcode_t err;
+
+	err = ext2fs_get_mem(sizeof(*pr), &pr);
+	if (err)
+		return err;
+	pr->fs = ctx->fs;
+	err = ext2fs_copy_dblist(ctx->fs->dblist, &pr->dblist);
+	if (err)
+		goto out_pr;
+	err = ext2fs_dblist_iterate2(pr->dblist, readahead_dir_block,
+				     NULL);
+	if (err)
+		goto out_dblist;
+	err = e2fsck_run_thread(&ctx->ra_thread, pass2_readahead,
+				pass2_readahead_cleanup, pr);
+	if (err)
+		goto out_dblist;
+
+	return 0;
+
+out_dblist:
+	ext2fs_free_dblist(pr->dblist);
+out_pr:
+	ext2fs_free_mem(&pr);
+	return err;
+}
+
 void e2fsck_pass2(e2fsck_t ctx)
 {
 	struct ext2_super_block *sb = ctx->fs->super;
@@ -96,6 +158,10 @@ void e2fsck_pass2(e2fsck_t ctx)
 	int			i, depth;
 	problem_t		code;
 	int			bad_dir;
+	int (*check_dir_func)(ext2_filsys fs,
+			      struct ext2_db_entry2 *dir_blocks_info,
+			      void *priv_data);
+	errcode_t		err;
 
 	init_resource_track(&rtrack, ctx->fs->io);
 	clear_problem_context(&cd.pctx);
@@ -139,6 +205,7 @@ void e2fsck_pass2(e2fsck_t ctx)
 	cd.ctx = ctx;
 	cd.count = 1;
 	cd.max = ext2fs_dblist_count2(fs->dblist);
+	cd.save_readahead = e2fsck_will_rehash_dirs(ctx);
 
 	if (ctx->progress)
 		(void) (ctx->progress)(ctx, 2, 0, cd.max);
@@ -146,7 +213,16 @@ void e2fsck_pass2(e2fsck_t ctx)
 	if (fs->super->s_feature_compat & EXT2_FEATURE_COMPAT_DIR_INDEX)
 		ext2fs_dblist_sort2(fs->dblist, special_dir_block_cmp);
 
-	cd.pctx.errcode = ext2fs_dblist_iterate2(fs->dblist, check_dir_block,
+	if (ctx->readahead_mem_kb) {
+		check_dir_func = check_dir_block2;
+		err = initiate_readahead(ctx);
+		if (err)
+			com_err(ctx->program_name, err, "%s",
+				_("while starting pass2 readahead"));
+	} else
+		check_dir_func = check_dir_block;
+
+	cd.pctx.errcode = ext2fs_dblist_iterate2(fs->dblist, check_dir_func,
 						 &cd);
 	if (ctx->flags & E2F_FLAG_SIGNAL_MASK || ctx->flags & E2F_FLAG_RESTART)
 		return;
@@ -655,6 +731,7 @@ clear_and_exit:
 	clear_htree(cd->ctx, cd->pctx.ino);
 	dx_dir->numblocks = 0;
 	e2fsck_rehash_dir_later(cd->ctx, cd->pctx.ino);
+	cd->save_readahead = 1;
 }
 #endif /* ENABLE_HTREE */
 
@@ -774,6 +851,19 @@ static errcode_t insert_dirent_tail(ext2_filsys fs, void *dirbuf)
 	return 0;
 }
 
+static int check_dir_block2(ext2_filsys fs,
+			   struct ext2_db_entry2 *db,
+			   void *priv_data)
+{
+	int err;
+	struct check_dir_struct *cd = priv_data;
+
+	err = check_dir_block(fs, db, priv_data);
+	if (!cd->save_readahead)
+		io_channel_cache_release(fs->io, db->blk, 1);
+	return err;
+}
+
 static int check_dir_block(ext2_filsys fs,
 			   struct ext2_db_entry2 *db,
 			   void *priv_data)
@@ -957,6 +1047,7 @@ out_htree:
 					 &cd->pctx))
 				goto skip_checksum;
 			e2fsck_rehash_dir_later(ctx, ino);
+			cd->save_readahead = 1;
 			goto skip_checksum;
 		}
 		if (failed_csum) {
@@ -1249,6 +1340,7 @@ skip_checksum:
 			pctx.dirent = dirent;
 			fix_problem(ctx, PR_2_REPORT_DUP_DIRENT, &pctx);
 			e2fsck_rehash_dir_later(ctx, ino);
+			cd->save_readahead = 1;
 			dups_found++;
 		} else
 			dict_alloc_insert(&de_dict, dirent, dirent);
@@ -1316,6 +1408,7 @@ skip_checksum:
 			if (insert_dirent_tail(fs, buf) == 0)
 				goto write_and_fix;
 			e2fsck_rehash_dir_later(ctx, ino);
+			cd->save_readahead = 1;
 		}
 
 write_and_fix:
diff --git a/e2fsck/pass4.c b/e2fsck/pass4.c
index 21d93f0..6cebfa3 100644
--- a/e2fsck/pass4.c
+++ b/e2fsck/pass4.c
@@ -87,6 +87,21 @@ static int disconnect_inode(e2fsck_t ctx, ext2_ino_t i,
 	return 0;
 }
 
+/* Since pass4 is mostly CPU bound, start readahead of bitmaps for pass 5. */
+static void *pass5_readahead(void *p)
+{
+	ext2_filsys fs = p;
+
+	e2fsck_readahead(fs, E2FSCK_READA_BBITMAP | E2FSCK_READA_IBITMAP, 0,
+			 fs->group_desc_count);
+	return NULL;
+}
+
+static errcode_t initiate_readahead(e2fsck_t ctx)
+{
+	return e2fsck_run_thread(&ctx->ra_thread, pass5_readahead, NULL,
+				 ctx->fs);
+}
 
 void e2fsck_pass4(e2fsck_t ctx)
 {
@@ -100,12 +115,19 @@ void e2fsck_pass4(e2fsck_t ctx)
 	__u16	link_count, link_counted;
 	char	*buf = 0;
 	dgrp_t	group, maxgroup;
+	errcode_t	err;
 
 	init_resource_track(&rtrack, ctx->fs->io);
 
 #ifdef MTRACE
 	mtrace_print("Pass 4");
 #endif
+	if (ctx->readahead_mem_kb) {
+		err = initiate_readahead(ctx);
+		if (err)
+			com_err(ctx->program_name, err, "%s",
+				_("while starting pass5 readahead"));
+	}
 
 	clear_problem_context(&pctx);
 
diff --git a/e2fsck/prof_err.et b/e2fsck/prof_err.et
index c9316c7..21fb524 100644
--- a/e2fsck/prof_err.et
+++ b/e2fsck/prof_err.et
@@ -62,5 +62,6 @@ error_code	PROF_BAD_INTEGER,		"Invalid integer value"
 
 error_code	PROF_MAGIC_FILE_DATA, "Bad magic value in profile_file_data_t"
 
+error_code	E2FSCK_ET_MAGIC_RUN_THREAD,	"Wrong magic number for e2fsck_thread structure"
 
 end
diff --git a/e2fsck/rehash.c b/e2fsck/rehash.c
index 3b05715..89708c2 100644
--- a/e2fsck/rehash.c
+++ b/e2fsck/rehash.c
@@ -71,6 +71,16 @@ int e2fsck_dir_will_be_rehashed(e2fsck_t ctx, ext2_ino_t ino)
 	return ext2fs_u32_list_test(ctx->dirs_to_hash, ino);
 }
 
+/* Ask if there will be a pass 3A. */
+int e2fsck_will_rehash_dirs(e2fsck_t ctx)
+{
+	if (ctx->options & E2F_OPT_COMPRESS_DIRS)
+		return 1;
+	if (!ctx->dirs_to_hash)
+		return 0;
+	return ext2fs_u32_list_count(ctx->dirs_to_hash) > 0;
+}
+
 struct fill_dir_struct {
 	char *buf;
 	struct ext2_inode *inode;
diff --git a/e2fsck/unix.c b/e2fsck/unix.c
index c6cdb49..da888c2 100644
--- a/e2fsck/unix.c
+++ b/e2fsck/unix.c
@@ -643,6 +643,7 @@ static void parse_extended_opts(e2fsck_t ctx, const char *opts)
 	char	*buf, *token, *next, *p, *arg;
 	int	ea_ver;
 	int	extended_usage = 0;
+	unsigned long long reada_kb;
 
 	buf = string_copy(ctx, opts, 0);
 	for (token = buf; token && *token; token = next) {
@@ -671,6 +672,15 @@ static void parse_extended_opts(e2fsck_t ctx, const char *opts)
 				continue;
 			}
 			ctx->ext_attr_ver = ea_ver;
+		} else if (strcmp(token, "readahead_mem_kb") == 0) {
+			reada_kb = strtoull(arg, &p, 0);
+			if (*p) {
+				fprintf(stderr, "%s",
+					_("Invalid readahead buffer size.\n"));
+				extended_usage++;
+				continue;
+			}
+			ctx->readahead_mem_kb = reada_kb;
 		} else if (strcmp(token, "fragcheck") == 0) {
 			ctx->options |= E2F_OPT_FRAGCHECK;
 			continue;
@@ -716,6 +726,7 @@ static void parse_extended_opts(e2fsck_t ctx, const char *opts)
 		fputs(("\tnodiscard\n"), stderr);
 		fputs(("\tstrict_csums\n"), stderr);
 		fputs(("\tno_strict_csums\n"), stderr);
+		fputs(("\treadahead_mem_kb=<buffer size>\n"), stderr);
 		fputc('\n', stderr);
 		exit(1);
 	}
@@ -749,6 +760,7 @@ static errcode_t PRS(int argc, char *argv[], e2fsck_t *ret_ctx)
 #ifdef CONFIG_JBD_DEBUG
 	char 		*jbd_debug;
 #endif
+	unsigned long long phys_mem_kb;
 
 	retval = e2fsck_allocate_context(&ctx);
 	if (retval)
@@ -776,6 +788,8 @@ static errcode_t PRS(int argc, char *argv[], e2fsck_t *ret_ctx)
 	else
 		ctx->program_name = "e2fsck";
 
+	phys_mem_kb = get_memory_size() / 1024;
+	ctx->readahead_mem_kb = ~0ULL;
 	while ((c = getopt (argc, argv, "panyrcC:B:dE:fvtFVM:b:I:j:P:l:L:N:SsDk")) != EOF)
 		switch (c) {
 		case 'C':
@@ -965,6 +979,22 @@ static errcode_t PRS(int argc, char *argv[], e2fsck_t *ret_ctx)
 	if (c)
 		verbose = 1;
 
+	/* Figure out how much memory goes to readahead */
+	if (ctx->readahead_mem_kb == ~0ULL) {
+		profile_get_integer(ctx->profile, "options",
+				    "readahead_mem_pct", 0, 50, &c);
+		if (c >= 0 && c <= 100)
+			ctx->readahead_mem_kb = phys_mem_kb * c / 100;
+		else
+			ctx->readahead_mem_kb = phys_mem_kb / 2;
+		profile_get_integer(ctx->profile, "options",
+				    "readahead_mem_kb", 0, -1, &c);
+		if (c >= 0)
+			ctx->readahead_mem_kb = c;
+	}
+	if (ctx->readahead_mem_kb > phys_mem_kb)
+		ctx->readahead_mem_kb = phys_mem_kb;
+
 	/* Turn off discard in read-only mode */
 	if ((ctx->options & E2F_OPT_NO) &&
 	    (ctx->options & E2F_OPT_DISCARD))
@@ -1781,6 +1811,11 @@ no_journal:
 		}
 	}
 
+	retval = e2fsck_stop_thread(&ctx->ra_thread, NULL);
+	if (retval)
+		com_err(ctx->program_name, retval, "%s",
+			_("while stopping readahead"));
+
 	e2fsck_write_bitmaps(ctx);
 	io_channel_flush(ctx->fs->io);
 	print_resource_track(ctx, NULL, &ctx->global_rtrack, ctx->fs->io);
diff --git a/e2fsck/util.c b/e2fsck/util.c
index fec6179..09b78c2 100644
--- a/e2fsck/util.c
+++ b/e2fsck/util.c
@@ -37,6 +37,10 @@
 #include <errno.h>
 #endif
 
+#ifdef HAVE_SYS_SYSCTL_H
+#include <sys/sysctl.h>
+#endif
+
 #include "e2fsck.h"
 
 extern e2fsck_t e2fsck_global_ctx;   /* Try your very best not to use this! */
@@ -845,3 +849,50 @@ errcode_t e2fsck_allocate_subcluster_bitmap(ext2_filsys fs, const char *descr,
 	fs->default_bitmap_type = save_type;
 	return retval;
 }
+
+/* Return memory size in bytes */
+int64_t get_memory_size(void)
+{
+#if defined(_SC_PHYS_PAGES)
+# if defined(_SC_PAGESIZE)
+	return (int64_t)sysconf(_SC_PHYS_PAGES) *
+	       (int64_t)sysconf(_SC_PAGESIZE);
+# elif defined(_SC_PAGE_SIZE)
+	return (int64_t)sysconf(_SC_PHYS_PAGES) *
+	       (int64_t)sysconf(_SC_PAGE_SIZE);
+# endif
+#elif defined(_SC_AIX_REALMEM)
+	return (int64_t)sysconf(_SC_AIX_REALMEM) * (int64_t)1024L;
+#elif defined(CTL_HW)
+# if (defined(HW_MEMSIZE) || defined(HW_PHYSMEM64))
+#  define CTL_HW_INT64
+# elif (defined(HW_PHYSMEM) || defined(HW_REALMEM))
+#  define CTL_HW_UINT
+# endif
+	int mib[2];
+	mib[0] = CTL_HW;
+# if defined(HW_MEMSIZE)
+	mib[1] = HW_MEMSIZE;
+# elif defined(HW_PHYSMEM64)
+	mib[1] = HW_PHYSMEM64;
+# elif defined(HW_REALMEM)
+	mib[1] = HW_REALMEM;
+# elif defined(HW_PHYSMEM)
+	mib[1] = HW_PHYSMEM;
+# endif
+# if defined(CTL_HW_INT64)
+	int64_t size = 0;
+# elif defined(CTL_HW_UINT)
+	unsigned int size = 0;
+# endif
+# if defined(CTL_HW_INT64) || defined(CTL_HW_UINT)
+	size_t len = sizeof(size);
+	if (sysctl(mib, 2, &size, &len, NULL, 0) == 0)
+		return (int64_t)size;
+# endif
+	return 0;
+#else
+# warning "Don't know how to detect memory on your platform?"
+	return 0;
+#endif
+}
diff --git a/lib/config.h.in b/lib/config.h.in
index e0384ee..836c2df 100644
--- a/lib/config.h.in
+++ b/lib/config.h.in
@@ -203,6 +203,9 @@
 /* Define if your <locale.h> file defines LC_MESSAGES. */
 #undef HAVE_LC_MESSAGES
 
+/* Define to 1 if you have the `pthread' library (-lpthread). */
+#undef HAVE_LIBPTHREAD
+
 /* Define to 1 if you have the <limits.h> header file. */
 #undef HAVE_LIMITS_H
 
@@ -314,6 +317,9 @@
 /* Define to 1 if you have the `pread' function. */
 #undef HAVE_PREAD
 
+/* Define to 1 if you have the <pthread.h> header file. */
+#undef HAVE_PTHREAD_H
+
 /* Define to 1 if you have the `putenv' function. */
 #undef HAVE_PUTENV
 
@@ -465,6 +471,9 @@
 /* Define to 1 if you have the <sys/syscall.h> header file. */
 #undef HAVE_SYS_SYSCALL_H
 
+/* Define to 1 if you have the <sys/sysctl.h> header file. */
+#undef HAVE_SYS_SYSCTL_H
+
 /* Define to 1 if you have the <sys/sysmacros.h> header file. */
 #undef HAVE_SYS_SYSMACROS_H
 

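The readahead sizing in the e2fsck/unix.c and e2fsck/pass1.c hunks
above can be sketched as two small functions.  This is only a toy
restatement under assumptions: the toy_* names are made up for
illustration, "not set" is modelled as a negative value, and the real
code reads its settings from the e2fsck profile and the -E option.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Pick the readahead budget in KiB.  pct < 0 or > 100 falls back to half
 * of physical memory; fixed_kb >= 0 overrides the percentage.  The result
 * is clamped to physical memory, as in the unix.c hunk.
 * (Hypothetical helper, not part of e2fsprogs.)
 */
static uint64_t toy_readahead_budget_kb(uint64_t phys_mem_kb, int pct,
					long long fixed_kb)
{
	uint64_t kb;

	if (pct >= 0 && pct <= 100)
		kb = phys_mem_kb * (uint64_t)pct / 100;
	else
		kb = phys_mem_kb / 2;
	if (fixed_kb >= 0)
		kb = (uint64_t)fixed_kb;
	return kb > phys_mem_kb ? phys_mem_kb : kb;
}

/*
 * Convert the budget into a number of block groups whose inode tables
 * fit in it.  Fewer than 16 groups is treated as "not worth it" (0),
 * matching the pass1.c hunk.  (Hypothetical helper.)
 */
static uint32_t toy_ra_groups(uint64_t budget_kb,
			      uint32_t inode_blocks_per_group,
			      uint32_t blocksize, uint32_t group_count)
{
	uint64_t itable_kb = (uint64_t)inode_blocks_per_group * blocksize / 1024;
	uint64_t groups;

	assert(itable_kb > 0);
	groups = budget_kb / itable_kb;
	if (groups > group_count)
		groups = group_count;
	if (groups < 16)
		groups = 0;
	return (uint32_t)groups;
}
```

For example, with 512 inode table blocks per group and 4k blocks each
group's inode table is 2048 KiB, so a 64 MiB budget covers 32 groups
while a 20000 KiB budget covers only 9 and readahead is skipped.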


* [PATCH 25/37] libext2fs: when appending to a file, don't split an index block in equal halves
  2014-05-01 23:12 [PATCH 00/37] e2fsprogs patchbomb 5/14 Darrick J. Wong
                   ` (23 preceding siblings ...)
  2014-05-01 23:14 ` [PATCH 24/37] e2fsck: read-ahead metadata during passes 1, 2, and 4 Darrick J. Wong
@ 2014-05-01 23:15 ` Darrick J. Wong
  2014-08-02 23:43   ` Theodore Ts'o
  2014-05-01 23:15 ` [PATCH 26/37] libext2fs: find inode goal when allocating blocks Darrick J. Wong
                   ` (9 subsequent siblings)
  34 siblings, 1 reply; 91+ messages in thread
From: Darrick J. Wong @ 2014-05-01 23:15 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

When we're appending an extent to the end of a file and the index
block is full, don't split the index block into two half-full index
blocks because this leaves us with underutilized index blocks, at
least in the fallocate case.  Instead, copy the last extent from the
full block into the new block.  This isn't perfect utilization, but
there's a lot of work involved in teaching extent.c to be able to go to
a nonexistent node in a newly allocated (and empty) extent block.

This patch does not fix the general problem of keeping the extent tree
balanced.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 lib/ext2fs/extent.c |   79 ++++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 72 insertions(+), 7 deletions(-)
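
To illustrate why the split point matters for append-only workloads,
here is a minimal toy model, not the extent.c code: it only counts how
many nodes one tree level needs when entries are always added at the
end.  The function name and the flat "rightmost node" model are
assumptions made for the example; only the tocopy choice is taken from
the patch.

```c
#include <assert.h>

/*
 * Toy model: append n entries to the rightmost node of one tree level.
 * Each node holds at most cap entries.  When the rightmost node is full
 * it is split: a new node is added and tocopy entries move into it.
 * Returns the number of nodes in use afterwards.
 * (Hypothetical helper, not part of libext2fs.)
 */
static unsigned toy_nodes_after_append(unsigned n, unsigned cap,
				       int no_balance)
{
	unsigned nodes = 1;
	unsigned fill = 0;	/* entries in the rightmost node */
	unsigned i;

	assert(cap >= 2);
	for (i = 0; i < n; i++) {
		if (fill == cap) {
			/* same choice as the tocopy assignment in the patch */
			unsigned tocopy = no_balance ? 1 : cap / 2;

			nodes++;
			fill = tocopy;	/* the old node keeps cap - tocopy */
		}
		fill++;
	}
	return nodes;
}
```

With cap = 4 and n = 16, the half split ends up with 7 nodes (every
node but the last is half full) and the one-entry split with 5.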


diff --git a/lib/ext2fs/extent.c b/lib/ext2fs/extent.c
index 30673b5..c0b34a7 100644
--- a/lib/ext2fs/extent.c
+++ b/lib/ext2fs/extent.c
@@ -29,6 +29,8 @@
 #include "ext2fsP.h"
 #include "e2image.h"
 
+#undef DEBUG
+
 /*
  * Definitions to be dropped in lib/ext2fs/ext2fs.h
  */
@@ -122,11 +124,39 @@ static void dbg_print_extent(char *desc, struct ext2fs_extent *extent)
 
 }
 
+static void dump_path(const char *tag, struct ext2_extent_handle *handle,
+		      struct extent_path *path)
+{
+	struct extent_path *ppp = path;
+	printf("%s: level=%d\n", tag, handle->level);
+
+	do {
+		printf("%s: path=%ld buf=%p entries=%d max_entries=%d left=%d "
+		       "visit_num=%d flags=0x%x end_blk=%llu curr=%p(%ld)\n",
+		       tag, (ppp - handle->path), ppp->buf, ppp->entries,
+		       ppp->max_entries, ppp->left, ppp->visit_num, ppp->flags,
+		       ppp->end_blk, ppp->curr, ppp->curr - (void *)ppp->buf);
+		printf("  ");
+		dbg_show_header((struct ext3_extent_header *)ppp->buf);
+		if (ppp->curr) {
+			printf("  ");
+			dbg_show_index(ppp->curr);
+			printf("  ");
+			dbg_show_extent(ppp->curr);
+		}
+		ppp--;
+	} while (ppp >= handle->path);
+	fflush(stdout);
+
+	return;
+}
+
 #else
 #define dbg_show_header(eh) do { } while (0)
 #define dbg_show_index(ix) do { } while (0)
 #define dbg_show_extent(ex) do { } while (0)
 #define dbg_print_extent(desc, ex) do { } while (0)
+#define dump_path(tag, handle, path) do { } while (0)
 #endif
 
 /*
@@ -837,12 +867,31 @@ errcode_t ext2fs_extent_replace(ext2_extent_handle_t handle,
 	return 0;
 }
 
+static int splitting_at_eof(struct ext2_extent_handle *handle,
+			    struct extent_path *path)
+{
+	struct extent_path *ppp = path;
+	dump_path(__func__, handle, path);
+
+	if (handle->level == 0)
+		return 0;
+
+	do {
+		if (ppp->left)
+			return 0;
+		ppp--;
+	} while (ppp >= handle->path);
+
+	return 1;
+}
+
 /*
  * allocate a new block, move half the current node to it, and update parent
  *
  * handle will be left pointing at original record.
  */
-errcode_t ext2fs_extent_node_split(ext2_extent_handle_t handle)
+static errcode_t extent_node_split(ext2_extent_handle_t handle,
+				   int expand_allowed)
 {
 	errcode_t			retval = 0;
 	blk64_t				new_node_pblk;
@@ -857,6 +906,7 @@ errcode_t ext2fs_extent_node_split(ext2_extent_handle_t handle)
 	int				tocopy;
 	int				new_root = 0;
 	struct ext2_extent_info		info;
+	int				no_balance;
 
 	/* basic sanity */
 	EXT2_CHECK_MAGIC(handle, EXT2_ET_MAGIC_EXTENT_HANDLE);
@@ -897,7 +947,7 @@ errcode_t ext2fs_extent_node_split(ext2_extent_handle_t handle)
 			goto done;
 		goal_blk = extent.e_pblk;
 
-		retval = ext2fs_extent_node_split(handle);
+		retval = extent_node_split(handle, expand_allowed);
 		if (retval)
 			goto done;
 
@@ -912,6 +962,14 @@ errcode_t ext2fs_extent_node_split(ext2_extent_handle_t handle)
 	if (!path->curr)
 		return EXT2_ET_NO_CURRENT_NODE;
 
+	/*
+	 * Normally, we try to split a full node in half.  This doesn't turn
+	 * out so well if we're tacking extents on the end of the file because
+	 * then we're stuck with a tree of half-full extent blocks.  This of
+	 * course doesn't apply to the root level.
+	 */
+	no_balance = expand_allowed ? splitting_at_eof(handle, path) : 0;
+
 	/* extent header of the current node we'll split */
 	eh = (struct ext3_extent_header *)path->buf;
 
@@ -925,7 +983,10 @@ errcode_t ext2fs_extent_node_split(ext2_extent_handle_t handle)
 		if (retval)
 			goto done;
 	} else {
-		tocopy = ext2fs_le16_to_cpu(eh->eh_entries) / 2;
+		if (no_balance)
+			tocopy = 1;
+		else
+			tocopy = ext2fs_le16_to_cpu(eh->eh_entries) / 2;
 	}
 
 #ifdef DEBUG
@@ -934,7 +995,7 @@ errcode_t ext2fs_extent_node_split(ext2_extent_handle_t handle)
 				handle->level);
 #endif
 
-	if (!tocopy) {
+	if (!tocopy && !no_balance) {
 #ifdef DEBUG
 		printf("Nothing to copy to new block!\n");
 #endif
@@ -1059,8 +1120,7 @@ errcode_t ext2fs_extent_node_split(ext2_extent_handle_t handle)
 		goto done;
 
 	/* new node hooked in, so update inode block count (do this here?) */
-	handle->inode->i_blocks += (handle->fs->blocksize *
-				    EXT2FS_CLUSTER_RATIO(handle->fs)) / 512;
+	ext2fs_iblk_add_blocks(handle->fs, handle->inode, 1);
 	retval = ext2fs_write_inode(handle->fs, handle->ino,
 				    handle->inode);
 	if (retval)
@@ -1074,6 +1134,11 @@ done:
 	return retval;
 }
 
+errcode_t ext2fs_extent_node_split(ext2_extent_handle_t handle)
+{
+	return extent_node_split(handle, 0);
+}
+
 errcode_t ext2fs_extent_insert(ext2_extent_handle_t handle, int flags,
 				      struct ext2fs_extent *extent)
 {
@@ -1105,7 +1170,7 @@ errcode_t ext2fs_extent_insert(ext2_extent_handle_t handle, int flags,
 			printf("node full (level %d) - splitting\n",
 				   handle->level);
 #endif
-			retval = ext2fs_extent_node_split(handle);
+			retval = extent_node_split(handle, 1);
 			if (retval)
 				return retval;
 			path = handle->path + handle->level;



* [PATCH 26/37] libext2fs: find inode goal when allocating blocks
  2014-05-01 23:12 [PATCH 00/37] e2fsprogs patchbomb 5/14 Darrick J. Wong
                   ` (24 preceding siblings ...)
  2014-05-01 23:15 ` [PATCH 25/37] libext2fs: when appending to a file, don't split an index block in equal halves Darrick J. Wong
@ 2014-05-01 23:15 ` Darrick J. Wong
  2014-05-01 23:15 ` [PATCH 27/37] libext2fs: find a range of empty blocks Darrick J. Wong
                   ` (8 subsequent siblings)
  34 siblings, 0 replies; 91+ messages in thread
From: Darrick J. Wong @ 2014-05-01 23:15 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

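Try to be a little smarter about where we go to allocate blocks for an
inode: use the first block of the inode's block group (rounded down to
the start of its flex group, if any) as the allocation goal.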

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 e2fsck/pass2.c         |    3 ++-
 lib/ext2fs/alloc.c     |   10 ++++++++++
 lib/ext2fs/bmap.c      |    5 +++--
 lib/ext2fs/expanddir.c |    2 +-
 lib/ext2fs/ext2fs.h    |    1 +
 lib/ext2fs/ext_attr.c  |    3 +--
 lib/ext2fs/extent.c    |   10 ++--------
 lib/ext2fs/mkdir.c     |    3 ++-
 lib/ext2fs/symlink.c   |    3 ++-
 9 files changed, 24 insertions(+), 16 deletions(-)
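
The goal calculation in ext2fs_find_inode_goal() can be tried out in
isolation with the sketch below.  It is a stand-alone toy under
assumptions: struct toy_geom and toy_find_inode_goal() are made-up
names, and the group-of-inode and first-block-of-group arithmetic is
written out by hand as I understand the libext2fs helpers to work.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in for the superblock fields the calculation needs. */
struct toy_geom {
	uint32_t inodes_per_group;
	uint32_t blocks_per_group;
	uint32_t first_data_block;
	uint8_t  log_groups_per_flex;
};

/*
 * Return the first block of the inode's group, rounded down to the
 * first group of its flex group when flex_bg is in use.
 * (Hypothetical helper mirroring the patch's logic.)
 */
static uint64_t toy_find_inode_goal(const struct toy_geom *g, uint32_t ino)
{
	uint32_t group;

	assert(ino >= 1 && g->inodes_per_group > 0);
	group = (ino - 1) / g->inodes_per_group;	/* inodes start at 1 */
	if (g->log_groups_per_flex)
		group &= ~((1U << g->log_groups_per_flex) - 1);
	return g->first_data_block + (uint64_t)group * g->blocks_per_group;
}
```

With 8192 inodes and 32768 blocks per group and 16 groups per flex
group, an inode in group 21 gets a goal at the start of group 16, so
its blocks tend to land near the packed metadata of its flex group.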


diff --git a/e2fsck/pass2.c b/e2fsck/pass2.c
index 1667292..4b19cb8 100644
--- a/e2fsck/pass2.c
+++ b/e2fsck/pass2.c
@@ -1729,7 +1729,8 @@ static int allocate_dir_block(e2fsck_t ctx,
 	/*
 	 * First, find a free block
 	 */
-	pctx->errcode = ext2fs_new_block2(fs, 0, ctx->block_found_map, &blk);
+	blk = ext2fs_find_inode_goal(fs, db->ino);
+	pctx->errcode = ext2fs_new_block2(fs, blk, ctx->block_found_map, &blk);
 	if (pctx->errcode) {
 		pctx->str = "ext2fs_new_block";
 		fix_problem(ctx, PR_2_ALLOC_DIRBOCK, pctx);
diff --git a/lib/ext2fs/alloc.c b/lib/ext2fs/alloc.c
index 1be4ecc..aa084ac 100644
--- a/lib/ext2fs/alloc.c
+++ b/lib/ext2fs/alloc.c
@@ -293,3 +293,13 @@ void ext2fs_set_alloc_block_callback(ext2_filsys fs,
 
 	fs->get_alloc_block = func;
 }
+
+blk64_t ext2fs_find_inode_goal(ext2_filsys fs, ext2_ino_t ino)
+{
+	dgrp_t	group = ext2fs_group_of_ino(fs, ino);
+	__u8	log_flex = fs->super->s_log_groups_per_flex;
+
+	if (log_flex)
+		group = group & ~((1 << (log_flex)) - 1);
+	return ext2fs_group_first_block2(fs, group);
+}
diff --git a/lib/ext2fs/bmap.c b/lib/ext2fs/bmap.c
index a4dc8ef..7623052 100644
--- a/lib/ext2fs/bmap.c
+++ b/lib/ext2fs/bmap.c
@@ -252,7 +252,7 @@ got_block:
 		retval = extent_bmap(fs, ino, inode, handle, block_buf,
 				     0, block-1, 0, blocks_alloc, &blk64);
 		if (retval)
-			blk64 = 0;
+			blk64 = ext2fs_find_inode_goal(fs, ino);
 		retval = ext2fs_alloc_block2(fs, blk64, block_buf,
 					     &blk64);
 		if (retval)
@@ -368,7 +368,8 @@ errcode_t ext2fs_bmap2(ext2_filsys fs, ext2_ino_t ino, struct ext2_inode *inode,
 		}
 
 		*phys_blk = inode_bmap(inode, block);
-		b = block ? inode_bmap(inode, block-1) : 0;
+		b = block ? inode_bmap(inode, block-1) :
+			    ext2fs_find_inode_goal(fs, ino);
 
 		if ((*phys_blk == 0) && (bmap_flags & BMAP_ALLOC)) {
 			retval = ext2fs_alloc_block(fs, b, block_buf, &b);
diff --git a/lib/ext2fs/expanddir.c b/lib/ext2fs/expanddir.c
index d0f7287..2df49ce 100644
--- a/lib/ext2fs/expanddir.c
+++ b/lib/ext2fs/expanddir.c
@@ -111,7 +111,7 @@ errcode_t ext2fs_expand_dir(ext2_filsys fs, ext2_ino_t dir)
 
 	es.done = 0;
 	es.err = 0;
-	es.goal = 0;
+	es.goal = ext2fs_find_inode_goal(fs, dir);
 	es.newblocks = 0;
 	es.dir = dir;
 
diff --git a/lib/ext2fs/ext2fs.h b/lib/ext2fs/ext2fs.h
index 819a14a..09423ac 100644
--- a/lib/ext2fs/ext2fs.h
+++ b/lib/ext2fs/ext2fs.h
@@ -690,6 +690,7 @@ extern void ext2fs_set_alloc_block_callback(ext2_filsys fs,
 					    errcode_t (**old)(ext2_filsys fs,
 							      blk64_t goal,
 							      blk64_t *ret));
+blk64_t ext2fs_find_inode_goal(ext2_filsys fs, ext2_ino_t ino);
 
 /* alloc_sb.c */
 extern int ext2fs_reserve_super_and_bgd(ext2_filsys fs,
diff --git a/lib/ext2fs/ext_attr.c b/lib/ext2fs/ext_attr.c
index 308d21d..a756b7b 100644
--- a/lib/ext2fs/ext_attr.c
+++ b/lib/ext2fs/ext_attr.c
@@ -404,8 +404,7 @@ static errcode_t prep_ea_block_for_write(ext2_filsys fs, ext2_ino_t ino,
 	}
 
 	/* Allocate a block */
-	grp = ext2fs_group_of_ino(fs, ino);
-	goal = ext2fs_inode_table_loc(fs, grp);
+	goal = ext2fs_find_inode_goal(fs, ino);
 	err = ext2fs_alloc_block2(fs, goal, NULL, &blk);
 	if (err)
 		goto out2;
diff --git a/lib/ext2fs/extent.c b/lib/ext2fs/extent.c
index c0b34a7..3b27113 100644
--- a/lib/ext2fs/extent.c
+++ b/lib/ext2fs/extent.c
@@ -1010,14 +1010,8 @@ static errcode_t extent_node_split(ext2_extent_handle_t handle,
 		goto done;
 	}
 
-	if (!goal_blk) {
-		dgrp_t	group = ext2fs_group_of_ino(handle->fs, handle->ino);
-		__u8	log_flex = handle->fs->super->s_log_groups_per_flex;
-
-		if (log_flex)
-			group = group & ~((1 << (log_flex)) - 1);
-		goal_blk = ext2fs_group_first_block2(handle->fs, group);
-	}
+	if (!goal_blk)
+		goal_blk = ext2fs_find_inode_goal(handle->fs, handle->ino);
 	retval = ext2fs_alloc_block2(handle->fs, goal_blk, block_buf,
 				    &new_node_pblk);
 	if (retval)
diff --git a/lib/ext2fs/mkdir.c b/lib/ext2fs/mkdir.c
index c4c7967..36b1810 100644
--- a/lib/ext2fs/mkdir.c
+++ b/lib/ext2fs/mkdir.c
@@ -69,7 +69,8 @@ errcode_t ext2fs_mkdir(ext2_filsys fs, ext2_ino_t parent, ext2_ino_t inum,
 	 * Allocate a data block for the directory
 	 */
 	if (!inline_data) {
-		retval = ext2fs_new_block2(fs, 0, 0, &blk);
+		retval = ext2fs_new_block2(fs, ext2fs_find_inode_goal(fs, ino),
+					   0, &blk);
 		if (retval)
 			goto cleanup;
 	}
diff --git a/lib/ext2fs/symlink.c b/lib/ext2fs/symlink.c
index b2ef66c..cb3a2e7 100644
--- a/lib/ext2fs/symlink.c
+++ b/lib/ext2fs/symlink.c
@@ -53,7 +53,8 @@ errcode_t ext2fs_symlink(ext2_filsys fs, ext2_ino_t parent, ext2_ino_t ino,
 	 */
 	fastlink = (target_len < sizeof(inode.i_block));
 	if (!fastlink) {
-		retval = ext2fs_new_block2(fs, 0, 0, &blk);
+		retval = ext2fs_new_block2(fs, ext2fs_find_inode_goal(fs, ino),
+					   0, &blk);
 		if (retval)
 			goto cleanup;
 		retval = ext2fs_get_mem(fs->blocksize, &block_buf);



* [PATCH 27/37] libext2fs: find a range of empty blocks
  2014-05-01 23:12 [PATCH 00/37] e2fsprogs patchbomb 5/14 Darrick J. Wong
                   ` (25 preceding siblings ...)
  2014-05-01 23:15 ` [PATCH 26/37] libext2fs: find inode goal when allocating blocks Darrick J. Wong
@ 2014-05-01 23:15 ` Darrick J. Wong
  2014-05-01 23:15 ` [PATCH 28/37] libext2fs: provide a function to set inode size Darrick J. Wong
                   ` (7 subsequent siblings)
  34 siblings, 0 replies; 91+ messages in thread
From: Darrick J. Wong @ 2014-05-01 23:15 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

Provide a function that, given a goal pblk and a range, will try to
find a run of free blocks to satisfy the allocation.  By default the
function will look anywhere in the filesystem for the run, though this
can be constrained with optional flags.  One flag indicates that the
range must start at the goal block; the other flag indicates that we
should not return a range shorter than len.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 lib/ext2fs/alloc.c  |  105 +++++++++++++++++++++++++++++++++++++++++++++++++++
 lib/ext2fs/ext2fs.h |    6 +++
 2 files changed, 111 insertions(+)
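
The effect of the two flags is easiest to see on a tiny in-memory
bitmap.  The sketch below is a simplified model under assumptions, not
the libext2fs implementation: the TOY_* names, the byte-per-block
"bitmap" and the -1 error return are all made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_FIXED_GOAL		0x1	/* range must start at goal */
#define TOY_EXACT_LENGTH	0x2	/* never return fewer than len blocks */

/*
 * used[i] != 0 means block i is allocated.  Starting at goal, look for a
 * run of free blocks of up to len blocks, wrapping around once.
 * Returns 0 and fills *pblk / *plen on success, -1 if nothing fits.
 */
static int toy_new_range(const unsigned char *used, size_t nblocks, int flags,
			 size_t goal, size_t len, size_t *pblk, size_t *plen)
{
	size_t scanned = 0, start, end;

	assert(len > 0 && nblocks > 0);
	if (goal >= nblocks)
		goal = 0;
	start = goal;

	while (scanned < nblocks) {
		if (used[start]) {
			if (flags & TOY_FIXED_GOAL)
				return -1;
			start = (start + 1) % nblocks;
			scanned++;
			continue;
		}
		/* measure the free run at start, capped at len blocks */
		end = start;
		while (end < nblocks && end - start < len && !used[end])
			end++;
		if (!(flags & TOY_EXACT_LENGTH) || end - start >= len) {
			*pblk = start;
			*plen = end - start;
			return 0;
		}
		if (flags & TOY_FIXED_GOAL)
			return -1;
		scanned += end - start;
		start = end % nblocks;
	}
	return -1;
}
```

On the map 1100100001 (1 = in use), asking for 4 blocks from goal 0
without flags returns the short run at block 2 (length 2); with
TOY_EXACT_LENGTH it skips ahead and returns blocks 5-8.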


diff --git a/lib/ext2fs/alloc.c b/lib/ext2fs/alloc.c
index aa084ac..109a050 100644
--- a/lib/ext2fs/alloc.c
+++ b/lib/ext2fs/alloc.c
@@ -26,6 +26,16 @@
 #include "ext2_fs.h"
 #include "ext2fs.h"
 
+#define min(a, b) ((a) < (b) ? (a) : (b))
+
+#undef DEBUG
+
+#ifdef DEBUG
+# define dbg_printf(f, a...)  do {printf(f, ## a); fflush(stdout); } while (0)
+#else
+# define dbg_printf(f, a...)
+#endif
+
 /*
  * Clear the uninit block bitmap flag if necessary
  */
@@ -303,3 +313,98 @@ blk64_t ext2fs_find_inode_goal(ext2_filsys fs, ext2_ino_t ino)
 		group = group & ~((1 << (log_flex)) - 1);
 	return ext2fs_group_first_block2(fs, group);
 }
+
+/*
+ * Starting at _goal_, scan around the filesystem to find a run of free blocks
+ * that's at least _len_ blocks long.  If EXT2_NEWRANGE_FIXED_GOAL is given,
+ * then the range of blocks must start at _goal_.  If
+ * EXT2_NEWRANGE_EXACT_LENGTH is given, do not return an allocation shorter than
+ * _len_.
+ *
+ * The starting block is returned in _pblk_ and the length is returned via
+ * _plen_.
+ */
+errcode_t ext2fs_new_range(ext2_filsys fs, int flags, blk64_t goal,
+			   blk64_t len, ext2fs_block_bitmap map, blk64_t *pblk,
+			   blk64_t *plen)
+{
+	errcode_t retval;
+	blk64_t start, end, b;
+	int looped = 0;
+	blk64_t max_blocks = ext2fs_blocks_count(fs->super);
+
+	dbg_printf("%s: flags=0x%x goal=%llu len=%llu\n", __func__, flags,
+		   goal, len);
+	EXT2_CHECK_MAGIC(fs, EXT2_ET_MAGIC_EXT2FS_FILSYS);
+	if (len == 0 || (flags & ~EXT2_NEWRANGE_ALL_FLAGS))
+		return EXT2_ET_INVALID_ARGUMENT;
+	if (!map)
+		map = fs->block_map;
+	if (!map)
+		return EXT2_ET_NO_BLOCK_BITMAP;
+	if (!goal || goal >= ext2fs_blocks_count(fs->super))
+		goal = fs->super->s_first_data_block;
+
+	start = goal;
+	while (!looped || start <= goal) {
+		retval = ext2fs_find_first_zero_block_bitmap2(map,
+						start, max_blocks - 1, &start);
+		if (retval == ENOENT) {
+			/*
+			 * If there are no free blocks beyond the starting
+			 * point, try scanning the whole filesystem, unless the
+			 * user told us only to allocate from _goal_, or if
+			 * we're already scanning the whole filesystem.
+			 */
+			if (flags & EXT2_NEWRANGE_FIXED_GOAL ||
+			    start == fs->super->s_first_data_block)
+				goto fail;
+			start = fs->super->s_first_data_block;
+			continue;
+		} else if (retval)
+			goto errout;
+
+		if (flags & EXT2_NEWRANGE_FIXED_GOAL && start != goal)
+			goto fail;
+
+		b = min(start + len - 1, max_blocks - 1);
+		retval = ext2fs_find_first_set_block_bitmap2(map,
+						start, b, &end);
+		if (retval == ENOENT)
+			end = b + 1;
+		else if (retval)
+			goto errout;
+
+		if (!(flags & EXT2_NEWRANGE_EXACT_LENGTH) ||
+		    (end - start) >= len) {
+			*pblk = start;
+			*plen = end - start;
+			dbg_printf("%s: new_range goal=%llu--%llu "
+				   "blk=%llu--%llu %llu\n",
+				   __func__, goal, goal + len - 1,
+				   *pblk, *pblk + *plen - 1, *plen);
+
+			for (b = start; b < end;
+			     b += fs->super->s_blocks_per_group)
+				clear_block_uninit(fs,
+						ext2fs_group_of_blk2(fs, b));
+			return 0;
+		}
+
+try_again:
+		if (flags & EXT2_NEWRANGE_FIXED_GOAL)
+			goto fail;
+		start = end;
+		if (start >= max_blocks) {
+			if (looped)
+				goto fail;
+			looped = 1;
+			start = fs->super->s_first_data_block;
+		}
+	}
+
+fail:
+	retval = EXT2_ET_BLOCK_ALLOC_FAIL;
+errout:
+	return retval;
+}
diff --git a/lib/ext2fs/ext2fs.h b/lib/ext2fs/ext2fs.h
index 09423ac..ca35d24 100644
--- a/lib/ext2fs/ext2fs.h
+++ b/lib/ext2fs/ext2fs.h
@@ -691,6 +691,12 @@ extern void ext2fs_set_alloc_block_callback(ext2_filsys fs,
 							      blk64_t goal,
 							      blk64_t *ret));
 blk64_t ext2fs_find_inode_goal(ext2_filsys fs, ext2_ino_t ino);
+#define EXT2_NEWRANGE_FIXED_GOAL	(0x1)
+#define EXT2_NEWRANGE_EXACT_LENGTH	(0x2)
+#define EXT2_NEWRANGE_ALL_FLAGS		(0x3)
+errcode_t ext2fs_new_range(ext2_filsys fs, int flags, blk64_t goal,
+			   blk64_t len, ext2fs_block_bitmap map, blk64_t *pblk,
+			   blk64_t *plen);
 
 /* alloc_sb.c */
 extern int ext2fs_reserve_super_and_bgd(ext2_filsys fs,



* [PATCH 28/37] libext2fs: provide a function to set inode size
  2014-05-01 23:12 [PATCH 00/37] e2fsprogs patchbomb 5/14 Darrick J. Wong
                   ` (26 preceding siblings ...)
  2014-05-01 23:15 ` [PATCH 27/37] libext2fs: find a range of empty blocks Darrick J. Wong
@ 2014-05-01 23:15 ` Darrick J. Wong
  2014-07-26 18:37   ` Theodore Ts'o
  2014-05-01 23:15 ` [PATCH 29/37] libext2fs: implement fallocate Darrick J. Wong
                   ` (6 subsequent siblings)
  34 siblings, 1 reply; 91+ messages in thread
From: Darrick J. Wong @ 2014-05-01 23:15 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

Provide an API that sets the size of an inode (i_size and i_size_high)
and takes care of all required feature flag modifications.  Refactor
existing callers to use this new function.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 e2fsck/pass1.c              |    9 ++++-----
 e2fsck/pass2.c              |   11 +++++++++--
 e2fsck/pass3.c              |    5 +++--
 e2fsck/rehash.c             |    5 ++++-
 lib/ext2fs/bb_inode.c       |    5 ++++-
 lib/ext2fs/ext2fs.h         |    2 ++
 lib/ext2fs/fileio.c         |   41 ++++++++++++++++++++++++++++-------------
 lib/ext2fs/mkjournal.c      |    8 +++-----
 lib/ext2fs/res_gdt.c        |    9 +++------
 lib/ext2fs/symlink.c        |    2 +-
 misc/create_inode.c         |    7 ++++++-
 tests/f_big_sparse/expect.1 |    5 -----
 12 files changed, 67 insertions(+), 42 deletions(-)


diff --git a/e2fsck/pass1.c b/e2fsck/pass1.c
index 376ee23..0705899 100644
--- a/e2fsck/pass1.c
+++ b/e2fsck/pass1.c
@@ -265,8 +265,7 @@ static void check_size(e2fsck_t ctx, struct problem_context *pctx)
 	if (!fix_problem(ctx, PR_1_SET_NONZSIZE, pctx))
 		return;
 
-	inode->i_size = 0;
-	inode->i_size_high = 0;
+	ext2fs_inode_set_size(ctx->fs, inode, 0);
 	e2fsck_write_inode(ctx, pctx->ino, pctx->inode, "pass1");
 }
 
@@ -2454,9 +2453,9 @@ static void check_blocks(e2fsck_t ctx, struct problem_context *pctx,
 		pctx->num = (pb.last_block+1) * fs->blocksize;
 		pctx->group = bad_size;
 		if (fix_problem(ctx, PR_1_BAD_I_SIZE, pctx)) {
-			inode->i_size = pctx->num;
-			if (!LINUX_S_ISDIR(inode->i_mode))
-				inode->i_size_high = pctx->num >> 32;
+			if (LINUX_S_ISDIR(inode->i_mode))
+				pctx->num &= 0xFFFFFFFFULL;
+			ext2fs_inode_set_size(fs, inode, pctx->num);
 			dirty_inode++;
 		}
 		pctx->num = 0;
diff --git a/e2fsck/pass2.c b/e2fsck/pass2.c
index 4b19cb8..7a597de 100644
--- a/e2fsck/pass2.c
+++ b/e2fsck/pass2.c
@@ -1768,8 +1768,15 @@ static int allocate_dir_block(e2fsck_t ctx,
 	 */
 	e2fsck_read_inode(ctx, db->ino, &inode, "allocate_dir_block");
 	ext2fs_iblk_add_blocks(fs, &inode, 1);
-	if (inode.i_size < (db->blockcnt+1) * fs->blocksize)
-		inode.i_size = (db->blockcnt+1) * fs->blocksize;
+	if (EXT2_I_SIZE(&inode) < (db->blockcnt+1) * fs->blocksize) {
+		pctx->errcode = ext2fs_inode_set_size(fs, &inode,
+					(db->blockcnt+1) * fs->blocksize);
+		if (pctx->errcode) {
+			pctx->str = "ext2fs_inode_set_size";
+			fix_problem(ctx, PR_2_ALLOC_DIRBOCK, pctx);
+			return 1;
+		}
+	}
 	e2fsck_write_inode(ctx, db->ino, &inode, "allocate_dir_block");
 
 	/*
diff --git a/e2fsck/pass3.c b/e2fsck/pass3.c
index efc0d49..324e398 100644
--- a/e2fsck/pass3.c
+++ b/e2fsck/pass3.c
@@ -865,8 +865,9 @@ errcode_t e2fsck_expand_directory(e2fsck_t ctx, ext2_ino_t dir,
 		return retval;
 
 	sz = (es.last_block + 1) * fs->blocksize;
-	inode.i_size = sz;
-	inode.i_size_high = sz >> 32;
+	retval = ext2fs_inode_set_size(fs, &inode, sz);
+	if (retval)
+		return retval;
 	ext2fs_iblk_add_blocks(fs, &inode, es.newblocks);
 	quota_data_add(ctx->qctx, &inode, dir, es.newblocks * fs->blocksize);
 
diff --git a/e2fsck/rehash.c b/e2fsck/rehash.c
index 89708c2..09c55e5 100644
--- a/e2fsck/rehash.c
+++ b/e2fsck/rehash.c
@@ -783,7 +783,10 @@ static errcode_t write_directory(e2fsck_t ctx, ext2_filsys fs,
 		inode.i_flags &= ~EXT2_INDEX_FL;
 	else
 		inode.i_flags |= EXT2_INDEX_FL;
-	inode.i_size = outdir->num * fs->blocksize;
+	retval = ext2fs_inode_set_size(fs, &inode,
+				       outdir->num * fs->blocksize);
+	if (retval)
+		return retval;
 	ext2fs_iblk_sub_blocks(fs, &inode, wd.cleared);
 	e2fsck_write_inode(ctx, ino, &inode, "rehash_dir");
 
diff --git a/lib/ext2fs/bb_inode.c b/lib/ext2fs/bb_inode.c
index 268eecf..3d9132b 100644
--- a/lib/ext2fs/bb_inode.c
+++ b/lib/ext2fs/bb_inode.c
@@ -128,7 +128,10 @@ errcode_t ext2fs_update_bb_inode(ext2_filsys fs, ext2_badblocks_list bb_list)
 	if (!inode.i_ctime)
 		inode.i_ctime = fs->now ? fs->now : time(0);
 	ext2fs_iblk_set(fs, &inode, rec.bad_block_count);
-	inode.i_size = rec.bad_block_count * fs->blocksize;
+	retval = ext2fs_inode_set_size(fs, &inode,
+				       rec.bad_block_count * fs->blocksize);
+	if (retval)
+		goto cleanup;
 
 	retval = ext2fs_write_inode(fs, EXT2_BAD_INO, &inode);
 	if (retval)
diff --git a/lib/ext2fs/ext2fs.h b/lib/ext2fs/ext2fs.h
index ca35d24..3d7374e 100644
--- a/lib/ext2fs/ext2fs.h
+++ b/lib/ext2fs/ext2fs.h
@@ -1245,6 +1245,8 @@ errcode_t ext2fs_file_get_lsize(ext2_file_t file, __u64 *ret_size);
 extern ext2_off_t ext2fs_file_get_size(ext2_file_t file);
 extern errcode_t ext2fs_file_set_size(ext2_file_t file, ext2_off_t size);
 extern errcode_t ext2fs_file_set_size2(ext2_file_t file, ext2_off64_t size);
+errcode_t ext2fs_inode_set_size(ext2_filsys fs, struct ext2_inode *inode,
+				ext2_off64_t size);
 
 /* finddev.c */
 extern char *ext2fs_find_block_device(dev_t device);
diff --git a/lib/ext2fs/fileio.c b/lib/ext2fs/fileio.c
index 1e386f8..55affb4 100644
--- a/lib/ext2fs/fileio.c
+++ b/lib/ext2fs/fileio.c
@@ -567,6 +567,31 @@ out:
 	return retval;
 }
 
+errcode_t ext2fs_inode_set_size(ext2_filsys fs, struct ext2_inode *inode,
+				ext2_off64_t size)
+{
+	/* Only regular files get to be larger than 4GB */
+	if (!LINUX_S_ISREG(inode->i_mode) && (size >> 32))
+		return EXT2_ET_FILE_TOO_BIG;
+
+	/* If we're writing a large file, set the large_file flag */
+	if (LINUX_S_ISREG(inode->i_mode) &&
+	    ext2fs_needs_large_file_feature(size) &&
+	    (!EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
+					 EXT2_FEATURE_RO_COMPAT_LARGE_FILE) ||
+	     fs->super->s_rev_level == EXT2_GOOD_OLD_REV)) {
+		fs->super->s_feature_ro_compat |=
+					EXT2_FEATURE_RO_COMPAT_LARGE_FILE;
+		ext2fs_update_dynamic_rev(fs);
+		ext2fs_mark_super_dirty(fs);
+	}
+
+	inode->i_size = size & 0xffffffff;
+	inode->i_size_high = (size >> 32);
+
+	return 0;
+}
+
 /*
  * This function sets the size of the file, truncating it if necessary
  *
@@ -588,20 +613,10 @@ errcode_t ext2fs_file_set_size2(ext2_file_t file, ext2_off64_t size)
 	old_truncate = ((old_size + file->fs->blocksize - 1) >>
 		      EXT2_BLOCK_SIZE_BITS(file->fs->super));
 
-	/* If we're writing a large file, set the large_file flag */
-	if (LINUX_S_ISREG(file->inode.i_mode) &&
-	    ext2fs_needs_large_file_feature(EXT2_I_SIZE(&file->inode)) &&
-	    (!EXT2_HAS_RO_COMPAT_FEATURE(file->fs->super,
-					 EXT2_FEATURE_RO_COMPAT_LARGE_FILE) ||
-	     file->fs->super->s_rev_level == EXT2_GOOD_OLD_REV)) {
-		file->fs->super->s_feature_ro_compat |=
-				EXT2_FEATURE_RO_COMPAT_LARGE_FILE;
-		ext2fs_update_dynamic_rev(file->fs);
-		ext2fs_mark_super_dirty(file->fs);
-	}
+	retval = ext2fs_inode_set_size(file->fs, &file->inode, size);
+	if (retval)
+		return retval;
 
-	file->inode.i_size = size & 0xffffffff;
-	file->inode.i_size_high = (size >> 32);
 	if (file->ino) {
 		retval = ext2fs_write_inode(file->fs, file->ino, &file->inode);
 		if (retval)
diff --git a/lib/ext2fs/mkjournal.c b/lib/ext2fs/mkjournal.c
index ecc3912..11f33ab 100644
--- a/lib/ext2fs/mkjournal.c
+++ b/lib/ext2fs/mkjournal.c
@@ -400,15 +400,13 @@ static errcode_t write_journal_inode(ext2_filsys fs, ext2_ino_t journal_ino,
 		goto errout;
 
 	inode_size = (unsigned long long)fs->blocksize * num_blocks;
-	inode.i_size = inode_size & 0xFFFFFFFF;
-	inode.i_size_high = (inode_size >> 32) & 0xFFFFFFFF;
-	if (ext2fs_needs_large_file_feature(inode_size))
-		fs->super->s_feature_ro_compat |=
-			EXT2_FEATURE_RO_COMPAT_LARGE_FILE;
 	ext2fs_iblk_add_blocks(fs, &inode, es.newblocks);
 	inode.i_mtime = inode.i_ctime = fs->now ? fs->now : time(0);
 	inode.i_links_count = 1;
 	inode.i_mode = LINUX_S_IFREG | 0600;
+	retval = ext2fs_inode_set_size(fs, &inode, inode_size);
+	if (retval)
+		goto errout;
 
 	if ((retval = ext2fs_write_new_inode(fs, journal_ino, &inode)))
 		goto errout;
diff --git a/lib/ext2fs/res_gdt.c b/lib/ext2fs/res_gdt.c
index e61c330..1343ce6 100644
--- a/lib/ext2fs/res_gdt.c
+++ b/lib/ext2fs/res_gdt.c
@@ -133,12 +133,9 @@ errcode_t ext2fs_create_resize_inode(ext2_filsys fs)
 		dindir_dirty = inode_dirty = 1;
 		inode_size = apb*apb + apb + EXT2_NDIR_BLOCKS;
 		inode_size *= fs->blocksize;
-		inode.i_size = inode_size & 0xFFFFFFFF;
-		inode.i_size_high = (inode_size >> 32) & 0xFFFFFFFF;
-		if(inode.i_size_high) {
-			sb->s_feature_ro_compat |=
-				EXT2_FEATURE_RO_COMPAT_LARGE_FILE;
-		}
+		retval = ext2fs_inode_set_size(fs, &inode, inode_size);
+		if (retval)
+			goto out_free;
 		inode.i_ctime = fs->now ? fs->now : time(0);
 	}
 
diff --git a/lib/ext2fs/symlink.c b/lib/ext2fs/symlink.c
index cb3a2e7..4147181 100644
--- a/lib/ext2fs/symlink.c
+++ b/lib/ext2fs/symlink.c
@@ -80,7 +80,7 @@ errcode_t ext2fs_symlink(ext2_filsys fs, ext2_ino_t parent, ext2_ino_t ino,
 	inode.i_uid = inode.i_gid = 0;
 	ext2fs_iblk_set(fs, &inode, fastlink ? 0 : 1);
 	inode.i_links_count = 1;
-	inode.i_size = target_len;
+	ext2fs_inode_set_size(fs, &inode, target_len);
 	/* The time fields are set by ext2fs_write_new_inode() */
 
 	if (fastlink) {
diff --git a/misc/create_inode.c b/misc/create_inode.c
index e7faab1..ec98afe 100644
--- a/misc/create_inode.c
+++ b/misc/create_inode.c
@@ -405,7 +405,12 @@ errcode_t do_write_internal(ext2_filsys fs, ext2_ino_t cwd, const char *src,
 	inode.i_atime = inode.i_ctime = inode.i_mtime =
 		fs->now ? fs->now : time(0);
 	inode.i_links_count = 1;
-	inode.i_size = statbuf.st_size;
+	retval = ext2fs_inode_set_size(fs, &inode, statbuf.st_size);
+	if (retval) {
+		com_err(dest, retval, 0);
+		close(fd);
+		return retval;
+	}
 	if (EXT2_HAS_INCOMPAT_FEATURE(fs->super,
 				      EXT4_FEATURE_INCOMPAT_INLINE_DATA)) {
 		inode.i_flags |= EXT4_INLINE_DATA_FL;
diff --git a/tests/f_big_sparse/expect.1 b/tests/f_big_sparse/expect.1
index 437ade7..eac82ed 100644
--- a/tests/f_big_sparse/expect.1
+++ b/tests/f_big_sparse/expect.1
@@ -2,11 +2,6 @@ Pass 1: Checking inodes, blocks, and sizes
 Inode 12, i_size is 61440, should be 4398050758656.  Fix? yes
 
 Pass 2: Checking directory structure
-Filesystem contains large files, but lacks LARGE_FILE flag in superblock.
-Fix? yes
-
-Filesystem has feature flag(s) set, but is a revision 0 filesystem.  Fix? yes


* [PATCH 29/37] libext2fs: implement fallocate
  2014-05-01 23:12 [PATCH 00/37] e2fsprogs patchbomb 5/14 Darrick J. Wong
                   ` (27 preceding siblings ...)
  2014-05-01 23:15 ` [PATCH 28/37] libext2fs: provide a function to set inode size Darrick J. Wong
@ 2014-05-01 23:15 ` Darrick J. Wong
  2014-05-01 23:15 ` [PATCH 31/37] fuse2fs: translate ACL structures Darrick J. Wong
                   ` (5 subsequent siblings)
  34 siblings, 0 replies; 91+ messages in thread
From: Darrick J. Wong @ 2014-05-01 23:15 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

Create a library function that preallocates blocks for arbitrary files
(fallocate), and wire up a few users of this function.  This is a bit
more involved than Ted's original mk_hugefiles implementation, since we
have to honor any blocks that may already be allocated to the file.
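
Much of the extra work is cluster arithmetic on bigalloc filesystems.
As a rough standalone sketch (not the library code), the helpers below
mirror three calculations the patch relies on: rounding a block count
up to whole clusters, giving the allocation goal the same offset
within its cluster as the start of the range, and counting the blocks
between the start of the range and the next cluster boundary.  The
demo_* names are hypothetical, and the sketch assumes that the cluster
ratio is a power of two and that the cluster mask equals ratio - 1.

```c
#include <assert.h>
#include <stdint.h>

/* Number of clusters needed to cover len blocks (ratio = blocks per cluster). */
static uint64_t demo_clusters_for(uint64_t len, uint64_t ratio)
{
	return (len + ratio - 1) / ratio;
}

/*
 * Keep the cluster of goal but take the in-cluster offset from
 * range_start, so the physical and logical blocks line up.
 */
static uint64_t demo_align_goal(uint64_t goal, uint64_t range_start,
				uint64_t ratio)
{
	uint64_t mask = ratio - 1;	/* assumes ratio is a power of two */

	return (goal & ~mask) | (range_start & mask);
}

/*
 * Blocks from range_start up to the next cluster boundary; zero if
 * range_start already sits on a boundary.
 */
static uint64_t demo_left_fill(uint64_t range_start, uint64_t ratio)
{
	uint64_t mask = ratio - 1;

	return (ratio - (range_start & mask)) & mask;
}
```

With 16 blocks per cluster, a range starting at logical block 37 sits
5 blocks into its cluster, so a goal of 1000 becomes 997 and there are
11 blocks left to fill before the next cluster boundary.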

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 lib/ext2fs/Makefile.in |    8 
 lib/ext2fs/ext2fs.h    |   10 +
 lib/ext2fs/fallocate.c |  835 ++++++++++++++++++++++++++++++++++++++++++++++++
 misc/mk_hugefiles.c    |   91 +----
 4 files changed, 863 insertions(+), 81 deletions(-)
 create mode 100644 lib/ext2fs/fallocate.c


diff --git a/lib/ext2fs/Makefile.in b/lib/ext2fs/Makefile.in
index f287a57..1e3794c 100644
--- a/lib/ext2fs/Makefile.in
+++ b/lib/ext2fs/Makefile.in
@@ -44,6 +44,7 @@ OBJS= $(DEBUGFS_LIB_OBJS) $(RESIZE_LIB_OBJS) $(E2IMAGE_LIB_OBJS) \
 	expanddir.o \
 	ext_attr.o \
 	extent.o \
+	fallocate.o \
 	fileio.o \
 	finddev.o \
 	flushb.o \
@@ -682,6 +683,13 @@ extent.o: $(srcdir)/extent.c $(top_builddir)/lib/config.h \
  $(top_srcdir)/lib/et/com_err.h $(srcdir)/ext2_io.h \
  $(top_builddir)/lib/ext2fs/ext2_err.h $(srcdir)/ext2_ext_attr.h \
  $(srcdir)/bitops.h $(srcdir)/e2image.h
+fallocate.o: $(srcdir)/fallocate.c $(top_builddir)/lib/config.h \
+ $(top_builddir)/lib/dirpaths.h $(srcdir)/ext2_fs.h \
+ $(top_builddir)/lib/ext2fs/ext2_types.h $(srcdir)/ext2fsP.h \
+ $(srcdir)/ext2fs.h $(srcdir)/ext2_fs.h $(srcdir)/ext3_extents.h \
+ $(top_srcdir)/lib/et/com_err.h $(srcdir)/ext2_io.h \
+ $(top_builddir)/lib/ext2fs/ext2_err.h $(srcdir)/ext2_ext_attr.h \
+ $(srcdir)/bitops.h $(srcdir)/e2image.h
 fileio.o: $(srcdir)/fileio.c $(top_builddir)/lib/config.h \
  $(top_builddir)/lib/dirpaths.h $(srcdir)/ext2_fs.h \
  $(top_builddir)/lib/ext2fs/ext2_types.h $(srcdir)/ext2fs.h \
diff --git a/lib/ext2fs/ext2fs.h b/lib/ext2fs/ext2fs.h
index 3d7374e..84c7c74 100644
--- a/lib/ext2fs/ext2fs.h
+++ b/lib/ext2fs/ext2fs.h
@@ -1222,6 +1222,16 @@ extern errcode_t ext2fs_extent_goto2(ext2_extent_handle_t handle,
 				     int leaf_level, blk64_t blk);
 extern errcode_t ext2fs_extent_fix_parents(ext2_extent_handle_t handle);
 
+/* fallocate.c */
+#define EXT2_FALLOCATE_ZERO_BLOCKS	(0x1)
+#define EXT2_FALLOCATE_FORCE_INIT	(0x2)
+#define EXT2_FALLOCATE_FORCE_UNINIT	(0x4)
+#define EXT2_FALLOCATE_INIT_BEYOND_EOF	(0x8)
+#define EXT2_FALLOCATE_ALL_FLAGS	(0xF)
+errcode_t ext2fs_fallocate(ext2_filsys fs, int flags, ext2_ino_t ino,
+			   struct ext2_inode *inode,
+			   blk64_t start, blk64_t len);
+
 /* fileio.c */
 extern errcode_t ext2fs_file_open2(ext2_filsys fs, ext2_ino_t ino,
 				   struct ext2_inode *inode,
diff --git a/lib/ext2fs/fallocate.c b/lib/ext2fs/fallocate.c
new file mode 100644
index 0000000..5e91037
--- /dev/null
+++ b/lib/ext2fs/fallocate.c
@@ -0,0 +1,835 @@
+/*
+ * fallocate.c -- Allocate large chunks of file.
+ *
+ * Copyright (C) 2014 Oracle.
+ *
+ * %Begin-Header%
+ * This file may be redistributed under the terms of the GNU Library
+ * General Public License, version 2.
+ * %End-Header%
+ */
+
+#include "config.h"
+
+#include "ext2_fs.h"
+#include "ext2fs.h"
+#define min(a, b) ((a) < (b) ? (a) : (b))
+
+#undef DEBUG
+
+#ifdef DEBUG
+# define dbg_printf(f, a...)  do {printf(f, ## a); fflush(stdout); } while (0)
+#else
+# define dbg_printf(f, a...)
+#endif
+
+/*
+ * Extent-based fallocate code.
+ *
+ * Find runs of unmapped logical blocks by starting at start and walking the
+ * extents until we reach the end of the range we want.
+ *
+ * For each run of unmapped blocks, try to find the extents on either side of
+ * the range.  If there's a left extent that can grow by at least a cluster and
+ * there are lblocks between start and the next lcluster after start, see if
+ * there's an implied cluster allocation; if so, zero the blocks (if the left
+ * extent is initialized) and adjust the extent.  Ditto for the blocks between
+ * the end of the last full lcluster and end, if there's a right extent.
+ *
+ * Try to attach as much as we can to the left extent, then try to attach as
+ * much as we can to the right extent.  For the remainder, try to allocate the
+ * whole range; map in whatever we get; and repeat until we're done.
+ *
+ * To attach to a left extent, figure out the maximum amount we can add to the
+ * extent and try to allocate that much, and append if successful.  To attach
+ * to a right extent, figure out the max we can add to the extent, try to
+ * allocate that much, and prepend if successful.
+ *
+ * We need an alloc_range function that tells us how much we can allocate given
+ * a maximum length and one of a suggested start, a fixed start, or a fixed end
+ * point.
+ *
+ * Every time we modify the extent tree we also need to update the block stats.
+ *
+ * At the end, update i_blocks and i_size appropriately.
+ */
+
+static void dbg_print_extent(char *desc, struct ext2fs_extent *extent)
+{
+#ifdef DEBUG
+	if (desc)
+		printf("%s: ", desc);
+	printf("extent: lblk %llu--%llu, len %u, pblk %llu, flags: ",
+	       extent->e_lblk, extent->e_lblk + extent->e_len - 1,
+	       extent->e_len, extent->e_pblk);
+	if (extent->e_flags & EXT2_EXTENT_FLAGS_LEAF)
+		fputs("LEAF ", stdout);
+	if (extent->e_flags & EXT2_EXTENT_FLAGS_UNINIT)
+		fputs("UNINIT ", stdout);
+	if (extent->e_flags & EXT2_EXTENT_FLAGS_SECOND_VISIT)
+		fputs("2ND_VISIT ", stdout);
+	if (!extent->e_flags)
+		fputs("(none)", stdout);
+	fputc('\n', stdout);
+	fflush(stdout);
+#endif
+}
+
+static errcode_t claim_range(ext2_filsys fs, struct ext2_inode *inode,
+			     blk64_t blk, blk64_t len)
+{
+	blk64_t	clusters;
+
+	clusters = (len + EXT2FS_CLUSTER_RATIO(fs) - 1) /
+		   EXT2FS_CLUSTER_RATIO(fs);
+	ext2fs_block_alloc_stats_range(fs, blk,
+			clusters * EXT2FS_CLUSTER_RATIO(fs), +1);
+	return ext2fs_iblk_add_blocks(fs, inode, clusters);
+}
+
+static errcode_t ext_falloc_helper(ext2_filsys fs,
+				   int flags,
+				   ext2_ino_t ino,
+				   struct ext2_inode *inode,
+				   ext2_extent_handle_t handle,
+				   struct ext2fs_extent *left_ext,
+				   struct ext2fs_extent *right_ext,
+				   blk64_t range_start, blk64_t range_len,
+				   blk64_t alloc_goal)
+{
+	struct ext2fs_extent	newex, ex;
+	int			op;
+	blk64_t			fillable, pblk, plen, x, cluster_fill, y;
+	blk64_t			eof_blk;
+	errcode_t		err;
+	blk_t			max_extent_len, max_uninit_len, max_init_len;
+
+#ifdef DEBUG
+	printf("%s: ", __func__);
+	if (left_ext)
+		printf("left_ext=%llu--%llu, ", left_ext->e_lblk,
+		       left_ext->e_lblk + left_ext->e_len - 1);
+	if (right_ext)
+		printf("right_ext=%llu--%llu, ", right_ext->e_lblk,
+		       right_ext->e_lblk + right_ext->e_len - 1);
+	printf("start=%llu len=%llu, goal=%llu\n", range_start, range_len,
+	       alloc_goal);
+	fflush(stdout);
+#endif
+	/* Can't create initialized extents past EOF? */
+	if (!(flags & EXT2_FALLOCATE_INIT_BEYOND_EOF))
+		eof_blk = EXT2_I_SIZE(inode) / fs->blocksize;
+
+	/* The allocation goal must be as far into a cluster as range_start. */
+	alloc_goal = (alloc_goal & ~EXT2FS_CLUSTER_MASK(fs)) |
+		     (range_start & EXT2FS_CLUSTER_MASK(fs));
+
+	max_uninit_len = EXT_UNINIT_MAX_LEN & ~EXT2FS_CLUSTER_MASK(fs);
+	max_init_len = EXT_INIT_MAX_LEN & ~EXT2FS_CLUSTER_MASK(fs);
+
+	/* We must lengthen the left extent to the end of the cluster */
+	if (left_ext && EXT2FS_CLUSTER_RATIO(fs) > 1) {
+		/* How many more blocks can be attached to left_ext? */
+		if (left_ext->e_flags & EXT2_EXTENT_FLAGS_UNINIT)
+			fillable = max_uninit_len - left_ext->e_len;
+		else
+			fillable = max_init_len - left_ext->e_len;
+
+		if (fillable > range_len)
+			fillable = range_len;
+		if (fillable == 0)
+			goto expand_right;
+
+		/*
+		 * If range_start isn't on a cluster boundary, try an
+		 * implied cluster allocation for left_ext.
+		 */
+		cluster_fill = EXT2FS_CLUSTER_RATIO(fs) -
+			       (range_start & EXT2FS_CLUSTER_MASK(fs));
+		cluster_fill &= EXT2FS_CLUSTER_MASK(fs);
+		if (cluster_fill == 0)
+			goto expand_right;
+
+		if (cluster_fill > fillable)
+			cluster_fill = fillable;
+
+		/* Don't expand an initialized left_ext beyond EOF */
+		if (!(flags & EXT2_FALLOCATE_INIT_BEYOND_EOF)) {
+			x = left_ext->e_lblk + left_ext->e_len - 1;
+			dbg_printf("%s: lend=%llu newlend=%llu eofblk=%llu\n",
+				   __func__, x, x + cluster_fill, eof_blk);
+			if (eof_blk >= x && eof_blk <= x + cluster_fill)
+				cluster_fill = eof_blk - x;
+			if (cluster_fill == 0)
+				goto expand_right;
+		}
+
+		err = ext2fs_extent_goto(handle, left_ext->e_lblk);
+		if (err)
+			goto expand_right;
+		left_ext->e_len += cluster_fill;
+		range_start += cluster_fill;
+		range_len -= cluster_fill;
+		alloc_goal += cluster_fill;
+
+		dbg_print_extent("ext_falloc clus left+", left_ext);
+		err = ext2fs_extent_replace(handle, 0, left_ext);
+		if (err)
+			goto out;
+		err = ext2fs_extent_fix_parents(handle);
+		if (err)
+			goto out;
+
+		/* Zero blocks */
+		if (!(left_ext->e_flags & EXT2_EXTENT_FLAGS_UNINIT)) {
+			err = ext2fs_zero_blocks2(fs, left_ext->e_pblk +
+						  left_ext->e_len -
+						  cluster_fill, cluster_fill,
+						  NULL, NULL);
+			if (err)
+				goto out;
+		}
+	}
+
+expand_right:
+	/* We must lengthen the right extent to the beginning of the cluster */
+	if (right_ext && EXT2FS_CLUSTER_RATIO(fs) > 1) {
+		/* How much can we attach to right_ext? */
+		if (right_ext->e_flags & EXT2_EXTENT_FLAGS_UNINIT)
+			fillable = max_uninit_len - right_ext->e_len;
+		else
+			fillable = max_init_len - right_ext->e_len;
+
+		if (fillable > range_len)
+			fillable = range_len;
+		if (fillable == 0)
+			goto try_merge;
+
+		/*
+		 * If range_end isn't on a cluster boundary, try an implied
+		 * cluster allocation for right_ext.
+		 */
+		cluster_fill = right_ext->e_lblk & EXT2FS_CLUSTER_MASK(fs);
+		if (cluster_fill == 0)
+			goto try_merge;
+
+		err = ext2fs_extent_goto(handle, right_ext->e_lblk);
+		if (err)
+			goto out;
+
+		if (cluster_fill > fillable)
+			cluster_fill = fillable;
+		right_ext->e_lblk -= cluster_fill;
+		right_ext->e_pblk -= cluster_fill;
+		right_ext->e_len += cluster_fill;
+		range_len -= cluster_fill;
+
+		dbg_print_extent("ext_falloc clus right+", right_ext);
+		err = ext2fs_extent_replace(handle, 0, right_ext);
+		if (err)
+			goto out;
+		err = ext2fs_extent_fix_parents(handle);
+		if (err)
+			goto out;
+
+		/* Zero blocks if necessary */
+		if (!(right_ext->e_flags & EXT2_EXTENT_FLAGS_UNINIT)) {
+			err = ext2fs_zero_blocks2(fs, right_ext->e_pblk,
+						  cluster_fill, NULL, NULL);
+			if (err)
+				goto out;
+		}
+	}
+
+try_merge:
+	/* Merge both extents together, perhaps? */
+	if (left_ext && right_ext) {
+		/* Are the two extents mergeable? */
+		if ((left_ext->e_flags & EXT2_EXTENT_FLAGS_UNINIT) !=
+		    (right_ext->e_flags & EXT2_EXTENT_FLAGS_UNINIT))
+			goto try_left;
+
+		/* User requires init/uninit but extent is uninit/init. */
+		if (((flags & EXT2_FALLOCATE_FORCE_INIT) &&
+		     (left_ext->e_flags & EXT2_EXTENT_FLAGS_UNINIT)) ||
+		    ((flags & EXT2_FALLOCATE_FORCE_UNINIT) &&
+		     !(left_ext->e_flags & EXT2_EXTENT_FLAGS_UNINIT)))
+			goto try_left;
+
+		/*
+		 * Skip initialized extent unless user wants to zero blocks
+		 * or requires init extent.
+		 */
+		if (!(left_ext->e_flags & EXT2_EXTENT_FLAGS_UNINIT) &&
+		    (!(flags & EXT2_FALLOCATE_ZERO_BLOCKS) ||
+		     !(flags & EXT2_FALLOCATE_FORCE_INIT)))
+			goto try_left;
+
+		/* Will it even fit? */
+		x = left_ext->e_len + range_len + right_ext->e_len;
+		if (x > (left_ext->e_flags & EXT2_EXTENT_FLAGS_UNINIT ?
+				max_uninit_len : max_init_len))
+			goto try_left;
+
+		err = ext2fs_extent_goto(handle, left_ext->e_lblk);
+		if (err)
+			goto try_left;
+
+		/* Allocate blocks */
+		y = left_ext->e_pblk + left_ext->e_len;
+		err = ext2fs_new_range(fs, EXT2_NEWRANGE_FIXED_GOAL |
+				       EXT2_NEWRANGE_EXACT_LENGTH, y,
+				       right_ext->e_pblk - y + 1, NULL,
+				       &pblk, &plen);
+		if (err)
+			goto try_left;
+		if (pblk + plen != right_ext->e_pblk)
+			goto try_left;
+		err = claim_range(fs, inode, pblk, plen);
+		if (err)
+			goto out;
+
+		/* Modify extents */
+		left_ext->e_len = x;
+		dbg_print_extent("ext_falloc merge", left_ext);
+		err = ext2fs_extent_replace(handle, 0, left_ext);
+		if (err)
+			goto out;
+		err = ext2fs_extent_fix_parents(handle);
+		if (err)
+			goto out;
+		err = ext2fs_extent_get(handle, EXT2_EXTENT_NEXT_LEAF, &newex);
+		if (err)
+			goto out;
+		err = ext2fs_extent_delete(handle, 0);
+		if (err)
+			goto out;
+		err = ext2fs_extent_fix_parents(handle);
+		if (err)
+			goto out;
+		*right_ext = *left_ext;
+
+		/* Zero blocks */
+		if (!(left_ext->e_flags & EXT2_EXTENT_FLAGS_UNINIT) &&
+		    (flags & EXT2_FALLOCATE_ZERO_BLOCKS)) {
+			err = ext2fs_zero_blocks2(fs, range_start, range_len,
+						  NULL, NULL);
+			if (err)
+				goto out;
+		}
+
+		return 0;
+	}
+
+try_left:
+	/* Extend the left extent */
+	if (left_ext) {
+		/* How many more blocks can be attached to left_ext? */
+		if (left_ext->e_flags & EXT2_EXTENT_FLAGS_UNINIT)
+			fillable = max_uninit_len - left_ext->e_len;
+		else if (flags & EXT2_FALLOCATE_ZERO_BLOCKS)
+			fillable = max_init_len - left_ext->e_len;
+		else
+			fillable = 0;
+
+		/* User requires init/uninit but extent is uninit/init. */
+		if (((flags & EXT2_FALLOCATE_FORCE_INIT) &&
+		     (left_ext->e_flags & EXT2_EXTENT_FLAGS_UNINIT)) ||
+		    ((flags & EXT2_FALLOCATE_FORCE_UNINIT) &&
+		     !(left_ext->e_flags & EXT2_EXTENT_FLAGS_UNINIT)))
+			goto try_right;
+
+		if (fillable > range_len)
+			fillable = range_len;
+
+		/* Don't expand an initialized left_ext beyond EOF */
+		x = left_ext->e_lblk + left_ext->e_len - 1;
+		if (!(flags & EXT2_FALLOCATE_INIT_BEYOND_EOF)) {
+			dbg_printf("%s: lend=%llu newlend=%llu eofblk=%llu\n",
+				   __func__, x, x + fillable, eof_blk);
+			if (eof_blk >= x && eof_blk <= x + fillable)
+				fillable = eof_blk - x;
+		}
+
+		if (fillable == 0)
+			goto try_right;
+
+		/* Test if the right edge of the range is already mapped? */
+		if (EXT2FS_CLUSTER_RATIO(fs) > 1) {
+			err = ext2fs_map_cluster_block(fs, ino, inode,
+					x + fillable, &pblk);
+			if (err)
+				goto out;
+			if (pblk)
+				fillable -= 1 + ((x + fillable)
+						 & EXT2FS_CLUSTER_MASK(fs));
+			if (fillable == 0)
+				goto try_right;
+		}
+
+		/* Allocate range of blocks */
+		x = left_ext->e_pblk + left_ext->e_len;
+		err = ext2fs_new_range(fs, EXT2_NEWRANGE_FIXED_GOAL |
+				EXT2_NEWRANGE_EXACT_LENGTH,
+				x, fillable, NULL, &pblk, &plen);
+		if (err)
+			goto try_right;
+		err = claim_range(fs, inode, pblk, plen);
+		if (err)
+			goto out;
+
+		/* Modify left_ext */
+		err = ext2fs_extent_goto(handle, left_ext->e_lblk);
+		if (err)
+			goto out;
+		range_start += plen;
+		range_len -= plen;
+		left_ext->e_len += plen;
+		dbg_print_extent("ext_falloc left+", left_ext);
+		err = ext2fs_extent_replace(handle, 0, left_ext);
+		if (err)
+			goto out;
+		err = ext2fs_extent_fix_parents(handle);
+		if (err)
+			goto out;
+
+		/* Zero blocks if necessary */
+		if (!(left_ext->e_flags & EXT2_EXTENT_FLAGS_UNINIT) &&
+		    (flags & EXT2_FALLOCATE_ZERO_BLOCKS)) {
+			err = ext2fs_zero_blocks2(fs, pblk, plen, NULL, NULL);
+			if (err)
+				goto out;
+		}
+	}
+
+try_right:
+	/* Extend the right extent */
+	if (right_ext) {
+		/* How much can we attach to right_ext? */
+		if (right_ext->e_flags & EXT2_EXTENT_FLAGS_UNINIT)
+			fillable = max_uninit_len - right_ext->e_len;
+		else if (flags & EXT2_FALLOCATE_ZERO_BLOCKS)
+			fillable = max_init_len - right_ext->e_len;
+		else
+			fillable = 0;
+
+		/* User requires init/uninit but extent is uninit/init. */
+		if (((flags & EXT2_FALLOCATE_FORCE_INIT) &&
+		     (right_ext->e_flags & EXT2_EXTENT_FLAGS_UNINIT)) ||
+		    ((flags & EXT2_FALLOCATE_FORCE_UNINIT) &&
+		     !(right_ext->e_flags & EXT2_EXTENT_FLAGS_UNINIT)))
+			goto try_anywhere;
+
+		if (fillable > range_len)
+			fillable = range_len;
+		if (fillable == 0)
+			goto try_anywhere;
+
+		/* Test if the left edge of the range is already mapped? */
+		if (EXT2FS_CLUSTER_RATIO(fs) > 1) {
+			err = ext2fs_map_cluster_block(fs, ino, inode,
+					right_ext->e_lblk - fillable, &pblk);
+			if (err)
+				goto out;
+			if (pblk)
+				fillable -= EXT2FS_CLUSTER_RATIO(fs) -
+						((right_ext->e_lblk - fillable)
+						 & EXT2FS_CLUSTER_MASK(fs));
+			if (fillable == 0)
+				goto try_anywhere;
+		}
+
+		/*
+		 * FIXME: It would be nice if we could handle allocating a
+		 * variable range from a fixed end point instead of just
+		 * skipping to the general allocator if the whole range is
+		 * unavailable.
+		 */
+		err = ext2fs_new_range(fs, EXT2_NEWRANGE_FIXED_GOAL |
+				EXT2_NEWRANGE_EXACT_LENGTH,
+				right_ext->e_pblk - fillable,
+				fillable, NULL, &pblk, &plen);
+		if (err)
+			goto try_anywhere;
+		err = claim_range(fs, inode,
+			      pblk & ~EXT2FS_CLUSTER_MASK(fs),
+			      plen + (pblk & EXT2FS_CLUSTER_MASK(fs)));
+		if (err)
+			goto out;
+
+		/* Modify right_ext */
+		err = ext2fs_extent_goto(handle, right_ext->e_lblk);
+		if (err)
+			goto out;
+		range_len -= plen;
+		right_ext->e_lblk -= plen;
+		right_ext->e_pblk -= plen;
+		right_ext->e_len += plen;
+		dbg_print_extent("ext_falloc right+", right_ext);
+		err = ext2fs_extent_replace(handle, 0, right_ext);
+		if (err)
+			goto out;
+		err = ext2fs_extent_fix_parents(handle);
+		if (err)
+			goto out;
+
+		/* Zero blocks if necessary */
+		if (!(right_ext->e_flags & EXT2_EXTENT_FLAGS_UNINIT) &&
+		    (flags & EXT2_FALLOCATE_ZERO_BLOCKS)) {
+			err = ext2fs_zero_blocks2(fs, pblk,
+					plen + cluster_fill, NULL, NULL);
+			if (err)
+				goto out;
+		}
+	}
+
+try_anywhere:
+	/* Try implied cluster alloc on the left and right ends */
+	if (range_len > 0 && (range_start & EXT2FS_CLUSTER_MASK(fs))) {
+		cluster_fill = EXT2FS_CLUSTER_RATIO(fs) -
+			       (range_start & EXT2FS_CLUSTER_MASK(fs));
+		cluster_fill &= EXT2FS_CLUSTER_MASK(fs);
+		if (cluster_fill > range_len)
+			cluster_fill = range_len;
+		newex.e_lblk = range_start;
+		err = ext2fs_map_cluster_block(fs, ino, inode, newex.e_lblk,
+					       &pblk);
+		if (err)
+			goto out;
+		if (pblk == 0)
+			goto try_right_implied;
+		newex.e_pblk = pblk;
+		newex.e_len = cluster_fill;
+		newex.e_flags = (flags & EXT2_FALLOCATE_FORCE_INIT ? 0 :
+				 EXT2_EXTENT_FLAGS_UNINIT);
+		dbg_print_extent("ext_falloc iclus left+", &newex);
+		ext2fs_extent_goto(handle, newex.e_lblk);
+		err = ext2fs_extent_get(handle, EXT2_EXTENT_CURRENT,
+					&ex);
+		if (err == EXT2_ET_NO_CURRENT_NODE)
+			ex.e_lblk = 0;
+		else if (err)
+			goto out;
+
+		if (ex.e_lblk > newex.e_lblk)
+			op = 0; /* insert before */
+		else
+			op = EXT2_EXTENT_INSERT_AFTER;
+		dbg_printf("%s: inserting %s lblk %llu newex=%llu\n",
+			   __func__, op ? "after" : "before", ex.e_lblk,
+			   newex.e_lblk);
+		err = ext2fs_extent_insert(handle, op, &newex);
+		if (err)
+			goto out;
+		err = ext2fs_extent_fix_parents(handle);
+		if (err)
+			goto out;
+
+		if (!(newex.e_flags & EXT2_EXTENT_FLAGS_UNINIT) &&
+		    (flags & EXT2_FALLOCATE_ZERO_BLOCKS)) {
+			err = ext2fs_zero_blocks2(fs, newex.e_pblk,
+						  newex.e_len, NULL, NULL);
+			if (err)
+				goto out;
+		}
+
+		range_start += cluster_fill;
+		range_len -= cluster_fill;
+	}
+
+try_right_implied:
+	y = range_start + range_len;
+	if (range_len > 0 && (y & EXT2FS_CLUSTER_MASK(fs))) {
+		cluster_fill = y & EXT2FS_CLUSTER_MASK(fs);
+		if (cluster_fill > range_len)
+			cluster_fill = range_len;
+		newex.e_lblk = y & ~EXT2FS_CLUSTER_MASK(fs);
+		err = ext2fs_map_cluster_block(fs, ino, inode, newex.e_lblk,
+					       &pblk);
+		if (err)
+			goto out;
+		if (pblk == 0)
+			goto no_implied;
+		newex.e_pblk = pblk;
+		newex.e_len = cluster_fill;
+		newex.e_flags = (flags & EXT2_FALLOCATE_FORCE_INIT ? 0 :
+				 EXT2_EXTENT_FLAGS_UNINIT);
+		dbg_print_extent("ext_falloc iclus right+", &newex);
+		ext2fs_extent_goto(handle, newex.e_lblk);
+		err = ext2fs_extent_get(handle, EXT2_EXTENT_CURRENT,
+					&ex);
+		if (err == EXT2_ET_NO_CURRENT_NODE)
+			ex.e_lblk = 0;
+		else if (err)
+			goto out;
+
+		if (ex.e_lblk > newex.e_lblk)
+			op = 0; /* insert before */
+		else
+			op = EXT2_EXTENT_INSERT_AFTER;
+		dbg_printf("%s: inserting %s lblk %llu newex=%llu\n",
+			   __func__, op ? "after" : "before", ex.e_lblk,
+			   newex.e_lblk);
+		err = ext2fs_extent_insert(handle, op, &newex);
+		if (err)
+			goto out;
+		err = ext2fs_extent_fix_parents(handle);
+		if (err)
+			goto out;
+
+		if (!(newex.e_flags & EXT2_EXTENT_FLAGS_UNINIT) &&
+		    (flags & EXT2_FALLOCATE_ZERO_BLOCKS)) {
+			err = ext2fs_zero_blocks2(fs, newex.e_pblk,
+						  newex.e_len, NULL, NULL);
+			if (err)
+				goto out;
+		}
+
+		range_len -= cluster_fill;
+	}
+
+no_implied:
+	if (range_len == 0)
+		return 0;
+
+	newex.e_lblk = range_start;
+	if (flags & EXT2_FALLOCATE_FORCE_INIT) {
+		max_extent_len = max_init_len;
+		newex.e_flags = 0;
+	} else {
+		max_extent_len = max_uninit_len;
+		newex.e_flags = EXT2_EXTENT_FLAGS_UNINIT;
+	}
+	pblk = alloc_goal;
+	y = range_len;
+	for (x = 0; x < y;) {
+		cluster_fill = newex.e_lblk & EXT2FS_CLUSTER_MASK(fs);
+		fillable = min(range_len + cluster_fill, max_extent_len);
+		err = ext2fs_new_range(fs, 0, pblk & ~EXT2FS_CLUSTER_MASK(fs),
+				       fillable,
+				       NULL, &pblk, &plen);
+		if (err)
+			goto out;
+		err = claim_range(fs, inode, pblk, plen);
+		if (err)
+			goto out;
+
+		/* Create extent */
+		newex.e_pblk = pblk + cluster_fill;
+		newex.e_len = plen - cluster_fill;
+		dbg_print_extent("ext_falloc create", &newex);
+		ext2fs_extent_goto(handle, newex.e_lblk);
+		err = ext2fs_extent_get(handle, EXT2_EXTENT_CURRENT,
+					&ex);
+		if (err == EXT2_ET_NO_CURRENT_NODE)
+			ex.e_lblk = 0;
+		else if (err)
+			goto out;
+
+		if (ex.e_lblk > newex.e_lblk)
+			op = 0; /* insert before */
+		else
+			op = EXT2_EXTENT_INSERT_AFTER;
+		dbg_printf("%s: inserting %s lblk %llu newex=%llu\n",
+			   __func__, op ? "after" : "before", ex.e_lblk,
+			   newex.e_lblk);
+		err = ext2fs_extent_insert(handle, op, &newex);
+		if (err)
+			goto out;
+		err = ext2fs_extent_fix_parents(handle);
+		if (err)
+			goto out;
+
+		if (!(newex.e_flags & EXT2_EXTENT_FLAGS_UNINIT) &&
+		    (flags & EXT2_FALLOCATE_ZERO_BLOCKS)) {
+			err = ext2fs_zero_blocks2(fs, pblk, plen, NULL, NULL);
+			if (err)
+				goto out;
+		}
+
+		/* Update variables at end of loop */
+		x += plen - cluster_fill;
+		range_len -= plen - cluster_fill;
+		newex.e_lblk += plen - cluster_fill;
+		pblk += plen - cluster_fill;
+		if (pblk >= ext2fs_blocks_count(fs->super))
+			pblk = fs->super->s_first_data_block;
+	}
+
+out:
+	return err;
+}
+
+static errcode_t extent_fallocate(ext2_filsys fs, int flags, ext2_ino_t ino,
+				      struct ext2_inode *inode,
+				      blk64_t start, blk64_t len)
+{
+	ext2_extent_handle_t	handle;
+	struct ext2fs_extent	left_extent, right_extent;
+	struct ext2fs_extent	*left_adjacent, *right_adjacent;
+	errcode_t		err;
+	blk64_t			range_start, range_end = 0, end, next;
+	blk64_t			count, goal, goal_distance;
+
+	end = start + len - 1;
+	err = ext2fs_extent_open2(fs, ino, inode, &handle);
+	if (err)
+		return err;
+
+	/*
+	 * Find the extent closest to the start of the alloc range.  We don't
+	 * check the return value because _goto() sets the current node to the
+	 * next-lowest extent if 'start' is in a hole; or the next-highest
+	 * extent if there aren't any lower ones; or doesn't set a current node
+	 * if there was a real error reading the extent tree.  In that case,
+	 * _get() will error out.
+	 */
+start_again:
+	ext2fs_extent_goto(handle, start);
+	err = ext2fs_extent_get(handle, EXT2_EXTENT_CURRENT, &left_extent);
+	if (err == EXT2_ET_NO_CURRENT_NODE) {
+		blk64_t max_blocks = ext2fs_blocks_count(fs->super);
+		goal = ext2fs_find_inode_goal(fs, ino);
+		err = ext2fs_find_first_zero_block_bitmap2(fs->block_map,
+						goal, max_blocks - 1, &goal);
+		goal += start;
+		err = ext_falloc_helper(fs, flags, ino, inode, handle, NULL,
+					NULL, start, len, goal);
+		goto errout;
+	} else if (err)
+		goto errout;
+
+	dbg_print_extent("ext_falloc initial", &left_extent);
+	next = left_extent.e_lblk + left_extent.e_len;
+	if (left_extent.e_lblk > start) {
+		/* Nearest extent lies beyond start; fill the gap before it */
+		goal = left_extent.e_pblk - (left_extent.e_lblk - start);
+		err = ext_falloc_helper(fs, flags, ino, inode, handle, NULL,
+					&left_extent, start,
+					left_extent.e_lblk - start, goal);
+		if (err)
+			goto errout;
+
+		goto start_again;
+	} else if (next >= start) {
+		range_start = next;
+		left_adjacent = &left_extent;
+	} else {
+		range_start = start;
+		left_adjacent = NULL;
+	}
+	goal = left_extent.e_pblk + (range_start - left_extent.e_lblk);
+	goal_distance = range_start - next;
+
+	do {
+		err = ext2fs_extent_get(handle, EXT2_EXTENT_NEXT_LEAF,
+					   &right_extent);
+		dbg_printf("%s: ino=%d get next =%d\n", __func__, ino,
+			   (int)err);
+		dbg_print_extent("ext_falloc next", &right_extent);
+		/* Stop if we've seen this extent before */
+		if (!err && right_extent.e_lblk <= left_extent.e_lblk)
+			err = EXT2_ET_EXTENT_NO_NEXT;
+
+		if (err && err != EXT2_ET_EXTENT_NO_NEXT)
+			goto errout;
+		if (err == EXT2_ET_EXTENT_NO_NEXT ||
+		    right_extent.e_lblk > end + 1) {
+			range_end = end;
+			right_adjacent = NULL;
+		} else {
+			/* Handle right_extent.e_lblk <= end + 1 */
+			range_end = right_extent.e_lblk - 1;
+			right_adjacent = &right_extent;
+		}
+		if (err != EXT2_ET_EXTENT_NO_NEXT &&
+		    goal_distance > (range_end - right_extent.e_lblk)) {
+			goal = right_extent.e_pblk -
+					(right_extent.e_lblk - range_start);
+			goal_distance = range_end - right_extent.e_lblk;
+		}
+
+		dbg_printf("%s: ino=%d rstart=%llu rend=%llu\n", __func__, ino,
+			   range_start, range_end);
+		err = 0;
+		if (range_start <= range_end) {
+			count = range_end - range_start + 1;
+			err = ext_falloc_helper(fs, flags, ino, inode, handle,
+						left_adjacent, right_adjacent,
+						range_start, count, goal);
+			if (err)
+				goto errout;
+		}
+
+		if (range_end == end)
+			break;
+
+		err = ext2fs_extent_goto(handle, right_extent.e_lblk);
+		if (err)
+			goto errout;
+		next = right_extent.e_lblk + right_extent.e_len;
+		left_extent = right_extent;
+		left_adjacent = &left_extent;
+		range_start = next;
+		goal = left_extent.e_pblk + (range_start - left_extent.e_lblk);
+		goal_distance = range_start - next;
+	} while (range_end < end);
+
+errout:
+	ext2fs_zero_blocks2(NULL, 0, 0, NULL, NULL);
+	ext2fs_extent_free(handle);
+	return err;
+}
+
+errcode_t ext2fs_fallocate(ext2_filsys fs, int flags, ext2_ino_t ino,
+			   struct ext2_inode *inode,
+			   blk64_t start, blk64_t len)
+{
+	struct ext2_inode	inode_buf;
+	blk64_t			blk, x;
+	errcode_t		err;
+
+	if (((flags & EXT2_FALLOCATE_FORCE_INIT) &&
+	    (flags & EXT2_FALLOCATE_FORCE_UNINIT)) ||
+	   (flags & ~EXT2_FALLOCATE_ALL_FLAGS))
+		return EXT2_ET_INVALID_ARGUMENT;
+
+	if (len > ext2fs_blocks_count(fs->super))
+		return EXT2_ET_BLOCK_ALLOC_FAIL;
+	else if (len == 0)
+		return 0;
+
+	/* Read inode structure if necessary */
+	if (!inode) {
+		err = ext2fs_read_inode(fs, ino, &inode_buf);
+		if (err)
+			return err;
+		inode = &inode_buf;
+	}
+	dbg_printf("%s: ino=%d start=%llu len=%llu\n", __func__, ino, start,
+		   len);
+
+	if (inode->i_flags & EXT4_EXTENTS_FL) {
+		err = extent_fallocate(fs, flags, ino, inode, start, len);
+		goto out;
+	}
+
+	/* XXX: Allocate a bunch of blocks the slow way */
+	for (blk = start; blk < start + len; blk++) {
+		err = ext2fs_bmap2(fs, ino, inode, NULL, 0, blk, 0, &x);
+		if (err)
+			return err;
+		if (x)
+			continue;
+
+		err = ext2fs_bmap2(fs, ino, inode, NULL,
+				   BMAP_ALLOC | BMAP_UNINIT, blk, 0, &x);
+		if (err)
+			return err;
+	}
+
+out:
+	if (inode == &inode_buf)
+		ext2fs_write_inode(fs, ino, inode);
+	return err;
+}
diff --git a/misc/mk_hugefiles.c b/misc/mk_hugefiles.c
index d4dadc4..1ea3048 100644
--- a/misc/mk_hugefiles.c
+++ b/misc/mk_hugefiles.c
@@ -124,7 +124,6 @@ static errcode_t mk_hugefile(ext2_filsys fs, blk64_t num,
 	blk64_t			left;
 	blk64_t			count = 0;
 	struct ext2_inode	inode;
-	ext2_extent_handle_t	handle;
 
 	retval = ext2fs_new_inode(fs, 0, LINUX_S_IFREG, NULL, ino);
 	if (retval)
@@ -144,84 +143,20 @@ static errcode_t mk_hugefile(ext2_filsys fs, blk64_t num,
 
 	ext2fs_inode_alloc_stats2(fs, *ino, +1, 0);
 
-	retval = ext2fs_extent_open2(fs, *ino, &inode, &handle);
+	if (EXT2_HAS_INCOMPAT_FEATURE(fs->super,
+				      EXT3_FEATURE_INCOMPAT_EXTENTS))
+		inode.i_flags |= EXT4_EXTENTS_FL;
+	retval = ext2fs_fallocate(fs,
+				  EXT2_FALLOCATE_FORCE_INIT |
+				  EXT2_FALLOCATE_ZERO_BLOCKS,
+				  *ino, &inode, 0, num);
 	if (retval)
 		return retval;
-
-	lblk = 0;
-	left = num ? num : 1;
-	while (left) {
-		blk64_t pblk, end;
-		blk64_t n = left;
-
-		retval =  ext2fs_find_first_zero_block_bitmap2(fs->block_map,
-			goal, ext2fs_blocks_count(fs->super) - 1, &end);
-		if (retval)
-			goto errout;
-		goal = end;
-
-		retval =  ext2fs_find_first_set_block_bitmap2(fs->block_map, goal,
-			       ext2fs_blocks_count(fs->super) - 1, &bend);
-		if (retval == ENOENT) {
-			bend = ext2fs_blocks_count(fs->super);
-			if (num == 0)
-				left = 0;
-		}
-		if (!num || bend - goal < left)
-			n = bend - goal;
-		pblk = goal;
-		if (num)
-			left -= n;
-		goal += n;
-		count += n;
-		ext2fs_block_alloc_stats_range(fs, pblk, n, +1);
-
-		if (zero_hugefile) {
-			blk64_t ret_blk;
-			retval = ext2fs_zero_blocks2(fs, pblk, n,
-						     &ret_blk, NULL);
-
-			if (retval)
-				com_err(program_name, retval,
-					_("while zeroing block %llu "
-					  "for hugefile"), ret_blk);
-		}
-
-		while (n) {
-			blk64_t l = n;
-			struct ext2fs_extent newextent;
-
-			if (l > EXT_INIT_MAX_LEN)
-				l = EXT_INIT_MAX_LEN;
-
-			newextent.e_len = l;
-			newextent.e_pblk = pblk;
-			newextent.e_lblk = lblk;
-			newextent.e_flags = 0;
-
-			retval = ext2fs_extent_insert(handle,
-					EXT2_EXTENT_INSERT_AFTER, &newextent);
-			if (retval)
-				return retval;
-			pblk += l;
-			lblk += l;
-			n -= l;
-		}
-	}
-
-	retval = ext2fs_read_inode(fs, *ino, &inode);
+	retval = ext2fs_inode_set_size(fs, &inode, num * fs->blocksize);
 	if (retval)
-		goto errout;
-
-	retval = ext2fs_iblk_add_blocks(fs, &inode,
-					count / EXT2FS_CLUSTER_RATIO(fs));
-	if (retval)
-		goto errout;
-	size = (__u64) count * fs->blocksize;
-	inode.i_size = size & 0xffffffff;
-	inode.i_size_high = (size >> 32);
+		return retval;
 
-	retval = ext2fs_write_new_inode(fs, *ino, &inode);
+	retval = ext2fs_write_inode(fs, *ino, &inode);
 	if (retval)
 		goto errout;
 
@@ -239,13 +174,7 @@ retry:
 		goto retry;
 	}
 
-	if (retval)
-		goto errout;
-
 errout:
-	if (handle)
-		ext2fs_extent_free(handle);


* [PATCH 31/37] fuse2fs: translate ACL structures
  2014-05-01 23:12 [PATCH 00/37] e2fsprogs patchbomb 5/14 Darrick J. Wong
                   ` (28 preceding siblings ...)
  2014-05-01 23:15 ` [PATCH 29/37] libext2fs: implement fallocate Darrick J. Wong
@ 2014-05-01 23:15 ` Darrick J. Wong
  2014-05-01 23:15 ` [PATCH 32/37] fuse2fs: handle 64-bit dates correctly Darrick J. Wong
                   ` (4 subsequent siblings)
  34 siblings, 0 replies; 91+ messages in thread
From: Darrick J. Wong @ 2014-05-01 23:15 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

Translate "native" ACL structures into ext4 ACL structures when
reading or writing the ACL EAs.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 configure       |    5 +
 configure.in    |    8 +-
 lib/config.h.in |    3 +
 misc/fuse2fs.c  |  262 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 4 files changed, 270 insertions(+), 8 deletions(-)
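
The size arithmetic behind the translation can be sketched in isolation. This is only an illustration of the layout the patch assumes (a 4-byte header, up to four 4-byte short entries, then 8-byte full entries); the `sketch_` names and stand-in structs are hypothetical and are not part of fuse2fs. Unlike the patch's ext4_acl_count(), this version rejects buffers shorter than the header before subtracting, which is an assumption about the desired behavior rather than something the patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in layouts; sizes (4, 8, 4 bytes) mirror the structs in the patch. */
struct sketch_acl_header { uint32_t a_version; };
struct sketch_acl_entry { uint16_t e_tag; uint16_t e_perm; uint32_t e_id; };
struct sketch_acl_entry_short { uint16_t e_tag; uint16_t e_perm; };

/* Bytes needed for an on-disk ACL holding `count` entries. */
static size_t sketch_ext4_acl_size(int count)
{
	size_t sz = sizeof(struct sketch_acl_header);

	assert(count >= 0);
	if (count <= 4)
		return sz + (size_t)count *
		       sizeof(struct sketch_acl_entry_short);
	return sz + 4 * sizeof(struct sketch_acl_entry_short) +
	       (size_t)(count - 4) * sizeof(struct sketch_acl_entry);
}

/* Inverse of the above; returns -1 if `size` cannot be a valid ACL. */
static int sketch_ext4_acl_count(size_t size)
{
	const size_t shorts = 4 * sizeof(struct sketch_acl_entry_short);
	size_t body;

	/* Assumption: refuse undersized buffers instead of wrapping around. */
	if (size < sizeof(struct sketch_acl_header))
		return -1;
	body = size - sizeof(struct sketch_acl_header);
	if (body < shorts) {
		if (body % sizeof(struct sketch_acl_entry_short))
			return -1;
		return (int)(body / sizeof(struct sketch_acl_entry_short));
	}
	body -= shorts;
	if (body % sizeof(struct sketch_acl_entry))
		return -1;
	return (int)(body / sizeof(struct sketch_acl_entry)) + 4;
}
```

With these sizes, a six-entry ACL needs 4 + 4*4 + 2*8 = 36 bytes, and a 36-byte buffer maps back to six entries.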


diff --git a/configure b/configure
index ce6a4ef..e5943af 100755
--- a/configure
+++ b/configure
@@ -10479,7 +10479,7 @@ fi
 done
 
 fi
-for ac_header in  	dirent.h 	errno.h 	execinfo.h 	getopt.h 	malloc.h 	mntent.h 	paths.h 	semaphore.h 	setjmp.h 	signal.h 	stdarg.h 	stdint.h 	stdlib.h 	termios.h 	termio.h 	unistd.h 	utime.h 	linux/falloc.h 	linux/fd.h 	linux/major.h 	linux/loop.h 	net/if_dl.h 	netinet/in.h 	sys/disklabel.h 	sys/file.h 	sys/ioctl.h 	sys/mkdev.h 	sys/mman.h 	sys/prctl.h 	sys/queue.h 	sys/resource.h 	sys/select.h 	sys/socket.h 	sys/sockio.h 	sys/stat.h 	sys/syscall.h 	sys/sysctl.h 	sys/sysmacros.h 	sys/time.h 	sys/types.h 	sys/un.h 	sys/wait.h
+for ac_header in  	dirent.h 	errno.h 	execinfo.h 	getopt.h 	malloc.h 	mntent.h 	paths.h 	semaphore.h 	setjmp.h 	signal.h 	stdarg.h 	stdint.h 	stdlib.h 	termios.h 	termio.h 	unistd.h 	utime.h 	linux/falloc.h 	linux/fd.h 	linux/major.h 	linux/loop.h 	net/if_dl.h 	netinet/in.h 	sys/acl.h 	sys/disklabel.h 	sys/file.h 	sys/ioctl.h 	sys/mkdev.h 	sys/mman.h 	sys/prctl.h 	sys/queue.h 	sys/resource.h 	sys/select.h 	sys/socket.h 	sys/sockio.h 	sys/stat.h 	sys/syscall.h 	sys/sysctl.h 	sys/sysmacros.h 	sys/time.h 	sys/types.h 	sys/un.h 	sys/wait.h
 do :
   as_ac_Header=`$as_echo "ac_cv_header_$ac_header" | $as_tr_sh`
 ac_fn_c_check_header_mongrel "$LINENO" "$ac_header" "$as_ac_Header" "$ac_includes_default"
@@ -11228,6 +11228,7 @@ else
 do :
   as_ac_Header=`$as_echo "ac_cv_header_$ac_header" | $as_tr_sh`
 ac_fn_c_check_header_compile "$LINENO" "$ac_header" "$as_ac_Header" "#define _FILE_OFFSET_BITS	64
+#define FUSE_USE_VERSION 29
 "
 if eval test \"x\$"$as_ac_Header"\" = x"yes"; then :
   cat >>confdefs.h <<_ACEOF
@@ -11246,6 +11247,7 @@ done
 
 	cat confdefs.h - <<_ACEOF >conftest.$ac_ext
 /* end confdefs.h.  */
+#define FUSE_USE_VERSION 29
 #ifdef __linux__
 #include <linux/fs.h>
 #include <linux/falloc.h>
@@ -11365,6 +11367,7 @@ else
 do :
   as_ac_Header=`$as_echo "ac_cv_header_$ac_header" | $as_tr_sh`
 ac_fn_c_check_header_compile "$LINENO" "$ac_header" "$as_ac_Header" "#define _FILE_OFFSET_BITS	64
+#define FUSE_USE_VERSION 29
 #ifdef __linux__
 # include <linux/fs.h>
 # include <linux/falloc.h>
diff --git a/configure.in b/configure.in
index 2c455af..6c185e7 100644
--- a/configure.in
+++ b/configure.in
@@ -948,6 +948,7 @@ AC_CHECK_HEADERS(m4_flatten([
 	linux/loop.h
 	net/if_dl.h
 	netinet/in.h
+	sys/acl.h
 	sys/disklabel.h
 	sys/file.h
 	sys/ioctl.h
@@ -1177,10 +1178,12 @@ then
 else
 	AC_CHECK_HEADERS([pthread.h fuse.h], [],
 [AC_MSG_FAILURE([Cannot find fuse2fs headers.])],
-[#define _FILE_OFFSET_BITS	64])
+[#define _FILE_OFFSET_BITS	64
+#define FUSE_USE_VERSION 29])
 
 	AC_PREPROC_IFELSE(
-[AC_LANG_PROGRAM([[#ifdef __linux__
+[AC_LANG_PROGRAM([[#define FUSE_USE_VERSION 29
+#ifdef __linux__
 #include <linux/fs.h>
 #include <linux/falloc.h>
 #include <linux/xattr.h>
@@ -1195,6 +1198,7 @@ fi
 ,
 AC_CHECK_HEADERS([pthread.h fuse.h], [], [FUSE_CMT="#"],
 [#define _FILE_OFFSET_BITS	64
+#define FUSE_USE_VERSION 29
 #ifdef __linux__
 # include <linux/fs.h>
 # include <linux/falloc.h>
diff --git a/lib/config.h.in b/lib/config.h.in
index 118a508..852c305 100644
--- a/lib/config.h.in
+++ b/lib/config.h.in
@@ -426,6 +426,9 @@
 /* Define to 1 if you have the `sysconf' function. */
 #undef HAVE_SYSCONF
 
+/* Define to 1 if you have the <sys/acl.h> header file. */
+#undef HAVE_SYS_ACL_H
+
 /* Define to 1 if you have the <sys/disklabel.h> header file. */
 #undef HAVE_SYS_DISKLABEL_H
 
diff --git a/misc/fuse2fs.c b/misc/fuse2fs.c
index dbb9048..93b4b90 100644
--- a/misc/fuse2fs.c
+++ b/misc/fuse2fs.c
@@ -17,9 +17,15 @@
 # include <linux/falloc.h>
 # include <linux/xattr.h>
 # define FUSE_PLATFORM_OPTS	",nonempty,big_writes"
+# ifdef HAVE_SYS_ACL_H
+#  define TRANSLATE_LINUX_ACLS
+# endif
 #else
 # define FUSE_PLATFORM_OPTS	""
 #endif
+#ifdef TRANSLATE_LINUX_ACLS
+# include <sys/acl.h>
+#endif
 #include <sys/ioctl.h>
 #include <unistd.h>
 #include <fuse.h>
@@ -59,6 +65,199 @@ static ext2_filsys global_fs; /* Try not to use this directly */
 # define FL_PUNCH_HOLE_FLAG (0)
 #endif
 
+/* ACL translation stuff */
+#ifdef TRANSLATE_LINUX_ACLS
+/*
+ * Copied from acl_ea.h in libacl source; ACLs have to be sent to and from fuse
+ * in this format... at least on Linux.
+ */
+#define ACL_EA_ACCESS		"system.posix_acl_access"
+#define ACL_EA_DEFAULT		"system.posix_acl_default"
+
+#define ACL_EA_VERSION		0x0002
+
+typedef struct {
+	u_int16_t	e_tag;
+	u_int16_t	e_perm;
+	u_int32_t	e_id;
+} acl_ea_entry;
+
+typedef struct {
+	u_int32_t	a_version;
+	acl_ea_entry	a_entries[0];
+} acl_ea_header;
+
+static inline size_t acl_ea_size(int count)
+{
+	return sizeof(acl_ea_header) + count * sizeof(acl_ea_entry);
+}
+
+static inline int acl_ea_count(size_t size)
+{
+	if (size < sizeof(acl_ea_header))
+		return -1;
+	size -= sizeof(acl_ea_header);
+	if (size % sizeof(acl_ea_entry))
+		return -1;
+	return size / sizeof(acl_ea_entry);
+}
+
+/*
+ * ext4 ACL structures, copied from fs/ext4/acl.h.
+ */
+#define EXT4_ACL_VERSION	0x0001
+
+typedef struct {
+	__u16		e_tag;
+	__u16		e_perm;
+	__u32		e_id;
+} ext4_acl_entry;
+
+typedef struct {
+	__u16		e_tag;
+	__u16		e_perm;
+} ext4_acl_entry_short;
+
+typedef struct {
+	__u32		a_version;
+} ext4_acl_header;
+
+static inline size_t ext4_acl_size(int count)
+{
+	if (count <= 4) {
+		return sizeof(ext4_acl_header) +
+		       count * sizeof(ext4_acl_entry_short);
+	} else {
+		return sizeof(ext4_acl_header) +
+		       4 * sizeof(ext4_acl_entry_short) +
+		       (count - 4) * sizeof(ext4_acl_entry);
+	}
+}
+
+static inline int ext4_acl_count(size_t size)
+{
+	ssize_t s;
+	size -= sizeof(ext4_acl_header);
+	s = size - 4 * sizeof(ext4_acl_entry_short);
+	if (s < 0) {
+		if (size % sizeof(ext4_acl_entry_short))
+			return -1;
+		return size / sizeof(ext4_acl_entry_short);
+	} else {
+		if (s % sizeof(ext4_acl_entry))
+			return -1;
+		return s / sizeof(ext4_acl_entry) + 4;
+	}
+}
+
+static errcode_t fuse_to_ext4_acl(acl_ea_header *facl, size_t facl_sz,
+				  ext4_acl_header **eacl, size_t *eacl_sz)
+{
+	int i, facl_count;
+	ext4_acl_header *h;
+	size_t h_sz;
+	ext4_acl_entry *e;
+	acl_ea_entry *a;
+	void *hptr;
+	errcode_t err;
+
+	facl_count = acl_ea_count(facl_sz);
+	h_sz = ext4_acl_size(facl_count);
+	if (facl_count < 0 || facl->a_version != ACL_EA_VERSION)
+		return EXT2_ET_INVALID_ARGUMENT;
+
+	err = ext2fs_get_mem(h_sz, &h);
+	if (err)
+		return err;
+
+	h->a_version = ext2fs_cpu_to_le32(EXT4_ACL_VERSION);
+	hptr = h + 1;
+	for (i = 0, a = facl->a_entries; i < facl_count; i++, a++) {
+		e = hptr;
+		e->e_tag = ext2fs_cpu_to_le16(a->e_tag);
+		e->e_perm = ext2fs_cpu_to_le16(a->e_perm);
+
+		switch (a->e_tag) {
+		case ACL_USER:
+		case ACL_GROUP:
+			e->e_id = ext2fs_cpu_to_le32(a->e_id);
+			hptr += sizeof(ext4_acl_entry);
+			break;
+		case ACL_USER_OBJ:
+		case ACL_GROUP_OBJ:
+		case ACL_MASK:
+		case ACL_OTHER:
+			hptr += sizeof(ext4_acl_entry_short);
+			break;
+		default:
+			err = EXT2_ET_INVALID_ARGUMENT;
+			goto out;
+		}
+	}
+
+	*eacl = h;
+	*eacl_sz = h_sz;
+	return err;
+out:
+	ext2fs_free_mem(&h);
+	return err;
+}
+
+static errcode_t ext4_to_fuse_acl(acl_ea_header **facl, size_t *facl_sz,
+				  ext4_acl_header *eacl, size_t eacl_sz)
+{
+	int i, eacl_count;
+	acl_ea_header *f;
+	ext4_acl_entry *e;
+	acl_ea_entry *a;
+	size_t f_sz;
+	void *hptr;
+	errcode_t err;
+
+	eacl_count = ext4_acl_count(eacl_sz);
+	f_sz = acl_ea_size(eacl_count);
+	if (eacl_count < 0 ||
+	    eacl->a_version != ext2fs_cpu_to_le32(EXT4_ACL_VERSION))
+		return EXT2_ET_INVALID_ARGUMENT;
+
+	err = ext2fs_get_mem(f_sz, &f);
+	if (err)
+		return err;
+
+	f->a_version = ACL_EA_VERSION;
+	hptr = eacl + 1;
+	for (i = 0, a = f->a_entries; i < eacl_count; i++, a++) {
+		e = hptr;
+		a->e_tag = ext2fs_le16_to_cpu(e->e_tag);
+		a->e_perm = ext2fs_le16_to_cpu(e->e_perm);
+
+		switch (a->e_tag) {
+		case ACL_USER:
+		case ACL_GROUP:
+			a->e_id = ext2fs_le32_to_cpu(e->e_id);
+			hptr += sizeof(ext4_acl_entry);
+			break;
+		case ACL_USER_OBJ:
+		case ACL_GROUP_OBJ:
+		case ACL_MASK:
+		case ACL_OTHER:
+			hptr += sizeof(ext4_acl_entry_short);
+			break;
+		default:
+			err = EXT2_ET_INVALID_ARGUMENT;
+			goto out;
+		}
+	}
+
+	*facl = f;
+	*facl_sz = f_sz;
+	return err;
+out:
+	ext2fs_free_mem(&f);
+	return err;
+}
+#endif /* TRANSLATE_LINUX_ACLS */
+
 /*
  * ext2_file_t contains a struct inode, so we can't leave files open.
  * Use this as a proxy instead.
@@ -2115,6 +2314,30 @@ static int op_statfs(const char *path, struct statvfs *buf)
 	return 0;
 }
 
+typedef errcode_t (*xattr_xlate_get)(void **cooked_buf, size_t *cooked_sz,
+				     const void *raw_buf, size_t raw_sz);
+typedef errcode_t (*xattr_xlate_set)(const void *cooked_buf, size_t cooked_sz,
+				     void **raw_buf, size_t *raw_sz);
+struct xattr_translate {
+	const char *prefix;
+	xattr_xlate_get get;
+	xattr_xlate_set set;
+};
+
+#define XATTR_TRANSLATOR(p, g, s) \
+	{.prefix = (p), \
+	 .get = (xattr_xlate_get)(g), \
+	 .set = (xattr_xlate_set)(s)}
+
+static struct xattr_translate xattr_translators[] = {
+#ifdef TRANSLATE_LINUX_ACLS
+	XATTR_TRANSLATOR(ACL_EA_ACCESS, ext4_to_fuse_acl, fuse_to_ext4_acl),
+	XATTR_TRANSLATOR(ACL_EA_DEFAULT, ext4_to_fuse_acl, fuse_to_ext4_acl),
+#endif
+	XATTR_TRANSLATOR(NULL, NULL, NULL),
+};
+#undef XATTR_TRANSLATOR
+
 static int op_getxattr(const char *path, const char *key, char *value,
 		       size_t len)
 {
@@ -2122,8 +2345,9 @@ static int op_getxattr(const char *path, const char *key, char *value,
 	struct fuse2fs *ff = (struct fuse2fs *)ctxt->private_data;
 	ext2_filsys fs;
 	struct ext2_xattr_handle *h;
-	void *ptr;
-	size_t plen;
+	struct xattr_translate *xt;
+	void *ptr, *cptr;
+	size_t plen, clen;
 	ext2_ino_t ino;
 	errcode_t err;
 	int ret = 0;
@@ -2166,6 +2390,17 @@ static int op_getxattr(const char *path, const char *key, char *value,
 		goto out2;
 	}
 
+	for (xt = xattr_translators; xt->prefix != NULL; xt++) {
+		if (strncmp(key, xt->prefix, strlen(xt->prefix)) == 0) {
+			err = xt->get(&cptr, &clen, ptr, plen);
+			if (err)
+				goto out3;
+			ext2fs_free_mem(&ptr);
+			ptr = cptr;
+			plen = clen;
+		}
+	}
+
 	if (!len) {
 		ret = plen;
 	} else if (len < plen) {
@@ -2175,6 +2410,7 @@ static int op_getxattr(const char *path, const char *key, char *value,
 		ret = plen;
 	}
 
+out3:
 	ext2fs_free_mem(&ptr);
 out2:
 	err = ext2fs_xattrs_close(&h);
@@ -2289,6 +2525,9 @@ static int op_setxattr(const char *path, const char *key, const char *value,
 	struct fuse2fs *ff = (struct fuse2fs *)ctxt->private_data;
 	ext2_filsys fs;
 	struct ext2_xattr_handle *h;
+	struct xattr_translate *xt;
+	void *cvalue;
+	size_t clen;
 	ext2_ino_t ino;
 	errcode_t err;
 	int ret = 0;
@@ -2328,19 +2567,32 @@ static int op_setxattr(const char *path, const char *key, const char *value,
 		goto out2;
 	}
 
-	err = ext2fs_xattr_set(h, key, value, len);
+	cvalue = (void *)value;
+	clen = len;
+	for (xt = xattr_translators; xt->prefix != NULL; xt++) {
+		if (strncmp(key, xt->prefix, strlen(xt->prefix)) == 0) {
+			err = xt->set(value, len, &cvalue, &clen);
+			if (err)
+				goto out3;
+		}
+	}
+
+	err = ext2fs_xattr_set(h, key, cvalue, clen);
 	if (err) {
 		ret = translate_error(fs, ino, err);
-		goto out2;
+		goto out3;
 	}
 
 	err = ext2fs_xattrs_write(h);
 	if (err) {
 		ret = translate_error(fs, ino, err);
-		goto out2;
+		goto out3;
 	}
 
 	ret = update_ctime(fs, ino, NULL);
+out3:
+	if (cvalue != value)
+		ext2fs_free_mem(&cvalue);
 out2:
 	err = ext2fs_xattrs_close(&h);
 	if (!ret && err)



* [PATCH 32/37] fuse2fs: handle 64-bit dates correctly
  2014-05-01 23:12 [PATCH 00/37] e2fsprogs patchbomb 5/14 Darrick J. Wong
                   ` (29 preceding siblings ...)
  2014-05-01 23:15 ` [PATCH 31/37] fuse2fs: translate ACL structures Darrick J. Wong
@ 2014-05-01 23:15 ` Darrick J. Wong
  2014-05-01 23:16 ` [PATCH 33/37] fuse2fs: implement fallocate Darrick J. Wong
                   ` (3 subsequent siblings)
  34 siblings, 0 replies; 91+ messages in thread
From: Darrick J. Wong @ 2014-05-01 23:15 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

Fix fuse2fs' interpretation of 64-bit date quantities to match the
kernel.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 misc/fuse2fs.c |   31 ++++++++++++++++++++++---------
 1 file changed, 22 insertions(+), 9 deletions(-)
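
As a rough illustration of the encoding this patch moves to: the low 32 bits of the seconds count are stored as a signed value, and the two epoch bits in the extra field count how many multiples of 2^32 must be added back on decode. The sketch below uses hypothetical `sketch_` helpers and fixed-width types rather than struct timespec. It assumes 64-bit seconds no smaller than -2^31 and is not the fuse2fs code itself.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_EPOCH_BITS 2
#define SKETCH_EPOCH_MASK ((1u << SKETCH_EPOCH_BITS) - 1)
#define SKETCH_NSEC_MASK  (~SKETCH_EPOCH_MASK)

/* Signed low 32 bits of the seconds count, as kept in the base field. */
static int32_t sketch_low32(int64_t sec)
{
	uint32_t u = (uint32_t)sec;	/* modular conversion, well defined */

	return u >= 0x80000000u ? (int32_t)((int64_t)u - 4294967296LL)
				: (int32_t)u;
}

/* Pack the epoch bits and nanoseconds into the 32-bit extra field. */
static uint32_t sketch_encode_extra(int64_t sec, uint32_t nsec)
{
	/* sec - low32 is a non-negative multiple of 2^32 for valid input. */
	uint32_t epoch = (uint32_t)(((sec - sketch_low32(sec)) >> 32) &
				    SKETCH_EPOCH_MASK);

	return epoch | (nsec << SKETCH_EPOCH_BITS);
}

/* Rebuild seconds and nanoseconds from the base and extra fields. */
static void sketch_decode(int32_t low, uint32_t extra,
			  int64_t *sec, uint32_t *nsec)
{
	assert(sec && nsec);
	*sec = (int64_t)low + ((int64_t)(extra & SKETCH_EPOCH_MASK) << 32);
	*nsec = (extra & SKETCH_NSEC_MASK) >> SKETCH_EPOCH_BITS;
}
```

For example, 2^31 seconds is stored as a base field of -2^31 with epoch bits equal to 1, and decoding adds 2^32 back to recover the original value. Adding rather than OR-ing is what makes the negative base field decode correctly.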


diff --git a/misc/fuse2fs.c b/misc/fuse2fs.c
index 93b4b90..5306c4f 100644
--- a/misc/fuse2fs.c
+++ b/misc/fuse2fs.c
@@ -324,15 +324,24 @@ static int __translate_error(ext2_filsys fs, errcode_t err, ext2_ino_t ino,
 
 static inline __u32 ext4_encode_extra_time(const struct timespec *time)
 {
-	return (sizeof(time->tv_sec) > 4 ?
-		(time->tv_sec >> 32) & EXT4_EPOCH_MASK : 0) |
-	       ((time->tv_nsec << EXT4_EPOCH_BITS) & EXT4_NSEC_MASK);
+	__u32 extra = sizeof(time->tv_sec) > 4 ?
+			((time->tv_sec - (__s32)time->tv_sec) >> 32) &
+			EXT4_EPOCH_MASK : 0;
+	return extra | (time->tv_nsec << EXT4_EPOCH_BITS);
 }
 
 static inline void ext4_decode_extra_time(struct timespec *time, __u32 extra)
 {
-	if (sizeof(time->tv_sec) > 4)
-		time->tv_sec |= (__u64)((extra) & EXT4_EPOCH_MASK) << 32;
+	if (sizeof(time->tv_sec) > 4 && (extra & EXT4_EPOCH_MASK)) {
+		__u64 extra_bits = extra & EXT4_EPOCH_MASK;
+		/*
+		 * Prior to kernel 3.14?, we had a broken decode function,
+		 * wherein we effectively did this:
+		 * if (extra_bits == 3)
+		 *     extra_bits = 0;
+		 */
+		time->tv_sec += extra_bits << 32;
+	}
 	time->tv_nsec = ((extra) & EXT4_NSEC_MASK) >> EXT4_EPOCH_BITS;
 }
 
@@ -358,7 +367,7 @@ do {									       \
 	(timespec)->tv_sec = (signed)((raw_inode)->xtime);		       \
 	if (EXT4_FITS_IN_INODE(raw_inode, xtime ## _extra))		       \
 		ext4_decode_extra_time((timespec),			       \
-				       raw_inode->xtime ## _extra);	       \
+				       (raw_inode)->xtime ## _extra);	       \
 	else								       \
 		(timespec)->tv_nsec = 0;				       \
 } while (0)
@@ -720,6 +729,7 @@ static int stat_inode(ext2_filsys fs, ext2_ino_t ino, struct stat *statbuf)
 	dev_t fakedev = 0;
 	errcode_t err;
 	int ret = 0;
+	struct timespec tv;
 
 	memset(&inode, 0, sizeof(inode));
 	err = ext2fs_read_inode_full(fs, ino, (struct ext2_inode *)&inode,
@@ -737,9 +747,12 @@ static int stat_inode(ext2_filsys fs, ext2_ino_t ino, struct stat *statbuf)
 	statbuf->st_size = EXT2_I_SIZE(&inode);
 	statbuf->st_blksize = fs->blocksize;
 	statbuf->st_blocks = blocks_from_inode(fs, &inode);
-	statbuf->st_atime = inode.i_atime;
-	statbuf->st_mtime = inode.i_mtime;
-	statbuf->st_ctime = inode.i_ctime;
+	EXT4_INODE_GET_XTIME(i_atime, &tv, &inode);
+	statbuf->st_atime = tv.tv_sec;
+	EXT4_INODE_GET_XTIME(i_mtime, &tv, &inode);
+	statbuf->st_mtime = tv.tv_sec;
+	EXT4_INODE_GET_XTIME(i_ctime, &tv, &inode);
+	statbuf->st_ctime = tv.tv_sec;
 	if (LINUX_S_ISCHR(inode.i_mode) ||
 	    LINUX_S_ISBLK(inode.i_mode)) {
 		if (inode.i_block[0])



* [PATCH 33/37] fuse2fs: implement fallocate
  2014-05-01 23:12 [PATCH 00/37] e2fsprogs patchbomb 5/14 Darrick J. Wong
                   ` (30 preceding siblings ...)
  2014-05-01 23:15 ` [PATCH 32/37] fuse2fs: handle 64-bit dates correctly Darrick J. Wong
@ 2014-05-01 23:16 ` Darrick J. Wong
  2014-05-01 23:16 ` [PATCH 35/37] tests: enable using fuse2fs with metadata checksum test Darrick J. Wong
                   ` (2 subsequent siblings)
  34 siblings, 0 replies; 91+ messages in thread
From: Darrick J. Wong @ 2014-05-01 23:16 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

Use the (new) ext2fs_fallocate() to fallocate file space.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 misc/fuse2fs.c |   59 +++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 58 insertions(+), 1 deletion(-)
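
The byte-to-block conversion the helper performs before calling ext2fs_fallocate() can be sketched on its own. `sketch_byte_range_to_blocks` is a hypothetical name, and the zero-length and zero-blocksize checks are assumptions added for the example; the patch itself computes start and end inline and passes `end - start + 1` as the block count.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert a byte range into the range of filesystem blocks it touches.
 * Returns 0 on success, -1 for an empty range or a zero block size.
 */
static int sketch_byte_range_to_blocks(uint64_t offset, uint64_t len,
				       uint32_t blocksize,
				       uint64_t *start, uint64_t *count)
{
	uint64_t end;

	assert(start && count);
	if (len == 0 || blocksize == 0)
		return -1;

	*start = offset / blocksize;		/* first block touched */
	end = (offset + len - 1) / blocksize;	/* last block touched */
	*count = end - *start + 1;
	return 0;
}
```

With 4096-byte blocks, offset 5000 and length 10000 cover bytes 5000 through 14999, which is blocks 1 through 3, so the count is 3. A two-byte range straddling a block boundary still needs two blocks.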


diff --git a/misc/fuse2fs.c b/misc/fuse2fs.c
index 5306c4f..b161720 100644
--- a/misc/fuse2fs.c
+++ b/misc/fuse2fs.c
@@ -3276,7 +3276,64 @@ out:
 static int fallocate_helper(struct fuse_file_info *fp, int mode, off_t offset,
 			    off_t len)
 {
-	return -EOPNOTSUPP;
+	struct fuse_context *ctxt = fuse_get_context();
+	struct fuse2fs *ff = (struct fuse2fs *)ctxt->private_data;
+	struct fuse2fs_file_handle *fh = (struct fuse2fs_file_handle *)fp->fh;
+	ext2_filsys fs;
+	struct ext2_inode_large inode;
+	blk64_t start, end, x;
+	__u64 fsize;
+	errcode_t err;
+	int flags;
+	int ret = 0;
+
+	FUSE2FS_CHECK_CONTEXT(ff);
+	fs = ff->fs;
+	FUSE2FS_CHECK_MAGIC(fs, fh, FUSE2FS_FILE_MAGIC);
+	start = offset / fs->blocksize;
+	end = (offset + len - 1) / fs->blocksize;
+	dbg_printf("%s: ino=%d mode=0x%x start=%jd end=%llu\n", __func__,
+		   fh->ino, mode, offset / fs->blocksize, end);
+	if (!fs_can_allocate(ff, len / fs->blocksize))
+		return -ENOSPC;
+
+	memset(&inode, 0, sizeof(inode));
+	err = ext2fs_read_inode_full(fs, fh->ino, (struct ext2_inode *)&inode,
+				     sizeof(inode));
+	if (err)
+		return translate_error(fs, fh->ino, err);
+	fsize = EXT2_I_SIZE(&inode);
+
+	/* Allocate a bunch of blocks */
+	flags = (mode & FL_KEEP_SIZE_FLAG ? 0 :
+			EXT2_FALLOCATE_INIT_BEYOND_EOF);
+	err = ext2fs_fallocate(fs, flags, fh->ino,
+			       (struct ext2_inode *)&inode,
+			       start, end - start + 1);
+	if (err && err != EXT2_ET_BLOCK_ALLOC_FAIL)
+		return translate_error(fs, fh->ino, err);
+
+	/* Update i_size */
+	if (!(mode & FL_KEEP_SIZE_FLAG)) {
+		if (offset + len > fsize) {
+			err = ext2fs_inode_set_size(fs,
+						(struct ext2_inode *)&inode,
+						offset + len);
+			if (err)
+				return translate_error(fs, fh->ino, err);
+		}
+	}
+
+	err = update_mtime(fs, fh->ino, &inode);
+	if (err)
+		return err;
+
+	err = ext2fs_write_inode_full(fs, fh->ino, (struct ext2_inode *)&inode,
+				      sizeof(inode));
+	if (err)
+		return translate_error(fs, fh->ino, err);
+
+	return err;
 }
 
 static errcode_t clean_block_middle(ext2_filsys fs, ext2_ino_t ino,



* [PATCH 35/37] tests: enable using fuse2fs with metadata checksum test
  2014-05-01 23:12 [PATCH 00/37] e2fsprogs patchbomb 5/14 Darrick J. Wong
                   ` (31 preceding siblings ...)
  2014-05-01 23:16 ` [PATCH 33/37] fuse2fs: implement fallocate Darrick J. Wong
@ 2014-05-01 23:16 ` Darrick J. Wong
  2014-05-01 23:16 ` [PATCH 36/37] tests: test date handling Darrick J. Wong
  2014-05-01 23:16 ` [PATCH 37/37] ext5: define new subtype to add features and reduce testing complexity Darrick J. Wong
  34 siblings, 0 replies; 91+ messages in thread
From: Darrick J. Wong @ 2014-05-01 23:16 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

Create custom mount/umount commands so that we can run the metadata
checksumming tests against fuse2fs.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 tests/fuse2fs/mount  |   28 ++++++++++++++++++++++++++++
 tests/fuse2fs/umount |   21 +++++++++++++++++++++
 2 files changed, 49 insertions(+)
 create mode 100755 tests/fuse2fs/mount
 create mode 100755 tests/fuse2fs/umount


diff --git a/tests/fuse2fs/mount b/tests/fuse2fs/mount
new file mode 100755
index 0000000..321b1f5
--- /dev/null
+++ b/tests/fuse2fs/mount
@@ -0,0 +1,28 @@
+#!/bin/bash
+
+# Mount ext4 via fuse.  Put tests/fuse2fs/ at the start of PATH if you want
+# to run the metadata checksumming tests with fuse2fs.
+
+for arg in "$@"; do
+	if [ -b "${arg}" ]; then
+		DEV="${arg}"
+	elif [ -d "${arg}" ]; then
+		MNT="${arg}"
+	fi
+done
+
+if [ -z "${DEV}" -o -z "${MNT}" ]; then
+	echo "Please specify a device and a mountpoint." >&2; exit 1
+fi
+
+DIR="$(readlink -f "$(dirname "$0")")"
+if [ -n "${FUSE2FS_DEBUG}" ]; then
+	"${DIR}/../../misc/fuse2fs" "${DEV}" "${MNT}" -d >> "${FUSE2FS_DEBUG}" 2>&1 &
+	sleep 1
+	exit 0
+else
+	"${DIR}/../../misc/fuse2fs" "${DEV}" "${MNT}"
+	ERR=$?
+	sleep 1
+	exit "${ERR}"
+fi
diff --git a/tests/fuse2fs/umount b/tests/fuse2fs/umount
new file mode 100755
index 0000000..b21ee5a
--- /dev/null
+++ b/tests/fuse2fs/umount
@@ -0,0 +1,21 @@
+#!/bin/bash
+
+# unmount a filesystem
+sync
+sync
+sync
+
+sleep 2
+if [ -x /bin/umount ]; then
+	/bin/umount "$@"
+	ERR=$?
+elif [ -x /sbin/umount ]; then
+	/sbin/umount "$@"
+	ERR=$?
+else
+	echo "Where is umount?"
+	exit 5
+fi
+sleep 1
+
+exit "${ERR}"



* [PATCH 36/37] tests: test date handling
  2014-05-01 23:12 [PATCH 00/37] e2fsprogs patchbomb 5/14 Darrick J. Wong
                   ` (32 preceding siblings ...)
  2014-05-01 23:16 ` [PATCH 35/37] tests: enable using fuse2fs with metadata checksum test Darrick J. Wong
@ 2014-05-01 23:16 ` Darrick J. Wong
  2014-05-01 23:16 ` [PATCH 37/37] ext5: define new subtype to add features and reduce testing complexity Darrick J. Wong
  34 siblings, 0 replies; 91+ messages in thread
From: Darrick J. Wong @ 2014-05-01 23:16 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

Test our ability to handle the entire range of valid dates.
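
The shape of the test can be sketched in a few lines of shell.  This is
only an illustration, not part of the patch: date_to_fname and
list_test_dates are made-up helper names, and the tr set is a simplified
equivalent of the one in the diff below.  The year limits are the ones
the patch picks: 2030 for 128-byte inodes, which only have signed 32-bit
seconds (ending in 2038), and 2430 for larger inodes, which carry extra
epoch bits.

```shell
# Map a timestamp to the file name used for it: every space, colon,
# dot and hyphen becomes an underscore.
date_to_fname() {
	printf '%s\n' "$1" | tr ' :.-' '____'
}

# Print the timestamps that would be probed, one per line.
# $1 is the inode size in bytes.
list_test_dates() {
	if [ "$1" -gt 128 ]; then
		last_year=2430
	else
		last_year=2030
	fi
	year=1910
	while [ "${year}" -le "${last_year}" ]; do
		echo "${year}-01-01 00:00:00.000000000"
		year=$((year + 20))
	done
}
```

The real test touches one file per date, then compares what the kernel
and debugfs each report for the mtime against what was written.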

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 tests/metadata-checksum-test.sh |   59 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 59 insertions(+)


diff --git a/tests/metadata-checksum-test.sh b/tests/metadata-checksum-test.sh
index ad1b219..a17bfd2 100755
--- a/tests/metadata-checksum-test.sh
+++ b/tests/metadata-checksum-test.sh
@@ -3749,6 +3749,65 @@ ${fsck_cmd} -C0 -f -n "${DEV}"
 ${E2FSPROGS}/debugfs/debugfs -R 'ex /fragfile' "${DEV}" | tail -n 15
 }
 
+#####################################
+function date_test {
+msg "date_test"
+
+rm -rf /tmp/ls.before /tmp/ls.after /tmp/debugfs.diff
+
+INODE_SIZE="$(${E2FSPROGS}/misc/dumpe2fs -h "${DEV}" | grep 'Inode size:' | awk '{print $3}')"
+if [ "${INODE_SIZE}" -gt 128 ]; then
+	LAST_YEAR=2430
+else
+	LAST_YEAR=2030
+fi
+
+# Write dates
+${mount_cmd} ${MOUNT_OPTS} "${DEV}" "${MNT}" -t ext4 -o journal_checksum
+seq 1910 20 "${LAST_YEAR}" | while read year; do
+	DATE="${year}-01-01 00:00:00.000000000"
+	FNAME="$(echo "${DATE}" | tr '[ \-:.]' '____')"
+	touch -d "${DATE}" "${MNT}/${FNAME}"
+	echo "${FNAME} ${DATE}" >> /tmp/ls.before
+done
+umount "${MNT}"
+${fsck_cmd} -C0 -f -n "${DEV}"
+
+# debugfs
+seq 1910 20 "${LAST_YEAR}" | while read year; do
+	DATE="${year}-01-01 00:00:00.000000000"
+	FNAME="$(echo "${DATE}" | tr '[ \-:.]' '____')"
+	echo "${FNAME}" "$(${E2FSPROGS}/debugfs/debugfs -R "stat ${FNAME}" "${DEV}" | grep 'mtime:')"
+done > /tmp/debugfs.before
+
+# Re-read from kernel
+${mount_cmd} ${MOUNT_OPTS} "${DEV}" "${MNT}" -t ext4 -o journal_checksum
+seq 1910 20 "${LAST_YEAR}" | while read year; do
+	DATE="${year}-01-01 00:00:00.000000000"
+	FNAME="$(echo "${DATE}" | tr '[ \-:.]' '____')"
+	FDATE="$(stat -c '%y' "${MNT}/${FNAME}" | sed -e 's/......$//g')"
+	echo "${FNAME}" "${FDATE}" >> /tmp/ls.after
+done
+umount "${MNT}"
+
+# Did the kernel work?
+diff -u /tmp/ls.before /tmp/ls.after > /tmp/ls.diff || true
+
+# Does debugfs work?
+touch /tmp/debugfs.diff
+cat /tmp/debugfs.before | sed -e 's/^\(....\).*\(....\)$/\1 \2/g' | while read date fdate crap; do
+	if [ "${date}" != "${fdate}" ]; then
+		echo "${date} != ${fdate}" >> /tmp/debugfs.diff
+	fi
+done
+
+if [ "$(cat /tmp/debugfs.diff /tmp/ls.diff | wc -l)" -gt 0 ]; then
+	echo "BROKEN DATE HANDLING"
+	cat /tmp/debugfs.diff /tmp/ls.diff
+	false
+fi
+}
+
 # This test should be the last one (before speed tests, anyway)
 
 #### ALL SPEED TESTS GO AT THE END



* [PATCH 37/37] ext5: define new subtype to add features and reduce testing complexity
  2014-05-01 23:12 [PATCH 00/37] e2fsprogs patchbomb 5/14 Darrick J. Wong
                   ` (33 preceding siblings ...)
  2014-05-01 23:16 ` [PATCH 36/37] tests: test date handling Darrick J. Wong
@ 2014-05-01 23:16 ` Darrick J. Wong
  2014-05-02  9:45   ` Lukáš Czerner
  34 siblings, 1 reply; 91+ messages in thread
From: Darrick J. Wong @ 2014-05-01 23:16 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

This patch defines ext5 as a set of required feature flags and mount
options, for the purpose of spreading new features to freshly
formatted filesystems and reducing the testing matrix by disabling
nearly all mount options.  The patch uses the s_minor_rev_level field
to indicate the existence of ext5, and switch on feature/mount option
enforcement in the kernel.

The required feature set is:
^resize_inode,dir_index,ext_attr,sparse_super2,filetype,meta_bg,extents,
flex_bg,64bit,inline_data,sparse_super,huge_file,large_file,dir_nlink,
extra_isize,metadata_csum

The required mount options are:
acl,block_validity,user_xattr,journal_checksum

All other mount options are no longer functional.

The 'ext4' type remains unchanged, for people who require mount
options or a different feature set.  I don't intend to fork any code;
I'm just painting a bigger target (for testing).
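
To make the feature notation above concrete: a feature written with a
leading '^' must be clear, the others must be set, and bits outside the
list are left alone.  Below is a minimal shell sketch of that check for
the compat word only.  The function names are made up for illustration,
and the bit values are assumptions that should be checked against
lib/ext2fs/ext2_fs.h.

```shell
# Assumed bit values for the compat feature word (verify in ext2_fs.h).
EXT_ATTR=$((0x0008))
RESIZE_INODE=$((0x0010))
DIR_INDEX=$((0x0020))
SPARSE_SUPER2=$((0x0200))

# Bits ext5 cares about, and the state each of them must have.
# resize_inode is in the mask but not in REQD, so it must be clear.
COMPAT_REQD_MASK=$((RESIZE_INODE | DIR_INDEX | EXT_ATTR | SPARSE_SUPER2))
COMPAT_REQD=$((DIR_INDEX | EXT_ATTR | SPARSE_SUPER2))

# Print the bits that differ from the required state; 0 means conforming.
ext5_compat_diff() {
	echo $(( COMPAT_REQD ^ ($1 & COMPAT_REQD_MASK) ))
}

# Force the masked bits to the required state and keep all other bits.
ext5_compat_fix() {
	echo $(( COMPAT_REQD | ($1 & ~COMPAT_REQD_MASK) ))
}
```

With these values, ext5_compat_diff 572 (0x023c: the required bits plus
has_journal and resize_inode) prints 16, the resize_inode bit, and
ext5_compat_fix 572 prints 556 (0x022c), which keeps has_journal.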

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 e2fsck/problem.c                |   15 +++++++++
 e2fsck/problem.h                |   14 ++++++--
 e2fsck/unix.c                   |   68 +++++++++++++++++++++++++++++++++++++++
 lib/e2p/ls.c                    |   11 ++++++
 lib/ext2fs/ext2_fs.h            |    3 ++
 lib/ext2fs/ext2fs.h             |   50 +++++++++++++++++++++++++++++
 lib/ext2fs/initialize.c         |    1 +
 misc/Makefile.in                |   11 ++++--
 misc/mke2fs.c                   |   30 +++++++++++++++++
 misc/mke2fs.conf.in             |    4 ++
 misc/tune2fs.c                  |   23 +++++++++++++
 tests/metadata-checksum-test.sh |    5 +++
 tests/t_mke2fs_ext5/expect      |   45 ++++++++++++++++++++++++++
 tests/t_mke2fs_ext5/script      |   33 +++++++++++++++++++
 14 files changed, 306 insertions(+), 7 deletions(-)
 create mode 100644 tests/t_mke2fs_ext5/expect
 create mode 100755 tests/t_mke2fs_ext5/script


diff --git a/e2fsck/problem.c b/e2fsck/problem.c
index ec20bd1..ddfe2b7 100644
--- a/e2fsck/problem.c
+++ b/e2fsck/problem.c
@@ -454,6 +454,21 @@ static struct e2fsck_problem problem_table[] = {
 	  N_("@S 64bit filesystems needs extents to access the whole disk.  "),
 	  PROMPT_FIX, PR_PREEN_OK | PR_NO_OK},
 
+	/* ext5 feature set incorrect. */
+	{ PR_0_FIX_EXT5_FEATURES,
+	  N_("@S ext5 feature set incorrect.  "),
+	  PROMPT_FIX, PR_PREEN_OK | PR_NO_OK},
+
+	/* ext5 flag doesn't match with feature set. */
+	{ PR_0_REMOVE_EXT5_MINOR_REV,
+	  N_("@S ext5 flag doesn't match with feature set.  "),
+	  PROMPT_CLEAR, PR_PREEN_OK | PR_NO_OK},
+
+	/* ext5 default mount options incorrect. */
+	{ PR_0_FIX_EXT5_MNTOPTS,
+	  N_("@S ext5 default mount options incorrect.  "),
+	  PROMPT_FIX, PR_PREEN_OK | PR_NO_OK},
+
 	/* Pass 1 errors */
 
 	/* Pass 1: Checking inodes, blocks, and sizes */
diff --git a/e2fsck/problem.h b/e2fsck/problem.h
index bc9fa9c..935f78a 100644
--- a/e2fsck/problem.h
+++ b/e2fsck/problem.h
@@ -249,9 +249,6 @@ struct problem_context {
 /* Checking group descriptor failed */
 #define PR_0_CHECK_DESC_FAILED			0x000045
 
-/* 64bit is set but extents are not set. */
-#define PR_0_64BIT_WITHOUT_EXTENTS		0x000048
-
 /*
  * metadata_csum supersedes uninit_bg; both feature bits cannot be set
  * simultaneously.
@@ -261,6 +258,17 @@ struct problem_context {
 /* Superblock has invalid MMP checksum. */
 #define PR_0_MMP_CSUM_INVALID			0x000047
 
+/* 64bit is set but extents are not set. */
+#define PR_0_64BIT_WITHOUT_EXTENTS		0x000048
+
+/* ext5 feature set incorrect. */
+#define PR_0_FIX_EXT5_FEATURES			0x000049
+
+/* ext5 flag doesn't match with feature set. */
+#define PR_0_REMOVE_EXT5_MINOR_REV		0x00004A
+
+/* ext5 default mount options incorrect. */
+#define PR_0_FIX_EXT5_MNTOPTS			0x00004B
 
 /*
  * Pass 1 errors
diff --git a/e2fsck/unix.c b/e2fsck/unix.c
index da888c2..55a5d03 100644
--- a/e2fsck/unix.c
+++ b/e2fsck/unix.c
@@ -1205,6 +1205,71 @@ check_error:
 	return retval;
 }
 
+#define EXT5_FEATURE_COMPAT_FIXABLE	(EXT2_FEATURE_COMPAT_DIR_INDEX|\
+					 EXT2_FEATURE_COMPAT_EXT_ATTR)
+
+#define EXT5_FEATURE_INCOMPAT_FIXABLE	(EXT3_FEATURE_INCOMPAT_EXTENTS|\
+					 EXT4_FEATURE_INCOMPAT_INLINE_DATA)
+
+#define EXT5_FEATURE_RO_COMPAT_FIXABLE	(EXT4_FEATURE_RO_COMPAT_HUGE_FILE|\
+					 EXT2_FEATURE_RO_COMPAT_LARGE_FILE|\
+					 EXT4_FEATURE_RO_COMPAT_DIR_NLINK|\
+					 EXT4_FEATURE_RO_COMPAT_EXTRA_ISIZE)
+
+static void check_ext5_fs(e2fsck_t ctx, struct problem_context *pctx)
+{
+	struct ext2_super_block *sb = ctx->fs->super;
+	__u32 features[3];
+
+	if (sb->s_minor_rev_level != EXT5_MINOR_REV_LEVEL)
+		return;
+
+	features[0] = EXT5_FEATURE_COMPAT_REQD ^
+		(sb->s_feature_compat & EXT5_FEATURE_COMPAT_REQD_MASK);
+	features[1] = EXT5_FEATURE_INCOMPAT_REQD ^
+		(sb->s_feature_incompat & EXT5_FEATURE_INCOMPAT_REQD_MASK);
+	features[2] = EXT5_FEATURE_RO_COMPAT_REQD ^
+		(sb->s_feature_ro_compat & EXT5_FEATURE_RO_COMPAT_REQD_MASK);
+
+	if (!features[0] && !features[1] && !features[2])
+		goto check_mntopts;
+
+	if ((features[0] & EXT5_FEATURE_COMPAT_FIXABLE) == features[0] &&
+	    (features[1] & EXT5_FEATURE_INCOMPAT_FIXABLE) == features[1] &&
+	    (features[2] & EXT5_FEATURE_RO_COMPAT_FIXABLE) == features[2]) {
+		if (fix_problem(ctx, PR_0_FIX_EXT5_FEATURES, pctx)) {
+			sb->s_feature_compat = EXT5_FEATURE_COMPAT_REQD |
+				(sb->s_feature_compat &
+				 ~EXT5_FEATURE_COMPAT_REQD_MASK);
+			sb->s_feature_incompat = EXT5_FEATURE_INCOMPAT_REQD |
+				(sb->s_feature_incompat &
+				 ~EXT5_FEATURE_INCOMPAT_REQD_MASK);
+			sb->s_feature_ro_compat = EXT5_FEATURE_RO_COMPAT_REQD |
+				(sb->s_feature_ro_compat &
+				 ~EXT5_FEATURE_RO_COMPAT_REQD_MASK);
+			ext2fs_mark_super_dirty(ctx->fs);
+		}
+	} else {
+		if (fix_problem(ctx, PR_0_REMOVE_EXT5_MINOR_REV, pctx)) {
+			sb->s_minor_rev_level = 0;
+			ext2fs_mark_super_dirty(ctx->fs);
+		}
+	}
+
+check_mntopts:
+	if (!(EXT5_DEF_MNTOPT ^
+	      (sb->s_default_mount_opts & EXT5_DEF_MNTOPT_MASK)))
+		return;
+
+	if (fix_problem(ctx, PR_0_FIX_EXT5_MNTOPTS, pctx)) {
+		sb->s_default_mount_opts = EXT5_DEF_MNTOPT |
+			(sb->s_default_mount_opts & ~EXT5_DEF_MNTOPT_MASK);
+		ext2fs_mark_super_dirty(ctx->fs);
+	}
+
+	return;
+}
+
 int main (int argc, char *argv[])
 {
 	errcode_t	retval = 0, retval2 = 0, orig_retval = 0;
@@ -1601,6 +1666,9 @@ print_unsupp_features:
 	}
 #endif
 
+	/* check ext5 features and mount options */
+	check_ext5_fs(ctx, &pctx);
+
 	/*
 	 * If the user specified a specific superblock, presumably the
 	 * master superblock has been trashed.  So we mark the
diff --git a/lib/e2p/ls.c b/lib/e2p/ls.c
index a7ea38a..ba91e6a 100644
--- a/lib/e2p/ls.c
+++ b/lib/e2p/ls.c
@@ -239,6 +239,17 @@ void list_super2(struct ext2_super_block * sb, FILE *f)
 #endif
 	} else
 		fprintf(f, " (unknown)\n");
+	if (sb->s_minor_rev_level) {
+		fprintf(f, "Filesystem minor rev #:   %d",
+			sb->s_minor_rev_level);
+		switch (sb->s_minor_rev_level) {
+		case EXT5_MINOR_REV_LEVEL:
+			fprintf(f, " (ext5)\n");
+			break;
+		default:
+			fprintf(f, " (unknown)\n");
+		}
+	}
 	print_features(sb, f);
 	print_super_flags(sb, f);
 	print_mntopts(sb, f);
diff --git a/lib/ext2fs/ext2_fs.h b/lib/ext2fs/ext2_fs.h
index 21a8187..027cfe9 100644
--- a/lib/ext2fs/ext2_fs.h
+++ b/lib/ext2fs/ext2_fs.h
@@ -926,4 +926,7 @@ struct mmp_struct {
  */
 #define EXT4_INLINE_DATA_DOTDOT_SIZE	(4)
 
+/* Minor revision level for ext5 */
+#define EXT5_MINOR_REV_LEVEL		(2)
+
 #endif	/* _LINUX_EXT2_FS_H */
diff --git a/lib/ext2fs/ext2fs.h b/lib/ext2fs/ext2fs.h
index 84c7c74..fd53162 100644
--- a/lib/ext2fs/ext2fs.h
+++ b/lib/ext2fs/ext2fs.h
@@ -611,6 +611,56 @@ typedef struct ext2_icount *ext2_icount_t;
 					 EXT4_LIB_RO_COMPAT_QUOTA|\
 					 EXT4_FEATURE_RO_COMPAT_METADATA_CSUM)
 
+/* ext5 features */
+#define EXT5_FEATURE_COMPAT_REQD_MASK	(EXT2_FEATURE_COMPAT_RESIZE_INODE|\
+					 EXT2_FEATURE_COMPAT_DIR_INDEX|\
+					 EXT2_FEATURE_COMPAT_EXT_ATTR|\
+					 EXT4_FEATURE_COMPAT_SPARSE_SUPER2)
+
+#define EXT5_FEATURE_COMPAT_REQD	(EXT2_FEATURE_COMPAT_DIR_INDEX|\
+					 EXT2_FEATURE_COMPAT_EXT_ATTR|\
+					 EXT4_FEATURE_COMPAT_SPARSE_SUPER2)
+
+#define EXT5_FEATURE_INCOMPAT_REQD_MASK	(EXT2_FEATURE_INCOMPAT_FILETYPE|\
+					 EXT2_FEATURE_INCOMPAT_META_BG|\
+					 EXT3_FEATURE_INCOMPAT_EXTENTS|\
+					 EXT4_FEATURE_INCOMPAT_FLEX_BG|\
+					 EXT4_FEATURE_INCOMPAT_64BIT|\
+					 EXT4_FEATURE_INCOMPAT_INLINE_DATA)
+
+#define EXT5_FEATURE_INCOMPAT_REQD	(EXT2_FEATURE_INCOMPAT_FILETYPE|\
+					 EXT2_FEATURE_INCOMPAT_META_BG|\
+					 EXT3_FEATURE_INCOMPAT_EXTENTS|\
+					 EXT4_FEATURE_INCOMPAT_FLEX_BG|\
+					 EXT4_FEATURE_INCOMPAT_64BIT|\
+					 EXT4_FEATURE_INCOMPAT_INLINE_DATA)
+
+#define EXT5_FEATURE_RO_COMPAT_REQD_MASK (EXT2_FEATURE_RO_COMPAT_SPARSE_SUPER|\
+					 EXT4_FEATURE_RO_COMPAT_HUGE_FILE|\
+					 EXT2_FEATURE_RO_COMPAT_LARGE_FILE|\
+					 EXT4_FEATURE_RO_COMPAT_DIR_NLINK|\
+					 EXT4_FEATURE_RO_COMPAT_EXTRA_ISIZE|\
+					 EXT4_FEATURE_RO_COMPAT_GDT_CSUM|\
+					 EXT4_FEATURE_RO_COMPAT_METADATA_CSUM)
+
+#define EXT5_FEATURE_RO_COMPAT_REQD	(EXT2_FEATURE_RO_COMPAT_SPARSE_SUPER|\
+					 EXT4_FEATURE_RO_COMPAT_HUGE_FILE|\
+					 EXT2_FEATURE_RO_COMPAT_LARGE_FILE|\
+					 EXT4_FEATURE_RO_COMPAT_DIR_NLINK|\
+					 EXT4_FEATURE_RO_COMPAT_EXTRA_ISIZE|\
+					 EXT4_FEATURE_RO_COMPAT_METADATA_CSUM)
+
+#define EXT5_DEF_MNTOPT_MASK		(EXT2_DEFM_XATTR_USER|\
+					 EXT2_DEFM_ACL|\
+					 EXT2_DEFM_UID16|\
+					 EXT4_DEFM_NOBARRIER|\
+					 EXT4_DEFM_BLOCK_VALIDITY|\
+					 EXT4_DEFM_NODELALLOC)
+
+#define EXT5_DEF_MNTOPT			(EXT2_DEFM_XATTR_USER|\
+					 EXT2_DEFM_ACL|\
+					 EXT4_DEFM_BLOCK_VALIDITY)
+
 /*
  * These features are only allowed if EXT2_FLAG_SOFTSUPP_FEATURES is passed
  * to ext2fs_openfs()
diff --git a/lib/ext2fs/initialize.c b/lib/ext2fs/initialize.c
index 75fbf8e..2d0731b 100644
--- a/lib/ext2fs/initialize.c
+++ b/lib/ext2fs/initialize.c
@@ -173,6 +173,7 @@ errcode_t ext2fs_initialize(const char *name, int flags,
 	set_field(s_raid_stripe_width, 0);	/* default stripe width: 0 */
 	set_field(s_log_groups_per_flex, 0);
 	set_field(s_flags, 0);
+	set_field(s_minor_rev_level, 0);
 	assign_field(s_backup_bgs[0]);
 	assign_field(s_backup_bgs[1]);
 	if (super->s_feature_incompat & ~EXT2_LIB_FEATURE_INCOMPAT_SUPP) {
diff --git a/misc/Makefile.in b/misc/Makefile.in
index 1b942f2..6776f41 100644
--- a/misc/Makefile.in
+++ b/misc/Makefile.in
@@ -475,7 +475,7 @@ install: all $(SMANPAGES) $(UMANPAGES) installdirs
 		$(ES) "	INSTALL $(sbindir)/$$i"; \
 		$(INSTALL_PROGRAM) $$i $(DESTDIR)$(sbindir)/$$i; \
 	done
-	$(Q) for i in ext2 ext3 ext4 ext4dev; do \
+	$(Q) for i in ext2 ext3 ext4 ext4dev ext5; do \
 		$(ES) "	LINK $(root_sbindir)/mkfs.$$i"; \
 		(cd $(DESTDIR)$(root_sbindir); \
 			$(LN) $(LINK_INSTALL_FLAGS) mke2fs mkfs.$$i); \
@@ -504,7 +504,7 @@ install: all $(SMANPAGES) $(UMANPAGES) installdirs
 	done
 	$(Q) $(RM) -f $(DESTDIR)$(man8dir)/mkfs.ext2.8.gz \
 		$(DESTDIR)$(man8dir)/mkfs.ext3.8.gz
-	$(Q) for i in ext2 ext3 ext4 ext4dev; do \
+	$(Q) for i in ext2 ext3 ext4 ext4dev ext5; do \
 		$(ES) "	LINK mkfs.$$i.8"; \
 		(cd $(DESTDIR)$(man8dir); \
 			$(LN) $(LINK_INSTALL_FLAGS) mke2fs.8 mkfs.$$i.8); \
@@ -580,7 +580,8 @@ uninstall:
 	$(RM) -f $(DESTDIR)$(root_sbindir)/mkfs.ext2 \
 			$(DESTDIR)$(root_sbindir)/mkfs.ext3 \
 			$(DESTDIR)$(root_sbindir)/mkfs.ext4 \
-			$(DESTDIR)$(root_sbindir)/mkfs.ext4dev
+			$(DESTDIR)$(root_sbindir)/mkfs.ext4dev \
+			$(DESTDIR)$(root_sbindir)/mkfs.ext5
 	for i in $(UPROGS); do \
 		$(RM) -f $(DESTDIR)$(bindir)/$$i; \
 	done
@@ -591,10 +592,12 @@ uninstall:
 		$(DESTDIR)$(man8dir)/mkfs.ext3.8 \
 		$(DESTDIR)$(man8dir)/mkfs.ext4.8 \
 		$(DESTDIR)$(man8dir)/mkfs.ext4dev.8 \
+		$(DESTDIR)$(man8dir)/mkfs.ext5.8 \
 		$(DESTDIR)$(man8dir)/fsck.ext2.8 \
 		$(DESTDIR)$(man8dir)/fsck.ext3.8 \
 		$(DESTDIR)$(man8dir)/fsck.ext4.8 \
-		$(DESTDIR)$(man8dir)/fsck.ext4dev.8
+		$(DESTDIR)$(man8dir)/fsck.ext4dev.8 \
+		$(DESTDIR)$(man8dir)/fsck.ext5.8
 
 	for i in $(UMANPAGES); do \
 		$(RM) -f $(DESTDIR)$(man1dir)/$$i; \
diff --git a/misc/mke2fs.c b/misc/mke2fs.c
index a794689..c810238 100644
--- a/misc/mke2fs.c
+++ b/misc/mke2fs.c
@@ -1915,6 +1915,36 @@ profile_error:
 		     &fs_param.s_feature_compat);
 	if (tmp)
 		free(tmp);
+
+	/* Add in ext5 options */
+	tmp = get_string_from_profile(fs_types, "interface", NULL);
+	if (tmp) {
+		if (!strcmp(tmp, "ext5"))
+			fs_param.s_minor_rev_level = EXT5_MINOR_REV_LEVEL;
+		else {
+			fprintf(stderr, _("Unknown interface `%s'.\n"), tmp);
+			exit(1);
+		}
+		free(tmp);
+	}
+	if (fs_param.s_minor_rev_level == EXT5_MINOR_REV_LEVEL) {
+		fs_param.s_feature_incompat = EXT5_FEATURE_INCOMPAT_REQD |
+			(fs_param.s_feature_incompat &
+			 ~EXT5_FEATURE_INCOMPAT_REQD_MASK);
+		fs_param.s_feature_ro_compat = EXT5_FEATURE_RO_COMPAT_REQD |
+			(fs_param.s_feature_ro_compat &
+			 ~EXT5_FEATURE_RO_COMPAT_REQD_MASK);
+		fs_param.s_feature_compat = EXT5_FEATURE_COMPAT_REQD |
+			(fs_param.s_feature_compat &
+			 ~EXT5_FEATURE_COMPAT_REQD_MASK);
+		fs_param.s_default_mount_opts = EXT5_DEF_MNTOPT |
+			(fs_param.s_default_mount_opts & ~EXT5_DEF_MNTOPT_MASK);
+		fs_param.s_rev_level = EXT2_DYNAMIC_REV;
+		if (r_opt < EXT2_DYNAMIC_REV)
+			r_opt = -1;
+		fs_param.s_inode_size = 256;
+	}
+
 	/*
 	 * If the user specified features incompatible with the Hurd, complain
 	 */
diff --git a/misc/mke2fs.conf.in b/misc/mke2fs.conf.in
index de0250d..94fd139 100644
--- a/misc/mke2fs.conf.in
+++ b/misc/mke2fs.conf.in
@@ -20,6 +20,10 @@
 		inode_size = 256
 		options = test_fs=1
 	}
+	ext5 = {
+		features = has_journal
+		interface = ext5
+	}
 	small = {
 		blocksize = 1024
 		inode_size = 128
diff --git a/misc/tune2fs.c b/misc/tune2fs.c
index 6571764..d3d6330 100644
--- a/misc/tune2fs.c
+++ b/misc/tune2fs.c
@@ -2406,6 +2406,26 @@ static int tune2fs_setup_tdb(const char *name, io_manager *io_ptr)
 	return retval;
 }
 
+static errcode_t update_minor_rev(ext2_filsys fs)
+{
+	if (fs->super->s_minor_rev_level != EXT5_MINOR_REV_LEVEL)
+		return 0;
+
+	if ((EXT5_FEATURE_COMPAT_REQD ^
+	     (fs->super->s_feature_compat & EXT5_FEATURE_COMPAT_REQD_MASK)) ||
+	    (EXT5_FEATURE_INCOMPAT_REQD ^
+	     (fs->super->s_feature_incompat & EXT5_FEATURE_INCOMPAT_REQD_MASK)) ||
+	    (EXT5_FEATURE_RO_COMPAT_REQD ^
+	     (fs->super->s_feature_ro_compat & EXT5_FEATURE_RO_COMPAT_REQD_MASK)) ||
+	    (EXT5_DEF_MNTOPT ^
+	     (fs->super->s_default_mount_opts & EXT5_DEF_MNTOPT_MASK))) {
+		fs->super->s_minor_rev_level = 0;
+		ext2fs_mark_super_dirty(fs);
+	}
+
+	return 0;
+}
+
 int main(int argc, char **argv)
 {
 	errcode_t retval;
@@ -2659,6 +2679,9 @@ retry_open:
 		if (rc)
 			goto closefs;
 	}
+	rc = update_minor_rev(fs);
+	if (rc)
+		goto closefs;
 	if (extended_cmd) {
 		rc = parse_extended_opts(fs, extended_cmd);
 		if (rc)
diff --git a/tests/metadata-checksum-test.sh b/tests/metadata-checksum-test.sh
index a17bfd2..e51b1fa 100755
--- a/tests/metadata-checksum-test.sh
+++ b/tests/metadata-checksum-test.sh
@@ -190,6 +190,7 @@ cat > "${MKE2FS_CONFIG}" << ENDL
 	blocksize = 4096
 	inode_size = 256
 	inode_ratio = 16384
+	interface = ext5
 
 [fs_types]
 	ext4icsum_no_bv = {
@@ -200,6 +201,7 @@ cat > "${MKE2FS_CONFIG}" << ENDL
 		options = mmp_update_interval=5 #${RESIZE_PARAM}
 		lazy_itable_init = 1
 		cluster_size = $((BLK_SZ * 2))
+		interface = ext5
 	}
 	ext4icsum = {
 		features = has_journal,extent,huge_file,flex_bg,uninit_bg,dir_nlink,extra_isize,64bit$MKFS_OPTS
@@ -208,6 +210,7 @@ cat > "${MKE2FS_CONFIG}" << ENDL
 		options = mmp_update_interval=5 #${RESIZE_PARAM}
 		lazy_itable_init = 1
 		cluster_size = $((BLK_SZ * 2))
+		interface = ext5
 	}
 	ext4icsum_noresize = {
 		features = has_journal,extent,huge_file,flex_bg,uninit_bg,dir_nlink,extra_isize,64bit$MKFS_OPTS
@@ -216,6 +219,7 @@ cat > "${MKE2FS_CONFIG}" << ENDL
 		options = mmp_update_interval=5
 		lazy_itable_init = 1
 		cluster_size = $((BLK_SZ * 2))
+		interface = ext5
 	}
 	ext4icsum_hugefiles = {
 		features = has_journal,extent,huge_file,flex_bg,uninit_bg,dir_nlink,extra_isize,64bit$MKFS_OPTS
@@ -235,6 +239,7 @@ cat > "${MKE2FS_CONFIG}" << ENDL
 		hugefiles_digits = 4
 		hugefiles_size = 1G
 		num_hugefiles = 0
+		interface = ext5
 	}
 ENDL
 MKFS_OPTS=""
diff --git a/tests/t_mke2fs_ext5/expect b/tests/t_mke2fs_ext5/expect
new file mode 100644
index 0000000..87e1185
--- /dev/null
+++ b/tests/t_mke2fs_ext5/expect
@@ -0,0 +1,45 @@
+Filesystem volume name:   <none>
+Last mounted on:          <not available>
+Filesystem magic number:  0xEF53
+Filesystem revision #:    1 (dynamic)
+Filesystem minor rev #:   2 (ext5)
+Filesystem features:      ext_attr dir_index sparse_super2 filetype meta_bg extent 64bit flex_bg inline_data sparse_super large_file huge_file dir_nlink extra_isize metadata_csum
+Filesystem flags:         signed_directory_hash 
+Default mount options:    user_xattr acl block_validity
+Filesystem state:         clean
+Errors behavior:          Continue
+Filesystem OS type:       Linux
+Inode count:              64
+Block count:              128
+Reserved block count:     6
+Free blocks:              116
+Free inodes:              53
+First block:              0
+Block size:               4096
+Fragment size:            4096
+Group descriptor size:    64
+Blocks per group:         32768
+Fragments per group:      32768
+Inodes per group:         64
+Inode blocks per group:   4
+Flex block group size:    16
+Last mount time:          n/a
+Mount count:              0
+Maximum mount count:      -1
+Check interval:           0 (<none>)
+Lifetime writes:          5 kB
+Reserved blocks uid:      0 (user root)
+Reserved blocks gid:      0 (group root)
+First inode:              11
+Inode size:	          256
+Required extra isize:     28
+Desired extra isize:      28
+Default directory hash:   half_md4
+
+
+Group 0: (Blocks 0-127) [ITABLE_ZEROED]
+  Primary superblock at 0, Group descriptor at 1
+  Inode table at 34-37 (+34)
+  116 free blocks, 53 free inodes, 2 directories, 53 unused inodes
+  Free blocks: 7-17, 19-33, 38-127
+  Free inodes: 12-64
diff --git a/tests/t_mke2fs_ext5/script b/tests/t_mke2fs_ext5/script
new file mode 100755
index 0000000..9be9bf5
--- /dev/null
+++ b/tests/t_mke2fs_ext5/script
@@ -0,0 +1,33 @@
+test_description="mke2fs with ext5"
+
+conf=$TMPFILE.conf
+
+cat > $conf << ENDL
+[defaults]
+	interface = ext5
+ENDL
+
+trap "rm -rf $TMPFILE $TMPFILE.conf" EXIT INT QUIT
+dd if=/dev/zero of=$TMPFILE bs=1k count=512 > /dev/null 2>&1
+OUT=$test_name.log
+EXP=$test_dir/expect
+rm -rf $OUT
+
+# Test command line option
+MKE2FS_CONFIG=$TMPFILE.conf
+export MKE2FS_CONFIG
+$MKE2FS -F $TMPFILE > /dev/null 2>&1
+$DUMPE2FS $TMPFILE | egrep -v "(Filesystem UUID|Filesystem created|Last write time|Last checked|Directory Hash Seed|Checksum| csum )" >> $OUT
+
+cmp -s $OUT $EXP
+status=$?
+
+if [ "$status" = 0 ] ; then
+	echo "$test_name: $test_description: ok"
+	touch $test_name.ok
+else
+	echo "$test_name: $test_description: failed"
+	diff $DIFF_OPTS $EXP $OUT > $test_name.failed
+	rm -f $test_name.tmp
+fi
+



* Re: [PATCH 37/37] ext5: define new subtype to add features and reduce testing complexity
  2014-05-01 23:16 ` [PATCH 37/37] ext5: define new subtype to add features and reduce testing complexity Darrick J. Wong
@ 2014-05-02  9:45   ` Lukáš Czerner
  2014-05-02 14:04     ` Theodore Ts'o
  2014-05-06  1:33     ` Darrick J. Wong
  0 siblings, 2 replies; 91+ messages in thread
From: Lukáš Czerner @ 2014-05-02  9:45 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: tytso, linux-ext4

On Thu, 1 May 2014, Darrick J. Wong wrote:

> Date: Thu, 01 May 2014 16:16:29 -0700
> From: Darrick J. Wong <darrick.wong@oracle.com>
> To: tytso@mit.edu, darrick.wong@oracle.com
> Cc: linux-ext4@vger.kernel.org
> Subject: [PATCH 37/37] ext5: define new subtype to add features and reduce
>     testing complexity
> 
> This patch defines ext5 as a set of required feature flags and mount
> options, for the purpose of spreading new features to freshly
> formatted filesystems and reducing the testing matrix by disabling
> nearly all mount options.  The patch uses the s_minor_rev_level field
> to indicate the existence of ext5, and switch on feature/mount option
> enforcement in the kernel.
> 
> The required feature set is:
> ^resize_inode,dir_index,ext_attr,sparse_super2,filetype,meta_bg,extents,
> flex_bg,64bit,inline_data,sparse_super,huge_file,large_file,dir_nlink,
> extra_isize,metadata_csum
> 
> The required mount options are:
> acl,block_validity,user_xattr,journal_checksum
> 
> All other mount options are no longer functional.
> 
> The 'ext4' type remains unchanged, for people who require mount
> options or a different feature set.  I don't intend to fork any code;
> I'm just painting a bigger target (for testing).

This is definitely a NACK from me. I do not like this, and there are
several reasons why.

First of all, the name. Given the history of the ext file systems, we
tend to increase the number with each new version of the file system.
However, you're saying that this is just for testing features. In that
case it does not make any sense to call it ext5. Not only that, it's
stupid to call it ext5, especially since we might actually want to
release ext5 in the future, and this would be really confusing for
everybody involved.

I've been trying to get rid of the ext4dev bits and pieces, more or
less successfully, and you're adding a new type once again. We might
start a discussion about whether to revive ext4dev for this kind of
thing, but I am not really convinced that this is the right way to go
either.

What about simply using mke2fs.conf to specify the feature set we
want, and using that? It's simple enough and it should work. We could
also extend the configuration to be able to set default mount options
and such, if that's not possible already. I just do not understand why
we would introduce a new file system type if it's just for testing
ext4 features.
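
A rough sketch of the mke2fs.conf approach, assuming the feature list
from the patch description.  The fs_types name "ext4_next" and the helper
write_feature_conf are made up for illustration; nothing like them ships
with e2fsprogs, and default mount options are left out here.

```shell
# Write a throwaway mke2fs.conf to the path given as $1.
# "ext4_next" is a hypothetical fs_types entry used only in this sketch.
write_feature_conf() {
	cat > "$1" << ENDL
[fs_types]
	ext4_next = {
		features = has_journal,^resize_inode,dir_index,ext_attr,sparse_super2,filetype,meta_bg,extent,flex_bg,64bit,inline_data,sparse_super,huge_file,large_file,dir_nlink,extra_isize,metadata_csum
		inode_size = 256
	}
ENDL
}

# Possible usage (not run here, since it formats the target):
#   conf="$(mktemp)"
#   write_feature_conf "${conf}"
#   MKE2FS_CONFIG="${conf}" mke2fs -F -T ext4_next /path/to/image
```

This keeps the feature selection in configuration, the same way the
t_mke2fs_ext5 test script points MKE2FS_CONFIG at a temporary file,
without touching s_minor_rev_level.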

Thanks!
-Lukas



> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  e2fsck/problem.c                |   15 +++++++++
>  e2fsck/problem.h                |   14 ++++++--
>  e2fsck/unix.c                   |   68 +++++++++++++++++++++++++++++++++++++++
>  lib/e2p/ls.c                    |   11 ++++++
>  lib/ext2fs/ext2_fs.h            |    3 ++
>  lib/ext2fs/ext2fs.h             |   50 +++++++++++++++++++++++++++++
>  lib/ext2fs/initialize.c         |    1 +
>  misc/Makefile.in                |   11 ++++--
>  misc/mke2fs.c                   |   30 +++++++++++++++++
>  misc/mke2fs.conf.in             |    4 ++
>  misc/tune2fs.c                  |   23 +++++++++++++
>  tests/metadata-checksum-test.sh |    5 +++
>  tests/t_mke2fs_ext5/expect      |   45 ++++++++++++++++++++++++++
>  tests/t_mke2fs_ext5/script      |   33 +++++++++++++++++++
>  14 files changed, 306 insertions(+), 7 deletions(-)
>  create mode 100644 tests/t_mke2fs_ext5/expect
>  create mode 100755 tests/t_mke2fs_ext5/script
> 
> 
> diff --git a/e2fsck/problem.c b/e2fsck/problem.c
> index ec20bd1..ddfe2b7 100644
> --- a/e2fsck/problem.c
> +++ b/e2fsck/problem.c
> @@ -454,6 +454,21 @@ static struct e2fsck_problem problem_table[] = {
>  	  N_("@S 64bit filesystems needs extents to access the whole disk.  "),
>  	  PROMPT_FIX, PR_PREEN_OK | PR_NO_OK},
>  
> +	/* ext5 feature set incorrect. */
> +	{ PR_0_FIX_EXT5_FEATURES,
> +	  N_("@S ext5 feature set incorrect.  "),
> +	  PROMPT_FIX, PR_PREEN_OK | PR_NO_OK},
> +
> +	/* ext5 flag doesn't match with feature set. */
> +	{ PR_0_REMOVE_EXT5_MINOR_REV,
> +	  N_("@S ext5 flag doesn't match with feature set.  "),
> +	  PROMPT_CLEAR, PR_PREEN_OK | PR_NO_OK},
> +
> +	/* ext5 default mount options incorrect. */
> +	{ PR_0_FIX_EXT5_MNTOPTS,
> +	  N_("@S ext5 default mount options incorrect.  "),
> +	  PROMPT_FIX, PR_PREEN_OK | PR_NO_OK},
> +
>  	/* Pass 1 errors */
>  
>  	/* Pass 1: Checking inodes, blocks, and sizes */
> diff --git a/e2fsck/problem.h b/e2fsck/problem.h
> index bc9fa9c..935f78a 100644
> --- a/e2fsck/problem.h
> +++ b/e2fsck/problem.h
> @@ -249,9 +249,6 @@ struct problem_context {
>  /* Checking group descriptor failed */
>  #define PR_0_CHECK_DESC_FAILED			0x000045
>  
> -/* 64bit is set but extents are not set. */
> -#define PR_0_64BIT_WITHOUT_EXTENTS		0x000048
> -
>  /*
>   * metadata_csum supersedes uninit_bg; both feature bits cannot be set
>   * simultaneously.
> @@ -261,6 +258,17 @@ struct problem_context {
>  /* Superblock has invalid MMP checksum. */
>  #define PR_0_MMP_CSUM_INVALID			0x000047
>  
> +/* 64bit is set but extents are not set. */
> +#define PR_0_64BIT_WITHOUT_EXTENTS		0x000048
> +
> +/* ext5 feature set incorrect. */
> +#define PR_0_FIX_EXT5_FEATURES			0x000049
> +
> +/* ext5 flag doesn't match with feature set. */
> +#define PR_0_REMOVE_EXT5_MINOR_REV		0x00004A
> +
> +/* ext5 default mount options incorrect. */
> +#define PR_0_FIX_EXT5_MNTOPTS			0x00004B
>  
>  /*
>   * Pass 1 errors
> diff --git a/e2fsck/unix.c b/e2fsck/unix.c
> index da888c2..55a5d03 100644
> --- a/e2fsck/unix.c
> +++ b/e2fsck/unix.c
> @@ -1205,6 +1205,71 @@ check_error:
>  	return retval;
>  }
>  
> +#define EXT5_FEATURE_COMPAT_FIXABLE	(EXT2_FEATURE_COMPAT_DIR_INDEX|\
> +					 EXT2_FEATURE_COMPAT_EXT_ATTR)
> +
> +#define EXT5_FEATURE_INCOMPAT_FIXABLE	(EXT3_FEATURE_INCOMPAT_EXTENTS|\
> +					 EXT4_FEATURE_INCOMPAT_INLINE_DATA)
> +
> +#define EXT5_FEATURE_RO_COMPAT_FIXABLE	(EXT4_FEATURE_RO_COMPAT_HUGE_FILE|\
> +					 EXT2_FEATURE_RO_COMPAT_LARGE_FILE|\
> +					 EXT4_FEATURE_RO_COMPAT_DIR_NLINK|\
> +					 EXT4_FEATURE_RO_COMPAT_EXTRA_ISIZE)
> +
> +static void check_ext5_fs(e2fsck_t ctx, struct problem_context *pctx)
> +{
> +	struct ext2_super_block *sb = ctx->fs->super;
> +	__u32 features[3];
> +
> +	if (sb->s_minor_rev_level != EXT5_MINOR_REV_LEVEL)
> +		return;
> +
> +	features[0] = EXT5_FEATURE_COMPAT_REQD ^
> +		(sb->s_feature_compat & EXT5_FEATURE_COMPAT_REQD_MASK);
> +	features[1] = EXT5_FEATURE_INCOMPAT_REQD ^
> +		(sb->s_feature_incompat & EXT5_FEATURE_INCOMPAT_REQD_MASK);
> +	features[2] = EXT5_FEATURE_RO_COMPAT_REQD ^
> +		(sb->s_feature_ro_compat & EXT5_FEATURE_RO_COMPAT_REQD_MASK);
> +
> +	if (!features[0] && !features[1] && !features[2])
> +		goto check_mntopts;
> +
> +	if ((features[0] & EXT5_FEATURE_COMPAT_FIXABLE) == features[0] &&
> +	    (features[1] & EXT5_FEATURE_INCOMPAT_FIXABLE) == features[1] &&
> +	    (features[2] & EXT5_FEATURE_RO_COMPAT_FIXABLE) == features[2]) {
> +		if (fix_problem(ctx, PR_0_FIX_EXT5_FEATURES, pctx)) {
> +			sb->s_feature_compat = EXT5_FEATURE_COMPAT_REQD |
> +				(sb->s_feature_compat &
> +				 ~EXT5_FEATURE_COMPAT_REQD_MASK);
> +			sb->s_feature_incompat = EXT5_FEATURE_INCOMPAT_REQD |
> +				(sb->s_feature_incompat &
> +				 ~EXT5_FEATURE_INCOMPAT_REQD_MASK);
> +			sb->s_feature_ro_compat = EXT5_FEATURE_RO_COMPAT_REQD |
> +				(sb->s_feature_ro_compat &
> +				 ~EXT5_FEATURE_RO_COMPAT_REQD_MASK);
> +			ext2fs_mark_super_dirty(ctx->fs);
> +		}
> +	} else {
> +		if (fix_problem(ctx, PR_0_REMOVE_EXT5_MINOR_REV, pctx)) {
> +			sb->s_minor_rev_level = 0;
> +			ext2fs_mark_super_dirty(ctx->fs);
> +		}
> +	}
> +
> +check_mntopts:
> +	if (!(EXT5_DEF_MNTOPT ^
> +	      (sb->s_default_mount_opts & EXT5_DEF_MNTOPT_MASK)))
> +		return;
> +
> +	if (fix_problem(ctx, PR_0_FIX_EXT5_MNTOPTS, pctx)) {
> +		sb->s_default_mount_opts = EXT5_DEF_MNTOPT |
> +			(sb->s_default_mount_opts & ~EXT5_DEF_MNTOPT_MASK);
> +		ext2fs_mark_super_dirty(ctx->fs);
> +	}
> +
> +	return;
> +}
> +
>  int main (int argc, char *argv[])
>  {
>  	errcode_t	retval = 0, retval2 = 0, orig_retval = 0;
> @@ -1601,6 +1666,9 @@ print_unsupp_features:
>  	}
>  #endif
>  
> +	/* check ext5 features and mount options */
> +	check_ext5_fs(ctx, &pctx);
> +
>  	/*
>  	 * If the user specified a specific superblock, presumably the
>  	 * master superblock has been trashed.  So we mark the
> diff --git a/lib/e2p/ls.c b/lib/e2p/ls.c
> index a7ea38a..ba91e6a 100644
> --- a/lib/e2p/ls.c
> +++ b/lib/e2p/ls.c
> @@ -239,6 +239,17 @@ void list_super2(struct ext2_super_block * sb, FILE *f)
>  #endif
>  	} else
>  		fprintf(f, " (unknown)\n");
> +	if (sb->s_minor_rev_level) {
> +		fprintf(f, "Filesystem minor rev #:   %d",
> +			sb->s_minor_rev_level);
> +		switch (sb->s_minor_rev_level) {
> +		case EXT5_MINOR_REV_LEVEL:
> +			fprintf(f, " (ext5)\n");
> +			break;
> +		default:
> +			fprintf(f, " (unknown)\n");
> +		}
> +	}
>  	print_features(sb, f);
>  	print_super_flags(sb, f);
>  	print_mntopts(sb, f);
> diff --git a/lib/ext2fs/ext2_fs.h b/lib/ext2fs/ext2_fs.h
> index 21a8187..027cfe9 100644
> --- a/lib/ext2fs/ext2_fs.h
> +++ b/lib/ext2fs/ext2_fs.h
> @@ -926,4 +926,7 @@ struct mmp_struct {
>   */
>  #define EXT4_INLINE_DATA_DOTDOT_SIZE	(4)
>  
> +/* Minor revision level for ext5 */
> +#define EXT5_MINOR_REV_LEVEL		(2)
> +
>  #endif	/* _LINUX_EXT2_FS_H */
> diff --git a/lib/ext2fs/ext2fs.h b/lib/ext2fs/ext2fs.h
> index 84c7c74..fd53162 100644
> --- a/lib/ext2fs/ext2fs.h
> +++ b/lib/ext2fs/ext2fs.h
> @@ -611,6 +611,56 @@ typedef struct ext2_icount *ext2_icount_t;
>  					 EXT4_LIB_RO_COMPAT_QUOTA|\
>  					 EXT4_FEATURE_RO_COMPAT_METADATA_CSUM)
>  
> +/* ext5 features */
> +#define EXT5_FEATURE_COMPAT_REQD_MASK	(EXT2_FEATURE_COMPAT_RESIZE_INODE|\
> +					 EXT2_FEATURE_COMPAT_DIR_INDEX|\
> +					 EXT2_FEATURE_COMPAT_EXT_ATTR|\
> +					 EXT4_FEATURE_COMPAT_SPARSE_SUPER2)
> +
> +#define EXT5_FEATURE_COMPAT_REQD	(EXT2_FEATURE_COMPAT_DIR_INDEX|\
> +					 EXT2_FEATURE_COMPAT_EXT_ATTR|\
> +					 EXT4_FEATURE_COMPAT_SPARSE_SUPER2)
> +
> +#define EXT5_FEATURE_INCOMPAT_REQD_MASK	(EXT2_FEATURE_INCOMPAT_FILETYPE|\
> +					 EXT2_FEATURE_INCOMPAT_META_BG|\
> +					 EXT3_FEATURE_INCOMPAT_EXTENTS|\
> +					 EXT4_FEATURE_INCOMPAT_FLEX_BG|\
> +					 EXT4_FEATURE_INCOMPAT_64BIT|\
> +					 EXT4_FEATURE_INCOMPAT_INLINE_DATA)
> +
> +#define EXT5_FEATURE_INCOMPAT_REQD	(EXT2_FEATURE_INCOMPAT_FILETYPE|\
> +					 EXT2_FEATURE_INCOMPAT_META_BG|\
> +					 EXT3_FEATURE_INCOMPAT_EXTENTS|\
> +					 EXT4_FEATURE_INCOMPAT_FLEX_BG|\
> +					 EXT4_FEATURE_INCOMPAT_64BIT|\
> +					 EXT4_FEATURE_INCOMPAT_INLINE_DATA)
> +
> +#define EXT5_FEATURE_RO_COMPAT_REQD_MASK (EXT2_FEATURE_RO_COMPAT_SPARSE_SUPER|\
> +					 EXT4_FEATURE_RO_COMPAT_HUGE_FILE|\
> +					 EXT2_FEATURE_RO_COMPAT_LARGE_FILE|\
> +					 EXT4_FEATURE_RO_COMPAT_DIR_NLINK|\
> +					 EXT4_FEATURE_RO_COMPAT_EXTRA_ISIZE|\
> +					 EXT4_FEATURE_RO_COMPAT_GDT_CSUM|\
> +					 EXT4_FEATURE_RO_COMPAT_METADATA_CSUM)
> +
> +#define EXT5_FEATURE_RO_COMPAT_REQD	(EXT2_FEATURE_RO_COMPAT_SPARSE_SUPER|\
> +					 EXT4_FEATURE_RO_COMPAT_HUGE_FILE|\
> +					 EXT2_FEATURE_RO_COMPAT_LARGE_FILE|\
> +					 EXT4_FEATURE_RO_COMPAT_DIR_NLINK|\
> +					 EXT4_FEATURE_RO_COMPAT_EXTRA_ISIZE|\
> +					 EXT4_FEATURE_RO_COMPAT_METADATA_CSUM)
> +
> +#define EXT5_DEF_MNTOPT_MASK		(EXT2_DEFM_XATTR_USER|\
> +					 EXT2_DEFM_ACL|\
> +					 EXT2_DEFM_UID16|\
> +					 EXT4_DEFM_NOBARRIER|\
> +					 EXT4_DEFM_BLOCK_VALIDITY|\
> +					 EXT4_DEFM_NODELALLOC)
> +
> +#define EXT5_DEF_MNTOPT			(EXT2_DEFM_XATTR_USER|\
> +					 EXT2_DEFM_ACL|\
> +					 EXT4_DEFM_BLOCK_VALIDITY)
> +
>  /*
>   * These features are only allowed if EXT2_FLAG_SOFTSUPP_FEATURES is passed
>   * to ext2fs_openfs()
> diff --git a/lib/ext2fs/initialize.c b/lib/ext2fs/initialize.c
> index 75fbf8e..2d0731b 100644
> --- a/lib/ext2fs/initialize.c
> +++ b/lib/ext2fs/initialize.c
> @@ -173,6 +173,7 @@ errcode_t ext2fs_initialize(const char *name, int flags,
>  	set_field(s_raid_stripe_width, 0);	/* default stripe width: 0 */
>  	set_field(s_log_groups_per_flex, 0);
>  	set_field(s_flags, 0);
> +	set_field(s_minor_rev_level, 0);
>  	assign_field(s_backup_bgs[0]);
>  	assign_field(s_backup_bgs[1]);
>  	if (super->s_feature_incompat & ~EXT2_LIB_FEATURE_INCOMPAT_SUPP) {
> diff --git a/misc/Makefile.in b/misc/Makefile.in
> index 1b942f2..6776f41 100644
> --- a/misc/Makefile.in
> +++ b/misc/Makefile.in
> @@ -475,7 +475,7 @@ install: all $(SMANPAGES) $(UMANPAGES) installdirs
>  		$(ES) "	INSTALL $(sbindir)/$$i"; \
>  		$(INSTALL_PROGRAM) $$i $(DESTDIR)$(sbindir)/$$i; \
>  	done
> -	$(Q) for i in ext2 ext3 ext4 ext4dev; do \
> +	$(Q) for i in ext2 ext3 ext4 ext4dev ext5; do \
>  		$(ES) "	LINK $(root_sbindir)/mkfs.$$i"; \
>  		(cd $(DESTDIR)$(root_sbindir); \
>  			$(LN) $(LINK_INSTALL_FLAGS) mke2fs mkfs.$$i); \
> @@ -504,7 +504,7 @@ install: all $(SMANPAGES) $(UMANPAGES) installdirs
>  	done
>  	$(Q) $(RM) -f $(DESTDIR)$(man8dir)/mkfs.ext2.8.gz \
>  		$(DESTDIR)$(man8dir)/mkfs.ext3.8.gz
> -	$(Q) for i in ext2 ext3 ext4 ext4dev; do \
> +	$(Q) for i in ext2 ext3 ext4 ext4dev ext5; do \
>  		$(ES) "	LINK mkfs.$$i.8"; \
>  		(cd $(DESTDIR)$(man8dir); \
>  			$(LN) $(LINK_INSTALL_FLAGS) mke2fs.8 mkfs.$$i.8); \
> @@ -580,7 +580,8 @@ uninstall:
>  	$(RM) -f $(DESTDIR)$(root_sbindir)/mkfs.ext2 \
>  			$(DESTDIR)$(root_sbindir)/mkfs.ext3 \
>  			$(DESTDIR)$(root_sbindir)/mkfs.ext4 \
> -			$(DESTDIR)$(root_sbindir)/mkfs.ext4dev
> +			$(DESTDIR)$(root_sbindir)/mkfs.ext4dev \
> +			$(DESTDIR)$(root_sbindir)/mkfs.ext5
>  	for i in $(UPROGS); do \
>  		$(RM) -f $(DESTDIR)$(bindir)/$$i; \
>  	done
> @@ -591,10 +592,12 @@ uninstall:
>  		$(DESTDIR)$(man8dir)/mkfs.ext3.8 \
>  		$(DESTDIR)$(man8dir)/mkfs.ext4.8 \
>  		$(DESTDIR)$(man8dir)/mkfs.ext4dev.8 \
> +		$(DESTDIR)$(man8dir)/mkfs.ext5.8 \
>  		$(DESTDIR)$(man8dir)/fsck.ext2.8 \
>  		$(DESTDIR)$(man8dir)/fsck.ext3.8 \
>  		$(DESTDIR)$(man8dir)/fsck.ext4.8 \
> -		$(DESTDIR)$(man8dir)/fsck.ext4dev.8
> +		$(DESTDIR)$(man8dir)/fsck.ext4dev.8 \
> +		$(DESTDIR)$(man8dir)/fsck.ext5.8
>  
>  	for i in $(UMANPAGES); do \
>  		$(RM) -f $(DESTDIR)$(man1dir)/$$i; \
> diff --git a/misc/mke2fs.c b/misc/mke2fs.c
> index a794689..c810238 100644
> --- a/misc/mke2fs.c
> +++ b/misc/mke2fs.c
> @@ -1915,6 +1915,36 @@ profile_error:
>  		     &fs_param.s_feature_compat);
>  	if (tmp)
>  		free(tmp);
> +
> +	/* Add in ext5 options */
> +	tmp = get_string_from_profile(fs_types, "interface", NULL);
> +	if (tmp) {
> +		if (!strcmp(tmp, "ext5"))
> +			fs_param.s_minor_rev_level = EXT5_MINOR_REV_LEVEL;
> +		else {
> +			fprintf(stderr, _("Unknown interface `%s'.\n"), tmp);
> +			exit(1);
> +		}
> +		free(tmp);
> +	}
> +	if (fs_param.s_minor_rev_level == EXT5_MINOR_REV_LEVEL) {
> +		fs_param.s_feature_incompat = EXT5_FEATURE_INCOMPAT_REQD |
> +			(fs_param.s_feature_incompat &
> +			 ~EXT5_FEATURE_INCOMPAT_REQD_MASK);
> +		fs_param.s_feature_ro_compat = EXT5_FEATURE_RO_COMPAT_REQD |
> +			(fs_param.s_feature_ro_compat &
> +			 ~EXT5_FEATURE_RO_COMPAT_REQD_MASK);
> +		fs_param.s_feature_compat = EXT5_FEATURE_COMPAT_REQD |
> +			(fs_param.s_feature_compat &
> +			 ~EXT5_FEATURE_COMPAT_REQD_MASK);
> +		fs_param.s_default_mount_opts = EXT5_DEF_MNTOPT |
> +			(fs_param.s_default_mount_opts & ~EXT5_DEF_MNTOPT_MASK);
> +		fs_param.s_rev_level = EXT2_DYNAMIC_REV;
> +		if (r_opt < EXT2_DYNAMIC_REV)
> +			r_opt = -1;
> +		fs_param.s_inode_size = 256;
> +	}
> +
>  	/*
>  	 * If the user specified features incompatible with the Hurd, complain
>  	 */
> diff --git a/misc/mke2fs.conf.in b/misc/mke2fs.conf.in
> index de0250d..94fd139 100644
> --- a/misc/mke2fs.conf.in
> +++ b/misc/mke2fs.conf.in
> @@ -20,6 +20,10 @@
>  		inode_size = 256
>  		options = test_fs=1
>  	}
> +	ext5 = {
> +		features = has_journal
> +		interface = ext5
> +	}
>  	small = {
>  		blocksize = 1024
>  		inode_size = 128
> diff --git a/misc/tune2fs.c b/misc/tune2fs.c
> index 6571764..d3d6330 100644
> --- a/misc/tune2fs.c
> +++ b/misc/tune2fs.c
> @@ -2406,6 +2406,26 @@ static int tune2fs_setup_tdb(const char *name, io_manager *io_ptr)
>  	return retval;
>  }
>  
> +static errcode_t update_minor_rev(ext2_filsys fs)
> +{
> +	if (fs->super->s_minor_rev_level != EXT5_MINOR_REV_LEVEL)
> +		return 0;
> +
> +	if ((EXT5_FEATURE_COMPAT_REQD ^
> +	     (fs->super->s_feature_compat & EXT5_FEATURE_COMPAT_REQD_MASK)) ||
> +	    (EXT5_FEATURE_INCOMPAT_REQD ^
> +	     (fs->super->s_feature_incompat & EXT5_FEATURE_INCOMPAT_REQD_MASK)) ||
> +	    (EXT5_FEATURE_RO_COMPAT_REQD ^
> +	     (fs->super->s_feature_ro_compat & EXT5_FEATURE_RO_COMPAT_REQD_MASK)) ||
> +	    (EXT5_DEF_MNTOPT ^
> +	     (fs->super->s_default_mount_opts & EXT5_DEF_MNTOPT_MASK))) {
> +		fs->super->s_minor_rev_level = 0;
> +		ext2fs_mark_super_dirty(fs);
> +	}
> +
> +	return 0;
> +}
> +
>  int main(int argc, char **argv)
>  {
>  	errcode_t retval;
> @@ -2659,6 +2679,9 @@ retry_open:
>  		if (rc)
>  			goto closefs;
>  	}
> +	rc = update_minor_rev(fs);
> +	if (rc)
> +		goto closefs;
>  	if (extended_cmd) {
>  		rc = parse_extended_opts(fs, extended_cmd);
>  		if (rc)
> diff --git a/tests/metadata-checksum-test.sh b/tests/metadata-checksum-test.sh
> index a17bfd2..e51b1fa 100755
> --- a/tests/metadata-checksum-test.sh
> +++ b/tests/metadata-checksum-test.sh
> @@ -190,6 +190,7 @@ cat > "${MKE2FS_CONFIG}" << ENDL
>  	blocksize = 4096
>  	inode_size = 256
>  	inode_ratio = 16384
> +	interface = ext5
>  
>  [fs_types]
>  	ext4icsum_no_bv = {
> @@ -200,6 +201,7 @@ cat > "${MKE2FS_CONFIG}" << ENDL
>  		options = mmp_update_interval=5 #${RESIZE_PARAM}
>  		lazy_itable_init = 1
>  		cluster_size = $((BLK_SZ * 2))
> +		interface = ext5
>  	}
>  	ext4icsum = {
>  		features = has_journal,extent,huge_file,flex_bg,uninit_bg,dir_nlink,extra_isize,64bit$MKFS_OPTS
> @@ -208,6 +210,7 @@ cat > "${MKE2FS_CONFIG}" << ENDL
>  		options = mmp_update_interval=5 #${RESIZE_PARAM}
>  		lazy_itable_init = 1
>  		cluster_size = $((BLK_SZ * 2))
> +		interface = ext5
>  	}
>  	ext4icsum_noresize = {
>  		features = has_journal,extent,huge_file,flex_bg,uninit_bg,dir_nlink,extra_isize,64bit$MKFS_OPTS
> @@ -216,6 +219,7 @@ cat > "${MKE2FS_CONFIG}" << ENDL
>  		options = mmp_update_interval=5
>  		lazy_itable_init = 1
>  		cluster_size = $((BLK_SZ * 2))
> +		interface = ext5
>  	}
>  	ext4icsum_hugefiles = {
>  		features = has_journal,extent,huge_file,flex_bg,uninit_bg,dir_nlink,extra_isize,64bit$MKFS_OPTS
> @@ -235,6 +239,7 @@ cat > "${MKE2FS_CONFIG}" << ENDL
>  		hugefiles_digits = 4
>  		hugefiles_size = 1G
>  		num_hugefiles = 0
> +		interface = ext5
>  	}
>  ENDL
>  MKFS_OPTS=""
> diff --git a/tests/t_mke2fs_ext5/expect b/tests/t_mke2fs_ext5/expect
> new file mode 100644
> index 0000000..87e1185
> --- /dev/null
> +++ b/tests/t_mke2fs_ext5/expect
> @@ -0,0 +1,45 @@
> +Filesystem volume name:   <none>
> +Last mounted on:          <not available>
> +Filesystem magic number:  0xEF53
> +Filesystem revision #:    1 (dynamic)
> +Filesystem minor rev #:   2 (ext5)
> +Filesystem features:      ext_attr dir_index sparse_super2 filetype meta_bg extent 64bit flex_bg inline_data sparse_super large_file huge_file dir_nlink extra_isize metadata_csum
> +Filesystem flags:         signed_directory_hash 
> +Default mount options:    user_xattr acl block_validity
> +Filesystem state:         clean
> +Errors behavior:          Continue
> +Filesystem OS type:       Linux
> +Inode count:              64
> +Block count:              128
> +Reserved block count:     6
> +Free blocks:              116
> +Free inodes:              53
> +First block:              0
> +Block size:               4096
> +Fragment size:            4096
> +Group descriptor size:    64
> +Blocks per group:         32768
> +Fragments per group:      32768
> +Inodes per group:         64
> +Inode blocks per group:   4
> +Flex block group size:    16
> +Last mount time:          n/a
> +Mount count:              0
> +Maximum mount count:      -1
> +Check interval:           0 (<none>)
> +Lifetime writes:          5 kB
> +Reserved blocks uid:      0 (user root)
> +Reserved blocks gid:      0 (group root)
> +First inode:              11
> +Inode size:	          256
> +Required extra isize:     28
> +Desired extra isize:      28
> +Default directory hash:   half_md4
> +
> +
> +Group 0: (Blocks 0-127) [ITABLE_ZEROED]
> +  Primary superblock at 0, Group descriptor at 1
> +  Inode table at 34-37 (+34)
> +  116 free blocks, 53 free inodes, 2 directories, 53 unused inodes
> +  Free blocks: 7-17, 19-33, 38-127
> +  Free inodes: 12-64
> diff --git a/tests/t_mke2fs_ext5/script b/tests/t_mke2fs_ext5/script
> new file mode 100755
> index 0000000..9be9bf5
> --- /dev/null
> +++ b/tests/t_mke2fs_ext5/script
> @@ -0,0 +1,33 @@
> +test_description="mke2fs with ext5"
> +
> +conf=$TMPFILE.conf
> +
> +cat > $conf << ENDL
> +[defaults]
> +	interface = ext5
> +ENDL
> +
> +trap "rm -rf $TMPFILE $TMPFILE.conf" EXIT INT QUIT
> +dd if=/dev/zero of=$TMPFILE bs=1k count=512 > /dev/null 2>&1
> +OUT=$test_name.log
> +EXP=$test_dir/expect
> +rm -rf $OUT
> +
> +# Test command line option
> +MKE2FS_CONFIG=$TMPFILE.conf
> +export MKE2FS_CONFIG
> +$MKE2FS -F $TMPFILE > /dev/null 2>&1
> +$DUMPE2FS $TMPFILE | egrep -v "(Filesystem UUID|Filesystem created|Last write time|Last checked|Directory Hash Seed|Checksum| csum )" >> $OUT
> +
> +cmp -s $OUT $EXP
> +status=$?
> +
> +if [ "$status" = 0 ] ; then
> +	echo "$test_name: $test_description: ok"
> +	touch $test_name.ok
> +else
> +	echo "$test_name: $test_description: failed"
> +	diff $DIFF_OPTS $EXP $OUT > $test_name.failed
> +	rm -f $test_name.tmp
> +fi
> +
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
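
The feature and mount-option checks in the quoted hunks (check_ext5_fs(),
update_minor_rev(), and the mke2fs setup) all rely on the same
REQD/REQD_MASK pairing. A minimal sketch of that idiom follows; the FEAT_*
values and both helper names are hypothetical stand-ins, not e2fsprogs
definitions.

```c
#include <stdint.h>

/* Hypothetical flag bits, for illustration only. */
#define FEAT_A		0x1u
#define FEAT_B		0x2u
#define FEAT_C		0x4u

/* Every bit the profile has an opinion about. */
#define REQD_MASK	(FEAT_A | FEAT_B | FEAT_C)
/* Of those, the bits that must be set; FEAT_C must therefore be clear. */
#define REQD		(FEAT_A | FEAT_B)

/* Returns 1 when the masked bits equal REQD exactly, 0 otherwise. */
static int matches_profile(uint32_t flags)
{
	return (REQD ^ (flags & REQD_MASK)) == 0;
}

/* Forces the masked bits to REQD and leaves all other bits untouched. */
static uint32_t force_profile(uint32_t flags)
{
	return REQD | (flags & ~REQD_MASK);
}
```

Bits outside the mask never affect the comparison, which is presumably why
the quoted ext5 profile can pin some flags on, pin others off, and still
allow extra features such as has_journal.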

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 02/37] misc: coverity fixes
  2014-05-01 23:12 ` [PATCH 02/37] misc: coverity fixes Darrick J. Wong
@ 2014-05-02 11:17   ` Lukáš Czerner
  2014-05-05 20:04     ` Darrick J. Wong
  0 siblings, 1 reply; 91+ messages in thread
From: Lukáš Czerner @ 2014-05-02 11:17 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: tytso, linux-ext4

On Thu, 1 May 2014, Darrick J. Wong wrote:

> Date: Thu, 01 May 2014 16:12:36 -0700
> From: Darrick J. Wong <darrick.wong@oracle.com>
> To: tytso@mit.edu, darrick.wong@oracle.com
> Cc: linux-ext4@vger.kernel.org
> Subject: [PATCH 02/37] misc: coverity fixes
> 
> Fix various small resource leaks and error code handling issues that
> Coverity pointed out.
> 
> Fixes-Coverity-Bugs: 11919{39-45}, 1174118, 1049160, 1049144
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  debugfs/xattrs.c    |   38 ++++++++++++++++++++------------------
>  lib/ext2fs/extent.c |    7 ++++---
>  lib/ext2fs/punch.c  |    2 +-
>  misc/create_inode.c |   34 ++++++++++++++++++++--------------
>  4 files changed, 45 insertions(+), 36 deletions(-)
> 
> 
> diff --git a/debugfs/xattrs.c b/debugfs/xattrs.c
> index 0a29521..7109719 100644
> --- a/debugfs/xattrs.c
> +++ b/debugfs/xattrs.c
> @@ -122,26 +122,26 @@ void do_get_xattr(int argc, char **argv)
>  		default:
>  			printf("%s: Usage: %s <file> <attr> [-f outfile]\n",
>  			       argv[0], argv[0]);
> -			return;
> +			goto out2;
>  		}
>  	}
>  
>  	if (optind != argc - 2) {
>  		printf("%s: Usage: %s <file> <attr> [-f outfile]\n", argv[0],
>  		       argv[0]);
> -		return;
> +		goto out2;
>  	}
>  
>  	if (check_fs_open(argv[0]))
> -		return;
> +		goto out2;
>  
>  	ino = string_to_inode(argv[optind]);
>  	if (!ino)
> -		return;
> +		goto out2;
>  
>  	err = ext2fs_xattrs_open(current_fs, ino, &h);
>  	if (err)
> -		return;
> +		goto out2;
>  
>  	err = ext2fs_xattrs_read(h);
>  	if (err)
> @@ -153,18 +153,19 @@ void do_get_xattr(int argc, char **argv)
>  
>  	if (fp) {
>  		fwrite(buf, buflen, 1, fp);
> -		fclose(fp);
>  	} else {
>  		dump_xattr_string(stdout, buf, buflen);
>  		printf("\n");
>  	}
>  
> -	if (buf)
> -		ext2fs_free_mem(&buf);
> +	ext2fs_free_mem(&buf);
>  out:
>  	ext2fs_xattrs_close(&h);
>  	if (err)
>  		com_err(argv[0], err, "while getting extended attribute");
> +out2:
> +	if (fp)
> +		fclose(fp);
>  }
>  
>  void do_set_xattr(int argc, char **argv)
> @@ -190,30 +191,30 @@ void do_set_xattr(int argc, char **argv)
>  		default:
>  			printf("%s: Usage: %s <file> <attr> [-f infile | "
>  			       "value]\n", argv[0], argv[0]);
> -			return;
> +			goto out2;
>  		}
>  	}
>  
>  	if (optind != argc - 2 && optind != argc - 3) {
>  		printf("%s: Usage: %s <file> <attr> [-f infile | value>]\n",
>  		       argv[0], argv[0]);
> -		return;
> +		goto out2;
>  	}
>  
>  	if (check_fs_open(argv[0]))
> -		return;
> +		goto out2;
>  	if (check_fs_read_write(argv[0]))
> -		return;
> +		goto out2;
>  	if (check_fs_bitmaps(argv[0]))
> -		return;
> +		goto out2;
>  
>  	ino = string_to_inode(argv[optind]);
>  	if (!ino)
> -		return;
> +		goto out2;
>  
>  	err = ext2fs_xattrs_open(current_fs, ino, &h);
>  	if (err)
> -		return;
> +		goto out2;
>  
>  	err = ext2fs_xattrs_read(h);
>  	if (err)
> @@ -238,13 +239,14 @@ void do_set_xattr(int argc, char **argv)
>  		goto out;
>  
>  out:
> +	ext2fs_xattrs_close(&h);
> +	if (err)
> +		com_err(argv[0], err, "while setting extended attribute");
> +out2:
>  	if (fp) {
>  		fclose(fp);
>  		ext2fs_free_mem(&buf);
>  	}
> -	ext2fs_xattrs_close(&h);
> -	if (err)
> -		com_err(argv[0], err, "while setting extended attribute");
>  }
>  
>  void do_rm_xattr(int argc, char **argv)
> diff --git a/lib/ext2fs/extent.c b/lib/ext2fs/extent.c
> index 80ce88f..30673b5 100644
> --- a/lib/ext2fs/extent.c
> +++ b/lib/ext2fs/extent.c
> @@ -1482,7 +1482,7 @@ errcode_t ext2fs_extent_set_bmap(ext2_extent_handle_t handle,
>  			if (retval) {
>  				r2 = ext2fs_extent_goto(handle, orig_lblk);
>  				if (r2 == 0)
> -					ext2fs_extent_replace(handle, 0,
> +					(void)ext2fs_extent_replace(handle, 0,
>  							      &orig_extent);
>  				goto done;
>  			}
> @@ -1498,11 +1498,12 @@ errcode_t ext2fs_extent_set_bmap(ext2_extent_handle_t handle,
>  				r2 = ext2fs_extent_goto(handle,
>  							newextent.e_lblk);
>  				if (r2 == 0)
> -					ext2fs_extent_delete(handle, 0);
> +					(void)ext2fs_extent_delete(handle, 0);
>  			}
>  			r2 = ext2fs_extent_goto(handle, orig_lblk);
>  			if (r2 == 0)
> -				ext2fs_extent_replace(handle, 0, &orig_extent);
> +				(void)ext2fs_extent_replace(handle, 0,
> +							    &orig_extent);
>  			goto done;
>  		}
>  	}
> diff --git a/lib/ext2fs/punch.c b/lib/ext2fs/punch.c
> index 60cd2a3..c9250cd 100644
> --- a/lib/ext2fs/punch.c
> +++ b/lib/ext2fs/punch.c
> @@ -403,7 +403,7 @@ static errcode_t ext2fs_punch_extent(ext2_filsys fs, ext2_ino_t ino,
>  			retval = 0;
>  
>  			/* Jump forward to the next extent. */
> -			ext2fs_extent_goto(handle, next_lblk);
> +			(void)ext2fs_extent_goto(handle, next_lblk);

Why do we not want to check the return value here? There might
be an error, right?

>  			op = EXT2_EXTENT_CURRENT;
>  		}
>  		if (retval)
> diff --git a/misc/create_inode.c b/misc/create_inode.c
> index 964c66a..4bb5e5b 100644
> --- a/misc/create_inode.c
> +++ b/misc/create_inode.c
> @@ -465,7 +465,7 @@ static errcode_t __populate_fs(ext2_filsys fs, ext2_ino_t parent_ino,
>  	char		ln_target[PATH_MAX];
>  	unsigned int	save_inode;
>  	ext2_ino_t	ino;
> -	errcode_t	retval;
> +	errcode_t	retval = 0;
>  	int		read_cnt;
>  	int		hdlink;
>  
> @@ -486,7 +486,11 @@ static errcode_t __populate_fs(ext2_filsys fs, ext2_ino_t parent_ino,
>  		if ((!strcmp(dent->d_name, ".")) ||
>  		    (!strcmp(dent->d_name, "..")))
>  			continue;
> -		lstat(dent->d_name, &st);
> +		if (lstat(dent->d_name, &st)) {
> +			com_err(__func__, errno, _("while lstat \"%s\""),
> +				dent->d_name);
> +			goto out;
> +		}
>  		name = dent->d_name;
>  
>  		/* Check for hardlinks */
> @@ -501,7 +505,7 @@ static errcode_t __populate_fs(ext2_filsys fs, ext2_ino_t parent_ino,
>  				if (retval) {
>  					com_err(__func__, retval,
>  						"while linking %s", name);
> -					return retval;
> +					goto out;
>  				}
>  				continue;
>  			} else
> @@ -517,7 +521,7 @@ static errcode_t __populate_fs(ext2_filsys fs, ext2_ino_t parent_ino,
>  				com_err(__func__, retval,
>  					_("while creating special file "
>  					  "\"%s\""), name);
> -				return retval;
> +				goto out;
>  			}
>  			break;
>  		case S_IFSOCK:
> @@ -527,7 +531,7 @@ static errcode_t __populate_fs(ext2_filsys fs, ext2_ino_t parent_ino,
>  			continue;
>  		case S_IFLNK:
>  			read_cnt = readlink(name, ln_target,
> -					    sizeof(ln_target));
> +					    sizeof(ln_target) - 1);
>  			if (read_cnt == -1) {
>  				com_err(__func__, errno,
>  					_("while trying to readlink \"%s\""),
> @@ -541,7 +545,7 @@ static errcode_t __populate_fs(ext2_filsys fs, ext2_ino_t parent_ino,
>  				com_err(__func__, retval,
>  					_("while writing symlink\"%s\""),
>  					name);
> -				return retval;
> +				goto out;
>  			}
>  			break;
>  		case S_IFREG:
> @@ -550,7 +554,7 @@ static errcode_t __populate_fs(ext2_filsys fs, ext2_ino_t parent_ino,
>  			if (retval) {
>  				com_err(__func__, retval,
>  					_("while writing file \"%s\""), name);
> -				return retval;
> +				goto out;
>  			}
>  			break;
>  		case S_IFDIR:
> @@ -559,25 +563,25 @@ static errcode_t __populate_fs(ext2_filsys fs, ext2_ino_t parent_ino,
>  			if (retval) {
>  				com_err(__func__, retval,
>  					_("while making dir \"%s\""), name);
> -				return retval;
> +				goto out;
>  			}
>  			retval = ext2fs_namei(fs, root, parent_ino,
>  					      name, &ino);
>  			if (retval) {
>  				com_err(name, retval, 0);
> -					return retval;
> +					goto out;
>  			}
>  			/* Populate the dir recursively*/
>  			retval = __populate_fs(fs, ino, name, root, hdlinks);
>  			if (retval) {
>  				com_err(__func__, retval,
>  					_("while adding dir \"%s\""), name);
> -				return retval;
> +				goto out;
>  			}
>  			if (chdir("..")) {
>  				com_err(__func__, errno,
>  					_("during cd .."));
> -				return errno;

You probably want to store errno in retval, because that's what we
return from the function.

> +				goto out;
>  			}
>  			break;
>  		default:
> @@ -588,14 +592,14 @@ static errcode_t __populate_fs(ext2_filsys fs, ext2_ino_t parent_ino,
>  		retval =  ext2fs_namei(fs, root, parent_ino, name, &ino);
>  		if (retval) {
>  			com_err(name, retval, 0);
> -			return retval;
> +			goto out;
>  		}
>  
>  		retval = set_inode_extra(fs, parent_ino, ino, &st);
>  		if (retval) {
>  			com_err(__func__, retval,
>  				_("while setting inode for \"%s\""), name);
> -			return retval;
> +			goto out;
>  		}
>  
>  		/* Save the hardlink ino */
> @@ -612,7 +616,7 @@ static errcode_t __populate_fs(ext2_filsys fs, ext2_ino_t parent_ino,
>  				if (p == NULL) {
>  					com_err(name, errno,
>  						_("Not enough memory"));
> -					return errno;
> +					goto out;

Same here.

Thanks!
-Lukas

>  				}
>  				hdlinks->hdl = p;
>  				hdlinks->size += HDLINK_CNT;
> @@ -623,6 +627,8 @@ static errcode_t __populate_fs(ext2_filsys fs, ext2_ino_t parent_ino,
>  			hdlinks->count++;
>  		}
>  	}
> +
> +out:
>  	closedir(dh);
>  	return retval;
>  }
> 

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 03/37] libext2fs: create sockets when populating filesystem
  2014-05-01 23:12 ` [PATCH 03/37] libext2fs: create sockets when populating filesystem Darrick J. Wong
@ 2014-05-02 11:22   ` Lukáš Czerner
  2014-05-05 20:08     ` Darrick J. Wong
  0 siblings, 1 reply; 91+ messages in thread
From: Lukáš Czerner @ 2014-05-02 11:22 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: tytso, linux-ext4

On Thu, 1 May 2014, Darrick J. Wong wrote:

> Date: Thu, 01 May 2014 16:12:42 -0700
> From: Darrick J. Wong <darrick.wong@oracle.com>
> To: tytso@mit.edu, darrick.wong@oracle.com
> Cc: linux-ext4@vger.kernel.org
> Subject: [PATCH 03/37] libext2fs: create sockets when populating filesystem
> 
> Since the code to copy-in a socket when creating a filesystem is
> fairly simple, just do it here.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  misc/create_inode.c |    9 ++++-----
>  1 file changed, 4 insertions(+), 5 deletions(-)
> 
> 
> diff --git a/misc/create_inode.c b/misc/create_inode.c
> index 4bb5e5b..e7faab1 100644
> --- a/misc/create_inode.c
> +++ b/misc/create_inode.c
> @@ -114,6 +114,9 @@ errcode_t do_mknod_internal(ext2_filsys fs, ext2_ino_t cwd, const char *name,
>  		mode = LINUX_S_IFIFO;
>  		filetype = EXT2_FT_FIFO;
>  		break;
> +	case S_IFSOCK:
> +		mode = LINUX_S_IFSOCK;
> +		filetype = EXT2_FT_SOCK;

You probably want to change the comment for the function as well.

-Lukas

>  	default:
>  		abort();
>  		/* NOTREACHED */
> @@ -516,6 +519,7 @@ static errcode_t __populate_fs(ext2_filsys fs, ext2_ino_t parent_ino,
>  		case S_IFCHR:
>  		case S_IFBLK:
>  		case S_IFIFO:
> +		case S_IFSOCK:
>  			retval = do_mknod_internal(fs, parent_ino, name, &st);
>  			if (retval) {
>  				com_err(__func__, retval,
> @@ -524,11 +528,6 @@ static errcode_t __populate_fs(ext2_filsys fs, ext2_ino_t parent_ino,
>  				goto out;
>  			}
>  			break;
> -		case S_IFSOCK:
> -			/* FIXME: there is no make socket function atm. */
> -			com_err(__func__, 0,
> -				_("ignoring socket file \"%s\""), name);
> -			continue;
>  		case S_IFLNK:
>  			read_cnt = readlink(name, ln_target,
>  					    sizeof(ln_target) - 1);
> 

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 04/37] mke2fs: always warn if 128-byte inode and inline_data
  2014-05-01 23:12 ` [PATCH 04/37] mke2fs: always warn if 128-byte inode and inline_data Darrick J. Wong
@ 2014-05-02 11:27   ` Lukáš Czerner
  2014-05-05 20:10     ` Darrick J. Wong
  0 siblings, 1 reply; 91+ messages in thread
From: Lukáš Czerner @ 2014-05-02 11:27 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: tytso, linux-ext4

On Thu, 1 May 2014, Darrick J. Wong wrote:

> Date: Thu, 01 May 2014 16:12:49 -0700
> From: Darrick J. Wong <darrick.wong@oracle.com>
> To: tytso@mit.edu, darrick.wong@oracle.com
> Cc: linux-ext4@vger.kernel.org
> Subject: [PATCH 04/37] mke2fs: always warn if 128-byte inode and inline_data
> 
> The combination of 128-byte inodes and inline_data is silly, since
> there's no room in the inode table.  Unfortunately, if neither
> mke2fs.conf nor the mkfs command line options specify an inode size,
> the default inode size is set to 128 bytes (by libext2fs) and the
> warning isn't printed.  Therefore, always do the check-and-warning.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  misc/mke2fs.c |   25 +++++++++++++------------
>  1 file changed, 13 insertions(+), 12 deletions(-)
> 
> 
> diff --git a/misc/mke2fs.c b/misc/mke2fs.c
> index aecd5d5..6507d0d 100644
> --- a/misc/mke2fs.c
> +++ b/misc/mke2fs.c
> @@ -2282,21 +2282,22 @@ profile_error:
>  				blocksize);
>  			exit(1);
>  		}
> -		/*
> -		 * If inode size is 128 and inline data is enabled, we need
> -		 * to notify users that inline data will never be useful.
> -		 */
> -		if ((fs_param.s_feature_incompat &
> -		     EXT4_FEATURE_INCOMPAT_INLINE_DATA) &&
> -		    inode_size == EXT2_GOOD_OLD_INODE_SIZE) {
> -			com_err(program_name, 0,
> -				_("inode size is %d, inline data is useless"),
> -				inode_size);
> -			exit(1);
> -		}
>  		fs_param.s_inode_size = inode_size;
>  	}
>  
> +	/*
> +	 * If inode size is 128 and inline data is enabled, we need
> +	 * to notify users that inline data will never be useful.
> +	 */
> +	if ((fs_param.s_feature_incompat &
> +	     EXT4_FEATURE_INCOMPAT_INLINE_DATA) &&
> +	    fs_param.s_inode_size == EXT2_GOOD_OLD_INODE_SIZE) {
> +		com_err(program_name, 0,
> +			_("inode size is %d, inline data is useless"),
> +			inode_size);

Oops :) copy-paste is tricky. You need to use fs_param.s_inode_size
rather than inode_size here. Otherwise it looks good.

Thanks!
-Lukas
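
A minimal sketch of the corrected check, assuming the local inode_size may
still be unset when neither the command line nor mke2fs.conf specifies a
size, so the message should report the superblock field that was actually
tested. struct fs_param_sketch, the two macros, and
format_inline_data_warning() are hypothetical stand-ins.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define GOOD_OLD_INODE_SIZE	128	/* stand-in for EXT2_GOOD_OLD_INODE_SIZE */
#define INCOMPAT_INLINE_DATA	0x8000u	/* stand-in for the inline_data flag */

/* Hypothetical subset of the superblock parameters used by the check. */
struct fs_param_sketch {
	uint32_t s_feature_incompat;
	uint16_t s_inode_size;
};

/*
 * Writes the warning into buf and returns 1 when inline data cannot be
 * used with the configured inode size; returns 0 and leaves buf empty
 * otherwise. len must be at least 1.
 */
static int format_inline_data_warning(const struct fs_param_sketch *p,
				      char *buf, size_t len)
{
	buf[0] = '\0';
	if (!(p->s_feature_incompat & INCOMPAT_INLINE_DATA) ||
	    p->s_inode_size != GOOD_OLD_INODE_SIZE)
		return 0;

	/* Report the field that was tested, not a separate local variable. */
	snprintf(buf, len, "inode size is %d, inline data is useless",
		 p->s_inode_size);
	return 1;
}
```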


> +		exit(1);
> +	}
> +
>  	/* Make sure number of inodes specified will fit in 32 bits */
>  	if (num_inodes == 0) {
>  		unsigned long long n;
> 

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 05/37] debugfs: teach logdump to deal with 64bit revoke tables
  2014-05-01 23:12 ` [PATCH 05/37] debugfs: teach logdump to deal with 64bit revoke tables Darrick J. Wong
@ 2014-05-02 11:38   ` Lukáš Czerner
  2014-05-05 22:23     ` Darrick J. Wong
  0 siblings, 1 reply; 91+ messages in thread
From: Lukáš Czerner @ 2014-05-02 11:38 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: tytso, linux-ext4

On Thu, 1 May 2014, Darrick J. Wong wrote:

> Date: Thu, 01 May 2014 16:12:55 -0700
> From: Darrick J. Wong <darrick.wong@oracle.com>
> To: tytso@mit.edu, darrick.wong@oracle.com
> Cc: linux-ext4@vger.kernel.org
> Subject: [PATCH 05/37] debugfs: teach logdump to deal with 64bit revoke tables
> 
> The logdump command doesn't know how to deal with revoke tables in
> 64bit journals, so teach it to do this.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  debugfs/logdump.c          |   20 ++++-
>  tests/f_jnl_64bit/expect.0 |  171 --------------------------------------------
>  2 files changed, 15 insertions(+), 176 deletions(-)
> 
> 
> diff --git a/debugfs/logdump.c b/debugfs/logdump.c
> index 2d0efaf..8b9dc5b 100644
> --- a/debugfs/logdump.c
> +++ b/debugfs/logdump.c
> @@ -526,28 +526,38 @@ static void dump_revoke_block(FILE *out_file, char *buf,
>  {
>  	int			offset, max;
>  	journal_revoke_header_t *header;
> -	unsigned int		*entry, rblock;
> +	unsigned int		*entry;
> +	unsigned long long	*bentry, rblock;
> +	int			tag_size = sizeof(*entry);
>  
>  	if (dump_all)
>  		fprintf(out_file, "Dumping revoke block, sequence %u, at "
>  			"block %u:\n", transaction, blocknr);
>  
> +	if (be32_to_cpu(jsb->s_feature_incompat) & JFS_FEATURE_INCOMPAT_64BIT)
> +		tag_size = sizeof(*bentry);
> +
>  	header = (journal_revoke_header_t *) buf;
>  	offset = sizeof(journal_revoke_header_t);
>  	max = be32_to_cpu(header->r_count);
>  
>  	while (offset < max) {
> -		entry = (unsigned int *) (buf + offset);
> -		rblock = be32_to_cpu(*entry);
> +		if (tag_size == sizeof(*entry)) {
> +			entry = (unsigned int *) (buf + offset);
> +			rblock = be32_to_cpu(*entry);
> +		} else {
> +			bentry = (unsigned long long *)(buf + offset);
> +			rblock = ext2fs_be64_to_cpu(*bentry);
> +		}

I wonder whether we really need both bentry and entry, since those
are just pointers and should be the same size regardless of what
they point at.

Wouldn't that be better from a readability point of view? Otherwise it
looks good.

Thanks!
-Lukas
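
One way the single-pointer idea above might look, as a minimal sketch:
a byte pointer plus the tag size replaces the two typed pointers. Both
helpers are hypothetical, not libext2fs or debugfs functions, and the loop
bound is slightly stricter than the quoted "offset < max" so that a
truncated entry is never read.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Reads one big-endian revoke entry of tag_size bytes (4 or 8) starting
 * at p. Byte-wise assembly needs no second typed pointer and does not
 * depend on host endianness or alignment.
 */
static uint64_t read_revoke_tag(const unsigned char *p, size_t tag_size)
{
	uint64_t v = 0;
	size_t i;

	for (i = 0; i < tag_size; i++)
		v = (v << 8) | p[i];
	return v;
}

/*
 * Walks the entries in buf between offset and max, storing up to cap
 * block numbers in out. Returns the number of entries stored.
 */
static size_t collect_revoked(const unsigned char *buf, size_t offset,
			      size_t max, size_t tag_size,
			      uint64_t *out, size_t cap)
{
	size_t n = 0;

	while (offset + tag_size <= max && n < cap) {
		out[n++] = read_revoke_tag(buf + offset, tag_size);
		offset += tag_size;
	}
	return n;
}
```

Walking 8-byte entries with a 4-byte tag size yields alternating zeros and
block numbers, which matches the "Revoke FS block 0" lines removed from
expect.0 in the quoted diff.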

>  		if (dump_all || rblock == block_to_dump) {
> -			fprintf(out_file, "  Revoke FS block %u", rblock);
> +			fprintf(out_file, "  Revoke FS block %llu", rblock);
>  			if (dump_all)
>  				fprintf(out_file, "\n");
>  			else
>  				fprintf(out_file," at block %u, sequence %u\n",
>  					blocknr, transaction);
>  		}
> -		offset += 4;
> +		offset += tag_size;
>  	}
>  }
>  
> diff --git a/tests/f_jnl_64bit/expect.0 b/tests/f_jnl_64bit/expect.0
> index 2007f03..5cef2d8 100644
> --- a/tests/f_jnl_64bit/expect.0
> +++ b/tests/f_jnl_64bit/expect.0
> @@ -1,189 +1,97 @@
>  Journal starts at block 67, transaction 32
>  Found expected sequence 32, type 5 (revoke table) at block 67
>  Dumping revoke block, sequence 32, at block 67:
> -  Revoke FS block 0
>    Revoke FS block 1536
> -  Revoke FS block 0
>    Revoke FS block 1472
> -  Revoke FS block 0
>    Revoke FS block 1473
> -  Revoke FS block 0
>    Revoke FS block 1474
> -  Revoke FS block 0
>    Revoke FS block 1475
> -  Revoke FS block 0
>    Revoke FS block 1476
> -  Revoke FS block 0
>    Revoke FS block 1541
> -  Revoke FS block 0
>    Revoke FS block 1477
> -  Revoke FS block 0
>    Revoke FS block 1478
> -  Revoke FS block 0
>    Revoke FS block 1479
> -  Revoke FS block 0
>    Revoke FS block 1480
> -  Revoke FS block 0
>    Revoke FS block 1481
> -  Revoke FS block 0
>    Revoke FS block 1482
> -  Revoke FS block 0
>    Revoke FS block 1483
> -  Revoke FS block 0
>    Revoke FS block 1484
> -  Revoke FS block 0
>    Revoke FS block 1485
> -  Revoke FS block 0
>    Revoke FS block 1486
> -  Revoke FS block 0
>    Revoke FS block 1487
> -  Revoke FS block 0
>    Revoke FS block 1488
> -  Revoke FS block 0
>    Revoke FS block 1489
> -  Revoke FS block 0
>    Revoke FS block 1490
> -  Revoke FS block 0
>    Revoke FS block 1491
> -  Revoke FS block 0
>    Revoke FS block 1556
> -  Revoke FS block 0
>    Revoke FS block 1492
> -  Revoke FS block 0
>    Revoke FS block 1493
> -  Revoke FS block 0
>    Revoke FS block 1429
> -  Revoke FS block 0
>    Revoke FS block 1494
> -  Revoke FS block 0
>    Revoke FS block 1495
> -  Revoke FS block 0
>    Revoke FS block 1496
> -  Revoke FS block 0
>    Revoke FS block 1432
> -  Revoke FS block 0
>    Revoke FS block 1497
> -  Revoke FS block 0
>    Revoke FS block 1498
> -  Revoke FS block 0
>    Revoke FS block 1434
> -  Revoke FS block 0
>    Revoke FS block 1499
> -  Revoke FS block 0
>    Revoke FS block 1435
> -  Revoke FS block 0
>    Revoke FS block 1500
> -  Revoke FS block 0
>    Revoke FS block 1501
> -  Revoke FS block 0
>    Revoke FS block 1502
> -  Revoke FS block 0
>    Revoke FS block 1503
> -  Revoke FS block 0
>    Revoke FS block 1504
> -  Revoke FS block 0
>    Revoke FS block 1505
> -  Revoke FS block 0
>    Revoke FS block 1506
> -  Revoke FS block 0
>    Revoke FS block 1442
> -  Revoke FS block 0
>    Revoke FS block 1507
> -  Revoke FS block 0
>    Revoke FS block 1508
> -  Revoke FS block 0
>    Revoke FS block 1444
> -  Revoke FS block 0
>    Revoke FS block 1509
> -  Revoke FS block 0
>    Revoke FS block 1445
> -  Revoke FS block 0
>    Revoke FS block 1510
> -  Revoke FS block 0
>    Revoke FS block 1511
> -  Revoke FS block 0
>    Revoke FS block 1512
> -  Revoke FS block 0
>    Revoke FS block 1513
> -  Revoke FS block 0
>    Revoke FS block 1449
> -  Revoke FS block 0
>    Revoke FS block 1514
> -  Revoke FS block 0
>    Revoke FS block 1515
> -  Revoke FS block 0
>    Revoke FS block 1516
> -  Revoke FS block 0
>    Revoke FS block 1517
> -  Revoke FS block 0
>    Revoke FS block 1453
> -  Revoke FS block 0
>    Revoke FS block 1518
> -  Revoke FS block 0
>    Revoke FS block 1519
> -  Revoke FS block 0
>    Revoke FS block 1520
> -  Revoke FS block 0
>    Revoke FS block 1456
> -  Revoke FS block 0
>    Revoke FS block 1521
> -  Revoke FS block 0
>    Revoke FS block 1457
> -  Revoke FS block 0
>    Revoke FS block 1522
> -  Revoke FS block 0
>    Revoke FS block 1458
> -  Revoke FS block 0
>    Revoke FS block 1523
> -  Revoke FS block 0
>    Revoke FS block 1459
> -  Revoke FS block 0
>    Revoke FS block 1524
> -  Revoke FS block 0
>    Revoke FS block 1460
> -  Revoke FS block 0
>    Revoke FS block 1525
> -  Revoke FS block 0
>    Revoke FS block 1461
> -  Revoke FS block 0
>    Revoke FS block 1526
> -  Revoke FS block 0
>    Revoke FS block 1462
> -  Revoke FS block 0
>    Revoke FS block 1527
> -  Revoke FS block 0
>    Revoke FS block 1463
> -  Revoke FS block 0
>    Revoke FS block 1528
> -  Revoke FS block 0
>    Revoke FS block 1464
> -  Revoke FS block 0
>    Revoke FS block 1529
> -  Revoke FS block 0
>    Revoke FS block 1465
> -  Revoke FS block 0
>    Revoke FS block 1530
> -  Revoke FS block 0
>    Revoke FS block 1466
> -  Revoke FS block 0
>    Revoke FS block 1531
> -  Revoke FS block 0
>    Revoke FS block 1467
> -  Revoke FS block 0
>    Revoke FS block 1532
> -  Revoke FS block 0
>    Revoke FS block 1468
> -  Revoke FS block 0
>    Revoke FS block 1533
> -  Revoke FS block 0
>    Revoke FS block 1469
> -  Revoke FS block 0
>    Revoke FS block 1534
> -  Revoke FS block 0
>    Revoke FS block 1470
> -  Revoke FS block 0
>    Revoke FS block 1535
> -  Revoke FS block 0
>    Revoke FS block 1471
>  Found expected sequence 32, type 1 (descriptor block) at block 68
>  Dumping descriptor block, sequence 32, at block 68:
> @@ -323,163 +231,84 @@ Dumping descriptor block, sequence 32, at block 150:
>  Found expected sequence 32, type 2 (commit block) at block 201
>  Found expected sequence 33, type 5 (revoke table) at block 202
>  Dumping revoke block, sequence 33, at block 202:
> -  Revoke FS block 0
>    Revoke FS block 1600
> -  Revoke FS block 0
>    Revoke FS block 1601
> -  Revoke FS block 0
>    Revoke FS block 1537
> -  Revoke FS block 0
>    Revoke FS block 1602
> -  Revoke FS block 0
>    Revoke FS block 1538
> -  Revoke FS block 0
>    Revoke FS block 1603
> -  Revoke FS block 0
>    Revoke FS block 1539
> -  Revoke FS block 0
>    Revoke FS block 1604
> -  Revoke FS block 0
>    Revoke FS block 1540
> -  Revoke FS block 0
>    Revoke FS block 1605
> -  Revoke FS block 0
>    Revoke FS block 1606
> -  Revoke FS block 0
>    Revoke FS block 1542
> -  Revoke FS block 0
>    Revoke FS block 1607
> -  Revoke FS block 0
>    Revoke FS block 1543
> -  Revoke FS block 0
>    Revoke FS block 1608
> -  Revoke FS block 0
>    Revoke FS block 1544
> -  Revoke FS block 0
>    Revoke FS block 1609
> -  Revoke FS block 0
>    Revoke FS block 1545
> -  Revoke FS block 0
>    Revoke FS block 1610
> -  Revoke FS block 0
>    Revoke FS block 1546
> -  Revoke FS block 0
>    Revoke FS block 1611
> -  Revoke FS block 0
>    Revoke FS block 1547
> -  Revoke FS block 0
>    Revoke FS block 1612
> -  Revoke FS block 0
>    Revoke FS block 1548
> -  Revoke FS block 0
>    Revoke FS block 1613
> -  Revoke FS block 0
>    Revoke FS block 1549
> -  Revoke FS block 0
>    Revoke FS block 1614
> -  Revoke FS block 0
>    Revoke FS block 1550
> -  Revoke FS block 0
>    Revoke FS block 1615
> -  Revoke FS block 0
>    Revoke FS block 1551
> -  Revoke FS block 0
>    Revoke FS block 1616
> -  Revoke FS block 0
>    Revoke FS block 1552
> -  Revoke FS block 0
>    Revoke FS block 1617
> -  Revoke FS block 0
>    Revoke FS block 1553
> -  Revoke FS block 0
>    Revoke FS block 1554
> -  Revoke FS block 0
>    Revoke FS block 1555
> -  Revoke FS block 0
>    Revoke FS block 1557
> -  Revoke FS block 0
>    Revoke FS block 1558
> -  Revoke FS block 0
>    Revoke FS block 1559
> -  Revoke FS block 0
>    Revoke FS block 1560
> -  Revoke FS block 0
>    Revoke FS block 1561
> -  Revoke FS block 0
>    Revoke FS block 1562
> -  Revoke FS block 0
>    Revoke FS block 1563
> -  Revoke FS block 0
>    Revoke FS block 1564
> -  Revoke FS block 0
>    Revoke FS block 1565
> -  Revoke FS block 0
>    Revoke FS block 1566
> -  Revoke FS block 0
>    Revoke FS block 1567
> -  Revoke FS block 0
>    Revoke FS block 1568
> -  Revoke FS block 0
>    Revoke FS block 1569
> -  Revoke FS block 0
>    Revoke FS block 1570
> -  Revoke FS block 0
>    Revoke FS block 1571
> -  Revoke FS block 0
>    Revoke FS block 1572
> -  Revoke FS block 0
>    Revoke FS block 1573
> -  Revoke FS block 0
>    Revoke FS block 1574
> -  Revoke FS block 0
>    Revoke FS block 1575
> -  Revoke FS block 0
>    Revoke FS block 1576
> -  Revoke FS block 0
>    Revoke FS block 1577
> -  Revoke FS block 0
>    Revoke FS block 1578
> -  Revoke FS block 0
>    Revoke FS block 1579
> -  Revoke FS block 0
>    Revoke FS block 1580
> -  Revoke FS block 0
>    Revoke FS block 1581
> -  Revoke FS block 0
>    Revoke FS block 1582
> -  Revoke FS block 0
>    Revoke FS block 1583
> -  Revoke FS block 0
>    Revoke FS block 1584
> -  Revoke FS block 0
>    Revoke FS block 1585
> -  Revoke FS block 0
>    Revoke FS block 1586
> -  Revoke FS block 0
>    Revoke FS block 1587
> -  Revoke FS block 0
>    Revoke FS block 1588
> -  Revoke FS block 0
>    Revoke FS block 1589
> -  Revoke FS block 0
>    Revoke FS block 1590
> -  Revoke FS block 0
>    Revoke FS block 1591
> -  Revoke FS block 0
>    Revoke FS block 1592
> -  Revoke FS block 0
>    Revoke FS block 1593
> -  Revoke FS block 0
>    Revoke FS block 1594
> -  Revoke FS block 0
>    Revoke FS block 1595
> -  Revoke FS block 0
>    Revoke FS block 1596
> -  Revoke FS block 0
>    Revoke FS block 1597
> -  Revoke FS block 0
>    Revoke FS block 1598
> -  Revoke FS block 0
>    Revoke FS block 1599
>  Found expected sequence 33, type 1 (descriptor block) at block 203
>  Dumping descriptor block, sequence 33, at block 203:
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 06/37] debugfs: force logdump to display (old) journal contents
  2014-05-01 23:13 ` [PATCH 06/37] debugfs: force logdump to display (old) journal contents Darrick J. Wong
@ 2014-05-02 11:49   ` Lukáš Czerner
  2014-05-06  0:24     ` Darrick J. Wong
  0 siblings, 1 reply; 91+ messages in thread
From: Lukáš Czerner @ 2014-05-02 11:49 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: tytso, linux-ext4

On Thu, 1 May 2014, Darrick J. Wong wrote:

> Date: Thu, 01 May 2014 16:13:02 -0700
> From: Darrick J. Wong <darrick.wong@oracle.com>
> To: tytso@mit.edu, darrick.wong@oracle.com
> Cc: linux-ext4@vger.kernel.org
> Subject: [PATCH 06/37] debugfs: force logdump to display (old) journal
>     contents
> 
> If the user passes -a more than once to logdump, try to dump old log
> contents.  This can be used to try to track down journal problems even
> after recovery.

You need to update the man page for this as well. Also, I wonder what
the behaviour is if '-a' is specified multiple times together with
'-b' or '-c'?

Thanks!
-Lukas
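
As a reading aid for the hunks quoted below, here is a minimal sketch of
the gating the patch adds, pulled out into two standalone helpers. The
helper names are hypothetical and do not exist in debugfs/logdump.c; only
the "dump_all < 2" comparisons and the fallback to block 1 come from the
patch. It assumes dump_all is a counter that is incremented once per -a,
which the description implies but the quoted hunks do not show.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: pick the journal block at which dumping starts.
 * Returns 0 when there is nothing to dump.
 */
static unsigned int logdump_start_block(unsigned int blocknr, int dump_all)
{
	if (blocknr != 0)
		return blocknr;	/* journal records a valid start block */
	if (dump_all < 2)
		return 0;	/* empty journal and -a given at most once */
	return 1;		/* forced dump: start at journal block 1 */
}

/* Hypothetical helper: does a sequence mismatch end the dump? */
static bool logdump_stop_on_mismatch(int dump_all)
{
	return dump_all < 2;
}
```

With -a given once (dump_all == 1) both helpers preserve the old
behaviour; only a second -a changes where the dump starts and whether it
continues past a sequence mismatch.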

> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  debugfs/logdump.c |   11 ++++++++---
>  1 file changed, 8 insertions(+), 3 deletions(-)
> 
> 
> diff --git a/debugfs/logdump.c b/debugfs/logdump.c
> index 8b9dc5b..bf4bef5 100644
> --- a/debugfs/logdump.c
> +++ b/debugfs/logdump.c
> @@ -393,9 +393,13 @@ static void dump_journal(char *cmdname, FILE *out_file,
>  	fprintf(out_file, "Journal starts at block %u, transaction %u\n",
>  		blocknr, transaction);
>  
> -	if (!blocknr)
> +	if (!blocknr) {
>  		/* Empty journal, nothing to do. */
> -		return;
> +		if (dump_all < 2)
> +			return;
> +		else
> +			blocknr = 1;
> +	}
>  
>  	while (1) {
>  		retval = read_journal_block(cmdname, source,
> @@ -420,7 +424,8 @@ static void dump_journal(char *cmdname, FILE *out_file,
>  			fprintf (out_file, "Found sequence %u (not %u) at "
>  				 "block %u: end of journal.\n",
>  				 sequence, transaction, blocknr);
> -			return;
> +			if (dump_all < 2)
> +				return;
>  		}
>  
>  		if (dump_descriptors) {
> 

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 08/37] mke2fs: set gdt csum when creating packed fs
  2014-05-01 23:13 ` [PATCH 08/37] mke2fs: set gdt csum when creating packed fs Darrick J. Wong
@ 2014-05-02 11:55   ` Lukáš Czerner
  2014-05-12  4:22     ` Theodore Ts'o
  0 siblings, 1 reply; 91+ messages in thread
From: Lukáš Czerner @ 2014-05-02 11:55 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: tytso, linux-ext4

On Thu, 1 May 2014, Darrick J. Wong wrote:

> Date: Thu, 01 May 2014 16:13:15 -0700
> From: Darrick J. Wong <darrick.wong@oracle.com>
> To: tytso@mit.edu, darrick.wong@oracle.com
> Cc: linux-ext4@vger.kernel.org
> Subject: [PATCH 08/37] mke2fs: set gdt csum when creating packed fs
> 
> When we're creating a fs with metadata blocks packed at the beginning
> (packed_meta_blocks=1 in mke2fs.conf), set the group descriptor
> checksum or else we create DOA filesystems with checksum errors.

Makes sense. Thanks!

Reviewed-by: Lukas Czerner <lczerner@redhat.com>
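
To illustrate why the missing call produces a filesystem that fails
verification straight away, here is a toy model of a descriptor whose
checksum covers its other fields. Everything in it is hypothetical: the
struct, the function names, and the XOR-based checksum are stand-ins,
not the libext2fs group descriptor or its real checksum algorithm.

```c
#include <stdint.h>

/* Toy descriptor: one payload field plus a checksum over that field. */
struct toy_group_desc {
	uint32_t inode_table;	/* block number of the inode table */
	uint16_t checksum;	/* must match toy_csum() of this descriptor */
};

/* Stand-in checksum, chosen only to keep the sketch short. */
static uint16_t toy_csum(const struct toy_group_desc *gd)
{
	return (uint16_t)((gd->inode_table & 0xffff) ^
			  (gd->inode_table >> 16) ^ 0x5a5a);
}

static int toy_csum_ok(const struct toy_group_desc *gd)
{
	return gd->checksum == toy_csum(gd);
}

/* Update the field and refresh the checksum in the same step. */
static void toy_set_inode_table(struct toy_group_desc *gd, uint32_t blk)
{
	gd->inode_table = blk;
	gd->checksum = toy_csum(gd);
}
```

Writing inode_table directly, without refreshing the checksum, leaves
toy_csum_ok() returning 0. That is the analogue of the situation the
one-line fix addresses.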

> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  misc/mke2fs.c |    1 +
>  1 file changed, 1 insertion(+)
> 
> 
> diff --git a/misc/mke2fs.c b/misc/mke2fs.c
> index 6507d0d..fd6259d 100644
> --- a/misc/mke2fs.c
> +++ b/misc/mke2fs.c
> @@ -383,6 +383,7 @@ static errcode_t packed_allocate_tables(ext2_filsys fs)
>  		ext2fs_block_alloc_stats_range(fs, goal,
>  					       fs->inode_blocks_per_group, +1);
>  		ext2fs_inode_table_loc_set(fs, i, goal);
> +		ext2fs_group_desc_csum_set(fs, i);
>  	}
>  	return 0;
>  }
> 

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 09/37] mke2fs: set error behavior at initialization time
  2014-05-01 23:13 ` [PATCH 09/37] mke2fs: set error behavior at initialization time Darrick J. Wong
@ 2014-05-02 12:13   ` Lukáš Czerner
  0 siblings, 0 replies; 91+ messages in thread
From: Lukáš Czerner @ 2014-05-02 12:13 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: tytso, linux-ext4

On Thu, 1 May 2014, Darrick J. Wong wrote:

> Date: Thu, 01 May 2014 16:13:21 -0700
> From: Darrick J. Wong <darrick.wong@oracle.com>
> To: tytso@mit.edu, darrick.wong@oracle.com
> Cc: linux-ext4@vger.kernel.org
> Subject: [PATCH 09/37] mke2fs: set error behavior at initialization time
> 
> Port tune2fs' -e flag to mke2fs so that we can set error behavior at
> format time, and introduce the equivalent errors= setting into
> mke2fs.conf.

Looks good. Thanks!

Reviewed-by: Lukas Czerner <lczerner@redhat.com>
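
The precedence the patch implements (the command line overrides the
profile, which overrides whatever is already in the superblock) can be
sketched as below. Both helper names are hypothetical; the patch itself
spells out the strcmp() chain twice rather than sharing a function. The
numeric values are what ext2_fs.h defines to the best of my knowledge,
repeated here only so the sketch compiles on its own.

```c
#include <string.h>

/* Assumed to match ext2_fs.h; repeated so the sketch is standalone. */
#define EXT2_ERRORS_CONTINUE	1
#define EXT2_ERRORS_RO		2
#define EXT2_ERRORS_PANIC	3

/* Hypothetical helper: map a keyword to an s_errors value, 0 if unknown. */
static int parse_errors_behavior(const char *arg)
{
	if (strcmp(arg, "continue") == 0)
		return EXT2_ERRORS_CONTINUE;
	if (strcmp(arg, "remount-ro") == 0)
		return EXT2_ERRORS_RO;
	if (strcmp(arg, "panic") == 0)
		return EXT2_ERRORS_PANIC;
	return 0;
}

/*
 * Hypothetical helper: resolve the value to store in s_errors.
 * profile_arg may be NULL (no errors= relation); cmdline is 0 when -e
 * was not given.  Returns -1 for an unknown profile keyword, which is
 * rejected even if -e would have overridden it, as in the patch.
 */
static int resolve_errors_behavior(int current, const char *profile_arg,
				   int cmdline)
{
	int errors = current;

	if (profile_arg) {
		errors = parse_errors_behavior(profile_arg);
		if (!errors)
			return -1;
	}
	if (cmdline)
		errors = cmdline;
	return errors;
}
```

The last case in the quoted test script (a profile that says remount-ro
combined with -e panic) corresponds to
resolve_errors_behavior(current, "remount-ro", EXT2_ERRORS_PANIC).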

> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  misc/mke2fs.8.in             |   23 +++++++++
>  misc/mke2fs.c                |   57 +++++++++++++++++++++-
>  misc/mke2fs.conf.5.in        |   19 +++++++
>  tests/t_mke2fs_errors/expect |   24 +++++++++
>  tests/t_mke2fs_errors/script |  110 ++++++++++++++++++++++++++++++++++++++++++
>  5 files changed, 231 insertions(+), 2 deletions(-)
>  create mode 100644 tests/t_mke2fs_errors/expect
>  create mode 100755 tests/t_mke2fs_errors/script
> 
> 
> diff --git a/misc/mke2fs.8.in b/misc/mke2fs.8.in
> index bf17eae..bad76bb 100644
> --- a/misc/mke2fs.8.in
> +++ b/misc/mke2fs.8.in
> @@ -113,6 +113,10 @@ mke2fs \- create an ext2/ext3/ext4 filesystem
>  [
>  .B \-V
>  ]
> +[
> +.B \-e
> +.I errors-behavior
> +]
>  .I device
>  [
>  .I blocks-count
> @@ -206,6 +210,25 @@ lot of buffer cache memory, which may impact other applications running
>  on a busy server.  This option will cause mke2fs to run much more
>  slowly, however, so there is a tradeoff to using direct I/O.
>  .TP
> +.BI \-e " error-behavior"
> +Change the behavior of the kernel code when errors are detected.
> +In all cases, a filesystem error will cause
> +.BR e2fsck (8)
> +to check the filesystem on the next boot.
> +.I error-behavior
> +can be one of the following:
> +.RS 1.2i
> +.TP 1.2i
> +.B continue
> +Continue normal execution.
> +.TP
> +.B remount-ro
> +Remount filesystem read-only.
> +.TP
> +.B panic
> +Cause a kernel panic.
> +.RE
> +.TP
>  .BI \-E " extended-options"
>  Set extended options for the filesystem.  Extended options are comma
>  separated, and may take an argument using the equals ('=') sign.  The
> diff --git a/misc/mke2fs.c b/misc/mke2fs.c
> index fd6259d..a794689 100644
> --- a/misc/mke2fs.c
> +++ b/misc/mke2fs.c
> @@ -112,6 +112,8 @@ static profile_t	profile;
>  static int sys_page_size = 4096;
>  static int linux_version_code = 0;
>  
> +static int errors_behavior = 0;
> +
>  static void usage(void)
>  {
>  	fprintf(stderr, _("Usage: %s [-c|-l filename] [-b block-size] "
> @@ -123,7 +125,7 @@ static void usage(void)
>  	"\t[-g blocks-per-group] [-L volume-label] "
>  	"[-M last-mounted-directory]\n\t[-O feature[,...]] "
>  	"[-r fs-revision] [-E extended-option[,...]]\n"
> -	"\t[-t fs-type] [-T usage-type ] [-U UUID] "
> +	"\t[-t fs-type] [-T usage-type ] [-U UUID] [-e errors_behavior] "
>  	"[-jnqvDFKSV] device [blocks-count]\n"),
>  		program_name);
>  	exit(1);
> @@ -1524,7 +1526,7 @@ profile_error:
>  	}
>  
>  	while ((c = getopt (argc, argv,
> -		    "b:cg:i:jl:m:no:qr:s:t:d:vC:DE:FG:I:J:KL:M:N:O:R:ST:U:V")) != EOF) {
> +		    "b:ce:g:i:jl:m:no:qr:s:t:d:vC:DE:FG:I:J:KL:M:N:O:R:ST:U:V")) != EOF) {
>  		switch (c) {
>  		case 'b':
>  			blocksize = parse_num_blocks2(optarg, -1);
> @@ -1567,6 +1569,20 @@ profile_error:
>  		case 'E':
>  			extended_opts = optarg;
>  			break;
> +		case 'e':
> +			if (strcmp(optarg, "continue") == 0)
> +				errors_behavior = EXT2_ERRORS_CONTINUE;
> +			else if (strcmp(optarg, "remount-ro") == 0)
> +				errors_behavior = EXT2_ERRORS_RO;
> +			else if (strcmp(optarg, "panic") == 0)
> +				errors_behavior = EXT2_ERRORS_PANIC;
> +			else {
> +				com_err(program_name, 0,
> +					_("bad error behavior - %s"),
> +					optarg);
> +				usage();
> +			}
> +			break;
>  		case 'F':
>  			force++;
>  			break;
> @@ -2577,6 +2593,38 @@ static int create_quota_inodes(ext2_filsys fs)
>  	return 0;
>  }
>  
> +static errcode_t set_error_behavior(ext2_filsys fs)
> +{
> +	char	*arg = NULL;
> +	short	errors = fs->super->s_errors;
> +
> +	arg = get_string_from_profile(fs_types, "errors", NULL);
> +	if (arg == NULL)
> +		goto try_user;
> +
> +	if (strcmp(arg, "continue") == 0)
> +		errors = EXT2_ERRORS_CONTINUE;
> +	else if (strcmp(arg, "remount-ro") == 0)
> +		errors = EXT2_ERRORS_RO;
> +	else if (strcmp(arg, "panic") == 0)
> +		errors = EXT2_ERRORS_PANIC;
> +	else {
> +		com_err(program_name, 0,
> +			_("bad error behavior in profile - %s"),
> +			arg);
> +		free(arg);
> +		return EXT2_ET_INVALID_ARGUMENT;
> +	}
> +	free(arg);
> +
> +try_user:
> +	if (errors_behavior)
> +		errors = errors_behavior;
> +
> +	fs->super->s_errors = errors;
> +	return 0;
> +}
> +
>  int main (int argc, char *argv[])
>  {
>  	errcode_t	retval = 0;
> @@ -2641,6 +2689,11 @@ int main (int argc, char *argv[])
>  	}
>  	fs->progress_ops = &ext2fs_numeric_progress_ops;
>  
> +	/* Set the error behavior */
> +	retval = set_error_behavior(fs);
> +	if (retval)
> +		usage();
> +
>  	/* Check the user's mkfs options for metadata checksumming */
>  	if (!quiet &&
>  	    EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
> diff --git a/misc/mke2fs.conf.5.in b/misc/mke2fs.conf.5.in
> index 02efdce..18a003a 100644
> --- a/misc/mke2fs.conf.5.in
> +++ b/misc/mke2fs.conf.5.in
> @@ -302,6 +302,25 @@ whose subsections define the
>  relation, only the last will be used by
>  .BR mke2fs (8).
>  .TP
> +.I errors
> +Change the behavior of the kernel code when errors are detected.
> +In all cases, a filesystem error will cause
> +.BR e2fsck (8)
> +to check the filesystem on the next boot.
> +.I errors
> +can be one of the following:
> +.RS 1.2i
> +.TP 1.2i
> +.B continue
> +Continue normal execution.
> +.TP
> +.B remount-ro
> +Remount filesystem read-only.
> +.TP
> +.B panic
> +Cause a kernel panic.
> +.RE
> +.TP
>  .I features
>  This relation specifies a comma-separated list of features edit
>  requests which modify the feature set
> diff --git a/tests/t_mke2fs_errors/expect b/tests/t_mke2fs_errors/expect
> new file mode 100644
> index 0000000..78514bd
> --- /dev/null
> +++ b/tests/t_mke2fs_errors/expect
> @@ -0,0 +1,24 @@
> +error default
> +Errors behavior:          Continue
> +error continue
> +Errors behavior:          Continue
> +error panic
> +Errors behavior:          Panic
> +error remount-ro
> +Errors behavior:          Remount read-only
> +error garbage
> +error default profile continue
> +Errors behavior:          Continue
> +error default profile panic
> +Errors behavior:          Panic
> +error default profile remount-ro
> +Errors behavior:          Remount read-only
> +error default profile broken
> +error fs_types profile continue
> +Errors behavior:          Continue
> +error fs_types profile panic
> +Errors behavior:          Panic
> +error fs_types profile remount-ro
> +Errors behavior:          Remount read-only
> +error fs_types profile remount-ro
> +Errors behavior:          Panic
> diff --git a/tests/t_mke2fs_errors/script b/tests/t_mke2fs_errors/script
> new file mode 100755
> index 0000000..d09e926
> --- /dev/null
> +++ b/tests/t_mke2fs_errors/script
> @@ -0,0 +1,110 @@
> +test_description="mke2fs with error behavior"
> +
> +conf=$TMPFILE.conf
> +write_defaults_conf()
> +{
> +	errors="$1"
> +	cat > $conf << ENDL
> +[defaults]
> +	errors = $errors
> +ENDL
> +}
> +
> +write_section_conf()
> +{
> +	errors="$1"
> +	cat > $conf << ENDL
> +[defaults]
> +	errors = broken
> +
> +[fs_types]
> +	test_suite = {
> +		errors = $errors
> +	}
> +ENDL
> +}
> +
> +trap "rm -rf $TMPFILE $TMPFILE.conf" EXIT INT QUIT
> +dd if=/dev/zero of=$TMPFILE bs=1k count=512 > /dev/null 2>&1
> +OUT=$test_name.log
> +EXP=$test_dir/expect
> +rm -rf $OUT
> +
> +# Test command line option
> +echo "error default" >> $OUT
> +$MKE2FS -F $TMPFILE > /dev/null 2>&1
> +$DUMPE2FS $TMPFILE 2>&1 | grep 'Errors behavior' >> $OUT
> +
> +echo "error continue" >> $OUT
> +$MKE2FS -e continue -F $TMPFILE > /dev/null 2>&1
> +$DUMPE2FS $TMPFILE 2>&1 | grep 'Errors behavior' >> $OUT
> +
> +echo "error panic" >> $OUT
> +$MKE2FS -e panic -F $TMPFILE > /dev/null 2>&1
> +$DUMPE2FS $TMPFILE 2>&1 | grep 'Errors behavior' >> $OUT
> +
> +echo "error remount-ro" >> $OUT
> +$MKE2FS -e remount-ro -F $TMPFILE > /dev/null 2>&1
> +$DUMPE2FS $TMPFILE 2>&1 | grep 'Errors behavior' >> $OUT
> +
> +echo "error garbage" >> $OUT
> +dd if=/dev/zero of=$TMPFILE bs=1k count=512 > /dev/null 2>&1
> +$MKE2FS -e broken -F $TMPFILE > /dev/null 2>&1
> +$DUMPE2FS $TMPFILE 2>&1 | grep 'Errors behavior' >> $OUT
> +
> +# Test errors= in default
> +echo "error default profile continue" >> $OUT
> +write_defaults_conf continue
> +MKE2FS_CONFIG=$conf $MKE2FS -F $TMPFILE > /dev/null 2>&1
> +$DUMPE2FS $TMPFILE 2>&1 | grep 'Errors behavior' >> $OUT
> +
> +echo "error default profile panic" >> $OUT
> +write_defaults_conf panic
> +MKE2FS_CONFIG=$conf $MKE2FS -F $TMPFILE > /dev/null 2>&1
> +$DUMPE2FS $TMPFILE 2>&1 | grep 'Errors behavior' >> $OUT
> +
> +echo "error default profile remount-ro" >> $OUT
> +write_defaults_conf remount-ro
> +MKE2FS_CONFIG=$conf $MKE2FS -F $TMPFILE > /dev/null 2>&1
> +$DUMPE2FS $TMPFILE 2>&1 | grep 'Errors behavior' >> $OUT
> +
> +echo "error default profile broken" >> $OUT
> +write_defaults_conf broken
> +dd if=/dev/zero of=$TMPFILE bs=1k count=512 > /dev/null 2>&1
> +MKE2FS_CONFIG=$conf $MKE2FS -F $TMPFILE > /dev/null 2>&1
> +$DUMPE2FS $TMPFILE 2>&1 | grep 'Errors behavior' >> $OUT
> +
> +# Test errors= in a fs type
> +echo "error fs_types profile continue" >> $OUT
> +write_section_conf continue
> +MKE2FS_CONFIG=$conf $MKE2FS -F $TMPFILE -T test_suite > /dev/null 2>&1
> +$DUMPE2FS $TMPFILE 2>&1 | grep 'Errors behavior' >> $OUT
> +
> +echo "error fs_types profile panic" >> $OUT
> +write_section_conf panic
> +MKE2FS_CONFIG=$conf $MKE2FS -F $TMPFILE -T test_suite > /dev/null 2>&1
> +$DUMPE2FS $TMPFILE 2>&1 | grep 'Errors behavior' >> $OUT
> +
> +echo "error fs_types profile remount-ro" >> $OUT
> +write_section_conf remount-ro
> +MKE2FS_CONFIG=$conf $MKE2FS -F $TMPFILE -T test_suite > /dev/null 2>&1
> +$DUMPE2FS $TMPFILE 2>&1 | grep 'Errors behavior' >> $OUT
> +
> +# Test command line override
> +echo "error fs_types profile remount-ro" >> $OUT
> +write_section_conf remount-ro
> +MKE2FS_CONFIG=$conf $MKE2FS -F $TMPFILE -T test_suite -e panic > /dev/null 2>&1
> +$DUMPE2FS $TMPFILE 2>&1 | grep 'Errors behavior' >> $OUT
> +
> +cmp -s $OUT $EXP
> +status=$?
> +
> +if [ "$status" = 0 ] ; then
> +	echo "$test_name: $test_description: ok"
> +	touch $test_name.ok
> +else
> +	echo "$test_name: $test_description: failed"
> +	diff $DIFF_OPTS $EXP $OUT > $test_name.failed
> +	rm -f $test_name.tmp
> +fi
> +
> 

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 10/37] e2fsck: verify checksums after checking everything else
  2014-05-01 23:13 ` [PATCH 10/37] e2fsck: verify checksums after checking everything else Darrick J. Wong
@ 2014-05-02 12:32   ` Lukáš Czerner
  2014-05-05 22:56     ` Darrick J. Wong
  0 siblings, 1 reply; 91+ messages in thread
From: Lukáš Czerner @ 2014-05-02 12:32 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: tytso, linux-ext4

On Thu, 1 May 2014, Darrick J. Wong wrote:

> Date: Thu, 01 May 2014 16:13:28 -0700
> From: Darrick J. Wong <darrick.wong@oracle.com>
> To: tytso@mit.edu, darrick.wong@oracle.com
> Cc: linux-ext4@vger.kernel.org
> Subject: [PATCH 10/37] e2fsck: verify checksums after checking everything else
> 
> There's a particular problem with e2fsck's user interface where
> checksum errors are concerned:  Fixing the first complaint about
> a checksum problem results in the inode being cleared even if e2fsck
> could otherwise have recovered it.  While this mode is useful for
> cleaning the remaining broken crud off the filesystem, we could at
> least default to checking everything /else/ and only complaining about
> the incorrect checksum if fsck finds nothing else wrong.
> 
> So, plumb in a config option.  We default to "verify and checksum"
> unless the user tells us otherwise.

I wonder whether it would be better to always check the checksum
of an object, because it might yield additional information.

If the checksum is good and the object is somewhat broken, then it's
highly likely that we have a problem in the kernel (or possibly in
e2fsprogs, if some other operations were performed).

If the checksum is bad and the object is bad, then it's likely that
the corruption happened outside of the file system code: in memory,
on disk, or in transfer.

If the checksum is bad and the object is good, then it's trickier, since
it can be a kernel metadata csum bug, unlucky silent corruption, or an
intentional change of the metadata.

It's not a huge amount of information we can get from it, but I think
that it might be useful when dealing with a corrupted file system.


Thanks!
-Lukas
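
The three cases above, plus the trivial all-good one, amount to a small
decision table. A sketch of it in C follows; the enum and function names
are hypothetical and nothing like this exists in e2fsck as far as I
know. It only shows what reporting both results together could convey.

```c
#include <stdbool.h>

/* Hypothetical classification of the two independent check results. */
enum likely_cause {
	CAUSE_NONE,			/* checksum good, object good */
	CAUSE_FS_CODE_BUG,		/* checksum good, object bad */
	CAUSE_EXTERNAL_CORRUPTION,	/* checksum bad, object bad */
	CAUSE_AMBIGUOUS			/* checksum bad, object good */
};

static enum likely_cause classify(bool csum_ok, bool object_ok)
{
	if (csum_ok)
		return object_ok ? CAUSE_NONE : CAUSE_FS_CODE_BUG;
	return object_ok ? CAUSE_AMBIGUOUS : CAUSE_EXTERNAL_CORRUPTION;
}
```

Distinguishing CAUSE_FS_CODE_BUG from CAUSE_EXTERNAL_CORRUPTION needs
both results for the same object, which is why skipping the checksum
whenever the object already fails its other checks loses information.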

> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  e2fsck/e2fsck.8.in      |   12 ++++++++++++
>  e2fsck/e2fsck.conf.5.in |   20 ++++++++++++++++++++
>  e2fsck/e2fsck.h         |    1 +
>  e2fsck/problem.c        |   18 ++++++++++++++----
>  e2fsck/problemP.h       |    1 +
>  e2fsck/unix.c           |   11 +++++++++++
>  6 files changed, 59 insertions(+), 4 deletions(-)
> 
> 
> diff --git a/e2fsck/e2fsck.8.in b/e2fsck/e2fsck.8.in
> index f5ed758..43ee063 100644
> --- a/e2fsck/e2fsck.8.in
> +++ b/e2fsck/e2fsck.8.in
> @@ -207,6 +207,18 @@ option may prevent you from further manual data recovery.
>  .BI nodiscard
>  Do not attempt to discard free blocks and unused inode blocks. This option is
>  exactly the opposite of discard option. This is set as default.
> +.TP
> +.BI strict_csums
> +Verify each metadata object's checksum before checking any other fields
> +in the metadata object.  If the verification fails, offer to clear the item,
> +also before checking any of the other fields.  This option causes e2fsck to
> +favor throwing away broken objects over trying to salvage them.
> +.TP
> +.BI no_strict_csums
> +Perform all regular checks of a metadata object and only verify the checksum if
> +no problems were found.  This option causes e2fsck to try to salvage slightly
> +damaged metadata objects, at the cost of spending processing time on recovering
> +data.  This is set as the default.
>  .RE
>  .TP
>  .B \-f
> diff --git a/e2fsck/e2fsck.conf.5.in b/e2fsck/e2fsck.conf.5.in
> index 9ebfbbf..a8219a8 100644
> --- a/e2fsck/e2fsck.conf.5.in
> +++ b/e2fsck/e2fsck.conf.5.in
> @@ -222,6 +222,26 @@ If this boolean relation is true, e2fsck will run as if the option
>  .B -v
>  is always specified.  This will cause e2fsck to print some additional
>  information at the end of each full file system check.
> +.TP
> +.I strict_csums
> +If this boolean relation is true, e2fsck will run as if
> +.B -E strict_csums
> +is set.  This causes e2fsck to verify each metadata object's checksum before
> +checking any other fields in the metadata object.  If the verification
> +fails, offer to clear the item, also before checking any of the other fields.
> +This option causes e2fsck to favor throwing away broken objects over trying to
> +salvage them.
> +.IP
> +If the boolean relation is false, e2fsck will run as if
> +.B -E no_strict_csums
> +is set.  In this case, e2fsck will perform all regular checks of a metadata
> +object and only verify the checksum if no problems were found.  This option
> +causes e2fsck to try to salvage slightly damaged metadata objects, at the cost
> +of spending processing time on recovering data.
> +.IP
> +The default is for e2fsck to behave as if
> +.B -E no_strict_csums
> +is set.
>  .SH THE [problems] STANZA
>  Each tag in the
>  .I [problems] 
> diff --git a/e2fsck/e2fsck.h b/e2fsck/e2fsck.h
> index dbd6ea8..d7a7be9 100644
> --- a/e2fsck/e2fsck.h
> +++ b/e2fsck/e2fsck.h
> @@ -167,6 +167,7 @@ struct resource_track {
>  #define E2F_OPT_FRAGCHECK	0x0800
>  #define E2F_OPT_JOURNAL_ONLY	0x1000 /* only replay the journal */
>  #define E2F_OPT_DISCARD		0x2000
> +#define E2F_OPT_CSUM_FIRST	0x4000
>  
>  /*
>   * E2fsck flags
> diff --git a/e2fsck/problem.c b/e2fsck/problem.c
> index 7f0ad6c..0999399 100644
> --- a/e2fsck/problem.c
> +++ b/e2fsck/problem.c
> @@ -970,7 +970,7 @@ static struct e2fsck_problem problem_table[] = {
>  	/* inode checksum does not match inode */
>  	{ PR_1_INODE_CSUM_INVALID,
>  	  N_("@i %i checksum does not match @i.  "),
> -	  PROMPT_CLEAR, PR_PREEN_OK },
> +	  PROMPT_CLEAR, PR_PREEN_OK | PR_INITIAL_CSUM },
>  
>  	/* inode passes checks, but checksum does not match inode */
>  	{ PR_1_INODE_ONLY_CSUM_INVALID,
> @@ -981,7 +981,7 @@ static struct e2fsck_problem problem_table[] = {
>  	{ PR_1_EXTENT_CSUM_INVALID,
>  	  N_("@i %i extent block checksum does not match extent\n\t(logical @b "
>  	     "%c, @n physical @b %b, len %N)\n"),
> -	  PROMPT_CLEAR, 0 },
> +	  PROMPT_CLEAR, PR_INITIAL_CSUM },
>  
>  	/*
>  	 * Inode extent block passes checks, but checksum does not match
> @@ -996,7 +996,7 @@ static struct e2fsck_problem problem_table[] = {
>  	{ PR_1_EA_BLOCK_CSUM_INVALID,
>  	  N_("Extended attribute @a @b %b checksum for @i %i does not "
>  	     "match.  "),
> -	  PROMPT_CLEAR, 0 },
> +	  PROMPT_CLEAR, PR_INITIAL_CSUM },
>  
>  	/*
>  	 * Extended attribute block passes checks, but checksum for inode does
> @@ -1470,7 +1470,7 @@ static struct e2fsck_problem problem_table[] = {
>  	/* leaf node fails checksum */
>  	{ PR_2_LEAF_NODE_CSUM_INVALID,
>  	  N_("@d @i %i, %B, offset %N: @d fails checksum\n"),
> -	  PROMPT_SALVAGE, PR_PREEN_OK },
> +	  PROMPT_SALVAGE, PR_PREEN_OK | PR_INITIAL_CSUM },
>  
>  	/* leaf node has no checksum */
>  	{ PR_2_LEAF_NODE_MISSING_CSUM,
> @@ -1944,6 +1944,16 @@ int fix_problem(e2fsck_t ctx, problem_t code, struct problem_context *pctx)
>  		printf(_("Unhandled error code (0x%x)!\n"), code);
>  		return 0;
>  	}
> +
> +	/*
> +	 * If there is a problem with the initial csum verification and the
> +	 * user told e2fsck to verify csums /after/ checking everything else,
> +	 * then don't "fix" anything.
> +	 */
> +	if ((ptr->flags & PR_INITIAL_CSUM) &&
> +	    !(ctx->options & E2F_OPT_CSUM_FIRST))
> +		return 0;
> +
>  	if (!(ptr->flags & PR_CONFIG)) {
>  		char	key[9], *new_desc = NULL;
>  
> diff --git a/e2fsck/problemP.h b/e2fsck/problemP.h
> index 7944cd6..a983598 100644
> --- a/e2fsck/problemP.h
> +++ b/e2fsck/problemP.h
> @@ -44,3 +44,4 @@ struct latch_descr {
>  #define PR_CONFIG	0x080000 /* This problem has been customized
>  				    from the config file */
>  #define PR_FORCE_NO	0x100000 /* Force the answer to be no */
> +#define PR_INITIAL_CSUM	0x200000 /* User can ignore initial csum check */
> diff --git a/e2fsck/unix.c b/e2fsck/unix.c
> index b39383d..c6cdb49 100644
> --- a/e2fsck/unix.c
> +++ b/e2fsck/unix.c
> @@ -692,6 +692,10 @@ static void parse_extended_opts(e2fsck_t ctx, const char *opts)
>  			else
>  				ctx->log_fn = string_copy(ctx, arg, 0);
>  			continue;
> +		} else if (strcmp(token, "strict_csums") == 0) {
> +			ctx->options |= E2F_OPT_CSUM_FIRST;
> +		} else if (strcmp(token, "no_strict_csums") == 0) {
> +			ctx->options &= ~E2F_OPT_CSUM_FIRST;
>  		} else {
>  			fprintf(stderr, _("Unknown extended option: %s\n"),
>  				token);
> @@ -710,6 +714,8 @@ static void parse_extended_opts(e2fsck_t ctx, const char *opts)
>  		fputs(("\tjournal_only\n"), stderr);
>  		fputs(("\tdiscard\n"), stderr);
>  		fputs(("\tnodiscard\n"), stderr);
> +		fputs(("\tstrict_csums\n"), stderr);
> +		fputs(("\tno_strict_csums\n"), stderr);
>  		fputc('\n', stderr);
>  		exit(1);
>  	}
> @@ -945,6 +951,11 @@ static errcode_t PRS(int argc, char *argv[], e2fsck_t *ret_ctx)
>  	profile_set_syntax_err_cb(syntax_err_report);
>  	profile_init(config_fn, &ctx->profile);
>  
> +	profile_get_boolean(ctx->profile, "options", "strict_csums", NULL,
> +			    0, &c);
> +	if (c)
> +		ctx->options |= E2F_OPT_CSUM_FIRST;
> +
>  	profile_get_boolean(ctx->profile, "options", "report_time", 0, 0,
>  			    &c);
>  	if (c)
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 11/37] e2fsck: fix the extended attribute checksum error message
  2014-05-01 23:13 ` [PATCH 11/37] e2fsck: fix the extended attribute checksum error message Darrick J. Wong
@ 2014-05-02 12:46   ` Lukáš Czerner
  2014-05-05 23:08     ` Darrick J. Wong
  0 siblings, 1 reply; 91+ messages in thread
From: Lukáš Czerner @ 2014-05-02 12:46 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: tytso, linux-ext4

On Thu, 1 May 2014, Darrick J. Wong wrote:

> Date: Thu, 01 May 2014 16:13:34 -0700
> From: Darrick J. Wong <darrick.wong@oracle.com>
> To: tytso@mit.edu, darrick.wong@oracle.com
> Cc: linux-ext4@vger.kernel.org
> Subject: [PATCH 11/37] e2fsck: fix the extended attribute checksum error
>     message
> 
> Make the "EA block passes checks but fails checksum" message less
> strange.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  e2fsck/problem.c |   12 +++++-------
>  1 file changed, 5 insertions(+), 7 deletions(-)
> 
> 
> diff --git a/e2fsck/problem.c b/e2fsck/problem.c
> index 0999399..ec20bd1 100644
> --- a/e2fsck/problem.c
> +++ b/e2fsck/problem.c
> @@ -992,19 +992,17 @@ static struct e2fsck_problem problem_table[] = {
>  	     "extent\n\t(logical @b %c, @n physical @b %b, len %N)\n"),
>  	  PROMPT_FIX, 0 },
>  
> -	/* Extended attribute block checksum for inode does not match. */
> +	/* Extended attribute block checksum does not match. */

The "for inode" is still there in the message, so I do not think
there is a reason to remove it from the comment.

>  	{ PR_1_EA_BLOCK_CSUM_INVALID,
> -	  N_("Extended attribute @a @b %b checksum for @i %i does not "
> -	     "match.  "),
> +	  N_("@a @b %b checksum for @i %i does not match.  "),
>  	  PROMPT_CLEAR, PR_INITIAL_CSUM },
>  
>  	/*
> -	 * Extended attribute block passes checks, but checksum for inode does
> -	 * not match.
> +	 * Extended attribute block passes checks, but checksum does not
> +	 * match.
>  	 */
>  	{ PR_1_EA_BLOCK_ONLY_CSUM_INVALID,
> -	  N_("Extended attribute @a @b %b passes checks, but checksum for "
> -	     "@i %i does not match.  "),
> +	  N_("@a @b %b passes checks, but checksum does not match.  "),

Is there a reason to remove the inode number from the message?

Thanks!
-Lukas

>  	  PROMPT_FIX, 0 },
>  
>  	/*
> 

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 12/37] e2fsck: insert a missing dirent tail for checksums if possible
  2014-05-01 23:13 ` [PATCH 12/37] e2fsck: insert a missing dirent tail for checksums if possible Darrick J. Wong
@ 2014-05-02 12:54   ` Lukáš Czerner
  2014-05-05 23:16     ` Darrick J. Wong
  0 siblings, 1 reply; 91+ messages in thread
From: Lukáš Czerner @ 2014-05-02 12:54 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: tytso, linux-ext4

On Thu, 1 May 2014, Darrick J. Wong wrote:

> Date: Thu, 01 May 2014 16:13:41 -0700
> From: Darrick J. Wong <darrick.wong@oracle.com>
> To: tytso@mit.edu, darrick.wong@oracle.com
> Cc: linux-ext4@vger.kernel.org
> Subject: [PATCH 12/37] e2fsck: insert a missing dirent tail for checksums if
>     possible
> 
> If e2fsck is writing a block of directory entries to disk, it should
> adjust the dirents to add the dirent tail if one is missing.  It's not
> a big deal if there's no space to do this since rehash (pass 3A) will
> reconstruct directories for us.  However, we may as well avoid
> unnecessary work.

Sorry for the stupid question, but in what case can the dirent tail be
missing? It's not immediately obvious to me.

Thanks!
-Lukas

> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  e2fsck/pass2.c |   40 ++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 40 insertions(+)
> 
> 
> diff --git a/e2fsck/pass2.c b/e2fsck/pass2.c
> index 5488c73..95f51b7 100644
> --- a/e2fsck/pass2.c
> +++ b/e2fsck/pass2.c
> @@ -739,6 +739,41 @@ static int is_last_entry(ext2_filsys fs, int inline_data_size,
>  		return (offset < fs->blocksize - csum_size);
>  }
>  
> +static errcode_t insert_dirent_tail(ext2_filsys fs, void *dirbuf)
> +{
> +	struct ext2_dir_entry *d;
> +	void *top;
> +	struct ext2_dir_entry_tail *t;
> +	unsigned int rec_len;
> +
> +	d = dirbuf;
> +	top = EXT2_DIRENT_TAIL(dirbuf, fs->blocksize);
> +
> +	rec_len = d->rec_len;
> +	while (rec_len && !(rec_len & 0x3)) {
> +		d = (struct ext2_dir_entry *)(((char *)d) + rec_len);
> +		if (((void *)d) + d->rec_len >= top)
> +			break;
> +		rec_len = d->rec_len;
> +	}
> +
> +	if (d != top) {
> +		size_t min_size = EXT2_DIR_REC_LEN(
> +				ext2fs_dirent_name_len(dirbuf));
> +		if (min_size > d->rec_len - sizeof(struct ext2_dir_entry_tail))
> +			return EXT2_ET_DIR_NO_SPACE_FOR_CSUM;
> +		d->rec_len -= sizeof(struct ext2_dir_entry_tail);
> +	}
> +
> +	t = (struct ext2_dir_entry_tail *)top;
> +	if (t->det_reserved_zero1 ||
> +	    t->det_rec_len != sizeof(struct ext2_dir_entry_tail) ||
> +	    t->det_reserved_name_len != EXT2_DIR_NAME_LEN_CSUM)
> +		ext2fs_initialize_dirent_tail(fs, t);
> +
> +	return 0;
> +}
> +
>  static int check_dir_block(ext2_filsys fs,
>  			   struct ext2_db_entry2 *db,
>  			   void *priv_data)
> @@ -1275,8 +1310,13 @@ skip_checksum:
>  		if (EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
>  				EXT4_FEATURE_RO_COMPAT_METADATA_CSUM) &&
>  		    is_leaf &&
> +		    !inline_data_size &&
>  		    !ext2fs_dirent_has_tail(fs, (struct ext2_dir_entry *)buf))
> +		{
> +			if (insert_dirent_tail(fs, buf) == 0)
> +				goto write_and_fix;
>  			e2fsck_rehash_dir_later(ctx, ino);
> +		}
>  
>  write_and_fix:
>  		if (e2fsck_dir_will_be_rehashed(ctx, ino))
> 

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 37/37] ext5: define new subtype to add features and reduce testing complexity
  2014-05-02  9:45   ` Lukáš Czerner
@ 2014-05-02 14:04     ` Theodore Ts'o
  2014-05-06  1:59       ` Darrick J. Wong
  2014-05-06  1:33     ` Darrick J. Wong
  1 sibling, 1 reply; 91+ messages in thread
From: Theodore Ts'o @ 2014-05-02 14:04 UTC (permalink / raw)
  To: Lukáš Czerner; +Cc: Darrick J. Wong, linux-ext4

On Fri, May 02, 2014 at 11:45:25AM +0200, Lukáš Czerner wrote:
> This is definitely NACK by me. I do not like this and there are
> several reasons why.
> 
> First of all, the name. Given the history of the ext file systems, we
> tend to increase the number with each new version of the file system.
> However, you're saying that this is just for testing features ... in
> that case it does not make any sense to call it ext5. Not only that,
> it's stupid to call it ext5, especially since we might actually want
> to release ext5 in the future and this would be really confusing for
> everybody involved.

Yes, the messaging involved with the "ext3" vs "ext4" bump has been
really unfortunate.  If I had to do it all over again, I would have
created "ext3dev", and then when it was stable, I would have done a:

	git rm -rf fs/ext3 ; git mv fs/ext3dev fs/ext4

For example, it would have avoided the problem with SuSE product
managers refusing to support ext4 for multiple years, etc.

It also would have avoided the problem with people doing comparisons
of ext3 versus xfs, even in April 2014 (see a recent blog article
promoted on Hacker News, wherein someone kvetched that ext3 didn't
support fallocate).  Sigh....

> What about simply using mke2fs.conf to specify the feature set
> we want and use that?

Yes, it's likely that for 1.43 we'll enable various features by
default.  It's been quite deliberate that I haven't enabled them by
default so far, because I wanted to make 100% sure they were completely
stable first.  Some of them we may have been able to enable by default
earlier, but be that as it may, 1.43 is a good time to make that
change.

				- Ted


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 13/37] e2fsck: write dir blocks after new inode when reconstructing root/lost+found
  2014-05-01 23:13 ` [PATCH 13/37] e2fsck: write dir blocks after new inode when reconstructing root/lost+found Darrick J. Wong
@ 2014-05-05 17:13   ` Lukáš Czerner
  0 siblings, 0 replies; 91+ messages in thread
From: Lukáš Czerner @ 2014-05-05 17:13 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: tytso, linux-ext4

On Thu, 1 May 2014, Darrick J. Wong wrote:

> Date: Thu, 01 May 2014 16:13:47 -0700
> From: Darrick J. Wong <darrick.wong@oracle.com>
> To: tytso@mit.edu, darrick.wong@oracle.com
> Cc: linux-ext4@vger.kernel.org
> Subject: [PATCH 13/37] e2fsck: write dir blocks after new inode when
>     reconstructing root/lost+found
> 
> If we trash the root directory block, e2fsck will find inode 11 (the
> old lost+found) and try to attach it to l+f.  The lost+found checker
> also fails to find l+f and tries to add one to the root dir.  The root
> dir is not found but is recreated with incorrect checksums, so linking
> in the l+f dir fails and the l+f '..' entry isn't set.  Since both
> dirs now fail checksum verification, they're both referred to rehash
> to have that fixed, but because l+f doesn't have a '..' entry, rehash
> crashes because l+f has < 2 entries.
> 
> On a checksumming filesystem, the routines in e2fsck that recreate
> /lost+found and / must write the new directory block *after* the inode
> has been written to disk because the checksum depends on i_generation.
> Add a regression test while we're at it.

Looks good, but it might be worth noting in the description that it also
fixes a possible memory leak in check_root().

Reviewed-by: Lukas Czerner <lczerner@redhat.com>

> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  e2fsck/pass3.c                        |   85 +++++----
>  tests/f_rebuild_csum_rootdir/expect.1 |  311 +++++++++++++++++++++++++++++++++
>  tests/f_rebuild_csum_rootdir/expect.2 |    7 +
>  tests/f_rebuild_csum_rootdir/image.gz |  Bin
>  tests/f_rebuild_csum_rootdir/name     |    1 
>  5 files changed, 364 insertions(+), 40 deletions(-)
>  create mode 100644 tests/f_rebuild_csum_rootdir/expect.1
>  create mode 100644 tests/f_rebuild_csum_rootdir/expect.2
>  create mode 100644 tests/f_rebuild_csum_rootdir/image.gz
>  create mode 100644 tests/f_rebuild_csum_rootdir/name
> 
> 
> diff --git a/e2fsck/pass3.c b/e2fsck/pass3.c
> index 6f7f855..efc0d49 100644
> --- a/e2fsck/pass3.c
> +++ b/e2fsck/pass3.c
> @@ -188,28 +188,6 @@ static void check_root(e2fsck_t ctx)
>  	ext2fs_mark_bb_dirty(fs);
>  
>  	/*
> -	 * Now let's create the actual data block for the inode
> -	 */
> -	pctx.errcode = ext2fs_new_dir_block(fs, EXT2_ROOT_INO, EXT2_ROOT_INO,
> -					    &block);
> -	if (pctx.errcode) {
> -		pctx.str = "ext2fs_new_dir_block";
> -		fix_problem(ctx, PR_3_CREATE_ROOT_ERROR, &pctx);
> -		ctx->flags |= E2F_FLAG_ABORT;
> -		return;
> -	}
> -
> -	pctx.errcode = ext2fs_write_dir_block4(fs, blk, block, 0,
> -					       EXT2_ROOT_INO);
> -	if (pctx.errcode) {
> -		pctx.str = "ext2fs_write_dir_block4";
> -		fix_problem(ctx, PR_3_CREATE_ROOT_ERROR, &pctx);
> -		ctx->flags |= E2F_FLAG_ABORT;
> -		return;
> -	}
> -	ext2fs_free_mem(&block);
> -
> -	/*
>  	 * Set up the inode structure
>  	 */
>  	memset(&inode, 0, sizeof(inode));
> @@ -232,6 +210,30 @@ static void check_root(e2fsck_t ctx)
>  	}
>  
>  	/*
> +	 * Now let's create the actual data block for the inode.
> +	 * Due to metadata_csum, we must write the dir blocks AFTER
> +	 * the inode has been written to disk!
> +	 */
> +	pctx.errcode = ext2fs_new_dir_block(fs, EXT2_ROOT_INO, EXT2_ROOT_INO,
> +					    &block);
> +	if (pctx.errcode) {
> +		pctx.str = "ext2fs_new_dir_block";
> +		fix_problem(ctx, PR_3_CREATE_ROOT_ERROR, &pctx);
> +		ctx->flags |= E2F_FLAG_ABORT;
> +		return;
> +	}
> +
> +	pctx.errcode = ext2fs_write_dir_block4(fs, blk, block, 0,
> +					       EXT2_ROOT_INO);
> +	ext2fs_free_mem(&block);
> +	if (pctx.errcode) {
> +		pctx.str = "ext2fs_write_dir_block4";
> +		fix_problem(ctx, PR_3_CREATE_ROOT_ERROR, &pctx);
> +		ctx->flags |= E2F_FLAG_ABORT;
> +		return;
> +	}
> +
> +	/*
>  	 * Miscellaneous bookkeeping...
>  	 */
>  	e2fsck_add_dir_info(ctx, EXT2_ROOT_INO, EXT2_ROOT_INO);
> @@ -449,24 +451,6 @@ unlink:
>  	ext2fs_inode_alloc_stats2(fs, ino, +1, 1);
>  
>  	/*
> -	 * Now let's create the actual data block for the inode
> -	 */
> -	retval = ext2fs_new_dir_block(fs, ino, EXT2_ROOT_INO, &block);
> -	if (retval) {
> -		pctx.errcode = retval;
> -		fix_problem(ctx, PR_3_ERR_LPF_NEW_DIR_BLOCK, &pctx);
> -		return 0;
> -	}
> -
> -	retval = ext2fs_write_dir_block4(fs, blk, block, 0, ino);
> -	ext2fs_free_mem(&block);
> -	if (retval) {
> -		pctx.errcode = retval;
> -		fix_problem(ctx, PR_3_ERR_LPF_WRITE_BLOCK, &pctx);
> -		return 0;
> -	}
> -
> -	/*
>  	 * Set up the inode structure
>  	 */
>  	memset(&inode, 0, sizeof(inode));
> @@ -486,6 +470,27 @@ unlink:
>  		fix_problem(ctx, PR_3_CREATE_LPF_ERROR, &pctx);
>  		return 0;
>  	}
> +
> +	/*
> +	 * Now let's create the actual data block for the inode.
> +	 * Due to metadata_csum, the directory block MUST be written
> +	 * after the inode is written to disk!
> +	 */
> +	retval = ext2fs_new_dir_block(fs, ino, EXT2_ROOT_INO, &block);
> +	if (retval) {
> +		pctx.errcode = retval;
> +		fix_problem(ctx, PR_3_ERR_LPF_NEW_DIR_BLOCK, &pctx);
> +		return 0;
> +	}
> +
> +	retval = ext2fs_write_dir_block4(fs, blk, block, 0, ino);
> +	ext2fs_free_mem(&block);
> +	if (retval) {
> +		pctx.errcode = retval;
> +		fix_problem(ctx, PR_3_ERR_LPF_WRITE_BLOCK, &pctx);
> +		return 0;
> +	}
> +
>  	/*
>  	 * Finally, create the directory link
>  	 */
> diff --git a/tests/f_rebuild_csum_rootdir/expect.1 b/tests/f_rebuild_csum_rootdir/expect.1
> new file mode 100644
> index 0000000..6b5c47b
> --- /dev/null
> +++ b/tests/f_rebuild_csum_rootdir/expect.1
> @@ -0,0 +1,311 @@
> +Pass 1: Checking inodes, blocks, and sizes
> +Pass 2: Checking directory structure
> +Directory inode 2, block #0, offset 0: directory has no checksum
> +Fix? yes
> +
> +Directory inode 2, block #0, offset 0: directory corrupted
> +Salvage? yes
> +
> +Missing '.' in directory inode 2.
> +Fix? yes
> +
> +Setting filetype for entry '.' in ??? (2) to 2.
> +Missing '..' in directory inode 2.
> +Fix? yes
> +
> +Setting filetype for entry '..' in ??? (2) to 2.
> +Pass 3: Checking directory connectivity
> +'..' in / (2) is <The NULL inode> (0), should be / (2).
> +Fix? yes
> +
> +Unconnected directory inode 11 (/???)
> +Connect to /lost+found? yes
> +
> +/lost+found not found.  Create? yes
> +
> +Pass 3A: Optimizing directories
> +Pass 4: Checking reference counts
> +Inode 11 ref count is 3, should be 2.  Fix? yes
> +
> +Unattached inode 12
> +Connect to /lost+found? yes
> +
> +Inode 12 ref count is 2, should be 1.  Fix? yes
> +
> +Unattached inode 13
> +Connect to /lost+found? yes
> +
> +Inode 13 ref count is 2, should be 1.  Fix? yes
> +
> +Unattached inode 14
> +Connect to /lost+found? yes
> +
> +Inode 14 ref count is 2, should be 1.  Fix? yes
> +
> +Unattached inode 15
> +Connect to /lost+found? yes
> +
> +Inode 15 ref count is 2, should be 1.  Fix? yes
> +
> +Unattached inode 16
> +Connect to /lost+found? yes
> +
> +Inode 16 ref count is 2, should be 1.  Fix? yes
> +
> +Unattached inode 17
> +Connect to /lost+found? yes
> +
> +Inode 17 ref count is 2, should be 1.  Fix? yes
> +
> +Unattached inode 18
> +Connect to /lost+found? yes
> +
> +Inode 18 ref count is 2, should be 1.  Fix? yes
> +
> +Unattached inode 19
> +Connect to /lost+found? yes
> +
> +Inode 19 ref count is 2, should be 1.  Fix? yes
> +
> +Unattached inode 20
> +Connect to /lost+found? yes
> +
> +Inode 20 ref count is 2, should be 1.  Fix? yes
> +
> +Unattached inode 21
> +Connect to /lost+found? yes
> +
> +Inode 21 ref count is 2, should be 1.  Fix? yes
> +
> +Unattached inode 22
> +Connect to /lost+found? yes
> +
> +Inode 22 ref count is 2, should be 1.  Fix? yes
> +
> +Unattached inode 23
> +Connect to /lost+found? yes
> +
> +Inode 23 ref count is 2, should be 1.  Fix? yes
> +
> +Unattached inode 24
> +Connect to /lost+found? yes
> +
> +Inode 24 ref count is 2, should be 1.  Fix? yes
> +
> +Unattached inode 25
> +Connect to /lost+found? yes
> +
> +Inode 25 ref count is 2, should be 1.  Fix? yes
> +
> +Unattached inode 26
> +Connect to /lost+found? yes
> +
> +Inode 26 ref count is 2, should be 1.  Fix? yes
> +
> +Unattached inode 27
> +Connect to /lost+found? yes
> +
> +Inode 27 ref count is 2, should be 1.  Fix? yes
> +
> +Unattached inode 28
> +Connect to /lost+found? yes
> +
> +Inode 28 ref count is 2, should be 1.  Fix? yes
> +
> +Unattached inode 29
> +Connect to /lost+found? yes
> +
> +Inode 29 ref count is 2, should be 1.  Fix? yes
> +
> +Unattached inode 30
> +Connect to /lost+found? yes
> +
> +Inode 30 ref count is 2, should be 1.  Fix? yes
> +
> +Unattached inode 31
> +Connect to /lost+found? yes
> +
> +Inode 31 ref count is 2, should be 1.  Fix? yes
> +
> +Unattached inode 32
> +Connect to /lost+found? yes
> +
> +Inode 32 ref count is 2, should be 1.  Fix? yes
> +
> +Unattached inode 33
> +Connect to /lost+found? yes
> +
> +Inode 33 ref count is 2, should be 1.  Fix? yes
> +
> +Unattached inode 34
> +Connect to /lost+found? yes
> +
> +Inode 34 ref count is 2, should be 1.  Fix? yes
> +
> +Unattached inode 35
> +Connect to /lost+found? yes
> +
> +Inode 35 ref count is 2, should be 1.  Fix? yes
> +
> +Unattached inode 36
> +Connect to /lost+found? yes
> +
> +Inode 36 ref count is 2, should be 1.  Fix? yes
> +
> +Unattached inode 37
> +Connect to /lost+found? yes
> +
> +Inode 37 ref count is 2, should be 1.  Fix? yes
> +
> +Unattached inode 38
> +Connect to /lost+found? yes
> +
> +Inode 38 ref count is 2, should be 1.  Fix? yes
> +
> +Unattached inode 39
> +Connect to /lost+found? yes
> +
> +Inode 39 ref count is 2, should be 1.  Fix? yes
> +
> +Unattached inode 40
> +Connect to /lost+found? yes
> +
> +Inode 40 ref count is 2, should be 1.  Fix? yes
> +
> +Unattached inode 41
> +Connect to /lost+found? yes
> +
> +Inode 41 ref count is 2, should be 1.  Fix? yes
> +
> +Unattached inode 42
> +Connect to /lost+found? yes
> +
> +Inode 42 ref count is 2, should be 1.  Fix? yes
> +
> +Unattached inode 43
> +Connect to /lost+found? yes
> +
> +Inode 43 ref count is 2, should be 1.  Fix? yes
> +
> +Unattached inode 44
> +Connect to /lost+found? yes
> +
> +Inode 44 ref count is 2, should be 1.  Fix? yes
> +
> +Unattached inode 45
> +Connect to /lost+found? yes
> +
> +Inode 45 ref count is 2, should be 1.  Fix? yes
> +
> +Unattached inode 46
> +Connect to /lost+found? yes
> +
> +Inode 46 ref count is 2, should be 1.  Fix? yes
> +
> +Unattached inode 47
> +Connect to /lost+found? yes
> +
> +Inode 47 ref count is 2, should be 1.  Fix? yes
> +
> +Unattached inode 48
> +Connect to /lost+found? yes
> +
> +Inode 48 ref count is 2, should be 1.  Fix? yes
> +
> +Unattached inode 49
> +Connect to /lost+found? yes
> +
> +Inode 49 ref count is 2, should be 1.  Fix? yes
> +
> +Unattached inode 50
> +Connect to /lost+found? yes
> +
> +Inode 50 ref count is 2, should be 1.  Fix? yes
> +
> +Unattached inode 51
> +Connect to /lost+found? yes
> +
> +Inode 51 ref count is 2, should be 1.  Fix? yes
> +
> +Unattached inode 52
> +Connect to /lost+found? yes
> +
> +Inode 52 ref count is 2, should be 1.  Fix? yes
> +
> +Unattached inode 53
> +Connect to /lost+found? yes
> +
> +Inode 53 ref count is 2, should be 1.  Fix? yes
> +
> +Unattached inode 54
> +Connect to /lost+found? yes
> +
> +Inode 54 ref count is 2, should be 1.  Fix? yes
> +
> +Unattached inode 55
> +Connect to /lost+found? yes
> +
> +Inode 55 ref count is 2, should be 1.  Fix? yes
> +
> +Unattached inode 56
> +Connect to /lost+found? yes
> +
> +Inode 56 ref count is 2, should be 1.  Fix? yes
> +
> +Unattached inode 57
> +Connect to /lost+found? yes
> +
> +Inode 57 ref count is 2, should be 1.  Fix? yes
> +
> +Unattached inode 58
> +Connect to /lost+found? yes
> +
> +Inode 58 ref count is 2, should be 1.  Fix? yes
> +
> +Unattached inode 59
> +Connect to /lost+found? yes
> +
> +Inode 59 ref count is 2, should be 1.  Fix? yes
> +
> +Unattached inode 60
> +Connect to /lost+found? yes
> +
> +Inode 60 ref count is 2, should be 1.  Fix? yes
> +
> +Unattached inode 61
> +Connect to /lost+found? yes
> +
> +Inode 61 ref count is 2, should be 1.  Fix? yes
> +
> +Unattached inode 62
> +Connect to /lost+found? yes
> +
> +Inode 62 ref count is 2, should be 1.  Fix? yes
> +
> +Unattached inode 63
> +Connect to /lost+found? yes
> +
> +Inode 63 ref count is 2, should be 1.  Fix? yes
> +
> +Unattached inode 64
> +Connect to /lost+found? yes
> +
> +Inode 64 ref count is 2, should be 1.  Fix? yes
> +
> +Unattached zero-length inode 65.  Clear? yes
> +
> +Unattached inode 66
> +Connect to /lost+found? yes
> +
> +Inode 66 ref count is 2, should be 1.  Fix? yes
> +
> +Unattached inode 67
> +Connect to /lost+found? yes
> +
> +Inode 67 ref count is 2, should be 1.  Fix? yes
> +
> +Pass 5: Checking group summary information
> +
> +test_filesys: ***** FILE SYSTEM WAS MODIFIED *****
> +test_filesys: 67/512 files (1.5% non-contiguous), 1127/2048 blocks
> +Exit status is 1
> diff --git a/tests/f_rebuild_csum_rootdir/expect.2 b/tests/f_rebuild_csum_rootdir/expect.2
> new file mode 100644
> index 0000000..033f1bf
> --- /dev/null
> +++ b/tests/f_rebuild_csum_rootdir/expect.2
> @@ -0,0 +1,7 @@
> +Pass 1: Checking inodes, blocks, and sizes
> +Pass 2: Checking directory structure
> +Pass 3: Checking directory connectivity
> +Pass 4: Checking reference counts
> +Pass 5: Checking group summary information
> +test_filesys: 67/512 files (1.5% non-contiguous), 1127/2048 blocks
> +Exit status is 0
> diff --git a/tests/f_rebuild_csum_rootdir/image.gz b/tests/f_rebuild_csum_rootdir/image.gz
> new file mode 100644
> index 0000000000000000000000000000000000000000..a32fd4431a44560b20033d43836000ef22ce977f
> GIT binary patch
> literal 12476
> zcmeI2c~leUyT@t$QMaFB>q3zwwrWuk5Ktr{q?HOkRKy~R2oeP}Dq@f&VRc#;a6z#s
> zrpl69L{tz2#3&I4TtEa8ma;FSr9dEopd<uln0fBld(XM|o_o$czd!p&@=tP}yz|cc
> zeBS5%exEsK7#C;g9HESNemYIjJ@bmu!CU3;GrDISaQ)YkC7*op=Y<*7pKbba-qmkE
> z{p=r`FV7KVy-rRy3bf?&zWaXpYvZqH{p0*kI-Bftzdp6>+ba_$YR?N_+<)}zf`To5
> z=hb$P-kkHF`HVWm`KluLK6~u-bcIqdR9ibDd9NH992~rn;rhk?^7~s8DqO8i9iH5G
> zbTwI3I@!Bs+3X;vX#e`4%+zo0IIWB#)puBe<k4)1jR@a~ZElv_JmzO_wa#H8NfkT$
> zv*p#67WPB&=KV$mH4)-VMq?elNFE}uYs-01N_4+AeAe?m#pQuxWM!6(s7&6qu=g8u
> zDwWuhHCTP6qz5~DIUa2>J>5bO@g0*d-aQQe^5?moSuer{AFys?L8)aG_kI{eJy^)P
> zyT8XU=k4shrP+9NwN81tFNv)(=H1H!mgoJP2cHO%ch#7zGPbpr$L9Nrqg?{@baZrj
> z7=DQ5$~mJa>EDrpV>9Wm*tcinA`+HmmArRVmgyjicYCcr`AoB!{-=C;#g%1~!(FiP
> zSW-xA_nO{mbkOmx6f7yFD<_KjIAI2S3@Tv&g0i>y-OR|18&;2&WoE5M#^Z<9AJmLF
> z^ehjUPY>zoJSv5)#RD3j(mLTd>GrTDx=P+^pOMRdcD%G#oX3w9sV^saD<@o<ZdEM!
> zL`UaGK;p1zPX_Hbvgpf4dzv-BA(7v~eLP`HM6rnc@ARU;tTnHvq2=RUB9X~~oXk`_
> z{!&CbimW=RxoDF-RAmyYDsFHKIO?S24qdHieo;~V$so)@%!5aNnvr#zUzNc3?=wvf
> z^!(8Fa}e;J=dMXxAJCk9iAe>du06UR8_0+X*{@-rkd%)I-$zb$K3ucyba4~mO$m<H
> zpnuFgSu-d)xuf=Xo97?6Z~hQD&J9*kw}A@G?&(83_ED|ehg00U<Tpm~NmAqg+R4+;
> zzhdSS$+Ml{H}in;DwS&5G=f6gOD04Ih7*Srm~@KWYEPh6z<om5teDj3ox{-%dDa>b
> ze%Mw!iENGr)XL3hj4Ma7*0;bG@&TA{OLhpm{9_oyr0_RXsI)Ux!BCs2U)#yu*t|TJ
> z(eYYp`J7jHRlkB|nF-l1oPgQaM$UKi%td@9)R+yfzt0F~44&&|=UKL3zZAi#M9Pi&
> z63@PLxf?xRLlFt68?vv1<;a1v2ISU7XQ^FDl?uIl&~x1q(Tmynd98PJD2R%VmEXRV
> zSDe}Cz}xQP9<x<#Gp$KDXQld4R<^ivz_Ze;f6fCk7tg%@;hsa-fal}8ev)4spY^wl
> z94Hm+JQ^I7d#%&Vcwb77yMa;0KP}erdA9uGi11)<2lAGOS6I#9V6!Kdv}R-bBTr_L
> z$KdgiygZ+?sPZs(rD&Z<bZnnPsq4L*tWpnG=|i_+ThBvY{d|7-qxeqsyAHL*3VyM`
> z$9}_xCVBT`9y6O49xme51{-+WcXu~8>|Dv;^*ARzTa<4`<}7P3!;9*+>>Ca<%n_t#
> z^xWc+Rg$##=a+KgkF>?a4DcV<S!U~I2BF;4bic8#a3))vs5|E4V=vxtD4}NJU&Gw(
> z!$*fl9E<q7Y!Xjz35#JSo=ztg*JSVJR#_Bx-n{(s`kSezDWgBl%1k!jI_P;=5xlL6
> z+d9~ncUv=U(Q}MfXM%NDiJwMn8!qhJd()>QxkqpB3{*LswzIu7+S^C4s@pjIQryz)
> z0&6txc+g}(@#BoV@R$5Y%VRu!Y%S}ChnU4DcrHFZE-{6i;Sc@kRvbzYy}>LXW*}3A
> zHysR&jt#VawpJ{!m5f}UOwmoL3=Q*=-?&uV$sSxS;7EhEAx-UF4l#b}@ud6=&f$BF
> zEnUs&e&1n{SdMMnhM2u(ef=5NCyiYT`Pd^pQE65%FD%+%{{2?A&=1*J=ssN7(J$Rz
> z73;D!YDzKEqT<XCLYjKIZzy~WTT5TL7P5z}?Kjj6H+Oay?5C*WN{K{ry)wKk)zcoy
> zvd%AwThWy_9;z`g-Z#5(t7~#~bJ^!v8RPvUBjGnwd=|%S7`PJ_`7}B98F#P63ek13
> zeMk21l?{et+yIUjGd&@@#X?heLa?j}A2Ca@>1=99HQFQ2-xa-g!`%CRl824@HNg&-
> z2Gg^}T`dz?)5i63_{}wwjb~=YU0r?Sht&RFpPH~#j?g<@VZ6`0FHhCN@J;O>yXkHz
> zbsMN0jqN{QmvGz;X$zc1i=N@TF<?CTW6CUX=9lHqozcH6BeKxZmD(LWR@JQ6@4Lsa
> zasFvP!|ny~9v^VI8)C6bbTJR2-7vX0wRQWMVSPx+;8y&45?voSEvC4bj~YhF(fHC2
> zcblh6pM>|f@YBWqU4IIYF!sH*4h9~Cs-gb9gx#C|pB_M!86DhlPZPT2PJ_@~YP(2h
> z|Jx|{xG%pu@}q_p<#O>d1%!lEkiv1vQ+To2fMEGfG><NBKR7=?2`65E^9ncpQ-RuL
> z0)^Kj@`?gNJZl7s+$Gdz1JWJS$05W=L3vaRvaC3;+<+7dHX%-43QD-J3%FPsdC>;f
> z*sWEuLZVe9W^3MpvLzHZP=!5Nt<vkT;K3ht7t~5&5i}-JS)K8e6MQa4M5i?1$|TSm
> z4anHJAqw=WM!49sJ@5$~OKAYb?Rt2Z?XO|^j8ZHn2M!vL`bPS=vTY>~ymghpkhUXH
> z&TwL@6Vx{SW(l($DYxo&WV180TIiAT;}^P-iAAsQ?0yA1xdC~J<;Yp$mm`e5Apl31
> zM1HM|JvLSkqZtK6(&~vRWqBHjNo2@iUp@AD9H|V@6(H056zHl<5cSow<hgAmr*)xR
> zm{EiraR&GMLQuVV(?+6(eu^>1p+6O{-5J{YZo#-rh!h^m@h21hRIsl%AOj=KYOJq-
> z;9EQ)0}e^392;EZJr9@0t>wW^-Bv`et{ri6h6@)rfvf4O5!hOxMqh-ETS{fzz4P%T
> z^cx-+oW7t&tT|9@K+fmSAVrBu-Ozh#4BpXM)X4N0qgJb!C-y3+fLYsNY3Mjx*Z|Mp
> z)FUUH6x0A6a_>{Xac7Wg&QSZ11!Aa3CzOmyTT*=lvctx)D-G=EjjH?}dmeP1N&*);
> zXRAzTaoYrsr@tBzdW=qD4NSr-P^FiDlIpGL_19nykM@8Sli<GD3qHnps>%6o#IIry
> zIVPb}ZE$|6jS3+ZZKUO*XRw~0ZMj^HvJ<FkqEexZVU*udj!x0uTpmM^;NV&?%qR*y
> z4o}tv5H8ZO14PwxB7h6b+_&TTy`kgS^#(MUBSX~VZDgl@EZIAeMv6EpESxJtN0Qq}
> z;mQe0Fh6u0vCKq^I#5|g8|l6%mQ=n>Bl&?UW!QchYx1%TgPkF`@LCZ_5-BsO9w`+v
> z#K`v*N_3K5vAFOt!ulMC@b55<F#aoF#IEaZC(JH`-Y`1rLn_F@scbVBx^Q-nQZPvu
> ziByK(DVeh8p!Cfa!~#~$B$RaFsE39K#3w+*W`~di*V0JVYZ<Gc1bV#|`)e%CCUAC`
> zgo?4og;SelDE_F8447>J?)2t!ouRu~$;z3s=D|&R1I(EO@=F8y!s|Yn<F65Vgg`>5
> zgdz%iAk0aIh96Z>O|u*+X>~hYn#kUWNu;*4fVi4C0M(z7uTW(~gfskMd4@{Lz1Ypl
> zdP_0F!Y*GSY;ZKzM~10nZRFKi7SP|PK$c`8IrO1>alJ}0T}S836)oFuJX`Cnl$U)t
> zyqN503pOF@+$v4zoqCp^0SSW07yQs9SKFyK$LP@8Y2_4*g9uO(hJgzR0iA$-3`y<D
> zVGt@eC;)meKzz@Tkko0eVv)XV;<crT)$bn%VZV^_Qa~M*Q8%4Y+XJapdBE^-r|125
> zK9TwVr>Sd%c++YZwqf_$JGLt7j-!FEgHf8%6$%?|^{_r%I+k)Dc`wPG`aU^vi-~>c
> zFt+`dh4EvDa`zTmRUyZ&I?quMT{e1i6_lC8ppCRKNkR-4#UFXdi)Ph;NARRqR|S{3
> zFS=&%9_LIScI_26&Zt91x&m3fy|e~ymrUhj`zNs-gqxkw^|?zM4~WAfE#h|AEgOhn
> zhaQlzA04cve#tnwe?JE6ee}ULH~oinlDGl#Xb!CJhdHgtXU*|^|9rvoDddff)c`CU
> zB%yNciCXjlr1b0oGqs2kt#KW%WqvA&i+p{7I$746Ru4G=!pH-JcbwA`D&&b3Ay~UW
> z&PXkSXNi<>t{z>00UGy-9R<`0CLxu|*x@H+$nB*(<Z0O+BtSyVw#L^_2FQfA8+cG+
> zPRjvhOGX?4`gG|ZQdxR0z`Yz6i=qEi4O$vD&c4=wyz_Db+9@=QLe~OnCX=v~$WYW#
> zg++%bs61N6<7mZtBSY}AdPwN>*9d6YIZuzOmSLp}aKjRR4dlM<g)%0g&rvB)9F`$(
> zRk9=`b&(viq$ybAT;E0b^k984^8U~(Ttf>xt+9>4PH+p(b)!BN1l*<%Uabs``d&Q=
> zFA9jx^_DHjx3t9^RI%(=s*s6$ZKRae8+S=N;_VE&Rc9%C9&G@-=}W8V6fzo+uEu)S
> z*dtjDoJvA=(kl6L0~FUdK}o8=hEcY;fV#&d1c55Xd(99owJ0DG7R6F5a}J~!kU=L+
> za7K>}4}R7af-&6=cN&nErAD~@Z}||H`}TwK*EG`Wpb8!v;hgk^Vo;R$lTBgcmUS|4
> zRjY+wyDLYad(^)Z`Cm5C$!a~eD;KV$ku}!1va7U&5^v;zL|25=*0v+`DYl)x37$5H
> zz+qPisi{vRTLV@3Pgbae;T{SKUEmA$t2yu?k-D?)7EA=uN?~<(517)qrZ%unB-A4{
> z6Ky2R!VVZ2MNpDRMeTHg^G$TFv=%acE&@D}(%&6VT{NT()++=@R&juDgR@anl@iZv
> zBlSkQ5O^&b%y%bP=|vEqNHzA}LexD4#K5B1DHc8Yk^w0?{sk^_Siys9r=)OM(vE(e
> ziA1VNhGo*lZbVb>Pgc<_#mXR(8zJLHFXKU!ZZ;Iqz9EreYDqn)ivl$0o>2;;rTKbp
> zE1WdOO_hdX)O2MF%ZvjL3`hrQ0(DkXHNwh$JN5C=q|+J~4gZtO?=cZ0#3?zeYCwTH
> zWWt>oCx{Td9D!{W!)&^Z0Y7JiQ;ak?6cEDIPbkD+LfKp68s-|6FxEw-frV+5DQ!$P
> zneb-J3C0GhMi@Vgl3XFJ#$GMG*!)}IuCE$~1@R%|yO2yrO23O5dzwa?+2F9Nn~Lf?
> z*G9gW)dMWLpCJs$R!;#kpwJh?`BR;L6(HXka=g6Q5Ok><iK|cx>1u-|necLmV4#ij
> z#E5r=8v4^nO`wW3d9fZ*A1NTJ7Wq=bn?=x*NQrZlNZwy&pn8w^{xpas7eRuz5^4V*
> zD!SGgR_bD~gRXXV%Q$Md39@zuxWoTr3>wYe^agZd_AOW{_t&`7zM1rt>GGO1AlYT+
> zDsFVRf^wy+ySAnszIBG(%^J}2F)3pFJA$koqa~mvpe3Lspe3Lspe3Lspe3Lspe3Ls
> zpe3Lspe3Lspe3Lspe3Lspe3Lspe3Lspe3Lspe67(B|wQB?3U1P9P7+eL4^IM3;(9e
> W)Gq69OyI|Sozg9bf2Chr*ZB{=uW}{;
> 
> literal 0
> HcmV?d00001
> 
> diff --git a/tests/f_rebuild_csum_rootdir/name b/tests/f_rebuild_csum_rootdir/name
> new file mode 100644
> index 0000000..b246f48
> --- /dev/null
> +++ b/tests/f_rebuild_csum_rootdir/name
> @@ -0,0 +1 @@
> +force fsck to rebuild a corrupted rootdir w/ metadata_csum
> 

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 14/37] dumpe2fs: add switch to disable checksum verification
  2014-05-01 23:13 ` [PATCH 14/37] dumpe2fs: add switch to disable checksum verification Darrick J. Wong
@ 2014-05-05 17:20   ` Lukáš Czerner
  0 siblings, 0 replies; 91+ messages in thread
From: Lukáš Czerner @ 2014-05-05 17:20 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: tytso, linux-ext4

On Thu, 1 May 2014, Darrick J. Wong wrote:

> Date: Thu, 01 May 2014 16:13:54 -0700
> From: Darrick J. Wong <darrick.wong@oracle.com>
> To: tytso@mit.edu, darrick.wong@oracle.com
> Cc: linux-ext4@vger.kernel.org
> Subject: [PATCH 14/37] dumpe2fs: add switch to disable checksum verification
> 
> Add a -n switch to turn off checksum verification.

Looks good. Thanks!

Reviewed-by: Lukas Czerner <lczerner@redhat.com>


> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  misc/dumpe2fs.8.in |    3 +++
>  misc/dumpe2fs.c    |   10 +++++++---
>  2 files changed, 10 insertions(+), 3 deletions(-)
> 
> 
> diff --git a/misc/dumpe2fs.8.in b/misc/dumpe2fs.8.in
> index befaf94..51614db 100644
> --- a/misc/dumpe2fs.8.in
> +++ b/misc/dumpe2fs.8.in
> @@ -61,6 +61,9 @@ using
>  .I device
>  as the pathname to the image file.
>  .TP
> +.B \-n
> +Don't verify checksums when dumping the filesystem.
> +.TP
>  .B \-x
>  print the detailed group information block numbers in hexadecimal format
>  .TP
> diff --git a/misc/dumpe2fs.c b/misc/dumpe2fs.c
> index ae54f8a..3a3684b 100644
> --- a/misc/dumpe2fs.c
> +++ b/misc/dumpe2fs.c
> @@ -52,7 +52,7 @@ static int blocks64 = 0;
>  
>  static void usage(void)
>  {
> -	fprintf (stderr, _("Usage: %s [-bfhixV] [-o superblock=<num>] "
> +	fprintf(stderr, _("Usage: %s [-bfhinxV] [-o superblock=<num>] "
>  		 "[-o blocksize=<num>] device\n"), program_name);
>  	exit (1);
>  }
> @@ -582,7 +582,9 @@ int main (int argc, char ** argv)
>  	if (argc && *argv)
>  		program_name = *argv;
>  
> -	while ((c = getopt (argc, argv, "bfhixVo:")) != EOF) {
> +	flags = EXT2_FLAG_JOURNAL_DEV_OK | EXT2_FLAG_SOFTSUPP_FEATURES |
> +		EXT2_FLAG_64BITS;
> +	while ((c = getopt(argc, argv, "bfhixVo:n")) != EOF) {
>  		switch (c) {
>  		case 'b':
>  			print_badblocks++;
> @@ -608,6 +610,9 @@ int main (int argc, char ** argv)
>  		case 'x':
>  			hex_format++;
>  			break;
> +		case 'n':
> +			flags |= EXT2_FLAG_IGNORE_CSUM_ERRORS;
> +			break;
>  		default:
>  			usage();
>  		}
> @@ -615,7 +620,6 @@ int main (int argc, char ** argv)
>  	if (optind > argc - 1)
>  		usage();
>  	device_name = argv[optind++];
> -	flags = EXT2_FLAG_JOURNAL_DEV_OK | EXT2_FLAG_SOFTSUPP_FEATURES | EXT2_FLAG_64BITS;
>  	if (force)
>  		flags |= EXT2_FLAG_FORCE;
>  	if (image_dump)
> 

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 15/37] mke2fs: set block_validity as a default mount option
  2014-05-01 23:14 ` [PATCH 15/37] mke2fs: set block_validity as a default mount option Darrick J. Wong
@ 2014-05-05 17:24   ` Lukáš Czerner
  0 siblings, 0 replies; 91+ messages in thread
From: Lukáš Czerner @ 2014-05-05 17:24 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: tytso, linux-ext4

On Thu, 1 May 2014, Darrick J. Wong wrote:

> Date: Thu, 01 May 2014 16:14:00 -0700
> From: Darrick J. Wong <darrick.wong@oracle.com>
> To: tytso@mit.edu, darrick.wong@oracle.com
> Cc: linux-ext4@vger.kernel.org
> Subject: [PATCH 15/37] mke2fs: set block_validity as a default mount option
> 
> The block_validity mount option spot-checks block allocations against
> a bitmap of known group metadata blocks.  This helps us to prevent
> self-inflicted catastrophic failures such as trying to "share"
> critical metadata (think bitmaps) with file data, which usually
> results in filesystem destruction.
> 
> In order to test the overhead of the mount option, I re-used the speed
> tests in the metadata checksum testing script.  In short, the program
> creates what looks like 15 copies of a kernel source tree, except that
> it uses fallocate to strip out the overhead of writing the file data
> so that we can focus on metadata overhead.  On a 64G RAM disk, the
> overhead was generally about 0.9% and at most 1.6%.  On a 160G USB
> disk, the overhead was about 0.8% and peaked at 1.2%.
> 
> When I changed the test to write out files instead of merely
> fallocating space, the overhead was negligible.

I like that, but I think we need to run more performance tests to
make sure that it really does not break anything.

Eric, will you be able to run your performance tests with the
block_validity option and compare the results with a baseline, to see
whether it really makes no difference?

I'll try to run my tests as well.

We really need a generic performance test suite. I hope that Dave will
send his performance addition to xfstests soon so we can start
adding tests.

Thanks!
-Lukas

> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  misc/mke2fs.conf.in |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> 
> diff --git a/misc/mke2fs.conf.in b/misc/mke2fs.conf.in
> index 4c5dba7..de0250d 100644
> --- a/misc/mke2fs.conf.in
> +++ b/misc/mke2fs.conf.in
> @@ -1,6 +1,6 @@
>  [defaults]
>  	base_features = sparse_super,filetype,resize_inode,dir_index,ext_attr
> -	default_mntopts = acl,user_xattr
> +	default_mntopts = acl,user_xattr,block_validity
>  	enable_periodic_fsck = 0
>  	blocksize = 4096
>  	inode_size = 256
> 

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 02/37] misc: coverity fixes
  2014-05-02 11:17   ` Lukáš Czerner
@ 2014-05-05 20:04     ` Darrick J. Wong
  2014-05-11 22:40       ` Theodore Ts'o
  0 siblings, 1 reply; 91+ messages in thread
From: Darrick J. Wong @ 2014-05-05 20:04 UTC (permalink / raw)
  To: Lukáš Czerner; +Cc: tytso, linux-ext4

On Fri, May 02, 2014 at 01:17:49PM +0200, Lukáš Czerner wrote:
> On Thu, 1 May 2014, Darrick J. Wong wrote:
> 
> > Date: Thu, 01 May 2014 16:12:36 -0700
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > To: tytso@mit.edu, darrick.wong@oracle.com
> > Cc: linux-ext4@vger.kernel.org
> > Subject: [PATCH 02/37] misc: coverity fixes
> > 
> > Fix various small resource leaks and error code handling issues that
> > Coverity pointed out.

<snip>

> > diff --git a/lib/ext2fs/punch.c b/lib/ext2fs/punch.c
> > index 60cd2a3..c9250cd 100644
> > --- a/lib/ext2fs/punch.c
> > +++ b/lib/ext2fs/punch.c
> > @@ -403,7 +403,7 @@ static errcode_t ext2fs_punch_extent(ext2_filsys fs, ext2_ino_t ino,
> >  			retval = 0;
> >  
> >  			/* Jump forward to the next extent. */
> > -			ext2fs_extent_goto(handle, next_lblk);
> > +			(void)ext2fs_extent_goto(handle, next_lblk);
> 
> > Why do we not want to check the return value of this? There might
> > be an error, right?

We can ignore errors that happen during the goto because the subsequent
ext2fs_extent_get() about ten lines down (to load another extent) will tell us
if there are no more extents or if some error happened.

(I suppose I can add a comment explaining this.)
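
As a rough illustration of that reasoning, here is a minimal self-contained
sketch.  It does not use libext2fs; demo_handle, demo_goto(), demo_get() and
DEMO_NO_EXTENT are hypothetical stand-ins, and the only point is that the
seek result can be dropped when the following get fails in the same cases.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an extent handle: an array plus a cursor. */
struct demo_handle {
	const unsigned long *starts;	/* first logical block of each extent */
	size_t count;
	size_t pos;			/* pos == count means "no current extent" */
};

#define DEMO_NO_EXTENT 1	/* error code assumed for this sketch only */

/* Seek to the extent starting at lblk; on failure no extent is current. */
static int demo_goto(struct demo_handle *h, unsigned long lblk)
{
	assert(h != NULL);
	for (h->pos = 0; h->pos < h->count; h->pos++)
		if (h->starts[h->pos] == lblk)
			return 0;
	return DEMO_NO_EXTENT;
}

/* Load the current extent; fails whenever the preceding seek failed. */
static int demo_get(const struct demo_handle *h, unsigned long *start)
{
	if (h->pos >= h->count)
		return DEMO_NO_EXTENT;
	*start = h->starts[h->pos];
	return 0;
}

/* Jump forward to the next extent, deliberately dropping the seek result. */
static int demo_next(struct demo_handle *h, unsigned long next_lblk,
		     unsigned long *start)
{
	/*
	 * Errors from the seek are ignored on purpose: the get below
	 * reports "no more extents" or a failure in the same situations.
	 */
	(void)demo_goto(h, next_lblk);
	return demo_get(h, start);
}
```

In this mock, calling demo_next() with a block that starts an extent returns 0
and fills in start; calling it with any other block returns DEMO_NO_EXTENT
from the get, even though the seek result was never inspected.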

> >  			op = EXT2_EXTENT_CURRENT;
> >  		}
> >  		if (retval)
> > diff --git a/misc/create_inode.c b/misc/create_inode.c
> > index 964c66a..4bb5e5b 100644
> > --- a/misc/create_inode.c
> > +++ b/misc/create_inode.c
> > @@ -465,7 +465,7 @@ static errcode_t __populate_fs(ext2_filsys fs, ext2_ino_t parent_ino,
> >  	char		ln_target[PATH_MAX];
> >  	unsigned int	save_inode;
> >  	ext2_ino_t	ino;
> > -	errcode_t	retval;
> > +	errcode_t	retval = 0;
> >  	int		read_cnt;
> >  	int		hdlink;
> >  
> > @@ -486,7 +486,11 @@ static errcode_t __populate_fs(ext2_filsys fs, ext2_ino_t parent_ino,
> >  		if ((!strcmp(dent->d_name, ".")) ||
> >  		    (!strcmp(dent->d_name, "..")))
> >  			continue;
> > -		lstat(dent->d_name, &st);
> > +		if (lstat(dent->d_name, &st)) {
> > +			com_err(__func__, errno, _("while lstat \"%s\""),
> > +				dent->d_name);
> > +			goto out;
> > +		}
> >  		name = dent->d_name;
> >  
> >  		/* Check for hardlinks */
> > @@ -501,7 +505,7 @@ static errcode_t __populate_fs(ext2_filsys fs, ext2_ino_t parent_ino,
> >  				if (retval) {
> >  					com_err(__func__, retval,
> >  						"while linking %s", name);
> > -					return retval;
> > +					goto out;
> >  				}
> >  				continue;
> >  			} else
> > @@ -517,7 +521,7 @@ static errcode_t __populate_fs(ext2_filsys fs, ext2_ino_t parent_ino,
> >  				com_err(__func__, retval,
> >  					_("while creating special file "
> >  					  "\"%s\""), name);
> > -				return retval;
> > +				goto out;
> >  			}
> >  			break;
> >  		case S_IFSOCK:
> > @@ -527,7 +531,7 @@ static errcode_t __populate_fs(ext2_filsys fs, ext2_ino_t parent_ino,
> >  			continue;
> >  		case S_IFLNK:
> >  			read_cnt = readlink(name, ln_target,
> > -					    sizeof(ln_target));
> > +					    sizeof(ln_target) - 1);
> >  			if (read_cnt == -1) {
> >  				com_err(__func__, errno,
> >  					_("while trying to readlink \"%s\""),
> > @@ -541,7 +545,7 @@ static errcode_t __populate_fs(ext2_filsys fs, ext2_ino_t parent_ino,
> >  				com_err(__func__, retval,
> >  					_("while writing symlink\"%s\""),
> >  					name);
> > -				return retval;
> > +				goto out;
> >  			}
> >  			break;
> >  		case S_IFREG:
> > @@ -550,7 +554,7 @@ static errcode_t __populate_fs(ext2_filsys fs, ext2_ino_t parent_ino,
> >  			if (retval) {
> >  				com_err(__func__, retval,
> >  					_("while writing file \"%s\""), name);
> > -				return retval;
> > +				goto out;
> >  			}
> >  			break;
> >  		case S_IFDIR:
> > @@ -559,25 +563,25 @@ static errcode_t __populate_fs(ext2_filsys fs, ext2_ino_t parent_ino,
> >  			if (retval) {
> >  				com_err(__func__, retval,
> >  					_("while making dir \"%s\""), name);
> > -				return retval;
> > +				goto out;
> >  			}
> >  			retval = ext2fs_namei(fs, root, parent_ino,
> >  					      name, &ino);
> >  			if (retval) {
> >  				com_err(name, retval, 0);
> > -					return retval;
> > +					goto out;
> >  			}
> >  			/* Populate the dir recursively*/
> >  			retval = __populate_fs(fs, ino, name, root, hdlinks);
> >  			if (retval) {
> >  				com_err(__func__, retval,
> >  					_("while adding dir \"%s\""), name);
> > -				return retval;
> > +				goto out;
> >  			}
> >  			if (chdir("..")) {
> >  				com_err(__func__, errno,
> >  					_("during cd .."));
> > -				return errno;
> 
> > You probably want to store errno in retval, because that's what we
> > return from the function.

Oops, yes.
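
For reference, a small standalone sketch of the pattern being discussed:
with a single "out:" exit path, errno has to be copied into retval before the
jump, otherwise the function returns whatever retval held before.  The
function demo_visit() is hypothetical and is not the create_inode.c code; it
uses fchdir() rather than chdir("..") to get back.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <unistd.h>

/* Enter subdir and come back, closing the directory handle on every path. */
static int demo_visit(const char *subdir)
{
	DIR *dh;
	int retval = 0;

	assert(subdir != NULL);

	dh = opendir(".");
	if (dh == NULL)
		return errno;	/* nothing to clean up yet */

	if (chdir(subdir)) {
		retval = errno;	/* save errno before jumping to cleanup */
		goto out;
	}

	/* ... work inside subdir would go here ... */

	if (fchdir(dirfd(dh))) {
		retval = errno;	/* same rule on the way back */
		goto out;
	}
out:
	closedir(dh);
	return retval;
}
```

Without the two "retval = errno" assignments, both failure paths would reach
the label with retval still 0 and the caller would see success.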

> > +				goto out;
> >  			}
> >  			break;
> >  		default:
> > @@ -588,14 +592,14 @@ static errcode_t __populate_fs(ext2_filsys fs, ext2_ino_t parent_ino,
> >  		retval =  ext2fs_namei(fs, root, parent_ino, name, &ino);
> >  		if (retval) {
> >  			com_err(name, retval, 0);
> > -			return retval;
> > +			goto out;
> >  		}
> >  
> >  		retval = set_inode_extra(fs, parent_ino, ino, &st);
> >  		if (retval) {
> >  			com_err(__func__, retval,
> >  				_("while setting inode for \"%s\""), name);
> > -			return retval;
> > +			goto out;
> >  		}
> >  
> >  		/* Save the hardlink ino */
> > @@ -612,7 +616,7 @@ static errcode_t __populate_fs(ext2_filsys fs, ext2_ino_t parent_ino,
> >  				if (p == NULL) {
> >  					com_err(name, errno,
> >  						_("Not enough memory"));
> > -					return errno;
> > +					goto out;
> 
> same here.

Yes.  Thank you for spotting these.

--D
> 
> Thanks!
> -Lukas
> 
> >  				}
> >  				hdlinks->hdl = p;
> >  				hdlinks->size += HDLINK_CNT;
> > @@ -623,6 +627,8 @@ static errcode_t __populate_fs(ext2_filsys fs, ext2_ino_t parent_ino,
> >  			hdlinks->count++;
> >  		}
> >  	}
> > +
> > +out:
> >  	closedir(dh);
> >  	return retval;
> >  }
> > 

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 03/37] libext2fs: create sockets when populating filesystem
  2014-05-02 11:22   ` Lukáš Czerner
@ 2014-05-05 20:08     ` Darrick J. Wong
  2014-05-11 22:44       ` Theodore Ts'o
  0 siblings, 1 reply; 91+ messages in thread
From: Darrick J. Wong @ 2014-05-05 20:08 UTC (permalink / raw)
  To: Lukáš Czerner; +Cc: tytso, linux-ext4

On Fri, May 02, 2014 at 01:22:16PM +0200, Lukáš Czerner wrote:
> On Thu, 1 May 2014, Darrick J. Wong wrote:
> 
> > Date: Thu, 01 May 2014 16:12:42 -0700
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > To: tytso@mit.edu, darrick.wong@oracle.com
> > Cc: linux-ext4@vger.kernel.org
> > Subject: [PATCH 03/37] libext2fs: create sockets when populating filesystem
> > 
> > Since the code to copy-in a socket when creating a filesystem is
> > fairly simple, just do it here.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> >  misc/create_inode.c |    9 ++++-----
> >  1 file changed, 4 insertions(+), 5 deletions(-)
> > 
> > 
> > diff --git a/misc/create_inode.c b/misc/create_inode.c
> > index 4bb5e5b..e7faab1 100644
> > --- a/misc/create_inode.c
> > +++ b/misc/create_inode.c
> > @@ -114,6 +114,9 @@ errcode_t do_mknod_internal(ext2_filsys fs, ext2_ino_t cwd, const char *name,
> >  		mode = LINUX_S_IFIFO;
> >  		filetype = EXT2_FT_FIFO;
> >  		break;
> > +	case S_IFSOCK:
> > +		mode = LINUX_S_IFSOCK;
> > +		filetype = EXT2_FT_SOCK;
> 
> You probably want to change the comment for the function as well.

I'll do that, thanks.  I'll also teach the function to return retval instead of
-1, since we're returning errcode_t anyway... though those changes are probably
more for the cleanup patch.

--D
> 
> -Lukas
> 
> >  	default:
> >  		abort();
> >  		/* NOTREACHED */
> > @@ -516,6 +519,7 @@ static errcode_t __populate_fs(ext2_filsys fs, ext2_ino_t parent_ino,
> >  		case S_IFCHR:
> >  		case S_IFBLK:
> >  		case S_IFIFO:
> > +		case S_IFSOCK:
> >  			retval = do_mknod_internal(fs, parent_ino, name, &st);
> >  			if (retval) {
> >  				com_err(__func__, retval,
> > @@ -524,11 +528,6 @@ static errcode_t __populate_fs(ext2_filsys fs, ext2_ino_t parent_ino,
> >  				goto out;
> >  			}
> >  			break;
> > -		case S_IFSOCK:
> > -			/* FIXME: there is no make socket function atm. */
> > -			com_err(__func__, 0,
> > -				_("ignoring socket file \"%s\""), name);
> > -			continue;
> >  		case S_IFLNK:
> >  			read_cnt = readlink(name, ln_target,
> >  					    sizeof(ln_target) - 1);
> > 

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 04/37] mke2fs: always warn if 128-byte inode and inline_data
  2014-05-02 11:27   ` Lukáš Czerner
@ 2014-05-05 20:10     ` Darrick J. Wong
  2014-05-12  0:26       ` Theodore Ts'o
  0 siblings, 1 reply; 91+ messages in thread
From: Darrick J. Wong @ 2014-05-05 20:10 UTC (permalink / raw)
  To: Lukáš Czerner; +Cc: tytso, linux-ext4

On Fri, May 02, 2014 at 01:27:01PM +0200, Lukáš Czerner wrote:
> On Thu, 1 May 2014, Darrick J. Wong wrote:
> 
> > Date: Thu, 01 May 2014 16:12:49 -0700
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > To: tytso@mit.edu, darrick.wong@oracle.com
> > Cc: linux-ext4@vger.kernel.org
> > Subject: [PATCH 04/37] mke2fs: always warn if 128-byte inode and inline_data
> > 
> > The combination of 128-byte inodes and inline_data is silly, since
> > there's no room in the inode table.  Unfortunately, if neither
> > mke2fs.conf nor the mkfs command line options specify an inode size,
> > the default inode size is set to 128 bytes (by libext2fs) and the
> > warning isn't printed.  Therefore, always do the check-and-warning.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> >  misc/mke2fs.c |   25 +++++++++++++------------
> >  1 file changed, 13 insertions(+), 12 deletions(-)
> > 
> > 
> > diff --git a/misc/mke2fs.c b/misc/mke2fs.c
> > index aecd5d5..6507d0d 100644
> > --- a/misc/mke2fs.c
> > +++ b/misc/mke2fs.c
> > @@ -2282,21 +2282,22 @@ profile_error:
> >  				blocksize);
> >  			exit(1);
> >  		}
> > -		/*
> > -		 * If inode size is 128 and inline data is enabled, we need
> > -		 * to notify users that inline data will never be useful.
> > -		 */
> > -		if ((fs_param.s_feature_incompat &
> > -		     EXT4_FEATURE_INCOMPAT_INLINE_DATA) &&
> > -		    inode_size == EXT2_GOOD_OLD_INODE_SIZE) {
> > -			com_err(program_name, 0,
> > -				_("inode size is %d, inline data is useless"),
> > -				inode_size);
> > -			exit(1);
> > -		}
> >  		fs_param.s_inode_size = inode_size;
> >  	}
> >  
> > +	/*
> > +	 * If inode size is 128 and inline data is enabled, we need
> > +	 * to notify users that inline data will never be useful.
> > +	 */
> > +	if ((fs_param.s_feature_incompat &
> > +	     EXT4_FEATURE_INCOMPAT_INLINE_DATA) &&
> > +	    fs_param.s_inode_size == EXT2_GOOD_OLD_INODE_SIZE) {
> > +		com_err(program_name, 0,
> > +			_("inode size is %d, inline data is useless"),
> > +			inode_size);
> 
> Oops :) copy-paste is tricky. You need to use fs_param.s_inode_size
> rather than inode_size here. Otherwise it looks good.

Will fix, thanks.
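
Roughly, the fixed check would test and print the same field.  The sketch
below is standalone and uses hypothetical names (demo_params,
DEMO_FEATURE_INLINE_DATA, DEMO_OLD_INODE_SIZE); it is not the mke2fs.c code,
only an illustration of reading the size from the parameter block in both
places.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define DEMO_OLD_INODE_SIZE		128	/* stand-in for the old inode size */
#define DEMO_FEATURE_INLINE_DATA	0x8000	/* stand-in feature bit */

/* Hypothetical subset of the filesystem parameters. */
struct demo_params {
	unsigned int feature_incompat;
	unsigned int inode_size;
};

/*
 * Return 1 and fill msg if inline data is enabled with 128-byte inodes,
 * otherwise return 0 and leave msg empty.
 */
static int demo_check_inline_data(const struct demo_params *p,
				  char *msg, size_t len)
{
	assert(p != NULL && msg != NULL && len > 0);

	if ((p->feature_incompat & DEMO_FEATURE_INLINE_DATA) &&
	    p->inode_size == DEMO_OLD_INODE_SIZE) {
		/* Print the field that was tested, not a separate local. */
		snprintf(msg, len, "inode size is %u, inline data is useless",
			 p->inode_size);
		return 1;
	}
	msg[0] = '\0';
	return 0;
}
```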

--D
> 
> Thanks!
> -Lukas
> 
> 
> > +		exit(1);
> > +	}
> > +
> >  	/* Make sure number of inodes specified will fit in 32 bits */
> >  	if (num_inodes == 0) {
> >  		unsigned long long n;
> > 

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 05/37] debugfs: teach logdump to deal with 64bit revoke tables
  2014-05-02 11:38   ` Lukáš Czerner
@ 2014-05-05 22:23     ` Darrick J. Wong
  2014-05-06 11:35       ` Lukáš Czerner
  0 siblings, 1 reply; 91+ messages in thread
From: Darrick J. Wong @ 2014-05-05 22:23 UTC (permalink / raw)
  To: Lukáš Czerner; +Cc: tytso, linux-ext4

On Fri, May 02, 2014 at 01:38:04PM +0200, Lukáš Czerner wrote:
> On Thu, 1 May 2014, Darrick J. Wong wrote:
> 
> > Date: Thu, 01 May 2014 16:12:55 -0700
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > To: tytso@mit.edu, darrick.wong@oracle.com
> > Cc: linux-ext4@vger.kernel.org
> > Subject: [PATCH 05/37] debugfs: teach logdump to deal with 64bit revoke tables
> > 
> > The logdump command doesn't know how to deal with revoke tables in
> > 64bit journals, so teach it to do this.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> >  debugfs/logdump.c          |   20 ++++-
> >  tests/f_jnl_64bit/expect.0 |  171 --------------------------------------------
> >  2 files changed, 15 insertions(+), 176 deletions(-)
> > 
> > 
> > diff --git a/debugfs/logdump.c b/debugfs/logdump.c
> > index 2d0efaf..8b9dc5b 100644
> > --- a/debugfs/logdump.c
> > +++ b/debugfs/logdump.c
> > @@ -526,28 +526,38 @@ static void dump_revoke_block(FILE *out_file, char *buf,
> >  {
> >  	int			offset, max;
> >  	journal_revoke_header_t *header;
> > -	unsigned int		*entry, rblock;
> > +	unsigned int		*entry;
> > +	unsigned long long	*bentry, rblock;
> > +	int			tag_size = sizeof(*entry);
> >  
> >  	if (dump_all)
> >  		fprintf(out_file, "Dumping revoke block, sequence %u, at "
> >  			"block %u:\n", transaction, blocknr);
> >  
> > +	if (be32_to_cpu(jsb->s_feature_incompat) & JFS_FEATURE_INCOMPAT_64BIT)
> > +		tag_size = sizeof(*bentry);
> > +
> >  	header = (journal_revoke_header_t *) buf;
> >  	offset = sizeof(journal_revoke_header_t);
> >  	max = be32_to_cpu(header->r_count);
> >  
> >  	while (offset < max) {
> > -		entry = (unsigned int *) (buf + offset);
> > -		rblock = be32_to_cpu(*entry);
> > +		if (tag_size == sizeof(*entry)) {
> > +			entry = (unsigned int *) (buf + offset);
> > +			rblock = be32_to_cpu(*entry);
> > +		} else {
> > +			bentry = (unsigned long long *)(buf + offset);
> > +			rblock = ext2fs_be64_to_cpu(*bentry);
> > +		}
> 
> > I wonder whether we really need to have both bentry and entry, since
> > those are just pointers and should be the same size regardless of
> > what they point at.
> > 
> > Wouldn't that be better from a readability point of view? Otherwise it
> > looks good.

One could eliminate the local variables by writing it as such:

if (...)
	rblock = be32_to_cpu(*((__u32 *)(buf + offset)));
else
	rblock = ext2fs_be64_to_cpu(*((__u64 *)(buf + offset)));

The parentheses are a little harder to figure out in the second version, but I
don't have a strong opinion either way.
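
A third variant, not what the patch does, would be to assemble the value
byte by byte, which needs neither pointer nor any cast.  This is only a
sketch; demo_read_revoke_entry() is a hypothetical helper, and it assumes
entries are stored big-endian and are 4 or 8 bytes wide as in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Read one big-endian revoke entry of tag_size bytes (4 or 8) starting at
 * buf + offset.  Working on bytes avoids unaligned pointer casts.
 */
static uint64_t demo_read_revoke_entry(const unsigned char *buf,
				       size_t offset, size_t tag_size)
{
	uint64_t rblock = 0;
	size_t i;

	assert(buf != NULL);
	assert(tag_size == 4 || tag_size == 8);

	for (i = 0; i < tag_size; i++)
		rblock = (rblock << 8) | buf[offset + i];
	return rblock;
}
```

Reading an 8-byte entry for block 1536 with tag_size 4 yields 0 at the first
offset and 1536 at the next one, which is the pattern of spurious
"Revoke FS block 0" lines that the patch removes from expect.0.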

--D
> 
> Thanks!
> -Lukas
> 
> >  		if (dump_all || rblock == block_to_dump) {
> > -			fprintf(out_file, "  Revoke FS block %u", rblock);
> > +			fprintf(out_file, "  Revoke FS block %llu", rblock);
> >  			if (dump_all)
> >  				fprintf(out_file, "\n");
> >  			else
> >  				fprintf(out_file," at block %u, sequence %u\n",
> >  					blocknr, transaction);
> >  		}
> > -		offset += 4;
> > +		offset += tag_size;
> >  	}
> >  }
> >  
> > diff --git a/tests/f_jnl_64bit/expect.0 b/tests/f_jnl_64bit/expect.0
> > index 2007f03..5cef2d8 100644
> > --- a/tests/f_jnl_64bit/expect.0
> > +++ b/tests/f_jnl_64bit/expect.0
> > @@ -1,189 +1,97 @@
> >  Journal starts at block 67, transaction 32
> >  Found expected sequence 32, type 5 (revoke table) at block 67
> >  Dumping revoke block, sequence 32, at block 67:
> > -  Revoke FS block 0
> >    Revoke FS block 1536
> > -  Revoke FS block 0
> >    Revoke FS block 1472
> > -  Revoke FS block 0
> >    Revoke FS block 1473
> > -  Revoke FS block 0
> >    Revoke FS block 1474
> > -  Revoke FS block 0
> >    Revoke FS block 1475
> > -  Revoke FS block 0
> >    Revoke FS block 1476
> > -  Revoke FS block 0
> >    Revoke FS block 1541
> > -  Revoke FS block 0
> >    Revoke FS block 1477
> > -  Revoke FS block 0
> >    Revoke FS block 1478
> > -  Revoke FS block 0
> >    Revoke FS block 1479
> > -  Revoke FS block 0
> >    Revoke FS block 1480
> > -  Revoke FS block 0
> >    Revoke FS block 1481
> > -  Revoke FS block 0
> >    Revoke FS block 1482
> > -  Revoke FS block 0
> >    Revoke FS block 1483
> > -  Revoke FS block 0
> >    Revoke FS block 1484
> > -  Revoke FS block 0
> >    Revoke FS block 1485
> > -  Revoke FS block 0
> >    Revoke FS block 1486
> > -  Revoke FS block 0
> >    Revoke FS block 1487
> > -  Revoke FS block 0
> >    Revoke FS block 1488
> > -  Revoke FS block 0
> >    Revoke FS block 1489
> > -  Revoke FS block 0
> >    Revoke FS block 1490
> > -  Revoke FS block 0
> >    Revoke FS block 1491
> > -  Revoke FS block 0
> >    Revoke FS block 1556
> > -  Revoke FS block 0
> >    Revoke FS block 1492
> > -  Revoke FS block 0
> >    Revoke FS block 1493
> > -  Revoke FS block 0
> >    Revoke FS block 1429
> > -  Revoke FS block 0
> >    Revoke FS block 1494
> > -  Revoke FS block 0
> >    Revoke FS block 1495
> > -  Revoke FS block 0
> >    Revoke FS block 1496
> > -  Revoke FS block 0
> >    Revoke FS block 1432
> > -  Revoke FS block 0
> >    Revoke FS block 1497
> > -  Revoke FS block 0
> >    Revoke FS block 1498
> > -  Revoke FS block 0
> >    Revoke FS block 1434
> > -  Revoke FS block 0
> >    Revoke FS block 1499
> > -  Revoke FS block 0
> >    Revoke FS block 1435
> > -  Revoke FS block 0
> >    Revoke FS block 1500
> > -  Revoke FS block 0
> >    Revoke FS block 1501
> > -  Revoke FS block 0
> >    Revoke FS block 1502
> > -  Revoke FS block 0
> >    Revoke FS block 1503
> > -  Revoke FS block 0
> >    Revoke FS block 1504
> > -  Revoke FS block 0
> >    Revoke FS block 1505
> > -  Revoke FS block 0
> >    Revoke FS block 1506
> > -  Revoke FS block 0
> >    Revoke FS block 1442
> > -  Revoke FS block 0
> >    Revoke FS block 1507
> > -  Revoke FS block 0
> >    Revoke FS block 1508
> > -  Revoke FS block 0
> >    Revoke FS block 1444
> > -  Revoke FS block 0
> >    Revoke FS block 1509
> > -  Revoke FS block 0
> >    Revoke FS block 1445
> > -  Revoke FS block 0
> >    Revoke FS block 1510
> > -  Revoke FS block 0
> >    Revoke FS block 1511
> > -  Revoke FS block 0
> >    Revoke FS block 1512
> > -  Revoke FS block 0
> >    Revoke FS block 1513
> > -  Revoke FS block 0
> >    Revoke FS block 1449
> > -  Revoke FS block 0
> >    Revoke FS block 1514
> > -  Revoke FS block 0
> >    Revoke FS block 1515
> > -  Revoke FS block 0
> >    Revoke FS block 1516
> > -  Revoke FS block 0
> >    Revoke FS block 1517
> > -  Revoke FS block 0
> >    Revoke FS block 1453
> > -  Revoke FS block 0
> >    Revoke FS block 1518
> > -  Revoke FS block 0
> >    Revoke FS block 1519
> > -  Revoke FS block 0
> >    Revoke FS block 1520
> > -  Revoke FS block 0
> >    Revoke FS block 1456
> > -  Revoke FS block 0
> >    Revoke FS block 1521
> > -  Revoke FS block 0
> >    Revoke FS block 1457
> > -  Revoke FS block 0
> >    Revoke FS block 1522
> > -  Revoke FS block 0
> >    Revoke FS block 1458
> > -  Revoke FS block 0
> >    Revoke FS block 1523
> > -  Revoke FS block 0
> >    Revoke FS block 1459
> > -  Revoke FS block 0
> >    Revoke FS block 1524
> > -  Revoke FS block 0
> >    Revoke FS block 1460
> > -  Revoke FS block 0
> >    Revoke FS block 1525
> > -  Revoke FS block 0
> >    Revoke FS block 1461
> > -  Revoke FS block 0
> >    Revoke FS block 1526
> > -  Revoke FS block 0
> >    Revoke FS block 1462
> > -  Revoke FS block 0
> >    Revoke FS block 1527
> > -  Revoke FS block 0
> >    Revoke FS block 1463
> > -  Revoke FS block 0
> >    Revoke FS block 1528
> > -  Revoke FS block 0
> >    Revoke FS block 1464
> > -  Revoke FS block 0
> >    Revoke FS block 1529
> > -  Revoke FS block 0
> >    Revoke FS block 1465
> > -  Revoke FS block 0
> >    Revoke FS block 1530
> > -  Revoke FS block 0
> >    Revoke FS block 1466
> > -  Revoke FS block 0
> >    Revoke FS block 1531
> > -  Revoke FS block 0
> >    Revoke FS block 1467
> > -  Revoke FS block 0
> >    Revoke FS block 1532
> > -  Revoke FS block 0
> >    Revoke FS block 1468
> > -  Revoke FS block 0
> >    Revoke FS block 1533
> > -  Revoke FS block 0
> >    Revoke FS block 1469
> > -  Revoke FS block 0
> >    Revoke FS block 1534
> > -  Revoke FS block 0
> >    Revoke FS block 1470
> > -  Revoke FS block 0
> >    Revoke FS block 1535
> > -  Revoke FS block 0
> >    Revoke FS block 1471
> >  Found expected sequence 32, type 1 (descriptor block) at block 68
> >  Dumping descriptor block, sequence 32, at block 68:
> > @@ -323,163 +231,84 @@ Dumping descriptor block, sequence 32, at block 150:
> >  Found expected sequence 32, type 2 (commit block) at block 201
> >  Found expected sequence 33, type 5 (revoke table) at block 202
> >  Dumping revoke block, sequence 33, at block 202:
> > -  Revoke FS block 0
> >    Revoke FS block 1600
> > -  Revoke FS block 0
> >    Revoke FS block 1601
> > -  Revoke FS block 0
> >    Revoke FS block 1537
> > -  Revoke FS block 0
> >    Revoke FS block 1602
> > -  Revoke FS block 0
> >    Revoke FS block 1538
> > -  Revoke FS block 0
> >    Revoke FS block 1603
> > -  Revoke FS block 0
> >    Revoke FS block 1539
> > -  Revoke FS block 0
> >    Revoke FS block 1604
> > -  Revoke FS block 0
> >    Revoke FS block 1540
> > -  Revoke FS block 0
> >    Revoke FS block 1605
> > -  Revoke FS block 0
> >    Revoke FS block 1606
> > -  Revoke FS block 0
> >    Revoke FS block 1542
> > -  Revoke FS block 0
> >    Revoke FS block 1607
> > -  Revoke FS block 0
> >    Revoke FS block 1543
> > -  Revoke FS block 0
> >    Revoke FS block 1608
> > -  Revoke FS block 0
> >    Revoke FS block 1544
> > -  Revoke FS block 0
> >    Revoke FS block 1609
> > -  Revoke FS block 0
> >    Revoke FS block 1545
> > -  Revoke FS block 0
> >    Revoke FS block 1610
> > -  Revoke FS block 0
> >    Revoke FS block 1546
> > -  Revoke FS block 0
> >    Revoke FS block 1611
> > -  Revoke FS block 0
> >    Revoke FS block 1547
> > -  Revoke FS block 0
> >    Revoke FS block 1612
> > -  Revoke FS block 0
> >    Revoke FS block 1548
> > -  Revoke FS block 0
> >    Revoke FS block 1613
> > -  Revoke FS block 0
> >    Revoke FS block 1549
> > -  Revoke FS block 0
> >    Revoke FS block 1614
> > -  Revoke FS block 0
> >    Revoke FS block 1550
> > -  Revoke FS block 0
> >    Revoke FS block 1615
> > -  Revoke FS block 0
> >    Revoke FS block 1551
> > -  Revoke FS block 0
> >    Revoke FS block 1616
> > -  Revoke FS block 0
> >    Revoke FS block 1552
> > -  Revoke FS block 0
> >    Revoke FS block 1617
> > -  Revoke FS block 0
> >    Revoke FS block 1553
> > -  Revoke FS block 0
> >    Revoke FS block 1554
> > -  Revoke FS block 0
> >    Revoke FS block 1555
> > -  Revoke FS block 0
> >    Revoke FS block 1557
> > -  Revoke FS block 0
> >    Revoke FS block 1558
> > -  Revoke FS block 0
> >    Revoke FS block 1559
> > -  Revoke FS block 0
> >    Revoke FS block 1560
> > -  Revoke FS block 0
> >    Revoke FS block 1561
> > -  Revoke FS block 0
> >    Revoke FS block 1562
> > -  Revoke FS block 0
> >    Revoke FS block 1563
> > -  Revoke FS block 0
> >    Revoke FS block 1564
> > -  Revoke FS block 0
> >    Revoke FS block 1565
> > -  Revoke FS block 0
> >    Revoke FS block 1566
> > -  Revoke FS block 0
> >    Revoke FS block 1567
> > -  Revoke FS block 0
> >    Revoke FS block 1568
> > -  Revoke FS block 0
> >    Revoke FS block 1569
> > -  Revoke FS block 0
> >    Revoke FS block 1570
> > -  Revoke FS block 0
> >    Revoke FS block 1571
> > -  Revoke FS block 0
> >    Revoke FS block 1572
> > -  Revoke FS block 0
> >    Revoke FS block 1573
> > -  Revoke FS block 0
> >    Revoke FS block 1574
> > -  Revoke FS block 0
> >    Revoke FS block 1575
> > -  Revoke FS block 0
> >    Revoke FS block 1576
> > -  Revoke FS block 0
> >    Revoke FS block 1577
> > -  Revoke FS block 0
> >    Revoke FS block 1578
> > -  Revoke FS block 0
> >    Revoke FS block 1579
> > -  Revoke FS block 0
> >    Revoke FS block 1580
> > -  Revoke FS block 0
> >    Revoke FS block 1581
> > -  Revoke FS block 0
> >    Revoke FS block 1582
> > -  Revoke FS block 0
> >    Revoke FS block 1583
> > -  Revoke FS block 0
> >    Revoke FS block 1584
> > -  Revoke FS block 0
> >    Revoke FS block 1585
> > -  Revoke FS block 0
> >    Revoke FS block 1586
> > -  Revoke FS block 0
> >    Revoke FS block 1587
> > -  Revoke FS block 0
> >    Revoke FS block 1588
> > -  Revoke FS block 0
> >    Revoke FS block 1589
> > -  Revoke FS block 0
> >    Revoke FS block 1590
> > -  Revoke FS block 0
> >    Revoke FS block 1591
> > -  Revoke FS block 0
> >    Revoke FS block 1592
> > -  Revoke FS block 0
> >    Revoke FS block 1593
> > -  Revoke FS block 0
> >    Revoke FS block 1594
> > -  Revoke FS block 0
> >    Revoke FS block 1595
> > -  Revoke FS block 0
> >    Revoke FS block 1596
> > -  Revoke FS block 0
> >    Revoke FS block 1597
> > -  Revoke FS block 0
> >    Revoke FS block 1598
> > -  Revoke FS block 0
> >    Revoke FS block 1599
> >  Found expected sequence 33, type 1 (descriptor block) at block 203
> >  Dumping descriptor block, sequence 33, at block 203:
> > 

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 10/37] e2fsck: verify checksums after checking everything else
  2014-05-02 12:32   ` Lukáš Czerner
@ 2014-05-05 22:56     ` Darrick J. Wong
  2014-05-06 11:32       ` Lukáš Czerner
  0 siblings, 1 reply; 91+ messages in thread
From: Darrick J. Wong @ 2014-05-05 22:56 UTC (permalink / raw)
  To: Lukáš Czerner; +Cc: tytso, linux-ext4

On Fri, May 02, 2014 at 02:32:11PM +0200, Lukáš Czerner wrote:
> On Thu, 1 May 2014, Darrick J. Wong wrote:
> 
> > Date: Thu, 01 May 2014 16:13:28 -0700
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > To: tytso@mit.edu, darrick.wong@oracle.com
> > Cc: linux-ext4@vger.kernel.org
> > Subject: [PATCH 10/37] e2fsck: verify checksums after checking everything else
> > 
> > There's a particular problem with e2fsck's user interface where
> > checksum errors are concerned:  Fixing the first complaint about
> > a checksum problem results in the inode being cleared even if e2fsck
> > could otherwise have recovered it.  While this mode is useful for
> > cleaning the remaining broken crud off the filesystem, we could at
> > least default to checking everything /else/ and only complaining about
> > the incorrect checksum if fsck finds nothing else wrong.
> > 
> > So, plumb in a config option.  We default to "verify and checksum"
> > > unless the user tells us otherwise.
> 
> I wonder whether it would not be better to always check the checksum
> of an object because it might yield additional information.
> 
> > If the checksum is good and the object is somewhat broken, then it's
> > highly likely that we have a problem within the kernel (or possibly
> > e2fsprogs, if some other operations were performed).
> > 
> > If the checksum is bad and the object is bad, then it's likely that
> > the corruption happened outside of the file system code: in memory,
> > on disk, or in transfer.
> > 
> > If the checksum is bad and the object is good, then it's trickier,
> > since it can be a kernel metadata csum bug, an unlucky silent
> > corruption, or an intentional change of the metadata.
> > 
> > It's not a huge amount of information we can get from it, but I think
> > that it might be useful when dealing with a corrupted file system.

Hm.  So right now, the object verification code works roughly like this:

A) Verify checksum, offer to zero object if strict_csums and csum failure.
B) Check everything else and offer to fix broken things.
C) Verify checksum again; if !strict_csums and csum failure, offer to zero the
   object.

Do you think that it would be helpful to users if e2fsck warned of checksum
verification failures during step (A) if strict_csums is set?  I think that
would help users (or us developers) to distinguish those three scenarios.
It wouldn't be difficult to make fix_problem() spit out the message.
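
For illustration, the three steps above can be modelled as a small decision
function.  This is only a sketch, not e2fsck code: the enum, the function
name, and the two boolean inputs are hypothetical, and it assumes the user
answers "yes" to every prompt and that a repair in step (B) also rewrites
the checksum.

```c
#include <stdbool.h>

/* Hypothetical outcomes for this sketch; these are not e2fsck symbols. */
enum verdict {
	OBJ_OK,			/* nothing to do */
	OBJ_REPAIRED,		/* step (B) fixed the broken fields */
	OBJ_ZEROED_EARLY,	/* step (A) zeroed the object (strict_csums) */
	OBJ_ZEROED_LATE		/* step (C) zeroed the object (!strict_csums) */
};

/*
 * csum_ok:      the stored checksum matches the object
 * fields_ok:    all other sanity checks pass
 * strict_csums: the user asked for checksum-first behaviour
 */
enum verdict check_object(bool csum_ok, bool fields_ok, bool strict_csums)
{
	/* (A) Initial verification; only acted upon with strict_csums. */
	if (strict_csums && !csum_ok)
		return OBJ_ZEROED_EARLY;

	/* (B) Check everything else and fix what is broken. */
	if (!fields_ok)
		return OBJ_REPAIRED;

	/* (C) Verify again; a failure here means only the csum is bad. */
	if (!csum_ok)
		return OBJ_ZEROED_LATE;

	return OBJ_OK;
}
```

In this model, a bad checksum on an otherwise good object reaches step (C)
only when strict_csums is off, which is the case where an extra warning
during step (A) would add information.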

--D
> 
> Thanks!
> -Lukas
> 
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> >  e2fsck/e2fsck.8.in      |   12 ++++++++++++
> >  e2fsck/e2fsck.conf.5.in |   20 ++++++++++++++++++++
> >  e2fsck/e2fsck.h         |    1 +
> >  e2fsck/problem.c        |   18 ++++++++++++++----
> >  e2fsck/problemP.h       |    1 +
> >  e2fsck/unix.c           |   11 +++++++++++
> >  6 files changed, 59 insertions(+), 4 deletions(-)
> > 
> > 
> > diff --git a/e2fsck/e2fsck.8.in b/e2fsck/e2fsck.8.in
> > index f5ed758..43ee063 100644
> > --- a/e2fsck/e2fsck.8.in
> > +++ b/e2fsck/e2fsck.8.in
> > @@ -207,6 +207,18 @@ option may prevent you from further manual data recovery.
> >  .BI nodiscard
> >  Do not attempt to discard free blocks and unused inode blocks. This option is
> >  exactly the opposite of discard option. This is set as default.
> > +.TP
> > +.BI strict_csums
> > +Verify each metadata object's checksum before checking any other fields
> > +in the metadata object.  If the verification fails, offer to clear the item,
> > +also before checking any of the other fields.  This option causes e2fsck to
> > +favor throwing away broken objects over trying to salvage them.
> > +.TP
> > +.BI no_strict_csums
> > +Perform all regular checks of a metadata object and only verify the checksum if
> > +no problems were found.  This option causes e2fsck to try to salvage slightly
> > +damaged metadata objects, at the cost of spending processing time on recovering
> > +data.  This is set as the default.
> >  .RE
> >  .TP
> >  .B \-f
> > diff --git a/e2fsck/e2fsck.conf.5.in b/e2fsck/e2fsck.conf.5.in
> > index 9ebfbbf..a8219a8 100644
> > --- a/e2fsck/e2fsck.conf.5.in
> > +++ b/e2fsck/e2fsck.conf.5.in
> > @@ -222,6 +222,26 @@ If this boolean relation is true, e2fsck will run as if the option
> >  .B -v
> >  is always specified.  This will cause e2fsck to print some additional
> >  information at the end of each full file system check.
> > +.TP
> > +.I strict_csums
> > +If this boolean relation is true, e2fsck will run as if
> > +.B -E strict_csums
> > +is set.  This causes e2fsck to verify each metadata object's checksum before
> > +checking any other fields in the metadata object.  If the verification
> > +fails, offer to clear the item, also before checking any of the other fields.
> > +This option causes e2fsck to favor throwing away broken objects over trying to
> > +salvage them.
> > +.IP
> > +If the boolean relation is false, e2fsck will run as if
> > +.B -E no_strict_csums
> > +is set.  In this case, e2fsck will perform all regular checks of a metadata
> > +object and only verify the checksum if no problems were found.  This option
> > +causes e2fsck to try to salvage slightly damaged metadata objects, at the cost
> > +of spending processing time on recovering data.
> > +.IP
> > +The default is for e2fsck to behave as if
> > +.B -E no_strict_csums
> > +is set.
> >  .SH THE [problems] STANZA
> >  Each tag in the
> >  .I [problems] 
> > diff --git a/e2fsck/e2fsck.h b/e2fsck/e2fsck.h
> > index dbd6ea8..d7a7be9 100644
> > --- a/e2fsck/e2fsck.h
> > +++ b/e2fsck/e2fsck.h
> > @@ -167,6 +167,7 @@ struct resource_track {
> >  #define E2F_OPT_FRAGCHECK	0x0800
> >  #define E2F_OPT_JOURNAL_ONLY	0x1000 /* only replay the journal */
> >  #define E2F_OPT_DISCARD		0x2000
> > +#define E2F_OPT_CSUM_FIRST	0x4000
> >  
> >  /*
> >   * E2fsck flags
> > diff --git a/e2fsck/problem.c b/e2fsck/problem.c
> > index 7f0ad6c..0999399 100644
> > --- a/e2fsck/problem.c
> > +++ b/e2fsck/problem.c
> > @@ -970,7 +970,7 @@ static struct e2fsck_problem problem_table[] = {
> >  	/* inode checksum does not match inode */
> >  	{ PR_1_INODE_CSUM_INVALID,
> >  	  N_("@i %i checksum does not match @i.  "),
> > -	  PROMPT_CLEAR, PR_PREEN_OK },
> > +	  PROMPT_CLEAR, PR_PREEN_OK | PR_INITIAL_CSUM },
> >  
> >  	/* inode passes checks, but checksum does not match inode */
> >  	{ PR_1_INODE_ONLY_CSUM_INVALID,
> > @@ -981,7 +981,7 @@ static struct e2fsck_problem problem_table[] = {
> >  	{ PR_1_EXTENT_CSUM_INVALID,
> >  	  N_("@i %i extent block checksum does not match extent\n\t(logical @b "
> >  	     "%c, @n physical @b %b, len %N)\n"),
> > -	  PROMPT_CLEAR, 0 },
> > +	  PROMPT_CLEAR, PR_INITIAL_CSUM },
> >  
> >  	/*
> >  	 * Inode extent block passes checks, but checksum does not match
> > @@ -996,7 +996,7 @@ static struct e2fsck_problem problem_table[] = {
> >  	{ PR_1_EA_BLOCK_CSUM_INVALID,
> >  	  N_("Extended attribute @a @b %b checksum for @i %i does not "
> >  	     "match.  "),
> > -	  PROMPT_CLEAR, 0 },
> > +	  PROMPT_CLEAR, PR_INITIAL_CSUM },
> >  
> >  	/*
> >  	 * Extended attribute block passes checks, but checksum for inode does
> > @@ -1470,7 +1470,7 @@ static struct e2fsck_problem problem_table[] = {
> >  	/* leaf node fails checksum */
> >  	{ PR_2_LEAF_NODE_CSUM_INVALID,
> >  	  N_("@d @i %i, %B, offset %N: @d fails checksum\n"),
> > -	  PROMPT_SALVAGE, PR_PREEN_OK },
> > +	  PROMPT_SALVAGE, PR_PREEN_OK | PR_INITIAL_CSUM },
> >  
> >  	/* leaf node has no checksum */
> >  	{ PR_2_LEAF_NODE_MISSING_CSUM,
> > @@ -1944,6 +1944,16 @@ int fix_problem(e2fsck_t ctx, problem_t code, struct problem_context *pctx)
> >  		printf(_("Unhandled error code (0x%x)!\n"), code);
> >  		return 0;
> >  	}
> > +
> > +	/*
> > +	 * If there is a problem with the initial csum verification and the
> > +	 * user told e2fsck to verify csums /after/ checking everything else,
> > +	 * then don't "fix" anything.
> > +	 */
> > +	if ((ptr->flags & PR_INITIAL_CSUM) &&
> > +	    !(ctx->options & E2F_OPT_CSUM_FIRST))
> > +		return 0;
> > +
> >  	if (!(ptr->flags & PR_CONFIG)) {
> >  		char	key[9], *new_desc = NULL;
> >  
> > diff --git a/e2fsck/problemP.h b/e2fsck/problemP.h
> > index 7944cd6..a983598 100644
> > --- a/e2fsck/problemP.h
> > +++ b/e2fsck/problemP.h
> > @@ -44,3 +44,4 @@ struct latch_descr {
> >  #define PR_CONFIG	0x080000 /* This problem has been customized
> >  				    from the config file */
> >  #define PR_FORCE_NO	0x100000 /* Force the answer to be no */
> > +#define PR_INITIAL_CSUM	0x200000 /* User can ignore initial csum check */
> > diff --git a/e2fsck/unix.c b/e2fsck/unix.c
> > index b39383d..c6cdb49 100644
> > --- a/e2fsck/unix.c
> > +++ b/e2fsck/unix.c
> > @@ -692,6 +692,10 @@ static void parse_extended_opts(e2fsck_t ctx, const char *opts)
> >  			else
> >  				ctx->log_fn = string_copy(ctx, arg, 0);
> >  			continue;
> > +		} else if (strcmp(token, "strict_csums") == 0) {
> > +			ctx->options |= E2F_OPT_CSUM_FIRST;
> > +		} else if (strcmp(token, "no_strict_csums") == 0) {
> > +			ctx->options &= ~E2F_OPT_CSUM_FIRST;
> >  		} else {
> >  			fprintf(stderr, _("Unknown extended option: %s\n"),
> >  				token);
> > @@ -710,6 +714,8 @@ static void parse_extended_opts(e2fsck_t ctx, const char *opts)
> >  		fputs(("\tjournal_only\n"), stderr);
> >  		fputs(("\tdiscard\n"), stderr);
> >  		fputs(("\tnodiscard\n"), stderr);
> > +		fputs(("\tstrict_csums\n"), stderr);
> > +		fputs(("\tno_strict_csums\n"), stderr);
> >  		fputc('\n', stderr);
> >  		exit(1);
> >  	}
> > @@ -945,6 +951,11 @@ static errcode_t PRS(int argc, char *argv[], e2fsck_t *ret_ctx)
> >  	profile_set_syntax_err_cb(syntax_err_report);
> >  	profile_init(config_fn, &ctx->profile);
> >  
> > +	profile_get_boolean(ctx->profile, "options", "strict_csums", NULL,
> > +			    0, &c);
> > +	if (c)
> > +		ctx->options |= E2F_OPT_CSUM_FIRST;
> > +
> >  	profile_get_boolean(ctx->profile, "options", "report_time", 0, 0,
> >  			    &c);
> >  	if (c)
> > 

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 11/37] e2fsck: fix the extended attribute checksum error message
  2014-05-02 12:46   ` Lukáš Czerner
@ 2014-05-05 23:08     ` Darrick J. Wong
  2014-05-06 10:12       ` Lukáš Czerner
  0 siblings, 1 reply; 91+ messages in thread
From: Darrick J. Wong @ 2014-05-05 23:08 UTC (permalink / raw)
  To: Lukáš Czerner; +Cc: tytso, linux-ext4

On Fri, May 02, 2014 at 02:46:56PM +0200, Lukáš Czerner wrote:
> On Thu, 1 May 2014, Darrick J. Wong wrote:
> 
> > Date: Thu, 01 May 2014 16:13:34 -0700
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > To: tytso@mit.edu, darrick.wong@oracle.com
> > Cc: linux-ext4@vger.kernel.org
> > Subject: [PATCH 11/37] e2fsck: fix the extended attribute checksum error
> >     message
> > 
> > Make the "EA block passes checks but fails checksum" message less
> > strange.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> >  e2fsck/problem.c |   12 +++++-------
> >  1 file changed, 5 insertions(+), 7 deletions(-)
> > 
> > 
> > diff --git a/e2fsck/problem.c b/e2fsck/problem.c
> > index 0999399..ec20bd1 100644
> > --- a/e2fsck/problem.c
> > +++ b/e2fsck/problem.c
> > @@ -992,19 +992,17 @@ static struct e2fsck_problem problem_table[] = {
> >  	     "extent\n\t(logical @b %c, @n physical @b %b, len %N)\n"),
> >  	  PROMPT_FIX, 0 },
> >  
> > -	/* Extended attribute block checksum for inode does not match. */
> > +	/* Extended attribute block checksum does not match. */
> 
> The "for inode" is still there in the message, so I do not think
> there is a reason to remove it from the comment.

Oops.

> >  	{ PR_1_EA_BLOCK_CSUM_INVALID,
> > -	  N_("Extended attribute @a @b %b checksum for @i %i does not "
> > -	     "match.  "),
> > +	  N_("@a @b %b checksum for @i %i does not match.  "),
> >  	  PROMPT_CLEAR, PR_INITIAL_CSUM },
> >  
> >  	/*
> > -	 * Extended attribute block passes checks, but checksum for inode does
> > -	 * not match.
> > +	 * Extended attribute block passes checks, but checksum does not
> > +	 * match.
> >  	 */
> >  	{ PR_1_EA_BLOCK_ONLY_CSUM_INVALID,
> > -	  N_("Extended attribute @a @b %b passes checks, but checksum for "
> > -	     "@i %i does not match.  "),
> > +	  N_("@a @b %b passes checks, but checksum does not match.  "),
> 
> Is there a reason to remove the inode number from the message?

For whatever reason, I was confused by this message and thought it was
referring to a checksum failure in the inode itself.  On the other hand, it's
helpful to map an EA block back to an inode, so perhaps the message should be
changed to:

"Inode XXX's extended attribute block YYY passes checks, but checksum does not
match."

Now that I look at the other metadata_csum checks, the failure message starts
with "@i %i..." so these two might as well follow the convention.  Sorry that I
seem to have strayed from it.

--D
> 
> Thanks!
> -Lukas
> 
> >  	  PROMPT_FIX, 0 },
> >  
> >  	/*
> > 

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 12/37] e2fsck: insert a missing dirent tail for checksums if possible
  2014-05-02 12:54   ` Lukáš Czerner
@ 2014-05-05 23:16     ` Darrick J. Wong
  0 siblings, 0 replies; 91+ messages in thread
From: Darrick J. Wong @ 2014-05-05 23:16 UTC (permalink / raw)
  To: Lukáš Czerner; +Cc: tytso, linux-ext4

On Fri, May 02, 2014 at 02:54:08PM +0200, Lukáš Czerner wrote:
> On Thu, 1 May 2014, Darrick J. Wong wrote:
> 
> > Date: Thu, 01 May 2014 16:13:41 -0700
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > To: tytso@mit.edu, darrick.wong@oracle.com
> > Cc: linux-ext4@vger.kernel.org
> > Subject: [PATCH 12/37] e2fsck: insert a missing dirent tail for checksums if
> >     possible
> > 
> > If e2fsck is writing a block of directory entries to disk, it should
> > adjust the dirents to add the dirent tail if one is missing.  It's not
> > a big deal if there's no space to do this since rehash (pass 3A) will
> > reconstruct directories for us.  However, we may as well avoid
> > unnecessary work.
> 
> I am sorry for the stupid question, but in what case can the
> dirent tail be missing?  It's not immediately obvious to me.

Primarily the "dirent tail missing" case happens if the user runs tune2fs to
add checksums to a FS and it encounters a directory block that doesn't have
enough space to store the directory block checksum field.  When this happens,
tune2fs advises the user to run e2fsck -D to rebuild the directories.  The -D
switch isn't strictly necessary.

The particular sub-case that this patch tries to capture is where the user
ignores the "e2fsck -D" request, deletes entries out of the dir block, and some
time later runs e2fsck.  In that case, we can skip a full rebuild and just fix
the block.
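
As a rough sketch of the "just fix the block" condition: the tail can be
inserted when the last dirent in the block can give up enough of its rec_len
and still hold its own name.  The helper names below are hypothetical, and
the sizes are assumptions taken from the on-disk format as I understand it
(a 12-byte tail, an 8-byte dirent header, 4-byte alignment).

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed size of the checksum tail (struct ext2_dir_entry_tail). */
#define TAIL_SIZE	12u

/* Smallest rec_len that holds an 8-byte header plus the name, 4-aligned. */
static size_t min_rec_len(size_t name_len)
{
	return (name_len + 8 + 3) & ~(size_t)3;
}

/*
 * Return true if the last dirent (rec_len/name_len as stored on disk)
 * can shrink by TAIL_SIZE bytes without truncating its name.
 */
bool tail_fits(size_t last_rec_len, size_t last_name_len)
{
	if (last_rec_len < TAIL_SIZE)
		return false;
	return min_rec_len(last_name_len) <= last_rec_len - TAIL_SIZE;
}
```

If tail_fits() is false, the block is left alone and the directory still gets
queued for a rehash, as before.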

--D
> 
> Thanks!
> -Lukas
> 
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> >  e2fsck/pass2.c |   40 ++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 40 insertions(+)
> > 
> > 
> > diff --git a/e2fsck/pass2.c b/e2fsck/pass2.c
> > index 5488c73..95f51b7 100644
> > --- a/e2fsck/pass2.c
> > +++ b/e2fsck/pass2.c
> > @@ -739,6 +739,41 @@ static int is_last_entry(ext2_filsys fs, int inline_data_size,
> >  		return (offset < fs->blocksize - csum_size);
> >  }
> >  
> > +static errcode_t insert_dirent_tail(ext2_filsys fs, void *dirbuf)
> > +{
> > +	struct ext2_dir_entry *d;
> > +	void *top;
> > +	struct ext2_dir_entry_tail *t;
> > +	unsigned int rec_len;
> > +
> > +	d = dirbuf;
> > +	top = EXT2_DIRENT_TAIL(dirbuf, fs->blocksize);
> > +
> > +	rec_len = d->rec_len;
> > +	while (rec_len && !(rec_len & 0x3)) {
> > +		d = (struct ext2_dir_entry *)(((char *)d) + rec_len);
> > +		if (((void *)d) + d->rec_len >= top)
> > +			break;
> > +		rec_len = d->rec_len;
> > +	}
> > +
> > +	if (d != top) {
> > +		size_t min_size = EXT2_DIR_REC_LEN(
> > +				ext2fs_dirent_name_len(dirbuf));
> > +		if (min_size > d->rec_len - sizeof(struct ext2_dir_entry_tail))
> > +			return EXT2_ET_DIR_NO_SPACE_FOR_CSUM;
> > +		d->rec_len -= sizeof(struct ext2_dir_entry_tail);
> > +	}
> > +
> > +	t = (struct ext2_dir_entry_tail *)top;
> > +	if (t->det_reserved_zero1 ||
> > +	    t->det_rec_len != sizeof(struct ext2_dir_entry_tail) ||
> > +	    t->det_reserved_name_len != EXT2_DIR_NAME_LEN_CSUM)
> > +		ext2fs_initialize_dirent_tail(fs, t);
> > +
> > +	return 0;
> > +}
> > +
> >  static int check_dir_block(ext2_filsys fs,
> >  			   struct ext2_db_entry2 *db,
> >  			   void *priv_data)
> > @@ -1275,8 +1310,13 @@ skip_checksum:
> >  		if (EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
> >  				EXT4_FEATURE_RO_COMPAT_METADATA_CSUM) &&
> >  		    is_leaf &&
> > +		    !inline_data_size &&
> >  		    !ext2fs_dirent_has_tail(fs, (struct ext2_dir_entry *)buf))
> > +		{
> > +			if (insert_dirent_tail(fs, buf) == 0)
> > +				goto write_and_fix;
> >  			e2fsck_rehash_dir_later(ctx, ino);
> > +		}
> >  
> >  write_and_fix:
> >  		if (e2fsck_dir_will_be_rehashed(ctx, ino))
> > 

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 06/37] debugfs: force logdump to display (old) journal contents
  2014-05-02 11:49   ` Lukáš Czerner
@ 2014-05-06  0:24     ` Darrick J. Wong
  2014-05-12  1:41       ` Theodore Ts'o
  0 siblings, 1 reply; 91+ messages in thread
From: Darrick J. Wong @ 2014-05-06  0:24 UTC (permalink / raw)
  To: Lukáš Czerner; +Cc: tytso, linux-ext4

On Fri, May 02, 2014 at 01:49:37PM +0200, Lukáš Czerner wrote:
> On Thu, 1 May 2014, Darrick J. Wong wrote:
> 
> > Date: Thu, 01 May 2014 16:13:02 -0700
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > To: tytso@mit.edu, darrick.wong@oracle.com
> > Cc: linux-ext4@vger.kernel.org
> > Subject: [PATCH 06/37] debugfs: force logdump to display (old) journal
> >     contents
> > 
> > If the user passes -a more than once to logdump, try to dump old log
> > contents.  This can be used to try to track down journal problems even
> > after recovery.
> 
> You need to update the man page as well for this. Also, I wonder what
> the behaviour is if '-a' and '-b' or '-c' are specified simultaneously
> and '-a' is specified multiple times?

I'll update the manpage.  -c seems to hexdump the contents of any block that we
find while iterating the journal.  -b would seem to allow you to dump an
arbitrary block #, but I could never get it to do that.

In any case, specifying -a even once will make logdump dump every block and
ignore -b.

--D
> 
> Thanks!
> -Lukas
> 
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> >  debugfs/logdump.c |   11 ++++++++---
> >  1 file changed, 8 insertions(+), 3 deletions(-)
> > 
> > 
> > diff --git a/debugfs/logdump.c b/debugfs/logdump.c
> > index 8b9dc5b..bf4bef5 100644
> > --- a/debugfs/logdump.c
> > +++ b/debugfs/logdump.c
> > @@ -393,9 +393,13 @@ static void dump_journal(char *cmdname, FILE *out_file,
> >  	fprintf(out_file, "Journal starts at block %u, transaction %u\n",
> >  		blocknr, transaction);
> >  
> > -	if (!blocknr)
> > +	if (!blocknr) {
> >  		/* Empty journal, nothing to do. */
> > -		return;
> > +		if (dump_all < 2)
> > +			return;
> > +		else
> > +			blocknr = 1;
> > +	}
> >  
> >  	while (1) {
> >  		retval = read_journal_block(cmdname, source,
> > @@ -420,7 +424,8 @@ static void dump_journal(char *cmdname, FILE *out_file,
> >  			fprintf (out_file, "Found sequence %u (not %u) at "
> >  				 "block %u: end of journal.\n",
> >  				 sequence, transaction, blocknr);
> > -			return;
> > +			if (dump_all < 2)
> > +				return;
> >  		}
> >  
> >  		if (dump_descriptors) {
> > 

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 37/37] ext5: define new subtype to add features and reduce testing complexity
  2014-05-02  9:45   ` Lukáš Czerner
  2014-05-02 14:04     ` Theodore Ts'o
@ 2014-05-06  1:33     ` Darrick J. Wong
  2014-05-06 12:50       ` Lukáš Czerner
  1 sibling, 1 reply; 91+ messages in thread
From: Darrick J. Wong @ 2014-05-06  1:33 UTC (permalink / raw)
  To: Lukáš Czerner; +Cc: tytso, linux-ext4

On Fri, May 02, 2014 at 11:45:25AM +0200, Lukáš Czerner wrote:
> On Thu, 1 May 2014, Darrick J. Wong wrote:
> 
> > Date: Thu, 01 May 2014 16:16:29 -0700
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > To: tytso@mit.edu, darrick.wong@oracle.com
> > Cc: linux-ext4@vger.kernel.org
> > Subject: [PATCH 37/37] ext5: define new subtype to add features and reduce
> >     testing complexity
> > 
> > This patch defines ext5 as a set of required feature flags and mount
> > options, for the purpose of spreading new features to freshly
> > formatted filesystems and reducing the testing matrix by disabling
> > nearly all mount options.  The patch uses the s_minor_rev_level field
> > to indicate the existence of ext5, and switch on feature/mount option
> > enforcement in the kernel.
> > 
> > The required feature set is:
> > ^resize_inode,dirindex,ext_attr,sparse_super2,filetype,meta_bg,extents,
> > ^flex_bg,64bit,inline_data,sparse_super,huge_file,large_file,dir_nlink,
> > extra_isize,metadata_csum
> > 
> > The required mount options are:
> > acl,block_validity,user_xattr,journal_checksum
> > 
> > All other mount options are no longer functional.
> > 
> > The 'ext4' type remains unchanged, for people who require mount
> > options or a different feature set.  I don't intend to fork any code;
> > I'm just painting a bigger target (for testing).
> 
> This is definitely a NACK from me. I do not like this and there are
> several reasons why.
> 
> First of all, the name. Given the history of the ext file systems, we
> tend to increase the number with each new version of the file system.
> However, you're saying that this is just for testing features ... in
> that case it does not make any sense to call it ext5. Not only that,
> it's stupid to call it ext5, especially since we might actually want
> to release ext5 in the future, and this would be really confusing for
> everybody involved.

I should have been clearer about my aim for "ext5" -- I want to define
ext5 to be "ext4 + some new features - some mount options", and then
work on stabilizing those features.  Historically, we've defined each
extN to be ext(N-1) + more features, and that's what I'm doing here
too.  ext5 would be a real release, with new features and fewer mount
options.  The comment about reducing testing was merely a reflection
upon the side effects of locking down some of the feature flags and
mount options.

I don't think it's a good idea to change what features you get with
'mke2fs -T ext4' since that hasn't changed since ~2008 or so.

Maybe I should have called it ext5dev and killed off ext4dev.

> I've been trying to get rid of the ext4dev bits and pieces,
> more or less successfully, and you're adding a new type once again.
> We might start the discussion whether to revive ext4dev for this kind
> of thing, but I am not really convinced that this is the right way to
> go either.
> 
> What about simply using mke2fs.conf to specify the feature set
> we want and using that? It's simple enough and it should work. We
> could also extend the configuration to be able to set default
> mount options and such, if that's not possible already. I just do not
> understand why we would introduce a new file system type if it's just
> for testing ext4 features.

Well, yes, I could just create a new fs_types stanza in mke2fs.conf.
I wanted to put a little more teeth in that and actually have the
kernel and e2fsck be able to check that a FS has been declared as
'ext5' and that all the required bits are really there, hence the
ability to set s_minor_rev_level.  I'm not really married to going
that far, though.

(There's already an interface for specifying some of the default mount
options in the superblock; that was sufficient for me.)
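
The "all the required bits are really there" check boils down to masking
the feature word and XORing it against the required value; any bit left
over is a flag that is wrongly set or wrongly clear.  Here is a minimal
sketch with made-up bit values (FEAT_A, FEAT_B, and FEAT_C are hypothetical,
not real ext4 feature flags):

```c
#include <stdint.h>

/* Hypothetical feature bits, for illustration only. */
#define FEAT_A		0x1u	/* must be set */
#define FEAT_B		0x2u	/* must be set */
#define FEAT_C		0x4u	/* must be clear */

/* Bits the subtype cares about, and the values they must have. */
#define REQD_MASK	(FEAT_A | FEAT_B | FEAT_C)
#define REQD		(FEAT_A | FEAT_B)

/*
 * Return the set of bits that disagree with the required feature set.
 * Zero means the filesystem conforms; bits outside REQD_MASK are ignored.
 */
uint32_t feature_mismatch(uint32_t features)
{
	return REQD ^ (features & REQD_MASK);
}
```

A checker can then compare the mismatch against a list of bits it knows how
to repair, and either fix them or drop the subtype marker.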

--D
> 
> Thanks!
> -Lukas
> 
> 
> 
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> >  e2fsck/problem.c                |   15 +++++++++
> >  e2fsck/problem.h                |   14 ++++++--
> >  e2fsck/unix.c                   |   68 +++++++++++++++++++++++++++++++++++++++
> >  lib/e2p/ls.c                    |   11 ++++++
> >  lib/ext2fs/ext2_fs.h            |    3 ++
> >  lib/ext2fs/ext2fs.h             |   50 +++++++++++++++++++++++++++++
> >  lib/ext2fs/initialize.c         |    1 +
> >  misc/Makefile.in                |   11 ++++--
> >  misc/mke2fs.c                   |   30 +++++++++++++++++
> >  misc/mke2fs.conf.in             |    4 ++
> >  misc/tune2fs.c                  |   23 +++++++++++++
> >  tests/metadata-checksum-test.sh |    5 +++
> >  tests/t_mke2fs_ext5/expect      |   45 ++++++++++++++++++++++++++
> >  tests/t_mke2fs_ext5/script      |   33 +++++++++++++++++++
> >  14 files changed, 306 insertions(+), 7 deletions(-)
> >  create mode 100644 tests/t_mke2fs_ext5/expect
> >  create mode 100755 tests/t_mke2fs_ext5/script
> > 
> > 
> > diff --git a/e2fsck/problem.c b/e2fsck/problem.c
> > index ec20bd1..ddfe2b7 100644
> > --- a/e2fsck/problem.c
> > +++ b/e2fsck/problem.c
> > @@ -454,6 +454,21 @@ static struct e2fsck_problem problem_table[] = {
> >  	  N_("@S 64bit filesystems needs extents to access the whole disk.  "),
> >  	  PROMPT_FIX, PR_PREEN_OK | PR_NO_OK},
> >  
> > +	/* ext5 feature set incorrect. */
> > +	{ PR_0_FIX_EXT5_FEATURES,
> > +	  N_("@S ext5 feature set incorrect.  "),
> > +	  PROMPT_FIX, PR_PREEN_OK | PR_NO_OK},
> > +
> > +	/* ext5 flag doesn't match with feature set. */
> > +	{ PR_0_REMOVE_EXT5_MINOR_REV,
> > +	  N_("@S ext5 flag doesn't match with feature set.  "),
> > +	  PROMPT_CLEAR, PR_PREEN_OK | PR_NO_OK},
> > +
> > +	/* ext5 default mount options incorrect. */
> > +	{ PR_0_FIX_EXT5_MNTOPTS,
> > +	  N_("@S ext5 default mount options incorrect.  "),
> > +	  PROMPT_FIX, PR_PREEN_OK | PR_NO_OK},
> > +
> >  	/* Pass 1 errors */
> >  
> >  	/* Pass 1: Checking inodes, blocks, and sizes */
> > diff --git a/e2fsck/problem.h b/e2fsck/problem.h
> > index bc9fa9c..935f78a 100644
> > --- a/e2fsck/problem.h
> > +++ b/e2fsck/problem.h
> > @@ -249,9 +249,6 @@ struct problem_context {
> >  /* Checking group descriptor failed */
> >  #define PR_0_CHECK_DESC_FAILED			0x000045
> >  
> > -/* 64bit is set but extents are not set. */
> > -#define PR_0_64BIT_WITHOUT_EXTENTS		0x000048
> > -
> >  /*
> >   * metadata_csum supersedes uninit_bg; both feature bits cannot be set
> >   * simultaneously.
> > @@ -261,6 +258,17 @@ struct problem_context {
> >  /* Superblock has invalid MMP checksum. */
> >  #define PR_0_MMP_CSUM_INVALID			0x000047
> >  
> > +/* 64bit is set but extents are not set. */
> > +#define PR_0_64BIT_WITHOUT_EXTENTS		0x000048
> > +
> > +/* ext5 feature set incorrect. */
> > +#define PR_0_FIX_EXT5_FEATURES			0x000049
> > +
> > +/* ext5 flag doesn't match with feature set. */
> > +#define PR_0_REMOVE_EXT5_MINOR_REV		0x00004A
> > +
> > +/* ext5 default mount options incorrect. */
> > +#define PR_0_FIX_EXT5_MNTOPTS			0x00004B
> >  
> >  /*
> >   * Pass 1 errors
> > diff --git a/e2fsck/unix.c b/e2fsck/unix.c
> > index da888c2..55a5d03 100644
> > --- a/e2fsck/unix.c
> > +++ b/e2fsck/unix.c
> > @@ -1205,6 +1205,71 @@ check_error:
> >  	return retval;
> >  }
> >  
> > +#define EXT5_FEATURE_COMPAT_FIXABLE	(EXT2_FEATURE_COMPAT_DIR_INDEX|\
> > +					 EXT2_FEATURE_COMPAT_EXT_ATTR)
> > +
> > +#define EXT5_FEATURE_INCOMPAT_FIXABLE	(EXT3_FEATURE_INCOMPAT_EXTENTS|\
> > +					 EXT4_FEATURE_INCOMPAT_INLINE_DATA)
> > +
> > +#define EXT5_FEATURE_RO_COMPAT_FIXABLE	(EXT4_FEATURE_RO_COMPAT_HUGE_FILE|\
> > +					 EXT2_FEATURE_RO_COMPAT_LARGE_FILE|\
> > +					 EXT4_FEATURE_RO_COMPAT_DIR_NLINK|\
> > +					 EXT4_FEATURE_RO_COMPAT_EXTRA_ISIZE)
> > +
> > +static void check_ext5_fs(e2fsck_t ctx, struct problem_context *pctx)
> > +{
> > +	struct ext2_super_block *sb = ctx->fs->super;
> > +	__u32 features[3];
> > +
> > +	if (sb->s_minor_rev_level != EXT5_MINOR_REV_LEVEL)
> > +		return;
> > +
> > +	features[0] = EXT5_FEATURE_COMPAT_REQD ^
> > +		(sb->s_feature_compat & EXT5_FEATURE_COMPAT_REQD_MASK);
> > +	features[1] = EXT5_FEATURE_INCOMPAT_REQD ^
> > +		(sb->s_feature_incompat & EXT5_FEATURE_INCOMPAT_REQD_MASK);
> > +	features[2] = EXT5_FEATURE_RO_COMPAT_REQD ^
> > +		(sb->s_feature_ro_compat & EXT5_FEATURE_RO_COMPAT_REQD_MASK);
> > +
> > +	if (!features[0] && !features[1] && !features[2])
> > +		goto check_mntopts;
> > +
> > +	if ((features[0] & EXT5_FEATURE_COMPAT_FIXABLE) == features[0] &&
> > +	    (features[1] & EXT5_FEATURE_INCOMPAT_FIXABLE) == features[1] &&
> > +	    (features[2] & EXT5_FEATURE_RO_COMPAT_FIXABLE) == features[2]) {
> > +		if (fix_problem(ctx, PR_0_FIX_EXT5_FEATURES, pctx)) {
> > +			sb->s_feature_compat = EXT5_FEATURE_COMPAT_REQD |
> > +				(sb->s_feature_compat &
> > +				 ~EXT5_FEATURE_COMPAT_REQD_MASK);
> > +			sb->s_feature_incompat = EXT5_FEATURE_INCOMPAT_REQD |
> > +				(sb->s_feature_incompat &
> > +				 ~EXT5_FEATURE_INCOMPAT_REQD_MASK);
> > +			sb->s_feature_ro_compat = EXT5_FEATURE_RO_COMPAT_REQD |
> > +				(sb->s_feature_ro_compat &
> > +				 ~EXT5_FEATURE_RO_COMPAT_REQD_MASK);
> > +			ext2fs_mark_super_dirty(ctx->fs);
> > +		}
> > +	} else {
> > +		if (fix_problem(ctx, PR_0_REMOVE_EXT5_MINOR_REV, pctx)) {
> > +			sb->s_minor_rev_level = 0;
> > +			ext2fs_mark_super_dirty(ctx->fs);
> > +		}
> > +	}
> > +
> > +check_mntopts:
> > +	if (!(EXT5_DEF_MNTOPT ^
> > +	      (sb->s_default_mount_opts & EXT5_DEF_MNTOPT_MASK)))
> > +		return;
> > +
> > +	if (fix_problem(ctx, PR_0_FIX_EXT5_MNTOPTS, pctx)) {
> > +		sb->s_default_mount_opts = EXT5_DEF_MNTOPT |
> > +			(sb->s_default_mount_opts & ~EXT5_DEF_MNTOPT_MASK);
> > +		ext2fs_mark_super_dirty(ctx->fs);
> > +	}
> > +
> > +	return;
> > +}
> > +
> >  int main (int argc, char *argv[])
> >  {
> >  	errcode_t	retval = 0, retval2 = 0, orig_retval = 0;
> > @@ -1601,6 +1666,9 @@ print_unsupp_features:
> >  	}
> >  #endif
> >  
> > +	/* check ext5 features and mount options */
> > +	check_ext5_fs(ctx, &pctx);
> > +
> >  	/*
> >  	 * If the user specified a specific superblock, presumably the
> >  	 * master superblock has been trashed.  So we mark the
> > diff --git a/lib/e2p/ls.c b/lib/e2p/ls.c
> > index a7ea38a..ba91e6a 100644
> > --- a/lib/e2p/ls.c
> > +++ b/lib/e2p/ls.c
> > @@ -239,6 +239,17 @@ void list_super2(struct ext2_super_block * sb, FILE *f)
> >  #endif
> >  	} else
> >  		fprintf(f, " (unknown)\n");
> > +	if (sb->s_minor_rev_level) {
> > +		fprintf(f, "Filesystem minor rev #:   %d",
> > +			sb->s_minor_rev_level);
> > +		switch (sb->s_minor_rev_level) {
> > +		case EXT5_MINOR_REV_LEVEL:
> > +			fprintf(f, " (ext5)\n");
> > +			break;
> > +		default:
> > +			fprintf(f, " (unknown)\n");
> > +		}
> > +	}
> >  	print_features(sb, f);
> >  	print_super_flags(sb, f);
> >  	print_mntopts(sb, f);
> > diff --git a/lib/ext2fs/ext2_fs.h b/lib/ext2fs/ext2_fs.h
> > index 21a8187..027cfe9 100644
> > --- a/lib/ext2fs/ext2_fs.h
> > +++ b/lib/ext2fs/ext2_fs.h
> > @@ -926,4 +926,7 @@ struct mmp_struct {
> >   */
> >  #define EXT4_INLINE_DATA_DOTDOT_SIZE	(4)
> >  
> > +/* Minor revision level for ext5 */
> > +#define EXT5_MINOR_REV_LEVEL		(2)
> > +
> >  #endif	/* _LINUX_EXT2_FS_H */
> > diff --git a/lib/ext2fs/ext2fs.h b/lib/ext2fs/ext2fs.h
> > index 84c7c74..fd53162 100644
> > --- a/lib/ext2fs/ext2fs.h
> > +++ b/lib/ext2fs/ext2fs.h
> > @@ -611,6 +611,56 @@ typedef struct ext2_icount *ext2_icount_t;
> >  					 EXT4_LIB_RO_COMPAT_QUOTA|\
> >  					 EXT4_FEATURE_RO_COMPAT_METADATA_CSUM)
> >  
> > +/* ext5 features */
> > +#define EXT5_FEATURE_COMPAT_REQD_MASK	(EXT2_FEATURE_COMPAT_RESIZE_INODE|\
> > +					 EXT2_FEATURE_COMPAT_DIR_INDEX|\
> > +					 EXT2_FEATURE_COMPAT_EXT_ATTR|\
> > +					 EXT4_FEATURE_COMPAT_SPARSE_SUPER2)
> > +
> > +#define EXT5_FEATURE_COMPAT_REQD	(EXT2_FEATURE_COMPAT_DIR_INDEX|\
> > +					 EXT2_FEATURE_COMPAT_EXT_ATTR|\
> > +					 EXT4_FEATURE_COMPAT_SPARSE_SUPER2)
> > +
> > +#define EXT5_FEATURE_INCOMPAT_REQD_MASK	(EXT2_FEATURE_INCOMPAT_FILETYPE|\
> > +					 EXT2_FEATURE_INCOMPAT_META_BG|\
> > +					 EXT3_FEATURE_INCOMPAT_EXTENTS|\
> > +					 EXT4_FEATURE_INCOMPAT_FLEX_BG|\
> > +					 EXT4_FEATURE_INCOMPAT_64BIT|\
> > +					 EXT4_FEATURE_INCOMPAT_INLINE_DATA)
> > +
> > +#define EXT5_FEATURE_INCOMPAT_REQD	(EXT2_FEATURE_INCOMPAT_FILETYPE|\
> > +					 EXT2_FEATURE_INCOMPAT_META_BG|\
> > +					 EXT3_FEATURE_INCOMPAT_EXTENTS|\
> > +					 EXT4_FEATURE_INCOMPAT_FLEX_BG|\
> > +					 EXT4_FEATURE_INCOMPAT_64BIT|\
> > +					 EXT4_FEATURE_INCOMPAT_INLINE_DATA)
> > +
> > +#define EXT5_FEATURE_RO_COMPAT_REQD_MASK (EXT2_FEATURE_RO_COMPAT_SPARSE_SUPER|\
> > +					 EXT4_FEATURE_RO_COMPAT_HUGE_FILE|\
> > +					 EXT2_FEATURE_RO_COMPAT_LARGE_FILE|\
> > +					 EXT4_FEATURE_RO_COMPAT_DIR_NLINK|\
> > +					 EXT4_FEATURE_RO_COMPAT_EXTRA_ISIZE|\
> > +					 EXT4_FEATURE_RO_COMPAT_GDT_CSUM|\
> > +					 EXT4_FEATURE_RO_COMPAT_METADATA_CSUM)
> > +
> > +#define EXT5_FEATURE_RO_COMPAT_REQD	(EXT2_FEATURE_RO_COMPAT_SPARSE_SUPER|\
> > +					 EXT4_FEATURE_RO_COMPAT_HUGE_FILE|\
> > +					 EXT2_FEATURE_RO_COMPAT_LARGE_FILE|\
> > +					 EXT4_FEATURE_RO_COMPAT_DIR_NLINK|\
> > +					 EXT4_FEATURE_RO_COMPAT_EXTRA_ISIZE|\
> > +					 EXT4_FEATURE_RO_COMPAT_METADATA_CSUM)
> > +
> > +#define EXT5_DEF_MNTOPT_MASK		(EXT2_DEFM_XATTR_USER|\
> > +					 EXT2_DEFM_ACL|\
> > +					 EXT2_DEFM_UID16|\
> > +					 EXT4_DEFM_NOBARRIER|\
> > +					 EXT4_DEFM_BLOCK_VALIDITY|\
> > +					 EXT4_DEFM_NODELALLOC)
> > +
> > +#define EXT5_DEF_MNTOPT			(EXT2_DEFM_XATTR_USER|\
> > +					 EXT2_DEFM_ACL|\
> > +					 EXT4_DEFM_BLOCK_VALIDITY)
> > +
> >  /*
> >   * These features are only allowed if EXT2_FLAG_SOFTSUPP_FEATURES is passed
> >   * to ext2fs_openfs()
> > diff --git a/lib/ext2fs/initialize.c b/lib/ext2fs/initialize.c
> > index 75fbf8e..2d0731b 100644
> > --- a/lib/ext2fs/initialize.c
> > +++ b/lib/ext2fs/initialize.c
> > @@ -173,6 +173,7 @@ errcode_t ext2fs_initialize(const char *name, int flags,
> >  	set_field(s_raid_stripe_width, 0);	/* default stripe width: 0 */
> >  	set_field(s_log_groups_per_flex, 0);
> >  	set_field(s_flags, 0);
> > +	set_field(s_minor_rev_level, 0);
> >  	assign_field(s_backup_bgs[0]);
> >  	assign_field(s_backup_bgs[1]);
> >  	if (super->s_feature_incompat & ~EXT2_LIB_FEATURE_INCOMPAT_SUPP) {
> > diff --git a/misc/Makefile.in b/misc/Makefile.in
> > index 1b942f2..6776f41 100644
> > --- a/misc/Makefile.in
> > +++ b/misc/Makefile.in
> > @@ -475,7 +475,7 @@ install: all $(SMANPAGES) $(UMANPAGES) installdirs
> >  		$(ES) "	INSTALL $(sbindir)/$$i"; \
> >  		$(INSTALL_PROGRAM) $$i $(DESTDIR)$(sbindir)/$$i; \
> >  	done
> > -	$(Q) for i in ext2 ext3 ext4 ext4dev; do \
> > +	$(Q) for i in ext2 ext3 ext4 ext4dev ext5; do \
> >  		$(ES) "	LINK $(root_sbindir)/mkfs.$$i"; \
> >  		(cd $(DESTDIR)$(root_sbindir); \
> >  			$(LN) $(LINK_INSTALL_FLAGS) mke2fs mkfs.$$i); \
> > @@ -504,7 +504,7 @@ install: all $(SMANPAGES) $(UMANPAGES) installdirs
> >  	done
> >  	$(Q) $(RM) -f $(DESTDIR)$(man8dir)/mkfs.ext2.8.gz \
> >  		$(DESTDIR)$(man8dir)/mkfs.ext3.8.gz
> > -	$(Q) for i in ext2 ext3 ext4 ext4dev; do \
> > +	$(Q) for i in ext2 ext3 ext4 ext4dev ext5; do \
> >  		$(ES) "	LINK mkfs.$$i.8"; \
> >  		(cd $(DESTDIR)$(man8dir); \
> >  			$(LN) $(LINK_INSTALL_FLAGS) mke2fs.8 mkfs.$$i.8); \
> > @@ -580,7 +580,8 @@ uninstall:
> >  	$(RM) -f $(DESTDIR)$(root_sbindir)/mkfs.ext2 \
> >  			$(DESTDIR)$(root_sbindir)/mkfs.ext3 \
> >  			$(DESTDIR)$(root_sbindir)/mkfs.ext4 \
> > -			$(DESTDIR)$(root_sbindir)/mkfs.ext4dev
> > +			$(DESTDIR)$(root_sbindir)/mkfs.ext4dev \
> > +			$(DESTDIR)$(root_sbindir)/mkfs.ext5
> >  	for i in $(UPROGS); do \
> >  		$(RM) -f $(DESTDIR)$(bindir)/$$i; \
> >  	done
> > @@ -591,10 +592,12 @@ uninstall:
> >  		$(DESTDIR)$(man8dir)/mkfs.ext3.8 \
> >  		$(DESTDIR)$(man8dir)/mkfs.ext4.8 \
> >  		$(DESTDIR)$(man8dir)/mkfs.ext4dev.8 \
> > +		$(DESTDIR)$(man8dir)/mkfs.ext5.8 \
> >  		$(DESTDIR)$(man8dir)/fsck.ext2.8 \
> >  		$(DESTDIR)$(man8dir)/fsck.ext3.8 \
> >  		$(DESTDIR)$(man8dir)/fsck.ext4.8 \
> > -		$(DESTDIR)$(man8dir)/fsck.ext4dev.8
> > +		$(DESTDIR)$(man8dir)/fsck.ext4dev.8 \
> > +		$(DESTDIR)$(man8dir)/fsck.ext5.8
> >  
> >  	for i in $(UMANPAGES); do \
> >  		$(RM) -f $(DESTDIR)$(man1dir)/$$i; \
> > diff --git a/misc/mke2fs.c b/misc/mke2fs.c
> > index a794689..c810238 100644
> > --- a/misc/mke2fs.c
> > +++ b/misc/mke2fs.c
> > @@ -1915,6 +1915,36 @@ profile_error:
> >  		     &fs_param.s_feature_compat);
> >  	if (tmp)
> >  		free(tmp);
> > +
> > +	/* Add in ext5 options */
> > +	tmp = get_string_from_profile(fs_types, "interface", NULL);
> > +	if (tmp) {
> > +		if (!strcmp(tmp, "ext5"))
> > +			fs_param.s_minor_rev_level = EXT5_MINOR_REV_LEVEL;
> > +		else {
> > +			fprintf(stderr, _("Unknown interface `%s'.\n"), tmp);
> > +			exit(1);
> > +		}
> > +		free(tmp);
> > +	}
> > +	if (fs_param.s_minor_rev_level == EXT5_MINOR_REV_LEVEL) {
> > +		fs_param.s_feature_incompat = EXT5_FEATURE_INCOMPAT_REQD |
> > +			(fs_param.s_feature_incompat &
> > +			 ~EXT5_FEATURE_INCOMPAT_REQD_MASK);
> > +		fs_param.s_feature_ro_compat = EXT5_FEATURE_RO_COMPAT_REQD |
> > +			(fs_param.s_feature_ro_compat &
> > +			 ~EXT5_FEATURE_RO_COMPAT_REQD_MASK);
> > +		fs_param.s_feature_compat = EXT5_FEATURE_COMPAT_REQD |
> > +			(fs_param.s_feature_compat &
> > +			 ~EXT5_FEATURE_COMPAT_REQD_MASK);
> > +		fs_param.s_default_mount_opts = EXT5_DEF_MNTOPT |
> > +			(fs_param.s_default_mount_opts & ~EXT5_DEF_MNTOPT_MASK);
> > +		fs_param.s_rev_level = EXT2_DYNAMIC_REV;
> > +		if (r_opt < EXT2_DYNAMIC_REV)
> > +			r_opt = -1;
> > +		fs_param.s_inode_size = 256;
> > +	}
> > +
> >  	/*
> >  	 * If the user specified features incompatible with the Hurd, complain
> >  	 */
> > diff --git a/misc/mke2fs.conf.in b/misc/mke2fs.conf.in
> > index de0250d..94fd139 100644
> > --- a/misc/mke2fs.conf.in
> > +++ b/misc/mke2fs.conf.in
> > @@ -20,6 +20,10 @@
> >  		inode_size = 256
> >  		options = test_fs=1
> >  	}
> > +	ext5 = {
> > +		features = has_journal
> > +		interface = ext5
> > +	}
> >  	small = {
> >  		blocksize = 1024
> >  		inode_size = 128
> > diff --git a/misc/tune2fs.c b/misc/tune2fs.c
> > index 6571764..d3d6330 100644
> > --- a/misc/tune2fs.c
> > +++ b/misc/tune2fs.c
> > @@ -2406,6 +2406,26 @@ static int tune2fs_setup_tdb(const char *name, io_manager *io_ptr)
> >  	return retval;
> >  }
> >  
> > +static errcode_t update_minor_rev(ext2_filsys fs)
> > +{
> > +	if (fs->super->s_minor_rev_level != EXT5_MINOR_REV_LEVEL)
> > +		return 0;
> > +
> > +	if ((EXT5_FEATURE_COMPAT_REQD ^
> > +	     (fs->super->s_feature_compat & EXT5_FEATURE_COMPAT_REQD_MASK)) ||
> > +	    (EXT5_FEATURE_INCOMPAT_REQD ^
> > +	     (fs->super->s_feature_incompat & EXT5_FEATURE_INCOMPAT_REQD_MASK)) ||
> > +	    (EXT5_FEATURE_RO_COMPAT_REQD ^
> > +	     (fs->super->s_feature_ro_compat & EXT5_FEATURE_RO_COMPAT_REQD_MASK)) ||
> > +	    (EXT5_DEF_MNTOPT ^
> > +	     (fs->super->s_default_mount_opts & EXT5_DEF_MNTOPT_MASK))) {
> > +		fs->super->s_minor_rev_level = 0;
> > +		ext2fs_mark_super_dirty(fs);
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> >  int main(int argc, char **argv)
> >  {
> >  	errcode_t retval;
> > @@ -2659,6 +2679,9 @@ retry_open:
> >  		if (rc)
> >  			goto closefs;
> >  	}
> > +	rc = update_minor_rev(fs);
> > +	if (rc)
> > +		goto closefs;
> >  	if (extended_cmd) {
> >  		rc = parse_extended_opts(fs, extended_cmd);
> >  		if (rc)
> > diff --git a/tests/metadata-checksum-test.sh b/tests/metadata-checksum-test.sh
> > index a17bfd2..e51b1fa 100755
> > --- a/tests/metadata-checksum-test.sh
> > +++ b/tests/metadata-checksum-test.sh
> > @@ -190,6 +190,7 @@ cat > "${MKE2FS_CONFIG}" << ENDL
> >  	blocksize = 4096
> >  	inode_size = 256
> >  	inode_ratio = 16384
> > +	interface = ext5
> >  
> >  [fs_types]
> >  	ext4icsum_no_bv = {
> > @@ -200,6 +201,7 @@ cat > "${MKE2FS_CONFIG}" << ENDL
> >  		options = mmp_update_interval=5 #${RESIZE_PARAM}
> >  		lazy_itable_init = 1
> >  		cluster_size = $((BLK_SZ * 2))
> > +		interface = ext5
> >  	}
> >  	ext4icsum = {
> >  		features = has_journal,extent,huge_file,flex_bg,uninit_bg,dir_nlink,extra_isize,64bit$MKFS_OPTS
> > @@ -208,6 +210,7 @@ cat > "${MKE2FS_CONFIG}" << ENDL
> >  		options = mmp_update_interval=5 #${RESIZE_PARAM}
> >  		lazy_itable_init = 1
> >  		cluster_size = $((BLK_SZ * 2))
> > +		interface = ext5
> >  	}
> >  	ext4icsum_noresize = {
> >  		features = has_journal,extent,huge_file,flex_bg,uninit_bg,dir_nlink,extra_isize,64bit$MKFS_OPTS
> > @@ -216,6 +219,7 @@ cat > "${MKE2FS_CONFIG}" << ENDL
> >  		options = mmp_update_interval=5
> >  		lazy_itable_init = 1
> >  		cluster_size = $((BLK_SZ * 2))
> > +		interface = ext5
> >  	}
> >  	ext4icsum_hugefiles = {
> >  		features = has_journal,extent,huge_file,flex_bg,uninit_bg,dir_nlink,extra_isize,64bit$MKFS_OPTS
> > @@ -235,6 +239,7 @@ cat > "${MKE2FS_CONFIG}" << ENDL
> >  		hugefiles_digits = 4
> >  		hugefiles_size = 1G
> >  		num_hugefiles = 0
> > +		interface = ext5
> >  	}
> >  ENDL
> >  MKFS_OPTS=""
> > diff --git a/tests/t_mke2fs_ext5/expect b/tests/t_mke2fs_ext5/expect
> > new file mode 100644
> > index 0000000..87e1185
> > --- /dev/null
> > +++ b/tests/t_mke2fs_ext5/expect
> > @@ -0,0 +1,45 @@
> > +Filesystem volume name:   <none>
> > +Last mounted on:          <not available>
> > +Filesystem magic number:  0xEF53
> > +Filesystem revision #:    1 (dynamic)
> > +Filesystem minor rev #:   2 (ext5)
> > +Filesystem features:      ext_attr dir_index sparse_super2 filetype meta_bg extent 64bit flex_bg inline_data sparse_super large_file huge_file dir_nlink extra_isize metadata_csum
> > +Filesystem flags:         signed_directory_hash 
> > +Default mount options:    user_xattr acl block_validity
> > +Filesystem state:         clean
> > +Errors behavior:          Continue
> > +Filesystem OS type:       Linux
> > +Inode count:              64
> > +Block count:              128
> > +Reserved block count:     6
> > +Free blocks:              116
> > +Free inodes:              53
> > +First block:              0
> > +Block size:               4096
> > +Fragment size:            4096
> > +Group descriptor size:    64
> > +Blocks per group:         32768
> > +Fragments per group:      32768
> > +Inodes per group:         64
> > +Inode blocks per group:   4
> > +Flex block group size:    16
> > +Last mount time:          n/a
> > +Mount count:              0
> > +Maximum mount count:      -1
> > +Check interval:           0 (<none>)
> > +Lifetime writes:          5 kB
> > +Reserved blocks uid:      0 (user root)
> > +Reserved blocks gid:      0 (group root)
> > +First inode:              11
> > +Inode size:	          256
> > +Required extra isize:     28
> > +Desired extra isize:      28
> > +Default directory hash:   half_md4
> > +
> > +
> > +Group 0: (Blocks 0-127) [ITABLE_ZEROED]
> > +  Primary superblock at 0, Group descriptor at 1
> > +  Inode table at 34-37 (+34)
> > +  116 free blocks, 53 free inodes, 2 directories, 53 unused inodes
> > +  Free blocks: 7-17, 19-33, 38-127
> > +  Free inodes: 12-64
> > diff --git a/tests/t_mke2fs_ext5/script b/tests/t_mke2fs_ext5/script
> > new file mode 100755
> > index 0000000..9be9bf5
> > --- /dev/null
> > +++ b/tests/t_mke2fs_ext5/script
> > @@ -0,0 +1,33 @@
> > +test_description="mke2fs with ext5"
> > +
> > +conf=$TMPFILE.conf
> > +
> > +cat > $conf << ENDL
> > +[defaults]
> > +	interface = ext5
> > +ENDL
> > +
> > +trap "rm -rf $TMPFILE $TMPFILE.conf" EXIT INT QUIT
> > +dd if=/dev/zero of=$TMPFILE bs=1k count=512 > /dev/null 2>&1
> > +OUT=$test_name.log
> > +EXP=$test_dir/expect
> > +rm -rf $OUT
> > +
> > +# Test command line option
> > +MKE2FS_CONFIG=$TMPFILE.conf
> > +export MKE2FS_CONFIG
> > +$MKE2FS -F $TMPFILE > /dev/null 2>&1
> > +$DUMPE2FS $TMPFILE | egrep -v "(Filesystem UUID|Filesystem created|Last write time|Last checked|Directory Hash Seed|Checksum| csum )" >> $OUT
> > +
> > +cmp -s $OUT $EXP
> > +status=$?
> > +
> > +if [ "$status" = 0 ] ; then
> > +	echo "$test_name: $test_description: ok"
> > +	touch $test_name.ok
> > +else
> > +	echo "$test_name: $test_description: failed"
> > +	diff $DIFF_OPTS $EXP $OUT > $test_name.failed
> > +	rm -f $test_name.tmp
> > +fi
> > +
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > 

^ permalink raw reply	[flat|nested] 91+ messages in thread
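
Editorial aside on the patch quoted above: check_ext5_fs(), update_minor_rev() and the mke2fs hunk all seem to rest on the same two idioms. One is a masked XOR test ("does this field deviate from the required set?"), the other a masked overwrite that forces the required bits and leaves everything outside the mask alone. A minimal sketch follows. The DEMO_* flag values and the demo_*() helpers are invented for illustration; they are not the real EXT5_* definitions or any e2fsprogs API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bits, for illustration only. */
#define DEMO_FLAG_A	0x1u	/* must be set */
#define DEMO_FLAG_B	0x2u	/* must be set */
#define DEMO_FLAG_C	0x4u	/* covered by the mask, so must be clear */
#define DEMO_FLAG_D	0x8u	/* outside the mask; never examined */

#define DEMO_REQD_MASK	(DEMO_FLAG_A | DEMO_FLAG_B | DEMO_FLAG_C)
#define DEMO_REQD	(DEMO_FLAG_A | DEMO_FLAG_B)

/* Nonzero if any bit under the mask differs from its required value. */
static int demo_deviates(uint32_t field)
{
	return (DEMO_REQD ^ (field & DEMO_REQD_MASK)) != 0;
}

/* Force the masked bits to their required values; keep all other bits. */
static uint32_t demo_force(uint32_t field)
{
	/* The required set only makes sense as a subset of the mask. */
	assert((DEMO_REQD & ~DEMO_REQD_MASK) == 0);
	return DEMO_REQD | (field & ~DEMO_REQD_MASK);
}
```

A bit that appears in the mask but not in the required set (DEMO_FLAG_C here, in the role resize_inode plays in EXT5_FEATURE_COMPAT_REQD_MASK) is effectively forbidden, while bits outside the mask are left to the user.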

* Re: [PATCH 37/37] ext5: define new subtype to add features and reduce testing complexity
  2014-05-02 14:04     ` Theodore Ts'o
@ 2014-05-06  1:59       ` Darrick J. Wong
  0 siblings, 0 replies; 91+ messages in thread
From: Darrick J. Wong @ 2014-05-06  1:59 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: Lukáš Czerner, linux-ext4

On Fri, May 02, 2014 at 10:04:07AM -0400, Theodore Ts'o wrote:
> On Fri, May 02, 2014 at 11:45:25AM +0200, Lukáš Czerner wrote:
> > This is definitely a NACK from me. I do not like this and there are
> > several reasons why.
> > 
> > First of all, the name. Given the history of the ext file systems, we
> > tend to increase the number with each new version of the file system.
> > However, you're saying that this is just for testing features. In that
> > case it does not make any sense to call it ext5; worse, it's stupid to
> > call it ext5, especially since we might actually want to release ext5
> > in the future, and this would be really confusing for everybody
> > involved.
> 
> Yes, the messaging involved with the "ext3" vs "ext4" bump has been
> really unfortunate.  If I had to do it all over again, I would have
> created "ext3dev", and then when it was stable, I would have done a:
> 
> 	git rm -rf fs/ext3 ; git mv fs/ext3dev fs/ext4
> 
> For example, it would have avoided the problem with SuSE product
> managers refusing to support ext4 for multiple years, etc.
> 
> It also would have avoided the problem with people doing comparisons
> of ext3 versus xfs, even in April 2014 (see a recent Hacker News
> promoted blog article, wherein someone kvetched that ext3 didn't
> support fallocate).  Sigh....

We could still do that, delete ext3 once we think "ext4 + new
features" is stable enough.  It's not quite having only one extN
featureset in operation at a given time, but "a stable one" and "the
one we're working on" seems like plenty.

> > What about just simply using mke2fs.conf to specify the feature set
> > we want and use that?
> 
> Yes, it's likely that for 1.43 we'll enable various features by
> default.  It's been quite deliberate that I haven't enabled them by
> default, because I wanted to make 100% sure they were completely
> stable before enabling them by default.  Some of them we may have been

Maybe I should have called this 'ext5alpha' or something, just to see
if I could generate wider interest in testing.  I feel like these new
features are stable enough for some thorough testing, but they're a
pretty long way from 'completely stable'.

How widely are these features being tested?  I'm rather dismayed that
I still find plenty of bugs to stick in the patchbomb.  (Though I'm as
guilty as anyone else for contributing new features.)

> able to enable by default earlier, but be that as it may, 1.43 is a
> good time to make that change.

Hmm, I guess you were intending to update the mke2fs.conf definition of ext4,
then?

--D
> 
> 				- Ted
> 

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 11/37] e2fsck: fix the extended attribute checksum error message
  2014-05-05 23:08     ` Darrick J. Wong
@ 2014-05-06 10:12       ` Lukáš Czerner
  0 siblings, 0 replies; 91+ messages in thread
From: Lukáš Czerner @ 2014-05-06 10:12 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: tytso, linux-ext4

On Mon, 5 May 2014, Darrick J. Wong wrote:

> Date: Mon, 5 May 2014 16:08:29 -0700
> From: Darrick J. Wong <darrick.wong@oracle.com>
> To: Lukáš Czerner <lczerner@redhat.com>
> Cc: tytso@mit.edu, linux-ext4@vger.kernel.org
> Subject: Re: [PATCH 11/37] e2fsck: fix the extended attribute checksum error
>     message
> 
> On Fri, May 02, 2014 at 02:46:56PM +0200, Lukáš Czerner wrote:
> > On Thu, 1 May 2014, Darrick J. Wong wrote:
> > 
> > > Date: Thu, 01 May 2014 16:13:34 -0700
> > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > To: tytso@mit.edu, darrick.wong@oracle.com
> > > Cc: linux-ext4@vger.kernel.org
> > > Subject: [PATCH 11/37] e2fsck: fix the extended attribute checksum error
> > >     message
> > > 
> > > Make the "EA block passes checks but fails checksum" message less
> > > strange.
> > > 
> > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > ---
> > >  e2fsck/problem.c |   12 +++++-------
> > >  1 file changed, 5 insertions(+), 7 deletions(-)
> > > 
> > > 
> > > diff --git a/e2fsck/problem.c b/e2fsck/problem.c
> > > index 0999399..ec20bd1 100644
> > > --- a/e2fsck/problem.c
> > > +++ b/e2fsck/problem.c
> > > @@ -992,19 +992,17 @@ static struct e2fsck_problem problem_table[] = {
> > >  	     "extent\n\t(logical @b %c, @n physical @b %b, len %N)\n"),
> > >  	  PROMPT_FIX, 0 },
> > >  
> > > -	/* Extended attribute block checksum for inode does not match. */
> > > +	/* Extended attribute block checksum does not match. */
> > 
> > The "for inode" is still there in the message, so I do not think
> > there is a reason to remove it from the comment.
> 
> Oops.
> 
> > >  	{ PR_1_EA_BLOCK_CSUM_INVALID,
> > > -	  N_("Extended attribute @a @b %b checksum for @i %i does not "
> > > -	     "match.  "),
> > > +	  N_("@a @b %b checksum for @i %i does not match.  "),
> > >  	  PROMPT_CLEAR, PR_INITIAL_CSUM },
> > >  
> > >  	/*
> > > -	 * Extended attribute block passes checks, but checksum for inode does
> > > -	 * not match.
> > > +	 * Extended attribute block passes checks, but checksum does not
> > > +	 * match.
> > >  	 */
> > >  	{ PR_1_EA_BLOCK_ONLY_CSUM_INVALID,
> > > -	  N_("Extended attribute @a @b %b passes checks, but checksum for "
> > > -	     "@i %i does not match.  "),
> > > +	  N_("@a @b %b passes checks, but checksum does not match.  "),
> > 
> > Is there a reason to remove the inode number from the message ?
> 
> For whatever reason, I was confused by this message and thought it was
> referring to a checksum failure in the inode itself.  On the other hand, it's
> helpful to map an EA block back to an inode, so perhaps the message should be
> changed to:
> 
> "Inode XXX's extended attribute block YYY passes checks, but checksum does not
> match."

That sounds better, thanks!
-Lukas

> 
> Now that I look at the other metadata_csum checks, the failure message starts
> with "@i %i..." so these two might as well follow the convention.  Sorry that I
> seem to have strayed from it.
> 
> --D
> > 
> > Thanks!
> > -Lukas
> > 
> > >  	  PROMPT_FIX, 0 },
> > >  
> > >  	/*
> > > 
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> > > the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 10/37] e2fsck: verify checksums after checking everything else
  2014-05-05 22:56     ` Darrick J. Wong
@ 2014-05-06 11:32       ` Lukáš Czerner
  2014-05-08  0:05         ` Darrick J. Wong
  0 siblings, 1 reply; 91+ messages in thread
From: Lukáš Czerner @ 2014-05-06 11:32 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: tytso, linux-ext4

On Mon, 5 May 2014, Darrick J. Wong wrote:

> Date: Mon, 5 May 2014 15:56:47 -0700
> From: Darrick J. Wong <darrick.wong@oracle.com>
> To: Lukáš Czerner <lczerner@redhat.com>
> Cc: tytso@mit.edu, linux-ext4@vger.kernel.org
> Subject: Re: [PATCH 10/37] e2fsck: verify checksums after checking everything
>     else
> 
> On Fri, May 02, 2014 at 02:32:11PM +0200, Lukáš Czerner wrote:
> > On Thu, 1 May 2014, Darrick J. Wong wrote:
> > 
> > > Date: Thu, 01 May 2014 16:13:28 -0700
> > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > To: tytso@mit.edu, darrick.wong@oracle.com
> > > Cc: linux-ext4@vger.kernel.org
> > > Subject: [PATCH 10/37] e2fsck: verify checksums after checking everything else
> > > 
> > > There's a particular problem with e2fsck's user interface where
> > > checksum errors are concerned:  Fixing the first complaint about
> > > a checksum problem results in the inode being cleared even if e2fsck
> > > could otherwise have recovered it.  While this mode is useful for
> > > cleaning the remaining broken crud off the filesystem, we could at
> > > least default to checking everything /else/ and only complaining about
> > > the incorrect checksum if fsck finds nothing else wrong.
> > > 
> > > So, plumb in a config option.  We default to "verify and checksum"
> > > > unless the user tells us otherwise.
> > 
> > I wonder whether it would not be better to always check the checksum
> > of an object because it might yield additional information.
> > 
> > > If the checksum is good and the object is somewhat broken, then it's
> > > highly likely that we have a problem within the kernel (or possibly
> > > e2fsprogs if some other operations were performed).
> > 
> > If the checksum is bad and the object is bad, then it's likely that
> > the corruption happened outside of the file system code, in memory,
> > on disk or in transfer.
> > 
> > > If the checksum is bad and the object is good, then it's trickier,
> > > since it can be a kernel metadata csum bug, an unlucky silent
> > > corruption, or an intentional change of the metadata.
> > > 
> > > It's not a huge amount of information we can get from it, but I think
> > > that it might be useful when dealing with a corrupted file system.
> 
> Hm.  So right now, the object verification code works roughly like this:
> 
> A) Verify checksum, offer to zero object if strict_csums and csum failure.
> B) Check everything else and offer to fix broken things.
> C) Verify checksum again; if !strict_csums and csum failure, offer to zero the
>    object.
> 
> Do you think that it would be helpful to users if e2fsck warned of checksum
> verification failures during step (A) if strict_csums is set?  I think that
> would help users (or us developers) to distinguish those three scenarios.
> It wouldn't be difficult to make fix_problem() spit out the message.
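
Editorial aside: the A/B/C ordering described in the quoted text can be sketched as a small decision function. This is a simplification under stated assumptions, not the e2fsck code path: the demo_* names are invented, the two booleans stand in for the real checksum and sanity checks, and `strict` stands in for the strict_csums option.

```c
#include <stdbool.h>

/* Hypothetical outcomes, for illustration only. */
enum demo_action {
	DEMO_NOTHING,		/* nothing wrong with the object */
	DEMO_CLEAR_OBJECT,	/* offer to zero the object */
	DEMO_FIX_FIELDS,	/* offer to repair the broken fields */
};

/*
 * csum_ok:   does the stored checksum match the object?
 * fields_ok: do the regular (non-checksum) checks pass?
 * strict:    stand-in for the strict_csums option.
 */
enum demo_action demo_verify(bool csum_ok, bool fields_ok, bool strict)
{
	/* Step A: with strict_csums, a bad checksum is acted on first. */
	if (strict && !csum_ok)
		return DEMO_CLEAR_OBJECT;

	/* Step B: check everything else and offer to fix broken things. */
	if (!fields_ok)
		return DEMO_FIX_FIELDS;

	/* Step C: the object looks sane, so only the checksum can be wrong. */
	if (!csum_ok)
		return DEMO_CLEAR_OBJECT;

	return DEMO_NOTHING;
}
```

With `strict` false, a damaged object with a bad checksum reaches step B and gets a chance at repair; with `strict` true the same object is offered for clearing at step A.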

Yes, I think that this is going to be helpful to both users and
developers. I am not sure how easy or hard it would be, but it would
help to have e2fsck specifically say one of:

"Object checksum is corrupted, but the object seems fine"

or

"Object checksum is ok, but the object itself seems corrupted"

or

"Object checksum is corrupted and the object itself is corrupted"

after the checksum verification and object check.

But your solution would be useful as well.

Thanks!
-Lukas
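
Editorial aside: the three messages suggested above amount to a lookup on two independent results, the checksum verdict and the object-check verdict. A minimal sketch follows, assuming both verdicts are already available as booleans. demo_diagnosis() is a hypothetical helper, not an e2fsck function, and the strings are the wording proposed in this message rather than anything in problem.c.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Map the two verdicts to one of the suggested messages.
 * Returns NULL when both the checksum and the object are fine.
 */
const char *demo_diagnosis(bool csum_ok, bool object_ok)
{
	if (csum_ok && object_ok)
		return NULL;
	if (!csum_ok && object_ok)
		return "Object checksum is corrupted, but the object seems fine";
	if (csum_ok && !object_ok)
		return "Object checksum is ok, but the object itself seems corrupted";
	return "Object checksum is corrupted and the object itself is corrupted";
}
```

Per the reasoning earlier in the thread, the "checksum ok, object corrupted" case would point at the filesystem code itself, while "both corrupted" would point at damage outside it (memory, disk or transfer).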

> 
> --D
> > 
> > Thanks!
> > -Lukas
> > 
> > > 
> > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > ---
> > >  e2fsck/e2fsck.8.in      |   12 ++++++++++++
> > >  e2fsck/e2fsck.conf.5.in |   20 ++++++++++++++++++++
> > >  e2fsck/e2fsck.h         |    1 +
> > >  e2fsck/problem.c        |   18 ++++++++++++++----
> > >  e2fsck/problemP.h       |    1 +
> > >  e2fsck/unix.c           |   11 +++++++++++
> > >  6 files changed, 59 insertions(+), 4 deletions(-)
> > > 
> > > 
> > > diff --git a/e2fsck/e2fsck.8.in b/e2fsck/e2fsck.8.in
> > > index f5ed758..43ee063 100644
> > > --- a/e2fsck/e2fsck.8.in
> > > +++ b/e2fsck/e2fsck.8.in
> > > @@ -207,6 +207,18 @@ option may prevent you from further manual data recovery.
> > >  .BI nodiscard
> > >  Do not attempt to discard free blocks and unused inode blocks. This option is
> > >  exactly the opposite of discard option. This is set as default.
> > > +.TP
> > > +.BI strict_csums
> > > > +Verify each metadata object's checksum before checking any other fields
> > > +in the metadata object.  If the verification fails, offer to clear the item,
> > > +also before checking any of the other fields.  This option causes e2fsck to
> > > +favor throwing away broken objects over trying to salvage them.
> > > +.TP
> > > +.BI no_strict_csums
> > > +Perform all regular checks of a metadata object and only verify the checksum if
> > > +no problems were found.  This option causes e2fsck to try to salvage slightly
> > > +damaged metadata objects, at the cost of spending processing time on recovering
> > > +data.  This is set as the default.
> > >  .RE
> > >  .TP
> > >  .B \-f
> > > diff --git a/e2fsck/e2fsck.conf.5.in b/e2fsck/e2fsck.conf.5.in
> > > index 9ebfbbf..a8219a8 100644
> > > --- a/e2fsck/e2fsck.conf.5.in
> > > +++ b/e2fsck/e2fsck.conf.5.in
> > > @@ -222,6 +222,26 @@ If this boolean relation is true, e2fsck will run as if the option
> > >  .B -v
> > >  is always specified.  This will cause e2fsck to print some additional
> > >  information at the end of each full file system check.
> > > +.TP
> > > +.I strict_csums
> > > +If this boolean relation is true, e2fsck will run as if
> > > +.B -E strict_csums
> > > +is set.  This causes e2fsck to verify each metadata object's checksum before
> > > > +checking any other fields in the metadata object.  If the verification
> > > +fails, offer to clear the item, also before checking any of the other fields.
> > > +This option causes e2fsck to favor throwing away broken objects over trying to
> > > +salvage them.
> > > +.IP
> > > +If the boolean relation is false, e2fsck will run as if
> > > +.B -E no_strict_csums
> > > +is set.  In this case, e2fsck will perform all regular checks of a metadata
> > > +object and only verify the checksum if no problems were found.  This option
> > > +causes e2fsck to try to salvage slightly damaged metadata objects, at the cost
> > > +of spending processing time on recovering data.
> > > +.IP
> > > +The default is for e2fsck to behave as if
> > > +.B -E no_strict_csums
> > > +is set.
> > >  .SH THE [problems] STANZA
> > >  Each tag in the
> > >  .I [problems] 
> > > diff --git a/e2fsck/e2fsck.h b/e2fsck/e2fsck.h
> > > index dbd6ea8..d7a7be9 100644
> > > --- a/e2fsck/e2fsck.h
> > > +++ b/e2fsck/e2fsck.h
> > > @@ -167,6 +167,7 @@ struct resource_track {
> > >  #define E2F_OPT_FRAGCHECK	0x0800
> > >  #define E2F_OPT_JOURNAL_ONLY	0x1000 /* only replay the journal */
> > >  #define E2F_OPT_DISCARD		0x2000
> > > +#define E2F_OPT_CSUM_FIRST	0x4000
> > >  
> > >  /*
> > >   * E2fsck flags
> > > diff --git a/e2fsck/problem.c b/e2fsck/problem.c
> > > index 7f0ad6c..0999399 100644
> > > --- a/e2fsck/problem.c
> > > +++ b/e2fsck/problem.c
> > > @@ -970,7 +970,7 @@ static struct e2fsck_problem problem_table[] = {
> > >  	/* inode checksum does not match inode */
> > >  	{ PR_1_INODE_CSUM_INVALID,
> > >  	  N_("@i %i checksum does not match @i.  "),
> > > -	  PROMPT_CLEAR, PR_PREEN_OK },
> > > +	  PROMPT_CLEAR, PR_PREEN_OK | PR_INITIAL_CSUM },
> > >  
> > >  	/* inode passes checks, but checksum does not match inode */
> > >  	{ PR_1_INODE_ONLY_CSUM_INVALID,
> > > @@ -981,7 +981,7 @@ static struct e2fsck_problem problem_table[] = {
> > >  	{ PR_1_EXTENT_CSUM_INVALID,
> > >  	  N_("@i %i extent block checksum does not match extent\n\t(logical @b "
> > >  	     "%c, @n physical @b %b, len %N)\n"),
> > > -	  PROMPT_CLEAR, 0 },
> > > +	  PROMPT_CLEAR, PR_INITIAL_CSUM },
> > >  
> > >  	/*
> > >  	 * Inode extent block passes checks, but checksum does not match
> > > @@ -996,7 +996,7 @@ static struct e2fsck_problem problem_table[] = {
> > >  	{ PR_1_EA_BLOCK_CSUM_INVALID,
> > >  	  N_("Extended attribute @a @b %b checksum for @i %i does not "
> > >  	     "match.  "),
> > > -	  PROMPT_CLEAR, 0 },
> > > +	  PROMPT_CLEAR, PR_INITIAL_CSUM },
> > >  
> > >  	/*
> > >  	 * Extended attribute block passes checks, but checksum for inode does
> > > @@ -1470,7 +1470,7 @@ static struct e2fsck_problem problem_table[] = {
> > >  	/* leaf node fails checksum */
> > >  	{ PR_2_LEAF_NODE_CSUM_INVALID,
> > >  	  N_("@d @i %i, %B, offset %N: @d fails checksum\n"),
> > > -	  PROMPT_SALVAGE, PR_PREEN_OK },
> > > +	  PROMPT_SALVAGE, PR_PREEN_OK | PR_INITIAL_CSUM },
> > >  
> > >  	/* leaf node has no checksum */
> > >  	{ PR_2_LEAF_NODE_MISSING_CSUM,
> > > @@ -1944,6 +1944,16 @@ int fix_problem(e2fsck_t ctx, problem_t code, struct problem_context *pctx)
> > >  		printf(_("Unhandled error code (0x%x)!\n"), code);
> > >  		return 0;
> > >  	}
> > > +
> > > +	/*
> > > +	 * If there is a problem with the initial csum verification and the
> > > +	 * user told e2fsck to verify csums /after/ checking everything else,
> > > +	 * then don't "fix" anything.
> > > +	 */
> > > +	if ((ptr->flags & PR_INITIAL_CSUM) &&
> > > +	    !(ctx->options & E2F_OPT_CSUM_FIRST))
> > > +		return 0;
> > > +
> > >  	if (!(ptr->flags & PR_CONFIG)) {
> > >  		char	key[9], *new_desc = NULL;
> > >  
> > > diff --git a/e2fsck/problemP.h b/e2fsck/problemP.h
> > > index 7944cd6..a983598 100644
> > > --- a/e2fsck/problemP.h
> > > +++ b/e2fsck/problemP.h
> > > @@ -44,3 +44,4 @@ struct latch_descr {
> > >  #define PR_CONFIG	0x080000 /* This problem has been customized
> > >  				    from the config file */
> > >  #define PR_FORCE_NO	0x100000 /* Force the answer to be no */
> > > +#define PR_INITIAL_CSUM	0x200000 /* User can ignore initial csum check */
> > > diff --git a/e2fsck/unix.c b/e2fsck/unix.c
> > > index b39383d..c6cdb49 100644
> > > --- a/e2fsck/unix.c
> > > +++ b/e2fsck/unix.c
> > > @@ -692,6 +692,10 @@ static void parse_extended_opts(e2fsck_t ctx, const char *opts)
> > >  			else
> > >  				ctx->log_fn = string_copy(ctx, arg, 0);
> > >  			continue;
> > > +		} else if (strcmp(token, "strict_csums") == 0) {
> > > +			ctx->options |= E2F_OPT_CSUM_FIRST;
> > > +		} else if (strcmp(token, "no_strict_csums") == 0) {
> > > +			ctx->options &= ~E2F_OPT_CSUM_FIRST;
> > >  		} else {
> > >  			fprintf(stderr, _("Unknown extended option: %s\n"),
> > >  				token);
> > > @@ -710,6 +714,8 @@ static void parse_extended_opts(e2fsck_t ctx, const char *opts)
> > >  		fputs(("\tjournal_only\n"), stderr);
> > >  		fputs(("\tdiscard\n"), stderr);
> > >  		fputs(("\tnodiscard\n"), stderr);
> > > +		fputs(("\tstrict_csums\n"), stderr);
> > > +		fputs(("\tno_strict_csums\n"), stderr);
> > >  		fputc('\n', stderr);
> > >  		exit(1);
> > >  	}
> > > @@ -945,6 +951,11 @@ static errcode_t PRS(int argc, char *argv[], e2fsck_t *ret_ctx)
> > >  	profile_set_syntax_err_cb(syntax_err_report);
> > >  	profile_init(config_fn, &ctx->profile);
> > >  
> > > +	profile_get_boolean(ctx->profile, "options", "strict_csums", NULL,
> > > +			    0, &c);
> > > +	if (c)
> > > +		ctx->options |= E2F_OPT_CSUM_FIRST;
> > > +
> > >  	profile_get_boolean(ctx->profile, "options", "report_time", 0, 0,
> > >  			    &c);
> > >  	if (c)
> > > 
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> > > the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > 
> 

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 05/37] debugfs: teach logdump to deal with 64bit revoke tables
  2014-05-05 22:23     ` Darrick J. Wong
@ 2014-05-06 11:35       ` Lukáš Czerner
  2014-05-12  1:20         ` Theodore Ts'o
  0 siblings, 1 reply; 91+ messages in thread
From: Lukáš Czerner @ 2014-05-06 11:35 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: tytso, linux-ext4

On Mon, 5 May 2014, Darrick J. Wong wrote:

> Date: Mon, 5 May 2014 15:23:33 -0700
> From: Darrick J. Wong <darrick.wong@oracle.com>
> To: Lukáš Czerner <lczerner@redhat.com>
> Cc: tytso@mit.edu, linux-ext4@vger.kernel.org
> Subject: Re: [PATCH 05/37] debugfs: teach logdump to deal with 64bit revoke
>     tables
> 
> On Fri, May 02, 2014 at 01:38:04PM +0200, Lukáš Czerner wrote:
> > On Thu, 1 May 2014, Darrick J. Wong wrote:
> > 
> > > Date: Thu, 01 May 2014 16:12:55 -0700
> > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > To: tytso@mit.edu, darrick.wong@oracle.com
> > > Cc: linux-ext4@vger.kernel.org
> > > Subject: [PATCH 05/37] debugfs: teach logdump to deal with 64bit revoke tables
> > > 
> > > The logdump command doesn't know how to deal with revoke tables in
> > > 64bit journals, so teach it to do this.
> > > 
> > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > ---
> > >  debugfs/logdump.c          |   20 ++++-
> > >  tests/f_jnl_64bit/expect.0 |  171 --------------------------------------------
> > >  2 files changed, 15 insertions(+), 176 deletions(-)
> > > 
> > > 
> > > diff --git a/debugfs/logdump.c b/debugfs/logdump.c
> > > index 2d0efaf..8b9dc5b 100644
> > > --- a/debugfs/logdump.c
> > > +++ b/debugfs/logdump.c
> > > @@ -526,28 +526,38 @@ static void dump_revoke_block(FILE *out_file, char *buf,
> > >  {
> > >  	int			offset, max;
> > >  	journal_revoke_header_t *header;
> > > -	unsigned int		*entry, rblock;
> > > +	unsigned int		*entry;
> > > +	unsigned long long	*bentry, rblock;
> > > +	int			tag_size = sizeof(*entry);
> > >  
> > >  	if (dump_all)
> > >  		fprintf(out_file, "Dumping revoke block, sequence %u, at "
> > >  			"block %u:\n", transaction, blocknr);
> > >  
> > > +	if (be32_to_cpu(jsb->s_feature_incompat) & JFS_FEATURE_INCOMPAT_64BIT)
> > > +		tag_size = sizeof(*bentry);
> > > +
> > >  	header = (journal_revoke_header_t *) buf;
> > >  	offset = sizeof(journal_revoke_header_t);
> > >  	max = be32_to_cpu(header->r_count);
> > >  
> > >  	while (offset < max) {
> > > -		entry = (unsigned int *) (buf + offset);
> > > -		rblock = be32_to_cpu(*entry);
> > > +		if (tag_size == sizeof(*entry)) {
> > > +			entry = (unsigned int *) (buf + offset);
> > > +			rblock = be32_to_cpu(*entry);
> > > +		} else {
> > > +			bentry = (unsigned long long *)(buf + offset);
> > > +			rblock = ext2fs_be64_to_cpu(*bentry);
> > > +		}
> > 
> > I wonder whether we really need to have bentry and entry since those
> > are just pointers and should be of the same size regardless of what
> > they are pointing at.
> > 
> > Wouldn't it be better from the readability point of view? Otherwise it looks
> > good.
> 
> One could eliminate the local variables by writing it as such:
> 
> if (...)
> 	rblock = be32_to_cpu(*((__u32 *)(buf + offset)));
> else
> 	rblock = ext2fs_be64_to_cpu(*((__u64 *)(buf + offset)));
> 
> The parentheses are a little harder to figure out in the second version, but I
> don't have a strong opinion either way.

Yes, this seems better to me. But I do not have an especially strong
opinion either.

Thanks!
-Lukas
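
For reference, the decoding discussed above can be sketched without either
pair of pointer variables or casts. This is only an illustration, not code
from the patch: the helper names (load_be32, load_be64, read_revoke_entry)
are hypothetical, and the buffer is decoded byte by byte so that alignment
and host endianness do not matter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode a big-endian 32-bit value one byte at a time. */
static uint32_t load_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Decode a big-endian 64-bit value as two 32-bit halves, high half first. */
static uint64_t load_be64(const unsigned char *p)
{
	return ((uint64_t)load_be32(p) << 32) | load_be32(p + 4);
}

/*
 * Hypothetical helper: return the revoked block number stored at
 * buf + offset, reading 8 bytes for a 64bit journal and 4 bytes otherwise.
 */
static uint64_t read_revoke_entry(const unsigned char *buf, size_t offset,
				  int is_64bit)
{
	assert(buf != NULL);
	return is_64bit ? load_be64(buf + offset) : load_be32(buf + offset);
}
```

Reading an 8-byte entry as two 4-byte entries yields a zero followed by the
real block number, which would be consistent with the interleaved
"Revoke FS block 0" lines that the patch removes from expect.0.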

> 
> --D
> > 
> > Thanks!
> > -Lukas
> > 
> > >  		if (dump_all || rblock == block_to_dump) {
> > > -			fprintf(out_file, "  Revoke FS block %u", rblock);
> > > +			fprintf(out_file, "  Revoke FS block %llu", rblock);
> > >  			if (dump_all)
> > >  				fprintf(out_file, "\n");
> > >  			else
> > >  				fprintf(out_file," at block %u, sequence %u\n",
> > >  					blocknr, transaction);
> > >  		}
> > > -		offset += 4;
> > > +		offset += tag_size;
> > >  	}
> > >  }
> > >  
> > > diff --git a/tests/f_jnl_64bit/expect.0 b/tests/f_jnl_64bit/expect.0
> > > index 2007f03..5cef2d8 100644
> > > --- a/tests/f_jnl_64bit/expect.0
> > > +++ b/tests/f_jnl_64bit/expect.0
> > > @@ -1,189 +1,97 @@
> > >  Journal starts at block 67, transaction 32
> > >  Found expected sequence 32, type 5 (revoke table) at block 67
> > >  Dumping revoke block, sequence 32, at block 67:
> > > -  Revoke FS block 0
> > >    Revoke FS block 1536
> > > -  Revoke FS block 0
> > >    Revoke FS block 1472
> > > -  Revoke FS block 0
> > >    Revoke FS block 1473
> > > -  Revoke FS block 0
> > >    Revoke FS block 1474
> > > -  Revoke FS block 0
> > >    Revoke FS block 1475
> > > -  Revoke FS block 0
> > >    Revoke FS block 1476
> > > -  Revoke FS block 0
> > >    Revoke FS block 1541
> > > -  Revoke FS block 0
> > >    Revoke FS block 1477
> > > -  Revoke FS block 0
> > >    Revoke FS block 1478
> > > -  Revoke FS block 0
> > >    Revoke FS block 1479
> > > -  Revoke FS block 0
> > >    Revoke FS block 1480
> > > -  Revoke FS block 0
> > >    Revoke FS block 1481
> > > -  Revoke FS block 0
> > >    Revoke FS block 1482
> > > -  Revoke FS block 0
> > >    Revoke FS block 1483
> > > -  Revoke FS block 0
> > >    Revoke FS block 1484
> > > -  Revoke FS block 0
> > >    Revoke FS block 1485
> > > -  Revoke FS block 0
> > >    Revoke FS block 1486
> > > -  Revoke FS block 0
> > >    Revoke FS block 1487
> > > -  Revoke FS block 0
> > >    Revoke FS block 1488
> > > -  Revoke FS block 0
> > >    Revoke FS block 1489
> > > -  Revoke FS block 0
> > >    Revoke FS block 1490
> > > -  Revoke FS block 0
> > >    Revoke FS block 1491
> > > -  Revoke FS block 0
> > >    Revoke FS block 1556
> > > -  Revoke FS block 0
> > >    Revoke FS block 1492
> > > -  Revoke FS block 0
> > >    Revoke FS block 1493
> > > -  Revoke FS block 0
> > >    Revoke FS block 1429
> > > -  Revoke FS block 0
> > >    Revoke FS block 1494
> > > -  Revoke FS block 0
> > >    Revoke FS block 1495
> > > -  Revoke FS block 0
> > >    Revoke FS block 1496
> > > -  Revoke FS block 0
> > >    Revoke FS block 1432
> > > -  Revoke FS block 0
> > >    Revoke FS block 1497
> > > -  Revoke FS block 0
> > >    Revoke FS block 1498
> > > -  Revoke FS block 0
> > >    Revoke FS block 1434
> > > -  Revoke FS block 0
> > >    Revoke FS block 1499
> > > -  Revoke FS block 0
> > >    Revoke FS block 1435
> > > -  Revoke FS block 0
> > >    Revoke FS block 1500
> > > -  Revoke FS block 0
> > >    Revoke FS block 1501
> > > -  Revoke FS block 0
> > >    Revoke FS block 1502
> > > -  Revoke FS block 0
> > >    Revoke FS block 1503
> > > -  Revoke FS block 0
> > >    Revoke FS block 1504
> > > -  Revoke FS block 0
> > >    Revoke FS block 1505
> > > -  Revoke FS block 0
> > >    Revoke FS block 1506
> > > -  Revoke FS block 0
> > >    Revoke FS block 1442
> > > -  Revoke FS block 0
> > >    Revoke FS block 1507
> > > -  Revoke FS block 0
> > >    Revoke FS block 1508
> > > -  Revoke FS block 0
> > >    Revoke FS block 1444
> > > -  Revoke FS block 0
> > >    Revoke FS block 1509
> > > -  Revoke FS block 0
> > >    Revoke FS block 1445
> > > -  Revoke FS block 0
> > >    Revoke FS block 1510
> > > -  Revoke FS block 0
> > >    Revoke FS block 1511
> > > -  Revoke FS block 0
> > >    Revoke FS block 1512
> > > -  Revoke FS block 0
> > >    Revoke FS block 1513
> > > -  Revoke FS block 0
> > >    Revoke FS block 1449
> > > -  Revoke FS block 0
> > >    Revoke FS block 1514
> > > -  Revoke FS block 0
> > >    Revoke FS block 1515
> > > -  Revoke FS block 0
> > >    Revoke FS block 1516
> > > -  Revoke FS block 0
> > >    Revoke FS block 1517
> > > -  Revoke FS block 0
> > >    Revoke FS block 1453
> > > -  Revoke FS block 0
> > >    Revoke FS block 1518
> > > -  Revoke FS block 0
> > >    Revoke FS block 1519
> > > -  Revoke FS block 0
> > >    Revoke FS block 1520
> > > -  Revoke FS block 0
> > >    Revoke FS block 1456
> > > -  Revoke FS block 0
> > >    Revoke FS block 1521
> > > -  Revoke FS block 0
> > >    Revoke FS block 1457
> > > -  Revoke FS block 0
> > >    Revoke FS block 1522
> > > -  Revoke FS block 0
> > >    Revoke FS block 1458
> > > -  Revoke FS block 0
> > >    Revoke FS block 1523
> > > -  Revoke FS block 0
> > >    Revoke FS block 1459
> > > -  Revoke FS block 0
> > >    Revoke FS block 1524
> > > -  Revoke FS block 0
> > >    Revoke FS block 1460
> > > -  Revoke FS block 0
> > >    Revoke FS block 1525
> > > -  Revoke FS block 0
> > >    Revoke FS block 1461
> > > -  Revoke FS block 0
> > >    Revoke FS block 1526
> > > -  Revoke FS block 0
> > >    Revoke FS block 1462
> > > -  Revoke FS block 0
> > >    Revoke FS block 1527
> > > -  Revoke FS block 0
> > >    Revoke FS block 1463
> > > -  Revoke FS block 0
> > >    Revoke FS block 1528
> > > -  Revoke FS block 0
> > >    Revoke FS block 1464
> > > -  Revoke FS block 0
> > >    Revoke FS block 1529
> > > -  Revoke FS block 0
> > >    Revoke FS block 1465
> > > -  Revoke FS block 0
> > >    Revoke FS block 1530
> > > -  Revoke FS block 0
> > >    Revoke FS block 1466
> > > -  Revoke FS block 0
> > >    Revoke FS block 1531
> > > -  Revoke FS block 0
> > >    Revoke FS block 1467
> > > -  Revoke FS block 0
> > >    Revoke FS block 1532
> > > -  Revoke FS block 0
> > >    Revoke FS block 1468
> > > -  Revoke FS block 0
> > >    Revoke FS block 1533
> > > -  Revoke FS block 0
> > >    Revoke FS block 1469
> > > -  Revoke FS block 0
> > >    Revoke FS block 1534
> > > -  Revoke FS block 0
> > >    Revoke FS block 1470
> > > -  Revoke FS block 0
> > >    Revoke FS block 1535
> > > -  Revoke FS block 0
> > >    Revoke FS block 1471
> > >  Found expected sequence 32, type 1 (descriptor block) at block 68
> > >  Dumping descriptor block, sequence 32, at block 68:
> > > @@ -323,163 +231,84 @@ Dumping descriptor block, sequence 32, at block 150:
> > >  Found expected sequence 32, type 2 (commit block) at block 201
> > >  Found expected sequence 33, type 5 (revoke table) at block 202
> > >  Dumping revoke block, sequence 33, at block 202:
> > > -  Revoke FS block 0
> > >    Revoke FS block 1600
> > > -  Revoke FS block 0
> > >    Revoke FS block 1601
> > > -  Revoke FS block 0
> > >    Revoke FS block 1537
> > > -  Revoke FS block 0
> > >    Revoke FS block 1602
> > > -  Revoke FS block 0
> > >    Revoke FS block 1538
> > > -  Revoke FS block 0
> > >    Revoke FS block 1603
> > > -  Revoke FS block 0
> > >    Revoke FS block 1539
> > > -  Revoke FS block 0
> > >    Revoke FS block 1604
> > > -  Revoke FS block 0
> > >    Revoke FS block 1540
> > > -  Revoke FS block 0
> > >    Revoke FS block 1605
> > > -  Revoke FS block 0
> > >    Revoke FS block 1606
> > > -  Revoke FS block 0
> > >    Revoke FS block 1542
> > > -  Revoke FS block 0
> > >    Revoke FS block 1607
> > > -  Revoke FS block 0
> > >    Revoke FS block 1543
> > > -  Revoke FS block 0
> > >    Revoke FS block 1608
> > > -  Revoke FS block 0
> > >    Revoke FS block 1544
> > > -  Revoke FS block 0
> > >    Revoke FS block 1609
> > > -  Revoke FS block 0
> > >    Revoke FS block 1545
> > > -  Revoke FS block 0
> > >    Revoke FS block 1610
> > > -  Revoke FS block 0
> > >    Revoke FS block 1546
> > > -  Revoke FS block 0
> > >    Revoke FS block 1611
> > > -  Revoke FS block 0
> > >    Revoke FS block 1547
> > > -  Revoke FS block 0
> > >    Revoke FS block 1612
> > > -  Revoke FS block 0
> > >    Revoke FS block 1548
> > > -  Revoke FS block 0
> > >    Revoke FS block 1613
> > > -  Revoke FS block 0
> > >    Revoke FS block 1549
> > > -  Revoke FS block 0
> > >    Revoke FS block 1614
> > > -  Revoke FS block 0
> > >    Revoke FS block 1550
> > > -  Revoke FS block 0
> > >    Revoke FS block 1615
> > > -  Revoke FS block 0
> > >    Revoke FS block 1551
> > > -  Revoke FS block 0
> > >    Revoke FS block 1616
> > > -  Revoke FS block 0
> > >    Revoke FS block 1552
> > > -  Revoke FS block 0
> > >    Revoke FS block 1617
> > > -  Revoke FS block 0
> > >    Revoke FS block 1553
> > > -  Revoke FS block 0
> > >    Revoke FS block 1554
> > > -  Revoke FS block 0
> > >    Revoke FS block 1555
> > > -  Revoke FS block 0
> > >    Revoke FS block 1557
> > > -  Revoke FS block 0
> > >    Revoke FS block 1558
> > > -  Revoke FS block 0
> > >    Revoke FS block 1559
> > > -  Revoke FS block 0
> > >    Revoke FS block 1560
> > > -  Revoke FS block 0
> > >    Revoke FS block 1561
> > > -  Revoke FS block 0
> > >    Revoke FS block 1562
> > > -  Revoke FS block 0
> > >    Revoke FS block 1563
> > > -  Revoke FS block 0
> > >    Revoke FS block 1564
> > > -  Revoke FS block 0
> > >    Revoke FS block 1565
> > > -  Revoke FS block 0
> > >    Revoke FS block 1566
> > > -  Revoke FS block 0
> > >    Revoke FS block 1567
> > > -  Revoke FS block 0
> > >    Revoke FS block 1568
> > > -  Revoke FS block 0
> > >    Revoke FS block 1569
> > > -  Revoke FS block 0
> > >    Revoke FS block 1570
> > > -  Revoke FS block 0
> > >    Revoke FS block 1571
> > > -  Revoke FS block 0
> > >    Revoke FS block 1572
> > > -  Revoke FS block 0
> > >    Revoke FS block 1573
> > > -  Revoke FS block 0
> > >    Revoke FS block 1574
> > > -  Revoke FS block 0
> > >    Revoke FS block 1575
> > > -  Revoke FS block 0
> > >    Revoke FS block 1576
> > > -  Revoke FS block 0
> > >    Revoke FS block 1577
> > > -  Revoke FS block 0
> > >    Revoke FS block 1578
> > > -  Revoke FS block 0
> > >    Revoke FS block 1579
> > > -  Revoke FS block 0
> > >    Revoke FS block 1580
> > > -  Revoke FS block 0
> > >    Revoke FS block 1581
> > > -  Revoke FS block 0
> > >    Revoke FS block 1582
> > > -  Revoke FS block 0
> > >    Revoke FS block 1583
> > > -  Revoke FS block 0
> > >    Revoke FS block 1584
> > > -  Revoke FS block 0
> > >    Revoke FS block 1585
> > > -  Revoke FS block 0
> > >    Revoke FS block 1586
> > > -  Revoke FS block 0
> > >    Revoke FS block 1587
> > > -  Revoke FS block 0
> > >    Revoke FS block 1588
> > > -  Revoke FS block 0
> > >    Revoke FS block 1589
> > > -  Revoke FS block 0
> > >    Revoke FS block 1590
> > > -  Revoke FS block 0
> > >    Revoke FS block 1591
> > > -  Revoke FS block 0
> > >    Revoke FS block 1592
> > > -  Revoke FS block 0
> > >    Revoke FS block 1593
> > > -  Revoke FS block 0
> > >    Revoke FS block 1594
> > > -  Revoke FS block 0
> > >    Revoke FS block 1595
> > > -  Revoke FS block 0
> > >    Revoke FS block 1596
> > > -  Revoke FS block 0
> > >    Revoke FS block 1597
> > > -  Revoke FS block 0
> > >    Revoke FS block 1598
> > > -  Revoke FS block 0
> > >    Revoke FS block 1599
> > >  Found expected sequence 33, type 1 (descriptor block) at block 203
> > >  Dumping descriptor block, sequence 33, at block 203:
> > > 
> 

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 37/37] ext5: define new subtype to add features and reduce testing complexity
  2014-05-06  1:33     ` Darrick J. Wong
@ 2014-05-06 12:50       ` Lukáš Czerner
  2014-05-06 15:21         ` Theodore Ts'o
  0 siblings, 1 reply; 91+ messages in thread
From: Lukáš Czerner @ 2014-05-06 12:50 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: tytso, linux-ext4

On Mon, 5 May 2014, Darrick J. Wong wrote:

> Date: Mon, 5 May 2014 18:33:17 -0700
> From: Darrick J. Wong <darrick.wong@oracle.com>
> To: Lukáš Czerner <lczerner@redhat.com>
> Cc: tytso@mit.edu, linux-ext4@vger.kernel.org
> Subject: Re: [PATCH 37/37] ext5: define new subtype to add features and reduce
>      testing complexity
> 
> On Fri, May 02, 2014 at 11:45:25AM +0200, Lukáš Czerner wrote:
> > On Thu, 1 May 2014, Darrick J. Wong wrote:
> > 
> > > Date: Thu, 01 May 2014 16:16:29 -0700
> > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > To: tytso@mit.edu, darrick.wong@oracle.com
> > > Cc: linux-ext4@vger.kernel.org
> > > Subject: [PATCH 37/37] ext5: define new subtype to add features and reduce
> > >     testing complexity
> > > 
> > > This patch defines ext5 as a set of required feature flags and mount
> > > options, for the purpose of spreading new features to freshly
> > > formatted filesystems and reducing the testing matrix by disabling
> > > nearly all mount options.  The patch uses the s_minor_rev_level field
> > > to indicate the existence of ext5, and switch on feature/mount option
> > > enforcement in the kernel.
> > > 
> > > The required feature set is:
> > > ^resize_inode,dirindex,ext_attr,sparse_super2,filetype,meta_bg,extents,
> > > ^flex_bg,64bit,inline_data,sparse_super,huge_file,large_file,dir_nlink,
> > > extra_isize,metadata_csum
> > > 
> > > The required mount options are:
> > > acl,block_validity,user_xattr,journal_checksum
> > > 
> > > All other mount options are no longer functional.
> > > 
> > > The 'ext4' type remains unchanged, for people who require mount
> > > options or a different feature set.  I don't intend to fork any code;
> > > I'm just painting a bigger target (for testing).
> > 
> > This is definitely NACK by me. I do not like this and there are
> > several reasons why.
> > 
> > First of all the name. Given the history of ext file system we tend
> > to increase the number with the new version of file system. However
> > you're saying that this is just for testing features ... in that
> > case it does not make any sense to call it ext5, but not just that
> > it's stupid to call it ext5 especially since we might actually want
> > to release ext5 in the future and this would be really confusing for
> > everybody involved.
> 
> I should have been clearer about my aim for "ext5" -- I want to define
> ext5 to be "ext4 + some new features - some mount options", and then
> work on stabilizing those features.  Historically, we've defined each
> extN to be ext(N-1) + more features, and that's what I'm doing here
> too.  ext5 would be a real release, with new features and fewer mount
> options.  The comment about reducing testing was merely a reflection
> upon the side effects of locking down some of the feature flags and
> mount options.
> 
> I don't think it's a good idea to change what features you get with
> 'mke2fs -T ext4' since that hasn't changed since ~2008 or so.
> 
> Maybe I should have called it ext5dev and killed off ext4dev.
> 
> > I've been trying to get rid of the ext4dev bits and pieces
> > more-or-less successfully and you're adding new type once again. We
> > might start the discussion whether to revive ext4dev for this kind
> > of thing but I am not really convinced that this is the right way to
> > go either.
> > 
> > What about just simply using mke2fs.conf to specify the feature set
> > we want and use that ? It's simple enough and it should work. We
> > could also extend the configuration to be able to set default
> > mount options and such if that's not possible. I just do not understand
> > why to introduce new file system type if that's just for testing
> > ext4 features.
> 
> Well, yes, I could just create a new fs_types stanza in mke2fs.conf.
> I wanted to put a little more teeth in that and actually have the
> kernel and e2fsck be able to check that a FS has been declared as
> 'ext5' and that all the required bits are really there, hence the
> ability to set s_minor_rev_level.  I'm not really married to going
> that far, though.
> 
> (There's already an interface for specifying some of the default mount
> options in the superblock; that was sufficient for me.)

"ext5 would be a real release"

This is the most important information that was somewhat hidden in
the original post. If that's the case we should have the discussion
whether we want to release ext5 in reasonably near future.


Let's see what the differences from ext4 are, feature-wise:

 + 64bit
 + meta_bg
 + sparse_super2
 + inline_data
 + metadata_csum
 - resize_inode

64bit - This is something I've proposed enabling by default for ext4
	already
	(http://www.spinics.net/lists/linux-ext4/msg42294.html) as
	this is a logical step and not really a huge change.
	This also implies disabling resize_inode.

meta_bg - Just makes group descriptors to be spread across the file
	system. It has been around for some time and I am not sure
	why this is not a default already. It should also increase
	the limit of the file system size but I am not sure whether
	this is still true with flex_bg ?
	This does not work together with resize_inode but that's
	true for 64bit as well. So I think that this should be
	default on ext4 or do we have any concerns about this one ?
	Not a big change anyway.

sparse_super2 -
	Limits the number of backup superblocks even more than
	sparse_super. This generally does not bring anything
	useful. It allows us to have a more flexible layout for
	specialized devices such as SMR. I do not think there is a
	reason for this to be a default, but we can use it on those
	specialized devices which should be determined at mkfs time.
	On the other hand this is really small change to the format
	and I would not strongly object to this being the
	default, but it does not bring us anything by itself.

inline_data -
	I think that this is only really useful if we're using
	bigalloc feature, or if we have really big inodes, otherwise
	it does not bring anything useful. I do not think we want
	this as a default but rather having it enabled at mkfs time
	simultaneously with bigalloc and possibly with big inodes.

metadata_csum -
	That's the biggest feature, which is the most significant
	from the user perspective and changes the behaviour of the
	file system. Eventually yes, I think that this should be
	enabled by default because indeed we're in the business of
	keeping user data safe. However is this a reason to release
	a new file system ? I definitely do not think so.

Please correct me if I forgot about something.

So my conclusion is that it's not at all worth releasing a new file
system for this set of features, only one of which is actually
significant.

Yes, if we were to change the file system format significantly, like
let's say get rid of bitmaps and replace it with b-trees following a
huge change in the allocator, then I think it would be worthwhile to
make this big step and start with a new file system.

As far as mount options are concerned I do not think there is
anything significant either. And releasing a new file system just to
get rid of the mount options does not seem like the best approach
either :)

Thanks!
-Lukas

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 37/37] ext5: define new subtype to add features and reduce testing complexity
  2014-05-06 12:50       ` Lukáš Czerner
@ 2014-05-06 15:21         ` Theodore Ts'o
  2014-05-06 15:30           ` Lukáš Czerner
  0 siblings, 1 reply; 91+ messages in thread
From: Theodore Ts'o @ 2014-05-06 15:21 UTC (permalink / raw)
  To: Lukáš Czerner; +Cc: Darrick J. Wong, linux-ext4

On Tue, May 06, 2014 at 02:50:39PM +0200, Lukáš Czerner wrote:
> meta_bg - Just makes group descriptors to be spread across the file
> 	system. It has been around for some time and I am not sure
> 	why this is not a default already. It should also increase
> 	the limit of the file system size but I am not sure whether
> 	this is still true with flex_bg ?

meta_bg significantly slows down mount operations (and in general, any
operation where we need to read in the block group descriptors ---
i.e., dumpe2fs, e2fsck, etc.)

The strategy for meta_bg is that it's something that we enable as we
need it, as part of an online or off-line resize.  That way, we keep
the block groups contiguous for as long as possible.  Once the resize
inode has been exhausted (which _will_ happen when the file system
size grows beyond 16T), the resize operation will turn off the
resize_inode feature and then enable the meta_bg feature.

And this is all working today, with the latest kernel and e2fsprogs;
so there's no reason to enable meta_bg as part of mke2fs operation,
and a good reason not to enable it by default, but to let resize2fs
turn it on when it makes sense to do so.

        	    		       - Ted
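
A rough sense of the scale involved can be had from a little arithmetic.
The sketch below is only an estimate under stated assumptions: each block
group covers 8 * block_size blocks (one bitmap block per group), the first
data block offset is ignored, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Integer division rounding up. */
static uint64_t div_round_up(uint64_t a, uint64_t b)
{
	assert(b != 0);
	return (a + b - 1) / b;
}

/* Number of block groups, assuming 8 * block_size blocks per group. */
static uint64_t demo_group_count(uint64_t fs_bytes, uint32_t block_size)
{
	uint64_t blocks = fs_bytes / block_size;
	uint64_t blocks_per_group = 8ULL * block_size;

	return div_round_up(blocks, blocks_per_group);
}

/* Number of blocks needed to hold one descriptor per group. */
static uint64_t demo_desc_blocks(uint64_t groups, uint32_t block_size,
				 uint32_t desc_size)
{
	uint64_t descs_per_block = block_size / desc_size;

	return div_round_up(groups, descs_per_block);
}
```

For a 16 TiB file system with 4 KiB blocks this gives 131072 groups, and
with 64-byte descriptors 2048 descriptor blocks. As I understand it, with
meta_bg those blocks are spread over the device instead of sitting in one
contiguous run, so reading them all means that many scattered reads.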

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 37/37] ext5: define new subtype to add features and reduce testing complexity
  2014-05-06 15:21         ` Theodore Ts'o
@ 2014-05-06 15:30           ` Lukáš Czerner
  0 siblings, 0 replies; 91+ messages in thread
From: Lukáš Czerner @ 2014-05-06 15:30 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: Darrick J. Wong, linux-ext4

On Tue, 6 May 2014, Theodore Ts'o wrote:

> Date: Tue, 6 May 2014 11:21:29 -0400
> From: Theodore Ts'o <tytso@mit.edu>
> To: Lukáš Czerner <lczerner@redhat.com>
> Cc: Darrick J. Wong <darrick.wong@oracle.com>, linux-ext4@vger.kernel.org
> Subject: Re: [PATCH 37/37] ext5: define new subtype to add features and reduce
>      testing complexity
> 
> On Tue, May 06, 2014 at 02:50:39PM +0200, Lukáš Czerner wrote:
> > meta_bg - Just makes group descriptors to be spread across the file
> > 	system. It has been around for some time and I am not sure
> > 	why this is not a default already. It should also increase
> > 	the limit of the file system size but I am not sure whether
> > 	this is still true with flex_bg ?
> 
> meta_bg significantly slows down mount operations (and in general, any
> operation where we need to read in the block group descriptors ---
> i.e., dumpe2fs, e2fsck, etc.)
> 
> The strategy for meta_bg is that it's something that we enable as we
> need it, as part of an online or off-line resize.  That way, we keep
> the block groups contiguous for as long as possible.  Once the resize
> inode has been exhausted (which _will_ happen when the file system
> size grows beyond 16T), the resize operation will turn off the
> resize_inode feature and then enable the meta_bg feature.
> 
> And this is all working today, with the latest kernel and e2fsprogs;
> so there's no reason to enable meta_bg as part of mke2fs operation,
> and a good reason not to enable it by default, but to let resize2fs
> turn it on when it makes sense to do so.
> 
>         	    		       - Ted

Perfect, thanks for explanation. I was not very sure about meta_bg
myself.

Thanks!
-Lukas

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 16/37] libext2fs: support allocating uninit blocks in bmap2()
  2014-05-01 23:14 ` [PATCH 16/37] libext2fs: support allocating uninit blocks in bmap2() Darrick J. Wong
@ 2014-05-06 15:45   ` Lukáš Czerner
  2014-05-06 19:59     ` Darrick J. Wong
  0 siblings, 1 reply; 91+ messages in thread
From: Lukáš Czerner @ 2014-05-06 15:45 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: tytso, linux-ext4

On Thu, 1 May 2014, Darrick J. Wong wrote:

> Date: Thu, 01 May 2014 16:14:07 -0700
> From: Darrick J. Wong <darrick.wong@oracle.com>
> To: tytso@mit.edu, darrick.wong@oracle.com
> Cc: linux-ext4@vger.kernel.org
> Subject: [PATCH 16/37] libext2fs: support allocating uninit blocks in bmap2()
> 
> In order to support fallocate, we need to be able to have
> ext2fs_bmap2() allocate blocks and put them into uninitialized
> extents.  There's a flag to do this in the extent code, but it's not
> exposed to the bmap2 interface, so plumb that in.  Eventually fuse2fs
> or somebody will use it.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  lib/ext2fs/bmap.c      |   24 ++++++++++++++++++++++--
>  lib/ext2fs/ext2fs.h    |    1 +
>  lib/ext2fs/mkjournal.c |   17 +++++++++++++++++
>  3 files changed, 40 insertions(+), 2 deletions(-)
> 
> 
> diff --git a/lib/ext2fs/bmap.c b/lib/ext2fs/bmap.c
> index c1d0e6f..a4dc8ef 100644
> --- a/lib/ext2fs/bmap.c
> +++ b/lib/ext2fs/bmap.c
> @@ -72,6 +72,11 @@ static _BMAP_INLINE_ errcode_t block_ind_bmap(ext2_filsys fs, int flags,
>  					    block_buf + fs->blocksize, &b);
>  		if (retval)
>  			return retval;
> +		if (flags & BMAP_UNINIT) {
> +			retval = ext2fs_zero_blocks2(fs, b, 1, NULL, NULL);
> +			if (retval)
> +				return retval;
> +		}
>  
>  #ifdef WORDS_BIGENDIAN
>  		((blk_t *) block_buf)[nr] = ext2fs_swab32(b);
> @@ -214,10 +219,13 @@ static errcode_t extent_bmap(ext2_filsys fs, ext2_ino_t ino,
>  	errcode_t		retval = 0;
>  	blk64_t			blk64 = 0;
>  	int			alloc = 0;
> +	int			set_flags;
> +
> +	set_flags = bmap_flags & BMAP_UNINIT ? EXT2_EXTENT_SET_BMAP_UNINIT : 0;
>  
>  	if (bmap_flags & BMAP_SET) {
>  		retval = ext2fs_extent_set_bmap(handle, block,
> -						*phys_blk, 0);
> +						*phys_blk, set_flags);
>  		return retval;
>  	}
>  	retval = ext2fs_extent_goto(handle, block);
> @@ -254,7 +262,7 @@ got_block:
>  		alloc++;
>  	set_extent:
>  		retval = ext2fs_extent_set_bmap(handle, block,
> -						blk64, 0);
> +						blk64, set_flags);
>  		if (retval) {
>  			ext2fs_block_alloc_stats2(fs, blk64, -1);
>  			return retval;
> @@ -345,6 +353,12 @@ errcode_t ext2fs_bmap2(ext2_filsys fs, ext2_ino_t ino, struct ext2_inode *inode,
>  		goto done;
>  	}
>  
> +	if ((bmap_flags & BMAP_SET) && (bmap_flags & BMAP_UNINIT)) {
> +		retval = ext2fs_zero_blocks2(fs, *phys_blk, 1, NULL, NULL);
> +		if (retval)
> +			goto done;
> +	}
> +
>  	if (block < EXT2_NDIR_BLOCKS) {
>  		if (bmap_flags & BMAP_SET) {
>  			b = *phys_blk;
> @@ -360,6 +374,12 @@ errcode_t ext2fs_bmap2(ext2_filsys fs, ext2_ino_t ino, struct ext2_inode *inode,
>  			retval = ext2fs_alloc_block(fs, b, block_buf, &b);
>  			if (retval)
>  				goto done;
> +			if (bmap_flags & BMAP_UNINIT) {
> +				retval = ext2fs_zero_blocks2(fs, b, 1, NULL,
> +							     NULL);
> +				if (retval)
> +					goto done;
> +			}
>  			inode_bmap(inode, block) = b;
>  			blocks_alloc++;
>  			*phys_blk = b;
> diff --git a/lib/ext2fs/ext2fs.h b/lib/ext2fs/ext2fs.h
> index 599c972..819a14a 100644
> --- a/lib/ext2fs/ext2fs.h
> +++ b/lib/ext2fs/ext2fs.h
> @@ -527,6 +527,7 @@ typedef struct ext2_icount *ext2_icount_t;
>   */
>  #define BMAP_ALLOC	0x0001
>  #define BMAP_SET	0x0002
> +#define BMAP_UNINIT	0x0004
>  
>  /*
>   * Returned flags from ext2fs_bmap
> diff --git a/lib/ext2fs/mkjournal.c b/lib/ext2fs/mkjournal.c
> index 884d9c0..ecc3912 100644
> --- a/lib/ext2fs/mkjournal.c
> +++ b/lib/ext2fs/mkjournal.c
> @@ -174,6 +174,23 @@ errcode_t ext2fs_zero_blocks2(ext2_filsys fs, blk64_t blk, int num,
>  			return ENOMEM;
>  		memset(buf, 0, fs->blocksize * STRIDE_LENGTH);
>  	}
> +
> +	/* Try discard, if it zeroes data... */
> +	if (io_channel_discard_zeroes_data(fs->io)) {
> +		memset(buf + fs->blocksize, 0, fs->blocksize);
> +		retval = io_channel_discard(fs->io, blk, num);
> +		if (retval)
> +			goto skip_discard;
> +		retval = io_channel_read_blk64(fs->io, blk, 1, buf);
> +		if (retval)
> +			goto skip_discard;
> +		if (memcmp(buf, buf + fs->blocksize, fs->blocksize) == 0)
> +			return 0;
> +		/* Hah!  Discard doesn't zero! */
> +		fs->io->flags &= ~CHANNEL_FLAGS_DISCARD_ZEROES;
> +	}
> +skip_discard:

You did not mention that in the description, but this is actually a
problem. The reason is that discard might not be reliable on some
devices. This has been discussed several times and I am not the only
one who've seen that even if the device itself says that it will
return zeroes from discarded regions sometimes it might return data.

I would rather avoid this kind of optimization. However if the
underlying "device" is a loop device then it will be reliable if
it's supported. Also if the underlying "device" is an image then we
can just simply use punch hole.

Thanks!
-Lukas

> +
>  	/* OK, do the write loop */
>  	j=0;
>  	while (j < num) {
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 16/37] libext2fs: support allocating uninit blocks in bmap2()
  2014-05-06 15:45   ` Lukáš Czerner
@ 2014-05-06 19:59     ` Darrick J. Wong
  2014-05-07 10:02       ` Lukáš Czerner
  0 siblings, 1 reply; 91+ messages in thread
From: Darrick J. Wong @ 2014-05-06 19:59 UTC (permalink / raw)
  To: Lukáš Czerner; +Cc: tytso, linux-ext4

On Tue, May 06, 2014 at 05:45:01PM +0200, Lukáš Czerner wrote:
> On Thu, 1 May 2014, Darrick J. Wong wrote:
> 
> > Date: Thu, 01 May 2014 16:14:07 -0700
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > To: tytso@mit.edu, darrick.wong@oracle.com
> > Cc: linux-ext4@vger.kernel.org
> > Subject: [PATCH 16/37] libext2fs: support allocating uninit blocks in bmap2()
> > 
> > In order to support fallocate, we need to be able to have
> > ext2fs_bmap2() allocate blocks and put them into uninitialized
> > extents.  There's a flag to do this in the extent code, but it's not
> > exposed to the bmap2 interface, so plumb that in.  Eventually fuse2fs
> > or somebody will use it.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> >  lib/ext2fs/bmap.c      |   24 ++++++++++++++++++++++--
> >  lib/ext2fs/ext2fs.h    |    1 +
> >  lib/ext2fs/mkjournal.c |   17 +++++++++++++++++
> >  3 files changed, 40 insertions(+), 2 deletions(-)
> > 
> > 
> > diff --git a/lib/ext2fs/bmap.c b/lib/ext2fs/bmap.c
> > index c1d0e6f..a4dc8ef 100644
> > --- a/lib/ext2fs/bmap.c
> > +++ b/lib/ext2fs/bmap.c
> > @@ -72,6 +72,11 @@ static _BMAP_INLINE_ errcode_t block_ind_bmap(ext2_filsys fs, int flags,
> >  					    block_buf + fs->blocksize, &b);
> >  		if (retval)
> >  			return retval;
> > +		if (flags & BMAP_UNINIT) {
> > +			retval = ext2fs_zero_blocks2(fs, b, 1, NULL, NULL);
> > +			if (retval)
> > +				return retval;
> > +		}
> >  
> >  #ifdef WORDS_BIGENDIAN
> >  		((blk_t *) block_buf)[nr] = ext2fs_swab32(b);
> > @@ -214,10 +219,13 @@ static errcode_t extent_bmap(ext2_filsys fs, ext2_ino_t ino,
> >  	errcode_t		retval = 0;
> >  	blk64_t			blk64 = 0;
> >  	int			alloc = 0;
> > +	int			set_flags;
> > +
> > +	set_flags = bmap_flags & BMAP_UNINIT ? EXT2_EXTENT_SET_BMAP_UNINIT : 0;
> >  
> >  	if (bmap_flags & BMAP_SET) {
> >  		retval = ext2fs_extent_set_bmap(handle, block,
> > -						*phys_blk, 0);
> > +						*phys_blk, set_flags);
> >  		return retval;
> >  	}
> >  	retval = ext2fs_extent_goto(handle, block);
> > @@ -254,7 +262,7 @@ got_block:
> >  		alloc++;
> >  	set_extent:
> >  		retval = ext2fs_extent_set_bmap(handle, block,
> > -						blk64, 0);
> > +						blk64, set_flags);
> >  		if (retval) {
> >  			ext2fs_block_alloc_stats2(fs, blk64, -1);
> >  			return retval;
> > @@ -345,6 +353,12 @@ errcode_t ext2fs_bmap2(ext2_filsys fs, ext2_ino_t ino, struct ext2_inode *inode,
> >  		goto done;
> >  	}
> >  
> > +	if ((bmap_flags & BMAP_SET) && (bmap_flags & BMAP_UNINIT)) {
> > +		retval = ext2fs_zero_blocks2(fs, *phys_blk, 1, NULL, NULL);
> > +		if (retval)
> > +			goto done;
> > +	}
> > +
> >  	if (block < EXT2_NDIR_BLOCKS) {
> >  		if (bmap_flags & BMAP_SET) {
> >  			b = *phys_blk;
> > @@ -360,6 +374,12 @@ errcode_t ext2fs_bmap2(ext2_filsys fs, ext2_ino_t ino, struct ext2_inode *inode,
> >  			retval = ext2fs_alloc_block(fs, b, block_buf, &b);
> >  			if (retval)
> >  				goto done;
> > +			if (bmap_flags & BMAP_UNINIT) {
> > +				retval = ext2fs_zero_blocks2(fs, b, 1, NULL,
> > +							     NULL);
> > +				if (retval)
> > +					goto done;
> > +			}
> >  			inode_bmap(inode, block) = b;
> >  			blocks_alloc++;
> >  			*phys_blk = b;
> > diff --git a/lib/ext2fs/ext2fs.h b/lib/ext2fs/ext2fs.h
> > index 599c972..819a14a 100644
> > --- a/lib/ext2fs/ext2fs.h
> > +++ b/lib/ext2fs/ext2fs.h
> > @@ -527,6 +527,7 @@ typedef struct ext2_icount *ext2_icount_t;
> >   */
> >  #define BMAP_ALLOC	0x0001
> >  #define BMAP_SET	0x0002
> > +#define BMAP_UNINIT	0x0004
> >  
> >  /*
> >   * Returned flags from ext2fs_bmap
> > diff --git a/lib/ext2fs/mkjournal.c b/lib/ext2fs/mkjournal.c
> > index 884d9c0..ecc3912 100644
> > --- a/lib/ext2fs/mkjournal.c
> > +++ b/lib/ext2fs/mkjournal.c
> > @@ -174,6 +174,23 @@ errcode_t ext2fs_zero_blocks2(ext2_filsys fs, blk64_t blk, int num,
> >  			return ENOMEM;
> >  		memset(buf, 0, fs->blocksize * STRIDE_LENGTH);
> >  	}
> > +
> > +	/* Try discard, if it zeroes data... */
> > +	if (io_channel_discard_zeroes_data(fs->io)) {
> > +		memset(buf + fs->blocksize, 0, fs->blocksize);
> > +		retval = io_channel_discard(fs->io, blk, num);
> > +		if (retval)
> > +			goto skip_discard;
> > +		retval = io_channel_read_blk64(fs->io, blk, 1, buf);
> > +		if (retval)
> > +			goto skip_discard;
> > +		if (memcmp(buf, buf + fs->blocksize, fs->blocksize) == 0)
> > +			return 0;
> > +		/* Hah!  Discard doesn't zero! */
> > +		fs->io->flags &= ~CHANNEL_FLAGS_DISCARD_ZEROES;
> > +	}
> > +skip_discard:
> 
> You did not mention that in the description, but this is actually a
> problem. The reason is that discard might not be reliable on some
> devices. This has been discussed several times and I am not the only
> one who has seen that even if the device itself says that it will
> return zeroes from discarded regions, sometimes it might return data.

I agree that the storage not living up to the interface it advertises is a
problem, hence the verification step that will unset the io channel flag if it
finds that the device is lying.
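
A read-back check of that kind, extended to cover the whole range rather than
a single block, could look roughly like the sketch below.  It works on a plain
file descriptor instead of an io_channel, and range_is_zero() is a hypothetical
helper, not part of libext2fs.  A clean result only describes what the device
returned at the time of the read.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not a libext2fs call): read back [offset, offset+len)
 * from fd and report whether every byte is zero.
 * Returns 1 if the range is all zeroes, 0 if any byte is non-zero,
 * and -1 on a read error or if the file ends before the range does.
 */
static int range_is_zero(int fd, off_t offset, size_t len)
{
	unsigned char buf[4096];

	while (len > 0) {
		size_t want = len < sizeof(buf) ? len : sizeof(buf);
		ssize_t got = pread(fd, buf, want, offset);

		if (got < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		if (got == 0) {
			/* Hit EOF before the end of the requested range. */
			errno = EIO;
			return -1;
		}
		for (ssize_t i = 0; i < got; i++)
			if (buf[i] != 0)
				return 0;
		offset += got;
		len -= (size_t)got;
	}
	return 1;
}
```

A caller would treat anything other than 1 as "fall back to writing zeroes".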

On the other hand, I wonder if this ought to be abstracted away in an
io_channel_zero() call that takes care of figuring out if it can do a zeroing
discard or if it has to write a block of zeroes.

Or, are you worried that a discard and immediate re-read will appear to work,
but that a later re-read will return non-zero data?

> I would rather avoid this kind of optimization. However if the
> underlying "device" is a loop device then it will be reliable if
> it's supported. Also if the underlying "device" is an image then we
> can just simply use punch hole.

But static whitelisting is also problematic -- what if the storage device is an
AHCI (or virtio-scsi) disk in QEMU that's ultimately backed by a file that we
can punch_hole?  How do we distinguish that from an SSD hooked up to SATA
hardware?

In the QEMU-emulated AHCI case we ought to be able to do a zeroing discard, if
advertised.  I thought it was a reasonable compromise to trust that it works
and verify the results afterward.

--D
> 
> Thanks!
> -Lukas
> 
> > +
> >  	/* OK, do the write loop */
> >  	j=0;
> >  	while (j < num) {
> > 

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 16/37] libext2fs: support allocating uninit blocks in bmap2()
  2014-05-06 19:59     ` Darrick J. Wong
@ 2014-05-07 10:02       ` Lukáš Czerner
  2014-05-07 21:37         ` Darrick J. Wong
  0 siblings, 1 reply; 91+ messages in thread
From: Lukáš Czerner @ 2014-05-07 10:02 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: tytso, linux-ext4

On Tue, 6 May 2014, Darrick J. Wong wrote:

> Date: Tue, 6 May 2014 12:59:38 -0700
> From: Darrick J. Wong <darrick.wong@oracle.com>
> To: Lukáš Czerner <lczerner@redhat.com>
> Cc: tytso@mit.edu, linux-ext4@vger.kernel.org
> Subject: Re: [PATCH 16/37] libext2fs: support allocating uninit blocks in
>     bmap2()
> 
> On Tue, May 06, 2014 at 05:45:01PM +0200, Lukáš Czerner wrote:
> > On Thu, 1 May 2014, Darrick J. Wong wrote:
> > 
> > > Date: Thu, 01 May 2014 16:14:07 -0700
> > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > To: tytso@mit.edu, darrick.wong@oracle.com
> > > Cc: linux-ext4@vger.kernel.org
> > > Subject: [PATCH 16/37] libext2fs: support allocating uninit blocks in bmap2()
> > > 
> > > In order to support fallocate, we need to be able to have
> > > ext2fs_bmap2() allocate blocks and put them into uninitialized
> > > extents.  There's a flag to do this in the extent code, but it's not
> > > exposed to the bmap2 interface, so plumb that in.  Eventually fuse2fs
> > > or somebody will use it.
> > > 
> > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > ---
> > >  lib/ext2fs/bmap.c      |   24 ++++++++++++++++++++++--
> > >  lib/ext2fs/ext2fs.h    |    1 +
> > >  lib/ext2fs/mkjournal.c |   17 +++++++++++++++++
> > >  3 files changed, 40 insertions(+), 2 deletions(-)
> > > 
> > > 
> > > diff --git a/lib/ext2fs/bmap.c b/lib/ext2fs/bmap.c
> > > index c1d0e6f..a4dc8ef 100644
> > > --- a/lib/ext2fs/bmap.c
> > > +++ b/lib/ext2fs/bmap.c
> > > @@ -72,6 +72,11 @@ static _BMAP_INLINE_ errcode_t block_ind_bmap(ext2_filsys fs, int flags,
> > >  					    block_buf + fs->blocksize, &b);
> > >  		if (retval)
> > >  			return retval;
> > > +		if (flags & BMAP_UNINIT) {
> > > +			retval = ext2fs_zero_blocks2(fs, b, 1, NULL, NULL);
> > > +			if (retval)
> > > +				return retval;
> > > +		}
> > >  
> > >  #ifdef WORDS_BIGENDIAN
> > >  		((blk_t *) block_buf)[nr] = ext2fs_swab32(b);
> > > @@ -214,10 +219,13 @@ static errcode_t extent_bmap(ext2_filsys fs, ext2_ino_t ino,
> > >  	errcode_t		retval = 0;
> > >  	blk64_t			blk64 = 0;
> > >  	int			alloc = 0;
> > > +	int			set_flags;
> > > +
> > > +	set_flags = bmap_flags & BMAP_UNINIT ? EXT2_EXTENT_SET_BMAP_UNINIT : 0;
> > >  
> > >  	if (bmap_flags & BMAP_SET) {
> > >  		retval = ext2fs_extent_set_bmap(handle, block,
> > > -						*phys_blk, 0);
> > > +						*phys_blk, set_flags);
> > >  		return retval;
> > >  	}
> > >  	retval = ext2fs_extent_goto(handle, block);
> > > @@ -254,7 +262,7 @@ got_block:
> > >  		alloc++;
> > >  	set_extent:
> > >  		retval = ext2fs_extent_set_bmap(handle, block,
> > > -						blk64, 0);
> > > +						blk64, set_flags);
> > >  		if (retval) {
> > >  			ext2fs_block_alloc_stats2(fs, blk64, -1);
> > >  			return retval;
> > > @@ -345,6 +353,12 @@ errcode_t ext2fs_bmap2(ext2_filsys fs, ext2_ino_t ino, struct ext2_inode *inode,
> > >  		goto done;
> > >  	}
> > >  
> > > +	if ((bmap_flags & BMAP_SET) && (bmap_flags & BMAP_UNINIT)) {
> > > +		retval = ext2fs_zero_blocks2(fs, *phys_blk, 1, NULL, NULL);
> > > +		if (retval)
> > > +			goto done;
> > > +	}
> > > +
> > >  	if (block < EXT2_NDIR_BLOCKS) {
> > >  		if (bmap_flags & BMAP_SET) {
> > >  			b = *phys_blk;
> > > @@ -360,6 +374,12 @@ errcode_t ext2fs_bmap2(ext2_filsys fs, ext2_ino_t ino, struct ext2_inode *inode,
> > >  			retval = ext2fs_alloc_block(fs, b, block_buf, &b);
> > >  			if (retval)
> > >  				goto done;
> > > +			if (bmap_flags & BMAP_UNINIT) {
> > > +				retval = ext2fs_zero_blocks2(fs, b, 1, NULL,
> > > +							     NULL);
> > > +				if (retval)
> > > +					goto done;
> > > +			}
> > >  			inode_bmap(inode, block) = b;
> > >  			blocks_alloc++;
> > >  			*phys_blk = b;
> > > diff --git a/lib/ext2fs/ext2fs.h b/lib/ext2fs/ext2fs.h
> > > index 599c972..819a14a 100644
> > > --- a/lib/ext2fs/ext2fs.h
> > > +++ b/lib/ext2fs/ext2fs.h
> > > @@ -527,6 +527,7 @@ typedef struct ext2_icount *ext2_icount_t;
> > >   */
> > >  #define BMAP_ALLOC	0x0001
> > >  #define BMAP_SET	0x0002
> > > +#define BMAP_UNINIT	0x0004
> > >  
> > >  /*
> > >   * Returned flags from ext2fs_bmap
> > > diff --git a/lib/ext2fs/mkjournal.c b/lib/ext2fs/mkjournal.c
> > > index 884d9c0..ecc3912 100644
> > > --- a/lib/ext2fs/mkjournal.c
> > > +++ b/lib/ext2fs/mkjournal.c
> > > @@ -174,6 +174,23 @@ errcode_t ext2fs_zero_blocks2(ext2_filsys fs, blk64_t blk, int num,
> > >  			return ENOMEM;
> > >  		memset(buf, 0, fs->blocksize * STRIDE_LENGTH);
> > >  	}
> > > +
> > > +	/* Try discard, if it zeroes data... */
> > > +	if (io_channel_discard_zeroes_data(fs->io)) {
> > > +		memset(buf + fs->blocksize, 0, fs->blocksize);
> > > +		retval = io_channel_discard(fs->io, blk, num);
> > > +		if (retval)
> > > +			goto skip_discard;
> > > +		retval = io_channel_read_blk64(fs->io, blk, 1, buf);
> > > +		if (retval)
> > > +			goto skip_discard;
> > > +		if (memcmp(buf, buf + fs->blocksize, fs->blocksize) == 0)
> > > +			return 0;
> > > +		/* Hah!  Discard doesn't zero! */
> > > +		fs->io->flags &= ~CHANNEL_FLAGS_DISCARD_ZEROES;
> > > +	}
> > > +skip_discard:
> > 
> > You did not mention that in the description, but this is actually a
> > problem. The reason is that discard might not be reliable on some
> > devices. This has been discussed several times and I am not the only
> > one who've seen that even if the device itself says that it will
> > return zeroes from discarded regions sometimes it might return data.
> 
> I agree that the storage not living up to the interface it advertises is a
> problem, hence the verification step that will unset the io channel flag if it
> finds that the device is lying.
> 
> On the other hand, I wonder if this ought to be abstracted away in an
> io_channel_zero() call that takes care of figuring out if it can do a zeroing
> discard or if it has to write a block of zeroes.
> 
> Or, are you worried that a discard and immediate re-read will appear to work,
> but that a later re-read will return non-zero data?

Yes I am, because we know that it sometimes behaves unpredictably
and this is one of the things that might just happen. Even though I
have not seen this exact case I've seen the opposite where right
after discard I've read non-zero values but later it actually
returned zeroes.

So I would much rather not rely on discard here because you might
expose stale data on indirect files and there is no way to turn this
optimization off.

> 
> > I would rather avoid this kind of optimization. However if the
> > underlying "device" is a loop device then it will be reliable if
> > it's supported. Also if the underlying "device" is an image then we
> > can just simply use punch hole.
> 
> But static whitelisting is also problematic -- what if the storage device is an
> AHCI (or virtio-scsi) disk in QEMU that's ultimately backed by a file that we
> can punch_hole?  How do we distinguish that from an SSD hooked up to SATA
> hardware?

We do not. We can only do that if we know we're sitting on a file.
It is really unfortunate, but I think that there is a limitation in
how we can use discard.

However we could use write same, which should help on devices which
support it and on the fs images because QEMU will convert that to
zero range (at least on xfs since ext4 implementation is quite new).
However I have no idea what the interface to do that is.

-Lukas

> 
> In the QEMU-emulated AHCI case we ought to be able to do a zeroing discard, if
> advertised.  I thought it was a reasonable compromise to trust that it works
> and verify the results afterward.
> 
> --D
> > 
> > Thanks!
> > -Lukas
> > 
> > > +
> > >  	/* OK, do the write loop */
> > >  	j=0;
> > >  	while (j < num) {
> > > 
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> > > the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 16/37] libext2fs: support allocating uninit blocks in bmap2()
  2014-05-07 10:02       ` Lukáš Czerner
@ 2014-05-07 21:37         ` Darrick J. Wong
  2014-05-08  0:13           ` [PATCH 1/2] libext2fs: support BLKZEROOUT/FALLOC_FL_ZERO_RANGE in ext2fs_zero_blocks Darrick J. Wong
  2014-05-08  0:14           ` [PATCH 2/2] libext2fs: support allocating uninit blocks in bmap2() Darrick J. Wong
  0 siblings, 2 replies; 91+ messages in thread
From: Darrick J. Wong @ 2014-05-07 21:37 UTC (permalink / raw)
  To: Lukáš Czerner; +Cc: tytso, linux-ext4

On Wed, May 07, 2014 at 12:02:30PM +0200, Lukáš Czerner wrote:
> On Tue, 6 May 2014, Darrick J. Wong wrote:
> 
> > Date: Tue, 6 May 2014 12:59:38 -0700
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > To: Lukáš Czerner <lczerner@redhat.com>
> > Cc: tytso@mit.edu, linux-ext4@vger.kernel.org
> > Subject: Re: [PATCH 16/37] libext2fs: support allocating uninit blocks in
> >     bmap2()
> > 
> > On Tue, May 06, 2014 at 05:45:01PM +0200, Lukáš Czerner wrote:
> > > On Thu, 1 May 2014, Darrick J. Wong wrote:
> > > 
> > > > Date: Thu, 01 May 2014 16:14:07 -0700
> > > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > > To: tytso@mit.edu, darrick.wong@oracle.com
> > > > Cc: linux-ext4@vger.kernel.org
> > > > Subject: [PATCH 16/37] libext2fs: support allocating uninit blocks in bmap2()
> > > > 
> > > > In order to support fallocate, we need to be able to have
> > > > ext2fs_bmap2() allocate blocks and put them into uninitialized
> > > > extents.  There's a flag to do this in the extent code, but it's not
> > > > exposed to the bmap2 interface, so plumb that in.  Eventually fuse2fs
> > > > or somebody will use it.
> > > > 
> > > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > > ---
> > > >  lib/ext2fs/bmap.c      |   24 ++++++++++++++++++++++--
> > > >  lib/ext2fs/ext2fs.h    |    1 +
> > > >  lib/ext2fs/mkjournal.c |   17 +++++++++++++++++
> > > >  3 files changed, 40 insertions(+), 2 deletions(-)
> > > > 
> > > > 
> > > > diff --git a/lib/ext2fs/bmap.c b/lib/ext2fs/bmap.c
> > > > index c1d0e6f..a4dc8ef 100644
> > > > --- a/lib/ext2fs/bmap.c
> > > > +++ b/lib/ext2fs/bmap.c
> > > > @@ -72,6 +72,11 @@ static _BMAP_INLINE_ errcode_t block_ind_bmap(ext2_filsys fs, int flags,
> > > >  					    block_buf + fs->blocksize, &b);
> > > >  		if (retval)
> > > >  			return retval;
> > > > +		if (flags & BMAP_UNINIT) {
> > > > +			retval = ext2fs_zero_blocks2(fs, b, 1, NULL, NULL);
> > > > +			if (retval)
> > > > +				return retval;
> > > > +		}
> > > >  
> > > >  #ifdef WORDS_BIGENDIAN
> > > >  		((blk_t *) block_buf)[nr] = ext2fs_swab32(b);
> > > > @@ -214,10 +219,13 @@ static errcode_t extent_bmap(ext2_filsys fs, ext2_ino_t ino,
> > > >  	errcode_t		retval = 0;
> > > >  	blk64_t			blk64 = 0;
> > > >  	int			alloc = 0;
> > > > +	int			set_flags;
> > > > +
> > > > +	set_flags = bmap_flags & BMAP_UNINIT ? EXT2_EXTENT_SET_BMAP_UNINIT : 0;
> > > >  
> > > >  	if (bmap_flags & BMAP_SET) {
> > > >  		retval = ext2fs_extent_set_bmap(handle, block,
> > > > -						*phys_blk, 0);
> > > > +						*phys_blk, set_flags);
> > > >  		return retval;
> > > >  	}
> > > >  	retval = ext2fs_extent_goto(handle, block);
> > > > @@ -254,7 +262,7 @@ got_block:
> > > >  		alloc++;
> > > >  	set_extent:
> > > >  		retval = ext2fs_extent_set_bmap(handle, block,
> > > > -						blk64, 0);
> > > > +						blk64, set_flags);
> > > >  		if (retval) {
> > > >  			ext2fs_block_alloc_stats2(fs, blk64, -1);
> > > >  			return retval;
> > > > @@ -345,6 +353,12 @@ errcode_t ext2fs_bmap2(ext2_filsys fs, ext2_ino_t ino, struct ext2_inode *inode,
> > > >  		goto done;
> > > >  	}
> > > >  
> > > > +	if ((bmap_flags & BMAP_SET) && (bmap_flags & BMAP_UNINIT)) {
> > > > +		retval = ext2fs_zero_blocks2(fs, *phys_blk, 1, NULL, NULL);
> > > > +		if (retval)
> > > > +			goto done;
> > > > +	}
> > > > +
> > > >  	if (block < EXT2_NDIR_BLOCKS) {
> > > >  		if (bmap_flags & BMAP_SET) {
> > > >  			b = *phys_blk;
> > > > @@ -360,6 +374,12 @@ errcode_t ext2fs_bmap2(ext2_filsys fs, ext2_ino_t ino, struct ext2_inode *inode,
> > > >  			retval = ext2fs_alloc_block(fs, b, block_buf, &b);
> > > >  			if (retval)
> > > >  				goto done;
> > > > +			if (bmap_flags & BMAP_UNINIT) {
> > > > +				retval = ext2fs_zero_blocks2(fs, b, 1, NULL,
> > > > +							     NULL);
> > > > +				if (retval)
> > > > +					goto done;
> > > > +			}
> > > >  			inode_bmap(inode, block) = b;
> > > >  			blocks_alloc++;
> > > >  			*phys_blk = b;
> > > > diff --git a/lib/ext2fs/ext2fs.h b/lib/ext2fs/ext2fs.h
> > > > index 599c972..819a14a 100644
> > > > --- a/lib/ext2fs/ext2fs.h
> > > > +++ b/lib/ext2fs/ext2fs.h
> > > > @@ -527,6 +527,7 @@ typedef struct ext2_icount *ext2_icount_t;
> > > >   */
> > > >  #define BMAP_ALLOC	0x0001
> > > >  #define BMAP_SET	0x0002
> > > > +#define BMAP_UNINIT	0x0004
> > > >  
> > > >  /*
> > > >   * Returned flags from ext2fs_bmap
> > > > diff --git a/lib/ext2fs/mkjournal.c b/lib/ext2fs/mkjournal.c
> > > > index 884d9c0..ecc3912 100644
> > > > --- a/lib/ext2fs/mkjournal.c
> > > > +++ b/lib/ext2fs/mkjournal.c
> > > > @@ -174,6 +174,23 @@ errcode_t ext2fs_zero_blocks2(ext2_filsys fs, blk64_t blk, int num,
> > > >  			return ENOMEM;
> > > >  		memset(buf, 0, fs->blocksize * STRIDE_LENGTH);
> > > >  	}
> > > > +
> > > > +	/* Try discard, if it zeroes data... */
> > > > +	if (io_channel_discard_zeroes_data(fs->io)) {
> > > > +		memset(buf + fs->blocksize, 0, fs->blocksize);
> > > > +		retval = io_channel_discard(fs->io, blk, num);
> > > > +		if (retval)
> > > > +			goto skip_discard;
> > > > +		retval = io_channel_read_blk64(fs->io, blk, 1, buf);
> > > > +		if (retval)
> > > > +			goto skip_discard;
> > > > +		if (memcmp(buf, buf + fs->blocksize, fs->blocksize) == 0)
> > > > +			return 0;
> > > > +		/* Hah!  Discard doesn't zero! */
> > > > +		fs->io->flags &= ~CHANNEL_FLAGS_DISCARD_ZEROES;
> > > > +	}
> > > > +skip_discard:
> > > 
> > > You did not mention that in the description, but this is actually a
> > > problem. The reason is that discard might not be reliable on some
> > > devices. This has been discussed several times and I am not the only
> > > one who has seen that even if the device itself says that it will
> > > return zeroes from discarded regions, sometimes it might return data.
> > 
> > I agree that the storage not living up to the interface it advertises is a
> > problem, hence the verification step that will unset the io channel flag if it
> > finds that the device is lying.
> > 
> > On the other hand, I wonder if this ought to be abstracted away in an
> > io_channel_zero() call that takes care of figuring out if it can do a zeroing
> > discard or if it has to write a block of zeroes.
> > 
> > Or, are you worried that a discard and immediate re-read will appear to work,
> > but that a later re-read will return non-zero data?
> 
> Yes I am, because we know that it sometimes behaves unpredictably
> and this is one of the things that might just happen. Even though I
> have not seen this exact case I've seen the opposite where right
> after discard I've read non-zero values but later it actually
> returned zeroes.
> 
> So I would much rather not rely on discard here because you might
> expose stale data on indirect files and there is no way to turn this
> optimization off.

Fair enough.

> > 
> > > I would rather avoid this kind of optimization. However if the
> > > underlying "device" is a loop device then it will be reliable if
> > > it's supported. Also if the underlying "device" is an image then we
> > > can just simply use punch hole.
> > 
> > But static whitelisting is also problematic -- what if the storage device is an
> > AHCI (or virtio-scsi) disk in QEMU that's ultimately backed by a file that we
> > can punch_hole?  How do we distinguish that from an SSD hooked up to SATA
> > hardware?
> 
> We do not. We can only do that if we know we're sitting on a file.
> It is really unfortunate, but I think that there is a limitation in
> how we can use discard.
> 
> However we could use write same, which should help on devices which
> support it and on the fs images because QEMU will convert that to
> zero range (at least on xfs since ext4 implementation is quite new).
> However I have no idea what the interface to do that is.

Hrmm, I guess it would be the BLKZEROOUT ioctl for block devices?  Inside the
kernel it appears to be wired up to WRITE_SAME with a zero buffer or just a
regular WRITE with a lot of zero pages attached.  For regular files, punch hole
(or zero range) seems to be fine.  I think.

This ought to get moved into a separate IO manager routine.
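
As a rough sketch of what such a routine might do, assuming a Linux host and a
raw file descriptor: try BLKZEROOUT on block devices, FALLOC_FL_ZERO_RANGE or
punch hole on regular files, and fall back to writing zeroes.  zero_range_fd()
is a made-up name for illustration, not an existing libext2fs or IO manager
call, and the eventual interface may well look different.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* for fallocate() */
#endif
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>
#include <linux/falloc.h>	/* FALLOC_FL_ZERO_RANGE, FALLOC_FL_PUNCH_HOLE */
#include <linux/fs.h>		/* BLKZEROOUT */

/*
 * Hypothetical helper: make [offset, offset+len) on fd read back as zeroes.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int zero_range_fd(int fd, uint64_t offset, uint64_t len)
{
	static const char zeroes[65536];
	struct stat st;

	if (fstat(fd, &st) < 0)
		return -1;

	if (S_ISBLK(st.st_mode)) {
#ifdef BLKZEROOUT
		/* Offset and length must be aligned to the logical sector size. */
		uint64_t range[2] = { offset, len };

		if (ioctl(fd, BLKZEROOUT, range) == 0)
			return 0;
#endif
	} else if (S_ISREG(st.st_mode)) {
#ifdef FALLOC_FL_ZERO_RANGE
		if (fallocate(fd, FALLOC_FL_ZERO_RANGE,
			      (off_t)offset, (off_t)len) == 0)
			return 0;
#endif
#ifdef FALLOC_FL_PUNCH_HOLE
		/* Punching a hole never grows the file, so stay inside EOF. */
		if (offset + len <= (uint64_t)st.st_size &&
		    fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
			      (off_t)offset, (off_t)len) == 0)
			return 0;
#endif
	}

	/* Fallback: plain writes of a zero-filled buffer. */
	while (len > 0) {
		size_t chunk = len < sizeof(zeroes) ? (size_t)len : sizeof(zeroes);
		ssize_t n = pwrite(fd, zeroes, chunk, (off_t)offset);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		if (n == 0) {
			errno = EIO;
			return -1;
		}
		offset += (uint64_t)n;
		len -= (uint64_t)n;
	}
	return 0;
}
```

Note that none of these paths uses discard, so the result does not depend on
the device's discard-zeroes-data claim.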

--D
> 
> -Lukas
> 
> > 
> > In the QEMU-emulated AHCI case we ought to be able to do a zeroing discard, if
> > advertised.  I thought it was a reasonable compromise to trust that it works
> > and verify the results afterward.
> > 
> > --D
> > > 
> > > Thanks!
> > > -Lukas
> > > 
> > > > +
> > > >  	/* OK, do the write loop */
> > > >  	j=0;
> > > >  	while (j < num) {
> > > > 
> > > > --
> > > > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> > > > the body of a message to majordomo@vger.kernel.org
> > > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > 

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 10/37] e2fsck: verify checksums after checking everything else
  2014-05-06 11:32       ` Lukáš Czerner
@ 2014-05-08  0:05         ` Darrick J. Wong
  0 siblings, 0 replies; 91+ messages in thread
From: Darrick J. Wong @ 2014-05-08  0:05 UTC (permalink / raw)
  To: Lukáš Czerner; +Cc: tytso, linux-ext4

On Tue, May 06, 2014 at 01:32:32PM +0200, Lukáš Czerner wrote:
> On Mon, 5 May 2014, Darrick J. Wong wrote:
> 
> > Date: Mon, 5 May 2014 15:56:47 -0700
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > To: Lukáš Czerner <lczerner@redhat.com>
> > Cc: tytso@mit.edu, linux-ext4@vger.kernel.org
> > Subject: Re: [PATCH 10/37] e2fsck: verify checksums after checking everything
> >     else
> > 
> > On Fri, May 02, 2014 at 02:32:11PM +0200, Lukáš Czerner wrote:
> > > On Thu, 1 May 2014, Darrick J. Wong wrote:
> > > 
> > > > Date: Thu, 01 May 2014 16:13:28 -0700
> > > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > > To: tytso@mit.edu, darrick.wong@oracle.com
> > > > Cc: linux-ext4@vger.kernel.org
> > > > Subject: [PATCH 10/37] e2fsck: verify checksums after checking everything else
> > > > 
> > > > There's a particular problem with e2fsck's user interface where
> > > > checksum errors are concerned:  Fixing the first complaint about
> > > > a checksum problem results in the inode being cleared even if e2fsck
> > > > could otherwise have recovered it.  While this mode is useful for
> > > > cleaning the remaining broken crud off the filesystem, we could at
> > > > least default to checking everything /else/ and only complaining about
> > > > the incorrect checksum if fsck finds nothing else wrong.
> > > > 
> > > > So, plumb in a config option.  We default to "verify and checksum"
> > > > unless the user tell us otherwise.
> > > 
> > > I wonder whether it would not be better to always check the checksum
> > > of an object because it might yield additional information.
> > > 
> > > If the checksum is good and the object is somewhat broken then it's
> > > highly likely that we have a problem within a kernel (or possibly
> > > e2fsprogs if some other operations were performed)
> > > 
> > > If the checksum is bad and the object is bad, then it's likely that
> > > the corruption happened outside of the file system code, in memory,
> > > on disk or in transfer.
> > > 
> > > If checksum is bad and the object is good then it's trickier since it
> > > can be kernel metadata csum bug, unlucky silent corruption, or
> > > intentional change of the metadata.
> > > 
> > > It's not huge amount of information we can get from it, but I think
> > > that it might be useful when dealing with corrupted file system.
> > 
> > Hm.  So right now, the object verification code works roughly like this:
> > 
> > A) Verify checksum, offer to zero object if strict_csums and csum failure.
> > B) Check everything else and offer to fix broken things.
> > C) Verify checksum again; if !strict_csums and csum failure, offer to zero the
> >    object.
> > 
> > Do you think that it would be helpful to users if e2fsck warned of checksum
> > verification failures during step (A) if strict_csums is set?  I think that
> > would help users (or us developers) to distinguish those three scenarios.
> > It wouldn't be difficult to make fix_problem() spit out the message.
> 
> Yes, I think that this is going to be helpful to both, users and
> developers. I am not sure how easy or hard it would be but having
> e2fsck specifically say that:
> 
> "Object checksum is corrupted, but the object seems fine"
> 
> or
> 
> "Object checksum is ok, but the object itself seems corrupted"
> 
> or
> 
> "object checksum is corrupted and the object itself is corrupted"
> 
> after the checksum verification and object check.
> 
> But your solution would be useful as well.

Ok, I've changed the patch to spit out this; what do you think?

Pass 1: Checking inodes, blocks, and sizes
Inode 12 checksum does not match inode.  Running sanity checks.
Inode 12 passes checks, but checksum does not match inode.  Fix? yes
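
For illustration only, the scenarios discussed above boil down to a small
decision table over two results: whether the checksum verified and whether the
object passed the sanity checks.  The function name and message strings below
are hypothetical and are not what the patch adds to e2fsck/problem.c.

```c
#include <stdbool.h>

/*
 * Hypothetical classifier: map the two independent results (checksum
 * verification, sanity checks) onto the cases discussed in this thread.
 */
static const char *describe_object(bool csum_ok, bool object_ok)
{
	if (csum_ok && object_ok)
		return "checksum matches and object passes checks";
	if (csum_ok)
		return "checksum is ok, but the object itself seems corrupted";
	if (object_ok)
		return "object passes checks, but checksum does not match";
	return "checksum is corrupted and the object itself is corrupted";
}
```

The third case corresponds to the "passes checks, but checksum does not match"
message shown in the sample output.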

--D
---
From: Darrick J. Wong <darrick.wong@oracle.com>
Subject: [PATCH] e2fsck: verify checksums after checking everything else

There's a particular problem with e2fsck's user interface where
checksum errors are concerned:  Fixing the first complaint about
a checksum problem results in the inode being cleared even if e2fsck
could otherwise have recovered it.  While this mode is useful for
cleaning the remaining broken crud off the filesystem, we could at
least default to checking everything /else/ and only complaining about
the incorrect checksum if fsck finds nothing else wrong.

So, plumb in a config option.  We default to "verify and checksum"
unless the user tells us otherwise.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 e2fsck/e2fsck.8.in      |   12 ++++++++++++
 e2fsck/e2fsck.conf.5.in |   20 ++++++++++++++++++++
 e2fsck/e2fsck.h         |    1 +
 e2fsck/problem.c        |   25 +++++++++++++++++++++----
 e2fsck/problemP.h       |    1 +
 e2fsck/unix.c           |   11 +++++++++++
 6 files changed, 66 insertions(+), 4 deletions(-)

diff --git a/e2fsck/e2fsck.8.in b/e2fsck/e2fsck.8.in
index f5ed758..43ee063 100644
--- a/e2fsck/e2fsck.8.in
+++ b/e2fsck/e2fsck.8.in
@@ -207,6 +207,18 @@ option may prevent you from further manual data recovery.
 .BI nodiscard
 Do not attempt to discard free blocks and unused inode blocks. This option is
 exactly the opposite of discard option. This is set as default.
+.TP
+.BI strict_csums
+Verify each metadata object's checksum before checking any other fields
+in the metadata object.  If the verification fails, offer to clear the item,
+also before checking any of the other fields.  This option causes e2fsck to
+favor throwing away broken objects over trying to salvage them.
+.TP
+.BI no_strict_csums
+Perform all regular checks of a metadata object and only verify the checksum if
+no problems were found.  This option causes e2fsck to try to salvage slightly
+damaged metadata objects, at the cost of spending processing time on recovering
+data.  This is set as the default.
 .RE
 .TP
 .B \-f
diff --git a/e2fsck/e2fsck.conf.5.in b/e2fsck/e2fsck.conf.5.in
index 9ebfbbf..a8219a8 100644
--- a/e2fsck/e2fsck.conf.5.in
+++ b/e2fsck/e2fsck.conf.5.in
@@ -222,6 +222,26 @@ If this boolean relation is true, e2fsck will run as if the option
 .B -v
 is always specified.  This will cause e2fsck to print some additional
 information at the end of each full file system check.
+.TP
+.I strict_csums
+If this boolean relation is true, e2fsck will run as if
+.B -E strict_csums
+is set.  This causes e2fsck to verify each metadata object's checksum before
+checking any other fields in the metadata object.  If the verification
+fails, offer to clear the item, also before checking any of the other fields.
+This option causes e2fsck to favor throwing away broken objects over trying to
+salvage them.
+.IP
+If the boolean relation is false, e2fsck will run as if
+.B -E no_strict_csums
+is set.  In this case, e2fsck will perform all regular checks of a metadata
+object and only verify the checksum if no problems were found.  This option
+causes e2fsck to try to salvage slightly damaged metadata objects, at the cost
+of spending processing time on recovering data.
+.IP
+The default is for e2fsck to behave as if
+.B -E no_strict_csums
+is set.
 .SH THE [problems] STANZA
 Each tag in the
 .I [problems] 
diff --git a/e2fsck/e2fsck.h b/e2fsck/e2fsck.h
index dbd6ea8..d7a7be9 100644
--- a/e2fsck/e2fsck.h
+++ b/e2fsck/e2fsck.h
@@ -167,6 +167,7 @@ struct resource_track {
 #define E2F_OPT_FRAGCHECK	0x0800
 #define E2F_OPT_JOURNAL_ONLY	0x1000 /* only replay the journal */
 #define E2F_OPT_DISCARD		0x2000
+#define E2F_OPT_CSUM_FIRST	0x4000
 
 /*
  * E2fsck flags
diff --git a/e2fsck/problem.c b/e2fsck/problem.c
index 7f0ad6c..3683dd4 100644
--- a/e2fsck/problem.c
+++ b/e2fsck/problem.c
@@ -970,7 +970,7 @@ static struct e2fsck_problem problem_table[] = {
 	/* inode checksum does not match inode */
 	{ PR_1_INODE_CSUM_INVALID,
 	  N_("@i %i checksum does not match @i.  "),
-	  PROMPT_CLEAR, PR_PREEN_OK },
+	  PROMPT_CLEAR, PR_PREEN_OK | PR_INITIAL_CSUM },
 
 	/* inode passes checks, but checksum does not match inode */
 	{ PR_1_INODE_ONLY_CSUM_INVALID,
@@ -981,7 +981,7 @@ static struct e2fsck_problem problem_table[] = {
 	{ PR_1_EXTENT_CSUM_INVALID,
 	  N_("@i %i extent block checksum does not match extent\n\t(logical @b "
 	     "%c, @n physical @b %b, len %N)\n"),
-	  PROMPT_CLEAR, 0 },
+	  PROMPT_CLEAR, PR_INITIAL_CSUM },
 
 	/*
 	 * Inode extent block passes checks, but checksum does not match
@@ -996,7 +996,7 @@ static struct e2fsck_problem problem_table[] = {
 	{ PR_1_EA_BLOCK_CSUM_INVALID,
 	  N_("Extended attribute @a @b %b checksum for @i %i does not "
 	     "match.  "),
-	  PROMPT_CLEAR, 0 },
+	  PROMPT_CLEAR, PR_INITIAL_CSUM },
 
 	/*
 	 * Extended attribute block passes checks, but checksum for inode does
@@ -1470,7 +1470,7 @@ static struct e2fsck_problem problem_table[] = {
 	/* leaf node fails checksum */
 	{ PR_2_LEAF_NODE_CSUM_INVALID,
 	  N_("@d @i %i, %B, offset %N: @d fails checksum\n"),
-	  PROMPT_SALVAGE, PR_PREEN_OK },
+	  PROMPT_SALVAGE, PR_PREEN_OK | PR_INITIAL_CSUM },
 
 	/* leaf node has no checksum */
 	{ PR_2_LEAF_NODE_MISSING_CSUM,
@@ -2030,6 +2030,23 @@ int fix_problem(e2fsck_t ctx, problem_t code, struct problem_context *pctx)
 	}
 	if (ctx->logf && message)
 		print_e2fsck_message(ctx->logf, ctx, message, pctx, 1, 0);
+	/*
+	 * If there is a problem with the initial csum verification and the
+	 * user told e2fsck to verify csums /after/ checking everything else,
+	 * then don't "fix" anything, just warn the user that the csum failed
+	 * and that sanity checks are about to be run.
+	 */
+	if ((ptr->flags & PR_INITIAL_CSUM) &&
+	    !(ctx->options & E2F_OPT_CSUM_FIRST)) {
+		if (*message) {
+			print_e2fsck_message(stdout, ctx,
+				"Running sanity checks.\n", pctx, 1, 0);
+			if (ctx->logf)
+				print_e2fsck_message(ctx->logf, ctx,
+					"Running sanity checks.\n", pctx, 1, 0);
+		}
+		return 0;
+	}
 	if (!(ptr->flags & PR_PREEN_OK) && (ptr->prompt != PROMPT_NONE))
 		preenhalt(ctx);
 
diff --git a/e2fsck/problemP.h b/e2fsck/problemP.h
index 7944cd6..a983598 100644
--- a/e2fsck/problemP.h
+++ b/e2fsck/problemP.h
@@ -44,3 +44,4 @@ struct latch_descr {
 #define PR_CONFIG	0x080000 /* This problem has been customized
 				    from the config file */
 #define PR_FORCE_NO	0x100000 /* Force the answer to be no */
+#define PR_INITIAL_CSUM	0x200000 /* User can ignore initial csum check */
diff --git a/e2fsck/unix.c b/e2fsck/unix.c
index b39383d..c6cdb49 100644
--- a/e2fsck/unix.c
+++ b/e2fsck/unix.c
@@ -692,6 +692,10 @@ static void parse_extended_opts(e2fsck_t ctx, const char *opts)
 			else
 				ctx->log_fn = string_copy(ctx, arg, 0);
 			continue;
+		} else if (strcmp(token, "strict_csums") == 0) {
+			ctx->options |= E2F_OPT_CSUM_FIRST;
+		} else if (strcmp(token, "no_strict_csums") == 0) {
+			ctx->options &= ~E2F_OPT_CSUM_FIRST;
 		} else {
 			fprintf(stderr, _("Unknown extended option: %s\n"),
 				token);
@@ -710,6 +714,8 @@ static void parse_extended_opts(e2fsck_t ctx, const char *opts)
 		fputs(("\tjournal_only\n"), stderr);
 		fputs(("\tdiscard\n"), stderr);
 		fputs(("\tnodiscard\n"), stderr);
+		fputs(("\tstrict_csums\n"), stderr);
+		fputs(("\tno_strict_csums\n"), stderr);
 		fputc('\n', stderr);
 		exit(1);
 	}
@@ -945,6 +951,11 @@ static errcode_t PRS(int argc, char *argv[], e2fsck_t *ret_ctx)
 	profile_set_syntax_err_cb(syntax_err_report);
 	profile_init(config_fn, &ctx->profile);
 
+	profile_get_boolean(ctx->profile, "options", "strict_csums", NULL,
+			    0, &c);
+	if (c)
+		ctx->options |= E2F_OPT_CSUM_FIRST;
+
 	profile_get_boolean(ctx->profile, "options", "report_time", 0, 0,
 			    &c);
 	if (c)
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 91+ messages in thread

* [PATCH 1/2] libext2fs: support BLKZEROOUT/FALLOC_FL_ZERO_RANGE in ext2fs_zero_blocks
  2014-05-07 21:37         ` Darrick J. Wong
@ 2014-05-08  0:13           ` Darrick J. Wong
  2014-05-13 11:11             ` Lukáš Czerner
  2014-05-08  0:14           ` [PATCH 2/2] libext2fs: support allocating uninit blocks in bmap2() Darrick J. Wong
  1 sibling, 1 reply; 91+ messages in thread
From: Darrick J. Wong @ 2014-05-08  0:13 UTC (permalink / raw)
  To: Lukáš Czerner; +Cc: tytso, linux-ext4

Here's the first part, which teaches the IO manager how to connect with the
zero out ioctls.
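
For readers who want the shape of the change before reading the diff, here is
a minimal, self-contained sketch of the dispatch-and-fallback pattern.  All of
the sketch_* names, the in-memory "device", and SKETCH_UNIMPLEMENTED are
hypothetical stand-ins, not libext2fs API.  The assumption is that a backend
may or may not provide a zeroout hook, and that the caller falls back to
writing zeroes by hand whenever the hook is missing or fails.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_UNIMPLEMENTED	(-1)	/* stand-in for "not supported" */

struct sketch_channel {
	unsigned char *dev;	/* in-memory stand-in for the device */
	size_t block_size;	/* bytes per block */
	size_t nblocks;		/* device size in blocks */
	/* Optional fast path; NULL when the backend has none. */
	int (*zeroout)(struct sketch_channel *ch, size_t block, size_t count);
};

/* Call the backend hook if there is one, else report "unimplemented". */
static int sketch_channel_zeroout(struct sketch_channel *ch, size_t block,
				  size_t count)
{
	assert(ch != NULL);
	if (ch->zeroout)
		return ch->zeroout(ch, block, count);
	return SKETCH_UNIMPLEMENTED;
}

/* Zero a block range, preferring the fast path when it succeeds. */
static int sketch_zero_blocks(struct sketch_channel *ch, size_t block,
			      size_t count)
{
	assert(ch != NULL);
	if (block > ch->nblocks || count > ch->nblocks - block)
		return EINVAL;

	/* Try a zero out command, if supported. */
	if (sketch_channel_zeroout(ch, block, count) == 0)
		return 0;

	/* Fallback: write the zeroes ourselves. */
	memset(ch->dev + block * ch->block_size, 0, count * ch->block_size);
	return 0;
}
```

Any nonzero return from the hook leads to the slow path here, so a backend
that lacks the primitive costs only one failed call.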

--D
---
Plumb a new call into the IO manager to support translating
ext2fs_zero_blocks calls into the equivalent kernel-level BLKZEROOUT
ioctl or FALLOC_FL_ZERO_RANGE fallocate flag primitives.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 lib/ext2fs/ext2_io.h    |    7 +++++-
 lib/ext2fs/io_manager.c |   11 ++++++++++
 lib/ext2fs/mkjournal.c  |    6 +++++
 lib/ext2fs/unix_io.c    |   54 +++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 77 insertions(+), 1 deletion(-)

diff --git a/lib/ext2fs/ext2_io.h b/lib/ext2fs/ext2_io.h
index 1894fb8..98d56aa 100644
--- a/lib/ext2fs/ext2_io.h
+++ b/lib/ext2fs/ext2_io.h
@@ -90,7 +90,9 @@ struct struct_io_manager {
 					int count, const void *data);
 	errcode_t (*discard)(io_channel channel, unsigned long long block,
 			     unsigned long long count);
-	long	reserved[16];
+	errcode_t (*zeroout)(io_channel channel, unsigned long long block,
+			     unsigned long long count);
+	long	reserved[15];
 };
 
 #define IO_FLAG_RW		0x0001
@@ -122,6 +124,9 @@ extern errcode_t io_channel_write_blk64(io_channel channel,
 extern errcode_t io_channel_discard(io_channel channel,
 				    unsigned long long block,
 				    unsigned long long count);
+extern errcode_t io_channel_zeroout(io_channel channel,
+				    unsigned long long block,
+				    unsigned long long count);
 extern errcode_t io_channel_alloc_buf(io_channel channel,
 				      int count, void *ptr);
 
diff --git a/lib/ext2fs/io_manager.c b/lib/ext2fs/io_manager.c
index 34e4859..569d16a 100644
--- a/lib/ext2fs/io_manager.c
+++ b/lib/ext2fs/io_manager.c
@@ -112,6 +112,17 @@ errcode_t io_channel_discard(io_channel channel, unsigned long long block,
 	return EXT2_ET_UNIMPLEMENTED;
 }
 
+errcode_t io_channel_zeroout(io_channel channel, unsigned long long block,
+			     unsigned long long count)
+{
+	EXT2_CHECK_MAGIC(channel, EXT2_ET_MAGIC_IO_CHANNEL);
+
+	if (channel->manager->zeroout)
+		return (channel->manager->zeroout)(channel, block, count);
+
+	return EXT2_ET_UNIMPLEMENTED;
+}
+
 errcode_t io_channel_alloc_buf(io_channel io, int count, void *ptr)
 {
 	size_t	size;
diff --git a/lib/ext2fs/mkjournal.c b/lib/ext2fs/mkjournal.c
index 884d9c0..339c7e1 100644
--- a/lib/ext2fs/mkjournal.c
+++ b/lib/ext2fs/mkjournal.c
@@ -167,6 +167,12 @@ errcode_t ext2fs_zero_blocks2(ext2_filsys fs, blk64_t blk, int num,
 		}
 		return 0;
 	}
+
+	/* Try a zero out command, if supported */
+	retval = io_channel_zeroout(fs->io, blk, num);
+	if (retval == 0)
+		return 0;
+
 	/* Allocate the zeroizing buffer if necessary */
 	if (!buf) {
 		buf = malloc(fs->blocksize * STRIDE_LENGTH);
diff --git a/lib/ext2fs/unix_io.c b/lib/ext2fs/unix_io.c
index c3185b6..d070cb0 100644
--- a/lib/ext2fs/unix_io.c
+++ b/lib/ext2fs/unix_io.c
@@ -922,6 +922,59 @@ unimplemented:
 	return EXT2_ET_UNIMPLEMENTED;
 }
 
+#if defined(__linux__) && !defined(BLKZEROOUT)
+#define BLKZEROOUT		_IO(0x12,127)
+#endif
+
+#if defined(__linux__) && !defined(FALLOC_FL_ZERO_RANGE)
+#define FALLOC_FL_ZERO_RANGE    0x10
+#endif
+
+static errcode_t unix_zeroout(io_channel channel, unsigned long long block,
+			      unsigned long long count)
+{
+	struct unix_private_data *data;
+	int		ret;
+
+	EXT2_CHECK_MAGIC(channel, EXT2_ET_MAGIC_IO_CHANNEL);
+	data = (struct unix_private_data *) channel->private_data;
+	EXT2_CHECK_MAGIC(data, EXT2_ET_MAGIC_UNIX_IO_CHANNEL);
+
+	if (channel->flags & CHANNEL_FLAGS_BLOCK_DEVICE) {
+#ifdef BLKZEROOUT
+		__u64 range[2];
+
+		range[0] = (__u64)(block) * channel->block_size;
+		range[1] = (__u64)(count) * channel->block_size;
+
+		ret = ioctl(data->dev, BLKZEROOUT, &range);
+#else
+		goto unimplemented;
+#endif
+	} else {
+#if defined(HAVE_FALLOCATE) && defined(FALLOC_FL_ZERO_RANGE)
+		/*
+		 * If we are not on block device, try to use the zero out
+		 * primitive.
+		 */
+		ret = fallocate(data->dev,
+				FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE,
+				(off_t)(block) * channel->block_size,
+				(off_t)(count) * channel->block_size);
+#else
+		goto unimplemented;
+#endif
+	}
+	if (ret < 0) {
+		if (errno == EOPNOTSUPP)
+			goto unimplemented;
+		return errno;
+	}
+	return 0;
+unimplemented:
+	return EXT2_ET_UNIMPLEMENTED;
+}
+
 static struct struct_io_manager struct_unix_manager = {
 	EXT2_ET_MAGIC_IO_MANAGER,
 	"Unix I/O Manager",
@@ -937,6 +990,7 @@ static struct struct_io_manager struct_unix_manager = {
 	unix_read_blk64,
 	unix_write_blk64,
 	unix_discard,
+	unix_zeroout,
 };
 
 io_manager unix_io_manager = &struct_unix_manager;

^ permalink raw reply related	[flat|nested] 91+ messages in thread

* [PATCH 2/2] libext2fs: support allocating uninit blocks in bmap2()
  2014-05-07 21:37         ` Darrick J. Wong
  2014-05-08  0:13           ` [PATCH 1/2] libext2fs: support BLKZEROOUT/FALLOC_FL_ZERO_RANGE in ext2fs_zero_blocks Darrick J. Wong
@ 2014-05-08  0:14           ` Darrick J. Wong
  2014-05-27 16:28             ` Lukáš Czerner
  1 sibling, 1 reply; 91+ messages in thread
From: Darrick J. Wong @ 2014-05-08  0:14 UTC (permalink / raw)
  To: Lukáš Czerner; +Cc: tytso, linux-ext4

Here's the second part, which for the most part is the old patch, but wired up
to use the bits in the other patch.

--D
---
In order to support fallocate, we need to be able to have
ext2fs_bmap2() allocate blocks and put them into uninitialized
extents.  There's a flag to do this in the extent code, but it's not
exposed to the bmap2 interface, so plumb that in.  Eventually fuse2fs
or somebody will use it.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 lib/ext2fs/bmap.c   |   24 ++++++++++++++++++++++--
 lib/ext2fs/ext2fs.h |    1 +
 2 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/lib/ext2fs/bmap.c b/lib/ext2fs/bmap.c
index c1d0e6f..a4dc8ef 100644
--- a/lib/ext2fs/bmap.c
+++ b/lib/ext2fs/bmap.c
@@ -72,6 +72,11 @@ static _BMAP_INLINE_ errcode_t block_ind_bmap(ext2_filsys fs, int flags,
 					    block_buf + fs->blocksize, &b);
 		if (retval)
 			return retval;
+		if (flags & BMAP_UNINIT) {
+			retval = ext2fs_zero_blocks2(fs, b, 1, NULL, NULL);
+			if (retval)
+				return retval;
+		}
 
 #ifdef WORDS_BIGENDIAN
 		((blk_t *) block_buf)[nr] = ext2fs_swab32(b);
@@ -214,10 +219,13 @@ static errcode_t extent_bmap(ext2_filsys fs, ext2_ino_t ino,
 	errcode_t		retval = 0;
 	blk64_t			blk64 = 0;
 	int			alloc = 0;
+	int			set_flags;
+
+	set_flags = bmap_flags & BMAP_UNINIT ? EXT2_EXTENT_SET_BMAP_UNINIT : 0;
 
 	if (bmap_flags & BMAP_SET) {
 		retval = ext2fs_extent_set_bmap(handle, block,
-						*phys_blk, 0);
+						*phys_blk, set_flags);
 		return retval;
 	}
 	retval = ext2fs_extent_goto(handle, block);
@@ -254,7 +262,7 @@ got_block:
 		alloc++;
 	set_extent:
 		retval = ext2fs_extent_set_bmap(handle, block,
-						blk64, 0);
+						blk64, set_flags);
 		if (retval) {
 			ext2fs_block_alloc_stats2(fs, blk64, -1);
 			return retval;
@@ -345,6 +353,12 @@ errcode_t ext2fs_bmap2(ext2_filsys fs, ext2_ino_t ino, struct ext2_inode *inode,
 		goto done;
 	}
 
+	if ((bmap_flags & BMAP_SET) && (bmap_flags & BMAP_UNINIT)) {
+		retval = ext2fs_zero_blocks2(fs, *phys_blk, 1, NULL, NULL);
+		if (retval)
+			goto done;
+	}
+
 	if (block < EXT2_NDIR_BLOCKS) {
 		if (bmap_flags & BMAP_SET) {
 			b = *phys_blk;
@@ -360,6 +374,12 @@ errcode_t ext2fs_bmap2(ext2_filsys fs, ext2_ino_t ino, struct ext2_inode *inode,
 			retval = ext2fs_alloc_block(fs, b, block_buf, &b);
 			if (retval)
 				goto done;
+			if (bmap_flags & BMAP_UNINIT) {
+				retval = ext2fs_zero_blocks2(fs, b, 1, NULL,
+							     NULL);
+				if (retval)
+					goto done;
+			}
 			inode_bmap(inode, block) = b;
 			blocks_alloc++;
 			*phys_blk = b;
diff --git a/lib/ext2fs/ext2fs.h b/lib/ext2fs/ext2fs.h
index 599c972..819a14a 100644
--- a/lib/ext2fs/ext2fs.h
+++ b/lib/ext2fs/ext2fs.h
@@ -527,6 +527,7 @@ typedef struct ext2_icount *ext2_icount_t;
  */
 #define BMAP_ALLOC	0x0001
 #define BMAP_SET	0x0002
+#define BMAP_UNINIT	0x0004
 
 /*
  * Returned flags from ext2fs_bmap

^ permalink raw reply related	[flat|nested] 91+ messages in thread

* Re: [PATCH 01/37] misc: create better-packaged static analysis reports
  2014-05-01 23:12 ` [PATCH 01/37] misc: create better-packaged static analysis reports Darrick J. Wong
@ 2014-05-11 22:33   ` Theodore Ts'o
  0 siblings, 0 replies; 91+ messages in thread
From: Theodore Ts'o @ 2014-05-11 22:33 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-ext4

On Thu, May 01, 2014 at 04:12:29PM -0700, Darrick J. Wong wrote:
> Fix some minor bugs relating to passing CFLAGS to cppcheck, and
> package the cppcheck output into nicer looking reports.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Applied, thanks.

					- Ted

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 02/37] misc: coverity fixes
  2014-05-05 20:04     ` Darrick J. Wong
@ 2014-05-11 22:40       ` Theodore Ts'o
  0 siblings, 0 replies; 91+ messages in thread
From: Theodore Ts'o @ 2014-05-11 22:40 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Lukáš Czerner, linux-ext4

Applied, with Lukáš's nits addressed.  One comment:

> > > @@ -612,7 +616,7 @@ static errcode_t __populate_fs(ext2_filsys fs, ext2_ino_t parent_ino,
> > >  				if (p == NULL) {
> > >  					com_err(name, errno,
> > >  						_("Not enough memory"));
> > > -					return errno;
> > > +					goto out;
> > 
> > same here.
> 
> Yes.  Thank you for spotting these.

The original code was buggy here, since realloc() doesn't set errno.
I've added:

					retval = EXT2_ET_NO_MEMORY;

before the "goto out" line.
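
A minimal sketch of the same idea outside e2fsprogs: return an explicit error
code when realloc() fails instead of reading errno, which ISO C does not
require realloc() to set.  The grow_buffer() helper is hypothetical, and
ENOMEM stands in for EXT2_ET_NO_MEMORY.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Grow *bufp to new_size bytes.  Returns 0 on success.  On failure the
 * old buffer is left untouched and is still owned by the caller.
 */
static int grow_buffer(char **bufp, size_t new_size)
{
	char *p;

	assert(bufp != NULL);
	if (new_size == 0)
		return EINVAL;	/* avoid realloc(ptr, 0), whose result varies */

	p = realloc(*bufp, new_size);
	if (p == NULL)
		return ENOMEM;	/* explicit code; errno is not consulted */

	*bufp = p;
	return 0;
}
```

Assigning the result to a temporary first also keeps the original pointer
valid, so the error path can still free it.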

					- Ted

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 03/37] libext2fs: create sockets when populating filesystem
  2014-05-05 20:08     ` Darrick J. Wong
@ 2014-05-11 22:44       ` Theodore Ts'o
  0 siblings, 0 replies; 91+ messages in thread
From: Theodore Ts'o @ 2014-05-11 22:44 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Lukáš Czerner, linux-ext4

Thanks, applied.  I've fixed up the issues which Lukáš pointed out,
including replacing the -1 return with EROFS.

					- Ted

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 04/37] mke2fs: always warn if 128-byte inode and inline_data
  2014-05-05 20:10     ` Darrick J. Wong
@ 2014-05-12  0:26       ` Theodore Ts'o
  0 siblings, 0 replies; 91+ messages in thread
From: Theodore Ts'o @ 2014-05-12  0:26 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Lukáš Czerner, linux-ext4

Thanks, applied with Lukáš's suggested fix.

					- Ted

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 05/37] debugfs: teach logdump to deal with 64bit revoke tables
  2014-05-06 11:35       ` Lukáš Czerner
@ 2014-05-12  1:20         ` Theodore Ts'o
  0 siblings, 0 replies; 91+ messages in thread
From: Theodore Ts'o @ 2014-05-12  1:20 UTC (permalink / raw)
  To: Lukáš Czerner; +Cc: Darrick J. Wong, linux-ext4

The original code was a bit buggy in that it assumed "unsigned int" is
guaranteed to be 4 bytes --- which of course, is not guaranteed,
although in practice it's true for most systems.

I also tend to think that lots of extra casts are bad for readability,
and while I used to avoid constructs like this because gdb used to
choke horribly on variables defined in nested scopes, this has
thankfully been fixed for years --- even the most ancient and decrepit
RHEL system (or debian oldstable :-P) should have gdb's that can
correctly deal with this:

	if (32-bit file system) {
		__u32 *entry = ...;
		rblock = be32_to_cpu(*entry);
	} else {
		__u64 *entry = ...;
		rblock = ext2fs_be64_to_cpu(*entry);
	} 
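
A self-contained version of that pattern, as a sketch only: it uses the
<stdint.h> fixed-width types and byte-wise big-endian loads in place of
__u32/__u64 and the be32_to_cpu()/ext2fs_be64_to_cpu() helpers, and the
function names are hypothetical.  The sample buffer is made up; it holds two
64-bit big-endian revoke entries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Load a big-endian 32-bit value without alignment or endian assumptions. */
static uint32_t load_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Load a big-endian 64-bit value as two 32-bit halves. */
static uint64_t load_be64(const unsigned char *p)
{
	return ((uint64_t)load_be32(p) << 32) | load_be32(p + 4);
}

/* Read the revoke entry at byte offset `offset`; tag_size must be 4 or 8. */
static uint64_t read_revoke_entry(const unsigned char *buf, size_t offset,
				  size_t tag_size)
{
	assert(buf != NULL);
	assert(tag_size == sizeof(uint32_t) || tag_size == sizeof(uint64_t));

	if (tag_size == sizeof(uint32_t))
		return load_be32(buf + offset);
	return load_be64(buf + offset);
}
```

Reading a 64-bit table with a 4-byte stride returns the zero high word of
each entry as if it were a block number of its own, which is why the entry
width has to follow the journal's 64bit feature flag.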

					- Ted

commit a1ff15f83b3ab4b4f524cea48e149ec9be93908c
Author: Darrick J. Wong <darrick.wong@oracle.com>
Date:   Sun May 11 20:57:18 2014 -0400

    debugfs: teach logdump to deal with 64bit revoke tables
    
    The logdump command doesn't know how to deal with revoke tables in
    64bit journals, so teach it to do this.
    
    Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>

diff --git a/debugfs/logdump.c b/debugfs/logdump.c
index 2d0efaf..211e11a 100644
--- a/debugfs/logdump.c
+++ b/debugfs/logdump.c
@@ -526,28 +526,37 @@ static void dump_revoke_block(FILE *out_file, char *buf,
 {
 	int			offset, max;
 	journal_revoke_header_t *header;
-	unsigned int		*entry, rblock;
+	unsigned long long	rblock;
+	int			tag_size = sizeof(__u32);
 
 	if (dump_all)
 		fprintf(out_file, "Dumping revoke block, sequence %u, at "
 			"block %u:\n", transaction, blocknr);
 
+	if (be32_to_cpu(jsb->s_feature_incompat) & JFS_FEATURE_INCOMPAT_64BIT)
+		tag_size = sizeof(__u64);
+
 	header = (journal_revoke_header_t *) buf;
 	offset = sizeof(journal_revoke_header_t);
 	max = be32_to_cpu(header->r_count);
 
 	while (offset < max) {
-		entry = (unsigned int *) (buf + offset);
-		rblock = be32_to_cpu(*entry);
+		if (tag_size == sizeof(__u32)) {
+			__u32 *entry = (__u32 *) (buf + offset);
+			rblock = be32_to_cpu(*entry);
+		} else {
+			__u64 *entry = (__u64 *) (buf + offset);
+			rblock = ext2fs_be64_to_cpu(*entry);
+		}
 		if (dump_all || rblock == block_to_dump) {
-			fprintf(out_file, "  Revoke FS block %u", rblock);
+			fprintf(out_file, "  Revoke FS block %llu", rblock);
 			if (dump_all)
 				fprintf(out_file, "\n");
 			else
 				fprintf(out_file," at block %u, sequence %u\n",
 					blocknr, transaction);
 		}
-		offset += 4;
+		offset += tag_size;
 	}
 }
 
diff --git a/tests/f_jnl_64bit/expect.0 b/tests/f_jnl_64bit/expect.0
index 2007f03..5cef2d8 100644
--- a/tests/f_jnl_64bit/expect.0
+++ b/tests/f_jnl_64bit/expect.0
@@ -1,189 +1,97 @@
 Journal starts at block 67, transaction 32
 Found expected sequence 32, type 5 (revoke table) at block 67
 Dumping revoke block, sequence 32, at block 67:
-  Revoke FS block 0
   Revoke FS block 1536
-  Revoke FS block 0
   Revoke FS block 1472
-  Revoke FS block 0
   Revoke FS block 1473
-  Revoke FS block 0
   Revoke FS block 1474
-  Revoke FS block 0
   Revoke FS block 1475
-  Revoke FS block 0
   Revoke FS block 1476
-  Revoke FS block 0
   Revoke FS block 1541
-  Revoke FS block 0
   Revoke FS block 1477
-  Revoke FS block 0
   Revoke FS block 1478
-  Revoke FS block 0
   Revoke FS block 1479
-  Revoke FS block 0
   Revoke FS block 1480
-  Revoke FS block 0
   Revoke FS block 1481
-  Revoke FS block 0
   Revoke FS block 1482
-  Revoke FS block 0
   Revoke FS block 1483
-  Revoke FS block 0
   Revoke FS block 1484
-  Revoke FS block 0
   Revoke FS block 1485
-  Revoke FS block 0
   Revoke FS block 1486
-  Revoke FS block 0
   Revoke FS block 1487
-  Revoke FS block 0
   Revoke FS block 1488
-  Revoke FS block 0
   Revoke FS block 1489
-  Revoke FS block 0
   Revoke FS block 1490
-  Revoke FS block 0
   Revoke FS block 1491
-  Revoke FS block 0
   Revoke FS block 1556
-  Revoke FS block 0
   Revoke FS block 1492
-  Revoke FS block 0
   Revoke FS block 1493
-  Revoke FS block 0
   Revoke FS block 1429
-  Revoke FS block 0
   Revoke FS block 1494
-  Revoke FS block 0
   Revoke FS block 1495
-  Revoke FS block 0
   Revoke FS block 1496
-  Revoke FS block 0
   Revoke FS block 1432
-  Revoke FS block 0
   Revoke FS block 1497
-  Revoke FS block 0
   Revoke FS block 1498
-  Revoke FS block 0
   Revoke FS block 1434
-  Revoke FS block 0
   Revoke FS block 1499
-  Revoke FS block 0
   Revoke FS block 1435
-  Revoke FS block 0
   Revoke FS block 1500
-  Revoke FS block 0
   Revoke FS block 1501
-  Revoke FS block 0
   Revoke FS block 1502
-  Revoke FS block 0
   Revoke FS block 1503
-  Revoke FS block 0
   Revoke FS block 1504
-  Revoke FS block 0
   Revoke FS block 1505
-  Revoke FS block 0
   Revoke FS block 1506
-  Revoke FS block 0
   Revoke FS block 1442
-  Revoke FS block 0
   Revoke FS block 1507
-  Revoke FS block 0
   Revoke FS block 1508
-  Revoke FS block 0
   Revoke FS block 1444
-  Revoke FS block 0
   Revoke FS block 1509
-  Revoke FS block 0
   Revoke FS block 1445
-  Revoke FS block 0
   Revoke FS block 1510
-  Revoke FS block 0
   Revoke FS block 1511
-  Revoke FS block 0
   Revoke FS block 1512
-  Revoke FS block 0
   Revoke FS block 1513
-  Revoke FS block 0
   Revoke FS block 1449
-  Revoke FS block 0
   Revoke FS block 1514
-  Revoke FS block 0
   Revoke FS block 1515
-  Revoke FS block 0
   Revoke FS block 1516
-  Revoke FS block 0
   Revoke FS block 1517
-  Revoke FS block 0
   Revoke FS block 1453
-  Revoke FS block 0
   Revoke FS block 1518
-  Revoke FS block 0
   Revoke FS block 1519
-  Revoke FS block 0
   Revoke FS block 1520
-  Revoke FS block 0
   Revoke FS block 1456
-  Revoke FS block 0
   Revoke FS block 1521
-  Revoke FS block 0
   Revoke FS block 1457
-  Revoke FS block 0
   Revoke FS block 1522
-  Revoke FS block 0
   Revoke FS block 1458
-  Revoke FS block 0
   Revoke FS block 1523
-  Revoke FS block 0
   Revoke FS block 1459
-  Revoke FS block 0
   Revoke FS block 1524
-  Revoke FS block 0
   Revoke FS block 1460
-  Revoke FS block 0
   Revoke FS block 1525
-  Revoke FS block 0
   Revoke FS block 1461
-  Revoke FS block 0
   Revoke FS block 1526
-  Revoke FS block 0
   Revoke FS block 1462
-  Revoke FS block 0
   Revoke FS block 1527
-  Revoke FS block 0
   Revoke FS block 1463
-  Revoke FS block 0
   Revoke FS block 1528
-  Revoke FS block 0
   Revoke FS block 1464
-  Revoke FS block 0
   Revoke FS block 1529
-  Revoke FS block 0
   Revoke FS block 1465
-  Revoke FS block 0
   Revoke FS block 1530
-  Revoke FS block 0
   Revoke FS block 1466
-  Revoke FS block 0
   Revoke FS block 1531
-  Revoke FS block 0
   Revoke FS block 1467
-  Revoke FS block 0
   Revoke FS block 1532
-  Revoke FS block 0
   Revoke FS block 1468
-  Revoke FS block 0
   Revoke FS block 1533
-  Revoke FS block 0
   Revoke FS block 1469
-  Revoke FS block 0
   Revoke FS block 1534
-  Revoke FS block 0
   Revoke FS block 1470
-  Revoke FS block 0
   Revoke FS block 1535
-  Revoke FS block 0
   Revoke FS block 1471
 Found expected sequence 32, type 1 (descriptor block) at block 68
 Dumping descriptor block, sequence 32, at block 68:
@@ -323,163 +231,84 @@ Dumping descriptor block, sequence 32, at block 150:
 Found expected sequence 32, type 2 (commit block) at block 201
 Found expected sequence 33, type 5 (revoke table) at block 202
 Dumping revoke block, sequence 33, at block 202:
-  Revoke FS block 0
   Revoke FS block 1600
-  Revoke FS block 0
   Revoke FS block 1601
-  Revoke FS block 0
   Revoke FS block 1537
-  Revoke FS block 0
   Revoke FS block 1602
-  Revoke FS block 0
   Revoke FS block 1538
-  Revoke FS block 0
   Revoke FS block 1603
-  Revoke FS block 0
   Revoke FS block 1539
-  Revoke FS block 0
   Revoke FS block 1604
-  Revoke FS block 0
   Revoke FS block 1540
-  Revoke FS block 0
   Revoke FS block 1605
-  Revoke FS block 0
   Revoke FS block 1606
-  Revoke FS block 0
   Revoke FS block 1542
-  Revoke FS block 0
   Revoke FS block 1607
-  Revoke FS block 0
   Revoke FS block 1543
-  Revoke FS block 0
   Revoke FS block 1608
-  Revoke FS block 0
   Revoke FS block 1544
-  Revoke FS block 0
   Revoke FS block 1609
-  Revoke FS block 0
   Revoke FS block 1545
-  Revoke FS block 0
   Revoke FS block 1610
-  Revoke FS block 0
   Revoke FS block 1546
-  Revoke FS block 0
   Revoke FS block 1611
-  Revoke FS block 0
   Revoke FS block 1547
-  Revoke FS block 0
   Revoke FS block 1612
-  Revoke FS block 0
   Revoke FS block 1548
-  Revoke FS block 0
   Revoke FS block 1613
-  Revoke FS block 0
   Revoke FS block 1549
-  Revoke FS block 0
   Revoke FS block 1614
-  Revoke FS block 0
   Revoke FS block 1550
-  Revoke FS block 0
   Revoke FS block 1615
-  Revoke FS block 0
   Revoke FS block 1551
-  Revoke FS block 0
   Revoke FS block 1616
-  Revoke FS block 0
   Revoke FS block 1552
-  Revoke FS block 0
   Revoke FS block 1617
-  Revoke FS block 0
   Revoke FS block 1553
-  Revoke FS block 0
   Revoke FS block 1554
-  Revoke FS block 0
   Revoke FS block 1555
-  Revoke FS block 0
   Revoke FS block 1557
-  Revoke FS block 0
   Revoke FS block 1558
-  Revoke FS block 0
   Revoke FS block 1559
-  Revoke FS block 0
   Revoke FS block 1560
-  Revoke FS block 0
   Revoke FS block 1561
-  Revoke FS block 0
   Revoke FS block 1562
-  Revoke FS block 0
   Revoke FS block 1563
-  Revoke FS block 0
   Revoke FS block 1564
-  Revoke FS block 0
   Revoke FS block 1565
-  Revoke FS block 0
   Revoke FS block 1566
-  Revoke FS block 0
   Revoke FS block 1567
-  Revoke FS block 0
   Revoke FS block 1568
-  Revoke FS block 0
   Revoke FS block 1569
-  Revoke FS block 0
   Revoke FS block 1570
-  Revoke FS block 0
   Revoke FS block 1571
-  Revoke FS block 0
   Revoke FS block 1572
-  Revoke FS block 0
   Revoke FS block 1573
-  Revoke FS block 0
   Revoke FS block 1574
-  Revoke FS block 0
   Revoke FS block 1575
-  Revoke FS block 0
   Revoke FS block 1576
-  Revoke FS block 0
   Revoke FS block 1577
-  Revoke FS block 0
   Revoke FS block 1578
-  Revoke FS block 0
   Revoke FS block 1579
-  Revoke FS block 0
   Revoke FS block 1580
-  Revoke FS block 0
   Revoke FS block 1581
-  Revoke FS block 0
   Revoke FS block 1582
-  Revoke FS block 0
   Revoke FS block 1583
-  Revoke FS block 0
   Revoke FS block 1584
-  Revoke FS block 0
   Revoke FS block 1585
-  Revoke FS block 0
   Revoke FS block 1586
-  Revoke FS block 0
   Revoke FS block 1587
-  Revoke FS block 0
   Revoke FS block 1588
-  Revoke FS block 0
   Revoke FS block 1589
-  Revoke FS block 0
   Revoke FS block 1590
-  Revoke FS block 0
   Revoke FS block 1591
-  Revoke FS block 0
   Revoke FS block 1592
-  Revoke FS block 0
   Revoke FS block 1593
-  Revoke FS block 0
   Revoke FS block 1594
-  Revoke FS block 0
   Revoke FS block 1595
-  Revoke FS block 0
   Revoke FS block 1596
-  Revoke FS block 0
   Revoke FS block 1597
-  Revoke FS block 0
   Revoke FS block 1598
-  Revoke FS block 0
   Revoke FS block 1599
 Found expected sequence 33, type 1 (descriptor block) at block 203
 Dumping descriptor block, sequence 33, at block 203:

^ permalink raw reply related	[flat|nested] 91+ messages in thread

* Re: [PATCH 06/37] debugfs: force logdump to display (old) journal contents
  2014-05-06  0:24     ` Darrick J. Wong
@ 2014-05-12  1:41       ` Theodore Ts'o
  2014-05-12  3:31         ` Theodore Ts'o
  2014-05-14  0:05         ` Darrick J. Wong
  0 siblings, 2 replies; 91+ messages in thread
From: Theodore Ts'o @ 2014-05-12  1:41 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Lukáš Czerner, linux-ext4

On Mon, May 05, 2014 at 05:24:53PM -0700, Darrick J. Wong wrote:
> I'll update the manpage.  -c seems to hexdump the contents of any block that we
> find while iterating the journal.  -b would seem to allow you to dump an
> arbitrary block #, but I could never get it to do that.

It's used to dump information _about_ an arbitrary block.  Here's an
example of some of the cool things you can do with logdump:

<tytso@closure> {/usr/projects/e2fsprogs/e2fsprogs}   (next)
1742% gunzip <  tests/f_jnl_32bit/image.gz  > /tmp/image
<tytso@closure> {/usr/projects/e2fsprogs/e2fsprogs}   (next)
1743% debugfs /tmp/image
debugfs 1.42.9 (4-Feb-2014)
debugfs:  logdump -b 680
Journal starts at block 1, transaction 2
  FS block 66 logged at sequence 3, journal block 8 (flags 0x2)
    (block bitmap for block 680: block is SET)
  FS block 680 logged at sequence 3, journal block 205 (flags 0x2)
  FS block 66 logged at sequence 4, journal block 231 (flags 0x2)
    (block bitmap for block 680: block is SET)
  FS block 680 logged at sequence 4, journal block 234 (flags 0x2)
  FS block 66 logged at sequence 5, journal block 339 (flags 0x2)
    (block bitmap for block 680: block is SET)
  FS block 680 logged at sequence 5, journal block 450 (flags 0x2)
No magic number at block 464: end of journal.
debugfs: icheck 680
Block	 Inode number
680	 2132
debugfs:  logdump -i <2132>
Inode 2132 is at group 1, block 364, offset 384
Journal starts at block 1, transaction 2
  FS block 364 logged at sequence 3, journal block 197 (flags 0x2)
    (inode block for inode 2132):
    Inode: 2132   Type: directory        Mode:  0755   Flags: 0x80000
    Generation: 3167953082    Version: 0x00000008
    User:     0   Group:     0   Size: 1024
    File ACL: 0    Directory ACL: 0
    Links: 9   Blockcount: 2
    Fragment:  Address: 0    Number: 0    Size: 0
    ctime: 0x4fa1639e -- Wed May  2 12:41:02 2012
    atime: 0x4fa1639e -- Wed May  2 12:41:02 2012
    mtime: 0x4fa1639e -- Wed May  2 12:41:02 2012
    Blocks:  (0+1): 127754 (1+1): 4 (5+1): 680 
  FS block 364 logged at sequence 4, journal block 233 (flags 0x2)
    (inode block for inode 2132):
    Inode: 2132   Type: directory        Mode:  0755   Flags: 0x80000
    Generation: 3167953082    Version: 0x0000000c
    User:     0   Group:     0   Size: 1024
    File ACL: 0    Directory ACL: 0
    Links: 13   Blockcount: 2
    Fragment:  Address: 0    Number: 0    Size: 0
    ctime: 0x4fa1639e -- Wed May  2 12:41:02 2012
    atime: 0x4fa1639e -- Wed May  2 12:41:02 2012
    mtime: 0x4fa1639e -- Wed May  2 12:41:02 2012
    Blocks:  (0+1): 127754 (1+1): 4 (5+1): 680 
  FS block 364 logged at sequence 5, journal block 434 (flags 0x2)
    (inode block for inode 2132):
    Inode: 2132   Type: directory        Mode:  0755   Flags: 0x80000
    Generation: 3167953082    Version: 0x00000015
    User:     0   Group:     0   Size: 1024
    File ACL: 0    Directory ACL: 0
    Links: 4   Blockcount: 2
    Fragment:  Address: 0    Number: 0    Size: 0
    ctime: 0x4fa163a7 -- Wed May  2 12:41:11 2012
    atime: 0x4fa163a7 -- Wed May  2 12:41:11 2012
    mtime: 0x4fa163a7 -- Wed May  2 12:41:11 2012
    Blocks:  (0+1): 127754 (1+1): 4 (5+1): 680 
No magic number at block 464: end of journal.
debugfs: quit

The idea is that this can be useful when debugging a potentially
corrupted journal, or for advanced file system recovery.

Note that logdump -c is most useful in combination with -b, for
example: "logdump -b 680 -c".

	  	      	  	   	- Ted
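
The "Inode 2132 is at group 1, block 364, offset 384" line in the session above follows from ordinary ext2 inode-table arithmetic. Here is a minimal sketch in C, assuming 1024-byte blocks, 128-byte inodes, 2048 inodes per group, and a group-1 inode table starting at block 354. Those four values are assumptions picked to be consistent with the output, not values read from the image, and locate_inode() is a hypothetical helper, not a debugfs or libext2fs function.

```c
#include <assert.h>
#include <stdint.h>

/* Where an inode lives on disk (illustrative struct, not a libext2fs type). */
struct inode_loc {
	uint32_t group;   /* block group that owns the inode */
	uint64_t block;   /* FS block holding the inode */
	uint32_t offset;  /* byte offset of the inode inside that block */
};

/*
 * Hypothetical helper: locate inode `ino` given the geometry.
 * `itable_start` is the first block of the owning group's inode table.
 */
static struct inode_loc locate_inode(uint32_t ino, uint32_t inodes_per_group,
				     uint32_t inode_size, uint32_t block_size,
				     uint64_t itable_start)
{
	struct inode_loc loc;
	uint32_t index;      /* index of the inode within its group */
	uint64_t byte_off;   /* byte offset within the group's inode table */

	assert(ino >= 1 && inodes_per_group && inode_size && block_size);

	loc.group = (ino - 1) / inodes_per_group;   /* inode numbers start at 1 */
	index = (ino - 1) % inodes_per_group;
	byte_off = (uint64_t)index * inode_size;
	loc.block = itable_start + byte_off / block_size;
	loc.offset = (uint32_t)(byte_off % block_size);
	return loc;
}
```

With the assumed geometry, locate_inode(2132, 2048, 128, 1024, 354) gives group 1, block 364, offset 384, which matches what logdump -i printed.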

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 06/37] debugfs: force logdump to display (old) journal contents
  2014-05-12  1:41       ` Theodore Ts'o
@ 2014-05-12  3:31         ` Theodore Ts'o
  2014-05-14  0:05         ` Darrick J. Wong
  1 sibling, 0 replies; 91+ messages in thread
From: Theodore Ts'o @ 2014-05-12  3:31 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Lukáš Czerner, linux-ext4

It is useful to be able to display only selected contents of the
already checkpointed transactions.  So instead of using -a twice, it's
better to define a new option, -O, to print the old journal entries.
This allows for commands such as "logdump -O -b 680".

     	    		      	 	  - Ted

commit 46272d5aa21fe879ca90a157485a2a3507e0a9b4
Author: Darrick J. Wong <darrick.wong@oracle.com>
Date:   Thu May 1 16:13:02 2014 -0700

    debugfs: force logdump to display (old) journal contents
    
    If the user passes the -O option to logdump, try to dump old log
    contents.  This can be used to try to track down journal problems even
    after the journal has been replayed.
    
    Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>

diff --git a/debugfs/debugfs.8.in b/debugfs/debugfs.8.in
index 7cda819..aacb223 100644
--- a/debugfs/debugfs.8.in
+++ b/debugfs/debugfs.8.in
@@ -389,7 +389,7 @@ which is a hard link to
 .IR filespec .
 Note this does not adjust the inode reference counts.
 .TP
-.BI logdump " [-acs] [-b block] [-i filespec] [-f journal_file] [output_file]"
+.BI logdump " [-acsO] [-b block] [-i filespec] [-f journal_file] [output_file]"
 Dump the contents of the ext3 journal.  By default, dump the journal inode as
 specified in the superblock.  However, this can be overridden with the
 .I \-i
@@ -420,6 +420,12 @@ the
 and
 .I \-b
 options.
+.IP
+The
+.I \-O
+option causes logdump to display old (checkpointed) journal entries.
+This can be used to try to track down journal problems even after the
+journal has been replayed.
 .TP
 .BI ls " [-l] [-c] [-d] [-p] filespec"
 Print a listing of the files in the directory
diff --git a/debugfs/logdump.c b/debugfs/logdump.c
index 211e11a..9f9594f 100644
--- a/debugfs/logdump.c
+++ b/debugfs/logdump.c
@@ -39,7 +39,7 @@ enum journal_location {JOURNAL_IS_INTERNAL, JOURNAL_IS_EXTERNAL};
 
 #define ANY_BLOCK ((blk64_t) -1)
 
-static int		dump_all, dump_contents, dump_descriptors;
+static int		dump_all, dump_old, dump_contents, dump_descriptors;
 static blk64_t		block_to_dump, bitmap_to_dump, inode_block_to_dump;
 static unsigned int	group_to_dump, inode_offset_to_dump;
 static ext2_ino_t	inode_to_dump;
@@ -94,6 +94,7 @@ void do_logdump(int argc, char **argv)
 	journal_source.fd = 0;
 	journal_source.file = 0;
 	dump_all = 0;
+	dump_old = 0;
 	dump_contents = 0;
 	dump_descriptors = 1;
 	block_to_dump = ANY_BLOCK;
@@ -102,7 +103,7 @@ void do_logdump(int argc, char **argv)
 	inode_to_dump = -1;
 
 	reset_getopt();
-	while ((c = getopt (argc, argv, "ab:ci:f:s")) != EOF) {
+	while ((c = getopt (argc, argv, "ab:ci:f:Os")) != EOF) {
 		switch (c) {
 		case 'a':
 			dump_all++;
@@ -126,6 +127,9 @@ void do_logdump(int argc, char **argv)
 			inode_spec = optarg;
 			dump_descriptors = 0;
 			break;
+		case 'O':
+			dump_old++;
+			break;
 		case 's':
 			use_sb++;
 			break;
@@ -267,7 +271,7 @@ errout:
 	return;
 
 print_usage:
-	fprintf(stderr, "%s: Usage: logdump [-acs] [-b<block>] [-i<filespec>]\n\t"
+	fprintf(stderr, "%s: Usage: logdump [-acsO] [-b<block>] [-i<filespec>]\n\t"
 		"[-f<journal_file>] [output_file]\n", argv[0]);
 }
 
@@ -393,9 +397,13 @@ static void dump_journal(char *cmdname, FILE *out_file,
 	fprintf(out_file, "Journal starts at block %u, transaction %u\n",
 		blocknr, transaction);
 
-	if (!blocknr)
+	if (!blocknr) {
 		/* Empty journal, nothing to do. */
-		return;
+		if (!dump_old)
+			return;
+		else
+			blocknr = 1;
+	}
 
 	while (1) {
 		retval = read_journal_block(cmdname, source,
@@ -420,7 +428,8 @@ static void dump_journal(char *cmdname, FILE *out_file,
 			fprintf (out_file, "Found sequence %u (not %u) at "
 				 "block %u: end of journal.\n",
 				 sequence, transaction, blocknr);
-			return;
+			if (!dump_old)
+				return;
 		}
 
 		if (dump_descriptors) {
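
The effect of -O on the scan loop can be modelled with a toy walker. This is only a sketch of the control flow in the hunks above: the toy_block layout and toy_scan() are hypothetical, and the expected sequence number is held fixed here, whereas the real dump_journal() tracks it as it walks the log.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAGIC 0xC03B3998u  /* JBD magic number; used here only as a marker */

/* Toy journal block header (illustrative, not the on-disk layout). */
struct toy_block {
	uint32_t magic;
	uint32_t sequence;
};

/*
 * Walk `blocks` from the start, expecting sequence `expected`.
 * Returns how many blocks would be dumped.
 * - A block without the magic number always ends the walk.
 * - A sequence mismatch ends the walk unless `dump_old` is set.
 */
static size_t toy_scan(const struct toy_block *blocks, size_t n,
		       uint32_t expected, int dump_old)
{
	size_t i, dumped = 0;

	assert(blocks != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (blocks[i].magic != TOY_MAGIC)
			break;          /* "No magic number": end of journal */
		if (blocks[i].sequence != expected && !dump_old)
			break;          /* the live log ends here */
		dumped++;
	}
	return dumped;
}
```

For a toy journal holding two blocks of sequence 2, two stale blocks of sequence 1, and then garbage, toy_scan() stops after the first two blocks without dump_old and continues through all four valid blocks with it.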

^ permalink raw reply related	[flat|nested] 91+ messages in thread

* Re: [PATCH 07/37] resize2fs: fix check for collision between old GDT and superblock on sparse_super2 fs
  2014-05-01 23:13 ` [PATCH 07/37] resize2fs: fix check for collision between old GDT and superblock on sparse_super2 fs Darrick J. Wong
@ 2014-05-12  3:35   ` Theodore Ts'o
  0 siblings, 0 replies; 91+ messages in thread
From: Theodore Ts'o @ 2014-05-12  3:35 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-ext4

On Thu, May 01, 2014 at 04:13:08PM -0700, Darrick J. Wong wrote:
> In reserve_sparse_super2_last_group, the old_desc check should only be
> performed if ext2fs_super_and_bgd_loc2() gave us a location -- a
> return value of 0 means that there is no old-style GDT block.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Thanks, applied.

					- Ted
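
The patch body is not quoted in this reply, so the following is only the general shape of the fix described above: a location of 0 means "no such block" and must not be compared as if it were block number 0. toy_collides() and its parameters are hypothetical stand-ins, not the resize2fs code.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t blk64_t;   /* local stand-in for the libext2fs typedef */

/*
 * Returns 1 if `candidate` collides with the backup superblock or the
 * old-style descriptor block of a group, 0 otherwise.  A location of 0
 * means "not present", so it is skipped rather than compared.
 */
static int toy_collides(blk64_t candidate, blk64_t super_blk,
			blk64_t old_desc_blk)
{
	if (super_blk != 0 && candidate == super_blk)
		return 1;
	if (old_desc_blk != 0 && candidate == old_desc_blk)
		return 1;
	return 0;
}
```

Without the `old_desc_blk != 0` guard, a group that has no old-style GDT block would be reported as colliding whenever the candidate happened to be 0.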

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 08/37] mke2fs: set gdt csum when creating packed fs
  2014-05-02 11:55   ` Lukáš Czerner
@ 2014-05-12  4:22     ` Theodore Ts'o
  0 siblings, 0 replies; 91+ messages in thread
From: Theodore Ts'o @ 2014-05-12  4:22 UTC (permalink / raw)
  To: Lukáš Czerner; +Cc: Darrick J. Wong, linux-ext4

On Fri, May 02, 2014 at 01:55:59PM +0200, Lukáš Czerner wrote:
> On Thu, 1 May 2014, Darrick J. Wong wrote:
> 
> > Date: Thu, 01 May 2014 16:13:15 -0700
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > To: tytso@mit.edu, darrick.wong@oracle.com
> > Cc: linux-ext4@vger.kernel.org
> > Subject: [PATCH 08/37] mke2fs: set gdt csum when creating packed fs
> > 
> > When we're creating a fs with metadata blocks packed at the beginning
> > (packed_meta_blocks=1 in mke2fs.conf), set the group descriptor
> > checksum or else we create DOA filesystems with checksum errors.
> 
> Makes sense. Thanks!
> 
> Reviewed-by: Lukas Czerner <lczerner@redhat.com>

Thanks, applied.

					- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 1/2] libext2fs: support BLKZEROOUT/FALLOC_FL_ZERO_RANGE in ext2fs_zero_blocks
  2014-05-08  0:13           ` [PATCH 1/2] libext2fs: support BLKZEROOUT/FALLOC_FL_ZERO_RANGE in ext2fs_zero_blocks Darrick J. Wong
@ 2014-05-13 11:11             ` Lukáš Czerner
  0 siblings, 0 replies; 91+ messages in thread
From: Lukáš Czerner @ 2014-05-13 11:11 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: tytso, linux-ext4

On Wed, 7 May 2014, Darrick J. Wong wrote:

> Date: Wed, 7 May 2014 17:13:37 -0700
> From: Darrick J. Wong <darrick.wong@oracle.com>
> To: Lukáš Czerner <lczerner@redhat.com>
> Cc: tytso@mit.edu, linux-ext4@vger.kernel.org
> Subject: [PATCH 1/2] libext2fs: support BLKZEROOUT/FALLOC_FL_ZERO_RANGE in
>     ext2fs_zero_blocks
> 
> Here's the first part, which teaches the IO manager how to connect with the
> zero out ioctls.
> 
> --D
> ---
> Plumb a new call into the IO manager to support translating
> ext2fs_zero_blocks calls into the equivalent kernel-level BLKZEROOUT
> ioctl or FALLOC_FL_ZERO_RANGE fallocate flag primitives.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  lib/ext2fs/ext2_io.h    |    7 +++++-
>  lib/ext2fs/io_manager.c |   11 ++++++++++
>  lib/ext2fs/mkjournal.c  |    6 +++++
>  lib/ext2fs/unix_io.c    |   54 +++++++++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 77 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/ext2fs/ext2_io.h b/lib/ext2fs/ext2_io.h
> index 1894fb8..98d56aa 100644
> --- a/lib/ext2fs/ext2_io.h
> +++ b/lib/ext2fs/ext2_io.h
> @@ -90,7 +90,9 @@ struct struct_io_manager {
>  					int count, const void *data);
>  	errcode_t (*discard)(io_channel channel, unsigned long long block,
>  			     unsigned long long count);
> -	long	reserved[16];
> +	errcode_t (*zeroout)(io_channel channel, unsigned long long block,
> +			     unsigned long long count);
> +	long	reserved[15];
>  };
>  
>  #define IO_FLAG_RW		0x0001
> @@ -122,6 +124,9 @@ extern errcode_t io_channel_write_blk64(io_channel channel,
>  extern errcode_t io_channel_discard(io_channel channel,
>  				    unsigned long long block,
>  				    unsigned long long count);
> +extern errcode_t io_channel_zeroout(io_channel channel,
> +				    unsigned long long block,
> +				    unsigned long long count);
>  extern errcode_t io_channel_alloc_buf(io_channel channel,
>  				      int count, void *ptr);
>  
> diff --git a/lib/ext2fs/io_manager.c b/lib/ext2fs/io_manager.c
> index 34e4859..569d16a 100644
> --- a/lib/ext2fs/io_manager.c
> +++ b/lib/ext2fs/io_manager.c
> @@ -112,6 +112,17 @@ errcode_t io_channel_discard(io_channel channel, unsigned long long block,
>  	return EXT2_ET_UNIMPLEMENTED;
>  }
>  
> +errcode_t io_channel_zeroout(io_channel channel, unsigned long long block,
> +			     unsigned long long count)
> +{
> +	EXT2_CHECK_MAGIC(channel, EXT2_ET_MAGIC_IO_CHANNEL);
> +
> +	if (channel->manager->zeroout)
> +		return (channel->manager->zeroout)(channel, block, count);
> +
> +	return EXT2_ET_UNIMPLEMENTED;
> +}
> +
>  errcode_t io_channel_alloc_buf(io_channel io, int count, void *ptr)
>  {
>  	size_t	size;
> diff --git a/lib/ext2fs/mkjournal.c b/lib/ext2fs/mkjournal.c
> index 884d9c0..339c7e1 100644
> --- a/lib/ext2fs/mkjournal.c
> +++ b/lib/ext2fs/mkjournal.c
> @@ -167,6 +167,12 @@ errcode_t ext2fs_zero_blocks2(ext2_filsys fs, blk64_t blk, int num,
>  		}
>  		return 0;
>  	}
> +
> +	/* Try a zero out command, if supported */
> +	retval = io_channel_zeroout(fs->io, blk, num);
> +	if (retval == 0)
> +		return 0;

I guess that this should have been in the second patch?  But it's
not a big deal.  It looks good.

Reviewed-by: Lukas Czerner <lczerner@redhat.com>

Thanks!
-Lukas

> +
>  	/* Allocate the zeroizing buffer if necessary */
>  	if (!buf) {
>  		buf = malloc(fs->blocksize * STRIDE_LENGTH);
> diff --git a/lib/ext2fs/unix_io.c b/lib/ext2fs/unix_io.c
> index c3185b6..d070cb0 100644
> --- a/lib/ext2fs/unix_io.c
> +++ b/lib/ext2fs/unix_io.c
> @@ -922,6 +922,59 @@ unimplemented:
>  	return EXT2_ET_UNIMPLEMENTED;
>  }
>  
> +#if defined(__linux__) && !defined(BLKZEROOUT)
> +#define BLKZEROOUT		_IO(0x12,127)
> +#endif
> +
> +#if defined(__linux__) && !defined(FALLOC_FL_ZERO_RANGE)
> +#define FALLOC_FL_ZERO_RANGE    0x10
> +#endif
> +
> +static errcode_t unix_zeroout(io_channel channel, unsigned long long block,
> +			      unsigned long long count)
> +{
> +	struct unix_private_data *data;
> +	int		ret;
> +
> +	EXT2_CHECK_MAGIC(channel, EXT2_ET_MAGIC_IO_CHANNEL);
> +	data = (struct unix_private_data *) channel->private_data;
> +	EXT2_CHECK_MAGIC(data, EXT2_ET_MAGIC_UNIX_IO_CHANNEL);
> +
> +	if (channel->flags & CHANNEL_FLAGS_BLOCK_DEVICE) {
> +#ifdef BLKZEROOUT
> +		__u64 range[2];
> +
> +		range[0] = (__u64)(block) * channel->block_size;
> +		range[1] = (__u64)(count) * channel->block_size;
> +
> +		ret = ioctl(data->dev, BLKZEROOUT, &range);
> +#else
> +		goto unimplemented;
> +#endif
> +	} else {
> +#if defined(HAVE_FALLOCATE) && defined(FALLOC_FL_ZERO_RANGE)
> +		/*
> +		 * If we are not on block device, try to use the zero out
> +		 * primitive.
> +		 */
> +		ret = fallocate(data->dev,
> +				FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE,
> +				(off_t)(block) * channel->block_size,
> +				(off_t)(count) * channel->block_size);
> +#else
> +		goto unimplemented;
> +#endif
> +	}
> +	if (ret < 0) {
> +		if (errno == EOPNOTSUPP)
> +			goto unimplemented;
> +		return errno;
> +	}
> +	return 0;
> +unimplemented:
> +	return EXT2_ET_UNIMPLEMENTED;
> +}
> +
>  static struct struct_io_manager struct_unix_manager = {
>  	EXT2_ET_MAGIC_IO_MANAGER,
>  	"Unix I/O Manager",
> @@ -937,6 +990,7 @@ static struct struct_io_manager struct_unix_manager = {
>  	unix_read_blk64,
>  	unix_write_blk64,
>  	unix_discard,
> +	unix_zeroout,
>  };
>  
>  io_manager unix_io_manager = &struct_unix_manager;
> 
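
For the regular-file branch of unix_zeroout(), the "try the kernel primitive, otherwise write zeros" idea can be sketched as a standalone program fragment. This is a Linux-only sketch under stated assumptions: zero_file_range() and zero_by_writing() are hypothetical names, and unlike the patch (which only treats EOPNOTSUPP as "unimplemented") this version falls back to plain writes on any fallocate() failure.

```c
#define _GNU_SOURCE             /* for fallocate() */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef FALLOC_FL_KEEP_SIZE
#define FALLOC_FL_KEEP_SIZE  0x01
#endif
#ifndef FALLOC_FL_ZERO_RANGE
#define FALLOC_FL_ZERO_RANGE 0x10   /* same value the patch defines */
#endif

/* Zero [offset, offset+len) by writing zero-filled buffers. */
static int zero_by_writing(int fd, off_t offset, off_t len)
{
	char buf[4096];

	memset(buf, 0, sizeof(buf));
	while (len > 0) {
		size_t chunk = len < (off_t)sizeof(buf) ? (size_t)len : sizeof(buf);
		ssize_t n = pwrite(fd, buf, chunk, offset);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		if (n == 0) {           /* no progress: give up */
			errno = EIO;
			return -1;
		}
		offset += n;
		len -= n;
	}
	return 0;
}

/*
 * Zero a byte range of a regular file: try the kernel primitive first,
 * then fall back to plain writes if that call fails for any reason.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int zero_file_range(int fd, off_t offset, off_t len)
{
	assert(offset >= 0 && len >= 0);
	if (len == 0)
		return 0;
	if (fallocate(fd, FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE,
		      offset, len) == 0)
		return 0;
	return zero_by_writing(fd, offset, len);
}
```

This mirrors what ext2fs_zero_blocks2() does after the patch: attempt the zero-out call and, if it does not succeed, carry on with the existing buffer-based path.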

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 06/37] debugfs: force logdump to display (old) journal contents
  2014-05-12  1:41       ` Theodore Ts'o
  2014-05-12  3:31         ` Theodore Ts'o
@ 2014-05-14  0:05         ` Darrick J. Wong
  1 sibling, 0 replies; 91+ messages in thread
From: Darrick J. Wong @ 2014-05-14  0:05 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: Lukáš Czerner, linux-ext4

On Sun, May 11, 2014 at 09:41:19PM -0400, Theodore Ts'o wrote:
> On Mon, May 05, 2014 at 05:24:53PM -0700, Darrick J. Wong wrote:
> > I'll update the manpage.  -c seems to hexdump the contents of any block that we
> > find while iterating the journal.  -b would seem to allow you to dump an
> > arbitrary block #, but I could never get it to do that.
> 
> It's used to dump information _about_ an arbitrary block.  Here's an
> example of some of the cool things you can do with logdump:

Oh, -b is for FS physical blocks, not for logical blocks in the journal itself,
I get it!  Thanks for pointing that out! :)

The patch (in the other email) looks fine.

--D
> 
> <tytso@closure> {/usr/projects/e2fsprogs/e2fsprogs}   (next)
> 1742% gunzip <  tests/f_jnl_32bit/image.gz  > /tmp/image
> <tytso@closure> {/usr/projects/e2fsprogs/e2fsprogs}   (next)
> 1743% debugfs /tmp/image
> debugfs 1.42.9 (4-Feb-2014)
> debugfs:  logdump -b 680
> Journal starts at block 1, transaction 2
>   FS block 66 logged at sequence 3, journal block 8 (flags 0x2)
>     (block bitmap for block 680: block is SET)
>   FS block 680 logged at sequence 3, journal block 205 (flags 0x2)
>   FS block 66 logged at sequence 4, journal block 231 (flags 0x2)
>     (block bitmap for block 680: block is SET)
>   FS block 680 logged at sequence 4, journal block 234 (flags 0x2)
>   FS block 66 logged at sequence 5, journal block 339 (flags 0x2)
>     (block bitmap for block 680: block is SET)
>   FS block 680 logged at sequence 5, journal block 450 (flags 0x2)
> No magic number at block 464: end of journal.
> debugfs: icheck 680
> Block	 Inode number
> 680	 2132
> debugfs:  logdump -i <2132>
> Inode 2132 is at group 1, block 364, offset 384
> Journal starts at block 1, transaction 2
>   FS block 364 logged at sequence 3, journal block 197 (flags 0x2)
>     (inode block for inode 2132):
>     Inode: 2132   Type: directory        Mode:  0755   Flags: 0x80000
>     Generation: 3167953082    Version: 0x00000008
>     User:     0   Group:     0   Size: 1024
>     File ACL: 0    Directory ACL: 0
>     Links: 9   Blockcount: 2
>     Fragment:  Address: 0    Number: 0    Size: 0
>     ctime: 0x4fa1639e -- Wed May  2 12:41:02 2012
>     atime: 0x4fa1639e -- Wed May  2 12:41:02 2012
>     mtime: 0x4fa1639e -- Wed May  2 12:41:02 2012
>     Blocks:  (0+1): 127754 (1+1): 4 (5+1): 680 
>   FS block 364 logged at sequence 4, journal block 233 (flags 0x2)
>     (inode block for inode 2132):
>     Inode: 2132   Type: directory        Mode:  0755   Flags: 0x80000
>     Generation: 3167953082    Version: 0x0000000c
>     User:     0   Group:     0   Size: 1024
>     File ACL: 0    Directory ACL: 0
>     Links: 13   Blockcount: 2
>     Fragment:  Address: 0    Number: 0    Size: 0
>     ctime: 0x4fa1639e -- Wed May  2 12:41:02 2012
>     atime: 0x4fa1639e -- Wed May  2 12:41:02 2012
>     mtime: 0x4fa1639e -- Wed May  2 12:41:02 2012
>     Blocks:  (0+1): 127754 (1+1): 4 (5+1): 680 
>   FS block 364 logged at sequence 5, journal block 434 (flags 0x2)
>     (inode block for inode 2132):
>     Inode: 2132   Type: directory        Mode:  0755   Flags: 0x80000
>     Generation: 3167953082    Version: 0x00000015
>     User:     0   Group:     0   Size: 1024
>     File ACL: 0    Directory ACL: 0
>     Links: 4   Blockcount: 2
>     Fragment:  Address: 0    Number: 0    Size: 0
>     ctime: 0x4fa163a7 -- Wed May  2 12:41:11 2012
>     atime: 0x4fa163a7 -- Wed May  2 12:41:11 2012
>     mtime: 0x4fa163a7 -- Wed May  2 12:41:11 2012
>     Blocks:  (0+1): 127754 (1+1): 4 (5+1): 680 
> No magic number at block 464: end of journal.
> debugfs: quit
> 
> The idea is that this can be useful when debugging a potentially
> corrupted journal, or for advanced file system recovery.
> 
> Note that logdump -c is most useful in combination with -b, for
> example: "logdump -b 680 -c".
> 
> 	  	      	  	   	- Ted

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 2/2] libext2fs: support allocating uninit blocks in bmap2()
  2014-05-08  0:14           ` [PATCH 2/2] libext2fs: support allocating uninit blocks in bmap2() Darrick J. Wong
@ 2014-05-27 16:28             ` Lukáš Czerner
  2014-05-28 19:48               ` Darrick J. Wong
  0 siblings, 1 reply; 91+ messages in thread
From: Lukáš Czerner @ 2014-05-27 16:28 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: tytso, linux-ext4

On Wed, 7 May 2014, Darrick J. Wong wrote:

> Date: Wed, 7 May 2014 17:14:47 -0700
> From: Darrick J. Wong <darrick.wong@oracle.com>
> To: Lukáš Czerner <lczerner@redhat.com>
> Cc: tytso@mit.edu, linux-ext4@vger.kernel.org
> Subject: [PATCH 2/2] libext2fs: support allocating uninit blocks in bmap2()
> 
> Here's the second part, which for the most part is the old patch, but wired up
> to use the bits in the other patch.

Looks good

Reviewed-by: Lukas Czerner <lczerner@redhat.com>

but I wonder how effective this will be with zero range when we're
doing this one block at a time.

Thanks!
-Lukas

> 
> --D
> ---
> In order to support fallocate, we need to be able to have
> ext2fs_bmap2() allocate blocks and put them into uninitialized
> extents.  There's a flag to do this in the extent code, but it's not
> exposed to the bmap2 interface, so plumb that in.  Eventually fuse2fs
> or somebody will use it.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  lib/ext2fs/bmap.c   |   24 ++++++++++++++++++++++--
>  lib/ext2fs/ext2fs.h |    1 +
>  2 files changed, 23 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/ext2fs/bmap.c b/lib/ext2fs/bmap.c
> index c1d0e6f..a4dc8ef 100644
> --- a/lib/ext2fs/bmap.c
> +++ b/lib/ext2fs/bmap.c
> @@ -72,6 +72,11 @@ static _BMAP_INLINE_ errcode_t block_ind_bmap(ext2_filsys fs, int flags,
>  					    block_buf + fs->blocksize, &b);
>  		if (retval)
>  			return retval;
> +		if (flags & BMAP_UNINIT) {
> +			retval = ext2fs_zero_blocks2(fs, b, 1, NULL, NULL);
> +			if (retval)
> +				return retval;
> +		}
>  
>  #ifdef WORDS_BIGENDIAN
>  		((blk_t *) block_buf)[nr] = ext2fs_swab32(b);
> @@ -214,10 +219,13 @@ static errcode_t extent_bmap(ext2_filsys fs, ext2_ino_t ino,
>  	errcode_t		retval = 0;
>  	blk64_t			blk64 = 0;
>  	int			alloc = 0;
> +	int			set_flags;
> +
> +	set_flags = bmap_flags & BMAP_UNINIT ? EXT2_EXTENT_SET_BMAP_UNINIT : 0;
>  
>  	if (bmap_flags & BMAP_SET) {
>  		retval = ext2fs_extent_set_bmap(handle, block,
> -						*phys_blk, 0);
> +						*phys_blk, set_flags);
>  		return retval;
>  	}
>  	retval = ext2fs_extent_goto(handle, block);
> @@ -254,7 +262,7 @@ got_block:
>  		alloc++;
>  	set_extent:
>  		retval = ext2fs_extent_set_bmap(handle, block,
> -						blk64, 0);
> +						blk64, set_flags);
>  		if (retval) {
>  			ext2fs_block_alloc_stats2(fs, blk64, -1);
>  			return retval;
> @@ -345,6 +353,12 @@ errcode_t ext2fs_bmap2(ext2_filsys fs, ext2_ino_t ino, struct ext2_inode *inode,
>  		goto done;
>  	}
>  
> +	if ((bmap_flags & BMAP_SET) && (bmap_flags & BMAP_UNINIT)) {
> +		retval = ext2fs_zero_blocks2(fs, *phys_blk, 1, NULL, NULL);
> +		if (retval)
> +			goto done;
> +	}
> +
>  	if (block < EXT2_NDIR_BLOCKS) {
>  		if (bmap_flags & BMAP_SET) {
>  			b = *phys_blk;
> @@ -360,6 +374,12 @@ errcode_t ext2fs_bmap2(ext2_filsys fs, ext2_ino_t ino, struct ext2_inode *inode,
>  			retval = ext2fs_alloc_block(fs, b, block_buf, &b);
>  			if (retval)
>  				goto done;
> +			if (bmap_flags & BMAP_UNINIT) {
> +				retval = ext2fs_zero_blocks2(fs, b, 1, NULL,
> +							     NULL);
> +				if (retval)
> +					goto done;
> +			}
>  			inode_bmap(inode, block) = b;
>  			blocks_alloc++;
>  			*phys_blk = b;
> diff --git a/lib/ext2fs/ext2fs.h b/lib/ext2fs/ext2fs.h
> index 599c972..819a14a 100644
> --- a/lib/ext2fs/ext2fs.h
> +++ b/lib/ext2fs/ext2fs.h
> @@ -527,6 +527,7 @@ typedef struct ext2_icount *ext2_icount_t;
>   */
>  #define BMAP_ALLOC	0x0001
>  #define BMAP_SET	0x0002
> +#define BMAP_UNINIT	0x0004
>  
>  /*
>   * Returned flags from ext2fs_bmap
> 
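
The flag handling in the patch boils down to two small predicates. A minimal sketch follows: the BMAP_* values are copied from the ext2fs.h hunk, while DEMO_EXTENT_SET_UNINIT and both function names are hypothetical stand-ins used only for illustration.

```c
#include <assert.h>

/* Values copied from the ext2fs.h hunk in the patch. */
#define BMAP_ALLOC   0x0001
#define BMAP_SET     0x0002
#define BMAP_UNINIT  0x0004

/* Hypothetical stand-in for the extent-layer "mark uninitialized" flag. */
#define DEMO_EXTENT_SET_UNINIT 0x0001

/* Translate bmap-level flags into the flags passed to the extent layer. */
static int demo_extent_flags(int bmap_flags)
{
	/* `&` binds tighter than `?:`, so the parentheses are for readers. */
	return (bmap_flags & BMAP_UNINIT) ? DEMO_EXTENT_SET_UNINIT : 0;
}

/* A caller-supplied block is zeroed only when both SET and UNINIT are given. */
static int demo_must_zero_supplied_block(int bmap_flags)
{
	return (bmap_flags & BMAP_SET) && (bmap_flags & BMAP_UNINIT);
}
```

Because the flags are distinct bits, a caller can combine them, for example BMAP_ALLOC | BMAP_UNINIT, and each test above still sees only the bit it cares about.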

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 2/2] libext2fs: support allocating uninit blocks in bmap2()
  2014-05-27 16:28             ` Lukáš Czerner
@ 2014-05-28 19:48               ` Darrick J. Wong
  0 siblings, 0 replies; 91+ messages in thread
From: Darrick J. Wong @ 2014-05-28 19:48 UTC (permalink / raw)
  To: Lukáš Czerner; +Cc: tytso, linux-ext4

On Tue, May 27, 2014 at 06:28:22PM +0200, Lukáš Czerner wrote:
> On Wed, 7 May 2014, Darrick J. Wong wrote:
> 
> > Date: Wed, 7 May 2014 17:14:47 -0700
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > To: Lukáš Czerner <lczerner@redhat.com>
> > Cc: tytso@mit.edu, linux-ext4@vger.kernel.org
> > Subject: [PATCH 2/2] libext2fs: support allocating uninit blocks in bmap2()
> > 
> > Here's the second part, which for the most part is the old patch, but wired up
> > to use the bits in the other patch.
> 
> Looks good
> 
> Reviewed-by: Lukas Czerner <lczerner@redhat.com>
> 
> but I wonder how effective this will be with zero range when we're
> doing this one block at a time.

Not a lot, but hopefully you used ext2fs_fallocate() (and extents) if you
wanted more than one block.

--D
> 
> Thanks!
> -Lukas
> 
> > 
> > --D
> > ---
> > In order to support fallocate, we need to be able to have
> > ext2fs_bmap2() allocate blocks and put them into uninitialized
> > extents.  There's a flag to do this in the extent code, but it's not
> > exposed to the bmap2 interface, so plumb that in.  Eventually fuse2fs
> > or somebody will use it.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> >  lib/ext2fs/bmap.c   |   24 ++++++++++++++++++++++--
> >  lib/ext2fs/ext2fs.h |    1 +
> >  2 files changed, 23 insertions(+), 2 deletions(-)
> > 
> > diff --git a/lib/ext2fs/bmap.c b/lib/ext2fs/bmap.c
> > index c1d0e6f..a4dc8ef 100644
> > --- a/lib/ext2fs/bmap.c
> > +++ b/lib/ext2fs/bmap.c
> > @@ -72,6 +72,11 @@ static _BMAP_INLINE_ errcode_t block_ind_bmap(ext2_filsys fs, int flags,
> >  					    block_buf + fs->blocksize, &b);
> >  		if (retval)
> >  			return retval;
> > +		if (flags & BMAP_UNINIT) {
> > +			retval = ext2fs_zero_blocks2(fs, b, 1, NULL, NULL);
> > +			if (retval)
> > +				return retval;
> > +		}
> >  
> >  #ifdef WORDS_BIGENDIAN
> >  		((blk_t *) block_buf)[nr] = ext2fs_swab32(b);
> > @@ -214,10 +219,13 @@ static errcode_t extent_bmap(ext2_filsys fs, ext2_ino_t ino,
> >  	errcode_t		retval = 0;
> >  	blk64_t			blk64 = 0;
> >  	int			alloc = 0;
> > +	int			set_flags;
> > +
> > +	set_flags = bmap_flags & BMAP_UNINIT ? EXT2_EXTENT_SET_BMAP_UNINIT : 0;
> >  
> >  	if (bmap_flags & BMAP_SET) {
> >  		retval = ext2fs_extent_set_bmap(handle, block,
> > -						*phys_blk, 0);
> > +						*phys_blk, set_flags);
> >  		return retval;
> >  	}
> >  	retval = ext2fs_extent_goto(handle, block);
> > @@ -254,7 +262,7 @@ got_block:
> >  		alloc++;
> >  	set_extent:
> >  		retval = ext2fs_extent_set_bmap(handle, block,
> > -						blk64, 0);
> > +						blk64, set_flags);
> >  		if (retval) {
> >  			ext2fs_block_alloc_stats2(fs, blk64, -1);
> >  			return retval;
> > @@ -345,6 +353,12 @@ errcode_t ext2fs_bmap2(ext2_filsys fs, ext2_ino_t ino, struct ext2_inode *inode,
> >  		goto done;
> >  	}
> >  
> > +	if ((bmap_flags & BMAP_SET) && (bmap_flags & BMAP_UNINIT)) {
> > +		retval = ext2fs_zero_blocks2(fs, *phys_blk, 1, NULL, NULL);
> > +		if (retval)
> > +			goto done;
> > +	}
> > +
> >  	if (block < EXT2_NDIR_BLOCKS) {
> >  		if (bmap_flags & BMAP_SET) {
> >  			b = *phys_blk;
> > @@ -360,6 +374,12 @@ errcode_t ext2fs_bmap2(ext2_filsys fs, ext2_ino_t ino, struct ext2_inode *inode,
> >  			retval = ext2fs_alloc_block(fs, b, block_buf, &b);
> >  			if (retval)
> >  				goto done;
> > +			if (bmap_flags & BMAP_UNINIT) {
> > +				retval = ext2fs_zero_blocks2(fs, b, 1, NULL,
> > +							     NULL);
> > +				if (retval)
> > +					goto done;
> > +			}
> >  			inode_bmap(inode, block) = b;
> >  			blocks_alloc++;
> >  			*phys_blk = b;
> > diff --git a/lib/ext2fs/ext2fs.h b/lib/ext2fs/ext2fs.h
> > index 599c972..819a14a 100644
> > --- a/lib/ext2fs/ext2fs.h
> > +++ b/lib/ext2fs/ext2fs.h
> > @@ -527,6 +527,7 @@ typedef struct ext2_icount *ext2_icount_t;
> >   */
> >  #define BMAP_ALLOC	0x0001
> >  #define BMAP_SET	0x0002
> > +#define BMAP_UNINIT	0x0004
> >  
> >  /*
> >   * Returned flags from ext2fs_bmap
> > 
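
To put a rough number on the "one block at a time" concern: if a caller could hand over runs of consecutive blocks, the zero-out calls would drop from one per block to one per run. The sketch below only counts the runs; count_zero_ranges() is a hypothetical helper and not part of libext2fs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Count how many zero-out calls are needed for a sorted list of block
 * numbers if each run of consecutive blocks is submitted as one range.
 * Submitting one block at a time would always need `n` calls.
 */
static size_t count_zero_ranges(const uint64_t *blocks, size_t n)
{
	size_t i, ranges = 0;

	assert(blocks != NULL || n == 0);

	for (i = 0; i < n; i++) {
		/* A new range starts at the first block or after a gap. */
		if (i == 0 || blocks[i] != blocks[i - 1] + 1)
			ranges++;
	}
	return ranges;
}
```

For the blocks 10, 11, 12, 20, 21, 30 this gives three ranges instead of six single-block calls.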

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 28/37] libext2fs: provide a function to set inode size
  2014-05-01 23:15 ` [PATCH 28/37] libext2fs: provide a function to set inode size Darrick J. Wong
@ 2014-07-26 18:37   ` Theodore Ts'o
  0 siblings, 0 replies; 91+ messages in thread
From: Theodore Ts'o @ 2014-07-26 18:37 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-ext4

On Thu, May 01, 2014 at 04:15:26PM -0700, Darrick J. Wong wrote:
> Provide an API to set i_size in an inode and take care of all required
> feature flag modifications.  Refactor the code to use this new
> function.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Applied, although I moved the function to blk_num.c and renamed it to
be ext2fs_inode_size_set() to be consistent with the other functions
in blk_num.c.  I also added another use of the function in
misc/mk_hugefile.c.

							- Ted

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 24/37] e2fsck: read-ahead metadata during passes 1, 2, and 4
  2014-05-01 23:14 ` [PATCH 24/37] e2fsck: read-ahead metadata during passes 1, 2, and 4 Darrick J. Wong
@ 2014-07-28 22:25   ` Darrick J. Wong
  0 siblings, 0 replies; 91+ messages in thread
From: Darrick J. Wong @ 2014-07-28 22:25 UTC (permalink / raw)
  To: tytso; +Cc: linux-ext4

On Thu, May 01, 2014 at 04:14:59PM -0700, Darrick J. Wong wrote:
> e2fsck pass1 is modified to use the block group data prefetch function
> to try to fetch the inode tables into the pagecache before it is
> needed.  In order to avoid cache thrashing, we limit ourselves to
> prefetching at most half the available memory.
> 
> pass2 is modified to use the dirblock prefetching function to prefetch
> the list of directory blocks that are assembled in pass1.  So long as
> we don't anticipate rehashing the dirs (pass 3a), we can release the
> dirblocks as soon as we're done checking them.
> 
> pass4 is modified to prefetch the block and inode bitmaps in
> anticipation of pass 5, because pass4 is entirely CPU bound.
> 
> In general, these mechanisms can halve fsck time, if the host system
> has sufficient memory and the storage system can provide a lot of
> IOPs.  SSDs and multi-spindle RAIDs see the most speedup; single disks
> experience a modest speedup, and single-spindle USB mass storage
> devices see hardly any benefit.
> 
> By default, readahead will try to fill half the physical memory in the
> system.  The -E readahead_mem_kb= option can be given to specify the
> amount of memory to use for readahead, or zero to disable it entirely;
> or an option can be given in e2fsck.conf.

Ted wondered how much speed we gain from using pthreads to spawn the readahead
calls, instead of leaving the main thread in charge of scheduling its own
readahead.  From what I can tell, the extra thread reduces run time by about 2%
(as compared to the no-readahead-at-all run times) on a spinning RAID1 I have.
For SSDs I couldn't see much of a difference, so I guess I can drop the pthread
part.

From what I can tell, a ^flexbg FS with a lot of directory blocks stands to
gain the most from pthreads.  However, I don't think it'll make much difference
either way.

--D
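
The readahead budget rule from the patch description (half of physical memory by default, a user value otherwise, zero to disable) can be sketched as below. The function names and the use of a negative value for "no user setting" are conventions of this sketch, not of e2fsck; sysconf(_SC_PHYS_PAGES) is a glibc/Linux extension.

```c
#include <assert.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Pick the readahead budget in KiB.
 * - `override_kb` < 0 means "no user setting": use half of physical memory.
 * - `override_kb` == 0 disables readahead.
 * - any other value is used as given.
 */
static uint64_t readahead_budget_kb(uint64_t total_kb, int64_t override_kb)
{
	if (override_kb >= 0)
		return (uint64_t)override_kb;
	return total_kb / 2;
}

/* Physical memory in KiB via sysconf(); returns 0 if it cannot be determined. */
static uint64_t phys_mem_kb(void)
{
	long pages = sysconf(_SC_PHYS_PAGES);
	long page_size = sysconf(_SC_PAGESIZE);

	if (pages <= 0 || page_size <= 0)
		return 0;
	return (uint64_t)pages * (uint64_t)page_size / 1024;
}
```

A caller would combine the two as readahead_budget_kb(phys_mem_kb(), -1) when no -E readahead_mem_kb= value was given.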
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  MCONFIG.in              |    1 
>  configure               |   49 +++++++++++++++++
>  configure.in            |    6 ++
>  e2fsck/Makefile.in      |    4 +
>  e2fsck/e2fsck.8.in      |    5 ++
>  e2fsck/e2fsck.c         |  136 +++++++++++++++++++++++++++++++++++++++++++++++
>  e2fsck/e2fsck.conf.5.in |   13 ++++
>  e2fsck/e2fsck.h         |   25 +++++++++
>  e2fsck/pass1.c          |  106 ++++++++++++++++++++++++++++++++++++-
>  e2fsck/pass2.c          |   95 ++++++++++++++++++++++++++++++++-
>  e2fsck/pass4.c          |   22 ++++++++
>  e2fsck/prof_err.et      |    1 
>  e2fsck/rehash.c         |   10 +++
>  e2fsck/unix.c           |   35 ++++++++++++
>  e2fsck/util.c           |   51 ++++++++++++++++++
>  lib/config.h.in         |    9 +++
>  16 files changed, 563 insertions(+), 5 deletions(-)
> 
> 
> diff --git a/MCONFIG.in b/MCONFIG.in
> index 7e520be..352c133 100644
> --- a/MCONFIG.in
> +++ b/MCONFIG.in
> @@ -116,6 +116,7 @@ LIBUUID = @LIBUUID@ @SOCKET_LIB@
>  LIBQUOTA = @STATIC_LIBQUOTA@
>  LIBBLKID = @LIBBLKID@ @PRIVATE_LIBS_CMT@ $(LIBUUID)
>  LIBINTL = @LIBINTL@
> +LIBPTHREADS = @PTHREADS_LIB@
>  SYSLIBS = @LIBS@
>  DEPLIBSS = $(LIB)/libss@LIB_EXT@
>  DEPLIBCOM_ERR = $(LIB)/libcom_err@LIB_EXT@
> diff --git a/configure b/configure
> index 7b0a0d1..5b89229 100755
> --- a/configure
> +++ b/configure
> @@ -639,6 +639,7 @@ CYGWIN_CMT
>  LINUX_CMT
>  UNI_DIFF_OPTS
>  SEM_INIT_LIB
> +PTHREADS_LIB
>  SOCKET_LIB
>  SIZEOF_OFF_T
>  SIZEOF_LONG_LONG
> @@ -10474,7 +10475,7 @@ fi
>  done
>  
>  fi
> -for ac_header in  	dirent.h 	errno.h 	execinfo.h 	getopt.h 	malloc.h 	mntent.h 	paths.h 	semaphore.h 	setjmp.h 	signal.h 	stdarg.h 	stdint.h 	stdlib.h 	termios.h 	termio.h 	unistd.h 	utime.h 	linux/falloc.h 	linux/fd.h 	linux/major.h 	linux/loop.h 	net/if_dl.h 	netinet/in.h 	sys/disklabel.h 	sys/file.h 	sys/ioctl.h 	sys/mkdev.h 	sys/mman.h 	sys/prctl.h 	sys/queue.h 	sys/resource.h 	sys/select.h 	sys/socket.h 	sys/sockio.h 	sys/stat.h 	sys/syscall.h 	sys/sysmacros.h 	sys/time.h 	sys/types.h 	sys/un.h 	sys/wait.h
> +for ac_header in  	dirent.h 	errno.h 	execinfo.h 	getopt.h 	malloc.h 	mntent.h 	paths.h 	semaphore.h 	setjmp.h 	signal.h 	stdarg.h 	stdint.h 	stdlib.h 	termios.h 	termio.h 	unistd.h 	utime.h 	linux/falloc.h 	linux/fd.h 	linux/major.h 	linux/loop.h 	net/if_dl.h 	netinet/in.h 	sys/disklabel.h 	sys/file.h 	sys/ioctl.h 	sys/mkdev.h 	sys/mman.h 	sys/prctl.h 	sys/queue.h 	sys/resource.h 	sys/select.h 	sys/socket.h 	sys/sockio.h 	sys/stat.h 	sys/syscall.h 	sys/sysctl.h 	sys/sysmacros.h 	sys/time.h 	sys/types.h 	sys/un.h 	sys/wait.h
>  do :
>    as_ac_Header=`$as_echo "ac_cv_header_$ac_header" | $as_tr_sh`
>  ac_fn_c_check_header_mongrel "$LINENO" "$ac_header" "$as_ac_Header" "$ac_includes_default"
> @@ -11235,6 +11236,52 @@ if test $ac_cv_have_optreset = yes; then
>  $as_echo "#define HAVE_OPTRESET 1" >>confdefs.h
>  
>  fi
> +PTHREADS_LIB='-lpthread'
> +{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for pthread_create in -lpthread" >&5
> +$as_echo_n "checking for pthread_create in -lpthread... " >&6; }
> +if ${ac_cv_lib_pthread_pthread_create+:} false; then :
> +  $as_echo_n "(cached) " >&6
> +else
> +  ac_check_lib_save_LIBS=$LIBS
> +LIBS="-lpthread  $LIBS"
> +cat confdefs.h - <<_ACEOF >conftest.$ac_ext
> +/* end confdefs.h.  */
> +
> +/* Override any GCC internal prototype to avoid an error.
> +   Use char because int might match the return type of a GCC
> +   builtin and then its argument prototype would still apply.  */
> +#ifdef __cplusplus
> +extern "C"
> +#endif
> +char pthread_create ();
> +int
> +main ()
> +{
> +return pthread_create ();
> +  ;
> +  return 0;
> +}
> +_ACEOF
> +if ac_fn_c_try_link "$LINENO"; then :
> +  ac_cv_lib_pthread_pthread_create=yes
> +else
> +  ac_cv_lib_pthread_pthread_create=no
> +fi
> +rm -f core conftest.err conftest.$ac_objext \
> +    conftest$ac_exeext conftest.$ac_ext
> +LIBS=$ac_check_lib_save_LIBS
> +fi
> +{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_lib_pthread_pthread_create" >&5
> +$as_echo "$ac_cv_lib_pthread_pthread_create" >&6; }
> +if test "x$ac_cv_lib_pthread_pthread_create" = xyes; then :
> +  cat >>confdefs.h <<_ACEOF
> +#define HAVE_LIBPTHREAD 1
> +_ACEOF
> +
> +  LIBS="-lpthread $LIBS"
> +
> +fi
> +
>  
>  SEM_INIT_LIB=''
>  ac_fn_c_check_func "$LINENO" "sem_init" "ac_cv_func_sem_init"
> diff --git a/configure.in b/configure.in
> index f28bd46..d2cfe41 100644
> --- a/configure.in
> +++ b/configure.in
> @@ -961,6 +961,7 @@ AC_CHECK_HEADERS(m4_flatten([
>  	sys/sockio.h
>  	sys/stat.h
>  	sys/syscall.h
> +	sys/sysctl.h
>  	sys/sysmacros.h
>  	sys/time.h
>  	sys/types.h
> @@ -1173,6 +1174,11 @@ if test $ac_cv_have_optreset = yes; then
>    AC_DEFINE(HAVE_OPTRESET, 1, [Define to 1 if optreset for getopt is present])
>  fi
>  dnl
> +dnl Test for pthread_create in -lpthread
> +dnl
> +PTHREADS_LIB='-lpthread'
> +AC_CHECK_LIB(pthread, pthread_create, AC_SUBST(PTHREADS_LIB))
> +dnl
>  dnl Test for sem_init, and which library it might require:
>  dnl
>  AH_TEMPLATE([HAVE_SEM_INIT], [Define to 1 if sem_init() exists])
> diff --git a/e2fsck/Makefile.in b/e2fsck/Makefile.in
> index 2e08982..548df9c 100644
> --- a/e2fsck/Makefile.in
> +++ b/e2fsck/Makefile.in
> @@ -16,13 +16,13 @@ MANPAGES=	e2fsck.8
>  FMANPAGES=	e2fsck.conf.5
>  
>  LIBS= $(LIBQUOTA) $(LIBEXT2FS) $(LIBCOM_ERR) $(LIBBLKID) $(LIBUUID) \
> -	$(LIBINTL) $(LIBE2P) $(SYSLIBS)
> +	$(LIBINTL) $(LIBE2P) $(SYSLIBS) $(LIBPTHREADS)
>  DEPLIBS= $(DEPLIBQUOTA) $(LIBEXT2FS) $(DEPLIBCOM_ERR) $(DEPLIBBLKID) \
>  	 $(DEPLIBUUID) $(DEPLIBE2P)
>  
>  STATIC_LIBS= $(STATIC_LIBQUOTA) $(STATIC_LIBEXT2FS) $(STATIC_LIBCOM_ERR) \
>  	     $(STATIC_LIBBLKID) $(STATIC_LIBUUID) $(LIBINTL) $(STATIC_LIBE2P) \
> -	     $(SYSLIBS)
> +	     $(SYSLIBS) $(LIBPTHREADS)
>  STATIC_DEPLIBS= $(DEPSTATIC_LIBQUOTA) $(STATIC_LIBEXT2FS) \
>  		$(DEPSTATIC_LIBCOM_ERR) $(DEPSTATIC_LIBBLKID) \
>  		$(DEPSTATIC_LIBUUID) $(DEPSTATIC_LIBE2P)
> diff --git a/e2fsck/e2fsck.8.in b/e2fsck/e2fsck.8.in
> index 43ee063..820281d 100644
> --- a/e2fsck/e2fsck.8.in
> +++ b/e2fsck/e2fsck.8.in
> @@ -208,6 +208,11 @@ option may prevent you from further manual data recovery.
>  Do not attempt to discard free blocks and unused inode blocks. This option is
>  exactly the opposite of discard option. This is set as default.
>  .TP
> +.BI readahead_mem_kb
> +Use at most this many KiB to pre-fetch metadata in the hopes of reducing
> +e2fsck runtime.  By default, this uses half the physical memory in the
> +system; setting this value to zero disables readahead entirely.
> +.TP
>  .BI strict_csums
>  Verify each metadata object's checksum before checking anything other fields
>  in the metadata object.  If the verification fails, offer to clear the item,
> diff --git a/e2fsck/e2fsck.c b/e2fsck/e2fsck.c
> index 0ec1540..c5d823c 100644
> --- a/e2fsck/e2fsck.c
> +++ b/e2fsck/e2fsck.c
> @@ -15,6 +15,10 @@
>  #include "e2fsck.h"
>  #include "problem.h"
>  
> +#ifdef HAVE_PTHREAD_H
> +#include <pthread.h>
> +#endif
> +
>  /*
>   * This function allocates an e2fsck context
>   */
> @@ -44,6 +48,8 @@ errcode_t e2fsck_allocate_context(e2fsck_t *ret)
>  			context->flags |= E2F_FLAG_TIME_INSANE;
>  	}
>  
> +	e2fsck_init_thread(&context->ra_thread);
> +
>  	*ret = context;
>  	return 0;
>  }
> @@ -209,6 +215,7 @@ int e2fsck_run(e2fsck_t ctx)
>  {
>  	int	i;
>  	pass_t	e2fsck_pass;
> +	errcode_t	err;
>  
>  #ifdef HAVE_SETJMP_H
>  	if (setjmp(ctx->abort_loc)) {
> @@ -226,6 +233,10 @@ int e2fsck_run(e2fsck_t ctx)
>  		e2fsck_pass(ctx);
>  		if (ctx->progress)
>  			(void) (ctx->progress)(ctx, 0, 0, 0);
> +		err = e2fsck_stop_thread(&ctx->ra_thread, NULL);
> +		if (err)
> +			com_err(ctx->program_name, err, "%s",
> +				_("while stopping readahead"));
>  	}
>  	ctx->flags &= ~E2F_FLAG_SETJMP_OK;
>  
> @@ -233,3 +244,128 @@ int e2fsck_run(e2fsck_t ctx)
>  		return (ctx->flags & E2F_FLAG_RUN_RETURN);
>  	return 0;
>  }
> +
> +#ifdef HAVE_PTHREAD_H
> +struct run_threaded {
> +	struct e2fsck_thread *thread;
> +	void * (*func)(void *);
> +	void (*cleanup)(void *);
> +	void *arg;
> +};
> +
> +static void run_threaded_cleanup(void *p)
> +{
> +	struct run_threaded *rt = p;
> +
> +	if (rt->cleanup)
> +		rt->cleanup(rt->arg);
> +	pthread_mutex_lock(&rt->thread->lock);
> +	rt->thread->running = 0;
> +	pthread_mutex_unlock(&rt->thread->lock);
> +	ext2fs_free_mem(&rt);
> +}
> +
> +static void *run_threaded_helper(void *p)
> +{
> +	int old;
> +	struct run_threaded *rt = p;
> +	void *ret;
> +
> +	pthread_cleanup_push(run_threaded_cleanup, rt);
> +	pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS, &old);
> +	ret = rt->func(rt->arg);
> +	pthread_setcanceltype(old, NULL);
> +	pthread_cleanup_pop(1);
> +	pthread_exit(ret);
> +	return NULL;
> +}
> +#endif /* HAVE_PTHREAD_H */
> +
> +errcode_t e2fsck_init_thread(struct e2fsck_thread *thread)
> +{
> +	errcode_t err = 0;
> +
> +	thread->magic = E2FSCK_ET_MAGIC_RUN_THREAD;
> +#ifdef HAVE_PTHREAD_H
> +	err = pthread_mutex_init(&thread->lock, NULL);
> +#endif /* HAVE_PTHREAD_H */
> +
> +	return err;
> +}
> +
> +errcode_t e2fsck_run_thread(struct e2fsck_thread *thread,
> +			    void * (*func)(void *), void (*cleanup)(void *),
> +			    void *arg)
> +{
> +#ifdef HAVE_PTHREAD_H
> +	struct run_threaded *rt;
> +#endif
> +	errcode_t err = 0;
> +
> +	EXT2_CHECK_MAGIC(thread, E2FSCK_ET_MAGIC_RUN_THREAD);
> +#ifdef HAVE_PTHREAD_H
> +	err = pthread_mutex_lock(&thread->lock);
> +	if (err)
> +		return err;
> +
> +	if (thread->running) {
> +		err = EAGAIN;
> +		goto out;
> +	}
> +
> +	err = pthread_join(thread->tid, NULL);
> +	if (err && err != ESRCH)
> +		goto out;
> +
> +	err = ext2fs_get_mem(sizeof(*rt), &rt);
> +	if (err)
> +		goto out;
> +
> +	rt->thread = thread;
> +	rt->func = func;
> +	rt->cleanup = cleanup;
> +	rt->arg = arg;
> +
> +	err = pthread_create(&thread->tid, NULL, run_threaded_helper, rt);
> +	if (err)
> +		ext2fs_free_mem(&rt);
> +	else
> +		thread->running = 1;
> +out:
> +	pthread_mutex_unlock(&thread->lock);
> +#else
> +	thread->ret = func(arg);
> +	if (cleanup)
> +		cleanup(arg);
> +#endif /* HAVE_PTHREAD_H */
> +
> +	return err;
> +}
> +
> +errcode_t e2fsck_stop_thread(struct e2fsck_thread *thread, void **ret)
> +{
> +	errcode_t err = 0, err2;
> +
> +	EXT2_CHECK_MAGIC(thread, E2FSCK_ET_MAGIC_RUN_THREAD);
> +
> +#ifdef HAVE_PTHREAD_H
> +	err = pthread_mutex_lock(&thread->lock);
> +	if (err)
> +		return err;
> +	if (thread->running)
> +		err = pthread_cancel(thread->tid);
> +	if (err == ESRCH)
> +		err = 0;
> +	err2 = pthread_mutex_unlock(&thread->lock);
> +	if (!err && err2)
> +		err = err2;
> +	if (!err)
> +		err = pthread_join(thread->tid, ret);
> +	if (err == ESRCH)
> +		err = 0;
> +#else
> +	if (ret)
> +		*ret = thread->ret;
> +#endif
> +	return err;
> +}
> diff --git a/e2fsck/e2fsck.conf.5.in b/e2fsck/e2fsck.conf.5.in
> index a8219a8..fcda392 100644
> --- a/e2fsck/e2fsck.conf.5.in
> +++ b/e2fsck/e2fsck.conf.5.in
> @@ -205,6 +205,19 @@ of that type are squelched.  This can be useful if the console is slow
>  (i.e., connected to a serial port) and so a large amount of output could
>  end up delaying the boot process for a long time (potentially hours).
>  .TP
> +.I readahead_mem_pct
> +Use no more than this percentage of memory to try to read in metadata blocks
> +ahead of the main e2fsck thread.  This should reduce run times, depending on
> +the speed of the underlying storage and the amount of free memory.  By default,
> +this is set to 50%.
> +.TP
> +.I readahead_mem_kb
> +Use no more than this amount of memory to read in metadata blocks ahead of the
> +main checking thread.  Setting this value to zero disables readahead entirely.
> +There is no default, but see
> +.B readahead_mem_pct
> +for more details.
> +.TP
>  .I report_features
>  If this boolean relation is true, e2fsck will print the file system
>  features as part of its verbose reporting (i.e., if the
> diff --git a/e2fsck/e2fsck.h b/e2fsck/e2fsck.h
> index c739329..59045bc 100644
> --- a/e2fsck/e2fsck.h
> +++ b/e2fsck/e2fsck.h
> @@ -11,6 +11,7 @@
>  
>  #include <stdio.h>
>  #include <string.h>
> +#include <stdint.h>
>  #ifdef HAVE_UNISTD_H
>  #include <unistd.h>
>  #endif
> @@ -69,6 +70,24 @@
>  
>  #include "quota/mkquota.h"
>  
> +/* Functions to run something asynchronously */
> +struct e2fsck_thread {
> +	int magic;
> +#ifdef HAVE_PTHREAD_H
> +	int running;
> +	pthread_t tid;
> +	pthread_mutex_t lock;
> +#else
> +	void *ret;
> +#endif /* HAVE_PTHREAD_H */
> +};
> +
> +errcode_t e2fsck_init_thread(struct e2fsck_thread *thread);
> +errcode_t e2fsck_run_thread(struct e2fsck_thread *thread,
> +			    void * (*func)(void *), void (*cleanup)(void *),
> +			    void *arg);
> +errcode_t e2fsck_stop_thread(struct e2fsck_thread *thread, void **ret);
> +
>  /*
>   * Exit codes used by fsck-type programs
>   */
> @@ -373,6 +392,10 @@ struct e2fsck_struct {
>  	 * e2fsck functions themselves.
>  	 */
>  	void *priv_data;
> +
> +	/* How much are we allowed to readahead? */
> +	unsigned long long readahead_mem_kb;
> +	struct e2fsck_thread ra_thread;
>  };
>  
>  /* Used by the region allocation code */
> @@ -507,6 +530,7 @@ void e2fsck_rehash_dir_later(e2fsck_t ctx, ext2_ino_t ino);
>  int e2fsck_dir_will_be_rehashed(e2fsck_t ctx, ext2_ino_t ino);
>  errcode_t e2fsck_rehash_dir(e2fsck_t ctx, ext2_ino_t ino);
>  void e2fsck_rehash_directories(e2fsck_t ctx);
> +int e2fsck_will_rehash_dirs(e2fsck_t ctx);
>  
>  /* sigcatcher.c */
>  void sigcatcher_setup(void);
> @@ -585,6 +609,7 @@ extern errcode_t e2fsck_allocate_subcluster_bitmap(ext2_filsys fs,
>  						   int default_type,
>  						   const char *profile_name,
>  						   ext2fs_block_bitmap *ret);
> +int64_t get_memory_size(void);
>  
>  /* unix.c */
>  extern void e2fsck_clear_progbar(e2fsck_t ctx);
> diff --git a/e2fsck/pass1.c b/e2fsck/pass1.c
> index eb9497c..376ee23 100644
> --- a/e2fsck/pass1.c
> +++ b/e2fsck/pass1.c
> @@ -589,6 +589,67 @@ static errcode_t recheck_bad_inode_checksum(ext2_filsys fs, ext2_ino_t ino,
>  	return 0;
>  }
>  
> +struct pass1ra_ctx {
> +	ext2_filsys fs;
> +	dgrp_t group;
> +	dgrp_t ngroups;
> +};
> +
> +static void pass1_readahead_cleanup(void *p)
> +{
> +	struct pass1ra_ctx *c = p;
> +
> +	ext2fs_free_mem(&p);
> +}
> +
> +static void *pass1_readahead(void *p)
> +{
> +	struct pass1ra_ctx *c = p;
> +	errcode_t err;
> +
> +	e2fsck_readahead(c->fs, E2FSCK_READA_ITABLE, c->group, c->ngroups);
> +	return NULL;
> +}
> +
> +static errcode_t initiate_readahead(e2fsck_t ctx, dgrp_t group, dgrp_t ngroups)
> +{
> +	struct pass1ra_ctx *ractx;
> +	errcode_t err;
> +
> +	err = ext2fs_get_mem(sizeof(*ractx), &ractx);
> +	if (err)
> +		return err;
> +
> +	ractx->fs = ctx->fs;
> +	ractx->group = group;
> +	ractx->ngroups = ngroups;
> +
> +	err = e2fsck_run_thread(&ctx->ra_thread, pass1_readahead,
> +				pass1_readahead_cleanup, ractx);
> +	if (err)
> +		ext2fs_free_mem(&ractx);
> +
> +	return err;
> +}
> +
> +static ext2_ino_t estimate_next_ra_inode(ext2_filsys fs, dgrp_t start,
> +					 dgrp_t end)
> +{
> +	ext2_ino_t inodes_per_group = fs->super->s_inodes_per_group;
> +	dgrp_t grp;
> +
> +	if (end >= fs->group_desc_count)
> +		end = fs->group_desc_count - 1;
> +
> +	for (grp = end; grp + 1 > start; grp--) {
> +		if (ext2fs_bg_flags_test(fs, grp, EXT2_BG_INODE_UNINIT))
> +			continue;
> +		return grp * inodes_per_group;
> +	}
> +
> +	return end * inodes_per_group;
> +}
> +
>  void e2fsck_pass1(e2fsck_t ctx)
>  {
>  	int	i;
> @@ -611,10 +672,40 @@ void e2fsck_pass1(e2fsck_t ctx)
>  	int		busted_fs_time = 0;
>  	int		inode_size;
>  	int		failed_csum = 0;
> +	dgrp_t		grp = 0;
> +	ext2_ino_t	ra_threshold = 0, ino_threshold;
> +	dgrp_t		ra_groups = 0;
> +	ext2_ino_t	inodes_per_group = fs->super->s_inodes_per_group;
> +	errcode_t	err;
>  
>  	init_resource_track(&rtrack, ctx->fs->io);
>  	clear_problem_context(&pctx);
>  
> +	/* If we can do readahead, figure out how many groups to pull in. */
> +	if (!e2fsck_can_readahead(ctx->fs))
> +		ctx->readahead_mem_kb = 0;
> +	if (ctx->readahead_mem_kb) {
> +		ra_groups = ctx->readahead_mem_kb /
> +			    (fs->inode_blocks_per_group * fs->blocksize /
> +			     1024);
> +		if (ra_groups > fs->group_desc_count)
> +			ra_groups = fs->group_desc_count;
> +		if (ra_groups < 16)
> +			ra_groups = 0;
> +		if (ra_groups) {
> +			err = initiate_readahead(ctx, grp, ra_groups);
> +			if (err) {
> +				com_err(ctx->program_name, err, "%s",
> +					_("while starting pass1 readahead"));
> +				ra_groups = 0;
> +			}
> +			ra_threshold = ra_groups *
> +				       inodes_per_group;
> +			ino_threshold = estimate_next_ra_inode(fs, 0,
> +					ra_groups * 9 / 10);
> +		}
> +	}
> +
>  	if (!(ctx->options & E2F_OPT_PREEN))
>  		fix_problem(ctx, PR_1_PASS_HEADER, &pctx);
>  
> @@ -774,10 +865,23 @@ void e2fsck_pass1(e2fsck_t ctx)
>  	(void) e2fsck_get_lost_and_found(ctx, 0);
>  
>  	while (1) {
> -		if (ino % (fs->super->s_inodes_per_group * 4) == 1) {
> +
> +		if (ino % (inodes_per_group * 4) == 1) {
>  			if (e2fsck_mmp_update(fs))
>  				fatal_error(ctx, 0);
>  		}
> +		if (ra_groups > 0 && ino > ino_threshold) {
> +			grp = (ra_threshold - 1) / inodes_per_group;
> +			err = initiate_readahead(ctx, grp, ra_groups);
> +			if (err == EAGAIN)
> +				ra_groups /= 2;
> +			else if (err)
> +				com_err(ctx->program_name, err, "%s",
> +					_("while starting pass1 readahead"));
> +			ra_threshold += ra_groups * inodes_per_group;
> +			ino_threshold = estimate_next_ra_inode(fs, grp,
> +					grp + (ra_groups * 9 / 10));
> +		}
>  		old_op = ehandler_operation(_("getting next inode from scan"));
>  		pctx.errcode = ext2fs_get_next_inode_full(scan, &ino,
>  							  inode, inode_size);
> diff --git a/e2fsck/pass2.c b/e2fsck/pass2.c
> index 95f51b7..1667292 100644
> --- a/e2fsck/pass2.c
> +++ b/e2fsck/pass2.c
> @@ -61,6 +61,9 @@
>   * Keeps track of how many times an inode is referenced.
>   */
>  static void deallocate_inode(e2fsck_t ctx, ext2_ino_t ino, char* block_buf);
> +static int check_dir_block2(ext2_filsys fs,
> +			   struct ext2_db_entry2 *dir_blocks_info,
> +			   void *priv_data);
>  static int check_dir_block(ext2_filsys fs,
>  			   struct ext2_db_entry2 *dir_blocks_info,
>  			   void *priv_data);
> @@ -77,8 +80,67 @@ struct check_dir_struct {
>  	struct problem_context	pctx;
>  	int	count, max;
>  	e2fsck_t ctx;
> +	int	save_readahead;
> +};
> +
> +struct pass2_readahead_data {
> +	ext2_filsys fs;
> +	ext2_dblist dblist;
>  };
>  
> +static int readahead_dir_block(ext2_filsys fs, struct ext2_db_entry2 *db,
> +			       void *priv_data)
> +{
> +	db->blockcnt = 1;
> +	return 0;
> +}
> +
> +static void pass2_readahead_cleanup(void *p)
> +{
> +	struct pass2_readahead_data *pr = p;
> +
> +	ext2fs_free_dblist(pr->dblist);
> +	ext2fs_free_mem(&pr);
> +}
> +
> +static void *pass2_readahead(void *p)
> +{
> +	struct pass2_readahead_data *pr = p;
> +
> +	e2fsck_readahead_dblist(pr->fs, 0, pr->dblist);
> +	return NULL;
> +}
> +
> +static errcode_t initiate_readahead(e2fsck_t ctx)
> +{
> +	struct pass2_readahead_data *pr;
> +	errcode_t err;
> +
> +	err = ext2fs_get_mem(sizeof(*pr), &pr);
> +	if (err)
> +		return err;
> +	pr->fs = ctx->fs;
> +	err = ext2fs_copy_dblist(ctx->fs->dblist, &pr->dblist);
> +	if (err)
> +		goto out_pr;
> +	err = ext2fs_dblist_iterate2(pr->dblist, readahead_dir_block,
> +				     NULL);
> +	if (err)
> +		goto out_dblist;
> +	err = e2fsck_run_thread(&ctx->ra_thread, pass2_readahead,
> +				pass2_readahead_cleanup, pr);
> +	if (err)
> +		goto out_dblist;
> +
> +	return 0;
> +
> +out_dblist:
> +	ext2fs_free_dblist(pr->dblist);
> +out_pr:
> +	ext2fs_free_mem(&pr);
> +	return err;
> +}
> +
>  void e2fsck_pass2(e2fsck_t ctx)
>  {
>  	struct ext2_super_block *sb = ctx->fs->super;
> @@ -96,6 +158,10 @@ void e2fsck_pass2(e2fsck_t ctx)
>  	int			i, depth;
>  	problem_t		code;
>  	int			bad_dir;
> +	int (*check_dir_func)(ext2_filsys fs,
> +			      struct ext2_db_entry2 *dir_blocks_info,
> +			      void *priv_data);
> +	errcode_t		err;
>  
>  	init_resource_track(&rtrack, ctx->fs->io);
>  	clear_problem_context(&cd.pctx);
> @@ -139,6 +205,7 @@ void e2fsck_pass2(e2fsck_t ctx)
>  	cd.ctx = ctx;
>  	cd.count = 1;
>  	cd.max = ext2fs_dblist_count2(fs->dblist);
> +	cd.save_readahead = e2fsck_will_rehash_dirs(ctx);
>  
>  	if (ctx->progress)
>  		(void) (ctx->progress)(ctx, 2, 0, cd.max);
> @@ -146,7 +213,16 @@ void e2fsck_pass2(e2fsck_t ctx)
>  	if (fs->super->s_feature_compat & EXT2_FEATURE_COMPAT_DIR_INDEX)
>  		ext2fs_dblist_sort2(fs->dblist, special_dir_block_cmp);
>  
> -	cd.pctx.errcode = ext2fs_dblist_iterate2(fs->dblist, check_dir_block,
> +	if (ctx->readahead_mem_kb) {
> +		check_dir_func = check_dir_block2;
> +		err = initiate_readahead(ctx);
> +		if (err)
> +			com_err(ctx->program_name, err, "%s",
> +				_("while starting pass2 readahead"));
> +	} else
> +		check_dir_func = check_dir_block;
> +
> +	cd.pctx.errcode = ext2fs_dblist_iterate2(fs->dblist, check_dir_func,
>  						 &cd);
>  	if (ctx->flags & E2F_FLAG_SIGNAL_MASK || ctx->flags & E2F_FLAG_RESTART)
>  		return;
> @@ -655,6 +731,7 @@ clear_and_exit:
>  	clear_htree(cd->ctx, cd->pctx.ino);
>  	dx_dir->numblocks = 0;
>  	e2fsck_rehash_dir_later(cd->ctx, cd->pctx.ino);
> +	cd->save_readahead = 1;
>  }
>  #endif /* ENABLE_HTREE */
>  
> @@ -774,6 +851,19 @@ static errcode_t insert_dirent_tail(ext2_filsys fs, void *dirbuf)
>  	return 0;
>  }
>  
> +static int check_dir_block2(ext2_filsys fs,
> +			   struct ext2_db_entry2 *db,
> +			   void *priv_data)
> +{
> +	int err;
> +	struct check_dir_struct *cd = priv_data;
> +
> +	err = check_dir_block(fs, db, priv_data);
> +	if (!cd->save_readahead)
> +		io_channel_cache_release(fs->io, db->blk, 1);
> +	return err;
> +}
> +
>  static int check_dir_block(ext2_filsys fs,
>  			   struct ext2_db_entry2 *db,
>  			   void *priv_data)
> @@ -957,6 +1047,7 @@ out_htree:
>  					 &cd->pctx))
>  				goto skip_checksum;
>  			e2fsck_rehash_dir_later(ctx, ino);
> +			cd->save_readahead = 1;
>  			goto skip_checksum;
>  		}
>  		if (failed_csum) {
> @@ -1249,6 +1340,7 @@ skip_checksum:
>  			pctx.dirent = dirent;
>  			fix_problem(ctx, PR_2_REPORT_DUP_DIRENT, &pctx);
>  			e2fsck_rehash_dir_later(ctx, ino);
> +			cd->save_readahead = 1;
>  			dups_found++;
>  		} else
>  			dict_alloc_insert(&de_dict, dirent, dirent);
> @@ -1316,6 +1408,7 @@ skip_checksum:
>  			if (insert_dirent_tail(fs, buf) == 0)
>  				goto write_and_fix;
>  			e2fsck_rehash_dir_later(ctx, ino);
> +			cd->save_readahead = 1;
>  		}
>  
>  write_and_fix:
> diff --git a/e2fsck/pass4.c b/e2fsck/pass4.c
> index 21d93f0..6cebfa3 100644
> --- a/e2fsck/pass4.c
> +++ b/e2fsck/pass4.c
> @@ -87,6 +87,21 @@ static int disconnect_inode(e2fsck_t ctx, ext2_ino_t i,
>  	return 0;
>  }
>  
> +/* Since pass4 is mostly CPU bound, start readahead of bitmaps for pass 5. */
> +static void *pass5_readahead(void *p)
> +{
> +	ext2_filsys fs = p;
> +
> +	e2fsck_readahead(fs, E2FSCK_READA_BBITMAP | E2FSCK_READA_IBITMAP, 0,
> +			 fs->group_desc_count);
> +	return NULL;
> +}
> +
> +static errcode_t initiate_readahead(e2fsck_t ctx)
> +{
> +	return e2fsck_run_thread(&ctx->ra_thread, pass5_readahead, NULL,
> +				 ctx->fs);
> +}
>  
>  void e2fsck_pass4(e2fsck_t ctx)
>  {
> @@ -100,12 +115,19 @@ void e2fsck_pass4(e2fsck_t ctx)
>  	__u16	link_count, link_counted;
>  	char	*buf = 0;
>  	dgrp_t	group, maxgroup;
> +	errcode_t	err;
>  
>  	init_resource_track(&rtrack, ctx->fs->io);
>  
>  #ifdef MTRACE
>  	mtrace_print("Pass 4");
>  #endif
> +	if (ctx->readahead_mem_kb) {
> +		err = initiate_readahead(ctx);
> +		if (err)
> +			com_err(ctx->program_name, err, "%s",
> +				_("while starting pass5 readahead"));
> +	}
>  
>  	clear_problem_context(&pctx);
>  
> diff --git a/e2fsck/prof_err.et b/e2fsck/prof_err.et
> index c9316c7..21fb524 100644
> --- a/e2fsck/prof_err.et
> +++ b/e2fsck/prof_err.et
> @@ -62,5 +62,6 @@ error_code	PROF_BAD_INTEGER,		"Invalid integer value"
>  
>  error_code	PROF_MAGIC_FILE_DATA, "Bad magic value in profile_file_data_t"
>  
> +error_code	E2FSCK_ET_MAGIC_RUN_THREAD,	"Wrong magic number for e2fsck_thread structure"
>  
>  end
> diff --git a/e2fsck/rehash.c b/e2fsck/rehash.c
> index 3b05715..89708c2 100644
> --- a/e2fsck/rehash.c
> +++ b/e2fsck/rehash.c
> @@ -71,6 +71,16 @@ int e2fsck_dir_will_be_rehashed(e2fsck_t ctx, ext2_ino_t ino)
>  	return ext2fs_u32_list_test(ctx->dirs_to_hash, ino);
>  }
>  
> +/* Ask if there will be a pass 3A. */
> +int e2fsck_will_rehash_dirs(e2fsck_t ctx)
> +{
> +	if (ctx->options & E2F_OPT_COMPRESS_DIRS)
> +		return 1;
> +	if (!ctx->dirs_to_hash)
> +		return 0;
> +	return ext2fs_u32_list_count(ctx->dirs_to_hash) > 0;
> +}
> +
>  struct fill_dir_struct {
>  	char *buf;
>  	struct ext2_inode *inode;
> diff --git a/e2fsck/unix.c b/e2fsck/unix.c
> index c6cdb49..da888c2 100644
> --- a/e2fsck/unix.c
> +++ b/e2fsck/unix.c
> @@ -643,6 +643,7 @@ static void parse_extended_opts(e2fsck_t ctx, const char *opts)
>  	char	*buf, *token, *next, *p, *arg;
>  	int	ea_ver;
>  	int	extended_usage = 0;
> +	unsigned long long reada_kb;
>  
>  	buf = string_copy(ctx, opts, 0);
>  	for (token = buf; token && *token; token = next) {
> @@ -671,6 +672,15 @@ static void parse_extended_opts(e2fsck_t ctx, const char *opts)
>  				continue;
>  			}
>  			ctx->ext_attr_ver = ea_ver;
> +		} else if (strcmp(token, "readahead_mem_kb") == 0) {
> +			reada_kb = strtoull(arg, &p, 0);
> +			if (*p) {
> +				fprintf(stderr, "%s",
> +					_("Invalid readahead buffer size.\n"));
> +				extended_usage++;
> +				continue;
> +			}
> +			ctx->readahead_mem_kb = reada_kb;
>  		} else if (strcmp(token, "fragcheck") == 0) {
>  			ctx->options |= E2F_OPT_FRAGCHECK;
>  			continue;
> @@ -716,6 +726,7 @@ static void parse_extended_opts(e2fsck_t ctx, const char *opts)
>  		fputs(("\tnodiscard\n"), stderr);
>  		fputs(("\tstrict_csums\n"), stderr);
>  		fputs(("\tno_strict_csums\n"), stderr);
> +		fputs(("\treadahead_mem_kb=<buffer size>\n"), stderr);
>  		fputc('\n', stderr);
>  		exit(1);
>  	}
> @@ -749,6 +760,7 @@ static errcode_t PRS(int argc, char *argv[], e2fsck_t *ret_ctx)
>  #ifdef CONFIG_JBD_DEBUG
>  	char 		*jbd_debug;
>  #endif
> +	unsigned long long phys_mem_kb;
>  
>  	retval = e2fsck_allocate_context(&ctx);
>  	if (retval)
> @@ -776,6 +788,8 @@ static errcode_t PRS(int argc, char *argv[], e2fsck_t *ret_ctx)
>  	else
>  		ctx->program_name = "e2fsck";
>  
> +	phys_mem_kb = get_memory_size() / 1024;
> +	ctx->readahead_mem_kb = ~0ULL;
>  	while ((c = getopt (argc, argv, "panyrcC:B:dE:fvtFVM:b:I:j:P:l:L:N:SsDk")) != EOF)
>  		switch (c) {
>  		case 'C':
> @@ -965,6 +979,22 @@ static errcode_t PRS(int argc, char *argv[], e2fsck_t *ret_ctx)
>  	if (c)
>  		verbose = 1;
>  
> +	/* Figure out how much memory goes to readahead */
> +	if (ctx->readahead_mem_kb == ~0ULL) {
> +		profile_get_integer(ctx->profile, "options",
> +				    "readahead_mem_pct", 0, 50, &c);
> +		if (c >= 0 && c <= 100)
> +			ctx->readahead_mem_kb = phys_mem_kb * c / 100;
> +		else
> +			ctx->readahead_mem_kb = phys_mem_kb / 2;
> +		profile_get_integer(ctx->profile, "options",
> +				    "readahead_mem_kb", 0, -1, &c);
> +		if (c >= 0)
> +			ctx->readahead_mem_kb = c;
> +	}
> +	if (ctx->readahead_mem_kb > phys_mem_kb)
> +		ctx->readahead_mem_kb = phys_mem_kb;
> +
>  	/* Turn off discard in read-only mode */
>  	if ((ctx->options & E2F_OPT_NO) &&
>  	    (ctx->options & E2F_OPT_DISCARD))
> @@ -1781,6 +1811,11 @@ no_journal:
>  		}
>  	}
>  
> +	retval = e2fsck_stop_thread(&ctx->ra_thread, NULL);
> +	if (retval)
> +		com_err(ctx->program_name, retval, "%s",
> +			_("while stopping readahead"));
> +
>  	e2fsck_write_bitmaps(ctx);
>  	io_channel_flush(ctx->fs->io);
>  	print_resource_track(ctx, NULL, &ctx->global_rtrack, ctx->fs->io);
> diff --git a/e2fsck/util.c b/e2fsck/util.c
> index fec6179..09b78c2 100644
> --- a/e2fsck/util.c
> +++ b/e2fsck/util.c
> @@ -37,6 +37,10 @@
>  #include <errno.h>
>  #endif
>  
> +#ifdef HAVE_SYS_SYSCTL_H
> +#include <sys/sysctl.h>
> +#endif
> +
>  #include "e2fsck.h"
>  
>  extern e2fsck_t e2fsck_global_ctx;   /* Try your very best not to use this! */
> @@ -845,3 +849,50 @@ errcode_t e2fsck_allocate_subcluster_bitmap(ext2_filsys fs, const char *descr,
>  	fs->default_bitmap_type = save_type;
>  	return retval;
>  }
> +
> +/* Return memory size in bytes */
> +int64_t get_memory_size(void)
> +{
> +#if defined(_SC_PHYS_PAGES)
> +# if defined(_SC_PAGESIZE)
> +	return (int64_t)sysconf(_SC_PHYS_PAGES) *
> +	       (int64_t)sysconf(_SC_PAGESIZE);
> +# elif defined(_SC_PAGE_SIZE)
> +	return (int64_t)sysconf(_SC_PHYS_PAGES) *
> +	       (int64_t)sysconf(_SC_PAGE_SIZE);
> +# endif
> +#elif defined(_SC_AIX_REALMEM)
> +	return (int64_t)sysconf(_SC_AIX_REALMEM) * (int64_t)1024L;
> +#elif defined(CTL_HW)
> +# if (defined(HW_MEMSIZE) || defined(HW_PHYSMEM64))
> +#  define CTL_HW_INT64
> +# elif (defined(HW_PHYSMEM) || defined(HW_REALMEM))
> +#  define CTL_HW_UINT
> +# endif
> +	int mib[2];
> +	mib[0] = CTL_HW;
> +# if defined(HW_MEMSIZE)
> +	mib[1] = HW_MEMSIZE;
> +# elif defined(HW_PHYSMEM64)
> +	mib[1] = HW_PHYSMEM64;
> +# elif defined(HW_REALMEM)
> +	mib[1] = HW_REALMEM;
> +# elif defined(HW_PHYSMEM)
> +	mib[1] = HW_PHYSMEM;
> +# endif
> +# if defined(CTL_HW_INT64)
> +	int64_t size = 0;
> +# elif defined(CTL_HW_UINT)
> +	unsigned int size = 0;
> +# endif
> +# if defined(CTL_HW_INT64) || defined(CTL_HW_UINT)
> +	size_t len = sizeof(size);
> +	if (sysctl(mib, 2, &size, &len, NULL, 0) == 0)
> +		return (int64_t)size;
> +# endif
> +	return 0;
> +#else
> +# warning "Don't know how to detect memory on your platform?"
> +	return 0;
> +#endif
> +}
> diff --git a/lib/config.h.in b/lib/config.h.in
> index e0384ee..836c2df 100644
> --- a/lib/config.h.in
> +++ b/lib/config.h.in
> @@ -203,6 +203,9 @@
>  /* Define if your <locale.h> file defines LC_MESSAGES. */
>  #undef HAVE_LC_MESSAGES
>  
> +/* Define to 1 if you have the `pthread' library (-lpthread). */
> +#undef HAVE_LIBPTHREAD
> +
>  /* Define to 1 if you have the <limits.h> header file. */
>  #undef HAVE_LIMITS_H
>  
> @@ -314,6 +317,9 @@
>  /* Define to 1 if you have the `pread' function. */
>  #undef HAVE_PREAD
>  
> +/* Define to 1 if you have the <pthread.h> header file. */
> +#undef HAVE_PTHREAD_H
> +
>  /* Define to 1 if you have the `putenv' function. */
>  #undef HAVE_PUTENV
>  
> @@ -465,6 +471,9 @@
>  /* Define to 1 if you have the <sys/syscall.h> header file. */
>  #undef HAVE_SYS_SYSCALL_H
>  
> +/* Define to 1 if you have the <sys/sysctl.h> header file. */
> +#undef HAVE_SYS_SYSCTL_H
> +
>  /* Define to 1 if you have the <sys/sysmacros.h> header file. */
>  #undef HAVE_SYS_SYSMACROS_H
>  
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
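
For anyone skimming the unix.c and pass1.c hunks above, here is a rough
standalone sketch of how the readahead budget appears to be derived.  As
far as I can tell from the quoted code: an explicit -E readahead_mem_kb
wins; otherwise readahead_mem_pct from e2fsck.conf (default 50) is applied
to physical memory, a readahead_mem_kb relation in e2fsck.conf overrides
that, and the result is clamped to physical memory.  Pass 1 then converts
the budget into a number of block groups and gives up on readahead if that
comes out below 16.  The names readahead_budget_kb(), readahead_groups()
and RA_UNSET are made up for illustration and are not part of the patch.

```c
#include <assert.h>

/* Sentinel for "not given on the command line"; the patch uses ~0ULL. */
#define RA_UNSET (~0ULL)

/*
 * Hypothetical helper: choose the readahead budget in KiB.
 *   cmdline_kb: value of -E readahead_mem_kb, or RA_UNSET if absent
 *   conf_pct:   readahead_mem_pct from e2fsck.conf (the patch defaults to 50)
 *   conf_kb:    readahead_mem_kb from e2fsck.conf, or -1 if absent
 */
static unsigned long long readahead_budget_kb(unsigned long long phys_mem_kb,
					      unsigned long long cmdline_kb,
					      int conf_pct, int conf_kb)
{
	unsigned long long kb = cmdline_kb;

	if (kb == RA_UNSET) {
		if (conf_pct >= 0 && conf_pct <= 100)
			kb = phys_mem_kb * (unsigned long long)conf_pct / 100;
		else
			kb = phys_mem_kb / 2;	/* out-of-range percentage */
		if (conf_kb >= 0)
			kb = (unsigned long long)conf_kb;
	}
	/* Never budget more than the machine physically has. */
	if (kb > phys_mem_kb)
		kb = phys_mem_kb;
	return kb;
}

/*
 * Hypothetical helper: how many block groups of inode tables fit in the
 * budget.  Returns 0 (no readahead) when fewer than 16 groups would fit,
 * matching the cutoff in the quoted pass1.c hunk.
 */
static unsigned int readahead_groups(unsigned long long budget_kb,
				     unsigned int inode_blocks_per_group,
				     unsigned int blocksize,
				     unsigned int group_count)
{
	unsigned long long kb_per_group =
		(unsigned long long)inode_blocks_per_group * blocksize / 1024;
	unsigned long long groups;

	assert(kb_per_group > 0);
	groups = budget_kb / kb_per_group;
	if (groups > group_count)
		groups = group_count;
	if (groups < 16)
		groups = 0;
	return (unsigned int)groups;
}
```

One consequence worth noting: if get_memory_size() cannot detect memory
and returns 0, the clamp forces the budget to 0, so readahead is silently
disabled on such platforms.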


* Re: [PATCH 21/37] libext2fs: have UNIX IO manager use pread/pwrite
  2014-05-01 23:14 ` [PATCH 21/37] libext2fs: have UNIX IO manager use pread/pwrite Darrick J. Wong
@ 2014-08-02 23:16   ` Theodore Ts'o
  0 siblings, 0 replies; 91+ messages in thread
From: Theodore Ts'o @ 2014-08-02 23:16 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-ext4

On Thu, May 01, 2014 at 04:14:39PM -0700, Darrick J. Wong wrote:
> If pread/pwrite are present, have the UNIX IO manager use them for
> aligned IOs (instead of the current seek -> read/write), thereby
> saving us a (minor) amount of system call overhead.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Thanks, applied.

						- Ted
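
For readers who have not looked at the patch itself, the difference
between the two I/O paths can be sketched roughly as follows.  This is
not the unix_io.c code; read_at_seek(), read_at_pread() and
both_paths_agree() are hypothetical names used only for illustration, and
the temp-file self-check is an assumption of this sketch.  The point is
that pread() takes the offset as an argument, so the separate lseek()
call goes away and the file descriptor's own offset is left untouched.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700	/* expose pread() and fileno() in strict modes */
#endif
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Old style: reposition the file offset, then read from it. */
static ssize_t read_at_seek(int fd, void *buf, size_t count, off_t offset)
{
	char *p = buf;
	size_t done = 0;

	if (lseek(fd, offset, SEEK_SET) == (off_t)-1)
		return -1;
	while (done < count) {
		ssize_t n = read(fd, p + done, count - done);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		if (n == 0)
			break;		/* end of file */
		done += (size_t)n;
	}
	return (ssize_t)done;
}

/* New style: positioned read, no lseek() and no change to the fd offset. */
static ssize_t read_at_pread(int fd, void *buf, size_t count, off_t offset)
{
	char *p = buf;
	size_t done = 0;

	while (done < count) {
		ssize_t n = pread(fd, p + done, count - done,
				  offset + (off_t)done);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		if (n == 0)
			break;		/* end of file */
		done += (size_t)n;
	}
	return (ssize_t)done;
}

/*
 * Self-check: write a known pattern to a temporary file, read the same
 * range through both paths, and return 1 if both match the pattern.
 */
static int both_paths_agree(off_t offset, size_t count)
{
	unsigned char pattern[8192], a[8192], b[8192];
	FILE *f;
	size_t i;
	int fd, ok;

	assert(offset >= 0 && (size_t)offset + count <= sizeof(pattern));
	f = tmpfile();
	if (!f)
		return 0;
	for (i = 0; i < sizeof(pattern); i++)
		pattern[i] = (unsigned char)(i * 7 + 3);
	if (fwrite(pattern, 1, sizeof(pattern), f) != sizeof(pattern) ||
	    fflush(f) != 0) {
		fclose(f);
		return 0;
	}
	fd = fileno(f);
	ok = read_at_seek(fd, a, count, offset) == (ssize_t)count &&
	     read_at_pread(fd, b, count, offset) == (ssize_t)count &&
	     memcmp(a, b, count) == 0 &&
	     memcmp(a, pattern + offset, count) == 0;
	fclose(f);
	return ok;
}
```

Calling both_paths_agree(4096, 1024) exercises both helpers on the same
range; the pread variant issues one system call per chunk instead of two
for the common single-chunk case.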


* Re: [PATCH 25/37] libext2fs: when appending to a file, don't split an index block in equal halves
  2014-05-01 23:15 ` [PATCH 25/37] libext2fs: when appending to a file, don't split an index block in equal halves Darrick J. Wong
@ 2014-08-02 23:43   ` Theodore Ts'o
  0 siblings, 0 replies; 91+ messages in thread
From: Theodore Ts'o @ 2014-08-02 23:43 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-ext4

On Thu, May 01, 2014 at 04:15:05PM -0700, Darrick J. Wong wrote:
> When we're appending an extent to the end of a file and the index
> block is full, don't split the index block into two half-full index
> blocks because this leaves us with underutilized index blocks, at
> least in the fallocate case.  Instead, copy the last extent from the
> full block into the new block.  This isn't perfect utilization, but
> there's a lot of work involved in teaching extent.c to be able to go to
> a nonexistent node in a newly allocated (and empty) extent block.
> 
> This patch does not fix the general problem of keeping the extent tree
> balanced.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Applied, thanks.

					- Ted
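
To put a number on the utilization argument in the commit message, here
is a toy model of append-only growth.  It is a sketch under simplifying
assumptions (a flat run of fixed-capacity blocks, strictly appending
writes, no tree levels) and does not reflect how extent.c is actually
structured; blocks_used() is a hypothetical name.  When the rightmost
block is full, the model either moves half of its entries to a new block
(the old behavior) or moves only the last entry (the behavior described
in the patch).

```c
#include <assert.h>

/*
 * Toy model: append n entries to a run of blocks holding at most cap
 * entries each and return how many blocks end up allocated.
 *
 * split_half != 0: a full rightmost block hands cap/2 entries to the
 *                  new block, leaving itself half full forever.
 * split_half == 0: only the last entry moves, so the old block keeps
 *                  cap - 1 entries.
 */
static unsigned int blocks_used(unsigned int n, unsigned int cap,
				int split_half)
{
	unsigned int blocks = 1;	/* blocks allocated so far */
	unsigned int last = 0;		/* entries in the rightmost block */
	unsigned int i;

	assert(cap >= 2);
	for (i = 0; i < n; i++) {
		if (last == cap) {
			/* Rightmost block is full: allocate a new one. */
			last = split_half ? cap / 2 : 1;
			blocks++;
		}
		last++;			/* append the new entry */
	}
	return blocks;
}
```

With a capacity of 4 and 16 appended entries, the half split ends up with
7 blocks while moving only the last entry ends up with 5.  Neither is the
ideal 4, which matches the "isn't perfect utilization" caveat in the
commit message.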


Thread overview: 91+ messages
2014-05-01 23:12 [PATCH 00/37] e2fsprogs patchbomb 5/14 Darrick J. Wong
2014-05-01 23:12 ` [PATCH 01/37] misc: create better-packaged static analysis reports Darrick J. Wong
2014-05-11 22:33   ` Theodore Ts'o
2014-05-01 23:12 ` [PATCH 02/37] misc: coverity fixes Darrick J. Wong
2014-05-02 11:17   ` Lukáš Czerner
2014-05-05 20:04     ` Darrick J. Wong
2014-05-11 22:40       ` Theodore Ts'o
2014-05-01 23:12 ` [PATCH 03/37] libext2fs: create sockets when populating filesystem Darrick J. Wong
2014-05-02 11:22   ` Lukáš Czerner
2014-05-05 20:08     ` Darrick J. Wong
2014-05-11 22:44       ` Theodore Ts'o
2014-05-01 23:12 ` [PATCH 04/37] mke2fs: always warn if 128-byte inode and inline_data Darrick J. Wong
2014-05-02 11:27   ` Lukáš Czerner
2014-05-05 20:10     ` Darrick J. Wong
2014-05-12  0:26       ` Theodore Ts'o
2014-05-01 23:12 ` [PATCH 05/37] debugfs: teach logdump to deal with 64bit revoke tables Darrick J. Wong
2014-05-02 11:38   ` Lukáš Czerner
2014-05-05 22:23     ` Darrick J. Wong
2014-05-06 11:35       ` Lukáš Czerner
2014-05-12  1:20         ` Theodore Ts'o
2014-05-01 23:13 ` [PATCH 06/37] debugfs: force logdump to display (old) journal contents Darrick J. Wong
2014-05-02 11:49   ` Lukáš Czerner
2014-05-06  0:24     ` Darrick J. Wong
2014-05-12  1:41       ` Theodore Ts'o
2014-05-12  3:31         ` Theodore Ts'o
2014-05-14  0:05         ` Darrick J. Wong
2014-05-01 23:13 ` [PATCH 07/37] resize2fs: fix check for collision between old GDT and superblock on sparse_super2 fs Darrick J. Wong
2014-05-12  3:35   ` Theodore Ts'o
2014-05-01 23:13 ` [PATCH 08/37] mke2fs: set gdt csum when creating packed fs Darrick J. Wong
2014-05-02 11:55   ` Lukáš Czerner
2014-05-12  4:22     ` Theodore Ts'o
2014-05-01 23:13 ` [PATCH 09/37] mke2fs: set error behavior at initialization time Darrick J. Wong
2014-05-02 12:13   ` Lukáš Czerner
2014-05-01 23:13 ` [PATCH 10/37] e2fsck: verify checksums after checking everything else Darrick J. Wong
2014-05-02 12:32   ` Lukáš Czerner
2014-05-05 22:56     ` Darrick J. Wong
2014-05-06 11:32       ` Lukáš Czerner
2014-05-08  0:05         ` Darrick J. Wong
2014-05-01 23:13 ` [PATCH 11/37] e2fsck: fix the extended attribute checksum error message Darrick J. Wong
2014-05-02 12:46   ` Lukáš Czerner
2014-05-05 23:08     ` Darrick J. Wong
2014-05-06 10:12       ` Lukáš Czerner
2014-05-01 23:13 ` [PATCH 12/37] e2fsck: insert a missing dirent tail for checksums if possible Darrick J. Wong
2014-05-02 12:54   ` Lukáš Czerner
2014-05-05 23:16     ` Darrick J. Wong
2014-05-01 23:13 ` [PATCH 13/37] e2fsck: write dir blocks after new inode when reconstructing root/lost+found Darrick J. Wong
2014-05-05 17:13   ` Lukáš Czerner
2014-05-01 23:13 ` [PATCH 14/37] dumpe2fs: add switch to disable checksum verification Darrick J. Wong
2014-05-05 17:20   ` Lukáš Czerner
2014-05-01 23:14 ` [PATCH 15/37] mke2fs: set block_validity as a default mount option Darrick J. Wong
2014-05-05 17:24   ` Lukáš Czerner
2014-05-01 23:14 ` [PATCH 16/37] libext2fs: support allocating uninit blocks in bmap2() Darrick J. Wong
2014-05-06 15:45   ` Lukáš Czerner
2014-05-06 19:59     ` Darrick J. Wong
2014-05-07 10:02       ` Lukáš Czerner
2014-05-07 21:37         ` Darrick J. Wong
2014-05-08  0:13           ` [PATCH 1/2] libext2fs: support BLKZEROOUT/FALLOC_FL_ZERO_RANGE in ext2fs_zero_blocks Darrick J. Wong
2014-05-13 11:11             ` Lukáš Czerner
2014-05-08  0:14           ` [PATCH 2/2] libext2fs: support allocating uninit blocks in bmap2() Darrick J. Wong
2014-05-27 16:28             ` Lukáš Czerner
2014-05-28 19:48               ` Darrick J. Wong
2014-05-01 23:14 ` [PATCH 17/37] libext2fs: file IO routines should handle uninit blocks Darrick J. Wong
2014-05-01 23:14 ` [PATCH 18/37] resize2fs: convert fs to and from 64bit mode Darrick J. Wong
2014-05-01 23:14 ` [PATCH 19/37] resize2fs: when toggling 64bit, don't free in-use bg data clusters Darrick J. Wong
2014-05-01 23:14 ` [PATCH 20/37] resize2fs: adjust reserved_gdt_blocks when changing group descriptor size Darrick J. Wong
2014-05-01 23:14 ` [PATCH 21/37] libext2fs: have UNIX IO manager use pread/pwrite Darrick J. Wong
2014-08-02 23:16   ` Theodore Ts'o
2014-05-01 23:14 ` [PATCH 22/37] ext2fs: add readahead method to improve scanning Darrick J. Wong
2014-05-01 23:14 ` [PATCH 23/37] e2fsck: provide routines to read-ahead metadata Darrick J. Wong
2014-05-01 23:14 ` [PATCH 24/37] e2fsck: read-ahead metadata during passes 1, 2, and 4 Darrick J. Wong
2014-07-28 22:25   ` Darrick J. Wong
2014-05-01 23:15 ` [PATCH 25/37] libext2fs: when appending to a file, don't split an index block in equal halves Darrick J. Wong
2014-08-02 23:43   ` Theodore Ts'o
2014-05-01 23:15 ` [PATCH 26/37] libext2fs: find inode goal when allocating blocks Darrick J. Wong
2014-05-01 23:15 ` [PATCH 27/37] libext2fs: find a range of empty blocks Darrick J. Wong
2014-05-01 23:15 ` [PATCH 28/37] libext2fs: provide a function to set inode size Darrick J. Wong
2014-07-26 18:37   ` Theodore Ts'o
2014-05-01 23:15 ` [PATCH 29/37] libext2fs: implement fallocate Darrick J. Wong
2014-05-01 23:15 ` [PATCH 31/37] fuse2fs: translate ACL structures Darrick J. Wong
2014-05-01 23:15 ` [PATCH 32/37] fuse2fs: handle 64-bit dates correctly Darrick J. Wong
2014-05-01 23:16 ` [PATCH 33/37] fuse2fs: implement fallocate Darrick J. Wong
2014-05-01 23:16 ` [PATCH 35/37] tests: enable using fuse2fs with metadata checksum test Darrick J. Wong
2014-05-01 23:16 ` [PATCH 36/37] tests: test date handling Darrick J. Wong
2014-05-01 23:16 ` [PATCH 37/37] ext5: define new subtype to add features and reduce testing complexity Darrick J. Wong
2014-05-02  9:45   ` Lukáš Czerner
2014-05-02 14:04     ` Theodore Ts'o
2014-05-06  1:59       ` Darrick J. Wong
2014-05-06  1:33     ` Darrick J. Wong
2014-05-06 12:50       ` Lukáš Czerner
2014-05-06 15:21         ` Theodore Ts'o
2014-05-06 15:30           ` Lukáš Czerner
