* [PATCH 00/49] e2fsprogs patchbomb 3/14
@ 2014-03-11  6:53 Darrick J. Wong
  2014-03-11  6:54 ` [PATCH 01/49] create_inode: clean up return mess in do_write_internal Darrick J. Wong
                   ` (46 more replies)
  0 siblings, 47 replies; 88+ messages in thread
From: Darrick J. Wong @ 2014-03-11  6:53 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

I wasn't expecting to re-spam the list quite so soon, but since
inline_data and create_inode went in last week, most changes are in
patches 1-5, 8-16, and 23-26.  Since the giant mailing in December,
most changes have been in patches 22-27 and 34-42.  The first 27
patches are bugfixes for existing functionality; everything after is
new stuff.  (Well, much of it's been out for review for a while...)

The first five patches fix numerous problems in create_inode.c
relating to incorrect error handling, style problems, and whitespace
problems.  They also clean up the mixing of debugfs/mke2fs' global
variables, and do a proper job managing populate_fs' internal state --
this should not be handled by callers to populate_fs.

Patches 6-7 provide some minor tweaks to the extended
attribute editing code that had been sitting (unreleased :/) in my
tree when Ted pulled in v4 of the extended attribute patches.  Most
notable is a fix for the delete method being unable to remove the last
xattr attached to an inode.

Patches 8-14 fix some bugs with the inline_data implementation.
Various minor details seem to have been missed, such as not rehashing
inline directories, calculating the available size for inline data,
calculating i_blocks correctly, fine details of interactions between
the xattr editing code and the inline data code, mistakes with how the
inline directory dirent iterator deals with restoring the caller's
context, and a bug in resize2fs.

Patches 15-16 introduce cppcheck checking to the build process when C=1
is specified, and fix a few errors that it picked up.

Patches 17-20 implement various minor bug fixes and cleanups, some of
which are based on complaints from valgrind, clang, and cppcheck.

Patch 21 reduces the giant flood of numbers when e2fsck prints runs of
duplicate blocks.

Patches 22-27 make some alterations to metadata checksumming support;
by default, e2fsck will now check the inode before verifying the
checksum.  There's a command line option to restore the "just scrape
it off the system" behavior for heavily damaged filesystems.  There
are a couple of patches to fix erroneous behavior and crashes when
e2fsck has to rebuild the root directory.  The final patch in this
clump adds a command line option to dumpe2fs to ignore checksum
failures.

Patch 28 enables block_validity for new filesystems.  As noted here
previously, the overhead of enabling this option seems to be at most a
1% performance hit when performing a lot of small allocations, and
negligible otherwise.  On the plus side, the filesystem is smarter
about noticing erroneous allocations out of metadata areas (i.e. block
bitmap corruption) and shutting itself down to prevent damage.

Patches 29-30 enhance ext2fs_bmap2() to allow the creation of
uninitialized extents.  The functionality is already there; really it
just adds a flag to indicate uninitialized.  There's also a patch to
the fileio routines to handle uninitialized extents.  These patches
are unchanged from December.

Patches 31-33 add to resize2fs the ability to convert a filesystem to
and from 64bit mode.  These patches are unchanged from December.

Patches 34-37 implement readahead for e2fsck.  The first patch tries
to reduce system call overhead by using pread/pwrite if available.
The next two patches plumb in the IO manager and library changes
necessary to read metadata blocks into the page cache (on Linux).  The
final patch teaches e2fsck to use the library readahead functions in a
separate thread.
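
As a rough illustration of the system call saving mentioned above (this
is a sketch, not code from the patches), compare a seek-then-read block
fetch with a single pread().  The helper names and the scratch-file
self-check are hypothetical; only lseek(), read(), and pread() are real
interfaces.

```c
#define _XOPEN_SOURCE 700	/* expose pread() under strict -std= modes */
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Seek-then-read: two system calls per block, and the file offset moves. */
static ssize_t read_blk_seek(int fd, void *buf, size_t blksz, off_t blk)
{
	if (lseek(fd, blk * (off_t)blksz, SEEK_SET) == (off_t)-1)
		return -1;
	return read(fd, buf, blksz);
}

/* pread: one system call per block; the file offset is left untouched. */
static ssize_t read_blk_pread(int fd, void *buf, size_t blksz, off_t blk)
{
	return pread(fd, buf, blksz, blk * (off_t)blksz);
}

/* Self-check on a scratch file: both paths must return the same block. */
static int compare_read_paths(void)
{
	char a[4], b[4];
	FILE *fp = tmpfile();
	int fd;

	if (!fp)
		return -1;
	fputs("aaaabbbbcccc", fp);	/* three 4-byte "blocks" */
	fflush(fp);
	fd = fileno(fp);

	if (read_blk_seek(fd, a, sizeof(a), 1) != 4 ||
	    read_blk_pread(fd, b, sizeof(b), 1) != 4) {
		fclose(fp);
		return -1;
	}
	assert(memcmp(a, b, sizeof(a)) == 0);		/* same bytes either way */
	assert(memcmp(a, "bbbb", sizeof(a)) == 0);	/* block 1 of the file */
	fclose(fp);
	return 0;
}
```

Besides halving the number of calls, pread() does not depend on the
shared file offset, which matters once a second thread is issuing reads
against the same descriptor.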

Crude testing has been done via:
# echo 3 > /proc/sys/vm/drop_caches
# e2fsck -Fnfvtt /dev/XXX

So far in my crude testing on a cold system, I've seen about a 20%
speedup on an SSD, a ~40% speedup on a 3x RAID1 SATA array, and about
a 10% speedup on a single-spindle SATA disk.  On a single-queue USB
HDD, performance doesn't change much.  It looks as though low end
storage like USB HDDs will not benefit, which doesn't surprise me.
There's around a 2% regression for USB HDDs, though it doesn't seem
statistically significant.  The SSD numbers are harder to quantify
since they're already fast.  Somewhat unexpectedly, the readahead code
speeds up e2fsck even when the page cache has already been warmed up.

This third version of the readahead patches tries to prevent page cache
thrashing by limiting the amount of (user-configurable) readahead to a
default of half of physical memory.  It also tries to release some of
the memory pages if it can conclude that it's totally done with a
block, and it can now detect very slow readahead and disable it.
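
The default cap could be computed along these lines.  This is only a
sketch under stated assumptions: readahead_budget() and its parameters
are hypothetical names, not the interface the patches add, and
_SC_PHYS_PAGES is a glibc/Linux extension rather than POSIX.

```c
#include <assert.h>
#include <unistd.h>

/*
 * Return a readahead budget in bytes.  A non-zero user_bytes is taken
 * as an explicit user setting; otherwise use physical memory divided
 * by divisor (2 gives the "half of physical memory" default).
 */
static unsigned long long readahead_budget(unsigned long long user_bytes,
					   unsigned int divisor)
{
	long pages, page_size;

	assert(divisor > 0);
	if (user_bytes)
		return user_bytes;	/* explicit user setting wins */

	pages = sysconf(_SC_PHYS_PAGES);	/* glibc/Linux extension */
	page_size = sysconf(_SC_PAGESIZE);
	if (pages <= 0 || page_size <= 0)
		return 0;	/* unknown memory size: do no readahead */

	return (unsigned long long)pages * (unsigned long long)page_size /
	       divisor;
}
```

A caller would use readahead_budget(0, 2) for the default and pass the
configured byte count as the first argument when the user overrides it.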

Patches 38-42 implement fallocate for e2fsprogs, and modify Ted's
mk_hugefiles functionality to use it.  The general fallocate API call
is (regrettably) much more complex than Ted's, since it must grapple
with the possibility that the file already has mapped blocks.  There
were also a lot of bigalloc related subtleties.

Patches 43-46 implement fuse2fs, a FUSE server based on libext2fs.
Primarily I've been using it to shake out bugs in the library via
xfstests and the metadata checksumming test program.  It can also be
used to mount ext4 on any OS supporting FUSE, and it can also mount
64k-block filesystems on x86, though I'd be wary of using rw mode.
fuse2fs depends on these new APIs: xattr editing, uninit extent
handling, and the new fallocate call.

Patches 47-49 provide the metadata checksumming test script.  Its
primary advantage over 'make check' is that it allows one to specify a
variety of different mkfs and mount options.  It's also growing more
tests as a result of fuse2fs exercise.

I've tested these e2fsprogs changes against the -next branch as of
3/6.  These days, I use several VMs, each with 8GB ramdisks to test
with; the test process is checkpatch > make C=1 > make check >
metadata checksum tests > fuse + xfstests.

Comments and questions are, as always, welcome.

--D


* [PATCH 01/49] create_inode: clean up return mess in do_write_internal
  2014-03-11  6:53 [PATCH 00/49] e2fsprogs patchbomb 3/14 Darrick J. Wong
@ 2014-03-11  6:54 ` Darrick J. Wong
  2014-03-11 20:30   ` Andreas Dilger
  2014-03-11  6:54 ` [PATCH 02/49] create_inode: minor cleanups Darrick J. Wong
                   ` (45 subsequent siblings)
  46 siblings, 1 reply; 88+ messages in thread
From: Darrick J. Wong @ 2014-03-11  6:54 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

do_write_internal returns errno when ext2 library calls fail; since
errno only reflects the outcome of the last C library call, this will
result in confused callers.  Eliminate the naked return since
this results in an undefined return value.
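
To make the errno problem concrete, here is a minimal standalone sketch.
Everything prefixed sketch_ or SKETCH_ is a hypothetical stand-in, not a
libext2fs interface; the only assumption carried over from the patch is
that the library reports failure through its return value rather than
through errno.

```c
#include <assert.h>
#include <errno.h>

typedef long sketch_errcode_t;		/* stand-in for errcode_t */
#define SKETCH_ET_FAILED 1234L		/* made-up library error code */

/* Hypothetical library call: fails via its return value, leaves errno alone. */
static sketch_errcode_t sketch_lib_call(void)
{
	return SKETCH_ET_FAILED;
}

/* Old pattern: the library call failed, but errno is reported instead. */
static sketch_errcode_t write_buggy(void)
{
	sketch_errcode_t retval = sketch_lib_call();

	if (retval)
		return errno;	/* unrelated to the failure above */
	return 0;
}

/* Patched pattern: propagate the library's own error code. */
static sketch_errcode_t write_fixed(void)
{
	sketch_errcode_t retval = sketch_lib_call();

	if (retval)
		return retval;
	return 0;
}

/* Model "every earlier C library call succeeded" by clearing errno. */
static void errno_demo(void)
{
	errno = 0;
	assert(write_buggy() == 0);			/* failure silently lost */
	assert(write_fixed() == SKETCH_ET_FAILED);	/* failure reported */
}
```

With errno clear, the old pattern tells the caller that the write
succeeded even though the library call failed; with a stale errno it
would instead report some unrelated error.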

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 misc/create_inode.c |   17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)


diff --git a/misc/create_inode.c b/misc/create_inode.c
index cf4a58f..647480c 100644
--- a/misc/create_inode.c
+++ b/misc/create_inode.c
@@ -353,14 +353,14 @@ errcode_t do_write_internal(ext2_ino_t cwd, const char *src, const char *dest)
 	if (retval == 0) {
 		com_err(__func__, 0, "The file '%s' already exists\n", dest);
 		close(fd);
-		return errno;
+		return retval;
 	}
 
 	retval = ext2fs_new_inode(current_fs, cwd, 010755, 0, &newfile);
 	if (retval) {
 		com_err(__func__, retval, 0);
 		close(fd);
-		return errno;
+		return retval;
 	}
 #ifdef DEBUGFS
 	printf("Allocated inode: %u\n", newfile);
@@ -372,7 +372,7 @@ errcode_t do_write_internal(ext2_ino_t cwd, const char *src, const char *dest)
 		if (retval) {
 			com_err(__func__, retval, "while expanding directory");
 			close(fd);
-			return errno;
+			return retval;
 		}
 		retval = ext2fs_link(current_fs, cwd, dest, newfile,
 					EXT2_FT_REG_FILE);
@@ -412,12 +412,15 @@ errcode_t do_write_internal(ext2_ino_t cwd, const char *src, const char *dest)
 	if ((retval = ext2fs_write_new_inode(current_fs, newfile, &inode))) {
 		com_err(__func__, retval, "while creating inode %u", newfile);
 		close(fd);
-		return errno;
+		return retval;
 	}
 	if (inode.i_flags & EXT4_INLINE_DATA_FL) {
 		retval = ext2fs_inline_data_init(current_fs, newfile);
-		if (retval)
-			return;
+		if (retval) {
+			com_err("copy_file", retval, 0);
+			close(fd);
+			return retval;
+		}
 	}
 	if (LINUX_S_ISREG(inode.i_mode)) {
 		if (statbuf.st_blocks < statbuf.st_size / S_BLKSIZE) {
@@ -434,7 +437,7 @@ errcode_t do_write_internal(ext2_ino_t cwd, const char *src, const char *dest)
 	}
 	close(fd);
 
-	return 0;
+	return retval;
 }
 
 /* Copy files from source_dir to fs */



* [PATCH 02/49] create_inode: minor cleanups
  2014-03-11  6:53 [PATCH 00/49] e2fsprogs patchbomb 3/14 Darrick J. Wong
  2014-03-11  6:54 ` [PATCH 01/49] create_inode: clean up return mess in do_write_internal Darrick J. Wong
@ 2014-03-11  6:54 ` Darrick J. Wong
  2014-03-11 20:31   ` Andreas Dilger
  2014-03-11  6:54 ` [PATCH 03/49] create_inode: whitespace fixes Darrick J. Wong
                   ` (44 subsequent siblings)
  46 siblings, 1 reply; 88+ messages in thread
From: Darrick J. Wong @ 2014-03-11  6:54 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

Fix a couple of small style issues in the create_inode files.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 misc/create_inode.c |   42 ++++++++++++++++++++++++++++--------------
 misc/create_inode.h |    5 +++++
 2 files changed, 33 insertions(+), 14 deletions(-)


diff --git a/misc/create_inode.c b/misc/create_inode.c
index 647480c..b204e71 100644
--- a/misc/create_inode.c
+++ b/misc/create_inode.c
@@ -1,3 +1,6 @@
+#include <time.h>
+#include <unistd.h>
+
 #include "create_inode.h"
 
 #if __STDC_VERSION__ < 199901L
@@ -179,7 +182,8 @@ errcode_t do_symlink_internal(ext2_ino_t cwd, const char *name, char *target)
 	cp = strrchr(name, '/');
 	if (cp) {
 		*cp = 0;
-		if ((retval =  ext2fs_namei(current_fs, root, cwd, name, &parent_ino))){
+		retval = ext2fs_namei(current_fs, root, cwd, name, &parent_ino);
+		if (retval) {
 			com_err(name, retval, 0);
 			return retval;
 		}
@@ -216,7 +220,8 @@ errcode_t do_mkdir_internal(ext2_ino_t cwd, const char *name, struct stat *st)
 	cp = strrchr(name, '/');
 	if (cp) {
 		*cp = 0;
-		if ((retval =  ext2fs_namei(current_fs, root, cwd, name, &parent_ino))){
+		retval = ext2fs_namei(current_fs, root, cwd, name, &parent_ino);
+		if (retval) {
 			com_err(name, retval, 0);
 			return retval;
 		}
@@ -409,7 +414,8 @@ errcode_t do_write_internal(ext2_ino_t cwd, const char *src, const char *dest)
 		inode.i_flags |= EXT4_EXTENTS_FL;
 	}
 
-	if ((retval = ext2fs_write_new_inode(current_fs, newfile, &inode))) {
+	retval = ext2fs_write_new_inode(current_fs, newfile, &inode);
+	if (retval) {
 		com_err(__func__, retval, "while creating inode %u", newfile);
 		close(fd);
 		return retval;
@@ -464,12 +470,12 @@ errcode_t populate_fs(ext2_ino_t parent_ino, const char *source_dir)
 
 	if (!(dh = opendir("."))) {
 		com_err(__func__, errno,
-			_("while openning directory \"%s\""), source_dir);
+			_("while opening directory \"%s\""), source_dir);
 		return errno;
 	}
 
-	while((dent = readdir(dh))) {
-		if((!strcmp(dent->d_name, ".")) || (!strcmp(dent->d_name, "..")))
+	while ((dent = readdir(dh))) {
+		if ((!strcmp(dent->d_name, ".")) || (!strcmp(dent->d_name, "..")))
 			continue;
 		lstat(dent->d_name, &st);
 		name = dent->d_name;
@@ -494,7 +500,8 @@ errcode_t populate_fs(ext2_ino_t parent_ino, const char *source_dir)
 			case S_IFCHR:
 			case S_IFBLK:
 			case S_IFIFO:
-				if ((retval = do_mknod_internal(parent_ino, name, &st))) {
+				retval = do_mknod_internal(parent_ino, name, &st);
+				if (retval) {
 					com_err(__func__, retval,
 						_("while creating special file \"%s\""), name);
 					return retval;
@@ -506,32 +513,37 @@ errcode_t populate_fs(ext2_ino_t parent_ino, const char *source_dir)
 					_("ignoring socket file \"%s\""), name);
 				continue;
 			case S_IFLNK:
-				if((read_cnt = readlink(name, ln_target, sizeof(ln_target))) == -1) {
+				read_cnt = readlink(name, ln_target, sizeof(ln_target));
+				if (read_cnt == -1) {
 					com_err(__func__, errno,
 						_("while trying to readlink \"%s\""), name);
 					return errno;
 				}
 				ln_target[read_cnt] = '\0';
-				if ((retval = do_symlink_internal(parent_ino, name, ln_target))) {
+				retval = do_symlink_internal(parent_ino, name, ln_target);
+				if (retval) {
 					com_err(__func__, retval,
 						_("while writing symlink\"%s\""), name);
 					return retval;
 				}
 				break;
 			case S_IFREG:
-				if ((retval = do_write_internal(parent_ino, name, name))) {
+				retval = do_write_internal(parent_ino, name, name);
+				if (retval) {
 					com_err(__func__, retval,
 						_("while writing file \"%s\""), name);
 					return retval;
 				}
 				break;
 			case S_IFDIR:
-				if ((retval = do_mkdir_internal(parent_ino, name, &st))) {
+				retval = do_mkdir_internal(parent_ino, name, &st);
+				if (retval) {
 					com_err(__func__, retval,
 						_("while making dir \"%s\""), name);
 					return retval;
 				}
-				if ((retval = ext2fs_namei(current_fs, root, parent_ino, name, &ino))) {
+				retval = ext2fs_namei(current_fs, root, parent_ino, name, &ino);
+				if (retval) {
 					com_err(name, retval, 0);
 						return retval;
 				}
@@ -548,12 +560,14 @@ errcode_t populate_fs(ext2_ino_t parent_ino, const char *source_dir)
 					_("ignoring entry \"%s\""), name);
 		}
 
-		if ((retval =  ext2fs_namei(current_fs, root, parent_ino, name, &ino))){
+		retval =  ext2fs_namei(current_fs, root, parent_ino, name, &ino);
+		if (retval) {
 			com_err(name, retval, 0);
 			return retval;
 		}
 
-		if ((retval = set_inode_extra(parent_ino, ino, &st))) {
+		retval = set_inode_extra(parent_ino, ino, &st);
+		if (retval) {
 			com_err(__func__, retval,
 				_("while setting inode for \"%s\""), name);
 			return retval;
diff --git a/misc/create_inode.h b/misc/create_inode.h
index 2b6d429..79742e8 100644
--- a/misc/create_inode.h
+++ b/misc/create_inode.h
@@ -1,3 +1,6 @@
+#ifndef _CREATE_INODE_H
+#define _CREATE_INODE_H
+
 #include <sys/types.h>
 #include <sys/stat.h>
 #include <fcntl.h>
@@ -33,3 +36,5 @@ extern errcode_t do_mknod_internal(ext2_ino_t cwd, const char *name, struct stat
 extern errcode_t do_symlink_internal(ext2_ino_t cwd, const char *name, char *target);
 extern errcode_t do_mkdir_internal(ext2_ino_t cwd, const char *name, struct stat *st);
 extern errcode_t do_write_internal(ext2_ino_t cwd, const char *src, const char *dest);
+
+#endif /* _CREATE_INODE_H */



* [PATCH 03/49] create_inode: whitespace fixes
  2014-03-11  6:53 [PATCH 00/49] e2fsprogs patchbomb 3/14 Darrick J. Wong
  2014-03-11  6:54 ` [PATCH 01/49] create_inode: clean up return mess in do_write_internal Darrick J. Wong
  2014-03-11  6:54 ` [PATCH 02/49] create_inode: minor cleanups Darrick J. Wong
@ 2014-03-11  6:54 ` Darrick J. Wong
  2014-03-12  3:27   ` Theodore Ts'o
  2014-03-11  6:54 ` [PATCH 04/49] create_inode: move debugfs internal state back to debugfs Darrick J. Wong
                   ` (43 subsequent siblings)
  46 siblings, 1 reply; 88+ messages in thread
From: Darrick J. Wong @ 2014-03-11  6:54 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

Fix a ton of whitespace issues.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 misc/create_inode.c |  198 ++++++++++++++++++++++++++++-----------------------
 1 file changed, 110 insertions(+), 88 deletions(-)


diff --git a/misc/create_inode.c b/misc/create_inode.c
index b204e71..766a8a4 100644
--- a/misc/create_inode.c
+++ b/misc/create_inode.c
@@ -25,7 +25,8 @@
 int hdlink_cnt = HDLINK_CNT;
 
 /* Link an inode number to a directory */
-static errcode_t add_link(ext2_ino_t parent_ino, ext2_ino_t ino, const char *name)
+static errcode_t add_link(ext2_ino_t parent_ino, ext2_ino_t ino,
+			  const char *name)
 {
 	struct ext2_inode	inode;
 	errcode_t		retval;
@@ -43,7 +44,8 @@ static errcode_t add_link(ext2_ino_t parent_ino, ext2_ino_t ino, const char *nam
 			com_err(__func__, retval, "while expanding directory");
 			return retval;
 		}
-		retval = ext2fs_link(current_fs, parent_ino, name, ino, inode.i_flags);
+		retval = ext2fs_link(current_fs, parent_ino, name, ino,
+				     inode.i_flags);
 	}
 	if (retval) {
 		com_err(__func__, retval, "while linking %s", name);
@@ -103,18 +105,18 @@ errcode_t do_mknod_internal(ext2_ino_t cwd, const char *name, struct stat *st)
 	int			filetype;
 
 	switch(st->st_mode & S_IFMT) {
-		case S_IFCHR:
-			mode = LINUX_S_IFCHR;
-			filetype = EXT2_FT_CHRDEV;
-			break;
-		case S_IFBLK:
-			mode = LINUX_S_IFBLK;
-			filetype =  EXT2_FT_BLKDEV;
-			break;
-		case S_IFIFO:
-			mode = LINUX_S_IFIFO;
-			filetype = EXT2_FT_FIFO;
-			break;
+	case S_IFCHR:
+		mode = LINUX_S_IFCHR;
+		filetype = EXT2_FT_CHRDEV;
+		break;
+	case S_IFBLK:
+		mode = LINUX_S_IFBLK;
+		filetype =  EXT2_FT_BLKDEV;
+		break;
+	case S_IFIFO:
+		mode = LINUX_S_IFIFO;
+		filetype = EXT2_FT_FIFO;
+		break;
 	}
 
 	if (!(current_fs->flags & EXT2_FLAG_RW)) {
@@ -143,7 +145,7 @@ errcode_t do_mknod_internal(ext2_ino_t cwd, const char *name, struct stat *st)
 		com_err(name, retval, 0);
 		return -1;
 	}
-        if (ext2fs_test_inode_bitmap2(current_fs->inode_map, ino))
+	if (ext2fs_test_inode_bitmap2(current_fs->inode_map, ino))
 		com_err(__func__, 0, "Warning: inode already set");
 	ext2fs_inode_alloc_stats2(current_fs, ino, +1, 0);
 	memset(&inode, 0, sizeof(inode));
@@ -159,7 +161,8 @@ errcode_t do_mknod_internal(ext2_ino_t cwd, const char *name, struct stat *st)
 		inode.i_block[1] = 0;
 	} else {
 		inode.i_block[0] = 0;
-		inode.i_block[1] = (minor & 0xff) | (major << 8) | ((minor & ~0xff) << 12);
+		inode.i_block[1] = (minor & 0xff) | (major << 8) |
+				   ((minor & ~0xff) << 12);
 	}
 	inode.i_links_count = 1;
 
@@ -182,7 +185,8 @@ errcode_t do_symlink_internal(ext2_ino_t cwd, const char *name, char *target)
 	cp = strrchr(name, '/');
 	if (cp) {
 		*cp = 0;
-		retval = ext2fs_namei(current_fs, root, cwd, name, &parent_ino);
+		retval = ext2fs_namei(current_fs, root, cwd, name,
+				      &parent_ino);
 		if (retval) {
 			com_err(name, retval, 0);
 			return retval;
@@ -196,7 +200,8 @@ try_again:
 	if (retval == EXT2_ET_DIR_NO_SPACE) {
 		retval = ext2fs_expand_dir(current_fs, parent_ino);
 		if (retval) {
-			com_err("do_symlink_internal", retval, "while expanding directory");
+			com_err("do_symlink_internal", retval,
+				"while expanding directory");
 			return retval;
 		}
 		goto try_again;
@@ -220,7 +225,8 @@ errcode_t do_mkdir_internal(ext2_ino_t cwd, const char *name, struct stat *st)
 	cp = strrchr(name, '/');
 	if (cp) {
 		*cp = 0;
-		retval = ext2fs_namei(current_fs, root, cwd, name, &parent_ino);
+		retval = ext2fs_namei(current_fs, root, cwd, name,
+				      &parent_ino);
 		if (retval) {
 			com_err(name, retval, 0);
 			return retval;
@@ -245,7 +251,8 @@ try_again:
 	}
 }
 
-static errcode_t copy_file(int fd, ext2_ino_t newfile, int bufsize, int make_holes)
+static errcode_t copy_file(int fd, ext2_ino_t newfile, int bufsize,
+			   int make_holes)
 {
 	ext2_file_t	e2_file;
 	errcode_t	retval;
@@ -291,7 +298,9 @@ static errcode_t copy_file(int fd, ext2_ino_t newfile, int bufsize, int make_hol
 			cmp = memcmp(ptr, zero_buf, got);
 			if (cmp == 0) {
 				 /* The whole block is zero, make a hole */
-				retval = ext2fs_file_lseek(e2_file, got, EXT2_SEEK_CUR, NULL);
+				retval = ext2fs_file_lseek(e2_file, got,
+							   EXT2_SEEK_CUR,
+							   NULL);
 				if (retval)
 					goto fail;
 				got = 0;
@@ -387,7 +396,7 @@ errcode_t do_write_internal(ext2_ino_t cwd, const char *src, const char *dest)
 		close(fd);
 		return errno;
 	}
-        if (ext2fs_test_inode_bitmap2(current_fs->inode_map, newfile))
+	if (ext2fs_test_inode_bitmap2(current_fs->inode_map, newfile))
 		com_err(__func__, 0, "Warning: inode already set");
 	ext2fs_inode_alloc_stats2(current_fs, newfile, +1, 0);
 	memset(&inode, 0, sizeof(inode));
@@ -464,7 +473,8 @@ errcode_t populate_fs(ext2_ino_t parent_ino, const char *source_dir)
 
 	if (chdir(source_dir) < 0) {
 		com_err(__func__, errno,
-			_("while changing working directory to \"%s\""), source_dir);
+			_("while changing working directory to \"%s\""),
+			source_dir);
 		return errno;
 	}
 
@@ -475,20 +485,24 @@ errcode_t populate_fs(ext2_ino_t parent_ino, const char *source_dir)
 	}
 
 	while ((dent = readdir(dh))) {
-		if ((!strcmp(dent->d_name, ".")) || (!strcmp(dent->d_name, "..")))
+		if ((!strcmp(dent->d_name, ".")) ||
+		    (!strcmp(dent->d_name, "..")))
 			continue;
 		lstat(dent->d_name, &st);
 		name = dent->d_name;
 
 		/* Check for hardlinks */
 		save_inode = 0;
-		if (!S_ISDIR(st.st_mode) && !S_ISLNK(st.st_mode) && st.st_nlink > 1) {
+		if (!S_ISDIR(st.st_mode) && !S_ISLNK(st.st_mode) &&
+		    st.st_nlink > 1) {
 			hdlink = is_hardlink(st.st_ino);
 			if (hdlink >= 0) {
 				retval = add_link(parent_ino,
-						hdlinks.hdl[hdlink].dst_ino, name);
+						  hdlinks.hdl[hdlink].dst_ino,
+						  name);
 				if (retval) {
-					com_err(__func__, retval, "while linking %s", name);
+					com_err(__func__, retval,
+						"while linking %s", name);
 					return retval;
 				}
 				continue;
@@ -497,70 +511,78 @@ errcode_t populate_fs(ext2_ino_t parent_ino, const char *source_dir)
 		}
 
 		switch(st.st_mode & S_IFMT) {
-			case S_IFCHR:
-			case S_IFBLK:
-			case S_IFIFO:
-				retval = do_mknod_internal(parent_ino, name, &st);
-				if (retval) {
-					com_err(__func__, retval,
-						_("while creating special file \"%s\""), name);
-					return retval;
-				}
-				break;
-			case S_IFSOCK:
-				/* FIXME: there is no make socket function atm. */
-				com_err(__func__, 0,
-					_("ignoring socket file \"%s\""), name);
-				continue;
-			case S_IFLNK:
-				read_cnt = readlink(name, ln_target, sizeof(ln_target));
-				if (read_cnt == -1) {
-					com_err(__func__, errno,
-						_("while trying to readlink \"%s\""), name);
-					return errno;
-				}
-				ln_target[read_cnt] = '\0';
-				retval = do_symlink_internal(parent_ino, name, ln_target);
-				if (retval) {
-					com_err(__func__, retval,
-						_("while writing symlink\"%s\""), name);
-					return retval;
-				}
-				break;
-			case S_IFREG:
-				retval = do_write_internal(parent_ino, name, name);
-				if (retval) {
-					com_err(__func__, retval,
-						_("while writing file \"%s\""), name);
-					return retval;
-				}
-				break;
-			case S_IFDIR:
-				retval = do_mkdir_internal(parent_ino, name, &st);
-				if (retval) {
-					com_err(__func__, retval,
-						_("while making dir \"%s\""), name);
-					return retval;
-				}
-				retval = ext2fs_namei(current_fs, root, parent_ino, name, &ino);
-				if (retval) {
-					com_err(name, retval, 0);
-						return retval;
-				}
-				/* Populate the dir recursively*/
-				retval = populate_fs(ino, name);
-				if (retval) {
-					com_err(__func__, retval, _("while adding dir \"%s\""), name);
+		case S_IFCHR:
+		case S_IFBLK:
+		case S_IFIFO:
+			retval = do_mknod_internal(parent_ino, name, &st);
+			if (retval) {
+				com_err(__func__, retval,
+					_("while creating special file "
+					  "\"%s\""), name);
+				return retval;
+			}
+			break;
+		case S_IFSOCK:
+			/* FIXME: there is no make socket function atm. */
+			com_err(__func__, 0,
+				_("ignoring socket file \"%s\""), name);
+			continue;
+		case S_IFLNK:
+			read_cnt = readlink(name, ln_target,
+					    sizeof(ln_target));
+			if (read_cnt == -1) {
+				com_err(__func__, errno,
+					_("while trying to readlink \"%s\""),
+					name);
+				return errno;
+			}
+			ln_target[read_cnt] = '\0';
+			retval = do_symlink_internal(parent_ino, name,
+						     ln_target);
+			if (retval) {
+				com_err(__func__, retval,
+					_("while writing symlink\"%s\""),
+					name);
+				return retval;
+			}
+			break;
+		case S_IFREG:
+			retval = do_write_internal(parent_ino, name, name);
+			if (retval) {
+				com_err(__func__, retval,
+					_("while writing file \"%s\""), name);
+				return retval;
+			}
+			break;
+		case S_IFDIR:
+			retval = do_mkdir_internal(parent_ino, name, &st);
+			if (retval) {
+				com_err(__func__, retval,
+					_("while making dir \"%s\""), name);
+				return retval;
+			}
+			retval = ext2fs_namei(current_fs, root, parent_ino,
+					      name, &ino);
+			if (retval) {
+				com_err(name, retval, 0);
 					return retval;
-				}
-				chdir("..");
-				break;
-			default:
-				com_err(__func__, 0,
-					_("ignoring entry \"%s\""), name);
+			}
+			/* Populate the dir recursively*/
+			retval = populate_fs(ino, name);
+			if (retval) {
+				com_err(__func__, retval,
+					_("while adding dir \"%s\""), name);
+				return retval;
+			}
+			chdir("..");
+			break;
+		default:
+			com_err(__func__, 0,
+				_("ignoring entry \"%s\""), name);
 		}
 
-		retval =  ext2fs_namei(current_fs, root, parent_ino, name, &ino);
+		retval =  ext2fs_namei(current_fs, root, parent_ino,
+				       name, &ino);
 		if (retval) {
 			com_err(name, retval, 0);
 			return retval;



* [PATCH 04/49] create_inode: move debugfs internal state back to debugfs
  2014-03-11  6:53 [PATCH 00/49] e2fsprogs patchbomb 3/14 Darrick J. Wong
                   ` (2 preceding siblings ...)
  2014-03-11  6:54 ` [PATCH 03/49] create_inode: whitespace fixes Darrick J. Wong
@ 2014-03-11  6:54 ` Darrick J. Wong
  2014-03-12  3:31   ` Theodore Ts'o
  2014-03-11  6:54 ` [PATCH 05/49] create_inode: handle hard link inum mappings per populate_fs invocation Darrick J. Wong
                   ` (42 subsequent siblings)
  46 siblings, 1 reply; 88+ messages in thread
From: Darrick J. Wong @ 2014-03-11  6:54 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

Since create_inode.c is shared between debugfs and mke2fs, don't
spread debugfs internal state into mke2fs.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 debugfs/debugfs.c   |   15 ++++--
 misc/create_inode.c |  134 +++++++++++++++++++++++++++------------------------
 misc/create_inode.h |   21 +++++---
 misc/mke2fs.c       |    5 +-
 4 files changed, 95 insertions(+), 80 deletions(-)


diff --git a/debugfs/debugfs.c b/debugfs/debugfs.c
index 5554017..9b38c08 100644
--- a/debugfs/debugfs.c
+++ b/debugfs/debugfs.c
@@ -43,7 +43,8 @@ ss_request_table *extra_cmds;
 const char *debug_prog_name;
 int sci_idx;
 
-ext2_ino_t	cwd;
+ext2_filsys    current_fs;
+ext2_ino_t     root, cwd;
 
 static void open_filesystem(char *device, int open_flags, blk64_t superblock,
 			    blk64_t blocksize, int catastrophic,
@@ -1606,7 +1607,8 @@ void do_write(int argc, char *argv[])
 				"<native file> <new file>", CHECK_FS_RW))
 		return;
 
-	if ((retval = do_write_internal(cwd, argv[1], argv[2])))
+	retval = do_write_internal(current_fs, cwd, argv[1], argv[2], root);
+	if (retval)
 		com_err(argv[0], retval, 0);
 }
 
@@ -1654,7 +1656,8 @@ void do_mknod(int argc, char *argv[])
 		goto usage;
 
 	st.st_rdev = makedev(major, minor);
-	if ((retval = do_mknod_internal(cwd, argv[1], &st)))
+	retval = do_mknod_internal(current_fs, cwd, argv[1], &st);
+	if (retval)
 		com_err(argv[0], retval, 0);
 }
 
@@ -1666,7 +1669,8 @@ void do_mkdir(int argc, char *argv[])
 				"<filename>", CHECK_FS_RW))
 		return;
 
-	if ((retval = do_mkdir_internal(cwd, argv[1], NULL)))
+	retval = do_mkdir_internal(current_fs, cwd, argv[1], NULL, root);
+	if (retval)
 		com_err(argv[0], retval, 0);
 
 }
@@ -2067,7 +2071,8 @@ void do_symlink(int argc, char *argv[])
 				"<filename> <target>", CHECK_FS_RW))
 		return;
 
-	if ((retval = do_symlink_internal(cwd, argv[1], argv[2])))
+	retval = do_symlink_internal(current_fs, cwd, argv[1], argv[2], root);
+	if (retval)
 		com_err(argv[0], retval, 0);
 
 }
diff --git a/misc/create_inode.c b/misc/create_inode.c
index 766a8a4..588f3f6 100644
--- a/misc/create_inode.c
+++ b/misc/create_inode.c
@@ -25,27 +25,26 @@
 int hdlink_cnt = HDLINK_CNT;
 
 /* Link an inode number to a directory */
-static errcode_t add_link(ext2_ino_t parent_ino, ext2_ino_t ino,
-			  const char *name)
+static errcode_t add_link(ext2_filsys fs, ext2_ino_t parent_ino,
+			  ext2_ino_t ino, const char *name)
 {
 	struct ext2_inode	inode;
 	errcode_t		retval;
 
-	retval = ext2fs_read_inode(current_fs, ino, &inode);
+	retval = ext2fs_read_inode(fs, ino, &inode);
         if (retval) {
 		com_err(__func__, retval, "while reading inode %u", ino);
 		return retval;
 	}
 
-	retval = ext2fs_link(current_fs, parent_ino, name, ino, inode.i_flags);
+	retval = ext2fs_link(fs, parent_ino, name, ino, inode.i_flags);
 	if (retval == EXT2_ET_DIR_NO_SPACE) {
-		retval = ext2fs_expand_dir(current_fs, parent_ino);
+		retval = ext2fs_expand_dir(fs, parent_ino);
 		if (retval) {
 			com_err(__func__, retval, "while expanding directory");
 			return retval;
 		}
-		retval = ext2fs_link(current_fs, parent_ino, name, ino,
-				     inode.i_flags);
+		retval = ext2fs_link(fs, parent_ino, name, ino, inode.i_flags);
 	}
 	if (retval) {
 		com_err(__func__, retval, "while linking %s", name);
@@ -54,7 +53,7 @@ static errcode_t add_link(ext2_ino_t parent_ino, ext2_ino_t ino,
 
 	inode.i_links_count++;
 
-	retval = ext2fs_write_inode(current_fs, ino, &inode);
+	retval = ext2fs_write_inode(fs, ino, &inode);
 	if (retval)
 		com_err(__func__, retval, "while writing inode %u", ino);
 
@@ -75,12 +74,13 @@ static void fill_inode(struct ext2_inode *inode, struct stat *st)
 }
 
 /* Set the uid, gid, mode and time for the inode */
-errcode_t set_inode_extra(ext2_ino_t cwd, ext2_ino_t ino, struct stat *st)
+static errcode_t set_inode_extra(ext2_filsys fs, ext2_ino_t cwd,
+				 ext2_ino_t ino, struct stat *st)
 {
 	errcode_t		retval;
 	struct ext2_inode	inode;
 
-	retval = ext2fs_read_inode(current_fs, ino, &inode);
+	retval = ext2fs_read_inode(fs, ino, &inode);
         if (retval) {
 		com_err(__func__, retval, "while reading inode %u", ino);
 		return retval;
@@ -88,7 +88,7 @@ errcode_t set_inode_extra(ext2_ino_t cwd, ext2_ino_t ino, struct stat *st)
 
 	fill_inode(&inode, st);
 
-	retval = ext2fs_write_inode(current_fs, ino, &inode);
+	retval = ext2fs_write_inode(fs, ino, &inode);
 	if (retval) {
 		com_err(__func__, retval, "while writing inode %u", ino);
 		return retval;
@@ -96,7 +96,8 @@ errcode_t set_inode_extra(ext2_ino_t cwd, ext2_ino_t ino, struct stat *st)
 }
 
 /* Make a special file which is block, character and fifo */
-errcode_t do_mknod_internal(ext2_ino_t cwd, const char *name, struct stat *st)
+errcode_t do_mknod_internal(ext2_filsys fs, ext2_ino_t cwd, const char *name,
+			    struct stat *st)
 {
 	ext2_ino_t		ino;
 	errcode_t 		retval;
@@ -119,11 +120,11 @@ errcode_t do_mknod_internal(ext2_ino_t cwd, const char *name, struct stat *st)
 		break;
 	}
 
-	if (!(current_fs->flags & EXT2_FLAG_RW)) {
+	if (!(fs->flags & EXT2_FLAG_RW)) {
 		com_err(__func__, 0, "Filesystem opened read/only");
 		return -1;
 	}
-	retval = ext2fs_new_inode(current_fs, cwd, 010755, 0, &ino);
+	retval = ext2fs_new_inode(fs, cwd, 010755, 0, &ino);
 	if (retval) {
 		com_err(__func__, retval, 0);
 		return retval;
@@ -132,26 +133,26 @@ errcode_t do_mknod_internal(ext2_ino_t cwd, const char *name, struct stat *st)
 #ifdef DEBUGFS
 	printf("Allocated inode: %u\n", ino);
 #endif
-	retval = ext2fs_link(current_fs, cwd, name, ino, filetype);
+	retval = ext2fs_link(fs, cwd, name, ino, filetype);
 	if (retval == EXT2_ET_DIR_NO_SPACE) {
-		retval = ext2fs_expand_dir(current_fs, cwd);
+		retval = ext2fs_expand_dir(fs, cwd);
 		if (retval) {
 			com_err(__func__, retval, "while expanding directory");
 			return retval;
 		}
-		retval = ext2fs_link(current_fs, cwd, name, ino, filetype);
+		retval = ext2fs_link(fs, cwd, name, ino, filetype);
 	}
 	if (retval) {
 		com_err(name, retval, 0);
 		return -1;
 	}
-	if (ext2fs_test_inode_bitmap2(current_fs->inode_map, ino))
+	if (ext2fs_test_inode_bitmap2(fs->inode_map, ino))
 		com_err(__func__, 0, "Warning: inode already set");
-	ext2fs_inode_alloc_stats2(current_fs, ino, +1, 0);
+	ext2fs_inode_alloc_stats2(fs, ino, +1, 0);
 	memset(&inode, 0, sizeof(inode));
 	inode.i_mode = mode;
 	inode.i_atime = inode.i_ctime = inode.i_mtime =
-		current_fs->now ? current_fs->now : time(0);
+		fs->now ? fs->now : time(0);
 
 	major = major(st->st_rdev);
 	minor = minor(st->st_rdev);
@@ -166,7 +167,7 @@ errcode_t do_mknod_internal(ext2_ino_t cwd, const char *name, struct stat *st)
 	}
 	inode.i_links_count = 1;
 
-	retval = ext2fs_write_new_inode(current_fs, ino, &inode);
+	retval = ext2fs_write_new_inode(fs, ino, &inode);
 	if (retval)
 		com_err(__func__, retval, "while creating inode %u", ino);
 
@@ -174,7 +175,8 @@ errcode_t do_mknod_internal(ext2_ino_t cwd, const char *name, struct stat *st)
 }
 
 /* Make a symlink name -> target */
-errcode_t do_symlink_internal(ext2_ino_t cwd, const char *name, char *target)
+errcode_t do_symlink_internal(ext2_filsys fs, ext2_ino_t cwd, const char *name,
+			      char *target, ext2_ino_t root)
 {
 	char			*cp;
 	ext2_ino_t		parent_ino;
@@ -185,8 +187,7 @@ errcode_t do_symlink_internal(ext2_ino_t cwd, const char *name, char *target)
 	cp = strrchr(name, '/');
 	if (cp) {
 		*cp = 0;
-		retval = ext2fs_namei(current_fs, root, cwd, name,
-				      &parent_ino);
+		retval = ext2fs_namei(fs, root, cwd, name, &parent_ino);
 		if (retval) {
 			com_err(name, retval, 0);
 			return retval;
@@ -196,9 +197,9 @@ errcode_t do_symlink_internal(ext2_ino_t cwd, const char *name, char *target)
 		parent_ino = cwd;
 
 try_again:
-	retval = ext2fs_symlink(current_fs, parent_ino, 0, name, target);
+	retval = ext2fs_symlink(fs, parent_ino, 0, name, target);
 	if (retval == EXT2_ET_DIR_NO_SPACE) {
-		retval = ext2fs_expand_dir(current_fs, parent_ino);
+		retval = ext2fs_expand_dir(fs, parent_ino);
 		if (retval) {
 			com_err("do_symlink_internal", retval,
 				"while expanding directory");
@@ -214,7 +215,8 @@ try_again:
 }
 
 /* Make a directory in the fs */
-errcode_t do_mkdir_internal(ext2_ino_t cwd, const char *name, struct stat *st)
+errcode_t do_mkdir_internal(ext2_filsys fs, ext2_ino_t cwd, const char *name,
+			    struct stat *st, ext2_ino_t root)
 {
 	char			*cp;
 	ext2_ino_t		parent_ino, ino;
@@ -225,8 +227,7 @@ errcode_t do_mkdir_internal(ext2_ino_t cwd, const char *name, struct stat *st)
 	cp = strrchr(name, '/');
 	if (cp) {
 		*cp = 0;
-		retval = ext2fs_namei(current_fs, root, cwd, name,
-				      &parent_ino);
+		retval = ext2fs_namei(fs, root, cwd, name, &parent_ino);
 		if (retval) {
 			com_err(name, retval, 0);
 			return retval;
@@ -236,9 +237,9 @@ errcode_t do_mkdir_internal(ext2_ino_t cwd, const char *name, struct stat *st)
 		parent_ino = cwd;
 
 try_again:
-	retval = ext2fs_mkdir(current_fs, parent_ino, 0, name);
+	retval = ext2fs_mkdir(fs, parent_ino, 0, name);
 	if (retval == EXT2_ET_DIR_NO_SPACE) {
-		retval = ext2fs_expand_dir(current_fs, parent_ino);
+		retval = ext2fs_expand_dir(fs, parent_ino);
 		if (retval) {
 			com_err(__func__, retval, "while expanding directory");
 			return retval;
@@ -251,8 +252,8 @@ try_again:
 	}
 }
 
-static errcode_t copy_file(int fd, ext2_ino_t newfile, int bufsize,
-			   int make_holes)
+static errcode_t copy_file(ext2_filsys fs, int fd, ext2_ino_t newfile,
+			   int bufsize, int make_holes)
 {
 	ext2_file_t	e2_file;
 	errcode_t	retval;
@@ -263,7 +264,7 @@ static errcode_t copy_file(int fd, ext2_ino_t newfile, int bufsize,
 	char		*zero_buf;
 	int		cmp;
 
-	retval = ext2fs_file_open(current_fs, newfile,
+	retval = ext2fs_file_open(fs, newfile,
 				  EXT2_FILE_WRITE, &e2_file);
 	if (retval)
 		return retval;
@@ -330,7 +331,7 @@ fail:
 	return retval;
 }
 
-int is_hardlink(ext2_ino_t ino)
+static int is_hardlink(ext2_ino_t ino)
 {
 	int i;
 
@@ -342,7 +343,8 @@ int is_hardlink(ext2_ino_t ino)
 }
 
 /* Copy the native file to the fs */
-errcode_t do_write_internal(ext2_ino_t cwd, const char *src, const char *dest)
+errcode_t do_write_internal(ext2_filsys fs, ext2_ino_t cwd, const char *src,
+			    const char *dest, ext2_ino_t root)
 {
 	int		fd;
 	struct stat	statbuf;
@@ -363,14 +365,14 @@ errcode_t do_write_internal(ext2_ino_t cwd, const char *src, const char *dest)
 		return errno;
 	}
 
-	retval = ext2fs_namei(current_fs, root, cwd, dest, &newfile);
+	retval = ext2fs_namei(fs, root, cwd, dest, &newfile);
 	if (retval == 0) {
 		com_err(__func__, 0, "The file '%s' already exists\n", dest);
 		close(fd);
 		return retval;
 	}
 
-	retval = ext2fs_new_inode(current_fs, cwd, 010755, 0, &newfile);
+	retval = ext2fs_new_inode(fs, cwd, 010755, 0, &newfile);
 	if (retval) {
 		com_err(__func__, retval, 0);
 		close(fd);
@@ -379,16 +381,16 @@ errcode_t do_write_internal(ext2_ino_t cwd, const char *src, const char *dest)
 #ifdef DEBUGFS
 	printf("Allocated inode: %u\n", newfile);
 #endif
-	retval = ext2fs_link(current_fs, cwd, dest, newfile,
+	retval = ext2fs_link(fs, cwd, dest, newfile,
 				EXT2_FT_REG_FILE);
 	if (retval == EXT2_ET_DIR_NO_SPACE) {
-		retval = ext2fs_expand_dir(current_fs, cwd);
+		retval = ext2fs_expand_dir(fs, cwd);
 		if (retval) {
 			com_err(__func__, retval, "while expanding directory");
 			close(fd);
 			return retval;
 		}
-		retval = ext2fs_link(current_fs, cwd, dest, newfile,
+		retval = ext2fs_link(fs, cwd, dest, newfile,
 					EXT2_FT_REG_FILE);
 	}
 	if (retval) {
@@ -396,19 +398,19 @@ errcode_t do_write_internal(ext2_ino_t cwd, const char *src, const char *dest)
 		close(fd);
 		return errno;
 	}
-	if (ext2fs_test_inode_bitmap2(current_fs->inode_map, newfile))
+	if (ext2fs_test_inode_bitmap2(fs->inode_map, newfile))
 		com_err(__func__, 0, "Warning: inode already set");
-	ext2fs_inode_alloc_stats2(current_fs, newfile, +1, 0);
+	ext2fs_inode_alloc_stats2(fs, newfile, +1, 0);
 	memset(&inode, 0, sizeof(inode));
 	inode.i_mode = (statbuf.st_mode & ~LINUX_S_IFMT) | LINUX_S_IFREG;
 	inode.i_atime = inode.i_ctime = inode.i_mtime =
-		current_fs->now ? current_fs->now : time(0);
+		fs->now ? fs->now : time(0);
 	inode.i_links_count = 1;
 	inode.i_size = statbuf.st_size;
-	if (EXT2_HAS_INCOMPAT_FEATURE(current_fs->super,
+	if (EXT2_HAS_INCOMPAT_FEATURE(fs->super,
 				      EXT4_FEATURE_INCOMPAT_INLINE_DATA)) {
 		inode.i_flags |= EXT4_INLINE_DATA_FL;
-	} else if (current_fs->super->s_feature_incompat &
+	} else if (fs->super->s_feature_incompat &
 		   EXT3_FEATURE_INCOMPAT_EXTENTS) {
 		int i;
 		struct ext3_extent_header *eh;
@@ -423,14 +425,14 @@ errcode_t do_write_internal(ext2_ino_t cwd, const char *src, const char *dest)
 		inode.i_flags |= EXT4_EXTENTS_FL;
 	}
 
-	retval = ext2fs_write_new_inode(current_fs, newfile, &inode);
+	retval = ext2fs_write_new_inode(fs, newfile, &inode);
 	if (retval) {
 		com_err(__func__, retval, "while creating inode %u", newfile);
 		close(fd);
 		return retval;
 	}
 	if (inode.i_flags & EXT4_INLINE_DATA_FL) {
-		retval = ext2fs_inline_data_init(current_fs, newfile);
+		retval = ext2fs_inline_data_init(fs, newfile);
 		if (retval) {
 			com_err("copy_file", retval, 0);
 			close(fd);
@@ -446,7 +448,7 @@ errcode_t do_write_internal(ext2_ino_t cwd, const char *src, const char *dest)
 			 */
 			bufsize = statbuf.st_blksize;
 		}
-		retval = copy_file(fd, newfile, bufsize, make_holes);
+		retval = copy_file(fs, fd, newfile, bufsize, make_holes);
 		if (retval)
 			com_err("copy_file", retval, 0);
 	}
@@ -456,7 +458,8 @@ errcode_t do_write_internal(ext2_ino_t cwd, const char *src, const char *dest)
 }
 
 /* Copy files from source_dir to fs */
-errcode_t populate_fs(ext2_ino_t parent_ino, const char *source_dir)
+errcode_t populate_fs(ext2_filsys fs, ext2_ino_t parent_ino,
+		      const char *source_dir, ext2_ino_t root)
 {
 	const char	*name;
 	DIR		*dh;
@@ -469,8 +472,6 @@ errcode_t populate_fs(ext2_ino_t parent_ino, const char *source_dir)
 	int		read_cnt;
 	int		hdlink;
 
-	root = EXT2_ROOT_INO;
-
 	if (chdir(source_dir) < 0) {
 		com_err(__func__, errno,
 			_("while changing working directory to \"%s\""),
@@ -497,7 +498,7 @@ errcode_t populate_fs(ext2_ino_t parent_ino, const char *source_dir)
 		    st.st_nlink > 1) {
 			hdlink = is_hardlink(st.st_ino);
 			if (hdlink >= 0) {
-				retval = add_link(parent_ino,
+				retval = add_link(fs, parent_ino,
 						  hdlinks.hdl[hdlink].dst_ino,
 						  name);
 				if (retval) {
@@ -514,7 +515,7 @@ errcode_t populate_fs(ext2_ino_t parent_ino, const char *source_dir)
 		case S_IFCHR:
 		case S_IFBLK:
 		case S_IFIFO:
-			retval = do_mknod_internal(parent_ino, name, &st);
+			retval = do_mknod_internal(fs, parent_ino, name, &st);
 			if (retval) {
 				com_err(__func__, retval,
 					_("while creating special file "
@@ -537,8 +538,8 @@ errcode_t populate_fs(ext2_ino_t parent_ino, const char *source_dir)
 				return errno;
 			}
 			ln_target[read_cnt] = '\0';
-			retval = do_symlink_internal(parent_ino, name,
-						     ln_target);
+			retval = do_symlink_internal(fs, parent_ino, name,
+						     ln_target, root);
 			if (retval) {
 				com_err(__func__, retval,
 					_("while writing symlink\"%s\""),
@@ -547,7 +548,8 @@ errcode_t populate_fs(ext2_ino_t parent_ino, const char *source_dir)
 			}
 			break;
 		case S_IFREG:
-			retval = do_write_internal(parent_ino, name, name);
+			retval = do_write_internal(fs, parent_ino, name, name,
+						   root);
 			if (retval) {
 				com_err(__func__, retval,
 					_("while writing file \"%s\""), name);
@@ -555,40 +557,44 @@ errcode_t populate_fs(ext2_ino_t parent_ino, const char *source_dir)
 			}
 			break;
 		case S_IFDIR:
-			retval = do_mkdir_internal(parent_ino, name, &st);
+			retval = do_mkdir_internal(fs, parent_ino, name, &st,
+						   root);
 			if (retval) {
 				com_err(__func__, retval,
 					_("while making dir \"%s\""), name);
 				return retval;
 			}
-			retval = ext2fs_namei(current_fs, root, parent_ino,
+			retval = ext2fs_namei(fs, root, parent_ino,
 					      name, &ino);
 			if (retval) {
 				com_err(name, retval, 0);
 					return retval;
 			}
 			/* Populate the dir recursively*/
-			retval = populate_fs(ino, name);
+			retval = populate_fs(fs, ino, name, root);
 			if (retval) {
 				com_err(__func__, retval,
 					_("while adding dir \"%s\""), name);
 				return retval;
 			}
-			chdir("..");
+			if (chdir("..")) {
+				com_err(__func__, errno,
+					_("during cd .."));
+				return errno;
+			}
 			break;
 		default:
 			com_err(__func__, 0,
 				_("ignoring entry \"%s\""), name);
 		}
 
-		retval =  ext2fs_namei(current_fs, root, parent_ino,
-				       name, &ino);
+		retval =  ext2fs_namei(fs, root, parent_ino, name, &ino);
 		if (retval) {
 			com_err(name, retval, 0);
 			return retval;
 		}
 
-		retval = set_inode_extra(parent_ino, ino, &st);
+		retval = set_inode_extra(fs, parent_ino, ino, &st);
 		if (retval) {
 			com_err(__func__, retval,
 				_("while setting inode for \"%s\""), name);
diff --git a/misc/create_inode.h b/misc/create_inode.h
index 79742e8..fd96910 100644
--- a/misc/create_inode.h
+++ b/misc/create_inode.h
@@ -23,18 +23,23 @@ struct hdlinks_s
 
 struct hdlinks_s hdlinks;
 
-ext2_filsys    current_fs;
-ext2_ino_t     root;
-
 /* For saving the hard links */
 #define HDLINK_CNT     4
 extern int hdlink_cnt;
 
 /* For populating the filesystem */
-extern errcode_t populate_fs(ext2_ino_t parent_ino, const char *source_dir);
-extern errcode_t do_mknod_internal(ext2_ino_t cwd, const char *name, struct stat *st);
-extern errcode_t do_symlink_internal(ext2_ino_t cwd, const char *name, char *target);
-extern errcode_t do_mkdir_internal(ext2_ino_t cwd, const char *name, struct stat *st);
-extern errcode_t do_write_internal(ext2_ino_t cwd, const char *src, const char *dest);
+extern errcode_t populate_fs(ext2_filsys fs, ext2_ino_t parent_ino,
+			     const char *source_dir, ext2_ino_t root);
+extern errcode_t do_mknod_internal(ext2_filsys fs, ext2_ino_t cwd,
+				   const char *name, struct stat *st);
+extern errcode_t do_symlink_internal(ext2_filsys fs, ext2_ino_t cwd,
+				     const char *name, char *target,
+				     ext2_ino_t root);
+extern errcode_t do_mkdir_internal(ext2_filsys fs, ext2_ino_t cwd,
+				   const char *name, struct stat *st,
+				   ext2_ino_t root);
+extern errcode_t do_write_internal(ext2_filsys fs, ext2_ino_t cwd,
+				   const char *src, const char *dest,
+				   ext2_ino_t root);
 
 #endif /* _CREATE_INODE_H */
diff --git a/misc/mke2fs.c b/misc/mke2fs.c
index 61aced2..1422336 100644
--- a/misc/mke2fs.c
+++ b/misc/mke2fs.c
@@ -3000,9 +3000,8 @@ no_journal:
 		}
 
 		hdlinks.count = 0;
-		current_fs = fs;
-		root = EXT2_ROOT_INO;
-		retval = populate_fs(root, root_dir);
+		retval = populate_fs(fs, EXT2_ROOT_INO, root_dir,
+				     EXT2_ROOT_INO);
 		if (retval)
 			fprintf(stderr, "%s",
 				_("\nError while populating file system"));



* [PATCH 05/49] create_inode: handle hard link inum mappings per populate_fs invocation
  2014-03-11  6:53 [PATCH 00/49] e2fsprogs patchbomb 3/14 Darrick J. Wong
                   ` (3 preceding siblings ...)
  2014-03-11  6:54 ` [PATCH 04/49] create_inode: move debugfs internal state back to debugfs Darrick J. Wong
@ 2014-03-11  6:54 ` Darrick J. Wong
  2014-03-12  3:46   ` Theodore Ts'o
  2014-03-11  6:54 ` [PATCH 06/49] libext2fs: support modifying arbitrary extended attributes (v5) Darrick J. Wong
                   ` (41 subsequent siblings)
  46 siblings, 1 reply; 88+ messages in thread
From: Darrick J. Wong @ 2014-03-11  6:54 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

The map used for hard link detection is not cleaned up between
populate_fs invocations, which could lead to unexpected results if
anyone calls populate_fs twice in the same client program.  This
doesn't happen right now, but we might as well clean it up.

The detector also fails if the external directory crosses mountpoints
(inode numbers are only unique per device), so fix that too.
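
As a rough illustration of the idea, here is a minimal standalone sketch
of a hard link map keyed by the (device, inode) pair and owned by the
caller rather than by a global.  All names (link_map, link_entry,
LINK_MAP_STEP and the link_map_* helpers) are hypothetical and are not
the e2fsprogs structures; only the approach mirrors this patch.

```c
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical names for illustration; not the e2fsprogs types. */
#define LINK_MAP_STEP 4		/* grow the table this many entries at a time */

struct link_entry {
	dev_t src_dev;		/* device holding the source file */
	ino_t src_ino;		/* inode number on that device */
	unsigned int dst_ino;	/* inode allocated in the target filesystem */
};

struct link_map {
	size_t count;		/* entries in use */
	size_t size;		/* entries allocated */
	struct link_entry *ent;
};

void link_map_init(struct link_map *m)
{
	m->count = 0;
	m->size = 0;
	m->ent = NULL;
}

void link_map_free(struct link_map *m)
{
	free(m->ent);
	link_map_init(m);
}

/* Return the index of (dev, ino), or -1 if it has not been recorded. */
int link_map_find(const struct link_map *m, dev_t dev, ino_t ino)
{
	size_t i;

	for (i = 0; i < m->count; i++) {
		if (m->ent[i].src_dev == dev && m->ent[i].src_ino == ino)
			return (int)i;
	}
	return -1;
}

/* Record a mapping.  Returns 0 on success, -1 if allocation fails. */
int link_map_add(struct link_map *m, dev_t dev, ino_t ino,
		 unsigned int dst_ino)
{
	if (m->count == m->size) {
		/* Keep the old pointer until realloc is known to succeed. */
		void *p = realloc(m->ent,
				  (m->size + LINK_MAP_STEP) * sizeof(*m->ent));
		if (p == NULL)
			return -1;
		m->ent = p;
		m->size += LINK_MAP_STEP;
	}
	m->ent[m->count].src_dev = dev;
	m->ent[m->count].src_ino = ino;
	m->ent[m->count].dst_ino = dst_ino;
	m->count++;
	return 0;
}
```

Because the map lives in a caller-owned structure that is initialised
and freed around each top-level walk, a second walk starts from an
empty table, and two files that share an inode number on different
devices are not mistaken for hard links of each other.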

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 misc/create_inode.c |   63 +++++++++++++++++++++++++++++++++++----------------
 misc/create_inode.h |   10 +++-----
 misc/mke2fs.c       |   12 ----------
 3 files changed, 47 insertions(+), 38 deletions(-)


diff --git a/misc/create_inode.c b/misc/create_inode.c
index 588f3f6..fc4172d 100644
--- a/misc/create_inode.c
+++ b/misc/create_inode.c
@@ -21,9 +21,6 @@
 #define S_BLKSIZE 512
 #endif
 
-/* For saving the hard links */
-int hdlink_cnt = HDLINK_CNT;
-
 /* Link an inode number to a directory */
 static errcode_t add_link(ext2_filsys fs, ext2_ino_t parent_ino,
 			  ext2_ino_t ino, const char *name)
@@ -331,12 +328,13 @@ fail:
 	return retval;
 }
 
-static int is_hardlink(ext2_ino_t ino)
+static int is_hardlink(struct hdlinks_s *hdlinks, dev_t dev, ino_t ino)
 {
 	int i;
 
-	for(i = 0; i < hdlinks.count; i++) {
-		if(hdlinks.hdl[i].src_ino == ino)
+	for (i = 0; i < hdlinks->count; i++) {
+		if (hdlinks->hdl[i].src_dev == dev &&
+		    hdlinks->hdl[i].src_ino == ino)
 			return i;
 	}
 	return -1;
@@ -458,8 +456,9 @@ errcode_t do_write_internal(ext2_filsys fs, ext2_ino_t cwd, const char *src,
 }
 
 /* Copy files from source_dir to fs */
-errcode_t populate_fs(ext2_filsys fs, ext2_ino_t parent_ino,
-		      const char *source_dir, ext2_ino_t root)
+static errcode_t __populate_fs(ext2_filsys fs, ext2_ino_t parent_ino,
+			       const char *source_dir, ext2_ino_t root,
+			       struct hdlinks_s *hdlinks)
 {
 	const char	*name;
 	DIR		*dh;
@@ -496,10 +495,10 @@ errcode_t populate_fs(ext2_filsys fs, ext2_ino_t parent_ino,
 		save_inode = 0;
 		if (!S_ISDIR(st.st_mode) && !S_ISLNK(st.st_mode) &&
 		    st.st_nlink > 1) {
-			hdlink = is_hardlink(st.st_ino);
+			hdlink = is_hardlink(hdlinks, st.st_dev, st.st_ino);
 			if (hdlink >= 0) {
 				retval = add_link(fs, parent_ino,
-						  hdlinks.hdl[hdlink].dst_ino,
+						  hdlinks->hdl[hdlink].dst_ino,
 						  name);
 				if (retval) {
 					com_err(__func__, retval,
@@ -571,7 +570,7 @@ errcode_t populate_fs(ext2_filsys fs, ext2_ino_t parent_ino,
 					return retval;
 			}
 			/* Populate the dir recursively*/
-			retval = populate_fs(fs, ino, name, root);
+			retval = __populate_fs(fs, ino, name, root, hdlinks);
 			if (retval) {
 				com_err(__func__, retval,
 					_("while adding dir \"%s\""), name);
@@ -608,20 +607,44 @@ errcode_t populate_fs(ext2_filsys fs, ext2_ino_t parent_ino,
 			 * free() since the lifespan will be over after the fs
 			 * populated.
 			 */
-			if (hdlinks.count == hdlink_cnt) {
-				if ((hdlinks.hdl = realloc (hdlinks.hdl,
-						(hdlink_cnt + HDLINK_CNT) *
-						sizeof (struct hdlink_s))) == NULL) {
-					com_err(name, errno, "Not enough memory");
+			if (hdlinks->count == hdlinks->size) {
+				void *p = realloc(hdlinks->hdl,
+						(hdlinks->size + HDLINK_CNT) *
+						sizeof(struct hdlink_s));
+				if (p == NULL) {
+					com_err(name, errno,
+						_("Not enough memory"));
 					return errno;
 				}
-				hdlink_cnt += HDLINK_CNT;
+				hdlinks->hdl = p;
+				hdlinks->size += HDLINK_CNT;
 			}
-			hdlinks.hdl[hdlinks.count].src_ino = st.st_ino;
-			hdlinks.hdl[hdlinks.count].dst_ino = ino;
-			hdlinks.count++;
+			hdlinks->hdl[hdlinks->count].src_dev = st.st_dev;
+			hdlinks->hdl[hdlinks->count].src_ino = st.st_ino;
+			hdlinks->hdl[hdlinks->count].dst_ino = ino;
+			hdlinks->count++;
 		}
 	}
 	closedir(dh);
 	return retval;
 }
+
+errcode_t populate_fs(ext2_filsys fs, ext2_ino_t parent_ino,
+		      const char *source_dir, ext2_ino_t root)
+{
+	struct hdlinks_s hdlinks;
+	errcode_t retval;
+
+	hdlinks.count = 0;
+	hdlinks.size = HDLINK_CNT;
+	hdlinks.hdl = realloc(NULL, hdlinks.size * sizeof(struct hdlink_s));
+	if (hdlinks.hdl == NULL) {
+		com_err(__func__, errno, "Not enough memory");
+		return errno;
+	}
+
+	retval = __populate_fs(fs, parent_ino, source_dir, root, &hdlinks);
+
+	free(hdlinks.hdl);
+	return retval;
+}
diff --git a/misc/create_inode.h b/misc/create_inode.h
index fd96910..067bf96 100644
--- a/misc/create_inode.h
+++ b/misc/create_inode.h
@@ -11,21 +11,19 @@
 
 struct hdlink_s
 {
-	ext2_ino_t src_ino;
+	dev_t src_dev;
+	ino_t src_ino;
 	ext2_ino_t dst_ino;
 };
 
 struct hdlinks_s
 {
 	int count;
+	int size;
 	struct hdlink_s *hdl;
 };
 
-struct hdlinks_s hdlinks;
-
-/* For saving the hard links */
-#define HDLINK_CNT     4
-extern int hdlink_cnt;
+#define HDLINK_CNT	(4)
 
 /* For populating the filesystem */
 extern errcode_t populate_fs(ext2_filsys fs, ext2_ino_t parent_ino,
diff --git a/misc/mke2fs.c b/misc/mke2fs.c
index 1422336..aecd5d5 100644
--- a/misc/mke2fs.c
+++ b/misc/mke2fs.c
@@ -2988,18 +2988,6 @@ no_journal:
 		if (!quiet)
 			printf("%s", _("Copying files into the device: "));
 
-		/*
-		 * Allocate memory for the hardlinks, we don't need free()
-		 * since the lifespan will be over after the fs populated.
-		 */
-		if ((hdlinks.hdl = (struct hdlink_s *)
-				malloc(hdlink_cnt * sizeof(struct hdlink_s))) == NULL) {
-			fprintf(stderr, "%s", _("\nNot enough memory\n"));
-			retval = ext2fs_close(fs);
-			return retval;
-		}


* [PATCH 06/49] libext2fs: support modifying arbitrary extended attributes (v5)
  2014-03-11  6:53 [PATCH 00/49] e2fsprogs patchbomb 3/14 Darrick J. Wong
                   ` (4 preceding siblings ...)
  2014-03-11  6:54 ` [PATCH 05/49] create_inode: handle hard link inum mappings per populate_fs invocation Darrick J. Wong
@ 2014-03-11  6:54 ` Darrick J. Wong
  2014-03-12  3:51   ` Theodore Ts'o
  2014-03-11  6:54 ` [PATCH 07/49] debugfs: create commands to edit extended attributes Darrick J. Wong
                   ` (40 subsequent siblings)
  46 siblings, 1 reply; 88+ messages in thread
From: Darrick J. Wong @ 2014-03-11  6:54 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

v5: Add magic number checking to the extended attribute editing
handle; move inline data to the head of the attribute list when
writing so that inline data ends up in the inode area; and always zero
the attribute space before writing to ensure that we can delete the
last xattr.
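
For readers unfamiliar with the magic number pattern, the following is
a minimal standalone sketch of the idea: stamp the handle when it is
set up, and have every entry point check the stamp before touching any
other field.  The names (demo_handle, DEMO_MAGIC_HANDLE,
DEMO_CHECK_MAGIC, demo_handle_*) and the magic value are made up for
illustration, and the NULL test is an addition of this sketch; none of
this is the libext2fs definition.

```c
#include <stddef.h>

/* Hypothetical magic value; real code would use an error-table code. */
#define DEMO_MAGIC_HANDLE 0x45413031L

struct demo_handle {
	long magic;	/* must equal DEMO_MAGIC_HANDLE for a valid handle */
	size_t count;	/* number of attributes held by the handle */
};

/* Bail out of the calling function if the handle looks bogus. */
#define DEMO_CHECK_MAGIC(h, code) \
	do { \
		if ((h) == NULL || (h)->magic != (code)) \
			return (code); \
	} while (0)

/* Stamp the handle so later calls can recognise it. */
void demo_handle_init(struct demo_handle *h)
{
	h->magic = DEMO_MAGIC_HANDLE;
	h->count = 0;
}

/*
 * Return the count through an out parameter so that the return value
 * is free to carry an error code, as the reworked count call does.
 */
long demo_handle_count(struct demo_handle *h, size_t *count)
{
	DEMO_CHECK_MAGIC(h, DEMO_MAGIC_HANDLE);
	*count = h->count;
	return 0;
}
```

Returning the count through a pointer is what lets a caller tell "zero
attributes" apart from "this is not a valid handle".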

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 debugfs/debugfs.c         |    4 +++-
 lib/ext2fs/ext2_err.et.in |    3 +++
 lib/ext2fs/ext2fs.h       |    2 +-
 lib/ext2fs/ext_attr.c     |   39 ++++++++++++++++++++++++++++++++++++---
 4 files changed, 43 insertions(+), 5 deletions(-)


diff --git a/debugfs/debugfs.c b/debugfs/debugfs.c
index 9b38c08..d50bb42 100644
--- a/debugfs/debugfs.c
+++ b/debugfs/debugfs.c
@@ -549,6 +549,7 @@ static int dump_attr(char *name, char *value, size_t value_len, void *data)
 static void dump_inode_attributes(FILE *out, ext2_ino_t ino)
 {
 	struct ext2_xattr_handle *h;
+	size_t sz;
 	errcode_t err;
 
 	err = ext2fs_xattrs_open(current_fs, ino, &h);
@@ -559,7 +560,8 @@ static void dump_inode_attributes(FILE *out, ext2_ino_t ino)
 	if (err)
 		goto out;
 
-	if (ext2fs_xattrs_count(h) == 0)
+	err = ext2fs_xattrs_count(h, &sz);
+	if (err || sz == 0)
 		goto out;
 
 	fprintf(out, "Extended attributes:\n");
diff --git a/lib/ext2fs/ext2_err.et.in b/lib/ext2fs/ext2_err.et.in
index 007103d..51c88d0 100644
--- a/lib/ext2fs/ext2_err.et.in
+++ b/lib/ext2fs/ext2_err.et.in
@@ -512,4 +512,7 @@ ec	EXT2_ET_INLINE_DATA_NO_BLOCK,
 ec	EXT2_ET_INLINE_DATA_NO_SPACE,
 	"No free space in inline data"
 
+ec	EXT2_ET_MAGIC_EA_HANDLE,
+	"Wrong magic number for extended attribute structure"
+
 	end
diff --git a/lib/ext2fs/ext2fs.h b/lib/ext2fs/ext2fs.h
index a7b6116..3756e8b 100644
--- a/lib/ext2fs/ext2fs.h
+++ b/lib/ext2fs/ext2fs.h
@@ -1183,7 +1183,7 @@ errcode_t ext2fs_xattrs_open(ext2_filsys fs, ext2_ino_t ino,
 errcode_t ext2fs_xattrs_close(struct ext2_xattr_handle **handle);
 errcode_t ext2fs_free_ext_attr(ext2_filsys fs, ext2_ino_t ino,
 			       struct ext2_inode_large *inode);
-size_t ext2fs_xattrs_count(struct ext2_xattr_handle *handle);
+errcode_t ext2fs_xattrs_count(struct ext2_xattr_handle *handle, size_t *count);
 errcode_t ext2fs_xattr_inode_max_size(ext2_filsys fs, ext2_ino_t ino,
 				      size_t *size);
 
diff --git a/lib/ext2fs/ext_attr.c b/lib/ext2fs/ext_attr.c
index e8dee53..308d21d 100644
--- a/lib/ext2fs/ext_attr.c
+++ b/lib/ext2fs/ext_attr.c
@@ -195,6 +195,7 @@ struct ext2_xattr {
 };
 
 struct ext2_xattr_handle {
+	errcode_t magic;
 	ext2_filsys fs;
 	struct ext2_xattr *attrs;
 	size_t length, count;
@@ -238,6 +239,24 @@ static struct ea_name_index ea_names[] = {
 	{0, NULL},
 };
 
+static void move_inline_data_to_front(struct ext2_xattr_handle *h)
+{
+	struct ext2_xattr *x;
+	struct ext2_xattr tmp;
+
+	for (x = h->attrs + 1; x < h->attrs + h->length; x++) {
+		if (!x->name)
+			continue;
+
+		if (strcmp(x->name, "system.data") == 0) {
+			memcpy(&tmp, x, sizeof(tmp));
+			memcpy(x, h->attrs, sizeof(tmp));
+			memcpy(h->attrs, &tmp, sizeof(tmp));
+			return;
+		}
+	}
+}
+
 static const char *find_ea_prefix(int index)
 {
 	struct ea_name_index *e;
@@ -412,6 +431,7 @@ static errcode_t write_xattrs_to_buffer(struct ext2_xattr_handle *handle,
 	unsigned int entry_size, value_size;
 	int idx, ret;
 
+	memset(entries_start, 0, storage_size);
 	/* For all remaining x...  */
 	for (; x < handle->attrs + handle->length; x++) {
 		if (!x->name)
@@ -471,6 +491,7 @@ errcode_t ext2fs_xattrs_write(struct ext2_xattr_handle *handle)
 	unsigned int i;
 	errcode_t err;
 
+	EXT2_CHECK_MAGIC(handle, EXT2_ET_MAGIC_EA_HANDLE);
 	i = EXT2_INODE_SIZE(handle->fs->super);
 	if (i < sizeof(*inode))
 		i = sizeof(*inode);
@@ -484,6 +505,8 @@ errcode_t ext2fs_xattrs_write(struct ext2_xattr_handle *handle)
 	if (err)
 		goto out;
 
+	move_inline_data_to_front(handle);
+
 	x = handle->attrs;
 	/* Does the inode have size for EA? */
 	if (EXT2_INODE_SIZE(handle->fs->super) <= EXT2_GOOD_OLD_INODE_SIZE +
@@ -511,7 +534,7 @@ errcode_t ext2fs_xattrs_write(struct ext2_xattr_handle *handle)
 
 write_ea_block:
 	/* Write the EA block */
-	err = ext2fs_get_memzero(handle->fs->blocksize, &block_buf);
+	err = ext2fs_get_mem(handle->fs->blocksize, &block_buf);
 	if (err)
 		goto out;
 
@@ -590,6 +613,7 @@ static errcode_t read_xattrs_from_buffer(struct ext2_xattr_handle *handle,
 		x++;
 
 	entry = entries;
+	remain = storage_size;
 	while (!EXT2_EXT_IS_LAST_ENTRY(entry)) {
 		__u32 hash;
 
@@ -682,6 +706,7 @@ errcode_t ext2fs_xattrs_read(struct ext2_xattr_handle *handle)
 	int i;
 	errcode_t err;
 
+	EXT2_CHECK_MAGIC(handle, EXT2_ET_MAGIC_EA_HANDLE);
 	i = EXT2_INODE_SIZE(handle->fs->super);
 	if (i < sizeof(*inode))
 		i = sizeof(*inode);
@@ -781,6 +806,7 @@ errcode_t ext2fs_xattrs_iterate(struct ext2_xattr_handle *h,
 	errcode_t err;
 	int ret;
 
+	EXT2_CHECK_MAGIC(h, EXT2_ET_MAGIC_EA_HANDLE);
 	for (x = h->attrs; x < h->attrs + h->length; x++) {
 		if (!x->name)
 			continue;
@@ -802,6 +828,7 @@ errcode_t ext2fs_xattr_get(struct ext2_xattr_handle *h, const char *key,
 	void *val;
 	errcode_t err;
 
+	EXT2_CHECK_MAGIC(h, EXT2_ET_MAGIC_EA_HANDLE);
 	for (x = h->attrs; x < h->attrs + h->length; x++) {
 		if (!x->name)
 			continue;
@@ -893,6 +920,7 @@ errcode_t ext2fs_xattr_set(struct ext2_xattr_handle *handle,
 	char *new_value;
 	errcode_t err;
 
+	EXT2_CHECK_MAGIC(handle, EXT2_ET_MAGIC_EA_HANDLE);
 	last_empty = NULL;
 	for (x = handle->attrs; x < handle->attrs + handle->length; x++) {
 		if (!x->name) {
@@ -958,6 +986,7 @@ errcode_t ext2fs_xattr_remove(struct ext2_xattr_handle *handle,
 	struct ext2_xattr *x;
 	errcode_t err;
 
+	EXT2_CHECK_MAGIC(handle, EXT2_ET_MAGIC_EA_HANDLE);
 	for (x = handle->attrs; x < handle->attrs + handle->length; x++) {
 		if (!x->name)
 			continue;
@@ -991,6 +1020,7 @@ errcode_t ext2fs_xattrs_open(ext2_filsys fs, ext2_ino_t ino,
 	if (err)
 		return err;
 
+	h->magic = EXT2_ET_MAGIC_EA_HANDLE;
 	h->length = 4;
 	err = ext2fs_get_arrayzero(h->length, sizeof(struct ext2_xattr),
 				   &h->attrs);
@@ -1010,6 +1040,7 @@ errcode_t ext2fs_xattrs_close(struct ext2_xattr_handle **handle)
 	struct ext2_xattr_handle *h = *handle;
 	errcode_t err;
 
+	EXT2_CHECK_MAGIC(h, EXT2_ET_MAGIC_EA_HANDLE);
 	if (h->dirty) {
 		err = ext2fs_xattrs_write(h);
 		if (err)
@@ -1022,7 +1053,9 @@ errcode_t ext2fs_xattrs_close(struct ext2_xattr_handle **handle)
 	return 0;
 }
 
-size_t ext2fs_xattrs_count(struct ext2_xattr_handle *handle)
+errcode_t ext2fs_xattrs_count(struct ext2_xattr_handle *handle, size_t *count)
 {
-	return handle->count;
+	EXT2_CHECK_MAGIC(handle, EXT2_ET_MAGIC_EA_HANDLE);
+	*count = handle->count;
+	return 0;
 }



* [PATCH 07/49] debugfs: create commands to edit extended attributes
  2014-03-11  6:53 [PATCH 00/49] e2fsprogs patchbomb 3/14 Darrick J. Wong
                   ` (5 preceding siblings ...)
  2014-03-11  6:54 ` [PATCH 06/49] libext2fs: support modifying arbitrary extended attributes (v5) Darrick J. Wong
@ 2014-03-11  6:54 ` Darrick J. Wong
  2014-03-12  3:51   ` Theodore Ts'o
  2014-03-11  6:54 ` [PATCH 08/49] e2fsck: don't rehash inline directories Darrick J. Wong
                   ` (39 subsequent siblings)
  46 siblings, 1 reply; 88+ messages in thread
From: Darrick J. Wong @ 2014-03-11  6:54 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

Enhance debugfs to be able to display and modify extended attributes, and
create some simple tests for the extended attribute editing functions.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 debugfs/Makefile.in        |   14 ++
 debugfs/debug_cmds.ct      |   12 ++
 debugfs/debugfs.c          |   62 ---------
 debugfs/debugfs.h          |    3 
 debugfs/xattrs.c           |  297 ++++++++++++++++++++++++++++++++++++++++++++
 tests/d_xattr_edits/expect |   51 ++++++++
 tests/d_xattr_edits/name   |    1 
 tests/d_xattr_edits/script |  135 ++++++++++++++++++++
 8 files changed, 510 insertions(+), 65 deletions(-)
 create mode 100644 debugfs/xattrs.c
 create mode 100644 tests/d_xattr_edits/expect
 create mode 100644 tests/d_xattr_edits/name
 create mode 100644 tests/d_xattr_edits/script


diff --git a/debugfs/Makefile.in b/debugfs/Makefile.in
index e0c5597..16d6aa7 100644
--- a/debugfs/Makefile.in
+++ b/debugfs/Makefile.in
@@ -18,18 +18,18 @@ MK_CMDS=	_SS_DIR_OVERRIDE=../lib/ss ../lib/ss/mk_cmds
 
 DEBUG_OBJS= debug_cmds.o debugfs.o util.o ncheck.o icheck.o ls.o \
 	lsdel.o dump.o set_fields.o logdump.o htree.o unused.o e2freefrag.o \
-	filefrag.o extent_cmds.o extent_inode.o zap.o create_inode.o
+	filefrag.o extent_cmds.o extent_inode.o zap.o create_inode.o xattrs.o
 
 RO_DEBUG_OBJS= ro_debug_cmds.o ro_debugfs.o util.o ncheck.o icheck.o ls.o \
 	lsdel.o logdump.o htree.o e2freefrag.o filefrag.o extent_cmds.o \
-	extent_inode.o
+	extent_inode.o xattrs.o
 
 SRCS= debug_cmds.c $(srcdir)/debugfs.c $(srcdir)/util.c $(srcdir)/ls.c \
 	$(srcdir)/ncheck.c $(srcdir)/icheck.c $(srcdir)/lsdel.c \
 	$(srcdir)/dump.c $(srcdir)/set_fields.c ${srcdir}/logdump.c \
 	$(srcdir)/htree.c $(srcdir)/unused.c ${srcdir}/../misc/e2freefrag.c \
 	$(srcdir)/filefrag.c $(srcdir)/extent_inode.c $(srcdir)/zap.c \
-	$(srcdir)/../misc/create_inode.c
+	$(srcdir)/../misc/create_inode.c $(srcdir)/xattrs.c
 
 LIBS= $(LIBEXT2FS) $(LIBE2P) $(LIBSS) $(LIBCOM_ERR) $(LIBBLKID) \
 	$(LIBUUID) $(SYSLIBS)
@@ -285,3 +285,11 @@ create_inode.o: $(srcdir)/../misc/create_inode.c \
  $(top_builddir)/lib/ext2fs/ext2_err.h \
  $(top_srcdir)/lib/ext2fs/ext2_ext_attr.h $(top_srcdir)/lib/ext2fs/bitops.h \
  $(srcdir)/../misc/nls-enable.h
+xattrs.o: $(srcdir)/xattrs.c $(srcdir)/debugfs.h \
+ $(top_srcdir)/lib/ext2fs/ext2_fs.h $(top_builddir)/lib/ext2fs/ext2_types.h \
+ $(top_srcdir)/lib/ext2fs/ext2fs.h $(top_srcdir)/lib/ext2fs/ext3_extents.h \
+ $(top_srcdir)/lib/et/com_err.h $(top_srcdir)/lib/ext2fs/ext2_io.h \
+ $(top_builddir)/lib/ext2fs/ext2_err.h \
+ $(top_srcdir)/lib/ext2fs/ext2_ext_attr.h $(top_srcdir)/lib/ext2fs/bitops.h \
+ $(srcdir)/jfs_user.h $(top_srcdir)/lib/ext2fs/kernel-jbd.h \
+ $(top_srcdir)/lib/ext2fs/jfs_compat.h $(top_srcdir)/lib/ext2fs/kernel-list.h
diff --git a/debugfs/debug_cmds.ct b/debugfs/debug_cmds.ct
index 96ff00f..666032b 100644
--- a/debugfs/debug_cmds.ct
+++ b/debugfs/debug_cmds.ct
@@ -190,5 +190,17 @@ request do_zap_block, "Zap block: fill with 0, pattern, flip bits etc.",
 request do_block_dump, "Dump contents of a block",
 	block_dump, bd;
 
+request do_list_xattr, "List extended attributes of an inode",
+	ea_list;
+
+request do_get_xattr, "Get an extended attribute of an inode",
+	ea_get;
+
+request do_set_xattr, "Set an extended attribute of an inode",
+	ea_set;
+
+request do_rm_xattr, "Remove an extended attribute of an inode",
+	ea_rm;
+
 end;
 
diff --git a/debugfs/debugfs.c b/debugfs/debugfs.c
index d50bb42..a5cd007 100644
--- a/debugfs/debugfs.c
+++ b/debugfs/debugfs.c
@@ -492,27 +492,6 @@ static int list_blocks_proc(ext2_filsys fs EXT2FS_ATTR((unused)),
 	return 0;
 }
 
-static void dump_xattr_string(FILE *out, const char *str, int len)
-{
-	int printable = 0;
-	int i;
-
-	/* check: is string "printable enough?" */
-	for (i = 0; i < len; i++)
-		if (isprint(str[i]))
-			printable++;
-
-	if (printable <= len*7/8)
-		printable = 0;
-
-	for (i = 0; i < len; i++)
-		if (printable)
-			fprintf(out, isprint(str[i]) ? "%c" : "\\%03o",
-				(unsigned char)str[i]);
-		else
-			fprintf(out, "%02x ", (unsigned char)str[i]);
-}
-
 static void internal_dump_inode_extra(FILE *out,
 				      const char *prefix EXT2FS_ATTR((unused)),
 				      ext2_ino_t inode_num EXT2FS_ATTR((unused)),
@@ -532,47 +511,6 @@ static void internal_dump_inode_extra(FILE *out,
 	}
 }
 
-/* Dump extended attributes */
-static int dump_attr(char *name, char *value, size_t value_len, void *data)
-{
-	FILE *out = data;
-
-	fprintf(out, "  ");
-	dump_xattr_string(out, name, strlen(name));
-	fprintf(out, " = \"");
-	dump_xattr_string(out, value, value_len);
-	fprintf(out, "\" (%zu)\n", value_len);
-
-	return 0;
-}
-
-static void dump_inode_attributes(FILE *out, ext2_ino_t ino)
-{
-	struct ext2_xattr_handle *h;
-	size_t sz;
-	errcode_t err;
-
-	err = ext2fs_xattrs_open(current_fs, ino, &h);
-	if (err)
-		return;
-
-	err = ext2fs_xattrs_read(h);
-	if (err)
-		goto out;
-
-	err = ext2fs_xattrs_count(h, &sz);
-	if (err || sz == 0)
-		goto out;
-
-	fprintf(out, "Extended attributes:\n");
-	err = ext2fs_xattrs_iterate(h, dump_attr, out);
-	if (err)
-		goto out;
-
-out:
-	err = ext2fs_xattrs_close(&h);
-}
-
 static void dump_blocks(FILE *f, const char *prefix, ext2_ino_t inode)
 {
 	struct list_blocks_struct lb;
diff --git a/debugfs/debugfs.h b/debugfs/debugfs.h
index 5e3b256..3c27f82 100644
--- a/debugfs/debugfs.h
+++ b/debugfs/debugfs.h
@@ -175,6 +175,9 @@ extern void do_filefrag(int argc, char *argv[]);
 /* util.c */
 extern time_t string_to_time(const char *arg);
 
+/* xattrs.c */
+void dump_inode_attributes(FILE *out, ext2_ino_t ino);
+
 /* zap.c */
 extern void do_zap_block(int argc, char **argv);
 extern void do_block_dump(int argc, char **argv);
diff --git a/debugfs/xattrs.c b/debugfs/xattrs.c
new file mode 100644
index 0000000..0a29521
--- /dev/null
+++ b/debugfs/xattrs.c
@@ -0,0 +1,297 @@
+/*
+ * xattrs.c --- Modify extended attributes via debugfs.
+ *
+ * Copyright (C) 2014 Oracle.  This file may be redistributed
+ * under the terms of the GNU Public License.
+ */
+
+#include "config.h"
+#include <stdio.h>
+#ifdef HAVE_GETOPT_H
+#include <getopt.h>
+#else
+extern int optind;
+extern char *optarg;
+#endif
+#include <ctype.h>
+
+#include "debugfs.h"
+
+/* Dump extended attributes */
+static void dump_xattr_string(FILE *out, const char *str, int len)
+{
+	int printable = 0;
+	int i;
+
+	/* check: is string "printable enough?" */
+	for (i = 0; i < len; i++)
+		if (isprint(str[i]))
+			printable++;
+
+	if (printable <= len*7/8)
+		printable = 0;
+
+	for (i = 0; i < len; i++)
+		if (printable)
+			fprintf(out, isprint(str[i]) ? "%c" : "\\%03o",
+				(unsigned char)str[i]);
+		else
+			fprintf(out, "%02x ", (unsigned char)str[i]);
+}
+
+static int dump_attr(char *name, char *value, size_t value_len, void *data)
+{
+	FILE *out = data;
+
+	fprintf(out, "  ");
+	dump_xattr_string(out, name, strlen(name));
+	fprintf(out, " = \"");
+	dump_xattr_string(out, value, value_len);
+	fprintf(out, "\" (%zu)\n", value_len);
+
+	return 0;
+}
+
+void dump_inode_attributes(FILE *out, ext2_ino_t ino)
+{
+	struct ext2_xattr_handle *h;
+	size_t sz;
+	errcode_t err;
+
+	err = ext2fs_xattrs_open(current_fs, ino, &h);
+	if (err)
+		return;
+
+	err = ext2fs_xattrs_read(h);
+	if (err)
+		goto out;
+
+	err = ext2fs_xattrs_count(h, &sz);
+	if (err || sz == 0)
+		goto out;
+
+	fprintf(out, "Extended attributes:\n");
+	err = ext2fs_xattrs_iterate(h, dump_attr, out);
+	if (err)
+		goto out;
+
+out:
+	err = ext2fs_xattrs_close(&h);
+}
+
+void do_list_xattr(int argc, char **argv)
+{
+	ext2_ino_t ino;
+
+	if (argc != 2) {
+		printf("%s: Usage: %s <file>\n", argv[0],
+		       argv[0]);
+		return;
+	}
+
+	if (check_fs_open(argv[0]))
+		return;
+
+	ino = string_to_inode(argv[1]);
+	if (!ino)
+		return;
+
+	dump_inode_attributes(stdout, ino);
+}
+
+void do_get_xattr(int argc, char **argv)
+{
+	ext2_ino_t ino;
+	struct ext2_xattr_handle *h;
+	FILE *fp = NULL;
+	char *buf = NULL;
+	size_t buflen;
+	int i;
+	errcode_t err;
+
+	reset_getopt();
+	while ((i = getopt(argc, argv, "f:")) != -1) {
+		switch (i) {
+		case 'f':
+			fp = fopen(optarg, "w");
+			if (fp == NULL) {
+				perror(optarg);
+				return;
+			}
+			break;
+		default:
+			printf("%s: Usage: %s <file> <attr> [-f outfile]\n",
+			       argv[0], argv[0]);
+			return;
+		}
+	}
+
+	if (optind != argc - 2) {
+		printf("%s: Usage: %s <file> <attr> [-f outfile]\n", argv[0],
+		       argv[0]);
+		return;
+	}
+
+	if (check_fs_open(argv[0]))
+		return;
+
+	ino = string_to_inode(argv[optind]);
+	if (!ino)
+		return;
+
+	err = ext2fs_xattrs_open(current_fs, ino, &h);
+	if (err)
+		return;
+
+	err = ext2fs_xattrs_read(h);
+	if (err)
+		goto out;
+
+	err = ext2fs_xattr_get(h, argv[optind + 1], (void **)&buf, &buflen);
+	if (err)
+		goto out;
+
+	if (fp) {
+		fwrite(buf, buflen, 1, fp);
+		fclose(fp);
+	} else {
+		dump_xattr_string(stdout, buf, buflen);
+		printf("\n");
+	}
+
+	if (buf)
+		ext2fs_free_mem(&buf);
+out:
+	ext2fs_xattrs_close(&h);
+	if (err)
+		com_err(argv[0], err, "while getting extended attribute");
+}
+
+void do_set_xattr(int argc, char **argv)
+{
+	ext2_ino_t ino;
+	struct ext2_xattr_handle *h;
+	FILE *fp = NULL;
+	char *buf = NULL;
+	size_t buflen;
+	int i;
+	errcode_t err;
+
+	reset_getopt();
+	while ((i = getopt(argc, argv, "f:")) != -1) {
+		switch (i) {
+		case 'f':
+			fp = fopen(optarg, "r");
+			if (fp == NULL) {
+				perror(optarg);
+				return;
+			}
+			break;
+		default:
+			printf("%s: Usage: %s <file> <attr> [-f infile | "
+			       "value]\n", argv[0], argv[0]);
+			return;
+		}
+	}
+
+	if ((fp && optind != argc - 2) || (!fp && optind != argc - 3)) {
+		printf("%s: Usage: %s <file> <attr> [-f infile | value]\n",
+		       argv[0], argv[0]);
+		return;
+	}
+
+	if (check_fs_open(argv[0]))
+		return;
+	if (check_fs_read_write(argv[0]))
+		return;
+	if (check_fs_bitmaps(argv[0]))
+		return;
+
+	ino = string_to_inode(argv[optind]);
+	if (!ino)
+		return;
+
+	err = ext2fs_xattrs_open(current_fs, ino, &h);
+	if (err)
+		return;
+
+	err = ext2fs_xattrs_read(h);
+	if (err)
+		goto out;
+
+	if (fp) {
+		err = ext2fs_get_mem(current_fs->blocksize, &buf);
+		if (err)
+			goto out;
+		buflen = fread(buf, 1, current_fs->blocksize, fp);
+	} else {
+		buf = argv[optind + 2];
+		buflen = strlen(argv[optind + 2]);
+	}
+
+	err = ext2fs_xattr_set(h, argv[optind + 1], buf, buflen);
+	if (err)
+		goto out;
+
+	err = ext2fs_xattrs_write(h);
+	if (err)
+		goto out;
+
+out:
+	if (fp) {
+		fclose(fp);
+		ext2fs_free_mem(&buf);
+	}
+	ext2fs_xattrs_close(&h);
+	if (err)
+		com_err(argv[0], err, "while setting extended attribute");
+}
+
+void do_rm_xattr(int argc, char **argv)
+{
+	ext2_ino_t ino;
+	struct ext2_xattr_handle *h;
+	int i;
+	errcode_t err;
+
+	if (argc < 3) {
+		printf("%s: Usage: %s <file> <attrs>...\n", argv[0], argv[0]);
+		return;
+	}
+
+	if (check_fs_open(argv[0]))
+		return;
+	if (check_fs_read_write(argv[0]))
+		return;
+	if (check_fs_bitmaps(argv[0]))
+		return;
+
+	ino = string_to_inode(argv[1]);
+	if (!ino)
+		return;
+
+	err = ext2fs_xattrs_open(current_fs, ino, &h);
+	if (err)
+		return;
+
+	err = ext2fs_xattrs_read(h);
+	if (err)
+		goto out;
+
+	for (i = 2; i < argc; i++) {
+		size_t buflen;
+		char *buf;
+
+		err = ext2fs_xattr_remove(h, argv[i]);
+		if (err)
+			goto out;
+	}
+
+	err = ext2fs_xattrs_write(h);
+	if (err)
+		goto out;
+out:
+	ext2fs_xattrs_close(&h);
+	if (err)
+		com_err(argv[0], err, "while removing extended attribute");
+}
diff --git a/tests/d_xattr_edits/expect b/tests/d_xattr_edits/expect
new file mode 100644
index 0000000..10e30c1
--- /dev/null
+++ b/tests/d_xattr_edits/expect
@@ -0,0 +1,51 @@
+debugfs edit extended attributes
+mke2fs -Fq -b 1024 test.img 512
+Exit status is 0
+ea_set / user.joe smith
+Exit status is 0
+ea_set / user.moo FEE_FIE_FOE_FUMMMMMM
+Exit status is 0
+ea_list /
+Extended attributes:
+  user.joe = "smith" (5)
+  user.moo = "FEE_FIE_FOE_FUMMMMMM" (20)
+Exit status is 0
+ea_get / user.moo
+FEE_FIE_FOE_FUMMMMMM
+Exit status is 0
+ea_get / nosuchea
+ea_get: Extended attribute key not found while getting extended attribute
+Exit status is 0
+ea_rm / user.moo
+Exit status is 0
+ea_rm / nosuchea
+ea_rm: Extended attribute key not found while removing extended attribute
+Exit status is 0
+ea_list /
+Extended attributes:
+  user.joe = "smith" (5)
+Exit status is 0
+ea_get / user.moo
+ea_get: Extended attribute key not found while getting extended attribute
+Exit status is 0
+ea_rm / user.joe
+Exit status is 0
+ea_list /
+Exit status is 0
+ea_set / user.file_based_xattr -f d_xattr_edits.tmp
+Exit status is 0
+ea_list /
+Extended attributes:
+  user.file_based_xattr = "12345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567\012" (108)
+Exit status is 0
+ea_get / user.file_based_xattr -f d_xattr_edits.ver.tmp
+Exit status is 0
+Compare big attribute
+e2fsck -yf -N test_filesys
+Pass 1: Checking inodes, blocks, and sizes
+Pass 2: Checking directory structure
+Pass 3: Checking directory connectivity
+Pass 4: Checking reference counts
+Pass 5: Checking group summary information
+test_filesys: 11/64 files (0.0% non-contiguous), 29/512 blocks
+Exit status is 0
diff --git a/tests/d_xattr_edits/name b/tests/d_xattr_edits/name
new file mode 100644
index 0000000..c0c428c
--- /dev/null
+++ b/tests/d_xattr_edits/name
@@ -0,0 +1 @@
+edit extended attributes in debugfs
diff --git a/tests/d_xattr_edits/script b/tests/d_xattr_edits/script
new file mode 100644
index 0000000..1e33716
--- /dev/null
+++ b/tests/d_xattr_edits/script
@@ -0,0 +1,135 @@
+if test -x $DEBUGFS_EXE; then
+
+OUT=$test_name.log
+EXP=$test_dir/expect
+VERIFY_FSCK_OPT=-yf
+
+TEST_DATA=$test_name.tmp
+VERIFY_DATA=$test_name.ver.tmp
+
+echo "debugfs edit extended attributes" > $OUT
+
+dd if=/dev/zero of=$TMPFILE bs=1k count=512 > /dev/null 2>&1
+
+echo "mke2fs -Fq -b 1024 test.img 512" >> $OUT
+
+$MKE2FS -Fq -b 1024 $TMPFILE 512 > /dev/null 2>&1
+status=$?
+echo Exit status is $status >> $OUT
+
+echo "ea_set / user.joe smith" > $OUT.new
+$DEBUGFS -w $TMPFILE -R "ea_set / user.joe smith" >> $OUT.new 2>&1
+status=$?
+echo Exit status is $status >> $OUT.new
+sed -f $cmd_dir/filter.sed $OUT.new >> $OUT
+
+echo "ea_set / user.moo FEE_FIE_FOE_FUMMMMMM" > $OUT.new
+$DEBUGFS -w $TMPFILE -R "ea_set / user.moo FEE_FIE_FOE_FUMMMMMM" >> $OUT.new 2>&1
+status=$?
+echo Exit status is $status >> $OUT.new
+sed -f $cmd_dir/filter.sed $OUT.new >> $OUT
+
+echo "ea_list /" > $OUT.new
+$DEBUGFS -w $TMPFILE -R "ea_list /" >> $OUT.new 2>&1
+status=$?
+echo Exit status is $status >> $OUT.new
+sed -f $cmd_dir/filter.sed $OUT.new >> $OUT
+
+echo "ea_get / user.moo" > $OUT.new
+$DEBUGFS -w $TMPFILE -R "ea_get / user.moo" >> $OUT.new 2>&1
+status=$?
+echo Exit status is $status >> $OUT.new
+sed -f $cmd_dir/filter.sed $OUT.new >> $OUT
+
+echo "ea_get / nosuchea" > $OUT.new
+$DEBUGFS -w $TMPFILE -R "ea_get / nosuchea" >> $OUT.new 2>&1
+status=$?
+echo Exit status is $status >> $OUT.new
+sed -f $cmd_dir/filter.sed $OUT.new >> $OUT
+
+echo "ea_rm / user.moo" > $OUT.new
+$DEBUGFS -w $TMPFILE -R "ea_rm / user.moo" >> $OUT.new 2>&1
+status=$?
+echo Exit status is $status >> $OUT.new
+sed -f $cmd_dir/filter.sed $OUT.new >> $OUT
+
+echo "ea_rm / nosuchea" > $OUT.new
+$DEBUGFS -w $TMPFILE -R "ea_rm / nosuchea" >> $OUT.new 2>&1
+status=$?
+echo Exit status is $status >> $OUT.new
+sed -f $cmd_dir/filter.sed $OUT.new >> $OUT
+
+echo "ea_list /" > $OUT.new
+$DEBUGFS -w $TMPFILE -R "ea_list /" >> $OUT.new 2>&1
+status=$?
+echo Exit status is $status >> $OUT.new
+sed -f $cmd_dir/filter.sed $OUT.new >> $OUT
+
+echo "ea_get / user.moo" > $OUT.new
+$DEBUGFS -w $TMPFILE -R "ea_get / user.moo" >> $OUT.new 2>&1
+status=$?
+echo Exit status is $status >> $OUT.new
+sed -f $cmd_dir/filter.sed $OUT.new >> $OUT
+
+echo "ea_rm / user.joe" > $OUT.new
+$DEBUGFS -w $TMPFILE -R "ea_rm / user.joe" >> $OUT.new 2>&1
+status=$?
+echo Exit status is $status >> $OUT.new
+sed -f $cmd_dir/filter.sed $OUT.new >> $OUT
+
+echo "ea_list /" > $OUT.new
+$DEBUGFS -w $TMPFILE -R "ea_list /" >> $OUT.new 2>&1
+status=$?
+echo Exit status is $status >> $OUT.new
+sed -f $cmd_dir/filter.sed $OUT.new >> $OUT
+
+echo "12345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567" > $TEST_DATA
+echo "ea_set / user.file_based_xattr -f $TEST_DATA" > $OUT.new
+$DEBUGFS -w $TMPFILE -R "ea_set / user.file_based_xattr -f $TEST_DATA" >> $OUT.new 2>&1
+status=$?
+echo Exit status is $status >> $OUT.new
+sed -f $cmd_dir/filter.sed $OUT.new >> $OUT
+
+echo "ea_list /" > $OUT.new
+$DEBUGFS -w $TMPFILE -R "ea_list /" >> $OUT.new 2>&1
+status=$?
+echo Exit status is $status >> $OUT.new
+sed -f $cmd_dir/filter.sed $OUT.new >> $OUT
+
+echo "ea_get / user.file_based_xattr -f $VERIFY_DATA" > $OUT.new
+$DEBUGFS -w $TMPFILE -R "ea_get / user.file_based_xattr -f $VERIFY_DATA" >> $OUT.new 2>&1
+status=$?
+echo Exit status is $status >> $OUT.new
+sed -f $cmd_dir/filter.sed $OUT.new >> $OUT
+
+echo "Compare big attribute" > $OUT.new
+diff -u $TEST_DATA $VERIFY_DATA >> $OUT.new
+sed -f $cmd_dir/filter.sed $OUT.new >> $OUT
+
+echo e2fsck $VERIFY_FSCK_OPT -N test_filesys > $OUT.new
+$FSCK $VERIFY_FSCK_OPT -N test_filesys $TMPFILE >> $OUT.new 2>&1
+status=$?
+echo Exit status is $status >> $OUT.new
+sed -f $cmd_dir/filter.sed $OUT.new >> $OUT
+
+#
+# Do the verification
+#
+
+rm -f $TMPFILE $OUT.new
+cmp -s $OUT $EXP
+status=$?
+
+if [ "$status" = 0 ] ; then
+	echo "$test_name: $test_description: ok"
+	touch $test_name.ok
+else
+	echo "$test_name: $test_description: failed"
+	diff $DIFF_OPTS $EXP $OUT > $test_name.failed
+fi
+
+unset VERIFY_FSCK_OPT NATIVE_FSCK_OPT OUT EXP TEST_DATA VERIFY_DATA
+
+else #if test -x $DEBUGFS_EXE; then
+	echo "$test_name: $test_description: skipped"
+fi



* [PATCH 08/49] e2fsck: don't rehash inline directories
  2014-03-11  6:53 [PATCH 00/49] e2fsprogs patchbomb 3/14 Darrick J. Wong
                   ` (6 preceding siblings ...)
  2014-03-11  6:54 ` [PATCH 07/49] debugfs: create commands to edit extended attributes Darrick J. Wong
@ 2014-03-11  6:54 ` Darrick J. Wong
  2014-03-13  3:52   ` Theodore Ts'o
  2014-03-11  6:54 ` [PATCH 09/49] libext2fs: don't fail when doing a strict rewrite of inline data Darrick J. Wong
                   ` (38 subsequent siblings)
  46 siblings, 1 reply; 88+ messages in thread
From: Darrick J. Wong @ 2014-03-11  6:54 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

If a directory's contents are stored entirely inside the inode,
there's no index to rebuild and no dirblock checksum to recompute.
As far as I know these are the only two reasons to call dir rehash.

Therefore, we can move on to the next dir.  Right now we instead try
to iterate the dir blocks (which of course fails because the
inline_data iflag is set) and then flood stdout with useless messages
that aren't even failures.
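
As a rough illustration of the early-exit test described above, here is a
minimal sketch using toy structures rather than the real libext2fs types.
The struct names, the helper name skip_rehash() and the numeric flag values
are assumptions made for the example, not taken from this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative values only; the real constants come from the ext2fs headers. */
#define TOY_FEATURE_INCOMPAT_INLINE_DATA	0x8000
#define TOY_INLINE_DATA_FL			0x10000000

struct toy_super { uint32_t s_feature_incompat; };
struct toy_inode { uint32_t i_flags; };

/*
 * Return 1 if the directory lives entirely inside the inode, in which
 * case there is no index to rebuild and no dirblock checksum to fix.
 * Both the filesystem feature and the per-inode flag must be set.
 */
int skip_rehash(const struct toy_super *sb, const struct toy_inode *inode)
{
	return (sb->s_feature_incompat & TOY_FEATURE_INCOMPAT_INLINE_DATA) &&
	       (inode->i_flags & TOY_INLINE_DATA_FL);
}

void demo_skip_rehash(void)
{
	struct toy_super with_feature = { TOY_FEATURE_INCOMPAT_INLINE_DATA };
	struct toy_super without_feature = { 0 };
	struct toy_inode inline_dir = { TOY_INLINE_DATA_FL };
	struct toy_inode block_dir = { 0 };

	assert(skip_rehash(&with_feature, &inline_dir));
	assert(!skip_rehash(&with_feature, &block_dir));
	/* A stray inode flag without the feature bit is not trusted. */
	assert(!skip_rehash(&without_feature, &inline_dir));
}
```

Calling demo_skip_rehash() from a main() exercises the three cases.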

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 e2fsck/rehash.c |    7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)


diff --git a/e2fsck/rehash.c b/e2fsck/rehash.c
index 8a99453..3b05715 100644
--- a/e2fsck/rehash.c
+++ b/e2fsck/rehash.c
@@ -794,6 +794,11 @@ errcode_t e2fsck_rehash_dir(e2fsck_t ctx, ext2_ino_t ino)
 	outdir.hashes = 0;
 	e2fsck_read_inode(ctx, ino, &inode, "rehash_dir");
 
+	if (EXT2_HAS_INCOMPAT_FEATURE(fs->super,
+				      EXT4_FEATURE_INCOMPAT_INLINE_DATA) &&
+	   (inode.i_flags & EXT4_INLINE_DATA_FL))
+		return 0;
+
 	retval = ENOMEM;
 	fd.harray = 0;
 	dir_buf = malloc(inode.i_size);
@@ -822,8 +827,6 @@ retry_nohash:
 	/* Read in the entire directory into memory */
 	retval = ext2fs_block_iterate3(fs, ino, 0, 0,
 				       fill_dir_block, &fd);
-	if (retval == EXT2_ET_INLINE_DATA_CANT_ITERATE)
-		goto errout;
 	if (fd.err) {
 		retval = fd.err;
 		goto errout;



* [PATCH 09/49] libext2fs: don't fail when doing a strict rewrite of inline data
  2014-03-11  6:53 [PATCH 00/49] e2fsprogs patchbomb 3/14 Darrick J. Wong
                   ` (7 preceding siblings ...)
  2014-03-11  6:54 ` [PATCH 08/49] e2fsck: don't rehash inline directories Darrick J. Wong
@ 2014-03-11  6:54 ` Darrick J. Wong
  2014-03-14 13:19   ` Theodore Ts'o
  2014-03-11  6:55 ` [PATCH 10/49] libext2fs: fix iblocks correctly when expanding an inline_data file Darrick J. Wong
                   ` (37 subsequent siblings)
  46 siblings, 1 reply; 88+ messages in thread
From: Darrick J. Wong @ 2014-03-11  6:54 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

ext2fs_inline_data_set() tries to ensure that there is sufficient free
space in the inode to store the inline data.  Unfortunately, it gets
the check wrong -- ext2fs_xattr_inode_max_size() returns the number of
unused bytes in the EA area, and _data_set() doesn't factor in the
size of the existing inline data.  Therefore, a strict rewrite of
N bytes of inline data with another N bytes of inline data fails.

Fix the code to do the size check correctly.
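
The arithmetic can be sketched in isolation.  The two helpers below mirror
the removed and added comparisons from the diff, but they are standalone toy
functions (the names are made up for this example), and the 60-byte constant
is an assumption standing in for EXT4_MIN_INLINE_DATA_SIZE.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 60 bytes, i.e. the size of i_block[] (15 x 4-byte entries). */
#define TOY_MIN_INLINE_DATA_SIZE 60

/* Old check: compares the overflow past i_block against free EA space only. */
int old_check_fits(size_t size, size_t free_ea_size)
{
	return !(size - TOY_MIN_INLINE_DATA_SIZE > free_ea_size);
}

/* New check: space already held by the existing inline data is reusable. */
int new_check_fits(size_t size, size_t existing_size, size_t free_ea_size)
{
	size_t free_inode_size = 0;

	if (existing_size < TOY_MIN_INLINE_DATA_SIZE)
		free_inode_size = TOY_MIN_INLINE_DATA_SIZE - existing_size;

	return size <= existing_size + free_ea_size + free_inode_size;
}

void demo_size_check(void)
{
	/* 100 bytes are already stored inline and 10 EA bytes remain unused. */
	assert(!old_check_fits(100, 10));	/* 100 - 60 = 40 > 10: rejected */
	assert(new_check_fits(100, 100, 10));	/* same-size rewrite now fits */
	assert(new_check_fits(110, 100, 10));	/* may grow into the free bytes */
	assert(!new_check_fits(111, 100, 10));	/* one byte too many */
}
```

With these numbers the old comparison rejects a same-size rewrite, while the
new one accepts anything up to existing size plus free space.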

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 lib/ext2fs/inline_data.c |   15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)


diff --git a/lib/ext2fs/inline_data.c b/lib/ext2fs/inline_data.c
index 9a786fc..72e8fa3 100644
--- a/lib/ext2fs/inline_data.c
+++ b/lib/ext2fs/inline_data.c
@@ -522,7 +522,7 @@ errcode_t ext2fs_inline_data_set(ext2_filsys fs, ext2_ino_t ino,
 	struct ext2_inode inode_buf;
 	struct ext2_inline_data data;
 	errcode_t retval;
-	size_t max_size;
+	size_t free_ea_size, existing_size, free_inode_size;
 
 	if (!inode) {
 		retval = ext2fs_read_inode(fs, ino, &inode_buf);
@@ -536,11 +536,20 @@ errcode_t ext2fs_inline_data_set(ext2_filsys fs, ext2_ino_t ino,
 		return ext2fs_write_inode(fs, ino, inode);
 	}
 
-	retval = ext2fs_xattr_inode_max_size(fs, ino, &max_size);
+	retval = ext2fs_xattr_inode_max_size(fs, ino, &free_ea_size);
 	if (retval)
 		return retval;
 
-	if (size - EXT4_MIN_INLINE_DATA_SIZE > max_size)
+	retval = ext2fs_inline_data_size(fs, ino, &existing_size);
+	if (retval)
+		return retval;
+
+	if (existing_size < EXT4_MIN_INLINE_DATA_SIZE)
+		free_inode_size = EXT4_MIN_INLINE_DATA_SIZE - existing_size;
+	else
+		free_inode_size = 0;
+
+	if (size > existing_size + free_ea_size + free_inode_size)
 		return EXT2_ET_INLINE_DATA_NO_SPACE;
 
 	memcpy((void *)inode->i_block, buf, EXT4_MIN_INLINE_DATA_SIZE);



* [PATCH 10/49] libext2fs: fix iblocks correctly when expanding an inline_data file
  2014-03-11  6:53 [PATCH 00/49] e2fsprogs patchbomb 3/14 Darrick J. Wong
                   ` (8 preceding siblings ...)
  2014-03-11  6:54 ` [PATCH 09/49] libext2fs: don't fail when doing a strict rewrite of inline data Darrick J. Wong
@ 2014-03-11  6:55 ` Darrick J. Wong
  2014-03-12 16:38   ` Andreas Dilger
  2014-03-11  6:55 ` [PATCH 11/49] e2fsck: zero errcode when checking inline data blocks Darrick J. Wong
                   ` (36 subsequent siblings)
  46 siblings, 1 reply; 88+ messages in thread
From: Darrick J. Wong @ 2014-03-11  6:55 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

i_blocks covers the number of blocks allocated to an inode for data,
extents, and ACL blocks.  Since it's possible for a file to have a
separate ACL block and inline data, we must be careful when expanding
an inline data file to adjust, not set, the value of i_blocks.
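
A toy model of "adjust, not set" follows.  It is only a sketch: the struct
and helpers are hypothetical stand-ins for the inode and for
ext2fs_iblk_set()/ext2fs_iblk_add_blocks(), and i_blocks is counted in whole
filesystem blocks here to keep the numbers simple.

```c
#include <assert.h>
#include <stdint.h>

struct toy_inode {
	uint64_t i_blocks;	/* blocks charged to the inode (data + EA/ACL) */
};

/* Overwrites the count, forgetting anything charged earlier. */
void toy_iblk_set(struct toy_inode *inode, uint64_t n)
{
	inode->i_blocks = n;
}

/* Adjusts the count relative to what is already charged. */
void toy_iblk_add(struct toy_inode *inode, uint64_t n)
{
	inode->i_blocks += n;
}

void demo_iblocks(void)
{
	/* An inline_data dir that already owns one separate ACL block. */
	struct toy_inode set_case = { 1 };
	struct toy_inode add_case = { 1 };

	/* Expanding the directory allocates one data block. */
	toy_iblk_set(&set_case, 1);
	toy_iblk_add(&add_case, 1);

	assert(set_case.i_blocks == 1);	/* ACL block no longer accounted for */
	assert(add_case.i_blocks == 2);	/* ACL block + new data block */
}
```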

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 lib/ext2fs/inline_data.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)


diff --git a/lib/ext2fs/inline_data.c b/lib/ext2fs/inline_data.c
index 72e8fa3..a9ec923 100644
--- a/lib/ext2fs/inline_data.c
+++ b/lib/ext2fs/inline_data.c
@@ -372,7 +372,9 @@ ext2fs_inline_data_dir_expand(ext2_filsys fs, ext2_ino_t ino,
 	if (EXT2_HAS_INCOMPAT_FEATURE(fs->super, EXT3_FEATURE_INCOMPAT_EXTENTS))
 		inode->i_flags |= EXT4_EXTENTS_FL;
 	inode->i_flags &= ~EXT4_INLINE_DATA_FL;
-	ext2fs_iblk_set(fs, inode, 1);
+	retval = ext2fs_iblk_add_blocks(fs, inode, 1);
+	if (retval)
+		goto errout;
 	inode->i_size = fs->blocksize;
 	retval = ext2fs_bmap2(fs, ino, inode, 0, BMAP_SET, 0, 0, &blk);
 	if (retval)
@@ -410,7 +412,6 @@ ext2fs_inline_data_file_expand(ext2_filsys fs, ext2_ino_t ino,
 		inode->i_flags |= EXT4_EXTENTS_FL;
 	}
 	inode->i_flags &= ~EXT4_INLINE_DATA_FL;
-	ext2fs_iblk_set(fs, inode, 0);
 	inode->i_size = 0;
 	retval = ext2fs_write_inode(fs, ino, inode);
 	if (retval)



* [PATCH 11/49] e2fsck: zero errcode when checking inline data blocks
  2014-03-11  6:53 [PATCH 00/49] e2fsprogs patchbomb 3/14 Darrick J. Wong
                   ` (9 preceding siblings ...)
  2014-03-11  6:55 ` [PATCH 10/49] libext2fs: fix iblocks correctly when expanding an inline_data file Darrick J. Wong
@ 2014-03-11  6:55 ` Darrick J. Wong
  2014-03-14 13:26   ` Theodore Ts'o
  2014-03-11  6:55 ` [PATCH 12/49] libext2fs: during inlinedata expand, don't corrupt inode Darrick J. Wong
                   ` (35 subsequent siblings)
  46 siblings, 1 reply; 88+ messages in thread
From: Darrick J. Wong @ 2014-03-11  6:55 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

When checking inline data blocks, always zero pctx->errcode; otherwise
a previous error condition could leak through and "cause" a fatal
block iteration failure.  I found this by corrupting an xattr block on
an inline_data inode; fsck then aborted when I tried to repair it.
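
The failure mode is a stale error code surviving an early return.  A minimal
sketch with a toy problem context (the struct and function names are
hypothetical, and the directory branch is reduced to a stand-in for a
successful ext2fs_add_dir_block2() call):

```c
#include <assert.h>

struct toy_pctx {
	long errcode;	/* shared between checks, like pctx->errcode */
};

/* Before the fix: non-directories return without touching errcode. */
void check_inline_buggy(struct toy_pctx *pctx, int is_dir)
{
	if (!is_dir)
		return;
	pctx->errcode = 0;	/* stand-in for the add-dir-block call succeeding */
}

/* After the fix: every path leaves errcode in a defined state. */
void check_inline_fixed(struct toy_pctx *pctx, int is_dir)
{
	if (!is_dir) {
		pctx->errcode = 0;
		return;
	}
	pctx->errcode = 0;	/* stand-in for the add-dir-block call succeeding */
}

void demo_stale_errcode(void)
{
	struct toy_pctx pctx;

	pctx.errcode = 42;	/* left over from an earlier, already-handled problem */
	check_inline_buggy(&pctx, 0);
	assert(pctx.errcode == 42);	/* caller would treat this as a new failure */

	pctx.errcode = 42;
	check_inline_fixed(&pctx, 0);
	assert(pctx.errcode == 0);
}
```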

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 e2fsck/pass1.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)


diff --git a/e2fsck/pass1.c b/e2fsck/pass1.c
index 11b3dde..641b3fb 100644
--- a/e2fsck/pass1.c
+++ b/e2fsck/pass1.c
@@ -2158,8 +2158,10 @@ static void check_blocks_extents(e2fsck_t ctx, struct problem_context *pctx,
 static void check_blocks_inline_data(e2fsck_t ctx, struct problem_context *pctx,
 				     struct process_block_struct *pb)
 {
-	if (!pb->is_dir)
+	if (!pb->is_dir) {
+		pctx->errcode = 0;
 		return;
+	}
 
 	pctx->errcode = ext2fs_add_dir_block2(ctx->fs->dblist, pb->ino, 0, 0);
 	if (pctx->errcode) {



* [PATCH 12/49] libext2fs: during inlinedata expand, don't corrupt inode
  2014-03-11  6:53 [PATCH 00/49] e2fsprogs patchbomb 3/14 Darrick J. Wong
                   ` (10 preceding siblings ...)
  2014-03-11  6:55 ` [PATCH 11/49] e2fsck: zero errcode when checking inline data blocks Darrick J. Wong
@ 2014-03-11  6:55 ` Darrick J. Wong
  2014-03-14 13:29   ` Theodore Ts'o
  2014-03-11  6:55 ` [PATCH 13/49] libext2fs: repair side effects when iterating dirents in inline dirs Darrick J. Wong
                   ` (34 subsequent siblings)
  46 siblings, 1 reply; 88+ messages in thread
From: Darrick J. Wong @ 2014-03-11  6:55 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

When expanding an inline data inode, it's possible that the reduction
in the size of the EA structures causes the freeing of the EA block,
which changes the inode.  If this happens, the local version of the
inode that ext2fs_inline_data_expand was modifying will be out of sync
with what's on the disk.  This local copy gets written out to disk
after a block allocation, at which point it's possible that the inode
EA block and logical block zero point to the same physical block,
which is bad news.

Therefore, write the local copy to disk before removing the inline
data EA, and reread it afterwards.
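
To see why the ordering matters, here is a small simulation with a global
"on-disk" inode and a stack copy.  Everything in it is a toy: the struct, the
toy_* helpers and the block number are invented for the example, and
toy_ea_remove() only models the case where removing the EA frees the EA block.

```c
#include <assert.h>
#include <stdint.h>

struct toy_inode {
	uint32_t i_file_acl;	/* EA block number, 0 if none */
	uint32_t i_size;
};

struct toy_inode disk_inode;	/* stands in for the on-disk inode */

void toy_write_inode(const struct toy_inode *in) { disk_inode = *in; }
void toy_read_inode(struct toy_inode *out) { *out = disk_inode; }

/* Operates on the on-disk inode and frees the EA block as a side effect. */
void toy_ea_remove(void) { disk_inode.i_file_acl = 0; }

/* Old ordering: the stack copy never learns that the EA block is gone. */
uint32_t expand_without_resync(void)
{
	struct toy_inode local;

	toy_read_inode(&local);
	local.i_size = 0;
	toy_ea_remove();
	toy_write_inode(&local);	/* writes the stale EA pointer back */
	return disk_inode.i_file_acl;
}

/* New ordering: write -> ea_remove -> read keeps both copies in sync. */
uint32_t expand_with_resync(void)
{
	struct toy_inode local;

	toy_read_inode(&local);
	local.i_size = 0;
	toy_write_inode(&local);
	toy_ea_remove();
	toy_read_inode(&local);
	toy_write_inode(&local);
	return disk_inode.i_file_acl;
}

void demo_resync(void)
{
	disk_inode = (struct toy_inode){ .i_file_acl = 1234, .i_size = 72 };
	assert(expand_without_resync() == 1234);	/* freed block still referenced */

	disk_inode = (struct toy_inode){ .i_file_acl = 1234, .i_size = 72 };
	assert(expand_with_resync() == 0);
	assert(disk_inode.i_size == 0);		/* local change was not lost */
}
```

In the stale case the inode still points at a block the allocator considers
free, so a later allocation for logical block zero could hand out that same
block.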

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 lib/ext2fs/inline_data.c |   13 +++++++++++++
 1 file changed, 13 insertions(+)


diff --git a/lib/ext2fs/inline_data.c b/lib/ext2fs/inline_data.c
index a9ec923..f3cd375 100644
--- a/lib/ext2fs/inline_data.c
+++ b/lib/ext2fs/inline_data.c
@@ -460,9 +460,22 @@ errcode_t ext2fs_inline_data_expand(ext2_filsys fs, ext2_ino_t ino)
 	}
 
 	memset((void *)inode.i_block, 0, EXT4_MIN_INLINE_DATA_SIZE);
+	/*
+	 * NOTE: We must do this write -> ea_remove -> read cycle here because
+	 * removing the inline data EA can free the EA block, which is a change
+	 * that our stack copy of the inode will never see.  If that happens,
+	 * we can end up with the EA block and lblk 0 pointing to the same
+	 * pblk, which is bad news.
+	 */
+	retval = ext2fs_write_inode(fs, ino, &inode);
+	if (retval)
+		goto errout;
 	retval = ext2fs_inline_data_ea_remove(fs, ino);
 	if (retval)
 		goto errout;
+	retval = ext2fs_read_inode(fs, ino, &inode);
+	if (retval)
+		goto errout;
 
 	if (LINUX_S_ISDIR(inode.i_mode)) {
 		retval = ext2fs_inline_data_dir_expand(fs, ino, &inode,



* [PATCH 13/49] libext2fs: repair side effects when iterating dirents in inline dirs
  2014-03-11  6:53 [PATCH 00/49] e2fsprogs patchbomb 3/14 Darrick J. Wong
                   ` (11 preceding siblings ...)
  2014-03-11  6:55 ` [PATCH 12/49] libext2fs: during inlinedata expand, don't corrupt inode Darrick J. Wong
@ 2014-03-11  6:55 ` Darrick J. Wong
  2014-03-14 13:30   ` Theodore Ts'o
  2014-03-11  6:55 ` [PATCH 14/49] resize2fs: add inline dirs for remapping Darrick J. Wong
                   ` (33 subsequent siblings)
  46 siblings, 1 reply; 88+ messages in thread
From: Darrick J. Wong @ 2014-03-11  6:55 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

In ext2fs_inline_data_dir_iterate(), we must be very careful to undo
any modifications we make to the dir_context pointer passed in by the
caller, because it's entirely possible that the caller will still want
to do something with the ctx or something inside.

Specifically, ext2fs_dblist_dir_iterate() wants to be able to free
ctx->buf, and it reuses the ctx for multiple dblist entries.  That
means that assigning ctx->buf will cause weird crashes at the end of
dir_iterate().

Since we're being careful with ctx, we might as well handle adding the
INLINE_DATA flag to ctx->flags for ext2fs_process_dir_block, since the
dblist caller forgets to unset the flag before reusing the ctx.

This fixes some crashes and valgrind complaints in resize2fs, and is
necessary for the next patch, which fixes resize2fs not to corrupt
inline_data filesystems.
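
The save-and-restore pattern can be shown on its own.  This is a sketch with
a toy context struct; the function name, the flag value and the return
convention (reporting the flags seen while iterating) are assumptions for the
example and do not reproduce ext2fs_inline_data_dir_iterate().

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_FLAG_INCLUDE_INLINE_DATA 0x10	/* hypothetical value */

struct toy_dir_context {
	char *buf;		/* owned by the caller, freed by the caller */
	unsigned int buflen;
	int flags;
};

/* Borrow the caller's context, then put every touched field back. */
int toy_inline_dir_iterate(struct toy_dir_context *ctx,
			   char *inline_data, unsigned int len)
{
	char *old_buf = ctx->buf;
	unsigned int old_buflen = ctx->buflen;
	int old_flags = ctx->flags;
	int flags_seen;

	ctx->flags |= TOY_FLAG_INCLUDE_INLINE_DATA;
	ctx->buf = inline_data;
	ctx->buflen = len;

	/* ... the dirents in ctx->buf would be walked here ... */
	flags_seen = ctx->flags;

	ctx->buf = old_buf;
	ctx->buflen = old_buflen;
	ctx->flags = old_flags;
	return flags_seen;
}

void demo_restore(void)
{
	char inline_data[60] = { 0 };
	struct toy_dir_context ctx = { malloc(1024), 1024, 0 };
	char *callers_buf = ctx.buf;
	int seen;

	seen = toy_inline_dir_iterate(&ctx, inline_data, sizeof(inline_data));

	assert(seen & TOY_FLAG_INCLUDE_INLINE_DATA);	/* set while iterating */
	assert(ctx.buf == callers_buf);			/* safe for the caller to free */
	assert(ctx.buflen == 1024);
	assert(ctx.flags == 0);				/* flag does not leak into reuse */
	free(ctx.buf);
}
```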

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 lib/ext2fs/dblist_dir.c  |    6 ++----
 lib/ext2fs/dir_iterate.c |    1 -
 lib/ext2fs/inline_data.c |   12 ++++++++++--
 3 files changed, 12 insertions(+), 7 deletions(-)


diff --git a/lib/ext2fs/dblist_dir.c b/lib/ext2fs/dblist_dir.c
index 2fbb772..864a3ca 100644
--- a/lib/ext2fs/dblist_dir.c
+++ b/lib/ext2fs/dblist_dir.c
@@ -76,14 +76,12 @@ static int db_dir_proc(ext2_filsys fs, struct ext2_db_entry2 *db_info,
 	ctx->errcode = ext2fs_read_inode(fs, ctx->dir, &inode);
 	if (ctx->errcode)
 		return DBLIST_ABORT;
-	if (inode.i_flags & EXT4_INLINE_DATA_FL) {
-		ctx->flags = DIRENT_FLAG_INCLUDE_INLINE_DATA;
+	if (inode.i_flags & EXT4_INLINE_DATA_FL)
 		ret = ext2fs_inline_data_dir_iterate(fs, ctx->dir, ctx);
-	} else {
+	else
 		ret = ext2fs_process_dir_block(fs, &db_info->blk,
 					       db_info->blockcnt, 0, 0,
 					       priv_data);
-	}
 	if ((ret & BLOCK_ABORT) && !ctx->errcode)
 		return DBLIST_ABORT;
 	return 0;
diff --git a/lib/ext2fs/dir_iterate.c b/lib/ext2fs/dir_iterate.c
index 8cb6740..67152cc 100644
--- a/lib/ext2fs/dir_iterate.c
+++ b/lib/ext2fs/dir_iterate.c
@@ -128,7 +128,6 @@ errcode_t ext2fs_dir_iterate2(ext2_filsys fs,
 	if (!block_buf)
 		ext2fs_free_mem(&ctx.buf);
 	if (retval == EXT2_ET_INLINE_DATA_CANT_ITERATE) {
-		ctx.flags |= DIRENT_FLAG_INCLUDE_INLINE_DATA;
 		(void) ext2fs_inline_data_dir_iterate(fs, dir, &ctx);
 		retval = 0;
 	}
diff --git a/lib/ext2fs/inline_data.c b/lib/ext2fs/inline_data.c
index f3cd375..7be0f96 100644
--- a/lib/ext2fs/inline_data.c
+++ b/lib/ext2fs/inline_data.c
@@ -120,8 +120,15 @@ int ext2fs_inline_data_dir_iterate(ext2_filsys fs, ext2_ino_t ino,
 	struct ext2_inline_data data;
 	int ret = BLOCK_ABORT;
 	e2_blkcnt_t blockcnt = 0;
+	char *old_buf;
+	unsigned int old_buflen;
+	int old_flags;
 
 	ctx = (struct dir_context *)priv_data;
+	old_buf = ctx->buf;
+	old_buflen = ctx->buflen;
+	old_flags = ctx->flags;
+	ctx->flags |= DIRENT_FLAG_INCLUDE_INLINE_DATA;
 
 	ctx->errcode = ext2fs_read_inode(fs, ino, &inode);
 	if (ctx->errcode)
@@ -235,9 +242,10 @@ int ext2fs_inline_data_dir_iterate(ext2_filsys fs, ext2_ino_t ino,
 
 out1:
 	ext2fs_free_mem(&data.ea_data);
-	ctx->buf = 0;


* [PATCH 14/49] resize2fs: add inline dirs for remapping
  2014-03-11  6:53 [PATCH 00/49] e2fsprogs patchbomb 3/14 Darrick J. Wong
                   ` (12 preceding siblings ...)
  2014-03-11  6:55 ` [PATCH 13/49] libext2fs: repair side effects when iterating dirents in inline dirs Darrick J. Wong
@ 2014-03-11  6:55 ` Darrick J. Wong
  2014-03-14 13:31   ` Theodore Ts'o
  2014-03-11  6:55 ` [PATCH 15/49] all: Introduce cppcheck static checking for make C=1 Darrick J. Wong
                   ` (32 subsequent siblings)
  46 siblings, 1 reply; 88+ messages in thread
From: Darrick J. Wong @ 2014-03-11  6:55 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

When we're looking for directory blocks for the inode remapping step,
we need to include inline_data directories in the remap process.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 resize/resize2fs.c |    7 +++++++
 1 file changed, 7 insertions(+)


diff --git a/resize/resize2fs.c b/resize/resize2fs.c
index 7122b2f..f5f1337 100644
--- a/resize/resize2fs.c
+++ b/resize/resize2fs.c
@@ -1712,6 +1712,13 @@ remap_blocks:
 				retval = pb.error;
 				goto errout;
 			}
+		} else if ((inode->i_flags & EXT4_INLINE_DATA_FL) &&
+			   (rfs->bmap || pb.is_dir)) {
+			/* inline data dir; update it too */
+			retval = ext2fs_add_dir_block2(rfs->old_fs->dblist,
+						       new_inode, 0, 0);
+			if (retval)
+				goto errout;
 		}
 	}
 	io_channel_flush(rfs->old_fs->io);



* [PATCH 15/49] all: Introduce cppcheck static checking for make C=1
  2014-03-11  6:53 [PATCH 00/49] e2fsprogs patchbomb 3/14 Darrick J. Wong
                   ` (13 preceding siblings ...)
  2014-03-11  6:55 ` [PATCH 14/49] resize2fs: add inline dirs for remapping Darrick J. Wong
@ 2014-03-11  6:55 ` Darrick J. Wong
  2014-03-14 13:33   ` Theodore Ts'o
  2014-03-11  6:55 ` [PATCH 16/49] misc: cppcheck cleanups Darrick J. Wong
                   ` (31 subsequent siblings)
  46 siblings, 1 reply; 88+ messages in thread
From: Darrick J. Wong @ 2014-03-11  6:55 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

Introduce more static checking via cppcheck.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 MCONFIG.in              |    6 ++++++
 debugfs/Makefile.in     |    1 +
 e2fsck/Makefile.in      |    1 +
 ext2ed/Makefile.in      |    1 +
 intl/Makefile.in        |    7 +++++++
 lib/blkid/Makefile.in   |    1 +
 lib/e2p/Makefile.in     |    1 +
 lib/et/Makefile.in      |    1 +
 lib/ext2fs/Makefile.in  |    1 +
 lib/quota/Makefile.in   |    1 +
 lib/ss/Makefile.in      |    1 +
 lib/uuid/Makefile.in    |    1 +
 misc/Makefile.in        |    1 +
 resize/Makefile.in      |    1 +
 tests/progs/Makefile.in |    1 +
 util/Makefile.in        |    1 +
 16 files changed, 27 insertions(+)


diff --git a/MCONFIG.in b/MCONFIG.in
index 5ed4df0..9b411d6 100644
--- a/MCONFIG.in
+++ b/MCONFIG.in
@@ -52,17 +52,23 @@ datadir = @datadir@
 
 @ifGNUmake@ CHECK=sparse
 @ifGNUmake@ CHECK_OPTS=-Wsparse-all -Wno-transparent-union -Wno-return-void -Wno-undef -Wno-non-pointer-null
+@ifGNUmake@ CPPCHECK=cppcheck
+@ifGNUmake@ CPPCHECK_OPTS=--force --enable=all
 @ifGNUmake@ ifeq ("$(C)", "2")
 @ifGNUmake@   CHECK_CMD=$(CHECK) $(CHECK_OPTS) -Wbitwise -D__CHECK_ENDIAN__
+@ifGNUmake@   CPPCHECK_CMD=$(CPPCHECK) $(CPPCHECK_OPTS)
 @ifGNUmake@ else
 @ifGNUmake@   ifeq ("$(C)", "1")
 @ifGNUmake@     CHECK_CMD=$(CHECK) $(CHECK_OPTS)
+@ifGNUmake@     CPPCHECK_CMD=$(CPPCHECK) $(CPPCHECK_OPTS)
 @ifGNUmake@    else
 @ifGNUmake@     CHECK_CMD=@true
+@ifGNUmake@     CPPCHECK_CMD=@true
 @ifGNUmake@   endif
 @ifGNUmake@ endif
 
 @ifNotGNUmake@ CHECK_CMD=@true
+@ifNotGNUmake@ CPPCHECK_CMD=@true
 
 CC = @CC@
 BUILD_CC = @BUILD_CC@
diff --git a/debugfs/Makefile.in b/debugfs/Makefile.in
index 16d6aa7..34cdac1 100644
--- a/debugfs/Makefile.in
+++ b/debugfs/Makefile.in
@@ -46,6 +46,7 @@ STATIC_DEPLIBS= $(STATIC_LIBEXT2FS) $(DEPSTATIC_LIBSS) \
 	$(E) "	CC $<"
 	$(Q) $(CC) -c $(ALL_CFLAGS) $< -o $@
 	$(Q) $(CHECK_CMD) $(ALL_CFLAGS) $<
+	$(Q) $(CPPCHECK_CMD) $<
 
 all:: $(PROGS) $(MANPAGES)
 
diff --git a/e2fsck/Makefile.in b/e2fsck/Makefile.in
index c23f1cb..5c8ce39 100644
--- a/e2fsck/Makefile.in
+++ b/e2fsck/Makefile.in
@@ -40,6 +40,7 @@ COMPILE_ET=$(top_builddir)/lib/et/compile_et --build-tree
 	$(E) "	CC $<"
 	$(Q) $(CC) -c $(ALL_CFLAGS) $< -o $@
 	$(Q) $(CHECK_CMD) $(ALL_CFLAGS) $<
+	$(Q) $(CPPCHECK_CMD) $<
 @PROFILE_CMT@	$(Q) $(CC) $(ALL_CFLAGS) -g -pg -o profiled/$*.o -c $<
 
 #
diff --git a/ext2ed/Makefile.in b/ext2ed/Makefile.in
index 5f4cc69..f05a562 100644
--- a/ext2ed/Makefile.in
+++ b/ext2ed/Makefile.in
@@ -34,6 +34,7 @@ DOCS=   doc/ext2ed-design.pdf doc/user-guide.pdf doc/ext2fs-overview.pdf \
 .c.o:
 	$(CC) -c $(ALL_CFLAGS) $< -o $@
 	$(CHECK_CMD) $(ALL_CFLAGS) $<
+	$(CPPCHECK_CMD) $<
 
 .SUFFIXES: .sgml .ps .pdf .html
 
diff --git a/intl/Makefile.in b/intl/Makefile.in
index 87d081f..07700c8 100644
--- a/intl/Makefile.in
+++ b/intl/Makefile.in
@@ -61,17 +61,23 @@ mkinstalldirs = $(SHELL) $(MKINSTALLDIRS)
 
 @ifGNUmake@ CHECK=sparse
 @ifGNUmake@ CHECK_OPTS=-Wsparse-all -Wno-transparent-union -Wno-return-void -Wno-undef -Wno-non-pointer-null
+@ifGNUmake@ CPPCHECK=cppcheck
+@ifGNUmake@ CPPCHECK_OPTS=--force --enable=all
 @ifGNUmake@ ifeq ("$(C)", "2")
 @ifGNUmake@   CHECK_CMD=$(CHECK) $(CHECK_OPTS) -Wbitwise -D__CHECK_ENDIAN__
+@ifGNUmake@   CPPCHECK_CMD=$(CPPCHECK) $(CPPCHECK_OPTS)
 @ifGNUmake@ else
 @ifGNUmake@   ifeq ("$(C)", "1")
 @ifGNUmake@     CHECK_CMD=$(CHECK) $(CHECK_OPTS)
+@ifGNUmake@     CPPCHECK_CMD=$(CPPCHECK) $(CPPCHECK_OPTS)
 @ifGNUmake@    else
 @ifGNUmake@     CHECK_CMD=@true
+@ifGNUmake@     CPPCHECK_CMD=@true
 @ifGNUmake@   endif
 @ifGNUmake@ endif
 
 @ifNotGNUmake@ CHECK_CMD=@true
+@ifNotGNUmake@ CPPCHECK_CMD=@true
 
 l = @INTL_LIBTOOL_SUFFIX_PREFIX@
 
@@ -206,6 +212,7 @@ LTV_AGE=4
 	$(E) "	CC $<"
 	$(Q) $(COMPILE) $<
 	$(Q) $(CHECK_CMD) $(ALL_CFLAGS) $<
+	$(Q) $(CPPCHECK_CMD) $<
 
 .y.c:
 	$(YACC) $(YFLAGS) --output $@ $<
diff --git a/lib/blkid/Makefile.in b/lib/blkid/Makefile.in
index faed6f1..69b5b4c 100644
--- a/lib/blkid/Makefile.in
+++ b/lib/blkid/Makefile.in
@@ -56,6 +56,7 @@ DEPLIBS_BLKID=	$(DEPSTATIC_LIBBLKID) $(DEPSTATIC_LIBUUID)
 	$(E) "	CC $<"
 	$(Q) $(CC) $(ALL_CFLAGS) -c $< -o $@
 	$(Q) $(CHECK_CMD) $(ALL_CFLAGS) $<
+	$(Q) $(CPPCHECK_CMD) $<
 @PROFILE_CMT@	$(Q) $(CC) $(ALL_CFLAGS) -g -pg -o profiled/$*.o -c $<
 @CHECKER_CMT@	$(Q) $(CC) $(ALL_CFLAGS) -checker -g -o checker/$*.o -c $<
 @ELF_CMT@	$(Q) $(CC) $(ALL_CFLAGS) -fPIC -o elfshared/$*.o -c $<
diff --git a/lib/e2p/Makefile.in b/lib/e2p/Makefile.in
index d6992fc..761ac48 100644
--- a/lib/e2p/Makefile.in
+++ b/lib/e2p/Makefile.in
@@ -56,6 +56,7 @@ BSDLIB_INSTALL_DIR = $(root_libdir)
 	$(E) "	CC $<"
 	$(Q) $(CC) $(ALL_CFLAGS) -c $< -o $@
 	$(Q) $(CHECK_CMD) $(ALL_CFLAGS) $<
+	$(Q) $(CPPCHECK_CMD) $<
 @PROFILE_CMT@	$(Q) $(CC) $(ALL_CFLAGS) -g -pg -o profiled/$*.o -c $<
 @CHECKER_CMT@	$(Q) $(CC) $(ALL_CFLAGS) -checker -g -o checker/$*.o -c $<
 @ELF_CMT@	$(Q) $(CC) $(ALL_CFLAGS) -fPIC -o elfshared/$*.o -c $<
diff --git a/lib/et/Makefile.in b/lib/et/Makefile.in
index ff99f5d..4f2d31f 100644
--- a/lib/et/Makefile.in
+++ b/lib/et/Makefile.in
@@ -44,6 +44,7 @@ BSDLIB_INSTALL_DIR = $(root_libdir)
 	$(E) "	CC $<"
 	$(Q) $(CC) $(ALL_CFLAGS) -c $< -o $@
 	$(Q) $(CHECK_CMD) $(ALL_CFLAGS) $<
+	$(Q) $(CPPCHECK_CMD) $<
 @PROFILE_CMT@	$(Q) $(CC) $(ALL_CFLAGS) -g -pg -o profiled/$*.o -c $<
 @CHECKER_CMT@	$(Q) $(CC) $(ALL_CFLAGS) -checker -g -o checker/$*.o -c $<
 @ELF_CMT@	$(Q) $(CC) $(ALL_CFLAGS) -fPIC -o elfshared/$*.o -c $<
diff --git a/lib/ext2fs/Makefile.in b/lib/ext2fs/Makefile.in
index 7777cb1..0c880c7 100644
--- a/lib/ext2fs/Makefile.in
+++ b/lib/ext2fs/Makefile.in
@@ -205,6 +205,7 @@ all:: ext2fs.pc
 	$(E) "	CC $<"
 	$(Q) $(CC) $(ALL_CFLAGS) -c $< -o $@
 	$(Q) $(CHECK_CMD) $(ALL_CFLAGS) $<
+	$(Q) $(CPPCHECK_CMD) $<
 @PROFILE_CMT@	$(Q) $(CC) $(ALL_CFLAGS) -g -pg -o profiled/$*.o -c $<
 @CHECKER_CMT@	$(Q) $(CC) $(ALL_CFLAGS) -checker -g -o checker/$*.o -c $<
 @ELF_CMT@	$(Q) $(CC) $(ALL_CFLAGS) -fPIC -o elfshared/$*.o -c $<
diff --git a/lib/quota/Makefile.in b/lib/quota/Makefile.in
index e423356..0344d09 100644
--- a/lib/quota/Makefile.in
+++ b/lib/quota/Makefile.in
@@ -48,6 +48,7 @@ LIBDIR= quota
 	$(E) "	CC $<"
 	$(Q) $(CC) $(ALL_CFLAGS) -c $< -o $@
 	$(Q) $(CHECK_CMD) $(ALL_CFLAGS) $<
+	$(Q) $(CPPCHECK_CMD) $<
 @PROFILE_CMT@	$(Q) $(CC) $(ALL_CFLAGS) -g -pg -o profiled/$*.o -c $<
 @CHECKER_CMT@	$(Q) $(CC) $(ALL_CFLAGS) -checker -g -o checker/$*.o -c $<
 #ELF_CMT#	$(Q) $(CC) $(ALL_CFLAGS) -fPIC -o elfshared/$*.o -c $<
diff --git a/lib/ss/Makefile.in b/lib/ss/Makefile.in
index 28bcfd5..4c1ef8f 100644
--- a/lib/ss/Makefile.in
+++ b/lib/ss/Makefile.in
@@ -35,6 +35,7 @@ MK_CMDS=_SS_DIR_OVERRIDE=. ./mk_cmds
 	$(E) "	CC $<"
 	$(Q) $(CC) $(ALL_CFLAGS) -c $<
 	$(Q) $(CHECK_CMD) $(ALL_CFLAGS) $<
+	$(Q) $(CPPCHECK_CMD) $<
 @PROFILE_CMT@	$(Q) $(CC) $(ALL_CFLAGS) -g -pg -o profiled/$*.o -c $<
 @CHECKER_CMT@	$(Q) $(CC) $(ALL_CFLAGS) -checker -g -o checker/$*.o -c $<
 @ELF_CMT@	$(Q) $(CC) $(ALL_CFLAGS) -DSHARED_ELF_LIB -fPIC -o elfshared/$*.o -c $<
diff --git a/lib/uuid/Makefile.in b/lib/uuid/Makefile.in
index 14d08c1..f5b767e 100644
--- a/lib/uuid/Makefile.in
+++ b/lib/uuid/Makefile.in
@@ -63,6 +63,7 @@ BSDLIB_INSTALL_DIR = $(root_libdir)
 	$(E) "	CC $<"
 	$(Q) $(CC) $(ALL_CFLAGS) -c $< -o $@
 	$(Q) $(CHECK_CMD) $(ALL_CFLAGS) $<
+	$(Q) $(CPPCHECK_CMD) $<
 @PROFILE_CMT@	$(Q) $(CC) $(ALL_CFLAGS) -g -pg -o profiled/$*.o -c $<
 @CHECKER_CMT@	$(Q) $(CC) $(ALL_CFLAGS) -checker -g -o checker/$*.o -c $<
 @ELF_CMT@	$(Q) $(CC) $(ALL_CFLAGS) -fPIC -o elfshared/$*.o -c $<
diff --git a/misc/Makefile.in b/misc/Makefile.in
index e061867..18a8a2f 100644
--- a/misc/Makefile.in
+++ b/misc/Makefile.in
@@ -103,6 +103,7 @@ COMPILE_ET=$(top_builddir)/lib/et/compile_et --build-tree
 	$(E) "	CC $<"
 	$(Q) $(CC) -c $(ALL_CFLAGS) $< -o $@
 	$(Q) $(CHECK_CMD) $(ALL_CFLAGS) $<
+	$(Q) $(CPPCHECK_CMD) $<
 @PROFILE_CMT@	$(Q) $(CC) $(ALL_CFLAGS) -g -pg -o profiled/$*.o -c $<
 
 all:: profiled $(SPROGS) $(UPROGS) $(USPROGS) $(SMANPAGES) $(UMANPAGES) \
diff --git a/resize/Makefile.in b/resize/Makefile.in
index f7b80ef..16f2a95 100644
--- a/resize/Makefile.in
+++ b/resize/Makefile.in
@@ -39,6 +39,7 @@ DEPSTATIC_LIBS= $(STATIC_LIBE2P) $(STATIC_LIBEXT2FS) $(DEPSTATIC_LIBCOM_ERR)
 	$(E) "	CC $<"
 	$(Q) $(CC) -c $(ALL_CFLAGS) $< -o $@
 	$(Q) $(CHECK_CMD) $(ALL_CFLAGS) $<
+	$(Q) $(CPPCHECK_CMD) $<
 
 all:: $(PROGS) $(TEST_PROGS) $(MANPAGES) 
 
diff --git a/tests/progs/Makefile.in b/tests/progs/Makefile.in
index 44d04b5..6c986e4 100644
--- a/tests/progs/Makefile.in
+++ b/tests/progs/Makefile.in
@@ -28,6 +28,7 @@ DEPLIBS= $(LIBEXT2FS) $(DEPLIBSS) $(DEPLIBCOM_ERR)
 	$(E) "	CC $<"
 	$(Q) $(CC) -c $(ALL_CFLAGS) $< -o $@
 	$(Q) $(CHECK_CMD) $(ALL_CFLAGS) $<
+	$(Q) $(CPPCHECK_CMD) $<
 
 all:: $(PROGS)
 
diff --git a/util/Makefile.in b/util/Makefile.in
index d235fff..2375e17 100644
--- a/util/Makefile.in
+++ b/util/Makefile.in
@@ -17,6 +17,7 @@ SRCS = $(srcdir)/subst.c
 	$(E) "	CC $<"
 	$(Q) $(BUILD_CC) -c $(BUILD_CFLAGS) $< -o $@
 	$(Q) $(CHECK_CMD) $(ALL_CFLAGS) $<
+	$(Q) $(CPPCHECK_CMD) $<
 
 PROGS=		subst symlinks
 



* [PATCH 16/49] misc: cppcheck cleanups
  2014-03-11  6:53 [PATCH 00/49] e2fsprogs patchbomb 3/14 Darrick J. Wong
                   ` (14 preceding siblings ...)
  2014-03-11  6:55 ` [PATCH 15/49] all: Introduce cppcheck static checking for make C=1 Darrick J. Wong
@ 2014-03-11  6:55 ` Darrick J. Wong
  2014-03-14 13:34   ` Theodore Ts'o
  2014-03-11  6:55 ` [PATCH 17/49] libext2fs: fix 64bit overflow in ext2fs_block_alloc_stats_range Darrick J. Wong
                   ` (30 subsequent siblings)
  46 siblings, 1 reply; 88+ messages in thread
From: Darrick J. Wong @ 2014-03-11  6:55 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

Fix a number of things that cppcheck complains about.  Most of these
are minor resource leaks and forgotten declarations.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 debugfs/debugfs.c      |    2 +-
 debugfs/debugfs.h      |    4 ++++
 e2fsck/pass1.c         |    2 +-
 e2fsck/pass2.c         |    2 +-
 e2fsck/region.c        |    2 ++
 lib/ext2fs/expanddir.c |    1 +
 lib/ext2fs/ext2fs.h    |   11 +++++++++++
 lib/ext2fs/ext2fsP.h   |    9 ---------
 lib/ext2fs/mkdir.c     |    1 +
 lib/ext2fs/punch.c     |    1 +
 util/subst.c           |    2 ++
 11 files changed, 25 insertions(+), 12 deletions(-)


diff --git a/debugfs/debugfs.c b/debugfs/debugfs.c
index a5cd007..a10446d 100644
--- a/debugfs/debugfs.c
+++ b/debugfs/debugfs.c
@@ -666,7 +666,7 @@ static void dump_inline_data(FILE *out, const char *prefix, ext2_ino_t inode_num
 
 	retval = ext2fs_inline_data_size(current_fs, inode_num, &size);
 	if (!retval)
-		fprintf(out, "%sSize of inline data: %d", prefix, size);
+		fprintf(out, "%sSize of inline data: %zu", prefix, size);
 }
 
 void internal_dump_inode(FILE *out, const char *prefix,
diff --git a/debugfs/debugfs.h b/debugfs/debugfs.h
index 3c27f82..0164ca5 100644
--- a/debugfs/debugfs.h
+++ b/debugfs/debugfs.h
@@ -177,6 +177,10 @@ extern time_t string_to_time(const char *arg);
 
 /* xattrs.c */
 void dump_inode_attributes(FILE *out, ext2_ino_t ino);
+void do_get_xattr(int argc, char **argv);
+void do_set_xattr(int argc, char **argv);
+void do_rm_xattr(int argc, char **argv);
+void do_list_xattr(int argc, char **argv);
 
 /* zap.c */
 extern void do_zap_block(int argc, char **argv);
diff --git a/e2fsck/pass1.c b/e2fsck/pass1.c
index 641b3fb..eb9497c 100644
--- a/e2fsck/pass1.c
+++ b/e2fsck/pass1.c
@@ -446,7 +446,7 @@ static void check_is_really_dir(e2fsck_t ctx, struct problem_context *pctx,
 	inlinedata_fs = (ctx->fs->super->s_feature_incompat &
 			 EXT4_FEATURE_INCOMPAT_INLINE_DATA);
 	if (inlinedata_fs && (inode->i_flags & EXT4_INLINE_DATA_FL)) {
-		unsigned int size;
+		size_t size;
 
 		if (ext2fs_inline_data_size(ctx->fs, pctx->ino, &size))
 			return;
diff --git a/e2fsck/pass2.c b/e2fsck/pass2.c
index 586f3a8..5488c73 100644
--- a/e2fsck/pass2.c
+++ b/e2fsck/pass2.c
@@ -770,7 +770,7 @@ static int check_dir_block(ext2_filsys fs,
 	int	dx_csum_size = 0, de_csum_size = 0;
 	int	failed_csum = 0;
 	int	is_leaf = 1;
-	int	inline_data_size = 0;
+	size_t	inline_data_size = 0;
 	int	filetype = 0;
 
 	cd = (struct check_dir_struct *) priv_data;
diff --git a/e2fsck/region.c b/e2fsck/region.c
index 4b669f0..aaaaa19 100644
--- a/e2fsck/region.c
+++ b/e2fsck/region.c
@@ -203,6 +203,8 @@ int main(int argc, char **argv)
 			break;
 		}
 	}
+	if (r)
+		region_free(r);
 }
 
 #endif /* TEST_PROGRAM */
diff --git a/lib/ext2fs/expanddir.c b/lib/ext2fs/expanddir.c
index 7cff343..d0f7287 100644
--- a/lib/ext2fs/expanddir.c
+++ b/lib/ext2fs/expanddir.c
@@ -18,6 +18,7 @@
 
 #include "ext2_fs.h"
 #include "ext2fs.h"
+#include "ext2fsP.h"
 
 struct expand_dir_struct {
 	int		done;
diff --git a/lib/ext2fs/ext2fs.h b/lib/ext2fs/ext2fs.h
index 3756e8b..599c972 100644
--- a/lib/ext2fs/ext2fs.h
+++ b/lib/ext2fs/ext2fs.h
@@ -1386,6 +1386,17 @@ errcode_t ext2fs_icount_validate(ext2_icount_t icount, FILE *);
 extern errcode_t ext2fs_get_memalign(unsigned long size,
 				     unsigned long align, void *ptr);
 
+/* inline_data.c */
+extern errcode_t ext2fs_inline_data_init(ext2_filsys fs, ext2_ino_t ino);
+extern errcode_t ext2fs_inline_data_size(ext2_filsys fs, ext2_ino_t ino,
+					 size_t *size);
+extern errcode_t ext2fs_inline_data_get(ext2_filsys fs, ext2_ino_t ino,
+					struct ext2_inode *inode,
+					void *buf, size_t *size);
+extern errcode_t ext2fs_inline_data_set(ext2_filsys fs, ext2_ino_t ino,
+					struct ext2_inode *inode,
+					void *buf, size_t size);
+
 /* inode.c */
 extern errcode_t ext2fs_create_inode_cache(ext2_filsys fs,
 					   unsigned int cache_size);
diff --git a/lib/ext2fs/ext2fsP.h b/lib/ext2fs/ext2fsP.h
index 20257e2..f8c61e6 100644
--- a/lib/ext2fs/ext2fsP.h
+++ b/lib/ext2fs/ext2fsP.h
@@ -88,20 +88,11 @@ extern int ext2fs_process_dir_block(ext2_filsys  	fs,
 				    int			ref_offset,
 				    void		*priv_data);
 
-extern errcode_t ext2fs_inline_data_init(ext2_filsys fs, ext2_ino_t ino);
-extern errcode_t ext2fs_inline_data_size(ext2_filsys fs, ext2_ino_t ino,
-					 size_t *size);
 extern errcode_t ext2fs_inline_data_ea_remove(ext2_filsys fs, ext2_ino_t ino);
 extern errcode_t ext2fs_inline_data_expand(ext2_filsys fs, ext2_ino_t ino);
 extern int ext2fs_inline_data_dir_iterate(ext2_filsys fs,
 					  ext2_ino_t ino,
 					  void *priv_data);
-extern errcode_t ext2fs_inline_data_get(ext2_filsys fs, ext2_ino_t ino,
-					struct ext2_inode *inode,
-					void *buf, size_t *size);
-extern errcode_t ext2fs_inline_data_set(ext2_filsys fs, ext2_ino_t ino,
-					struct ext2_inode *inode,
-					void *buf, size_t size);
 
 /* Generic numeric progress meter */
 
diff --git a/lib/ext2fs/mkdir.c b/lib/ext2fs/mkdir.c
index 06c2c7e..c4c7967 100644
--- a/lib/ext2fs/mkdir.c
+++ b/lib/ext2fs/mkdir.c
@@ -26,6 +26,7 @@
 
 #include "ext2_fs.h"
 #include "ext2fs.h"
+#include "ext2fsP.h"
 
 #ifndef EXT2_FT_DIR
 #define EXT2_FT_DIR		2
diff --git a/lib/ext2fs/punch.c b/lib/ext2fs/punch.c
index 95e19d9..532c4b8 100644
--- a/lib/ext2fs/punch.c
+++ b/lib/ext2fs/punch.c
@@ -19,6 +19,7 @@
 
 #include "ext2_fs.h"
 #include "ext2fs.h"
+#include "ext2fsP.h"
 
 #undef PUNCH_DEBUG
 
diff --git a/util/subst.c b/util/subst.c
index 6a5eab1..2ea16d9 100644
--- a/util/subst.c
+++ b/util/subst.c
@@ -426,6 +426,8 @@ int main(int argc, char **argv)
 	}
 	if (old)
 		fclose(old);
+	if (newfn)
+		free(newfn);
 	return (0);
 }
 



* [PATCH 17/49] libext2fs: fix 64bit overflow in ext2fs_block_alloc_stats_range
  2014-03-11  6:53 [PATCH 00/49] e2fsprogs patchbomb 3/14 Darrick J. Wong
                   ` (15 preceding siblings ...)
  2014-03-11  6:55 ` [PATCH 16/49] misc: cppcheck cleanups Darrick J. Wong
@ 2014-03-11  6:55 ` Darrick J. Wong
  2014-03-14 13:35   ` Theodore Ts'o
  2014-03-11  6:55 ` [PATCH 18/49] misc: fix header complaints and resource leaks in e2fsprogs Darrick J. Wong
                   ` (29 subsequent siblings)
  46 siblings, 1 reply; 88+ messages in thread
From: Darrick J. Wong @ 2014-03-11  6:55 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

In ext2fs_block_alloc_stats_range(), the quantity "-inuse * n" is
calculated as a signed 32-bit quantity.  Unfortunately, gcc (4.6.3 on
Ubuntu 12.04) doesn't sign-extend this quantity to fill the blk64_t
parameter that ext2fs_free_blocks_count_add() wants, so the end result
is that the superblock gets a ridiculously huge free block count.

Changing the declaration of 'n' to blk64_t seems to fix this.
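
The arithmetic can be reproduced outside the library.  The sketch below
is a hypothetical stand-alone model, not the libext2fs code: it assumes
that blk_t is an unsigned 32-bit type, that blk64_t is an unsigned 64-bit
type, and that int is 32 bits wide; the helper names delta_narrow() and
delta_wide() are invented for illustration.  Under those assumptions the
product is evaluated in 32 bits and then zero-extended, which would be
consistent with the huge free block count described above.

```c
#include <stdint.h>

/* Assumed stand-ins for the libext2fs typedefs. */
typedef uint32_t blk_t;
typedef uint64_t blk64_t;

/*
 * Old declaration: n is 32 bits wide.  -inuse is converted to unsigned
 * int for the multiply, so the product wraps modulo 2^32 and is then
 * zero-extended when it is widened to blk64_t.
 */
static blk64_t delta_narrow(int inuse, blk_t n)
{
	return -inuse * n;
}

/*
 * New declaration: n is 64 bits wide.  -inuse is converted to blk64_t
 * before the multiply, so the product wraps modulo 2^64, and adding it
 * to a 64-bit counter behaves like a subtraction.
 */
static blk64_t delta_wide(int inuse, blk64_t n)
{
	return -inuse * n;
}
```

With inuse = 1 and n = 8, adding delta_narrow() to a 64-bit counter of
100 gives 4294967388, whereas adding delta_wide() gives 92.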

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 lib/ext2fs/alloc_stats.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


diff --git a/lib/ext2fs/alloc_stats.c b/lib/ext2fs/alloc_stats.c
index 5bb86ef..4feb24d 100644
--- a/lib/ext2fs/alloc_stats.c
+++ b/lib/ext2fs/alloc_stats.c
@@ -129,7 +129,7 @@ void ext2fs_block_alloc_stats_range(ext2_filsys fs, blk64_t blk,
 	while (num) {
 		int group = ext2fs_group_of_blk2(fs, blk);
 		blk64_t last_blk = ext2fs_group_last_block2(fs, group);
-		blk_t n = num;
+		blk64_t n = num;
 
 		if (blk + num > last_blk)
 			n = last_blk - blk + 1;



* [PATCH 18/49] misc: fix header complaints and resource leaks in e2fsprogs
  2014-03-11  6:53 [PATCH 00/49] e2fsprogs patchbomb 3/14 Darrick J. Wong
                   ` (16 preceding siblings ...)
  2014-03-11  6:55 ` [PATCH 17/49] libext2fs: fix 64bit overflow in ext2fs_block_alloc_stats_range Darrick J. Wong
@ 2014-03-11  6:55 ` Darrick J. Wong
  2014-03-14 13:39   ` Theodore Ts'o
  2014-03-14 13:53   ` Theodore Ts'o
  2014-03-11  6:55 ` [PATCH 19/49] libext2fs: fix memory leak when drastically shrinking extent tree depth Darrick J. Wong
                   ` (28 subsequent siblings)
  46 siblings, 2 replies; 88+ messages in thread
From: Darrick J. Wong @ 2014-03-11  6:55 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

Fix a few minor bugs that cppcheck complained about.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 debugfs/debugfs.c   |    1 +
 debugfs/util.c      |    2 +-
 e2fsck/unix.c       |    1 +
 lib/ext2fs/icount.c |    2 ++
 util/subst.c        |    3 +++
 5 files changed, 8 insertions(+), 1 deletion(-)


diff --git a/debugfs/debugfs.c b/debugfs/debugfs.c
index a10446d..72ab040 100644
--- a/debugfs/debugfs.c
+++ b/debugfs/debugfs.c
@@ -657,6 +657,7 @@ static void dump_extents(FILE *f, const char *prefix, ext2_ino_t ino,
 	}
 	if (printed)
 		fprintf(f, "\n");
+	ext2fs_extent_free(handle);
 }
 
 static void dump_inline_data(FILE *out, const char *prefix, ext2_ino_t inode_num)
diff --git a/debugfs/util.c b/debugfs/util.c
index 9ddfe0b..5cc4e22 100644
--- a/debugfs/util.c
+++ b/debugfs/util.c
@@ -201,7 +201,7 @@ char *time_to_string(__u32 cl)
 		tz = ss_safe_getenv("TZ");
 		if (!tz)
 			tz = "";
-		do_gmt = !strcmp(tz, "GMT") | !strcmp(tz, "GMT0");
+		do_gmt = !strcmp(tz, "GMT") || !strcmp(tz, "GMT0");
 	}
 
 	return asctime((do_gmt) ? gmtime(&t) : localtime(&t));
diff --git a/e2fsck/unix.c b/e2fsck/unix.c
index b39383d..11c2693 100644
--- a/e2fsck/unix.c
+++ b/e2fsck/unix.c
@@ -1016,6 +1016,7 @@ static errcode_t PRS(int argc, char *argv[], e2fsck_t *ret_ctx)
 			strcat(newpath, oldpath);
 		}
 		putenv(newpath);
+		free(newpath);
 	}
 #ifdef CONFIG_JBD_DEBUG
 	jbd_debug = getenv("E2FSCK_JBD_DEBUG");
diff --git a/lib/ext2fs/icount.c b/lib/ext2fs/icount.c
index a3b20f0..7d1b3d5 100644
--- a/lib/ext2fs/icount.c
+++ b/lib/ext2fs/icount.c
@@ -198,6 +198,7 @@ errcode_t ext2fs_create_icount_tdb(ext2_filsys fs, char *tdb_dir,
 	fd = mkstemp(fn);
 	if (fd < 0) {
 		retval = errno;
+		ext2fs_free_mem(&fn);
 		goto errout;
 	}
 	umask(save_umask);
@@ -216,6 +217,7 @@ errcode_t ext2fs_create_icount_tdb(ext2_filsys fs, char *tdb_dir,
 	close(fd);
 	if (icount->tdb == NULL) {
 		retval = errno;
+		ext2fs_free_mem(&fn);
 		goto errout;
 	}
 	*ret = icount;
diff --git a/util/subst.c b/util/subst.c
index 2ea16d9..32d5293 100644
--- a/util/subst.c
+++ b/util/subst.c
@@ -17,6 +17,9 @@
 #include <fcntl.h>
 #include <time.h>
 #include <utime.h>
+#ifdef HAVE_SYS_TIME_H
+#include <sys/time.h>
+#endif
 
 #ifdef HAVE_GETOPT_H
 #include <getopt.h>



* [PATCH 19/49] libext2fs: fix memory leak when drastically shrinking extent tree depth
  2014-03-11  6:53 [PATCH 00/49] e2fsprogs patchbomb 3/14 Darrick J. Wong
                   ` (17 preceding siblings ...)
  2014-03-11  6:55 ` [PATCH 18/49] misc: fix header complaints and resource leaks in e2fsprogs Darrick J. Wong
@ 2014-03-11  6:55 ` Darrick J. Wong
  2014-03-14 13:56   ` Theodore Ts'o
  2014-03-11  6:56 ` [PATCH 20/49] libext2fs: fix parents when modifying extents Darrick J. Wong
                   ` (27 subsequent siblings)
  46 siblings, 1 reply; 88+ messages in thread
From: Darrick J. Wong @ 2014-03-11  6:55 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

In ext2fs_extent_free(), handle->max_depth is used as the loop bound
when freeing all the handle->path[].buf pointers.  However,
ext2fs_extent_delete() sets max_depth = 0 if we've removed everything
from the extent tree, which causes a subsequent _free() to leak some
buf pointers.  max_depth can be re-incremented when splitting extent
nodes, but there's no guarantee that it'll reach the old value before
the free.

Therefore, remember the size of handle->path[] separately, and use that
when freeing the extent handle.
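
The idea of tracking the allocated size separately from the current
depth can be sketched in isolation.  This is a simplified, hypothetical
model rather than the libext2fs implementation: struct toy_handle,
toy_open() and toy_free() are invented names, and the 16-byte buffers
merely stand in for the per-level node buffers.

```c
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the extent handle structures. */
struct toy_path {
	char *buf;
};

struct toy_handle {
	int max_depth;		/* current tree depth; may shrink later */
	int max_paths;		/* number of slots allocated in path[] */
	struct toy_path *path;
};

/* Allocate depth + 1 path slots; slot 0 stands for the in-inode root. */
static struct toy_handle *toy_open(int depth)
{
	struct toy_handle *h = calloc(1, sizeof(*h));
	int i;

	if (!h)
		return NULL;
	h->max_depth = depth;
	h->max_paths = depth + 1;
	h->path = calloc(h->max_paths, sizeof(*h->path));
	if (!h->path) {
		free(h);
		return NULL;
	}
	for (i = 1; i < h->max_paths; i++)
		h->path[i].buf = malloc(16);
	return h;
}

/* Free every allocated slot; returns how many buffers were released. */
static int toy_free(struct toy_handle *h)
{
	int i, freed = 0;

	if (!h)
		return 0;
	/* Bound the loop by max_paths, not by the mutable max_depth. */
	for (i = 1; i < h->max_paths; i++) {
		if (h->path[i].buf) {
			free(h->path[i].buf);
			freed++;
		}
	}
	free(h->path);
	free(h);
	return freed;
}
```

In this model, resetting max_depth to 0 after toy_open(3) no longer
affects how many buffers toy_free() releases, because the loop bound
comes from max_paths.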

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 lib/ext2fs/extent.c |   23 +++++++++++------------
 1 file changed, 11 insertions(+), 12 deletions(-)


diff --git a/lib/ext2fs/extent.c b/lib/ext2fs/extent.c
index 3ccae66..f27344e 100644
--- a/lib/ext2fs/extent.c
+++ b/lib/ext2fs/extent.c
@@ -58,6 +58,7 @@ struct ext2_extent_handle {
 	int			type;
 	int			level;
 	int			max_depth;
+	int			max_paths;
 	struct extent_path	*path;
 };
 
@@ -168,7 +169,7 @@ void ext2fs_extent_free(ext2_extent_handle_t handle)
 		return;
 
 	if (handle->path) {
-		for (i=1; i <= handle->max_depth; i++) {
+		for (i = 1; i < handle->max_paths; i++) {
 			if (handle->path[i].buf)
 				ext2fs_free_mem(&handle->path[i].buf);
 		}
@@ -242,11 +243,10 @@ errcode_t ext2fs_extent_open2(ext2_filsys fs, ext2_ino_t ino,
 	handle->max_depth = ext2fs_le16_to_cpu(eh->eh_depth);
 	handle->type = ext2fs_le16_to_cpu(eh->eh_magic);
 
-	retval = ext2fs_get_mem(((handle->max_depth+1) *
-				 sizeof(struct extent_path)),
-				&handle->path);
-	memset(handle->path, 0,
-	       (handle->max_depth+1) * sizeof(struct extent_path));
+	handle->max_paths = handle->max_depth + 1;
+	retval = ext2fs_get_memzero(handle->max_paths *
+				    sizeof(struct extent_path),
+				    &handle->path);
 	handle->path[0].buf = (char *) handle->inode->i_block;
 
 	handle->path[0].left = handle->path[0].entries =
@@ -912,13 +912,11 @@ errcode_t ext2fs_extent_node_split(ext2_extent_handle_t handle)
 	if (handle->level == 0) {
 		new_root = 1;
 		tocopy = ext2fs_le16_to_cpu(eh->eh_entries);
-		retval = ext2fs_get_mem(((handle->max_depth+2) *
-					 sizeof(struct extent_path)),
-					&newpath);
+		retval = ext2fs_get_memzero((handle->max_paths + 1) *
+					    sizeof(struct extent_path),
+					    &newpath);
 		if (retval)
 			goto done;
-		memset(newpath, 0,
-		       ((handle->max_depth+2) * sizeof(struct extent_path)));
 	} else {
 		tocopy = ext2fs_le16_to_cpu(eh->eh_entries) / 2;
 	}
@@ -996,13 +994,14 @@ errcode_t ext2fs_extent_node_split(ext2_extent_handle_t handle)
 	/* current path now has fewer active entries, we copied some out */
 	if (handle->level == 0) {
 		memcpy(newpath, path,
-		       sizeof(struct extent_path) * (handle->max_depth+1));
+		       sizeof(struct extent_path) * handle->max_paths);
 		handle->path = newpath;
 		newpath = path;
 		path = handle->path;
 		path->entries = 1;
 		path->left = path->max_entries - 1;
 		handle->max_depth++;
+		handle->max_paths++;
 		eh->eh_depth = ext2fs_cpu_to_le16(handle->max_depth);
 	} else {
 		path->entries -= tocopy;



* [PATCH 20/49] libext2fs: fix parents when modifying extents
  2014-03-11  6:53 [PATCH 00/49] e2fsprogs patchbomb 3/14 Darrick J. Wong
                   ` (18 preceding siblings ...)
  2014-03-11  6:55 ` [PATCH 19/49] libext2fs: fix memory leak when drastically shrinking extent tree depth Darrick J. Wong
@ 2014-03-11  6:56 ` Darrick J. Wong
  2014-03-14 14:01   ` Theodore Ts'o
  2014-03-11  6:56 ` [PATCH 21/49] e2fsck: print runs of duplicate blocks instead of all of them Darrick J. Wong
                   ` (26 subsequent siblings)
  46 siblings, 1 reply; 88+ messages in thread
From: Darrick J. Wong @ 2014-03-11  6:56 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

In ext2fs_extent_set_bmap() and ext2fs_punch_extent(), fix the parents
when altering either end of an extent so that the parent nodes reflect
the added mapping.

There's a slight complication to using fix_parents: if there are two
mappings to an lblk in the tree, handle->path->curr can point to either
extent afterwards.  This is documented in a comment.
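
To illustrate why the parents need fixing at all, here is a toy
two-level model.  It is a sketch under stated assumptions, not the
ext2fs_extent_fix_parents() implementation: the structures and
toy_fix_parent() are hypothetical, there is only one index level, and
the handle-position subtlety mentioned above is not modelled.  The only
point it shows is that an index key must track the first logical block
of the leaf it points to.

```c
#include <stdint.h>

/* Hypothetical toy structures; not the on-disk extent format. */
struct toy_extent {
	uint32_t lblk;		/* first logical block mapped */
	uint32_t len;		/* number of blocks mapped */
};

struct toy_leaf {
	struct toy_extent ext[4];
	int entries;
};

struct toy_index {
	uint32_t key;		/* should equal leaf->ext[0].lblk */
	struct toy_leaf *leaf;
};

/*
 * Bring the parent key back in line with the first extent of its leaf.
 * Returns 1 if the key had to be changed, 0 if it was already correct.
 */
static int toy_fix_parent(struct toy_index *ix)
{
	if (!ix->leaf || ix->leaf->entries == 0)
		return 0;
	if (ix->key == ix->leaf->ext[0].lblk)
		return 0;
	ix->key = ix->leaf->ext[0].lblk;
	return 1;
}
```

If the first extent of a leaf is moved from lblk 100 to lblk 90 without
calling toy_fix_parent(), a lookup that compares against the stale key
of 100 would not descend into that leaf for blocks 90 through 99.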

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 lib/ext2fs/extent.c |   30 ++++++++++++++++++++++++------
 lib/ext2fs/punch.c  |   14 ++++++++++----
 2 files changed, 34 insertions(+), 10 deletions(-)


diff --git a/lib/ext2fs/extent.c b/lib/ext2fs/extent.c
index f27344e..80ce88f 100644
--- a/lib/ext2fs/extent.c
+++ b/lib/ext2fs/extent.c
@@ -720,7 +720,14 @@ errcode_t ext2fs_extent_goto(ext2_extent_handle_t handle,
  * and so on.
  *
  * Safe to call for any position in node; if not at the first entry,
- * will  simply return.
+ * it will simply return.
+ *
+ * Note a subtlety of this function -- if there happen to be two extents
+ * mapping the same lblk and someone calls fix_parents on the second of the two
+ * extents, the position of the extent handle after the call will be the second
+ * extent if nothing happened, or the first extent if something did.  A caller
+ * in this situation must use ext2fs_extent_goto() after calling this function.
+ * Or simply don't map the same lblk with two extents, ever.
  */
 errcode_t ext2fs_extent_fix_parents(ext2_extent_handle_t handle)
 {
@@ -1379,17 +1386,25 @@ errcode_t ext2fs_extent_set_bmap(ext2_extent_handle_t handle,
 							       &next_extent);
 				if (retval)
 					goto done;
-				retval = ext2fs_extent_fix_parents(handle);
-				if (retval)
-					goto done;
 			} else
 				retval = ext2fs_extent_insert(handle,
 				      EXT2_EXTENT_INSERT_AFTER, &newextent);
 			if (retval)
 				goto done;
-			/* Now pointing at inserted extent; move back to prev */
+			retval = ext2fs_extent_fix_parents(handle);
+			if (retval)
+				goto done;
+			/*
+			 * Now pointing at inserted extent; move back to prev.
+			 *
+			 * We cannot use EXT2_EXTENT_PREV to go back; note the
+			 * subtlety in the comment for fix_parents().
+			 */
+			retval = ext2fs_extent_goto(handle, logical);
+			if (retval)
+				goto done;
 			retval = ext2fs_extent_get(handle,
-						   EXT2_EXTENT_PREV_LEAF,
+						   EXT2_EXTENT_CURRENT,
 						   &extent);
 			if (retval)
 				goto done;
@@ -1422,6 +1437,9 @@ errcode_t ext2fs_extent_set_bmap(ext2_extent_handle_t handle,
 							      0, &newextent);
 			if (retval)
 				goto done;
+			retval = ext2fs_extent_fix_parents(handle);
+			if (retval)
+				goto done;
 			retval = ext2fs_extent_get(handle,
 						   EXT2_EXTENT_NEXT_LEAF,
 						   &extent);
diff --git a/lib/ext2fs/punch.c b/lib/ext2fs/punch.c
index 532c4b8..60cd2a3 100644
--- a/lib/ext2fs/punch.c
+++ b/lib/ext2fs/punch.c
@@ -344,10 +344,16 @@ static errcode_t ext2fs_punch_extent(ext2_filsys fs, ext2_ino_t ino,
 					EXT2_EXTENT_INSERT_AFTER, &newex);
 			if (retval)
 				goto errout;
-			/* Now pointing at inserted extent; so go back */
-			retval = ext2fs_extent_get(handle,
-						   EXT2_EXTENT_PREV_LEAF,
-						   &newex);
+			retval = ext2fs_extent_fix_parents(handle);
+			if (retval)
+				goto errout;
+			/*
+			 * Now pointing at inserted extent; so go back.
+			 *
+			 * We cannot use EXT2_EXTENT_PREV to go back; note the
+			 * subtlety in the comment for fix_parents().
+			 */
+			retval = ext2fs_extent_goto(handle, extent.e_lblk);
 			if (retval)
 				goto errout;
 		} 



* [PATCH 21/49] e2fsck: print runs of duplicate blocks instead of all of them
  2014-03-11  6:53 [PATCH 00/49] e2fsprogs patchbomb 3/14 Darrick J. Wong
                   ` (19 preceding siblings ...)
  2014-03-11  6:56 ` [PATCH 20/49] libext2fs: fix parents when modifying extents Darrick J. Wong
@ 2014-03-11  6:56 ` Darrick J. Wong
  2014-03-15 16:19   ` Theodore Ts'o
  2014-03-11  6:56 ` [PATCH 22/49] e2fsck: verify checksums after checking everything else Darrick J. Wong
                   ` (25 subsequent siblings)
  46 siblings, 1 reply; 88+ messages in thread
From: Darrick J. Wong @ 2014-03-11  6:56 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

When pass1 finds blocks that are mapped to multiple files, it will
print every duplicated block.  If there are long sequences of
duplicate blocks (e.g. the e_pblk field is wrong in an extent), this
can cause a gigantic flood of output when a range could convey the
same information.  Therefore, teach pass1b to print ranges when
possible.
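
The coalescing step can be sketched on its own.  The helper below is
hypothetical and works on an array, whereas the patch tracks the last
block seen while iterating; format_dup_runs() and its parameters are
invented for illustration.  It emits " N" for an isolated block and
" A--B" for a run of consecutive blocks, mirroring the output format
used in the updated expect files.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format a list of block numbers (in iteration order) into out, writing
 * " N" for an isolated block and " A--B" for a run of consecutive
 * blocks.  Returns the number of characters written, or -1 if out is
 * too small.
 */
static int format_dup_runs(const uint64_t *blk, size_t count,
			   char *out, size_t outlen)
{
	size_t i = 0, used = 0;

	if (outlen == 0)
		return -1;
	out[0] = '\0';
	while (i < count) {
		size_t j = i;
		int n;

		/* Extend the run while the next block is consecutive. */
		while (j + 1 < count && blk[j + 1] == blk[j] + 1)
			j++;
		if (i == j)
			n = snprintf(out + used, outlen - used, " %llu",
				     (unsigned long long) blk[i]);
		else
			n = snprintf(out + used, outlen - used, " %llu--%llu",
				     (unsigned long long) blk[i],
				     (unsigned long long) blk[j]);
		if (n < 0 || (size_t) n >= outlen - used)
			return -1;
		used += n;
		i = j + 1;
	}
	return (int) used;
}
```

For the blocks 25, 26, 57, 58 this produces " 25--26 57--58", which is
the form that appears for inode 13 in tests/f_dup2/expect.1.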

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 e2fsck/pass1b.c             |   23 +++++++++++++++++++++--
 e2fsck/problem.c            |    5 +++++
 e2fsck/problem.h            |    3 +++
 tests/f_bbfile/expect.1     |    4 ++--
 tests/f_dup/expect.1        |    4 ++--
 tests/f_dup2/expect.1       |    6 +++---
 tests/f_dup_ba/expect.1     |   12 ++++++------
 tests/f_dup_resize/expect.1 |    4 ++--
 tests/f_dupfsblks/expect.1  |    4 ++--
 tests/f_dupsuper/expect.1   |    2 +-
 10 files changed, 47 insertions(+), 20 deletions(-)


diff --git a/e2fsck/pass1b.c b/e2fsck/pass1b.c
index 41a82cf..d7c5e55 100644
--- a/e2fsck/pass1b.c
+++ b/e2fsck/pass1b.c
@@ -262,6 +262,7 @@ struct process_block_struct {
 	ext2_ino_t	ino;
 	int		dup_blocks;
 	blk64_t		cur_cluster;
+	blk64_t		last_blk;
 	struct ext2_inode *inode;
 	struct problem_context *pctx;
 };
@@ -274,6 +275,7 @@ static void pass1b(e2fsck_t ctx, char *block_buf)
 	ext2_inode_scan	scan;
 	struct process_block_struct pb;
 	struct problem_context pctx;
+	problem_t op;
 
 	clear_problem_context(&pctx);
 
@@ -314,6 +316,8 @@ static void pass1b(e2fsck_t ctx, char *block_buf)
 		pb.dup_blocks = 0;
 		pb.inode = &inode;
 		pb.cur_cluster = ~0;
+		pb.last_blk = 0;
+		pb.pctx->blk = pb.pctx->blk2 = 0;
 
 		if (ext2fs_inode_has_valid_blocks2(fs, &inode) ||
 		    (ino == EXT2_BAD_INO))
@@ -329,6 +333,11 @@ static void pass1b(e2fsck_t ctx, char *block_buf)
 			ext2fs_file_acl_block_set(fs, &inode, blk);
 		}
 		if (pb.dup_blocks) {
+			if (ino != EXT2_BAD_INO) {
+				op = pctx.blk == pctx.blk2 ?
+					PR_1B_DUP_BLOCK : PR_1B_DUP_RANGE;
+				fix_problem(ctx, op, pb.pctx);
+			}
 			end_problem_latch(ctx, PR_LATCH_DBLOCK);
 			if (ino >= EXT2_FIRST_INODE(fs->super) ||
 			    ino == EXT2_ROOT_INO)
@@ -351,6 +360,7 @@ static int process_pass1b_block(ext2_filsys fs EXT2FS_ATTR((unused)),
 	struct process_block_struct *p;
 	e2fsck_t ctx;
 	blk64_t	lc;
+	problem_t op;
 
 	if (HOLE_BLKADDR(*block_nr))
 		return 0;
@@ -363,8 +373,17 @@ static int process_pass1b_block(ext2_filsys fs EXT2FS_ATTR((unused)),
 
 	/* OK, this is a duplicate block */
 	if (p->ino != EXT2_BAD_INO) {
-		p->pctx->blk = *block_nr;
-		fix_problem(ctx, PR_1B_DUP_BLOCK, p->pctx);
+		if (p->last_blk + 1 != *block_nr) {
+			if (p->last_blk) {
+				op = p->pctx->blk == p->pctx->blk2 ?
+						PR_1B_DUP_BLOCK :
+						PR_1B_DUP_RANGE;
+				fix_problem(ctx, op, p->pctx);
+			}
+			p->pctx->blk = *block_nr;
+		}
+		p->pctx->blk2 = *block_nr;
+		p->last_blk = *block_nr;
 	}
 	p->dup_blocks++;
 	ext2fs_mark_inode_bitmap2(inode_dup_map, p->ino);
diff --git a/e2fsck/problem.c b/e2fsck/problem.c
index 1282858..7f0ad6c 100644
--- a/e2fsck/problem.c
+++ b/e2fsck/problem.c
@@ -1073,6 +1073,11 @@ static struct e2fsck_problem problem_table[] = {
 	  N_("Error adjusting refcount for @a @b %b (@i %i): %m\n"),
 	  PROMPT_NONE, 0 },
 
+	/* Duplicate/bad block range in inode */
+	{ PR_1B_DUP_RANGE,
+	  " %b--%c",
+	  PROMPT_NONE, PR_LATCH_DBLOCK | PR_PREEN_NOHDR },
+
 	/* Pass 1C: Scan directories for inodes with multiply-claimed blocks. */
 	{ PR_1C_PASS_HEADER,
 	  N_("Pass 1C: Scanning directories for @is with @m @bs\n"),
diff --git a/e2fsck/problem.h b/e2fsck/problem.h
index 61cbbef..bc9fa9c 100644
--- a/e2fsck/problem.h
+++ b/e2fsck/problem.h
@@ -628,6 +628,9 @@ struct problem_context {
 /* Error adjusting EA refcount */
 #define PR_1B_ADJ_EA_REFCOUNT	0x011007
 
+/* Duplicate/bad block range in inode */
+#define PR_1B_DUP_RANGE		0x011008
+
 /* Pass 1C: Scan directories for inodes with dup blocks. */
 #define PR_1C_PASS_HEADER	0x012000
 
diff --git a/tests/f_bbfile/expect.1 b/tests/f_bbfile/expect.1
index 1d639f6..ec1a36e 100644
--- a/tests/f_bbfile/expect.1
+++ b/tests/f_bbfile/expect.1
@@ -8,8 +8,8 @@ Relocating group 0's inode bitmap from 4 to 43...
 Running additional passes to resolve blocks claimed by more than one inode...
 Pass 1B: Rescanning for multiply-claimed blocks
 Multiply-claimed block(s) in inode 2: 21
-Multiply-claimed block(s) in inode 11: 9 10 11 12 13 14 15 16 17 18 19 20
-Multiply-claimed block(s) in inode 12: 25 26
+Multiply-claimed block(s) in inode 11: 9--20
+Multiply-claimed block(s) in inode 12: 25--26
 Pass 1C: Scanning directories for inodes with multiply-claimed blocks
 Pass 1D: Reconciling multiply-claimed blocks
 (There are 3 inodes containing multiply-claimed blocks.)
diff --git a/tests/f_dup/expect.1 b/tests/f_dup/expect.1
index e7128f3..075e62c 100644
--- a/tests/f_dup/expect.1
+++ b/tests/f_dup/expect.1
@@ -4,8 +4,8 @@ Pass 1: Checking inodes, blocks, and sizes
 
 Running additional passes to resolve blocks claimed by more than one inode...
 Pass 1B: Rescanning for multiply-claimed blocks
-Multiply-claimed block(s) in inode 12: 25 26
-Multiply-claimed block(s) in inode 13: 25 26
+Multiply-claimed block(s) in inode 12: 25--26
+Multiply-claimed block(s) in inode 13: 25--26
 Pass 1C: Scanning directories for inodes with multiply-claimed blocks
 Pass 1D: Reconciling multiply-claimed blocks
 (There are 2 inodes containing multiply-claimed blocks.)
diff --git a/tests/f_dup2/expect.1 b/tests/f_dup2/expect.1
index 0476005..69aa21b 100644
--- a/tests/f_dup2/expect.1
+++ b/tests/f_dup2/expect.1
@@ -4,9 +4,9 @@ Pass 1: Checking inodes, blocks, and sizes
 
 Running additional passes to resolve blocks claimed by more than one inode...
 Pass 1B: Rescanning for multiply-claimed blocks
-Multiply-claimed block(s) in inode 12: 25 26
-Multiply-claimed block(s) in inode 13: 25 26 57 58
-Multiply-claimed block(s) in inode 14: 57 58
+Multiply-claimed block(s) in inode 12: 25--26
+Multiply-claimed block(s) in inode 13: 25--26 57--58
+Multiply-claimed block(s) in inode 14: 57--58
 Pass 1C: Scanning directories for inodes with multiply-claimed blocks
 Pass 1D: Reconciling multiply-claimed blocks
 (There are 3 inodes containing multiply-claimed blocks.)
diff --git a/tests/f_dup_ba/expect.1 b/tests/f_dup_ba/expect.1
index f0ad457..f4581c4 100644
--- a/tests/f_dup_ba/expect.1
+++ b/tests/f_dup_ba/expect.1
@@ -6,12 +6,12 @@ Inode 16, i_blocks is 128, should be 896.  Fix? yes
 
 Running additional passes to resolve blocks claimed by more than one inode...
 Pass 1B: Rescanning for multiply-claimed blocks
-Multiply-claimed block(s) in inode 16: 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239
-Multiply-claimed block(s) in inode 17: 160 161
-Multiply-claimed block(s) in inode 18: 176 177
-Multiply-claimed block(s) in inode 19: 192 193
-Multiply-claimed block(s) in inode 20: 208 209
-Multiply-claimed block(s) in inode 21: 224 225
+Multiply-claimed block(s) in inode 16: 160--239
+Multiply-claimed block(s) in inode 17: 160--161
+Multiply-claimed block(s) in inode 18: 176--177
+Multiply-claimed block(s) in inode 19: 192--193
+Multiply-claimed block(s) in inode 20: 208--209
+Multiply-claimed block(s) in inode 21: 224--225
 Pass 1C: Scanning directories for inodes with multiply-claimed blocks
 Pass 1D: Reconciling multiply-claimed blocks
 (There are 6 inodes containing multiply-claimed blocks.)
diff --git a/tests/f_dup_resize/expect.1 b/tests/f_dup_resize/expect.1
index dd8fe05..aaf7769 100644
--- a/tests/f_dup_resize/expect.1
+++ b/tests/f_dup_resize/expect.1
@@ -4,8 +4,8 @@ Pass 1: Checking inodes, blocks, and sizes
 
 Running additional passes to resolve blocks claimed by more than one inode...
 Pass 1B: Rescanning for multiply-claimed blocks
-Multiply-claimed block(s) in inode 7: 4 5 6 7
-Multiply-claimed block(s) in inode 12: 4 5 6 7
+Multiply-claimed block(s) in inode 7: 4--7
+Multiply-claimed block(s) in inode 12: 4--7
 Pass 1C: Scanning directories for inodes with multiply-claimed blocks
 Pass 1D: Reconciling multiply-claimed blocks
 (There are 1 inodes containing multiply-claimed blocks.)
diff --git a/tests/f_dupfsblks/expect.1 b/tests/f_dupfsblks/expect.1
index 3f70109..6751986 100644
--- a/tests/f_dupfsblks/expect.1
+++ b/tests/f_dupfsblks/expect.1
@@ -8,8 +8,8 @@ Inode 13, i_size is 0, should be 2048.  Fix? yes
 
 Running additional passes to resolve blocks claimed by more than one inode...
 Pass 1B: Rescanning for multiply-claimed blocks
-Multiply-claimed block(s) in inode 12: 3 4 6 1
-Multiply-claimed block(s) in inode 13: 2 3
+Multiply-claimed block(s) in inode 12: 3--4 6 1
+Multiply-claimed block(s) in inode 13: 2--3
 Multiply-claimed block(s) in inode 14: 2
 Pass 1C: Scanning directories for inodes with multiply-claimed blocks
 Pass 1D: Reconciling multiply-claimed blocks
diff --git a/tests/f_dupsuper/expect.1 b/tests/f_dupsuper/expect.1
index 830370a..2107e2d 100644
--- a/tests/f_dupsuper/expect.1
+++ b/tests/f_dupsuper/expect.1
@@ -4,7 +4,7 @@ Pass 1: Checking inodes, blocks, and sizes
 
 Running additional passes to resolve blocks claimed by more than one inode...
 Pass 1B: Rescanning for multiply-claimed blocks
-Multiply-claimed block(s) in inode 12: 2 3 1
+Multiply-claimed block(s) in inode 12: 2--3 1
 Pass 1C: Scanning directories for inodes with multiply-claimed blocks
 Pass 1D: Reconciling multiply-claimed blocks
 (There are 1 inodes containing multiply-claimed blocks.)
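
The run coalescing done in the process_pass1b_block hunk above can be sketched in isolation. This is only an illustration: format_dup_runs and emit_run are hypothetical names that do not exist in e2fsprogs, the sketch tracks the first entry by index rather than using last_blk == 0 as a sentinel, and it assumes the final run is flushed once the block list ends.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Append one run to out: a lone block prints as " N", a longer run as " A--B". */
static void emit_run(char *out, size_t outlen,
		     unsigned long first, unsigned long last)
{
	size_t used = strlen(out);

	if (first == last)
		snprintf(out + used, outlen - used, " %lu", first);
	else
		snprintf(out + used, outlen - used, " %lu--%lu", first, last);
}

/*
 * Hypothetical helper: coalesce blocks[] (in the order they were seen)
 * into runs of consecutive block numbers and format them into out.
 */
static void format_dup_runs(const unsigned long *blocks, size_t n,
			    char *out, size_t outlen)
{
	unsigned long first = 0, last = 0;
	size_t i;

	assert(out != NULL && outlen > 0);
	out[0] = '\0';
	for (i = 0; i < n; i++) {
		if (i == 0) {
			first = blocks[i];
		} else if (last + 1 != blocks[i]) {
			/* Run broken: report the finished run, start a new one. */
			emit_run(out, outlen, first, last);
			first = blocks[i];
		}
		last = blocks[i];
	}
	if (n > 0)
		emit_run(out, outlen, first, last);	/* flush the final run */
}
```

Only strictly ascending neighbours are merged, so a list seen as 3 4 6 1 comes out as " 3--4 6 1", which is the shape of the updated f_dupfsblks expectation.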



* [PATCH 22/49] e2fsck: verify checksums after checking everything else
  2014-03-11  6:53 [PATCH 00/49] e2fsprogs patchbomb 3/14 Darrick J. Wong
                   ` (20 preceding siblings ...)
  2014-03-11  6:56 ` [PATCH 21/49] e2fsck: print runs of duplicate blocks instead of all of them Darrick J. Wong
@ 2014-03-11  6:56 ` Darrick J. Wong
  2014-03-11  6:56 ` [PATCH 23/49] e2fsck: fix the extended attribute checksum error message Darrick J. Wong
                   ` (24 subsequent siblings)
  46 siblings, 0 replies; 88+ messages in thread
From: Darrick J. Wong @ 2014-03-11  6:56 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

There's a particular problem with e2fsck's user interface where
checksum errors are concerned:  Fixing the first complaint about
a checksum problem results in the inode being cleared even if e2fsck
could otherwise have recovered it.  While this mode is useful for
cleaning the remaining broken crud off the filesystem, we could at
least default to checking everything /else/ and only complaining about
the incorrect checksum if fsck finds nothing else wrong.

So, plumb in a config option.  By default we check everything else
first and verify the checksum last, unless the user tells us otherwise.
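
As a rough sketch of how the option and the problem flag interact, the gate can be modelled outside e2fsck. The two constants carry the values this patch defines; apply_csum_token and act_on_problem are hypothetical helper names used only for illustration, not functions in e2fsprogs.

```c
#include <assert.h>
#include <string.h>

/* Values copied from the patch; the helper names below are hypothetical. */
#define E2F_OPT_CSUM_FIRST	0x4000u
#define PR_INITIAL_CSUM		0x200000u

/* Apply one extended-option token to the option mask; returns 0 if handled. */
static int apply_csum_token(const char *token, unsigned int *options)
{
	assert(token != NULL && options != NULL);
	if (strcmp(token, "strict_csums") == 0)
		*options |= E2F_OPT_CSUM_FIRST;
	else if (strcmp(token, "no_strict_csums") == 0)
		*options &= ~E2F_OPT_CSUM_FIRST;
	else
		return -1;	/* not a checksum-ordering token */
	return 0;
}

/*
 * Returns 1 if the problem should be acted on now, 0 if it is an
 * up-front checksum complaint that the user asked to defer.
 */
static int act_on_problem(unsigned int problem_flags, unsigned int options)
{
	if ((problem_flags & PR_INITIAL_CSUM) &&
	    !(options & E2F_OPT_CSUM_FIRST))
		return 0;
	return 1;
}
```

With an empty option mask (the default), a problem flagged PR_INITIAL_CSUM is skipped, while unflagged problems are unaffected either way.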

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 e2fsck/e2fsck.8.in      |   12 ++++++++++++
 e2fsck/e2fsck.conf.5.in |   20 ++++++++++++++++++++
 e2fsck/e2fsck.h         |    1 +
 e2fsck/problem.c        |   18 ++++++++++++++----
 e2fsck/problemP.h       |    1 +
 e2fsck/unix.c           |   11 +++++++++++
 6 files changed, 59 insertions(+), 4 deletions(-)


diff --git a/e2fsck/e2fsck.8.in b/e2fsck/e2fsck.8.in
index f5ed758..43ee063 100644
--- a/e2fsck/e2fsck.8.in
+++ b/e2fsck/e2fsck.8.in
@@ -207,6 +207,18 @@ option may prevent you from further manual data recovery.
 .BI nodiscard
 Do not attempt to discard free blocks and unused inode blocks. This option is
 exactly the opposite of discard option. This is set as default.
+.TP
+.BI strict_csums
+Verify each metadata object's checksum before checking any other fields
+in the metadata object.  If the verification fails, offer to clear the item,
+also before checking any of the other fields.  This option causes e2fsck to
+favor throwing away broken objects over trying to salvage them.
+.TP
+.BI no_strict_csums
+Perform all regular checks of a metadata object and only verify the checksum if
+no problems were found.  This option causes e2fsck to try to salvage slightly
+damaged metadata objects, at the cost of spending processing time on recovering
+data.  This is set as the default.
 .RE
 .TP
 .B \-f
diff --git a/e2fsck/e2fsck.conf.5.in b/e2fsck/e2fsck.conf.5.in
index 9ebfbbf..a8219a8 100644
--- a/e2fsck/e2fsck.conf.5.in
+++ b/e2fsck/e2fsck.conf.5.in
@@ -222,6 +222,26 @@ If this boolean relation is true, e2fsck will run as if the option
 .B -v
 is always specified.  This will cause e2fsck to print some additional
 information at the end of each full file system check.
+.TP
+.I strict_csums
+If this boolean relation is true, e2fsck will run as if
+.B -E strict_csums
+is set.  This causes e2fsck to verify each metadata object's checksum before
+checking any other fields in the metadata object.  If the verification
+fails, offer to clear the item, also before checking any of the other fields.
+This option causes e2fsck to favor throwing away broken objects over trying to
+salvage them.
+.IP
+If the boolean relation is false, e2fsck will run as if
+.B -E no_strict_csums
+is set.  In this case, e2fsck will perform all regular checks of a metadata
+object and only verify the checksum if no problems were found.  This option
+causes e2fsck to try to salvage slightly damaged metadata objects, at the cost
+of spending processing time on recovering data.
+.IP
+The default is for e2fsck to behave as if
+.B -E no_strict_csums
+is set.
 .SH THE [problems] STANZA
 Each tag in the
 .I [problems] 
diff --git a/e2fsck/e2fsck.h b/e2fsck/e2fsck.h
index dbd6ea8..d7a7be9 100644
--- a/e2fsck/e2fsck.h
+++ b/e2fsck/e2fsck.h
@@ -167,6 +167,7 @@ struct resource_track {
 #define E2F_OPT_FRAGCHECK	0x0800
 #define E2F_OPT_JOURNAL_ONLY	0x1000 /* only replay the journal */
 #define E2F_OPT_DISCARD		0x2000
+#define E2F_OPT_CSUM_FIRST	0x4000
 
 /*
  * E2fsck flags
diff --git a/e2fsck/problem.c b/e2fsck/problem.c
index 7f0ad6c..0999399 100644
--- a/e2fsck/problem.c
+++ b/e2fsck/problem.c
@@ -970,7 +970,7 @@ static struct e2fsck_problem problem_table[] = {
 	/* inode checksum does not match inode */
 	{ PR_1_INODE_CSUM_INVALID,
 	  N_("@i %i checksum does not match @i.  "),
-	  PROMPT_CLEAR, PR_PREEN_OK },
+	  PROMPT_CLEAR, PR_PREEN_OK | PR_INITIAL_CSUM },
 
 	/* inode passes checks, but checksum does not match inode */
 	{ PR_1_INODE_ONLY_CSUM_INVALID,
@@ -981,7 +981,7 @@ static struct e2fsck_problem problem_table[] = {
 	{ PR_1_EXTENT_CSUM_INVALID,
 	  N_("@i %i extent block checksum does not match extent\n\t(logical @b "
 	     "%c, @n physical @b %b, len %N)\n"),
-	  PROMPT_CLEAR, 0 },
+	  PROMPT_CLEAR, PR_INITIAL_CSUM },
 
 	/*
 	 * Inode extent block passes checks, but checksum does not match
@@ -996,7 +996,7 @@ static struct e2fsck_problem problem_table[] = {
 	{ PR_1_EA_BLOCK_CSUM_INVALID,
 	  N_("Extended attribute @a @b %b checksum for @i %i does not "
 	     "match.  "),
-	  PROMPT_CLEAR, 0 },
+	  PROMPT_CLEAR, PR_INITIAL_CSUM },
 
 	/*
 	 * Extended attribute block passes checks, but checksum for inode does
@@ -1470,7 +1470,7 @@ static struct e2fsck_problem problem_table[] = {
 	/* leaf node fails checksum */
 	{ PR_2_LEAF_NODE_CSUM_INVALID,
 	  N_("@d @i %i, %B, offset %N: @d fails checksum\n"),
-	  PROMPT_SALVAGE, PR_PREEN_OK },
+	  PROMPT_SALVAGE, PR_PREEN_OK | PR_INITIAL_CSUM },
 
 	/* leaf node has no checksum */
 	{ PR_2_LEAF_NODE_MISSING_CSUM,
@@ -1944,6 +1944,16 @@ int fix_problem(e2fsck_t ctx, problem_t code, struct problem_context *pctx)
 		printf(_("Unhandled error code (0x%x)!\n"), code);
 		return 0;
 	}
+
+	/*
+	 * If there is a problem with the initial csum verification and the
+	 * user told e2fsck to verify csums /after/ checking everything else,
+	 * then don't "fix" anything.
+	 */
+	if ((ptr->flags & PR_INITIAL_CSUM) &&
+	    !(ctx->options & E2F_OPT_CSUM_FIRST))
+		return 0;
+
 	if (!(ptr->flags & PR_CONFIG)) {
 		char	key[9], *new_desc = NULL;
 
diff --git a/e2fsck/problemP.h b/e2fsck/problemP.h
index 7944cd6..a983598 100644
--- a/e2fsck/problemP.h
+++ b/e2fsck/problemP.h
@@ -44,3 +44,4 @@ struct latch_descr {
 #define PR_CONFIG	0x080000 /* This problem has been customized
 				    from the config file */
 #define PR_FORCE_NO	0x100000 /* Force the answer to be no */
+#define PR_INITIAL_CSUM	0x200000 /* User can ignore initial csum check */
diff --git a/e2fsck/unix.c b/e2fsck/unix.c
index 11c2693..80ebdb1 100644
--- a/e2fsck/unix.c
+++ b/e2fsck/unix.c
@@ -692,6 +692,10 @@ static void parse_extended_opts(e2fsck_t ctx, const char *opts)
 			else
 				ctx->log_fn = string_copy(ctx, arg, 0);
 			continue;
+		} else if (strcmp(token, "strict_csums") == 0) {
+			ctx->options |= E2F_OPT_CSUM_FIRST;
+		} else if (strcmp(token, "no_strict_csums") == 0) {
+			ctx->options &= ~E2F_OPT_CSUM_FIRST;
 		} else {
 			fprintf(stderr, _("Unknown extended option: %s\n"),
 				token);
@@ -710,6 +714,8 @@ static void parse_extended_opts(e2fsck_t ctx, const char *opts)
 		fputs(("\tjournal_only\n"), stderr);
 		fputs(("\tdiscard\n"), stderr);
 		fputs(("\tnodiscard\n"), stderr);
+		fputs(("\tstrict_csums\n"), stderr);
+		fputs(("\tno_strict_csums\n"), stderr);
 		fputc('\n', stderr);
 		exit(1);
 	}
@@ -945,6 +951,11 @@ static errcode_t PRS(int argc, char *argv[], e2fsck_t *ret_ctx)
 	profile_set_syntax_err_cb(syntax_err_report);
 	profile_init(config_fn, &ctx->profile);
 
+	profile_get_boolean(ctx->profile, "options", "strict_csums", NULL,
+			    0, &c);
+	if (c)
+		ctx->options |= E2F_OPT_CSUM_FIRST;
+
 	profile_get_boolean(ctx->profile, "options", "report_time", 0, 0,
 			    &c);
 	if (c)



* [PATCH 23/49] e2fsck: fix the extended attribute checksum error message
  2014-03-11  6:53 [PATCH 00/49] e2fsprogs patchbomb 3/14 Darrick J. Wong
                   ` (21 preceding siblings ...)
  2014-03-11  6:56 ` [PATCH 22/49] e2fsck: verify checksums after checking everything else Darrick J. Wong
@ 2014-03-11  6:56 ` Darrick J. Wong
  2014-03-11  6:56 ` [PATCH 24/49] e2fsck: insert a missing dirent tail for checksums if possible Darrick J. Wong
                   ` (23 subsequent siblings)
  46 siblings, 0 replies; 88+ messages in thread
From: Darrick J. Wong @ 2014-03-11  6:56 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

Make the "EA block passes checks but fails checksum" message less
strange.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 e2fsck/problem.c |   12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)


diff --git a/e2fsck/problem.c b/e2fsck/problem.c
index 0999399..ec20bd1 100644
--- a/e2fsck/problem.c
+++ b/e2fsck/problem.c
@@ -992,19 +992,17 @@ static struct e2fsck_problem problem_table[] = {
 	     "extent\n\t(logical @b %c, @n physical @b %b, len %N)\n"),
 	  PROMPT_FIX, 0 },
 
-	/* Extended attribute block checksum for inode does not match. */
+	/* Extended attribute block checksum does not match. */
 	{ PR_1_EA_BLOCK_CSUM_INVALID,
-	  N_("Extended attribute @a @b %b checksum for @i %i does not "
-	     "match.  "),
+	  N_("@a @b %b checksum for @i %i does not match.  "),
 	  PROMPT_CLEAR, PR_INITIAL_CSUM },
 
 	/*
-	 * Extended attribute block passes checks, but checksum for inode does
-	 * not match.
+	 * Extended attribute block passes checks, but checksum does not
+	 * match.
 	 */
 	{ PR_1_EA_BLOCK_ONLY_CSUM_INVALID,
-	  N_("Extended attribute @a @b %b passes checks, but checksum for "
-	     "@i %i does not match.  "),
+	  N_("@a @b %b passes checks, but checksum does not match.  "),
 	  PROMPT_FIX, 0 },
 
 	/*



* [PATCH 24/49] e2fsck: insert a missing dirent tail for checksums if possible
  2014-03-11  6:53 [PATCH 00/49] e2fsprogs patchbomb 3/14 Darrick J. Wong
                   ` (22 preceding siblings ...)
  2014-03-11  6:56 ` [PATCH 23/49] e2fsck: fix the extended attribute checksum error message Darrick J. Wong
@ 2014-03-11  6:56 ` Darrick J. Wong
  2014-03-11  6:56 ` [PATCH 25/49] e2fsck: write dir blocks after new inode when reconstructing root/lost+found Darrick J. Wong
                   ` (22 subsequent siblings)
  46 siblings, 0 replies; 88+ messages in thread
From: Darrick J. Wong @ 2014-03-11  6:56 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

If e2fsck is writing a block of directory entries to disk, it should
adjust the dirents to add the dirent tail if one is missing.  It's not
a big deal if there's no space to do this since rehash (pass 3A) will
reconstruct directories for us.  However, we may as well avoid
unnecessary work.
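
The idea can be sketched on a simplified, host-endian model of a directory block. Everything named sk_* below is hypothetical and stands in for the real libext2fs structures and helpers; the 12-byte tail size and the 0xDE00 marker mirror the on-disk dirent tail, the space check here uses the last entry's own name length, and the checksum field is left at zero because computing it is out of scope.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SK_TAIL_SIZE	12u
#define SK_TAIL_MARKER	0xDE00u	/* reserved name_len value marking a tail */

struct sk_dirent {		/* 8-byte header; the name follows it */
	uint32_t inode;
	uint16_t rec_len;
	uint16_t name_len;
};

struct sk_tail {
	uint32_t reserved_zero1;
	uint16_t rec_len;
	uint16_t reserved_name_len;
	uint32_t checksum;
};

/* Smallest legal record: header plus name, rounded up to 4 bytes. */
static unsigned int sk_min_rec_len(unsigned int name_len)
{
	return (8u + name_len + 3u) & ~3u;
}

/* Returns 0 if the block now ends in a tail, -1 if there is no room. */
static int sk_insert_tail(unsigned char *block, unsigned int blocksize)
{
	unsigned int off = 0, tail_off = blocksize - SK_TAIL_SIZE;
	struct sk_dirent d;
	struct sk_tail t;

	assert(block != NULL && blocksize >= 32 && (blocksize & 3) == 0);
	for (;;) {		/* walk to the last entry before the tail slot */
		memcpy(&d, block + off, sizeof(d));
		if (d.rec_len < 8 || (d.rec_len & 3) ||
		    off + d.rec_len > blocksize)
			return -1;	/* corrupt chain, leave it to rehash */
		if (off + d.rec_len >= tail_off)
			break;
		off += d.rec_len;
	}
	if (off + d.rec_len == blocksize) {
		/* Last entry runs to the end of the block: shrink it. */
		if (d.rec_len < sk_min_rec_len(d.name_len & 0xff) + SK_TAIL_SIZE)
			return -1;
		d.rec_len -= SK_TAIL_SIZE;
		memcpy(block + off, &d, sizeof(d));
	} else if (off + d.rec_len != tail_off) {
		return -1;	/* entry ends inside the tail slot */
	}
	memset(&t, 0, sizeof(t));
	t.rec_len = SK_TAIL_SIZE;
	t.reserved_name_len = SK_TAIL_MARKER;
	memcpy(block + tail_off, &t, sizeof(t));
	return 0;
}
```

A failure return corresponds to the "no space" case above, where the directory would simply be left for rehash to rebuild.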

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 e2fsck/pass2.c |   39 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)


diff --git a/e2fsck/pass2.c b/e2fsck/pass2.c
index 5488c73..99b4042 100644
--- a/e2fsck/pass2.c
+++ b/e2fsck/pass2.c
@@ -739,6 +739,41 @@ static int is_last_entry(ext2_filsys fs, int inline_data_size,
 		return (offset < fs->blocksize - csum_size);
 }
 
+static errcode_t insert_dirent_tail(ext2_filsys fs, void *dirbuf)
+{
+	struct ext2_dir_entry *d;
+	void *top;
+	struct ext2_dir_entry_tail *t;
+	unsigned int rec_len;
+
+	d = dirbuf;
+	top = EXT2_DIRENT_TAIL(dirbuf, fs->blocksize);
+
+	rec_len = d->rec_len;
+	while (rec_len && !(rec_len & 0x3)) {
+		d = (struct ext2_dir_entry *)(((char *)d) + rec_len);
+		if (((void *)d) + d->rec_len >= top)
+			break;
+		rec_len = d->rec_len;
+	}
+
+	if (d != top) {
+		size_t min_size = EXT2_DIR_REC_LEN(
+				ext2fs_dirent_name_len(dirbuf));
+		if (min_size > d->rec_len - sizeof(struct ext2_dir_entry_tail))
+			return EXT2_ET_DIR_NO_SPACE_FOR_CSUM;
+		d->rec_len -= sizeof(struct ext2_dir_entry_tail);
+	}
+
+	t = (struct ext2_dir_entry_tail *)top;
+	if (t->det_reserved_zero1 ||
+	    t->det_rec_len != sizeof(struct ext2_dir_entry_tail) ||
+	    t->det_reserved_name_len != EXT2_DIR_NAME_LEN_CSUM)
+		ext2fs_initialize_dirent_tail(fs, t);
+
+	return 0;
+}
+
 static int check_dir_block(ext2_filsys fs,
 			   struct ext2_db_entry2 *db,
 			   void *priv_data)
@@ -1275,7 +1310,11 @@ skip_checksum:
 		if (EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
 				EXT4_FEATURE_RO_COMPAT_METADATA_CSUM) &&
 		    is_leaf &&
+		    !inline_data_size &&
 		    !ext2fs_dirent_has_tail(fs, (struct ext2_dir_entry *)buf))
+		{
+			if (insert_dirent_tail(fs, buf) == 0)
+				goto write_and_fix;
 			e2fsck_rehash_dir_later(ctx, ino);
 
 write_and_fix:



* [PATCH 25/49] e2fsck: write dir blocks after new inode when reconstructing root/lost+found
  2014-03-11  6:53 [PATCH 00/49] e2fsprogs patchbomb 3/14 Darrick J. Wong
                   ` (23 preceding siblings ...)
  2014-03-11  6:56 ` [PATCH 24/49] e2fsck: insert a missing dirent tail for checksums if possible Darrick J. Wong
@ 2014-03-11  6:56 ` Darrick J. Wong
  2014-03-11  6:56 ` [PATCH 26/49] tests: add test for corrupted checksummed root directory block Darrick J. Wong
                   ` (21 subsequent siblings)
  46 siblings, 0 replies; 88+ messages in thread
From: Darrick J. Wong @ 2014-03-11  6:56 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

If e2fsck has to rebuild the root or lost+found directories, be sure
to write the new directory block after the inode, because the dir
block write has to read the on-disk inode for the generation number.
This solves the problem where e2fsck will cough out complaints about
checksums failing on lost+found and later crash.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 e2fsck/pass3.c |   85 ++++++++++++++++++++++++++++++--------------------------
 1 file changed, 45 insertions(+), 40 deletions(-)


diff --git a/e2fsck/pass3.c b/e2fsck/pass3.c
index 6f7f855..efc0d49 100644
--- a/e2fsck/pass3.c
+++ b/e2fsck/pass3.c
@@ -188,28 +188,6 @@ static void check_root(e2fsck_t ctx)
 	ext2fs_mark_bb_dirty(fs);
 
 	/*
-	 * Now let's create the actual data block for the inode
-	 */
-	pctx.errcode = ext2fs_new_dir_block(fs, EXT2_ROOT_INO, EXT2_ROOT_INO,
-					    &block);
-	if (pctx.errcode) {
-		pctx.str = "ext2fs_new_dir_block";
-		fix_problem(ctx, PR_3_CREATE_ROOT_ERROR, &pctx);
-		ctx->flags |= E2F_FLAG_ABORT;
-		return;
-	}
-
-	pctx.errcode = ext2fs_write_dir_block4(fs, blk, block, 0,
-					       EXT2_ROOT_INO);
-	if (pctx.errcode) {
-		pctx.str = "ext2fs_write_dir_block4";
-		fix_problem(ctx, PR_3_CREATE_ROOT_ERROR, &pctx);
-		ctx->flags |= E2F_FLAG_ABORT;
-		return;
-	}
-	ext2fs_free_mem(&block);
-
-	/*
 	 * Set up the inode structure
 	 */
 	memset(&inode, 0, sizeof(inode));
@@ -232,6 +210,30 @@ static void check_root(e2fsck_t ctx)
 	}
 
 	/*
+	 * Now let's create the actual data block for the inode.
+	 * Due to metadata_csum, we must write the dir blocks AFTER
+	 * the inode has been written to disk!
+	 */
+	pctx.errcode = ext2fs_new_dir_block(fs, EXT2_ROOT_INO, EXT2_ROOT_INO,
+					    &block);
+	if (pctx.errcode) {
+		pctx.str = "ext2fs_new_dir_block";
+		fix_problem(ctx, PR_3_CREATE_ROOT_ERROR, &pctx);
+		ctx->flags |= E2F_FLAG_ABORT;
+		return;
+	}
+
+	pctx.errcode = ext2fs_write_dir_block4(fs, blk, block, 0,
+					       EXT2_ROOT_INO);
+	ext2fs_free_mem(&block);
+	if (pctx.errcode) {
+		pctx.str = "ext2fs_write_dir_block4";
+		fix_problem(ctx, PR_3_CREATE_ROOT_ERROR, &pctx);
+		ctx->flags |= E2F_FLAG_ABORT;
+		return;
+	}
+
+	/*
 	 * Miscellaneous bookkeeping...
 	 */
 	e2fsck_add_dir_info(ctx, EXT2_ROOT_INO, EXT2_ROOT_INO);
@@ -449,24 +451,6 @@ unlink:
 	ext2fs_inode_alloc_stats2(fs, ino, +1, 1);
 
 	/*
-	 * Now let's create the actual data block for the inode
-	 */
-	retval = ext2fs_new_dir_block(fs, ino, EXT2_ROOT_INO, &block);
-	if (retval) {
-		pctx.errcode = retval;
-		fix_problem(ctx, PR_3_ERR_LPF_NEW_DIR_BLOCK, &pctx);
-		return 0;
-	}
-
-	retval = ext2fs_write_dir_block4(fs, blk, block, 0, ino);
-	ext2fs_free_mem(&block);
-	if (retval) {
-		pctx.errcode = retval;
-		fix_problem(ctx, PR_3_ERR_LPF_WRITE_BLOCK, &pctx);
-		return 0;
-	}
-
-	/*
 	 * Set up the inode structure
 	 */
 	memset(&inode, 0, sizeof(inode));
@@ -486,6 +470,27 @@ unlink:
 		fix_problem(ctx, PR_3_CREATE_LPF_ERROR, &pctx);
 		return 0;
 	}
+
+	/*
+	 * Now let's create the actual data block for the inode.
+	 * Due to metadata_csum, the directory block MUST be written
+	 * after the inode is written to disk!
+	 */
+	retval = ext2fs_new_dir_block(fs, ino, EXT2_ROOT_INO, &block);
+	if (retval) {
+		pctx.errcode = retval;
+		fix_problem(ctx, PR_3_ERR_LPF_NEW_DIR_BLOCK, &pctx);
+		return 0;
+	}
+
+	retval = ext2fs_write_dir_block4(fs, blk, block, 0, ino);
+	ext2fs_free_mem(&block);
+	if (retval) {
+		pctx.errcode = retval;
+		fix_problem(ctx, PR_3_ERR_LPF_WRITE_BLOCK, &pctx);
+		return 0;
+	}
+
 	/*
 	 * Finally, create the directory link
 	 */



* [PATCH 26/49] tests: add test for corrupted checksummed root directory block
  2014-03-11  6:53 [PATCH 00/49] e2fsprogs patchbomb 3/14 Darrick J. Wong
                   ` (24 preceding siblings ...)
  2014-03-11  6:56 ` [PATCH 25/49] e2fsck: write dir blocks after new inode when reconstructing root/lost+found Darrick J. Wong
@ 2014-03-11  6:56 ` Darrick J. Wong
  2014-03-11  6:56 ` [PATCH 27/49] dumpe2fs: add switch to disable checksum verification Darrick J. Wong
                   ` (20 subsequent siblings)
  46 siblings, 0 replies; 88+ messages in thread
From: Darrick J. Wong @ 2014-03-11  6:56 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

fsck crashes if we take a checksummed filesystem, zap the root
directory block, and try to fix things.

If we trash the root directory block, e2fsck will find inode 11 (the
old lost+found) and try to attach it to l+f.  The lost+found checker
also fails to find l+f and tries to add one to the root dir.  The root
dir is not found but is recreated with incorrect checksums, so linking
in the l+f dir fails and the l+f '..' entry isn't set.  Since both
dirs now fail checksum verification, they're both referred to rehash
to have that fixed, but because l+f doesn't have a '..' entry, rehash
crashes because l+f has < 2 entries.

On a checksumming filesystem, the routines in e2fsck that recreate
/lost+found and / must write the new directory block *after* the inode
has been written to disk because the checksum depends on i_generation.
The previous patch fixes this, so add a regression test.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 tests/f_rebuild_csum_rootdir/expect.1 |  311 +++++++++++++++++++++++++++++++++
 tests/f_rebuild_csum_rootdir/expect.2 |    7 +
 tests/f_rebuild_csum_rootdir/image.gz |  Bin
 tests/f_rebuild_csum_rootdir/name     |    1 
 4 files changed, 319 insertions(+)
 create mode 100644 tests/f_rebuild_csum_rootdir/expect.1
 create mode 100644 tests/f_rebuild_csum_rootdir/expect.2
 create mode 100644 tests/f_rebuild_csum_rootdir/image.gz
 create mode 100644 tests/f_rebuild_csum_rootdir/name


diff --git a/tests/f_rebuild_csum_rootdir/expect.1 b/tests/f_rebuild_csum_rootdir/expect.1
new file mode 100644
index 0000000..6b5c47b
--- /dev/null
+++ b/tests/f_rebuild_csum_rootdir/expect.1
@@ -0,0 +1,311 @@
+Pass 1: Checking inodes, blocks, and sizes
+Pass 2: Checking directory structure
+Directory inode 2, block #0, offset 0: directory has no checksum
+Fix? yes
+
+Directory inode 2, block #0, offset 0: directory corrupted
+Salvage? yes
+
+Missing '.' in directory inode 2.
+Fix? yes
+
+Setting filetype for entry '.' in ??? (2) to 2.
+Missing '..' in directory inode 2.
+Fix? yes
+
+Setting filetype for entry '..' in ??? (2) to 2.
+Pass 3: Checking directory connectivity
+'..' in / (2) is <The NULL inode> (0), should be / (2).
+Fix? yes
+
+Unconnected directory inode 11 (/???)
+Connect to /lost+found? yes
+
+/lost+found not found.  Create? yes
+
+Pass 3A: Optimizing directories
+Pass 4: Checking reference counts
+Inode 11 ref count is 3, should be 2.  Fix? yes
+
+Unattached inode 12
+Connect to /lost+found? yes
+
+Inode 12 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 13
+Connect to /lost+found? yes
+
+Inode 13 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 14
+Connect to /lost+found? yes
+
+Inode 14 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 15
+Connect to /lost+found? yes
+
+Inode 15 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 16
+Connect to /lost+found? yes
+
+Inode 16 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 17
+Connect to /lost+found? yes
+
+Inode 17 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 18
+Connect to /lost+found? yes
+
+Inode 18 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 19
+Connect to /lost+found? yes
+
+Inode 19 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 20
+Connect to /lost+found? yes
+
+Inode 20 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 21
+Connect to /lost+found? yes
+
+Inode 21 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 22
+Connect to /lost+found? yes
+
+Inode 22 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 23
+Connect to /lost+found? yes
+
+Inode 23 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 24
+Connect to /lost+found? yes
+
+Inode 24 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 25
+Connect to /lost+found? yes
+
+Inode 25 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 26
+Connect to /lost+found? yes
+
+Inode 26 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 27
+Connect to /lost+found? yes
+
+Inode 27 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 28
+Connect to /lost+found? yes
+
+Inode 28 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 29
+Connect to /lost+found? yes
+
+Inode 29 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 30
+Connect to /lost+found? yes
+
+Inode 30 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 31
+Connect to /lost+found? yes
+
+Inode 31 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 32
+Connect to /lost+found? yes
+
+Inode 32 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 33
+Connect to /lost+found? yes
+
+Inode 33 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 34
+Connect to /lost+found? yes
+
+Inode 34 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 35
+Connect to /lost+found? yes
+
+Inode 35 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 36
+Connect to /lost+found? yes
+
+Inode 36 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 37
+Connect to /lost+found? yes
+
+Inode 37 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 38
+Connect to /lost+found? yes
+
+Inode 38 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 39
+Connect to /lost+found? yes
+
+Inode 39 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 40
+Connect to /lost+found? yes
+
+Inode 40 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 41
+Connect to /lost+found? yes
+
+Inode 41 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 42
+Connect to /lost+found? yes
+
+Inode 42 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 43
+Connect to /lost+found? yes
+
+Inode 43 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 44
+Connect to /lost+found? yes
+
+Inode 44 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 45
+Connect to /lost+found? yes
+
+Inode 45 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 46
+Connect to /lost+found? yes
+
+Inode 46 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 47
+Connect to /lost+found? yes
+
+Inode 47 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 48
+Connect to /lost+found? yes
+
+Inode 48 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 49
+Connect to /lost+found? yes
+
+Inode 49 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 50
+Connect to /lost+found? yes
+
+Inode 50 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 51
+Connect to /lost+found? yes
+
+Inode 51 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 52
+Connect to /lost+found? yes
+
+Inode 52 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 53
+Connect to /lost+found? yes
+
+Inode 53 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 54
+Connect to /lost+found? yes
+
+Inode 54 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 55
+Connect to /lost+found? yes
+
+Inode 55 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 56
+Connect to /lost+found? yes
+
+Inode 56 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 57
+Connect to /lost+found? yes
+
+Inode 57 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 58
+Connect to /lost+found? yes
+
+Inode 58 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 59
+Connect to /lost+found? yes
+
+Inode 59 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 60
+Connect to /lost+found? yes
+
+Inode 60 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 61
+Connect to /lost+found? yes
+
+Inode 61 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 62
+Connect to /lost+found? yes
+
+Inode 62 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 63
+Connect to /lost+found? yes
+
+Inode 63 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 64
+Connect to /lost+found? yes
+
+Inode 64 ref count is 2, should be 1.  Fix? yes
+
+Unattached zero-length inode 65.  Clear? yes
+
+Unattached inode 66
+Connect to /lost+found? yes
+
+Inode 66 ref count is 2, should be 1.  Fix? yes
+
+Unattached inode 67
+Connect to /lost+found? yes
+
+Inode 67 ref count is 2, should be 1.  Fix? yes
+
+Pass 5: Checking group summary information
+
+test_filesys: ***** FILE SYSTEM WAS MODIFIED *****
+test_filesys: 67/512 files (1.5% non-contiguous), 1127/2048 blocks
+Exit status is 1
diff --git a/tests/f_rebuild_csum_rootdir/expect.2 b/tests/f_rebuild_csum_rootdir/expect.2
new file mode 100644
index 0000000..033f1bf
--- /dev/null
+++ b/tests/f_rebuild_csum_rootdir/expect.2
@@ -0,0 +1,7 @@
+Pass 1: Checking inodes, blocks, and sizes
+Pass 2: Checking directory structure
+Pass 3: Checking directory connectivity
+Pass 4: Checking reference counts
+Pass 5: Checking group summary information
+test_filesys: 67/512 files (1.5% non-contiguous), 1127/2048 blocks
+Exit status is 0
diff --git a/tests/f_rebuild_csum_rootdir/image.gz b/tests/f_rebuild_csum_rootdir/image.gz
new file mode 100644
index 0000000000000000000000000000000000000000..a32fd4431a44560b20033d43836000ef22ce977f
GIT binary patch
literal 12476
zcmeI2c~leUyT@t$QMaFB>q3zwwrWuk5Ktr{q?HOkRKy~R2oeP}Dq@f&VRc#;a6z#s
zrpl69L{tz2#3&I4TtEa8ma;FSr9dEopd<uln0fBld(XM|o_o$czd!p&@=tP}yz|cc
zeBS5%exEsK7#C;g9HESNemYIjJ@bmu!CU3;GrDISaQ)YkC7*op=Y<*7pKbba-qmkE
z{p=r`FV7KVy-rRy3bf?&zWaXpYvZqH{p0*kI-Bftzdp6>+ba_$YR?N_+<)}zf`To5
z=hb$P-kkHF`HVWm`KluLK6~u-bcIqdR9ibDd9NH992~rn;rhk?^7~s8DqO8i9iH5G
zbTwI3I@!Bs+3X;vX#e`4%+zo0IIWB#)puBe<k4)1jR@a~ZElv_JmzO_wa#H8NfkT$
zv*p#67WPB&=KV$mH4)-VMq?elNFE}uYs-01N_4+AeAe?m#pQuxWM!6(s7&6qu=g8u
zDwWuhHCTP6qz5~DIUa2>J>5bO@g0*d-aQQe^5?moSuer{AFys?L8)aG_kI{eJy^)P
zyT8XU=k4shrP+9NwN81tFNv)(=H1H!mgoJP2cHO%ch#7zGPbpr$L9Nrqg?{@baZrj
z7=DQ5$~mJa>EDrpV>9Wm*tcinA`+HmmArRVmgyjicYCcr`AoB!{-=C;#g%1~!(FiP
zSW-xA_nO{mbkOmx6f7yFD<_KjIAI2S3@Tv&g0i>y-OR|18&;2&WoE5M#^Z<9AJmLF
z^ehjUPY>zoJSv5)#RD3j(mLTd>GrTDx=P+^pOMRdcD%G#oX3w9sV^saD<@o<ZdEM!
zL`UaGK;p1zPX_Hbvgpf4dzv-BA(7v~eLP`HM6rnc@ARU;tTnHvq2=RUB9X~~oXk`_
z{!&CbimW=RxoDF-RAmyYDsFHKIO?S24qdHieo;~V$so)@%!5aNnvr#zUzNc3?=wvf
z^!(8Fa}e;J=dMXxAJCk9iAe>du06UR8_0+X*{@-rkd%)I-$zb$K3ucyba4~mO$m<H
zpnuFgSu-d)xuf=Xo97?6Z~hQD&J9*kw}A@G?&(83_ED|ehg00U<Tpm~NmAqg+R4+;
zzhdSS$+Ml{H}in;DwS&5G=f6gOD04Ih7*Srm~@KWYEPh6z<om5teDj3ox{-%dDa>b
ze%Mw!iENGr)XL3hj4Ma7*0;bG@&TA{OLhpm{9_oyr0_RXsI)Ux!BCs2U)#yu*t|TJ
z(eYYp`J7jHRlkB|nF-l1oPgQaM$UKi%td@9)R+yfzt0F~44&&|=UKL3zZAi#M9Pi&
z63@PLxf?xRLlFt68?vv1<;a1v2ISU7XQ^FDl?uIl&~x1q(Tmynd98PJD2R%VmEXRV
zSDe}Cz}xQP9<x<#Gp$KDXQld4R<^ivz_Ze;f6fCk7tg%@;hsa-fal}8ev)4spY^wl
z94Hm+JQ^I7d#%&Vcwb77yMa;0KP}erdA9uGi11)<2lAGOS6I#9V6!Kdv}R-bBTr_L
z$KdgiygZ+?sPZs(rD&Z<bZnnPsq4L*tWpnG=|i_+ThBvY{d|7-qxeqsyAHL*3VyM`
z$9}_xCVBT`9y6O49xme51{-+WcXu~8>|Dv;^*ARzTa<4`<}7P3!;9*+>>Ca<%n_t#
z^xWc+Rg$##=a+KgkF>?a4DcV<S!U~I2BF;4bic8#a3))vs5|E4V=vxtD4}NJU&Gw(
z!$*fl9E<q7Y!Xjz35#JSo=ztg*JSVJR#_Bx-n{(s`kSezDWgBl%1k!jI_P;=5xlL6
z+d9~ncUv=U(Q}MfXM%NDiJwMn8!qhJd()>QxkqpB3{*LswzIu7+S^C4s@pjIQryz)
z0&6txc+g}(@#BoV@R$5Y%VRu!Y%S}ChnU4DcrHFZE-{6i;Sc@kRvbzYy}>LXW*}3A
zHysR&jt#VawpJ{!m5f}UOwmoL3=Q*=-?&uV$sSxS;7EhEAx-UF4l#b}@ud6=&f$BF
zEnUs&e&1n{SdMMnhM2u(ef=5NCyiYT`Pd^pQE65%FD%+%{{2?A&=1*J=ssN7(J$Rz
z73;D!YDzKEqT<XCLYjKIZzy~WTT5TL7P5z}?Kjj6H+Oay?5C*WN{K{ry)wKk)zcoy
zvd%AwThWy_9;z`g-Z#5(t7~#~bJ^!v8RPvUBjGnwd=|%S7`PJ_`7}B98F#P63ek13
zeMk21l?{et+yIUjGd&@@#X?heLa?j}A2Ca@>1=99HQFQ2-xa-g!`%CRl824@HNg&-
z2Gg^}T`dz?)5i63_{}wwjb~=YU0r?Sht&RFpPH~#j?g<@VZ6`0FHhCN@J;O>yXkHz
zbsMN0jqN{QmvGz;X$zc1i=N@TF<?CTW6CUX=9lHqozcH6BeKxZmD(LWR@JQ6@4Lsa
zasFvP!|ny~9v^VI8)C6bbTJR2-7vX0wRQWMVSPx+;8y&45?voSEvC4bj~YhF(fHC2
zcblh6pM>|f@YBWqU4IIYF!sH*4h9~Cs-gb9gx#C|pB_M!86DhlPZPT2PJ_@~YP(2h
z|Jx|{xG%pu@}q_p<#O>d1%!lEkiv1vQ+To2fMEGfG><NBKR7=?2`65E^9ncpQ-RuL
z0)^Kj@`?gNJZl7s+$Gdz1JWJS$05W=L3vaRvaC3;+<+7dHX%-43QD-J3%FPsdC>;f
z*sWEuLZVe9W^3MpvLzHZP=!5Nt<vkT;K3ht7t~5&5i}-JS)K8e6MQa4M5i?1$|TSm
z4anHJAqw=WM!49sJ@5$~OKAYb?Rt2Z?XO|^j8ZHn2M!vL`bPS=vTY>~ymghpkhUXH
z&TwL@6Vx{SW(l($DYxo&WV180TIiAT;}^P-iAAsQ?0yA1xdC~J<;Yp$mm`e5Apl31
zM1HM|JvLSkqZtK6(&~vRWqBHjNo2@iUp@AD9H|V@6(H056zHl<5cSow<hgAmr*)xR
zm{EiraR&GMLQuVV(?+6(eu^>1p+6O{-5J{YZo#-rh!h^m@h21hRIsl%AOj=KYOJq-
z;9EQ)0}e^392;EZJr9@0t>wW^-Bv`et{ri6h6@)rfvf4O5!hOxMqh-ETS{fzz4P%T
z^cx-+oW7t&tT|9@K+fmSAVrBu-Ozh#4BpXM)X4N0qgJb!C-y3+fLYsNY3Mjx*Z|Mp
z)FUUH6x0A6a_>{Xac7Wg&QSZ11!Aa3CzOmyTT*=lvctx)D-G=EjjH?}dmeP1N&*);
zXRAzTaoYrsr@tBzdW=qD4NSr-P^FiDlIpGL_19nykM@8Sli<GD3qHnps>%6o#IIry
zIVPb}ZE$|6jS3+ZZKUO*XRw~0ZMj^HvJ<FkqEexZVU*udj!x0uTpmM^;NV&?%qR*y
z4o}tv5H8ZO14PwxB7h6b+_&TTy`kgS^#(MUBSX~VZDgl@EZIAeMv6EpESxJtN0Qq}
z;mQe0Fh6u0vCKq^I#5|g8|l6%mQ=n>Bl&?UW!QchYx1%TgPkF`@LCZ_5-BsO9w`+v
z#K`v*N_3K5vAFOt!ulMC@b55<F#aoF#IEaZC(JH`-Y`1rLn_F@scbVBx^Q-nQZPvu
ziByK(DVeh8p!Cfa!~#~$B$RaFsE39K#3w+*W`~di*V0JVYZ<Gc1bV#|`)e%CCUAC`
zgo?4og;SelDE_F8447>J?)2t!ouRu~$;z3s=D|&R1I(EO@=F8y!s|Yn<F65Vgg`>5
zgdz%iAk0aIh96Z>O|u*+X>~hYn#kUWNu;*4fVi4C0M(z7uTW(~gfskMd4@{Lz1Ypl
zdP_0F!Y*GSY;ZKzM~10nZRFKi7SP|PK$c`8IrO1>alJ}0T}S836)oFuJX`Cnl$U)t
zyqN503pOF@+$v4zoqCp^0SSW07yQs9SKFyK$LP@8Y2_4*g9uO(hJgzR0iA$-3`y<D
zVGt@eC;)meKzz@Tkko0eVv)XV;<crT)$bn%VZV^_Qa~M*Q8%4Y+XJapdBE^-r|125
zK9TwVr>Sd%c++YZwqf_$JGLt7j-!FEgHf8%6$%?|^{_r%I+k)Dc`wPG`aU^vi-~>c
zFt+`dh4EvDa`zTmRUyZ&I?quMT{e1i6_lC8ppCRKNkR-4#UFXdi)Ph;NARRqR|S{3
zFS=&%9_LIScI_26&Zt91x&m3fy|e~ymrUhj`zNs-gqxkw^|?zM4~WAfE#h|AEgOhn
zhaQlzA04cve#tnwe?JE6ee}ULH~oinlDGl#Xb!CJhdHgtXU*|^|9rvoDddff)c`CU
zB%yNciCXjlr1b0oGqs2kt#KW%WqvA&i+p{7I$746Ru4G=!pH-JcbwA`D&&b3Ay~UW
z&PXkSXNi<>t{z>00UGy-9R<`0CLxu|*x@H+$nB*(<Z0O+BtSyVw#L^_2FQfA8+cG+
zPRjvhOGX?4`gG|ZQdxR0z`Yz6i=qEi4O$vD&c4=wyz_Db+9@=QLe~OnCX=v~$WYW#
zg++%bs61N6<7mZtBSY}AdPwN>*9d6YIZuzOmSLp}aKjRR4dlM<g)%0g&rvB)9F`$(
zRk9=`b&(viq$ybAT;E0b^k984^8U~(Ttf>xt+9>4PH+p(b)!BN1l*<%Uabs``d&Q=
zFA9jx^_DHjx3t9^RI%(=s*s6$ZKRae8+S=N;_VE&Rc9%C9&G@-=}W8V6fzo+uEu)S
z*dtjDoJvA=(kl6L0~FUdK}o8=hEcY;fV#&d1c55Xd(99owJ0DG7R6F5a}J~!kU=L+
za7K>}4}R7af-&6=cN&nErAD~@Z}||H`}TwK*EG`Wpb8!v;hgk^Vo;R$lTBgcmUS|4
zRjY+wyDLYad(^)Z`Cm5C$!a~eD;KV$ku}!1va7U&5^v;zL|25=*0v+`DYl)x37$5H
zz+qPisi{vRTLV@3Pgbae;T{SKUEmA$t2yu?k-D?)7EA=uN?~<(517)qrZ%unB-A4{
z6Ky2R!VVZ2MNpDRMeTHg^G$TFv=%acE&@D}(%&6VT{NT()++=@R&juDgR@anl@iZv
zBlSkQ5O^&b%y%bP=|vEqNHzA}LexD4#K5B1DHc8Yk^w0?{sk^_Siys9r=)OM(vE(e
ziA1VNhGo*lZbVb>Pgc<_#mXR(8zJLHFXKU!ZZ;Iqz9EreYDqn)ivl$0o>2;;rTKbp
zE1WdOO_hdX)O2MF%ZvjL3`hrQ0(DkXHNwh$JN5C=q|+J~4gZtO?=cZ0#3?zeYCwTH
zWWt>oCx{Td9D!{W!)&^Z0Y7JiQ;ak?6cEDIPbkD+LfKp68s-|6FxEw-frV+5DQ!$P
zneb-J3C0GhMi@Vgl3XFJ#$GMG*!)}IuCE$~1@R%|yO2yrO23O5dzwa?+2F9Nn~Lf?
z*G9gW)dMWLpCJs$R!;#kpwJh?`BR;L6(HXka=g6Q5Ok><iK|cx>1u-|necLmV4#ij
z#E5r=8v4^nO`wW3d9fZ*A1NTJ7Wq=bn?=x*NQrZlNZwy&pn8w^{xpas7eRuz5^4V*
zD!SGgR_bD~gRXXV%Q$Md39@zuxWoTr3>wYe^agZd_AOW{_t&`7zM1rt>GGO1AlYT+
zDsFVRf^wy+ySAnszIBG(%^J}2F)3pFJA$koqa~mvpe3Lspe3Lspe3Lspe3Lspe3Ls
zpe3Lspe3Lspe3Lspe3Lspe3Lspe3Lspe3Lspe67(B|wQB?3U1P9P7+eL4^IM3;(9e
W)Gq69OyI|Sozg9bf2Chr*ZB{=uW}{;

literal 0
HcmV?d00001

diff --git a/tests/f_rebuild_csum_rootdir/name b/tests/f_rebuild_csum_rootdir/name
new file mode 100644
index 0000000..b246f48
--- /dev/null
+++ b/tests/f_rebuild_csum_rootdir/name
@@ -0,0 +1 @@
+force fsck to rebuild a corrupted rootdir w/ metadata_csum



* [PATCH 27/49] dumpe2fs: add switch to disable checksum verification
  2014-03-11  6:53 [PATCH 00/49] e2fsprogs patchbomb 3/14 Darrick J. Wong
                   ` (25 preceding siblings ...)
  2014-03-11  6:56 ` [PATCH 26/49] tests: add test for corrupted checksummed root directory block Darrick J. Wong
@ 2014-03-11  6:56 ` Darrick J. Wong
  2014-03-11  6:56 ` [PATCH 28/49] mke2fs: set block_validity as a default mount option Darrick J. Wong
                   ` (19 subsequent siblings)
  46 siblings, 0 replies; 88+ messages in thread
From: Darrick J. Wong @ 2014-03-11  6:56 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

Add a -n switch to turn off checksum verification.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 misc/dumpe2fs.c |    8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)


diff --git a/misc/dumpe2fs.c b/misc/dumpe2fs.c
index ae54f8a..45eddaf 100644
--- a/misc/dumpe2fs.c
+++ b/misc/dumpe2fs.c
@@ -582,7 +582,9 @@ int main (int argc, char ** argv)
 	if (argc && *argv)
 		program_name = *argv;
 
-	while ((c = getopt (argc, argv, "bfhixVo:")) != EOF) {
+	flags = EXT2_FLAG_JOURNAL_DEV_OK | EXT2_FLAG_SOFTSUPP_FEATURES |
+		EXT2_FLAG_64BITS;
+	while ((c = getopt(argc, argv, "bfhixVo:n")) != EOF) {
 		switch (c) {
 		case 'b':
 			print_badblocks++;
@@ -608,6 +610,9 @@ int main (int argc, char ** argv)
 		case 'x':
 			hex_format++;
 			break;
+		case 'n':
+			flags |= EXT2_FLAG_IGNORE_CSUM_ERRORS;
+			break;
 		default:
 			usage();
 		}
@@ -615,7 +620,6 @@ int main (int argc, char ** argv)
 	if (optind > argc - 1)
 		usage();
 	device_name = argv[optind++];
-	flags = EXT2_FLAG_JOURNAL_DEV_OK | EXT2_FLAG_SOFTSUPP_FEATURES | EXT2_FLAG_64BITS;
 	if (force)
 		flags |= EXT2_FLAG_FORCE;
 	if (image_dump)



* [PATCH 28/49] mke2fs: set block_validity as a default mount option
  2014-03-11  6:53 [PATCH 00/49] e2fsprogs patchbomb 3/14 Darrick J. Wong
                   ` (26 preceding siblings ...)
  2014-03-11  6:56 ` [PATCH 27/49] dumpe2fs: add switch to disable checksum verification Darrick J. Wong
@ 2014-03-11  6:56 ` Darrick J. Wong
  2014-03-11  6:57 ` [PATCH 29/49] libext2fs: support allocating uninit blocks in bmap2() Darrick J. Wong
                   ` (18 subsequent siblings)
  46 siblings, 0 replies; 88+ messages in thread
From: Darrick J. Wong @ 2014-03-11  6:56 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

The block_validity mount option spot-checks block allocations against
a bitmap of known group metadata blocks.  This helps us to prevent
self-inflicted catastrophic failures such as trying to "share"
critical metadata (think bitmaps) with file data, which usually
results in filesystem destruction.

In order to test the overhead of the mount option, I re-used the speed
tests in the metadata checksum testing script.  In short, the program
creates what looks like 15 copies of a kernel source tree, except that
it uses fallocate to strip out the overhead of writing the file data
so that we can focus on metadata overhead.  On a 64G RAM disk, the
overhead was generally about 0.9% and at most 1.6%.  On a 160G USB
disk, the overhead was about 0.8% and peaked at 1.2%.

When I changed the test to write out files instead of merely
fallocating space, the overhead was negligible.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
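To make "spot-checks block allocations against a bitmap" concrete, here is a
minimal userspace sketch of the idea.  It is an assumption-laden toy, not the
kernel's implementation: struct demo_meta_map, demo_mark_metadata() and
demo_data_range_valid() are hypothetical names, and the filesystem is only
64 blocks long.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_NBLOCKS 64	/* hypothetical tiny filesystem */

/* One bit per block; a set bit marks group metadata (bitmaps, inode tables). */
struct demo_meta_map {
	uint8_t bits[DEMO_NBLOCKS / 8];
};

static void demo_mark_metadata(struct demo_meta_map *m, unsigned blk)
{
	assert(blk < DEMO_NBLOCKS);
	m->bits[blk / 8] |= (uint8_t)(1u << (blk % 8));
}

/* Return 1 if [start, start + count) is in range and touches no metadata. */
static int demo_data_range_valid(const struct demo_meta_map *m,
				 unsigned start, unsigned count)
{
	unsigned b;

	if (count == 0 || start >= DEMO_NBLOCKS ||
	    count > DEMO_NBLOCKS - start)
		return 0;
	for (b = start; b < start + count; b++)
		if (m->bits[b / 8] & (1u << (b % 8)))
			return 0;
	return 1;
}
```

A data extent that overlaps a marked block is rejected before it can be
"shared" with metadata, which is the failure mode described above.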
 misc/mke2fs.conf.in |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


diff --git a/misc/mke2fs.conf.in b/misc/mke2fs.conf.in
index 4c5dba7..de0250d 100644
--- a/misc/mke2fs.conf.in
+++ b/misc/mke2fs.conf.in
@@ -1,6 +1,6 @@
 [defaults]
 	base_features = sparse_super,filetype,resize_inode,dir_index,ext_attr
-	default_mntopts = acl,user_xattr
+	default_mntopts = acl,user_xattr,block_validity
 	enable_periodic_fsck = 0
 	blocksize = 4096
 	inode_size = 256



* [PATCH 29/49] libext2fs: support allocating uninit blocks in bmap2()
  2014-03-11  6:53 [PATCH 00/49] e2fsprogs patchbomb 3/14 Darrick J. Wong
                   ` (27 preceding siblings ...)
  2014-03-11  6:56 ` [PATCH 28/49] mke2fs: set block_validity as a default mount option Darrick J. Wong
@ 2014-03-11  6:57 ` Darrick J. Wong
  2014-03-11  6:57 ` [PATCH 30/49] libext2fs: file IO routines should handle uninit blocks Darrick J. Wong
                   ` (17 subsequent siblings)
  46 siblings, 0 replies; 88+ messages in thread
From: Darrick J. Wong @ 2014-03-11  6:57 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

In order to support fallocate, we need to be able to have
ext2fs_bmap2() allocate blocks and put them into uninitialized
extents.  There's a flag to do this in the extent code, but it's not
exposed to the bmap2 interface, so plumb that in.  Eventually fuse2fs
or somebody will use it.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
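The mkjournal.c hunk below tries a discard first and only trusts it after
reading the block back.  A rough sketch of that "discard, verify, fall back
to writing zeroes" pattern follows.  The in-memory struct demo_dev and the
demo_* helpers are hypothetical; they are not the libext2fs io_channel API.

```c
#include <assert.h>
#include <string.h>

#define DEMO_BLKSZ 16
#define DEMO_NBLKS 4

/* Hypothetical in-memory "device"; not a libext2fs io_channel. */
struct demo_dev {
	unsigned char data[DEMO_NBLKS][DEMO_BLKSZ];
	int claims_discard_zeroes;	/* what the device advertises */
	int really_zeroes;		/* what discard actually does */
};

static void demo_discard(struct demo_dev *d, unsigned blk)
{
	if (d->really_zeroes)
		memset(d->data[blk], 0, DEMO_BLKSZ);
	/* otherwise the stale contents stay readable */
}

/*
 * Zero one block.  Returns 1 if the discard was sufficient, 0 if an
 * explicit write of zeroes was needed.
 */
static int demo_zero_block(struct demo_dev *d, unsigned blk)
{
	static const unsigned char zeroes[DEMO_BLKSZ];

	assert(d != NULL && blk < DEMO_NBLKS);
	if (d->claims_discard_zeroes) {
		demo_discard(d, blk);
		if (memcmp(d->data[blk], zeroes, DEMO_BLKSZ) == 0)
			return 1;
		d->claims_discard_zeroes = 0;	/* stop trusting the claim */
	}
	memset(d->data[blk], 0, DEMO_BLKSZ);
	return 0;
}
```

Clearing the advertised flag after one failed verification means later calls
skip the discard attempt entirely, which mirrors what the hunk does with
CHANNEL_FLAGS_DISCARD_ZEROES.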
 lib/ext2fs/bmap.c      |   24 ++++++++++++++++++++++--
 lib/ext2fs/ext2fs.h    |    1 +
 lib/ext2fs/mkjournal.c |   17 +++++++++++++++++
 3 files changed, 40 insertions(+), 2 deletions(-)


diff --git a/lib/ext2fs/bmap.c b/lib/ext2fs/bmap.c
index c1d0e6f..a4dc8ef 100644
--- a/lib/ext2fs/bmap.c
+++ b/lib/ext2fs/bmap.c
@@ -72,6 +72,11 @@ static _BMAP_INLINE_ errcode_t block_ind_bmap(ext2_filsys fs, int flags,
 					    block_buf + fs->blocksize, &b);
 		if (retval)
 			return retval;
+		if (flags & BMAP_UNINIT) {
+			retval = ext2fs_zero_blocks2(fs, b, 1, NULL, NULL);
+			if (retval)
+				return retval;
+		}
 
 #ifdef WORDS_BIGENDIAN
 		((blk_t *) block_buf)[nr] = ext2fs_swab32(b);
@@ -214,10 +219,13 @@ static errcode_t extent_bmap(ext2_filsys fs, ext2_ino_t ino,
 	errcode_t		retval = 0;
 	blk64_t			blk64 = 0;
 	int			alloc = 0;
+	int			set_flags;
+
+	set_flags = bmap_flags & BMAP_UNINIT ? EXT2_EXTENT_SET_BMAP_UNINIT : 0;
 
 	if (bmap_flags & BMAP_SET) {
 		retval = ext2fs_extent_set_bmap(handle, block,
-						*phys_blk, 0);
+						*phys_blk, set_flags);
 		return retval;
 	}
 	retval = ext2fs_extent_goto(handle, block);
@@ -254,7 +262,7 @@ got_block:
 		alloc++;
 	set_extent:
 		retval = ext2fs_extent_set_bmap(handle, block,
-						blk64, 0);
+						blk64, set_flags);
 		if (retval) {
 			ext2fs_block_alloc_stats2(fs, blk64, -1);
 			return retval;
@@ -345,6 +353,12 @@ errcode_t ext2fs_bmap2(ext2_filsys fs, ext2_ino_t ino, struct ext2_inode *inode,
 		goto done;
 	}
 
+	if ((bmap_flags & BMAP_SET) && (bmap_flags & BMAP_UNINIT)) {
+		retval = ext2fs_zero_blocks2(fs, *phys_blk, 1, NULL, NULL);
+		if (retval)
+			goto done;
+	}
+
 	if (block < EXT2_NDIR_BLOCKS) {
 		if (bmap_flags & BMAP_SET) {
 			b = *phys_blk;
@@ -360,6 +374,12 @@ errcode_t ext2fs_bmap2(ext2_filsys fs, ext2_ino_t ino, struct ext2_inode *inode,
 			retval = ext2fs_alloc_block(fs, b, block_buf, &b);
 			if (retval)
 				goto done;
+			if (bmap_flags & BMAP_UNINIT) {
+				retval = ext2fs_zero_blocks2(fs, b, 1, NULL,
+							     NULL);
+				if (retval)
+					goto done;
+			}
 			inode_bmap(inode, block) = b;
 			blocks_alloc++;
 			*phys_blk = b;
diff --git a/lib/ext2fs/ext2fs.h b/lib/ext2fs/ext2fs.h
index 599c972..819a14a 100644
--- a/lib/ext2fs/ext2fs.h
+++ b/lib/ext2fs/ext2fs.h
@@ -527,6 +527,7 @@ typedef struct ext2_icount *ext2_icount_t;
  */
 #define BMAP_ALLOC	0x0001
 #define BMAP_SET	0x0002
+#define BMAP_UNINIT	0x0004
 
 /*
  * Returned flags from ext2fs_bmap
diff --git a/lib/ext2fs/mkjournal.c b/lib/ext2fs/mkjournal.c
index 884d9c0..ecc3912 100644
--- a/lib/ext2fs/mkjournal.c
+++ b/lib/ext2fs/mkjournal.c
@@ -174,6 +174,23 @@ errcode_t ext2fs_zero_blocks2(ext2_filsys fs, blk64_t blk, int num,
 			return ENOMEM;
 		memset(buf, 0, fs->blocksize * STRIDE_LENGTH);
 	}
+
+	/* Try discard, if it zeroes data... */
+	if (io_channel_discard_zeroes_data(fs->io)) {
+		memset(buf + fs->blocksize, 0, fs->blocksize);
+		retval = io_channel_discard(fs->io, blk, num);
+		if (retval)
+			goto skip_discard;
+		retval = io_channel_read_blk64(fs->io, blk, 1, buf);
+		if (retval)
+			goto skip_discard;
+		if (memcmp(buf, buf + fs->blocksize, fs->blocksize) == 0)
+			return 0;
+		/* Hah!  Discard doesn't zero! */
+		fs->io->flags &= ~CHANNEL_FLAGS_DISCARD_ZEROES;
+	}
+skip_discard:
+
 	/* OK, do the write loop */
 	j=0;
 	while (j < num) {



* [PATCH 30/49] libext2fs: file IO routines should handle uninit blocks
  2014-03-11  6:53 [PATCH 00/49] e2fsprogs patchbomb 3/14 Darrick J. Wong
                   ` (28 preceding siblings ...)
  2014-03-11  6:57 ` [PATCH 29/49] libext2fs: support allocating uninit blocks in bmap2() Darrick J. Wong
@ 2014-03-11  6:57 ` Darrick J. Wong
  2014-03-11  6:57 ` [PATCH 31/49] resize2fs: convert fs to and from 64bit mode Darrick J. Wong
                   ` (16 subsequent siblings)
  46 siblings, 0 replies; 88+ messages in thread
From: Darrick J. Wong @ 2014-03-11  6:57 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

The file IO routines do not handle uninit blocks at all.  The read
method should check for the uninit flag and return a buffer of zeroes,
and the write routine should convert unwritten extents.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
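The read-side rule (a mapped but uninitialized block must read back as
zeroes, never as whatever is on disk) can be sketched in isolation.  The
names here are hypothetical: struct demo_mapping, DEMO_RET_UNINIT and
demo_load_block() only model the shape of the load_buffer() change.

```c
#include <assert.h>
#include <string.h>

#define DEMO_BLKSZ 8
#define DEMO_RET_UNINIT 0x1	/* hypothetical analogue of BMAP_RET_UNINIT */

struct demo_mapping {
	const unsigned char *phys;	/* NULL means a hole */
	int ret_flags;
};

/* Fill buf with what a reader should see for this logical block. */
static void demo_load_block(const struct demo_mapping *map, unsigned char *buf)
{
	assert(map != NULL && buf != NULL);
	if (map->phys && !(map->ret_flags & DEMO_RET_UNINIT))
		memcpy(buf, map->phys, DEMO_BLKSZ);	/* written data */
	else
		memset(buf, 0, DEMO_BLKSZ);	/* hole or unwritten extent */
}
```

Holes and unwritten extents take the same branch on purpose: in both cases
the on-disk contents, if any, are not file data.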
 lib/ext2fs/fileio.c |   24 ++++++++++++++++++++++--
 1 file changed, 22 insertions(+), 2 deletions(-)


diff --git a/lib/ext2fs/fileio.c b/lib/ext2fs/fileio.c
index 14eaed3..1e386f8 100644
--- a/lib/ext2fs/fileio.c
+++ b/lib/ext2fs/fileio.c
@@ -123,6 +123,8 @@ errcode_t ext2fs_file_flush(ext2_file_t file)
 {
 	errcode_t	retval;
 	ext2_filsys fs;
+	int		ret_flags;
+	blk64_t		dontcare;
 
 	EXT2_CHECK_MAGIC(file, EXT2_ET_MAGIC_EXT2_FILE);
 	fs = file->fs;
@@ -131,6 +133,22 @@ errcode_t ext2fs_file_flush(ext2_file_t file)
 	    !(file->flags & EXT2_FILE_BUF_DIRTY))
 		return 0;
 
+	/* Is this an uninit block? */
+	if (file->physblock && file->inode.i_flags & EXT4_EXTENTS_FL) {
+		retval = ext2fs_bmap2(fs, file->ino, &file->inode, BMAP_BUFFER,
+				      0, file->blockno, &ret_flags, &dontcare);
+		if (retval)
+			return retval;
+		if (ret_flags & BMAP_RET_UNINIT) {
+			retval = ext2fs_bmap2(fs, file->ino, &file->inode,
+					      BMAP_BUFFER, BMAP_SET,
+					      file->blockno, 0,
+					      &file->physblock);
+			if (retval)
+				return retval;
+		}
+	}
+
 	/*
 	 * OK, the physical block hasn't been allocated yet.
 	 * Allocate it.
@@ -185,15 +203,17 @@ static errcode_t load_buffer(ext2_file_t file, int dontfill)
 {
 	ext2_filsys	fs = file->fs;
 	errcode_t	retval;
+	int		ret_flags;
 
 	if (!(file->flags & EXT2_FILE_BUF_VALID)) {
 		retval = ext2fs_bmap2(fs, file->ino, &file->inode,
-				     BMAP_BUFFER, 0, file->blockno, 0,
+				     BMAP_BUFFER, 0, file->blockno, &ret_flags,
 				     &file->physblock);
 		if (retval)
 			return retval;
 		if (!dontfill) {
-			if (file->physblock) {
+			if (file->physblock &&
+			    !(ret_flags & BMAP_RET_UNINIT)) {
 				retval = io_channel_read_blk64(fs->io,
 							       file->physblock,
 							       1, file->buf);



* [PATCH 31/49] resize2fs: convert fs to and from 64bit mode
  2014-03-11  6:53 [PATCH 00/49] e2fsprogs patchbomb 3/14 Darrick J. Wong
                   ` (29 preceding siblings ...)
  2014-03-11  6:57 ` [PATCH 30/49] libext2fs: file IO routines should handle uninit blocks Darrick J. Wong
@ 2014-03-11  6:57 ` Darrick J. Wong
  2014-03-11  6:57 ` [PATCH 32/49] resize2fs: when toggling 64bit, don't free in-use bg data clusters Darrick J. Wong
                   ` (15 subsequent siblings)
  46 siblings, 0 replies; 88+ messages in thread
From: Darrick J. Wong @ 2014-03-11  6:57 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

resize2fs does its magic by loading a filesystem, duplicating the
in-memory image of that fs, moving relevant blocks out of the way of
whatever new metadata get created, and finally writing everything back
out to disk.  Enabling 64bit mode enlarges the group descriptors,
which makes resize2fs a reasonable vehicle for taking care of the rest
of the bookkeeping requirements, so add to resize2fs the ability to
convert a filesystem to 64bit mode and back.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
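The core of the descriptor conversion is a strided copy between two record
sizes.  Below is a minimal sketch of that step, assuming plain byte arrays
rather than real group descriptors; demo_resize_descs() is a hypothetical
helper, and unlike the patch it uses unsigned char pointers so that the
pointer arithmetic is standard C.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Copy `count` fixed-size records from old_size-byte slots into
 * new_size-byte slots.  Returns a calloc'd array that the caller must
 * free, or NULL on allocation failure.
 */
static unsigned char *demo_resize_descs(const unsigned char *old, size_t count,
					size_t old_size, size_t new_size)
{
	size_t copy = old_size < new_size ? old_size : new_size;
	unsigned char *out;
	size_t i;

	assert(old != NULL && count > 0 && old_size > 0 && new_size > 0);
	out = calloc(count, new_size);	/* tail of grown slots stays zero */
	if (!out)
		return NULL;
	for (i = 0; i < count; i++)
		memcpy(out + i * new_size, old + i * old_size, copy);
	return out;
}
```

Growing leaves the new upper half of each record zeroed; shrinking keeps only
the leading bytes, which is why the conversion is refused on filesystems of
2^32 blocks or more.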
 resize/main.c         |   40 ++++++-
 resize/resize2fs.8.in |   18 +++
 resize/resize2fs.c    |  282 ++++++++++++++++++++++++++++++++++++++++++++++++-
 resize/resize2fs.h    |    3 +
 4 files changed, 336 insertions(+), 7 deletions(-)


diff --git a/resize/main.c b/resize/main.c
index 2b7abff..e37521a 100644
--- a/resize/main.c
+++ b/resize/main.c
@@ -42,7 +42,7 @@ static char *device_name, *io_options;
 static void usage (char *prog)
 {
 	fprintf (stderr, _("Usage: %s [-d debug_flags] [-f] [-F] [-M] [-P] "
-			   "[-p] device [new_size]\n\n"), prog);
+			   "[-p] device [-b|-s|new_size]\n\n"), prog);
 
 	exit (1);
 }
@@ -200,7 +200,7 @@ int main (int argc, char ** argv)
 	if (argc && *argv)
 		program_name = *argv;
 
-	while ((c = getopt (argc, argv, "d:fFhMPpS:")) != EOF) {
+	while ((c = getopt(argc, argv, "d:fFhMPpS:bs")) != EOF) {
 		switch (c) {
 		case 'h':
 			usage(program_name);
@@ -226,6 +226,12 @@ int main (int argc, char ** argv)
 		case 'S':
 			use_stride = atoi(optarg);
 			break;
+		case 'b':
+			flags |= RESIZE_ENABLE_64BIT;
+			break;
+		case 's':
+			flags |= RESIZE_DISABLE_64BIT;
+			break;
 		default:
 			usage(program_name);
 		}
@@ -384,6 +390,10 @@ int main (int argc, char ** argv)
 		if (sys_page_size > fs->blocksize)
 			new_size &= ~((sys_page_size / fs->blocksize)-1);
 	}
+	/* If changing 64bit, don't change the filesystem size. */
+	if (flags & (RESIZE_DISABLE_64BIT | RESIZE_ENABLE_64BIT)) {
+		new_size = ext2fs_blocks_count(fs->super);
+	}
 	if (!EXT2_HAS_INCOMPAT_FEATURE(fs->super,
 				       EXT4_FEATURE_INCOMPAT_64BIT)) {
 		/* Take 16T down to 2^32-1 blocks */
@@ -435,7 +445,31 @@ int main (int argc, char ** argv)
 			fs->blocksize / 1024, new_size);
 		exit(1);
 	}
-	if (new_size == ext2fs_blocks_count(fs->super)) {
+	if ((flags & RESIZE_DISABLE_64BIT) && (flags & RESIZE_ENABLE_64BIT)) {
+		fprintf(stderr, _("Cannot set and unset 64bit feature.\n"));
+		exit(1);
+	} else if (flags & (RESIZE_DISABLE_64BIT | RESIZE_ENABLE_64BIT)) {
+		new_size = ext2fs_blocks_count(fs->super);
+		if (new_size >= (1ULL << 32)) {
+			fprintf(stderr, _("Cannot change the 64bit feature "
+				"on a filesystem that is larger than "
+				"2^32 blocks.\n"));
+			exit(1);
+		}
+		if (mount_flags & EXT2_MF_MOUNTED) {
+			fprintf(stderr, _("Cannot change the 64bit feature "
+				"while the filesystem is mounted.\n"));
+			exit(1);
+		}
+		if (flags & RESIZE_ENABLE_64BIT &&
+		    !EXT2_HAS_INCOMPAT_FEATURE(fs->super,
+				EXT3_FEATURE_INCOMPAT_EXTENTS)) {
+			fprintf(stderr, _("Please enable the extents feature "
+				"with tune2fs before enabling the 64bit "
+				"feature.\n"));
+			exit(1);
+		}
+	} else if (new_size == ext2fs_blocks_count(fs->super)) {
 		fprintf(stderr, _("The filesystem is already %llu blocks "
 			"long.  Nothing to do!\n\n"), new_size);
 		exit(0);
diff --git a/resize/resize2fs.8.in b/resize/resize2fs.8.in
index a1f3099..1c75816 100644
--- a/resize/resize2fs.8.in
+++ b/resize/resize2fs.8.in
@@ -8,7 +8,7 @@ resize2fs \- ext2/ext3/ext4 file system resizer
 .SH SYNOPSIS
 .B resize2fs
 [
-.B \-fFpPM
+.B \-fFpPMbs
 ]
 [
 .B \-d
@@ -85,8 +85,21 @@ to shrink the size of filesystem.  Then you may use
 to shrink the size of the partition.  When shrinking the size of
 the partition, make sure you do not make it smaller than the new size
 of the ext2 filesystem!
+.PP
+The
+.B \-b
+and
+.B \-s
+options enable and disable the 64bit feature, respectively.  The resize2fs
+program will, of course, take care of resizing the block group descriptors
+and moving other data blocks out of the way, as needed.  It is not possible
+to resize the filesystem concurrent with changing the 64bit status.
 .SH OPTIONS
 .TP
+.B \-b
+Turns on the 64bit feature, resizes the group descriptors as necessary, and
+moves other metadata out of the way.
+.TP
 .B \-d \fIdebug-flags
 Turns on various resize2fs debugging features, if they have been compiled
 into the binary.
@@ -126,6 +139,9 @@ of what the program is doing.
 .B \-P
 Print the minimum size of the filesystem and exit.
 .TP
+.B \-s
+Turns off the 64bit feature and frees blocks that are no longer in use.
+.TP
 .B \-S \fIRAID-stride
 The
 .B resize2fs
diff --git a/resize/resize2fs.c b/resize/resize2fs.c
index f5f1337..cf5bef2 100644
--- a/resize/resize2fs.c
+++ b/resize/resize2fs.c
@@ -56,6 +56,9 @@ static errcode_t mark_table_blocks(ext2_filsys fs,
 static errcode_t clear_sparse_super2_last_group(ext2_resize_t rfs);
 static errcode_t reserve_sparse_super2_last_group(ext2_resize_t rfs,
 						 ext2fs_block_bitmap meta_bmap);
+static errcode_t resize_group_descriptors(ext2_resize_t rfs, blk64_t new_size);
+static errcode_t move_bg_metadata(ext2_resize_t rfs);
+static errcode_t zero_high_bits_in_inodes(ext2_resize_t rfs);
 
 /*
  * Some helper CPP macros
@@ -122,13 +125,30 @@ errcode_t resize_fs(ext2_filsys fs, blk64_t *new_size, int flags,
 	if (retval)
 		goto errout;
 
+	init_resource_track(&rtrack, "resize_group_descriptors", fs->io);
+	retval = resize_group_descriptors(rfs, *new_size);
+	if (retval)
+		goto errout;
+	print_resource_track(rfs, &rtrack, fs->io);
+
+	init_resource_track(&rtrack, "move_bg_metadata", fs->io);
+	retval = move_bg_metadata(rfs);
+	if (retval)
+		goto errout;
+	print_resource_track(rfs, &rtrack, fs->io);
+
+	init_resource_track(&rtrack, "zero_high_bits_in_metadata", fs->io);
+	retval = zero_high_bits_in_inodes(rfs);
+	if (retval)
+		goto errout;
+	print_resource_track(rfs, &rtrack, fs->io);
+
 	init_resource_track(&rtrack, "adjust_superblock", fs->io);
 	retval = adjust_superblock(rfs, *new_size);
 	if (retval)
 		goto errout;
 	print_resource_track(rfs, &rtrack, fs->io);
 
-
 	init_resource_track(&rtrack, "fix_uninit_block_bitmaps 2", fs->io);
 	fix_uninit_block_bitmaps(rfs->new_fs);
 	print_resource_track(rfs, &rtrack, fs->io);
@@ -231,6 +251,259 @@ errout:
 	return retval;
 }
 
+/* Toggle 64bit mode */
+static errcode_t resize_group_descriptors(ext2_resize_t rfs, blk64_t new_size)
+{
+	void *o, *n, *new_group_desc;
+	dgrp_t i;
+	int copy_size;
+	errcode_t retval;
+
+	if (!(rfs->flags & (RESIZE_DISABLE_64BIT | RESIZE_ENABLE_64BIT)))
+		return 0;
+
+	if (new_size != ext2fs_blocks_count(rfs->new_fs->super) ||
+	    ext2fs_blocks_count(rfs->new_fs->super) >= (1ULL << 32) ||
+	    (rfs->flags & RESIZE_DISABLE_64BIT &&
+	     rfs->flags & RESIZE_ENABLE_64BIT))
+		return EXT2_ET_INVALID_ARGUMENT;
+
+	if (rfs->flags & RESIZE_DISABLE_64BIT) {
+		rfs->new_fs->super->s_feature_incompat &=
+				~EXT4_FEATURE_INCOMPAT_64BIT;
+		rfs->new_fs->super->s_desc_size = EXT2_MIN_DESC_SIZE;
+	} else if (rfs->flags & RESIZE_ENABLE_64BIT) {
+		rfs->new_fs->super->s_feature_incompat |=
+				EXT4_FEATURE_INCOMPAT_64BIT;
+		rfs->new_fs->super->s_desc_size = EXT2_MIN_DESC_SIZE_64BIT;
+	}
+
+	if (EXT2_DESC_SIZE(rfs->old_fs->super) ==
+	    EXT2_DESC_SIZE(rfs->new_fs->super))
+		return 0;
+
+	o = rfs->new_fs->group_desc;
+	rfs->new_fs->desc_blocks = ext2fs_div_ceil(
+			rfs->old_fs->group_desc_count,
+			EXT2_DESC_PER_BLOCK(rfs->new_fs->super));
+	retval = ext2fs_get_arrayzero(rfs->new_fs->desc_blocks,
+				      rfs->old_fs->blocksize, &new_group_desc);
+	if (retval)
+		return retval;
+
+	n = new_group_desc;
+
+	if (EXT2_DESC_SIZE(rfs->old_fs->super) <=
+	    EXT2_DESC_SIZE(rfs->new_fs->super))
+		copy_size = EXT2_DESC_SIZE(rfs->old_fs->super);
+	else
+		copy_size = EXT2_DESC_SIZE(rfs->new_fs->super);
+	for (i = 0; i < rfs->old_fs->group_desc_count; i++) {
+		memcpy(n, o, copy_size);
+		n += EXT2_DESC_SIZE(rfs->new_fs->super);
+		o += EXT2_DESC_SIZE(rfs->old_fs->super);
+	}
+
+	ext2fs_free_mem(&rfs->new_fs->group_desc);
+	rfs->new_fs->group_desc = new_group_desc;
+
+	for (i = 0; i < rfs->old_fs->group_desc_count; i++)
+		ext2fs_group_desc_csum_set(rfs->new_fs, i);
+
+	return 0;
+}
+
+/* Move bitmaps/inode tables out of the way. */
+static errcode_t move_bg_metadata(ext2_resize_t rfs)
+{
+	dgrp_t i;
+	blk64_t b, c, d;
+	ext2fs_block_bitmap old_map, new_map;
+	int old, new;
+	errcode_t retval;
+	int zero = 0, one = 1;
+
+	if (!(rfs->flags & (RESIZE_DISABLE_64BIT | RESIZE_ENABLE_64BIT)))
+		return 0;
+
+	retval = ext2fs_allocate_block_bitmap(rfs->old_fs, "oldfs", &old_map);
+	if (retval)
+		return retval;
+
+	retval = ext2fs_allocate_block_bitmap(rfs->new_fs, "newfs", &new_map);
+	if (retval)
+		goto out;
+
+	/* Construct bitmaps of super/descriptor blocks in old and new fs */
+	for (i = 0; i < rfs->old_fs->group_desc_count; i++) {
+		retval = ext2fs_super_and_bgd_loc2(rfs->old_fs, i, &b, &c, &d,
+						   NULL);
+		if (retval)
+			goto out;
+		ext2fs_mark_block_bitmap2(old_map, b);
+		ext2fs_mark_block_bitmap2(old_map, c);
+		ext2fs_mark_block_bitmap2(old_map, d);
+
+		retval = ext2fs_super_and_bgd_loc2(rfs->new_fs, i, &b, &c, &d,
+						   NULL);
+		if (retval)
+			goto out;
+		ext2fs_mark_block_bitmap2(new_map, b);
+		ext2fs_mark_block_bitmap2(new_map, c);
+		ext2fs_mark_block_bitmap2(new_map, d);
+	}
+
+	/* Find changes in block allocations for bg metadata */
+	for (b = 0;
+	     b < ext2fs_blocks_count(rfs->new_fs->super);
+	     b += EXT2FS_CLUSTER_RATIO(rfs->new_fs)) {
+		old = ext2fs_test_block_bitmap2(old_map, b);
+		new = ext2fs_test_block_bitmap2(new_map, b);
+
+		if (old && !new)
+			ext2fs_unmark_block_bitmap2(rfs->new_fs->block_map, b);
+		else if (!old && new)
+			; /* empty ext2fs_mark_block_bitmap2(new_map, b); */
+		else
+			ext2fs_unmark_block_bitmap2(new_map, b);
+	}
+	/* new_map now shows blocks that have been newly allocated. */
+
+	/* Move any conflicting bitmaps and inode tables */
+	for (i = 0; i < rfs->old_fs->group_desc_count; i++) {
+		b = ext2fs_block_bitmap_loc(rfs->new_fs, i);
+		if (ext2fs_test_block_bitmap2(new_map, b))
+			ext2fs_block_bitmap_loc_set(rfs->new_fs, i, 0);
+
+		b = ext2fs_inode_bitmap_loc(rfs->new_fs, i);
+		if (ext2fs_test_block_bitmap2(new_map, b))
+			ext2fs_inode_bitmap_loc_set(rfs->new_fs, i, 0);
+
+		c = ext2fs_inode_table_loc(rfs->new_fs, i);
+		for (b = 0; b < rfs->new_fs->inode_blocks_per_group; b++) {
+			if (ext2fs_test_block_bitmap2(new_map, b + c)) {
+				ext2fs_inode_table_loc_set(rfs->new_fs, i, 0);
+				break;
+			}
+		}
+	}
+
+out:
+	if (old_map)
+		ext2fs_free_block_bitmap(old_map);
+	if (new_map)
+		ext2fs_free_block_bitmap(new_map);
+	return retval;
+}
+
+/* Zero out the high bits of extent fields */
+static errcode_t zero_high_bits_in_extents(ext2_filsys fs, ext2_ino_t ino,
+				 struct ext2_inode *inode)
+{
+	ext2_extent_handle_t	handle;
+	struct ext2fs_extent	extent;
+	int			op = EXT2_EXTENT_ROOT;
+	errcode_t		errcode;
+
+	if (!(inode->i_flags & EXT4_EXTENTS_FL))
+		return 0;
+
+	errcode = ext2fs_extent_open(fs, ino, &handle);
+	if (errcode)
+		return errcode;
+
+	while (1) {
+		errcode = ext2fs_extent_get(handle, op, &extent);
+		if (errcode)
+			break;
+
+		op = EXT2_EXTENT_NEXT_SIB;
+
+		if (extent.e_pblk >= (1ULL << 32)) {
+			extent.e_pblk &= (1ULL << 32) - 1;
+			errcode = ext2fs_extent_replace(handle, 0, &extent);
+			if (errcode)
+				break;
+		}
+	}
+
+	/* Ok if we run off the end */
+	if (errcode == EXT2_ET_EXTENT_NO_NEXT)
+		errcode = 0;
+	return errcode;
+}
+
+/* Zero out the high bits of inodes. */
+static errcode_t zero_high_bits_in_inodes(ext2_resize_t rfs)
+{
+	ext2_filsys	fs = rfs->new_fs;
+	int length = EXT2_INODE_SIZE(fs->super);
+	struct ext2_inode *inode = NULL;
+	ext2_inode_scan	scan = NULL;
+	errcode_t	retval;
+	ext2_ino_t	ino;
+	blk64_t		file_acl_block;
+	int		inode_dirty;
+
+	if (!(rfs->flags & (RESIZE_DISABLE_64BIT | RESIZE_ENABLE_64BIT)))
+		return 0;
+
+	if (fs->super->s_creator_os != EXT2_OS_LINUX)
+		return 0;
+
+	retval = ext2fs_open_inode_scan(fs, 0, &scan);
+	if (retval)
+		return retval;
+
+	retval = ext2fs_get_mem(length, &inode);
+	if (retval)
+		goto out;
+
+	do {
+		retval = ext2fs_get_next_inode_full(scan, &ino, inode, length);
+		if (retval)
+			goto out;
+		if (!ino)
+			break;
+		if (!ext2fs_test_inode_bitmap2(fs->inode_map, ino))
+			continue;
+
+		/*
+		 * Here's how we deal with high block number fields:
+		 *
+		 *  - i_size_high has been written out with i_size_lo
+		 *    since the ext2 days, so no conversion is needed.
+		 *
+		 *  - i_blocks_hi is guarded by both the huge_file feature and
+		 *    inode flags and has always been written out with
+		 *    i_blocks_lo if the feature is set.  The field is only
+		 *    ever read if both feature and inode flag are set, so
+		 *    we don't need to zero it now.
+		 *
+		 *  - i_file_acl_high can be uninitialized, so zero it if
+		 *    it isn't already.
+		 */
+		if (inode->osd2.linux2.l_i_file_acl_high) {
+			inode->osd2.linux2.l_i_file_acl_high = 0;
+			retval = ext2fs_write_inode_full(fs, ino, inode,
+							 length);
+			if (retval)
+				goto out;
+		}
+
+		retval = zero_high_bits_in_extents(fs, ino, inode);
+		if (retval)
+			goto out;
+	} while (ino);
+
+out:
+	if (inode)
+		ext2fs_free_mem(&inode);
+	if (scan)
+		ext2fs_close_inode_scan(scan);
+	return retval;
+}
+
 /*
  * Clean up the bitmaps for unitialized bitmaps
  */
@@ -455,7 +728,8 @@ retry:
 	/*
 	 * Reallocate the group descriptors as necessary.
 	 */
-	if (old_fs->desc_blocks != fs->desc_blocks) {
+	if (EXT2_DESC_SIZE(old_fs->super) == EXT2_DESC_SIZE(fs->super) &&
+	    old_fs->desc_blocks != fs->desc_blocks) {
 		retval = ext2fs_resize_mem(old_fs->desc_blocks *
 					   fs->blocksize,
 					   fs->desc_blocks * fs->blocksize,
@@ -1006,7 +1280,9 @@ static errcode_t blocks_to_move(ext2_resize_t rfs)
 	if (retval)
 		goto errout;
 
-	if (old_blocks == new_blocks) {
+	if (EXT2_DESC_SIZE(rfs->old_fs->super) ==
+	    EXT2_DESC_SIZE(rfs->new_fs->super) &&
+	    old_blocks == new_blocks) {
 		retval = 0;
 		goto errout;
 	}
diff --git a/resize/resize2fs.h b/resize/resize2fs.h
index 7aeab91..829fcd8 100644
--- a/resize/resize2fs.h
+++ b/resize/resize2fs.h
@@ -82,6 +82,9 @@ typedef struct ext2_sim_progress *ext2_sim_progmeter;
 #define RESIZE_PERCENT_COMPLETE		0x0100
 #define RESIZE_VERBOSE			0x0200
 
+#define RESIZE_ENABLE_64BIT		0x0400
+#define RESIZE_DISABLE_64BIT		0x0800
+
 /*
  * This structure is used for keeping track of how much resources have
  * been used for a particular resize2fs pass.



* [PATCH 32/49] resize2fs: when toggling 64bit, don't free in-use bg data clusters
  2014-03-11  6:53 [PATCH 00/49] e2fsprogs patchbomb 3/14 Darrick J. Wong
                   ` (30 preceding siblings ...)
  2014-03-11  6:57 ` [PATCH 31/49] resize2fs: convert fs to and from 64bit mode Darrick J. Wong
@ 2014-03-11  6:57 ` Darrick J. Wong
  2014-03-11  6:57 ` [PATCH 33/49] resize2fs: adjust reserved_gdt_blocks when changing group descriptor size Darrick J. Wong
                   ` (14 subsequent siblings)
  46 siblings, 0 replies; 88+ messages in thread
From: Darrick J. Wong @ 2014-03-11  6:57 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

Currently, move_bg_metadata() assumes that if a block containing a
superblock or a group descriptor is no longer needed, then it is safe
to free the whole cluster.  This of course isn't true, for bitmaps and
inode tables can share these clusters.  Therefore, check a little more
carefully before freeing clusters.
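
As a rough illustration of the check described above, here is a minimal
sketch on a plain array rather than libext2fs bitmaps.  The function
name and the in_use[] array are hypothetical and are not part of
resize2fs; the point is only that a cluster may be released when none
of its blocks still holds a bitmap or inode table block.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model: in_use[b] is true when block b still holds a
 * block bitmap, inode bitmap, or inode table block after conversion.
 * A cluster may be freed only if none of its blocks is in use.
 */
bool cluster_can_be_freed(const bool *in_use, size_t nblocks,
			  size_t cluster_start, size_t cluster_ratio)
{
	size_t b;

	assert(cluster_ratio > 0);

	for (b = cluster_start;
	     b < cluster_start + cluster_ratio && b < nblocks; b++) {
		if (in_use[b])
			return false;	/* something else shares the cluster */
	}
	return true;
}
```

With a cluster ratio of 4 and an inode table block at block 3, the
cluster starting at block 0 must be kept even if its superblock or
descriptor blocks are no longer needed.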

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 resize/resize2fs.c |   71 ++++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 55 insertions(+), 16 deletions(-)


diff --git a/resize/resize2fs.c b/resize/resize2fs.c
index cf5bef2..d40b058 100644
--- a/resize/resize2fs.c
+++ b/resize/resize2fs.c
@@ -317,11 +317,11 @@ static errcode_t resize_group_descriptors(ext2_resize_t rfs, blk64_t new_size)
 static errcode_t move_bg_metadata(ext2_resize_t rfs)
 {
 	dgrp_t i;
-	blk64_t b, c, d;
+	blk64_t b, c, d, old_desc_blocks, new_desc_blocks, j;
 	ext2fs_block_bitmap old_map, new_map;
 	int old, new;
 	errcode_t retval;
-	int zero = 0, one = 1;
+	int zero = 0, one = 1, cluster_ratio;
 
 	if (!(rfs->flags & (RESIZE_DISABLE_64BIT | RESIZE_ENABLE_64BIT)))
 		return 0;
@@ -334,6 +334,17 @@ static errcode_t move_bg_metadata(ext2_resize_t rfs)
 	if (retval)
 		goto out;
 
+	if (EXT2_HAS_INCOMPAT_FEATURE(rfs->old_fs->super,
+				      EXT2_FEATURE_INCOMPAT_META_BG)) {
+		old_desc_blocks = rfs->old_fs->super->s_first_meta_bg;
+		new_desc_blocks = rfs->new_fs->super->s_first_meta_bg;
+	} else {
+		old_desc_blocks = rfs->old_fs->desc_blocks +
+				rfs->old_fs->super->s_reserved_gdt_blocks;
+		new_desc_blocks = rfs->new_fs->desc_blocks +
+				rfs->new_fs->super->s_reserved_gdt_blocks;
+	}
+
 	/* Construct bitmaps of super/descriptor blocks in old and new fs */
 	for (i = 0; i < rfs->old_fs->group_desc_count; i++) {
 		retval = ext2fs_super_and_bgd_loc2(rfs->old_fs, i, &b, &c, &d,
@@ -341,7 +352,8 @@ static errcode_t move_bg_metadata(ext2_resize_t rfs)
 		if (retval)
 			goto out;
 		ext2fs_mark_block_bitmap2(old_map, b);
-		ext2fs_mark_block_bitmap2(old_map, c);
+		for (j = 0; c != 0 && j < old_desc_blocks; j++)
+			ext2fs_mark_block_bitmap2(old_map, c + j);
 		ext2fs_mark_block_bitmap2(old_map, d);
 
 		retval = ext2fs_super_and_bgd_loc2(rfs->new_fs, i, &b, &c, &d,
@@ -349,45 +361,72 @@ static errcode_t move_bg_metadata(ext2_resize_t rfs)
 		if (retval)
 			goto out;
 		ext2fs_mark_block_bitmap2(new_map, b);
-		ext2fs_mark_block_bitmap2(new_map, c);
+		for (j = 0; c != 0 && j < new_desc_blocks; j++)
+			ext2fs_mark_block_bitmap2(new_map, c + j);
 		ext2fs_mark_block_bitmap2(new_map, d);
 	}
 
+	cluster_ratio = EXT2FS_CLUSTER_RATIO(rfs->new_fs);
+
 	/* Find changes in block allocations for bg metadata */
 	for (b = 0;
 	     b < ext2fs_blocks_count(rfs->new_fs->super);
-	     b += EXT2FS_CLUSTER_RATIO(rfs->new_fs)) {
+	     b += cluster_ratio) {
 		old = ext2fs_test_block_bitmap2(old_map, b);
 		new = ext2fs_test_block_bitmap2(new_map, b);
 
-		if (old && !new)
-			ext2fs_unmark_block_bitmap2(rfs->new_fs->block_map, b);
-		else if (!old && new)
-			; /* empty ext2fs_mark_block_bitmap2(new_map, b); */
-		else
+		if (old && !new) {
+			/* mark old_map, unmark new_map */
+			if (cluster_ratio == 1)
+				ext2fs_unmark_block_bitmap2(
+						rfs->new_fs->block_map, b);
+		} else if (!old && new)
+			; /* unmark old_map, mark new_map */
+		else {
+			ext2fs_unmark_block_bitmap2(old_map, b);
 			ext2fs_unmark_block_bitmap2(new_map, b);
+		}
 	}
-	/* new_map now shows blocks that have been newly allocated. */
 
-	/* Move any conflicting bitmaps and inode tables */
+	/*
+	 * new_map now shows blocks that have been newly allocated.
+	 * old_map now shows blocks that have been newly freed.
+	 */
+
+	/*
+	 * Move any conflicting bitmaps and inode tables.  Ensure that we
+	 * don't try to free clusters associated with bitmaps or tables.
+	 */
 	for (i = 0; i < rfs->old_fs->group_desc_count; i++) {
 		b = ext2fs_block_bitmap_loc(rfs->new_fs, i);
 		if (ext2fs_test_block_bitmap2(new_map, b))
 			ext2fs_block_bitmap_loc_set(rfs->new_fs, i, 0);
+		else if (ext2fs_test_block_bitmap2(old_map, b))
+			ext2fs_unmark_block_bitmap2(old_map, b);
 
 		b = ext2fs_inode_bitmap_loc(rfs->new_fs, i);
 		if (ext2fs_test_block_bitmap2(new_map, b))
 			ext2fs_inode_bitmap_loc_set(rfs->new_fs, i, 0);
+		else if (ext2fs_test_block_bitmap2(old_map, b))
+			ext2fs_unmark_block_bitmap2(old_map, b);
 
 		c = ext2fs_inode_table_loc(rfs->new_fs, i);
-		for (b = 0; b < rfs->new_fs->inode_blocks_per_group; b++) {
-			if (ext2fs_test_block_bitmap2(new_map, b + c)) {
+		for (b = 0;
+		     b < rfs->new_fs->inode_blocks_per_group;
+		     b++) {
+			if (ext2fs_test_block_bitmap2(new_map, b + c))
 				ext2fs_inode_table_loc_set(rfs->new_fs, i, 0);
-				break;
-			}
+			else if (ext2fs_test_block_bitmap2(old_map, b + c))
+				ext2fs_unmark_block_bitmap2(old_map, b + c);
 		}
 	}
 
+	/* Free unused clusters */
+	for (b = 0;
+	     cluster_ratio > 1 && b < ext2fs_blocks_count(rfs->new_fs->super);
+	     b += cluster_ratio)
+		if (ext2fs_test_block_bitmap2(old_map, b))
+			ext2fs_unmark_block_bitmap2(rfs->new_fs->block_map, b);
 out:
 	if (old_map)
 		ext2fs_free_block_bitmap(old_map);



* [PATCH 33/49] resize2fs: adjust reserved_gdt_blocks when changing group descriptor size
  2014-03-11  6:53 [PATCH 00/49] e2fsprogs patchbomb 3/14 Darrick J. Wong
                   ` (31 preceding siblings ...)
  2014-03-11  6:57 ` [PATCH 32/49] resize2fs: when toggling 64bit, don't free in-use bg data clusters Darrick J. Wong
@ 2014-03-11  6:57 ` Darrick J. Wong
  2014-03-11  6:57 ` [PATCH 34/49] libext2fs: have UNIX IO manager use pread/pwrite Darrick J. Wong
                   ` (13 subsequent siblings)
  46 siblings, 0 replies; 88+ messages in thread
From: Darrick J. Wong @ 2014-03-11  6:57 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

Since we're constructing the fantasy that new_fs has always been a
64bit fs, we need to adjust reserved_gdt_blocks when we start resizing
the metadata so that the size of the gdt space in the new fs reflects
the fantasy throughout the resize process.
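
The arithmetic behind the adjustment can be sketched in isolation as
follows.  This standalone helper is an illustration only; it mirrors
the clamping in adjust_reserved_gdt_blocks() but takes plain integers
instead of ext2_filsys handles, and its name is made up for the
example.

```c
#include <assert.h>

/*
 * Keep (descriptor blocks + reserved GDT blocks) constant: whatever the
 * descriptor table gains, the reserved area gives up, and vice versa.
 * The result is clamped to the range [0, blocksize / 4].
 */
unsigned int adjusted_reserved_gdt(unsigned int reserved,
				   unsigned int old_desc_blocks,
				   unsigned int new_desc_blocks,
				   unsigned int blocksize)
{
	long n = (long) reserved +
		 ((long) old_desc_blocks - (long) new_desc_blocks);
	long max = (long) (blocksize / 4);

	assert(blocksize > 0);

	if (n < 0)
		n = 0;
	if (n > max)
		n = max;
	return (unsigned int) n;
}
```

For example, if the descriptor table grows from 1 block to 2 blocks,
100 reserved GDT blocks become 99, so the inode tables that follow do
not have to move.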

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 resize/resize2fs.c |   37 ++++++++++++++++++++++++-------------
 1 file changed, 24 insertions(+), 13 deletions(-)


diff --git a/resize/resize2fs.c b/resize/resize2fs.c
index d40b058..3b3a329 100644
--- a/resize/resize2fs.c
+++ b/resize/resize2fs.c
@@ -251,6 +251,24 @@ errout:
 	return retval;
 }
 
+/* Keep the size of the group descriptor region constant */
+static void adjust_reserved_gdt_blocks(ext2_filsys old_fs, ext2_filsys fs)
+{
+	if ((fs->super->s_feature_compat &
+	     EXT2_FEATURE_COMPAT_RESIZE_INODE) &&
+	    (old_fs->desc_blocks != fs->desc_blocks)) {
+		int new;
+
+		new = ((int) fs->super->s_reserved_gdt_blocks) +
+			(old_fs->desc_blocks - fs->desc_blocks);
+		if (new < 0)
+			new = 0;
+		if (new > (int) fs->blocksize/4)
+			new = fs->blocksize/4;
+		fs->super->s_reserved_gdt_blocks = new;
+	}
+}
+
 /* Toggle 64bit mode */
 static errcode_t resize_group_descriptors(ext2_resize_t rfs, blk64_t new_size)
 {
@@ -310,6 +328,8 @@ static errcode_t resize_group_descriptors(ext2_resize_t rfs, blk64_t new_size)
 	for (i = 0; i < rfs->old_fs->group_desc_count; i++)
 		ext2fs_group_desc_csum_set(rfs->new_fs, i);
 
+	adjust_reserved_gdt_blocks(rfs->old_fs, rfs->new_fs);
+
 	return 0;
 }
 
@@ -787,20 +807,11 @@ retry:
 	 * number of descriptor blocks, then adjust
 	 * s_reserved_gdt_blocks if possible to avoid needing to move
 	 * the inode table either now or in the future.
+	 *
+	 * Note: If we're converting to 64bit mode, we did this earlier.
 	 */
-	if ((fs->super->s_feature_compat &
-	     EXT2_FEATURE_COMPAT_RESIZE_INODE) &&
-	    (old_fs->desc_blocks != fs->desc_blocks)) {
-		int new;


* [PATCH 34/49] libext2fs: have UNIX IO manager use pread/pwrite
  2014-03-11  6:53 [PATCH 00/49] e2fsprogs patchbomb 3/14 Darrick J. Wong
                   ` (32 preceding siblings ...)
  2014-03-11  6:57 ` [PATCH 33/49] resize2fs: adjust reserved_gdt_blocks when changing group descriptor size Darrick J. Wong
@ 2014-03-11  6:57 ` Darrick J. Wong
  2014-03-11  6:57 ` [PATCH 35/49] ext2fs: add readahead method to improve scanning Darrick J. Wong
                   ` (12 subsequent siblings)
  46 siblings, 0 replies; 88+ messages in thread
From: Darrick J. Wong @ 2014-03-11  6:57 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

If pread/pwrite are present, have the UNIX IO manager use them for
aligned IOs (instead of the current seek -> read/write), thereby
saving us a (minor) amount of system call overhead.
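
The difference between the two paths can be sketched outside of
libext2fs like this.  read_at() and demo_read_at() are hypothetical
names used only for this example; the sketch tries a single positioned
read and falls back to the seek-then-read sequence if that does not
return the full amount.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Read exactly size bytes at offset; returns 0 on success, -1 otherwise. */
int read_at(int fd, void *buf, size_t size, off_t offset)
{
	ssize_t actual;

	assert(buf != NULL);

	/* One system call: a positioned read that ignores the file offset. */
	actual = pread(fd, buf, size, offset);
	if (actual >= 0 && (size_t) actual == size)
		return 0;

	/* Fallback: the traditional two-call sequence. */
	if (lseek(fd, offset, SEEK_SET) != offset)
		return -1;
	actual = read(fd, buf, size);
	return (actual >= 0 && (size_t) actual == size) ? 0 : -1;
}

/* Hypothetical self-check: write ten bytes to a temp file, read four back. */
int demo_read_at(void)
{
	char path[] = "/tmp/read_at_XXXXXX";
	char buf[4];
	int fd = mkstemp(path);
	int ok;

	if (fd < 0)
		return -1;
	ok = write(fd, "0123456789", 10) == 10 &&
	     read_at(fd, buf, sizeof(buf), 3) == 0 &&
	     buf[0] == '3' && buf[3] == '6';
	close(fd);
	unlink(path);
	return ok ? 0 : -1;
}
```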

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 configure            |    2 +-
 configure.in         |    2 ++
 lib/config.h.in      |    6 ++++++
 lib/ext2fs/unix_io.c |   24 ++++++++++++++++++++++++
 4 files changed, 33 insertions(+), 1 deletion(-)


diff --git a/configure b/configure
index 6449f59..7b0a0d1 100755
--- a/configure
+++ b/configure
@@ -11155,7 +11155,7 @@ if test "$ac_res" != no; then :
 fi
 
 fi
-for ac_func in  	__secure_getenv 	backtrace 	blkid_probe_get_topology 	chflags 	fadvise64 	fallocate 	fallocate64 	fchown 	fdatasync 	fstat64 	ftruncate64 	futimes 	getcwd 	getdtablesize 	getmntinfo 	getpwuid_r 	getrlimit 	getrusage 	jrand48 	llseek 	lseek64 	mallinfo 	mbstowcs 	memalign 	mempcpy 	mmap 	msync 	nanosleep 	open64 	pathconf 	posix_fadvise 	posix_fadvise64 	posix_memalign 	prctl 	secure_getenv 	setmntent 	setresgid 	setresuid 	srandom 	stpcpy 	strcasecmp 	strdup 	strnlen 	strptime 	strtoull 	sync_file_range 	sysconf 	usleep 	utime 	valloc
+for ac_func in  	__secure_getenv 	backtrace 	blkid_probe_get_topology 	chflags 	fadvise64 	fallocate 	fallocate64 	fchown 	fdatasync 	fstat64 	ftruncate64 	futimes 	getcwd 	getdtablesize 	getmntinfo 	getpwuid_r 	getrlimit 	getrusage 	jrand48 	llseek 	lseek64 	mallinfo 	mbstowcs 	memalign 	mempcpy 	mmap 	msync 	nanosleep 	open64 	pathconf 	posix_fadvise 	posix_fadvise64 	posix_memalign 	prctl 	pread 	pwrite 	secure_getenv 	setmntent 	setresgid 	setresuid 	srandom 	stpcpy 	strcasecmp 	strdup 	strnlen 	strptime 	strtoull 	sync_file_range 	sysconf 	usleep 	utime 	valloc
 do :
   as_ac_var=`$as_echo "ac_cv_func_$ac_func" | $as_tr_sh`
 ac_fn_c_check_func "$LINENO" "$ac_func" "$as_ac_var"
diff --git a/configure.in b/configure.in
index 8a033b0..f28bd46 100644
--- a/configure.in
+++ b/configure.in
@@ -1135,6 +1135,8 @@ AC_CHECK_FUNCS(m4_flatten([
 	posix_fadvise64
 	posix_memalign
 	prctl
+	pread
+	pwrite
 	secure_getenv
 	setmntent
 	setresgid
diff --git a/lib/config.h.in b/lib/config.h.in
index 12ac1e0..e0384ee 100644
--- a/lib/config.h.in
+++ b/lib/config.h.in
@@ -311,9 +311,15 @@
 /* Define to 1 if you have the `prctl' function. */
 #undef HAVE_PRCTL
 
+/* Define to 1 if you have the `pread' function. */
+#undef HAVE_PREAD
+
 /* Define to 1 if you have the `putenv' function. */
 #undef HAVE_PUTENV
 
+/* Define to 1 if you have the `pwrite' function. */
+#undef HAVE_PWRITE
+
 /* Define to 1 if dirent has d_reclen */
 #undef HAVE_RECLEN_DIRENT
 
diff --git a/lib/ext2fs/unix_io.c b/lib/ext2fs/unix_io.c
index c3185b6..a818c13 100644
--- a/lib/ext2fs/unix_io.c
+++ b/lib/ext2fs/unix_io.c
@@ -130,6 +130,18 @@ static errcode_t raw_read_blk(io_channel channel,
 	size = (count < 0) ? -count : count * channel->block_size;
 	data->io_stats.bytes_read += size;
 	location = ((ext2_loff_t) block * channel->block_size) + data->offset;
+
+#ifdef HAVE_PREAD
+	/* Try an aligned pread */
+	if ((channel->align == 0) ||
+	    (IS_ALIGNED(buf, channel->align) &&
+	     IS_ALIGNED(size, channel->align))) {
+		actual = pread(data->dev, buf, size, location);
+		if (actual == size)
+			return 0;
+	}
+#endif /* HAVE_PREAD */
+
 	if (ext2fs_llseek(data->dev, location, SEEK_SET) != location) {
 		retval = errno ? errno : EXT2_ET_LLSEEK_FAILED;
 		goto error_out;
@@ -200,6 +212,18 @@ static errcode_t raw_write_blk(io_channel channel,
 	data->io_stats.bytes_written += size;
 
 	location = ((ext2_loff_t) block * channel->block_size) + data->offset;
+
+#ifdef HAVE_PWRITE
+	/* Try an aligned pwrite */
+	if ((channel->align == 0) ||
+	    (IS_ALIGNED(buf, channel->align) &&
+	     IS_ALIGNED(size, channel->align))) {
+		actual = pwrite(data->dev, buf, size, location);
+		if (actual == size)
+			return 0;
+	}
+#endif /* HAVE_PWRITE */
+
 	if (ext2fs_llseek(data->dev, location, SEEK_SET) != location) {
 		retval = errno ? errno : EXT2_ET_LLSEEK_FAILED;
 		goto error_out;



* [PATCH 35/49] ext2fs: add readahead method to improve scanning
  2014-03-11  6:53 [PATCH 00/49] e2fsprogs patchbomb 3/14 Darrick J. Wong
                   ` (33 preceding siblings ...)
  2014-03-11  6:57 ` [PATCH 34/49] libext2fs: have UNIX IO manager use pread/pwrite Darrick J. Wong
@ 2014-03-11  6:57 ` Darrick J. Wong
  2014-03-17 22:07   ` Andreas Dilger
  2014-03-11  6:57 ` [PATCH 36/49] libext2fs: allow clients to read-ahead metadata Darrick J. Wong
                   ` (11 subsequent siblings)
  46 siblings, 1 reply; 88+ messages in thread
From: Darrick J. Wong @ 2014-03-11  6:57 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4, Andreas Dilger

From: Andreas Dilger <adilger@whamcloud.com>

Add a readahead method for prefetching ranges of disk blocks.  This is
useful for inode table scanning, and other large contiguous ranges of
blocks, and may also prove useful for random block prefetch, since it
will allow reordering of the IO without waiting synchronously for the
reads to complete.

It is currently using the posix_fadvise(POSIX_FADV_WILLNEED)
interface, as this proved most efficient during our testing.

[darrick.wong@oracle.com]
Add a cache_release method for advising the pagecache to discard disk
cache blocks.  Make the arguments to the readahead function take the
same ULL values as the other IO functions, and return an appropriate
error code when fadvise isn't available.
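
For readers unfamiliar with the fadvise interface, the core of such a
readahead hint can be sketched as below.  This is a standalone
approximation and not the unix_io.c code: the function names are
invented for the example, and it works on a bare file descriptor and
block size instead of an io_channel.

```c
#define _XOPEN_SOURCE 600
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>

/* Convert a block number (or a block count) into a byte quantity. */
off_t blocks_to_bytes(unsigned long long blocks, unsigned int block_size)
{
	assert(block_size > 0);
	return (off_t) blocks * (off_t) block_size;
}

/*
 * Ask the kernel to start reading `count` blocks beginning at `block`
 * into the page cache.  Returns 0 on success or an error number;
 * posix_fadvise() reports errors through its return value, not errno.
 */
int prefetch_blocks(int fd, unsigned long long block,
		    unsigned long long count, unsigned int block_size)
{
#ifdef POSIX_FADV_WILLNEED
	return posix_fadvise(fd, blocks_to_bytes(block, block_size),
			     blocks_to_bytes(count, block_size),
			     POSIX_FADV_WILLNEED);
#else
	(void) fd;
	(void) block;
	(void) count;
	(void) block_size;
	return ENOTSUP;
#endif
}
```

The call returns as soon as the hint is queued, so the caller can go
on with other work while the reads complete in the background.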

Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 lib/ext2fs/ext2_io.h    |   12 ++++++++++++
 lib/ext2fs/io_manager.c |   18 ++++++++++++++++++
 lib/ext2fs/unix_io.c    |   46 +++++++++++++++++++++++++++++++++++++++++++---
 3 files changed, 73 insertions(+), 3 deletions(-)


diff --git a/lib/ext2fs/ext2_io.h b/lib/ext2fs/ext2_io.h
index 1894fb8..636f797 100644
--- a/lib/ext2fs/ext2_io.h
+++ b/lib/ext2fs/ext2_io.h
@@ -90,6 +90,12 @@ struct struct_io_manager {
 					int count, const void *data);
 	errcode_t (*discard)(io_channel channel, unsigned long long block,
 			     unsigned long long count);
+	errcode_t (*cache_readahead)(io_channel channel,
+				     unsigned long long block,
+				     unsigned long long count);
+	errcode_t (*cache_release)(io_channel channel,
+				   unsigned long long block,
+				   unsigned long long count);
 	long	reserved[16];
 };
 
@@ -124,6 +130,12 @@ extern errcode_t io_channel_discard(io_channel channel,
 				    unsigned long long count);
 extern errcode_t io_channel_alloc_buf(io_channel channel,
 				      int count, void *ptr);
+extern errcode_t io_channel_cache_readahead(io_channel io,
+					    unsigned long long block,
+					    unsigned long long count);
+extern errcode_t io_channel_cache_release(io_channel io,
+					  unsigned long long block,
+					  unsigned long long count);
 
 /* unix_io.c */
 extern io_manager unix_io_manager;
diff --git a/lib/ext2fs/io_manager.c b/lib/ext2fs/io_manager.c
index 34e4859..a1258c4 100644
--- a/lib/ext2fs/io_manager.c
+++ b/lib/ext2fs/io_manager.c
@@ -128,3 +128,21 @@ errcode_t io_channel_alloc_buf(io_channel io, int count, void *ptr)
 	else
 		return ext2fs_get_mem(size, ptr);
 }
+
+errcode_t io_channel_cache_readahead(io_channel io, unsigned long long block,
+				     unsigned long long count)
+{
+	if (!io->manager->cache_readahead)
+		return EXT2_ET_OP_NOT_SUPPORTED;
+
+	return io->manager->cache_readahead(io, block, count);
+}
+
+errcode_t io_channel_cache_release(io_channel io, unsigned long long block,
+				   unsigned long long count)
+{
+	if (!io->manager->cache_release)
+		return EXT2_ET_OP_NOT_SUPPORTED;
+
+	return io->manager->cache_release(io, block, count);
+}
diff --git a/lib/ext2fs/unix_io.c b/lib/ext2fs/unix_io.c
index a818c13..a95e289 100644
--- a/lib/ext2fs/unix_io.c
+++ b/lib/ext2fs/unix_io.c
@@ -15,6 +15,9 @@
  * %End-Header%
  */
 
+#define _XOPEN_SOURCE 600
+#define _DARWIN_C_SOURCE
+#define _FILE_OFFSET_BITS 64
 #define _LARGEFILE_SOURCE
 #define _LARGEFILE64_SOURCE
 #ifndef _GNU_SOURCE
@@ -35,6 +38,9 @@
 #ifdef __linux__
 #include <sys/utsname.h>
 #endif
+#if HAVE_SYS_TYPES_H
+#include <sys/types.h>
+#endif
 #ifdef HAVE_SYS_IOCTL_H
 #include <sys/ioctl.h>
 #endif
@@ -44,9 +50,6 @@
 #if HAVE_SYS_STAT_H
 #include <sys/stat.h>
 #endif
-#if HAVE_SYS_TYPES_H
-#include <sys/types.h>
-#endif
 #if HAVE_SYS_RESOURCE_H
 #include <sys/resource.h>
 #endif
@@ -97,6 +100,7 @@ struct unix_private_data {
 #define IS_ALIGNED(n, align) ((((unsigned long) n) & \
 			       ((unsigned long) ((align)-1))) == 0)
 
+
 static errcode_t unix_get_stats(io_channel channel, io_stats *stats)
 {
 	errcode_t	retval = 0;
@@ -810,6 +814,40 @@ static errcode_t unix_write_blk64(io_channel channel, unsigned long long block,
 #endif /* NO_IO_CACHE */
 }
 
+static errcode_t unix_cache_readahead(io_channel channel,
+				      unsigned long long block,
+				      unsigned long long count)
+{
+#ifdef POSIX_FADV_WILLNEED
+	struct unix_private_data *data;
+
+	data = (struct unix_private_data *)channel->private_data;
+	return posix_fadvise(data->dev,
+			     (ext2_loff_t)block * channel->block_size,
+			     (ext2_loff_t)count * channel->block_size,
+			     POSIX_FADV_WILLNEED);
+#else
+	return EXT2_ET_OP_NOT_SUPPORTED;
+#endif
+}
+
+static errcode_t unix_cache_release(io_channel channel,
+				    unsigned long long block,
+				    unsigned long long count)
+{
+#ifdef POSIX_FADV_DONTNEED
+	struct unix_private_data *data;
+
+	data = (struct unix_private_data *)channel->private_data;
+	return posix_fadvise(data->dev,
+			     (ext2_loff_t)block * channel->block_size,
+			     (ext2_loff_t)count * channel->block_size,
+			     POSIX_FADV_DONTNEED);
+#else
+	return EXT2_ET_OP_NOT_SUPPORTED;
+#endif
+}
+
 static errcode_t unix_write_blk(io_channel channel, unsigned long block,
 				int count, const void *buf)
 {
@@ -961,6 +999,8 @@ static struct struct_io_manager struct_unix_manager = {
 	unix_read_blk64,
 	unix_write_blk64,
 	unix_discard,
+	unix_cache_readahead,
+	unix_cache_release,
 };
 
 io_manager unix_io_manager = &struct_unix_manager;



* [PATCH 36/49] libext2fs: allow clients to read-ahead metadata
  2014-03-11  6:53 [PATCH 00/49] e2fsprogs patchbomb 3/14 Darrick J. Wong
                   ` (34 preceding siblings ...)
  2014-03-11  6:57 ` [PATCH 35/49] ext2fs: add readahead method to improve scanning Darrick J. Wong
@ 2014-03-11  6:57 ` Darrick J. Wong
  2014-03-17 23:11   ` Andreas Dilger
  2014-03-11  6:57 ` [PATCH 37/49] e2fsck: read-ahead metadata during passes 1, 2, and 4 Darrick J. Wong
                   ` (10 subsequent siblings)
  46 siblings, 1 reply; 88+ messages in thread
From: Darrick J. Wong @ 2014-03-11  6:57 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

This patch adds to libext2fs the ability to pre-fetch metadata
into the page cache in the hopes of speeding up libext2fs' clients.
There are two new library functions -- the first allows a client to
readahead a list of blocks, and the second is a helper function that
uses that first mechanism to load group data (bitmaps, inode tables).

e2fsck will employ both of these methods to speed itself up.
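
The block-list readahead boils down to sorting the blocks and merging
neighbours into contiguous runs, so that each run needs only one
readahead request.  A minimal sketch of that idea on a plain array
follows; struct run and coalesce_runs() are hypothetical names and not
the dblist API, and the comparison function compares the 64-bit values
directly instead of subtracting them.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct run {
	uint64_t start;	/* first block of the run */
	uint64_t len;	/* number of contiguous blocks */
};

static int cmp_u64(const void *a, const void *b)
{
	uint64_t x = *(const uint64_t *) a;
	uint64_t y = *(const uint64_t *) b;

	return (x > y) - (x < y);
}

/*
 * Sort blocks[] in place and merge adjacent block numbers into runs.
 * runs[] must have room for n entries.  Returns the number of runs.
 */
size_t coalesce_runs(uint64_t *blocks, size_t n, struct run *runs)
{
	size_t i, nruns = 0;

	if (n == 0)
		return 0;
	assert(blocks != NULL && runs != NULL);

	qsort(blocks, n, sizeof(*blocks), cmp_u64);
	for (i = 0; i < n; i++) {
		if (nruns > 0) {
			struct run *last = &runs[nruns - 1];

			if (blocks[i] < last->start + last->len)
				continue;	/* duplicate block */
			if (blocks[i] == last->start + last->len) {
				last->len++;	/* extends the current run */
				continue;
			}
		}
		runs[nruns].start = blocks[i];
		runs[nruns].len = 1;
		nruns++;
	}
	return nruns;
}
```

Given the blocks 12, 10, 40, 11, 41, this produces two runs: three
blocks starting at 10 and two blocks starting at 40.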

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 lib/ext2fs/Makefile.in |    4 +
 lib/ext2fs/ext2fs.h    |   13 +++
 lib/ext2fs/readahead.c |  188 ++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 205 insertions(+)
 create mode 100644 lib/ext2fs/readahead.c


diff --git a/lib/ext2fs/Makefile.in b/lib/ext2fs/Makefile.in
index 0c880c7..e64342e 100644
--- a/lib/ext2fs/Makefile.in
+++ b/lib/ext2fs/Makefile.in
@@ -78,6 +78,7 @@ OBJS= $(DEBUGFS_LIB_OBJS) $(RESIZE_LIB_OBJS) $(E2IMAGE_LIB_OBJS) \
 	qcow2.o \
 	read_bb.o \
 	read_bb_file.o \
+	readahead.o \
 	res_gdt.o \
 	rw_bitmaps.o \
 	swapfs.o \
@@ -155,6 +156,7 @@ SRCS= ext2_err.c \
 	$(srcdir)/qcow2.c \
 	$(srcdir)/read_bb.c \
 	$(srcdir)/read_bb_file.c \
+	$(srcdir)/readahead.c \
 	$(srcdir)/res_gdt.c \
 	$(srcdir)/rw_bitmaps.c \
 	$(srcdir)/swapfs.c \
@@ -903,6 +905,8 @@ read_bb_file.o: $(srcdir)/read_bb_file.c $(top_builddir)/lib/config.h \
  $(srcdir)/ext2_fs.h $(srcdir)/ext3_extents.h $(top_srcdir)/lib/et/com_err.h \
  $(srcdir)/ext2_io.h $(top_builddir)/lib/ext2fs/ext2_err.h \
  $(srcdir)/ext2_ext_attr.h $(srcdir)/bitops.h
+readahead.o: $(srcdir)/readahead.c $(top_builddir)/lib/config.h \
+ $(srcdir)/ext2fs.h $(srcdir)/ext2_fs.h $(top_builddir)/lib/ext2fs/ext2_err.h
 res_gdt.o: $(srcdir)/res_gdt.c $(top_builddir)/lib/config.h \
  $(top_builddir)/lib/dirpaths.h $(srcdir)/ext2_fs.h \
  $(top_builddir)/lib/ext2fs/ext2_types.h $(srcdir)/ext2fs.h \
diff --git a/lib/ext2fs/ext2fs.h b/lib/ext2fs/ext2fs.h
index 819a14a..933a14d 100644
--- a/lib/ext2fs/ext2fs.h
+++ b/lib/ext2fs/ext2fs.h
@@ -1563,6 +1563,19 @@ extern errcode_t ext2fs_read_bb_FILE(ext2_filsys fs, FILE *f,
 				     void (*invalid)(ext2_filsys fs,
 						     blk_t blk));
 
+/* readahead.c */
+#define EXT2_READA_SUPER	(0x01)
+#define EXT2_READA_GDT		(0x02)
+#define EXT2_READA_BBITMAP	(0x04)
+#define EXT2_READA_IBITMAP	(0x08)
+#define EXT2_READA_ITABLE	(0x10)
+#define EXT2_READA_ALL_FLAGS	(0x1F)
+errcode_t ext2fs_readahead(ext2_filsys fs, int flags, dgrp_t start,
+			   dgrp_t ngroups);
+errcode_t ext2fs_readahead_dblist(ext2_filsys fs, int flags,
+				  ext2_dblist dblist);
+int ext2fs_can_readahead(ext2_filsys fs);
+
 /* res_gdt.c */
 extern errcode_t ext2fs_create_resize_inode(ext2_filsys fs);
 
diff --git a/lib/ext2fs/readahead.c b/lib/ext2fs/readahead.c
new file mode 100644
index 0000000..ed6e555
--- /dev/null
+++ b/lib/ext2fs/readahead.c
@@ -0,0 +1,188 @@
+/*
+ * readahead.c -- Try to convince the OS to prefetch metadata.
+ *
+ * Copyright (C) 2014 Oracle.
+ *
+ * %Begin-Header%
+ * This file may be redistributed under the terms of the GNU Library
+ * General Public License, version 2.
+ * %End-Header%
+ */
+
+#include "config.h"
+#include <string.h>
+
+#include "ext2_fs.h"
+#include "ext2fs.h"
+
+#undef DEBUG
+
+#ifdef DEBUG
+# define dbg_printf(f, a...)  do {printf(f, ## a); fflush(stdout); } while (0)
+#else
+# define dbg_printf(f, a...)
+#endif
+
+struct read_dblist {
+	errcode_t err;
+	blk64_t run_start;
+	blk64_t run_len;
+};
+
+static EXT2_QSORT_TYPE readahead_dir_block_cmp(const void *a, const void *b)
+{
+	const struct ext2_db_entry2 *db_a =
+		(const struct ext2_db_entry2 *) a;
+	const struct ext2_db_entry2 *db_b =
+		(const struct ext2_db_entry2 *) b;
+
+	return (int) (db_a->blk - db_b->blk);
+}
+
+static int readahead_dir_block(ext2_filsys fs, struct ext2_db_entry2 *db,
+			       void *priv_data)
+{
+	errcode_t err = 0;
+	struct read_dblist *pr = priv_data;
+
+	if (!pr->run_len || db->blk != pr->run_start + pr->run_len) {
+		if (pr->run_len) {
+			pr->err = io_channel_cache_readahead(fs->io,
+							     pr->run_start,
+							     pr->run_len);
+			dbg_printf("readahead start=%llu len=%llu err=%d\n",
+				   pr->run_start, pr->run_len,
+				   (int)pr->err);
+		}
+		pr->run_start = db->blk;
+		pr->run_len = 0;
+	}
+	pr->run_len += db->blockcnt;
+
+	return pr->err ? DBLIST_ABORT : 0;
+}
+
+errcode_t ext2fs_readahead_dblist(ext2_filsys fs, int flags,
+				  ext2_dblist dblist)
+{
+	errcode_t err;
+	struct read_dblist pr;
+
+	dbg_printf("%s: flags=0x%x\n", __func__, flags);
+	if (flags)
+		return EXT2_ET_INVALID_ARGUMENT;
+
+	ext2fs_dblist_sort2(dblist, readahead_dir_block_cmp);
+
+	memset(&pr, 0, sizeof(pr));
+	err = ext2fs_dblist_iterate2(dblist, readahead_dir_block, &pr);
+	if (pr.err)
+		return pr.err;
+	if (err)
+		return err;
+
+	if (pr.run_len)
+		err = io_channel_cache_readahead(fs->io, pr.run_start,
+						 pr.run_len);
+
+	return err;
+}
+
+errcode_t ext2fs_readahead(ext2_filsys fs, int flags, dgrp_t start,
+			   dgrp_t ngroups)
+{
+	blk64_t		super, old_gdt, new_gdt;
+	blk_t		blocks;
+	dgrp_t		i;
+	ext2_dblist	dblist;
+	dgrp_t		end = start + ngroups;
+	errcode_t	err = 0;
+
+	dbg_printf("%s: flags=0x%x start=%d groups=%d\n", __func__, flags,
+		   start, ngroups);
+	if (flags & ~EXT2_READA_ALL_FLAGS)
+		return EXT2_ET_INVALID_ARGUMENT;
+
+	if (end > fs->group_desc_count)
+		end = fs->group_desc_count;
+
+	if (flags == 0)
+		return 0;
+
+	err = ext2fs_init_dblist(fs, &dblist);
+	if (err)
+		return err;
+
+	for (i = start; i < end; i++) {
+		err = ext2fs_super_and_bgd_loc2(fs, i, &super, &old_gdt,
+						&new_gdt, &blocks);
+		if (err)
+			break;
+
+		if (flags & EXT2_READA_SUPER) {
+			err = ext2fs_add_dir_block2(dblist, 0, super, 0);
+			if (err)
+				break;
+		}
+
+		if (flags & EXT2_READA_GDT) {
+			if (old_gdt)
+				err = ext2fs_add_dir_block2(dblist, 0, old_gdt,
+							    blocks);
+			else if (new_gdt)
+				err = ext2fs_add_dir_block2(dblist, 0, new_gdt,
+							    blocks);
+			else
+				err = 0;
+			if (err)
+				break;
+		}
+
+		if ((flags & EXT2_READA_BBITMAP) &&
+		    !ext2fs_bg_flags_test(fs, i, EXT2_BG_BLOCK_UNINIT) &&
+		    ext2fs_bg_free_blocks_count(fs, i) <
+				fs->super->s_blocks_per_group) {
+			super = ext2fs_block_bitmap_loc(fs, i);
+			err = ext2fs_add_dir_block2(dblist, 0, super, 1);
+			if (err)
+				break;
+		}
+
+		if ((flags & EXT2_READA_IBITMAP) &&
+		    !ext2fs_bg_flags_test(fs, i, EXT2_BG_INODE_UNINIT) &&
+		    ext2fs_bg_free_inodes_count(fs, i) <
+				fs->super->s_inodes_per_group) {
+			super = ext2fs_inode_bitmap_loc(fs, i);
+			err = ext2fs_add_dir_block2(dblist, 0, super, 1);
+			if (err)
+				break;
+		}
+
+		if ((flags & EXT2_READA_ITABLE) &&
+		    ext2fs_bg_free_inodes_count(fs, i) <
+				fs->super->s_inodes_per_group) {
+			super = ext2fs_inode_table_loc(fs, i);
+			blocks = fs->inode_blocks_per_group -
+				 (ext2fs_bg_itable_unused(fs, i) *
+				  EXT2_INODE_SIZE(fs->super) / fs->blocksize);
+			err = ext2fs_add_dir_block2(dblist, 0, super, blocks);
+			if (err)
+				break;
+		}
+	}
+
+	if (!err)
+		err = ext2fs_readahead_dblist(fs, 0, dblist);
+
+	ext2fs_free_dblist(dblist);
+	return err;
+}
+
+int ext2fs_can_readahead(ext2_filsys fs)
+{
+	errcode_t err;
+
+	err = io_channel_cache_readahead(fs->io, 0, 1);
+	dbg_printf("%s: supp=%d\n", __func__, err != EXT2_ET_OP_NOT_SUPPORTED);
+	return err != EXT2_ET_OP_NOT_SUPPORTED;
+}



* [PATCH 37/49] e2fsck: read-ahead metadata during passes 1, 2, and 4
  2014-03-11  6:53 [PATCH 00/49] e2fsprogs patchbomb 3/14 Darrick J. Wong
                   ` (35 preceding siblings ...)
  2014-03-11  6:57 ` [PATCH 36/49] libext2fs: allow clients to read-ahead metadata Darrick J. Wong
@ 2014-03-11  6:57 ` Darrick J. Wong
  2014-03-17 23:10   ` Andreas Dilger
  2014-03-11  6:58 ` [PATCH 38/49] libext2fs: when appending to a file, don't split an index block in equal halves Darrick J. Wong
                   ` (9 subsequent siblings)
  46 siblings, 1 reply; 88+ messages in thread
From: Darrick J. Wong @ 2014-03-11  6:57 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

e2fsck pass1 is modified to use the block group data prefetch function
to try to fetch the inode tables into the pagecache before they are
needed.  In order to avoid cache thrashing, we limit ourselves to
prefetching at most half the available memory.

pass2 is modified to use the dirblock prefetching function to prefetch
the list of directory blocks that are assembled in pass1.  So long as
we don't anticipate rehashing the dirs (pass 3a), we can release the
dirblocks as soon as we're done checking them.

pass4 is modified to prefetch the block and inode bitmaps in
anticipation of pass 5, because pass4 is entirely CPU bound.

In general, these mechanisms can halve fsck time, if the host system
has sufficient memory and the storage system can provide a lot of
IOPs.  SSDs and multi-spindle RAIDs see the most speedup; single disks
experience a modest speedup, and single-spindle USB mass storage
devices see hardly any benefit.

By default, readahead will try to fill half the physical memory in the
system.  The -R option specifies the amount of memory to use for
readahead (zero disables it entirely); an equivalent option can be
given in e2fsck.conf.
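
The default policy can be sketched roughly as follows.  This is not
the e2fsck code: the helper names and the READAHEAD_UNSET marker are
invented for illustration, the units are assumed to be bytes, and
sysconf(_SC_PHYS_PAGES) is a common extension rather than something
every system provides.

```c
#include <assert.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical marker meaning "the user gave no explicit size". */
#define READAHEAD_UNSET UINT64_MAX

/* Physical memory in bytes, or 0 if the system will not tell us. */
uint64_t physical_memory_bytes(void)
{
#if defined(_SC_PHYS_PAGES) && defined(_SC_PAGESIZE)
	long pages = sysconf(_SC_PHYS_PAGES);
	long page_size = sysconf(_SC_PAGESIZE);

	if (pages > 0 && page_size > 0)
		return (uint64_t) pages * (uint64_t) page_size;
#endif
	return 0;
}

/*
 * Pick the readahead budget: half of physical memory by default, or
 * exactly what the user asked for.  A request of zero disables
 * readahead.
 */
uint64_t readahead_budget(uint64_t phys_bytes, uint64_t requested)
{
	if (requested == READAHEAD_UNSET)
		return phys_bytes / 2;
	return requested;
}
```

A caller would pass physical_memory_bytes() as the first argument and
skip readahead entirely when the resulting budget is zero.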

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 MCONFIG.in              |    1 
 configure               |   49 +++++++++++++++++
 configure.in            |    6 ++
 e2fsck/Makefile.in      |    4 +
 e2fsck/e2fsck.8.in      |    9 +++
 e2fsck/e2fsck.c         |  136 +++++++++++++++++++++++++++++++++++++++++++++++
 e2fsck/e2fsck.conf.5.in |   13 ++++
 e2fsck/e2fsck.h         |   25 +++++++++
 e2fsck/pass1.c          |   83 +++++++++++++++++++++++++++++
 e2fsck/pass2.c          |   96 +++++++++++++++++++++++++++++++++
 e2fsck/pass4.c          |   22 ++++++++
 e2fsck/prof_err.et      |    1 
 e2fsck/rehash.c         |   10 +++
 e2fsck/unix.c           |   35 +++++++++++-
 e2fsck/util.c           |   51 ++++++++++++++++++
 lib/config.h.in         |    9 +++
 16 files changed, 544 insertions(+), 6 deletions(-)


diff --git a/MCONFIG.in b/MCONFIG.in
index 9b411d6..6ee88db 100644
--- a/MCONFIG.in
+++ b/MCONFIG.in
@@ -116,6 +116,7 @@ LIBUUID = @LIBUUID@ @SOCKET_LIB@
 LIBQUOTA = @STATIC_LIBQUOTA@
 LIBBLKID = @LIBBLKID@ @PRIVATE_LIBS_CMT@ $(LIBUUID)
 LIBINTL = @LIBINTL@
+LIBPTHREADS = @PTHREADS_LIB@
 SYSLIBS = @LIBS@
 DEPLIBSS = $(LIB)/libss@LIB_EXT@
 DEPLIBCOM_ERR = $(LIB)/libcom_err@LIB_EXT@
diff --git a/configure b/configure
index 7b0a0d1..5b89229 100755
--- a/configure
+++ b/configure
@@ -639,6 +639,7 @@ CYGWIN_CMT
 LINUX_CMT
 UNI_DIFF_OPTS
 SEM_INIT_LIB
+PTHREADS_LIB
 SOCKET_LIB
 SIZEOF_OFF_T
 SIZEOF_LONG_LONG
@@ -10474,7 +10475,7 @@ fi
 done
 
 fi
-for ac_header in  	dirent.h 	errno.h 	execinfo.h 	getopt.h 	malloc.h 	mntent.h 	paths.h 	semaphore.h 	setjmp.h 	signal.h 	stdarg.h 	stdint.h 	stdlib.h 	termios.h 	termio.h 	unistd.h 	utime.h 	linux/falloc.h 	linux/fd.h 	linux/major.h 	linux/loop.h 	net/if_dl.h 	netinet/in.h 	sys/disklabel.h 	sys/file.h 	sys/ioctl.h 	sys/mkdev.h 	sys/mman.h 	sys/prctl.h 	sys/queue.h 	sys/resource.h 	sys/select.h 	sys/socket.h 	sys/sockio.h 	sys/stat.h 	sys/syscall.h 	sys/sysmacros.h 	sys/time.h 	sys/types.h 	sys/un.h 	sys/wait.h
+for ac_header in  	dirent.h 	errno.h 	execinfo.h 	getopt.h 	malloc.h 	mntent.h 	paths.h 	semaphore.h 	setjmp.h 	signal.h 	stdarg.h 	stdint.h 	stdlib.h 	termios.h 	termio.h 	unistd.h 	utime.h 	linux/falloc.h 	linux/fd.h 	linux/major.h 	linux/loop.h 	net/if_dl.h 	netinet/in.h 	sys/disklabel.h 	sys/file.h 	sys/ioctl.h 	sys/mkdev.h 	sys/mman.h 	sys/prctl.h 	sys/queue.h 	sys/resource.h 	sys/select.h 	sys/socket.h 	sys/sockio.h 	sys/stat.h 	sys/syscall.h 	sys/sysctl.h 	sys/sysmacros.h 	sys/time.h 	sys/types.h 	sys/un.h 	sys/wait.h
 do :
   as_ac_Header=`$as_echo "ac_cv_header_$ac_header" | $as_tr_sh`
 ac_fn_c_check_header_mongrel "$LINENO" "$ac_header" "$as_ac_Header" "$ac_includes_default"
@@ -11235,6 +11236,52 @@ if test $ac_cv_have_optreset = yes; then
 $as_echo "#define HAVE_OPTRESET 1" >>confdefs.h
 
 fi
+PTHREADS_LIB='-lpthread'
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for pthread_create in -lpthread" >&5
+$as_echo_n "checking for pthread_create in -lpthread... " >&6; }
+if ${ac_cv_lib_pthread_pthread_create+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+  ac_check_lib_save_LIBS=$LIBS
+LIBS="-lpthread  $LIBS"
+cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+
+/* Override any GCC internal prototype to avoid an error.
+   Use char because int might match the return type of a GCC
+   builtin and then its argument prototype would still apply.  */
+#ifdef __cplusplus
+extern "C"
+#endif
+char pthread_create ();
+int
+main ()
+{
+return pthread_create ();
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_link "$LINENO"; then :
+  ac_cv_lib_pthread_pthread_create=yes
+else
+  ac_cv_lib_pthread_pthread_create=no
+fi
+rm -f core conftest.err conftest.$ac_objext \
+    conftest$ac_exeext conftest.$ac_ext
+LIBS=$ac_check_lib_save_LIBS
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_lib_pthread_pthread_create" >&5
+$as_echo "$ac_cv_lib_pthread_pthread_create" >&6; }
+if test "x$ac_cv_lib_pthread_pthread_create" = xyes; then :
+  cat >>confdefs.h <<_ACEOF
+#define HAVE_LIBPTHREAD 1
+_ACEOF
+
+  LIBS="-lpthread $LIBS"
+
+fi
+
 
 SEM_INIT_LIB=''
 ac_fn_c_check_func "$LINENO" "sem_init" "ac_cv_func_sem_init"
diff --git a/configure.in b/configure.in
index f28bd46..d2cfe41 100644
--- a/configure.in
+++ b/configure.in
@@ -961,6 +961,7 @@ AC_CHECK_HEADERS(m4_flatten([
 	sys/sockio.h
 	sys/stat.h
 	sys/syscall.h
+	sys/sysctl.h
 	sys/sysmacros.h
 	sys/time.h
 	sys/types.h
@@ -1173,6 +1174,11 @@ if test $ac_cv_have_optreset = yes; then
   AC_DEFINE(HAVE_OPTRESET, 1, [Define to 1 if optreset for getopt is present])
 fi
 dnl
+dnl Test for pthread_create in -lpthread
+dnl
+PTHREADS_LIB='-lpthread'
+AC_CHECK_LIB(pthread, pthread_create, AC_SUBST(PTHREADS_LIB))
+dnl
 dnl Test for sem_init, and which library it might require:
 dnl
 AH_TEMPLATE([HAVE_SEM_INIT], [Define to 1 if sem_init() exists])
diff --git a/e2fsck/Makefile.in b/e2fsck/Makefile.in
index 5c8ce39..7136f7f 100644
--- a/e2fsck/Makefile.in
+++ b/e2fsck/Makefile.in
@@ -16,13 +16,13 @@ MANPAGES=	e2fsck.8
 FMANPAGES=	e2fsck.conf.5
 
 LIBS= $(LIBQUOTA) $(LIBEXT2FS) $(LIBCOM_ERR) $(LIBBLKID) $(LIBUUID) \
-	$(LIBINTL) $(LIBE2P) $(SYSLIBS)
+	$(LIBINTL) $(LIBE2P) $(SYSLIBS) $(LIBPTHREADS)
 DEPLIBS= $(DEPLIBQUOTA) $(LIBEXT2FS) $(DEPLIBCOM_ERR) $(DEPLIBBLKID) \
 	 $(DEPLIBUUID) $(DEPLIBE2P)
 
 STATIC_LIBS= $(STATIC_LIBQUOTA) $(STATIC_LIBEXT2FS) $(STATIC_LIBCOM_ERR) \
 	     $(STATIC_LIBBLKID) $(STATIC_LIBUUID) $(LIBINTL) $(STATIC_LIBE2P) \
-	     $(SYSLIBS)
+	     $(SYSLIBS) $(LIBPTHREADS)
 STATIC_DEPLIBS= $(DEPSTATIC_LIBQUOTA) $(STATIC_LIBEXT2FS) \
 		$(DEPSTATIC_LIBCOM_ERR) $(DEPSTATIC_LIBBLKID) \
 		$(DEPSTATIC_LIBUUID) $(DEPSTATIC_LIBE2P)
diff --git a/e2fsck/e2fsck.8.in b/e2fsck/e2fsck.8.in
index 43ee063..90eda4c 100644
--- a/e2fsck/e2fsck.8.in
+++ b/e2fsck/e2fsck.8.in
@@ -34,6 +34,10 @@ e2fsck \- check a Linux ext2/ext3/ext4 file system
 .B \-E
 .I extended_options
 ]
+[
+.B \-R
+.I readahead_mem_kb
+]
 .I device
 .SH DESCRIPTION
 .B e2fsck
@@ -302,6 +306,11 @@ options.
 This option does nothing at all; it is provided only for backwards
 compatibility.
 .TP
+.B \-R
+Use at most this many KiB to pre-fetch metadata in the hopes of reducing
+e2fsck runtime.  By default, this uses half the physical memory in the
+system; setting this value to zero disables readahead entirely.
+.TP
 .B \-t
 Print timing statistics for
 .BR e2fsck .
diff --git a/e2fsck/e2fsck.c b/e2fsck/e2fsck.c
index 0ec1540..c5d823c 100644
--- a/e2fsck/e2fsck.c
+++ b/e2fsck/e2fsck.c
@@ -15,6 +15,10 @@
 #include "e2fsck.h"
 #include "problem.h"
 
+#ifdef HAVE_PTHREAD_H
+#include <pthread.h>
+#endif
+
 /*
  * This function allocates an e2fsck context
  */
@@ -44,6 +48,8 @@ errcode_t e2fsck_allocate_context(e2fsck_t *ret)
 			context->flags |= E2F_FLAG_TIME_INSANE;
 	}
 
+	e2fsck_init_thread(&context->ra_thread);
+
 	*ret = context;
 	return 0;
 }
@@ -209,6 +215,7 @@ int e2fsck_run(e2fsck_t ctx)
 {
 	int	i;
 	pass_t	e2fsck_pass;
+	errcode_t	err;
 
 #ifdef HAVE_SETJMP_H
 	if (setjmp(ctx->abort_loc)) {
@@ -226,6 +233,10 @@ int e2fsck_run(e2fsck_t ctx)
 		e2fsck_pass(ctx);
 		if (ctx->progress)
 			(void) (ctx->progress)(ctx, 0, 0, 0);
+		err = e2fsck_stop_thread(&ctx->ra_thread, NULL);
+		if (err)
+			com_err(ctx->program_name, err, "%s",
+				_("while stopping readahead"));
 	}
 	ctx->flags &= ~E2F_FLAG_SETJMP_OK;
 
@@ -233,3 +244,128 @@ int e2fsck_run(e2fsck_t ctx)
 		return (ctx->flags & E2F_FLAG_RUN_RETURN);
 	return 0;
 }
+
+#ifdef HAVE_PTHREAD_H
+struct run_threaded {
+	struct e2fsck_thread *thread;
+	void * (*func)(void *);
+	void (*cleanup)(void *);
+	void *arg;
+};
+
+static void run_threaded_cleanup(void *p)
+{
+	struct run_threaded *rt = p;
+
+	if (rt->cleanup)
+		rt->cleanup(rt->arg);
+	pthread_mutex_lock(&rt->thread->lock);
+	rt->thread->running = 0;
+	pthread_mutex_unlock(&rt->thread->lock);
+	ext2fs_free_mem(&rt);
+}
+
+static void *run_threaded_helper(void *p)
+{
+	int old;
+	struct run_threaded *rt = p;
+	void *ret;
+
+	pthread_cleanup_push(run_threaded_cleanup, rt);
+	pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS, &old);
+	ret = rt->func(rt->arg);
+	pthread_setcanceltype(old, NULL);
+	pthread_cleanup_pop(1);
+	pthread_exit(ret);
+	return NULL;
+}
+#endif /* HAVE_PTHREAD_H */
+
+errcode_t e2fsck_init_thread(struct e2fsck_thread *thread)
+{
+	errcode_t err = 0;
+
+	thread->magic = E2FSCK_ET_MAGIC_RUN_THREAD;
+#ifdef HAVE_PTHREAD_H
+	err = pthread_mutex_init(&thread->lock, NULL);
+#endif /* HAVE_PTHREAD_H */
+
+	return err;
+}
+
+errcode_t e2fsck_run_thread(struct e2fsck_thread *thread,
+			    void * (*func)(void *), void (*cleanup)(void *),
+			    void *arg)
+{
+#ifdef HAVE_PTHREAD_H
+	struct run_threaded *rt;
+#endif
+	errcode_t err = 0, err2;
+
+	EXT2_CHECK_MAGIC(thread, E2FSCK_ET_MAGIC_RUN_THREAD);
+#ifdef HAVE_PTHREAD_H
+	err = pthread_mutex_lock(&thread->lock);
+	if (err)
+		return err;
+
+	if (thread->running) {
+		err = EAGAIN;
+		goto out;
+	}
+
+	err = pthread_join(thread->tid, NULL);
+	if (err && err != ESRCH)
+		goto out;
+
+	err = ext2fs_get_mem(sizeof(*rt), &rt);
+	if (err)
+		goto out;
+
+	rt->thread = thread;
+	rt->func = func;
+	rt->cleanup = cleanup;
+	rt->arg = arg;
+
+	err = pthread_create(&thread->tid, NULL, run_threaded_helper, rt);
+	if (err)
+		ext2fs_free_mem(&rt);
+	else
+		thread->running = 1;
+out:
+	pthread_mutex_unlock(&thread->lock);
+#else
+	thread->ret = func(arg);
+	if (cleanup)
+		cleanup(arg);
+#endif /* HAVE_PTHREAD_H */
+
+	return err;
+}
+
+errcode_t e2fsck_stop_thread(struct e2fsck_thread *thread, void **ret)
+{
+	errcode_t err = 0, err2;
+
+	EXT2_CHECK_MAGIC(thread, E2FSCK_ET_MAGIC_RUN_THREAD);
+
+#ifdef HAVE_PTHREAD_H
+	err = pthread_mutex_lock(&thread->lock);
+	if (err)
+		return err;
+	if (thread->running)
+		err = pthread_cancel(thread->tid);
+	if (err == ESRCH)
+		err = 0;
+	err2 = pthread_mutex_unlock(&thread->lock);
+	if (!err && err2)
+		err = err2;
+	if (!err)
+		err = pthread_join(thread->tid, ret);
+	if (err == ESRCH)
+		err = 0;
+#else
+	if (ret)
+		*ret = thread->ret;
+#endif
+	return err;
+}
diff --git a/e2fsck/e2fsck.conf.5.in b/e2fsck/e2fsck.conf.5.in
index a8219a8..fcda392 100644
--- a/e2fsck/e2fsck.conf.5.in
+++ b/e2fsck/e2fsck.conf.5.in
@@ -205,6 +205,19 @@ of that type are squelched.  This can be useful if the console is slow
 (i.e., connected to a serial port) and so a large amount of output could
 end up delaying the boot process for a long time (potentially hours).
 .TP
+.I readahead_mem_pct
+Use no more than this percentage of memory to try to read in metadata blocks
+ahead of the main e2fsck thread.  This should reduce run times, depending on
+the speed of the underlying storage and the amount of free memory.  By default,
+this is set to 50%.
+.TP
+.I readahead_mem_kb
+Use no more than this amount of memory to read in metadata blocks ahead of the
+main checking thread.  Setting this value to zero disables readahead entirely.
+There is no default, but see
+.B readahead_mem_pct
+for more details.
+.TP
 .I report_features
 If this boolean relation is true, e2fsck will print the file system
 features as part of its verbose reporting (i.e., if the
diff --git a/e2fsck/e2fsck.h b/e2fsck/e2fsck.h
index d7a7be9..8ceeff9 100644
--- a/e2fsck/e2fsck.h
+++ b/e2fsck/e2fsck.h
@@ -11,6 +11,7 @@
 
 #include <stdio.h>
 #include <string.h>
+#include <stdint.h>
 #ifdef HAVE_UNISTD_H
 #include <unistd.h>
 #endif
@@ -69,6 +70,24 @@
 
 #include "quota/mkquota.h"
 
+/* Functions to run something asynchronously */
+struct e2fsck_thread {
+	int magic;
+#ifdef HAVE_PTHREAD_H
+	int running;
+	pthread_t tid;
+	pthread_mutex_t lock;
+#else
+	void *ret;
+#endif /* HAVE_PTHREAD_H */
+};
+
+errcode_t e2fsck_init_thread(struct e2fsck_thread *thread);
+errcode_t e2fsck_run_thread(struct e2fsck_thread *thread,
+			    void * (*func)(void *), void (*cleanup)(void *),
+			    void *arg);
+errcode_t e2fsck_stop_thread(struct e2fsck_thread *thread, void **ret);
+
 /*
  * Exit codes used by fsck-type programs
  */
@@ -373,6 +392,10 @@ struct e2fsck_struct {
 	 * e2fsck functions themselves.
 	 */
 	void *priv_data;
+
+	/* How much are we allowed to readahead? */
+	unsigned long long readahead_mem_kb;
+	struct e2fsck_thread ra_thread;
 };
 
 /* Used by the region allocation code */
@@ -495,6 +518,7 @@ void e2fsck_rehash_dir_later(e2fsck_t ctx, ext2_ino_t ino);
 int e2fsck_dir_will_be_rehashed(e2fsck_t ctx, ext2_ino_t ino);
 errcode_t e2fsck_rehash_dir(e2fsck_t ctx, ext2_ino_t ino);
 void e2fsck_rehash_directories(e2fsck_t ctx);
+int e2fsck_will_rehash_dirs(e2fsck_t ctx);
 
 /* sigcatcher.c */
 void sigcatcher_setup(void);
@@ -573,6 +597,7 @@ extern errcode_t e2fsck_allocate_subcluster_bitmap(ext2_filsys fs,
 						   int default_type,
 						   const char *profile_name,
 						   ext2fs_block_bitmap *ret);
+int64_t get_memory_size(void);
 
 /* unix.c */
 extern void e2fsck_clear_progbar(e2fsck_t ctx);
diff --git a/e2fsck/pass1.c b/e2fsck/pass1.c
index eb9497c..a6d3297 100644
--- a/e2fsck/pass1.c
+++ b/e2fsck/pass1.c
@@ -589,6 +589,49 @@ static errcode_t recheck_bad_inode_checksum(ext2_filsys fs, ext2_ino_t ino,
 	return 0;
 }
 
+struct pass1ra_ctx {
+	ext2_filsys fs;
+	dgrp_t group;
+	dgrp_t ngroups;
+};
+
+static void pass1_readahead_cleanup(void *p)
+{
+	struct pass1ra_ctx *c = p;
+
+	ext2fs_free_mem(&p);
+}
+
+static void *pass1_readahead(void *p)
+{
+	struct pass1ra_ctx *c = p;
+	errcode_t err;
+
+	ext2fs_readahead(c->fs, EXT2_READA_ITABLE, c->group, c->ngroups);
+	return NULL;
+}
+
+static errcode_t initiate_readahead(e2fsck_t ctx, dgrp_t group, dgrp_t ngroups)
+{
+	struct pass1ra_ctx *ractx;
+	errcode_t err;
+
+	err = ext2fs_get_mem(sizeof(*ractx), &ractx);
+	if (err)
+		return err;
+
+	ractx->fs = ctx->fs;
+	ractx->group = group;
+	ractx->ngroups = ngroups;
+
+	err = e2fsck_run_thread(&ctx->ra_thread, pass1_readahead,
+				pass1_readahead_cleanup, ractx);
+	if (err)
+		ext2fs_free_mem(&ractx);
+
+	return err;
+}
+
 void e2fsck_pass1(e2fsck_t ctx)
 {
 	int	i;
@@ -611,10 +654,37 @@ void e2fsck_pass1(e2fsck_t ctx)
 	int		busted_fs_time = 0;
 	int		inode_size;
 	int		failed_csum = 0;
+	dgrp_t		grp;
+	ext2_ino_t	ra_threshold = 0;
+	dgrp_t		ra_groups = 0;
+	errcode_t	err;
 
 	init_resource_track(&rtrack, ctx->fs->io);
 	clear_problem_context(&pctx);
 
+	/* If we can do readahead, figure out how many groups to pull in. */
+	if (!ext2fs_can_readahead(ctx->fs))
+		ctx->readahead_mem_kb = 0;
+	if (ctx->readahead_mem_kb) {
+		ra_groups = ctx->readahead_mem_kb /
+			    (fs->inode_blocks_per_group * fs->blocksize /
+			     1024);
+		if (ra_groups < 16)
+			ra_groups = 0;
+		else if (ra_groups > fs->group_desc_count)
+			ra_groups = fs->group_desc_count;
+		if (ra_groups) {
+			err = initiate_readahead(ctx, 0, ra_groups);
+			if (err) {
+				com_err(ctx->program_name, err, "%s",
+					_("while starting pass1 readahead"));
+				ra_groups = 0;
+			}
+			ra_threshold = ra_groups *
+				       fs->super->s_inodes_per_group;
+		}
+	}
+
 	if (!(ctx->options & E2F_OPT_PREEN))
 		fix_problem(ctx, PR_1_PASS_HEADER, &pctx);
 
@@ -778,6 +848,19 @@ void e2fsck_pass1(e2fsck_t ctx)
 			if (e2fsck_mmp_update(fs))
 				fatal_error(ctx, 0);
 		}
+		if (ra_groups && ino > ra_threshold) {
+			grp = (ino - 1) / fs->super->s_inodes_per_group;
+			ra_threshold = (grp + ra_groups) *
+				       fs->super->s_inodes_per_group;
+			err = initiate_readahead(ctx, grp, ra_groups);
+			if (err == EAGAIN) {
+				printf("Disabling slow readahead.\n");
+				ra_groups = 0;
+			} else if (err) {
+				com_err(ctx->program_name, err, "%s",
+					_("while starting pass1 readahead"));
+			}
+		}
 		old_op = ehandler_operation(_("getting next inode from scan"));
 		pctx.errcode = ext2fs_get_next_inode_full(scan, &ino,
 							  inode, inode_size);
diff --git a/e2fsck/pass2.c b/e2fsck/pass2.c
index 99b4042..292db82 100644
--- a/e2fsck/pass2.c
+++ b/e2fsck/pass2.c
@@ -61,6 +61,9 @@
  * Keeps track of how many times an inode is referenced.
  */
 static void deallocate_inode(e2fsck_t ctx, ext2_ino_t ino, char* block_buf);
+static int check_dir_block2(ext2_filsys fs,
+			   struct ext2_db_entry2 *dir_blocks_info,
+			   void *priv_data);
 static int check_dir_block(ext2_filsys fs,
 			   struct ext2_db_entry2 *dir_blocks_info,
 			   void *priv_data);
@@ -77,8 +80,67 @@ struct check_dir_struct {
 	struct problem_context	pctx;
 	int	count, max;
 	e2fsck_t ctx;
+	int	save_readahead;
+};
+
+struct pass2_readahead_data {
+	ext2_filsys fs;
+	ext2_dblist dblist;
 };
 
+static int readahead_dir_block(ext2_filsys fs, struct ext2_db_entry2 *db,
+			       void *priv_data)
+{
+	db->blockcnt = 1;
+	return 0;
+}
+
+static void pass2_readahead_cleanup(void *p)
+{
+	struct pass2_readahead_data *pr = p;
+
+	ext2fs_free_dblist(pr->dblist);
+	ext2fs_free_mem(&pr);
+}
+
+static void *pass2_readahead(void *p)
+{
+	struct pass2_readahead_data *pr = p;
+
+	ext2fs_readahead_dblist(pr->fs, 0, pr->dblist);
+	return NULL;
+}
+
+static errcode_t initiate_readahead(e2fsck_t ctx)
+{
+	struct pass2_readahead_data *pr;
+	errcode_t err;
+
+	err = ext2fs_get_mem(sizeof(*pr), &pr);
+	if (err)
+		return err;
+	pr->fs = ctx->fs;
+	err = ext2fs_copy_dblist(ctx->fs->dblist, &pr->dblist);
+	if (err)
+		goto out_pr;
+	err = ext2fs_dblist_iterate2(pr->dblist, readahead_dir_block,
+				     NULL);
+	if (err)
+		goto out_dblist;
+	err = e2fsck_run_thread(&ctx->ra_thread, pass2_readahead,
+				pass2_readahead_cleanup, pr);
+	if (err)
+		goto out_dblist;
+
+	return 0;
+
+out_dblist:
+	ext2fs_free_dblist(pr->dblist);
+out_pr:
+	ext2fs_free_mem(&pr);
+	return err;
+}
+
 void e2fsck_pass2(e2fsck_t ctx)
 {
 	struct ext2_super_block *sb = ctx->fs->super;
@@ -96,6 +158,10 @@ void e2fsck_pass2(e2fsck_t ctx)
 	int			i, depth;
 	problem_t		code;
 	int			bad_dir;
+	int (*check_dir_func)(ext2_filsys fs,
+			      struct ext2_db_entry2 *dir_blocks_info,
+			      void *priv_data);
+	errcode_t		err;
 
 	init_resource_track(&rtrack, ctx->fs->io);
 	clear_problem_context(&cd.pctx);
@@ -139,6 +205,7 @@ void e2fsck_pass2(e2fsck_t ctx)
 	cd.ctx = ctx;
 	cd.count = 1;
 	cd.max = ext2fs_dblist_count2(fs->dblist);
+	cd.save_readahead = e2fsck_will_rehash_dirs(ctx);
 
 	if (ctx->progress)
 		(void) (ctx->progress)(ctx, 2, 0, cd.max);
@@ -146,7 +213,16 @@ void e2fsck_pass2(e2fsck_t ctx)
 	if (fs->super->s_feature_compat & EXT2_FEATURE_COMPAT_DIR_INDEX)
 		ext2fs_dblist_sort2(fs->dblist, special_dir_block_cmp);
 
-	cd.pctx.errcode = ext2fs_dblist_iterate2(fs->dblist, check_dir_block,
+	if (ctx->readahead_mem_kb) {
+		check_dir_func = check_dir_block2;
+		err = initiate_readahead(ctx);
+		if (err)
+			com_err(ctx->program_name, err, "%s",
+				_("while starting pass2 readahead"));
+	} else
+		check_dir_func = check_dir_block;
+
+	cd.pctx.errcode = ext2fs_dblist_iterate2(fs->dblist, check_dir_func,
 						 &cd);
 	if (ctx->flags & E2F_FLAG_SIGNAL_MASK || ctx->flags & E2F_FLAG_RESTART)
 		return;
@@ -655,6 +731,7 @@ clear_and_exit:
 	clear_htree(cd->ctx, cd->pctx.ino);
 	dx_dir->numblocks = 0;
 	e2fsck_rehash_dir_later(cd->ctx, cd->pctx.ino);
+	cd->save_readahead = 1;
 }
 #endif /* ENABLE_HTREE */
 
@@ -774,6 +851,19 @@ static errcode_t insert_dirent_tail(ext2_filsys fs, void *dirbuf)
 	return 0;
 }
 
+static int check_dir_block2(ext2_filsys fs,
+			   struct ext2_db_entry2 *db,
+			   void *priv_data)
+{
+	int err;
+	struct check_dir_struct *cd = priv_data;
+
+	err = check_dir_block(fs, db, priv_data);
+	if (!cd->save_readahead)
+		io_channel_cache_release(fs->io, db->blk, 1);
+	return err;
+}
+
 static int check_dir_block(ext2_filsys fs,
 			   struct ext2_db_entry2 *db,
 			   void *priv_data)
@@ -957,6 +1047,7 @@ out_htree:
 					 &cd->pctx))
 				goto skip_checksum;
 			e2fsck_rehash_dir_later(ctx, ino);
+			cd->save_readahead = 1;
 			goto skip_checksum;
 		}
 		if (failed_csum) {
@@ -1249,6 +1340,7 @@ skip_checksum:
 			pctx.dirent = dirent;
 			fix_problem(ctx, PR_2_REPORT_DUP_DIRENT, &pctx);
 			e2fsck_rehash_dir_later(ctx, ino);
+			cd->save_readahead = 1;
 			dups_found++;
 		} else
 			dict_alloc_insert(&de_dict, dirent, dirent);
@@ -1316,6 +1408,8 @@ skip_checksum:
 			if (insert_dirent_tail(fs, buf) == 0)
 				goto write_and_fix;
 			e2fsck_rehash_dir_later(ctx, ino);
+			cd->save_readahead = 1;
+		}
 
 write_and_fix:
 		if (e2fsck_dir_will_be_rehashed(ctx, ino))
diff --git a/e2fsck/pass4.c b/e2fsck/pass4.c
index 21d93f0..959dfc3 100644
--- a/e2fsck/pass4.c
+++ b/e2fsck/pass4.c
@@ -87,6 +87,21 @@ static int disconnect_inode(e2fsck_t ctx, ext2_ino_t i,
 	return 0;
 }
 
+/* Since pass4 is mostly CPU bound, start readahead of bitmaps for pass 5. */
+static void *pass5_readahead(void *p)
+{
+	ext2_filsys fs = p;
+
+	ext2fs_readahead(fs, EXT2_READA_BBITMAP | EXT2_READA_IBITMAP, 0,
+			 fs->group_desc_count);
+	return NULL;
+}
+
+static errcode_t initiate_readahead(e2fsck_t ctx)
+{
+	return e2fsck_run_thread(&ctx->ra_thread, pass5_readahead, NULL,
+				 ctx->fs);
+}
 
 void e2fsck_pass4(e2fsck_t ctx)
 {
@@ -100,12 +115,19 @@ void e2fsck_pass4(e2fsck_t ctx)
 	__u16	link_count, link_counted;
 	char	*buf = 0;
 	dgrp_t	group, maxgroup;
+	errcode_t	err;
 
 	init_resource_track(&rtrack, ctx->fs->io);
 
 #ifdef MTRACE
 	mtrace_print("Pass 4");
 #endif
+	if (ctx->readahead_mem_kb) {
+		err = initiate_readahead(ctx);
+		if (err)
+			com_err(ctx->program_name, err, "%s",
+				_("while starting pass5 readahead"));
+	}
 
 	clear_problem_context(&pctx);
 
diff --git a/e2fsck/prof_err.et b/e2fsck/prof_err.et
index c9316c7..21fb524 100644
--- a/e2fsck/prof_err.et
+++ b/e2fsck/prof_err.et
@@ -62,5 +62,6 @@ error_code	PROF_BAD_INTEGER,		"Invalid integer value"
 
 error_code	PROF_MAGIC_FILE_DATA, "Bad magic value in profile_file_data_t"
 
+error_code	E2FSCK_ET_MAGIC_RUN_THREAD,	"Wrong magic number for e2fsck_thread structure"
 
 end
diff --git a/e2fsck/rehash.c b/e2fsck/rehash.c
index 3b05715..89708c2 100644
--- a/e2fsck/rehash.c
+++ b/e2fsck/rehash.c
@@ -71,6 +71,16 @@ int e2fsck_dir_will_be_rehashed(e2fsck_t ctx, ext2_ino_t ino)
 	return ext2fs_u32_list_test(ctx->dirs_to_hash, ino);
 }
 
+/* Ask if there will be a pass 3A. */
+int e2fsck_will_rehash_dirs(e2fsck_t ctx)
+{
+	if (ctx->options & E2F_OPT_COMPRESS_DIRS)
+		return 1;
+	if (!ctx->dirs_to_hash)
+		return 0;
+	return ext2fs_u32_list_count(ctx->dirs_to_hash) > 0;
+}
+
 struct fill_dir_struct {
 	char *buf;
 	struct ext2_inode *inode;
diff --git a/e2fsck/unix.c b/e2fsck/unix.c
index 80ebdb1..d6ef8c5 100644
--- a/e2fsck/unix.c
+++ b/e2fsck/unix.c
@@ -74,7 +74,7 @@ static void usage(e2fsck_t ctx)
 		_("Usage: %s [-panyrcdfvtDFV] [-b superblock] [-B blocksize]\n"
 		"\t\t[-I inode_buffer_blocks] [-P process_inode_size]\n"
 		"\t\t[-l|-L bad_blocks_file] [-C fd] [-j external_journal]\n"
-		"\t\t[-E extended-options] device\n"),
+		"\t\t[-E extended-options] [-R readahead_kb] device\n"),
 		ctx->program_name);
 
 	fprintf(stderr, "%s", _("\nEmergency help:\n"
@@ -90,6 +90,7 @@ static void usage(e2fsck_t ctx)
 		" -j external_journal  Set location of the external journal\n"
 		" -l bad_blocks_file   Add to badblocks list\n"
 		" -L bad_blocks_file   Set badblocks list\n"
+		" -R readahead_kb      Allow this much readahead.\n"
 		));
 
 	exit(FSCK_USAGE);
@@ -749,6 +750,7 @@ static errcode_t PRS(int argc, char *argv[], e2fsck_t *ret_ctx)
 #ifdef CONFIG_JBD_DEBUG
 	char 		*jbd_debug;
 #endif
+	unsigned long long phys_mem_kb, reada_kb;
 
 	retval = e2fsck_allocate_context(&ctx);
 	if (retval)
@@ -776,8 +778,16 @@ static errcode_t PRS(int argc, char *argv[], e2fsck_t *ret_ctx)
 	else
 		ctx->program_name = "e2fsck";
 
-	while ((c = getopt (argc, argv, "panyrcC:B:dE:fvtFVM:b:I:j:P:l:L:N:SsDk")) != EOF)
+	phys_mem_kb = get_memory_size() / 1024;
+	reada_kb = ~0ULL;
+	while ((c = getopt(argc, argv,
+			   "panyrcC:B:dE:fvtFVM:b:I:j:P:l:L:N:SsDkR:")) != EOF)
 		switch (c) {
+		case 'R':
+			res = sscanf(optarg, "%llu", &reada_kb);
+			if (res != 1)
+				goto sscanf_err;
+			break;
 		case 'C':
 			ctx->progress = e2fsck_update_progress;
 			res = sscanf(optarg, "%d", &ctx->progress_fd);
@@ -965,6 +975,22 @@ static errcode_t PRS(int argc, char *argv[], e2fsck_t *ret_ctx)
 	if (c)
 		verbose = 1;
 
+	/* Figure out how much memory goes to readahead */
+	profile_get_integer(ctx->profile, "options", "readahead_mem_pct", 0,
+			    50, &c);
+	if (c >= 0 && c <= 100)
+		ctx->readahead_mem_kb = phys_mem_kb * c / 100;
+	else
+		ctx->readahead_mem_kb = phys_mem_kb / 2;
+	profile_get_integer(ctx->profile, "options", "readahead_mem_kb", 0,
+			    -1, &c);
+	if (c >= 0)
+		ctx->readahead_mem_kb = c;
+	if (reada_kb != ~0ULL)
+		ctx->readahead_mem_kb = reada_kb;
+	if (ctx->readahead_mem_kb > phys_mem_kb)
+		ctx->readahead_mem_kb = phys_mem_kb;
+
 	/* Turn off discard in read-only mode */
 	if ((ctx->options & E2F_OPT_NO) &&
 	    (ctx->options & E2F_OPT_DISCARD))
@@ -1782,6 +1808,11 @@ no_journal:
 		}
 	}
 
+	retval = e2fsck_stop_thread(&ctx->ra_thread, NULL);
+	if (retval)
+		com_err(ctx->program_name, retval, "%s",
+			_("while stopping readahead"));
+
 	e2fsck_write_bitmaps(ctx);
 	io_channel_flush(ctx->fs->io);
 	print_resource_track(ctx, NULL, &ctx->global_rtrack, ctx->fs->io);
diff --git a/e2fsck/util.c b/e2fsck/util.c
index fec6179..09b78c2 100644
--- a/e2fsck/util.c
+++ b/e2fsck/util.c
@@ -37,6 +37,10 @@
 #include <errno.h>
 #endif
 
+#ifdef HAVE_SYS_SYSCTL_H
+#include <sys/sysctl.h>
+#endif
+
 #include "e2fsck.h"
 
 extern e2fsck_t e2fsck_global_ctx;   /* Try your very best not to use this! */
@@ -845,3 +849,50 @@ errcode_t e2fsck_allocate_subcluster_bitmap(ext2_filsys fs, const char *descr,
 	fs->default_bitmap_type = save_type;
 	return retval;
 }
+
+/* Return memory size in bytes */
+int64_t get_memory_size(void)
+{
+#if defined(_SC_PHYS_PAGES)
+# if defined(_SC_PAGESIZE)
+	return (int64_t)sysconf(_SC_PHYS_PAGES) *
+	       (int64_t)sysconf(_SC_PAGESIZE);
+# elif defined(_SC_PAGE_SIZE)
+	return (int64_t)sysconf(_SC_PHYS_PAGES) *
+	       (int64_t)sysconf(_SC_PAGE_SIZE);
+# endif
+#elif defined(_SC_AIX_REALMEM)
+	return (int64_t)sysconf(_SC_AIX_REALMEM) * (int64_t)1024L;
+#elif defined(CTL_HW)
+# if (defined(HW_MEMSIZE) || defined(HW_PHYSMEM64))
+#  define CTL_HW_INT64
+# elif (defined(HW_PHYSMEM) || defined(HW_REALMEM))
+#  define CTL_HW_UINT
+# endif
+	int mib[2];
+	mib[0] = CTL_HW;
+# if defined(HW_MEMSIZE)
+	mib[1] = HW_MEMSIZE;
+# elif defined(HW_PHYSMEM64)
+	mib[1] = HW_PHYSMEM64;
+# elif defined(HW_REALMEM)
+	mib[1] = HW_REALMEM;
+# elif defined(HW_PHYSMEM)
+	mib[1] = HW_PHYSMEM;
+# endif
+# if defined(CTL_HW_INT64)
+	int64_t size = 0;
+# elif defined(CTL_HW_UINT)
+	unsigned int size = 0;
+# endif
+# if defined(CTL_HW_INT64) || defined(CTL_HW_UINT)
+	size_t len = sizeof(size);
+	if (sysctl(mib, 2, &size, &len, NULL, 0) == 0)
+		return (int64_t)size;
+# endif
+	return 0;
+#else
+# warning "Don't know how to detect memory on your platform?"
+	return 0;
+#endif
+}
diff --git a/lib/config.h.in b/lib/config.h.in
index e0384ee..836c2df 100644
--- a/lib/config.h.in
+++ b/lib/config.h.in
@@ -203,6 +203,9 @@
 /* Define if your <locale.h> file defines LC_MESSAGES. */
 #undef HAVE_LC_MESSAGES
 
+/* Define to 1 if you have the `pthread' library (-lpthread). */
+#undef HAVE_LIBPTHREAD
+
 /* Define to 1 if you have the <limits.h> header file. */
 #undef HAVE_LIMITS_H
 
@@ -314,6 +317,9 @@
 /* Define to 1 if you have the `pread' function. */
 #undef HAVE_PREAD
 
+/* Define to 1 if you have the <pthread.h> header file. */
+#undef HAVE_PTHREAD_H
+
 /* Define to 1 if you have the `putenv' function. */
 #undef HAVE_PUTENV
 
@@ -465,6 +471,9 @@
 /* Define to 1 if you have the <sys/syscall.h> header file. */
 #undef HAVE_SYS_SYSCALL_H
 
+/* Define to 1 if you have the <sys/sysctl.h> header file. */
+#undef HAVE_SYS_SYSCTL_H
+
 /* Define to 1 if you have the <sys/sysmacros.h> header file. */
 #undef HAVE_SYS_SYSMACROS_H
 


^ permalink raw reply related	[flat|nested] 88+ messages in thread
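
For readers following patch 37: a minimal sketch, not code from the
patch, of how the readahead budget and the pass 1 group window appear
to be derived in the unix.c and pass1.c hunks above.  The names
readahead_budget_kb(), readahead_groups() and RA_UNSET are made up for
illustration.  The precedence (readahead_mem_pct, then
readahead_mem_kb, then -R, then a clamp to physical memory) and the
16-group minimum are read off the diff.

```c
#include <stdio.h>

/* Hypothetical sentinel meaning "-R was not given on the command line". */
#define RA_UNSET (~0ULL)

/*
 * Hypothetical helper mirroring the precedence in the PRS() hunk:
 * start from a percentage of physical memory (50% if the percentage is
 * out of range), let readahead_mem_kb from e2fsck.conf override that,
 * let -R override both, and never exceed physical memory.
 */
static unsigned long long readahead_budget_kb(unsigned long long phys_mem_kb,
					      int conf_pct, int conf_kb,
					      unsigned long long cmdline_kb)
{
	unsigned long long kb;

	if (conf_pct >= 0 && conf_pct <= 100)
		kb = phys_mem_kb * conf_pct / 100;
	else
		kb = phys_mem_kb / 2;
	if (conf_kb >= 0)
		kb = conf_kb;
	if (cmdline_kb != RA_UNSET)
		kb = cmdline_kb;
	if (kb > phys_mem_kb)
		kb = phys_mem_kb;
	return kb;
}

/*
 * Hypothetical helper mirroring the pass 1 setup: how many block
 * groups' worth of inode tables fit in the budget.  Fewer than 16
 * groups is treated as not worth the trouble.  The zero-size guard is
 * an addition of this sketch, not something the patch does.
 */
static unsigned int readahead_groups(unsigned long long budget_kb,
				     unsigned int inode_blocks_per_group,
				     unsigned int blocksize,
				     unsigned int group_count)
{
	unsigned long long itable_kb, groups;

	itable_kb = (unsigned long long)inode_blocks_per_group *
		    blocksize / 1024;
	if (!budget_kb || !itable_kb)
		return 0;
	groups = budget_kb / itable_kb;
	if (groups < 16)
		return 0;
	if (groups > group_count)
		groups = group_count;
	return (unsigned int)groups;
}
```

Assuming 8GiB of RAM and no overrides, the budget works out to 4194304
KiB; with 512 inode table blocks per group and 4KiB blocks, that covers
2048 groups before the clamp to the filesystem's group count.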

* [PATCH 38/49] libext2fs: when appending to a file, don't split an index block in equal halves
  2014-03-11  6:53 [PATCH 00/49] e2fsprogs patchbomb 3/14 Darrick J. Wong
                   ` (36 preceding siblings ...)
  2014-03-11  6:57 ` [PATCH 37/49] e2fsck: read-ahead metadata during passes 1, 2, and 4 Darrick J. Wong
@ 2014-03-11  6:58 ` Darrick J. Wong
  2014-03-11  6:58 ` [PATCH 39/49] libext2fs: find inode goal when allocating blocks Darrick J. Wong
                   ` (8 subsequent siblings)
  46 siblings, 0 replies; 88+ messages in thread
From: Darrick J. Wong @ 2014-03-11  6:58 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

When we're appending an extent to the end of a file and the index
block is full, don't split the index block into two half-full index
blocks because this leaves us with underutilized index blocks, at
least in the fallocate case.  Instead, copy the last extent from the
full block into the new block.  This isn't perfect utilization, but
there's a lot of work involved in teaching extent.c to be able to go to
a nonexistent node in a newly allocated (and empty) extent block.

This patch does not fix the general problem of keeping the extent tree
balanced.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 lib/ext2fs/extent.c |   79 ++++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 72 insertions(+), 7 deletions(-)


diff --git a/lib/ext2fs/extent.c b/lib/ext2fs/extent.c
index 80ce88f..cf75a0b 100644
--- a/lib/ext2fs/extent.c
+++ b/lib/ext2fs/extent.c
@@ -29,6 +29,8 @@
 #include "ext2fsP.h"
 #include "e2image.h"
 
+#undef DEBUG
+
 /*
  * Definitions to be dropped in lib/ext2fs/ext2fs.h
  */
@@ -122,11 +124,39 @@ static void dbg_print_extent(char *desc, struct ext2fs_extent *extent)
 
 }
 
+static void dump_path(const char *tag, struct ext2_extent_handle *handle,
+		      struct extent_path *path)
+{
+	struct extent_path *ppp = path;
+	printf("%s: level=%d\n", tag, handle->level);
+
+	do {
+		printf("%s: path=%ld buf=%p entries=%d max_entries=%d left=%d "
+		       "visit_num=%d flags=0x%x end_blk=%llu curr=%p(%ld)\n",
+		       tag, (ppp - handle->path), ppp->buf, ppp->entries,
+		       ppp->max_entries, ppp->left, ppp->visit_num, ppp->flags,
+		       ppp->end_blk, ppp->curr, ppp->curr - (void *)ppp->buf);
+		printf("  ");
+		dbg_show_header((struct ext3_extent_header *)ppp->buf);
+		if (ppp->curr) {
+			printf("  ");
+			dbg_show_index(ppp->curr);
+			printf("  ");
+			dbg_show_extent(ppp->curr);
+		}
+		ppp--;
+	} while (ppp >= handle->path);
+	fflush(stdout);
+
+	return;
+}
+
 #else
 #define dbg_show_header(eh) do { } while (0)
 #define dbg_show_index(ix) do { } while (0)
 #define dbg_show_extent(ex) do { } while (0)
 #define dbg_print_extent(desc, ex) do { } while (0)
+#define dump_path(tag, handle, path) do { } while (0)
 #endif
 
 /*
@@ -837,12 +867,31 @@ errcode_t ext2fs_extent_replace(ext2_extent_handle_t handle,
 	return 0;
 }
 
+static int splitting_at_eof(struct ext2_extent_handle *handle,
+			    struct extent_path *path)
+{
+	struct extent_path *ppp = path;
+	dump_path(__func__, handle, path);
+
+	if (handle->level == 0)
+		return 0;
+
+	do {
+		if (ppp->left)
+			return 0;
+		ppp--;
+	} while (ppp >= handle->path);
+
+	return 1;
+}
+
 /*
  * allocate a new block, move half the current node to it, and update parent
  *
  * handle will be left pointing at original record.
  */
-errcode_t ext2fs_extent_node_split(ext2_extent_handle_t handle)
+static errcode_t extent_node_split(ext2_extent_handle_t handle,
+				   int expand_allowed)
 {
 	errcode_t			retval = 0;
 	blk64_t				new_node_pblk;
@@ -857,6 +906,7 @@ errcode_t ext2fs_extent_node_split(ext2_extent_handle_t handle)
 	int				tocopy;
 	int				new_root = 0;
 	struct ext2_extent_info		info;
+	int				no_balance;
 
 	/* basic sanity */
 	EXT2_CHECK_MAGIC(handle, EXT2_ET_MAGIC_EXTENT_HANDLE);
@@ -897,7 +947,7 @@ errcode_t ext2fs_extent_node_split(ext2_extent_handle_t handle)
 			goto done;
 		goal_blk = extent.e_pblk;
 
-		retval = ext2fs_extent_node_split(handle);
+		retval = extent_node_split(handle, expand_allowed);
 		if (retval)
 			goto done;
 
@@ -912,6 +962,14 @@ errcode_t ext2fs_extent_node_split(ext2_extent_handle_t handle)
 	if (!path->curr)
 		return EXT2_ET_NO_CURRENT_NODE;
 
+	/*
+	 * Normally, we try to split a full node in half.  This doesn't turn
+	 * out so well if we're tacking extents on the end of the file because
+	 * then we're stuck with a tree of half-full extent blocks.  This of
+	 * course doesn't apply to the root level.
+	 */
+	no_balance = expand_allowed ? splitting_at_eof(handle, path) : 0;
+
 	/* extent header of the current node we'll split */
 	eh = (struct ext3_extent_header *)path->buf;
 
@@ -925,7 +983,10 @@ errcode_t ext2fs_extent_node_split(ext2_extent_handle_t handle)
 		if (retval)
 			goto done;
 	} else {
-		tocopy = ext2fs_le16_to_cpu(eh->eh_entries) / 2;
+		if (no_balance)
+			tocopy = 1;
+		else
+			tocopy = ext2fs_le16_to_cpu(eh->eh_entries) / 2;
 	}
 
 #ifdef DEBUG
@@ -934,7 +995,7 @@ errcode_t ext2fs_extent_node_split(ext2_extent_handle_t handle)
 				handle->level);
 #endif
 
-	if (!tocopy) {
+	if (!tocopy && !no_balance) {
 #ifdef DEBUG
 		printf("Nothing to copy to new block!\n");
 #endif
@@ -1059,8 +1120,7 @@ errcode_t ext2fs_extent_node_split(ext2_extent_handle_t handle)
 		goto done;
 
 	/* new node hooked in, so update inode block count (do this here?) */
-	handle->inode->i_blocks += (handle->fs->blocksize *
-				    EXT2FS_CLUSTER_RATIO(handle->fs)) / 512;
+	ext2fs_iblk_add_blocks(handle->fs, handle->inode, 1);
 	retval = ext2fs_write_inode(handle->fs, handle->ino,
 				    handle->inode);
 	if (retval)
@@ -1074,6 +1134,11 @@ done:
 	return retval;
 }
 
+errcode_t ext2fs_extent_node_split(ext2_extent_handle_t handle)
+{
+	return extent_node_split(handle, 0);
+}
+
 errcode_t ext2fs_extent_insert(ext2_extent_handle_t handle, int flags,
 				      struct ext2fs_extent *extent)
 {
@@ -1105,7 +1170,7 @@ errcode_t ext2fs_extent_insert(ext2_extent_handle_t handle, int flags,
 			printf("node full (level %d) - splitting\n",
 				   handle->level);
 #endif
-			retval = ext2fs_extent_node_split(handle);
+			retval = extent_node_split(handle, 1);
 			if (retval)
 				return retval;
 			path = handle->path + handle->level;



* [PATCH 39/49] libext2fs: find inode goal when allocating blocks
  2014-03-11  6:53 [PATCH 00/49] e2fsprogs patchbomb 3/14 Darrick J. Wong
                   ` (37 preceding siblings ...)
  2014-03-11  6:58 ` [PATCH 38/49] libext2fs: when appending to a file, don't split an index block in equal halves Darrick J. Wong
@ 2014-03-11  6:58 ` Darrick J. Wong
  2014-03-11  6:58 ` [PATCH 40/49] libext2fs: find a range of empty blocks Darrick J. Wong
                   ` (7 subsequent siblings)
  46 siblings, 0 replies; 88+ messages in thread
From: Darrick J. Wong @ 2014-03-11  6:58 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

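Try to be a little smarter about where we allocate blocks for an
inode.  Callers that used to pass a goal of 0 now start the search at
the first block of the inode's flex group (or of its block group, if
s_log_groups_per_flex is zero).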

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
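A rough, standalone sketch of the goal computation, for readers who want
to see the arithmetic without the library around it.  Everything named
toy_* is made up for illustration and is not part of libext2fs; the
sketch assumes that the group of an inode is (ino - 1) / inodes_per_group
and that a group's first block is first_data_block + group *
blocks_per_group.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for the superblock fields the goal computation
 * reads.  These names are illustrative, not libext2fs types.
 */
struct toy_geometry {
	uint32_t first_data_block;	/* like s_first_data_block */
	uint32_t blocks_per_group;	/* like s_blocks_per_group */
	uint32_t inodes_per_group;	/* like s_inodes_per_group */
	uint8_t  log_groups_per_flex;	/* like s_log_groups_per_flex */
};

/* Block group that holds inode number ino (inode numbers start at 1). */
static uint32_t toy_group_of_ino(const struct toy_geometry *g, uint32_t ino)
{
	return (ino - 1) / g->inodes_per_group;
}

/* First block of the flex group (or plain group) containing ino. */
static uint64_t toy_find_inode_goal(const struct toy_geometry *g, uint32_t ino)
{
	uint32_t group = toy_group_of_ino(g, ino);

	/* Round down to the first group of the flex group, if any. */
	if (g->log_groups_per_flex)
		group &= ~((1U << g->log_groups_per_flex) - 1);
	return g->first_data_block + (uint64_t)group * g->blocks_per_group;
}
```

With 16 groups per flex group (log_groups_per_flex = 4), an inode in
group 17 gets the first block of group 16 as its goal; with flex_bg off
it gets the first block of group 17.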
 e2fsck/pass2.c         |    3 ++-
 lib/ext2fs/alloc.c     |   10 ++++++++++
 lib/ext2fs/bmap.c      |    5 +++--
 lib/ext2fs/expanddir.c |    2 +-
 lib/ext2fs/ext2fs.h    |    1 +
 lib/ext2fs/ext_attr.c  |    3 +--
 lib/ext2fs/extent.c    |   10 ++--------
 lib/ext2fs/mkdir.c     |    3 ++-
 lib/ext2fs/symlink.c   |    3 ++-
 9 files changed, 24 insertions(+), 16 deletions(-)


diff --git a/e2fsck/pass2.c b/e2fsck/pass2.c
index 292db82..5b84947 100644
--- a/e2fsck/pass2.c
+++ b/e2fsck/pass2.c
@@ -1729,7 +1729,8 @@ static int allocate_dir_block(e2fsck_t ctx,
 	/*
 	 * First, find a free block
 	 */
-	pctx->errcode = ext2fs_new_block2(fs, 0, ctx->block_found_map, &blk);
+	blk = ext2fs_find_inode_goal(fs, db->ino);
+	pctx->errcode = ext2fs_new_block2(fs, blk, ctx->block_found_map, &blk);
 	if (pctx->errcode) {
 		pctx->str = "ext2fs_new_block";
 		fix_problem(ctx, PR_2_ALLOC_DIRBOCK, pctx);
diff --git a/lib/ext2fs/alloc.c b/lib/ext2fs/alloc.c
index 1be4ecc..aa084ac 100644
--- a/lib/ext2fs/alloc.c
+++ b/lib/ext2fs/alloc.c
@@ -293,3 +293,13 @@ void ext2fs_set_alloc_block_callback(ext2_filsys fs,
 
 	fs->get_alloc_block = func;
 }
+
+blk64_t ext2fs_find_inode_goal(ext2_filsys fs, ext2_ino_t ino)
+{
+	dgrp_t	group = ext2fs_group_of_ino(fs, ino);
+	__u8	log_flex = fs->super->s_log_groups_per_flex;
+
+	if (log_flex)
+		group = group & ~((1 << (log_flex)) - 1);
+	return ext2fs_group_first_block2(fs, group);
+}
diff --git a/lib/ext2fs/bmap.c b/lib/ext2fs/bmap.c
index a4dc8ef..7623052 100644
--- a/lib/ext2fs/bmap.c
+++ b/lib/ext2fs/bmap.c
@@ -252,7 +252,7 @@ got_block:
 		retval = extent_bmap(fs, ino, inode, handle, block_buf,
 				     0, block-1, 0, blocks_alloc, &blk64);
 		if (retval)
-			blk64 = 0;
+			blk64 = ext2fs_find_inode_goal(fs, ino);
 		retval = ext2fs_alloc_block2(fs, blk64, block_buf,
 					     &blk64);
 		if (retval)
@@ -368,7 +368,8 @@ errcode_t ext2fs_bmap2(ext2_filsys fs, ext2_ino_t ino, struct ext2_inode *inode,
 		}
 
 		*phys_blk = inode_bmap(inode, block);
-		b = block ? inode_bmap(inode, block-1) : 0;
+		b = block ? inode_bmap(inode, block-1) :
+			    ext2fs_find_inode_goal(fs, ino);
 
 		if ((*phys_blk == 0) && (bmap_flags & BMAP_ALLOC)) {
 			retval = ext2fs_alloc_block(fs, b, block_buf, &b);
diff --git a/lib/ext2fs/expanddir.c b/lib/ext2fs/expanddir.c
index d0f7287..2df49ce 100644
--- a/lib/ext2fs/expanddir.c
+++ b/lib/ext2fs/expanddir.c
@@ -111,7 +111,7 @@ errcode_t ext2fs_expand_dir(ext2_filsys fs, ext2_ino_t dir)
 
 	es.done = 0;
 	es.err = 0;
-	es.goal = 0;
+	es.goal = ext2fs_find_inode_goal(fs, dir);
 	es.newblocks = 0;
 	es.dir = dir;
 
diff --git a/lib/ext2fs/ext2fs.h b/lib/ext2fs/ext2fs.h
index 933a14d..d3a7f04 100644
--- a/lib/ext2fs/ext2fs.h
+++ b/lib/ext2fs/ext2fs.h
@@ -690,6 +690,7 @@ extern void ext2fs_set_alloc_block_callback(ext2_filsys fs,
 					    errcode_t (**old)(ext2_filsys fs,
 							      blk64_t goal,
 							      blk64_t *ret));
+blk64_t ext2fs_find_inode_goal(ext2_filsys fs, ext2_ino_t ino);
 
 /* alloc_sb.c */
 extern int ext2fs_reserve_super_and_bgd(ext2_filsys fs,
diff --git a/lib/ext2fs/ext_attr.c b/lib/ext2fs/ext_attr.c
index 308d21d..a756b7b 100644
--- a/lib/ext2fs/ext_attr.c
+++ b/lib/ext2fs/ext_attr.c
@@ -404,8 +404,7 @@ static errcode_t prep_ea_block_for_write(ext2_filsys fs, ext2_ino_t ino,
 	}
 
 	/* Allocate a block */
-	grp = ext2fs_group_of_ino(fs, ino);
-	goal = ext2fs_inode_table_loc(fs, grp);
+	goal = ext2fs_find_inode_goal(fs, ino);
 	err = ext2fs_alloc_block2(fs, goal, NULL, &blk);
 	if (err)
 		goto out2;
diff --git a/lib/ext2fs/extent.c b/lib/ext2fs/extent.c
index cf75a0b..5a6c5b5 100644
--- a/lib/ext2fs/extent.c
+++ b/lib/ext2fs/extent.c
@@ -1010,14 +1010,8 @@ static errcode_t extent_node_split(ext2_extent_handle_t handle,
 		goto done;
 	}
 
-	if (!goal_blk) {
-		dgrp_t	group = ext2fs_group_of_ino(handle->fs, handle->ino);
-		__u8	log_flex = handle->fs->super->s_log_groups_per_flex;
-
-		if (log_flex)
-			group = group & ~((1 << (log_flex)) - 1);
-		goal_blk = ext2fs_group_first_block2(handle->fs, group);
-	}
+	if (!goal_blk)
+		goal_blk = ext2fs_find_inode_goal(handle->fs, handle->ino);
 	retval = ext2fs_alloc_block2(handle->fs, goal_blk, block_buf,
 				    &new_node_pblk);
 	if (retval)
diff --git a/lib/ext2fs/mkdir.c b/lib/ext2fs/mkdir.c
index c4c7967..36b1810 100644
--- a/lib/ext2fs/mkdir.c
+++ b/lib/ext2fs/mkdir.c
@@ -69,7 +69,8 @@ errcode_t ext2fs_mkdir(ext2_filsys fs, ext2_ino_t parent, ext2_ino_t inum,
 	 * Allocate a data block for the directory
 	 */
 	if (!inline_data) {
-		retval = ext2fs_new_block2(fs, 0, 0, &blk);
+		retval = ext2fs_new_block2(fs, ext2fs_find_inode_goal(fs, ino),
+					   0, &blk);
 		if (retval)
 			goto cleanup;
 	}
diff --git a/lib/ext2fs/symlink.c b/lib/ext2fs/symlink.c
index b2ef66c..cb3a2e7 100644
--- a/lib/ext2fs/symlink.c
+++ b/lib/ext2fs/symlink.c
@@ -53,7 +53,8 @@ errcode_t ext2fs_symlink(ext2_filsys fs, ext2_ino_t parent, ext2_ino_t ino,
 	 */
 	fastlink = (target_len < sizeof(inode.i_block));
 	if (!fastlink) {
-		retval = ext2fs_new_block2(fs, 0, 0, &blk);
+		retval = ext2fs_new_block2(fs, ext2fs_find_inode_goal(fs, ino),
+					   0, &blk);
 		if (retval)
 			goto cleanup;
 		retval = ext2fs_get_mem(fs->blocksize, &block_buf);



* [PATCH 40/49] libext2fs: find a range of empty blocks
  2014-03-11  6:53 [PATCH 00/49] e2fsprogs patchbomb 3/14 Darrick J. Wong
                   ` (38 preceding siblings ...)
  2014-03-11  6:58 ` [PATCH 39/49] libext2fs: find inode goal when allocating blocks Darrick J. Wong
@ 2014-03-11  6:58 ` Darrick J. Wong
  2014-03-11  6:58 ` [PATCH 41/49] libext2fs: provide a function to set inode size Darrick J. Wong
                   ` (6 subsequent siblings)
  46 siblings, 0 replies; 88+ messages in thread
From: Darrick J. Wong @ 2014-03-11  6:58 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

Provide a function that, given a goal pblk and a length, will try to
find a run of free blocks to satisfy the allocation.  By default the
function will look anywhere in the filesystem for the run, though this
can be constrained with optional flags.  EXT2_NEWRANGE_FIXED_GOAL
indicates that the range must start at the goal block;
EXT2_NEWRANGE_EXACT_LENGTH indicates that we should not return a range
shorter than len.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
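To make the flag semantics concrete, here is a minimal sketch of the
same search over a plain byte array instead of an ext2fs bitmap.  It is
not the code in this patch: toy_new_range and the TOY_* flags are
hypothetical names, the array stands in for the block bitmap, and the
wrap-around bookkeeping is simplified.

```c
#include <stdint.h>
#include <stddef.h>

#define TOY_FIXED_GOAL		0x1	/* run must start at goal */
#define TOY_EXACT_LENGTH	0x2	/* do not return fewer than len blocks */

/*
 * used[i] != 0 means block i is allocated.  Returns 0 and fills *pblk and
 * *plen on success, -1 if no suitable run was found.
 */
static int toy_new_range(const unsigned char *used, uint64_t nblocks,
			 int flags, uint64_t goal, uint64_t len,
			 uint64_t *pblk, uint64_t *plen)
{
	uint64_t scanned = 0, start, end;

	if (len == 0 || nblocks == 0)
		return -1;
	if (goal >= nblocks)
		goal = 0;

	start = goal;
	while (scanned < nblocks) {
		/* Skip allocated blocks, wrapping around the end once. */
		if (used[start]) {
			if (flags & TOY_FIXED_GOAL)
				return -1;
			start = (start + 1) % nblocks;
			scanned++;
			continue;
		}
		/* Measure the free run, capped at len and at the last block. */
		end = start;
		while (end < nblocks && end - start < len && !used[end])
			end++;
		if (!(flags & TOY_EXACT_LENGTH) || end - start >= len) {
			*pblk = start;
			*plen = end - start;
			return 0;
		}
		if (flags & TOY_FIXED_GOAL)
			return -1;
		scanned += end - start;
		start = end % nblocks;
	}
	return -1;
}
```

Without TOY_EXACT_LENGTH the caller may get back a shorter run than it
asked for and is expected to call again for the remainder, which is how
a fallocate-style caller would map in a fragmented range piece by piece.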
 lib/ext2fs/alloc.c  |  105 +++++++++++++++++++++++++++++++++++++++++++++++++++
 lib/ext2fs/ext2fs.h |    6 +++
 2 files changed, 111 insertions(+)


diff --git a/lib/ext2fs/alloc.c b/lib/ext2fs/alloc.c
index aa084ac..109a050 100644
--- a/lib/ext2fs/alloc.c
+++ b/lib/ext2fs/alloc.c
@@ -26,6 +26,16 @@
 #include "ext2_fs.h"
 #include "ext2fs.h"
 
+#define min(a, b) ((a) < (b) ? (a) : (b))
+
+#undef DEBUG
+
+#ifdef DEBUG
+# define dbg_printf(f, a...)  do {printf(f, ## a); fflush(stdout); } while (0)
+#else
+# define dbg_printf(f, a...)
+#endif
+
 /*
  * Clear the uninit block bitmap flag if necessary
  */
@@ -303,3 +313,98 @@ blk64_t ext2fs_find_inode_goal(ext2_filsys fs, ext2_ino_t ino)
 		group = group & ~((1 << (log_flex)) - 1);
 	return ext2fs_group_first_block2(fs, group);
 }
+
+/*
+ * Starting at _goal_, scan around the filesystem to find a run of free blocks
+ * that's up to _len_ blocks long.  If EXT2_NEWRANGE_FIXED_GOAL is given,
+ * then the range of blocks must start at _goal_.  If
+ * EXT2_NEWRANGE_EXACT_LENGTH is given, do not return an allocation shorter
+ * than _len_.
+ *
+ * The starting block is returned in _pblk_ and the length is returned via
+ * _plen_.
+ */
+errcode_t ext2fs_new_range(ext2_filsys fs, int flags, blk64_t goal,
+			   blk64_t len, ext2fs_block_bitmap map, blk64_t *pblk,
+			   blk64_t *plen)
+{
+	errcode_t retval;
+	blk64_t start, end, b;
+	int looped = 0;
+	blk64_t max_blocks = ext2fs_blocks_count(fs->super);
+
+	dbg_printf("%s: flags=0x%x goal=%llu len=%llu\n", __func__, flags,
+		   goal, len);
+	EXT2_CHECK_MAGIC(fs, EXT2_ET_MAGIC_EXT2FS_FILSYS);
+	if (len == 0 || (flags & ~EXT2_NEWRANGE_ALL_FLAGS))
+		return EXT2_ET_INVALID_ARGUMENT;
+	if (!map)
+		map = fs->block_map;
+	if (!map)
+		return EXT2_ET_NO_BLOCK_BITMAP;
+	if (!goal || goal >= ext2fs_blocks_count(fs->super))
+		goal = fs->super->s_first_data_block;
+
+	start = goal;
+	while (!looped || start <= goal) {
+		retval = ext2fs_find_first_zero_block_bitmap2(map,
+						start, max_blocks - 1, &start);
+		if (retval == ENOENT) {
+			/*
+			 * If there are no free blocks beyond the starting
+			 * point, try scanning the whole filesystem, unless the
+			 * user told us only to allocate from _goal_, or
+			 * we're already scanning the whole filesystem.
+			 */
+			if (flags & EXT2_NEWRANGE_FIXED_GOAL ||
+			    start == fs->super->s_first_data_block)
+				goto fail;
+			start = fs->super->s_first_data_block;
+			continue;
+		} else if (retval)
+			goto errout;
+
+		if (flags & EXT2_NEWRANGE_FIXED_GOAL && start != goal)
+			goto fail;
+
+		b = min(start + len - 1, max_blocks - 1);
+		retval = ext2fs_find_first_set_block_bitmap2(map,
+						start, b, &end);
+		if (retval == ENOENT)
+			end = b + 1;
+		else if (retval)
+			goto errout;
+
+		if (!(flags & EXT2_NEWRANGE_EXACT_LENGTH) ||
+		    (end - start) >= len) {
+			*pblk = start;
+			*plen = end - start;
+			dbg_printf("%s: new_range goal=%llu--%llu "
+				   "blk=%llu--%llu %llu\n",
+				   __func__, goal, goal + len - 1,
+				   *pblk, *pblk + *plen - 1, *plen);
+
+			for (b = start; b < end;
+			     b += fs->super->s_blocks_per_group)
+				clear_block_uninit(fs,
+						ext2fs_group_of_blk2(fs, b));
+			return 0;
+		}
+
+try_again:
+		if (flags & EXT2_NEWRANGE_FIXED_GOAL)
+			goto fail;
+		start = end;
+		if (start >= max_blocks) {
+			if (looped)
+				goto fail;
+			looped = 1;
+			start = fs->super->s_first_data_block;
+		}
+	}
+
+fail:
+	retval = EXT2_ET_BLOCK_ALLOC_FAIL;
+errout:
+	return retval;
+}
diff --git a/lib/ext2fs/ext2fs.h b/lib/ext2fs/ext2fs.h
index d3a7f04..7354e4d 100644
--- a/lib/ext2fs/ext2fs.h
+++ b/lib/ext2fs/ext2fs.h
@@ -691,6 +691,12 @@ extern void ext2fs_set_alloc_block_callback(ext2_filsys fs,
 							      blk64_t goal,
 							      blk64_t *ret));
 blk64_t ext2fs_find_inode_goal(ext2_filsys fs, ext2_ino_t ino);
+#define EXT2_NEWRANGE_FIXED_GOAL	(0x1)
+#define EXT2_NEWRANGE_EXACT_LENGTH	(0x2)
+#define EXT2_NEWRANGE_ALL_FLAGS		(0x3)
+errcode_t ext2fs_new_range(ext2_filsys fs, int flags, blk64_t goal,
+			   blk64_t len, ext2fs_block_bitmap map, blk64_t *pblk,
+			   blk64_t *plen);
 
 /* alloc_sb.c */
 extern int ext2fs_reserve_super_and_bgd(ext2_filsys fs,



* [PATCH 41/49] libext2fs: provide a function to set inode size
  2014-03-11  6:53 [PATCH 00/49] e2fsprogs patchbomb 3/14 Darrick J. Wong
                   ` (39 preceding siblings ...)
  2014-03-11  6:58 ` [PATCH 40/49] libext2fs: find a range of empty blocks Darrick J. Wong
@ 2014-03-11  6:58 ` Darrick J. Wong
  2014-03-11  6:58 ` [PATCH 42/49] libext2fs: implement fallocate Darrick J. Wong
                   ` (5 subsequent siblings)
  46 siblings, 0 replies; 88+ messages in thread
From: Darrick J. Wong @ 2014-03-11  6:58 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

Provide an API to set i_size in an inode and take care of all required
feature flag modifications.  Refactor the code to use this new
function.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
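For reference, a small self-contained sketch of what setting i_size
involves: refuse sizes of 4GB or more for anything but a regular file,
note when the large_file feature is needed, and split the 64-bit size
into its low and high halves.  The toy_* names are hypothetical, and the
2GB threshold is an assumption about what
ext2fs_needs_large_file_feature() tests, not something this patch
defines.

```c
#include <stdint.h>

#define TOY_S_IFMT	0170000
#define TOY_S_IFREG	0100000
#define TOY_ERR_TOO_BIG	(-1)	/* stand-in for EXT2_ET_FILE_TOO_BIG */

/* Illustrative subset of the on-disk inode fields. */
struct toy_inode {
	uint16_t i_mode;
	uint32_t i_size;	/* low 32 bits of the size */
	uint32_t i_size_high;	/* high 32 bits of the size */
};

/*
 * Store size in the inode.  *large_file is set to 1 when a regular file
 * crosses the (assumed) 2GB threshold; it is never cleared here.
 */
static int toy_inode_set_size(struct toy_inode *inode, uint64_t size,
			      int *large_file)
{
	int is_reg = (inode->i_mode & TOY_S_IFMT) == TOY_S_IFREG;

	/* Only regular files get to use the high 32 bits. */
	if (!is_reg && (size >> 32))
		return TOY_ERR_TOO_BIG;
	if (is_reg && size >= 0x80000000ULL)
		*large_file = 1;

	inode->i_size = (uint32_t)(size & 0xffffffff);
	inode->i_size_high = (uint32_t)(size >> 32);
	return 0;
}
```

The inode is left untouched when the size is rejected, so a caller can
report the error and carry on with the old size.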
 e2fsck/pass1.c              |    9 ++++-----
 e2fsck/pass2.c              |   11 +++++++++--
 e2fsck/pass3.c              |    5 +++--
 e2fsck/rehash.c             |    5 ++++-
 lib/ext2fs/bb_inode.c       |    5 ++++-
 lib/ext2fs/ext2fs.h         |    2 ++
 lib/ext2fs/fileio.c         |   41 ++++++++++++++++++++++++++++-------------
 lib/ext2fs/mkjournal.c      |    8 +++-----
 lib/ext2fs/res_gdt.c        |    9 +++------
 lib/ext2fs/symlink.c        |    2 +-
 misc/create_inode.c         |    7 ++++++-
 tests/f_big_sparse/expect.1 |    5 -----
 12 files changed, 67 insertions(+), 42 deletions(-)


diff --git a/e2fsck/pass1.c b/e2fsck/pass1.c
index a6d3297..8f67b76 100644
--- a/e2fsck/pass1.c
+++ b/e2fsck/pass1.c
@@ -265,8 +265,7 @@ static void check_size(e2fsck_t ctx, struct problem_context *pctx)
 	if (!fix_problem(ctx, PR_1_SET_NONZSIZE, pctx))
 		return;
 
-	inode->i_size = 0;
-	inode->i_size_high = 0;
+	ext2fs_inode_set_size(ctx->fs, inode, 0);
 	e2fsck_write_inode(ctx, pctx->ino, pctx->inode, "pass1");
 }
 
@@ -2433,9 +2432,9 @@ static void check_blocks(e2fsck_t ctx, struct problem_context *pctx,
 		pctx->num = (pb.last_block+1) * fs->blocksize;
 		pctx->group = bad_size;
 		if (fix_problem(ctx, PR_1_BAD_I_SIZE, pctx)) {
-			inode->i_size = pctx->num;
-			if (!LINUX_S_ISDIR(inode->i_mode))
-				inode->i_size_high = pctx->num >> 32;
+			if (LINUX_S_ISDIR(inode->i_mode))
+				pctx->num &= 0xFFFFFFFFULL;
+			ext2fs_inode_set_size(fs, inode, pctx->num);
 			dirty_inode++;
 		}
 		pctx->num = 0;
diff --git a/e2fsck/pass2.c b/e2fsck/pass2.c
index 5b84947..238beb0 100644
--- a/e2fsck/pass2.c
+++ b/e2fsck/pass2.c
@@ -1768,8 +1768,15 @@ static int allocate_dir_block(e2fsck_t ctx,
 	 */
 	e2fsck_read_inode(ctx, db->ino, &inode, "allocate_dir_block");
 	ext2fs_iblk_add_blocks(fs, &inode, 1);
-	if (inode.i_size < (db->blockcnt+1) * fs->blocksize)
-		inode.i_size = (db->blockcnt+1) * fs->blocksize;
+	if (EXT2_I_SIZE(&inode) < (db->blockcnt+1) * fs->blocksize) {
+		pctx->errcode = ext2fs_inode_set_size(fs, &inode,
+					(db->blockcnt+1) * fs->blocksize);
+		if (pctx->errcode) {
+			pctx->str = "ext2fs_inode_set_size";
+			fix_problem(ctx, PR_2_ALLOC_DIRBOCK, pctx);
+			return 1;
+		}
+	}
 	e2fsck_write_inode(ctx, db->ino, &inode, "allocate_dir_block");
 
 	/*
diff --git a/e2fsck/pass3.c b/e2fsck/pass3.c
index efc0d49..324e398 100644
--- a/e2fsck/pass3.c
+++ b/e2fsck/pass3.c
@@ -865,8 +865,9 @@ errcode_t e2fsck_expand_directory(e2fsck_t ctx, ext2_ino_t dir,
 		return retval;
 
 	sz = (es.last_block + 1) * fs->blocksize;
-	inode.i_size = sz;
-	inode.i_size_high = sz >> 32;
+	retval = ext2fs_inode_set_size(fs, &inode, sz);
+	if (retval)
+		return retval;
 	ext2fs_iblk_add_blocks(fs, &inode, es.newblocks);
 	quota_data_add(ctx->qctx, &inode, dir, es.newblocks * fs->blocksize);
 
diff --git a/e2fsck/rehash.c b/e2fsck/rehash.c
index 89708c2..09c55e5 100644
--- a/e2fsck/rehash.c
+++ b/e2fsck/rehash.c
@@ -783,7 +783,10 @@ static errcode_t write_directory(e2fsck_t ctx, ext2_filsys fs,
 		inode.i_flags &= ~EXT2_INDEX_FL;
 	else
 		inode.i_flags |= EXT2_INDEX_FL;
-	inode.i_size = outdir->num * fs->blocksize;
+	retval = ext2fs_inode_set_size(fs, &inode,
+				       outdir->num * fs->blocksize);
+	if (retval)
+		return retval;
 	ext2fs_iblk_sub_blocks(fs, &inode, wd.cleared);
 	e2fsck_write_inode(ctx, ino, &inode, "rehash_dir");
 
diff --git a/lib/ext2fs/bb_inode.c b/lib/ext2fs/bb_inode.c
index 268eecf..3d9132b 100644
--- a/lib/ext2fs/bb_inode.c
+++ b/lib/ext2fs/bb_inode.c
@@ -128,7 +128,10 @@ errcode_t ext2fs_update_bb_inode(ext2_filsys fs, ext2_badblocks_list bb_list)
 	if (!inode.i_ctime)
 		inode.i_ctime = fs->now ? fs->now : time(0);
 	ext2fs_iblk_set(fs, &inode, rec.bad_block_count);
-	inode.i_size = rec.bad_block_count * fs->blocksize;
+	retval = ext2fs_inode_set_size(fs, &inode,
+				       rec.bad_block_count * fs->blocksize);
+	if (retval)
+		goto cleanup;
 
 	retval = ext2fs_write_inode(fs, EXT2_BAD_INO, &inode);
 	if (retval)
diff --git a/lib/ext2fs/ext2fs.h b/lib/ext2fs/ext2fs.h
index 7354e4d..1ae5295 100644
--- a/lib/ext2fs/ext2fs.h
+++ b/lib/ext2fs/ext2fs.h
@@ -1245,6 +1245,8 @@ errcode_t ext2fs_file_get_lsize(ext2_file_t file, __u64 *ret_size);
 extern ext2_off_t ext2fs_file_get_size(ext2_file_t file);
 extern errcode_t ext2fs_file_set_size(ext2_file_t file, ext2_off_t size);
 extern errcode_t ext2fs_file_set_size2(ext2_file_t file, ext2_off64_t size);
+errcode_t ext2fs_inode_set_size(ext2_filsys fs, struct ext2_inode *inode,
+				ext2_off64_t size);
 
 /* finddev.c */
 extern char *ext2fs_find_block_device(dev_t device);
diff --git a/lib/ext2fs/fileio.c b/lib/ext2fs/fileio.c
index 1e386f8..55affb4 100644
--- a/lib/ext2fs/fileio.c
+++ b/lib/ext2fs/fileio.c
@@ -567,6 +567,31 @@ out:
 	return retval;
 }
 
+errcode_t ext2fs_inode_set_size(ext2_filsys fs, struct ext2_inode *inode,
+				ext2_off64_t size)
+{
+	/* Only regular files get to be larger than 4GB */
+	if (!LINUX_S_ISREG(inode->i_mode) && (size >> 32))
+		return EXT2_ET_FILE_TOO_BIG;
+
+	/* If we're writing a large file, set the large_file flag */
+	if (LINUX_S_ISREG(inode->i_mode) &&
+	    ext2fs_needs_large_file_feature(size) &&
+	    (!EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
+					 EXT2_FEATURE_RO_COMPAT_LARGE_FILE) ||
+	     fs->super->s_rev_level == EXT2_GOOD_OLD_REV)) {
+		fs->super->s_feature_ro_compat |=
+					EXT2_FEATURE_RO_COMPAT_LARGE_FILE;
+		ext2fs_update_dynamic_rev(fs);
+		ext2fs_mark_super_dirty(fs);
+	}
+
+	inode->i_size = size & 0xffffffff;
+	inode->i_size_high = (size >> 32);
+
+	return 0;
+}
+
 /*
  * This function sets the size of the file, truncating it if necessary
  *
@@ -588,20 +613,10 @@ errcode_t ext2fs_file_set_size2(ext2_file_t file, ext2_off64_t size)
 	old_truncate = ((old_size + file->fs->blocksize - 1) >>
 		      EXT2_BLOCK_SIZE_BITS(file->fs->super));
 
-	/* If we're writing a large file, set the large_file flag */
-	if (LINUX_S_ISREG(file->inode.i_mode) &&
-	    ext2fs_needs_large_file_feature(EXT2_I_SIZE(&file->inode)) &&
-	    (!EXT2_HAS_RO_COMPAT_FEATURE(file->fs->super,
-					 EXT2_FEATURE_RO_COMPAT_LARGE_FILE) ||
-	     file->fs->super->s_rev_level == EXT2_GOOD_OLD_REV)) {
-		file->fs->super->s_feature_ro_compat |=
-				EXT2_FEATURE_RO_COMPAT_LARGE_FILE;
-		ext2fs_update_dynamic_rev(file->fs);
-		ext2fs_mark_super_dirty(file->fs);
-	}
+	retval = ext2fs_inode_set_size(file->fs, &file->inode, size);
+	if (retval)
+		return retval;
 
-	file->inode.i_size = size & 0xffffffff;
-	file->inode.i_size_high = (size >> 32);
 	if (file->ino) {
 		retval = ext2fs_write_inode(file->fs, file->ino, &file->inode);
 		if (retval)
diff --git a/lib/ext2fs/mkjournal.c b/lib/ext2fs/mkjournal.c
index ecc3912..11f33ab 100644
--- a/lib/ext2fs/mkjournal.c
+++ b/lib/ext2fs/mkjournal.c
@@ -400,15 +400,13 @@ static errcode_t write_journal_inode(ext2_filsys fs, ext2_ino_t journal_ino,
 		goto errout;
 
 	inode_size = (unsigned long long)fs->blocksize * num_blocks;
-	inode.i_size = inode_size & 0xFFFFFFFF;
-	inode.i_size_high = (inode_size >> 32) & 0xFFFFFFFF;
-	if (ext2fs_needs_large_file_feature(inode_size))
-		fs->super->s_feature_ro_compat |=
-			EXT2_FEATURE_RO_COMPAT_LARGE_FILE;
 	ext2fs_iblk_add_blocks(fs, &inode, es.newblocks);
 	inode.i_mtime = inode.i_ctime = fs->now ? fs->now : time(0);
 	inode.i_links_count = 1;
 	inode.i_mode = LINUX_S_IFREG | 0600;
+	retval = ext2fs_inode_set_size(fs, &inode, inode_size);
+	if (retval)
+		goto errout;
 
 	if ((retval = ext2fs_write_new_inode(fs, journal_ino, &inode)))
 		goto errout;
diff --git a/lib/ext2fs/res_gdt.c b/lib/ext2fs/res_gdt.c
index e61c330..1343ce6 100644
--- a/lib/ext2fs/res_gdt.c
+++ b/lib/ext2fs/res_gdt.c
@@ -133,12 +133,9 @@ errcode_t ext2fs_create_resize_inode(ext2_filsys fs)
 		dindir_dirty = inode_dirty = 1;
 		inode_size = apb*apb + apb + EXT2_NDIR_BLOCKS;
 		inode_size *= fs->blocksize;
-		inode.i_size = inode_size & 0xFFFFFFFF;
-		inode.i_size_high = (inode_size >> 32) & 0xFFFFFFFF;
-		if(inode.i_size_high) {
-			sb->s_feature_ro_compat |=
-				EXT2_FEATURE_RO_COMPAT_LARGE_FILE;
-		}
+		retval = ext2fs_inode_set_size(fs, &inode, inode_size);
+		if (retval)
+			goto out_free;
 		inode.i_ctime = fs->now ? fs->now : time(0);
 	}
 
diff --git a/lib/ext2fs/symlink.c b/lib/ext2fs/symlink.c
index cb3a2e7..4147181 100644
--- a/lib/ext2fs/symlink.c
+++ b/lib/ext2fs/symlink.c
@@ -80,7 +80,7 @@ errcode_t ext2fs_symlink(ext2_filsys fs, ext2_ino_t parent, ext2_ino_t ino,
 	inode.i_uid = inode.i_gid = 0;
 	ext2fs_iblk_set(fs, &inode, fastlink ? 0 : 1);
 	inode.i_links_count = 1;
-	inode.i_size = target_len;
+	ext2fs_inode_set_size(fs, &inode, target_len);
 	/* The time fields are set by ext2fs_write_new_inode() */
 
 	if (fastlink) {
diff --git a/misc/create_inode.c b/misc/create_inode.c
index fc4172d..98f2bd0 100644
--- a/misc/create_inode.c
+++ b/misc/create_inode.c
@@ -404,7 +404,12 @@ errcode_t do_write_internal(ext2_filsys fs, ext2_ino_t cwd, const char *src,
 	inode.i_atime = inode.i_ctime = inode.i_mtime =
 		fs->now ? fs->now : time(0);
 	inode.i_links_count = 1;
-	inode.i_size = statbuf.st_size;
+	retval = ext2fs_inode_set_size(fs, &inode, statbuf.st_size);
+	if (retval) {
+		com_err(dest, retval, 0);
+		close(fd);
+		return retval;
+	}
 	if (EXT2_HAS_INCOMPAT_FEATURE(fs->super,
 				      EXT4_FEATURE_INCOMPAT_INLINE_DATA)) {
 		inode.i_flags |= EXT4_INLINE_DATA_FL;
diff --git a/tests/f_big_sparse/expect.1 b/tests/f_big_sparse/expect.1
index 437ade7..eac82ed 100644
--- a/tests/f_big_sparse/expect.1
+++ b/tests/f_big_sparse/expect.1
@@ -2,11 +2,6 @@ Pass 1: Checking inodes, blocks, and sizes
 Inode 12, i_size is 61440, should be 4398050758656.  Fix? yes
 
 Pass 2: Checking directory structure
-Filesystem contains large files, but lacks LARGE_FILE flag in superblock.
-Fix? yes
-
-Filesystem has feature flag(s) set, but is a revision 0 filesystem.  Fix? yes


* [PATCH 42/49] libext2fs: implement fallocate
  2014-03-11  6:53 [PATCH 00/49] e2fsprogs patchbomb 3/14 Darrick J. Wong
                   ` (40 preceding siblings ...)
  2014-03-11  6:58 ` [PATCH 41/49] libext2fs: provide a function to set inode size Darrick J. Wong
@ 2014-03-11  6:58 ` Darrick J. Wong
  2014-03-11  6:58 ` [PATCH 44/49] fuse2fs: translate ACL structures Darrick J. Wong
                   ` (4 subsequent siblings)
  46 siblings, 0 replies; 88+ messages in thread
From: Darrick J. Wong @ 2014-03-11  6:58 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

Create a library function to perform fallocation on arbitrary files,
and wire up a few users for this function.  This is a bit more intense
than Ted's original mk_hugefiles implementation since we have to honor
any blocks that may already be allocated to the file.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
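Honoring blocks that are already mapped boils down to finding the holes
in the requested logical range first.  The sketch below shows that step
on a sorted array of extents; it is only an illustration with made-up
toy_* names, whereas the patch itself walks the tree through an extent
handle and also deals with clusters and uninit flags.

```c
#include <stdint.h>
#include <stddef.h>

/* Mapped extents, sorted by lblk and non-overlapping. */
struct toy_extent { uint64_t lblk; uint64_t len; };
/* An unmapped run of logical blocks that still needs allocating. */
struct toy_hole { uint64_t lblk; uint64_t len; };

/*
 * Collect the unmapped runs inside [start, start + len).  Returns the
 * number of holes written to holes[], at most max_holes.
 */
static size_t toy_find_holes(const struct toy_extent *ext, size_t count,
			     uint64_t start, uint64_t len,
			     struct toy_hole *holes, size_t max_holes)
{
	uint64_t pos = start, end = start + len;	/* end is exclusive */
	size_t i, n = 0;

	for (i = 0; i < count && pos < end; i++) {
		uint64_t ext_end = ext[i].lblk + ext[i].len;

		if (ext_end <= pos)
			continue;	/* extent lies before the range */
		if (ext[i].lblk >= end)
			break;		/* extent lies after the range */
		if (ext[i].lblk > pos && n < max_holes) {
			holes[n].lblk = pos;
			holes[n].len = ext[i].lblk - pos;
			n++;
		}
		pos = ext_end;		/* skip past the mapped blocks */
	}
	/* Whatever is left after the last extent is one more hole. */
	if (pos < end && n < max_holes) {
		holes[n].lblk = pos;
		holes[n].len = end - pos;
		n++;
	}
	return n;
}
```

Each hole would then be handed to the allocator, with the extents on
either side as candidates for being extended rather than adding a new
extent.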
 lib/ext2fs/Makefile.in |    8 
 lib/ext2fs/ext2fs.h    |   10 +
 lib/ext2fs/fallocate.c |  835 ++++++++++++++++++++++++++++++++++++++++++++++++
 misc/mk_hugefiles.c    |   84 +----
 4 files changed, 863 insertions(+), 74 deletions(-)
 create mode 100644 lib/ext2fs/fallocate.c


diff --git a/lib/ext2fs/Makefile.in b/lib/ext2fs/Makefile.in
index e64342e..7ea0cd2 100644
--- a/lib/ext2fs/Makefile.in
+++ b/lib/ext2fs/Makefile.in
@@ -44,6 +44,7 @@ OBJS= $(DEBUGFS_LIB_OBJS) $(RESIZE_LIB_OBJS) $(E2IMAGE_LIB_OBJS) \
 	expanddir.o \
 	ext_attr.o \
 	extent.o \
+	fallocate.o \
 	fileio.o \
 	finddev.o \
 	flushb.o \
@@ -684,6 +685,13 @@ extent.o: $(srcdir)/extent.c $(top_builddir)/lib/config.h \
  $(top_srcdir)/lib/et/com_err.h $(srcdir)/ext2_io.h \
  $(top_builddir)/lib/ext2fs/ext2_err.h $(srcdir)/ext2_ext_attr.h \
  $(srcdir)/bitops.h $(srcdir)/e2image.h
+fallocate.o: $(srcdir)/fallocate.c $(top_builddir)/lib/config.h \
+ $(top_builddir)/lib/dirpaths.h $(srcdir)/ext2_fs.h \
+ $(top_builddir)/lib/ext2fs/ext2_types.h $(srcdir)/ext2fsP.h \
+ $(srcdir)/ext2fs.h $(srcdir)/ext2_fs.h $(srcdir)/ext3_extents.h \
+ $(top_srcdir)/lib/et/com_err.h $(srcdir)/ext2_io.h \
+ $(top_builddir)/lib/ext2fs/ext2_err.h $(srcdir)/ext2_ext_attr.h \
+ $(srcdir)/bitops.h $(srcdir)/e2image.h
 fileio.o: $(srcdir)/fileio.c $(top_builddir)/lib/config.h \
  $(top_builddir)/lib/dirpaths.h $(srcdir)/ext2_fs.h \
  $(top_builddir)/lib/ext2fs/ext2_types.h $(srcdir)/ext2fs.h \
diff --git a/lib/ext2fs/ext2fs.h b/lib/ext2fs/ext2fs.h
index 1ae5295..9aaa54e 100644
--- a/lib/ext2fs/ext2fs.h
+++ b/lib/ext2fs/ext2fs.h
@@ -1222,6 +1222,16 @@ extern errcode_t ext2fs_extent_goto2(ext2_extent_handle_t handle,
 				     int leaf_level, blk64_t blk);
 extern errcode_t ext2fs_extent_fix_parents(ext2_extent_handle_t handle);
 
+/* fallocate.c */
+#define EXT2_FALLOCATE_ZERO_BLOCKS	(0x1)
+#define EXT2_FALLOCATE_FORCE_INIT	(0x2)
+#define EXT2_FALLOCATE_FORCE_UNINIT	(0x4)
+#define EXT2_FALLOCATE_INIT_BEYOND_EOF	(0x8)
+#define EXT2_FALLOCATE_ALL_FLAGS	(0xF)
+errcode_t ext2fs_fallocate(ext2_filsys fs, int flags, ext2_ino_t ino,
+			   struct ext2_inode *inode,
+			   blk64_t start, blk64_t len);
+
 /* fileio.c */
 extern errcode_t ext2fs_file_open2(ext2_filsys fs, ext2_ino_t ino,
 				   struct ext2_inode *inode,
diff --git a/lib/ext2fs/fallocate.c b/lib/ext2fs/fallocate.c
new file mode 100644
index 0000000..5e91037
--- /dev/null
+++ b/lib/ext2fs/fallocate.c
@@ -0,0 +1,835 @@
+/*
+ * fallocate.c -- Allocate large chunks of file.
+ *
+ * Copyright (C) 2014 Oracle.
+ *
+ * %Begin-Header%
+ * This file may be redistributed under the terms of the GNU Library
+ * General Public License, version 2.
+ * %End-Header%
+ */
+
+#include "config.h"
+
+#include "ext2_fs.h"
+#include "ext2fs.h"
+#define min(a, b) ((a) < (b) ? (a) : (b))
+
+#undef DEBUG
+
+#ifdef DEBUG
+# define dbg_printf(f, a...)  do {printf(f, ## a); fflush(stdout); } while (0)
+#else
+# define dbg_printf(f, a...)
+#endif
+
+/*
+ * Extent-based fallocate code.
+ *
+ * Find runs of unmapped logical blocks by starting at start and walking the
+ * extents until we reach the end of the range we want.
+ *
+ * For each run of unmapped blocks, try to find the extents on either side of
+ * the range.  If there's a left extent that can grow by at least a cluster and
+ * there are lblocks between start and the next lcluster after start, see if
+ * there's an implied cluster allocation; if so, zero the blocks (if the left
+ * extent is initialized) and adjust the extent.  Ditto for the blocks between
+ * the end of the last full lcluster and end, if there's a right extent.
+ *
+ * Try to attach as much as we can to the left extent, then try to attach as
+ * much as we can to the right extent.  For the remainder, try to allocate the
+ * whole range; map in whatever we get; and repeat until we're done.
+ *
+ * To attach to a left extent, figure out the maximum amount we can add to the
+ * extent and try to allocate that much, and append if successful.  To attach
+ * to a right extent, figure out the max we can add to the extent, try to
+ * allocate that much, and prepend if successful.
+ *
+ * We need an alloc_range function that tells us how much we can allocate given
+ * a maximum length and one of a suggested start, a fixed start, or a fixed end
+ * point.
+ *
+ * Every time we modify the extent tree we also need to update the block stats.
+ *
+ * At the end, update i_blocks and i_size appropriately.
+ */
+
+static void dbg_print_extent(char *desc, struct ext2fs_extent *extent)
+{
+#ifdef DEBUG
+	if (desc)
+		printf("%s: ", desc);
+	printf("extent: lblk %llu--%llu, len %u, pblk %llu, flags: ",
+	       extent->e_lblk, extent->e_lblk + extent->e_len - 1,
+	       extent->e_len, extent->e_pblk);
+	if (extent->e_flags & EXT2_EXTENT_FLAGS_LEAF)
+		fputs("LEAF ", stdout);
+	if (extent->e_flags & EXT2_EXTENT_FLAGS_UNINIT)
+		fputs("UNINIT ", stdout);
+	if (extent->e_flags & EXT2_EXTENT_FLAGS_SECOND_VISIT)
+		fputs("2ND_VISIT ", stdout);
+	if (!extent->e_flags)
+		fputs("(none)", stdout);
+	fputc('\n', stdout);
+	fflush(stdout);
+#endif
+}
+
+static errcode_t claim_range(ext2_filsys fs, struct ext2_inode *inode,
+			     blk64_t blk, blk64_t len)
+{
+	blk64_t	clusters;
+
+	clusters = (len + EXT2FS_CLUSTER_RATIO(fs) - 1) /
+		   EXT2FS_CLUSTER_RATIO(fs);
+	ext2fs_block_alloc_stats_range(fs, blk,
+			clusters * EXT2FS_CLUSTER_RATIO(fs), +1);
+	return ext2fs_iblk_add_blocks(fs, inode, clusters);
+}
+
+static errcode_t ext_falloc_helper(ext2_filsys fs,
+				   int flags,
+				   ext2_ino_t ino,
+				   struct ext2_inode *inode,
+				   ext2_extent_handle_t handle,
+				   struct ext2fs_extent *left_ext,
+				   struct ext2fs_extent *right_ext,
+				   blk64_t range_start, blk64_t range_len,
+				   blk64_t alloc_goal)
+{
+	struct ext2fs_extent	newex, ex;
+	int			op;
+	blk64_t			fillable, pblk, plen, x, cluster_fill, y;
+	blk64_t			eof_blk;
+	errcode_t		err;
+	blk_t			max_extent_len, max_uninit_len, max_init_len;
+
+#ifdef DEBUG
+	printf("%s: ", __func__);
+	if (left_ext)
+		printf("left_ext=%llu--%llu, ", left_ext->e_lblk,
+		       left_ext->e_lblk + left_ext->e_len - 1);
+	if (right_ext)
+		printf("right_ext=%llu--%llu, ", right_ext->e_lblk,
+		       right_ext->e_lblk + right_ext->e_len - 1);
+	printf("start=%llu len=%llu, goal=%llu\n", range_start, range_len,
+	       alloc_goal);
+	fflush(stdout);
+#endif
+	/* Can't create initialized extents past EOF? */
+	if (!(flags & EXT2_FALLOCATE_INIT_BEYOND_EOF))
+		eof_blk = EXT2_I_SIZE(inode) / fs->blocksize;
+
+	/* The allocation goal must be as far into a cluster as range_start. */
+	alloc_goal = (alloc_goal & ~EXT2FS_CLUSTER_MASK(fs)) |
+		     (range_start & EXT2FS_CLUSTER_MASK(fs));
+
+	max_uninit_len = EXT_UNINIT_MAX_LEN & ~EXT2FS_CLUSTER_MASK(fs);
+	max_init_len = EXT_INIT_MAX_LEN & ~EXT2FS_CLUSTER_MASK(fs);
+
+	/* We must lengthen the left extent to the end of the cluster */
+	if (left_ext && EXT2FS_CLUSTER_RATIO(fs) > 1) {
+		/* How many more blocks can be attached to left_ext? */
+		if (left_ext->e_flags & EXT2_EXTENT_FLAGS_UNINIT)
+			fillable = max_uninit_len - left_ext->e_len;
+		else
+			fillable = max_init_len - left_ext->e_len;
+
+		if (fillable > range_len)
+			fillable = range_len;
+		if (fillable == 0)
+			goto expand_right;
+
+		/*
+		 * If range_start isn't on a cluster boundary, try an
+		 * implied cluster allocation for left_ext.
+		 */
+		cluster_fill = EXT2FS_CLUSTER_RATIO(fs) -
+			       (range_start & EXT2FS_CLUSTER_MASK(fs));
+		cluster_fill &= EXT2FS_CLUSTER_MASK(fs);
+		if (cluster_fill == 0)
+			goto expand_right;
+
+		if (cluster_fill > fillable)
+			cluster_fill = fillable;
+
+		/* Don't expand an initialized left_ext beyond EOF */
+		if (!(flags & EXT2_FALLOCATE_INIT_BEYOND_EOF)) {
+			x = left_ext->e_lblk + left_ext->e_len - 1;
+			dbg_printf("%s: lend=%llu newlend=%llu eofblk=%llu\n",
+				   __func__, x, x + cluster_fill, eof_blk);
+			if (eof_blk >= x && eof_blk <= x + cluster_fill)
+				cluster_fill = eof_blk - x;
+			if (cluster_fill == 0)
+				goto expand_right;
+		}
+
+		err = ext2fs_extent_goto(handle, left_ext->e_lblk);
+		if (err)
+			goto expand_right;
+		left_ext->e_len += cluster_fill;
+		range_start += cluster_fill;
+		range_len -= cluster_fill;
+		alloc_goal += cluster_fill;
+
+		dbg_print_extent("ext_falloc clus left+", left_ext);
+		err = ext2fs_extent_replace(handle, 0, left_ext);
+		if (err)
+			goto out;
+		err = ext2fs_extent_fix_parents(handle);
+		if (err)
+			goto out;
+
+		/* Zero blocks */
+		if (!(left_ext->e_flags & EXT2_EXTENT_FLAGS_UNINIT)) {
+			err = ext2fs_zero_blocks2(fs, left_ext->e_pblk +
+						  left_ext->e_len -
+						  cluster_fill, cluster_fill,
+						  NULL, NULL);
+			if (err)
+				goto out;
+		}
+	}
+
+expand_right:
+	/* We must lengthen the right extent to the beginning of the cluster */
+	if (right_ext && EXT2FS_CLUSTER_RATIO(fs) > 1) {
+		/* How much can we attach to right_ext? */
+		if (right_ext->e_flags & EXT2_EXTENT_FLAGS_UNINIT)
+			fillable = max_uninit_len - right_ext->e_len;
+		else
+			fillable = max_init_len - right_ext->e_len;
+
+		if (fillable > range_len)
+			fillable = range_len;
+		if (fillable == 0)
+			goto try_merge;
+
+		/*
+		 * If range_end isn't on a cluster boundary, try an implied
+		 * cluster allocation for right_ext.
+		 */
+		cluster_fill = right_ext->e_lblk & EXT2FS_CLUSTER_MASK(fs);
+		if (cluster_fill == 0)
+			goto try_merge;
+
+		err = ext2fs_extent_goto(handle, right_ext->e_lblk);
+		if (err)
+			goto out;
+
+		if (cluster_fill > fillable)
+			cluster_fill = fillable;
+		right_ext->e_lblk -= cluster_fill;
+		right_ext->e_pblk -= cluster_fill;
+		right_ext->e_len += cluster_fill;
+		range_len -= cluster_fill;
+
+		dbg_print_extent("ext_falloc clus right+", right_ext);
+		err = ext2fs_extent_replace(handle, 0, right_ext);
+		if (err)
+			goto out;
+		err = ext2fs_extent_fix_parents(handle);
+		if (err)
+			goto out;
+
+		/* Zero blocks if necessary */
+		if (!(right_ext->e_flags & EXT2_EXTENT_FLAGS_UNINIT)) {
+			err = ext2fs_zero_blocks2(fs, right_ext->e_pblk,
+						  cluster_fill, NULL, NULL);
+			if (err)
+				goto out;
+		}
+	}
+
+try_merge:
+	/* Merge both extents together, perhaps? */
+	if (left_ext && right_ext) {
+		/* Are the two extents mergeable? */
+		if ((left_ext->e_flags & EXT2_EXTENT_FLAGS_UNINIT) !=
+		    (right_ext->e_flags & EXT2_EXTENT_FLAGS_UNINIT))
+			goto try_left;
+
+		/* User requires init/uninit but extent is uninit/init. */
+		if (((flags & EXT2_FALLOCATE_FORCE_INIT) &&
+		     (left_ext->e_flags & EXT2_EXTENT_FLAGS_UNINIT)) ||
+		    ((flags & EXT2_FALLOCATE_FORCE_UNINIT) &&
+		     !(left_ext->e_flags & EXT2_EXTENT_FLAGS_UNINIT)))
+			goto try_left;
+
+		/*
+		 * Skip initialized extent unless user wants to zero blocks
+		 * or requires init extent.
+		 */
+		if (!(left_ext->e_flags & EXT2_EXTENT_FLAGS_UNINIT) &&
+		    (!(flags & EXT2_FALLOCATE_ZERO_BLOCKS) ||
+		     !(flags & EXT2_FALLOCATE_FORCE_INIT)))
+			goto try_left;
+
+		/* Will it even fit? */
+		x = left_ext->e_len + range_len + right_ext->e_len;
+		if (x > (left_ext->e_flags & EXT2_EXTENT_FLAGS_UNINIT ?
+				max_uninit_len : max_init_len))
+			goto try_left;
+
+		err = ext2fs_extent_goto(handle, left_ext->e_lblk);
+		if (err)
+			goto try_left;
+
+		/* Allocate blocks */
+		y = left_ext->e_pblk + left_ext->e_len;
+		err = ext2fs_new_range(fs, EXT2_NEWRANGE_FIXED_GOAL |
+				       EXT2_NEWRANGE_EXACT_LENGTH, y,
+				       right_ext->e_pblk - y + 1, NULL,
+				       &pblk, &plen);
+		if (err)
+			goto try_left;
+		if (pblk + plen != right_ext->e_pblk)
+			goto try_left;
+		err = claim_range(fs, inode, pblk, plen);
+		if (err)
+			goto out;
+
+		/* Modify extents */
+		left_ext->e_len = x;
+		dbg_print_extent("ext_falloc merge", left_ext);
+		err = ext2fs_extent_replace(handle, 0, left_ext);
+		if (err)
+			goto out;
+		err = ext2fs_extent_fix_parents(handle);
+		if (err)
+			goto out;
+		err = ext2fs_extent_get(handle, EXT2_EXTENT_NEXT_LEAF, &newex);
+		if (err)
+			goto out;
+		err = ext2fs_extent_delete(handle, 0);
+		if (err)
+			goto out;
+		err = ext2fs_extent_fix_parents(handle);
+		if (err)
+			goto out;
+		*right_ext = *left_ext;
+
+		/* Zero blocks */
+		if (!(left_ext->e_flags & EXT2_EXTENT_FLAGS_UNINIT) &&
+		    (flags & EXT2_FALLOCATE_ZERO_BLOCKS)) {
+			err = ext2fs_zero_blocks2(fs, range_start, range_len,
+						  NULL, NULL);
+			if (err)
+				goto out;
+		}
+
+		return 0;
+	}
+
+try_left:
+	/* Extend the left extent */
+	if (left_ext) {
+		/* How many more blocks can be attached to left_ext? */
+		if (left_ext->e_flags & EXT2_EXTENT_FLAGS_UNINIT)
+			fillable = max_uninit_len - left_ext->e_len;
+		else if (flags & EXT2_FALLOCATE_ZERO_BLOCKS)
+			fillable = max_init_len - left_ext->e_len;
+		else
+			fillable = 0;
+
+		/* User requires init/uninit but extent is uninit/init. */
+		if (((flags & EXT2_FALLOCATE_FORCE_INIT) &&
+		     (left_ext->e_flags & EXT2_EXTENT_FLAGS_UNINIT)) ||
+		    ((flags & EXT2_FALLOCATE_FORCE_UNINIT) &&
+		     !(left_ext->e_flags & EXT2_EXTENT_FLAGS_UNINIT)))
+			goto try_right;
+
+		if (fillable > range_len)
+			fillable = range_len;
+
+		/* Don't expand an initialized left_ext beyond EOF */
+		x = left_ext->e_lblk + left_ext->e_len - 1;
+		if (!(flags & EXT2_FALLOCATE_INIT_BEYOND_EOF)) {
+			dbg_printf("%s: lend=%llu newlend=%llu eofblk=%llu\n",
+				   __func__, x, x + fillable, eof_blk);
+			if (eof_blk >= x && eof_blk <= x + fillable)
+				fillable = eof_blk - x;
+		}
+
+		if (fillable == 0)
+			goto try_right;
+
+		/* Test if the right edge of the range is already mapped? */
+		if (EXT2FS_CLUSTER_RATIO(fs) > 1) {
+			err = ext2fs_map_cluster_block(fs, ino, inode,
+					x + fillable, &pblk);
+			if (err)
+				goto out;
+			if (pblk)
+				fillable -= 1 + ((x + fillable)
+						 & EXT2FS_CLUSTER_MASK(fs));
+			if (fillable == 0)
+				goto try_right;
+		}
+
+		/* Allocate range of blocks */
+		x = left_ext->e_pblk + left_ext->e_len;
+		err = ext2fs_new_range(fs, EXT2_NEWRANGE_FIXED_GOAL |
+				EXT2_NEWRANGE_EXACT_LENGTH,
+				x, fillable, NULL, &pblk, &plen);
+		if (err)
+			goto try_right;
+		err = claim_range(fs, inode, pblk, plen);
+		if (err)
+			goto out;
+
+		/* Modify left_ext */
+		err = ext2fs_extent_goto(handle, left_ext->e_lblk);
+		if (err)
+			goto out;
+		range_start += plen;
+		range_len -= plen;
+		left_ext->e_len += plen;
+		dbg_print_extent("ext_falloc left+", left_ext);
+		err = ext2fs_extent_replace(handle, 0, left_ext);
+		if (err)
+			goto out;
+		err = ext2fs_extent_fix_parents(handle);
+		if (err)
+			goto out;
+
+		/* Zero blocks if necessary */
+		if (!(left_ext->e_flags & EXT2_EXTENT_FLAGS_UNINIT) &&
+		    (flags & EXT2_FALLOCATE_ZERO_BLOCKS)) {
+			err = ext2fs_zero_blocks2(fs, pblk, plen, NULL, NULL);
+			if (err)
+				goto out;
+		}
+	}
+
+try_right:
+	/* Extend the right extent */
+	if (right_ext) {
+		/* How much can we attach to right_ext? */
+		if (right_ext->e_flags & EXT2_EXTENT_FLAGS_UNINIT)
+			fillable = max_uninit_len - right_ext->e_len;
+		else if (flags & EXT2_FALLOCATE_ZERO_BLOCKS)
+			fillable = max_init_len - right_ext->e_len;
+		else
+			fillable = 0;
+
+		/* User requires init/uninit but extent is uninit/init. */
+		if (((flags & EXT2_FALLOCATE_FORCE_INIT) &&
+		     (right_ext->e_flags & EXT2_EXTENT_FLAGS_UNINIT)) ||
+		    ((flags & EXT2_FALLOCATE_FORCE_UNINIT) &&
+		     !(right_ext->e_flags & EXT2_EXTENT_FLAGS_UNINIT)))
+			goto try_anywhere;
+
+		if (fillable > range_len)
+			fillable = range_len;
+		if (fillable == 0)
+			goto try_anywhere;
+
+		/* Test if the left edge of the range is already mapped? */
+		if (EXT2FS_CLUSTER_RATIO(fs) > 1) {
+			err = ext2fs_map_cluster_block(fs, ino, inode,
+					right_ext->e_lblk - fillable, &pblk);
+			if (err)
+				goto out;
+			if (pblk)
+				fillable -= EXT2FS_CLUSTER_RATIO(fs) -
+						((right_ext->e_lblk - fillable)
+						 & EXT2FS_CLUSTER_MASK(fs));
+			if (fillable == 0)
+				goto try_anywhere;
+		}
+
+		/*
+		 * FIXME: It would be nice if we could handle allocating a
+		 * variable range from a fixed end point instead of just
+		 * skipping to the general allocator if the whole range is
+		 * unavailable.
+		 */
+		err = ext2fs_new_range(fs, EXT2_NEWRANGE_FIXED_GOAL |
+				EXT2_NEWRANGE_EXACT_LENGTH,
+				right_ext->e_pblk - fillable,
+				fillable, NULL, &pblk, &plen);
+		if (err)
+			goto try_anywhere;
+		err = claim_range(fs, inode,
+			      pblk & ~EXT2FS_CLUSTER_MASK(fs),
+			      plen + (pblk & EXT2FS_CLUSTER_MASK(fs)));
+		if (err)
+			goto out;
+
+		/* Modify right_ext */
+		err = ext2fs_extent_goto(handle, right_ext->e_lblk);
+		if (err)
+			goto out;
+		range_len -= plen;
+		right_ext->e_lblk -= plen;
+		right_ext->e_pblk -= plen;
+		right_ext->e_len += plen;
+		dbg_print_extent("ext_falloc right+", right_ext);
+		err = ext2fs_extent_replace(handle, 0, right_ext);
+		if (err)
+			goto out;
+		err = ext2fs_extent_fix_parents(handle);
+		if (err)
+			goto out;
+
+		/* Zero blocks if necessary */
+		if (!(right_ext->e_flags & EXT2_EXTENT_FLAGS_UNINIT) &&
+		    (flags & EXT2_FALLOCATE_ZERO_BLOCKS)) {
+			err = ext2fs_zero_blocks2(fs, pblk,
+					plen + cluster_fill, NULL, NULL);
+			if (err)
+				goto out;
+		}
+	}
+
+try_anywhere:
+	/* Try implied cluster alloc on the left and right ends */
+	if (range_len > 0 && (range_start & EXT2FS_CLUSTER_MASK(fs))) {
+		cluster_fill = EXT2FS_CLUSTER_RATIO(fs) -
+			       (range_start & EXT2FS_CLUSTER_MASK(fs));
+		cluster_fill &= EXT2FS_CLUSTER_MASK(fs);
+		if (cluster_fill > range_len)
+			cluster_fill = range_len;
+		newex.e_lblk = range_start;
+		err = ext2fs_map_cluster_block(fs, ino, inode, newex.e_lblk,
+					       &pblk);
+		if (err)
+			goto out;
+		if (pblk == 0)
+			goto try_right_implied;
+		newex.e_pblk = pblk;
+		newex.e_len = cluster_fill;
+		newex.e_flags = (flags & EXT2_FALLOCATE_FORCE_INIT ? 0 :
+				 EXT2_EXTENT_FLAGS_UNINIT);
+		dbg_print_extent("ext_falloc iclus left+", &newex);
+		ext2fs_extent_goto(handle, newex.e_lblk);
+		err = ext2fs_extent_get(handle, EXT2_EXTENT_CURRENT,
+					&ex);
+		if (err == EXT2_ET_NO_CURRENT_NODE)
+			ex.e_lblk = 0;
+		else if (err)
+			goto out;
+
+		if (ex.e_lblk > newex.e_lblk)
+			op = 0; /* insert before */
+		else
+			op = EXT2_EXTENT_INSERT_AFTER;
+		dbg_printf("%s: inserting %s lblk %llu newex=%llu\n",
+			   __func__, op ? "after" : "before", ex.e_lblk,
+			   newex.e_lblk);
+		err = ext2fs_extent_insert(handle, op, &newex);
+		if (err)
+			goto out;
+		err = ext2fs_extent_fix_parents(handle);
+		if (err)
+			goto out;
+
+		if (!(newex.e_flags & EXT2_EXTENT_FLAGS_UNINIT) &&
+		    (flags & EXT2_FALLOCATE_ZERO_BLOCKS)) {
+			err = ext2fs_zero_blocks2(fs, newex.e_pblk,
+						  newex.e_len, NULL, NULL);
+			if (err)
+				goto out;
+		}
+
+		range_start += cluster_fill;
+		range_len -= cluster_fill;
+	}
+
+try_right_implied:
+	y = range_start + range_len;
+	if (range_len > 0 && (y & EXT2FS_CLUSTER_MASK(fs))) {
+		cluster_fill = y & EXT2FS_CLUSTER_MASK(fs);
+		if (cluster_fill > range_len)
+			cluster_fill = range_len;
+		newex.e_lblk = y & ~EXT2FS_CLUSTER_MASK(fs);
+		err = ext2fs_map_cluster_block(fs, ino, inode, newex.e_lblk,
+					       &pblk);
+		if (err)
+			goto out;
+		if (pblk == 0)
+			goto no_implied;
+		newex.e_pblk = pblk;
+		newex.e_len = cluster_fill;
+		newex.e_flags = (flags & EXT2_FALLOCATE_FORCE_INIT ? 0 :
+				 EXT2_EXTENT_FLAGS_UNINIT);
+		dbg_print_extent("ext_falloc iclus right+", &newex);
+		ext2fs_extent_goto(handle, newex.e_lblk);
+		err = ext2fs_extent_get(handle, EXT2_EXTENT_CURRENT,
+					&ex);
+		if (err == EXT2_ET_NO_CURRENT_NODE)
+			ex.e_lblk = 0;
+		else if (err)
+			goto out;
+
+		if (ex.e_lblk > newex.e_lblk)
+			op = 0; /* insert before */
+		else
+			op = EXT2_EXTENT_INSERT_AFTER;
+		dbg_printf("%s: inserting %s lblk %llu newex=%llu\n",
+			   __func__, op ? "after" : "before", ex.e_lblk,
+			   newex.e_lblk);
+		err = ext2fs_extent_insert(handle, op, &newex);
+		if (err)
+			goto out;
+		err = ext2fs_extent_fix_parents(handle);
+		if (err)
+			goto out;
+
+		if (!(newex.e_flags & EXT2_EXTENT_FLAGS_UNINIT) &&
+		    (flags & EXT2_FALLOCATE_ZERO_BLOCKS)) {
+			err = ext2fs_zero_blocks2(fs, newex.e_pblk,
+						  newex.e_len, NULL, NULL);
+			if (err)
+				goto out;
+		}
+
+		range_len -= cluster_fill;
+	}
+
+no_implied:
+	if (range_len == 0)
+		return 0;
+
+	newex.e_lblk = range_start;
+	if (flags & EXT2_FALLOCATE_FORCE_INIT) {
+		max_extent_len = max_init_len;
+		newex.e_flags = 0;
+	} else {
+		max_extent_len = max_uninit_len;
+		newex.e_flags = EXT2_EXTENT_FLAGS_UNINIT;
+	}
+	pblk = alloc_goal;
+	y = range_len;
+	for (x = 0; x < y;) {
+		cluster_fill = newex.e_lblk & EXT2FS_CLUSTER_MASK(fs);
+		fillable = min(range_len + cluster_fill, max_extent_len);
+		err = ext2fs_new_range(fs, 0, pblk & ~EXT2FS_CLUSTER_MASK(fs),
+				       fillable,
+				       NULL, &pblk, &plen);
+		if (err)
+			goto out;
+		err = claim_range(fs, inode, pblk, plen);
+		if (err)
+			goto out;
+
+		/* Create extent */
+		newex.e_pblk = pblk + cluster_fill;
+		newex.e_len = plen - cluster_fill;
+		dbg_print_extent("ext_falloc create", &newex);
+		ext2fs_extent_goto(handle, newex.e_lblk);
+		err = ext2fs_extent_get(handle, EXT2_EXTENT_CURRENT,
+					&ex);
+		if (err == EXT2_ET_NO_CURRENT_NODE)
+			ex.e_lblk = 0;
+		else if (err)
+			goto out;
+
+		if (ex.e_lblk > newex.e_lblk)
+			op = 0; /* insert before */
+		else
+			op = EXT2_EXTENT_INSERT_AFTER;
+		dbg_printf("%s: inserting %s lblk %llu newex=%llu\n",
+			   __func__, op ? "after" : "before", ex.e_lblk,
+			   newex.e_lblk);
+		err = ext2fs_extent_insert(handle, op, &newex);
+		if (err)
+			goto out;
+		err = ext2fs_extent_fix_parents(handle);
+		if (err)
+			goto out;
+
+		if (!(newex.e_flags & EXT2_EXTENT_FLAGS_UNINIT) &&
+		    (flags & EXT2_FALLOCATE_ZERO_BLOCKS)) {
+			err = ext2fs_zero_blocks2(fs, pblk, plen, NULL, NULL);
+			if (err)
+				goto out;
+		}
+
+		/* Update variables at end of loop */
+		x += plen - cluster_fill;
+		range_len -= plen - cluster_fill;
+		newex.e_lblk += plen - cluster_fill;
+		pblk += plen - cluster_fill;
+		if (pblk >= ext2fs_blocks_count(fs->super))
+			pblk = fs->super->s_first_data_block;
+	}
+
+out:
+	return err;
+}
+
+static errcode_t extent_fallocate(ext2_filsys fs, int flags, ext2_ino_t ino,
+				      struct ext2_inode *inode,
+				      blk64_t start, blk64_t len)
+{
+	ext2_extent_handle_t	handle;
+	struct ext2fs_extent	left_extent, right_extent;
+	struct ext2fs_extent	*left_adjacent, *right_adjacent;
+	errcode_t		err;
+	blk64_t			range_start, range_end = 0, end, next;
+	blk64_t			count, goal, goal_distance;
+
+	end = start + len - 1;
+	err = ext2fs_extent_open2(fs, ino, inode, &handle);
+	if (err)
+		return err;
+
+	/*
+	 * Find the extent closest to the start of the alloc range.  We don't
+	 * check the return value because _goto() sets the current node to the
+	 * next-lowest extent if 'start' is in a hole; or the next-highest
+	 * extent if there aren't any lower ones; or doesn't set a current node
+	 * if there was a real error reading the extent tree.  In that case,
+	 * _get() will error out.
+	 */
+start_again:
+	ext2fs_extent_goto(handle, start);
+	err = ext2fs_extent_get(handle, EXT2_EXTENT_CURRENT, &left_extent);
+	if (err == EXT2_ET_NO_CURRENT_NODE) {
+		blk64_t max_blocks = ext2fs_blocks_count(fs->super);
+		goal = ext2fs_find_inode_goal(fs, ino);
+		err = ext2fs_find_first_zero_block_bitmap2(fs->block_map,
+						goal, max_blocks - 1, &goal);
+		goal += start;
+		err = ext_falloc_helper(fs, flags, ino, inode, handle, NULL,
+					NULL, start, len, goal);
+		goto errout;
+	} else if (err)
+		goto errout;
+
+	dbg_print_extent("ext_falloc initial", &left_extent);
+	next = left_extent.e_lblk + left_extent.e_len;
+	if (left_extent.e_lblk > start) {
+		/* The nearest extent we found was beyond start??? */
+		goal = left_extent.e_pblk - (left_extent.e_lblk - start);
+		err = ext_falloc_helper(fs, flags, ino, inode, handle, NULL,
+					&left_extent, start,
+					left_extent.e_lblk - start, goal);
+		if (err)
+			goto errout;
+
+		goto start_again;
+	} else if (next >= start) {
+		range_start = next;
+		left_adjacent = &left_extent;
+	} else {
+		range_start = start;
+		left_adjacent = NULL;
+	}
+	goal = left_extent.e_pblk + (range_start - left_extent.e_lblk);
+	goal_distance = range_start - next;
+
+	do {
+		err = ext2fs_extent_get(handle, EXT2_EXTENT_NEXT_LEAF,
+					   &right_extent);
+		dbg_printf("%s: ino=%d get next =%d\n", __func__, ino,
+			   (int)err);
+		dbg_print_extent("ext_falloc next", &right_extent);
+		/* Stop if we've seen this extent before */
+		if (!err && right_extent.e_lblk <= left_extent.e_lblk)
+			err = EXT2_ET_EXTENT_NO_NEXT;
+
+		if (err && err != EXT2_ET_EXTENT_NO_NEXT)
+			goto errout;
+		if (err == EXT2_ET_EXTENT_NO_NEXT ||
+		    right_extent.e_lblk > end + 1) {
+			range_end = end;
+			right_adjacent = NULL;
+		} else {
+			/* Handle right_extent.e_lblk <= end */
+			range_end = right_extent.e_lblk - 1;
+			right_adjacent = &right_extent;
+		}
+		if (err != EXT2_ET_EXTENT_NO_NEXT &&
+		    goal_distance > (range_end - right_extent.e_lblk)) {
+			goal = right_extent.e_pblk -
+					(right_extent.e_lblk - range_start);
+			goal_distance = range_end - right_extent.e_lblk;
+		}
+
+		dbg_printf("%s: ino=%d rstart=%llu rend=%llu\n", __func__, ino,
+			   range_start, range_end);
+		err = 0;
+		if (range_start <= range_end) {
+			count = range_end - range_start + 1;
+			err = ext_falloc_helper(fs, flags, ino, inode, handle,
+						left_adjacent, right_adjacent,
+						range_start, count, goal);
+			if (err)
+				goto errout;
+		}
+
+		if (range_end == end)
+			break;
+
+		err = ext2fs_extent_goto(handle, right_extent.e_lblk);
+		if (err)
+			goto errout;
+		next = right_extent.e_lblk + right_extent.e_len;
+		left_extent = right_extent;
+		left_adjacent = &left_extent;
+		range_start = next;
+		goal = left_extent.e_pblk + (range_start - left_extent.e_lblk);
+		goal_distance = range_start - next;
+	} while (range_end < end);
+
+errout:
+	ext2fs_zero_blocks2(NULL, 0, 0, NULL, NULL);
+	ext2fs_extent_free(handle);
+	return err;
+}
+
+errcode_t ext2fs_fallocate(ext2_filsys fs, int flags, ext2_ino_t ino,
+			   struct ext2_inode *inode,
+			   blk64_t start, blk64_t len)
+{
+	struct ext2_inode	inode_buf;
+	blk64_t			blk, x;
+	errcode_t		err;
+
+	if (((flags & EXT2_FALLOCATE_FORCE_INIT) &&
+	    (flags & EXT2_FALLOCATE_FORCE_UNINIT)) ||
+	   (flags & ~EXT2_FALLOCATE_ALL_FLAGS))
+		return EXT2_ET_INVALID_ARGUMENT;
+
+	if (len > ext2fs_blocks_count(fs->super))
+		return EXT2_ET_BLOCK_ALLOC_FAIL;
+	else if (len == 0)
+		return 0;
+
+	/* Read inode structure if necessary */
+	if (!inode) {
+		err = ext2fs_read_inode(fs, ino, &inode_buf);
+		if (err)
+			return err;
+		inode = &inode_buf;
+	}
+	dbg_printf("%s: ino=%d start=%llu len=%llu\n", __func__, ino, start,
+		   len);
+
+	if (inode->i_flags & EXT4_EXTENTS_FL) {
+		err = extent_fallocate(fs, flags, ino, inode, start, len);
+		goto out;
+	}
+
+	/* XXX: Allocate a bunch of blocks the slow way */
+	for (blk = start; blk < start + len; blk++) {
+		err = ext2fs_bmap2(fs, ino, inode, NULL, 0, blk, 0, &x);
+		if (err)
+			return err;
+		if (x)
+			continue;
+
+		err = ext2fs_bmap2(fs, ino, inode, NULL,
+				   BMAP_ALLOC | BMAP_UNINIT, blk, 0, &x);
+		if (err)
+			return err;
+	}
+
+out:
+	if (inode == &inode_buf)
+		ext2fs_write_inode(fs, ino, inode);
+	return err;
+}
diff --git a/misc/mk_hugefiles.c b/misc/mk_hugefiles.c
index d4dadc4..19892c5 100644
--- a/misc/mk_hugefiles.c
+++ b/misc/mk_hugefiles.c
@@ -144,84 +144,20 @@ static errcode_t mk_hugefile(ext2_filsys fs, blk64_t num,
 
 	ext2fs_inode_alloc_stats2(fs, *ino, +1, 0);
 
-	retval = ext2fs_extent_open2(fs, *ino, &inode, &handle);
+	if (EXT2_HAS_INCOMPAT_FEATURE(fs->super,
+				      EXT3_FEATURE_INCOMPAT_EXTENTS))
+		inode.i_flags |= EXT4_EXTENTS_FL;
+	retval = ext2fs_fallocate(fs,
+				  EXT2_FALLOCATE_FORCE_INIT |
+				  EXT2_FALLOCATE_ZERO_BLOCKS,
+				  *ino, &inode, 0, num);
 	if (retval)
 		return retval;
-
-	lblk = 0;
-	left = num ? num : 1;
-	while (left) {
-		blk64_t pblk, end;
-		blk64_t n = left;
-
-		retval =  ext2fs_find_first_zero_block_bitmap2(fs->block_map,
-			goal, ext2fs_blocks_count(fs->super) - 1, &end);
-		if (retval)
-			goto errout;
-		goal = end;
-
-		retval =  ext2fs_find_first_set_block_bitmap2(fs->block_map, goal,
-			       ext2fs_blocks_count(fs->super) - 1, &bend);
-		if (retval == ENOENT) {
-			bend = ext2fs_blocks_count(fs->super);
-			if (num == 0)
-				left = 0;
-		}
-		if (!num || bend - goal < left)
-			n = bend - goal;
-		pblk = goal;
-		if (num)
-			left -= n;
-		goal += n;
-		count += n;
-		ext2fs_block_alloc_stats_range(fs, pblk, n, +1);
-
-		if (zero_hugefile) {
-			blk64_t ret_blk;
-			retval = ext2fs_zero_blocks2(fs, pblk, n,
-						     &ret_blk, NULL);
-
-			if (retval)
-				com_err(program_name, retval,
-					_("while zeroing block %llu "
-					  "for hugefile"), ret_blk);
-		}
-
-		while (n) {
-			blk64_t l = n;
-			struct ext2fs_extent newextent;
-
-			if (l > EXT_INIT_MAX_LEN)
-				l = EXT_INIT_MAX_LEN;
-
-			newextent.e_len = l;
-			newextent.e_pblk = pblk;
-			newextent.e_lblk = lblk;
-			newextent.e_flags = 0;
-
-			retval = ext2fs_extent_insert(handle,
-					EXT2_EXTENT_INSERT_AFTER, &newextent);
-			if (retval)
-				return retval;
-			pblk += l;
-			lblk += l;
-			n -= l;
-		}
-	}
-
-	retval = ext2fs_read_inode(fs, *ino, &inode);
+	retval = ext2fs_inode_set_size(fs, &inode, num * fs->blocksize);
 	if (retval)
-		goto errout;


* [PATCH 44/49] fuse2fs: translate ACL structures
  2014-03-11  6:53 [PATCH 00/49] e2fsprogs patchbomb 3/14 Darrick J. Wong
                   ` (41 preceding siblings ...)
  2014-03-11  6:58 ` [PATCH 42/49] libext2fs: implement fallocate Darrick J. Wong
@ 2014-03-11  6:58 ` Darrick J. Wong
  2014-03-11  6:58 ` [PATCH 45/49] fuse2fs: handle 64-bit dates correctly Darrick J. Wong
                   ` (3 subsequent siblings)
  46 siblings, 0 replies; 88+ messages in thread
From: Darrick J. Wong @ 2014-03-11  6:58 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

Translate between the "native" ACL structures that FUSE passes in xattr
calls and the on-disk ext4 ACL structures when reading or writing the
ACL EAs.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 configure       |    5 +
 configure.in    |    8 +-
 lib/config.h.in |    3 +
 misc/fuse2fs.c  |  262 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 4 files changed, 270 insertions(+), 8 deletions(-)
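
For readers comparing the two layouts, here is a minimal sketch of why a
translation step is needed at all.  The format exchanged with FUSE stores
every entry with an id field, while the ext4 on-disk format drops the id for
tags that do not need one.  All demo_* names below are hypothetical and exist
only for this illustration; they are not part of the patch or of libext2fs.
The tag values are the ones Linux defines in <sys/acl.h>.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* POSIX ACL tag values as defined on Linux in <sys/acl.h>. */
enum {
	DEMO_ACL_USER_OBJ  = 0x01,
	DEMO_ACL_USER      = 0x02,
	DEMO_ACL_GROUP_OBJ = 0x04,
	DEMO_ACL_GROUP     = 0x08,
	DEMO_ACL_MASK      = 0x10,
	DEMO_ACL_OTHER     = 0x20,
};

/* Hypothetical stand-ins for the entry layouts used in the patch. */
struct demo_xattr_entry      { uint16_t tag, perm; uint32_t id; };
struct demo_disk_entry       { uint16_t tag, perm; uint32_t id; };
struct demo_disk_entry_short { uint16_t tag, perm; };

/* Both formats start with a 32-bit version field. */
#define DEMO_HEADER_SIZE	sizeof(uint32_t)

/* Size of the fixed-width format: every entry carries an id. */
static size_t demo_xattr_acl_size(size_t count)
{
	return DEMO_HEADER_SIZE + count * sizeof(struct demo_xattr_entry);
}

/*
 * Size of the packed on-disk format, found by walking the tags the same
 * way the translation loops do.  Returns 0 if a tag is not recognized.
 */
static size_t demo_disk_acl_size(const uint16_t *tags, size_t count)
{
	size_t sz = DEMO_HEADER_SIZE;
	size_t i;

	assert(tags != NULL || count == 0);
	for (i = 0; i < count; i++) {
		switch (tags[i]) {
		case DEMO_ACL_USER:
		case DEMO_ACL_GROUP:
			/* Named entries keep their id. */
			sz += sizeof(struct demo_disk_entry);
			break;
		case DEMO_ACL_USER_OBJ:
		case DEMO_ACL_GROUP_OBJ:
		case DEMO_ACL_MASK:
		case DEMO_ACL_OTHER:
			/* These tags have no id on disk. */
			sz += sizeof(struct demo_disk_entry_short);
			break;
		default:
			return 0;
		}
	}
	return sz;
}
```

Under these assumptions, a minimal three-entry ACL (owner, owning group,
other) occupies 28 bytes in the fixed-width format and 16 bytes on disk, and
a five-entry ACL with one named user occupies 44 and 28 bytes respectively.
The count-based ext4_acl_size() in the patch should agree with this tag walk
whenever the ACL has at most four id-less entries, which appears to hold for
well-formed ACLs.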


diff --git a/configure b/configure
index ce6a4ef..e5943af 100755
--- a/configure
+++ b/configure
@@ -10479,7 +10479,7 @@ fi
 done
 
 fi
-for ac_header in  	dirent.h 	errno.h 	execinfo.h 	getopt.h 	malloc.h 	mntent.h 	paths.h 	semaphore.h 	setjmp.h 	signal.h 	stdarg.h 	stdint.h 	stdlib.h 	termios.h 	termio.h 	unistd.h 	utime.h 	linux/falloc.h 	linux/fd.h 	linux/major.h 	linux/loop.h 	net/if_dl.h 	netinet/in.h 	sys/disklabel.h 	sys/file.h 	sys/ioctl.h 	sys/mkdev.h 	sys/mman.h 	sys/prctl.h 	sys/queue.h 	sys/resource.h 	sys/select.h 	sys/socket.h 	sys/sockio.h 	sys/stat.h 	sys/syscall.h 	sys/sysctl.h 	sys/sysmacros.h 	sys/time.h 	sys/types.h 	sys/un.h 	sys/wait.h
+for ac_header in  	dirent.h 	errno.h 	execinfo.h 	getopt.h 	malloc.h 	mntent.h 	paths.h 	semaphore.h 	setjmp.h 	signal.h 	stdarg.h 	stdint.h 	stdlib.h 	termios.h 	termio.h 	unistd.h 	utime.h 	linux/falloc.h 	linux/fd.h 	linux/major.h 	linux/loop.h 	net/if_dl.h 	netinet/in.h 	sys/acl.h 	sys/disklabel.h 	sys/file.h 	sys/ioctl.h 	sys/mkdev.h 	sys/mman.h 	sys/prctl.h 	sys/queue.h 	sys/resource.h 	sys/select.h 	sys/socket.h 	sys/sockio.h 	sys/stat.h 	sys/syscall.h 	sys/sysctl.h 	sys/sysmacros.h 	sys/time.h 	sys/types.h 	sys/un.h 	sys/wait.h
 do :
   as_ac_Header=`$as_echo "ac_cv_header_$ac_header" | $as_tr_sh`
 ac_fn_c_check_header_mongrel "$LINENO" "$ac_header" "$as_ac_Header" "$ac_includes_default"
@@ -11228,6 +11228,7 @@ else
 do :
   as_ac_Header=`$as_echo "ac_cv_header_$ac_header" | $as_tr_sh`
 ac_fn_c_check_header_compile "$LINENO" "$ac_header" "$as_ac_Header" "#define _FILE_OFFSET_BITS	64
+#define FUSE_USE_VERSION 29
 "
 if eval test \"x\$"$as_ac_Header"\" = x"yes"; then :
   cat >>confdefs.h <<_ACEOF
@@ -11246,6 +11247,7 @@ done
 
 	cat confdefs.h - <<_ACEOF >conftest.$ac_ext
 /* end confdefs.h.  */
+#define FUSE_USE_VERSION 29
 #ifdef __linux__
 #include <linux/fs.h>
 #include <linux/falloc.h>
@@ -11365,6 +11367,7 @@ else
 do :
   as_ac_Header=`$as_echo "ac_cv_header_$ac_header" | $as_tr_sh`
 ac_fn_c_check_header_compile "$LINENO" "$ac_header" "$as_ac_Header" "#define _FILE_OFFSET_BITS	64
+#define FUSE_USE_VERSION 29
 #ifdef __linux__
 # include <linux/fs.h>
 # include <linux/falloc.h>
diff --git a/configure.in b/configure.in
index 2c455af..6c185e7 100644
--- a/configure.in
+++ b/configure.in
@@ -948,6 +948,7 @@ AC_CHECK_HEADERS(m4_flatten([
 	linux/loop.h
 	net/if_dl.h
 	netinet/in.h
+	sys/acl.h
 	sys/disklabel.h
 	sys/file.h
 	sys/ioctl.h
@@ -1177,10 +1178,12 @@ then
 else
 	AC_CHECK_HEADERS([pthread.h fuse.h], [],
 [AC_MSG_FAILURE([Cannot find fuse2fs headers.])],
-[#define _FILE_OFFSET_BITS	64])
+[#define _FILE_OFFSET_BITS	64
+#define FUSE_USE_VERSION 29])
 
 	AC_PREPROC_IFELSE(
-[AC_LANG_PROGRAM([[#ifdef __linux__
+[AC_LANG_PROGRAM([[#define FUSE_USE_VERSION 29
+#ifdef __linux__
 #include <linux/fs.h>
 #include <linux/falloc.h>
 #include <linux/xattr.h>
@@ -1195,6 +1198,7 @@ fi
 ,
 AC_CHECK_HEADERS([pthread.h fuse.h], [], [FUSE_CMT="#"],
 [#define _FILE_OFFSET_BITS	64
+#define FUSE_USE_VERSION 29
 #ifdef __linux__
 # include <linux/fs.h>
 # include <linux/falloc.h>
diff --git a/lib/config.h.in b/lib/config.h.in
index 118a508..852c305 100644
--- a/lib/config.h.in
+++ b/lib/config.h.in
@@ -426,6 +426,9 @@
 /* Define to 1 if you have the `sysconf' function. */
 #undef HAVE_SYSCONF
 
+/* Define to 1 if you have the <sys/acl.h> header file. */
+#undef HAVE_SYS_ACL_H
+
 /* Define to 1 if you have the <sys/disklabel.h> header file. */
 #undef HAVE_SYS_DISKLABEL_H
 
diff --git a/misc/fuse2fs.c b/misc/fuse2fs.c
index 34c05b9..df83cbd 100644
--- a/misc/fuse2fs.c
+++ b/misc/fuse2fs.c
@@ -17,9 +17,15 @@
 # include <linux/falloc.h>
 # include <linux/xattr.h>
 # define FUSE_PLATFORM_OPTS	",nonempty,big_writes"
+# ifdef HAVE_SYS_ACL_H
+#  define TRANSLATE_LINUX_ACLS
+# endif
 #else
 # define FUSE_PLATFORM_OPTS	""
 #endif
+#ifdef TRANSLATE_LINUX_ACLS
+# include <sys/acl.h>
+#endif
 #include <sys/ioctl.h>
 #include <unistd.h>
 #include <fuse.h>
@@ -59,6 +65,199 @@ static ext2_filsys global_fs; /* Try not to use this directly */
 # define FL_PUNCH_HOLE_FLAG (0)
 #endif
 
+/* ACL translation stuff */
+#ifdef TRANSLATE_LINUX_ACLS
+/*
+ * Copied from acl_ea.h in libacl source; ACLs have to be sent to and from fuse
+ * in this format... at least on Linux.
+ */
+#define ACL_EA_ACCESS		"system.posix_acl_access"
+#define ACL_EA_DEFAULT		"system.posix_acl_default"
+
+#define ACL_EA_VERSION		0x0002
+
+typedef struct {
+	u_int16_t	e_tag;
+	u_int16_t	e_perm;
+	u_int32_t	e_id;
+} acl_ea_entry;
+
+typedef struct {
+	u_int32_t	a_version;
+	acl_ea_entry	a_entries[0];
+} acl_ea_header;
+
+static inline size_t acl_ea_size(int count)
+{
+	return sizeof(acl_ea_header) + count * sizeof(acl_ea_entry);
+}
+
+static inline int acl_ea_count(size_t size)
+{
+	if (size < sizeof(acl_ea_header))
+		return -1;
+	size -= sizeof(acl_ea_header);
+	if (size % sizeof(acl_ea_entry))
+		return -1;
+	return size / sizeof(acl_ea_entry);
+}
+
+/*
+ * ext4 ACL structures, copied from fs/ext4/acl.h.
+ */
+#define EXT4_ACL_VERSION	0x0001
+
+typedef struct {
+	__u16		e_tag;
+	__u16		e_perm;
+	__u32		e_id;
+} ext4_acl_entry;
+
+typedef struct {
+	__u16		e_tag;
+	__u16		e_perm;
+} ext4_acl_entry_short;
+
+typedef struct {
+	__u32		a_version;
+} ext4_acl_header;
+
+static inline size_t ext4_acl_size(int count)
+{
+	if (count <= 4) {
+		return sizeof(ext4_acl_header) +
+		       count * sizeof(ext4_acl_entry_short);
+	} else {
+		return sizeof(ext4_acl_header) +
+		       4 * sizeof(ext4_acl_entry_short) +
+		       (count - 4) * sizeof(ext4_acl_entry);
+	}
+}
+
+static inline int ext4_acl_count(size_t size)
+{
+	ssize_t s;
+	size -= sizeof(ext4_acl_header);
+	s = size - 4 * sizeof(ext4_acl_entry_short);
+	if (s < 0) {
+		if (size % sizeof(ext4_acl_entry_short))
+			return -1;
+		return size / sizeof(ext4_acl_entry_short);
+	} else {
+		if (s % sizeof(ext4_acl_entry))
+			return -1;
+		return s / sizeof(ext4_acl_entry) + 4;
+	}
+}
+
+static errcode_t fuse_to_ext4_acl(acl_ea_header *facl, size_t facl_sz,
+				  ext4_acl_header **eacl, size_t *eacl_sz)
+{
+	int i, facl_count;
+	ext4_acl_header *h;
+	size_t h_sz;
+	ext4_acl_entry *e;
+	acl_ea_entry *a;
+	void *hptr;
+	errcode_t err;
+
+	facl_count = acl_ea_count(facl_sz);
+	if (facl_count < 0 || facl->a_version != ACL_EA_VERSION)
+		return EXT2_ET_INVALID_ARGUMENT;
+	h_sz = ext4_acl_size(facl_count);
+
+	err = ext2fs_get_mem(h_sz, &h);
+	if (err)
+		return err;
+
+	h->a_version = ext2fs_cpu_to_le32(EXT4_ACL_VERSION);
+	hptr = h + 1;
+	for (i = 0, a = facl->a_entries; i < facl_count; i++, a++) {
+		e = hptr;
+		e->e_tag = ext2fs_cpu_to_le16(a->e_tag);
+		e->e_perm = ext2fs_cpu_to_le16(a->e_perm);
+
+		switch (a->e_tag) {
+		case ACL_USER:
+		case ACL_GROUP:
+			e->e_id = ext2fs_cpu_to_le32(a->e_id);
+			hptr += sizeof(ext4_acl_entry);
+			break;
+		case ACL_USER_OBJ:
+		case ACL_GROUP_OBJ:
+		case ACL_MASK:
+		case ACL_OTHER:
+			hptr += sizeof(ext4_acl_entry_short);
+			break;
+		default:
+			err = EXT2_ET_INVALID_ARGUMENT;
+			goto out;
+		}
+	}
+
+	*eacl = h;
+	*eacl_sz = h_sz;
+	return err;
+out:
+	ext2fs_free_mem(&h);
+	return err;
+}
+
+static errcode_t ext4_to_fuse_acl(acl_ea_header **facl, size_t *facl_sz,
+				  ext4_acl_header *eacl, size_t eacl_sz)
+{
+	int i, eacl_count;
+	acl_ea_header *f;
+	ext4_acl_entry *e;
+	acl_ea_entry *a;
+	size_t f_sz;
+	void *hptr;
+	errcode_t err;
+
+	eacl_count = ext4_acl_count(eacl_sz);
+	if (eacl_count < 0 ||
+	    eacl->a_version != ext2fs_cpu_to_le32(EXT4_ACL_VERSION))
+		return EXT2_ET_INVALID_ARGUMENT;
+	f_sz = acl_ea_size(eacl_count);
+
+	err = ext2fs_get_mem(f_sz, &f);
+	if (err)
+		return err;
+
+	f->a_version = ACL_EA_VERSION;
+	hptr = eacl + 1;
+	for (i = 0, a = f->a_entries; i < eacl_count; i++, a++) {
+		e = hptr;
+		a->e_tag = ext2fs_le16_to_cpu(e->e_tag);
+		a->e_perm = ext2fs_le16_to_cpu(e->e_perm);
+
+		switch (a->e_tag) {
+		case ACL_USER:
+		case ACL_GROUP:
+			a->e_id = ext2fs_le32_to_cpu(e->e_id);
+			hptr += sizeof(ext4_acl_entry);
+			break;
+		case ACL_USER_OBJ:
+		case ACL_GROUP_OBJ:
+		case ACL_MASK:
+		case ACL_OTHER:
+			hptr += sizeof(ext4_acl_entry_short);
+			break;
+		default:
+			err = EXT2_ET_INVALID_ARGUMENT;
+			goto out;
+		}
+	}
+
+	*facl = f;
+	*facl_sz = f_sz;
+	return err;
+out:
+	ext2fs_free_mem(&f);
+	return err;
+}
+#endif /* TRANSLATE_LINUX_ACLS */
+
 /*
  * ext2_file_t contains a struct inode, so we can't leave files open.
  * Use this as a proxy instead.
@@ -2113,6 +2312,30 @@ static int op_statfs(const char *path, struct statvfs *buf)
 	return 0;
 }
 
+typedef errcode_t (*xattr_xlate_get)(void **cooked_buf, size_t *cooked_sz,
+				     const void *raw_buf, size_t raw_sz);
+typedef errcode_t (*xattr_xlate_set)(const void *cooked_buf, size_t cooked_sz,
+				     void **raw_buf, size_t *raw_sz);
+struct xattr_translate {
+	const char *prefix;
+	xattr_xlate_get get;
+	xattr_xlate_set set;
+};
+
+#define XATTR_TRANSLATOR(p, g, s) \
+	{.prefix = (p), \
+	 .get = (xattr_xlate_get)(g), \
+	 .set = (xattr_xlate_set)(s)}
+
+static struct xattr_translate xattr_translators[] = {
+#ifdef TRANSLATE_LINUX_ACLS
+	XATTR_TRANSLATOR(ACL_EA_ACCESS, ext4_to_fuse_acl, fuse_to_ext4_acl),
+	XATTR_TRANSLATOR(ACL_EA_DEFAULT, ext4_to_fuse_acl, fuse_to_ext4_acl),
+#endif
+	XATTR_TRANSLATOR(NULL, NULL, NULL),
+};
+#undef XATTR_TRANSLATOR
+
 static int op_getxattr(const char *path, const char *key, char *value,
 		       size_t len)
 {
@@ -2120,8 +2343,9 @@ static int op_getxattr(const char *path, const char *key, char *value,
 	struct fuse2fs *ff = (struct fuse2fs *)ctxt->private_data;
 	ext2_filsys fs;
 	struct ext2_xattr_handle *h;
-	void *ptr;
-	size_t plen;
+	struct xattr_translate *xt;
+	void *ptr, *cptr;
+	size_t plen, clen;
 	ext2_ino_t ino;
 	errcode_t err;
 	int ret = 0;
@@ -2164,6 +2388,17 @@ static int op_getxattr(const char *path, const char *key, char *value,
 		goto out2;
 	}
 
+	for (xt = xattr_translators; xt->prefix != NULL; xt++) {
+		if (strncmp(key, xt->prefix, strlen(xt->prefix)) == 0) {
+			err = xt->get(&cptr, &clen, ptr, plen);
+			if (err)
+				goto out3;
+			ext2fs_free_mem(&ptr);
+			ptr = cptr;
+			plen = clen;
+		}
+	}
+
 	if (!len) {
 		ret = plen;
 	} else if (len < plen) {
@@ -2173,6 +2408,7 @@ static int op_getxattr(const char *path, const char *key, char *value,
 		ret = plen;
 	}
 
+out3:
 	ext2fs_free_mem(&ptr);
 out2:
 	err = ext2fs_xattrs_close(&h);
@@ -2287,6 +2523,9 @@ static int op_setxattr(const char *path, const char *key, const char *value,
 	struct fuse2fs *ff = (struct fuse2fs *)ctxt->private_data;
 	ext2_filsys fs;
 	struct ext2_xattr_handle *h;
+	struct xattr_translate *xt;
+	void *cvalue;
+	size_t clen;
 	ext2_ino_t ino;
 	errcode_t err;
 	int ret = 0;
@@ -2326,19 +2565,32 @@ static int op_setxattr(const char *path, const char *key, const char *value,
 		goto out2;
 	}
 
-	err = ext2fs_xattr_set(h, key, value, len);
+	cvalue = (void *)value;
+	clen = len;
+	for (xt = xattr_translators; xt->prefix != NULL; xt++) {
+		if (strncmp(key, xt->prefix, strlen(xt->prefix)) == 0) {
+			err = xt->set(value, len, &cvalue, &clen);
+			if (err)
+				goto out3;
+		}
+	}
+
+	err = ext2fs_xattr_set(h, key, cvalue, clen);
 	if (err) {
 		ret = translate_error(fs, ino, err);
-		goto out2;
+		goto out3;
 	}
 
 	err = ext2fs_xattrs_write(h);
 	if (err) {
 		ret = translate_error(fs, ino, err);
-		goto out2;
+		goto out3;
 	}
 
 	ret = update_ctime(fs, ino, NULL);
+out3:
+	if (cvalue != value)
+		ext2fs_free_mem(&cvalue);
 out2:
 	err = ext2fs_xattrs_close(&h);
 	if (!ret && err)


^ permalink raw reply related	[flat|nested] 88+ messages in thread
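
The translator table in the patch above pairs an xattr name prefix with conversion hooks and ends with a NULL sentinel entry. A minimal standalone sketch of that lookup pattern follows. The names `xlate_entry`, `xlate_table`, `find_translator`, and `upper_copy` are invented for illustration and are not part of fuse2fs, and the toy hook only uppercases a string where the real hooks convert ACL structures.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical conversion hook: returns 0 on success, -1 on failure. */
typedef int (*xlate_fn)(const char *in, char *out, size_t out_sz);

struct xlate_entry {
	const char *prefix;	/* NULL marks the end of the table */
	xlate_fn get;
};

/* Toy hook standing in for a real ACL translation. */
static int upper_copy(const char *in, char *out, size_t out_sz)
{
	size_t i, n = strlen(in);

	if (n + 1 > out_sz)
		return -1;
	for (i = 0; i < n; i++)
		out[i] = (in[i] >= 'a' && in[i] <= 'z') ?
			 (char)(in[i] - 32) : in[i];
	out[n] = '\0';
	return 0;
}

/* Illustrative prefixes; the final all-NULL entry terminates the scan. */
static const struct xlate_entry xlate_table[] = {
	{ "system.posix_acl_access", upper_copy },
	{ "system.posix_acl_default", upper_copy },
	{ NULL, NULL },
};

/* Return the first entry whose prefix matches key, or NULL if none does. */
static const struct xlate_entry *find_translator(const char *key)
{
	const struct xlate_entry *xt;

	for (xt = xlate_table; xt->prefix != NULL; xt++)
		if (strncmp(key, xt->prefix, strlen(xt->prefix)) == 0)
			return xt;
	return NULL;
}
```

Unlike the loops in the patch, this sketch stops at the first matching entry. A key with no matching prefix, such as `user.comment`, yields NULL and would be passed through untranslated.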

* [PATCH 45/49] fuse2fs: handle 64-bit dates correctly
  2014-03-11  6:53 [PATCH 00/49] e2fsprogs patchbomb 3/14 Darrick J. Wong
                   ` (42 preceding siblings ...)
  2014-03-11  6:58 ` [PATCH 44/49] fuse2fs: translate ACL structures Darrick J. Wong
@ 2014-03-11  6:58 ` Darrick J. Wong
  2014-03-11  6:58 ` [PATCH 46/49] fuse2fs: implement fallocate Darrick J. Wong
                   ` (2 subsequent siblings)
  46 siblings, 0 replies; 88+ messages in thread
From: Darrick J. Wong @ 2014-03-11  6:58 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

Fix fuse2fs' interpretation of 64-bit date quantities to match the
kernel.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 misc/fuse2fs.c |   31 ++++++++++++++++++++++---------
 1 file changed, 22 insertions(+), 9 deletions(-)


diff --git a/misc/fuse2fs.c b/misc/fuse2fs.c
index df83cbd..bfc9a91 100644
--- a/misc/fuse2fs.c
+++ b/misc/fuse2fs.c
@@ -324,15 +324,24 @@ static int __translate_error(ext2_filsys fs, errcode_t err, ext2_ino_t ino,
 
 static inline __u32 ext4_encode_extra_time(const struct timespec *time)
 {
-	return (sizeof(time->tv_sec) > 4 ?
-		(time->tv_sec >> 32) & EXT4_EPOCH_MASK : 0) |
-	       ((time->tv_nsec << EXT4_EPOCH_BITS) & EXT4_NSEC_MASK);
+	__u32 extra = sizeof(time->tv_sec) > 4 ?
+			((time->tv_sec - (__s32)time->tv_sec) >> 32) &
+			EXT4_EPOCH_MASK : 0;
+	return extra | (time->tv_nsec << EXT4_EPOCH_BITS);
 }
 
 static inline void ext4_decode_extra_time(struct timespec *time, __u32 extra)
 {
-	if (sizeof(time->tv_sec) > 4)
-		time->tv_sec |= (__u64)((extra) & EXT4_EPOCH_MASK) << 32;
+	if (sizeof(time->tv_sec) > 4 && (extra & EXT4_EPOCH_MASK)) {
+		__u64 extra_bits = extra & EXT4_EPOCH_MASK;
+		/*
+		 * Prior to kernel 3.14?, we had a broken decode function,
+		 * wherein we effectively did this:
+		 * if (extra_bits == 3)
+		 *     extra_bits = 0;
+		 */
+		time->tv_sec += extra_bits << 32;
+	}
 	time->tv_nsec = ((extra) & EXT4_NSEC_MASK) >> EXT4_EPOCH_BITS;
 }
 
@@ -358,7 +367,7 @@ do {									       \
 	(timespec)->tv_sec = (signed)((raw_inode)->xtime);		       \
 	if (EXT4_FITS_IN_INODE(raw_inode, xtime ## _extra))		       \
 		ext4_decode_extra_time((timespec),			       \
-				       raw_inode->xtime ## _extra);	       \
+				       (raw_inode)->xtime ## _extra);	       \
 	else								       \
 		(timespec)->tv_nsec = 0;				       \
 } while (0)
@@ -720,6 +729,7 @@ static int stat_inode(ext2_filsys fs, ext2_ino_t ino, struct stat *statbuf)
 	dev_t fakedev = 0;
 	errcode_t err;
 	int ret = 0;
+	struct timespec tv;
 
 	memset(&inode, 0, sizeof(inode));
 	err = ext2fs_read_inode_full(fs, ino, (struct ext2_inode *)&inode,
@@ -737,9 +747,12 @@ static int stat_inode(ext2_filsys fs, ext2_ino_t ino, struct stat *statbuf)
 	statbuf->st_size = EXT2_I_SIZE(&inode);
 	statbuf->st_blksize = fs->blocksize;
 	statbuf->st_blocks = blocks_from_inode(fs, &inode);
-	statbuf->st_atime = inode.i_atime;
-	statbuf->st_mtime = inode.i_mtime;
-	statbuf->st_ctime = inode.i_ctime;
+	EXT4_INODE_GET_XTIME(i_atime, &tv, &inode);
+	statbuf->st_atime = tv.tv_sec;
+	EXT4_INODE_GET_XTIME(i_mtime, &tv, &inode);
+	statbuf->st_mtime = tv.tv_sec;
+	EXT4_INODE_GET_XTIME(i_ctime, &tv, &inode);
+	statbuf->st_ctime = tv.tv_sec;
 	if (LINUX_S_ISCHR(inode.i_mode) ||
 	    LINUX_S_ISBLK(inode.i_mode)) {
 		if (inode.i_block[0])


^ permalink raw reply related	[flat|nested] 88+ messages in thread
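
The encode/decode arithmetic in the patch above can be tried in isolation. The following is a sketch under stated assumptions, not the fuse2fs code: it uses fixed-width integers instead of `struct timespec`, and the helper names (`sign_extend32`, `encode_extra`, `decode_sec`, `decode_nsec`) are made up for this example. The low two bits of the "extra" field hold the epoch bits and the remaining thirty hold nanoseconds. The on-disk 32-bit seconds field is read as a signed value, and the epoch bits are added on top of it rather than OR-ed in.

```c
#include <stdint.h>

#define EPOCH_BITS 2
#define EPOCH_MASK ((1U << EPOCH_BITS) - 1)	/* low two bits */
#define NSEC_MASK  (~0U << EPOCH_BITS)		/* upper thirty bits */

/* Interpret a 32-bit on-disk value as a signed quantity. */
static int64_t sign_extend32(uint32_t v)
{
	return (v & 0x80000000U) ? (int64_t)v - 4294967296LL : (int64_t)v;
}

/* Pack the epoch bits and nanoseconds into the 32-bit extra field. */
static uint32_t encode_extra(int64_t sec, uint32_t nsec)
{
	int64_t low = sign_extend32((uint32_t)sec);
	uint32_t epoch = (uint32_t)((uint64_t)(sec - low) >> 32) & EPOCH_MASK;

	return epoch | (nsec << EPOCH_BITS);
}

/* Rebuild seconds from the 32-bit seconds field plus the extra field. */
static int64_t decode_sec(uint32_t raw_sec, uint32_t extra)
{
	return sign_extend32(raw_sec) + ((int64_t)(extra & EPOCH_MASK) << 32);
}

/* Extract the nanoseconds stored above the epoch bits. */
static uint32_t decode_nsec(uint32_t extra)
{
	return (extra & NSEC_MASK) >> EPOCH_BITS;
}
```

For example, 4102444800 seconds (2100-01-01 00:00:00 UTC) does not fit in a signed 32-bit field. Its low 32 bits read back as a negative number, so the encoder stores an epoch value of 1 and the decoder adds 2^32 to recover the original timestamp. Dates before 1970 that fit in 32 bits keep an epoch value of 0.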

* [PATCH 46/49] fuse2fs: implement fallocate
  2014-03-11  6:53 [PATCH 00/49] e2fsprogs patchbomb 3/14 Darrick J. Wong
                   ` (43 preceding siblings ...)
  2014-03-11  6:58 ` [PATCH 45/49] fuse2fs: handle 64-bit dates correctly Darrick J. Wong
@ 2014-03-11  6:58 ` Darrick J. Wong
  2014-03-11  6:59 ` [PATCH 48/49] tests: enable using fuse2fs with metadata checksum test Darrick J. Wong
  2014-03-11  6:59 ` [PATCH 49/49] tests: test date handling Darrick J. Wong
  46 siblings, 0 replies; 88+ messages in thread
From: Darrick J. Wong @ 2014-03-11  6:58 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

Use the (new) ext2fs_fallocate() to fallocate file space.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 misc/fuse2fs.c |   59 +++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 58 insertions(+), 1 deletion(-)


diff --git a/misc/fuse2fs.c b/misc/fuse2fs.c
index bfc9a91..04c2dea 100644
--- a/misc/fuse2fs.c
+++ b/misc/fuse2fs.c
@@ -3274,7 +3274,64 @@ out:
 static int fallocate_helper(struct fuse_file_info *fp, int mode, off_t offset,
 			    off_t len)
 {
-	return -EOPNOTSUPP;
+	struct fuse_context *ctxt = fuse_get_context();
+	struct fuse2fs *ff = (struct fuse2fs *)ctxt->private_data;
+	struct fuse2fs_file_handle *fh = (struct fuse2fs_file_handle *)fp->fh;
+	ext2_filsys fs;
+	struct ext2_inode_large inode;
+	blk64_t start, end, x;
+	__u64 fsize;
+	errcode_t err;
+	int flags;
+	int ret = 0;
+
+	FUSE2FS_CHECK_CONTEXT(ff);
+	fs = ff->fs;
+	FUSE2FS_CHECK_MAGIC(fs, fh, FUSE2FS_FILE_MAGIC);
+	start = offset / fs->blocksize;
+	end = (offset + len - 1) / fs->blocksize;
+	dbg_printf("%s: ino=%d mode=0x%x start=%jd end=%llu\n", __func__,
+		   fh->ino, mode, offset / fs->blocksize, end);
+	if (!fs_can_allocate(ff, len / fs->blocksize))
+		return -ENOSPC;
+
+	memset(&inode, 0, sizeof(inode));
+	err = ext2fs_read_inode_full(fs, fh->ino, (struct ext2_inode *)&inode,
+				     sizeof(inode));
+	if (err)
+		return translate_error(fs, fh->ino, err);
+	fsize = EXT2_I_SIZE(&inode);
+
+	/* Allocate a bunch of blocks */
+	flags = (mode & FL_KEEP_SIZE_FLAG ? 0 :
+			EXT2_FALLOCATE_INIT_BEYOND_EOF);
+	err = ext2fs_fallocate(fs, flags, fh->ino,
+			       (struct ext2_inode *)&inode,
+			       start, end - start + 1);
+	if (err && err != EXT2_ET_BLOCK_ALLOC_FAIL)
+		return translate_error(fs, fh->ino, err);
+
+	/* Update i_size */
+	if (!(mode & FL_KEEP_SIZE_FLAG)) {
+		if (offset + len > fsize) {
+			err = ext2fs_inode_set_size(fs,
+						(struct ext2_inode *)&inode,
+						offset + len);
+			if (err)
+				return translate_error(fs, fh->ino, err);
+		}
+	}
+
+	err = update_mtime(fs, fh->ino, &inode);
+	if (err)
+		return err;
+
+	err = ext2fs_write_inode_full(fs, fh->ino, (struct ext2_inode *)&inode,
+				      sizeof(inode));
+	if (err)
+		return translate_error(fs, fh->ino, err);
+
+	return err;
 }
 
 static errcode_t clean_block_middle(ext2_filsys fs, ext2_ino_t ino,


^ permalink raw reply related	[flat|nested] 88+ messages in thread
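
The patch above converts a byte range into a block range before calling the allocator. A small sketch of that conversion is shown here; `blk_range` and `bytes_to_blocks` are hypothetical names used only for this illustration, and the sketch assumes `len` is greater than zero.

```c
#include <stdint.h>

struct blk_range {
	uint64_t start;	/* first block touched by the byte range */
	uint64_t count;	/* number of blocks touched */
};

/* Map the bytes [offset, offset + len) to whole blocks; len must be > 0. */
static struct blk_range bytes_to_blocks(uint64_t offset, uint64_t len,
					uint32_t blocksize)
{
	struct blk_range r;
	uint64_t end = (offset + len - 1) / blocksize;	/* last block touched */

	r.start = offset / blocksize;
	r.count = end - r.start + 1;
	return r;
}
```

With a 4096-byte block size, a two-byte request at offset 4095 touches two blocks, whereas `len / blocksize` for the same request is zero. A free-space estimate based only on `len / blocksize` can therefore undercount when the range is not block-aligned.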

* [PATCH 48/49] tests: enable using fuse2fs with metadata checksum test
  2014-03-11  6:53 [PATCH 00/49] e2fsprogs patchbomb 3/14 Darrick J. Wong
                   ` (44 preceding siblings ...)
  2014-03-11  6:58 ` [PATCH 46/49] fuse2fs: implement fallocate Darrick J. Wong
@ 2014-03-11  6:59 ` Darrick J. Wong
  2014-03-11  6:59 ` [PATCH 49/49] tests: test date handling Darrick J. Wong
  46 siblings, 0 replies; 88+ messages in thread
From: Darrick J. Wong @ 2014-03-11  6:59 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

Create custom mount/umount commands so that we can run the metadata
checksumming tests against fuse2fs.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 tests/fuse2fs/mount  |   28 ++++++++++++++++++++++++++++
 tests/fuse2fs/umount |   21 +++++++++++++++++++++
 2 files changed, 49 insertions(+)
 create mode 100755 tests/fuse2fs/mount
 create mode 100755 tests/fuse2fs/umount


diff --git a/tests/fuse2fs/mount b/tests/fuse2fs/mount
new file mode 100755
index 0000000..321b1f5
--- /dev/null
+++ b/tests/fuse2fs/mount
@@ -0,0 +1,28 @@
+#!/bin/bash
+
+# Mount ext4 via fuse.  Put tests/fuse2fs/ at the start of PATH if you want
+# to run the metadata checksumming tests with fuse2fs.
+
+for arg in "$@"; do
+	if [ -b "${arg}" ]; then
+		DEV="${arg}"
+	elif [ -d "${arg}" ]; then
+		MNT="${arg}"
+	fi
+done
+
+if [ -z "${DEV}" -o -z "${MNT}" ]; then
+	echo "Please specify a device and a mountpoint."; exit 1
+fi
+
+DIR="$(readlink -f "$(dirname "$0")")"
+if [ -n "${FUSE2FS_DEBUG}" ]; then
+	"${DIR}/../../misc/fuse2fs" "${DEV}" "${MNT}" -d >> "${FUSE2FS_DEBUG}" 2>&1 &
+	sleep 1
+	exit 0
+else
+	"${DIR}/../../misc/fuse2fs" "${DEV}" "${MNT}"
+	ERR=$?
+	sleep 1
+	exit "${ERR}"
+fi
diff --git a/tests/fuse2fs/umount b/tests/fuse2fs/umount
new file mode 100755
index 0000000..b21ee5a
--- /dev/null
+++ b/tests/fuse2fs/umount
@@ -0,0 +1,21 @@
+#!/bin/bash
+
+# unmount a filesystem
+sync
+sync
+sync
+
+sleep 2
+if [ -x /bin/umount ]; then
+	/bin/umount "$@"
+	ERR=$?
+elif [ -x /sbin/umount ]; then
+	/sbin/umount "$@"
+	ERR=$?
+else
+	echo "Where is umount?"
+	exit 5
+fi
+sleep 1
+
+exit "${ERR}"


^ permalink raw reply related	[flat|nested] 88+ messages in thread

* [PATCH 49/49] tests: test date handling
  2014-03-11  6:53 [PATCH 00/49] e2fsprogs patchbomb 3/14 Darrick J. Wong
                   ` (45 preceding siblings ...)
  2014-03-11  6:59 ` [PATCH 48/49] tests: enable using fuse2fs with metadata checksum test Darrick J. Wong
@ 2014-03-11  6:59 ` Darrick J. Wong
  46 siblings, 0 replies; 88+ messages in thread
From: Darrick J. Wong @ 2014-03-11  6:59 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

Test our ability to handle the entire range of valid dates.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 tests/metadata-checksum-test.sh |   59 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 59 insertions(+)


diff --git a/tests/metadata-checksum-test.sh b/tests/metadata-checksum-test.sh
index a3ff6d6..e4f272e 100755
--- a/tests/metadata-checksum-test.sh
+++ b/tests/metadata-checksum-test.sh
@@ -3746,6 +3746,65 @@ ${fsck_cmd} -C0 -f -n "${DEV}"
 ${E2FSPROGS}/debugfs/debugfs -R 'ex /fragfile' "${DEV}" | tail -n 15
 }
 
+#####################################
+function date_test {
+msg "date_test"
+
+rm -rf /tmp/ls.before /tmp/ls.after /tmp/debugfs.diff
+
+INODE_SIZE="$(${E2FSPROGS}/misc/dumpe2fs -h "${DEV}" | grep 'Inode size:' | awk '{print $3}')"
+if [ "${INODE_SIZE}" -gt 128 ]; then
+	LAST_YEAR=2430
+else
+	LAST_YEAR=2030
+fi
+
+# Write dates
+${mount_cmd} ${MOUNT_OPTS} "${DEV}" "${MNT}" -t ext4 -o journal_checksum
+seq 1910 20 "${LAST_YEAR}" | while read year; do
+	DATE="${year}-01-01 00:00:00.000000000"
+	FNAME="$(echo "${DATE}" | tr '[ \-:.]' '____')"
+	touch -d "${DATE}" "${MNT}/${FNAME}"
+	echo "${FNAME} ${DATE}" >> /tmp/ls.before
+done
+umount "${MNT}"
+${fsck_cmd} -C0 -f -n "${DEV}"
+
+# debugfs
+seq 1910 20 "${LAST_YEAR}" | while read year; do
+	DATE="${year}-01-01 00:00:00.000000000"
+	FNAME="$(echo "${DATE}" | tr '[ \-:.]' '____')"
+	echo "${FNAME}" "$(${E2FSPROGS}/debugfs/debugfs -R "stat ${FNAME}" "${DEV}" | grep 'mtime:')"
+done > /tmp/debugfs.before
+
+# Re-read from kernel
+${mount_cmd} ${MOUNT_OPTS} "${DEV}" "${MNT}" -t ext4 -o journal_checksum
+seq 1910 20 "${LAST_YEAR}" | while read year; do
+	DATE="${year}-01-01 00:00:00.000000000"
+	FNAME="$(echo "${DATE}" | tr '[ \-:.]' '____')"
+	FDATE="$(stat -c '%y' "${MNT}/${FNAME}" | sed -e 's/......$//g')"
+	echo "${FNAME}" "${FDATE}" >> /tmp/ls.after
+done
+umount "${MNT}"
+
+# Did the kernel work?
+diff -u /tmp/ls.before /tmp/ls.after > /tmp/ls.diff || true
+
+# Does debugfs work?
+touch /tmp/debugfs.diff
+cat /tmp/debugfs.before | sed -e 's/^\(....\).*\(....\)$/\1 \2/g' | while read date fdate crap; do
+	if [ "${date}" != "${fdate}" ]; then
+		echo "${date} != ${fdate}" >> /tmp/debugfs.diff
+	fi
+done
+
+if [ "$(cat /tmp/debugfs.diff /tmp/ls.diff | wc -l)" -gt 0 ]; then
+	echo "BROKEN DATE HANDLING"
+	cat /tmp/debugfs.diff /tmp/ls.diff
+	false
+fi
+}
+
 # This test should be the last one (before speed tests, anyway)
 
 #### ALL SPEED TESTS GO AT THE END


^ permalink raw reply related	[flat|nested] 88+ messages in thread

* Re: [PATCH 01/49] create_inode: clean up return mess in do_write_internal
  2014-03-11  6:54 ` [PATCH 01/49] create_inode: clean up return mess in do_write_internal Darrick J. Wong
@ 2014-03-11 20:30   ` Andreas Dilger
  2014-03-11 20:41     ` Darrick J. Wong
  0 siblings, 1 reply; 88+ messages in thread
From: Andreas Dilger @ 2014-03-11 20:30 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: tytso, linux-ext4

On Mar 11, 2014, at 12:54 AM, Darrick J. Wong <darrick.wong@oracle.com> wrote:
> do_write_internal returns errno when ext2 library calls fail; since
> errno only reflects the outcome of the last C library call, this will
> result in confused callers.  Eliminate the naked return since
> this results in an undefined return value.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
> misc/create_inode.c |   17 ++++++++++-------
> 1 file changed, 10 insertions(+), 7 deletions(-)
> 
> 
> diff --git a/misc/create_inode.c b/misc/create_inode.c
> index cf4a58f..647480c 100644
> --- a/misc/create_inode.c
> +++ b/misc/create_inode.c
> @@ -353,14 +353,14 @@ errcode_t do_write_internal(ext2_ino_t cwd, const char *src, const char *dest)
> 	if (retval == 0) {
> 		com_err(__func__, 0, "The file '%s' already exists\n", dest);
> 		close(fd);
> -		return errno;
> +		return retval;
> 	}

This seems a bit strange.  It looks like an error return, but it will
actually return "0" since this branch is only entered if retval == 0.
Should this return an explicit error value here?

Cheers, Andreas

> 	retval = ext2fs_new_inode(current_fs, cwd, 010755, 0, &newfile);
> 	if (retval) {
> 		com_err(__func__, retval, 0);
> 		close(fd);
> -		return errno;
> +		return retval;
> 	}
> #ifdef DEBUGFS
> 	printf("Allocated inode: %u\n", newfile);
> @@ -372,7 +372,7 @@ errcode_t do_write_internal(ext2_ino_t cwd, const char *src, const char *dest)
> 		if (retval) {
> 			com_err(__func__, retval, "while expanding directory");
> 			close(fd);
> -			return errno;
> +			return retval;
> 		}
> 		retval = ext2fs_link(current_fs, cwd, dest, newfile,
> 					EXT2_FT_REG_FILE);
> @@ -412,12 +412,15 @@ errcode_t do_write_internal(ext2_ino_t cwd, const char *src, const char *dest)
> 	if ((retval = ext2fs_write_new_inode(current_fs, newfile, &inode))) {
> 		com_err(__func__, retval, "while creating inode %u", newfile);
> 		close(fd);
> -		return errno;
> +		return retval;
> 	}
> 	if (inode.i_flags & EXT4_INLINE_DATA_FL) {
> 		retval = ext2fs_inline_data_init(current_fs, newfile);
> -		if (retval)
> -			return;
> +		if (retval) {
> +			com_err("copy_file", retval, 0);
> +			close(fd);
> +			return retval;
> +		}
> 	}
> 	if (LINUX_S_ISREG(inode.i_mode)) {
> 		if (statbuf.st_blocks < statbuf.st_size / S_BLKSIZE) {
> @@ -434,7 +437,7 @@ errcode_t do_write_internal(ext2_ino_t cwd, const char *src, const char *dest)
> 	}
> 	close(fd);
> 
> -	return 0;
> +	return retval;
> }
> 
> /* Copy files from source_dir to fs */
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


Cheers, Andreas

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 02/49] create_inode: minor cleanups
  2014-03-11  6:54 ` [PATCH 02/49] create_inode: minor cleanups Darrick J. Wong
@ 2014-03-11 20:31   ` Andreas Dilger
  2014-03-12  3:25     ` Theodore Ts'o
  2014-03-12  3:27     ` Theodore Ts'o
  0 siblings, 2 replies; 88+ messages in thread
From: Andreas Dilger @ 2014-03-11 20:31 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: tytso, linux-ext4

On Mar 11, 2014, at 12:54 AM, Darrick J. Wong <darrick.wong@oracle.com> wrote:

> Fix a couple of small style issues in the create_inode files.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Reviewed-by: Andreas Dilger <adilger@dilger.ca>


> ---
> misc/create_inode.c |   42 ++++++++++++++++++++++++++++--------------
> misc/create_inode.h |    5 +++++
> 2 files changed, 33 insertions(+), 14 deletions(-)
> 
> 
> diff --git a/misc/create_inode.c b/misc/create_inode.c
> index 647480c..b204e71 100644
> --- a/misc/create_inode.c
> +++ b/misc/create_inode.c
> @@ -1,3 +1,6 @@
> +#include <time.h>
> +#include <unistd.h>
> +
> #include "create_inode.h"
> 
> #if __STDC_VERSION__ < 199901L
> @@ -179,7 +182,8 @@ errcode_t do_symlink_internal(ext2_ino_t cwd, const char *name, char *target)
> 	cp = strrchr(name, '/');
> 	if (cp) {
> 		*cp = 0;
> -		if ((retval =  ext2fs_namei(current_fs, root, cwd, name, &parent_ino))){
> +		retval = ext2fs_namei(current_fs, root, cwd, name, &parent_ino);
> +		if (retval) {
> 			com_err(name, retval, 0);
> 			return retval;
> 		}
> @@ -216,7 +220,8 @@ errcode_t do_mkdir_internal(ext2_ino_t cwd, const char *name, struct stat *st)
> 	cp = strrchr(name, '/');
> 	if (cp) {
> 		*cp = 0;
> -		if ((retval =  ext2fs_namei(current_fs, root, cwd, name, &parent_ino))){
> +		retval = ext2fs_namei(current_fs, root, cwd, name, &parent_ino);
> +		if (retval) {
> 			com_err(name, retval, 0);
> 			return retval;
> 		}
> @@ -409,7 +414,8 @@ errcode_t do_write_internal(ext2_ino_t cwd, const char *src, const char *dest)
> 		inode.i_flags |= EXT4_EXTENTS_FL;
> 	}
> 
> -	if ((retval = ext2fs_write_new_inode(current_fs, newfile, &inode))) {
> +	retval = ext2fs_write_new_inode(current_fs, newfile, &inode);
> +	if (retval) {
> 		com_err(__func__, retval, "while creating inode %u", newfile);
> 		close(fd);
> 		return retval;
> @@ -464,12 +470,12 @@ errcode_t populate_fs(ext2_ino_t parent_ino, const char *source_dir)
> 
> 	if (!(dh = opendir("."))) {
> 		com_err(__func__, errno,
> -			_("while openning directory \"%s\""), source_dir);
> +			_("while opening directory \"%s\""), source_dir);
> 		return errno;
> 	}
> 
> -	while((dent = readdir(dh))) {
> -		if((!strcmp(dent->d_name, ".")) || (!strcmp(dent->d_name, "..")))
> +	while ((dent = readdir(dh))) {
> +		if ((!strcmp(dent->d_name, ".")) || (!strcmp(dent->d_name, "..")))
> 			continue;
> 		lstat(dent->d_name, &st);
> 		name = dent->d_name;
> @@ -494,7 +500,8 @@ errcode_t populate_fs(ext2_ino_t parent_ino, const char *source_dir)
> 			case S_IFCHR:
> 			case S_IFBLK:
> 			case S_IFIFO:
> -				if ((retval = do_mknod_internal(parent_ino, name, &st))) {
> +				retval = do_mknod_internal(parent_ino, name, &st);
> +				if (retval) {
> 					com_err(__func__, retval,
> 						_("while creating special file \"%s\""), name);
> 					return retval;
> @@ -506,32 +513,37 @@ errcode_t populate_fs(ext2_ino_t parent_ino, const char *source_dir)
> 					_("ignoring socket file \"%s\""), name);
> 				continue;
> 			case S_IFLNK:
> -				if((read_cnt = readlink(name, ln_target, sizeof(ln_target))) == -1) {
> +				read_cnt = readlink(name, ln_target, sizeof(ln_target));
> +				if (read_cnt == -1) {
> 					com_err(__func__, errno,
> 						_("while trying to readlink \"%s\""), name);
> 					return errno;
> 				}
> 				ln_target[read_cnt] = '\0';
> -				if ((retval = do_symlink_internal(parent_ino, name, ln_target))) {
> +				retval = do_symlink_internal(parent_ino, name, ln_target);
> +				if (retval) {
> 					com_err(__func__, retval,
> 						_("while writing symlink\"%s\""), name);
> 					return retval;
> 				}
> 				break;
> 			case S_IFREG:
> -				if ((retval = do_write_internal(parent_ino, name, name))) {
> +				retval = do_write_internal(parent_ino, name, name);
> +				if (retval) {
> 					com_err(__func__, retval,
> 						_("while writing file \"%s\""), name);
> 					return retval;
> 				}
> 				break;
> 			case S_IFDIR:
> -				if ((retval = do_mkdir_internal(parent_ino, name, &st))) {
> +				retval = do_mkdir_internal(parent_ino, name, &st);
> +				if (retval) {
> 					com_err(__func__, retval,
> 						_("while making dir \"%s\""), name);
> 					return retval;
> 				}
> -				if ((retval = ext2fs_namei(current_fs, root, parent_ino, name, &ino))) {
> +				retval = ext2fs_namei(current_fs, root, parent_ino, name, &ino);
> +				if (retval) {
> 					com_err(name, retval, 0);
> 						return retval;
> 				}
> @@ -548,12 +560,14 @@ errcode_t populate_fs(ext2_ino_t parent_ino, const char *source_dir)
> 					_("ignoring entry \"%s\""), name);
> 		}
> 
> -		if ((retval =  ext2fs_namei(current_fs, root, parent_ino, name, &ino))){
> +		retval =  ext2fs_namei(current_fs, root, parent_ino, name, &ino);
> +		if (retval) {
> 			com_err(name, retval, 0);
> 			return retval;
> 		}
> 
> -		if ((retval = set_inode_extra(parent_ino, ino, &st))) {
> +		retval = set_inode_extra(parent_ino, ino, &st);
> +		if (retval) {
> 			com_err(__func__, retval,
> 				_("while setting inode for \"%s\""), name);
> 			return retval;
> diff --git a/misc/create_inode.h b/misc/create_inode.h
> index 2b6d429..79742e8 100644
> --- a/misc/create_inode.h
> +++ b/misc/create_inode.h
> @@ -1,3 +1,6 @@
> +#ifndef _CREATE_INODE_H
> +#define _CREATE_INODE_H
> +
> #include <sys/types.h>
> #include <sys/stat.h>
> #include <fcntl.h>
> @@ -33,3 +36,5 @@ extern errcode_t do_mknod_internal(ext2_ino_t cwd, const char *name, struct stat
> extern errcode_t do_symlink_internal(ext2_ino_t cwd, const char *name, char *target);
> extern errcode_t do_mkdir_internal(ext2_ino_t cwd, const char *name, struct stat *st);
> extern errcode_t do_write_internal(ext2_ino_t cwd, const char *src, const char *dest);
> +
> +#endif /* _CREATE_INODE_H */
> 


Cheers, Andreas

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 01/49] create_inode: clean up return mess in do_write_internal
  2014-03-11 20:30   ` Andreas Dilger
@ 2014-03-11 20:41     ` Darrick J. Wong
  2014-03-11 21:08       ` Theodore Ts'o
  0 siblings, 1 reply; 88+ messages in thread
From: Darrick J. Wong @ 2014-03-11 20:41 UTC (permalink / raw)
  To: Andreas Dilger; +Cc: tytso, linux-ext4

On Tue, Mar 11, 2014 at 02:30:02PM -0600, Andreas Dilger wrote:
> On Mar 11, 2014, at 12:54 AM, Darrick J. Wong <darrick.wong@oracle.com> wrote:
> > do_write_internal returns errno when ext2 library calls fail; since
> > errno only reflects the outcome of the last C library call, this will
> > result in confused callers.  Eliminate the naked return since
> > this results in an undefined return value.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> > misc/create_inode.c |   17 ++++++++++-------
> > 1 file changed, 10 insertions(+), 7 deletions(-)
> > 
> > 
> > diff --git a/misc/create_inode.c b/misc/create_inode.c
> > index cf4a58f..647480c 100644
> > --- a/misc/create_inode.c
> > +++ b/misc/create_inode.c
> > @@ -353,14 +353,14 @@ errcode_t do_write_internal(ext2_ino_t cwd, const char *src, const char *dest)
> > 	if (retval == 0) {
> > 		com_err(__func__, 0, "The file '%s' already exists\n", dest);
> > 		close(fd);
> > -		return errno;
> > +		return retval;
> > 	}
> 
> This seems a bit strange.  It looks like an error return, but it will
> actually return "0" since this branch is only entered if retval == 0.
> Should this return an explicit error value here?

You're right; maybe we should return EXT2_ET_FILE_EXISTS or something?

I don't really think feeding zero to the com_err() is a great idea either.
Zheng, do you have an opinion about which error code to return?

--D
> 
> Cheers, Andreas
> 
> > 	retval = ext2fs_new_inode(current_fs, cwd, 010755, 0, &newfile);
> > 	if (retval) {
> > 		com_err(__func__, retval, 0);
> > 		close(fd);
> > -		return errno;
> > +		return retval;
> > 	}
> > #ifdef DEBUGFS
> > 	printf("Allocated inode: %u\n", newfile);
> > @@ -372,7 +372,7 @@ errcode_t do_write_internal(ext2_ino_t cwd, const char *src, const char *dest)
> > 		if (retval) {
> > 			com_err(__func__, retval, "while expanding directory");
> > 			close(fd);
> > -			return errno;
> > +			return retval;
> > 		}
> > 		retval = ext2fs_link(current_fs, cwd, dest, newfile,
> > 					EXT2_FT_REG_FILE);
> > @@ -412,12 +412,15 @@ errcode_t do_write_internal(ext2_ino_t cwd, const char *src, const char *dest)
> > 	if ((retval = ext2fs_write_new_inode(current_fs, newfile, &inode))) {
> > 		com_err(__func__, retval, "while creating inode %u", newfile);
> > 		close(fd);
> > -		return errno;
> > +		return retval;
> > 	}
> > 	if (inode.i_flags & EXT4_INLINE_DATA_FL) {
> > 		retval = ext2fs_inline_data_init(current_fs, newfile);
> > -		if (retval)
> > -			return;
> > +		if (retval) {
> > +			com_err("copy_file", retval, 0);
> > +			close(fd);
> > +			return retval;
> > +		}
> > 	}
> > 	if (LINUX_S_ISREG(inode.i_mode)) {
> > 		if (statbuf.st_blocks < statbuf.st_size / S_BLKSIZE) {
> > @@ -434,7 +437,7 @@ errcode_t do_write_internal(ext2_ino_t cwd, const char *src, const char *dest)
> > 	}
> > 	close(fd);
> > 
> > -	return 0;
> > +	return retval;
> > }
> > 
> > /* Copy files from source_dir to fs */
> > 
> 
> 
> Cheers, Andreas
> 
> 
> 
> 
> 



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 01/49] create_inode: clean up return mess in do_write_internal
  2014-03-11 20:41     ` Darrick J. Wong
@ 2014-03-11 21:08       ` Theodore Ts'o
  2014-03-12  3:24         ` Theodore Ts'o
  0 siblings, 1 reply; 88+ messages in thread
From: Theodore Ts'o @ 2014-03-11 21:08 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Andreas Dilger, linux-ext4

On Tue, Mar 11, 2014 at 01:41:31PM -0700, Darrick J. Wong wrote:
> > This seems a bit strange.  It looks like an error return, but it will
> > actually return "0" since this branch is only entered if retval == 0.
> > Should this return an explicit error value here?
> 
> You're right; maybe we should return EXT2_ET_FILE_EXISTS or something?

EXT2_ET_FILE_EXISTS sounds good to me.

					- Ted

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 01/49] create_inode: clean up return mess in do_write_internal
  2014-03-11 21:08       ` Theodore Ts'o
@ 2014-03-12  3:24         ` Theodore Ts'o
  0 siblings, 0 replies; 88+ messages in thread
From: Theodore Ts'o @ 2014-03-12  3:24 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Andreas Dilger, linux-ext4

On Tue, Mar 11, 2014 at 05:08:53PM -0400, Theodore Ts'o wrote:
> On Tue, Mar 11, 2014 at 01:41:31PM -0700, Darrick J. Wong wrote:
> > > This seems a bit strange.  It looks like an error return, but it will
> > > actually return "0" since this branch is only entered if retval == 0.
> > > Should this return an explicit error value here?
> > 
> > You're right; maybe we should return EXT2_ET_FILE_EXISTS or something?
> 
> EXT2_ET_FILE_EXISTS sounds good to me.

Thanks, applied, with the following change:

diff --git a/misc/create_inode.c b/misc/create_inode.c
index 647480c..fb6b800 100644
--- a/misc/create_inode.c
+++ b/misc/create_inode.c
@@ -351,9 +351,8 @@ errcode_t do_write_internal(ext2_ino_t cwd, const char *src, const char *dest)
 
 	retval = ext2fs_namei(current_fs, root, cwd, dest, &newfile);
 	if (retval == 0) {
-		com_err(__func__, 0, "The file '%s' already exists\n", dest);
 		close(fd);
-		return retval;
+		return EXT2_ET_FILE_EXISTS;
 	}
 
 	retval = ext2fs_new_inode(current_fs, cwd, 010755, 0, &newfile);

^ permalink raw reply related	[flat|nested] 88+ messages in thread
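
The point settled in this subthread is that a successful lookup is the failure case when the destination must not already exist, so the function has to supply its own error code instead of passing along the lookup's result or errno. A self-contained sketch of that pattern follows. Everything in it (`demo_errcode_t`, `DEMO_ET_FILE_EXISTS`, `demo_lookup`, `demo_create`) is a stand-in invented for this example, and the error value is arbitrary.

```c
#include <errno.h>

typedef long demo_errcode_t;

/* Hypothetical library-style error code; the value is arbitrary. */
#define DEMO_ET_FILE_EXISTS 1001L

/* Stand-in for a name lookup: returns 0 when the name is found. */
static demo_errcode_t demo_lookup(const char *name)
{
	return (name[0] == 'x') ? 0 : ENOENT;
}

/*
 * Refuse to overwrite an existing name.  Returning the lookup's result
 * here would hand the caller 0, which reads as success.
 */
static demo_errcode_t demo_create(const char *name)
{
	demo_errcode_t retval = demo_lookup(name);

	if (retval == 0)
		return DEMO_ET_FILE_EXISTS;
	return 0;	/* name is free; a real function would create it here */
}
```

In this toy setup any name starting with `x` counts as already present, so `demo_create("x.txt")` returns the explicit error code and `demo_create("new.txt")` returns 0.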

* Re: [PATCH 02/49] create_inode: minor cleanups
  2014-03-11 20:31   ` Andreas Dilger
@ 2014-03-12  3:25     ` Theodore Ts'o
  2014-03-12  3:27     ` Theodore Ts'o
  1 sibling, 0 replies; 88+ messages in thread
From: Theodore Ts'o @ 2014-03-12  3:25 UTC (permalink / raw)
  To: Andreas Dilger; +Cc: Darrick J. Wong, linux-ext4

On Tue, Mar 11, 2014 at 02:31:27PM -0600, Andreas Dilger wrote:
> On Mar 11, 2014, at 12:54 AM, Darrick J. Wong <darrick.wong@oracle.com> wrote:
> 
> > Fix a couple of small style issues in the create_inode files.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Reviewed-by: Andreas Dilger <adilger@dilger.ca>

Thanks, applied.

					- Ted

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 02/49] create_inode: minor cleanups
  2014-03-11 20:31   ` Andreas Dilger
  2014-03-12  3:25     ` Theodore Ts'o
@ 2014-03-12  3:27     ` Theodore Ts'o
  1 sibling, 0 replies; 88+ messages in thread
From: Theodore Ts'o @ 2014-03-12  3:27 UTC (permalink / raw)
  To: Andreas Dilger; +Cc: Darrick J. Wong, linux-ext4

On Tue, Mar 11, 2014 at 02:31:27PM -0600, Andreas Dilger wrote:
> On Mar 11, 2014, at 12:54 AM, Darrick J. Wong <darrick.wong@oracle.com> wrote:
> 
> > Fix a couple of small style issues in the create_inode files.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Reviewed-by: Andreas Dilger <adilger@dilger.ca>

Thanks, applied.

					- Ted

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 03/49] create_inode: whitespace fixes
  2014-03-11  6:54 ` [PATCH 03/49] create_inode: whitespace fixes Darrick J. Wong
@ 2014-03-12  3:27   ` Theodore Ts'o
  0 siblings, 0 replies; 88+ messages in thread
From: Theodore Ts'o @ 2014-03-12  3:27 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-ext4

On Mon, Mar 10, 2014 at 11:54:17PM -0700, Darrick J. Wong wrote:
> Fix a ton of whitespace issues.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>


Thanks, applied.

					- Ted

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 04/49] create_inode: move debugfs internal state back to debugfs
  2014-03-11  6:54 ` [PATCH 04/49] create_inode: move debugfs internal state back to debugfs Darrick J. Wong
@ 2014-03-12  3:31   ` Theodore Ts'o
  0 siblings, 0 replies; 88+ messages in thread
From: Theodore Ts'o @ 2014-03-12  3:31 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-ext4

On Mon, Mar 10, 2014 at 11:54:23PM -0700, Darrick J. Wong wrote:
> Since create_inode.c is shared between debugfs and mke2fs, don't
> spread debugfs internal state into mke2fs.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Thanks, applied.

					- Ted

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 05/49] create_inode: handle hard link inum mappings per populate_fs invocation
  2014-03-11  6:54 ` [PATCH 05/49] create_inode: handle hard link inum mappings per populate_fs invocation Darrick J. Wong
@ 2014-03-12  3:46   ` Theodore Ts'o
  0 siblings, 0 replies; 88+ messages in thread
From: Theodore Ts'o @ 2014-03-12  3:46 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-ext4

On Mon, Mar 10, 2014 at 11:54:30PM -0700, Darrick J. Wong wrote:
> When calling populate_fs, the map for hardlink detection is not
> cleaned up between populate_fs invocations, which could lead to
> unexpected results if anyone calls populate_fs twice in the same
> client program.  This doesn't happen right now, but we might as well
> clean it up.
> 
> The detector fails if the external directory crosses mountpoints,
> so fix that too.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Thanks, applied.

						- Ted

^ permalink raw reply	[flat|nested] 88+ messages in thread
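
Editorial sketch for the two points in the commit message above: the hard-link map belongs to one populate_fs invocation, and a source inode number alone is ambiguous once the source tree crosses a mount point. This is only an illustration under assumed names (hdlink_map, hdlink_add, hdlink_lookup, hdlink_free are hypothetical), not the actual create_inode.c code. It keys each entry on the (device, inode) pair and gives the caller a cleanup routine to run at the end of every invocation.

```c
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical names; this is not the e2fsprogs structure layout. */
struct hdlink_entry {
	dev_t src_dev;		/* device holding the source file */
	ino_t src_ino;		/* inode number on that device */
	unsigned int dst_ino;	/* inode allocated in the target fs */
};

struct hdlink_map {
	struct hdlink_entry *entries;
	size_t count, size;
};

/* Returns the target inode recorded for (dev, ino), or 0 if unknown. */
unsigned int hdlink_lookup(const struct hdlink_map *map, dev_t dev, ino_t ino)
{
	size_t i;

	for (i = 0; i < map->count; i++)
		if (map->entries[i].src_dev == dev &&
		    map->entries[i].src_ino == ino)
			return map->entries[i].dst_ino;
	return 0;
}

/* Records a mapping; returns 0 on success, -1 on allocation failure. */
int hdlink_add(struct hdlink_map *map, dev_t dev, ino_t ino,
	       unsigned int dst_ino)
{
	if (map->count == map->size) {
		size_t new_size = map->size ? map->size * 2 : 16;
		struct hdlink_entry *p;

		p = realloc(map->entries, new_size * sizeof(*p));
		if (!p)
			return -1;
		map->entries = p;
		map->size = new_size;
	}
	map->entries[map->count].src_dev = dev;
	map->entries[map->count].src_ino = ino;
	map->entries[map->count].dst_ino = dst_ino;
	map->count++;
	return 0;
}

/* Run at the end of each invocation so no state leaks into the next. */
void hdlink_free(struct hdlink_map *map)
{
	free(map->entries);
	map->entries = NULL;
	map->count = map->size = 0;
}
```

With the device in the key, inode 42 on one source filesystem and inode 42 on another are treated as different files instead of being linked together.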

* Re: [PATCH 06/49] libext2fs: support modifying arbitrary extended attributes (v5)
  2014-03-11  6:54 ` [PATCH 06/49] libext2fs: support modifying arbitrary extended attributes (v5) Darrick J. Wong
@ 2014-03-12  3:51   ` Theodore Ts'o
  0 siblings, 0 replies; 88+ messages in thread
From: Theodore Ts'o @ 2014-03-12  3:51 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-ext4

On Mon, Mar 10, 2014 at 11:54:36PM -0700, Darrick J. Wong wrote:
> v5: Add magic number checking to the extended attribute editing
> handle; move inline data to the head of the attribute list when
> writing so that inline data ends up in the inode area; and always zero
> the attribute space before writing to ensure that we can delete the
> last xattr.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Thanks, applied (with a slight adjustment to the git commit summary)

						- Ted

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 07/49] debugfs: create commands to edit extended attributes
  2014-03-11  6:54 ` [PATCH 07/49] debugfs: create commands to edit extended attributes Darrick J. Wong
@ 2014-03-12  3:51   ` Theodore Ts'o
  0 siblings, 0 replies; 88+ messages in thread
From: Theodore Ts'o @ 2014-03-12  3:51 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-ext4

On Mon, Mar 10, 2014 at 11:54:43PM -0700, Darrick J. Wong wrote:
> Enhance debugfs to be able to display and modify extended attributes, and
> create some simple tests for the extended attribute editing functions.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Thanks, applied.

						- Ted

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 10/49] libext2fs: fix iblocks correctly when expanding an inline_data file
  2014-03-11  6:55 ` [PATCH 10/49] libext2fs: fix iblocks correctly when expanding an inline_data file Darrick J. Wong
@ 2014-03-12 16:38   ` Andreas Dilger
  2014-03-12 17:01     ` Darrick J. Wong
  0 siblings, 1 reply; 88+ messages in thread
From: Andreas Dilger @ 2014-03-12 16:38 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: tytso, darrick.wong, linux-ext4

I thought it wasn't possible to have inline data and an external xattr block?
It doesn't make sense to do it that way compared to the normal inline xattr
and external data block.

Cheers, Andreas

> On Mar 11, 2014, at 0:55, "Darrick J. Wong" <darrick.wong@oracle.com> wrote:
> 
> i_blocks covers the number of blocks allocated to an inode for data,
> extents, and ACL blocks.  Since it's possible for a file to have a
> separate ACL block and inline data, we must be careful when expanding
> an inline data file to adjust, not set, the value of i_blocks.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
> lib/ext2fs/inline_data.c |    5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
> 
> 
> diff --git a/lib/ext2fs/inline_data.c b/lib/ext2fs/inline_data.c
> index 72e8fa3..a9ec923 100644
> --- a/lib/ext2fs/inline_data.c
> +++ b/lib/ext2fs/inline_data.c
> @@ -372,7 +372,9 @@ ext2fs_inline_data_dir_expand(ext2_filsys fs, ext2_ino_t ino,
>    if (EXT2_HAS_INCOMPAT_FEATURE(fs->super, EXT3_FEATURE_INCOMPAT_EXTENTS))
>        inode->i_flags |= EXT4_EXTENTS_FL;
>    inode->i_flags &= ~EXT4_INLINE_DATA_FL;
> -    ext2fs_iblk_set(fs, inode, 1);
> +    retval = ext2fs_iblk_add_blocks(fs, inode, 1);
> +    if (retval)
> +        goto errout;
>    inode->i_size = fs->blocksize;
>    retval = ext2fs_bmap2(fs, ino, inode, 0, BMAP_SET, 0, 0, &blk);
>    if (retval)
> @@ -410,7 +412,6 @@ ext2fs_inline_data_file_expand(ext2_filsys fs, ext2_ino_t ino,
>        inode->i_flags |= EXT4_EXTENTS_FL;
>    }
>    inode->i_flags &= ~EXT4_INLINE_DATA_FL;
> -    ext2fs_iblk_set(fs, inode, 0);
>    inode->i_size = 0;
>    retval = ext2fs_write_inode(fs, ino, inode);
>    if (retval)
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 88+ messages in thread
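
Editorial sketch of why the patch quoted above adjusts i_blocks instead of setting it. The toy_* names are hypothetical and this is not the libext2fs implementation; it assumes i_blocks is counted in 512-byte units, which is the common case on ext4.

```c
#include <stdint.h>

/* Toy inode: only the field relevant to the discussion. */
struct toy_inode {
	uint64_t i_blocks;	/* assumed to be in 512-byte units */
};

/* Overwrites the count, forgetting any EA block already charged. */
void toy_iblk_set(struct toy_inode *inode, uint32_t blocksize,
		  uint64_t nblocks)
{
	inode->i_blocks = nblocks * (blocksize / 512);
}

/* Adds to the count, preserving whatever was already charged. */
void toy_iblk_add(struct toy_inode *inode, uint32_t blocksize,
		  uint64_t nblocks)
{
	inode->i_blocks += nblocks * (blocksize / 512);
}
```

For an inline-data directory that already owns one 4096-byte EA block (i_blocks == 8), expanding into one data block should give 16. Setting the count to one block yields 8 and silently drops the EA block from the accounting.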

* Re: [PATCH 10/49] libext2fs: fix iblocks correctly when expanding an inline_data file
  2014-03-12 16:38   ` Andreas Dilger
@ 2014-03-12 17:01     ` Darrick J. Wong
  2014-03-14 13:25       ` Theodore Ts'o
  0 siblings, 1 reply; 88+ messages in thread
From: Darrick J. Wong @ 2014-03-12 17:01 UTC (permalink / raw)
  To: Andreas Dilger; +Cc: tytso, linux-ext4

On Wed, Mar 12, 2014 at 10:38:57AM -0600, Andreas Dilger wrote:
> I thought it wasn't possible to have inline data and an external xattr block?
> It doesn't make sense to do it that way compared to the normal inline xattr
> and external data block. 

I don't know if you're /supposed/ to be able to do that, but the kernel permits
me to create an inline data dir and then set a huge xattr on it that forces the
allocation of an EA block.

--D
> 
> Cheers, Andreas
> 
> > On Mar 11, 2014, at 0:55, "Darrick J. Wong" <darrick.wong@oracle.com> wrote:
> > 
> > i_blocks covers the number of blocks allocated to an inode for data,
> > extents, and ACL blocks.  Since it's possible for a file to have a
> > separate ACL block and inline data, we must be careful when expanding
> > an inline data file to adjust, not set, the value of i_blocks.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> > lib/ext2fs/inline_data.c |    5 +++--
> > 1 file changed, 3 insertions(+), 2 deletions(-)
> > 
> > 
> > diff --git a/lib/ext2fs/inline_data.c b/lib/ext2fs/inline_data.c
> > index 72e8fa3..a9ec923 100644
> > --- a/lib/ext2fs/inline_data.c
> > +++ b/lib/ext2fs/inline_data.c
> > @@ -372,7 +372,9 @@ ext2fs_inline_data_dir_expand(ext2_filsys fs, ext2_ino_t ino,
> >    if (EXT2_HAS_INCOMPAT_FEATURE(fs->super, EXT3_FEATURE_INCOMPAT_EXTENTS))
> >        inode->i_flags |= EXT4_EXTENTS_FL;
> >    inode->i_flags &= ~EXT4_INLINE_DATA_FL;
> > -    ext2fs_iblk_set(fs, inode, 1);
> > +    retval = ext2fs_iblk_add_blocks(fs, inode, 1);
> > +    if (retval)
> > +        goto errout;
> >    inode->i_size = fs->blocksize;
> >    retval = ext2fs_bmap2(fs, ino, inode, 0, BMAP_SET, 0, 0, &blk);
> >    if (retval)
> > @@ -410,7 +412,6 @@ ext2fs_inline_data_file_expand(ext2_filsys fs, ext2_ino_t ino,
> >        inode->i_flags |= EXT4_EXTENTS_FL;
> >    }
> >    inode->i_flags &= ~EXT4_INLINE_DATA_FL;
> > -    ext2fs_iblk_set(fs, inode, 0);
> >    inode->i_size = 0;
> >    retval = ext2fs_write_inode(fs, ino, inode);
> >    if (retval)
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 08/49] e2fsck: don't rehash inline directories
  2014-03-11  6:54 ` [PATCH 08/49] e2fsck: don't rehash inline directories Darrick J. Wong
@ 2014-03-13  3:52   ` Theodore Ts'o
  2014-03-13  5:38     ` Darrick J. Wong
  0 siblings, 1 reply; 88+ messages in thread
From: Theodore Ts'o @ 2014-03-13  3:52 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-ext4

On Mon, Mar 10, 2014 at 11:54:49PM -0700, Darrick J. Wong wrote:
> If a directory's contents are stored entirely inside the inode,
> there's no index to rebuild and no dirblock checksum to recompute.
> As far as I know these are the only two reasons to call dir rehash.

Well, actually, there is a third reason to rehash directories, and
that is to reorganize a directory to optimize out deleted entries that
are scattered in the middle of the directory.

That being said, this is more critical for inline directories, since we
very much want to keep them from spilling over to an external block.
Compressing out deleted space is something that should be done in real
time as we operate on the directory, by the kernel, and not just at
fsck time.

The only reason we don't do this today is that if the directory is
open for scanning using opendir/readdir, reorganizing a directory
block could end up corrupting the readdir --- and for non-inline
directories, it's much less important.

What I think might make sense is to have the kernel track
whether the directory has been opened for reading, and if it hasn't,
then it would be safe to try compressing all of the directory entries
in the block so that the free space is in a single unused directory
entry at the end of the block.  We could try doing this "dynamic
compression" of directory free space both at unlink(2) time, and also
when we try inserting a directory entry into the block and there is
apparently no space in the directory block.

So I'm fine with skipping the rehashing of inline directories now, but
this is a future, relatively small, kernel project we might want to
think about for ext4.

Cheers,

							- Ted

^ permalink raw reply	[flat|nested] 88+ messages in thread
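
Editorial sketch of the "dynamic compression" idea described above: slide the live entries to the front of a directory block and leave the free space as a single unused entry at the end. This is a userspace toy, not kernel or e2fsprogs code. It assumes the classic entry layout (4-byte inode, 2-byte rec_len, 1-byte name_len, 1-byte file type, then the name), host byte order, and blocks small enough that rec_len needs no special encoding. As the message notes, it would only be safe when nobody is in the middle of a readdir on the directory.

```c
#include <stdint.h>
#include <string.h>

#define DE_HDR 8	/* inode(4) + rec_len(2) + name_len(1) + file_type(1) */

/* Smallest record that holds a name, rounded up to a 4-byte boundary. */
size_t de_min_len(uint8_t name_len)
{
	return (DE_HDR + (size_t)name_len + 3) & ~(size_t)3;
}

/*
 * Compacts live entries to the front of buf and leaves the free space as
 * one unused entry (inode == 0) at the end.  Returns the number of free
 * bytes at the tail, or -1 if the block is malformed.  A real
 * implementation would validate the whole block before modifying it.
 */
int compact_dir_block(uint8_t *buf, size_t size)
{
	size_t src = 0, dst = 0, last = 0, free_bytes;
	int have_last = 0;

	if (size > 0xFFFC || (size & 3))
		return -1;	/* toy limit: rec_len must fit in 16 bits */
	while (src < size) {
		uint32_t ino;
		uint16_t rec_len, new_len;
		uint8_t name_len;

		if (size - src < DE_HDR)
			return -1;
		memcpy(&ino, buf + src, 4);
		memcpy(&rec_len, buf + src + 4, 2);
		name_len = buf[src + 6];
		if ((rec_len & 3) || rec_len > size - src ||
		    rec_len < de_min_len(name_len))
			return -1;
		if (ino != 0) {
			/* Live entry: move it down and trim its slack. */
			new_len = (uint16_t)de_min_len(name_len);
			memmove(buf + dst, buf + src, DE_HDR + name_len);
			memcpy(buf + dst + 4, &new_len, 2);
			last = dst;
			have_last = 1;
			dst += new_len;
		}
		src += rec_len;
	}
	free_bytes = size - dst;
	if (free_bytes >= DE_HDR) {
		/* Enough room for one unused entry covering the tail. */
		uint16_t len = (uint16_t)free_bytes;

		memset(buf + dst, 0, DE_HDR);
		memcpy(buf + dst + 4, &len, 2);
	} else if (free_bytes > 0) {
		/* Too small for a header: give it to the last live entry. */
		uint16_t len;

		if (!have_last)
			return -1;
		memcpy(&len, buf + last + 4, 2);
		len = (uint16_t)(len + free_bytes);
		memcpy(buf + last + 4, &len, 2);
	}
	return (int)free_bytes;
}
```

The insert path could call something like this when a block appears full, then retry the insertion against the single free entry at the tail.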

* Re: [PATCH 08/49] e2fsck: don't rehash inline directories
  2014-03-13  3:52   ` Theodore Ts'o
@ 2014-03-13  5:38     ` Darrick J. Wong
  2014-03-13 12:13       ` Theodore Ts'o
  0 siblings, 1 reply; 88+ messages in thread
From: Darrick J. Wong @ 2014-03-13  5:38 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: linux-ext4

On Wed, Mar 12, 2014 at 11:52:48PM -0400, Theodore Ts'o wrote:
> On Mon, Mar 10, 2014 at 11:54:49PM -0700, Darrick J. Wong wrote:
> > If a directory's contents are stored entirely inside the inode,
> > there's no index to rebuild and no dirblock checksum to recompute.
> > As far as I know these are the only two reasons to call dir rehash.
> 
> Well, actually, there is a third reason to rehash directories, and
> that is to reorganize a directory to optimize out deleted entries that
> are scattered in the middle of the directory.

Ooh, I forgot about that. :/

> That being said, this is more critical for inline directories, since we
> very much want to keep them from spilling over to an external block.
> Compressing out deleted space is something that should be done in real
> time as we operate on the directory, by the kernel, and not just at
> fsck time.
> 
> The only reason we don't do this today is that if the directory is
> open for scanning using opendir/readdir, reorganizing a directory
> block could end up corrupting the readdir --- and for non-inline
> directories, it's much less important.
> 
> What I think might make sense is to have the kernel track
> whether the directory has been opened for reading, and if it hasn't,
> then it would be safe to try compressing all of the directory entries
> in the block so that the free space is in a single unused directory
> entry at the end of the block.  We could try doing this "dynamic
> compression" of directory free space both at unlink(2) time, and also
> when we try inserting a directory entry into the block and there is
> apparently no space in the directory block.
> 
> So I'm fine with skipping the rehashing of inline directories now, but
> this is a future, relatively small, kernel project we might want to
> think about for ext4.

Probably we ought to fix up rehash.c to be able to compress directory entries
too.  The only reason I kicked them here was that somehow an inline data dir
would end up on the rehash list, causing the block iteration to fail and e2fsck
to stop cold.

--D
> 
> Cheers,
> 
> 							- Ted

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 08/49] e2fsck: don't rehash inline directories
  2014-03-13  5:38     ` Darrick J. Wong
@ 2014-03-13 12:13       ` Theodore Ts'o
  0 siblings, 0 replies; 88+ messages in thread
From: Theodore Ts'o @ 2014-03-13 12:13 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-ext4

On Wed, Mar 12, 2014 at 10:38:48PM -0700, Darrick J. Wong wrote:
> Probably we ought to fix up rehash.c to be able to compress directory entries
> too.  The only reason I kicked them here was that somehow an inline data dir
> would end up on the rehash list, causing the block iteration to fail and e2fsck
> to stop cold.

Sure, this would be a nice to have, but I'll take this patch for now
so at least we don't crash out.

Cheers,

					- Ted

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 09/49] libext2fs: don't fail when doing a strict rewrite of inline data
  2014-03-11  6:54 ` [PATCH 09/49] libext2fs: don't fail when doing a strict rewrite of inline data Darrick J. Wong
@ 2014-03-14 13:19   ` Theodore Ts'o
  0 siblings, 0 replies; 88+ messages in thread
From: Theodore Ts'o @ 2014-03-14 13:19 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-ext4

On Mon, Mar 10, 2014 at 11:54:56PM -0700, Darrick J. Wong wrote:
> ext2fs_inline_data_set() tries to ensure that there is sufficient free
> space in the inode to store the inline data.  Unfortunately, it gets
> the check wrong -- ext2fs_xattr_inode_max_size() returns the amount of
> unused bytes in the EA area, and _data_set() doesn't factor in the
> size of the existing inline data.  Therefore, a strict rewrite of an
> N-byte inlinedata with another N-byte inlinedata fails.
> 
> Fix the code to do the size check correctly.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Thanks, applied.

					- Ted

^ permalink raw reply	[flat|nested] 88+ messages in thread
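
Editorial sketch of the size check described in the commit message above. The function names and the simplified three-number model are hypothetical; the real check in libext2fs works on the inode and its extended attribute area. The point is only that the space available to a rewrite is the unused space plus the space the old inline data already occupies.

```c
#include <stddef.h>

/*
 * free_bytes: unused bytes left in the EA area
 * old_size:   bytes occupied by the inline data being replaced
 * new_size:   bytes the caller wants to store
 */

/* Buggy shape: ignores the space that the old data will give back. */
int inline_data_fits_buggy(size_t free_bytes, size_t old_size,
			   size_t new_size)
{
	(void)old_size;
	return new_size <= free_bytes;
}

/* Fixed shape: the old data's bytes are reusable by the new data. */
int inline_data_fits(size_t free_bytes, size_t old_size, size_t new_size)
{
	return new_size <= free_bytes + old_size;
}
```

With 4 bytes free and 60 bytes of existing inline data, rewriting the same 60 bytes fails the first check and passes the second, which is the "strict rewrite" case from the commit message.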

* Re: [PATCH 10/49] libext2fs: fix iblocks correctly when expanding an inline_data file
  2014-03-12 17:01     ` Darrick J. Wong
@ 2014-03-14 13:25       ` Theodore Ts'o
  0 siblings, 0 replies; 88+ messages in thread
From: Theodore Ts'o @ 2014-03-14 13:25 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Andreas Dilger, linux-ext4

On Wed, Mar 12, 2014 at 10:01:33AM -0700, Darrick J. Wong wrote:
> On Wed, Mar 12, 2014 at 10:38:57AM -0600, Andreas Dilger wrote:
> > I thought it wasn't possible to have inline data and an external xattr block?
> > It doesn't make sense to do it that way compared to the normal inline xattr
> > and external data block. 
> 
> I don't know if you're /supposed/ to be able to do that, but the kernel permits
> me to create an inline data dir and then set a huge xattr on it that forces the
> allocation of an EA block.

I could imagine situations where the file size is very small, and the
xattr size is close to the blocksize, where it would make sense to
allow this (since if we prohibited this, we would need to use two
blocks, one for the data block since the data doesn't fit in the inode,
and a second data block for the xattr).

Anyway, I've applied the patch, thanks.

					- Ted

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 11/49] e2fsck: zero errcode when checking inline data blocks
  2014-03-11  6:55 ` [PATCH 11/49] e2fsck: zero errcode when checking inline data blocks Darrick J. Wong
@ 2014-03-14 13:26   ` Theodore Ts'o
  0 siblings, 0 replies; 88+ messages in thread
From: Theodore Ts'o @ 2014-03-14 13:26 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-ext4

On Mon, Mar 10, 2014 at 11:55:08PM -0700, Darrick J. Wong wrote:
> When checking inline data blocks, always zero pctx->errcode because
> otherwise a previous error condition could leak through and "cause" a
> fatal block iteration failure.  I found this by corrupting an xattr
> block on an inline_data inode and fsck aborted when I tried to repair
> it.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Thanks, applied.

					- Ted

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 12/49] libext2fs: during inlinedata expand, don't corrupt inode
  2014-03-11  6:55 ` [PATCH 12/49] libext2fs: during inlinedata expand, don't corrupt inode Darrick J. Wong
@ 2014-03-14 13:29   ` Theodore Ts'o
  0 siblings, 0 replies; 88+ messages in thread
From: Theodore Ts'o @ 2014-03-14 13:29 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-ext4

On Mon, Mar 10, 2014 at 11:55:15PM -0700, Darrick J. Wong wrote:
> When expanding an inline data inode, it's possible that the reduction
> in the size of the EA structures causes the freeing of the EA block,
> which changes the inode.  If this happens, the local version of the
> inode that ext2fs_inline_data_expand was modifying will be out of sync
> with what's on the disk.  This local copy gets written out to disk
> after a block allocation, at which point it's possible that the inode
> EA block and logical block zero point to the same physical block,
> which is bad news.
> 
> Therefore, write the local copy to disk before removing the inline
> data EA, and reread it afterwards.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Thanks, applied.

						- Ted

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 13/49] libext2fs: repair side effects when iterating dirents in inline dirs
  2014-03-11  6:55 ` [PATCH 13/49] libext2fs: repair side effects when iterating dirents in inline dirs Darrick J. Wong
@ 2014-03-14 13:30   ` Theodore Ts'o
  0 siblings, 0 replies; 88+ messages in thread
From: Theodore Ts'o @ 2014-03-14 13:30 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-ext4

On Mon, Mar 10, 2014 at 11:55:21PM -0700, Darrick J. Wong wrote:
> In ext2fs_inline_data_dir_iterate(), we must be very careful to undo
> any modifications we make to the dir_context pointer passed in by the
> caller, because it's entirely possible that the caller will still want
> to do something with the ctx or something inside.
> 
> Specifically, ext2fs_dblist_dir_iterate() wants to be able to free
> ctx->buf, and it reuses the ctx for multiple dblist entries.  That
> means that assigning ctx->buf will cause weird crashes at the end of
> dir_iterate().
> 
> Since we're being careful with ctx, we might as well handle adding the
> INLINE_DATA flag to ctx->flags for ext2fs_process_dir_block, since the
> dblist caller forgets to unset the flag before reusing the ctx.
> 
> This fixes some crashes and valgrind complaints in resize2fs, and is
> necessary for the next patch, which fixes resize2fs not to corrupt
> inline_data filesystems.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Thanks, applied.

						- Ted

^ permalink raw reply	[flat|nested] 88+ messages in thread
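
Editorial sketch of the save-and-restore discipline the commit message above describes. The toy_* names are hypothetical and the context is reduced to two fields; the real dir_context has more. The iterator borrows the caller's context, points it at the inline buffer, and puts everything back before returning so that the caller can still free its own buffer and reuse the context.

```c
#include <stddef.h>

#define TOY_FLAG_INLINE 0x1

struct toy_dir_ctx {
	char *buf;	/* owned by the caller, who frees it later */
	int flags;
};

/* Example callback: returns the first byte if the inline flag is set. */
int toy_peek(struct toy_dir_ctx *ctx)
{
	return (ctx->flags & TOY_FLAG_INLINE) ? ctx->buf[0] : -1;
}

int toy_inline_iterate(struct toy_dir_ctx *ctx, char *inline_buf,
		       int (*process)(struct toy_dir_ctx *ctx))
{
	char *saved_buf = ctx->buf;
	int saved_flags = ctx->flags;
	int ret;

	/* Temporarily redirect the context at the inline data. */
	ctx->buf = inline_buf;
	ctx->flags |= TOY_FLAG_INLINE;
	ret = process(ctx);

	/* Undo every change, on success and on error alike. */
	ctx->buf = saved_buf;
	ctx->flags = saved_flags;
	return ret;
}
```

Restoring the flags as well as the buffer matters when the same context is reused for the next entry, which may not be an inline directory.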

* Re: [PATCH 14/49] resize2fs: add inline dirs for remapping
  2014-03-11  6:55 ` [PATCH 14/49] resize2fs: add inline dirs for remapping Darrick J. Wong
@ 2014-03-14 13:31   ` Theodore Ts'o
  0 siblings, 0 replies; 88+ messages in thread
From: Theodore Ts'o @ 2014-03-14 13:31 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-ext4

On Mon, Mar 10, 2014 at 11:55:27PM -0700, Darrick J. Wong wrote:
> When we're looking for directory blocks for the inode remapping step,
> we need to include inline_data directories in the remap process.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Thanks, applied.

					- Ted

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 15/49] all: Introduce cppcheck static checking for make C=1
  2014-03-11  6:55 ` [PATCH 15/49] all: Introduce cppcheck static checking for make C=1 Darrick J. Wong
@ 2014-03-14 13:33   ` Theodore Ts'o
  0 siblings, 0 replies; 88+ messages in thread
From: Theodore Ts'o @ 2014-03-14 13:33 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-ext4

On Mon, Mar 10, 2014 at 11:55:34PM -0700, Darrick J. Wong wrote:
> Introduce more static checking via cppcheck.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Thanks, applied.

					- Ted

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 16/49] misc: cppcheck cleanups
  2014-03-11  6:55 ` [PATCH 16/49] misc: cppcheck cleanups Darrick J. Wong
@ 2014-03-14 13:34   ` Theodore Ts'o
  0 siblings, 0 replies; 88+ messages in thread
From: Theodore Ts'o @ 2014-03-14 13:34 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-ext4

On Mon, Mar 10, 2014 at 11:55:40PM -0700, Darrick J. Wong wrote:
> Fix a number of things that cppcheck complains about.  Most of these
> are minor resource leaks and forgotten declarations.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Thanks, applied.

					- Ted

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 17/49] libext2fs: fix 64bit overflow in ext2fs_block_alloc_stats_range
  2014-03-11  6:55 ` [PATCH 17/49] libext2fs: fix 64bit overflow in ext2fs_block_alloc_stats_range Darrick J. Wong
@ 2014-03-14 13:35   ` Theodore Ts'o
  0 siblings, 0 replies; 88+ messages in thread
From: Theodore Ts'o @ 2014-03-14 13:35 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-ext4

On Mon, Mar 10, 2014 at 11:55:46PM -0700, Darrick J. Wong wrote:
> In ext2fs_block_alloc_stats_range(), the quantity "-inuse * n" is
> calculated as a signed 32-bit quantity.  Unfortunately, gcc (4.6.3 on
> Ubuntu 12.04) doesn't sign-extend this quantity to fill the blk64_t
> parameter that ext2fs_free_blocks_count_add() wants, so the end result
> is that the superblock gets a ridiculously huge free block count.
> 
> Changing the declaration of 'n' to blk64_t seems to fix this.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Thanks, applied.

					- Ted

^ permalink raw reply	[flat|nested] 88+ messages in thread
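
Editorial sketch of the arithmetic behind the overflow described above. The thread does not show the original declaration of 'n'; this assumes it was a 32-bit unsigned type and that int is 32 bits wide. Under C's usual arithmetic conversions, int times a 32-bit unsigned value is computed as 32-bit unsigned, so the "negative" product is a large positive number that is then zero-extended when widened to 64 bits. The function names are hypothetical.

```c
#include <stdint.h>

/* Buggy shape: the product is formed in 32-bit unsigned arithmetic. */
uint64_t delta_32(int inuse, uint32_t n)
{
	return -inuse * n;
}

/* Fixed shape: with a 64-bit n, the negation is carried out in 64 bits. */
uint64_t delta_64(int inuse, uint64_t n)
{
	return -inuse * n;
}
```

Adding delta_64(1, 5) to a 64-bit free block count wraps around and subtracts 5, as intended. Adding delta_32(1, 5) instead adds 4294967291, which matches the "ridiculously huge free block count" symptom.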

* Re: [PATCH 18/49] misc: fix header complaints and resource leaks in e2fsprogs
  2014-03-11  6:55 ` [PATCH 18/49] misc: fix header complaints and resource leaks in e2fsprogs Darrick J. Wong
@ 2014-03-14 13:39   ` Theodore Ts'o
  2014-03-14 13:53   ` Theodore Ts'o
  1 sibling, 0 replies; 88+ messages in thread
From: Theodore Ts'o @ 2014-03-14 13:39 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-ext4

On Mon, Mar 10, 2014 at 11:55:53PM -0700, Darrick J. Wong wrote:
> diff --git a/e2fsck/unix.c b/e2fsck/unix.c
> index b39383d..11c2693 100644
> --- a/e2fsck/unix.c
> +++ b/e2fsck/unix.c
> @@ -1016,6 +1016,7 @@ static errcode_t PRS(int argc, char *argv[], e2fsck_t *ret_ctx)
>  			strcat(newpath, oldpath);
>  		}
>  		putenv(newpath);
> +		free(newpath);
>  	}

This introduces a bug.  An attempt to reference the PATH environment
variable will result in garbage or a crash.  Quoting from the putenv()
manpage:

   The putenv() function adds or changes the value of environment
   variables.  The argument string is of the form name=value.  If name
   does not already exist in the environment, then string is added to
   the environment.  If name does exist, then the value of name in the
   environment is changed to value.  The string pointed to by string
   becomes part of the environment, so altering the string changes the
   environment.

It's a common false positive with things like cppcheck, valgrind
--leak-check, etc.

Cheers,

						- Ted

^ permalink raw reply	[flat|nested] 88+ messages in thread
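
Editorial sketch of the putenv() ownership rule quoted from the manpage above. The helper name prepend_to_path is hypothetical and this is not the e2fsck code; it only shows that the buffer handed to putenv() must stay allocated for as long as the variable is in the environment.

```c
#define _XOPEN_SOURCE 700
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Prepends dir to PATH.  The buffer passed to putenv() is deliberately
 * never freed on success: it has become part of the environment.
 */
int prepend_to_path(const char *dir)
{
	const char *oldpath = getenv("PATH");
	size_t len = strlen("PATH=") + strlen(dir) + 1 +
		     (oldpath ? strlen(oldpath) : 0) + 1;
	char *newpath = malloc(len);

	if (!newpath)
		return -1;
	if (oldpath)
		snprintf(newpath, len, "PATH=%s:%s", dir, oldpath);
	else
		snprintf(newpath, len, "PATH=%s", dir);
	if (putenv(newpath) != 0) {
		free(newpath);	/* putenv failed, so we still own it */
		return -1;
	}
	/* No free(newpath) here: getenv("PATH") now points into it. */
	return 0;
}
```

A leak checker sees a malloc() with no matching free() and reports it, which is why this pattern shows up as a false positive. Where copying semantics are wanted, setenv() is the usual alternative.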

* Re: [PATCH 18/49] misc: fix header complaints and resource leaks in e2fsprogs
  2014-03-11  6:55 ` [PATCH 18/49] misc: fix header complaints and resource leaks in e2fsprogs Darrick J. Wong
  2014-03-14 13:39   ` Theodore Ts'o
@ 2014-03-14 13:53   ` Theodore Ts'o
  2014-03-14 19:23     ` Darrick J. Wong
  1 sibling, 1 reply; 88+ messages in thread
From: Theodore Ts'o @ 2014-03-14 13:53 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-ext4

On Mon, Mar 10, 2014 at 11:55:53PM -0700, Darrick J. Wong wrote:
> Fix a few minor bugs that cppcheck complained about.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Applied with the following changes.  It looks like cppcheck complained
with another false positive in ext2fs_create_icount_tdb().  The
filename is copied in icount->tdb_fn, and so adding a call to
ext2fs_free_mem() will actually result in a double-free bug, since
ext2fs_free_icount() will take care of releasing the memory.  Also,
perhaps just as importantly, it will take care of deleting the
temporary file created by mkstemp() first.

I did keep the first ext2fs_free_mem() and moved setting
icount->tdb_fn down by a bit just to avoid a potential bug if
mkstemp() fails, and there is a valid file of the form *-icount-XXXXXX
that the user would be unhappy with us deleting.  Pedantic, perhaps,
since it would probably never happen, but it's good to be 100%
correct.  :-)

					- Ted

diff --git a/e2fsck/unix.c b/e2fsck/unix.c
index 11c2693..b39383d 100644
--- a/e2fsck/unix.c
+++ b/e2fsck/unix.c
@@ -1016,7 +1016,6 @@ static errcode_t PRS(int argc, char *argv[], e2fsck_t *ret_ctx)
 			strcat(newpath, oldpath);
 		}
 		putenv(newpath);
-		free(newpath);
 	}
 #ifdef CONFIG_JBD_DEBUG
 	jbd_debug = getenv("E2FSCK_JBD_DEBUG");
diff --git a/lib/ext2fs/icount.c b/lib/ext2fs/icount.c
index 7d1b3d5..5e1f5c6 100644
--- a/lib/ext2fs/icount.c
+++ b/lib/ext2fs/icount.c
@@ -193,7 +193,6 @@ errcode_t ext2fs_create_icount_tdb(ext2_filsys fs, char *tdb_dir,
 		goto errout;
 	uuid_unparse(fs->super->s_uuid, uuid);
 	sprintf(fn, "%s/%s-icount-XXXXXX", tdb_dir, uuid);
-	icount->tdb_fn = fn;
 	save_umask = umask(077);
 	fd = mkstemp(fn);
 	if (fd < 0) {
@@ -201,6 +200,7 @@ errcode_t ext2fs_create_icount_tdb(ext2_filsys fs, char *tdb_dir,
 		ext2fs_free_mem(&fn);
 		goto errout;
 	}
+	icount->tdb_fn = fn;
 	umask(save_umask);
 	/*
 	 * This is an overestimate of the size that we will need; the
@@ -217,7 +217,6 @@ errcode_t ext2fs_create_icount_tdb(ext2_filsys fs, char *tdb_dir,
 	close(fd);
 	if (icount->tdb == NULL) {
 		retval = errno;
-		ext2fs_free_mem(&fn);
 		goto errout;
 	}
 	*ret = icount;

^ permalink raw reply related	[flat|nested] 88+ messages in thread
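
Editorial sketch of the ownership ordering in the icount.c hunk above: the filename is handed to the owning structure only after mkstemp() succeeds, so the owner's cleanup can never unlink a path that this code did not create. The toy_* names are hypothetical and the tdb handling is left out entirely.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

struct toy_icount {
	char *tdb_fn;	/* once set, toy_icount_free() unlinks and frees it */
};

void toy_icount_free(struct toy_icount *ic)
{
	if (ic->tdb_fn) {
		unlink(ic->tdb_fn);
		free(ic->tdb_fn);
		ic->tdb_fn = NULL;
	}
}

/* Returns 0 on success or an errno value on failure. */
int toy_icount_create(struct toy_icount *ic, const char *dir)
{
	size_t len = strlen(dir) + sizeof("/toy-icount-XXXXXX");
	char *fn = malloc(len);
	int fd;

	ic->tdb_fn = NULL;
	if (!fn)
		return ENOMEM;
	snprintf(fn, len, "%s/toy-icount-XXXXXX", dir);
	fd = mkstemp(fn);
	if (fd < 0) {
		int err = errno;

		free(fn);	/* not yet owned by ic, so free it here */
		return err;
	}
	ic->tdb_fn = fn;	/* ownership transfers only after success */
	close(fd);
	return 0;
}
```

After the transfer, every later error path can simply call the owner's free routine, which both frees the name and removes the temporary file. Freeing the name separately at that point would be the double free described in the message.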

* Re: [PATCH 19/49] libext2fs: fix memory leak when drastically shrinking extent tree depth
  2014-03-11  6:55 ` [PATCH 19/49] libext2fs: fix memory leak when drastically shrinking extent tree depth Darrick J. Wong
@ 2014-03-14 13:56   ` Theodore Ts'o
  0 siblings, 0 replies; 88+ messages in thread
From: Theodore Ts'o @ 2014-03-14 13:56 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-ext4

On Mon, Mar 10, 2014 at 11:55:59PM -0700, Darrick J. Wong wrote:
> In ext2fs_extent_free(), h(andle)->max_depth is used as a loop
> conditional variable to free all the h->path[].buf pointers.  However,
> ext2fs_extent_delete() sets max_depth = 0 if we've removed everything
> from the extent tree, which causes a subsequent _free() to leak some
> buf pointers.  max_depth can be re-incremented when splitting extent
> nodes, but there's no guarantee that it'll reach the old value before
> the free.
> 
> Therefore, remember the size of h->path[] separately, and use that
> when freeing the extent handle.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Thanks, applied.

						- Ted

^ permalink raw reply	[flat|nested] 88+ messages in thread
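
Editorial sketch of the leak and its fix as described in the commit message above. The toy_* names and the max_paths field name are assumptions for illustration, not the libext2fs definitions. The handle records how many path slots it allocated and frees by that count, so a later change to max_depth cannot cause buffers to be skipped.

```c
#include <stdlib.h>

struct toy_path {
	char *buf;
};

struct toy_handle {
	int max_depth;		/* current tree depth; may shrink to 0 */
	int max_paths;		/* number of slots allocated in path[] */
	struct toy_path *path;
};

/* Allocates a handle for a tree of the given depth (depth >= 0). */
struct toy_handle *toy_handle_open(int depth)
{
	struct toy_handle *h = calloc(1, sizeof(*h));
	int i;

	if (!h)
		return NULL;
	h->max_depth = depth;
	h->max_paths = depth + 1;
	h->path = calloc((size_t)h->max_paths, sizeof(*h->path));
	if (!h->path) {
		free(h);
		return NULL;
	}
	for (i = 0; i < h->max_paths; i++)
		h->path[i].buf = malloc(16);	/* NULL is tolerated below */
	return h;
}

/* Frees by allocated size, not current depth; returns buffers freed. */
int toy_handle_free(struct toy_handle *h)
{
	int i, freed = 0;

	for (i = 0; i < h->max_paths; i++) {
		if (h->path[i].buf) {
			free(h->path[i].buf);
			freed++;
		}
	}
	free(h->path);
	free(h);
	return freed;
}
```

If the free loop were bounded by max_depth instead, a handle opened at depth 3 and then emptied (max_depth set to 0) would release one buffer and leak the other three.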

* Re: [PATCH 20/49] libext2fs: fix parents when modifying extents
  2014-03-11  6:56 ` [PATCH 20/49] libext2fs: fix parents when modifying extents Darrick J. Wong
@ 2014-03-14 14:01   ` Theodore Ts'o
  2014-03-14 20:13     ` Darrick J. Wong
  0 siblings, 1 reply; 88+ messages in thread
From: Theodore Ts'o @ 2014-03-14 14:01 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-ext4

On Mon, Mar 10, 2014 at 11:56:05PM -0700, Darrick J. Wong wrote:
> In ext2fs_extent_set_bmap() and ext2fs_punch_extent(), fix the parents
> when altering either end of an extent so that the parent nodes reflect
> the added mapping.

Can you say more about what bug/symptom this fixes?

> There's a slight complication to using fix_parents: if there are two
> mappings to an lblk in the tree, the value of handle->path->curr can
> point to either extent afterwards, which is documented in a comment.

It's horribly wrong to map an lblk with two extents.  So the question
is at what places should we complain if we notice this.  In the ideal
world we would never allow an extent tree to be mutated in such a way
that it is invalid like this, but I am worried about the overhead
costs of always checking.  But if there are places where it wouldn't
take much effort to check, we should probably do so and return an
error.  (If the extent tree is already invalid, I suppose we should
not error out of the operation, since that would affect e2fsck and
debugfs.  I'm talking about checks to make sure that libext2fs or its
callers don't accidentally make things worse.)

					- Ted

P.S.  I suppose the one possible valid use case for making an invalid
extent tree is a developer making an invalid extent tree for
regression testing, so maybe there should be an exemption for debugfs.


^ permalink raw reply	[flat|nested] 88+ messages in thread
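
Editorial sketch of the kind of cheap check discussed above: detecting whether a logical block is mapped by two extents. It is not a libext2fs function; the struct and name are hypothetical, and it assumes the extents of one leaf are already available as an array sorted by starting logical block.

```c
#include <stddef.h>
#include <stdint.h>

struct toy_extent {
	uint64_t lblk;	/* first logical block covered */
	uint32_t len;	/* number of blocks covered */
};

/* Returns 1 if any logical block is covered twice, 0 otherwise. */
int toy_extents_overlap(const struct toy_extent *ex, size_t n)
{
	size_t i;

	for (i = 1; i < n; i++)
		if (ex[i - 1].lblk + ex[i - 1].len > ex[i].lblk)
			return 1;
	return 0;
}
```

Comparing only neighbouring extents keeps the cost linear in the number of entries in the node, which speaks to the overhead concern raised in the message.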

* Re: [PATCH 18/49] misc: fix header complaints and resource leaks in e2fsprogs
  2014-03-14 13:53   ` Theodore Ts'o
@ 2014-03-14 19:23     ` Darrick J. Wong
  0 siblings, 0 replies; 88+ messages in thread
From: Darrick J. Wong @ 2014-03-14 19:23 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: linux-ext4

On Fri, Mar 14, 2014 at 09:53:50AM -0400, Theodore Ts'o wrote:
> On Mon, Mar 10, 2014 at 11:55:53PM -0700, Darrick J. Wong wrote:
> > Fix a few minor bugs that cppcheck complained about.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Applied with the following changes.  It looks like cppcheck complained
> with another false positive in ext2fs_create_icount_tdb().  The
> filename is copied in icount->tdb_fn, and so adding a call to
> ext2fs_free_mem() will actually result in a double-free bug, since
> ext2fs_free_icount() will take care of releasing the memory.  Also,
> perhaps just as importantly, it will take care of deleting the
> temporary file created by mkstemp() first.
> 
> I did keep the first ext2fs_free_mem() and moved setting
> icount->tdb_fn down by a bit just to avoid a potential bug if
> mkstemp() fails, and there is a valid file of the form *-icount-XXXXXX
> that the user would be unhappy with us deleting.  Pedantic, perhaps,
> since it would probably never happen, but it's good to be 100%
> correct.  :-)

Ok, thanks for fixing the mistakes.  I was ignorant of the putenv thing. :/

--D
> 
> 					- Ted
> 
> diff --git a/e2fsck/unix.c b/e2fsck/unix.c
> index 11c2693..b39383d 100644
> --- a/e2fsck/unix.c
> +++ b/e2fsck/unix.c
> @@ -1016,7 +1016,6 @@ static errcode_t PRS(int argc, char *argv[], e2fsck_t *ret_ctx)
>  			strcat(newpath, oldpath);
>  		}
>  		putenv(newpath);
> -		free(newpath);
>  	}
>  #ifdef CONFIG_JBD_DEBUG
>  	jbd_debug = getenv("E2FSCK_JBD_DEBUG");
> diff --git a/lib/ext2fs/icount.c b/lib/ext2fs/icount.c
> index 7d1b3d5..5e1f5c6 100644
> --- a/lib/ext2fs/icount.c
> +++ b/lib/ext2fs/icount.c
> @@ -193,7 +193,6 @@ errcode_t ext2fs_create_icount_tdb(ext2_filsys fs, char *tdb_dir,
>  		goto errout;
>  	uuid_unparse(fs->super->s_uuid, uuid);
>  	sprintf(fn, "%s/%s-icount-XXXXXX", tdb_dir, uuid);
> -	icount->tdb_fn = fn;
>  	save_umask = umask(077);
>  	fd = mkstemp(fn);
>  	if (fd < 0) {
> @@ -201,6 +200,7 @@ errcode_t ext2fs_create_icount_tdb(ext2_filsys fs, char *tdb_dir,
>  		ext2fs_free_mem(&fn);
>  		goto errout;
>  	}
> +	icount->tdb_fn = fn;
>  	umask(save_umask);
>  	/*
>  	 * This is an overestimate of the size that we will need; the
> @@ -217,7 +217,6 @@ errcode_t ext2fs_create_icount_tdb(ext2_filsys fs, char *tdb_dir,
>  	close(fd);
>  	if (icount->tdb == NULL) {
>  		retval = errno;
> -		ext2fs_free_mem(&fn);
>  		goto errout;
>  	}
>  	*ret = icount;
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 20/49] libext2fs: fix parents when modifying extents
  2014-03-14 14:01   ` Theodore Ts'o
@ 2014-03-14 20:13     ` Darrick J. Wong
  2014-03-15 15:46       ` Theodore Ts'o
  0 siblings, 1 reply; 88+ messages in thread
From: Darrick J. Wong @ 2014-03-14 20:13 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: linux-ext4

On Fri, Mar 14, 2014 at 10:01:58AM -0400, Theodore Ts'o wrote:
> On Mon, Mar 10, 2014 at 11:56:05PM -0700, Darrick J. Wong wrote:
> > In ext2fs_extent_set_bmap() and ext2fs_punch_extent(), fix the parents
> > when altering either end of an extent so that the parent nodes reflect
> > the added mapping.
> 
> Can you say more about what bug/symptom this fixes?

I first observed symptoms when calls to _set_bmap() or _punch_extent() on a
leaf node would leave the index node's ei_block set to the wrong value, which
e2fsck complains about.

In the _set_bmap() case, I noticed that the "remapping last block in extent"
case would produce symptoms if we are trying to remap a block from "extent" to
"next_extent", and the two extents are pointed to by different index nodes.
_extent_replace(..., next_extent) updates e_lblk in the leaf extent, but
because there's no _extent_fix_parents() call, the index nodes never get
updated.

In the _punch_extent() case, we conclude that we need to split an extent into
two pieces since we're punching out the middle.  If the extent is the last
extent in the block, the second extent will be inserted into a new leaf node
block.  Without _fix_parents(), the index node doesn't seem to get updated.
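
To make the symptom concrete, here is a toy sketch of the invariant that e2fsck checks, assuming a single index level.  The structures and helpers are simplified stand-ins that I made up for illustration; they are not the libext2fs types or the real _extent_fix_parents() code.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in for a leaf extent; not the libext2fs structure. */
struct toy_extent {
	uint64_t e_lblk;	/* first logical block mapped */
	uint32_t e_len;		/* number of blocks mapped */
};

/* Toy stand-in for the index entry that points at a leaf block. */
struct toy_index {
	uint64_t ei_block;			/* must match first_in_leaf->e_lblk */
	struct toy_extent *first_in_leaf;	/* first extent of the child leaf */
};

/* Returns 1 when the index entry agrees with the leaf it points to. */
int toy_index_ok(const struct toy_index *ix)
{
	return ix->ei_block == ix->first_in_leaf->e_lblk;
}

/* What a fix-parents step has to do for one level of the tree. */
void toy_fix_parent(struct toy_index *ix)
{
	ix->ei_block = ix->first_in_leaf->e_lblk;
}

/*
 * Grow the first extent of a leaf downward by one block, as in the
 * "remapping last block in extent" case where next_extent takes over
 * the block.  Only the leaf is touched; the index entry goes stale.
 */
void toy_grow_down(struct toy_extent *next_extent)
{
	assert(next_extent->e_lblk > 0);
	next_extent->e_lblk--;
	next_extent->e_len++;
}
```

After toy_grow_down() the index entry is stale until toy_fix_parent() runs, which is the window the patch closes.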

> > There's a slight complication to using fix_parents: if there are two
> > mappings to an lblk in the tree, the value of handle->path->curr can
> > point to either extent afterwards, which is documented in a comment.
> 
> It's horribly wrong to map an lblk with two extents.  So the question
> is at what places should we complain if we notice this.  In the ideal
> world we would never allow an extent tree to be mutated in such a way
> that it is invalid like this, but I am worried about the overhead
> costs of always checking.  But if there are places where it wouldn't
> take much effort to check, we should probably do so and return an
> error.  (If the extent tree is already invalid, I suppose we should
> allow the operation rather than error out, since that would affect e2fsck and
> debugfs.  I'm talking about checks to make sure that libext2fs or its
> callers don't accidentally make things worse.)

Both cases above cause the (temporary) use of two extents to map the same lblk.
_set_bmap() first adjusts next_extent.e_lblk to cover the lblk, then decrements
extent.e_len so that "extent" no longer covers the lblk.  _punch_extent()
splits an extent by first inserting the second part of the extent and then
shortening the original extent to reflect the punchout.

These two cases seemed slightly suspect to me, but since e2fsprogs doesn't
journal, changing the code to reduce one extent and enlarge the other opens the
possibility that we could lose the lblk mapping entirely if something happens
in between the two operations.  I prefer "block still mapped but fsck is
unhappy about redundant bookkeeping" to "the block is gone", so I added the two
fixes and let it go.
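
The ordering argument can be sketched with two toy extents, assuming they are logically adjacent.  The types and helpers below are hypothetical and only model the grow-then-shrink order; they are not the code in _set_bmap().

```c
#include <assert.h>
#include <stdint.h>

/* Toy extent; not the libext2fs structure. */
struct toy_extent {
	uint64_t e_lblk;	/* first logical block mapped */
	uint32_t e_len;		/* number of blocks mapped */
};

static int toy_covers(const struct toy_extent *e, uint64_t lblk)
{
	return lblk >= e->e_lblk && lblk < e->e_lblk + e->e_len;
}

/* How many of the two neighbouring extents map lblk right now. */
int toy_mappings(const struct toy_extent *extent,
		 const struct toy_extent *next_extent, uint64_t lblk)
{
	return toy_covers(extent, lblk) + toy_covers(next_extent, lblk);
}

/* Step 1: grow next_extent so it also covers the last block of extent. */
void toy_step_grow_next(struct toy_extent *next_extent)
{
	assert(next_extent->e_lblk > 0);
	next_extent->e_lblk--;
	next_extent->e_len++;
}

/* Step 2: shrink extent so it no longer covers that block. */
void toy_step_shrink(struct toy_extent *extent)
{
	assert(extent->e_len > 0);
	extent->e_len--;
}
```

With extent = {0, 10} and next_extent = {10, 5}, block 9 is mapped once, then twice after step 1, then once again after step 2.  It is never unmapped, whereas the reverse order would pass through zero mappings.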

--D
> 
> 	    	  	       	  - Ted
> 
> P.S.  I suppose the one possible valid use case for making an invalid
> extent tree is a developer making an invalid extent tree for
> regression testing, so maybe there should be an exemption for debugfs.
> 

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 20/49] libext2fs: fix parents when modifying extents
  2014-03-14 20:13     ` Darrick J. Wong
@ 2014-03-15 15:46       ` Theodore Ts'o
  2014-03-17 16:59         ` Darrick J. Wong
  0 siblings, 1 reply; 88+ messages in thread
From: Theodore Ts'o @ 2014-03-15 15:46 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-ext4

On Fri, Mar 14, 2014 at 01:13:19PM -0700, Darrick J. Wong wrote:
> Both cases above cause the (temporary) use of two extents to map the same lblk.
> _set_bmap() first adjusts next_extent.e_lblk to cover the lblk, then decrements
> extent.e_len so that "extent" no longer covers the lblk.  _punch_extent()
> splits an extent by first inserting the second part of the extent and then
> shortening the original extent to reflect the punchout.
> 
> These two cases seemed slightly suspect to me, but since e2fsprogs doesn't
> journal, changing the code to reduce one extent and enlarge the other opens the
> possibility that we could lose the lblk mapping entirely if something happens
> in between the two operations.  I prefer "block still mapped but fsck is
> unhappy about redundant bookkeeping" to "the block is gone", so I added the two
> fixes and let it go.

OK, fair enough, I'll merge in this patch since it's fixing a real
bug, and I can't think of a good way to fix this issue without making
pretty massive changes.

I'll note though that at the moment e2fsck doesn't have sophisticated
extent tree recovery support; so if the extent tree is corrupt, it
offers to zap the entire extent tree, instead of trying to fix up the
extent tree.  That wasn't an issue because, absent bugs, the most
likely case when the extent tree is corrupted is that there's not
much you can do, so it wasn't worth adding code to handle these
cases.

However, if a large number of users start using your FUSE server in
production, then making sure the right thing happens when they suffer
power failures starts becoming more important --- but it may not be
trivial since libext2fs wasn't originally intended for that use case.
I am glad that you implemented it, since it's a great way to get
better testing for various corner cases.  But that's different from
advertising that the FUSE server should be used in production use
cases; in particular, we might need to figure out some kind of
journalling system for FUSE, either using the ext4's internal journal,
or some user-space journalling system.

					- Ted

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 21/49] e2fsck: print runs of duplicate blocks instead of all of them
  2014-03-11  6:56 ` [PATCH 21/49] e2fsck: print runs of duplicate blocks instead of all of them Darrick J. Wong
@ 2014-03-15 16:19   ` Theodore Ts'o
  0 siblings, 0 replies; 88+ messages in thread
From: Theodore Ts'o @ 2014-03-15 16:19 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-ext4

On Mon, Mar 10, 2014 at 11:56:12PM -0700, Darrick J. Wong wrote:
> When pass1 finds blocks that are mapped to multiple files, it will
> print every duplicated block.  If there are long sequences of
> duplicate blocks (e.g. the e_pblk field is wrong in an extent), this
> can cause a gigantic flood of output when a range could convey the
> same information.  Therefore, teach pass1b to print ranges when
> possible.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Thanks, applied.

					- Ted

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 20/49] libext2fs: fix parents when modifying extents
  2014-03-15 15:46       ` Theodore Ts'o
@ 2014-03-17 16:59         ` Darrick J. Wong
  0 siblings, 0 replies; 88+ messages in thread
From: Darrick J. Wong @ 2014-03-17 16:59 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: linux-ext4

On Sat, Mar 15, 2014 at 11:46:26AM -0400, Theodore Ts'o wrote:
> On Fri, Mar 14, 2014 at 01:13:19PM -0700, Darrick J. Wong wrote:
> > Both cases above cause the (temporary) use of two extents to map the same lblk.
> > _set_bmap() first adjusts next_extent.e_lblk to cover the lblk, then decrements
> > extent.e_len so that "extent" no longer covers the lblk.  _punch_extent()
> > splits an extent by first inserting the second part of the extent and then
> > shortening the original extent to reflect the punchout.
> > 
> > These two cases seemed slightly suspect to me, but since e2fsprogs doesn't
> > journal, changing the code to reduce one extent and enlarge the other opens the
> > possibility that we could lose the lblk mapping entirely if something happens
> > in between the two operations.  I prefer "block still mapped but fsck is
> > unhappy about redundant bookkeeping" to "the block is gone", so I added the two
> > fixes and let it go.
> 
> OK, fair enough, I'll merge in this patch since it's fixing a real
> bug, and I can't think of a good way to fix this issue without making
> pretty massive changes.
> 
> I'll note though that at the moment e2fsck doesn't have sophisticated
> extent tree recovery support; so if the extent tree is corrupt, it
> offers to zap the entire extent tree, instead of trying to fix up the
> extent tree.  That wasn't an issue because, absent bugs, the most
> likely case when the extent tree is corrupted is that there's not
> much you can do, so it wasn't worth adding code to handle these
> cases.
> 
> However, if a large number of users start using your FUSE server in
> production, then making sure the right thing happens when they suffer
> power failures starts becoming more important --- but it may not be
> trivial since libext2fs wasn't originally intended for that use case.
> I am glad that you implemented it, since it's a great way to get
> better testing for various corner cases.  But that's different from
> advertising that the FUSE server should be used in production use
> cases; in particular, we might need to figure out some kind of
> journalling system for FUSE, either using the ext4's internal journal,
> or some user-space journalling system.

I wasn't planning to advertise fuse2fs for production use.  But perhaps it
could cough out a few more warnings if you're not mounting ro, and/or default
to ro?

--D
> 
> 					- Ted

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 35/49] ext2fs: add readahead method to improve scanning
  2014-03-11  6:57 ` [PATCH 35/49] ext2fs: add readahead method to improve scanning Darrick J. Wong
@ 2014-03-17 22:07   ` Andreas Dilger
  0 siblings, 0 replies; 88+ messages in thread
From: Andreas Dilger @ 2014-03-17 22:07 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: tytso, linux-ext4, Andreas Dilger

On Mar 11, 2014, at 12:57 AM, Darrick J. Wong <darrick.wong@oracle.com> wrote:

> From: Andreas Dilger <adilger@whamcloud.com>
> 
> Add a readahead method for prefetching ranges of disk blocks.  This is
> useful for inode table scanning, and other large contiguous ranges of
> blocks, and may also prove useful for random block prefetch, since it
> will allow reordering of the IO without waiting synchronously for the
> reads to complete.
> 
> It is currently using the posix_fadvise(POSIX_FADV_WILLNEED)
> interface, as this proved most efficient during our testing.
> 
> [darrick.wong@oracle.com]
> Add a cache_release method for advising the pagecache to discard disk
> cache blocks.  Make the arguments to the readahead function take the
> same ULL values as the other IO functions, and return an appropriate
> error code when fadvise isn't available.

This already has my Signed-off-by: line, but I thought I'd chime in that
I also reviewed the later additions from Darrick for the cache release.
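
For readers who have not used the interface, the core of the readahead method is a single posix_fadvise() call on a byte range.  A minimal standalone sketch, assuming a plain file descriptor and a caller-supplied block size; the helper name and signature are hypothetical, not the libext2fs API:

```c
#define _XOPEN_SOURCE 600
#define _FILE_OFFSET_BITS 64
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>

/*
 * Hypothetical helper: hint to the kernel that `count` blocks starting
 * at `block` will be read soon.  Returns 0 on success or an errno value;
 * posix_fadvise() returns the error number directly instead of setting
 * errno.
 */
int demo_cache_readahead(int fd, unsigned long long block,
			 unsigned long long count, unsigned int block_size)
{
#ifdef POSIX_FADV_WILLNEED
	return posix_fadvise(fd, (off_t)block * block_size,
			     (off_t)count * block_size,
			     POSIX_FADV_WILLNEED);
#else
	(void)fd; (void)block; (void)count; (void)block_size;
	return ENOTSUP;
#endif
}
```

The call only queues the hint and does not wait for the data, which is what allows the IO to be reordered.  A cache-release variant would look the same with POSIX_FADV_DONTNEED.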

Cheers, Andreas

> Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
> lib/ext2fs/ext2_io.h    |   12 ++++++++++++
> lib/ext2fs/io_manager.c |   18 ++++++++++++++++++
> lib/ext2fs/unix_io.c    |   46 +++++++++++++++++++++++++++++++++++++++++++---
> 3 files changed, 73 insertions(+), 3 deletions(-)
> 
> 
> diff --git a/lib/ext2fs/ext2_io.h b/lib/ext2fs/ext2_io.h
> index 1894fb8..636f797 100644
> --- a/lib/ext2fs/ext2_io.h
> +++ b/lib/ext2fs/ext2_io.h
> @@ -90,6 +90,12 @@ struct struct_io_manager {
> 					int count, const void *data);
> 	errcode_t (*discard)(io_channel channel, unsigned long long block,
> 			     unsigned long long count);
> +	errcode_t (*cache_readahead)(io_channel channel,
> +				     unsigned long long block,
> +				     unsigned long long count);
> +	errcode_t (*cache_release)(io_channel channel,
> +				   unsigned long long block,
> +				   unsigned long long count);
> 	long	reserved[16];
> };
> 
> @@ -124,6 +130,12 @@ extern errcode_t io_channel_discard(io_channel channel,
> 				    unsigned long long count);
> extern errcode_t io_channel_alloc_buf(io_channel channel,
> 				      int count, void *ptr);
> +extern errcode_t io_channel_cache_readahead(io_channel io,
> +					    unsigned long long block,
> +					    unsigned long long count);
> +extern errcode_t io_channel_cache_release(io_channel io,
> +					  unsigned long long block,
> +					  unsigned long long count);
> 
> /* unix_io.c */
> extern io_manager unix_io_manager;
> diff --git a/lib/ext2fs/io_manager.c b/lib/ext2fs/io_manager.c
> index 34e4859..a1258c4 100644
> --- a/lib/ext2fs/io_manager.c
> +++ b/lib/ext2fs/io_manager.c
> @@ -128,3 +128,21 @@ errcode_t io_channel_alloc_buf(io_channel io, int count, void *ptr)
> 	else
> 		return ext2fs_get_mem(size, ptr);
> }
> +
> +errcode_t io_channel_cache_readahead(io_channel io, unsigned long long block,
> +				     unsigned long long count)
> +{
> +	if (!io->manager->cache_readahead)
> +		return EXT2_ET_OP_NOT_SUPPORTED;
> +
> +	return io->manager->cache_readahead(io, block, count);
> +}
> +
> +errcode_t io_channel_cache_release(io_channel io, unsigned long long block,
> +				   unsigned long long count)
> +{
> +	if (!io->manager->cache_release)
> +		return EXT2_ET_OP_NOT_SUPPORTED;
> +
> +	return io->manager->cache_release(io, block, count);
> +}
> diff --git a/lib/ext2fs/unix_io.c b/lib/ext2fs/unix_io.c
> index a818c13..a95e289 100644
> --- a/lib/ext2fs/unix_io.c
> +++ b/lib/ext2fs/unix_io.c
> @@ -15,6 +15,9 @@
>  * %End-Header%
>  */
> 
> +#define _XOPEN_SOURCE 600
> +#define _DARWIN_C_SOURCE
> +#define _FILE_OFFSET_BITS 64
> #define _LARGEFILE_SOURCE
> #define _LARGEFILE64_SOURCE
> #ifndef _GNU_SOURCE
> @@ -35,6 +38,9 @@
> #ifdef __linux__
> #include <sys/utsname.h>
> #endif
> +#if HAVE_SYS_TYPES_H
> +#include <sys/types.h>
> +#endif
> #ifdef HAVE_SYS_IOCTL_H
> #include <sys/ioctl.h>
> #endif
> @@ -44,9 +50,6 @@
> #if HAVE_SYS_STAT_H
> #include <sys/stat.h>
> #endif
> -#if HAVE_SYS_TYPES_H
> -#include <sys/types.h>
> -#endif
> #if HAVE_SYS_RESOURCE_H
> #include <sys/resource.h>
> #endif
> @@ -97,6 +100,7 @@ struct unix_private_data {
> #define IS_ALIGNED(n, align) ((((unsigned long) n) & \
> 			       ((unsigned long) ((align)-1))) == 0)
> 
> +
> static errcode_t unix_get_stats(io_channel channel, io_stats *stats)
> {
> 	errcode_t	retval = 0;
> @@ -810,6 +814,40 @@ static errcode_t unix_write_blk64(io_channel channel, unsigned long long block,
> #endif /* NO_IO_CACHE */
> }
> 
> +static errcode_t unix_cache_readahead(io_channel channel,
> +				      unsigned long long block,
> +				      unsigned long long count)
> +{
> +#ifdef POSIX_FADV_WILLNEED
> +	struct unix_private_data *data;
> +
> +	data = (struct unix_private_data *)channel->private_data;
> +	return posix_fadvise(data->dev,
> +			     (ext2_loff_t)block * channel->block_size,
> +			     (ext2_loff_t)count * channel->block_size,
> +			     POSIX_FADV_WILLNEED);
> +#else
> +	return EXT2_ET_OP_NOT_SUPPORTED;
> +#endif
> +}
> +
> +static errcode_t unix_cache_release(io_channel channel,
> +				    unsigned long long block,
> +				    unsigned long long count)
> +{
> +#ifdef POSIX_FADV_DONTNEED
> +	struct unix_private_data *data;
> +
> +	data = (struct unix_private_data *)channel->private_data;
> +	return posix_fadvise(data->dev,
> +			     (ext2_loff_t)block * channel->block_size,
> +			     (ext2_loff_t)count * channel->block_size,
> +			     POSIX_FADV_DONTNEED);
> +#else
> +	return EXT2_ET_OP_NOT_SUPPORTED;
> +#endif
> +}
> +
> static errcode_t unix_write_blk(io_channel channel, unsigned long block,
> 				int count, const void *buf)
> {
> @@ -961,6 +999,8 @@ static struct struct_io_manager struct_unix_manager = {
> 	unix_read_blk64,
> 	unix_write_blk64,
> 	unix_discard,
> +	unix_cache_readahead,
> +	unix_cache_release,
> };
> 
> io_manager unix_io_manager = &struct_unix_manager;
> 



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 37/49] e2fsck: read-ahead metadata during passes 1, 2, and 4
  2014-03-11  6:57 ` [PATCH 37/49] e2fsck: read-ahead metadata during passes 1, 2, and 4 Darrick J. Wong
@ 2014-03-17 23:10   ` Andreas Dilger
  2014-03-18  4:42     ` Darrick J. Wong
  0 siblings, 1 reply; 88+ messages in thread
From: Andreas Dilger @ 2014-03-17 23:10 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: tytso, linux-ext4

On Mar 11, 2014, at 12:57 AM, Darrick J. Wong <darrick.wong@oracle.com> wrote:

> e2fsck pass1 is modified to use the block group data prefetch function
> to try to fetch the inode tables into the pagecache before it is
> needed.  In order to avoid cache thrashing, we limit ourselves to
> prefetching at most half the available memory.

It looks like the prefetching is done in huge chunks, and not incrementally?
It makes more sense to have a steady amount of prefetch happening instead
of waiting for it to all be consumed before starting a new batch.  See in
e2fsck_pass1() below.

> pass2 is modified to use the dirblock prefetching function to prefetch
> the list of directory blocks that are assembled in pass1.  So long as
> we don't anticipate rehashing the dirs (pass 3a), we can release the
> dirblocks as soon as we're done checking them.
> 
> pass4 is modified to prefetch the block and inode bitmaps in
> anticipation of pass 5, because pass4 is entirely CPU bound.
> 
> In general, these mechanisms can halve fsck time, if the host system
> has sufficient memory and the storage system can provide a lot of
> IOPs.  SSDs and multi-spindle RAIDs see the most speedup; single disks
> experience a modest speedup, and single-spindle USB mass storage
> devices see hardly any benefit.
> 
> By default, readahead will try to fill half the physical memory in the
> system.  The -R option can be given to specify the amount of memory to
> use for readahead, or zero to disable it entirely; or an option can be
> given in e2fsck.conf.
> 
> 
> +static void *pass1_readahead(void *p)
> +{
> +	struct pass1ra_ctx *c = p;
> +	errcode_t err;
> +
> +	ext2fs_readahead(c->fs, EXT2_READA_ITABLE, c->group, c->ngroups);
> +	return NULL;
> +}
> +
> +static errcode_t initiate_readahead(e2fsck_t ctx, dgrp_t group, dgrp_t ngroups)
> +{
> +	struct pass1ra_ctx *ractx;
> +	errcode_t err;
> +
> +	err = ext2fs_get_mem(sizeof(*ractx), &ractx);
> +	if (err)
> +		return err;
> +
> +	ractx->fs = ctx->fs;
> +	ractx->group = group;
> +	ractx->ngroups = ngroups;
> +
> +	err = e2fsck_run_thread(&ctx->ra_thread, pass1_readahead,
> +				pass1_readahead_cleanup, ractx);
> +	if (err)
> +		ext2fs_free_mem(&ractx);
> +
> +	return err;
> +}
> +
>  void e2fsck_pass1(e2fsck_t ctx)
>  {
> 	int	i;
> @@ -611,10 +654,37 @@ void e2fsck_pass1(e2fsck_t ctx)
> 	int		busted_fs_time = 0;
> 	int		inode_size;
> 	int		failed_csum = 0;
> +	dgrp_t		grp;
> +	ext2_ino_t	ra_threshold = 0;
> +	dgrp_t		ra_groups = 0;
> +	errcode_t	err;
> 
> 	init_resource_track(&rtrack, ctx->fs->io);
> 	clear_problem_context(&pctx);
> 
> +	/* If we can do readahead, figure out how many groups to pull in. */
> +	if (!ext2fs_can_readahead(ctx->fs))
> +		ctx->readahead_mem_kb = 0;
> +	if (ctx->readahead_mem_kb) {
> +		ra_groups = ctx->readahead_mem_kb /
> +			    (fs->inode_blocks_per_group * fs->blocksize /
> +			     1024);
> +		if (ra_groups < 16)
> +			ra_groups = 0;

It probably always makes sense to prefetch one group if possible?
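
To put numbers on that: assuming 512 inode table blocks per group and 4 KiB blocks (example values, not taken from the patch), each group's inode table is 2048 KiB, so a 16 MiB budget covers only 8 groups and the "< 16" cutoff turns readahead off entirely.  A small sketch of the arithmetic in the hunk above, with a hypothetical helper name:

```c
/*
 * Hypothetical helper mirroring the ra_groups arithmetic in the quoted
 * hunk: how many groups' inode tables fit in the readahead budget.
 */
unsigned int demo_ra_groups(unsigned long long readahead_mem_kb,
			    unsigned int inode_blocks_per_group,
			    unsigned int blocksize,
			    unsigned int group_desc_count)
{
	unsigned long long kb_per_group =
		(unsigned long long)inode_blocks_per_group * blocksize / 1024;
	unsigned long long groups;

	if (kb_per_group == 0)
		return 0;
	groups = readahead_mem_kb / kb_per_group;
	if (groups < 16)		/* cutoff from the patch as posted */
		return 0;
	if (groups > group_desc_count)	/* never more than the fs has */
		groups = group_desc_count;
	return (unsigned int)groups;
}
```

Lowering the cutoff to 1 would be a one-line change in this sketch.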

> +		else if (ra_groups > fs->group_desc_count)
> +			ra_groups = fs->group_desc_count;
> +		if (ra_groups) {
> +			err = initiate_readahead(ctx, grp, ra_groups);

Looks like "grp" is used uninitialized here.  Should be "grp = 0" to start.

> +			if (err) {
> +				com_err(ctx->program_name, err, "%s",
> +					_("while starting pass1 readahead"));
> +				ra_groups = 0;
> +			}
> +			ra_threshold = ra_groups *
> +				       fs->super->s_inodes_per_group;

This is the threshold of the last inode to be prefetched.

> +		}
> +	}
> +
> 	if (!(ctx->options & E2F_OPT_PREEN))
> 		fix_problem(ctx, PR_1_PASS_HEADER, &pctx);
> 
> @@ -778,6 +848,19 @@ void e2fsck_pass1(e2fsck_t ctx)
> 			if (e2fsck_mmp_update(fs))
> 				fatal_error(ctx, 0);
> 		}
> +		if (ra_groups && ino > ra_threshold) {

This doesn't start prefetching again until the last inode is checked.
It probably makes sense to have a sliding window to start readahead
again once half of the memory has been consumed or so.  Otherwise,
the scanning will block here until the next inode table is read from
disk, instead of the readahead being started earlier and it is in RAM.

> +			grp = (ino - 1) / fs->super->s_inodes_per_group;
> +			ra_threshold = (grp + ra_groups) *
> +				       fs->super->s_inodes_per_group;

> +			err = initiate_readahead(ctx, grp, ra_groups);
> +			if (err == EAGAIN) {
> +				printf("Disabling slow readahead.\n");
> +				ra_groups = 0;

I see that EAGAIN comes from e2fsck_run_thread(), if there is still a
readahead thread running.  Does it make sense to stop readahead in
that case?  It would seem to me that if readahead is taking a long
time and the inode processing is catching up to it (i.e. IO bound)
then it is even more important to do readahead in that case.

Something like the following to readahead half of the inode tables once
half of them have been processed, and shrink the readahead window if the
readahead is being called too often:

	if (ra_groups != 0 && ino > ra_threshold - (ra_groups + 1) / 2 *
					fs->super->s_inodes_per_group) {
		if (ra_threshold < ino)
			ra_threshold = ino;
		grp = (ra_threshold - 1) / fs->super->s_inodes_per_group;
		err = initiate_readahead(ctx, grp, (ra_groups + 1) / 2);
		if (err == EAGAIN)
			ra_groups = (ra_groups + 1) / 2;
		else if (err)
			com_err(ctx->program_name, err, "%s",
				_("while starting pass1 readahead"));
		else
			ra_threshold += (ra_groups + 1) / 2 *
				fs->super->s_inodes_per_group;
	}
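
The same idea as a standalone simulation, in case it helps to see the window move.  This is a sketch and not a drop-in for pass1: it tracks the next unrequested group instead of an inode threshold, leaves out the EAGAIN shrinking, and all names are made up.

```c
#include <assert.h>

/* Hypothetical sliding-window state for inode table readahead. */
struct demo_ra_window {
	unsigned int next_group;	/* first group not yet requested */
	unsigned int window;		/* groups to keep requested ahead */
	unsigned int requests;		/* readahead requests issued so far */
};

/*
 * Called for every inode scanned.  Once no more than half a window of
 * requested groups remains (counting the group being scanned), request
 * another half window.  Returns the number of groups requested.
 */
unsigned int demo_ra_advance(struct demo_ra_window *w, unsigned long ino,
			     unsigned long inodes_per_group,
			     unsigned int group_count)
{
	unsigned int cur, half, n;

	assert(ino >= 1 && inodes_per_group > 0);
	if (w->window == 0 || w->next_group >= group_count)
		return 0;
	cur = (unsigned int)((ino - 1) / inodes_per_group);
	half = (w->window + 1) / 2;
	if (w->next_group > cur && w->next_group - cur > half)
		return 0;		/* still far enough ahead */
	if (w->next_group < cur)
		w->next_group = cur;	/* scanner overtook the readahead */
	n = half;
	if (n > group_count - w->next_group)
		n = group_count - w->next_group;
	w->next_group += n;
	w->requests++;
	return n;
}

/* Scan every inode once and report how many extra requests were made. */
unsigned int demo_ra_simulate(unsigned int window,
			      unsigned long inodes_per_group,
			      unsigned int group_count)
{
	struct demo_ra_window w = { 0, window, 0 };
	unsigned long ino, last = inodes_per_group * group_count;

	/* The initial request covers the first `window` groups. */
	w.next_group = window < group_count ? window : group_count;
	for (ino = 1; ino <= last; ino++)
		demo_ra_advance(&w, ino, inodes_per_group, group_count);
	return w.requests;
}
```

With a window of 4 groups, 100 inodes per group and 8 groups, the simulation issues two follow-up requests: groups 4-5 when the scan enters group 2, and groups 6-7 when it enters group 4.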

> +			} else if (err) {
> +				com_err(ctx->program_name, err, "%s",
> +					_("while starting pass1 readahead"));
> +			}
> +		}
> 		old_op = ehandler_operation(_("getting next inode from scan"));
> 		pctx.errcode = ext2fs_get_next_inode_full(scan, &ino,
> 							  inode, inode_size);
> diff --git a/e2fsck/unix.c b/e2fsck/unix.c
> index 80ebdb1..d6ef8c5 100644
> --- a/e2fsck/unix.c
> +++ b/e2fsck/unix.c
> @@ -74,7 +74,7 @@ static void usage(e2fsck_t ctx)
> 		_("Usage: %s [-panyrcdfvtDFV] [-b superblock] [-B blocksize]\n"
> 		"\t\t[-I inode_buffer_blocks] [-P process_inode_size]\n"
> 		"\t\t[-l|-L bad_blocks_file] [-C fd] [-j external_journal]\n"
> -		"\t\t[-E extended-options] device\n"),
> +		"\t\t[-E extended-options] [-R readahead_kb] device\n"),

Note that "-R" is only recently deprecated for raid options, why not make
this an option under "-E"?

> 		ctx->program_name);
> 
> 	fprintf(stderr, "%s", _("\nEmergency help:\n"
> @@ -90,6 +90,7 @@ static void usage(e2fsck_t ctx)
> 		" -j external_journal  Set location of the external journal\n"
> 		" -l bad_blocks_file   Add to badblocks list\n"
> 		" -L bad_blocks_file   Set badblocks list\n"
> +		" -R readahead_kb      Allow this much readahead.\n"


Cheers, Andreas

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 36/49] libext2fs: allow clients to read-ahead metadata
  2014-03-11  6:57 ` [PATCH 36/49] libext2fs: allow clients to read-ahead metadata Darrick J. Wong
@ 2014-03-17 23:11   ` Andreas Dilger
  0 siblings, 0 replies; 88+ messages in thread
From: Andreas Dilger @ 2014-03-17 23:11 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: tytso, linux-ext4

On Mar 11, 2014, at 12:57 AM, Darrick J. Wong <darrick.wong@oracle.com> wrote:
> This patch adds to libext2fs the ability to pre-fetch metadata
> into the page cache in the hopes of speeding up libext2fs' clients.
> There are two new library functions -- the first allows a client to
> readahead a list of blocks, and the second is a helper function that
> uses that first mechanism to load group data (bitmaps, inode tables).
> 
> e2fsck will employ both of these methods to speed itself up.

You can also add a Reviewed-by: Andreas Dilger <adilger@dilger.ca> on this.

> diff --git a/lib/ext2fs/readahead.c b/lib/ext2fs/readahead.c
> new file mode 100644
> index 0000000..ed6e555
> --- /dev/null
> +++ b/lib/ext2fs/readahead.c
> @@ -0,0 +1,188 @@
> +struct read_dblist {
> +	errcode_t err;
> +	blk64_t run_start;
> +	blk64_t run_len;
> +};
> +
> +static EXT2_QSORT_TYPE readahead_dir_block_cmp(const void *a, const void *b)
> +{
> +	const struct ext2_db_entry2 *db_a =
> +		(const struct ext2_db_entry2 *) a;
> +	const struct ext2_db_entry2 *db_b =
> +		(const struct ext2_db_entry2 *) b;

> +
> +	return (int) (db_a->blk - db_b->blk);
> +}
> +
> +static int readahead_dir_block(ext2_filsys fs, struct ext2_db_entry2 *db,
> +			       void *priv_data)
> +{
> +	errcode_t err = 0;
> +	struct read_dblist *pr = priv_data;
> +
> +	if (!pr->run_len || db->blk != pr->run_start + pr->run_len) {


It probably isn't necessary to check "!pr->run_len", since the only
case where this isn't entered on a new look is db->blk == 0, which is
always loaded when the filesystem is mounted anyway.
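
For reference, the run coalescing in isolation looks roughly like the sketch below.  It assumes a plain array of entries rather than a dblist, counts the runs instead of issuing readahead, and uses made-up names throughout.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a dblist entry used as a block range. */
struct demo_blk_entry {
	uint64_t blk;		/* first physical block of the range */
	uint64_t blockcnt;	/* number of blocks in the range */
};

static int demo_cmp(const void *a, const void *b)
{
	const struct demo_blk_entry *x = a;
	const struct demo_blk_entry *y = b;

	/* Compare rather than subtract, so 64-bit values cannot truncate. */
	return (x->blk > y->blk) - (x->blk < y->blk);
}

/*
 * Sort the entries by block number and count the contiguous runs, i.e.
 * how many readahead calls the coalescing loop would issue.
 */
size_t demo_count_runs(struct demo_blk_entry *e, size_t n)
{
	uint64_t run_start = 0, run_len = 0;
	size_t runs = 0, i;

	assert(e != NULL && n > 0);
	qsort(e, n, sizeof(*e), demo_cmp);
	for (i = 0; i < n; i++) {
		if (!run_len || e[i].blk != run_start + run_len) {
			if (run_len)
				runs++;		/* flush the previous run */
			run_start = e[i].blk;
			run_len = 0;
		}
		run_len += e[i].blockcnt;
	}
	if (run_len)
		runs++;				/* flush the final run */
	return runs;
}
```

Ranges starting at blocks 10 (2 blocks), 12 (3 blocks) and 15 (1 block) merge into one run, so adding a separate range at block 100 gives two runs in total.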

Cheers, Andreas

> +		if (pr->run_len) {
> +			pr->err = io_channel_cache_readahead(fs->io,
> +							     pr->run_start,
> +							     pr->run_len);
> +			dbg_printf("readahead start=%llu len=%llu err=%d\n",
> +				   pr->run_start, pr->run_len,
> +				   (int)pr->err);
> +		}
> +		pr->run_start = db->blk;
> +		pr->run_len = 0;

> +	}
> +	pr->run_len += db->blockcnt;
> +
> +	return pr->err ? DBLIST_ABORT : 0;
> +}
> +
> +errcode_t ext2fs_readahead_dblist(ext2_filsys fs, int flags,
> +				  ext2_dblist dblist)
> +{
> +	errcode_t err;
> +	struct read_dblist pr;
> +
> +	dbg_printf("%s: flags=0x%x\n", __func__, flags);
> +	if (flags)
> +		return EXT2_ET_INVALID_ARGUMENT;
> +
> +	ext2fs_dblist_sort2(dblist, readahead_dir_block_cmp);
> +
> +	memset(&pr, 0, sizeof(pr));
> +	err = ext2fs_dblist_iterate2(dblist, readahead_dir_block, &pr);
> +	if (pr.err)
> +		return pr.err;
> +	if (err)
> +		return err;
> +
> +	if (pr.run_len)
> +		err = io_channel_cache_readahead(fs->io, pr.run_start,
> +						 pr.run_len);
> +
> +	return err;
> +}
> +
> +errcode_t ext2fs_readahead(ext2_filsys fs, int flags, dgrp_t start,
> +			   dgrp_t ngroups)
> +{
> +	blk64_t		super, old_gdt, new_gdt;
> +	blk_t		blocks;
> +	dgrp_t		i;
> +	ext2_dblist	dblist;
> +	dgrp_t		end = start + ngroups;
> +	errcode_t	err = 0;
> +
> +	dbg_printf("%s: flags=0x%x start=%d groups=%d\n", __func__, flags,
> +		   start, ngroups);
> +	if (flags & ~EXT2_READA_ALL_FLAGS)
> +		return EXT2_ET_INVALID_ARGUMENT;
> +
> +	if (end > fs->group_desc_count)
> +		end = fs->group_desc_count;
> +
> +	if (flags == 0)
> +		return 0;
> +
> +	err = ext2fs_init_dblist(fs, &dblist);
> +	if (err)
> +		return err;
> +
> +	for (i = start; i < end; i++) {
> +		err = ext2fs_super_and_bgd_loc2(fs, i, &super, &old_gdt,
> +						&new_gdt, &blocks);
> +		if (err)
> +			break;
> +
> +		if (flags & EXT2_READA_SUPER) {
> +			err = ext2fs_add_dir_block2(dblist, 0, super, 0);
> +			if (err)
> +				break;
> +		}
> +
> +		if (flags & EXT2_READA_GDT) {
> +			if (old_gdt)
> +				err = ext2fs_add_dir_block2(dblist, 0, old_gdt,
> +							    blocks);
> +			else if (new_gdt)
> +				err = ext2fs_add_dir_block2(dblist, 0, new_gdt,
> +							    blocks);
> +			else
> +				err = 0;
> +			if (err)
> +				break;
> +		}
> +
> +		if ((flags & EXT2_READA_BBITMAP) &&
> +		    !ext2fs_bg_flags_test(fs, i, EXT2_BG_BLOCK_UNINIT) &&
> +		    ext2fs_bg_free_blocks_count(fs, i) <
> +				fs->super->s_blocks_per_group) {
> +			super = ext2fs_block_bitmap_loc(fs, i);
> +			err = ext2fs_add_dir_block2(dblist, 0, super, 1);
> +			if (err)
> +				break;
> +		}
> +
> +		if ((flags & EXT2_READA_IBITMAP) &&
> +		    !ext2fs_bg_flags_test(fs, i, EXT2_BG_INODE_UNINIT) &&
> +		    ext2fs_bg_free_inodes_count(fs, i) <
> +				fs->super->s_inodes_per_group) {
> +			super = ext2fs_inode_bitmap_loc(fs, i);
> +			err = ext2fs_add_dir_block2(dblist, 0, super, 1);
> +			if (err)
> +				break;
> +		}
> +
> +		if ((flags & EXT2_READA_ITABLE) &&
> +		    ext2fs_bg_free_inodes_count(fs, i) <
> +				fs->super->s_inodes_per_group) {
> +			super = ext2fs_inode_table_loc(fs, i);
> +			blocks = fs->inode_blocks_per_group -
> +				 (ext2fs_bg_itable_unused(fs, i) *
> +				  EXT2_INODE_SIZE(fs->super) / fs->blocksize);
> +			err = ext2fs_add_dir_block2(dblist, 0, super, blocks);
> +			if (err)
> +				break;
> +		}
> +	}
> +
> +	if (!err)
> +		err = ext2fs_readahead_dblist(fs, 0, dblist);
> +
> +	ext2fs_free_dblist(dblist);
> +	return err;
> +}
> +
> +int ext2fs_can_readahead(ext2_filsys fs)
> +{
> +	errcode_t err;
> +
> +	err = io_channel_cache_readahead(fs->io, 0, 1);
> +	dbg_printf("%s: supp=%d\n", __func__, err != EXT2_ET_OP_NOT_SUPPORTED);
> +	return err != EXT2_ET_OP_NOT_SUPPORTED;
> +}
> 



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 37/49] e2fsck: read-ahead metadata during passes 1, 2, and 4
  2014-03-17 23:10   ` Andreas Dilger
@ 2014-03-18  4:42     ` Darrick J. Wong
  2014-03-18  6:50       ` Darrick J. Wong
  0 siblings, 1 reply; 88+ messages in thread
From: Darrick J. Wong @ 2014-03-18  4:42 UTC (permalink / raw)
  To: Andreas Dilger; +Cc: tytso, linux-ext4

On Mon, Mar 17, 2014 at 05:10:22PM -0600, Andreas Dilger wrote:
> 
> On Mar 11, 2014, at 12:57 AM, Darrick J. Wong <darrick.wong@oracle.com> wrote:
> 
> > e2fsck pass1 is modified to use the block group data prefetch function
> > to try to fetch the inode tables into the pagecache before it is
> > needed.  In order to avoid cache thrashing, we limit ourselves to
> > prefetching at most half the available memory.
> 
> It looks like the prefetching is done in huge chunks, and not incrementally?
> It makes more sense to have a steady amount of prefetch happening instead
> of waiting for it to all be consumed before starting a new batch.  See in
> e2fsck_pass1() below.

I agree that prefetch ought not to wait until the entire inode table is
consumed.

> > pass2 is modified to use the dirblock prefetching function to prefetch
> > the list of directory blocks that are assembled in pass1.  So long as
> > we don't anticipate rehashing the dirs (pass 3a), we can release the
> > dirblocks as soon as we're done checking them.
> > 
> > pass4 is modified to prefetch the block and inode bitmaps in
> > anticipation of pass 5, because pass4 is entirely CPU bound.
> > 
> > In general, these mechanisms can halve fsck time, if the host system
> > has sufficient memory and the storage system can provide a lot of
> > IOPs.  SSDs and multi-spindle RAIDs see the most speedup; single disks
> > experience a modest speedup, and single-spindle USB mass storage
> > devices see hardly any benefit.
> > 
> > By default, readahead will try to fill half the physical memory in the
> > system.  The -R option can be given to specify the amount of memory to
> > use for readahead, or zero to disable it entirely; or an option can be
> > given in e2fsck.conf.
> > 
> > 
> > +static void *pass1_readahead(void *p)
> > +{
> > +	struct pass1ra_ctx *c = p;
> > +	errcode_t err;
> > +
> > +	ext2fs_readahead(c->fs, EXT2_READA_ITABLE, c->group, c->ngroups);
> > +	return NULL;
> > +}
> > +
> > +static errcode_t initiate_readahead(e2fsck_t ctx, dgrp_t group, dgrp_t ngroups)
> > +{
> > +	struct pass1ra_ctx *ractx;
> > +	errcode_t err;
> > +
> > +	err = ext2fs_get_mem(sizeof(*ractx), &ractx);
> > +	if (err)
> > +		return err;
> > +
> > +	ractx->fs = ctx->fs;
> > +	ractx->group = group;
> > +	ractx->ngroups = ngroups;
> > +
> > +	err = e2fsck_run_thread(&ctx->ra_thread, pass1_readahead,
> > +				pass1_readahead_cleanup, ractx);
> > +	if (err)
> > +		ext2fs_free_mem(&ractx);
> > +
> > +	return err;
> > +}
> > +
> >  void e2fsck_pass1(e2fsck_t ctx)
> >  {
> > 	int	i;
> > @@ -611,10 +654,37 @@ void e2fsck_pass1(e2fsck_t ctx)
> > 	int		busted_fs_time = 0;
> > 	int		inode_size;
> > 	int		failed_csum = 0;
> > +	dgrp_t		grp;
> > +	ext2_ino_t	ra_threshold = 0;
> > +	dgrp_t		ra_groups = 0;
> > +	errcode_t	err;
> > 
> > 	init_resource_track(&rtrack, ctx->fs->io);
> > 	clear_problem_context(&pctx);
> > 
> > +	/* If we can do readahead, figure out how many groups to pull in. */
> > +	if (!ext2fs_can_readahead(ctx->fs))
> > +		ctx->readahead_mem_kb = 0;
> > +	if (ctx->readahead_mem_kb) {
> > +		ra_groups = ctx->readahead_mem_kb /
> > +			    (fs->inode_blocks_per_group * fs->blocksize /
> > +			     1024);
> > +		if (ra_groups < 16)
> > +			ra_groups = 0;
> 
> It probably always makes sense to prefetch one group if possible?

I was intending to skip pass1 RA if there wasn't a lot of memory around.  Not
that I did a lot of work to figure out if < 16 groups really was a "lowmem"
situation.
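
For reference, the sizing arithmetic in the hunk above can be pulled out into a
standalone helper.  This is only a sketch: ra_groups_for_budget() and its
min_groups parameter are names made up for illustration and are not part of
the patch; the 16-group cutoff is passed in rather than hardcoded.

```c
/*
 * Hypothetical helper, not part of the patch: how many block groups' worth
 * of inode tables fit in a readahead budget of mem_kb kilobytes.
 * Returns 0 (no readahead) if fewer than min_groups would fit.
 */
static unsigned int ra_groups_for_budget(unsigned long long mem_kb,
					 unsigned int inode_blocks_per_group,
					 unsigned int blocksize,
					 unsigned int group_desc_count,
					 unsigned int min_groups)
{
	unsigned long long itable_kb;
	unsigned long long groups;

	/* Size of one group's inode table, in KiB. */
	itable_kb = (unsigned long long)inode_blocks_per_group *
		    blocksize / 1024;
	if (itable_kb == 0)
		return 0;

	groups = mem_kb / itable_kb;
	if (groups < min_groups)
		return 0;	/* treated as "low memory": skip readahead */
	if (groups > group_desc_count)
		groups = group_desc_count;
	return (unsigned int)groups;
}
```

With 512 inode table blocks of 4096 bytes per group (2048 KiB per group), a
64 MiB budget gives 32 groups, while a 16 MiB budget gives 8 groups, which is
below the cutoff of 16 and so disables pass1 readahead.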

> > +		else if (ra_groups > fs->group_desc_count)
> > +			ra_groups = fs->group_desc_count;
> > +		if (ra_groups) {
> > +			err = initiate_readahead(ctx, grp, ra_groups);
> 
> Looks like "grp" is used uninitialized here.  Should be "grp = 0" to start.

Oops, good catch.

> > +			if (err) {
> > +				com_err(ctx->program_name, err, "%s",
> > +					_("while starting pass1 readahead"));
> > +				ra_groups = 0;
> > +			}
> > +			ra_threshold = ra_groups *
> > +				       fs->super->s_inodes_per_group;
> 
> This is the threshold of the last inode to be prefetched.

Yes.

> > +		}
> > +	}
> > +
> > 	if (!(ctx->options & E2F_OPT_PREEN))
> > 		fix_problem(ctx, PR_1_PASS_HEADER, &pctx);
> > 
> > @@ -778,6 +848,19 @@ void e2fsck_pass1(e2fsck_t ctx)
> > 			if (e2fsck_mmp_update(fs))
> > 				fatal_error(ctx, 0);
> > 		}
> > +		if (ra_groups && ino > ra_threshold) {
> 
> This doesn't start prefetching again until the last inode is checked.
> It probably makes sense to have a sliding window to start readahead
> again once half of the memory has been consumed or so.  Otherwise,
> the scanning will block here until the next inode table is read from
> disk, instead of the readahead being started earlier and it is in RAM.

You're right, it would be even faster if ra_threshold were to start RA a couple
of block groups *before* we run out of prefetched data.
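
Roughly, the early trigger could be computed like this.  It is a sketch under
my own assumptions: early_ra_trigger(), group_of_inode() and lead_groups are
hypothetical names, and "a couple" of groups is left as a parameter.

```c
/* Block group that holds inode number ino (inode numbers start at 1). */
static unsigned int group_of_inode(unsigned int ino,
				   unsigned int inodes_per_group)
{
	return (ino - 1) / inodes_per_group;
}

/*
 * Hypothetical helper: given a readahead window of ra_groups groups that
 * starts at group grp, return the inode number past which the next
 * readahead should be started, lead_groups groups before the window is
 * fully consumed.
 */
static unsigned int early_ra_trigger(unsigned int grp,
				     unsigned int ra_groups,
				     unsigned int lead_groups,
				     unsigned int inodes_per_group)
{
	if (ra_groups == 0)
		return 0;
	/* Always let at least one group be consumed before re-arming. */
	if (lead_groups >= ra_groups)
		lead_groups = ra_groups - 1;
	return (grp + ra_groups - lead_groups) * inodes_per_group;
}
```

For a 16-group window starting at group 0 with 8192 inodes per group and a
lead of 2 groups, the scan would kick off the next readahead once ino passes
114688, instead of waiting until 131072.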

> > +			grp = (ino - 1) / fs->super->s_inodes_per_group;
> > +			ra_threshold = (grp + ra_groups) *
> > +				       fs->super->s_inodes_per_group;
> 
> > +			err = initiate_readahead(ctx, grp, ra_groups);
> > +			if (err == EAGAIN) {
> > +				printf("Disabling slow readahead.\n");
> > +				ra_groups = 0;
> 
> I see that EAGAIN comes from e2fsck_run_thread(), if there is still a
> readahead thread running.  Does it make sense to stop readahead in
> that case?  It would seem to me that if readahead is taking a long
> time and the inode processing is catching up to it (i.e. IO bound)
> then it is even more important to do readahead in that case.

This is tricky -- POSIX_FADV_WILLNEED starts a non-blocking readahead, so there
really isn't any good way to tell if the inode checker has caught up to RA.
Here I'm interpreting "RA thread still running" as a warning that the inode
checker will soon be ahead of the RA, so we might as well stop the RA.
However, there still isn't really a good way to find out exactly where RA
is.

> Something like the following to readahead half of the inode tables once
> half of them have been processed, and shrink the readahead window if the
> readahead is being called too often:

Hmm.  I will give this a shot and report back; this seems like it ought to
produce a better result than "two before" as I suggested above.

> 	if (ra_groups != 0 && ino > ra_threshold - (ra_groups + 1) / 2 *
> 					fs->super->s_inodes_per_group) {
> 		if (ra_threshold < ino)
> 			ra_threshold = ino;
> 		grp = (ra_threshold - 1) / fs->super->s_inodes_per_group;
> 		err = initiate_readahead(ctx, grp, (ra_groups + 1) / 2);
> 		if (err == EAGAIN)
> 			ra_groups = (ra_groups + 1) / 2;
> 		else if (err)
> 			com_err(ctx->program_name, err, "%s",
> 				_("while starting pass1 readahead"));
> 		else
> 			ra_threshold += (ra_groups + 1) / 2 *
> 				fs->super->s_inodes_per_group;
> 	}
> 
> > +			} else if (err) {
> > +				com_err(ctx->program_name, err, "%s",
> > +					_("while starting pass1 readahead"));
> > +			}
> > +		}
> > 		old_op = ehandler_operation(_("getting next inode from scan"));
> > 		pctx.errcode = ext2fs_get_next_inode_full(scan, &ino,
> > 							  inode, inode_size);
> > diff --git a/e2fsck/unix.c b/e2fsck/unix.c
> > index 80ebdb1..d6ef8c5 100644
> > --- a/e2fsck/unix.c
> > +++ b/e2fsck/unix.c
> > @@ -74,7 +74,7 @@ static void usage(e2fsck_t ctx)
> > 		_("Usage: %s [-panyrcdfvtDFV] [-b superblock] [-B blocksize]\n"
> > 		"\t\t[-I inode_buffer_blocks] [-P process_inode_size]\n"
> > 		"\t\t[-l|-L bad_blocks_file] [-C fd] [-j external_journal]\n"
> > -		"\t\t[-E extended-options] device\n"),
> > +		"\t\t[-E extended-options] [-R readahead_kb] device\n"),
> 
> Note that "-R" is only recently deprecated for raid options, why not make
> this an option under "-E"?

Ok.

--D
> 
> > 		ctx->program_name);
> > 
> > 	fprintf(stderr, "%s", _("\nEmergency help:\n"
> > @@ -90,6 +90,7 @@ static void usage(e2fsck_t ctx)
> > 		" -j external_journal  Set location of the external journal\n"
> > 		" -l bad_blocks_file   Add to badblocks list\n"
> > 		" -L bad_blocks_file   Set badblocks list\n"
> > +		" -R readahead_kb      Allow this much readahead.\n"
> 
> 
> Cheers, Andreas
> 
> 
> 
> 
> 




* Re: [PATCH 37/49] e2fsck: read-ahead metadata during passes 1, 2, and 4
  2014-03-18  4:42     ` Darrick J. Wong
@ 2014-03-18  6:50       ` Darrick J. Wong
  0 siblings, 0 replies; 88+ messages in thread
From: Darrick J. Wong @ 2014-03-18  6:50 UTC (permalink / raw)
  To: Andreas Dilger; +Cc: tytso, linux-ext4

On Mon, Mar 17, 2014 at 09:42:31PM -0700, Darrick J. Wong wrote:
> On Mon, Mar 17, 2014 at 05:10:22PM -0600, Andreas Dilger wrote:
> > 
> > On Mar 11, 2014, at 12:57 AM, Darrick J. Wong <darrick.wong@oracle.com> wrote:
> > 
> > > e2fsck pass1 is modified to use the block group data prefetch function
> > > to try to fetch the inode tables into the pagecache before it is
> > > needed.  In order to avoid cache thrashing, we limit ourselves to
> > > prefetching at most half the available memory.
> > 
> > It looks like the prefetching is done in huge chunks, and not incrementally?
> > It makes more sense to have a steady amount of prefetch happening instead
> > of waiting for it to all be consumed before starting a new batch.  See in
> > e2fsck_pass1() below.
> 
> I agree that prefetch ought not to wait until the entire inode table is
> consumed.
> 
> > > pass2 is modified to use the dirblock prefetching function to prefetch
> > > the list of directory blocks that are assembled in pass1.  So long as
> > > we don't anticipate rehashing the dirs (pass 3a), we can release the
> > > dirblocks as soon as we're done checking them.
> > > 
> > > pass4 is modified to prefetch the block and inode bitmaps in
> > > anticipation of pass 5, because pass4 is entirely CPU bound.
> > > 
> > > In general, these mechanisms can halve fsck time, if the host system
> > > has sufficient memory and the storage system can provide a lot of
> > > IOPs.  SSDs and multi-spindle RAIDs see the most speedup; single disks
> > > experience a modest speedup, and single-spindle USB mass storage
> > > devices see hardly any benefit.
> > > 
> > > By default, readahead will try to fill half the physical memory in the
> > > system.  The -R option can be given to specify the amount of memory to
> > > use for readahead, or zero to disable it entirely; or an option can be
> > > given in e2fsck.conf.
> > > 
> > > 
> > > +static void *pass1_readahead(void *p)
> > > +{
> > > +	struct pass1ra_ctx *c = p;
> > > +	errcode_t err;
> > > +
> > > +	ext2fs_readahead(c->fs, EXT2_READA_ITABLE, c->group, c->ngroups);
> > > +	return NULL;
> > > +}
> > > +
> > > +static errcode_t initiate_readahead(e2fsck_t ctx, dgrp_t group, dgrp_t ngroups)
> > > +{
> > > +	struct pass1ra_ctx *ractx;
> > > +	errcode_t err;
> > > +
> > > +	err = ext2fs_get_mem(sizeof(*ractx), &ractx);
> > > +	if (err)
> > > +		return err;
> > > +
> > > +	ractx->fs = ctx->fs;
> > > +	ractx->group = group;
> > > +	ractx->ngroups = ngroups;
> > > +
> > > +	err = e2fsck_run_thread(&ctx->ra_thread, pass1_readahead,
> > > +				pass1_readahead_cleanup, ractx);
> > > +	if (err)
> > > +		ext2fs_free_mem(&ractx);
> > > +
> > > +	return err;
> > > +}
> > > +
> > >  void e2fsck_pass1(e2fsck_t ctx)
> > >  {
> > > 	int	i;
> > > @@ -611,10 +654,37 @@ void e2fsck_pass1(e2fsck_t ctx)
> > > 	int		busted_fs_time = 0;
> > > 	int		inode_size;
> > > 	int		failed_csum = 0;
> > > +	dgrp_t		grp;
> > > +	ext2_ino_t	ra_threshold = 0;
> > > +	dgrp_t		ra_groups = 0;
> > > +	errcode_t	err;
> > > 
> > > 	init_resource_track(&rtrack, ctx->fs->io);
> > > 	clear_problem_context(&pctx);
> > > 
> > > +	/* If we can do readahead, figure out how many groups to pull in. */
> > > +	if (!ext2fs_can_readahead(ctx->fs))
> > > +		ctx->readahead_mem_kb = 0;
> > > +	if (ctx->readahead_mem_kb) {
> > > +		ra_groups = ctx->readahead_mem_kb /
> > > +			    (fs->inode_blocks_per_group * fs->blocksize /
> > > +			     1024);
> > > +		if (ra_groups < 16)
> > > +			ra_groups = 0;
> > 
> > It probably always makes sense to prefetch one group if possible?
> 
> I was intending to skip pass1 RA if there wasn't a lot of memory around.  Not
> that I did a lot of work to figure out if < 16 groups really was a "lowmem"
> situation.
> 
> > > +		else if (ra_groups > fs->group_desc_count)
> > > +			ra_groups = fs->group_desc_count;
> > > +		if (ra_groups) {
> > > +			err = initiate_readahead(ctx, grp, ra_groups);
> > 
> > Looks like "grp" is used uninitialized here.  Should be "grp = 0" to start.
> 
> Oops, good catch.
> 
> > > +			if (err) {
> > > +				com_err(ctx->program_name, err, "%s",
> > > +					_("while starting pass1 readahead"));
> > > +				ra_groups = 0;
> > > +			}
> > > +			ra_threshold = ra_groups *
> > > +				       fs->super->s_inodes_per_group;
> > 
> > This is the threshold of the last inode to be prefetched.
> 
> Yes.
> 
> > > +		}
> > > +	}
> > > +
> > > 	if (!(ctx->options & E2F_OPT_PREEN))
> > > 		fix_problem(ctx, PR_1_PASS_HEADER, &pctx);
> > > 
> > > @@ -778,6 +848,19 @@ void e2fsck_pass1(e2fsck_t ctx)
> > > 			if (e2fsck_mmp_update(fs))
> > > 				fatal_error(ctx, 0);
> > > 		}
> > > +		if (ra_groups && ino > ra_threshold) {
> > 
> > This doesn't start prefetching again until the last inode is checked.
> > It probably makes sense to have a sliding window to start readahead
> > again once half of the memory has been consumed or so.  Otherwise,
> > the scanning will block here until the next inode table is read from
> > disk, instead of the readahead being started earlier and it is in RAM.
> 
> You're right, it would be even faster if ra_threshold were to start RA a couple
> of block groups *before* we run out of prefetched data.
> 
> > > +			grp = (ino - 1) / fs->super->s_inodes_per_group;
> > > +			ra_threshold = (grp + ra_groups) *
> > > +				       fs->super->s_inodes_per_group;
> > 
> > > +			err = initiate_readahead(ctx, grp, ra_groups);
> > > +			if (err == EAGAIN) {
> > > +				printf("Disabling slow readahead.\n");
> > > +				ra_groups = 0;
> > 
> > I see that EAGAIN comes from e2fsck_run_thread(), if there is still a
> > readahead thread running.  Does it make sense to stop readahead in
> > that case?  It would seem to me that if readahead is taking a long
> > time and the inode processing is catching up to it (i.e. IO bound)
> > then it is even more important to do readahead in that case.
> 
> This is tricky -- POSIX_FADV_WILLNEED starts a non-blocking readahead, so there
> really isn't any good way to tell if the inode checker has caught up to RA.
> Here I'm interpreting "RA thread still running" as a warning that the inode
> checker will soon be ahead of the RA, so we might as well stop the RA.
> However, there still isn't really a good way to find out exactly where RA
> is.
> 
> > Something like the following to readahead half of the inode tables once
> > half of them have been processed, and shrink the readahead window if the
> > readahead is being called too often:
> 
> Hmm.  I will give this a shot and report back; this seems like it ought to
> produce a better result than "two before" as I suggested above.
> 
> > 	if (ra_groups != 0 && ino > ra_threshold - (ra_groups + 1) / 2 *
> > 					fs->super->s_inodes_per_group) {
> > 		if (ra_threshold < ino)
> > 			ra_threshold = ino;
> > 		grp = (ra_threshold - 1) / fs->super->s_inodes_per_group;
> > 		err = initiate_readahead(ctx, grp, (ra_groups + 1) / 2);
> > 		if (err == EAGAIN)
> > 			ra_groups = (ra_groups + 1) / 2;
> > 		else if (err)
> > 			com_err(ctx->program_name, err, "%s",
> > 				_("while starting pass1 readahead"));
> > 		else
> > 			ra_threshold += (ra_groups + 1) / 2 *
> > 				fs->super->s_inodes_per_group;
> > 	}

Now that I've thought about this a little harder, even this isn't quite
sufficient -- since the inode scan skips inode_uninit blockgroups, we have to
figure out which group our new ra_threshold inode is in and scan backwards
through the groups until we find a bg that isn't inode_uninit.  If we don't do
this, the scan will skip right past our ra_threshold, which means that RA
starts late or possibly even after we've started scanning inodes from the group
we're RAing.
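
The backwards scan would look something like the sketch below.  To keep it
self-contained it works on a plain array of per-group flags rather than on
the real group descriptors (in e2fsck the flag would presumably come from
something like ext2fs_bg_flags_test() with EXT2_BG_INODE_UNINIT);
nearest_scanned_group() is a made-up name.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: starting at group grp and walking backwards, find
 * the nearest group that the inode scan will actually visit, i.e. one
 * whose inode_uninit flag is clear.
 *
 * Returns the group index, or -1 if grp and every group before it are
 * inode_uninit (or if grp is out of range).
 */
static long nearest_scanned_group(const bool *inode_uninit, size_t ngroups,
				  size_t grp)
{
	size_t g;

	if (grp >= ngroups)
		return -1;

	/* g runs from grp + 1 down to 1 so the unsigned index never wraps. */
	for (g = grp + 1; g > 0; g--) {
		if (!inode_uninit[g - 1])
			return (long)(g - 1);
	}
	return -1;
}
```

The threshold would then be placed in the returned group, so the scan is
guaranteed to pass through it and trigger the next readahead.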

That said, even after doing that, I don't see much more of a speedup.

--D

> > 
> > > +			} else if (err) {
> > > +				com_err(ctx->program_name, err, "%s",
> > > +					_("while starting pass1 readahead"));
> > > +			}
> > > +		}
> > > 		old_op = ehandler_operation(_("getting next inode from scan"));
> > > 		pctx.errcode = ext2fs_get_next_inode_full(scan, &ino,
> > > 							  inode, inode_size);
> > > diff --git a/e2fsck/unix.c b/e2fsck/unix.c
> > > index 80ebdb1..d6ef8c5 100644
> > > --- a/e2fsck/unix.c
> > > +++ b/e2fsck/unix.c
> > > @@ -74,7 +74,7 @@ static void usage(e2fsck_t ctx)
> > > 		_("Usage: %s [-panyrcdfvtDFV] [-b superblock] [-B blocksize]\n"
> > > 		"\t\t[-I inode_buffer_blocks] [-P process_inode_size]\n"
> > > 		"\t\t[-l|-L bad_blocks_file] [-C fd] [-j external_journal]\n"
> > > -		"\t\t[-E extended-options] device\n"),
> > > +		"\t\t[-E extended-options] [-R readahead_kb] device\n"),
> > 
> > Note that "-R" is only recently deprecated for raid options, why not make
> > this an option under "-E"?
> 
> Ok.
> 
> --D
> > 
> > > 		ctx->program_name);
> > > 
> > > 	fprintf(stderr, "%s", _("\nEmergency help:\n"
> > > @@ -90,6 +90,7 @@ static void usage(e2fsck_t ctx)
> > > 		" -j external_journal  Set location of the external journal\n"
> > > 		" -l bad_blocks_file   Add to badblocks list\n"
> > > 		" -L bad_blocks_file   Set badblocks list\n"
> > > +		" -R readahead_kb      Allow this much readahead.\n"
> > 
> > 
> > Cheers, Andreas
> > 
> > 
> > 
> > 
> > 
> 
> 


end of thread, other threads:[~2014-03-18  6:50 UTC | newest]

Thread overview: 88+ messages
2014-03-11  6:53 [PATCH 00/49] e2fsprogs patchbomb 3/14 Darrick J. Wong
2014-03-11  6:54 ` [PATCH 01/49] create_inode: clean up return mess in do_write_internal Darrick J. Wong
2014-03-11 20:30   ` Andreas Dilger
2014-03-11 20:41     ` Darrick J. Wong
2014-03-11 21:08       ` Theodore Ts'o
2014-03-12  3:24         ` Theodore Ts'o
2014-03-11  6:54 ` [PATCH 02/49] create_inode: minor cleanups Darrick J. Wong
2014-03-11 20:31   ` Andreas Dilger
2014-03-12  3:25     ` Theodore Ts'o
2014-03-12  3:27     ` Theodore Ts'o
2014-03-11  6:54 ` [PATCH 03/49] create_inode: whitespace fixes Darrick J. Wong
2014-03-12  3:27   ` Theodore Ts'o
2014-03-11  6:54 ` [PATCH 04/49] create_inode: move debugfs internal state back to debugfs Darrick J. Wong
2014-03-12  3:31   ` Theodore Ts'o
2014-03-11  6:54 ` [PATCH 05/49] create_inode: handle hard link inum mappings per populate_fs invocation Darrick J. Wong
2014-03-12  3:46   ` Theodore Ts'o
2014-03-11  6:54 ` [PATCH 06/49] libext2fs: support modifying arbitrary extended attributes (v5) Darrick J. Wong
2014-03-12  3:51   ` Theodore Ts'o
2014-03-11  6:54 ` [PATCH 07/49] debugfs: create commands to edit extended attributes Darrick J. Wong
2014-03-12  3:51   ` Theodore Ts'o
2014-03-11  6:54 ` [PATCH 08/49] e2fsck: don't rehash inline directories Darrick J. Wong
2014-03-13  3:52   ` Theodore Ts'o
2014-03-13  5:38     ` Darrick J. Wong
2014-03-13 12:13       ` Theodore Ts'o
2014-03-11  6:54 ` [PATCH 09/49] libext2fs: don't fail when doing a strict rewrite of inline data Darrick J. Wong
2014-03-14 13:19   ` Theodore Ts'o
2014-03-11  6:55 ` [PATCH 10/49] libext2fs: fix iblocks correctly when expanding an inline_data file Darrick J. Wong
2014-03-12 16:38   ` Andreas Dilger
2014-03-12 17:01     ` Darrick J. Wong
2014-03-14 13:25       ` Theodore Ts'o
2014-03-11  6:55 ` [PATCH 11/49] e2fsck: zero errcode when checking inline data blocks Darrick J. Wong
2014-03-14 13:26   ` Theodore Ts'o
2014-03-11  6:55 ` [PATCH 12/49] libext2fs: during inlinedata expand, don't corrupt inode Darrick J. Wong
2014-03-14 13:29   ` Theodore Ts'o
2014-03-11  6:55 ` [PATCH 13/49] libext2fs: repair side effects when iterating dirents in inline dirs Darrick J. Wong
2014-03-14 13:30   ` Theodore Ts'o
2014-03-11  6:55 ` [PATCH 14/49] resize2fs: add inline dirs for remapping Darrick J. Wong
2014-03-14 13:31   ` Theodore Ts'o
2014-03-11  6:55 ` [PATCH 15/49] all: Introduce cppcheck static checking for make C=1 Darrick J. Wong
2014-03-14 13:33   ` Theodore Ts'o
2014-03-11  6:55 ` [PATCH 16/49] misc: cppcheck cleanups Darrick J. Wong
2014-03-14 13:34   ` Theodore Ts'o
2014-03-11  6:55 ` [PATCH 17/49] libext2fs: fix 64bit overflow in ext2fs_block_alloc_stats_range Darrick J. Wong
2014-03-14 13:35   ` Theodore Ts'o
2014-03-11  6:55 ` [PATCH 18/49] misc: fix header complaints and resource leaks in e2fsprogs Darrick J. Wong
2014-03-14 13:39   ` Theodore Ts'o
2014-03-14 13:53   ` Theodore Ts'o
2014-03-14 19:23     ` Darrick J. Wong
2014-03-11  6:55 ` [PATCH 19/49] libext2fs: fix memory leak when drastically shrinking extent tree depth Darrick J. Wong
2014-03-14 13:56   ` Theodore Ts'o
2014-03-11  6:56 ` [PATCH 20/49] libext2fs: fix parents when modifying extents Darrick J. Wong
2014-03-14 14:01   ` Theodore Ts'o
2014-03-14 20:13     ` Darrick J. Wong
2014-03-15 15:46       ` Theodore Ts'o
2014-03-17 16:59         ` Darrick J. Wong
2014-03-11  6:56 ` [PATCH 21/49] e2fsck: print runs of duplicate blocks instead of all of them Darrick J. Wong
2014-03-15 16:19   ` Theodore Ts'o
2014-03-11  6:56 ` [PATCH 22/49] e2fsck: verify checksums after checking everything else Darrick J. Wong
2014-03-11  6:56 ` [PATCH 23/49] e2fsck: fix the extended attribute checksum error message Darrick J. Wong
2014-03-11  6:56 ` [PATCH 24/49] e2fsck: insert a missing dirent tail for checksums if possible Darrick J. Wong
2014-03-11  6:56 ` [PATCH 25/49] e2fsck: write dir blocks after new inode when reconstructing root/lost+found Darrick J. Wong
2014-03-11  6:56 ` [PATCH 26/49] tests: add test for corrupted checksummed root directory block Darrick J. Wong
2014-03-11  6:56 ` [PATCH 27/49] dumpe2fs: add switch to disable checksum verification Darrick J. Wong
2014-03-11  6:56 ` [PATCH 28/49] mke2fs: set block_validity as a default mount option Darrick J. Wong
2014-03-11  6:57 ` [PATCH 29/49] libext2fs: support allocating uninit blocks in bmap2() Darrick J. Wong
2014-03-11  6:57 ` [PATCH 30/49] libext2fs: file IO routines should handle uninit blocks Darrick J. Wong
2014-03-11  6:57 ` [PATCH 31/49] resize2fs: convert fs to and from 64bit mode Darrick J. Wong
2014-03-11  6:57 ` [PATCH 32/49] resize2fs: when toggling 64bit, don't free in-use bg data clusters Darrick J. Wong
2014-03-11  6:57 ` [PATCH 33/49] resize2fs: adjust reserved_gdt_blocks when changing group descriptor size Darrick J. Wong
2014-03-11  6:57 ` [PATCH 34/49] libext2fs: have UNIX IO manager use pread/pwrite Darrick J. Wong
2014-03-11  6:57 ` [PATCH 35/49] ext2fs: add readahead method to improve scanning Darrick J. Wong
2014-03-17 22:07   ` Andreas Dilger
2014-03-11  6:57 ` [PATCH 36/49] libext2fs: allow clients to read-ahead metadata Darrick J. Wong
2014-03-17 23:11   ` Andreas Dilger
2014-03-11  6:57 ` [PATCH 37/49] e2fsck: read-ahead metadata during passes 1, 2, and 4 Darrick J. Wong
2014-03-17 23:10   ` Andreas Dilger
2014-03-18  4:42     ` Darrick J. Wong
2014-03-18  6:50       ` Darrick J. Wong
2014-03-11  6:58 ` [PATCH 38/49] libext2fs: when appending to a file, don't split an index block in equal halves Darrick J. Wong
2014-03-11  6:58 ` [PATCH 39/49] libext2fs: find inode goal when allocating blocks Darrick J. Wong
2014-03-11  6:58 ` [PATCH 40/49] libext2fs: find a range of empty blocks Darrick J. Wong
2014-03-11  6:58 ` [PATCH 41/49] libext2fs: provide a function to set inode size Darrick J. Wong
2014-03-11  6:58 ` [PATCH 42/49] libext2fs: implement fallocate Darrick J. Wong
2014-03-11  6:58 ` [PATCH 44/49] fuse2fs: translate ACL structures Darrick J. Wong
2014-03-11  6:58 ` [PATCH 45/49] fuse2fs: handle 64-bit dates correctly Darrick J. Wong
2014-03-11  6:58 ` [PATCH 46/49] fuse2fs: implement fallocate Darrick J. Wong
2014-03-11  6:59 ` [PATCH 48/49] tests: enable using fuse2fs with metadata checksum test Darrick J. Wong
2014-03-11  6:59 ` [PATCH 49/49] tests: test date handling Darrick J. Wong
