* [PATCH v2 00/25] e2fsprogs patchbomb 10/2013
@ 2013-10-18  4:48 Darrick J. Wong
  2013-10-18  4:49 ` [PATCH 01/25] libext2fs: stop iterating dirents when done linking Darrick J. Wong
                   ` (25 more replies)
  0 siblings, 26 replies; 73+ messages in thread
From: Darrick J. Wong @ 2013-10-18  4:48 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

Well, here we go again.  This is the same patchbomb from a couple of
weeks ago, minus the patches that Ted has already accepted, plus
several fixes to resize2fs that weren't ready back then, and a few
other fixes that migrated into the other patches.  This series is
against -next.

Ted, since you've accepted patches into -pu, do you want me to send
patches against -pu as well?  Or put more bluntly, what are your
thoughts about revert-and-replace of patches in -pu?  Patches 2, 6,
11, 23, and 24 have changed significantly since 9/30.

The first eight patches fix miscellaneous errors: #1 stops dirent
iteration after we successfully link an inode into a directory.  #2
fixes a bug that prohibited us from specifying a 64bit superblock
number when opening an FS.  #3 prohibits running mke2fs with -E
resize= and meta_bg.  #4 causes users of the badblocks code to reject
64bit block numbers.  #5 fixes shift overflow errors when punching
the end of non-extent files.  #6 refactors all the tests for whether
or not we need to set the LARGE_FILE feature (because someone goofed
earlier).  #7 fixes a problem wherein mkfs ignored non-4096 blocksize
directives in the config file on a device larger than 2^32KB.  #8
cleans up some code in debugfs.

The next two patches fix some 64bit truncation bugs.

Regarding the next five patches, I turned on bigalloc and found a number
of bugs relating to the fact that block_alloc_stats2() takes a block
number but operates on clusters.  I've fixed up all the allocation
errors that I found.  I also decided to make the quota code use
ext2fs_punch rather than try to correct its behavior wrt bigalloc.
There was also a bug wherein the requirement that 64-bit bitmaps be
enabled (via EXT2_FLAG_64BITS) for bigalloc filesystems was not
enforced.  There's also
a patch to reduce the e2fsck output verbosity when there are block
bitmap errors.  Note that #11 has been refactored significantly.
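
To illustrate the block-versus-cluster mismatch described above, here
is a minimal sketch of the arithmetic involved.  The function names and
the cluster_ratio_bits parameter are hypothetical stand-ins, not the
libext2fs API; the only assumption is that one cluster covers
2^cluster_ratio_bits blocks.

```c
#include <stdint.h>

/* Hypothetical helper: which cluster does this block live in? */
static uint64_t block_to_cluster(uint64_t blk, unsigned cluster_ratio_bits)
{
	return blk >> cluster_ratio_bits;
}

/* Hypothetical helper: first block covered by a cluster. */
static uint64_t cluster_first_block(uint64_t cluster,
				    unsigned cluster_ratio_bits)
{
	return cluster << cluster_ratio_bits;
}

/* Two blocks share allocation state iff they land in the same cluster. */
static int same_cluster(uint64_t a, uint64_t b, unsigned cluster_ratio_bits)
{
	return block_to_cluster(a, cluster_ratio_bits) ==
	       block_to_cluster(b, cluster_ratio_bits);
}
```

With 16 blocks per cluster (cluster_ratio_bits = 4), blocks 32 through
47 all map to cluster 2, so a caller that adjusts allocation stats once
per block would touch the same cluster sixteen times.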

The next patch provides the ability to toggle the 64bit feature on any
ext4 filesystem.

The four patches after that fix various resize2fs bugs with bigalloc.

The next two patches fix bugs with metadata_csum.  There's a patch to
fix up some code to test if checksums are enabled instead of a
GDT_CSUM open-code.  Finally, there's a patch to resize2fs to rewrite
checksums of inodes that were relocated.

The next two patches add the ability to edit extended attributes and
add a fuse2fs driver for e2fsprogs.  I admit that the xattr editing
functions clash with the inline_data patches, though sadly, the inline
data patches don't provide an API to access EAs in a separate EA
block.  The fuse driver should work with the latest versions of Linux
fuse (2.9.2) and osxfuse (2.6.1).  I've been using the fuse driver to
test e2fsprogs functionality, which is how I came across most of the
bugs fixed above.  Both of these patches (#23 and #24) have received
fixes since the 9/30 posting.

The final patch adds my metadata checksum test program to the tests/
directory, along with a new metadata_check target to run a quick
check.  It includes substitute mount/umount commands for use with
fuse2fs.

For fuse2fs, I think it'd be useful to reintroduce journal replay too.
(Or cheat and call e2fsck -E journal_only...)  Also, fuse2fs doesn't
yet know how to read or write ACLs.

I've tested these e2fsprogs changes against the -next branch as of a
few days ago.  These days, I use a 2GB ramdisk and a 20T "disk" I
constructed out of dm-snapshot to test in an x64 VM.

Comments and questions are, as always, welcome.

--D


* [PATCH 01/25] libext2fs: stop iterating dirents when done linking
  2013-10-18  4:48 [PATCH v2 00/25] e2fsprogs patchbomb 10/2013 Darrick J. Wong
@ 2013-10-18  4:49 ` Darrick J. Wong
  2013-10-23 23:39   ` Theodore Ts'o
  2013-10-18  4:49 ` [PATCH 02/25] libext2fs: fix ext2fs_open2() truncation of the superblock parameter Darrick J. Wong
                   ` (24 subsequent siblings)
  25 siblings, 1 reply; 73+ messages in thread
From: Darrick J. Wong @ 2013-10-18  4:49 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

When we've successfully linked an inode into a directory, we can stop
iterating the directory.
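
The idea can be sketched outside of libext2fs as a callback-driven
iterator that stops as soon as the callback asks it to.  Everything
below (ITER_ABORT, iterate_entries, link_cb, the demo array) is a
hypothetical stand-in for illustration, not the library's real
interface.

```c
#include <stddef.h>

#define ITER_ABORT 1	/* hypothetical flag: callback asks iterator to stop */

struct link_state {
	int target;	/* entry we want to claim */
	int done;	/* set once the work has been done */
	int calls;	/* how many times the callback ran */
};

typedef int (*entry_fn)(int entry, void *priv);

/* Walk the entries, stopping as soon as a callback returns ITER_ABORT. */
static void iterate_entries(const int *entries, size_t n, entry_fn fn,
			    void *priv)
{
	for (size_t i = 0; i < n; i++)
		if (fn(entries[i], priv) & ITER_ABORT)
			break;
}

static int link_cb(int entry, void *priv)
{
	struct link_state *ls = priv;

	ls->calls++;
	if (ls->done)
		return ITER_ABORT;	/* nothing left to do; stop the walk */
	if (entry != ls->target)
		return 0;		/* keep looking */
	ls->done = 1;
	return ITER_ABORT;
}

/* Demo driver: how many callbacks run before the walk ends? */
static int calls_until_linked(int target)
{
	static const int entries[] = { 3, 7, 9, 11 };
	struct link_state ls = { target, 0, 0 };

	iterate_entries(entries, sizeof(entries) / sizeof(entries[0]),
			link_cb, &ls);
	return ls.calls;
}
```

In this sketch, looking for the second entry costs two callbacks rather
than four; returning 0 instead of the abort flag would visit every
remaining entry for no benefit.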

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 lib/ext2fs/link.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


diff --git a/lib/ext2fs/link.c b/lib/ext2fs/link.c
index 2a44575..09e6cb4 100644
--- a/lib/ext2fs/link.c
+++ b/lib/ext2fs/link.c
@@ -45,7 +45,7 @@ static int link_proc(struct ext2_dir_entry *dirent,
 	struct ext2_dir_entry_tail *t;
 
 	if (ls->done)
-		return 0;
+		return DIRENT_ABORT;
 
 	rec_len = EXT2_DIR_REC_LEN(ls->namelen);
 



* [PATCH 02/25] libext2fs: fix ext2fs_open2() truncation of the superblock parameter
  2013-10-18  4:48 [PATCH v2 00/25] e2fsprogs patchbomb 10/2013 Darrick J. Wong
  2013-10-18  4:49 ` [PATCH 01/25] libext2fs: stop iterating dirents when done linking Darrick J. Wong
@ 2013-10-18  4:49 ` Darrick J. Wong
  2013-10-18 18:32   ` Darrick J. Wong
  2013-10-18  4:49 ` [PATCH 03/25] mke2fs: don't let resize= turn on resize_inode when meta_bg is set Darrick J. Wong
                   ` (23 subsequent siblings)
  25 siblings, 1 reply; 73+ messages in thread
From: Darrick J. Wong @ 2013-10-18  4:49 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

Since it's possible for very large filesystems to store backup
superblocks at very large (> 2^32) block numbers, we need to be able
to handle the case of a caller directing us to read one of these
high-numbered backups.
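
A minimal sketch of the truncation, assuming a narrow entry point whose
superblock parameter is only 32 bits wide.  The sketch uses uint32_t
rather than int so that the wraparound is well defined in C;
open_narrow and open_wide are hypothetical names and simply return the
block number that would be read.

```c
#include <stdint.h>

/* Hypothetical narrow entry point: the parameter holds only 32 bits. */
static uint64_t open_narrow(uint32_t superblock)
{
	return superblock;
}

/* Hypothetical wide entry point: the full 64-bit number survives. */
static uint64_t open_wide(uint64_t superblock)
{
	return superblock;
}
```

Asking the narrow variant for block 2^32 + 5 silently yields block 5,
which is why the callers need a 64-bit parameter all the way down to
the read call.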

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 debugfs/debugfs.c   |    4 ++--
 e2fsck/journal.c    |    6 +++---
 e2fsck/unix.c       |    8 ++++----
 lib/ext2fs/ext2fs.h |    4 ++++
 lib/ext2fs/openfs.c |   21 +++++++++++++++------
 misc/dumpe2fs.c     |    4 ++--
 6 files changed, 30 insertions(+), 17 deletions(-)


diff --git a/debugfs/debugfs.c b/debugfs/debugfs.c
index 8c32eff..4f6108d 100644
--- a/debugfs/debugfs.c
+++ b/debugfs/debugfs.c
@@ -94,8 +94,8 @@ static void open_filesystem(char *device, int open_flags, blk64_t superblock,
 	if (catastrophic)
 		open_flags |= EXT2_FLAG_SKIP_MMP;
 
-	retval = ext2fs_open(device, open_flags, superblock, blocksize,
-			     unix_io_manager, &current_fs);
+	retval = ext2fs_open3(device, NULL, open_flags, superblock, blocksize,
+			      unix_io_manager, &current_fs);
 	if (retval) {
 		com_err(device, retval, "while opening filesystem");
 		current_fs = NULL;
diff --git a/e2fsck/journal.c b/e2fsck/journal.c
index 2509303..af35a38 100644
--- a/e2fsck/journal.c
+++ b/e2fsck/journal.c
@@ -967,9 +967,9 @@ int e2fsck_run_ext3_journal(e2fsck_t ctx)
 
 	ext2fs_mmp_stop(ctx->fs);
 	ext2fs_free(ctx->fs);
-	retval = ext2fs_open(ctx->filesystem_name, EXT2_FLAG_RW,
-			     ctx->superblock, blocksize, io_ptr,
-			     &ctx->fs);
+	retval = ext2fs_open3(ctx->filesystem_name, NULL, EXT2_FLAG_RW,
+			      ctx->superblock, blocksize, io_ptr,
+			      &ctx->fs);
 	if (retval) {
 		com_err(ctx->program_name, retval,
 			_("while trying to re-open %s"),
diff --git a/e2fsck/unix.c b/e2fsck/unix.c
index 0546653..fb41ca0 100644
--- a/e2fsck/unix.c
+++ b/e2fsck/unix.c
@@ -1040,7 +1040,7 @@ static errcode_t try_open_fs(e2fsck_t ctx, int flags, io_manager io_ptr,
 
 	*ret_fs = NULL;
 	if (ctx->superblock && ctx->blocksize) {
-		retval = ext2fs_open2(ctx->filesystem_name, ctx->io_options,
+		retval = ext2fs_open3(ctx->filesystem_name, ctx->io_options,
 				      flags, ctx->superblock, ctx->blocksize,
 				      io_ptr, ret_fs);
 	} else if (ctx->superblock) {
@@ -1051,7 +1051,7 @@ static errcode_t try_open_fs(e2fsck_t ctx, int flags, io_manager io_ptr,
 				ext2fs_free(*ret_fs);
 				*ret_fs = NULL;
 			}
-			retval = ext2fs_open2(ctx->filesystem_name,
+			retval = ext2fs_open3(ctx->filesystem_name,
 					      ctx->io_options, flags,
 					      ctx->superblock, blocksize,
 					      io_ptr, ret_fs);
@@ -1059,7 +1059,7 @@ static errcode_t try_open_fs(e2fsck_t ctx, int flags, io_manager io_ptr,
 				break;
 		}
 	} else
-		retval = ext2fs_open2(ctx->filesystem_name, ctx->io_options,
+		retval = ext2fs_open3(ctx->filesystem_name, ctx->io_options,
 				      flags, 0, 0, io_ptr, ret_fs);
 
 	if (retval == 0)
@@ -1375,7 +1375,7 @@ failure:
 	 * don't need to update the mount count and last checked
 	 * fields in the backup superblock (the kernel doesn't update
 	 * the backup superblocks anyway).  With newer versions of the
-	 * library this flag is set by ext2fs_open2(), but we set this
+	 * library this flag is set by ext2fs_open3(), but we set this
 	 * here just to be sure.  (No, we don't support e2fsck running
 	 * with some other libext2fs than the one that it was shipped
 	 * with, but just in case....)
diff --git a/lib/ext2fs/ext2fs.h b/lib/ext2fs/ext2fs.h
index 67876ad..1ef4d67 100644
--- a/lib/ext2fs/ext2fs.h
+++ b/lib/ext2fs/ext2fs.h
@@ -1443,6 +1443,10 @@ extern errcode_t ext2fs_open2(const char *name, const char *io_options,
 			      int flags, int superblock,
 			      unsigned int block_size, io_manager manager,
 			      ext2_filsys *ret_fs);
+extern errcode_t ext2fs_open3(const char *name, const char *io_options,
+			      int flags, blk64_t superblock,
+			      unsigned int block_size, io_manager manager,
+			      ext2_filsys *ret_fs);
 extern blk64_t ext2fs_descriptor_block_loc2(ext2_filsys fs,
 					blk64_t group_block, dgrp_t i);
 extern blk_t ext2fs_descriptor_block_loc(ext2_filsys fs, blk_t group_block,
diff --git a/lib/ext2fs/openfs.c b/lib/ext2fs/openfs.c
index 2ad9114..b046d6c 100644
--- a/lib/ext2fs/openfs.c
+++ b/lib/ext2fs/openfs.c
@@ -76,6 +76,15 @@ errcode_t ext2fs_open(const char *name, int flags, int superblock,
 			    manager, ret_fs);
 }
 
+errcode_t ext2fs_open2(const char *name, const char *io_options,
+		       int flags, int superblock,
+		       unsigned int block_size, io_manager manager,
+		       ext2_filsys *ret_fs)
+{
+	return ext2fs_open3(name, io_options, flags, superblock, block_size,
+			    manager, ret_fs);
+}
+
 /*
  *  Note: if superblock is non-zero, block-size must also be non-zero.
  * 	Superblock and block_size can be zero to use the default size.
@@ -90,8 +99,8 @@ errcode_t ext2fs_open(const char *name, int flags, int superblock,
  *	EXT2_FLAG_64BITS - Allow 64-bit bitfields (needed for large
  *				filesystems)
  */
-errcode_t ext2fs_open2(const char *name, const char *io_options,
-		       int flags, int superblock,
+errcode_t ext2fs_open3(const char *name, const char *io_options,
+		       int flags, blk64_t superblock,
 		       unsigned int block_size, io_manager manager,
 		       ext2_filsys *ret_fs)
 {
@@ -189,8 +198,8 @@ errcode_t ext2fs_open2(const char *name, const char *io_options,
 		if (retval)
 			goto cleanup;
 	}
-	retval = io_channel_read_blk(fs->io, superblock, -SUPERBLOCK_SIZE,
-				     fs->super);
+	retval = io_channel_read_blk64(fs->io, superblock, -SUPERBLOCK_SIZE,
+				       fs->super);
 	if (retval)
 		goto cleanup;
 	if (fs->orig_super)
@@ -380,8 +389,8 @@ errcode_t ext2fs_open2(const char *name, const char *io_options,
 	else
 		first_meta_bg = fs->desc_blocks;
 	if (first_meta_bg) {
-		retval = io_channel_read_blk(fs->io, group_block+1,
-					     first_meta_bg, dest);
+		retval = io_channel_read_blk64(fs->io, group_block+1,
+					       first_meta_bg, dest);
 		if (retval)
 			goto cleanup;
 #ifdef WORDS_BIGENDIAN
diff --git a/misc/dumpe2fs.c b/misc/dumpe2fs.c
index ae70f70..b139977 100644
--- a/misc/dumpe2fs.c
+++ b/misc/dumpe2fs.c
@@ -611,7 +611,7 @@ int main (int argc, char ** argv)
 		for (use_blocksize = EXT2_MIN_BLOCK_SIZE;
 		     use_blocksize <= EXT2_MAX_BLOCK_SIZE;
 		     use_blocksize *= 2) {
-			retval = ext2fs_open (device_name, flags,
+			retval = ext2fs_open3(device_name, NULL, flags,
 					      use_superblock,
 					      use_blocksize, unix_io_manager,
 					      &fs);
@@ -619,7 +619,7 @@ int main (int argc, char ** argv)
 				break;
 		}
 	} else
-		retval = ext2fs_open (device_name, flags, use_superblock,
+		retval = ext2fs_open3(device_name, NULL, flags, use_superblock,
 				      use_blocksize, unix_io_manager, &fs);
 	if (retval) {
 		com_err (program_name, retval, _("while trying to open %s"),



* [PATCH 03/25] mke2fs: don't let resize= turn on resize_inode when meta_bg is set
  2013-10-18  4:48 [PATCH v2 00/25] e2fsprogs patchbomb 10/2013 Darrick J. Wong
  2013-10-18  4:49 ` [PATCH 01/25] libext2fs: stop iterating dirents when done linking Darrick J. Wong
  2013-10-18  4:49 ` [PATCH 02/25] libext2fs: fix ext2fs_open2() truncation of the superblock parameter Darrick J. Wong
@ 2013-10-18  4:49 ` Darrick J. Wong
  2013-10-23 15:08   ` Lukáš Czerner
  2013-10-23 23:40   ` Theodore Ts'o
  2013-10-18  4:49 ` [PATCH 04/25] libext2fs: reject 64bit badblocks numbers Darrick J. Wong
                   ` (22 subsequent siblings)
  25 siblings, 2 replies; 73+ messages in thread
From: Darrick J. Wong @ 2013-10-18  4:49 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

Passing the "-E resize=NNN" option to mke2fs sets the resize_inode
feature.  However, resize_inode and meta_bg are mutually exclusive
(and the feature flag parser enforces this); therefore, we shouldn't
allow resize_inode to sneak in the back door like this.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 misc/mke2fs.c |   11 +++++++++++
 1 file changed, 11 insertions(+)


diff --git a/misc/mke2fs.c b/misc/mke2fs.c
index 6e709b9..ce3c696 100644
--- a/misc/mke2fs.c
+++ b/misc/mke2fs.c
@@ -2448,6 +2448,17 @@ int main (int argc, char *argv[])
 	}
 	fs->progress_ops = &ext2fs_numeric_progress_ops;
 
+	/* We can't have resize_inode sneak in via resize= on a meta_bg fs. */
+	if (!quiet &&
+	    EXT2_HAS_INCOMPAT_FEATURE(fs->super,
+				      EXT2_FEATURE_INCOMPAT_META_BG) &&
+	    fs->super->s_reserved_gdt_blocks > 0) {
+		printf(_("Reserving GDT blocks (resize_inode) is not possible "
+			 "with the meta_bg feature.\nThey cannot be enabled "
+			 "simultaneously.\n"));
+		exit(1);
+	}
+
 	/* Check the user's mkfs options for metadata checksumming */
 	if (!quiet &&
 	    EXT2_HAS_RO_COMPAT_FEATURE(fs->super,



* [PATCH 04/25] libext2fs: reject 64bit badblocks numbers
  2013-10-18  4:48 [PATCH v2 00/25] e2fsprogs patchbomb 10/2013 Darrick J. Wong
                   ` (2 preceding siblings ...)
  2013-10-18  4:49 ` [PATCH 03/25] mke2fs: don't let resize= turn on resize_inode when meta_bg is set Darrick J. Wong
@ 2013-10-18  4:49 ` Darrick J. Wong
  2013-10-23 15:24   ` Lukáš Czerner
  2013-10-18  4:49 ` [PATCH 05/25] libext2fs: don't overflow when punching indirect blocks with large blocks Darrick J. Wong
                   ` (21 subsequent siblings)
  25 siblings, 1 reply; 73+ messages in thread
From: Darrick J. Wong @ 2013-10-18  4:49 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

Don't accept block numbers larger than 2^32 for the badblocks list,
and don't run badblocks on them either.
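
As a rough sketch of the kind of check involved, the function below
parses one decimal block number and rejects anything that does not fit
in 32 bits.  parse_bad_block is a hypothetical helper, and the exact
boundary it uses (reject values above UINT32_MAX) is an assumption
about the intent, not a copy of the patch.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse one decimal block number; 0 on success, errno-style code on error. */
static int parse_bad_block(const char *s, uint32_t *out)
{
	char *end;
	unsigned long long v;

	errno = 0;
	v = strtoull(s, &end, 10);
	if (end == s)
		return EINVAL;		/* no digits at all */
	if (errno == ERANGE || v > UINT32_MAX)
		return EOVERFLOW;	/* does not fit a 32-bit block number */
	*out = (uint32_t)v;
	return 0;
}
```

Parsing into a 64-bit temporary first is the important part: reading
straight into a 32-bit variable would make an oversized value
impossible to detect.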

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 lib/ext2fs/read_bb_file.c |    7 +++++--
 misc/badblocks.c          |   17 ++++++++++++++++-
 2 files changed, 21 insertions(+), 3 deletions(-)


diff --git a/lib/ext2fs/read_bb_file.c b/lib/ext2fs/read_bb_file.c
index 7d7bb7a..4a498d2 100644
--- a/lib/ext2fs/read_bb_file.c
+++ b/lib/ext2fs/read_bb_file.c
@@ -39,7 +39,7 @@ errcode_t ext2fs_read_bb_FILE2(ext2_filsys fs, FILE *f,
 					       void *priv_data))
 {
 	errcode_t	retval;
-	blk_t		blockno;
+	blk64_t		blockno;
 	int		count;
 	char		buf[128];
 
@@ -55,9 +55,12 @@ errcode_t ext2fs_read_bb_FILE2(ext2_filsys fs, FILE *f,
 	while (!feof (f)) {
 		if (fgets(buf, sizeof(buf), f) == NULL)
 			break;
-		count = sscanf(buf, "%u", &blockno);
+		count = sscanf(buf, "%llu", &blockno);
 		if (count <= 0)
 			continue;
+		/* Badblocks isn't going to be updated for 64bit */
+		if (blockno > 1ULL << 32)
+			return EOVERFLOW;
 		if (fs &&
 		    ((blockno < fs->super->s_first_data_block) ||
 		     (blockno >= ext2fs_blocks_count(fs->super)))) {
diff --git a/misc/badblocks.c b/misc/badblocks.c
index c9e47c7..802080c 100644
--- a/misc/badblocks.c
+++ b/misc/badblocks.c
@@ -1047,6 +1047,7 @@ int main (int argc, char ** argv)
 				  unsigned int);
 	int open_flag;
 	long sysval;
+	blk64_t inblk;
 
 	setbuf(stdout, NULL);
 	setbuf(stderr, NULL);
@@ -1204,6 +1205,13 @@ int main (int argc, char ** argv)
 		     (unsigned long) first_block, (unsigned long) last_block);
 	    exit (1);
 	}
+	/* ext2 badblocks file can't handle large values */
+	if ((blk64_t)last_block >= 1ULL << 32) {
+		com_err(program_name, EOVERFLOW,
+			_("invalid end block (%lu): must be less than %llu"),
+			(unsigned long)last_block, 1ULL << 32);
+		exit(1);
+	}
 	if (w_flag)
 		check_mount(device_name);
 
@@ -1262,13 +1270,20 @@ int main (int argc, char ** argv)
 
 	if (in) {
 		for(;;) {
-			switch(fscanf (in, "%u\n", &next_bad)) {
+			switch (fscanf(in, "%llu\n", &inblk)) {
 				case 0:
 					com_err (program_name, 0, "input file - bad format");
 					exit (1);
 				case EOF:
 					break;
 				default:
+					if (inblk > 1ULL << 32) {
+						com_err(program_name,
+							EOVERFLOW,
+							_("while adding to in-memory bad block list"));
+						exit(1);
+					}
+					next_bad = inblk;
 					errcode = ext2fs_badblocks_list_add(bb_list,next_bad);
 					if (errcode) {
 						com_err (program_name, errcode, _("while adding to in-memory bad block list"));



* [PATCH 05/25] libext2fs: don't overflow when punching indirect blocks with large blocks
  2013-10-18  4:48 [PATCH v2 00/25] e2fsprogs patchbomb 10/2013 Darrick J. Wong
                   ` (3 preceding siblings ...)
  2013-10-18  4:49 ` [PATCH 04/25] libext2fs: reject 64bit badblocks numbers Darrick J. Wong
@ 2013-10-18  4:49 ` Darrick J. Wong
  2013-10-24  0:08   ` Theodore Ts'o
  2013-10-18  4:49 ` [PATCH 06/25] libext2fs: fix tests that set LARGE_FILE Darrick J. Wong
                   ` (20 subsequent siblings)
  25 siblings, 1 reply; 73+ messages in thread
From: Darrick J. Wong @ 2013-10-18  4:49 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

On a FS with a rather large blocksize (> 4K), the old block map
structure can construct a fat enough "tree" (or whatever we call that
lopsided thing) that (at least in theory) one could create mappings
for logical blocks higher than 32 bits.  In practice this doesn't
happen, but the 'max' and 'iter' variables that the punch helpers use
will overflow because the BLOCK_SIZE_BITS shifts are too large to fit
a 32-bit variable.  This causes punch to fail on TIND-mapped blocks
even if the file is < 16T.  So enlarge the fields to fit.

(Yes this is an obscure corner case...)
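
The arithmetic can be sketched as follows.  blocks_per_slot is a
hypothetical helper that mirrors the shift expression used for the
increment, assuming 4-byte block numbers so that one block holds
2^(block_size_bits - 2) of them.

```c
#include <stdint.h>

/*
 * Number of logical blocks covered by one slot at the given level
 * (0 = direct pointers) for a block size of 2^block_size_bits bytes.
 */
static uint64_t blocks_per_slot(unsigned block_size_bits, unsigned level)
{
	/* 1ULL keeps the shift in 64 bits; a plain int 1 could overflow. */
	return 1ULL << ((block_size_bits - 2) * level);
}
```

With 4K blocks the level 3 shift is 30 bits, which still fits a 32-bit
int.  With 8K blocks it is 33 bits and with 64K blocks it is 42 bits,
neither of which does, which matches the "> 4K" condition above.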

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 lib/ext2fs/punch.c |   15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)


diff --git a/lib/ext2fs/punch.c b/lib/ext2fs/punch.c
index 4471f46..790a0ad8 100644
--- a/lib/ext2fs/punch.c
+++ b/lib/ext2fs/punch.c
@@ -50,15 +50,16 @@ static errcode_t ind_punch(ext2_filsys fs, struct ext2_inode *inode,
 			   blk_t start, blk_t count, int max)
 {
 	errcode_t	retval;
-	blk_t		b, offset;
-	int		i, incr;
+	blk_t		b;
+	int		i;
+	blk64_t		offset, incr;
 	int		freed = 0;
 
 #ifdef PUNCH_DEBUG
 	printf("Entering ind_punch, level %d, start %u, count %u, "
 	       "max %d\n", level, start, count, max);
 #endif
-	incr = 1 << ((EXT2_BLOCK_SIZE_BITS(fs->super)-2)*level);
+	incr = 1ULL << ((EXT2_BLOCK_SIZE_BITS(fs->super)-2)*level);
 	for (i=0, offset=0; i < max; i++, p++, offset += incr) {
 		if (offset >= start + count)
 			break;
@@ -87,7 +88,7 @@ static errcode_t ind_punch(ext2_filsys fs, struct ext2_inode *inode,
 				continue;
 		}
 #ifdef PUNCH_DEBUG
-		printf("Freeing block %u (offset %d)\n", b, offset);
+		printf("Freeing block %u (offset %llu)\n", b, offset);
 #endif
 		ext2fs_block_alloc_stats(fs, b, -1);
 		*p = 0;
@@ -108,7 +109,7 @@ static errcode_t ext2fs_punch_ind(ext2_filsys fs, struct ext2_inode *inode,
 	int			num = EXT2_NDIR_BLOCKS;
 	blk_t			*bp = inode->i_block;
 	blk_t			addr_per_block;
-	blk_t			max = EXT2_NDIR_BLOCKS;
+	blk64_t			max = EXT2_NDIR_BLOCKS;
 
 	if (!block_buf) {
 		retval = ext2fs_get_array(3, fs->blocksize, &buf);
@@ -119,10 +120,10 @@ static errcode_t ext2fs_punch_ind(ext2_filsys fs, struct ext2_inode *inode,
 
 	addr_per_block = (blk_t) fs->blocksize >> 2;
 
-	for (level=0; level < 4; level++, max *= addr_per_block) {
+	for (level = 0; level < 4; level++, max *= (blk64_t)addr_per_block) {
 #ifdef PUNCH_DEBUG
 		printf("Main loop level %d, start %u count %u "
-		       "max %d num %d\n", level, start, count, max, num);
+		       "max %llu num %d\n", level, start, count, max, num);
 #endif
 		if (start < max) {
 			retval = ind_punch(fs, inode, block_buf, bp, level,



* [PATCH 06/25] libext2fs: fix tests that set LARGE_FILE
  2013-10-18  4:48 [PATCH v2 00/25] e2fsprogs patchbomb 10/2013 Darrick J. Wong
                   ` (4 preceding siblings ...)
  2013-10-18  4:49 ` [PATCH 05/25] libext2fs: don't overflow when punching indirect blocks with large blocks Darrick J. Wong
@ 2013-10-18  4:49 ` Darrick J. Wong
  2013-11-25  7:09   ` Zheng Liu
  2013-10-18  4:49 ` [PATCH 07/25] mke2fs: load configfile blocksize setting before 64bit checks Darrick J. Wong
                   ` (19 subsequent siblings)
  25 siblings, 1 reply; 73+ messages in thread
From: Darrick J. Wong @ 2013-10-18  4:49 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

For each site where we test for a large file (> 2GB) and set the
LARGE_FILE feature, use a helper function to make the size test
consistent with the test that's in e2fsck.  This fixes the fsck
complaints when we try to create a 2GB journal (not so hard with 64k
block size) and fixes the incorrect test in fileio.c.
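
For comparison, here is a small sketch that puts the three size tests
side by side.  The thresholds are taken from the diff below; the
function names are hypothetical and the inode fields are replaced by a
plain 64-bit size.

```c
#include <stdint.h>

/* Test used by the new helper: 2 GiB and above needs LARGE_FILE. */
static int needs_large_file(uint64_t size)
{
	return size >= 0x80000000ULL;
}

/* Old fileio.c test: the constant is one hex digit short (128 MiB - 1). */
static int old_fileio_test(uint64_t size)
{
	return size > 0x7FFFFFFULL;
}

/* Old mkjournal.c test: only fires once the high 32 size bits are set. */
static int old_mkjournal_test(uint64_t size)
{
	return (size >> 32) != 0;
}
```

A file of exactly 2 GiB shows the journal problem (the old mkjournal
test says no, the helper says yes), while a 128 MiB file shows the
fileio.c problem in the other direction.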

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 e2fsck/pass1.c         |    3 ++-
 lib/ext2fs/ext2fs.h    |    6 ++++++
 lib/ext2fs/fileio.c    |    2 +-
 lib/ext2fs/mkjournal.c |    2 +-
 4 files changed, 10 insertions(+), 3 deletions(-)


diff --git a/e2fsck/pass1.c b/e2fsck/pass1.c
index ab23e42..8c18a93 100644
--- a/e2fsck/pass1.c
+++ b/e2fsck/pass1.c
@@ -2281,7 +2281,8 @@ static void check_blocks(e2fsck_t ctx, struct problem_context *pctx,
 		}
 		pctx->num = 0;
 	}
-	if (LINUX_S_ISREG(inode->i_mode) && EXT2_I_SIZE(inode) >= 0x80000000UL)
+	if (LINUX_S_ISREG(inode->i_mode) &&
+	    ext2fs_needs_large_file_feature(EXT2_I_SIZE(inode)))
 		ctx->large_files++;
 	if ((pb.num_blocks != ext2fs_inode_i_blocks(fs, inode)) ||
 	    ((fs->super->s_feature_ro_compat &
diff --git a/lib/ext2fs/ext2fs.h b/lib/ext2fs/ext2fs.h
index 1ef4d67..8f82dae 100644
--- a/lib/ext2fs/ext2fs.h
+++ b/lib/ext2fs/ext2fs.h
@@ -646,6 +646,12 @@ static inline int ext2fs_has_group_desc_csum(ext2_filsys fs)
 			EXT4_FEATURE_RO_COMPAT_METADATA_CSUM);
 }
 
+/* The LARGE_FILE feature should be set if we have stored files 2GB+ in size */
+static inline int ext2fs_needs_large_file_feature(unsigned long long file_size)
+{
+	return file_size >= 0x80000000ULL;
+}
+
 /* alloc.c */
 extern errcode_t ext2fs_new_inode(ext2_filsys fs, ext2_ino_t dir, int mode,
 				  ext2fs_inode_bitmap map, ext2_ino_t *ret);
diff --git a/lib/ext2fs/fileio.c b/lib/ext2fs/fileio.c
index 02e6263..6b213b5 100644
--- a/lib/ext2fs/fileio.c
+++ b/lib/ext2fs/fileio.c
@@ -400,7 +400,7 @@ errcode_t ext2fs_file_set_size2(ext2_file_t file, ext2_off64_t size)
 
 	/* If we're writing a large file, set the large_file flag */
 	if (LINUX_S_ISREG(file->inode.i_mode) &&
-	    EXT2_I_SIZE(&file->inode) > 0x7FFFFFFULL &&
+	    ext2fs_needs_large_file_feature(EXT2_I_SIZE(&file->inode)) &&
 	    (!EXT2_HAS_RO_COMPAT_FEATURE(file->fs->super,
 					 EXT2_FEATURE_RO_COMPAT_LARGE_FILE) ||
 	     file->fs->super->s_rev_level == EXT2_GOOD_OLD_REV)) {
diff --git a/lib/ext2fs/mkjournal.c b/lib/ext2fs/mkjournal.c
index c636a97..2afd3b7 100644
--- a/lib/ext2fs/mkjournal.c
+++ b/lib/ext2fs/mkjournal.c
@@ -378,7 +378,7 @@ static errcode_t write_journal_inode(ext2_filsys fs, ext2_ino_t journal_ino,
 	inode_size = (unsigned long long)fs->blocksize * num_blocks;
 	inode.i_size = inode_size & 0xFFFFFFFF;
 	inode.i_size_high = (inode_size >> 32) & 0xFFFFFFFF;
-	if (inode.i_size_high)
+	if (ext2fs_needs_large_file_feature(inode_size))
 		fs->super->s_feature_ro_compat |=
 			EXT2_FEATURE_RO_COMPAT_LARGE_FILE;
 	ext2fs_iblk_add_blocks(fs, &inode, es.newblocks);



* [PATCH 07/25] mke2fs: load configfile blocksize setting before 64bit checks
  2013-10-18  4:48 [PATCH v2 00/25] e2fsprogs patchbomb 10/2013 Darrick J. Wong
                   ` (5 preceding siblings ...)
  2013-10-18  4:49 ` [PATCH 06/25] libext2fs: fix tests that set LARGE_FILE Darrick J. Wong
@ 2013-10-18  4:49 ` Darrick J. Wong
  2013-11-25  8:01   ` Zheng Liu
  2013-10-18  4:49 ` [PATCH 08/25] debugfs: fix various minor bogosity Darrick J. Wong
                   ` (18 subsequent siblings)
  25 siblings, 1 reply; 73+ messages in thread
From: Darrick J. Wong @ 2013-10-18  4:49 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

mke2fs has a series of checks to ensure that we don't create a
filesystem too big for its blocksize -- if auto-64bit is on, then it
turns on 64bit; otherwise it complains.  Unfortunately, it performs
these checks before looking in mke2fs.conf for a blocksize, which
means that the checks are incorrect if the user specifies a non-4096
blocksize in the config file and says nothing on the command line.  It
also has the effect of mandating a 4k block size on any block device
larger than 4T in that situation.  Therefore, read the block size from
the config file before performing the 64bit checks.
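
The ordering problem is easier to see with numbers.  The sketch below
assumes the block count starts out in 1024-byte units and is rescaled
once the real blocksize is known; MAX_32, scale_blocks_count and
needs_64bit are hypothetical names for illustration only.

```c
#include <stdint.h>

#define MAX_32 0xFFFFFFFFULL	/* hypothetical name for the 32-bit limit */

/*
 * Convert a count expressed in 1024-byte units into blocksize units.
 * blocksize must be at least 1024.
 */
static uint64_t scale_blocks_count(uint64_t count_1k, unsigned blocksize)
{
	return count_1k / (blocksize / 1024);
}

/* Does a device of this size need 64-bit block numbers at this blocksize? */
static int needs_64bit(uint64_t count_1k, unsigned blocksize)
{
	return scale_blocks_count(count_1k, blocksize) > MAX_32;
}
```

An 8T device is 2^33 units of 1K.  At a 4096-byte blocksize that is
2^31 blocks and fits in 32 bits; at a 1024-byte blocksize it does not.
The answer therefore depends on knowing the configured blocksize before
the check runs.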

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 misc/mke2fs.c |  132 ++++++++++++++++++++++++++++++---------------------------
 1 file changed, 70 insertions(+), 62 deletions(-)


diff --git a/misc/mke2fs.c b/misc/mke2fs.c
index ce3c696..86091d7 100644
--- a/misc/mke2fs.c
+++ b/misc/mke2fs.c
@@ -1298,6 +1298,21 @@ static void PRS(int argc, char *argv[])
 	char *		fs_type = 0;
 	char *		usage_types = 0;
 	blk64_t		dev_size;
+	/*
+	 * NOTE: A few words about fs_blocks_count and blocksize:
+	 *
+	 * Initially, blocksize is set to zero, which implies 1024.
+	 * If -b is specified, blocksize is updated to the user's value.
+	 *
+	 * Next, the device size or the user's "blocks" command line argument
+	 * is used to set fs_blocks_count; the units are blocksize.
+	 *
+	 * Later, if blocksize hasn't been set and the profile specifies a
+	 * blocksize, then blocksize is updated and fs_blocks_count is scaled
+	 * appropriately.  Note the change in units!
+	 *
+	 * Finally, we complain about fs_blocks_count > 2^32 on a non-64bit fs.
+	 */
 	blk64_t		fs_blocks_count = 0;
 #ifdef __linux__
 	struct 		utsname ut;
@@ -1780,15 +1795,65 @@ profile_error:
 		}
 	}
 
+	/* Get the hardware sector sizes, if available */
+	retval = ext2fs_get_device_sectsize(device_name, &lsector_size);
+	if (retval) {
+		com_err(program_name, retval,
+			_("while trying to determine hardware sector size"));
+		exit(1);
+	}
+	retval = ext2fs_get_device_phys_sectsize(device_name, &psector_size);
+	if (retval) {
+		com_err(program_name, retval,
+			_("while trying to determine physical sector size"));
+		exit(1);
+	}
+
+	if ((tmp = getenv("MKE2FS_DEVICE_SECTSIZE")) != NULL)
+		lsector_size = atoi(tmp);
+	if ((tmp = getenv("MKE2FS_DEVICE_PHYS_SECTSIZE")) != NULL)
+		psector_size = atoi(tmp);
+
+	/* Older kernels may not have physical/logical distinction */
+	if (!psector_size)
+		psector_size = lsector_size;
+
+	if (blocksize <= 0) {
+		use_bsize = get_int_from_profile(fs_types, "blocksize", 4096);
+
+		if (use_bsize == -1) {
+			use_bsize = sys_page_size;
+			if ((linux_version_code < (2*65536 + 6*256)) &&
+			    (use_bsize > 4096))
+				use_bsize = 4096;
+		}
+		if (lsector_size && use_bsize < lsector_size)
+			use_bsize = lsector_size;
+		if ((blocksize < 0) && (use_bsize < (-blocksize)))
+			use_bsize = -blocksize;
+		blocksize = use_bsize;
+		fs_blocks_count /= (blocksize / 1024);
+	} else {
+		if (blocksize < lsector_size) {			/* Impossible */
+			com_err(program_name, EINVAL,
+				_("while setting blocksize; too small "
+				  "for device\n"));
+			exit(1);
+		} else if ((blocksize < psector_size) &&
+			   (psector_size <= sys_page_size)) {	/* Suboptimal */
+			fprintf(stderr, _("Warning: specified blocksize %d is "
+				"less than device physical sectorsize %d\n"),
+				blocksize, psector_size);
+		}
+	}
+
+	fs_param.s_log_block_size =
+		int_log2(blocksize >> EXT2_MIN_BLOCK_LOG_SIZE);
+
 	/*
 	 * We now need to do a sanity check of fs_blocks_count for
 	 * 32-bit vs 64-bit block number support.
 	 */
-	if ((fs_blocks_count > MAX_32_NUM) && (blocksize == 0)) {
-		fs_blocks_count /= 4; /* Try using a 4k blocksize */
-		blocksize = 4096;
-		fs_param.s_log_block_size = 2;
-	}
 	if ((fs_blocks_count > MAX_32_NUM) &&
 	    !(fs_param.s_feature_incompat & EXT4_FEATURE_INCOMPAT_64BIT) &&
 	    get_bool_from_profile(fs_types, "auto_64-bit_support", 0)) {
@@ -1889,63 +1954,6 @@ profile_error:
 	if ((fs_param.s_feature_incompat & EXT2_FEATURE_INCOMPAT_META_BG) &&
 	    ((tmp = getenv("MKE2FS_FIRST_META_BG"))))
 		fs_param.s_first_meta_bg = atoi(tmp);
-
-	/* Get the hardware sector sizes, if available */
-	retval = ext2fs_get_device_sectsize(device_name, &lsector_size);
-	if (retval) {
-		com_err(program_name, retval,
-			_("while trying to determine hardware sector size"));
-		exit(1);
-	}
-	retval = ext2fs_get_device_phys_sectsize(device_name, &psector_size);
-	if (retval) {
-		com_err(program_name, retval,
-			_("while trying to determine physical sector size"));
-		exit(1);
-	}
-
-	if ((tmp = getenv("MKE2FS_DEVICE_SECTSIZE")) != NULL)
-		lsector_size = atoi(tmp);
-	if ((tmp = getenv("MKE2FS_DEVICE_PHYS_SECTSIZE")) != NULL)
-		psector_size = atoi(tmp);
-
-	/* Older kernels may not have physical/logical distinction */
-	if (!psector_size)
-		psector_size = lsector_size;
-
-	if (blocksize <= 0) {
-		use_bsize = get_int_from_profile(fs_types, "blocksize", 4096);
-
-		if (use_bsize == -1) {
-			use_bsize = sys_page_size;
-			if ((linux_version_code < (2*65536 + 6*256)) &&
-			    (use_bsize > 4096))
-				use_bsize = 4096;
-		}
-		if (lsector_size && use_bsize < lsector_size)
-			use_bsize = lsector_size;
-		if ((blocksize < 0) && (use_bsize < (-blocksize)))
-			use_bsize = -blocksize;
-		blocksize = use_bsize;
-		ext2fs_blocks_count_set(&fs_param,
-					ext2fs_blocks_count(&fs_param) /
-					(blocksize / 1024));
-	} else {
-		if (blocksize < lsector_size) {			/* Impossible */
-			com_err(program_name, EINVAL,
-				_("while setting blocksize; too small "
-				  "for device\n"));
-			exit(1);
-		} else if ((blocksize < psector_size) &&
-			   (psector_size <= sys_page_size)) {	/* Suboptimal */
-			fprintf(stderr, _("Warning: specified blocksize %d is "
-				"less than device physical sectorsize %d\n"),
-				blocksize, psector_size);
-		}
-	}


* [PATCH 08/25] debugfs: fix various minor bogosity
  2013-10-18  4:48 [PATCH v2 00/25] e2fsprogs patchbomb 10/2013 Darrick J. Wong
                   ` (6 preceding siblings ...)
  2013-10-18  4:49 ` [PATCH 07/25] mke2fs: load configfile blocksize setting before 64bit checks Darrick J. Wong
@ 2013-10-18  4:49 ` Darrick J. Wong
  2013-11-25  8:08   ` Zheng Liu
  2013-10-18  4:49 ` [PATCH 09/25] e2fsck: teach EA refcounting code to handle 64bit block addresses Darrick J. Wong
                   ` (17 subsequent siblings)
  25 siblings, 1 reply; 73+ messages in thread
From: Darrick J. Wong @ 2013-10-18  4:49 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: Darren Hart, linux-ext4, Robert Yang

We should really use the ext2fs memory allocator functions in
copy_file(), and we really should return an error code if the
allocation fails.

Also fix up a minor bogosity in an error message.
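
As a rough illustration of the pattern described above, here is a
minimal sketch of an allocator that reports failure through its return
value and hands the buffer back through an out-parameter.  The names
demo_get_mem(), demo_free_mem() and fill_buffer() are hypothetical
stand-ins written for this example; they are not the libext2fs
functions, and the errno-style return code is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in: allocate size bytes, store the pointer in *ptr.
 * Returns 0 on success or an errno-style code on failure. */
static int demo_get_mem(size_t size, void *ptr)
{
	void *p;

	assert(ptr != NULL);
	p = malloc(size);
	if (!p)
		return ENOMEM;
	memcpy(ptr, &p, sizeof(p));
	return 0;
}

/* Hypothetical stand-in: free *ptr and reset it to NULL. */
static void demo_free_mem(void *ptr)
{
	void *p;

	memcpy(&p, ptr, sizeof(p));
	free(p);
	p = NULL;
	memcpy(ptr, &p, sizeof(p));
}

/* The caller propagates the allocator's error code instead of returning
 * nothing, which is the behaviour the patch asks for in copy_file(). */
static int fill_buffer(size_t bufsize)
{
	char *buf;
	int retval;

	retval = demo_get_mem(bufsize, &buf);
	if (retval)
		return retval;
	memset(buf, 0, bufsize);
	demo_free_mem(&buf);
	return 0;
}
```

Resetting the pointer on free makes a double free through the same
variable harmless, which is convenient in a shared "fail:" path.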

Cc: Robert Yang <liezhi.yang@windriver.com>
Cc: Darren Hart <dvhart@linux.intel.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 debugfs/debugfs.c |   11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)


diff --git a/debugfs/debugfs.c b/debugfs/debugfs.c
index 4f6108d..d3db356 100644
--- a/debugfs/debugfs.c
+++ b/debugfs/debugfs.c
@@ -1601,9 +1601,10 @@ static errcode_t copy_file(int fd, ext2_ino_t newfile, int bufsize, int make_hol
 	if (retval)
 		return retval;
 
-	if (!(buf = (char *) malloc(bufsize))){
-		com_err("copy_file", errno, "can't allocate buffer\n");
-		return;
+	retval = ext2fs_get_mem(bufsize, &buf);
+	if (retval) {
+		com_err("copy_file", retval, "can't allocate buffer\n");
+		return retval;
 	}
 
 	/* This is used for checking whether the whole block is zero */
@@ -1654,7 +1655,7 @@ static errcode_t copy_file(int fd, ext2_ino_t newfile, int bufsize, int make_hol
 	return retval;
 
 fail:
-	free(buf);
+	ext2fs_free_mem(&buf);
 	ext2fs_free_mem(&zero_buf);
 	(void) ext2fs_file_close(e2_file);
 	return retval;
@@ -2112,7 +2113,7 @@ void do_bmap(int argc, char *argv[])
 
 	errcode = ext2fs_bmap2(current_fs, ino, 0, 0, 0, blk, 0, &pblk);
 	if (errcode) {
-		com_err("argv[0]", errcode,
+		com_err(argv[0], errcode,
 			"while mapping logical block %llu\n", blk);
 		return;
 	}



* [PATCH 09/25] e2fsck: teach EA refcounting code to handle 64bit block addresses
  2013-10-18  4:48 [PATCH v2 00/25] e2fsprogs patchbomb 10/2013 Darrick J. Wong
                   ` (7 preceding siblings ...)
  2013-10-18  4:49 ` [PATCH 08/25] debugfs: fix various minor bogosity Darrick J. Wong
@ 2013-10-18  4:49 ` Darrick J. Wong
  2013-10-18 18:37   ` Darrick J. Wong
  2013-10-18  4:50 ` [PATCH 10/25] debugfs: handle 64bit block numbers Darrick J. Wong
                   ` (16 subsequent siblings)
  25 siblings, 1 reply; 73+ messages in thread
From: Darrick J. Wong @ 2013-10-18  4:49 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

The extended attribute refcounting code only accepts blk_t, which is
dangerous because EA blocks can exist at high addresses (> 2^32) as
well.  Therefore, widen the block fields to 64 bits.
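
To make the danger concrete, the following sketch shows how a block
number above 2^32 is silently folded onto a low block number when it is
passed through a 32-bit type.  The typedefs demo_blk_t and demo_blk64_t
are hypothetical stand-ins for the 32-bit and 64-bit block types; the
helper names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for a 32-bit and a 64-bit block number type. */
typedef uint32_t demo_blk_t;
typedef uint64_t demo_blk64_t;

/* Narrow a block number the way a 32-bit parameter would. */
static demo_blk_t narrow_block(demo_blk64_t blk)
{
	return (demo_blk_t) blk;	/* the high 32 bits are discarded */
}

/* Returns 1 if two distinct 64-bit block numbers become
 * indistinguishable once narrowed to 32 bits. */
static int blocks_alias(demo_blk64_t a, demo_blk64_t b)
{
	assert(a != b);
	return narrow_block(a) == narrow_block(b);
}
```

With a refcount table keyed on the narrowed value, block 0x100000005
and block 5 would share one entry, so their reference counts would be
mixed together.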

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 e2fsck/e2fsck.h      |   12 ++++++------
 e2fsck/ea_refcount.c |   36 ++++++++++++++++++------------------
 2 files changed, 24 insertions(+), 24 deletions(-)


diff --git a/e2fsck/e2fsck.h b/e2fsck/e2fsck.h
index 13d70f1..f1df525 100644
--- a/e2fsck/e2fsck.h
+++ b/e2fsck/e2fsck.h
@@ -432,17 +432,17 @@ extern struct dx_dir_info *e2fsck_dx_dir_info_iter(e2fsck_t ctx, int *control);
 /* ea_refcount.c */
 extern errcode_t ea_refcount_create(int size, ext2_refcount_t *ret);
 extern void ea_refcount_free(ext2_refcount_t refcount);
-extern errcode_t ea_refcount_fetch(ext2_refcount_t refcount, blk_t blk,
+extern errcode_t ea_refcount_fetch(ext2_refcount_t refcount, blk64_t blk,
 				   int *ret);
 extern errcode_t ea_refcount_increment(ext2_refcount_t refcount,
-				       blk_t blk, int *ret);
+				       blk64_t blk, int *ret);
 extern errcode_t ea_refcount_decrement(ext2_refcount_t refcount,
-				       blk_t blk, int *ret);
+				       blk64_t blk, int *ret);
 extern errcode_t ea_refcount_store(ext2_refcount_t refcount,
-				   blk_t blk, int count);
-extern blk_t ext2fs_get_refcount_size(ext2_refcount_t refcount);
+				   blk64_t blk, int count);
+extern blk64_t ext2fs_get_refcount_size(ext2_refcount_t refcount);
 extern void ea_refcount_intr_begin(ext2_refcount_t refcount);
-extern blk_t ea_refcount_intr_next(ext2_refcount_t refcount, int *ret);
+extern blk64_t ea_refcount_intr_next(ext2_refcount_t refcount, int *ret);
 
 /* ehandler.c */
 extern const char *ehandler_operation(const char *op);
diff --git a/e2fsck/ea_refcount.c b/e2fsck/ea_refcount.c
index e66e636..6f376a3 100644
--- a/e2fsck/ea_refcount.c
+++ b/e2fsck/ea_refcount.c
@@ -25,14 +25,14 @@
  * checked, its bit is set in the block_ea_map bitmap.
  */
 struct ea_refcount_el {
-	blk_t	ea_blk;
+	blk64_t	ea_blk;
 	int	ea_count;
 };
 
 struct ea_refcount {
-	blk_t		count;
-	blk_t		size;
-	blk_t		cursor;
+	unsigned long		count;
+	unsigned long		size;
+	unsigned long		cursor;
 	struct ea_refcount_el	*list;
 };
 
@@ -111,11 +111,11 @@ static void refcount_collapse(ext2_refcount_t refcount)
  * 	specified position.
  */
 static struct ea_refcount_el *insert_refcount_el(ext2_refcount_t refcount,
-						 blk_t blk, int pos)
+						 blk64_t blk, int pos)
 {
 	struct ea_refcount_el 	*el;
 	errcode_t		retval;
-	blk_t			new_size = 0;
+	blk64_t			new_size = 0;
 	int			num;
 
 	if (refcount->count >= refcount->size) {
@@ -153,7 +153,7 @@ static struct ea_refcount_el *insert_refcount_el(ext2_refcount_t refcount,
  * 	and we can't find an entry, create one in the sorted list.
  */
 static struct ea_refcount_el *get_refcount_el(ext2_refcount_t refcount,
-					      blk_t blk, int create)
+					      blk64_t blk, int create)
 {
 	int	low, high, mid;
 
@@ -206,7 +206,7 @@ retry:
 	return 0;
 }
 
-errcode_t ea_refcount_fetch(ext2_refcount_t refcount, blk_t blk,
+errcode_t ea_refcount_fetch(ext2_refcount_t refcount, blk64_t blk,
 				int *ret)
 {
 	struct ea_refcount_el	*el;
@@ -220,7 +220,7 @@ errcode_t ea_refcount_fetch(ext2_refcount_t refcount, blk_t blk,
 	return 0;
 }
 
-errcode_t ea_refcount_increment(ext2_refcount_t refcount, blk_t blk, int *ret)
+errcode_t ea_refcount_increment(ext2_refcount_t refcount, blk64_t blk, int *ret)
 {
 	struct ea_refcount_el	*el;
 
@@ -234,7 +234,7 @@ errcode_t ea_refcount_increment(ext2_refcount_t refcount, blk_t blk, int *ret)
 	return 0;
 }
 
-errcode_t ea_refcount_decrement(ext2_refcount_t refcount, blk_t blk, int *ret)
+errcode_t ea_refcount_decrement(ext2_refcount_t refcount, blk64_t blk, int *ret)
 {
 	struct ea_refcount_el	*el;
 
@@ -249,7 +249,7 @@ errcode_t ea_refcount_decrement(ext2_refcount_t refcount, blk_t blk, int *ret)
 	return 0;
 }
 
-errcode_t ea_refcount_store(ext2_refcount_t refcount, blk_t blk, int count)
+errcode_t ea_refcount_store(ext2_refcount_t refcount, blk64_t blk, int count)
 {
 	struct ea_refcount_el	*el;
 
@@ -263,7 +263,7 @@ errcode_t ea_refcount_store(ext2_refcount_t refcount, blk_t blk, int count)
 	return 0;
 }
 
-blk_t ext2fs_get_refcount_size(ext2_refcount_t refcount)
+blk64_t ext2fs_get_refcount_size(ext2_refcount_t refcount)
 {
 	if (!refcount)
 		return 0;
@@ -277,7 +277,7 @@ void ea_refcount_intr_begin(ext2_refcount_t refcount)
 }
 
 
-blk_t ea_refcount_intr_next(ext2_refcount_t refcount,
+blk64_t ea_refcount_intr_next(ext2_refcount_t refcount,
 				int *ret)
 {
 	struct ea_refcount_el	*list;
@@ -370,7 +370,7 @@ int main(int argc, char **argv)
 	int	i = 0;
 	ext2_refcount_t refcount;
 	int		size, arg;
-	blk_t		blk;
+	blk64_t		blk;
 	errcode_t	retval;
 
 	while (1) {
@@ -394,7 +394,7 @@ int main(int argc, char **argv)
 			printf("Freeing refcount\n");
 			break;
 		case BCODE_STORE:
-			blk = (blk_t) bcode_program[i++];
+			blk = (blk64_t) bcode_program[i++];
 			arg = bcode_program[i++];
 			printf("Storing blk %u with value %d\n", blk, arg);
 			retval = ea_refcount_store(refcount, blk, arg);
@@ -403,7 +403,7 @@ int main(int argc, char **argv)
 					"while storing blk %u", blk);
 			break;
 		case BCODE_FETCH:
-			blk = (blk_t) bcode_program[i++];
+			blk = (blk64_t) bcode_program[i++];
 			retval = ea_refcount_fetch(refcount, blk, &arg);
 			if (retval)
 				com_err("ea_refcount_fetch", retval,
@@ -413,7 +413,7 @@ int main(int argc, char **argv)
 				       blk, arg);
 			break;
 		case BCODE_INCR:
-			blk = (blk_t) bcode_program[i++];
+			blk = (blk64_t) bcode_program[i++];
 			retval = ea_refcount_increment(refcount, blk, &arg);
 			if (retval)
 				com_err("ea_refcount_increment", retval,
@@ -423,7 +423,7 @@ int main(int argc, char **argv)
 				       blk, arg);
 			break;
 		case BCODE_DECR:
-			blk = (blk_t) bcode_program[i++];
+			blk = (blk64_t) bcode_program[i++];
 			retval = ea_refcount_decrement(refcount, blk, &arg);
 			if (retval)
 				com_err("ea_refcount_decrement", retval,



* [PATCH 10/25] debugfs: handle 64bit block numbers
  2013-10-18  4:48 [PATCH v2 00/25] e2fsprogs patchbomb 10/2013 Darrick J. Wong
                   ` (8 preceding siblings ...)
  2013-10-18  4:49 ` [PATCH 09/25] e2fsck: teach EA refcounting code to handle 64bit block addresses Darrick J. Wong
@ 2013-10-18  4:50 ` Darrick J. Wong
  2013-10-18 18:47   ` Darrick J. Wong
  2013-11-25  8:33   ` Zheng Liu
  2013-10-18  4:50 ` [PATCH 11/25] libext2fs: only punch complete clusters Darrick J. Wong
                   ` (15 subsequent siblings)
  25 siblings, 2 replies; 73+ messages in thread
From: Darrick J. Wong @ 2013-10-18  4:50 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

debugfs should use strtoull wrappers for reading block numbers from
the command line.  "unsigned long" isn't wide enough to handle 64bit
block numbers on 32bit platforms.
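
For illustration only, a strtoull-based parser might look like the
sketch below.  parse_block64() is a hypothetical helper written for
this example; it is not the strtoblk() used in the diff, and its
return convention (0 on success, -1 on bad input) is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: parse a block number into a 64-bit value.
 * Returns 0 on success, -1 on malformed or out-of-range input.
 * Note that strtoull() itself accepts a leading minus sign; this
 * sketch does not reject it. */
static int parse_block64(const char *str, uint64_t *out)
{
	char *end;
	unsigned long long val;

	assert(str != NULL && out != NULL);
	errno = 0;
	val = strtoull(str, &end, 0);
	if (errno == ERANGE || end == str || *end != '\0')
		return -1;
	*out = (uint64_t) val;
	return 0;
}
```

Because unsigned long long is at least 64 bits wide on every platform,
a value such as 4294967296 parses correctly even where unsigned long is
only 32 bits.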

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 debugfs/debugfs.c      |   33 ++++++++++++++++++++++-----------
 debugfs/extent_inode.c |   22 +++++++++-------------
 debugfs/util.c         |    2 +-
 3 files changed, 32 insertions(+), 25 deletions(-)


diff --git a/debugfs/debugfs.c b/debugfs/debugfs.c
index d3db356..46fcd07 100644
--- a/debugfs/debugfs.c
+++ b/debugfs/debugfs.c
@@ -181,8 +181,7 @@ void do_open_filesys(int argc, char **argv)
 				return;
 			break;
 		case 's':
-			superblock = parse_ulong(optarg, argv[0],
-						 "superblock number", &err);
+			err = strtoblk(argv[0], optarg, &superblock);
 			if (err)
 				return;
 			break;
@@ -277,14 +276,17 @@ void do_init_filesys(int argc, char **argv)
 	struct ext2_super_block param;
 	errcode_t	retval;
 	int		err;
+	blk64_t		blocks;
 
 	if (common_args_process(argc, argv, 3, 3, "initialize",
 				"<device> <blocksize>", CHECK_FS_NOTOPEN))
 		return;
 
 	memset(&param, 0, sizeof(struct ext2_super_block));
-	ext2fs_blocks_count_set(&param, parse_ulong(argv[2], argv[0],
-						    "blocks count", &err));
+	err = strtoblk(argv[0], argv[2], &blocks);
+	if (err)
+		return;
+	ext2fs_blocks_count_set(&param, blocks);
 	if (err)
 		return;
 	retval = ext2fs_initialize(argv[1], 0, &param,
@@ -2109,7 +2111,9 @@ void do_bmap(int argc, char *argv[])
 	ino = string_to_inode(argv[1]);
 	if (!ino)
 		return;
-	blk = parse_ulong(argv[2], argv[0], "logical_block", &err);
+	err = strtoblk(argv[0], argv[2], &blk);
+	if (err)
+		return;
 
 	errcode = ext2fs_bmap2(current_fs, ino, 0, 0, 0, blk, 0, &pblk);
 	if (errcode) {
@@ -2254,10 +2258,14 @@ void do_punch(int argc, char *argv[])
 	ino = string_to_inode(argv[1]);
 	if (!ino)
 		return;
-	start = parse_ulong(argv[2], argv[0], "logical_block", &err);
-	if (argc == 4)
-		end = parse_ulong(argv[3], argv[0], "logical_block", &err);
-	else
+	err = strtoblk(argv[0], argv[2], &start);
+	if (err)
+		return;
+	if (argc == 4) {
+		err = strtoblk(argv[0], argv[3], &end);
+		if (err)
+			return;
+	} else
 		end = ~0;
 
 	errcode = ext2fs_punch(current_fs, ino, 0, 0, start, end);
@@ -2474,8 +2482,11 @@ int main(int argc, char **argv)
 						"block size", 0);
 			break;
 		case 's':
-			superblock = parse_ulong(optarg, argv[0],
-						 "superblock number", 0);
+			retval = strtoblk(argv[0], optarg, &superblock);
+			if (retval) {
+				com_err(argv[0], retval, 0, debug_prog_name);
+				return 1;
+			}
 			break;
 		case 'c':
 			catastrophic = 1;
diff --git a/debugfs/extent_inode.c b/debugfs/extent_inode.c
index 0bbc4c5..75e328c 100644
--- a/debugfs/extent_inode.c
+++ b/debugfs/extent_inode.c
@@ -264,7 +264,7 @@ void do_replace_node(int argc, char *argv[])
 		return;
 	}
 
-	extent.e_lblk = parse_ulong(argv[1], argv[0], "logical block", &err);
+	err = strtoblk(argv[0], argv[1], &extent.e_lblk);
 	if (err)
 		return;
 
@@ -272,7 +272,7 @@ void do_replace_node(int argc, char *argv[])
 	if (err)
 		return;
 
-	extent.e_pblk = parse_ulong(argv[3], argv[0], "logical block", &err);
+	err = strtoblk(argv[0], argv[3], &extent.e_pblk);
 	if (err)
 		return;
 
@@ -338,8 +338,7 @@ void do_insert_node(int argc, char *argv[])
 		return;
 	}
 
-	extent.e_lblk = parse_ulong(argv[1], cmd,
-				    "logical block", &err);
+	err = strtoblk(cmd, argv[1], &extent.e_lblk);
 	if (err)
 		return;
 
@@ -348,8 +347,7 @@ void do_insert_node(int argc, char *argv[])
 	if (err)
 		return;
 
-	extent.e_pblk = parse_ulong(argv[3], cmd,
-				    "pysical block", &err);
+	err = strtoblk(cmd, argv[3], &extent.e_pblk);
 	if (err)
 		return;
 
@@ -366,8 +364,8 @@ void do_set_bmap(int argc, char **argv)
 	const char	*usage = "[--uninit] <lblk> <pblk>";
 	struct ext2fs_extent extent;
 	errcode_t	retval;
-	blk_t		logical;
-	blk_t		physical;
+	blk64_t		logical;
+	blk64_t		physical;
 	char		*cmd = argv[0];
 	int		flags = 0;
 	int		err;
@@ -387,18 +385,16 @@ void do_set_bmap(int argc, char **argv)
 		return;
 	}
 
-	logical = parse_ulong(argv[1], cmd,
-				    "logical block", &err);
+	err = strtoblk(cmd, argv[1], &logical);
 	if (err)
 		return;
 
-	physical = parse_ulong(argv[2], cmd,
-				    "physical block", &err);
+	err = strtoblk(cmd, argv[2], &physical);
 	if (err)
 		return;
 
 	retval = ext2fs_extent_set_bmap(current_handle, logical,
-					(blk64_t) physical, flags);
+					physical, flags);
 	if (retval) {
 		com_err(cmd, retval, 0);
 		return;
diff --git a/debugfs/util.c b/debugfs/util.c
index cf3a6c6..09088e0 100644
--- a/debugfs/util.c
+++ b/debugfs/util.c
@@ -377,7 +377,7 @@ int common_block_args_process(int argc, char *argv[],
 	}
 
 	if (argc > 2) {
-		*count = parse_ulong(argv[2], argv[0], "count", &err);
+		err = strtoblk(argv[0], argv[2], count);
 		if (err)
 			return 1;
 	}



* [PATCH 11/25] libext2fs: only punch complete clusters
  2013-10-18  4:48 [PATCH v2 00/25] e2fsprogs patchbomb 10/2013 Darrick J. Wong
                   ` (9 preceding siblings ...)
  2013-10-18  4:50 ` [PATCH 10/25] debugfs: handle 64bit block numbers Darrick J. Wong
@ 2013-10-18  4:50 ` Darrick J. Wong
  2013-10-18 18:55   ` Darrick J. Wong
  2013-11-25  8:51   ` Zheng Liu
  2013-10-18  4:50 ` [PATCH 12/25] libext2fs: don't update the summary counts when doing implied cluster allocation Darrick J. Wong
                   ` (14 subsequent siblings)
  25 siblings, 2 replies; 73+ messages in thread
From: Darrick J. Wong @ 2013-10-18  4:50 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

When bigalloc is enabled, using ext2fs_block_alloc_stats2() to free
any block in a cluster has the effect of freeing the entire cluster.
This is problematic if a caller instructs us to punch, say, blocks
12-15 of a 16-block cluster, because blocks 0-11 now point to a "free"
cluster.

The naive way to solve this problem is to see if any of the other
blocks in this logical cluster map to a physical cluster.  If so, then
we know that the cluster is still in use and it mustn't be freed.
Otherwise, we are punching the last mapped block in this cluster, so
we can free the cluster.

The implementation given only does the rigorous checks for the partial
clusters at the beginning and end of the punching range.
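
The split into a partial head, whole clusters, and a partial tail can
be sketched as follows.  This is only the arithmetic, under the
assumption that the cluster ratio is a power of two; struct punch_split
and split_range() are hypothetical names for this example and do not
appear in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* How a run of blocks divides along cluster boundaries. */
struct punch_split {
	uint32_t head;	/* blocks before the first cluster boundary */
	uint32_t whole;	/* complete clusters in the middle */
	uint32_t tail;	/* blocks left over after the last boundary */
};

/* Split the run [start, start + count) for a given cluster ratio.
 * Only the head and tail need the "is anything else in this cluster
 * still mapped?" check; whole clusters can be freed outright. */
static struct punch_split split_range(uint64_t start, uint32_t count,
				      uint32_t ratio)
{
	struct punch_split s = { 0, 0, 0 };
	uint32_t offset;

	assert(ratio > 0 && (ratio & (ratio - 1)) == 0);
	offset = (uint32_t) (start & (ratio - 1));
	if (offset) {
		s.head = ratio - offset;
		if (s.head > count)
			s.head = count;
		count -= s.head;
	}
	s.whole = count / ratio;
	s.tail = count % ratio;
	return s;
}
```

For the case in the description (blocks 12-15 of a 16-block cluster),
the whole run is a partial head, so the cluster may only be freed if
blocks 0-11 are not mapped.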

v2: Refactor the block free code into a separate helper function that
should be more efficient.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 lib/ext2fs/bmap.c   |   29 ++++++++++++++++++
 lib/ext2fs/ext2fs.h |    3 ++
 lib/ext2fs/punch.c  |   82 ++++++++++++++++++++++++++++++++++++++++++++++++---
 3 files changed, 109 insertions(+), 5 deletions(-)


diff --git a/lib/ext2fs/bmap.c b/lib/ext2fs/bmap.c
index 5074587..80f8f86 100644
--- a/lib/ext2fs/bmap.c
+++ b/lib/ext2fs/bmap.c
@@ -173,6 +173,35 @@ static errcode_t implied_cluster_alloc(ext2_filsys fs, ext2_ino_t ino,
 	return 0;
 }
 
+/* Try to map a logical block to an already-allocated physical cluster. */
+errcode_t ext2fs_map_cluster_block(ext2_filsys fs, ext2_ino_t ino,
+				   struct ext2_inode *inode, blk64_t lblk,
+				   blk64_t *pblk)
+{
+	ext2_extent_handle_t handle;
+	errcode_t retval;
+
+	/* Need bigalloc and extents to be enabled */
+	*pblk = 0;
+	if (!EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
+					EXT4_FEATURE_RO_COMPAT_BIGALLOC) ||
+	    !(inode->i_flags & EXT4_EXTENTS_FL))
+		return 0;
+
+	retval = ext2fs_extent_open2(fs, ino, inode, &handle);
+	if (retval)
+		goto out;
+
+	retval = implied_cluster_alloc(fs, ino, inode, handle, lblk, pblk);
+	if (retval)
+		goto out2;
+
+out2:
+	ext2fs_extent_free(handle);
+out:
+	return retval;
+}
+
 static errcode_t extent_bmap(ext2_filsys fs, ext2_ino_t ino,
 			     struct ext2_inode *inode,
 			     ext2_extent_handle_t handle,
diff --git a/lib/ext2fs/ext2fs.h b/lib/ext2fs/ext2fs.h
index 8f82dae..5247922 100644
--- a/lib/ext2fs/ext2fs.h
+++ b/lib/ext2fs/ext2fs.h
@@ -924,6 +924,9 @@ extern errcode_t ext2fs_bmap2(ext2_filsys fs, ext2_ino_t ino,
 			      struct ext2_inode *inode,
 			      char *block_buf, int bmap_flags, blk64_t block,
 			      int *ret_flags, blk64_t *phys_blk);
+errcode_t ext2fs_map_cluster_block(ext2_filsys fs, ext2_ino_t ino,
+				   struct ext2_inode *inode, blk64_t lblk,
+				   blk64_t *pblk);
 
 #if 0
 /* bmove.c */
diff --git a/lib/ext2fs/punch.c b/lib/ext2fs/punch.c
index 790a0ad8..1e4398e 100644
--- a/lib/ext2fs/punch.c
+++ b/lib/ext2fs/punch.c
@@ -177,6 +177,75 @@ static void dbg_print_extent(char *desc, struct ext2fs_extent *extent)
 #define dbg_printf(f, a...)		do { } while (0)
 #endif
 
+/* Free a range of blocks, respecting cluster boundaries */
+static errcode_t punch_extent_blocks(ext2_filsys fs, ext2_ino_t ino,
+				     struct ext2_inode *inode,
+				     blk64_t lfree_start, blk64_t free_start,
+				     __u32 free_count, int *freed)
+{
+	blk64_t		pblk;
+	int		freed_now = 0;
+	__u32		cluster_freed;
+	errcode_t	retval = 0;
+
+	/* No bigalloc?  Just free each block. */
+	if (EXT2FS_CLUSTER_RATIO(fs) == 1) {
+		*freed += free_count;
+		while (free_count-- > 0)
+			ext2fs_block_alloc_stats2(fs, free_start++, -1);
+		return retval;
+	}
+
+	/*
+	 * Try to free up to the next cluster boundary.  We assume that all
+	 * blocks in a logical cluster map to blocks from the same physical
+	 * cluster, and that the offsets within the [pl]clusters match.
+	 */
+	if (free_start & EXT2FS_CLUSTER_MASK(fs)) {
+		retval = ext2fs_map_cluster_block(fs, ino, inode,
+						  lfree_start, &pblk);
+		if (retval)
+			goto errout;
+		if (!pblk) {
+			ext2fs_block_alloc_stats2(fs, free_start, -1);
+			freed_now++;
+		}
+		cluster_freed = EXT2FS_CLUSTER_RATIO(fs) -
+			(free_start & EXT2FS_CLUSTER_MASK(fs));
+		if (cluster_freed > free_count)
+			cluster_freed = free_count;
+		free_count -= cluster_freed;
+		free_start += cluster_freed;
+		lfree_start += cluster_freed;
+	}
+
+	/* Free whole clusters from the middle of the range. */
+	while (free_count > 0 && free_count >= EXT2FS_CLUSTER_RATIO(fs)) {
+		ext2fs_block_alloc_stats2(fs, free_start, -1);
+		freed_now++;
+		cluster_freed = EXT2FS_CLUSTER_RATIO(fs);
+		free_count -= cluster_freed;
+		free_start += cluster_freed;
+		lfree_start += cluster_freed;
+	}
+
+	/* Try to free the last cluster. */
+	if (free_count > 0) {
+		retval = ext2fs_map_cluster_block(fs, ino, inode,
+						  lfree_start, &pblk);
+		if (retval)
+			goto errout;
+		if (!pblk) {
+			ext2fs_block_alloc_stats2(fs, free_start, -1);
+			freed_now++;
+		}
+	}
+
+errout:
+	*freed += freed_now;
+	return retval;
+}
+
 static errcode_t ext2fs_punch_extent(ext2_filsys fs, ext2_ino_t ino,
 				     struct ext2_inode *inode,
 				     blk64_t start, blk64_t end)
@@ -184,7 +253,7 @@ static errcode_t ext2fs_punch_extent(ext2_filsys fs, ext2_ino_t ino,
 	ext2_extent_handle_t	handle = 0;
 	struct ext2fs_extent	extent;
 	errcode_t		retval;
-	blk64_t			free_start, next;
+	blk64_t			free_start, next, lfree_start;
 	__u32			free_count, newlen;
 	int			freed = 0;
 	int			op;
@@ -211,6 +280,7 @@ static errcode_t ext2fs_punch_extent(ext2_filsys fs, ext2_ino_t ino,
 			/* Start of deleted region before extent; 
 			   adjust beginning of extent */
 			free_start = extent.e_pblk;
+			lfree_start = extent.e_lblk;
 			if (next > end)
 				free_count = end - extent.e_lblk + 1;
 			else
@@ -226,6 +296,7 @@ static errcode_t ext2fs_punch_extent(ext2_filsys fs, ext2_ino_t ino,
 			dbg_printf("Case #%d\n", 2);
 			newlen = start - extent.e_lblk;
 			free_start = extent.e_pblk + newlen;
+			lfree_start = extent.e_lblk + newlen;
 			free_count = extent.e_len - newlen;
 			extent.e_len = newlen;
 		} else {
@@ -241,6 +312,7 @@ static errcode_t ext2fs_punch_extent(ext2_filsys fs, ext2_ino_t ino,
 
 			extent.e_len = start - extent.e_lblk;
 			free_start = extent.e_pblk + extent.e_len;
+			lfree_start = extent.e_lblk + extent.e_len;
 			free_count = end - start + 1;
 
 			dbg_print_extent("inserting", &newex);
@@ -281,10 +353,10 @@ static errcode_t ext2fs_punch_extent(ext2_filsys fs, ext2_ino_t ino,
 			goto errout;
 		dbg_printf("Free start %llu, free count = %u\n",
 		       free_start, free_count);
-		while (free_count-- > 0) {
-			ext2fs_block_alloc_stats2(fs, free_start++, -1);
-			freed++;
-		}
+		retval = punch_extent_blocks(fs, ino, inode, lfree_start,
+					     free_start, free_count, &freed);
+		if (retval)
+			goto errout;
 	next_extent:
 		retval = ext2fs_extent_get(handle, op,
 					   &extent);



* [PATCH 12/25] libext2fs: don't update the summary counts when doing implied cluster allocation
  2013-10-18  4:48 [PATCH v2 00/25] e2fsprogs patchbomb 10/2013 Darrick J. Wong
                   ` (10 preceding siblings ...)
  2013-10-18  4:50 ` [PATCH 11/25] libext2fs: only punch complete clusters Darrick J. Wong
@ 2013-10-18  4:50 ` Darrick J. Wong
  2013-11-25  9:03   ` Zheng Liu
  2013-10-18  4:50 ` [PATCH 13/25] libext2fs: use ext2fs_punch() to truncate quota file Darrick J. Wong
                   ` (13 subsequent siblings)
  25 siblings, 1 reply; 73+ messages in thread
From: Darrick J. Wong @ 2013-10-18  4:50 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

When we're appending a block to a directory file or the journal file,
and the new block is part of a cluster that has already been allocated
to the file (implied cluster allocation), don't update the bitmap or
the summary counts because that was performed when the cluster was
allocated.
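
A small worked example of why per-block accounting overcounts: the
sketch below computes how many new clusters an append really consumes.
It assumes the file is mapped contiguously from logical block 0 and
that logical and physical cluster offsets line up; the function names
are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdint.h>

static uint64_t div_round_up(uint64_t n, uint32_t d)
{
	return (n + d - 1) / d;
}

/* Number of clusters that must be newly allocated when nblocks are
 * appended starting at logical block first_lblk.  Blocks that land in
 * a cluster the file already owns (implied cluster allocation) cost
 * nothing, so they must not touch the bitmap or the summary counts. */
static uint64_t new_clusters_for_append(uint64_t first_lblk,
					uint64_t nblocks, uint32_t ratio)
{
	assert(ratio > 0);
	return div_round_up(first_lblk + nblocks, ratio) -
	       div_round_up(first_lblk, ratio);
}
```

Appending blocks 5-7 with a ratio of 16 consumes no new cluster, while
accounting once per block would have charged three.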

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 lib/ext2fs/expanddir.c |    2 +-
 lib/ext2fs/mkjournal.c |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)


diff --git a/lib/ext2fs/expanddir.c b/lib/ext2fs/expanddir.c
index 22558d6..09a15fa 100644
--- a/lib/ext2fs/expanddir.c
+++ b/lib/ext2fs/expanddir.c
@@ -55,6 +55,7 @@ static int expand_dir_proc(ext2_filsys	fs,
 			return BLOCK_ABORT;
 		}
 		es->newblocks++;
+		ext2fs_block_alloc_stats2(fs, new_blk, +1);
 	}
 	if (blockcnt > 0) {
 		retval = ext2fs_new_dir_block(fs, 0, 0, &block);
@@ -82,7 +83,6 @@ static int expand_dir_proc(ext2_filsys	fs,
 	}
 	ext2fs_free_mem(&block);
 	*blocknr = new_blk;
-	ext2fs_block_alloc_stats2(fs, new_blk, +1);
 
 	if (es->done)
 		return (BLOCK_CHANGED | BLOCK_ABORT);
diff --git a/lib/ext2fs/mkjournal.c b/lib/ext2fs/mkjournal.c
index 2afd3b7..8bf4670 100644
--- a/lib/ext2fs/mkjournal.c
+++ b/lib/ext2fs/mkjournal.c
@@ -250,6 +250,7 @@ static int mkjournal_proc(ext2_filsys	fs,
 			es->err = retval;
 			return BLOCK_ABORT;
 		}
+		ext2fs_block_alloc_stats2(fs, new_blk, +1);
 		es->newblocks++;
 	}
 	if (blockcnt >= 0)
@@ -285,7 +286,6 @@ static int mkjournal_proc(ext2_filsys	fs,
 		return BLOCK_ABORT;
 	}
 	*blocknr = es->goal = new_blk;
-	ext2fs_block_alloc_stats2(fs, new_blk, +1);
 
 	if (es->num_blocks == 0)
 		return (BLOCK_CHANGED | BLOCK_ABORT);



* [PATCH 13/25] libext2fs: use ext2fs_punch() to truncate quota file
  2013-10-18  4:48 [PATCH v2 00/25] e2fsprogs patchbomb 10/2013 Darrick J. Wong
                   ` (11 preceding siblings ...)
  2013-10-18  4:50 ` [PATCH 12/25] libext2fs: don't update the summary counts when doing implied cluster allocation Darrick J. Wong
@ 2013-10-18  4:50 ` Darrick J. Wong
  2013-11-25  9:08   ` Zheng Liu
  2013-10-18  4:50 ` [PATCH 14/25] e2fsck: only release clusters when shortening a directory during a rehash Darrick J. Wong
                   ` (12 subsequent siblings)
  25 siblings, 1 reply; 73+ messages in thread
From: Darrick J. Wong @ 2013-10-18  4:50 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

Use the new ext2fs_punch() call to truncate the quota file.  This also
eliminates the need to fix the open-coded block release routine to work
with bigalloc.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 lib/quota/quotaio.c |   19 +++----------------
 1 file changed, 3 insertions(+), 16 deletions(-)


diff --git a/lib/quota/quotaio.c b/lib/quota/quotaio.c
index 8ddb92a..1bdcba6 100644
--- a/lib/quota/quotaio.c
+++ b/lib/quota/quotaio.c
@@ -98,19 +98,6 @@ void update_grace_times(struct dquot *q)
 	}
 }
 
-static int release_blocks_proc(ext2_filsys fs, blk64_t *blocknr,
-			       e2_blkcnt_t blockcnt EXT2FS_ATTR((unused)),
-			       blk64_t ref_block EXT2FS_ATTR((unused)),
-			       int ref_offset EXT2FS_ATTR((unused)),
-			       void *private EXT2FS_ATTR((unused)))
-{
-	blk64_t	block;
-
-	block = *blocknr;
-	ext2fs_block_alloc_stats2(fs, block, -1);
-	return 0;
-}
-
 static int compute_num_blocks_proc(ext2_filsys fs, blk64_t *blocknr,
 			       e2_blkcnt_t blockcnt EXT2FS_ATTR((unused)),
 			       blk64_t ref_block EXT2FS_ATTR((unused)),
@@ -135,9 +122,9 @@ errcode_t quota_inode_truncate(ext2_filsys fs, ext2_ino_t ino)
 		inode.i_dtime = fs->now ? fs->now : time(0);
 		if (!ext2fs_inode_has_valid_blocks2(fs, &inode))
 			return 0;


* [PATCH 14/25] e2fsck: only release clusters when shortening a directory during a rehash
  2013-10-18  4:48 [PATCH v2 00/25] e2fsprogs patchbomb 10/2013 Darrick J. Wong
                   ` (12 preceding siblings ...)
  2013-10-18  4:50 ` [PATCH 13/25] libext2fs: use ext2fs_punch() to truncate quota file Darrick J. Wong
@ 2013-10-18  4:50 ` Darrick J. Wong
  2013-11-25 11:09   ` Zheng Liu
  2013-10-18  4:50 ` [PATCH 15/25] e2fsck: print cluster ranges when encountering bitmap errors Darrick J. Wong
                   ` (11 subsequent siblings)
  25 siblings, 1 reply; 73+ messages in thread
From: Darrick J. Wong @ 2013-10-18  4:50 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

When the rehash process is running on a bigalloc filesystem, it
compresses all the directory entries and hash structures into the
beginning of the directory file and then uses block_iterate3() to free
the blocks off the end of the file.  It seems to call
ext2fs_block_alloc_stats2() for every block in a cluster, which is
unfortunate because this function allocates and frees entire clusters
(and updates the summary counts accordingly).  As a result, e2fsck
writes out incorrect summary counts.  Fix this by releasing a cluster
only when we reach its first block.
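
The effect of releasing only at cluster-aligned block numbers can be
sketched with a simple counter.  clusters_released() is a hypothetical
helper for this example; it only models the counting, not the bitmap
updates.

```c
#include <assert.h>
#include <stdint.h>

/* Count how many clusters are released while walking the physical
 * blocks [first, first + count), releasing a cluster only when the
 * block number is a multiple of the cluster ratio. */
static uint32_t clusters_released(uint64_t first, uint32_t count,
				  uint32_t ratio)
{
	uint32_t released = 0;
	uint32_t i;

	assert(ratio > 0);
	for (i = 0; i < count; i++)
		if ((first + i) % ratio == 0)
			released++;
	return released;
}
```

Walking 32 blocks with a ratio of 16 releases 2 clusters instead of
decrementing the counts 32 times.  The sketch also shows the caveat
behind the "in theory" wording in the diff: a run that starts in the
middle of a cluster and never reaches an aligned block releases
nothing.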

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 e2fsck/rehash.c |   14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)


diff --git a/e2fsck/rehash.c b/e2fsck/rehash.c
index 6ef3568..29da9a1 100644
--- a/e2fsck/rehash.c
+++ b/e2fsck/rehash.c
@@ -719,10 +719,18 @@ static int write_dir_block(ext2_filsys fs,
 		/* We don't need this block, so release it */
 		e2fsck_read_bitmaps(wd->ctx);
 		blk = *block_nr;
-		ext2fs_unmark_block_bitmap2(wd->ctx->block_found_map, blk);
-		ext2fs_block_alloc_stats2(fs, blk, -1);
+		/*
+		 * In theory, we only release blocks from the end of the
+		 * directory file, so it's fine to clobber a whole cluster at
+		 * once.
+		 */
+		if (blk % EXT2FS_CLUSTER_RATIO(fs) == 0) {
+			ext2fs_unmark_block_bitmap2(wd->ctx->block_found_map,
+						    blk);
+			ext2fs_block_alloc_stats2(fs, blk, -1);
+			wd->cleared++;
+		}
 		*block_nr = 0;
-		wd->cleared++;
 		return BLOCK_CHANGED;
 	}
 



* [PATCH 15/25] e2fsck: print cluster ranges when encountering bitmap errors
  2013-10-18  4:48 [PATCH v2 00/25] e2fsprogs patchbomb 10/2013 Darrick J. Wong
                   ` (13 preceding siblings ...)
  2013-10-18  4:50 ` [PATCH 14/25] e2fsck: only release clusters when shortening a directory during a rehash Darrick J. Wong
@ 2013-10-18  4:50 ` Darrick J. Wong
  2013-11-25 11:56   ` Zheng Liu
  2013-10-18  4:50 ` [PATCH 16/25] resize2fs: convert fs to and from 64bit mode Darrick J. Wong
                   ` (10 subsequent siblings)
  25 siblings, 1 reply; 73+ messages in thread
From: Darrick J. Wong @ 2013-10-18  4:50 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

If pass5 finds bitmap errors in a range of clusters, don't print each
cluster number individually when we could print only the start and end
cluster number.  e2fsck already does this for the non-bigalloc case.
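
The range merging can be sketched as below: consecutive error entries
are folded into one range when they are exactly one cluster apart.
struct blk_range and coalesce_ranges() are hypothetical names for this
example, and the input is assumed to be sorted in ascending order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct blk_range {
	uint64_t start;
	uint64_t end;
};

/* Fold sorted block numbers that are exactly stride apart into
 * [start, end] ranges.  out must have room for n entries.  Returns
 * the number of ranges written. */
static size_t coalesce_ranges(const uint64_t *blks, size_t n,
			      uint32_t stride, struct blk_range *out)
{
	size_t nr = 0;
	size_t i;

	assert(stride > 0);
	for (i = 0; i < n; i++) {
		if (nr > 0 && out[nr - 1].end + stride == blks[i]) {
			out[nr - 1].end = blks[i];
		} else {
			out[nr].start = out[nr].end = blks[i];
			nr++;
		}
	}
	return nr;
}
```

With a stride of 1 (the non-bigalloc case) cluster-spaced entries never
merge, which is why the comparison has to use the cluster ratio.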

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 e2fsck/pass5.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)


diff --git a/e2fsck/pass5.c b/e2fsck/pass5.c
index 346c831..30dc70a 100644
--- a/e2fsck/pass5.c
+++ b/e2fsck/pass5.c
@@ -528,8 +528,8 @@ redo_counts:
 			save_problem = problem;
 		} else {
 			if ((problem == save_problem) &&
-			    (pctx.blk2 == i-1))
-				pctx.blk2++;
+			    (pctx.blk2 == i - EXT2FS_CLUSTER_RATIO(fs)))
+				pctx.blk2 += EXT2FS_CLUSTER_RATIO(fs);
 			else {
 				print_bitmap_problem(ctx, save_problem, &pctx);
 				pctx.blk = pctx.blk2 = i;



* [PATCH 16/25] resize2fs: convert fs to and from 64bit mode
  2013-10-18  4:48 [PATCH v2 00/25] e2fsprogs patchbomb 10/2013 Darrick J. Wong
                   ` (14 preceding siblings ...)
  2013-10-18  4:50 ` [PATCH 15/25] e2fsck: print cluster ranges when encountering bitmap errors Darrick J. Wong
@ 2013-10-18  4:50 ` Darrick J. Wong
  2013-10-18 18:59   ` Darrick J. Wong
  2013-11-26  6:44   ` Zheng Liu
  2013-10-18  4:50 ` [PATCH 17/25] resize2fs: when toggling 64bit, don't free in-use bg data clusters Darrick J. Wong
                   ` (9 subsequent siblings)
  25 siblings, 2 replies; 73+ messages in thread
From: Darrick J. Wong @ 2013-10-18  4:50 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

resize2fs does its magic by loading a filesystem, duplicating the
in-memory image of that fs, moving relevant blocks out of the way of
whatever new metadata get created, and finally writing everything back
out to disk.  Enabling 64bit mode enlarges the group descriptors,
which makes resize2fs a reasonable vehicle for taking care of the rest
of the bookkeeping requirements, so add to resize2fs the ability to
convert a filesystem to 64bit mode and back.
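
To see why the conversion forces blocks to move, consider how many
blocks the group descriptor table needs before and after.  The sketch
below assumes 32-byte descriptors without the 64bit feature and 64-byte
descriptors with it; the description above only says the descriptors
get larger, so those sizes are an assumption here.  gdt_blocks() is a
hypothetical helper for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Number of filesystem blocks needed to hold one descriptor per
 * block group, for a given block size and descriptor size. */
static uint64_t gdt_blocks(uint64_t groups, uint32_t blocksize,
			   uint32_t desc_size)
{
	uint32_t per_block;

	assert(desc_size > 0 && blocksize >= desc_size);
	per_block = blocksize / desc_size;
	return (groups + per_block - 1) / per_block;
}
```

For 1024 groups and 4096-byte blocks the table grows from 8 blocks to
16, so whatever currently lives in the extra blocks has to be moved out
of the way first.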

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 resize/main.c         |   40 ++++++-
 resize/resize2fs.8.in |   18 +++
 resize/resize2fs.c    |  282 ++++++++++++++++++++++++++++++++++++++++++++++++-
 resize/resize2fs.h    |    3 +
 4 files changed, 336 insertions(+), 7 deletions(-)


diff --git a/resize/main.c b/resize/main.c
index 1394ae1..ad0c946 100644
--- a/resize/main.c
+++ b/resize/main.c
@@ -41,7 +41,7 @@ char *program_name, *device_name, *io_options;
 static void usage (char *prog)
 {
 	fprintf (stderr, _("Usage: %s [-d debug_flags] [-f] [-F] [-M] [-P] "
-			   "[-p] device [new_size]\n\n"), prog);
+			   "[-p] device [-b|-s|new_size]\n\n"), prog);
 
 	exit (1);
 }
@@ -199,7 +199,7 @@ int main (int argc, char ** argv)
 	if (argc && *argv)
 		program_name = *argv;
 
-	while ((c = getopt (argc, argv, "d:fFhMPpS:")) != EOF) {
+	while ((c = getopt(argc, argv, "d:fFhMPpS:bs")) != EOF) {
 		switch (c) {
 		case 'h':
 			usage(program_name);
@@ -225,6 +225,12 @@ int main (int argc, char ** argv)
 		case 'S':
 			use_stride = atoi(optarg);
 			break;
+		case 'b':
+			flags |= RESIZE_ENABLE_64BIT;
+			break;
+		case 's':
+			flags |= RESIZE_DISABLE_64BIT;
+			break;
 		default:
 			usage(program_name);
 		}
@@ -383,6 +389,10 @@ int main (int argc, char ** argv)
 		if (sys_page_size > fs->blocksize)
 			new_size &= ~((sys_page_size / fs->blocksize)-1);
 	}
+	/* If changing 64bit, don't change the filesystem size. */
+	if (flags & (RESIZE_DISABLE_64BIT | RESIZE_ENABLE_64BIT)) {
+		new_size = ext2fs_blocks_count(fs->super);
+	}
 	if (!EXT2_HAS_INCOMPAT_FEATURE(fs->super,
 				       EXT4_FEATURE_INCOMPAT_64BIT)) {
 		/* Take 16T down to 2^32-1 blocks */
@@ -434,7 +444,31 @@ int main (int argc, char ** argv)
 			fs->blocksize / 1024, new_size);
 		exit(1);
 	}
-	if (new_size == ext2fs_blocks_count(fs->super)) {
+	if (flags & RESIZE_DISABLE_64BIT && flags & RESIZE_ENABLE_64BIT) {
+		fprintf(stderr, _("Cannot set and unset 64bit feature.\n"));
+		exit(1);
+	} else if (flags & (RESIZE_DISABLE_64BIT | RESIZE_ENABLE_64BIT)) {
+		new_size = ext2fs_blocks_count(fs->super);
+		if (new_size >= (1ULL << 32)) {
+			fprintf(stderr, _("Cannot change the 64bit feature "
+				"on a filesystem that is larger than "
+				"2^32 blocks.\n"));
+			exit(1);
+		}
+		if (mount_flags & EXT2_MF_MOUNTED) {
+			fprintf(stderr, _("Cannot change the 64bit feature "
+				"while the filesystem is mounted.\n"));
+			exit(1);
+		}
+		if (flags & RESIZE_ENABLE_64BIT &&
+		    !EXT2_HAS_INCOMPAT_FEATURE(fs->super,
+				EXT3_FEATURE_INCOMPAT_EXTENTS)) {
+			fprintf(stderr, _("Please enable the extents feature "
+				"with tune2fs before enabling the 64bit "
+				"feature.\n"));
+			exit(1);
+		}
+	} else if (new_size == ext2fs_blocks_count(fs->super)) {
 		fprintf(stderr, _("The filesystem is already %llu blocks "
 			"long.  Nothing to do!\n\n"), new_size);
 		exit(0);
diff --git a/resize/resize2fs.8.in b/resize/resize2fs.8.in
index a1f3099..1c75816 100644
--- a/resize/resize2fs.8.in
+++ b/resize/resize2fs.8.in
@@ -8,7 +8,7 @@ resize2fs \- ext2/ext3/ext4 file system resizer
 .SH SYNOPSIS
 .B resize2fs
 [
-.B \-fFpPM
+.B \-fFpPMbs
 ]
 [
 .B \-d
@@ -85,8 +85,21 @@ to shrink the size of filesystem.  Then you may use
 to shrink the size of the partition.  When shrinking the size of
 the partition, make sure you do not make it smaller than the new size
 of the ext2 filesystem!
+.PP
+The
+.B \-b
+and
+.B \-s
+options enable and disable the 64bit feature, respectively.  The resize2fs
+program will, of course, take care of resizing the block group descriptors
+and moving other data blocks out of the way, as needed.  It is not possible
+to resize the filesystem concurrent with changing the 64bit status.
 .SH OPTIONS
 .TP
+.B \-b
+Turns on the 64bit feature, resizes the group descriptors as necessary, and
+moves other metadata out of the way.
+.TP
 .B \-d \fIdebug-flags
 Turns on various resize2fs debugging features, if they have been compiled
 into the binary.
@@ -126,6 +139,9 @@ of what the program is doing.
 .B \-P
 Print the minimum size of the filesystem and exit.
 .TP
+.B \-s
+Turns off the 64bit feature and frees blocks that are no longer in use.
+.TP
 .B \-S \fIRAID-stride
 The
 .B resize2fs
diff --git a/resize/resize2fs.c b/resize/resize2fs.c
index 0feff0f..05ba6e1 100644
--- a/resize/resize2fs.c
+++ b/resize/resize2fs.c
@@ -53,6 +53,9 @@ static errcode_t ext2fs_calculate_summary_stats(ext2_filsys fs);
 static errcode_t fix_sb_journal_backup(ext2_filsys fs);
 static errcode_t mark_table_blocks(ext2_filsys fs,
 				   ext2fs_block_bitmap bmap);
+static errcode_t resize_group_descriptors(ext2_resize_t rfs, blk64_t new_size);
+static errcode_t move_bg_metadata(ext2_resize_t rfs);
+static errcode_t zero_high_bits_in_inodes(ext2_resize_t rfs);
 
 /*
  * Some helper CPP macros
@@ -119,13 +122,30 @@ errcode_t resize_fs(ext2_filsys fs, blk64_t *new_size, int flags,
 	if (retval)
 		goto errout;
 
+	init_resource_track(&rtrack, "resize_group_descriptors", fs->io);
+	retval = resize_group_descriptors(rfs, *new_size);
+	if (retval)
+		goto errout;
+	print_resource_track(rfs, &rtrack, fs->io);
+
+	init_resource_track(&rtrack, "move_bg_metadata", fs->io);
+	retval = move_bg_metadata(rfs);
+	if (retval)
+		goto errout;
+	print_resource_track(rfs, &rtrack, fs->io);
+
+	init_resource_track(&rtrack, "zero_high_bits_in_inodes", fs->io);
+	retval = zero_high_bits_in_inodes(rfs);
+	if (retval)
+		goto errout;
+	print_resource_track(rfs, &rtrack, fs->io);
+
 	init_resource_track(&rtrack, "adjust_superblock", fs->io);
 	retval = adjust_superblock(rfs, *new_size);
 	if (retval)
 		goto errout;
 	print_resource_track(rfs, &rtrack, fs->io);
 
-
 	init_resource_track(&rtrack, "fix_uninit_block_bitmaps 2", fs->io);
 	fix_uninit_block_bitmaps(rfs->new_fs);
 	print_resource_track(rfs, &rtrack, fs->io);
@@ -221,6 +241,259 @@ errout:
 	return retval;
 }
 
+/* Toggle 64bit mode */
+static errcode_t resize_group_descriptors(ext2_resize_t rfs, blk64_t new_size)
+{
+	void *o, *n, *new_group_desc;
+	dgrp_t i;
+	int copy_size;
+	errcode_t retval;
+
+	if (!(rfs->flags & (RESIZE_DISABLE_64BIT | RESIZE_ENABLE_64BIT)))
+		return 0;
+
+	if (new_size != ext2fs_blocks_count(rfs->new_fs->super) ||
+	    ext2fs_blocks_count(rfs->new_fs->super) >= (1ULL << 32) ||
+	    (rfs->flags & RESIZE_DISABLE_64BIT &&
+	     rfs->flags & RESIZE_ENABLE_64BIT))
+		return EXT2_ET_INVALID_ARGUMENT;
+
+	if (rfs->flags & RESIZE_DISABLE_64BIT) {
+		rfs->new_fs->super->s_feature_incompat &=
+				~EXT4_FEATURE_INCOMPAT_64BIT;
+		rfs->new_fs->super->s_desc_size = EXT2_MIN_DESC_SIZE;
+	} else if (rfs->flags & RESIZE_ENABLE_64BIT) {
+		rfs->new_fs->super->s_feature_incompat |=
+				EXT4_FEATURE_INCOMPAT_64BIT;
+		rfs->new_fs->super->s_desc_size = EXT2_MIN_DESC_SIZE_64BIT;
+	}
+
+	if (EXT2_DESC_SIZE(rfs->old_fs->super) ==
+	    EXT2_DESC_SIZE(rfs->new_fs->super))
+		return 0;
+
+	o = rfs->new_fs->group_desc;
+	rfs->new_fs->desc_blocks = ext2fs_div_ceil(
+			rfs->old_fs->group_desc_count,
+			EXT2_DESC_PER_BLOCK(rfs->new_fs->super));
+	retval = ext2fs_get_arrayzero(rfs->new_fs->desc_blocks,
+				      rfs->old_fs->blocksize, &new_group_desc);
+	if (retval)
+		return retval;
+
+	n = new_group_desc;
+
+	if (EXT2_DESC_SIZE(rfs->old_fs->super) <=
+	    EXT2_DESC_SIZE(rfs->new_fs->super))
+		copy_size = EXT2_DESC_SIZE(rfs->old_fs->super);
+	else
+		copy_size = EXT2_DESC_SIZE(rfs->new_fs->super);
+	for (i = 0; i < rfs->old_fs->group_desc_count; i++) {
+		memcpy(n, o, copy_size);
+		n += EXT2_DESC_SIZE(rfs->new_fs->super);
+		o += EXT2_DESC_SIZE(rfs->old_fs->super);
+	}
+
+	ext2fs_free_mem(&rfs->new_fs->group_desc);
+	rfs->new_fs->group_desc = new_group_desc;
+
+	for (i = 0; i < rfs->old_fs->group_desc_count; i++)
+		ext2fs_group_desc_csum_set(rfs->new_fs, i);
+
+	return 0;
+}
+
+/* Move bitmaps/inode tables out of the way. */
+static errcode_t move_bg_metadata(ext2_resize_t rfs)
+{
+	dgrp_t i;
+	blk64_t b, c, d;
+	ext2fs_block_bitmap old_map, new_map;
+	int old, new;
+	errcode_t retval;
+	int zero = 0, one = 1;
+
+	if (!(rfs->flags & (RESIZE_DISABLE_64BIT | RESIZE_ENABLE_64BIT)))
+		return 0;
+
+	retval = ext2fs_allocate_block_bitmap(rfs->old_fs, "oldfs", &old_map);
+	if (retval)
+		return retval;
+
+	retval = ext2fs_allocate_block_bitmap(rfs->new_fs, "newfs", &new_map);
+	if (retval)
+		goto out;
+
+	/* Construct bitmaps of super/descriptor blocks in old and new fs */
+	for (i = 0; i < rfs->old_fs->group_desc_count; i++) {
+		retval = ext2fs_super_and_bgd_loc2(rfs->old_fs, i, &b, &c, &d,
+						   NULL);
+		if (retval)
+			goto out;
+		ext2fs_mark_block_bitmap2(old_map, b);
+		ext2fs_mark_block_bitmap2(old_map, c);
+		ext2fs_mark_block_bitmap2(old_map, d);
+
+		retval = ext2fs_super_and_bgd_loc2(rfs->new_fs, i, &b, &c, &d,
+						   NULL);
+		if (retval)
+			goto out;
+		ext2fs_mark_block_bitmap2(new_map, b);
+		ext2fs_mark_block_bitmap2(new_map, c);
+		ext2fs_mark_block_bitmap2(new_map, d);
+	}
+
+	/* Find changes in block allocations for bg metadata */
+	for (b = 0;
+	     b < ext2fs_blocks_count(rfs->new_fs->super);
+	     b += EXT2FS_CLUSTER_RATIO(rfs->new_fs)) {
+		old = ext2fs_test_block_bitmap2(old_map, b);
+		new = ext2fs_test_block_bitmap2(new_map, b);
+
+		if (old && !new)
+			ext2fs_unmark_block_bitmap2(rfs->new_fs->block_map, b);
+		else if (!old && new)
+			; /* empty ext2fs_mark_block_bitmap2(new_map, b); */
+		else
+			ext2fs_unmark_block_bitmap2(new_map, b);
+	}
+	/* new_map now shows blocks that have been newly allocated. */
+
+	/* Move any conflicting bitmaps and inode tables */
+	for (i = 0; i < rfs->old_fs->group_desc_count; i++) {
+		b = ext2fs_block_bitmap_loc(rfs->new_fs, i);
+		if (ext2fs_test_block_bitmap2(new_map, b))
+			ext2fs_block_bitmap_loc_set(rfs->new_fs, i, 0);
+
+		b = ext2fs_inode_bitmap_loc(rfs->new_fs, i);
+		if (ext2fs_test_block_bitmap2(new_map, b))
+			ext2fs_inode_bitmap_loc_set(rfs->new_fs, i, 0);
+
+		c = ext2fs_inode_table_loc(rfs->new_fs, i);
+		for (b = 0; b < rfs->new_fs->inode_blocks_per_group; b++) {
+			if (ext2fs_test_block_bitmap2(new_map, b + c)) {
+				ext2fs_inode_table_loc_set(rfs->new_fs, i, 0);
+				break;
+			}
+		}
+	}
+
+out:
+	if (old_map)
+		ext2fs_free_block_bitmap(old_map);
+	if (new_map)
+		ext2fs_free_block_bitmap(new_map);
+	return retval;
+}
+
+/* Zero out the high bits of extent fields */
+static errcode_t zero_high_bits_in_extents(ext2_filsys fs, ext2_ino_t ino,
+				 struct ext2_inode *inode)
+{
+	ext2_extent_handle_t	handle;
+	struct ext2fs_extent	extent;
+	int			op = EXT2_EXTENT_ROOT;
+	errcode_t		errcode;
+
+	if (!(inode->i_flags & EXT4_EXTENTS_FL))
+		return 0;
+
+	errcode = ext2fs_extent_open(fs, ino, &handle);
+	if (errcode)
+		return errcode;
+
+	while (1) {
+		errcode = ext2fs_extent_get(handle, op, &extent);
+		if (errcode)
+			break;
+
+		op = EXT2_EXTENT_NEXT_SIB;
+
+		if (extent.e_pblk >= (1ULL << 32)) {
+			extent.e_pblk &= (1ULL << 32) - 1;
+			errcode = ext2fs_extent_replace(handle, 0, &extent);
+			if (errcode)
+				break;
+		}
+	}
+
+	/* Ok if we run off the end */
+	if (errcode == EXT2_ET_EXTENT_NO_NEXT)
+		errcode = 0;
+	return errcode;
+}
+
+/* Zero out the high bits of inodes. */
+static errcode_t zero_high_bits_in_inodes(ext2_resize_t rfs)
+{
+	ext2_filsys	fs = rfs->new_fs;
+	int length = EXT2_INODE_SIZE(fs->super);
+	struct ext2_inode *inode = NULL;
+	ext2_inode_scan	scan = NULL;
+	errcode_t	retval;
+	ext2_ino_t	ino;
+	blk64_t		file_acl_block;
+	int		inode_dirty;
+
+	if (!(rfs->flags & (RESIZE_DISABLE_64BIT | RESIZE_ENABLE_64BIT)))
+		return 0;
+
+	if (fs->super->s_creator_os != EXT2_OS_LINUX)
+		return 0;
+
+	retval = ext2fs_open_inode_scan(fs, 0, &scan);
+	if (retval)
+		return retval;
+
+	retval = ext2fs_get_mem(length, &inode);
+	if (retval)
+		goto out;
+
+	do {
+		retval = ext2fs_get_next_inode_full(scan, &ino, inode, length);
+		if (retval)
+			goto out;
+		if (!ino)
+			break;
+		if (!ext2fs_test_inode_bitmap2(fs->inode_map, ino))
+			continue;
+
+		/*
+		 * Here's how we deal with high block number fields:
+		 *
+		 *  - i_size_high has been written out with i_size_lo
+		 *    since the ext2 days, so no conversion is needed.
+		 *
+		 *  - i_blocks_hi is guarded by both the huge_file feature and
+		 *    inode flags and has always been written out with
+		 *    i_blocks_lo if the feature is set.  The field is only
+		 *    ever read if both feature and inode flag are set, so
+		 *    we don't need to zero it now.
+		 *
+		 *  - i_file_acl_high can be uninitialized, so zero it if
+		 *    it isn't already.
+		 */
+		if (inode->osd2.linux2.l_i_file_acl_high) {
+			inode->osd2.linux2.l_i_file_acl_high = 0;
+			retval = ext2fs_write_inode_full(fs, ino, inode,
+							 length);
+			if (retval)
+				goto out;
+		}
+
+		retval = zero_high_bits_in_extents(fs, ino, inode);
+		if (retval)
+			goto out;
+	} while (ino);
+
+out:
+	if (inode)
+		ext2fs_free_mem(&inode);
+	if (scan)
+		ext2fs_close_inode_scan(scan);
+	return retval;
+}
+
 /*
  * Clean up the bitmaps for unitialized bitmaps
  */
@@ -424,7 +697,8 @@ retry:
 	/*
 	 * Reallocate the group descriptors as necessary.
 	 */
-	if (old_fs->desc_blocks != fs->desc_blocks) {
+	if (EXT2_DESC_SIZE(old_fs->super) == EXT2_DESC_SIZE(fs->super) &&
+	    old_fs->desc_blocks != fs->desc_blocks) {
 		retval = ext2fs_resize_mem(old_fs->desc_blocks *
 					   fs->blocksize,
 					   fs->desc_blocks * fs->blocksize,
@@ -949,7 +1223,9 @@ static errcode_t blocks_to_move(ext2_resize_t rfs)
 		new_blocks = fs->desc_blocks + fs->super->s_reserved_gdt_blocks;
 	}
 
-	if (old_blocks == new_blocks) {
+	if (EXT2_DESC_SIZE(rfs->old_fs->super) ==
+	    EXT2_DESC_SIZE(rfs->new_fs->super) &&
+	    old_blocks == new_blocks) {
 		retval = 0;
 		goto errout;
 	}
diff --git a/resize/resize2fs.h b/resize/resize2fs.h
index 52319b5..5a1c5dc 100644
--- a/resize/resize2fs.h
+++ b/resize/resize2fs.h
@@ -82,6 +82,9 @@ typedef struct ext2_sim_progress *ext2_sim_progmeter;
 #define RESIZE_PERCENT_COMPLETE		0x0100
 #define RESIZE_VERBOSE			0x0200
 
+#define RESIZE_ENABLE_64BIT		0x0400
+#define RESIZE_DISABLE_64BIT		0x0800
+
 /*
  * This structure is used for keeping track of how much resources have
  * been used for a particular resize2fs pass.



* [PATCH 17/25] resize2fs: when toggling 64bit, don't free in-use bg data clusters
  2013-10-18  4:48 [PATCH v2 00/25] e2fsprogs patchbomb 10/2013 Darrick J. Wong
                   ` (15 preceding siblings ...)
  2013-10-18  4:50 ` [PATCH 16/25] resize2fs: convert fs to and from 64bit mode Darrick J. Wong
@ 2013-10-18  4:50 ` Darrick J. Wong
  2013-10-18  4:50 ` [PATCH 18/25] resize2fs: adjust reserved_gdt_blocks when changing group descriptor size Darrick J. Wong
                   ` (8 subsequent siblings)
  25 siblings, 0 replies; 73+ messages in thread
From: Darrick J. Wong @ 2013-10-18  4:50 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

Currently, move_bg_metadata() assumes that if a block containing a
superblock or a group descriptor is no longer needed, then it is safe
to free the whole cluster.  This of course isn't true, for bitmaps and
inode tables can share these clusters.  Therefore, check a little more
carefully before freeing clusters.
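
As a rough illustration of the check described above, the sketch
below decides whether a whole cluster may be freed by looking at every
block in it, not just the one that is no longer needed.  The
one-byte-per-block in_use map and the function name are assumptions
made for the example; the patch itself works on libext2fs block
bitmaps.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return true if no block in the cluster containing blk is still in use.
 * in_use is a hypothetical one-byte-per-block map; ratio is the number
 * of blocks per cluster and is assumed to be a power of two.
 */
static bool cluster_is_free(const uint8_t *in_use, uint64_t nblocks,
			    uint64_t blk, uint64_t ratio)
{
	uint64_t start = blk & ~(ratio - 1);	/* first block of the cluster */
	uint64_t b;

	for (b = start; b < start + ratio && b < nblocks; b++)
		if (in_use[b])
			return false;
	return true;
}
```

With a ratio of 1 this degenerates to a single-block test, which is
why the non-bigalloc case can free blocks directly.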

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 resize/resize2fs.c |   71 ++++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 55 insertions(+), 16 deletions(-)


diff --git a/resize/resize2fs.c b/resize/resize2fs.c
index 05ba6e1..472aa4a 100644
--- a/resize/resize2fs.c
+++ b/resize/resize2fs.c
@@ -307,11 +307,11 @@ static errcode_t resize_group_descriptors(ext2_resize_t rfs, blk64_t new_size)
 static errcode_t move_bg_metadata(ext2_resize_t rfs)
 {
 	dgrp_t i;
-	blk64_t b, c, d;
+	blk64_t b, c, d, old_desc_blocks, new_desc_blocks, j;
 	ext2fs_block_bitmap old_map, new_map;
 	int old, new;
 	errcode_t retval;
-	int zero = 0, one = 1;
+	int zero = 0, one = 1, cluster_ratio;
 
 	if (!(rfs->flags & (RESIZE_DISABLE_64BIT | RESIZE_ENABLE_64BIT)))
 		return 0;
@@ -324,6 +324,17 @@ static errcode_t move_bg_metadata(ext2_resize_t rfs)
 	if (retval)
 		goto out;
 
+	if (EXT2_HAS_INCOMPAT_FEATURE(rfs->old_fs->super,
+				      EXT2_FEATURE_INCOMPAT_META_BG)) {
+		old_desc_blocks = rfs->old_fs->super->s_first_meta_bg;
+		new_desc_blocks = rfs->new_fs->super->s_first_meta_bg;
+	} else {
+		old_desc_blocks = rfs->old_fs->desc_blocks +
+				rfs->old_fs->super->s_reserved_gdt_blocks;
+		new_desc_blocks = rfs->new_fs->desc_blocks +
+				rfs->new_fs->super->s_reserved_gdt_blocks;
+	}
+
 	/* Construct bitmaps of super/descriptor blocks in old and new fs */
 	for (i = 0; i < rfs->old_fs->group_desc_count; i++) {
 		retval = ext2fs_super_and_bgd_loc2(rfs->old_fs, i, &b, &c, &d,
@@ -331,7 +342,8 @@ static errcode_t move_bg_metadata(ext2_resize_t rfs)
 		if (retval)
 			goto out;
 		ext2fs_mark_block_bitmap2(old_map, b);
-		ext2fs_mark_block_bitmap2(old_map, c);
+		for (j = 0; c != 0 && j < old_desc_blocks; j++)
+			ext2fs_mark_block_bitmap2(old_map, c + j);
 		ext2fs_mark_block_bitmap2(old_map, d);
 
 		retval = ext2fs_super_and_bgd_loc2(rfs->new_fs, i, &b, &c, &d,
@@ -339,45 +351,72 @@ static errcode_t move_bg_metadata(ext2_resize_t rfs)
 		if (retval)
 			goto out;
 		ext2fs_mark_block_bitmap2(new_map, b);
-		ext2fs_mark_block_bitmap2(new_map, c);
+		for (j = 0; c != 0 && j < new_desc_blocks; j++)
+			ext2fs_mark_block_bitmap2(new_map, c + j);
 		ext2fs_mark_block_bitmap2(new_map, d);
 	}
 
+	cluster_ratio = EXT2FS_CLUSTER_RATIO(rfs->new_fs);
+
 	/* Find changes in block allocations for bg metadata */
 	for (b = 0;
 	     b < ext2fs_blocks_count(rfs->new_fs->super);
-	     b += EXT2FS_CLUSTER_RATIO(rfs->new_fs)) {
+	     b += cluster_ratio) {
 		old = ext2fs_test_block_bitmap2(old_map, b);
 		new = ext2fs_test_block_bitmap2(new_map, b);
 
-		if (old && !new)
-			ext2fs_unmark_block_bitmap2(rfs->new_fs->block_map, b);
-		else if (!old && new)
-			; /* empty ext2fs_mark_block_bitmap2(new_map, b); */
-		else
+		if (old && !new) {
+			/* mark old_map, unmark new_map */
+			if (cluster_ratio == 1)
+				ext2fs_unmark_block_bitmap2(
+						rfs->new_fs->block_map, b);
+		} else if (!old && new)
+			; /* unmark old_map, mark new_map */
+		else {
+			ext2fs_unmark_block_bitmap2(old_map, b);
 			ext2fs_unmark_block_bitmap2(new_map, b);
+		}
 	}
-	/* new_map now shows blocks that have been newly allocated. */
 
-	/* Move any conflicting bitmaps and inode tables */
+	/*
+	 * new_map now shows blocks that have been newly allocated.
+	 * old_map now shows blocks that have been newly freed.
+	 */
+
+	/*
+	 * Move any conflicting bitmaps and inode tables.  Ensure that we
+	 * don't try to free clusters associated with bitmaps or tables.
+	 */
 	for (i = 0; i < rfs->old_fs->group_desc_count; i++) {
 		b = ext2fs_block_bitmap_loc(rfs->new_fs, i);
 		if (ext2fs_test_block_bitmap2(new_map, b))
 			ext2fs_block_bitmap_loc_set(rfs->new_fs, i, 0);
+		else if (ext2fs_test_block_bitmap2(old_map, b))
+			ext2fs_unmark_block_bitmap2(old_map, b);
 
 		b = ext2fs_inode_bitmap_loc(rfs->new_fs, i);
 		if (ext2fs_test_block_bitmap2(new_map, b))
 			ext2fs_inode_bitmap_loc_set(rfs->new_fs, i, 0);
+		else if (ext2fs_test_block_bitmap2(old_map, b))
+			ext2fs_unmark_block_bitmap2(old_map, b);
 
 		c = ext2fs_inode_table_loc(rfs->new_fs, i);
-		for (b = 0; b < rfs->new_fs->inode_blocks_per_group; b++) {
-			if (ext2fs_test_block_bitmap2(new_map, b + c)) {
+		for (b = 0;
+		     b < rfs->new_fs->inode_blocks_per_group;
+		     b++) {
+			if (ext2fs_test_block_bitmap2(new_map, b + c))
 				ext2fs_inode_table_loc_set(rfs->new_fs, i, 0);
-				break;
-			}
+			else if (ext2fs_test_block_bitmap2(old_map, b + c))
+				ext2fs_unmark_block_bitmap2(old_map, b + c);
 		}
 	}
 
+	/* Free unused clusters */
+	for (b = 0;
+	     cluster_ratio > 1 && b < ext2fs_blocks_count(rfs->new_fs->super);
+	     b += cluster_ratio)
+		if (ext2fs_test_block_bitmap2(old_map, b))
+			ext2fs_unmark_block_bitmap2(rfs->new_fs->block_map, b);
 out:
 	if (old_map)
 		ext2fs_free_block_bitmap(old_map);



* [PATCH 18/25] resize2fs: adjust reserved_gdt_blocks when changing group descriptor size
  2013-10-18  4:48 [PATCH v2 00/25] e2fsprogs patchbomb 10/2013 Darrick J. Wong
                   ` (16 preceding siblings ...)
  2013-10-18  4:50 ` [PATCH 17/25] resize2fs: when toggling 64bit, don't free in-use bg data clusters Darrick J. Wong
@ 2013-10-18  4:50 ` Darrick J. Wong
  2013-10-18  4:51 ` [PATCH 19/25] resize2fs: during shrink, don't free in-use bg data clusters Darrick J. Wong
                   ` (7 subsequent siblings)
  25 siblings, 0 replies; 73+ messages in thread
From: Darrick J. Wong @ 2013-10-18  4:50 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

Since we're constructing the fantasy that new_fs has always been a
64bit fs, we need to adjust reserved_gdt_blocks when we start resizing
the metadata so that the size of the gdt space in the new fs reflects
the fantasy throughout the resize process.
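
The adjustment is plain arithmetic, so a small standalone sketch may
help.  The function below mirrors the clamping done by
adjust_reserved_gdt_blocks() in this patch, but takes bare integers
instead of ext2_filsys handles; the function and parameter names are
made up for the example.

```c
/*
 * Keep (descriptor blocks + reserved GDT blocks) constant where possible.
 * Illustrative stand-in for the patch's helper: the result is clamped to
 * the range [0, blocksize / 4].
 */
static int adjusted_reserved_gdt(int reserved, int old_desc_blocks,
				 int new_desc_blocks, int blocksize)
{
	int new = reserved + (old_desc_blocks - new_desc_blocks);

	if (new < 0)
		new = 0;
	if (new > blocksize / 4)
		new = blocksize / 4;
	return new;
}
```

For example, with 4096-byte blocks and 128 groups, 32-byte descriptors
fit in one block but 64-byte descriptors need two, so a filesystem
with 256 reserved GDT blocks ends up with 255 after conversion.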

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 resize/resize2fs.c |   37 ++++++++++++++++++++++++-------------
 1 file changed, 24 insertions(+), 13 deletions(-)


diff --git a/resize/resize2fs.c b/resize/resize2fs.c
index 472aa4a..5a576a7 100644
--- a/resize/resize2fs.c
+++ b/resize/resize2fs.c
@@ -241,6 +241,24 @@ errout:
 	return retval;
 }
 
+/* Keep the size of the group descriptor region constant */
+static void adjust_reserved_gdt_blocks(ext2_filsys old_fs, ext2_filsys fs)
+{
+	if ((fs->super->s_feature_compat &
+	     EXT2_FEATURE_COMPAT_RESIZE_INODE) &&
+	    (old_fs->desc_blocks != fs->desc_blocks)) {
+		int new;
+
+		new = ((int) fs->super->s_reserved_gdt_blocks) +
+			(old_fs->desc_blocks - fs->desc_blocks);
+		if (new < 0)
+			new = 0;
+		if (new > (int) fs->blocksize/4)
+			new = fs->blocksize/4;
+		fs->super->s_reserved_gdt_blocks = new;
+	}
+}
+
 /* Toggle 64bit mode */
 static errcode_t resize_group_descriptors(ext2_resize_t rfs, blk64_t new_size)
 {
@@ -300,6 +318,8 @@ static errcode_t resize_group_descriptors(ext2_resize_t rfs, blk64_t new_size)
 	for (i = 0; i < rfs->old_fs->group_desc_count; i++)
 		ext2fs_group_desc_csum_set(rfs->new_fs, i);
 
+	adjust_reserved_gdt_blocks(rfs->old_fs, rfs->new_fs);
+
 	return 0;
 }
 
@@ -756,20 +776,11 @@ retry:
 	 * number of descriptor blocks, then adjust
 	 * s_reserved_gdt_blocks if possible to avoid needing to move
 	 * the inode table either now or in the future.
+	 *
+	 * Note: If we're converting to 64bit mode, we did this earlier.
 	 */
-	if ((fs->super->s_feature_compat &
-	     EXT2_FEATURE_COMPAT_RESIZE_INODE) &&
-	    (old_fs->desc_blocks != fs->desc_blocks)) {
-		int new;


* [PATCH 19/25] resize2fs: during shrink, don't free in-use bg data clusters
  2013-10-18  4:48 [PATCH v2 00/25] e2fsprogs patchbomb 10/2013 Darrick J. Wong
                   ` (17 preceding siblings ...)
  2013-10-18  4:50 ` [PATCH 18/25] resize2fs: adjust reserved_gdt_blocks when changing group descriptor size Darrick J. Wong
@ 2013-10-18  4:51 ` Darrick J. Wong
  2013-10-18  4:51 ` [PATCH 20/25] resize2fs: don't free in-use clusters when moving blocks Darrick J. Wong
                   ` (6 subsequent siblings)
  25 siblings, 0 replies; 73+ messages in thread
From: Darrick J. Wong @ 2013-10-18  4:51 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

When freeing a group's metadata blocks, be careful not to free
clusters belonging to other groups!

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 resize/resize2fs.c |   78 +++++++++++++++++++++++++++++++++-------------------
 1 file changed, 49 insertions(+), 29 deletions(-)


diff --git a/resize/resize2fs.c b/resize/resize2fs.c
index 5a576a7..12f6d16 100644
--- a/resize/resize2fs.c
+++ b/resize/resize2fs.c
@@ -602,40 +602,60 @@ static void fix_uninit_block_bitmaps(ext2_filsys fs)
  * release them in the new filesystem data structure, and mark them as
  * reserved so the old inode table blocks don't get overwritten.
  */
-static void free_gdp_blocks(ext2_filsys fs,
-			    ext2fs_block_bitmap reserve_blocks,
-			    ext2_filsys old_fs,
-			    dgrp_t group)
+static errcode_t free_gdp_blocks(ext2_filsys fs,
+				 ext2fs_block_bitmap reserve_blocks,
+				 ext2_filsys old_fs,
+				 dgrp_t group, dgrp_t count)
 {
 	blk64_t	blk;
 	int	j;
+	dgrp_t	i;
+	ext2fs_block_bitmap bg_map = NULL;
+	errcode_t retval = 0;
 
-	blk = ext2fs_block_bitmap_loc(old_fs, group);
-	if (blk &&
-	    (blk < ext2fs_blocks_count(fs->super))) {
-		ext2fs_block_alloc_stats2(fs, blk, -1);
-		ext2fs_mark_block_bitmap2(reserve_blocks, blk);
-	}
+	/* If bigalloc, don't free metadata living in the same cluster */
+	if (EXT2FS_CLUSTER_RATIO(fs) > 1) {
+		retval = ext2fs_allocate_block_bitmap(fs, "bgdata", &bg_map);
+		if (retval)
+			goto out;
 
-	blk = ext2fs_inode_bitmap_loc(old_fs, group);
-	if (blk &&
-	    (blk < ext2fs_blocks_count(fs->super))) {
-		ext2fs_block_alloc_stats2(fs, blk, -1);
-		ext2fs_mark_block_bitmap2(reserve_blocks, blk);
+		retval = mark_table_blocks(fs, bg_map);
+		if (retval)
+			goto out;
 	}
 
-	blk = ext2fs_inode_table_loc(old_fs, group);
-	if (blk == 0 ||
-	    (blk >= ext2fs_blocks_count(fs->super)))
-		return;
+	for (i = group; i < group + count; i++) {
+		blk = ext2fs_block_bitmap_loc(old_fs, i);
+		if (blk &&
+		    (blk < ext2fs_blocks_count(fs->super)) &&
+		    !(bg_map && ext2fs_test_block_bitmap2(bg_map, blk))) {
+			ext2fs_block_alloc_stats2(fs, blk, -1);
+			ext2fs_mark_block_bitmap2(reserve_blocks, blk);
+		}
 
-	for (j = 0;
-	     j < fs->inode_blocks_per_group; j++, blk++) {
-		if (blk >= ext2fs_blocks_count(fs->super))
-			break;
-		ext2fs_block_alloc_stats2(fs, blk, -1);
-		ext2fs_mark_block_bitmap2(reserve_blocks, blk);
+		blk = ext2fs_inode_bitmap_loc(old_fs, i);
+		if (blk &&
+		    (blk < ext2fs_blocks_count(fs->super)) &&
+		    !(bg_map && ext2fs_test_block_bitmap2(bg_map, blk))) {
+			ext2fs_block_alloc_stats2(fs, blk, -1);
+			ext2fs_mark_block_bitmap2(reserve_blocks, blk);
+		}
+
+		blk = ext2fs_inode_table_loc(old_fs, i);
+		for (j = 0;
+		     j < fs->inode_blocks_per_group; j++, blk++) {
+			if (blk >= ext2fs_blocks_count(fs->super) ||
+			    (bg_map && ext2fs_test_block_bitmap2(bg_map, blk)))
+				continue;
+			ext2fs_block_alloc_stats2(fs, blk, -1);
+			ext2fs_mark_block_bitmap2(reserve_blocks, blk);
+		}
 	}
+
+out:
+	if (bg_map)
+		ext2fs_free_block_bitmap(bg_map);
+	return retval;
 }
 
 /*
@@ -791,10 +811,10 @@ retry:
 		 * Check the block groups that we are chopping off
 		 * and free any blocks associated with their metadata
 		 */
-		for (i = fs->group_desc_count;
-		     i < old_fs->group_desc_count; i++)
-			free_gdp_blocks(fs, reserve_blocks, old_fs, i);
-		retval = 0;
+		retval = free_gdp_blocks(fs, reserve_blocks, old_fs,
+					 fs->group_desc_count,
+					 old_fs->group_desc_count -
+					 fs->group_desc_count);
 		goto errout;
 	}
 



* [PATCH 20/25] resize2fs: don't free in-use clusters when moving blocks
  2013-10-18  4:48 [PATCH v2 00/25] e2fsprogs patchbomb 10/2013 Darrick J. Wong
                   ` (18 preceding siblings ...)
  2013-10-18  4:51 ` [PATCH 19/25] resize2fs: during shrink, don't free in-use bg data clusters Darrick J. Wong
@ 2013-10-18  4:51 ` Darrick J. Wong
  2013-10-18  4:51 ` [PATCH 21/25] misc: use the checksum predicate function, not raw flag tests Darrick J. Wong
                   ` (5 subsequent siblings)
  25 siblings, 0 replies; 73+ messages in thread
From: Darrick J. Wong @ 2013-10-18  4:51 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

When we're moving blocks around the filesystem, ensure that freeing
the old blocks only frees the clusters if they're not in use by other
metadata.
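
The cluster_freed computation in this patch boils down to "how many
blocks remain from here to the end of this cluster, capped at the end
of the range".  A minimal sketch of that arithmetic, assuming the
cluster ratio is a power of two and blk < end; the helper name is
hypothetical and does not exist in resize2fs:

```c
#include <stdint.h>

/*
 * Number of blocks from blk to the end of its cluster, capped so the
 * count never runs past end (exclusive).  ratio is blocks per cluster
 * and must be a power of two; the caller must ensure blk < end.
 */
static uint64_t blocks_to_cluster_end(uint64_t blk, uint64_t end,
				      uint64_t ratio)
{
	uint64_t n = ratio - (blk & (ratio - 1));

	if (n > end - blk)
		n = end - blk;
	return n;
}
```

Freeing one block of a cluster releases the whole cluster, so the
accounting has to advance by this many blocks at once.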

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 resize/resize2fs.c |   70 +++++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 61 insertions(+), 9 deletions(-)


diff --git a/resize/resize2fs.c b/resize/resize2fs.c
index 12f6d16..b351cc6 100644
--- a/resize/resize2fs.c
+++ b/resize/resize2fs.c
@@ -1196,12 +1196,12 @@ static errcode_t blocks_to_move(ext2_resize_t rfs)
 	int		j, has_super;
 	dgrp_t		i, max_groups, g;
 	blk64_t		blk, group_blk;
-	blk64_t		old_blocks, new_blocks;
+	blk64_t		old_blocks, new_blocks, group_end, cluster_freed;
 	blk64_t		new_size;
 	unsigned int	meta_bg, meta_bg_size;
 	errcode_t	retval;
 	ext2_filsys 	fs, old_fs;
-	ext2fs_block_bitmap	meta_bmap;
+	ext2fs_block_bitmap	meta_bmap, new_meta_bmap = NULL;
 	int		flex_bg;
 
 	fs = rfs->new_fs;
@@ -1310,15 +1310,40 @@ static errcode_t blocks_to_move(ext2_resize_t rfs)
 	 * blocks as free.
 	 */
 	if (old_blocks > new_blocks) {
+		if (EXT2FS_CLUSTER_RATIO(fs) > 1) {
+			retval = ext2fs_allocate_block_bitmap(fs,
+							      _("new meta blocks"),
+							      &new_meta_bmap);
+			if (retval)
+				goto errout;
+
+			retval = mark_table_blocks(fs, new_meta_bmap);
+			if (retval)
+				goto errout;
+		}
+
 		for (i = 0; i < max_groups; i++) {
 			if (!ext2fs_bg_has_super(fs, i)) {
 				group_blk += fs->super->s_blocks_per_group;
 				continue;
 			}
-			for (blk = group_blk+1+new_blocks;
-			     blk < group_blk+1+old_blocks; blk++) {
-				ext2fs_block_alloc_stats2(fs, blk, -1);
+			group_end = group_blk + 1 + old_blocks;
+			for (blk = group_blk + 1 + new_blocks;
+			     blk < group_end;) {
+				if (new_meta_bmap == NULL ||
+				    !ext2fs_test_block_bitmap2(new_meta_bmap,
+							       blk)) {
+					cluster_freed = EXT2FS_CLUSTER_RATIO(fs) -
+							(blk & EXT2FS_CLUSTER_MASK(fs));
+					if (cluster_freed > group_end - blk)
+						cluster_freed = group_end - blk;
+					ext2fs_block_alloc_stats2(fs, blk, -1);
+					blk += EXT2FS_CLUSTER_RATIO(fs);
+					rfs->needed_blocks -= cluster_freed;
+					continue;
+				}
 				rfs->needed_blocks--;
+				blk++;
 			}
 			group_blk += fs->super->s_blocks_per_group;
 		}
@@ -1464,6 +1489,8 @@ static errcode_t blocks_to_move(ext2_resize_t rfs)
 	retval = 0;
 
 errout:
+	if (new_meta_bmap)
+		ext2fs_free_block_bitmap(new_meta_bmap);
 	if (meta_bmap)
 		ext2fs_free_block_bitmap(meta_bmap);
 
@@ -2063,9 +2090,10 @@ static errcode_t move_itables(ext2_resize_t rfs)
 	dgrp_t		i, max_groups;
 	ext2_filsys	fs = rfs->new_fs;
 	char		*cp;
-	blk64_t		old_blk, new_blk, blk;
+	blk64_t		old_blk, new_blk, blk, cluster_freed;
 	errcode_t	retval;
 	int		j, to_move, moved;
+	ext2fs_block_bitmap	new_bmap = NULL;
 
 	max_groups = fs->group_desc_count;
 	if (max_groups > rfs->old_fs->group_desc_count)
@@ -2078,6 +2106,17 @@ static errcode_t move_itables(ext2_resize_t rfs)
 			return retval;
 	}
 
+	if (EXT2FS_CLUSTER_RATIO(fs) > 1) {
+		retval = ext2fs_allocate_block_bitmap(fs, _("new meta blocks"),
+						      &new_bmap);
+		if (retval)
+			return retval;
+
+		retval = mark_table_blocks(fs, new_bmap);
+		if (retval)
+			goto errout;
+	}
+
 	/*
 	 * Figure out how many inode tables we need to move
 	 */
@@ -2155,8 +2194,19 @@ static errcode_t move_itables(ext2_resize_t rfs)
 		}
 
 		for (blk = ext2fs_inode_table_loc(rfs->old_fs, i), j=0;
-		     j < fs->inode_blocks_per_group ; j++, blk++)
-			ext2fs_block_alloc_stats2(fs, blk, -1);
+		     j < fs->inode_blocks_per_group;) {
+			if (new_bmap == NULL ||
+			    !ext2fs_test_block_bitmap2(new_bmap, blk)) {
+				ext2fs_block_alloc_stats2(fs, blk, -1);
+				cluster_freed = EXT2FS_CLUSTER_RATIO(fs) -
+						(blk & EXT2FS_CLUSTER_MASK(fs));
+				blk += cluster_freed;
+				j += cluster_freed;
+				continue;
+			}
+			blk++;
+			j++;
+		}
 
 		ext2fs_inode_table_loc_set(rfs->old_fs, i, new_blk);
 		ext2fs_group_desc_csum_set(rfs->old_fs, i);
@@ -2176,9 +2226,11 @@ static errcode_t move_itables(ext2_resize_t rfs)
 	if (rfs->flags & RESIZE_DEBUG_ITABLEMOVE)
 		printf("Inode table move finished.\n");
 #endif
-	return 0;
+	retval = 0;
 
 errout:
+	if (new_bmap)
+		ext2fs_free_block_bitmap(new_bmap);
 	return retval;
 }
 



* [PATCH 21/25] misc: use the checksum predicate function, not raw flag tests
  2013-10-18  4:48 [PATCH v2 00/25] e2fsprogs patchbomb 10/2013 Darrick J. Wong
                   ` (19 preceding siblings ...)
  2013-10-18  4:51 ` [PATCH 20/25] resize2fs: don't free in-use clusters when moving blocks Darrick J. Wong
@ 2013-10-18  4:51 ` Darrick J. Wong
  2013-10-18  4:51 ` [PATCH 22/25] resize2fs: rewrite extent/dir/ea block checksums when migrating Darrick J. Wong
                   ` (4 subsequent siblings)
  25 siblings, 0 replies; 73+ messages in thread
From: Darrick J. Wong @ 2013-10-18  4:51 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

metadata_csum implies uninit_bg, and in fact forces the uninit_bg
(GDT_CSUM) bit off for rocompat with older implementations.  A raw
test of the GDT_CSUM flag therefore misses metadata_csum filesystems.
To decide whether group descriptor checksums are turned on, use the
predicate function ext2fs_has_group_desc_csum() instead of open-coded
flag tests.
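
As a rough illustration of the difference, here is a minimal standalone
sketch.  The struct and function names are stand-ins invented for this
example, and the flag values are assumed to match the ext4 on-disk
format; the real predicate is ext2fs_has_group_desc_csum() in ext2fs.h.

```c
#include <stdint.h>

/* Assumed to match the on-disk ro_compat bits; stand-ins for the
 * EXT4_FEATURE_RO_COMPAT_* constants. */
#define RO_COMPAT_GDT_CSUM		0x0010u
#define RO_COMPAT_METADATA_CSUM		0x0400u

/* Hypothetical, stripped-down superblock for illustration only. */
struct sb_sketch {
	uint32_t s_feature_ro_compat;
};

/* Open-coded test: true only when the uninit_bg bit is set. */
static int raw_gdt_csum_test(const struct sb_sketch *sb)
{
	return (sb->s_feature_ro_compat & RO_COMPAT_GDT_CSUM) != 0;
}

/* Predicate-style test: either feature means that the group
 * descriptors carry checksums. */
static int has_group_desc_csum(const struct sb_sketch *sb)
{
	return (sb->s_feature_ro_compat &
		(RO_COMPAT_GDT_CSUM | RO_COMPAT_METADATA_CSUM)) != 0;
}
```

On a superblock with only the metadata_csum bit set, the raw test
reports no checksums while the predicate-style test reports them.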

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 misc/e2image.c     |    4 +---
 resize/resize2fs.c |    4 +---
 2 files changed, 2 insertions(+), 6 deletions(-)


diff --git a/misc/e2image.c b/misc/e2image.c
index 4a5bb22..a466fe8 100644
--- a/misc/e2image.c
+++ b/misc/e2image.c
@@ -349,9 +349,7 @@ static void mark_table_blocks(ext2_filsys fs)
 		    ext2fs_inode_table_loc(fs, i)) {
 			unsigned int end = (unsigned) fs->inode_blocks_per_group;
 			/* skip unused blocks */
-			if (!output_is_blk &&
-			    EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
-						       EXT4_FEATURE_RO_COMPAT_GDT_CSUM))
+			if (!output_is_blk && ext2fs_has_group_desc_csum(fs))
 				end -= (ext2fs_bg_itable_unused(fs, i) /
 					EXT2_INODES_PER_BLOCK(fs->super));
 			for (j = 0, b = ext2fs_inode_table_loc(fs, i);
diff --git a/resize/resize2fs.c b/resize/resize2fs.c
index b351cc6..440c20e 100644
--- a/resize/resize2fs.c
+++ b/resize/resize2fs.c
@@ -1012,9 +1012,7 @@ static errcode_t adjust_superblock(ext2_resize_t rfs, blk64_t new_size)
 	 * supports lazy inode initialization, we can skip
 	 * initializing the inode table.
 	 */
-	if (lazy_itable_init &&
-	    EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
-				       EXT4_FEATURE_RO_COMPAT_GDT_CSUM)) {
+	if (lazy_itable_init && ext2fs_has_group_desc_csum(fs)) {
 		retval = 0;
 		goto errout;
 	}



* [PATCH 22/25] resize2fs: rewrite extent/dir/ea block checksums when migrating
  2013-10-18  4:48 [PATCH v2 00/25] e2fsprogs patchbomb 10/2013 Darrick J. Wong
                   ` (20 preceding siblings ...)
  2013-10-18  4:51 ` [PATCH 21/25] misc: use the checksum predicate function, not raw flag tests Darrick J. Wong
@ 2013-10-18  4:51 ` Darrick J. Wong
  2013-10-18  4:51 ` [PATCH 23/25] libext2fs: support modifying arbitrary extended attributes Darrick J. Wong
                   ` (3 subsequent siblings)
  25 siblings, 0 replies; 73+ messages in thread
From: Darrick J. Wong @ 2013-10-18  4:51 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

With the advent of metadata_csum, we now tie extent and directory
blocks to the associated inode number (and generation).  Therefore, we
must be careful when remapping inodes.  At that point in the resize
process, all the blocks that are going away have been duplicated
elsewhere in the FS (albeit with checksums based on the old inode
numbers).  If we're moving the inode, then do that and remember that
new inode number.  Now we can update the block mappings for each inode
with the final inode number, and schedule directory blocks for mass
inode relocation.  We also have to recalculate the EA block checksum.
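
To see why a block has to be rewritten after its owner is renumbered,
consider the toy model below.  It is only a sketch: the hash is FNV-1a
as a stand-in (metadata_csum itself uses crc32c), and all function
names are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in hash (FNV-1a style), NOT the crc32c used on disk. */
static uint32_t toy_hash(uint32_t seed, const void *data, size_t len)
{
	const unsigned char *p = data;
	uint32_t h = seed ^ 2166136261u;
	size_t i;

	for (i = 0; i < len; i++) {
		h ^= p[i];
		h *= 16777619u;
	}
	return h;
}

/* Checksum of a block owned by inode (ino, gen): the owner's
 * identity is folded into the seed before the block contents. */
static uint32_t block_csum(uint32_t ino, uint32_t gen,
			   const void *blk, size_t len)
{
	uint32_t seed = toy_hash(ino, &gen, sizeof(gen));

	return toy_hash(seed, blk, len);
}

/* Returns nonzero if the stored checksum matches for this owner. */
static int block_csum_ok(uint32_t ino, uint32_t gen,
			 const void *blk, size_t len, uint32_t stored)
{
	return block_csum(ino, gen, blk, len) == stored;
}
```

A block checksummed for the old inode number fails verification under
the new number even though its contents are unchanged, which is why
the iteration has to tolerate checksum errors on read and write the
block back out with the final inode number.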

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 resize/resize2fs.c |  154 ++++++++++++++++++++++++++++++++++++++--------------
 1 file changed, 114 insertions(+), 40 deletions(-)


diff --git a/resize/resize2fs.c b/resize/resize2fs.c
index 440c20e..17b8c45 100644
--- a/resize/resize2fs.c
+++ b/resize/resize2fs.c
@@ -1733,10 +1733,12 @@ __u64 extent_translate(ext2_filsys fs, ext2_extent extent, __u64 old_loc)
 struct process_block_struct {
 	ext2_resize_t 		rfs;
 	ext2_ino_t		ino;
+	ext2_ino_t		old_ino;
 	struct ext2_inode *	inode;
 	errcode_t		error;
 	int			is_dir;
 	int			changed;
+	int			has_extents;
 };
 
 static int process_block(ext2_filsys fs, blk64_t	*block_nr,
@@ -1760,11 +1762,23 @@ static int process_block(ext2_filsys fs, blk64_t	*block_nr,
 #ifdef RESIZE2FS_DEBUG
 			if (pb->rfs->flags & RESIZE_DEBUG_BMOVE)
 				printf("ino=%u, blockcnt=%lld, %llu->%llu\n",
-				       pb->ino, blockcnt, block, new_block);
+				       pb->old_ino, blockcnt, block,
+				       new_block);
 #endif
 			block = new_block;
 		}
 	}
+
+	/*
+	 * If we moved inodes and metadata_csum is enabled, we must force the
+	 * extent block to be rewritten with new checksum.
+	 */
+	if (EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
+				       EXT4_FEATURE_RO_COMPAT_METADATA_CSUM) &&
+	    pb->has_extents &&
+	    pb->old_ino != pb->ino)
+		ret |= BLOCK_CHANGED;
+
 	if (pb->is_dir) {
 		retval = ext2fs_add_dir_block2(fs->dblist, pb->ino,
 					       block, (int) blockcnt);
@@ -1804,6 +1818,46 @@ static errcode_t progress_callback(ext2_filsys fs,
 	return 0;
 }
 
+static errcode_t migrate_ea_block(ext2_resize_t rfs, ext2_ino_t ino,
+				  struct ext2_inode *inode, int *changed)
+{
+	char *buf = NULL;
+	blk64_t new_block;
+	errcode_t err = 0;
+
+	/* No EA block or no remapping?  Quit early. */
+	if (ext2fs_file_acl_block(rfs->old_fs, inode) == 0 || !rfs->bmap)
+		return 0;
+	new_block = extent_translate(rfs->old_fs, rfs->bmap,
+		ext2fs_file_acl_block(rfs->old_fs, inode));
+	if (new_block == 0)
+		return 0;
+
+	/* Set the new ACL block */
+	ext2fs_file_acl_block_set(rfs->old_fs, inode, new_block);
+
+	/* Update checksum */
+	if (EXT2_HAS_RO_COMPAT_FEATURE(rfs->new_fs->super,
+			EXT4_FEATURE_RO_COMPAT_METADATA_CSUM)) {
+		err = ext2fs_get_mem(rfs->old_fs->blocksize, &buf);
+		if (err)
+			return err;
+		rfs->old_fs->flags |= EXT2_FLAG_IGNORE_CSUM_ERRORS;
+		err = ext2fs_read_ext_attr3(rfs->old_fs, new_block, buf, ino);
+		rfs->old_fs->flags &= ~EXT2_FLAG_IGNORE_CSUM_ERRORS;
+		if (err)
+			goto out;
+		err = ext2fs_write_ext_attr3(rfs->old_fs, new_block, buf, ino);
+		if (err)
+			goto out;
+	}
+	*changed = 1;
+
+out:
+	ext2fs_free_mem(&buf);
+	return err;
+}
+
 static errcode_t inode_scan_and_fix(ext2_resize_t rfs)
 {
 	struct process_block_struct	pb;
@@ -1814,7 +1868,6 @@ static errcode_t inode_scan_and_fix(ext2_resize_t rfs)
 	char			*block_buf = 0;
 	ext2_ino_t		start_to_move;
 	blk64_t			orig_size;
-	blk64_t			new_block;
 	int			inode_size;
 
 	if ((rfs->old_fs->group_desc_count <=
@@ -1877,37 +1930,19 @@ static errcode_t inode_scan_and_fix(ext2_resize_t rfs)
 		pb.is_dir = LINUX_S_ISDIR(inode->i_mode);
 		pb.changed = 0;
 
-		if (ext2fs_file_acl_block(rfs->old_fs, inode) && rfs->bmap) {
-			new_block = extent_translate(rfs->old_fs, rfs->bmap,
-				ext2fs_file_acl_block(rfs->old_fs, inode));
-			if (new_block) {
-				ext2fs_file_acl_block_set(rfs->old_fs, inode,
-							  new_block);
-				retval = ext2fs_write_inode_full(rfs->old_fs,
-							    ino, inode, inode_size);
-				if (retval) goto errout;
-			}
-		}
-
-		if (ext2fs_inode_has_valid_blocks2(rfs->old_fs, inode) &&
-		    (rfs->bmap || pb.is_dir)) {
-			pb.ino = ino;
-			retval = ext2fs_block_iterate3(rfs->old_fs,
-						       ino, 0, block_buf,
-						       process_block, &pb);
-			if (retval)
-				goto errout;
-			if (pb.error) {
-				retval = pb.error;
-				goto errout;
-			}
-		}
+		/* Remap EA block */
+		retval = migrate_ea_block(rfs, ino, inode, &pb.changed);
+		if (retval)
+			goto errout;
 
+		new_inode = ino;
 		if (ino <= start_to_move)
-			continue; /* Don't need to move it. */
+			goto remap_blocks; /* Don't need to move inode. */
 
 		/*
-		 * Find a new inode
+		 * Find a new inode.  Now that extents and directory blocks
+		 * are tied to the inode number through the checksum, we must
+		 * set up the new inode before we start rewriting blocks.
 		 */
 		retval = ext2fs_new_inode(rfs->new_fs, 0, 0, 0, &new_inode);
 		if (retval)
@@ -1915,16 +1950,12 @@ static errcode_t inode_scan_and_fix(ext2_resize_t rfs)
 
 		ext2fs_inode_alloc_stats2(rfs->new_fs, new_inode, +1,
 					  pb.is_dir);
-		if (pb.changed) {
-			/* Get the new version of the inode */
-			retval = ext2fs_read_inode_full(rfs->old_fs, ino,
-						inode, inode_size);
-			if (retval) goto errout;
-		}
 		inode->i_ctime = time(0);
 		retval = ext2fs_write_inode_full(rfs->old_fs, new_inode,
 						inode, inode_size);
-		if (retval) goto errout;
+		if (retval)
+			goto errout;
+		pb.changed = 0;
 
 #ifdef RESIZE2FS_DEBUG
 		if (rfs->flags & RESIZE_DEBUG_INODEMAP)
@@ -1936,6 +1967,37 @@ static errcode_t inode_scan_and_fix(ext2_resize_t rfs)
 				goto errout;
 		}
 		ext2fs_add_extent_entry(rfs->imap, ino, new_inode);
+
+remap_blocks:
+		if (pb.changed)
+			retval = ext2fs_write_inode_full(rfs->old_fs,
+							 new_inode,
+							 inode, inode_size);
+		if (retval)
+			goto errout;
+
+		/*
+		 * Update inodes to point to new blocks; schedule directory
+		 * blocks for inode remapping.  Need to write out dir blocks
+		 * with new inode numbers if we have metadata_csum enabled.
+		 */
+		if (ext2fs_inode_has_valid_blocks2(rfs->old_fs, inode) &&
+		    (rfs->bmap || pb.is_dir)) {
+			pb.ino = new_inode;
+			pb.old_ino = ino;
+			pb.has_extents = inode->i_flags & EXT4_EXTENTS_FL;
+			rfs->old_fs->flags |= EXT2_FLAG_IGNORE_CSUM_ERRORS;
+			retval = ext2fs_block_iterate3(rfs->old_fs,
+						       new_inode, 0, block_buf,
+						       process_block, &pb);
+			rfs->old_fs->flags &= ~EXT2_FLAG_IGNORE_CSUM_ERRORS;
+			if (retval)
+				goto errout;
+			if (pb.error) {
+				retval = pb.error;
+				goto errout;
+			}
+		}
 	}
 	io_channel_flush(rfs->old_fs->io);
 
@@ -1978,6 +2040,7 @@ static int check_and_change_inodes(ext2_ino_t dir,
 	struct ext2_inode 	inode;
 	ext2_ino_t		new_inode;
 	errcode_t		retval;
+	int			ret = 0;
 
 	if (is->rfs->progress && offset == 0) {
 		io_channel_flush(is->rfs->old_fs->io);
@@ -1988,13 +2051,22 @@ static int check_and_change_inodes(ext2_ino_t dir,
 			return DIRENT_ABORT;
 	}
 
+	/*
+	 * If we have checksums enabled and the inode wasn't present in the
+	 * old fs, then we must rewrite all dir blocks with new checksums.
+	 */
+	if (EXT2_HAS_RO_COMPAT_FEATURE(is->rfs->old_fs->super,
+				       EXT4_FEATURE_RO_COMPAT_METADATA_CSUM) &&
+	    !ext2fs_test_inode_bitmap2(is->rfs->old_fs->inode_map, dir))
+		ret |= DIRENT_CHANGED;
+
 	if (!dirent->inode)
-		return 0;
+		return ret;
 
 	new_inode = ext2fs_extent_translate(is->rfs->imap, dirent->inode);
 
 	if (!new_inode)
-		return 0;
+		return ret;
 #ifdef RESIZE2FS_DEBUG
 	if (is->rfs->flags & RESIZE_DEBUG_INODEMAP)
 		printf("Inode translate (dir=%u, name=%.*s, %u->%u)\n",
@@ -2010,10 +2082,10 @@ static int check_and_change_inodes(ext2_ino_t dir,
 		inode.i_mtime = inode.i_ctime = time(0);
 		is->err = ext2fs_write_inode(is->rfs->old_fs, dir, &inode);
 		if (is->err)
-			return DIRENT_ABORT;
+			return ret | DIRENT_ABORT;
 	}
 
-	return DIRENT_CHANGED;
+	return ret | DIRENT_CHANGED;
 }
 
 static errcode_t inode_ref_fix(ext2_resize_t rfs)
@@ -2040,9 +2112,11 @@ static errcode_t inode_ref_fix(ext2_resize_t rfs)
 			goto errout;
 	}
 
+	rfs->old_fs->flags |= EXT2_FLAG_IGNORE_CSUM_ERRORS;
 	retval = ext2fs_dblist_dir_iterate(rfs->old_fs->dblist,
 					   DIRENT_FLAG_INCLUDE_EMPTY, 0,
 					   check_and_change_inodes, &is);
+	rfs->old_fs->flags &= ~EXT2_FLAG_IGNORE_CSUM_ERRORS;
 	if (retval)
 		goto errout;
 	if (is.err) {



* [PATCH 23/25] libext2fs: support modifying arbitrary extended attributes
  2013-10-18  4:48 [PATCH v2 00/25] e2fsprogs patchbomb 10/2013 Darrick J. Wong
                   ` (21 preceding siblings ...)
  2013-10-18  4:51 ` [PATCH 22/25] resize2fs: rewrite extent/dir/ea block checksums when migrating Darrick J. Wong
@ 2013-10-18  4:51 ` Darrick J. Wong
  2013-10-18 19:25   ` Darrick J. Wong
                     ` (2 more replies)
  2013-10-18  4:51 ` [PATCH 24/25] misc: add fuse2fs, a FUSE server for e2fsprogs Darrick J. Wong
                   ` (2 subsequent siblings)
  25 siblings, 3 replies; 73+ messages in thread
From: Darrick J. Wong @ 2013-10-18  4:51 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

Add functions to allow clients to get, set, and remove extended
attributes from any file.  The functions also support modifying EAs
that live in the block referenced by i_file_acl.
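
Callers pass full attribute names such as "user.comment"; on disk each
entry stores a small name index plus the remaining suffix.  The sketch
below illustrates that split in isolation.  The table mirrors the one
in this patch, but split_ea_name() and struct ea_prefix are names
invented for the example, and it uses strncmp() so that a short name
is never read past its terminator.

```c
#include <stddef.h>
#include <string.h>

struct ea_prefix {
	int index;
	const char *prefix;
};

/* Same entries as the patch's table; the longer
 * "system.posix_acl_*" names must come before "system.". */
static const struct ea_prefix ea_prefixes[] = {
	{1, "user."},
	{2, "system.posix_acl_access"},
	{3, "system.posix_acl_default"},
	{4, "trusted."},
	{6, "security."},
	{7, "system."},
	{0, NULL},
};

/* Split a full name into (index, suffix).  Returns 1 when a known
 * prefix matched; otherwise returns 0 with index 0 and the whole
 * name as the suffix. */
static int split_ea_name(const char *full, int *index,
			 const char **suffix)
{
	const struct ea_prefix *e;

	for (e = ea_prefixes; e->prefix; e++) {
		size_t n = strlen(e->prefix);

		if (strncmp(full, e->prefix, n) == 0) {
			*index = e->index;
			*suffix = full + n;
			return 1;
		}
	}
	*index = 0;
	*suffix = full;
	return 0;
}
```

For example, "user.comment" splits into index 1 and suffix "comment",
while "system.posix_acl_access" matches in full and leaves an empty
suffix.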

v2: Put the header declarations in the correct part of ext2fs.h,
provide a function to release an EA block from an inode, and check
i_extra_isize to make sure we actually have space for in-inode EAs.
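
The i_extra_isize check boils down to simple arithmetic, sketched here
as a standalone helper.  in_inode_ea_space() is a hypothetical name
used only for this illustration; 128 is the classic ext2 inode size
(EXT2_GOOD_OLD_INODE_SIZE), and the extra __u32 accounts for the EA
magic number that precedes the in-inode entries.

```c
#include <stdint.h>

#define GOOD_OLD_INODE_SIZE	128u

/* Bytes left for in-inode EA entries after the fixed inode fields,
 * the extra fields, and the 32-bit EA magic; 0 if nothing fits. */
static unsigned int in_inode_ea_space(unsigned int inode_size,
				      unsigned int extra_isize)
{
	unsigned int used = GOOD_OLD_INODE_SIZE + extra_isize +
			    (unsigned int)sizeof(uint32_t);

	if (inode_size <= used)
		return 0;
	return inode_size - used;
}
```

With 256-byte inodes and an i_extra_isize of 28 this leaves 96 bytes
for in-inode EAs; with 128-byte inodes there is no room at all and
everything has to go to the EA block.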

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 lib/ext2fs/ext2_err.et.in |   18 +
 lib/ext2fs/ext2fs.h       |   28 ++
 lib/ext2fs/ext_attr.c     |  761 +++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 807 insertions(+)


diff --git a/lib/ext2fs/ext2_err.et.in b/lib/ext2fs/ext2_err.et.in
index 9cc1bd1..b819a90 100644
--- a/lib/ext2fs/ext2_err.et.in
+++ b/lib/ext2fs/ext2_err.et.in
@@ -482,4 +482,22 @@ ec	EXT2_ET_BLOCK_BITMAP_CSUM_INVALID,
 ec	EXT2_ET_INLINE_DATA_CANT_ITERATE,
 	"Cannot block iterate on an inode containing inline data"
 
+ec	EXT2_ET_EA_BAD_NAME_LEN,
+	"Extended attribute has an invalid name length"
+
+ec	EXT2_ET_EA_BAD_VALUE_SIZE,
+	"Extended attribute has an invalid value length"
+
+ec	EXT2_ET_BAD_EA_HASH,
+	"Extended attribute has an incorrect hash"
+
+ec	EXT2_ET_BAD_EA_HEADER,
+	"Extended attribute block has a bad header"
+
+ec	EXT2_ET_EA_KEY_NOT_FOUND,
+	"Extended attribute key not found"
+
+ec	EXT2_ET_EA_NO_SPACE,
+	"Insufficient space to store extended attribute data"
+
 	end
diff --git a/lib/ext2fs/ext2fs.h b/lib/ext2fs/ext2fs.h
index 5247922..93adae8 100644
--- a/lib/ext2fs/ext2fs.h
+++ b/lib/ext2fs/ext2fs.h
@@ -637,6 +637,13 @@ typedef struct stat ext2fs_struct_stat;
 #define EXT2_FLAG_FLUSH_NO_SYNC          1
 
 /*
+ * Modify and iterate extended attributes
+ */
+struct ext2_xattr_handle;
+#define XATTR_ABORT	1
+#define XATTR_CHANGED	2
+
+/*
  * function prototypes
  */
 static inline int ext2fs_has_group_desc_csum(ext2_filsys fs)
@@ -1151,6 +1158,27 @@ extern errcode_t ext2fs_adjust_ea_refcount3(ext2_filsys fs, blk64_t blk,
 					   char *block_buf,
 					   int adjust, __u32 *newcount,
 					   ext2_ino_t inum);
+errcode_t ext2fs_xattrs_expand(struct ext2_xattr_handle *h,
+			       unsigned int expandby);
+errcode_t ext2fs_xattrs_write(struct ext2_xattr_handle *handle);
+errcode_t ext2fs_xattrs_read(struct ext2_xattr_handle *handle);
+errcode_t ext2fs_xattrs_iterate(struct ext2_xattr_handle *h,
+				int (*func)(char *name, char *value,
+					    void *data),
+				void *data);
+errcode_t ext2fs_xattr_get(struct ext2_xattr_handle *h, const char *key,
+			   void **value, unsigned int *value_len);
+errcode_t ext2fs_xattr_set(struct ext2_xattr_handle *handle,
+			   const char *key,
+			   const void *value,
+			   unsigned int value_len);
+errcode_t ext2fs_xattr_remove(struct ext2_xattr_handle *handle,
+			      const char *key);
+errcode_t ext2fs_xattrs_open(ext2_filsys fs, ext2_ino_t ino,
+			     struct ext2_xattr_handle **handle);
+errcode_t ext2fs_xattrs_close(struct ext2_xattr_handle **handle);
+errcode_t ext2fs_free_ext_attr(ext2_filsys fs, ext2_ino_t ino,
+			       struct ext2_inode_large *inode);
 
 /* extent.c */
 extern errcode_t ext2fs_extent_header_verify(void *ptr, int size);
diff --git a/lib/ext2fs/ext_attr.c b/lib/ext2fs/ext_attr.c
index 9649a14..2a1e5e7 100644
--- a/lib/ext2fs/ext_attr.c
+++ b/lib/ext2fs/ext_attr.c
@@ -186,3 +186,764 @@ errcode_t ext2fs_adjust_ea_refcount(ext2_filsys fs, blk_t blk,
 	return ext2fs_adjust_ea_refcount2(fs, blk, block_buf, adjust,
 					  newcount);
 }
+
+/* Manipulate the contents of extended attribute regions */
+struct ext2_xattr {
+	char *name;
+	void *value;
+	unsigned int value_len;
+};
+
+struct ext2_xattr_handle {
+	ext2_filsys fs;
+	struct ext2_xattr *attrs;
+	unsigned int length;
+	ext2_ino_t ino;
+	int dirty;
+};
+
+errcode_t ext2fs_xattrs_expand(struct ext2_xattr_handle *h,
+			       unsigned int expandby)
+{
+	struct ext2_xattr *new_attrs;
+	errcode_t err;
+
+	err = ext2fs_get_arrayzero(h->length + expandby,
+				   sizeof(struct ext2_xattr), &new_attrs);
+	if (err)
+		return err;
+
+	memcpy(new_attrs, h->attrs, h->length * sizeof(struct ext2_xattr));
+	ext2fs_free_mem(&h->attrs);
+	h->length += expandby;
+	h->attrs = new_attrs;
+
+	return 0;
+}
+
+struct ea_name_index {
+	int index;
+	const char *name;
+};
+
+static struct ea_name_index ea_names[] = {
+	{1, "user."},
+	{2, "system.posix_acl_access"},
+	{3, "system.posix_acl_default"},
+	{4, "trusted."},
+	{6, "security."},
+	{7, "system."},
+	{0, NULL},
+};
+
+static const char *find_ea_prefix(int index)
+{
+	struct ea_name_index *e;
+
+	for (e = ea_names; e->name; e++)
+		if (e->index == index)
+			return e->name;
+
+	return NULL;
+}
+
+static int find_ea_index(const char *fullname, char **name, int *index)
+{
+	struct ea_name_index *e;
+
+	for (e = ea_names; e->name; e++)
+		if (strncmp(fullname, e->name, strlen(e->name)) == 0) {
+			*name = (char *)fullname + strlen(e->name);
+			*index = e->index;
+			return 1;
+		}
+	return 0;
+}
+
+errcode_t ext2fs_free_ext_attr(ext2_filsys fs, ext2_ino_t ino,
+			       struct ext2_inode_large *inode)
+{
+	struct ext2_ext_attr_header *header;
+	void *block_buf = NULL;
+	dgrp_t grp;
+	blk64_t blk, goal;
+	errcode_t err;
+	struct ext2_inode_large i;
+
+	/* Read inode? */
+	if (inode == NULL) {
+		err = ext2fs_read_inode_full(fs, ino, (struct ext2_inode *)&i,
+					     sizeof(struct ext2_inode_large));
+		if (err)
+			return err;
+		inode = &i;
+	}
+
+	/* Do we already have an EA block? */
+	blk = ext2fs_file_acl_block(fs, (struct ext2_inode *)inode);
+	if (blk == 0)
+		return 0;
+
+	/* Find block, zero it, write back */
+	if ((blk < fs->super->s_first_data_block) ||
+	    (blk >= ext2fs_blocks_count(fs->super))) {
+		err = EXT2_ET_BAD_EA_BLOCK_NUM;
+		goto out;
+	}
+
+	err = ext2fs_get_mem(fs->blocksize, &block_buf);
+	if (err)
+		goto out;
+
+	err = ext2fs_read_ext_attr3(fs, blk, block_buf, ino);
+	if (err)
+		goto out2;
+
+	header = (struct ext2_ext_attr_header *) block_buf;
+	if (header->h_magic != EXT2_EXT_ATTR_MAGIC) {
+		err = EXT2_ET_BAD_EA_HEADER;
+		goto out2;
+	}
+
+	header->h_refcount--;
+	err = ext2fs_write_ext_attr3(fs, blk, block_buf, ino);
+	if (err)
+		goto out2;
+
+	/* Erase link to block */
+	ext2fs_file_acl_block_set(fs, (struct ext2_inode *)inode, 0);
+	if (header->h_refcount == 0)
+		ext2fs_block_alloc_stats2(fs, blk, -1);
+
+	/* Write inode? */
+	if (inode == &i) {
+		err = ext2fs_write_inode_full(fs, ino, (struct ext2_inode *)&i,
+					      sizeof(struct ext2_inode_large));
+		if (err)
+			goto out2;
+	}
+
+out2:
+	ext2fs_free_mem(&block_buf);
+out:
+	return err;
+}
+
+static errcode_t prep_ea_block_for_write(ext2_filsys fs, ext2_ino_t ino,
+					 struct ext2_inode_large *inode)
+{
+	struct ext2_ext_attr_header *header;
+	void *block_buf = NULL;
+	dgrp_t grp;
+	blk64_t blk, goal;
+	errcode_t err;
+
+	/* Do we already have an EA block? */
+	blk = ext2fs_file_acl_block(fs, (struct ext2_inode *)inode);
+	if (blk != 0) {
+		if ((blk < fs->super->s_first_data_block) ||
+		    (blk >= ext2fs_blocks_count(fs->super))) {
+			err = EXT2_ET_BAD_EA_BLOCK_NUM;
+			goto out;
+		}
+
+		err = ext2fs_get_mem(fs->blocksize, &block_buf);
+		if (err)
+			goto out;
+
+		err = ext2fs_read_ext_attr3(fs, blk, block_buf, ino);
+		if (err)
+			goto out2;
+
+		header = (struct ext2_ext_attr_header *) block_buf;
+		if (header->h_magic != EXT2_EXT_ATTR_MAGIC) {
+			err = EXT2_ET_BAD_EA_HEADER;
+			goto out2;
+		}
+
+		/* Single-user block.  We're done here. */
+		if (header->h_refcount == 1)
+			goto out2;
+
+		/* We need to CoW the block. */
+		header->h_refcount--;
+		err = ext2fs_write_ext_attr3(fs, blk, block_buf, ino);
+		if (err)
+			goto out2;
+	} else {
+		/* No block, we must increment i_blocks */
+		err = ext2fs_iblk_add_blocks(fs, (struct ext2_inode *)inode,
+					     1);
+		if (err)
+			goto out;
+	}
+
+	/* Allocate a block */
+	grp = ext2fs_group_of_ino(fs, ino);
+	goal = ext2fs_inode_table_loc(fs, grp);
+	err = ext2fs_alloc_block2(fs, goal, NULL, &blk);
+	if (err)
+		goto out2;
+	ext2fs_file_acl_block_set(fs, (struct ext2_inode *)inode, blk);
+out2:
+	ext2fs_free_mem(&block_buf);
+out:
+	return err;
+}
+
+
+static errcode_t write_xattrs_to_buffer(struct ext2_xattr_handle *handle,
+					struct ext2_xattr **pos,
+					void *entries_start,
+					unsigned int storage_size,
+					unsigned int value_offset_correction)
+{
+	struct ext2_xattr *x = *pos;
+	struct ext2_ext_attr_entry *e = entries_start;
+	void *end = entries_start + storage_size;
+	char *shortname;
+	unsigned int entry_size, value_size;
+	int idx, ret;
+
+	/* For all remaining x...  */
+	for (; x < handle->attrs + handle->length; x++) {
+		if (!x->name)
+			continue;
+
+		/* Calculate index and shortname position */
+		shortname = x->name;
+		ret = find_ea_index(x->name, &shortname, &idx);
+
+		/* Calculate entry and value size */
+		entry_size = (sizeof(*e) + strlen(shortname) +
+			      EXT2_EXT_ATTR_PAD - 1) &
+			     ~(EXT2_EXT_ATTR_PAD - 1);
+		value_size = ((x->value_len + EXT2_EXT_ATTR_PAD - 1) /
+			      EXT2_EXT_ATTR_PAD) * EXT2_EXT_ATTR_PAD;
+
+		/*
+		 * Would entry collide with value?
+		 * Note that we must leave sufficient room for a (u32)0 to
+		 * mark the end of the entries.
+		 */
+		if ((void *)e + entry_size + sizeof(__u32) > end - value_size)
+			break;
+
+		/* Fill out e appropriately */
+		e->e_name_len = strlen(shortname);
+		e->e_name_index = (ret ? idx : 0);
+		e->e_value_offs = end - value_size - (void *)entries_start +
+				value_offset_correction;
+		e->e_value_block = 0;
+		e->e_value_size = x->value_len;
+
+		/* Store name and value */
+		end -= value_size;
+		memcpy((void *)e + sizeof(*e), shortname, e->e_name_len);
+		memcpy(end, x->value, e->e_value_size);
+
+		e->e_hash = ext2fs_ext_attr_hash_entry(e, end);
+
+		e = EXT2_EXT_ATTR_NEXT(e);
+		*(__u32 *)e = 0;
+	}
+	*pos = x;
+
+	return 0;
+}
+
+errcode_t ext2fs_xattrs_write(struct ext2_xattr_handle *handle)
+{
+	struct ext2_xattr *x;
+	struct ext2_inode_large *inode;
+	void *start, *block_buf = NULL;
+	struct ext2_ext_attr_header *header;
+	__u32 ea_inode_magic;
+	blk64_t blk;
+	unsigned int storage_size;
+	unsigned int i, written;
+	errcode_t err;
+
+	if (!EXT2_HAS_COMPAT_FEATURE(handle->fs->super,
+				     EXT2_FEATURE_COMPAT_EXT_ATTR))
+		return 0;
+
+	i = EXT2_INODE_SIZE(handle->fs->super);
+	if (i < sizeof(*inode))
+		i = sizeof(*inode);
+	err = ext2fs_get_memzero(i, &inode);
+	if (err)
+		return err;
+
+	err = ext2fs_read_inode_full(handle->fs, handle->ino,
+				     (struct ext2_inode *)inode,
+				     EXT2_INODE_SIZE(handle->fs->super));
+	if (err)
+		goto out;
+
+	x = handle->attrs;
+	/* Does the inode have size for EA? */
+	if (EXT2_INODE_SIZE(handle->fs->super) <= EXT2_GOOD_OLD_INODE_SIZE +
+						  inode->i_extra_isize +
+						  sizeof(__u32))
+		goto write_ea_block;
+
+	/* Write the inode EA */
+	ea_inode_magic = EXT2_EXT_ATTR_MAGIC;
+	memcpy(((char *) inode) + EXT2_GOOD_OLD_INODE_SIZE +
+	       inode->i_extra_isize, &ea_inode_magic, sizeof(__u32));
+	storage_size = EXT2_INODE_SIZE(handle->fs->super) -
+		EXT2_GOOD_OLD_INODE_SIZE - inode->i_extra_isize -
+		sizeof(__u32);
+	start = ((char *) inode) + EXT2_GOOD_OLD_INODE_SIZE +
+		inode->i_extra_isize + sizeof(__u32);
+
+	err = write_xattrs_to_buffer(handle, &x, start, storage_size, 0);
+	if (err)
+		goto out;
+
+	/* Are we done? */
+	if (x == handle->attrs + handle->length)
+		goto skip_ea_block;
+
+write_ea_block:
+	/* Write the EA block */
+	err = ext2fs_get_mem(handle->fs->blocksize, &block_buf);
+	if (err)
+		goto out;
+
+	storage_size = handle->fs->blocksize -
+		sizeof(struct ext2_ext_attr_header);
+	start = block_buf + sizeof(struct ext2_ext_attr_header);
+
+	err = write_xattrs_to_buffer(handle, &x, start, storage_size,
+				     (void *)start - block_buf);
+	if (err)
+		goto out2;
+
+	if (x < handle->attrs + handle->length) {
+		err = EXT2_ET_EA_NO_SPACE;
+		goto out2;
+	}
+
+	if (block_buf) {
+		/* Write a header on the EA block */
+		header = block_buf;
+		header->h_magic = EXT2_EXT_ATTR_MAGIC;
+		header->h_refcount = 1;
+		header->h_blocks = 1;
+
+		/* Get a new block for writing */
+		err = prep_ea_block_for_write(handle->fs, handle->ino, inode);
+		if (err)
+			goto out2;
+
+		/* Finally, write the new EA block */
+		blk = ext2fs_file_acl_block(handle->fs,
+					    (struct ext2_inode *)inode);
+		err = ext2fs_write_ext_attr3(handle->fs, blk, block_buf,
+					     handle->ino);
+		if (err)
+			goto out2;
+	}
+
+skip_ea_block:
+	blk = ext2fs_file_acl_block(handle->fs, (struct ext2_inode *)inode);
+	if (!block_buf && blk) {
+		/* xattrs shrunk, free the block */
+		ext2fs_file_acl_block_set(handle->fs,
+					  (struct ext2_inode *)inode, 0);
+		err = ext2fs_iblk_sub_blocks(handle->fs,
+					     (struct ext2_inode *)inode, 1);
+		if (err)
+			goto out;
+		ext2fs_block_alloc_stats2(handle->fs, blk, -1);
+	}
+
+	/* Write the inode */
+	err = ext2fs_write_inode_full(handle->fs, handle->ino,
+				      (struct ext2_inode *)inode,
+				      EXT2_INODE_SIZE(handle->fs->super));
+	if (err)
+		goto out2;
+
+out2:
+	ext2fs_free_mem(&block_buf);
+out:
+	ext2fs_free_mem(&inode);
+	handle->dirty = 0;
+	return err;
+}
+
+static errcode_t read_xattrs_from_buffer(struct ext2_xattr_handle *handle,
+					 struct ext2_ext_attr_entry *entries,
+					 unsigned int storage_size,
+					 void *value_start)
+{
+	struct ext2_xattr *x;
+	struct ext2_ext_attr_entry *entry;
+	const char *prefix;
+	void *ptr;
+	unsigned int remain = storage_size, prefix_len;
+	errcode_t err;
+
+	x = handle->attrs;
+	while (x->name)
+		x++;
+
+	entry = entries;
+	while (!EXT2_EXT_IS_LAST_ENTRY(entry)) {
+		__u32 hash;
+
+		/* header eats this space */
+		remain -= sizeof(struct ext2_ext_attr_entry);
+
+		/* is attribute name valid? */
+		if (EXT2_EXT_ATTR_SIZE(entry->e_name_len) > remain)
+			return EXT2_ET_EA_BAD_NAME_LEN;
+
+		/* attribute len eats this space */
+		remain -= EXT2_EXT_ATTR_SIZE(entry->e_name_len);
+
+		/* check value size */
+		if (entry->e_value_size > remain)
+			return EXT2_ET_EA_BAD_VALUE_SIZE;
+
+		/* e_value_block must be 0 in inode's ea */
+		if (entry->e_value_block != 0)
+			return EXT2_ET_BAD_EA_BLOCK_NUM;
+
+		hash = ext2fs_ext_attr_hash_entry(entry, value_start +
+							 entry->e_value_offs);
+
+		/* e_hash may be 0 in older inode's ea */
+		if (entry->e_hash != 0 && entry->e_hash != hash)
+			return EXT2_ET_BAD_EA_HASH;
+
+		remain -= entry->e_value_size;
+
+		/* Allocate space for more attrs? */
+		if (x == handle->attrs + handle->length) {
+			err = ext2fs_xattrs_expand(handle, 4);
+			if (err)
+				return err;
+			x = handle->attrs + handle->length - 4;
+		}
+
+		/* Extract name/value */
+		prefix = find_ea_prefix(entry->e_name_index);
+		prefix_len = (prefix ? strlen(prefix) : 0);
+		err = ext2fs_get_memzero(entry->e_name_len + prefix_len + 1,
+					 &x->name);
+		if (err)
+			return err;
+		if (prefix)
+			memcpy(x->name, prefix, prefix_len);
+		if (entry->e_name_len)
+			memcpy(x->name + prefix_len,
+			       (void *)entry + sizeof(*entry),
+			       entry->e_name_len);
+
+		err = ext2fs_get_mem(entry->e_value_size, &x->value);
+		if (err)
+			return err;
+		x->value_len = entry->e_value_size;
+		memcpy(x->value, value_start + entry->e_value_offs,
+		       entry->e_value_size);
+		x++;
+		entry = EXT2_EXT_ATTR_NEXT(entry);
+	}
+
+	return 0;
+}
+
+errcode_t ext2fs_xattrs_read(struct ext2_xattr_handle *handle)
+{
+	struct ext2_xattr *attrs = NULL, *x;
+	unsigned int attrs_len;
+	struct ext2_inode_large *inode;
+	struct ext2_ext_attr_header *header;
+	__u32 ea_inode_magic;
+	unsigned int storage_size;
+	void *start, *block_buf = NULL;
+	blk64_t blk;
+	int i;
+	errcode_t err;
+
+	if (!EXT2_HAS_COMPAT_FEATURE(handle->fs->super,
+				     EXT2_FEATURE_COMPAT_EXT_ATTR))
+		return 0;
+
+	i = EXT2_INODE_SIZE(handle->fs->super);
+	if (i < sizeof(*inode))
+		i = sizeof(*inode);
+	err = ext2fs_get_memzero(i, &inode);
+	if (err)
+		return err;
+
+	err = ext2fs_read_inode_full(handle->fs, handle->ino,
+				     (struct ext2_inode *)inode,
+				     EXT2_INODE_SIZE(handle->fs->super));
+	if (err)
+		goto out;
+
+	/* Does the inode have size for EA? */
+	if (EXT2_INODE_SIZE(handle->fs->super) <= EXT2_GOOD_OLD_INODE_SIZE +
+						  inode->i_extra_isize +
+						  sizeof(__u32))
+		goto read_ea_block;
+
+	/* Look for EA in the inode */
+	memcpy(&ea_inode_magic, ((char *) inode) + EXT2_GOOD_OLD_INODE_SIZE +
+	       inode->i_extra_isize, sizeof(__u32));
+	if (ea_inode_magic == EXT2_EXT_ATTR_MAGIC) {
+		storage_size = EXT2_INODE_SIZE(handle->fs->super) -
+			EXT2_GOOD_OLD_INODE_SIZE - inode->i_extra_isize -
+			sizeof(__u32);
+		start = ((char *) inode) + EXT2_GOOD_OLD_INODE_SIZE +
+			inode->i_extra_isize + sizeof(__u32);
+
+		err = read_xattrs_from_buffer(handle, start, storage_size,
+					      start);
+		if (err)
+			goto out;
+	}
+
+read_ea_block:
+	/* Look for EA in a separate EA block */
+	blk = ext2fs_file_acl_block(handle->fs, (struct ext2_inode *)inode);
+	if (blk != 0) {
+		if ((blk < handle->fs->super->s_first_data_block) ||
+		    (blk >= ext2fs_blocks_count(handle->fs->super))) {
+			err = EXT2_ET_BAD_EA_BLOCK_NUM;
+			goto out;
+		}
+
+		err = ext2fs_get_mem(handle->fs->blocksize, &block_buf);
+		if (err)
+			goto out;
+
+		err = ext2fs_read_ext_attr3(handle->fs, blk, block_buf,
+					    handle->ino);
+		if (err)
+			goto out3;
+
+		header = (struct ext2_ext_attr_header *) block_buf;
+		if (header->h_magic != EXT2_EXT_ATTR_MAGIC) {
+			err = EXT2_ET_BAD_EA_HEADER;
+			goto out3;
+		}
+
+		if (header->h_blocks != 1) {
+			err = EXT2_ET_BAD_EA_HEADER;
+			goto out3;
+		}
+
+		/* Read EAs */
+		storage_size = handle->fs->blocksize -
+			sizeof(struct ext2_ext_attr_header);
+		start = block_buf + sizeof(struct ext2_ext_attr_header);
+		err = read_xattrs_from_buffer(handle, start, storage_size,
+					      block_buf);
+		if (err)
+			goto out3;
+
+		ext2fs_free_mem(&block_buf);
+	}
+
+	ext2fs_free_mem(&block_buf);
+	ext2fs_free_mem(&inode);
+	return 0;
+
+out3:
+	ext2fs_free_mem(&block_buf);
+out:
+	ext2fs_free_mem(&inode);
+	return err;
+}
+
+#define XATTR_ABORT	1
+#define XATTR_CHANGED	2
+errcode_t ext2fs_xattrs_iterate(struct ext2_xattr_handle *h,
+				int (*func)(char *name, char *value,
+					    void *data),
+				void *data)
+{
+	struct ext2_xattr *x;
+	errcode_t err;
+	int ret;
+
+	for (x = h->attrs; x < h->attrs + h->length; x++) {
+		if (!x->name)
+			continue;
+
+		ret = func(x->name, x->value, data);
+		if (ret & XATTR_CHANGED)
+			h->dirty = 1;
+		if (ret & XATTR_ABORT)
+			return 0;
+	}
+
+	return 0;
+}
+
+errcode_t ext2fs_xattr_get(struct ext2_xattr_handle *h, const char *key,
+			   void **value, unsigned int *value_len)
+{
+	struct ext2_xattr *x;
+	void *val;
+	errcode_t err;
+
+	for (x = h->attrs; x < h->attrs + h->length; x++) {
+		if (!x->name)
+			continue;
+
+		if (strcmp(x->name, key) == 0) {
+			err = ext2fs_get_mem(x->value_len, &val);
+			if (err)
+				return err;
+			memcpy(val, x->value, x->value_len);
+			*value = val;
+			*value_len = x->value_len;
+			return 0;
+		}
+	}
+
+	return EXT2_ET_EA_KEY_NOT_FOUND;
+}
+
+errcode_t ext2fs_xattr_set(struct ext2_xattr_handle *handle,
+			   const char *key,
+			   const void *value,
+			   unsigned int value_len)
+{
+	struct ext2_xattr *x, *last_empty;
+	char *new_value;
+	errcode_t err;
+
+	last_empty = NULL;
+	for (x = handle->attrs; x < handle->attrs + handle->length; x++) {
+		if (!x->name) {
+			last_empty = x;
+			continue;
+		}
+
+		/* Replace xattr */
+		if (strcmp(x->name, key) == 0) {
+			err = ext2fs_get_mem(value_len, &new_value);
+			if (err)
+				return err;
+			memcpy(new_value, value, value_len);
+			ext2fs_free_mem(&x->value);
+			x->value = new_value;
+			x->value_len = value_len;
+			handle->dirty = 1;
+			return 0;
+		}
+	}
+
+	/* Add attr to empty slot */
+	if (last_empty) {
+		err = ext2fs_get_mem(strlen(key) + 1, &last_empty->name);
+		if (err)
+			return err;
+		strcpy(last_empty->name, key);
+
+		err = ext2fs_get_mem(value_len, &last_empty->value);
+		if (err)
+			return err;
+		memcpy(last_empty->value, value, value_len);
+		last_empty->value_len = value_len;
+		handle->dirty = 1;
+		return 0;
+	}
+
+	/* Expand array, append slot */
+	err = ext2fs_xattrs_expand(handle, 4);
+	if (err)
+		return err;
+
+	x = handle->attrs + handle->length - 4;
+	err = ext2fs_get_mem(strlen(key) + 1, &x->name);
+	if (err)
+		return err;
+	strcpy(x->name, key);
+
+	err = ext2fs_get_mem(value_len, &x->value);
+	if (err)
+		return err;
+	memcpy(x->value, value, value_len);
+	x->value_len = value_len;
+	handle->dirty = 1;
+	return 0;
+}
+
+errcode_t ext2fs_xattr_remove(struct ext2_xattr_handle *handle,
+			      const char *key)
+{
+	struct ext2_xattr *x;
+	errcode_t err;
+
+	for (x = handle->attrs; x < handle->attrs + handle->length; x++) {
+		if (!x->name)
+			continue;
+
+		if (strcmp(x->name, key) == 0) {
+			ext2fs_free_mem(&x->name);
+			ext2fs_free_mem(&x->value);
+			x->value_len = 0;
+			handle->dirty = 1;
+			return 0;
+		}
+	}
+
+	return EXT2_ET_EA_KEY_NOT_FOUND;
+}
+
+errcode_t ext2fs_xattrs_open(ext2_filsys fs, ext2_ino_t ino,
+			     struct ext2_xattr_handle **handle)
+{
+	struct ext2_xattr_handle *h;
+	errcode_t err;
+
+	err = ext2fs_get_memzero(sizeof(*h), &h);
+	if (err)
+		return err;
+
+	h->length = 4;
+	err = ext2fs_get_arrayzero(h->length, sizeof(struct ext2_xattr),
+				   &h->attrs);
+	if (err) {
+		ext2fs_free_mem(&h);
+		return err;
+	}
+	h->ino = ino;
+	h->fs = fs;
+	*handle = h;
+	return 0;
+}
+
+errcode_t ext2fs_xattrs_close(struct ext2_xattr_handle **handle)
+{
+	unsigned int i;
+	struct ext2_xattr_handle *h = *handle;
+	struct ext2_xattr *a = h->attrs;
+	errcode_t err;
+
+	if (h->dirty) {
+		err = ext2fs_xattrs_write(h);
+		if (err)
+			return err;
+	}
+
+	for (i = 0; i < h->length; i++) {
+		if (a[i].name)
+			ext2fs_free_mem(&a[i].name);
+		if (a[i].value)
+			ext2fs_free_mem(&a[i].value);
+	}
+
+	ext2fs_free_mem(&h->attrs);
+	ext2fs_free_mem(handle);
+	return 0;
+}
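
A note on the slot handling in ext2fs_xattr_set() above: the key is
allocated into the slot before the value, so a failed value allocation
leaves a named slot with no value.  The following is a minimal
standalone sketch of the same replace / reuse-empty-slot / grow-by-4
order that allocates both buffers before touching the slot.  It uses
plain malloc() and hypothetical toy_* names rather than the libext2fs
helpers, so treat it as an illustration of the idea, not as the
library code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-ins for the patch's structures; all names are hypothetical. */
struct toy_xattr {
	char *name;		/* NULL marks an empty slot */
	char *value;
	size_t value_len;
};

struct toy_handle {
	struct toy_xattr *attrs;
	size_t length;
	int dirty;
};

/* Fill a slot; if either allocation fails the slot is left unchanged. */
static int toy_fill(struct toy_xattr *x, const char *key,
		    const void *value, size_t value_len)
{
	char *n = malloc(strlen(key) + 1);
	char *v = malloc(value_len ? value_len : 1);

	if (!n || !v) {
		free(n);
		free(v);
		return -1;
	}
	strcpy(n, key);
	memcpy(v, value, value_len);
	free(x->name);
	free(x->value);
	x->name = n;
	x->value = v;
	x->value_len = value_len;
	return 0;
}

/* Replace a matching key, else reuse an empty slot, else grow by 4. */
static int toy_set(struct toy_handle *h, const char *key,
		   const void *value, size_t value_len)
{
	struct toy_xattr *slot = NULL, *grown;
	size_t i;

	assert(h && key && value);
	for (i = 0; i < h->length; i++) {
		if (!h->attrs[i].name) {
			slot = &h->attrs[i];	/* remember an empty slot */
			continue;
		}
		if (strcmp(h->attrs[i].name, key) == 0) {
			slot = &h->attrs[i];	/* existing key wins */
			break;
		}
	}
	if (!slot) {
		grown = realloc(h->attrs, (h->length + 4) * sizeof(*grown));
		if (!grown)
			return -1;
		memset(grown + h->length, 0, 4 * sizeof(*grown));
		h->attrs = grown;
		slot = grown + h->length;
		h->length += 4;
	}
	if (toy_fill(slot, key, value, value_len))
		return -1;
	h->dirty = 1;
	return 0;
}
```

Starting from a zeroed struct toy_handle, the first toy_set() call grows
the array to four slots; setting the same key again replaces the value
in place without growing.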



* [PATCH 24/25] misc: add fuse2fs, a FUSE server for e2fsprogs
  2013-10-18  4:48 [PATCH v2 00/25] e2fsprogs patchbomb 10/2013 Darrick J. Wong
                   ` (22 preceding siblings ...)
  2013-10-18  4:51 ` [PATCH 23/25] libext2fs: support modifying arbitrary extended attributes Darrick J. Wong
@ 2013-10-18  4:51 ` Darrick J. Wong
  2013-10-18 19:36   ` Darrick J. Wong
  2013-10-22  1:20   ` Darrick J. Wong
  2013-10-18 13:13 ` [PATCH v2 00/25] e2fsprogs patchbomb 10/2013 Lukáš Czerner
  2013-10-18 18:39 ` Theodore Ts'o
  25 siblings, 2 replies; 73+ messages in thread
From: Darrick J. Wong @ 2013-10-18  4:51 UTC (permalink / raw)
  To: tytso, darrick.wong; +Cc: linux-ext4

This is the initial implementation of a FUSE server based on
e2fsprogs.  The point of this program is to enable ext4 to run on any
OS that FUSE supports and that doesn't already have a native driver,
such as Mac OS X, the BSDs, and Windows.  The code requires FUSE API
v28, which is provided by the Linux fuse and osxfuse releases available
as of August 2013.

v2: Remove unnecessary calls to ext2fs_flush(), and ensure that xattr
blocks are freed when removing an inode.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
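For readers unfamiliar with the ext4 "extra" timestamp word used by the
ext4_encode_extra_time()/ext4_decode_extra_time() helpers in fuse2fs.c:
the low two bits carry bits 32-33 of the seconds value and the upper 30
bits carry the nanoseconds.  A minimal standalone sketch of that packing
follows.  It uses fixed-width types and hypothetical names (pack_extra,
unpack_nsec, unpack_epoch) instead of struct timespec, so it shows the
bit layout only, not the patch's exact code.

```c
#include <assert.h>
#include <stdint.h>

#define EPOCH_BITS 2
#define EPOCH_MASK ((1u << EPOCH_BITS) - 1)
#define NSEC_MASK  (~0u << EPOCH_BITS)

/* Pack bits 32-33 of the seconds and the nanoseconds into one 32-bit word. */
static uint32_t pack_extra(int64_t sec, uint32_t nsec)
{
	assert(nsec < 1000000000u);	/* fits in 30 bits */
	return (uint32_t)(((uint64_t)sec >> 32) & EPOCH_MASK) |
	       ((nsec << EPOCH_BITS) & NSEC_MASK);
}

/* Recover the nanoseconds from the packed word. */
static uint32_t unpack_nsec(uint32_t extra)
{
	return (extra & NSEC_MASK) >> EPOCH_BITS;
}

/* Recover the two high seconds bits from the packed word. */
static uint32_t unpack_epoch(uint32_t extra)
{
	return extra & EPOCH_MASK;
}
```

Because nanoseconds are always below 10^9, shifting them left by two
bits never overflows the 32-bit word.
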
 MCONFIG.in       |    1 
 configure        |   89 ++
 configure.in     |    9 
 misc/Makefile.in |   15 
 misc/fuse2fs.c   | 2837 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 2949 insertions(+), 2 deletions(-)
 create mode 100644 misc/fuse2fs.c
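
The atime policy in update_atime() can be read as a single predicate:
the write is skipped only when atime is at least as new as mtime and
was refreshed within the last 86400 seconds.  A minimal sketch of that
predicate, with a hypothetical function name and plain time_t arguments
in place of the inode fields:

```c
#include <assert.h>
#include <time.h>

#define ATIME_MAX_AGE 86400	/* seconds; same constant as update_atime() */

/* Return 1 if atime should be rewritten, 0 if the update can be skipped. */
static int atime_needs_update(time_t atime, time_t mtime, time_t now)
{
	assert(now >= 0);
	/* Skip only when atime is not older than mtime and at most a day old. */
	if (atime >= mtime && atime >= now - ATIME_MAX_AGE)
		return 0;
	return 1;
}
```

A file read repeatedly within a day therefore costs at most one inode
write, while a file modified since its last read always gets a fresh
atime.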


diff --git a/MCONFIG.in b/MCONFIG.in
index fa2b03e..9f88b55 100644
--- a/MCONFIG.in
+++ b/MCONFIG.in
@@ -93,6 +93,7 @@ LIBCOM_ERR = $(LIB)/libcom_err@LIB_EXT@ @PRIVATE_LIBS_CMT@ @SEM_INIT_LIB@
 LIBE2P = $(LIB)/libe2p@LIB_EXT@
 LIBEXT2FS = $(LIB)/libext2fs@LIB_EXT@
 LIBUUID = @LIBUUID@ @SOCKET_LIB@
+LIBFUSE = @FUSE_LIB@
 LIBQUOTA = @STATIC_LIBQUOTA@
 LIBBLKID = @LIBBLKID@ @PRIVATE_LIBS_CMT@ $(LIBUUID)
 LIBINTL = @LIBINTL@
diff --git a/configure b/configure
index 2338fbe..c666235 100755
--- a/configure
+++ b/configure
@@ -639,6 +639,8 @@ CYGWIN_CMT
 LINUX_CMT
 UNI_DIFF_OPTS
 SEM_INIT_LIB
+FUSE_CMT
+FUSE_LIB
 SOCKET_LIB
 SIZEOF_OFF_T
 SIZEOF_LONG_LONG
@@ -11172,6 +11174,93 @@ if test "x$ac_cv_lib_socket_socket" = xyes; then :
 fi
 
 
+FUSE_CMT=''
+FUSE_LIB=''
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for fuse_main in -losxfuse" >&5
+$as_echo_n "checking for fuse_main in -losxfuse... " >&6; }
+if test "${ac_cv_lib_osxfuse_fuse_main+set}" = set; then :
+  $as_echo_n "(cached) " >&6
+else
+  ac_check_lib_save_LIBS=$LIBS
+LIBS="-losxfuse  $LIBS"
+cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+
+/* Override any GCC internal prototype to avoid an error.
+   Use char because int might match the return type of a GCC
+   builtin and then its argument prototype would still apply.  */
+#ifdef __cplusplus
+extern "C"
+#endif
+char fuse_main ();
+int
+main ()
+{
+return fuse_main ();
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_link "$LINENO"; then :
+  ac_cv_lib_osxfuse_fuse_main=yes
+else
+  ac_cv_lib_osxfuse_fuse_main=no
+fi
+rm -f core conftest.err conftest.$ac_objext \
+    conftest$ac_exeext conftest.$ac_ext
+LIBS=$ac_check_lib_save_LIBS
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_lib_osxfuse_fuse_main" >&5
+$as_echo "$ac_cv_lib_osxfuse_fuse_main" >&6; }
+if test "x$ac_cv_lib_osxfuse_fuse_main" = x""yes; then :
+  FUSE_LIB=-losxfuse
+else
+  { $as_echo "$as_me:${as_lineno-$LINENO}: checking for fuse_main in -lfuse" >&5
+$as_echo_n "checking for fuse_main in -lfuse... " >&6; }
+if test "${ac_cv_lib_fuse_fuse_main+set}" = set; then :
+  $as_echo_n "(cached) " >&6
+else
+  ac_check_lib_save_LIBS=$LIBS
+LIBS="-lfuse  $LIBS"
+cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+
+/* Override any GCC internal prototype to avoid an error.
+   Use char because int might match the return type of a GCC
+   builtin and then its argument prototype would still apply.  */
+#ifdef __cplusplus
+extern "C"
+#endif
+char fuse_main ();
+int
+main ()
+{
+return fuse_main ();
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_link "$LINENO"; then :
+  ac_cv_lib_fuse_fuse_main=yes
+else
+  ac_cv_lib_fuse_fuse_main=no
+fi
+rm -f core conftest.err conftest.$ac_objext \
+    conftest$ac_exeext conftest.$ac_ext
+LIBS=$ac_check_lib_save_LIBS
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_lib_fuse_fuse_main" >&5
+$as_echo "$ac_cv_lib_fuse_fuse_main" >&6; }
+if test "x$ac_cv_lib_fuse_fuse_main" = x""yes; then :
+  FUSE_LIB=-lfuse
+else
+  FUSE_CMT="#"
+fi
+
+fi
+
+
+
 { $as_echo "$as_me:${as_lineno-$LINENO}: checking for optreset" >&5
 $as_echo_n "checking for optreset... " >&6; }
 if ${ac_cv_have_optreset+:} false; then :
diff --git a/configure.in b/configure.in
index 049dc11..623adc8 100644
--- a/configure.in
+++ b/configure.in
@@ -1127,6 +1127,15 @@ SOCKET_LIB=''
 AC_CHECK_LIB(socket, socket, [SOCKET_LIB=-lsocket])
 AC_SUBST(SOCKET_LIB)
 dnl
+dnl Check to see if the FUSE library is -lfuse or -losxfuse
+dnl
+FUSE_CMT=''
+FUSE_LIB=''
+dnl osxfuse.dylib supersedes fuselib.dylib
+AC_CHECK_LIB(osxfuse, fuse_main, [FUSE_LIB=-losxfuse], [AC_CHECK_LIB(fuse, fuse_main, [FUSE_LIB=-lfuse], [FUSE_CMT="#"])])
+AC_SUBST(FUSE_LIB)
+AC_SUBST(FUSE_CMT)
+dnl
 dnl See if optreset exists
 dnl
 AC_MSG_CHECKING(for optreset)
diff --git a/misc/Makefile.in b/misc/Makefile.in
index a798f96..1838d03 100644
--- a/misc/Makefile.in
+++ b/misc/Makefile.in
@@ -26,9 +26,12 @@ INSTALL = @INSTALL@
 @BLKID_CMT@FINDFS_LINK= findfs
 @BLKID_CMT@FINDFS_MAN= findfs.8
 
+@FUSE_CMT@FUSE_PROG= fuse2fs
+
 SPROGS=		mke2fs badblocks tune2fs dumpe2fs $(BLKID_PROG) logsave \
 			$(E2IMAGE_PROG) @FSCK_PROG@ e2undo
-USPROGS=	mklost+found filefrag e2freefrag $(UUIDD_PROG) $(E4DEFRAG_PROG)
+USPROGS=	mklost+found filefrag e2freefrag $(UUIDD_PROG) $(E4DEFRAG_PROG) \
+			$(FUSE_PROG)
 SMANPAGES=	tune2fs.8 mklost+found.8 mke2fs.8 dumpe2fs.8 badblocks.8 \
 			e2label.8 $(FINDFS_MAN) $(BLKID_MAN) $(E2IMAGE_MAN) \
 			logsave.8 filefrag.8 e2freefrag.8 e2undo.8 \
@@ -56,6 +59,7 @@ FILEFRAG_OBJS=	filefrag.o
 E2UNDO_OBJS=  e2undo.o
 E4DEFRAG_OBJS=	e4defrag.o
 E2FREEFRAG_OBJS= e2freefrag.o
+FUSE2FS_OBJS=	fuse2fs.o
 
 PROFILED_TUNE2FS_OBJS=	profiled/tune2fs.o profiled/util.o
 PROFILED_MKLPF_OBJS=	profiled/mklost+found.o
@@ -75,6 +79,7 @@ PROFILED_FILEFRAG_OBJS=	profiled/filefrag.o
 PROFILED_E2FREEFRAG_OBJS= profiled/e2freefrag.o
 PROFILED_E2UNDO_OBJS=	profiled/e2undo.o
 PROFILED_E4DEFRAG_OBJS=	profiled/e4defrag.o
+PROFILED_FUSE2FS_OBJS=	profiled/fuse2fs.o
 
 SRCS=	$(srcdir)/tune2fs.c $(srcdir)/mklost+found.c $(srcdir)/mke2fs.c \
 		$(srcdir)/chattr.c $(srcdir)/lsattr.c $(srcdir)/dumpe2fs.c \
@@ -82,7 +87,7 @@ SRCS=	$(srcdir)/tune2fs.c $(srcdir)/mklost+found.c $(srcdir)/mke2fs.c \
 		$(srcdir)/uuidgen.c $(srcdir)/blkid.c $(srcdir)/logsave.c \
 		$(srcdir)/filefrag.c $(srcdir)/base_device.c \
 		$(srcdir)/ismounted.c $(srcdir)/../e2fsck/profile.c \
-		$(srcdir)/e2undo.c $(srcdir)/e2freefrag.c
+		$(srcdir)/e2undo.c $(srcdir)/e2freefrag.c $(srcdir)/fuse2fs.c
 
 LIBS= $(LIBEXT2FS) $(LIBCOM_ERR) 
 DEPLIBS= $(LIBEXT2FS) $(DEPLIBCOM_ERR)
@@ -335,6 +340,12 @@ filefrag.profiled: $(FILEFRAG_OBJS)
 	$(Q) $(CC) $(ALL_LDFLAGS) -g -pg -o filefrag.profiled \
 		$(PROFILED_FILEFRAG_OBJS) 
 
+fuse2fs: $(FUSE2FS_OBJS) $(DEPLIBS) $(DEPLIBBLKID) $(DEPLIBUUID) \
+		$(DEPLIBQUOTA) $(LIBEXT2FS)
+	$(E) "	LD $@"
+	$(Q) $(CC) $(ALL_LDFLAGS) -o fuse2fs $(FUSE2FS_OBJS) $(LIBS) \
+		$(LIBFUSE) $(LIBBLKID) $(LIBUUID) $(LIBEXT2FS)
+
 tst_ismounted: $(srcdir)/ismounted.c $(STATIC_LIBEXT2FS) $(DEPLIBCOM_ERR)
 	$(E) "	LD $@"
 	$(CC) -o tst_ismounted $(srcdir)/ismounted.c -DDEBUG $(ALL_CFLAGS) \
diff --git a/misc/fuse2fs.c b/misc/fuse2fs.c
new file mode 100644
index 0000000..d1c00df
--- /dev/null
+++ b/misc/fuse2fs.c
@@ -0,0 +1,2837 @@
+/*
+ * fuse2fs.c - FUSE server for e2fsprogs.
+ *
+ * Copyright (C) 2013 Oracle.
+ *
+ * %Begin-Header%
+ * This file may be redistributed under the terms of the GNU Public
+ * License.
+ * %End-Header%
+ */
+#define _FILE_OFFSET_BITS 64
+#define FUSE_USE_VERSION 29
+#define _GNU_SOURCE
+#include <pthread.h>
+#ifdef __linux__
+# include <linux/fs.h>
+# include <linux/falloc.h>
+# include <linux/xattr.h>
+#endif
+#include <sys/ioctl.h>
+#include <unistd.h>
+#include <fuse.h>
+#include "ext2fs/ext2fs.h"
+#include "ext2fs/ext2_fs.h"
+
+#if FUSE_VERSION >= FUSE_MAKE_VERSION(2, 8)
+# ifdef _IOR
+#  ifdef _IOW
+#   define SUPPORT_I_FLAGS
+#  endif
+# endif
+#endif
+
+#ifdef FALLOC_FL_KEEP_SIZE
+# define FL_KEEP_SIZE_FLAG FALLOC_FL_KEEP_SIZE
+#else
+# define FL_KEEP_SIZE_FLAG (0)
+#endif
+
+#ifdef FALLOC_FL_PUNCH_HOLE
+# define FL_PUNCH_HOLE_FLAG FALLOC_FL_PUNCH_HOLE
+#else
+# define FL_PUNCH_HOLE_FLAG (0)
+#endif
+
+/*
+ * ext2_file_t contains a struct inode, so we can't leave files open.
+ * Use this as a proxy instead.
+ */
+struct fuse2fs_file_handle {
+	ext2_ino_t ino;
+	int open_flags;
+};
+
+/* Main program context */
+struct fuse2fs {
+	ext2_filsys fs;
+	pthread_mutex_t bfl;
+	int panic_on_error;
+	FILE *err_fp;
+	unsigned int next_generation;
+};
+
+static int __translate_error(ext2_filsys fs, errcode_t err, ext2_ino_t ino,
+			     const char *file, int line);
+#define translate_error(fs, ino, err) __translate_error((fs), (err), (ino), \
+			__FILE__, __LINE__)
+
+/* for macosx */
+#ifndef W_OK
+#  define W_OK 2
+#endif
+
+#ifndef R_OK
+#  define R_OK 4
+#endif
+
+#define EXT4_EPOCH_BITS 2
+#define EXT4_EPOCH_MASK ((1 << EXT4_EPOCH_BITS) - 1)
+#define EXT4_NSEC_MASK  (~0UL << EXT4_EPOCH_BITS)
+
+/*
+ * Extended fields will fit into an inode if the filesystem was formatted
+ * with large inodes (-I 256 or larger) and there are not currently any EAs
+ * consuming all of the available space. For new inodes we always reserve
+ * enough space for the kernel's known extended fields, but for inodes
+ * created with an old kernel this might not have been the case. None of
+ * the extended inode fields is critical for correct filesystem operation.
+ * This macro checks if a certain field fits in the inode. Note that
+ * inode-size = GOOD_OLD_INODE_SIZE + i_extra_isize
+ */
+#define EXT4_FITS_IN_INODE(ext4_inode, field)		\
+	((offsetof(typeof(*ext4_inode), field) +	\
+	  sizeof((ext4_inode)->field))			\
+	<= (EXT2_GOOD_OLD_INODE_SIZE +			\
+	    (ext4_inode)->i_extra_isize))		\
+
+static inline __u32 ext4_encode_extra_time(const struct timespec *time)
+{
+	return (sizeof(time->tv_sec) > 4 ?
+		(time->tv_sec >> 32) & EXT4_EPOCH_MASK : 0) |
+	       ((time->tv_nsec << EXT4_EPOCH_BITS) & EXT4_NSEC_MASK);
+}
+
+static inline void ext4_decode_extra_time(struct timespec *time, __u32 extra)
+{
+	if (sizeof(time->tv_sec) > 4)
+		time->tv_sec |= (__u64)((extra) & EXT4_EPOCH_MASK) << 32;
+	time->tv_nsec = ((extra) & EXT4_NSEC_MASK) >> EXT4_EPOCH_BITS;
+}
+
+#define EXT4_INODE_SET_XTIME(xtime, timespec, raw_inode)		       \
+do {									       \
+	(raw_inode)->xtime = (timespec)->tv_sec;			       \
+	if (EXT4_FITS_IN_INODE(raw_inode, xtime ## _extra))		       \
+		(raw_inode)->xtime ## _extra =				       \
+				ext4_encode_extra_time(timespec);	       \
+} while (0)
+
+#define EXT4_EINODE_SET_XTIME(xtime, timespec, raw_inode)		       \
+do {									       \
+	if (EXT4_FITS_IN_INODE(raw_inode, xtime))			       \
+		(raw_inode)->xtime = (timespec)->tv_sec;		       \
+	if (EXT4_FITS_IN_INODE(raw_inode, xtime ## _extra))		       \
+		(raw_inode)->xtime ## _extra =				       \
+				ext4_encode_extra_time(timespec);	       \
+} while (0)
+
+#define EXT4_INODE_GET_XTIME(xtime, timespec, raw_inode)		       \
+do {									       \
+	(timespec)->tv_sec = (signed)((raw_inode)->xtime);		       \
+	if (EXT4_FITS_IN_INODE(raw_inode, xtime ## _extra))		       \
+		ext4_decode_extra_time((timespec),			       \
+				       raw_inode->xtime ## _extra);	       \
+	else								       \
+		(timespec)->tv_nsec = 0;				       \
+} while (0)
+
+#define EXT4_EINODE_GET_XTIME(xtime, timespec, raw_inode)		       \
+do {									       \
+	if (EXT4_FITS_IN_INODE(raw_inode, xtime))			       \
+		(timespec)->tv_sec =					       \
+			(signed)((raw_inode)->xtime);			       \
+	if (EXT4_FITS_IN_INODE(raw_inode, xtime ## _extra))		       \
+		ext4_decode_extra_time((timespec),			       \
+				       raw_inode->xtime ## _extra);	       \
+	else								       \
+		(timespec)->tv_nsec = 0;				       \
+} while (0)
+
+static void get_now(struct timespec *now)
+{
+#ifdef CLOCK_REALTIME
+	if (!clock_gettime(CLOCK_REALTIME, now))
+		return;
+#endif
+
+	now->tv_sec = time(NULL);
+	now->tv_nsec = 0;
+}
+
+static void increment_version(struct ext2_inode_large *inode)
+{
+	__u64 ver;
+
+	ver = inode->osd1.linux1.l_i_version;
+	if (EXT4_FITS_IN_INODE(inode, i_version_hi))
+		ver |= (__u64)inode->i_version_hi << 32;
+	ver++;
+	inode->osd1.linux1.l_i_version = ver;
+	if (EXT4_FITS_IN_INODE(inode, i_version_hi))
+		inode->i_version_hi = ver >> 32;
+}
+
+static void init_times(struct ext2_inode_large *inode)
+{
+	struct timespec now;
+
+	get_now(&now);
+	EXT4_INODE_SET_XTIME(i_atime, &now, inode);
+	EXT4_INODE_SET_XTIME(i_ctime, &now, inode);
+	EXT4_INODE_SET_XTIME(i_mtime, &now, inode);
+	EXT4_EINODE_SET_XTIME(i_crtime, &now, inode);
+	increment_version(inode);
+}
+
+static int update_ctime(ext2_filsys fs, ext2_ino_t ino,
+			struct ext2_inode_large *pinode)
+{
+	errcode_t err;
+	struct timespec now;
+	struct ext2_inode_large inode;
+
+	get_now(&now);
+
+	/* If user already has an inode buffer, just update that */
+	if (pinode) {
+		increment_version(pinode);
+		EXT4_INODE_SET_XTIME(i_ctime, &now, pinode);
+		return 0;
+	}
+
+	/* Otherwise we have to read-modify-write the inode */
+	err = ext2fs_read_inode_full(fs, ino, (struct ext2_inode *)&inode,
+				     sizeof(inode));
+	if (err)
+		return translate_error(fs, ino, err);
+
+	increment_version(&inode);
+	EXT4_INODE_SET_XTIME(i_ctime, &now, &inode);
+
+	err = ext2fs_write_inode_full(fs, ino, (struct ext2_inode *)&inode,
+				      sizeof(inode));
+	if (err)
+		return translate_error(fs, ino, err);
+
+	return 0;
+}
+
+static int update_atime(ext2_filsys fs, ext2_ino_t ino)
+{
+	errcode_t err;
+	struct ext2_inode_large inode, *pinode;
+	struct timespec atime, mtime, now;
+
+	if (!(fs->flags & EXT2_FLAG_RW))
+		return 0;
+	err = ext2fs_read_inode_full(fs, ino, (struct ext2_inode *)&inode,
+				     sizeof(inode));
+	if (err)
+		return translate_error(fs, ino, err);
+
+	pinode = &inode;
+	EXT4_INODE_GET_XTIME(i_atime, &atime, pinode);
+	EXT4_INODE_GET_XTIME(i_mtime, &mtime, pinode);
+	get_now(&now);
+	/*
+	 * If atime is no older than mtime and atime has been updated within
+	 * the last day, skip the atime update.  Same idea as Linux "relatime".
+	 */
+	if (atime.tv_sec >= mtime.tv_sec && atime.tv_sec >= now.tv_sec - 86400)
+		return 0;
+	EXT4_INODE_SET_XTIME(i_atime, &now, &inode);
+
+	err = ext2fs_write_inode_full(fs, ino, (struct ext2_inode *)&inode,
+				      sizeof(inode));
+	if (err)
+		return translate_error(fs, ino, err);
+
+	return 0;
+}
+
+static int update_mtime(ext2_filsys fs, ext2_ino_t ino)
+{
+	errcode_t err;
+	struct ext2_inode_large inode;
+	struct timespec now;
+
+	err = ext2fs_read_inode_full(fs, ino, (struct ext2_inode *)&inode,
+				     sizeof(inode));
+	if (err)
+		return translate_error(fs, ino, err);
+
+	get_now(&now);
+	EXT4_INODE_SET_XTIME(i_mtime, &now, &inode);
+	EXT4_INODE_SET_XTIME(i_ctime, &now, &inode);
+	increment_version(&inode);
+
+	err = ext2fs_write_inode_full(fs, ino, (struct ext2_inode *)&inode,
+				      sizeof(inode));
+	if (err)
+		return translate_error(fs, ino, err);
+
+	return 0;
+}
+
+static int ext2_file_type(unsigned int mode)
+{
+	if (LINUX_S_ISREG(mode))
+		return EXT2_FT_REG_FILE;
+
+	if (LINUX_S_ISDIR(mode))
+		return EXT2_FT_DIR;
+
+	if (LINUX_S_ISCHR(mode))
+		return EXT2_FT_CHRDEV;
+
+	if (LINUX_S_ISBLK(mode))
+		return EXT2_FT_BLKDEV;
+
+	if (LINUX_S_ISLNK(mode))
+		return EXT2_FT_SYMLINK;
+
+	if (LINUX_S_ISFIFO(mode))
+		return EXT2_FT_FIFO;
+
+	if (LINUX_S_ISSOCK(mode))
+		return EXT2_FT_SOCK;
+
+	return 0;
+}
+
+static int fs_writeable(ext2_filsys fs)
+{
+	return (fs->flags & EXT2_FLAG_RW) && (fs->super->s_error_count == 0);
+}
+
+static int __check_access(struct fuse_context *ctxt, ext2_filsys fs,
+			  ext2_ino_t ino, int mask, int ignore_flags)
+{
+	errcode_t err;
+	struct ext2_inode inode;
+	mode_t perms;
+
+	/* no writing to read-only or broken fs */
+	if ((mask & W_OK) && !fs_writeable(fs))
+		return -EROFS;
+
+	err = ext2fs_read_inode(fs, ino, &inode);
+	if (err)
+		return translate_error(fs, ino, err);
+
+	/* existence check */
+	if (mask == 0)
+		return 0;
+
+	/* is immutable? */
+	if (!ignore_flags && (mask & W_OK) &&
+	    (inode.i_flags & EXT2_IMMUTABLE_FL))
+		return -EPERM;
+
+	perms = inode.i_mode & 0777;
+
+	/* always allow root */
+	if (ctxt->uid == 0)
+		return 0;
+
+	/* allow owner, if perms match */
+	if (inode.i_uid == ctxt->uid) {
+		if ((mask << 6) & perms)
+			return 0;
+		return -EPERM;
+	}
+
+	/* allow group, if perms match */
+	if (inode.i_gid == ctxt->gid) {
+		if ((mask << 3) & perms)
+			return 0;
+		return -EPERM;
+	}
+
+	/* otherwise check other */
+	if (mask & perms)
+		return 0;
+	return -EPERM;
+}
+
+static int check_inum_access(struct fuse_context *ctxt, ext2_filsys fs,
+			     ext2_ino_t ino, int mask)
+{
+	return __check_access(ctxt, fs, ino, mask, 0);
+}
+
+static int check_flags_access(struct fuse_context *ctxt, ext2_filsys fs,
+			      ext2_ino_t ino, int mask)
+{
+	return __check_access(ctxt, fs, ino, mask, 1);
+}
+
+static void op_destroy(void *p)
+{
+	struct fuse_context *ctxt = fuse_get_context();
+	struct fuse2fs *ff = (struct fuse2fs *)ctxt->private_data;
+	ext2_filsys fs = ff->fs;
+	errcode_t err;
+
+	if (fs->flags & EXT2_FLAG_RW) {
+		fs->super->s_state |= EXT2_VALID_FS;
+		if (fs->super->s_error_count)
+			fs->super->s_state |= EXT2_ERROR_FS;
+		ext2fs_mark_super_dirty(fs);
+		err = ext2fs_set_gdt_csum(fs);
+		if (err)
+			translate_error(fs, 0, err);
+
+		err = ext2fs_flush2(fs, 0);
+		if (err)
+			translate_error(fs, 0, err);
+	}
+}
+
+static void *op_init(struct fuse_conn_info *conn)
+{
+	struct fuse_context *ctxt = fuse_get_context();
+	struct fuse2fs *ff = (struct fuse2fs *)ctxt->private_data;
+	ext2_filsys fs = ff->fs;
+	errcode_t err;
+
+	if (fs->flags & EXT2_FLAG_RW) {
+		fs->super->s_mnt_count++;
+		fs->super->s_mtime = time(NULL);
+		fs->super->s_state &= ~EXT2_VALID_FS;
+		ext2fs_mark_super_dirty(fs);
+		err = ext2fs_flush2(fs, 0);
+		if (err)
+			translate_error(fs, 0, err);
+	}
+	return ff;
+}
+
+static int stat_inode(ext2_filsys fs, ext2_ino_t ino, struct stat *statbuf)
+{
+	struct ext2_inode_large inode;
+	dev_t fakedev = 0;
+	errcode_t err;
+	int ret = 0;
+
+	err = ext2fs_read_inode_full(fs, ino, (struct ext2_inode *)&inode,
+				     sizeof(inode));
+	if (err)
+		return translate_error(fs, ino, err);
+
+	memcpy(&fakedev, fs->super->s_uuid, sizeof(fakedev));
+	statbuf->st_dev = fakedev;
+	statbuf->st_ino = ino;
+	statbuf->st_mode = inode.i_mode;
+	statbuf->st_nlink = inode.i_links_count;
+	statbuf->st_uid = inode.i_uid;
+	statbuf->st_gid = inode.i_gid;
+	statbuf->st_size = inode.i_size;
+	statbuf->st_blksize = fs->blocksize;
+	statbuf->st_blocks = inode.i_blocks;
+	statbuf->st_atime = inode.i_atime;
+	statbuf->st_mtime = inode.i_mtime;
+	statbuf->st_ctime = inode.i_ctime;
+	if (LINUX_S_ISCHR(inode.i_mode) ||
+	    LINUX_S_ISBLK(inode.i_mode)) {
+		if (inode.i_block[0])
+			statbuf->st_rdev = inode.i_block[0];
+		else
+			statbuf->st_rdev = inode.i_block[1];
+	}
+
+	return ret;
+}
+
+static int op_getattr(const char *path, struct stat *statbuf)
+{
+	struct fuse_context *ctxt = fuse_get_context();
+	struct fuse2fs *ff = (struct fuse2fs *)ctxt->private_data;
+	ext2_filsys fs = ff->fs;
+	ext2_ino_t ino;
+	errcode_t err;
+	int ret = 0;
+
+	pthread_mutex_lock(&ff->bfl);
+	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, path, &ino);
+	if (err) {
+		ret = translate_error(fs, 0, err);
+		goto out;
+	}
+	ret = stat_inode(fs, ino, statbuf);
+out:
+	pthread_mutex_unlock(&ff->bfl);
+	return ret;
+}
+
+static int op_readlink(const char *path, char *buf, size_t len)
+{
+	struct fuse_context *ctxt = fuse_get_context();
+	struct fuse2fs *ff = (struct fuse2fs *)ctxt->private_data;
+	ext2_filsys fs = ff->fs;
+	errcode_t err;
+	ext2_ino_t ino;
+	struct ext2_inode inode;
+	unsigned int got;
+	ext2_file_t file;
+	int ret = 0;
+
+	pthread_mutex_lock(&ff->bfl);
+	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, path, &ino);
+	if (err || ino == 0) {
+		ret = translate_error(fs, 0, err);
+		goto out;
+	}
+
+	err = ext2fs_read_inode(fs, ino, &inode);
+	if (err) {
+		ret = translate_error(fs, ino, err);
+		goto out;
+	}
+
+	if (!LINUX_S_ISLNK(inode.i_mode)) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	len--;
+	if (inode.i_size < len)
+		len = inode.i_size;
+	if (ext2fs_inode_data_blocks2(fs, &inode)) {
+		/* big symlink */
+
+		err = ext2fs_file_open(fs, ino, 0, &file);
+		if (err) {
+			ret = translate_error(fs, ino, err);
+			goto out;
+		}
+
+		err = ext2fs_file_read(file, buf, len, &got);
+		if (err || got != len) {
+			ext2fs_file_close(file);
+			ret = translate_error(fs, ino, err);
+			goto out;
+		}
+
+		err = ext2fs_file_close(file);
+		if (err) {
+			ret = translate_error(fs, ino, err);
+			goto out;
+		}
+	} else
+		/* inline symlink */
+		memcpy(buf, (char *)inode.i_block, len);
+	buf[len] = 0;
+
+	if (fs_writeable(fs)) {
+		ret = update_atime(fs, ino);
+		if (ret)
+			goto out;
+	}
+
+out:
+	pthread_mutex_unlock(&ff->bfl);
+	return ret;
+}
+
+static int op_mknod(const char *path, mode_t mode, dev_t dev)
+{
+	struct fuse_context *ctxt = fuse_get_context();
+	struct fuse2fs *ff = (struct fuse2fs *)ctxt->private_data;
+	ext2_filsys fs = ff->fs;
+	ext2_ino_t parent, child;
+	char *temp_path = strdup(path);
+	errcode_t err;
+	char *node_name, a;
+	int filetype;
+	struct ext2_inode_large inode;
+	int ret = 0;
+
+	if (!temp_path) {
+		ret = -ENOMEM;
+		goto out;
+	}
+	node_name = strrchr(temp_path, '/');
+	if (!node_name) {
+		ret = -ENOMEM;
+		goto out;
+	}
+	node_name++;
+	a = *node_name;
+	*node_name = 0;
+
+	pthread_mutex_lock(&ff->bfl);
+	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, temp_path,
+			   &parent);
+	if (err) {
+		ret = translate_error(fs, 0, err);
+		goto out2;
+	}
+
+	ret = check_inum_access(ctxt, fs, parent, W_OK);
+	if (ret)
+		goto out2;
+
+	*node_name = a;
+
+	if (LINUX_S_ISCHR(mode))
+		filetype = EXT2_FT_CHRDEV;
+	else if (LINUX_S_ISBLK(mode))
+		filetype = EXT2_FT_BLKDEV;
+	else if (LINUX_S_ISFIFO(mode))
+		filetype = EXT2_FT_FIFO;
+	else {
+		ret = -EINVAL;
+		goto out2;
+	}
+
+	err = ext2fs_new_inode(fs, parent, mode, 0, &child);
+	if (err) {
+		ret = translate_error(fs, 0, err);
+		goto out2;
+	}
+
+	err = ext2fs_link(fs, parent, node_name, child, filetype);
+	if (err == EXT2_ET_DIR_NO_SPACE) {
+		err = ext2fs_expand_dir(fs, parent);
+		if (err) {
+			ret = translate_error(fs, parent, err);
+			goto out2;
+		}
+
+		err = ext2fs_link(fs, parent, node_name, child,
+				     filetype);
+	}
+	if (err) {
+		ret = translate_error(fs, parent, err);
+		goto out2;
+	}
+
+	ret = update_mtime(fs, parent);
+	if (ret)
+		goto out2;
+
+	memset(&inode, 0, sizeof(inode));
+	inode.i_mode = mode;
+
+	if (dev & ~0xFFFF)
+		inode.i_block[1] = dev;
+	else
+		inode.i_block[0] = dev;
+	inode.i_links_count = 1;
+	inode.i_extra_isize = sizeof(struct ext2_inode_large) -
+		EXT2_GOOD_OLD_INODE_SIZE;
+
+	err = ext2fs_write_new_inode(fs, child, (struct ext2_inode *)&inode);
+	if (err) {
+		ret = translate_error(fs, child, err);
+		goto out2;
+	}
+
+	inode.i_generation = ff->next_generation++;
+	init_times(&inode);
+	err = ext2fs_write_inode_full(fs, child, (struct ext2_inode *)&inode,
+				      sizeof(inode));
+	if (err) {
+		ret = translate_error(fs, child, err);
+		goto out2;
+	}
+
+	ext2fs_inode_alloc_stats2(fs, child, 1, 0);
+
+out2:
+	pthread_mutex_unlock(&ff->bfl);
+out:
+	free(temp_path);
+	return ret;
+}
+
+static int op_mkdir(const char *path, mode_t mode)
+{
+	struct fuse_context *ctxt = fuse_get_context();
+	struct fuse2fs *ff = (struct fuse2fs *)ctxt->private_data;
+	ext2_filsys fs = ff->fs;
+	ext2_ino_t parent, child;
+	char *temp_path = strdup(path);
+	errcode_t err;
+	char *node_name, a;
+	struct ext2_inode_large inode;
+	char *block;
+	blk64_t blk;
+	int ret = 0;
+
+	if (!temp_path) {
+		ret = -ENOMEM;
+		goto out;
+	}
+	node_name = strrchr(temp_path, '/');
+	if (!node_name) {
+		ret = -ENOMEM;
+		goto out;
+	}
+	node_name++;
+	a = *node_name;
+	*node_name = 0;
+
+	pthread_mutex_lock(&ff->bfl);
+	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, temp_path,
+			   &parent);
+	if (err) {
+		ret = translate_error(fs, 0, err);
+		goto out2;
+	}
+
+	ret = check_inum_access(ctxt, fs, parent, W_OK);
+	if (ret)
+		goto out2;
+
+	*node_name = a;
+
+	err = ext2fs_mkdir(fs, parent, 0, node_name);
+	if (err == EXT2_ET_DIR_NO_SPACE) {
+		err = ext2fs_expand_dir(fs, parent);
+		if (err) {
+			ret = translate_error(fs, parent, err);
+			goto out2;
+		}
+
+		err = ext2fs_mkdir(fs, parent, 0, node_name);
+	}
+	if (err) {
+		ret = translate_error(fs, parent, err);
+		goto out2;
+	}
+
+	ret = update_mtime(fs, parent);
+	if (ret)
+		goto out2;
+
+	/* Still have to update the uid/gid of the dir */
+	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, temp_path,
+			   &child);
+	if (err) {
+		ret = translate_error(fs, 0, err);
+		goto out2;
+	}
+
+	err = ext2fs_read_inode_full(fs, child, (struct ext2_inode *)&inode,
+				     sizeof(inode));
+	if (err) {
+		ret = translate_error(fs, child, err);
+		goto out2;
+	}
+
+	inode.i_uid = ctxt->uid;
+	inode.i_gid = ctxt->gid;
+	inode.i_generation = ff->next_generation++;
+
+	err = ext2fs_write_inode_full(fs, child, (struct ext2_inode *)&inode,
+				      sizeof(inode));
+	if (err) {
+		ret = translate_error(fs, child, err);
+		goto out2;
+	}
+
+	/* Rewrite the directory block checksum, having set i_generation */
+	if (!EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
+					EXT4_FEATURE_RO_COMPAT_METADATA_CSUM))
+		goto out2;
+	err = ext2fs_new_dir_block(fs, child, parent, &block);
+	if (err) {
+		ret = translate_error(fs, child, err);
+		goto out2;
+	}
+	err = ext2fs_bmap2(fs, child, (struct ext2_inode *)&inode, NULL, 0, 0,
+			   NULL, &blk);
+	if (err) {
+		ret = translate_error(fs, child, err);
+		goto out3;
+	}
+	err = ext2fs_write_dir_block4(fs, blk, block, 0, child);
+	if (err) {
+		ret = translate_error(fs, child, err);
+		goto out3;
+	}
+
+out3:
+	ext2fs_free_mem(&block);
+out2:
+	pthread_mutex_unlock(&ff->bfl);
+out:
+	free(temp_path);
+	return ret;
+}
+
+static int unlink_file_by_name(struct fuse_context *ctxt, ext2_filsys fs,
+			       const char *path)
+{
+	errcode_t err;
+	ext2_ino_t dir;
+	char *filename = strdup(path);
+	char *base_name;
+	int ret;
+
+	base_name = strrchr(filename, '/');
+	if (base_name) {
+		*base_name++ = '\0';
+		err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, filename,
+				   &dir);
+		if (err) {
+			free(filename);
+			return translate_error(fs, 0, err);
+		}
+	} else {
+		dir = EXT2_ROOT_INO;
+		base_name = filename;
+	}
+
+	ret = check_inum_access(ctxt, fs, dir, W_OK);
+	if (ret) {
+		free(filename);
+		return ret;
+	}
+
+	err = ext2fs_unlink(fs, dir, base_name, 0, 0);
+	free(filename);
+	if (err)
+		return translate_error(fs, dir, err);
+
+	return update_mtime(fs, dir);
+}
+
+static int release_blocks_proc(ext2_filsys fs, blk64_t *blocknr,
+			       e2_blkcnt_t blockcnt EXT2FS_ATTR((unused)),
+			       blk64_t ref_block EXT2FS_ATTR((unused)),
+			       int ref_offset EXT2FS_ATTR((unused)),
+			       void *private EXT2FS_ATTR((unused)))
+{
+	blk64_t blk = *blocknr;
+
+	if (blk % EXT2FS_CLUSTER_RATIO(fs) == 0)
+		ext2fs_block_alloc_stats2(fs, *blocknr, -1);
+	return 0;
+}
+
+static int remove_inode(struct fuse2fs *ff, ext2_ino_t ino)
+{
+	ext2_filsys fs = ff->fs;
+	errcode_t err;
+	struct ext2_inode_large inode;
+	int ret = 0;
+
+	err = ext2fs_read_inode_full(fs, ino, (struct ext2_inode *)&inode,
+				     sizeof(inode));
+	if (err) {
+		ret = translate_error(fs, ino, err);
+		goto out;
+	}
+
+	switch (inode.i_links_count) {
+	case 0:
+		return 0; /* XXX: already done? */
+	case 1:
+		inode.i_links_count--;
+		inode.i_dtime = fs->now ? fs->now : time(0);
+		break;
+	default:
+		inode.i_links_count--;
+	}
+
+	ret = update_ctime(fs, ino, &inode);
+	if (ret)
+		goto out;
+
+	err = ext2fs_write_inode_full(fs, ino, (struct ext2_inode *)&inode,
+				      sizeof(inode));
+	if (err) {
+		ret = translate_error(fs, ino, err);
+		goto out;
+	}
+
+	if (inode.i_links_count)
+		goto out;
+
+	err = ext2fs_free_ext_attr(fs, ino, &inode);
+	if (err)
+		return translate_error(fs, ino, err);
+	if (ext2fs_inode_has_valid_blocks2(fs, (struct ext2_inode *)&inode))
+		ext2fs_block_iterate3(fs, ino, BLOCK_FLAG_READ_ONLY, NULL,
+				      release_blocks_proc, NULL);
+	ext2fs_inode_alloc_stats2(fs, ino, -1,
+				  LINUX_S_ISDIR(inode.i_mode));
+out:
+	return ret;
+}
+
+static int __op_unlink(const char *path)
+{
+	struct fuse_context *ctxt = fuse_get_context();
+	struct fuse2fs *ff = (struct fuse2fs *)ctxt->private_data;
+	ext2_filsys fs = ff->fs;
+	ext2_ino_t ino;
+	errcode_t err;
+	int ret = 0;
+
+	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, path, &ino);
+	if (err) {
+		ret = translate_error(fs, 0, err);
+		goto out;
+	}
+
+	ret = check_inum_access(ctxt, fs, ino, W_OK);
+	if (ret)
+		goto out;
+
+	ret = unlink_file_by_name(ctxt, fs, path);
+	if (ret)
+		goto out;
+
+	ret = remove_inode(ff, ino);
+	if (ret)
+		goto out;
+out:
+	return ret;
+}
+
+static int op_unlink(const char *path)
+{
+	struct fuse_context *ctxt = fuse_get_context();
+	struct fuse2fs *ff = (struct fuse2fs *)ctxt->private_data;
+	int ret;
+
+	pthread_mutex_lock(&ff->bfl);
+	ret = __op_unlink(path);
+	pthread_mutex_unlock(&ff->bfl);
+	return ret;
+}
+
+struct rd_struct {
+	ext2_ino_t	parent;
+	int		empty;
+};
+
+static int rmdir_proc(ext2_ino_t dir EXT2FS_ATTR((unused)),
+		      int	entry EXT2FS_ATTR((unused)),
+		      struct ext2_dir_entry *dirent,
+		      int	offset EXT2FS_ATTR((unused)),
+		      int	blocksize EXT2FS_ATTR((unused)),
+		      char	*buf EXT2FS_ATTR((unused)),
+		      void	*private)
+{
+	struct rd_struct *rds = (struct rd_struct *) private;
+
+	if (dirent->inode == 0)
+		return 0;
+	if (((dirent->name_len & 0xFF) == 1) && (dirent->name[0] == '.'))
+		return 0;
+	if (((dirent->name_len & 0xFF) == 2) && (dirent->name[0] == '.') &&
+	    (dirent->name[1] == '.')) {
+		rds->parent = dirent->inode;
+		return 0;
+	}
+	rds->empty = 0;
+	return 0;
+}
+
+static int op_rmdir(const char *path)
+{
+	struct fuse_context *ctxt = fuse_get_context();
+	struct fuse2fs *ff = (struct fuse2fs *)ctxt->private_data;
+	ext2_filsys fs = ff->fs;
+	ext2_ino_t child;
+	errcode_t err;
+	struct ext2_inode inode;
+	struct rd_struct rds;
+	int ret = 0;
+
+	pthread_mutex_lock(&ff->bfl);
+	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, path, &child);
+	if (err) {
+		ret = translate_error(fs, 0, err);
+		goto out;
+	}
+
+	ret = check_inum_access(ctxt, fs, child, W_OK);
+	if (ret)
+		goto out;
+
+	rds.parent = 0;
+	rds.empty = 1;
+
+	err = ext2fs_dir_iterate2(fs, child, 0, 0, rmdir_proc, &rds);
+	if (err) {
+		ret = translate_error(fs, child, err);
+		goto out;
+	}
+
+	if (rds.empty == 0) {
+		ret = -ENOTEMPTY;
+		goto out;
+	}
+
+	ret = unlink_file_by_name(ctxt, fs, path);
+	if (ret)
+		goto out;
+	/* Directories have to be "removed" twice. */
+	ret = remove_inode(ff, child);
+	if (ret)
+		goto out;
+	ret = remove_inode(ff, child);
+	if (ret)
+		goto out;
+
+	if (rds.parent) {
+		err = ext2fs_read_inode(fs, rds.parent, &inode);
+		if (err) {
+			ret = translate_error(fs, rds.parent, err);
+			goto out;
+		}
+		if (inode.i_links_count > 1)
+			inode.i_links_count--;
+		err = ext2fs_write_inode(fs, rds.parent, &inode);
+		if (err) {
+			ret = translate_error(fs, rds.parent, err);
+			goto out;
+		}
+		/* Set mtime last so this stale inode copy can't clobber it */
+		ret = update_mtime(fs, rds.parent);
+		if (ret)
+			goto out;
+	}
+
+out:
+	pthread_mutex_unlock(&ff->bfl);
+	return ret;
+}
+
+static int op_symlink(const char *src, const char *dest)
+{
+	struct fuse_context *ctxt = fuse_get_context();
+	struct fuse2fs *ff = (struct fuse2fs *)ctxt->private_data;
+	ext2_filsys fs = ff->fs;
+	ext2_ino_t parent, child;
+	char *temp_path = strdup(dest);
+	errcode_t err;
+	char *node_name, a;
+	struct ext2_inode_large inode;
+	int len = strlen(src);
+	int ret = 0;
+
+	if (!temp_path) {
+		ret = -ENOMEM;
+		goto out;
+	}
+	node_name = strrchr(temp_path, '/');
+	if (!node_name) {
+		ret = -EINVAL;
+		goto out;
+	}
+	node_name++;
+	a = *node_name;
+	*node_name = 0;
+
+	pthread_mutex_lock(&ff->bfl);
+	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, temp_path,
+			   &parent);
+	*node_name = a;
+	if (err) {
+		ret = translate_error(fs, 0, err);
+		goto out2;
+	}
+
+	ret = check_inum_access(ctxt, fs, parent, W_OK);
+	if (ret)
+		goto out2;
+
+
+	/* Create symlink */
+	err = ext2fs_symlink(fs, parent, 0, node_name, (char *)src);
+	if (err) {
+		ret = translate_error(fs, parent, err);
+		goto out2;
+	}
+
+	/* Update parent dir's mtime */
+	ret = update_mtime(fs, parent);
+	if (ret)
+		goto out2;
+
+	/* Still have to update the uid/gid of the symlink */
+	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, temp_path,
+			   &child);
+	if (err) {
+		ret = translate_error(fs, 0, err);
+		goto out2;
+	}
+
+	err = ext2fs_read_inode_full(fs, child, (struct ext2_inode *)&inode,
+				     sizeof(inode));
+	if (err) {
+		ret = translate_error(fs, child, err);
+		goto out2;
+	}
+
+	inode.i_uid = ctxt->uid;
+	inode.i_gid = ctxt->gid;
+	inode.i_generation = ff->next_generation++;
+
+	err = ext2fs_write_inode_full(fs, child, (struct ext2_inode *)&inode,
+				      sizeof(inode));
+	if (err) {
+		ret = translate_error(fs, child, err);
+		goto out2;
+	}
+out2:
+	pthread_mutex_unlock(&ff->bfl);
+out:
+	free(temp_path);
+	return ret;
+}
+
+static int op_rename(const char *from, const char *to)
+{
+	struct fuse_context *ctxt = fuse_get_context();
+	struct fuse2fs *ff = (struct fuse2fs *)ctxt->private_data;
+	ext2_filsys fs = ff->fs;
+	errcode_t err;
+	ext2_ino_t from_ino, to_ino, to_dir_ino, from_dir_ino;
+	char *temp_to = NULL, *temp_from = NULL;
+	char *cp, a;
+	struct ext2_inode from_inode;
+	int ret = 0;
+
+	pthread_mutex_lock(&ff->bfl);
+	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, from, &from_ino);
+	if (err || from_ino == 0) {
+		ret = translate_error(fs, 0, err);
+		goto out;
+	}
+
+	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, to, &to_ino);
+	if (err && err != EXT2_ET_FILE_NOT_FOUND) {
+		ret = translate_error(fs, 0, err);
+		goto out;
+	}
+
+	if (err == EXT2_ET_FILE_NOT_FOUND)
+		to_ino = 0;
+
+	/* Already the same file? */
+	if (to_ino != 0 && to_ino == from_ino) {
+		ret = 0;
+		goto out;
+	}
+
+	temp_to = strdup(to);
+	if (!temp_to) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	temp_from = strdup(from);
+	if (!temp_from) {
+		ret = -ENOMEM;
+		goto out2;
+	}
+
+	/* Find parent dir of the source and check write access */
+	cp = strrchr(temp_from, '/');
+	if (!cp) {
+		ret = -EINVAL;
+		goto out2;
+	}
+
+	a = *(cp + 1);
+	*(cp + 1) = 0;
+	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, temp_from,
+			   &from_dir_ino);
+	*(cp + 1) = a;
+	if (err) {
+		ret = translate_error(fs, 0, err);
+		goto out2;
+	}
+	if (from_dir_ino == 0) {
+		ret = -ENOENT;
+		goto out2;
+	}
+
+	ret = check_inum_access(ctxt, fs, from_dir_ino, W_OK);
+	if (ret)
+		goto out2;
+
+	/* Find parent dir of the destination and check write access */
+	cp = strrchr(temp_to, '/');
+	if (!cp) {
+		ret = -EINVAL;
+		goto out2;
+	}
+
+	a = *(cp + 1);
+	*(cp + 1) = 0;
+	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, temp_to,
+			   &to_dir_ino);
+	*(cp + 1) = a;
+	if (err) {
+		ret = translate_error(fs, 0, err);
+		goto out2;
+	}
+	if (to_dir_ino == 0) {
+		ret = -ENOENT;
+		goto out2;
+	}
+
+	ret = check_inum_access(ctxt, fs, to_dir_ino, W_OK);
+	if (ret)
+		goto out2;
+
+	/* Get ready to do the move */
+	err = ext2fs_read_inode(fs, from_ino, &from_inode);
+	if (err) {
+		ret = translate_error(fs, from_ino, err);
+		goto out2;
+	}
+
+	/* If the target exists, unlink it first */
+	if (to_ino != 0) {
+		ret = __op_unlink(to);
+		if (ret)
+			goto out2;
+	}
+
+	/* Link in the new file */
+	err = ext2fs_link(fs, to_dir_ino, cp + 1, from_ino,
+			  ext2_file_type(from_inode.i_mode));
+	if (err == EXT2_ET_DIR_NO_SPACE) {
+		err = ext2fs_expand_dir(fs, to_dir_ino);
+		if (err) {
+			ret = translate_error(fs, to_dir_ino, err);
+			goto out2;
+		}
+
+		err = ext2fs_link(fs, to_dir_ino, cp + 1, from_ino,
+				     ext2_file_type(from_inode.i_mode));
+	}
+	if (err) {
+		ret = translate_error(fs, to_dir_ino, err);
+		goto out2;
+	}
+
+	ret = update_mtime(fs, to_dir_ino);
+	if (ret)
+		goto out2;
+
+	/* Remove the old file */
+	ret = unlink_file_by_name(ctxt, fs, from);
+	if (ret)
+		goto out2;
+
+	/* Flush the whole mess out */
+	err = ext2fs_flush2(fs, 0);
+	if (err)
+		ret = translate_error(fs, 0, err);
+
+out2:
+	free(temp_from);
+	free(temp_to);
+out:
+	pthread_mutex_unlock(&ff->bfl);
+	return ret;
+}
+
+static int op_link(const char *src, const char *dest)
+{
+	struct fuse_context *ctxt = fuse_get_context();
+	struct fuse2fs *ff = (struct fuse2fs *)ctxt->private_data;
+	ext2_filsys fs = ff->fs;
+	char *temp_path = strdup(dest);
+	errcode_t err;
+	char *node_name, a;
+	ext2_ino_t parent, ino;
+	struct ext2_inode_large inode;
+	int ret = 0;
+
+	if (!temp_path) {
+		ret = -ENOMEM;
+		goto out;
+	}
+	node_name = strrchr(temp_path, '/');
+	if (!node_name) {
+		ret = -EINVAL;
+		goto out;
+	}
+	node_name++;
+	a = *node_name;
+	*node_name = 0;
+
+	pthread_mutex_lock(&ff->bfl);
+	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, temp_path,
+			   &parent);
+	*node_name = a;
+	if (err) {
+		ret = translate_error(fs, 0, err);
+		goto out2;
+	}
+
+	ret = check_inum_access(ctxt, fs, parent, W_OK);
+	if (ret)
+		goto out2;
+
+
+	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, src, &ino);
+	if (err || ino == 0) {
+		ret = translate_error(fs, 0, err);
+		goto out2;
+	}
+
+	err = ext2fs_read_inode_full(fs, ino, (struct ext2_inode *)&inode,
+				     sizeof(inode));
+	if (err) {
+		ret = translate_error(fs, ino, err);
+		goto out2;
+	}
+
+	inode.i_links_count++;
+	ret = update_ctime(fs, ino, &inode);
+	if (ret)
+		goto out2;
+
+	err = ext2fs_write_inode_full(fs, ino, (struct ext2_inode *)&inode,
+				      sizeof(inode));
+	if (err) {
+		ret = translate_error(fs, ino, err);
+		goto out2;
+	}
+
+	err = ext2fs_link(fs, parent, node_name, ino,
+			  ext2_file_type(inode.i_mode));
+	if (err == EXT2_ET_DIR_NO_SPACE) {
+		err = ext2fs_expand_dir(fs, parent);
+		if (err) {
+			ret = translate_error(fs, parent, err);
+			goto out2;
+		}
+
+		err = ext2fs_link(fs, parent, node_name, ino,
+				     ext2_file_type(inode.i_mode));
+	}
+	if (err) {
+		ret = translate_error(fs, parent, err);
+		goto out2;
+	}
+
+	ret = update_mtime(fs, parent);
+	if (ret)
+		goto out2;
+
+out2:
+	pthread_mutex_unlock(&ff->bfl);
+out:
+	free(temp_path);
+	return ret;
+}
+
+static int op_chmod(const char *path, mode_t mode)
+{
+	struct fuse_context *ctxt = fuse_get_context();
+	struct fuse2fs *ff = (struct fuse2fs *)ctxt->private_data;
+	ext2_filsys fs = ff->fs;
+	errcode_t err;
+	ext2_ino_t ino;
+	struct ext2_inode_large inode;
+	int ret = 0;
+
+	pthread_mutex_lock(&ff->bfl);
+	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, path, &ino);
+	if (err) {
+		ret = translate_error(fs, 0, err);
+		goto out;
+	}
+
+	/* XXX: Fails if uid matches but u-w */
+	ret = check_inum_access(ctxt, fs, ino, W_OK);
+	if (ret)
+		goto out;
+
+	err = ext2fs_read_inode_full(fs, ino, (struct ext2_inode *)&inode,
+				     sizeof(inode));
+	if (err) {
+		ret = translate_error(fs, ino, err);
+		goto out;
+	}
+
+	inode.i_mode &= ~0xFFF;
+	inode.i_mode |= mode & 0xFFF;
+	ret = update_ctime(fs, ino, &inode);
+	if (ret)
+		goto out;
+
+	err = ext2fs_write_inode_full(fs, ino, (struct ext2_inode *)&inode,
+				      sizeof(inode));
+	if (err) {
+		ret = translate_error(fs, ino, err);
+		goto out;
+	}
+
+out:
+	pthread_mutex_unlock(&ff->bfl);
+	return ret;
+}
+
+static int op_chown(const char *path, uid_t owner, gid_t group)
+{
+	struct fuse_context *ctxt = fuse_get_context();
+	struct fuse2fs *ff = (struct fuse2fs *)ctxt->private_data;
+	ext2_filsys fs = ff->fs;
+	errcode_t err;
+	ext2_ino_t ino;
+	struct ext2_inode_large inode;
+	int ret = 0;
+
+	pthread_mutex_lock(&ff->bfl);
+	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, path, &ino);
+	if (err) {
+		ret = translate_error(fs, 0, err);
+		goto out;
+	}
+
+	ret = check_inum_access(ctxt, fs, ino, W_OK);
+	if (ret)
+		goto out;
+
+	err = ext2fs_read_inode_full(fs, ino, (struct ext2_inode *)&inode,
+				     sizeof(inode));
+	if (err) {
+		ret = translate_error(fs, ino, err);
+		goto out;
+	}
+
+	inode.i_uid = owner;
+	inode.i_gid = group;
+	ret = update_ctime(fs, ino, &inode);
+	if (ret)
+		goto out;
+
+	err = ext2fs_write_inode_full(fs, ino, (struct ext2_inode *)&inode,
+				      sizeof(inode));
+	if (err) {
+		ret = translate_error(fs, ino, err);
+		goto out;
+	}
+
+out:
+	pthread_mutex_unlock(&ff->bfl);
+	return ret;
+}
+
+static int op_truncate(const char *path, off_t len)
+{
+	struct fuse_context *ctxt = fuse_get_context();
+	struct fuse2fs *ff = (struct fuse2fs *)ctxt->private_data;
+	ext2_filsys fs = ff->fs;
+	errcode_t err;
+	ext2_ino_t ino;
+	ext2_file_t file;
+	int ret = 0;
+
+	pthread_mutex_lock(&ff->bfl);
+	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, path, &ino);
+	if (err || ino == 0) {
+		ret = translate_error(fs, 0, err);
+		goto out;
+	}
+
+	ret = check_inum_access(ctxt, fs, ino, W_OK);
+	if (ret)
+		goto out;
+
+	err = ext2fs_file_open(fs, ino, EXT2_FILE_WRITE, &file);
+	if (err) {
+		ret = translate_error(fs, ino, err);
+		goto out;
+	}
+
+	err = ext2fs_file_set_size2(file, len);
+	if (err) {
+		ret = translate_error(fs, ino, err);
+		goto out2;
+	}
+
+out2:
+	err = ext2fs_file_close(file);
+	if (err && !ret) {
+		ret = translate_error(fs, ino, err);
+		goto out;
+	}
+
+	if (!ret)
+		ret = update_mtime(fs, ino);
+
+out:
+	pthread_mutex_unlock(&ff->bfl);
+	return ret;
+}
+
+static int __op_open(const char *path, struct fuse_file_info *fp)
+{
+	struct fuse_context *ctxt = fuse_get_context();
+	struct fuse2fs *ff = (struct fuse2fs *)ctxt->private_data;
+	ext2_filsys fs = ff->fs;
+	errcode_t err;
+	ext2_ino_t ino;
+	struct fuse2fs_file_handle *file;
+	int check, ret = 0;
+
+	file = calloc(1, sizeof(*file));
+	if (!file)
+		return -ENOMEM;
+
+	file->open_flags = 0;
+	if (fp->flags & (O_RDWR | O_WRONLY))
+		file->open_flags |= EXT2_FILE_WRITE;
+	if (fp->flags & O_CREAT)
+		file->open_flags |= EXT2_FILE_CREATE;
+
+	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, path, &file->ino);
+	if (err || file->ino == 0) {
+		ret = translate_error(fs, 0, err);
+		goto out;
+	}
+
+	check = R_OK;
+	if (file->open_flags & EXT2_FILE_WRITE)
+		check |= W_OK;
+	ret = check_inum_access(ctxt, fs, file->ino, check);
+	if (ret)
+		goto out;
+	fp->fh = (uint64_t)file;
+
+out:
+	if (ret)
+		free(file);
+	return ret;
+}
+
+static int op_open(const char *path, struct fuse_file_info *fp)
+{
+	struct fuse_context *ctxt = fuse_get_context();
+	struct fuse2fs *ff = (struct fuse2fs *)ctxt->private_data;
+	int ret;
+
+	pthread_mutex_lock(&ff->bfl);
+	ret = __op_open(path, fp);
+	pthread_mutex_unlock(&ff->bfl);
+	return ret;
+}
+
+static int op_read(const char *path, char *buf, size_t len, off_t offset,
+		     struct fuse_file_info *fp)
+{
+	struct fuse_context *ctxt = fuse_get_context();
+	struct fuse2fs *ff = (struct fuse2fs *)ctxt->private_data;
+	ext2_filsys fs = ff->fs;
+	struct fuse2fs_file_handle *fh = (struct fuse2fs_file_handle *)fp->fh;
+	ext2_file_t efp;
+	errcode_t err;
+	unsigned int got = 0;
+	int ret = 0;
+
+	pthread_mutex_lock(&ff->bfl);
+	err = ext2fs_file_open(fs, fh->ino, fh->open_flags, &efp);
+	if (err) {
+		ret = translate_error(fs, fh->ino, err);
+		goto out;
+	}
+
+	err = ext2fs_file_llseek(efp, offset, SEEK_SET, NULL);
+	if (err) {
+		ret = translate_error(fs, fh->ino, err);
+		goto out2;
+	}
+
+	err = ext2fs_file_read(efp, buf, len, &got);
+	if (err)
+		ret = translate_error(fs, fh->ino, err);
+
+out2:
+	/* Always close the file, even if the seek or read failed. */
+	err = ext2fs_file_close(efp);
+	if (err && !ret)
+		ret = translate_error(fs, fh->ino, err);
+	if (ret)
+		goto out;
+
+	if (fs_writeable(fs)) {
+		ret = update_atime(fs, fh->ino);
+		if (ret)
+			goto out;
+	}
+out:
+	pthread_mutex_unlock(&ff->bfl);
+	return got ? got : ret;
+}
+
+static int op_write(const char *path, const char *buf, size_t len, off_t offset,
+		      struct fuse_file_info *fp)
+{
+	struct fuse_context *ctxt = fuse_get_context();
+	struct fuse2fs *ff = (struct fuse2fs *)ctxt->private_data;
+	ext2_filsys fs = ff->fs;
+	struct fuse2fs_file_handle *fh = (struct fuse2fs_file_handle *)fp->fh;
+	ext2_file_t efp;
+	errcode_t err;
+	unsigned int got = 0;
+	__u64 fsize;
+	int ret = 0;
+
+	pthread_mutex_lock(&ff->bfl);
+	if (!fs_writeable(fs)) {
+		ret = -EROFS;
+		goto out;
+	}
+
+	err = ext2fs_file_open(fs, fh->ino, fh->open_flags, &efp);
+	if (err) {
+		ret = translate_error(fs, fh->ino, err);
+		goto out;
+	}
+
+	err = ext2fs_file_llseek(efp, offset, SEEK_SET, NULL);
+	if (err) {
+		ret = translate_error(fs, fh->ino, err);
+		goto out2;
+	}
+
+	err = ext2fs_file_write(efp, buf, len, &got);
+	if (err) {
+		ret = translate_error(fs, fh->ino, err);
+		goto out2;
+	}
+
+	err = ext2fs_file_flush(efp);
+	if (err) {
+		got = 0;
+		ret = translate_error(fs, fh->ino, err);
+		goto out2;
+	}
+
+	/*
+	 * Apparently ext2fs_file_write will dirty the inode (to allocate
+	 * blocks) without bothering to write out the inode, so change the
+	 * file size *after* the write, because changing the size forces
+	 * the inode out to disk.
+	 */
+	err = ext2fs_file_get_lsize(efp, &fsize);
+	if (err) {
+		ret = translate_error(fs, fh->ino, err);
+		goto out2;
+	}
+	if (offset + len > fsize) {
+		fsize = offset + len;
+		err = ext2fs_file_set_size2(efp, fsize);
+		if (err)
+			ret = translate_error(fs, fh->ino, err);
+	}
+
+out2:
+	/* Always close the file, even if an earlier step failed. */
+	err = ext2fs_file_close(efp);
+	if (err && !ret)
+		ret = translate_error(fs, fh->ino, err);
+	if (ret)
+		goto out;
+
+	ret = update_mtime(fs, fh->ino);
+	if (ret)
+		goto out;
+
+out:
+	pthread_mutex_unlock(&ff->bfl);
+	return got ? got : ret;
+}
+
+static int op_release(const char *path, struct fuse_file_info *fp)
+{
+	struct fuse_context *ctxt = fuse_get_context();
+	struct fuse2fs *ff = (struct fuse2fs *)ctxt->private_data;
+	ext2_filsys fs = ff->fs;
+	struct fuse2fs_file_handle *fh = (struct fuse2fs_file_handle *)fp->fh;
+	errcode_t err;
+	int ret = 0;
+
+	pthread_mutex_lock(&ff->bfl);
+	if (fs_writeable(fs) && fh->open_flags & EXT2_FILE_WRITE) {
+		err = ext2fs_flush2(fs, EXT2_FLAG_FLUSH_NO_SYNC);
+		if (err)
+			ret = translate_error(fs, fh->ino, err);
+	}
+	fp->fh = 0;
+	pthread_mutex_unlock(&ff->bfl);
+
+	free(fh);
+
+	return ret;
+}
+
+static int op_fsync(const char *path, int datasync, struct fuse_file_info *fp)
+{
+	struct fuse_context *ctxt = fuse_get_context();
+	struct fuse2fs *ff = (struct fuse2fs *)ctxt->private_data;
+	ext2_filsys fs = ff->fs;
+	struct fuse2fs_file_handle *fh = (struct fuse2fs_file_handle *)fp->fh;
+	errcode_t err;
+	int ret = 0;
+
+	/* For now, flush everything, even if it's slow */
+	pthread_mutex_lock(&ff->bfl);
+	if (fs_writeable(fs) && fh->open_flags & EXT2_FILE_WRITE) {
+		err = ext2fs_flush2(fs, 0);
+		if (err)
+			ret = translate_error(fs, fh->ino, err);
+	}
+	pthread_mutex_unlock(&ff->bfl);
+
+	return ret;
+}
+
+static int op_statfs(const char *path, struct statvfs *buf)
+{
+	struct fuse_context *ctxt = fuse_get_context();
+	struct fuse2fs *ff = (struct fuse2fs *)ctxt->private_data;
+	ext2_filsys fs = ff->fs;
+	uint64_t fsid, *f;
+
+	buf->f_bsize = fs->blocksize;
+	buf->f_frsize = 0;
+	buf->f_blocks = ext2fs_blocks_count(fs->super);
+	buf->f_bfree = ext2fs_free_blocks_count(fs->super);
+	if (buf->f_bfree < ext2fs_r_blocks_count(fs->super))
+		buf->f_bavail = 0;
+	else
+		buf->f_bavail = buf->f_bfree -
+				ext2fs_r_blocks_count(fs->super);
+	buf->f_files = fs->super->s_inodes_count;
+	buf->f_ffree = fs->super->s_free_inodes_count;
+	buf->f_favail = fs->super->s_free_inodes_count;
+	f = (uint64_t *)fs->super->s_uuid;
+	fsid = *f;
+	f++;
+	fsid ^= *f;
+	buf->f_fsid = fsid;
+	buf->f_flag = 0;
+	if (!(fs->flags & EXT2_FLAG_RW))
+		buf->f_flag |= ST_RDONLY;
+	buf->f_namemax = EXT2_NAME_LEN;
+
+	return 0;
+}
+
+static int op_getxattr(const char *path, const char *key, char *value,
+		       size_t len)
+{
+	struct fuse_context *ctxt = fuse_get_context();
+	struct fuse2fs *ff = (struct fuse2fs *)ctxt->private_data;
+	ext2_filsys fs = ff->fs;
+	struct ext2_xattr_handle *h;
+	void *ptr;
+	unsigned int plen;
+	ext2_ino_t ino;
+	errcode_t err;
+	int ret = 0;
+
+	pthread_mutex_lock(&ff->bfl);
+	if (!EXT2_HAS_COMPAT_FEATURE(fs->super,
+				     EXT2_FEATURE_COMPAT_EXT_ATTR)) {
+		ret = -ENOTSUP;
+		goto out;
+	}
+
+	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, path, &ino);
+	if (err || ino == 0) {
+		ret = translate_error(fs, 0, err);
+		goto out;
+	}
+
+	ret = check_inum_access(ctxt, fs, ino, R_OK);
+	if (ret)
+		goto out;
+
+	err = ext2fs_xattrs_open(fs, ino, &h);
+	if (err) {
+		ret = translate_error(fs, ino, err);
+		goto out;
+	}
+
+	err = ext2fs_xattrs_read(h);
+	if (err) {
+		ret = translate_error(fs, ino, err);
+		goto out2;
+	}
+
+	err = ext2fs_xattr_get(h, key, &ptr, &plen);
+	if (err) {
+		ret = translate_error(fs, ino, err);
+		goto out2;
+	}
+
+	if (!len) {
+		ret = plen;
+	} else if (len < plen) {
+		ret = -ERANGE;
+	} else {
+		memcpy(value, ptr, plen);
+		ret = plen;
+	}
+
+	ext2fs_free_mem(&ptr);
+out2:
+	err = ext2fs_xattrs_close(&h);
+	if (err)
+		ret = translate_error(fs, ino, err);
+out:
+	pthread_mutex_unlock(&ff->bfl);
+
+	return ret;
+}
+
+static int count_buffer_space(char *name, char *value, void *data)
+{
+	unsigned int *x = data;
+
+	*x = *x + strlen(name) + 1;
+	return 0;
+}
+
+static int copy_names(char *name, char *value, void *data)
+{
+	char **b = data;
+
+	strncpy(*b, name, strlen(name));
+	*b = *b + strlen(name) + 1;
+
+	return 0;
+}
+
+static int op_listxattr(const char *path, char *names, size_t len)
+{
+	struct fuse_context *ctxt = fuse_get_context();
+	struct fuse2fs *ff = (struct fuse2fs *)ctxt->private_data;
+	ext2_filsys fs = ff->fs;
+	struct ext2_xattr_handle *h;
+	unsigned int bufsz;
+	ext2_ino_t ino;
+	errcode_t err;
+	int ret = 0;
+
+	pthread_mutex_lock(&ff->bfl);
+	if (!EXT2_HAS_COMPAT_FEATURE(fs->super,
+				     EXT2_FEATURE_COMPAT_EXT_ATTR)) {
+		ret = -ENOTSUP;
+		goto out;
+	}
+
+	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, path, &ino);
+	if (err || ino == 0) {
+		ret = translate_error(fs, 0, err);
+		goto out;
+	}
+
+	ret = check_inum_access(ctxt, fs, ino, R_OK);
+	if (ret)
+		goto out;
+
+	err = ext2fs_xattrs_open(fs, ino, &h);
+	if (err) {
+		ret = translate_error(fs, ino, err);
+		goto out;
+	}
+
+	err = ext2fs_xattrs_read(h);
+	if (err) {
+		ret = translate_error(fs, ino, err);
+		goto out2;
+	}
+
+	/* Count buffer space needed for names */
+	bufsz = 0;
+	err = ext2fs_xattrs_iterate(h, count_buffer_space, &bufsz);
+	if (err) {
+		ret = translate_error(fs, ino, err);
+		goto out2;
+	}
+
+	if (len == 0) {
+		ret = bufsz;
+		goto out2;
+	} else if (len < bufsz) {
+		ret = -ERANGE;
+		goto out2;
+	}
+
+	/* Copy names out */
+	memset(names, 0, len);
+	err = ext2fs_xattrs_iterate(h, copy_names, &names);
+	if (err) {
+		ret = translate_error(fs, ino, err);
+		goto out2;
+	}
+	ret = bufsz;
+out2:
+	err = ext2fs_xattrs_close(&h);
+	if (err)
+		ret = translate_error(fs, ino, err);
+out:
+	pthread_mutex_unlock(&ff->bfl);
+
+	return ret;
+}
+
+static int op_setxattr(const char *path, const char *key, const char *value,
+		       size_t len, int flags)
+{
+	struct fuse_context *ctxt = fuse_get_context();
+	struct fuse2fs *ff = (struct fuse2fs *)ctxt->private_data;
+	ext2_filsys fs = ff->fs;
+	struct ext2_xattr_handle *h;
+	ext2_ino_t ino;
+	errcode_t err;
+	int ret = 0;
+
+	pthread_mutex_lock(&ff->bfl);
+	if (!EXT2_HAS_COMPAT_FEATURE(fs->super,
+				     EXT2_FEATURE_COMPAT_EXT_ATTR)) {
+		ret = -ENOTSUP;
+		goto out;
+	}
+
+	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, path, &ino);
+	if (err || ino == 0) {
+		ret = translate_error(fs, 0, err);
+		goto out;
+	}
+
+	ret = check_inum_access(ctxt, fs, ino, W_OK);
+	if (ret)
+		goto out;
+
+	err = ext2fs_xattrs_open(fs, ino, &h);
+	if (err) {
+		ret = translate_error(fs, ino, err);
+		goto out;
+	}
+
+	err = ext2fs_xattrs_read(h);
+	if (err) {
+		ret = translate_error(fs, ino, err);
+		goto out2;
+	}
+
+	err = ext2fs_xattr_set(h, key, value, len);
+	if (err) {
+		ret = translate_error(fs, ino, err);
+		goto out2;
+	}
+
+	err = ext2fs_xattrs_write(h);
+	if (err) {
+		ret = translate_error(fs, ino, err);
+		goto out2;
+	}
+
+out2:
+	err = ext2fs_xattrs_close(&h);
+	if (err)
+		ret = translate_error(fs, ino, err);
+out:
+	pthread_mutex_unlock(&ff->bfl);
+
+	return ret;
+}
+
+static int op_removexattr(const char *path, const char *key)
+{
+	struct fuse_context *ctxt = fuse_get_context();
+	struct fuse2fs *ff = (struct fuse2fs *)ctxt->private_data;
+	ext2_filsys fs = ff->fs;
+	struct ext2_xattr_handle *h;
+	ext2_ino_t ino;
+	errcode_t err;
+	int ret = 0;
+
+	pthread_mutex_lock(&ff->bfl);
+	if (!EXT2_HAS_COMPAT_FEATURE(fs->super,
+				     EXT2_FEATURE_COMPAT_EXT_ATTR)) {
+		ret = -ENOTSUP;
+		goto out;
+	}
+
+	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, path, &ino);
+	if (err || ino == 0) {
+		ret = translate_error(fs, 0, err);
+		goto out;
+	}
+
+	ret = check_inum_access(ctxt, fs, ino, W_OK);
+	if (ret)
+		goto out;
+
+	err = ext2fs_xattrs_open(fs, ino, &h);
+	if (err) {
+		ret = translate_error(fs, ino, err);
+		goto out;
+	}
+
+	err = ext2fs_xattrs_read(h);
+	if (err) {
+		ret = translate_error(fs, ino, err);
+		goto out2;
+	}
+
+	err = ext2fs_xattr_remove(h, key);
+	if (err) {
+		ret = translate_error(fs, ino, err);
+		goto out2;
+	}
+
+	err = ext2fs_xattrs_write(h);
+	if (err) {
+		ret = translate_error(fs, ino, err);
+		goto out2;
+	}
+
+out2:
+	err = ext2fs_xattrs_close(&h);
+	if (err)
+		ret = translate_error(fs, ino, err);
+out:
+	pthread_mutex_unlock(&ff->bfl);
+
+	return ret;
+}
+
+struct readdir_iter {
+	void *buf;
+	fuse_fill_dir_t func;
+};
+
+static int op_readdir_iter(ext2_ino_t dir, int entry,
+			   struct ext2_dir_entry *dirent, int offset,
+			   int blocksize, char *buf, void *data)
+{
+	struct readdir_iter *i = data;
+	struct stat statbuf;
+	char namebuf[EXT2_NAME_LEN + 1];
+	int ret;
+
+	memcpy(namebuf, dirent->name, dirent->name_len & 0xFF);
+	namebuf[dirent->name_len & 0xFF] = 0;
+	statbuf.st_ino = dirent->inode;
+	statbuf.st_mode = S_IFREG;
+	ret = i->func(i->buf, namebuf, NULL, 0);
+	if (ret)
+		return DIRENT_ABORT;
+
+	return 0;
+}
+
+static int op_readdir(const char *path, void *buf, fuse_fill_dir_t fill_func,
+		      off_t offset, struct fuse_file_info *fp)
+{
+	struct fuse_context *ctxt = fuse_get_context();
+	struct fuse2fs *ff = (struct fuse2fs *)ctxt->private_data;
+	ext2_filsys fs = ff->fs;
+	struct fuse2fs_file_handle *fh = (struct fuse2fs_file_handle *)fp->fh;
+	errcode_t err;
+	ext2_ino_t ino;
+	struct readdir_iter i;
+	int ret = 0;
+
+	pthread_mutex_lock(&ff->bfl);
+	i.buf = buf;
+	i.func = fill_func;
+	err = ext2fs_dir_iterate2(fs, fh->ino, 0, NULL, op_readdir_iter, &i);
+	if (err) {
+		ret = translate_error(fs, fh->ino, err);
+		goto out;
+	}
+
+	if (fs_writeable(fs)) {
+		ret = update_atime(fs, fh->ino);
+		if (ret)
+			goto out;
+	}
+out:
+	pthread_mutex_unlock(&ff->bfl);
+	return ret;
+}
+
+static int op_access(const char *path, int mask)
+{
+	struct fuse_context *ctxt = fuse_get_context();
+	struct fuse2fs *ff = (struct fuse2fs *)ctxt->private_data;
+	ext2_filsys fs = ff->fs;
+	errcode_t err;
+	ext2_ino_t ino;
+	int ret = 0;
+
+	pthread_mutex_lock(&ff->bfl);
+	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, path, &ino);
+	if (err || ino == 0) {
+		ret = translate_error(fs, 0, err);
+		goto out;
+	}
+
+	ret = check_inum_access(ctxt, fs, ino, mask);
+	if (ret)
+		goto out;
+
+out:
+	pthread_mutex_unlock(&ff->bfl);
+	return ret;
+}
+
+static int op_create(const char *path, mode_t mode, struct fuse_file_info *fp)
+{
+	struct fuse_context *ctxt = fuse_get_context();
+	struct fuse2fs *ff = (struct fuse2fs *)ctxt->private_data;
+	ext2_filsys fs = ff->fs;
+	struct ext3_extent_header *eh;
+	ext2_ino_t parent, child;
+	char *temp_path = strdup(path);
+	errcode_t err;
+	char *node_name, a;
+	int filetype, i;
+	struct ext2_inode_large inode;
+	int ret = 0;
+
+	if (!temp_path) {
+		ret = -ENOMEM;
+		goto out;
+	}
+	node_name = strrchr(temp_path, '/');
+	if (!node_name) {
+		ret = -EINVAL;
+		goto out;
+	}
+	node_name++;
+	a = *node_name;
+	*node_name = 0;
+
+	pthread_mutex_lock(&ff->bfl);
+	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, temp_path,
+			   &parent);
+	if (err) {
+		ret = translate_error(fs, 0, err);
+		goto out2;
+	}
+
+	ret = check_inum_access(ctxt, fs, parent, W_OK);
+	if (ret)
+		goto out2;
+
+	*node_name = a;
+
+	filetype = ext2_file_type(mode);
+
+	err = ext2fs_new_inode(fs, parent, mode, 0, &child);
+	if (err) {
+		ret = translate_error(fs, parent, err);
+		goto out2;
+	}
+
+	err = ext2fs_link(fs, parent, node_name, child, filetype);
+	if (err == EXT2_ET_DIR_NO_SPACE) {
+		err = ext2fs_expand_dir(fs, parent);
+		if (err) {
+			ret = translate_error(fs, parent, err);
+			goto out2;
+		}
+
+		err = ext2fs_link(fs, parent, node_name, child,
+				     filetype);
+	}
+	if (err) {
+		ret = translate_error(fs, parent, err);
+		goto out2;
+	}
+
+	ret = update_mtime(fs, parent);
+	if (ret)
+		goto out2;
+
+	memset(&inode, 0, sizeof(inode));
+	inode.i_mode = mode;
+	inode.i_links_count = 1;
+	inode.i_extra_isize = sizeof(struct ext2_inode_large) -
+		EXT2_GOOD_OLD_INODE_SIZE;
+	if (fs->super->s_feature_incompat & EXT3_FEATURE_INCOMPAT_EXTENTS) {
+		inode.i_flags = EXT4_EXTENTS_FL;
+
+		/* This must be initialized, even for a zero byte file. */
+		eh = (struct ext3_extent_header *) &inode.i_block[0];
+		eh->eh_magic = ext2fs_cpu_to_le16(EXT3_EXT_MAGIC);
+		eh->eh_depth = 0;
+		eh->eh_entries = 0;
+		i = (sizeof(inode.i_block) - sizeof(*eh)) /
+			sizeof(struct ext3_extent);
+		eh->eh_max = ext2fs_cpu_to_le16(i);
+	}
+
+	err = ext2fs_write_new_inode(fs, child, (struct ext2_inode *)&inode);
+	if (err) {
+		ret = translate_error(fs, child, err);
+		goto out2;
+	}
+
+	inode.i_generation = ff->next_generation++;
+	init_times(&inode);
+	err = ext2fs_write_inode_full(fs, child, (struct ext2_inode *)&inode,
+				      sizeof(inode));
+	if (err) {
+		ret = translate_error(fs, child, err);
+		goto out2;
+	}
+
+	ext2fs_inode_alloc_stats2(fs, child, 1, 0);
+
+	ret = __op_open(path, fp);
+	if (ret)
+		goto out2;
+out2:
+	pthread_mutex_unlock(&ff->bfl);
+out:
+	free(temp_path);
+	return ret;
+}
+
+static int op_ftruncate(const char *path, off_t len, struct fuse_file_info *fp)
+{
+	struct fuse_context *ctxt = fuse_get_context();
+	struct fuse2fs *ff = (struct fuse2fs *)ctxt->private_data;
+	ext2_filsys fs = ff->fs;
+	struct fuse2fs_file_handle *fh = (struct fuse2fs_file_handle *)fp->fh;
+	ext2_file_t efp;
+	errcode_t err;
+	int ret = 0;
+
+	pthread_mutex_lock(&ff->bfl);
+	if (!fs_writeable(fs)) {
+		ret = -EROFS;
+		goto out;
+	}
+
+	err = ext2fs_file_open(fs, fh->ino, fh->open_flags, &efp);
+	if (err) {
+		ret = translate_error(fs, fh->ino, err);
+		goto out;
+	}
+
+	err = ext2fs_file_set_size2(efp, len);
+	if (err)
+		ret = translate_error(fs, fh->ino, err);
+
+	/* Always close the file, even if the resize failed. */
+	err = ext2fs_file_close(efp);
+	if (err && !ret)
+		ret = translate_error(fs, fh->ino, err);
+	if (ret)
+		goto out;
+
+	ret = update_mtime(fs, fh->ino);
+	if (ret)
+		goto out;
+
+out:
+	pthread_mutex_unlock(&ff->bfl);
+	return ret;
+}
+
+static int op_fgetattr(const char *path, struct stat *statbuf,
+		       struct fuse_file_info *fp)
+{
+	struct fuse_context *ctxt = fuse_get_context();
+	struct fuse2fs *ff = (struct fuse2fs *)ctxt->private_data;
+	ext2_filsys fs = ff->fs;
+	struct fuse2fs_file_handle *fh = (struct fuse2fs_file_handle *)fp->fh;
+	int ret = 0;
+
+	pthread_mutex_lock(&ff->bfl);
+	ret = stat_inode(fs, fh->ino, statbuf);
+	pthread_mutex_unlock(&ff->bfl);
+
+	return ret;
+}
+
+static int op_utimens(const char *path, const struct timespec tv[2])
+{
+	struct fuse_context *ctxt = fuse_get_context();
+	struct fuse2fs *ff = (struct fuse2fs *)ctxt->private_data;
+	ext2_filsys fs = ff->fs;
+	errcode_t err;
+	ext2_ino_t ino;
+	struct ext2_inode_large inode;
+	int ret = 0;
+
+	pthread_mutex_lock(&ff->bfl);
+	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, path, &ino);
+	if (err) {
+		ret = translate_error(fs, 0, err);
+		goto out;
+	}
+
+	ret = check_inum_access(ctxt, fs, ino, W_OK);
+	if (ret)
+		goto out;
+
+	err = ext2fs_read_inode_full(fs, ino, (struct ext2_inode *)&inode,
+				     sizeof(inode));
+	if (err) {
+		ret = translate_error(fs, ino, err);
+		goto out;
+	}
+
+	EXT4_INODE_SET_XTIME(i_atime, tv, &inode);
+	EXT4_INODE_SET_XTIME(i_mtime, tv + 1, &inode);
+	ret = update_ctime(fs, ino, &inode);
+	if (ret)
+		goto out;
+
+	err = ext2fs_write_inode_full(fs, ino, (struct ext2_inode *)&inode,
+				      sizeof(inode));
+	if (err) {
+		ret = translate_error(fs, ino, err);
+		goto out;
+	}
+
+out:
+	pthread_mutex_unlock(&ff->bfl);
+	return ret;
+}
+
+#ifdef SUPPORT_I_FLAGS
+static int ioctl_getflags(ext2_filsys fs, struct fuse2fs_file_handle *fh,
+			  void *data)
+{
+	errcode_t err;
+	struct ext2_inode_large inode;
+
+	err = ext2fs_read_inode_full(fs, fh->ino, (struct ext2_inode *)&inode,
+				     sizeof(inode));
+	if (err)
+		return -EIO;
+
+	*(__u32 *)data = inode.i_flags & EXT2_FL_USER_VISIBLE;
+	return 0;
+}
+
+#define FUSE2FS_MODIFIABLE_IFLAGS \
+	(EXT2_IMMUTABLE_FL | EXT2_APPEND_FL | EXT2_NODUMP_FL | \
+	 EXT2_NOATIME_FL | EXT3_JOURNAL_DATA_FL | EXT2_DIRSYNC_FL | \
+	 EXT2_TOPDIR_FL)
+
+static int ioctl_setflags(ext2_filsys fs, struct fuse2fs_file_handle *fh,
+			  void *data)
+{
+	errcode_t err;
+	struct ext2_inode_large inode;
+	int ret;
+	__u32 flags = *(__u32 *)data;
+	struct fuse_context *ctxt = fuse_get_context();
+
+	ret = check_flags_access(ctxt, fs, fh->ino, W_OK);
+	if (ret)
+		return ret;
+
+	err = ext2fs_read_inode_full(fs, fh->ino, (struct ext2_inode *)&inode,
+				     sizeof(inode));
+	if (err)
+		return -EIO;
+
+	if ((inode.i_flags ^ flags) & ~FUSE2FS_MODIFIABLE_IFLAGS)
+		return -EINVAL;
+
+	inode.i_flags = (inode.i_flags & ~FUSE2FS_MODIFIABLE_IFLAGS) |
+			(flags & FUSE2FS_MODIFIABLE_IFLAGS);
+
+	err = ext2fs_write_inode_full(fs, fh->ino, (struct ext2_inode *)&inode,
+				      sizeof(inode));
+	if (err)
+		return -EIO;
+
+	return 0;
+}
+#endif /* SUPPORT_I_FLAGS */
+
+#if FUSE_VERSION >= FUSE_MAKE_VERSION(2, 8)
+static int op_ioctl(const char *path, int cmd, void *arg,
+		      struct fuse_file_info *fp, unsigned int flags, void *data)
+{
+	struct fuse_context *ctxt = fuse_get_context();
+	struct fuse2fs *ff = (struct fuse2fs *)ctxt->private_data;
+	ext2_filsys fs = ff->fs;
+	struct fuse2fs_file_handle *fh = (struct fuse2fs_file_handle *)fp->fh;
+	int ret = 0;
+
+	pthread_mutex_lock(&ff->bfl);
+	switch (cmd) {
+#ifdef SUPPORT_I_FLAGS
+	case EXT2_IOC_GETFLAGS:
+		ret = ioctl_getflags(fs, fh, data);
+		break;
+	case EXT2_IOC_SETFLAGS:
+		ret = ioctl_setflags(fs, fh, data);
+		break;
+#endif
+	default:
+		ret = -ENOTTY;
+	}
+	pthread_mutex_unlock(&ff->bfl);
+
+	return ret;
+}
+#endif /* FUSE 28 */
+
+static int op_bmap(const char *path, size_t blocksize, uint64_t *idx)
+{
+	struct fuse_context *ctxt = fuse_get_context();
+	struct fuse2fs *ff = (struct fuse2fs *)ctxt->private_data;
+	ext2_filsys fs = ff->fs;
+	ext2_ino_t ino;
+	errcode_t err;
+	int ret = 0;
+
+	pthread_mutex_lock(&ff->bfl);
+	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, path, &ino);
+	if (err) {
+		ret = translate_error(fs, 0, err);
+		goto out;
+	}
+
+	err = ext2fs_bmap2(fs, ino, NULL, NULL, 0, *idx, 0, (blk64_t *)idx);
+	if (err) {
+		ret = translate_error(fs, ino, err);
+		goto out;
+	}
+
+out:
+	pthread_mutex_unlock(&ff->bfl);
+	return ret;
+}
+
+#if FUSE_VERSION >= FUSE_MAKE_VERSION(2, 9)
+static int fallocate_helper(struct fuse_file_info *fp, int mode, off_t offset,
+			    off_t len)
+{
+	struct fuse_context *ctxt = fuse_get_context();
+	struct fuse2fs *ff = (struct fuse2fs *)ctxt->private_data;
+	ext2_filsys fs = ff->fs;
+	struct fuse2fs_file_handle *fh = (struct fuse2fs_file_handle *)fp->fh;
+	blk64_t blk, end, x;
+	__u64 fsize;
+	ext2_file_t efp;
+	errcode_t err;
+	int ret = 0;
+
+	/* Allocate a bunch of blocks */
+	end = (offset + len - 1) / fs->blocksize;
+	for (blk = offset / fs->blocksize; blk <= end; blk++) {
+		err = ext2fs_bmap2(fs, fh->ino, NULL, NULL, BMAP_ALLOC, blk,
+				   0, &x);
+		if (err)
+			return translate_error(fs, fh->ino, err);
+	}
+
+	/* Update i_size */
+	if (!(mode & FL_KEEP_SIZE_FLAG)) {
+		err = ext2fs_file_open(fs, fh->ino, fh->open_flags, &efp);
+		if (err)
+			return translate_error(fs, fh->ino, err);
+
+		err = ext2fs_file_get_lsize(efp, &fsize);
+		if (err) {
+			ret = translate_error(fs, fh->ino, err);
+			goto out_isize;
+		}
+		if (offset + len > fsize) {
+			fsize = offset + len;
+			err = ext2fs_file_set_size2(efp, fsize);
+			if (err) {
+				ret = translate_error(fs, fh->ino, err);
+				goto out_isize;
+			}
+		}
+
+out_isize:
+		err = ext2fs_file_close(efp);
+		if (ret)
+			return ret;
+		if (err)
+			return translate_error(fs, fh->ino, err);
+	}
+
+	return update_mtime(fs, fh->ino);
+}
+
+static int punch_helper(struct fuse_file_info *fp, int mode, off_t offset,
+			off_t len)
+{
+	struct fuse_context *ctxt = fuse_get_context();
+	struct fuse2fs *ff = (struct fuse2fs *)ctxt->private_data;
+	ext2_filsys fs = ff->fs;
+	struct fuse2fs_file_handle *fh = (struct fuse2fs_file_handle *)fp->fh;
+	blk64_t start, end;
+	errcode_t err;
+
+	/* kernel ext4 punch requires this flag to be set */
+	if (!(mode & FL_KEEP_SIZE_FLAG))
+		return -EINVAL;
+
+	if (len < fs->blocksize)
+		return 0;
+
+	/* Punch out a bunch of blocks */
+	start = (offset + fs->blocksize - 1) / fs->blocksize;
+	end = (offset + len - fs->blocksize) / fs->blocksize;
+
+	if (start > end)
+		return -EINVAL;
+
+	err = ext2fs_punch(fs, fh->ino, NULL, NULL, start, end);
+	if (err)
+		return translate_error(fs, fh->ino, err);
+
+	return update_mtime(fs, fh->ino);
+}
+
+static int op_fallocate(const char *path, int mode, off_t offset, off_t len,
+			struct fuse_file_info *fp)
+{
+	struct fuse_context *ctxt = fuse_get_context();
+	struct fuse2fs *ff = (struct fuse2fs *)ctxt->private_data;
+	ext2_filsys fs = ff->fs;
+	int ret;
+
+	/* Catch unknown flags */
+	if (mode & ~(FL_PUNCH_HOLE_FLAG | FL_KEEP_SIZE_FLAG))
+		return -EINVAL;
+
+	pthread_mutex_lock(&ff->bfl);
+	if (!fs_writeable(fs)) {
+		ret = -EROFS;
+		goto out;
+	}
+	if (mode & FL_PUNCH_HOLE_FLAG)
+		ret = punch_helper(fp, mode, offset, len);
+	else
+		ret = fallocate_helper(fp, mode, offset, len);
+out:
+	pthread_mutex_unlock(&ff->bfl);
+
+	return ret;
+}
+#endif /* FUSE 29 */
+
+static struct fuse_operations fs_ops = {
+	.init = op_init,
+	.destroy = op_destroy,
+	.getattr = op_getattr,
+	.readlink = op_readlink,
+	.mknod = op_mknod,
+	.mkdir = op_mkdir,
+	.unlink = op_unlink,
+	.rmdir = op_rmdir,
+	.symlink = op_symlink,
+	.rename = op_rename,
+	.link = op_link,
+	.chmod = op_chmod,
+	.chown = op_chown,
+	.truncate = op_truncate,
+	.open = op_open,
+	.read = op_read,
+	.write = op_write,
+	.statfs = op_statfs,
+	.release = op_release,
+	.fsync = op_fsync,
+	.setxattr = op_setxattr,
+	.getxattr = op_getxattr,
+	.listxattr = op_listxattr,
+	.removexattr = op_removexattr,
+	.opendir = op_open,
+	.readdir = op_readdir,
+	.releasedir = op_release,
+	.fsyncdir = op_fsync,
+	.access = op_access,
+	.create = op_create,
+	.ftruncate = op_ftruncate,
+	.fgetattr = op_fgetattr,
+	.utimens = op_utimens,
+	.bmap = op_bmap,
+#ifdef SUPERFLUOUS
+	.lock = op_lock,
+	.poll = op_poll,
+#endif
+#if FUSE_VERSION >= FUSE_MAKE_VERSION(2, 8)
+	.ioctl = op_ioctl,
+	.flag_nullpath_ok = 1,
+#endif
+#if FUSE_VERSION >= FUSE_MAKE_VERSION(2, 9)
+	.flag_nopath = 1,
+	.fallocate = op_fallocate,
+#endif
+};
+
+static int get_random_bytes(void *p, size_t sz)
+{
+	int fd;
+	ssize_t r;
+
+	fd = open("/dev/random", O_RDONLY);
+	if (fd < 0) {
+		perror("/dev/random");
+		return 0;
+	}
+
+	r = read(fd, p, sz);
+
+	close(fd);
+	return r >= 0 && (size_t)r == sz;
+}
+
+int main(int argc, char *argv[])
+{
+	errcode_t err;
+	ext2_filsys fs;
+	char *tok, *arg, *logfile;
+	int i;
+	int readwrite = 1, panic_on_error = 0;
+	struct fuse2fs *ff;
+	char extra_args[BUFSIZ];
+	int ret = 0, flags = EXT2_FLAG_64BITS | EXT2_FLAG_EXCLUSIVE;
+
+	if (argc < 3) {
+		printf("Usage: %s dev mntpt [-o options] [fuse_args]\n",
+		       argv[0]);
+		return 1;
+	}
+
+	for (i = 1; i < argc - 1; i++) {
+		if (strcmp(argv[i], "-o"))
+			continue;
+		arg = argv[i + 1];
+		while ((tok = strtok(arg, ","))) {
+			arg = NULL;
+			if (!strcmp(tok, "ro"))
+				readwrite = 0;
+			else if (!strcmp(tok, "errors=panic"))
+				panic_on_error = 1;
+		}
+	}
+
+	if (!readwrite)
+		printf("Mounting read-only.\n");
+
+#ifdef ENABLE_NLS
+	setlocale(LC_MESSAGES, "");
+	setlocale(LC_CTYPE, "");
+	bindtextdomain(NLS_CAT_NAME, LOCALEDIR);
+	textdomain(NLS_CAT_NAME);
+	set_com_err_gettext(gettext);
+#endif
+	add_error_table(&et_ext2_error_table);
+
+	ff = calloc(1, sizeof(*ff));
+	if (!ff) {
+		perror("init");
+		return 1;
+	}
+	ff->panic_on_error = panic_on_error;
+
+	/* Set up error logging */
+	logfile = getenv("FUSE2FS_LOGFILE");
+	if (logfile) {
+		ff->err_fp = fopen(logfile, "a");
+		if (!ff->err_fp) {
+			perror(logfile);
+			goto out_nofs;
+		}
+	} else
+		ff->err_fp = stderr;
+
+	/* Start up the fs (while we still can use stdout) */
+	ret = 2;
+	if (readwrite)
+		flags |= EXT2_FLAG_RW;
+	err = ext2fs_open3(argv[1], NULL, flags, 0, 0, unix_io_manager, &fs);
+	if (err) {
+		printf("%s: %s.\n", argv[1], error_message(err));
+		printf("Please run e2fsck -fy %s.\n", argv[1]);
+		goto out_nofs;
+	}
+	ff->fs = fs;
+	fs->priv_data = ff;
+
+	ret = 3;
+	if (EXT2_HAS_INCOMPAT_FEATURE(fs->super,
+				      EXT3_FEATURE_INCOMPAT_RECOVER)) {
+		printf("Journal needs recovery; running `e2fsck -E "
+		       "journal_only' is required.\n");
+		goto out;
+	}
+
+	if (readwrite) {
+		if (EXT2_HAS_COMPAT_FEATURE(fs->super,
+					    EXT3_FEATURE_COMPAT_HAS_JOURNAL))
+			printf("Journal mode will not be used.\n");
+		err = ext2fs_read_inode_bitmap(fs);
+		if (err) {
+			translate_error(fs, 0, err);
+			goto out;
+		}
+		err = ext2fs_read_block_bitmap(fs);
+		if (err) {
+			translate_error(fs, 0, err);
+			goto out;
+		}
+	}
+
+	if (!(fs->super->s_state & EXT2_VALID_FS))
+		printf("Warning: Mounting unchecked fs, running e2fsck "
+		       "is recommended.\n");
+	if (fs->super->s_max_mnt_count > 0 &&
+	    fs->super->s_mnt_count >= fs->super->s_max_mnt_count)
+		printf("Warning: Maximal mount count reached, running "
+		       "e2fsck is recommended.\n");
+	if (fs->super->s_checkinterval > 0 &&
+	    fs->super->s_lastcheck + fs->super->s_checkinterval <= time(0))
+		printf("Warning: Check time reached; running e2fsck "
+		       "is recommended.\n");
+	if (fs->super->s_last_orphan)
+		printf("Orphans detected; running e2fsck is recommended.\n");
+
+	if (fs->super->s_state & EXT2_ERROR_FS) {
+		printf("Errors detected; running e2fsck is required.\n");
+		goto out;
+	}
+
+	/* Initialize generation counter */
+	get_random_bytes(&ff->next_generation, sizeof(unsigned int));
+
+	/* Stuff in some fuse parameters of our own */
+	snprintf(extra_args, BUFSIZ, "-okernel_cache,subtype=ext4,use_ino,"
+		 "fsname=%s", argv[1]);
+	argv[0] = argv[1];
+	argv[1] = argv[2];
+	argv[2] = extra_args;
+
+	pthread_mutex_init(&ff->bfl, NULL);
+	fuse_main(argc, argv, &fs_ops, ff);
+	pthread_mutex_destroy(&ff->bfl);
+
+	ret = 0;
+out:
+	err = ext2fs_close(fs);
+	if (err)
+		ret = translate_error(fs, 0, err);
+out_nofs:
+	free(ff);
+
+	return ret;
+}
+
+static int __translate_error(ext2_filsys fs, errcode_t err, ext2_ino_t ino,
+			     const char *file, int line)
+{
+	struct timespec now;
+	int ret;
+	struct fuse2fs *ff = fs->priv_data;
+	int is_err = 0;
+
+	/* Translate ext2 error to unix error code */
+	switch (err) {
+	case EXT2_ET_NO_MEMORY:
+	case EXT2_ET_TDB_ERR_OOM:
+		ret = -ENOMEM;
+		break;
+	case EXT2_ET_INVALID_ARGUMENT:
+	case EXT2_ET_LLSEEK_FAILED:
+		ret = -EINVAL;
+		break;
+	case EXT2_ET_NO_DIRECTORY:
+		ret = -ENOTDIR;
+		break;
+	case EXT2_ET_FILE_NOT_FOUND:
+		ret = -ENOENT;
+		break;
+	case EXT2_ET_DIR_NO_SPACE:
+		is_err = 1;
+	case EXT2_ET_TOOSMALL:
+	case EXT2_ET_BLOCK_ALLOC_FAIL:
+	case EXT2_ET_INODE_ALLOC_FAIL:
+	case EXT2_ET_EA_NO_SPACE:
+		ret = -ENOSPC;
+		break;
+	case EXT2_ET_SYMLINK_LOOP:
+		ret = -EMLINK;
+		break;
+	case EXT2_ET_FILE_TOO_BIG:
+		ret = -EFBIG;
+		break;
+	case EXT2_ET_TDB_ERR_EXISTS:
+	case EXT2_ET_FILE_EXISTS:
+		ret = -EEXIST;
+		break;
+	case EXT2_ET_MMP_FAILED:
+	case EXT2_ET_MMP_FSCK_ON:
+		ret = -EBUSY;
+		break;
+	case EXT2_ET_EA_KEY_NOT_FOUND:
+		ret = -ENODATA;
+		break;
+	default:
+		is_err = 1;
+		ret = -EIO;
+		break;
+	}
+
+	if (!is_err)
+		return ret;
+
+	if (ino)
+		fprintf(ff->err_fp, "FUSE2FS (%s): %s (inode #%u) at %s:%d.\n",
+			fs && fs->device_name ? fs->device_name : "???",
+			error_message(err), ino, file, line);
+	else
+		fprintf(ff->err_fp, "FUSE2FS (%s): %s at %s:%d.\n",
+			fs && fs->device_name ? fs->device_name : "???",
+			error_message(err), file, line);
+	fflush(ff->err_fp);
+
+	/* Make a note in the error log */
+	get_now(&now);
+	fs->super->s_last_error_time = now.tv_sec;
+	fs->super->s_last_error_ino = ino;
+	fs->super->s_last_error_line = line;
+	fs->super->s_last_error_block = 0;
+	strncpy(fs->super->s_last_error_func, file,
+		sizeof(fs->super->s_last_error_func));
+	if (fs->super->s_first_error_time == 0) {
+		fs->super->s_first_error_time = now.tv_sec;
+		fs->super->s_first_error_ino = ino;
+		fs->super->s_first_error_line = line;
+		fs->super->s_first_error_block = 0;
+		strncpy(fs->super->s_first_error_func, file,
+			sizeof(fs->super->s_first_error_func));
+	}
+
+	fs->super->s_error_count++;
+	ext2fs_mark_super_dirty(fs);
+	ext2fs_flush(fs);
+	if (ff->panic_on_error)
+		abort();
+
+	return ret;
+}
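
For illustration only (not part of the patch): the block-range arithmetic in
punch_helper() can be checked in isolation.  The sketch below assumes a
hypothetical helper, whole_block_range(), that mirrors the start/end
computation: the first byte is rounded up to a block boundary and the end of
the byte range is rounded down, so only blocks lying entirely inside
[offset, offset + len) are selected.  Unlike the patch, which returns 0 for
len < blocksize and -EINVAL for start > end, the sketch reports both cases as
"no whole block".

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not in fuse2fs): compute the inclusive range of
 * blocks that lie entirely inside the byte range [offset, offset + len).
 * Returns 1 and fills *start / *end if at least one whole block is
 * covered, 0 otherwise.
 */
static int whole_block_range(uint64_t offset, uint64_t len,
			     uint32_t blocksize,
			     uint64_t *start, uint64_t *end)
{
	assert(blocksize > 0);

	/* A range shorter than one block cannot cover a whole block. */
	if (len < blocksize)
		return 0;

	/* Round the first byte up to the next block boundary... */
	*start = (offset + blocksize - 1) / blocksize;
	/* ...and the end of the byte range down to the last whole block. */
	*end = (offset + len - blocksize) / blocksize;

	return *start <= *end;
}
```

With 4096-byte blocks, an aligned request (offset 0, len 8192) selects blocks
0 through 1, while an unaligned one (offset 100, len 8192) selects only
block 1; the partial blocks at either edge are left alone.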



* Re: [PATCH v2 00/25] e2fsprogs patchbomb 10/2013
  2013-10-18  4:48 [PATCH v2 00/25] e2fsprogs patchbomb 10/2013 Darrick J. Wong
                   ` (23 preceding siblings ...)
  2013-10-18  4:51 ` [PATCH 24/25] misc: add fuse2fs, a FUSE server for e2fsprogs Darrick J. Wong
@ 2013-10-18 13:13 ` Lukáš Czerner
  2013-10-18 18:13   ` Darrick J. Wong
  2013-10-18 18:39 ` Theodore Ts'o
  25 siblings, 1 reply; 73+ messages in thread
From: Lukáš Czerner @ 2013-10-18 13:13 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: tytso, linux-ext4

On Thu, 17 Oct 2013, Darrick J. Wong wrote:

> Date: Thu, 17 Oct 2013 21:48:54 -0700
> From: Darrick J. Wong <darrick.wong@oracle.com>
> To: tytso@mit.edu, darrick.wong@oracle.com
> Cc: linux-ext4@vger.kernel.org
> Subject: [PATCH v2 00/25] e2fsprogs patchbomb 10/2013

I was going to review this, but could you please describe what changed
since the last version of each patch (in the patch itself)?  It will
make the review much easier.

Thanks!
-Lukas

> 
> Well, here we go again.  This is the same patchbomb from a couple of
> weeks ago, minus the patches that Ted has already accepted, plus
> several fixes to resize2fs that weren't ready back then, and a few
> other fixes that migrated into the other patches.  This series is
> against -next.
> 
> Ted, since you've accepted patches into -pu, do you want me to send
> patches against -pu as well?  Or put more bluntly, what are your
> thoughts about revert-and-replace of patches in -pu?  Patches 2, 6,
> 11, 23, and 24 have changed significantly since 9/30.
> 
> The first eight patches fix miscellaneous errors: #1 stops dirent
> iteration after we successfully link an inode into a directory.  #2
> fixes a bug that prohibited us from specifyinng a 64bit superblock
> number when opening an FS.  #3 prohibits running mke2fs with -E
> resize= and meta_bg.  #4 causes users of the badblocks code to reject
> 64bit block numbers.  #5 fixes shift overflows errors when punching
> the end of non-extent files.  #6 refactors all the tests for whether
> or not we need to set the LARGE_FILE feature (because someone goofed
> earlier).  #7 fixes a problem wherein mkfs ignored non-4096 blocksize
> directives in the config file on a device larger than 2^32KB.  #8
> cleans up some code in debugfs.
> 
> The next two patches fix some 64bit truncation bugs.
> 
> Regarding next five patches, I turned on bigalloc and found a number
> of bugs relating to the fact that block_alloc_stats2() takes a block
> number but operates on clusters.  I've fixed up all the allocation
> errors that I found.  I also decided to make the quota code use
> ext2fs_punch rather than try to correct its behavior wrt bigalloc.
> There was also a bug wherein the requirement that 64-bit bitmaps be
> enabled (via EXT2_FLAG_64BITS) for bigalloc filesystems.  There's also
> a patch to reduce the e2fsck output verbosity when there are block
> bitmap errors.  Note that #11 has been refactored significantly.
> 
> The next patch provides the ability to toggle the 64bit feature on any
> ext4 filesystem.
> 
> The four patches after that fix various resize2fs bugs with bigalloc.
> 
> The next two patches fix bugs with metadata_csum.  There's a patch to
> fix up some code to test if checksums are enabled instead of a
> GDT_CSUM open-code.  Finally, there's a patch to resize2fs to rewrite
> checksums of inodes that were relocated.
> 
> The next two patches add the ability to edit extended attributes and
> add a fuse2fs driver for e2fsprogs.  I admit that the xattr editing
> functions clash with the inline_data patches, though sadly, the inline
> data patches don't provide an API to access EAs in a separate EA
> block.  The fuse driver should work with the latest versions of Linux
> fuse (2.9.2) and osxfuse (2.6.1).  I've been using the fuse driver to
> test e2fsprogs functionality, which is how I came across most of the
> bugs fixed above.  Both of these patches (#23 and #24) have received
> fixes since the 9/30 posting.
> 
> The final patch adds my metadata checksum test program to the tests/
> directory, along with a new metadata_check target to run a quick
> check.  It includes substitute mount/umount commands for use with
> fuse2fs.
> 
> For fuse2fs, I think it'd be useful to reintroduce journal replay too.
> (Or cheat and call e2fsck -E journal_only...)  Also, fuse2fs doesn't
> yet know how to read or write ACLs yet.
> 
> I've tested these e2fsprogs changes against the -next branch as of a
> few days ago.  These days, I use a 2GB ramdisk and a 20T "disk" I
> constructed out of dm-snapshot to test in an x64 VM.
> 
> Comments and questions are, as always, welcome.
> 
> --D
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


* Re: [PATCH v2 00/25] e2fsprogs patchbomb 10/2013
  2013-10-18 13:13 ` [PATCH v2 00/25] e2fsprogs patchbomb 10/2013 Lukáš Czerner
@ 2013-10-18 18:13   ` Darrick J. Wong
  2013-10-18 20:37     ` Darrick J. Wong
  0 siblings, 1 reply; 73+ messages in thread
From: Darrick J. Wong @ 2013-10-18 18:13 UTC (permalink / raw)
  To: Lukáš Czerner; +Cc: tytso, linux-ext4

On Fri, Oct 18, 2013 at 03:13:57PM +0200, Lukáš Czerner wrote:
> On Thu, 17 Oct 2013, Darrick J. Wong wrote:
> 
> > Date: Thu, 17 Oct 2013 21:48:54 -0700
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > To: tytso@mit.edu, darrick.wong@oracle.com
> > Cc: linux-ext4@vger.kernel.org
> > Subject: [PATCH v2 00/25] e2fsprogs patchbomb 10/2013
> 
> I was going to review this, but could you please describe what changed
> since the last version of each patch (in the patch itself)?  It will
> make the review much easier.

I'm working on providing diffs against last time. :)

--D
> 
> Thanks!
> -Lukas
> 
> > 
> > Well, here we go again.  This is the same patchbomb from a couple of
> > weeks ago, minus the patches that Ted has already accepted, plus
> > several fixes to resize2fs that weren't ready back then, and a few
> > other fixes that migrated into the other patches.  This series is
> > against -next.
> > 
> > Ted, since you've accepted patches into -pu, do you want me to send
> > patches against -pu as well?  Or put more bluntly, what are your
> > thoughts about revert-and-replace of patches in -pu?  Patches 2, 6,
> > 11, 23, and 24 have changed significantly since 9/30.
> > 
> > The first eight patches fix miscellaneous errors: #1 stops dirent
> > iteration after we successfully link an inode into a directory.  #2
> > fixes a bug that prohibited us from specifyinng a 64bit superblock
> > number when opening an FS.  #3 prohibits running mke2fs with -E
> > resize= and meta_bg.  #4 causes users of the badblocks code to reject
> > 64bit block numbers.  #5 fixes shift overflows errors when punching
> > the end of non-extent files.  #6 refactors all the tests for whether
> > or not we need to set the LARGE_FILE feature (because someone goofed
> > earlier).  #7 fixes a problem wherein mkfs ignored non-4096 blocksize
> > directives in the config file on a device larger than 2^32KB.  #8
> > cleans up some code in debugfs.
> > 
> > The next two patches fix some 64bit truncation bugs.
> > 
> > Regarding next five patches, I turned on bigalloc and found a number
> > of bugs relating to the fact that block_alloc_stats2() takes a block
> > number but operates on clusters.  I've fixed up all the allocation
> > errors that I found.  I also decided to make the quota code use
> > ext2fs_punch rather than try to correct its behavior wrt bigalloc.
> > There was also a bug wherein the requirement that 64-bit bitmaps be
> > enabled (via EXT2_FLAG_64BITS) for bigalloc filesystems.  There's also
> > a patch to reduce the e2fsck output verbosity when there are block
> > bitmap errors.  Note that #11 has been refactored significantly.
> > 
> > The next patch provides the ability to toggle the 64bit feature on any
> > ext4 filesystem.
> > 
> > The four patches after that fix various resize2fs bugs with bigalloc.
> > 
> > The next two patches fix bugs with metadata_csum.  There's a patch to
> > fix up some code to test if checksums are enabled instead of a
> > GDT_CSUM open-code.  Finally, there's a patch to resize2fs to rewrite
> > checksums of inodes that were relocated.
> > 
> > The next two patches add the ability to edit extended attributes and
> > add a fuse2fs driver for e2fsprogs.  I admit that the xattr editing
> > functions clash with the inline_data patches, though sadly, the inline
> > data patches don't provide an API to access EAs in a separate EA
> > block.  The fuse driver should work with the latest versions of Linux
> > fuse (2.9.2) and osxfuse (2.6.1).  I've been using the fuse driver to
> > test e2fsprogs functionality, which is how I came across most of the
> > bugs fixed above.  Both of these patches (#23 and #24) have received
> > fixes since the 9/30 posting.
> > 
> > The final patch adds my metadata checksum test program to the tests/
> > directory, along with a new metadata_check target to run a quick
> > check.  It includes substitute mount/umount commands for use with
> > fuse2fs.
> > 
> > For fuse2fs, I think it'd be useful to reintroduce journal replay too.
> > (Or cheat and call e2fsck -E journal_only...)  Also, fuse2fs doesn't
> > yet know how to read or write ACLs yet.
> > 
> > I've tested these e2fsprogs changes against the -next branch as of a
> > few days ago.  These days, I use a 2GB ramdisk and a 20T "disk" I
> > constructed out of dm-snapshot to test in an x64 VM.
> > 
> > Comments and questions are, as always, welcome.
> > 
> > --D


* Re: [PATCH 02/25] libext2fs: fix ext2fs_open2() truncation of the superblock parameter
  2013-10-18  4:49 ` [PATCH 02/25] libext2fs: fix ext2fs_open2() truncation of the superblock parameter Darrick J. Wong
@ 2013-10-18 18:32   ` Darrick J. Wong
  2013-10-23 14:49     ` Lukáš Czerner
  0 siblings, 1 reply; 73+ messages in thread
From: Darrick J. Wong @ 2013-10-18 18:32 UTC (permalink / raw)
  To: tytso; +Cc: linux-ext4

On Thu, Oct 17, 2013 at 09:49:07PM -0700, Darrick J. Wong wrote:
> Since it's possible for very large filesystems to store backup
> superblocks at very large (> 2^32) block numbers, we need to be able
> to handle the case of a caller directing us to read one of these
> high-numbered backups.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  debugfs/debugfs.c   |    4 ++--
>  e2fsck/journal.c    |    6 +++---
>  e2fsck/unix.c       |    8 ++++----
>  lib/ext2fs/ext2fs.h |    4 ++++
>  lib/ext2fs/openfs.c |   21 +++++++++++++++------
>  misc/dumpe2fs.c     |    4 ++--
>  6 files changed, 30 insertions(+), 17 deletions(-)
> 
> 
> diff --git a/debugfs/debugfs.c b/debugfs/debugfs.c
> index 8c32eff..4f6108d 100644
> --- a/debugfs/debugfs.c
> +++ b/debugfs/debugfs.c
> @@ -94,8 +94,8 @@ static void open_filesystem(char *device, int open_flags, blk64_t superblock,
>  	if (catastrophic)
>  		open_flags |= EXT2_FLAG_SKIP_MMP;
>  
> -	retval = ext2fs_open(device, open_flags, superblock, blocksize,
> -			     unix_io_manager, &current_fs);
> +	retval = ext2fs_open3(device, NULL, open_flags, superblock, blocksize,
> +			      unix_io_manager, &current_fs);
>  	if (retval) {
>  		com_err(device, retval, "while opening filesystem");
>  		current_fs = NULL;
> diff --git a/e2fsck/journal.c b/e2fsck/journal.c
> index 2509303..af35a38 100644
> --- a/e2fsck/journal.c
> +++ b/e2fsck/journal.c
> @@ -967,9 +967,9 @@ int e2fsck_run_ext3_journal(e2fsck_t ctx)
>  
>  	ext2fs_mmp_stop(ctx->fs);
>  	ext2fs_free(ctx->fs);
> -	retval = ext2fs_open(ctx->filesystem_name, EXT2_FLAG_RW,
> -			     ctx->superblock, blocksize, io_ptr,
> -			     &ctx->fs);
> +	retval = ext2fs_open3(ctx->filesystem_name, NULL, EXT2_FLAG_RW,
> +			      ctx->superblock, blocksize, io_ptr,
> +			      &ctx->fs);
>  	if (retval) {
>  		com_err(ctx->program_name, retval,
>  			_("while trying to re-open %s"),
> diff --git a/e2fsck/unix.c b/e2fsck/unix.c
> index 0546653..fb41ca0 100644
> --- a/e2fsck/unix.c
> +++ b/e2fsck/unix.c
> @@ -1040,7 +1040,7 @@ static errcode_t try_open_fs(e2fsck_t ctx, int flags, io_manager io_ptr,
>  
>  	*ret_fs = NULL;
>  	if (ctx->superblock && ctx->blocksize) {
> -		retval = ext2fs_open2(ctx->filesystem_name, ctx->io_options,
> +		retval = ext2fs_open3(ctx->filesystem_name, ctx->io_options,
>  				      flags, ctx->superblock, ctx->blocksize,
>  				      io_ptr, ret_fs);
>  	} else if (ctx->superblock) {
> @@ -1051,7 +1051,7 @@ static errcode_t try_open_fs(e2fsck_t ctx, int flags, io_manager io_ptr,
>  				ext2fs_free(*ret_fs);
>  				*ret_fs = NULL;
>  			}
> -			retval = ext2fs_open2(ctx->filesystem_name,
> +			retval = ext2fs_open3(ctx->filesystem_name,
>  					      ctx->io_options, flags,
>  					      ctx->superblock, blocksize,
>  					      io_ptr, ret_fs);
> @@ -1059,7 +1059,7 @@ static errcode_t try_open_fs(e2fsck_t ctx, int flags, io_manager io_ptr,
>  				break;
>  		}
>  	} else
> -		retval = ext2fs_open2(ctx->filesystem_name, ctx->io_options,
> +		retval = ext2fs_open3(ctx->filesystem_name, ctx->io_options,
>  				      flags, 0, 0, io_ptr, ret_fs);
>  
>  	if (retval == 0)
> @@ -1375,7 +1375,7 @@ failure:
>  	 * don't need to update the mount count and last checked
>  	 * fields in the backup superblock (the kernel doesn't update
>  	 * the backup superblocks anyway).  With newer versions of the
> -	 * library this flag is set by ext2fs_open2(), but we set this
> +	 * library this flag is set by ext2fs_open3(), but we set this
>  	 * here just to be sure.  (No, we don't support e2fsck running
>  	 * with some other libext2fs than the one that it was shipped
>  	 * with, but just in case....)
> diff --git a/lib/ext2fs/ext2fs.h b/lib/ext2fs/ext2fs.h
> index 67876ad..1ef4d67 100644
> --- a/lib/ext2fs/ext2fs.h
> +++ b/lib/ext2fs/ext2fs.h
> @@ -1443,6 +1443,10 @@ extern errcode_t ext2fs_open2(const char *name, const char *io_options,
>  			      int flags, int superblock,
>  			      unsigned int block_size, io_manager manager,
>  			      ext2_filsys *ret_fs);
> +extern errcode_t ext2fs_open3(const char *name, const char *io_options,
> +			      int flags, blk64_t superblock,
> +			      unsigned int block_size, io_manager manager,
> +			      ext2_filsys *ret_fs);
>  extern blk64_t ext2fs_descriptor_block_loc2(ext2_filsys fs,
>  					blk64_t group_block, dgrp_t i);
>  extern blk_t ext2fs_descriptor_block_loc(ext2_filsys fs, blk_t group_block,
> diff --git a/lib/ext2fs/openfs.c b/lib/ext2fs/openfs.c
> index 2ad9114..b046d6c 100644
> --- a/lib/ext2fs/openfs.c
> +++ b/lib/ext2fs/openfs.c
> @@ -76,6 +76,15 @@ errcode_t ext2fs_open(const char *name, int flags, int superblock,
>  			    manager, ret_fs);
>  }
>  
> +errcode_t ext2fs_open2(const char *name, const char *io_options,
> +		       int flags, int superblock,
> +		       unsigned int block_size, io_manager manager,
> +		       ext2_filsys *ret_fs)
> +{
> +	return ext2fs_open3(name, io_options, flags, superblock, block_size,
> +			    manager, ret_fs);
> +}
> +
>  /*
>   *  Note: if superblock is non-zero, block-size must also be non-zero.
>   * 	Superblock and block_size can be zero to use the default size.
> @@ -90,8 +99,8 @@ errcode_t ext2fs_open(const char *name, int flags, int superblock,
>   *	EXT2_FLAG_64BITS - Allow 64-bit bitfields (needed for large
>   *				filesystems)
>   */
> -errcode_t ext2fs_open2(const char *name, const char *io_options,
> -		       int flags, int superblock,
> +errcode_t ext2fs_open3(const char *name, const char *io_options,
> +		       int flags, blk64_t superblock,
>  		       unsigned int block_size, io_manager manager,
>  		       ext2_filsys *ret_fs)
>  {
> @@ -189,8 +198,8 @@ errcode_t ext2fs_open2(const char *name, const char *io_options,
>  		if (retval)
>  			goto cleanup;
>  	}
> -	retval = io_channel_read_blk(fs->io, superblock, -SUPERBLOCK_SIZE,
> -				     fs->super);
> +	retval = io_channel_read_blk64(fs->io, superblock, -SUPERBLOCK_SIZE,
> +				       fs->super);
>  	if (retval)
>  		goto cleanup;
>  	if (fs->orig_super)
> @@ -380,8 +389,8 @@ errcode_t ext2fs_open2(const char *name, const char *io_options,
>  	else
>  		first_meta_bg = fs->desc_blocks;
>  	if (first_meta_bg) {
> -		retval = io_channel_read_blk(fs->io, group_block+1,
> -					     first_meta_bg, dest);
> +		retval = io_channel_read_blk64(fs->io, group_block+1,
> +					       first_meta_bg, dest);

The only change to this patch is the use of *read_blk64 in these two hunks.
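
For illustration only (not part of the patch): a minimal sketch of why the
64-bit read path matters.  The typedefs are stand-ins for the libext2fs block
types, narrow_block() and fits_in_32_bits() are hypothetical helpers, and the
block number is an arbitrary value above 2^32, not a claim about where a real
backup superblock lives.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for the libext2fs block-number types (32-bit and 64-bit). */
typedef uint32_t blk_t;
typedef uint64_t blk64_t;

/* Hypothetical helper: what a 32-bit interface sees of a 64-bit block. */
static blk_t narrow_block(blk64_t blk)
{
	return (blk_t)(blk & 0xFFFFFFFFULL);
}

/* Hypothetical helper: does the block number survive a 32-bit round trip? */
static int fits_in_32_bits(blk64_t blk)
{
	return (blk64_t)narrow_block(blk) == blk;
}

/* Self-check showing the silent wrap-around for a high block number. */
static void truncation_demo(void)
{
	blk64_t high_blk = (1ULL << 32) + 32768;	/* illustrative only */

	assert(!fits_in_32_bits(high_blk));
	assert(narrow_block(high_blk) == 32768);	/* upper bits lost */
	assert(fits_in_32_bits(32768));
}
```

A caller that passes such a block number through a 32-bit parameter ends up
reading a low-numbered block instead, with no error reported.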

--D

>  		if (retval)
>  			goto cleanup;
>  #ifdef WORDS_BIGENDIAN
> diff --git a/misc/dumpe2fs.c b/misc/dumpe2fs.c
> index ae70f70..b139977 100644
> --- a/misc/dumpe2fs.c
> +++ b/misc/dumpe2fs.c
> @@ -611,7 +611,7 @@ int main (int argc, char ** argv)
>  		for (use_blocksize = EXT2_MIN_BLOCK_SIZE;
>  		     use_blocksize <= EXT2_MAX_BLOCK_SIZE;
>  		     use_blocksize *= 2) {
> -			retval = ext2fs_open (device_name, flags,
> +			retval = ext2fs_open3(device_name, NULL, flags,
>  					      use_superblock,
>  					      use_blocksize, unix_io_manager,
>  					      &fs);
> @@ -619,7 +619,7 @@ int main (int argc, char ** argv)
>  				break;
>  		}
>  	} else
> -		retval = ext2fs_open (device_name, flags, use_superblock,
> +		retval = ext2fs_open3(device_name, NULL, flags, use_superblock,
>  				      use_blocksize, unix_io_manager, &fs);
>  	if (retval) {
>  		com_err (program_name, retval, _("while trying to open %s"),
> 


* Re: [PATCH 09/25] e2fsck: teach EA refcounting code to handle 64bit block addresses
  2013-10-18  4:49 ` [PATCH 09/25] e2fsck: teach EA refcounting code to handle 64bit block addresses Darrick J. Wong
@ 2013-10-18 18:37   ` Darrick J. Wong
  2013-11-25  8:18     ` Zheng Liu
  0 siblings, 1 reply; 73+ messages in thread
From: Darrick J. Wong @ 2013-10-18 18:37 UTC (permalink / raw)
  To: tytso; +Cc: linux-ext4

On Thu, Oct 17, 2013 at 09:49:55PM -0700, Darrick J. Wong wrote:
> The extended attribute refcounting code only accepts blk_t, which is
> dangerous because EA blocks can exist at high addresses (> 2^32) as
> well.  Therefore, widen the block fields to 64 bits.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  e2fsck/e2fsck.h      |   12 ++++++------
>  e2fsck/ea_refcount.c |   36 ++++++++++++++++++------------------
>  2 files changed, 24 insertions(+), 24 deletions(-)
> 
> 
> diff --git a/e2fsck/e2fsck.h b/e2fsck/e2fsck.h
> index 13d70f1..f1df525 100644
> --- a/e2fsck/e2fsck.h
> +++ b/e2fsck/e2fsck.h
> @@ -432,17 +432,17 @@ extern struct dx_dir_info *e2fsck_dx_dir_info_iter(e2fsck_t ctx, int *control);
>  /* ea_refcount.c */
>  extern errcode_t ea_refcount_create(int size, ext2_refcount_t *ret);
>  extern void ea_refcount_free(ext2_refcount_t refcount);
> -extern errcode_t ea_refcount_fetch(ext2_refcount_t refcount, blk_t blk,
> +extern errcode_t ea_refcount_fetch(ext2_refcount_t refcount, blk64_t blk,
>  				   int *ret);
>  extern errcode_t ea_refcount_increment(ext2_refcount_t refcount,
> -				       blk_t blk, int *ret);
> +				       blk64_t blk, int *ret);
>  extern errcode_t ea_refcount_decrement(ext2_refcount_t refcount,
> -				       blk_t blk, int *ret);
> +				       blk64_t blk, int *ret);
>  extern errcode_t ea_refcount_store(ext2_refcount_t refcount,
> -				   blk_t blk, int count);
> -extern blk_t ext2fs_get_refcount_size(ext2_refcount_t refcount);
> +				   blk64_t blk, int count);
> +extern blk64_t ext2fs_get_refcount_size(ext2_refcount_t refcount);
>  extern void ea_refcount_intr_begin(ext2_refcount_t refcount);
> -extern blk_t ea_refcount_intr_next(ext2_refcount_t refcount, int *ret);
> +extern blk64_t ea_refcount_intr_next(ext2_refcount_t refcount, int *ret);
>  
>  /* ehandler.c */
>  extern const char *ehandler_operation(const char *op);
> diff --git a/e2fsck/ea_refcount.c b/e2fsck/ea_refcount.c
> index e66e636..6f376a3 100644
> --- a/e2fsck/ea_refcount.c
> +++ b/e2fsck/ea_refcount.c
> @@ -25,14 +25,14 @@
>   * checked, its bit is set in the block_ea_map bitmap.
>   */
>  struct ea_refcount_el {
> -	blk_t	ea_blk;
> +	blk64_t	ea_blk;
>  	int	ea_count;
>  };
>  
>  struct ea_refcount {
> -	blk_t		count;
> -	blk_t		size;
> -	blk_t		cursor;
> +	unsigned long		count;
> +	unsigned long		size;
> +	unsigned long		cursor;

This (unsigned long instead of blk_t) is the only thing that changed since last
time.

--D

>  	struct ea_refcount_el	*list;
>  };
>  
> @@ -111,11 +111,11 @@ static void refcount_collapse(ext2_refcount_t refcount)
>   * 	specified position.
>   */
>  static struct ea_refcount_el *insert_refcount_el(ext2_refcount_t refcount,
> -						 blk_t blk, int pos)
> +						 blk64_t blk, int pos)
>  {
>  	struct ea_refcount_el 	*el;
>  	errcode_t		retval;
> -	blk_t			new_size = 0;
> +	blk64_t			new_size = 0;
>  	int			num;
>  
>  	if (refcount->count >= refcount->size) {
> @@ -153,7 +153,7 @@ static struct ea_refcount_el *insert_refcount_el(ext2_refcount_t refcount,
>   * 	and we can't find an entry, create one in the sorted list.
>   */
>  static struct ea_refcount_el *get_refcount_el(ext2_refcount_t refcount,
> -					      blk_t blk, int create)
> +					      blk64_t blk, int create)
>  {
>  	int	low, high, mid;
>  
> @@ -206,7 +206,7 @@ retry:
>  	return 0;
>  }
>  
> -errcode_t ea_refcount_fetch(ext2_refcount_t refcount, blk_t blk,
> +errcode_t ea_refcount_fetch(ext2_refcount_t refcount, blk64_t blk,
>  				int *ret)
>  {
>  	struct ea_refcount_el	*el;
> @@ -220,7 +220,7 @@ errcode_t ea_refcount_fetch(ext2_refcount_t refcount, blk_t blk,
>  	return 0;
>  }
>  
> -errcode_t ea_refcount_increment(ext2_refcount_t refcount, blk_t blk, int *ret)
> +errcode_t ea_refcount_increment(ext2_refcount_t refcount, blk64_t blk, int *ret)
>  {
>  	struct ea_refcount_el	*el;
>  
> @@ -234,7 +234,7 @@ errcode_t ea_refcount_increment(ext2_refcount_t refcount, blk_t blk, int *ret)
>  	return 0;
>  }
>  
> -errcode_t ea_refcount_decrement(ext2_refcount_t refcount, blk_t blk, int *ret)
> +errcode_t ea_refcount_decrement(ext2_refcount_t refcount, blk64_t blk, int *ret)
>  {
>  	struct ea_refcount_el	*el;
>  
> @@ -249,7 +249,7 @@ errcode_t ea_refcount_decrement(ext2_refcount_t refcount, blk_t blk, int *ret)
>  	return 0;
>  }
>  
> -errcode_t ea_refcount_store(ext2_refcount_t refcount, blk_t blk, int count)
> +errcode_t ea_refcount_store(ext2_refcount_t refcount, blk64_t blk, int count)
>  {
>  	struct ea_refcount_el	*el;
>  
> @@ -263,7 +263,7 @@ errcode_t ea_refcount_store(ext2_refcount_t refcount, blk_t blk, int count)
>  	return 0;
>  }
>  
> -blk_t ext2fs_get_refcount_size(ext2_refcount_t refcount)
> +blk64_t ext2fs_get_refcount_size(ext2_refcount_t refcount)
>  {
>  	if (!refcount)
>  		return 0;
> @@ -277,7 +277,7 @@ void ea_refcount_intr_begin(ext2_refcount_t refcount)
>  }
>  
>  
> -blk_t ea_refcount_intr_next(ext2_refcount_t refcount,
> +blk64_t ea_refcount_intr_next(ext2_refcount_t refcount,
>  				int *ret)
>  {
>  	struct ea_refcount_el	*list;
> @@ -370,7 +370,7 @@ int main(int argc, char **argv)
>  	int	i = 0;
>  	ext2_refcount_t refcount;
>  	int		size, arg;
> -	blk_t		blk;
> +	blk64_t		blk;
>  	errcode_t	retval;
>  
>  	while (1) {
> @@ -394,7 +394,7 @@ int main(int argc, char **argv)
>  			printf("Freeing refcount\n");
>  			break;
>  		case BCODE_STORE:
> -			blk = (blk_t) bcode_program[i++];
> +			blk = (blk64_t) bcode_program[i++];
>  			arg = bcode_program[i++];
>  			printf("Storing blk %u with value %d\n", blk, arg);
>  			retval = ea_refcount_store(refcount, blk, arg);
> @@ -403,7 +403,7 @@ int main(int argc, char **argv)
>  					"while storing blk %u", blk);
>  			break;
>  		case BCODE_FETCH:
> -			blk = (blk_t) bcode_program[i++];
> +			blk = (blk64_t) bcode_program[i++];
>  			retval = ea_refcount_fetch(refcount, blk, &arg);
>  			if (retval)
>  				com_err("ea_refcount_fetch", retval,
> @@ -413,7 +413,7 @@ int main(int argc, char **argv)
>  				       blk, arg);
>  			break;
>  		case BCODE_INCR:
> -			blk = (blk_t) bcode_program[i++];
> +			blk = (blk64_t) bcode_program[i++];
>  			retval = ea_refcount_increment(refcount, blk, &arg);
>  			if (retval)
>  				com_err("ea_refcount_increment", retval,
> @@ -423,7 +423,7 @@ int main(int argc, char **argv)
>  				       blk, arg);
>  			break;
>  		case BCODE_DECR:
> -			blk = (blk_t) bcode_program[i++];
> +			blk = (blk64_t) bcode_program[i++];
>  			retval = ea_refcount_decrement(refcount, blk, &arg);
>  			if (retval)
>  				com_err("ea_refcount_decrement", retval,
> 


* Re: [PATCH v2 00/25] e2fsprogs patchbomb 10/2013
  2013-10-18  4:48 [PATCH v2 00/25] e2fsprogs patchbomb 10/2013 Darrick J. Wong
                   ` (24 preceding siblings ...)
  2013-10-18 13:13 ` [PATCH v2 00/25] e2fsprogs patchbomb 10/2013 Lukáš Czerner
@ 2013-10-18 18:39 ` Theodore Ts'o
  25 siblings, 0 replies; 73+ messages in thread
From: Theodore Ts'o @ 2013-10-18 18:39 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-ext4

On Thu, Oct 17, 2013 at 09:48:54PM -0700, Darrick J. Wong wrote:
> 
> Ted, since you've accepted patches into -pu, do you want me to send
> patches against -pu as well?  Or put more bluntly, what are your
> thoughts about revert-and-replace of patches in -pu?  Patches 2, 6,
> 11, 23, and 24 have changed significantly since 9/30.

The pu branch is a rewinding branch.  What this means in practice is
that I'll accept those patches which I believe are ready, and for the
rest, I'll rewind the "dw/resize64-fuse" branch back to next, and
apply the rest onto the dw/resize64-fuse branch.  If I get a new set
of inline patches, I'd do the same thing.

Then when I update the pu branch, I rewind the pu branch to next, and
then merge in all of the "xx/yyyy" feature branches, resolving
conflicts along the way.  This makes it obvious which branches have
conflicts against each other, and it also allows me to run regression
tests against the combined set of feature patch sets that aren't quite
ready for next.

So in other words, no, you don't need to send patches against pu,
thanks!

      	    	       	   	      - Ted


* Re: [PATCH 10/25] debugfs: handle 64bit block numbers
  2013-10-18  4:50 ` [PATCH 10/25] debugfs: handle 64bit block numbers Darrick J. Wong
@ 2013-10-18 18:47   ` Darrick J. Wong
  2013-11-25  8:33   ` Zheng Liu
  1 sibling, 0 replies; 73+ messages in thread
From: Darrick J. Wong @ 2013-10-18 18:47 UTC (permalink / raw)
  To: tytso; +Cc: linux-ext4

On Thu, Oct 17, 2013 at 09:50:01PM -0700, Darrick J. Wong wrote:
> debugfs should use strtoull wrappers for reading block numbers from
> the command line.  "unsigned long" isn't wide enough to handle block
> numbers on 32bit platforms.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  debugfs/debugfs.c      |   33 ++++++++++++++++++++++-----------
>  debugfs/extent_inode.c |   22 +++++++++-------------
>  debugfs/util.c         |    2 +-
>  3 files changed, 32 insertions(+), 25 deletions(-)
> 
> 
> diff --git a/debugfs/debugfs.c b/debugfs/debugfs.c
> index d3db356..46fcd07 100644
> --- a/debugfs/debugfs.c
> +++ b/debugfs/debugfs.c
> @@ -181,8 +181,7 @@ void do_open_filesys(int argc, char **argv)
>  				return;
>  			break;
>  		case 's':
> -			superblock = parse_ulong(optarg, argv[0],
> -						 "superblock number", &err);
> +			err = strtoblk(argv[0], optarg, &superblock);

I converted all of these to strtoblk instead of parse_ulonglong.  Otherwise,
this patch is the same.

--D

>  			if (err)
>  				return;
>  			break;
> @@ -277,14 +276,17 @@ void do_init_filesys(int argc, char **argv)
>  	struct ext2_super_block param;
>  	errcode_t	retval;
>  	int		err;
> +	blk64_t		blocks;
>  
>  	if (common_args_process(argc, argv, 3, 3, "initialize",
>  				"<device> <blocksize>", CHECK_FS_NOTOPEN))
>  		return;
>  
>  	memset(&param, 0, sizeof(struct ext2_super_block));
> -	ext2fs_blocks_count_set(&param, parse_ulong(argv[2], argv[0],
> -						    "blocks count", &err));
> +	err = strtoblk(argv[0], argv[2], &blocks);
> +	if (err)
> +		return;
> +	ext2fs_blocks_count_set(&param, blocks);
>  	if (err)
>  		return;
>  	retval = ext2fs_initialize(argv[1], 0, &param,
> @@ -2109,7 +2111,9 @@ void do_bmap(int argc, char *argv[])
>  	ino = string_to_inode(argv[1]);
>  	if (!ino)
>  		return;
> -	blk = parse_ulong(argv[2], argv[0], "logical_block", &err);
> +	err = strtoblk(argv[0], argv[2], &blk);
> +	if (err)
> +		return;
>  
>  	errcode = ext2fs_bmap2(current_fs, ino, 0, 0, 0, blk, 0, &pblk);
>  	if (errcode) {
> @@ -2254,10 +2258,14 @@ void do_punch(int argc, char *argv[])
>  	ino = string_to_inode(argv[1]);
>  	if (!ino)
>  		return;
> -	start = parse_ulong(argv[2], argv[0], "logical_block", &err);
> -	if (argc == 4)
> -		end = parse_ulong(argv[3], argv[0], "logical_block", &err);
> -	else
> +	err = strtoblk(argv[0], argv[2], &start);
> +	if (err)
> +		return;
> +	if (argc == 4) {
> +		err = strtoblk(argv[0], argv[3], &end);
> +		if (err)
> +			return;
> +	} else
>  		end = ~0;
>  
>  	errcode = ext2fs_punch(current_fs, ino, 0, 0, start, end);
> @@ -2474,8 +2482,11 @@ int main(int argc, char **argv)
>  						"block size", 0);
>  			break;
>  		case 's':
> -			superblock = parse_ulong(optarg, argv[0],
> -						 "superblock number", 0);
> +			retval = strtoblk(argv[0], optarg, &superblock);
> +			if (retval) {
> +				com_err(argv[0], retval, 0, debug_prog_name);
> +				return 1;
> +			}
>  			break;
>  		case 'c':
>  			catastrophic = 1;
> diff --git a/debugfs/extent_inode.c b/debugfs/extent_inode.c
> index 0bbc4c5..75e328c 100644
> --- a/debugfs/extent_inode.c
> +++ b/debugfs/extent_inode.c
> @@ -264,7 +264,7 @@ void do_replace_node(int argc, char *argv[])
>  		return;
>  	}
>  
> -	extent.e_lblk = parse_ulong(argv[1], argv[0], "logical block", &err);
> +	err = strtoblk(argv[0], argv[1], &extent.e_lblk);
>  	if (err)
>  		return;
>  
> @@ -272,7 +272,7 @@ void do_replace_node(int argc, char *argv[])
>  	if (err)
>  		return;
>  
> -	extent.e_pblk = parse_ulong(argv[3], argv[0], "logical block", &err);
> +	err = strtoblk(argv[0], argv[3], &extent.e_pblk);
>  	if (err)
>  		return;
>  
> @@ -338,8 +338,7 @@ void do_insert_node(int argc, char *argv[])
>  		return;
>  	}
>  
> -	extent.e_lblk = parse_ulong(argv[1], cmd,
> -				    "logical block", &err);
> +	err = strtoblk(cmd, argv[1], &extent.e_lblk);
>  	if (err)
>  		return;
>  
> @@ -348,8 +347,7 @@ void do_insert_node(int argc, char *argv[])
>  	if (err)
>  		return;
>  
> -	extent.e_pblk = parse_ulong(argv[3], cmd,
> -				    "pysical block", &err);
> +	err = strtoblk(cmd, argv[3], &extent.e_pblk);
>  	if (err)
>  		return;
>  
> @@ -366,8 +364,8 @@ void do_set_bmap(int argc, char **argv)
>  	const char	*usage = "[--uninit] <lblk> <pblk>";
>  	struct ext2fs_extent extent;
>  	errcode_t	retval;
> -	blk_t		logical;
> -	blk_t		physical;
> +	blk64_t		logical;
> +	blk64_t		physical;
>  	char		*cmd = argv[0];
>  	int		flags = 0;
>  	int		err;
> @@ -387,18 +385,16 @@ void do_set_bmap(int argc, char **argv)
>  		return;
>  	}
>  
> -	logical = parse_ulong(argv[1], cmd,
> -				    "logical block", &err);
> +	err = strtoblk(cmd, argv[1], &logical);
>  	if (err)
>  		return;
>  
> -	physical = parse_ulong(argv[2], cmd,
> -				    "physical block", &err);
> +	err = strtoblk(cmd, argv[2], &physical);
>  	if (err)
>  		return;
>  
>  	retval = ext2fs_extent_set_bmap(current_handle, logical,
> -					(blk64_t) physical, flags);
> +					physical, flags);
>  	if (retval) {
>  		com_err(cmd, retval, 0);
>  		return;
> diff --git a/debugfs/util.c b/debugfs/util.c
> index cf3a6c6..09088e0 100644
> --- a/debugfs/util.c
> +++ b/debugfs/util.c
> @@ -377,7 +377,7 @@ int common_block_args_process(int argc, char *argv[],
>  	}
>  
>  	if (argc > 2) {
> -		*count = parse_ulong(argv[2], argv[0], "count", &err);
> +		err = strtoblk(argv[0], argv[2], count);
>  		if (err)
>  			return 1;
>  	}
> 


* Re: [PATCH 11/25] libext2fs: only punch complete clusters
  2013-10-18  4:50 ` [PATCH 11/25] libext2fs: only punch complete clusters Darrick J. Wong
@ 2013-10-18 18:55   ` Darrick J. Wong
  2013-11-25  8:51   ` Zheng Liu
  1 sibling, 0 replies; 73+ messages in thread
From: Darrick J. Wong @ 2013-10-18 18:55 UTC (permalink / raw)
  To: tytso; +Cc: linux-ext4

On Thu, Oct 17, 2013 at 09:50:08PM -0700, Darrick J. Wong wrote:
> When bigalloc is enabled, using ext2fs_block_alloc_stats2() to free
> any block in a cluster has the effect of freeing the entire cluster.
> This is problematic if a caller instructs us to punch, say, blocks
> 12-15 of a 16-block cluster, because blocks 0-11 now point to a "free"
> cluster.
> 
> The naive way to solve this problem is to see if any of the other
> blocks in this logical cluster map to a physical cluster.  If so, then
> we know that the cluster is still in use and it mustn't be freed.
> Otherwise, we are punching the last mapped block in this cluster, so
> we can free the cluster.
> 
> The implementation given only does the rigorous checks for the partial
> clusters at the beginning and end of the punching range.
> 
> v2: Refactor the block free code into a separate helper function that
> should be more efficient.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  lib/ext2fs/bmap.c   |   29 ++++++++++++++++++
>  lib/ext2fs/ext2fs.h |    3 ++
>  lib/ext2fs/punch.c  |   82 ++++++++++++++++++++++++++++++++++++++++++++++++---
>  3 files changed, 109 insertions(+), 5 deletions(-)
> 
> 
> diff --git a/lib/ext2fs/bmap.c b/lib/ext2fs/bmap.c
> index 5074587..80f8f86 100644
> --- a/lib/ext2fs/bmap.c
> +++ b/lib/ext2fs/bmap.c
> @@ -173,6 +173,35 @@ static errcode_t implied_cluster_alloc(ext2_filsys fs, ext2_ino_t ino,
>  	return 0;
>  }
>  
> +/* Try to map a logical block to an already-allocated physical cluster. */
> +errcode_t ext2fs_map_cluster_block(ext2_filsys fs, ext2_ino_t ino,
> +				   struct ext2_inode *inode, blk64_t lblk,
> +				   blk64_t *pblk)
> +{
> +	ext2_extent_handle_t handle;
> +	errcode_t retval;
> +
> +	/* Need bigalloc and extents to be enabled */
> +	*pblk = 0;
> +	if (!EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
> +					EXT4_FEATURE_RO_COMPAT_BIGALLOC) ||
> +	    !(inode->i_flags & EXT4_EXTENTS_FL))
> +		return 0;
> +
> +	retval = ext2fs_extent_open2(fs, ino, inode, &handle);
> +	if (retval)
> +		goto out;
> +
> +	retval = implied_cluster_alloc(fs, ino, inode, handle, lblk, pblk);
> +	if (retval)
> +		goto out2;
> +
> +out2:
> +	ext2fs_extent_free(handle);
> +out:
> +	return retval;
> +}
> +
>  static errcode_t extent_bmap(ext2_filsys fs, ext2_ino_t ino,
>  			     struct ext2_inode *inode,
>  			     ext2_extent_handle_t handle,
> diff --git a/lib/ext2fs/ext2fs.h b/lib/ext2fs/ext2fs.h
> index 8f82dae..5247922 100644
> --- a/lib/ext2fs/ext2fs.h
> +++ b/lib/ext2fs/ext2fs.h
> @@ -924,6 +924,9 @@ extern errcode_t ext2fs_bmap2(ext2_filsys fs, ext2_ino_t ino,
>  			      struct ext2_inode *inode,
>  			      char *block_buf, int bmap_flags, blk64_t block,
>  			      int *ret_flags, blk64_t *phys_blk);
> +errcode_t ext2fs_map_cluster_block(ext2_filsys fs, ext2_ino_t ino,
> +				   struct ext2_inode *inode, blk64_t lblk,
> +				   blk64_t *pblk);
>  
>  #if 0
>  /* bmove.c */
> diff --git a/lib/ext2fs/punch.c b/lib/ext2fs/punch.c
> index 790a0ad8..1e4398e 100644
> --- a/lib/ext2fs/punch.c
> +++ b/lib/ext2fs/punch.c
> @@ -177,6 +177,75 @@ static void dbg_print_extent(char *desc, struct ext2fs_extent *extent)
>  #define dbg_printf(f, a...)		do { } while (0)
>  #endif
>  
> +/* Free a range of blocks, respecting cluster boundaries */
> +static errcode_t punch_extent_blocks(ext2_filsys fs, ext2_ino_t ino,
> +				     struct ext2_inode *inode,
> +				     blk64_t lfree_start, blk64_t free_start,
> +				     __u32 free_count, int *freed)
> +{
> +	blk64_t		pblk;
> +	int		freed_now = 0;
> +	__u32		cluster_freed;
> +	errcode_t	retval = 0;
> +
> +	/* No bigalloc?  Just free each block. */
> +	if (EXT2FS_CLUSTER_RATIO(fs) == 1) {
> +		*freed += free_count;
> +		while (free_count-- > 0)
> +			ext2fs_block_alloc_stats2(fs, free_start++, -1);
> +		return retval;
> +	}
> +
> +	/*
> +	 * Try to free up to the next cluster boundary.  We assume that all
> +	 * blocks in a logical cluster map to blocks from the same physical
> +	 * cluster, and that the offsets within the [pl]clusters match.
> +	 */
> +	if (free_start & EXT2FS_CLUSTER_MASK(fs)) {
> +		retval = ext2fs_map_cluster_block(fs, ino, inode,
> +						  lfree_start, &pblk);
> +		if (retval)
> +			goto errout;
> +		if (!pblk) {
> +			ext2fs_block_alloc_stats2(fs, free_start, -1);
> +			freed_now++;
> +		}
> +		cluster_freed = EXT2FS_CLUSTER_RATIO(fs) -
> +			(free_start & EXT2FS_CLUSTER_MASK(fs));
> +		if (cluster_freed > free_count)
> +			cluster_freed = free_count;
> +		free_count -= cluster_freed;
> +		free_start += cluster_freed;
> +		lfree_start += cluster_freed;
> +	}
> +
> +	/* Free whole clusters from the middle of the range. */
> +	while (free_count > 0 && free_count >= EXT2FS_CLUSTER_RATIO(fs)) {
> +		ext2fs_block_alloc_stats2(fs, free_start, -1);
> +		freed_now++;
> +		cluster_freed = EXT2FS_CLUSTER_RATIO(fs);
> +		free_count -= cluster_freed;
> +		free_start += cluster_freed;
> +		lfree_start += cluster_freed;
> +	}
> +
> +	/* Try to free the last cluster. */
> +	if (free_count > 0) {
> +		retval = ext2fs_map_cluster_block(fs, ino, inode,
> +						  lfree_start, &pblk);
> +		if (retval)
> +			goto errout;
> +		if (!pblk) {
> +			ext2fs_block_alloc_stats2(fs, free_start, -1);
> +			freed_now++;
> +		}
> +	}
> +
> +errout:
> +	*freed += freed_now;
> +	return retval;
> +}

The major change in this patch since last time is that I broke out the
deallocation step into this separate function and made it do less work
for clusters inside the punch range.

--D
> +
>  static errcode_t ext2fs_punch_extent(ext2_filsys fs, ext2_ino_t ino,
>  				     struct ext2_inode *inode,
>  				     blk64_t start, blk64_t end)
> @@ -184,7 +253,7 @@ static errcode_t ext2fs_punch_extent(ext2_filsys fs, ext2_ino_t ino,
>  	ext2_extent_handle_t	handle = 0;
>  	struct ext2fs_extent	extent;
>  	errcode_t		retval;
> -	blk64_t			free_start, next;
> +	blk64_t			free_start, next, lfree_start;
>  	__u32			free_count, newlen;
>  	int			freed = 0;
>  	int			op;
> @@ -211,6 +280,7 @@ static errcode_t ext2fs_punch_extent(ext2_filsys fs, ext2_ino_t ino,
>  			/* Start of deleted region before extent; 
>  			   adjust beginning of extent */
>  			free_start = extent.e_pblk;
> +			lfree_start = extent.e_lblk;
>  			if (next > end)
>  				free_count = end - extent.e_lblk + 1;
>  			else
> @@ -226,6 +296,7 @@ static errcode_t ext2fs_punch_extent(ext2_filsys fs, ext2_ino_t ino,
>  			dbg_printf("Case #%d\n", 2);
>  			newlen = start - extent.e_lblk;
>  			free_start = extent.e_pblk + newlen;
> +			lfree_start = extent.e_lblk + newlen;
>  			free_count = extent.e_len - newlen;
>  			extent.e_len = newlen;
>  		} else {
> @@ -241,6 +312,7 @@ static errcode_t ext2fs_punch_extent(ext2_filsys fs, ext2_ino_t ino,
>  
>  			extent.e_len = start - extent.e_lblk;
>  			free_start = extent.e_pblk + extent.e_len;
> +			lfree_start = extent.e_lblk + extent.e_len;
>  			free_count = end - start + 1;
>  
>  			dbg_print_extent("inserting", &newex);
> @@ -281,10 +353,10 @@ static errcode_t ext2fs_punch_extent(ext2_filsys fs, ext2_ino_t ino,
>  			goto errout;
>  		dbg_printf("Free start %llu, free count = %u\n",
>  		       free_start, free_count);
> -		while (free_count-- > 0) {
> -			ext2fs_block_alloc_stats2(fs, free_start++, -1);
> -			freed++;
> -		}
> +		retval = punch_extent_blocks(fs, ino, inode, lfree_start,
> +					     free_start, free_count, &freed);
> +		if (retval)
> +			goto errout;
>  	next_extent:
>  		retval = ext2fs_extent_get(handle, op,
>  					   &extent);
> 


* Re: [PATCH 16/25] resize2fs: convert fs to and from 64bit mode
  2013-10-18  4:50 ` [PATCH 16/25] resize2fs: convert fs to and from 64bit mode Darrick J. Wong
@ 2013-10-18 18:59   ` Darrick J. Wong
  2013-11-26  6:44   ` Zheng Liu
  1 sibling, 0 replies; 73+ messages in thread
From: Darrick J. Wong @ 2013-10-18 18:59 UTC (permalink / raw)
  To: tytso; +Cc: linux-ext4

On Thu, Oct 17, 2013 at 09:50:42PM -0700, Darrick J. Wong wrote:
> resize2fs does its magic by loading a filesystem, duplicating the
> in-memory image of that fs, moving relevant blocks out of the way of
> whatever new metadata get created, and finally writing everything back
> out to disk.  Enabling 64bit mode enlarges the group descriptors,
> which makes resize2fs a reasonable vehicle for taking care of the rest
> of the bookkeeping requirements, so add to resize2fs the ability to
> convert a filesystem to 64bit mode and back.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  resize/main.c         |   40 ++++++-
>  resize/resize2fs.8.in |   18 +++
>  resize/resize2fs.c    |  282 ++++++++++++++++++++++++++++++++++++++++++++++++-
>  resize/resize2fs.h    |    3 +
>  4 files changed, 336 insertions(+), 7 deletions(-)
> 
> 
> diff --git a/resize/main.c b/resize/main.c
> index 1394ae1..ad0c946 100644
> --- a/resize/main.c
> +++ b/resize/main.c
> @@ -41,7 +41,7 @@ char *program_name, *device_name, *io_options;
>  static void usage (char *prog)
>  {
>  	fprintf (stderr, _("Usage: %s [-d debug_flags] [-f] [-F] [-M] [-P] "
> -			   "[-p] device [new_size]\n\n"), prog);
> +			   "[-p] device [-b|-s|new_size]\n\n"), prog);
>  
>  	exit (1);
>  }
> @@ -199,7 +199,7 @@ int main (int argc, char ** argv)
>  	if (argc && *argv)
>  		program_name = *argv;
>  
> -	while ((c = getopt (argc, argv, "d:fFhMPpS:")) != EOF) {
> +	while ((c = getopt(argc, argv, "d:fFhMPpS:bs")) != EOF) {
>  		switch (c) {
>  		case 'h':
>  			usage(program_name);
> @@ -225,6 +225,12 @@ int main (int argc, char ** argv)
>  		case 'S':
>  			use_stride = atoi(optarg);
>  			break;
> +		case 'b':
> +			flags |= RESIZE_ENABLE_64BIT;
> +			break;
> +		case 's':
> +			flags |= RESIZE_DISABLE_64BIT;
> +			break;
>  		default:
>  			usage(program_name);
>  		}
> @@ -383,6 +389,10 @@ int main (int argc, char ** argv)
>  		if (sys_page_size > fs->blocksize)
>  			new_size &= ~((sys_page_size / fs->blocksize)-1);
>  	}
> +	/* If changing 64bit, don't change the filesystem size. */
> +	if (flags & (RESIZE_DISABLE_64BIT | RESIZE_ENABLE_64BIT)) {
> +		new_size = ext2fs_blocks_count(fs->super);
> +	}
>  	if (!EXT2_HAS_INCOMPAT_FEATURE(fs->super,
>  				       EXT4_FEATURE_INCOMPAT_64BIT)) {
>  		/* Take 16T down to 2^32-1 blocks */
> @@ -434,7 +444,31 @@ int main (int argc, char ** argv)
>  			fs->blocksize / 1024, new_size);
>  		exit(1);
>  	}
> -	if (new_size == ext2fs_blocks_count(fs->super)) {
> +	if (flags & RESIZE_DISABLE_64BIT && flags & RESIZE_ENABLE_64BIT) {
> +		fprintf(stderr, _("Cannot set and unset 64bit feature.\n"));
> +		exit(1);
> +	} else if (flags & (RESIZE_DISABLE_64BIT | RESIZE_ENABLE_64BIT)) {
> +		new_size = ext2fs_blocks_count(fs->super);
> +		if (new_size >= (1ULL << 32)) {
> +			fprintf(stderr, _("Cannot change the 64bit feature "
> +				"on a filesystem that is larger than "
> +				"2^32 blocks.\n"));
> +			exit(1);
> +		}
> +		if (mount_flags & EXT2_MF_MOUNTED) {
> +			fprintf(stderr, _("Cannot change the 64bit feature "
> +				"while the filesystem is mounted.\n"));
> +			exit(1);
> +		}
> +		if (flags & RESIZE_ENABLE_64BIT &&
> +		    !EXT2_HAS_INCOMPAT_FEATURE(fs->super,
> +				EXT3_FEATURE_INCOMPAT_EXTENTS)) {
> +			fprintf(stderr, _("Please enable the extents feature "
> +				"with tune2fs before enabling the 64bit "
> +				"feature.\n"));
> +			exit(1);
> +		}
> +	} else if (new_size == ext2fs_blocks_count(fs->super)) {
>  		fprintf(stderr, _("The filesystem is already %llu blocks "
>  			"long.  Nothing to do!\n\n"), new_size);
>  		exit(0);
> diff --git a/resize/resize2fs.8.in b/resize/resize2fs.8.in
> index a1f3099..1c75816 100644
> --- a/resize/resize2fs.8.in
> +++ b/resize/resize2fs.8.in
> @@ -8,7 +8,7 @@ resize2fs \- ext2/ext3/ext4 file system resizer
>  .SH SYNOPSIS
>  .B resize2fs
>  [
> -.B \-fFpPM
> +.B \-fFpPMbs
>  ]
>  [
>  .B \-d
> @@ -85,8 +85,21 @@ to shrink the size of filesystem.  Then you may use
>  to shrink the size of the partition.  When shrinking the size of
>  the partition, make sure you do not make it smaller than the new size
>  of the ext2 filesystem!
> +.PP
> +The
> +.B \-b
> +and
> +.B \-s
> +options enable and disable the 64bit feature, respectively.  The resize2fs
> +program will, of course, take care of resizing the block group descriptors
> +and moving other data blocks out of the way, as needed.  It is not possible
> +to resize the filesystem concurrent with changing the 64bit status.
>  .SH OPTIONS
>  .TP
> +.B \-b
> +Turns on the 64bit feature, resizes the group descriptors as necessary, and
> +moves other metadata out of the way.
> +.TP
>  .B \-d \fIdebug-flags
>  Turns on various resize2fs debugging features, if they have been compiled
>  into the binary.
> @@ -126,6 +139,9 @@ of what the program is doing.
>  .B \-P
>  Print the minimum size of the filesystem and exit.
>  .TP
> +.B \-s
> +Turns off the 64bit feature and frees blocks that are no longer in use.
> +.TP
>  .B \-S \fIRAID-stride
>  The
>  .B resize2fs
> diff --git a/resize/resize2fs.c b/resize/resize2fs.c
> index 0feff0f..05ba6e1 100644
> --- a/resize/resize2fs.c
> +++ b/resize/resize2fs.c
> @@ -53,6 +53,9 @@ static errcode_t ext2fs_calculate_summary_stats(ext2_filsys fs);
>  static errcode_t fix_sb_journal_backup(ext2_filsys fs);
>  static errcode_t mark_table_blocks(ext2_filsys fs,
>  				   ext2fs_block_bitmap bmap);
> +static errcode_t resize_group_descriptors(ext2_resize_t rfs, blk64_t new_size);
> +static errcode_t move_bg_metadata(ext2_resize_t rfs);
> +static errcode_t zero_high_bits_in_inodes(ext2_resize_t rfs);
>  
>  /*
>   * Some helper CPP macros
> @@ -119,13 +122,30 @@ errcode_t resize_fs(ext2_filsys fs, blk64_t *new_size, int flags,
>  	if (retval)
>  		goto errout;
>  
> +	init_resource_track(&rtrack, "resize_group_descriptors", fs->io);
> +	retval = resize_group_descriptors(rfs, *new_size);
> +	if (retval)
> +		goto errout;
> +	print_resource_track(rfs, &rtrack, fs->io);
> +
> +	init_resource_track(&rtrack, "move_bg_metadata", fs->io);
> +	retval = move_bg_metadata(rfs);
> +	if (retval)
> +		goto errout;
> +	print_resource_track(rfs, &rtrack, fs->io);
> +
> +	init_resource_track(&rtrack, "zero_high_bits_in_metadata", fs->io);
> +	retval = zero_high_bits_in_inodes(rfs);
> +	if (retval)
> +		goto errout;
> +	print_resource_track(rfs, &rtrack, fs->io);
> +
>  	init_resource_track(&rtrack, "adjust_superblock", fs->io);
>  	retval = adjust_superblock(rfs, *new_size);
>  	if (retval)
>  		goto errout;
>  	print_resource_track(rfs, &rtrack, fs->io);
>  
> -
>  	init_resource_track(&rtrack, "fix_uninit_block_bitmaps 2", fs->io);
>  	fix_uninit_block_bitmaps(rfs->new_fs);
>  	print_resource_track(rfs, &rtrack, fs->io);
> @@ -221,6 +241,259 @@ errout:
>  	return retval;
>  }
>  
> +/* Toggle 64bit mode */
> +static errcode_t resize_group_descriptors(ext2_resize_t rfs, blk64_t new_size)
> +{
> +	void *o, *n, *new_group_desc;
> +	dgrp_t i;
> +	int copy_size;
> +	errcode_t retval;
> +
> +	if (!(rfs->flags & (RESIZE_DISABLE_64BIT | RESIZE_ENABLE_64BIT)))
> +		return 0;
> +
> +	if (new_size != ext2fs_blocks_count(rfs->new_fs->super) ||
> +	    ext2fs_blocks_count(rfs->new_fs->super) >= (1ULL << 32) ||
> +	    (rfs->flags & RESIZE_DISABLE_64BIT &&
> +	     rfs->flags & RESIZE_ENABLE_64BIT))
> +		return EXT2_ET_INVALID_ARGUMENT;
> +
> +	if (rfs->flags & RESIZE_DISABLE_64BIT) {
> +		rfs->new_fs->super->s_feature_incompat &=
> +				~EXT4_FEATURE_INCOMPAT_64BIT;
> +		rfs->new_fs->super->s_desc_size = EXT2_MIN_DESC_SIZE;
> +	} else if (rfs->flags & RESIZE_ENABLE_64BIT) {
> +		rfs->new_fs->super->s_feature_incompat |=
> +				EXT4_FEATURE_INCOMPAT_64BIT;
> +		rfs->new_fs->super->s_desc_size = EXT2_MIN_DESC_SIZE_64BIT;
> +	}
> +
> +	if (EXT2_DESC_SIZE(rfs->old_fs->super) ==
> +	    EXT2_DESC_SIZE(rfs->new_fs->super))
> +		return 0;
> +
> +	o = rfs->new_fs->group_desc;
> +	rfs->new_fs->desc_blocks = ext2fs_div_ceil(
> +			rfs->old_fs->group_desc_count,
> +			EXT2_DESC_PER_BLOCK(rfs->new_fs->super));
> +	retval = ext2fs_get_arrayzero(rfs->new_fs->desc_blocks,
> +				      rfs->old_fs->blocksize, &new_group_desc);
> +	if (retval)
> +		return retval;
> +
> +	n = new_group_desc;
> +
> +	if (EXT2_DESC_SIZE(rfs->old_fs->super) <=
> +	    EXT2_DESC_SIZE(rfs->new_fs->super))
> +		copy_size = EXT2_DESC_SIZE(rfs->old_fs->super);
> +	else
> +		copy_size = EXT2_DESC_SIZE(rfs->new_fs->super);
> +	for (i = 0; i < rfs->old_fs->group_desc_count; i++) {
> +		memcpy(n, o, copy_size);
> +		n += EXT2_DESC_SIZE(rfs->new_fs->super);
> +		o += EXT2_DESC_SIZE(rfs->old_fs->super);
> +	}
> +
> +	ext2fs_free_mem(&rfs->new_fs->group_desc);
> +	rfs->new_fs->group_desc = new_group_desc;
> +
> +	for (i = 0; i < rfs->old_fs->group_desc_count; i++)
> +		ext2fs_group_desc_csum_set(rfs->new_fs, i);
> +
> +	return 0;
> +}
> +
> +/* Move bitmaps/inode tables out of the way. */
> +static errcode_t move_bg_metadata(ext2_resize_t rfs)
> +{
> +	dgrp_t i;
> +	blk64_t b, c, d;
> +	ext2fs_block_bitmap old_map, new_map;
> +	int old, new;
> +	errcode_t retval;
> +	int zero = 0, one = 1;
> +
> +	if (!(rfs->flags & (RESIZE_DISABLE_64BIT | RESIZE_ENABLE_64BIT)))
> +		return 0;
> +
> +	retval = ext2fs_allocate_block_bitmap(rfs->old_fs, "oldfs", &old_map);
> +	if (retval)
> +		return retval;
> +
> +	retval = ext2fs_allocate_block_bitmap(rfs->new_fs, "newfs", &new_map);
> +	if (retval)
> +		goto out;
> +
> +	/* Construct bitmaps of super/descriptor blocks in old and new fs */
> +	for (i = 0; i < rfs->old_fs->group_desc_count; i++) {
> +		retval = ext2fs_super_and_bgd_loc2(rfs->old_fs, i, &b, &c, &d,
> +						   NULL);
> +		if (retval)
> +			goto out;
> +		ext2fs_mark_block_bitmap2(old_map, b);
> +		ext2fs_mark_block_bitmap2(old_map, c);
> +		ext2fs_mark_block_bitmap2(old_map, d);
> +
> +		retval = ext2fs_super_and_bgd_loc2(rfs->new_fs, i, &b, &c, &d,
> +						   NULL);
> +		if (retval)
> +			goto out;
> +		ext2fs_mark_block_bitmap2(new_map, b);
> +		ext2fs_mark_block_bitmap2(new_map, c);
> +		ext2fs_mark_block_bitmap2(new_map, d);
> +	}
> +
> +	/* Find changes in block allocations for bg metadata */
> +	for (b = 0;
> +	     b < ext2fs_blocks_count(rfs->new_fs->super);
> +	     b += EXT2FS_CLUSTER_RATIO(rfs->new_fs)) {
> +		old = ext2fs_test_block_bitmap2(old_map, b);
> +		new = ext2fs_test_block_bitmap2(new_map, b);
> +
> +		if (old && !new)
> +			ext2fs_unmark_block_bitmap2(rfs->new_fs->block_map, b);
> +		else if (!old && new)
> +			; /* empty ext2fs_mark_block_bitmap2(new_map, b); */
> +		else
> +			ext2fs_unmark_block_bitmap2(new_map, b);
> +	}
> +	/* new_map now shows blocks that have been newly allocated. */
> +
> +	/* Move any conflicting bitmaps and inode tables */
> +	for (i = 0; i < rfs->old_fs->group_desc_count; i++) {
> +		b = ext2fs_block_bitmap_loc(rfs->new_fs, i);
> +		if (ext2fs_test_block_bitmap2(new_map, b))
> +			ext2fs_block_bitmap_loc_set(rfs->new_fs, i, 0);
> +
> +		b = ext2fs_inode_bitmap_loc(rfs->new_fs, i);
> +		if (ext2fs_test_block_bitmap2(new_map, b))
> +			ext2fs_inode_bitmap_loc_set(rfs->new_fs, i, 0);
> +
> +		c = ext2fs_inode_table_loc(rfs->new_fs, i);
> +		for (b = 0; b < rfs->new_fs->inode_blocks_per_group; b++) {
> +			if (ext2fs_test_block_bitmap2(new_map, b + c)) {
> +				ext2fs_inode_table_loc_set(rfs->new_fs, i, 0);
> +				break;
> +			}
> +		}
> +	}
> +
> +out:
> +	if (old_map)
> +		ext2fs_free_block_bitmap(old_map);
> +	if (new_map)
> +		ext2fs_free_block_bitmap(new_map);
> +	return retval;
> +}
> +
> +/* Zero out the high bits of extent fields */
> +static errcode_t zero_high_bits_in_extents(ext2_filsys fs, ext2_ino_t ino,
> +				 struct ext2_inode *inode)
> +{
> +	ext2_extent_handle_t	handle;
> +	struct ext2fs_extent	extent;
> +	int			op = EXT2_EXTENT_ROOT;
> +	errcode_t		errcode;
> +
> +	if (!(inode->i_flags & EXT4_EXTENTS_FL))
> +		return 0;
> +
> +	errcode = ext2fs_extent_open(fs, ino, &handle);
> +	if (errcode)
> +		return errcode;
> +
> +	while (1) {
> +		errcode = ext2fs_extent_get(handle, op, &extent);
> +		if (errcode)
> +			break;
> +
> +		op = EXT2_EXTENT_NEXT_SIB;
> +
> +		if (extent.e_pblk > (1ULL << 32)) {
> +			extent.e_pblk &= (1ULL << 32) - 1;
> +			errcode = ext2fs_extent_replace(handle, 0, &extent);
> +			if (errcode)
> +				break;
> +		}
> +	}
> +
> +	/* Ok if we run off the end */
> +	if (errcode == EXT2_ET_EXTENT_NO_NEXT)
> +		errcode = 0;
> +	return errcode;
> +}
> +
> +/* Zero out the high bits of inodes. */
> +static errcode_t zero_high_bits_in_inodes(ext2_resize_t rfs)
> +{
> +	ext2_filsys	fs = rfs->new_fs;
> +	int length = EXT2_INODE_SIZE(fs->super);
> +	struct ext2_inode *inode = NULL;
> +	ext2_inode_scan	scan = NULL;
> +	errcode_t	retval;
> +	ext2_ino_t	ino;
> +	blk64_t		file_acl_block;
> +	int		inode_dirty;
> +
> +	if (!(rfs->flags & (RESIZE_DISABLE_64BIT | RESIZE_ENABLE_64BIT)))
> +		return 0;
> +
> +	if (fs->super->s_creator_os != EXT2_OS_LINUX)
> +		return 0;
> +
> +	retval = ext2fs_open_inode_scan(fs, 0, &scan);
> +	if (retval)
> +		return retval;
> +
> +	retval = ext2fs_get_mem(length, &inode);
> +	if (retval)
> +		goto out;
> +
> +	do {
> +		retval = ext2fs_get_next_inode_full(scan, &ino, inode, length);
> +		if (retval)
> +			goto out;
> +		if (!ino)
> +			break;
> +		if (!ext2fs_test_inode_bitmap2(fs->inode_map, ino))
> +			continue;
> +
> +		/*
> +		 * Here's how we deal with high block number fields:
> +		 *
> +		 *  - i_size_high has been been written out with i_size_lo
> +		 *    since the ext2 days, so no conversion is needed.
> +		 *
> +		 *  - i_blocks_hi is guarded by both the huge_file feature and
> +		 *    inode flags and has always been written out with
> +		 *    i_blocks_lo if the feature is set.  The field is only
> +		 *    ever read if both feature and inode flag are set, so
> +		 *    we don't need to zero it now.
> +		 *
> +		 *  - i_file_acl_high can be uninitialized, so zero it if
> +		 *    it isn't already.
> +		 */
> +		if (inode->osd2.linux2.l_i_file_acl_high) {
> +			inode->osd2.linux2.l_i_file_acl_high = 0;
> +			retval = ext2fs_write_inode_full(fs, ino, inode,
> +							 length);
> +			if (retval)
> +				goto out;
> +		}
> +
> +		retval = zero_high_bits_in_extents(fs, ino, inode);
> +		if (retval)
> +			goto out;
> +	} while (ino);
> +
> +out:
> +	if (inode)
> +		ext2fs_free_mem(&inode);
> +	if (scan)
> +		ext2fs_close_inode_scan(scan);
> +	return retval;

I forgot this return retval in the previous patch. :(

--D

> +}
> +
>  /*
>   * Clean up the bitmaps for unitialized bitmaps
>   */
> @@ -424,7 +697,8 @@ retry:
>  	/*
>  	 * Reallocate the group descriptors as necessary.
>  	 */
> -	if (old_fs->desc_blocks != fs->desc_blocks) {
> +	if (EXT2_DESC_SIZE(old_fs->super) == EXT2_DESC_SIZE(fs->super) &&
> +	    old_fs->desc_blocks != fs->desc_blocks) {
>  		retval = ext2fs_resize_mem(old_fs->desc_blocks *
>  					   fs->blocksize,
>  					   fs->desc_blocks * fs->blocksize,
> @@ -949,7 +1223,9 @@ static errcode_t blocks_to_move(ext2_resize_t rfs)
>  		new_blocks = fs->desc_blocks + fs->super->s_reserved_gdt_blocks;
>  	}
>  
> -	if (old_blocks == new_blocks) {
> +	if (EXT2_DESC_SIZE(rfs->old_fs->super) ==
> +	    EXT2_DESC_SIZE(rfs->new_fs->super) &&
> +	    old_blocks == new_blocks) {
>  		retval = 0;
>  		goto errout;
>  	}
> diff --git a/resize/resize2fs.h b/resize/resize2fs.h
> index 52319b5..5a1c5dc 100644
> --- a/resize/resize2fs.h
> +++ b/resize/resize2fs.h
> @@ -82,6 +82,9 @@ typedef struct ext2_sim_progress *ext2_sim_progmeter;
>  #define RESIZE_PERCENT_COMPLETE		0x0100
>  #define RESIZE_VERBOSE			0x0200
>  
> +#define RESIZE_ENABLE_64BIT		0x0400
> +#define RESIZE_DISABLE_64BIT		0x0800
> +
>  /*
>   * This structure is used for keeping track of how much resources have
>   * been used for a particular resize2fs pass.
> 


* Re: [PATCH 23/25] libext2fs: support modifying arbitrary extended attributes
  2013-10-18  4:51 ` [PATCH 23/25] libext2fs: support modifying arbitrary extended attributes Darrick J. Wong
@ 2013-10-18 19:25   ` Darrick J. Wong
  2013-10-22  1:13   ` Darrick J. Wong
  2013-11-26  7:21   ` Zheng Liu
  2 siblings, 0 replies; 73+ messages in thread
From: Darrick J. Wong @ 2013-10-18 19:25 UTC (permalink / raw)
  To: tytso; +Cc: linux-ext4

There were a lot of cosmetic changes in this patch since September.  I
rearranged the header file to put the constants up with the other constants,
and the functions under the correct file name.  I added a function to free an EA
block, and added in other checks to make sure that there actually is extra
space after the inode before trying to write EAs there.
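
As a rough standalone illustration of the extra-space check described above
(not the patch's code: the helper name inode_has_ea_space and the
GOOD_OLD_INODE_SIZE constant are stand-ins, assuming the classic 128-byte
inode and a 32-bit in-inode EA magic number), the test boils down to:

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for EXT2_GOOD_OLD_INODE_SIZE (the original 128-byte inode). */
#define GOOD_OLD_INODE_SIZE 128

/*
 * Hypothetical helper: returns 1 if an on-disk inode of inode_size bytes
 * still has room for in-inode EA data after the fixed fields, the extra
 * fields (extra_isize bytes), and the 32-bit EA magic; returns 0 if not.
 */
static int inode_has_ea_space(size_t inode_size, uint16_t extra_isize)
{
	size_t used = GOOD_OLD_INODE_SIZE + (size_t)extra_isize +
		      sizeof(uint32_t);

	return inode_size > used;
}
```

When this returns 0, the read and write paths skip the in-inode region and go
straight to the separate EA block.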

This functionality is still missing POSIX ACL <-> ext4 ACL translation, but
there are many things missing for EA support, such as printing the names
properly in debugfs, any sort of e2fsck checking, inlinedata integration... all
of which will come next.  For now I prefer to work on reducing the patchbomb
size before I go adding more.

Roughly speaking, this is the diff from the 9/30 patch to yesterday's:

---
 lib/ext2fs/ext2fs.h   |   51 ++++++++++++++------------
 lib/ext2fs/ext_attr.c |   98 +++++++++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 121 insertions(+), 28 deletions(-)

diff --git a/lib/ext2fs/ext2fs.h b/lib/ext2fs/ext2fs.h
index 45555d6..308dc9b 100644
--- a/lib/ext2fs/ext2fs.h
+++ b/lib/ext2fs/ext2fs.h
@@ -638,6 +638,13 @@ typedef struct stat ext2fs_struct_stat;
 #define EXT2_FLAG_FLUSH_NO_SYNC          1
 
 /*
+ * Modify and iterate extended attributes
+ */
+struct ext2_xattr_handle;
+#define XATTR_ABORT	1
+#define XATTR_CHANGED	2
+
+/*
  * function prototypes
  */
 static inline int ext2fs_has_group_desc_csum(ext2_filsys fs)
@@ -1161,6 +1168,27 @@ extern errcode_t ext2fs_ext_attr_find_entry(struct ext2_ext_attr_entry **pentry,
 					    size_t size, int sorted);
 extern errcode_t ext2fs_ext_attr_set_entry(struct ext2_ext_attr_info *i,
 					   struct ext2_ext_attr_search *s);
+errcode_t ext2fs_xattrs_expand(struct ext2_xattr_handle *h,
+			       unsigned int expandby);
+errcode_t ext2fs_xattrs_write(struct ext2_xattr_handle *handle);
+errcode_t ext2fs_xattrs_read(struct ext2_xattr_handle *handle);
+errcode_t ext2fs_xattrs_iterate(struct ext2_xattr_handle *h,
+				int (*func)(char *name, char *value,
+					    void *data),
+				void *data);
+errcode_t ext2fs_xattr_get(struct ext2_xattr_handle *h, const char *key,
+			   void **value, unsigned int *value_len);
+errcode_t ext2fs_xattr_set(struct ext2_xattr_handle *handle,
+			   const char *key,
+			   const void *value,
+			   unsigned int value_len);
+errcode_t ext2fs_xattr_remove(struct ext2_xattr_handle *handle,
+			      const char *key);
+errcode_t ext2fs_xattrs_open(ext2_filsys fs, ext2_ino_t ino,
+			     struct ext2_xattr_handle **handle);
+errcode_t ext2fs_xattrs_close(struct ext2_xattr_handle **handle);
+errcode_t ext2fs_free_ext_attr(ext2_filsys fs, ext2_ino_t ino,
+			       struct ext2_inode_large *inode);
 
 /* extent.c */
 extern errcode_t ext2fs_extent_header_verify(void *ptr, int size);
@@ -1189,29 +1217,6 @@ extern errcode_t ext2fs_extent_goto2(ext2_extent_handle_t handle,
 				     int leaf_level, blk64_t blk);
 extern errcode_t ext2fs_extent_fix_parents(ext2_extent_handle_t handle);
 
-struct ext2_xattr_handle;
-errcode_t ext2fs_xattrs_expand(struct ext2_xattr_handle *h,
-			       unsigned int expandby);
-errcode_t ext2fs_xattrs_write(struct ext2_xattr_handle *handle);
-errcode_t ext2fs_xattrs_read(struct ext2_xattr_handle *handle);
-#define XATTR_ABORT	1
-#define XATTR_CHANGED	2
-errcode_t ext2fs_xattrs_iterate(struct ext2_xattr_handle *h,
-				int (*func)(char *name, char *value,
-					    void *data),
-				void *data);
-errcode_t ext2fs_xattr_get(struct ext2_xattr_handle *h, const char *key,
-			   void **value, unsigned int *value_len);
-errcode_t ext2fs_xattr_set(struct ext2_xattr_handle *handle,
-			   const char *key,
-			   const void *value,
-			   unsigned int value_len);
-errcode_t ext2fs_xattr_remove(struct ext2_xattr_handle *handle,
-			      const char *key);
-errcode_t ext2fs_xattrs_open(ext2_filsys fs, ext2_ino_t ino,
-			     struct ext2_xattr_handle **handle);
-errcode_t ext2fs_xattrs_close(struct ext2_xattr_handle **handle);
-
 /* fileio.c */
 extern errcode_t ext2fs_file_open2(ext2_filsys fs, ext2_ino_t ino,
 				   struct ext2_inode *inode,
diff --git a/lib/ext2fs/ext_attr.c b/lib/ext2fs/ext_attr.c
index 876dac7..c5fd070 100644
--- a/lib/ext2fs/ext_attr.c
+++ b/lib/ext2fs/ext_attr.c
@@ -373,7 +373,6 @@ errcode_t ext2fs_ext_attr_set_entry(struct ext2_ext_attr_info *i,
 	return 0;
 }
 
-
 /* Manipulate the contents of extended attribute regions */
 struct ext2_xattr {
 	char *name;
@@ -447,6 +446,74 @@ static int find_ea_index(const char *fullname, char **name, int *index)
 	return 0;
 }
 
+errcode_t ext2fs_free_ext_attr(ext2_filsys fs, ext2_ino_t ino,
+			       struct ext2_inode_large *inode)
+{
+	struct ext2_ext_attr_header *header;
+	void *block_buf = NULL;
+	dgrp_t grp;
+	blk64_t blk, goal;
+	errcode_t err;
+	struct ext2_inode_large i;
+
+	/* Read inode? */
+	if (inode == NULL) {
+		err = ext2fs_read_inode_full(fs, ino, (struct ext2_inode *)&i,
+					     sizeof(struct ext2_inode_large));
+		if (err)
+			return err;
+		inode = &i;
+	}
+
+	/* Do we already have an EA block? */
+	blk = ext2fs_file_acl_block(fs, (struct ext2_inode *)inode);
+	if (blk == 0)
+		return 0;
+
+	/* Find block, zero it, write back */
+	if ((blk < fs->super->s_first_data_block) ||
+	    (blk >= ext2fs_blocks_count(fs->super))) {
+		err = EXT2_ET_BAD_EA_BLOCK_NUM;
+		goto out;
+	}
+
+	err = ext2fs_get_mem(fs->blocksize, &block_buf);
+	if (err)
+		goto out;
+
+	err = ext2fs_read_ext_attr3(fs, blk, block_buf, ino);
+	if (err)
+		goto out2;
+
+	header = (struct ext2_ext_attr_header *) block_buf;
+	if (header->h_magic != EXT2_EXT_ATTR_MAGIC) {
+		err = EXT2_ET_BAD_EA_HEADER;
+		goto out2;
+	}
+
+	header->h_refcount--;
+	err = ext2fs_write_ext_attr3(fs, blk, block_buf, ino);
+	if (err)
+		goto out2;
+
+	/* Erase link to block */
+	ext2fs_file_acl_block_set(fs, (struct ext2_inode *)inode, 0);
+	if (header->h_refcount == 0)
+		ext2fs_block_alloc_stats2(fs, blk, -1);
+
+	/* Write inode? */
+	if (inode == &i) {
+		err = ext2fs_write_inode_full(fs, ino, (struct ext2_inode *)&i,
+					      sizeof(struct ext2_inode_large));
+		if (err)
+			goto out2;
+	}
+out2:
+	ext2fs_free_mem(&block_buf);
+out:
+	return err;
+}
+
 static errcode_t prep_ea_block_for_write(ext2_filsys fs, ext2_ino_t ino,
 					 struct ext2_inode_large *inode)
 {
@@ -586,7 +653,10 @@ errcode_t ext2fs_xattrs_write(struct ext2_xattr_handle *handle)
 				     EXT2_FEATURE_COMPAT_EXT_ATTR))
 		return 0;
 
-	err = ext2fs_get_memzero(EXT2_INODE_SIZE(handle->fs->super), &inode);
+	i = EXT2_INODE_SIZE(handle->fs->super);
+	if (i < sizeof(*inode))
+		i = sizeof(*inode);
+	err = ext2fs_get_memzero(i, &inode);
 	if (err)
 		return err;
 
@@ -596,6 +666,13 @@ errcode_t ext2fs_xattrs_write(struct ext2_xattr_handle *handle)
 	if (err)
 		goto out;
 
+	x = handle->attrs;
+	/* Does the inode have size for EA? */
+	if (EXT2_INODE_SIZE(handle->fs->super) <= EXT2_GOOD_OLD_INODE_SIZE +
+						  inode->i_extra_isize +
+						  sizeof(__u32))
+		goto write_ea_block;
+
 	/* Write the inode EA */
 	ea_inode_magic = EXT2_EXT_ATTR_MAGIC;
 	memcpy(((char *) inode) + EXT2_GOOD_OLD_INODE_SIZE +
@@ -605,7 +682,6 @@ errcode_t ext2fs_xattrs_write(struct ext2_xattr_handle *handle)
 		sizeof(__u32);
 	start = ((char *) inode) + EXT2_GOOD_OLD_INODE_SIZE +
 		inode->i_extra_isize + sizeof(__u32);
-	x = handle->attrs;
 
 	err = write_xattrs_to_buffer(handle, &x, start, storage_size, 0);
 	if (err)
@@ -615,6 +691,7 @@ errcode_t ext2fs_xattrs_write(struct ext2_xattr_handle *handle)
 	if (x == handle->attrs + handle->length)
 		goto skip_ea_block;
 
+write_ea_block:
 	/* Write the EA block */
 	err = ext2fs_get_mem(handle->fs->blocksize, &block_buf);
 	if (err)
@@ -634,7 +711,6 @@ errcode_t ext2fs_xattrs_write(struct ext2_xattr_handle *handle)
 		goto out2;
 	}
 
-skip_ea_block:
 	if (block_buf) {
 		/* Write a header on the EA block */
 		header = block_buf;
@@ -656,6 +732,7 @@ skip_ea_block:
 			goto out2;
 	}
 
+skip_ea_block:
 	blk = ext2fs_file_acl_block(handle->fs, (struct ext2_inode *)inode);
 	if (!block_buf && blk) {
 		/* xattrs shrunk, free the block */
@@ -775,13 +852,17 @@ errcode_t ext2fs_xattrs_read(struct ext2_xattr_handle *handle)
 	unsigned int storage_size;
 	void *start, *block_buf = NULL;
 	blk64_t blk;
+	int i;
 	errcode_t err;
 
 	if (!EXT2_HAS_COMPAT_FEATURE(handle->fs->super,
 				     EXT2_FEATURE_COMPAT_EXT_ATTR))
 		return 0;
 
-	err = ext2fs_get_memzero(EXT2_INODE_SIZE(handle->fs->super), &inode);
+	i = EXT2_INODE_SIZE(handle->fs->super);
+	if (i < sizeof(*inode))
+		i = sizeof(*inode);
+	err = ext2fs_get_memzero(i, &inode);
 	if (err)
 		return err;
 
@@ -791,6 +872,12 @@ errcode_t ext2fs_xattrs_read(struct ext2_xattr_handle *handle)
 	if (err)
 		goto out;
 
+	/* Does the inode have size for EA? */
+	if (EXT2_INODE_SIZE(handle->fs->super) <= EXT2_GOOD_OLD_INODE_SIZE +
+						  inode->i_extra_isize +
+						  sizeof(__u32))
+		goto read_ea_block;
+
 	/* Look for EA in the inode */
 	memcpy(&ea_inode_magic, ((char *) inode) + EXT2_GOOD_OLD_INODE_SIZE +
 	       inode->i_extra_isize, sizeof(__u32));
@@ -807,6 +894,7 @@ errcode_t ext2fs_xattrs_read(struct ext2_xattr_handle *handle)
 			goto out;
 	}
 
+read_ea_block:
 	/* Look for EA in a separate EA block */
 	blk = ext2fs_file_acl_block(handle->fs, (struct ext2_inode *)inode);
 	if (blk != 0) {


* Re: [PATCH 24/25] misc: add fuse2fs, a FUSE server for e2fsprogs
  2013-10-18  4:51 ` [PATCH 24/25] misc: add fuse2fs, a FUSE server for e2fsprogs Darrick J. Wong
@ 2013-10-18 19:36   ` Darrick J. Wong
  2013-10-22  1:20   ` Darrick J. Wong
  1 sibling, 0 replies; 73+ messages in thread
From: Darrick J. Wong @ 2013-10-18 19:36 UTC (permalink / raw)
  To: tytso; +Cc: linux-ext4

Since last month, I've removed a number of flush calls that didn't seem
necessary, and amended the inode removal function to release any associated EA
blocks.  The diff looks like this:

---
 misc/fuse2fs.c |   49 ++++++++-----------------------------------------
 1 file changed, 8 insertions(+), 41 deletions(-)

diff --git a/misc/fuse2fs.c b/misc/fuse2fs.c
index 09171a9..d1c00df 100644
--- a/misc/fuse2fs.c
+++ b/misc/fuse2fs.c
@@ -382,6 +382,10 @@ static void op_destroy(void *p)
 		err = ext2fs_set_gdt_csum(fs);
 		if (err)
 			translate_error(fs, 0, err);
+
+		err = ext2fs_flush2(fs, 0);
+		if (err)
+			translate_error(fs, 0, err);
 	}
 }
 
@@ -636,12 +640,6 @@ static int op_mknod(const char *path, mode_t mode, dev_t dev)
 
 	ext2fs_inode_alloc_stats2(fs, child, 1, 0);
 
-	err = ext2fs_flush2(fs, EXT2_FLAG_FLUSH_NO_SYNC);
-	if (err) {
-		ret = translate_error(fs, child, err);
-		goto out2;
-	}
-
 out2:
 	pthread_mutex_unlock(&ff->bfl);
 out:
@@ -854,18 +852,14 @@ static int remove_inode(struct fuse2fs *ff, ext2_ino_t ino)
 	if (inode.i_links_count)
 		goto out;
 
+	err = ext2fs_free_ext_attr(fs, ino, &inode);
+	if (err)
+		goto out;
 	if (ext2fs_inode_has_valid_blocks2(fs, (struct ext2_inode *)&inode))
 		ext2fs_block_iterate3(fs, ino, BLOCK_FLAG_READ_ONLY, NULL,
 				      release_blocks_proc, NULL);
 	ext2fs_inode_alloc_stats2(fs, ino, -1,
 				  LINUX_S_ISDIR(inode.i_mode));
-
-	err = ext2fs_flush2(fs, EXT2_FLAG_FLUSH_NO_SYNC);
-	if (err) {
-		ret = translate_error(fs, ino, err);
-		goto out;
-	}
-
 out:
 	return ret;
 }
@@ -1655,26 +1649,6 @@ out:
 	return got ? got : ret;
 }
 
-static int op_flush(const char *path, struct fuse_file_info *fp)
-{
-	struct fuse_context *ctxt = fuse_get_context();
-	struct fuse2fs *ff = (struct fuse2fs *)ctxt->private_data;
-	ext2_filsys fs = ff->fs;
-	struct fuse2fs_file_handle *fh = (struct fuse2fs_file_handle *)fp->fh;
-	errcode_t err;
-	int ret = 0;
-
-	pthread_mutex_lock(&ff->bfl);
-	if (fs_writeable(fs) && fh->open_flags & EXT2_FILE_WRITE) {
-		err = ext2fs_flush2(fs, EXT2_FLAG_FLUSH_NO_SYNC);
-		if (err)
-			ret = translate_error(fs, fh->ino, err);
-	}
-	pthread_mutex_unlock(&ff->bfl);
-
-	return ret;
-}
-
 static int op_release(const char *path, struct fuse_file_info *fp)
 {
 	struct fuse_context *ctxt = fuse_get_context();
@@ -2219,12 +2193,6 @@ static int op_create(const char *path, mode_t mode, struct fuse_file_info *fp)
 
 	ext2fs_inode_alloc_stats2(fs, child, 1, 0);
 
-	err = ext2fs_flush2(fs, EXT2_FLAG_FLUSH_NO_SYNC);
-	if (err) {
-		ret = translate_error(fs, child, err);
-		goto out2;
-	}


* Re: [PATCH v2 00/25] e2fsprogs patchbomb 10/2013
  2013-10-18 18:13   ` Darrick J. Wong
@ 2013-10-18 20:37     ` Darrick J. Wong
  0 siblings, 0 replies; 73+ messages in thread
From: Darrick J. Wong @ 2013-10-18 20:37 UTC (permalink / raw)
  To: Lukáš Czerner; +Cc: tytso, linux-ext4

On Fri, Oct 18, 2013 at 11:13:43AM -0700, Darrick J. Wong wrote:
> On Fri, Oct 18, 2013 at 03:13:57PM +0200, Lukáš Czerner wrote:
> > On Thu, 17 Oct 2013, Darrick J. Wong wrote:
> > 
> > > Date: Thu, 17 Oct 2013 21:48:54 -0700
> > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > To: tytso@mit.edu, darrick.wong@oracle.com
> > > Cc: linux-ext4@vger.kernel.org
> > > Subject: [PATCH v2 00/25] e2fsprogs patchbomb 10/2013
> > 
> > I was going to review this, but could you please include the
> > information what changed since the last version of each patch (into
> > the patch itself) since it'll make the review much easier.
> 
> I'm working on providing diffs against last time. :)

Ok, I've sent out emails about what's changed since last time.

Patches 1, 3-8, 15, 17-22, and 25 are new.

Patches 2, 9, 10-11, 16, and 23-24 have changes, as noted separately.

Patches 12-14 are unchanged.

--D
> 
> --D
> > 
> > Thanks!
> > -Lukas
> > 
> > > 
> > > Well, here we go again.  This is the same patchbomb from a couple of
> > > weeks ago, minus the patches that Ted has already accepted, plus
> > > several fixes to resize2fs that weren't ready back then, and a few
> > > other fixes that migrated into the other patches.  This series is
> > > against -next.
> > > 
> > > Ted, since you've accepted patches into -pu, do you want me to send
> > > patches against -pu as well?  Or put more bluntly, what are your
> > > thoughts about revert-and-replace of patches in -pu?  Patches 2, 6,
> > > 11, 23, and 24 have changed significantly since 9/30.
> > > 
> > > The first eight patches fix miscellaneous errors: #1 stops dirent
> > > iteration after we successfully link an inode into a directory.  #2
> > > fixes a bug that prohibited us from specifying a 64bit superblock
> > > number when opening an FS.  #3 prohibits running mke2fs with -E
> > > resize= and meta_bg.  #4 causes users of the badblocks code to reject
> > > 64bit block numbers.  #5 fixes shift overflow errors when punching
> > > the end of non-extent files.  #6 refactors all the tests for whether
> > > or not we need to set the LARGE_FILE feature (because someone goofed
> > > earlier).  #7 fixes a problem wherein mkfs ignored non-4096 blocksize
> > > directives in the config file on a device larger than 2^32KB.  #8
> > > cleans up some code in debugfs.
> > > 
> > > The next two patches fix some 64bit truncation bugs.
> > > 
> > > Regarding the next five patches, I turned on bigalloc and found a number
> > > of bugs relating to the fact that block_alloc_stats2() takes a block
> > > number but operates on clusters.  I've fixed up all the allocation
> > > errors that I found.  I also decided to make the quota code use
> > > ext2fs_punch rather than try to correct its behavior wrt bigalloc.
> > > There was also a bug wherein the requirement that 64-bit bitmaps be
> > > enabled (via EXT2_FLAG_64BITS) for bigalloc filesystems.  There's also
> > > a patch to reduce the e2fsck output verbosity when there are block
> > > bitmap errors.  Note that #11 has been refactored significantly.
> > > 
> > > The next patch provides the ability to toggle the 64bit feature on any
> > > ext4 filesystem.
> > > 
> > > The four patches after that fix various resize2fs bugs with bigalloc.
> > > 
> > > The next two patches fix bugs with metadata_csum.  There's a patch to
> > > fix up some code to test if checksums are enabled instead of a
> > > GDT_CSUM open-code.  Finally, there's a patch to resize2fs to rewrite
> > > checksums of inodes that were relocated.
> > > 
> > > The next two patches add the ability to edit extended attributes and
> > > add a fuse2fs driver for e2fsprogs.  I admit that the xattr editing
> > > functions clash with the inline_data patches, though sadly, the inline
> > > data patches don't provide an API to access EAs in a separate EA
> > > block.  The fuse driver should work with the latest versions of Linux
> > > fuse (2.9.2) and osxfuse (2.6.1).  I've been using the fuse driver to
> > > test e2fsprogs functionality, which is how I came across most of the
> > > bugs fixed above.  Both of these patches (#23 and #24) have received
> > > fixes since the 9/30 posting.
> > > 
> > > The final patch adds my metadata checksum test program to the tests/
> > > directory, along with a new metadata_check target to run a quick
> > > check.  It includes substitute mount/umount commands for use with
> > > fuse2fs.
> > > 
> > > For fuse2fs, I think it'd be useful to reintroduce journal replay too.
> > > (Or cheat and call e2fsck -E journal_only...)  Also, fuse2fs doesn't
> > > know how to read or write ACLs yet.
> > > 
> > > I've tested these e2fsprogs changes against the -next branch as of a
> > > few days ago.  These days, I use a 2GB ramdisk and a 20T "disk" I
> > > constructed out of dm-snapshot to test in an x64 VM.
> > > 
> > > Comments and questions are, as always, welcome.
> > > 
> > > --D
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> > > the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > 


* Re: [PATCH 23/25] libext2fs: support modifying arbitrary extended attributes
  2013-10-18  4:51 ` [PATCH 23/25] libext2fs: support modifying arbitrary extended attributes Darrick J. Wong
  2013-10-18 19:25   ` Darrick J. Wong
@ 2013-10-22  1:13   ` Darrick J. Wong
  2013-11-26  7:21   ` Zheng Liu
  2 siblings, 0 replies; 73+ messages in thread
From: Darrick J. Wong @ 2013-10-22  1:13 UTC (permalink / raw)
  To: tytso; +Cc: linux-ext4

On Thu, Oct 17, 2013 at 09:51:34PM -0700, Darrick J. Wong wrote:
> Add functions to allow clients to get, set, and remove extended
> attributes from any file.  It also supports modifying EAs living in
> i_file_acl.
> 
> v2: Put the header declarations in the correct part of ext2fs.h,
> provide a function to release an EA block from an inode, and check
> i_extra_isize to make sure we actually have space for in-inode EAs.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  lib/ext2fs/ext2_err.et.in |   18 +
>  lib/ext2fs/ext2fs.h       |   28 ++
>  lib/ext2fs/ext_attr.c     |  761 +++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 807 insertions(+)
> 
> 
> diff --git a/lib/ext2fs/ext2_err.et.in b/lib/ext2fs/ext2_err.et.in
> index 9cc1bd1..b819a90 100644
> --- a/lib/ext2fs/ext2_err.et.in
> +++ b/lib/ext2fs/ext2_err.et.in
> @@ -482,4 +482,22 @@ ec	EXT2_ET_BLOCK_BITMAP_CSUM_INVALID,
>  ec	EXT2_ET_INLINE_DATA_CANT_ITERATE,
>  	"Cannot block iterate on an inode containing inline data"
>  
> +ec	EXT2_ET_EA_BAD_NAME_LEN,
> +	"Extended attribute has an invalid name length"
> +
> +ec	EXT2_ET_EA_BAD_VALUE_SIZE,
> +	"Extended attribute has an invalid value length"
> +
> +ec	EXT2_ET_BAD_EA_HASH,
> +	"Extended attribute has an incorrect hash"
> +
> +ec	EXT2_ET_BAD_EA_HEADER,
> +	"Extended attribute block has a bad header"
> +
> +ec	EXT2_ET_EA_KEY_NOT_FOUND,
> +	"Extended attribute key not found"
> +
> +ec	EXT2_ET_EA_NO_SPACE,
> +	"Insufficient space to store extended attribute data"
> +
>  	end
> diff --git a/lib/ext2fs/ext2fs.h b/lib/ext2fs/ext2fs.h
> index 5247922..93adae8 100644
> --- a/lib/ext2fs/ext2fs.h
> +++ b/lib/ext2fs/ext2fs.h
> @@ -637,6 +637,13 @@ typedef struct stat ext2fs_struct_stat;
>  #define EXT2_FLAG_FLUSH_NO_SYNC          1
>  
>  /*
> + * Modify and iterate extended attributes
> + */
> +struct ext2_xattr_handle;
> +#define XATTR_ABORT	1
> +#define XATTR_CHANGED	2
> +
> +/*
>   * function prototypes
>   */
>  static inline int ext2fs_has_group_desc_csum(ext2_filsys fs)
> @@ -1151,6 +1158,27 @@ extern errcode_t ext2fs_adjust_ea_refcount3(ext2_filsys fs, blk64_t blk,
>  					   char *block_buf,
>  					   int adjust, __u32 *newcount,
>  					   ext2_ino_t inum);
> +errcode_t ext2fs_xattrs_expand(struct ext2_xattr_handle *h,
> +			       unsigned int expandby);
> +errcode_t ext2fs_xattrs_write(struct ext2_xattr_handle *handle);
> +errcode_t ext2fs_xattrs_read(struct ext2_xattr_handle *handle);
> +errcode_t ext2fs_xattrs_iterate(struct ext2_xattr_handle *h,
> +				int (*func)(char *name, char *value,
> +					    void *data),

The value length needs to be passed to the helper function.

> +				void *data);
> +errcode_t ext2fs_xattr_get(struct ext2_xattr_handle *h, const char *key,
> +			   void **value, unsigned int *value_len);
> +errcode_t ext2fs_xattr_set(struct ext2_xattr_handle *handle,
> +			   const char *key,
> +			   const void *value,
> +			   unsigned int value_len);

These lengths ought to be size_t, not unsigned int.

Also, shouldn't there be a way to query the number of extended attributes?
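
To make the three notes above concrete, here is one possible shape for the
interface.  This is only a sketch under my own assumptions: every demo_* name
is hypothetical and none of this is in the patch.  The callback gets the value
length, lengths are size_t, and there is a count query:

```c
#include <stddef.h>

#define DEMO_XATTR_ABORT 1	/* plays the role of XATTR_ABORT */

/* Hypothetical in-memory attribute; lengths are size_t. */
struct demo_xattr {
	const char *name;
	const void *value;
	size_t value_len;
};

/* Hypothetical handle holding an already-loaded attribute array. */
struct demo_xattr_handle {
	const struct demo_xattr *attrs;
	size_t length;
};

/* The callback receives the value length, so binary values are usable. */
typedef int (*demo_xattr_fn)(const char *name, const void *value,
			     size_t value_len, void *data);

/* Report how many attributes the handle holds. */
static int demo_xattrs_count(const struct demo_xattr_handle *h, size_t *count)
{
	*count = h->length;
	return 0;
}

/* Call func on each attribute; stop early if it returns the abort flag. */
static int demo_xattrs_iterate(const struct demo_xattr_handle *h,
			       demo_xattr_fn func, void *data)
{
	size_t i;

	for (i = 0; i < h->length; i++) {
		int ret = func(h->attrs[i].name, h->attrs[i].value,
			       h->attrs[i].value_len, data);

		if (ret & DEMO_XATTR_ABORT)
			break;
	}
	return 0;
}

/* Example callback: add each value length to the size_t behind data. */
static int demo_sum_lengths(const char *name, const void *value,
			    size_t value_len, void *data)
{
	(void)name;
	(void)value;
	*(size_t *)data += value_len;
	return 0;
}
```

Without the length argument a callback has no way to handle values that
contain NUL bytes, which is the main reason to pass it.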

> +errcode_t ext2fs_xattr_remove(struct ext2_xattr_handle *handle,
> +			      const char *key);
> +errcode_t ext2fs_xattrs_open(ext2_filsys fs, ext2_ino_t ino,
> +			     struct ext2_xattr_handle **handle);
> +errcode_t ext2fs_xattrs_close(struct ext2_xattr_handle **handle);
> +errcode_t ext2fs_free_ext_attr(ext2_filsys fs, ext2_ino_t ino,
> +			       struct ext2_inode_large *inode);
>  
>  /* extent.c */
>  extern errcode_t ext2fs_extent_header_verify(void *ptr, int size);
> diff --git a/lib/ext2fs/ext_attr.c b/lib/ext2fs/ext_attr.c
> index 9649a14..2a1e5e7 100644
> --- a/lib/ext2fs/ext_attr.c
> +++ b/lib/ext2fs/ext_attr.c
> @@ -186,3 +186,764 @@ errcode_t ext2fs_adjust_ea_refcount(ext2_filsys fs, blk_t blk,
>  	return ext2fs_adjust_ea_refcount2(fs, blk, block_buf, adjust,
>  					  newcount);
>  }
> +
> +/* Manipulate the contents of extended attribute regions */
> +struct ext2_xattr {
> +	char *name;
> +	void *value;
> +	unsigned int value_len;
> +};
> +
> +struct ext2_xattr_handle {
> +	ext2_filsys fs;
> +	struct ext2_xattr *attrs;
> +	unsigned int length;
> +	ext2_ino_t ino;
> +	int dirty;
> +};
> +
> +errcode_t ext2fs_xattrs_expand(struct ext2_xattr_handle *h,
> +			       unsigned int expandby)

Not used outside this file; could be static.

> +{
> +	struct ext2_xattr *new_attrs;
> +	errcode_t err;
> +
> +	err = ext2fs_get_arrayzero(h->length + expandby,
> +				   sizeof(struct ext2_xattr), &new_attrs);
> +	if (err)
> +		return err;
> +
> +	memcpy(new_attrs, h->attrs, h->length * sizeof(struct ext2_xattr));
> +	ext2fs_free_mem(&h->attrs);
> +	h->length += expandby;
> +	h->attrs = new_attrs;
> +
> +	return 0;
> +}
> +
> +struct ea_name_index {
> +	int index;
> +	const char *name;
> +};
> +
> +static struct ea_name_index ea_names[] = {
> +	{1, "user."},
> +	{2, "system.posix_acl_access"},
> +	{3, "system.posix_acl_default"},
> +	{4, "trusted."},
> +	{6, "security."},
> +	{7, "system."},
> +	{0, NULL},
> +};
> +
> +static const char *find_ea_prefix(int index)
> +{
> +	struct ea_name_index *e;
> +
> +	for (e = ea_names; e->name; e++)
> +		if (e->index == index)
> +			return e->name;
> +
> +	return NULL;
> +}
> +
> +static int find_ea_index(const char *fullname, char **name, int *index)
> +{
> +	struct ea_name_index *e;
> +
> +	for (e = ea_names; e->name; e++)
> +		if (memcmp(fullname, e->name, strlen(e->name)) == 0) {
> +			*name = (char *)fullname + strlen(e->name);
> +			*index = e->index;
> +			return 1;
> +		}
> +	return 0;
> +}
> +
> +errcode_t ext2fs_free_ext_attr(ext2_filsys fs, ext2_ino_t ino,
> +			       struct ext2_inode_large *inode)
> +{
> +	struct ext2_ext_attr_header *header;
> +	void *block_buf = NULL;
> +	dgrp_t grp;
> +	blk64_t blk, goal;
> +	errcode_t err;
> +	struct ext2_inode_large i;
> +
> +	/* Read inode? */
> +	if (inode == NULL) {
> +		err = ext2fs_read_inode_full(fs, ino, (struct ext2_inode *)&i,
> +					     sizeof(struct ext2_inode_large));
> +		if (err)
> +			return err;
> +		inode = &i;
> +	}
> +
> +	/* Do we already have an EA block? */
> +	blk = ext2fs_file_acl_block(fs, (struct ext2_inode *)inode);
> +	if (blk == 0)
> +		return 0;
> +
> +	/* Find block, zero it, write back */
> +	if ((blk < fs->super->s_first_data_block) ||
> +	    (blk >= ext2fs_blocks_count(fs->super))) {
> +		err = EXT2_ET_BAD_EA_BLOCK_NUM;
> +		goto out;
> +	}
> +
> +	err = ext2fs_get_mem(fs->blocksize, &block_buf);
> +	if (err)
> +		goto out;
> +
> +	err = ext2fs_read_ext_attr3(fs, blk, block_buf, ino);
> +	if (err)
> +		goto out2;
> +
> +	header = (struct ext2_ext_attr_header *) block_buf;
> +	if (header->h_magic != EXT2_EXT_ATTR_MAGIC) {
> +		err = EXT2_ET_BAD_EA_HEADER;
> +		goto out2;
> +	}
> +
> +	header->h_refcount--;
> +	err = ext2fs_write_ext_attr3(fs, blk, block_buf, ino);
> +	if (err)
> +		goto out2;
> +
> +	/* Erase link to block */
> +	ext2fs_file_acl_block_set(fs, (struct ext2_inode *)inode, 0);
> +	if (header->h_refcount == 0)
> +		ext2fs_block_alloc_stats2(fs, blk, -1);

Hmm, i_blocks should be decremented here, no?

> +
> +	/* Write inode? */
> +	if (inode == &i) {
> +		err = ext2fs_write_inode_full(fs, ino, (struct ext2_inode *)&i,
> +					      sizeof(struct ext2_inode_large));
> +		if (err)
> +			goto out2;
> +	}
> +
> +out2:
> +	ext2fs_free_mem(&block_buf);
> +out:
> +	return err;
> +}
> +
> +static errcode_t prep_ea_block_for_write(ext2_filsys fs, ext2_ino_t ino,
> +					 struct ext2_inode_large *inode)
> +{
> +	struct ext2_ext_attr_header *header;
> +	void *block_buf = NULL;
> +	dgrp_t grp;
> +	blk64_t blk, goal;
> +	errcode_t err;
> +
> +	/* Do we already have an EA block? */
> +	blk = ext2fs_file_acl_block(fs, (struct ext2_inode *)inode);
> +	if (blk != 0) {
> +		if ((blk < fs->super->s_first_data_block) ||
> +		    (blk >= ext2fs_blocks_count(fs->super))) {
> +			err = EXT2_ET_BAD_EA_BLOCK_NUM;
> +			goto out;
> +		}
> +
> +		err = ext2fs_get_mem(fs->blocksize, &block_buf);
> +		if (err)
> +			goto out;
> +
> +		err = ext2fs_read_ext_attr3(fs, blk, block_buf, ino);
> +		if (err)
> +			goto out2;
> +
> +		header = (struct ext2_ext_attr_header *) block_buf;
> +		if (header->h_magic != EXT2_EXT_ATTR_MAGIC) {
> +			err = EXT2_ET_BAD_EA_HEADER;
> +			goto out2;
> +		}
> +
> +		/* Single-user block.  We're done here. */
> +		if (header->h_refcount == 1)
> +			return 0;

This early return leaks block_buf; it should go through out2 instead.
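
For illustration, a small self-contained sketch of the single-exit
cleanup pattern that avoids this kind of leak.  Everything here is a toy
(toy_alloc, toy_free, toy_prep_block and the live-buffer counter are
made-up names, not libext2fs calls); the counter only exists so the leak
would be observable:

```c
#include <assert.h>
#include <stdlib.h>

static int toy_live_buffers;	/* allocations not yet freed */

static void *toy_alloc(size_t n)
{
	void *p = malloc(n);

	if (p)
		toy_live_buffers++;
	return p;
}

static void toy_free(void *p)
{
	if (p) {
		assert(toy_live_buffers > 0);
		toy_live_buffers--;
		free(p);
	}
}

/* Returns 0 on success, -1 if the buffer could not be allocated. */
static int toy_prep_block(unsigned int refcount)
{
	int err = 0;
	void *buf = toy_alloc(64);

	if (buf == NULL)
		return -1;	/* nothing to clean up yet */

	/* Single-user block: we're done, but still exit via the label. */
	if (refcount == 1)
		goto out;

	/* ... a shared block would be copied-on-write here ... */
out:
	toy_free(buf);
	return err;
}
```

Every path that runs after the allocation leaves through the label, so
the early "we're done" case cannot skip the free.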

> +
> +		/* We need to CoW the block. */
> +		header->h_refcount--;
> +		err = ext2fs_write_ext_attr3(fs, blk, block_buf, ino);
> +		if (err)
> +			goto out2;
> +	} else {
> +		/* No block, we must increment i_blocks */
> +		err = ext2fs_iblk_add_blocks(fs, (struct ext2_inode *)inode,
> +					     1);
> +		if (err)
> +			goto out;
> +	}
> +
> +	/* Allocate a block */
> +	grp = ext2fs_group_of_ino(fs, ino);
> +	goal = ext2fs_inode_table_loc(fs, grp);
> +	err = ext2fs_alloc_block2(fs, goal, NULL, &blk);
> +	if (err)
> +		return err;
> +	ext2fs_file_acl_block_set(fs, (struct ext2_inode *)inode, blk);
> +out2:
> +	ext2fs_free_mem(&block_buf);
> +out:
> +	return err;
> +}
> +
> +
> +static errcode_t write_xattrs_to_buffer(struct ext2_xattr_handle *handle,
> +					struct ext2_xattr **pos,
> +					void *entries_start,
> +					unsigned int storage_size,
> +					unsigned int value_offset_correction)
> +{
> +	struct ext2_xattr *x = *pos;
> +	struct ext2_ext_attr_entry *e = entries_start;
> +	void *end = entries_start + storage_size;
> +	char *shortname;
> +	unsigned int entry_size, value_size;
> +	int idx, ret;
> +
> +	/* For all remaining x...  */
> +	for (; x < handle->attrs + handle->length; x++) {
> +		if (!x->name)
> +			continue;
> +
> +		/* Calculate index and shortname position */
> +		shortname = x->name;
> +		ret = find_ea_index(x->name, &shortname, &idx);
> +
> +		/* Calculate entry and value size */
> +		entry_size = (sizeof(*e) + strlen(shortname) +
> +			      EXT2_EXT_ATTR_PAD - 1) &
> +			     ~(EXT2_EXT_ATTR_PAD - 1);
> +		value_size = ((x->value_len + EXT2_EXT_ATTR_PAD - 1) /
> +			      EXT2_EXT_ATTR_PAD) * EXT2_EXT_ATTR_PAD;
> +
> +		/*
> +		 * Would entry collide with value?
> +		 * Note that we must leave sufficient room for a (u32)0 to
> +		 * mark the end of the entries.
> +		 */
> +		if ((void *)e + entry_size + sizeof(__u32) > end - value_size)
> +			break;
> +
> +		/* Fill out e appropriately */
> +		e->e_name_len = strlen(shortname);
> +		e->e_name_index = (ret ? idx : 0);
> +		e->e_value_offs = end - value_size - (void *)entries_start +
> +				value_offset_correction;
> +		e->e_value_block = 0;
> +		e->e_value_size = x->value_len;
> +
> +		/* Store name and value */
> +		end -= value_size;
> +		memcpy((void *)e + sizeof(*e), shortname, e->e_name_len);
> +		memcpy(end, x->value, e->e_value_size);
> +
> +		e->e_hash = ext2fs_ext_attr_hash_entry(e, end);
> +
> +		e = EXT2_EXT_ATTR_NEXT(e);
> +		*(__u32 *)e = 0;
> +	}
> +	*pos = x;
> +
> +	return 0;
> +}
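
Side note on the two size calculations above: entry_size is rounded up
with a mask and value_size with divide-and-multiply, and the two forms
agree whenever the pad is a power of two.  A quick sketch, assuming a
pad of 4 (TOY_PAD and both helper names are made up for this example):

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAD 4	/* assumed pad; the mask form needs a power of two */

/* Round n up to a multiple of TOY_PAD using a bit mask. */
static size_t toy_round_up_mask(size_t n)
{
	assert((TOY_PAD & (TOY_PAD - 1)) == 0);
	return (n + TOY_PAD - 1) & ~(size_t)(TOY_PAD - 1);
}

/* Round n up to a multiple of TOY_PAD using integer division. */
static size_t toy_round_up_div(size_t n)
{
	return ((n + TOY_PAD - 1) / TOY_PAD) * TOY_PAD;
}
```

For example, a 5-byte value occupies 8 bytes of storage under either
form, and a 4-byte value occupies exactly 4.
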
> +
> +errcode_t ext2fs_xattrs_write(struct ext2_xattr_handle *handle)
> +{
> +	struct ext2_xattr *x;
> +	struct ext2_inode_large *inode;
> +	void *start, *block_buf = NULL;
> +	struct ext2_ext_attr_header *header;
> +	__u32 ea_inode_magic;
> +	blk64_t blk;
> +	unsigned int storage_size;
> +	unsigned int i, written;
> +	errcode_t err;
> +
> +	if (!EXT2_HAS_COMPAT_FEATURE(handle->fs->super,
> +				     EXT2_FEATURE_COMPAT_EXT_ATTR))
> +		return 0;
> +
> +	i = EXT2_INODE_SIZE(handle->fs->super);
> +	if (i < sizeof(*inode))
> +		i = sizeof(*inode);
> +	err = ext2fs_get_memzero(i, &inode);
> +	if (err)
> +		return err;
> +
> +	err = ext2fs_read_inode_full(handle->fs, handle->ino,
> +				     (struct ext2_inode *)inode,
> +				     EXT2_INODE_SIZE(handle->fs->super));
> +	if (err)
> +		goto out;
> +
> +	x = handle->attrs;
> +	/* Does the inode have size for EA? */
> +	if (EXT2_INODE_SIZE(handle->fs->super) <= EXT2_GOOD_OLD_INODE_SIZE +
> +						  inode->i_extra_isize +
> +						  sizeof(__u32))
> +		goto write_ea_block;
> +
> +	/* Write the inode EA */
> +	ea_inode_magic = EXT2_EXT_ATTR_MAGIC;
> +	memcpy(((char *) inode) + EXT2_GOOD_OLD_INODE_SIZE +
> +	       inode->i_extra_isize, &ea_inode_magic, sizeof(__u32));
> +	storage_size = EXT2_INODE_SIZE(handle->fs->super) -
> +		EXT2_GOOD_OLD_INODE_SIZE - inode->i_extra_isize -
> +		sizeof(__u32);
> +	start = ((char *) inode) + EXT2_GOOD_OLD_INODE_SIZE +
> +		inode->i_extra_isize + sizeof(__u32);
> +
> +	err = write_xattrs_to_buffer(handle, &x, start, storage_size, 0);
> +	if (err)
> +		goto out;
> +
> +	/* Are we done? */
> +	if (x == handle->attrs + handle->length)
> +		goto skip_ea_block;
> +
> +write_ea_block:
> +	/* Write the EA block */
> +	err = ext2fs_get_mem(handle->fs->blocksize, &block_buf);
> +	if (err)
> +		goto out;
> +
> +	storage_size = handle->fs->blocksize -
> +		sizeof(struct ext2_ext_attr_header);
> +	start = block_buf + sizeof(struct ext2_ext_attr_header);
> +
> +	err = write_xattrs_to_buffer(handle, &x, start, storage_size,
> +				     (void *)start - block_buf);
> +	if (err)
> +		goto out2;
> +
> +	if (x < handle->attrs + handle->length) {
> +		err = EXT2_ET_EA_NO_SPACE;
> +		goto out2;
> +	}
> +
> +	if (block_buf) {
> +		/* Write a header on the EA block */
> +		header = block_buf;
> +		header->h_magic = EXT2_EXT_ATTR_MAGIC;
> +		header->h_refcount = 1;
> +		header->h_blocks = 1;
> +
> +		/* Get a new block for writing */
> +		err = prep_ea_block_for_write(handle->fs, handle->ino, inode);
> +		if (err)
> +			goto out2;
> +
> +		/* Finally, write the new EA block */
> +		blk = ext2fs_file_acl_block(handle->fs,
> +					    (struct ext2_inode *)inode);
> +		err = ext2fs_write_ext_attr3(handle->fs, blk, block_buf,
> +					     handle->ino);
> +		if (err)
> +			goto out2;
> +	}
> +
> +skip_ea_block:
> +	blk = ext2fs_file_acl_block(handle->fs, (struct ext2_inode *)inode);
> +	if (!block_buf && blk) {
> +		/* xattrs shrunk, free the block */
> +		ext2fs_file_acl_block_set(handle->fs,
> +					  (struct ext2_inode *)inode, 0);
> +		err = ext2fs_iblk_sub_blocks(handle->fs,
> +					     (struct ext2_inode *)inode, 1);
> +		if (err)
> +			goto out;
> +		ext2fs_block_alloc_stats2(handle->fs, blk, -1);

I think this should call ext2fs_free_ext_attr() here, so that a shared
block's refcount is respected?

> +	}
> +
> +	/* Write the inode */
> +	err = ext2fs_write_inode_full(handle->fs, handle->ino,
> +				      (struct ext2_inode *)inode,
> +				      EXT2_INODE_SIZE(handle->fs->super));
> +	if (err)
> +		goto out2;
> +
> +out2:
> +	ext2fs_free_mem(&block_buf);
> +out:
> +	ext2fs_free_mem(&inode);
> +	handle->dirty = 0;
> +	return err;
> +}
> +
> +static errcode_t read_xattrs_from_buffer(struct ext2_xattr_handle *handle,
> +					 struct ext2_ext_attr_entry *entries,
> +					 unsigned int storage_size,
> +					 void *value_start)
> +{
> +	struct ext2_xattr *x;
> +	struct ext2_ext_attr_entry *entry;
> +	const char *prefix;
> +	void *ptr;
> +	unsigned int remain, prefix_len;
> +	errcode_t err;
> +
> +	x = handle->attrs;
> +	while (x->name)
> +		x++;
> +
> +	entry = entries;
> +	while (!EXT2_EXT_IS_LAST_ENTRY(entry)) {
> +		__u32 hash;
> +
> +		/* header eats this space */
> +		remain -= sizeof(struct ext2_ext_attr_entry);
> +
> +		/* is attribute name valid? */
> +		if (EXT2_EXT_ATTR_SIZE(entry->e_name_len) > remain)
> +			return EXT2_ET_EA_BAD_NAME_LEN;
> +
> +		/* attribute len eats this space */
> +		remain -= EXT2_EXT_ATTR_SIZE(entry->e_name_len);
> +
> +		/* check value size */
> +		if (entry->e_value_size > remain)
> +			return EXT2_ET_EA_BAD_VALUE_SIZE;
> +
> +		/* e_value_block must be 0 in inode's ea */
> +		if (entry->e_value_block != 0)
> +			return EXT2_ET_BAD_EA_BLOCK_NUM;
> +
> +		hash = ext2fs_ext_attr_hash_entry(entry, value_start +
> +							 entry->e_value_offs);
> +
> +		/* e_hash may be 0 in older inode's ea */
> +		if (entry->e_hash != 0 && entry->e_hash != hash)
> +			return EXT2_ET_BAD_EA_HASH;
> +
> +		remain -= entry->e_value_size;
> +
> +		/* Allocate space for more attrs? */
> +		if (x == handle->attrs + handle->length) {
> +			err = ext2fs_xattrs_expand(handle, 4);
> +			if (err)
> +				return err;
> +			x = handle->attrs + handle->length - 4;
> +		}
> +
> +		/* Extract name/value */
> +		prefix = find_ea_prefix(entry->e_name_index);
> +		prefix_len = (prefix ? strlen(prefix) : 0);
> +		err = ext2fs_get_memzero(entry->e_name_len + prefix_len + 1,
> +					 &x->name);
> +		if (err)
> +			return err;
> +		if (prefix)
> +			memcpy(x->name, prefix, prefix_len);
> +		if (entry->e_name_len)
> +			memcpy(x->name + prefix_len,
> +			       (void *)entry + sizeof(*entry),
> +			       entry->e_name_len);
> +
> +		err = ext2fs_get_mem(entry->e_value_size, &x->value);
> +		if (err)
> +			return err;
> +		x->value_len = entry->e_value_size;
> +		memcpy(x->value, value_start + entry->e_value_offs,
> +		       entry->e_value_size);
> +		x++;
> +		entry = EXT2_EXT_ATTR_NEXT(entry);
> +	}
> +
> +	return 0;
> +}
> +
> +errcode_t ext2fs_xattrs_read(struct ext2_xattr_handle *handle)
> +{
> +	struct ext2_xattr *attrs = NULL, *x;
> +	unsigned int attrs_len;
> +	struct ext2_inode_large *inode;
> +	struct ext2_ext_attr_header *header;
> +	__u32 ea_inode_magic;
> +	unsigned int storage_size;
> +	void *start, *block_buf = NULL;
> +	blk64_t blk;
> +	int i;
> +	errcode_t err;
> +
> +	if (!EXT2_HAS_COMPAT_FEATURE(handle->fs->super,
> +				     EXT2_FEATURE_COMPAT_EXT_ATTR))
> +		return 0;

Maybe we should free all the keys and values, just in case someone is
re-reading the EAs?

I'm not sure this really would ever happen, but I like the idea of defensively
programming against filling up our data structure with duplicate keys by
accidentally reading multiple times.
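
Roughly what I have in mind, as a toy sketch rather than the real
handle code (toy_xattr, toy_xattrs_clear, toy_xattrs_count and
toy_xattrs_read are hypothetical names, and the "read" just loads one
fixed key): clear the in-memory array at the top of the read, so reading
twice leaves one copy of each key instead of two.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for one in-memory attribute slot. */
struct toy_xattr {
	char *name;
	void *value;
	size_t value_len;
};

/* Free every key and value but keep the slot array itself for reuse. */
static void toy_xattrs_clear(struct toy_xattr *attrs, size_t length)
{
	size_t i;

	for (i = 0; i < length; i++) {
		free(attrs[i].name);	/* free(NULL) is a no-op */
		free(attrs[i].value);
		attrs[i].name = NULL;
		attrs[i].value = NULL;
		attrs[i].value_len = 0;
	}
}

/* Number of occupied slots. */
static size_t toy_xattrs_count(const struct toy_xattr *attrs, size_t length)
{
	size_t i, n = 0;

	for (i = 0; i < length; i++)
		if (attrs[i].name)
			n++;
	return n;
}

/* Pretend "read": start from a clean array, then load one attribute. */
static int toy_xattrs_read(struct toy_xattr *attrs, size_t length)
{
	static const char key[] = "user.demo";

	assert(length > 0);
	toy_xattrs_clear(attrs, length);
	attrs[0].name = malloc(sizeof(key));
	if (attrs[0].name == NULL)
		return -1;
	memcpy(attrs[0].name, key, sizeof(key));
	return 0;
}
```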

All of these are of course now fixed in my tree.

--D
> +
> +	i = EXT2_INODE_SIZE(handle->fs->super);
> +	if (i < sizeof(*inode))
> +		i = sizeof(*inode);
> +	err = ext2fs_get_memzero(i, &inode);
> +	if (err)
> +		return err;
> +
> +	err = ext2fs_read_inode_full(handle->fs, handle->ino,
> +				     (struct ext2_inode *)inode,
> +				     EXT2_INODE_SIZE(handle->fs->super));
> +	if (err)
> +		goto out;
> +
> +	/* Does the inode have size for EA? */
> +	if (EXT2_INODE_SIZE(handle->fs->super) <= EXT2_GOOD_OLD_INODE_SIZE +
> +						  inode->i_extra_isize +
> +						  sizeof(__u32))
> +		goto read_ea_block;
> +
> +	/* Look for EA in the inode */
> +	memcpy(&ea_inode_magic, ((char *) inode) + EXT2_GOOD_OLD_INODE_SIZE +
> +	       inode->i_extra_isize, sizeof(__u32));
> +	if (ea_inode_magic == EXT2_EXT_ATTR_MAGIC) {
> +		storage_size = EXT2_INODE_SIZE(handle->fs->super) -
> +			EXT2_GOOD_OLD_INODE_SIZE - inode->i_extra_isize -
> +			sizeof(__u32);
> +		start = ((char *) inode) + EXT2_GOOD_OLD_INODE_SIZE +
> +			inode->i_extra_isize + sizeof(__u32);
> +
> +		err = read_xattrs_from_buffer(handle, start, storage_size,
> +					      start);
> +		if (err)
> +			goto out;
> +	}
> +
> +read_ea_block:
> +	/* Look for EA in a separate EA block */
> +	blk = ext2fs_file_acl_block(handle->fs, (struct ext2_inode *)inode);
> +	if (blk != 0) {
> +		if ((blk < handle->fs->super->s_first_data_block) ||
> +		    (blk >= ext2fs_blocks_count(handle->fs->super))) {
> +			err = EXT2_ET_BAD_EA_BLOCK_NUM;
> +			goto out;
> +		}
> +
> +		err = ext2fs_get_mem(handle->fs->blocksize, &block_buf);
> +		if (err)
> +			goto out;
> +
> +		err = ext2fs_read_ext_attr3(handle->fs, blk, block_buf,
> +					    handle->ino);
> +		if (err)
> +			goto out3;
> +
> +		header = (struct ext2_ext_attr_header *) block_buf;
> +		if (header->h_magic != EXT2_EXT_ATTR_MAGIC) {
> +			err = EXT2_ET_BAD_EA_HEADER;
> +			goto out3;
> +		}
> +
> +		if (header->h_blocks != 1) {
> +			err = EXT2_ET_BAD_EA_HEADER;
> +			goto out3;
> +		}
> +
> +		/* Read EAs */
> +		storage_size = handle->fs->blocksize -
> +			sizeof(struct ext2_ext_attr_header);
> +		start = block_buf + sizeof(struct ext2_ext_attr_header);
> +		err = read_xattrs_from_buffer(handle, start, storage_size,
> +					      block_buf);
> +		if (err)
> +			goto out3;
> +
> +		ext2fs_free_mem(&block_buf);
> +	}
> +
> +	ext2fs_free_mem(&block_buf);
> +	ext2fs_free_mem(&inode);
> +	return 0;
> +
> +out3:
> +	ext2fs_free_mem(&block_buf);
> +out:
> +	ext2fs_free_mem(&inode);
> +	return err;
> +}
> +
> +#define XATTR_ABORT	1
> +#define XATTR_CHANGED	2
> +errcode_t ext2fs_xattrs_iterate(struct ext2_xattr_handle *h,
> +				int (*func)(char *name, char *value,
> +					    void *data),
> +				void *data)
> +{
> +	struct ext2_xattr *x;
> +	errcode_t err;
> +	int ret;
> +
> +	for (x = h->attrs; x < h->attrs + h->length; x++) {
> +		if (!x->name)
> +			continue;
> +
> +		ret = func(x->name, x->value, data);
> +		if (ret & XATTR_CHANGED)
> +			h->dirty = 1;
> +		if (ret & XATTR_ABORT)
> +			return 0;
> +	}
> +
> +	return 0;
> +}
> +
> +errcode_t ext2fs_xattr_get(struct ext2_xattr_handle *h, const char *key,
> +			   void **value, unsigned int *value_len)
> +{
> +	struct ext2_xattr *x;
> +	void *val;
> +	errcode_t err;
> +
> +	for (x = h->attrs; x < h->attrs + h->length; x++) {
> +		if (!x->name)
> +			continue;
> +
> +		if (strcmp(x->name, key) == 0) {
> +			err = ext2fs_get_mem(x->value_len, &val);
> +			if (err)
> +				return err;
> +			memcpy(val, x->value, x->value_len);
> +			*value = val;
> +			*value_len = x->value_len;
> +			return 0;
> +		}
> +	}
> +
> +	return EXT2_ET_EA_KEY_NOT_FOUND;
> +}
> +
> +errcode_t ext2fs_xattr_set(struct ext2_xattr_handle *handle,
> +			   const char *key,
> +			   const void *value,
> +			   unsigned int value_len)
> +{
> +	struct ext2_xattr *x, *last_empty;
> +	char *new_value;
> +	errcode_t err;
> +
> +	last_empty = NULL;
> +	for (x = handle->attrs; x < handle->attrs + handle->length; x++) {
> +		if (!x->name) {
> +			last_empty = x;
> +			continue;
> +		}
> +
> +		/* Replace xattr */
> +		if (strcmp(x->name, key) == 0) {
> +			err = ext2fs_get_mem(value_len, &new_value);
> +			if (err)
> +				return err;
> +			memcpy(new_value, value, value_len);
> +			ext2fs_free_mem(&x->value);
> +			x->value = new_value;
> +			x->value_len = value_len;
> +			handle->dirty = 1;
> +			return 0;
> +		}
> +	}
> +
> +	/* Add attr to empty slot */
> +	if (last_empty) {
> +		err = ext2fs_get_mem(strlen(key) + 1, &last_empty->name);
> +		if (err)
> +			return err;
> +		strcpy(last_empty->name, key);
> +
> +		err = ext2fs_get_mem(value_len, &last_empty->value);
> +		if (err)
> +			return err;
> +		memcpy(last_empty->value, value, value_len);
> +		last_empty->value_len = value_len;
> +		handle->dirty = 1;
> +		return 0;
> +	}
> +
> +	/* Expand array, append slot */
> +	err = ext2fs_xattrs_expand(handle, 4);
> +	if (err)
> +		return err;
> +
> +	x = handle->attrs + handle->length - 4;
> +	err = ext2fs_get_mem(strlen(key) + 1, &x->name);
> +	if (err)
> +		return err;
> +	strcpy(x->name, key);
> +
> +	err = ext2fs_get_mem(value_len, &x->value);
> +	if (err)
> +		return err;
> +	memcpy(x->value, value, value_len);
> +	x->value_len = value_len;
> +	handle->dirty = 1;
> +	return 0;
> +}
> +
> +errcode_t ext2fs_xattr_remove(struct ext2_xattr_handle *handle,
> +			      const char *key)
> +{
> +	struct ext2_xattr *x;
> +	errcode_t err;
> +
> +	for (x = handle->attrs; x < handle->attrs + handle->length; x++) {
> +		if (!x->name)
> +			continue;
> +
> +		if (strcmp(x->name, key) == 0) {
> +			ext2fs_free_mem(&x->name);
> +			ext2fs_free_mem(&x->value);
> +			x->value_len = 0;
> +			handle->dirty = 1;
> +			return 0;
> +		}
> +	}
> +
> +	return EXT2_ET_EA_KEY_NOT_FOUND;
> +}
> +
> +errcode_t ext2fs_xattrs_open(ext2_filsys fs, ext2_ino_t ino,
> +			     struct ext2_xattr_handle **handle)
> +{
> +	struct ext2_xattr_handle *h;
> +	errcode_t err;
> +
> +	err = ext2fs_get_memzero(sizeof(*h), &h);
> +	if (err)
> +		return err;
> +
> +	h->length = 4;
> +	err = ext2fs_get_arrayzero(h->length, sizeof(struct ext2_xattr),
> +				   &h->attrs);
> +	if (err) {
> +		ext2fs_free_mem(&h);
> +		return err;
> +	}
> +	h->ino = ino;
> +	h->fs = fs;
> +	*handle = h;
> +	return 0;
> +}
> +
> +errcode_t ext2fs_xattrs_close(struct ext2_xattr_handle **handle)
> +{
> +	unsigned int i;
> +	struct ext2_xattr_handle *h = *handle;
> +	struct ext2_xattr *a = h->attrs;
> +	errcode_t err;
> +
> +	if (h->dirty) {
> +		err = ext2fs_xattrs_write(h);
> +		if (err)
> +			return err;
> +	}
> +
> +	for (i = 0; i < h->length; i++) {
> +		if (a[i].name)
> +			ext2fs_free_mem(&a[i].name);
> +		if (a[i].value)
> +			ext2fs_free_mem(&a[i].value);
> +	}
> +
> +	ext2fs_free_mem(&h->attrs);
> +	ext2fs_free_mem(handle);
> +	return 0;
> +}


* Re: [PATCH 24/25] misc: add fuse2fs, a FUSE server for e2fsprogs
  2013-10-18  4:51 ` [PATCH 24/25] misc: add fuse2fs, a FUSE server for e2fsprogs Darrick J. Wong
  2013-10-18 19:36   ` Darrick J. Wong
@ 2013-10-22  1:20   ` Darrick J. Wong
  1 sibling, 0 replies; 73+ messages in thread
From: Darrick J. Wong @ 2013-10-22  1:20 UTC (permalink / raw)
  To: tytso; +Cc: linux-ext4

On Thu, Oct 17, 2013 at 09:51:41PM -0700, Darrick J. Wong wrote:
> This is the initial implementation of a FUSE server based on
> e2fsprogs.  The point of this program is to enable ext4 to run on any
> OS that FUSE supports (and doesn't already have a native driver), such
> as MacOS X, BSDs, and Windows.  The code requires FUSE API v28, which
> is available in Linux fuse and osxfuse releases that are available as
> of August 2013.
> 
> v2: Remove unnecessary calls to ext2fs_flush(), and ensure that xattr
> blocks are freed when removing an inode.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  MCONFIG.in       |    1 
>  configure        |   89 ++
>  configure.in     |    9 
>  misc/Makefile.in |   15 
>  misc/fuse2fs.c   | 2837 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  5 files changed, 2949 insertions(+), 2 deletions(-)
>  create mode 100644 misc/fuse2fs.c
> 
> 
> diff --git a/MCONFIG.in b/MCONFIG.in
> index fa2b03e..9f88b55 100644
> --- a/MCONFIG.in
> +++ b/MCONFIG.in
> @@ -93,6 +93,7 @@ LIBCOM_ERR = $(LIB)/libcom_err@LIB_EXT@ @PRIVATE_LIBS_CMT@ @SEM_INIT_LIB@
>  LIBE2P = $(LIB)/libe2p@LIB_EXT@
>  LIBEXT2FS = $(LIB)/libext2fs@LIB_EXT@
>  LIBUUID = @LIBUUID@ @SOCKET_LIB@
> +LIBFUSE = @FUSE_LIB@
>  LIBQUOTA = @STATIC_LIBQUOTA@
>  LIBBLKID = @LIBBLKID@ @PRIVATE_LIBS_CMT@ $(LIBUUID)
>  LIBINTL = @LIBINTL@
> diff --git a/configure b/configure
> index 2338fbe..c666235 100755
> --- a/configure
> +++ b/configure
> @@ -639,6 +639,8 @@ CYGWIN_CMT
>  LINUX_CMT
>  UNI_DIFF_OPTS
>  SEM_INIT_LIB
> +FUSE_CMT
> +FUSE_LIB
>  SOCKET_LIB
>  SIZEOF_OFF_T
>  SIZEOF_LONG_LONG
> @@ -11172,6 +11174,93 @@ if test "x$ac_cv_lib_socket_socket" = xyes; then :
>  fi
>  
>  
> +FUSE_CMT=''
> +FUSE_LIB=''
> +{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for fuse_main in -losxfuse" >&5
> +$as_echo_n "checking for fuse_main in -losxfuse... " >&6; }
> +if test "${ac_cv_lib_osxfuse_fuse_main+set}" = set; then :
> +  $as_echo_n "(cached) " >&6
> +else
> +  ac_check_lib_save_LIBS=$LIBS
> +LIBS="-losxfuse  $LIBS"
> +cat confdefs.h - <<_ACEOF >conftest.$ac_ext
> +/* end confdefs.h.  */
> +
> +/* Override any GCC internal prototype to avoid an error.
> +   Use char because int might match the return type of a GCC
> +   builtin and then its argument prototype would still apply.  */
> +#ifdef __cplusplus
> +extern "C"
> +#endif
> +char fuse_main ();
> +int
> +main ()
> +{
> +return fuse_main ();
> +  ;
> +  return 0;
> +}
> +_ACEOF
> +if ac_fn_c_try_link "$LINENO"; then :
> +  ac_cv_lib_osxfuse_fuse_main=yes
> +else
> +  ac_cv_lib_osxfuse_fuse_main=no
> +fi
> +rm -f core conftest.err conftest.$ac_objext \
> +    conftest$ac_exeext conftest.$ac_ext
> +LIBS=$ac_check_lib_save_LIBS
> +fi
> +{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_lib_osxfuse_fuse_main" >&5
> +$as_echo "$ac_cv_lib_osxfuse_fuse_main" >&6; }
> +if test "x$ac_cv_lib_osxfuse_fuse_main" = x""yes; then :
> +  FUSE_LIB=-losxfuse
> +else
> +  { $as_echo "$as_me:${as_lineno-$LINENO}: checking for fuse_main in -lfuse" >&5
> +$as_echo_n "checking for fuse_main in -lfuse... " >&6; }
> +if test "${ac_cv_lib_fuse_fuse_main+set}" = set; then :
> +  $as_echo_n "(cached) " >&6
> +else
> +  ac_check_lib_save_LIBS=$LIBS
> +LIBS="-lfuse  $LIBS"
> +cat confdefs.h - <<_ACEOF >conftest.$ac_ext
> +/* end confdefs.h.  */
> +
> +/* Override any GCC internal prototype to avoid an error.
> +   Use char because int might match the return type of a GCC
> +   builtin and then its argument prototype would still apply.  */
> +#ifdef __cplusplus
> +extern "C"
> +#endif
> +char fuse_main ();
> +int
> +main ()
> +{
> +return fuse_main ();
> +  ;
> +  return 0;
> +}
> +_ACEOF
> +if ac_fn_c_try_link "$LINENO"; then :
> +  ac_cv_lib_fuse_fuse_main=yes
> +else
> +  ac_cv_lib_fuse_fuse_main=no
> +fi
> +rm -f core conftest.err conftest.$ac_objext \
> +    conftest$ac_exeext conftest.$ac_ext
> +LIBS=$ac_check_lib_save_LIBS
> +fi
> +{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_lib_fuse_fuse_main" >&5
> +$as_echo "$ac_cv_lib_fuse_fuse_main" >&6; }
> +if test "x$ac_cv_lib_fuse_fuse_main" = x""yes; then :
> +  FUSE_LIB=-lfuse
> +else
> +  FUSE_CMT="#"
> +fi
> +
> +fi
> +
> +
> +
>  { $as_echo "$as_me:${as_lineno-$LINENO}: checking for optreset" >&5
>  $as_echo_n "checking for optreset... " >&6; }
>  if ${ac_cv_have_optreset+:} false; then :
> diff --git a/configure.in b/configure.in
> index 049dc11..623adc8 100644
> --- a/configure.in
> +++ b/configure.in
> @@ -1127,6 +1127,15 @@ SOCKET_LIB=''
>  AC_CHECK_LIB(socket, socket, [SOCKET_LIB=-lsocket])
>  AC_SUBST(SOCKET_LIB)
>  dnl
> +dnl Check to see if the FUSE library is -lfuse or -losxfuse
> +dnl
> +FUSE_CMT=''
> +FUSE_LIB=''
> +dnl osxfuse.dylib supersedes fuselib.dylib
> +AC_CHECK_LIB(osxfuse, fuse_main, [FUSE_LIB=-losxfuse], [AC_CHECK_LIB(fuse, fuse_main, [FUSE_LIB=-lfuse], [FUSE_CMT="#"])])
> +AC_SUBST(FUSE_LIB)
> +AC_SUBST(FUSE_CMT)
> +dnl
>  dnl See if optreset exists
>  dnl
>  AC_MSG_CHECKING(for optreset)
> diff --git a/misc/Makefile.in b/misc/Makefile.in
> index a798f96..1838d03 100644
> --- a/misc/Makefile.in
> +++ b/misc/Makefile.in
> @@ -26,9 +26,12 @@ INSTALL = @INSTALL@
>  @BLKID_CMT@FINDFS_LINK= findfs
>  @BLKID_CMT@FINDFS_MAN= findfs.8
>  
> +@FUSE_CMT@FUSE_PROG= fuse2fs
> +
>  SPROGS=		mke2fs badblocks tune2fs dumpe2fs $(BLKID_PROG) logsave \
>  			$(E2IMAGE_PROG) @FSCK_PROG@ e2undo
> -USPROGS=	mklost+found filefrag e2freefrag $(UUIDD_PROG) $(E4DEFRAG_PROG)
> +USPROGS=	mklost+found filefrag e2freefrag $(UUIDD_PROG) $(E4DEFRAG_PROG) \
> +			$(FUSE_PROG)
>  SMANPAGES=	tune2fs.8 mklost+found.8 mke2fs.8 dumpe2fs.8 badblocks.8 \
>  			e2label.8 $(FINDFS_MAN) $(BLKID_MAN) $(E2IMAGE_MAN) \
>  			logsave.8 filefrag.8 e2freefrag.8 e2undo.8 \
> @@ -56,6 +59,7 @@ FILEFRAG_OBJS=	filefrag.o
>  E2UNDO_OBJS=  e2undo.o
>  E4DEFRAG_OBJS=	e4defrag.o
>  E2FREEFRAG_OBJS= e2freefrag.o
> +FUSE2FS_OBJS=	fuse2fs.o
>  
>  PROFILED_TUNE2FS_OBJS=	profiled/tune2fs.o profiled/util.o
>  PROFILED_MKLPF_OBJS=	profiled/mklost+found.o
> @@ -75,6 +79,7 @@ PROFILED_FILEFRAG_OBJS=	profiled/filefrag.o
>  PROFILED_E2FREEFRAG_OBJS= profiled/e2freefrag.o
>  PROFILED_E2UNDO_OBJS=	profiled/e2undo.o
>  PROFILED_E4DEFRAG_OBJS=	profiled/e4defrag.o
> +PROFILED_FUSE2FS_OJBS=	profiled/fuse2fs.o
>  
>  SRCS=	$(srcdir)/tune2fs.c $(srcdir)/mklost+found.c $(srcdir)/mke2fs.c \
>  		$(srcdir)/chattr.c $(srcdir)/lsattr.c $(srcdir)/dumpe2fs.c \
> @@ -82,7 +87,7 @@ SRCS=	$(srcdir)/tune2fs.c $(srcdir)/mklost+found.c $(srcdir)/mke2fs.c \
>  		$(srcdir)/uuidgen.c $(srcdir)/blkid.c $(srcdir)/logsave.c \
>  		$(srcdir)/filefrag.c $(srcdir)/base_device.c \
>  		$(srcdir)/ismounted.c $(srcdir)/../e2fsck/profile.c \
> -		$(srcdir)/e2undo.c $(srcdir)/e2freefrag.c
> +		$(srcdir)/e2undo.c $(srcdir)/e2freefrag.c $(srcdir)/fuse2fs.c
>  
>  LIBS= $(LIBEXT2FS) $(LIBCOM_ERR) 
>  DEPLIBS= $(LIBEXT2FS) $(DEPLIBCOM_ERR)
> @@ -335,6 +340,12 @@ filefrag.profiled: $(FILEFRAG_OBJS)
>  	$(Q) $(CC) $(ALL_LDFLAGS) -g -pg -o filefrag.profiled \
>  		$(PROFILED_FILEFRAG_OBJS) 
>  
> +fuse2fs: $(FUSE2FS_OBJS) $(DEPLIBS) $(DEPLIBBLKID) $(DEPLIBUUID) \
> +		$(DEPLIBQUOTA) $(LIBEXT2FS)
> +	$(E) "	LD $@"
> +	$(Q) $(CC) $(ALL_LDFLAGS) -o fuse2fs $(FUSE2FS_OBJS) $(LIBS) \
> +		$(LIBFUSE) $(LIBBLKID) $(LIBUUID) $(LIBEXT2FS)
> +
>  tst_ismounted: $(srcdir)/ismounted.c $(STATIC_LIBEXT2FS) $(DEPLIBCOM_ERR)
>  	$(E) "	LD $@"
>  	$(CC) -o tst_ismounted $(srcdir)/ismounted.c -DDEBUG $(ALL_CFLAGS) \
> diff --git a/misc/fuse2fs.c b/misc/fuse2fs.c
> new file mode 100644
> index 0000000..d1c00df
> --- /dev/null
> +++ b/misc/fuse2fs.c
> @@ -0,0 +1,2837 @@
> +/*
> + * fuse2fs.c - FUSE server for e2fsprogs.
> + *
> + * Copyright (C) 2013 Oracle.
> + *
> + * %Begin-Header%
> + * This file may be redistributed under the terms of the GNU Public
> + * License.
> + * %End-Header%
> + */
> +#define _FILE_OFFSET_BITS 64
> +#define FUSE_USE_VERSION 29
> +#define _GNU_SOURCE
> +#include <pthread.h>
> +#ifdef __linux__
> +# include <linux/fs.h>
> +# include <linux/falloc.h>
> +# include <linux/xattr.h>
> +#endif
> +#include <sys/ioctl.h>
> +#include <unistd.h>
> +#include <fuse.h>
> +#include "ext2fs/ext2fs.h"
> +#include "ext2fs/ext2_fs.h"
> +
> +#if FUSE_VERSION >= FUSE_MAKE_VERSION(2, 8)
> +# ifdef _IOR
> +#  ifdef _IOW
> +#   define SUPPORT_I_FLAGS
> +#  endif
> +# endif
> +#endif
> +
> +#ifdef FALLOC_FL_KEEP_SIZE
> +# define FL_KEEP_SIZE_FLAG FALLOC_FL_KEEP_SIZE
> +#else
> +# define FL_KEEP_SIZE_FLAG (0)
> +#endif
> +
> +#ifdef FALLOC_FL_PUNCH_HOLE
> +# define FL_PUNCH_HOLE_FLAG FALLOC_FL_PUNCH_HOLE
> +#else
> +# define FL_PUNCH_HOLE_FLAG (0)
> +#endif
> +
> +/*
> + * ext2_file_t contains a struct inode, so we can't leave files open.
> + * Use this as a proxy instead.
> + */
> +struct fuse2fs_file_handle {
> +	ext2_ino_t ino;
> +	int open_flags;
> +};
> +
> +/* Main program context */
> +struct fuse2fs {
> +	ext2_filsys fs;
> +	pthread_mutex_t bfl;
> +	int panic_on_error;
> +	FILE *err_fp;
> +	unsigned int next_generation;
> +};
> +
> +static int __translate_error(ext2_filsys fs, errcode_t err, ext2_ino_t ino,
> +			     const char *file, int line);
> +#define translate_error(fs, ino, err) __translate_error((fs), (err), (ino), \
> +			__FILE__, __LINE__)
> +
> +/* for macosx */
> +#ifndef W_OK
> +#  define W_OK 2
> +#endif
> +
> +#ifndef R_OK
> +#  define R_OK 4
> +#endif
> +
> +#define EXT4_EPOCH_BITS 2
> +#define EXT4_EPOCH_MASK ((1 << EXT4_EPOCH_BITS) - 1)
> +#define EXT4_NSEC_MASK  (~0UL << EXT4_EPOCH_BITS)
> +
> +/*
> + * Extended fields will fit into an inode if the filesystem was formatted
> + * with large inodes (-I 256 or larger) and there are not currently any EAs
> + * consuming all of the available space. For new inodes we always reserve
> + * enough space for the kernel's known extended fields, but for inodes
> + * created with an old kernel this might not have been the case. None of
> + * the extended inode fields is critical for correct filesystem operation.
> + * This macro checks if a certain field fits in the inode. Note that
> + * inode-size = GOOD_OLD_INODE_SIZE + i_extra_isize
> + */
> +#define EXT4_FITS_IN_INODE(ext4_inode, field)		\
> +	((offsetof(typeof(*ext4_inode), field) +	\
> +	  sizeof((ext4_inode)->field))			\
> +	<= (EXT2_GOOD_OLD_INODE_SIZE +			\
> +	    (ext4_inode)->i_extra_isize))		\
> +
> +static inline __u32 ext4_encode_extra_time(const struct timespec *time)
> +{
> +	return (sizeof(time->tv_sec) > 4 ?
> +		(time->tv_sec >> 32) & EXT4_EPOCH_MASK : 0) |
> +	       ((time->tv_nsec << EXT4_EPOCH_BITS) & EXT4_NSEC_MASK);
> +}
> +
> +static inline void ext4_decode_extra_time(struct timespec *time, __u32 extra)
> +{
> +	if (sizeof(time->tv_sec) > 4)
> +		time->tv_sec |= (__u64)((extra) & EXT4_EPOCH_MASK) << 32;
> +	time->tv_nsec = ((extra) & EXT4_NSEC_MASK) >> EXT4_EPOCH_BITS;
> +}
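
To make the bit layout of the "extra" word concrete, here is a
standalone sketch of the same packing: the low two bits carry bits 32
and 33 of the seconds, the upper thirty bits carry the nanoseconds.  The
toy_* names and fixed-width types are mine, not the patch's, and the
sketch assumes a 64-bit seconds value; it is only exercised with times
whose low 32 bits are non-negative when read as a signed int.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_EPOCH_BITS 2
#define TOY_EPOCH_MASK ((1u << TOY_EPOCH_BITS) - 1)
#define TOY_NSEC_MASK  (~0u << TOY_EPOCH_BITS)

/* Pack seconds bits 32..33 and the nanoseconds into one 32-bit word. */
static uint32_t toy_encode_extra(int64_t sec, uint32_t nsec)
{
	uint32_t epoch = (uint32_t)(((uint64_t)sec >> 32) & TOY_EPOCH_MASK);

	return epoch | ((nsec << TOY_EPOCH_BITS) & TOY_NSEC_MASK);
}

/* Rebuild seconds and nanoseconds from the 32-bit base field + extra. */
static void toy_decode_extra(uint32_t lo_sec, uint32_t extra,
			     int64_t *sec, uint32_t *nsec)
{
	*sec = (int32_t)lo_sec;		/* sign-extend the base field */
	*sec |= (int64_t)(extra & TOY_EPOCH_MASK) << 32;
	*nsec = (extra & TOY_NSEC_MASK) >> TOY_EPOCH_BITS;
}
```

Thirty bits are enough for the nanoseconds because 999999999 is below
2^30.
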
> +
> +#define EXT4_INODE_SET_XTIME(xtime, timespec, raw_inode)		       \
> +do {									       \
> +	(raw_inode)->xtime = (timespec)->tv_sec;			       \
> +	if (EXT4_FITS_IN_INODE(raw_inode, xtime ## _extra))		       \
> +		(raw_inode)->xtime ## _extra =				       \
> +				ext4_encode_extra_time(timespec);	       \
> +} while (0)
> +
> +#define EXT4_EINODE_SET_XTIME(xtime, timespec, raw_inode)		       \
> +do {									       \
> +	if (EXT4_FITS_IN_INODE(raw_inode, xtime))			       \
> +		(raw_inode)->xtime = (timespec)->tv_sec;		       \
> +	if (EXT4_FITS_IN_INODE(raw_inode, xtime ## _extra))		       \
> +		(raw_inode)->xtime ## _extra =				       \
> +				ext4_encode_extra_time(timespec);	       \
> +} while (0)
> +
> +#define EXT4_INODE_GET_XTIME(xtime, timespec, raw_inode)		       \
> +do {									       \
> +	(timespec)->tv_sec = (signed)((raw_inode)->xtime);		       \
> +	if (EXT4_FITS_IN_INODE(raw_inode, xtime ## _extra))		       \
> +		ext4_decode_extra_time((timespec),			       \
> +				       raw_inode->xtime ## _extra);	       \
> +	else								       \
> +		(timespec)->tv_nsec = 0;				       \
> +} while (0)
> +
> +#define EXT4_EINODE_GET_XTIME(xtime, timespec, raw_inode)		       \
> +do {									       \
> +	if (EXT4_FITS_IN_INODE(raw_inode, xtime))			       \
> +		(timespec)->tv_sec =					       \
> +			(signed)((raw_inode)->xtime);			       \
> +	if (EXT4_FITS_IN_INODE(raw_inode, xtime ## _extra))		       \
> +		ext4_decode_extra_time((timespec),			       \
> +				       raw_inode->xtime ## _extra);	       \
> +	else								       \
> +		(timespec)->tv_nsec = 0;				       \
> +} while (0)
> +
> +static void get_now(struct timespec *now)
> +{
> +#ifdef CLOCK_REALTIME
> +	if (!clock_gettime(CLOCK_REALTIME, now))
> +		return;
> +#endif
> +
> +	now->tv_sec = time(NULL);
> +	now->tv_nsec = 0;
> +}
> +
> +static void increment_version(struct ext2_inode_large *inode)
> +{
> +	__u64 ver;
> +
> +	ver = inode->osd1.linux1.l_i_version;
> +	if (EXT4_FITS_IN_INODE(inode, i_version_hi))
> +		ver |= (__u64)inode->i_version_hi << 32;
> +	ver++;
> +	inode->osd1.linux1.l_i_version = ver;
> +	if (EXT4_FITS_IN_INODE(inode, i_version_hi))
> +		inode->i_version_hi = ver >> 32;
> +}
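
For reference, the same split-counter idea in isolation: the version is
one 64-bit number stored as two 32-bit halves, and the high half only
participates when it fits in the inode.  This is a toy sketch
(toy_versioned_inode, toy_increment_version and the has_hi flag are
stand-ins for the real fields and the EXT4_FITS_IN_INODE test):

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in for the two on-disk halves of the version counter. */
struct toy_versioned_inode {
	uint32_t version_lo;
	uint32_t version_hi;
};

/* has_hi says whether the high half is usable for this inode. */
static void toy_increment_version(struct toy_versioned_inode *inode,
				  int has_hi)
{
	uint64_t ver = inode->version_lo;

	assert(inode != 0);
	if (has_hi)
		ver |= (uint64_t)inode->version_hi << 32;
	ver++;
	inode->version_lo = (uint32_t)ver;	/* low half is always stored */
	if (has_hi)
		inode->version_hi = (uint32_t)(ver >> 32);
}
```

With the high half available, a low half of 0xFFFFFFFF carries into the
high half; without it, the counter simply wraps to zero and the high
half is left untouched.
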
> +
> +static void init_times(struct ext2_inode_large *inode)
> +{
> +	struct timespec now;
> +
> +	get_now(&now);
> +	EXT4_INODE_SET_XTIME(i_atime, &now, inode);
> +	EXT4_INODE_SET_XTIME(i_ctime, &now, inode);
> +	EXT4_INODE_SET_XTIME(i_mtime, &now, inode);
> +	EXT4_EINODE_SET_XTIME(i_crtime, &now, inode);
> +	increment_version(inode);
> +}
> +
> +static int update_ctime(ext2_filsys fs, ext2_ino_t ino,
> +			struct ext2_inode_large *pinode)
> +{
> +	errcode_t err;
> +	struct timespec now;
> +	struct ext2_inode_large inode;
> +
> +	get_now(&now);
> +
> +	/* If user already has a inode buffer, just update that */
> +	if (pinode) {
> +		increment_version(pinode);
> +		EXT4_INODE_SET_XTIME(i_ctime, &now, pinode);
> +		return 0;
> +	}
> +
> +	/* Otherwise we have to read-modify-write the inode */
> +	err = ext2fs_read_inode_full(fs, ino, (struct ext2_inode *)&inode,
> +				     sizeof(inode));

Subtle bug here -- if the inode size is 128, the i_extra_isize field of
the ext2_inode_large is never set by the read, so it holds whatever
happened to be on the stack.  Later on, EXT4_INODE_SET_XTIME calls
EXT4_FITS_IN_INODE, which depends on i_extra_isize.  Therefore, the
i_extra_isize field must always be zeroed before reading the inode.
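
A self-contained sketch of the failure mode and the fix, with toy types
in place of the real ones (toy_inode_large, TOY_FITS_IN_INODE and
toy_read_inode are hypothetical; the layout is only meant to put
i_extra_isize just past byte 128).  The reader copies at most the
on-disk size, so anything beyond that keeps its old contents unless the
buffer is zeroed first:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_GOOD_OLD_SIZE 128

/* Toy layout: 128 "classic" bytes followed by the extended fields. */
struct toy_inode_large {
	uint8_t  base[TOY_GOOD_OLD_SIZE];
	uint16_t i_extra_isize;
	uint16_t pad;
	uint32_t i_ctime_extra;
};

/* Same shape as the check in the patch, on the toy struct. */
#define TOY_FITS_IN_INODE(inode, field)					\
	((offsetof(struct toy_inode_large, field) +			\
	  sizeof((inode)->field))					\
	 <= (size_t)(TOY_GOOD_OLD_SIZE + (inode)->i_extra_isize))

/* Copy at most on_disk_size bytes, as a read of a small inode would. */
static void toy_read_inode(struct toy_inode_large *dst, const void *disk,
			   size_t on_disk_size)
{
	size_t n = on_disk_size < sizeof(*dst) ? on_disk_size : sizeof(*dst);

	assert(dst != NULL && disk != NULL);
	memset(dst, 0, sizeof(*dst));	/* the fix: no stale i_extra_isize */
	memcpy(dst, disk, n);
}
```

Without the memset, a 128-byte read into a buffer full of stale data
would leave i_extra_isize nonzero and the fits-in-inode check could
wrongly pass.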

> +	if (err)
> +		return translate_error(fs, ino, err);
> +
> +	increment_version(&inode);
> +	EXT4_INODE_SET_XTIME(i_ctime, &now, &inode);
> +
> +	err = ext2fs_write_inode_full(fs, ino, (struct ext2_inode *)&inode,
> +				      sizeof(inode));
> +	if (err)
> +		return translate_error(fs, ino, err);
> +
> +	return 0;
> +}
> +
> +static int update_atime(ext2_filsys fs, ext2_ino_t ino)
> +{
> +	errcode_t err;
> +	struct ext2_inode_large inode, *pinode;
> +	struct timespec atime, mtime, now;
> +
> +	if (!(fs->flags & EXT2_FLAG_RW))
> +		return 0;
> +	err = ext2fs_read_inode_full(fs, ino, (struct ext2_inode *)&inode,
> +				     sizeof(inode));
> +	if (err)
> +		return translate_error(fs, ino, err);
> +
> +	pinode = &inode;
> +	EXT4_INODE_GET_XTIME(i_atime, &atime, pinode);
> +	EXT4_INODE_GET_XTIME(i_mtime, &mtime, pinode);
> +	get_now(&now);
> +	/*
> +	 * If atime is newer than mtime and atime hasn't been updated in more
> +	 * than a day, skip the atime update.  Same idea as Linux "relatime".
> +	 */
> +	if (atime.tv_sec >= mtime.tv_sec && atime.tv_sec >= now.tv_sec - 86400)
> +		return 0;
> +	EXT4_INODE_SET_XTIME(i_atime, &now, &inode);
> +
> +	err = ext2fs_write_inode_full(fs, ino, (struct ext2_inode *)&inode,
> +				      sizeof(inode));
> +	if (err)
> +		return translate_error(fs, ino, err);
> +
> +	return 0;
> +}
> +
> +static int update_mtime(ext2_filsys fs, ext2_ino_t ino)
> +{
> +	errcode_t err;
> +	struct ext2_inode_large inode;
> +	struct timespec now;
> +
> +	err = ext2fs_read_inode_full(fs, ino, (struct ext2_inode *)&inode,
> +				     sizeof(inode));
> +	if (err)
> +		return translate_error(fs, ino, err);
> +
> +	get_now(&now);
> +	EXT4_INODE_SET_XTIME(i_mtime, &now, &inode);
> +	EXT4_INODE_SET_XTIME(i_ctime, &now, &inode);
> +	increment_version(&inode);
> +
> +	err = ext2fs_write_inode_full(fs, ino, (struct ext2_inode *)&inode,
> +				      sizeof(inode));
> +	if (err)
> +		return translate_error(fs, ino, err);
> +
> +	return 0;
> +}
> +
> +static int ext2_file_type(unsigned int mode)
> +{
> +	if (LINUX_S_ISREG(mode))
> +		return EXT2_FT_REG_FILE;
> +
> +	if (LINUX_S_ISDIR(mode))
> +		return EXT2_FT_DIR;
> +
> +	if (LINUX_S_ISCHR(mode))
> +		return EXT2_FT_CHRDEV;
> +
> +	if (LINUX_S_ISBLK(mode))
> +		return EXT2_FT_BLKDEV;
> +
> +	if (LINUX_S_ISLNK(mode))
> +		return EXT2_FT_SYMLINK;
> +
> +	if (LINUX_S_ISFIFO(mode))
> +		return EXT2_FT_FIFO;
> +
> +	if (LINUX_S_ISSOCK(mode))
> +		return EXT2_FT_SOCK;
> +
> +	return 0;
> +}
> +
> +static int fs_writeable(ext2_filsys fs)
> +{
> +	return (fs->flags & EXT2_FLAG_RW) && (fs->super->s_error_count == 0);
> +}
> +
> +static int __check_access(struct fuse_context *ctxt, ext2_filsys fs,
> +			  ext2_ino_t ino, int mask, int ignore_flags)
> +{
> +	errcode_t err;
> +	struct ext2_inode inode;
> +	mode_t perms;
> +
> +	/* no writing to read-only or broken fs */
> +	if ((mask & W_OK) && !fs_writeable(fs))
> +		return -EROFS;
> +
> +	err = ext2fs_read_inode(fs, ino, &inode);
> +	if (err)
> +		return translate_error(fs, ino, err);
> +
> +	/* existence check */
> +	if (mask == 0)
> +		return 0;
> +
> +	/* is immutable? */
> +	if (!ignore_flags && (mask & W_OK) &&
> +	    (inode.i_flags & EXT2_IMMUTABLE_FL))
> +		return -EPERM;
> +
> +	perms = inode.i_mode & 0777;
> +
> +	/* always allow root */
> +	if (ctxt->uid == 0)
> +		return 0;
> +
> +	/* allow owner, if perms match */
> +	if (inode.i_uid == ctxt->uid) {
> +		if ((mask << 6) & perms)
> +			return 0;
> +		return -EPERM;
> +	}
> +
> +	/* allow group, if perms match */
> +	if (inode.i_gid == ctxt->gid) {
> +		if ((mask << 3) & perms)
> +			return 0;
> +		return -EPERM;
> +	}
> +
> +	/* otherwise check other */
> +	if (mask & perms)
> +		return 0;
> +	return -EPERM;
> +}
> +
> +static int check_inum_access(struct fuse_context *ctxt, ext2_filsys fs,
> +			     ext2_ino_t ino, int mask)
> +{
> +	return __check_access(ctxt, fs, ino, mask, 0);
> +}
> +
> +static int check_flags_access(struct fuse_context *ctxt, ext2_filsys fs,
> +			      ext2_ino_t ino, int mask)
> +{
> +	return __check_access(ctxt, fs, ino, mask, 1);
> +}
> +
> +static void op_destroy(void *p)
> +{
> +	struct fuse_context *ctxt = fuse_get_context();
> +	struct fuse2fs *ff = (struct fuse2fs *)ctxt->private_data;
> +	ext2_filsys fs = ff->fs;
> +	errcode_t err;
> +
> +	if (fs->flags & EXT2_FLAG_RW) {
> +		fs->super->s_state |= EXT2_VALID_FS;
> +		if (fs->super->s_error_count)
> +			fs->super->s_state |= EXT2_ERROR_FS;
> +		ext2fs_mark_super_dirty(fs);
> +		err = ext2fs_set_gdt_csum(fs);
> +		if (err)
> +			translate_error(fs, 0, err);
> +
> +		err = ext2fs_flush2(fs, 0);
> +		if (err)
> +			translate_error(fs, 0, err);
> +	}
> +}
> +
> +static void *op_init(struct fuse_conn_info *conn)
> +{
> +	struct fuse_context *ctxt = fuse_get_context();
> +	struct fuse2fs *ff = (struct fuse2fs *)ctxt->private_data;
> +	ext2_filsys fs = ff->fs;
> +	errcode_t err;
> +
> +	if (fs->flags & EXT2_FLAG_RW) {
> +		fs->super->s_mnt_count++;
> +		fs->super->s_mtime = time(NULL);
> +		fs->super->s_state &= ~EXT2_VALID_FS;
> +		ext2fs_mark_super_dirty(fs);
> +		err = ext2fs_flush2(fs, 0);
> +		if (err)
> +			translate_error(fs, 0, err);
> +	}
> +	return ff;
> +}
> +
> +static int stat_inode(ext2_filsys fs, ext2_ino_t ino, struct stat *statbuf)
> +{
> +	struct ext2_inode_large inode;
> +	dev_t fakedev = 0;
> +	errcode_t err;
> +	int ret = 0;
> +
> +	err = ext2fs_read_inode_full(fs, ino, (struct ext2_inode *)&inode,
> +				     sizeof(inode));
> +	if (err)
> +		return translate_error(fs, ino, err);
> +
> +	memcpy(&fakedev, fs->super->s_uuid, sizeof(fakedev));
> +	statbuf->st_dev = fakedev;
> +	statbuf->st_ino = ino;
> +	statbuf->st_mode = inode.i_mode;
> +	statbuf->st_nlink = inode.i_links_count;
> +	statbuf->st_uid = inode.i_uid;
> +	statbuf->st_gid = inode.i_gid;
> +	statbuf->st_size = inode.i_size;
> +	statbuf->st_blksize = fs->blocksize;
> +	statbuf->st_blocks = inode.i_blocks;
> +	statbuf->st_atime = inode.i_atime;
> +	statbuf->st_mtime = inode.i_mtime;
> +	statbuf->st_ctime = inode.i_ctime;
> +	if (LINUX_S_ISCHR(inode.i_mode) ||
> +	    LINUX_S_ISBLK(inode.i_mode)) {
> +		if (inode.i_block[0])
> +			statbuf->st_rdev = inode.i_block[0];
> +		else
> +			statbuf->st_rdev = inode.i_block[1];
> +	}
> +
> +	return ret;
> +}
> +
> +static int op_getattr(const char *path, struct stat *statbuf)
> +{
> +	struct fuse_context *ctxt = fuse_get_context();
> +	struct fuse2fs *ff = (struct fuse2fs *)ctxt->private_data;
> +	ext2_filsys fs = ff->fs;
> +	ext2_ino_t ino;
> +	errcode_t err;
> +	int ret = 0;
> +
> +	pthread_mutex_lock(&ff->bfl);
> +	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, path, &ino);
> +	if (err) {
> +		ret = translate_error(fs, 0, err);
> +		goto out;
> +	}
> +	ret = stat_inode(fs, ino, statbuf);
> +out:
> +	pthread_mutex_unlock(&ff->bfl);
> +	return ret;
> +}
> +
> +static int op_readlink(const char *path, char *buf, size_t len)
> +{
> +	struct fuse_context *ctxt = fuse_get_context();
> +	struct fuse2fs *ff = (struct fuse2fs *)ctxt->private_data;
> +	ext2_filsys fs = ff->fs;
> +	errcode_t err;
> +	ext2_ino_t ino;
> +	struct ext2_inode inode;
> +	unsigned int got;
> +	ext2_file_t file;
> +	int ret = 0;
> +
> +	pthread_mutex_lock(&ff->bfl);
> +	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, path, &ino);
> +	if (err || ino == 0) {
> +		ret = translate_error(fs, 0, err);
> +		goto out;
> +	}
> +
> +	err = ext2fs_read_inode(fs, ino, &inode);
> +	if (err) {
> +		ret = translate_error(fs, ino, err);
> +		goto out;
> +	}
> +
> +	if (!LINUX_S_ISLNK(inode.i_mode)) {
> +		ret = -EINVAL;
> +		goto out;
> +	}
> +
> +	len--;
> +	if (inode.i_size < len)
> +		len = inode.i_size;
> +	if (ext2fs_inode_data_blocks2(fs, &inode)) {
> +		/* big symlink */
> +
> +		err = ext2fs_file_open(fs, ino, 0, &file);
> +		if (err) {
> +			ret = translate_error(fs, ino, err);
> +			goto out;
> +		}
> +
> +		err = ext2fs_file_read(file, buf, len, &got);
> +		if (err || got != len) {
> +			ext2fs_file_close(file);
> +			ret = translate_error(fs, ino, err);
> +			goto out;
> +		}
> +
> +		err = ext2fs_file_close(file);
> +		if (err) {
> +			ret = translate_error(fs, ino, err);
> +			goto out;
> +		}
> +	} else
> +		/* inline symlink */
> +		memcpy(buf, (char *)inode.i_block, len);
> +	buf[len] = 0;
> +
> +	if (fs_writeable(fs)) {
> +		ret = update_atime(fs, ino);
> +		if (ret)
> +			goto out;
> +	}
> +
> +out:
> +	pthread_mutex_unlock(&ff->bfl);
> +	return ret;
> +}
> +
> +static int op_mknod(const char *path, mode_t mode, dev_t dev)
> +{
> +	struct fuse_context *ctxt = fuse_get_context();
> +	struct fuse2fs *ff = (struct fuse2fs *)ctxt->private_data;
> +	ext2_filsys fs = ff->fs;
> +	ext2_ino_t parent, child;
> +	char *temp_path = strdup(path);
> +	errcode_t err;
> +	char *node_name, a;
> +	int filetype;
> +	struct ext2_inode_large inode;
> +	int ret = 0;
> +
> +	if (!temp_path) {
> +		ret = -ENOMEM;
> +		goto out;
> +	}
> +	node_name = strrchr(temp_path, '/');
> +	if (!node_name) {
> +		ret = -ENOMEM;
> +		goto out;
> +	}
> +	node_name++;
> +	a = *node_name;
> +	*node_name = 0;
> +
> +	pthread_mutex_lock(&ff->bfl);
> +	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, temp_path,
> +			   &parent);
> +	if (err) {
> +		ret = translate_error(fs, 0, err);
> +		goto out2;
> +	}
> +
> +	ret = check_inum_access(ctxt, fs, parent, W_OK);
> +	if (ret)
> +		goto out2;
> +
> +	*node_name = a;
> +
> +	if (LINUX_S_ISCHR(mode))
> +		filetype = EXT2_FT_CHRDEV;
> +	else if (LINUX_S_ISBLK(mode))
> +		filetype = EXT2_FT_BLKDEV;
> +	else if (LINUX_S_ISFIFO(mode))
> +		filetype = EXT2_FT_FIFO;
> +	else {
> +		ret = -EINVAL;
> +		goto out2;
> +	}
> +
> +	err = ext2fs_new_inode(fs, parent, mode, 0, &child);
> +	if (err) {
> +		ret = translate_error(fs, 0, err);
> +		goto out2;
> +	}
> +
> +	err = ext2fs_link(fs, parent, node_name, child, filetype);
> +	if (err == EXT2_ET_DIR_NO_SPACE) {
> +		err = ext2fs_expand_dir(fs, parent);
> +		if (err) {
> +			ret = translate_error(fs, parent, err);
> +			goto out2;
> +		}
> +
> +		err = ext2fs_link(fs, parent, node_name, child,
> +				     filetype);
> +	}
> +	if (err) {
> +		ret = translate_error(fs, parent, err);
> +		goto out2;
> +	}
> +
> +	ret = update_mtime(fs, parent);
> +	if (ret)
> +		goto out2;
> +
> +	memset(&inode, 0, sizeof(inode));
> +	inode.i_mode = mode;
> +
> +	if (dev & ~0xFFFF)
> +		inode.i_block[1] = dev;
> +	else
> +		inode.i_block[0] = dev;
> +	inode.i_links_count = 1;
> +	inode.i_extra_isize = sizeof(struct ext2_inode_large) -
> +		EXT2_GOOD_OLD_INODE_SIZE;
> +
> +	err = ext2fs_write_new_inode(fs, child, (struct ext2_inode *)&inode);
> +	if (err) {
> +		ret = translate_error(fs, child, err);
> +		goto out2;
> +	}
> +
> +	inode.i_generation = ff->next_generation++;
> +	init_times(&inode);
> +	err = ext2fs_write_inode_full(fs, child, (struct ext2_inode *)&inode,
> +				      sizeof(inode));
> +	if (err) {
> +		ret = translate_error(fs, child, err);
> +		goto out2;
> +	}
> +
> +	ext2fs_inode_alloc_stats2(fs, child, 1, 0);
> +
> +out2:
> +	pthread_mutex_unlock(&ff->bfl);
> +out:
> +	free(temp_path);
> +	return ret;
> +}
> +
> +static int op_mkdir(const char *path, mode_t mode)
> +{
> +	struct fuse_context *ctxt = fuse_get_context();
> +	struct fuse2fs *ff = (struct fuse2fs *)ctxt->private_data;
> +	ext2_filsys fs = ff->fs;
> +	ext2_ino_t parent, child;
> +	char *temp_path = strdup(path);
> +	errcode_t err;
> +	char *node_name, a;
> +	struct ext2_inode_large inode;
> +	char *block;
> +	blk64_t blk;
> +	int ret = 0;
> +
> +	if (!temp_path) {
> +		ret = -ENOMEM;
> +		goto out;
> +	}
> +	node_name = strrchr(temp_path, '/');
> +	if (!node_name) {
> +		ret = -ENOMEM;
> +		goto out;
> +	}
> +	node_name++;
> +	a = *node_name;
> +	*node_name = 0;
> +
> +	pthread_mutex_lock(&ff->bfl);
> +	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, temp_path,
> +			   &parent);
> +	if (err) {
> +		ret = translate_error(fs, 0, err);
> +		goto out2;
> +	}
> +
> +	ret = check_inum_access(ctxt, fs, parent, W_OK);
> +	if (ret)
> +		goto out2;
> +
> +	*node_name = a;
> +
> +	err = ext2fs_mkdir(fs, parent, 0, node_name);
> +	if (err == EXT2_ET_DIR_NO_SPACE) {
> +		err = ext2fs_expand_dir(fs, parent);
> +		if (err) {
> +			ret = translate_error(fs, parent, err);
> +			goto out2;
> +		}
> +
> +		err = ext2fs_mkdir(fs, parent, 0, node_name);
> +	}
> +	if (err) {
> +		ret = translate_error(fs, parent, err);
> +		goto out2;
> +	}
> +
> +	ret = update_mtime(fs, parent);
> +	if (ret)
> +		goto out2;
> +
> +	/* Still have to update the uid/gid of the dir */
> +	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, temp_path,
> +			   &child);
> +	if (err) {
> +		ret = translate_error(fs, 0, err);
> +		goto out2;
> +	}
> +
> +	err = ext2fs_read_inode_full(fs, child, (struct ext2_inode *)&inode,
> +				     sizeof(inode));
> +	if (err) {
> +		ret = translate_error(fs, child, err);
> +		goto out2;
> +	}
> +
> +	inode.i_uid = ctxt->uid;
> +	inode.i_gid = ctxt->gid;
> +	inode.i_generation = ff->next_generation++;
> +
> +	err = ext2fs_write_inode_full(fs, child, (struct ext2_inode *)&inode,
> +				      sizeof(inode));
> +	if (err) {
> +		ret = translate_error(fs, child, err);
> +		goto out2;
> +	}
> +
> +	/* Rewrite the directory block checksum, having set i_generation */
> +	if (!EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
> +					EXT4_FEATURE_RO_COMPAT_METADATA_CSUM))
> +		goto out2;
> +	err = ext2fs_new_dir_block(fs, child, parent, &block);
> +	if (err) {
> +		ret = translate_error(fs, child, err);
> +		goto out2;
> +	}
> +	err = ext2fs_bmap2(fs, child, (struct ext2_inode *)&inode, NULL, 0, 0,
> +			   NULL, &blk);
> +	if (err) {
> +		ret = translate_error(fs, child, err);
> +		goto out3;
> +	}
> +	err = ext2fs_write_dir_block4(fs, blk, block, 0, child);
> +	if (err) {
> +		ret = translate_error(fs, child, err);
> +		goto out3;
> +	}
> +
> +out3:
> +	ext2fs_free_mem(&block);
> +out2:
> +	pthread_mutex_unlock(&ff->bfl);
> +out:
> +	free(temp_path);
> +	return ret;
> +}
> +
> +static int unlink_file_by_name(struct fuse_context *ctxt, ext2_filsys fs,
> +			       const char *path)
> +{
> +	errcode_t err;
> +	ext2_ino_t dir;
> +	char *filename = strdup(path);
> +	char *base_name;
> +	int ret;
> +
> +	base_name = strrchr(filename, '/');
> +	if (base_name) {
> +		*base_name++ = '\0';
> +		err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, filename,
> +				   &dir);
> +		if (err) {
> +			free(filename);
> +			return translate_error(fs, 0, err);
> +		}
> +	} else {
> +		dir = EXT2_ROOT_INO;
> +		base_name = filename;
> +	}
> +
> +	ret = check_inum_access(ctxt, fs, dir, W_OK);
> +	if (ret) {
> +		free(filename);
> +		return ret;
> +	}
> +
> +	err = ext2fs_unlink(fs, dir, base_name, 0, 0);
> +	free(filename);
> +	if (err)
> +		return translate_error(fs, dir, err);
> +
> +	return update_mtime(fs, dir);
> +}
> +
> +static int release_blocks_proc(ext2_filsys fs, blk64_t *blocknr,
> +			       e2_blkcnt_t blockcnt EXT2FS_ATTR((unused)),
> +			       blk64_t ref_block EXT2FS_ATTR((unused)),
> +			       int ref_offset EXT2FS_ATTR((unused)),
> +			       void *private EXT2FS_ATTR((unused)))
> +{
> +	blk64_t blk = *blocknr;
> +
> +	if (blk % EXT2FS_CLUSTER_RATIO(fs) == 0)
> +		ext2fs_block_alloc_stats2(fs, *blocknr, -1);
> +	return 0;
> +}
> +
> +static int remove_inode(struct fuse2fs *ff, ext2_ino_t ino)
> +{
> +	ext2_filsys fs = ff->fs;
> +	errcode_t err;
> +	struct ext2_inode_large inode;
> +	int ret = 0;
> +
> +	err = ext2fs_read_inode_full(fs, ino, (struct ext2_inode *)&inode,
> +				     sizeof(inode));
> +	if (err) {
> +		ret = translate_error(fs, ino, err);
> +		goto out;
> +	}
> +
> +	switch (inode.i_links_count) {
> +	case 0:
> +		return 0; /* XXX: already done? */
> +	case 1:
> +		inode.i_links_count--;
> +		inode.i_dtime = fs->now ? fs->now : time(0);
> +		break;
> +	default:
> +		inode.i_links_count--;
> +	}
> +
> +	ret = update_ctime(fs, ino, &inode);
> +	if (ret)
> +		goto out;
> +
> +	err = ext2fs_write_inode_full(fs, ino, (struct ext2_inode *)&inode,
> +				      sizeof(inode));
> +	if (err) {
> +		ret = translate_error(fs, ino, err);
> +		goto out;
> +	}
> +
> +	if (inode.i_links_count)
> +		goto out;
> +
> +	err = ext2fs_free_ext_attr(fs, ino, &inode);
> +	if (err)
> +		goto out;
> +	if (ext2fs_inode_has_valid_blocks2(fs, (struct ext2_inode *)&inode))
> +		ext2fs_block_iterate3(fs, ino, BLOCK_FLAG_READ_ONLY, NULL,
> +				      release_blocks_proc, NULL);
> +	ext2fs_inode_alloc_stats2(fs, ino, -1,
> +				  LINUX_S_ISDIR(inode.i_mode));
> +out:
> +	return ret;
> +}
> +
> +static int __op_unlink(const char *path)
> +{
> +	struct fuse_context *ctxt = fuse_get_context();
> +	struct fuse2fs *ff = (struct fuse2fs *)ctxt->private_data;
> +	ext2_filsys fs = ff->fs;
> +	ext2_ino_t ino;
> +	errcode_t err;
> +	int ret = 0;
> +
> +	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, path, &ino);
> +	if (err) {
> +		ret = translate_error(fs, 0, err);
> +		goto out;
> +	}
> +
> +	ret = check_inum_access(ctxt, fs, ino, W_OK);
> +	if (ret)
> +		goto out;
> +
> +	ret = unlink_file_by_name(ctxt, fs, path);
> +	if (ret)
> +		goto out;
> +
> +	ret = remove_inode(ff, ino);
> +	if (ret)
> +		goto out;
> +out:
> +	return ret;
> +}
> +
> +static int op_unlink(const char *path)
> +{
> +	struct fuse_context *ctxt = fuse_get_context();
> +	struct fuse2fs *ff = (struct fuse2fs *)ctxt->private_data;
> +	int ret;
> +
> +	pthread_mutex_lock(&ff->bfl);
> +	ret = __op_unlink(path);
> +	pthread_mutex_unlock(&ff->bfl);
> +	return ret;
> +}
> +
> +struct rd_struct {
> +	ext2_ino_t	parent;
> +	int		empty;
> +};
> +
> +static int rmdir_proc(ext2_ino_t dir EXT2FS_ATTR((unused)),
> +		      int	entry EXT2FS_ATTR((unused)),
> +		      struct ext2_dir_entry *dirent,
> +		      int	offset EXT2FS_ATTR((unused)),
> +		      int	blocksize EXT2FS_ATTR((unused)),
> +		      char	*buf EXT2FS_ATTR((unused)),
> +		      void	*private)
> +{
> +	struct rd_struct *rds = (struct rd_struct *) private;
> +
> +	if (dirent->inode == 0)
> +		return 0;
> +	if (((dirent->name_len & 0xFF) == 1) && (dirent->name[0] == '.'))
> +		return 0;
> +	if (((dirent->name_len & 0xFF) == 2) && (dirent->name[0] == '.') &&
> +	    (dirent->name[1] == '.')) {
> +		rds->parent = dirent->inode;
> +		return 0;
> +	}
> +	rds->empty = 0;
> +	return 0;
> +}
> +
> +static int op_rmdir(const char *path)
> +{
> +	struct fuse_context *ctxt = fuse_get_context();
> +	struct fuse2fs *ff = (struct fuse2fs *)ctxt->private_data;
> +	ext2_filsys fs = ff->fs;
> +	ext2_ino_t child;
> +	errcode_t err;
> +	struct ext2_inode inode;
> +	struct rd_struct rds;
> +	int ret = 0;
> +
> +	pthread_mutex_lock(&ff->bfl);
> +	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, path, &child);
> +	if (err) {
> +		ret = translate_error(fs, 0, err);
> +		goto out;
> +	}
> +
> +	ret = check_inum_access(ctxt, fs, child, W_OK);
> +	if (ret)
> +		goto out;
> +
> +	rds.parent = 0;
> +	rds.empty = 1;
> +
> +	err = ext2fs_dir_iterate2(fs, child, 0, 0, rmdir_proc, &rds);
> +	if (err) {
> +		ret = translate_error(fs, child, err);
> +		goto out;
> +	}
> +
> +	if (rds.empty == 0) {
> +		ret = -ENOTEMPTY;
> +		goto out;
> +	}
> +
> +	ret = unlink_file_by_name(ctxt, fs, path);
> +	if (ret)
> +		goto out;
> +	/* Directories have to be "removed" twice. */
> +	ret = remove_inode(ff, child);
> +	if (ret)
> +		goto out;
> +	ret = remove_inode(ff, child);
> +	if (ret)
> +		goto out;
> +
> +	if (rds.parent) {
> +		err = ext2fs_read_inode(fs, rds.parent, &inode);
> +		if (err) {
> +			ret = translate_error(fs, rds.parent, err);
> +			goto out;
> +		}
> +		if (inode.i_links_count > 1)
> +			inode.i_links_count--;
> +		ret = update_mtime(fs, rds.parent);
> +		if (ret)
> +			goto out;
> +		err = ext2fs_write_inode(fs, rds.parent, &inode);
> +		if (err) {
> +			ret = translate_error(fs, rds.parent, err);
> +			goto out;
> +		}
> +	}
> +
> +out:
> +	pthread_mutex_unlock(&ff->bfl);
> +	return ret;
> +}
> +
> +static int op_symlink(const char *src, const char *dest)
> +{
> +	struct fuse_context *ctxt = fuse_get_context();
> +	struct fuse2fs *ff = (struct fuse2fs *)ctxt->private_data;
> +	ext2_filsys fs = ff->fs;
> +	ext2_ino_t parent, child;
> +	char *temp_path = strdup(dest);
> +	errcode_t err;
> +	char *node_name, a;
> +	struct ext2_inode_large inode;
> +	int len = strlen(src);
> +	int ret = 0;
> +
> +	if (!temp_path) {
> +		ret = -ENOMEM;
> +		goto out;
> +	}
> +	node_name = strrchr(temp_path, '/');
> +	if (!node_name) {
> +		ret = -ENOMEM;
> +		goto out;
> +	}
> +	node_name++;
> +	a = *node_name;
> +	*node_name = 0;
> +
> +	pthread_mutex_lock(&ff->bfl);
> +	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, temp_path,
> +			   &parent);
> +	*node_name = a;
> +	if (err) {
> +		ret = translate_error(fs, 0, err);
> +		goto out2;
> +	}
> +
> +	ret = check_inum_access(ctxt, fs, parent, W_OK);
> +	if (ret)
> +		goto out2;
> +
> +
> +	/* Create symlink */
> +	err = ext2fs_symlink(fs, parent, 0, node_name, (char *)src);
> +	if (err) {
> +		ret = translate_error(fs, parent, err);
> +		goto out2;
> +	}
> +
> +	/* Update parent dir's mtime */
> +	ret = update_mtime(fs, parent);
> +	if (ret)
> +		goto out2;
> +
> +	/* Still have to update the uid/gid of the symlink */
> +	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, temp_path,
> +			   &child);
> +	if (err) {
> +		ret = translate_error(fs, 0, err);
> +		goto out2;
> +	}
> +
> +	err = ext2fs_read_inode_full(fs, child, (struct ext2_inode *)&inode,
> +				     sizeof(inode));
> +	if (err) {
> +		ret = translate_error(fs, child, err);
> +		goto out2;
> +	}
> +
> +	inode.i_uid = ctxt->uid;
> +	inode.i_gid = ctxt->gid;
> +	inode.i_generation = ff->next_generation++;
> +
> +	err = ext2fs_write_inode_full(fs, child, (struct ext2_inode *)&inode,
> +				      sizeof(inode));
> +	if (err) {
> +		ret = translate_error(fs, child, err);
> +		goto out2;
> +	}
> +out2:
> +	pthread_mutex_unlock(&ff->bfl);
> +out:
> +	free(temp_path);
> +	return ret;
> +}
> +
> +static int op_rename(const char *from, const char *to)
> +{
> +	struct fuse_context *ctxt = fuse_get_context();
> +	struct fuse2fs *ff = (struct fuse2fs *)ctxt->private_data;
> +	ext2_filsys fs = ff->fs;
> +	errcode_t err;
> +	ext2_ino_t from_ino, to_ino, to_dir_ino, from_dir_ino;
> +	char *temp_to = NULL, *temp_from = NULL;
> +	char *cp, a;
> +	struct ext2_inode from_inode;
> +	int ret = 0;
> +
> +	pthread_mutex_lock(&ff->bfl);
> +	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, from, &from_ino);
> +	if (err || from_ino == 0) {
> +		ret = translate_error(fs, 0, err);
> +		goto out;
> +	}
> +
> +	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, to, &to_ino);
> +	if (err && err != EXT2_ET_FILE_NOT_FOUND) {
> +		ret = translate_error(fs, 0, err);
> +		goto out;
> +	}
> +
> +	if (err == EXT2_ET_FILE_NOT_FOUND)
> +		to_ino = 0;
> +
> +	/* Already the same file? */
> +	if (to_ino != 0 && to_ino == from_ino) {
> +		ret = 0;
> +		goto out;
> +	}
> +
> +	temp_to = strdup(to);
> +	if (!temp_to) {
> +		ret = -ENOMEM;
> +		goto out;
> +	}
> +
> +	temp_from = strdup(from);
> +	if (!temp_from) {
> +		ret = -ENOMEM;
> +		goto out2;
> +	}
> +
> +	/* Find parent dir of the source and check write access */
> +	cp = strrchr(temp_from, '/');
> +	if (!cp) {
> +		ret = -EINVAL;
> +		goto out2;
> +	}
> +
> +	a = *(cp + 1);
> +	*(cp + 1) = 0;
> +	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, temp_from,
> +			   &from_dir_ino);
> +	*(cp + 1) = a;
> +	if (err) {
> +		ret = translate_error(fs, 0, err);
> +		goto out2;
> +	}
> +	if (from_dir_ino == 0) {
> +		ret = -ENOENT;
> +		goto out2;
> +	}
> +
> +	ret = check_inum_access(ctxt, fs, from_dir_ino, W_OK);
> +	if (ret)
> +		goto out2;
> +
> +	/* Find parent dir of the destination and check write access */
> +	cp = strrchr(temp_to, '/');
> +	if (!cp) {
> +		ret = -EINVAL;
> +		goto out2;
> +	}
> +
> +	a = *(cp + 1);
> +	*(cp + 1) = 0;
> +	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, temp_to,
> +			   &to_dir_ino);
> +	*(cp + 1) = a;
> +	if (err) {
> +		ret = translate_error(fs, 0, err);
> +		goto out2;
> +	}
> +	if (to_dir_ino == 0) {
> +		ret = -ENOENT;
> +		goto out2;
> +	}
> +
> +	ret = check_inum_access(ctxt, fs, to_dir_ino, W_OK);
> +	if (ret)
> +		goto out2;
> +
> +	/* Get ready to do the move */
> +	err = ext2fs_read_inode(fs, from_ino, &from_inode);
> +	if (err) {
> +		ret = translate_error(fs, from_ino, err);
> +		goto out2;
> +	}
> +
> +	/* If the target exists, unlink it first */
> +	if (to_ino != 0) {
> +		ret = __op_unlink(to);
> +		if (ret)
> +			goto out2;
> +	}
> +
> +	/* Link in the new file */
> +	err = ext2fs_link(fs, to_dir_ino, cp + 1, from_ino,
> +			  ext2_file_type(from_inode.i_mode));
> +	if (err == EXT2_ET_DIR_NO_SPACE) {
> +		err = ext2fs_expand_dir(fs, to_dir_ino);
> +		if (err) {
> +			ret = translate_error(fs, to_dir_ino, err);
> +			goto out2;
> +		}
> +
> +		err = ext2fs_link(fs, to_dir_ino, cp + 1, from_ino,
> +				     ext2_file_type(from_inode.i_mode));
> +	}
> +	if (err) {
> +		ret = translate_error(fs, to_dir_ino, err);
> +		goto out2;
> +	}
> +
> +	ret = update_mtime(fs, to_dir_ino);
> +	if (ret)
> +		goto out2;
> +
> +	/* Remove the old file */
> +	ret = unlink_file_by_name(ctxt, fs, from);
> +	if (ret)
> +		goto out2;
> +
> +	/* Flush the whole mess out */
> +	err = ext2fs_flush2(fs, 0);
> +	if (err)
> +		ret = translate_error(fs, 0, err);
> +
> +out2:
> +	free(temp_from);
> +	free(temp_to);
> +out:
> +	pthread_mutex_unlock(&ff->bfl);
> +	return ret;
> +}
> +
> +static int op_link(const char *src, const char *dest)
> +{
> +	struct fuse_context *ctxt = fuse_get_context();
> +	struct fuse2fs *ff = (struct fuse2fs *)ctxt->private_data;
> +	ext2_filsys fs = ff->fs;
> +	char *temp_path = strdup(dest);
> +	errcode_t err;
> +	char *node_name, a;
> +	ext2_ino_t parent, ino;
> +	struct ext2_inode_large inode;
> +	int ret = 0;
> +
> +	if (!temp_path) {
> +		ret = -ENOMEM;
> +		goto out;
> +	}
> +	node_name = strrchr(temp_path, '/');
> +	if (!node_name) {
> +		ret = -ENOMEM;
> +		goto out;
> +	}
> +	node_name++;
> +	a = *node_name;
> +	*node_name = 0;
> +
> +	pthread_mutex_lock(&ff->bfl);
> +	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, temp_path,
> +			   &parent);
> +	*node_name = a;
> +	if (err) {
> +		err = -ENOENT;
> +		goto out2;
> +	}
> +
> +	ret = check_inum_access(ctxt, fs, parent, W_OK);
> +	if (ret)
> +		goto out2;
> +
> +
> +	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, src, &ino);
> +	if (err || ino == 0) {
> +		ret = translate_error(fs, 0, err);
> +		goto out2;
> +	}
> +
> +	err = ext2fs_read_inode_full(fs, ino, (struct ext2_inode *)&inode,
> +				     sizeof(inode));
> +	if (err) {
> +		ret = translate_error(fs, ino, err);
> +		goto out2;
> +	}
> +
> +	inode.i_links_count++;
> +	ret = update_ctime(fs, ino, &inode);
> +	if (ret)
> +		goto out2;
> +
> +	err = ext2fs_write_inode_full(fs, ino, (struct ext2_inode *)&inode,
> +				      sizeof(inode));
> +	if (err) {
> +		ret = translate_error(fs, ino, err);
> +		goto out2;
> +	}
> +
> +	err = ext2fs_link(fs, parent, node_name, ino,
> +			  ext2_file_type(inode.i_mode));
> +	if (err == EXT2_ET_DIR_NO_SPACE) {
> +		err = ext2fs_expand_dir(fs, parent);
> +		if (err) {
> +			ret = translate_error(fs, parent, err);
> +			goto out2;
> +		}
> +
> +		err = ext2fs_link(fs, parent, node_name, ino,
> +				     ext2_file_type(inode.i_mode));
> +	}
> +	if (err) {
> +		ret = translate_error(fs, parent, err);
> +		goto out2;
> +	}
> +
> +	ret = update_mtime(fs, parent);
> +	if (ret)
> +		goto out;
> +
> +out2:
> +	pthread_mutex_unlock(&ff->bfl);
> +out:
> +	free(temp_path);
> +	return ret;
> +}
> +
> +static int op_chmod(const char *path, mode_t mode)
> +{
> +	struct fuse_context *ctxt = fuse_get_context();
> +	struct fuse2fs *ff = (struct fuse2fs *)ctxt->private_data;
> +	ext2_filsys fs = ff->fs;
> +	errcode_t err;
> +	ext2_ino_t ino;
> +	struct ext2_inode_large inode;
> +	int ret = 0;
> +
> +	pthread_mutex_lock(&ff->bfl);
> +	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, path, &ino);
> +	if (err) {
> +		ret = translate_error(fs, 0, err);
> +		goto out;
> +	}
> +
> +	/* XXX: Fails if uid matches but u-w */
> +	ret = check_inum_access(ctxt, fs, ino, W_OK);
> +	if (ret)
> +		goto out;
> +
> +	err = ext2fs_read_inode_full(fs, ino, (struct ext2_inode *)&inode,
> +				     sizeof(inode));
> +	if (err) {
> +		ret = translate_error(fs, ino, err);
> +		goto out;
> +	}
> +
> +	inode.i_mode &= ~0xFFF;
> +	inode.i_mode |= mode & 0xFFF;
> +	ret = update_ctime(fs, ino, &inode);
> +	if (ret)
> +		goto out;
> +
> +	err = ext2fs_write_inode_full(fs, ino, (struct ext2_inode *)&inode,
> +				      sizeof(inode));
> +	if (err) {
> +		ret = translate_error(fs, ino, err);
> +		goto out;
> +	}
> +
> +out:
> +	pthread_mutex_unlock(&ff->bfl);
> +	return ret;
> +}
> +
> +static int op_chown(const char *path, uid_t owner, gid_t group)
> +{
> +	struct fuse_context *ctxt = fuse_get_context();
> +	struct fuse2fs *ff = (struct fuse2fs *)ctxt->private_data;
> +	ext2_filsys fs = ff->fs;
> +	errcode_t err;
> +	ext2_ino_t ino;
> +	struct ext2_inode_large inode;
> +	int ret = 0;
> +
> +	pthread_mutex_lock(&ff->bfl);
> +	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, path, &ino);
> +	if (err) {
> +		ret = translate_error(fs, 0, err);
> +		goto out;
> +	}
> +
> +	ret = check_inum_access(ctxt, fs, ino, W_OK);
> +	if (ret)
> +		goto out;
> +
> +	err = ext2fs_read_inode_full(fs, ino, (struct ext2_inode *)&inode,
> +				     sizeof(inode));
> +	if (err) {
> +		ret = translate_error(fs, ino, err);
> +		goto out;
> +	}
> +
> +	inode.i_uid = owner;
> +	inode.i_gid = group;
> +	ret = update_ctime(fs, ino, &inode);
> +	if (ret)
> +		goto out;
> +
> +	err = ext2fs_write_inode_full(fs, ino, (struct ext2_inode *)&inode,
> +				      sizeof(inode));
> +	if (err) {
> +		ret = translate_error(fs, ino, err);
> +		goto out;
> +	}
> +
> +out:
> +	pthread_mutex_unlock(&ff->bfl);
> +	return ret;
> +}
> +
> +static int op_truncate(const char *path, off_t len)
> +{
> +	struct fuse_context *ctxt = fuse_get_context();
> +	struct fuse2fs *ff = (struct fuse2fs *)ctxt->private_data;
> +	ext2_filsys fs = ff->fs;
> +	errcode_t err;
> +	ext2_ino_t ino;
> +	ext2_file_t file;
> +	int ret = 0;
> +
> +	pthread_mutex_lock(&ff->bfl);
> +	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, path, &ino);
> +	if (err || ino == 0) {
> +		ret = translate_error(fs, 0, err);
> +		goto out;
> +	}
> +
> +	ret = check_inum_access(ctxt, fs, ino, W_OK);
> +	if (ret)
> +		goto out;
> +
> +	err = ext2fs_file_open(fs, ino, EXT2_FILE_WRITE, &file);
> +	if (err) {
> +		ret = translate_error(fs, ino, err);
> +		goto out;
> +	}
> +
> +	err = ext2fs_file_set_size2(file, len);
> +	if (err) {
> +		ret = translate_error(fs, ino, err);
> +		goto out2;
> +	}
> +
> +out2:
> +	err = ext2fs_file_close(file);
> +	if (err && !ret) {
> +		ret = translate_error(fs, ino, err);
> +		goto out;
> +	}
> +
> +	if (!ret)
> +		ret = update_mtime(fs, ino);
> +
> +out:
> +	pthread_mutex_unlock(&ff->bfl);
> +	return err;
> +}
> +
> +static int __op_open(const char *path, struct fuse_file_info *fp)
> +{
> +	struct fuse_context *ctxt = fuse_get_context();
> +	struct fuse2fs *ff = (struct fuse2fs *)ctxt->private_data;
> +	ext2_filsys fs = ff->fs;
> +	errcode_t err;
> +	ext2_ino_t ino;
> +	struct fuse2fs_file_handle *file;
> +	int check, ret = 0;
> +
> +	file = calloc(1, sizeof(*file));
> +	if (!file)
> +		return -ENOMEM;
> +
> +	file->open_flags = 0;
> +	if (fp->flags & (O_RDWR | O_WRONLY))
> +		file->open_flags |= EXT2_FILE_WRITE;
> +	if (fp->flags & O_CREAT)
> +		file->open_flags |= EXT2_FILE_CREATE;
> +
> +	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, path, &file->ino);
> +	if (err || file->ino == 0) {
> +		ret = translate_error(fs, 0, err);
> +		goto out;
> +	}
> +
> +	check = R_OK;
> +	if (file->open_flags & EXT2_FILE_WRITE)
> +		check |= W_OK;
> +	ret = check_inum_access(ctxt, fs, file->ino, check);
> +	if (ret)
> +		goto out;
> +	fp->fh = (uint64_t)file;
> +
> +out:
> +	if (ret)
> +		free(file);
> +	return ret;
> +}
> +
> +static int op_open(const char *path, struct fuse_file_info *fp)
> +{
> +	struct fuse_context *ctxt = fuse_get_context();
> +	struct fuse2fs *ff = (struct fuse2fs *)ctxt->private_data;
> +	int ret;
> +
> +	pthread_mutex_lock(&ff->bfl);
> +	ret = __op_open(path, fp);
> +	pthread_mutex_unlock(&ff->bfl);
> +	return ret;
> +}
> +
> +static int op_read(const char *path, char *buf, size_t len, off_t offset,
> +		     struct fuse_file_info *fp)
> +{
> +	struct fuse_context *ctxt = fuse_get_context();
> +	struct fuse2fs *ff = (struct fuse2fs *)ctxt->private_data;
> +	ext2_filsys fs = ff->fs;
> +	struct fuse2fs_file_handle *fh = (struct fuse2fs_file_handle *)fp->fh;
> +	ext2_file_t efp;
> +	errcode_t err;
> +	unsigned int got = 0;
> +	int ret = 0;
> +
> +	pthread_mutex_lock(&ff->bfl);
> +	err = ext2fs_file_open(fs, fh->ino, fh->open_flags, &efp);
> +	if (err) {
> +		ret = translate_error(fs, fh->ino, err);
> +		goto out;
> +	}
> +
> +	err = ext2fs_file_llseek(efp, offset, SEEK_SET, NULL);
> +	if (err) {
> +		ret = translate_error(fs, fh->ino, err);
> +		goto out;
> +	}
> +
> +	err = ext2fs_file_read(efp, buf, len, &got);
> +	if (err) {
> +		ret = translate_error(fs, fh->ino, err);
> +		goto out;
> +	}
> +
> +	err = ext2fs_file_close(efp);
> +	if (err) {
> +		ret = translate_error(fs, fh->ino, err);
> +		goto out;
> +	}
> +
> +	if (fs_writeable(fs)) {
> +		ret = update_atime(fs, fh->ino);
> +		if (ret)
> +			goto out;
> +	}
> +out:
> +	pthread_mutex_unlock(&ff->bfl);
> +	return got ? got : ret;
> +}
> +
> +static int op_write(const char *path, const char *buf, size_t len, off_t offset,
> +		      struct fuse_file_info *fp)
> +{
> +	struct fuse_context *ctxt = fuse_get_context();
> +	struct fuse2fs *ff = (struct fuse2fs *)ctxt->private_data;
> +	ext2_filsys fs = ff->fs;
> +	struct fuse2fs_file_handle *fh = (struct fuse2fs_file_handle *)fp->fh;
> +	ext2_file_t efp;
> +	errcode_t err;
> +	unsigned int got = 0;
> +	__u64 fsize;
> +	int ret = 0;
> +
> +	pthread_mutex_lock(&ff->bfl);
> +	if (!fs_writeable(fs)) {
> +		ret = -EROFS;
> +		goto out;
> +	}
> +
> +	err = ext2fs_file_open(fs, fh->ino, fh->open_flags, &efp);
> +	if (err) {
> +		ret = translate_error(fs, fh->ino, err);
> +		goto out;
> +	}
> +
> +	err = ext2fs_file_llseek(efp, offset, SEEK_SET, NULL);
> +	if (err) {
> +		ret = translate_error(fs, fh->ino, err);
> +		goto out;
> +	}
> +
> +	err = ext2fs_file_write(efp, buf, len, &got);
> +	if (err) {
> +		ret = translate_error(fs, fh->ino, err);
> +		goto out;
> +	}
> +
> +	err = ext2fs_file_flush(efp);
> +	if (err) {
> +		got = 0;
> +		ret = translate_error(fs, fh->ino, err);
> +		goto out;
> +	}
> +
> +	/*
> +	 * Apparently ext2fs_file_write will dirty the inode (to allocate
> +	 * blocks) without bothering to write out the inode, so change the
> +	 * file size *after* the write, because changing the size forces
> +	 * the inode out to disk.
> +	 */
> +	err = ext2fs_file_get_lsize(efp, &fsize);
> +	if (err) {
> +		ret = translate_error(fs, fh->ino, err);
> +		goto out;
> +	}
> +	if (offset + len > fsize) {
> +		fsize = offset + len;
> +		err = ext2fs_file_set_size2(efp, fsize);
> +		if (err) {
> +			ret = translate_error(fs, fh->ino, err);
> +			goto out;
> +		}
> +	}
> +
> +	err = ext2fs_file_close(efp);
> +	if (err) {
> +		ret = translate_error(fs, fh->ino, err);
> +		goto out;
> +	}
> +
> +	ret = update_mtime(fs, fh->ino);
> +	if (ret)
> +		goto out;
> +
> +out:
> +	pthread_mutex_unlock(&ff->bfl);
> +	return got ? got : ret;
> +}
> +
> +static int op_release(const char *path, struct fuse_file_info *fp)
> +{
> +	struct fuse_context *ctxt = fuse_get_context();
> +	struct fuse2fs *ff = (struct fuse2fs *)ctxt->private_data;
> +	ext2_filsys fs = ff->fs;
> +	struct fuse2fs_file_handle *fh = (struct fuse2fs_file_handle *)fp->fh;
> +	errcode_t err;
> +	int ret = 0;
> +
> +	pthread_mutex_lock(&ff->bfl);
> +	if (fs_writeable(fs) && fh->open_flags & EXT2_FILE_WRITE) {
> +		err = ext2fs_flush2(fs, EXT2_FLAG_FLUSH_NO_SYNC);
> +		if (err)
> +			ret = translate_error(fs, fh->ino, err);
> +	}
> +	fp->fh = 0;
> +	pthread_mutex_unlock(&ff->bfl);
> +
> +	free(fh);
> +
> +	return ret;
> +}
> +
> +static int op_fsync(const char *path, int datasync, struct fuse_file_info *fp)
> +{
> +	struct fuse_context *ctxt = fuse_get_context();
> +	struct fuse2fs *ff = (struct fuse2fs *)ctxt->private_data;
> +	ext2_filsys fs = ff->fs;
> +	struct fuse2fs_file_handle *fh = (struct fuse2fs_file_handle *)fp->fh;
> +	errcode_t err;
> +	int ret = 0;
> +
> +	/* For now, flush everything, even if it's slow */
> +	pthread_mutex_lock(&ff->bfl);
> +	if (fs_writeable(fs) && fh->open_flags & EXT2_FILE_WRITE) {
> +		err = ext2fs_flush2(fs, 0);
> +		if (err)
> +			ret = translate_error(fs, fh->ino, err);
> +	}
> +	pthread_mutex_unlock(&ff->bfl);
> +
> +	return ret;
> +}
> +
> +static int op_statfs(const char *path, struct statvfs *buf)
> +{
> +	struct fuse_context *ctxt = fuse_get_context();
> +	struct fuse2fs *ff = (struct fuse2fs *)ctxt->private_data;
> +	ext2_filsys fs = ff->fs;
> +	uint64_t fsid, *f;
> +
> +	buf->f_bsize = fs->blocksize;
> +	buf->f_frsize = 0;
> +	buf->f_blocks = fs->super->s_blocks_count;
> +	buf->f_bfree = fs->super->s_free_blocks_count;
> +	if (fs->super->s_free_blocks_count < fs->super->s_r_blocks_count)
> +		buf->f_bavail = 0;
> +	else
> +		buf->f_bavail = fs->super->s_free_blocks_count -
> +				fs->super->s_r_blocks_count;
> +	buf->f_files = fs->super->s_inodes_count;
> +	buf->f_ffree = fs->super->s_free_inodes_count;
> +	buf->f_favail = fs->super->s_free_inodes_count;
> +	f = (uint64_t *)fs->super->s_uuid;
> +	fsid = *f;
> +	f++;
> +	fsid ^= *f;
> +	buf->f_fsid = fsid;
> +	buf->f_flag = 0;
> +	if (!(fs->flags & EXT2_FLAG_RW))
> +		buf->f_flag |= ST_RDONLY;
> +	buf->f_namemax = EXT2_NAME_LEN;
> +
> +	return 0;
> +}
> +
> +static int op_getxattr(const char *path, const char *key, char *value,
> +		       size_t len)
> +{

As a general note, the xattr functions here do not perform the ACL translation
that ext4.ko provides.  As a result, the on-disk ext4 ACL structure is fed
straight back into the kernel via FUSE, even though the kernel itself expects a
different (slightly bulkier) data structure.  A translation layer to and from
the native format is needed, though that might be tricky on non-Linux platforms.
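
For illustration, here is a rough sketch of one direction of that translation
(on-disk ext4 ACL to the POSIX ACL xattr layout).  It is not code from this
patch.  It assumes the Linux layouts as I understand them: a 4-byte version
header (1 on disk, 2 in the xattr), 4-byte "short" on-disk entries for
USER_OBJ/GROUP_OBJ/MASK/OTHER, and 8-byte entries everywhere else, all
little-endian.  The SK_* names and acl_disk_to_xattr() are made up for this
example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names; the values mirror what Linux uses, as far as I know. */
#define SK_EXT4_ACL_VERSION	0x0001
#define SK_XATTR_ACL_VERSION	0x0002
#define SK_ACL_USER_OBJ		0x01
#define SK_ACL_USER		0x02
#define SK_ACL_GROUP_OBJ	0x04
#define SK_ACL_GROUP		0x08
#define SK_ACL_MASK		0x10
#define SK_ACL_OTHER		0x20
#define SK_ACL_UNDEFINED_ID	0xFFFFFFFFu

static uint16_t get_le16(const unsigned char *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t get_le32(const unsigned char *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static void put_le32(unsigned char *p, uint32_t v)
{
	p[0] = v & 0xFF;
	p[1] = (v >> 8) & 0xFF;
	p[2] = (v >> 16) & 0xFF;
	p[3] = (v >> 24) & 0xFF;
}

/*
 * Expand an on-disk ACL into the xattr layout.  Returns the number of
 * bytes written to out, or -1 if the input is malformed or out is too
 * small.
 */
long acl_disk_to_xattr(const unsigned char *in, size_t in_len,
		       unsigned char *out, size_t out_len)
{
	size_t ipos = 4, opos = 4;

	if (in_len < 4 || out_len < 4 ||
	    get_le32(in) != SK_EXT4_ACL_VERSION)
		return -1;
	put_le32(out, SK_XATTR_ACL_VERSION);

	while (ipos < in_len) {
		size_t esz;

		if (in_len - ipos < 4)
			return -1;
		switch (get_le16(in + ipos)) {
		case SK_ACL_USER_OBJ:
		case SK_ACL_GROUP_OBJ:
		case SK_ACL_MASK:
		case SK_ACL_OTHER:
			esz = 4;	/* short entry: tag + perm only */
			break;
		case SK_ACL_USER:
		case SK_ACL_GROUP:
			esz = 8;	/* full entry: tag + perm + id */
			break;
		default:
			return -1;
		}
		if (in_len - ipos < esz || out_len - opos < 8)
			return -1;

		/* tag and perm are little-endian in both formats */
		memcpy(out + opos, in + ipos, 4);
		if (esz == 8)
			memcpy(out + opos + 4, in + ipos + 4, 4);
		else
			put_le32(out + opos + 4, SK_ACL_UNDEFINED_ID);
		ipos += esz;
		opos += 8;
	}
	return (long)opos;
}
```

The reverse direction (setxattr) would shrink the four id-less entry types back
to 4 bytes and rewrite the version.  Working on raw bytes like this avoids
depending on kernel headers, which may help on non-Linux platforms.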

> +	struct fuse_context *ctxt = fuse_get_context();
> +	struct fuse2fs *ff = (struct fuse2fs *)ctxt->private_data;
> +	ext2_filsys fs = ff->fs;
> +	struct ext2_xattr_handle *h;
> +	void *ptr;
> +	unsigned int plen;
> +	ext2_ino_t ino;
> +	errcode_t err;
> +	int ret = 0;
> +
> +	pthread_mutex_lock(&ff->bfl);
> +	if (!EXT2_HAS_COMPAT_FEATURE(fs->super,
> +				     EXT2_FEATURE_COMPAT_EXT_ATTR)) {
> +		ret = -ENOTSUP;
> +		goto out;
> +	}
> +
> +	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, path, &ino);
> +	if (err || ino == 0) {
> +		ret = translate_error(fs, 0, err);
> +		goto out;
> +	}
> +
> +	ret = check_inum_access(ctxt, fs, ino, R_OK);
> +	if (ret)
> +		goto out;
> +
> +	err = ext2fs_xattrs_open(fs, ino, &h);
> +	if (err) {
> +		ret = translate_error(fs, ino, err);
> +		goto out;
> +	}
> +
> +	err = ext2fs_xattrs_read(h);
> +	if (err) {
> +		ret = translate_error(fs, ino, err);
> +		goto out2;
> +	}
> +
> +	err = ext2fs_xattr_get(h, key, &ptr, &plen);
> +	if (err) {
> +		ret = translate_error(fs, ino, err);
> +		goto out2;
> +	}
> +
> +	if (!len) {
> +		ret = plen;
> +	} else if (len < plen) {
> +		ret = -ERANGE;
> +	} else {
> +		memcpy(value, ptr, plen);
> +		ret = plen;
> +	}
> +
> +	ext2fs_free_mem(&ptr);
> +out2:
> +	err = ext2fs_xattrs_close(&h);
> +	if (err)
> +		ret = translate_error(fs, ino, err);
> +out:
> +	pthread_mutex_unlock(&ff->bfl);
> +
> +	return ret;
> +}
> +
> +static int count_buffer_space(char *name, char *value, void *data)
> +{
> +	unsigned int *x = data;
> +
> +	*x = *x + strlen(name) + 1;
> +	return 0;
> +}
> +
> +static int copy_names(char *name, char *value, void *data)
> +{
> +	char **b = data;
> +
> +	strncpy(*b, name, strlen(name));
> +	*b = *b + strlen(name) + 1;
> +
> +	return 0;
> +}
> +
> +static int op_listxattr(const char *path, char *names, size_t len)
> +{
> +	struct fuse_context *ctxt = fuse_get_context();
> +	struct fuse2fs *ff = (struct fuse2fs *)ctxt->private_data;
> +	ext2_filsys fs = ff->fs;
> +	struct ext2_xattr_handle *h;
> +	unsigned int bufsz;
> +	ext2_ino_t ino;
> +	errcode_t err;
> +	int ret = 0;
> +
> +	pthread_mutex_lock(&ff->bfl);
> +	if (!EXT2_HAS_COMPAT_FEATURE(fs->super,
> +				     EXT2_FEATURE_COMPAT_EXT_ATTR)) {
> +		ret = -ENOTSUP;
> +		goto out;
> +	}
> +
> +	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, path, &ino);
> +	if (err || ino == 0) {
> +		ret = translate_error(fs, ino, err);
> +		goto out;
> +	}
> +
> +	ret = check_inum_access(ctxt, fs, ino, R_OK);
> +	if (ret)
> +		goto out;
> +
> +	err = ext2fs_xattrs_open(fs, ino, &h);
> +	if (err) {
> +		ret = translate_error(fs, ino, err);
> +		goto out;
> +	}
> +
> +	err = ext2fs_xattrs_read(h);
> +	if (err) {
> +		ret = translate_error(fs, ino, err);
> +		goto out2;
> +	}
> +
> +	/* Count buffer space needed for names */
> +	bufsz = 0;
> +	err = ext2fs_xattrs_iterate(h, count_buffer_space, &bufsz);
> +	if (err) {
> +		ret = translate_error(fs, ino, err);
> +		goto out2;
> +	}
> +
> +	if (len == 0) {
> +		ret = bufsz;
> +		goto out2;
> +	} else if (len < bufsz) {
> +		ret = -ERANGE;
> +		goto out2;
> +	}
> +
> +	/* Copy names out */
> +	memset(names, 0, len);
> +	err = ext2fs_xattrs_iterate(h, copy_names, &names);
> +	if (err) {
> +		ret = translate_error(fs, ino, err);
> +		goto out2;
> +	}
> +	ret = bufsz;
> +out2:
> +	err = ext2fs_xattrs_close(&h);
> +	if (err)
> +		ret = translate_error(fs, ino, err);
> +out:
> +	pthread_mutex_unlock(&ff->bfl);
> +
> +	return ret;
> +}
> +
> +static int op_setxattr(const char *path, const char *key, const char *value,
> +		       size_t len, int flags)
> +{
> +	struct fuse_context *ctxt = fuse_get_context();
> +	struct fuse2fs *ff = (struct fuse2fs *)ctxt->private_data;
> +	ext2_filsys fs = ff->fs;
> +	struct ext2_xattr_handle *h;
> +	ext2_ino_t ino;
> +	errcode_t err;
> +	int ret = 0;
> +
> +	pthread_mutex_lock(&ff->bfl);
> +	if (!EXT2_HAS_COMPAT_FEATURE(fs->super,
> +				     EXT2_FEATURE_COMPAT_EXT_ATTR)) {
> +		ret = -ENOTSUP;
> +		goto out;
> +	}
> +
> +	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, path, &ino);
> +	if (err || ino == 0) {
> +		ret = translate_error(fs, 0, err);
> +		goto out;
> +	}
> +
> +	ret = check_inum_access(ctxt, fs, ino, W_OK);
> +	if (ret)
> +		goto out;
> +
> +	err = ext2fs_xattrs_open(fs, ino, &h);
> +	if (err) {
> +		ret = translate_error(fs, ino, err);
> +		goto out;
> +	}
> +
> +	err = ext2fs_xattrs_read(h);
> +	if (err) {
> +		ret = translate_error(fs, ino, err);
> +		goto out2;
> +	}
> +
> +	err = ext2fs_xattr_set(h, key, value, len);
> +	if (err) {
> +		ret = translate_error(fs, ino, err);
> +		goto out2;
> +	}
> +
> +	err = ext2fs_xattrs_write(h);
> +	if (err) {
> +		ret = translate_error(fs, ino, err);
> +		goto out2;
> +	}
> +
> +out2:
> +	err = ext2fs_xattrs_close(&h);
> +	if (err)
> +		ret = translate_error(fs, ino, err);
> +out:
> +	pthread_mutex_unlock(&ff->bfl);
> +
> +	return ret;
> +}
> +
> +static int op_removexattr(const char *path, const char *key)
> +{
> +	struct fuse_context *ctxt = fuse_get_context();
> +	struct fuse2fs *ff = (struct fuse2fs *)ctxt->private_data;
> +	ext2_filsys fs = ff->fs;
> +	struct ext2_xattr_handle *h;
> +	ext2_ino_t ino;
> +	errcode_t err;
> +	int ret = 0;
> +
> +	pthread_mutex_lock(&ff->bfl);
> +	if (!EXT2_HAS_COMPAT_FEATURE(fs->super,
> +				     EXT2_FEATURE_COMPAT_EXT_ATTR)) {
> +		ret = -ENOTSUP;
> +		goto out;
> +	}
> +
> +	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, path, &ino);
> +	if (err || ino == 0) {
> +		ret = translate_error(fs, 0, err);
> +		goto out;
> +	}
> +
> +	ret = check_inum_access(ctxt, fs, ino, W_OK);
> +	if (ret)
> +		goto out;
> +
> +	err = ext2fs_xattrs_open(fs, ino, &h);
> +	if (err) {
> +		ret = translate_error(fs, ino, err);
> +		goto out;
> +	}
> +
> +	err = ext2fs_xattrs_read(h);
> +	if (err) {
> +		ret = translate_error(fs, ino, err);
> +		goto out2;
> +	}
> +
> +	err = ext2fs_xattr_remove(h, key);
> +	if (err) {
> +		ret = translate_error(fs, ino, err);
> +		goto out2;
> +	}
> +
> +	err = ext2fs_xattrs_write(h);
> +	if (err) {
> +		ret = translate_error(fs, ino, err);
> +		goto out2;
> +	}
> +
> +out2:
> +	err = ext2fs_xattrs_close(&h);
> +	if (err)
> +		ret = translate_error(fs, ino, err);
> +out:
> +	pthread_mutex_unlock(&ff->bfl);
> +
> +	return ret;
> +}
> +
> +struct readdir_iter {
> +	void *buf;
> +	fuse_fill_dir_t func;
> +};
> +
> +static int op_readdir_iter(ext2_ino_t dir, int entry,
> +			   struct ext2_dir_entry *dirent, int offset,
> +			   int blocksize, char *buf, void *data)
> +{
> +	struct readdir_iter *i = data;
> +	struct stat statbuf;
> +	char namebuf[EXT2_NAME_LEN + 1];
> +	int ret;
> +
> +	memcpy(namebuf, dirent->name, dirent->name_len & 0xFF);
> +	namebuf[dirent->name_len & 0xFF] = 0;
> +	statbuf.st_ino = dirent->inode;
> +	statbuf.st_mode = S_IFREG;
> +	ret = i->func(i->buf, namebuf, NULL, 0);
> +	if (ret)
> +		return DIRENT_ABORT;
> +
> +	return 0;
> +}
> +
> +static int op_readdir(const char *path, void *buf, fuse_fill_dir_t fill_func,
> +		      off_t offset, struct fuse_file_info *fp)
> +{
> +	struct fuse_context *ctxt = fuse_get_context();
> +	struct fuse2fs *ff = (struct fuse2fs *)ctxt->private_data;
> +	ext2_filsys fs = ff->fs;
> +	struct fuse2fs_file_handle *fh = (struct fuse2fs_file_handle *)fp->fh;
> +	errcode_t err;
> +	ext2_ino_t ino;
> +	struct readdir_iter i;
> +	int ret = 0;
> +
> +	pthread_mutex_lock(&ff->bfl);
> +	i.buf = buf;
> +	i.func = fill_func;
> +	err = ext2fs_dir_iterate2(fs, fh->ino, 0, NULL, op_readdir_iter, &i);
> +	if (err) {
> +		ret = translate_error(fs, fh->ino, err);
> +		goto out;
> +	}
> +
> +	if (fs_writeable(fs)) {
> +		ret = update_atime(fs, fh->ino);
> +		if (ret)
> +			goto out;
> +	}
> +out:
> +	pthread_mutex_unlock(&ff->bfl);
> +	return ret;
> +}
> +
> +static int op_access(const char *path, int mask)
> +{
> +	struct fuse_context *ctxt = fuse_get_context();
> +	struct fuse2fs *ff = (struct fuse2fs *)ctxt->private_data;
> +	ext2_filsys fs = ff->fs;
> +	errcode_t err;
> +	ext2_ino_t ino;
> +	int ret = 0;
> +
> +	pthread_mutex_lock(&ff->bfl);
> +	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, path, &ino);
> +	if (err || ino == 0) {
> +		ret = translate_error(fs, 0, err);
> +		goto out;
> +	}
> +
> +	ret = check_inum_access(ctxt, fs, ino, mask);
> +	if (ret)
> +		goto out;
> +
> +out:
> +	pthread_mutex_unlock(&ff->bfl);
> +	return ret;
> +}
> +
> +static int op_create(const char *path, mode_t mode, struct fuse_file_info *fp)
> +{
> +	struct fuse_context *ctxt = fuse_get_context();
> +	struct fuse2fs *ff = (struct fuse2fs *)ctxt->private_data;
> +	ext2_filsys fs = ff->fs;
> +	struct ext3_extent_header *eh;
> +	ext2_ino_t parent, child;
> +	char *temp_path = strdup(path);
> +	errcode_t err;
> +	char *node_name, a;
> +	int filetype, i;
> +	struct ext2_inode_large inode;
> +	int ret = 0;
> +
> +	if (!temp_path) {
> +		ret = -ENOMEM;
> +		goto out;
> +	}
> +	node_name = strrchr(temp_path, '/');
> +	if (!node_name) {
> +		ret = -ENOMEM;
> +		goto out;
> +	}
> +	node_name++;
> +	a = *node_name;
> +	*node_name = 0;
> +
> +	pthread_mutex_lock(&ff->bfl);
> +	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, temp_path,
> +			   &parent);
> +	if (err) {
> +		ret = translate_error(fs, 0, err);
> +		goto out2;
> +	}
> +
> +	ret = check_inum_access(ctxt, fs, parent, W_OK);
> +	if (ret)
> +		goto out2;
> +
> +	*node_name = a;
> +
> +	filetype = ext2_file_type(mode);
> +
> +	err = ext2fs_new_inode(fs, parent, mode, 0, &child);
> +	if (err) {
> +		ret = translate_error(fs, parent, err);
> +		goto out2;
> +	}
> +
> +	err = ext2fs_link(fs, parent, node_name, child, filetype);
> +	if (err == EXT2_ET_DIR_NO_SPACE) {
> +		err = ext2fs_expand_dir(fs, parent);
> +		if (err) {
> +			ret = translate_error(fs, parent, err);
> +			goto out2;
> +		}
> +
> +		err = ext2fs_link(fs, parent, node_name, child,
> +				     filetype);
> +	}
> +	if (err) {
> +		ret = translate_error(fs, parent, err);
> +		goto out2;
> +	}
> +
> +	ret = update_mtime(fs, parent);
> +	if (ret)
> +		goto out2;
> +
> +	memset(&inode, 0, sizeof(inode));
> +	inode.i_mode = mode;
> +	inode.i_links_count = 1;
> +	inode.i_extra_isize = sizeof(struct ext2_inode_large) -
> +		EXT2_GOOD_OLD_INODE_SIZE;
> +	if (fs->super->s_feature_incompat & EXT3_FEATURE_INCOMPAT_EXTENTS) {
> +		inode.i_flags = EXT4_EXTENTS_FL;
> +
> +		/* This must be initialized, even for a zero byte file. */
> +		eh = (struct ext3_extent_header *) &inode.i_block[0];
> +		eh->eh_magic = ext2fs_cpu_to_le16(EXT3_EXT_MAGIC);
> +		eh->eh_depth = 0;
> +		eh->eh_entries = 0;
> +		i = (sizeof(inode.i_block) - sizeof(*eh)) /
> +			sizeof(struct ext3_extent);
> +		eh->eh_max = ext2fs_cpu_to_le16(i);
> +	}
> +
> +	err = ext2fs_write_new_inode(fs, child, (struct ext2_inode *)&inode);
> +	if (err) {
> +		ret = translate_error(fs, child, err);
> +		goto out2;
> +	}
> +
> +	inode.i_generation = ff->next_generation++;
> +	init_times(&inode);
> +	err = ext2fs_write_inode_full(fs, child, (struct ext2_inode *)&inode,
> +				      sizeof(inode));
> +	if (err) {
> +		ret = translate_error(fs, child, err);
> +		goto out2;
> +	}
> +
> +	ext2fs_inode_alloc_stats2(fs, child, 1, 0);
> +
> +	ret = __op_open(path, fp);
> +	if (ret)
> +		goto out2;
> +out2:
> +	pthread_mutex_unlock(&ff->bfl);
> +out:
> +	free(temp_path);
> +	return ret;
> +}
> +
> +static int op_ftruncate(const char *path, off_t len, struct fuse_file_info *fp)
> +{
> +	struct fuse_context *ctxt = fuse_get_context();
> +	struct fuse2fs *ff = (struct fuse2fs *)ctxt->private_data;
> +	ext2_filsys fs = ff->fs;
> +	struct fuse2fs_file_handle *fh = (struct fuse2fs_file_handle *)fp->fh;
> +	ext2_file_t efp;
> +	errcode_t err;
> +	int ret = 0;
> +
> +	pthread_mutex_lock(&ff->bfl);
> +	if (!fs_writeable(fs)) {
> +		ret = -EROFS;
> +		goto out;
> +	}
> +
> +	err = ext2fs_file_open(fs, fh->ino, fh->open_flags, &efp);
> +	if (err) {
> +		ret = translate_error(fs, fh->ino, err);
> +		goto out;
> +	}
> +
> +	err = ext2fs_file_set_size2(efp, len);
> +	if (err) {
> +		ret = translate_error(fs, fh->ino, err);
> +		goto out;
> +	}
> +
> +	err = ext2fs_file_close(efp);
> +	if (err) {
> +		ret = translate_error(fs, fh->ino, err);
> +		goto out;
> +	}
> +
> +	ret = update_mtime(fs, fh->ino);
> +	if (ret)
> +		goto out;
> +
> +out:
> +	pthread_mutex_unlock(&ff->bfl);
> +	return ret;
> +}
> +
> +static int op_fgetattr(const char *path, struct stat *statbuf,
> +		       struct fuse_file_info *fp)
> +{
> +	struct fuse_context *ctxt = fuse_get_context();
> +	struct fuse2fs *ff = (struct fuse2fs *)ctxt->private_data;
> +	ext2_filsys fs = ff->fs;
> +	struct fuse2fs_file_handle *fh = (struct fuse2fs_file_handle *)fp->fh;
> +	int ret = 0;
> +
> +	pthread_mutex_lock(&ff->bfl);
> +	ret = stat_inode(fs, fh->ino, statbuf);
> +	pthread_mutex_unlock(&ff->bfl);
> +
> +	return ret;
> +}
> +
> +static int op_utimens(const char *path, const struct timespec tv[2])
> +{
> +	struct fuse_context *ctxt = fuse_get_context();
> +	struct fuse2fs *ff = (struct fuse2fs *)ctxt->private_data;
> +	ext2_filsys fs = ff->fs;
> +	errcode_t err;
> +	ext2_ino_t ino;
> +	struct ext2_inode_large inode;
> +	int ret = 0;
> +
> +	pthread_mutex_lock(&ff->bfl);
> +	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, path, &ino);
> +	if (err) {
> +		ret = translate_error(fs, 0, err);
> +		goto out;
> +	}
> +
> +	ret = check_inum_access(ctxt, fs, ino, W_OK);
> +	if (ret)
> +		goto out;
> +
> +	err = ext2fs_read_inode_full(fs, ino, (struct ext2_inode *)&inode,
> +				     sizeof(inode));
> +	if (err) {
> +		ret = translate_error(fs, ino, err);
> +		goto out;
> +	}
> +
> +	EXT4_INODE_SET_XTIME(i_atime, tv, &inode);
> +	EXT4_INODE_SET_XTIME(i_mtime, tv + 1, &inode);
> +	ret = update_ctime(fs, ino, &inode);
> +	if (ret)
> +		goto out;
> +
> +	err = ext2fs_write_inode_full(fs, ino, (struct ext2_inode *)&inode,
> +				      sizeof(inode));
> +	if (err) {
> +		ret = translate_error(fs, ino, err);
> +		goto out;
> +	}
> +
> +out:
> +	pthread_mutex_unlock(&ff->bfl);
> +	return ret;
> +}
> +
> +#ifdef SUPPORT_I_FLAGS
> +static int ioctl_getflags(ext2_filsys fs, struct fuse2fs_file_handle *fh,
> +			  void *data)
> +{
> +	errcode_t err;
> +	struct ext2_inode_large inode;
> +
> +	err = ext2fs_read_inode_full(fs, fh->ino, (struct ext2_inode *)&inode,
> +				     sizeof(inode));
> +	if (err)
> +		return -EIO;
> +
> +	*(__u32 *)data = inode.i_flags & EXT2_FL_USER_VISIBLE;
> +	return 0;
> +}
> +
> +#define FUSE2FS_MODIFIABLE_IFLAGS \
> +	(EXT2_IMMUTABLE_FL | EXT2_APPEND_FL | EXT2_NODUMP_FL | \
> +	 EXT2_NOATIME_FL | EXT3_JOURNAL_DATA_FL | EXT2_DIRSYNC_FL | \
> +	 EXT2_TOPDIR_FL)
> +
> +int ioctl_setflags(ext2_filsys fs, struct fuse2fs_file_handle *fh, void *data)

This can be static.

> +{
> +	errcode_t err;
> +	struct ext2_inode_large inode;
> +	int ret;
> +	__u32 flags = *(__u32 *)data;
> +	struct fuse_context *ctxt = fuse_get_context();
> +
> +	ret = check_flags_access(ctxt, fs, fh->ino, W_OK);
> +	if (ret)
> +		return ret;
> +
> +	err = ext2fs_read_inode_full(fs, fh->ino, (struct ext2_inode *)&inode,
> +				     sizeof(inode));
> +	if (err)
> +		return -EIO;
> +
> +	if ((inode.i_flags ^ flags) & ~FUSE2FS_MODIFIABLE_IFLAGS)
> +		return -EINVAL;
> +
> +	inode.i_flags = inode.i_flags & ~FUSE2FS_MODIFIABLE_IFLAGS |
> +			flags & FUSE2FS_MODIFIABLE_IFLAGS;
> +
> +	err = ext2fs_write_inode_full(fs, fh->ino, (struct ext2_inode *)&inode,
> +				      sizeof(inode));
> +	if (err)
> +		return -EIO;
> +
> +	return 0;
> +}
> +#endif /* SUPPORT_I_FLAGS */
> +
> +#if FUSE_VERSION >= FUSE_MAKE_VERSION(2, 8)
> +static int op_ioctl(const char *path, int cmd, void *arg,
> +		      struct fuse_file_info *fp, unsigned int flags, void *data)
> +{
> +	struct fuse_context *ctxt = fuse_get_context();
> +	struct fuse2fs *ff = (struct fuse2fs *)ctxt->private_data;
> +	ext2_filsys fs = ff->fs;
> +	struct fuse2fs_file_handle *fh = (struct fuse2fs_file_handle *)fp->fh;
> +	int ret = 0;
> +
> +	pthread_mutex_lock(&ff->bfl);
> +	switch (cmd) {
> +#ifdef SUPPORT_I_FLAGS
> +	case EXT2_IOC_GETFLAGS:
> +		ret = ioctl_getflags(fs, fh, data);
> +		break;
> +	case EXT2_IOC_SETFLAGS:
> +		ret = ioctl_setflags(fs, fh, data);
> +		break;
> +#endif
> +	default:
> +		ret = -ENOTTY;
> +	}
> +	pthread_mutex_unlock(&ff->bfl);
> +
> +	return ret;
> +}
> +#endif /* FUSE 28 */
> +
> +static int op_bmap(const char *path, size_t blocksize, uint64_t *idx)
> +{
> +	struct fuse_context *ctxt = fuse_get_context();
> +	struct fuse2fs *ff = (struct fuse2fs *)ctxt->private_data;
> +	ext2_filsys fs = ff->fs;
> +	ext2_ino_t ino;
> +	errcode_t err;
> +	int ret = 0;
> +
> +	pthread_mutex_lock(&ff->bfl);
> +	err = ext2fs_namei(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, path, &ino);
> +	if (err) {
> +		ret = translate_error(fs, 0, err);
> +		goto out;
> +	}
> +
> +	err = ext2fs_bmap2(fs, ino, NULL, NULL, 0, *idx, 0, (blk64_t *)idx);
> +	if (err) {
> +		ret = translate_error(fs, ino, err);
> +		goto out;
> +	}
> +
> +out:
> +	pthread_mutex_unlock(&ff->bfl);
> +	return ret;
> +}
> +
> +#if FUSE_VERSION >= FUSE_MAKE_VERSION(2, 9)
> +static int fallocate_helper(struct fuse_file_info *fp, int mode, off_t offset,
> +			    off_t len)
> +{
> +	struct fuse_context *ctxt = fuse_get_context();
> +	struct fuse2fs *ff = (struct fuse2fs *)ctxt->private_data;
> +	ext2_filsys fs = ff->fs;
> +	struct fuse2fs_file_handle *fh = (struct fuse2fs_file_handle *)fp->fh;
> +	blk64_t blk, end, x;
> +	__u64 fsize;
> +	ext2_file_t efp;
> +	struct ext2_inode_large inode;

Unused variable.

> +	errcode_t err;
> +	int ret = 0;
> +
> +	/* Allocate a bunch of blocks */
> +	end = (offset + len - 1) / fs->blocksize;
> +	for (blk = offset / fs->blocksize; blk <= end; blk++) {
> +		err = ext2fs_bmap2(fs, fh->ino, NULL, NULL, BMAP_ALLOC, blk,
> +				   0, &x);
> +		if (err)
> +			return translate_error(fs, fh->ino, err);
> +	}
> +
> +	/* Update i_size */
> +	if (!(mode & FL_KEEP_SIZE_FLAG)) {
> +		err = ext2fs_file_open(fs, fh->ino, fh->open_flags, &efp);
> +		if (err)
> +			return translate_error(fs, fh->ino, err);
> +
> +		err = ext2fs_file_get_lsize(efp, &fsize);
> +		if (err) {
> +			ret = translate_error(fs, fh->ino, err);
> +			goto out_isize;
> +		}
> +		if (offset + len > fsize) {
> +			fsize = offset + len;
> +			err = ext2fs_file_set_size2(efp, fsize);
> +			if (err) {
> +				ret = translate_error(fs, fh->ino, err);
> +				goto out_isize;
> +			}
> +		}
> +
> +out_isize:
> +		err = ext2fs_file_close(efp);
> +		if (ret)
> +			return ret;
> +		if (err)
> +			return translate_error(fs, fh->ino, err);
> +	}
> +
> +	return update_mtime(fs, fh->ino);
> +}
> +
> +static int punch_helper(struct fuse_file_info *fp, int mode, off_t offset,
> +			off_t len)
> +{
> +	struct fuse_context *ctxt = fuse_get_context();
> +	struct fuse2fs *ff = (struct fuse2fs *)ctxt->private_data;
> +	ext2_filsys fs = ff->fs;
> +	struct fuse2fs_file_handle *fh = (struct fuse2fs_file_handle *)fp->fh;
> +	blk64_t blk, start, end, x;
> +	__u64 fsize;
> +	ext2_file_t efp;
> +	struct ext2_inode_large inode;

Unused variable.

All of these problems are fixed in my dev tree, of course. :)

--D

> +	errcode_t err;
> +	int ret = 0;
> +
> +	/* kernel ext4 punch requires this flag to be set */
> +	if (!(mode & FL_KEEP_SIZE_FLAG))
> +		return -EINVAL;
> +
> +	if (len < fs->blocksize)
> +		return 0;
> +
> +	/* Punch out a bunch of blocks */
> +	start = (offset + fs->blocksize - 1) / fs->blocksize;
> +	end = (offset + len - fs->blocksize) / fs->blocksize;
> +
> +	if (start > end)
> +		return -EINVAL;
> +
> +	err = ext2fs_punch(fs, fh->ino, NULL, NULL, start, end);
> +	if (err)
> +		return translate_error(fs, fh->ino, err);
> +
> +	return update_mtime(fs, fh->ino);
> +}
> +
> +static int op_fallocate(const char *path, int mode, off_t offset, off_t len,
> +			struct fuse_file_info *fp)
> +{
> +	struct fuse_context *ctxt = fuse_get_context();
> +	struct fuse2fs *ff = (struct fuse2fs *)ctxt->private_data;
> +	ext2_filsys fs = ff->fs;
> +	int ret;
> +
> +	/* Catch unknown flags */
> +	if (mode & ~(FL_PUNCH_HOLE_FLAG | FL_KEEP_SIZE_FLAG))
> +		return -EINVAL;
> +
> +	pthread_mutex_lock(&ff->bfl);
> +	if (!fs_writeable(fs)) {
> +		ret = -EROFS;
> +		goto out;
> +	}
> +	if (mode & FL_PUNCH_HOLE_FLAG)
> +		ret = punch_helper(fp, mode, offset, len);
> +	else
> +		ret = fallocate_helper(fp, mode, offset, len);
> +out:
> +	pthread_mutex_unlock(&ff->bfl);
> +
> +	return ret;
> +}
> +#endif /* FUSE 29 */
> +
> +static struct fuse_operations fs_ops = {
> +	.init = op_init,
> +	.destroy = op_destroy,
> +	.getattr = op_getattr,
> +	.readlink = op_readlink,
> +	.mknod = op_mknod,
> +	.mkdir = op_mkdir,
> +	.unlink = op_unlink,
> +	.rmdir = op_rmdir,
> +	.symlink = op_symlink,
> +	.rename = op_rename,
> +	.link = op_link,
> +	.chmod = op_chmod,
> +	.chown = op_chown,
> +	.truncate = op_truncate,
> +	.open = op_open,
> +	.read = op_read,
> +	.write = op_write,
> +	.statfs = op_statfs,
> +	.release = op_release,
> +	.fsync = op_fsync,
> +	.setxattr = op_setxattr,
> +	.getxattr = op_getxattr,
> +	.listxattr = op_listxattr,
> +	.removexattr = op_removexattr,
> +	.opendir = op_open,
> +	.readdir = op_readdir,
> +	.releasedir = op_release,
> +	.fsyncdir = op_fsync,
> +	.access = op_access,
> +	.create = op_create,
> +	.ftruncate = op_ftruncate,
> +	.fgetattr = op_fgetattr,
> +	.utimens = op_utimens,
> +	.bmap = op_bmap,
> +#ifdef SUPERFLUOUS
> +	.lock = op_lock,
> +	.poll = op_poll,
> +#endif
> +#if FUSE_VERSION >= FUSE_MAKE_VERSION(2, 8)
> +	.ioctl = op_ioctl,
> +	.flag_nullpath_ok = 1,
> +#endif
> +#if FUSE_VERSION >= FUSE_MAKE_VERSION(2, 9)
> +	.flag_nopath = 1,
> +	.fallocate = op_fallocate,
> +#endif
> +};
> +
> +static int get_random_bytes(void *p, size_t sz)
> +{
> +	int fd;
> +	ssize_t r;
> +
> +	fd = open("/dev/random", O_RDONLY);
> +	if (fd < 0) {
> +		perror("/dev/random");
> +		return 0;
> +	}
> +
> +	r = read(fd, p, sz);
> +
> +	close(fd);
> +	return r == sz;
> +}
> +
> +int main(int argc, char *argv[])
> +{
> +	errcode_t err;
> +	ext2_filsys fs;
> +	char *tok, *arg, *logfile;
> +	int i;
> +	int readwrite = 1, panic_on_error = 0;
> +	struct fuse2fs *ff;
> +	char extra_args[BUFSIZ];
> +	int ret = 0, flags = EXT2_FLAG_64BITS | EXT2_FLAG_EXCLUSIVE;
> +
> +	if (argc < 2) {
> +		printf("Usage: %s dev mntpt [-o options] [fuse_args]\n",
> +		       argv[0]);
> +		return 1;
> +	}
> +
> +	for (i = 1; i < argc - 1; i++) {
> +		if (strcmp(argv[i], "-o"))
> +			continue;
> +		arg = argv[i + 1];
> +		while ((tok = strtok(arg, ","))) {
> +			arg = NULL;
> +			if (!strcmp(tok, "ro"))
> +				readwrite = 0;
> +			else if (!strcmp(tok, "errors=panic"))
> +				panic_on_error = 1;
> +		}
> +	}
> +
> +	if (!readwrite)
> +		printf("Mounting read-only.\n");
> +
> +#ifdef ENABLE_NLS
> +	setlocale(LC_MESSAGES, "");
> +	setlocale(LC_CTYPE, "");
> +	bindtextdomain(NLS_CAT_NAME, LOCALEDIR);
> +	textdomain(NLS_CAT_NAME);
> +	set_com_err_gettext(gettext);
> +#endif
> +	add_error_table(&et_ext2_error_table);
> +
> +	ff = calloc(1, sizeof(*ff));
> +	if (!ff) {
> +		perror("init");
> +		return 1;
> +	}
> +	ff->panic_on_error = panic_on_error;
> +
> +	/* Set up error logging */
> +	logfile = getenv("FUSE2FS_LOGFILE");
> +	if (logfile) {
> +		ff->err_fp = fopen(logfile, "a");
> +		if (!ff->err_fp) {
> +			perror(logfile);
> +			goto out_nofs;
> +		}
> +	} else
> +		ff->err_fp = stderr;
> +
> +	/* Start up the fs (while we still can use stdout) */
> +	ret = 2;
> +	if (readwrite)
> +		flags |= EXT2_FLAG_RW;
> +	err = ext2fs_open3(argv[1], NULL, flags, 0, 0, unix_io_manager, &fs);
> +	if (err) {
> +		printf("%s: %s.\n", argv[1], error_message(err));
> +		printf("Please run e2fsck -fy %s.\n", argv[1]);
> +		goto out_nofs;
> +	}
> +	ff->fs = fs;
> +	fs->priv_data = ff;
> +
> +	ret = 3;
> +	if (EXT2_HAS_INCOMPAT_FEATURE(fs->super,
> +				      EXT3_FEATURE_INCOMPAT_RECOVER)) {
> +		printf("Journal needs recovery; running `e2fsck -E "
> +		       "journal_only' is required.\n");
> +		goto out;
> +	}
> +
> +	if (readwrite) {
> +		if (EXT2_HAS_COMPAT_FEATURE(fs->super,
> +					    EXT3_FEATURE_COMPAT_HAS_JOURNAL))
> +			printf("Journal mode will not be used.\n");
> +		err = ext2fs_read_inode_bitmap(fs);
> +		if (err) {
> +			translate_error(fs, 0, err);
> +			goto out;
> +		}
> +		err = ext2fs_read_block_bitmap(fs);
> +		if (err) {
> +			translate_error(fs, 0, err);
> +			goto out;
> +		}
> +	}
> +
> +	if (!(fs->super->s_state & EXT2_VALID_FS))
> +		printf("Warning: Mounting unchecked fs, running e2fsck "
> +		       "is recommended.\n");
> +	if (fs->super->s_max_mnt_count > 0 &&
> +	    fs->super->s_mnt_count >= fs->super->s_max_mnt_count)
> +		printf("Warning: Maximal mount count reached, running "
> +		       "e2fsck is recommended.\n");
> +	if (fs->super->s_checkinterval > 0 &&
> +	    fs->super->s_lastcheck + fs->super->s_checkinterval <= time(0))
> +		printf("Warning: Check time reached; running e2fsck "
> +		       "is recommended.\n");
> +	if (fs->super->s_last_orphan)
> +		printf("Orphans detected; running e2fsck is recommended.\n");
> +
> +	if (fs->super->s_state & EXT2_ERROR_FS) {
> +		printf("Errors detected; running e2fsck is required.\n");
> +		goto out;
> +	}
> +
> +	/* Initialize generation counter */
> +	get_random_bytes(&ff->next_generation, sizeof(unsigned int));
> +
> +	/* Stuff in some fuse parameters of our own */
> +	snprintf(extra_args, BUFSIZ, "-okernel_cache,subtype=ext4,use_ino,"
> +		 "fsname=%s", argv[1]);
> +	argv[0] = argv[1];
> +	argv[1] = argv[2];
> +	argv[2] = extra_args;
> +
> +	pthread_mutex_init(&ff->bfl, NULL);
> +	fuse_main(argc, argv, &fs_ops, ff);
> +	pthread_mutex_destroy(&ff->bfl);
> +
> +	ret = 0;
> +out:
> +	err = ext2fs_close(fs);
> +	if (err)
> +		ret = translate_error(fs, 0, err);
> +out_nofs:
> +	free(ff);
> +
> +	return ret;
> +}
> +
> +static int __translate_error(ext2_filsys fs, errcode_t err, ext2_ino_t ino,
> +			     const char *file, int line)
> +{
> +	struct timespec now;
> +	int ret;
> +	struct fuse2fs *ff = fs->priv_data;
> +	int is_err = 0;
> +
> +	/* Translate ext2 error to unix error code */
> +	switch (err) {
> +	case EXT2_ET_NO_MEMORY:
> +	case EXT2_ET_TDB_ERR_OOM:
> +		ret = -ENOMEM;
> +		break;
> +	case EXT2_ET_INVALID_ARGUMENT:
> +	case EXT2_ET_LLSEEK_FAILED:
> +		ret = -EINVAL;
> +		break;
> +	case EXT2_ET_NO_DIRECTORY:
> +		ret = -ENOTDIR;
> +		break;
> +	case EXT2_ET_FILE_NOT_FOUND:
> +		ret = -ENOENT;
> +		break;
> +	case EXT2_ET_DIR_NO_SPACE:
> +		is_err = 1;
> +	case EXT2_ET_TOOSMALL:
> +	case EXT2_ET_BLOCK_ALLOC_FAIL:
> +	case EXT2_ET_INODE_ALLOC_FAIL:
> +	case EXT2_ET_EA_NO_SPACE:
> +		ret = -ENOSPC;
> +		break;
> +	case EXT2_ET_SYMLINK_LOOP:
> +		ret = -EMLINK;
> +		break;
> +	case EXT2_ET_FILE_TOO_BIG:
> +		ret = -EFBIG;
> +		break;
> +	case EXT2_ET_TDB_ERR_EXISTS:
> +	case EXT2_ET_FILE_EXISTS:
> +		ret = -EEXIST;
> +		break;
> +	case EXT2_ET_MMP_FAILED:
> +	case EXT2_ET_MMP_FSCK_ON:
> +		ret = -EBUSY;
> +		break;
> +	case EXT2_ET_EA_KEY_NOT_FOUND:
> +		ret = -ENODATA;
> +		break;
> +	default:
> +		is_err = 1;
> +		ret = -EIO;
> +		break;
> +	}
> +
> +	if (!is_err)
> +		return ret;
> +
> +	if (ino)
> +		fprintf(ff->err_fp, "FUSE2FS (%s): %s (inode #%d) at %s:%d.\n",
> +			fs && fs->device_name ? fs->device_name : "???",
> +			error_message(err), ino, file, line);
> +	else
> +		fprintf(ff->err_fp, "FUSE2FS (%s): %s at %s:%d.\n",
> +			fs && fs->device_name ? fs->device_name : "???",
> +			error_message(err), file, line);
> +	fflush(ff->err_fp);
> +
> +	/* Make a note in the error log */
> +	get_now(&now);
> +	fs->super->s_last_error_time = now.tv_sec;
> +	fs->super->s_last_error_ino = ino;
> +	fs->super->s_last_error_line = line;
> +	fs->super->s_last_error_block = 0;
> +	strncpy(fs->super->s_last_error_func, file,
> +		sizeof(fs->super->s_last_error_func));
> +	if (fs->super->s_first_error_time == 0) {
> +		fs->super->s_first_error_time = now.tv_sec;
> +		fs->super->s_first_error_ino = ino;
> +		fs->super->s_first_error_line = line;
> +		fs->super->s_first_error_block = 0;
> +		strncpy(fs->super->s_first_error_func, file,
> +			sizeof(fs->super->s_first_error_func));
> +	}
> +
> +	fs->super->s_error_count++;
> +	ext2fs_mark_super_dirty(fs);
> +	ext2fs_flush(fs);
> +	if (ff->panic_on_error)
> +		abort();
> +
> +	return ret;
> +}
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 02/25] libext2fs: fix ext2fs_open2() truncation of the superblock parameter
  2013-10-18 18:32   ` Darrick J. Wong
@ 2013-10-23 14:49     ` Lukáš Czerner
  0 siblings, 0 replies; 73+ messages in thread
From: Lukáš Czerner @ 2013-10-23 14:49 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: tytso, linux-ext4

On Fri, 18 Oct 2013, Darrick J. Wong wrote:

> Date: Fri, 18 Oct 2013 11:32:04 -0700
> From: Darrick J. Wong <darrick.wong@oracle.com>
> To: tytso@mit.edu
> Cc: linux-ext4@vger.kernel.org
> Subject: Re: [PATCH 02/25] libext2fs: fix ext2fs_open2() truncation of the
>     superblock parameter
> 
> On Thu, Oct 17, 2013 at 09:49:07PM -0700, Darrick J. Wong wrote:
> > Since it's possible for very large filesystems to store backup
> > superblocks at very large (> 2^32) block numbers, we need to be able
> > to handle the case of a caller directing us to read one of these
> > high-numbered backups.

Looks good.

Reviewed-by: Lukas Czerner <lczerner@redhat.com>
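
For anyone following along, here is a minimal sketch of the truncation the commit message describes. It is not code from the patch: blk64_t is re-declared locally as a stand-in, and squeeze_to_32() and is_truncated() are hypothetical helpers. The real ext2fs_open2() takes a plain int, where an out-of-range conversion is implementation-defined, so the sketch uses uint32_t to keep the arithmetic well-defined.

```c
#include <stdint.h>

typedef uint64_t blk64_t;	/* local stand-in for the libext2fs typedef */

/*
 * Hypothetical helper: what survives when a 64-bit block number is
 * squeezed through a 32-bit parameter.  Only the low 32 bits are kept.
 */
static uint32_t squeeze_to_32(blk64_t blk)
{
	return (uint32_t)blk;
}

/* Returns 1 if a 32-bit parameter would change the block number. */
static int is_truncated(blk64_t blk)
{
	return (blk64_t)squeeze_to_32(blk) != blk;
}
```

A backup superblock at block 0x100008000 would come out as block 0x8000, which is why the parameter has to be widened all the way down to the read call.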

> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> >  debugfs/debugfs.c   |    4 ++--
> >  e2fsck/journal.c    |    6 +++---
> >  e2fsck/unix.c       |    8 ++++----
> >  lib/ext2fs/ext2fs.h |    4 ++++
> >  lib/ext2fs/openfs.c |   21 +++++++++++++++------
> >  misc/dumpe2fs.c     |    4 ++--
> >  6 files changed, 30 insertions(+), 17 deletions(-)
> > 
> > 
> > diff --git a/debugfs/debugfs.c b/debugfs/debugfs.c
> > index 8c32eff..4f6108d 100644
> > --- a/debugfs/debugfs.c
> > +++ b/debugfs/debugfs.c
> > @@ -94,8 +94,8 @@ static void open_filesystem(char *device, int open_flags, blk64_t superblock,
> >  	if (catastrophic)
> >  		open_flags |= EXT2_FLAG_SKIP_MMP;
> >  
> > -	retval = ext2fs_open(device, open_flags, superblock, blocksize,
> > -			     unix_io_manager, &current_fs);
> > +	retval = ext2fs_open3(device, NULL, open_flags, superblock, blocksize,
> > +			      unix_io_manager, &current_fs);
> >  	if (retval) {
> >  		com_err(device, retval, "while opening filesystem");
> >  		current_fs = NULL;
> > diff --git a/e2fsck/journal.c b/e2fsck/journal.c
> > index 2509303..af35a38 100644
> > --- a/e2fsck/journal.c
> > +++ b/e2fsck/journal.c
> > @@ -967,9 +967,9 @@ int e2fsck_run_ext3_journal(e2fsck_t ctx)
> >  
> >  	ext2fs_mmp_stop(ctx->fs);
> >  	ext2fs_free(ctx->fs);
> > -	retval = ext2fs_open(ctx->filesystem_name, EXT2_FLAG_RW,
> > -			     ctx->superblock, blocksize, io_ptr,
> > -			     &ctx->fs);
> > +	retval = ext2fs_open3(ctx->filesystem_name, NULL, EXT2_FLAG_RW,
> > +			      ctx->superblock, blocksize, io_ptr,
> > +			      &ctx->fs);
> >  	if (retval) {
> >  		com_err(ctx->program_name, retval,
> >  			_("while trying to re-open %s"),
> > diff --git a/e2fsck/unix.c b/e2fsck/unix.c
> > index 0546653..fb41ca0 100644
> > --- a/e2fsck/unix.c
> > +++ b/e2fsck/unix.c
> > @@ -1040,7 +1040,7 @@ static errcode_t try_open_fs(e2fsck_t ctx, int flags, io_manager io_ptr,
> >  
> >  	*ret_fs = NULL;
> >  	if (ctx->superblock && ctx->blocksize) {
> > -		retval = ext2fs_open2(ctx->filesystem_name, ctx->io_options,
> > +		retval = ext2fs_open3(ctx->filesystem_name, ctx->io_options,
> >  				      flags, ctx->superblock, ctx->blocksize,
> >  				      io_ptr, ret_fs);
> >  	} else if (ctx->superblock) {
> > @@ -1051,7 +1051,7 @@ static errcode_t try_open_fs(e2fsck_t ctx, int flags, io_manager io_ptr,
> >  				ext2fs_free(*ret_fs);
> >  				*ret_fs = NULL;
> >  			}
> > -			retval = ext2fs_open2(ctx->filesystem_name,
> > +			retval = ext2fs_open3(ctx->filesystem_name,
> >  					      ctx->io_options, flags,
> >  					      ctx->superblock, blocksize,
> >  					      io_ptr, ret_fs);
> > @@ -1059,7 +1059,7 @@ static errcode_t try_open_fs(e2fsck_t ctx, int flags, io_manager io_ptr,
> >  				break;
> >  		}
> >  	} else
> > -		retval = ext2fs_open2(ctx->filesystem_name, ctx->io_options,
> > +		retval = ext2fs_open3(ctx->filesystem_name, ctx->io_options,
> >  				      flags, 0, 0, io_ptr, ret_fs);
> >  
> >  	if (retval == 0)
> > @@ -1375,7 +1375,7 @@ failure:
> >  	 * don't need to update the mount count and last checked
> >  	 * fields in the backup superblock (the kernel doesn't update
> >  	 * the backup superblocks anyway).  With newer versions of the
> > -	 * library this flag is set by ext2fs_open2(), but we set this
> > +	 * library this flag is set by ext2fs_open3(), but we set this
> >  	 * here just to be sure.  (No, we don't support e2fsck running
> >  	 * with some other libext2fs than the one that it was shipped
> >  	 * with, but just in case....)
> > diff --git a/lib/ext2fs/ext2fs.h b/lib/ext2fs/ext2fs.h
> > index 67876ad..1ef4d67 100644
> > --- a/lib/ext2fs/ext2fs.h
> > +++ b/lib/ext2fs/ext2fs.h
> > @@ -1443,6 +1443,10 @@ extern errcode_t ext2fs_open2(const char *name, const char *io_options,
> >  			      int flags, int superblock,
> >  			      unsigned int block_size, io_manager manager,
> >  			      ext2_filsys *ret_fs);
> > +extern errcode_t ext2fs_open3(const char *name, const char *io_options,
> > +			      int flags, blk64_t superblock,
> > +			      unsigned int block_size, io_manager manager,
> > +			      ext2_filsys *ret_fs);
> >  extern blk64_t ext2fs_descriptor_block_loc2(ext2_filsys fs,
> >  					blk64_t group_block, dgrp_t i);
> >  extern blk_t ext2fs_descriptor_block_loc(ext2_filsys fs, blk_t group_block,
> > diff --git a/lib/ext2fs/openfs.c b/lib/ext2fs/openfs.c
> > index 2ad9114..b046d6c 100644
> > --- a/lib/ext2fs/openfs.c
> > +++ b/lib/ext2fs/openfs.c
> > @@ -76,6 +76,15 @@ errcode_t ext2fs_open(const char *name, int flags, int superblock,
> >  			    manager, ret_fs);
> >  }
> >  
> > +errcode_t ext2fs_open2(const char *name, const char *io_options,
> > +		       int flags, int superblock,
> > +		       unsigned int block_size, io_manager manager,
> > +		       ext2_filsys *ret_fs)
> > +{
> > +	return ext2fs_open3(name, io_options, flags, superblock, block_size,
> > +			    manager, ret_fs);
> > +}
> > +
> >  /*
> >   *  Note: if superblock is non-zero, block-size must also be non-zero.
> >   * 	Superblock and block_size can be zero to use the default size.
> > @@ -90,8 +99,8 @@ errcode_t ext2fs_open(const char *name, int flags, int superblock,
> >   *	EXT2_FLAG_64BITS - Allow 64-bit bitfields (needed for large
> >   *				filesystems)
> >   */
> > -errcode_t ext2fs_open2(const char *name, const char *io_options,
> > -		       int flags, int superblock,
> > +errcode_t ext2fs_open3(const char *name, const char *io_options,
> > +		       int flags, blk64_t superblock,
> >  		       unsigned int block_size, io_manager manager,
> >  		       ext2_filsys *ret_fs)
> >  {
> > @@ -189,8 +198,8 @@ errcode_t ext2fs_open2(const char *name, const char *io_options,
> >  		if (retval)
> >  			goto cleanup;
> >  	}
> > -	retval = io_channel_read_blk(fs->io, superblock, -SUPERBLOCK_SIZE,
> > -				     fs->super);
> > +	retval = io_channel_read_blk64(fs->io, superblock, -SUPERBLOCK_SIZE,
> > +				       fs->super);
> >  	if (retval)
> >  		goto cleanup;
> >  	if (fs->orig_super)
> > @@ -380,8 +389,8 @@ errcode_t ext2fs_open2(const char *name, const char *io_options,
> >  	else
> >  		first_meta_bg = fs->desc_blocks;
> >  	if (first_meta_bg) {
> > -		retval = io_channel_read_blk(fs->io, group_block+1,
> > -					     first_meta_bg, dest);
> > +		retval = io_channel_read_blk64(fs->io, group_block+1,
> > +					       first_meta_bg, dest);
> 
> The only change to this patch is the use of *read_blk64 in these two hunks.
> 
> --D
> 
> >  		if (retval)
> >  			goto cleanup;
> >  #ifdef WORDS_BIGENDIAN
> > diff --git a/misc/dumpe2fs.c b/misc/dumpe2fs.c
> > index ae70f70..b139977 100644
> > --- a/misc/dumpe2fs.c
> > +++ b/misc/dumpe2fs.c
> > @@ -611,7 +611,7 @@ int main (int argc, char ** argv)
> >  		for (use_blocksize = EXT2_MIN_BLOCK_SIZE;
> >  		     use_blocksize <= EXT2_MAX_BLOCK_SIZE;
> >  		     use_blocksize *= 2) {
> > -			retval = ext2fs_open (device_name, flags,
> > +			retval = ext2fs_open3(device_name, NULL, flags,
> >  					      use_superblock,
> >  					      use_blocksize, unix_io_manager,
> >  					      &fs);
> > @@ -619,7 +619,7 @@ int main (int argc, char ** argv)
> >  				break;
> >  		}
> >  	} else
> > -		retval = ext2fs_open (device_name, flags, use_superblock,
> > +		retval = ext2fs_open3(device_name, NULL, flags, use_superblock,
> >  				      use_blocksize, unix_io_manager, &fs);
> >  	if (retval) {
> >  		com_err (program_name, retval, _("while trying to open %s"),
> > 

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 03/25] mke2fs: don't let resize= turn on resize_inode when meta_bg is set
  2013-10-18  4:49 ` [PATCH 03/25] mke2fs: don't let resize= turn on resize_inode when meta_bg is set Darrick J. Wong
@ 2013-10-23 15:08   ` Lukáš Czerner
  2013-10-23 23:40   ` Theodore Ts'o
  1 sibling, 0 replies; 73+ messages in thread
From: Lukáš Czerner @ 2013-10-23 15:08 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: tytso, linux-ext4

On Thu, 17 Oct 2013, Darrick J. Wong wrote:

> Date: Thu, 17 Oct 2013 21:49:16 -0700
> From: Darrick J. Wong <darrick.wong@oracle.com>
> To: tytso@mit.edu, darrick.wong@oracle.com
> Cc: linux-ext4@vger.kernel.org
> Subject: [PATCH 03/25] mke2fs: don't let resize= turn on resize_inode when
>     meta_bg is set
> 
> Passing the "-E resize=NNN" option to mke2fs sets the resize_inode
> feature.  However, resize_inode and meta_bg are mutually exclusive
> (and the feature flag parser enforces this); therefore, we shouldn't
> allow resize_inode to sneak in the back door like this.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  misc/mke2fs.c |   11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> 
> diff --git a/misc/mke2fs.c b/misc/mke2fs.c
> index 6e709b9..ce3c696 100644
> --- a/misc/mke2fs.c
> +++ b/misc/mke2fs.c
> @@ -2448,6 +2448,17 @@ int main (int argc, char *argv[])
>  	}
>  	fs->progress_ops = &ext2fs_numeric_progress_ops;
>  
> +	/* We can't have resize_inode sneak in via resize= on a meta_bg fs. */
> +	if (!quiet &&

This means the check only runs when the user does _not_ use quiet
mode, so the combination is still possible when mke2fs is run with -q.
I assume this is not what you wanted.

Thanks!
-Lukas
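
To illustrate the point, a small sketch under stated assumptions: struct mkfs_opts and both predicates are hypothetical and only condense the condition from the hunk; they are not mke2fs code.

```c
#include <stdbool.h>

/* Hypothetical condensed view of the state the posted check looks at. */
struct mkfs_opts {
	bool quiet;			/* -q was given */
	bool meta_bg;			/* meta_bg feature is enabled */
	unsigned reserved_gdt_blocks;	/* non-zero once resize= took effect */
};

/* The condition as posted: skipped entirely in quiet mode. */
static bool conflict_as_posted(const struct mkfs_opts *o)
{
	return !o->quiet && o->meta_bg && o->reserved_gdt_blocks > 0;
}

/* The same condition without the quiet gate. */
static bool conflict_ungated(const struct mkfs_opts *o)
{
	return o->meta_bg && o->reserved_gdt_blocks > 0;
}
```

With quiet set, conflict_as_posted() returns false for a meta_bg filesystem that has reserved GDT blocks, while conflict_ungated() still catches it.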


> +	    EXT2_HAS_INCOMPAT_FEATURE(fs->super,
> +				      EXT2_FEATURE_INCOMPAT_META_BG) &&
> +	    fs->super->s_reserved_gdt_blocks > 0) {
> +		printf(_("Reserving GDT blocks (resize_inode) is not possible "
> +			 "with the meta_bg feature.\nThey cannot be enabled "
> +			 "simultaneously.\n"));
> +		exit(1);
> +	}
> +
>  	/* Check the user's mkfs options for metadata checksumming */
>  	if (!quiet &&
>  	    EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
> 

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 04/25] libext2fs: reject 64bit badblocks numbers
  2013-10-18  4:49 ` [PATCH 04/25] libext2fs: reject 64bit badblocks numbers Darrick J. Wong
@ 2013-10-23 15:24   ` Lukáš Czerner
  2013-10-23 23:58     ` Theodore Ts'o
  0 siblings, 1 reply; 73+ messages in thread
From: Lukáš Czerner @ 2013-10-23 15:24 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: tytso, linux-ext4

On Thu, 17 Oct 2013, Darrick J. Wong wrote:

> Date: Thu, 17 Oct 2013 21:49:22 -0700
> From: Darrick J. Wong <darrick.wong@oracle.com>
> To: tytso@mit.edu, darrick.wong@oracle.com
> Cc: linux-ext4@vger.kernel.org
> Subject: [PATCH 04/25] libext2fs: reject 64bit badblocks numbers
> 
> Don't accept block numbers larger than 2^32 for the badblocks list,
> and don't run badblocks on them either.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  lib/ext2fs/read_bb_file.c |    7 +++++--
>  misc/badblocks.c          |   17 ++++++++++++++++-
>  2 files changed, 21 insertions(+), 3 deletions(-)
> 
> 
> diff --git a/lib/ext2fs/read_bb_file.c b/lib/ext2fs/read_bb_file.c
> index 7d7bb7a..4a498d2 100644
> --- a/lib/ext2fs/read_bb_file.c
> +++ b/lib/ext2fs/read_bb_file.c
> @@ -39,7 +39,7 @@ errcode_t ext2fs_read_bb_FILE2(ext2_filsys fs, FILE *f,
>  					       void *priv_data))
>  {
>  	errcode_t	retval;
> -	blk_t		blockno;
> +	blk64_t		blockno;
>  	int		count;
>  	char		buf[128];
>  
> @@ -55,9 +55,12 @@ errcode_t ext2fs_read_bb_FILE2(ext2_filsys fs, FILE *f,
>  	while (!feof (f)) {
>  		if (fgets(buf, sizeof(buf), f) == NULL)
>  			break;
> -		count = sscanf(buf, "%u", &blockno);
> +		count = sscanf(buf, "%llu", &blockno);
>  		if (count <= 0)
>  			continue;
> +		/* Badblocks isn't going to be updated for 64bit */
> +		if (blockno > 1ULL << 32)

1ULL << 32 is not a 32-bit number. You need

	if (blockno >= 1ULL << 32)

or

	if (blockno > (1ULL << 32) - 1)

or better yet, use UINT32_MAX from stdint.h
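
A quick sketch of the boundary case, assuming nothing beyond stdint.h. The two function names are made up for the illustration.

```c
#include <stdint.h>

/* The comparison as posted: 2^32 itself is let through. */
static int rejected_as_posted(uint64_t blockno)
{
	return blockno > 1ULL << 32;
}

/* The UINT32_MAX form: rejects anything that needs more than 32 bits. */
static int rejected_fixed(uint64_t blockno)
{
	return blockno > UINT32_MAX;
}
```

The only value on which the two disagree is exactly 2^32, which does not fit in a 32-bit block number but passes the posted test.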

> +			return EOVERFLOW;
>  		if (fs &&
>  		    ((blockno < fs->super->s_first_data_block) ||
>  		     (blockno >= ext2fs_blocks_count(fs->super)))) {
> diff --git a/misc/badblocks.c b/misc/badblocks.c
> index c9e47c7..802080c 100644
> --- a/misc/badblocks.c
> +++ b/misc/badblocks.c
> @@ -1047,6 +1047,7 @@ int main (int argc, char ** argv)
>  				  unsigned int);
>  	int open_flag;
>  	long sysval;
> +	blk64_t inblk;
>  
>  	setbuf(stdout, NULL);
>  	setbuf(stderr, NULL);
> @@ -1204,6 +1205,13 @@ int main (int argc, char ** argv)
>  		     (unsigned long) first_block, (unsigned long) last_block);
>  	    exit (1);
>  	}
> +	/* ext2 badblocks file can't handle large values */
> +	if ((blk64_t)last_block >= 1ULL << 32) {
> +		com_err(program_name, EOVERFLOW,
> +			_("invalid end block (%lu): must be less than %llu"),
> +			(unsigned long)last_block, 1ULL << 32);
> +		exit(1);
> +	}
>  	if (w_flag)
>  		check_mount(device_name);
>  
> @@ -1262,13 +1270,20 @@ int main (int argc, char ** argv)
>  
>  	if (in) {
>  		for(;;) {
> -			switch(fscanf (in, "%u\n", &next_bad)) {
> +			switch (fscanf(in, "%llu\n", &inblk)) {
>  				case 0:
>  					com_err (program_name, 0, "input file - bad format");
>  					exit (1);
>  				case EOF:
>  					break;
>  				default:
> +					if (inblk > 1ULL << 32) {

same here

> +						com_err(program_name,
> +							EOVERFLOW,
> +							_("while adding to in-memory bad block list"));
> +						exit(1);
> +					}
> +					next_bad = inblk;
>  					errcode = ext2fs_badblocks_list_add(bb_list,next_bad);
>  					if (errcode) {
>  						com_err (program_name, errcode, _("while adding to in-memory bad block list"));
> 

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 01/25] libext2fs: stop iterating dirents when done linking
  2013-10-18  4:49 ` [PATCH 01/25] libext2fs: stop iterating dirents when done linking Darrick J. Wong
@ 2013-10-23 23:39   ` Theodore Ts'o
  0 siblings, 0 replies; 73+ messages in thread
From: Theodore Ts'o @ 2013-10-23 23:39 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-ext4

On Thu, Oct 17, 2013 at 09:49:01PM -0700, Darrick J. Wong wrote:
> When we've successfully linked an inode into a directory, we can stop
> iterating the directory.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Applied, thanks.

					- Ted

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 03/25] mke2fs: don't let resize= turn on resize_inode when meta_bg is set
  2013-10-18  4:49 ` [PATCH 03/25] mke2fs: don't let resize= turn on resize_inode when meta_bg is set Darrick J. Wong
  2013-10-23 15:08   ` Lukáš Czerner
@ 2013-10-23 23:40   ` Theodore Ts'o
  1 sibling, 0 replies; 73+ messages in thread
From: Theodore Ts'o @ 2013-10-23 23:40 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-ext4

Here's a better way to fix this.  Thanks for pointing out this
problem!

					- Ted

From cecfb4c04227dd5803c24b311d92a80e91b7b380 Mon Sep 17 00:00:00 2001
From: Theodore Ts'o <tytso@mit.edu>
Date: Thu, 17 Oct 2013 21:49:16 -0700
Subject: [PATCH] mke2fs: don't let resize= turn on resize_inode when meta_bg
 is set

Passing the "-E resize=NNN" option to mke2fs sets the resize_inode
feature.  However, resize_inode and meta_bg are mutually exclusive;
unfortunately, we check this constraint before we parse the extended
options.  Fix this by moving this check after the call to
parse_extended_opts().

Reported-by: "Darrick J. Wong" <darrick.wong@oracle.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
---
 misc/mke2fs.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/misc/mke2fs.c b/misc/mke2fs.c
index cc06a97..64d923a 100644
--- a/misc/mke2fs.c
+++ b/misc/mke2fs.c
@@ -1797,15 +1797,6 @@ profile_error:
 		fs_param.s_feature_ro_compat = 0;
  	}
 
-	if ((fs_param.s_feature_incompat & EXT2_FEATURE_INCOMPAT_META_BG) &&
-	    (fs_param.s_feature_compat & EXT2_FEATURE_COMPAT_RESIZE_INODE)) {
-		fprintf(stderr, _("The resize_inode and meta_bg features "
-				  "are not compatible.\n"
-				  "They can not be both enabled "
-				  "simultaneously.\n"));
-		exit(1);
-	}

^ permalink raw reply related	[flat|nested] 73+ messages in thread

* Re: [PATCH 04/25] libext2fs: reject 64bit badblocks numbers
  2013-10-23 15:24   ` Lukáš Czerner
@ 2013-10-23 23:58     ` Theodore Ts'o
  2013-10-24 11:40       ` Lukáš Czerner
  0 siblings, 1 reply; 73+ messages in thread
From: Theodore Ts'o @ 2013-10-23 23:58 UTC (permalink / raw)
  To: Lukáš Czerner; +Cc: Darrick J. Wong, linux-ext4

On Wed, Oct 23, 2013 at 05:24:00PM +0200, Lukáš Czerner wrote:
> 
> 1ULL << 32 is not a 32-bit number. You need
> 
> 	if (blockno >= 1ULL << 32)

I fixed up this patch like this (which should also be easier for the
compiler to optimize).

						- Ted

From d87f198ca3250c9dff6a4002cd2bbbb5ab6f113a Mon Sep 17 00:00:00 2001
From: "Darrick J. Wong" <darrick.wong@oracle.com>
Date: Wed, 23 Oct 2013 19:43:32 -0400
Subject: [PATCH] libext2fs: reject 64bit badblocks numbers

Don't accept block numbers larger than 2^32 for the badblocks list,
and don't run badblocks on them either.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
---
 lib/ext2fs/read_bb_file.c |  7 +++++--
 misc/badblocks.c          | 21 ++++++++++++++++++---
 2 files changed, 23 insertions(+), 5 deletions(-)

diff --git a/lib/ext2fs/read_bb_file.c b/lib/ext2fs/read_bb_file.c
index 7d7bb7a..8d1ad1a 100644
--- a/lib/ext2fs/read_bb_file.c
+++ b/lib/ext2fs/read_bb_file.c
@@ -39,7 +39,7 @@ errcode_t ext2fs_read_bb_FILE2(ext2_filsys fs, FILE *f,
 					       void *priv_data))
 {
 	errcode_t	retval;
-	blk_t		blockno;
+	blk64_t		blockno;
 	int		count;
 	char		buf[128];
 
@@ -55,9 +55,12 @@ errcode_t ext2fs_read_bb_FILE2(ext2_filsys fs, FILE *f,
 	while (!feof (f)) {
 		if (fgets(buf, sizeof(buf), f) == NULL)
 			break;
-		count = sscanf(buf, "%u", &blockno);
+		count = sscanf(buf, "%llu", &blockno);
 		if (count <= 0)
 			continue;
+		/* Badblocks isn't going to be updated for 64bit */
+		if (blockno >> 32)
+			return EOVERFLOW;
 		if (fs &&
 		    ((blockno < fs->super->s_first_data_block) ||
 		     (blockno >= ext2fs_blocks_count(fs->super)))) {
diff --git a/misc/badblocks.c b/misc/badblocks.c
index c9e47c7..432c17b 100644
--- a/misc/badblocks.c
+++ b/misc/badblocks.c
@@ -1047,6 +1047,7 @@ int main (int argc, char ** argv)
 				  unsigned int);
 	int open_flag;
 	long sysval;
+	blk64_t inblk;
 
 	setbuf(stdout, NULL);
 	setbuf(stderr, NULL);
@@ -1200,10 +1201,17 @@ int main (int argc, char ** argv)
 		first_block = parse_uint(argv[optind], _("first block"));
 	} else first_block = 0;
 	if (first_block >= last_block) {
-	    com_err (program_name, 0, _("invalid starting block (%lu): must be less than %lu"),
-		     (unsigned long) first_block, (unsigned long) last_block);
+	    com_err (program_name, 0, _("invalid starting block (%llu): must be less than %llu"),
+		     first_block, last_block);
 	    exit (1);
 	}
+	/* ext2 badblocks file can't handle large values */
+	if (last_block >> 32) {
+		com_err(program_name, EOVERFLOW,
+			_("invalid end block (%llu): must be 32-bit value"),
+			last_block);
+		exit(1);
+	}
 	if (w_flag)
 		check_mount(device_name);
 
@@ -1262,13 +1270,20 @@ int main (int argc, char ** argv)
 
 	if (in) {
 		for(;;) {
-			switch(fscanf (in, "%u\n", &next_bad)) {
+			switch (fscanf(in, "%llu\n", &inblk)) {
 				case 0:
 					com_err (program_name, 0, "input file - bad format");
 					exit (1);
 				case EOF:
 					break;
 				default:
+					if (inblk >> 32) {
+						com_err(program_name,
+							EOVERFLOW,
+							_("while adding to in-memory bad block list"));
+						exit(1);
+					}
+					next_bad = inblk;
 					errcode = ext2fs_badblocks_list_add(bb_list,next_bad);
 					if (errcode) {
 						com_err (program_name, errcode, _("while adding to in-memory bad block list"));
-- 
1.7.12.rc0.22.gcdd159b


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* Re: [PATCH 05/25] libext2fs: don't overflow when punching indirect blocks with large blocks
  2013-10-18  4:49 ` [PATCH 05/25] libext2fs: don't overflow when punching indirect blocks with large blocks Darrick J. Wong
@ 2013-10-24  0:08   ` Theodore Ts'o
  2013-12-04  4:40     ` Darrick J. Wong
  0 siblings, 1 reply; 73+ messages in thread
From: Theodore Ts'o @ 2013-10-24  0:08 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-ext4

On Thu, Oct 17, 2013 at 09:49:28PM -0700, Darrick J. Wong wrote:
> On a FS with a rather large blocksize (> 4K), the old block map
> structure can construct a fat enough "tree" (or whatever we call that
> lopsided thing) that (at least in theory) one could create mappings
> for logical blocks higher than 32 bits.  In practice this doesn't
> happen, but the 'max' and 'iter' variables that the punch helpers use
> will overflow because the BLOCK_SIZE_BITS shifts are too large to fit
> a 32-bit variable.  This causes punch to fail on TIND-mapped blocks
> even if the file is < 16T.  So enlarge the fields to fit.

Hmm.... this brings up the question of whether we should support
inodes that have indirect block maps that result in mappings for
logical blocks > 32-bits.  There is probably a lot of code that
assumes that the logical block number is 32-bits that will break
horribly.

So this brings up a couple of different questions.

#1) Does e2fsck notice, and does it complain if it trips against one
of these?

#2) What should e2fsprogs do when it comes across one of these inodes?
It may be that simply returning an error is enough, once we notice
that it has blocks larger than this.  Would it be cleaner and more
efficient for the punch code to simply make sure that it stops before
the logical block number overflows?  64-bit variables have a cost,
especially on 32-bit machines.

					- Ted
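
As a rough check on where the 32-bit limit gets crossed, here is a sketch that assumes only the classic layout (12 direct slots and 4-byte block pointers). The function names are invented for the illustration and are not from libext2fs.

```c
#include <stdint.h>

/*
 * Number of logical blocks addressable through the classic block map:
 * 12 direct slots plus the single, double and triple indirect trees,
 * with 4-byte block pointers.
 */
static uint64_t blockmap_reach(uint32_t block_size)
{
	uint64_t n = block_size / 4;	/* pointers per indirect block */

	return 12 + n + n * n + n * n * n;
}

/* Returns 1 if a 32-bit logical block number cannot cover the whole map. */
static int reach_exceeds_32bit(uint32_t block_size)
{
	return blockmap_reach(block_size) > (1ULL << 32);
}
```

With 4K blocks the map tops out at 1074791436 logical blocks, comfortably inside 32 bits.  From 8K blocks upward the triple indirect tree alone spans 2^33 blocks, so the intermediate values no longer fit a 32-bit variable.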

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 04/25] libext2fs: reject 64bit badblocks numbers
  2013-10-23 23:58     ` Theodore Ts'o
@ 2013-10-24 11:40       ` Lukáš Czerner
  0 siblings, 0 replies; 73+ messages in thread
From: Lukáš Czerner @ 2013-10-24 11:40 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: Darrick J. Wong, linux-ext4


On Wed, 23 Oct 2013, Theodore Ts'o wrote:

> Date: Wed, 23 Oct 2013 19:58:10 -0400
> From: Theodore Ts'o <tytso@mit.edu>
> To: Lukáš Czerner <lczerner@redhat.com>
> Cc: Darrick J. Wong <darrick.wong@oracle.com>, linux-ext4@vger.kernel.org
> Subject: Re: [PATCH 04/25] libext2fs: reject 64bit badblocks numbers
> 
> On Wed, Oct 23, 2013 at 05:24:00PM +0200, Lukáš Czerner wrote:
> > 
> > 1ULL << 32 is not a 32-bit number. You need
> > 
> > 	if (blockno >= 1ULL << 32)
> 
> I fixed up this patch like this (which should also be easier for the
> compiler to optimize).

I've just noticed something below...

> 
> 						- Ted
> 
> From d87f198ca3250c9dff6a4002cd2bbbb5ab6f113a Mon Sep 17 00:00:00 2001
> From: "Darrick J. Wong" <darrick.wong@oracle.com>
> Date: Wed, 23 Oct 2013 19:43:32 -0400
> Subject: [PATCH] libext2fs: reject 64bit badblocks numbers
> 
> Don't accept block numbers larger than 2^32 for the badblocks list,
> and don't run badblocks on them either.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
> ---
>  lib/ext2fs/read_bb_file.c |  7 +++++--
>  misc/badblocks.c          | 21 ++++++++++++++++++---
>  2 files changed, 23 insertions(+), 5 deletions(-)
> 
> diff --git a/lib/ext2fs/read_bb_file.c b/lib/ext2fs/read_bb_file.c
> index 7d7bb7a..8d1ad1a 100644
> --- a/lib/ext2fs/read_bb_file.c
> +++ b/lib/ext2fs/read_bb_file.c
> @@ -39,7 +39,7 @@ errcode_t ext2fs_read_bb_FILE2(ext2_filsys fs, FILE *f,
>  					       void *priv_data))
>  {
>  	errcode_t	retval;
> -	blk_t		blockno;
> +	blk64_t		blockno;
>  	int		count;
>  	char		buf[128];
>  
> @@ -55,9 +55,12 @@ errcode_t ext2fs_read_bb_FILE2(ext2_filsys fs, FILE *f,
>  	while (!feof (f)) {
>  		if (fgets(buf, sizeof(buf), f) == NULL)
>  			break;
> -		count = sscanf(buf, "%u", &blockno);
> +		count = sscanf(buf, "%llu", &blockno);
>  		if (count <= 0)
>  			continue;
> +		/* Badblocks isn't going to be updated for 64bit */
> +		if (blockno >> 32)
> +			return EOVERFLOW;
>  		if (fs &&
>  		    ((blockno < fs->super->s_first_data_block) ||
>  		     (blockno >= ext2fs_blocks_count(fs->super)))) {
> diff --git a/misc/badblocks.c b/misc/badblocks.c
> index c9e47c7..432c17b 100644
> --- a/misc/badblocks.c
> +++ b/misc/badblocks.c
> @@ -1047,6 +1047,7 @@ int main (int argc, char ** argv)
>  				  unsigned int);
>  	int open_flag;
>  	long sysval;
> +	blk64_t inblk;
>  
>  	setbuf(stdout, NULL);
>  	setbuf(stderr, NULL);
> @@ -1200,10 +1201,17 @@ int main (int argc, char ** argv)
>  		first_block = parse_uint(argv[optind], _("first block"));
>  	} else first_block = 0;
>  	if (first_block >= last_block) {
> -	    com_err (program_name, 0, _("invalid starting block (%lu): must be less than %lu"),
> -		     (unsigned long) first_block, (unsigned long) last_block);
> +	    com_err (program_name, 0, _("invalid starting block (%llu): must be less than %llu"),
> +		     first_block, last_block);
>  	    exit (1);
>  	}
> +	/* ext2 badblocks file can't handle large values */
> +	if (last_block >> 32) {

last_block can be obtained using ext2fs_get_device_size2(), so it's
not really a "last block" but rather a "number of blocks".  In that
case we might run into a problem with file systems that are exactly
16TB in size, where it should theoretically be possible to use
badblocks.

Thanks!
-Lukas
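
The arithmetic behind this, as a sketch that assumes 4 KiB blocks.  Both helpers are hypothetical, and rejected_by_index() is only one possible way to treat the value as a count; it is not a proposed patch.

```c
#include <stdint.h>

/* The check as posted, applied to whatever ends up in last_block. */
static int rejected_count(uint64_t last_block)
{
	return (last_block >> 32) != 0;
}

/*
 * Hypothetical alternative for when the value is really a block count:
 * test the highest block index (count - 1) instead of the count.
 */
static int rejected_by_index(uint64_t block_count)
{
	return block_count != 0 && ((block_count - 1) >> 32) != 0;
}
```

A 16 TiB device with 4 KiB blocks has exactly 2^32 blocks.  The count needs 33 bits, so the posted check refuses it, even though the highest block index is 2^32 - 1 and still fits in 32 bits.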


> +		com_err(program_name, EOVERFLOW,
> +			_("invalid end block (%llu): must be 32-bit value"),
> +			last_block);
> +		exit(1);
> +	}
>  	if (w_flag)
>  		check_mount(device_name);
>  
> @@ -1262,13 +1270,20 @@ int main (int argc, char ** argv)
>  
>  	if (in) {
>  		for(;;) {
> -			switch(fscanf (in, "%u\n", &next_bad)) {
> +			switch (fscanf(in, "%llu\n", &inblk)) {
>  				case 0:
>  					com_err (program_name, 0, "input file - bad format");
>  					exit (1);
>  				case EOF:
>  					break;
>  				default:
> +					if (inblk >> 32) {
> +						com_err(program_name,
> +							EOVERFLOW,
> +							_("while adding to in-memory bad block list"));
> +						exit(1);
> +					}
> +					next_bad = inblk;
>  					errcode = ext2fs_badblocks_list_add(bb_list,next_bad);
>  					if (errcode) {
>  						com_err (program_name, errcode, _("while adding to in-memory bad block list"));
> 

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 06/25] libext2fs: fix tests that set LARGE_FILE
  2013-10-18  4:49 ` [PATCH 06/25] libext2fs: fix tests that set LARGE_FILE Darrick J. Wong
@ 2013-11-25  7:09   ` Zheng Liu
  2013-11-25 17:57     ` Darrick J. Wong
  0 siblings, 1 reply; 73+ messages in thread
From: Zheng Liu @ 2013-11-25  7:09 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: tytso, linux-ext4

On Thu, Oct 17, 2013 at 09:49:35PM -0700, Darrick J. Wong wrote:
> For each site where we test for a large file (> 2GB) and set the
> LARGE_FILE feature, use a helper function to make the size test
> consistent with the test that's in e2fsck.  This fixes the fsck
> complaints when we try to create a 2GB journal (not so hard with 64k
> block size) and fixes the incorrect test in fileio.c.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

In e2fsck/pass2.c there is also a place that needs to be fixed.
Otherwise the patch looks good to me.
Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>

                                                - Zheng

diff --git a/e2fsck/pass2.c b/e2fsck/pass2.c
index 3c0bf49..66ed665 100644
--- a/e2fsck/pass2.c
+++ b/e2fsck/pass2.c
@@ -1318,7 +1318,8 @@ static void deallocate_inode(e2fsck_t ctx, ext2_ino_t ino, char* block_buf)
 	if (!ext2fs_inode_has_valid_blocks2(fs, &inode))
 		goto clear_inode;
 
-	if (LINUX_S_ISREG(inode.i_mode) && EXT2_I_SIZE(&inode) >= 0x80000000UL)
+	if (LINUX_S_ISREG(inode.i_mode) &&
+	    ext2fs_needs_large_file_feature(EXT2_I_SIZE(&inode)))
 		ctx->large_files--;
 
 	del_block.ctx = ctx;
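
For reference, a small sketch comparing the helper's threshold with the test that fileio.c used before this patch.  The helper body is the one from the quoted ext2fs.h hunk, here with a fixed-width argument; the function names are shortened for the illustration.

```c
#include <stdint.h>

/* Same comparison as ext2fs_needs_large_file_feature() in the patch. */
static int needs_large_file(uint64_t file_size)
{
	return file_size >= 0x80000000ULL;
}

/* The old fileio.c test: the constant has only seven hex digits. */
static int old_fileio_test(uint64_t file_size)
{
	return file_size > 0x7FFFFFFULL;
}
```

The old constant is 2^27 - 1, so the old test fired for any regular file of 128 MiB or more, whereas the helper fires only at 2 GiB and above.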

> ---
>  e2fsck/pass1.c         |    3 ++-
>  lib/ext2fs/ext2fs.h    |    6 ++++++
>  lib/ext2fs/fileio.c    |    2 +-
>  lib/ext2fs/mkjournal.c |    2 +-
>  4 files changed, 10 insertions(+), 3 deletions(-)
> 
> 
> diff --git a/e2fsck/pass1.c b/e2fsck/pass1.c
> index ab23e42..8c18a93 100644
> --- a/e2fsck/pass1.c
> +++ b/e2fsck/pass1.c
> @@ -2281,7 +2281,8 @@ static void check_blocks(e2fsck_t ctx, struct problem_context *pctx,
>  		}
>  		pctx->num = 0;
>  	}
> -	if (LINUX_S_ISREG(inode->i_mode) && EXT2_I_SIZE(inode) >= 0x80000000UL)
> +	if (LINUX_S_ISREG(inode->i_mode) &&
> +	    ext2fs_needs_large_file_feature(EXT2_I_SIZE(inode)))
>  		ctx->large_files++;
>  	if ((pb.num_blocks != ext2fs_inode_i_blocks(fs, inode)) ||
>  	    ((fs->super->s_feature_ro_compat &
> diff --git a/lib/ext2fs/ext2fs.h b/lib/ext2fs/ext2fs.h
> index 1ef4d67..8f82dae 100644
> --- a/lib/ext2fs/ext2fs.h
> +++ b/lib/ext2fs/ext2fs.h
> @@ -646,6 +646,12 @@ static inline int ext2fs_has_group_desc_csum(ext2_filsys fs)
>  			EXT4_FEATURE_RO_COMPAT_METADATA_CSUM);
>  }
>  
> +/* The LARGE_FILE feature should be set if we have stored files 2GB+ in size */
> +static inline int ext2fs_needs_large_file_feature(unsigned long long file_size)
> +{
> +	return file_size >= 0x80000000ULL;
> +}
> +
>  /* alloc.c */
>  extern errcode_t ext2fs_new_inode(ext2_filsys fs, ext2_ino_t dir, int mode,
>  				  ext2fs_inode_bitmap map, ext2_ino_t *ret);
> diff --git a/lib/ext2fs/fileio.c b/lib/ext2fs/fileio.c
> index 02e6263..6b213b5 100644
> --- a/lib/ext2fs/fileio.c
> +++ b/lib/ext2fs/fileio.c
> @@ -400,7 +400,7 @@ errcode_t ext2fs_file_set_size2(ext2_file_t file, ext2_off64_t size)
>  
>  	/* If we're writing a large file, set the large_file flag */
>  	if (LINUX_S_ISREG(file->inode.i_mode) &&
> -	    EXT2_I_SIZE(&file->inode) > 0x7FFFFFFULL &&
> +	    ext2fs_needs_large_file_feature(EXT2_I_SIZE(&file->inode)) &&
>  	    (!EXT2_HAS_RO_COMPAT_FEATURE(file->fs->super,
>  					 EXT2_FEATURE_RO_COMPAT_LARGE_FILE) ||
>  	     file->fs->super->s_rev_level == EXT2_GOOD_OLD_REV)) {
> diff --git a/lib/ext2fs/mkjournal.c b/lib/ext2fs/mkjournal.c
> index c636a97..2afd3b7 100644
> --- a/lib/ext2fs/mkjournal.c
> +++ b/lib/ext2fs/mkjournal.c
> @@ -378,7 +378,7 @@ static errcode_t write_journal_inode(ext2_filsys fs, ext2_ino_t journal_ino,
>  	inode_size = (unsigned long long)fs->blocksize * num_blocks;
>  	inode.i_size = inode_size & 0xFFFFFFFF;
>  	inode.i_size_high = (inode_size >> 32) & 0xFFFFFFFF;
> -	if (inode.i_size_high)
> +	if (ext2fs_needs_large_file_feature(inode_size))
>  		fs->super->s_feature_ro_compat |=
>  			EXT2_FEATURE_RO_COMPAT_LARGE_FILE;
>  	ext2fs_iblk_add_blocks(fs, &inode, es.newblocks);
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 73+ messages in thread

* Re: [PATCH 07/25] mke2fs: load configfile blocksize setting before 64bit checks
  2013-10-18  4:49 ` [PATCH 07/25] mke2fs: load configfile blocksize setting before 64bit checks Darrick J. Wong
@ 2013-11-25  8:01   ` Zheng Liu
  0 siblings, 0 replies; 73+ messages in thread
From: Zheng Liu @ 2013-11-25  8:01 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: tytso, linux-ext4

On Thu, Oct 17, 2013 at 09:49:42PM -0700, Darrick J. Wong wrote:
> mke2fs has a series of checks to ensure that we don't create a
> filesystem too big for its blocksize -- if auto-64bit is on, then it
> turns on 64bit; otherwise it complains.  Unfortunately, it performs
> these checks before looking in mke2fs.conf for a blocksize, which
> means that the checks are incorrect if the user specifies a non-4096
> blocksize in the config file and says nothing on the command line.  It
> also has the effect of mandating a 4k block size on any block device
> larger than 4T in that situation.  Therefore, read the block size from
> the config file before performing the 64bit checks.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>

                                                - Zheng

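The unit change described in the new comment can be illustrated with a
small standalone sketch.  It assumes that fs_blocks_count starts out in
1KB units and that the 32-bit limit is 2^32 - 1; the names
SKETCH_MAX_32_NUM, sketch_scale_blocks_count and sketch_needs_64bit are
hypothetical and do not appear in mke2fs.c.

```c
#include <stdint.h>

/* Assumed 32-bit block count limit, standing in for MAX_32_NUM. */
#define SKETCH_MAX_32_NUM ((1ULL << 32) - 1)

/* Convert a count of 1KB units into a count of blocksize-byte blocks. */
static inline uint64_t sketch_scale_blocks_count(uint64_t count_1k,
						 unsigned int blocksize)
{
	return count_1k / (blocksize / 1024);
}

/* Nonzero if the scaled block count does not fit in 32 bits. */
static inline int sketch_needs_64bit(uint64_t count_1k,
				     unsigned int blocksize)
{
	return sketch_scale_blocks_count(count_1k, blocksize) >
		SKETCH_MAX_32_NUM;
}
```

For an 8TB device (2^33 units of 1KB), a configured block size of 1024
leaves 2^33 blocks and trips the check, whereas 4096 gives 2^31 blocks
and does not.  That is why the block size has to be known before the
comparison is made.
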
> ---
>  misc/mke2fs.c |  132 ++++++++++++++++++++++++++++++---------------------------
>  1 file changed, 70 insertions(+), 62 deletions(-)
> 
> 
> diff --git a/misc/mke2fs.c b/misc/mke2fs.c
> index ce3c696..86091d7 100644
> --- a/misc/mke2fs.c
> +++ b/misc/mke2fs.c
> @@ -1298,6 +1298,21 @@ static void PRS(int argc, char *argv[])
>  	char *		fs_type = 0;
>  	char *		usage_types = 0;
>  	blk64_t		dev_size;
> +	/*
> +	 * NOTE: A few words about fs_blocks_count and blocksize:
> +	 *
> +	 * Initially, blocksize is set to zero, which implies 1024.
> +	 * If -b is specified, blocksize is updated to the user's value.
> +	 *
> +	 * Next, the device size or the user's "blocks" command line argument
> +	 * is used to set fs_blocks_count; the units are blocksize.
> +	 *
> +	 * Later, if blocksize hasn't been set and the profile specifies a
> +	 * blocksize, then blocksize is updated and fs_blocks_count is scaled
> +	 * appropriately.  Note the change in units!
> +	 *
> +	 * Finally, we complain about fs_blocks_count > 2^32 on a non-64bit fs.
> +	 */
>  	blk64_t		fs_blocks_count = 0;
>  #ifdef __linux__
>  	struct 		utsname ut;
> @@ -1780,15 +1795,65 @@ profile_error:
>  		}
>  	}
>  
> +	/* Get the hardware sector sizes, if available */
> +	retval = ext2fs_get_device_sectsize(device_name, &lsector_size);
> +	if (retval) {
> +		com_err(program_name, retval,
> +			_("while trying to determine hardware sector size"));
> +		exit(1);
> +	}
> +	retval = ext2fs_get_device_phys_sectsize(device_name, &psector_size);
> +	if (retval) {
> +		com_err(program_name, retval,
> +			_("while trying to determine physical sector size"));
> +		exit(1);
> +	}
> +
> +	if ((tmp = getenv("MKE2FS_DEVICE_SECTSIZE")) != NULL)
> +		lsector_size = atoi(tmp);
> +	if ((tmp = getenv("MKE2FS_DEVICE_PHYS_SECTSIZE")) != NULL)
> +		psector_size = atoi(tmp);
> +
> +	/* Older kernels may not have physical/logical distinction */
> +	if (!psector_size)
> +		psector_size = lsector_size;
> +
> +	if (blocksize <= 0) {
> +		use_bsize = get_int_from_profile(fs_types, "blocksize", 4096);
> +
> +		if (use_bsize == -1) {
> +			use_bsize = sys_page_size;
> +			if ((linux_version_code < (2*65536 + 6*256)) &&
> +			    (use_bsize > 4096))
> +				use_bsize = 4096;
> +		}
> +		if (lsector_size && use_bsize < lsector_size)
> +			use_bsize = lsector_size;
> +		if ((blocksize < 0) && (use_bsize < (-blocksize)))
> +			use_bsize = -blocksize;
> +		blocksize = use_bsize;
> +		fs_blocks_count /= (blocksize / 1024);
> +	} else {
> +		if (blocksize < lsector_size) {			/* Impossible */
> +			com_err(program_name, EINVAL,
> +				_("while setting blocksize; too small "
> +				  "for device\n"));
> +			exit(1);
> +		} else if ((blocksize < psector_size) &&
> +			   (psector_size <= sys_page_size)) {	/* Suboptimal */
> +			fprintf(stderr, _("Warning: specified blocksize %d is "
> +				"less than device physical sectorsize %d\n"),
> +				blocksize, psector_size);
> +		}
> +	}
> +
> +	fs_param.s_log_block_size =
> +		int_log2(blocksize >> EXT2_MIN_BLOCK_LOG_SIZE);
> +
>  	/*
>  	 * We now need to do a sanity check of fs_blocks_count for
>  	 * 32-bit vs 64-bit block number support.
>  	 */
> -	if ((fs_blocks_count > MAX_32_NUM) && (blocksize == 0)) {
> -		fs_blocks_count /= 4; /* Try using a 4k blocksize */
> -		blocksize = 4096;
> -		fs_param.s_log_block_size = 2;
> -	}
>  	if ((fs_blocks_count > MAX_32_NUM) &&
>  	    !(fs_param.s_feature_incompat & EXT4_FEATURE_INCOMPAT_64BIT) &&
>  	    get_bool_from_profile(fs_types, "auto_64-bit_support", 0)) {
> @@ -1889,63 +1954,6 @@ profile_error:
>  	if ((fs_param.s_feature_incompat & EXT2_FEATURE_INCOMPAT_META_BG) &&
>  	    ((tmp = getenv("MKE2FS_FIRST_META_BG"))))
>  		fs_param.s_first_meta_bg = atoi(tmp);
> -
> -	/* Get the hardware sector sizes, if available */
> -	retval = ext2fs_get_device_sectsize(device_name, &lsector_size);
> -	if (retval) {
> -		com_err(program_name, retval,
> -			_("while trying to determine hardware sector size"));
> -		exit(1);
> -	}
> -	retval = ext2fs_get_device_phys_sectsize(device_name, &psector_size);
> -	if (retval) {
> -		com_err(program_name, retval,
> -			_("while trying to determine physical sector size"));
> -		exit(1);
> -	}
> -
> -	if ((tmp = getenv("MKE2FS_DEVICE_SECTSIZE")) != NULL)
> -		lsector_size = atoi(tmp);
> -	if ((tmp = getenv("MKE2FS_DEVICE_PHYS_SECTSIZE")) != NULL)
> -		psector_size = atoi(tmp);
> -
> -	/* Older kernels may not have physical/logical distinction */
> -	if (!psector_size)
> -		psector_size = lsector_size;
> -
> -	if (blocksize <= 0) {
> -		use_bsize = get_int_from_profile(fs_types, "blocksize", 4096);
> -
> -		if (use_bsize == -1) {
> -			use_bsize = sys_page_size;
> -			if ((linux_version_code < (2*65536 + 6*256)) &&
> -			    (use_bsize > 4096))
> -				use_bsize = 4096;
> -		}
> -		if (lsector_size && use_bsize < lsector_size)
> -			use_bsize = lsector_size;
> -		if ((blocksize < 0) && (use_bsize < (-blocksize)))
> -			use_bsize = -blocksize;
> -		blocksize = use_bsize;
> -		ext2fs_blocks_count_set(&fs_param,
> -					ext2fs_blocks_count(&fs_param) /
> -					(blocksize / 1024));
> -	} else {
> -		if (blocksize < lsector_size) {			/* Impossible */
> -			com_err(program_name, EINVAL,
> -				_("while setting blocksize; too small "
> -				  "for device\n"));
> -			exit(1);
> -		} else if ((blocksize < psector_size) &&
> -			   (psector_size <= sys_page_size)) {	/* Suboptimal */
> -			fprintf(stderr, _("Warning: specified blocksize %d is "
> -				"less than device physical sectorsize %d\n"),
> -				blocksize, psector_size);
> -		}
> -	}
> -
> -	fs_param.s_log_block_size =
> -		int_log2(blocksize >> EXT2_MIN_BLOCK_LOG_SIZE);
>  	if (fs_param.s_feature_ro_compat & EXT4_FEATURE_RO_COMPAT_BIGALLOC) {
>  		if (!cluster_size)
>  			cluster_size = get_int_from_profile(fs_types,
> 

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 08/25] debugfs: fix various minor bogosity
  2013-10-18  4:49 ` [PATCH 08/25] debugfs: fix various minor bogosity Darrick J. Wong
@ 2013-11-25  8:08   ` Zheng Liu
  2013-11-25 18:05     ` Darrick J. Wong
  0 siblings, 1 reply; 73+ messages in thread
From: Zheng Liu @ 2013-11-25  8:08 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: tytso, Darren Hart, linux-ext4, Robert Yang

On Thu, Oct 17, 2013 at 09:49:48PM -0700, Darrick J. Wong wrote:
> We should really use the ext2fs memory allocator functions in
> copy_file(), and we really should return a value if there are
> allocation problems.
> 
> Also fix up a minor bogosity in an error message.
> 
> Cc: Robert Yang <liezhi.yang@windriver.com>
> Cc: Darren Hart <dvhart@linux.intel.com>
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Two more places still need to be fixed.  Otherwise the patch looks good
to me.
Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>

                                                - Zheng

diff --git a/debugfs/debugfs.c b/debugfs/debugfs.c
index d3db356..cc8dd20 100644
--- a/debugfs/debugfs.c
+++ b/debugfs/debugfs.c
@@ -1611,7 +1611,7 @@ static errcode_t copy_file(int fd, ext2_ino_t newfile, int bufsize, int make_hol
 	retval = ext2fs_get_memzero(bufsize, &zero_buf);
 	if (retval) {
 		com_err("copy_file", retval, "can't allocate buffer\n");
-		free(buf);
+		ext2fs_free_mem(&buf);
 		return retval;
 	}
 
@@ -1649,7 +1649,7 @@ static errcode_t copy_file(int fd, ext2_ino_t newfile, int bufsize, int make_hol
 			ptr += written;
 		}
 	}
-	free(buf);
+	ext2fs_free_mem(&buf);
 	ext2fs_free_mem(&zero_buf);
 	retval = ext2fs_file_close(e2_file);
 	return retval;

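The allocate-then-clean-up pattern that copy_file() ends up with can be
sketched without linking against libext2fs.  The sketch_get_mem() and
sketch_free_mem() functions below are hypothetical stand-ins that only
imitate the pointer-to-pointer calling convention used in the hunks
above; they are not the library's implementation.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in allocator: store the new buffer through ptr, return 0 or ENOMEM. */
static inline long sketch_get_mem(unsigned long size, void *ptr)
{
	void *p = malloc(size);

	if (!p)
		return ENOMEM;
	memcpy(ptr, &p, sizeof(p));
	return 0;
}

/* Stand-in free: release the buffer and reset the caller's pointer to NULL. */
static inline long sketch_free_mem(void *ptr)
{
	void *p;

	memcpy(&p, ptr, sizeof(p));
	free(p);
	p = NULL;
	memcpy(ptr, &p, sizeof(p));
	return 0;
}

/* Allocate both buffers; if the second fails, release the first and report. */
static inline long sketch_alloc_pair(unsigned long bufsize,
				     char **buf, char **zero_buf)
{
	long retval = sketch_get_mem(bufsize, buf);

	if (retval)
		return retval;
	retval = sketch_get_mem(bufsize, zero_buf);
	if (retval) {
		sketch_free_mem(buf);
		return retval;
	}
	memset(*zero_buf, 0, bufsize);
	return 0;
}
```

Pairing each allocation with the matching free routine keeps every exit
path consistent, which is the point of the two extra hunks above.
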
> ---
>  debugfs/debugfs.c |   11 ++++++-----
>  1 file changed, 6 insertions(+), 5 deletions(-)
> 
> 
> diff --git a/debugfs/debugfs.c b/debugfs/debugfs.c
> index 4f6108d..d3db356 100644
> --- a/debugfs/debugfs.c
> +++ b/debugfs/debugfs.c
> @@ -1601,9 +1601,10 @@ static errcode_t copy_file(int fd, ext2_ino_t newfile, int bufsize, int make_hol
>  	if (retval)
>  		return retval;
>  
> -	if (!(buf = (char *) malloc(bufsize))){
> -		com_err("copy_file", errno, "can't allocate buffer\n");
> -		return;
> +	retval = ext2fs_get_mem(bufsize, &buf);
> +	if (retval) {
> +		com_err("copy_file", retval, "can't allocate buffer\n");
> +		return retval;
>  	}
>  
>  	/* This is used for checking whether the whole block is zero */
> @@ -1654,7 +1655,7 @@ static errcode_t copy_file(int fd, ext2_ino_t newfile, int bufsize, int make_hol
>  	return retval;
>  
>  fail:
> -	free(buf);
> +	ext2fs_free_mem(&buf);
>  	ext2fs_free_mem(&zero_buf);
>  	(void) ext2fs_file_close(e2_file);
>  	return retval;
> @@ -2112,7 +2113,7 @@ void do_bmap(int argc, char *argv[])
>  
>  	errcode = ext2fs_bmap2(current_fs, ino, 0, 0, 0, blk, 0, &pblk);
>  	if (errcode) {
> -		com_err("argv[0]", errcode,
> +		com_err(argv[0], errcode,
>  			"while mapping logical block %llu\n", blk);
>  		return;
>  	}
> 

^ permalink raw reply related	[flat|nested] 73+ messages in thread

* Re: [PATCH 09/25] e2fsck: teach EA refcounting code to handle 64bit block addresses
  2013-10-18 18:37   ` Darrick J. Wong
@ 2013-11-25  8:18     ` Zheng Liu
  0 siblings, 0 replies; 73+ messages in thread
From: Zheng Liu @ 2013-11-25  8:18 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: tytso, linux-ext4

On Fri, Oct 18, 2013 at 11:37:00AM -0700, Darrick J. Wong wrote:
> On Thu, Oct 17, 2013 at 09:49:55PM -0700, Darrick J. Wong wrote:
> > The extended attribute refcounting code only accepts blk_t, which is
> > dangerous because EA blocks can exist at high addresses (> 2^32) as
> > well.  Therefore, widen the block fields to 64 bits.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>

                                                - Zheng

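To illustrate why a 32-bit key is dangerous here, the following
standalone sketch narrows a 64-bit block number the way a blk_t
parameter would.  The typedefs and function names are hypothetical
stand-ins; it assumes blk_t is 32 bits wide and blk64_t is 64 bits wide.

```c
#include <stdint.h>

typedef uint32_t sketch_blk_t;		/* stands in for blk_t */
typedef uint64_t sketch_blk64_t;	/* stands in for blk64_t */

/* Narrow a 64-bit block number to 32 bits, discarding the high bits. */
static inline sketch_blk_t sketch_narrow(sketch_blk64_t blk)
{
	return (sketch_blk_t)blk;
}

/* Nonzero if narrowing blk to 32 bits would change its value. */
static inline int sketch_blk_truncates(sketch_blk64_t blk)
{
	return (sketch_blk64_t)sketch_narrow(blk) != blk;
}
```

An EA block at 2^32 + 5 narrows to 5, so with 32-bit keys it would share
a refcount entry with the unrelated block 5.
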
> > ---
> >  e2fsck/e2fsck.h      |   12 ++++++------
> >  e2fsck/ea_refcount.c |   36 ++++++++++++++++++------------------
> >  2 files changed, 24 insertions(+), 24 deletions(-)
> > 
> > 
> > diff --git a/e2fsck/e2fsck.h b/e2fsck/e2fsck.h
> > index 13d70f1..f1df525 100644
> > --- a/e2fsck/e2fsck.h
> > +++ b/e2fsck/e2fsck.h
> > @@ -432,17 +432,17 @@ extern struct dx_dir_info *e2fsck_dx_dir_info_iter(e2fsck_t ctx, int *control);
> >  /* ea_refcount.c */
> >  extern errcode_t ea_refcount_create(int size, ext2_refcount_t *ret);
> >  extern void ea_refcount_free(ext2_refcount_t refcount);
> > -extern errcode_t ea_refcount_fetch(ext2_refcount_t refcount, blk_t blk,
> > +extern errcode_t ea_refcount_fetch(ext2_refcount_t refcount, blk64_t blk,
> >  				   int *ret);
> >  extern errcode_t ea_refcount_increment(ext2_refcount_t refcount,
> > -				       blk_t blk, int *ret);
> > +				       blk64_t blk, int *ret);
> >  extern errcode_t ea_refcount_decrement(ext2_refcount_t refcount,
> > -				       blk_t blk, int *ret);
> > +				       blk64_t blk, int *ret);
> >  extern errcode_t ea_refcount_store(ext2_refcount_t refcount,
> > -				   blk_t blk, int count);
> > -extern blk_t ext2fs_get_refcount_size(ext2_refcount_t refcount);
> > +				   blk64_t blk, int count);
> > +extern blk64_t ext2fs_get_refcount_size(ext2_refcount_t refcount);
> >  extern void ea_refcount_intr_begin(ext2_refcount_t refcount);
> > -extern blk_t ea_refcount_intr_next(ext2_refcount_t refcount, int *ret);
> > +extern blk64_t ea_refcount_intr_next(ext2_refcount_t refcount, int *ret);
> >  
> >  /* ehandler.c */
> >  extern const char *ehandler_operation(const char *op);
> > diff --git a/e2fsck/ea_refcount.c b/e2fsck/ea_refcount.c
> > index e66e636..6f376a3 100644
> > --- a/e2fsck/ea_refcount.c
> > +++ b/e2fsck/ea_refcount.c
> > @@ -25,14 +25,14 @@
> >   * checked, its bit is set in the block_ea_map bitmap.
> >   */
> >  struct ea_refcount_el {
> > -	blk_t	ea_blk;
> > +	blk64_t	ea_blk;
> >  	int	ea_count;
> >  };
> >  
> >  struct ea_refcount {
> > -	blk_t		count;
> > -	blk_t		size;
> > -	blk_t		cursor;
> > +	unsigned long		count;
> > +	unsigned long		size;
> > +	unsigned long		cursor;
> 
> This (unsigned long instead of blk_t) is the only thing that changed since last
> time.
> 
> --D
> 
> >  	struct ea_refcount_el	*list;
> >  };
> >  
> > @@ -111,11 +111,11 @@ static void refcount_collapse(ext2_refcount_t refcount)
> >   * 	specified position.
> >   */
> >  static struct ea_refcount_el *insert_refcount_el(ext2_refcount_t refcount,
> > -						 blk_t blk, int pos)
> > +						 blk64_t blk, int pos)
> >  {
> >  	struct ea_refcount_el 	*el;
> >  	errcode_t		retval;
> > -	blk_t			new_size = 0;
> > +	blk64_t			new_size = 0;
> >  	int			num;
> >  
> >  	if (refcount->count >= refcount->size) {
> > @@ -153,7 +153,7 @@ static struct ea_refcount_el *insert_refcount_el(ext2_refcount_t refcount,
> >   * 	and we can't find an entry, create one in the sorted list.
> >   */
> >  static struct ea_refcount_el *get_refcount_el(ext2_refcount_t refcount,
> > -					      blk_t blk, int create)
> > +					      blk64_t blk, int create)
> >  {
> >  	int	low, high, mid;
> >  
> > @@ -206,7 +206,7 @@ retry:
> >  	return 0;
> >  }
> >  
> > -errcode_t ea_refcount_fetch(ext2_refcount_t refcount, blk_t blk,
> > +errcode_t ea_refcount_fetch(ext2_refcount_t refcount, blk64_t blk,
> >  				int *ret)
> >  {
> >  	struct ea_refcount_el	*el;
> > @@ -220,7 +220,7 @@ errcode_t ea_refcount_fetch(ext2_refcount_t refcount, blk_t blk,
> >  	return 0;
> >  }
> >  
> > -errcode_t ea_refcount_increment(ext2_refcount_t refcount, blk_t blk, int *ret)
> > +errcode_t ea_refcount_increment(ext2_refcount_t refcount, blk64_t blk, int *ret)
> >  {
> >  	struct ea_refcount_el	*el;
> >  
> > @@ -234,7 +234,7 @@ errcode_t ea_refcount_increment(ext2_refcount_t refcount, blk_t blk, int *ret)
> >  	return 0;
> >  }
> >  
> > -errcode_t ea_refcount_decrement(ext2_refcount_t refcount, blk_t blk, int *ret)
> > +errcode_t ea_refcount_decrement(ext2_refcount_t refcount, blk64_t blk, int *ret)
> >  {
> >  	struct ea_refcount_el	*el;
> >  
> > @@ -249,7 +249,7 @@ errcode_t ea_refcount_decrement(ext2_refcount_t refcount, blk_t blk, int *ret)
> >  	return 0;
> >  }
> >  
> > -errcode_t ea_refcount_store(ext2_refcount_t refcount, blk_t blk, int count)
> > +errcode_t ea_refcount_store(ext2_refcount_t refcount, blk64_t blk, int count)
> >  {
> >  	struct ea_refcount_el	*el;
> >  
> > @@ -263,7 +263,7 @@ errcode_t ea_refcount_store(ext2_refcount_t refcount, blk_t blk, int count)
> >  	return 0;
> >  }
> >  
> > -blk_t ext2fs_get_refcount_size(ext2_refcount_t refcount)
> > +blk64_t ext2fs_get_refcount_size(ext2_refcount_t refcount)
> >  {
> >  	if (!refcount)
> >  		return 0;
> > @@ -277,7 +277,7 @@ void ea_refcount_intr_begin(ext2_refcount_t refcount)
> >  }
> >  
> >  
> > -blk_t ea_refcount_intr_next(ext2_refcount_t refcount,
> > +blk64_t ea_refcount_intr_next(ext2_refcount_t refcount,
> >  				int *ret)
> >  {
> >  	struct ea_refcount_el	*list;
> > @@ -370,7 +370,7 @@ int main(int argc, char **argv)
> >  	int	i = 0;
> >  	ext2_refcount_t refcount;
> >  	int		size, arg;
> > -	blk_t		blk;
> > +	blk64_t		blk;
> >  	errcode_t	retval;
> >  
> >  	while (1) {
> > @@ -394,7 +394,7 @@ int main(int argc, char **argv)
> >  			printf("Freeing refcount\n");
> >  			break;
> >  		case BCODE_STORE:
> > -			blk = (blk_t) bcode_program[i++];
> > +			blk = (blk64_t) bcode_program[i++];
> >  			arg = bcode_program[i++];
> >  			printf("Storing blk %u with value %d\n", blk, arg);
> >  			retval = ea_refcount_store(refcount, blk, arg);
> > @@ -403,7 +403,7 @@ int main(int argc, char **argv)
> >  					"while storing blk %u", blk);
> >  			break;
> >  		case BCODE_FETCH:
> > -			blk = (blk_t) bcode_program[i++];
> > +			blk = (blk64_t) bcode_program[i++];
> >  			retval = ea_refcount_fetch(refcount, blk, &arg);
> >  			if (retval)
> >  				com_err("ea_refcount_fetch", retval,
> > @@ -413,7 +413,7 @@ int main(int argc, char **argv)
> >  				       blk, arg);
> >  			break;
> >  		case BCODE_INCR:
> > -			blk = (blk_t) bcode_program[i++];
> > +			blk = (blk64_t) bcode_program[i++];
> >  			retval = ea_refcount_increment(refcount, blk, &arg);
> >  			if (retval)
> >  				com_err("ea_refcount_increment", retval,
> > @@ -423,7 +423,7 @@ int main(int argc, char **argv)
> >  				       blk, arg);
> >  			break;
> >  		case BCODE_DECR:
> > -			blk = (blk_t) bcode_program[i++];
> > +			blk = (blk64_t) bcode_program[i++];
> >  			retval = ea_refcount_decrement(refcount, blk, &arg);
> >  			if (retval)
> >  				com_err("ea_refcount_decrement", retval,
> > 

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 10/25] debugfs: handle 64bit block numbers
  2013-10-18  4:50 ` [PATCH 10/25] debugfs: handle 64bit block numbers Darrick J. Wong
  2013-10-18 18:47   ` Darrick J. Wong
@ 2013-11-25  8:33   ` Zheng Liu
  2013-11-25 17:49     ` Darrick J. Wong
  1 sibling, 1 reply; 73+ messages in thread
From: Zheng Liu @ 2013-11-25  8:33 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: tytso, linux-ext4

On Thu, Oct 17, 2013 at 09:50:01PM -0700, Darrick J. Wong wrote:
> debugfs should use strtoull wrappers for reading block numbers from
> the command line.  "unsigned long" isn't wide enough to hold 64bit
> block numbers on 32bit platforms.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  debugfs/debugfs.c      |   33 ++++++++++++++++++++++-----------
>  debugfs/extent_inode.c |   22 +++++++++-------------
>  debugfs/util.c         |    2 +-
>  3 files changed, 32 insertions(+), 25 deletions(-)
> 
> 
> diff --git a/debugfs/debugfs.c b/debugfs/debugfs.c
> index d3db356..46fcd07 100644
> --- a/debugfs/debugfs.c
> +++ b/debugfs/debugfs.c
> @@ -181,8 +181,7 @@ void do_open_filesys(int argc, char **argv)
>  				return;
>  			break;
>  		case 's':
> -			superblock = parse_ulong(optarg, argv[0],
> -						 "superblock number", &err);
> +			err = strtoblk(argv[0], optarg, &superblock);
>  			if (err)
>  				return;
>  			break;
> @@ -277,14 +276,17 @@ void do_init_filesys(int argc, char **argv)
>  	struct ext2_super_block param;
>  	errcode_t	retval;
>  	int		err;
> +	blk64_t		blocks;
>  
>  	if (common_args_process(argc, argv, 3, 3, "initialize",
>  				"<device> <blocksize>", CHECK_FS_NOTOPEN))
                                           ^^^^^^^^^
To be honest, I have never used this command in debugfs, so I am a
little confused.  If I understand correctly, the second argument is the
number of blocks, but the usage string here says it is the block size.
Do we need to fix it?

                                                - Zheng

>  		return;
>  
>  	memset(&param, 0, sizeof(struct ext2_super_block));
> -	ext2fs_blocks_count_set(&param, parse_ulong(argv[2], argv[0],
> -						    "blocks count", &err));
> +	err = strtoblk(argv[0], argv[2], &blocks);
> +	if (err)
> +		return;
> +	ext2fs_blocks_count_set(&param, blocks);
>  	if (err)
>  		return;
>  	retval = ext2fs_initialize(argv[1], 0, &param,
> @@ -2109,7 +2111,9 @@ void do_bmap(int argc, char *argv[])
>  	ino = string_to_inode(argv[1]);
>  	if (!ino)
>  		return;
> -	blk = parse_ulong(argv[2], argv[0], "logical_block", &err);
> +	err = strtoblk(argv[0], argv[2], &blk);
> +	if (err)
> +		return;
>  
>  	errcode = ext2fs_bmap2(current_fs, ino, 0, 0, 0, blk, 0, &pblk);
>  	if (errcode) {
> @@ -2254,10 +2258,14 @@ void do_punch(int argc, char *argv[])
>  	ino = string_to_inode(argv[1]);
>  	if (!ino)
>  		return;
> -	start = parse_ulong(argv[2], argv[0], "logical_block", &err);
> -	if (argc == 4)
> -		end = parse_ulong(argv[3], argv[0], "logical_block", &err);
> -	else
> +	err = strtoblk(argv[0], argv[2], &start);
> +	if (err)
> +		return;
> +	if (argc == 4) {
> +		err = strtoblk(argv[0], argv[3], &end);
> +		if (err)
> +			return;
> +	} else
>  		end = ~0;
>  
>  	errcode = ext2fs_punch(current_fs, ino, 0, 0, start, end);
> @@ -2474,8 +2482,11 @@ int main(int argc, char **argv)
>  						"block size", 0);
>  			break;
>  		case 's':
> -			superblock = parse_ulong(optarg, argv[0],
> -						 "superblock number", 0);
> +			retval = strtoblk(argv[0], optarg, &superblock);
> +			if (retval) {
> +				com_err(argv[0], retval, 0, debug_prog_name);
> +				return 1;
> +			}
>  			break;
>  		case 'c':
>  			catastrophic = 1;
> diff --git a/debugfs/extent_inode.c b/debugfs/extent_inode.c
> index 0bbc4c5..75e328c 100644
> --- a/debugfs/extent_inode.c
> +++ b/debugfs/extent_inode.c
> @@ -264,7 +264,7 @@ void do_replace_node(int argc, char *argv[])
>  		return;
>  	}
>  
> -	extent.e_lblk = parse_ulong(argv[1], argv[0], "logical block", &err);
> +	err = strtoblk(argv[0], argv[1], &extent.e_lblk);
>  	if (err)
>  		return;
>  
> @@ -272,7 +272,7 @@ void do_replace_node(int argc, char *argv[])
>  	if (err)
>  		return;
>  
> -	extent.e_pblk = parse_ulong(argv[3], argv[0], "logical block", &err);
> +	err = strtoblk(argv[0], argv[3], &extent.e_pblk);
>  	if (err)
>  		return;
>  
> @@ -338,8 +338,7 @@ void do_insert_node(int argc, char *argv[])
>  		return;
>  	}
>  
> -	extent.e_lblk = parse_ulong(argv[1], cmd,
> -				    "logical block", &err);
> +	err = strtoblk(cmd, argv[1], &extent.e_lblk);
>  	if (err)
>  		return;
>  
> @@ -348,8 +347,7 @@ void do_insert_node(int argc, char *argv[])
>  	if (err)
>  		return;
>  
> -	extent.e_pblk = parse_ulong(argv[3], cmd,
> -				    "pysical block", &err);
> +	err = strtoblk(cmd, argv[3], &extent.e_pblk);
>  	if (err)
>  		return;
>  
> @@ -366,8 +364,8 @@ void do_set_bmap(int argc, char **argv)
>  	const char	*usage = "[--uninit] <lblk> <pblk>";
>  	struct ext2fs_extent extent;
>  	errcode_t	retval;
> -	blk_t		logical;
> -	blk_t		physical;
> +	blk64_t		logical;
> +	blk64_t		physical;
>  	char		*cmd = argv[0];
>  	int		flags = 0;
>  	int		err;
> @@ -387,18 +385,16 @@ void do_set_bmap(int argc, char **argv)
>  		return;
>  	}
>  
> -	logical = parse_ulong(argv[1], cmd,
> -				    "logical block", &err);
> +	err = strtoblk(cmd, argv[1], &logical);
>  	if (err)
>  		return;
>  
> -	physical = parse_ulong(argv[2], cmd,
> -				    "physical block", &err);
> +	err = strtoblk(cmd, argv[2], &physical);
>  	if (err)
>  		return;
>  
>  	retval = ext2fs_extent_set_bmap(current_handle, logical,
> -					(blk64_t) physical, flags);
> +					physical, flags);
>  	if (retval) {
>  		com_err(cmd, retval, 0);
>  		return;
> diff --git a/debugfs/util.c b/debugfs/util.c
> index cf3a6c6..09088e0 100644
> --- a/debugfs/util.c
> +++ b/debugfs/util.c
> @@ -377,7 +377,7 @@ int common_block_args_process(int argc, char *argv[],
>  	}
>  
>  	if (argc > 2) {
> -		*count = parse_ulong(argv[2], argv[0], "count", &err);
> +		err = strtoblk(argv[0], argv[2], count);
>  		if (err)
>  			return 1;
>  	}
> 

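A strtoull-based parser of the kind the commit message describes might
look like the sketch below.  This is not the strtoblk() used in the
patch; sketch_parse_blk() is a hypothetical helper, and its policy of
returning an errno value and rejecting anything that does not start
with a digit is an assumption made for the example.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Parse str as an unsigned 64-bit block number (decimal, octal or hex).
 * Returns 0 on success, EINVAL for malformed input, ERANGE on overflow.
 */
static inline int sketch_parse_blk(const char *str, unsigned long long *ret)
{
	char *end;
	unsigned long long val;

	/* Require a leading digit so that signs and whitespace are rejected. */
	if (!str || !isdigit((unsigned char)str[0]))
		return EINVAL;

	errno = 0;
	val = strtoull(str, &end, 0);
	if (errno == ERANGE)
		return ERANGE;
	/* Trailing junk after the number is an error. */
	if (end == str || *end != '\0')
		return EINVAL;

	*ret = val;
	return 0;
}
```

Because the result is an unsigned long long, a value such as 4294967296
(2^32) survives intact even where unsigned long is only 32 bits wide.
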
^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 11/25] libext2fs: only punch complete clusters
  2013-10-18  4:50 ` [PATCH 11/25] libext2fs: only punch complete clusters Darrick J. Wong
  2013-10-18 18:55   ` Darrick J. Wong
@ 2013-11-25  8:51   ` Zheng Liu
  1 sibling, 0 replies; 73+ messages in thread
From: Zheng Liu @ 2013-11-25  8:51 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: tytso, linux-ext4

On Thu, Oct 17, 2013 at 09:50:08PM -0700, Darrick J. Wong wrote:
> When bigalloc is enabled, using ext2fs_block_alloc_stats2() to free
> any block in a cluster has the effect of freeing the entire cluster.
> This is problematic if a caller instructs us to punch, say, blocks
> 12-15 of a 16-block cluster, because blocks 0-11 now point to a "free"
> cluster.
> 
> The naive way to solve this problem is to see if any of the other
> blocks in this logical cluster map to a physical cluster.  If so, then
> we know that the cluster is still in use and it mustn't be freed.
> Otherwise, we are punching the last mapped block in this cluster, so
> we can free the cluster.
> 
> The implementation given only does the rigorous checks for the partial
> clusters at the beginning and end of the punching range.
> 
> v2: Refactor the block free code into a separate helper function that
> should be more efficient.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>

                                                - Zheng

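The head/middle/tail split that the commit message describes can be
sketched as plain arithmetic.  The struct and function below are
hypothetical and only compute the three pieces; they assume the cluster
ratio is a power of two and leave out the mapping check that decides
whether a partial cluster may actually be freed.

```c
#include <stdint.h>

struct sketch_punch_split {
	uint32_t head;	/* blocks before the first cluster boundary */
	uint32_t whole;	/* complete clusters in the middle of the range */
	uint32_t tail;	/* blocks left over in the final partial cluster */
};

/* Split [start, start + count) at cluster boundaries. */
static inline struct sketch_punch_split
sketch_split_range(uint64_t start, uint32_t count, uint32_t ratio)
{
	struct sketch_punch_split s = { 0, 0, 0 };
	uint32_t offset = (uint32_t)(start & (ratio - 1));

	if (offset) {
		/* Range starts mid-cluster: consume up to the boundary. */
		s.head = ratio - offset;
		if (s.head > count)
			s.head = count;
		count -= s.head;
	}
	s.whole = count / ratio;
	s.tail = count % ratio;
	return s;
}
```

Punching blocks 12-15 of a 16-block cluster yields head = 4 with no
whole clusters, so nothing can be freed until it is known that the other
twelve blocks are unmapped.  Only the clusters counted in whole can be
freed unconditionally.
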
> ---
>  lib/ext2fs/bmap.c   |   29 ++++++++++++++++++
>  lib/ext2fs/ext2fs.h |    3 ++
>  lib/ext2fs/punch.c  |   82 ++++++++++++++++++++++++++++++++++++++++++++++++---
>  3 files changed, 109 insertions(+), 5 deletions(-)
> 
> 
> diff --git a/lib/ext2fs/bmap.c b/lib/ext2fs/bmap.c
> index 5074587..80f8f86 100644
> --- a/lib/ext2fs/bmap.c
> +++ b/lib/ext2fs/bmap.c
> @@ -173,6 +173,35 @@ static errcode_t implied_cluster_alloc(ext2_filsys fs, ext2_ino_t ino,
>  	return 0;
>  }
>  
> +/* Try to map a logical block to an already-allocated physical cluster. */
> +errcode_t ext2fs_map_cluster_block(ext2_filsys fs, ext2_ino_t ino,
> +				   struct ext2_inode *inode, blk64_t lblk,
> +				   blk64_t *pblk)
> +{
> +	ext2_extent_handle_t handle;
> +	errcode_t retval;
> +
> +	/* Need bigalloc and extents to be enabled */
> +	*pblk = 0;
> +	if (!EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
> +					EXT4_FEATURE_RO_COMPAT_BIGALLOC) ||
> +	    !(inode->i_flags & EXT4_EXTENTS_FL))
> +		return 0;
> +
> +	retval = ext2fs_extent_open2(fs, ino, inode, &handle);
> +	if (retval)
> +		goto out;
> +
> +	retval = implied_cluster_alloc(fs, ino, inode, handle, lblk, pblk);
> +	if (retval)
> +		goto out2;
> +
> +out2:
> +	ext2fs_extent_free(handle);
> +out:
> +	return retval;
> +}
> +
>  static errcode_t extent_bmap(ext2_filsys fs, ext2_ino_t ino,
>  			     struct ext2_inode *inode,
>  			     ext2_extent_handle_t handle,
> diff --git a/lib/ext2fs/ext2fs.h b/lib/ext2fs/ext2fs.h
> index 8f82dae..5247922 100644
> --- a/lib/ext2fs/ext2fs.h
> +++ b/lib/ext2fs/ext2fs.h
> @@ -924,6 +924,9 @@ extern errcode_t ext2fs_bmap2(ext2_filsys fs, ext2_ino_t ino,
>  			      struct ext2_inode *inode,
>  			      char *block_buf, int bmap_flags, blk64_t block,
>  			      int *ret_flags, blk64_t *phys_blk);
> +errcode_t ext2fs_map_cluster_block(ext2_filsys fs, ext2_ino_t ino,
> +				   struct ext2_inode *inode, blk64_t lblk,
> +				   blk64_t *pblk);
>  
>  #if 0
>  /* bmove.c */
> diff --git a/lib/ext2fs/punch.c b/lib/ext2fs/punch.c
> index 790a0ad8..1e4398e 100644
> --- a/lib/ext2fs/punch.c
> +++ b/lib/ext2fs/punch.c
> @@ -177,6 +177,75 @@ static void dbg_print_extent(char *desc, struct ext2fs_extent *extent)
>  #define dbg_printf(f, a...)		do { } while (0)
>  #endif
>  
> +/* Free a range of blocks, respecting cluster boundaries */
> +static errcode_t punch_extent_blocks(ext2_filsys fs, ext2_ino_t ino,
> +				     struct ext2_inode *inode,
> +				     blk64_t lfree_start, blk64_t free_start,
> +				     __u32 free_count, int *freed)
> +{
> +	blk64_t		pblk;
> +	int		freed_now = 0;
> +	__u32		cluster_freed;
> +	errcode_t	retval = 0;
> +
> +	/* No bigalloc?  Just free each block. */
> +	if (EXT2FS_CLUSTER_RATIO(fs) == 1) {
> +		*freed += free_count;
> +		while (free_count-- > 0)
> +			ext2fs_block_alloc_stats2(fs, free_start++, -1);
> +		return retval;
> +	}
> +
> +	/*
> +	 * Try to free up to the next cluster boundary.  We assume that all
> +	 * blocks in a logical cluster map to blocks from the same physical
> +	 * cluster, and that the offsets within the [pl]clusters match.
> +	 */
> +	if (free_start & EXT2FS_CLUSTER_MASK(fs)) {
> +		retval = ext2fs_map_cluster_block(fs, ino, inode,
> +						  lfree_start, &pblk);
> +		if (retval)
> +			goto errout;
> +		if (!pblk) {
> +			ext2fs_block_alloc_stats2(fs, free_start, -1);
> +			freed_now++;
> +		}
> +		cluster_freed = EXT2FS_CLUSTER_RATIO(fs) -
> +			(free_start & EXT2FS_CLUSTER_MASK(fs));
> +		if (cluster_freed > free_count)
> +			cluster_freed = free_count;
> +		free_count -= cluster_freed;
> +		free_start += cluster_freed;
> +		lfree_start += cluster_freed;
> +	}
> +
> +	/* Free whole clusters from the middle of the range. */
> +	while (free_count > 0 && free_count >= EXT2FS_CLUSTER_RATIO(fs)) {
> +		ext2fs_block_alloc_stats2(fs, free_start, -1);
> +		freed_now++;
> +		cluster_freed = EXT2FS_CLUSTER_RATIO(fs);
> +		free_count -= cluster_freed;
> +		free_start += cluster_freed;
> +		lfree_start += cluster_freed;
> +	}
> +
> +	/* Try to free the last cluster. */
> +	if (free_count > 0) {
> +		retval = ext2fs_map_cluster_block(fs, ino, inode,
> +						  lfree_start, &pblk);
> +		if (retval)
> +			goto errout;
> +		if (!pblk) {
> +			ext2fs_block_alloc_stats2(fs, free_start, -1);
> +			freed_now++;
> +		}
> +	}
> +
> +errout:
> +	*freed += freed_now;
> +	return retval;
> +}
> +
>  static errcode_t ext2fs_punch_extent(ext2_filsys fs, ext2_ino_t ino,
>  				     struct ext2_inode *inode,
>  				     blk64_t start, blk64_t end)
> @@ -184,7 +253,7 @@ static errcode_t ext2fs_punch_extent(ext2_filsys fs, ext2_ino_t ino,
>  	ext2_extent_handle_t	handle = 0;
>  	struct ext2fs_extent	extent;
>  	errcode_t		retval;
> -	blk64_t			free_start, next;
> +	blk64_t			free_start, next, lfree_start;
>  	__u32			free_count, newlen;
>  	int			freed = 0;
>  	int			op;
> @@ -211,6 +280,7 @@ static errcode_t ext2fs_punch_extent(ext2_filsys fs, ext2_ino_t ino,
>  			/* Start of deleted region before extent; 
>  			   adjust beginning of extent */
>  			free_start = extent.e_pblk;
> +			lfree_start = extent.e_lblk;
>  			if (next > end)
>  				free_count = end - extent.e_lblk + 1;
>  			else
> @@ -226,6 +296,7 @@ static errcode_t ext2fs_punch_extent(ext2_filsys fs, ext2_ino_t ino,
>  			dbg_printf("Case #%d\n", 2);
>  			newlen = start - extent.e_lblk;
>  			free_start = extent.e_pblk + newlen;
> +			lfree_start = extent.e_lblk + newlen;
>  			free_count = extent.e_len - newlen;
>  			extent.e_len = newlen;
>  		} else {
> @@ -241,6 +312,7 @@ static errcode_t ext2fs_punch_extent(ext2_filsys fs, ext2_ino_t ino,
>  
>  			extent.e_len = start - extent.e_lblk;
>  			free_start = extent.e_pblk + extent.e_len;
> +			lfree_start = extent.e_lblk + extent.e_len;
>  			free_count = end - start + 1;
>  
>  			dbg_print_extent("inserting", &newex);
> @@ -281,10 +353,10 @@ static errcode_t ext2fs_punch_extent(ext2_filsys fs, ext2_ino_t ino,
>  			goto errout;
>  		dbg_printf("Free start %llu, free count = %u\n",
>  		       free_start, free_count);
> -		while (free_count-- > 0) {
> -			ext2fs_block_alloc_stats2(fs, free_start++, -1);
> -			freed++;
> -		}
> +		retval = punch_extent_blocks(fs, ino, inode, lfree_start,
> +					     free_start, free_count, &freed);
> +		if (retval)
> +			goto errout;
>  	next_extent:
>  		retval = ext2fs_extent_get(handle, op,
>  					   &extent);
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
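
The three phases of punch_extent_blocks() quoted above (partial head cluster, whole clusters, partial tail cluster) can be sketched as a pure range split.  This is only an illustration, assuming the cluster ratio is a power of two; struct range_split and split_range() are hypothetical names that do not exist in libext2fs.  The real patch additionally calls ext2fs_map_cluster_block() and frees the head or tail cluster only when no other block of the file still maps into it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: how a block range splits along cluster boundaries. */
struct range_split {
	uint32_t head;	/* blocks before the first cluster boundary */
	uint32_t whole;	/* number of complete clusters in the middle */
	uint32_t tail;	/* blocks left over after the last whole cluster */
};

/* Split [start, start + count) for a cluster of 'ratio' blocks (power of two). */
static struct range_split split_range(uint64_t start, uint32_t count,
				      uint32_t ratio)
{
	struct range_split s = { 0, 0, 0 };
	uint32_t mask = ratio - 1;
	uint32_t off;

	assert(ratio != 0 && (ratio & mask) == 0);

	/* Head: run up to the next cluster boundary, capped at count. */
	off = (uint32_t)(start & mask);
	if (off) {
		s.head = ratio - off;
		if (s.head > count)
			s.head = count;
		count -= s.head;
	}

	/* Middle and tail: whole clusters, then whatever remains. */
	s.whole = count / ratio;
	s.tail = count % ratio;
	return s;
}
```

With a ratio of 16, a punch of 30 blocks starting at physical block 5 splits into an 11-block head, one whole cluster, and a 3-block tail.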


* Re: [PATCH 12/25] libext2fs: don't update the summary counts when doing implied cluster allocation
  2013-10-18  4:50 ` [PATCH 12/25] libext2fs: don't update the summary counts when doing implied cluster allocation Darrick J. Wong
@ 2013-11-25  9:03   ` Zheng Liu
  0 siblings, 0 replies; 73+ messages in thread
From: Zheng Liu @ 2013-11-25  9:03 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: tytso, linux-ext4

On Thu, Oct 17, 2013 at 09:50:14PM -0700, Darrick J. Wong wrote:
> When we're appending a block to a directory file or the journal file,
> and the new block is part of a cluster that has already been allocated
> to the file (implied cluster allocation), don't update the bitmap or
> the summary counts because that was performed when the cluster was
> allocated.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

In e2fsck/pass3.c we also have an expand_dir_proc() function that expands a
directory.  Maybe it needs the same fix.  Otherwise the patch looks good to
me.
Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>

                                                - Zheng

> ---
>  lib/ext2fs/expanddir.c |    2 +-
>  lib/ext2fs/mkjournal.c |    2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> 
> diff --git a/lib/ext2fs/expanddir.c b/lib/ext2fs/expanddir.c
> index 22558d6..09a15fa 100644
> --- a/lib/ext2fs/expanddir.c
> +++ b/lib/ext2fs/expanddir.c
> @@ -55,6 +55,7 @@ static int expand_dir_proc(ext2_filsys	fs,
>  			return BLOCK_ABORT;
>  		}
>  		es->newblocks++;
> +		ext2fs_block_alloc_stats2(fs, new_blk, +1);
>  	}
>  	if (blockcnt > 0) {
>  		retval = ext2fs_new_dir_block(fs, 0, 0, &block);
> @@ -82,7 +83,6 @@ static int expand_dir_proc(ext2_filsys	fs,
>  	}
>  	ext2fs_free_mem(&block);
>  	*blocknr = new_blk;
> -	ext2fs_block_alloc_stats2(fs, new_blk, +1);
>  
>  	if (es->done)
>  		return (BLOCK_CHANGED | BLOCK_ABORT);
> diff --git a/lib/ext2fs/mkjournal.c b/lib/ext2fs/mkjournal.c
> index 2afd3b7..8bf4670 100644
> --- a/lib/ext2fs/mkjournal.c
> +++ b/lib/ext2fs/mkjournal.c
> @@ -250,6 +250,7 @@ static int mkjournal_proc(ext2_filsys	fs,
>  			es->err = retval;
>  			return BLOCK_ABORT;
>  		}
> +		ext2fs_block_alloc_stats2(fs, new_blk, +1);
>  		es->newblocks++;
>  	}
>  	if (blockcnt >= 0)
> @@ -285,7 +286,6 @@ static int mkjournal_proc(ext2_filsys	fs,
>  		return BLOCK_ABORT;
>  	}
>  	*blocknr = es->goal = new_blk;
> -	ext2fs_block_alloc_stats2(fs, new_blk, +1);
>  
>  	if (es->num_blocks == 0)
>  		return (BLOCK_CHANGED | BLOCK_ABORT);
> 
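
The idea behind the patch can be shown with a toy model: the summary count changes only when a block lands in a cluster the file does not already own.  This is a sketch under stated assumptions, not libext2fs code; struct toy_fs, toy_append_block() and TOY_RATIO are made up for the illustration.  In the patch itself the same effect comes from moving ext2fs_block_alloc_stats2() into the branch that actually allocates a new block.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_RATIO	4u	/* blocks per cluster; assumed for the sketch */
#define TOY_CLUSTERS	8u	/* logical clusters tracked per file */

struct toy_fs {
	uint32_t free_clusters;			/* summary count */
	bool	 file_has_cluster[TOY_CLUSTERS];	/* clusters owned by the file */
};

/*
 * Append logical block lblk to the file.  Returns true if a new cluster
 * had to be allocated, false if the block was an implied allocation.
 */
static bool toy_append_block(struct toy_fs *fs, uint32_t lblk)
{
	uint32_t lcluster = lblk / TOY_RATIO;

	assert(lcluster < TOY_CLUSTERS);
	if (fs->file_has_cluster[lcluster])
		return false;	/* counts were updated when the cluster was taken */

	assert(fs->free_clusters > 0);
	fs->file_has_cluster[lcluster] = true;
	fs->free_clusters--;
	return true;
}
```

Appending blocks 0 through 3 costs one cluster in this model; only block 4 touches the count again.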

* Re: [PATCH 13/25] libext2fs: use ext2fs_punch() to truncate quota file
  2013-10-18  4:50 ` [PATCH 13/25] libext2fs: use ext2fs_punch() to truncate quota file Darrick J. Wong
@ 2013-11-25  9:08   ` Zheng Liu
  0 siblings, 0 replies; 73+ messages in thread
From: Zheng Liu @ 2013-11-25  9:08 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: tytso, linux-ext4

On Thu, Oct 17, 2013 at 09:50:21PM -0700, Darrick J. Wong wrote:
> Use the new ext2fs_punch() call to truncate the quota file.  This also
> eliminates the need to fix it to work with bigalloc.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>

                                                - Zheng

> ---
>  lib/quota/quotaio.c |   19 +++----------------
>  1 file changed, 3 insertions(+), 16 deletions(-)
> 
> 
> diff --git a/lib/quota/quotaio.c b/lib/quota/quotaio.c
> index 8ddb92a..1bdcba6 100644
> --- a/lib/quota/quotaio.c
> +++ b/lib/quota/quotaio.c
> @@ -98,19 +98,6 @@ void update_grace_times(struct dquot *q)
>  	}
>  }
>  
> -static int release_blocks_proc(ext2_filsys fs, blk64_t *blocknr,
> -			       e2_blkcnt_t blockcnt EXT2FS_ATTR((unused)),
> -			       blk64_t ref_block EXT2FS_ATTR((unused)),
> -			       int ref_offset EXT2FS_ATTR((unused)),
> -			       void *private EXT2FS_ATTR((unused)))
> -{
> -	blk64_t	block;
> -
> -	block = *blocknr;
> -	ext2fs_block_alloc_stats2(fs, block, -1);
> -	return 0;
> -}
> -
>  static int compute_num_blocks_proc(ext2_filsys fs, blk64_t *blocknr,
>  			       e2_blkcnt_t blockcnt EXT2FS_ATTR((unused)),
>  			       blk64_t ref_block EXT2FS_ATTR((unused)),
> @@ -135,9 +122,9 @@ errcode_t quota_inode_truncate(ext2_filsys fs, ext2_ino_t ino)
>  		inode.i_dtime = fs->now ? fs->now : time(0);
>  		if (!ext2fs_inode_has_valid_blocks2(fs, &inode))
>  			return 0;
> -
> -		ext2fs_block_iterate3(fs, ino, BLOCK_FLAG_READ_ONLY, NULL,
> -				      release_blocks_proc, NULL);
> +		err = ext2fs_punch(fs, ino, &inode, NULL, 0, ~0ULL);
> +		if (err)
> +			return err;
>  		fs->flags &= ~EXT2_FLAG_SUPER_ONLY;
>  		memset(&inode, 0, sizeof(struct ext2_inode));
>  	} else {
> 

* Re: [PATCH 14/25] e2fsck: only release clusters when shortening a directory during a rehash
  2013-10-18  4:50 ` [PATCH 14/25] e2fsck: only release clusters when shortening a directory during a rehash Darrick J. Wong
@ 2013-11-25 11:09   ` Zheng Liu
  0 siblings, 0 replies; 73+ messages in thread
From: Zheng Liu @ 2013-11-25 11:09 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: tytso, linux-ext4

On Thu, Oct 17, 2013 at 09:50:27PM -0700, Darrick J. Wong wrote:
> When the rehash process is running on a bigalloc filesystem, it
> compresses all the directory entries and hash structures into the
> beginning of the directory file and then uses block_iterate3() to free
> the blocks off the end of the file.  It seems to call
> ext2fs_block_alloc_stats2() for every block in a cluster, which is
> unfortunate because this function allocates and frees entire clusters
> (and updates the summary counts accordingly).  In this case e2fsck
> writes out incorrect summary counts.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>

                                                - Zheng

> ---
>  e2fsck/rehash.c |   14 +++++++++++---
>  1 file changed, 11 insertions(+), 3 deletions(-)
> 
> 
> diff --git a/e2fsck/rehash.c b/e2fsck/rehash.c
> index 6ef3568..29da9a1 100644
> --- a/e2fsck/rehash.c
> +++ b/e2fsck/rehash.c
> @@ -719,10 +719,18 @@ static int write_dir_block(ext2_filsys fs,
>  		/* We don't need this block, so release it */
>  		e2fsck_read_bitmaps(wd->ctx);
>  		blk = *block_nr;
> -		ext2fs_unmark_block_bitmap2(wd->ctx->block_found_map, blk);
> -		ext2fs_block_alloc_stats2(fs, blk, -1);
> +		/*
> +		 * In theory, we only release blocks from the end of the
> +		 * directory file, so it's fine to clobber a whole cluster at
> +		 * once.
> +		 */
> +		if (blk % EXT2FS_CLUSTER_RATIO(fs) == 0) {
> +			ext2fs_unmark_block_bitmap2(wd->ctx->block_found_map,
> +						    blk);
> +			ext2fs_block_alloc_stats2(fs, blk, -1);
> +			wd->cleared++;
> +		}
>  		*block_nr = 0;
> -		wd->cleared++;
>  		return BLOCK_CHANGED;
>  	}
>  
> 
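
A small sketch of the counting rule the patch introduces, assuming the iterator visits every trailing block in ascending order: only a block that starts a cluster triggers a release, so each cluster is released once.  clusters_released() is a hypothetical helper written for this note, not part of e2fsck.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Count how many cluster releases happen when blocks
 * [first_blk, first_blk + nblocks) are visited one at a time and only
 * cluster-aligned blocks trigger a release.
 */
static uint32_t clusters_released(uint64_t first_blk, uint32_t nblocks,
				  uint32_t ratio)
{
	uint32_t cleared = 0;

	assert(ratio != 0);
	for (uint64_t blk = first_blk; blk < first_blk + nblocks; blk++) {
		if (blk % ratio == 0)
			cleared++;	/* one release per cluster */
	}
	return cleared;
}
```

With a ratio of 16, visiting 32 blocks starting at block 32 yields two releases instead of 32.  A run that starts in the middle of a cluster does not release that cluster, which matches the "in theory" caveat in the patch comment.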

* Re: [PATCH 15/25] e2fsck: print cluster ranges when encountering bitmap errors
  2013-10-18  4:50 ` [PATCH 15/25] e2fsck: print cluster ranges when encountering bitmap errors Darrick J. Wong
@ 2013-11-25 11:56   ` Zheng Liu
  0 siblings, 0 replies; 73+ messages in thread
From: Zheng Liu @ 2013-11-25 11:56 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: tytso, linux-ext4

On Thu, Oct 17, 2013 at 09:50:35PM -0700, Darrick J. Wong wrote:
> If pass5 finds bitmap errors in a range of clusters, don't print each
> cluster number individually when we could print only the start and end
> cluster number.  e2fsck already does this for the non-bigalloc case.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>

                                                - Zheng

> ---
>  e2fsck/pass5.c |    4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> 
> diff --git a/e2fsck/pass5.c b/e2fsck/pass5.c
> index 346c831..30dc70a 100644
> --- a/e2fsck/pass5.c
> +++ b/e2fsck/pass5.c
> @@ -528,8 +528,8 @@ redo_counts:
>  			save_problem = problem;
>  		} else {
>  			if ((problem == save_problem) &&
> -			    (pctx.blk2 == i-1))
> -				pctx.blk2++;
> +			    (pctx.blk2 == i - EXT2FS_CLUSTER_RATIO(fs)))
> +				pctx.blk2 += EXT2FS_CLUSTER_RATIO(fs);
>  			else {
>  				print_bitmap_problem(ctx, save_problem, &pctx);
>  				pctx.blk = pctx.blk2 = i;
> 
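
The range merging can be sketched on its own, assuming the input is a sorted list of cluster-aligned block numbers that all report the same problem.  struct blk_range and coalesce() are hypothetical names for this illustration; pass5 also compares the problem code before merging, which the sketch leaves out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct blk_range {
	uint64_t first;
	uint64_t last;
};

/*
 * Merge block numbers into ranges.  Two entries are adjacent when they
 * are exactly one cluster (ratio blocks) apart.  'out' must have room
 * for n entries.  Returns the number of ranges written.
 */
static size_t coalesce(const uint64_t *blks, size_t n, uint32_t ratio,
		       struct blk_range *out)
{
	size_t nr = 0;

	assert(ratio != 0);
	for (size_t i = 0; i < n; i++) {
		if (nr > 0 && out[nr - 1].last + ratio == blks[i]) {
			out[nr - 1].last += ratio;	/* extend current range */
		} else {
			out[nr].first = out[nr].last = blks[i];
			nr++;				/* start a new range */
		}
	}
	return nr;
}
```

With a ratio of 16, blocks 16, 32, 48 and 96 collapse into two ranges.  Stepping by 1, as the old test did, never merges them on a bigalloc filesystem.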

* Re: [PATCH 10/25] debugfs: handle 64bit block numbers
  2013-11-25  8:33   ` Zheng Liu
@ 2013-11-25 17:49     ` Darrick J. Wong
  0 siblings, 0 replies; 73+ messages in thread
From: Darrick J. Wong @ 2013-11-25 17:49 UTC (permalink / raw)
  To: tytso, linux-ext4

On Mon, Nov 25, 2013 at 04:33:02PM +0800, Zheng Liu wrote:
> On Thu, Oct 17, 2013 at 09:50:01PM -0700, Darrick J. Wong wrote:
> > debugfs should use strtoull wrappers for reading block numbers from
> > the command line.  "unsigned long" isn't wide enough to handle block
> > numbers on 32bit platforms.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> >  debugfs/debugfs.c      |   33 ++++++++++++++++++++++-----------
> >  debugfs/extent_inode.c |   22 +++++++++-------------
> >  debugfs/util.c         |    2 +-
> >  3 files changed, 32 insertions(+), 25 deletions(-)
> > 
> > 
> > diff --git a/debugfs/debugfs.c b/debugfs/debugfs.c
> > index d3db356..46fcd07 100644
> > --- a/debugfs/debugfs.c
> > +++ b/debugfs/debugfs.c
> > @@ -181,8 +181,7 @@ void do_open_filesys(int argc, char **argv)
> >  				return;
> >  			break;
> >  		case 's':
> > -			superblock = parse_ulong(optarg, argv[0],
> > -						 "superblock number", &err);
> > +			err = strtoblk(argv[0], optarg, &superblock);
> >  			if (err)
> >  				return;
> >  			break;
> > @@ -277,14 +276,17 @@ void do_init_filesys(int argc, char **argv)
> >  	struct ext2_super_block param;
> >  	errcode_t	retval;
> >  	int		err;
> > +	blk64_t		blocks;
> >  
> >  	if (common_args_process(argc, argv, 3, 3, "initialize",
> >  				"<device> <blocksize>", CHECK_FS_NOTOPEN))
>                                            ^^^^^^^^^
> To be honest, I have never used this command in debugfs, so I am a little
> confused.  If I understand correctly, the second argument should be the
> number of blocks, but the usage string says it is the block size.  Do we
> need to fix it?

Yup.  That help text is incorrect.  I'll add it to my stack.

--D
> 
>                                                 - Zheng
> 
> >  		return;
> >  
> >  	memset(&param, 0, sizeof(struct ext2_super_block));
> > -	ext2fs_blocks_count_set(&param, parse_ulong(argv[2], argv[0],
> > -						    "blocks count", &err));
> > +	err = strtoblk(argv[0], argv[2], &blocks);
> > +	if (err)
> > +		return;
> > +	ext2fs_blocks_count_set(&param, blocks);
> >  	if (err)
> >  		return;
> >  	retval = ext2fs_initialize(argv[1], 0, &param,
> > @@ -2109,7 +2111,9 @@ void do_bmap(int argc, char *argv[])
> >  	ino = string_to_inode(argv[1]);
> >  	if (!ino)
> >  		return;
> > -	blk = parse_ulong(argv[2], argv[0], "logical_block", &err);
> > +	err = strtoblk(argv[0], argv[2], &blk);
> > +	if (err)
> > +		return;
> >  
> >  	errcode = ext2fs_bmap2(current_fs, ino, 0, 0, 0, blk, 0, &pblk);
> >  	if (errcode) {
> > @@ -2254,10 +2258,14 @@ void do_punch(int argc, char *argv[])
> >  	ino = string_to_inode(argv[1]);
> >  	if (!ino)
> >  		return;
> > -	start = parse_ulong(argv[2], argv[0], "logical_block", &err);
> > -	if (argc == 4)
> > -		end = parse_ulong(argv[3], argv[0], "logical_block", &err);
> > -	else
> > +	err = strtoblk(argv[0], argv[2], &start);
> > +	if (err)
> > +		return;
> > +	if (argc == 4) {
> > +		err = strtoblk(argv[0], argv[3], &end);
> > +		if (err)
> > +			return;
> > +	} else
> >  		end = ~0;
> >  
> >  	errcode = ext2fs_punch(current_fs, ino, 0, 0, start, end);
> > @@ -2474,8 +2482,11 @@ int main(int argc, char **argv)
> >  						"block size", 0);
> >  			break;
> >  		case 's':
> > -			superblock = parse_ulong(optarg, argv[0],
> > -						 "superblock number", 0);
> > +			retval = strtoblk(argv[0], optarg, &superblock);
> > +			if (retval) {
> > +				com_err(argv[0], retval, 0, debug_prog_name);
> > +				return 1;
> > +			}
> >  			break;
> >  		case 'c':
> >  			catastrophic = 1;
> > diff --git a/debugfs/extent_inode.c b/debugfs/extent_inode.c
> > index 0bbc4c5..75e328c 100644
> > --- a/debugfs/extent_inode.c
> > +++ b/debugfs/extent_inode.c
> > @@ -264,7 +264,7 @@ void do_replace_node(int argc, char *argv[])
> >  		return;
> >  	}
> >  
> > -	extent.e_lblk = parse_ulong(argv[1], argv[0], "logical block", &err);
> > +	err = strtoblk(argv[0], argv[1], &extent.e_lblk);
> >  	if (err)
> >  		return;
> >  
> > @@ -272,7 +272,7 @@ void do_replace_node(int argc, char *argv[])
> >  	if (err)
> >  		return;
> >  
> > -	extent.e_pblk = parse_ulong(argv[3], argv[0], "logical block", &err);
> > +	err = strtoblk(argv[0], argv[3], &extent.e_pblk);
> >  	if (err)
> >  		return;
> >  
> > @@ -338,8 +338,7 @@ void do_insert_node(int argc, char *argv[])
> >  		return;
> >  	}
> >  
> > -	extent.e_lblk = parse_ulong(argv[1], cmd,
> > -				    "logical block", &err);
> > +	err = strtoblk(cmd, argv[1], &extent.e_lblk);
> >  	if (err)
> >  		return;
> >  
> > @@ -348,8 +347,7 @@ void do_insert_node(int argc, char *argv[])
> >  	if (err)
> >  		return;
> >  
> > -	extent.e_pblk = parse_ulong(argv[3], cmd,
> > -				    "pysical block", &err);
> > +	err = strtoblk(cmd, argv[3], &extent.e_pblk);
> >  	if (err)
> >  		return;
> >  
> > @@ -366,8 +364,8 @@ void do_set_bmap(int argc, char **argv)
> >  	const char	*usage = "[--uninit] <lblk> <pblk>";
> >  	struct ext2fs_extent extent;
> >  	errcode_t	retval;
> > -	blk_t		logical;
> > -	blk_t		physical;
> > +	blk64_t		logical;
> > +	blk64_t		physical;
> >  	char		*cmd = argv[0];
> >  	int		flags = 0;
> >  	int		err;
> > @@ -387,18 +385,16 @@ void do_set_bmap(int argc, char **argv)
> >  		return;
> >  	}
> >  
> > -	logical = parse_ulong(argv[1], cmd,
> > -				    "logical block", &err);
> > +	err = strtoblk(cmd, argv[1], &logical);
> >  	if (err)
> >  		return;
> >  
> > -	physical = parse_ulong(argv[2], cmd,
> > -				    "physical block", &err);
> > +	err = strtoblk(cmd, argv[2], &physical);
> >  	if (err)
> >  		return;
> >  
> >  	retval = ext2fs_extent_set_bmap(current_handle, logical,
> > -					(blk64_t) physical, flags);
> > +					physical, flags);
> >  	if (retval) {
> >  		com_err(cmd, retval, 0);
> >  		return;
> > diff --git a/debugfs/util.c b/debugfs/util.c
> > index cf3a6c6..09088e0 100644
> > --- a/debugfs/util.c
> > +++ b/debugfs/util.c
> > @@ -377,7 +377,7 @@ int common_block_args_process(int argc, char *argv[],
> >  	}
> >  
> >  	if (argc > 2) {
> > -		*count = parse_ulong(argv[2], argv[0], "count", &err);
> > +		err = strtoblk(argv[0], argv[2], count);
> >  		if (err)
> >  			return 1;
> >  	}
> > 
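
For readers who have not seen the pattern, here is a minimal sketch of parsing a block number with strtoull().  On typical 32-bit platforms "unsigned long" is 32 bits, so a block number such as 2^32 cannot be returned through it.  parse_block_number() is a hypothetical helper written for this note; it is not the strtoblk() wrapper the patch uses, whose implementation is not shown in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Parse a block number from a string.  Returns 0 on success and stores
 * the value in *ret; returns -1 on empty input, trailing garbage, or
 * overflow.  Base 0 accepts decimal, octal (0...) and hex (0x...).
 */
static int parse_block_number(const char *str, unsigned long long *ret)
{
	char *end;
	unsigned long long v;

	assert(str != NULL && ret != NULL);

	errno = 0;
	v = strtoull(str, &end, 0);
	if (errno != 0 || end == str || *end != '\0')
		return -1;

	*ret = v;
	return 0;
}
```

Resetting errno before the call matters, because strtoull() reports overflow only through errno and a stale value would be indistinguishable from a fresh one.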

* Re: [PATCH 06/25] libext2fs: fix tests that set LARGE_FILE
  2013-11-25  7:09   ` Zheng Liu
@ 2013-11-25 17:57     ` Darrick J. Wong
  0 siblings, 0 replies; 73+ messages in thread
From: Darrick J. Wong @ 2013-11-25 17:57 UTC (permalink / raw)
  To: tytso, linux-ext4

On Mon, Nov 25, 2013 at 03:09:02PM +0800, Zheng Liu wrote:
> On Thu, Oct 17, 2013 at 09:49:35PM -0700, Darrick J. Wong wrote:
> > For each site where we test for a large file (> 2GB) and set the
> > LARGE_FILE feature, use a helper function to make the size test
> > consistent with the test that's in e2fsck.  This fixes the fsck
> > complaints when we try to create a 2GB journal (not so hard with 64k
> > block size) and fixes the incorrect test in fileio.c.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> 
> In e2fsck/pass2.c there is also a place that needs to be fixed.
> Otherwise the patch looks good to me.
> Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
> 
>                                                 - Zheng
> 
> diff --git a/e2fsck/pass2.c b/e2fsck/pass2.c
> index 3c0bf49..66ed665 100644
> --- a/e2fsck/pass2.c
> +++ b/e2fsck/pass2.c
> @@ -1318,7 +1318,8 @@ static void deallocate_inode(e2fsck_t ctx, ext2_ino_t ino, char* block_buf)
>  	if (!ext2fs_inode_has_valid_blocks2(fs, &inode))
>  		goto clear_inode;
>  
> -	if (LINUX_S_ISREG(inode.i_mode) && EXT2_I_SIZE(&inode) >= 0x80000000UL)
> +	if (LINUX_S_ISREG(inode.i_mode) &&
> +	    ext2fs_needs_large_file_feature(EXT2_I_SIZE(&inode)))
>  		ctx->large_files--;
>  
>  	del_block.ctx = ctx;

Good catch!  Thank you; I'll update the patch.

--D
> 
> > ---
> >  e2fsck/pass1.c         |    3 ++-
> >  lib/ext2fs/ext2fs.h    |    6 ++++++
> >  lib/ext2fs/fileio.c    |    2 +-
> >  lib/ext2fs/mkjournal.c |    2 +-
> >  4 files changed, 10 insertions(+), 3 deletions(-)
> > 
> > 
> > diff --git a/e2fsck/pass1.c b/e2fsck/pass1.c
> > index ab23e42..8c18a93 100644
> > --- a/e2fsck/pass1.c
> > +++ b/e2fsck/pass1.c
> > @@ -2281,7 +2281,8 @@ static void check_blocks(e2fsck_t ctx, struct problem_context *pctx,
> >  		}
> >  		pctx->num = 0;
> >  	}
> > -	if (LINUX_S_ISREG(inode->i_mode) && EXT2_I_SIZE(inode) >= 0x80000000UL)
> > +	if (LINUX_S_ISREG(inode->i_mode) &&
> > +	    ext2fs_needs_large_file_feature(EXT2_I_SIZE(inode)))
> >  		ctx->large_files++;
> >  	if ((pb.num_blocks != ext2fs_inode_i_blocks(fs, inode)) ||
> >  	    ((fs->super->s_feature_ro_compat &
> > diff --git a/lib/ext2fs/ext2fs.h b/lib/ext2fs/ext2fs.h
> > index 1ef4d67..8f82dae 100644
> > --- a/lib/ext2fs/ext2fs.h
> > +++ b/lib/ext2fs/ext2fs.h
> > @@ -646,6 +646,12 @@ static inline int ext2fs_has_group_desc_csum(ext2_filsys fs)
> >  			EXT4_FEATURE_RO_COMPAT_METADATA_CSUM);
> >  }
> >  
> > +/* The LARGE_FILE feature should be set if we have stored files 2GB+ in size */
> > +static inline int ext2fs_needs_large_file_feature(unsigned long long file_size)
> > +{
> > +	return file_size >= 0x80000000ULL;
> > +}
> > +
> >  /* alloc.c */
> >  extern errcode_t ext2fs_new_inode(ext2_filsys fs, ext2_ino_t dir, int mode,
> >  				  ext2fs_inode_bitmap map, ext2_ino_t *ret);
> > diff --git a/lib/ext2fs/fileio.c b/lib/ext2fs/fileio.c
> > index 02e6263..6b213b5 100644
> > --- a/lib/ext2fs/fileio.c
> > +++ b/lib/ext2fs/fileio.c
> > @@ -400,7 +400,7 @@ errcode_t ext2fs_file_set_size2(ext2_file_t file, ext2_off64_t size)
> >  
> >  	/* If we're writing a large file, set the large_file flag */
> >  	if (LINUX_S_ISREG(file->inode.i_mode) &&
> > -	    EXT2_I_SIZE(&file->inode) > 0x7FFFFFFULL &&
> > +	    ext2fs_needs_large_file_feature(EXT2_I_SIZE(&file->inode)) &&
> >  	    (!EXT2_HAS_RO_COMPAT_FEATURE(file->fs->super,
> >  					 EXT2_FEATURE_RO_COMPAT_LARGE_FILE) ||
> >  	     file->fs->super->s_rev_level == EXT2_GOOD_OLD_REV)) {
> > diff --git a/lib/ext2fs/mkjournal.c b/lib/ext2fs/mkjournal.c
> > index c636a97..2afd3b7 100644
> > --- a/lib/ext2fs/mkjournal.c
> > +++ b/lib/ext2fs/mkjournal.c
> > @@ -378,7 +378,7 @@ static errcode_t write_journal_inode(ext2_filsys fs, ext2_ino_t journal_ino,
> >  	inode_size = (unsigned long long)fs->blocksize * num_blocks;
> >  	inode.i_size = inode_size & 0xFFFFFFFF;
> >  	inode.i_size_high = (inode_size >> 32) & 0xFFFFFFFF;
> > -	if (inode.i_size_high)
> > +	if (ext2fs_needs_large_file_feature(inode_size))
> >  		fs->super->s_feature_ro_compat |=
> >  			EXT2_FEATURE_RO_COMPAT_LARGE_FILE;
> >  	ext2fs_iblk_add_blocks(fs, &inode, es.newblocks);
> > 
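
To make the differences between the three size tests concrete, here is a standalone sketch.  needs_large_file() mirrors the helper added in the patch; old_fileio_test() and old_mkjournal_test() are hypothetical wrappers around the two expressions the patch removes, written only for comparison.

```c
#include <assert.h>

/* Same comparison as ext2fs_needs_large_file_feature() in the patch. */
static int needs_large_file(unsigned long long file_size)
{
	return file_size >= 0x80000000ULL;
}

/* The expression removed from fileio.c: the constant is one F short. */
static int old_fileio_test(unsigned long long file_size)
{
	return file_size > 0x7FFFFFFULL;
}

/* The expression removed from mkjournal.c: true only at 4 GiB and up. */
static int old_mkjournal_test(unsigned long long file_size)
{
	return (file_size >> 32) != 0;
}

/* Boundary cases that show where the old tests disagree with the helper. */
static void check_boundaries(void)
{
	assert(!needs_large_file(0x7FFFFFFFULL));	/* 2 GiB - 1 */
	assert(needs_large_file(0x80000000ULL));	/* exactly 2 GiB */
	assert(old_fileio_test(0x8000000ULL));		/* 128 MiB: flagged too early */
	assert(!old_mkjournal_test(0x80000000ULL));	/* 2 GiB journal: missed */
}
```

The last case corresponds to the 2GB journal mentioned in the commit message: the old mkjournal test left LARGE_FILE unset, and e2fsck then complained.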

* Re: [PATCH 08/25] debugfs: fix various minor bogosity
  2013-11-25  8:08   ` Zheng Liu
@ 2013-11-25 18:05     ` Darrick J. Wong
  0 siblings, 0 replies; 73+ messages in thread
From: Darrick J. Wong @ 2013-11-25 18:05 UTC (permalink / raw)
  To: tytso, Darren Hart, linux-ext4, Robert Yang

On Mon, Nov 25, 2013 at 04:08:24PM +0800, Zheng Liu wrote:
> On Thu, Oct 17, 2013 at 09:49:48PM -0700, Darrick J. Wong wrote:
> > We should really use the ext2fs memory allocator functions in
> > copy_file(), and we really should return a value if there's allocation
> > problems.
> > 
> > Also fix up a minor bogosity in an error message.
> > 
> > Cc: Robert Yang <liezhi.yang@windriver.com>
> > Cc: Darren Hart <dvhart@linux.intel.com>
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Two call sites still use free() and need the same fix.  Otherwise the
> patch looks good to me.
> Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
> 
>                                                 - Zheng
> 
> diff --git a/debugfs/debugfs.c b/debugfs/debugfs.c
> index d3db356..cc8dd20 100644
> --- a/debugfs/debugfs.c
> +++ b/debugfs/debugfs.c
> @@ -1611,7 +1611,7 @@ static errcode_t copy_file(int fd, ext2_ino_t newfile, int bufsize, int make_hol
>  	retval = ext2fs_get_memzero(bufsize, &zero_buf);
>  	if (retval) {
>  		com_err("copy_file", retval, "can't allocate buffer\n");
> -		free(buf);
> +		ext2fs_free_mem(&buf);
>  		return retval;
>  	}
>  
> @@ -1649,7 +1649,7 @@ static errcode_t copy_file(int fd, ext2_ino_t newfile, int bufsize, int make_hol
>  			ptr += written;
>  		}
>  	}
> -	free(buf);
> +	ext2fs_free_mem(&buf);
>  	ext2fs_free_mem(&zero_buf);
>  	retval = ext2fs_file_close(e2_file);
>  	return retval;

Good catch.  I'll update the patch.

--D

> 
> > ---
> >  debugfs/debugfs.c |   11 ++++++-----
> >  1 file changed, 6 insertions(+), 5 deletions(-)
> > 
> > 
> > diff --git a/debugfs/debugfs.c b/debugfs/debugfs.c
> > index 4f6108d..d3db356 100644
> > --- a/debugfs/debugfs.c
> > +++ b/debugfs/debugfs.c
> > @@ -1601,9 +1601,10 @@ static errcode_t copy_file(int fd, ext2_ino_t newfile, int bufsize, int make_hol
> >  	if (retval)
> >  		return retval;
> >  
> > -	if (!(buf = (char *) malloc(bufsize))){
> > -		com_err("copy_file", errno, "can't allocate buffer\n");
> > -		return;
> > +	retval = ext2fs_get_mem(bufsize, &buf);
> > +	if (retval) {
> > +		com_err("copy_file", retval, "can't allocate buffer\n");
> > +		return retval;
> >  	}
> >  
> >  	/* This is used for checking whether the whole block is zero */
> > @@ -1654,7 +1655,7 @@ static errcode_t copy_file(int fd, ext2_ino_t newfile, int bufsize, int make_hol
> >  	return retval;
> >  
> >  fail:
> > -	free(buf);
> > +	ext2fs_free_mem(&buf);
> >  	ext2fs_free_mem(&zero_buf);
> >  	(void) ext2fs_file_close(e2_file);
> >  	return retval;
> > @@ -2112,7 +2113,7 @@ void do_bmap(int argc, char *argv[])
> >  
> >  	errcode = ext2fs_bmap2(current_fs, ino, 0, 0, 0, blk, 0, &pblk);
> >  	if (errcode) {
> > -		com_err("argv[0]", errcode,
> > +		com_err(argv[0], errcode,
> >  			"while mapping logical block %llu\n", blk);
> >  		return;
> >  	}
> > 

* Re: [PATCH 16/25] resize2fs: convert fs to and from 64bit mode
  2013-10-18  4:50 ` [PATCH 16/25] resize2fs: convert fs to and from 64bit mode Darrick J. Wong
  2013-10-18 18:59   ` Darrick J. Wong
@ 2013-11-26  6:44   ` Zheng Liu
  2013-11-26 18:39     ` Darrick J. Wong
  1 sibling, 1 reply; 73+ messages in thread
From: Zheng Liu @ 2013-11-26  6:44 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: tytso, linux-ext4

On Thu, Oct 17, 2013 at 09:50:42PM -0700, Darrick J. Wong wrote:
> resize2fs does its magic by loading a filesystem, duplicating the
> in-memory image of that fs, moving relevant blocks out of the way of
> whatever new metadata get created, and finally writing everything back
> out to disk.  Enabling 64bit mode enlarges the group descriptors,
> which makes resize2fs a reasonable vehicle for taking care of the rest
> of the bookkeeping requirements, so add to resize2fs the ability to
> convert a filesystem to 64bit mode and back.

Sorry, I don't see why we need these options to enable/disable 64bit
mode.  If I understand correctly, we can't disable 64bit mode on a file
system that is larger than 2^32 blocks.  So disabling it only applies to
a file system that never needed 64bit in the first place.  Is it worth
doing this?

Otherwise one nit below.

                                                - Zheng

> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  resize/main.c         |   40 ++++++-
>  resize/resize2fs.8.in |   18 +++
>  resize/resize2fs.c    |  282 ++++++++++++++++++++++++++++++++++++++++++++++++-
>  resize/resize2fs.h    |    3 +
>  4 files changed, 336 insertions(+), 7 deletions(-)
> 
> 
> diff --git a/resize/main.c b/resize/main.c
> index 1394ae1..ad0c946 100644
> --- a/resize/main.c
> +++ b/resize/main.c
> @@ -41,7 +41,7 @@ char *program_name, *device_name, *io_options;
>  static void usage (char *prog)
>  {
>  	fprintf (stderr, _("Usage: %s [-d debug_flags] [-f] [-F] [-M] [-P] "
> -			   "[-p] device [new_size]\n\n"), prog);
> +			   "[-p] device [-b|-s|new_size]\n\n"), prog);
>  
>  	exit (1);
>  }
> @@ -199,7 +199,7 @@ int main (int argc, char ** argv)
>  	if (argc && *argv)
>  		program_name = *argv;
>  
> -	while ((c = getopt (argc, argv, "d:fFhMPpS:")) != EOF) {
> +	while ((c = getopt(argc, argv, "d:fFhMPpS:bs")) != EOF) {
>  		switch (c) {
>  		case 'h':
>  			usage(program_name);
> @@ -225,6 +225,12 @@ int main (int argc, char ** argv)
>  		case 'S':
>  			use_stride = atoi(optarg);
>  			break;
> +		case 'b':
> +			flags |= RESIZE_ENABLE_64BIT;
> +			break;
> +		case 's':
> +			flags |= RESIZE_DISABLE_64BIT;
> +			break;
>  		default:
>  			usage(program_name);
>  		}
> @@ -383,6 +389,10 @@ int main (int argc, char ** argv)
>  		if (sys_page_size > fs->blocksize)
>  			new_size &= ~((sys_page_size / fs->blocksize)-1);
>  	}
> +	/* If changing 64bit, don't change the filesystem size. */
> +	if (flags & (RESIZE_DISABLE_64BIT | RESIZE_ENABLE_64BIT)) {
> +		new_size = ext2fs_blocks_count(fs->super);
> +	}
>  	if (!EXT2_HAS_INCOMPAT_FEATURE(fs->super,
>  				       EXT4_FEATURE_INCOMPAT_64BIT)) {
>  		/* Take 16T down to 2^32-1 blocks */
> @@ -434,7 +444,31 @@ int main (int argc, char ** argv)
>  			fs->blocksize / 1024, new_size);
>  		exit(1);
>  	}
> -	if (new_size == ext2fs_blocks_count(fs->super)) {
> +	if (flags & RESIZE_DISABLE_64BIT && flags & RESIZE_ENABLE_64BIT) {
            ^^^^^
Coding style problem:
        if ((flags & RESIZE_DISABLE_64BIT) && (flags & RESIZE_ENABLE_64BIT))
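
This is purely a readability point: in C, binary & binds tighter than &&, so both spellings evaluate the same way.  A quick sketch to confirm it, using placeholder flag values (SKETCH_ENABLE_64BIT and SKETCH_DISABLE_64BIT are made up here and are not the values from resize2fs.h):

```c
#include <assert.h>

/* Placeholder flag values for this sketch only. */
#define SKETCH_ENABLE_64BIT	0x1
#define SKETCH_DISABLE_64BIT	0x2

/* The form used in the patch. */
static int both_unparenthesized(int flags)
{
	return flags & SKETCH_DISABLE_64BIT && flags & SKETCH_ENABLE_64BIT;
}

/* The form suggested in review. */
static int both_parenthesized(int flags)
{
	return (flags & SKETCH_DISABLE_64BIT) && (flags & SKETCH_ENABLE_64BIT);
}

/* Exhaustively compare the two forms over every flag combination. */
static void check_equivalent(void)
{
	for (int flags = 0; flags < 4; flags++)
		assert(both_unparenthesized(flags) == both_parenthesized(flags));
}
```
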

> +		fprintf(stderr, _("Cannot set and unset 64bit feature.\n"));
> +		exit(1);
> +	} else if (flags & (RESIZE_DISABLE_64BIT | RESIZE_ENABLE_64BIT)) {
> +		new_size = ext2fs_blocks_count(fs->super);
> +		if (new_size >= (1ULL << 32)) {
> +			fprintf(stderr, _("Cannot change the 64bit feature "
> +				"on a filesystem that is larger than "
> +				"2^32 blocks.\n"));
> +			exit(1);
> +		}
> +		if (mount_flags & EXT2_MF_MOUNTED) {
> +			fprintf(stderr, _("Cannot change the 64bit feature "
> +				"while the filesystem is mounted.\n"));
> +			exit(1);
> +		}
> +		if (flags & RESIZE_ENABLE_64BIT &&
                    ^^^^
                    ditto

> +		    !EXT2_HAS_INCOMPAT_FEATURE(fs->super,
> +				EXT3_FEATURE_INCOMPAT_EXTENTS)) {
> +			fprintf(stderr, _("Please enable the extents feature "
> +				"with tune2fs before enabling the 64bit "
> +				"feature.\n"));
> +			exit(1);
> +		}
> +	} else if (new_size == ext2fs_blocks_count(fs->super)) {
>  		fprintf(stderr, _("The filesystem is already %llu blocks "
>  			"long.  Nothing to do!\n\n"), new_size);
>  		exit(0);
> diff --git a/resize/resize2fs.8.in b/resize/resize2fs.8.in
> index a1f3099..1c75816 100644
> --- a/resize/resize2fs.8.in
> +++ b/resize/resize2fs.8.in
> @@ -8,7 +8,7 @@ resize2fs \- ext2/ext3/ext4 file system resizer
>  .SH SYNOPSIS
>  .B resize2fs
>  [
> -.B \-fFpPM
> +.B \-fFpPMbs
>  ]
>  [
>  .B \-d
> @@ -85,8 +85,21 @@ to shrink the size of filesystem.  Then you may use
>  to shrink the size of the partition.  When shrinking the size of
>  the partition, make sure you do not make it smaller than the new size
>  of the ext2 filesystem!
> +.PP
> +The
> +.B \-b
> +and
> +.B \-s
> +options enable and disable the 64bit feature, respectively.  The resize2fs
> +program will, of course, take care of resizing the block group descriptors
> +and moving other data blocks out of the way, as needed.  It is not possible
> +to resize the filesystem concurrent with changing the 64bit status.
>  .SH OPTIONS
>  .TP
> +.B \-b
> +Turns on the 64bit feature, resizes the group descriptors as necessary, and
> +moves other metadata out of the way.
> +.TP
>  .B \-d \fIdebug-flags
>  Turns on various resize2fs debugging features, if they have been compiled
>  into the binary.
> @@ -126,6 +139,9 @@ of what the program is doing.
>  .B \-P
>  Print the minimum size of the filesystem and exit.
>  .TP
> +.B \-s
> +Turns off the 64bit feature and frees blocks that are no longer in use.
> +.TP
>  .B \-S \fIRAID-stride
>  The
>  .B resize2fs
> diff --git a/resize/resize2fs.c b/resize/resize2fs.c
> index 0feff0f..05ba6e1 100644
> --- a/resize/resize2fs.c
> +++ b/resize/resize2fs.c
> @@ -53,6 +53,9 @@ static errcode_t ext2fs_calculate_summary_stats(ext2_filsys fs);
>  static errcode_t fix_sb_journal_backup(ext2_filsys fs);
>  static errcode_t mark_table_blocks(ext2_filsys fs,
>  				   ext2fs_block_bitmap bmap);
> +static errcode_t resize_group_descriptors(ext2_resize_t rfs, blk64_t new_size);
> +static errcode_t move_bg_metadata(ext2_resize_t rfs);
> +static errcode_t zero_high_bits_in_inodes(ext2_resize_t rfs);
>  
>  /*
>   * Some helper CPP macros
> @@ -119,13 +122,30 @@ errcode_t resize_fs(ext2_filsys fs, blk64_t *new_size, int flags,
>  	if (retval)
>  		goto errout;
>  
> +	init_resource_track(&rtrack, "resize_group_descriptors", fs->io);
> +	retval = resize_group_descriptors(rfs, *new_size);
> +	if (retval)
> +		goto errout;
> +	print_resource_track(rfs, &rtrack, fs->io);
> +
> +	init_resource_track(&rtrack, "move_bg_metadata", fs->io);
> +	retval = move_bg_metadata(rfs);
> +	if (retval)
> +		goto errout;
> +	print_resource_track(rfs, &rtrack, fs->io);
> +
> +	init_resource_track(&rtrack, "zero_high_bits_in_metadata", fs->io);
> +	retval = zero_high_bits_in_inodes(rfs);
> +	if (retval)
> +		goto errout;
> +	print_resource_track(rfs, &rtrack, fs->io);
> +
>  	init_resource_track(&rtrack, "adjust_superblock", fs->io);
>  	retval = adjust_superblock(rfs, *new_size);
>  	if (retval)
>  		goto errout;
>  	print_resource_track(rfs, &rtrack, fs->io);
>  
> -
>  	init_resource_track(&rtrack, "fix_uninit_block_bitmaps 2", fs->io);
>  	fix_uninit_block_bitmaps(rfs->new_fs);
>  	print_resource_track(rfs, &rtrack, fs->io);
> @@ -221,6 +241,259 @@ errout:
>  	return retval;
>  }
>  
> +/* Toggle 64bit mode */
> +static errcode_t resize_group_descriptors(ext2_resize_t rfs, blk64_t new_size)
> +{
> +	void *o, *n, *new_group_desc;
> +	dgrp_t i;
> +	int copy_size;
> +	errcode_t retval;
> +
> +	if (!(rfs->flags & (RESIZE_DISABLE_64BIT | RESIZE_ENABLE_64BIT)))
> +		return 0;
> +
> +	if (new_size != ext2fs_blocks_count(rfs->new_fs->super) ||
> +	    ext2fs_blocks_count(rfs->new_fs->super) >= (1ULL << 32) ||
> +	    (rfs->flags & RESIZE_DISABLE_64BIT &&
> +	     rfs->flags & RESIZE_ENABLE_64BIT))
> +		return EXT2_ET_INVALID_ARGUMENT;
> +
> +	if (rfs->flags & RESIZE_DISABLE_64BIT) {
> +		rfs->new_fs->super->s_feature_incompat &=
> +				~EXT4_FEATURE_INCOMPAT_64BIT;
> +		rfs->new_fs->super->s_desc_size = EXT2_MIN_DESC_SIZE;
> +	} else if (rfs->flags & RESIZE_ENABLE_64BIT) {
> +		rfs->new_fs->super->s_feature_incompat |=
> +				EXT4_FEATURE_INCOMPAT_64BIT;
> +		rfs->new_fs->super->s_desc_size = EXT2_MIN_DESC_SIZE_64BIT;
> +	}
> +
> +	if (EXT2_DESC_SIZE(rfs->old_fs->super) ==
> +	    EXT2_DESC_SIZE(rfs->new_fs->super))
> +		return 0;
> +
> +	o = rfs->new_fs->group_desc;
> +	rfs->new_fs->desc_blocks = ext2fs_div_ceil(
> +			rfs->old_fs->group_desc_count,
> +			EXT2_DESC_PER_BLOCK(rfs->new_fs->super));
> +	retval = ext2fs_get_arrayzero(rfs->new_fs->desc_blocks,
> +				      rfs->old_fs->blocksize, &new_group_desc);
> +	if (retval)
> +		return retval;
> +
> +	n = new_group_desc;
> +
> +	if (EXT2_DESC_SIZE(rfs->old_fs->super) <=
> +	    EXT2_DESC_SIZE(rfs->new_fs->super))
> +		copy_size = EXT2_DESC_SIZE(rfs->old_fs->super);
> +	else
> +		copy_size = EXT2_DESC_SIZE(rfs->new_fs->super);
> +	for (i = 0; i < rfs->old_fs->group_desc_count; i++) {
> +		memcpy(n, o, copy_size);
> +		n += EXT2_DESC_SIZE(rfs->new_fs->super);
> +		o += EXT2_DESC_SIZE(rfs->old_fs->super);
> +	}
> +
> +	ext2fs_free_mem(&rfs->new_fs->group_desc);
> +	rfs->new_fs->group_desc = new_group_desc;
> +
> +	for (i = 0; i < rfs->old_fs->group_desc_count; i++)
> +		ext2fs_group_desc_csum_set(rfs->new_fs, i);
> +
> +	return 0;
> +}
> +
> +/* Move bitmaps/inode tables out of the way. */
> +static errcode_t move_bg_metadata(ext2_resize_t rfs)
> +{
> +	dgrp_t i;
> +	blk64_t b, c, d;
> +	ext2fs_block_bitmap old_map, new_map;
> +	int old, new;
> +	errcode_t retval;
> +	int zero = 0, one = 1;
> +
> +	if (!(rfs->flags & (RESIZE_DISABLE_64BIT | RESIZE_ENABLE_64BIT)))
> +		return 0;
> +
> +	retval = ext2fs_allocate_block_bitmap(rfs->old_fs, "oldfs", &old_map);
> +	if (retval)
> +		return retval;
> +
> +	retval = ext2fs_allocate_block_bitmap(rfs->new_fs, "newfs", &new_map);
> +	if (retval)
> +		goto out;
> +
> +	/* Construct bitmaps of super/descriptor blocks in old and new fs */
> +	for (i = 0; i < rfs->old_fs->group_desc_count; i++) {
> +		retval = ext2fs_super_and_bgd_loc2(rfs->old_fs, i, &b, &c, &d,
> +						   NULL);
> +		if (retval)
> +			goto out;
> +		ext2fs_mark_block_bitmap2(old_map, b);
> +		ext2fs_mark_block_bitmap2(old_map, c);
> +		ext2fs_mark_block_bitmap2(old_map, d);
> +
> +		retval = ext2fs_super_and_bgd_loc2(rfs->new_fs, i, &b, &c, &d,
> +						   NULL);
> +		if (retval)
> +			goto out;
> +		ext2fs_mark_block_bitmap2(new_map, b);
> +		ext2fs_mark_block_bitmap2(new_map, c);
> +		ext2fs_mark_block_bitmap2(new_map, d);
> +	}
> +
> +	/* Find changes in block allocations for bg metadata */
> +	for (b = 0;
> +	     b < ext2fs_blocks_count(rfs->new_fs->super);
> +	     b += EXT2FS_CLUSTER_RATIO(rfs->new_fs)) {
> +		old = ext2fs_test_block_bitmap2(old_map, b);
> +		new = ext2fs_test_block_bitmap2(new_map, b);
> +
> +		if (old && !new)
> +			ext2fs_unmark_block_bitmap2(rfs->new_fs->block_map, b);
> +		else if (!old && new)
> +			; /* empty ext2fs_mark_block_bitmap2(new_map, b); */
> +		else
> +			ext2fs_unmark_block_bitmap2(new_map, b);
> +	}
> +	/* new_map now shows blocks that have been newly allocated. */
> +
> +	/* Move any conflicting bitmaps and inode tables */
> +	for (i = 0; i < rfs->old_fs->group_desc_count; i++) {
> +		b = ext2fs_block_bitmap_loc(rfs->new_fs, i);
> +		if (ext2fs_test_block_bitmap2(new_map, b))
> +			ext2fs_block_bitmap_loc_set(rfs->new_fs, i, 0);
> +
> +		b = ext2fs_inode_bitmap_loc(rfs->new_fs, i);
> +		if (ext2fs_test_block_bitmap2(new_map, b))
> +			ext2fs_inode_bitmap_loc_set(rfs->new_fs, i, 0);
> +
> +		c = ext2fs_inode_table_loc(rfs->new_fs, i);
> +		for (b = 0; b < rfs->new_fs->inode_blocks_per_group; b++) {
> +			if (ext2fs_test_block_bitmap2(new_map, b + c)) {
> +				ext2fs_inode_table_loc_set(rfs->new_fs, i, 0);
> +				break;
> +			}
> +		}
> +	}
> +
> +out:
> +	if (old_map)
> +		ext2fs_free_block_bitmap(old_map);
> +	if (new_map)
> +		ext2fs_free_block_bitmap(new_map);
> +	return retval;
> +}
> +
> +/* Zero out the high bits of extent fields */
> +static errcode_t zero_high_bits_in_extents(ext2_filsys fs, ext2_ino_t ino,
> +				 struct ext2_inode *inode)
> +{
> +	ext2_extent_handle_t	handle;
> +	struct ext2fs_extent	extent;
> +	int			op = EXT2_EXTENT_ROOT;
> +	errcode_t		errcode;
> +
> +	if (!(inode->i_flags & EXT4_EXTENTS_FL))
> +		return 0;
> +
> +	errcode = ext2fs_extent_open(fs, ino, &handle);
> +	if (errcode)
> +		return errcode;
> +
> +	while (1) {
> +		errcode = ext2fs_extent_get(handle, op, &extent);
> +		if (errcode)
> +			break;
> +
> +		op = EXT2_EXTENT_NEXT_SIB;
> +
> +		if (extent.e_pblk > (1ULL << 32)) {
> +			extent.e_pblk &= (1ULL << 32) - 1;
> +			errcode = ext2fs_extent_replace(handle, 0, &extent);
> +			if (errcode)
> +				break;
> +		}
> +	}
> +
> +	/* Ok if we run off the end */
> +	if (errcode == EXT2_ET_EXTENT_NO_NEXT)
> +		errcode = 0;
> +	return errcode;
> +}
> +
> +/* Zero out the high bits of inodes. */
> +static errcode_t zero_high_bits_in_inodes(ext2_resize_t rfs)
> +{
> +	ext2_filsys	fs = rfs->new_fs;
> +	int length = EXT2_INODE_SIZE(fs->super);
> +	struct ext2_inode *inode = NULL;
> +	ext2_inode_scan	scan = NULL;
> +	errcode_t	retval;
> +	ext2_ino_t	ino;
> +	blk64_t		file_acl_block;
> +	int		inode_dirty;
> +
> +	if (!(rfs->flags & (RESIZE_DISABLE_64BIT | RESIZE_ENABLE_64BIT)))
> +		return 0;
> +
> +	if (fs->super->s_creator_os != EXT2_OS_LINUX)
> +		return 0;
> +
> +	retval = ext2fs_open_inode_scan(fs, 0, &scan);
> +	if (retval)
> +		return retval;
> +
> +	retval = ext2fs_get_mem(length, &inode);
> +	if (retval)
> +		goto out;
> +
> +	do {
> +		retval = ext2fs_get_next_inode_full(scan, &ino, inode, length);
> +		if (retval)
> +			goto out;
> +		if (!ino)
> +			break;
> +		if (!ext2fs_test_inode_bitmap2(fs->inode_map, ino))
> +			continue;
> +
> +		/*
> +		 * Here's how we deal with high block number fields:
> +		 *
> +		 *  - i_size_high has been written out with i_size_lo
> +		 *    since the ext2 days, so no conversion is needed.
> +		 *
> +		 *  - i_blocks_hi is guarded by both the huge_file feature and
> +		 *    inode flags and has always been written out with
> +		 *    i_blocks_lo if the feature is set.  The field is only
> +		 *    ever read if both feature and inode flag are set, so
> +		 *    we don't need to zero it now.
> +		 *
> +		 *  - i_file_acl_high can be uninitialized, so zero it if
> +		 *    it isn't already.
> +		 */
> +		if (inode->osd2.linux2.l_i_file_acl_high) {
> +			inode->osd2.linux2.l_i_file_acl_high = 0;
> +			retval = ext2fs_write_inode_full(fs, ino, inode,
> +							 length);
> +			if (retval)
> +				goto out;
> +		}
> +
> +		retval = zero_high_bits_in_extents(fs, ino, inode);
> +		if (retval)
> +			goto out;
> +	} while (ino);
> +
> +out:
> +	if (inode)
> +		ext2fs_free_mem(&inode);
> +	if (scan)
> +		ext2fs_close_inode_scan(scan);
> +	return retval;
> +}
> +
>  /*
>   * Clean up the bitmaps for unitialized bitmaps
>   */
> @@ -424,7 +697,8 @@ retry:
>  	/*
>  	 * Reallocate the group descriptors as necessary.
>  	 */
> -	if (old_fs->desc_blocks != fs->desc_blocks) {
> +	if (EXT2_DESC_SIZE(old_fs->super) == EXT2_DESC_SIZE(fs->super) &&
> +	    old_fs->desc_blocks != fs->desc_blocks) {
>  		retval = ext2fs_resize_mem(old_fs->desc_blocks *
>  					   fs->blocksize,
>  					   fs->desc_blocks * fs->blocksize,
> @@ -949,7 +1223,9 @@ static errcode_t blocks_to_move(ext2_resize_t rfs)
>  		new_blocks = fs->desc_blocks + fs->super->s_reserved_gdt_blocks;
>  	}
>  
> -	if (old_blocks == new_blocks) {
> +	if (EXT2_DESC_SIZE(rfs->old_fs->super) ==
> +	    EXT2_DESC_SIZE(rfs->new_fs->super) &&
> +	    old_blocks == new_blocks) {
>  		retval = 0;
>  		goto errout;
>  	}
> diff --git a/resize/resize2fs.h b/resize/resize2fs.h
> index 52319b5..5a1c5dc 100644
> --- a/resize/resize2fs.h
> +++ b/resize/resize2fs.h
> @@ -82,6 +82,9 @@ typedef struct ext2_sim_progress *ext2_sim_progmeter;
>  #define RESIZE_PERCENT_COMPLETE		0x0100
>  #define RESIZE_VERBOSE			0x0200
>  
> +#define RESIZE_ENABLE_64BIT		0x0400
> +#define RESIZE_DISABLE_64BIT		0x0800
> +
>  /*
>   * This structure is used for keeping track of how much resources have
>   * been used for a particular resize2fs pass.


* Re: [PATCH 23/25] libext2fs: support modifying arbitrary extended attributes
  2013-10-18  4:51 ` [PATCH 23/25] libext2fs: support modifying arbitrary extended attributes Darrick J. Wong
  2013-10-18 19:25   ` Darrick J. Wong
  2013-10-22  1:13   ` Darrick J. Wong
@ 2013-11-26  7:21   ` Zheng Liu
  2013-11-26 19:55     ` Darrick J. Wong
  2013-11-27  1:56     ` Darrick J. Wong
  2 siblings, 2 replies; 73+ messages in thread
From: Zheng Liu @ 2013-11-26  7:21 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: tytso, linux-ext4

On Thu, Oct 17, 2013 at 09:51:34PM -0700, Darrick J. Wong wrote:
> Add functions to allow clients to get, set, and remove extended
> attributes from any file.  It also supports modifying EAs living in
> i_file_acl.
> 
> v2: Put the header declarations in the correct part of ext2fs.h,
> provide a function to release an EA block from an inode, and check
> i_extra_isize to make sure we actually have space for in-inode EAs.
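
For readers following the v2 note about i_extra_isize: the check comes
down to simple arithmetic on the inode size.  The sketch below mirrors the
test in ext2fs_xattrs_write() further down in this patch.  Both
in_inode_ea_space() and the GOOD_OLD_INODE_SIZE macro are hypothetical
stand-ins used only for illustration (the library's own constant is
EXT2_GOOD_OLD_INODE_SIZE, which is 128):

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for EXT2_GOOD_OLD_INODE_SIZE from libext2fs. */
#define GOOD_OLD_INODE_SIZE 128

/*
 * Hypothetical helper: number of bytes available for in-inode EA
 * entries, or 0 if the inode has no room for them.  The 4 bytes
 * subtracted via sizeof(uint32_t) hold the EA magic number that
 * precedes the entries.
 */
static size_t in_inode_ea_space(size_t inode_size, uint16_t extra_isize)
{
	size_t used = GOOD_OLD_INODE_SIZE + extra_isize + sizeof(uint32_t);

	if (inode_size <= used)
		return 0;	/* fall back to the external EA block */
	return inode_size - used;
}
```

With 256-byte inodes and an i_extra_isize of 28, this leaves 96 bytes for
in-inode entries; a 128-byte inode leaves none.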

Is this the latest version?  I am working on the inline data patch set
for e2fsprogs, and I want to use these APIs to manipulate EAs.  It
would be great if you could point out which version is the latest.
Thanks in advance.  Otherwise, some nits below.

> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  lib/ext2fs/ext2_err.et.in |   18 +
>  lib/ext2fs/ext2fs.h       |   28 ++
>  lib/ext2fs/ext_attr.c     |  761 +++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 807 insertions(+)
> 
> 
> diff --git a/lib/ext2fs/ext2_err.et.in b/lib/ext2fs/ext2_err.et.in
> index 9cc1bd1..b819a90 100644
> --- a/lib/ext2fs/ext2_err.et.in
> +++ b/lib/ext2fs/ext2_err.et.in
> @@ -482,4 +482,22 @@ ec	EXT2_ET_BLOCK_BITMAP_CSUM_INVALID,
>  ec	EXT2_ET_INLINE_DATA_CANT_ITERATE,
>  	"Cannot block iterate on an inode containing inline data"
>  
> +ec	EXT2_ET_EA_BAD_NAME_LEN,
> +	"Extended attribute has an invalid name length"
> +
> +ec	EXT2_ET_EA_BAD_VALUE_SIZE,
> +	"Extended attribute has an invalid value length"
> +
> +ec	EXT2_ET_BAD_EA_HASH,
> +	"Extended attribute has an incorrect hash"
> +
> +ec	EXT2_ET_BAD_EA_HEADER,
> +	"Extended attribute block has a bad header"
> +
> +ec	EXT2_ET_EA_KEY_NOT_FOUND,
> +	"Extended attribute key not found"
> +
> +ec	EXT2_ET_EA_NO_SPACE,
> +	"Insufficient space to store extended attribute data"
> +
>  	end
> diff --git a/lib/ext2fs/ext2fs.h b/lib/ext2fs/ext2fs.h
> index 5247922..93adae8 100644
> --- a/lib/ext2fs/ext2fs.h
> +++ b/lib/ext2fs/ext2fs.h
> @@ -637,6 +637,13 @@ typedef struct stat ext2fs_struct_stat;
>  #define EXT2_FLAG_FLUSH_NO_SYNC          1
>  
>  /*
> + * Modify and iterate extended attributes
> + */
> +struct ext2_xattr_handle;
> +#define XATTR_ABORT	1
> +#define XATTR_CHANGED	2
> +
> +/*
>   * function prototypes
>   */
>  static inline int ext2fs_has_group_desc_csum(ext2_filsys fs)
> @@ -1151,6 +1158,27 @@ extern errcode_t ext2fs_adjust_ea_refcount3(ext2_filsys fs, blk64_t blk,
>  					   char *block_buf,
>  					   int adjust, __u32 *newcount,
>  					   ext2_ino_t inum);
> +errcode_t ext2fs_xattrs_expand(struct ext2_xattr_handle *h,
> +			       unsigned int expandby);
> +errcode_t ext2fs_xattrs_write(struct ext2_xattr_handle *handle);
> +errcode_t ext2fs_xattrs_read(struct ext2_xattr_handle *handle);
> +errcode_t ext2fs_xattrs_iterate(struct ext2_xattr_handle *h,
> +				int (*func)(char *name, char *value,
> +					    void *data),
> +				void *data);
> +errcode_t ext2fs_xattr_get(struct ext2_xattr_handle *h, const char *key,
> +			   void **value, unsigned int *value_len);
> +errcode_t ext2fs_xattr_set(struct ext2_xattr_handle *handle,
> +			   const char *key,
> +			   const void *value,
> +			   unsigned int value_len);
> +errcode_t ext2fs_xattr_remove(struct ext2_xattr_handle *handle,
> +			      const char *key);
> +errcode_t ext2fs_xattrs_open(ext2_filsys fs, ext2_ino_t ino,
> +			     struct ext2_xattr_handle **handle);
> +errcode_t ext2fs_xattrs_close(struct ext2_xattr_handle **handle);
> +errcode_t ext2fs_free_ext_attr(ext2_filsys fs, ext2_ino_t ino,
> +			       struct ext2_inode_large *inode);
>  
>  /* extent.c */
>  extern errcode_t ext2fs_extent_header_verify(void *ptr, int size);
> diff --git a/lib/ext2fs/ext_attr.c b/lib/ext2fs/ext_attr.c
> index 9649a14..2a1e5e7 100644
> --- a/lib/ext2fs/ext_attr.c
> +++ b/lib/ext2fs/ext_attr.c
> @@ -186,3 +186,764 @@ errcode_t ext2fs_adjust_ea_refcount(ext2_filsys fs, blk_t blk,
>  	return ext2fs_adjust_ea_refcount2(fs, blk, block_buf, adjust,
>  					  newcount);
>  }
> +
> +/* Manipulate the contents of extended attribute regions */
> +struct ext2_xattr {
> +	char *name;
> +	void *value;
> +	unsigned int value_len;
> +};
> +
> +struct ext2_xattr_handle {
> +	ext2_filsys fs;
> +	struct ext2_xattr *attrs;
> +	unsigned int length;
> +	ext2_ino_t ino;
> +	int dirty;
> +};
> +
> +errcode_t ext2fs_xattrs_expand(struct ext2_xattr_handle *h,
> +			       unsigned int expandby)
> +{
> +	struct ext2_xattr *new_attrs;
> +	errcode_t err;
> +
> +	err = ext2fs_get_arrayzero(h->length + expandby,
> +				   sizeof(struct ext2_xattr), &new_attrs);
> +	if (err)
> +		return err;
> +
> +	memcpy(new_attrs, h->attrs, h->length * sizeof(struct ext2_xattr));
> +	ext2fs_free_mem(&h->attrs);
> +	h->length += expandby;
> +	h->attrs = new_attrs;
> +
> +	return 0;
> +}
> +
> +struct ea_name_index {
> +	int index;
> +	const char *name;
> +};
> +
> +static struct ea_name_index ea_names[] = {
> +	{1, "user."},
> +	{2, "system.posix_acl_access"},
> +	{3, "system.posix_acl_default"},
> +	{4, "trusted."},
> +	{6, "security."},
> +	{7, "system."},

It seems that we also have a _RICHACL name here.
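
A rough sketch of what the table could look like with such an entry
added.  The index value 8 and the name "system.richacl" are assumptions
based on the out-of-tree richacl patches and should be verified before
use; everything else is copied from the quoted table:

```c
#include <stddef.h>

struct ea_name_index {
	int index;
	const char *name;
};

/*
 * Same table as in the patch, plus a richacl entry.  The richacl
 * index (8) and name are assumptions, not taken from this patch.
 */
static const struct ea_name_index ea_names[] = {
	{1, "user."},
	{2, "system.posix_acl_access"},
	{3, "system.posix_acl_default"},
	{4, "trusted."},
	{6, "security."},
	/* Kept ahead of the generic "system." so that a prefix match
	 * on full names finds the more specific entry first. */
	{8, "system.richacl"},
	{7, "system."},
	{0, NULL},
};

/* Map an on-disk name index to its prefix; NULL if unknown. */
static const char *find_ea_prefix(int index)
{
	const struct ea_name_index *e;

	for (e = ea_names; e->name; e++) {
		if (e->index == index)
			return e->name;
	}
	return NULL;
}
```

An unknown index such as 5 yields NULL, which matches how the patch
treats entries without a recognized prefix.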

> +	{0, NULL},
> +};
> +
> +static const char *find_ea_prefix(int index)
> +{
> +	struct ea_name_index *e;
> +
> +	for (e = ea_names; e->name; e++)
> +		if (e->index == index)
> +			return e->name;
> +
> +	return NULL;
> +}
> +
> +static int find_ea_index(const char *fullname, char **name, int *index)
> +{
> +	struct ea_name_index *e;
> +
> +	for (e = ea_names; e->name; e++)

Coding style problem:
       for (e = ea_names; e->name; e++) {
               ...
       }
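
Applied to a lookup like this one, the braced form might look as follows.
This is a self-contained sketch, not the patch's code: the table is
shortened, the names ea_prefix, prefixes, and find_index() are
hypothetical, and strncmp() is substituted for memcmp() so that a name
shorter than the prefix is never read past its terminator:

```c
#include <stddef.h>
#include <string.h>

struct ea_prefix {
	int index;
	const char *name;
};

/* Shortened, illustrative table; see the patch for the full list. */
static const struct ea_prefix prefixes[] = {
	{1, "user."},
	{4, "trusted."},
	{7, "system."},
	{0, NULL},
};

/*
 * Split a full attribute name into (index, short name).
 * Returns 1 on a prefix match, 0 otherwise.
 */
static int find_index(const char *fullname, const char **name, int *index)
{
	const struct ea_prefix *e;

	for (e = prefixes; e->name; e++) {
		size_t len = strlen(e->name);

		if (strncmp(fullname, e->name, len) == 0) {
			*name = fullname + len;
			*index = e->index;
			return 1;
		}
	}
	return 0;
}
```

For "user.foo" this returns 1 with index 1 and short name "foo"; a name
with no known prefix returns 0 and leaves the outputs untouched.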

Thanks,
                                                - Zheng

> +		if (memcmp(fullname, e->name, strlen(e->name)) == 0) {
> +			*name = (char *)fullname + strlen(e->name);
> +			*index = e->index;
> +			return 1;
> +		}
> +	return 0;
> +}
> +
> +errcode_t ext2fs_free_ext_attr(ext2_filsys fs, ext2_ino_t ino,
> +			       struct ext2_inode_large *inode)
> +{
> +	struct ext2_ext_attr_header *header;
> +	void *block_buf = NULL;
> +	dgrp_t grp;
> +	blk64_t blk, goal;
> +	errcode_t err;
> +	struct ext2_inode_large i;
> +
> +	/* Read inode? */
> +	if (inode == NULL) {
> +		err = ext2fs_read_inode_full(fs, ino, (struct ext2_inode *)&i,
> +					     sizeof(struct ext2_inode_large));
> +		if (err)
> +			return err;
> +		inode = &i;
> +	}
> +
> +	/* Do we already have an EA block? */
> +	blk = ext2fs_file_acl_block(fs, (struct ext2_inode *)inode);
> +	if (blk == 0)
> +		return 0;
> +
> +	/* Find block, zero it, write back */
> +	if ((blk < fs->super->s_first_data_block) ||
> +	    (blk >= ext2fs_blocks_count(fs->super))) {
> +		err = EXT2_ET_BAD_EA_BLOCK_NUM;
> +		goto out;
> +	}
> +
> +	err = ext2fs_get_mem(fs->blocksize, &block_buf);
> +	if (err)
> +		goto out;
> +
> +	err = ext2fs_read_ext_attr3(fs, blk, block_buf, ino);
> +	if (err)
> +		goto out2;
> +
> +	header = (struct ext2_ext_attr_header *) block_buf;
> +	if (header->h_magic != EXT2_EXT_ATTR_MAGIC) {
> +		err = EXT2_ET_BAD_EA_HEADER;
> +		goto out2;
> +	}
> +
> +	header->h_refcount--;
> +	err = ext2fs_write_ext_attr3(fs, blk, block_buf, ino);
> +	if (err)
> +		goto out2;
> +
> +	/* Erase link to block */
> +	ext2fs_file_acl_block_set(fs, (struct ext2_inode *)inode, 0);
> +	if (header->h_refcount == 0)
> +		ext2fs_block_alloc_stats2(fs, blk, -1);
> +
> +	/* Write inode? */
> +	if (inode == &i) {
> +		err = ext2fs_write_inode_full(fs, ino, (struct ext2_inode *)&i,
> +					      sizeof(struct ext2_inode_large));
> +		if (err)
> +			goto out2;
> +	}
> +
> +out2:
> +	ext2fs_free_mem(&block_buf);
> +out:
> +	return err;
> +}
> +
> +static errcode_t prep_ea_block_for_write(ext2_filsys fs, ext2_ino_t ino,
> +					 struct ext2_inode_large *inode)
> +{
> +	struct ext2_ext_attr_header *header;
> +	void *block_buf = NULL;
> +	dgrp_t grp;
> +	blk64_t blk, goal;
> +	errcode_t err;
> +
> +	/* Do we already have an EA block? */
> +	blk = ext2fs_file_acl_block(fs, (struct ext2_inode *)inode);
> +	if (blk != 0) {
> +		if ((blk < fs->super->s_first_data_block) ||
> +		    (blk >= ext2fs_blocks_count(fs->super))) {
> +			err = EXT2_ET_BAD_EA_BLOCK_NUM;
> +			goto out;
> +		}
> +
> +		err = ext2fs_get_mem(fs->blocksize, &block_buf);
> +		if (err)
> +			goto out;
> +
> +		err = ext2fs_read_ext_attr3(fs, blk, block_buf, ino);
> +		if (err)
> +			goto out2;
> +
> +		header = (struct ext2_ext_attr_header *) block_buf;
> +		if (header->h_magic != EXT2_EXT_ATTR_MAGIC) {
> +			err = EXT2_ET_BAD_EA_HEADER;
> +			goto out2;
> +		}
> +
> +		/* Single-user block.  We're done here. */
> +		if (header->h_refcount == 1)
> +			return 0;
> +
> +		/* We need to CoW the block. */
> +		header->h_refcount--;
> +		err = ext2fs_write_ext_attr3(fs, blk, block_buf, ino);
> +		if (err)
> +			goto out2;
> +	} else {
> +		/* No block, we must increment i_blocks */
> +		err = ext2fs_iblk_add_blocks(fs, (struct ext2_inode *)inode,
> +					     1);
> +		if (err)
> +			goto out;
> +	}
> +
> +	/* Allocate a block */
> +	grp = ext2fs_group_of_ino(fs, ino);
> +	goal = ext2fs_inode_table_loc(fs, grp);
> +	err = ext2fs_alloc_block2(fs, goal, NULL, &blk);
> +	if (err)
> +		return err;
> +	ext2fs_file_acl_block_set(fs, (struct ext2_inode *)inode, blk);
> +out2:
> +	ext2fs_free_mem(&block_buf);
> +out:
> +	return err;
> +}
> +
> +
> +static errcode_t write_xattrs_to_buffer(struct ext2_xattr_handle *handle,
> +					struct ext2_xattr **pos,
> +					void *entries_start,
> +					unsigned int storage_size,
> +					unsigned int value_offset_correction)
> +{
> +	struct ext2_xattr *x = *pos;
> +	struct ext2_ext_attr_entry *e = entries_start;
> +	void *end = entries_start + storage_size;
> +	char *shortname;
> +	unsigned int entry_size, value_size;
> +	int idx, ret;
> +
> +	/* For all remaining x...  */
> +	for (; x < handle->attrs + handle->length; x++) {
> +		if (!x->name)
> +			continue;
> +
> +		/* Calculate index and shortname position */
> +		shortname = x->name;
> +		ret = find_ea_index(x->name, &shortname, &idx);
> +
> +		/* Calculate entry and value size */
> +		entry_size = (sizeof(*e) + strlen(shortname) +
> +			      EXT2_EXT_ATTR_PAD - 1) &
> +			     ~(EXT2_EXT_ATTR_PAD - 1);
> +		value_size = ((x->value_len + EXT2_EXT_ATTR_PAD - 1) /
> +			      EXT2_EXT_ATTR_PAD) * EXT2_EXT_ATTR_PAD;
> +
> +		/*
> +		 * Would entry collide with value?
> +		 * Note that we must leave sufficient room for a (u32)0 to
> +		 * mark the end of the entries.
> +		 */
> +		if ((void *)e + entry_size + sizeof(__u32) > end - value_size)
> +			break;
> +
> +		/* Fill out e appropriately */
> +		e->e_name_len = strlen(shortname);
> +		e->e_name_index = (ret ? idx : 0);
> +		e->e_value_offs = end - value_size - (void *)entries_start +
> +				value_offset_correction;
> +		e->e_value_block = 0;
> +		e->e_value_size = x->value_len;
> +
> +		/* Store name and value */
> +		end -= value_size;
> +		memcpy((void *)e + sizeof(*e), shortname, e->e_name_len);
> +		memcpy(end, x->value, e->e_value_size);
> +
> +		e->e_hash = ext2fs_ext_attr_hash_entry(e, end);
> +
> +		e = EXT2_EXT_ATTR_NEXT(e);
> +		*(__u32 *)e = 0;
> +	}
> +	*pos = x;
> +
> +	return 0;
> +}
> +
> +errcode_t ext2fs_xattrs_write(struct ext2_xattr_handle *handle)
> +{
> +	struct ext2_xattr *x;
> +	struct ext2_inode_large *inode;
> +	void *start, *block_buf = NULL;
> +	struct ext2_ext_attr_header *header;
> +	__u32 ea_inode_magic;
> +	blk64_t blk;
> +	unsigned int storage_size;
> +	unsigned int i, written;
> +	errcode_t err;
> +
> +	if (!EXT2_HAS_COMPAT_FEATURE(handle->fs->super,
> +				     EXT2_FEATURE_COMPAT_EXT_ATTR))
> +		return 0;
> +
> +	i = EXT2_INODE_SIZE(handle->fs->super);
> +	if (i < sizeof(*inode))
> +		i = sizeof(*inode);
> +	err = ext2fs_get_memzero(i, &inode);
> +	if (err)
> +		return err;
> +
> +	err = ext2fs_read_inode_full(handle->fs, handle->ino,
> +				     (struct ext2_inode *)inode,
> +				     EXT2_INODE_SIZE(handle->fs->super));
> +	if (err)
> +		goto out;
> +
> +	x = handle->attrs;
> +	/* Does the inode have size for EA? */
> +	if (EXT2_INODE_SIZE(handle->fs->super) <= EXT2_GOOD_OLD_INODE_SIZE +
> +						  inode->i_extra_isize +
> +						  sizeof(__u32))
> +		goto write_ea_block;
> +
> +	/* Write the inode EA */
> +	ea_inode_magic = EXT2_EXT_ATTR_MAGIC;
> +	memcpy(((char *) inode) + EXT2_GOOD_OLD_INODE_SIZE +
> +	       inode->i_extra_isize, &ea_inode_magic, sizeof(__u32));
> +	storage_size = EXT2_INODE_SIZE(handle->fs->super) -
> +		EXT2_GOOD_OLD_INODE_SIZE - inode->i_extra_isize -
> +		sizeof(__u32);
> +	start = ((char *) inode) + EXT2_GOOD_OLD_INODE_SIZE +
> +		inode->i_extra_isize + sizeof(__u32);
> +
> +	err = write_xattrs_to_buffer(handle, &x, start, storage_size, 0);
> +	if (err)
> +		goto out;
> +
> +	/* Are we done? */
> +	if (x == handle->attrs + handle->length)
> +		goto skip_ea_block;
> +
> +write_ea_block:
> +	/* Write the EA block */
> +	err = ext2fs_get_mem(handle->fs->blocksize, &block_buf);
> +	if (err)
> +		goto out;
> +
> +	storage_size = handle->fs->blocksize -
> +		sizeof(struct ext2_ext_attr_header);
> +	start = block_buf + sizeof(struct ext2_ext_attr_header);
> +
> +	err = write_xattrs_to_buffer(handle, &x, start, storage_size,
> +				     (void *)start - block_buf);
> +	if (err)
> +		goto out2;
> +
> +	if (x < handle->attrs + handle->length) {
> +		err = EXT2_ET_EA_NO_SPACE;
> +		goto out2;
> +	}
> +
> +	if (block_buf) {
> +		/* Write a header on the EA block */
> +		header = block_buf;
> +		header->h_magic = EXT2_EXT_ATTR_MAGIC;
> +		header->h_refcount = 1;
> +		header->h_blocks = 1;
> +
> +		/* Get a new block for writing */
> +		err = prep_ea_block_for_write(handle->fs, handle->ino, inode);
> +		if (err)
> +			goto out2;
> +
> +		/* Finally, write the new EA block */
> +		blk = ext2fs_file_acl_block(handle->fs,
> +					    (struct ext2_inode *)inode);
> +		err = ext2fs_write_ext_attr3(handle->fs, blk, block_buf,
> +					     handle->ino);
> +		if (err)
> +			goto out2;
> +	}
> +
> +skip_ea_block:
> +	blk = ext2fs_file_acl_block(handle->fs, (struct ext2_inode *)inode);
> +	if (!block_buf && blk) {
> +		/* xattrs shrunk, free the block */
> +		ext2fs_file_acl_block_set(handle->fs,
> +					  (struct ext2_inode *)inode, 0);
> +		err = ext2fs_iblk_sub_blocks(handle->fs,
> +					     (struct ext2_inode *)inode, 1);
> +		if (err)
> +			goto out;
> +		ext2fs_block_alloc_stats2(handle->fs, blk, -1);
> +	}
> +
> +	/* Write the inode */
> +	err = ext2fs_write_inode_full(handle->fs, handle->ino,
> +				      (struct ext2_inode *)inode,
> +				      EXT2_INODE_SIZE(handle->fs->super));
> +	if (err)
> +		goto out2;
> +
> +out2:
> +	ext2fs_free_mem(&block_buf);
> +out:
> +	ext2fs_free_mem(&inode);
> +	handle->dirty = 0;
> +	return err;
> +}
> +
> +static errcode_t read_xattrs_from_buffer(struct ext2_xattr_handle *handle,
> +					 struct ext2_ext_attr_entry *entries,
> +					 unsigned int storage_size,
> +					 void *value_start)
> +{
> +	struct ext2_xattr *x;
> +	struct ext2_ext_attr_entry *entry;
> +	const char *prefix;
> +	void *ptr;
> +	unsigned int remain, prefix_len;
> +	errcode_t err;
> +
> +	x = handle->attrs;
> +	while (x->name)
> +		x++;
> +
> +	entry = entries;
> +	while (!EXT2_EXT_IS_LAST_ENTRY(entry)) {
> +		__u32 hash;
> +
> +		/* header eats this space */
> +		remain -= sizeof(struct ext2_ext_attr_entry);
> +
> +		/* is attribute name valid? */
> +		if (EXT2_EXT_ATTR_SIZE(entry->e_name_len) > remain)
> +			return EXT2_ET_EA_BAD_NAME_LEN;
> +
> +		/* attribute len eats this space */
> +		remain -= EXT2_EXT_ATTR_SIZE(entry->e_name_len);
> +
> +		/* check value size */
> +		if (entry->e_value_size > remain)
> +			return EXT2_ET_EA_BAD_VALUE_SIZE;
> +
> +		/* e_value_block must be 0 in inode's ea */
> +		if (entry->e_value_block != 0)
> +			return EXT2_ET_BAD_EA_BLOCK_NUM;
> +
> +		hash = ext2fs_ext_attr_hash_entry(entry, value_start +
> +							 entry->e_value_offs);
> +
> +		/* e_hash may be 0 in older inode's ea */
> +		if (entry->e_hash != 0 && entry->e_hash != hash)
> +			return EXT2_ET_BAD_EA_HASH;
> +
> +		remain -= entry->e_value_size;
> +
> +		/* Allocate space for more attrs? */
> +		if (x == handle->attrs + handle->length) {
> +			err = ext2fs_xattrs_expand(handle, 4);
> +			if (err)
> +				return err;
> +			x = handle->attrs + handle->length - 4;
> +		}
> +
> +		/* Extract name/value */
> +		prefix = find_ea_prefix(entry->e_name_index);
> +		prefix_len = (prefix ? strlen(prefix) : 0);
> +		err = ext2fs_get_memzero(entry->e_name_len + prefix_len + 1,
> +					 &x->name);
> +		if (err)
> +			return err;
> +		if (prefix)
> +			memcpy(x->name, prefix, prefix_len);
> +		if (entry->e_name_len)
> +			memcpy(x->name + prefix_len,
> +			       (void *)entry + sizeof(*entry),
> +			       entry->e_name_len);
> +
> +		err = ext2fs_get_mem(entry->e_value_size, &x->value);
> +		if (err)
> +			return err;
> +		x->value_len = entry->e_value_size;
> +		memcpy(x->value, value_start + entry->e_value_offs,
> +		       entry->e_value_size);
> +		x++;
> +		entry = EXT2_EXT_ATTR_NEXT(entry);
> +	}
> +
> +	return 0;
> +}
> +
> +errcode_t ext2fs_xattrs_read(struct ext2_xattr_handle *handle)
> +{
> +	struct ext2_xattr *attrs = NULL, *x;
> +	unsigned int attrs_len;
> +	struct ext2_inode_large *inode;
> +	struct ext2_ext_attr_header *header;
> +	__u32 ea_inode_magic;
> +	unsigned int storage_size;
> +	void *start, *block_buf = NULL;
> +	blk64_t blk;
> +	int i;
> +	errcode_t err;
> +
> +	if (!EXT2_HAS_COMPAT_FEATURE(handle->fs->super,
> +				     EXT2_FEATURE_COMPAT_EXT_ATTR))
> +		return 0;
> +
> +	i = EXT2_INODE_SIZE(handle->fs->super);
> +	if (i < sizeof(*inode))
> +		i = sizeof(*inode);
> +	err = ext2fs_get_memzero(i, &inode);
> +	if (err)
> +		return err;
> +
> +	err = ext2fs_read_inode_full(handle->fs, handle->ino,
> +				     (struct ext2_inode *)inode,
> +				     EXT2_INODE_SIZE(handle->fs->super));
> +	if (err)
> +		goto out;
> +
> +	/* Does the inode have size for EA? */
> +	if (EXT2_INODE_SIZE(handle->fs->super) <= EXT2_GOOD_OLD_INODE_SIZE +
> +						  inode->i_extra_isize +
> +						  sizeof(__u32))
> +		goto read_ea_block;
> +
> +	/* Look for EA in the inode */
> +	memcpy(&ea_inode_magic, ((char *) inode) + EXT2_GOOD_OLD_INODE_SIZE +
> +	       inode->i_extra_isize, sizeof(__u32));
> +	if (ea_inode_magic == EXT2_EXT_ATTR_MAGIC) {
> +		storage_size = EXT2_INODE_SIZE(handle->fs->super) -
> +			EXT2_GOOD_OLD_INODE_SIZE - inode->i_extra_isize -
> +			sizeof(__u32);
> +		start = ((char *) inode) + EXT2_GOOD_OLD_INODE_SIZE +
> +			inode->i_extra_isize + sizeof(__u32);
> +
> +		err = read_xattrs_from_buffer(handle, start, storage_size,
> +					      start);
> +		if (err)
> +			goto out;
> +	}
> +
> +read_ea_block:
> +	/* Look for EA in a separate EA block */
> +	blk = ext2fs_file_acl_block(handle->fs, (struct ext2_inode *)inode);
> +	if (blk != 0) {
> +		if ((blk < handle->fs->super->s_first_data_block) ||
> +		    (blk >= ext2fs_blocks_count(handle->fs->super))) {
> +			err = EXT2_ET_BAD_EA_BLOCK_NUM;
> +			goto out;
> +		}
> +
> +		err = ext2fs_get_mem(handle->fs->blocksize, &block_buf);
> +		if (err)
> +			goto out;
> +
> +		err = ext2fs_read_ext_attr3(handle->fs, blk, block_buf,
> +					    handle->ino);
> +		if (err)
> +			goto out3;
> +
> +		header = (struct ext2_ext_attr_header *) block_buf;
> +		if (header->h_magic != EXT2_EXT_ATTR_MAGIC) {
> +			err = EXT2_ET_BAD_EA_HEADER;
> +			goto out3;
> +		}
> +
> +		if (header->h_blocks != 1) {
> +			err = EXT2_ET_BAD_EA_HEADER;
> +			goto out3;
> +		}
> +
> +		/* Read EAs */
> +		storage_size = handle->fs->blocksize -
> +			sizeof(struct ext2_ext_attr_header);
> +		start = block_buf + sizeof(struct ext2_ext_attr_header);
> +		err = read_xattrs_from_buffer(handle, start, storage_size,
> +					      block_buf);
> +		if (err)
> +			goto out3;
> +
> +		ext2fs_free_mem(&block_buf);
> +	}
> +
> +	ext2fs_free_mem(&block_buf);
> +	ext2fs_free_mem(&inode);
> +	return 0;
> +
> +out3:
> +	ext2fs_free_mem(&block_buf);
> +out:
> +	ext2fs_free_mem(&inode);
> +	return err;
> +}
> +
> +#define XATTR_ABORT	1
> +#define XATTR_CHANGED	2
> +errcode_t ext2fs_xattrs_iterate(struct ext2_xattr_handle *h,
> +				int (*func)(char *name, char *value,
> +					    void *data),
> +				void *data)
> +{
> +	struct ext2_xattr *x;
> +	errcode_t err;
> +	int ret;
> +
> +	for (x = h->attrs; x < h->attrs + h->length; x++) {
> +		if (!x->name)
> +			continue;
> +
> +		ret = func(x->name, x->value, data);
> +		if (ret & XATTR_CHANGED)
> +			h->dirty = 1;
> +		if (ret & XATTR_ABORT)
> +			return 0;
> +	}
> +
> +	return 0;
> +}
> +
> +errcode_t ext2fs_xattr_get(struct ext2_xattr_handle *h, const char *key,
> +			   void **value, unsigned int *value_len)
> +{
> +	struct ext2_xattr *x;
> +	void *val;
> +	errcode_t err;
> +
> +	for (x = h->attrs; x < h->attrs + h->length; x++) {
> +		if (!x->name)
> +			continue;
> +
> +		if (strcmp(x->name, key) == 0) {
> +			err = ext2fs_get_mem(x->value_len, &val);
> +			if (err)
> +				return err;
> +			memcpy(val, x->value, x->value_len);
> +			*value = val;
> +			*value_len = x->value_len;
> +			return 0;
> +		}
> +	}
> +
> +	return EXT2_ET_EA_KEY_NOT_FOUND;
> +}
> +
> +errcode_t ext2fs_xattr_set(struct ext2_xattr_handle *handle,
> +			   const char *key,
> +			   const void *value,
> +			   unsigned int value_len)
> +{
> +	struct ext2_xattr *x, *last_empty;
> +	char *new_value;
> +	errcode_t err;
> +
> +	last_empty = NULL;
> +	for (x = handle->attrs; x < handle->attrs + handle->length; x++) {
> +		if (!x->name) {
> +			last_empty = x;
> +			continue;
> +		}
> +
> +		/* Replace xattr */
> +		if (strcmp(x->name, key) == 0) {
> +			err = ext2fs_get_mem(value_len, &new_value);
> +			if (err)
> +				return err;
> +			memcpy(new_value, value, value_len);
> +			ext2fs_free_mem(&x->value);
> +			x->value = new_value;
> +			x->value_len = value_len;
> +			handle->dirty = 1;
> +			return 0;
> +		}
> +	}
> +
> +	/* Add attr to empty slot */
> +	if (last_empty) {
> +		err = ext2fs_get_mem(strlen(key) + 1, &last_empty->name);
> +		if (err)
> +			return err;
> +		strcpy(last_empty->name, key);
> +
> +		err = ext2fs_get_mem(value_len, &last_empty->value);
> +		if (err)
> +			return err;
> +		memcpy(last_empty->value, value, value_len);
> +		last_empty->value_len = value_len;
> +		handle->dirty = 1;
> +		return 0;
> +	}
> +
> +	/* Expand array, append slot */
> +	err = ext2fs_xattrs_expand(handle, 4);
> +	if (err)
> +		return err;
> +
> +	x = handle->attrs + handle->length - 4;
> +	err = ext2fs_get_mem(strlen(key) + 1, &x->name);
> +	if (err)
> +		return err;
> +	strcpy(x->name, key);
> +
> +	err = ext2fs_get_mem(value_len, &x->value);
> +	if (err)
> +		return err;
> +	memcpy(x->value, value, value_len);
> +	x->value_len = value_len;
> +	handle->dirty = 1;
> +	return 0;
> +}
> +
> +errcode_t ext2fs_xattr_remove(struct ext2_xattr_handle *handle,
> +			      const char *key)
> +{
> +	struct ext2_xattr *x;
> +	errcode_t err;
> +
> +	for (x = handle->attrs; x < handle->attrs + handle->length; x++) {
> +		if (!x->name)
> +			continue;
> +
> +		if (strcmp(x->name, key) == 0) {
> +			ext2fs_free_mem(&x->name);
> +			ext2fs_free_mem(&x->value);
> +			x->value_len = 0;
> +			handle->dirty = 1;
> +			return 0;
> +		}
> +	}
> +
> +	return EXT2_ET_EA_KEY_NOT_FOUND;
> +}
> +
> +errcode_t ext2fs_xattrs_open(ext2_filsys fs, ext2_ino_t ino,
> +			     struct ext2_xattr_handle **handle)
> +{
> +	struct ext2_xattr_handle *h;
> +	errcode_t err;
> +
> +	err = ext2fs_get_memzero(sizeof(*h), &h);
> +	if (err)
> +		return err;
> +
> +	h->length = 4;
> +	err = ext2fs_get_arrayzero(h->length, sizeof(struct ext2_xattr),
> +				   &h->attrs);
> +	if (err) {
> +		ext2fs_free_mem(&h);
> +		return err;
> +	}
> +	h->ino = ino;
> +	h->fs = fs;
> +	*handle = h;
> +	return 0;
> +}
> +
> +errcode_t ext2fs_xattrs_close(struct ext2_xattr_handle **handle)
> +{
> +	unsigned int i;
> +	struct ext2_xattr_handle *h = *handle;
> +	struct ext2_xattr *a = h->attrs;
> +	errcode_t err;
> +
> +	if (h->dirty) {
> +		err = ext2fs_xattrs_write(h);
> +		if (err)
> +			return err;
> +	}
> +
> +	for (i = 0; i < h->length; i++) {
> +		if (a[i].name)
> +			ext2fs_free_mem(&a[i].name);
> +		if (a[i].value)
> +			ext2fs_free_mem(&a[i].value);
> +	}
> +
> +	ext2fs_free_mem(&h->attrs);
> +	ext2fs_free_mem(handle);
> +	return 0;
> +}
> 


* Re: [PATCH 16/25] resize2fs: convert fs to and from 64bit mode
  2013-11-26  6:44   ` Zheng Liu
@ 2013-11-26 18:39     ` Darrick J. Wong
  2013-11-27  2:21       ` Zheng Liu
  0 siblings, 1 reply; 73+ messages in thread
From: Darrick J. Wong @ 2013-11-26 18:39 UTC (permalink / raw)
  To: tytso, linux-ext4

On Tue, Nov 26, 2013 at 02:44:45PM +0800, Zheng Liu wrote:
> On Thu, Oct 17, 2013 at 09:50:42PM -0700, Darrick J. Wong wrote:
> > resize2fs does its magic by loading a filesystem, duplicating the
> > in-memory image of that fs, moving relevant blocks out of the way of
> > whatever new metadata get created, and finally writing everything back
> > out to disk.  Enabling 64bit mode enlarges the group descriptors,
> > which makes resize2fs a reasonable vehicle for taking care of the rest
> > of the bookkeeping requirements, so add to resize2fs the ability to
> > convert a filesystem to 64bit mode and back.
> 
> Sorry, I don't get your point why we need to add these arguments to
> enable/disable 64bit mode.  If I understand correctly, we don't disable
> 64bit mode for a file system which is larger than 2^32 blocks.  So that
> means that we just disable it for a file system which 64bit shouldn't be
> enabled.  Is it worth doing this?

Are you questioning the entire conversion, or just the 64->32 direction?

32->64 has two benefits: You can resize (somewhat) past 16T (256T I think?);
and you get full 32-bit bitmap checksums.

I agree that 64->32 isn't terribly useful, but dislike one-way conversions.
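
For reference, the 16T figure and the descriptor growth can be worked out
from constants the patch already uses.  A minimal sketch, assuming 4 KiB
blocks and the 32- and 64-byte descriptor sizes behind EXT2_MIN_DESC_SIZE
and EXT2_MIN_DESC_SIZE_64BIT; both helper names are made up for
illustration and are not part of resize2fs:

```c
#include <stdint.h>

/* Largest filesystem, in bytes, addressable with 32-bit block numbers. */
static uint64_t max_bytes_32bit(uint32_t blocksize)
{
	return (1ULL << 32) * blocksize;
}

/* Descriptor blocks needed for a given group count (ceiling division). */
static uint64_t desc_blocks(uint64_t groups, uint32_t blocksize,
			    uint32_t desc_size)
{
	uint64_t per_block = blocksize / desc_size;

	return (groups + per_block - 1) / per_block;
}
```

With 4096-byte blocks the first helper gives 16 TiB.  Going from 32- to
64-byte descriptors doubles the descriptor blocks for the same group
count, which is why other metadata may have to move out of the way.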

> Otherwise one nit below.
> 
>                                                 - Zheng
> 
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> >  resize/main.c         |   40 ++++++-
> >  resize/resize2fs.8.in |   18 +++
> >  resize/resize2fs.c    |  282 ++++++++++++++++++++++++++++++++++++++++++++++++-
> >  resize/resize2fs.h    |    3 +
> >  4 files changed, 336 insertions(+), 7 deletions(-)
> > 
> > 
> > diff --git a/resize/main.c b/resize/main.c
> > index 1394ae1..ad0c946 100644
> > --- a/resize/main.c
> > +++ b/resize/main.c
> > @@ -41,7 +41,7 @@ char *program_name, *device_name, *io_options;
> >  static void usage (char *prog)
> >  {
> >  	fprintf (stderr, _("Usage: %s [-d debug_flags] [-f] [-F] [-M] [-P] "
> > -			   "[-p] device [new_size]\n\n"), prog);
> > +			   "[-p] device [-b|-s|new_size]\n\n"), prog);
> >  
> >  	exit (1);
> >  }
> > @@ -199,7 +199,7 @@ int main (int argc, char ** argv)
> >  	if (argc && *argv)
> >  		program_name = *argv;
> >  
> > -	while ((c = getopt (argc, argv, "d:fFhMPpS:")) != EOF) {
> > +	while ((c = getopt(argc, argv, "d:fFhMPpS:bs")) != EOF) {
> >  		switch (c) {
> >  		case 'h':
> >  			usage(program_name);
> > @@ -225,6 +225,12 @@ int main (int argc, char ** argv)
> >  		case 'S':
> >  			use_stride = atoi(optarg);
> >  			break;
> > +		case 'b':
> > +			flags |= RESIZE_ENABLE_64BIT;
> > +			break;
> > +		case 's':
> > +			flags |= RESIZE_DISABLE_64BIT;
> > +			break;
> >  		default:
> >  			usage(program_name);
> >  		}
> > @@ -383,6 +389,10 @@ int main (int argc, char ** argv)
> >  		if (sys_page_size > fs->blocksize)
> >  			new_size &= ~((sys_page_size / fs->blocksize)-1);
> >  	}
> > +	/* If changing 64bit, don't change the filesystem size. */
> > +	if (flags & (RESIZE_DISABLE_64BIT | RESIZE_ENABLE_64BIT)) {
> > +		new_size = ext2fs_blocks_count(fs->super);
> > +	}
> >  	if (!EXT2_HAS_INCOMPAT_FEATURE(fs->super,
> >  				       EXT4_FEATURE_INCOMPAT_64BIT)) {
> >  		/* Take 16T down to 2^32-1 blocks */
> > @@ -434,7 +444,31 @@ int main (int argc, char ** argv)
> >  			fs->blocksize / 1024, new_size);
> >  		exit(1);
> >  	}
> > -	if (new_size == ext2fs_blocks_count(fs->super)) {
> > +	if (flags & RESIZE_DISABLE_64BIT && flags & RESIZE_ENABLE_64BIT) {
>             ^^^^^
> Coding style problem:
>         if ((flags & RESIZE_DISABLE_64BIT) && (flags & RESIZE_ENABLE_64BIT))

Yes, thank you for catching this.
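
The parenthesized form, as a standalone sketch.  conflicting_64bit_flags()
is a hypothetical helper, not something in the patch; the flag values are
the ones the patch adds to resize2fs.h.  Since & binds tighter than && in
C, the unparenthesized test behaves the same way, so the parentheses are
purely for readability:

```c
/* Flag values as added to resize/resize2fs.h by this patch. */
#define RESIZE_ENABLE_64BIT	0x0400
#define RESIZE_DISABLE_64BIT	0x0800

/* Returns nonzero if both -b and -s were requested. */
static int conflicting_64bit_flags(int flags)
{
	return (flags & RESIZE_DISABLE_64BIT) &&
	       (flags & RESIZE_ENABLE_64BIT);
}
```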

--D

> 
> > +		fprintf(stderr, _("Cannot set and unset 64bit feature.\n"));
> > +		exit(1);
> > +	} else if (flags & (RESIZE_DISABLE_64BIT | RESIZE_ENABLE_64BIT)) {
> > +		new_size = ext2fs_blocks_count(fs->super);
> > +		if (new_size >= (1ULL << 32)) {
> > +			fprintf(stderr, _("Cannot change the 64bit feature "
> > +				"on a filesystem that is larger than "
> > +				"2^32 blocks.\n"));
> > +			exit(1);
> > +		}
> > +		if (mount_flags & EXT2_MF_MOUNTED) {
> > +			fprintf(stderr, _("Cannot change the 64bit feature "
> > +				"while the filesystem is mounted.\n"));
> > +			exit(1);
> > +		}
> > +		if (flags & RESIZE_ENABLE_64BIT &&
>                     ^^^^
>                     ditto
> 
> > +		    !EXT2_HAS_INCOMPAT_FEATURE(fs->super,
> > +				EXT3_FEATURE_INCOMPAT_EXTENTS)) {
> > +			fprintf(stderr, _("Please enable the extents feature "
> > +				"with tune2fs before enabling the 64bit "
> > +				"feature.\n"));
> > +			exit(1);
> > +		}
> > +	} else if (new_size == ext2fs_blocks_count(fs->super)) {
> >  		fprintf(stderr, _("The filesystem is already %llu blocks "
> >  			"long.  Nothing to do!\n\n"), new_size);
> >  		exit(0);
> > diff --git a/resize/resize2fs.8.in b/resize/resize2fs.8.in
> > index a1f3099..1c75816 100644
> > --- a/resize/resize2fs.8.in
> > +++ b/resize/resize2fs.8.in
> > @@ -8,7 +8,7 @@ resize2fs \- ext2/ext3/ext4 file system resizer
> >  .SH SYNOPSIS
> >  .B resize2fs
> >  [
> > -.B \-fFpPM
> > +.B \-fFpPMbs
> >  ]
> >  [
> >  .B \-d
> > @@ -85,8 +85,21 @@ to shrink the size of filesystem.  Then you may use
> >  to shrink the size of the partition.  When shrinking the size of
> >  the partition, make sure you do not make it smaller than the new size
> >  of the ext2 filesystem!
> > +.PP
> > +The
> > +.B \-b
> > +and
> > +.B \-s
> > +options enable and disable the 64bit feature, respectively.  The resize2fs
> > +program will, of course, take care of resizing the block group descriptors
> > +and moving other data blocks out of the way, as needed.  It is not possible
> > +to resize the filesystem concurrent with changing the 64bit status.
> >  .SH OPTIONS
> >  .TP
> > +.B \-b
> > +Turns on the 64bit feature, resizes the group descriptors as necessary, and
> > +moves other metadata out of the way.
> > +.TP
> >  .B \-d \fIdebug-flags
> >  Turns on various resize2fs debugging features, if they have been compiled
> >  into the binary.
> > @@ -126,6 +139,9 @@ of what the program is doing.
> >  .B \-P
> >  Print the minimum size of the filesystem and exit.
> >  .TP
> > +.B \-s
> > +Turns off the 64bit feature and frees blocks that are no longer in use.
> > +.TP
> >  .B \-S \fIRAID-stride
> >  The
> >  .B resize2fs
> > diff --git a/resize/resize2fs.c b/resize/resize2fs.c
> > index 0feff0f..05ba6e1 100644
> > --- a/resize/resize2fs.c
> > +++ b/resize/resize2fs.c
> > @@ -53,6 +53,9 @@ static errcode_t ext2fs_calculate_summary_stats(ext2_filsys fs);
> >  static errcode_t fix_sb_journal_backup(ext2_filsys fs);
> >  static errcode_t mark_table_blocks(ext2_filsys fs,
> >  				   ext2fs_block_bitmap bmap);
> > +static errcode_t resize_group_descriptors(ext2_resize_t rfs, blk64_t new_size);
> > +static errcode_t move_bg_metadata(ext2_resize_t rfs);
> > +static errcode_t zero_high_bits_in_inodes(ext2_resize_t rfs);
> >  
> >  /*
> >   * Some helper CPP macros
> > @@ -119,13 +122,30 @@ errcode_t resize_fs(ext2_filsys fs, blk64_t *new_size, int flags,
> >  	if (retval)
> >  		goto errout;
> >  
> > +	init_resource_track(&rtrack, "resize_group_descriptors", fs->io);
> > +	retval = resize_group_descriptors(rfs, *new_size);
> > +	if (retval)
> > +		goto errout;
> > +	print_resource_track(rfs, &rtrack, fs->io);
> > +
> > +	init_resource_track(&rtrack, "move_bg_metadata", fs->io);
> > +	retval = move_bg_metadata(rfs);
> > +	if (retval)
> > +		goto errout;
> > +	print_resource_track(rfs, &rtrack, fs->io);
> > +
> > +	init_resource_track(&rtrack, "zero_high_bits_in_metadata", fs->io);
> > +	retval = zero_high_bits_in_inodes(rfs);
> > +	if (retval)
> > +		goto errout;
> > +	print_resource_track(rfs, &rtrack, fs->io);
> > +
> >  	init_resource_track(&rtrack, "adjust_superblock", fs->io);
> >  	retval = adjust_superblock(rfs, *new_size);
> >  	if (retval)
> >  		goto errout;
> >  	print_resource_track(rfs, &rtrack, fs->io);
> >  
> > -
> >  	init_resource_track(&rtrack, "fix_uninit_block_bitmaps 2", fs->io);
> >  	fix_uninit_block_bitmaps(rfs->new_fs);
> >  	print_resource_track(rfs, &rtrack, fs->io);
> > @@ -221,6 +241,259 @@ errout:
> >  	return retval;
> >  }
> >  
> > +/* Toggle 64bit mode */
> > +static errcode_t resize_group_descriptors(ext2_resize_t rfs, blk64_t new_size)
> > +{
> > +	void *o, *n, *new_group_desc;
> > +	dgrp_t i;
> > +	int copy_size;
> > +	errcode_t retval;
> > +
> > +	if (!(rfs->flags & (RESIZE_DISABLE_64BIT | RESIZE_ENABLE_64BIT)))
> > +		return 0;
> > +
> > +	if (new_size != ext2fs_blocks_count(rfs->new_fs->super) ||
> > +	    ext2fs_blocks_count(rfs->new_fs->super) >= (1ULL << 32) ||
> > +	    (rfs->flags & RESIZE_DISABLE_64BIT &&
> > +	     rfs->flags & RESIZE_ENABLE_64BIT))
> > +		return EXT2_ET_INVALID_ARGUMENT;
> > +
> > +	if (rfs->flags & RESIZE_DISABLE_64BIT) {
> > +		rfs->new_fs->super->s_feature_incompat &=
> > +				~EXT4_FEATURE_INCOMPAT_64BIT;
> > +		rfs->new_fs->super->s_desc_size = EXT2_MIN_DESC_SIZE;
> > +	} else if (rfs->flags & RESIZE_ENABLE_64BIT) {
> > +		rfs->new_fs->super->s_feature_incompat |=
> > +				EXT4_FEATURE_INCOMPAT_64BIT;
> > +		rfs->new_fs->super->s_desc_size = EXT2_MIN_DESC_SIZE_64BIT;
> > +	}
> > +
> > +	if (EXT2_DESC_SIZE(rfs->old_fs->super) ==
> > +	    EXT2_DESC_SIZE(rfs->new_fs->super))
> > +		return 0;
> > +
> > +	o = rfs->new_fs->group_desc;
> > +	rfs->new_fs->desc_blocks = ext2fs_div_ceil(
> > +			rfs->old_fs->group_desc_count,
> > +			EXT2_DESC_PER_BLOCK(rfs->new_fs->super));
> > +	retval = ext2fs_get_arrayzero(rfs->new_fs->desc_blocks,
> > +				      rfs->old_fs->blocksize, &new_group_desc);
> > +	if (retval)
> > +		return retval;
> > +
> > +	n = new_group_desc;
> > +
> > +	if (EXT2_DESC_SIZE(rfs->old_fs->super) <=
> > +	    EXT2_DESC_SIZE(rfs->new_fs->super))
> > +		copy_size = EXT2_DESC_SIZE(rfs->old_fs->super);
> > +	else
> > +		copy_size = EXT2_DESC_SIZE(rfs->new_fs->super);
> > +	for (i = 0; i < rfs->old_fs->group_desc_count; i++) {
> > +		memcpy(n, o, copy_size);
> > +		n += EXT2_DESC_SIZE(rfs->new_fs->super);
> > +		o += EXT2_DESC_SIZE(rfs->old_fs->super);
> > +	}
> > +
> > +	ext2fs_free_mem(&rfs->new_fs->group_desc);
> > +	rfs->new_fs->group_desc = new_group_desc;
> > +
> > +	for (i = 0; i < rfs->old_fs->group_desc_count; i++)
> > +		ext2fs_group_desc_csum_set(rfs->new_fs, i);
> > +
> > +	return 0;
> > +}
> > +
> > +/* Move bitmaps/inode tables out of the way. */
> > +static errcode_t move_bg_metadata(ext2_resize_t rfs)
> > +{
> > +	dgrp_t i;
> > +	blk64_t b, c, d;
> > +	ext2fs_block_bitmap old_map, new_map;
> > +	int old, new;
> > +	errcode_t retval;
> > +	int zero = 0, one = 1;
> > +
> > +	if (!(rfs->flags & (RESIZE_DISABLE_64BIT | RESIZE_ENABLE_64BIT)))
> > +		return 0;
> > +
> > +	retval = ext2fs_allocate_block_bitmap(rfs->old_fs, "oldfs", &old_map);
> > +	if (retval)
> > +		return retval;
> > +
> > +	retval = ext2fs_allocate_block_bitmap(rfs->new_fs, "newfs", &new_map);
> > +	if (retval)
> > +		goto out;
> > +
> > +	/* Construct bitmaps of super/descriptor blocks in old and new fs */
> > +	for (i = 0; i < rfs->old_fs->group_desc_count; i++) {
> > +		retval = ext2fs_super_and_bgd_loc2(rfs->old_fs, i, &b, &c, &d,
> > +						   NULL);
> > +		if (retval)
> > +			goto out;
> > +		ext2fs_mark_block_bitmap2(old_map, b);
> > +		ext2fs_mark_block_bitmap2(old_map, c);
> > +		ext2fs_mark_block_bitmap2(old_map, d);
> > +
> > +		retval = ext2fs_super_and_bgd_loc2(rfs->new_fs, i, &b, &c, &d,
> > +						   NULL);
> > +		if (retval)
> > +			goto out;
> > +		ext2fs_mark_block_bitmap2(new_map, b);
> > +		ext2fs_mark_block_bitmap2(new_map, c);
> > +		ext2fs_mark_block_bitmap2(new_map, d);
> > +	}
> > +
> > +	/* Find changes in block allocations for bg metadata */
> > +	for (b = 0;
> > +	     b < ext2fs_blocks_count(rfs->new_fs->super);
> > +	     b += EXT2FS_CLUSTER_RATIO(rfs->new_fs)) {
> > +		old = ext2fs_test_block_bitmap2(old_map, b);
> > +		new = ext2fs_test_block_bitmap2(new_map, b);
> > +
> > +		if (old && !new)
> > +			ext2fs_unmark_block_bitmap2(rfs->new_fs->block_map, b);
> > +		else if (!old && new)
> > +			; /* empty ext2fs_mark_block_bitmap2(new_map, b); */
> > +		else
> > +			ext2fs_unmark_block_bitmap2(new_map, b);
> > +	}
> > +	/* new_map now shows blocks that have been newly allocated. */
> > +
> > +	/* Move any conflicting bitmaps and inode tables */
> > +	for (i = 0; i < rfs->old_fs->group_desc_count; i++) {
> > +		b = ext2fs_block_bitmap_loc(rfs->new_fs, i);
> > +		if (ext2fs_test_block_bitmap2(new_map, b))
> > +			ext2fs_block_bitmap_loc_set(rfs->new_fs, i, 0);
> > +
> > +		b = ext2fs_inode_bitmap_loc(rfs->new_fs, i);
> > +		if (ext2fs_test_block_bitmap2(new_map, b))
> > +			ext2fs_inode_bitmap_loc_set(rfs->new_fs, i, 0);
> > +
> > +		c = ext2fs_inode_table_loc(rfs->new_fs, i);
> > +		for (b = 0; b < rfs->new_fs->inode_blocks_per_group; b++) {
> > +			if (ext2fs_test_block_bitmap2(new_map, b + c)) {
> > +				ext2fs_inode_table_loc_set(rfs->new_fs, i, 0);
> > +				break;
> > +			}
> > +		}
> > +	}
> > +
> > +out:
> > +	if (old_map)
> > +		ext2fs_free_block_bitmap(old_map);
> > +	if (new_map)
> > +		ext2fs_free_block_bitmap(new_map);
> > +	return retval;
> > +}
> > +
> > +/* Zero out the high bits of extent fields */
> > +static errcode_t zero_high_bits_in_extents(ext2_filsys fs, ext2_ino_t ino,
> > +				 struct ext2_inode *inode)
> > +{
> > +	ext2_extent_handle_t	handle;
> > +	struct ext2fs_extent	extent;
> > +	int			op = EXT2_EXTENT_ROOT;
> > +	errcode_t		errcode;
> > +
> > +	if (!(inode->i_flags & EXT4_EXTENTS_FL))
> > +		return 0;
> > +
> > +	errcode = ext2fs_extent_open(fs, ino, &handle);
> > +	if (errcode)
> > +		return errcode;
> > +
> > +	while (1) {
> > +		errcode = ext2fs_extent_get(handle, op, &extent);
> > +		if (errcode)
> > +			break;
> > +
> > +		op = EXT2_EXTENT_NEXT_SIB;
> > +
> > +		if (extent.e_pblk > (1ULL << 32)) {
> > +			extent.e_pblk &= (1ULL << 32) - 1;
> > +			errcode = ext2fs_extent_replace(handle, 0, &extent);
> > +			if (errcode)
> > +				break;
> > +		}
> > +	}
> > +
> > +	/* Ok if we run off the end */
> > +	if (errcode == EXT2_ET_EXTENT_NO_NEXT)
> > +		errcode = 0;
> > +	return errcode;
> > +}
> > +
> > +/* Zero out the high bits of inodes. */
> > +static errcode_t zero_high_bits_in_inodes(ext2_resize_t rfs)
> > +{
> > +	ext2_filsys	fs = rfs->new_fs;
> > +	int length = EXT2_INODE_SIZE(fs->super);
> > +	struct ext2_inode *inode = NULL;
> > +	ext2_inode_scan	scan = NULL;
> > +	errcode_t	retval;
> > +	ext2_ino_t	ino;
> > +	blk64_t		file_acl_block;
> > +	int		inode_dirty;
> > +
> > +	if (!(rfs->flags & (RESIZE_DISABLE_64BIT | RESIZE_ENABLE_64BIT)))
> > +		return 0;
> > +
> > +	if (fs->super->s_creator_os != EXT2_OS_LINUX)
> > +		return 0;
> > +
> > +	retval = ext2fs_open_inode_scan(fs, 0, &scan);
> > +	if (retval)
> > +		return retval;
> > +
> > +	retval = ext2fs_get_mem(length, &inode);
> > +	if (retval)
> > +		goto out;
> > +
> > +	do {
> > +		retval = ext2fs_get_next_inode_full(scan, &ino, inode, length);
> > +		if (retval)
> > +			goto out;
> > +		if (!ino)
> > +			break;
> > +		if (!ext2fs_test_inode_bitmap2(fs->inode_map, ino))
> > +			continue;
> > +
> > +		/*
> > +		 * Here's how we deal with high block number fields:
> > +		 *
> > +		 *  - i_size_high has been written out with i_size_lo
> > +		 *    since the ext2 days, so no conversion is needed.
> > +		 *
> > +		 *  - i_blocks_hi is guarded by both the huge_file feature and
> > +		 *    inode flags and has always been written out with
> > +		 *    i_blocks_lo if the feature is set.  The field is only
> > +		 *    ever read if both feature and inode flag are set, so
> > +		 *    we don't need to zero it now.
> > +		 *
> > +		 *  - i_file_acl_high can be uninitialized, so zero it if
> > +		 *    it isn't already.
> > +		 */
> > +		if (inode->osd2.linux2.l_i_file_acl_high) {
> > +			inode->osd2.linux2.l_i_file_acl_high = 0;
> > +			retval = ext2fs_write_inode_full(fs, ino, inode,
> > +							 length);
> > +			if (retval)
> > +				goto out;
> > +		}
> > +
> > +		retval = zero_high_bits_in_extents(fs, ino, inode);
> > +		if (retval)
> > +			goto out;
> > +	} while (ino);
> > +
> > +out:
> > +	if (inode)
> > +		ext2fs_free_mem(&inode);
> > +	if (scan)
> > +		ext2fs_close_inode_scan(scan);
> > +	return retval;
> > +}
> > +
> >  /*
> >   * Clean up the bitmaps for unitialized bitmaps
> >   */
> > @@ -424,7 +697,8 @@ retry:
> >  	/*
> >  	 * Reallocate the group descriptors as necessary.
> >  	 */
> > -	if (old_fs->desc_blocks != fs->desc_blocks) {
> > +	if (EXT2_DESC_SIZE(old_fs->super) == EXT2_DESC_SIZE(fs->super) &&
> > +	    old_fs->desc_blocks != fs->desc_blocks) {
> >  		retval = ext2fs_resize_mem(old_fs->desc_blocks *
> >  					   fs->blocksize,
> >  					   fs->desc_blocks * fs->blocksize,
> > @@ -949,7 +1223,9 @@ static errcode_t blocks_to_move(ext2_resize_t rfs)
> >  		new_blocks = fs->desc_blocks + fs->super->s_reserved_gdt_blocks;
> >  	}
> >  
> > -	if (old_blocks == new_blocks) {
> > +	if (EXT2_DESC_SIZE(rfs->old_fs->super) ==
> > +	    EXT2_DESC_SIZE(rfs->new_fs->super) &&
> > +	    old_blocks == new_blocks) {
> >  		retval = 0;
> >  		goto errout;
> >  	}
> > diff --git a/resize/resize2fs.h b/resize/resize2fs.h
> > index 52319b5..5a1c5dc 100644
> > --- a/resize/resize2fs.h
> > +++ b/resize/resize2fs.h
> > @@ -82,6 +82,9 @@ typedef struct ext2_sim_progress *ext2_sim_progmeter;
> >  #define RESIZE_PERCENT_COMPLETE		0x0100
> >  #define RESIZE_VERBOSE			0x0200
> >  
> > +#define RESIZE_ENABLE_64BIT		0x0400
> > +#define RESIZE_DISABLE_64BIT		0x0800
> > +
> >  /*
> >   * This structure is used for keeping track of how much resources have
> >   * been used for a particular resize2fs pass.
> > 


* Re: [PATCH 23/25] libext2fs: support modifying arbitrary extended attributes
  2013-11-26  7:21   ` Zheng Liu
@ 2013-11-26 19:55     ` Darrick J. Wong
  2013-11-27  2:52       ` Zheng Liu
  2013-11-27  1:56     ` Darrick J. Wong
  1 sibling, 1 reply; 73+ messages in thread
From: Darrick J. Wong @ 2013-11-26 19:55 UTC (permalink / raw)
  To: tytso, linux-ext4

On Tue, Nov 26, 2013 at 03:21:16PM +0800, Zheng Liu wrote:
> On Thu, Oct 17, 2013 at 09:51:34PM -0700, Darrick J. Wong wrote:
> > Add functions to allow clients to get, set, and remove extended
> > attributes from any file.  It also supports modifying EAs living in
> > i_file_acl.
> > 
> > v2: Put the header declarations in the correct part of ext2fs.h,
> > provide a function to release an EA block from an inode, and check
> > i_extra_isize to make sure we actually have space for in-inode EAs.
> 
> Is this the latest version?  I am working on inline data patch set for
> e2fsprogs, and I want to use these API to manipulate the EA.  So that
> would be great if you could point out which one is the latest version.
> Thanks in advance.  Otherwise some nits below.

Oh!  I was just about to start working on pulling your patches into my monster
patchset. :)

I changed the extended attribute API a little bit -- the function pointer to
ext2fs_xattrs_iterate() takes a value length; lengths are now specified in
size_t; and the ext2fs_xattrs_count() call is new.  I removed
ext2fs_xattrs_expand() since it's an internal call.

This is the current set of APIs:

errcode_t ext2fs_xattrs_write(struct ext2_xattr_handle *handle);
errcode_t ext2fs_xattrs_read(struct ext2_xattr_handle *handle);
errcode_t ext2fs_xattrs_iterate(struct ext2_xattr_handle *h,
                               int (*func)(char *name, char *value,
                                           size_t value_len, void *data),
                               void *data);
errcode_t ext2fs_xattr_get(struct ext2_xattr_handle *h, const char *key,
                          void **value, size_t *value_len);
errcode_t ext2fs_xattr_set(struct ext2_xattr_handle *handle,
                          const char *key,
                          const void *value,
                          size_t value_len);
errcode_t ext2fs_xattr_remove(struct ext2_xattr_handle *handle,
                             const char *key);
errcode_t ext2fs_xattrs_open(ext2_filsys fs, ext2_ino_t ino,
                            struct ext2_xattr_handle **handle);
errcode_t ext2fs_xattrs_close(struct ext2_xattr_handle **handle);
errcode_t ext2fs_free_ext_attr(ext2_filsys fs, ext2_ino_t ino,
                              struct ext2_inode_large *inode);
size_t ext2fs_xattrs_count(struct ext2_xattr_handle *handle);
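
To illustrate the callback contract with the new value length argument,
here is a self-contained sketch.  It deliberately does not use libext2fs:
struct demo_xattr, demo_xattrs_iterate() and sum_lengths() are stand-ins
invented for this example, and only XATTR_ABORT and XATTR_CHANGED carry
the values from the patch.  Unlike ext2fs_xattrs_iterate(), the stand-in
returns the dirty state instead of storing it in a handle:

```c
#include <stddef.h>
#include <string.h>

#define XATTR_ABORT	1
#define XATTR_CHANGED	2

/* Stand-in for one stored attribute; not the real libext2fs structure. */
struct demo_xattr {
	char *name;
	char *value;
	size_t value_len;
};

typedef int (*demo_xattr_func)(char *name, char *value, size_t value_len,
			       void *data);

/*
 * Walk the array, skip empty slots, and honor the callback's return
 * flags.  Returns 1 if any callback reported XATTR_CHANGED.
 */
static int demo_xattrs_iterate(struct demo_xattr *attrs, size_t count,
			       demo_xattr_func func, void *data)
{
	int dirty = 0;
	size_t i;

	for (i = 0; i < count; i++) {
		int ret;

		if (!attrs[i].name)
			continue;
		ret = func(attrs[i].name, attrs[i].value,
			   attrs[i].value_len, data);
		if (ret & XATTR_CHANGED)
			dirty = 1;
		if (ret & XATTR_ABORT)
			break;
	}
	return dirty;
}

/* Example callback: total up value lengths, stop after "user.stop". */
static int sum_lengths(char *name, char *value, size_t value_len, void *data)
{
	(void)value;
	*(size_t *)data += value_len;
	return strcmp(name, "user.stop") == 0 ? XATTR_ABORT : 0;
}
```

Passing the length explicitly matters because values are arbitrary bytes,
so a callback cannot rely on strlen() to find the end.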

I was planning a couple of modifications to support inline_data -- since we can
rewrite the inode-ea and ea-block arbitrarily, ext2fs_xattrs_write() ought to
ensure that the inline_data EA gets written into i_blocks and the beginning of
the inode-ea area.

Should the attributes be sorted before writing?  I was thinking that the
desirable(?) order might be inline_data, security attributes, "everything
else", then user attributes?  Or we could simply maintain FCFS order as is done
now.
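
If sorting turns out to be wanted, the order floated above could be
sketched roughly like this.  Everything here is hypothetical: struct
demo_attr, attr_rank(), attr_cmp() and sort_attrs() are invented for the
example, and "system.data" is assumed to be the name of the inline_data
EA.  Ties are broken on insertion order so that FCFS is preserved within
each class:

```c
#include <stdlib.h>
#include <string.h>

/* Stand-in for an in-memory attribute; seq records insertion order. */
struct demo_attr {
	const char *name;
	unsigned int seq;
};

/* Rank per the proposed order; lower sorts first. */
static int attr_rank(const char *name)
{
	if (strcmp(name, "system.data") == 0)
		return 0;	/* assumed inline_data EA name */
	if (strncmp(name, "security.", 9) == 0)
		return 1;
	if (strncmp(name, "user.", 5) == 0)
		return 3;
	return 2;		/* everything else */
}

/* qsort() is not guaranteed stable, so break ties on seq to keep FCFS. */
static int attr_cmp(const void *a, const void *b)
{
	const struct demo_attr *x = a, *y = b;
	int rx = attr_rank(x->name), ry = attr_rank(y->name);

	if (rx != ry)
		return rx - ry;
	return (x->seq > y->seq) - (x->seq < y->seq);
}

static void sort_attrs(struct demo_attr *attrs, size_t count)
{
	qsort(attrs, count, sizeof(*attrs), attr_cmp);
}
```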

The other change would make ext2fs_xattr_set() return
EXT2_ET_INLINE_DATA_NO_SPACE if it figures out that there's not enough space in
i_blocks + inode-ea to fit the inline data.

> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> >  lib/ext2fs/ext2_err.et.in |   18 +
> >  lib/ext2fs/ext2fs.h       |   28 ++
> >  lib/ext2fs/ext_attr.c     |  761 +++++++++++++++++++++++++++++++++++++++++++++
> >  3 files changed, 807 insertions(+)
> > 
> > 
> > diff --git a/lib/ext2fs/ext2_err.et.in b/lib/ext2fs/ext2_err.et.in
> > index 9cc1bd1..b819a90 100644
> > --- a/lib/ext2fs/ext2_err.et.in
> > +++ b/lib/ext2fs/ext2_err.et.in
> > @@ -482,4 +482,22 @@ ec	EXT2_ET_BLOCK_BITMAP_CSUM_INVALID,
> >  ec	EXT2_ET_INLINE_DATA_CANT_ITERATE,
> >  	"Cannot block iterate on an inode containing inline data"
> >  
> > +ec	EXT2_ET_EA_BAD_NAME_LEN,
> > +	"Extended attribute has an invalid name length"
> > +
> > +ec	EXT2_ET_EA_BAD_VALUE_SIZE,
> > +	"Extended attribute has an invalid value length"
> > +
> > +ec	EXT2_ET_BAD_EA_HASH,
> > +	"Extended attribute has an incorrect hash"
> > +
> > +ec	EXT2_ET_BAD_EA_HEADER,
> > +	"Extended attribute block has a bad header"
> > +
> > +ec	EXT2_ET_EA_KEY_NOT_FOUND,
> > +	"Extended attribute key not found"
> > +
> > +ec	EXT2_ET_EA_NO_SPACE,
> > +	"Insufficient space to store extended attribute data"
> > +
> >  	end
> > diff --git a/lib/ext2fs/ext2fs.h b/lib/ext2fs/ext2fs.h
> > index 5247922..93adae8 100644
> > --- a/lib/ext2fs/ext2fs.h
> > +++ b/lib/ext2fs/ext2fs.h
> > @@ -637,6 +637,13 @@ typedef struct stat ext2fs_struct_stat;
> >  #define EXT2_FLAG_FLUSH_NO_SYNC          1
> >  
> >  /*
> > + * Modify and iterate extended attributes
> > + */
> > +struct ext2_xattr_handle;
> > +#define XATTR_ABORT	1
> > +#define XATTR_CHANGED	2
> > +
> > +/*
> >   * function prototypes
> >   */
> >  static inline int ext2fs_has_group_desc_csum(ext2_filsys fs)
> > @@ -1151,6 +1158,27 @@ extern errcode_t ext2fs_adjust_ea_refcount3(ext2_filsys fs, blk64_t blk,
> >  					   char *block_buf,
> >  					   int adjust, __u32 *newcount,
> >  					   ext2_ino_t inum);
> > +errcode_t ext2fs_xattrs_expand(struct ext2_xattr_handle *h,
> > +			       unsigned int expandby);
> > +errcode_t ext2fs_xattrs_write(struct ext2_xattr_handle *handle);
> > +errcode_t ext2fs_xattrs_read(struct ext2_xattr_handle *handle);
> > +errcode_t ext2fs_xattrs_iterate(struct ext2_xattr_handle *h,
> > +				int (*func)(char *name, char *value,
> > +					    void *data),
> > +				void *data);
> > +errcode_t ext2fs_xattr_get(struct ext2_xattr_handle *h, const char *key,
> > +			   void **value, unsigned int *value_len);
> > +errcode_t ext2fs_xattr_set(struct ext2_xattr_handle *handle,
> > +			   const char *key,
> > +			   const void *value,
> > +			   unsigned int value_len);
> > +errcode_t ext2fs_xattr_remove(struct ext2_xattr_handle *handle,
> > +			      const char *key);
> > +errcode_t ext2fs_xattrs_open(ext2_filsys fs, ext2_ino_t ino,
> > +			     struct ext2_xattr_handle **handle);
> > +errcode_t ext2fs_xattrs_close(struct ext2_xattr_handle **handle);
> > +errcode_t ext2fs_free_ext_attr(ext2_filsys fs, ext2_ino_t ino,
> > +			       struct ext2_inode_large *inode);
> >  
> >  /* extent.c */
> >  extern errcode_t ext2fs_extent_header_verify(void *ptr, int size);
> > diff --git a/lib/ext2fs/ext_attr.c b/lib/ext2fs/ext_attr.c
> > index 9649a14..2a1e5e7 100644
> > --- a/lib/ext2fs/ext_attr.c
> > +++ b/lib/ext2fs/ext_attr.c
> > @@ -186,3 +186,764 @@ errcode_t ext2fs_adjust_ea_refcount(ext2_filsys fs, blk_t blk,
> >  	return ext2fs_adjust_ea_refcount2(fs, blk, block_buf, adjust,
> >  					  newcount);
> >  }
> > +
> > +/* Manipulate the contents of extended attribute regions */
> > +struct ext2_xattr {
> > +	char *name;
> > +	void *value;
> > +	unsigned int value_len;
> > +};
> > +
> > +struct ext2_xattr_handle {
> > +	ext2_filsys fs;
> > +	struct ext2_xattr *attrs;
> > +	unsigned int length;
> > +	ext2_ino_t ino;
> > +	int dirty;
> > +};
> > +
> > +errcode_t ext2fs_xattrs_expand(struct ext2_xattr_handle *h,
> > +			       unsigned int expandby)
> > +{
> > +	struct ext2_xattr *new_attrs;
> > +	errcode_t err;
> > +
> > +	err = ext2fs_get_arrayzero(h->length + expandby,
> > +				   sizeof(struct ext2_xattr), &new_attrs);
> > +	if (err)
> > +		return err;
> > +
> > +	memcpy(new_attrs, h->attrs, h->length * sizeof(struct ext2_xattr));
> > +	ext2fs_free_mem(&h->attrs);
> > +	h->length += expandby;
> > +	h->attrs = new_attrs;
> > +
> > +	return 0;
> > +}
> > +
> > +struct ea_name_index {
> > +	int index;
> > +	const char *name;
> > +};
> > +
> > +static struct ea_name_index ea_names[] = {
> > +	{1, "user."},
> > +	{2, "system.posix_acl_access"},
> > +	{3, "system.posix_acl_default"},
> > +	{4, "trusted."},
> > +	{6, "security."},
> > +	{7, "system."},
> 
> It seems that we also have a _RICHACL name here.
> 
> > +	{0, NULL},
> > +};
> > +
> > +static const char *find_ea_prefix(int index)
> > +{
> > +	struct ea_name_index *e;
> > +
> > +	for (e = ea_names; e->name; e++)
> > +		if (e->index == index)
> > +			return e->name;
> > +
> > +	return NULL;
> > +}
> > +
> > +static int find_ea_index(const char *fullname, char **name, int *index)
> > +{
> > +	struct ea_name_index *e;
> > +
> > +	for (e = ea_names; e->name; e++)
> 
> Coding style problem:
>        for (e = ea_names; e->name; e++) {
>                ...
>        }

Ok I'll change it.
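
For reference, a minimal standalone sketch of the braced loop.  This is
not the respin itself: lookup_ea_index() is a stand-in name, and the
sketch swaps memcmp() for strncmp() so that a key shorter than a table
prefix is never read past its terminating NUL.  That swap is an
assumption of the sketch, not something the review asked for.

```c
#include <stddef.h>
#include <string.h>

struct ea_name_index {
	int index;
	const char *name;
};

/* Same table as in the patch; longer "system.*" names precede "system." */
static const struct ea_name_index ea_names[] = {
	{1, "user."},
	{2, "system.posix_acl_access"},
	{3, "system.posix_acl_default"},
	{4, "trusted."},
	{6, "security."},
	{7, "system."},
	{0, NULL},
};

/*
 * Split fullname into a name index and the remainder after the prefix.
 * Returns 1 on a match, 0 if no prefix matched (outputs left untouched).
 */
static int lookup_ea_index(const char *fullname, const char **name, int *index)
{
	const struct ea_name_index *e;

	for (e = ea_names; e->name; e++) {
		size_t len = strlen(e->name);

		/* strncmp stops at the NUL of a short fullname */
		if (strncmp(fullname, e->name, len) == 0) {
			*name = fullname + len;
			*index = e->index;
			return 1;
		}
	}
	return 0;
}
```

The first matching prefix wins, so the two posix_acl names have to stay
ahead of the generic "system." entry.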

--D

> Thanks,
>                                                 - Zheng
> 
> > +		if (memcmp(fullname, e->name, strlen(e->name)) == 0) {
> > +			*name = (char *)fullname + strlen(e->name);
> > +			*index = e->index;
> > +			return 1;
> > +		}
> > +	return 0;
> > +}
> > +
> > +errcode_t ext2fs_free_ext_attr(ext2_filsys fs, ext2_ino_t ino,
> > +			       struct ext2_inode_large *inode)
> > +{
> > +	struct ext2_ext_attr_header *header;
> > +	void *block_buf = NULL;
> > +	dgrp_t grp;
> > +	blk64_t blk, goal;
> > +	errcode_t err;
> > +	struct ext2_inode_large i;
> > +
> > +	/* Read inode? */
> > +	if (inode == NULL) {
> > +		err = ext2fs_read_inode_full(fs, ino, (struct ext2_inode *)&i,
> > +					     sizeof(struct ext2_inode_large));
> > +		if (err)
> > +			return err;
> > +		inode = &i;
> > +	}
> > +
> > +	/* Do we already have an EA block? */
> > +	blk = ext2fs_file_acl_block(fs, (struct ext2_inode *)inode);
> > +	if (blk == 0)
> > +		return 0;
> > +
> > +	/* Find block, zero it, write back */
> > +	if ((blk < fs->super->s_first_data_block) ||
> > +	    (blk >= ext2fs_blocks_count(fs->super))) {
> > +		err = EXT2_ET_BAD_EA_BLOCK_NUM;
> > +		goto out;
> > +	}
> > +
> > +	err = ext2fs_get_mem(fs->blocksize, &block_buf);
> > +	if (err)
> > +		goto out;
> > +
> > +	err = ext2fs_read_ext_attr3(fs, blk, block_buf, ino);
> > +	if (err)
> > +		goto out2;
> > +
> > +	header = (struct ext2_ext_attr_header *) block_buf;
> > +	if (header->h_magic != EXT2_EXT_ATTR_MAGIC) {
> > +		err = EXT2_ET_BAD_EA_HEADER;
> > +		goto out2;
> > +	}
> > +
> > +	header->h_refcount--;
> > +	err = ext2fs_write_ext_attr3(fs, blk, block_buf, ino);
> > +	if (err)
> > +		goto out2;
> > +
> > +	/* Erase link to block */
> > +	ext2fs_file_acl_block_set(fs, (struct ext2_inode *)inode, 0);
> > +	if (header->h_refcount == 0)
> > +		ext2fs_block_alloc_stats2(fs, blk, -1);
> > +
> > +	/* Write inode? */
> > +	if (inode == &i) {
> > +		err = ext2fs_write_inode_full(fs, ino, (struct ext2_inode *)&i,
> > +					      sizeof(struct ext2_inode_large));
> > +		if (err)
> > +			goto out2;
> > +	}
> > +
> > +out2:
> > +	ext2fs_free_mem(&block_buf);
> > +out:
> > +	return err;
> > +}
> > +
> > +static errcode_t prep_ea_block_for_write(ext2_filsys fs, ext2_ino_t ino,
> > +					 struct ext2_inode_large *inode)
> > +{
> > +	struct ext2_ext_attr_header *header;
> > +	void *block_buf = NULL;
> > +	dgrp_t grp;
> > +	blk64_t blk, goal;
> > +	errcode_t err;
> > +
> > +	/* Do we already have an EA block? */
> > +	blk = ext2fs_file_acl_block(fs, (struct ext2_inode *)inode);
> > +	if (blk != 0) {
> > +		if ((blk < fs->super->s_first_data_block) ||
> > +		    (blk >= ext2fs_blocks_count(fs->super))) {
> > +			err = EXT2_ET_BAD_EA_BLOCK_NUM;
> > +			goto out;
> > +		}
> > +
> > +		err = ext2fs_get_mem(fs->blocksize, &block_buf);
> > +		if (err)
> > +			goto out;
> > +
> > +		err = ext2fs_read_ext_attr3(fs, blk, block_buf, ino);
> > +		if (err)
> > +			goto out2;
> > +
> > +		header = (struct ext2_ext_attr_header *) block_buf;
> > +		if (header->h_magic != EXT2_EXT_ATTR_MAGIC) {
> > +			err = EXT2_ET_BAD_EA_HEADER;
> > +			goto out2;
> > +		}
> > +
> > +		/* Single-user block.  We're done here. */
> > +		if (header->h_refcount == 1)
> > +			return 0;
> > +
> > +		/* We need to CoW the block. */
> > +		header->h_refcount--;
> > +		err = ext2fs_write_ext_attr3(fs, blk, block_buf, ino);
> > +		if (err)
> > +			goto out2;
> > +	} else {
> > +		/* No block, we must increment i_blocks */
> > +		err = ext2fs_iblk_add_blocks(fs, (struct ext2_inode *)inode,
> > +					     1);
> > +		if (err)
> > +			goto out;
> > +	}
> > +
> > +	/* Allocate a block */
> > +	grp = ext2fs_group_of_ino(fs, ino);
> > +	goal = ext2fs_inode_table_loc(fs, grp);
> > +	err = ext2fs_alloc_block2(fs, goal, NULL, &blk);
> > +	if (err)
> > +		return err;
> > +	ext2fs_file_acl_block_set(fs, (struct ext2_inode *)inode, blk);
> > +out2:
> > +	ext2fs_free_mem(&block_buf);
> > +out:
> > +	return err;
> > +}
> > +
> > +
> > +static errcode_t write_xattrs_to_buffer(struct ext2_xattr_handle *handle,
> > +					struct ext2_xattr **pos,
> > +					void *entries_start,
> > +					unsigned int storage_size,
> > +					unsigned int value_offset_correction)
> > +{
> > +	struct ext2_xattr *x = *pos;
> > +	struct ext2_ext_attr_entry *e = entries_start;
> > +	void *end = entries_start + storage_size;
> > +	char *shortname;
> > +	unsigned int entry_size, value_size;
> > +	int idx, ret;
> > +
> > +	/* For all remaining x...  */
> > +	for (; x < handle->attrs + handle->length; x++) {
> > +		if (!x->name)
> > +			continue;
> > +
> > +		/* Calculate index and shortname position */
> > +		shortname = x->name;
> > +		ret = find_ea_index(x->name, &shortname, &idx);
> > +
> > +		/* Calculate entry and value size */
> > +		entry_size = (sizeof(*e) + strlen(shortname) +
> > +			      EXT2_EXT_ATTR_PAD - 1) &
> > +			     ~(EXT2_EXT_ATTR_PAD - 1);
> > +		value_size = ((x->value_len + EXT2_EXT_ATTR_PAD - 1) /
> > +			      EXT2_EXT_ATTR_PAD) * EXT2_EXT_ATTR_PAD;
> > +
> > +		/*
> > +		 * Would entry collide with value?
> > +		 * Note that we must leave sufficient room for a (u32)0 to
> > +		 * mark the end of the entries.
> > +		 */
> > +		if ((void *)e + entry_size + sizeof(__u32) > end - value_size)
> > +			break;
> > +
> > +		/* Fill out e appropriately */
> > +		e->e_name_len = strlen(shortname);
> > +		e->e_name_index = (ret ? idx : 0);
> > +		e->e_value_offs = end - value_size - (void *)entries_start +
> > +				value_offset_correction;
> > +		e->e_value_block = 0;
> > +		e->e_value_size = x->value_len;
> > +
> > +		/* Store name and value */
> > +		end -= value_size;
> > +		memcpy((void *)e + sizeof(*e), shortname, e->e_name_len);
> > +		memcpy(end, x->value, e->e_value_size);
> > +
> > +		e->e_hash = ext2fs_ext_attr_hash_entry(e, end);
> > +
> > +		e = EXT2_EXT_ATTR_NEXT(e);
> > +		*(__u32 *)e = 0;
> > +	}
> > +	*pos = x;
> > +
> > +	return 0;
> > +}
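
The size arithmetic in write_xattrs_to_buffer() can be sketched on plain
offsets.  This is an illustration under stated assumptions: ATTR_PAD
stands in for EXT2_EXT_ATTR_PAD (taken to be 4), ENTRY_HDR for
sizeof(struct ext2_ext_attr_entry) (taken to be 16), and pad_up() and
entry_fits() are hypothetical helpers that are not part of the patch.
The guard against a value larger than the remaining region is also an
addition of the sketch.

```c
#include <stddef.h>

#define ATTR_PAD	4u	/* stand-in for EXT2_EXT_ATTR_PAD */
#define ENTRY_HDR	16u	/* assumed sizeof(struct ext2_ext_attr_entry) */

/* Round n up to the next multiple of ATTR_PAD (a power of two). */
static unsigned int pad_up(unsigned int n)
{
	return (n + ATTR_PAD - 1) & ~(ATTR_PAD - 1);
}

/*
 * Entries grow upward from offset 0; values grow downward from the end
 * of the region.  entries_used is the offset of the next free entry and
 * values_start is the offset of the lowest value stored so far.
 * Returns 1 if one more entry plus the 4-byte terminator still fits.
 */
static int entry_fits(unsigned int entries_used, unsigned int values_start,
		      size_t name_len, unsigned int value_len)
{
	unsigned int entry_size = pad_up(ENTRY_HDR + (unsigned int)name_len);
	unsigned int value_size = pad_up(value_len);

	/* Guard added by this sketch: avoid unsigned underflow below. */
	if (value_size > values_start)
		return 0;

	return entries_used + entry_size + 4 <= values_start - value_size;
}
```

For a power-of-two pad, the mask form used for entry_size and the
divide-and-multiply form used for value_size round to the same result,
so a single helper covers both.
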
> > +
> > +errcode_t ext2fs_xattrs_write(struct ext2_xattr_handle *handle)
> > +{
> > +	struct ext2_xattr *x;
> > +	struct ext2_inode_large *inode;
> > +	void *start, *block_buf = NULL;
> > +	struct ext2_ext_attr_header *header;
> > +	__u32 ea_inode_magic;
> > +	blk64_t blk;
> > +	unsigned int storage_size;
> > +	unsigned int i, written;
> > +	errcode_t err;
> > +
> > +	if (!EXT2_HAS_COMPAT_FEATURE(handle->fs->super,
> > +				     EXT2_FEATURE_COMPAT_EXT_ATTR))
> > +		return 0;
> > +
> > +	i = EXT2_INODE_SIZE(handle->fs->super);
> > +	if (i < sizeof(*inode))
> > +		i = sizeof(*inode);
> > +	err = ext2fs_get_memzero(i, &inode);
> > +	if (err)
> > +		return err;
> > +
> > +	err = ext2fs_read_inode_full(handle->fs, handle->ino,
> > +				     (struct ext2_inode *)inode,
> > +				     EXT2_INODE_SIZE(handle->fs->super));
> > +	if (err)
> > +		goto out;
> > +
> > +	x = handle->attrs;
> > +	/* Does the inode have size for EA? */
> > +	if (EXT2_INODE_SIZE(handle->fs->super) <= EXT2_GOOD_OLD_INODE_SIZE +
> > +						  inode->i_extra_isize +
> > +						  sizeof(__u32))
> > +		goto write_ea_block;
> > +
> > +	/* Write the inode EA */
> > +	ea_inode_magic = EXT2_EXT_ATTR_MAGIC;
> > +	memcpy(((char *) inode) + EXT2_GOOD_OLD_INODE_SIZE +
> > +	       inode->i_extra_isize, &ea_inode_magic, sizeof(__u32));
> > +	storage_size = EXT2_INODE_SIZE(handle->fs->super) -
> > +		EXT2_GOOD_OLD_INODE_SIZE - inode->i_extra_isize -
> > +		sizeof(__u32);
> > +	start = ((char *) inode) + EXT2_GOOD_OLD_INODE_SIZE +
> > +		inode->i_extra_isize + sizeof(__u32);
> > +
> > +	err = write_xattrs_to_buffer(handle, &x, start, storage_size, 0);
> > +	if (err)
> > +		goto out;
> > +
> > +	/* Are we done? */
> > +	if (x == handle->attrs + handle->length)
> > +		goto skip_ea_block;
> > +
> > +write_ea_block:
> > +	/* Write the EA block */
> > +	err = ext2fs_get_mem(handle->fs->blocksize, &block_buf);
> > +	if (err)
> > +		goto out;
> > +
> > +	storage_size = handle->fs->blocksize -
> > +		sizeof(struct ext2_ext_attr_header);
> > +	start = block_buf + sizeof(struct ext2_ext_attr_header);
> > +
> > +	err = write_xattrs_to_buffer(handle, &x, start, storage_size,
> > +				     (void *)start - block_buf);
> > +	if (err)
> > +		goto out2;
> > +
> > +	if (x < handle->attrs + handle->length) {
> > +		err = EXT2_ET_EA_NO_SPACE;
> > +		goto out2;
> > +	}
> > +
> > +	if (block_buf) {
> > +		/* Write a header on the EA block */
> > +		header = block_buf;
> > +		header->h_magic = EXT2_EXT_ATTR_MAGIC;
> > +		header->h_refcount = 1;
> > +		header->h_blocks = 1;
> > +
> > +		/* Get a new block for writing */
> > +		err = prep_ea_block_for_write(handle->fs, handle->ino, inode);
> > +		if (err)
> > +			goto out2;
> > +
> > +		/* Finally, write the new EA block */
> > +		blk = ext2fs_file_acl_block(handle->fs,
> > +					    (struct ext2_inode *)inode);
> > +		err = ext2fs_write_ext_attr3(handle->fs, blk, block_buf,
> > +					     handle->ino);
> > +		if (err)
> > +			goto out2;
> > +	}
> > +
> > +skip_ea_block:
> > +	blk = ext2fs_file_acl_block(handle->fs, (struct ext2_inode *)inode);
> > +	if (!block_buf && blk) {
> > +		/* xattrs shrunk, free the block */
> > +		ext2fs_file_acl_block_set(handle->fs,
> > +					  (struct ext2_inode *)inode, 0);
> > +		err = ext2fs_iblk_sub_blocks(handle->fs,
> > +					     (struct ext2_inode *)inode, 1);
> > +		if (err)
> > +			goto out;
> > +		ext2fs_block_alloc_stats2(handle->fs, blk, -1);
> > +	}
> > +
> > +	/* Write the inode */
> > +	err = ext2fs_write_inode_full(handle->fs, handle->ino,
> > +				      (struct ext2_inode *)inode,
> > +				      EXT2_INODE_SIZE(handle->fs->super));
> > +	if (err)
> > +		goto out2;
> > +
> > +out2:
> > +	ext2fs_free_mem(&block_buf);
> > +out:
> > +	ext2fs_free_mem(&inode);
> > +	handle->dirty = 0;
> > +	return err;
> > +}
> > +
> > +static errcode_t read_xattrs_from_buffer(struct ext2_xattr_handle *handle,
> > +					 struct ext2_ext_attr_entry *entries,
> > +					 unsigned int storage_size,
> > +					 void *value_start)
> > +{
> > +	struct ext2_xattr *x;
> > +	struct ext2_ext_attr_entry *entry;
> > +	const char *prefix;
> > +	void *ptr;
> > +	unsigned int remain, prefix_len;
> > +	errcode_t err;
> > +
> > +	x = handle->attrs;
> > +	while (x->name)
> > +		x++;
> > +
> > +	entry = entries;
> > +	while (!EXT2_EXT_IS_LAST_ENTRY(entry)) {
> > +		__u32 hash;
> > +
> > +		/* header eats this space */
> > +		remain -= sizeof(struct ext2_ext_attr_entry);
> > +
> > +		/* is attribute name valid? */
> > +		if (EXT2_EXT_ATTR_SIZE(entry->e_name_len) > remain)
> > +			return EXT2_ET_EA_BAD_NAME_LEN;
> > +
> > +		/* attribute len eats this space */
> > +		remain -= EXT2_EXT_ATTR_SIZE(entry->e_name_len);
> > +
> > +		/* check value size */
> > +		if (entry->e_value_size > remain)
> > +			return EXT2_ET_EA_BAD_VALUE_SIZE;
> > +
> > +		/* e_value_block must be 0 in inode's ea */
> > +		if (entry->e_value_block != 0)
> > +			return EXT2_ET_BAD_EA_BLOCK_NUM;
> > +
> > +		hash = ext2fs_ext_attr_hash_entry(entry, value_start +
> > +							 entry->e_value_offs);
> > +
> > +		/* e_hash may be 0 in older inode's ea */
> > +		if (entry->e_hash != 0 && entry->e_hash != hash)
> > +			return EXT2_ET_BAD_EA_HASH;
> > +
> > +		remain -= entry->e_value_size;
> > +
> > +		/* Allocate space for more attrs? */
> > +		if (x == handle->attrs + handle->length) {
> > +			err = ext2fs_xattrs_expand(handle, 4);
> > +			if (err)
> > +				return err;
> > +			x = handle->attrs + handle->length - 4;
> > +		}
> > +
> > +		/* Extract name/value */
> > +		prefix = find_ea_prefix(entry->e_name_index);
> > +		prefix_len = (prefix ? strlen(prefix) : 0);
> > +		err = ext2fs_get_memzero(entry->e_name_len + prefix_len + 1,
> > +					 &x->name);
> > +		if (err)
> > +			return err;
> > +		if (prefix)
> > +			memcpy(x->name, prefix, prefix_len);
> > +		if (entry->e_name_len)
> > +			memcpy(x->name + prefix_len,
> > +			       (void *)entry + sizeof(*entry),
> > +			       entry->e_name_len);
> > +
> > +		err = ext2fs_get_mem(entry->e_value_size, &x->value);
> > +		if (err)
> > +			return err;
> > +		x->value_len = entry->e_value_size;
> > +		memcpy(x->value, value_start + entry->e_value_offs,
> > +		       entry->e_value_size);
> > +		x++;
> > +		entry = EXT2_EXT_ATTR_NEXT(entry);
> > +	}
> > +
> > +	return 0;
> > +}
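
As quoted, read_xattrs_from_buffer() appears to decrement "remain"
before anything is assigned to it, and storage_size is otherwise unused.
A minimal sketch of the byte-budget accounting with the budget seeded
from storage_size follows.  It works on a simplified description of each
entry rather than on the on-disk layout; struct entry_desc, pad_up() and
check_budget() are hypothetical names, and the 4-byte pad and 16-byte
entry header are assumptions.

```c
#include <stddef.h>

#define ATTR_PAD	4u	/* stand-in for EXT2_EXT_ATTR_PAD */
#define ENTRY_HDR	16u	/* assumed sizeof(struct ext2_ext_attr_entry) */

/* Simplified view of one entry; hypothetical, for illustration only. */
struct entry_desc {
	unsigned int name_len;
	unsigned int value_size;
};

/* Round n up to the next multiple of ATTR_PAD (a power of two). */
static unsigned int pad_up(unsigned int n)
{
	return (n + ATTR_PAD - 1) & ~(ATTR_PAD - 1);
}

/*
 * Returns 0 if all n entries fit in storage_size bytes, otherwise the
 * 1-based position of the first entry that overruns the budget.
 */
static unsigned int check_budget(const struct entry_desc *d, unsigned int n,
				 unsigned int storage_size)
{
	unsigned int remain = storage_size;	/* seed the budget first */
	unsigned int i;

	for (i = 0; i < n; i++) {
		unsigned int name_size = pad_up(d[i].name_len);

		/* the entry header eats this space */
		if (remain < ENTRY_HDR)
			return i + 1;
		remain -= ENTRY_HDR;

		/* the padded name eats this space */
		if (name_size > remain)
			return i + 1;
		remain -= name_size;

		/* the value eats this space */
		if (d[i].value_size > remain)
			return i + 1;
		remain -= d[i].value_size;
	}
	return 0;
}
```

Checking each quantity against the budget before subtracting keeps the
unsigned counter from wrapping.
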
> > +
> > +errcode_t ext2fs_xattrs_read(struct ext2_xattr_handle *handle)
> > +{
> > +	struct ext2_xattr *attrs = NULL, *x;
> > +	unsigned int attrs_len;
> > +	struct ext2_inode_large *inode;
> > +	struct ext2_ext_attr_header *header;
> > +	__u32 ea_inode_magic;
> > +	unsigned int storage_size;
> > +	void *start, *block_buf = NULL;
> > +	blk64_t blk;
> > +	int i;
> > +	errcode_t err;
> > +
> > +	if (!EXT2_HAS_COMPAT_FEATURE(handle->fs->super,
> > +				     EXT2_FEATURE_COMPAT_EXT_ATTR))
> > +		return 0;
> > +
> > +	i = EXT2_INODE_SIZE(handle->fs->super);
> > +	if (i < sizeof(*inode))
> > +		i = sizeof(*inode);
> > +	err = ext2fs_get_memzero(i, &inode);
> > +	if (err)
> > +		return err;
> > +
> > +	err = ext2fs_read_inode_full(handle->fs, handle->ino,
> > +				     (struct ext2_inode *)inode,
> > +				     EXT2_INODE_SIZE(handle->fs->super));
> > +	if (err)
> > +		goto out;
> > +
> > +	/* Does the inode have size for EA? */
> > +	if (EXT2_INODE_SIZE(handle->fs->super) <= EXT2_GOOD_OLD_INODE_SIZE +
> > +						  inode->i_extra_isize +
> > +						  sizeof(__u32))
> > +		goto read_ea_block;
> > +
> > +	/* Look for EA in the inode */
> > +	memcpy(&ea_inode_magic, ((char *) inode) + EXT2_GOOD_OLD_INODE_SIZE +
> > +	       inode->i_extra_isize, sizeof(__u32));
> > +	if (ea_inode_magic == EXT2_EXT_ATTR_MAGIC) {
> > +		storage_size = EXT2_INODE_SIZE(handle->fs->super) -
> > +			EXT2_GOOD_OLD_INODE_SIZE - inode->i_extra_isize -
> > +			sizeof(__u32);
> > +		start = ((char *) inode) + EXT2_GOOD_OLD_INODE_SIZE +
> > +			inode->i_extra_isize + sizeof(__u32);
> > +
> > +		err = read_xattrs_from_buffer(handle, start, storage_size,
> > +					      start);
> > +		if (err)
> > +			goto out;
> > +	}
> > +
> > +read_ea_block:
> > +	/* Look for EA in a separate EA block */
> > +	blk = ext2fs_file_acl_block(handle->fs, (struct ext2_inode *)inode);
> > +	if (blk != 0) {
> > +		if ((blk < handle->fs->super->s_first_data_block) ||
> > +		    (blk >= ext2fs_blocks_count(handle->fs->super))) {
> > +			err = EXT2_ET_BAD_EA_BLOCK_NUM;
> > +			goto out;
> > +		}
> > +
> > +		err = ext2fs_get_mem(handle->fs->blocksize, &block_buf);
> > +		if (err)
> > +			goto out;
> > +
> > +		err = ext2fs_read_ext_attr3(handle->fs, blk, block_buf,
> > +					    handle->ino);
> > +		if (err)
> > +			goto out3;
> > +
> > +		header = (struct ext2_ext_attr_header *) block_buf;
> > +		if (header->h_magic != EXT2_EXT_ATTR_MAGIC) {
> > +			err = EXT2_ET_BAD_EA_HEADER;
> > +			goto out3;
> > +		}
> > +
> > +		if (header->h_blocks != 1) {
> > +			err = EXT2_ET_BAD_EA_HEADER;
> > +			goto out3;
> > +		}
> > +
> > +		/* Read EAs */
> > +		storage_size = handle->fs->blocksize -
> > +			sizeof(struct ext2_ext_attr_header);
> > +		start = block_buf + sizeof(struct ext2_ext_attr_header);
> > +		err = read_xattrs_from_buffer(handle, start, storage_size,
> > +					      block_buf);
> > +		if (err)
> > +			goto out3;
> > +
> > +		ext2fs_free_mem(&block_buf);
> > +	}
> > +
> > +	ext2fs_free_mem(&block_buf);
> > +	ext2fs_free_mem(&inode);
> > +	return 0;
> > +
> > +out3:
> > +	ext2fs_free_mem(&block_buf);
> > +out:
> > +	ext2fs_free_mem(&inode);
> > +	return err;
> > +}
> > +
> > +#define XATTR_ABORT	1
> > +#define XATTR_CHANGED	2
> > +errcode_t ext2fs_xattrs_iterate(struct ext2_xattr_handle *h,
> > +				int (*func)(char *name, char *value,
> > +					    void *data),
> > +				void *data)
> > +{
> > +	struct ext2_xattr *x;
> > +	errcode_t err;
> > +	int ret;
> > +
> > +	for (x = h->attrs; x < h->attrs + h->length; x++) {
> > +		if (!x->name)
> > +			continue;
> > +
> > +		ret = func(x->name, x->value, data);
> > +		if (ret & XATTR_CHANGED)
> > +			h->dirty = 1;
> > +		if (ret & XATTR_ABORT)
> > +			return 0;
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> > +errcode_t ext2fs_xattr_get(struct ext2_xattr_handle *h, const char *key,
> > +			   void **value, unsigned int *value_len)
> > +{
> > +	struct ext2_xattr *x;
> > +	void *val;
> > +	errcode_t err;
> > +
> > +	for (x = h->attrs; x < h->attrs + h->length; x++) {
> > +		if (!x->name)
> > +			continue;
> > +
> > +		if (strcmp(x->name, key) == 0) {
> > +			err = ext2fs_get_mem(x->value_len, &val);
> > +			if (err)
> > +				return err;
> > +			memcpy(val, x->value, x->value_len);
> > +			*value = val;
> > +			*value_len = x->value_len;
> > +			return 0;
> > +		}
> > +	}
> > +
> > +	return EXT2_ET_EA_KEY_NOT_FOUND;
> > +}
> > +
> > +errcode_t ext2fs_xattr_set(struct ext2_xattr_handle *handle,
> > +			   const char *key,
> > +			   const void *value,
> > +			   unsigned int value_len)
> > +{
> > +	struct ext2_xattr *x, *last_empty;
> > +	char *new_value;
> > +	errcode_t err;
> > +
> > +	last_empty = NULL;
> > +	for (x = handle->attrs; x < handle->attrs + handle->length; x++) {
> > +		if (!x->name) {
> > +			last_empty = x;
> > +			continue;
> > +		}
> > +
> > +		/* Replace xattr */
> > +		if (strcmp(x->name, key) == 0) {
> > +			err = ext2fs_get_mem(value_len, &new_value);
> > +			if (err)
> > +				return err;
> > +			memcpy(new_value, value, value_len);
> > +			ext2fs_free_mem(&x->value);
> > +			x->value = new_value;
> > +			x->value_len = value_len;
> > +			handle->dirty = 1;
> > +			return 0;
> > +		}
> > +	}
> > +
> > +	/* Add attr to empty slot */
> > +	if (last_empty) {
> > +		err = ext2fs_get_mem(strlen(key) + 1, &last_empty->name);
> > +		if (err)
> > +			return err;
> > +		strcpy(last_empty->name, key);
> > +
> > +		err = ext2fs_get_mem(value_len, &last_empty->value);
> > +		if (err)
> > +			return err;
> > +		memcpy(last_empty->value, value, value_len);
> > +		last_empty->value_len = value_len;
> > +		handle->dirty = 1;
> > +		return 0;
> > +	}
> > +
> > +	/* Expand array, append slot */
> > +	err = ext2fs_xattrs_expand(handle, 4);
> > +	if (err)
> > +		return err;
> > +
> > +	x = handle->attrs + handle->length - 4;
> > +	err = ext2fs_get_mem(strlen(key) + 1, &x->name);
> > +	if (err)
> > +		return err;
> > +	strcpy(x->name, key);
> > +
> > +	err = ext2fs_get_mem(value_len, &x->value);
> > +	if (err)
> > +		return err;
> > +	memcpy(x->value, value, value_len);
> > +	x->value_len = value_len;
> > +	handle->dirty = 1;
> > +	return 0;
> > +}
> > +
> > +errcode_t ext2fs_xattr_remove(struct ext2_xattr_handle *handle,
> > +			      const char *key)
> > +{
> > +	struct ext2_xattr *x;
> > +	errcode_t err;
> > +
> > +	for (x = handle->attrs; x < handle->attrs + handle->length; x++) {
> > +		if (!x->name)
> > +			continue;
> > +
> > +		if (strcmp(x->name, key) == 0) {
> > +			ext2fs_free_mem(&x->name);
> > +			ext2fs_free_mem(&x->value);
> > +			x->value_len = 0;
> > +			handle->dirty = 1;
> > +			return 0;
> > +		}
> > +	}
> > +
> > +	return EXT2_ET_EA_KEY_NOT_FOUND;
> > +}
> > +
> > +errcode_t ext2fs_xattrs_open(ext2_filsys fs, ext2_ino_t ino,
> > +			     struct ext2_xattr_handle **handle)
> > +{
> > +	struct ext2_xattr_handle *h;
> > +	errcode_t err;
> > +
> > +	err = ext2fs_get_memzero(sizeof(*h), &h);
> > +	if (err)
> > +		return err;
> > +
> > +	h->length = 4;
> > +	err = ext2fs_get_arrayzero(h->length, sizeof(struct ext2_xattr),
> > +				   &h->attrs);
> > +	if (err) {
> > +		ext2fs_free_mem(&h);
> > +		return err;
> > +	}
> > +	h->ino = ino;
> > +	h->fs = fs;
> > +	*handle = h;
> > +	return 0;
> > +}
> > +
> > +errcode_t ext2fs_xattrs_close(struct ext2_xattr_handle **handle)
> > +{
> > +	unsigned int i;
> > +	struct ext2_xattr_handle *h = *handle;
> > +	struct ext2_xattr *a = h->attrs;
> > +	errcode_t err;
> > +
> > +	if (h->dirty) {
> > +		err = ext2fs_xattrs_write(h);
> > +		if (err)
> > +			return err;
> > +	}
> > +
> > +	for (i = 0; i < h->length; i++) {
> > +		if (a[i].name)
> > +			ext2fs_free_mem(&a[i].name);
> > +		if (a[i].value)
> > +			ext2fs_free_mem(&a[i].value);
> > +	}
> > +
> > +	ext2fs_free_mem(&h->attrs);
> > +	ext2fs_free_mem(handle);
> > +	return 0;
> > +}
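
The slot handling in ext2fs_xattr_set() (replace an existing key, else
reuse the last empty slot, else grow the array by four) can be sketched
with plain libc allocation.  This is an illustration, not the library
code: struct kv, struct kv_table, dup_bytes() and kv_set() are
hypothetical names.  The sketch also allocates both copies before it
touches the table, so a failed allocation leaves the table unchanged;
that ordering is a choice of the sketch, not something in the patch.

```c
#include <stdlib.h>
#include <string.h>

struct kv {
	char *name;		/* NULL marks an empty slot */
	void *value;
	size_t value_len;
};

struct kv_table {
	struct kv *slots;
	size_t length;
	int dirty;
};

/* Duplicate len bytes; returns NULL on allocation failure. */
static void *dup_bytes(const void *src, size_t len)
{
	void *p = malloc(len ? len : 1);

	if (p && len)
		memcpy(p, src, len);
	return p;
}

/* Returns 0 on success, -1 on allocation failure. */
static int kv_set(struct kv_table *t, const char *key,
		  const void *value, size_t value_len)
{
	struct kv *empty = NULL;
	char *new_name;
	void *new_value;
	size_t i;

	new_value = dup_bytes(value, value_len);
	if (!new_value)
		return -1;

	for (i = 0; i < t->length; i++) {
		struct kv *x = &t->slots[i];

		if (!x->name) {
			empty = x;	/* remember the last empty slot */
			continue;
		}
		if (strcmp(x->name, key) == 0) {
			/* Replace the value of an existing key. */
			free(x->value);
			x->value = new_value;
			x->value_len = value_len;
			t->dirty = 1;
			return 0;
		}
	}

	new_name = dup_bytes(key, strlen(key) + 1);
	if (!new_name) {
		free(new_value);
		return -1;
	}

	if (!empty) {
		/* No free slot: grow by four zeroed slots. */
		struct kv *grown = realloc(t->slots,
					   (t->length + 4) * sizeof(*grown));

		if (!grown) {
			free(new_name);
			free(new_value);
			return -1;
		}
		memset(grown + t->length, 0, 4 * sizeof(*grown));
		empty = grown + t->length;
		t->slots = grown;
		t->length += 4;
	}

	empty->name = new_name;
	empty->value = new_value;
	empty->value_len = value_len;
	t->dirty = 1;
	return 0;
}
```

Starting from an empty table, the first kv_set() grows the array to
four slots and fills slot 0; a later key lands in the last empty slot
rather than the next one, which mirrors last_empty in the patch.
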
> > 


* Re: [PATCH 23/25] libext2fs: support modifying arbitrary extended attributes
  2013-11-26  7:21   ` Zheng Liu
  2013-11-26 19:55     ` Darrick J. Wong
@ 2013-11-27  1:56     ` Darrick J. Wong
  2013-11-29  5:30       ` Zheng Liu
  1 sibling, 1 reply; 73+ messages in thread
From: Darrick J. Wong @ 2013-11-27  1:56 UTC (permalink / raw)
  To: tytso, linux-ext4

On Tue, Nov 26, 2013 at 03:21:16PM +0800, Zheng Liu wrote:
> On Thu, Oct 17, 2013 at 09:51:34PM -0700, Darrick J. Wong wrote:
> > Add functions to allow clients to get, set, and remove extended
> > attributes from any file.  It also supports modifying EAs living in
> > i_file_acl.
> > 
> > v2: Put the header declarations in the correct part of ext2fs.h,
> > provide a function to release an EA block from an inode, and check
> > i_extra_isize to make sure we actually have space for in-inode EAs.
> 
> Is this the latest version?  I am working on the inline data patch set for
> e2fsprogs, and I want to use these APIs to manipulate the EA.  So it
> would be great if you could point out which one is the latest version.
> Thanks in advance.  Otherwise, some nits below.
> 
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> >  lib/ext2fs/ext2_err.et.in |   18 +
> >  lib/ext2fs/ext2fs.h       |   28 ++
> >  lib/ext2fs/ext_attr.c     |  761 +++++++++++++++++++++++++++++++++++++++++++++
> >  3 files changed, 807 insertions(+)
> > 
> > 
> > diff --git a/lib/ext2fs/ext2_err.et.in b/lib/ext2fs/ext2_err.et.in
> > index 9cc1bd1..b819a90 100644
> > --- a/lib/ext2fs/ext2_err.et.in
> > +++ b/lib/ext2fs/ext2_err.et.in
> > @@ -482,4 +482,22 @@ ec	EXT2_ET_BLOCK_BITMAP_CSUM_INVALID,
> >  ec	EXT2_ET_INLINE_DATA_CANT_ITERATE,
> >  	"Cannot block iterate on an inode containing inline data"
> >  
> > +ec	EXT2_ET_EA_BAD_NAME_LEN,
> > +	"Extended attribute has an invalid name length"
> > +
> > +ec	EXT2_ET_EA_BAD_VALUE_SIZE,
> > +	"Extended attribute has an invalid value length"
> > +
> > +ec	EXT2_ET_BAD_EA_HASH,
> > +	"Extended attribute has an incorrect hash"
> > +
> > +ec	EXT2_ET_BAD_EA_HEADER,
> > +	"Extended attribute block has a bad header"
> > +
> > +ec	EXT2_ET_EA_KEY_NOT_FOUND,
> > +	"Extended attribute key not found"
> > +
> > +ec	EXT2_ET_EA_NO_SPACE,
> > +	"Insufficient space to store extended attribute data"
> > +
> >  	end
> > diff --git a/lib/ext2fs/ext2fs.h b/lib/ext2fs/ext2fs.h
> > index 5247922..93adae8 100644
> > --- a/lib/ext2fs/ext2fs.h
> > +++ b/lib/ext2fs/ext2fs.h
> > @@ -637,6 +637,13 @@ typedef struct stat ext2fs_struct_stat;
> >  #define EXT2_FLAG_FLUSH_NO_SYNC          1
> >  
> >  /*
> > + * Modify and iterate extended attributes
> > + */
> > +struct ext2_xattr_handle;
> > +#define XATTR_ABORT	1
> > +#define XATTR_CHANGED	2
> > +
> > +/*
> >   * function prototypes
> >   */
> >  static inline int ext2fs_has_group_desc_csum(ext2_filsys fs)
> > @@ -1151,6 +1158,27 @@ extern errcode_t ext2fs_adjust_ea_refcount3(ext2_filsys fs, blk64_t blk,
> >  					   char *block_buf,
> >  					   int adjust, __u32 *newcount,
> >  					   ext2_ino_t inum);
> > +errcode_t ext2fs_xattrs_expand(struct ext2_xattr_handle *h,
> > +			       unsigned int expandby);
> > +errcode_t ext2fs_xattrs_write(struct ext2_xattr_handle *handle);
> > +errcode_t ext2fs_xattrs_read(struct ext2_xattr_handle *handle);
> > +errcode_t ext2fs_xattrs_iterate(struct ext2_xattr_handle *h,
> > +				int (*func)(char *name, char *value,
> > +					    void *data),
> > +				void *data);
> > +errcode_t ext2fs_xattr_get(struct ext2_xattr_handle *h, const char *key,
> > +			   void **value, unsigned int *value_len);
> > +errcode_t ext2fs_xattr_set(struct ext2_xattr_handle *handle,
> > +			   const char *key,
> > +			   const void *value,
> > +			   unsigned int value_len);
> > +errcode_t ext2fs_xattr_remove(struct ext2_xattr_handle *handle,
> > +			      const char *key);
> > +errcode_t ext2fs_xattrs_open(ext2_filsys fs, ext2_ino_t ino,
> > +			     struct ext2_xattr_handle **handle);
> > +errcode_t ext2fs_xattrs_close(struct ext2_xattr_handle **handle);
> > +errcode_t ext2fs_free_ext_attr(ext2_filsys fs, ext2_ino_t ino,
> > +			       struct ext2_inode_large *inode);
> >  
> >  /* extent.c */
> >  extern errcode_t ext2fs_extent_header_verify(void *ptr, int size);
> > diff --git a/lib/ext2fs/ext_attr.c b/lib/ext2fs/ext_attr.c
> > index 9649a14..2a1e5e7 100644
> > --- a/lib/ext2fs/ext_attr.c
> > +++ b/lib/ext2fs/ext_attr.c
> > @@ -186,3 +186,764 @@ errcode_t ext2fs_adjust_ea_refcount(ext2_filsys fs, blk_t blk,
> >  	return ext2fs_adjust_ea_refcount2(fs, blk, block_buf, adjust,
> >  					  newcount);
> >  }
> > +
> > +/* Manipulate the contents of extended attribute regions */
> > +struct ext2_xattr {
> > +	char *name;
> > +	void *value;
> > +	unsigned int value_len;
> > +};
> > +
> > +struct ext2_xattr_handle {
> > +	ext2_filsys fs;
> > +	struct ext2_xattr *attrs;
> > +	unsigned int length;
> > +	ext2_ino_t ino;
> > +	int dirty;
> > +};
> > +
> > +errcode_t ext2fs_xattrs_expand(struct ext2_xattr_handle *h,
> > +			       unsigned int expandby)
> > +{
> > +	struct ext2_xattr *new_attrs;
> > +	errcode_t err;
> > +
> > +	err = ext2fs_get_arrayzero(h->length + expandby,
> > +				   sizeof(struct ext2_xattr), &new_attrs);
> > +	if (err)
> > +		return err;
> > +
> > +	memcpy(new_attrs, h->attrs, h->length * sizeof(struct ext2_xattr));
> > +	ext2fs_free_mem(&h->attrs);
> > +	h->length += expandby;
> > +	h->attrs = new_attrs;
> > +
> > +	return 0;
> > +}
> > +
> > +struct ea_name_index {
> > +	int index;
> > +	const char *name;
> > +};
> > +
> > +static struct ea_name_index ea_names[] = {
> > +	{1, "user."},
> > +	{2, "system.posix_acl_access"},
> > +	{3, "system.posix_acl_default"},
> > +	{4, "trusted."},
> > +	{6, "security."},
> > +	{7, "system."},
> 
> It seems that we also have a _RICHACL name here.

Yes.  Do you know what it's used for?  EXT4_XATTR_INDEX_RICHACL isn't used as
of 3.13-rc1.
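
If the entry were added anyway, the table and the index-to-name
direction might look like the sketch below.  Treat the details as
assumptions: index 8 is taken to be the value of
EXT4_XATTR_INDEX_RICHACL in the kernel's fs/ext4/xattr.h, the
"system.richacl" string is a guess at the matching name, and
prefix_for_index() and build_full_name() are hypothetical helpers
rather than functions from the patch.

```c
#include <stddef.h>
#include <string.h>

struct ea_name_index {
	int index;
	const char *name;
};

static const struct ea_name_index ea_names[] = {
	{1, "user."},
	{2, "system.posix_acl_access"},
	{3, "system.posix_acl_default"},
	{4, "trusted."},
	{6, "security."},
	/* assumed entry; kept ahead of "system." for first-match lookups */
	{8, "system.richacl"},
	{7, "system."},
	{0, NULL},
};

/* Map a name index to its prefix, or NULL if the index is unknown. */
static const char *prefix_for_index(int index)
{
	const struct ea_name_index *e;

	for (e = ea_names; e->name; e++) {
		if (e->index == index)
			return e->name;
	}
	return NULL;
}

/*
 * Build "prefix + shortname" into buf as a NUL-terminated string.
 * Returns the string length, or -1 if buf is too small.  An unknown
 * index contributes no prefix, as in the patch.
 */
static int build_full_name(int index, const char *shortname, size_t short_len,
			   char *buf, size_t buflen)
{
	const char *prefix = prefix_for_index(index);
	size_t prefix_len = prefix ? strlen(prefix) : 0;

	if (prefix_len + short_len + 1 > buflen)
		return -1;
	if (prefix_len)
		memcpy(buf, prefix, prefix_len);
	if (short_len)
		memcpy(buf + prefix_len, shortname, short_len);
	buf[prefix_len + short_len] = '\0';
	return (int)(prefix_len + short_len);
}
```

Like the two posix_acl entries, such an entry would carry the whole
name in the table, so the on-disk short name would be empty.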

--D
> 
> > +	{0, NULL},
> > +};
> > +
> > +static const char *find_ea_prefix(int index)
> > +{
> > +	struct ea_name_index *e;
> > +
> > +	for (e = ea_names; e->name; e++)
> > +		if (e->index == index)
> > +			return e->name;
> > +
> > +	return NULL;
> > +}
> > +
> > +static int find_ea_index(const char *fullname, char **name, int *index)
> > +{
> > +	struct ea_name_index *e;
> > +
> > +	for (e = ea_names; e->name; e++)
> 
> Coding style problem:
>        for (e = ea_names; e->name; e++) {
>                ...
>        }
> 
> Thanks,
>                                                 - Zheng
> 
> > +		if (memcmp(fullname, e->name, strlen(e->name)) == 0) {
> > +			*name = (char *)fullname + strlen(e->name);
> > +			*index = e->index;
> > +			return 1;
> > +		}
> > +	return 0;
> > +}
> > +
> > +errcode_t ext2fs_free_ext_attr(ext2_filsys fs, ext2_ino_t ino,
> > +			       struct ext2_inode_large *inode)
> > +{
> > +	struct ext2_ext_attr_header *header;
> > +	void *block_buf = NULL;
> > +	dgrp_t grp;
> > +	blk64_t blk, goal;
> > +	errcode_t err;
> > +	struct ext2_inode_large i;
> > +
> > +	/* Read inode? */
> > +	if (inode == NULL) {
> > +		err = ext2fs_read_inode_full(fs, ino, (struct ext2_inode *)&i,
> > +					     sizeof(struct ext2_inode_large));
> > +		if (err)
> > +			return err;
> > +		inode = &i;
> > +	}
> > +
> > +	/* Do we already have an EA block? */
> > +	blk = ext2fs_file_acl_block(fs, (struct ext2_inode *)inode);
> > +	if (blk == 0)
> > +		return 0;
> > +
> > +	/* Find block, zero it, write back */
> > +	if ((blk < fs->super->s_first_data_block) ||
> > +	    (blk >= ext2fs_blocks_count(fs->super))) {
> > +		err = EXT2_ET_BAD_EA_BLOCK_NUM;
> > +		goto out;
> > +	}
> > +
> > +	err = ext2fs_get_mem(fs->blocksize, &block_buf);
> > +	if (err)
> > +		goto out;
> > +
> > +	err = ext2fs_read_ext_attr3(fs, blk, block_buf, ino);
> > +	if (err)
> > +		goto out2;
> > +
> > +	header = (struct ext2_ext_attr_header *) block_buf;
> > +	if (header->h_magic != EXT2_EXT_ATTR_MAGIC) {
> > +		err = EXT2_ET_BAD_EA_HEADER;
> > +		goto out2;
> > +	}
> > +
> > +	header->h_refcount--;
> > +	err = ext2fs_write_ext_attr3(fs, blk, block_buf, ino);
> > +	if (err)
> > +		goto out2;
> > +
> > +	/* Erase link to block */
> > +	ext2fs_file_acl_block_set(fs, (struct ext2_inode *)inode, 0);
> > +	if (header->h_refcount == 0)
> > +		ext2fs_block_alloc_stats2(fs, blk, -1);
> > +
> > +	/* Write inode? */
> > +	if (inode == &i) {
> > +		err = ext2fs_write_inode_full(fs, ino, (struct ext2_inode *)&i,
> > +					      sizeof(struct ext2_inode_large));
> > +		if (err)
> > +			goto out2;
> > +	}
> > +
> > +out2:
> > +	ext2fs_free_mem(&block_buf);
> > +out:
> > +	return err;
> > +}
> > +
> > +static errcode_t prep_ea_block_for_write(ext2_filsys fs, ext2_ino_t ino,
> > +					 struct ext2_inode_large *inode)
> > +{
> > +	struct ext2_ext_attr_header *header;
> > +	void *block_buf = NULL;
> > +	dgrp_t grp;
> > +	blk64_t blk, goal;
> > +	errcode_t err;
> > +
> > +	/* Do we already have an EA block? */
> > +	blk = ext2fs_file_acl_block(fs, (struct ext2_inode *)inode);
> > +	if (blk != 0) {
> > +		if ((blk < fs->super->s_first_data_block) ||
> > +		    (blk >= ext2fs_blocks_count(fs->super))) {
> > +			err = EXT2_ET_BAD_EA_BLOCK_NUM;
> > +			goto out;
> > +		}
> > +
> > +		err = ext2fs_get_mem(fs->blocksize, &block_buf);
> > +		if (err)
> > +			goto out;
> > +
> > +		err = ext2fs_read_ext_attr3(fs, blk, block_buf, ino);
> > +		if (err)
> > +			goto out2;
> > +
> > +		header = (struct ext2_ext_attr_header *) block_buf;
> > +		if (header->h_magic != EXT2_EXT_ATTR_MAGIC) {
> > +			err = EXT2_ET_BAD_EA_HEADER;
> > +			goto out2;
> > +		}
> > +
> > +		/* Single-user block.  We're done here. */
> > +		if (header->h_refcount == 1)
> > +			return 0;
> > +
> > +		/* We need to CoW the block. */
> > +		header->h_refcount--;
> > +		err = ext2fs_write_ext_attr3(fs, blk, block_buf, ino);
> > +		if (err)
> > +			goto out2;
> > +	} else {
> > +		/* No block, we must increment i_blocks */
> > +		err = ext2fs_iblk_add_blocks(fs, (struct ext2_inode *)inode,
> > +					     1);
> > +		if (err)
> > +			goto out;
> > +	}
> > +
> > +	/* Allocate a block */
> > +	grp = ext2fs_group_of_ino(fs, ino);
> > +	goal = ext2fs_inode_table_loc(fs, grp);
> > +	err = ext2fs_alloc_block2(fs, goal, NULL, &blk);
> > +	if (err)
> > +		return err;
> > +	ext2fs_file_acl_block_set(fs, (struct ext2_inode *)inode, blk);
> > +out2:
> > +	ext2fs_free_mem(&block_buf);
> > +out:
> > +	return err;
> > +}
> > +
> > +
> > +static errcode_t write_xattrs_to_buffer(struct ext2_xattr_handle *handle,
> > +					struct ext2_xattr **pos,
> > +					void *entries_start,
> > +					unsigned int storage_size,
> > +					unsigned int value_offset_correction)
> > +{
> > +	struct ext2_xattr *x = *pos;
> > +	struct ext2_ext_attr_entry *e = entries_start;
> > +	void *end = entries_start + storage_size;
> > +	char *shortname;
> > +	unsigned int entry_size, value_size;
> > +	int idx, ret;
> > +
> > +	/* For all remaining x...  */
> > +	for (; x < handle->attrs + handle->length; x++) {
> > +		if (!x->name)
> > +			continue;
> > +
> > +		/* Calculate index and shortname position */
> > +		shortname = x->name;
> > +		ret = find_ea_index(x->name, &shortname, &idx);
> > +
> > +		/* Calculate entry and value size */
> > +		entry_size = (sizeof(*e) + strlen(shortname) +
> > +			      EXT2_EXT_ATTR_PAD - 1) &
> > +			     ~(EXT2_EXT_ATTR_PAD - 1);
> > +		value_size = ((x->value_len + EXT2_EXT_ATTR_PAD - 1) /
> > +			      EXT2_EXT_ATTR_PAD) * EXT2_EXT_ATTR_PAD;
> > +
> > +		/*
> > +		 * Would entry collide with value?
> > +		 * Note that we must leave sufficient room for a (u32)0 to
> > +		 * mark the end of the entries.
> > +		 */
> > +		if ((void *)e + entry_size + sizeof(__u32) > end - value_size)
> > +			break;
> > +
> > +		/* Fill out e appropriately */
> > +		e->e_name_len = strlen(shortname);
> > +		e->e_name_index = (ret ? idx : 0);
> > +		e->e_value_offs = end - value_size - (void *)entries_start +
> > +				value_offset_correction;
> > +		e->e_value_block = 0;
> > +		e->e_value_size = x->value_len;
> > +
> > +		/* Store name and value */
> > +		end -= value_size;
> > +		memcpy((void *)e + sizeof(*e), shortname, e->e_name_len);
> > +		memcpy(end, x->value, e->e_value_size);
> > +
> > +		e->e_hash = ext2fs_ext_attr_hash_entry(e, end);
> > +
> > +		e = EXT2_EXT_ATTR_NEXT(e);
> > +		*(__u32 *)e = 0;
> > +	}
> > +	*pos = x;
> > +
> > +	return 0;
> > +}
> > +
> > +errcode_t ext2fs_xattrs_write(struct ext2_xattr_handle *handle)
> > +{
> > +	struct ext2_xattr *x;
> > +	struct ext2_inode_large *inode;
> > +	void *start, *block_buf = NULL;
> > +	struct ext2_ext_attr_header *header;
> > +	__u32 ea_inode_magic;
> > +	blk64_t blk;
> > +	unsigned int storage_size;
> > +	unsigned int i, written;
> > +	errcode_t err;
> > +
> > +	if (!EXT2_HAS_COMPAT_FEATURE(handle->fs->super,
> > +				     EXT2_FEATURE_COMPAT_EXT_ATTR))
> > +		return 0;
> > +
> > +	i = EXT2_INODE_SIZE(handle->fs->super);
> > +	if (i < sizeof(*inode))
> > +		i = sizeof(*inode);
> > +	err = ext2fs_get_memzero(i, &inode);
> > +	if (err)
> > +		return err;
> > +
> > +	err = ext2fs_read_inode_full(handle->fs, handle->ino,
> > +				     (struct ext2_inode *)inode,
> > +				     EXT2_INODE_SIZE(handle->fs->super));
> > +	if (err)
> > +		goto out;
> > +
> > +	x = handle->attrs;
> > +	/* Does the inode have size for EA? */
> > +	if (EXT2_INODE_SIZE(handle->fs->super) <= EXT2_GOOD_OLD_INODE_SIZE +
> > +						  inode->i_extra_isize +
> > +						  sizeof(__u32))
> > +		goto write_ea_block;
> > +
> > +	/* Write the inode EA */
> > +	ea_inode_magic = EXT2_EXT_ATTR_MAGIC;
> > +	memcpy(((char *) inode) + EXT2_GOOD_OLD_INODE_SIZE +
> > +	       inode->i_extra_isize, &ea_inode_magic, sizeof(__u32));
> > +	storage_size = EXT2_INODE_SIZE(handle->fs->super) -
> > +		EXT2_GOOD_OLD_INODE_SIZE - inode->i_extra_isize -
> > +		sizeof(__u32);
> > +	start = ((char *) inode) + EXT2_GOOD_OLD_INODE_SIZE +
> > +		inode->i_extra_isize + sizeof(__u32);
> > +
> > +	err = write_xattrs_to_buffer(handle, &x, start, storage_size, 0);
> > +	if (err)
> > +		goto out;
> > +
> > +	/* Are we done? */
> > +	if (x == handle->attrs + handle->length)
> > +		goto skip_ea_block;
> > +
> > +write_ea_block:
> > +	/* Write the EA block */
> > +	err = ext2fs_get_mem(handle->fs->blocksize, &block_buf);
> > +	if (err)
> > +		goto out;
> > +
> > +	storage_size = handle->fs->blocksize -
> > +		sizeof(struct ext2_ext_attr_header);
> > +	start = block_buf + sizeof(struct ext2_ext_attr_header);
> > +
> > +	err = write_xattrs_to_buffer(handle, &x, start, storage_size,
> > +				     (void *)start - block_buf);
> > +	if (err)
> > +		goto out2;
> > +
> > +	if (x < handle->attrs + handle->length) {
> > +		err = EXT2_ET_EA_NO_SPACE;
> > +		goto out2;
> > +	}
> > +
> > +	if (block_buf) {
> > +		/* Write a header on the EA block */
> > +		header = block_buf;
> > +		header->h_magic = EXT2_EXT_ATTR_MAGIC;
> > +		header->h_refcount = 1;
> > +		header->h_blocks = 1;
> > +
> > +		/* Get a new block for writing */
> > +		err = prep_ea_block_for_write(handle->fs, handle->ino, inode);
> > +		if (err)
> > +			goto out2;
> > +
> > +		/* Finally, write the new EA block */
> > +		blk = ext2fs_file_acl_block(handle->fs,
> > +					    (struct ext2_inode *)inode);
> > +		err = ext2fs_write_ext_attr3(handle->fs, blk, block_buf,
> > +					     handle->ino);
> > +		if (err)
> > +			goto out2;
> > +	}
> > +
> > +skip_ea_block:
> > +	blk = ext2fs_file_acl_block(handle->fs, (struct ext2_inode *)inode);
> > +	if (!block_buf && blk) {
> > +		/* xattrs shrunk, free the block */
> > +		ext2fs_file_acl_block_set(handle->fs,
> > +					  (struct ext2_inode *)inode, 0);
> > +		err = ext2fs_iblk_sub_blocks(handle->fs,
> > +					     (struct ext2_inode *)inode, 1);
> > +		if (err)
> > +			goto out;
> > +		ext2fs_block_alloc_stats2(handle->fs, blk, -1);
> > +	}
> > +
> > +	/* Write the inode */
> > +	err = ext2fs_write_inode_full(handle->fs, handle->ino,
> > +				      (struct ext2_inode *)inode,
> > +				      EXT2_INODE_SIZE(handle->fs->super));
> > +	if (err)
> > +		goto out2;
> > +
> > +out2:
> > +	ext2fs_free_mem(&block_buf);
> > +out:
> > +	ext2fs_free_mem(&inode);
> > +	handle->dirty = 0;
> > +	return err;
> > +}
> > +
> > +static errcode_t read_xattrs_from_buffer(struct ext2_xattr_handle *handle,
> > +					 struct ext2_ext_attr_entry *entries,
> > +					 unsigned int storage_size,
> > +					 void *value_start)
> > +{
> > +	struct ext2_xattr *x;
> > +	struct ext2_ext_attr_entry *entry;
> > +	const char *prefix;
> > +	void *ptr;
> > +	unsigned int remain, prefix_len;
> > +	errcode_t err;
> > +
> > +	x = handle->attrs;
> > +	while (x->name)
> > +		x++;
> > +
> > +	entry = entries;
> > +	remain = storage_size;
> > +	while (!EXT2_EXT_IS_LAST_ENTRY(entry)) {
> > +		__u32 hash;
> > +
> > +		/* header eats this space */
> > +		remain -= sizeof(struct ext2_ext_attr_entry);
> > +
> > +		/* is attribute name valid? */
> > +		if (EXT2_EXT_ATTR_SIZE(entry->e_name_len) > remain)
> > +			return EXT2_ET_EA_BAD_NAME_LEN;
> > +
> > +		/* attribute len eats this space */
> > +		remain -= EXT2_EXT_ATTR_SIZE(entry->e_name_len);
> > +
> > +		/* check value size */
> > +		if (entry->e_value_size > remain)
> > +			return EXT2_ET_EA_BAD_VALUE_SIZE;
> > +
> > +		/* e_value_block must be 0 in inode's ea */
> > +		if (entry->e_value_block != 0)
> > +			return EXT2_ET_BAD_EA_BLOCK_NUM;
> > +
> > +		hash = ext2fs_ext_attr_hash_entry(entry, value_start +
> > +							 entry->e_value_offs);
> > +
> > +		/* e_hash may be 0 in older inode's ea */
> > +		if (entry->e_hash != 0 && entry->e_hash != hash)
> > +			return EXT2_ET_BAD_EA_HASH;
> > +
> > +		remain -= entry->e_value_size;
> > +
> > +		/* Allocate space for more attrs? */
> > +		if (x == handle->attrs + handle->length) {
> > +			err = ext2fs_xattrs_expand(handle, 4);
> > +			if (err)
> > +				return err;
> > +			x = handle->attrs + handle->length - 4;
> > +		}
> > +
> > +		/* Extract name/value */
> > +		prefix = find_ea_prefix(entry->e_name_index);
> > +		prefix_len = (prefix ? strlen(prefix) : 0);
> > +		err = ext2fs_get_memzero(entry->e_name_len + prefix_len + 1,
> > +					 &x->name);
> > +		if (err)
> > +			return err;
> > +		if (prefix)
> > +			memcpy(x->name, prefix, prefix_len);
> > +		if (entry->e_name_len)
> > +			memcpy(x->name + prefix_len,
> > +			       (void *)entry + sizeof(*entry),
> > +			       entry->e_name_len);
> > +
> > +		err = ext2fs_get_mem(entry->e_value_size, &x->value);
> > +		if (err)
> > +			return err;
> > +		x->value_len = entry->e_value_size;
> > +		memcpy(x->value, value_start + entry->e_value_offs,
> > +		       entry->e_value_size);
> > +		x++;
> > +		entry = EXT2_EXT_ATTR_NEXT(entry);
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> > +errcode_t ext2fs_xattrs_read(struct ext2_xattr_handle *handle)
> > +{
> > +	struct ext2_xattr *attrs = NULL, *x;
> > +	unsigned int attrs_len;
> > +	struct ext2_inode_large *inode;
> > +	struct ext2_ext_attr_header *header;
> > +	__u32 ea_inode_magic;
> > +	unsigned int storage_size;
> > +	void *start, *block_buf = NULL;
> > +	blk64_t blk;
> > +	int i;
> > +	errcode_t err;
> > +
> > +	if (!EXT2_HAS_COMPAT_FEATURE(handle->fs->super,
> > +				     EXT2_FEATURE_COMPAT_EXT_ATTR))
> > +		return 0;
> > +
> > +	i = EXT2_INODE_SIZE(handle->fs->super);
> > +	if (i < sizeof(*inode))
> > +		i = sizeof(*inode);
> > +	err = ext2fs_get_memzero(i, &inode);
> > +	if (err)
> > +		return err;
> > +
> > +	err = ext2fs_read_inode_full(handle->fs, handle->ino,
> > +				     (struct ext2_inode *)inode,
> > +				     EXT2_INODE_SIZE(handle->fs->super));
> > +	if (err)
> > +		goto out;
> > +
> > +	/* Does the inode have size for EA? */
> > +	if (EXT2_INODE_SIZE(handle->fs->super) <= EXT2_GOOD_OLD_INODE_SIZE +
> > +						  inode->i_extra_isize +
> > +						  sizeof(__u32))
> > +		goto read_ea_block;
> > +
> > +	/* Look for EA in the inode */
> > +	memcpy(&ea_inode_magic, ((char *) inode) + EXT2_GOOD_OLD_INODE_SIZE +
> > +	       inode->i_extra_isize, sizeof(__u32));
> > +	if (ea_inode_magic == EXT2_EXT_ATTR_MAGIC) {
> > +		storage_size = EXT2_INODE_SIZE(handle->fs->super) -
> > +			EXT2_GOOD_OLD_INODE_SIZE - inode->i_extra_isize -
> > +			sizeof(__u32);
> > +		start = ((char *) inode) + EXT2_GOOD_OLD_INODE_SIZE +
> > +			inode->i_extra_isize + sizeof(__u32);
> > +
> > +		err = read_xattrs_from_buffer(handle, start, storage_size,
> > +					      start);
> > +		if (err)
> > +			goto out;
> > +	}
> > +
> > +read_ea_block:
> > +	/* Look for EA in a separate EA block */
> > +	blk = ext2fs_file_acl_block(handle->fs, (struct ext2_inode *)inode);
> > +	if (blk != 0) {
> > +		if ((blk < handle->fs->super->s_first_data_block) ||
> > +		    (blk >= ext2fs_blocks_count(handle->fs->super))) {
> > +			err = EXT2_ET_BAD_EA_BLOCK_NUM;
> > +			goto out;
> > +		}
> > +
> > +		err = ext2fs_get_mem(handle->fs->blocksize, &block_buf);
> > +		if (err)
> > +			goto out;
> > +
> > +		err = ext2fs_read_ext_attr3(handle->fs, blk, block_buf,
> > +					    handle->ino);
> > +		if (err)
> > +			goto out3;
> > +
> > +		header = (struct ext2_ext_attr_header *) block_buf;
> > +		if (header->h_magic != EXT2_EXT_ATTR_MAGIC) {
> > +			err = EXT2_ET_BAD_EA_HEADER;
> > +			goto out3;
> > +		}
> > +
> > +		if (header->h_blocks != 1) {
> > +			err = EXT2_ET_BAD_EA_HEADER;
> > +			goto out3;
> > +		}
> > +
> > +		/* Read EAs */
> > +		storage_size = handle->fs->blocksize -
> > +			sizeof(struct ext2_ext_attr_header);
> > +		start = block_buf + sizeof(struct ext2_ext_attr_header);
> > +		err = read_xattrs_from_buffer(handle, start, storage_size,
> > +					      block_buf);
> > +		if (err)
> > +			goto out3;
> > +
> > +		ext2fs_free_mem(&block_buf);
> > +	}
> > +
> > +	ext2fs_free_mem(&block_buf);
> > +	ext2fs_free_mem(&inode);
> > +	return 0;
> > +
> > +out3:
> > +	ext2fs_free_mem(&block_buf);
> > +out:
> > +	ext2fs_free_mem(&inode);
> > +	return err;
> > +}
> > +
> > +#define XATTR_ABORT	1
> > +#define XATTR_CHANGED	2
> > +errcode_t ext2fs_xattrs_iterate(struct ext2_xattr_handle *h,
> > +				int (*func)(char *name, char *value,
> > +					    void *data),
> > +				void *data)
> > +{
> > +	struct ext2_xattr *x;
> > +	errcode_t err;
> > +	int ret;
> > +
> > +	for (x = h->attrs; x < h->attrs + h->length; x++) {
> > +		if (!x->name)
> > +			continue;
> > +
> > +		ret = func(x->name, x->value, data);
> > +		if (ret & XATTR_CHANGED)
> > +			h->dirty = 1;
> > +		if (ret & XATTR_ABORT)
> > +			return 0;
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> > +errcode_t ext2fs_xattr_get(struct ext2_xattr_handle *h, const char *key,
> > +			   void **value, unsigned int *value_len)
> > +{
> > +	struct ext2_xattr *x;
> > +	void *val;
> > +	errcode_t err;
> > +
> > +	for (x = h->attrs; x < h->attrs + h->length; x++) {
> > +		if (!x->name)
> > +			continue;
> > +
> > +		if (strcmp(x->name, key) == 0) {
> > +			err = ext2fs_get_mem(x->value_len, &val);
> > +			if (err)
> > +				return err;
> > +			memcpy(val, x->value, x->value_len);
> > +			*value = val;
> > +			*value_len = x->value_len;
> > +			return 0;
> > +		}
> > +	}
> > +
> > +	return EXT2_ET_EA_KEY_NOT_FOUND;
> > +}
> > +
> > +errcode_t ext2fs_xattr_set(struct ext2_xattr_handle *handle,
> > +			   const char *key,
> > +			   const void *value,
> > +			   unsigned int value_len)
> > +{
> > +	struct ext2_xattr *x, *last_empty;
> > +	char *new_value;
> > +	errcode_t err;
> > +
> > +	last_empty = NULL;
> > +	for (x = handle->attrs; x < handle->attrs + handle->length; x++) {
> > +		if (!x->name) {
> > +			last_empty = x;
> > +			continue;
> > +		}
> > +
> > +		/* Replace xattr */
> > +		if (strcmp(x->name, key) == 0) {
> > +			err = ext2fs_get_mem(value_len, &new_value);
> > +			if (err)
> > +				return err;
> > +			memcpy(new_value, value, value_len);
> > +			ext2fs_free_mem(&x->value);
> > +			x->value = new_value;
> > +			x->value_len = value_len;
> > +			handle->dirty = 1;
> > +			return 0;
> > +		}
> > +	}
> > +
> > +	/* Add attr to empty slot */
> > +	if (last_empty) {
> > +		err = ext2fs_get_mem(strlen(key) + 1, &last_empty->name);
> > +		if (err)
> > +			return err;
> > +		strcpy(last_empty->name, key);
> > +
> > +		err = ext2fs_get_mem(value_len, &last_empty->value);
> > +		if (err)
> > +			return err;
> > +		memcpy(last_empty->value, value, value_len);
> > +		last_empty->value_len = value_len;
> > +		handle->dirty = 1;
> > +		return 0;
> > +	}
> > +
> > +	/* Expand array, append slot */
> > +	err = ext2fs_xattrs_expand(handle, 4);
> > +	if (err)
> > +		return err;
> > +
> > +	x = handle->attrs + handle->length - 4;
> > +	err = ext2fs_get_mem(strlen(key) + 1, &x->name);
> > +	if (err)
> > +		return err;
> > +	strcpy(x->name, key);
> > +
> > +	err = ext2fs_get_mem(value_len, &x->value);
> > +	if (err)
> > +		return err;
> > +	memcpy(x->value, value, value_len);
> > +	x->value_len = value_len;
> > +	handle->dirty = 1;
> > +	return 0;
> > +}
> > +
> > +errcode_t ext2fs_xattr_remove(struct ext2_xattr_handle *handle,
> > +			      const char *key)
> > +{
> > +	struct ext2_xattr *x;
> > +	errcode_t err;
> > +
> > +	for (x = handle->attrs; x < handle->attrs + handle->length; x++) {
> > +		if (!x->name)
> > +			continue;
> > +
> > +		if (strcmp(x->name, key) == 0) {
> > +			ext2fs_free_mem(&x->name);
> > +			ext2fs_free_mem(&x->value);
> > +			x->value_len = 0;
> > +			handle->dirty = 1;
> > +			return 0;
> > +		}
> > +	}
> > +
> > +	return EXT2_ET_EA_KEY_NOT_FOUND;
> > +}
> > +
> > +errcode_t ext2fs_xattrs_open(ext2_filsys fs, ext2_ino_t ino,
> > +			     struct ext2_xattr_handle **handle)
> > +{
> > +	struct ext2_xattr_handle *h;
> > +	errcode_t err;
> > +
> > +	err = ext2fs_get_memzero(sizeof(*h), &h);
> > +	if (err)
> > +		return err;
> > +
> > +	h->length = 4;
> > +	err = ext2fs_get_arrayzero(h->length, sizeof(struct ext2_xattr),
> > +				   &h->attrs);
> > +	if (err) {
> > +		ext2fs_free_mem(&h);
> > +		return err;
> > +	}
> > +	h->ino = ino;
> > +	h->fs = fs;
> > +	*handle = h;
> > +	return 0;
> > +}
> > +
> > +errcode_t ext2fs_xattrs_close(struct ext2_xattr_handle **handle)
> > +{
> > +	unsigned int i;
> > +	struct ext2_xattr_handle *h = *handle;
> > +	struct ext2_xattr *a = h->attrs;
> > +	errcode_t err;
> > +
> > +	if (h->dirty) {
> > +		err = ext2fs_xattrs_write(h);
> > +		if (err)
> > +			return err;
> > +	}
> > +
> > +	for (i = 0; i < h->length; i++) {
> > +		if (a[i].name)
> > +			ext2fs_free_mem(&a[i].name);
> > +		if (a[i].value)
> > +			ext2fs_free_mem(&a[i].value);
> > +	}
> > +
> > +	ext2fs_free_mem(&h->attrs);
> > +	ext2fs_free_mem(handle);
> > +	return 0;
> > +}
> > 

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 16/25] resize2fs: convert fs to and from 64bit mode
  2013-11-26 18:39     ` Darrick J. Wong
@ 2013-11-27  2:21       ` Zheng Liu
  0 siblings, 0 replies; 73+ messages in thread
From: Zheng Liu @ 2013-11-27  2:21 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: tytso, linux-ext4

On Tue, Nov 26, 2013 at 10:39:20AM -0800, Darrick J. Wong wrote:
> On Tue, Nov 26, 2013 at 02:44:45PM +0800, Zheng Liu wrote:
> > On Thu, Oct 17, 2013 at 09:50:42PM -0700, Darrick J. Wong wrote:
> > > resize2fs does its magic by loading a filesystem, duplicating the
> > > in-memory image of that fs, moving relevant blocks out of the way of
> > > whatever new metadata get created, and finally writing everything back
> > > out to disk.  Enabling 64bit mode enlarges the group descriptors,
> > > which makes resize2fs a reasonable vehicle for taking care of the rest
> > > of the bookkeeping requirements, so add to resize2fs the ability to
> > > convert a filesystem to 64bit mode and back.
> > 
> > Sorry, I don't see why we need to add these arguments to enable/disable
> > 64bit mode.  If I understand correctly, we can't disable 64bit mode on a
> > file system that is larger than 2^32 blocks, which means we would only
> > disable it on a file system where 64bit never needed to be enabled.  Is
> > it worth doing this?
> 
> Are you questioning the entire conversion, or just the 64->32 direction?
> 
> 32->64 has two benefits: You can resize (somewhat) past 16T (256T I think?);
> and you get full 32-bit bitmap checksums.
> 
> I agree that 64->32 isn't terribly useful, but dislike one-way conversions.

Thanks for the explanation.  Now it makes sense to me.  Enabling 64bit
mode lets us get past the 16T limit, which is definitely useful for us.

Thanks,
                                                - Zheng

> 
> > Otherwise one nit below.
> > 
> >                                                 - Zheng
> > 
> > > 
> > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > ---
> > >  resize/main.c         |   40 ++++++-
> > >  resize/resize2fs.8.in |   18 +++
> > >  resize/resize2fs.c    |  282 ++++++++++++++++++++++++++++++++++++++++++++++++-
> > >  resize/resize2fs.h    |    3 +
> > >  4 files changed, 336 insertions(+), 7 deletions(-)
> > > 
> > > 
> > > diff --git a/resize/main.c b/resize/main.c
> > > index 1394ae1..ad0c946 100644
> > > --- a/resize/main.c
> > > +++ b/resize/main.c
> > > @@ -41,7 +41,7 @@ char *program_name, *device_name, *io_options;
> > >  static void usage (char *prog)
> > >  {
> > >  	fprintf (stderr, _("Usage: %s [-d debug_flags] [-f] [-F] [-M] [-P] "
> > > -			   "[-p] device [new_size]\n\n"), prog);
> > > +			   "[-p] device [-b|-s|new_size]\n\n"), prog);
> > >  
> > >  	exit (1);
> > >  }
> > > @@ -199,7 +199,7 @@ int main (int argc, char ** argv)
> > >  	if (argc && *argv)
> > >  		program_name = *argv;
> > >  
> > > -	while ((c = getopt (argc, argv, "d:fFhMPpS:")) != EOF) {
> > > +	while ((c = getopt(argc, argv, "d:fFhMPpS:bs")) != EOF) {
> > >  		switch (c) {
> > >  		case 'h':
> > >  			usage(program_name);
> > > @@ -225,6 +225,12 @@ int main (int argc, char ** argv)
> > >  		case 'S':
> > >  			use_stride = atoi(optarg);
> > >  			break;
> > > +		case 'b':
> > > +			flags |= RESIZE_ENABLE_64BIT;
> > > +			break;
> > > +		case 's':
> > > +			flags |= RESIZE_DISABLE_64BIT;
> > > +			break;
> > >  		default:
> > >  			usage(program_name);
> > >  		}
> > > @@ -383,6 +389,10 @@ int main (int argc, char ** argv)
> > >  		if (sys_page_size > fs->blocksize)
> > >  			new_size &= ~((sys_page_size / fs->blocksize)-1);
> > >  	}
> > > +	/* If changing 64bit, don't change the filesystem size. */
> > > +	if (flags & (RESIZE_DISABLE_64BIT | RESIZE_ENABLE_64BIT)) {
> > > +		new_size = ext2fs_blocks_count(fs->super);
> > > +	}
> > >  	if (!EXT2_HAS_INCOMPAT_FEATURE(fs->super,
> > >  				       EXT4_FEATURE_INCOMPAT_64BIT)) {
> > >  		/* Take 16T down to 2^32-1 blocks */
> > > @@ -434,7 +444,31 @@ int main (int argc, char ** argv)
> > >  			fs->blocksize / 1024, new_size);
> > >  		exit(1);
> > >  	}
> > > -	if (new_size == ext2fs_blocks_count(fs->super)) {
> > > +	if (flags & RESIZE_DISABLE_64BIT && flags & RESIZE_ENABLE_64BIT) {
> >             ^^^^^
> > Coding style problem:
> >         if ((flags & RESIZE_DISABLE_64BIT) && (flags & RESIZE_ENABLE_64BIT))
> 
> Yes, thank you for catching this.
> 
> --D
> 
> > 
> > > +		fprintf(stderr, _("Cannot set and unset 64bit feature.\n"));
> > > +		exit(1);
> > > +	} else if (flags & (RESIZE_DISABLE_64BIT | RESIZE_ENABLE_64BIT)) {
> > > +		new_size = ext2fs_blocks_count(fs->super);
> > > +		if (new_size >= (1ULL << 32)) {
> > > +			fprintf(stderr, _("Cannot change the 64bit feature "
> > > +				"on a filesystem that is larger than "
> > > +				"2^32 blocks.\n"));
> > > +			exit(1);
> > > +		}
> > > +		if (mount_flags & EXT2_MF_MOUNTED) {
> > > +			fprintf(stderr, _("Cannot change the 64bit feature "
> > > +				"while the filesystem is mounted.\n"));
> > > +			exit(1);
> > > +		}
> > > +		if (flags & RESIZE_ENABLE_64BIT &&
> >                     ^^^^
> >                     ditto
> > 
> > > +		    !EXT2_HAS_INCOMPAT_FEATURE(fs->super,
> > > +				EXT3_FEATURE_INCOMPAT_EXTENTS)) {
> > > +			fprintf(stderr, _("Please enable the extents feature "
> > > +				"with tune2fs before enabling the 64bit "
> > > +				"feature.\n"));
> > > +			exit(1);
> > > +		}
> > > +	} else if (new_size == ext2fs_blocks_count(fs->super)) {
> > >  		fprintf(stderr, _("The filesystem is already %llu blocks "
> > >  			"long.  Nothing to do!\n\n"), new_size);
> > >  		exit(0);
> > > diff --git a/resize/resize2fs.8.in b/resize/resize2fs.8.in
> > > index a1f3099..1c75816 100644
> > > --- a/resize/resize2fs.8.in
> > > +++ b/resize/resize2fs.8.in
> > > @@ -8,7 +8,7 @@ resize2fs \- ext2/ext3/ext4 file system resizer
> > >  .SH SYNOPSIS
> > >  .B resize2fs
> > >  [
> > > -.B \-fFpPM
> > > +.B \-fFpPMbs
> > >  ]
> > >  [
> > >  .B \-d
> > > @@ -85,8 +85,21 @@ to shrink the size of filesystem.  Then you may use
> > >  to shrink the size of the partition.  When shrinking the size of
> > >  the partition, make sure you do not make it smaller than the new size
> > >  of the ext2 filesystem!
> > > +.PP
> > > +The
> > > +.B \-b
> > > +and
> > > +.B \-s
> > > +options enable and disable the 64bit feature, respectively.  The resize2fs
> > > +program will, of course, take care of resizing the block group descriptors
> > > +and moving other data blocks out of the way, as needed.  It is not possible
> > > +to resize the filesystem concurrent with changing the 64bit status.
> > >  .SH OPTIONS
> > >  .TP
> > > +.B \-b
> > > +Turns on the 64bit feature, resizes the group descriptors as necessary, and
> > > +moves other metadata out of the way.
> > > +.TP
> > >  .B \-d \fIdebug-flags
> > >  Turns on various resize2fs debugging features, if they have been compiled
> > >  into the binary.
> > > @@ -126,6 +139,9 @@ of what the program is doing.
> > >  .B \-P
> > >  Print the minimum size of the filesystem and exit.
> > >  .TP
> > > +.B \-s
> > > +Turns off the 64bit feature and frees blocks that are no longer in use.
> > > +.TP
> > >  .B \-S \fIRAID-stride
> > >  The
> > >  .B resize2fs
> > > diff --git a/resize/resize2fs.c b/resize/resize2fs.c
> > > index 0feff0f..05ba6e1 100644
> > > --- a/resize/resize2fs.c
> > > +++ b/resize/resize2fs.c
> > > @@ -53,6 +53,9 @@ static errcode_t ext2fs_calculate_summary_stats(ext2_filsys fs);
> > >  static errcode_t fix_sb_journal_backup(ext2_filsys fs);
> > >  static errcode_t mark_table_blocks(ext2_filsys fs,
> > >  				   ext2fs_block_bitmap bmap);
> > > +static errcode_t resize_group_descriptors(ext2_resize_t rfs, blk64_t new_size);
> > > +static errcode_t move_bg_metadata(ext2_resize_t rfs);
> > > +static errcode_t zero_high_bits_in_inodes(ext2_resize_t rfs);
> > >  
> > >  /*
> > >   * Some helper CPP macros
> > > @@ -119,13 +122,30 @@ errcode_t resize_fs(ext2_filsys fs, blk64_t *new_size, int flags,
> > >  	if (retval)
> > >  		goto errout;
> > >  
> > > +	init_resource_track(&rtrack, "resize_group_descriptors", fs->io);
> > > +	retval = resize_group_descriptors(rfs, *new_size);
> > > +	if (retval)
> > > +		goto errout;
> > > +	print_resource_track(rfs, &rtrack, fs->io);
> > > +
> > > +	init_resource_track(&rtrack, "move_bg_metadata", fs->io);
> > > +	retval = move_bg_metadata(rfs);
> > > +	if (retval)
> > > +		goto errout;
> > > +	print_resource_track(rfs, &rtrack, fs->io);
> > > +
> > > +	init_resource_track(&rtrack, "zero_high_bits_in_metadata", fs->io);
> > > +	retval = zero_high_bits_in_inodes(rfs);
> > > +	if (retval)
> > > +		goto errout;
> > > +	print_resource_track(rfs, &rtrack, fs->io);
> > > +
> > >  	init_resource_track(&rtrack, "adjust_superblock", fs->io);
> > >  	retval = adjust_superblock(rfs, *new_size);
> > >  	if (retval)
> > >  		goto errout;
> > >  	print_resource_track(rfs, &rtrack, fs->io);
> > >  
> > > -
> > >  	init_resource_track(&rtrack, "fix_uninit_block_bitmaps 2", fs->io);
> > >  	fix_uninit_block_bitmaps(rfs->new_fs);
> > >  	print_resource_track(rfs, &rtrack, fs->io);
> > > @@ -221,6 +241,259 @@ errout:
> > >  	return retval;
> > >  }
> > >  
> > > +/* Toggle 64bit mode */
> > > +static errcode_t resize_group_descriptors(ext2_resize_t rfs, blk64_t new_size)
> > > +{
> > > +	void *o, *n, *new_group_desc;
> > > +	dgrp_t i;
> > > +	int copy_size;
> > > +	errcode_t retval;
> > > +
> > > +	if (!(rfs->flags & (RESIZE_DISABLE_64BIT | RESIZE_ENABLE_64BIT)))
> > > +		return 0;
> > > +
> > > +	if (new_size != ext2fs_blocks_count(rfs->new_fs->super) ||
> > > +	    ext2fs_blocks_count(rfs->new_fs->super) >= (1ULL << 32) ||
> > > +	    (rfs->flags & RESIZE_DISABLE_64BIT &&
> > > +	     rfs->flags & RESIZE_ENABLE_64BIT))
> > > +		return EXT2_ET_INVALID_ARGUMENT;
> > > +
> > > +	if (rfs->flags & RESIZE_DISABLE_64BIT) {
> > > +		rfs->new_fs->super->s_feature_incompat &=
> > > +				~EXT4_FEATURE_INCOMPAT_64BIT;
> > > +		rfs->new_fs->super->s_desc_size = EXT2_MIN_DESC_SIZE;
> > > +	} else if (rfs->flags & RESIZE_ENABLE_64BIT) {
> > > +		rfs->new_fs->super->s_feature_incompat |=
> > > +				EXT4_FEATURE_INCOMPAT_64BIT;
> > > +		rfs->new_fs->super->s_desc_size = EXT2_MIN_DESC_SIZE_64BIT;
> > > +	}
> > > +
> > > +	if (EXT2_DESC_SIZE(rfs->old_fs->super) ==
> > > +	    EXT2_DESC_SIZE(rfs->new_fs->super))
> > > +		return 0;
> > > +
> > > +	o = rfs->new_fs->group_desc;
> > > +	rfs->new_fs->desc_blocks = ext2fs_div_ceil(
> > > +			rfs->old_fs->group_desc_count,
> > > +			EXT2_DESC_PER_BLOCK(rfs->new_fs->super));
> > > +	retval = ext2fs_get_arrayzero(rfs->new_fs->desc_blocks,
> > > +				      rfs->old_fs->blocksize, &new_group_desc);
> > > +	if (retval)
> > > +		return retval;
> > > +
> > > +	n = new_group_desc;
> > > +
> > > +	if (EXT2_DESC_SIZE(rfs->old_fs->super) <=
> > > +	    EXT2_DESC_SIZE(rfs->new_fs->super))
> > > +		copy_size = EXT2_DESC_SIZE(rfs->old_fs->super);
> > > +	else
> > > +		copy_size = EXT2_DESC_SIZE(rfs->new_fs->super);
> > > +	for (i = 0; i < rfs->old_fs->group_desc_count; i++) {
> > > +		memcpy(n, o, copy_size);
> > > +		n += EXT2_DESC_SIZE(rfs->new_fs->super);
> > > +		o += EXT2_DESC_SIZE(rfs->old_fs->super);
> > > +	}
> > > +
> > > +	ext2fs_free_mem(&rfs->new_fs->group_desc);
> > > +	rfs->new_fs->group_desc = new_group_desc;
> > > +
> > > +	for (i = 0; i < rfs->old_fs->group_desc_count; i++)
> > > +		ext2fs_group_desc_csum_set(rfs->new_fs, i);
> > > +
> > > +	return 0;
> > > +}
> > > +
> > > +/* Move bitmaps/inode tables out of the way. */
> > > +static errcode_t move_bg_metadata(ext2_resize_t rfs)
> > > +{
> > > +	dgrp_t i;
> > > +	blk64_t b, c, d;
> > > +	ext2fs_block_bitmap old_map, new_map;
> > > +	int old, new;
> > > +	errcode_t retval;
> > > +	int zero = 0, one = 1;
> > > +
> > > +	if (!(rfs->flags & (RESIZE_DISABLE_64BIT | RESIZE_ENABLE_64BIT)))
> > > +		return 0;
> > > +
> > > +	retval = ext2fs_allocate_block_bitmap(rfs->old_fs, "oldfs", &old_map);
> > > +	if (retval)
> > > +		return retval;
> > > +
> > > +	retval = ext2fs_allocate_block_bitmap(rfs->new_fs, "newfs", &new_map);
> > > +	if (retval)
> > > +		goto out;
> > > +
> > > +	/* Construct bitmaps of super/descriptor blocks in old and new fs */
> > > +	for (i = 0; i < rfs->old_fs->group_desc_count; i++) {
> > > +		retval = ext2fs_super_and_bgd_loc2(rfs->old_fs, i, &b, &c, &d,
> > > +						   NULL);
> > > +		if (retval)
> > > +			goto out;
> > > +		ext2fs_mark_block_bitmap2(old_map, b);
> > > +		ext2fs_mark_block_bitmap2(old_map, c);
> > > +		ext2fs_mark_block_bitmap2(old_map, d);
> > > +
> > > +		retval = ext2fs_super_and_bgd_loc2(rfs->new_fs, i, &b, &c, &d,
> > > +						   NULL);
> > > +		if (retval)
> > > +			goto out;
> > > +		ext2fs_mark_block_bitmap2(new_map, b);
> > > +		ext2fs_mark_block_bitmap2(new_map, c);
> > > +		ext2fs_mark_block_bitmap2(new_map, d);
> > > +	}
> > > +
> > > +	/* Find changes in block allocations for bg metadata */
> > > +	for (b = 0;
> > > +	     b < ext2fs_blocks_count(rfs->new_fs->super);
> > > +	     b += EXT2FS_CLUSTER_RATIO(rfs->new_fs)) {
> > > +		old = ext2fs_test_block_bitmap2(old_map, b);
> > > +		new = ext2fs_test_block_bitmap2(new_map, b);
> > > +
> > > +		if (old && !new)
> > > +			ext2fs_unmark_block_bitmap2(rfs->new_fs->block_map, b);
> > > +		else if (!old && new)
> > > +			; /* empty ext2fs_mark_block_bitmap2(new_map, b); */
> > > +		else
> > > +			ext2fs_unmark_block_bitmap2(new_map, b);
> > > +	}
> > > +	/* new_map now shows blocks that have been newly allocated. */
> > > +
> > > +	/* Move any conflicting bitmaps and inode tables */
> > > +	for (i = 0; i < rfs->old_fs->group_desc_count; i++) {
> > > +		b = ext2fs_block_bitmap_loc(rfs->new_fs, i);
> > > +		if (ext2fs_test_block_bitmap2(new_map, b))
> > > +			ext2fs_block_bitmap_loc_set(rfs->new_fs, i, 0);
> > > +
> > > +		b = ext2fs_inode_bitmap_loc(rfs->new_fs, i);
> > > +		if (ext2fs_test_block_bitmap2(new_map, b))
> > > +			ext2fs_inode_bitmap_loc_set(rfs->new_fs, i, 0);
> > > +
> > > +		c = ext2fs_inode_table_loc(rfs->new_fs, i);
> > > +		for (b = 0; b < rfs->new_fs->inode_blocks_per_group; b++) {
> > > +			if (ext2fs_test_block_bitmap2(new_map, b + c)) {
> > > +				ext2fs_inode_table_loc_set(rfs->new_fs, i, 0);
> > > +				break;
> > > +			}
> > > +		}
> > > +	}
> > > +
> > > +out:
> > > +	if (old_map)
> > > +		ext2fs_free_block_bitmap(old_map);
> > > +	if (new_map)
> > > +		ext2fs_free_block_bitmap(new_map);
> > > +	return retval;
> > > +}
> > > +
> > > +/* Zero out the high bits of extent fields */
> > > +static errcode_t zero_high_bits_in_extents(ext2_filsys fs, ext2_ino_t ino,
> > > +				 struct ext2_inode *inode)
> > > +{
> > > +	ext2_extent_handle_t	handle;
> > > +	struct ext2fs_extent	extent;
> > > +	int			op = EXT2_EXTENT_ROOT;
> > > +	errcode_t		errcode;
> > > +
> > > +	if (!(inode->i_flags & EXT4_EXTENTS_FL))
> > > +		return 0;
> > > +
> > > +	errcode = ext2fs_extent_open(fs, ino, &handle);
> > > +	if (errcode)
> > > +		return errcode;
> > > +
> > > +	while (1) {
> > > +		errcode = ext2fs_extent_get(handle, op, &extent);
> > > +		if (errcode)
> > > +			break;
> > > +
> > > +		op = EXT2_EXTENT_NEXT_SIB;
> > > +
> > > +		if (extent.e_pblk >= (1ULL << 32)) {
> > > +			extent.e_pblk &= (1ULL << 32) - 1;
> > > +			errcode = ext2fs_extent_replace(handle, 0, &extent);
> > > +			if (errcode)
> > > +				break;
> > > +		}
> > > +	}
> > > +
> > > +	/* Ok if we run off the end */
> > > +	if (errcode == EXT2_ET_EXTENT_NO_NEXT)
> > > +		errcode = 0;
> > > +	return errcode;
> > > +}
> > > +
> > > +/* Zero out the high bits of inodes. */
> > > +static errcode_t zero_high_bits_in_inodes(ext2_resize_t rfs)
> > > +{
> > > +	ext2_filsys	fs = rfs->new_fs;
> > > +	int length = EXT2_INODE_SIZE(fs->super);
> > > +	struct ext2_inode *inode = NULL;
> > > +	ext2_inode_scan	scan = NULL;
> > > +	errcode_t	retval;
> > > +	ext2_ino_t	ino;
> > > +	blk64_t		file_acl_block;
> > > +	int		inode_dirty;
> > > +
> > > +	if (!(rfs->flags & (RESIZE_DISABLE_64BIT | RESIZE_ENABLE_64BIT)))
> > > +		return 0;
> > > +
> > > +	if (fs->super->s_creator_os != EXT2_OS_LINUX)
> > > +		return 0;
> > > +
> > > +	retval = ext2fs_open_inode_scan(fs, 0, &scan);
> > > +	if (retval)
> > > +		return retval;
> > > +
> > > +	retval = ext2fs_get_mem(length, &inode);
> > > +	if (retval)
> > > +		goto out;
> > > +
> > > +	do {
> > > +		retval = ext2fs_get_next_inode_full(scan, &ino, inode, length);
> > > +		if (retval)
> > > +			goto out;
> > > +		if (!ino)
> > > +			break;
> > > +		if (!ext2fs_test_inode_bitmap2(fs->inode_map, ino))
> > > +			continue;
> > > +
> > > +		/*
> > > +		 * Here's how we deal with high block number fields:
> > > +		 *
> > > +		 *  - i_size_high has been written out with i_size_lo
> > > +		 *    since the ext2 days, so no conversion is needed.
> > > +		 *
> > > +		 *  - i_blocks_hi is guarded by both the huge_file feature and
> > > +		 *    inode flags and has always been written out with
> > > +		 *    i_blocks_lo if the feature is set.  The field is only
> > > +		 *    ever read if both feature and inode flag are set, so
> > > +		 *    we don't need to zero it now.
> > > +		 *
> > > +		 *  - i_file_acl_high can be uninitialized, so zero it if
> > > +		 *    it isn't already.
> > > +		 */
> > > +		if (inode->osd2.linux2.l_i_file_acl_high) {
> > > +			inode->osd2.linux2.l_i_file_acl_high = 0;
> > > +			retval = ext2fs_write_inode_full(fs, ino, inode,
> > > +							 length);
> > > +			if (retval)
> > > +				goto out;
> > > +		}
> > > +
> > > +		retval = zero_high_bits_in_extents(fs, ino, inode);
> > > +		if (retval)
> > > +			goto out;
> > > +	} while (ino);
> > > +
> > > +out:
> > > +	if (inode)
> > > +		ext2fs_free_mem(&inode);
> > > +	if (scan)
> > > +		ext2fs_close_inode_scan(scan);
> > > +	return retval;
> > > +}
> > > +
> > >  /*
> > >   * Clean up the bitmaps for unitialized bitmaps
> > >   */
> > > @@ -424,7 +697,8 @@ retry:
> > >  	/*
> > >  	 * Reallocate the group descriptors as necessary.
> > >  	 */
> > > -	if (old_fs->desc_blocks != fs->desc_blocks) {
> > > +	if (EXT2_DESC_SIZE(old_fs->super) == EXT2_DESC_SIZE(fs->super) &&
> > > +	    old_fs->desc_blocks != fs->desc_blocks) {
> > >  		retval = ext2fs_resize_mem(old_fs->desc_blocks *
> > >  					   fs->blocksize,
> > >  					   fs->desc_blocks * fs->blocksize,
> > > @@ -949,7 +1223,9 @@ static errcode_t blocks_to_move(ext2_resize_t rfs)
> > >  		new_blocks = fs->desc_blocks + fs->super->s_reserved_gdt_blocks;
> > >  	}
> > >  
> > > -	if (old_blocks == new_blocks) {
> > > +	if (EXT2_DESC_SIZE(rfs->old_fs->super) ==
> > > +	    EXT2_DESC_SIZE(rfs->new_fs->super) &&
> > > +	    old_blocks == new_blocks) {
> > >  		retval = 0;
> > >  		goto errout;
> > >  	}
> > > diff --git a/resize/resize2fs.h b/resize/resize2fs.h
> > > index 52319b5..5a1c5dc 100644
> > > --- a/resize/resize2fs.h
> > > +++ b/resize/resize2fs.h
> > > @@ -82,6 +82,9 @@ typedef struct ext2_sim_progress *ext2_sim_progmeter;
> > >  #define RESIZE_PERCENT_COMPLETE		0x0100
> > >  #define RESIZE_VERBOSE			0x0200
> > >  
> > > +#define RESIZE_ENABLE_64BIT		0x0400
> > > +#define RESIZE_DISABLE_64BIT		0x0800
> > > +
> > >  /*
> > >   * This structure is used for keeping track of how much resources have
> > >   * been used for a particular resize2fs pass.
> > > 
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> > > the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [PATCH 23/25] libext2fs: support modifying arbitrary extended attributes
  2013-11-26 19:55     ` Darrick J. Wong
@ 2013-11-27  2:52       ` Zheng Liu
  2013-11-27  3:13         ` Darrick J. Wong
  0 siblings, 1 reply; 73+ messages in thread
From: Zheng Liu @ 2013-11-27  2:52 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: tytso, linux-ext4

On Tue, Nov 26, 2013 at 11:55:47AM -0800, Darrick J. Wong wrote:
> On Tue, Nov 26, 2013 at 03:21:16PM +0800, Zheng Liu wrote:
> > On Thu, Oct 17, 2013 at 09:51:34PM -0700, Darrick J. Wong wrote:
> > > Add functions to allow clients to get, set, and remove extended
> > > attributes from any file.  It also supports modifying EAs living in
> > > i_file_acl.
> > > 
> > > v2: Put the header declarations in the correct part of ext2fs.h,
> > > provide a function to release an EA block from an inode, and check
> > > i_extra_isize to make sure we actually have space for in-inode EAs.
> > 
> > Is this the latest version?  I am working on the inline data patch set for
> > e2fsprogs, and I want to use these APIs to manipulate the EAs.  It would be
> > great if you could point out which one is the latest version.
> > Thanks in advance.  Otherwise some nits below.
> 
> Oh!  I was just about to start working on pulling your patches into my monster
> patchset. :)

Wow!  Sorry for the delay.  If you are only just starting to work on the
inline data patchset, would you mind sending your latest monster patchset
without my inline data patches first?  That would give me a chance to take
a closer look at them.  In any case, I will send my patch set asap.

> 
> I changed the extended attribute API a little bit -- the function pointer to
> ext2fs_xattrs_iterate() takes a value length; lengths are now specified in
> size_t; and the ext2fs_xattrs_count() call is new.  I removed
> ext2fs_xattrs_expand() since it's an internal call.
> 
> This is the current set of APIs:
> 
> errcode_t ext2fs_xattrs_write(struct ext2_xattr_handle *handle);
> errcode_t ext2fs_xattrs_read(struct ext2_xattr_handle *handle);
> errcode_t ext2fs_xattrs_iterate(struct ext2_xattr_handle *h,
>                                int (*func)(char *name, char *value,
>                                            size_t value_len, void *data),
>                                void *data);
> errcode_t ext2fs_xattr_get(struct ext2_xattr_handle *h, const char *key,
>                           void **value, size_t *value_len);
> errcode_t ext2fs_xattr_set(struct ext2_xattr_handle *handle,
>                           const char *key,
>                           const void *value,
>                           size_t value_len);
> errcode_t ext2fs_xattr_remove(struct ext2_xattr_handle *handle,
>                              const char *key);
> errcode_t ext2fs_xattrs_open(ext2_filsys fs, ext2_ino_t ino,
>                             struct ext2_xattr_handle **handle);
> errcode_t ext2fs_xattrs_close(struct ext2_xattr_handle **handle);
> errcode_t ext2fs_free_ext_attr(ext2_filsys fs, ext2_ino_t ino,
>                               struct ext2_inode_large *inode);
> size_t ext2fs_xattrs_count(struct ext2_xattr_handle *handle);

It would be great if you could send me the latest patch; it seems
that I won't need to adjust my patch too much. :)
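
For anyone adapting callers to the new callback signature, here is a minimal
sketch of how the iterate loop and the XATTR_ABORT / XATTR_CHANGED return
flags fit together once value_len is passed through.  The sketch_* structs
and functions are hypothetical stand-ins for the private handle in
ext_attr.c, not the library's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define XATTR_ABORT	1
#define XATTR_CHANGED	2

/* Hypothetical stand-ins for the private structs in ext_attr.c. */
struct sketch_xattr {
	char *name;		/* NULL marks an empty slot */
	char *value;
	size_t value_len;
};

struct sketch_handle {
	struct sketch_xattr *attrs;
	size_t length;
	int dirty;
};

/* Walk every used slot, honouring the callback's return flags. */
static int sketch_iterate(struct sketch_handle *h,
			  int (*func)(char *name, char *value,
				      size_t value_len, void *data),
			  void *data)
{
	struct sketch_xattr *x;
	int ret;

	assert(h && func);
	for (x = h->attrs; x < h->attrs + h->length; x++) {
		if (!x->name)
			continue;

		ret = func(x->name, x->value, x->value_len, data);
		if (ret & XATTR_CHANGED)
			h->dirty = 1;
		if (ret & XATTR_ABORT)
			break;
	}
	return 0;
}

/* Example callback: sum value lengths, stop after the first security.* key. */
static int sum_until_security(char *name, char *value, size_t value_len,
			      void *data)
{
	(void)value;
	*(size_t *)data += value_len;
	return strncmp(name, "security.", 9) == 0 ? XATTR_ABORT : 0;
}
```

A caller passes a pointer to its own accumulator as data; because the
length now arrives with the value, the callback no longer has to assume
NUL-terminated values.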

> 
> I was planning a couple of modifications to support inline_data -- since we can
> rewrite the inode-ea and ea-block arbitrarily, ext2fs_xattrs_write() ought to
> ensure that the inlinedata EA gets written into i_blocks and the beginning of
> the inode-ea area.

That doesn't make sense to me, because a function should do just one
thing.  Hence, ext2fs_xattrs_write() only needs to write data into the
EA area; it doesn't need to care about the content of that data.

> 
> Should the attributes be sorted before writing?  I was thinking that the
> desirable(?) order might be inline_data, security attributes, "everything
> else", then user attributes?  Or we could simply maintain FCFS order as is done
> now.

From the comment at the top of fs/ext4/xattr.c:

 * The header is followed by multiple entry descriptors. In disk blocks, the
 * entry descriptors are kept sorted. In inodes, they are unsorted. The
 * attribute values are aligned to the end of the block in no specific order.

If I understand correctly, we just need to sort the entries.
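
A minimal sketch of what sorting the entries before writing an EA block
could look like.  The ordering used here (name index, then name length,
then name bytes) is an assumption about the comparison the kernel applies
when it searches a sorted block, and struct ea_sketch is a hypothetical
in-memory record, not the on-disk struct ext2_ext_attr_entry.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory entry: index plus short name (prefix stripped). */
struct ea_sketch {
	unsigned char name_index;
	const char *name;
};

/* Assumed order: name index, then name length, then name bytes. */
static int ea_sketch_cmp(const void *a, const void *b)
{
	const struct ea_sketch *x = a, *y = b;
	size_t xl = strlen(x->name), yl = strlen(y->name);

	if (x->name_index != y->name_index)
		return x->name_index < y->name_index ? -1 : 1;
	if (xl != yl)
		return xl < yl ? -1 : 1;
	return memcmp(x->name, y->name, xl);
}

/* Sort the entries in place before they are laid out in the block. */
static void ea_sketch_sort(struct ea_sketch *v, size_t n)
{
	assert(v != NULL || n == 0);
	if (n > 1)
		qsort(v, n, sizeof(*v), ea_sketch_cmp);
}
```

Only the entry descriptors would be ordered this way; per the comment
quoted above, the values at the end of the block need no particular order.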

> 
> The other change was to ext2fs_xattr_set() to return
> EXT2_ET_INLINE_DATA_NO_SPACE if it figures out that there's not enough space in
> i_blocks + inode-ea to fit the inline data.

As I said above, ext2fs_xattr_set() doesn't need to handle i_blocks.  But
I think it does need a parameter to indicate whether we want to
allocate a block to store the EA data.
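
To make the "not enough space" case concrete, here is a rough sketch of
the in-inode space check, following the arithmetic in the quoted
ext2fs_xattrs_write() and write_xattrs_to_buffer().  The sk_* names are
hypothetical, and the 16-byte entry header is an assumption about
sizeof(struct ext2_ext_attr_entry).

```c
#include <assert.h>
#include <stddef.h>

#define SK_GOOD_OLD_INODE_SIZE	128	/* EXT2_GOOD_OLD_INODE_SIZE */
#define SK_ATTR_PAD		4	/* EXT2_EXT_ATTR_PAD */
#define SK_ENTRY_HDR		16	/* assumed entry header size */
#define SK_MAGIC_SIZE		4	/* __u32 magic / end marker */

/* Round n up to the next multiple of the EA padding. */
static size_t sk_round_up(size_t n)
{
	return (n + SK_ATTR_PAD - 1) & ~(size_t)(SK_ATTR_PAD - 1);
}

/* Bytes available for entries and values after the in-inode magic. */
static size_t sk_inode_ea_storage(size_t inode_size, size_t extra_isize)
{
	size_t used = SK_GOOD_OLD_INODE_SIZE + extra_isize + SK_MAGIC_SIZE;

	assert(inode_size >= SK_GOOD_OLD_INODE_SIZE);
	return inode_size > used ? inode_size - used : 0;
}

/*
 * Would a single attribute fit in an otherwise empty region?  The entry
 * grows from the front, the value from the back, and a 4-byte zero must
 * remain to terminate the entry list.
 */
static int sk_fits(size_t storage, size_t shortname_len, size_t value_len)
{
	size_t entry = sk_round_up(SK_ENTRY_HDR + shortname_len);
	size_t value = sk_round_up(value_len);

	return entry + SK_MAGIC_SIZE + value <= storage;
}
```

With a 256-byte inode and an i_extra_isize of 28 this leaves 96 bytes, so
a lone attribute with a 4-byte short name could carry at most a 72-byte
value before a separate EA block would be needed.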

Thanks,
                                                - Zheng

> 
> > > 
> > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > ---
> > >  lib/ext2fs/ext2_err.et.in |   18 +
> > >  lib/ext2fs/ext2fs.h       |   28 ++
> > >  lib/ext2fs/ext_attr.c     |  761 +++++++++++++++++++++++++++++++++++++++++++++
> > >  3 files changed, 807 insertions(+)
> > > 
> > > 
> > > diff --git a/lib/ext2fs/ext2_err.et.in b/lib/ext2fs/ext2_err.et.in
> > > index 9cc1bd1..b819a90 100644
> > > --- a/lib/ext2fs/ext2_err.et.in
> > > +++ b/lib/ext2fs/ext2_err.et.in
> > > @@ -482,4 +482,22 @@ ec	EXT2_ET_BLOCK_BITMAP_CSUM_INVALID,
> > >  ec	EXT2_ET_INLINE_DATA_CANT_ITERATE,
> > >  	"Cannot block iterate on an inode containing inline data"
> > >  
> > > +ec	EXT2_ET_EA_BAD_NAME_LEN,
> > > +	"Extended attribute has an invalid name length"
> > > +
> > > +ec	EXT2_ET_EA_BAD_VALUE_SIZE,
> > > +	"Extended attribute has an invalid value length"
> > > +
> > > +ec	EXT2_ET_BAD_EA_HASH,
> > > +	"Extended attribute has an incorrect hash"
> > > +
> > > +ec	EXT2_ET_BAD_EA_HEADER,
> > > +	"Extended attribute block has a bad header"
> > > +
> > > +ec	EXT2_ET_EA_KEY_NOT_FOUND,
> > > +	"Extended attribute key not found"
> > > +
> > > +ec	EXT2_ET_EA_NO_SPACE,
> > > +	"Insufficient space to store extended attribute data"
> > > +
> > >  	end
> > > diff --git a/lib/ext2fs/ext2fs.h b/lib/ext2fs/ext2fs.h
> > > index 5247922..93adae8 100644
> > > --- a/lib/ext2fs/ext2fs.h
> > > +++ b/lib/ext2fs/ext2fs.h
> > > @@ -637,6 +637,13 @@ typedef struct stat ext2fs_struct_stat;
> > >  #define EXT2_FLAG_FLUSH_NO_SYNC          1
> > >  
> > >  /*
> > > + * Modify and iterate extended attributes
> > > + */
> > > +struct ext2_xattr_handle;
> > > +#define XATTR_ABORT	1
> > > +#define XATTR_CHANGED	2
> > > +
> > > +/*
> > >   * function prototypes
> > >   */
> > >  static inline int ext2fs_has_group_desc_csum(ext2_filsys fs)
> > > @@ -1151,6 +1158,27 @@ extern errcode_t ext2fs_adjust_ea_refcount3(ext2_filsys fs, blk64_t blk,
> > >  					   char *block_buf,
> > >  					   int adjust, __u32 *newcount,
> > >  					   ext2_ino_t inum);
> > > +errcode_t ext2fs_xattrs_expand(struct ext2_xattr_handle *h,
> > > +			       unsigned int expandby);
> > > +errcode_t ext2fs_xattrs_write(struct ext2_xattr_handle *handle);
> > > +errcode_t ext2fs_xattrs_read(struct ext2_xattr_handle *handle);
> > > +errcode_t ext2fs_xattrs_iterate(struct ext2_xattr_handle *h,
> > > +				int (*func)(char *name, char *value,
> > > +					    void *data),
> > > +				void *data);
> > > +errcode_t ext2fs_xattr_get(struct ext2_xattr_handle *h, const char *key,
> > > +			   void **value, unsigned int *value_len);
> > > +errcode_t ext2fs_xattr_set(struct ext2_xattr_handle *handle,
> > > +			   const char *key,
> > > +			   const void *value,
> > > +			   unsigned int value_len);
> > > +errcode_t ext2fs_xattr_remove(struct ext2_xattr_handle *handle,
> > > +			      const char *key);
> > > +errcode_t ext2fs_xattrs_open(ext2_filsys fs, ext2_ino_t ino,
> > > +			     struct ext2_xattr_handle **handle);
> > > +errcode_t ext2fs_xattrs_close(struct ext2_xattr_handle **handle);
> > > +errcode_t ext2fs_free_ext_attr(ext2_filsys fs, ext2_ino_t ino,
> > > +			       struct ext2_inode_large *inode);
> > >  
> > >  /* extent.c */
> > >  extern errcode_t ext2fs_extent_header_verify(void *ptr, int size);
> > > diff --git a/lib/ext2fs/ext_attr.c b/lib/ext2fs/ext_attr.c
> > > index 9649a14..2a1e5e7 100644
> > > --- a/lib/ext2fs/ext_attr.c
> > > +++ b/lib/ext2fs/ext_attr.c
> > > @@ -186,3 +186,764 @@ errcode_t ext2fs_adjust_ea_refcount(ext2_filsys fs, blk_t blk,
> > >  	return ext2fs_adjust_ea_refcount2(fs, blk, block_buf, adjust,
> > >  					  newcount);
> > >  }
> > > +
> > > +/* Manipulate the contents of extended attribute regions */
> > > +struct ext2_xattr {
> > > +	char *name;
> > > +	void *value;
> > > +	unsigned int value_len;
> > > +};
> > > +
> > > +struct ext2_xattr_handle {
> > > +	ext2_filsys fs;
> > > +	struct ext2_xattr *attrs;
> > > +	unsigned int length;
> > > +	ext2_ino_t ino;
> > > +	int dirty;
> > > +};
> > > +
> > > +errcode_t ext2fs_xattrs_expand(struct ext2_xattr_handle *h,
> > > +			       unsigned int expandby)
> > > +{
> > > +	struct ext2_xattr *new_attrs;
> > > +	errcode_t err;
> > > +
> > > +	err = ext2fs_get_arrayzero(h->length + expandby,
> > > +				   sizeof(struct ext2_xattr), &new_attrs);
> > > +	if (err)
> > > +		return err;
> > > +
> > > +	memcpy(new_attrs, h->attrs, h->length * sizeof(struct ext2_xattr));
> > > +	ext2fs_free_mem(&h->attrs);
> > > +	h->length += expandby;
> > > +	h->attrs = new_attrs;
> > > +
> > > +	return 0;
> > > +}
> > > +
> > > +struct ea_name_index {
> > > +	int index;
> > > +	const char *name;
> > > +};
> > > +
> > > +static struct ea_name_index ea_names[] = {
> > > +	{1, "user."},
> > > +	{2, "system.posix_acl_access"},
> > > +	{3, "system.posix_acl_default"},
> > > +	{4, "trusted."},
> > > +	{6, "security."},
> > > +	{7, "system."},
> > 
> > It seems that we also have a _RICHACL name here.
> > 
> > > +	{0, NULL},
> > > +};
> > > +
> > > +static const char *find_ea_prefix(int index)
> > > +{
> > > +	struct ea_name_index *e;
> > > +
> > > +	for (e = ea_names; e->name; e++)
> > > +		if (e->index == index)
> > > +			return e->name;
> > > +
> > > +	return NULL;
> > > +}
> > > +
> > > +static int find_ea_index(const char *fullname, char **name, int *index)
> > > +{
> > > +	struct ea_name_index *e;
> > > +
> > > +	for (e = ea_names; e->name; e++)
> > 
> > Coding style problem:
> >        for (e = ea_names; e->name; e++) {
> >                ...
> >        }
> 
> Ok I'll change it.
> 
> --D
> 
> > Thanks,
> >                                                 - Zheng
> > 
> > > +		if (memcmp(fullname, e->name, strlen(e->name)) == 0) {
> > > +			*name = (char *)fullname + strlen(e->name);
> > > +			*index = e->index;
> > > +			return 1;
> > > +		}
> > > +	return 0;
> > > +}
> > > +
> > > +errcode_t ext2fs_free_ext_attr(ext2_filsys fs, ext2_ino_t ino,
> > > +			       struct ext2_inode_large *inode)
> > > +{
> > > +	struct ext2_ext_attr_header *header;
> > > +	void *block_buf = NULL;
> > > +	dgrp_t grp;
> > > +	blk64_t blk, goal;
> > > +	errcode_t err;
> > > +	struct ext2_inode_large i;
> > > +
> > > +	/* Read inode? */
> > > +	if (inode == NULL) {
> > > +		err = ext2fs_read_inode_full(fs, ino, (struct ext2_inode *)&i,
> > > +					     sizeof(struct ext2_inode_large));
> > > +		if (err)
> > > +			return err;
> > > +		inode = &i;
> > > +	}
> > > +
> > > +	/* Do we already have an EA block? */
> > > +	blk = ext2fs_file_acl_block(fs, (struct ext2_inode *)inode);
> > > +	if (blk == 0)
> > > +		return 0;
> > > +
> > > +	/* Find block, zero it, write back */
> > > +	if ((blk < fs->super->s_first_data_block) ||
> > > +	    (blk >= ext2fs_blocks_count(fs->super))) {
> > > +		err = EXT2_ET_BAD_EA_BLOCK_NUM;
> > > +		goto out;
> > > +	}
> > > +
> > > +	err = ext2fs_get_mem(fs->blocksize, &block_buf);
> > > +	if (err)
> > > +		goto out;
> > > +
> > > +	err = ext2fs_read_ext_attr3(fs, blk, block_buf, ino);
> > > +	if (err)
> > > +		goto out2;
> > > +
> > > +	header = (struct ext2_ext_attr_header *) block_buf;
> > > +	if (header->h_magic != EXT2_EXT_ATTR_MAGIC) {
> > > +		err = EXT2_ET_BAD_EA_HEADER;
> > > +		goto out2;
> > > +	}
> > > +
> > > +	header->h_refcount--;
> > > +	err = ext2fs_write_ext_attr3(fs, blk, block_buf, ino);
> > > +	if (err)
> > > +		goto out2;
> > > +
> > > +	/* Erase link to block */
> > > +	ext2fs_file_acl_block_set(fs, (struct ext2_inode *)inode, 0);
> > > +	if (header->h_refcount == 0)
> > > +		ext2fs_block_alloc_stats2(fs, blk, -1);
> > > +
> > > +	/* Write inode? */
> > > +	if (inode == &i) {
> > > +		err = ext2fs_write_inode_full(fs, ino, (struct ext2_inode *)&i,
> > > +					      sizeof(struct ext2_inode_large));
> > > +		if (err)
> > > +			goto out2;
> > > +	}
> > > +
> > > +out2:
> > > +	ext2fs_free_mem(&block_buf);
> > > +out:
> > > +	return err;
> > > +}
> > > +
> > > +static errcode_t prep_ea_block_for_write(ext2_filsys fs, ext2_ino_t ino,
> > > +					 struct ext2_inode_large *inode)
> > > +{
> > > +	struct ext2_ext_attr_header *header;
> > > +	void *block_buf = NULL;
> > > +	dgrp_t grp;
> > > +	blk64_t blk, goal;
> > > +	errcode_t err;
> > > +
> > > +	/* Do we already have an EA block? */
> > > +	blk = ext2fs_file_acl_block(fs, (struct ext2_inode *)inode);
> > > +	if (blk != 0) {
> > > +		if ((blk < fs->super->s_first_data_block) ||
> > > +		    (blk >= ext2fs_blocks_count(fs->super))) {
> > > +			err = EXT2_ET_BAD_EA_BLOCK_NUM;
> > > +			goto out;
> > > +		}
> > > +
> > > +		err = ext2fs_get_mem(fs->blocksize, &block_buf);
> > > +		if (err)
> > > +			goto out;
> > > +
> > > +		err = ext2fs_read_ext_attr3(fs, blk, block_buf, ino);
> > > +		if (err)
> > > +			goto out2;
> > > +
> > > +		header = (struct ext2_ext_attr_header *) block_buf;
> > > +		if (header->h_magic != EXT2_EXT_ATTR_MAGIC) {
> > > +			err = EXT2_ET_BAD_EA_HEADER;
> > > +			goto out2;
> > > +		}
> > > +
> > > +		/* Single-user block.  We're done here. */
> > > +		if (header->h_refcount == 1)
> > > +			goto out2;
> > > +
> > > +		/* We need to CoW the block. */
> > > +		header->h_refcount--;
> > > +		err = ext2fs_write_ext_attr3(fs, blk, block_buf, ino);
> > > +		if (err)
> > > +			goto out2;
> > > +	} else {
> > > +		/* No block, we must increment i_blocks */
> > > +		err = ext2fs_iblk_add_blocks(fs, (struct ext2_inode *)inode,
> > > +					     1);
> > > +		if (err)
> > > +			goto out;
> > > +	}
> > > +
> > > +	/* Allocate a block */
> > > +	grp = ext2fs_group_of_ino(fs, ino);
> > > +	goal = ext2fs_inode_table_loc(fs, grp);
> > > +	err = ext2fs_alloc_block2(fs, goal, NULL, &blk);
> > > +	if (err)
> > > +		return err;
> > > +	ext2fs_file_acl_block_set(fs, (struct ext2_inode *)inode, blk);
> > > +out2:
> > > +	ext2fs_free_mem(&block_buf);
> > > +out:
> > > +	return err;
> > > +}
> > > +
> > > +
> > > +static errcode_t write_xattrs_to_buffer(struct ext2_xattr_handle *handle,
> > > +					struct ext2_xattr **pos,
> > > +					void *entries_start,
> > > +					unsigned int storage_size,
> > > +					unsigned int value_offset_correction)
> > > +{
> > > +	struct ext2_xattr *x = *pos;
> > > +	struct ext2_ext_attr_entry *e = entries_start;
> > > +	void *end = entries_start + storage_size;
> > > +	char *shortname;
> > > +	unsigned int entry_size, value_size;
> > > +	int idx, ret;
> > > +
> > > +	/* For all remaining x...  */
> > > +	for (; x < handle->attrs + handle->length; x++) {
> > > +		if (!x->name)
> > > +			continue;
> > > +
> > > +		/* Calculate index and shortname position */
> > > +		shortname = x->name;
> > > +		ret = find_ea_index(x->name, &shortname, &idx);
> > > +
> > > +		/* Calculate entry and value size */
> > > +		entry_size = (sizeof(*e) + strlen(shortname) +
> > > +			      EXT2_EXT_ATTR_PAD - 1) &
> > > +			     ~(EXT2_EXT_ATTR_PAD - 1);
> > > +		value_size = ((x->value_len + EXT2_EXT_ATTR_PAD - 1) /
> > > +			      EXT2_EXT_ATTR_PAD) * EXT2_EXT_ATTR_PAD;
> > > +
> > > +		/*
> > > +		 * Would entry collide with value?
> > > +		 * Note that we must leave sufficient room for a (u32)0 to
> > > +		 * mark the end of the entries.
> > > +		 */
> > > +		if ((void *)e + entry_size + sizeof(__u32) > end - value_size)
> > > +			break;
> > > +
> > > +		/* Fill out e appropriately */
> > > +		e->e_name_len = strlen(shortname);
> > > +		e->e_name_index = (ret ? idx : 0);
> > > +		e->e_value_offs = end - value_size - (void *)entries_start +
> > > +				value_offset_correction;
> > > +		e->e_value_block = 0;
> > > +		e->e_value_size = x->value_len;
> > > +
> > > +		/* Store name and value */
> > > +		end -= value_size;
> > > +		memcpy((void *)e + sizeof(*e), shortname, e->e_name_len);
> > > +		memcpy(end, x->value, e->e_value_size);
> > > +
> > > +		e->e_hash = ext2fs_ext_attr_hash_entry(e, end);
> > > +
> > > +		e = EXT2_EXT_ATTR_NEXT(e);
> > > +		*(__u32 *)e = 0;
> > > +	}
> > > +	*pos = x;
> > > +
> > > +	return 0;
> > > +}
> > > +
> > > +errcode_t ext2fs_xattrs_write(struct ext2_xattr_handle *handle)
> > > +{
> > > +	struct ext2_xattr *x;
> > > +	struct ext2_inode_large *inode;
> > > +	void *start, *block_buf = NULL;
> > > +	struct ext2_ext_attr_header *header;
> > > +	__u32 ea_inode_magic;
> > > +	blk64_t blk;
> > > +	unsigned int storage_size;
> > > +	unsigned int i, written;
> > > +	errcode_t err;
> > > +
> > > +	if (!EXT2_HAS_COMPAT_FEATURE(handle->fs->super,
> > > +				     EXT2_FEATURE_COMPAT_EXT_ATTR))
> > > +		return 0;
> > > +
> > > +	i = EXT2_INODE_SIZE(handle->fs->super);
> > > +	if (i < sizeof(*inode))
> > > +		i = sizeof(*inode);
> > > +	err = ext2fs_get_memzero(i, &inode);
> > > +	if (err)
> > > +		return err;
> > > +
> > > +	err = ext2fs_read_inode_full(handle->fs, handle->ino,
> > > +				     (struct ext2_inode *)inode,
> > > +				     EXT2_INODE_SIZE(handle->fs->super));
> > > +	if (err)
> > > +		goto out;
> > > +
> > > +	x = handle->attrs;
> > > +	/* Does the inode have size for EA? */
> > > +	if (EXT2_INODE_SIZE(handle->fs->super) <= EXT2_GOOD_OLD_INODE_SIZE +
> > > +						  inode->i_extra_isize +
> > > +						  sizeof(__u32))
> > > +		goto write_ea_block;
> > > +
> > > +	/* Write the inode EA */
> > > +	ea_inode_magic = EXT2_EXT_ATTR_MAGIC;
> > > +	memcpy(((char *) inode) + EXT2_GOOD_OLD_INODE_SIZE +
> > > +	       inode->i_extra_isize, &ea_inode_magic, sizeof(__u32));
> > > +	storage_size = EXT2_INODE_SIZE(handle->fs->super) -
> > > +		EXT2_GOOD_OLD_INODE_SIZE - inode->i_extra_isize -
> > > +		sizeof(__u32);
> > > +	start = ((char *) inode) + EXT2_GOOD_OLD_INODE_SIZE +
> > > +		inode->i_extra_isize + sizeof(__u32);
> > > +
> > > +	err = write_xattrs_to_buffer(handle, &x, start, storage_size, 0);
> > > +	if (err)
> > > +		goto out;
> > > +
> > > +	/* Are we done? */
> > > +	if (x == handle->attrs + handle->length)
> > > +		goto skip_ea_block;
> > > +
> > > +write_ea_block:
> > > +	/* Write the EA block */
> > > +	err = ext2fs_get_mem(handle->fs->blocksize, &block_buf);
> > > +	if (err)
> > > +		goto out;
> > > +
> > > +	storage_size = handle->fs->blocksize -
> > > +		sizeof(struct ext2_ext_attr_header);
> > > +	start = block_buf + sizeof(struct ext2_ext_attr_header);
> > > +
> > > +	err = write_xattrs_to_buffer(handle, &x, start, storage_size,
> > > +				     (void *)start - block_buf);
> > > +	if (err)
> > > +		goto out2;
> > > +
> > > +	if (x < handle->attrs + handle->length) {
> > > +		err = EXT2_ET_EA_NO_SPACE;
> > > +		goto out2;
> > > +	}
> > > +
> > > +	if (block_buf) {
> > > +		/* Write a header on the EA block */
> > > +		header = block_buf;
> > > +		header->h_magic = EXT2_EXT_ATTR_MAGIC;
> > > +		header->h_refcount = 1;
> > > +		header->h_blocks = 1;
> > > +
> > > +		/* Get a new block for writing */
> > > +		err = prep_ea_block_for_write(handle->fs, handle->ino, inode);
> > > +		if (err)
> > > +			goto out2;
> > > +
> > > +		/* Finally, write the new EA block */
> > > +		blk = ext2fs_file_acl_block(handle->fs,
> > > +					    (struct ext2_inode *)inode);
> > > +		err = ext2fs_write_ext_attr3(handle->fs, blk, block_buf,
> > > +					     handle->ino);
> > > +		if (err)
> > > +			goto out2;
> > > +	}
> > > +
> > > +skip_ea_block:
> > > +	blk = ext2fs_file_acl_block(handle->fs, (struct ext2_inode *)inode);
> > > +	if (!block_buf && blk) {
> > > +		/* xattrs shrunk, free the block */
> > > +		ext2fs_file_acl_block_set(handle->fs,
> > > +					  (struct ext2_inode *)inode, 0);
> > > +		err = ext2fs_iblk_sub_blocks(handle->fs,
> > > +					     (struct ext2_inode *)inode, 1);
> > > +		if (err)
> > > +			goto out;
> > > +		ext2fs_block_alloc_stats2(handle->fs, blk, -1);
> > > +	}
> > > +
> > > +	/* Write the inode */
> > > +	err = ext2fs_write_inode_full(handle->fs, handle->ino,
> > > +				      (struct ext2_inode *)inode,
> > > +				      EXT2_INODE_SIZE(handle->fs->super));
> > > +	if (err)
> > > +		goto out2;
> > > +
> > > +out2:
> > > +	ext2fs_free_mem(&block_buf);
> > > +out:
> > > +	ext2fs_free_mem(&inode);
> > > +	handle->dirty = 0;
> > > +	return err;
> > > +}
> > > +
> > > +static errcode_t read_xattrs_from_buffer(struct ext2_xattr_handle *handle,
> > > +					 struct ext2_ext_attr_entry *entries,
> > > +					 unsigned int storage_size,
> > > +					 void *value_start)
> > > +{
> > > +	struct ext2_xattr *x;
> > > +	struct ext2_ext_attr_entry *entry;
> > > +	const char *prefix;
> > > +	void *ptr;
> > > +	unsigned int remain = storage_size, prefix_len;
> > > +	errcode_t err;
> > > +
> > > +	x = handle->attrs;
> > > +	while (x->name)
> > > +		x++;
> > > +
> > > +	entry = entries;
> > > +	while (!EXT2_EXT_IS_LAST_ENTRY(entry)) {
> > > +		__u32 hash;
> > > +
> > > +		/* header eats this space */
> > > +		remain -= sizeof(struct ext2_ext_attr_entry);
> > > +
> > > +		/* is attribute name valid? */
> > > +		if (EXT2_EXT_ATTR_SIZE(entry->e_name_len) > remain)
> > > +			return EXT2_ET_EA_BAD_NAME_LEN;
> > > +
> > > +		/* attribute len eats this space */
> > > +		remain -= EXT2_EXT_ATTR_SIZE(entry->e_name_len);
> > > +
> > > +		/* check value size */
> > > +		if (entry->e_value_size > remain)
> > > +			return EXT2_ET_EA_BAD_VALUE_SIZE;
> > > +
> > > +		/* e_value_block must be 0 in inode's ea */
> > > +		if (entry->e_value_block != 0)
> > > +			return EXT2_ET_BAD_EA_BLOCK_NUM;
> > > +
> > > +		hash = ext2fs_ext_attr_hash_entry(entry, value_start +
> > > +							 entry->e_value_offs);
> > > +
> > > +		/* e_hash may be 0 in older inode's ea */
> > > +		if (entry->e_hash != 0 && entry->e_hash != hash)
> > > +			return EXT2_ET_BAD_EA_HASH;
> > > +
> > > +		remain -= entry->e_value_size;
> > > +
> > > +		/* Allocate space for more attrs? */
> > > +		if (x == handle->attrs + handle->length) {
> > > +			err = ext2fs_xattrs_expand(handle, 4);
> > > +			if (err)
> > > +				return err;
> > > +			x = handle->attrs + handle->length - 4;
> > > +		}
> > > +
> > > +		/* Extract name/value */
> > > +		prefix = find_ea_prefix(entry->e_name_index);
> > > +		prefix_len = (prefix ? strlen(prefix) : 0);
> > > +		err = ext2fs_get_memzero(entry->e_name_len + prefix_len + 1,
> > > +					 &x->name);
> > > +		if (err)
> > > +			return err;
> > > +		if (prefix)
> > > +			memcpy(x->name, prefix, prefix_len);
> > > +		if (entry->e_name_len)
> > > +			memcpy(x->name + prefix_len,
> > > +			       (void *)entry + sizeof(*entry),
> > > +			       entry->e_name_len);
> > > +
> > > +		err = ext2fs_get_mem(entry->e_value_size, &x->value);
> > > +		if (err)
> > > +			return err;
> > > +		x->value_len = entry->e_value_size;
> > > +		memcpy(x->value, value_start + entry->e_value_offs,
> > > +		       entry->e_value_size);
> > > +		x++;
> > > +		entry = EXT2_EXT_ATTR_NEXT(entry);
> > > +	}
> > > +
> > > +	return 0;
> > > +}
> > > +
> > > +errcode_t ext2fs_xattrs_read(struct ext2_xattr_handle *handle)
> > > +{
> > > +	struct ext2_xattr *attrs = NULL, *x;
> > > +	unsigned int attrs_len;
> > > +	struct ext2_inode_large *inode;
> > > +	struct ext2_ext_attr_header *header;
> > > +	__u32 ea_inode_magic;
> > > +	unsigned int storage_size;
> > > +	void *start, *block_buf = NULL;
> > > +	blk64_t blk;
> > > +	int i;
> > > +	errcode_t err;
> > > +
> > > +	if (!EXT2_HAS_COMPAT_FEATURE(handle->fs->super,
> > > +				     EXT2_FEATURE_COMPAT_EXT_ATTR))
> > > +		return 0;
> > > +
> > > +	i = EXT2_INODE_SIZE(handle->fs->super);
> > > +	if (i < sizeof(*inode))
> > > +		i = sizeof(*inode);
> > > +	err = ext2fs_get_memzero(i, &inode);
> > > +	if (err)
> > > +		return err;
> > > +
> > > +	err = ext2fs_read_inode_full(handle->fs, handle->ino,
> > > +				     (struct ext2_inode *)inode,
> > > +				     EXT2_INODE_SIZE(handle->fs->super));
> > > +	if (err)
> > > +		goto out;
> > > +
> > > +	/* Does the inode have size for EA? */
> > > +	if (EXT2_INODE_SIZE(handle->fs->super) <= EXT2_GOOD_OLD_INODE_SIZE +
> > > +						  inode->i_extra_isize +
> > > +						  sizeof(__u32))
> > > +		goto read_ea_block;
> > > +
> > > +	/* Look for EA in the inode */
> > > +	memcpy(&ea_inode_magic, ((char *) inode) + EXT2_GOOD_OLD_INODE_SIZE +
> > > +	       inode->i_extra_isize, sizeof(__u32));
> > > +	if (ea_inode_magic == EXT2_EXT_ATTR_MAGIC) {
> > > +		storage_size = EXT2_INODE_SIZE(handle->fs->super) -
> > > +			EXT2_GOOD_OLD_INODE_SIZE - inode->i_extra_isize -
> > > +			sizeof(__u32);
> > > +		start = ((char *) inode) + EXT2_GOOD_OLD_INODE_SIZE +
> > > +			inode->i_extra_isize + sizeof(__u32);
> > > +
> > > +		err = read_xattrs_from_buffer(handle, start, storage_size,
> > > +					      start);
> > > +		if (err)
> > > +			goto out;
> > > +	}
> > > +
> > > +read_ea_block:
> > > +	/* Look for EA in a separate EA block */
> > > +	blk = ext2fs_file_acl_block(handle->fs, (struct ext2_inode *)inode);
> > > +	if (blk != 0) {
> > > +		if ((blk < handle->fs->super->s_first_data_block) ||
> > > +		    (blk >= ext2fs_blocks_count(handle->fs->super))) {
> > > +			err = EXT2_ET_BAD_EA_BLOCK_NUM;
> > > +			goto out;
> > > +		}
> > > +
> > > +		err = ext2fs_get_mem(handle->fs->blocksize, &block_buf);
> > > +		if (err)
> > > +			goto out;
> > > +
> > > +		err = ext2fs_read_ext_attr3(handle->fs, blk, block_buf,
> > > +					    handle->ino);
> > > +		if (err)
> > > +			goto out3;
> > > +
> > > +		header = (struct ext2_ext_attr_header *) block_buf;
> > > +		if (header->h_magic != EXT2_EXT_ATTR_MAGIC) {
> > > +			err = EXT2_ET_BAD_EA_HEADER;
> > > +			goto out3;
> > > +		}
> > > +
> > > +		if (header->h_blocks != 1) {
> > > +			err = EXT2_ET_BAD_EA_HEADER;
> > > +			goto out3;
> > > +		}
> > > +
> > > +		/* Read EAs */
> > > +		storage_size = handle->fs->blocksize -
> > > +			sizeof(struct ext2_ext_attr_header);
> > > +		start = block_buf + sizeof(struct ext2_ext_attr_header);
> > > +		err = read_xattrs_from_buffer(handle, start, storage_size,
> > > +					      block_buf);
> > > +		if (err)
> > > +			goto out3;
> > > +
> > > +		ext2fs_free_mem(&block_buf);
> > > +	}
> > > +
> > > +	ext2fs_free_mem(&block_buf);
> > > +	ext2fs_free_mem(&inode);
> > > +	return 0;
> > > +
> > > +out3:
> > > +	ext2fs_free_mem(&block_buf);
> > > +out:
> > > +	ext2fs_free_mem(&inode);
> > > +	return err;
> > > +}
> > > +
> > > +#define XATTR_ABORT	1
> > > +#define XATTR_CHANGED	2
> > > +errcode_t ext2fs_xattrs_iterate(struct ext2_xattr_handle *h,
> > > +				int (*func)(char *name, char *value,
> > > +					    void *data),
> > > +				void *data)
> > > +{
> > > +	struct ext2_xattr *x;
> > > +	errcode_t err;
> > > +	int ret;
> > > +
> > > +	for (x = h->attrs; x < h->attrs + h->length; x++) {
> > > +		if (!x->name)
> > > +			continue;
> > > +
> > > +		ret = func(x->name, x->value, data);
> > > +		if (ret & XATTR_CHANGED)
> > > +			h->dirty = 1;
> > > +		if (ret & XATTR_ABORT)
> > > +			return 0;
> > > +	}
> > > +
> > > +	return 0;
> > > +}
> > > +
> > > +errcode_t ext2fs_xattr_get(struct ext2_xattr_handle *h, const char *key,
> > > +			   void **value, unsigned int *value_len)
> > > +{
> > > +	struct ext2_xattr *x;
> > > +	void *val;
> > > +	errcode_t err;
> > > +
> > > +	for (x = h->attrs; x < h->attrs + h->length; x++) {
> > > +		if (!x->name)
> > > +			continue;
> > > +
> > > +		if (strcmp(x->name, key) == 0) {
> > > +			err = ext2fs_get_mem(x->value_len, &val);
> > > +			if (err)
> > > +				return err;
> > > +			memcpy(val, x->value, x->value_len);
> > > +			*value = val;
> > > +			*value_len = x->value_len;
> > > +			return 0;
> > > +		}
> > > +	}
> > > +
> > > +	return EXT2_ET_EA_KEY_NOT_FOUND;
> > > +}
> > > +
> > > +errcode_t ext2fs_xattr_set(struct ext2_xattr_handle *handle,
> > > +			   const char *key,
> > > +			   const void *value,
> > > +			   unsigned int value_len)
> > > +{
> > > +	struct ext2_xattr *x, *last_empty;
> > > +	char *new_value;
> > > +	errcode_t err;
> > > +
> > > +	last_empty = NULL;
> > > +	for (x = handle->attrs; x < handle->attrs + handle->length; x++) {
> > > +		if (!x->name) {
> > > +			last_empty = x;
> > > +			continue;
> > > +		}
> > > +
> > > +		/* Replace xattr */
> > > +		if (strcmp(x->name, key) == 0) {
> > > +			err = ext2fs_get_mem(value_len, &new_value);
> > > +			if (err)
> > > +				return err;
> > > +			memcpy(new_value, value, value_len);
> > > +			ext2fs_free_mem(&x->value);
> > > +			x->value = new_value;
> > > +			x->value_len = value_len;
> > > +			handle->dirty = 1;
> > > +			return 0;
> > > +		}
> > > +	}
> > > +
> > > +	/* Add attr to empty slot */
> > > +	if (last_empty) {
> > > +		err = ext2fs_get_mem(strlen(key) + 1, &last_empty->name);
> > > +		if (err)
> > > +			return err;
> > > +		strcpy(last_empty->name, key);
> > > +
> > > +		err = ext2fs_get_mem(value_len, &last_empty->value);
> > > +		if (err)
> > > +			return err;
> > > +		memcpy(last_empty->value, value, value_len);
> > > +		last_empty->value_len = value_len;
> > > +		handle->dirty = 1;
> > > +		return 0;
> > > +	}
> > > +
> > > +	/* Expand array, append slot */
> > > +	err = ext2fs_xattrs_expand(handle, 4);
> > > +	if (err)
> > > +		return err;
> > > +
> > > +	x = handle->attrs + handle->length - 4;
> > > +	err = ext2fs_get_mem(strlen(key) + 1, &x->name);
> > > +	if (err)
> > > +		return err;
> > > +	strcpy(x->name, key);
> > > +
> > > +	err = ext2fs_get_mem(value_len, &x->value);
> > > +	if (err)
> > > +		return err;
> > > +	memcpy(x->value, value, value_len);
> > > +	x->value_len = value_len;
> > > +	handle->dirty = 1;
> > > +	return 0;
> > > +}
> > > +
> > > +errcode_t ext2fs_xattr_remove(struct ext2_xattr_handle *handle,
> > > +			      const char *key)
> > > +{
> > > +	struct ext2_xattr *x;
> > > +	errcode_t err;
> > > +
> > > +	for (x = handle->attrs; x < handle->attrs + handle->length; x++) {
> > > +		if (!x->name)
> > > +			continue;
> > > +
> > > +		if (strcmp(x->name, key) == 0) {
> > > +			ext2fs_free_mem(&x->name);
> > > +			ext2fs_free_mem(&x->value);
> > > +			x->value_len = 0;
> > > +			handle->dirty = 1;
> > > +			return 0;
> > > +		}
> > > +	}
> > > +
> > > +	return EXT2_ET_EA_KEY_NOT_FOUND;
> > > +}
> > > +
> > > +errcode_t ext2fs_xattrs_open(ext2_filsys fs, ext2_ino_t ino,
> > > +			     struct ext2_xattr_handle **handle)
> > > +{
> > > +	struct ext2_xattr_handle *h;
> > > +	errcode_t err;
> > > +
> > > +	err = ext2fs_get_memzero(sizeof(*h), &h);
> > > +	if (err)
> > > +		return err;
> > > +
> > > +	h->length = 4;
> > > +	err = ext2fs_get_arrayzero(h->length, sizeof(struct ext2_xattr),
> > > +				   &h->attrs);
> > > +	if (err) {
> > > +		ext2fs_free_mem(&h);
> > > +		return err;
> > > +	}
> > > +	h->ino = ino;
> > > +	h->fs = fs;
> > > +	*handle = h;
> > > +	return 0;
> > > +}
> > > +
> > > +errcode_t ext2fs_xattrs_close(struct ext2_xattr_handle **handle)
> > > +{
> > > +	unsigned int i;
> > > +	struct ext2_xattr_handle *h = *handle;
> > > +	struct ext2_xattr *a = h->attrs;
> > > +	errcode_t err;
> > > +
> > > +	if (h->dirty) {
> > > +		err = ext2fs_xattrs_write(h);
> > > +		if (err)
> > > +			return err;
> > > +	}
> > > +
> > > +	for (i = 0; i < h->length; i++) {
> > > +		if (a[i].name)
> > > +			ext2fs_free_mem(&a[i].name);
> > > +		if (a[i].value)
> > > +			ext2fs_free_mem(&a[i].value);
> > > +	}
> > > +
> > > +	ext2fs_free_mem(&h->attrs);
> > > +	ext2fs_free_mem(handle);
> > > +	return 0;
> > > +}
> > > 
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> > > the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 23/25] libext2fs: support modifying arbitrary extended attributes
  2013-11-27  2:52       ` Zheng Liu
@ 2013-11-27  3:13         ` Darrick J. Wong
  2013-11-27 11:36           ` Zheng Liu
  0 siblings, 1 reply; 73+ messages in thread
From: Darrick J. Wong @ 2013-11-27  3:13 UTC (permalink / raw)
  To: tytso, linux-ext4

On Wed, Nov 27, 2013 at 10:52:32AM +0800, Zheng Liu wrote:
> On Tue, Nov 26, 2013 at 11:55:47AM -0800, Darrick J. Wong wrote:
> > On Tue, Nov 26, 2013 at 03:21:16PM +0800, Zheng Liu wrote:
> > > On Thu, Oct 17, 2013 at 09:51:34PM -0700, Darrick J. Wong wrote:
> > > > Add functions to allow clients to get, set, and remove extended
> > > > attributes from any file.  It also supports modifying EAs living in
> > > > i_file_acl.
> > > > 
> > > > v2: Put the header declarations in the correct part of ext2fs.h,
> > > > provide a function to release an EA block from an inode, and check
> > > > i_extra_isize to make sure we actually have space for in-inode EAs.
> > > 
> > > > Is this the latest version?  I am working on the inline data patch set
> > > > for e2fsprogs, and I want to use these APIs to manipulate the EAs.  It
> > > > would be great if you could point out which one is the latest version.
> > > > Thanks in advance.  Otherwise, some nits below.
> > 
> > Oh!  I was just about to start working on pulling your patches into my monster
> > patchset. :)
> 
> Wow!  Sorry for being late.  If you have only just begun to work on the
> inline data patchset, would you mind sending your latest monster patchset
> without my inline data patchset first?  That gives me a chance to take a
> closer look at them.  In any case, I will send my patch set asap.

Don't worry about the timing.  I've been busy with a lot of other things.
Every time I think I'm done and can start on inline_data, I find another weird
test case that breaks things, so I have to go back and figure out what went
wrong.

The patchbomb lives here: https://djwong.org/docs/e2fsprogs-patches/ 

Patch 22 is the end of my last (Oct. 2013) patchbomb.  I think the relevant
ones you want are #20 and #24-32.  You can skip #26-27 if you don't care about
fuse2fs.

Patch 37 is where I stopped before I started trying to fix the more obvious
Coverity bugs.  Patch 55 is where most of the xfstests bug fixes start.
> 
> > 
> > I changed the extended attribute API a little bit -- the function pointer to
> > ext2fs_xattrs_iterate() takes a value length; lengths are now specified in
> > size_t; and the ext2fs_xattrs_count() call is new.  I removed
> > ext2fs_xattrs_expand() since it's an internal call.
> > 
> > This is the current set of APIs:
> > 
> > errcode_t ext2fs_xattrs_write(struct ext2_xattr_handle *handle);
> > errcode_t ext2fs_xattrs_read(struct ext2_xattr_handle *handle);
> > errcode_t ext2fs_xattrs_iterate(struct ext2_xattr_handle *h,
> >                                int (*func)(char *name, char *value,
> >                                            size_t value_len, void *data),
> >                                void *data);
> > errcode_t ext2fs_xattr_get(struct ext2_xattr_handle *h, const char *key,
> >                           void **value, size_t *value_len);
> > errcode_t ext2fs_xattr_set(struct ext2_xattr_handle *handle,
> >                           const char *key,
> >                           const void *value,
> >                           size_t value_len);
> > errcode_t ext2fs_xattr_remove(struct ext2_xattr_handle *handle,
> >                              const char *key);
> > errcode_t ext2fs_xattrs_open(ext2_filsys fs, ext2_ino_t ino,
> >                             struct ext2_xattr_handle **handle);
> > errcode_t ext2fs_xattrs_close(struct ext2_xattr_handle **handle);
> > errcode_t ext2fs_free_ext_attr(ext2_filsys fs, ext2_ino_t ino,
> >                               struct ext2_inode_large *inode);
> > size_t ext2fs_xattrs_count(struct ext2_xattr_handle *handle);
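For illustration, here is a minimal standalone sketch of how the revised
iterate callback (the one that now receives a value length) might be used.
The demo_* names and structs are hypothetical stand-ins so that the example
compiles without libext2fs; only XATTR_ABORT, XATTR_CHANGED, and the callback
shape are taken from the prototypes quoted above.

```c
#include <stddef.h>

#define XATTR_ABORT	1
#define XATTR_CHANGED	2

/* Hypothetical stand-ins for the library's private structs */
struct demo_xattr {
	char *name;
	void *value;
	size_t value_len;
};

struct demo_handle {
	struct demo_xattr *attrs;
	size_t length;
	int dirty;
};

typedef int (*demo_iter_fn)(char *name, char *value, size_t value_len,
			    void *data);

/* Walk every occupied slot, honoring the CHANGED and ABORT bits */
static int demo_xattrs_iterate(struct demo_handle *h, demo_iter_fn func,
			       void *data)
{
	struct demo_xattr *x;
	int ret;

	for (x = h->attrs; x < h->attrs + h->length; x++) {
		if (!x->name)
			continue;	/* empty slot */

		ret = func(x->name, x->value, x->value_len, data);
		if (ret & XATTR_CHANGED)
			h->dirty = 1;
		if (ret & XATTR_ABORT)
			break;
	}
	return 0;
}

/* Example callback: add up the value lengths of all attributes */
static int demo_sum_lengths(char *name, char *value, size_t value_len,
			    void *data)
{
	(void)name;
	(void)value;
	*(size_t *)data += value_len;
	return 0;
}
```

Passing the length to the callback means it no longer has to guess where a
binary value ends, which matters for values that are not NUL-terminated
strings.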
> 
> It would be great if you could send the latest patch to me; it seems
> that I won't need to adjust my patch too much. :)
> 
> > 
> > I was planning a couple of modifications to support inline_data -- since we can
> > rewrite the inode-ea and ea-block arbitrarily, ext2fs_xattrs_write() ought to
> > ensure that the inlinedata EA gets written into i_blocks and the beginning of
> > the inode-ea area.
> 
> It makes no sense to me because we should ensure a function just does
> one thing.  Hence, ext2fs_xattrs_write() just needs to write data into
> ea area, and it doesn't need to care about the content of these data.
> 
> > 
> > Should the attributes be sorted before writing?  I was thinking that the
> > desirable(?) order might be inline_data, security attributes, "everything
> > else", then user attributes?  Or we could simply maintain FCFS order as is done
> > now.
> 
> At the front of fs/ext4/xattr.c file:
> 
>  * The header is followed by multiple entry descriptors. In disk blocks, the
>  * entry descriptors are kept sorted. In inodes, they are unsorted. The
>  * attribute values are aligned to the end of the block in no specific order.
> 
> If I understand correctly, we just need to sort the entries.
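A rough sketch of what sorting the entries before writing an EA block could
look like.  The ordering used here (name index, then name length, then name
bytes) is an assumption about what the kernel's sorted lookup expects; it is
not stated in this thread.  struct ea_key and both function names are
hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical key: e_name_index plus the name with its prefix stripped */
struct ea_key {
	int index;
	const char *name;
};

/* Compare by index, then by name length, then by name bytes (assumed order) */
static int ea_key_cmp(const void *a, const void *b)
{
	const struct ea_key *x = a, *y = b;
	size_t xl = strlen(x->name), yl = strlen(y->name);

	if (x->index != y->index)
		return x->index < y->index ? -1 : 1;
	if (xl != yl)
		return xl < yl ? -1 : 1;
	return memcmp(x->name, y->name, xl);
}

static void ea_sort_keys(struct ea_key *keys, size_t n)
{
	qsort(keys, n, sizeof(*keys), ea_key_cmp);
}
```

Per the comment quoted above, only entries in disk blocks are kept sorted, so
a step like this would apply to the EA block and not to the in-inode area.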
> 
> > 
> > The other change was to ext2fs_xattr_set() to return
> > EXT2_ET_INLINE_DATA_NO_SPACE if it figures out that there's not enough space in
> > i_blocks + inode-ea to fit the inline data.
> 
> As I said above, ext2fs_xattr_set() doesn't need to handle i_blocks.  But
> I think it does need a parameter to indicate whether we want to
> allocate a block to store these EA data.

Ohh, ok.  For some reason I had envisioned the xattrs code dealing with all
aspects of storing inline data, but of course this isn't necessary.  The
inline_data code can store whatever it wants in i_blocks and call out to the
xattrs functions to handle whatever needs to fit in the inode EA area.

It shouldn't be too hard to add an XATTRS_SET_FIT_IN_INODE flag to check for
that.
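
A back-of-the-envelope sketch of the size check such a flag might perform,
using the same rounding as write_xattrs_to_buffer() in the patch below.  The
function names are hypothetical, and the 16-byte entry header is an assumed
value for sizeof(struct ext2_ext_attr_entry); this only models a single
attribute going into an otherwise empty inode EA area.

```c
#include <stddef.h>
#include <string.h>

#define GOOD_OLD_INODE_SIZE	128u	/* EXT2_GOOD_OLD_INODE_SIZE */
#define EA_PAD			4u	/* EXT2_EXT_ATTR_PAD */
#define EA_ENTRY_HDR		16u	/* assumed sizeof(struct ext2_ext_attr_entry) */
#define EA_MAGIC_LEN		4u	/* in-inode EA magic number */
#define EA_TERMINATOR		4u	/* (u32)0 that ends the entry list */

/* Round up to the EA padding boundary */
static size_t ea_round_up(size_t n)
{
	return (n + EA_PAD - 1) & ~(size_t)(EA_PAD - 1);
}

/* Bytes left for entries and values after the inode body and the magic */
static size_t inode_ea_space(size_t inode_size, size_t extra_isize)
{
	size_t used = GOOD_OLD_INODE_SIZE + extra_isize + EA_MAGIC_LEN;

	return inode_size > used ? inode_size - used : 0;
}

/* Returns 1 if one attribute fits in an empty inode EA area, else 0 */
static int ea_fits_in_inode(size_t inode_size, size_t extra_isize,
			    const char *shortname, size_t value_len)
{
	size_t need = ea_round_up(EA_ENTRY_HDR + strlen(shortname)) +
		      ea_round_up(value_len) + EA_TERMINATOR;

	return need <= inode_ea_space(inode_size, extra_isize);
}
```

With 256-byte inodes and an i_extra_isize of 28, this leaves 96 bytes, so an
attribute with a four-character short name can carry at most a 72-byte value.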

--D
> 
> Thanks,
>                                                 - Zheng
> 
> > 
> > > > 
> > > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > > ---
> > > >  lib/ext2fs/ext2_err.et.in |   18 +
> > > >  lib/ext2fs/ext2fs.h       |   28 ++
> > > >  lib/ext2fs/ext_attr.c     |  761 +++++++++++++++++++++++++++++++++++++++++++++
> > > >  3 files changed, 807 insertions(+)
> > > > 
> > > > 
> > > > diff --git a/lib/ext2fs/ext2_err.et.in b/lib/ext2fs/ext2_err.et.in
> > > > index 9cc1bd1..b819a90 100644
> > > > --- a/lib/ext2fs/ext2_err.et.in
> > > > +++ b/lib/ext2fs/ext2_err.et.in
> > > > @@ -482,4 +482,22 @@ ec	EXT2_ET_BLOCK_BITMAP_CSUM_INVALID,
> > > >  ec	EXT2_ET_INLINE_DATA_CANT_ITERATE,
> > > >  	"Cannot block iterate on an inode containing inline data"
> > > >  
> > > > +ec	EXT2_ET_EA_BAD_NAME_LEN,
> > > > +	"Extended attribute has an invalid name length"
> > > > +
> > > > +ec	EXT2_ET_EA_BAD_VALUE_SIZE,
> > > > +	"Extended attribute has an invalid value length"
> > > > +
> > > > +ec	EXT2_ET_BAD_EA_HASH,
> > > > +	"Extended attribute has an incorrect hash"
> > > > +
> > > > +ec	EXT2_ET_BAD_EA_HEADER,
> > > > +	"Extended attribute block has a bad header"
> > > > +
> > > > +ec	EXT2_ET_EA_KEY_NOT_FOUND,
> > > > +	"Extended attribute key not found"
> > > > +
> > > > +ec	EXT2_ET_EA_NO_SPACE,
> > > > +	"Insufficient space to store extended attribute data"
> > > > +
> > > >  	end
> > > > diff --git a/lib/ext2fs/ext2fs.h b/lib/ext2fs/ext2fs.h
> > > > index 5247922..93adae8 100644
> > > > --- a/lib/ext2fs/ext2fs.h
> > > > +++ b/lib/ext2fs/ext2fs.h
> > > > @@ -637,6 +637,13 @@ typedef struct stat ext2fs_struct_stat;
> > > >  #define EXT2_FLAG_FLUSH_NO_SYNC          1
> > > >  
> > > >  /*
> > > > + * Modify and iterate extended attributes
> > > > + */
> > > > +struct ext2_xattr_handle;
> > > > +#define XATTR_ABORT	1
> > > > +#define XATTR_CHANGED	2
> > > > +
> > > > +/*
> > > >   * function prototypes
> > > >   */
> > > >  static inline int ext2fs_has_group_desc_csum(ext2_filsys fs)
> > > > @@ -1151,6 +1158,27 @@ extern errcode_t ext2fs_adjust_ea_refcount3(ext2_filsys fs, blk64_t blk,
> > > >  					   char *block_buf,
> > > >  					   int adjust, __u32 *newcount,
> > > >  					   ext2_ino_t inum);
> > > > +errcode_t ext2fs_xattrs_expand(struct ext2_xattr_handle *h,
> > > > +			       unsigned int expandby);
> > > > +errcode_t ext2fs_xattrs_write(struct ext2_xattr_handle *handle);
> > > > +errcode_t ext2fs_xattrs_read(struct ext2_xattr_handle *handle);
> > > > +errcode_t ext2fs_xattrs_iterate(struct ext2_xattr_handle *h,
> > > > +				int (*func)(char *name, char *value,
> > > > +					    void *data),
> > > > +				void *data);
> > > > +errcode_t ext2fs_xattr_get(struct ext2_xattr_handle *h, const char *key,
> > > > +			   void **value, unsigned int *value_len);
> > > > +errcode_t ext2fs_xattr_set(struct ext2_xattr_handle *handle,
> > > > +			   const char *key,
> > > > +			   const void *value,
> > > > +			   unsigned int value_len);
> > > > +errcode_t ext2fs_xattr_remove(struct ext2_xattr_handle *handle,
> > > > +			      const char *key);
> > > > +errcode_t ext2fs_xattrs_open(ext2_filsys fs, ext2_ino_t ino,
> > > > +			     struct ext2_xattr_handle **handle);
> > > > +errcode_t ext2fs_xattrs_close(struct ext2_xattr_handle **handle);
> > > > +errcode_t ext2fs_free_ext_attr(ext2_filsys fs, ext2_ino_t ino,
> > > > +			       struct ext2_inode_large *inode);
> > > >  
> > > >  /* extent.c */
> > > >  extern errcode_t ext2fs_extent_header_verify(void *ptr, int size);
> > > > diff --git a/lib/ext2fs/ext_attr.c b/lib/ext2fs/ext_attr.c
> > > > index 9649a14..2a1e5e7 100644
> > > > --- a/lib/ext2fs/ext_attr.c
> > > > +++ b/lib/ext2fs/ext_attr.c
> > > > @@ -186,3 +186,764 @@ errcode_t ext2fs_adjust_ea_refcount(ext2_filsys fs, blk_t blk,
> > > >  	return ext2fs_adjust_ea_refcount2(fs, blk, block_buf, adjust,
> > > >  					  newcount);
> > > >  }
> > > > +
> > > > +/* Manipulate the contents of extended attribute regions */
> > > > +struct ext2_xattr {
> > > > +	char *name;
> > > > +	void *value;
> > > > +	unsigned int value_len;
> > > > +};
> > > > +
> > > > +struct ext2_xattr_handle {
> > > > +	ext2_filsys fs;
> > > > +	struct ext2_xattr *attrs;
> > > > +	unsigned int length;
> > > > +	ext2_ino_t ino;
> > > > +	int dirty;
> > > > +};
> > > > +
> > > > +errcode_t ext2fs_xattrs_expand(struct ext2_xattr_handle *h,
> > > > +			       unsigned int expandby)
> > > > +{
> > > > +	struct ext2_xattr *new_attrs;
> > > > +	errcode_t err;
> > > > +
> > > > +	err = ext2fs_get_arrayzero(h->length + expandby,
> > > > +				   sizeof(struct ext2_xattr), &new_attrs);
> > > > +	if (err)
> > > > +		return err;
> > > > +
> > > > +	memcpy(new_attrs, h->attrs, h->length * sizeof(struct ext2_xattr));
> > > > +	ext2fs_free_mem(&h->attrs);
> > > > +	h->length += expandby;
> > > > +	h->attrs = new_attrs;
> > > > +
> > > > +	return 0;
> > > > +}
> > > > +
> > > > +struct ea_name_index {
> > > > +	int index;
> > > > +	const char *name;
> > > > +};
> > > > +
> > > > +static struct ea_name_index ea_names[] = {
> > > > +	{1, "user."},
> > > > +	{2, "system.posix_acl_access"},
> > > > +	{3, "system.posix_acl_default"},
> > > > +	{4, "trusted."},
> > > > +	{6, "security."},
> > > > +	{7, "system."},
> > > 
> > > It seems that we also have a _RICHACL name here.
> > > 
> > > > +	{0, NULL},
> > > > +};
> > > > +
> > > > +static const char *find_ea_prefix(int index)
> > > > +{
> > > > +	struct ea_name_index *e;
> > > > +
> > > > +	for (e = ea_names; e->name; e++)
> > > > +		if (e->index == index)
> > > > +			return e->name;
> > > > +
> > > > +	return NULL;
> > > > +}
> > > > +
> > > > +static int find_ea_index(const char *fullname, char **name, int *index)
> > > > +{
> > > > +	struct ea_name_index *e;
> > > > +
> > > > +	for (e = ea_names; e->name; e++)
> > > 
> > > Coding style problem:
> > >        for (e = ea_names; e->name; e++) {
> > >                ...
> > >        }
> > 
> > Ok I'll change it.
> > 
> > --D
> > 
> > > Thanks,
> > >                                                 - Zheng
> > > 
> > > > +		if (memcmp(fullname, e->name, strlen(e->name)) == 0) {
> > > > +			*name = (char *)fullname + strlen(e->name);
> > > > +			*index = e->index;
> > > > +			return 1;
> > > > +		}
> > > > +	return 0;
> > > > +}
> > > > +
> > > > +errcode_t ext2fs_free_ext_attr(ext2_filsys fs, ext2_ino_t ino,
> > > > +			       struct ext2_inode_large *inode)
> > > > +{
> > > > +	struct ext2_ext_attr_header *header;
> > > > +	void *block_buf = NULL;
> > > > +	dgrp_t grp;
> > > > +	blk64_t blk, goal;
> > > > +	errcode_t err;
> > > > +	struct ext2_inode_large i;
> > > > +
> > > > +	/* Read inode? */
> > > > +	if (inode == NULL) {
> > > > +		err = ext2fs_read_inode_full(fs, ino, (struct ext2_inode *)&i,
> > > > +					     sizeof(struct ext2_inode_large));
> > > > +		if (err)
> > > > +			return err;
> > > > +		inode = &i;
> > > > +	}
> > > > +
> > > > +	/* Do we already have an EA block? */
> > > > +	blk = ext2fs_file_acl_block(fs, (struct ext2_inode *)inode);
> > > > +	if (blk == 0)
> > > > +		return 0;
> > > > +
> > > > +	/* Find block, zero it, write back */
> > > > +	if ((blk < fs->super->s_first_data_block) ||
> > > > +	    (blk >= ext2fs_blocks_count(fs->super))) {
> > > > +		err = EXT2_ET_BAD_EA_BLOCK_NUM;
> > > > +		goto out;
> > > > +	}
> > > > +
> > > > +	err = ext2fs_get_mem(fs->blocksize, &block_buf);
> > > > +	if (err)
> > > > +		goto out;
> > > > +
> > > > +	err = ext2fs_read_ext_attr3(fs, blk, block_buf, ino);
> > > > +	if (err)
> > > > +		goto out2;
> > > > +
> > > > +	header = (struct ext2_ext_attr_header *) block_buf;
> > > > +	if (header->h_magic != EXT2_EXT_ATTR_MAGIC) {
> > > > +		err = EXT2_ET_BAD_EA_HEADER;
> > > > +		goto out2;
> > > > +	}
> > > > +
> > > > +	header->h_refcount--;
> > > > +	err = ext2fs_write_ext_attr3(fs, blk, block_buf, ino);
> > > > +	if (err)
> > > > +		goto out2;
> > > > +
> > > > +	/* Erase link to block */
> > > > +	ext2fs_file_acl_block_set(fs, (struct ext2_inode *)inode, 0);
> > > > +	if (header->h_refcount == 0)
> > > > +		ext2fs_block_alloc_stats2(fs, blk, -1);
> > > > +
> > > > +	/* Write inode? */
> > > > +	if (inode == &i) {
> > > > +		err = ext2fs_write_inode_full(fs, ino, (struct ext2_inode *)&i,
> > > > +					      sizeof(struct ext2_inode_large));
> > > > +		if (err)
> > > > +			goto out2;
> > > > +	}
> > > > +
> > > > +out2:
> > > > +	ext2fs_free_mem(&block_buf);
> > > > +out:
> > > > +	return err;
> > > > +}
> > > > +
> > > > +static errcode_t prep_ea_block_for_write(ext2_filsys fs, ext2_ino_t ino,
> > > > +					 struct ext2_inode_large *inode)
> > > > +{
> > > > +	struct ext2_ext_attr_header *header;
> > > > +	void *block_buf = NULL;
> > > > +	dgrp_t grp;
> > > > +	blk64_t blk, goal;
> > > > +	errcode_t err;
> > > > +
> > > > +	/* Do we already have an EA block? */
> > > > +	blk = ext2fs_file_acl_block(fs, (struct ext2_inode *)inode);
> > > > +	if (blk != 0) {
> > > > +		if ((blk < fs->super->s_first_data_block) ||
> > > > +		    (blk >= ext2fs_blocks_count(fs->super))) {
> > > > +			err = EXT2_ET_BAD_EA_BLOCK_NUM;
> > > > +			goto out;
> > > > +		}
> > > > +
> > > > +		err = ext2fs_get_mem(fs->blocksize, &block_buf);
> > > > +		if (err)
> > > > +			goto out;
> > > > +
> > > > +		err = ext2fs_read_ext_attr3(fs, blk, block_buf, ino);
> > > > +		if (err)
> > > > +			goto out2;
> > > > +
> > > > +		header = (struct ext2_ext_attr_header *) block_buf;
> > > > +		if (header->h_magic != EXT2_EXT_ATTR_MAGIC) {
> > > > +			err = EXT2_ET_BAD_EA_HEADER;
> > > > +			goto out2;
> > > > +		}
> > > > +
> > > > +		/* Single-user block.  We're done here. */
> > > > +		if (header->h_refcount == 1)
> > > > +			return 0;
> > > > +
> > > > +		/* We need to CoW the block. */
> > > > +		header->h_refcount--;
> > > > +		err = ext2fs_write_ext_attr3(fs, blk, block_buf, ino);
> > > > +		if (err)
> > > > +			goto out2;
> > > > +	} else {
> > > > +		/* No block, we must increment i_blocks */
> > > > +		err = ext2fs_iblk_add_blocks(fs, (struct ext2_inode *)inode,
> > > > +					     1);
> > > > +		if (err)
> > > > +			goto out;
> > > > +	}
> > > > +
> > > > +	/* Allocate a block */
> > > > +	grp = ext2fs_group_of_ino(fs, ino);
> > > > +	goal = ext2fs_inode_table_loc(fs, grp);
> > > > +	err = ext2fs_alloc_block2(fs, goal, NULL, &blk);
> > > > +	if (err)
> > > > +		return err;
> > > > +	ext2fs_file_acl_block_set(fs, (struct ext2_inode *)inode, blk);
> > > > +out2:
> > > > +	ext2fs_free_mem(&block_buf);
> > > > +out:
> > > > +	return err;
> > > > +}
> > > > +
> > > > +
> > > > +static errcode_t write_xattrs_to_buffer(struct ext2_xattr_handle *handle,
> > > > +					struct ext2_xattr **pos,
> > > > +					void *entries_start,
> > > > +					unsigned int storage_size,
> > > > +					unsigned int value_offset_correction)
> > > > +{
> > > > +	struct ext2_xattr *x = *pos;
> > > > +	struct ext2_ext_attr_entry *e = entries_start;
> > > > +	void *end = entries_start + storage_size;
> > > > +	char *shortname;
> > > > +	unsigned int entry_size, value_size;
> > > > +	int idx, ret;
> > > > +
> > > > +	/* For all remaining x...  */
> > > > +	for (; x < handle->attrs + handle->length; x++) {
> > > > +		if (!x->name)
> > > > +			continue;
> > > > +
> > > > +		/* Calculate index and shortname position */
> > > > +		shortname = x->name;
> > > > +		ret = find_ea_index(x->name, &shortname, &idx);
> > > > +
> > > > +		/* Calculate entry and value size */
> > > > +		entry_size = (sizeof(*e) + strlen(shortname) +
> > > > +			      EXT2_EXT_ATTR_PAD - 1) &
> > > > +			     ~(EXT2_EXT_ATTR_PAD - 1);
> > > > +		value_size = ((x->value_len + EXT2_EXT_ATTR_PAD - 1) /
> > > > +			      EXT2_EXT_ATTR_PAD) * EXT2_EXT_ATTR_PAD;
> > > > +
> > > > +		/*
> > > > +		 * Would entry collide with value?
> > > > +		 * Note that we must leave sufficient room for a (u32)0 to
> > > > +		 * mark the end of the entries.
> > > > +		 */
> > > > +		if ((void *)e + entry_size + sizeof(__u32) > end - value_size)
> > > > +			break;
> > > > +
> > > > +		/* Fill out e appropriately */
> > > > +		e->e_name_len = strlen(shortname);
> > > > +		e->e_name_index = (ret ? idx : 0);
> > > > +		e->e_value_offs = end - value_size - (void *)entries_start +
> > > > +				value_offset_correction;
> > > > +		e->e_value_block = 0;
> > > > +		e->e_value_size = x->value_len;
> > > > +
> > > > +		/* Store name and value */
> > > > +		end -= value_size;
> > > > +		memcpy((void *)e + sizeof(*e), shortname, e->e_name_len);
> > > > +		memcpy(end, x->value, e->e_value_size);
> > > > +
> > > > +		e->e_hash = ext2fs_ext_attr_hash_entry(e, end);
> > > > +
> > > > +		e = EXT2_EXT_ATTR_NEXT(e);
> > > > +		*(__u32 *)e = 0;
> > > > +	}
> > > > +	*pos = x;
> > > > +
> > > > +	return 0;
> > > > +}
> > > > +
> > > > +errcode_t ext2fs_xattrs_write(struct ext2_xattr_handle *handle)
> > > > +{
> > > > +	struct ext2_xattr *x;
> > > > +	struct ext2_inode_large *inode;
> > > > +	void *start, *block_buf = NULL;
> > > > +	struct ext2_ext_attr_header *header;
> > > > +	__u32 ea_inode_magic;
> > > > +	blk64_t blk;
> > > > +	unsigned int storage_size;
> > > > +	unsigned int i, written;
> > > > +	errcode_t err;
> > > > +
> > > > +	if (!EXT2_HAS_COMPAT_FEATURE(handle->fs->super,
> > > > +				     EXT2_FEATURE_COMPAT_EXT_ATTR))
> > > > +		return 0;
> > > > +
> > > > +	i = EXT2_INODE_SIZE(handle->fs->super);
> > > > +	if (i < sizeof(*inode))
> > > > +		i = sizeof(*inode);
> > > > +	err = ext2fs_get_memzero(i, &inode);
> > > > +	if (err)
> > > > +		return err;
> > > > +
> > > > +	err = ext2fs_read_inode_full(handle->fs, handle->ino,
> > > > +				     (struct ext2_inode *)inode,
> > > > +				     EXT2_INODE_SIZE(handle->fs->super));
> > > > +	if (err)
> > > > +		goto out;
> > > > +
> > > > +	x = handle->attrs;
> > > > +	/* Does the inode have size for EA? */
> > > > +	if (EXT2_INODE_SIZE(handle->fs->super) <= EXT2_GOOD_OLD_INODE_SIZE +
> > > > +						  inode->i_extra_isize +
> > > > +						  sizeof(__u32))
> > > > +		goto write_ea_block;
> > > > +
> > > > +	/* Write the inode EA */
> > > > +	ea_inode_magic = EXT2_EXT_ATTR_MAGIC;
> > > > +	memcpy(((char *) inode) + EXT2_GOOD_OLD_INODE_SIZE +
> > > > +	       inode->i_extra_isize, &ea_inode_magic, sizeof(__u32));
> > > > +	storage_size = EXT2_INODE_SIZE(handle->fs->super) -
> > > > +		EXT2_GOOD_OLD_INODE_SIZE - inode->i_extra_isize -
> > > > +		sizeof(__u32);
> > > > +	start = ((char *) inode) + EXT2_GOOD_OLD_INODE_SIZE +
> > > > +		inode->i_extra_isize + sizeof(__u32);
> > > > +
> > > > +	err = write_xattrs_to_buffer(handle, &x, start, storage_size, 0);
> > > > +	if (err)
> > > > +		goto out;
> > > > +
> > > > +	/* Are we done? */
> > > > +	if (x == handle->attrs + handle->length)
> > > > +		goto skip_ea_block;
> > > > +
> > > > +write_ea_block:
> > > > +	/* Write the EA block */
> > > > +	err = ext2fs_get_mem(handle->fs->blocksize, &block_buf);
> > > > +	if (err)
> > > > +		goto out;
> > > > +
> > > > +	storage_size = handle->fs->blocksize -
> > > > +		sizeof(struct ext2_ext_attr_header);
> > > > +	start = block_buf + sizeof(struct ext2_ext_attr_header);
> > > > +
> > > > +	err = write_xattrs_to_buffer(handle, &x, start, storage_size,
> > > > +				     (void *)start - block_buf);
> > > > +	if (err)
> > > > +		goto out2;
> > > > +
> > > > +	if (x < handle->attrs + handle->length) {
> > > > +		err = EXT2_ET_EA_NO_SPACE;
> > > > +		goto out2;
> > > > +	}
> > > > +
> > > > +	if (block_buf) {
> > > > +		/* Write a header on the EA block */
> > > > +		header = block_buf;
> > > > +		header->h_magic = EXT2_EXT_ATTR_MAGIC;
> > > > +		header->h_refcount = 1;
> > > > +		header->h_blocks = 1;
> > > > +
> > > > +		/* Get a new block for writing */
> > > > +		err = prep_ea_block_for_write(handle->fs, handle->ino, inode);
> > > > +		if (err)
> > > > +			goto out2;
> > > > +
> > > > +		/* Finally, write the new EA block */
> > > > +		blk = ext2fs_file_acl_block(handle->fs,
> > > > +					    (struct ext2_inode *)inode);
> > > > +		err = ext2fs_write_ext_attr3(handle->fs, blk, block_buf,
> > > > +					     handle->ino);
> > > > +		if (err)
> > > > +			goto out2;
> > > > +	}
> > > > +
> > > > +skip_ea_block:
> > > > +	blk = ext2fs_file_acl_block(handle->fs, (struct ext2_inode *)inode);
> > > > +	if (!block_buf && blk) {
> > > > +		/* xattrs shrunk, free the block */
> > > > +		ext2fs_file_acl_block_set(handle->fs,
> > > > +					  (struct ext2_inode *)inode, 0);
> > > > +		err = ext2fs_iblk_sub_blocks(handle->fs,
> > > > +					     (struct ext2_inode *)inode, 1);
> > > > +		if (err)
> > > > +			goto out;
> > > > +		ext2fs_block_alloc_stats2(handle->fs, blk, -1);
> > > > +	}
> > > > +
> > > > +	/* Write the inode */
> > > > +	err = ext2fs_write_inode_full(handle->fs, handle->ino,
> > > > +				      (struct ext2_inode *)inode,
> > > > +				      EXT2_INODE_SIZE(handle->fs->super));
> > > > +	if (err)
> > > > +		goto out2;
> > > > +
> > > > +out2:
> > > > +	ext2fs_free_mem(&block_buf);
> > > > +out:
> > > > +	ext2fs_free_mem(&inode);
> > > > +	handle->dirty = 0;
> > > > +	return err;
> > > > +}
> > > > +
> > > > +static errcode_t read_xattrs_from_buffer(struct ext2_xattr_handle *handle,
> > > > +					 struct ext2_ext_attr_entry *entries,
> > > > +					 unsigned int storage_size,
> > > > +					 void *value_start)
> > > > +{
> > > > +	struct ext2_xattr *x;
> > > > +	struct ext2_ext_attr_entry *entry;
> > > > +	const char *prefix;
> > > > +	void *ptr;
> > > > +	unsigned int remain, prefix_len;
> > > > +	errcode_t err;
> > > > +
> > > > +	x = handle->attrs;
> > > > +	while (x->name)
> > > > +		x++;
> > > > +
> > > > +	entry = entries;
> > > > +	while (!EXT2_EXT_IS_LAST_ENTRY(entry)) {
> > > > +		__u32 hash;
> > > > +
> > > > +		/* header eats this space */
> > > > +		remain -= sizeof(struct ext2_ext_attr_entry);
> > > > +
> > > > +		/* is attribute name valid? */
> > > > +		if (EXT2_EXT_ATTR_SIZE(entry->e_name_len) > remain)
> > > > +			return EXT2_ET_EA_BAD_NAME_LEN;
> > > > +
> > > > +		/* attribute len eats this space */
> > > > +		remain -= EXT2_EXT_ATTR_SIZE(entry->e_name_len);
> > > > +
> > > > +		/* check value size */
> > > > +		if (entry->e_value_size > remain)
> > > > +			return EXT2_ET_EA_BAD_VALUE_SIZE;
> > > > +
> > > > +		/* e_value_block must be 0 in inode's ea */
> > > > +		if (entry->e_value_block != 0)
> > > > +			return EXT2_ET_BAD_EA_BLOCK_NUM;
> > > > +
> > > > +		hash = ext2fs_ext_attr_hash_entry(entry, value_start +
> > > > +							 entry->e_value_offs);
> > > > +
> > > > +		/* e_hash may be 0 in older inode's ea */
> > > > +		if (entry->e_hash != 0 && entry->e_hash != hash)
> > > > +			return EXT2_ET_BAD_EA_HASH;
> > > > +
> > > > +		remain -= entry->e_value_size;
> > > > +
> > > > +		/* Allocate space for more attrs? */
> > > > +		if (x == handle->attrs + handle->length) {
> > > > +			err = ext2fs_xattrs_expand(handle, 4);
> > > > +			if (err)
> > > > +				return err;
> > > > +			x = handle->attrs + handle->length - 4;
> > > > +		}
> > > > +
> > > > +		/* Extract name/value */
> > > > +		prefix = find_ea_prefix(entry->e_name_index);
> > > > +		prefix_len = (prefix ? strlen(prefix) : 0);
> > > > +		err = ext2fs_get_memzero(entry->e_name_len + prefix_len + 1,
> > > > +					 &x->name);
> > > > +		if (err)
> > > > +			return err;
> > > > +		if (prefix)
> > > > +			memcpy(x->name, prefix, prefix_len);
> > > > +		if (entry->e_name_len)
> > > > +			memcpy(x->name + prefix_len,
> > > > +			       (void *)entry + sizeof(*entry),
> > > > +			       entry->e_name_len);
> > > > +
> > > > +		err = ext2fs_get_mem(entry->e_value_size, &x->value);
> > > > +		if (err)
> > > > +			return err;
> > > > +		x->value_len = entry->e_value_size;
> > > > +		memcpy(x->value, value_start + entry->e_value_offs,
> > > > +		       entry->e_value_size);
> > > > +		x++;
> > > > +		entry = EXT2_EXT_ATTR_NEXT(entry);
> > > > +	}
> > > > +
> > > > +	return 0;
> > > > +}
> > > > +
> > > > +errcode_t ext2fs_xattrs_read(struct ext2_xattr_handle *handle)
> > > > +{
> > > > +	struct ext2_xattr *attrs = NULL, *x;
> > > > +	unsigned int attrs_len;
> > > > +	struct ext2_inode_large *inode;
> > > > +	struct ext2_ext_attr_header *header;
> > > > +	__u32 ea_inode_magic;
> > > > +	unsigned int storage_size;
> > > > +	void *start, *block_buf = NULL;
> > > > +	blk64_t blk;
> > > > +	int i;
> > > > +	errcode_t err;
> > > > +
> > > > +	if (!EXT2_HAS_COMPAT_FEATURE(handle->fs->super,
> > > > +				     EXT2_FEATURE_COMPAT_EXT_ATTR))
> > > > +		return 0;
> > > > +
> > > > +	i = EXT2_INODE_SIZE(handle->fs->super);
> > > > +	if (i < sizeof(*inode))
> > > > +		i = sizeof(*inode);
> > > > +	err = ext2fs_get_memzero(i, &inode);
> > > > +	if (err)
> > > > +		return err;
> > > > +
> > > > +	err = ext2fs_read_inode_full(handle->fs, handle->ino,
> > > > +				     (struct ext2_inode *)inode,
> > > > +				     EXT2_INODE_SIZE(handle->fs->super));
> > > > +	if (err)
> > > > +		goto out;
> > > > +
> > > > +	/* Does the inode have size for EA? */
> > > > +	if (EXT2_INODE_SIZE(handle->fs->super) <= EXT2_GOOD_OLD_INODE_SIZE +
> > > > +						  inode->i_extra_isize +
> > > > +						  sizeof(__u32))
> > > > +		goto read_ea_block;
> > > > +
> > > > +	/* Look for EA in the inode */
> > > > +	memcpy(&ea_inode_magic, ((char *) inode) + EXT2_GOOD_OLD_INODE_SIZE +
> > > > +	       inode->i_extra_isize, sizeof(__u32));
> > > > +	if (ea_inode_magic == EXT2_EXT_ATTR_MAGIC) {
> > > > +		storage_size = EXT2_INODE_SIZE(handle->fs->super) -
> > > > +			EXT2_GOOD_OLD_INODE_SIZE - inode->i_extra_isize -
> > > > +			sizeof(__u32);
> > > > +		start = ((char *) inode) + EXT2_GOOD_OLD_INODE_SIZE +
> > > > +			inode->i_extra_isize + sizeof(__u32);
> > > > +
> > > > +		err = read_xattrs_from_buffer(handle, start, storage_size,
> > > > +					      start);
> > > > +		if (err)
> > > > +			goto out;
> > > > +	}
> > > > +
> > > > +read_ea_block:
> > > > +	/* Look for EA in a separate EA block */
> > > > +	blk = ext2fs_file_acl_block(handle->fs, (struct ext2_inode *)inode);
> > > > +	if (blk != 0) {
> > > > +		if ((blk < handle->fs->super->s_first_data_block) ||
> > > > +		    (blk >= ext2fs_blocks_count(handle->fs->super))) {
> > > > +			err = EXT2_ET_BAD_EA_BLOCK_NUM;
> > > > +			goto out;
> > > > +		}
> > > > +
> > > > +		err = ext2fs_get_mem(handle->fs->blocksize, &block_buf);
> > > > +		if (err)
> > > > +			goto out;
> > > > +
> > > > +		err = ext2fs_read_ext_attr3(handle->fs, blk, block_buf,
> > > > +					    handle->ino);
> > > > +		if (err)
> > > > +			goto out3;
> > > > +
> > > > +		header = (struct ext2_ext_attr_header *) block_buf;
> > > > +		if (header->h_magic != EXT2_EXT_ATTR_MAGIC) {
> > > > +			err = EXT2_ET_BAD_EA_HEADER;
> > > > +			goto out3;
> > > > +		}
> > > > +
> > > > +		if (header->h_blocks != 1) {
> > > > +			err = EXT2_ET_BAD_EA_HEADER;
> > > > +			goto out3;
> > > > +		}
> > > > +
> > > > +		/* Read EAs */
> > > > +		storage_size = handle->fs->blocksize -
> > > > +			sizeof(struct ext2_ext_attr_header);
> > > > +		start = block_buf + sizeof(struct ext2_ext_attr_header);
> > > > +		err = read_xattrs_from_buffer(handle, start, storage_size,
> > > > +					      block_buf);
> > > > +		if (err)
> > > > +			goto out3;
> > > > +
> > > > +		ext2fs_free_mem(&block_buf);
> > > > +	}
> > > > +
> > > > +	ext2fs_free_mem(&block_buf);
> > > > +	ext2fs_free_mem(&inode);
> > > > +	return 0;
> > > > +
> > > > +out3:
> > > > +	ext2fs_free_mem(&block_buf);
> > > > +out:
> > > > +	ext2fs_free_mem(&inode);
> > > > +	return err;
> > > > +}
> > > > +
> > > > +#define XATTR_ABORT	1
> > > > +#define XATTR_CHANGED	2
> > > > +errcode_t ext2fs_xattrs_iterate(struct ext2_xattr_handle *h,
> > > > +				int (*func)(char *name, char *value,
> > > > +					    void *data),
> > > > +				void *data)
> > > > +{
> > > > +	struct ext2_xattr *x;
> > > > +	errcode_t err;
> > > > +	int ret;
> > > > +
> > > > +	for (x = h->attrs; x < h->attrs + h->length; x++) {
> > > > +		if (!x->name)
> > > > +			continue;
> > > > +
> > > > +		ret = func(x->name, x->value, data);
> > > > +		if (ret & XATTR_CHANGED)
> > > > +			h->dirty = 1;
> > > > +		if (ret & XATTR_ABORT)
> > > > +			return 0;
> > > > +	}
> > > > +
> > > > +	return 0;
> > > > +}
> > > > +
> > > > +errcode_t ext2fs_xattr_get(struct ext2_xattr_handle *h, const char *key,
> > > > +			   void **value, unsigned int *value_len)
> > > > +{
> > > > +	struct ext2_xattr *x;
> > > > +	void *val;
> > > > +	errcode_t err;
> > > > +
> > > > +	for (x = h->attrs; x < h->attrs + h->length; x++) {
> > > > +		if (!x->name)
> > > > +			continue;
> > > > +
> > > > +		if (strcmp(x->name, key) == 0) {
> > > > +			err = ext2fs_get_mem(x->value_len, &val);
> > > > +			if (err)
> > > > +				return err;
> > > > +			memcpy(val, x->value, x->value_len);
> > > > +			*value = val;
> > > > +			*value_len = x->value_len;
> > > > +			return 0;
> > > > +		}
> > > > +	}
> > > > +
> > > > +	return EXT2_ET_EA_KEY_NOT_FOUND;
> > > > +}
> > > > +
> > > > +errcode_t ext2fs_xattr_set(struct ext2_xattr_handle *handle,
> > > > +			   const char *key,
> > > > +			   const void *value,
> > > > +			   unsigned int value_len)
> > > > +{
> > > > +	struct ext2_xattr *x, *last_empty;
> > > > +	char *new_value;
> > > > +	errcode_t err;
> > > > +
> > > > +	last_empty = NULL;
> > > > +	for (x = handle->attrs; x < handle->attrs + handle->length; x++) {
> > > > +		if (!x->name) {
> > > > +			last_empty = x;
> > > > +			continue;
> > > > +		}
> > > > +
> > > > +		/* Replace xattr */
> > > > +		if (strcmp(x->name, key) == 0) {
> > > > +			err = ext2fs_get_mem(value_len, &new_value);
> > > > +			if (err)
> > > > +				return err;
> > > > +			memcpy(new_value, value, value_len);
> > > > +			ext2fs_free_mem(&x->value);
> > > > +			x->value = new_value;
> > > > +			x->value_len = value_len;
> > > > +			handle->dirty = 1;
> > > > +			return 0;
> > > > +		}
> > > > +	}
> > > > +
> > > > +	/* Add attr to empty slot */
> > > > +	if (last_empty) {
> > > > +		err = ext2fs_get_mem(strlen(key) + 1, &last_empty->name);
> > > > +		if (err)
> > > > +			return err;
> > > > +		strcpy(last_empty->name, key);
> > > > +
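> > > > +		err = ext2fs_get_mem(value_len, &last_empty->value);
> > > > +		if (err) {
> > > > +			/* Don't leave a named slot with no value behind */
> > > > +			ext2fs_free_mem(&last_empty->name);
> > > > +			return err;
> > > > +		}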
> > > > +		memcpy(last_empty->value, value, value_len);
> > > > +		last_empty->value_len = value_len;
> > > > +		handle->dirty = 1;
> > > > +		return 0;
> > > > +	}
> > > > +
> > > > +	/* Expand array, append slot */
> > > > +	err = ext2fs_xattrs_expand(handle, 4);
> > > > +	if (err)
> > > > +		return err;
> > > > +
> > > > +	x = handle->attrs + handle->length - 4;
> > > > +	err = ext2fs_get_mem(strlen(key) + 1, &x->name);
> > > > +	if (err)
> > > > +		return err;
> > > > +	strcpy(x->name, key);
> > > > +
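> > > > +	err = ext2fs_get_mem(value_len, &x->value);
> > > > +	if (err) {
> > > > +		/* Don't leave a named slot with no value behind */
> > > > +		ext2fs_free_mem(&x->name);
> > > > +		return err;
> > > > +	}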
> > > > +	memcpy(x->value, value, value_len);
> > > > +	x->value_len = value_len;
> > > > +	handle->dirty = 1;
> > > > +	return 0;
> > > > +}
> > > > +
> > > > +errcode_t ext2fs_xattr_remove(struct ext2_xattr_handle *handle,
> > > > +			      const char *key)
> > > > +{
> > > > +	struct ext2_xattr *x;
> > > > +	errcode_t err;
> > > > +
> > > > +	for (x = handle->attrs; x < handle->attrs + handle->length; x++) {
> > > > +		if (!x->name)
> > > > +			continue;
> > > > +
> > > > +		if (strcmp(x->name, key) == 0) {
> > > > +			ext2fs_free_mem(&x->name);
> > > > +			ext2fs_free_mem(&x->value);
> > > > +			x->value_len = 0;
> > > > +			handle->dirty = 1;
> > > > +			return 0;
> > > > +		}
> > > > +	}
> > > > +
> > > > +	return EXT2_ET_EA_KEY_NOT_FOUND;
> > > > +}
> > > > +
> > > > +errcode_t ext2fs_xattrs_open(ext2_filsys fs, ext2_ino_t ino,
> > > > +			     struct ext2_xattr_handle **handle)
> > > > +{
> > > > +	struct ext2_xattr_handle *h;
> > > > +	errcode_t err;
> > > > +
> > > > +	err = ext2fs_get_memzero(sizeof(*h), &h);
> > > > +	if (err)
> > > > +		return err;
> > > > +
> > > > +	h->length = 4;
> > > > +	err = ext2fs_get_arrayzero(h->length, sizeof(struct ext2_xattr),
> > > > +				   &h->attrs);
> > > > +	if (err) {
> > > > +		ext2fs_free_mem(&h);
> > > > +		return err;
> > > > +	}
> > > > +	h->ino = ino;
> > > > +	h->fs = fs;
> > > > +	*handle = h;
> > > > +	return 0;
> > > > +}
> > > > +
> > > > +errcode_t ext2fs_xattrs_close(struct ext2_xattr_handle **handle)
> > > > +{
> > > > +	unsigned int i;
> > > > +	struct ext2_xattr_handle *h = *handle;
> > > > +	struct ext2_xattr *a = h->attrs;
> > > > +	errcode_t err;
> > > > +
> > > > +	if (h->dirty) {
> > > > +		err = ext2fs_xattrs_write(h);
> > > > +		if (err)
> > > > +			return err;
> > > > +	}
> > > > +
> > > > +	for (i = 0; i < h->length; i++) {
> > > > +		if (a[i].name)
> > > > +			ext2fs_free_mem(&a[i].name);
> > > > +		if (a[i].value)
> > > > +			ext2fs_free_mem(&a[i].value);
> > > > +	}
> > > > +
> > > > +	ext2fs_free_mem(&h->attrs);
> > > > +	ext2fs_free_mem(handle);
> > > > +	return 0;
> > > > +}
> > > > 
> > > > --
> > > > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> > > > the body of a message to majordomo@vger.kernel.org
> > > > More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 23/25] libext2fs: support modifying arbitrary extended attributes
  2013-11-27  3:13         ` Darrick J. Wong
@ 2013-11-27 11:36           ` Zheng Liu
  0 siblings, 0 replies; 73+ messages in thread
From: Zheng Liu @ 2013-11-27 11:36 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: tytso, linux-ext4

On Tue, Nov 26, 2013 at 07:13:18PM -0800, Darrick J. Wong wrote:
> On Wed, Nov 27, 2013 at 10:52:32AM +0800, Zheng Liu wrote:
> > On Tue, Nov 26, 2013 at 11:55:47AM -0800, Darrick J. Wong wrote:
> > > On Tue, Nov 26, 2013 at 03:21:16PM +0800, Zheng Liu wrote:
> > > > On Thu, Oct 17, 2013 at 09:51:34PM -0700, Darrick J. Wong wrote:
> > > > > Add functions to allow clients to get, set, and remove extended
> > > > > attributes from any file.  It also supports modifying EAs living in
> > > > > i_file_acl.
> > > > > 
> > > > > v2: Put the header declarations in the correct part of ext2fs.h,
> > > > > provide a function to release an EA block from an inode, and check
> > > > > i_extra_isize to make sure we actually have space for in-inode EAs.
> > > > 
> > > > Is this the latest version?  I am working on an inline data patch set
> > > > for e2fsprogs, and I want to use these APIs to manipulate the EAs.  So
> > > > it would be great if you could point out which one is the latest
> > > > version.  Thanks in advance.  Otherwise, some nits below.
> > > 
> > > Oh!  I was just about to start working on pulling your patches into my monster
> > > patchset. :)
> > 
> > Wow!  Sorry for being late.  If you are just beginning to work on the
> > inline data patchset, would you mind sending your latest monster patchset
> > without my inline data patchset first?  That gives me a chance to take a
> > closer look at it.  In any case, I will send my patch set asap.
> 
> Don't worry about the timing.  I've been busy with a lot of other things.
> Every time I think I'm done and can start on inline_data, I find another weird
> test case that breaks things, so I have to go back and figure out what went
> wrong.
> 
> The patchbomb lives here: https://djwong.org/docs/e2fsprogs-patches/ 
> 
> Patch 22 is the end of my last (Oct. 2013) patchbomb.  I think the relevant
> ones you want are #20 and #24-32.  You can skip #26-27 if you don't care about
> fuse2fs.

Got it.  Thanks for your help. :)

                                                - Zheng

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 23/25] libext2fs: support modifying arbitrary extended attributes
  2013-11-27  1:56     ` Darrick J. Wong
@ 2013-11-29  5:30       ` Zheng Liu
  2013-11-29  8:17         ` Jan Kara
  0 siblings, 1 reply; 73+ messages in thread
From: Zheng Liu @ 2013-11-29  5:30 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: tytso, linux-ext4

On Tue, Nov 26, 2013 at 05:56:17PM -0800, Darrick J. Wong wrote:
[...]
> > > +static struct ea_name_index ea_names[] = {
> > > +	{1, "user."},
> > > +	{2, "system.posix_acl_access"},
> > > +	{3, "system.posix_acl_default"},
> > > +	{4, "trusted."},
> > > +	{6, "security."},
> > > +	{7, "system."},
> > 
> > It seems that we also have a _RICHACL name here.
> 
> Yes.  Do you know what it's used for?  EXT4_XATTR_INDEX_RICHACL isn't used as
> of 3.13-rc1.

Sorry, I only just looked at this mail.  If I remember correctly, this flag
was added by Jan Kara because he wanted to reserve it for rich ACLs, which
are implemented outside the upstream kernel tree.

Thanks,
                                                - Zheng

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 23/25] libext2fs: support modifying arbitrary extended attributes
  2013-11-29  5:30       ` Zheng Liu
@ 2013-11-29  8:17         ` Jan Kara
  2013-11-30 20:24           ` Darrick J. Wong
  0 siblings, 1 reply; 73+ messages in thread
From: Jan Kara @ 2013-11-29  8:17 UTC (permalink / raw)
  To: Zheng Liu; +Cc: Darrick J. Wong, tytso, linux-ext4

On Fri 29-11-13 13:30:13, Zheng Liu wrote:
> On Tue, Nov 26, 2013 at 05:56:17PM -0800, Darrick J. Wong wrote:
> [...]
> > > > +static struct ea_name_index ea_names[] = {
> > > > +	{1, "user."},
> > > > +	{2, "system.posix_acl_access"},
> > > > +	{3, "system.posix_acl_default"},
> > > > +	{4, "trusted."},
> > > > +	{6, "security."},
> > > > +	{7, "system."},
> > > 
> > > It seems that we also have a _RICHACL name here.
> > 
> > Yes.  Do you know what it's used for?  EXT4_XATTR_INDEX_RICHACL isn't used as
> > of 3.13-rc1.
> 
> Sorry, I only just looked at this mail.  If I remember correctly, this flag
> was added by Jan Kara because he wanted to reserve it for rich ACLs, which
> are implemented outside the upstream kernel tree.
  Yes, SUSE kernels use EXT4_XATTR_INDEX_RICHACL to store Samba acls.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 23/25] libext2fs: support modifying arbitrary extended attributes
  2013-11-29  8:17         ` Jan Kara
@ 2013-11-30 20:24           ` Darrick J. Wong
  2013-12-02  8:38             ` Jan Kara
  0 siblings, 1 reply; 73+ messages in thread
From: Darrick J. Wong @ 2013-11-30 20:24 UTC (permalink / raw)
  To: Jan Kara; +Cc: Zheng Liu, tytso, linux-ext4

On Fri, Nov 29, 2013 at 09:17:07AM +0100, Jan Kara wrote:
> On Fri 29-11-13 13:30:13, Zheng Liu wrote:
> > On Tue, Nov 26, 2013 at 05:56:17PM -0800, Darrick J. Wong wrote:
> > [...]
> > > > > +static struct ea_name_index ea_names[] = {
> > > > > +	{1, "user."},
> > > > > +	{2, "system.posix_acl_access"},
> > > > > +	{3, "system.posix_acl_default"},
> > > > > +	{4, "trusted."},
> > > > > +	{6, "security."},
> > > > > +	{7, "system."},
> > > > 
> > > > It seems that we also have a _RICHACL name here.
> > > 
> > > Yes.  Do you know what it's used for?  EXT4_XATTR_INDEX_RICHACL isn't used as
> > > of 3.13-rc1.
> > 
> > Sorry, I only just looked at this mail.  If I remember correctly, this flag
> > was added by Jan Kara because he wanted to reserve it for rich ACLs, which
> > are implemented outside the upstream kernel tree.
>   Yes, SUSE kernels use EXT4_XATTR_INDEX_RICHACL to store Samba acls.

Just to confirm, the prefix is 'system.richacl'?

Also, does anyone know if the value 5 maps to anything?

/me adds the attribute name index map to the wiki page.

--D
> 
> 								Honza
> -- 
> Jan Kara <jack@suse.cz>
> SUSE Labs, CR

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 23/25] libext2fs: support modifying arbitrary extended attributes
  2013-11-30 20:24           ` Darrick J. Wong
@ 2013-12-02  8:38             ` Jan Kara
  0 siblings, 0 replies; 73+ messages in thread
From: Jan Kara @ 2013-12-02  8:38 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Jan Kara, Zheng Liu, tytso, linux-ext4

On Sat 30-11-13 12:24:36, Darrick J. Wong wrote:
> On Fri, Nov 29, 2013 at 09:17:07AM +0100, Jan Kara wrote:
> > On Fri 29-11-13 13:30:13, Zheng Liu wrote:
> > > On Tue, Nov 26, 2013 at 05:56:17PM -0800, Darrick J. Wong wrote:
> > > [...]
> > > > > > +static struct ea_name_index ea_names[] = {
> > > > > > +	{1, "user."},
> > > > > > +	{2, "system.posix_acl_access"},
> > > > > > +	{3, "system.posix_acl_default"},
> > > > > > +	{4, "trusted."},
> > > > > > +	{6, "security."},
> > > > > > +	{7, "system."},
> > > > > 
> > > > > It seems that we also have a _RICHACL name here.
> > > > 
> > > > Yes.  Do you know what it's used for?  EXT4_XATTR_INDEX_RICHACL isn't used as
> > > > of 3.13-rc1.
> > > 
> > > Sorry, I only just looked at this mail.  If I remember correctly, this flag
> > > was added by Jan Kara because he wanted to reserve it for rich ACLs, which
> > > are implemented outside the upstream kernel tree.
> >   Yes, SUSE kernels use EXT4_XATTR_INDEX_RICHACL to store Samba acls.
> 
> Just to confirm, the prefix is 'system.richacl'?
  Yes.

> Also, does anyone know if the value 5 maps to anything?
  I'm not aware of anything.

> /me adds the attribute name index map to the wiki page.
  Thanks!
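
For reference, the name index map discussed above can be sketched as a small
lookup table.  This is only an illustration, not the code from the patch: the
helper find_ea_index() is a hypothetical name, and the numeric index 8 for
"system.richacl" is an assumption (the thread confirms the prefix but never
states the number).  Index 5 is left unmapped, as nobody in the thread knew of
a use for it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct ea_name_index {
	int index;
	const char *name;
};

/*
 * Longer "system.*" names must come before the bare "system." prefix,
 * because the lookup below returns the first entry that matches.
 */
static const struct ea_name_index ea_names[] = {
	{1, "user."},
	{2, "system.posix_acl_access"},
	{3, "system.posix_acl_default"},
	{4, "trusted."},
	{6, "security."},
	{8, "system.richacl"},	/* ASSUMPTION: index not given in the thread */
	{7, "system."},
	{0, NULL},
};

/*
 * Hypothetical helper: map a full attribute name to its index.
 * On a match, *suffix points at the part of the name after the prefix.
 * Returns 0 (and leaves *suffix at the full name) if no prefix matches.
 */
static int find_ea_index(const char *fullname, const char **suffix)
{
	const struct ea_name_index *e;

	assert(fullname != NULL && suffix != NULL);

	for (e = ea_names; e->name; e++) {
		size_t len = strlen(e->name);

		if (strncmp(fullname, e->name, len) == 0) {
			*suffix = fullname + len;
			return e->index;
		}
	}

	*suffix = fullname;
	return 0;
}
```

With this table, "user.foo" would be stored as index 1 plus the suffix "foo",
while "system.posix_acl_access" would be stored as index 2 with an empty
suffix.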

								Honza

-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 05/25] libext2fs: don't overflow when punching indirect blocks with large blocks
  2013-10-24  0:08   ` Theodore Ts'o
@ 2013-12-04  4:40     ` Darrick J. Wong
  0 siblings, 0 replies; 73+ messages in thread
From: Darrick J. Wong @ 2013-12-04  4:40 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: linux-ext4

On Wed, Oct 23, 2013 at 08:08:34PM -0400, Theodore Ts'o wrote:
> On Thu, Oct 17, 2013 at 09:49:28PM -0700, Darrick J. Wong wrote:
> > On a FS with a rather large blocksize (> 4K), the old block map
> > structure can construct a fat enough "tree" (or whatever we call that
> > lopsided thing) that (at least in theory) one could create mappings
> > for logical blocks higher than 32 bits.  In practice this doesn't
> > happen, but the 'max' and 'iter' variables that the punch helpers use
> > will overflow because the BLOCK_SIZE_BITS shifts are too large to fit
> > a 32-bit variable.  This causes punch to fail on TIND-mapped blocks
> > even if the file is < 16T.  So enlarge the fields to fit.
> 
> Hmm.... this brings up the question of whether we should support
> inodes that have indirect block maps that result in mappings for
> logical blocks > 32-bits.  There is probably a lot of code that
> assumes that the logical block number is 32-bits that will break
> horribly.

I'm not sure.  The way I noticed this brokenness was by creating a FS with 64k
blocks, sparse-writing a range of blocks at lblk 268451854 (to force it to
create a tind map) and then trying to punch it.  The file itself had a size of
just under 16T.  e2fsck seemed fine with the file, and as you can see the lblk
number was nowhere close to 2^32.

I think the problem is that the punch code is using two variables max and incr
as upper limits on how many blocks it should try to punch for a given level.
Since the variables aren't wide enough, they overflow (effectively becoming
zero) and then things like (offset + incr(0) <= start) become true and so it
quits early.
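
To make the arithmetic concrete, here is a minimal sketch of the overflow.  It
is not the actual punch code; span64(), span32() and tind_first_lblk() are
hypothetical helpers, and the only assumptions are 4-byte block pointers and
12 direct blocks in the old block map.

```c
#include <assert.h>
#include <stdint.h>

#define NDIR_BLOCKS	12	/* direct block pointers in the old block map */

/*
 * Number of logical blocks covered by one pointer at the given level
 * (0 = data block, 1 = IND, 2 = DIND, 3 = TIND), computed in 64 bits.
 */
static uint64_t span64(unsigned int blocksize_bits, int level)
{
	/* Each block pointer is 4 bytes, so a block holds 2^(bits - 2). */
	unsigned int addr_bits = blocksize_bits - 2;

	assert(level >= 0 && level <= 3);
	return 1ULL << (addr_bits * level);
}

/* The same value squeezed into a 32-bit variable. */
static uint32_t span32(unsigned int blocksize_bits, int level)
{
	return (uint32_t)span64(blocksize_bits, level);
}

/* First logical block that has to be mapped through the TIND block. */
static uint64_t tind_first_lblk(unsigned int blocksize_bits)
{
	return NDIR_BLOCKS + span64(blocksize_bits, 1) +
	       span64(blocksize_bits, 2);
}
```

With 64k blocks (blocksize_bits = 16) each block holds 2^14 pointers, so the
TIND level spans 2^42 blocks; span32(16, 3) truncates that to zero, while
tind_first_lblk(16) is 12 + 2^14 + 2^28 = 268451852, just below the lblk
268451854 used in the test.  With 4k blocks the TIND span is only 2^30, which
still fits in 32 bits.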

---

If I use fuse2fs to create a non-extent file that exceeds 2^32 blocks (and
blocksize > 4k), fsck doesn't complain.

If the blocksize is 4k or less, the kernel refuses to write the file, but
fuse2fs creates a garbled filesystem (with enormous i_size but no blocks
mapped) and fsck complains.  Hmm, I'll look into that.

--D

> 
> So this brings up a couple of different questions.
> 
> #1) Does e2fsck notice, and does it complain if it trips against one
> of these?
> 
> #2) What should e2fsprogs do when it comes across one of these inodes?
> It may be that simply returning an error is enough, once we notice
> that it has blocks larger than this.  Would it be cleaner and more
> efficient for the punch code to simply make sure that it stops before
> the logical block number overflows?  64-bit variables have a cost,
> especially on 32-bit machines.
> 
> 					- Ted

^ permalink raw reply	[flat|nested] 73+ messages in thread

end of thread, other threads:[~2013-12-04  4:40 UTC | newest]

Thread overview: 73+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-10-18  4:48 [PATCH v2 00/25] e2fsprogs patchbomb 10/2013 Darrick J. Wong
2013-10-18  4:49 ` [PATCH 01/25] libext2fs: stop iterating dirents when done linking Darrick J. Wong
2013-10-23 23:39   ` Theodore Ts'o
2013-10-18  4:49 ` [PATCH 02/25] libext2fs: fix ext2fs_open2() truncation of the superblock parameter Darrick J. Wong
2013-10-18 18:32   ` Darrick J. Wong
2013-10-23 14:49     ` Lukáš Czerner
2013-10-18  4:49 ` [PATCH 03/25] mke2fs: don't let resize= turn on resize_inode when meta_bg is set Darrick J. Wong
2013-10-23 15:08   ` Lukáš Czerner
2013-10-23 23:40   ` Theodore Ts'o
2013-10-18  4:49 ` [PATCH 04/25] libext2fs: reject 64bit badblocks numbers Darrick J. Wong
2013-10-23 15:24   ` Lukáš Czerner
2013-10-23 23:58     ` Theodore Ts'o
2013-10-24 11:40       ` Lukáš Czerner
2013-10-18  4:49 ` [PATCH 05/25] libext2fs: don't overflow when punching indirect blocks with large blocks Darrick J. Wong
2013-10-24  0:08   ` Theodore Ts'o
2013-12-04  4:40     ` Darrick J. Wong
2013-10-18  4:49 ` [PATCH 06/25] libext2fs: fix tests that set LARGE_FILE Darrick J. Wong
2013-11-25  7:09   ` Zheng Liu
2013-11-25 17:57     ` Darrick J. Wong
2013-10-18  4:49 ` [PATCH 07/25] mke2fs: load configfile blocksize setting before 64bit checks Darrick J. Wong
2013-11-25  8:01   ` Zheng Liu
2013-10-18  4:49 ` [PATCH 08/25] debugfs: fix various minor bogosity Darrick J. Wong
2013-11-25  8:08   ` Zheng Liu
2013-11-25 18:05     ` Darrick J. Wong
2013-10-18  4:49 ` [PATCH 09/25] e2fsck: teach EA refcounting code to handle 64bit block addresses Darrick J. Wong
2013-10-18 18:37   ` Darrick J. Wong
2013-11-25  8:18     ` Zheng Liu
2013-10-18  4:50 ` [PATCH 10/25] debugfs: handle 64bit block numbers Darrick J. Wong
2013-10-18 18:47   ` Darrick J. Wong
2013-11-25  8:33   ` Zheng Liu
2013-11-25 17:49     ` Darrick J. Wong
2013-10-18  4:50 ` [PATCH 11/25] libext2fs: only punch complete clusters Darrick J. Wong
2013-10-18 18:55   ` Darrick J. Wong
2013-11-25  8:51   ` Zheng Liu
2013-10-18  4:50 ` [PATCH 12/25] libext2fs: don't update the summary counts when doing implied cluster allocation Darrick J. Wong
2013-11-25  9:03   ` Zheng Liu
2013-10-18  4:50 ` [PATCH 13/25] libext2fs: use ext2fs_punch() to truncate quota file Darrick J. Wong
2013-11-25  9:08   ` Zheng Liu
2013-10-18  4:50 ` [PATCH 14/25] e2fsck: only release clusters when shortening a directory during a rehash Darrick J. Wong
2013-11-25 11:09   ` Zheng Liu
2013-10-18  4:50 ` [PATCH 15/25] e2fsck: print cluster ranges when encountering bitmap errors Darrick J. Wong
2013-11-25 11:56   ` Zheng Liu
2013-10-18  4:50 ` [PATCH 16/25] resize2fs: convert fs to and from 64bit mode Darrick J. Wong
2013-10-18 18:59   ` Darrick J. Wong
2013-11-26  6:44   ` Zheng Liu
2013-11-26 18:39     ` Darrick J. Wong
2013-11-27  2:21       ` Zheng Liu
2013-10-18  4:50 ` [PATCH 17/25] resize2fs: when toggling 64bit, don't free in-use bg data clusters Darrick J. Wong
2013-10-18  4:50 ` [PATCH 18/25] resize2fs: adjust reserved_gdt_blocks when changing group descriptor size Darrick J. Wong
2013-10-18  4:51 ` [PATCH 19/25] resize2fs: during shrink, don't free in-use bg data clusters Darrick J. Wong
2013-10-18  4:51 ` [PATCH 20/25] resize2fs: don't free in-use clusters when moving blocks Darrick J. Wong
2013-10-18  4:51 ` [PATCH 21/25] misc: use the checksum predicate function, not raw flag tests Darrick J. Wong
2013-10-18  4:51 ` [PATCH 22/25] resize2fs: rewrite extent/dir/ea block checksums when migrating Darrick J. Wong
2013-10-18  4:51 ` [PATCH 23/25] libext2fs: support modifying arbitrary extended attributes Darrick J. Wong
2013-10-18 19:25   ` Darrick J. Wong
2013-10-22  1:13   ` Darrick J. Wong
2013-11-26  7:21   ` Zheng Liu
2013-11-26 19:55     ` Darrick J. Wong
2013-11-27  2:52       ` Zheng Liu
2013-11-27  3:13         ` Darrick J. Wong
2013-11-27 11:36           ` Zheng Liu
2013-11-27  1:56     ` Darrick J. Wong
2013-11-29  5:30       ` Zheng Liu
2013-11-29  8:17         ` Jan Kara
2013-11-30 20:24           ` Darrick J. Wong
2013-12-02  8:38             ` Jan Kara
2013-10-18  4:51 ` [PATCH 24/25] misc: add fuse2fs, a FUSE server for e2fsprogs Darrick J. Wong
2013-10-18 19:36   ` Darrick J. Wong
2013-10-22  1:20   ` Darrick J. Wong
2013-10-18 13:13 ` [PATCH v2 00/25] e2fsprogs patchbomb 10/2013 Lukáš Czerner
2013-10-18 18:13   ` Darrick J. Wong
2013-10-18 20:37     ` Darrick J. Wong
2013-10-18 18:39 ` Theodore Ts'o
