All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/9] some fixes for bugs spotted by valgrind
@ 2011-05-30 21:18 Sergei Trofimovich
  2011-05-30 21:19 ` [PATCH 1/9] btrfs progs: fix extra metadata chunk allocation in --mixed case Sergei Trofimovich
                   ` (9 more replies)
  0 siblings, 10 replies; 14+ messages in thread
From: Sergei Trofimovich @ 2011-05-30 21:18 UTC (permalink / raw)
  To: Chris Mason; +Cc: linux-btrfs, Sergei Trofimovich

Hello friends!

tmp branch recently got very nice feature: 'mkfs.btrfs -r /some/directory'.

It's very useful, when you need to creare minimal root: sh and fs_mark.

But there is another hidden feature! As '-r' can create whole filesystem
we can effectively valgrind a lot of code paths in btrfs and pick bugs.

This patch series is mostly (with one exception) dumb obvous holes plugs
(sometimes they are backports from kernel).

Patchset based on 'tmp' branch e6bd18d8938986c997c45f0ea95b221d4edec095.

First off the exception:
In order to make --mixed produce proper filesystems with meta+data only
blocks (and not meta+data/data ones, which confused space_cache and led
to an oops for me) I ask to consider for pulling Arne's patch:
> Subject: [PATCH 1/9] btrfs progs: fix extra metadata chunk allocation in --mixed case

The rest of patches should be obvoius. They don't fix all the fair valgrind
compaints, but reduce them severely.


Poking at valgrind warnings I have noticed very worrying problem.
When we (over)write superblock we take 4096 continuous bytes in memory.
In kernel the structures reside in btrfs_fs_info structure, so we compute
CRC for:
    struct btrfs_super_block super_copy;
    struct btrfs_super_block super_for_commit;
and then write it to disk. Nere we have 2 issues:
1. kernel pointers and other random stuff leaks out to kernel.
   It's nondeterministic and leaks out data (not too bad,
   as it should be accessible only for root, but still)
2. more serious: is there guarantee, that noone will kick-in
   between CRC computation and superblock outwrite?

   What if some of mutexes, semaphores or lists will change
   it's internal state? Some async thread will kick it
   an we will end-up writing superblock with invalid CRC!
   This might well be the cause of recend superblock
   corruptions under heavy load + hangup retorted to the list.

Consider the following call chain:
[somewhere in write_dev_supers ...]

                                                                                                                                                bh->b_end_io = btrfs_end_buffer_write_sync;                        crc = ~(u32)0;
                        crc = btrfs_csum_data(NULL, (char *)sb +
                                              BTRFS_CSUM_SIZE, crc,
                                              BTRFS_SUPER_INFO_SIZE -
                                              BTRFS_CSUM_SIZE);
                        btrfs_csum_final(crc, sb->csum);

                        /*
                         * !!!!!!!!!!!!
                         * something kicks-in and changes fs_info lying right after sb
                         * !!!!!!!!!!!!
                         */

                        bh = __getblk(device->bdev, bytenr / 4096,
                                      BTRFS_SUPER_INFO_SIZE);

                        /*
                         * !!!!!!!!!!!!
                         * and we write superblock with incorrect checksum
                         * !!!!!!!!!!!!
                         */
                        memcpy(bh->b_data, sb, BTRFS_SUPER_INFO_SIZE);

                        /* one reference for submit_bh */
                        get_bh(bh);

                        set_buffer_uptodate(bh);
                        lock_buffer(bh);
                        bh->b_end_io = btrfs_end_buffer_write_sync;

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/9] btrfs progs: fix extra metadata chunk allocation in --mixed case
  2011-05-30 21:18 [PATCH 0/9] some fixes for bugs spotted by valgrind Sergei Trofimovich
@ 2011-05-30 21:19 ` Sergei Trofimovich
  2011-05-30 21:19 ` [PATCH 2/9] btrfs-convert: fix typo: 'all inode' -> 'all inodes' Sergei Trofimovich
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 14+ messages in thread
From: Sergei Trofimovich @ 2011-05-30 21:19 UTC (permalink / raw)
  To: Chris Mason; +Cc: linux-btrfs, Arne Jansen

From: Arne Jansen <sensille@gmx.net>

When creating a mixed fs with mkfs, an extra metadata chunk got allocated.
This is because btrfs_reserve_extent calls do_chunk_alloc for METADATA,
which in turn wasn't able to find the proper space_info, as __find_space_info
did a hard compare of the flags. It is now sufficient for the space_info to
include the proper flag. This reflects the change done to the kernel code
to support mixed chunks.
Also for a subsequent chunk allocation (which should not be hit in the mkfs
case), the chunk is now created with the flags from the space_info instead
of the requested flags. A better solution would be to pull the full changeset
for the mixed case from the kernel into the user mode (or, even better, share
the code)

The additional chunk probably confused block_rsv calculation, which in turn
led to severeal ENOSPC Oopses.

Signed-off-by: Arne Jansen <sensille@gmx.net>
---
 extent-tree.c |    7 ++++---
 1 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/extent-tree.c b/extent-tree.c
index b2f9bb2..c6c77c6 100644
--- a/extent-tree.c
+++ b/extent-tree.c
@@ -1735,7 +1735,7 @@ static struct btrfs_space_info *__find_space_info(struct btrfs_fs_info *info,
 	struct btrfs_space_info *found;
 	list_for_each(cur, head) {
 		found = list_entry(cur, struct btrfs_space_info, list);
-		if (found->flags == flags)
+		if (found->flags & flags)
 			return found;
 	}
 	return NULL;
@@ -1812,7 +1812,8 @@ static int do_chunk_alloc(struct btrfs_trans_handle *trans,
 	    thresh)
 		return 0;
 
-	ret = btrfs_alloc_chunk(trans, extent_root, &start, &num_bytes, flags);
+	ret = btrfs_alloc_chunk(trans, extent_root, &start, &num_bytes,
+	                        space_info->flags);
 	if (ret == -ENOSPC) {
 		space_info->full = 1;
 		return 0;
@@ -1820,7 +1821,7 @@ static int do_chunk_alloc(struct btrfs_trans_handle *trans,
 
 	BUG_ON(ret);
 
-	ret = btrfs_make_block_group(trans, extent_root, 0, flags,
+	ret = btrfs_make_block_group(trans, extent_root, 0, space_info->flags,
 		     BTRFS_FIRST_CHUNK_TREE_OBJECTID, start, num_bytes);
 	BUG_ON(ret);
 	return 0;
-- 
1.7.3.4


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 2/9] btrfs-convert: fix typo: 'all inode' -> 'all inodes'
  2011-05-30 21:18 [PATCH 0/9] some fixes for bugs spotted by valgrind Sergei Trofimovich
  2011-05-30 21:19 ` [PATCH 1/9] btrfs progs: fix extra metadata chunk allocation in --mixed case Sergei Trofimovich
@ 2011-05-30 21:19 ` Sergei Trofimovich
  2011-05-30 21:19 ` [PATCH 3/9] mkfs.btrfs: fail on scandir error (-r mode) Sergei Trofimovich
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 14+ messages in thread
From: Sergei Trofimovich @ 2011-05-30 21:19 UTC (permalink / raw)
  To: Chris Mason; +Cc: linux-btrfs, Sergei Trofimovich

Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
---
 convert.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/convert.c b/convert.c
index fbcf4a3..291dc27 100644
--- a/convert.c
+++ b/convert.c
@@ -1120,7 +1120,7 @@ fail:
 	return ret;
 }
 /*
- * scan ext2's inode bitmap and copy all used inode.
+ * scan ext2's inode bitmap and copy all used inodes.
  */
 static int copy_inodes(struct btrfs_root *root, ext2_filsys ext2_fs,
 		       int datacsum, int packing, int noxattr)
-- 
1.7.3.4


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 3/9] mkfs.btrfs: fail on scandir error (-r mode)
  2011-05-30 21:18 [PATCH 0/9] some fixes for bugs spotted by valgrind Sergei Trofimovich
  2011-05-30 21:19 ` [PATCH 1/9] btrfs progs: fix extra metadata chunk allocation in --mixed case Sergei Trofimovich
  2011-05-30 21:19 ` [PATCH 2/9] btrfs-convert: fix typo: 'all inode' -> 'all inodes' Sergei Trofimovich
@ 2011-05-30 21:19 ` Sergei Trofimovich
  2011-05-30 21:19 ` [PATCH 4/9] mkfs.btrfs: return some defined value instead of garbage when lookup checksum Sergei Trofimovich
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 14+ messages in thread
From: Sergei Trofimovich @ 2011-05-30 21:19 UTC (permalink / raw)
  To: Chris Mason; +Cc: linux-btrfs, Sergei Trofimovich

mkfs.btrfs does not handle relative pathnames for now. When
they are passed to it it creates empty image. So first time
I thought it does not work at all.

This patch adds error handling for scandir(). With patch it behaves
this way:

    $ mkfs.btrfs -r ./root
    ...
    fs created label (null) on output.img
            nodesize 4096 leafsize 4096 sectorsize 4096 size 256.00MB
    Btrfs v0.19-52-g438c5ff-dirty
    scandir for ./root failed: No such file or directory
    unable to traverse_directory
    Making image is aborted.
    mkfs.btrfs: mkfs.c:1402: main: Assertion `!(ret)' failed.

Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
---
 mkfs.c |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/mkfs.c b/mkfs.c
index 57c88f9..9d7b792 100644
--- a/mkfs.c
+++ b/mkfs.c
@@ -895,6 +895,12 @@ static int traverse_directory(struct btrfs_trans_handle *trans,
 
 		count = scandir(parent_dir_entry->path, &files,
 				directory_select, NULL);
+		if (count == -1)
+		{
+			fprintf(stderr, "scandir for %s failed: %s\n",
+				parent_dir_name, strerror (errno));
+			goto fail;
+		}
 
 		for (i = 0; i < count; i++) {
 			cur_file = files[i];
-- 
1.7.3.4


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 4/9] mkfs.btrfs: return some defined value instead of garbage when lookup checksum
  2011-05-30 21:18 [PATCH 0/9] some fixes for bugs spotted by valgrind Sergei Trofimovich
                   ` (2 preceding siblings ...)
  2011-05-30 21:19 ` [PATCH 3/9] mkfs.btrfs: fail on scandir error (-r mode) Sergei Trofimovich
@ 2011-05-30 21:19 ` Sergei Trofimovich
  2011-05-30 21:19 ` [PATCH 5/9] mkfs.btrfs: fix symlink names writing Sergei Trofimovich
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 14+ messages in thread
From: Sergei Trofimovich @ 2011-05-30 21:19 UTC (permalink / raw)
  To: Chris Mason; +Cc: linux-btrfs, Sergei Trofimovich

==31873== Command: ./mkfs.btrfs -r /some/root/
==31873== Parent PID: 31872
==31873==
==31873== Conditional jump or move depends on uninitialised value(s)
==31873==    at 0x42C3D0: add_file_items (mkfs.c:792)
==31873==    by 0x42CAB3: traverse_directory (mkfs.c:948)
==31873==    by 0x42CF11: make_image (mkfs.c:1047)
==31873==    by 0x42DE53: main (mkfs.c:1401)
==31873==  Uninitialised value was created by a stack allocation
==31873==    at 0x41B1B1: btrfs_csum_file_block (file-item.c:195)

'ret' value was not initialized for 'found' branch.

The same fix sits in kernel:
> commit 639cb58675ce9b507eed9c3d6b3335488079b21a
> Author: Chris Mason <chris.mason@oracle.com>
> Date:   Thu Aug 28 06:15:25 2008 -0400
>
>     Btrfs: Fix variable init during csum creation
>
>     Signed-off-by: Chris Mason <chris.mason@oracle.com>

Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
---
 file-item.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/file-item.c b/file-item.c
index 9732282..47f6ad2 100644
--- a/file-item.c
+++ b/file-item.c
@@ -218,6 +218,7 @@ int btrfs_csum_file_block(struct btrfs_trans_handle *trans,
 	item = btrfs_lookup_csum(trans, root, path, bytenr, 1);
 	if (!IS_ERR(item)) {
 		leaf = path->nodes[0];
+		ret = 0;
 		goto found;
 	}
 	ret = PTR_ERR(item);
-- 
1.7.3.4


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 5/9] mkfs.btrfs: fix symlink names writing
  2011-05-30 21:18 [PATCH 0/9] some fixes for bugs spotted by valgrind Sergei Trofimovich
                   ` (3 preceding siblings ...)
  2011-05-30 21:19 ` [PATCH 4/9] mkfs.btrfs: return some defined value instead of garbage when lookup checksum Sergei Trofimovich
@ 2011-05-30 21:19 ` Sergei Trofimovich
  2011-05-30 21:19 ` [PATCH 6/9] mkfs.btrfs: write zeroes instead on uninitialized data Sergei Trofimovich
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 14+ messages in thread
From: Sergei Trofimovich @ 2011-05-30 21:19 UTC (permalink / raw)
  To: Chris Mason; +Cc: linux-btrfs, Sergei Trofimovich

Found by valgrind:
==8968== Use of uninitialised value of size 8
==8968==    at 0x41CE7D: crc32c_le (crc32c.c:98)
==8968==    by 0x40A1D0: csum_tree_block_size (disk-io.c:82)
==8968==    by 0x40A2D4: csum_tree_block (disk-io.c:105)
==8968==    by 0x40A7D6: write_tree_block (disk-io.c:241)
==8968==    by 0x40ACEE: __commit_transaction (disk-io.c:354)
==8968==    by 0x40AE9E: btrfs_commit_transaction (disk-io.c:385)
==8968==    by 0x42CF66: make_image (mkfs.c:1061)
==8968==    by 0x42DE63: main (mkfs.c:1410)
==8968==  Uninitialised value was created by a stack allocation
==8968==    at 0x42B5FB: add_inode_items (mkfs.c:493)

readlink(2) does not write '\0' for us, so make it manually.

Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
---
 mkfs.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/mkfs.c b/mkfs.c
index 9d7b792..8ff2b1e 100644
--- a/mkfs.c
+++ b/mkfs.c
@@ -709,11 +709,13 @@ static int add_symbolic_link(struct btrfs_trans_handle *trans,
 		fprintf(stderr, "readlink failed for %s\n", path_name);
 		goto fail;
 	}
-	if (ret > sectorsize) {
+	if (ret >= sectorsize) {
 		fprintf(stderr, "symlink too long for %s", path_name);
 		ret = -1;
 		goto fail;
 	}
+
+	buf[ret] = '\0'; /* readlink does not do it for us */
 	ret = btrfs_insert_inline_extent(trans, root, objectid, 0,
 					 buf, ret + 1);
 fail:
-- 
1.7.3.4


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 6/9] mkfs.btrfs: write zeroes instead on uninitialized data.
  2011-05-30 21:18 [PATCH 0/9] some fixes for bugs spotted by valgrind Sergei Trofimovich
                   ` (4 preceding siblings ...)
  2011-05-30 21:19 ` [PATCH 5/9] mkfs.btrfs: fix symlink names writing Sergei Trofimovich
@ 2011-05-30 21:19 ` Sergei Trofimovich
  2011-05-30 21:19 ` [PATCH 7/9] mkfs.btrfs: free buffers allocated by pretty_sizes Sergei Trofimovich
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 14+ messages in thread
From: Sergei Trofimovich @ 2011-05-30 21:19 UTC (permalink / raw)
  To: Chris Mason; +Cc: linux-btrfs, Sergei Trofimovich

Found by valgrind:
==8968== Use of uninitialised value of size 8
==8968==    at 0x41CE7D: crc32c_le (crc32c.c:98)
==8968==    by 0x40A1D0: csum_tree_block_size (disk-io.c:82)
==8968==    by 0x40A2D4: csum_tree_block (disk-io.c:105)
==8968==    by 0x40A7D6: write_tree_block (disk-io.c:241)
==8968==    by 0x40ACEE: __commit_transaction (disk-io.c:354)
==8968==    by 0x40AE9E: btrfs_commit_transaction (disk-io.c:385)
==8968==    by 0x42CF66: make_image (mkfs.c:1061)
==8968==    by 0x42DE63: main (mkfs.c:1410)
==8968==  Uninitialised value was created by a stack allocation
==8968==    at 0x42B5FB: add_inode_items (mkfs.c:493)

1. On-disk inode format has reserved (and thus, random at alloc time) fields:
   btrfs_inode_item: __le64 reserved[4]
2. Sometimes extents are created on disk without writing data there.
   (Or at least not all data is written there). Kernel code always had
   it kzalloc'ed.
Zero them all.

Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
---
 extent_io.c |    1 +
 mkfs.c      |    7 +++++++
 2 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/extent_io.c b/extent_io.c
index 069c199..a93d4d6 100644
--- a/extent_io.c
+++ b/extent_io.c
@@ -572,6 +572,7 @@ static struct extent_buffer *__alloc_extent_buffer(struct extent_io_tree *tree,
 		BUG();
 		return NULL;
 	}
+	memset (eb, 0, sizeof(struct extent_buffer) + blocksize);
 
 	eb->start = bytenr;
 	eb->len = blocksize;
diff --git a/mkfs.c b/mkfs.c
index 8ff2b1e..32f25f5 100644
--- a/mkfs.c
+++ b/mkfs.c
@@ -411,6 +411,13 @@ static int fill_inode_item(struct btrfs_trans_handle *trans,
 	u64 blocks = 0;
 	u64 sectorsize = root->sectorsize;
 
+	/*
+	 * btrfs_inode_item has some reserved fields
+	 * and represents on-disk inode entry, so
+	 * zero everything to prevent information leak
+	 */
+	memset (dst, 0, sizeof (*dst));
+
 	btrfs_set_stack_inode_generation(dst, trans->transid);
 	btrfs_set_stack_inode_size(dst, src->st_size);
 	btrfs_set_stack_inode_nbytes(dst, 0);
-- 
1.7.3.4


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 7/9] mkfs.btrfs: free buffers allocated by pretty_sizes
  2011-05-30 21:18 [PATCH 0/9] some fixes for bugs spotted by valgrind Sergei Trofimovich
                   ` (5 preceding siblings ...)
  2011-05-30 21:19 ` [PATCH 6/9] mkfs.btrfs: write zeroes instead on uninitialized data Sergei Trofimovich
@ 2011-05-30 21:19 ` Sergei Trofimovich
  2011-05-30 21:19 ` [PATCH 8/9] mkfs.btrfs: fix memory leak caused by 'scandir()' calls Sergei Trofimovich
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 14+ messages in thread
From: Sergei Trofimovich @ 2011-05-30 21:19 UTC (permalink / raw)
  To: Chris Mason; +Cc: linux-btrfs, Sergei Trofimovich

found by valgrind:
==2559== 16 bytes in 1 blocks are definitely lost in loss record 3 of 19
==2559==    at 0x4C2720E: malloc (vg_replace_malloc.c:236)
==2559==    by 0x412F7E: pretty_sizes (utils.c:1054)
==2559==    by 0x4179E9: main (mkfs.c:1395)

Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
---
 mkfs.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/mkfs.c b/mkfs.c
index 32f25f5..c8b19c1 100644
--- a/mkfs.c
+++ b/mkfs.c
@@ -1176,6 +1176,7 @@ int main(int ac, char **av)
 	char *output = "output.img";
 	u64 num_of_meta_chunks = 0;
 	u64 size_of_data = 0;
+	char * pretty_buf;
 
 	while(1) {
 		int c;
@@ -1395,7 +1396,8 @@ raid_groups:
 	printf("fs created label %s on %s\n\tnodesize %u leafsize %u "
 	    "sectorsize %u size %s\n",
 	    label, first_file, nodesize, leafsize, sectorsize,
-	    pretty_sizes(btrfs_super_total_bytes(&root->fs_info->super_copy)));
+	    pretty_buf = pretty_sizes(btrfs_super_total_bytes(&root->fs_info->super_copy)));
+	free (pretty_buf);
 
 	printf("%s\n", BTRFS_BUILD_VERSION);
 	btrfs_commit_transaction(trans, root);
-- 
1.7.3.4


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 8/9] mkfs.btrfs: fix memory leak caused by 'scandir()' calls
  2011-05-30 21:18 [PATCH 0/9] some fixes for bugs spotted by valgrind Sergei Trofimovich
                   ` (6 preceding siblings ...)
  2011-05-30 21:19 ` [PATCH 7/9] mkfs.btrfs: free buffers allocated by pretty_sizes Sergei Trofimovich
@ 2011-05-30 21:19 ` Sergei Trofimovich
  2011-05-30 21:19 ` [PATCH 9/9] mkfs.btrfs: fix error text in '-r' mode Sergei Trofimovich
  2011-05-31 19:10 ` [PATCH 0/9] some fixes for bugs spotted by valgrind Sergei Trofimovich
  9 siblings, 0 replies; 14+ messages in thread
From: Sergei Trofimovich @ 2011-05-30 21:19 UTC (permalink / raw)
  To: Chris Mason; +Cc: linux-btrfs, Sergei Trofimovich

Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
---
 mkfs.c |   13 +++++++++++++
 1 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/mkfs.c b/mkfs.c
index c8b19c1..73c898b 100644
--- a/mkfs.c
+++ b/mkfs.c
@@ -468,6 +468,15 @@ static int directory_select(const struct direct *entry)
 		return 1;
 }
 
+static void free_namelist(struct direct **files, int count)
+{
+	int i;
+
+	for (i = 0; i < count; ++i)
+		free(files[i]);
+	free (files);
+}
+
 static u64 calculate_dir_inode_size(char *dirname)
 {
 	int count, i;
@@ -481,6 +490,8 @@ static u64 calculate_dir_inode_size(char *dirname)
 		dir_inode_size += strlen(cur_file->d_name);
 	}
 
+	free_namelist(files, count);
+
 	dir_inode_size *= 2;
 	return dir_inode_size;
 }
@@ -971,6 +982,7 @@ static int traverse_directory(struct btrfs_trans_handle *trans,
 			}
 		}
 
+		free_namelist(files, count);
 		free(parent_dir_entry->path);
 		free(parent_dir_entry);
 
@@ -980,6 +992,7 @@ static int traverse_directory(struct btrfs_trans_handle *trans,
 
 	return 0;
 fail:
+	free_namelist(files, count);
 	free(parent_dir_entry->path);
 	free(parent_dir_entry);
 	return -1;
-- 
1.7.3.4


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 9/9] mkfs.btrfs: fix error text in '-r' mode
  2011-05-30 21:18 [PATCH 0/9] some fixes for bugs spotted by valgrind Sergei Trofimovich
                   ` (7 preceding siblings ...)
  2011-05-30 21:19 ` [PATCH 8/9] mkfs.btrfs: fix memory leak caused by 'scandir()' calls Sergei Trofimovich
@ 2011-05-30 21:19 ` Sergei Trofimovich
  2011-05-31 19:10 ` [PATCH 0/9] some fixes for bugs spotted by valgrind Sergei Trofimovich
  9 siblings, 0 replies; 14+ messages in thread
From: Sergei Trofimovich @ 2011-05-30 21:19 UTC (permalink / raw)
  To: Chris Mason; +Cc: linux-btrfs, Sergei Trofimovich

Smart gcc noticed use of uninitialized warning when compiled
with -O0 flags:

    mkfs.c:1291: error: 'file' may be used uninitialized in this function

Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
---
 mkfs.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/mkfs.c b/mkfs.c
index 73c898b..ef0407f 100644
--- a/mkfs.c
+++ b/mkfs.c
@@ -1286,13 +1286,13 @@ int main(int ac, char **av)
 			block_count = dev_block_count;
 	} else {
 		ac = 0;
+		file = output;
 		fd = open_target(output);
 		if (fd < 0) {
 			fprintf(stderr, "unable to open the %s\n", file);
 			exit(1);
 		}
 
-		file = output;
 		first_fd = fd;
 		first_file = file;
 		block_count = size_sourcedir(source_dir, sectorsize,
-- 
1.7.3.4


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/9] some fixes for bugs spotted by valgrind
  2011-05-30 21:18 [PATCH 0/9] some fixes for bugs spotted by valgrind Sergei Trofimovich
                   ` (8 preceding siblings ...)
  2011-05-30 21:19 ` [PATCH 9/9] mkfs.btrfs: fix error text in '-r' mode Sergei Trofimovich
@ 2011-05-31 19:10 ` Sergei Trofimovich
  2011-06-02 20:17   ` Andi Kleen
  9 siblings, 1 reply; 14+ messages in thread
From: Sergei Trofimovich @ 2011-05-31 19:10 UTC (permalink / raw)
  To: Chris Mason; +Cc: Sergei Trofimovich, linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 4153 bytes --]

Some clarifications:

> Patchset based on 'tmp' branch e6bd18d8938986c997c45f0ea95b221d4edec095.
All patches are against btrfs-progs.

====

The rest of rambling is about kernel code, which handles supers.
I have read what I've wrote last night (braindump of insane!)
and will try to elaborate a bit:

> Poking at valgrind warnings I have noticed very worrying problem.
> When we (over)write superblock we take 4096 continuous bytes in memory.

The common pattern in disk-io.c is the following:

    struct btrfs_super_block *sb = root->fs_info->sb // or ->sb_for_commit
    memcpy(dest, sb, BTRFS_SUPER_INFO_SIZE /* 4096 */);

With a memcpy we go out of sizeof(*sb) (~2.5K) and pick more fields from fs_info struct.
Let's look at them:
struct btrfs_fs_info {
...
        struct btrfs_super_block super_copy;
        struct btrfs_super_block super_for_commit;
        struct block_device *__bdev;
        struct super_block *sb;
        struct inode *btree_inode;
        struct backing_dev_info bdi;
        struct mutex trans_mutex;
        struct mutex tree_log_mutex;
        struct mutex transaction_kthread_mutex;
        struct mutex cleaner_mutex;
        struct mutex chunk_mutex;
        struct mutex volume_mutex;
        struct mutex ordered_operations_mutex;
        struct rw_semaphore extent_commit_sem;

        struct rw_semaphore cleanup_work_sem;

        struct rw_semaphore subvol_sem;
        struct srcu_struct subvol_srcu;

        struct list_head trans_list;
        struct list_head hashers;
        struct list_head dead_roots;
        struct list_head caching_block_groups;

        spinlock_t delayed_iput_lock;
        struct list_head delayed_iputs;

        atomic_t nr_async_submits;
...
So we copyout (and even checksum!) atomic counters and other volatile stuff
(I haven't looked at mutexes and semaphores, but i'm sure there is yet some
volatile stuff) 

> In kernel the structures reside in btrfs_fs_info structure, so we compute
> CRC for:
>     struct btrfs_super_block super_copy;
>     struct btrfs_super_block super_for_commit;
> and then write it to disk. [H]ere we have 2 issues:
> 1. kernel pointers and other random stuff leaks out to kernel.
>    It's nondeterministic and leaks out data (not too bad,
>    as it should be accessible only for root, but still)
> 2. more serious: is there guarantee, that noone will kick-in
>    between CRC computation and superblock outwrite?
> 
>    What if some of mutexes, semaphores or lists will change
>    it's internal state? Some async thread will kick it
>    an we will end-up writing superblock with invalid CRC!
>    This might well be the cause of recend superblock
>    corruptions under heavy load + hangup retorted to the list.
> 
> Consider the following call chain:
> [somewhere in write_dev_supers ...]
> 
>                         bh->b_end_io = btrfs_end_buffer_write_sync;
>                         crc = ~(u32)0;
>                         crc = btrfs_csum_data(NULL, (char *)sb +
>                                               BTRFS_CSUM_SIZE, crc,
>                                               BTRFS_SUPER_INFO_SIZE -
>                                               BTRFS_CSUM_SIZE);
>                         btrfs_csum_final(crc, sb->csum);

Now the problem should be a bit more clear: sb is a thing pointing to the middle
of fs_info. We checksum it with data after it in fs_info.

>                         bh = __getblk(device->bdev, bytenr / 4096,
>                                       BTRFS_SUPER_INFO_SIZE);
> 
>                         memcpy(bh->b_data, sb, BTRFS_SUPER_INFO_SIZE);

and here we write all the checksummed contents. Is there guard, which
prevents things from updating fs_info?

> 
>                         /* one reference for submit_bh */
>                         get_bh(bh);
> 
>                         set_buffer_uptodate(bh);
>                         lock_buffer(bh);
>                         bh->b_end_io = btrfs_end_buffer_write_sync;

Am I too paranoid about the issue?

Thanks!

-- 

  Sergei

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/9] some fixes for bugs spotted by valgrind
  2011-05-31 19:10 ` [PATCH 0/9] some fixes for bugs spotted by valgrind Sergei Trofimovich
@ 2011-06-02 20:17   ` Andi Kleen
  2011-06-02 21:13     ` Sergei Trofimovich
  0 siblings, 1 reply; 14+ messages in thread
From: Andi Kleen @ 2011-06-02 20:17 UTC (permalink / raw)
  To: Sergei Trofimovich; +Cc: Chris Mason, linux-btrfs

Sergei Trofimovich <slyfox@gentoo.org> writes:
>
> Am I too paranoid about the issue?

It sounds weird, because if the kernel would really checksum
mutexes on disk you would have a lot of on disk
format incompatibility between different kernel versions
(e.g. between lockdep and normal kernels or kernels
running on different architectures)

If it would really happen (no opinion on that) it would
be a serious bug.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/9] some fixes for bugs spotted by valgrind
  2011-06-02 20:17   ` Andi Kleen
@ 2011-06-02 21:13     ` Sergei Trofimovich
  2011-06-02 23:26       ` Andi Kleen
  0 siblings, 1 reply; 14+ messages in thread
From: Sergei Trofimovich @ 2011-06-02 21:13 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Chris Mason, linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 1317 bytes --]

On Thu, 02 Jun 2011 13:17:55 -0700
Andi Kleen <andi@firstfloor.org> wrote:

> Sergei Trofimovich <slyfox@gentoo.org> writes:
> >
> > Am I too paranoid about the issue?
> 
> It sounds weird, because if the kernel would really checksum
> mutexes on disk you would have a lot of on disk
> format incompatibility between different kernel versions
> (e.g. between lockdep and normal kernels or kernels
> running on different architectures)
>
> If it would really happen (no opinion on that) it would
> be a serious bug.

Oh, I don't think things are so bad.

In order it to be a problem superblock loading would have to
be loaded exactly the same way as it's stored, but it isn't.
At least super copies (baked into btrfs_fs_info) are read
to separate data block (buffer_hear) and then copied properly
(in open_ctree) to super_copy/super_for_commit:

        bh = btrfs_read_dev_super(fs_devices->latest_bdev);
        if (!bh) {
                err = -EINVAL;
                goto fail_alloc;
        }

        memcpy(&fs_info->super_copy, bh->b_data, sizeof(fs_info->super_copy));
        memcpy(&fs_info->super_for_commit, &fs_info->super_copy,
               sizeof(fs_info->super_for_commit));
        brelse(bh);

But the way superblocks are written look racy.

-- 

  Sergei

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/9] some fixes for bugs spotted by valgrind
  2011-06-02 21:13     ` Sergei Trofimovich
@ 2011-06-02 23:26       ` Andi Kleen
  0 siblings, 0 replies; 14+ messages in thread
From: Andi Kleen @ 2011-06-02 23:26 UTC (permalink / raw)
  To: Sergei Trofimovich; +Cc: Andi Kleen, Chris Mason, linux-btrfs

>         bh = btrfs_read_dev_super(fs_devices->latest_bdev);
>         if (!bh) {
>                 err = -EINVAL;
>                 goto fail_alloc;
>         }
> 
>         memcpy(&fs_info->super_copy, bh->b_data, sizeof(fs_info->super_copy));
>         memcpy(&fs_info->super_for_commit, &fs_info->super_copy,
>                sizeof(fs_info->super_for_commit));
>         brelse(bh);
> 
> But the way superblocks are written look racy.

FWIW for my ext4 checksumming work I had to add an extra lock to stabilize 
the super block (and the inodes) during checksumming.

Maybe something similar is needed here too.

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2011-06-02 23:26 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-05-30 21:18 [PATCH 0/9] some fixes for bugs spotted by valgrind Sergei Trofimovich
2011-05-30 21:19 ` [PATCH 1/9] btrfs progs: fix extra metadata chunk allocation in --mixed case Sergei Trofimovich
2011-05-30 21:19 ` [PATCH 2/9] btrfs-convert: fix typo: 'all inode' -> 'all inodes' Sergei Trofimovich
2011-05-30 21:19 ` [PATCH 3/9] mkfs.btrfs: fail on scandir error (-r mode) Sergei Trofimovich
2011-05-30 21:19 ` [PATCH 4/9] mkfs.btrfs: return some defined value instead of garbage when lookup checksum Sergei Trofimovich
2011-05-30 21:19 ` [PATCH 5/9] mkfs.btrfs: fix symlink names writing Sergei Trofimovich
2011-05-30 21:19 ` [PATCH 6/9] mkfs.btrfs: write zeroes instead on uninitialized data Sergei Trofimovich
2011-05-30 21:19 ` [PATCH 7/9] mkfs.btrfs: free buffers allocated by pretty_sizes Sergei Trofimovich
2011-05-30 21:19 ` [PATCH 8/9] mkfs.btrfs: fix memory leak caused by 'scandir()' calls Sergei Trofimovich
2011-05-30 21:19 ` [PATCH 9/9] mkfs.btrfs: fix error text in '-r' mode Sergei Trofimovich
2011-05-31 19:10 ` [PATCH 0/9] some fixes for bugs spotted by valgrind Sergei Trofimovich
2011-06-02 20:17   ` Andi Kleen
2011-06-02 21:13     ` Sergei Trofimovich
2011-06-02 23:26       ` Andi Kleen

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.