[Cluster-devel] [gfs2-utils PATCH 1/7] fsck.gfs2: Fix reference to uninitialized variable

All of lore.kernel.org
 help / color / mirror / Atom feed

* [Cluster-devel] [gfs2-utils PATCH 1/7] fsck.gfs2: Fix reference to uninitialized variable
@ 2013-07-16 12:56 Bob Peterson
  2013-07-16 12:56 ` [Cluster-devel] [gfs2-utils PATCH 2/7] fsck.gfs2: fix some log messages Bob Peterson
                   ` (6 more replies)
  0 siblings, 7 replies; 18+ messages in thread
From: Bob Peterson @ 2013-07-16 12:56 UTC (permalink / raw)
  To: cluster-devel.redhat.com

This patch initializes a variable so that it no longer references
it uninitialized.

rhbz#984085
---
 gfs2/fsck/initialize.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gfs2/fsck/initialize.c b/gfs2/fsck/initialize.c
index b01b240..936fd5e 100644
--- a/gfs2/fsck/initialize.c
+++ b/gfs2/fsck/initialize.c
@@ -832,7 +832,7 @@ static int get_lockproto_table(struct gfs2_sbd *sdp)
 {
 	FILE *fp;
 	char line[PATH_MAX];
-	char *cluname, *end;
+	char *cluname, *end = NULL;
 	const char *fsname, *cfgfile = "/etc/cluster/cluster.conf";
 
 	memset(sdp->lockproto, 0, sizeof(sdp->lockproto));
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Cluster-devel] [gfs2-utils PATCH 2/7] fsck.gfs2: fix some log messages
  2013-07-16 12:56 [Cluster-devel] [gfs2-utils PATCH 1/7] fsck.gfs2: Fix reference to uninitialized variable Bob Peterson
@ 2013-07-16 12:56 ` Bob Peterson
  2013-07-16 12:56 ` [Cluster-devel] [gfs2-utils PATCH 3/7] fsck.gfs2: Fix directory link on relocated directory dirents Bob Peterson
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 18+ messages in thread
From: Bob Peterson @ 2013-07-16 12:56 UTC (permalink / raw)
  To: cluster-devel.redhat.com

This patch fixes two messages: First, there's a place where it was
supposed to print a value in hexadecimal, but it printed in decimal.
Second, there's a place where it failed to print a newline.

rhbz#984085
---
 gfs2/fsck/pass2.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gfs2/fsck/pass2.c b/gfs2/fsck/pass2.c
index 3d0bb49..fb15eae 100644
--- a/gfs2/fsck/pass2.c
+++ b/gfs2/fsck/pass2.c
@@ -1423,7 +1423,7 @@ static int check_hash_tbl(struct gfs2_inode *ip, uint64_t *tbl,
 				continue;
 
 			log_err(_("Dinode %llu (0x%llx) has a hash table "
-				  "inconsistency at index %d (0x%d) for %d\n"),
+				  "inconsistency at index %d (0x%x) for %d\n"),
 				(unsigned long long)ip->i_di.di_num.no_addr,
 				(unsigned long long)ip->i_di.di_num.no_addr,
 				i, i, len);
@@ -1442,7 +1442,7 @@ static int check_hash_tbl(struct gfs2_inode *ip, uint64_t *tbl,
 			proper_len = 1 << (ip->i_di.di_depth - leaf.lf_depth);
 			if (proper_len != len) {
 				log_debug(_("Length 0x%x is not proper for "
-					    "leaf %llu (0x%llx): 0x%x"),
+					    "leaf %llu (0x%llx): 0x%x\n"),
 					  len, (unsigned long long)leafblk,
 					  (unsigned long long)leafblk,
 					  proper_len);
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Cluster-devel] [gfs2-utils PATCH 3/7] fsck.gfs2: Fix directory link on relocated directory dirents
  2013-07-16 12:56 [Cluster-devel] [gfs2-utils PATCH 1/7] fsck.gfs2: Fix reference to uninitialized variable Bob Peterson
  2013-07-16 12:56 ` [Cluster-devel] [gfs2-utils PATCH 2/7] fsck.gfs2: fix some log messages Bob Peterson
@ 2013-07-16 12:56 ` Bob Peterson
  2013-07-16 12:56 ` [Cluster-devel] [gfs2-utils PATCH 4/7] fsck.gfs2: Fix infinite loop in pass1b caused by duplicates in hash table Bob Peterson
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 18+ messages in thread
From: Bob Peterson @ 2013-07-16 12:56 UTC (permalink / raw)
  To: cluster-devel.redhat.com

There was a problem whereby fsck.gfs2 would discover a directory entry
that's on the wrong leaf block (the dirent's hash value doesn't fit
the acceptable range for that leaf block). In that case, fsck would
relocate the dirent to an appropriate leaf block. The problem is, if
that dirent was for a directory, the directory link count was adjusted
appropriately, but it was not properly added to the directory linkage
due to the fact that it took an error path in order to delete the
dirent that was relocated. This resulted in a problem when pass3 went
to verify the directory linkage. Despite the intact directory linkage,
it flagged the directory as not linked, then it improperly added it
to lost+found. Subsequent fsck.gfs2 would discover the double-linkage
problems.

rhbz#984085
---
 gfs2/fsck/pass2.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/gfs2/fsck/pass2.c b/gfs2/fsck/pass2.c
index fb15eae..dabfc13 100644
--- a/gfs2/fsck/pass2.c
+++ b/gfs2/fsck/pass2.c
@@ -391,7 +391,18 @@ static int wrong_leaf(struct gfs2_inode *ip, struct gfs2_inum *entry,
 		   leaf, but that leaf has already been processed. So we have
 		   to nuke the dent from this leaf when we return, but we
 		   still need to do the "good dent" accounting. */
-		error = incr_link_count(*entry, ip, _("valid reference"));
+		if (de->de_type == (sdp->gfs1 ? GFS_FILE_DIR : DT_DIR)) {
+			error = set_parent_dir(sdp, de->de_inum,
+					       ip->i_di.di_num);
+			if (error > 0)
+				/* This is a bit of a kludge, but returning 0
+				   in this case causes the caller to go through
+				   function set_parent_dir a second time and
+				   deal properly with the hard link. */
+				return 0;
+		}
+		error = incr_link_count(*entry, ip,
+					_("moved valid reference"));
 		if (error > 0 &&
 		    bad_formal_ino(ip, dent, *entry, tmp_name, q, de, bh) == 1)
 			return 1; /* nuke it */
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Cluster-devel] [gfs2-utils PATCH 4/7] fsck.gfs2: Fix infinite loop in pass1b caused by duplicates in hash table
  2013-07-16 12:56 [Cluster-devel] [gfs2-utils PATCH 1/7] fsck.gfs2: Fix reference to uninitialized variable Bob Peterson
  2013-07-16 12:56 ` [Cluster-devel] [gfs2-utils PATCH 2/7] fsck.gfs2: fix some log messages Bob Peterson
  2013-07-16 12:56 ` [Cluster-devel] [gfs2-utils PATCH 3/7] fsck.gfs2: Fix directory link on relocated directory dirents Bob Peterson
@ 2013-07-16 12:56 ` Bob Peterson
  2013-07-16 12:56 ` [Cluster-devel] [gfs2-utils PATCH 5/7] fsck.gfs2: don't check newly created lost+found in pass2 Bob Peterson
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 18+ messages in thread
From: Bob Peterson @ 2013-07-16 12:56 UTC (permalink / raw)
  To: cluster-devel.redhat.com

If directory leaf blocks appeared more than once in a directory hash
table, the duplicate references would be flagged as a duplicate,
but since the duplicates were all from the same directory dinode,
pass1b would not realize it needed to stop. This patch changes the
criteria for duplicate processing so that it stops when the references
are down to a single dinode, not a single reference. This allows
it to get beyond pass1b so that the problem can be fixed by the hash
table checking code in pass2.

rhbz#984085
---
 gfs2/fsck/pass1b.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gfs2/fsck/pass1b.c b/gfs2/fsck/pass1b.c
index 0dcb306..e3da20a 100644
--- a/gfs2/fsck/pass1b.c
+++ b/gfs2/fsck/pass1b.c
@@ -368,11 +368,11 @@ static int handle_dup_blk(struct gfs2_sbd *sdp, struct duptree *dt)
 	/* If there's still a last remaining reference, and it's a valid
 	   reference, use it to determine the correct block type for our
 	   blockmap and bitmap. */
-	if (dh.ref_count == 1 && !osi_list_empty(&dt->ref_inode_list)) {
+	if (dh.ref_inode_count == 1 && !osi_list_empty(&dt->ref_inode_list)) {
 		uint8_t q;
 
 		log_notice( _("Block %llu (0x%llx) has only one remaining "
-			      "valid reference.\n"),
+			      "valid inode referencing it.\n"),
 			    (unsigned long long)dup_blk,
 			    (unsigned long long)dup_blk);
 		/* If we're down to a single reference (and not all references
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Cluster-devel] [gfs2-utils PATCH 5/7] fsck.gfs2: don't check newly created lost+found in pass2
  2013-07-16 12:56 [Cluster-devel] [gfs2-utils PATCH 1/7] fsck.gfs2: Fix reference to uninitialized variable Bob Peterson
                   ` (2 preceding siblings ...)
  2013-07-16 12:56 ` [Cluster-devel] [gfs2-utils PATCH 4/7] fsck.gfs2: Fix infinite loop in pass1b caused by duplicates in hash table Bob Peterson
@ 2013-07-16 12:56 ` Bob Peterson
  2013-07-16 16:52   ` Andrew Price
  2013-07-16 12:56 ` [Cluster-devel] [gfs2-utils PATCH 6/7] fsck.gfs2: avoid negative number in leaf depth Bob Peterson
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 18+ messages in thread
From: Bob Peterson @ 2013-07-16 12:56 UTC (permalink / raw)
  To: cluster-devel.redhat.com

This patch checks whether lost+found was created during the run
and if so, it skips pass2 directory checking for lost+found.
The reason we need to do this is because when lost+found is created,
it sets the directory linkage count to 1. If pass2 checks the
directory, it will add it a second time because it checks it like
every other normal directory. Therefore, the linkage counts get off.
Plus, it's a waste of time to check it, since we just built it.

rhbz#984085
---
 gfs2/fsck/fsck.h         | 1 +
 gfs2/fsck/lost_n_found.c | 1 +
 gfs2/fsck/main.c         | 1 +
 gfs2/fsck/pass2.c        | 8 ++++++++
 4 files changed, 11 insertions(+)

diff --git a/gfs2/fsck/fsck.h b/gfs2/fsck/fsck.h
index 6d888af..8555e9e 100644
--- a/gfs2/fsck/fsck.h
+++ b/gfs2/fsck/fsck.h
@@ -131,6 +131,7 @@ struct gfs2_options {
 
 extern struct gfs2_options opts;
 extern struct gfs2_inode *lf_dip; /* Lost and found directory inode */
+extern int lf_was_created;
 extern struct gfs2_bmap *bl;
 extern uint64_t last_fs_block, last_reported_block;
 extern int64_t last_reported_fblock;
diff --git a/gfs2/fsck/lost_n_found.c b/gfs2/fsck/lost_n_found.c
index 3226ac9..919cb2f 100644
--- a/gfs2/fsck/lost_n_found.c
+++ b/gfs2/fsck/lost_n_found.c
@@ -132,6 +132,7 @@ void make_sure_lf_exists(struct gfs2_inode *ip)
 
 	q = block_type(lf_dip->i_di.di_num.no_addr);
 	if (q != gfs2_inode_dir) {
+		lf_was_created = 1;
 		/* This is a new lost+found directory, so set its block type
 		   and increment link counts for the directories */
 		/* FIXME: i'd feel better about this if fs_mkdir returned
diff --git a/gfs2/fsck/main.c b/gfs2/fsck/main.c
index 9c3b06d..287404d 100644
--- a/gfs2/fsck/main.c
+++ b/gfs2/fsck/main.c
@@ -22,6 +22,7 @@
 
 struct gfs2_options opts = {0};
 struct gfs2_inode *lf_dip = NULL; /* Lost and found directory inode */
+int lf_was_created = 0;
 struct gfs2_bmap *bl = NULL;
 uint64_t last_fs_block, last_reported_block = -1;
 int64_t last_reported_fblock = -1000000;
diff --git a/gfs2/fsck/pass2.c b/gfs2/fsck/pass2.c
index dabfc13..81fbe06 100644
--- a/gfs2/fsck/pass2.c
+++ b/gfs2/fsck/pass2.c
@@ -1683,6 +1683,14 @@ int pass2(struct gfs2_sbd *sdp)
 		if (q != gfs2_inode_dir)
 			continue;
 
+		/* If we created lost+found, its links should have been
+		   properly adjusted, so don't check it. */
+		if (lf_was_created &&
+		    (dirblk == lf_dip->i_di.di_num.no_addr)) {
+			log_debug("Pass2 skipping the new new lost+found.\n");
+			continue;
+		}
+
 		log_debug( _("Checking directory inode at block %llu (0x%llx)\n"),
 			  (unsigned long long)dirblk, (unsigned long long)dirblk);
 
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Cluster-devel] [gfs2-utils PATCH 6/7] fsck.gfs2: avoid negative number in leaf depth
  2013-07-16 12:56 [Cluster-devel] [gfs2-utils PATCH 1/7] fsck.gfs2: Fix reference to uninitialized variable Bob Peterson
                   ` (3 preceding siblings ...)
  2013-07-16 12:56 ` [Cluster-devel] [gfs2-utils PATCH 5/7] fsck.gfs2: don't check newly created lost+found in pass2 Bob Peterson
@ 2013-07-16 12:56 ` Bob Peterson
  2013-07-16 12:56 ` [Cluster-devel] [gfs2-utils PATCH 7/7] fsck.gfs2: Detect and fix duplicate references in hash tables Bob Peterson
  2013-07-17  5:30 ` [Cluster-devel] [gfs2-utils PATCH 1/7] fsck.gfs2: Fix reference to uninitialized variable Fabio M. Di Nitto
  6 siblings, 0 replies; 18+ messages in thread
From: Bob Peterson @ 2013-07-16 12:56 UTC (permalink / raw)
  To: cluster-devel.redhat.com

If a leaf block appears many times by mistake in the same hash table,
there's a change it belongs in a different directory's hash table, and
was put there by mistake. In that case, the leaf depth may be too
big for the directory in which it improperly appears. When that happens,
function check_leaf_depth can come up with a negative number when it
calculates the depth. This patch prevents the negative leaf depth error
by flagging an error when it determines the leaf depth is wrong.
The error causes pass2 to relocate the leaf to lost+found when it returns.

rhbz#984085
---
 gfs2/fsck/pass2.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gfs2/fsck/pass2.c b/gfs2/fsck/pass2.c
index 81fbe06..a40a4a7 100644
--- a/gfs2/fsck/pass2.c
+++ b/gfs2/fsck/pass2.c
@@ -276,6 +276,9 @@ static int check_leaf_depth(struct gfs2_inode *ip, uint64_t leaf_no,
 		factor++;
 		divisor >>= 1;
 	}
+	if (ip->i_di.di_depth < factor) /* can't be fixed--leaf must be on the
+					   wrong dinode. */
+		return -1;
 	correct_depth = ip->i_di.di_depth - factor;
 	if (cur_depth == correct_depth)
 		return 0;
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Cluster-devel] [gfs2-utils PATCH 7/7] fsck.gfs2: Detect and fix duplicate references in hash tables
  2013-07-16 12:56 [Cluster-devel] [gfs2-utils PATCH 1/7] fsck.gfs2: Fix reference to uninitialized variable Bob Peterson
                   ` (4 preceding siblings ...)
  2013-07-16 12:56 ` [Cluster-devel] [gfs2-utils PATCH 6/7] fsck.gfs2: avoid negative number in leaf depth Bob Peterson
@ 2013-07-16 12:56 ` Bob Peterson
  2013-07-16 16:52   ` Andrew Price
  2013-07-17  5:30 ` [Cluster-devel] [gfs2-utils PATCH 1/7] fsck.gfs2: Fix reference to uninitialized variable Fabio M. Di Nitto
  6 siblings, 1 reply; 18+ messages in thread
From: Bob Peterson @ 2013-07-16 12:56 UTC (permalink / raw)
  To: cluster-devel.redhat.com

This patch checks directory hash tables for multiple occurrences
of the same leaf block. When duplicates are encountered, it checks
to see which reference is appropriate for the hash table location
based on the leaf block's hash value. The appropriate reference is
kept and all duplicates are replaced by newly created leaf blocks.

rhbz#984085
---
 gfs2/fsck/pass2.c | 108 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 108 insertions(+)

diff --git a/gfs2/fsck/pass2.c b/gfs2/fsck/pass2.c
index a40a4a7..9f3e5c1 100644
--- a/gfs2/fsck/pass2.c
+++ b/gfs2/fsck/pass2.c
@@ -1278,6 +1278,110 @@ static int fix_hashtable(struct gfs2_inode *ip, uint64_t *tbl, unsigned hsize,
 	return changes;
 }
 
+/* check_hash_tbl_dups - check for the same leaf in multiple places */
+static int check_hash_tbl_dups(struct gfs2_inode *ip, uint64_t *tbl,
+			       unsigned hsize, int lindex, int len)
+{
+	int l, len2;
+	uint64_t leafblk, leaf_no;
+	struct gfs2_buffer_head *lbh;
+	struct gfs2_leaf leaf;
+	struct gfs2_dirent dentry, *de;
+	int hash_index; /* index into the hash table based on the hash */
+
+	leafblk = be64_to_cpu(tbl[lindex]);
+	for (l = 0; l < hsize; l++) {
+		if (l == lindex) { /* skip the valid reference */
+			l += len - 1;
+			continue;
+		}
+		if (be64_to_cpu(tbl[l]) != leafblk)
+			continue;
+
+		for (len2 = 0; l + len2 < hsize; len2++) {
+			if (l + len2 == lindex)
+				break;
+			if (be64_to_cpu(tbl[l + len2]) != leafblk)
+				break;
+		}
+		log_err(_("Dinode %llu (0x%llx) has duplicate leaf pointers "
+			  "to block %llu (0x%llx)@offsets %u (0x%x) "
+			  "(for 0x%x) and %u (0x%x) (for 0x%x)\n"),
+			(unsigned long long)ip->i_di.di_num.no_addr,
+			(unsigned long long)ip->i_di.di_num.no_addr,
+			(unsigned long long)leafblk,
+			(unsigned long long)leafblk, lindex, lindex, len,
+			l, l, len2);
+
+		/* See which set of references is valid: the one passed in
+		   or the duplicate we found. */
+		memset(&leaf, 0, sizeof(leaf));
+		leaf_no = leafblk;
+		if (!valid_block(ip->i_sbd, leaf_no)) /* Checked later */
+			continue;
+
+		lbh = bread(ip->i_sbd, leafblk);
+		if (gfs2_check_meta(lbh, GFS2_METATYPE_LF)) { /* Chked later */
+			brelse(lbh);
+			continue;
+		}
+
+		memset(&dentry, 0, sizeof(struct gfs2_dirent));
+		de = (struct gfs2_dirent *)(lbh->b_data +
+					    sizeof(struct gfs2_leaf));
+		gfs2_dirent_in(&dentry, (char *)de);
+		hash_index = hash_table_index(dentry.de_hash, ip);
+		brelse(lbh);
+		/* check the duplicate ref first */
+		if (hash_index < l ||  hash_index > l + len2) {
+			log_err(_("This leaf block has hash index %d, which "
+				  "is out of bounds for lindex (%d - %d)\n"),
+				hash_index, l, l + len2);
+			if (!query( _("Fix the hash table? (y/n) "))) {
+				log_err(_("Hash table not fixed.\n"));
+				brelse(lbh);
+				return 0;
+			}
+			/* Adjust the ondisk block count. The original value
+			   may have been correct without the duplicates but
+			   pass1 would have counted them and adjusted the
+			   count to include them. So we must subtract them. */
+			ip->i_di.di_blocks--;
+			bmodified(ip->i_bh);
+			pad_with_leafblks(ip, tbl, l, len2);
+		} else {
+			log_debug(_("Hash index 0x%x is the proper "
+				    "reference to leaf 0x%llx.\n"),
+				  l, (unsigned long long)leafblk);
+		}
+		/* Check the original ref: both references might be bad.
+		   If both were bad, just return and if we encounter it
+		   again, we'll treat it as new. If the original ref is not
+		   bad, keep looking for (and fixing) other instances. */
+		if (hash_index < lindex ||  hash_index > lindex + len) {
+			log_err(_("This leaf block has hash index %d, which "
+				  "is out of bounds for lindex (%d - %d).\n"),
+				hash_index, lindex, lindex + len);
+			if (!query( _("Fix the hash table? (y/n) "))) {
+				log_err(_("Hash table not fixed.\n"));
+				brelse(lbh);
+				return 0;
+			}
+			ip->i_di.di_blocks--;
+			bmodified(ip->i_bh);
+			pad_with_leafblks(ip, tbl, lindex, len);
+			/* At this point we know both copies are bad, so we
+			   return to start fresh */
+			return -EFAULT;
+		} else {
+			log_debug(_("Hash index 0x%x is the proper "
+				    "reference to leaf 0x%llx.\n"),
+				  lindex, (unsigned long long)leafblk);
+		}
+	}
+	return 0;
+}
+
 /* check_hash_tbl - check that the hash table is sane
  *
  * We've got to make sure the hash table is sane. Each leaf needs to
@@ -1372,6 +1476,10 @@ static int check_hash_tbl(struct gfs2_inode *ip, uint64_t *tbl,
 			lindex += proper_len;
 			continue;
 		}
+
+		if (check_hash_tbl_dups(ip, tbl, hsize, lindex, len))
+			continue;
+		
 		/* Make sure they call on proper leaf-split boundaries. This
 		   is the calculation used by the kernel, and dir_split_leaf */
 		proper_start = (lindex & ~(proper_len - 1));
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Cluster-devel] [gfs2-utils PATCH 5/7] fsck.gfs2: don't check newly created lost+found in pass2
  2013-07-16 12:56 ` [Cluster-devel] [gfs2-utils PATCH 5/7] fsck.gfs2: don't check newly created lost+found in pass2 Bob Peterson
@ 2013-07-16 16:52   ` Andrew Price
  2013-07-16 17:14     ` Bob Peterson
  0 siblings, 1 reply; 18+ messages in thread
From: Andrew Price @ 2013-07-16 16:52 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Hi Bob,

 > --- a/gfs2/fsck/pass2.c
> +++ b/gfs2/fsck/pass2.c
> @@ -1683,6 +1683,14 @@ int pass2(struct gfs2_sbd *sdp)
>   		if (q != gfs2_inode_dir)
>   			continue;
>
> +		/* If we created lost+found, its links should have been
> +		   properly adjusted, so don't check it. */
> +		if (lf_was_created &&
> +		    (dirblk == lf_dip->i_di.di_num.no_addr)) {
> +			log_debug("Pass2 skipping the new new lost+found.\n");

Spotted "new new" here but maybe it needs _() also.

Cheers,
Andy



^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Cluster-devel] [gfs2-utils PATCH 7/7] fsck.gfs2: Detect and fix duplicate references in hash tables
  2013-07-16 12:56 ` [Cluster-devel] [gfs2-utils PATCH 7/7] fsck.gfs2: Detect and fix duplicate references in hash tables Bob Peterson
@ 2013-07-16 16:52   ` Andrew Price
  2013-07-16 17:16     ` Bob Peterson
  0 siblings, 1 reply; 18+ messages in thread
From: Andrew Price @ 2013-07-16 16:52 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Hi Bob,

A couple of comments below...

On 16/07/13 13:56, Bob Peterson wrote:
> This patch checks directory hash tables for multiple occurrences
> of the same leaf block. When duplicates are encountered, it checks
> to see which reference is appropriate for the hash table location
> based on the leaf block's hash value. The appropriate reference is
> kept and all duplicates are replaced by newly created leaf blocks.
>
> rhbz#984085
> ---
>   gfs2/fsck/pass2.c | 108 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 108 insertions(+)
>
> diff --git a/gfs2/fsck/pass2.c b/gfs2/fsck/pass2.c
> index a40a4a7..9f3e5c1 100644
> --- a/gfs2/fsck/pass2.c
> +++ b/gfs2/fsck/pass2.c
> @@ -1278,6 +1278,110 @@ static int fix_hashtable(struct gfs2_inode *ip, uint64_t *tbl, unsigned hsize,
>   	return changes;
>   }
>
> +/* check_hash_tbl_dups - check for the same leaf in multiple places */
> +static int check_hash_tbl_dups(struct gfs2_inode *ip, uint64_t *tbl,
> +			       unsigned hsize, int lindex, int len)
> +{
> +	int l, len2;
> +	uint64_t leafblk, leaf_no;
> +	struct gfs2_buffer_head *lbh;
> +	struct gfs2_leaf leaf;
> +	struct gfs2_dirent dentry, *de;
> +	int hash_index; /* index into the hash table based on the hash */
> +
> +	leafblk = be64_to_cpu(tbl[lindex]);
> +	for (l = 0; l < hsize; l++) {
> +		if (l == lindex) { /* skip the valid reference */
> +			l += len - 1;
> +			continue;
> +		}
> +		if (be64_to_cpu(tbl[l]) != leafblk)
> +			continue;
> +
> +		for (len2 = 0; l + len2 < hsize; len2++) {
> +			if (l + len2 == lindex)
> +				break;
> +			if (be64_to_cpu(tbl[l + len2]) != leafblk)
> +				break;
> +		}
> +		log_err(_("Dinode %llu (0x%llx) has duplicate leaf pointers "
> +			  "to block %llu (0x%llx) at offsets %u (0x%x) "
> +			  "(for 0x%x) and %u (0x%x) (for 0x%x)\n"),
> +			(unsigned long long)ip->i_di.di_num.no_addr,
> +			(unsigned long long)ip->i_di.di_num.no_addr,
> +			(unsigned long long)leafblk,
> +			(unsigned long long)leafblk, lindex, lindex, len,
> +			l, l, len2);
> +
> +		/* See which set of references is valid: the one passed in
> +		   or the duplicate we found. */
> +		memset(&leaf, 0, sizeof(leaf));
> +		leaf_no = leafblk;
> +		if (!valid_block(ip->i_sbd, leaf_no)) /* Checked later */
> +			continue;
> +
> +		lbh = bread(ip->i_sbd, leafblk);
> +		if (gfs2_check_meta(lbh, GFS2_METATYPE_LF)) { /* Chked later */
> +			brelse(lbh);
> +			continue;
> +		}
> +
> +		memset(&dentry, 0, sizeof(struct gfs2_dirent));
> +		de = (struct gfs2_dirent *)(lbh->b_data +
> +					    sizeof(struct gfs2_leaf));
> +		gfs2_dirent_in(&dentry, (char *)de);
> +		hash_index = hash_table_index(dentry.de_hash, ip);
> +		brelse(lbh);

Unconditionally releasing lbh here ...

> +		/* check the duplicate ref first */
> +		if (hash_index < l ||  hash_index > l + len2) {
> +			log_err(_("This leaf block has hash index %d, which "
> +				  "is out of bounds for lindex (%d - %d)\n"),
> +				hash_index, l, l + len2);
> +			if (!query( _("Fix the hash table? (y/n) "))) {
> +				log_err(_("Hash table not fixed.\n"));
> +				brelse(lbh);

so shouldn't need to here...

> +				return 0;
> +			}
> +			/* Adjust the ondisk block count. The original value
> +			   may have been correct without the duplicates but
> +			   pass1 would have counted them and adjusted the
> +			   count to include them. So we must subtract them. */
> +			ip->i_di.di_blocks--;
> +			bmodified(ip->i_bh);
> +			pad_with_leafblks(ip, tbl, l, len2);
> +		} else {
> +			log_debug(_("Hash index 0x%x is the proper "
> +				    "reference to leaf 0x%llx.\n"),
> +				  l, (unsigned long long)leafblk);
> +		}
> +		/* Check the original ref: both references might be bad.
> +		   If both were bad, just return and if we encounter it
> +		   again, we'll treat it as new. If the original ref is not
> +		   bad, keep looking for (and fixing) other instances. */
> +		if (hash_index < lindex ||  hash_index > lindex + len) {
> +			log_err(_("This leaf block has hash index %d, which "
> +				  "is out of bounds for lindex (%d - %d).\n"),
> +				hash_index, lindex, lindex + len);
> +			if (!query( _("Fix the hash table? (y/n) "))) {
> +				log_err(_("Hash table not fixed.\n"));
> +				brelse(lbh);

and here.

The rest looks good to me.

Cheers,
Andy

> +				return 0;
> +			}
> +			ip->i_di.di_blocks--;
> +			bmodified(ip->i_bh);
> +			pad_with_leafblks(ip, tbl, lindex, len);
> +			/* At this point we know both copies are bad, so we
> +			   return to start fresh */
> +			return -EFAULT;
> +		} else {
> +			log_debug(_("Hash index 0x%x is the proper "
> +				    "reference to leaf 0x%llx.\n"),
> +				  lindex, (unsigned long long)leafblk);
> +		}
> +	}
> +	return 0;
> +}
> +
>   /* check_hash_tbl - check that the hash table is sane
>    *
>    * We've got to make sure the hash table is sane. Each leaf needs to
> @@ -1372,6 +1476,10 @@ static int check_hash_tbl(struct gfs2_inode *ip, uint64_t *tbl,
>   			lindex += proper_len;
>   			continue;
>   		}
> +
> +		if (check_hash_tbl_dups(ip, tbl, hsize, lindex, len))
> +			continue;
> +		
>   		/* Make sure they call on proper leaf-split boundaries. This
>   		   is the calculation used by the kernel, and dir_split_leaf */
>   		proper_start = (lindex & ~(proper_len - 1));
>



^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Cluster-devel] [gfs2-utils PATCH 5/7] fsck.gfs2: don't check newly created lost+found in pass2
  2013-07-16 16:52   ` Andrew Price
@ 2013-07-16 17:14     ` Bob Peterson
  0 siblings, 0 replies; 18+ messages in thread
From: Bob Peterson @ 2013-07-16 17:14 UTC (permalink / raw)
  To: cluster-devel.redhat.com

----- Original Message -----
| Hi Bob,
| 
|  > --- a/gfs2/fsck/pass2.c
| > +++ b/gfs2/fsck/pass2.c
| > @@ -1683,6 +1683,14 @@ int pass2(struct gfs2_sbd *sdp)
| >   		if (q != gfs2_inode_dir)
| >   			continue;
| >
| > +		/* If we created lost+found, its links should have been
| > +		   properly adjusted, so don't check it. */
| > +		if (lf_was_created &&
| > +		    (dirblk == lf_dip->i_di.di_num.no_addr)) {
| > +			log_debug("Pass2 skipping the new new lost+found.\n");
| 
| Spotted "new new" here but maybe it needs _() also.
| 
| Cheers,
| Andy
| 
Good catch. Fixed in my latest version.

Bob



^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Cluster-devel] [gfs2-utils PATCH 7/7] fsck.gfs2: Detect and fix duplicate references in hash tables
  2013-07-16 16:52   ` Andrew Price
@ 2013-07-16 17:16     ` Bob Peterson
  0 siblings, 0 replies; 18+ messages in thread
From: Bob Peterson @ 2013-07-16 17:16 UTC (permalink / raw)
  To: cluster-devel.redhat.com

----- Original Message -----
| Hi Bob,
| 
| A couple of comments below...
(snip)
| > +				brelse(lbh);
| 
| so shouldn't need to here...
(snip) 
| > +				brelse(lbh);
| 
| and here.
| 
| The rest looks good to me.
| 
| Cheers,
| Andy

Another good catch. Fixed in my latest version.
I found and fixed a whitespace error in this patch as well.

Bob



^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Cluster-devel] [gfs2-utils PATCH 1/7] fsck.gfs2: Fix reference to uninitialized variable
  2013-07-16 12:56 [Cluster-devel] [gfs2-utils PATCH 1/7] fsck.gfs2: Fix reference to uninitialized variable Bob Peterson
                   ` (5 preceding siblings ...)
  2013-07-16 12:56 ` [Cluster-devel] [gfs2-utils PATCH 7/7] fsck.gfs2: Detect and fix duplicate references in hash tables Bob Peterson
@ 2013-07-17  5:30 ` Fabio M. Di Nitto
  2013-07-17  7:53   ` Steven Whitehouse
  6 siblings, 1 reply; 18+ messages in thread
From: Fabio M. Di Nitto @ 2013-07-17  5:30 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On 07/16/2013 02:56 PM, Bob Peterson wrote:
> This patch initializes a variable so that it no longer references
> it uninitialized.
> 
> rhbz#984085
> ---
>  gfs2/fsck/initialize.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gfs2/fsck/initialize.c b/gfs2/fsck/initialize.c
> index b01b240..936fd5e 100644
> --- a/gfs2/fsck/initialize.c
> +++ b/gfs2/fsck/initialize.c
> @@ -832,7 +832,7 @@ static int get_lockproto_table(struct gfs2_sbd *sdp)
>  {
>  	FILE *fp;
>  	char line[PATH_MAX];
> -	char *cluname, *end;
> +	char *cluname, *end = NULL;
>  	const char *fsname, *cfgfile = "/etc/cluster/cluster.conf";

Just spotted this reference to cluster.conf ^^ remember it doesn't exist
anymore in the new era.

Fabio



^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Cluster-devel] [gfs2-utils PATCH 1/7] fsck.gfs2: Fix reference to uninitialized variable
  2013-07-17  5:30 ` [Cluster-devel] [gfs2-utils PATCH 1/7] fsck.gfs2: Fix reference to uninitialized variable Fabio M. Di Nitto
@ 2013-07-17  7:53   ` Steven Whitehouse
  2013-07-17 11:51     ` [Cluster-devel] [PATCH] fsck.gfs2: Don't rely on cluster.conf when rebuilding sb Andrew Price
  0 siblings, 1 reply; 18+ messages in thread
From: Steven Whitehouse @ 2013-07-17  7:53 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Hi,

On Wed, 2013-07-17 at 07:30 +0200, Fabio M. Di Nitto wrote:
> On 07/16/2013 02:56 PM, Bob Peterson wrote:
> > This patch initializes a variable so that it no longer references
> > it uninitialized.
> > 
> > rhbz#984085
> > ---
> >  gfs2/fsck/initialize.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/gfs2/fsck/initialize.c b/gfs2/fsck/initialize.c
> > index b01b240..936fd5e 100644
> > --- a/gfs2/fsck/initialize.c
> > +++ b/gfs2/fsck/initialize.c
> > @@ -832,7 +832,7 @@ static int get_lockproto_table(struct gfs2_sbd *sdp)
> >  {
> >  	FILE *fp;
> >  	char line[PATH_MAX];
> > -	char *cluname, *end;
> > +	char *cluname, *end = NULL;
> >  	const char *fsname, *cfgfile = "/etc/cluster/cluster.conf";
> 
> Just spotted this reference to cluster.conf ^^ remember it doesn't exist
> anymore in the new era.
> 
> Fabio
> 

Yes, agreed. It looks like this is trying to repair the superblock, but
I don't think that it is a good idea to assume that lack of cluster.conf
means that the fs is not clustered. Nor should be be trying to use the
lock table name from cluster.conf since we might be repairing a
filesystem on a different cluster to the one on which it was mounted and
supposed to work, so it might just result in putting incorrect
information in to this field. Perhaps better to set it to lock_dlm with
a cluster name of unknown and then prompt the user to use gfs2tune to
reset the cluster name afterwards, or something like that,

Steve.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Cluster-devel] [PATCH] fsck.gfs2: Don't rely on cluster.conf when rebuilding sb
  2013-07-17  7:53   ` Steven Whitehouse
@ 2013-07-17 11:51     ` Andrew Price
  2013-07-17 11:52       ` Steven Whitehouse
                         ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Andrew Price @ 2013-07-17 11:51 UTC (permalink / raw)
  To: cluster-devel.redhat.com

As cluster.conf no longer exists we can't sniff the locking options from
it when rebuilding the superblock and in any case we shouldn't assume
that fsck.gfs2 is running on the cluster the volume belongs to.

This patch removes the get_lockproto_table function and instead sets the
lock table name to a placeholder ("unknown") and sets lockproto to
"lock_dlm".  It warns the user at the end of the run that the locktable
will need to be set before mounting.

Signed-off-by: Andrew Price <anprice@redhat.com>
---
 gfs2/fsck/initialize.c | 57 ++++----------------------------------------------
 gfs2/fsck/main.c       |  4 ++++
 2 files changed, 8 insertions(+), 53 deletions(-)

diff --git a/gfs2/fsck/initialize.c b/gfs2/fsck/initialize.c
index b01b240..869d2de 100644
--- a/gfs2/fsck/initialize.c
+++ b/gfs2/fsck/initialize.c
@@ -33,6 +33,7 @@ static int was_mounted_ro = 0;
 static uint64_t possible_root = HIGHEST_BLOCK;
 static struct master_dir fix_md;
 static unsigned long long blks_2free = 0;
+extern int sb_fixed;
 
 /**
  * block_mounters
@@ -828,58 +829,6 @@ static int init_system_inodes(struct gfs2_sbd *sdp)
 	return -1;
 }
 
-static int get_lockproto_table(struct gfs2_sbd *sdp)
-{
-	FILE *fp;
-	char line[PATH_MAX];
-	char *cluname, *end;
-	const char *fsname, *cfgfile = "/etc/cluster/cluster.conf";
-
-	memset(sdp->lockproto, 0, sizeof(sdp->lockproto));
-	memset(sdp->locktable, 0, sizeof(sdp->locktable));
-	fp = fopen(cfgfile, "rt");
-	if (!fp) {
-		/* no cluster.conf; must be a stand-alone file system */
-		strcpy(sdp->lockproto, "lock_nolock");
-		log_warn(_("Lock protocol determined to be: lock_nolock\n"));
-		log_warn(_("Stand-alone file system: No need for a lock "
-			   "table.\n"));
-		return 0;
-	}
-	/* We found a cluster.conf so assume it's a clustered file system */
-	log_warn(_("Lock protocol assumed to be: " GFS2_DEFAULT_LOCKPROTO
-		   "\n"));
-	strcpy(sdp->lockproto, GFS2_DEFAULT_LOCKPROTO);
-
-	while (fgets(line, sizeof(line) - 1, fp)) {
-		cluname = strstr(line,"<cluster name=");
-		if (cluname) {
-			cluname += 15;
-			end = strchr(cluname,'"');
-			if (end)
-				*end = '\0';
-			break;
-		}
-	}
-	if (cluname == NULL || end == NULL || end - cluname < 1) {
-		log_err(_("Error: Unable to determine cluster name from %s\n"),
-		                                                      cfgfile);
-	} else {
-		fsname = strrchr(opts.device, '/');
-		if (fsname)
-			fsname++;
-		else
-			fsname = "repaired";
-		snprintf(sdp->locktable, sizeof(sdp->locktable), "%.*s:%.16s",
-		         (int)(sizeof(sdp->locktable) - strlen(fsname) - 2),
-		         cluname, fsname);
-		log_warn(_("Lock table determined to be: %s\n"),
-			 sdp->locktable);
-	}
-	fclose(fp);
-	return 0;
-}
-
 /**
  * is_journal_copy - Is this a "real" dinode or a copy inside a journal?
  * A real dinode will be located at the block number in its no_addr.
@@ -1256,7 +1205,8 @@ static int sb_repair(struct gfs2_sbd *sdp)
 		}
 	}
 	/* Step 3 - Rebuild the lock protocol and file system table name */
-	get_lockproto_table(sdp);
+	strcpy(sdp->lockproto, GFS2_DEFAULT_LOCKPROTO);
+	strcpy(sdp->locktable, "unknown");
 	if (query(_("Okay to fix the GFS2 superblock? (y/n)"))) {
 		log_info(_("Found system master directory at: 0x%llx\n"),
 			 sdp->sd_sb.sb_master_dir.no_addr);
@@ -1280,6 +1230,7 @@ static int sb_repair(struct gfs2_sbd *sdp)
 		build_sb(sdp, uuid);
 		inode_put(&sdp->md.rooti);
 		inode_put(&sdp->master_dir);
+		sb_fixed = 1;
 	} else {
 		log_crit(_("GFS2 superblock not fixed; fsck cannot proceed "
 			   "without a valid superblock.\n"));
diff --git a/gfs2/fsck/main.c b/gfs2/fsck/main.c
index 9c3b06d..f9e7166 100644
--- a/gfs2/fsck/main.c
+++ b/gfs2/fsck/main.c
@@ -36,6 +36,7 @@ struct osi_root dirtree = (struct osi_root) { NULL, };
 struct osi_root inodetree = (struct osi_root) { NULL, };
 int dups_found = 0, dups_found_first = 0;
 struct gfs_sb *sbd1 = NULL;
+int sb_fixed = 0;
 
 /* This function is for libgfs2's sake.                                      */
 void print_it(const char *label, const char *fmt, const char *fmt2, ...)
@@ -315,6 +316,9 @@ int main(int argc, char **argv)
 		log_notice( _("Writing changes to disk\n"));
 	fsync(sdp->device_fd);
 	destroy(sdp);
+	if (sb_fixed)
+		log_warn(_("Superblock was reset. Use tunegfs2 to manually "
+		           "set lock table before mounting.\n"));
 	log_notice( _("gfs2_fsck complete\n"));
 
 	if (!error) {
-- 
1.8.1.4



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Cluster-devel] [PATCH] fsck.gfs2: Don't rely on cluster.conf when rebuilding sb
  2013-07-17 11:51     ` [Cluster-devel] [PATCH] fsck.gfs2: Don't rely on cluster.conf when rebuilding sb Andrew Price
@ 2013-07-17 11:52       ` Steven Whitehouse
  2013-07-17 12:27       ` Bob Peterson
  2013-07-17 18:13       ` Fabio M. Di Nitto
  2 siblings, 0 replies; 18+ messages in thread
From: Steven Whitehouse @ 2013-07-17 11:52 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Hi,

I think that looks like a good solution. Seems to be a bit smaller
code-wise too,

Steve.

On Wed, 2013-07-17 at 12:51 +0100, Andrew Price wrote:
> As cluster.conf no longer exists we can't sniff the locking options from
> it when rebuilding the superblock and in any case we shouldn't assume
> that fsck.gfs2 is running on the cluster the volume belongs to.
> 
> This patch removes the get_lockproto_table function and instead sets the
> lock table name to a placeholder ("unknown") and sets lockproto to
> "lock_dlm".  It warns the user at the end of the run that the locktable
> will need to be set before mounting.
> 
> Signed-off-by: Andrew Price <anprice@redhat.com>
> ---
>  gfs2/fsck/initialize.c | 57 ++++----------------------------------------------
>  gfs2/fsck/main.c       |  4 ++++
>  2 files changed, 8 insertions(+), 53 deletions(-)
> 
> diff --git a/gfs2/fsck/initialize.c b/gfs2/fsck/initialize.c
> index b01b240..869d2de 100644
> --- a/gfs2/fsck/initialize.c
> +++ b/gfs2/fsck/initialize.c
> @@ -33,6 +33,7 @@ static int was_mounted_ro = 0;
>  static uint64_t possible_root = HIGHEST_BLOCK;
>  static struct master_dir fix_md;
>  static unsigned long long blks_2free = 0;
> +extern int sb_fixed;
>  
>  /**
>   * block_mounters
> @@ -828,58 +829,6 @@ static int init_system_inodes(struct gfs2_sbd *sdp)
>  	return -1;
>  }
>  
> -static int get_lockproto_table(struct gfs2_sbd *sdp)
> -{
> -	FILE *fp;
> -	char line[PATH_MAX];
> -	char *cluname, *end;
> -	const char *fsname, *cfgfile = "/etc/cluster/cluster.conf";
> -
> -	memset(sdp->lockproto, 0, sizeof(sdp->lockproto));
> -	memset(sdp->locktable, 0, sizeof(sdp->locktable));
> -	fp = fopen(cfgfile, "rt");
> -	if (!fp) {
> -		/* no cluster.conf; must be a stand-alone file system */
> -		strcpy(sdp->lockproto, "lock_nolock");
> -		log_warn(_("Lock protocol determined to be: lock_nolock\n"));
> -		log_warn(_("Stand-alone file system: No need for a lock "
> -			   "table.\n"));
> -		return 0;
> -	}
> -	/* We found a cluster.conf so assume it's a clustered file system */
> -	log_warn(_("Lock protocol assumed to be: " GFS2_DEFAULT_LOCKPROTO
> -		   "\n"));
> -	strcpy(sdp->lockproto, GFS2_DEFAULT_LOCKPROTO);
> -
> -	while (fgets(line, sizeof(line) - 1, fp)) {
> -		cluname = strstr(line,"<cluster name=");
> -		if (cluname) {
> -			cluname += 15;
> -			end = strchr(cluname,'"');
> -			if (end)
> -				*end = '\0';
> -			break;
> -		}
> -	}
> -	if (cluname == NULL || end == NULL || end - cluname < 1) {
> -		log_err(_("Error: Unable to determine cluster name from %s\n"),
> -		                                                      cfgfile);
> -	} else {
> -		fsname = strrchr(opts.device, '/');
> -		if (fsname)
> -			fsname++;
> -		else
> -			fsname = "repaired";
> -		snprintf(sdp->locktable, sizeof(sdp->locktable), "%.*s:%.16s",
> -		         (int)(sizeof(sdp->locktable) - strlen(fsname) - 2),
> -		         cluname, fsname);
> -		log_warn(_("Lock table determined to be: %s\n"),
> -			 sdp->locktable);
> -	}
> -	fclose(fp);
> -	return 0;
> -}
> -
>  /**
>   * is_journal_copy - Is this a "real" dinode or a copy inside a journal?
>   * A real dinode will be located at the block number in its no_addr.
> @@ -1256,7 +1205,8 @@ static int sb_repair(struct gfs2_sbd *sdp)
>  		}
>  	}
>  	/* Step 3 - Rebuild the lock protocol and file system table name */
> -	get_lockproto_table(sdp);
> +	strcpy(sdp->lockproto, GFS2_DEFAULT_LOCKPROTO);
> +	strcpy(sdp->locktable, "unknown");
>  	if (query(_("Okay to fix the GFS2 superblock? (y/n)"))) {
>  		log_info(_("Found system master directory at: 0x%llx\n"),
>  			 sdp->sd_sb.sb_master_dir.no_addr);
> @@ -1280,6 +1230,7 @@ static int sb_repair(struct gfs2_sbd *sdp)
>  		build_sb(sdp, uuid);
>  		inode_put(&sdp->md.rooti);
>  		inode_put(&sdp->master_dir);
> +		sb_fixed = 1;
>  	} else {
>  		log_crit(_("GFS2 superblock not fixed; fsck cannot proceed "
>  			   "without a valid superblock.\n"));
> diff --git a/gfs2/fsck/main.c b/gfs2/fsck/main.c
> index 9c3b06d..f9e7166 100644
> --- a/gfs2/fsck/main.c
> +++ b/gfs2/fsck/main.c
> @@ -36,6 +36,7 @@ struct osi_root dirtree = (struct osi_root) { NULL, };
>  struct osi_root inodetree = (struct osi_root) { NULL, };
>  int dups_found = 0, dups_found_first = 0;
>  struct gfs_sb *sbd1 = NULL;
> +int sb_fixed = 0;
>  
>  /* This function is for libgfs2's sake.                                      */
>  void print_it(const char *label, const char *fmt, const char *fmt2, ...)
> @@ -315,6 +316,9 @@ int main(int argc, char **argv)
>  		log_notice( _("Writing changes to disk\n"));
>  	fsync(sdp->device_fd);
>  	destroy(sdp);
> +	if (sb_fixed)
> +		log_warn(_("Superblock was reset. Use tunegfs2 to manually "
> +		           "set lock table before mounting.\n"));
>  	log_notice( _("gfs2_fsck complete\n"));
>  
>  	if (!error) {




^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Cluster-devel] [PATCH] fsck.gfs2: Don't rely on cluster.conf when rebuilding sb
  2013-07-17 11:51     ` [Cluster-devel] [PATCH] fsck.gfs2: Don't rely on cluster.conf when rebuilding sb Andrew Price
  2013-07-17 11:52       ` Steven Whitehouse
@ 2013-07-17 12:27       ` Bob Peterson
  2013-07-17 18:13       ` Fabio M. Di Nitto
  2 siblings, 0 replies; 18+ messages in thread
From: Bob Peterson @ 2013-07-17 12:27 UTC (permalink / raw)
  To: cluster-devel.redhat.com

----- Original Message -----
| As cluster.conf no longer exists we can't sniff the locking options from
| it when rebuilding the superblock and in any case we shouldn't assume
| that fsck.gfs2 is running on the cluster the volume belongs to.
| 
| This patch removes the get_lockproto_table function and instead sets the
| lock table name to a placeholder ("unknown") and sets lockproto to
| "lock_dlm".  It warns the user at the end of the run that the locktable
| will need to be set before mounting.
| 
| Signed-off-by: Andrew Price <anprice@redhat.com>
| ---
|  gfs2/fsck/initialize.c | 57
|  ++++----------------------------------------------
|  gfs2/fsck/main.c       |  4 ++++
|  2 files changed, 8 insertions(+), 53 deletions(-)
| 
| diff --git a/gfs2/fsck/initialize.c b/gfs2/fsck/initialize.c
| index b01b240..869d2de 100644
| --- a/gfs2/fsck/initialize.c
| +++ b/gfs2/fsck/initialize.c
| @@ -33,6 +33,7 @@ static int was_mounted_ro = 0;
|  static uint64_t possible_root = HIGHEST_BLOCK;
|  static struct master_dir fix_md;
|  static unsigned long long blks_2free = 0;
| +extern int sb_fixed;
|  
|  /**
|   * block_mounters
| @@ -828,58 +829,6 @@ static int init_system_inodes(struct gfs2_sbd *sdp)
|  	return -1;
|  }
|  
| -static int get_lockproto_table(struct gfs2_sbd *sdp)
| -{
| -	FILE *fp;
| -	char line[PATH_MAX];
| -	char *cluname, *end;
| -	const char *fsname, *cfgfile = "/etc/cluster/cluster.conf";
| -
| -	memset(sdp->lockproto, 0, sizeof(sdp->lockproto));
| -	memset(sdp->locktable, 0, sizeof(sdp->locktable));
| -	fp = fopen(cfgfile, "rt");
| -	if (!fp) {
| -		/* no cluster.conf; must be a stand-alone file system */
| -		strcpy(sdp->lockproto, "lock_nolock");
| -		log_warn(_("Lock protocol determined to be: lock_nolock\n"));
| -		log_warn(_("Stand-alone file system: No need for a lock "
| -			   "table.\n"));
| -		return 0;
| -	}
| -	/* We found a cluster.conf so assume it's a clustered file system */
| -	log_warn(_("Lock protocol assumed to be: " GFS2_DEFAULT_LOCKPROTO
| -		   "\n"));
| -	strcpy(sdp->lockproto, GFS2_DEFAULT_LOCKPROTO);
| -
| -	while (fgets(line, sizeof(line) - 1, fp)) {
| -		cluname = strstr(line,"<cluster name=");
| -		if (cluname) {
| -			cluname += 15;
| -			end = strchr(cluname,'"');
| -			if (end)
| -				*end = '\0';
| -			break;
| -		}
| -	}
| -	if (cluname == NULL || end == NULL || end - cluname < 1) {
| -		log_err(_("Error: Unable to determine cluster name from %s\n"),
| -		                                                      cfgfile);
| -	} else {
| -		fsname = strrchr(opts.device, '/');
| -		if (fsname)
| -			fsname++;
| -		else
| -			fsname = "repaired";
| -		snprintf(sdp->locktable, sizeof(sdp->locktable), "%.*s:%.16s",
| -		         (int)(sizeof(sdp->locktable) - strlen(fsname) - 2),
| -		         cluname, fsname);
| -		log_warn(_("Lock table determined to be: %s\n"),
| -			 sdp->locktable);
| -	}
| -	fclose(fp);
| -	return 0;
| -}
| -
|  /**
|   * is_journal_copy - Is this a "real" dinode or a copy inside a journal?
|   * A real dinode will be located at the block number in its no_addr.
| @@ -1256,7 +1205,8 @@ static int sb_repair(struct gfs2_sbd *sdp)
|  		}
|  	}
|  	/* Step 3 - Rebuild the lock protocol and file system table name */
| -	get_lockproto_table(sdp);
| +	strcpy(sdp->lockproto, GFS2_DEFAULT_LOCKPROTO);
| +	strcpy(sdp->locktable, "unknown");
|  	if (query(_("Okay to fix the GFS2 superblock? (y/n)"))) {
|  		log_info(_("Found system master directory at: 0x%llx\n"),
|  			 sdp->sd_sb.sb_master_dir.no_addr);
| @@ -1280,6 +1230,7 @@ static int sb_repair(struct gfs2_sbd *sdp)
|  		build_sb(sdp, uuid);
|  		inode_put(&sdp->md.rooti);
|  		inode_put(&sdp->master_dir);
| +		sb_fixed = 1;
|  	} else {
|  		log_crit(_("GFS2 superblock not fixed; fsck cannot proceed "
|  			   "without a valid superblock.\n"));
| diff --git a/gfs2/fsck/main.c b/gfs2/fsck/main.c
| index 9c3b06d..f9e7166 100644
| --- a/gfs2/fsck/main.c
| +++ b/gfs2/fsck/main.c
| @@ -36,6 +36,7 @@ struct osi_root dirtree = (struct osi_root) { NULL, };
|  struct osi_root inodetree = (struct osi_root) { NULL, };
|  int dups_found = 0, dups_found_first = 0;
|  struct gfs_sb *sbd1 = NULL;
| +int sb_fixed = 0;
|  
|  /* This function is for libgfs2's sake.
|  */
|  void print_it(const char *label, const char *fmt, const char *fmt2, ...)
| @@ -315,6 +316,9 @@ int main(int argc, char **argv)
|  		log_notice( _("Writing changes to disk\n"));
|  	fsync(sdp->device_fd);
|  	destroy(sdp);
| +	if (sb_fixed)
| +		log_warn(_("Superblock was reset. Use tunegfs2 to manually "
| +		           "set lock table before mounting.\n"));
|  	log_notice( _("gfs2_fsck complete\n"));
|  
|  	if (!error) {
| --
| 1.8.1.4

Hi,

ACK. I'll pull patch1 of my set based on this. Thanks, Andy.

Regards,

Bob Peterson
Red Hat File Systems



^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Cluster-devel] [PATCH] fsck.gfs2: Don't rely on cluster.conf when rebuilding sb
  2013-07-17 11:51     ` [Cluster-devel] [PATCH] fsck.gfs2: Don't rely on cluster.conf when rebuilding sb Andrew Price
  2013-07-17 11:52       ` Steven Whitehouse
  2013-07-17 12:27       ` Bob Peterson
@ 2013-07-17 18:13       ` Fabio M. Di Nitto
  2013-07-17 18:26         ` Bob Peterson
  2 siblings, 1 reply; 18+ messages in thread
From: Fabio M. Di Nitto @ 2013-07-17 18:13 UTC (permalink / raw)
  To: cluster-devel.redhat.com

You also want to get rid of this code in RHEL6 btw. It's just broken in
many different ways.

Fabio

On 07/17/2013 01:51 PM, Andrew Price wrote:
> As cluster.conf no longer exists we can't sniff the locking options from
> it when rebuilding the superblock and in any case we shouldn't assume
> that fsck.gfs2 is running on the cluster the volume belongs to.
> 
> This patch removes the get_lockproto_table function and instead sets the
> lock table name to a placeholder ("unknown") and sets lockproto to
> "lock_dlm".  It warns the user at the end of the run that the locktable
> will need to be set before mounting.
> 
> Signed-off-by: Andrew Price <anprice@redhat.com>
> ---
>  gfs2/fsck/initialize.c | 57 ++++----------------------------------------------
>  gfs2/fsck/main.c       |  4 ++++
>  2 files changed, 8 insertions(+), 53 deletions(-)
> 
> diff --git a/gfs2/fsck/initialize.c b/gfs2/fsck/initialize.c
> index b01b240..869d2de 100644
> --- a/gfs2/fsck/initialize.c
> +++ b/gfs2/fsck/initialize.c
> @@ -33,6 +33,7 @@ static int was_mounted_ro = 0;
>  static uint64_t possible_root = HIGHEST_BLOCK;
>  static struct master_dir fix_md;
>  static unsigned long long blks_2free = 0;
> +extern int sb_fixed;
>  
>  /**
>   * block_mounters
> @@ -828,58 +829,6 @@ static int init_system_inodes(struct gfs2_sbd *sdp)
>  	return -1;
>  }
>  
> -static int get_lockproto_table(struct gfs2_sbd *sdp)
> -{
> -	FILE *fp;
> -	char line[PATH_MAX];
> -	char *cluname, *end;
> -	const char *fsname, *cfgfile = "/etc/cluster/cluster.conf";
> -
> -	memset(sdp->lockproto, 0, sizeof(sdp->lockproto));
> -	memset(sdp->locktable, 0, sizeof(sdp->locktable));
> -	fp = fopen(cfgfile, "rt");
> -	if (!fp) {
> -		/* no cluster.conf; must be a stand-alone file system */
> -		strcpy(sdp->lockproto, "lock_nolock");
> -		log_warn(_("Lock protocol determined to be: lock_nolock\n"));
> -		log_warn(_("Stand-alone file system: No need for a lock "
> -			   "table.\n"));
> -		return 0;
> -	}
> -	/* We found a cluster.conf so assume it's a clustered file system */
> -	log_warn(_("Lock protocol assumed to be: " GFS2_DEFAULT_LOCKPROTO
> -		   "\n"));
> -	strcpy(sdp->lockproto, GFS2_DEFAULT_LOCKPROTO);
> -
> -	while (fgets(line, sizeof(line) - 1, fp)) {
> -		cluname = strstr(line,"<cluster name=");
> -		if (cluname) {
> -			cluname += 15;
> -			end = strchr(cluname,'"');
> -			if (end)
> -				*end = '\0';
> -			break;
> -		}
> -	}
> -	if (cluname == NULL || end == NULL || end - cluname < 1) {
> -		log_err(_("Error: Unable to determine cluster name from %s\n"),
> -		                                                      cfgfile);
> -	} else {
> -		fsname = strrchr(opts.device, '/');
> -		if (fsname)
> -			fsname++;
> -		else
> -			fsname = "repaired";
> -		snprintf(sdp->locktable, sizeof(sdp->locktable), "%.*s:%.16s",
> -		         (int)(sizeof(sdp->locktable) - strlen(fsname) - 2),
> -		         cluname, fsname);
> -		log_warn(_("Lock table determined to be: %s\n"),
> -			 sdp->locktable);
> -	}
> -	fclose(fp);
> -	return 0;
> -}
> -
>  /**
>   * is_journal_copy - Is this a "real" dinode or a copy inside a journal?
>   * A real dinode will be located at the block number in its no_addr.
> @@ -1256,7 +1205,8 @@ static int sb_repair(struct gfs2_sbd *sdp)
>  		}
>  	}
>  	/* Step 3 - Rebuild the lock protocol and file system table name */
> -	get_lockproto_table(sdp);
> +	strcpy(sdp->lockproto, GFS2_DEFAULT_LOCKPROTO);
> +	strcpy(sdp->locktable, "unknown");
>  	if (query(_("Okay to fix the GFS2 superblock? (y/n)"))) {
>  		log_info(_("Found system master directory at: 0x%llx\n"),
>  			 sdp->sd_sb.sb_master_dir.no_addr);
> @@ -1280,6 +1230,7 @@ static int sb_repair(struct gfs2_sbd *sdp)
>  		build_sb(sdp, uuid);
>  		inode_put(&sdp->md.rooti);
>  		inode_put(&sdp->master_dir);
> +		sb_fixed = 1;
>  	} else {
>  		log_crit(_("GFS2 superblock not fixed; fsck cannot proceed "
>  			   "without a valid superblock.\n"));
> diff --git a/gfs2/fsck/main.c b/gfs2/fsck/main.c
> index 9c3b06d..f9e7166 100644
> --- a/gfs2/fsck/main.c
> +++ b/gfs2/fsck/main.c
> @@ -36,6 +36,7 @@ struct osi_root dirtree = (struct osi_root) { NULL, };
>  struct osi_root inodetree = (struct osi_root) { NULL, };
>  int dups_found = 0, dups_found_first = 0;
>  struct gfs_sb *sbd1 = NULL;
> +int sb_fixed = 0;
>  
>  /* This function is for libgfs2's sake.                                      */
>  void print_it(const char *label, const char *fmt, const char *fmt2, ...)
> @@ -315,6 +316,9 @@ int main(int argc, char **argv)
>  		log_notice( _("Writing changes to disk\n"));
>  	fsync(sdp->device_fd);
>  	destroy(sdp);
> +	if (sb_fixed)
> +		log_warn(_("Superblock was reset. Use tunegfs2 to manually "
> +		           "set lock table before mounting.\n"));
>  	log_notice( _("gfs2_fsck complete\n"));
>  
>  	if (!error) {
> 



^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Cluster-devel] [PATCH] fsck.gfs2: Don't rely on cluster.conf when rebuilding sb
  2013-07-17 18:13       ` Fabio M. Di Nitto
@ 2013-07-17 18:26         ` Bob Peterson
  0 siblings, 0 replies; 18+ messages in thread
From: Bob Peterson @ 2013-07-17 18:26 UTC (permalink / raw)
  To: cluster-devel.redhat.com

----- Original Message -----
| You also want to get rid of this code in RHEL6 btw. It's just broken in
| many different ways.
| 
| Fabio

Perhaps Andy should open a bugzilla record so we can get his patch into RHEL6.5.

Regards,

Bob Peterson
Red Hat File Systems



^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2013-07-17 18:26 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-07-16 12:56 [Cluster-devel] [gfs2-utils PATCH 1/7] fsck.gfs2: Fix reference to uninitialized variable Bob Peterson
2013-07-16 12:56 ` [Cluster-devel] [gfs2-utils PATCH 2/7] fsck.gfs2: fix some log messages Bob Peterson
2013-07-16 12:56 ` [Cluster-devel] [gfs2-utils PATCH 3/7] fsck.gfs2: Fix directory link on relocated directory dirents Bob Peterson
2013-07-16 12:56 ` [Cluster-devel] [gfs2-utils PATCH 4/7] fsck.gfs2: Fix infinite loop in pass1b caused by duplicates in hash table Bob Peterson
2013-07-16 12:56 ` [Cluster-devel] [gfs2-utils PATCH 5/7] fsck.gfs2: don't check newly created lost+found in pass2 Bob Peterson
2013-07-16 16:52   ` Andrew Price
2013-07-16 17:14     ` Bob Peterson
2013-07-16 12:56 ` [Cluster-devel] [gfs2-utils PATCH 6/7] fsck.gfs2: avoid negative number in leaf depth Bob Peterson
2013-07-16 12:56 ` [Cluster-devel] [gfs2-utils PATCH 7/7] fsck.gfs2: Detect and fix duplicate references in hash tables Bob Peterson
2013-07-16 16:52   ` Andrew Price
2013-07-16 17:16     ` Bob Peterson
2013-07-17  5:30 ` [Cluster-devel] [gfs2-utils PATCH 1/7] fsck.gfs2: Fix reference to uninitialized variable Fabio M. Di Nitto
2013-07-17  7:53   ` Steven Whitehouse
2013-07-17 11:51     ` [Cluster-devel] [PATCH] fsck.gfs2: Don't rely on cluster.conf when rebuilding sb Andrew Price
2013-07-17 11:52       ` Steven Whitehouse
2013-07-17 12:27       ` Bob Peterson
2013-07-17 18:13       ` Fabio M. Di Nitto
2013-07-17 18:26         ` Bob Peterson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.