All of lore.kernel.org
 help / color / mirror / Atom feed
* bug+fix in reiserfsck - reiserfscore/journal.c
@ 2004-02-27  3:13 Thorsten Kohfeldt
  2004-02-27 12:53 ` Vitaly Fertman
  0 siblings, 1 reply; 7+ messages in thread
From: Thorsten Kohfeldt @ 2004-02-27  3:13 UTC (permalink / raw)
  To: reiserfs-list

[-- Attachment #1: Type: text/plain, Size: 10289 bytes --]

To whom it may (should!) concern,                                        26/27-FEB-2004
 
after having been annoyed for quite some time by corrupt /var file systems after
(simulated) powerloss in a SuSE 9.0 setup with reiserfs, I finally tracked the
problem down to a nasty bug in reiserfsck - reiserfscore/journal.c
 
Turned out that the journal is not always correctly replayed by reiserfsck !
 
This may be the case when the filesystem has not experienced a lot of transactions
after it has been mounted and so there are more than one mount_id to be found in
the journal. In that case, the function replay_journal() might accidentally step over
the transactions it is supposed to replay and that way those transactions are
completely and forever ignored.
 
The following fix prevents the search loop from stepping too far:
(check also the MIME attachment down below)
The diff is valid for SuSE 9.0 reiserfsprogs 3.6.9+ and also valid for
reiserfsprogs 3.6.13.

############# snip ##############
--- reiserfsprogs-3.6.13/reiserfscore/journal.c 2004-02-17 13:06:23.000000000 +0100
+++ replayFix/reiserfscore/journal.c 2004-02-27 00:11:44.000000000 +0100
@@ -784,7 +784,9 @@
  if (control.mount_id == cur.mount_id && control.trans_id == 0 && 
      control.desc_blocknr == 0)
      break;
- 
+ if (cur.mount_id == control.mount_id && cur.trans_id == control.trans_id + 1)
+     break;
+
  ret = next_transaction (fs, &cur, newest);
  
  if (ret != TRANS_FOUND)
@@ -792,8 +794,9 @@
     }
 
     while (ret == TRANS_FOUND) {
- if (control.mount_id != cur.mount_id || control.trans_id != 0 || 
-     control.desc_blocknr != 0) 
+ if ((control.mount_id != cur.mount_id || control.trans_id != 0 || 
+      control.desc_blocknr != 0) &&
+     (cur.mount_id != control.mount_id || cur.trans_id != control.trans_id + 1))
  {
      ret = next_transaction (fs, &cur, newest);
      
@@ -821,7 +824,8 @@
         replayed ++;
     }
 
-    reiserfs_warning (stderr, "%d transactions replayed\n", replayed);
+    reiserfs_warning (stderr, "Journal in %s (1st block: %d) - %d transactions replayed\n",
+     fs->fs_j_file_name, get_jp_journal_1st_block (sb_jp (fs->fs_ondisk_sb)), replayed);
  
     mark_buffer_dirty (fs->fs_super_bh);
     bwrite (fs->fs_super_bh); 
############# pins ##############

The fix was extensively tested by copying mounted reiser filesystems to
(XOR- !) loop devices and then reiserfsck-ing them.
The fixed reiserfsck version always created the same results or, when the bugged
version did replay 0 transactions, the fixed version replayed the correct amount
of transactions.
 
The unfortunate thing about the SuSE setup is, that reiserfsck is executed before
the filesystem gets mounted. The kernel "mount" code itself seems to know the trick,
but that doesn't help if reiserfsck is discarding those transactions before the
mount can perform its correct replay!
Only the root filesystem gets replayed correctly because it will be mounted
read-only first. On initiation of a mount, the log will always be replayed even if
the mount is supposed to be read-only in order to achieve consistency as early
as possible !

For a quick fix of a SuSE 9.0 system, install the reiserfs(progs) source with yast2,
untar the (source only) attachment to a directory of your choice and run
./createSuSEreiserfs. Then install the new reiserfsck into /sbin/.
IMPORTANT NOTE: Make sure that /sbin/reiserfsck AND /sbin/fsck.reiserfs
are hardlinked to each other again (ls -li shows the same inode) !
The boot code calls fsck.reiserfs instead of reiserfsck.
 
 
Also, I would like to start a discussion about the 32-bit wrap-around of the
journal transaction id.
I DO expect that is a non-academic scenario given the following assumption:
A system that would do 200 transactions per second, 12 hours a day,
5 days a week would reach the wrap around in only two years !
Of course that is a worst case calculation but it is quite probable that a
long term running system might touch that boundary. And that id survives
a reboot. So the amounts accumulate over time.
 
I saw several code sections that assume that the "newer" transaction has
also the higher transaction id. That is NOT the case during a wrap-around.
I'm curious what y'all think about that issue.
 
For a starter, I added a patch that fixes one wrap around issue inside
journal.c. Take it or leave it.

############# snip ##############
--- replayFix/reiserfscore/journal.c 2004-02-27 00:11:44.000000000 +0100
+++ wrapSafer/reiserfscore/journal.c 2004-02-27 00:18:06.000000000 +0100
@@ -136,14 +136,15 @@
     __u32 newest_trans_id, oldest_trans_id, trans_id;
     int trans_nr;
 
+    memset(oldest, 0, sizeof(*oldest));
+    memset(newest, 0, sizeof(*newest));
+    newest_trans_id = oldest_trans_id = 0; // keep compiler quiet
+
     sb = fs->fs_ondisk_sb;
     
     j_start = get_jp_journal_1st_block (sb_jp (sb));
     j_size = get_jp_journal_size (sb_jp (sb));
     
-    oldest_trans_id = 0xffffffff;
-    newest_trans_id = 0;
-
     trans_nr = 0;
     for (j_cur = 0; j_cur < j_size; j_cur ++) {
  d_bh = bread (fs->fs_journal_dev, j_start + j_cur, fs->fs_blocksize);
@@ -152,10 +153,14 @@
      continue;
  }
 
+ trans_id = get_desc_trans_id (d_bh);
+ if (!trans_nr) {
+     oldest_trans_id = trans_id;
+     newest_trans_id = trans_id;
+ }
  trans_nr ++;
 
- trans_id = get_desc_trans_id (d_bh);
- if (trans_id < oldest_trans_id) {
+ if (((__u32)(oldest_trans_id - trans_id)) < 0x80000000UL) {
      oldest_trans_id = trans_id;
 
      oldest->mount_id = get_desc_mount_id (d_bh);
@@ -166,7 +171,7 @@
      oldest->next_trans_offset = next_desc_expected (fs, d_bh) - j_start;
  }
 
- if (trans_id > newest_trans_id) {
+ if (((__u32)(trans_id - newest_trans_id)) < 0x80000000UL){
      newest_trans_id = trans_id;
 
      newest->mount_id = get_desc_mount_id (d_bh); 
############# pins ##############

For REISER 4, I guess, a 48-bit or 64-bit transaction id is an adviseable
choice !


Thorsten Kohfeldt (gmx|de)
 


ATTACHMENT: tar file with 3 patchdiffs and one script


MIME-Version: 1.0
Content-Type: application/octet-stream; name="reiserReplayFix20040227.tar.gz"
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="reiserReplayFix20040227.tar.gz"

H4sIAMmSPkAAA+0Za2/bSC5f7V/BeNvAji1bkh+JkyZo0cddF9t0kaQ44LYLQZZGtmpZ0ukRJ338
9yNn9LL8SPb6usOJCCJrhuRwSA7JoQJmhyy4ZL6j372yb3t73wFkeSAfDYf4lOWj0WDlmcCeIsvK
0VBRVEXZkxW131f3YPg9hClDHEZ6ALAXzeY78e6b/x+FoGR/RZY+eHHg6o60DHT/SrdY0DVty/qK
NWQ07mgw2GZ/dTRI7Y8vaHj8hS6zB/I32+UO+D+3vyRJEGTWF95ghYYXsF7iB12jpqI5JFmV1COQ
5RNFORkMutnRhTbZt95utyHzmAcyOj6RR2uMnj4FSemPOsoA2vw5hKdP60CgaXFfBZctWRhpUaC7
oWabHfAcc3Ug/XUqyGw3SobcAIfqbRpcsEXIoqag7YDcgdD+yDyreSiGWq3TFUSx6gqiGMoQS3LB
WVkwHJFPa70ezBnzwfAWvu2wAP4V2yyqt4Ws4QSxrFA6t0LNc007nGvhJNmH+P9BI4+NEG3KIu2D
ryXq1RRcauJ4xhya4QQn6EHCpVQo9ToRH92AX5fo/4YN3FoJnAqU9V3LOCOYpEoXg3zI8gJoftCM
WAyC+PkkkS99b7db8KkONVObzBBvEjDdhGaillR0k910Mm20BWUn1R1XBLHEDXGPGqodBV1MGfbJ
s8ijaiSP4bmR7cYMxat9Id+oFXZCukIFGPnumiQRGbxmW9DcTzdI0rZrmzWWO6PAWFdYEQNlqGVq
a7fJXaWHiSRxkbLhJ2VRhIyE02zyg9RqloWVMlFaLWQg3x4nR/Pdb8Ie9+0QVnCk84UXu1FJ8Gws
FZybZzTqHKF1jhR8ZMZJ2bjsNl3Qsyw8jciPj3GG7NZnRsS4g3SAM8WdJI6R2rWknfOyGTZop6CW
MvKadlLl7DIurOA8UDnfO/6X8788zvJ/lha+d/7vD9U0/49Udcjzf39Y5f8fASL/i2TtB940lPrd
UVfp35fBlSNQ+pi+T9T+xlLgm9QUFBmOjgcUGegxFpGBDimF7cBzuvkZOgMM//n7wQGkOPlhxIxD
E4XQT/P84PF8QZkKcVoJAqWdOQXWJDKsLEALloWgRRGnuOCaEG1QWkkmSPhj5q8FeUzjmLoR2Z4r
ItoBz2tJsUHxLNEB0eyfwfXls4sr7dXbdxcvWkJjY7VzjBobpxojoBjIfyxnWHQI4rNVYoyA0mbl
7peU+/nz+r72Sbk4gTy2KpdwWpDG2a9bRqhw1zoHBwnSquH2NxiOVioabn+L4cgz0lD/l0zGSyoy
zrFKGa59rA7QRqlxBDs6MJjFeNJPTSaJKXGCtKUeuLY7xUItMlmAKzQem1BYPMy4vHcbnewlLVB3
8PlVnEssleFxCE0sJYEr8wQem5RNd60jmNfS6kyz0ME0V1+wzv0VarnSbbVWxCZf5zW4Hsy1SWzh
1UIz7SC6yyjD2Mcxnio55mQZ2BHbNP2zI+1/J5Tzv4ExKWJX8dXL1Fu+fg2K5Tv6P/2RnOT/PiYF
hfI/jlT3/x8Cv+z3Jrbbm+jhrP4LXM/sEOj8gu6a/EcIh1lBeMgLQVDxzq52KX8jBdYMN7aJYWty
B9HMC8KIud25N7OYY0ZPpwssHln9F0TcWljiMre4jA7obbYXh8Ry4rAFZioWsCxoGXNksvRix4SL
t9dJkFgNSphVohmDZBlY6iG4XkTbcFDAyAMdWRgsiHQMcxi3mRvBHYu6OPoKr6XsVl/4uHVyfhh3
5WQ1h+k3qA94/vby8t3v19C7QX/5/dnl9evr128vKGIaeoh7BM9CLN9b4pXe0m0nRuFD1Ejk3IFu
RTiqo9QTz+MLXqOgDZ5/GphqTIaLBfMQfwYBXmaIBA0gBAhjw2C4AVxJ7BpDN3LA12imR3x1jNCT
OIKlHc1wmDbQKSgO0KjslhkxXZIsG21E3JGFEGum31AymDDmgj11sUwzwXMNRibNmXTIvLhSUePE
Anfpshtkg2YzWBgisT4lBVuBt1ixx369jkcNU8JZoxhfGjTqxzi6XoeOG/UQqZHvWaMXh0EvDIye
rxtzfcrC3tXbd5fPX1416nW0/B/QeKQ0KHM3GiDp+Zsk/M1gDfgTTusokVuvMWPmQeNdiHxO4JEM
f2RYfzZw9taOQKlbdsJ5H6QAHiWS9B59EgJ/6eKzO/24xpZsK5AhkZW3PR4JMljElF0ZGhBDD3fN
/XRNNV+Ty79N/GCBElmcI+qTEy3mmBjTEapE/TicmdnAec9kNz03dhwqdmgxbg2Qbm8+Wtv3Vqud
4w+OpJXnuo43rdcNM9tYxtjXI2MGkq/Ak4yzrIibhcwPPfLtdnscT+Z8tpBgOjLtKJQMPIVu7Ava
jFTZTqpKi3nqTxLvsJVo1e20fbGccyfFLj+izCwR97cTD3A1L9K10Ex6UJJaIh5sJx5KgWMv7Ega
DSZ2WeLhdrpRkSCnGG2nOJL0OPK0gmZysqMyGdZVOId/8nEWxMU1sSTi8XbKbeE/Jx5vJd76UaJM
3O1h2W7ZU4y9NQKazkY4xkKfiyk+R2/CjX3PNwuHBP16tvBMkC4hbi+zg4Ynzc3PlLRYYJyThiBF
dz6ecZSa0cnEWgYkCrjghCA58OkLvD+FzzDFnYN0A0KiLp7+KA5/cl261v/ZaOGvW+Oe/s+wr+Tf
f5ThEeIPR0pV//0Q2Nz/Ge/o2vQlDB7KCJTRSX980pdXujZq1v75ti0l3iPuDKFN/+ja3DvEu/Eh
PPf8u8CezjB1ytRMQvGobPk71ihwyZfrgGMbzA2pxJl6WKi4oljFW+sm8sFDyQHJV3bZu3z57MWb
lzTRq4vL/uBY4R8d+FPJrvvbvvEkX6vow4K4RdMlWjSmcbtZp4AQ7FDLbvMfAm2hT21DC6NA3Orx
Gk3NHChA7xDE1Y4KbZ8ZNhZkvNCnstigwi2r01D89gpp76Gk9SIZNUhwFJTTXJIvkLP+wvVEzax8
s7lesiZC2gwB5mD1yZXzkM1z9Y9V6rUMxv30owKRI6Kx8OkjVLpqZ23RVv65Y0fT5BXdj+7wxrPg
lTdgJHFNPTAzTVpYPJiIiKxqjWXgITXfJd4UuCItm+XImHrQ06j9wps3BQ84TXpquULbxXc1+caR
9oxo60O53xlDeygXe4A6OqrtRqiDg03NmsbjsLwuJ8vdcsO3Q9HIEVp/Ar9iRX7x7DftzesL7er1
P1+KnuID+k6okeZjJ26JrhP/wiq0hzeRDbpqJBppZErrvnffJHU13fkc8kq8HrmCLc5expiqsfaY
xLZjSngE0XuTt3W2vh6gAvByRHwba32stY+m1Bwsqq5D/cmv3XYjMXNjh6/cu21qz3Eu/8EWxAEo
25Q3Qk9Fzzpxx8T1uOMpfR7yhkrR8wyqouFMPJO6eOMJLH623eZ2m79zo1gP2SKcCyHEAd9hohdo
fv5VMdX5iiPmh9X0WNpmiMinMh+iq69AQzNc8Y/UidTc4m1hbfGS2y1hnM+RDVOWD/HhsusmtNt0
WNTPRj9f85GswU9aTEPRt9xtJ23uF/3jYfJv9/V1V083sMGRRdgjdx71eRwVD/LmvxRJxfZap2kj
n5wZN7QlULZ3o/Bzd5/X/uPZ5cXri7+dQBphEtWi1u2QK74jNJ/5FF5h7AUhxr7vBUWHz0i7uBmU
qbMtFLQfzombdxevcvUgrxoFyxi6eaWMZ0zHjVMh8bPr6AoqqKCCCiqooIIKKqigggoqqKCCCiqo
oIIKKqigggoqqKCCCiqooIIKKqigggoq+JHwb17W0/4AUAAA


[-- Attachment #2: Type: text/html, Size: 13723 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: bug+fix in reiserfsck - reiserfscore/journal.c
  2004-02-27  3:13 bug+fix in reiserfsck - reiserfscore/journal.c Thorsten Kohfeldt
@ 2004-02-27 12:53 ` Vitaly Fertman
  2004-02-28  1:41   ` Thorsten Kohfeldt
  0 siblings, 1 reply; 7+ messages in thread
From: Vitaly Fertman @ 2004-02-27 12:53 UTC (permalink / raw)
  To: Thorsten Kohfeldt, reiserfs-list

[-- Attachment #1: Type: text/plain, Size: 3939 bytes --]

Hello, 

On Friday 27 February 2004 06:13, Thorsten Kohfeldt wrote:
> To whom it may (should!) concern,                                       
> 26/27-FEB-2004
>
> after having been annoyed for quite some time by corrupt /var file systems
> after (simulated) powerloss in a SuSE 9.0 setup with reiserfs, I finally
> tracked the problem down to a nasty bug in reiserfsck -
> reiserfscore/journal.c
>
> Turned out that the journal is not always correctly replayed by reiserfsck
> !
>
> This may be the case when the filesystem has not experienced a lot of
> transactions after it has been mounted and so there are more than one
> mount_id to be found in the journal. In that case, the function
> replay_journal() might accidentally step over the transactions it is
> supposed to replay and that way those transactions are completely and
> forever ignored.
>
> The following fix prevents the search loop from stepping too far:
> (check also the MIME attachment down below)
> The diff is valid for SuSE 9.0 reiserfsprogs 3.6.9+ and also valid for
> reiserfsprogs 3.6.13.
>
> ############# snip ##############
> --- reiserfsprogs-3.6.13/reiserfscore/journal.c 2004-02-17
> 13:06:23.000000000 +0100 +++ replayFix/reiserfscore/journal.c 2004-02-27
> 00:11:44.000000000 +0100 @@ -784,7 +784,9 @@
>   if (control.mount_id == cur.mount_id && control.trans_id == 0 &&
>       control.desc_blocknr == 0)
>       break;
> -
> + if (cur.mount_id == control.mount_id && cur.trans_id == control.trans_id
> + 1) +     break;
> +
>   ret = next_transaction (fs, &cur, newest);
>
>   if (ret != TRANS_FOUND)
> @@ -792,8 +794,9 @@
>      }
>
>      while (ret == TRANS_FOUND) {
> - if (control.mount_id != cur.mount_id || control.trans_id != 0 ||
> -     control.desc_blocknr != 0)
> + if ((control.mount_id != cur.mount_id || control.trans_id != 0 ||
> +      control.desc_blocknr != 0) &&
> +     (cur.mount_id != control.mount_id || cur.trans_id != control.trans_id
> + 1)) {
>       ret = next_transaction (fs, &cur, newest);
>
> @@ -821,7 +824,8 @@
>          replayed ++;
>      }
>
> -    reiserfs_warning (stderr, "%d transactions replayed\n", replayed);
> +    reiserfs_warning (stderr, "Journal in %s (1st block: %d) - %d
> transactions replayed\n", +     fs->fs_j_file_name,
> get_jp_journal_1st_block (sb_jp (fs->fs_ondisk_sb)), replayed);
>
>      mark_buffer_dirty (fs->fs_super_bh);
>      bwrite (fs->fs_super_bh);
> ############# pins ##############
>
> The fix was extensively tested by copying mounted reiser filesystems to
> (XOR- !) loop devices and then reiserfsck-ing them.
> The fixed reiserfsck version always created the same results or, when the
> bugged version did replay 0 transactions, the fixed version replayed the
> correct amount of transactions.
>
> The unfortunate thing about the SuSE setup is, that reiserfsck is executed
> before the filesystem gets mounted. The kernel "mount" code itself seems to
> know the trick, but that doesn't help if reiserfsck is discarding those
> transactions before the mount can perform its correct replay!
> Only the root filesystem gets replayed correctly because it will be mounted
> read-only first. On initiation of a mount, the log will always be replayed
> even if the mount is supposed to be read-only in order to achieve
> consistency as early as possible !
>
> For a quick fix of a SuSE 9.0 system, install the reiserfs(progs) source
> with yast2, untar the (source only) attachment to a directory of your
> choice and run ./createSuSEreiserfs. Then install the new reiserfsck into
> /sbin/. IMPORTANT NOTE: Make sure that /sbin/reiserfsck AND
> /sbin/fsck.reiserfs are hardlinked to each other again (ls -li shows the
> same inode) ! The boot code calls fsck.reiserfs instead of reiserfsck.

yes, there is the bug in replaying, thank you for hitting it. After some 
simplifications and tests I got the following patch, would you test it 
also please.

-- 
Thanks,
Vitaly Fertman

[-- Attachment #2: reiserfsprogs_jreplay.patch --]
[-- Type: text/x-diff, Size: 2827 bytes --]

===== reiserfscore/journal.c 1.34 vs edited =====
--- 1.34/reiserfsprogs/reiserfscore/journal.c	Tue Feb 17 15:20:46 2004
+++ edited/reiserfscore/journal.c	Fri Feb 27 15:46:54 2004
@@ -179,7 +179,6 @@
 }
 
 #define TRANS_FOUND     1
-#define TRANS_LAST      2
 #define TRANS_NOT_FOUND 0
 
 /* trans is a valid transaction. Look for valid transaction with smallest
@@ -229,8 +228,6 @@
 	trans->commit_blocknr = commit_expected (fs, next_d_bh);
 	trans->next_trans_offset = next_desc_expected (fs, next_d_bh) - j_start;
 	found = TRANS_FOUND;
-	if (break_trans.trans_id == get_desc_trans_id (next_d_bh))
-	    found = TRANS_LAST;
     }
 
     brelse (d_bh);
@@ -335,10 +332,7 @@
 
     while (1) {
 	action (fs, &oldest);	
-	if (ret == TRANS_LAST)
-	    break;
-	ret = next_transaction (fs, &oldest, newest);	
-	if (ret == TRANS_NOT_FOUND)
+	if ((ret = next_transaction (fs, &oldest, newest)) == TRANS_NOT_FOUND)
 	    break;
     }
 }
@@ -786,29 +780,21 @@
     replayed = 0;
     ret = TRANS_FOUND;
     
-    while (cur.mount_id != control.mount_id || cur.trans_id != control.trans_id) {
-	if (control.mount_id == cur.mount_id && control.trans_id == 0 && 
-	    control.desc_blocknr == 0)
+    /* Looking to the first valid not replayed transaction. */
+    while (1) {
+	if (cur.mount_id == control.mount_id && 
+	    cur.trans_id > control.trans_id)
 	    break;
-	
-	ret = next_transaction (fs, &cur, newest);
-	
-	if (ret != TRANS_FOUND)
+
+	if ((ret = next_transaction (fs, &cur, newest)) != TRANS_FOUND)
 	    break;
     }
-
+    
     while (ret == TRANS_FOUND) {
-	if (control.mount_id != cur.mount_id || control.trans_id != 0 || 
-	    control.desc_blocknr != 0) 
-	{
-	    ret = next_transaction (fs, &cur, newest);
-	    
-	    if (ret == TRANS_NOT_FOUND)
-		break;
-	    
-	    if (cur.mount_id != control.mount_id || cur.trans_id != control.trans_id + 1)
-		break;	    
-	}
+	/* If not the next transaction to be replaied, break out here. */
+	if (cur.mount_id != control.mount_id || 
+	    cur.trans_id != control.trans_id + 1)
+	    break;
 	
 	if (!transaction_check_content(fs, &cur)) {
 	    reiserfs_warning (stderr, "Trans broken: mountid %lu, transid %lu, desc %lu, "
@@ -825,9 +811,16 @@
 	update_journal_header (bh, cur);
 	control = cur;
         replayed ++;
+
+	ret = next_transaction (fs, &cur, newest);
     }
 
-    reiserfs_warning (stderr, "%d transactions replayed\n", replayed);
+    reiserfs_warning (stderr, "Reiserfs journal '%s' in blocks [%u..%u]: %d "
+		      "transactions replayed\n", fs->fs_j_file_name, 
+		      get_jp_journal_1st_block(sb_jp(fs->fs_ondisk_sb)),
+		      get_jp_journal_1st_block(sb_jp(fs->fs_ondisk_sb)) + 
+		      get_jp_journal_size(sb_jp(fs->fs_ondisk_sb)) + 1,
+		      replayed);
 	
     mark_buffer_dirty (fs->fs_super_bh);
     bwrite (fs->fs_super_bh);

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: bug+fix in reiserfsck - reiserfscore/journal.c
  2004-02-27 12:53 ` Vitaly Fertman
@ 2004-02-28  1:41   ` Thorsten Kohfeldt
  2004-02-28 12:26     ` Vitaly Fertman
  0 siblings, 1 reply; 7+ messages in thread
From: Thorsten Kohfeldt @ 2004-02-28  1:41 UTC (permalink / raw)
  To: Vitaly Fertman, reiserfs-list, Hans Reiser; +Cc: Chris Mason

----- Original Message -----
From: "Vitaly Fertman" <vitaly@namesys.com>
To: "Thorsten Kohfeldt" <thorsten.kohfeldt@gmx.de>;
<reiserfs-list@namesys.com>
Sent: Friday, February 27, 2004 1:53 PM
Subject: Re: bug+fix in reiserfsck - reiserfscore/journal.c


> On Friday 27 February 2004 06:13, Thorsten Kohfeldt wrote:
...
> > The fix was extensively tested by copying mounted reiser filesystems to
> > (XOR- !) loop devices and then reiserfsck-ing them.
> > The fixed reiserfsck version always created the same results or, when
> > the bugged version did replay 0 transactions, the fixed version replayed
> > the correct amount of transactions.
...
> yes, there is the bug in replaying, thank you for hitting it. After some
> simplifications and tests I got the following patch, would you test it
> also please.
> --
> Thanks,
> Vitaly Fertman


Hans,

I would really be interrested in your opinion to my topic 5) down below.


Vitaly,

I tested your patch and for MY test data set it created the same results
like the patch that I provided.

I have a few comments to your patch though:

1)
I considered restructuring the main loop like you did but decided against
it in favour of a "minimum invasion" strategy. We don't want to modify
more than necessary in production software, don't we ?
I guess verifying the correctness of your mods would require more testing.

2)
You changed the behaviour of the function next_transaction().
You removed the possible return code TRANS_LAST (=2).
If I interpret the source right, then this function is kinda exported.
Check the following excerpt:

/* this is provided for anybody who wants to deal with journal */
int replay_one_transaction (reiserfs_filsys_t *, reiserfs_trans_t *);
void for_each_transaction (reiserfs_filsys_t *, action_on_trans_t);
void for_each_block (reiserfs_filsys_t *, reiserfs_trans_t *,
action_on_block_t);
int get_boundary_transactions (reiserfs_filsys_t *, reiserfs_trans_t *,
    reiserfs_trans_t *);
int next_transaction (reiserfs_filsys_t *, reiserfs_trans_t *,
    reiserfs_trans_t);

I understand that functions should only be changed at will if they are
local to one module (static). All external references by "anybody
who wants to deal with journal" would have to be adapted.

3)
An additional issue:
The comment in front of the function next_transaction() states the
following:

/* trans is a valid transaction. Look for valid transaction with smallest
   trans id which is greater than the id of the current one */

Well, the function does not exactly behave like that. The test inside it
does check something completely different. It checks wether the trans
id is <= break_trans.trans_id. I guess that comment should be adapted.
AND: See also 5)

4)
I tried to find a reason for the test against

    (control.mount_id == cur.mount_id && control.trans_id == 0 &&
       control.desc_blocknr == 0)

in the original source. I didn't find a reason but I didn't dare to remove
it in order NOT to break any special case that was formerly encountered
and fixed.
Are you positively sure that you know why it can be removed ?
Probably that implies knowing why it was written that way in first
place . . .

5)
Your new loop code looks quite straight forward but it (again, like in
several other places) does NOT take account for the second discussion
topic in my original mail, the wrap-around problem (of the trans id):

    /* Looking to the first valid not replayed transaction. */
    while (1) {
         if (cur.mount_id == control.mount_id &&
                 cur.trans_id > control.trans_id)
             break;

It seems the reiser team has decided NOT to allow a wrap-around of
the transaction id ?
In that case there should be tools (automatically) taking care of a reset
of the journal (CLEAN it) in case the wrap around is near . . .

IS that problem taken care of in REISER 4 ?
As I stated before, I'd suggest a 64-bit transaction id in the journal.

Thorsten




^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: bug+fix in reiserfsck - reiserfscore/journal.c
  2004-02-28  1:41   ` Thorsten Kohfeldt
@ 2004-02-28 12:26     ` Vitaly Fertman
  0 siblings, 0 replies; 7+ messages in thread
From: Vitaly Fertman @ 2004-02-28 12:26 UTC (permalink / raw)
  To: Thorsten Kohfeldt, reiserfs-list, Hans Reiser
  Cc: Chris Mason, Vladimir Saveliev

Hello,

On Saturday 28 February 2004 04:41, Thorsten Kohfeldt wrote:
> ----- Original Message -----
> From: "Vitaly Fertman" <vitaly@namesys.com>
> To: "Thorsten Kohfeldt" <thorsten.kohfeldt@gmx.de>;
> <reiserfs-list@namesys.com>
> Sent: Friday, February 27, 2004 1:53 PM
> Subject: Re: bug+fix in reiserfsck - reiserfscore/journal.c
>
> > On Friday 27 February 2004 06:13, Thorsten Kohfeldt wrote:
>
> ...
>
> > > The fix was extensively tested by copying mounted reiser filesystems to
> > > (XOR- !) loop devices and then reiserfsck-ing them.
> > > The fixed reiserfsck version always created the same results or, when
> > > the bugged version did replay 0 transactions, the fixed version
> > > replayed the correct amount of transactions.
>
> ...
>
> > yes, there is the bug in replaying, thank you for hitting it. After some
> > simplifications and tests I got the following patch, would you test it
> > also please.
> > --
> > Thanks,
> > Vitaly Fertman
>
> Hans,
>
> I would really be interrested in your opinion to my topic 5) down below.
>
>
> Vitaly,
>
> I tested your patch and for MY test data set it created the same results
> like the patch that I provided.

good.

> I have a few comments to your patch though:
>
> 1)
> I considered restructuring the main loop like you did but decided against
> it in favour of a "minimum invasion" strategy. We don't want to modify
> more than necessary in production software, don't we ?
> I guess verifying the correctness of your mods would require more testing.

Even trivial changes in code need proper testing. And we will perform 
some more tests with the end code before releasing anyway. But I think 
that having a necessity to change some part of the code we should change 
it as much as needed to make it correct, simple and clear to avoid other 
later changes.

> 2)
> You changed the behaviour of the function next_transaction().
> You removed the possible return code TRANS_LAST (=2).
> If I interpret the source right, then this function is kinda exported.
> Check the following excerpt:
>
> /* this is provided for anybody who wants to deal with journal */
> int replay_one_transaction (reiserfs_filsys_t *, reiserfs_trans_t *);
> void for_each_transaction (reiserfs_filsys_t *, action_on_trans_t);
> void for_each_block (reiserfs_filsys_t *, reiserfs_trans_t *,
> action_on_block_t);
> int get_boundary_transactions (reiserfs_filsys_t *, reiserfs_trans_t *,
>     reiserfs_trans_t *);
> int next_transaction (reiserfs_filsys_t *, reiserfs_trans_t *,
>     reiserfs_trans_t);
>
> I understand that functions should only be changed at will if they are
> local to one module (static). All external references by "anybody
> who wants to deal with journal" would have to be adapted.

Yes, but I would like to get rid of the LAST return code anyway. It was 
added some time ago to solve some urgent problems and now looks like 
a good time for refactoring. Nobody seems to use next_transaction though ...

> 3)
> An additional issue:
> The comment in front of the function next_transaction() states the
> following:
>
> /* trans is a valid transaction. Look for valid transaction with smallest
>    trans id which is greater than the id of the current one */
>
> Well, the function does not exactly behave like that. The test inside it
> does check something completely different. It checks wether the trans
> id is <= break_trans.trans_id. I guess that comment should be adapted.

ok.

> AND: See also 5)
>
> 4)
> I tried to find a reason for the test against
>
>     (control.mount_id == cur.mount_id && control.trans_id == 0 &&
>        control.desc_blocknr == 0)
>
> in the original source. I didn't find a reason but I didn't dare to remove
> it in order NOT to break any special case that was formerly encountered
> and fixed.

Probably someone can tell about such special cases and they will be 
documented but I do not know about any yet. Looks like this check
supposed that the trans id allocated for the first transaction after 
mounting is 1 and params in the journal header were zeroed at that 
time. This turned out to be different. Having this explanation the 
check 'control.desc_blocknr == 0' looks excessive and getting the 
first/next transaction greater then the last replayed and replaying it 
only if its transid == last replayed transid + 1 looks correct.

> Are you positively sure that you know why it can be removed ?
> Probably that implies knowing why it was written that way in first
> place . . .
>
> 5)
> Your new loop code looks quite straight forward but it (again, like in
> several other places) does NOT take account for the second discussion
> topic in my original mail, the wrap-around problem (of the trans id):
>
>     /* Looking to the first valid not replayed transaction. */
>     while (1) {
>          if (cur.mount_id == control.mount_id &&
>                  cur.trans_id > control.trans_id)
>              break;
>
> It seems the reiser team has decided NOT to allow a wrap-around of
> the transaction id ?

no decision yet, I answered on the first problem only.

> In that case there should be tools (automatically) taking care of a reset
> of the journal (CLEAN it) in case the wrap around is near . . .
>
> IS that problem taken care of in REISER 4 ?
> As I stated before, I'd suggest a 64-bit transaction id in the journal.

yes, it is 64 bits in Reiser4.

-- 
Thanks,
Vitaly Fertman

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: bug+fix in reiserfsck - reiserfscore/journal.c
  2004-02-27  3:40   ` Hans Reiser
@ 2004-02-27  4:44     ` Thorsten Kohfeldt
  0 siblings, 0 replies; 7+ messages in thread
From: Thorsten Kohfeldt @ 2004-02-27  4:44 UTC (permalink / raw)
  To: reiserfs-list; +Cc: Hans Reiser, Chris Mason


----- Original Message -----
From: "Hans Reiser" <reiser@namesys.com>
To: "Chris Mason" <mason@suse.com>
Cc: "Thorsten Kohfeldt" <thorsten.kohfeldt@gmx.de>;
<reiserfs-list@namesys.com>
Sent: Friday, February 27, 2004 4:40 AM
Subject: Re: bug+fix in reiserfsck - reiserfscore/journal.c


> Chris Mason wrote:
>
> >On Thu, 2004-02-26 at 04:18, Thorsten Kohfeldt wrote:
> >
> >
> >>To whom it may (should!) concern,
> >>26-FEB-2004
> >>
> >>after having been annoyed for quite some time by corrupt /var file
> >>systems after
> >>(simulated) powerloss in a SuSE 9.0 setup with reiserfs, I finally
> >>tracked the
> >>problem down to a nasty bug in reiserfsck - reiserfscore/journal.c
> >>
> >>Turned out that the journal is not always correctly replayed by
> >>reiserfsck !
> >>
> >>
> >
> >Thank you very much for the patch, I'll review it and try to get a fix
> >out asap inside suse.
> >
> >[ transaction id wrapping ]
> >
> >
> >>
> >>For a starter, I added a patch that fixes one wrap around issue inside
> >>journal.c. Take it or leave it.
> >>
> >>
> >
> >Thanks, I'll play around with this and perhaps a few alternatives.
> >
> >-chris
> >
> >
> >
> >
> Is this bug present in the mainstream kernel?  Why suse only?
>
> --
> Hans



The problem is NOT located in the SuSE kernel,
but in reiserfsprogs (at least reiserfsck/fsck.reiserfs).

Because the SuSE kernel has Chris's data logging patches, I
do NOT know wether a mainstream kernel includes that
original reiserfsprogs reiserfscore/journal.c.
I hereby forward that question to your core programmer team.

Anyway, the problem DEFINITELY exists in the GENERAL
version of reiserfsprogs (at least in reiserfsck).

General reiserfs users are advised to patch their reiserfsprogs
with the first inline diff of my original mail.

I expect to see reiserfsprogs V 3.6.14 VERY soon.

Thorsten



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: bug+fix in reiserfsck - reiserfscore/journal.c
  2004-02-27  0:54 ` Chris Mason
@ 2004-02-27  3:40   ` Hans Reiser
  2004-02-27  4:44     ` Thorsten Kohfeldt
  0 siblings, 1 reply; 7+ messages in thread
From: Hans Reiser @ 2004-02-27  3:40 UTC (permalink / raw)
  To: Chris Mason; +Cc: Thorsten Kohfeldt, reiserfs-list

Chris Mason wrote:

>On Thu, 2004-02-26 at 04:18, Thorsten Kohfeldt wrote:
>  
>
>>To whom it may (should!) concern,                                   
>>26-FEB-2004
>> 
>>after having been annoyed for quite some time by corrupt /var file
>>systems after
>>(simulated) powerloss in a SuSE 9.0 setup with reiserfs, I finally
>>tracked the
>>problem down to a nasty bug in reiserfsck - reiserfscore/journal.c
>> 
>>Turned out that the journal is not always correctly replayed by
>>reiserfsck !
>>    
>>
>
>Thank you very much for the patch, I'll review it and try to get a fix
>out asap inside suse.
>
>[ transaction id wrapping ]
>  
>
>> 
>>For a starter, I added a patch that fixes one wrap around issue inside
>>journal.c. Take it or leave it.
>>    
>>
>
>Thanks, I'll play around with this and perhaps a few alternatives.
>
>-chris
>
>
>
>
>  
>
Is this bug present in the mainstream kernel?  Why suse only?

-- 
Hans



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: bug+fix in reiserfsck - reiserfscore/journal.c
       [not found] <002601c3fc49$815d1880$e91ba8c0@thk.local>
@ 2004-02-27  0:54 ` Chris Mason
  2004-02-27  3:40   ` Hans Reiser
  0 siblings, 1 reply; 7+ messages in thread
From: Chris Mason @ 2004-02-27  0:54 UTC (permalink / raw)
  To: Thorsten Kohfeldt; +Cc: reiserfs-list

On Thu, 2004-02-26 at 04:18, Thorsten Kohfeldt wrote:
> To whom it may (should!) concern,                                   
> 26-FEB-2004
>  
> after having been annoyed for quite some time by corrupt /var file
> systems after
> (simulated) powerloss in a SuSE 9.0 setup with reiserfs, I finally
> tracked the
> problem down to a nasty bug in reiserfsck - reiserfscore/journal.c
>  
> Turned out that the journal is not always correctly replayed by
> reiserfsck !

Thank you very much for the patch, I'll review it and try to get a fix
out asap inside suse.

[ transaction id wrapping ]
>  
> For a starter, I added a patch that fixes one wrap around issue inside
> journal.c. Take it or leave it.

Thanks, I'll play around with this and perhaps a few alternatives.

-chris



^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2004-02-28 12:26 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-02-27  3:13 bug+fix in reiserfsck - reiserfscore/journal.c Thorsten Kohfeldt
2004-02-27 12:53 ` Vitaly Fertman
2004-02-28  1:41   ` Thorsten Kohfeldt
2004-02-28 12:26     ` Vitaly Fertman
     [not found] <002601c3fc49$815d1880$e91ba8c0@thk.local>
2004-02-27  0:54 ` Chris Mason
2004-02-27  3:40   ` Hans Reiser
2004-02-27  4:44     ` Thorsten Kohfeldt

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.